Joining Records
Left Join with Missing
Join two tables by a shared key, keeping every left-side row regardless of
whether the key exists in the right table. Unmatched rows receive None for
the right-side value. The replay shows result growing one pair per order,
with (3, None) appearing for the unmatched key 'x'.
By hand
The Pythonic way
A list comprehension over zip(order_ids, order_ckeys) with index.get(k)
handles matched and unmatched rows in one expression, with no filter clause.
naive.py
order_ids = [1, 2, 3, 4, 5]
order_ckeys = ['a', 'b', 'x', 'c', 'a']
cust_keys = ['a', 'b', 'c', 'd']
cust_names = ['alice', 'bob', 'carol', 'dave']
index = {}
for k, name in zip(cust_keys, cust_names):
index[k] = name
result = []
for oid, k in zip(order_ids, order_ckeys):
result.append((oid, index.get(k)))
print('RESULT:', result)
library.py
order_ids = [1, 2, 3, 4, 5]
order_ckeys = ['a', 'b', 'x', 'c', 'a']
cust_keys = ['a', 'b', 'c', 'd']
cust_names = ['alice', 'bob', 'carol', 'dave']
index = {k: name for k, name in zip(cust_keys, cust_names)}
result = [(oid, index.get(k)) for oid, k in zip(order_ids, order_ckeys)]
print('RESULT:', result)
RESULT: [(1, 'alice'), (2, 'bob'), (3, None), (4, 'carol'), (5, 'alice')]
Implementation notes
- The key difference from
inner-join-by-key: the inner join drops row 3 entirely; the left join keeps it withNone. Row count is always preserved for the left table in a left join. index.get(k)returnsNoneby default; pass a second argument (index.get(k, 'unknown')) to substitute a different sentinel.- This is the mechanism behind
pd.merge(orders, customers, on='key', how='left')in thepython-pandastrack — mechanism here, API there. - The simpler loop (no
ifbranch) also produces fewer trace events than the inner join version for the same dataset.