Inner Join by Key - Data Handling in Pure Python

Join two tables by a shared key, keeping only rows where the key exists in both. A first pass builds an index dict from the smaller table; a second pass walks the larger table and appends matched pairs, silently skipping any key that is absent from the index. Order 3 (key 'x') has no matching customer and is dropped.

By hand

The Pythonic way

Build the index with a dict comprehension, then produce result with a list comprehension that folds the lookup and the if k in index filter into one expression.

naive.py

order_ids = [1, 2, 3, 4, 5]
order_ckeys = ['a', 'b', 'x', 'c', 'a']
cust_keys = ['a', 'b', 'c', 'd']
cust_names = ['alice', 'bob', 'carol', 'dave']
index = {}
for k, name in zip(cust_keys, cust_names):
    index[k] = name
result = []
for oid, k in zip(order_ids, order_ckeys):
    if k in index:
        result.append((oid, index[k]))
print('RESULT:', result)

library.py

order_ids = [1, 2, 3, 4, 5]
order_ckeys = ['a', 'b', 'x', 'c', 'a']
cust_keys = ['a', 'b', 'c', 'd']
cust_names = ['alice', 'bob', 'carol', 'dave']
index = {k: name for k, name in zip(cust_keys, cust_names)}
result = [(oid, index[k]) for oid, k in zip(order_ids, order_ckeys) if k in index]
print('RESULT:', result)

RESULT: [(1, 'alice'), (2, 'bob'), (4, 'carol'), (5, 'alice')]

Implementation notes

Building the index first makes the join O(n + m) overall: O(m) to build the index, then O(n) for the order pass. A nested-loop approach would be O(n × m).
This is the same mechanism as pd.merge(orders, customers, on='key', how='inner') in the python-pandas track — mechanism here, API there.
An outer join variant would use index.get(k, None) and keep all orders, substituting None for unmatched customers.
Key 'x' in order 3 has no entry in the customer table; the if k in index guard drops it silently, which is the defining behaviour of an inner join.