User Record Similarity Matcher is a Stripe interview problem that builds a graph from user records based on field similarity scores, then finds connected users using graph algorithms like BFS or Union-Find.[1]
Given a list of user records rows, each with fields id (unique integer), name (string), email (string), and company (string), plus a weights map (field name to non-negative float, e.g., {name: 0.2, email: 0.5, company: 0.3}) and a float threshold (0 ≤ threshold ≤ 1), define similarity score between records A and B as the sum of weights[field] for matching fields (exact string equality). Records are directly linked if score ≥ threshold (undirected relation).[1]
The core task constructs an undirected graph with records as nodes and direct links as edges. For a target_user_id, progressively extended requirements are:
Suitable function signatures:
find_direct_links(rows, weights, threshold, target_user_id) -> List[int] find_all_linked(rows, weights, threshold, target_user_id) -> List[int]
Assumptions: Unique IDs; rows fit in memory (no distributed design needed). Tests Hash Table (adjacency list, visited sets), Graph (construction/traversal), Union Find (component merging), BFS (reachability/flood fill).[1]
rows = [ { id: 1, name: "Alice", email: "alice@gmail.com", company: "Stripe" }, { id: 2, name: "Alicia", email: "alice@gmail.com", company: "Stripe" }, { id: 3, name: "Alice", email: "alice@yahoo.com", company: "Google" } ] weights = {name: 0.2, email: 0.5, company: 0.3} threshold = 0.5, target_user_id = 1
find_all_linked, output includes {2, 3} if further links exist; here direct is {2}, but example notes 3 as one-hop via 2 (implying additional setup).[1]No explicit numeric bounds given, but implied: N = |rows| reasonable for O(N²) similarity checks (e.g., N ≤ 10⁴ typical for interviews); strings short; weights sum ≤ 1; 0 ≤ threshold ≤ 1; IDs unique positives. Graph may have up to O(N²) edges (dense case). [1]