You are building a reconciliation tool that joins two CSV-style datasets — one coming from Stripe's internal customer ledger, the other from an external payment processor — so downstream analytics can compare and match records between the two datasets.
Given two CSV files, ledger.csv and processor.csv, implement a function that joins these two datasets on a common key (e.g., customer ID). The function should return a new CSV file with the combined records from both datasets.
customer_id in both datasets.Input: ` ledger.csv: customer_id,amount,date 1,100,2022-01-01 2,200,2022-01-02 3,300,2022-01-03
processor.csv: customer_id,transaction_id,amount 1,101,150 2,102,250 3,103,350 `
Output:
customer_id,amount,date,transaction_id 1,100,2022-01-01,101 2,200,2022-01-02,102 3,300,2022-01-03,103
Input: ` ledger.csv: customer_id,amount,date 1,100,2022-01-01 2,200,2022-01-02
processor.csv: customer_id,transaction_id,amount 1,101,150 2,102,250 3,103,350 `
Output:
customer_id,amount,date,transaction_id 1,100,2022-01-01,101 2,200,2022-01-02,102
customer_id may not be present in both datasets.Here's a sample Python solution using the pandas library:
`python import pandas as pd
def join_datasets(ledger_path, processor_path): # Read the CSV files into DataFrames ledger_df = pd.read_csv(ledger_path) processor_df = pd.read_csv(processor_path)
# Join the DataFrames on the 'customer_id' column
result_df = pd.merge(ledger_df, processor_df, on='customer_id', how='inner')
# Save the result to a new CSV file
result_df.to_csv('joined.csv', index=False)
join_datasets('ledger.csv', 'processor.csv') `
This solution reads the input CSV files into pandas DataFrames, joins them on the customer_id column, and saves the result to a new CSV file. The how='inner' parameter ensures that only matching records are included in the output.