Business Account Data Verification
Problem Overview
You are building a KYC (Know Your Customer) verification system for onboarding business accounts. The system processes business information provided in CSV format and determines whether each account should be marked as VERIFIED or flagged with a specific error code based on progressively complex validation rules.
The problem is divided into five progressive parts, each introducing additional verification criteria:
- Part 1 -- Complete Field Validation: Ensure all required business information is provided
- Part 2 -- Descriptor Length Constraints: Validate that statement descriptors meet character length requirements
- Part 3 -- Generic Name Detection: Flag businesses using prohibited generic terms
- Part 4 -- Name Consistency Check: Verify that business names are consistent across different fields
- Part 5 -- Error Code Classification: Return specific error codes for different validation failures
Input Format
You will receive a CSV-formatted string representing a dataset of business accounts. The first row is the header. Each subsequent row represents one business account with the following 6 columns:
- col1: Business ID
- col2: Business legal name
- col3: Business website URL
- col4: Short statement descriptor
- col5: Full statement descriptor
- col6: Product description
csv_data = """col1,col2,col3,col4,col5,col6
BIZ001,Pawsome Pets Inc.,pawsomepets.com,Pawsome,PAWSOME PETS INC,Premium pet supplies
BIZ002,Bean Bliss Coffee,beanbliss.com,,,Artisan coffee roasters"""
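One way to turn this input into rows is sketched below, using Python's standard csv module. The helper name parse_rows is hypothetical, and padding short rows with empty strings is an assumption that the problem statement does not spell out.

```python
import csv
import io

NUM_COLUMNS = 6


def parse_rows(csv_data):
    """Return the data rows (header skipped) as lists of 6 stripped strings."""
    reader = csv.reader(io.StringIO(csv_data.strip()))
    next(reader, None)  # skip the header row
    rows = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        # Assumption: missing trailing columns count as empty fields.
        cells = [cell.strip() for cell in row] + [""] * (NUM_COLUMNS - len(row))
        rows.append(cells[:NUM_COLUMNS])
    return rows
```

Each returned row can then be indexed by position, with index 1 holding the business legal name (col2) and index 4 the full statement descriptor (col5).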
Output Format
For each business account (excluding the header row), output one line of the form STATUS: business name, for example:
```
"VERIFIED: Pawsome Pets Inc."
or
"NOT VERIFIED: Bean Bliss Coffee"
or (Part 5)
"ERROR_MISSING_FIELDS: Bean Bliss Coffee"
```
Output should maintain the original order of rows from the input.
Important Notes
- There are multiple parts to this problem -- ask the interviewer how many parts there are to better manage your time
- Start with simple validation logic and extend it as parts get more complex
- Write your own test cases and ensure your code compiles and runs correctly
Part 1: Complete Field Validation
Implement a function validate_businesses(csv_data) that verifies each business has provided all required fields. A business account can only be VERIFIED if every field contains a non-empty value. Fields containing only whitespace should be treated as empty.
Example
```
csv_data = """col1,col2,col3,col4,col5,col6
BIZ001,Pawsome Pets Inc.,pawsomepets.com,Pawsome,PAWSOME PETS INC,Premium pet supplies
BIZ002,Bean Bliss Coffee,beanbliss.com,,,Artisan coffee roasters
BIZ003,,,,,
BIZ004,Tech Solutions,techsol.io,Tech,TECH SOLUTIONS,Software consulting"""
validate_businesses(csv_data)
Output:
VERIFIED: Pawsome Pets Inc.
NOT VERIFIED: Bean Bliss Coffee
NOT VERIFIED:
VERIFIED: Tech Solutions
```
- BIZ001: All 6 fields are non-empty -- VERIFIED
- BIZ002: col4 and col5 are empty -- NOT VERIFIED
- BIZ003: Only col1 is provided -- NOT VERIFIED (the business name is empty, so nothing follows the colon in the output)
- BIZ004: All fields complete -- VERIFIED
Requirements
- All 6 columns must contain non-empty values after stripping whitespace
- Skip the header row when processing
- Maintain the original order of businesses in the output
- Handle rows where the business name (col2) itself is empty
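A minimal sketch of Part 1, assuming the function returns the output lines as a list of strings (the problem only says "output one line" per account, so printing them instead would work equally well):

```python
import csv
import io


def validate_businesses(csv_data):
    """Part 1 sketch: VERIFIED only if all 6 fields are non-empty."""
    reader = csv.reader(io.StringIO(csv_data.strip()))
    next(reader, None)  # skip the header row
    results = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        # Strip whitespace so whitespace-only fields count as empty;
        # pad short rows (assumption) so there are always 6 fields.
        fields = ([cell.strip() for cell in row] + [""] * (6 - len(row)))[:6]
        name = fields[1]  # col2 may itself be empty
        status = "VERIFIED" if all(fields) else "NOT VERIFIED"
        results.append(f"{status}: {name}")
    return results
```

With this sketch, a row whose name is empty produces the status, a colon, and a trailing space; whether that trailing space is expected is worth confirming with the interviewer.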
Part 2: Statement Descriptor Length Validation
Extend your solution to validate that the full statement descriptor (col5) meets character length requirements. The descriptor must be between 5 and 31 characters (inclusive) after trimming whitespace.
Accounts failing this check should be marked NOT VERIFIED, even if all other validations pass.
Example
```
csv_data = """col1,col2,col3,col4,col5,col6
BIZ001,Pawsome Pets Inc.,pawsomepets.com,Pawsome,PAWSOME PETS INC,Premium pet supplies
BIZ002,Bean Bliss Coffee,beanbliss.com,Bean,Bean,Artisan coffee roasters
BIZ003,Oakridge Furniture,oakridge.com,Oak,OAKRIDGE CUSTOM WOODWORKING AND FURNITURE EMPORIUM,Custom furniture
BIZ004,Tech Solutions,techsol.io,Tech,ITCS,Software consulting"""
validate_businesses(csv_data)
Output:
VERIFIED: Pawsome Pets Inc.
NOT VERIFIED: Bean Bliss Coffee
NOT VERIFIED: Oakridge Furniture
NOT VERIFIED: Tech Solutions
```
- BIZ001: Length of "PAWSOME PETS INC" is 16 (valid range) -- VERIFIED
- BIZ002: Length of "Bean" is 4 (below minimum 5) -- NOT VERIFIED
- BIZ003: Length of descriptor is 50 (exceeds maximum 31) -- NOT VERIFIED
- BIZ004: Length of "ITCS" is 4 (below minimum 5) -- NOT VERIFIED
Requirements
- Strip whitespace from col5 before checking length
- Valid length range: 5 to 31 characters inclusive on both ends
- This check is performed in addition to the Part 1 validation
- Descriptors outside this range cause automatic verification failure
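The length rule can be isolated in a small predicate, sketched below. The function and constant names are hypothetical; only the bounds 5 and 31 come from the problem statement.

```python
MIN_DESCRIPTOR_LEN = 5
MAX_DESCRIPTOR_LEN = 31


def has_valid_descriptor_length(full_descriptor):
    """Return True if col5 is 5 to 31 characters long after trimming."""
    trimmed = full_descriptor.strip()
    return MIN_DESCRIPTOR_LEN <= len(trimmed) <= MAX_DESCRIPTOR_LEN
```

Keeping the bounds as named constants makes the inclusive range easy to see and to adjust if the interviewer changes the limits.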
Part 3: Generic Business Name Blocklist
Add validation to block generic business names that appear in the full statement descriptor (col5). Accounts using any of the following prohibited terms should be marked NOT VERIFIED:
Blocked Terms (case-insensitive): ONLINE STORE, ECOMMERCE, RETAIL, SHOP, GENERAL MERCHANDISE
The check is case-insensitive, meaning "Online Store", "ONLINE STORE", and "online store" all match.
Example
```
csv_data = """col1,col2,col3,col4,col5,col6
BIZ001,Pawsome Pets Inc.,pawsomepets.com,Pawsome,PAWSOME PETS INC,Premium pet supplies
BIZ002,Global Goods Market,globalgoods.com,Global,ONLINE STORE,Various products
BIZ003,Northwest Tech,nwtech.com,NW Tech,NORTHWEST INNOVATION TECH,Technology solutions
BIZ004,Sweet Dreams,sweetdreams.com,Sweet,SWEET DREAMS CREAMERY,Ice cream shop"""
validate_businesses(csv_data)
Output:
VERIFIED: Pawsome Pets Inc.
NOT VERIFIED: Global Goods Market
VERIFIED: Northwest Tech
VERIFIED: Sweet Dreams
```
- BIZ001: "PAWSOME PETS INC" contains no blocked terms -- VERIFIED
- BIZ002: "ONLINE STORE" is a blocked term -- NOT VERIFIED
- BIZ003: "NORTHWEST INNOVATION TECH" contains no blocked terms -- VERIFIED
- BIZ004: "SWEET DREAMS CREAMERY" contains no blocked terms (note: col6 has "shop" but we only check col5) -- VERIFIED
Requirements
- Check col5 (full descriptor) for blocked terms
- Matching is case-insensitive
- This validation applies in addition to Part 1 and Part 2 checks
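A possible blocklist check is sketched below. It assumes substring matching, so a descriptor such as "WORKSHOP TOOLS" would also match "SHOP"; the problem statement does not say whether matching should be limited to whole words, so this is worth clarifying with the interviewer. The names used here are hypothetical.

```python
BLOCKED_TERMS = (
    "ONLINE STORE",
    "ECOMMERCE",
    "RETAIL",
    "SHOP",
    "GENERAL MERCHANDISE",
)


def contains_blocked_term(full_descriptor):
    """Return True if col5 contains any blocked term, ignoring case."""
    # Assumption: plain substring match rather than whole-word match.
    upper = full_descriptor.upper()
    return any(term in upper for term in BLOCKED_TERMS)
```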
Part 4: Business Name Consistency Validation
Add validation to ensure the business name (col2) is consistent with the short descriptor (col4) and full descriptor (col5). At least 50% of the words in the business name must appear in the combined words of these two descriptor fields.
Word Matching Rules:
- Split col2, col4, and col5 into words using whitespace as the delimiter
- Ignore the words "LLC" and "Inc" (case-insensitive) when comparing
- Comparison is case-insensitive
- At least 50% of the remaining words from col2 must appear in col4 or col5 (a match in either field counts)
Example
```
csv_data = """col1,col2,col3,col4,col5,col6
BIZ001,land water,landwater.com,land,land water LLC,Environmental services
BIZ002,Acme Global Trading,acme.com,Acme,XYZ ENTERPRISES,Import export services
BIZ003,Maple Ridge Bakery,maplebakery.com,Maple,MAPLE RIDGE BAKERY LLC,Artisan baked goods
BIZ004,Innovation Labs Inc,innovlabs.com,Labs,INNOVATION RESEARCH,R&D services"""
validate_businesses(csv_data)
Output:
VERIFIED: land water
NOT VERIFIED: Acme Global Trading
VERIFIED: Maple Ridge Bakery
VERIFIED: Innovation Labs Inc
```
- BIZ001: "land water" has 2 words. Both appear in col5 "land water LLC" (ignoring LLC). Match: 2/2 = 100% -- VERIFIED
- BIZ002: "Acme Global Trading" has 3 words. Only "Acme" appears in col4. Match: 1/3 = 33% which is below 50% -- NOT VERIFIED
- BIZ003: "Maple Ridge Bakery" has 3 words. All 3 appear in col5 (ignoring LLC). Match: 3/3 = 100% -- VERIFIED
- BIZ004: "Innovation Labs Inc" has 3 words, but ignoring "Inc" leaves 2 words. Both "Innovation" and "Labs" appear across col4 and col5. Match: 2/2 = 100% -- VERIFIED
Requirements
- Split fields into words using whitespace
- Remove "LLC" and "Inc" (case-insensitive) from all fields before comparison
- Perform case-insensitive word matching
- At least 50% of col2 words must match words in col4 OR col5 (combined)
- This validation applies in addition to all previous checks
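The word-matching rules can be sketched as follows. Two assumptions are made here that the problem does not settle: words are compared as exact tokens, so "Inc." with a trailing period is not treated as "Inc", and a name with no words left after filtering is treated as inconsistent. The helper names are hypothetical.

```python
IGNORED_WORDS = {"llc", "inc"}


def significant_words(text):
    """Lowercase, split on whitespace, and drop the ignored words."""
    return [w for w in text.lower().split() if w not in IGNORED_WORDS]


def is_name_consistent(name, short_descriptor, full_descriptor):
    """Return True if at least 50% of the name's words appear in col4 or col5."""
    name_words = significant_words(name)
    if not name_words:
        return False  # assumption: nothing left to compare counts as a mismatch
    descriptor_words = set(significant_words(short_descriptor))
    descriptor_words |= set(significant_words(full_descriptor))
    matches = sum(1 for w in name_words if w in descriptor_words)
    # Integer comparison avoids floating-point issues at exactly 50%.
    return matches * 2 >= len(name_words)
```

For "Acme Global Trading" only one of three words matches, so the integer comparison 2 >= 3 fails and the account is rejected, in line with the BIZ002 explanation above.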
Part 5: Error Code Classification
Instead of outputting generic "NOT VERIFIED" messages, return specific error codes based on the validation failure. If multiple validations fail, return the error code for the first failure encountered.
Validation Order and Error Codes:
- Empty fields -- ERROR_MISSING_FIELDS
- Invalid descriptor length -- ERROR_INVALID_LENGTH
- Blocked generic term -- ERROR_GENERIC_NAME
- Insufficient name consistency -- ERROR_NAME_MISMATCH
Example
```
csv_data = """col1,col2,col3,col4,col5,col6
BIZ001,Pawsome Pets Inc.,pawsomepets.com,Pawsome,PAWSOME PETS INC,Premium pet supplies
BIZ002,Bean Bliss Coffee,beanbliss.com,,,Artisan coffee roasters
BIZ003,Short Name,short.com,Short,SHRT,Products
BIZ004,Generic Store,generic.com,Store,RETAIL,Various items
BIZ005,Mismatched Corp,mismatch.com,Wrong,DIFFERENT BUSINESS,Services"""
validate_businesses(csv_data)
Output:
VERIFIED: Pawsome Pets Inc.
ERROR_MISSING_FIELDS: Bean Bliss Coffee
ERROR_INVALID_LENGTH: Short Name
ERROR_GENERIC_NAME: Generic Store
ERROR_NAME_MISMATCH: Mismatched Corp
```
- BIZ001: Passes all validations -- VERIFIED
- BIZ002: Missing col4 and col5 (first failure) -- ERROR_MISSING_FIELDS
- BIZ003: "SHRT" has length 4, below minimum 5 (first failure after field check) -- ERROR_INVALID_LENGTH
- BIZ004: Contains blocked term "RETAIL" (first failure after length check) -- ERROR_GENERIC_NAME
- BIZ005: No words match between "Mismatched Corp" and descriptors -- ERROR_NAME_MISMATCH
Requirements
- Check validations in the specified order
- Return the error code for the first validation that fails
- Only return VERIFIED if all validations pass
- Maintain the same output format: STATUS followed by colon and business name
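Putting the parts together, one possible end-to-end sketch runs the checks in the specified order and returns at the first failure. It carries the same assumptions as the earlier parts: results are returned as a list of lines, blocked terms are matched as substrings, and words are compared as exact tokens. The helper names classify and _words are hypothetical.

```python
import csv
import io

BLOCKED_TERMS = ("ONLINE STORE", "ECOMMERCE", "RETAIL", "SHOP", "GENERAL MERCHANDISE")
IGNORED_WORDS = {"llc", "inc"}


def _words(text):
    """Lowercase, split on whitespace, and drop the ignored words."""
    return [w for w in text.lower().split() if w not in IGNORED_WORDS]


def classify(fields):
    """Return the status for one row of 6 already-stripped fields."""
    _, name, _, short_desc, full_desc, _ = fields
    if not all(fields):
        return "ERROR_MISSING_FIELDS"
    if not 5 <= len(full_desc) <= 31:
        return "ERROR_INVALID_LENGTH"
    # Assumption: substring match against the upper-cased descriptor.
    if any(term in full_desc.upper() for term in BLOCKED_TERMS):
        return "ERROR_GENERIC_NAME"
    name_words = _words(name)
    descriptor_words = set(_words(short_desc)) | set(_words(full_desc))
    matches = sum(1 for w in name_words if w in descriptor_words)
    if not name_words or matches * 2 < len(name_words):
        return "ERROR_NAME_MISMATCH"
    return "VERIFIED"


def validate_businesses(csv_data):
    """Return one 'STATUS: name' line per data row, in input order."""
    reader = csv.reader(io.StringIO(csv_data.strip()))
    next(reader, None)  # skip the header row
    results = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        fields = ([cell.strip() for cell in row] + [""] * (6 - len(row)))[:6]
        results.append(f"{classify(fields)}: {fields[1]}")
    return results
```

Because each check returns as soon as it fails, the order of the if statements in classify is what encodes the error-code priority; adding a new rule means inserting one more check at the right position.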