You are building a KYC (Know Your Customer) verification system for onboarding business accounts. The system processes business information provided in CSV format and determines whether each account should be marked as VERIFIED or flagged with a specific error code based on progressively complex validation rules.
The problem is divided into five progressive parts, each introducing additional verification criteria:
Part 1 -- Complete Field Validation: Ensure all required business information is provided
Part 2 -- Descriptor Length Constraints: Validate that statement descriptors meet character length requirements
Part 3 -- Generic Name Detection: Flag businesses using prohibited generic terms
Part 4 -- Name Consistency Check: Verify that business names are consistent across different fields
Part 5 -- Error Code Classification: Return specific error codes for different validation failures
You will receive a CSV-formatted string representing a dataset of business accounts. The first row is the header. Each subsequent row represents one business account with the following 6 columns:
col1: Business ID
col2: Business legal name
col3: Business website URL
col4: Short statement descriptor
col5: Full statement descriptor
col6: Product description
csv_data = """col1,col2,col3,col4,col5,col6
BIZ001,Pawsome Pets Inc.,pawsomepets.com,Pawsome,PAWSOME PETS INC,Premium pet supplies
BIZ002,Bean Bliss Coffee,beanbliss.com,,,Artisan coffee roasters"""
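For each business account (excluding the header row), output one line consisting of the status, a colon, and the business legal name (col2). Examples:
"VERIFIED: Pawsome Pets Inc."
"NOT VERIFIED: Bean Bliss Coffee"
"ERROR_MISSING_FIELDS: Bean Bliss Coffee"
The ERROR_ codes replace "NOT VERIFIED" only in Part 5.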
Output should maintain the original order of rows from the input.
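As a starting point, the input and output handling described above could be sketched as follows. The helper names `parse_rows` and `format_line` are illustrative assumptions, not part of the problem statement, and the sketch assumes the input is well-formed CSV that Python's standard `csv` module can read.

```python
import csv
import io


def parse_rows(csv_data):
    """Return the data rows (header skipped) as lists of trimmed strings."""
    reader = csv.reader(io.StringIO(csv_data.strip()))
    next(reader, None)  # skip the header row
    # csv.reader yields [] for blank lines; drop those.
    return [[field.strip() for field in row] for row in reader if row]


def format_line(status, business_name):
    """Build one output line: status, colon, space, business name."""
    return f"{status}: {business_name}"


csv_data = """col1,col2,col3,col4,col5,col6
BIZ001,Pawsome Pets Inc.,pawsomepets.com,Pawsome,PAWSOME PETS INC,Premium pet supplies
BIZ002,Bean Bliss Coffee,beanbliss.com,,,Artisan coffee roasters"""

rows = parse_rows(csv_data)
print(rows[1][3:5])                         # ['', '']
print(format_line("VERIFIED", rows[0][1]))  # VERIFIED: Pawsome Pets Inc.
```

Because the rows are kept in a list in the order they are read, the output order follows the input order.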
There are multiple parts to this problem -- ask the interviewer how many parts there are to better manage your time
Start with simple validation logic and extend it as parts get more complex
Write your own test cases and ensure your code compiles and runs correctly
Implement a function validate_businesses(csv_data) that verifies each business has provided all required fields. A business account can only be VERIFIED if every field contains a non-empty value. Fields containing only whitespace should be treated as empty.
csv_data = """col1,col2,col3,col4,col5,col6
BIZ001,Pawsome Pets Inc.,pawsomepets.com,Pawsome,PAWSOME PETS INC,Premium pet supplies
BIZ002,Bean Bliss Coffee,beanbliss.com,,,Artisan coffee roasters
BIZ003,,,,,
BIZ004,Tech Solutions,techsol.io,Tech,TECH SOLUTIONS,Software consulting"""
validate_businesses(csv_data)
BIZ001: All 6 fields are non-empty -- VERIFIED
BIZ002: col4 and col5 are empty -- NOT VERIFIED
BIZ003: Only col1 is provided -- NOT VERIFIED (the business name is empty, so nothing follows the colon in the output line)
BIZ004: All fields complete -- VERIFIED
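One possible Part 1 implementation is sketched below. It assumes the function returns a list of output lines (the problem statement does not say whether to return or print them) and that a row with fewer than six columns is treated as having empty trailing fields.

```python
import csv
import io


def validate_businesses(csv_data):
    """Part 1 only: VERIFIED when all six fields are non-empty after trimming."""
    reader = csv.reader(io.StringIO(csv_data.strip()))
    next(reader, None)  # skip the header row
    results = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        # Assumption: pad short rows so missing columns count as empty.
        fields = ([field.strip() for field in row] + [""] * 6)[:6]
        status = "VERIFIED" if all(fields) else "NOT VERIFIED"
        results.append(f"{status}: {fields[1]}")
    return results


csv_data = """col1,col2,col3,col4,col5,col6
BIZ001,Pawsome Pets Inc.,pawsomepets.com,Pawsome,PAWSOME PETS INC,Premium pet supplies
BIZ002,Bean Bliss Coffee,beanbliss.com,,,Artisan coffee roasters
BIZ003,,,,,
BIZ004,Tech Solutions,techsol.io,Tech,TECH SOLUTIONS,Software consulting"""

for line in validate_businesses(csv_data):
    print(line)
# VERIFIED: Pawsome Pets Inc.
# NOT VERIFIED: Bean Bliss Coffee
# NOT VERIFIED: 
# VERIFIED: Tech Solutions
```

Trimming each field before the `all(...)` test is what makes whitespace-only fields count as empty.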
Extend your solution to validate that the full statement descriptor (col5) meets character length requirements. The descriptor must be between 5 and 31 characters (inclusive) after trimming whitespace.
Accounts failing this check should be marked NOT VERIFIED, even if all other validations pass.
csv_data = """col1,col2,col3,col4,col5,col6
BIZ001,Pawsome Pets Inc.,pawsomepets.com,Pawsome,PAWSOME PETS INC,Premium pet supplies
BIZ002,Bean Bliss Coffee,beanbliss.com,Bean,Bean,Artisan coffee roasters
BIZ003,Oakridge Furniture,oakridge.com,Oak,OAKRIDGE CUSTOM WOODWORKING AND FURNITURE EMPORIUM,Custom furniture
BIZ004,Tech Solutions,techsol.io,Tech,ITCS,Software consulting"""
validate_businesses(csv_data)
BIZ001: Length of "PAWSOME PETS INC" is 16 (valid range) -- VERIFIED
BIZ002: Length of "Bean" is 4 (below minimum 5) -- NOT VERIFIED
BIZ003: Length of descriptor is 50 (exceeds maximum 31) -- NOT VERIFIED
BIZ004: Length of "ITCS" is 4 (below minimum 5) -- NOT VERIFIED
Strip whitespace from col5 before checking length
Valid length range: 5 to 31 characters inclusive on both ends
This check is performed in addition to the Part 1 validation
Descriptors outside this range cause automatic verification failure
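The length rule can be isolated in a small helper, sketched here. The function and constant names are illustrative assumptions; only the 5 to 31 inclusive range and the trimming step come from the problem statement.

```python
MIN_DESCRIPTOR_LEN = 5
MAX_DESCRIPTOR_LEN = 31


def has_valid_descriptor_length(descriptor):
    """True when the trimmed descriptor is 5 to 31 characters long, inclusive."""
    length = len(descriptor.strip())
    return MIN_DESCRIPTOR_LEN <= length <= MAX_DESCRIPTOR_LEN


for descriptor in (
    "PAWSOME PETS INC",
    "Bean",
    "OAKRIDGE CUSTOM WOODWORKING AND FURNITURE EMPORIUM",
    "ITCS",
):
    print(len(descriptor.strip()), has_valid_descriptor_length(descriptor))
# 16 True
# 4 False
# 50 False
# 4 False
```

In the full solution this helper would run only after the Part 1 field check has passed, so col5 is known to be non-empty.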
Add validation to block generic business names that appear in the full statement descriptor (col5). Accounts using any of the following prohibited terms should be marked NOT VERIFIED:
The check is case-insensitive, meaning "Online Store", "ONLINE STORE", and "online store" all match.
csv_data = """col1,col2,col3,col4,col5,col6
BIZ001,Pawsome Pets Inc.,pawsomepets.com,Pawsome,PAWSOME PETS INC,Premium pet supplies
BIZ002,Global Goods Market,globalgoods.com,Global,ONLINE STORE,Various products
BIZ003,Northwest Tech,nwtech.com,NW Tech,NORTHWEST INNOVATION TECH,Technology solutions
BIZ004,Sweet Dreams,sweetdreams.com,Sweet,SWEET DREAMS CREAMERY,Ice cream shop"""
validate_businesses(csv_data)
BIZ001: "PAWSOME PETS INC" contains no blocked terms -- VERIFIED
BIZ002: "ONLINE STORE" is a blocked term -- NOT VERIFIED
BIZ003: "NORTHWEST INNOVATION TECH" contains no blocked terms -- VERIFIED
BIZ004: "SWEET DREAMS CREAMERY" contains no blocked terms (note: col6 has "shop" but we only check col5) -- VERIFIED
Check col5 (full descriptor) for blocked terms
Matching is case-insensitive
This validation applies in addition to Part 1 and Part 2 checks
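A minimal sketch of the blocked-term check follows. The `BLOCKED_TERMS` tuple is an assumption: it contains only the terms this text mentions or implies ("ONLINE STORE", "RETAIL", "SHOP") and should be replaced with the actual list given by the interviewer. The sketch also assumes a term counts as present when it occurs anywhere in col5 as a substring; whole-word matching is an equally plausible reading worth confirming.

```python
# Assumed, partial list: substitute the real prohibited terms.
BLOCKED_TERMS = ("ONLINE STORE", "RETAIL", "SHOP")


def contains_blocked_term(descriptor, blocked_terms=BLOCKED_TERMS):
    """Case-insensitive substring test of col5 against each blocked term."""
    normalized = descriptor.strip().upper()
    return any(term.upper() in normalized for term in blocked_terms)


print(contains_blocked_term("PAWSOME PETS INC"))       # False
print(contains_blocked_term("Online Store"))           # True
print(contains_blocked_term("SWEET DREAMS CREAMERY"))  # False
```

Uppercasing both sides once keeps the comparison case-insensitive without a regular expression.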
Add validation to ensure the business name (col2) is consistent with either the short descriptor (col4) or full descriptor (col5). The business name must share at least 50% of its words with one of these descriptor fields.
Split col2, col4, and col5 into words using whitespace as the delimiter
Ignore the words "LLC" and "Inc" (case-insensitive) when comparing
Comparison is case-insensitive
At least 50% of words from col2 must appear in either col4 or col5
BIZ001: "land water" has 2 words. Both appear in col5 "land water LLC" (ignoring LLC). Match: 2/2 = 100% -- VERIFIED
BIZ002: "Acme Global Trading" has 3 words. Only "Acme" appears in col4. Match: 1/3 = 33% which is below 50% -- NOT VERIFIED
BIZ003: "Maple Ridge Bakery" has 3 words. All 3 appear in col5 (ignoring LLC). Match: 3/3 = 100% -- VERIFIED
BIZ004: "Innovation Labs Inc" has 3 words, but ignoring "Inc" leaves 2 words. Both "Innovation" and "Labs" appear across col4 and col5. Match: 2/2 = 100% -- VERIFIED
Split fields into words using whitespace
Remove "LLC" and "Inc" (case-insensitive) from all fields before comparison
Perform case-insensitive word matching
At least 50% of col2 words must match words in col4 and col5 taken together (a word found in either field counts, as in the BIZ004 example)
This validation applies in addition to all previous checks
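The rules above could be sketched as below. Several details are assumptions: the helper names, stripping trailing periods and commas so that "Inc." is treated like "Inc", pooling the words of col4 and col5 (following the BIZ004 explanation), and returning False when no significant words remain in the name. The descriptor values in the usage lines for BIZ002 and BIZ004 are hypothetical, since only the explanations of those cases are available here.

```python
IGNORED_WORDS = {"llc", "inc"}


def significant_words(text):
    """Lowercased whitespace-separated words, minus LLC/Inc."""
    # Assumption: trailing '.' and ',' are stripped so "Inc." matches "Inc".
    cleaned = (word.lower().strip(".,") for word in text.split())
    return [word for word in cleaned if word and word not in IGNORED_WORDS]


def is_name_consistent(name, short_desc, full_desc, threshold=0.5):
    """True when at least half of the name's words occur in col4 or col5."""
    name_words = significant_words(name)
    if not name_words:
        return False  # assumption: nothing left to compare counts as a mismatch
    pool = set(significant_words(short_desc)) | set(significant_words(full_desc))
    matched = sum(word in pool for word in name_words)
    return matched / len(name_words) >= threshold


# Descriptor values for the last two calls are hypothetical stand-ins.
print(is_name_consistent("land water", "land", "land water LLC"))          # True
print(is_name_consistent("Acme Global Trading", "Acme", "AGT HOLDINGS"))   # False
print(is_name_consistent("Innovation Labs Inc", "Innovation", "LABS CO"))  # True
```

Counting over the list of name words (rather than a set) means a repeated word in col2 is counted each time it appears, which matches the "words from col2" phrasing most literally.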
Instead of outputting generic "NOT VERIFIED" messages, return specific error codes based on the validation failure. If multiple validations fail, return the error code for the first failure encountered.
Empty fields -- ERROR_MISSING_FIELDS
Invalid descriptor length -- ERROR_INVALID_LENGTH
Blocked generic term -- ERROR_GENERIC_NAME
Insufficient name consistency -- ERROR_NAME_MISMATCH
BIZ001: Passes all validations -- VERIFIED
BIZ002: Missing col4 and col5 (first failure) -- ERROR_MISSING_FIELDS
BIZ003: "SHRT" has length 4, below minimum 5 (first failure after field check) -- ERROR_INVALID_LENGTH
BIZ004: Contains blocked term "RETAIL" (first failure after length check) -- ERROR_GENERIC_NAME
BIZ005: No words match between "Mismatched Corp" and descriptors -- ERROR_NAME_MISMATCH
Check validations in the specified order
Return the error code for the first validation that fails
Only return VERIFIED if all validations pass
Maintain the same output format: STATUS followed by colon and business name
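Putting the parts together, one way to return the first failing check is a chain of early returns, sketched below. As before, the blocked-term list is an assumed partial list, punctuation stripping and the pooled col4/col5 comparison are assumptions, and the sample rows for BIZ003 to BIZ005 are hypothetical data built to match the explanations above.

```python
import csv
import io

BLOCKED_TERMS = ("ONLINE STORE", "RETAIL", "SHOP")  # assumed, partial list
IGNORED_WORDS = {"llc", "inc"}


def _words(text):
    cleaned = (word.lower().strip(".,") for word in text.split())
    return [word for word in cleaned if word and word not in IGNORED_WORDS]


def _name_matches(name, short_desc, full_desc):
    name_words = _words(name)
    if not name_words:
        return False
    pool = set(_words(short_desc)) | set(_words(full_desc))
    return sum(word in pool for word in name_words) / len(name_words) >= 0.5


def classify(fields):
    """Return the status for one row, checking the rules in Part 1-4 order."""
    name, short_desc, full_desc = fields[1], fields[3], fields[4]
    if not all(fields):
        return "ERROR_MISSING_FIELDS"
    if not 5 <= len(full_desc) <= 31:
        return "ERROR_INVALID_LENGTH"
    if any(term in full_desc.upper() for term in BLOCKED_TERMS):
        return "ERROR_GENERIC_NAME"
    if not _name_matches(name, short_desc, full_desc):
        return "ERROR_NAME_MISMATCH"
    return "VERIFIED"


def validate_businesses(csv_data):
    reader = csv.reader(io.StringIO(csv_data.strip()))
    next(reader, None)  # skip the header row
    results = []
    for row in reader:
        if not row:
            continue
        fields = ([field.strip() for field in row] + [""] * 6)[:6]
        results.append(f"{classify(fields)}: {fields[1]}")
    return results


# Rows BIZ003-BIZ005 are hypothetical examples.
csv_data = """col1,col2,col3,col4,col5,col6
BIZ001,Pawsome Pets Inc.,pawsomepets.com,Pawsome,PAWSOME PETS INC,Premium pet supplies
BIZ002,Bean Bliss Coffee,beanbliss.com,,,Artisan coffee roasters
BIZ003,Short Goods,shortgoods.com,Short,SHRT,Assorted goods
BIZ004,City Outlet,cityoutlet.com,City,CITY RETAIL,General goods
BIZ005,Mismatched Corp,mismatched.com,Alpha,ALPHA BETA GAMMA,Consulting"""

for line in validate_businesses(csv_data):
    print(line)
# VERIFIED: Pawsome Pets Inc.
# ERROR_MISSING_FIELDS: Bean Bliss Coffee
# ERROR_INVALID_LENGTH: Short Goods
# ERROR_GENERIC_NAME: City Outlet
# ERROR_NAME_MISMATCH: Mismatched Corp
```

Keeping the checks in a single function in priority order makes "first failure wins" explicit and leaves room to add further rules later.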