ML Coding - Implement TF-IDF from Scratch


[ INFO ] category: Coding · ML Coding difficulty: unknown freq: first seen: 2026-04-22


$ cat problem.md

Problem Statement: Implement TF-IDF from Scratch

Implement TF-IDF (Term Frequency-Inverse Document Frequency) from scratch in a programming language of your choice. TF-IDF is a numerical statistic intended to reflect how important a word is to a document in a collection or corpus.

Constraints:

The input is a list of documents, where each document is a list of words.
The output should be a matrix where each row corresponds to a document and each column corresponds to a word in the vocabulary.
The TF-IDF value for each word in a document should be calculated using the formula: TF-IDF = TF * IDF.
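
The constraint above gives only the product TF * IDF and does not say how each factor is defined. One common convention, assumed here rather than taken from the problem statement, uses the relative frequency of a term within a document and the natural log of the inverse document fraction:

```latex
\mathrm{tf}(t, d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}}, \qquad
\mathrm{idf}(t) = \ln\frac{N}{\mathrm{df}(t)}, \qquad
\text{tf-idf}(t, d) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t)
```

Here f_{t,d} is the number of times term t occurs in document d, N is the number of documents, and df(t) is the number of documents that contain t. Other variants exist (raw counts for TF, a different log base, or smoothed IDF), so it is worth confirming with the interviewer which one is expected.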

Examples:

Input:
documents = [
  ["apple", "banana", "apple"],
  ["banana", "orange", "apple", "banana", "banana"]
]

Output:
[
  [0.81, 0.40, 0.40],  # Document 1: TF-IDF for "apple", "banana", "orange"
  [0.20, 0.60, 0.20]   # Document 2: TF-IDF for "apple", "banana", "orange"
]
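
As a sanity check, the example input can be worked by hand. The calculation below assumes TF = count / document length and IDF = ln(N / df), a common convention that the problem statement does not actually fix; the columns are ordered "apple", "banana", "orange".

```latex
\begin{aligned}
N &= 2, \quad \mathrm{df}(\text{apple}) = 2, \quad \mathrm{df}(\text{banana}) = 2, \quad \mathrm{df}(\text{orange}) = 1 \\
\mathrm{idf}(\text{apple}) &= \ln\tfrac{2}{2} = 0, \quad \mathrm{idf}(\text{banana}) = \ln\tfrac{2}{2} = 0, \quad \mathrm{idf}(\text{orange}) = \ln\tfrac{2}{1} \approx 0.693 \\
\mathrm{tf}(\cdot, d_1) &= \left(\tfrac{2}{3}, \tfrac{1}{3}, 0\right), \quad \mathrm{tf}(\cdot, d_2) = \left(\tfrac{1}{5}, \tfrac{3}{5}, \tfrac{1}{5}\right) \\
\text{tf-idf}(\cdot, d_1) &= (0, 0, 0), \quad \text{tf-idf}(\cdot, d_2) \approx (0, 0, 0.139)
\end{aligned}
```

Under these assumptions the result differs from the sample output. The sample's second row (0.20, 0.60, 0.20) matches the raw term frequencies of Document 2, and "orange" never occurs in Document 1, so any count-based TF gives it a score of 0 there rather than 0.40. The sample values therefore appear to come from an unstated weighting and are probably best treated as illustrative of the output shape.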

Hints:

Calculate the term frequency (TF) for each word in a document.
Calculate the inverse document frequency (IDF) for each word across all documents.
Combine TF and IDF to get the TF-IDF score for each word in each document.
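
The three hints above can be sketched in Python as follows. This is a minimal sketch, not a reference solution: the function name `tf_idf` is hypothetical, TF is assumed to be count / document length, IDF is assumed to be the unsmoothed natural log ln(N / df), and the vocabulary is ordered by first appearance so that the columns match the example ("apple", "banana", "orange").

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return (vocabulary, matrix) for a list of tokenized documents.

    Assumptions (not specified by the problem statement):
      - TF  = count of word in document / document length
      - IDF = ln(number of documents / documents containing the word)
    """
    # Vocabulary in order of first appearance, so column order is deterministic.
    vocabulary = list(dict.fromkeys(word for doc in documents for word in doc))

    # Document frequency: each document counts a word at most once.
    doc_freq = Counter(word for doc in documents for word in set(doc))

    # Inverse document frequency for every vocabulary word.
    n_docs = len(documents)
    idf = {word: math.log(n_docs / doc_freq[word]) for word in vocabulary}

    matrix = []
    for doc in documents:
        counts = Counter(doc)  # missing words count as 0
        total = len(doc)
        # Combine TF and IDF; an empty document gets an all-zero row.
        row = [
            (counts[word] / total) * idf[word] if total else 0.0
            for word in vocabulary
        ]
        matrix.append(row)
    return vocabulary, matrix
```

Calling `tf_idf` on the example input returns the vocabulary `["apple", "banana", "orange"]` and one row per document. With the unsmoothed IDF assumed here, words that occur in every document ("apple" and "banana") score 0, so only "orange" in Document 2 is non-zero. If the interviewer expects such words to keep a positive weight, a smoothed IDF can be swapped in at the single line that builds `idf`.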

Search Results:

1. DarkInterview URL:

The problem statement is taken directly from the DarkInterview page: ML Coding - Implement TF-IDF from Scratch.

2. Reddit (r/cscareerquestions, r/leetcode, r/csMajors), 1point3acres, PracHub, Glassdoor, Blind, GitHub, and Interview Prep Sites:

Searches of these sites found no additional information about the problem and no variations of it.

Conclusion:

The problem statement can be used as-is for interview preparation. Its constraints, examples, and hints outline the implementation of TF-IDF from scratch, although it does not specify which TF and IDF variants to use.
