Software Engineer · Full Journey · Coding — Snowflake

Snowflake — Software Engineer ✅ Passed

Level: Unknown Level

Round: Full Journey · Type: Coding · Difficulty: 7/10 · Duration: 180 min · Interviewer: Neutral

Topics: Hashing, Two Pointers, Aho-Corasick Algorithm, String Matching

Location: San Francisco Bay Area

Interview date: 2026-01-13

Summary

Round 1: Basic Solution (Hash Approach)

Question: Implement recipe matching using a rolling hash strategy.

Round 2: Low Memory Constraint

Question: Implement recipe matching using O(1) extra space.

Round 3: Streaming Data

Question: Implement recipe matching for streaming data using the Aho-Corasick algorithm.

Details

The problem is to determine if a list of recipes (each a list of ingredients) exists as a contiguous sequence within a main list of ingredients.

Part 1: Basic Solution (Hash Approach)

The initial solution uses a rolling hash to preprocess the ingredients list. I convert each ingredient into a number and compute hash codes for all possible sub-sequences, storing them in a hash map. To check a recipe, I compute its hash code and check if it exists in the map. The code provided uses two different prime numbers and bases to minimize collisions:

` from collections import defaultdict

class RecipeMatcher: _MOD1 = 1_000_000_007 _MOD2 = 1_000_000_009 _BASE1 = 911_382_323 _BASE2 = 972_663_749

def __init__(self, ingredients: list[str]):
    self._token_id: dict[str, int] = {}
    self._next_id = 1
    self._hashes_by_len: dict[int, set[tuple[int, int]]] = defaultdict(set)

    for token in ingredients:
        self._get_token_id(token)

    n = len(ingredients)
    for i in range(n):
        h1 = 0
        h2 = 0
        for j in range(i, n):
            token_value = self._get_token_id(ingredients[j])
            h1 = (h1 * self._BASE1 + token_value) % self._MOD1
            h2 = (h2 * self._BASE2 + token_value) % self._MOD2
            self._hashes_by_len[j - i + 1].add((h1, h2))

def _get_token_id(self, token: str) -> int:
    if token not in self._token_id:
        self._token_id[token] = self._next_id
        self._next_id += 1
    return self._token_id[token]

def can_make(self, recipe: list[str]) -> bool:
    if not recipe:
        return True

    h1 = 0
    h2 = 0
    for token in recipe:
        token_value = self._token_id.get(token)
        if token_value is None:
            # If an ingredient is missing from main list, recipe fails.
            return False
        h1 = (h1 * self._BASE1 + token_value) % self._MOD1
        h2 = (h2 * self._BASE2 + token_value) % self._MOD2

    return (h1, h2) in self._hashes_by_len[len(recipe)]

def match_recipes(ingredients: list[str], recipes: list[list[str]]) -> list[bool]: matcher = RecipeMatcher(ingredients) return [matcher.can_make(recipe) for recipe in recipes] `

Part 2: Low Memory Constraint

To reduce memory usage to O(1), I used a two-pointer approach. For each recipe, I iterate through the ingredients list using a sliding window. If a match is found, I return True; otherwise, I continue sliding the window. The code is as follows:

` def contains_recipe_o1_space(ingredients: list[str], recipe: list[str]) -> bool: n = len(ingredients) m = len(recipe)

if m == 0:
    return True
if m > n:
    return False

start = 0
while start <= n - m:
    i = start
    j = 0
    while j < m and ingredients[i] == recipe[j]:
        i += 1
        j += 1
    if j == m:
        return True
    start += 1
return False

def match_recipes_o1_space( ingredients: list[str], recipes: list[list[str]] ) -> list[bool]: return [contains_recipe_o1_space(ingredients, recipe) for recipe in recipes] `

Part 3: Streaming Data

For streaming data, I implemented the Aho-Corasick algorithm. This involves building a Trie from the recipes and creating fail links to handle partial matches efficiently. As ingredients arrive, I traverse the Trie and mark recipes as found when reaching the end of a recipe node. The code to build the trie and process the stream is given below:

` from collections import deque

class StreamRecipeMatcher: def init(self, recipes: list[list[str]]): self._recipes = recipes self._children = [dict()] # node -> {token: next_node} self._fail = [0] self._out = [[]] # node -> recipe ids ending here self._build_trie() self._build_fail_links()

def _new_node(self) -> int:
    self._children.append({})
    self._fail.append(0)
    self._out.append([])
    return len(self._children) - 1

def _build_trie(self) -> None:
    for recipe_id, recipe in enumerate(self._recipes):
        node = 0
        for token in recipe:
            if token not in self._children[node]:
                self._children[node][token] = self._new_node()
            node = self._children[node][token]
        self._out[node].append(recipe_id)

def _build_fail_links(self) -> None:
    q = deque()
    for _, nxt in self._children[0].items():
        self._fail[nxt] = 0
        q.append(nxt)

    while q:
        node = q.popleft()
        for token, nxt in self._children[node].items():
            f = self._fail[node]
            while f and token not in self._children[f]:
                f = self._fail[f]
            self._fail[nxt] = self._children[f].get(token, 0)
            self._out[nxt].extend(self._out[self._fail[nxt]])
            q.append(nxt)

def match_stream(self, ingredient_stream) -> list[bool]:
    matched = [False] * len(self._recipes)
    remaining = len(self._recipes)
    state = 0

    # Empty recipes match immediately, even if the stream has no tokens.
    for recipe_id in self._out[0]:
        if not matched[recipe_id]:
            matched[recipe_id] = True
            remaining -= 1
    if remaining == 0:
        return matched

    for token in ingredient_stream:
        while state and token not in self._children[state]:
            state = self._fail[state]
        state = self._children[state].get(token, 0)

        for recipe_id in self._out[state]:
            if not matched[recipe_id]:
                matched[recipe_id] = True
                remaining -= 1

        if remaining == 0:
            break

    return matched