Problem Overview
The "Stack Traces Parser" problem from Anthropic interviews involves converting a sequence of stack samples—each with a timestamp and a stack of function names (outermost to innermost)—into a chronological list of start and end events for visualization in a trace UI. Samples are ordered by increasing timestamp, and unlimited execution can occur between them. The output events must ensure nested function end events precede enclosing ones (LIFO order), with unfinished frames from the last sample assumed ongoing. Tags like Stack, Tree, Parsing, and Distributed Systems highlight needs for stack/tree data structures, diffing consecutive samples, and scalability.[1]
Full Problem Statement
Given a list of Sample structs:
struct Sample { double ts; // timestamp std::vector<std::string> stack; // function names, index 0 = outermost (e.g., "main") };
Implement std::vector<Event> convertToTrace(const std::vector<Sample> &samples) where:
struct Event { std::string kind; // "start" or "end" double ts; // event timestamp std::string name; // function name };
- Interpolate events between samples by comparing stacks: start new inner functions, end popped ones.
- Assign timestamps from the sample where the change is first observed.
- For identical consecutive stacks, no events (steady state).
- Last sample's stack generates no ends (ongoing).
- Handle recursion by position/depth, not just name (e.g., distinguish multiple "a" calls).[1]
Core Algorithm
Process samples iteratively:
- Track current active stack as a vector or tree of functions with depths.
- For each new sample:
- Find longest common prefix length
l between current and new stack.
- End functions from current[l:] in reverse order (innermost first) at new
ts.
- Start new functions from new[l:] in order at new
ts.
- Update current stack to new stack.
This builds a properly nested event sequence.[1]
Input/Output Examples
No complete canonical examples in sources, but illustrative fragments:
`
Input samples:
- Sample{ts: 1.0, stack: ["main"]}
- Sample{ts: 2.5, stack: ["main", "func1"]}
- Sample{ts: 5.0, stack: ["main"]} // func1 ended
Expected output events:
start 1.0 main
start 2.5 func1
end 5.0 func1
// "main" has no end (last sample) [web:1]
For recursion:{ts: 0.0, stack: ["a", "b", "a", "c"]}` treats inner "a" separately by position. Identical stacks produce no events.[1]
Constraints
- Timestamps: floats, strictly increasing.
- Stack depths: unspecified, but recursion/depth differentiation required (use indices/trees).
- Functions: unique by name+position; no max names/lengths stated.
- Efficiency: O(total stack frames across samples); distributed follow-ups imply scalability for massive traces (e.g., MapReduce for mode-finding on large datasets).[1]
- Edge cases: empty samples, single sample (all starts, no ends), identical stacks, unbalanced changes.
Follow-ups
- Identify traces repeating >N times consecutively.
- Denoising: Skip calls shorter than N duration.
- Filter calls appearing <N consecutive times.
- Distributed: Parallelize over machines for mode stacks or large-scale processing.[1]