Software Engineer · Onsite · Coding — OpenAI

OpenAI — Software Engineer

Level: Unknown Level

Round: Onsite · Type: Coding · Difficulty: 6/10 · Duration: 60 min · Interviewer: Neutral

Topics: Sorting, Sharding

Location: San Francisco Bay Area

Interview date: 2025-12-26

Summary

I had to implement a shard management system that rebalances data across multiple shards, given a limit on the maximum number of shards covering a single key. The task involved resolving overlapping intervals, filling gaps, and adhering to specific rules for processing order, overlap resolution, and gap filling.

Details

The coding question I got was:

` You are implementing a shard management system for a distributed key-value store. The system distributes data across multiple shards given in the format "<id>:<start>:<end>". Each shard is responsible for a contiguous range of integer keys, represented by an inclusive interval [start, end].

To balance the load effectively, the system enforces a limit on the maximum number of shards that can cover any single key. Your task is to implement the rebalance(limit, shards) method, which resolves overlapping intervals that exceed the limit and fills any gaps between the overall covered ranges.

To minimize data movement across the distributed system, you must follow these deterministic rules during rebalancing:

Processing Order: Iterate through the existing shards sorted primarily by their start boundary (ascending) and secondarily by their end boundary (ascending). Resolve Overlaps (Shift): If a key is already covered by limit active shards, any subsequent shard attempting to cover this key must have its start boundary shifted to the right. It should start at the earliest possible key where the coverage is strictly less than limit. Delete Excess (Drop): If shifting a shard's start boundary to the right causes its start to be strictly greater than its original end, the shard is completely eclipsed and must be deleted from the system. Fill Gaps (Extend): The global coverage of keys must remain continuous from the absolute minimum start to the absolute maximum end of the initial shards. If a gap exists between the current contiguous coverage and the next shard's start, extend the end boundary of the most recently processed active shard to bridge the gap. Return the updated internal state of the shards based on these rules. You may return the result in any order.

Constraints:

0 ≤ limit ≤ 105 0 ≤ shards.length ≤ 104 -109 ≤ start, end ≤ 109 start ≤ end Identifiers are distinct and contain no colons. Example 1:

Input: limit = 2, shards = ["A:0:100", "B:40:110", "C:80:200", "D:210:300"] Output: ["A:0:100", "B:40:110", "C:101:209", "D:210:300"] Explanation: Keys [0, 100] are covered by A and B. Shard C overlaps keys [80, 100] which are already at capacity, so its start shifts to 101. A gap exists between C (ending at 200) and D (starting at 210), so C is extended to 209.

Example 2:

Input: limit = 1, shards = ["A:0:100", "B:80:180"] Output: ["A:0:100", "B:101:180"]

Example 3:

Input: limit = 2, shards = ["A:0:30", "B:0:31", "C:0:32", "D:0:100"] Output: ["A:0:30", "B:0:31", "C:31:32", "D:32:100"] `

My approach:

First, I parsed the input shards into objects containing id, start and end values.
Then, I sorted the shards based on the problem's requirements.
Next, I iterated through the sorted shards, simulating coverage count for each key.
If a shard's start overlaps with existing shards beyond the limit, I incremented its start until the overlap condition is resolved.
I checked if the incremented start value surpassed the shard's end value; if it did, the shard gets dropped.
Finally, I took care of the gaps. After each shard is processed, I extended it to cover till the start of the next shard if there's a gap.

Key insights:

Understanding and correctly implementing shard overlapping and gapping logic.
Handling large integer intervals efficiently to avoid arithmetic overflows.
Thoroughly testing the edge cases to make sure all rules are satisfied.

My Solution

` from typing import List, Optional import heapq

class Shard: def init(self, id: str, start: int, end: int): self.id = id self.start = start self.end = end

def __str__(self):
    return f"{self.id}:{self.start}:{self.end}"

class Solution: def rebalance(self, limit: int, input: List[str]) -> List[str]: if not input: return []

    shards = []
    for s in input:
        parts = s.split(":")
        shards.append(Shard(parts[0], int(parts[1]), int(parts[2])))

    maxEnd = float('-inf')
    for s in shards:
        if s.end > maxEnd:
            maxEnd = s.end

    # Sort by start asc, then end asc
    shards.sort(key=lambda x: (x.start, x.end))

    # Min-heap of active shard end boundaries
    endHeap = []
    lastStart = float('-inf')
    accepted = []

    for shard in shards:
        newStart = max(shard.start, lastStart)

        # Shift start until coverage at newStart is below limit
        while True:
            while endHeap and endHeap[0] < newStart:
                heapq.heappop(endHeap)

            if len(endHeap) < limit:
                break
            # Shift past the earliest ending active shard
            newStart = heapq.heappop(endHeap) + 1

        if newStart > shard.end:
            continue

        shard.start = newStart
        heapq.heappush(endHeap, shard.end)
        lastStart = newStart
        accepted.append(shard)

    # Fill gaps and extend last shard to maxEnd
    if accepted:
        coverEnd = accepted[0].end
        for i in range(1, len(accepted)):
            prev = accepted[i - 1]
            cur = accepted[i]

            if cur.start > coverEnd + 1:
                # Gap exists
                prev.end = cur.start - 1
                coverEnd = prev.end
            if cur.end > coverEnd:
                coverEnd = cur.end
        last = accepted[-1]
        if maxEnd > last.end:
            last.end = maxEnd

    result = []
    for s in accepted:
        result.append(str(s))
    return result