Airbnb's "Retryer Implementation" interview question focuses on designing a robust retry mechanism for handling transient failures in distributed systems. It emphasizes object-oriented principles, design patterns like Strategy and Decorator, and sophisticated error classification.
Design and implement a configurable Retryer class that executes a given task (e.g., an API call or database operation) with automatic retries on failure. The retryer must:
The implementation should be thread-safe and extensible for different execution contexts (HTTP, gRPC, database).[1][3]

    class Retryer {
        execute(Task task) -> Result
        withMaxRetries(int count)
        withBackoff(BackoffStrategy strategy)
        withErrorClassifier(ErrorClassifier classifier)
        withCircuitBreaker(CircuitBreaker cb)
        withTimeout(Duration timeout)
    }

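One way the interface above could be filled in is sketched below. It is a minimal illustration under stated assumptions, not the expected answer: `Callable<T>` stands in for the `Task`/`Result` pair, `delayFor` and `isRetryable` are hypothetical method names, and the circuit breaker and timeout options are left out. The fluent setters mutate plain fields, so this version assumes the retryer is fully configured before it is shared across threads.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

/** Strategy: how long to wait before retry number `attempt` (1-based). Hypothetical signature. */
interface BackoffStrategy {
    Duration delayFor(int attempt);
}

/** Strategy: whether a failure is transient and worth retrying. Hypothetical signature. */
interface ErrorClassifier {
    boolean isRetryable(Exception e);
}

/** Exponential backoff: base * factor^(attempt - 1), spread by +/- jitter. */
final class ExponentialBackoff implements BackoffStrategy {
    private final Duration base;
    private final double factor;
    private final double jitter; // fraction of the nominal delay, e.g. 0.1 for +/-10%

    ExponentialBackoff(Duration base, double factor, double jitter) {
        this.base = base;
        this.factor = factor;
        this.jitter = jitter;
    }

    @Override
    public Duration delayFor(int attempt) {
        double nominal = base.toMillis() * Math.pow(factor, attempt - 1);
        // Uniform multiplier in [1 - jitter, 1 + jitter)
        double spread = 1.0 + jitter * (2.0 * ThreadLocalRandom.current().nextDouble() - 1.0);
        return Duration.ofMillis(Math.round(nominal * spread));
    }
}

final class Retryer {
    private int maxRetries = 3;
    private BackoffStrategy backoff = attempt -> Duration.ZERO; // default: no waiting
    private ErrorClassifier classifier = e -> true;             // default: retry everything

    Retryer withMaxRetries(int count) { this.maxRetries = count; return this; }
    Retryer withBackoff(BackoffStrategy strategy) { this.backoff = strategy; return this; }
    Retryer withErrorClassifier(ErrorClassifier c) { this.classifier = c; return this; }

    /** Runs the task; retries retryable failures up to maxRetries times, then rethrows. */
    <T> T execute(Callable<T> task) throws Exception {
        int retries = 0;
        while (true) {
            try {
                return task.call();
            } catch (Exception e) {
                if (!classifier.isRetryable(e) || retries >= maxRetries) {
                    throw e; // non-retryable, or retry budget exhausted
                }
                retries++;
                Thread.sleep(backoff.delayFor(retries).toMillis());
            }
        }
    }
}
```

With `base=100ms` and `factor=2.0`, the nominal delays before jitter are 100 ms for the first retry and 200 ms for the second. Keeping `BackoffStrategy` and `ErrorClassifier` as single-method interfaces lets callers pass lambdas in tests.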
```java
Retryer retryer = new Retryer()
    .withMaxRetries(3)
    .withBackoff(new ExponentialBackoff(Duration.ofMillis(100), 2.0, 0.1)) // base=100ms, factor=2x, jitter=10%
    .withErrorClassifier(new HttpStatusClassifier());

Result result = retryer.execute(() -> externalApi.call(userId));
```

Expected behavior:
retryCount=2, totalDelay=~350ms

```java
Retryer retryer = new Retryer()
    .withCircuitBreaker(new CircuitBreaker(5, Duration.ofSeconds(30))); // 5 failures → OPEN for 30s

// First 5 calls fail rapidly → circuit OPEN
Result r1 = retryer.execute(slowService::call); // immediate failure: "Circuit open"

// After 30s half-open, then success → circuit CLOSED
Result r2 = retryer.execute(slowService::call); // succeeds
```

**Metrics emitted**:
retryer.circuit.state: OPEN → HALF_OPEN → CLOSED
retryer.failure_rate: 100% → 20%
retryer.total_retries: 12
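
The OPEN → HALF_OPEN → CLOSED transitions in the example above can be sketched as a small state machine. This is an assumption-laden illustration: the method names `allowRequest`, `recordSuccess`, and `recordFailure` are hypothetical, the breaker counts consecutive failures rather than a failure rate, and the injectable clock exists only so the cool-down can be exercised without waiting 30 seconds.

```java
import java.time.Duration;
import java.util.function.LongSupplier;

final class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openMillis;
    private final LongSupplier clock; // millisecond clock, injectable for tests
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0L;

    CircuitBreaker(int failureThreshold, Duration openFor) {
        this(failureThreshold, openFor, System::currentTimeMillis);
    }

    CircuitBreaker(int failureThreshold, Duration openFor, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openFor.toMillis();
        this.clock = clock;
    }

    /** Returns true if a call may proceed; moves OPEN to HALF_OPEN once the cool-down has elapsed. */
    synchronized boolean allowRequest() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= openMillis) {
            state = State.HALF_OPEN;
        }
        return state != State.OPEN;
    }

    /** Any success closes the circuit and resets the failure count. */
    synchronized void recordSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    /** Opens the circuit at the threshold, or immediately if a HALF_OPEN probe fails. */
    synchronized void recordFailure() {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAt = clock.getAsLong();
        }
    }

    synchronized State state() {
        return state;
    }
}
```

A retryer using this sketch would call `allowRequest()` before each attempt and fail fast with a "Circuit open" error when it returns false. Note that this simplified version lets every caller through while HALF_OPEN; a stricter design would admit a single probe request.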
```java
retryer.execute(() -> {
    throw new ValidationException("Invalid userId format"); // 400-equivalent
});
// Immediately fails without retries, logs "Non-retryable error"
```

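
The distinction between retryable and non-retryable failures could be expressed with a classifier along these lines. Everything here is a sketch under assumptions: `HttpStatusException` is a hypothetical exception type carrying a status code, `ValidationException` is a stand-in for the one thrown above, and the choice of 408, 429, and 5xx as retryable is a common convention rather than something the question specifies. The `ErrorClassifier` interface is declared inline so the block compiles on its own.

```java
import java.io.IOException;

/** Hypothetical single-method strategy interface. */
interface ErrorClassifier {
    boolean isRetryable(Exception e);
}

/** Hypothetical exception carrying an HTTP status code. */
class HttpStatusException extends RuntimeException {
    final int status;

    HttpStatusException(int status) {
        super("HTTP " + status);
        this.status = status;
    }
}

/** Hypothetical stand-in for a client-side validation failure (400-equivalent). */
class ValidationException extends RuntimeException {
    ValidationException(String message) {
        super(message);
    }
}

final class HttpStatusClassifier implements ErrorClassifier {
    @Override
    public boolean isRetryable(Exception e) {
        if (e instanceof HttpStatusException) {
            int status = ((HttpStatusException) e).status;
            // Assumed policy: request timeout, rate limiting, and server errors are transient.
            return status == 408 || status == 429 || status >= 500;
        }
        // Assumed policy: I/O failures are transient; everything else
        // (including ValidationException) fails immediately.
        return e instanceof IOException;
    }
}
```

Defaulting unknown exceptions to non-retryable is the conservative choice for operations such as payments, where a blind retry could repeat a side effect.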
retryer_attempts_total{outcome="success|failure|timeout"}
retryer_duration_seconds{quantile="0.5|0.9|0.99"}
retryer_circuit_state
BackoffStrategy, ErrorClassifier
Task with retry logic

Real Airbnb constraints from their payments system:
This matches Airbnb's real-world needs for payments, booking, and search services where transient failures are common but double-charging or infinite retries are catastrophic.[3][4]