Design a system that enables organizations to conduct surveys by initiating outbound phone calls to participants, delivering a brief introduction, and then transitioning the interaction to SMS so respondents can answer questions at their own pace via text message. Surveys support branching logic (subsequent questions depend on previous answers), may span hours or days with pauses and resumptions, and must comply with telecommunications regulations including consent tracking and opt-out handling.
The core engineering challenges are managing long-running, stateful conversations over unreliable channels (telephony and SMS), integrating with third-party providers subject to rate limits and delivery failures, orchestrating branching survey logic at scale, and providing real-time campaign analytics. You will need to reason about state-machine-based workflow orchestration, idempotent message processing, carrier rate limiting, and decoupled event-driven architectures.
Based on real interview experiences at Aurelian, these are the areas interviewers probe most deeply:
Each respondent interaction is a long-running, multi-step workflow that must survive server restarts, handle replies arriving hours or days later, and correctly interpret answers in the context of the current question. Interviewers want to see how you model, persist, and protect conversation state.
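One way to make the conversation state explicit is a small state machine with a transition table. The sketch below is illustrative only: the state names, the `ALLOWED` table, and the `Session` fields are assumptions chosen for this example, not a prescribed schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class SessionState(Enum):
    AWAITING_ANSWER = "awaiting_answer"
    PAUSED = "paused"
    COMPLETED = "completed"
    OPTED_OUT = "opted_out"


# Hypothetical transition table: the states each state may move to.
ALLOWED = {
    SessionState.AWAITING_ANSWER: {
        SessionState.AWAITING_ANSWER,  # next question in the survey
        SessionState.PAUSED,
        SessionState.COMPLETED,
        SessionState.OPTED_OUT,
    },
    SessionState.PAUSED: {SessionState.AWAITING_ANSWER, SessionState.OPTED_OUT},
    SessionState.COMPLETED: set(),   # terminal
    SessionState.OPTED_OUT: set(),   # terminal
}


@dataclass
class Session:
    respondent_id: str
    campaign_id: str
    survey_version: int
    current_question: str
    state: SessionState = SessionState.AWAITING_ANSWER
    answers: dict = field(default_factory=dict)

    def transition(self, new_state: SessionState) -> None:
        # Reject transitions the table does not list, e.g. out of a terminal state.
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"illegal transition {self.state.value} -> {new_state.value}"
            )
        self.state = new_state
```

Because the whole session is a plain record, it can be serialized to a durable store after every transition, which is what lets a conversation survive restarts and resume days later.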
Hints to consider:
Telephony and SMS providers deliver webhooks that may arrive late, duplicated, or not at all. Carrier throughput limits fluctuate, and over-sending degrades sender reputation or triggers blocks.
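Duplicate webhook deliveries can be absorbed with an idempotency check keyed on the provider's message identifier. The following is a minimal sketch: `InboundDeduplicator` is an in-memory stand-in for a shared store such as a Redis set with a TTL, and the `message_id` payload key is a hypothetical name. Late or missing webhooks need separate handling (for example, periodic reconciliation against the provider) and are not covered here.

```python
class InboundDeduplicator:
    """In-memory stand-in for a shared idempotency store (assumption)."""

    def __init__(self):
        self._seen = set()

    def first_delivery(self, provider_message_id: str) -> bool:
        # Returns True only the first time a given ID is observed.
        if provider_message_id in self._seen:
            return False
        self._seen.add(provider_message_id)
        return True


def handle_inbound_webhook(dedup, payload, process):
    # Duplicates are acknowledged without reprocessing so the provider
    # stops retrying and the answer is not recorded twice.
    if not dedup.first_delivery(payload["message_id"]):
        return "duplicate"
    # Note: this sketch marks the ID as seen before processing; a production
    # design would need to decide what happens if `process` fails afterwards.
    process(payload)
    return "processed"
```

Usage: calling `handle_inbound_webhook` twice with the same payload invokes `process` once and returns `"duplicate"` the second time.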
Hints to consider:
Surveys have conditional paths where question three may depend on the answer to question two. Interviewers assess how you evaluate rules efficiently, keep survey definitions decoupled from execution code, and enable non-technical users to author complex flows.
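Keeping survey definitions decoupled from execution code usually means storing the branching rules as data and interpreting them with a small generic evaluator. The sketch below assumes a hypothetical rule format (`op`, `value`, `next`, `default_next`); a real authoring tool would likely produce something richer.

```python
import operator

# Operators a survey author may reference by name (illustrative subset).
OPS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "lt": operator.lt,
    "gte": operator.ge,
    "in": lambda answer, options: answer in options,
}

# Hypothetical survey definition, as it might be stored as JSON.
SURVEY = {
    "q1": {
        "text": "Rate us from 1 to 5",
        "rules": [{"op": "lt", "value": 3, "next": "q2_low"}],
        "default_next": "q2_high",
    },
    "q2_low": {"text": "What went wrong?", "rules": [], "default_next": None},
    "q2_high": {"text": "What did you like?", "rules": [], "default_next": None},
}


def next_question(survey, question_id, answer):
    """Return the ID of the next question, or None when the survey ends."""
    node = survey[question_id]
    # First matching rule wins; fall back to the default path otherwise.
    for rule in node["rules"]:
        if OPS[rule["op"]](answer, rule["value"]):
            return rule["next"]
    return node["default_next"]
```

With this definition, an answer of 2 to `q1` routes to `q2_low` and an answer of 4 routes to `q2_high`. Changing the flow means editing data, not deploying code.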
Hints to consider:
Campaign managers need near-real-time visibility into response rates, bottleneck questions, and completion funnels without slowing down the operational message-processing path.
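A completion funnel can be derived entirely from the event stream, leaving the operational path untouched. The sketch below assumes hypothetical event shapes (`type`, `question_id`, `respondent_id`) and aggregates in memory; in practice the same logic would run in a stream consumer writing to an analytics store.

```python
from collections import defaultdict


def funnel(events):
    """Per question: distinct respondents asked, answered, and response rate."""
    asked = defaultdict(set)
    answered = defaultdict(set)
    for event in events:
        if event["type"] == "question_sent":
            asked[event["question_id"]].add(event["respondent_id"])
        elif event["type"] == "answer_received":
            answered[event["question_id"]].add(event["respondent_id"])

    report = {}
    for question_id, respondents in asked.items():
        # Sets make the counts tolerant of duplicate (at-least-once) events.
        n_answered = len(answered.get(question_id, set()) & respondents)
        report[question_id] = {
            "asked": len(respondents),
            "answered": n_answered,
            "rate": n_answered / len(respondents),
        }
    return report
```

A question whose rate drops sharply relative to the one before it is a candidate bottleneck.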
Hints to consider:
Telecommunications regulations require explicit consent before sending marketing SMS, immediate processing of opt-out requests, and retention of proof-of-consent records.
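The consent and opt-out requirements can be sketched as a gate that every outbound send passes through. Everything here is an assumption for illustration: the `ConsentRegistry` class, the in-memory storage, and the keyword list, which in practice depends on the provider, carrier, and jurisdiction. Re-consent after an opt-out is a policy decision and is deliberately left out.

```python
# Illustrative opt-out keywords; the real set is provider- and region-specific.
STOP_KEYWORDS = {"STOP", "UNSUBSCRIBE", "CANCEL", "END", "QUIT"}


class ConsentRegistry:
    """In-memory stand-in for durable consent and opt-out tables."""

    def __init__(self):
        self._proof = {}        # phone -> reference to proof of consent
        self._opted_out = set()

    def record_consent(self, phone, proof_ref):
        self._proof[phone] = proof_ref

    def proof_of_consent(self, phone):
        return self._proof.get(phone)

    def handle_inbound(self, phone, body):
        """Return True if the message was an opt-out and has been applied."""
        if body.strip().upper() in STOP_KEYWORDS:
            # The opt-out takes effect immediately; the proof record is kept.
            self._opted_out.add(phone)
            return True
        return False

    def may_send(self, phone):
        return phone in self._proof and phone not in self._opted_out
```

Checking opt-out keywords before handing the message to the survey logic ensures a "STOP" reply is never misread as an answer to the current question.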
Hints to consider:
Confirm scale parameters: concurrent campaigns, respondents per campaign, and peak inbound SMS rate. Ask about survey complexity: maximum questions, branching depth, and multimedia support. Clarify timing: can surveys span multiple days, and are there timezone restrictions for sending? Determine provider strategy: single provider or multi-provider redundancy. Verify compliance requirements and consent retention period.
Sketch the main components:
Control plane: REST API for campaign creation, survey authoring, and result retrieval.
Campaign orchestrator: manages outbound call scheduling, drives conversation state machines, and consumes work from internal queues.
Telephony adapter: wraps provider APIs (Twilio, Plivo) for placing calls and receiving status webhooks.
SMS gateway: sends outbound texts with rate limiting and receives inbound webhooks, publishing to Kafka.
Conversation engine: stateful service that loads a respondent session, evaluates branching logic, determines the next question, and updates state atomically.
Event stream (Kafka): central bus for inbound messages, outbound sends, state transitions, and analytics events.
Session store (Redis plus PostgreSQL): Redis caches active sessions; PostgreSQL persists survey definitions, consent records, and responses.
Analytics pipeline: consumes events, aggregates metrics, and writes to an OLAP store for dashboards and exports.
Walk through a single respondent's session. When consent is obtained on the call, a session record is created in PostgreSQL with the respondent ID, campaign ID, survey version, and initial state (question one). The session summary is written to Redis with a seven-day TTL. An inbound SMS webhook arrives and is published to the inbound-sms Kafka topic. A consumer fetches the session from Redis (or PostgreSQL on a cache miss) and checks the message against the idempotency set to confirm it is not a duplicate. It then evaluates the answer against the current question's validation rules. If the answer is valid, it is appended to the history, the branching logic selects the next question, and the session version is incremented. The update is guarded by optimistic versioning rather than a lock: the transaction commits only if the stored version still matches the version that was read. An outbound message event is then published to the outbound-sms Kafka topic. A separate consumer reads it, checks the token bucket for the campaign's phone number pool, and calls the SMS provider API.
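The optimistic versioning step in the walkthrough above can be sketched as a compare-and-set on the session version. `SessionStore` here is an in-memory stand-in for the PostgreSQL session table, and all names are hypothetical; in SQL the same idea is typically an `UPDATE ... WHERE id = ? AND version = ?` whose affected-row count is checked.

```python
class VersionConflict(Exception):
    """Raised when the session changed between read and write."""


class SessionStore:
    """In-memory stand-in for the durable session table (assumption)."""

    def __init__(self):
        self._rows = {}

    def create(self, session_id, question_id):
        self._rows[session_id] = {
            "version": 1, "question_id": question_id, "answers": [],
        }

    def load(self, session_id):
        row = self._rows[session_id]
        # Return a copy so callers cannot mutate stored state directly.
        return {
            "version": row["version"],
            "question_id": row["question_id"],
            "answers": list(row["answers"]),
        }

    def save_if_unchanged(self, session_id, expected_version, question_id, answers):
        row = self._rows[session_id]
        # The write succeeds only if nobody else has bumped the version.
        if row["version"] != expected_version:
            raise VersionConflict(session_id)
        self._rows[session_id] = {
            "version": expected_version + 1,
            "question_id": question_id,
            "answers": list(answers),
        }


def apply_answer(store, session_id, answer, next_question_id):
    snapshot = store.load(session_id)
    store.save_if_unchanged(
        session_id,
        snapshot["version"],
        next_question_id,
        snapshot["answers"] + [answer],
    )
```

On a `VersionConflict`, a consumer would typically reload the session and re-evaluate the message, since the current question may have changed in the meantime.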
Cover scalability: partition Kafka topics by respondent ID so each respondent's messages are processed in order; shard Redis by respondent hash; scale conversation engine consumers horizontally. Discuss reliability: dead-letter queues for messages that fail after retries; circuit breakers on provider API calls. Address rate limiting: token buckets per campaign, per phone number, and globally. Mention compliance: a global opt-out Bloom filter checked before every send, with hits confirmed against the authoritative opt-out records because a Bloom filter can return false positives; consent records stored with call recording references; GDPR export and delete APIs. Cover monitoring: queue depth, processing latency, provider error rates, and campaign completion metrics with alerting on anomalies.
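The layered rate limiting mentioned above can be sketched with token buckets that must all have capacity before a send proceeds. This is a single-process, non-thread-safe illustration with hypothetical names and rates; a distributed deployment would need the bucket state in a shared store with atomic updates.

```python
import time


class TokenBucket:
    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock          # injectable for testing
        self.tokens = capacity
        self.updated = clock()

    def _refill(self):
        now = self.clock()
        elapsed = now - self.updated
        # Add tokens for elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now

    def available(self):
        self._refill()
        return self.tokens >= 1

    def take(self):
        self.tokens -= 1


def try_send(buckets):
    """Consume one token from every bucket, or none if any bucket is empty."""
    if all(bucket.available() for bucket in buckets):
        for bucket in buckets:
            bucket.take()
        return True
    return False
```

Usage: passing `[number_bucket, campaign_bucket, global_bucket]` to `try_send` means a message goes out only when every level has headroom, and a rejected send does not drain tokens from the levels that did have capacity.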