Design and implement a service that wraps an EmbeddingModel class to handle batch processing of text embeddings. The server should efficiently process batches of data and support graceful shutdown.
This problem tests your understanding of API design, batch processing, resource management, and error handling in production systems.
- Batch Processing: Process multiple text inputs simultaneously
- Graceful Shutdown: Handle a special shutdown command to stop the server
- Result Mapping: Return embeddings with their corresponding IDs
- Scalability: Handle batches with up to 100 elements
- Error Handling: Gracefully handle failures and edge cases
- Start with a simple single-threaded implementation, then extend to multi-threaded if asked
- Focus on clean API design and proper resource management
- Consider production concerns like monitoring, error handling, and graceful shutdown
- Write your own test cases and ensure your code compiles and runs correctly
Implement a basic server that processes requests sequentially.
Input (Batch Request):

```python
batch_data = [
    {'id': '1', 'data': 'example text 1'},
    {'id': '2', 'data': 'example text 2'},
]
```

Output (Batch Response):

```python
[
    {'id': '1', 'embedding': [0.1, 0.2, ...]},
    {'id': '2', 'embedding': [0.3, 0.4, ...]},
]
```
Shutdown Request:

```python
request = 'shutdown'
# Response: 'Server shutdown complete.'
```
```python
server = ModelServer()
result = server.handle_request([
    {'id': '1', 'data': 'machine learning'},
    {'id': '2', 'data': 'neural networks'},
])
result = server.handle_request('shutdown')
```
- Parse and validate input requests
- Handle both batch processing and shutdown commands
- Maintain ID mapping between input and output
- Process items sequentially (no parallelization yet)
- Return results in a consistent format
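The requirements above can be sketched as a minimal sequential server. The `EmbeddingModel` stub and its `embed()` signature are assumptions, since the real class is not specified in the prompt; swap in the actual model when implementing.

```python
class EmbeddingModel:
    """Hypothetical stand-in for the real embedding model."""
    def embed(self, text: str) -> list:
        # Placeholder: a real model would return a learned vector.
        return [float(len(text)), 0.0]


class ModelServer:
    MAX_BATCH_SIZE = 100

    def __init__(self):
        self.model = EmbeddingModel()
        self.running = True

    def handle_request(self, request):
        # Shutdown is signaled by a special string request.
        if request == 'shutdown':
            self.running = False
            return 'Server shutdown complete.'
        if not self.running:
            raise RuntimeError('Server is shut down.')
        if not isinstance(request, list) or len(request) > self.MAX_BATCH_SIZE:
            raise ValueError('Expected a list of at most 100 items.')
        results = []
        for item in request:  # sequential, no parallelism yet
            try:
                results.append({'id': item['id'],
                                'embedding': self.model.embed(item['data'])})
            except Exception as exc:
                # Record per-item failures instead of failing the whole batch.
                results.append({'id': item.get('id'), 'error': str(exc)})
        return results
```

Keeping the error entry in the result list (rather than raising) preserves the one-to-one ID mapping between input and output.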
- Should we validate the batch size? What's the maximum?
- How should we handle individual item failures within a batch?
- What embedding dimension should we use (384, 768, 1536)?
- Should results maintain the same order as input?
- How should we handle duplicate IDs in a batch?
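One reasonable set of answers to the questions above (enforce the 100-item cap, reject duplicate IDs up front) can be expressed as a standalone validation helper; the function name and error messages are illustrative choices, not part of the prompt.

```python
MAX_BATCH_SIZE = 100

def validate_batch(batch):
    """Reject malformed, oversized, or duplicate-ID batches before any model work."""
    if not isinstance(batch, list):
        raise ValueError('Batch must be a list of items.')
    if len(batch) > MAX_BATCH_SIZE:
        raise ValueError(f'Batch size {len(batch)} exceeds limit of {MAX_BATCH_SIZE}.')
    seen = set()
    for item in batch:
        item_id = item.get('id')
        if item_id is None or 'data' not in item:
            raise ValueError('Each item needs an id and a data field.')
        if item_id in seen:
            raise ValueError(f'Duplicate id in batch: {item_id}')
        seen.add(item_id)
```

Validating before processing means a bad batch fails fast, with no embedding work wasted.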
Extend the solution to process batch items in parallel using a thread pool for better throughput.
Performance Comparison

| Implementation | Time for 100 items (10ms each) | Throughput |
| --- | --- | --- |
| Single-threaded | ~1000ms (sequential) | 100 items/s |
| Multi-threaded (10 workers) | ~100ms (parallel) | 1000 items/s |
```python
from concurrent.futures import ThreadPoolExecutor
import time

server = ThreadSafeModelServer(max_workers=10)
batch = [{'id': str(i), 'data': f'text {i}'} for i in range(100)]

start_time = time.time()
results = server.handle_request(batch)
elapsed = time.time() - start_time

print(f"Processed {len(results)} items in {elapsed:.2f}s")
```
- Use ThreadPoolExecutor for managing worker threads
- Submit all items to the thread pool concurrently
- Collect results as they complete using as_completed
- Handle individual item failures without crashing the entire batch
- Implement thread-safe shutdown with proper cleanup
- Validate the batch size limit (max 100 items)
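The requirements above can be sketched with `concurrent.futures`. As before, the `EmbeddingModel` stub is a hypothetical stand-in (assumed thread-safe here); the real model's interface may differ.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


class EmbeddingModel:
    """Hypothetical stand-in; assumed thread-safe for this sketch."""
    def embed(self, text: str) -> list:
        return [float(len(text)), 0.0]


class ThreadSafeModelServer:
    MAX_BATCH_SIZE = 100

    def __init__(self, max_workers=10):
        self.model = EmbeddingModel()
        self.pool = ThreadPoolExecutor(max_workers=max_workers)

    def _embed_item(self, item):
        return {'id': item['id'], 'embedding': self.model.embed(item['data'])}

    def handle_request(self, request):
        if request == 'shutdown':
            # wait=True drains in-flight work before the pool is torn down.
            self.pool.shutdown(wait=True)
            return 'Server shutdown complete.'
        if not isinstance(request, list) or len(request) > self.MAX_BATCH_SIZE:
            raise ValueError('Expected a list of at most 100 items.')
        # Submit every item concurrently, then collect as results complete.
        futures = {self.pool.submit(self._embed_item, item): item
                   for item in request}
        results = []
        for future in as_completed(futures):
            try:
                results.append(future.result())
            except Exception as exc:
                # One bad item must not sink the whole batch.
                results.append({'id': futures[future].get('id'),
                                'error': str(exc)})
        return results
```

Note that `as_completed` yields results in completion order, not submission order; if the caller requires input order, sort the results by ID or iterate the futures in submission order instead.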