Embedding Model Server
Problem Overview
Design and implement a service that wraps an EmbeddingModel class to handle batch processing of text embeddings. The server should efficiently process batches of data and support graceful shutdown.
This problem tests your understanding of API design, batch processing, resource management, and error handling in production systems.
Key Requirements
- Batch Processing: Process multiple text inputs simultaneously
- Graceful Shutdown: Handle a special shutdown command to stop the server
- Result Mapping: Return embeddings with their corresponding IDs
- Scalability: Handle batches with up to 100 elements
- Error Handling: Gracefully handle failures and edge cases
Important Notes
- Start with a simple single-threaded implementation, then extend to multi-threaded if asked
- Focus on clean API design and proper resource management
- Consider production concerns like monitoring, error handling, and graceful shutdown
- Write your own test cases and ensure your code compiles and runs correctly
Part 1: Single-Threaded Implementation
Implement a basic server that processes requests sequentially.
Input/Output Format
Input (Batch Request):
```python
batch_data = [
    {'id': '1', 'data': 'example text 1'},
    {'id': '2', 'data': 'example text 2'},
]
```
Output (Batch Response):
```python
[
    {'id': '1', 'embedding': [0.1, 0.2, ...]},
    {'id': '2', 'embedding': [0.3, 0.4, ...]},
]
```
Shutdown:
```python
# Input
request = 'shutdown'
# Output
'Server shutdown complete.'
```
Example
```python
server = ModelServer()

# Process batch
result = server.handle_request([
    {'id': '1', 'data': 'machine learning'},
    {'id': '2', 'data': 'neural networks'}
])
# Returns: [{'id': '1', 'embedding': [...]}, {'id': '2', 'embedding': [...]}]

# Shutdown
result = server.handle_request('shutdown')
# Returns: 'Server shutdown complete.'
```
Requirements
- Parse and validate input requests
- Handle both batch processing and shutdown commands
- Maintain ID mapping between input and output
- Process items sequentially (no parallelization yet)
- Return results in a consistent format
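A minimal single-threaded sketch meeting these requirements could look like the following. The `EmbeddingModel` class is provided in the actual problem; here it is stubbed with a dummy fixed-dimension vector so the example runs on its own, and the `MAX_BATCH_SIZE` constant is an assumption to confirm with the interviewer.

```python
class EmbeddingModel:
    """Stand-in for the provided model; returns a dummy 4-dim vector."""
    def embed(self, text):
        return [float(len(text)), 0.0, 0.0, 0.0]


class ModelServer:
    MAX_BATCH_SIZE = 100  # assumed limit; confirm with the interviewer

    def __init__(self):
        self.model = EmbeddingModel()
        self.running = True

    def handle_request(self, request):
        # Shutdown is signaled by a special string request.
        if request == 'shutdown':
            self.running = False
            return 'Server shutdown complete.'
        if not self.running:
            raise RuntimeError('Server is shut down')
        if not isinstance(request, list):
            raise ValueError("Request must be a list of items or 'shutdown'")
        if len(request) > self.MAX_BATCH_SIZE:
            raise ValueError(f'Batch exceeds {self.MAX_BATCH_SIZE} items')
        results = []
        for item in request:
            # Sequential processing; preserves input order and the ID mapping.
            results.append({'id': item['id'],
                            'embedding': self.model.embed(item['data'])})
        return results
```

The shutdown check runs before validation so the sentinel string is never treated as a malformed batch.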
Clarification Questions to Ask
- Should we validate the batch size? What's the maximum?
- How should we handle individual item failures within a batch?
- What embedding dimension should we use (384, 768, 1536)?
- Should results maintain the same order as input?
- How should we handle duplicate IDs in a batch?
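One reasonable set of answers, stated as assumptions rather than requirements: cap batches at 100 items, reject duplicate IDs, and preserve input order. A hypothetical validation helper encoding the first two choices:

```python
def validate_batch(batch, max_size=100):
    """Reject oversized batches and duplicate IDs before any processing."""
    if len(batch) > max_size:
        raise ValueError(f'Batch size {len(batch)} exceeds limit of {max_size}')
    ids = [item['id'] for item in batch]
    if len(ids) != len(set(ids)):
        raise ValueError('Duplicate IDs in batch')
```

Validating up front keeps the failure atomic: a bad batch is rejected whole instead of partially processed.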
Part 2: Multi-Threaded Implementation (Follow-up)
Extend the solution to process batch items in parallel using a thread pool for better throughput.
Why Threading?
Performance Comparison:

| Implementation | Time for 100 items (10ms each) | Throughput |
|---|---|---|
| Single-threaded | ~1000ms (sequential) | 100 items/s |
| Multi-threaded (10 workers) | ~100ms (parallel) | 1000 items/s |
10x speedup with proper parallelization!
Example
```python
import time
from concurrent.futures import ThreadPoolExecutor

server = ThreadSafeModelServer(max_workers=10)

# Process a large batch in parallel
batch = [{'id': str(i), 'data': f'text {i}'} for i in range(100)]
start_time = time.time()
results = server.handle_request(batch)
elapsed = time.time() - start_time
print(f"Processed {len(results)} items in {elapsed:.2f}s")
# Output: Processed 100 items in ~0.10s (with 10 workers)
```
Requirements
- Use ThreadPoolExecutor for managing worker threads
- Submit all items to the thread pool concurrently
- Collect results as they complete using as_completed
- Handle individual item failures without crashing the entire batch
- Implement thread-safe shutdown with proper cleanup
- Validate the batch size limit (max 100 items)
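A sketch covering these requirements, again with `EmbeddingModel` stubbed so the example is self-contained. Note that with `as_completed` the results arrive in completion order, not input order; if the interviewer wants input order preserved, sort by ID or iterate the futures in submission order instead.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


class EmbeddingModel:
    """Stand-in for the provided model."""
    def embed(self, text):
        return [float(len(text))]


class ThreadSafeModelServer:
    MAX_BATCH_SIZE = 100  # assumed limit

    def __init__(self, max_workers=10):
        self.model = EmbeddingModel()
        self.executor = ThreadPoolExecutor(max_workers=max_workers)
        self.lock = threading.Lock()
        self.running = True

    def _embed_item(self, item):
        return {'id': item['id'], 'embedding': self.model.embed(item['data'])}

    def handle_request(self, request):
        if request == 'shutdown':
            with self.lock:
                self.running = False
            # Wait for in-flight work to finish before reporting success.
            self.executor.shutdown(wait=True)
            return 'Server shutdown complete.'
        with self.lock:
            if not self.running:
                raise RuntimeError('Server is shut down')
        if len(request) > self.MAX_BATCH_SIZE:
            raise ValueError(f'Batch exceeds {self.MAX_BATCH_SIZE} items')
        # Submit every item concurrently; map each future back to its ID.
        futures = {self.executor.submit(self._embed_item, item): item['id']
                   for item in request}
        results = []
        for future in as_completed(futures):
            try:
                results.append(future.result())
            except Exception as exc:
                # A failing item is reported per-ID, not fatal to the batch.
                results.append({'id': futures[future], 'error': str(exc)})
        return results
```

`ThreadPoolExecutor.shutdown(wait=True)` blocks until queued work drains, which gives the graceful-shutdown behavior; the lock only guards the `running` flag, since the executor itself is thread-safe.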