Design a distributed, horizontally scalable in-memory cache that supports billions of keys, tens of millions of queries per second, and sub-millisecond latency.

Requirements:

- API: provide get(key), set(key, value, ttl), and delete(key), and guarantee that once a key is written, every subsequent read returns the latest value (read-after-write consistency).
- Sharding: shard automatically with consistent hashing so nodes can be added or removed with minimal key movement; replicate each key to at least three nodes for fault tolerance.
- Eviction: evict per shard with an LRU policy when memory limits are reached.
- Failure handling: nodes must detect failures within five seconds and elect new primaries without manual intervention; during failover, no more than 0.1% of in-flight requests may fail.
- Client SDK: automatically retry on transient errors and route requests to the correct shard.
- Operations: provide tooling to rebalance keys; monitor hit ratio, eviction rate, and per-node latency; allow live config changes (such as TTL or replication factor) without downtime.
- Geo-distribution: handle worldwide deployment across three regions with <50 ms cross-region replication lag, and survive an entire region outage with <30 s recovery time and zero data loss.
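The sharding requirement above can be sketched with a consistent-hash ring using virtual nodes. This is a minimal illustration, not the mandated implementation: the class name `ConsistentHashRing`, the virtual-node count, and the use of MD5 are all assumptions chosen for clarity. `nodes_for` walks clockwise to collect the distinct nodes that hold a key's replicas (three per the spec).

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Illustrative consistent-hash ring with virtual nodes: adding or
    removing a node relocates only roughly 1/N of the key space."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes          # virtual nodes per physical node (assumed value)
        self._hashes = []             # sorted vnode hashes, kept parallel to _ring
        self._ring = []               # sorted list of (hash, node)
        for n in nodes:
            self.add_node(n)

    @staticmethod
    def _hash(key):
        # MD5 chosen only for a stable, well-spread 128-bit hash
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            idx = bisect.bisect(self._hashes, h)
            self._hashes.insert(idx, h)
            self._ring.insert(idx, (h, node))

    def remove_node(self, node):
        kept = [(h, n) for h, n in self._ring if n != node]
        self._ring = kept
        self._hashes = [h for h, _ in kept]

    def nodes_for(self, key, replicas=3):
        """Return up to `replicas` distinct nodes clockwise from the key's hash."""
        if not self._ring:
            return []
        distinct = len({n for _, n in self._ring})
        owners = []
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        while len(owners) < min(replicas, distinct):
            node = self._ring[idx][1]
            if node not in owners:
                owners.append(node)
            idx = (idx + 1) % len(self._ring)
        return owners
```

Adding a fourth node should move only a minority of keys, which is the "minimal key movement" property the spec asks for.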
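The per-shard LRU eviction plus TTL could look like the sketch below, assuming an entry-count cap stands in for the real memory limit. The class name `LRUShard` and the lazy-expiry strategy (expired keys are dropped on read rather than by a background sweeper) are illustrative choices, not part of the spec.

```python
import time
from collections import OrderedDict

class LRUShard:
    """One cache shard: LRU eviction at a max entry count, lazy TTL expiry.
    A production shard would cap bytes, not entries (simplifying assumption)."""

    def __init__(self, max_entries=1024):
        self.max_entries = max_entries
        self._data = OrderedDict()    # key -> (value, expires_at); order = recency

    def set(self, key, value, ttl):
        self._data.pop(key, None)     # re-inserting marks the key most recent
        self._data[key] = (value, time.monotonic() + ttl)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)   # evict the least recently used entry

    def get(self, key):
        item = self._data.pop(key, None)
        if item is None:
            return None
        value, expires_at = item
        if time.monotonic() >= expires_at:
            return None               # expired: discard instead of reinserting
        self._data[key] = item        # reinsert -> most recently used
        return value

    def delete(self, key):
        self._data.pop(key, None)
```

`OrderedDict` gives O(1) recency updates; a real shard would also need locking (or sharded single-writer loops) for the stated QPS targets.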
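For the SDK's retry behavior, a jittered exponential-backoff helper is one plausible shape; the names `TransientError` and `with_retries` and the attempt/delay defaults are assumptions for illustration. Routing itself would sit above this, picking the shard's primary (and falling back to a replica) before each attempt.

```python
import random
import time

class TransientError(Exception):
    """Stand-in for retryable failures (timeouts, connection resets)."""

def with_retries(op, attempts=3, base_delay=0.01):
    """Run op(); on TransientError, retry with jittered exponential backoff.
    Non-transient exceptions propagate immediately."""
    for attempt in range(attempts):
        try:
            return op()
        except TransientError:
            if attempt == attempts - 1:
                raise                 # budget exhausted: surface the error
            # jitter avoids synchronized retry storms against a recovering node
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
```

The jitter matters for the 0.1% failover budget: without it, every client retries in lockstep and can re-overload the newly elected primary.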
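The five-second failure-detection requirement can be sketched as a simple heartbeat monitor; real systems typically use gossip or a phi-accrual detector, so this fixed-timeout version (and the `HeartbeatMonitor` name, plus the injectable clock) is an assumed simplification.

```python
import time

class HeartbeatMonitor:
    """Marks a node failed if no heartbeat arrives within `timeout` seconds
    (the spec's five-second detection window). The clock is injectable so
    the logic can be tested without real waiting."""

    def __init__(self, timeout=5.0, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self._last_seen = {}          # node -> timestamp of last heartbeat

    def heartbeat(self, node):
        self._last_seen[node] = self.clock()

    def failed_nodes(self):
        now = self.clock()
        return [n for n, t in self._last_seen.items() if now - t > self.timeout]
```

A detected failure would then trigger the leader election the spec requires; that step (e.g. Raft among the shard's replicas) is out of scope for this sketch.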