Design a throttling system for Databricks’ serving infrastructure that protects both incoming traffic (clients → API gateway) and outgoing calls (API servers → downstream dependencies). The system must:

- handle sudden bursts and prevent cascading failures;
- enforce rate limits across a multi-instance, multi-region deployment;
- support multiple limit dimensions (global, per-service, per-user, per-workspace);
- make limits adaptive: when a dependency slows down, automatically reduce the rate of outgoing requests;
- impose minimal latency overhead (<1 ms on the critical path);
- scale to millions of decisions per second;
- tolerate machine failures without losing accuracy.

You may assume a micro-service architecture behind an Envoy-based ingress gateway, with services written in Scala/Java, and a control plane that can push configuration changes within seconds.
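As a starting point, a candidate might sketch the core admission primitive behind such a system. The snippet below is a minimal, illustrative token bucket in Java (all names, such as `TokenBucket` and `setRate`, are hypothetical, not Databricks' actual implementation): it admits bursts up to a fixed capacity while enforcing a steady-state rate, and it exposes a rate-adjustment hook that an adaptive controller could call when a downstream dependency degrades. Timestamps are passed in explicitly so the logic is deterministic and testable.

```java
// Illustrative sketch only: a per-instance token bucket. A real deployment
// would layer this under a distributed quota scheme (e.g. shares pushed by
// the control plane) to enforce global limits across instances and regions.
final class TokenBucket {
    private final double capacity;        // max burst size, in tokens
    private volatile double ratePerSec;   // refill rate; adjustable at runtime
    private double tokens;                // current token balance
    private long lastRefillNanos;         // timestamp of the last refill

    TokenBucket(double ratePerSec, double capacity, long nowNanos) {
        this.ratePerSec = ratePerSec;
        this.capacity = capacity;
        this.tokens = capacity;           // start full, so an initial burst is allowed
        this.lastRefillNanos = nowNanos;
    }

    // Returns true if one request is admitted. O(1) with a brief lock,
    // keeping per-decision overhead well under the 1 ms budget.
    synchronized boolean tryAcquire(long nowNanos) {
        double elapsedSec = (nowNanos - lastRefillNanos) / 1e9;
        tokens = Math.min(capacity, tokens + elapsedSec * ratePerSec);
        lastRefillNanos = nowNanos;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    // Adaptive hook (hypothetical): a health monitor could halve the rate
    // when a dependency's p99 latency crosses a threshold, then restore it.
    synchronized void setRate(double newRatePerSec) {
        this.ratePerSec = newRatePerSec;
    }
}

public class Demo {
    public static void main(String[] args) {
        long t0 = 0L;
        TokenBucket bucket = new TokenBucket(10.0, 5.0, t0); // 10 req/s, burst of 5
        int admitted = 0;
        for (int i = 0; i < 8; i++) {
            if (bucket.tryAcquire(t0)) admitted++;
        }
        System.out.println("burst admitted: " + admitted);   // 5: burst capacity

        long t1 = 300_000_000L;                              // 300 ms later
        int later = 0;
        for (int i = 0; i < 8; i++) {
            if (bucket.tryAcquire(t1)) later++;
        }
        System.out.println("after 300ms: " + later);         // 3: 0.3 s * 10 req/s
    }
}
```

A full answer would then discuss how to distribute this primitive: for example, giving each instance a local share of the global limit and reconciling shares asynchronously, so the hot path never blocks on a remote quota store.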