Design a Carbon Footprint Tracker for AWS Resources
Product Design
Problem Statement
Your task is to design a comprehensive system that monitors and reports on the environmental impact of cloud infrastructure. The system needs to track carbon emissions generated by AWS resources across an organization's entire cloud footprint, calculate CO2 equivalents based on resource usage patterns, and provide actionable insights to reduce environmental impact.
The platform should support large enterprises with thousands of AWS accounts, millions of resources across multiple regions, and generate real-time carbon metrics alongside historical trend analysis. The system must integrate with AWS billing data, resource utilization metrics, and regional energy grid carbon intensity factors to provide accurate emissions calculations. Additionally, it should offer forecasting capabilities, sustainability recommendations, and compliance reporting for organizations committed to carbon neutrality goals.
Key Requirements
Functional
- Resource Discovery and Tracking -- automatically discover and monitor all AWS resources (EC2, RDS, S3, Lambda, etc.) across multiple accounts and regions
- Carbon Calculation Engine -- compute carbon emissions based on resource type, utilization, runtime hours, and regional energy source mix
- Dashboard and Reporting -- provide real-time dashboards, historical trends, team-level breakdowns, and exportable compliance reports
- Recommendations Engine -- suggest optimization opportunities such as instance rightsizing, region migration, or renewable energy zones
- Alerting and Budgets -- notify stakeholders when carbon budgets are exceeded or anomalous usage patterns are detected
Non-Functional
- Scalability -- handle 10,000+ AWS accounts, 50M+ resources, ingesting 100K metrics/second
- Reliability -- achieve 99.9% uptime with no data loss; critical for monthly sustainability reporting
- Latency -- dashboard queries under 2 seconds; real-time metrics with less than 5-minute delay
- Consistency -- eventual consistency acceptable for analytics; strong consistency for billing-related carbon cost allocation
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Data Collection Architecture
The system needs to continuously gather resource metadata and utilization metrics from potentially thousands of AWS accounts without impacting customer workloads or hitting AWS API rate limits.
Hints to consider:
- Discuss using AWS Organizations for multi-account discovery and cross-account IAM roles for secure access
- Consider CloudWatch Metrics Streams or EventBridge for near real-time data ingestion versus periodic polling
- Address API throttling strategies such as exponential backoff, account-level queues, and regional sharding
- Evaluate whether to pull data centrally or deploy collectors in each account/region
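Whichever collection topology you choose, the throttling story usually comes down to retrying with exponential backoff and jitter. A minimal sketch of the "full jitter" variant (parameter names and defaults here are illustrative, not from any AWS SDK):

```python
import random

def backoff_delays(max_retries=5, base=0.5, cap=30.0, rng=random.random):
    """Yield 'full jitter' exponential backoff delays (in seconds) for
    retrying a throttled AWS API call (e.g. a ThrottlingException)."""
    for attempt in range(max_retries):
        # Exponential ceiling: base * 2^attempt, clamped at `cap`.
        ceiling = min(cap, base * (2 ** attempt))
        # Full jitter spreads retries so thousands of per-account
        # collectors don't hammer the API in lockstep after a throttle.
        yield rng() * ceiling
```

In practice you would pair this with per-account queues so one noisy account's retries cannot starve the others.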
2. Carbon Calculation Methodology
Converting cloud resource usage into accurate carbon emissions requires domain knowledge of power consumption, data center efficiency (PUE), and regional energy grid composition.
Hints to consider:
- Discuss obtaining or building a database of resource power consumption baselines (watts per instance type)
- Factor in CPU utilization curves -- idle resources consume less power than fully utilized ones
- Integrate real-time or historical carbon intensity data for each AWS region based on local energy grid mix
- Handle data gaps gracefully when precise utilization metrics aren't available (use estimates or industry averages)
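Tying these factors together, a per-resource estimate typically multiplies interpolated power draw, runtime, PUE, and grid intensity. A simplified sketch (the linear idle-to-max interpolation is a common approximation; real power curves are non-linear, and all parameter names here are hypothetical):

```python
def estimate_emissions_g(idle_watts, max_watts, cpu_util, hours,
                         pue, grid_gco2e_per_kwh):
    """Estimate grams of CO2e for one resource over a time window.

    cpu_util            -- average CPU utilization in [0, 1]
    pue                 -- data-center Power Usage Effectiveness (>= 1.0)
    grid_gco2e_per_kwh  -- carbon intensity of the regional energy grid
    """
    # Interpolate power draw between idle and fully-utilized baselines.
    avg_watts = idle_watts + cpu_util * (max_watts - idle_watts)
    kwh = avg_watts * hours / 1000.0  # watt-hours -> kilowatt-hours
    # Scale by facility overhead (PUE) and regional grid carbon mix.
    return kwh * pue * grid_gco2e_per_kwh
```

When utilization data is missing, the same formula can fall back to an industry-average `cpu_util` so estimates remain comparable across resources.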
3. Storage and Query Performance
The system must store massive time-series data while supporting both real-time dashboards and complex analytical queries for trend analysis and forecasting.
Hints to consider:
- Evaluate time-series databases (InfluxDB, TimescaleDB, Amazon Timestream) versus data lakes (S3 + Athena)
- Discuss data retention policies and aggregation strategies (roll up minute-level data to hourly/daily after certain periods)
- Consider using a Lambda architecture with hot path (real-time) and cold path (batch processing for historical analysis)
- Address how to efficiently query by multiple dimensions (account, region, team, resource type, time range)
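The retention/aggregation idea above can be sketched as a simple roll-up that retires minute-level points into hourly sums (a toy in-memory version; a real system would run this as a batch job against the time-series store):

```python
from collections import defaultdict

def roll_up_hourly(minute_points):
    """Aggregate (epoch_seconds, grams_co2e) minute-level samples into
    hourly totals, as done when fine-grained data ages past its
    retention window."""
    buckets = defaultdict(float)
    for ts, grams in minute_points:
        buckets[ts - ts % 3600] += grams  # truncate timestamp to hour
    return dict(sorted(buckets.items()))
```

Dashboards then query the coarsest granularity that satisfies the requested time range, which keeps sub-2-second latency achievable on years of history.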
4. Attribution and Cost Allocation
Organizations need to understand which teams, projects, or services are responsible for carbon emissions to drive accountability and optimization efforts.
Hints to consider:
- Leverage AWS tags and resource groups for mapping resources to cost centers or teams
- Handle untagged or improperly tagged resources with default allocation rules or anomaly detection
- Discuss integration with existing cost allocation frameworks (AWS Cost Explorer tags, chargeback systems)
- Consider carbon credits or offsets tracking for teams that invest in renewable energy projects
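The tag-based attribution with a default-allocation fallback might look like this minimal sketch (the `team` tag key and `unallocated` bucket are illustrative conventions, not AWS defaults):

```python
def allocate_emissions(resources, default_team="unallocated"):
    """Sum per-team emissions from resources, each a dict with 'grams'
    and optional 'tags'. Untagged resources land in a default bucket
    so org-wide totals still reconcile."""
    totals = {}
    for r in resources:
        team = r.get("tags", {}).get("team") or default_team
        totals[team] = totals.get(team, 0.0) + r["grams"]
    return totals
```

Surfacing the size of the `unallocated` bucket is itself a useful metric: it quantifies tagging hygiene and motivates teams to claim their resources.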
5. Scalability and Resource Efficiency
Ironically, the monitoring system itself consumes resources and generates carbon emissions, so it needs to be efficient.
Hints to consider:
- Discuss sampling strategies for very high-cardinality resources (millions of Lambda functions or S3 objects)
- Use serverless components (Lambda, Fargate) that scale to zero when not needed
- Implement intelligent caching to avoid redundant API calls or recalculations
- Batch process historical analysis during off-peak hours to reduce infrastructure footprint
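The caching hint can be illustrated with a tiny TTL cache that skips redundant describe/list calls when a fresh answer already exists (an in-memory sketch; names and the injectable `clock` are assumptions for testability, and a production system would likely use a shared cache instead):

```python
import time

class TTLCache:
    """Minimal time-based cache: reuse a fetched value for up to `ttl`
    seconds before allowing another (expensive) API round trip."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store = {}  # key -> (fetched_at, value)

    def get_or_fetch(self, key, fetch):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]          # still fresh: avoid the API call
        value = fetch()            # stale or missing: refetch and store
        self._store[key] = (now, value)
        return value
```

Applied to resource metadata (which changes far more slowly than utilization metrics), this alone can cut API traffic, and the tracker's own footprint, substantially.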
Suggested Approach
Step 1: Clarify Requirements
Begin by confirming the scope and constraints with your interviewer:
- What is the expected scale? How many AWS accounts, regions, and resources?
- Which AWS services should be tracked? Start with core compute/storage or cover all 200+ services?
- What accuracy level is required? Exact watt-hour measurements or reasonable estimates?
- What latency is acceptable for dashboard updates? Real-time (seconds) or near real-time (minutes)?
- Are there regulatory compliance requirements (GHG Protocol, ISO 14064, CSRD)?
- Should the system support multi-cloud environments (Azure, GCP) or only AWS initially?
- Do users need predictive forecasting or just historical reporting?