🌱 GreenTech and Sustainable IT June 23, 2026 12 min read

AI energy consumption: how to measure and reduce LLM costs


AI workloads, particularly large language model inference and training, are now the fastest-growing contributor to both enterprise carbon footprints and cloud bills. A single GPT-3-scale training run is estimated to consume approximately 1,287 MWh of electricity, roughly the annual consumption of 120 US homes. Enterprise LLM inference at scale compounds this: a million API calls per day to GPT-4 costs about $15,000 per day and generates measurable carbon. This guide covers how to measure AI energy consumption, benchmark model efficiency, and reduce both cost and carbon through architectural choices.

The Scale of AI Energy Consumption

AI Energy Consumption — Enterprise Context
AI's energy footprint has three components: (1) Training — the one-time cost of training a foundation model, measured in MWh to GWh; (2) Inference — the ongoing cost of serving the model for requests, which scales with usage volume; (3) Fine-tuning — intermediate cost of adapting a pre-trained model to specific tasks. For enterprise consumers of AI (not model trainers), inference dominates the energy and cost profile — and it is the component most amenable to optimisation through model selection and deployment architecture.

Energy and Carbon Benchmarks by Model

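| Model | Energy per 1K tokens (kWh) | CO₂ per 1K tokens (gCO₂) | Relative Cost Index |
| --- | --- | --- | --- |
| GPT-4 / Claude claude-opus-4-6 | ~0.001–0.003 | ~0.4–1.2 | 100× (baseline high) |
| GPT-4o / Claude claude-sonnet-4-6 | ~0.0003–0.001 | ~0.12–0.4 | 30× |
| Llama 3 8B (self-hosted, A100, us-east-1) | ~0.00005 | ~0.02 | not given |
| Llama 3 8B (self-hosted, eu-north-1) | ~0.00005 | ~0.001 | 1× (baseline low) |
| DeepSeek V3 (self-hosted) | ~0.0002 | ~0.08 | not given |

Energy is given in kWh. The CO₂ column is energy multiplied by grid carbon intensity, using about 400 gCO₂/kWh unless a region is stated.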
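
Each carbon figure in the table is the product of energy per 1K tokens and the grid carbon intensity of the hosting region. A minimal sketch of that arithmetic, assuming energy is expressed in kWh and using the approximate intensities this guide quotes for us-east-1 (~400 gCO₂/kWh) and eu-north-1 (~18 gCO₂/kWh); the function and variable names are illustrative:

```python
# Approximate grid carbon intensity in gCO2 per kWh, as quoted in this guide.
GRID_INTENSITY_G_PER_KWH = {"us-east-1": 400.0, "eu-north-1": 18.0}


def carbon_per_1k_tokens(energy_kwh_per_1k: float, region: str) -> float:
    """Return gCO2 emitted per 1K tokens served in the given region."""
    return energy_kwh_per_1k * GRID_INTENSITY_G_PER_KWH[region]


# Self-hosted 8B model: ~0.00005 kWh per 1K tokens (figure from the table).
us_east = carbon_per_1k_tokens(0.00005, "us-east-1")
eu_north = carbon_per_1k_tokens(0.00005, "eu-north-1")
print(f"us-east-1:  {us_east:.4f} gCO2 per 1K tokens")
print(f"eu-north-1: {eu_north:.4f} gCO2 per 1K tokens")
```

The same function can be reused with your own measured energy figures and with intensity data for other regions.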

Practical Reduction Strategies

10–30×: energy reduction achievable by switching from GPT-4-class models to small, fine-tuned 8B models for suitable tasks. This is the single highest-impact action in most enterprise AI energy optimisation programmes.
~22×: carbon intensity difference between running inference in us-east-1 (~400 gCO₂/kWh) and eu-north-1 (~18 gCO₂/kWh). Region selection is free and has the second-largest carbon impact after model selection.
75%: energy reduction from INT4 quantisation of LLM inference versus FP16, with only 3–8% quality degradation on most enterprise tasks. Quantisation is the highest-ROI inference optimisation.
Right-Size Your Models
The most impactful reduction: use the smallest model that meets quality requirements for each task. Classification, extraction, and structured output tasks don't need GPT-4. A fine-tuned Llama 3 8B model typically matches GPT-4 on narrow domain tasks at 10–30× lower energy. Run an A/B test: same task, smaller model, measure quality difference. Most enterprises find 60–70% of their GPT-4 calls can be served by 8–13B models without quality loss.
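
The A/B test described above can be sketched as a simple accuracy comparison on a labelled sample. This is a hedged illustration: the labels, predictions, and the 2-point tolerance are all hypothetical, and in practice the predictions would come from calling each model on the same inputs.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that exactly match the reference labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def can_downsize(large_preds, small_preds, labels, max_drop=0.02):
    """True if the small model's accuracy is within max_drop of the large model's."""
    drop = accuracy(large_preds, labels) - accuracy(small_preds, labels)
    return drop <= max_drop


# Hypothetical ticket-classification sample (10 labelled examples).
labels = ["refund", "billing", "refund", "shipping", "billing",
          "refund", "shipping", "billing", "refund", "shipping"]
large_preds = list(labels)
large_preds[3] = "billing"          # large model: one mistake (90% accuracy)
small_preds = list(labels)
small_preds[7] = "refund"           # small model: one mistake (90% accuracy)

print("Route this task to the small model:",
      can_downsize(large_preds, small_preds, labels))
```

Exact-match accuracy suits classification and extraction; generative tasks would need a different quality metric.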
Deploy in Low-Carbon Regions
For self-hosted models, run inference in eu-north-1 (AWS Stockholm, Nordic hydro) or eu-west-1 (Ireland, high renewable mix). For proprietary API calls, select the lowest-carbon data centre option; Azure and GCP both expose data centre carbon intensity data. For new deployments, routing AI inference to the cleanest available region adds essentially no engineering effort. Encode the region choice in your infrastructure-as-code so that it is applied automatically.
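
Region selection can be automated with a small helper that picks the lowest-carbon region among those a workload is allowed to use. This is a sketch under stated assumptions: the intensity values are the approximate figures quoted in this guide, and `allowed_regions` is a hypothetical stand-in for whatever data-residency or latency constraints apply.

```python
def pick_region(intensity_g_per_kwh, allowed_regions):
    """Return the allowed region with the lowest grid carbon intensity."""
    candidates = {
        region: grams
        for region, grams in intensity_g_per_kwh.items()
        if region in allowed_regions
    }
    if not candidates:
        raise ValueError("no allowed region has carbon intensity data")
    return min(candidates, key=candidates.get)


# Approximate intensities (gCO2/kWh) quoted in this guide.
INTENSITY = {"us-east-1": 400.0, "eu-north-1": 18.0}

chosen = pick_region(INTENSITY, allowed_regions={"us-east-1", "eu-north-1"})
print("Deploy inference in:", chosen)
```

The returned region name could then be passed as a variable into an infrastructure-as-code template.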
Quantise and Optimise
INT8 quantisation: 50% energy reduction with <1% quality loss. INT4 quantisation (AWQ/GPTQ): 75% energy reduction with 3–8% quality loss on most tasks. Deploy with TensorRT for NVIDIA hardware (2–4× throughput improvement vs naive serving) or vLLM with PagedAttention (3–5× throughput improvement). Higher throughput = fewer GPUs needed = lower energy per token served.
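
The link between throughput and energy per token can be made concrete with a little arithmetic. The sketch below assumes a GPU that draws constant power under load; the 400 W draw and the token rates are illustrative assumptions, not measurements.

```python
def energy_wh_per_1k_tokens(gpu_power_w, tokens_per_second, num_gpus=1):
    """Energy (Wh) to generate 1,000 tokens at a steady throughput."""
    seconds_per_1k = 1000.0 / tokens_per_second
    joules = gpu_power_w * num_gpus * seconds_per_1k   # W x s = J
    return joules / 3600.0                             # 3600 J = 1 Wh


# Hypothetical single GPU at 400 W: naive serving vs a 3x throughput gain.
baseline = energy_wh_per_1k_tokens(400, tokens_per_second=500)
optimised = energy_wh_per_1k_tokens(400, tokens_per_second=1500)

print(f"baseline:  {baseline:.3f} Wh per 1K tokens")
print(f"optimised: {optimised:.3f} Wh per 1K tokens")
print(f"reduction: {1 - optimised / baseline:.0%}")
```

Under the constant-power assumption, energy per token falls in direct proportion to the throughput gain.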
Cache Aggressively
The greenest LLM call is one never made. KV cache (built into vLLM and TGI) reuses computation for common prompt prefixes. Semantic cache (GPTCache, Redis + embeddings) returns cached responses for semantically similar queries — useful for FAQ, documentation, and high-repetition enterprise tasks. A well-implemented semantic cache reduces LLM calls by 30–60% for typical enterprise knowledge base Q&A workloads.
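
A semantic cache can be sketched in a few lines. This toy version is not how GPTCache or a Redis-backed cache is implemented: it uses bag-of-words vectors in place of a real embedding model, an in-memory list in place of a vector store, and a hypothetical `fake_llm` function in place of a model call. The 0.8 similarity threshold is also an assumption.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts. A real system would use an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def get(self, query):
        """Return the best cached response above the threshold, else None."""
        q = embed(query)
        best_score, best_response = 0.0, None
        for emb, response in self.entries:
            score = cosine(q, emb)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, query, response):
        self.entries.append((embed(query), response))


call_log = []  # records every prompt that actually reached the model


def fake_llm(prompt):
    """Hypothetical stand-in for a real LLM call."""
    call_log.append(prompt)
    return f"answer to: {prompt}"


def answer(cache, prompt):
    cached = cache.get(prompt)
    if cached is not None:
        return cached
    response = fake_llm(prompt)
    cache.put(prompt, response)
    return response


cache = SemanticCache()
first = answer(cache, "How do I reset my password?")
second = answer(cache, "how do I reset my password")   # served from cache
third = answer(cache, "What is the refund policy?")    # unrelated, calls the model
print("LLM calls made:", len(call_log))
```

The threshold trades hit rate against the risk of returning an answer to a question that only looks similar, so it is worth tuning on real traffic.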

How to Measure Your AI Carbon Footprint

Step 1: Instrument with CodeCarbon and Kepler

For self-hosted models, deploy Kepler (eBPF-based energy measurement for Kubernetes) to track energy per workload, from which per-inference energy can be derived. For training runs, add CodeCarbon to your training script: a single decorator on the training function is enough, with no other code changes. For proprietary API calls, use the open-source EcoLogits library, which estimates energy from token count and model type. Feed all of these measurements into your GreenOps dashboards.

Tools: CodeCarbon, Kepler, EcoLogits
Step 2: Calculate SCI Score per AI Service

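Apply the Software Carbon Intensity (SCI) formula to each AI service: SCI = ((E × I) + M) per R, where E is the energy consumed (kWh), I is the grid carbon intensity (gCO₂/kWh), M is the embodied emissions of the hardware (gCO₂), and R is the functional unit (for example, per API call or per active user). This gives you a comparable carbon metric across different AI services, enabling data-driven model selection and optimisation prioritisation. Report SCI scores monthly in your engineering metrics dashboard alongside latency and cost.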

Outputs: SCI per AI service, monthly tracking, model comparison
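
The SCI calculation itself is a one-line formula. A minimal sketch, assuming monthly totals as inputs; apart from the ~400 gCO₂/kWh intensity quoted in this guide, the energy, embodied-emissions, and call-volume figures are hypothetical.

```python
def sci(energy_kwh, intensity_g_per_kwh, embodied_g, functional_units):
    """Software Carbon Intensity: ((E x I) + M) per R, in gCO2 per unit."""
    if functional_units <= 0:
        raise ValueError("functional_units (R) must be positive")
    return (energy_kwh * intensity_g_per_kwh + embodied_g) / functional_units


# Hypothetical month for one AI service:
#   E = 120 kWh measured, I = 400 gCO2/kWh (us-east-1, per this guide),
#   M = 6,000 gCO2 embodied share, R = 1,000,000 API calls.
score = sci(energy_kwh=120, intensity_g_per_kwh=400,
            embodied_g=6000, functional_units=1_000_000)
print(f"SCI: {score:.3f} gCO2 per API call")
```

Keeping R fixed (for example, always per API call) is what makes scores comparable between services and from month to month.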
