Environmental Impact Tracking — Sources & Methodology

How we estimate the energy, water, carbon, and resource cost of every AI message

Contents
  1. Overview
  2. What We Measure
  3. How It Works
  4. Per-Model Energy Factors
  5. Data Center Infrastructure
  6. Lifecycle Factors: Primary Energy & Abiotic Depletion
  7. Real-World Comparisons
  8. How Confident Are We?
  9. Important Caveats
  10. Glossary
  11. References

1. Overview

Every time you send a message to an AI model, a data center somewhere consumes electricity to generate the response. That electricity requires water for cooling, produces greenhouse gas emissions based on the local power grid, draws on natural energy resources, and depletes finite minerals through the infrastructure that generates it.

We track five environmental metrics for every AI message in the IQ Assistant, and display them in a tooltip alongside each response. Our goal is transparency: giving you real numbers — with honest uncertainty ranges — so you can make informed choices about how you use AI.

This page documents every data source, assumption, and calculation behind those numbers. Where possible, we cross-validate our estimates against multiple independent sources. Where data is uncertain or missing, we say so explicitly.

2. What We Measure

Each AI message shows five environmental metrics. The first three — energy, greenhouse gas emissions, and water — are displayed in a compact footer beneath each message. All five are available in the expanded tooltip.

| Metric | What It Tells You | Unit | Depends On | Example Comparison |
|---|---|---|---|---|
| Energy | Electricity consumed to generate the response | Wh / mWh | Model + token count | Seconds of streaming video |
| GHG Emissions | Carbon dioxide equivalent released by the power grid | g / mg CO₂e | Energy × grid carbon intensity | Metres of car driving |
| Water | Water consumed for cooling (on-site + off-site) | mL / µL | Energy × data center WUE | Water drops or teaspoons |
| Primary Energy | Total energy extracted from nature to produce the electricity | kJ / J | Energy × grid PE factor | Matches burned or food Calories |
| Abiotic Resources | Depletion of metals and minerals in power infrastructure | ng / pg Sb eq | Energy × grid ADPe factor | Milligrams of copper mined |

Every metric includes an uncertainty range (min–max) reflecting the limits of current measurement science. More on that in How Confident Are We?

3. How It Works

The calculation is straightforward. For each message, we know the model used and how many tokens were processed (input) and generated (output). Research consistently shows that generating output tokens costs significantly more energy than processing input tokens — Caravaca et al. measured an ~11× difference — so we account for them separately.

The Calculation

  energy        = (inputTokens / 1000) × inputWhPer1kT + (outputTokens / 1000) × outputWhPer1kT
  water         = energy × waterMlPerWh          (data center water usage effectiveness)
  co2           = energy × co2eGPerWh            (grid carbon intensity factor)
  primaryEnergy = energy × primaryEnergyMjPerWh  (grid primary energy factor)
  adp           = energy × adpKgSbPerWh          (grid abiotic depletion factor)

Energy is the foundation. Every other metric is derived by multiplying energy by an infrastructure factor that depends on where the model runs — which data center, which power grid, which cooling system. The same model running in France (low-carbon nuclear grid) produces very different emissions than one running in Virginia (gas + coal mix).

Each metric also has min/max uncertainty bounds derived from the model's uncertainty multipliers (e.g., 0.5–1.5 means the true value could be 50%–150% of the nominal estimate). These ranges are applied uniformly across all five metrics.

When a single conversation uses multiple models (e.g., a reasoning model hands off to a faster model), we calculate each model's contribution separately and sum the results.
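A minimal sketch of these formulas in Python. The names (`GridFactors`, `message_impact`) and the example factor values are illustrative, not the product's actual code; the 0.5–1.5 multipliers are the example uncertainty range described below.

```python
from dataclasses import dataclass

@dataclass
class GridFactors:
    """Infrastructure multipliers for the data center / grid a model runs on."""
    water_ml_per_wh: float           # WUE, on-site + off-site
    co2e_g_per_wh: float             # grid carbon intensity (CIF)
    primary_energy_mj_per_wh: float  # grid primary energy factor
    adp_kg_sb_per_wh: float          # grid abiotic depletion factor

def message_impact(input_tokens, output_tokens,
                   input_wh_per_1kt, output_wh_per_1kt,
                   grid, uncertainty=(0.5, 1.5)):
    """Return the five per-message metrics as (min, nominal, max) tuples."""
    energy_wh = (input_tokens / 1000) * input_wh_per_1kt \
              + (output_tokens / 1000) * output_wh_per_1kt
    nominal = {
        "energy_wh": energy_wh,
        "water_ml": energy_wh * grid.water_ml_per_wh,
        "co2e_g": energy_wh * grid.co2e_g_per_wh,
        "primary_energy_mj": energy_wh * grid.primary_energy_mj_per_wh,
        "adp_kg_sb": energy_wh * grid.adp_kg_sb_per_wh,
    }
    lo, hi = uncertainty
    # The same uncertainty range is applied uniformly to all five metrics.
    return {k: (v * lo, v, v * hi) for k, v in nominal.items()}

# Example: a 500-in / 300-out message on gpt-4.1 in Azure US East
# (factors from the tables in sections 4 and 5).
azure = GridFactors(water_ml_per_wh=4.61, co2e_g_per_wh=0.350,
                    primary_energy_mj_per_wh=0.0096884,
                    adp_kg_sb_per_wh=9.855e-11)
impact = message_impact(500, 300, 0.125, 0.4, azure)
```

Energy is computed first; every other metric is a single multiplication by a grid factor, which is why the same model can produce very different footprints in different regions.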

4. Per-Model Energy Factors

The most important — and most uncertain — input is how much energy each model consumes per token. No AI provider currently publishes per-token energy data. Google is the only company to have published per-query measurements (0.24 Wh for a median Gemini text prompt, August 2025). Everyone else relies on academic estimates with ~20–50% margin of error.

Our per-token values are derived from the best available research, primarily Jegham et al. 2025 (the most comprehensive LLM energy benchmark to date), cross-referenced with Oviedo/Microsoft Research, Epoch AI, Caravaca et al., and EcoLogits.

Model-by-Model Factors

green-l / green-l-raw (GreenPT) — source: GreenPT / EcoLogits
  Input: 0.075 Wh/1kT · Output: 0.1 Wh/1kT
  Estimated from Mistral Mini class (~8–22B parameters) using EcoLogits' parametric formula, with ~30% reduction for GreenPT's claimed compression and quantization. GreenPT has not published per-token energy figures. Their primary environmental advantage comes from running on the French nuclear grid (21.7 gCO₂/kWh per RTE France, among the lowest in the world).

gpt-4.1 (OpenAI) — source: Jegham et al. / Couch
  Input: 0.125 Wh/1kT · Output: 0.4 Wh/1kT
  Jegham et al. Table 4 reports 0.87 Wh (short), 3.16 Wh (medium), 4.83 Wh (long) per query. Cross-referenced with Couch derivation (~0.39 Wh/1kT input, ~1.95 Wh/1kT output for GPT‑4o class). GPT‑4.1 is architecture-optimized relative to GPT‑4o, so values are adjusted downward.

gpt-4.1-mini (OpenAI) — source: Jegham et al.
  Input: 0.1 Wh/1kT · Output: 0.3 Wh/1kT
  Proportional to GPT‑4.1 based on Jegham ratios. Mini models consume ~55–65% of full model energy: 0.45–2.12 Wh/query (mini) vs 0.87–4.83 Wh/query (full).

claude-opus-4-6 (Anthropic) — source: Jegham et al.
  Input: 0.3 Wh/1kT · Output: 0.75 Wh/1kT
  No direct measurement available. Jegham benchmarked Claude 3.7 Sonnet at 0.95–5.67 Wh/query. Opus is a larger, more capable model — estimated at ~2× Sonnet energy consumption. Anthropic has not published per-token energy data.

claude-sonnet-4-6 (Anthropic) — source: Jegham et al.
  Input: 0.15 Wh/1kT · Output: 0.45 Wh/1kT
  Based on Jegham Claude 3.7 Sonnet data (0.95–5.67 Wh/query), adjusted for architecture improvements. Sonnet 4.6 replaces Sonnet 4.5 with identical API pricing ($3/$15 per MTok) and similar output speed (~57 vs ~63 tok/s), indicating the same compute tier. Per-task energy is likely lower due to 70% token efficiency gains (Anthropic), but per-token energy is assumed equivalent until independent benchmarks are published.

o3 (OpenAI) — source: Jegham et al. / Epoch AI
  Input: 0.65 Wh/1kT · Output: 3.5 Wh/1kT
  Jegham Table 4: 1.18 Wh (short), 5.15 Wh (medium), 12.22 Wh (long) per query — the highest energy model in the study. Reasoning models generate 2.5–10× hidden chain-of-thought tokens per Epoch AI, which are invisible in the API but consume energy. The high output Wh/1kT reflects this reasoning overhead.

o4-mini (OpenAI) — source: Jegham et al.
  Input: 0.225 Wh/1kT · Output: 0.75 Wh/1kT
  Not directly benchmarked. Estimated from o3-mini data (0.67–3.53 Wh/query), adjusted for architecture improvements in the o4 generation.

gemini-3-pro-preview (Google) — source: Google official measurement
  Input: 0.065 Wh/1kT · Output: 0.2 Wh/1kT
  Google's official measurement: 0.24 Wh per median Gemini text prompt. The most efficient large model due to custom TPU hardware and 33× efficiency improvement over 12 months (to May 2025). Google does not publish the input/output token breakdown — the split is estimated based on typical ratios.

sonar-pro (Perplexity) — source: estimated
  Input: 0.35 Wh/1kT · Output: 0.75 Wh/1kT
  No official energy data from Perplexity. Higher than pure chat models because each query involves a web search + retrieval + generation pipeline. Estimated proportionally above Claude class models based on the additional compute for real-time web retrieval.

sonar-deep-research (Perplexity) — source: estimated
  Input: 2.0 Wh/1kT · Output: 5.0 Wh/1kT
  No official data. Deep research involves multi-step iterative loops with multiple LLM calls, web searches, and synthesis. Estimated at 5–7× sonar-pro energy.

text-embedding-3-large (OpenAI) — source: EcoLogits
  Input: 0.015 Wh/1kT · Output: 0.0 Wh/1kT
  Encoder-only architecture with no text generation. Estimated at ~1/15th of GPT‑4.1 energy based on API pricing ratio ($0.13/MTok vs $2.00/MTok). EcoLogits estimate for a ~1–2B parameter model confirms this order of magnitude.
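As a worked example, the factors above can be collected into a lookup table and summed across models, as described in How It Works. `MODEL_FACTORS` and `conversation_energy_wh` are illustrative names, and only a subset of models is shown.

```python
# Per-model energy factors (input Wh/1kT, output Wh/1kT), from the list above.
MODEL_FACTORS = {
    "gpt-4.1":              (0.125, 0.4),
    "gpt-4.1-mini":         (0.1,   0.3),
    "claude-sonnet-4-6":    (0.15,  0.45),
    "o3":                   (0.65,  3.5),
    "gemini-3-pro-preview": (0.065, 0.2),
}

def conversation_energy_wh(usage):
    """Sum per-model contributions when a conversation spans multiple models.

    `usage` is a list of (model, input_tokens, output_tokens) tuples.
    """
    total = 0.0
    for model, inp, out in usage:
        in_f, out_f = MODEL_FACTORS[model]
        total += (inp / 1000) * in_f + (out / 1000) * out_f
    return total

# A reasoning model handles the prompt, then a faster model writes the answer.
total = conversation_energy_wh([
    ("o3", 200, 1500),            # 0.13 + 5.25 Wh
    ("gpt-4.1-mini", 1700, 400),  # 0.17 + 0.12 Wh
])
```

The o3 leg dominates: its output factor is more than ten times the mini model's, illustrating why reasoning overhead drives most of a mixed conversation's footprint.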

Cross-Validation: Do These Numbers Make Sense?

Multiple independent sources converge on a 0.2–0.4 Wh range for a standard text query to a frontier model, giving us confidence our estimates are in the right ballpark. Reasoning models consume 5–70× more depending on complexity.

Standard Text Queries

| Source | Model / Scenario | Energy (Wh) | Method |
|---|---|---|---|
| Google 2025 (measured) | Gemini Apps text, median | 0.24 | Production measurement, full stack |
| Epoch AI 2025 | GPT‑4o, 500 output tokens | ~0.3 | Bottom-up FLOP-based |
| Oviedo/Microsoft 2025 | Frontier >200B, H100 | 0.34 (IQR 0.18–0.67) | Token throughput estimation |
| Jegham et al. 2025 | GPT‑4o short prompt | 0.42 | Infrastructure-aware API benchmarking |
| Caravaca et al. 2025 | Llama 3.1 405B (batched) | 0.354 | Direct GPU measurement |

Reasoning Models

| Source | Model / Scenario | Energy (Wh) | Overhead vs Standard |
|---|---|---|---|
| Jegham et al. | o3 short prompt | 7.03 | ~7.6× vs GPT‑4.1 |
| Jegham et al. | o3 long prompt | 39.2 | Highest measured |
| Jegham et al. | Claude 3.7 Sonnet Extended Thinking | 3.49 | ~4.2× vs standard Sonnet |
| Oviedo/Microsoft | Test-time compute / reasoning | 4.32 | ~13× baseline |
| Muxup 2026 | DeepSeek‑R1 (high output) | 15.0–16.3 | Output-length dependent |

Full Benchmark: Short Prompt (100 input / 300 output tokens)

From Jegham et al. Table 4, sorted by energy consumption:

| Model | Energy (Wh) | ± Error |
|---|---|---|
| LLaMA‑3.2 1B | 0.070 | 0.011 |
| LLaMA‑3.2-vision 11B | 0.071 | 0.011 |
| GPT‑4.1 nano | 0.103 | 0.037 |
| LLaMA‑3.3 70B | 0.247 | 0.032 |
| GPT‑4.1 mini | 0.421 | 0.197 |
| GPT‑4o (Mar '25) | 0.421 | 0.127 |
| Claude‑3.7 Sonnet | 0.836 | 0.102 |
| GPT‑4.1 | 0.918 | 0.498 |
| o4-mini (high) | 2.916 | 1.605 |
| Claude‑3.7 Sonnet ET | 3.490 | 0.304 |
| GPT‑4.5 | 6.723 | 1.207 |
| o3 | 7.026 | 3.663 |
| DeepSeek‑R1 | 23.815 | 2.160 |

Important: Jegham v6 (November 2025) does not include GPT‑4.1 (full), Claude Opus 4, Claude Sonnet 4, or Gemini 3 Pro — these models were released after data collection. Our per-token estimates for these models are derived from the closest benchmarked equivalents and architectural reasoning. As of February 2026, no published paper provides direct energy measurements for any of these current-generation models.

Why Output Tokens Cost More Than Input

Multiple sources confirm that generating output tokens is significantly more energy-intensive than processing input tokens. We use a conservative 5:1 ratio based on API pricing, but direct measurement suggests the true ratio may be higher.

| Source | Output:Input Ratio | Method |
|---|---|---|
| Caravaca et al. 2025 | ~11× | Direct GPU measurement |
| Couch 2026 (via pricing proxy) | ~5× | API pricing ratio |
| SemiAnalysis (cited by Couch) | ~15× (smaller scales) | Industry analysis |

Energy Breakdown by Component (Google 2025)

Google's technical paper (arXiv:2508.15734) provides the only published component-level breakdown for production AI inference:

| Component | Share |
|---|---|
| TPUs / GPUs | 58% |
| CPU and memory | 24% |
| Operational redundancy | 10% |
| Data center overhead (cooling, etc.) | 8% |

5. Data Center Infrastructure

Once we know how much energy a model consumes, infrastructure multipliers convert that into real-world impacts. These factors come from Jegham et al. Table 1, supplemented with provider sustainability reports (Google, Microsoft, AWS, Scaleway) and independent verification.

Provider Infrastructure Factors

| Provider | Location | PUE | WUE On-Site (mL/Wh) | WUE Off-Site (mL/Wh) | WUE Total (mL/Wh) | CIF (gCO₂/Wh) | Renewable |
|---|---|---|---|---|---|---|---|
| Azure (OpenAI) | US East | 1.12 | 0.30 | 4.35 | 4.61 | 0.350 | ~60% (100% matched via RECs) |
| AWS (Anthropic, Perplexity) | US East (Virginia) | 1.14 | 0.18 | 5.11 | 5.27 | 0.287 | 100% matched via RECs |
| GCP (Google) | Various US, TPUs | 1.09 | — | — | ~1.08 ¹ | ~0.125 ² | 66% CFE (hourly) |
| Scaleway (GreenPT) | DC5, France | 1.25 | 0.067 | ~0.48 | ~0.55 | 0.065 ³ | 100% (wind/hydro, GO) |

PUE Cross-Validation

Power Usage Effectiveness measures how efficiently a data center delivers energy to its computing equipment. A PUE of 1.0 would mean zero overhead; the industry average is 1.56.

| Provider | Our Value | Provider-Reported (2024) | CCF Open Source | Assessment |
|---|---|---|---|---|
| Azure | 1.12 | 1.12 (design target), 1.16 (global avg) | 1.185 (fleet-wide) | Matches next-gen DC design PUE |
| AWS | 1.14 | 1.15 (2024 global) | ~1.135 | Slightly optimistic vs reported 1.15 |
| GCP | 1.09 | 1.09 (2024 fleet avg) | 1.1 | Accurate. Best: 1.07 (Oregon) |
| Scaleway DC5 | 1.25 | 1.25 (DC5 2024) | N/A | Updated from 1.15 (historical). Fleet avg is 1.37 |
| Industry avg | — | 1.56 (2024 survey) | — | Enterprise on-premise: 1.63 (IDC) |

WUE Cross-Validation

Water Usage Effectiveness measures how much water a data center consumes per unit of energy. Lower is better.

| Provider | Reported WUE (L/kWh) | Year | Notes |
|---|---|---|---|
| AWS | 0.15 | 2024 | 17% improvement from 2023; best-in-class among hyperscalers |
| Microsoft Azure | 0.30 | FY2024 | 39% improvement from 0.49 in 2021. New zero-water evaporation designs starting Aug 2024 |
| Google GCP | ~1.0 | 2024 | Annualized global on-site water efficiency. Total: ~22.7 billion liters in 2024 (+8% YoY) |
| Industry avg (hyperscale) | 0.45–0.48 | 2024 | Berkeley Lab 2024 US Data Center Energy Report projection |
Notes:
¹ Google WUE derived from official figures: 0.26 mL per median prompt ÷ 0.24 Wh = ~1.08 mL/Wh.
² Google carbon intensity is a blended estimate using 66% CFE × regional grid mix. Varies significantly by region: Iowa 87% CFE, South Carolina 31% CFE, Oregon 87% CFE.
³ GreenPT/Scaleway CO₂: Scaleway's Environmental Footprint Calculator publishes PAR-2 (DC5) carbon intensity as 0.065 kgCO₂e/kWh, calculated using EMBER electricity mix data × DC5 PUE, following a location-based methodology per ADEME PCR guidelines (deliberately excluding their 100% renewable Guarantees of Origin).

6. Lifecycle Factors: Primary Energy & Abiotic Depletion

Beyond direct energy, CO₂, and water, we track two lifecycle metrics that capture the broader environmental cost of electricity generation itself. These come from EcoLogits / ADEME Base Empreinte® / ecoinvent databases.

| Region / Grid | Primary Energy (MJ/Wh) | ADPe (kg Sb eq/Wh) | Used By |
|---|---|---|---|
| US grid (Azure, AWS, GCP) | 0.0096884 | 9.855 × 10⁻¹¹ | OpenAI, Anthropic, Google, Perplexity |
| France grid (Scaleway) | 0.0093135 | 4.858 × 10⁻¹¹ | GreenPT |

Primary Energy (PE)

Primary Energy measures the total energy extracted from nature — fossil fuels, nuclear fuel, wind, solar — to produce each watt-hour of electricity you consume. It includes all the losses along the way: fuel extraction, refining, transport, and generation inefficiency. A factor of ~0.0097 MJ/Wh means roughly 2.7× the direct electricity is consumed as primary energy from nature.
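The ~2.7× figure follows directly from unit conversion (1 Wh = 3,600 J = 0.0036 MJ):

```python
MJ_PER_WH = 0.0036          # 1 Wh = 3,600 J
us_pe_factor = 0.0096884    # US grid, MJ of primary energy per Wh delivered

# Primary energy drawn from nature per unit of electricity consumed.
ratio = us_pe_factor / MJ_PER_WH   # ≈ 2.69×
```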

Abiotic Depletion Potential for Elements (ADPe)

ADPe measures the depletion of non-renewable mineral and metal resources — lithium, copper, gold, rare earths — required for the electricity generation infrastructure. It's expressed in kilograms of antimony equivalent (kg Sb eq) per the CML-IA characterization method from Leiden University. France has lower ADPe than the US grid because nuclear-dominated generation requires less diverse mineral extraction than a fossil-fuel-heavy mix.

Notes:
• PE and ADPe are infrastructure-level lifecycle factors — they depend on the electricity grid, not the model itself.
• These factors capture only the operational electricity lifecycle, not hardware manufacturing (embodied impacts).
• Values derived from the same ADEME/ecoinvent databases used by EcoLogits for LCA compliance (ISO 14044).
• The same uncertainty ranges applied to energy, water, and CO₂ are also applied to PE and ADPe.

7. Real-World Comparisons

Raw numbers like "0.15 Wh" or "0.04 gCO₂e" are hard to grasp. For each metric, we display a real-world comparison in the tooltip to make the numbers tangible. Here's what we compare to and why.

Energy

| Reference | Value (Wh) | Source | Validated |
|---|---|---|---|
| HD streaming video (1 sec) | 0.033 | IEA 2020, Carbon Brief 2020 | ~0.12 kWh/hr for HD streaming (DC + network + device) |
| A Google search | 0.3 | Google official (2009), reaffirmed 2022 | Consistently cited since 2009. Server-side only. |
| Charging a smartphone | 19 | ~5,000 mAh × 3.8 V = 19 Wh | EnergySage confirms 14.8–19.3 Wh range |

Selection logic: < 2 minutes → "≈ X seconds of streaming video"; < 50 searches → "≈ X Google searches"; else → "≈ X% of a phone charge"
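The selection rule above can be sketched as a small function. The thresholds and reference values are taken directly from the table; the function name and output strings are illustrative, not the tooltip's exact wording.

```python
STREAMING_WH_PER_SEC = 0.033   # HD streaming video, per second
SEARCH_WH = 0.3                # one Google search
PHONE_CHARGE_WH = 19           # full smartphone charge

def energy_comparison(energy_wh: float) -> str:
    """Pick the tooltip comparison for an energy value."""
    seconds = energy_wh / STREAMING_WH_PER_SEC
    if seconds < 120:                       # under 2 minutes of streaming
        return f"≈ {seconds:.1f} seconds of streaming video"
    searches = energy_wh / SEARCH_WH
    if searches < 50:                       # under 50 searches
        return f"≈ {searches:.0f} Google searches"
    return f"≈ {energy_wh / PHONE_CHARGE_WH * 100:.0f}% of a phone charge"
```

A typical 0.18 Wh message maps to a few seconds of streaming; only multi-watt-hour reasoning queries reach the search or phone-charge tiers.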

Water

| Reference | Value (mL) | Source | Validated |
|---|---|---|---|
| A single water drop | 0.05 | US Pharmacopeia standard: 20 drops/mL | Actual drops vary (0.027–0.043 mL); 0.05 mL is accepted convention |
| A teaspoon of water | 5 | Standard culinary measure | 1 tsp = 5 mL exactly |

Selection logic: < 100 drops → "≈ X water drops"; else → "≈ X teaspoons"

CO₂ Emissions

| Reference | Value (gCO₂) | Source | Validated |
|---|---|---|---|
| Petrol car per metre | 0.130 | EEA 2024 — European on-road fleet (~130 g/km est.) | EEA 2024 new-car avg: 108 g/km. On-road fleet higher due to older vehicles. EU 2025 target: 93.6 g/km. |
| A Google search | 0.2 | Google official (2009), reaffirmed 2022 | Server-side only. Consistently cited. |

Note on car CO₂: We use 130 g/km for the European on-road fleet (all vehicle ages). The EEA reported 108 g/km for 2024 new cars, down from 160–170 g/km in 2015–2019 due to EV adoption and EU emissions standards. The total on-road fleet is higher because older, less efficient vehicles remain in service (average European car age ~12 years). The US EPA fleet average is ~249 g/km.

Selection logic: < 100 metres → "≈ driving X metres"; else → "≈ X Google searches"

Primary Energy

| Reference | Value (kJ) | Source | Validated |
|---|---|---|---|
| A wooden match burned | 1 | Chemical energy of match head | Literature range: 1.05–2.14 kJ. Our value is a lower-end approximation. |
| A food Calorie (kcal) | 4.184 | Thermodynamic definition | Exact: 1 food Calorie (kcal) = 4.184 kJ. No uncertainty. |
| Boiling a cup of water | 300 (0.3 MJ) | ~250 mL × 80°C rise × 4.184 J/g°C | Direct thermal energy ~84 kJ. With kettle efficiency (~85%): ~99 kJ. We use 300 kJ to represent the full primary energy cost including generation losses (~3×). |

Selection logic: < 50 matches → "≈ X matches burned"; < 100 Cal → "≈ X food Calories"; else → "≈ boiling X cups of water"

Abiotic Depletion (ADPe)

ADPe is expressed in kg Sb eq (antimony equivalent) — an abstract unit. To make it tangible, we convert to the equivalent mass of copper that would need to be mined to cause the same level of mineral depletion, using the CML-IA 2016 characterization factor for copper (1.4 × 10⁻³ kg Sb eq/kg Cu).

| Reference | Value | Source |
|---|---|---|
| Copper mining (per kg) | 1.4 × 10⁻³ kg Sb eq | CML-IA 2016, Leiden University |
| Smartphone (Fairphone 5) | 1.25 × 10⁻³ kg Sb eq | Fairphone 5 LCA 2024 |

For context: manufacturing one smartphone (1.25 × 10⁻³ kg Sb eq) causes the same mineral depletion as generating ~3,000 kWh of electricity — roughly one year of average European household electricity. A typical LLM query's ADPe equates to mining 0.001–0.05 mg of copper.
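The copper conversion is a single division by the CML-IA characterization factor; `adp_as_copper_mg` is an illustrative helper name, and the query below reuses the US-grid ADPe factor from section 6.

```python
CU_KG_SB_EQ_PER_KG = 1.4e-3   # CML-IA 2016 factor: kg Sb eq per kg of copper

def adp_as_copper_mg(adp_kg_sb_eq: float) -> float:
    """Express an ADPe value as the mass of mined copper (mg) with equal depletion."""
    return adp_kg_sb_eq / CU_KG_SB_EQ_PER_KG * 1e6   # kg -> mg

# A 0.18 Wh query on the US grid (ADPe factor 9.855e-11 kg Sb eq/Wh):
copper_mg = adp_as_copper_mg(0.18 * 9.855e-11)   # ~0.013 mg of copper
```

The result lands inside the 0.001–0.05 mg range quoted above for a typical LLM query.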

Additional Reference Values (not used in tooltip)

| Reference | Value | Source |
|---|---|---|
| LED bulb for 1 second | 0.0028 Wh | 10 W LED / 3600 s |
| Sending an email | 0.005 Wh | IEA / Berners-Lee 2020 |
| A tablespoon of water | 15 mL | Standard measure |
| A sip of water | 37 mL | Average measured |
| Sending a text message | 0.014 gCO₂ | Berners-Lee 2020 |
| Sending an email | 2 gCO₂ | Berners-Lee 2020 |

8. How Confident Are We?

There is currently no single standardized uncertainty methodology for LLM energy estimation, though progress is being made. The SCI for AI specification (ISO/IEC 21031:2024, ratified December 2025) provides standardized reporting requirements for AI carbon accounting — a critical step toward comparable, reproducible environmental impact reporting.

Different research groups currently use different approaches to quantifying uncertainty:

| Source | Uncertainty Approach | Typical Range |
|---|---|---|
| Jegham et al. 2025 | Confidence intervals from hardware config uncertainty | ±10–50% (model-dependent) |
| Oviedo/Microsoft 2025 | IQR reporting (median + interquartile range) | IQR: 0.18–0.67 Wh on 0.34 median |
| Google 2025 | Median-based reporting to avoid outlier distortion | Production measurement, narrowest |
| EcoLogits | Parametric regression from LLM Perf Leaderboard | Model-parameter-dependent |

Our Uncertainty Ranges

Each model defines uncertainty multipliers (e.g., 0.5–1.5 means the true value could be 50%–150% of our estimate). These are applied uniformly to all five metrics. Here's why we chose ±30–50% as the default range:

  • Jegham et al. error bars range from ±15% (well-known models like GPT‑4o: 0.42 ± 0.13) to ±52% (o3: 7.03 ± 3.66)
  • Oviedo/Microsoft IQR spans roughly ±50% of median (0.18–0.67 on 0.34)
  • Caravaca et al. found batch size alone causes 36× variation (Llama 405B: 21.7 Wh single vs 0.6 Wh batched), but production systems always batch
  • Nature Scientific Reports 2024 identifies hardware, geography, and utilization as the primary uncertainty sources
  • Non-production estimates overstate by 4–20× (Oviedo/Microsoft), suggesting our academic-derived estimates may be conservatively high

Models with more data points (GPT‑4o, Claude 3.7 Sonnet) have tighter ranges; models extrapolated from architectural reasoning (Claude Opus 4, Gemini 3 Pro) have wider ranges.
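Applying the multipliers is a one-line operation; the wide 0.5–1.5 range is the example given above, while the tighter (0.85, 1.15) pair is purely illustrative of a well-benchmarked model.

```python
def with_bounds(nominal: float, multipliers=(0.5, 1.5)):
    """Apply a model's uncertainty multipliers to a nominal estimate.

    The same (min, max) pair is applied uniformly to all five metrics.
    """
    lo, hi = multipliers
    return nominal * lo, nominal, nominal * hi

# Extrapolated models carry a wide range; well-benchmarked ones a tighter one.
wide  = with_bounds(0.45, (0.5, 1.5))    # e.g. an extrapolated estimate
tight = with_bounds(0.42, (0.85, 1.15))  # e.g. a well-measured model (illustrative)
```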

9. Important Caveats

  1. Limited official data. Only Google has published official per-query energy data (0.24 Wh for Gemini, August 2025). Oviedo/Microsoft Research provides the next most rigorous estimate (0.34 Wh median for frontier models). OpenAI's single data point (0.34 Wh for ChatGPT) came from a CEO statement without methodology. All other figures are academic estimates with ~20–50% margin of error.
  2. Hidden reasoning tokens. Reasoning models (o3, o4-mini) generate hidden chain-of-thought tokens that dramatically increase energy consumption. Jegham et al. measured o3 at 7–39 Wh per query — up to 70× more than efficient models. These internal tokens are invisible via the API but consume compute, making per-visible-token metrics potentially misleading for reasoning models.
  3. "100% renewable" claims. AWS and Microsoft use annual Renewable Energy Certificate (REC) matching — purchasing certificates equal to annual consumption from any location and time. This is an accounting solution, not an engineering one: a certificate from a solar farm in Arizona at noon can "offset" fossil-fuelled consumption in New York at midnight. Google's 66% CFE (measured hourly on the same regional grid) is more conservative and transparent. Greenpeace found AWS meeting only 12% of its renewable commitment physically in Virginia.
  4. GreenPT transparency gap. GreenPT has not published per-token energy figures despite marketing as "green AI." Their primary environmental advantage comes from the French nuclear grid (21.7 gCO₂/kWh per RTE France) and Scaleway's efficient data centers (PUE 1.25, adiabatic cooling), rather than demonstrated model-level efficiency.
  5. Perplexity: no sustainability data. No environmental reports, no energy figures, no climate commitments. Infrastructure multipliers are assumed from AWS (their primary cloud provider). The additional energy cost of web search + retrieval is estimated, not measured.
  6. Model version drift. Jegham et al. benchmarked Claude 3.7 Sonnet and o3, not the current generation. Our estimates for Claude Opus 4.6, Sonnet 4.6, GPT‑4.1, and Gemini 3 Pro are derived from the closest benchmarked equivalents and architectural reasoning. No published paper provides direct measurements for these models as of February 2026.
  7. Batch size and utilization. Energy per query varies dramatically with server utilization. Caravaca et al. found 36× reduction from single-prompt to batch-100 for Llama 405B. Oviedo/Microsoft warns non-production estimates overstate energy by 4–20×.
  8. Water accounting. WUE figures include both on-site cooling (evaporative towers, adiabatic systems) and off-site water consumed by electricity generation. Per IEA 2023, two-thirds of total data center water is indirect/off-site. Li et al. (CACM 2025) provides the most comprehensive framework for total water footprint estimation.
  9. Embodied emissions not included. Our calculations cover only operational energy. Hardware manufacturing emissions are significant: NVIDIA's HGX H100 reports 1,312 kg CO₂e cradle-to-gate per system. TechInsights (2026) projects GPU manufacturing emissions to grow ~16× from 2024 to 2030.
  10. Jevons Paradox. As AI becomes more efficient and cheaper, total resource consumption may increase. De Vries (2025) projects AI produced 32.6–79.7 million tonnes CO₂ in 2025 alone. Google's total emissions rose 11% to 11.5M tonnes in 2024 despite per-query efficiency gains.
  11. France carbon intensity validated. Our CIF of 0.065 gCO₂/Wh for Scaleway matches their own Environmental Footprint Calculator published value for DC5. This is higher than RTE France 2024 grid intensity (21.7 gCO₂eq/kWh) because ADEME's regulatory average for France is structurally higher than single-year figures, and the PUE multiplier (1.25) further increases effective carbon per useful kWh.

10. Glossary

| Abbreviation | Full Term |
|---|---|
| ADPe | Abiotic Depletion Potential for elements — depletion of non-renewable mineral/metal resources (kg Sb eq) |
| CFE | Carbon-Free Energy — percentage of electricity from carbon-free sources, measured hourly (Google metric) |
| CIF | Carbon Intensity Factor — grams of CO₂ equivalent emitted per watt-hour of electricity (gCO₂e/Wh) |
| CML-IA | CML Impact Assessment — characterization method for life cycle impact assessment (Leiden University) |
| CO₂e | Carbon dioxide equivalent — standardized unit for greenhouse gas emissions |
| GHG | Greenhouse Gas — gases that trap heat in the atmosphere (CO₂, CH₄, N₂O, etc.) |
| kT | Thousand tokens — unit for measuring LLM input/output volume (1 kT = 1,000 tokens) |
| LCA | Life Cycle Assessment — methodology for evaluating environmental impacts across a product's full lifecycle (ISO 14044) |
| PE | Primary Energy — total energy extracted from natural resources to produce electricity (MJ) |
| PUE | Power Usage Effectiveness — ratio of total data center energy to IT equipment energy (1.0 = perfect) |
| REC | Renewable Energy Certificate — tradable certificate representing 1 MWh of renewable electricity |
| Sb eq | Antimony equivalent — reference unit for comparing mineral/metal resource depletion (CML-IA method) |
| Wh | Watt-hour — unit of energy (1 Wh = 1,000 mWh = 1,000,000 µWh) |
| WUE | Water Usage Effectiveness — water consumed per watt-hour of energy (mL/Wh) |

11. References

  1. Jegham et al. 2025 — "How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference"
    • arXiv: 2505.09598 (v6, November 24, 2025)
    • Provider infrastructure multipliers (PUE, WUE, CIF) from Table 1
    • Per-query energy benchmarks from Table 4 (30+ models)
    • Validated within 19% of OpenAI CEO's disclosed 0.34 Wh/query for GPT‑4o
    • Uses cross-efficiency DEA for multi-dimensional sustainability ranking
  2. Google 2025 — "Measuring the environmental impact of delivering AI at Google Scale"
    • Google Cloud Blog
    • Full Report PDF
    • arXiv: 2508.15734
    • First official per-query measurement: 0.24 Wh energy, 0.03 gCO₂e, 0.26 mL water (median Gemini text prompt)
    • 33× reduction in per-prompt energy over 12 months (May 2024–May 2025)
  3. Epoch AI 2025 — "How much energy does ChatGPT use?"
    • Article
    • Bottom-up FLOP-based estimates for GPT‑4o class models
  4. Couch 2026 — "Electricity use of AI coding agents"
    • Blog
    • Per-token rates derived from Epoch AI data: ~0.39 Wh/MTok input, ~1.95 Wh/MTok output
  5. EcoLogits — Open-source parametric estimation library (v0.9.2, January 2025)
    • Methodology
    • GitHub
    • PyPI
    • LCA framework compliant with ISO 14044
    • Source for PE, ADPe, and WCF lifecycle factors via ADEME Base Empreinte®
  6. Ritchie 2025 — "What's the carbon footprint of using ChatGPT or Gemini?"
    • Substack
  7. GreenPT — Sustainability documentation
    • Sustainability Page
    • Metric Blog Post
  8. Scaleway — Data center environmental reports
    • Environmental Leadership
    • DC5 PUE Dashboard
    • Environmental Footprint Calculator
    • Footprint Estimation Methodology
    • 2024 CSR Impact Report (PDF)
  9. RTE France — Annual electricity review 2024
    • Key Findings (2024)
    • 2025 First Trends (PDF)
    • France grid carbon intensity: 21.7 gCO₂eq/kWh (2024)
  10. Google 2025 Environmental Report
    • Report
    • Region Carbon Data
  11. Microsoft 2025 Environmental Sustainability Report
    • Report
  12. AWS Sustainability
    • Overview
  13. ADEME Base Empreinte® — French Agency for Ecological Transition
    • Base Empreinte®
    • ADEME Data Portal
    • Source for electricity lifecycle data: PE and ADPe per kWh by country/region
  14. Boavizta — Open-source methodology for embodied impacts of IT equipment
    • Methodology
    • Boavizta API
  15. Muxup 2026 — "Per-query energy consumption of LLMs"
    • Article
    • Independent energy benchmarking of open-weight models using InferenceMAX benchmark suite
  16. CML-IA — Characterization factors for life cycle impact assessment (Leiden University)
    • CML-IA Characterisation Factors
    • Abiotic Depletion in LCIA
    • The Abiotic Depletion Potential: Background, Updates, and Future (2016)
  17. Fairphone 5 LCA 2024 — Life Cycle Assessment (Fraunhofer IZM, September 2024)
    • Fairphone Sustainability
    • FP5 LCA Report (PDF)
    • See also: Fairphone 6 LCA (Dec 2025)
  18. Oviedo et al. 2025 — "Energy Use of AI Inference" (Microsoft Research)
    • arXiv: 2509.20241
    • Microsoft Research
    • Median energy per query: 0.34 Wh (IQR: 0.18–0.67 Wh) for frontier models on H100 nodes
  19. Caravaca et al. 2025 — "From Prompts to Power: Measuring the Energy Footprint of LLM Inference"
    • arXiv: 2511.05597
    • 32,500+ measurements across 21 GPU configurations and 155 architectures
    • Output tokens have ~11× greater energy impact than input tokens
  20. Niu et al. 2025 — "TokenPowerBench: Benchmarking the Power Consumption of LLM Inference"
    • arXiv: 2512.03024
    • Super-linear energy scaling: LLaMA‑3 1B to 70B = 7.3× energy increase for 70× parameters
  21. Wilhelm et al. 2025 — "Beyond Test-Time Compute Strategies: Advocating Energy-per-Token"
    • EuroMLSys '25, ACM
    • Chain-of-Thought energy overhead: +72% to +177%
  22. Jin et al. 2025 — "The Energy Cost of Reasoning" (Harvard)
    • arXiv: 2505.14733
  23. Li et al. 2023–2025 — "Making AI Less 'Thirsty'"
    • arXiv: 2304.03271
    • Peer-reviewed: Communications of the ACM, 2025
    • Principled framework covering Scope 1 (on-site) and Scope 2 (off-site) water footprint
  24. De Vries 2025 — "Carbon and Water Footprints of Data Centers"
    • Patterns (Cell Press), 2025
    • AI systems: 32.6–79.7 million tonnes CO₂ in 2025
  25. EcoLogits (JOSS) — "EcoLogits: Evaluating the Environmental Impacts of Generative AI"
    • JOSS Paper (Journal of Open Source Software, 2025)
    • Authors: Samuel Rince et al. (GenAI Impact non-profit)
    • ISO 14044-compliant LCA approach
  26. Mistral 2025 — Official Environmental Report
    • Announcement (January 2025)
    • Per 400-token query: 1.14 gCO₂e and 45 mL water
  27. Pronk et al. 2025 — "Benchmarking Energy Efficiency of Large Language Models Using vLLM"
    • arXiv: 2509.08867
  28. Kumar et al. 2025–2026 — "OverThink: Slowdown Attacks on Reasoning LLMs"
    • arXiv: 2502.02542
  29. Ozcan et al. 2025 — "Quantifying the Energy Consumption and Carbon Emissions of LLM Inference via Simulations"
    • arXiv: 2507.11417
  30. van Oers et al. 2020 — "Abiotic resource depletion potentials (ADPs) for elements revisited"
    • Int J Life Cycle Assess, Springer
  31. Nature Scientific Reports 2024 — "Reconciling the contrasting narratives on the environmental impact of large language models"
    • Nature (2024)
  32. ecoinvent — Life cycle inventory database
    • ecoinvent Electricity
    • Version 3.12 (February 2026), 3,500+ datasets in 250+ geographies
  33. SCI for AI — Software Carbon Intensity for Artificial Intelligence (ISO/IEC 21031:2024)
    • Green Software Foundation — SCI for AI
    • Ratified December 17, 2025
  34. ML.ENERGY Leaderboard v3.0 — Standardized LLM energy benchmarks
    • ML.ENERGY Leaderboard
    • 46 models across 1,858 hardware configurations (December 2025)
  35. NVIDIA HGX H100 Product Carbon Footprint
    • HGX H100 PCF Summary (PDF)
    • Cradle-to-gate: 1,312 kg CO₂e per HGX H100 system
  36. TechInsights (2026) — GPU manufacturing emissions growth projection
    • TechInsights Sustainability Insights
    • 2026 Inflection Point: Semiconductor Sustainability Predictions
  37. Coalition for Sustainable AI — International governance framework
    • Coalition for Sustainable AI
    • AI Action Summit (Wikipedia)
    • Launched at Paris AI Action Summit, February 2025; 58 countries signed
  38. EF 3.1 — Environmental Footprint characterization factors (JRC, 2025)
    • JRC Environmental Footprint
    • Official EU Product Environmental Footprint method
  39. GPT-5 energy estimate — University of Rhode Island AI Lab (August 2025)
    • The Guardian / AI Commission
    • Tom's Hardware analysis
    • GPT‑5 average: ~18.35 Wh per 1000-token query; ~8.6× increase over GPT‑4