Tech deep-dive

How an AI picture calorie counter goes from photo to calories.

Three stages: a vision model identifies foods, a portion estimator guesses how much of each is there, and a nutrition database returns calories and macros. Total time: roughly 2 seconds. Overall accuracy on common meals: within 8% of a registered dietitian's estimate, on average.

The pipeline

Three stages, two seconds.

01 · Detect
A vision model identifies the foods on the plate. "Grilled chicken, white rice, broccoli."
02 · Estimate portion
Plate scale, utensil cues, and visual volume become grams per food. The hardest step.
03 · Look up nutrition
Grams × USDA FoodData Central values. Plain arithmetic, not a guess.
Stage 1 · Detection

What is on this plate?

A vision model takes the photo and returns a structured list of foods detected, with bounding regions and confidence scores. The model recognizes both individual foods (“grilled chicken,” “white rice”) and composed dishes (“Chipotle chicken bowl,” “Big Mac”).
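As a rough illustration of what a "structured list of foods detected" could look like, here is a minimal sketch. The `Detection` class, its field names, the `confident` helper, the 0.5 threshold, and all sample values are assumptions made for this example, not the product's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected food (hypothetical schema, for illustration only)."""
    label: str                       # e.g. "grilled chicken" or "Big Mac"
    confidence: float                # model confidence, 0.0 to 1.0
    box: tuple[int, int, int, int]   # bounding region: (x, y, width, height) in pixels
    is_composed_dish: bool = False   # True when the dish is treated as a single unit


def confident(detections, threshold=0.5):
    """Keep detections at or above the threshold (0.5 is an arbitrary example value)."""
    return [d for d in detections if d.confidence >= threshold]


# Made-up sample output for the plate described above.
plate = [
    Detection("grilled chicken", 0.94, (120, 80, 260, 200)),
    Detection("white rice", 0.91, (400, 90, 220, 210)),
    Detection("broccoli", 0.42, (150, 320, 180, 140)),
]

labels = [d.label for d in confident(plate)]
print(labels)
```

In a design like this, low-confidence items would be the natural candidates to surface to the user for a quick confirmation.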

For composed dishes from common chains, the model identifies the dish as a unit, then pulls posted nutrition data instead of summing the components. Detection is usually accurate; it's not where most error lives.

Stage 2 · Portion

The hardest stage.

The portion estimator looks at each detected food and estimates how much is there in grams. It uses plate scale (standard plate sizes are known), utensil scale (forks and spoons provide secondary references), and visual volume (depth cues from shadows and color gradient).

Once volume is estimated, the model multiplies by the known density of each food (rice is ~0.7 g/cm³, beef is ~1.0 g/cm³) to get grams. Portion estimation is where most of the error in any AI picture calorie counter lives. The interface compensates with a confidence indicator and one-tap adjust.
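The volume-to-grams step described above can be sketched in a few lines. The two densities are the approximate figures quoted in the text; the function name and the 300 cm³ sample volume are assumptions made for this example.

```python
# Approximate densities quoted in the text, in grams per cubic centimetre.
DENSITY_G_PER_CM3 = {
    "rice": 0.7,
    "beef": 1.0,
}


def grams_from_volume(food: str, volume_cm3: float) -> float:
    """Convert an estimated volume to grams: grams = volume x density."""
    if volume_cm3 < 0:
        raise ValueError("volume must be non-negative")
    return volume_cm3 * DENSITY_G_PER_CM3[food]


# Hypothetical estimate: the model sees about 300 cm3 of rice on the plate.
rice_grams = grams_from_volume("rice", 300.0)
print(f"{rice_grams:.0f} g")
```

Because grams scale linearly with volume, a 10% error in the volume estimate becomes a 10% error in grams, and then in calories.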

Stage 3 · Lookup

Plain arithmetic, not a guess.

Calorie values are not guessed by the AI. Each detected food maps to a public nutrition database, and calories are calculated deterministically from estimated grams.

Primary database: USDA FoodData Central. Open Food Facts covers international packaged products. Restaurant chain menus are stored as a curated dataset, refreshed quarterly.

Example calculation
200g cooked white rice (detected)
× 1.30 kcal/g (USDA FDC ID 169757)
= 260 kcal
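The same calculation as code, as a minimal sketch. The in-memory `KCAL_PER_GRAM` table is a stand-in for a real database query, and the function name is an assumption; the 1.30 kcal/g figure is the one from the example above.

```python
# Stand-in for a nutrition database lookup (hypothetical in-memory table).
# The value is the one used in the worked example: 1.30 kcal per gram.
KCAL_PER_GRAM = {
    "cooked white rice": 1.30,
}


def calories(food: str, grams: float) -> float:
    """Deterministic lookup: calories = grams x kcal per gram."""
    return grams * KCAL_PER_GRAM[food]


total = calories("cooked white rice", 200)
print(f"{total:.0f} kcal")
```

The same inputs always give the same output, which is what makes this stage easy to verify by hand.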
Error budget

Where the 8% lives.

Stage | Typical error contribution
Food detection (right or wrong food) | Low. Under 2% on most meals.
Portion estimation (right food, wrong amount) | Dominant. 5 to 10% on typical meals.
Nutrition lookup (right food, right amount) | Trivial. Under 1% (database accuracy).
Why we don't just ask

Single-call AI is a black box.

A simpler design would be: feed the photo to a multimodal model, ask “how many calories is this?”, return the answer. We don't do this. Single-call calorie estimation can hallucinate confidently. There's no way to debug a wrong answer or correct one stage.

The three-stage pipeline is slightly slower but transparent. If detection is wrong, one tap fixes it. If portion is wrong, the slider fixes it. The user sees what the AI saw and corrects each stage independently.
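One way to picture independent correction is to keep each stage's output as its own field, so that fixing one field only re-runs the arithmetic after it. The sketch below is an assumption about how that could be modelled, not the app's actual code; `Item`, `correct_portion`, and the 150 g adjustment are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Item:
    """Per-food pipeline state (hypothetical model)."""
    label: str            # output of detection
    grams: float          # output of portion estimation
    kcal_per_gram: float  # output of the database lookup

    @property
    def kcal(self) -> float:
        # Recomputed from the current fields every time it is read.
        return self.grams * self.kcal_per_gram


def correct_portion(item: Item, new_grams: float) -> Item:
    """User moves the slider: only the portion changes, the label and lookup stay."""
    return replace(item, grams=new_grams)


estimated = Item("cooked white rice", grams=200, kcal_per_gram=1.30)
adjusted = correct_portion(estimated, 150)  # user says the portion was smaller
print(f"{estimated.kcal:.0f} kcal -> {adjusted.kcal:.0f} kcal")
```

A single-call design has no equivalent of `grams` to adjust: the only output is the final number.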

Inputs handled

Six ways in.

Input | How it works
Photos | The default. Top-down or 45° angle is best.
Recipe screenshots | Ingredient lists are parsed from screenshot text.
Menu screenshots | Restaurant menu item names are recognized as food entities.
Nutrition label photos | Read directly. Most accurate input because lookup is exact.
Text only | "I had a chicken sandwich and a small fries" works without a photo.
Voice | Voice transcription feeds the same text pipeline.
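The six inputs can be thought of as different entry points into the same pipeline. The routing table below is a loose sketch of that idea. Every stage name in it is hypothetical, and the exact steps each input goes through are an assumption inferred from the table, not a description of the real system.

```python
from enum import Enum


class InputKind(Enum):
    PHOTO = "photo"
    RECIPE_SCREENSHOT = "recipe_screenshot"
    MENU_SCREENSHOT = "menu_screenshot"
    NUTRITION_LABEL = "nutrition_label"
    TEXT = "text"
    VOICE = "voice"


# Hypothetical stage names; each tuple lists the steps an input might pass through.
ROUTES = {
    InputKind.PHOTO: ("detect", "estimate_portion", "lookup"),
    InputKind.RECIPE_SCREENSHOT: ("read_text", "parse_foods", "lookup"),
    InputKind.MENU_SCREENSHOT: ("read_text", "parse_foods", "lookup"),
    InputKind.NUTRITION_LABEL: ("read_label",),  # values come straight off the label
    InputKind.TEXT: ("parse_foods", "lookup"),
    InputKind.VOICE: ("transcribe", "parse_foods", "lookup"),  # then same as text
}


def route(kind: InputKind) -> tuple:
    """Return the (assumed) sequence of stages for an input kind."""
    return ROUTES[kind]


print(route(InputKind.VOICE))
```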
Photo to calories · FAQ

Questions, answered.

Three stages: identify foods, estimate portions, look up nutrition. Detection is usually accurate. Portion estimation is the wildcard. Nutrition lookup is a deterministic database query, not a guess by the AI.

Try the pipeline. Free, two seconds.

Three stages, one photo. Within 8% of a registered dietitian on average.