All it takes is a photo of your food, and within seconds you get calories, macros, and an “analysis.”

Sounds simple. And more importantly, accurate.

But here’s the problem: AI doesn’t actually calculate calories. It estimates them.

It doesn’t know the exact amounts of ingredients, it doesn’t understand how the food was prepared, and it doesn’t see everything that’s really in the meal. It works with what it “sees” and, even more, with what it assumes is probably there based on its training data.

And that’s where the gap starts: between what looks precise and what actually is.

The goal of this article isn’t to criticize AI. The goal is to show its limits.

Because if you understand them, you can use AI as a tool. If you don’t, it can easily become a source of misleading decisions.

Here are the 10 most common problems that currently affect the accuracy of AI in nutrition.

1. The Illusion of the Third Dimension: Pixels Don’t See Real Volume

One of the fundamental technical limitations of AI when evaluating food from a photo is that it processes images as a two-dimensional signal. It does not perceive true depth or portion volume the way humans do in real space.

Without depth-sensing technologies, such as LiDAR, AI cannot reliably determine whether food on a plate is spread flat or stacked vertically. As a result, it tends to systematically underestimate the amount of food, and this issue becomes more pronounced as portion size increases.
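
To make the ambiguity concrete, here is a toy calculation in Python. Every number in it (footprint, heights, density) is a made-up assumption for illustration, not data from any study:

```python
# Toy sketch: two servings with the SAME top-down footprint in a photo
# can have very different volumes, because height is invisible from above.
# All numbers below are invented for illustration.

area_cm2 = 150            # visible footprint of rice on the plate
density_g_per_cm3 = 0.8   # assumed rough density of cooked rice

for label, height_cm in [("spread flat", 1.5), ("piled up", 4.0)]:
    volume_cm3 = area_cm2 * height_cm
    weight_g = volume_cm3 * density_g_per_cm3
    print(f"{label}: {volume_cm3:.0f} cm^3, about {weight_g:.0f} g")

# spread flat: 225 cm^3, about 180 g
# piled up: 600 cm^3, about 480 g (same footprint, 2.7x the food)
```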

The data shows this quite clearly:

  • ChatGPT-4 underestimated food weight in 76.3% of cases
  • The average error in weight estimation was around 36% for models like ChatGPT and Claude, while Gemini reached errors of up to 65%
  • In one test, AI estimated a portion of curry at 255 g, while the actual weight was 480 g, meaning the model “missed” nearly half of the food

Importantly, this issue doesn’t affect only weight. If AI misjudges volume, it will also miscalculate calories and nutrients. The error doesn’t happen at the calculation stage; it starts at the very beginning.

While accuracy can be relatively acceptable for small portions, it breaks down significantly for medium and large portions (p < 0.001, meaning it is extremely unlikely that this result is due to chance).

In practice, this means these tools should not be used as a precise “digital scale.” They can serve as a rough estimate, but they tend to underestimate reality. For users who require accurate energy and nutrient intake, such as in clinical nutrition or performance-focused settings, this limitation is critical.

2. Fat Blindness: Hidden Calories in Oils and Sauces

One of the biggest limitations of AI when analyzing food from a photo is its inability to work with non-visible components. The model evaluates only what it sees on the surface, while ignoring ingredients that are absorbed into the food or hidden within its structure, such as oils, butter, or dressings. This issue is referred to as “non-visible component blindness.”

From a nutritional perspective, this is a critical problem. Fat is the most energy-dense macronutrient, and even small amounts can significantly impact total calorie intake. If AI fails to detect it, the entire calculation becomes systematically underestimated.

Research findings highlight this clearly:

  • In the ChatGPT-4o model, fat estimation error reached as high as 76.5%
  • Even with simple foods like hazelnuts, the model misestimated fat content by approximately 75%
  • With more complex meals, the issue becomes even more pronounced; for example, in a tuna salad, AI initially identified only about 24% of the actual fat content

The problem isn’t that the model makes random mistakes. It simply doesn’t “see” fat. If oil is absorbed into the food or mixed into a sauce, the model has no visual signal that it’s there.

Interestingly, accuracy improves significantly when a text description is added to the image. When researchers included details such as “2 tablespoons of oil,” the accuracy of energy estimation, measured by R² (a statistic where 1.0 means perfect agreement with reality), increased from 0.59 to 0.94. This shows that the issue is not in the calculation itself, but in missing input data.
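
For readers unfamiliar with the metric, here is a minimal sketch of how R² is computed. The kcal values are invented purely to illustrate the “photo only” vs. “photo plus text” gap; they are not the study’s data:

```python
def r_squared(actual: list[float], predicted: list[float]) -> float:
    """Coefficient of determination: 1.0 means perfect agreement."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Invented example: true kcal values vs. two sets of AI estimates.
actual = [320, 540, 710, 450]
photo_only = [280, 430, 560, 380]        # hidden fats missed
photo_plus_text = [310, 520, 690, 460]   # "2 tbsp of oil" included

print(round(r_squared(actual, photo_only), 2))       # ~0.49
print(round(r_squared(actual, photo_plus_text), 2))  # ~0.99
```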

In practice, this means that estimates based solely on a photo are insufficient for accurate nutrition tracking. Without information about preparation methods and added fats, actual intake can be higher by hundreds of calories per day. These tools should not be seen as a complete solution, but rather as a rough aid that requires manual input of key details.

3. Nutritional Hallucinations: Probability of Words Instead of Real Data

AI models in nutrition suffer from a phenomenon known as “nutritional hallucinations.” Unlike specialized nutrition software, they do not perform precise calculations based on real chemical databases. Instead, they generate responses based on the statistical probability of words and patterns learned from text.

This means AI does not work with actual nutritional values in real time. It produces answers that sound correct. It lacks a true understanding of relationships between nutrients, which leads to incorrect combinations, even when the output appears convincing.
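
The difference between a database tool and a generative model can be sketched in a few lines of Python. The numbers and the “generative” function are toy stand-ins, not how any real model or database is implemented:

```python
import random

# Illustrative values only, not real database entries.
NUTRIENT_DB = {"scrambled eggs, 100 g": {"kcal": 149, "carbs_g": 1.6}}

def database_lookup(food: str) -> dict:
    # Deterministic: same input, same output, traceable to a table.
    return NUTRIENT_DB[food]

def generative_estimate(food: str) -> dict:
    # Toy stand-in for an LLM: plausible-sounding numbers drawn from a
    # learned distribution, with no table behind them. If the model
    # "sees" pasta instead of eggs, the distribution itself is wrong.
    return {"kcal": round(random.gauss(160, 40)),
            "carbs_g": round(random.gauss(8, 6), 1)}

print(database_lookup("scrambled eggs, 100 g"))  # always the same answer
print(generative_estimate("scrambled eggs"))     # different every run
```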

Research shows that these errors are not rare, but repeatable.
When food is misidentified, the deviations can be extreme:

  • Claude 3.5 Sonnet confused scrambled eggs with pasta, resulting in a 1788% overestimation of carbohydrates
  • Gemini 1.5 Pro identified falafel as meatballs, leading to a 360% overestimation of protein

Even more advanced models are not immune. ChatGPT-4 showed statistically significant inaccuracies in 10 out of 16 monitored nutrients and tended to systematically underestimate as many as 11 of them. For 13 nutrients, including potassium, fiber, and vitamin D, the deviation from reality exceeded 10%.

The problem is not just the error itself, but how it is presented.

AI delivers outputs in a smooth, authoritative tone, often supported by tables and numbers that appear scientific. For the average user, it is practically impossible to distinguish between a precise calculation and a “probable estimate.”

In practice, this means AI should not be treated as a reliable calculation tool, but rather as a supportive text-based tool.

In situations where accuracy matters, especially in health conditions or clinical diets, human oversight is essential. Misinterpreted or “hallucinated” data can represent a real risk to health.

4. Clinical Risk: When Inaccuracy Becomes a Health Threat

In clinical nutrition, AI inaccuracy is no longer just a statistical deviation; it becomes a real risk. Generative models are not able to apply medical guidelines with the required precision or account for individual patient limits, where exact nutrient amounts matter.

In chronic conditions such as kidney disease, diabetes, or cardiovascular disorders, even relatively small deviations can lead to a worsening of health. In these cases, the issue isn’t that AI isn’t perfect; it’s that its errors have real consequences.

Research shows that these deviations can be significant even in critical parameters:

  • When generating meals for dialysis patients, ChatGPT-4 underestimated:
    • potassium by 49%
    • energy by 36%
    • protein by 28%
  • The app Fastic reported sodium levels up to 34 times higher than reality
  • Fitbit reported approximately 20 times higher iron content

These are not just theoretical errors. For patients who must monitor specific minerals or macronutrients, even a ~30% deviation can pose a health risk.

It’s also important to understand how “good” results are interpreted. In one evaluation, 97% of ChatGPT’s energy estimates fell within ±40% of USDA reference values. At first glance, this may seem like high accuracy. In practice, however, a 40% deviation means that a meal estimated at 500 kcal could actually be anywhere between 300 and 700 kcal, a difference that significantly impacts any dietary plan.
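
Worked out over a full day, the band gets even wider. The meal values below are invented; only the ±40% band comes from the evaluation above:

```python
# Illustrative arithmetic only: what a +/-40% per-meal band can mean
# across a day of logging. Meal values are invented.
logged_meals_kcal = [450, 650, 700, 300]  # breakfast through evening snack

total = sum(logged_meals_kcal)                      # 2100 kcal
low, high = round(total * 0.6), round(total * 1.4)  # 1260, 2940
print(f"logged: {total} kcal, plausible actual: {low}-{high} kcal")
```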

The issue is not only accuracy, but also context. AI can generate recommendations that are not aligned with specific diagnoses, for example:

  • suggesting sugary drinks when trying to manage sugar intake
  • presenting processed foods as “suitable” sources of nutrients

In such cases, the model does not demonstrate clinical judgment; it simply generates a probable answer without considering risk.

In practice, this means AI should not be used to independently manage diet in medical conditions. It can serve as a supportive tool, but final decisions must remain under professional supervision. Without this oversight, what appears to be a helpful assistant can become a source of inaccurate and sometimes inappropriate recommendations.

5. Cultural Blindness: Algorithms Trained on a “Western” Plate

Current nutrition apps and AI models show a significant limitation known as “cultural blindness.” Most systems are trained primarily on Western databases, especially the U.S.-based USDA, where meals are typically represented as clearly separated components on a plate.

This approach does not work well for complex, mixed, or layered dishes that are common in Asian, Mediterranean, or Middle Eastern cuisines. In these cases, AI often fails to identify individual components and their proportions, leading to significantly distorted estimates.

Results from large-scale testing (Li et al., 2024) show that this is not an isolated issue, but a systematic bias:

  • for Western diets, apps overestimated energy intake by an average of 1040 kJ (approx. +250 kcal)
  • for Asian diets, they underestimated it by −1520 kJ (approx. −360 kcal; 95% CI: −2165 to −874 kJ)

For specific foods, the discrepancies can be even larger. For example, AI underestimated the energy content of Pearl Milk Tea by up to 76%. For dishes like pho or stir-fry, systems often failed to correctly identify individual ingredients, resulting in highly inaccurate calculations.

These deviations are not limited to total energy, but also affect macronutrient composition:

  • in Western diets, some apps reported carbohydrate intake higher by 7–8% of total energy
  • in Asian diets, fat intake was on average 6% higher than reference values
  • across multiple diet types, carbohydrate intake was systematically overestimated

These differences highlight that models do not operate with a universal understanding of food, but rather rely on data shaped by a specific cultural context. When that context does not match the user’s dietary habits, the results can be significantly skewed.

In practice, this means users receive data that may appear precise, but does not reflect the actual food they consumed. This issue is especially pronounced with mixed dishes, where AI cannot accurately separate components or estimate their quantities.

For this reason, relying solely on AI for analyzing complex or regional dishes without manual verification is not recommended. A more accurate approach is to search for specific foods in a database or input the meal ingredient by ingredient. Visual recognition in these cases often fails due to limited diversity in training data.

6. Nutritional Imbalance: Healthy Foods in the Wrong Proportions

At first glance, AI seems to get things right. It can build a meal plan that includes “healthy” foods: vegetables, yogurt, fish, whole grains. The problem is that nutrition isn’t just about what you eat, but especially about the proportions in which you eat it.

And this is where a major limitation appears.

AI models do not operate with a real understanding of physiology or biochemical relationships between nutrients. They do not optimize meal plans the way a professional would. Instead, they generate them based on probability: what commonly appears together, not what makes nutritional sense together.

The result is a meal plan that may look “clean,” but doesn’t function properly underneath.

The data confirms this. In a study (Kaya Kaçar et al., 2025), where AI generated 30 weight-loss meal plans (1400–1800 kcal), the models achieved relatively solid overall quality scores (around 71 points on the DQI-I scale). They showed sufficient variety and included all major food groups.

However, when nutritional balance was evaluated, meaning the ratios of macronutrients and fatty acids, the models essentially failed:

  • the average balance score was only 0.27 out of 10
  • ChatGPT 4.0 scored 0.0 out of 10
  • other models were around 0.4 out of 10

In other words, AI can choose “good foods,” but it cannot combine them correctly.

The biggest issue lies in proportions, especially between:

  • protein, fat, and carbohydrates
  • different types of fats (saturated vs. unsaturated)
  • omega-6 and omega-3 fatty acids

These are not minor details. They are fundamental to how the body functions and directly affect inflammation, cardiovascular health, hormonal balance, and overall metabolism.

With incorrect ratios, a meal plan can look “healthy on paper,” but be harmful in the long term.
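
For a sense of what even a basic proportion check looks like, here is a minimal sketch. It uses the commonly cited acceptable macronutrient ranges (protein 10–35%, fat 20–35%, carbohydrates 45–65% of energy); the example plan is invented:

```python
# Acceptable macronutrient distribution ranges (share of total energy).
AMDR = {"protein": (0.10, 0.35), "fat": (0.20, 0.35), "carbs": (0.45, 0.65)}

def macro_split(protein_g: float, fat_g: float, carb_g: float) -> dict:
    kcal = protein_g * 4 + fat_g * 9 + carb_g * 4  # Atwater factors
    return {"protein": protein_g * 4 / kcal,
            "fat": fat_g * 9 / kcal,
            "carbs": carb_g * 4 / kcal}

# Invented plan: "healthy" foods on paper, but fat-heavy in proportion.
split = macro_split(protein_g=60, fat_g=90, carb_g=120)
for macro, share in split.items():
    low, high = AMDR[macro]
    status = "OK" if low <= share <= high else "OUT OF RANGE"
    print(f"{macro}: {share:.0%} ({status})")
```

And this only covers the macro split; fat quality (saturated vs. unsaturated, omega-6 vs. omega-3) adds a further layer that the tested models also failed.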

It’s also worth noting how these plans look in practice. They tend to follow repetitive patterns:

  • very low food variety
  • limited protein sources (e.g., fish only as salmon or cod)
  • complete omission of certain food groups (e.g., red meat)
  • ignoring details like dressings or added fats

This suggests that models are not working with real nutritional logic, but rather with “safe templates” that look healthy but are not truly optimized.

The core issue is that AI cannot handle complexity. Creating a balanced diet requires simultaneously optimizing energy, macronutrients, micronutrients, and fat quality, a combination that current models are not capable of managing.

The biggest risk is the illusion of expertise. The user sees a well-structured plan full of healthy foods and assumes it is correct. In reality, it may just be a random combination of foods without deeper nutritional logic.

That’s why one simple rule applies: AI can be a useful source of inspiration, but not a reliable tool for building a diet plan. Especially in weight-loss diets or medical conditions, professional oversight remains essential.

7. The Large Portion Paradox: The More You Eat, the More AI Lies

At first glance, it may seem like AI makes random errors. In reality, there is a clear and repeatable pattern: the larger the portion, the bigger the error.

AI models tend to “normalize” what they see. Instead of estimating the actual volume, they gravitate toward an average version of a given dish. This may work reasonably well for small portions, but the error increases significantly with larger meals.

The result is systematic underestimation.

Data confirms this. In weight estimation, models like ChatGPT and Claude showed average errors around 36%, while Gemini ranged much higher, between 64% and 109%. The key point is that the error does not grow randomly, but increases directly with portion size.

For small meals, accuracy was relatively good. For medium and large portions, it dropped significantly.

Concrete measurements (actual weight vs. AI estimate) show the trend clearly:

  • small portions: 408 g vs. 430 g (minimal difference)
  • medium portions: 580 g vs. 426 g
  • large portions: 798 g vs. 530 g

In other words: the larger the portion, the more calories “disappear.”
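
Converting those measurements to percent error makes the pattern explicit (a quick calculation, taking the first value in each pair as the actual weight):

```python
# Percent error from the measurements above (actual g, AI estimate g).
portions = {"small": (408, 430), "medium": (580, 426), "large": (798, 530)}

for size, (actual, estimate) in portions.items():
    error_pct = (estimate - actual) / actual * 100
    print(f"{size}: {error_pct:+.1f}%")

# small: +5.4% (slight overestimate)
# medium: -26.6% (underestimate)
# large: -33.6% (the underestimate grows with portion size)
```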

The average deviation was approximately 27.8%, and overall, AI underestimated food weight in 76.3% of cases.

A practical example:
lentil curry: AI estimated 255 g vs. the actual 480 g. Nearly half of the food, and therefore the calories, simply “disappeared.”

This trend is consistent:

  • accuracy for large portions is 20–30% lower than for small ones
  • all models show systematic underestimation
  • the more food on the plate, the greater the deviation

This is not a flaw of a specific model. It is a property of the system.

The problem is that users are not aware of this error. They see a number that looks precise and naturally trust it. But if AI consistently “removes” hundreds of calories from larger meals, actual intake ends up significantly higher than the data suggests.

This leads to a common frustration:
“I eat less, I track everything, but I’m not losing weight.”

In reality, this is not a failure of the person, it’s a systematic error of the tool.

AI today is not a precise measurement tool. It is an estimate. And that estimate has a clear direction: underestimation, especially with larger portions.

If your goal is general orientation, it may be sufficient. If your goal is accuracy, for example in weight loss, it becomes a risk.

That’s why one simple rule applies: when accuracy matters, a scale beats a camera.

8. The Allergy Trap: When an Error Isn’t Just an Error

When it comes to food allergies, there is no room for “close enough.” A meal is either safe or it isn’t. And this is exactly where one of AI’s most dangerous limitations becomes clear.

AI models do not operate with a medical understanding of risk. They cannot assess when an error may have real health consequences. Instead, they generate responses based on probability: what “sounds right,” not what is actually safe.

This means that even if AI is given clear information about an allergy, it cannot reliably guarantee compliance.

Testing confirms this very clearly. When generating 56 meal plans for a person with food allergies, ChatGPT failed in 7% of cases. In practice, this means that 4 meals included an allergen that should not have been there.

A specific example: in a nut-free diet, the model included almond milk without hesitation.

This is not a minor inaccuracy. It is a potential health risk.
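
This is also why any AI-generated plan needs an independent check. A minimal safety-net sketch might look like the following; the term list is illustrative and deliberately incomplete, and a real check would need a curated, medically reviewed allergen database:

```python
# Never rely on the model's own filtering; re-check its output.
NUT_TERMS = {"almond", "hazelnut", "cashew", "walnut", "peanut", "pistachio"}

def contains_nut_term(ingredient: str) -> bool:
    text = ingredient.lower()
    return any(term in text for term in NUT_TERMS)

generated_plan = ["oat porridge", "almond milk", "banana"]
violations = [item for item in generated_plan if contains_nut_term(item)]
print(violations)  # ['almond milk'], exactly the error described above
```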

Even more concerning is that AI does not recognize its own mistakes. It does not flag them, does not express uncertainty, and delivers the answer in a confident, authoritative tone.

A similar issue appears with dangerously low-calorie diets. When AI was intentionally asked to create an extremely low-calorie plan, it produced no warning. Instead, it presented the plan as a valid solution, even though it could lead to serious health consequences in practice.

It’s important to understand that AI can sometimes produce correct answers. In some cases, it generated meal plans aligned with guidelines (e.g., for diabetes or dialysis). The problem is consistency.

When the same request was repeated, the model often produced completely different, and sometimes incorrect, results.

This means it is not a reliable system, but a tool with high variability.

The core issue is the absence of accountability. When AI makes a mistake, there is no mechanism to stop it or label it as dangerous. At the same time, it is unclear who is responsible for the consequences.

For the user, this creates a false sense of safety. The response looks professional, reads smoothly, and often includes seemingly logical explanations. Without expertise, it is nearly impossible to recognize that something is wrong.

With allergies, this becomes critical.

Even a small error can have serious consequences, from acute reactions to anaphylactic shock. Long-term risks also exist, such as nutritional deficiencies caused by poorly designed elimination diets.

That’s why one simple rule applies:
AI can help with guidance, but it must not manage diet in medical conditions.

Especially with allergies, professional supervision should always be present. Otherwise, a seemingly helpful tool becomes a hidden risk.

9. Blindness to Fortification and Brands

AI can recognize what’s on the plate. But it cannot understand what is actually in the food.

When analyzing a meal, it relies entirely on visual input. Anything that isn’t visible on the surface effectively doesn’t exist to the model. This becomes a major issue, especially with processed foods.

AI cannot identify fortification (added vitamins and minerals), nor can it distinguish between different brands unless it has access to the exact product name or visible packaging.

For a human, the difference between two cereals can be significant. One may be fortified with iron and B vitamins, while the other is not. For AI, however, they are simply “flakes.”

As a result, the model relies on average database values rather than specific data. The output may appear precise, but it is only an estimate based on a “typical version” of the food.

Even the models themselves acknowledge this limitation. ChatGPT-4, for example, stated that it cannot determine whether cornflakes are fortified with vitamins and minerals, even though this significantly affects their nutritional profile.

Data supports this quantitatively. In an analysis of 114 meals, the average deviation was approximately 26.9%. For most nutrients, the error exceeded 10%, and in 11 out of 16 cases, AI systematically underestimated values.

This means that while energy and basic macronutrients may appear relatively accurate, micronutrients are often far from reality.

The biggest issue arises when tracking micronutrients. A person with anemia may believe they are consuming enough iron, while in reality they are not. Similarly, for sodium, sugar, or other critical nutrients, AI can systematically distort intake, and the user has no way to detect it.

That’s why one simple rule applies: the more processed the food, the less reliable AI analysis from a photo becomes.

If accuracy matters, it is essential to work with specific data:

  • the exact product name
  • the nutrition label
  • or a database linked to a specific brand

Without that, AI always works with averages. And in this case, “average” often means a deviation that can be nutritionally significant.

10. Stochastic Variability: Different Answer for the Same Photo

Unlike a calculator or a laboratory scale, AI does not function as a precise, repeatable tool. It is a probabilistic system. This means that the same input does not always produce the same output.

In practice, it’s simple: the same photo of a meal can return different values even when nothing about the image has changed.

The reason is what’s known as stochastic variability. The model does not generate answers through fixed calculations, but by selecting the most probable output based on learned data. And this selection can vary slightly each time.

The result is that AI is not fully consistent.

A simple example:
one day, AI estimates your meal at 500 kcal; the next day, using the exact same photo, it estimates 600 kcal. Not because the food changed, but because the model’s output did.

A similar issue appears in diet planning. When the same request was repeated for the same health profile (e.g., a person with diabetes), the models generated different, and sometimes inconsistent, recommendations. This means AI does not create a stable reference point.

And that’s a major problem when tracking progress. If the numbers change not because of your behavior, but because of tool variability, you lose the ability to evaluate what actually works.

In simple terms:
you don’t know whether your body is changing or just the algorithm’s response.

From a scientific perspective, this is an even bigger issue. A reliable tool must be reproducible: the same input should lead to the same result. With AI, this is not currently the case.

That’s why experts suggest working with ranges instead of single values. Instead of relying on one number, multiple outputs should be generated and averaged, or expressed as a confidence interval.
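
A minimal sketch of that approach, with hypothetical repeated estimates for the same photo:

```python
import statistics

# Hypothetical estimates (kcal) from rerunning the SAME photo five times.
estimates = [540, 505, 615, 560, 480]

mean = statistics.mean(estimates)      # 540.0
spread = statistics.stdev(estimates)   # sample standard deviation, ~52

# A rough +/-2 SD working range; a crude heuristic for five samples,
# not a formal confidence interval.
low, high = mean - 2 * spread, mean + 2 * spread
print(f"best estimate: {mean:.0f} kcal, working range: {low:.0f}-{high:.0f} kcal")
# best estimate: 540 kcal, working range: 436-644 kcal
```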

However, this significantly complicates everyday use.

The main risk, once again, is hidden. The output appears precise, clear, and definitive. The user has no reason to question it. In reality, it is just the “best current estimate,” not a stable result.

That’s why one simple rule applies:
AI today is not a measurement tool; it is an estimation tool.

And until it relies on deterministic calculations rather than generative outputs, its accuracy will always depend on the model’s momentary “choice,” not objective reality.

AI in nutrition has enormous potential.

But today, it is not a precise tool. It is an estimation tool.

It can speed up the process, simplify tracking, and provide a basic overview. But it cannot be relied on where details matter. And in nutrition, details matter the most.

The problem isn’t that AI makes mistakes. The problem is that you don’t see them.

The outputs look precise, sound professional, and make sense. And that’s exactly why people trust them more than they should.

If we simplify it into one idea:
👉 AI can help today. But it shouldn’t decide.

If you’re looking for general guidance, it’s a useful tool. If you need accuracy, whether for weight loss, performance, or health, you need more than an estimate.

And that’s exactly where humans, real data, and context still matter.

Sources:
https://www.cambridge.org/core/journals/british-journal-of-nutrition/article/validity-and-accuracy-of-artificial-intelligencebased-dietary-intake-assessment-methods-a-systematic-review/6829E54E37F38BB07D09A97D5982C73D
https://pmc.ncbi.nlm.nih.gov/articles/PMC11243505/
https://pmc.ncbi.nlm.nih.gov/articles/PMC11206595/
https://pubmed.ncbi.nlm.nih.gov/38194819/
https://pubmed.ncbi.nlm.nih.gov/38060823/
https://www.sciencedirect.com/science/article/pii/S088915752501659X
https://pubmed.ncbi.nlm.nih.gov/39125452/
https://pubmed.ncbi.nlm.nih.gov/41081011/
https://pmc.ncbi.nlm.nih.gov/articles/PMC12367769/
https://pmc.ncbi.nlm.nih.gov/articles/PMC11199627/
https://www.mdpi.com/2072-6643/16/15/2573
https://www.mdpi.com/2072-6643/17/4/607
https://www.mdpi.com/2072-6643/17/2/206
https://www.researchgate.net/publication/395491050_Performance_evaluation_of_Three_Large_Language_Models_for_Nutritional_Content_Estimation_from_Food_Images
https://www.researchgate.net/publication/399109330_Image-based_nutritional_assessment_evaluating_the_performance_of_ChatGPT-4o_on_simple_and_complex_meals
https://scholarworks.merrimack.edu/cgi/viewcontent.cgi?article=1195&context=health_facpubs