Personalized AI Wine Recommendation Engines are mostly junk science

Recommending wine is hard for a human, but even harder for a computer.

Feb 28, 2025


With the new wave of generative AI, an onslaught of companies, in wine as in every industry, are claiming they can make personalized recommendations for consumers. This is pure hogwash. A personalized recommendation engine for wine (in which a system can, based on a set of basic inputs, tell you wines you're going to like) is a veritable Three-Body Problem for wine.

I'm not saying that a functional recommendation engine isn't possible for wine. There are many known methodologies for building tools that help surface "approximately" similar wines. Some are more effective and useful than others, but they all assist a buyer in becoming exposed to other products from a retail selection.

Content-Aware Recommendation Engines

Metadata allows easy, blunt groupings of products (e.g., Napa Cabs at the same price, Red Bordeaux Blends, Red wines with the same Parker rating, White wines with the word "crisp" at the same price point). These content-based filtering factors are not personal recommendations but are standard matches and "good enough" for most average wine consumers.

Data inputs help determine the similarity of products for the recommendation engine. However, there are multiple problems with this approach. Metadata is an imperfect tool for correlating matches for a myriad of reasons. First and foremost, the taxonomic attributes of wine do not guarantee specificity or even commonality beyond location, price, and variety. For example, a search for Sonoma Coast Chardonnay at $70/bottle might get you Ceritas Chardonnay Carex Vineyard Sonoma Coast or Moone-Tsai Heitz Vineyard Sonoma Coast Chardonnay. Any reasonably competent taster will tell you that these two wines couldn't be more different in style and almost certainly will not appeal to the same drinker. Content-aware recommendation engines do have one thing working in their favor: a limited product set to choose from for these content matches. For example, Wine.com has one of the largest online selections of any retailer, but when you filter its offerings by metadata, things get small quickly.

A content-aware matching engine only has to filter through 256 Sonoma Pinot Noirs or 756 California Cabernets. That's not super challenging for a human, much less a machine.

Almost any match this type of engine produces will be "Good Enough" for average consumers but in no way represents a truly personalized recommendation.
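To make the bluntness concrete, here is a minimal sketch of metadata matching. The Wine fields, the similar_by_metadata function, the 15% price band, and the Napa entry are all assumptions for illustration, not any retailer's actual logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wine:
    name: str
    region: str
    variety: str
    price: float


def similar_by_metadata(target, catalog, price_tolerance=0.15):
    """Return wines that share region and variety within a price band of the target."""
    low = target.price * (1 - price_tolerance)
    high = target.price * (1 + price_tolerance)
    return [
        w for w in catalog
        if w != target
        and w.region == target.region
        and w.variety == target.variety
        and low <= w.price <= high
    ]


catalog = [
    Wine("Ceritas Carex Vineyard", "Sonoma Coast", "Chardonnay", 70.0),
    Wine("Moone-Tsai Heitz Vineyard", "Sonoma Coast", "Chardonnay", 70.0),
    Wine("Hypothetical Napa Cab", "Napa Valley", "Cabernet Sauvignon", 70.0),  # made-up entry
]

matches = similar_by_metadata(catalog[0], catalog)
print([w.name for w in matches])
```

The filter happily pairs the Ceritas with the Moone-Tsai because nothing in the metadata encodes style, which is exactly the gap described above.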

Another standard method, the latent factor model, involves mapping objects and making recommendations based on which things are closer to each other. Here is a movie recommendation example.

Like most types of recommendation engines, this methodology makes good approximations for people who enjoy wine but are not overly expert. But even in the simple diagram below, you can see that neighboring wines are significantly different to a trained palate, and the differences become even more extreme when you skip past one nearby wine to the next.

Building a proximity algorithm that theoretically links a consumer from a Chardonnay to a Chablis is much easier than matching a wine from the Fiano grape that might have the same saline qualities as a field blend from the Canary Islands. Crossing from a metadata model to a proximity model is much more complicated for a wine recommendation engine because nearby wines may cluster too easily, and exciting and really good wine recommendations often traverse or skip neighboring recommendations. For example, what red wine would a proximity model recommend for someone who likes Lopez De Heredia? Or what white wine would be a good match for someone who enjoys Beaujolais Nouveau?
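A rough sketch of how a proximity model behaves, assuming wines have already been placed in a two-dimensional latent space. The coordinates and the axis meanings (richness, acidity) are invented for illustration; a real model would learn them from data.

```python
import math

# Hypothetical latent coordinates: (richness, acidity). These are made-up values.
latent = {
    "California Chardonnay": (0.80, 0.35),
    "Chablis": (0.45, 0.80),
    "Fiano": (0.55, 0.60),
    "Canary Islands field blend": (0.20, 0.90),
}


def nearest(name, space, k=2):
    """Return the k wines closest to `name` by Euclidean distance."""
    origin = space[name]
    distances = [
        (math.dist(origin, coords), other)
        for other, coords in space.items()
        if other != name
    ]
    return [other for _, other in sorted(distances)[:k]]


print(nearest("Fiano", latent))
```

With these made-up coordinates, the Fiano's nearest neighbors are the two Chardonnay-based wines, and the Canary Islands blend lands last. Salinity isn't one of the axes, so the model has no way to surface the match a knowledgeable human would make.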

Another traditional way of building recommendations is collaborative filtering.

Collaborative Filtering was first made famous by Amazon and operates under the thesis that people with similar tastes will likely enjoy similar items. Thus, the past choices of others can be used as a predictor for future taste preferences. For example, if 100 people like Cool Ranch Doritos, and 90 of them also buy Tostitos Hint of Lime chips, the remaining 10 consumers would likely also buy that chip if recommended.
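The chip example can be sketched in a few lines. The copurchase_confidence helper and the basket layout are assumptions for illustration; production systems use far more elaborate versions of the same idea.

```python
def copurchase_confidence(baskets, item_a, item_b):
    """Share of shoppers who bought item_a that also bought item_b."""
    bought_a = [b for b in baskets if item_a in b]
    if not bought_a:
        return 0.0
    return sum(item_b in b for b in bought_a) / len(bought_a)


doritos = "Cool Ranch Doritos"
tostitos = "Tostitos Hint of Lime"

# 100 Doritos buyers, 90 of whom also bought the Tostitos.
baskets = [{doritos, tostitos}] * 90 + [{doritos}] * 10

confidence = copurchase_confidence(baskets, doritos, tostitos)
# The shoppers who would be shown the recommendation.
candidates = [b for b in baskets if doritos in b and tostitos not in b]
print(confidence, len(candidates))
```

The signal is only as good as the number of shoppers behind it, which is where wine runs into trouble.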

The problem with collaborative filtering as applied to wine is that long-tail products in wine are, by nature, limited in quantity and often sell out. The long tail is a statistical model frequently applied to the demand for (and supply of) products. Sometimes, it is represented in other ways, like the Pareto Principle.

The nature of the Long Tail means there will rarely be enough crowd dynamics for most products to cluster accurately and correlate products, especially those with limited distribution and low or no engagement rates to use in recommendations. As such, Collaborative Filtering strongly favors popular products with more distribution and awareness but is nearly impossible for the many niche products that make up the world of fine wine.
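Here is a toy model of that sparsity, assuming demand falls off as 1/rank (a Zipf-style curve). The catalog size, the purchase total, and the 30-purchase threshold are all hypothetical numbers chosen only to show the shape of the problem.

```python
def zipf_demand(num_products, total_purchases):
    """Expected purchases per product when demand falls off as 1/rank."""
    harmonic = sum(1 / rank for rank in range(1, num_products + 1))
    return [total_purchases / (rank * harmonic) for rank in range(1, num_products + 1)]


demand = zipf_demand(num_products=10_000, total_purchases=1_000_000)

# Hypothetical minimum number of purchases before co-purchase patterns mean anything.
min_buyers = 30
too_sparse = sum(d < min_buyers for d in demand)
share_sparse = too_sparse / len(demand)

print(f"{share_sparse:.0%} of products fall below {min_buyers} purchases")
```

Under these assumptions, roughly two-thirds of the catalog never gathers enough buyers to cluster, even with a million purchases in the system.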

LLMs Using Consumer or Trade Reviews

Trying to use consumer or trade tasting notes and reviews for wine also doesn't work because language is limited, and the same taste sensation is either homogenized across many wines or interpreted differently by different tasters.

Here are two descriptions from a popular retailer for chardonnays at the same price point from Sonoma.

"It showcases vibrant citrus, stone fruit, and floral notes, complemented by hints of toasted nuts. Known for its elegance and balance, it combines richness with refreshing acidity."

“Following fermentation in stainless steel tanks, the wine was aged for 12 months in new and used French oak barrels, further enhancing its complexity. Focused and precise, the Chardonnay has a delicate tension with lovely notes of citrus zest, pear, apple blossom, wet stone, and crushed rocks.”

One of these wines is Kistler, and one is Ceritas. The two wines are almost diametrically opposed in their taste profiles, but due to the finite nature of language, people's limitations in describing taste, and differing interpretations of flavors, the descriptions aren't that different, at least from the standpoint of a computer. How many red wines have "cherry" in their descriptors? You can cross from Gamay to Petite Sirah, Pinot, and even Rosé. And "lemon" or "vanilla" in white wines? Many professionals think this is only a problem with consumer tasting notes because of consumers' limited experience.
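A crude term-matching pass over the two retailer descriptions quoted above shows the problem. The stopword list and the descriptor_tokens helper are assumptions for illustration; real systems use richer text models, but they inherit the same ambiguity in the source language.

```python
import re

# A small, hand-picked stopword list (an assumption, not a standard one).
STOPWORDS = {"a", "and", "by", "for", "has", "in", "it", "its", "of", "the", "was", "with"}


def descriptor_tokens(note):
    """Lowercase a tasting note and keep the words that are not stopwords."""
    words = re.findall(r"[a-z]+", note.lower())
    return {w for w in words if w not in STOPWORDS}


note_a = (
    "It showcases vibrant citrus, stone fruit, and floral notes, complemented "
    "by hints of toasted nuts. Known for its elegance and balance, it combines "
    "richness with refreshing acidity."
)
note_b = (
    "Following fermentation in stainless steel tanks, the wine was aged for 12 "
    "months in new and used French oak barrels, further enhancing its complexity. "
    "Focused and precise, the Chardonnay has a delicate tension with lovely notes "
    "of citrus zest, pear, apple blossom, wet stone, and crushed rocks."
)

shared = descriptor_tokens(note_a) & descriptor_tokens(note_b)
print(sorted(shared))
```

The only terms the two notes share are generic ones, and one of them is misleading: a word-level matcher treats "stone" in "stone fruit" and "stone" in "wet stone" as the same signal, even though one is a fruit character and the other a mineral one.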

However, the machine has just as many problems with professional tasting notes, because interpreting obscure descriptors like pencil shavings, marzipan, brambly, sandalwood, treacle tart, Chinese spices, exotic spices, brooding, angular, white flowers, and more is equally challenging.

Even Bad Recommendation Engines Increase Conversion

The industry's dirty little secret, however, is that if any group of wines is presented as a recommendation, no matter how bad, human psychology ensures that these recommendations will naturally increase sales (correlation, not causation). Even just putting random wines as recommended suggestions will have a net positive effect on increasing sales and basket size. But no matter how much all these apps, websites, and technology tools want you to believe to the contrary, this is not the same thing as a personalized recommendation.

By way of example:

Gives you these recommendations . . .

Fundamentally, these are not bad recommendations for an average consumer. However, someone well versed in wine knows that Cain Five differs from Peter Michael's Bordeaux Blend, and Oracle Miner differs significantly from Overture.

Here are my "personalized" Vine & Cellar Recommendations from Preferbli:

This broad spectrum of flavors, styles, and price points barely aligns with my personal taste. It is a blunt-force tool vs a personalized recommendation engine.

There are so many ways to generate recommendations. Data science, known recommendation methodologies, and even proprietary algorithms will give decent wine alternatives and even incorrect selections that will still encourage discovery, resulting in a correlated lift from cross-sell and upsell.

But anyone who says they make a personalized recommendation engine based on a consumer's tastes, web history, or shopping behaviors is lying to themselves, the press, and whomever they are shilling their service to this week.

The Four Necessary Inputs to Create A Personalized Recommendation Engine

INPUT #1 - Individual Ontogenic Factors

When you see assertions like, "We've built the industry's first and only database of ~248 million US consumer drinking-age palates for wine," you get serious Elizabeth Holmes vibes.


Consider creating a user's taste profile, including genetic/physiological, cultural, and psychological factors unique to a single person. If we focus solely on genetic makeup to determine an individual's taste, we must map their taste and smell receptors to identify their preferences.

Let's (for a moment) suspend our disbelief and assume that a company could obtain a detailed analysis of a user's taste and smell genetics using the direct-to-consumer genetic testing kit from 23andMe at a serious discount of $50. The cost of genetically testing consumers at large would be astronomical. To emphasize how difficult this would be, 23andMe was founded in 2006, and in 2024, they disclosed they had only tested 14M people. Even if the company whose marketing line I quoted above acquired data in any and every way possible, the figure of 248 million doesn't even pass the sniff test because the total adult U.S. population is approximately 143 million (ages 21 to 59), with the core wine-drinking group being only about 15 million consumers.
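For a sense of scale, here is the back-of-the-envelope arithmetic using only the figures quoted above. Treating 23andMe's historical pace as a constant rate is my simplifying assumption.

```python
claimed_palates = 248_000_000   # figure from the quoted marketing claim
cost_per_kit = 50               # discounted kit price assumed in the text, in USD
tested_by_2024 = 14_000_000     # 23andMe's disclosed total
years_operating = 2024 - 2006   # founding year to disclosure year

total_cost = claimed_palates * cost_per_kit
tests_per_year = tested_by_2024 / years_operating   # assumes a constant pace
years_needed = claimed_palates / tests_per_year

print(f"${total_cost / 1e9:.1f}B in kits, about {years_needed:.0f} years at that pace")
```

That works out to $12.4 billion in test kits alone and, at 23andMe's average pace, more than three centuries of testing.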

But we're still in our tiny second of suspended disbelief here. Let's say a company COULD acquire this data. What exactly is the data they are acquiring? Taste is, as we know, actually a combination of both taste and smell. These two systems provide data that our brains unify into what we experience as sensory perception. This occurs through an incredibly complex sensorial and neurologic interaction process that produces the "flavors" we experience. The number of combinations that inform our perception of taste is astronomical.

There are approximately 25 different receptor genes for taste. These contribute to the five basic tastes: sweet, sour, salty, bitter, and umami.

At the same time, there are around 400 functional olfactory receptor genes (we can detect even more, but for simplicity, let's stick with the four hundred).

Because these senses combine into a unified flavor perception, we can treat each combination of taste and smell as unique and estimate the range of variation in human experience using simple mathematics.

• Varieties of Taste: Organizing the receptors of the five flavors into a power set formula (ignoring the empty set, which would mean no taste) gives us 2⁵ - 1 = 31 combinations of basic tastes.

• Varieties of Smell: Similarly, a simple model that assumes every smell receptor is either activated or not, and ignores complexities (e.g., partial activation or overlapping scent detection), gives 2⁴⁰⁰ possible combinations.

Total Combinations:

To find the total potential combinations of taste and smell, you would theoretically multiply the combinations of tastes by the combinations of smells:

(2⁵ - 1) × 2⁴⁰⁰ = 31 × 2⁴⁰⁰

That's an insanely large set of combinations. Still, the raw math isn't that useful for a computer to work with, because how the brain interprets and integrates these sensations is even more complex than just multiplying the possibilities.
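Python's arbitrary-precision integers make it easy to see just how large that figure is. This is simply the same simplified on/off model evaluated exactly; the constant names are mine.

```python
BASIC_TASTES = 5                  # sweet, sour, salty, bitter, umami
OLFACTORY_RECEPTOR_GENES = 400    # the simplified figure used in the text

taste_combinations = 2 ** BASIC_TASTES - 1            # non-empty subsets of the five tastes
smell_combinations = 2 ** OLFACTORY_RECEPTOR_GENES    # each receptor either on or off
total = taste_combinations * smell_combinations

print(taste_combinations)    # 31
print(len(str(total)))       # number of decimal digits in the total
```

The total is a 122-digit number, far beyond anything a consumer database could enumerate or sample.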

Our brains sometimes enhance or suppress certain aspects of flavor due to the presence of others. In real-world scenarios, not every theoretical combination is distinguishable or even perceivable by humans. Many combinations might result in similar or indistinguishable perceptions due to the brain's interpretation, which often simplifies the complexity.

Research suggests that each individual’s saliva unlocks or produces statistically different flavor and aroma compounds from the same wine.

So even with amazing computers, AI, and mathematical modeling to provide the full spectrum of the theoretical maximum analysis of individual taste, the actual perceptual experience of taste and smell combinations is governed by biological and psychological processes that reduce the practical number of unique experiences in a way that can't be predicted by AI or a computer algorithm—at least not currently.

This doesn't even account for a person's experiential understanding of a flavor. Ntsiki Biyela from Aslina Wines, one of my favorite people on the planet, with a big heart and one of the most contagious smiles, has told many great stories about how her introduction to wine involved descriptions of flavors she'd never experienced. "So, rather than saying a wine has an aroma of truffles, say, she would use the term amasi, a local fermented milk product that would be very familiar to people."

This also doesn't account for non-flavor-based qualities like texture. Both Yoon Ha and Alder Yarrow often speak about texture as a third dimension of taste. How texture layers onto our experience is not fully understood and is currently impossible to discern from someone's genetic makeup.

INPUT #2 - The Chemical Genetic Makeup Of The Wine

Even if you've done everything described above, you're still only a quarter of the way there. You'd also need to map the flavor profiles of any and every wine you wanted to have in your recommendation system. While the technology exists to map any wine in the world from the standpoint of its chemical components, the output is ultimately ineffective because when it comes to flavor, wine is much more than the sum of its parts. The components do not accurately reflect the Gestalt of what someone drinking the bottle would describe as the wine's "flavor." And, of course, the exercise of chemically mapping every wine in the world is not only a near logistical impossibility but also a financial one.

Every year, an estimated 1.4 million new wines are released across the planet. Purchasing, shipping, and testing them all is both logistically and financially impossible. At an all-inclusive average cost of merely $20/bottle, that's $28M per year. EVERY YEAR.

Returning to reality for a moment, chemically mapping the most popular wines so that wineries can try to copy those components is an excellent methodology for creating a product that is likely to have a positive response from consumers. This is where a technology such as Tastry might work—helping a winery compare what it already makes against a target wine it wants to make and adding the components that will emulate the target wine.

INPUT #3 - Situational Awareness

Externalities are a bitch. A personalized tasting engine cannot predict or understand all the externalities that will influence how a wine tastes to a person in the specific moment when they are tasting. Here's a small sampling of the many things that might change the perception of a wine:

• The foods they are eating in conjunction with the wine.
• Environmental smells. Are they near a trash dispenser, in an aromatic Indian restaurant, or sitting outdoors at a cafe with blooming honeysuckle?
• The person's physical state. Is the person healthy, or do they have a cold? Are they recovering from COVID-19, or do they have an underlying condition like dysgeusia? All could severely influence their tasting experience.
• Their psychological frame of mind and emotions.
• Other sensory stimuli. Is it too cold or too hot? Is there music in the background? Susan Lin MW's research says it makes a difference.

Or even more fundamentally, what lived experiences created a mental overlay of the wine or flavors? There's a well-known phenomenon called the Proust Effect, where a taste and smell can trigger a vivid memory from the past. Does the scent of NZ Sauvignon Blanc remind them of when they used to mow the lawn, a chore they still resent from their past?


INPUT #4 - The Magic That Makes It All Work - The Feedback Loop

The fourth and most crucial factor is the feedback loop of user preferences over time. Everyone who references the efficacy of Netflix, Spotify, and Amazon recommendations fails to understand why those personal recommendation engines are so effective and accurate. This has to do with them knowing what you are consuming and when.

They see movies, books, and music that you browse frequently. They know how often you listen to a song, if you finish a movie, when you stop a song before its completion, etc. That's a vast wealth of data that has no analog in wine.

Even CellarTracker or Vivino can't come close to tracking enough of anyone's consumption of wine to understand genuine INDIVIDUAL taste preferences. Most consumers using label-scanning apps scan fewer than 10 wines annually, and most scan only 3-4. A retailer's average shopping visit involves viewing fewer than six wines per session.

Determining correlations from such small inputs yields very unreliable data. Companies that claim a quiz plus a few other user inputs will allow them to create personalized recommendations are misleading at best:

"New members first experience Firstleaf's one-of-a-kind technology with its advanced quiz system that analyzes 1 quintillion data points to personalize each shipment. With 98% of boxes being completely unique, Firstleaf works on an individual level through its first-party data and not with large-group flavor profiles. Firstleaf boasts a 96% accuracy rate once a member rates at least three bottles."

One side note: This is a stupid assertion. It only takes 18 questions, with 10 possible answers per question, to get to 1 quintillion (10¹⁸) "data points."
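The arithmetic is easy to check. With ten answers per question, the number of distinct answer sheets is ten raised to the number of questions; the answer_sheets helper below is just that formula, using the short-scale definition of a quintillion.

```python
def answer_sheets(questions, options_per_question=10):
    """Number of distinct ways to fill out a multiple-choice quiz."""
    return options_per_question ** questions


ONE_QUINTILLION = 10 ** 18  # short-scale quintillion

# Find the smallest quiz that reaches a quintillion answer combinations.
questions = 1
while answer_sheets(questions) < ONE_QUINTILLION:
    questions += 1

print(questions, answer_sheets(questions))
```

A big combinatorial number like this counts possible quiz outcomes, not observations about anyone's palate.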

Companies like Spotify emphasize that it takes hundreds of songs to create accurate recommendations, and those inputs include other user actions outside of listening feedback, such as skipping, playing, or adding to playlists.

No current wine software or system collects enough data from most users to give a personalized recommendation. Even with the minority of power users they serve, most systems collect limited information about how a consumer interacts with a specific wine.
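To see why a handful of ratings says so little, here is a sketch using the Wilson score interval, a standard way to put error bars on a proportion. The 3-of-4 case mirrors the typical scan counts mentioned above; the 225-of-300 case is a hypothetical stand-in for the "hundreds" of signals a music service sees.

```python
import math


def wilson_interval(liked, rated, z=1.96):
    """Approximate 95% Wilson score interval for the true share of wines a user likes."""
    p = liked / rated
    denom = 1 + z ** 2 / rated
    center = (p + z ** 2 / (2 * rated)) / denom
    half = z * math.sqrt(p * (1 - p) / rated + z ** 2 / (4 * rated ** 2)) / denom
    return center - half, center + half


# Same 75% "like" rate, very different amounts of evidence.
for liked, rated in [(3, 4), (225, 300)]:
    low, high = wilson_interval(liked, rated)
    print(f"{liked}/{rated} liked: plausible range {low:.2f} to {high:.2f}")
```

With four ratings, the plausible range for a user's true "like" rate spans most of the scale; with three hundred, it narrows to a band about ten points wide. And this is only one number per user, not a full taste profile.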

Label-scanning tools are used to get information about a bottle of wine before purchase or to "journal it" a single time when you try it. Only a vanishingly small number of users are obsessive enough to log each wine every time they drink a bottle.

That is in contrast to Spotify, which measures how often you play "Argue (I Can't Argue With You)" by DJ Suede to determine how much you like that particular style of music.

Building better recommendation engines using some key methods outlined above is a worthy and vital goal. The goals of some of the technology solutions are to catalog and map wine by words, clusters, data science, and more, which are noble aspirations, and some even provide helpful services. But when they make hyperbolic claims of efficacy, lift, and especially the ability to map a person's taste profile, they stretch the truth (at best).

Today, and likely for a long time to come, a truly PERSONALIZED wine recommendation engine remains science fiction, not fact.

Pierre Marcelin

Feb 28

Very insightful and well documented. Excellent article. Thank you!


A.J. Weinzettel

Your breakdown of recommendation engines and their inherent limitations in wine is insightful. But what’s missing from the conversation is something no algorithm—no matter how sophisticated—can ever predict: the emotional connection a person has with the story behind a wine.

People don’t just fall in love with wine because of its flavor profile. They fall in love with the people who make it, the history behind a vineyard, the way a single bottle transports them back to a place, a moment, or a shared experience with friends and family. No LLM or data model can quantify the feeling of walking through a vineyard at sunset, hearing a winemaker’s journey, or discovering a wine that suddenly means something beyond just what’s in the glass.

Personalized recommendations imply an understanding of what moves someone, what resonates with them on a human level. That’s not something an algorithm can provide. Wine is, at its core, about connection—something technology will never be able to replicate.

With Gratitude,

A.J.

———

Founder - Block 55

Winery Reservation Platform

https://block55.app

Website - http://weinnotes.com

Instagram - https://www.instagram.com/weinnotes/

Newsletter - https://newsletter.oregonvinocountry.com

iPhone App - https://apps.apple.com/us/app/id1522306889

YouTube - https://www.youtube.com/@weinnotes

Apple Podcasts - https://podcasts.apple.com/us/podcast/weinnotes/id1603014320