(Oh, I’m the only one who’s been asking this question…? Hm. Well, if you have a minute, please enjoy this exploratory data analysis, featuring experimental design, statistics, and interactive visualization, applied a bit too earnestly to resolve a global debate.)
1. Introduction
1.1 Background and motivation
Chocolate is loved all over the world. From historical practices of harvesting natural cacao in the Amazon basin, to chocolatiers sculpting edible art in the mountains of Switzerland, to massive factories in Hershey, Pennsylvania churning out 70 million Kisses per day, the nuanced forms and flavors of chocolate have been integrated into many cultures and their customs. While quality can vary greatly across chocolate products, one well-known, shelf-stable, easily shareable form of chocolate is M&Ms. Readily found by convenience store check-out counters and in hotel vending machines, the brightly colored pellets are a popular treat whose packaging is re-branded to fit nearly any commercializable American holiday.
While living in Denmark in 2022, I heard a concerning claim: M&Ms manufactured in Europe taste different, and arguably “better,” than M&Ms produced in the United States. While I recognized that fancy European chocolate is indeed quite tasty and often superior to American chocolate, it was unclear to me whether the same claim should hold for M&Ms. I learned that many Europeans perceive an “unpleasant” or “tangy” taste in American chocolate, which is largely attributed to butyric acid, a compound resulting from differences in how milk is treated before incorporation into milk chocolate.
But honestly, how much of a difference could this make for M&Ms? M&Ms!? I imagined M&Ms would retain a relatively processed/mass-produced/cheap candy flavor wherever they were manufactured. As the lone American visiting a diverse lab of international scientists pursuing cutting-edge research in biosustainability, I was inspired to break out my data science toolbox and investigate this M&M flavor phenomenon.
1.2 Earlier work
To quote a European woman, who shall remain anonymous, after she tasted an American M&M while traveling in New York:
“They taste so gross. Like vomit. I don’t understand how people can eat this. I threw the rest of the bag away.”
Vomit? Really? In my experience, children raised in the United States had no qualms about eating M&Ms. Growing up, I was accustomed to bowls of M&Ms strategically placed in high-traffic areas around my house to provide readily available sugar. Clearly American M&Ms are edible. But are they significantly different from and/or inferior to their European equivalent?
In response to the anonymous European woman’s scathing report, two other Americans visiting Denmark and I sampled M&Ms purchased locally at the Lyngby Storcenter Føtex. We hoped to experience the incredible improvement in M&M flavor that had apparently been hidden from us throughout our youths. But curiously, we detected no obvious flavor improvements.
Unfortunately, neither preliminary study was able to conduct a side-by-side taste test with proper controls and randomized M&M sampling. Thus, we turn to science.
1.3 Research Objectives
This study seeks to remedy the previous lack of rigor and investigate the following questions:
- Is there a global consensus that European M&Ms are in fact better than American M&Ms?
- Can Europeans actually detect a difference between M&Ms purchased in the US vs in Europe when they don’t know which one they’re eating? Or is this a grand, coordinated lie among Europeans to make Americans feel embarrassed?
- Are Americans actually taste-blind to American vs European M&Ms? Or can they taste a difference but simply don’t describe this difference as “an improvement” in flavor?
- Can these alleged taste differences be perceived by residents of other continents? If so, do they find one flavor clearly superior?
2. Methods
2.1 Experimental design and data collection
Participants were recruited by luring, er, inviting them to a social gathering (with the promise of free food) that was conveniently co-located with the testing site. Once a participant agreed to pause socializing and join the study, they were positioned at a testing station with a trained experimenter who guided them through the following steps:
- Participants sat at a table and received two cups: one empty and one filled with water. With one cup in each hand, the participant was asked to close their eyes and keep them closed for the remainder of the experiment.
- The experimenter randomly picked out one M&M with a spoon and delivered it to the participant’s empty cup, and the participant was asked to eat the M&M (eyes still closed).
- After the participant ate each M&M, the experimenter collected the taste response by asking whether they thought the M&M tasted: Particularly Good, Particularly Unhealthy, or Regular.
- Each participant received a total of 10 M&Ms (5 European, 5 American), one at a time, in a random sequence determined by random.org.
- Between M&Ms, the participant was asked to take a sip of water to help “cleanse their palate.”
- Data collected: for each participant, the experimenter recorded the participant’s continent of origin (if this was ambiguous, the participant was asked to name the continent on which they have the strongest memories of eating candy as a child). For each of the 10 M&Ms delivered, the experimenter recorded the M&M origin (“Denmark” or “USA”), the M&M color, and the participant’s taste response. Experimenters were also encouraged to jot down any amusing phrases uttered by the participant during the test, recorded under notes (data available here).
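For readers who want to reproduce the randomized tasting order without visiting random.org, here is a minimal sketch. It swaps in Python's built-in pseudo-random generator, which is an assumption rather than what the study used, and the function name and seed are hypothetical.

```python
import random


def make_tasting_sequence(n_per_type=5, seed=None):
    """Return a shuffled list of M&M origins with n_per_type of each origin."""
    # A local generator keeps the shuffle reproducible when a seed is given.
    rng = random.Random(seed)
    sequence = ["Denmark"] * n_per_type + ["USA"] * n_per_type
    rng.shuffle(sequence)
    return sequence


# Hypothetical seed, chosen only so the example is repeatable.
sequence = make_tasting_sequence(seed=2022)
print(sequence)
```

Each call yields 10 origins, 5 of each type, in an order the participant cannot predict.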
2.2 Sourcing materials and recruiting participants
Two bags of M&Ms were purchased for this study. The American-sourced M&Ms (“USA M&M”) were acquired at the SFO airport and delivered by the author’s parents, who visited her in Denmark. The European-sourced M&Ms (“Denmark M&M”) were purchased at a local Føtex grocery store in Lyngby, a little north of Copenhagen.
Experiments were conducted at two main time points. The first 14 participants were tested in Lyngby, Denmark in August 2022. They mostly consisted of friends and housemates the author met at the Novo Nordisk Foundation Center for Biosustainability at the Technical University of Denmark (DTU) who came to a “going away party” into which the experimental procedure was inserted. A few additional friends and family who visited Denmark were also tested during their travels (e.g. on the train).
The remaining 37 participants were tested in Seattle, WA, USA in October 2022, primarily during a “TGIF happy hour” hosted by graduate students in the computer science PhD program at the University of Washington. This second batch mostly consisted of students and staff of the Paul G. Allen School of Computer Science & Engineering (UW CSE) who responded to the weekly Friday summoning to the Allen Center atrium for free snacks and drinks.
While this study set out to analyze global trends, unfortunately data was only collected from the 51 participants the author was able to lure to the study sites, and the sample is neither well-balanced nor representative of the 6 inhabited continents of Earth (Figure 1). We hope to improve our recruitment tactics in future work. For now, our analytical power with this dataset is limited to response trends for individuals from North America, Europe, and Asia, highly biased by the subcommunities the author happened to engage with in late 2022.
2.3 Risks
While we did not acquire formal approval for experimentation with human test subjects, there were only minor risks associated with this experiment: participants were warned that they might be subjected to increased levels of sugar and possible “unpleasant flavors” as a result of participating in this study. No other risks were anticipated.
After the experiment, however, we unfortunately observed several cases of deflated pride when a participant learned their taste response was skewed more positively towards the M&M type they were not expecting. This pride deflation seemed most severe among European participants who learned their own or their fiancé’s preference skewed towards USA M&Ms, though this was not quantitatively measured and cannot be confirmed beyond anecdotal evidence.
3. Results & Discussion
3.1 Overall response to “USA M&Ms” vs “Denmark M&Ms”
3.1.1 Categorical response analysis: entire dataset
In our first analysis, we count the total number of “Unhealthy”, “Regular”, and “Good” taste responses and report the percentage of each response received by each M&M type. M&Ms from Denmark received “Good” responses more frequently than USA M&Ms but also received “Unhealthy” responses more frequently. M&Ms from the USA were most frequently reported to taste “Regular” (Figure 2). This may result from the larger number of participants hailing from North America, where the USA M&M is the default and thus more “Regular,” while the Denmark M&M was more often perceived as better or worse than the baseline.
Figure 2. Qualitative taste response distribution across the whole dataset. The percentage of taste responses for “Unhealthy”, “Regular” or “Good” was calculated for each type of M&M. Figure made with Altair.
Now let’s break out some statistics, starting with a chi-squared (χ²) test to compare our observed distributions of categorical taste responses. Using the scipy.stats chi2_contingency function, we built contingency tables of the observed counts of “Good,” “Regular,” and “Unhealthy” responses to each M&M type. Using the χ² test to evaluate the null hypothesis that there is no difference between the two M&Ms, we found a p-value of 0.0185, which is significant at the common p-value cutoff of 0.05, but not at 0.01. So a solid “maybe,” depending on whether you’d like this result to be significant or not.
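To make the chi-squared step concrete, the sketch below recomputes the test by hand for a 2x3 table using only the standard library. The study itself used scipy.stats chi2_contingency; this hand-rolled version is a stand-in that relies on the fact that a 2x3 table has 2 degrees of freedom, for which the p-value is exactly exp(-statistic / 2). The counts are made up for illustration and are not the study's data.

```python
import math


def chi2_two_by_three(table):
    """Pearson chi-squared test for a 2x3 table of counts.

    Returns (statistic, p_value). Assumes every row and column total is
    positive, so that no expected count is zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if response and M&M type were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    # (2 - 1) * (3 - 1) = 2 degrees of freedom, where the chi-squared
    # survival function simplifies to exp(-x / 2).
    return statistic, math.exp(-statistic / 2)


# Made-up counts for illustration only (NOT the study's data).
# Rows: USA M&M, Denmark M&M. Columns: "Unhealthy", "Regular", "Good".
observed = [
    [20, 60, 20],
    [30, 40, 30],
]
statistic, p_value = chi2_two_by_three(observed)
print(f"chi2 = {statistic:.3f}, p = {p_value:.4f}")
```

For tables of other shapes the shortcut for the p-value no longer holds, which is one good reason to reach for a statistics library instead.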
3.1.2 Quantitative response analysis: entire dataset
The χ² test helps evaluate whether there is a difference in categorical responses, but next, we want to determine a relative taste ranking between the two M&M types. To do this, we converted taste responses to a quantitative distribution and calculated a taste score. Briefly, “Unhealthy” = 1, “Regular” = 2, “Good” = 3. For each participant, we averaged the taste scores across the 5 M&Ms they tasted of each type, maintaining separate taste scores for each M&M type.
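As a rough illustration of the scoring step, the sketch below maps each response to its numeric score and averages per participant and M&M type. The record layout (participant id, origin, response) and the function name are assumptions made for this example, and the participant shown is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Numeric scores as described in the text.
SCORES = {"Unhealthy": 1, "Regular": 2, "Good": 3}


def average_taste_scores(records):
    """Map (participant_id, origin) to that participant's mean taste score."""
    grouped = defaultdict(list)
    for participant_id, origin, response in records:
        grouped[(participant_id, origin)].append(SCORES[response])
    return {key: mean(values) for key, values in grouped.items()}


# One hypothetical participant (not taken from the study's data).
records = [
    (1, "USA", "Regular"), (1, "USA", "Regular"), (1, "USA", "Good"),
    (1, "USA", "Unhealthy"), (1, "USA", "Regular"),
    (1, "Denmark", "Good"), (1, "Denmark", "Good"), (1, "Denmark", "Regular"),
    (1, "Denmark", "Unhealthy"), (1, "Denmark", "Good"),
]
averages = average_taste_scores(records)
print(averages)
```

Keeping the two origins under separate keys preserves one score per M&M type for each participant, as the analysis requires.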

With the average taste score for each M&M type in hand, we turn to scipy.stats ttest_ind (“T-test”) to evaluate whether the means of the USA and Denmark M&M taste scores are different (the null hypothesis being that the means are identical). If the means are significantly different, it would provide evidence that one M&M is perceived as significantly tastier than the other.
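To show what the T-test compares, here is a small standard-library sketch of the pooled-variance two-sample t statistic, which is the form scipy.stats ttest_ind computes with its default equal_var=True. It stops at the statistic and degrees of freedom; turning those into a p-value needs the t distribution, which is what ttest_ind provides. The scores below are hypothetical, not the study's data.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(sample_a, sample_b):
    """Student's two-sample t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Each sample needs at least 2 values.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df


# Hypothetical per-participant average taste scores (not the study's data).
usa_scores = [2.0, 2.2, 1.8, 2.4, 2.0]
denmark_scores = [2.2, 1.8, 2.6, 2.0, 2.4]
t_stat, df = pooled_t_statistic(usa_scores, denmark_scores)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

A t statistic near zero, as here, means the gap between the two means is small relative to the spread of the scores.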
We found the average taste scores for USA M&Ms and Denmark M&Ms to be quite close (Figure 3), and not significantly different (T-test: p = 0.721). Thus, across all participants, we do not observe a difference between the perceived taste of the two M&M types (or if you enjoy parsing triple negatives: “we cannot reject the null hypothesis that there is not a difference”).
But does this change if we separate participants by continent of origin?
3.2 Continent-specific responses to “USA M&Ms” vs “Denmark M&Ms”
We repeated the above χ² and T-test analyses after grouping participants by their continents of origin. The Australia and South America groups were combined as a minimal attempt to preserve data privacy. Due to the relatively small sample size of even the combined Australia/South America group (n=3), we will refrain from analyzing trends for this group but include the data in several figures for completeness and for the enjoyment of the participants who may eventually read this.
3.2.1 Categorical response analysis: by continent
In Figure 4, we display both the taste response counts (upper panel, note the interactive legend) and the response percentages (lower panel) for each continent group. Both North America and Asia follow a similar trend to the whole population dataset: participants report Denmark M&Ms as “Good” more frequently than USA M&Ms, but also report Denmark M&Ms as “Unhealthy” more frequently. USA M&Ms were most frequently reported as “Regular” (Figure 4).
In contrast, European participants report USA M&Ms as “Unhealthy” nearly 50% of the time and “Good” only 18% of the time, which is the most negative and least positive response pattern, respectively (when excluding the under-sampled Australia/South America group).
Figure 4. Qualitative taste response distribution by continent. Upper panel: counts of taste responses (click the legend to interactively filter!). Lower panel: percentage of taste responses for each type of M&M. Figure made with Altair.
This looked striking in bar chart form; however, only North America had a significant χ² p-value (p = 0.0058) when evaluating each continent’s difference in taste response profile between the two M&M types. The European p-value is perhaps “approaching significance” in some circles, but we’re about to accumulate several more hypothesis tests and should be mindful of multiple hypothesis testing (Table 1). A false positive result here would be devastating.
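For anyone wondering what being mindful of multiple hypothesis testing could look like in practice, one simple option is a Bonferroni correction, which divides the significance cutoff by the number of tests. The article does not apply a correction, so this is a swapped-in illustration, and the comparison names and p-values are hypothetical placeholders rather than the values in Table 1.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag which p-values stay significant after a Bonferroni correction."""
    # Each test must clear alpha divided by the number of tests performed.
    threshold = alpha / len(p_values)
    return {name: p < threshold for name, p in p_values.items()}


# Hypothetical p-values (NOT the study's results).
p_values = {
    "comparison_1": 0.004,
    "comparison_2": 0.03,
    "comparison_3": 0.20,
    "comparison_4": 0.60,
}
flags = bonferroni_significant(p_values)
print(flags)
```

With four tests the cutoff drops from 0.05 to 0.0125, so a p-value of 0.03 that looked significant on its own no longer qualifies. Bonferroni is conservative, and gentler corrections exist.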

When comparing the taste response profiles between two continents for the same M&M type, there are a couple of interesting notes. First, we observed no major taste discrepancies between any pair of continents when evaluating Denmark M&Ms: the world seems generally consistent in its range of feelings about M&Ms sourced from Europe (right column χ² p-values, Table 2). To visualize this comparison more easily, we reorganize the bars in Figure 4 to group them by M&M type (Figure 5).
Figure 5. Qualitative taste response distribution by M&M type, reported as percentages. (Same data as Figure 4 but re-arranged). Figure made with Altair.
However, when comparing continents to each other in response to USA M&Ms, we see larger discrepancies. We found one pairing to be significantly different: European and North American participants evaluated USA M&Ms very differently (p = 0.000007) (Table 2). It seems highly unlikely that this observed difference is due to random chance (left column, Table 2).

3.2.2 Quantitative response analysis: by continent
We again convert the categorical profiles to quantitative distributions to assess continents’ relative preference of M&M types. For North America, we see that the taste score means of the two M&M types are actually quite similar, but there is a higher density around “Regular” scores for USA M&Ms (Figure 6A). The European distributions maintain a bit more of a separation in their means (though not quite significantly so), with USA M&Ms scoring lower (Figure 6B). The taste score distributions of Asian participants are the most similar (Figure 6C).
Reorienting to compare the quantitative means between continents’ taste scores for the same M&M type, only the comparison between North American and European participants on USA M&Ms is significantly different based on a T-test (p = 0.001) (Figure 6D), though now we really are in danger of multiple hypothesis testing! Be cautious if you are taking this analysis at all seriously.

At this point, I feel myself considering that maybe Europeans are not just making this up. I’m not saying it’s as dramatic as some of them claim, but perhaps a difference does indeed exist… To some extent, North American participants also perceive a difference, but their evaluation of Europe-sourced M&Ms is not consistently positive or negative.
3.3 M&M taste alignment chart
In our analyses so far, we did not account for the baseline differences in M&M appreciation between participants. For example, say Person 1 scored all Denmark M&Ms as “Good” and all USA M&Ms as “Regular”, while Person 2 scored all Denmark M&Ms as “Regular” and all USA M&Ms as “Unhealthy.” They would have the same relative preference for Denmark M&Ms over USA M&Ms, but Person 2 perhaps just doesn’t enjoy M&Ms as much as Person 1, and the relative preference signal is muddled by averaging the raw scores.
Inspired by the Lawful/Chaotic x Good/Evil alignment chart used in tabletop role-playing games like Dungeons & Dragons©™, in Figure 7, we establish an M&M alignment chart to help determine the distribution of participants across M&M enjoyment classes.

Notably, the upper right quadrant, where both M&M types are perceived as “Good” to “Regular,” is mostly occupied by North American participants and a few Asian participants. All European participants land in the left half of the figure where USA M&Ms are “Regular” to “Unhealthy”, but Europeans are somewhat split between the upper and lower halves, where perceptions of Denmark M&Ms range from “Good” to “Unhealthy.”
An interactive version of Figure 7 is provided below for the reader to explore the counts of various M&M alignment regions.
Figure 7 (interactive): click and brush your mouse over the scatter plot to see the counts of continents in different M&M enjoyment regions. Figure made with Altair.
3.4 Participant taste response ratio
Next, to factor out baseline M&M enjoyment and focus on participants’ relative preference between the two M&M types, we took the log ratio of each person’s USA M&M taste score average divided by their Denmark M&M taste score average.

As such, positive scores indicate a preference towards USA M&Ms while negative scores indicate a preference towards Denmark M&Ms.
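The preference ratio described above can be sketched in a few lines. The text does not state the base of the logarithm, so the natural log used here is an assumption; the sign of the result, which is what carries the interpretation, is the same in any base. The function name and example scores are hypothetical.

```python
import math


def preference_ratio(usa_mean, denmark_mean):
    """Log of the USA average taste score over the Denmark average taste score.

    Taste scores range from 1 to 3, so neither average can be zero.
    """
    return math.log(usa_mean / denmark_mean)


# Hypothetical averages for three imaginary participants.
prefers_usa = preference_ratio(3.0, 2.0)      # positive: leans USA
prefers_denmark = preference_ratio(1.0, 2.0)  # negative: leans Denmark
neutral = preference_ratio(2.0, 2.0)          # zero: no preference
print(prefers_usa, prefers_denmark, neutral)
```

Taking the log makes the measure symmetric: preferring one type by a factor of two lands the same distance from zero whichever type it is.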
On average, European participants had the strongest preference towards Denmark M&Ms, with Asians also exhibiting a slight preference towards Denmark M&Ms (Figure 8). To the two Europeans who exhibited deflated pride upon learning of their slight preference towards USA M&Ms, fear not: you did not think USA M&Ms were “Good,” but merely ranked them as less “Unhealthy” than Denmark M&Ms (see participant_id 4 and 17 in the interactive version of Figure 7). If you assert that M&Ms are a nasty American invention not worth replicating and return to consuming artisanal European chocolate, your honor can likely be restored.

North American participants are quite split in their preference ratios: some fall quite neutrally around 0, others strongly prefer the familiar USA M&M, while a handful moderately prefer Denmark M&Ms. Anecdotally, North Americans who learned their preference skewed towards European M&Ms displayed signs of inflated pride, as if their results signaled posh refinement.
Overall, a T-test comparing the distributions of M&M preference ratios shows a possibly significant difference in the means between European and North American participants (p = 0.049), but come on, this is like the twentieth p-value I’ve reported, so this one is probably too close to call.
3.5 Taste inconsistency and “Good Classifiers”
For each participant, we assessed their taste score consistency by averaging the standard deviations of their responses to each M&M type, and plotting that against their preference ratio (Figure 9).
Figure 9. Participant taste consistency by preference ratio. The x-axis is a participant’s relative M&M preference ratio. The y-axis is the average of the standard deviation of their USA M&M scores and the standard deviation of their Denmark M&M scores. A value of 0 on the y-axis indicates perfect consistency in responses, while higher values indicate more inconsistent responses. Figure made with Altair.
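The y-axis quantity in Figure 9 can be sketched as follows. The text does not say whether the population or sample standard deviation was used, so the population version (statistics.pstdev) is an assumption, and the scores below are hypothetical.

```python
from statistics import mean, pstdev


def inconsistency(usa_scores, denmark_scores):
    """Average of the per-type standard deviations of one participant's scores."""
    return mean([pstdev(usa_scores), pstdev(denmark_scores)])


# Hypothetical participants (not the study's data).
# Every M&M of a given type receives the same score:
steady = inconsistency([2, 2, 2, 2, 2], [3, 3, 3, 3, 3])
# Scores wander within each type:
wavering = inconsistency([1, 2, 3, 2, 2], [3, 3, 2, 1, 3])
print(steady, wavering)
```

A participant who gives every M&M of a type the same response scores exactly 0, while any variation within a type pushes the value above 0.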
Most participants were somewhat inconsistent in their ratings, rating the same M&M type differently across the 5 samples. This would be expected if the taste difference between European-sourced and American-sourced M&Ms is not actually all that perceptible. Most inconsistent were participants who gave the same M&M type “Good”, “Regular”, and “Unhealthy” responses (e.g., points high on the y-axis, with wider standard deviations of taste scores), indicating lower taste perception abilities.
Intriguingly, four participants, one from each continent group, were perfectly consistent: they reported the same taste response for each of the 5 M&Ms from each M&M type, resulting in an average standard deviation of 0.0 (bottom of Figure 9). Excluding the one of the four who simply rated all 10 M&Ms as “Regular”, the other three appeared to be “Good Classifiers”: either rating all M&Ms of one type “Good” and the other “Regular”, or rating all M&Ms of one type “Regular” and the other “Unhealthy.” Perhaps these folks are “super tasters.”
3.6 M&M color
Another possible explanation for the inconsistency in individual taste responses is that there exists a perceptible taste difference based on the M&M color. Visually, the USA M&Ms were noticeably smoother and more vibrant than the Denmark M&Ms, which were somewhat more “splotchy” in appearance (Figure 10A). M&M color was recorded during the experiment, and although balanced sampling was not formally built into the experimental design, colors seemed to be sampled roughly evenly, with the exception of Blue USA M&Ms, which were oversampled (Figure 10B).

We briefly visualized possible differences in taste responses based on color (Figure 11); however, we do not believe there are enough data to support firm conclusions. After all, each participant tasted only 5 M&Ms of each type, so at best they would taste 5 of the 6 M&M colors once, and 1 color not at all. We leave further M&M color investigations to future work.

3.7 Colorful commentary
We assured each participant that there was no “right answer” in this experiment and that all feelings are valid. While some participants took this to heart and sometimes spent over a minute deeply savoring each M&M and evaluating it as if they were a sommelier, many participants seemed to view the experiment as a competition (which occasionally led to deflated or inflated pride). Experimenters wrote down quotes and notes along with M&M responses, some of which were a bit “colorful.” We provide a hastily rendered word cloud for each M&M type for entertainment purposes (Figure 12), though we caution against reading too far into them without diligent sentiment analysis.

4. Conclusion
Overall, there does not appear to be a “global consensus” that European M&Ms are better than American M&Ms. However, European participants tended to express negative reactions to USA M&Ms more strongly, while North American participants seemed relatively split on whether they preferred M&Ms sourced from the USA vs from Europe. The preference trends of Asian participants often fell somewhere between those of the North Americans and Europeans.
Therefore, I’ll admit that it’s probable that Europeans are not engaged in a grand coordinated lie about M&Ms. The skew of most European participants towards Denmark M&Ms is compelling, especially since I was the experimenter who personally collected much of the taste response data. If they found a way to cheat, it was done well enough to escape my notice. However, based on this study, it would appear that a strongly negative “vomit flavor” is not universally perceived and does not become apparent to non-Europeans when tasting both M&M types side by side.
We hope this study has been illuminating! We would look forward to extensions of this work with improved participant sampling, additional M&M types sourced from other continents, and deeper investigations into possible taste differences due to color.
Thank you to everyone who participated and ate M&Ms in the name of science!
Figures and analysis can be found on GitHub: https://github.com/erinhwilson/mnm-taste-test
Article by Erin H. Wilson, Ph.D.[1,2,3] who decided the time between defending her dissertation and starting her next job would be best spent on this highly valuable analysis. Hopefully it’s clear that this article is intended to be comedic: I do not actually harbor any negative feelings towards Europeans who do not like American M&Ms, but enjoyed the chance to be sassy and poke fun at our lively debates with overly-enthusiastic data analysis.
Shout out to Matt, Galen, Ameya, and Gian-Marco for assisting in data collection!
[1] Former Ph.D. student in the Paul G. Allen School of Computer Science and Engineering at the University of Washington
[2] Former visiting Ph.D. student at the Novo Nordisk Foundation Center for Biosustainability at the Technical University of Denmark
[3] Future data scientist at LanzaTech