Bias: How easily we’re fooled.

In sensory science, we deal directly with extracting information from a tricky and fickle system: humans who, as we know, are animals with brains just advanced enough to get them into trouble. In particular, we focus on what subjects are experiencing from a sensory standpoint, and that on its own is a system which is easily confounded. This article is about the things that fool us: the phenomena around us that influence how we perceive reality. It’s rather startling just how easily we can be tricked and even manipulated, and there are long and growing lists detailing the “failures” that can be triggered in our sensory systems. Of course, these are general tendencies and not concrete rules that every human unknowingly follows. But bias is a clear and present threat to the validity of all sensory data, and panel administrators must exercise care and vigilance to mitigate its effects.

First, we’ll discuss some of the more general ways that humans can be fooled, some of which you’ve probably seen before, then we’ll move into how it directly affects a sensory panel and even the average beer taster.

Probably one of the most famous examples of these failures of the human brain’s perceptual abilities is the selective attention test by Daniel Simons and Christopher Chabris from 1999. The video shows a small group of people passing basketballs back and forth and asks you to count how many times the balls are passed. If you haven’t seen it, follow that link and watch it. It’s only a minute or so long; I’ll wait here. Great, did you see the gorilla? At one point in the video someone in a gorilla suit walks across the screen, right through the basketball game. The point of the exercise is that if you’re attending closely to the basketballs, you can miss something which is right in front of you, even if it is quite absurd and out of place. It’s probably so famous that it’s hard to fall for it anymore, but it is a well-documented effect.

Another interesting one is the perceptual size-weight illusion (SWI). This occurs when the weight of a large object is underestimated compared to a small object of the same weight. It can be clearly demonstrated by putting the same weight in two different-sized boxes and picking them up to estimate their weight; the larger one will usually be judged to be lighter, often despite attempts to convince yourself otherwise by examining the inside of the box. While the mechanism behind this experience is still somewhat of a mystery, it’s thought to be due to a sub-optimal integration of prior expectations and proprioception (the sensory modality by which your body interprets how it’s positioned relative to itself and how it moves).

Another famous example of how your brain can interpret reality incorrectly is represented well by this image. The first couple of times you read it, you may read it as “I love Paris in the springtime”, but if you look closer you will note that there are actually two “the”s in the sentence, separated by a line break. This happens because when you normally read something, your eyes tend to skim over the words without noticing individual letters, and your brain interprets what it sees by recognizing the general shapes of words and often filling in any blanks with what it assumes would be there, based on context. Using a line break between the “offending” words heightens the brain’s susceptibility to this illusion, as does using a fairly well-known phrase like a popular song title (giving your brain and eyes a reason to “skip to the end because I’ve heard this one before”). A similar effect is seen in this image, which demonstrates that it is still relatively easy to read and understand text if the letters of the words are jumbled, so long as the first and last letters remain in place. As the image explains, your brain recognizes the word holistically, but it seems apparent that context plays a big role in how quickly you can read the text as well.
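If you want to try the jumbled-letter effect on your own text, here is a minimal Python sketch. The function names (scramble_word, scramble_text) are my own for illustration and don't come from the image; the only rule taken from the description above is that the first and last letters of each word stay in place while the interior letters are shuffled.

```python
import random
import re


def scramble_word(word, rng):
    """Shuffle the interior letters of a word, keeping first and last fixed."""
    # Words of three letters or fewer have at most one interior letter,
    # so there is nothing to shuffle.
    if len(word) <= 3:
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def scramble_text(text, seed=None):
    """Scramble every run of letters in the text; punctuation and spaces stay put."""
    rng = random.Random(seed)  # a seed makes the jumble repeatable
    return re.sub(r"[A-Za-z]+", lambda m: scramble_word(m.group(0), rng), text)


if __name__ == "__main__":
    print(scramble_text("Your brain recognizes the word holistically.", seed=1))
```

Try it on a sentence your reader has never seen and on a well-known phrase; the difference in reading speed is a rough, informal way to see how much context helps.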

There are so many of these examples which can be found all over the internet with a simple search, but I’ll finish this section with a link to a video that I’ve always found quite fascinating. If you’re not familiar with TED Talks, you need to be. There are so many good, interesting, and enlightening videos there, all free. One that will always stand out to me is by the ever-interesting Vilayanur Ramachandran, where he discusses phantom limb pain and the interesting ways he’s figured out to treat it. You really should watch this video (I’ve linked to YouTube because my work computer doesn’t have Flash, which is required to view TED videos). I also recommend that you watch any other videos Ramachandran has done. His specialty is issues with the brain and sensory science, from a physiological point of view. I’ve seen him in some television documentaries as well, including one about synesthesia, a neurological condition characterized by the overlap of sensory signals, where a person may see blue when they hear an A# musical note, or taste bitter when they see the number 9. The experience varies widely among those who have it; no two people seem to experience it the same way. The brain is awesome.

Anyway, moving back to bias in sensory science and how it applies to tasting beer. There are a number of seemingly minor ways that your perception of a beverage or a food product can be swayed or influenced, even if there are no actual changes in the product itself. Let’s run through some of these errors and biases:

  • Adaptation – This happens when your sensitivity to a given stimulus changes (usually decreases) due to repeated exposure to that stimulus or to ones which interact with it in some way. This is a physiological factor, and it can lead to unwanted variability in your data. It is particularly important in bitterness analysis, which much of my graduate research focused on. The best way to limit this effect is to restrict sample sizes and the number of samples per session. Other things like unsalted crackers, water, pectin rinses (bitterness and astringency remover), and smelling yourself (“blanks” your nose, resetting its sensitivity a bit) can help mitigate this effect, depending on the stimulus in question. Specific examples include sweetness ratings being artificially low for a sample which follows other sweet samples or, conversely, bitterness ratings being artificially high for samples following sweet samples.
  • Expectation Error – This is when information which comes along with the samples can trigger preconceived notions, and this can come in many many forms.  Threshold tests, where samples can be presented in increasing concentrations, can lead a panelist to give a result before the sample is thoroughly evaluated.  Or if a panelist knows, for example, that infected beer has been returned to the brewery they may start describing more spoilage-oriented defects in the samples they see, expecting to be seeing that sample on panel that day.  It can even affect Joe Pubcrawler when they taste a beer that a friend has just described, leading them to perhaps detect things they may not normally have, or may not even be there at all. All of these sources of information can bias the taster, and the steps that must be taken to limit this effect take a lot of planning and forethought on the part of the panel administrator.
  • Error of Habituation – This error occurs as a tendency to rate a series of samples with a slowly increasing or decreasing stimulus as the same. If the changes are gradual enough, it gets very difficult for a panelist to notice them. Proper randomization and good experimental design can hopefully prevent this problem.
  • Stimulus Error – This occurs when panelists notice otherwise irrelevant factors regarding samples or their presentation which modify their perception. Things like wine closures can have an effect on panelists’ perceptions, since screw-cap or boxed wines are generally (and unfairly) viewed as lower quality. Even the timing of panel sessions can influence panelists if they see that a panel has been called on short notice, since they may assume that there are production defects which need to be evaluated.
  • Logical Error – This error comes when various characteristics of samples are commonly associated with each other. One example of this that I frequently mention is from the lab portion of the brewing classes I was involved with in university, where caramel coloring was added to something like Bud Light and panelists began to describe the beer with terms usually associated with darker, more flavorful beers. In this case, the influential information comes in the form of the color of the beer. Panel administrators often try to prevent this effect by using opaque glassware or colored environmental lighting. Another example of this would be the general correlation between bitterness and hop aroma, where a beer of high bitterness might also be seen as having higher hop aroma, since they are often associated with each other. Again, administrators must use diligence in determining where these influential characteristics will arise, and take measures to prevent them.
  • Halo Effect – This happens when multiple attributes of a single sample are evaluated at the same time and how each is evaluated is influenced by the others. For example, if a beer is being tested on its bitterness, hop aroma, sweetness, and also the sample’s general favorability, then a sample which was favored less may receive less favorable ratings for each of the attributes individually as well. Prevention of this comes from pulling out the attributes to be tested and evaluating them in separate tests. This is a particularly good idea with “hedonic” testing, where panelists are asked to describe how well they like or prefer a sample; it’s not usually considered good sensory science to mix hedonic testing with the more objective types of tests (like descriptive profiling, difference tests, etc.).
  • Mutual Suggestion – Panelists can obviously be influenced by other panelists.  If one grimaces at a particularly bitter sample, other panelists may notice this and alter their bitterness ratings, consciously or otherwise.  Diligence on the part of the administrator is needed to maintain discipline amongst the panelists.
  • Lack of Motivation – It should be no surprise that a lack of interest in the process and of motivation to produce good results will lead to a panelist of less-than-stellar acuity. Panel administrators must decide whether the panel needs to be thinned (if you have that luxury), or whether additional “morale measures” need to be taken. Of course, that doesn’t mean you can skip those measures when there isn’t a morale problem. I end each of my panels with some sort of treat, ranging from cookies from the bakery to cheese tastings. I’m considering adding more perks once I’m able to focus all my attention on sensory in the near future.
  • Capriciousness vs. Timidity – Every panelist’s personality is somewhat different, and this translates into how they describe their experiences.  Some panelists may use the extremes of an intensity rating scale while others may be more hesitant to give something the highest rating, thinking that they should leave room for the possibility of future samples being higher.  It’s up to the administrator to monitor the panel’s progress and offer guidance where necessary.
  • Physical Condition – A number of other physical factors can influence a person’s ability to accurately assess samples, and they range from sickness and fatigue to smoking, drinking coffee, and eating within an hour before panel. Each of these can alter your capabilities in somewhat different ways. Smoking is perhaps the most influential among those factors which are considered personal choices, not just because it can dull your sense of smell, but also because it affects the panelists near you as well. Cologne, perfume, and other strong-smelling lotions have the same effect. Generally, the best time for panel is just before lunch, since the senses of taste and smell are particularly tuned for eating soon, and it’s not late in the day when people tend to get a bit more tired. I suppose I could put zinc deficiency in this category as well, since zinc is an important part of the G protein-coupled receptors which are used in a variety of sensory systems. A deficiency of zinc can lead to a decreased sense of taste.
  • Environmental Effects – Noise, televisions, cell phones, iPods, improper climate control, conversation: the list of environmental factors which can bias or distract a panel is long. A properly set up tasting room is essential for good results. Booths help keep the panelist focused, so long as the experimental design permits their use. I’ve even heard of (and perhaps seen) the effect that Monday has on a panel’s performance. It’s hard to say for sure because, in our case, we usually decided to preempt its effect by either not tasting on that day or by using it as a training day and essentially discarding the data.
  • Sample Order – There are many different ways that bias can be introduced just by how samples are ordered when presented. The Temporal/Positional Effect is the tendency for the first or last samples to be rated better or worse, respectively, than they might otherwise. If a test is long, the last sample might be rated artificially poorly due to fatigue and loss of interest. If a test is short, the first sample may be rated artificially high due to excitement or hunger. The Pattern Effect is fairly self-explanatory: if there is any sort of apparent pattern in how samples are presented, then panelists will probably pick up on it and begin anticipating it. The Error of Central Tendency means that samples near the middle of a set will tend to be preferred over samples at the ends. This can manifest in discrimination tests, where the odd sample will be detected more often if it is placed in the middle, although this would be an “artificial” detection due to the Error of Central Tendency. The Group Effect can cause a poor sample to be rated better if it’s part of a set of good samples. The opposite of the Group Effect is the Contrast Effect, where a sample which comes after a particularly bad one will be rated higher than if it had been rated by itself. Randomization and adequate panel size should limit these errors.
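As a rough illustration of the randomization mentioned under Sample Order, here is a small Python sketch for generating serving orders. It is not a description of how any particular lab does it. It assumes a cyclic Latin square (my choice, not something prescribed above) so that every sample lands in every serving position equally often whenever the number of panelists is a multiple of the number of samples. The function names and the sample labels are hypothetical.

```python
import random
from collections import Counter


def serving_orders(samples, n_panelists, seed=None):
    """Return one serving order per panelist, balanced across positions."""
    rng = random.Random(seed)
    base = list(samples)
    rng.shuffle(base)  # randomize which sample starts the square
    k = len(base)
    # Row i of a cyclic Latin square is the base order rotated by i positions,
    # so each sample appears exactly once in each position across the k rows.
    square = [base[i:] + base[:i] for i in range(k)]
    rng.shuffle(square)  # randomize which row each panelist receives
    return [square[p % k] for p in range(n_panelists)]


def position_counts(orders):
    """Count how often each sample is served in each position."""
    counts = Counter()
    for order in orders:
        for position, sample in enumerate(order):
            counts[(sample, position)] += 1
    return counts


if __name__ == "__main__":
    orders = serving_orders(["A", "B", "C", "D"], n_panelists=8, seed=0)
    for panelist, order in enumerate(orders, start=1):
        print(panelist, " ".join(order))
```

Note that a cyclic square only balances position. Every row keeps the same neighbors in the same sequence, so it does nothing for the Contrast or Group Effects; for those you would want a design that also balances which sample precedes which, or simply an independent shuffle for each panelist.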

Controlling bias is not a minor task. It should be on the mind of the panel administrator at all times: during experimental design, designing the tasting room, preparing and serving samples, and interacting with panelists. I like to think of our brains as the artificial intelligence from a video game: you’re playing in its world. It shows you the world how it thinks it is. It writes the rules that you must follow, and can break them at any time, and you may never notice. Theoretically, it could put Richard Nixon, your great-aunt Selma (may she rest in peace), or a talking aardvark right in front of you and there may be no way for you to tell whether it’s real or not. Our job as sensory panel administrators, and even as average beer drinkers, is to understand how the human brain can be fooled, misled, or otherwise influenced. Then measures must be taken to mitigate or compensate for it so that we can be reasonably confident that we are interpreting our world correctly and describing our beer as accurately as we, being mere humans, can.


One response to “Bias: How easily we’re fooled.”

  1. Belinda Bramley

    Hi there,

    Does anyone know of a good demonstration to show bias? We often show the coloured jellies flavoured differently but I am looking for something new! Any ideas?
