Let’s change tack a little bit here and discuss a specific set of sensory tests: overall difference tests.
Deciding whether two samples of beer are different is not as easy as it may seem. Everyone's senses work a little differently, so what one person finds to be a noticeable difference may be detected by few, if any, others. Various types of bias can also lead people to find differences that don't exist. On top of this, the need for accuracy in your data means you often need more than just a few people before you can truly say whether there is a difference. So, as with all laboratory procedures, there are standardized methods and tests used when searching for differences in food and beverage systems. Even under the heading of “overall difference tests” there are a number of different tests to choose from, each with their own pros and cons.
Overall difference tests are used to determine whether there is any detectable difference between two samples. Exactly where that difference originates is not necessarily part of the goal of the test, although you can usually pull out some hints to help guide your progress. This type of test differs from the more specific “attribute difference test,” which seeks to determine whether a difference exists in one specific aspect of the sample, whether that is the color, the bitterness, the phenolic aroma, etc. I'll discuss attribute difference tests later, but before we move on to the tests themselves, a word about error.
In statistical tests such as these, there are essentially two types of error: α-error and β-error. α-error is a numerical representation of the risk you are willing to accept of a false positive, that is, of finding a difference when one doesn't exist. β-error is the same type of numerical representation, but it signifies the risk you accept of a false negative, or missing a real difference between the samples. In practical situations you must decide which risk to minimize over the other, since minimizing both requires many more panelists and samples than most production environments can accommodate. For overall difference tests it is usually the α-risk that is minimized, while the β-risk is allowed to be large to keep the number of assessors reasonable. The default value for α is usually 0.05, meaning that you, as the administrator, accept a 1/20 chance that the results will indicate a difference when one doesn't actually exist. An α of 0.05 isn't required by any means, but it usually offers a good balance between risk management and panel size.
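To make the role of α concrete, here is a minimal sketch in Python (my own illustration, not part of any standard method) showing how α turns into a decision threshold. It assumes a forced-choice test in which a guessing panelist is correct with probability 1/3, the chance rate for a triangle test, and finds the smallest number of correct responses that lets you declare a difference at the chosen α. The function name and parameters are hypothetical.

```python
from math import comb

def min_correct_for_difference(n_panelists, p_chance=1/3, alpha=0.05):
    """Smallest number of correct responses needed to declare a
    difference at significance level alpha, assuming each panelist
    guesses correctly with probability p_chance when there is no
    real difference (1/3 for a triangle test)."""
    def tail_prob(k):
        # P(at least k correct) under the null hypothesis of
        # "no difference": the exact binomial upper tail.
        return sum(comb(n_panelists, i)
                   * p_chance**i * (1 - p_chance)**(n_panelists - i)
                   for i in range(k, n_panelists + 1))
    for k in range(n_panelists + 1):
        if tail_prob(k) <= alpha:
            return k
    return None  # alpha not reachable even if everyone is correct

# Example: with 24 panelists at alpha = 0.05, at least 13 must
# pick the odd sample out before you can claim a difference.
print(min_correct_for_difference(24))  # -> 13
```

Tightening α (say, to 0.01) raises that threshold, while adding panelists lets a smaller proportion of correct answers reach significance, which is exactly the trade-off between risk and panel size described above.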
What follows is a breakdown of a few of the more commonly used tests for finding an overall difference between samples.