Points of View

Consumer Surveys for Class Action Litigation

Consumer Science Applied to Class Action Litigation

How to Measure Likelihood of Confusion

The Likelihood of Confusion (LOC) research used in trademark infringement litigation is based on the following text of 15 U.S.C. § 1125(a)(1), part of the Lanham Act.

Any person who, on or in connection with any goods or services, or any container for goods, uses in commerce any word, term, name, symbol, or device, or any combination thereof, or any false designation of origin, false or misleading description of fact, or false or misleading representation of fact, which–

(A) is likely to cause confusion, or to cause mistake, or to deceive as to the affiliation, connection, or association of such person with another person, or as to the origin, sponsorship, or approval of his or her goods, services, or commercial activities by another person, or

(B) in commercial advertising or promotion, misrepresents the nature, characteristics, qualities, or geographic origin of his or her or another person’s goods, services, or commercial activities, shall be liable in a civil action by any person who believes that he or she is or is likely to be damaged by such act.

Current Survey Research Practice

As can be seen here, the Act sets up three conditions under which the alleged infringer may be found liable:

  • Confusion
  • Mistake, or
  • Deception

Confusion, or more precisely likelihood of confusion, is probably the most common target of litigation and is distinguished by the courts’ readiness to admit survey results into evidence. The text of the law stipulates that LOC may manifest itself in six different ways, each describing a relationship between the products or services being litigated. They are listed below in the order in which they appear in the statute:

  • Affiliation
  • Connection
  • Association
  • Origin
  • Sponsorship, or
  • Approval

The most common research designs take the six states mentioned in the text and ask survey respondents whether they apply in the case being litigated. Rather than listing all six in a single question, the questions asked of respondents tend to break the six-state array into subsets, resulting in several questions.

The questions posed in this paper are:

  • Is this practice appropriate?
  • Is it just a matter of research convenience—copy the legal text verbatim and don’t worry about the implications—or is there some fundamental logic behind it?
  • What would the ordering be if it followed established consumer behavior theory principles?
  • And, finally, how would a consumer behavior-driven approach affect the results of LOC studies and, ultimately, judicial decisions?

It appears that the law intended to cover what it perceived to be many, or all, of the manifestations of likelihood of confusion. But by stringing the various circumstances together in the order it did, it placed an unacceptable burden on survey researchers who are driven by the dictates of consumer behavior theory. Viewed through the prism of human behavior, the six-element set seems completely disorganized and, therefore, unsuitable as a list of variables to be measured via questionnaire-based survey research.

Likelihood of Confusion as a Process

LOC research tries to discover whether consumers think there is any relationship between the plaintiff’s and the defendant’s product, brand, or mark. To obtain that information researchers employ one of two methods. In the first, they expose a sample of target market members to the two stimuli, sequentially or side-by-side, and ask for their opinions on the relationship between the two with regard to the six variables. In the second, they expose respondents to the allegedly infringing product only and ask them to identify the company responsible for it.

How do consumers develop an answer to that question? The interview situation is supposed to replicate, or come as close as possible to, real-life conditions in which consumers are about to choose which product to buy. Both in real-life situations and in the simulated interview situation, consumers are exposed to stimuli. What do people do when exposed to stimuli? They go through a three-step process. First, they take the information in by means of perception; that is, they absorb as much of the information presented to them as they require to get to the next step. Second, once the information is stored in short-term memory, they label it relative to their experience field, referring to such antecedents as product category, circumstances of use, prior experience and satisfaction, knowledge about the manufacturer and the other products it markets, brand image, perceived value for the money, etc. Third, once each respondent has idiosyncratically labeled the stimulus, they are ready for action. In a real-life situation, the action would be product choice. In surveys the action is answer choice, i.e., which answer best represents the interviewee’s view of the nature of the relationship between the two stimuli.

To be reliable, the answer choice must be specific and differentiated from all the other possible answers; it cannot be a compound answer in which items of the six-item set are indistinguishable from one another because they are presented together in groups of two, three, or more.

States of Likelihood of Confusion – The Hierarchy

This brings us to the next point: determining whether the “set of six” likelihood of confusion circumstances expressed in the text are ordered randomly, follow some purposeful pattern, or represent a hierarchy. If the latter is the case, then we must uncover the hierarchical order. As will be shown below, the presence of a hierarchical order dictates a fixed ordering in the questionnaire itself and has implications for data analysis and for the conclusions that can be drawn from that analysis.

The table below shows the hierarchical array formed by the six LOC circumstances enumerated in the text. The items are listed top-down in the order of the strength of the relationship.

| Likelihood of Confusion Condition or Circumstance | Nature of the Relationship |
|---|---|
| Origin | Equivalency |
| Sponsorship | Active involvement by the Plaintiff |
| Approval | Passive involvement by the Plaintiff |
| Affiliation | Weak relationship of a vague nature |
| Association | Weak relationship of a vague nature |
| Connection | Non-specific |

Origin is placed at the top of the list because it represents the strongest possible relationship: respondents believe that the two products they have been shown as stimuli are actually put out by the same company.

Next in line, in terms of strength of relationship, is sponsorship, because it implies that respondents believe that the Plaintiff has initiated an active relationship whereby it has taken the Defendant’s product under its auspices and actively supports and promotes it.

Third in line is approval, which denotes a passive acceptance of the Defendant’s product by the Plaintiff. While passive—as opposed to sponsorship, which we have labeled active—approval connotes a conscious and knowing act on the part of the Plaintiff.

Affiliation and association suggest a vague relationship; the terms themselves do not give any information as to possible reasons for the affiliation or association, the strength of the relationship or its durability.

Finally, connection is the lowest level in the hierarchy of strength of relationship because it lacks any specificity as to motive, strength or marketing rationale. The term itself is quite vacuous.

Implications for Questionnaire Design

If a correct interpretation of the law is that any one of these conditions is enough to label the respondent “likely to be confused,” the implication for questionnaire design is to ask about each of the six items individually and hierarchically, that is, to start at the top of the hierarchy and work down until the respondent says “Yes” to an item. The type of relationship so designated between the Plaintiff’s and Defendant’s products is then taken as an indication of likelihood of confusion.
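The top-down questioning rule can be sketched in a few lines. This is a minimal illustration, not a description of any particular survey platform: the `ask` callback is a hypothetical stand-in for posing one question to the respondent, and the state order follows the hierarchy table in this section (strongest to weakest).

```python
from typing import Callable, Optional

# Hierarchy from strongest to weakest relationship, as in the table.
HIERARCHY = ["origin", "sponsorship", "approval",
             "affiliation", "association", "connection"]


def classify_respondent(ask: Callable[[str], bool]) -> Optional[str]:
    """Ask about one state at a time, strongest first, and stop at the first "Yes".

    `ask` is a hypothetical callback: it poses the question for a single
    state and returns True if the respondent answers "Yes".
    Returns the state that places the respondent in the "LOC bucket",
    or None if the respondent affirms no relationship at all.
    """
    for state in HIERARCHY:
        if ask(state):
            return state
    return None


# Example: a respondent who would say "Yes" to association and connection.
answers = {"association": True, "connection": True}
result = classify_respondent(lambda s: answers.get(s, False))
print(result)  # association
```

Because questioning stops at the first affirmative answer, each respondent is recorded under the strongest relationship he or she perceives, and no two states are ever merged into a single compound answer.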

This interviewing method places any of the six answers in the “LOC bucket,” regardless of the strength of the perceived relationship. One must assume that this was the intent of the writers of the Act; otherwise, why would they have offered a list of six distinct possibilities?

This approach finds strong support in consumer behavior theory insofar as perception and cognition are concerned. In other words, consumers may process information, form beliefs and build expectations based on very little information. The reason is that the brain abhors information gaps and fills in the “missing information” to form an opinion or stake out a position. Depending on the degree of objective similarity between the two stimuli and on each respondent’s idiosyncratic processing mechanism, when we sum over all the respondents in a survey we can quantify the extent of LOC in the case and compare it to the results of a control cell to determine whether the results are statistically significant. Judges and juries will still have to decide whether the results are meaningful in terms of their absolute magnitude.

One of the ways in which this approach differs from current practice is that it asks about each state individually while current practice often lumps three or more states in a single question. This improvement can go a long way towards clarifying the basis for the finding of LOC and provide judges and juries with a deeper and better understanding of the severity of the infringement.

The data produced using this approach can help determine courses of action, damages, etc. For instance, cases in which the large majority of “likely to be confused” consumers got there because they believe there is an association between the two marks may be treated differently from cases in which a majority identified their LOC with equivalency (origin). It is conceivable that judgments may include remedies commensurate with those findings.
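A breakdown like the one described here could be tabulated as follows. This is a hedged sketch: it assumes each respondent has already been recorded under the first state he or she affirmed (or None), `loc_breakdown` is a hypothetical helper name, and the sample data are made up for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional

HIERARCHY = ["origin", "sponsorship", "approval",
             "affiliation", "association", "connection"]


def loc_breakdown(classifications: List[Optional[str]]) -> Dict[str, float]:
    """Share of "likely to be confused" respondents attributable to each state.

    Each entry is the state that placed a respondent in the LOC bucket,
    or None for a respondent who affirmed no relationship.
    """
    confused = [c for c in classifications if c is not None]
    counts = Counter(confused)
    total = len(confused)
    return {state: (counts[state] / total if total else 0.0)
            for state in HIERARCHY}


# Made-up classifications for ten respondents: six confused, four not.
sample = ["association"] * 5 + ["origin"] + [None] * 4
print(loc_breakdown(sample))
```

In this illustrative sample, five of the six confused respondents reached the bucket through association, the weak end of the hierarchy, which is the kind of profile the paragraph suggests might be treated differently from one dominated by origin.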

Summary and Conclusions

This paper has shown that the six ways in which the Lanham Act recognizes likelihood of confusion are not created equal, certainly not in the eyes of consumer behavior theory. The paper argues for viewing the six states as arrayed along a hierarchy, which leads to a different questionnaire design architecture than the one currently used. The advantages of the proposed method are: the ability to describe in detail the reasons associated with LOC in every case; the application of internally consistent logic that follows consumer behavior principles; and an increased capacity for meaningful analysis.

Viewing the Lanham Act stipulations as a hierarchy of relationships rather than as an undifferentiated assortment brings the interpretation of the Act in line with the teachings of consumer behavior theory, providing a much-needed link between the law and real-life consumer behavior.

Can Data Analysis Establish Secondary Meaning?

Below are the hypothetical results of three “secondary meaning” studies, each using a test vs. control design with approximately 250 respondents per cell. The three tables show the results obtained when respondents were asked whether the stimuli they had just seen were “made or put out by one company, more than one company or you don’t know or are not sure?”

Case 1

| | Test Group | Control Group |
|---|---|---|
| (Sample Size) | | |
| Percent identifying the stimulus product as “made or put out by one company” | | |

(*) Significantly different from the Control Group at 95% confidence

Case 2

| | Test Group | Control Group |
|---|---|---|
| (Sample Size) | | |
| Percent identifying the stimulus product as “made or put out by one company” | | |

(*) Significantly different from the Control Group at 95% confidence

Case 3

| | Test Group | Control Group |
|---|---|---|
| (Sample Size) | | |
| Percent identifying the stimulus product as “made or put out by one company” | | |

(*) Significantly different from the Control Group at 95% confidence


In each case, the difference between the test group and the control group results is statistically significant.

Evaluate each case separately, determine whether the results point to a finding of secondary meaning, and explain your rationale.

I, as well as many visitors, would appreciate your point of view on this important issue. Please use the Comment box below.

Letting Science In Through the Front Door

Likelihood of confusion surveys routinely employ Test/Control (T/C) designs, a scientifically sanctioned approach for detecting the presence of likelihood of confusion and for assessing its magnitude. Based upon the results presented by survey research experts, judges and juries are asked to determine whether the results support a finding for the plaintiff.

In other words, judicial decision-makers are charged with: (1) determining the presence of alleged infringements, and (2) evaluating the significance of the findings in forming a decision. In many cases, that determination is made on the basis of either case-by-case subjective evaluation or precedent. Of course, what we call precedent today was a subjective, though well-reasoned, assessment at the time of its establishment.

The purpose of this paper is to argue for giving the science of statistics a chance to determine significance in likelihood of confusion cases, as it does in many other fields, ranging from FDA-required new drug approvals to basic science experiments in fields as diverse as biology, behavioral economics, experimental psychology, physics, and chemistry.

Why Use Test/Control Designs?

Before getting into the heart of the argument, let us review the reasons for using T/C designs in likelihood of confusion cases. The process of detecting likelihood of confusion benefits greatly from the T/C design because it allows us to compare the results obtained from measuring the allegedly infringing mark, word, design, etc., with the effect of a stimulus that would not have given the plaintiff reason to launch a complaint in the first place, ceteris paribus.

The Control stimulus is the equivalent of a placebo in new drug research—it “controls for” (by keeping them constant across both the Test and the Control cells) all the variables that might have affected the results produced by the test stimulus except the critical factor that prompted the lawsuit.

While adding a control cell increases the survey’s data collection costs, it has the beneficial effect of eliminating all other possible explanations for the test results except the reason that set off the suit originally. This procedure brings us as close to a cause-and-effect relationship as survey research allows.

What Gets Measured in Test/Control Designs?

The T/C design yields two results in each of the Test and Control cells:

  1. The proportion of consumers who think that the plaintiff’s and defendant’s products are made by the same company or by companies that are affiliated, connected, associated, or related to one another by means of licensing or permission.
  2. The proportion of consumers who do not think that any of the above relationships exists.

When the proportion of Control-cell respondents who think there is a relationship between the two products they were shown is subtracted from the same proportion in the Test cell, the result is the net likelihood of confusion value. If that value is greater than zero, we conclude that likelihood of confusion is present. The next decision is how significant that value is in the eyes of the law: does the net likelihood of confusion found by the expert justify finding for the plaintiff?
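The subtraction itself is simple arithmetic. A minimal sketch, with `net_loc` as a hypothetical helper name and cell counts invented for illustration:

```python
def net_loc(test_yes: int, test_n: int, control_yes: int, control_n: int) -> float:
    """Test-cell proportion perceiving a relationship minus the Control-cell proportion."""
    if test_n <= 0 or control_n <= 0:
        raise ValueError("cell sizes must be positive")
    return test_yes / test_n - control_yes / control_n


# Hypothetical cells of 250 respondents each: 90 vs. 40 perceive a relationship.
value = net_loc(test_yes=90, test_n=250, control_yes=40, control_n=250)
print(f"Net LOC = {value:.1%}")  # Net LOC = 20.0%
print("present" if value > 0 else "not present")  # present
```

This settles only the question of presence. Whether a 20-point net value is significant is the separate question the rest of this section addresses.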

In summary, T/C designs cover two areas: presence and significance of likelihood of confusion.

What Are the Guiding Principles for Determining Significance?

There are two ways of determining significance.
The first method is employed when the determination is entirely judgmental. The decision-maker pursues a maximal decision rule: the larger the net likelihood of confusion value she finds, the higher her confidence that the results are not simply a research artifact but represent what is really going on in the marketplace.

| Method | Guiding Principle | Evidence |
|---|---|---|
| 1. Subjective/Precedent | Maximal net likelihood of confusion | The larger, the better |
| 2. Scientific | Statistical significance of net likelihood of confusion | Just enough to eliminate pure chance as the explanatory reason |

The second method uses the standard statistical test for the difference between two proportions to determine whether the results are statistically significant at 95 percent confidence, the level traditionally used in survey research.

The major difference between the two methods, besides the first being subjective and the second scientific and objective, lies in the evidence each requires. When using the subjective method, the decision maker looks for large differences between Test and Control; the higher the net likelihood of confusion value, the higher his or her confidence in finding for the plaintiff. By contrast, statistical decision-making requires finding just enough of a difference between Test and Control to determine whether the difference is significant or could have happened by chance.

The “more the better” rule has strong intuitive appeal, but intuition is not always a good estimator of what is going on in the real world. The statistical significance method is superior. By ruling out chance as an explanation at a stated level of confidence, it is a far more precise and far sharper measuring device: it can “spot” likelihood of confusion as soon as the net value becomes significant, whereas subjective estimation relies solely on intuition to detect when likelihood of confusion has actually occurred.

The subjective method errs most often on the side of the defendant, as when jurors decide that, say, 20 percent net likelihood of confusion is not enough when, unknown to them, 11 percent might have been enough to satisfy the objective criteria used in statistical testing. Statistical significance testing has no built-in tendency to err in favor of either party.
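The point that a much smaller net value than intuition demands may already be significant can be made concrete by searching for the smallest significant net likelihood of confusion. The sketch below assumes a pooled two-proportion z-test, equal cell sizes and a made-up Control rate. `min_significant_net_loc` is a hypothetical helper name. The exact threshold depends on the Control rate and the cell sizes, so the result will not match the article's illustrative 11 percent.

```python
import math


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Pooled two-proportion z statistic for p1 - p2."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se


def min_significant_net_loc(control_yes: int, n: int, critical: float = 1.96) -> float:
    """Smallest net LOC (Test minus Control proportion) whose z exceeds `critical`
    when both cells contain n respondents."""
    for test_yes in range(control_yes + 1, n + 1):
        if two_proportion_z(test_yes, n, control_yes, n) > critical:
            return (test_yes - control_yes) / n
    raise ValueError("no Test result can reach significance")


# Hypothetical: 250 per cell, with 50 of 250 (20 percent) "confused" in Control.
print(f"{min_significant_net_loc(50, 250):.1%}")  # 7.6%
```

Under these assumptions, any net value of 7.6 points or more would already clear the 95 percent threshold. A fact-finder who waited for 20 points would reject results that the statistical criterion accepts.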

Finally, if there is a choice between a precise, science-based measurement and an intuitive one, wouldn’t justice be better served by relying on objective criteria that hold uncertainty to a known, stated level?


Measuring the Impact of False Advertising

Cases of false advertising typically revolve around two questions: (1) did the false advertising have an impact on purchasing behavior? and (2) if such an impact did indeed occur, what was the potential loss to the plaintiff’s brand? The first question addresses the presence of impact while the second is concerned with its magnitude.

In preparing for litigation, attorneys often employ consumer research to answer the first question and damages experts, e.g., economists, forensic accountants, corporate finance professionals, etc., to answer the second question.

This paper describes a new consumer research method that not only detects the presence of impact but also goes one step further and quantifies the potential gain due to false claims. What this new method cannot do is quantify what percent of sales the plaintiff’s brand has lost to the defendant’s brand. That brand-to-brand comparison is still reserved for damages experts using economic theory and corporate finance estimates.

The Setting

Let’s take as an example a case in which the plaintiff (P) argues that the defendant (D) has been using four false claims on its packaging and on its website. In its suit, P argues that those false claims are likely to affect consumer choice behavior and to cause P to lose sales to D.

The Research Design

The design of this study presents a sample of consumers with a choice between two unidentified brands, labeled Brand W and Brand H, each described by the four attributes being litigated. Brand W lists the attributes verbatim, exactly as D has used them on its package and website. Brand H lists the same attributes either in what would be their accurate version or with a statement that Brand H makes no reference or claim regarding that particular attribute. Thus, Brand W replicates the marketing communication employed by D that brought about the suit, while Brand H replicates the marketing communication of a brand that would not have caused P to file suit.

Each attribute is presented to respondents on a separate screen (if the interview is conducted online) that shows the attribute statement for Brand W and the attribute statement for Brand H. Respondents are instructed to imagine that they are shopping for the product category under study and that they are holding two packages, Brand W and Brand H, that carry the four attributes. For each attribute, they are asked to choose the point on a 5-point scale that best represents their preference given that information. The scale is:

  • Very likely to buy Brand W
  • Somewhat likely to buy Brand W
  • Indifferent between the two brands
  • Somewhat likely to buy Brand H
  • Very likely to buy Brand H

The attribute ratings question is followed by an importance ratings question in which respondents are asked to give a number between 0 and 100 that best reflects the importance of each of the four attributes when selecting a brand to purchase in the product category of interest.

The purpose of this question is to capture interpersonal differences among category users, since not all consumers go to market with the same set of requirements or values. Allowing for the real expression of attribute importance, rather than assuming that all consumers value all attributes equally, injects a high level of realism into the experiment. The greater the realism, the higher the validity of the results.

Analysis of Results

The table below captures the key results of a recent study.

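| [1] Attribute | [2] Importance Rating | [3] Percent Impacted | [4] Impact Weighted by Importance | [5] Percent Not Impacted | [6] Lack of Impact Weighted by Importance |
|---|---|---|---|---|---|
| A | .81 | 92 | 75 | 8 | 7 |
| B | .63 | 75 | 47 | 25 | 16 |
| C | .48 | 53 | 25 | 47 | 23 |
| D | .74 | 53 | 39 | 47 | 35 |
| Average Impact | | | 47 | | 39 |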

Column [1] lists the four attributes that P deems to be false.

Column [2] shows the importance ratings mentioned earlier, expressed as a fraction ranging from 0 to 1.00. As Column [2] shows, the allegedly offending attributes differ in their capacity to impact consumers because not all consumers are equally susceptible to all the attributes. This “susceptibility variance” is measured by how important each attribute is to every individual member of the consuming public. In other words, not every false claim has the same “sticking power” for all consumers, because consumers differ from one another in what they value.

Column [3] shows the percent of the sample that checked “Very likely to buy Brand W” or “Somewhat likely to buy Brand W” in the questionnaire. These are the people who were affected by the content and phrasing of the false claim.
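How Columns [3] and [5] follow from the raw 5-point answers for a single attribute can be sketched as follows. The split assumes, as the text states, that only the two Brand W points count as "impacted", while indifference and either Brand H point count as "not impacted". `impact_split` is a hypothetical helper name, and the responses are made up for illustration.

```python
from typing import List, Tuple

SCALE = [
    "Very likely to buy Brand W",
    "Somewhat likely to buy Brand W",
    "Indifferent between the two brands",
    "Somewhat likely to buy Brand H",
    "Very likely to buy Brand H",
]
IMPACTED = set(SCALE[:2])  # only the two Brand W points count as "impacted"


def impact_split(responses: List[str]) -> Tuple[float, float]:
    """Return (percent impacted, percent not impacted) for one attribute."""
    if not responses:
        raise ValueError("no responses")
    unknown = set(responses) - set(SCALE)
    if unknown:
        raise ValueError(f"unexpected responses: {unknown}")
    impacted = sum(r in IMPACTED for r in responses)
    pct = 100 * impacted / len(responses)
    return pct, 100 - pct  # Column [3] and its complement, Column [5]


# Made-up responses from 20 respondents for one attribute.
sample = [SCALE[0]] * 10 + [SCALE[1]] * 5 + [SCALE[2]] * 3 + [SCALE[4]] * 2
print(impact_split(sample))  # (75.0, 25.0)
```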

Column [4] is the result of weighting the percent of people impacted by the false advertising by the attribute’s relative importance. The weighted impact is obtained by multiplying the data for each attribute in Column [2] by Column [3]. In so doing we account for the very important fact that not all attributes are equally important to all the members of a market. The adjusted, or weighted, result captures interpersonal differences as well as inter-attribute differences and thus safeguards the validity of results.
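The weighting step can be reproduced from the printed values. This sketch multiplies the Column [2] importance by Column [3], and by its complement Column [5], for each attribute. `weighted` is a hypothetical helper name. The inputs are themselves rounded, so a recomputed value can differ from the table by a point: Attribute A's Column [6] comes out at about 6 here, while the table shows 7.

```python
# Columns [2] and [3] as printed in the table; Column [5] is 100 minus Column [3].
TABLE = {
    "A": (0.81, 92),
    "B": (0.63, 75),
    "C": (0.48, 53),
    "D": (0.74, 53),
}


def weighted(importance: float, percent: float) -> float:
    """Weight a percentage by an attribute's importance (a fraction from 0 to 1)."""
    return importance * percent


for attribute, (importance, impacted) in TABLE.items():
    col4 = weighted(importance, impacted)        # Column [4]
    col6 = weighted(importance, 100 - impacted)  # Column [6], from Column [5]
    print(attribute, round(col4), round(col6))
```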

Column [5] shows the percent of people who were not impacted by the false advertising, i.e., those who showed preference for the “not-misleading” brand or were indifferent between the two brands. The data in Column [5] is the complement of Column [3]; the sum of the two equals 100 percent.

Column [6] presents the weighted percentages of the non-impacted people for each attribute, just as Column [4] presented the results for the impacted people.

The bottom line of the table shows the weighted average brand impact for the impacted group, BI(I), which equals 47, and for the not impacted group, BI(NI), which equals 39. Each Brand Impact Index is calculated by summing the weighted values over the four attributes and dividing by four.
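As a check on the averaging step, the sketch below recomputes both indexes as simple means of the rounded Column [4] and Column [6] figures, following the method the text describes. The Column [4] average, 46.5, rounds to the reported 47. The Column [6] values as printed average 20.25 rather than the 39 reported for BI(NI), so that figure may be worth checking against the underlying data before it is carried into the gain factor.

```python
from statistics import mean

column_4 = [75, 47, 25, 39]  # impact weighted by importance, attributes A to D
column_6 = [7, 16, 23, 35]   # lack of impact weighted by importance, attributes A to D

bi_impacted = mean(column_4)      # BI(I): sum over four attributes divided by four
bi_not_impacted = mean(column_6)  # BI(NI): same calculation for Column [6]
print(bi_impacted, bi_not_impacted)  # 46.5 20.25
```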

The Gain Factor

Armed with the two indices we can now derive a gain factor due to false advertising using the formula:

Gain Factor = [BI(I) – BI(NI)] / BI(I)

The formula states that the gain factor is equal to the net impact of the falsely advertised brand as a proportion of its total brand impact. The Gain Factor answers the question: What proportion of the preference for the misleading brand is due to the false claims made in its marketing communication? In this case, the gain factor equals:

Gain Factor = [47 – 39]/47 = 17%

This means that the false advertising of the four contested attributes is directly responsible for 17 percent of the total preference for the brand.
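A minimal sketch of the Gain Factor formula, using the BI(I) and BI(NI) values reported in the text. `gain_factor` is a hypothetical helper name.

```python
def gain_factor(bi_impacted: float, bi_not_impacted: float) -> float:
    """Gain Factor = [BI(I) - BI(NI)] / BI(I)."""
    if bi_impacted <= 0:
        raise ValueError("BI(I) must be positive")
    return (bi_impacted - bi_not_impacted) / bi_impacted


print(f"{gain_factor(47, 39):.0%}")  # 17%
```

The formula is a ratio, so the result depends directly on the two indexes. Any revision to BI(I) or BI(NI) changes the gain factor in the same calculation.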


When a brand is alleged to have engaged in false advertising, the plaintiff lays claim to a portion of the infringing brand’s revenue and profits, arguing that, ceteris paribus, its own sales and profits would have been higher if it were not for the false claims. This formulation pits two brands against each other in a zero-sum scenario.

Consumer research cannot estimate the transfer of sales, revenue or profits from one brand to another that is due to false claims. But, as demonstrated in this paper, consumer research can estimate the overall ill-gotten gain of the alleged misleading brand.

Instead of pitting a specific brand against the alleged infringer, this estimate pits the infringer against the average of all brands in the product category. It does so by estimating the ill-gotten gain relative to not having used the allegedly misleading claims.

We can employ consumer research as shown here to assess the gain from misleading communication regardless of who the loser might be, as long as the loser is a brand operating in the same product space.

The methodology discussed here provides the finder of fact with a quantitative estimate of the impact of false advertising that should be of value in deciding between the litigants.

The Best Way to Rebut a Rebuttal

Most rebuttal reports start with the phrase “This research is fatally flawed.” Whether that is true or not, experts tend to use that phrase quite liberally without paying too much attention to the supporting evidence.

When rebutting a survey that has been submitted in evidence, one should ask two questions: (1) has the researcher adhered to the fundamental scientific principles governing the design and execution of survey research; and (2) has he or she gone beyond acceptable and customary design choices? If the rebutter can demonstrate that the researcher has violated fundamental principles, then the “fatally flawed” conclusion is definitely appropriate.

When it comes to design choices, the rebutter must be careful to distinguish between unacceptable choices, which automatically fall into the “fundamental principles” category, and choices that differ from those the rebutter would have made but certainly do not qualify as “fatal flaws.” What can a researcher whose work has been wrongly accused of “fatal flaws” do?

In a case in the Eastern District of Michigan in which I was involved five years ago, I was fortunate that the client agreed to fund a remake of my original research in which I replaced my original design choices with the rebutter’s recommendations. The rebutting expert in this case, one of the most prominent and widely published members of the profession, had to dig very deeply to find something wrong with my research and ultimately resorted to making indefensible assertions.

As one would expect, the results of the “remake” were identical to those of the original study, proving to the court that design choices clearly not intended to sway the results one way or the other do not deserve the “fatally flawed” label.

When litigating a case in which a rebuttal is proffered, attorneys should be aware of the “fundamental principles vs. design latitude” dichotomy and pursue it at deposition and, if possible, by replicating the original research with the rebutter’s design recommendations incorporated. Yes, that adds to the overall expense, but it spares the agony of having to convince the finder of fact that the points raised by the rebutter are trivial and inconsequential.