Interpretation of Statistical Inferences

Editor's Note: This article is based upon a presentation sponsored by District 1 (Harris County) of the Legal Assistants Division entitled Daubert and The Reference Manual on Scientific Evidence: Using Statistical Evidence Persuasively. The program was held for attorneys and legal assistants for the purpose of adding to their understanding of statistical methods in response to the decisions by the U.S. Supreme Court in Daubert v. Merrell Dow Pharmaceuticals, Inc., 113 S.Ct. 2786 (1993) and by the Texas Supreme Court in E.I. DuPont De Nemours and Co., Inc. v. C.R. Robinson and Shirley Robinson, (Tex., June 15, 1995).
  

Statistics, whether presented in an elementary statistics course or through advanced regression modeling software, is the science of collecting and interpreting numerical data. It is rigorously founded on the axioms and properties of mathematical probability, and any inference made from statistics depends completely on the validity of the probabilistic assumptions behind it. Statistics is a relatively new field in the sciences, since most of the significant progress has been made in the past seventy years. Through these developments two approaches have emerged as the basis for statistical research and study: exploratory data analysis and formal statistical inference. The techniques, measures, and results of the two approaches are very similar, even though the interpretation of their results is drastically different. It is, in fact, the interpretation step that causes most of the difficulty for today's statistical consumer, so we will briefly review both approaches.

Exploratory Data Analysis

Exploratory data analysis is an extremely important part of many scientific research projects because it reveals the relationships inherent in the collected data. Exploratory analysis can be performed on any set of data, and the results are commonly found in the appropriate peer-reviewed journals. It uses graphical techniques and numerical summaries to describe the variables in a data set and the relations among them. A statistical analyst will carefully "explore" the data in every possible manner, looking for patterns or relationships that might suggest conclusions or raise questions for further study. The Reference Manual on Scientific Evidence 1 devotes a complete chapter to the most common exploratory technique, multiple regression. The uses of multiple regression are so diverse and complex that anyone faced with regression-based evidence should consult an expert statistician specializing in regression techniques.

An important issue to be addressed with any analysis is the robustness of the technique. The Reference Manual on Scientific Evidence discusses the robustness of multiple regression, describing it as "whether regression results are sensitive to slight modifications in assumptions (e.g., that the data are measured accurately)."2 Exploratory analysis relies heavily on graphical techniques and interpretation, as well as on experience. An intrinsic problem exists with consumer use of its results. While the results of exploratory analyses are easily obtained through literature searches, the consumer is left to make his or her own interpretation. This is where the mistakes occur. What inferences can one make from the results of an exploratory analysis? The short answer: none. That answer is not the whole story, but it conveys a very important message about how little freedom one has to make inferences from an exploratory analysis. Results gained through exploratory analyses are best used to pose a question, or hypothesis, which can then be studied through a complete and rigorous statistical design referred to as formal statistical inference. When a deduction, or inference, is made from an exploratory analysis to a larger population, or parent population, the results become anecdotal, irrelevant and, therefore, worthless.
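Returning to the robustness question raised above, one way to picture it is to refit a regression after a small change to the data and see how far the result moves. The sketch below is illustrative only. The data are invented, the function name fit_line is hypothetical, and it uses a single explanatory variable rather than a full multiple regression.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data (an assumption for illustration): age in years and
# systolic blood pressure for five subjects.
ages = [25, 35, 45, 55, 65]
sbp = [118, 121, 127, 131, 138]

slope_all, _ = fit_line(ages, sbp)

# Slight modification: suppose the first reading was really 120, not 118.
sbp_modified = [120] + sbp[1:]
slope_mod, _ = fit_line(ages, sbp_modified)

# Gross measurement error: suppose the last reading was mis-keyed as 183.
sbp_outlier = sbp[:-1] + [183]
slope_out, _ = fit_line(ages, sbp_outlier)

print("slope, original data:     ", slope_all)
print("slope, slight change:     ", slope_mod)
print("slope, one mis-keyed value:", slope_out)
```

If the slope barely moves under the slight change but swings widely when one value is mis-keyed, the result depends heavily on the assumption that the data were measured accurately, which is the kind of sensitivity the Manual's definition describes.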

Formal Statistical Inference

Formal statistical inference begins with (1) a hypothesis and is followed by (2) the design of the study, or analysis. These designs can become quite complex and rigorous, and unfortunately may still fail to answer the question posed in the hypothesis. Assuming a proper design has been completed, the next step is (3) sampling. Once the data are collected, (4) analysis of the data begins. The last step is (5) inference. We will briefly review sampling, analysis of the data, and inference below. Although the subject is too broad to cover fully in an article of this length, you should finish with a grasp of the basic ideas and vocabulary and be able to find references to assist you in further understanding.

Sampling and Data Collection

Two extremely important issues with respect to sampling are biased sampling and inadequate sample size. If the sample is biased or too small, conclusions drawn from the sample statistics are not valid. Beyond the problems of sampling are the problems that can arise during the actual data collection. Valid data collection depends on blinding and reliable instrumentation.
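The two sampling problems named above can be seen in a small simulation. The sketch below is not drawn from any real study: the population is generated artificially, the function names are hypothetical, and the "biased" design (sampling only from the upper half of the population) is just one simple way a sample can fail to represent its parent population.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch gives the same numbers each run

# Artificial parent population (an assumption for illustration):
# 10,000 measurements centered near 50.
population = [random.gauss(50, 10) for _ in range(10_000)]
true_mean = statistics.mean(population)


def random_sample_mean(n):
    """Mean of a simple random sample of size n."""
    return statistics.mean(random.sample(population, n))


def biased_sample_mean(n):
    """Mean of a sample drawn only from the upper half of the population."""
    upper_half = sorted(population)[len(population) // 2:]
    return statistics.mean(random.sample(upper_half, n))


def spread_of_sample_means(n, repeats=200):
    """Standard deviation of the sample mean across repeated random samples."""
    means = [random_sample_mean(n) for _ in range(repeats)]
    return statistics.stdev(means)


print("population mean:          ", round(true_mean, 2))
print("random sample, n=500:     ", round(random_sample_mean(500), 2))
print("biased sample, n=500:     ", round(biased_sample_mean(500), 2))
print("spread of means, n=25:    ", round(spread_of_sample_means(25), 2))
print("spread of means, n=400:   ", round(spread_of_sample_means(400), 2))
```

A larger sample does nothing to repair the biased design, since the biased mean stays well above the population mean however many subjects are drawn. Sample size matters in a different way: with random sampling, the means from small samples scatter much more widely than the means from large ones.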

Blinding means that the data are collected by someone who does not know the exposure status of the subject being studied. This is a necessary safeguard and is most commonly used in drug and chemical studies. Detailed explanations of blinding, and of double-blind experiments, can be found in Introduction to the Practice of Statistics.3 This introductory-level text is referred to frequently in the Reference Manual on Scientific Evidence, and is an excellent reference for definitions and explanations of many important statistical concepts.

Analysis of Data

The proper technique of analysis depends on the form of the variables of interest. The two forms of variables are continuous and discrete.

Continuous variables measure an outcome on a continuum, such as height, temperature, distance, cost, and blood pressure. Discrete variables measure an outcome that can take only one of a set of distinct values or categories, such as gender, number of children in a family, type of car, state of residence, and smoker or non-smoker. Variables are also classified as either dependent or independent. Dependent variables, sometimes called response variables, measure the outcomes of interest in a study. Independent variables, sometimes called explanatory variables, measure outcomes that might explain the changes in the observed outcome of interest. For example, if you were studying changes in blood pressure with regard to age and exercise, the dependent variable would be something like systolic blood pressure (SBP), and the independent variables would be age and, perhaps, the number of hours each subject exercises per week.

In certain cases the form of the variables determines the data analysis technique. If the data are categorical, that is, both independent and dependent variables are discrete, a method called chi-square testing is the proper form of analysis. If both independent and dependent variables are continuous, then multiple regression is the proper technique. And if the independent variables are discrete but the dependent variable is continuous, then the correct technique is analysis of variance, or ANOVA. In a complicated data analysis situation ANOVA is frequently used because it encompasses fixed-effect, random-effect, and mixed models. These models are extremely useful, and unfortunately extremely complex, for evaluating the level of variation caused by a particular variable. Questions about these complexities are best left to an expert. If you are ever relying on the results of an analysis that uses terms and ideas such as these, you should consult a professional statistician for interpretation.
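The three pairings described above amount to a small lookup table. The sketch below simply restates them. The function name choose_technique and its labels are hypothetical, and the fourth combination (continuous independent variables with a discrete dependent variable) is not addressed in this article, so the sketch does not name a technique for it.

```python
def choose_technique(independent, dependent):
    """Return the analysis this article names for a pairing of variable forms.

    Each argument must be the string "discrete" or "continuous".
    """
    forms = ("discrete", "continuous")
    if independent not in forms or dependent not in forms:
        raise ValueError('each form must be "discrete" or "continuous"')

    # Pairings as stated in the article: (independent, dependent) -> technique.
    techniques = {
        ("discrete", "discrete"): "chi-square testing",
        ("continuous", "continuous"): "multiple regression",
        ("discrete", "continuous"): "analysis of variance (ANOVA)",
    }
    # Any pairing not listed above is outside the article's scope.
    return techniques.get(
        (independent, dependent),
        "not covered in this article; consult a statistician",
    )


# Example: smoker / non-smoker (discrete) versus blood pressure (continuous).
print(choose_technique("discrete", "continuous"))
```

A table like this is only a starting point. As the text notes, real studies often mix variable forms, and the choice among fixed-effect, random-effect, and mixed models calls for a professional statistician.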

To identify mistakes in the interpretation of an analysis, a statistician looks for improper assumptions, improper techniques, and failure to identify and control for confounding factors. Also, for the results of a statistical analysis to be considered valid, they must be replicable. After each of these stages has been properly addressed, the last step is the inference or, more accurately, testing the hypothesis.

Inference

It is at the inference step that the results of a properly conducted sample and analysis may be generalized, or inferred, to pertain to the parent population. The results of formal statistical inference not only answer the specific question posed but also provide a measure of the reliability of the conclusions. Two common techniques used for inferential statistics are confidence intervals and significance testing. Confidence intervals are usually considered easier to interpret, while a significance test and its p-value are actually more informative. Both techniques yield the same conclusion concerning the validity of the hypothesis and report a level of confidence or significance, respectively.
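The agreement between the two techniques can be checked with a little arithmetic. The sketch below is a simplified illustration under stated assumptions: it uses a normal approximation (reasonable for a large sample; a small sample would call for a t distribution instead), the summary numbers are invented, and the function name interval_and_test is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def interval_and_test(sample_mean, sample_sd, n, null_value, confidence=0.95):
    """Normal-approximation confidence interval and two-sided p-value.

    null_value is the population mean claimed by the hypothesis under test.
    """
    std_error = sample_sd / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    interval = (sample_mean - z_crit * std_error,
                sample_mean + z_crit * std_error)
    z = (sample_mean - null_value) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return interval, p_value


# Invented summary data: 100 subjects, mean SBP 132, standard deviation 15.
for claimed_mean in (128.0, 130.0):
    (low, high), p = interval_and_test(132.0, 15.0, 100, claimed_mean)
    inside = low <= claimed_mean <= high
    print(f"claimed mean {claimed_mean}: "
          f"95% interval ({low:.2f}, {high:.2f}), "
          f"claimed value inside interval: {inside}, p-value {p:.4f}")
```

Under these assumptions a claimed value that falls outside the 95% confidence interval has a two-sided p-value below 0.05, and a claimed value inside the interval has a p-value above 0.05, so the two reports lead to the same decision about the hypothesis.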

Conclusion

The differences between the results of the two approaches to statistical analysis are subtle, and they are the source of a vast amount of misinterpretation of statistics. If any one of the previously stated stages of formal statistical inference is not correctly addressed, then any generalization of the results is not valid. Unfortunately, many users of statistics do not adhere to these standards. Consumers of statistics, especially members of the legal profession, should concern themselves with the legitimacy of the inferences that they make from reported or published statistical results. Statistics have become a part of everyday life in the modern world, but the interpretation of the results is not necessarily being carried out correctly. Properly used, statistical results are a powerful tool and should be used as scientific evidence for decision making in the face of uncertainty. The power and strength of statistics are irrefutable, so they should be used whenever a case permits. Also, be prepared to attack the statistical evidence of others, because more than likely an improper inference is being made against your case.

Statistics are fun, useful, and helpful, so take the time to learn as much as you can about the subject. Statistics are not going to go away, so prepare yourself to use them in your favor, that is, prepare yourself to use them correctly.


1 Reference Manual on Scientific Evidence, Federal Judicial Center, 1995.
2 Id. at 432.
3 David S. Moore and George P. McCabe, Introduction to the Practice of Statistics (New York: W.H. Freeman and Company, 1993).
Patrick Tarwater received his B.A. in Mathematics with a Minor in Philosophy from Texas Tech University in 1990 and received his M.S. in Mathematics from Texas Tech University in 1992. Pat is currently a member of the faculty at the University of St. Thomas and a Ph.D. student in the Department of Biometry (or Bio-statistics).

TEXAS PARALEGAL JOURNAL
Summer 1996
©1996 Legal Assistants Division, State Bar of Texas

