A few months ago, I attended a presentation by a well-known geneticist who usually starts his lectures by acknowledging his debt to the Simons Foundation. The generosity of the Simons family has allowed the establishment of openly available repositories of genetic, cellular and clinical data from a large population of autistic individuals as well as their immediate family members. Within this research portfolio, the Simons Simplex Collection, for example, is a warehouse of genetic and clinical data gathered from some 3,000 families, each having one child affected with an autism spectrum disorder (ASD). This Collection advertises its mission with the intended meme “The Hunt for Autism Genes”. The massive undertaking has removed many labor and financial hurdles for researchers, who can now sit back and analyze this gigantic databank. In effect, the new environment has created a cadre of armchair researchers and a new way of doing research. The new research culture has spawned an assembly line of projects, each one initiated by looking into a database for positive correlations and possible subsequent funding. This is a far cry from classical research studies that, in a stepwise fashion, started with a question (hypothesis), did a literature review, designed a study, and finally collected data before pursuing any statistical analysis. Later on, funding considerations would be based on how researchers had pursued their findings and provided mechanistic underpinnings to their results.
Although I have spoken primarily about genetic studies, the same may be said about other research domains like neuroimaging. In all of these studies, the differences between the classic research design and modern database-driven research include the following:
- In the classic method a researcher attempted to answer a question that, in his mind, was of genuine interest. Many of these questions were born from clinical observations. The results bore statistical significance when they were very unlikely to have occurred under the null hypothesis. The new method, in turn, attempts to find a publishable result within an already available dataset. The former approach focused on the patient, the latter on numbers.
- Every comparison in the classic design originated from a hypothesis specifically addressed by the research design. The new approach generates significant results by data crunching, with sense and explanations coming after the fact. Many large database studies are fishing expeditions (think: “The Hunt for Autism Genes”); they do not remain tied to a stated design or protocol but shift the goal line as they move along. When working with a biostatistician, the initial analysis is done and redone until something positive or interesting to the researcher pops up! Modern research has become the purview of data manipulation through reanalysis.
- Prospective cohort studies in the classic method were used to search for causal relations but could not by themselves establish causality. Results, in this regard, were the first step in an experimental ladder aimed at explaining underlying mechanisms and making sense of the findings. By way of contrast, the modern database approach to research does not pursue causal or explanatory research. Results of one study only lead to more data crunching. The fact that the results get published and funded has rewarded researchers who pursue the “low-hanging fruit” questions.
- Positive results from large databases are to be expected! This is a problem inherent to large population samples. Indeed, large samples increase the power of statistical tests by limiting the effect of outliers and narrowing the confidence interval, so that even very small differences reach statistical significance. This favors finding many results, but how many of these will be clinically relevant?
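The last point, and the "fishing expedition" point before it, can be made concrete with a little arithmetic. What follows is only a rough sketch under stated assumptions, not an analysis of any real database: it assumes a two-sample z-test with unit variance in both groups, a hypothetical standardized difference of 0.05 (trivially small by most clinical standards) that is observed exactly, and independent tests at a 0.05 threshold. The function names and all of the numbers are illustrative.

```python
import math


def two_sample_p_value(effect_size: float, n_per_group: int) -> float:
    """Two-sided p-value of a z-test for a difference in means.

    Assumes unit variance in both groups and that the observed
    standardized difference equals `effect_size` exactly.
    """
    z = abs(effect_size) * math.sqrt(n_per_group / 2)
    # For a standard normal, 2 * (1 - Phi(z)) equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))


def ci_half_width(n_per_group: int) -> float:
    """Half-width of the 95% confidence interval for the difference in means."""
    return 1.96 * math.sqrt(2 / n_per_group)


def chance_of_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one 'significant' result among n_tests
    independent comparisons when no real effect exists anywhere."""
    return 1 - (1 - alpha) ** n_tests


TINY_EFFECT = 0.05  # hypothetical, clinically negligible difference

# The same tiny difference, tested in ever larger samples.
for n in (100, 1_000, 10_000, 100_000):
    p = two_sample_p_value(TINY_EFFECT, n)
    print(f"n per group = {n:>7}: p = {p:.2e}, 95% CI half-width = {ci_half_width(n):.4f}")

# The same null data, interrogated with ever more comparisons.
for m in (1, 20, 100):
    print(f"{m:>3} comparisons: chance of a spurious hit = {chance_of_false_positive(m):.2f}")
```

Under these assumptions the same negligible difference moves from nowhere near significance in a small sample to overwhelmingly "significant" in a very large one, and the chance of at least one spurious hit climbs quickly with the number of comparisons tried. Neither number says anything about whether a result matters to a patient.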
A short review of the research literature in autism, especially regarding genetics, reveals an unsettling trend towards data banking. A friend of mine, Paul Patterson (1943-2014), wrote a critique for one of my books regarding the skewed funding of the federal government in favor of genetic studies. I was honored to receive his contribution, as Paul was very sick and this was probably his last scientific contribution before passing away. Paul complained that analyzing large data banks was extremely naive when considering a multifactorial or complex condition. You could not read the effects of the environment or follow patients longitudinally from fixed databases. My own concern is this: how significant are the results of such analyses? I believe that when conducting a study, publishing an article, or asking for funding, researchers should stipulate how the results of their study make a difference in the quality of life of patients; in the now, not in the future. If such a question leaves a researcher gasping for air, then the original study gained significance only in the mind of the researcher, but not in the real world.
Casanova MF. The National Database for Autism Research (NDAR): Playing Favorites in Autism Research. Corticalchauvinism.com. Readers interested in the problem of research in autism should also read this blog post and the next reference, both of which exemplify how the federal government dictates funding initiatives.
Casanova MF. Federally Funded Autism Research. Corticalchauvinism.com
Casanova MF. Paul Patterson: The Pathology of Autism: Is it a Strictly Genetic Disorder? Corticalchauvinism.com
Casanova MF. Autism Updated: Symptoms, Treatments and Controversies. Amazon Publishing, 2019.