The Testing Fallacy
How the government, “experts,” and the media drank the COVID-19 testing Kool-Aid.
Test, test, test
Medical and public health experts, government and the media have put all our collective eggs in the COVID-19 testing basket.
Re-opening the economy and the world is dependent on testing. Many demand to be tested because they’re scared, uncertain and want to know if they are infected or a risk to others. Some are getting tested so they don’t have to wear masks, stand six feet apart, and live in separate rooms for the next two years. Others are getting tested because the COVID-19 drive-thru is just a block away from the Starbucks drive-thru. Still others don’t care or would never submit to a giant Q-tip being shoved up their nose. And now businesses are being pressured to test everybody to let them back to work.
The experts are convinced the problem isn’t the test—it’s that we’re not testing enough. They tell us that without this information, “it’s risky” for anyone, anywhere, at any time.
Science doesn’t get any more objective than this, right? You do the nasal swab test—or now a blood or saliva test—and either you’re “positive” or you’re “negative.” How can something so straightforward not be the right thing to do?
When it comes to a scientific strategy for diagnostic testing for COVID-19, and the admonition to “test, test, test,” our country and most others have drunk the proverbial Kool-Aid. We are finding out the hard way that the results are tainted with wrong assumptions and poor outcomes. The entirety of the problem lies in the reliance on a flawed assumption of a “perfect” test.
Treating testing as a diagnostic panacea is crippling a sensible return to normalcy while doing little to stem the potential rate of disease transmission. At the same time, and ironically, skewed reporting of the results by counting only the positives is generating a desperation for still more testing, a theme that has itself gone viral.
But virtually no one in a position of leadership—virtually no “expert” from Dr. Fauci or Dr. Birx to Harvard School of Public Health or the World Health Organization (WHO)—nor the New York Times, Chris Cuomo or Fox News, nor any other naysayers and conspiracy theorists—has the technical insight or willingness to be politically incorrect in calling B.S. on the sweeping failure to apply an appropriate statistical principle quietly understood by everybody but the security guards at the Centers for Disease Control (CDC).
The consequences of the deadly combination of scientific semi-literacy—knowing enough to have opinions and broadcast them on social media—and bowing to the demands of political correctness, could not be more dire.
This mistake has been made nationwide en route to mismanaging the worst pandemic in a century. The consequences have been catastrophic. Over 120,000 people have died of infection, but millions of uninfected people have been needlessly quarantined, and countless infected people have been told they are free and clear to go out into the world and infect others. Nobody knows who’s really infected, so everybody is forced to wear a mask and sanitize every surface they think they touched (despite no actual evidence this route has transmitted any infections).
Paranoia is the new sanity. Obsessive-compulsives have attained nirvana.
The livelihood of millions of people—formerly known as “the economy”—has been decimated by sweeping restrictions on personal space in the name of public health. Only the stock market has been spared as it writes off 40 million unemployed and the growing death toll because Netflix revenues are up and a vaccine is coming any day now.
How did this happen, and where does it end?
No medical test is ever perfect. That’s why knowing a test’s performance characteristics is critical to interpreting the results, especially when the test is applied to screen a large population. The two key characteristics are sensitivity (the share of infected people the test correctly flags as “positive”) and specificity (the share of uninfected people it correctly clears as “negative”).
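To make the two terms concrete, here is a minimal Python sketch. The counts are invented for illustration only and are not data from any COVID-19 test; the function names are hypothetical.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of truly infected people the test flags as positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of truly uninfected people the test clears as negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical validation sample (assumed numbers, not real data):
# 100 people known to be infected and 1,000 known to be uninfected.
sens = sensitivity(true_pos=90, false_neg=10)
spec = specificity(true_neg=980, false_pos=20)

print(f"sensitivity = {sens:.0%}")
print(f"specificity = {spec:.0%}")
```

In this made-up sample the test catches 90 of 100 infections and clears 980 of 1,000 uninfected people. Neither number alone says how trustworthy a single positive result is.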
There is a very important distinction between laboratory sensitivity/specificity (i.e., in a test tube) and clinical test performance characteristics (i.e., in actual people) as they relate to both individual diagnosis and population screening. The profound importance of this distinction between using a test to make an individual clinical diagnosis and applying it for population screening interpretation cannot be overstated.
Interpretation of a test for an individual diagnosis—a clinical application—does not really depend on population prevalence, which is the percentage of people who are actually infected.
However, when that very same individual diagnostic test is applied as a screening method for the general population, the reliability of a positive result—the likelihood it’s a “true positive” rather than a “false positive”—depends largely on the actual prevalence in the population as well as the test performance characteristics.
This basic tenet of biostatistics is known as Bayes’ theorem. Most people have never heard of it, or can’t grasp the math. But every public health graduate student must know it to get a degree. It’s not a matter of opinion or belief, like the solution for global warming or whether the call to the Ukraine was perfect.
For those who do know Bayes’ theorem or have become experts after graduating from Google University, here’s the bombshell that blows apart virtually every news conference with Governor Andrew Cuomo you’ve ever seen: When a large number of asymptomatic, otherwise healthy or low-risk people are tested (screened) with even a nearly perfect test in a population with relatively low prevalence, most of the positive results are actually false positives, rather than true positives. Worse still, even a small decrease in test specificity lowers the positive predictive value significantly.
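For readers who would rather see the arithmetic than take it on faith, here is a minimal Python sketch of the Bayes' theorem calculation. The prevalence, sensitivity and specificity values are assumptions chosen purely for illustration, not measured characteristics of any actual COVID-19 test, and the function name is hypothetical.

```python
def positive_predictive_value(prevalence: float,
                              sensitivity: float,
                              specificity: float) -> float:
    """Bayes' theorem: probability of infection given a positive test."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Assumed inputs (illustrative only): 1 in 200 people infected,
# and a test with 95% sensitivity.
PREVALENCE = 1 / 200
SENSITIVITY = 0.95

ppv_99 = positive_predictive_value(PREVALENCE, SENSITIVITY, specificity=0.99)
ppv_97 = positive_predictive_value(PREVALENCE, SENSITIVITY, specificity=0.97)

print(f"PPV at 99% specificity: {ppv_99:.1%}")
print(f"PPV at 97% specificity: {ppv_97:.1%}")
```

Under these assumed inputs, roughly two out of three positives are false, and shaving two points off the specificity pushes that to about six out of seven.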
What this means for COVID-19 is that the likelihood of a false positive test is actually far greater than the likelihood of a true positive. This is the case because from an epidemiological standpoint, the prevalence of “confirmed cases” is still relatively low even though it climbed substantially between February and June: roughly 1 in 22 million back on February 26 (15 cases in a population of 329,450,000), when Trump declared the 15 cases “within a couple of days is going to be down to close to zero”; to approximately 1 in 770 on April 10 (per the CDC: 427,460 ‘confirmed’ cases in a population of 329,450,000); to around 1 in 167 on June 10 (1,973,797 cases in a population of 329,450,000 minus 112,133). For comparison, common diseases like hypertension and diabetes mellitus (which the coronavirus likes) have a much higher prevalence, in the range of 1 in 10 to 1 in 4.
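The April and June ratios can be recomputed directly from the counts quoted above. This is a minimal Python sketch; the variable names are hypothetical, and the only inputs are the case counts and population figures given in the text.

```python
# Counts and population as quoted in the text (attributed there to the CDC).
POPULATION = 329_450_000

april_cases = 427_460
june_cases = 1_973_797
june_adjustment = 112_133  # subtracted from the population in the text's June figure

one_in_april = POPULATION / april_cases
one_in_june = (POPULATION - june_adjustment) / june_cases

print(f"April 10: about 1 in {one_in_april:,.0f}")
print(f"June 10:  about 1 in {one_in_june:,.0f}")
```

The division works out to roughly 1 in 771 for April 10 and 1 in 167 for June 10, still far below the 1 in 10 to 1 in 4 range cited for hypertension and diabetes.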
The reason why the actual false positive rate is necessarily much higher than the true positive rate is not because the test isn’t reasonably accurate, but because it’s not being applied correctly.
If a test is reliable, utilized correctly, and interpreted properly, it can provide valuable clinical and public health information to inform policy decisions. But if it’s unreliable, applied inefficiently, or misinterpreted—or all of the above—it may provide misleading or contradictory information. The key is understanding the clinical test characteristics and the mathematical principles behind them.
It is just as important to recognize that “testing” is not required to understand and contain an epidemic. In fact, prior to 1980 nearly all infectious disease outbreaks were defined by clinical and epidemiological characteristics. One example is the Broad Street cholera outbreak of 1854 in London. A more recent example is the Legionnaires’ disease outbreak of 1976 in Philadelphia.
To understand how we got from theoretical math to this precarious place, we need to look at the science behind the “science.”
The Brief History of COVID-19 Testing
The WHO originally developed a real-time reverse transcription polymerase chain reaction (RT-PCR) test about two weeks after the outbreak was first reported in Wuhan and the genetic code of the novel coronavirus was sequenced. This test can detect even one molecule of coronavirus RNA. The Chinese rolled out the WHO test and leaned on it heavily. Italy and Spain used the WHO/Chinese test as well.
However, while laboratory sensitivity and specificity (i.e., in a test tube) were measured, clinical test performance characteristics (i.e., in actual people), on which Bayes’ theorem depends, were not. At this stage in the pandemic the value of knowing the test characteristics was highest, and therefore the opportunity to ensure the test would be applied and interpreted effectively was also at its highest.
Fast forward one month. What happened in Wuhan didn’t stay in Wuhan. While the Trump administration was calling the threat of spread to the U.S. a “Chinese hoax” and the WHO was giving the novel coronavirus a politically correct, sanitized appellation of “COVID-19,” the CDC decided to reinvent the wheel on diagnostic testing. It took just a couple of weeks for CDC to create its own PCR test, which was approved by the FDA the first week of February. That was the easy part.
Clinical case series from China and Singapore published online in prestigious, English-language journals such as The Lancet, JAMA and the New England Journal of Medicine as well as Chinese journals in January through March, 2020 documented how heavily diagnostic RT-PCR testing was relied upon for infection diagnosis and treatment. Only positive tests were deemed to be “confirmed cases.” No information about the outcomes of those who tested “negative” was reported. This is a lot like evaluating a baseball player’s batting average by the number of hits without counting how many at-bats it took him.
The obvious lesson from the Chinese experience should have been: don’t rush to test until you’ve tested the test. Instead, CDC performed a very limited clinical measurement of the test in January, 2020. This single study demonstrated a positive rate of just 5%: specifically, 11 positive tests among 210 high probability cases, which included 68% of patients with fever and 90% with a cough. In other words, 95% of individuals highly likely to have the disease did not test positive. These striking findings translate to a clinical sensitivity of just 5%, a result that is breathtaking in the worst way possible, and that’s without even knowing the specificity (which determines the false positive rate).
As it turns out, during the initial outbreak in China, measurement of the test’s clinical reliability was virtually non-existent, and what little was available was censored. The Chinese assumptions of perfect test characteristics were challenged by a single research article published in a Chinese medical journal on March 5, 2020. Using Bayes’ theorem, the very same established principle discussed herein, the authors estimated the false positive rate of PCR tests used in China could be as high as 80%. However, this valuable epidemiological analysis of the diagnostic test’s unreliability (literally, this test of the test) was abruptly withdrawn from publication by “the editors.”
Translation: Big Brother censored it.
This occurrence—highly unusual even in communist China—has profoundly important implications for worldwide application of this same COVID-19 test and other similar tests. Yet during this once-in-more-than-a-lifetime epidemic that has had a stranglehold on the global conversation for months and that has already generated nearly 30,000 other scientific publications (most not peer-reviewed, and growing at a rate faster than the virus itself), this troubling story received only cursory attention from the US media. Startlingly, during this same timeframe the question of RT-PCR diagnostic test reliability has not been challenged or questioned by the WHO, the CDC or any other international medical or public health authority.
By early February, largely due to CDC’s inefficiency, about seven different private US labs were recruited or otherwise somehow developed their own proprietary versions of the RT-PCR test, which the FDA then approved on an emergency basis. But, much like in China, neither CDC nor the FDA ever measured, estimated or disclosed any of these tests’ clinical test characteristics—or required them to be compared either to one another or to the CDC version.
It is worth noting that the gold standard for calculating clinical sensitivity would be a culture of infected host cells taken from the patient. This elaborate approach had technical constraints in those who were asymptomatic. However, testing the test using basic clinical evidence of infection to estimate test characteristics was both vital and feasible. This approach could have helped establish test sensitivity in high-risk, clinically ill people. Randomized sampling in unexposed populations could have been performed in order to estimate test specificity. None of these things was ever done.
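As a rough illustration of what “testing the test” could look like, the Python sketch below estimates sensitivity from a sample of clinically ill patients and specificity from a random sample of unexposed people, with a 95% Wilson score interval around each. The sample sizes and counts are invented, the function name is hypothetical, and the Wilson interval is one common choice made here for illustration rather than a method named in this article.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 for about 95%)."""
    p = successes / n
    denom = 1.0 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical validation data (assumed, not real):
# 180 positives among 200 clinically ill patients  -> sensitivity estimate
# 990 negatives among 1,000 unexposed people       -> specificity estimate
sens_hat = 180 / 200
spec_hat = 990 / 1000
sens_lo, sens_hi = wilson_interval(180, 200)
spec_lo, spec_hi = wilson_interval(990, 1000)

print(f"sensitivity {sens_hat:.1%} (95% CI {sens_lo:.1%} to {sens_hi:.1%})")
print(f"specificity {spec_hat:.1%} (95% CI {spec_lo:.1%} to {spec_hi:.1%})")
```

Even a few hundred well-chosen samples narrow the plausible range considerably, which is the kind of field check a few extra days could have provided.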
Rather than advise physicians and the public of these significant limitations from the outset, the CDC only briefly acknowledged this information weeks later by burying it near the end of the third revision of a CDC technical laboratory manual on March 30, 2020 with a generic statement: “Positive and negative predictive values are highly dependent on prevalence. False negative test results are more likely when prevalence of disease is high. False positive test results are more likely when prevalence is moderate to low.”
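The dependence on prevalence in that statement can be seen by sweeping prevalence while holding the test fixed. This is a minimal Python sketch; the 90% sensitivity and 98% specificity are assumed values for illustration only, not figures from the CDC manual, and the function name is hypothetical.

```python
def predictive_values(prevalence: float,
                      sensitivity: float,
                      specificity: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a test applied at the given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Assumed test characteristics (illustrative only).
SENSITIVITY, SPECIFICITY = 0.90, 0.98

rows = [(p, *predictive_values(p, SENSITIVITY, SPECIFICITY))
        for p in (0.001, 0.01, 0.10, 0.50)]

for prevalence, ppv, npv in rows:
    print(f"prevalence {prevalence:>6.1%}  PPV {ppv:6.1%}  NPV {npv:6.1%}")
```

As prevalence rises, the positive predictive value climbs and the negative predictive value falls: false positives dominate at low prevalence and false negatives matter more at high prevalence.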
It is important to understand that the virus RNA was sequenced and a test was developed in under one month all told. A few more days to field test the test before rolling it out to save the world would not have killed anybody—literally. The Chinese had time. So did CDC, LabCorp, Quest, Roche and others. They all rushed their product to market. And the rest is recent history.
During this time the media focused intensely on overt delays in US test implementation and distribution, reveling in the chaos and denial of the Trump Administration. As the “us vs. them” political media narrative unfolded before the public, the significance of the 5% sensitivity coupled with no measures of specificity whatsoever went unchallenged and unquestioned by federal, state and local governmental agencies. The public thirst for truth was satisfied viewing the cool graphics on the Johns Hopkins COVID-19 dashboard.
Without having established the test characteristics or knowing the population prevalence as the disease quickly spread, and by conflating “positive” with “infection” and relying only on positive results to estimate and respond to the public health burden, the experts and governments collectively unleashed the statistical chaos predicted by Bayes’ theorem on an unsuspecting public.
The Fallacy and its Consequences
For those who know Bayes’ theorem is real and not a conspiracy theory, mass testing in the US and other nations over the past three months has produced an unsurprising finding: the “asymptomatic positives” are the fastest growing sector of positive cases, now accounting for more than 25% of all positives and potentially as high as 45% of all positives. According to the CDC, presently only 11% of all tests in the US are “positive”—with no clinical information made publicly available about the 89% of tests that are “negative.”
Public health authorities and medical experts have deduced and concluded that because so many asymptomatic people are “testing positive,” COVID-19 can cause a spectrum of effects ranging from acute pneumonia to no symptoms whatsoever—a phenomenon that has no precedent for acutely fatal, respiratory-transmitted infectious diseases. (For the what-about’ers, tuberculosis is not an acutely fatal infectious disease.)
And therein lies the testing fallacy—the secret ingredient in the COVID-19 Kool-Aid. The specious conclusion that “asymptomatic positives” are both infected and infectious has been accepted based largely upon the false premise the test is 100% reliable, throwing evidence and long-established statistical truths out the window.
Actually proving that asymptomatic positives are infectious would have been vital to public health, and would have been fairly simple to do early on in the course of testing. Either culture the swab or have a statistically significant sample of people cough, sneeze, or rant into a sterile plastic bag and then measure the virus. The RT-PCR test is not perfect in all applications but can, nonetheless, detect 100 RNA molecules out of the millions of virus particles that infect human cells in the respiratory tract.
Yet while billions of dollars in US taxpayer stimulus money have been fed to private and academic research for hydroxychloroquine, remdesivir, vaccines and high-tech face masks, on top of trillions of dollars in relief to the unemployed and struggling businesses, no scientist has bothered to challenge the highly questionable contention that asymptomatic positives are infectious.
Instead, the true positives among those with obvious clinical disease and the false positives among the unexposed or “asymptomatic” have all been lumped together as “confirmed cases.” There has been no attempt to stratify them by risk or to compare the findings to other similar risk groups. Meanwhile, and with consequences at least as ominous, in the absence of knowing the test’s clinical sensitivity, most of the people with false negative test results among those with obvious clinical infection or high risk have been falsely reassured they are not a risk to infect others.
Worse still, the experts have used this information to convince governments, the media and the public the infection is spreading fastest in places where testing is being done the most. In response, the experts propose and demand even more reliance on testing.
As the spread of COVID-19 infection has been contained through the lockdown of business and social interaction from March to early May 2020, a paradox has emerged. The incidence of severe cases, hospitalizations and deaths is declining in most places (even those like New York and New Jersey that had the heaviest toll), and yet the infection nonetheless appears to be spreading because an increasing number of “positive” cases are “confirmed” by testing. At the same time, hospitals that set up special COVID-19 wards, imported ventilators and diverted the rest of their “revenue” (i.e., disease) are now struggling to fill beds.
With prevalence remaining relatively low and general population testing continuing in a haphazard, non-targeted manner, this paradox is most plausibly explained by the high probability of false positives, particularly among so-called asymptomatic people.
Based on the testing fallacy, many businesses are now contemplating the notion of testing everybody as they re-open. Spurred on by the media and a plethora of mostly liberal public health champions, employees are pressuring their employers to test, test, test. But without knowledge of the test characteristics measured against real clinical results, any test-based decisions made by a company to determine the safety of entry into the workplace or need for quarantining of individual employees would have a significant built-in error rate—both false positives and false negatives.
Companies that believe they’re doing the right thing by spending hundreds of thousands or even millions of dollars on mass-testing to appease their employees or improve their standing on social media will find themselves with lots of unexpected false positives they can’t differentiate from true positives. When the unexpected “spike” of positives is reported by the testing laboratories, these companies will find themselves investigated or shut down by local or state public health agencies.
At the very same time, using tests as public safety criteria to enter the workplace or a community (as Lanai, Hawaii is doing with antibody tests, for example) means that negative results may allow infected people to walk right through the front door. In this scenario, no good deed goes unpunished.
The Not-So-Rosy Conclusion
As non-strategic testing increases in the general population, so do false positives that nobody wants to talk about or understand.
Continued reliance on and misinterpretation and misapplication of highly flawed tests will continue to produce more economic devastation, along with attendant confusion, blame, and social conflict. A vaccine is not likely anytime soon.
In place of the arbitrary division of people into “positive” and “negative,” a much more intelligent and rational response to population-based disease prevention and risk stratification is needed—and soon.
Efficient, statistically-based sampling as part of an ongoing return-to-work risk assessment requires a much more sophisticated public health approach than the present federal, state and local (non-) strategy for testing anybody and everybody who wants a test or who is obviously already infected. For a start, a scientific, population-based approach requires that both the numerators—that is, positive tests—AND the denominators—that is, both the positive and negative tests—be counted and stratified in terms of clinical probability.
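Counting numerators and denominators by stratum is simple bookkeeping, as this minimal Python sketch suggests. The strata labels, record layout and counts are all hypothetical and stand in for whatever clinical-probability categories a real program would define.

```python
from collections import defaultdict


def positivity_by_stratum(records):
    """Map each stratum to (positives, total tests, positivity rate)."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [positives, total]
    for stratum, is_positive in records:
        counts[stratum][1] += 1
        if is_positive:
            counts[stratum][0] += 1
    return {s: (pos, total, pos / total) for s, (pos, total) in counts.items()}


# Hypothetical test records: (clinical-probability stratum, test result).
records = (
    [("high", True)] * 40 + [("high", False)] * 60
    + [("low", True)] * 5 + [("low", False)] * 495
)

summary = positivity_by_stratum(records)
for stratum, (pos, total, rate) in summary.items():
    print(f"{stratum:>4}: {pos} positive of {total} tested ({rate:.1%})")
```

Pooled together, these invented records show 45 positives out of 600 tests. Split by stratum, they show a 40% positivity rate in the high-probability group and 1% in the low-probability group, which are very different signals.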
Even though the initial opportunity to get it right early in the pandemic has long since passed, we have a second chance as we attempt to reopen. Strategic shifting of available resources in the clinical direction will not be easy, but will prove vastly superior to the current emperor-has-no-clothes approach.
The bottom line is that tests for infection and tests for antibodies both have serious constraints and will not provide the ballyhooed solution that advocates of the “test, test, test” approach have promoted. The antidote for the Kool-Aid may not be effective in time to reverse the damage already done, but it could still prevent trillions of dollars in future damage and lost livelihoods.
This is not a test. We cannot afford to keep getting it wrong.
Rich Herschlag, PE is an author, blogger and civil engineer based in Easton, PA. This article was co-authored with a strategically situated physician colleague with a public health degree who scrutinized the science and recognized the fallacy, but needs to remain anonymous.