
The Trouble with Drug Development
Open a new tab and load up a science media site you know. What is the first thing you see on the front page? You will almost certainly find a headline blaring “NEW FINDINGS SHOW AUTISM’S DAYS ARE NUMBERED” or “UNIVERSITY RESEARCHER CURES PARKINSON’S DISEASE”. Load the page again next week, and you will almost certainly see some version of these headlines again. We appear to cure Alzheimer’s disease every few months, if not more often. Strangely, despite decades of claims that a cure is just over the horizon, more and more people continue to suffer from Alzheimer’s disease and die shortly after.

The headline is usually accompanied by a stock image from the galaxy brain meme.
For pharmaceutical companies large and small, this contradiction has not gone unnoticed. Despite the increasing pace of academic discoveries related to major neurological and psychiatric diseases, running the gamut from depression and anxiety to Parkinson’s disease and schizophrenia, Food and Drug Administration (FDA) approvals for new drugs to treat these diseases have dropped by more than half since the 1990s [1]. These low approval rates have not been due to lack of interest: pharmaceutical companies have sunk billions into developing and clinically testing the apparent miracle compounds we hear so much about. However, potential psychiatric drugs fail during testing more often than drugs for any other type of disease, such as cancer or digestive disorders [2].
In fact, the costs have been so high and the approvals so few that most pharmaceutical companies have decided to cut their losses and stop developing new drugs for neurological disorders [3]. This collective decision represents a loss of billions in research funding to find new ways to treat these diseases, with still greater losses (albeit harder to calculate) to patients and healthcare providers deprived of these new treatments. Our current position has become so precarious that the head of the National Institute of Mental Health, the primary backer of mental health research in America, has labeled it “the crisis in drug development” [4].
Why do drugs fail?
So why do these drugs fail despite headlines painting such a rosy picture for those outside the field? If you ask 10 researchers, you’ll hear at least 20 different answers. Though opinions split many ways, the answers usually touch on a few broad topics. Those in the world of policy and economics tend to point to overly restrictive FDA regulations or funding channeled to the wrong technologies. Many neuroscientists and statisticians disagree, focusing instead on a concept known as “predictive validity”: the likelihood that a given preclinical result will correctly predict a clinical outcome.
Many critics believe that current practices in neuroscience prevent academic discoveries from adequately predicting anything. These criticisms dovetail with a larger discussion in science centered on the “Replication Crisis,” in which findings from many fields cannot be reliably repeated; in psychology, only about a third of published results stood up to closer scrutiny [5]. This phenomenon undermines a core tenet of science’s claim to truth: if a researcher has doubts about a finding, they should be able to check whether it is true themselves. In neuroscience specifically, statisticians have identified systemic reliance on shaky assumptions, use of improper statistical tests, flawed experimental designs, and relatively low standards of proof [6].
Not all of these issues have bright-line solutions, though, and the few that do are far from easy. Refining and clarifying these shaky assumptions remains one of the major goals of theoretical neuroscience and requires long-term systemic change. Raising low standards of proof involves an inherent trade-off: increasing statistical power demands more resources (e.g., more subjects, more tests, more validation), so researchers can conduct fewer experiments with the same budget, and resource availability is already stagnant or declining. As a result, validity cannot be prioritized on its own; it is only one component of a larger cost-benefit analysis, and by nature it tends to take a backseat to efficiency in experimental design.
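To make that trade-off concrete, here is a minimal sketch using a standard power analysis for a two-sample comparison, a stand-in for a typical preclinical experiment. The effect size and thresholds are illustrative assumptions, not values taken from any study cited here:

```python
# Illustrative only: how required sample size grows as we demand more
# statistical power, using a two-sample t-test power analysis.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
effect_size = 0.5   # assumed "medium" standardized effect (Cohen's d)
alpha = 0.05        # conventional significance threshold

for power in (0.50, 0.80, 0.95):
    n_per_group = analysis.solve_power(effect_size=effect_size,
                                       alpha=alpha, power=power)
    print(f"power={power:.2f} -> ~{n_per_group:.0f} subjects per group")
```

Under these assumptions, pushing power from 0.50 to 0.95 more than triples the number of subjects required per group, which is exactly the resource pressure described above.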
Still other critics argue that predictions from animal research in neuroscience do not neatly correspond to potential treatments in humans [7]. Mice, for example, are the most common stand-in for humans in neuroscience studies, simply because we generally object to removing brains from humans, but not from other animals. Of the mammals available, mice are less expensive, better validated, and faster to grow and breed than most others [8]. These critics note that this list of advantages does not include “having brains and behavior that exactly mimic humans”. As a result, they generally believe that models need to become more similar to humans before discoveries from them can guide clinical trials, proposing alternatives such as human cell lines, reprogrammed human stem cells, and newly developed animal-human hybrids [9].

The face of modern neuroscience.
These critiques are sensible, but they lead to their own set of questions: how close is close enough? Some of these critics suggest modified human cells as models, yet a dish of cells is far from a brain. Which matters more for a model system’s predictions: a more similar species or a more similar nervous system? Though such questions seem obscure, reliable answers could redirect millions of dollars in pharmaceutical spending. These questions are even murkier and harder to pursue than the ones above, however, and few have even attempted to answer them despite their importance.
Where do we go from here?
Despite the inherent difficulty, one of the first systematic attempts to answer such questions was published earlier this year. A diverse team of neuroscientists from a wide range of disciplines and institutions (disclosure: I was part of this team) tackled a relatively simple question: if scientists know a drug successfully treats a neurological or psychiatric disease in humans, does it treat the disease in mice as well [10]? The question runs the typical mouse-to-human prediction in reverse, measuring how well drug discovery pipelines can spot known successes, rather than just how often candidate drugs fail in humans.
It also allows one of the major concerns about drug development to be tested directly. If mice are indeed not similar enough to humans to be clinically relevant, predictions in the reverse direction should fail at a similarly high rate. If the predictions succeed, on the other hand, the problem likely does not lie in the use of animal models. Far more likely would be an inability to design experiments that identify effective compounds in the first place, suggesting the statisticians are on the right track.
What did this analysis turn up? Across hundreds of studies covering 40 drugs, a wide range of neurological diseases, and 66 drug-disease combinations, essentially every drug that worked in humans also worked in mice, under nearly every condition examined. These failures are therefore highly unlikely to stem from species differences between humans and mice.
Though this trend held in the broad sense, it did not hold for every single study. Many tests were individually unreliable; knowing the result of only one experiment would not yield accurate predictions. Only after four or five studies were collated did the results reflect a drug’s true effectiveness in humans. Interestingly, this pattern supports the second view, in which statistical issues make any single academic study an unreliable predictor of future success as a clinical treatment.
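A toy simulation shows why collating studies helps so much. All the numbers here are illustrative assumptions, not estimates from the paper: each individual study calls a truly effective drug correctly only some of the time, but a majority vote across several studies is far more reliable.

```python
# Toy simulation: individual studies are noisy, but majority votes
# across several independent studies recover the truth far more often.
import random

random.seed(0)
PER_STUDY_ACCURACY = 0.70  # assumed chance that one study calls it right
TRIALS = 100_000

def majority_correct(n_studies: int) -> float:
    """Fraction of trials in which a majority of n studies is correct."""
    wins = 0
    for _ in range(TRIALS):
        correct_votes = sum(random.random() < PER_STUDY_ACCURACY
                            for _ in range(n_studies))
        if correct_votes > n_studies / 2:
            wins += 1
    return wins / TRIALS

for n in (1, 3, 5, 7):
    print(f"{n} studies -> majority correct {majority_correct(n):.0%} of the time")
```

With a 70% per-study accuracy, one study is right 70% of the time, but a majority of five is right roughly 84% of the time, mirroring the pattern seen in the collated data.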
Despite these problems’ notorious intractability, as discussed above, a few options could be pursued immediately. First, before use in industry, results could be required to be confirmed by another group, by multiple independent tests, or ideally both, drastically reducing the chance that a false positive moves forward in the pipeline. This would limit the number of compounds that move on to testing, but it would limit the losses from late-stage failures to an even greater extent. Instead of investing in a portfolio of a few dozen compounds of which only about 10% succeed, pharmaceutical companies could focus on perhaps the ten most promising, each with a far greater chance of repeated success through clinical trials all the way to FDA approval.
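Back-of-the-envelope arithmetic shows the shape of this effect. The rates below are assumptions chosen purely for illustration, and the calculation assumes the confirmatory studies are truly independent:

```python
# Illustrative arithmetic: requiring independent confirmations before a
# compound advances. All rates are assumptions, not measured values.
alpha = 0.05        # assumed false-positive rate of a single study
sensitivity = 0.80  # assumed chance a single study detects a real effect

for confirmations in (1, 2, 3):
    false_pos = alpha ** confirmations       # all studies must (wrongly) agree
    true_pos = sensitivity ** confirmations  # all studies must (rightly) agree
    print(f"{confirmations} independent positive result(s): "
          f"false-positive rate {false_pos:.4f}, "
          f"true-positive rate {true_pos:.2f}")
```

Under these assumptions, the false-positive rate collapses far faster than the true-positive rate as confirmations accumulate, which is the statistical rationale for the trade-off described above.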
Second, the predictive validity of testing metrics should be studied using compounds that eventually succeeded as well as those that failed late in development, where failures are costliest and the standards most exhaustive. Such a framework could systematically determine a test’s ability to detect true successes and true failures, letting scientists choose the best existing methods, refine those that perform poorly, and develop better ones for future research.
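One minimal version of such a framework is sketched below: score each preclinical test by its sensitivity and specificity against known clinical outcomes. Every record in this example is a hypothetical placeholder, not real data.

```python
# Sketch: scoring a preclinical test against known clinical outcomes.
# Each record pairs the test's call with the drug's real clinical fate.
records = [
    # (preclinical test said "works", drug actually succeeded in humans)
    (True, True), (True, False), (False, False),
    (True, True), (False, True), (False, False),
]

tp = sum(pred and real for pred, real in records)          # true positives
fn = sum((not pred) and real for pred, real in records)    # missed successes
tn = sum((not pred) and (not real) for pred, real in records)  # true negatives
fp = sum(pred and (not real) for pred, real in records)    # false alarms

sensitivity = tp / (tp + fn)  # how well the test spots true successes
specificity = tn / (tn + fp)  # how well it screens out true failures
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```

Run retrospectively over many compounds, scores like these would flag which preclinical tests actually carry predictive weight and which are little better than a coin flip.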
Third, though species differences may not be the main issue, tests in the two species could still be made far more similar to one another. For example, mouse tests for psychosis measure the orderliness of motion, while human tests center on self-reported questionnaires. Even if these behaviors rely on similar underlying factors, the two still measure different indicators. To better unify them, some scientists are developing new motion-based human tests that work just as well as questionnaires [11]. With a unified indicator across species, findings carry over far more reliably, because variability in a compound’s true efficacy is no longer entangled with differences between the measures used to estimate it.
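To give a flavor of what a species-agnostic measure might look like, here is a hedged sketch: a simple path-entropy statistic computed from tracked (x, y) coordinates, which could in principle be applied to mouse open-field data and human movement data alike. The metric and bin sizes are my own illustrative choices, not the published behavioral-pattern-monitor analysis [11].

```python
# Sketch of a species-agnostic "orderliness of motion" statistic:
# Shannon entropy of where a subject spends time on a spatially
# binned arena. Binning and metric are illustrative choices only.
import math
from collections import Counter

def path_entropy(xy: list[tuple[float, float]], bin_size: float = 0.1) -> float:
    """Entropy (bits) of the occupancy distribution over spatial bins."""
    bins = Counter((int(x // bin_size), int(y // bin_size)) for x, y in xy)
    total = sum(bins.values())
    return -sum((n / total) * math.log2(n / total) for n in bins.values())

# A tight, repetitive path scores lower entropy than a scattered one,
# regardless of whether the coordinates came from a mouse or a human.
orderly = [(0.05 * (i % 4), 0.05) for i in range(100)]
scattered = [(math.sin(i) * 0.5 + 0.5, math.cos(i * 1.7) * 0.5 + 0.5)
             for i in range(100)]
print(f"orderly path entropy:   {path_entropy(orderly):.2f} bits")
print(f"scattered path entropy: {path_entropy(scattered):.2f} bits")
```

The appeal of a shared statistic like this is that the same number, computed the same way, can be compared directly across species rather than translated between a questionnaire and a locomotor assay.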
Though these solutions could meaningfully reduce risk in drug development, their implementation remains far off. Such changes may be too little, too late, especially now that the nervous system receives far less attention from many drug companies. Regardless, significant progress has been made: we have illuminated a previously murky source of major financial losses to the neuropharmaceutical industry and even spotted a few paths forward. Those paths may wind far into the distance over obstacles difficult to surmount, but at least we now know where to go. Hopefully, one day our discoveries will again yield cures, not just headlines.
References
1. Kinch, M.S. (2015). An analysis of FDA-approved drugs for neurological disorders. Drug Discov Today. 20(9): 1040-1043.
2. Miller, G. (2010). Is pharma running out of brainy ideas? Science. 329(5991): 502-504.
3. Hay, M., Thomas, D.W., Craighead, J.L., Economides, C., Rosenthal, J. (2014). Clinical development success rates for investigational drugs. Nat Biotechnol. 32(1): 40-51.
4. Insel, T.R., Voon, V., Nye, J.S., Brown, V.J., Altevogt, B.T., et al. (2013). Innovative solutions to novel drug development in mental health. Neurosci Biobehav Rev. 37(10): 2438-2444.
5. Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science. 349(6251): aac4716.
6. van der Worp, H.B., Howells, D.W., Sena, E.S., Porritt, M.J., Rewell, S., et al. (2010). Can animal models of disease reliably inform human studies? PLoS Med. 7(3): e1000245.
7. Hyman, S.E. (2014). Revitalizing psychiatric therapeutics. Neuropsychopharmacology. 39(1): 220-229.
8. Nestler, E.J., Hyman, S.E. (2010). Animal models of neuropsychiatric disorders. Nat Neurosci. 13(10): 1161-1169.
9. Pankevich, D.E., Altevogt, B.M., Dunlop, J., Gage, F.H., Hyman, S.E. (2014). Improving and accelerating drug development for nervous system disorders. Neuron. 84(3): 546-553.
10. Howe VI, J.R., Bear, M.F., Golshani, P., Klann, E., Lipton, S.A., et al. (2018). The mouse as a model for neuropsychiatric drug development. Curr Biol. 28(17): R909-R914.
11. Young, J.W., Minassian, A., Paulus, M.P., Geyer, M.A., Perry, W. (2007). A reverse-translational approach to bipolar disorder: Rodent and human studies in the behavioral pattern monitor. Neurosci Biobehav Rev. 31(6): 882-896.