CoRE Case Studies & Publications


CoRE Case Studies

Case Study One - Reduced Experimentation Cost to Develop Accurate Predictive Models

Two pharma companies wanted to compare the cost and accuracy of predictive models developed using CoRE’s active machine learning methods with those developed using standard industry methods. EPA’s ToxCast dataset was used as a simulated “experimental space” for the test.

  • A predictive model of the experimental space was developed using the current industry standard analytic methods. With several machine learning approaches, including Random Forest and LASSO regression, 80% of the experimental space had to be explored to reach the maximum predictive accuracy when compounds were chosen for experimentation based on their chemical diversity.

  • By contrast, CoRE needed to explore only 10% of the experimental space to reach this level of predictive accuracy.

The EPA estimates that $6M was spent on the experiments used to develop ToxCast. In dollar terms, current industry methods would have cost $4.8M (80% of $6M) to reach the level of accuracy that CoRE™ would have reached for $600K (10% of $6M), a savings of 87.5%.

Any further experimentation directed by CoRE™ produced accuracy better than that of the standard machine learning methods, regardless of the experiment selection method they used.

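
The cost comparison in Case Study One can be reproduced with a few lines of arithmetic. The sketch below assumes that every experiment in ToxCast cost the same, so that exploring a fraction of the experimental space costs that same fraction of the $6M total; the function names are illustrative and not part of CoRE.

```python
def exploration_cost(total_cost: float, fraction_explored: float) -> float:
    """Cost of running a fraction of the experiments (assumes uniform cost per experiment)."""
    if not 0.0 <= fraction_explored <= 1.0:
        raise ValueError("fraction_explored must be between 0 and 1")
    return total_cost * fraction_explored


def relative_savings(baseline_cost: float, new_cost: float) -> float:
    """Fraction of the baseline cost saved by the cheaper approach."""
    return (baseline_cost - new_cost) / baseline_cost


TOXCAST_COST = 6_000_000  # EPA estimate quoted in the text, in dollars

standard_cost = exploration_cost(TOXCAST_COST, 0.80)  # industry standard methods
core_cost = exploration_cost(TOXCAST_COST, 0.10)      # CoRE
savings = relative_savings(standard_cost, core_cost)

print(f"standard: ${standard_cost:,.0f}")
print(f"CoRE:     ${core_cost:,.0f}")
print(f"savings:  {savings:.1%}")
```

Under the uniform-cost assumption this gives $4.8M versus $600K, a relative savings of 87.5%.
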
Case Study Two - Reduced Experimentation by Leveraging Historical Experimental Results

A large pharma wanted to compare the efficiency of CoRE’s active machine learning methods with that of standard industry methods for predicting hepatotoxicity. High content screening (HCS) data from one of their recently published studies was used.

  • A predictive model of the experimental space was developed using the current industry standard analytic methods. About 50% of the experiments executed in the study were needed to create the most accurate predictive model.

  • By comparison, CoRE needed only 30% of the experimental space to reach the same level of accuracy in predicting hepatotoxicity.


CoRE required 40% less experimentation (30% versus 50% of the study’s experiments), but the dollar savings could not be estimated because cost figures were not made available.


More interestingly, our collaborators then suggested testing methods that predict toxicity without using any “new” experimental results from the HCS screens, as if the models were developed entirely in silico without novel experimentation. To work only from our extensive database of prior research (the CoRE knowledgebase), we designed a new method that handles extremely sparse datasets. Applying this method to the dataset without any current experimental results, CoRE™ developed a model with higher accuracy than any of the methods previously tested. This shows that the knowledge gathered in the new HCS experiments was already present in the CoRE knowledgebase, although it had been gathered in different experiments that tested different compounds. The active learning methods used by CoRE™ enabled us to capture that knowledge effectively.

Case Study Three - Reduced Compound Synthesis Required to Discover Promising Drug Leads

A smaller pharma specializing in CNS drug development wished to assess how well CoRE would have performed on a completed drug discovery campaign had it been used to direct experimentation. The pharma company had conducted the campaign and identified a lead to advance after synthesizing a large number of compounds. In our simulations, CoRE™ used their historical data to simulate an active learning approach as if it were directing compound synthesis: all of their data was hidden from CoRE™ and revealed only when CoRE™ recommended that a batch of compounds be “synthesized.” Random selection required, on average, 42 compounds to be synthesized to identify the ideal compound. The “industry standard approach” required, on average, 25 compounds to be synthesized to produce an optimal lead. CoRE™ required an average of only 18 compounds to be synthesized to produce the optimal lead to advance.


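This represents a reduction in the number of compounds that needed to be synthesized of roughly 30% relative to the industry standard approach (18 versus 25) and more than 50% relative to random selection (18 versus 42).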
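
The retrospective setup in Case Study Three, where historical results stay hidden until a batch is “synthesized,” can be sketched as a small simulation. This is a minimal illustration and not CoRE’s algorithm, which is not described here: the synthetic pool, the two-dimensional descriptors, the toy activity surface, the 1-nearest-neighbour surrogate model, and all function names are assumptions made for the example.

```python
import random
from typing import List, Sequence, Tuple

Compound = Tuple[float, float]  # hypothetical 2-D descriptor per compound


def make_pool(n: int, rng: random.Random) -> Tuple[List[Compound], List[float]]:
    """Synthetic stand-in for a historical campaign: descriptors plus hidden activities."""
    compounds = [(rng.random(), rng.random()) for _ in range(n)]
    # Toy activity surface peaking at (0.7, 0.3); real campaign data would replace this.
    activity = [-((x - 0.7) ** 2 + (y - 0.3) ** 2) for x, y in compounds]
    return compounds, activity


def predict(c: Compound, known: Sequence[int], compounds, activity) -> float:
    """1-nearest-neighbour surrogate: reads activity only for already revealed compounds."""
    nearest = min(
        known,
        key=lambda i: (compounds[i][0] - c[0]) ** 2 + (compounds[i][1] - c[1]) ** 2,
    )
    return activity[nearest]


def random_batch(hidden, known, compounds, activity, k, rng):
    """Baseline: choose the next batch uniformly at random."""
    return rng.sample(hidden, min(k, len(hidden)))


def guided_batch(hidden, known, compounds, activity, k, rng):
    """Greedy model-guided choice: hidden compounds with the highest predicted activity."""
    ranked = sorted(
        hidden,
        key=lambda i: predict(compounds[i], known, compounds, activity),
        reverse=True,
    )
    return ranked[:k]


def simulate(compounds, activity, choose_batch, batch_size, rng) -> int:
    """Return how many compounds are 'synthesized' before the best one is revealed."""
    best = max(range(len(compounds)), key=activity.__getitem__)
    hidden = list(range(len(compounds)))
    known = rng.sample(hidden, batch_size)  # initial batch is always random
    hidden = [i for i in hidden if i not in known]
    while best not in known:
        batch = choose_batch(hidden, known, compounds, activity, batch_size, rng)
        known.extend(batch)  # "synthesis" reveals the hidden results for this batch
        picked = set(batch)
        hidden = [i for i in hidden if i not in picked]
    return len(known)


pool_rng = random.Random(0)
compounds, activity = make_pool(60, pool_rng)
runs = 20
for name, strategy in [("random", random_batch), ("guided", guided_batch)]:
    counts = [
        simulate(compounds, activity, strategy, 3, random.Random(seed))
        for seed in range(runs)
    ]
    print(f"{name}: average {sum(counts) / runs:.1f} compounds until the best is revealed")
```

Averaging over several seeded runs mirrors the “on average” figures quoted above; the numbers this toy example prints say nothing about CoRE’s actual performance.
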

CoRE and Active Learning Publications

  • Kangas, J. D., Naik, A. W. & Murphy, R. F., Active Learning to Improve Efficiency of Drug Discovery and Development, SLAS 2014. Poster describing the capabilities and uses of CoRE™.

  • Brennan, R. J., Kangas, J. D., Schmidt, F., Khan-Malek, R. & Keller, D. A., Applying an Active Machine Learning Process to Build Predictive Models of In Vivo Toxicity from ToxCast Screening Data, ToxCast Data Summit, September 2014.