Scaling Science: Using Data to Find “Hidden Gem” Research

The current system of scientific recognition and funding has significant bias towards work published in the top few percent of journals and by a relatively small number of research groups. In other words, a large amount of high-quality scientific research goes unnoticed due to current promotion and funding criteria. In this work, we developed a system we call DELPHI (Dynamic Early-warning by Learning to Predict High Impact) to explore whether we could use data science to find “hidden gem” research that would go on to become highly “impactful” without having the high out-of-the-box citation counts that are typical of well-known and highly-established research groups.

As co-author James Weis tells MIT News, “We hope we can use this to find the most deserving research and researchers, regardless of what institutions they’re affiliated with or how connected they are.” 

From the Authors

Diversity in scientific research along many different axes is important and powerful. We believe that diversity in the race, gender, and background of the scientists carrying out research is critical and foundational. It helps ensure broad-based scientific results, and it creates an ever-increasing circle of role models and mentors who can encourage anyone passionate about science to bring their unique talents and insights to bear on the exciting and pressing problems the world has to offer, for the benefit of all of us.

This paper is about diversity in scientific approach. In financial theory, a portfolio of investments has low risk if those investments (e.g., debt (bonds), equity (stocks), and fixed assets (real estate)) are uncorrelated with each other: on average, if some investments perform poorly, others will perform well. We believe it is similarly important in science to build a system that backs both a diversity of scientists and a diversity of approaches.

As a famous example, for some 20 years Alzheimer’s disease was described by the amyloid hypothesis, predicated on the idea that Alzheimer’s is caused by the accumulation of fibrillar amyloid β (Aβ) peptide[1]. It was very difficult to get Alzheimer’s grants based on alternative mechanisms of action, and billions of dollars were spent developing Aβ-targeting therapeutics. The net result: zero efficacious medicines. Thankfully, there are now hundreds of therapeutic research programs based on alternative mechanisms of action for Alzheimer’s, but several decades were lost by not funding alternative approaches.

In this paper we took a two-step approach toward outlining a quantitative method for allocating finite resources to a diverse set of scientific approaches. The first step was the development of a model (which we have termed DELPHI) to predict research likely to have high impact in the future, where impact is defined as a time-de-biased version of PageRank[2], similar to the metric used to rank webpages. This model learns the pattern of features typical of past papers that ended up being high impact. A particular example is the pattern of second- and third-order citations, which indicates how the scientific community not only recognizes but builds upon high-impact results[3]. The second step is to create a correlation matrix that identifies clusters of research in a particular field which are relatively uncorrelated along some dimension (e.g., the citation graph for each cluster) but each of which is predicted to be of high impact. The lowest-risk path to achieving success in a given field is to back multiple uncorrelated clusters, each with a high expected impact[4].
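The second step described above can be illustrated with a minimal sketch. It is not the selection procedure from the paper: the greedy strategy, the function name `pick_uncorrelated_clusters`, the thresholds `min_impact` and `max_corr`, the 0-to-1 impact scale, and the toy numbers are all assumptions made for illustration.

```python
def pick_uncorrelated_clusters(impact, corr, min_impact=0.7, max_corr=0.3):
    """Greedily pick high-impact clusters that are weakly correlated with each other.

    impact: dict mapping cluster name -> predicted impact (hypothetical 0..1 scale)
    corr:   dict mapping frozenset({a, b}) -> correlation between clusters a and b
    """
    # Keep only clusters predicted to be high impact, best first.
    candidates = sorted(
        (c for c in impact if impact[c] >= min_impact),
        key=lambda c: impact[c],
        reverse=True,
    )
    chosen = []
    for c in candidates:
        # Accept a cluster only if it is weakly correlated with every cluster already chosen.
        if all(abs(corr[frozenset((c, s))]) <= max_corr for s in chosen):
            chosen.append(c)
    return chosen


# Toy data (invented): "a" and "b" are strongly correlated, "c" is not, "d" is low impact.
impact = {"a": 0.9, "b": 0.85, "c": 0.8, "d": 0.4}
corr = {
    frozenset(("a", "b")): 0.8,
    frozenset(("a", "c")): 0.1,
    frozenset(("b", "c")): 0.2,
}
print(pick_uncorrelated_clusters(impact, corr))  # ['a', 'c']
```

In this toy case, cluster "b" is skipped despite its high predicted impact because it is strongly correlated with the already-chosen "a", which mirrors the portfolio argument: backing both would add little diversification.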

In summary, we believe that a system which backs both an increasing diversity of people and an increasing diversity in the set of approaches they take to solving scientific problems is the best approach for ensuring a maximally beneficial future for all of us.

James Weis and Joseph Jacobson

[1] Kametani, F. and Hasegawa, M., 2018. Reconsideration of amyloid hypothesis and tau hypothesis in Alzheimer's disease. Frontiers in Neuroscience, 12, p. 25.

[2] See: Xu, S., Mariani, M.S., Lü, L. and Medo, M., 2020. Unbiased evaluation of ranking metrics reveals consistent performance in science and technology citation data. Journal of Informetrics, 14(1), p. 101005. This metric has been shown to outperform more common metrics in identifying milestone research.

[3] We note that once a publication accumulates several citations, the citation graph becomes the dominant feature in predicting future impact. Before a paper has any citations, only reputational features (e.g., author, network, journal) exist.

[4] We note that this is not the path of highest return. The highest return is realized by picking one or a small number of clusters, which, if successful, have high return but also carry high risk. In practice, an optimal risk-return operating point is one in which N clusters of uncorrelated approaches are funded, such that N is the largest number for which the marginal utility of the Nth approach continues to exceed the marginal cost of that approach.
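The stopping rule in footnote [4] can be sketched in a few lines. This is an illustration only: the function name `optimal_cluster_count` and the example figures are hypothetical, and the sketch assumes approaches are ordered so that marginal utility is non-increasing (diminishing returns), which makes the first approach that fails the test the right place to stop.

```python
def optimal_cluster_count(marginal_utilities, marginal_costs):
    """Return the largest N such that each of the first N approaches
    adds more utility than it costs.

    Both sequences are indexed by approach, in funding order (assumed).
    """
    n = 0
    for utility, cost in zip(marginal_utilities, marginal_costs):
        if utility <= cost:
            break  # the next approach no longer pays for itself
        n += 1
    return n


# Invented figures: utility of each additional approach falls while cost stays flat.
print(optimal_cluster_count([10, 6, 3, 1], [2, 2, 2, 2]))  # 3
```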

Frequently Asked Questions

  1. What is the goal of this research?

    The current system of scientific recognition and funding has significant bias towards work published in the top few percent of journals and by a relatively small number of research groups. In other words, a large amount of high-quality scientific research goes unnoticed due to current promotion and funding criteria. In this work, we explored whether we could use data science to find “hidden gem” research that would go on to become highly “impactful” but does not necessarily have the high out-of-the-box citation counts that are typical of well-known and highly established research groups.

    We found that there are indeed a significant number of such “hidden gem” papers, which are missed by conventional citation metrics. As an example, there are a number of journals that would be considered mid-range on the basis of Impact Factor that have higher average predicted impact than some top-ranked journals. Furthermore, even the bottom-ranked journals (again based on Impact Factor) from our biotechnology-focused dataset each have a number of papers of predicted medium-to-high future impact.

    We think this is an important finding; it suggests that there exist untapped research and investigators that, despite creating high-quality work, may not be receiving the resources, recognition, and support they deserve. It also suggests that, if we can design new ways to identify these “hidden gem” projects, we can potentially start breaking out of a journal- and citation-driven status quo. Of course, our platform is simply an algorithmic way to find new research and investigators; its biases must be continually monitored by area experts, and it must be carefully integrated into a larger, human-driven toolkit to ensure that systemic biases are decreased and that innovation and creativity are incentivized.

    However, there is also correlation between research groups: some groups may be large and work on consistently cited research, while others may be smaller, less connected, or less recognized via traditional metrics. Not only is focusing attention and funding on larger “clusters” undesirable from a social equality perspective, but it could also lead to reduced risk-taking and thus lower overall scientific output. As such, we are interested in integrating our data-driven measures, along with constant human feedback and expertise, into funding strategies that reward, rather than penalize, research in new areas or with novel collaborators.

    Overall, our goal is to develop tools that help us, collectively, discover interesting, exciting, impactful research—especially work that might be overlooked with current publication metrics, which are known to be biased. By combining a wider variety of metrics into a machine learning pipeline, we hope to start to move beyond reliance on any particular metric (e.g. citations, h-index, etc). It’s critical to note that we view DELPHI as an additional tool to be integrated into the researcher’s toolkit, and never as a replacement for human-level expertise and intuition. In other words, our framework can be understood as a way to search through the academic literature to help find important research and researchers that the current system may have missed. 

  2. Your models use many features that we know to be biased; aren’t you going to augment, rather than combat, systemic biases in the academic network?

    This is an important question. Our hypothesis was that, by considering a broader array of features, we could find important research that would otherwise be missed by current bias-prone metrics. We have demonstrated this through DELPHI’s ability to highlight “impactful” work that citations (and other metrics) would not have uncovered in the same time period. The features used in our system have been shown by previous research to contain signal for predictive tasks. While we had to start somewhere, we expect future, improved versions of our model to incorporate larger and better sets of relevant metrics, as well as new objective functions, which would ideally be community-curated to ensure relevance and lack of bias. That said, the avalanche of academic research makes the use of search algorithms and metrics inevitable, and thus we believe that developing continually improved ways of exploring the academic landscape (and of highlighting high-quality researchers who currently have less visibility) is important.

  3. How exactly is your algorithm capable of knowing the future based on a paper?

    We are not predicting the future based on the contents of a paper. Our approach recognizes patterns in how the scientific community itself (composed of humans) reacts to, and builds on, a newly released piece of research. By measuring the interaction of experts with new work, we hope to highlight valuable work earlier (regardless of where it was published or how many citations it has).

  4. How can any machine learning model replace human intuition?

    Our goal is certainly not to decide what work gets done, but instead to highlight important work, and most importantly “hidden gem” research that is treated unfairly by the current status quo. Our approach uses time-series changes in network-level features to quantify the interaction of the scientific community with new papers, and compresses this high-dimensional information down to a single “early-warning score” that can be used by researchers. Interestingly, it turns out that this approach can find important work earlier than citations or other existing metrics, simply by measuring the response of the academic community in a more comprehensive manner. In no way is our algorithm a replacement for the brain of a researcher (or funder); it is an additional data input that they can use to broaden their “search space” and potentially find research they may otherwise not be exposed to. Our hope is that such tools can be used in combination with existing systems to broaden the scope of researchers that are considered important and thus deserving of attention and funding.

  5. How are you quantifying scientific impact?

    There have been many attempts to quantify scientific impact, ranging from the more common citation count, h-index, and journal impact factor to time- and field-normalized measures. However, as has been discussed extensively in previous literature, these metrics are not only suboptimal measures of quality but also lagging indicators of impact; their use can therefore lead to suboptimal decisions in academic hiring, promotion, and funding. For this reason, we have designed our framework to accommodate arbitrary, user-defined measures of “impact.” Further, we have demonstrated strong performance and compelling results using a time-rescaled measure of node centrality, which both removes age bias, allowing meaningful comparisons across years, and provides state-of-the-art performance in ranking milestone technologies.
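To make the idea of a time-rescaled centrality concrete, here is a minimal sketch. It computes ordinary PageRank on a toy citation graph and then z-scores each paper against papers of similar age, which is one common way to remove age bias and may differ from the exact formulation used in DELPHI. The function names, the `window` parameter, and the four-paper graph are assumptions made for illustration.

```python
from statistics import mean, pstdev


def pagerank(citations, damping=0.85, iterations=100):
    """Plain PageRank. `citations` maps each paper to the list of papers it cites."""
    papers = sorted(set(citations) | {c for cited in citations.values() for c in cited})
    n = len(papers)
    score = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        # Papers that cite nothing spread their score uniformly over all papers.
        dangling = sum(score[p] for p in papers if not citations.get(p))
        new = {p: (1 - damping) / n + damping * dangling / n for p in papers}
        for p in papers:
            cited = citations.get(p, [])
            for c in cited:
                new[c] += damping * score[p] / len(cited)
        score = new
    return score


def rescale_by_age(score, year, window=2):
    """z-score each paper against papers published within `window` years of it."""
    rescaled = {}
    for p in score:
        peers = [score[q] for q in score if abs(year[q] - year[p]) <= window]
        mu, sigma = mean(peers), pstdev(peers)
        rescaled[p] = (score[p] - mu) / sigma if sigma > 0 else 0.0
    return rescaled


# Toy graph (invented): an old, well-cited paper A and a newer paper C with one citation.
citations = {"A": [], "B": ["A"], "C": ["A"], "D": ["A", "C"]}
year = {"A": 2000, "B": 2001, "C": 2010, "D": 2011}

raw = pagerank(citations)
rescaled = rescale_by_age(raw, year)
print(raw["A"] > raw["C"])  # True: the older paper dominates on raw PageRank
print(round(rescaled["A"], 6), round(rescaled["C"], 6))  # 1.0 1.0
```

On raw PageRank the older paper A outranks C simply because it has had longer to collect citations. After rescaling, each paper is compared only with its contemporaries, so A and C both stand one standard deviation above their peers.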

  6. Do you think it may be possible for malicious actors to "game" this system?

    As with all machine learning-based systems, care must be taken to ensure that these methods do not provide opportunities for malicious actors to manipulate the system for their own gain. By considering a broad range of features and using only those that hold real signal about future impact, we think that DELPHI holds the potential to reduce the possibility for manipulation by obviating reliance on simpler (and often reputation-related) metrics. It is possible, therefore, that DELPHI scores will be more difficult for authors or journals to manipulate than, for example, simple citation counts (upon which h-index and journal impact factor are based). However, additional studies, as well as careful human examination of calculations, are critical to more fully understand these possibilities.

  7. Won’t using machine learning in this way simply aggravate existing systemic problems in the academic and funding communities?

    We are acutely cognizant of the potential for our system to aggravate existing systemic biases, and want to strongly emphasize that any utilization of our platform should be conducted alongside area experts. Just as we would not blindly accept that the top cited articles are the most important, we should not blindly accept the top-scoring articles of our system to be the most valuable. Instead, the DELPHI score should be viewed as an alternative perspective into the academic system that can highlight work for consideration—even if it has a low citation count, or comes from an unexpected place.

  8. Why does the heatmap of CRISPR researchers show many male investigators?

    The heatmap of CRISPR researchers, which is a figure from the discussion section of our recent manuscript, does not show any data from our machine learning pipeline. Instead, this is simply showing historical correlation between researchers in our biotechnology-focused dataset that have published in the CRISPR space. For readability (given the large number of researchers in the space), the graphing function sampled the names at regular intervals (they were not selected by us). The purpose of this figure is to highlight the existing correlation between certain parts of the academic network—the identification of which is the first step towards developing new funding models that drive resources towards the “uncorrelated” and less-connected researchers. While this was part of our discussion, and not a core result of our paper, we think the integration of data-driven approaches to identifying under-funded researchers could help construct a funding landscape where funding is allocated more broadly, and where risk-taking is appreciated on a fundamental level.

  9. What would be the process for expanding this system into other scientific fields, and what might be the limitations of this?

    Our graph-based approach makes the expansion of our system into additional scientific fields possible, beginning with the inclusion of additional disciplines and academic journals, and subsequently potentially including other sources of high-quality research like arXiv. However, such expansion must be done carefully, as there are significant differences in structure and dynamics among disciplines.

  10. Who could use this platform, and how could it potentially help improve the academic system?

    The metrics widely used by the academic and funding ecosystem are known to be biased and suboptimal. Our goal is to create a tool that suggests additional research that is worthy of review, even if it does not have an enormous number of citations or is not published in a high Impact Factor journal. Given the large amounts of funding that are allocated annually by government funding agencies (NSF, NIH, NASA, etc.), broadening the scope of what is considered valuable research could potentially help move towards a more impactful and equitable place, especially if it helps us find undervalued but high-value research or investigators which the current system overlooks.

  11. I have a serious scientific comment, methodology inquiry, or would like to implement your system myself. What should I do?

    Please contact us! We would be very happy to collaborate with those interested in designing new ways to make the academic system more accessible, equitable, and productive, and we welcome all serious feedback.
