Federal Effort Aims to Optimize Search Engines for COVID-19 Research

The collaborative project will apply powerful search engines to the COVID-19 Open Research Dataset.

By Jessica Kent

The US Department of Commerce’s National Institute of Standards and Technology (NIST) and the White House Office of Science and Technology Policy (OSTP) have partnered to support the development of a search engine that will enhance COVID-19 research.

For the project, NIST will initially work with the Allen Institute for Artificial Intelligence, the National Library of Medicine (NLM), Oregon Health & Science University (OHSU), and the University of Texas Health Science Center at Houston (UTHealth).

The group will apply the Text Retrieval Conference (TREC), a long-running program of expert engagement and technology assessment, to the COVID-19 Open Research Dataset (CORD-19). CORD-19 contains more than 44,000 research articles and related data about COVID-19 and the coronavirus family of viruses.

Released in March 2020, CORD-19 is an extensive machine-readable coronavirus literature collection available for data mining. The dataset was put together by researchers from the Allen Institute for AI, the Chan Zuckerberg Initiative (CZI), Microsoft, and others to help researchers better understand the nature and spread of COVID-19.

The TREC-COVID program will aim to create datasets and use an independent assessment process that will help search engine developers evaluate and optimize their systems.

“Our nation’s scientific enterprise is mobilized to defeat the invisible enemy that is COVID-19,” said Secretary of Commerce Wilbur Ross. “Our scientists — and the businesses and institutions that provide them with advanced digital research technologies — are to be commended for their unwavering dedication to finding a cure for this insidious disease.”

The TREC-COVID team will first release a series of sample queries for the biomedical research community, developed by team members from NLM, OHSU and UTHealth. Registered participants in TREC-COVID will use their information retrieval and search systems to run the queries against the CORD-19 dataset and return their results to NIST.

Biomedical experts will then review test results, including document relevance ratings, to evaluate the overall performance of the retrieval systems. NIST will score the submissions and post the scores, the retrieval results themselves, and the list of key reference documents to the TREC-COVID website.
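To illustrate how expert relevance ratings can turn into a score, here is a minimal sketch using precision at k, a common information retrieval measure. It is an assumption made for illustration: the announcement does not say which measures NIST will report, and the function names, query IDs, and document IDs below are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Set


def precision_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the top-k returned documents that experts judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k


def score_run(
    run: Dict[str, List[str]],
    judgments: Dict[str, Set[str]],
    k: int,
) -> float:
    """Average precision@k over every query that has expert judgments.

    A query the system returned nothing for scores 0 for that query.
    """
    per_query = [
        precision_at_k(run.get(query_id, []), relevant_ids, k)
        for query_id, relevant_ids in judgments.items()
    ]
    return mean(per_query)


# Hypothetical expert judgments: query ID -> documents rated relevant.
judgments = {"q1": {"doc_a", "doc_c"}, "q2": {"doc_x"}}

# Hypothetical system output: query ID -> ranked list of returned documents.
run = {"q1": ["doc_a", "doc_b", "doc_c", "doc_d"], "q2": ["doc_y", "doc_x"]}

print(score_run(run, judgments, k=2))
```

Because the judgments are stored separately from any one system's output, the same ratings can be reused to score a different or revised search engine, which is what makes a test collection of this kind reusable.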

Information retrieval researchers can then use these test collections to evaluate and improve the performance of their own search engines. The goal of this effort is to help researchers understand how search systems could best support medical researchers when available information is developing quickly, as is happening with the current pandemic.

Each Friday, the Allen Institute for AI has released an expanded COVID-19 dataset to capture the most recent articles on COVID-19 and related coronaviruses. Later rounds of TREC-COVID will use the larger releases of CORD-19 and expanded query sets.

Participants will have one week to submit their search results, and NIST will post results within about a week. New rounds are expected to begin about two weeks apart, and the team initially anticipates conducting five consecutive rounds of search system assessments.

“The TREC program has provided an effective way to evaluate and advance search engine technologies since 1992, and has led directly to the powerful search capabilities and internet-based efficiencies we now often take for granted,” said Under Secretary of Commerce for Standards and Technology and NIST Director Walter G. Copan.

“We are pleased to apply this infrastructure to the challenge of working with massive amounts of data to help researchers better understand and ultimately to combat this deadly novel coronavirus and related threats.”

With the TREC-COVID program, federal organizations are continuing to focus their efforts to combat coronavirus.

“AI experts worldwide are responding to the White House’s call to action, developing approaches that help scientists gain insights from thousands of articles of COVID-19 scholarly literature,” said Michael Kratsios, US chief technology officer.

“The TREC-COVID program expands upon these efforts by creating powerful and accurate search engines that extract knowledge from this literature, tailored to the needs of the healthcare and medical research communities. We thank NIST for this valuable contribution as part of the Trump administration’s whole-of-America response to the coronavirus.”