Analytics in Action News

Google Makes COVID-19 Datasets Freely Available to Researchers

Google Cloud is allowing researchers, data scientists, and analysts to access COVID-19 datasets for model development.

By Jessica Kent

Google Cloud is offering researchers free access to critical coronavirus information through its COVID-19 Public Dataset Program, which will help accelerate analytics solutions during the global pandemic.

The program will make a hosted repository of public datasets free to access and query, including the Johns Hopkins Center for Systems Science and Engineering (JHU CSSE) dashboard, Global Health Data from the World Bank, and OpenStreetMap data. Researchers will also be able to use BigQuery ML to train machine learning models at no additional cost.

“Data always plays a critical role in the ability to research, study, and combat public health emergencies, and nowhere is this more true than in the case of a global crisis,” Chad W. Jennings, BigQuery product manager and GIS lead, and Shane Glass, developer advocate, wrote in a blog post.

“Access to data sets—and tools that can analyze that data at cloud scale—are increasingly essential to the research process, and are particularly necessary in the global response to the novel coronavirus (COVID-19).”

These datasets will lower barriers to research by giving teams quick and easy access to crucial information, with no need to search for and onboard large data files. Research teams can access the datasets within the Google Cloud Console, along with a description of the data and sample queries to advance research.
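As an illustration of what querying one of these hosted datasets might look like, here is a minimal Python sketch. It is not taken from Google's sample queries. The table name `bigquery-public-data.covid19_jhu_csse.summary` and the column names `date`, `confirmed`, and `country_region` are assumptions that should be checked against the dataset description in the Google Cloud Console, and the helper names are hypothetical. Running the query requires the `google-cloud-bigquery` client library and Google Cloud credentials.

```python
# Assumed table name; verify it in the Google Cloud Console dataset listing.
JHU_SUMMARY_TABLE = "bigquery-public-data.covid19_jhu_csse.summary"


def build_daily_cases_sql(table: str = JHU_SUMMARY_TABLE) -> str:
    """Return SQL that totals confirmed cases per day for one country.

    The country is passed as the named query parameter @country rather
    than being pasted into the SQL text.
    """
    return (
        "SELECT date, SUM(confirmed) AS confirmed_cases\n"
        f"FROM `{table}`\n"
        "WHERE country_region = @country\n"
        "GROUP BY date\n"
        "ORDER BY date"
    )


def fetch_daily_cases(country: str):
    """Run the query and return a list of (date, confirmed_cases) tuples."""
    # Imported here so the SQL helper works without the client library.
    from google.cloud import bigquery  # pip install google-cloud-bigquery

    client = bigquery.Client()  # uses application default credentials
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("country", "STRING", country)
        ]
    )
    rows = client.query(build_daily_cases_sql(), job_config=job_config).result()
    return [(row["date"], row["confirmed_cases"]) for row in rows]


if __name__ == "__main__":
    # Print the SQL only; call fetch_daily_cases("Italy") once credentials are set up.
    print(build_daily_cases_sql())
```

The same SQL text can be pasted into the BigQuery editor in the Cloud Console, replacing `@country` with a quoted country name.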

All data included in the program will be public and freely available. The program will stay in effect until September 15, 2020.

“Making COVID-19 data open and available in BigQuery will be a boon to researchers and analysis in the field,” said Sam Skillman, head of engineering at Descartes Labs. “In particular, having queries be free will allow greater participation, and the ability to quickly share results and analysis with colleagues and the public will accelerate our shared understanding of how the virus is spreading.”

This effort is one of many that aim to monitor and control the COVID-19 pandemic using data and analytics tools. Recently, Amazon Web Services (AWS) launched the AWS Diagnostic Development Initiative, a program that seeks to accelerate COVID-19 research and testing.

The initiative will support customers who are working to bring better and more accurate diagnostic solutions to market faster.

“As COVID-19 continues to spread, we are acutely aware of the impact this is having on families, businesses, and communities. This is a global health emergency that will only be resolved by governments, businesses, academia, and individuals working together to better understand this virus and ultimately find a cure,” Teresa Carlson, vice president for AWS’s worldwide public sector, wrote in a blog post.

“In our AWS business, one area where we have heard an urgent need is in the research and development of diagnostics, which consist of rapid, accurate detection and testing of COVID-19. Better diagnostics will help accelerate treatment and containment, and in time, shorten the course of this epidemic.”

Google Cloud is seeking to achieve similar results with this program, enabling researchers to build quality algorithms for improved care and control.

“Developing data-driven models for the spread of this infectious disease is critical,” said Matteo Chinazzi, associate research scientist at Northeastern University. “Our team is working intensively to model and better understand the spread of the COVID-19 outbreak. By making COVID-19 data open and available in BigQuery, researchers and public health officials can better understand, study, and analyze the impact of this disease.”

With the new COVID-19 Public Dataset Program, Google Cloud expects to accelerate the development of new tools and solutions that can help combat the virus and mitigate its impact.

“The contents of these datasets are provided to the public strictly for educational and research purposes only. We are not onboarding or managing PHI or PII data as part of the COVID-19 Public Dataset Program. Google has practices and policies in place to ensure that data is handled in accordance with widely recognized patient privacy and data security policies,” the authors concluded.

“We on the Google Cloud team sincerely hope that the COVID-19 Public Dataset Program will enable better and faster research to combat the spread of this disease.”