NOAA’s Physical Sciences Laboratory and Global Systems Laboratory have entered into a two-year cooperative research and development agreement (CRADA) with the artificial intelligence (AI) startup Brightband. The agreement is intended to optimize a vast NOAA-managed archive of observational weather data for training AI-based weather forecasting applications.
A homogenized archive
NOAA’s National Centers for Environmental Information currently maintain more than 60 petabytes of environmental data, a volume that is expected to grow to 400 petabytes by 2030. By comparison, the largest estimates put the size of the data used to train GPT-4 at 1 petabyte.
Under the ‘Making NOAA Observation Data Artificial Intelligence-Ready’ CRADA, NOAA will collaborate with Brightband to transform the NOAA NASA Joint Archive of observational data from satellites, weather balloons and surface stations into an open-source data repository that will support a suite of geospatial foundation AI models. The joint archive is a collaboration between NOAA and NASA that has developed a homogenized repository of Earth system observations from 1970 to the present.
Brightband aims to make NOAA’s observational data archive AI-ready by converting data from older, difficult-to-use formats into modern, analysis-ready and cloud-optimized formats. This transformation is expected to enable rapid access to and processing of the data in the cloud. The partners hope that the new dataset will serve as a foundation for data-driven weather forecasting tools.
AI’s role in optimizing forecasting
“The homogenized archive represents a great opportunity for AI developers across the weather enterprise,” said NOAA scientist Sergey Frolov, who leads the Physical Sciences Laboratory team involved in the project. “While individual pieces of data are available elsewhere, this will be a one-stop shop that significantly lowers the bar for entry for everyone who would like to leverage AI to improve forecasting.”
The engineering initiative will be led by Daniel Rothenberg, Brightband co-founder and head of data and weather, who previously helped to build the Pangeo community and toolkit to handle very large-scale datasets. Brightband was formerly known as OpenEarthAI.
“As more groups work to use machine learning to improve data assimilation and incorporate observations into weather forecasts, having a single, comprehensive and easy-to-use dataset will accelerate research efforts,” said Rothenberg. “Our partnership with NOAA will ensure that the community has access to the best possible dataset to use for this work.”
In related news, researchers from the Institute of Oceanology of the Chinese Academy of Sciences recently developed a new model for forecasting rapid intensification of a tropical cyclone, based on ‘contrastive learning’.