
Climate Tech

Google Is Using AI to Fill a Flood Risk Data Gap

Researchers at the hyperscaler say they can predict flash floods with a new Gemini-produced dataset.

A flooded house and the Gemini logo.
Heatmap Illustration/Getty Images

Flash floods, when stormwater pools and rises rapidly in an area within just a few hours of a storm's onset, are one of the more dangerous hazards of a warming planet prone to heavier rainfall. They are also notoriously difficult to predict. But research out of Google on Thursday shows how artificial intelligence could unlock better forecasts and help communities prepare.

Google researchers used Gemini, the tech giant’s signature AI agent, to process millions of news articles from around the world about past floods and extract data on when and where the deluges occurred. After assembling this vast new dataset — the largest of its kind to date — they used it to train a flood prediction model that uses local, hourly meteorological data to produce 24-hour forecasts for urban flash floods in more than 150 countries.
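
Google has not released the forecasting model itself, but the setup it describes, a model that ingests a day of hourly weather variables for a city and outputs a flash flood likelihood, can be sketched in a few lines. Everything below is an illustrative stand-in; the feature set, the gradient-boosted classifier, and the probability thresholds are assumptions, not details from Google's papers.

```python
# Illustrative sketch only, not Google's model: classify whether an urban
# area will see a flash flood in the next 24 hours from hourly weather inputs.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Toy stand-in for real data: 24 hourly values each of precipitation,
# temperature, and soil moisture per (city, day) sample -> 72 features.
n_samples, hours = 1000, 24
precip = rng.gamma(shape=1.5, scale=2.0, size=(n_samples, hours))
temp = rng.normal(20.0, 5.0, size=(n_samples, hours))
soil = rng.uniform(0.0, 1.0, size=(n_samples, hours))
X = np.hstack([precip, temp, soil])

# In the real pipeline the labels would come from Groundsource (was a flash
# flood reported in that city within 24 hours?); here they are synthetic,
# using an arbitrary rainfall-plus-soil-moisture threshold.
y = (precip.sum(axis=1) + 10 * soil.mean(axis=1) > 85).astype(int)

model = GradientBoostingClassifier().fit(X, y)

# Flood Hub surfaces only a coarse likelihood band, so thresholding the
# predicted probability into "medium" or "high" mimics that output style.
proba = model.predict_proba(X[:5])[:, 1]
bands = np.where(proba >= 0.7, "high", np.where(proba >= 0.4, "medium", "low"))
print(list(zip(proba.round(2), bands)))
```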

The dataset, which Google has named Groundsource, is free for anyone to download and use, and the forecasts are now live on Google’s Flood Hub, an online portal that also predicts river-related flood events. The tool is somewhat crude — it simply indicates whether there is a medium or high likelihood of a flash flood occurring in the next 24 hours in a given area. It only covers urban areas, and it doesn’t tell you how severe the flood could be. The resolution is also pretty coarse, indicating risks at the scale of a city rather than a street or neighborhood.

Still, the researchers said the forecasts would be useful for alerting authorities to potential risks.

“People have been very interested, even at that level of granularity,” Gila Loike, a product manager at Google Research, told reporters in a press conference this week.

According to Google, a regional disaster authority in Southern Africa caught a flash flood alert while the tool was still in beta, confirmed the flood on the ground, and then deployed a humanitarian worker to oversee the response. “We’re still in the early days of seeing the impact of Groundsource, but that chain of events from a prediction in Flood Hub to boots on the ground is exactly what Flood Hub was built for,” Juliet Rothenberg, the product director for Google’s crisis resilience work, said.

One of the key reasons it’s so hard to predict flash floods is the lack of historical data. We have decent flood models for “riverine” flooding, when rivers overflow, because of physical gauges in rivers around the world that have collected water levels for decades, but there’s no equivalent for city streets.

News articles present a largely untapped source to fill this gap. The challenge is that the key bits of information, such as where and when the flood occurred, are buried in narrative texts and expressed in wildly inconsistent formats. It would take human experts untold hours and resources to wade through each one and record the data in a standardized manner. An AI agent such as Gemini, however, can do it much faster.

Google’s research team started out by crawling the web for news articles describing flood events going back to the year 2000, gathering an initial pool of more than 9 million stories from around the world. After stripping out ads, menus, and other page clutter, and translating the articles written in other languages into English, they fed the text to Gemini.
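
The cleanup step is conceptually simple. A minimal sketch of it, assuming a standard HTML parser rather than whatever tooling Google actually used, might look like this:

```python
# Minimal sketch of the article-cleaning step, not Google's pipeline: strip
# page boilerplate from a crawled story so only the visible prose remains.
from bs4 import BeautifulSoup  # assumes the beautifulsoup4 package

def extract_article_text(html: str) -> str:
    """Drop scripts, navigation, and other non-article markup."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style", "nav", "header", "footer", "aside"]):
        tag.decompose()
    return " ".join(soup.get_text(separator=" ").split())

# Translating non-English articles would happen after this step; any machine
# translation service could stand in there, so it is omitted from the sketch.
```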

“You are a meticulous flood event analyst,” the researchers told the AI agent. The rest of the elaborate prompt is included in a non-peer-reviewed preprint paper detailing the group’s methods for producing the dataset. In essence, they instructed Gemini to take a sentence such as “Main Street flooded on Tuesday” and work out where, exactly, this Main Street was located and which Tuesday the article was referring to.
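
The full prompt and output schema are laid out in the preprint; the gist of the pattern is asking the model for a small, fixed set of fields and parsing the JSON it returns. The prompt wording, field names, and the call_llm stand-in below are illustrative assumptions, not Google's actual code:

```python
# Illustrative sketch of the structured-extraction step; the real prompt and
# schema in Google's preprint are far more elaborate.
import json

EXTRACTION_PROMPT = """You are a meticulous flood event analyst.
From the article below, return a JSON object with the fields:
  "location": the most specific place where flooding occurred,
  "date": the flood date as YYYY-MM-DD, resolved against the publication date,
  "is_flash_flood": true or false.
Return only the JSON object.

Publication date: {pub_date}
Article: {article_text}
"""

def extract_flood_event(article_text: str, pub_date: str, call_llm) -> dict:
    """call_llm is any function that sends a prompt to an LLM (such as
    Gemini) and returns its text reply; it is a placeholder, not a real API."""
    prompt = EXTRACTION_PROMPT.format(pub_date=pub_date, article_text=article_text)
    # The model has to resolve "Main Street flooded on Tuesday" into an actual
    # place and an actual calendar date using the article's context.
    return json.loads(call_llm(prompt))
```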

The resulting dataset contains 2.6 million historical flood events across more than 150 countries. As a comparison, the next largest public dataset, the National Oceanic and Atmospheric Administration’s Storm Events database, contains about 2 million storm events from 1950 to the present, only about 230,000 of which are flood events. The biggest global dataset, the United Nations Office for Disaster Risk Reduction’s DesInventar system, contains 500,000 events, only a fraction of which are records of floods. It’s also restricted to participating nations and inconsistently updated.

“Oftentimes, the first question our researchers will ask when we talk about going into a new domain within crisis resilience is, what data do you have? How many data entries do you have?” Rothenberg said. “That’s what really unlocks the ability to make breakthroughs here.”

Humberto Vergara, an assistant professor of civil and environmental engineering at the University of Iowa who studies flash floods, agreed that the lack of flood observation data has been a significant obstacle for the field. He told me the Groundsource dataset will “definitely be of great interest” and that there is “definitely great need for things like this.” Using news reports to fill out the global picture of flooding is something researchers have been thinking about doing for a while, he added.

While Vergara was cautiously optimistic the data would be useful, he was quick to note that it would take additional efforts to validate. His lab is working on its own dataset based on satellite estimates of rainfall that could be used to prove out Google’s records, he said.

The Google team has already made some effort to validate Groundsource, cross-checking it against manual annotations of the news reports as well as against other existing databases. It found that about 82% of the events were labeled with the correct location and timeframe. “From a research perspective, using an 82% accurate dataset is actually acceptable,” Loike said. “A well-trained model can smooth out the inconsistencies and thereby learn the dominant patterns while ignoring the 18% of labeling errors.”
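
That kind of cross-check is straightforward to express: pair a sample of extracted events with hand-labeled annotations of the same articles and count how often both the place and the date line up. A toy version, with assumed field names and made-up events, looks like this:

```python
# Toy version of the validation check: the share of extracted events whose
# location and date agree with a manual annotation of the same article.
from datetime import date

def matches(extracted: dict, annotated: dict, day_tolerance: int = 1) -> bool:
    same_place = extracted["location"].lower() == annotated["location"].lower()
    days_off = abs((extracted["date"] - annotated["date"]).days)
    return same_place and days_off <= day_tolerance

def label_accuracy(pairs: list) -> float:
    return sum(matches(e, a) for e, a in pairs) / len(pairs)

sample = [
    ({"location": "Accra", "date": date(2022, 6, 5)},
     {"location": "Accra", "date": date(2022, 6, 5)}),
    ({"location": "Mumbai", "date": date(2021, 7, 18)},
     {"location": "Pune", "date": date(2021, 7, 18)}),
]
print(label_accuracy(sample))  # 0.5 on this toy pair; Google reports about 82%
```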

They also validated the Flood Hub predictions by comparing its U.S. outputs to flood and flash flood warnings produced by the National Weather Service. “Achieving performance metrics comparable to such a sophisticated, instrumentation-rich framework demonstrates how AI can bridge the warning gap in underserved regions that lack equivalent infrastructure,” the researchers wrote in a second non-peer-reviewed preprint describing the model development.

Part of the reason Vergara was cautious in praising the effort is that predicting flash floods is challenging for reasons beyond the lack of historical data. “Most of the driving force is rainfall,” he said. “Everybody in the community knows that predicting rainfall is extremely difficult. The best models out there cannot predict rainfall with the accuracy that is needed for flash floods with more than one or two hours of lead time.”

The utility of Google’s Flood Hub depends on who will be consuming the information, he said. It’s probably not high-resolution enough to be useful for emergency responders, but there might be agencies at the city or regional level that can use it as a situational awareness tool.

Rothenberg, of Google, is optimistic that this same method can produce useful predictions for other kinds of extreme events.

“Applying this methodology to flash flood reports is just the beginning,” Rothenberg told reporters at the press conference. “We think there’s an immense opportunity in thinking about how we could use publicly available information to help predict heat waves or landslides, for example — other events that are hard to predict because the data hasn’t been centralized or it doesn’t exist.”

Climate

Does Microsoft’s Clean Energy Pullback Actually Matter?

Giving up on hourly matching by 2030 doesn’t mean giving up on climate ambition — necessarily.

Clean energy and the Microsoft logo.
Heatmap Illustration/Getty Images

Microsoft celebrated a “milestone achievement” earlier this year, when it announced that it had successfully matched 100% of its 2025 electricity usage with renewable energy. This past week, however, Bloomberg reported that the company was considering delaying or abandoning its next clean energy target set for 2030.

What comes after achieving 100% renewable energy, you might ask? What Microsoft did in 2025 was tally its annual energy consumption and purchase an equal amount of solar and wind power. By 2030, the company aspired to match every kilowatt-hour it consumes with carbon-free electricity, hour by hour. That means finding clean power for all the hours when the sun isn’t shining and the wind isn’t blowing.

Energy

Regulatory Reform Is Headed for the Nation’s Largest Grid

PJM Interconnection has some ideas, as does the state of New Jersey.

Josh Shapiro and Mikie Sherrill.
Heatmap Illustration/Getty Images

We’ve already talked this week about Pennsylvania asking whether the modern “regulatory compact,” which grants utilities monopoly geographical franchises and regulated returns from their capital investments, is still suitable in this era of rising prices and data-center-driven load growth.

Now America’s biggest electricity market and another one of that market’s biggest states are considering far-reaching, fundamental reforms that could alter how electricity infrastructure is planned and paid for on behalf of more than 65 million Americans.

Climate Tech

Funding Friday: Robots Want Fast-Charging Batteries

Big fundraises for Nyobolt and Skeleton Technologies, plus more of the week’s biggest money moves.

A Skeleton factory.
Heatmap Illustration/Getty Images, Skeleton

Following a quiet week for new deals, the industry is back at it, with capital flowing into some of its most active areas. My colleague Alexander C. Kaufman already told you about one of the more buzzworthy announcements from data center-land in Wednesday’s AM newsletter: Wave energy startup Panthalassa raised $140 million in a round led by Peter Thiel to “perform AI inference computing at sea” using nodes powered by the ocean’s waves.

This week also saw fresh funding for more conventional data center infrastructure, as Nyobolt and Skeleton Technologies both announced later-stage rounds for data center backup power solutions. Meanwhile, it turns out Redwood Materials is not the only company bringing in significant capital for second-life EV battery systems — Moment Energy just raised $40 million to pursue a similar approach. Elsewhere, investors backed an effort to rebuild domestic magnesium production, and, in a glimmer of hope for a sector on the outs, gave a boost to green cement startup Terra CO2.
