New Data Technique Improves Accuracy of AI Forecasting for Nature Trends

Jul 9, 2025

The researchers were led by Dr. David Li, director of the M.S. in Data Analytics and Visualization, and included Data Analytics graduates Zhengnan Li, Juan Francisco Leonhardt Chavez and Shubham Pant.

By Dave DeFusco

Forecasting changes in nature—like mosquito populations—has become more accurate thanks to a type of artificial intelligence called Long Short-Term Memory (LSTM) models. These models are especially good at analyzing time-based data, but their success depends on how the data is prepared before being fed into the model.

A new study from the Katz School, led by Dr. David Li, director of the M.S. in Data Analytics and Visualization, introduces a better way to prepare this kind of data. Called partition-based shuffling, the method helps the model make more accurate predictions by preserving important seasonal patterns while still mixing up the data enough to avoid overfitting, which is when a model performs well on training data but poorly on new data. The team presented their findings at the IEEE SoutheastCon conference in Charlotte, N.C., in March.

Traditionally, researchers shuffle data randomly to help models generalize better. With time-based data, such as mosquito counts over many years, random shuffling can destroy patterns that occur at certain times of year. That’s a problem because models like LSTM rely on spotting those patterns to make accurate forecasts.

The Katz School team’s method keeps the seasonal patterns within each year intact while still randomizing the data across multiple years. This helps the model learn consistent seasonal trends, like more mosquitoes in summer, while still preventing it from becoming too attached to specific years in the dataset.

“Partition-based shuffling gives us the best of both worlds,” said Dr. Li. “It preserves the rhythms within each year but still adds enough variety across years to help the model generalize.”
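The idea can be illustrated with a minimal Python sketch. The article does not describe the team's actual implementation, so this assumes that each partition is one calendar year, that the order of records within a year is left untouched, and that only the order of the year blocks is randomized. The function name `partition_shuffle` and the toy records are hypothetical.

```python
import random
from collections import defaultdict


def partition_shuffle(records, seed=None):
    """Shuffle whole-year blocks while keeping each year's internal order.

    `records` is a list of (year, week, count) tuples in chronological order.
    Assumption: one partition per calendar year (not confirmed by the study).
    """
    # Group records by year, preserving the original order inside each year.
    by_year = defaultdict(list)
    for record in records:
        by_year[record[0]].append(record)

    # Randomize the order of the year partitions only.
    years = list(by_year)
    rng = random.Random(seed)
    rng.shuffle(years)

    # Concatenate the partitions in their new order.
    return [record for year in years for record in by_year[year]]


# Hypothetical toy data: three years, four "weeks" each.
records = [
    (year, week, 10 * week)
    for year in (2019, 2020, 2021)
    for week in range(1, 5)
]

shuffled = partition_shuffle(records, seed=0)
for record in shuffled:
    print(record)
```

In the output, the years may appear in a different order, but weeks 1 through 4 of any given year still appear together and in sequence. A fully random shuffle, by contrast, would scatter individual weeks and break up the within-year pattern.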

The researchers, including Data Analytics graduates Zhengnan Li, Juan Francisco Leonhardt Chavez and Shubham Pant, tested this method on data that tracks mosquito populations along with environmental factors like temperature and humidity. They found that models trained using partition-based shuffling made better predictions and were less likely to overfit. These improvements were measured using a standard error metric known as root mean square error (RMSE)—the lower the score, the better the performance.
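For readers unfamiliar with the metric, RMSE is the square root of the average squared difference between predicted and observed values. The short sketch below shows the calculation. The numbers are made up for illustration and are not results from the study, and the helper name `rmse` is hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative values only (not data from the study).
observed = [12, 30, 45, 20]
forecast_a = [10, 28, 47, 22]  # every forecast is off by 2
forecast_b = [20, 15, 60, 5]   # larger misses

print(rmse(observed, forecast_a))
print(rmse(observed, forecast_b))
```

The first forecast misses every observation by exactly 2, so its RMSE is 2.0. The second forecast has larger misses and therefore a higher RMSE, which is why a lower score indicates a better model.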

Key findings:

More accurate forecasts: Models trained with partition-based shuffling had lower error rates.
Better generalization: They performed better on new, unseen data.
Preserved seasonal patterns: The method avoided the problems caused by fully random shuffling, which can scramble important trends.

While this approach was used on mosquito forecasting, it could benefit many other areas that rely on seasonal time-based data, such as predicting disease outbreaks, modeling climate change or managing natural resources.

“This method gives researchers a more reliable way to prepare time-series data,” said Leonhardt Chavez. “It can improve model performance across a range of ecological and public health applications.”

However, the new technique isn’t without challenges. As datasets grow, so do the computing demands, and since this method requires careful handling of data across different years, it can be more expensive to run—especially in cloud computing environments. Still, the researchers believe the trade-off is worth it.

“Even though it takes more computing power, the improved accuracy and reliability make partition-based shuffling a valuable tool for anyone working with seasonal data,” said Pant.

