Crop Yield Prediction through Machine Learning Integration with Sentinel-2 Time Series Data in Irrigated Agroecosystems
DOI: https://doi.org/10.63163/jpehss.v4i1.1293
Abstract
Accurate, timely crop yield prediction is essential for food security, market stability, and precision agriculture, particularly in irrigated agroecosystems where spatial heterogeneity in water availability, nutrient status, and management practices complicates forecasting. This review synthesizes recent advances in integrating high-resolution Sentinel-2 time-series data with machine learning (ML) models to achieve field-level yield estimation. Sentinel-2’s 10–60 m multispectral bands, 5-day revisit interval, and derived vegetation indices (NDVI, EVI, NDRE, SAVI, GNDVI) capture phenological dynamics, canopy development, and stress responses across growth stages. Key ML approaches include Random Forest, Support Vector Regression, gradient boosting machines (XGBoost, LightGBM), deep learning architectures (CNN, LSTM, Transformer-based models), and hybrid frameworks that combine vegetation indices with meteorological variables, soil properties, and crop management data. The reviewed studies report R² values of 0.75–0.95 and RMSE reductions of 15–40% relative to traditional statistical or coarse-resolution models, with superior performance in heterogeneous irrigated systems (rice, wheat, maize, cotton). Feature importance analyses consistently identify mid-season red-edge and near-infrared bands as key contributors to predictive power. Challenges such as cloud cover, data harmonization, model transferability, and ground-truth scarcity are addressed through gap-filling techniques, transfer learning, and data augmentation. Integrating Sentinel-2 with ML offers scalable, cost-effective yield forecasting that enables proactive interventions and supports climate-resilient agriculture in water-managed regions.
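As an illustration of the vegetation indices named in the abstract, the following minimal sketch computes NDVI, EVI, NDRE, SAVI, and GNDVI for a single pixel using their commonly cited formulas. Several details are assumptions rather than choices stated in the review: the band mapping (B02 blue, B03 green, B04 red, B05 red edge, B08 near-infrared), the SAVI soil adjustment factor L = 0.5, the standard EVI coefficients, and input reflectance already scaled to the 0–1 range. The function names and the example pixel values are hypothetical.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator != 0 else math.nan


def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return _ratio(nir - red, nir + red)


def evi(nir, red, blue):
    """Enhanced Vegetation Index with the commonly used coefficients (assumed)."""
    return _ratio(2.5 * (nir - red), nir + 6.0 * red - 7.5 * blue + 1.0)


def ndre(nir, red_edge):
    """Normalized Difference Red Edge index."""
    return _ratio(nir - red_edge, nir + red_edge)


def savi(nir, red, soil_factor=0.5):
    """Soil Adjusted Vegetation Index; soil_factor = 0.5 is an assumed default."""
    return _ratio((1.0 + soil_factor) * (nir - red), nir + red + soil_factor)


def gndvi(nir, green):
    """Green Normalized Difference Vegetation Index."""
    return _ratio(nir - green, nir + green)


def vegetation_indices(bands):
    """Compute all five indices from a dict of 0-1 reflectances keyed by band name.

    The band-to-colour mapping below is an assumption for illustration.
    """
    blue, green, red = bands["B02"], bands["B03"], bands["B04"]
    red_edge, nir = bands["B05"], bands["B08"]
    return {
        "NDVI": ndvi(nir, red),
        "EVI": evi(nir, red, blue),
        "NDRE": ndre(nir, red_edge),
        "SAVI": savi(nir, red),
        "GNDVI": gndvi(nir, green),
    }


if __name__ == "__main__":
    # Hypothetical reflectances for one vegetated pixel on one acquisition date.
    pixel = {"B02": 0.05, "B03": 0.10, "B04": 0.10, "B05": 0.30, "B08": 0.50}
    for name, value in vegetation_indices(pixel).items():
        print(f"{name}: {value:.3f}")
```

Applying the same function to each cloud-free acquisition in a season would give one index value per date, which is the kind of per-field time series the ML models described above take as input.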