Interpretable Machine Learning for Legume Yield Prediction Using Satellite Remote Sensing Data / Petropoulos, Theodoros; Benos, Lefteris; Berruto, Remigio; Miserendino, Gabriele; Marinoudi, Vasso; Busato, Patrizia; Zisis, Chrysostomos; Bochtis, Dionysis. - In: APPLIED SCIENCES. - ISSN 2076-3417. - 15:13(2025), pp. 1-18. [10.3390/app15137074]

Interpretable Machine Learning for Legume Yield Prediction Using Satellite Remote Sensing Data

Berruto, Remigio; Busato, Patrizia
2025

Abstract

Accurate crop yield prediction is vital for optimizing agricultural productivity. Machine Learning (ML) has shown promise in this field; however, its application to legume crops, especially lupin, remains limited, and many models lack interpretability, which hinders real-world adoption. To bridge this gap, an interpretable ML framework was developed for predicting lupin yield using Sentinel-2 remote sensing data integrated with georeferenced yield measurements. Data preprocessing involved computing vegetation indices, removing outliers, addressing multicollinearity, normalizing feature scales, and applying data augmentation techniques to correct target imbalance. Subsequently, six ML models representing different algorithmic strategies were evaluated. Among them, XGBoost showed the best performance (R² = 0.8756) and low values across the three reported error metrics. To enhance model transparency, SHapley Additive exPlanations (SHAP) values were applied to interpret the feature contributions of the XGBoost model. The Enhanced Vegetation Index (EVI) and the Normalized Difference Vegetation Index (NDVI) were found to be the key predictors of crop yield. Both were positively associated with yield: higher index values reflect greater vegetation vigor and corresponded to increased yield. These were followed by the green and short-wave infrared bands, which capture reflectance properties associated with chlorophyll activity and water content, respectively. Both properties substantially influence photosynthetic efficiency and plant health, ultimately affecting yield potential.
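As an illustration of the vegetation-index step mentioned in the abstract, the following is a minimal sketch of how NDVI and EVI are commonly computed from surface reflectance. It uses the standard textbook formulas; the abstract does not state which band combination, scaling, or coefficients the authors used, so the function names, the default EVI coefficients (2.5, 6, 7.5, 1), and the assumption that inputs are reflectances in the range 0 to 1 are illustrative only and not taken from the paper.

```python
import math


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red).

    Inputs are assumed to be surface reflectances scaled to [0, 1].
    Returns NaN when the denominator is zero.
    """
    denom = nir + red
    if denom == 0:
        return math.nan
    return (nir - red) / denom


def evi(nir: float, red: float, blue: float,
        gain: float = 2.5, c1: float = 6.0, c2: float = 7.5,
        soil: float = 1.0) -> float:
    """Enhanced Vegetation Index with the commonly used default coefficients.

    EVI = gain * (NIR - Red) / (NIR + c1 * Red - c2 * Blue + soil)
    The defaults are the widely cited values, assumed here for illustration.
    Returns NaN when the denominator is zero.
    """
    denom = nir + c1 * red - c2 * blue + soil
    if denom == 0:
        return math.nan
    return gain * (nir - red) / denom


if __name__ == "__main__":
    # Hypothetical reflectance values for a single vegetated pixel.
    nir, red, blue = 0.5, 0.1, 0.05
    print(f"NDVI = {ndvi(nir, red):.4f}")
    print(f"EVI  = {evi(nir, red, blue):.4f}")
```

For Sentinel-2 imagery, the near-infrared, red, and blue inputs are usually taken from bands B8, B4, and B2, although that mapping is likewise an assumption here and not something the abstract confirms.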
Files in this item:
File: Interpretable Machine Learning for Legume Yield Prediction.pdf
Access: open access
Type: 2a Post-print, publisher's version / Version of Record
License: Creative Commons
Size: 2.56 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/3004234