Accurate crop yield prediction is vital for optimizing agricultural productivity. Machine Learning (ML) has shown promise in this field; however, its application to legume crops, especially lupin, remains limited, and many models lack interpretability, which hinders real-world adoption. To bridge this gap in the literature, an interpretable ML framework was developed for predicting lupin yield using Sentinel-2 remote sensing data integrated with georeferenced yield measurements. Data preprocessing involved computing vegetation indices, removing outliers, addressing multicollinearity, normalizing feature scales, and applying data augmentation techniques to correct target imbalance. Subsequently, six ML models representing different algorithmic strategies were evaluated. Among them, XGBoost showed the best performance ($R^2$ = 0.8756) and low values across the three reported error metrics. To enhance model transparency, SHapley Additive exPlanations (SHAP) values were applied to interpret the feature contributions of the XGBoost model. The Enhanced Vegetation Index (EVI) and Normalized Difference Vegetation Index (NDVI) were found to be the key predictors of crop yield; both correlated positively with yield, with higher values reflecting greater vegetation vigor. These were followed by the green band and a short-wave infrared band, which capture reflectance properties associated with chlorophyll activity and water content, respectively. Both properties substantially influence photosynthetic efficiency and plant health, ultimately affecting yield potential.
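As an illustration of the vegetation-index step mentioned in the abstract, the following is a minimal sketch of how NDVI and EVI are commonly computed from surface reflectance. It is not taken from the paper: the function names are hypothetical, the Sentinel-2 band assignment in the comments (B8 near-infrared, B4 red, B2 blue) is the usual convention rather than something the abstract states, the EVI coefficients are the widely used defaults, and the inputs are assumed to be already scaled to the 0 to 1 reflectance range.

```python
import math


def ndvi(nir: float, red: float) -> float:
    """NDVI = (NIR - Red) / (NIR + Red).

    Assumed Sentinel-2 bands: NIR = B8, Red = B4 (not stated in the abstract).
    """
    denom = nir + red
    if denom == 0:
        return math.nan  # undefined when both reflectances are zero
    return (nir - red) / denom


def evi(nir: float, red: float, blue: float,
        gain: float = 2.5, c1: float = 6.0, c2: float = 7.5,
        soil: float = 1.0) -> float:
    """EVI = G * (NIR - Red) / (NIR + C1*Red - C2*Blue + L).

    Coefficients are the commonly used defaults (G=2.5, C1=6, C2=7.5, L=1);
    the values used by the authors are an assumption here.
    """
    denom = nir + c1 * red - c2 * blue + soil
    if denom == 0:
        return math.nan
    return gain * (nir - red) / denom


if __name__ == "__main__":
    # Hypothetical reflectances for a dense, healthy canopy pixel.
    nir, red, blue = 0.45, 0.05, 0.03
    print(f"NDVI = {ndvi(nir, red):.2f}")       # approximately 0.80
    print(f"EVI  = {evi(nir, red, blue):.2f}")  # approximately 0.66
```

Higher values of either index indicate denser, more vigorous vegetation, which is consistent with the positive relationship to yield reported above.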
Interpretable Machine Learning for Legume Yield Prediction Using Satellite Remote Sensing Data / Petropoulos, Theodoros; Benos, Lefteris; Berruto, Remigio; Miserendino, Gabriele; Marinoudi, Vasso; Busato, Patrizia; Zisis, Chrysostomos; Bochtis, Dionysis. - In: APPLIED SCIENCES. - ISSN 2076-3417. - 15:13(2025), pp. 1-18. [10.3390/app15137074]
Interpretable Machine Learning for Legume Yield Prediction Using Satellite Remote Sensing Data
Berruto, Remigio; Busato, Patrizia
2025
| File | Size | Format |
|---|---|---|
| Interpretable Machine Learning for Legume Yield Prediction.pdf (open access; type: 2a Post-print publisher's version / Version of Record; license: Creative Commons) | 2.56 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/3004234
