State-of-the-art snow sensing technologies currently provide an unprecedented amount of data from both remote sensing and ground sensors, but their assimilation into dynamic models is bounded to data quality, which is often low – especially in mountain, high-elevation, and unattended regions where snow is the predominant land-cover feature. To maximize the value of snow-depth measurements, we developed a random forest classifier to automatize the quality assurance and quality control (QA/QC) procedure of near-surface snow-depth measurements collected through ultrasonic sensors, with particular reference to the differentiation of snow cover from grass or bare-ground data and to the detection of random errors (e.g., spikes). The model was trained and validated using a split-sample approach of an already manually classified dataset of 18 years of data from 43 sensors in Aosta Valley (northwestern Italian Alps) and then further validated using 3 years of data from 27 stations across the rest of Italy (with no further training or tuning). The F1 score was used as scoring metric, it being the most suited to describe the performances of a model in the case of a multiclass imbalanced classification problem. The model proved to be both robust and reliable in the classification of snow cover vs. grass/bare ground in Aosta Valley (F1 values above 90 %) yet less reliable in rare random-error detection, mostly due to the dataset imbalance (samples distribution: 46.46 % snow, 49.21 % grass/bare ground, 4.34 % error). No clear correlation with snow-season climatology was found in the training dataset, which further suggests the robustness of our approach. The application across the rest of Italy yielded F1 scores on the order of 90 % for snow and grass/bare ground, thus confirming results from the testing region and corroborating model robustness and reliability, with again a less skillful classification of random errors (values below 5 %). This machine learning algorithm of data quality assessment will provide more reliable snow data, enhancing their use in snow models.
A random forest approach to quality-checking automatic snow-depth sensor measurements / Blandini, G.; Avanzi, F.; Gabellani, S.; Ponziani, D.; Stevenin, H.; Ratto, S.; Ferraris, L.; Viglione, A.. - In: THE CRYOSPHERE. - ISSN 1994-0416. - 17:12(2023), pp. 5317-5333. [10.5194/tc-17-5317-2023]
A random forest approach to quality-checking automatic snow-depth sensor measurements
Blandini G.;Ratto S.;Viglione A.
2023
Abstract
State-of-the-art snow sensing technologies currently provide an unprecedented amount of data from both remote sensing and ground sensors, but their assimilation into dynamic models is bounded to data quality, which is often low – especially in mountain, high-elevation, and unattended regions where snow is the predominant land-cover feature. To maximize the value of snow-depth measurements, we developed a random forest classifier to automatize the quality assurance and quality control (QA/QC) procedure of near-surface snow-depth measurements collected through ultrasonic sensors, with particular reference to the differentiation of snow cover from grass or bare-ground data and to the detection of random errors (e.g., spikes). The model was trained and validated using a split-sample approach of an already manually classified dataset of 18 years of data from 43 sensors in Aosta Valley (northwestern Italian Alps) and then further validated using 3 years of data from 27 stations across the rest of Italy (with no further training or tuning). The F1 score was used as scoring metric, it being the most suited to describe the performances of a model in the case of a multiclass imbalanced classification problem. The model proved to be both robust and reliable in the classification of snow cover vs. grass/bare ground in Aosta Valley (F1 values above 90 %) yet less reliable in rare random-error detection, mostly due to the dataset imbalance (samples distribution: 46.46 % snow, 49.21 % grass/bare ground, 4.34 % error). No clear correlation with snow-season climatology was found in the training dataset, which further suggests the robustness of our approach. The application across the rest of Italy yielded F1 scores on the order of 90 % for snow and grass/bare ground, thus confirming results from the testing region and corroborating model robustness and reliability, with again a less skillful classification of random errors (values below 5 %). This machine learning algorithm of data quality assessment will provide more reliable snow data, enhancing their use in snow models.File | Dimensione | Formato | |
---|---|---|---|
tc-17-5317-2023.pdf
accesso aperto
Tipologia:
2a Post-print versione editoriale / Version of Record
Licenza:
Creative commons
Dimensione
6.28 MB
Formato
Adobe PDF
|
6.28 MB | Adobe PDF | Visualizza/Apri |
Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.
https://hdl.handle.net/11583/2988814