Reconstructing the physical complexity of many-body dynamical systems can be a hard task. Starting from the trajectories of their constitutive units (raw data), typical approaches require choosing adequate parameters/descriptors to convert them into time series that are then analyzed to extract human-interpretable information. However, identifying the best descriptor is often far from trivial. Here we report a data-driven approach that makes it possible to compare the efficiency of different types of descriptors in extracting information from noisy trajectories and translating it into physically relevant information. As a prototypical example of a system with non-trivial internal complexity, we analyze molecular dynamics trajectories of an atomistic model system where ice and water coexist dynamically at the solid/liquid transition temperature. We compare different types of general or specific descriptors often used to study aqueous systems, e.g., number of neighbors, molecular velocities, smooth overlap of atomic positions (SOAP), local environments and neighbors shuffling (LENS), orientational tetrahedral order, and distance from the fifth neighbor (d5). We use Onion clustering (an efficient unsupervised clustering method for time-series analysis) to assess the maximum amount of information that can be extracted from the noisy trajectories by the various descriptors, which we then rank via a high-dimensional metric. Our results demonstrate how advanced descriptors, such as SOAP and LENS, outperform classical ones thanks to higher signal-to-noise ratios. Nonetheless, even the simplest descriptors can become as efficient as advanced ones, or even more so, upon local denoising of their signal. This is the case for, e.g., d5, which is among the worst-performing descriptors but, after denoising, becomes by far the best one in resolving the non-strictly-local dynamical complexity of such an ice/water system.
This work highlights the critical role of noise in the process of information extraction and offers a data-driven approach to identify optimal descriptors for systems with characteristic internal complexity.
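To make the d5 descriptor and the idea of local denoising concrete, the following is a minimal sketch and not the authors' implementation. It assumes that d5 is the distance from each particle to its fifth nearest neighbor in a single frame, and that local denoising means averaging a particle's signal over all particles within a cutoff radius; the abstract does not specify the denoising procedure, so that choice is an assumption. The function names and the `cutoff` parameter are hypothetical, and periodic boundary conditions are ignored for brevity.

```python
import numpy as np


def pairwise_distances(frame):
    """All pairwise Euclidean distances for one frame of shape (n_particles, 3)."""
    diff = frame[:, None, :] - frame[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def d5(frame, k=5):
    """Distance from each particle to its k-th nearest neighbor (k=5 gives d5)."""
    dist = pairwise_distances(frame)
    # After sorting each row, column 0 is the self-distance (zero),
    # so column k holds the distance to the k-th nearest neighbor.
    return np.sort(dist, axis=1)[:, k]


def local_denoise(frame, values, cutoff):
    """Assumed denoising: mean of `values` over particles within `cutoff` (self included)."""
    dist = pairwise_distances(frame)
    mask = (dist <= cutoff).astype(float)
    return (mask @ values) / mask.sum(axis=1)


# Toy frame: seven particles equally spaced along the x axis (no periodic boundaries).
frame = np.column_stack([np.arange(7.0), np.zeros(7), np.zeros(7)])
raw = d5(frame)
smooth = local_denoise(frame, raw, cutoff=1.5)
```

Applying the same two functions frame by frame to a trajectory yields one raw and one denoised time series per particle, which is the kind of input a time-series clustering method would then analyze.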
A data driven approach to classify descriptors based on their efficiency in translating noisy trajectories into physically-relevant information / Martino, Simone; Doria, Domiziano; Lionello, Chiara; Becchi, Matteo; Pavan, Giovanni M. - In: MACHINE LEARNING: SCIENCE AND TECHNOLOGY. - ISSN 2632-2153. - 6:3(2025). [10.1088/2632-2153/adfa66]
A data driven approach to classify descriptors based on their efficiency in translating noisy trajectories into physically-relevant information
Martino, Simone;Doria, Domiziano;Lionello, Chiara;Becchi, Matteo;Pavan, Giovanni M
2025
File | Size | Format |
---|---|---|
Martino_2025_Mach._Learn.__Sci._Technol._6_035039.pdf (open access; type: 2a Post-print publisher's version / Version of Record; license: Creative Commons) | 2.31 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/3002908