Relevant, Hidden, and Frustrated Information in High-Dimensional Analyses of Complex Dynamical Systems with Internal Noise / Lionello, Chiara; Becchi, Matteo; Martino, Simone; Pavan, Giovanni M. - In: JOURNAL OF CHEMICAL THEORY AND COMPUTATION. - ISSN 1549-9618. - 21:14(2025), pp. 6683-6697. [10.1021/acs.jctc.5c00374]
Relevant, Hidden, and Frustrated Information in High-Dimensional Analyses of Complex Dynamical Systems with Internal Noise
Chiara Lionello;Matteo Becchi;Simone Martino;Giovanni M. Pavan
2025
Abstract
Extracting meaningful information from trajectory data to understand complex molecular systems can be nontrivial. High-dimensional analyses are typically assumed to be desirable, if not required, to prevent the loss of important information, but the extent to which such high dimensionality is really needed or beneficial often remains unclear. Here we address this fundamental, general problem. As a representative case of a system with internal dynamical complexity, we study atomistic molecular dynamics trajectories of liquid water and ice coexisting in dynamical equilibrium at the solid/liquid transition temperature. To attain an intrinsically high-dimensional analysis, we use as an example an abstract high-dimensional descriptor of local molecular environments (Smooth Overlap of Atomic Positions, SOAP), obtaining a large dataset of 2.56 × 10⁶ 576-dimensional SOAP spectra that we analyze in various ways. Our results demonstrate that the time-series data contained in a single SOAP dimension accounting for <0.001% of the total dataset's variance (neglected and discarded in typical variance-based dimensionality reduction approaches) resolves a remarkable amount of information: it discriminates the bulk water and ice phases, as well as the solid-interface and liquid-interface layers, as four statistically distinct dynamical molecular environments. Adding more dimensions to this one is found to be not only ineffective but even detrimental to the analysis, owing to recurrent additions of negligible information and non-negligible noise and to "frustrated information" phenomena that lead to information loss. Such effects are shown to be general and are also observed in completely different combinations of systems and descriptors.
This shows how high-dimensional analyses are not necessarily better than low-dimensional ones to elucidate the internal complexity of physical/chemical systems, especially when these are characterized by non-negligible internal noise.

File | Access | Type | License | Size | Format |
---|---|---|---|---|---|
2412.09412v5_compressed.pdf | Open access | 1. Preprint / submitted version [pre-review] | Creative Commons | 3.52 MB | Adobe PDF |
lionello-et-al-2025-relevant-hidden-and-frustrated-information-in-high-dimensional-analyses-of-complex-dynamical.pdf | Open access | 2a. Post-print publisher's version / Version of Record | Creative Commons | 1.23 MB | Adobe PDF |
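
The abstract's central claim, that a dimension carrying a negligible share of the variance can still hold the discriminating information while variance-based reduction discards it, can be illustrated with a small toy sketch. This is not the authors' SOAP analysis: the data are synthetic, and every name and number below (`informative`, `noise`, the class means, the noise amplitudes) is an assumption chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_class, n_noise_dims = 2000, 50

# Hypothetical toy data: labels 0/1 stand in for two molecular environments.
labels = np.repeat([0, 1], n_per_class)

# Informative dimension: tiny amplitude, class means 0.0 vs 0.1, sigma 0.01.
informative = 0.1 * labels + rng.normal(0.0, 0.01, labels.size)

# Noise dimensions: large amplitude, same distribution for both classes.
noise = rng.normal(0.0, 10.0, (labels.size, n_noise_dims))

X = np.column_stack([informative, noise])

# Fraction of the total variance carried by the informative dimension.
var = X.var(axis=0)
informative_fraction = var[0] / var.sum()


def threshold_accuracy(feature, labels):
    """Accuracy of a midpoint threshold placed between the two class means."""
    cut = 0.5 * (feature[labels == 0].mean() + feature[labels == 1].mean())
    acc = ((feature > cut).astype(int) == labels).mean()
    return max(acc, 1.0 - acc)  # independent of the sign of the feature


# First principal component, obtained from the SVD of the centered data.
Xc = X - X.mean(axis=0)
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Xc @ vt[0]

acc_single = threshold_accuracy(informative, labels)
acc_pc1 = threshold_accuracy(pc1, labels)

print(f"variance fraction of informative dimension: {informative_fraction:.2e}")
print(f"accuracy using the single informative dimension: {acc_single:.3f}")
print(f"accuracy using the first principal component: {acc_pc1:.3f}")
```

In this construction the informative dimension is, by design, far below the variance threshold at which a variance-ranked projection would retain it, so the first principal component is dominated by class-independent noise. The sketch only mirrors the qualitative argument; it says nothing about the specific SOAP components or clustering methods used in the paper.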
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/3001515