Data Analysis, Statistics and Probability

Showing new listings for Monday, 23 June 2025

Total of 6 entries

New submissions (showing 2 of 2 entries)

[1] arXiv:2506.16215 [pdf, html, other]
Title: Transfer entropy for finite data
Alec Kirkley
Subjects: Data Analysis, Statistics and Probability (physics.data-an); Social and Information Networks (cs.SI)

Transfer entropy is a widely used measure for quantifying directed information flows in complex systems. While the challenges of estimating transfer entropy for continuous data are well known, it has two major shortcomings that persist even for data of finite cardinality: it exhibits a substantial positive bias for sparse bin counts, and it has no clear means to assess statistical significance. By more precisely accounting for information content in finite data streams, we derive a transfer entropy measure which is asymptotically equivalent to the standard plug-in estimator but remedies these issues for time series of small size and/or high cardinality, permitting a fully nonparametric assessment of statistical significance without simulation. We show that this correction for finite data has a substantial impact on results in both real and synthetic time series datasets.
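
For orientation, the standard plug-in estimator mentioned in the abstract can be sketched as follows. This is only the uncorrected baseline, not the finite-data measure derived in the paper. The function name `plugin_transfer_entropy`, the history length of one step, and the use of bits are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def plugin_transfer_entropy(source, target):
    """Plug-in transfer entropy (in bits) from source to target.

    Assumes discrete-valued series of equal length and a history
    length of one time step for both series.
    """
    if len(source) != len(target):
        raise ValueError("source and target must have the same length")

    # Observed triples (y_next, y_prev, x_prev) along the series.
    triples = [(target[t + 1], target[t], source[t])
               for t in range(len(target) - 1)]
    n = len(triples)
    if n == 0:
        return 0.0

    # Empirical counts for the joint and the marginals that are needed.
    c_yyx = Counter(triples)
    c_yx = Counter((yp, xp) for _, yp, xp in triples)
    c_yy = Counter((yn, yp) for yn, yp, _ in triples)
    c_y = Counter(yp for _, yp, _ in triples)

    te = 0.0
    for (yn, yp, xp), count in c_yyx.items():
        p_given_both = count / c_yx[(yp, xp)]       # p(y_next | y_prev, x_prev)
        p_given_self = c_yy[(yn, yp)] / c_y[yp]     # p(y_next | y_prev)
        te += (count / n) * log2(p_given_both / p_given_self)
    return te
```

Because the estimate is built directly from bin counts, it is never negative, and it generally returns a positive value even for unrelated series when the counts are sparse. That is the positive bias the abstract refers to.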

[2] arXiv:2506.16715 [pdf, html, other]
Title: Transition of AI Models in dependence of noise
Thomas Seidler, Markus Abel
Subjects: Data Analysis, Statistics and Probability (physics.data-an)

We investigate how the score depends on noise in the data and on the network size. As a result, we obtain the so-called "cognition transition" from good performance to zero with increasing noise. Understanding this transition is of fundamental scientific and practical interest. We use concepts from statistical mechanics to understand how a changing finite model size affects the cognition ability in the presence of corrupted data. On the one hand, we study whether the transition has a universal aspect across several models; on the other hand, we examine in detail how the approach to the cognition transition point can be captured quantitatively. To this end, we use the so-called scaling approach from statistical mechanics and find a power-law behaviour of the transition width with increasing model size. Since our study is aimed at universal aspects, we use well-known models and data for image classification. That way we avoid uncertainties in data handling or model setup. The practical implication of our results is a tool to estimate model sizes for a certain "universality class" of models, without the need to investigate large sizes, just by extrapolating the scaling results. In turn, that allows for cost reduction in hyperparameter studies. Here, we present first results on a concrete setup; we think that understanding the mechanics of large system sizes is of fundamental interest for the further exploration of even larger models.
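
A minimal sketch of the kind of extrapolation the abstract describes: fit a power law to the transition width measured at small model sizes, then evaluate it at a larger size. The abstract states only that the width follows a power law in model size. The functional form `width = a * size**(-nu)`, the symbols `a` and `nu`, and both function names are assumptions for illustration, not the authors' procedure.

```python
from math import exp, log


def fit_power_law(sizes, widths):
    """Least-squares fit of widths ~ a * sizes**(-nu) in log-log space.

    Returns the tuple (a, nu). Both inputs must be positive numbers.
    """
    if len(sizes) != len(widths) or len(sizes) < 2:
        raise ValueError("need at least two (size, width) pairs")

    xs = [log(s) for s in sizes]
    ys = [log(w) for w in widths]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("sizes must not all be identical")

    # Slope and intercept of the straight line log(w) = intercept + slope * log(s).
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return exp(intercept), -slope


def extrapolate_width(a, nu, size):
    """Evaluate the fitted power law at a model size outside the fitted range."""
    return a * size ** (-nu)
```

In practice the widths would come from measured score-versus-noise curves at each model size; any numbers fed to this sketch without such measurements are synthetic.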

Cross submissions (showing 3 of 3 entries)

[3] arXiv:2506.15713 (cross-list from cs.LG) [pdf, html, other]
Title: An application of machine learning to the motion response prediction of floating assets
Michael T.M.B. Morris-Thomas, Marius Martens
Comments: 17 pages, 6 figures
Subjects: Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Fluid Dynamics (physics.flu-dyn)

The real-time prediction of floating offshore asset behavior under stochastic metocean conditions remains a significant challenge in offshore engineering. While traditional empirical and frequency-domain methods work well in benign conditions, they struggle with both extreme sea states and nonlinear responses. This study presents a supervised machine learning approach using multivariate regression to predict the nonlinear motion response of a turret-moored vessel in 400 m water depth. We developed a machine learning workflow combining a gradient-boosted ensemble method with a custom passive weathervaning solver, trained on approximately $10^6$ samples spanning 100 features. The model achieved mean prediction errors of less than 5% for critical mooring parameters and vessel heading accuracy to within 2.5 degrees across diverse metocean conditions, significantly outperforming traditional frequency-domain methods. The framework has been successfully deployed on an operational facility, demonstrating its efficacy for real-time vessel monitoring and operational decision-making in offshore environments.

[4] arXiv:2506.16446 (cross-list from cond-mat.stat-mech) [pdf, html, other]
Title: A General Framework for Linking Free and Forced Fluctuations via Koopmanism
Valerio Lucarini, Manuel Santos Gutierrez, John Moroney, Niccolò Zagli
Comments: 18 pages, 3 figures
Subjects: Statistical Mechanics (cond-mat.stat-mech); Chaotic Dynamics (nlin.CD); Data Analysis, Statistics and Probability (physics.data-an)

The link between forced and free fluctuations for nonequilibrium systems can be described via a generalized version of the celebrated fluctuation-dissipation theorem. The formalism of the Koopman operator makes it possible to deliver an interpretable form of the response operators, written as a sum of exponentially decaying terms, each associated one-to-one with a mode of natural variability of the system. Here we showcase the feasibility and skill of such an approach on a stochastically forced version of the celebrated Lorenz '63 model by considering different Koopman dictionaries, which also allows us to seamlessly treat coarse-graining approaches like the Ulam method. Our findings provide support for the development of response theory-based investigation methods also in an equation-agnostic, data-driven environment.
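
As a schematic reading of "a sum of exponentially decaying terms", the response function can be written in the form below. The notation is assumed here and may differ from the paper's: $G(t)$ is the linear response (Green's) function of an observable, $\Theta(t)$ is the Heaviside step function enforcing causality, $\lambda_k$ is the eigenvalue associated with the $k$-th mode of natural variability, and $\alpha_k$ is the weight of that mode.

```latex
G(t) = \Theta(t) \sum_{k} \alpha_k \, e^{\lambda_k t},
\qquad \operatorname{Re}\lambda_k < 0
```

Each term decays at a rate set by $\operatorname{Re}\lambda_k$ and oscillates at a frequency set by $\operatorname{Im}\lambda_k$, which is what makes the decomposition interpretable mode by mode.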

[5] arXiv:2506.16522 (cross-list from physics.ins-det) [pdf, html, other]
Title: Improvement of Nuclide Detection through Graph Spectroscopic Analysis Framework and its Application to Nuclear Facility Upset Detection
Pedro Rodríguez Fernández, Christian Svinth, Alex Hagen
Subjects: Instrumentation and Detectors (physics.ins-det); Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an)

We present a method to improve the detection limit for radionuclides using spectroscopic radiation detectors and the arrival time of each detected radiation quantum. We enable this method using a neural network with an attention mechanism. We illustrate the method on the detection of Cesium release from a nuclear facility during an upset, and our method shows a $2\times$ improvement over the traditional spectroscopic method. We hypothesize that our method achieves this performance increase by modulating its detection probability by the overall rate of probable detections, specifically by adapting detection thresholds based on temporal event distributions and local spectral features, and show evidence to this effect. We believe this method is broadly applicable and may be more successful for radionuclides with more complicated decay chains than Cesium; we also note that our method can generalize beyond the addition of arrival time and could integrate other data about each detection event, such as pulse quality, location in the detector, or even the combined energy and time from detections in different detectors.

Replacement submissions (showing 1 of 1 entries)

[6] arXiv:2405.18532 (replaced) [pdf, html, other]
Title: Automatic Forward Model Parameterization with Bayesian Inference of Conformational Populations
Robert M. Raddi, Tim Marshall, Vincent A. Voelz
Subjects: Biological Physics (physics.bio-ph); Chemical Physics (physics.chem-ph); Data Analysis, Statistics and Probability (physics.data-an)

To quantify how well theoretical predictions of structural ensembles agree with experimental measurements, we depend on the accuracy of forward models. These models are computational frameworks that generate observable quantities from molecular configurations based on empirical relationships linking specific molecular properties to experimental measurements. Bayesian Inference of Conformational Populations (BICePs) is a reweighting algorithm that reconciles simulated ensembles with ensemble-averaged experimental observations, even when such observations are sparse and/or noisy. This is achieved by sampling the posterior distribution of conformational populations under experimental restraints as well as sampling the posterior distribution of uncertainties due to random and systematic error. In this study, we enhance the algorithm for the refinement of empirical forward model (FM) parameters. We introduce and evaluate two novel methods for optimizing FM parameters. The first method treats FM parameters as nuisance parameters, integrating over them in the full posterior distribution. The second method employs variational minimization of a quantity called the BICePs score that reports the free energy of "turning on" the experimental restraints. This technique, coupled with improved likelihood functions for handling experimental outliers, facilitates force field validation and optimization, as illustrated in recent studies (Raddi et al. 2023, 2024). Using this approach, we refine parameters that modulate the Karplus relation, crucial for accurate predictions of J-coupling constants based on dihedral angles between interacting nuclei. We validate this approach first with a toy model system, and then for human ubiquitin, predicting six sets of Karplus parameters. Finally, we demonstrate that our framework naturally generalizes optimization to any differentiable forward model...
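
To make the forward model concrete, here is a small sketch of the commonly used three-parameter Karplus relation, J(phi) = A cos^2(phi) + B cos(phi) + C, together with a population-weighted ensemble average. The abstract does not state which variant of the relation the authors use (some variants add a phase offset to the angle), so this form, the function names, and any parameter values passed in are assumptions for illustration only.

```python
from math import cos, radians


def karplus_j(phi_deg, a, b, c):
    """Three-parameter Karplus relation for a dihedral angle in degrees.

    Returns J = a*cos(phi)**2 + b*cos(phi) + c, in the same units as
    the parameters a, b and c (typically Hz).
    """
    x = cos(radians(phi_deg))
    return a * x * x + b * x + c


def ensemble_average_j(phis_deg, weights, a, b, c):
    """Population-weighted average of J over a set of conformations.

    weights are conformational populations; they are normalized here,
    so they do not need to sum to one.
    """
    if len(phis_deg) != len(weights):
        raise ValueError("need one weight per dihedral angle")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(w * karplus_j(phi, a, b, c)
               for phi, w in zip(phis_deg, weights)) / total
```

In a reweighting setting such as the one described, both the populations and the parameters `a`, `b`, `c` affect the predicted ensemble-averaged coupling, which is why refining the Karplus parameters alongside the populations matters.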
