The project automatically fetches the latest papers from arXiv based on keywords.
The subheadings in the README file represent the search keywords.
Only the most recent articles for each keyword are retained, up to a maximum of 100 papers.
You can click the 'Watch' button to receive daily email notifications.
Last update: 2025-04-08
Title | Date | Abstract | Comment |
---|---|---|---|
Choosing an analytic approach: Key study design considerations in state policy evaluation | 2025-04-04 | ShowThis paper reviews and details methods for state policy evaluation to guide selection of a research approach based on evaluation setting and available data. We highlight key design considerations for an analysis, including treatment and control group selection, timing of policy adoption, expected effect heterogeneity, and data considerations. We then provide an overview of analytic approaches and differentiate between methods based on evaluation context, such as settings with no control units, a single treated unit, multiple treated units, or with multiple treatment cohorts. Methods discussed include interrupted time series models, difference-in-differences estimators, autoregressive models, and synthetic control methods, along with method extensions which address issues like staggered policy adoption and heterogenous treatment effects. We end with an illustrative example, applying the developed framework to evaluate the impacts of state-level naloxone standing order policies on overdose rates. Overall, we provide researchers with an approach for deciding on methods for state policy evaluations, which can be used to select study designs and inform methodological choices. |
|
Scalable Hypergraph Structure Learning with Diverse Smoothness Priors | 2025-04-04 | ShowIn graph signal processing, learning the weighted connections between nodes from a set of sample signals is a fundamental task when the underlying relationships are not known a priori. This task is typically addressed by finding a graph Laplacian on which the observed signals are smooth. With the extension of graphs to hypergraphs - where edges can connect more than two nodes - graph learning methods have similarly been generalized to hypergraphs. However, the absence of a unified framework for calculating total variation has led to divergent definitions of smoothness and, consequently, differing approaches to hyperedge recovery. We confront this challenge through generalization of several previously proposed hypergraph total variations, subsequently allowing ease of substitution into a vector based optimization. To this end, we propose a novel hypergraph learning method that recovers a hypergraph topology from time-series signals based on a smoothness prior. Our approach addresses key limitations in prior works, such as hyperedge selection and convergence issues, by formulating the problem as a convex optimization solved via a forward-backward-forward algorithm, ensuring guaranteed convergence. Additionally, we introduce a process that simultaneously limits the span of the hyperedge search and maintains a valid hyperedge selection set. In doing so, our method becomes scalable in increasingly complex network structures. The experimental results demonstrate improved performance, in terms of accuracy, over other state-of-the-art hypergraph inference methods; furthermore, we empirically show our method to be robust to total variation terms, biased towards global smoothness, and scalable to larger hypergraphs. |
13 pa...13 pages, 6 figures, submitted to IEEE for possible publication |
Quantifying Robustness: A Benchmarking Framework for Deep Learning Forecasting in Cyber-Physical Systems | 2025-04-04 | ShowCyber-Physical Systems (CPS) in domains such as manufacturing and energy distribution generate complex time series data crucial for Prognostics and Health Management (PHM). While Deep Learning (DL) methods have demonstrated strong forecasting capabilities, their adoption in industrial CPS remains limited due insufficient robustness. Existing robustness evaluations primarily focus on formal verification or adversarial perturbations, inadequately representing the complexities encountered in real-world CPS scenarios. To address this, we introduce a practical robustness definition grounded in distributional robustness, explicitly tailored to industrial CPS, and propose a systematic framework for robustness evaluation. Our framework simulates realistic disturbances, such as sensor drift, noise and irregular sampling, enabling thorough robustness analyses of forecasting models on real-world CPS datasets. The robustness definition provides a standardized score to quantify and compare model performance across diverse datasets, assisting in informed model selection and architecture design. Through extensive empirical studies evaluating prominent DL architectures (including recurrent, convolutional, attention-based, modular, and structured state-space models) we demonstrate the applicability and effectiveness of our approach. We publicly release our robustness benchmark to encourage further research and reproducibility. |
|
Identifiability of VAR(1) model in a stationary setting | 2025-04-04 | ShowWe consider a classical First-order Vector AutoRegressive (VAR(1)) model, where we interpret the autoregressive interaction matrix as influence relationships among the components of the VAR(1) process that can be encoded by a weighted directed graph. A majority of previous work studies the structural identifiability of the graph based on time series observations and therefore relies on dynamical information. In this work we assume that an equilibrium exists, and study instead the identifiability of the graph from the stationary distribution, meaning that we seek a way to reconstruct the influence graph underlying the dynamic network using only static information. We use an approach from algebraic statistics that characterizes models using the Jacobian matroids associated with the parametrization of the models, and we introduce sufficient graphical conditions under which different graphs yield distinct steady-state distributions. Additionally, we illustrate how our results could be applied to characterize networks inspired by ecological research. |
|
Semi-Supervised Model-Free Bayesian State Estimation from Compressed Measurements | 2025-04-04 | ShowWe consider data-driven Bayesian state estimation from compressed measurements (BSCM) of a model-free process. The dimension of the temporal measurement vector is lower than that of the temporal state vector to be estimated, leading to an under-determined inverse problem. The underlying dynamical model of the state's evolution is unknown for a 'model-free process.' Hence, it is difficult to use traditional model-driven methods, for example, Kalman and particle filters. Instead, we consider data-driven methods. We experimentally show that two existing unsupervised learning-based data-driven methods fail to address the BSCM problem in a model-free process. The methods are -- data-driven nonlinear state estimation (DANSE) and deep Markov model (DMM). While DANSE provides good predictive/forecasting performance to model the temporal measurement data as a time series, its unsupervised learning lacks suitable regularization for tackling the BSCM task. We then propose a semi-supervised learning approach and develop a semi-supervised learning-based DANSE method, referred to as SemiDANSE. In SemiDANSE, we use a large amount of unlabelled data along with a limited amount of labelled data, i.e., pairwise measurement-and-state data, which provides the desired regularization. Using three benchmark dynamical systems, we show that the data-driven SemiDANSE provides competitive state estimation performance for BSCM against a hybrid method called KalmanNet and two model-driven methods (extended Kalman filter and unscented Kalman filter) that know the dynamical models exactly. |
14 pa...14 pages, under review at IEEE Transactions on Signal Processing |
Data Augmentation of Time-Series Data in Human Movement Biomechanics: A Scoping Review | 2025-04-04 | ShowThe integration of machine learning and deep learning has transformed data analytics in biomechanics, enabled by extensive wearable sensor data. However, the field faces challenges such as limited large-scale datasets and high data acquisition costs, which hinder the development of robust algorithms. Data augmentation techniques show promise in addressing these issues, but their application to biomechanical time-series data requires comprehensive evaluation. This scoping review investigates data augmentation methods for time-series data in the biomechanics domain. It analyzes current approaches for augmenting and generating time-series datasets, evaluates their effectiveness, and offers recommendations for applying these techniques in biomechanics. Four databases, PubMed, IEEE Xplore, Scopus, and Web of Science, were searched for studies published between 2013 and 2024. Following PRISMA-ScR guidelines, a two-stage screening identified 21 relevant publications. Results show that there is no universally preferred method for augmenting biomechanical time-series data; instead, methods vary based on study objectives. A major issue identified is the absence of soft tissue artifacts in synthetic data, leading to discrepancies referred to as the synthetic gap. Moreover, many studies lack proper evaluation of augmentation methods, making it difficult to assess their effects on model performance and data quality. This review highlights the critical role of data augmentation in addressing limited dataset availability and improving model generalization in biomechanics. Tailoring augmentation strategies to the characteristics of biomechanical data is essential for advancing predictive modeling. A better understanding of how different augmentation methods impact data quality and downstream tasks will be key to developing more effective and realistic techniques. |
Prepr...Preprint under review at PLOS ONE |
Block Toeplitz Sparse Precision Matrix Estimation for Large-Scale Interval-Valued Time Series Forecasting | 2025-04-04 | ShowModeling and forecasting interval-valued time series (ITS) have attracted considerable attention due to their growing presence in various contexts. To the best of our knowledge, there have been no efforts to model large-scale ITS. In this paper, we propose a feature extraction procedure for large-scale ITS, which involves key steps such as auto-segmentation and clustering, and feature transfer learning. This procedure can be seamlessly integrated with any suitable prediction models for forecasting purposes. Specifically, we transform the automatic segmentation and clustering of ITS into the estimation of Toeplitz sparse precision matrices and assignment set. The majorization-minimization algorithm is employed to convert this highly non-convex optimization problem into two subproblems. We derive efficient dynamic programming and alternating direction method to solve these two subproblems alternately and establish their convergence properties. By employing the Joint Recurrence Plot (JRP) to image subsequence and assigning a class label to each cluster, an image dataset is constructed. Then, an appropriate neural network is chosen to train on this image dataset and used to extract features for the next step of forecasting. Real data applications demonstrate that the proposed method can effectively obtain invariant representations of the raw data and enhance forecasting performance. |
|
Adaptive Classification of Interval-Valued Time Series | 2025-04-04 | ShowIn recent years, the modeling and analysis of interval-valued time series have garnered significant attention in the fields of econometrics and statistics. However, the existing literature primarily focuses on regression tasks while neglecting classification aspects. In this paper, we propose an adaptive approach for interval-valued time series classification. Specifically, we represent interval-valued time series using convex combinations of upper and lower bounds of intervals and transform these representations into images based on point-valued time series imaging methods. We utilize a fine-grained image classification neural network to classify these images, to achieve the goal of classifying the original interval-valued time series. This proposed method is applicable to both univariate and multivariate interval-valued time series. On the optimization front, we treat the convex combination coefficients as learnable parameters similar to the parameters of the neural network and provide an efficient estimation method based on the alternating direction method of multipliers (ADMM). On the theoretical front, under specific conditions, we establish a margin-based multiclass generalization bound for generic CNNs composed of basic blocks involving convolution, pooling, and fully connected layers. Through simulation studies and real data applications, we validate the effectiveness of the proposed method and compare its performance against a wide range of point-valued time series classification methods. |
|
A model-free feature extraction procedure for interval-valued time series prediction | 2025-04-04 | ShowIn this paper, we present a novel feature extraction procedure to predict interval-valued time series by combing transfer learning and imaging approaches. Initially, we represent interval-valued time series using a bivariate point-valued time series, which serves as a representative form. We first transform each time series into images by employing various imaging approaches such as recurrence plot, gramian angular summation/difference field, and Markov transition field, and construct an image dataset by treating each imaging method's output as a separate class. Based on this dataset, we train several candidates for a feature extraction network (FEN), specifically ResNet with varying layers. Then we choose the penultimate layer of the FEN to extract the most relevant features from the transformed images. We integrate the extracted features into conventional predictive models to formulate the corresponding prediction models. To formulate prediction, we integrate the extracted features into a regular prediction model. The proposed methods are evaluated based on the S&P 500 index and three data-generating processes (DGPs), and the experimental results demonstrate a notable improvement in prediction performance compared to existing methods. |
|
A Robust Method for Fault Detection and Severity Estimation in Mechanical Vibration Data | 2025-04-04 | ShowThis paper proposes a robust method for fault detection and severity estimation in multivariate time-series data to enhance predictive maintenance of mechanical systems. We use the Temporal Graph Convolutional Network (T-GCN) model to capture both spatial and temporal dependencies among variables. This enables accurate future state predictions under varying operational conditions. To address the challenge of fluctuating anomaly scores that reduce fault severity estimation accuracy, we introduce a novel fault severity index based on the mean and standard deviation of anomaly scores. This generates a continuous and reliable severity measurement. We validate the proposed method using two experimental datasets: an open IMS bearing dataset and data collected from a fanjet electric propulsion system. Results demonstrate that our method significantly reduces abrupt fluctuations and inconsistencies in anomaly scores. This provides a more dependable foundation for maintenance planning and risk management in safety-critical applications. |
8 pages, 9 figures |
Water Mapping and Change Detection Using Time Series Derived from the Continuous Monitoring of Land Disturbance Algorithm | 2025-04-04 | ShowGiven the growing environmental challenges, accurate monitoring and prediction of changes in water bodies are essential for sustainable management and conservation. The Continuous Monitoring of Land Disturbance (COLD) algorithm provides a valuable tool for real-time analysis of land changes, such as deforestation, urban expansion, agricultural activities, and natural disasters. This capability enables timely interventions and more informed decision-making. This paper assesses the effectiveness of the algorithm to estimate water bodies and track pixel-level water trends over time. Our findings indicate that COLD-derived data can reliably estimate estimate water frequency during stable periods and delineate water bodies. Furthermore, it enables the evaluation of trends in water areas after disturbances, allowing for the determination of whether water frequency increases, decreases, or remains constant. |
|
Graph Network Modeling Techniques for Visualizing Human Mobility Patterns | 2025-04-04 | ShowHuman mobility analysis at urban-scale requires models to represent the complex nature of human movements, which in turn are affected by accessibility to nearby points of interest, underlying socioeconomic factors of a place, and local transport choices for people living in a geographic region. In this work, we represent human mobility and the associated flow of movements as a grapyh. Graph-based approaches for mobility analysis are still in their early stages of adoption and are actively being researched. The challenges of graph-based mobility analysis are multifaceted - the lack of sufficiently high-quality data to represent flows at high spatial and teporal resolution whereas, limited computational resources to translate large voluments of mobility data into a network structure, and scaling issues inherent in graph models etc. The current study develops a methodology by embedding graphs into a continuous space, which alleviates issues related to fast graph matching, graph time-series modeling, and visualization of mobility dynamics. Through experiments, we demonstrate how mobility data collected from taxicab trajectories could be transformed into network structures and patterns of mobility flow changes, and can be used for downstream tasks reporting approx 40% decrease in error on average in matched graphs vs unmatched ones. |
|
Latent Gaussian dynamic factor modeling and forecasting for multivariate count time series | 2025-04-03 | ShowThis work considers estimation and forecasting in a multivariate, possibly high-dimensional count time series model constructed from a transformation of a latent Gaussian dynamic factor series. The estimation of the latent model parameters is based on second-order properties of the count and underlying Gaussian time series, yielding estimators of the underlying covariance matrices for which standard principal component analysis applies. Theoretical consistency results are established for the proposed estimation, building on certain concentration results for the models of the type considered. They also involve the memory of the latent Gaussian process, quantified through a spectral gap, shown to be suitably bounded as the model dimension increases, which is of independent interest. In addition, novel cross-validation schemes are suggested for model selection. The forecasting is carried out through a particle-based sequential Monte Carlo, leveraging Kalman filtering techniques. A simulation study and an application are also considered. |
|
Survival Analysis with Machine Learning for Predicting Li-ion Battery Remaining Useful Life | 2025-04-03 | ShowBattery degradation significantly impacts the reliability and efficiency of energy storage systems, particularly in electric vehicles (EVs) and industrial applications. Predicting the remaining useful life (RUL) of lithium-ion (Li-ion) batteries is crucial for optimizing maintenance schedules, reducing costs, and improving safety. Traditional RUL prediction methods often struggle with nonlinear degradation patterns and uncertainty quantification. To address these challenges, we propose a hybrid survival analysis framework integrating both statistical and machine-learning-based models for RUL estimation. Our approach transforms time-series battery data into time-to-failure data using path signatures, enabling effective survival modeling. We apply five models, including Cox-based survival models and machine-learning-based methods such as DeepHit and MTLR, to estimate failure-free probabilities over time. Experiments conducted on 362 Toyota battery datasets demonstrate the effectiveness of our approach, achieving high time-dependent AUC and concordance index while maintaining a low integrated Brier score. The proposed methodology provides actionable insights for battery manufacturers and engineers, supporting dynamic maintenance strategies and optimized lifecycle management. |
|
Anomaly Detection in Time Series Data Using Reinforcement Learning, Variational Autoencoder, and Active Learning | 2025-04-03 | ShowA novel approach to detecting anomalies in time series data is presented in this paper. This approach is pivotal in domains such as data centers, sensor networks, and finance. Traditional methods often struggle with manual parameter tuning and cannot adapt to new anomaly types. Our method overcomes these limitations by integrating Deep Reinforcement Learning (DRL) with a Variational Autoencoder (VAE) and Active Learning. By incorporating a Long Short-Term Memory (LSTM) network, our approach models sequential data and its dependencies effectively, allowing for the detection of new anomaly classes with minimal labeled data. Our innovative DRL- VAE and Active Learning combination significantly improves existing methods, as shown by our evaluations on real-world datasets, enhancing anomaly detection techniques and advancing time series analysis. |
|
A co-segmentation algorithm to predict emotional stress from passively sensed mHealth data | 2025-04-03 | ShowWe develop a data-driven co-segmentation algorithm of passively sensed and self-reported active variables collected through smartphones to identify emotionally stressful states in middle-aged and older patients with mood disorders undergoing therapy, some of whom also have chronic pain. Our method leverages the association between the different types of time series. These data are typically non-stationary, with meaningful associations often occurring only over short time windows. Traditional machine learning (ML) methods, when applied globally on the entire time series, often fail to capture these time-varying local patterns. Our approach first segments the passive sensing variables by detecting their change points, then examines segment-specific associations with the active variable to identify co-segmented periods that exhibit distinct relationships between stress and passively sensed measures. We then use these periods to predict future emotional stress states using standard ML methods. By shifting the unit of analysis from individual time points to data-driven segments of time and allowing for different associations in different segments, our algorithm helps detect patterns that only exist within short-time windows. We apply our method to detect periods of stress in patient data collected during ALACRITY Phase I study. Our findings indicate that the data-driven segmentation algorithm identifies stress periods more accurately than traditional ML methods that do not incorporate segmentation. |
|
Explanation Space: A New Perspective into Time Series Interpretability | 2025-04-03 | ShowHuman understandable explanation of deep learning models is essential for various critical and sensitive applications. Unlike image or tabular data where the importance of each input feature (for the classifier's decision) can be directly projected into the input, time series distinguishable features (e.g. dominant frequency) are often hard to manifest in time domain for a user to easily understand. Additionally, most explanation methods require a baseline value as an indication of the absence of any feature. However, the notion of lack of feature, which is often defined as black pixels for vision tasks or zero/mean values for tabular data, is not well-defined in time series. Despite the adoption of explainable AI methods (XAI) from tabular and vision domain into time series domain, these differences limit the application of these XAI methods in practice. In this paper, we propose a simple yet effective method that allows a model originally trained on the time domain to be interpreted in other explanation spaces using existing methods. We suggest five explanation spaces, each of which can potentially alleviate these issues in certain types of time series. Our method can be easily integrated into existing platforms without any changes to trained models or XAI methods. The code will be released upon acceptance. |
|
End-To-End Self-Tuning Self-Supervised Time Series Anomaly Detection | 2025-04-03 | ShowTime series anomaly detection (TSAD) finds many applications such as monitoring environmental sensors, industry KPIs, patient biomarkers, etc. A two-fold challenge for TSAD is a versatile and unsupervised model that can detect various different types of time series anomalies (spikes, discontinuities, trend shifts, etc.) without any labeled data. Modern neural networks have outstanding ability in modeling complex time series. Self-supervised models in particular tackle unsupervised TSAD by transforming the input via various augmentations to create pseudo anomalies for training. However, their performance is sensitive to the choice of augmentation, which is hard to choose in practice, while there exists no effort in the literature on data augmentation tuning for TSAD without labels. Our work aims to fill this gap. We introduce TSAP for TSA "on autoPilot", which can (self-)tune augmentation hyperparameters end-to-end. It stands on two key components: a differentiable augmentation architecture and an unsupervised validation loss to effectively assess the alignment between augmentation type and anomaly type. Case studies show TSAP's ability to effectively select the (discrete) augmentation type and associated (continuous) hyperparameters. In turn, it outperforms established baselines, including SOTA self-supervised models, on diverse TSAD tasks exhibiting different anomaly types. |
Accepted at SDM 2025 |
A Dynamic, Ordinal Gaussian Process Item Response Theoretic Model | 2025-04-03 | ShowSocial scientists are often interested in using ordinal indicators to estimate latent traits that change over time. Frequently, this is done with item response theoretic (IRT) models that describe the relationship between those latent traits and observed indicators. We combine recent advances in Bayesian nonparametric IRT, which makes minimal assumptions on shapes of item response functions, and Gaussian process time series methods to capture dynamic structures in latent traits from longitudinal observations. We propose a generalized dynamic Gaussian process item response theory (GD-GPIRT) as well as a Markov chain Monte Carlo sampling algorithm for estimation of both latent traits and response functions. We evaluate GD-GPIRT in simulation studies against baselines in dynamic IRT, and apply it to various substantive studies, including assessing public opinions on economy environment and congressional ideology related to abortion debate. |
|
Adaptive Student's t-distribution with method of moments moving estimator for nonstationary time series | 2025-04-03 | ShowThe real life time series are usually nonstationary, bringing a difficult question of model adaptation. Classical approaches like ARMA-ARCH assume arbitrary type of dependence. To avoid their bias, we will focus on recently proposed agnostic philosophy of moving estimator: in time |
x-\mu |
VISTA: Unsupervised 2D Temporal Dependency Representations for Time Series Anomaly Detection | 2025-04-03 | ShowTime Series Anomaly Detection (TSAD) is essential for uncovering rare and potentially harmful events in unlabeled time series data. Existing methods are highly dependent on clean, high-quality inputs, making them susceptible to noise and real-world imperfections. Additionally, intricate temporal relationships in time series data are often inadequately captured in traditional 1D representations, leading to suboptimal modeling of dependencies. We introduce VISTA, a training-free, unsupervised TSAD algorithm designed to overcome these challenges. VISTA features three core modules: 1) Time Series Decomposition using Seasonal and Trend Decomposition via Loess (STL) to decompose noisy time series into trend, seasonal, and residual components; 2) Temporal Self-Attention, which transforms 1D time series into 2D temporal correlation matrices for richer dependency modeling and anomaly detection; and 3) Multivariate Temporal Aggregation, which uses a pretrained feature extractor to integrate cross-variable information into a unified, memory-efficient representation. VISTA's training-free approach enables rapid deployment and easy hyperparameter tuning, making it suitable for industrial applications. It achieves state-of-the-art performance on five multivariate TSAD benchmarks. |
|
FLEXtime: Filterbank learning to explain time series | 2025-04-03 | ShowState-of-the-art methods for explaining predictions from time series involve learning an instance-wise saliency mask for each time step; however, many types of time series are difficult to interpret in the time domain, due to the inherently complex nature of the data. Instead, we propose to view time series explainability as saliency maps over interpretable parts, leaning on established signal processing methodology on signal decomposition. Specifically, we propose a new method called FLEXtime that uses a bank of bandpass filters to split the time series into frequency bands. Then, we learn the combination of these bands that optimally explains the model's prediction. Our extensive evaluation shows that, on average, FLEXtime outperforms state-of-the-art explainability methods across a range of datasets. FLEXtime fills an important gap in the current time series explainability methodology and is a valuable tool for a wide range of time series such as EEG and audio. Code is available at https://github.com/theabrusch/FLEXtime. |
Accep...Accepted to The 3rd World Conference on eXplainable Artificial Intelligence |
Transformer-based Multivariate Time Series Anomaly Localization | 2025-04-03 | ShowWith the growing complexity of Cyber-Physical Systems (CPS) and the integration of Internet of Things (IoT), the use of sensors for online monitoring generates large volume of multivariate time series (MTS) data. Consequently, the need for robust anomaly diagnosis in MTS is paramount to maintaining system reliability and safety. While significant advancements have been made in anomaly detection, localization remains a largely underexplored area, though crucial for intelligent decision-making. This paper introduces a novel transformer-based model for unsupervised anomaly diagnosis in MTS, with a focus on improving localization performance, through an in-depth analysis of the self-attention mechanism's learning behavior under both normal and anomalous conditions. We formulate the anomaly localization problem as a three-stage process: time-step, window, and segment-based. This leads to the development of the Space-Time Anomaly Score (STAS), a new metric inspired by the connection between transformer latent representations and space-time statistical models. STAS is designed to capture individual anomaly behaviors and inter-series dependencies, delivering enhanced localization performance. Additionally, the Statistical Feature Anomaly Score (SFAS) complements STAS by analyzing statistical features around anomalies, with their combination helping to reduce false alarms. Experiments on real world and synthetic datasets illustrate the model's superiority over state-of-the-art methods in both detection and localization tasks. |
|
Temporal Gaussian Copula For Clinical Multivariate Time Series Data Imputation | 2025-04-03 | ShowThe imputation of the Multivariate time series (MTS) is particularly challenging since the MTS typically contains irregular patterns of missing values due to various factors such as instrument failures, interference from irrelevant data, and privacy regulations. Existing statistical methods and deep learning methods have shown promising results in time series imputation. In this paper, we propose a Temporal Gaussian Copula Model (TGC) for three-order MTS imputation. The key idea is to leverage the Gaussian Copula to explore the cross-variable and temporal relationships based on the latent Gaussian representation. Subsequently, we employ an Expectation-Maximization (EM) algorithm to improve robustness in managing data with varying missing rates. Comprehensive experiments were conducted on three real-world MTS datasets. The results demonstrate that our TGC substantially outperforms the state-of-the-art imputation methods. Additionally, the TGC model exhibits stronger robustness to the varying missing ratios in the test dataset. Our code is available at https://github.com/MVL-Lab/TGC-MTS. |
Accepted in BIBM2024 |
Traffic Flow Data Completion and Anomaly Diagnosis via Sparse and Low-Rank Tensor Optimization | 2025-04-03 | ShowSpatiotemporal traffic time series, such as traffic speed data, collected from sensing systems are often incomplete, with considerable corruption and large amounts of missing values. A vast amount of data conceals implicit data structures, which poses significant challenges for data recovery issues, such as mining the potential spatio-temporal correlations of data and identifying abnormal data. In this paper, we propose a Tucker decomposition-based sparse low-rank high-order tensor optimization model (TSLTO) for data imputation and anomaly diagnosis. We decompose the traffic tensor data into low-rank and sparse tensors, and establish a sparse low-rank high-order tensor optimization model based on Tucker decomposition. By utilizing tools of non-smooth analysis for tensor functions, we explore the optimality conditions of the proposed tensor optimization model and design an ADMM optimization algorithm for solving the model. Finally, numerical experiments are conducted on both synthetic data and a real-world dataset: the urban traffic speed dataset of Guangzhou. Numerical comparisons with several representative existing algorithms demonstrate that our proposed approach achieves higher accuracy and efficiency in traffic flow data recovery and anomaly diagnosis tasks. |
|
Bayesian Inference for High-dimensional Time Series with a Directed Acyclic Graphical Structure | 2025-04-03 | ShowIn multivariate time series analysis, understanding the underlying causal relationships among variables is often of interest for various applications. Directed acyclic graphs (DAGs) provide a powerful framework for representing causal dependencies. This paper proposes a novel Bayesian approach for modeling multivariate time series where conditional independencies and causal structure are encoded by a DAG. The proposed model allows structural properties such as stationarity to be easily accommodated. Given the application, we further extend the model for matrix-variate time series. We take a Bayesian approach to inference, and a ``projection-posterior'' based efficient computational algorithm is developed. The posterior convergence properties of the proposed method are established along with two identifiability results for the unrestricted structural equation models. The utility of the proposed method is demonstrated through simulation studies and real data analysis. |
|
Movement-Prediction-Adjusted Naïve Forecast | 2025-04-03 | ShowThis study introduces a movement-prediction-adjusted na"ive forecast, which is the original na"ive forecast with the addition of a weighted movement prediction term, in the context of forecasting time series that exhibit symmetric random walk properties. The weight of the movement term is determined by two parameters: one reflecting the directional accuracy and the other representing the mean absolute increment. The settings of the two parameters involve a trade-off: larger values may yield meaningful gains over the original na"ive forecast, whereas smaller values often render the adjusted forecast more reliable. This trade-off can be managed by empirically setting the parameters using sliding windows on in-sample data. To statistically test the performance of the adjusted na"ive forecast under different directional accuracy levels, we used four synthetic time series to simulate multiple forecast scenarios, assuming that for each directional accuracy level, diverse movement predictions were provided. The simulation results show that as the directional accuracy increases, the error of the adjusted na"ive forecast decreases. In particular, the adjusted na"ive forecast achieves statistically significant improvements over the original na"ive forecast, even under a low directional accuracy of slightly above 0.50. This finding implies that the movement-prediction-adjusted na"ive forecast can serve as a new optimal point forecast for time series with symmetric random walk characteristics if consistent movement prediction can be provided. |
|
FastFlow: Early Yet Robust Network Flow Classification using the Minimal Number of Time-Series Packets | 2025-04-02 | ShowNetwork traffic classification is of great importance for network operators in their daily routines, such as analyzing the usage patterns of multimedia applications and optimizing network configurations. Internet service providers (ISPs) that operate high-speed links expect network flow classifiers to accurately classify flows early, using the minimal number of necessary initial packets per flow. These classifiers must also be robust to packet sequence disorders in candidate flows and capable of detecting unseen flow types that are not within the existing classification scope, which are not well achieved by existing methods. In this paper, we develop FastFlow, a time-series flow classification method that accurately classifies network flows as one of the known types or the unknown type, which dynamically selects the minimal number of packets to balance accuracy and efficiency. Toward the objectives, we first develop a flow representation process that converts packet streams at both per-packet and per-slot granularity for precise packet statistics with robustness to packet sequence disorders. Second, we develop a sequential decision-based classification model that leverages LSTM architecture trained with reinforcement learning. Our model makes dynamic decisions on the minimal number of time-series data points per flow for the confident classification as one of the known flow types or an unknown one. We evaluated our method on public datasets and demonstrated its superior performance in early and accurate flow classification. Deployment insights on the classification of over 22.9 million flows across seven application types and 33 content providers in a campus network over one week are discussed, showing that FastFlow requires an average of only 8.37 packets and 0.5 seconds to classify the application type of a flow with over 91% accuracy and over 96% accuracy for the content providers. |
This ...This paper is accepted at ACM SIGMETRICS 2025. Proc. ACM Meas. Anal. Comput. Syst (2025) |
Reducing Smoothness with Expressive Memory Enhanced Hierarchical Graph Neural Networks | 2025-04-02 | ShowGraphical forecasting models learn the structure of time series data via projecting onto a graph, with recent techniques capturing spatial-temporal associations between variables via edge weights. Hierarchical variants offer a distinct advantage by analysing the time series across multiple resolutions, making them particularly effective in tasks like global weather forecasting, where low-resolution variable interactions are significant. A critical challenge in hierarchical models is information loss during forward or backward passes through the hierarchy. We propose the Hierarchical Graph Flow (HiGFlow) network, which introduces a memory buffer variable of dynamic size to store previously seen information across variable resolutions. We theoretically show two key results: HiGFlow reduces smoothness when mapping onto new feature spaces in the hierarchy and non-strictly enhances the utility of message-passing by improving Weisfeiler-Lehman (WL) expressivity. Empirical results demonstrate that HiGFlow outperforms state-of-the-art baselines, including transformer models, by at least an average of 6.1% in MAE and 6.2% in RMSE. Code is available at https://github.com/TB862/ HiGFlow.git. |
|
Niche Dynamics in Complex Online Community Ecosystems | 2025-04-02 | ShowOnline communities are important organizational forms where members socialize and share information. Curiously, different online communities often overlap considerably in topic and membership. Recent research has investigated competition and mutualism among overlapping online communities through the lens of organizational ecology; however, it has not accounted for how the nonlinear dynamics of online attention may lead to episodic competition and mutualism. Neither has it explored the origins of competition and mutualism in the processes by which online communities select or adapt to their niches. This paper presents a large-scale study of 8,806 Reddit communities belonging to 1,919 clusters of high user overlap over a 5-year period. The method uses nonlinear time series methods to infer bursty, often short-lived ecological dynamics. Results reveal that mutualism episodes are longer lived and slightly more frequent than competition episodes. Next, it tests whether online communities find their niches by specializing to avoid competition using panel regression models. It finds that competitive ecological interactions lead to decreasing topic and user overlaps; however, changes that decrease such niche overlaps do not lead to mutualism. The discussion proposes that future designs may enable online community ecosystem management by informing online community leaders to organize "spin-off" communities or via feeds and recommendations. |
12 pa...12 pages, 4 figures, conditionally accepted to ICWSM 2025 |
Efficient Model Selection for Time Series Forecasting via LLMs | 2025-04-02 | ShowModel selection is a critical step in time series forecasting, traditionally requiring extensive performance evaluations across various datasets. Meta-learning approaches aim to automate this process, but they typically depend on pre-constructed performance matrices, which are costly to build. In this work, we propose to leverage Large Language Models (LLMs) as a lightweight alternative for model selection. Our method eliminates the need for explicit performance matrices by utilizing the inherent knowledge and reasoning capabilities of LLMs. Through extensive experiments with LLaMA, GPT and Gemini, we demonstrate that our approach outperforms traditional meta-learning techniques and heuristic baselines, while significantly reducing computational overhead. These findings underscore the potential of LLMs in efficient model selection for time series forecasting. |
16 pages, 3 Figures |
Limits to Analog Reservoir Learning | 2025-04-02 | ShowReservoir computation is a recurrent framework for learning and predicting time series data, that benefits from extremely simple training and interpretability, often as the the dynamics of a physical system. In this paper, we will study the impact of noise on the learning capabilities of analog reservoir computers. Recent work on reservoir computation has shown that the information processing capacity (IPC) is a useful metric for quantifying the degradation of the performance due to noise. We further this analysis and demonstrate that this degradation of the IPC limits the possible features that can be meaningfully constructed in an analog reservoir computing setting. We borrow a result from quantum complexity theory that relates the circuit model of computation to a continuous time model, and demonstrate an exponential reduction in the accessible volume of reservoir configurations. We conclude by relating this degradation in the IPC to the fat-shattering dimension of a family of functions describing the reservoir dynamics, which allows us to express our result in terms of a classification task. We conclude that any physical, analog reservoir computer that is exposed to noise can only be used to perform a polynomial amount of learning, despite the exponentially large latent space, even with an exponential amount of post-processing. |
10 pages, 1 figure |
Virtual Target Trajectory Prediction for Stochastic Targets | 2025-04-02 | ShowTrajectory prediction of other vehicles is crucial for autonomous vehicles, with applications from missile guidance to UAV collision avoidance. Typically, target trajectories are assumed deterministic, but real-world aerial vehicles exhibit stochastic behavior, such as evasive maneuvers or gliders circling in thermals. This paper uses Conditional Normalizing Flows, an unsupervised Machine Learning technique, to learn and predict the stochastic behavior of targets of guided missiles using trajectory data. The trained model predicts the distribution of future target positions based on initial conditions and parameters of the dynamics. Samples from this distribution are clustered using a time series k-means algorithm to generate representative trajectories, termed virtual targets. The method is fast and target-agnostic, requiring only training data in the form of target trajectories. Thus, it serves as a drop-in replacement for deterministic trajectory predictions in guidance laws and path planning. Simulated scenarios demonstrate the approach's effectiveness for aerial vehicles with random maneuvers, bridging the gap between deterministic predictions and stochastic reality, advancing guidance and control algorithms for autonomous vehicles. |
will ...will be submitted to Journal of Guidance, Control, and Dynamics |
shapr: Explaining Machine Learning Models with Conditional Shapley Values in R and Python | 2025-04-02 | ShowThis paper introduces the shapr package, a versatile tool for generating Shapley value explanations for machine learning and statistical regression models in both R and Python. The package emphasizes conditional Shapley value estimates, providing a comprehensive range of approaches for accurately capturing feature dependencies, which is crucial for correct model interpretation and lacking in similar software. In addition to regular tabular data, the shapr R-package includes specialized functionality for explaining time series forecasts. The package offers a minimal set of user functions with sensible defaults for most use cases while providing extensive flexibility for advanced users to fine-tune computations. Additional features include parallelized computations, iterative estimation with convergence detection, and rich visualization tools. shapr also extends its functionality to compute causal and asymmetric Shapley values when causal information is available. In addition, we introduce the shaprpy Python library, which brings core capabilities of shapr to the Python ecosystem. Overall, the package aims to enhance the interpretability of predictive models within a powerful and user-friendly framework. |
|
Inference of hidden common driver dynamics by anisotropic self-organizing neural networks | 2025-04-02 | ShowWe are introducing a novel approach to infer the underlying dynamics of hidden common drivers, based on analyzing time series data from two driven dynamical systems. The inference relies on time-delay embedding, estimation of the intrinsic dimension of the observed systems, and their mutual dimension. A key component of our approach is a new anisotropic training technique applied to Kohonen's self-organizing map, which effectively learns the attractor of the driven system and separates it into submanifolds corresponding to the self-dynamics and shared dynamics. To demonstrate the effectiveness of our method, we conducted simulated experiments using different chaotic maps in a setup, where two chaotic maps were driven by a third map with nonlinear coupling. The inferred time series exhibited high correlation with the time series of the actual hidden common driver, in contrast to the observed systems. The quality of our reconstruction were compared and shown to be superior to several other methods that are intended to find the common features behind the observed time series, including linear methods like PCA and ICA as well as nonlinear methods like dynamical component analysis, canonical correlation analysis and even deep canonical correlation analysis. |
|
CLaP -- State Detection from Time Series | 2025-04-02 | ShowThe ever-growing amount of sensor data from machines, smart devices, and the environment leads to an abundance of high-resolution, unannotated time series (TS). These recordings encode the recognizable properties of latent states and transitions from physical phenomena that can be modelled as abstract processes. The unsupervised localization and identification of these states and their transitions is the task of time series state detection (TSSD). We introduce CLaP, a new, highly accurate and efficient algorithm for TSSD. It leverages the predictive power of time series classification for TSSD in an unsupervised setting by applying novel self-supervision techniques to detect whether data segments emerge from the same state or not. To this end, CLaP cross-validates a classifier with segment-labelled subsequences to quantify confusion between segments. It merges labels from segments with high confusion, representing the same latent state, if this leads to an increase in overall classification quality. We conducted an experimental evaluation using 391 TS from four benchmarks and found CLaP to be significantly more precise in detecting states than five state-of-the-art competitors. It achieves the best accuracy-runtime tradeoff and is scalable to large TS. We provide a Python implementation of CLaP, which can be deployed in TS analysis workflows. |
|
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications | 2025-04-02 | ShowFinancial LLMs hold promise for advancing financial tasks and domain-specific applications. However, they are limited by scarce corpora, weak multimodal capabilities, and narrow evaluations, making them less suited for real-world application. To address this, we introduce \textit{Open-FinLLMs}, the first open-source multimodal financial LLMs designed to handle diverse tasks across text, tabular, time-series, and chart data, excelling in zero-shot, few-shot, and fine-tuning settings. The suite includes FinLLaMA, pre-trained on a comprehensive 52-billion-token corpus; FinLLaMA-Instruct, fine-tuned with 573K financial instructions; and FinLLaVA, enhanced with 1.43M multimodal tuning pairs for strong cross-modal reasoning. We comprehensively evaluate Open-FinLLMs across 14 financial tasks, 30 datasets, and 4 multimodal tasks in zero-shot, few-shot, and supervised fine-tuning settings, introducing two new multimodal evaluation datasets. Our results show that Open-FinLLMs outperforms afvanced financial and general LLMs such as GPT-4, across financial NLP, decision-making, and multi-modal tasks, highlighting their potential to tackle real-world challenges. To foster innovation and collaboration across academia and industry, we release all codes (https://anonymous.4open.science/r/PIXIU2-0D70/B1D7/LICENSE) and models under OSI-approved licenses. |
33 pages, 13 figures |
Early Classification of Time Series: Taxonomy and Benchmark | 2025-04-02 | ShowIn many situations, the measurements of a studied phenomenon are provided sequentially, and the prediction of its class needs to be made as early as possible so as not to incur too high a time penalty, but not too early and risk paying the cost of misclassification. This problem has been particularly studied in the case of time series, and is known as Early Classification of Time Series (ECTS). Although it has been the subject of a growing body of literature, there is still a lack of a systematic, shared evaluation protocol to compare the relative merits of the various existing methods. This document begins by situating these methods within a principle-based taxonomy. It defines dimensions for organizing their evaluation, and then reports the results of a very extensive set of experiments along these dimensions involving nine state-of-the art ECTS algorithms. In addition, these and other experiments can be carried out using an open-source library in which most of the existing ECTS algorithms have been implemented (see https://github.com/ML-EDM/ml_edm). |
|
Dependence-based fuzzy clustering of functional time series | 2025-04-02 | ShowTime series clustering is essential in scientific applications, yet methods for functional time series, collections of infinite-dimensional curves treated as random elements in a Hilbert space, remain underdeveloped. This work presents clustering approaches for functional time series that combine the fuzzy |
48 pa...48 pages, 6 figures, 16 tables |
DRAN: A Distribution and Relation Adaptive Network for Spatio-temporal Forecasting | 2025-04-02 | ShowAccurate predictions of spatio-temporal systems' states are crucial for tasks such as system management, control, and crisis prevention. However, the inherent time variance of spatio-temporal systems poses challenges to achieving accurate predictions whenever stationarity is not granted. To address non-stationarity frameworks, we propose a Distribution and Relation Adaptive Network (DRAN) capable of dynamically adapting to relation and distribution changes over time. While temporal normalization and de-normalization are frequently used techniques to adapt to distribution shifts, this operation is not suitable for the spatio-temporal context as temporal normalization scales the time series of nodes and possibly disrupts the spatial relations among nodes. In order to address this problem, we develop a Spatial Factor Learner (SFL) module that enables the normalization and de-normalization process in spatio-temporal systems. To adapt to dynamic changes in spatial relationships among sensors, we propose a Dynamic-Static Fusion Learner (DSFL) module that effectively integrates features learned from both dynamic and static relations through an adaptive fusion ratio mechanism. Furthermore, we introduce a Stochastic Learner to capture the noisy components of spatio-temporal representations. Our approach outperforms state of the art methods in weather prediction and traffic flows forecasting tasks. Experimental results show that our SFL efficiently preserves spatial relationships across various temporal normalization operations. Visualizations of the learned dynamic and static relations demonstrate that DSFL can capture both local and distant relationships between nodes. Moreover, ablation studies confirm the effectiveness of each component. |
15 pages, 9 figures |
AverageTime: Enhance Long-Term Time Series Forecasting with Simple Averaging | 2025-04-02 | ShowLong-term time series forecasting focuses on leveraging historical data to predict future trends. The core challenge lies in effectively modeling dependencies both within sequences and channels. Convolutional Neural Networks and Linear models often excel in sequence modeling but frequently fall short in capturing complex channel dependencies. In contrast, Transformer-based models, with their attention mechanisms applied to both sequences and channels, have demonstrated strong predictive performance. Our research proposes a new approach for capturing sequence and channel dependencies: AverageTime, an exceptionally simple yet effective structure. By employing mixed channel embedding and averaging operations, AverageTime separately captures correlations for sequences and channels through channel mapping and result averaging. In addition, we integrate clustering methods to further accelerate the model's training process. Experiments on real-world datasets demonstrate that AverageTime surpasses state-of-the-art models in predictive performance while maintaining efficiency comparable to lightweight linear models. This provides a new and effective framework for modeling long time series. |
|
A Prefixed Patch Time Series Transformer for Two-Point Boundary Value Problems in Three-Body Problems | 2025-04-02 | ShowTwo-point boundary value problems for cislunar trajectories present significant challenges in circler restricted three body problem, making traditional analytical methods like Lambert's problem inapplicable. This study proposes a novel approach using a prefixed patch time series Transformer model that automates the solution of two-point boundary value problems from lunar flyby to arbitrary terminal conditions. Using prefix tokens of terminal conditions in our deep generative model enables solving boundary value problems in three-body dynamics. The training dataset consists of trajectories obtained through forward propagation rather than solving boundary value problems directly. The model demonstrates potential practical utility for preliminary trajectory design in cislunar mission scenarios. |
|
Attention Mamba: Time Series Modeling with Adaptive Pooling Acceleration and Receptive Field Enhancements | 2025-04-02 | Show"This work has been submitted to the lEEE for possible publication. Copyright may be transferred without noticeafter which this version may no longer be accessible." Time series modeling serves as the cornerstone of real-world applications, such as weather forecasting and transportation management. Recently, Mamba has become a promising model that combines near-linear computational complexity with high prediction accuracy in time series modeling, while facing challenges such as insufficient modeling of nonlinear dependencies in attention and restricted receptive fields caused by convolutions. To overcome these limitations, this paper introduces an innovative framework, Attention Mamba, featuring a novel Adaptive Pooling block that accelerates attention computation and incorporates global information, effectively overcoming the constraints of limited receptive fields. Furthermore, Attention Mamba integrates a bidirectional Mamba block, efficiently capturing long-short features and transforming inputs into the Value representations for attention mechanisms. Extensive experiments conducted on diverse datasets underscore the effectiveness of Attention Mamba in extracting nonlinear dependencies and enhancing receptive fields, establishing superior performance among leading counterparts. Our codes will be available on GitHub. |
|
FAN: Fourier Analysis Networks | 2025-04-02 | ShowDespite the remarkable successes of general-purpose neural networks, such as MLPs and Transformers, we find that they exhibit notable shortcomings in modeling and reasoning about periodic phenomena, achieving only marginal performance within the training domain and failing to generalize effectively to out-of-domain (OOD) scenarios. Periodicity is ubiquitous throughout nature and science. Therefore, neural networks should be equipped with the essential ability to model and handle periodicity. In this work, we propose FAN, a novel general-purpose neural network that offers broad applicability similar to MLP while effectively addressing periodicity modeling challenges. Periodicity is naturally integrated into FAN's structure and computational processes by introducing the Fourier Principle. Unlike existing Fourier-based networks, which possess particular periodicity modeling abilities but are typically designed for specific tasks, our approach maintains the general-purpose modeling capability. Therefore, FAN can seamlessly replace MLP in various model architectures with fewer parameters and FLOPs. Through extensive experiments, we demonstrate the superiority of FAN in periodicity modeling tasks and the effectiveness and generalizability of FAN across a range of real-world tasks, e.g., symbolic formula representation, time series forecasting, language modeling, and image recognition. |
|
Data Driven Decision Making with Time Series and Spatio-temporal Data | 2025-04-02 | ShowTime series data captures properties that change over time. Such data occurs widely, ranging from the scientific and medical domains to the industrial and environmental domains. When the properties in time series exhibit spatial variations, we often call the data spatio-temporal. As part of the continued digitalization of processes throughout society, increasingly large volumes of time series and spatio-temporal data are available. In this tutorial, we focus on data-driven decision making with such data, e.g., enabling greener and more efficient transportation based on traffic time series forecasting. The tutorial adopts the holistic paradigm of |
This ...This paper is accepted by ICDE 2025 |
Recurrent Stochastic Configuration Networks for Temporal Data Analytics | 2025-04-02 | ShowTemporal data modelling techniques with neural networks are useful in many domain applications, including time-series forecasting and control engineering. This paper aims at developing a recurrent version of stochastic configuration networks (RSCNs) for problem solving, where we have no underlying assumption on the dynamic orders of the input variables. Given a collection of historical data, we first build an initial RSCN model in the light of a supervisory mechanism, followed by an online update of the output weights by using a projection algorithm. Some theoretical results are established, including the echo state property, the universal approximation property of RSCNs for both the offline and online learnings, and the convergence of the output weights. The proposed RSCN model is remarkably distinguished from the well-known echo state networks (ESNs) in terms of the way of assigning the input random weight matrix and a special structure of the random feedback matrix. A comprehensive comparison study among the long short-term memory (LSTM) network, the original ESN, and several state-of-the-art ESN methods such as the simple cycle reservoir (SCR), the polynomial ESN (PESN), the leaky-integrator ESN (LIESN) and RSCN is carried out. Numerical results clearly indicate that the proposed RSCN performs favourably over all of the datasets. |
|
Derivative estimation by RKHS regularization for learning dynamics from time-series data | 2025-04-02 | ShowLearning the governing equations from time-series data has gained increasing attention due to its potential to extract useful dynamics from real-world data. Despite significant progress, it becomes challenging in the presence of noise, especially when derivatives need to be calculated. To reduce the effect of noise, we propose a method that simultaneously fits both the derivative and trajectory from noisy time-series data. Our approach formulates derivative estimation as an inverse problem involving integral operators within the forward model, and estimates the derivative function by solving a regularization problem in a vector-valued reproducing kernel Hilbert space (vRKHS). We derive an integral-form representer theorem, which enables the computation of the regularized solution by solving a finite-dimensional problem and facilitates efficiently estimating the optimal regularization parameter. By embedding the dynamics within a vRKHS and utilizing the fitted derivative and trajectory, we can recover the underlying dynamics from noisy data by solving a linear regularization problem. Several numerical experiments are conducted to validate the effectiveness and efficiency of our method. |
|
TS-RAG: Retrieval-Augmented Generation based Time Series Foundation Models are Stronger Zero-Shot Forecaster | 2025-04-01 | ShowRecently, Large Language Models (LLMs) and Foundation Models (FMs) have become prevalent for time series forecasting tasks. However, fine-tuning large language models (LLMs) for forecasting enables the adaptation to specific domains but may not generalize well across diverse, unseen datasets. Meanwhile, existing time series foundation models (TSFMs) lack inherent mechanisms for domain adaptation and suffer from limited interpretability, making them suboptimal for zero-shot forecasting. To this end, we present TS-RAG, a retrieval-augmented generation based time series forecasting framework that enhances the generalization capability and interpretability of TSFMs. Specifically, TS-RAG leverages pre-trained time series encoders to retrieve semantically relevant time series segments from a dedicated knowledge database, incorporating contextual patterns for the given time series query. Next, we develop a learnable Mixture-of-Experts (MoE)-based augmentation module, which dynamically fuses retrieved time series patterns with the TSFM's representation of the input query, improving forecasting accuracy without requiring task-specific fine-tuning. Thorough empirical studies on seven public benchmark datasets demonstrate that TS-RAG achieves state-of-the-art zero-shot forecasting performance, outperforming TSFMs by up to 6.51% across diverse domains and showcasing desired interpretability. |
|
Stochastic Reservoir Computers | 2025-04-01 | ShowReservoir computing is a form of machine learning that utilizes nonlinear dynamical systems to perform complex tasks in a cost-effective manner when compared to typical neural networks. Many recent advancements in reservoir computing, in particular quantum reservoir computing, make use of reservoirs that are inherently stochastic. However, the theoretical justification for using these systems has not yet been well established. In this paper, we investigate the universality of stochastic reservoir computers, in which we use a stochastic system for reservoir computing using the probabilities of each reservoir state as the readout instead of the states themselves. In stochastic reservoir computing, the number of distinct states of the entire reservoir computer can potentially scale exponentially with the size of the reservoir hardware, offering the advantage of compact device size. We prove that classes of stochastic echo state networks, and therefore the class of all stochastic reservoir computers, are universal approximating classes. We also investigate the performance of two practical examples of stochastic reservoir computers in classification and chaotic time series prediction. While shot noise is a limiting factor in the performance of stochastic reservoir computing, we show significantly improved performance compared to a deterministic reservoir computer with similar hardware in cases where the effects of noise are small. |
34 pages, 8 figures |
DeformTime: Capturing Variable Dependencies with Deformable Attention for Time Series Forecasting | 2025-04-01 | ShowIn multivariable time series (MTS) forecasting, existing state-of-the-art deep learning approaches tend to focus on autoregressive formulations and often overlook the potential of using exogenous variables in enhancing the prediction of the target endogenous variable. To address this limitation, we present DeformTime, a neural network architecture that attempts to capture correlated temporal patterns from the input space, and hence, improve forecasting accuracy. It deploys two core operations performed by deformable attention blocks (DABs): learning dependencies across variables from different time steps (variable DAB), and preserving temporal dependencies in data from previous time steps (temporal DAB). Input data transformation is explicitly designed to enhance learning from the deformed series of information while passing through a DAB. We conduct extensive experiments on 6 MTS data sets, using previously established benchmarks as well as challenging infectious disease modelling tasks with more exogenous variables. The results demonstrate that DeformTime improves accuracy against previous competitive methods across the vast majority of MTS forecasting tasks, reducing the mean absolute error by 7.2% on average. Notably, performance gains remain consistent across longer forecasting horizons. |
Publi...Published in Transactions on Machine Learning Research (04/2025). The code is available at https://github.com/ClaudiaShu/DeformTime |
Nonparametric spectral density estimation using interactive mechanisms under local differential privacy | 2025-04-01 | ShowWe address the problem of nonparametric estimation of the spectral density for a centered stationary Gaussian time series under local differential privacy constraints. Specifically, we propose new interactive privacy mechanisms for three tasks: estimating a single covariance coefficient, estimating the spectral density at a fixed frequency, and estimating the entire spectral density function. Our approach achieves faster rates through a two-stage process: we apply first the Laplace mechanism to the truncated value and then use the former privatized sample to gain knowledge on the dependence mechanism in the time series. For spectral densities belonging to H"older and Sobolev smoothness classes, we demonstrate that our estimators improve upon the non-interactive mechanism of Kroll (2024) for small privacy parameter |
47 pages |
Detection of Anomalous Vehicular Traffic and Sensor Failures Using Data Clustering Techniques | 2025-04-01 | ShowThe increasing availability of traffic data from sensor networks has created new opportunities for understanding vehicular dynamics and identifying anomalies. In this study, we employ clustering techniques to analyse traffic flow data with the dual objective of uncovering meaningful traffic patterns and detecting anomalies, including sensor failures and irregular congestion events. We explore multiple clustering approaches, i.e partitioning and hierarchical methods, combined with various time-series representations and similarity measures. Our methodology is applied to real-world data from highway sensors, enabling us to assess the impact of different clustering frameworks on traffic pattern recognition. We also introduce a clustering-driven anomaly detection methodology that identifies deviations from expected traffic behaviour based on distance-based anomaly scores. Results indicate that hierarchical clustering with symbolic representations provides robust segmentation of traffic patterns, while partitioning methods such as k-means and fuzzy c-means yield meaningful results when paired with Dynamic Time Warping. The proposed anomaly detection strategy successfully identifies sensor malfunctions and abnormal traffic conditions with minimal false positives, demonstrating its practical utility for real-time monitoring. Real-world vehicular traffic data are provided by Autostrade Alto Adriatico S.p.A. |
|
NIRVAR: Network Informed Restricted Vector Autoregression | 2025-04-01 | ShowHigh-dimensional panels of time series often arise in finance and macroeconomics, where co-movements within groups of panel components occur. Extracting these groupings from the data provides a course-grained description of the complex system in question and can inform subsequent prediction tasks. We develop a novel methodology to model such a panel as a restricted vector autoregressive process, where the coefficient matrix is the weighted adjacency matrix of a stochastic block model. This network time series model, which we call the Network Informed Restricted Vector Autoregression (NIRVAR) model, yields a coefficient matrix that has a sparse block-diagonal structure. We propose an estimation procedure that embeds each panel component in a low-dimensional latent space and clusters the embedded points to recover the blocks of the coefficient matrix. Crucially, the method allows for network-based time series modelling when the underlying network is unobserved. We derive the bias, consistency and asymptotic normality of the NIRVAR estimator. Simulation studies suggest that the NIRVAR estimated embedded points are Gaussian distributed around the ground truth latent positions. On three applications to finance, macroeconomics, and transportation systems, NIRVAR outperforms competing models in terms of prediction and provides interpretable results regarding group recovery. |
27 pages |
ARMAr-LASSO: Mitigating the Impact of Predictor Serial Correlation on the LASSO | 2025-04-01 | ShowWe explore estimation and forecast accuracy for sparse linear models, focusing on scenarios where both predictors and errors carry serial correlations. We establish a clear link between predictor serial correlation and the performance of the LASSO, showing that even orthogonal or weakly correlated stationary AR processes can lead to significant spurious correlations due to their serial correlations. To address this challenge, we propose a novel approach named ARMAr-LASSO ({\em ARMA residuals LASSO}), which applies the LASSO to predictors that have been pre-whitened with ARMA filters and lags of dependent variable. We derive both asymptotic results and oracle inequalities for the ARMAr-LASSO, demonstrating that it effectively reduces estimation errors while also providing an effective forecasting and feature selection strategy. Our findings are supported by extensive simulations and an application to real-world macroeconomic data, which highlight the superior performance of the ARMAr-LASSO for handling sparse linear models in the context of time series. |
34 pa...34 pages, 3 Figures, 4 Tables. arXiv admin note: substantial text overlap with arXiv:2208.00727 |
Innovative LSGTime Model for Crime Spatiotemporal Prediction Based on MindSpore Framework | 2025-04-01 | ShowWith the acceleration of urbanization, the spatiotemporal characteristics of criminal activities have become increasingly complex. Accurate prediction of crime distribution is crucial for optimizing the allocation of police resources and preventing crime. This paper proposes LGSTime, a crime spatiotemporal prediction model that integrates Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and the Multi-head Sparse Self-attention mechanism. LSTM and GRU capture long-term dependencies in crime time series, such as seasonality and periodicity, through their unique gating mechanisms. The Multi-head Sparse Self-attention mechanism, on the other hand, focuses on both temporal and spatial features of criminal events simultaneously through parallel processing and sparsification techniques, significantly improving computational efficiency and prediction accuracy. The integrated model leverages the strengths of each technique to better handle complex spatiotemporal data. Experimental findings demonstrate that the model attains optimal performance across four real - world crime datasets. In comparison to the CNN model, it exhibits performance enhancements of 2.8%, 1.9%, and 1.4% in the Mean Squared Error (MSE), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE) metrics respectively. These results offer a valuable reference for tackling the challenges in crime prediction. |
|
Class-Dependent Perturbation Effects in Evaluating Time Series Attributions | 2025-04-01 | ShowAs machine learning models become increasingly prevalent in time series applications, Explainable Artificial Intelligence (XAI) methods are essential for understanding their predictions. Within XAI, feature attribution methods aim to identify which input features contribute the most to a model's prediction, with their evaluation typically relying on perturbation-based metrics. Through systematic empirical analysis across multiple datasets, model architectures, and perturbation strategies, we reveal previously overlooked class-dependent effects in these metrics: they show varying effectiveness across classes, achieving strong results for some while remaining less sensitive to others. In particular, we find that the most effective perturbation strategies often demonstrate the most pronounced class differences. Our analysis suggests that these effects arise from the learned biases of classifiers, indicating that perturbation-based evaluation may reflect specific model behaviors rather than intrinsic attribution quality. We propose an evaluation framework with a class-aware penalty term to help assess and account for these effects in evaluating feature attributions, offering particular value for class-imbalanced datasets. Although our analysis focuses on time series classification, these class-dependent effects likely extend to other structured data domains where perturbation-based evaluation is common. |
Accep...Accepted at The World Conference on eXplainable Artificial Intelligence (XAI-2025) |
Integrating Fourier Neural Operators with Diffusion Models to improve Spectral Representation of Synthetic Earthquake Ground Motion Response | 2025-04-01 | ShowNuclear reactor buildings must be designed to withstand the dynamic load induced by strong ground motion earthquakes. For this reason, their structural behavior must be assessed in multiple realistic ground shaking scenarios (e.g., the Maximum Credible Earthquake). However, earthquake catalogs and recorded seismograms may not always be available in the region of interest. Therefore, synthetic earthquake ground motion is progressively being employed, although with some due precautions: earthquake physics is sometimes not well enough understood to be accurately reproduced with numerical tools, and the underlying epistemic uncertainties lead to prohibitive computational costs related to model calibration. In this study, we propose an AI physics-based approach to generate synthetic ground motion, based on the combination of a neural operator that approximates the elastodynamics Green's operator in arbitrary source-geology setups, enhanced by a denoising diffusion probabilistic model. The diffusion model is trained to correct the ground motion time series generated by the neural operator. Our results show that such an approach promisingly enhances the realism of the generated synthetic seismograms, with frequency biases and Goodness-Of-Fit (GOF) scores being improved by the diffusion model. This indicates that the latter is capable to mitigate the mid-frequency spectral falloff observed in the time series generated by the neural operator. Our method showcases fast and cheap inference in different site and source conditions. |
|
SVInvNet: A Densely Connected Encoder-Decoder Architecture for Seismic Velocity Inversion | 2025-04-01 | ShowThis study presents a deep learning-based approach to seismic velocity inversion problem, focusing on both noisy and noiseless training datasets of varying sizes. Our Seismic Velocity Inversion Network (SVInvNet) introduces a novel architecture that contains a multi-connection encoder-decoder structure enhanced with dense blocks. This design is specifically tuned to effectively process time series data, which is essential for addressing the challenges of non-linear seismic velocity inversion. For training and testing, we created diverse seismic velocity models, including multi-layered, faulty, and salt dome categories. We also investigated how different kinds of ambient noise, both coherent and stochastic, and the size of the training dataset affect learning outcomes. SVInvNet is trained on datasets ranging from 750 to 6,000 samples and is tested using a large benchmark dataset of 12,000 samples. Despite its fewer parameters compared to the baseline model, SVInvNet achieves superior performance with this dataset. The performance of SVInvNet was further evaluated using the OpenFWI dataset and Marmousi-derived velocity models. The comparative analysis clearly reveals the effectiveness of the proposed model. |
This ...This is the preprint of the accepted manuscript to appear in IEEE Transactions on Geoscience and Remote Sensing |
Estimating invertible processes in Hilbert spaces, with applications to functional ARMA processes | 2025-04-01 | ShowInvertible processes are central to functional time series analysis, making the estimation of their defining operators a key problem. While asymptotic error bounds have been established for specific ARMA models on |
|
ResNLS: An Improved Model for Stock Price Forecasting | 2025-04-01 | ShowStock prices forecasting has always been a challenging task. Although many research projects try to address the problem, few of them pay attention to the varying degrees of dependencies between stock prices. In this paper, we introduce a hybrid model that improves the prediction of stock prices by emphasizing the dependencies between adjacent stock prices. The proposed model, ResNLS, is mainly composed of two neural architectures, ResNet and LSTM. ResNet serves as a feature extractor to identify dependencies between stock prices, while LSTM analyzes the initial time series data with the combination of dependencies, which are considered as residuals. Our experiment reveals that when the closing price data for the previous 5 consecutive trading days is used as input, the performance of the model (ResNLS-5) is optimal compared to those with other inputs. Furthermore, ResNLS-5 demonstrates at least a 20% improvement over current state-of-the-art baselines. To verify whether ResNLS-5 can help clients effectively avoid risks and earn profits in the stock market, we construct a quantitative trading framework for back testing. The result shows that the trading strategy based on ResNLS-5 predictions can successfully mitigate losses during declining stock prices and generate profits in periods of rising stock prices. The relevant code is publicly available on GitHub. |
Accep...Accepted by Computational Intelligence 2023 |
Did ChatGPT or Copilot use alter the style of internet news headlines? A time series regression analysis | 2025-04-01 | ShowThe release of advanced Large Language Models (LLMs) such as ChatGPT and Copilot is changing the way text is created and may influence the content that we find on the web. This study investigated whether the release of these two popular LLMs coincided with a change in writing style in headlines and links on worldwide news websites. 175 NLP features were obtained for each text in a dataset of 451 million headlines/links. An interrupted time series analysis was applied for each of the 175 NLP features to evaluate whether there were any statistically significant sustained changes after the release dates of ChatGPT and/or Copilot. There were a total of 44 features that did not appear to have any significant sustained change after the release of ChatGPT/Copilot. A total of 91 other features did show significant change with ChatGPT and/or Copilot although significance with earlier control LLM release dates (GPT-1/2/3, Gopher) removed them from consideration. This initial analysis suggests these language models may have had a limited impact on the style of individual news headlines/links, with respect to only some NLP measures. |
|
Time-Series Forecasting via Topological Information Supervised Framework with Efficient Topological Feature Learning | 2025-04-01 | ShowTopological Data Analysis (TDA) has emerged as a powerful tool for extracting meaningful features from complex data structures, driving significant advancements in fields such as neuroscience, biology, machine learning, and financial modeling. Despite its success, the integration of TDA with time-series prediction remains underexplored due to three primary challenges: the limited utilization of temporal dependencies within topological features, computational bottlenecks associated with persistent homology, and the deterministic nature of TDA pipelines restricting generalized feature learning. This study addresses these challenges by proposing the Topological Information Supervised (TIS) Prediction framework, which leverages neural networks and Conditional Generative Adversarial Networks (CGANs) to generate synthetic topological features, preserving their distribution while significantly reducing computational time. We propose a novel training strategy that integrates topological consistency loss to improve the predictive accuracy of deep learning models. Specifically, we introduce two state-of-the-art models, TIS-BiGRU and TIS-Informer, designed to capture short-term and long-term temporal dependencies, respectively. Comparative experimental results demonstrate the superior performance of TIS models over conventional predictors, validating the effectiveness of integrating topological information. This work not only advances TDA-based time-series prediction but also opens new avenues for utilizing topological features in deep learning architectures. |
The e...The experiments are incomplete |
Transfer Learning in Financial Time Series with Gramian Angular Field | 2025-04-01 | ShowIn financial analysis, time series modeling is often hampered by data scarcity, limiting neural network models' ability to generalize. Transfer learning mitigates this by leveraging data from similar domains, but selecting appropriate source domains is crucial to avoid negative transfer. This study enhances source domain selection in transfer learning by introducing Gramian Angular Field (GAF) transformations to improve time series similarity functions. We evaluate a comprehensive range of baseline similarity functions, including both basic and state-of-the-art (SOTA) functions, and perform extensive experiments with Deep Neural Networks (DNN) and Long Short-Term Memory (LSTM) networks. The results demonstrate that GAF-based similarity functions significantly reduce prediction errors. Notably, Coral (GAF) for DNN and CMD (GAF) for LSTM consistently deliver superior performance, highlighting their effectiveness in complex financial environments. |
|
Self and mutually exciting point process embedding flexible residuals and intensity with discretely Markovian dynamics | 2025-04-01 | ShowThis work introduces a self and mutually exciting point process that embeds flexible residuals and intensity with discretely Markovian dynamics. By allowing the integration of diverse residual distributions, this model serves as an extension of the Hawkes process, facilitating intensity modeling. This model's nature enables a filtered historical simulation that more accurately incorporates the properties of the original time series. Furthermore, the process extends to multivariate models with manageable estimation and simulation implementations. We investigate the impact of a flexible residual distribution on the estimation of high-frequency financial data, comparing it with the Hawkes process. |
|
On Creating a Causally Grounded Usable Rating Method for Assessing the Robustness of Foundation Models Supporting Time Series | 2025-03-31 | ShowFoundation Models (FMs) have improved time series forecasting in various sectors, such as finance, but their vulnerability to input disturbances can hinder their adoption by stakeholders, such as investors and analysts. To address this, we propose a causally grounded rating framework to study the robustness of Foundational Models for Time Series (FMTS) with respect to input perturbations. We evaluate our approach to the stock price prediction problem, a well-studied problem with easily accessible public data, evaluating six state-of-the-art (some multi-modal) FMTS across six prominent stocks spanning three industries. The ratings proposed by our framework effectively assess the robustness of FMTS and also offer actionable insights for model selection and deployment. Within the scope of our study, we find that (1) multi-modal FMTS exhibit better robustness and accuracy compared to their uni-modal versions and, (2) FMTS pre-trained on time series forecasting task exhibit better robustness and forecasting accuracy compared to general-purpose FMTS pre-trained across diverse settings. Further, to validate our framework's usability, we conduct a user study showcasing FMTS prediction errors along with our computed ratings. The study confirmed that our ratings reduced the difficulty for users in comparing the robustness of different systems. |
|
Non-Asymptotic Analysis of Classical Spectrum Estimators for $L$-mixing Time-series Data with Unknown Means | 2025-03-31 | ShowSpectral estimation is an important tool in time series analysis, with applications including economics, astronomy, and climatology. The asymptotic theory for non-parametric estimation is well-known but the development of non-asymptotic theory is still ongoing. Our recent work obtained the first non-asymptotic error bounds on the Bartlett and Welch methods for |
7 pag...7 pages, 2 figures, Under Review for Conference on Decision and Control 2025 |
MetaCLBench: Meta Continual Learning Benchmark on Resource-Constrained Edge Devices | 2025-03-31 | ShowMeta-Continual Learning (Meta-CL) has emerged as a promising approach to minimize manual labeling efforts and system resource requirements by enabling Continual Learning (CL) with limited labeled samples. However, while existing methods have shown success in image-based tasks, their effectiveness remains unexplored for sequential time-series data from sensor systems, particularly audio inputs. To address this gap, we conduct a comprehensive benchmark study evaluating six representative Meta-CL approaches using three network architectures on five datasets from both image and audio modalities. We develop MetaCLBench, an end-to-end Meta-CL benchmark framework for edge devices to evaluate system overheads and investigate trade-offs among performance, computational costs, and memory requirements across various Meta-CL methods. Our results reveal that while many Meta-CL methods enable to learn new classes for both image and audio modalities, they impose significant computational and memory costs on edge devices. Also, we find that pre-training and meta-training procedures based on source data before deployment improve Meta-CL performance. Finally, to facilitate further research, we provide practical guidelines for researchers and machine learning practitioners implementing Meta-CL on resource-constrained environments and make our benchmark framework and tools publicly available, enabling fair evaluation across both accuracy and system-level metrics. |
|
Disinformation about autism in Latin America and the Caribbean: Mapping 150 false causes and 150 false cures of ASD in conspiracy theory communities on Telegram | 2025-03-31 | ShowHow do conspiracy theory communities in Latin America and the Caribbean structure, articulate, and sustain the dissemination of disinformation about autism? To answer this question, this research investigates the structuring, articulation, and promotion of autism-related disinformation in conspiracy theory communities in Latin America and the Caribbean. By analyzing publications from 1,659 Telegram communities over ten years (2015 - 2025) and examining more than 58 million pieces of shared content from approximately 5.3 million users, this study explores how false narratives about autism are promoted, including unfounded claims about its causes and promises of miraculous cures. The adopted methodology combines network analysis, time series analysis, thematic clustering, and content analysis, enabling the identification of dissemination patterns, key influencers, and interconnections with other conspiracy theories. Among the key findings, Brazilian communities stand out as the leading producers and distributors of these narratives in the region, accounting for 46% of the analyzed content. Additionally, there has been an exponential 15,000% (x151) increase in the volume of autism-related disinformation since the COVID-19 pandemic in Latin America and the Caribbean, highlighting the correlation between health crises and the rise of conspiracy beliefs. The research also reveals that false cures, such as chlorine dioxide (CDS), ozone therapy, and extreme diets, are widely promoted within these communities and commercially exploited, often preying on desperate families in exchange for money. By addressing the research question, this study aims to contribute to the understanding of the disinformation ecosystem and proposes critical reflections on how to confront these harmful narratives. |
Engli...English and Portuguese versions, with 124 pages together |
Assessing Driving Risk Through Unsupervised Detection of Anomalies in Telematics Time Series Data | 2025-03-31 | ShowVehicle telematics provides granular data for dynamic driving risk assessment, but current methods often rely on aggregated metrics (e.g., harsh braking counts) and do not fully exploit the rich time-series structure of telematics data. In this paper, we introduce a flexible framework using continuous-time hidden Markov model (CTHMM) to model and analyze trip-level telematics data. Unlike existing methods, the CTHMM models raw time-series data without predefined thresholds on harsh driving events or assumptions about accident probabilities. Moreover, our analysis is based solely on telematics data, requiring no traditional covariates such as driver or vehicle characteristics. Through unsupervised anomaly detection based on pseudo-residuals, we identify deviations from normal driving patterns -- defined as the prevalent behaviour observed in a driver's history or across the population -- which are linked to accident risk. Validated on both controlled and real-world datasets, the CTHMM effectively detects abnormal driving behaviour and trips with increased accident likelihood. In real data analysis, higher anomaly levels in longitudinal and lateral accelerations consistently correlate with greater accident risk, with classification models using this information achieving ROC-AUC values as high as 0.86 for trip-level analysis and 0.78 for distinguishing drivers with claims. Furthermore, the methodology reveals significant behavioural differences between drivers with and without claims, offering valuable insights for insurance applications, accident analysis, and prevention. |
|
EMForecaster: A Deep Learning Framework for Time Series Forecasting in Wireless Networks with Distribution-Free Uncertainty Quantification | 2025-03-31 | ShowWith the recent advancements in wireless technologies, forecasting electromagnetic field (EMF) exposure has become critical to enable proactive network spectrum and power allocation, as well as network deployment planning. In this paper, we develop a deep learning (DL) time series forecasting framework referred to as \textit{EMForecaster}. The proposed DL architecture employs patching to process temporal patterns at multiple scales, complemented by reversible instance normalization and mixing operations along both temporal and patch dimensions for efficient feature extraction. We augment {EMForecaster} with a conformal prediction mechanism, which is independent of the data distribution, to enhance the trustworthiness of model predictions via uncertainty quantification of forecasts. This conformal prediction mechanism ensures that the ground truth lies within a prediction interval with target error rate |
|
Times2D: Multi-Period Decomposition and Derivative Mapping for General Time Series Forecasting | 2025-03-31 | ShowTime series forecasting is an important application in various domains such as energy management, traffic planning, financial markets, meteorology, and medicine. However, real-time series data often present intricate temporal variability and sharp fluctuations, which pose significant challenges for time series forecasting. Previous models that rely on 1D time series representations usually struggle with complex temporal variations. To address the limitations of 1D time series, this study introduces the Times2D method that transforms the 1D time series into 2D space. Times2D consists of three main parts: first, a Periodic Decomposition Block (PDB) that captures temporal variations within a period and between the same periods by converting the time series into a 2D tensor in the frequency domain. Second, the First and Second Derivative Heatmaps (FSDH) capture sharp changes and turning points, respectively. Finally, an Aggregation Forecasting Block (AFB) integrates the output tensors from PDB and FSDH for accurate forecasting. This 2D transformation enables the utilization of 2D convolutional operations to effectively capture long and short characteristics of the time series. Comprehensive experimental results across large-scale data in the literature demonstrate that the proposed Times2D model achieves state-of-the-art performance in both short-term and long-term forecasting. The code is available in this repository: https://github.com/Tims2D/Times2D. |
Accep...Accepted at the AAAI 2025 Conference on Artificial Intelligence |
Enhancing Time Series Forecasting with Fuzzy Attention-Integrated Transformers | 2025-03-31 | ShowThis paper introduces FANTF (Fuzzy Attention Network-Based Transformers), a novel approach that integrates fuzzy logic with existing transformer architectures to advance time series forecasting, classification, and anomaly detection tasks. FANTF leverages a proposed fuzzy attention mechanism incorporating fuzzy membership functions to handle uncertainty and imprecision in noisy and ambiguous time series data. The FANTF approach enhances its ability to capture complex temporal dependencies and multivariate relationships by embedding fuzzy logic principles into the self-attention module of the existing transformer's architecture. The framework combines fuzzy-enhanced attention with a set of benchmark existing transformer-based architectures to provide efficient predictions, classification and anomaly detection. Specifically, FANTF generates learnable fuzziness attention scores that highlight the relative importance of temporal features and data points, offering insights into its decision-making process. Experimental evaluatios on some real-world datasets reveal that FANTF significantly enhances the performance of forecasting, classification, and anomaly detection tasks over traditional transformer-based models. |
|
Integrating Quantum-Classical Attention in Patch Transformers for Enhanced Time Series Forecasting | 2025-03-31 | ShowQCAAPatchTF is a quantum attention network integrated with an advanced patch-based transformer, designed for multivariate time series forecasting, classification, and anomaly detection. Leveraging quantum superpositions, entanglement, and variational quantum eigensolver principles, the model introduces a quantum-classical hybrid self-attention mechanism to capture multivariate correlations across time points. For multivariate long-term time series, the quantum self-attention mechanism can reduce computational complexity while maintaining temporal relationships. It then applies the quantum-classical hybrid self-attention mechanism alongside a feed-forward network in the encoder stage of the advanced patch-based transformer. While the feed-forward network learns nonlinear representations for each variable frame, the quantum self-attention mechanism processes individual series to enhance multivariate relationships. The advanced patch-based transformer computes the optimized patch length by dividing the sequence length into a fixed number of patches instead of using an arbitrary set of values. The stride is then set to half of the patch length to ensure efficient overlapping representations while maintaining temporal continuity. QCAAPatchTF achieves state-of-the-art performance in both long-term and short-term forecasting, classification, and anomaly detection tasks, demonstrating state-of-the-art accuracy and efficiency on complex real-world datasets. |
|
A Comparison of Parametric Dynamic Mode Decomposition Algorithms for Thermal-Hydraulics Applications | 2025-03-31 | ShowIn recent years, algorithms aiming at learning models from available data have become quite popular due to two factors: 1) the significant developments in Artificial Intelligence techniques and 2) the availability of large amounts of data. Nevertheless, this topic has already been addressed by methodologies belonging to the Reduced Order Modelling framework, of which perhaps the most famous equation-free technique is Dynamic Mode Decomposition. This algorithm aims to learn the best linear model that represents the physical phenomena described by a time series dataset: its output is a best state operator of the underlying dynamical system that can be used, in principle, to advance the original dataset in time even beyond its span. However, in its standard formulation, this technique cannot deal with parametric time series, meaning that a different linear model has to be derived for each parameter realization. Research on this is ongoing, and some versions of a parametric Dynamic Mode Decomposition already exist. This work contributes to this research field by comparing the different algorithms presently deployed and assessing their advantages and shortcomings compared to each other. To this aim, three different thermal-hydraulics problems are considered: two benchmark 'flow over cylinder' test cases at diverse Reynolds numbers, whose datasets are, respectively, obtained with the FEniCS finite element solver and retrieved from the CFDbench dataset, and the DYNASTY experimental facility operating at Politecnico di Milano, which studies the natural circulation established by internally heated fluids for Generation IV nuclear applications, whose dataset was generated using the RELAP5 nodal solver. |
|
Frequency-Aware Attention-LSTM for PM$_{2.5}$ Time Series Forecasting | 2025-03-31 | ShowTo enhance the accuracy and robustness of PM$_{2.5}$ concentration forecasting, this paper introduces FALNet, a Frequency-Aware LSTM Network that integrates frequency-domain decomposition, temporal modeling, and attention-based refinement. The model first applies STL and FFT to extract trend, seasonal, and denoised residual components, effectively filtering out high-frequency noise. The filtered residuals are then fed into a stacked LSTM to capture long-term dependencies, followed by a multi-head attention mechanism that dynamically focuses on key time steps. Experiments conducted on real-world urban air quality datasets demonstrate that FALNet consistently outperforms conventional models across standard metrics such as MAE, RMSE, and |
|
CITRAS: Covariate-Informed Transformer for Time Series Forecasting | 2025-03-31 | ShowCovariates play an indispensable role in practical time series forecasting, offering rich context from the past and sometimes extending into the future. However, their availability varies depending on the scenario, and situations often involve multiple target variables simultaneously. Moreover, the cross-variate dependencies between them are multi-granular, with some covariates having a short-term impact on target variables and others showing long-term correlations. This heterogeneity and the intricate dependencies arising in covariate-informed forecasting present significant challenges to existing deep models. To address these issues, we propose CITRAS, a patch-based Transformer that flexibly leverages multiple targets and covariates covering both the past and the future forecasting horizon. While preserving the strong autoregressive capabilities of the canonical Transformer, CITRAS introduces two novel mechanisms in patch-wise cross-variate attention: Key-Value (KV) Shift and Attention Score Smoothing. KV Shift seamlessly incorporates future known covariates into the forecasting of target variables based on their concurrent dependencies. Additionally, Attention Score Smoothing transforms locally accurate patch-wise cross-variate dependencies into global variate-level dependencies by smoothing the past series of attention scores. Experimentally, CITRAS achieves state-of-the-art performance in both covariate-informed and multivariate forecasting, demonstrating its versatile ability to leverage cross-variate dependency for improved forecasting accuracy. |
|
LSEAttention is All You Need for Time Series Forecasting | 2025-03-31 | ShowTransformer-based architectures have achieved remarkable success in natural language processing and computer vision. However, their performance in multivariate long-term forecasting often falls short compared to simpler linear baselines. Previous research has identified the traditional attention mechanism as a key factor limiting their effectiveness in this domain. To bridge this gap, we introduce LATST, a novel approach designed to mitigate entropy collapse and training instability common challenges in Transformer-based time series forecasting. We rigorously evaluate LATST across multiple real-world multivariate time series datasets, demonstrating its ability to outperform existing state-of-the-art Transformer models. Notably, LATST manages to achieve competitive performance with fewer parameters than some linear models on certain datasets, highlighting its efficiency and effectiveness. |
8 pag...8 pages with referencing, 1 figure, 5 tables |
ModelRadar: Aspect-based Forecast Evaluation | 2025-03-31 | ShowAccurate evaluation of forecasting models is essential for ensuring reliable predictions. Current practices for evaluating and comparing forecasting models focus on summarising performance into a single score, using metrics such as SMAPE. While convenient, averaging performance over all samples dilutes relevant information about model behavior under varying conditions. This limitation is especially problematic for time series forecasting, where multiple layers of averaging--across time steps, horizons, and multiple time series in a dataset--can mask relevant performance variations. We address this limitation by proposing ModelRadar, a framework for evaluating univariate time series forecasting models across multiple aspects, such as stationarity, presence of anomalies, or forecasting horizons. We demonstrate the advantages of this framework by comparing 24 forecasting methods, including classical approaches and different machine learning algorithms. NHITS, a state-of-the-art neural network architecture, performs best overall but its superiority varies with forecasting conditions. For instance, concerning the forecasting horizon, we found that NHITS (and also other neural networks) only outperforms classical approaches for multi-step ahead forecasting. Another relevant insight is that classical approaches such as ETS or Theta are notably more robust in the presence of anomalies. These and other findings highlight the importance of aspect-based model evaluation for both practitioners and researchers. ModelRadar is available as a Python package. |
|
Testing for integer integration in functional time series | 2025-03-31 | ShowWe develop a statistical testing procedure to examine whether the curve-valued time series of interest is integrated of order d for an integer d. The proposed procedure can distinguish between integer-integrated time series and fractionally-integrated ones, and it has broad applicability in practice. Monte Carlo simulation experiments show that the proposed testing procedure performs reasonably well. We apply our methodology to Canadian yield curve data and French sub-national age-specific mortality data. We find evidence that these time series are mostly integrated of order one, while some have fractional orders exceeding or falling below one. |
|
Wasserstein multivariate auto-regressive models for modeling distributional time series | 2025-03-31 | ShowThis paper is focused on the statistical analysis of data consisting of a collection of multiple series of probability measures that are indexed by distinct time instants and supported over a bounded interval of the real line. By modeling these time-dependent probability measures as random objects in the Wasserstein space, we propose a new auto-regressive model for the statistical analysis of multivariate distributional time series. Using the theory of iterated random function systems, results on the existence, uniqueness and stationarity of the solution of such a model are provided. We also propose a consistent estimator for the auto-regressive coefficients of this model. Due to the simplex constraints that we impose on the model coefficients, the proposed estimator that is learned under these constraints, naturally has a sparse structure. The sparsity allows the application of the proposed model in learning a graph of temporal dependency from multivariate distributional time series. We explore the numerical performances of our estimation procedure using simulated data. To shed some light on the benefits of our approach for real data analysis, we also apply this methodology to two data sets, respectively made of observations from age distribution in different countries and those from the bike sharing network in Paris. |
|
Incremental capacity-based multi-feature fusion model for predicting state-of-health of lithium-ion batteries | 2025-03-31 | ShowLithium-ion batteries have become an indispensable part of human industrial production and daily life. For the safe use, management and maintenance of lithium-ion batteries, the state of health (SOH) of lithium-ion batteries is an important indicator so that the SOH estimation is of significant practical value. In order to accurately predict SOH, this paper proposes a fusion prediction model which combines particle swarm optimization (PSO) algorithm, bi-directional long-short time memory network (BiLSTM) and adaptive boosting (AdaBoost) algorithm. In the proposed prediction model, indirect health indicators (HIs), which characterize battery degradation, are obtained with the help of incremental capacity analysis (ICA), and is fed into BiLSTM to extract time-series features, whose parameters are optimized by employing PSO algorithm. On this basis, the AdaBoost algorithm is applied to reduce the risk of overfitting the PSO-BiLSTM model. The study based on lithium-ion battery data from Center for Advanced Life Cycle Engineering (CALCE) shows that the PSO-BiLSTM-AdaBoost model has higher accuracy, better robustness, and generalization ability. |
|
Q-fid: Quantum Circuit Fidelity Improvement with LSTM Networks | 2025-03-31 | ShowThe fidelity of quantum circuits (QC) is influenced by several factors, including hardware characteristics, calibration status, and the transpilation process, all of which impact their susceptibility to noise. However, existing methods struggle to estimate and compare the noise performance of different circuit layouts due to fluctuating error rates and the absence of a standardized fidelity metric. In this work, Q-fid is introduced, a Long Short-Term Memory (LSTM) based fidelity prediction system accompanied by a novel metric designed to quantify the fidelity of quantum circuits. Q-fid provides an intuitive way to predict the noise performance of Noisy Intermediate-Scale Quantum (NISQ) circuits. This approach frames fidelity prediction as a Time Series Forecasting problem to analyze the tokenized circuits, capturing the causal dependence of the gate sequences and their impact on overall fidelity. Additionally, the model is capable of dynamically adapting to changes in hardware characteristics, ensuring accurate fidelity predictions under varying conditions. Q-fid achieves a high prediction accuracy with an average RMSE of 0.0515, up to 24.7x more accurate than the Qiskit transpile tool mapomatic. By offering a reliable method for fidelity prediction, Q-fid empowers developers to optimize transpilation strategies, leading to more efficient and noise-resilient quantum circuit implementations. |
|
Simple Feedfoward Neural Networks are Almost All You Need for Time Series Forecasting | 2025-03-30 | ShowTime series data are everywhere -- from finance to healthcare -- and each domain brings its own unique complexities and structures. While advanced models like Transformers and graph neural networks (GNNs) have gained popularity in time series forecasting, largely due to their success in tasks like language modeling, their added complexity is not always necessary. In our work, we show that simple feedforward neural networks (SFNNs) can achieve performance on par with, or even exceeding, these state-of-the-art models, while being simpler, smaller, faster, and more robust. Our analysis indicates that, in many cases, univariate SFNNs are sufficient, implying that modeling interactions between multiple series may offer only marginal benefits. Even when inter-series relationships are strong, a basic multivariate SFNN still delivers competitive results. We also examine some key design choices and offer guidelines on making informed decisions. Additionally, we critique existing benchmarking practices and propose an improved evaluation protocol. Although SFNNs may not be optimal for every situation (hence the ``almost'' in our title) they serve as a strong baseline that future time series forecasting methods should always be compared against. |
|
A Kolmogorov-Zurbenko Fourier Transform Band-pass Filter Extension for Time Series Analysis | 2025-03-30 | ShowThis research introduces a novel extension, called the Extended Kolmogorov-Zurbenko Fourier Transform (EKZFT), to an existing class of band-pass filters first introduced by Kolmogorov and Zurbenko. Their original Kolmogorov-Zurbenko Fourier Transform (KZFT) is a useful tool in time series and spatio-temporal analysis, Fourier analysis, and related statistical analysis fields. Example uses of these filters include separating frequencies, filtering portions of the spectra, reconstructing seasonality, investigating periodic signals, and reducing noise. KZFT filters have many practical applications across a wide range of disciplines including health, social, natural, and physical sciences. KZFT filters are band-pass filters defined by three arguments: the length of the filter window; the number of iterations; and the central frequency of the band-pass filter. However, the KZFT filter is limited in design to only positive odd integer widow lengths inherited from the time series. Therefore, for any combination of the other KZFT filter arguments, there is only a relatively small, discrete, selection of possible filter window lengths in a range determined by the size of the dataset. This limits the utility of KZFT filters for many of the stated uses. The proposed EKZFT filter allows a continuous selection of filter window length arguments over the same range, offering improved control, increased functionality, and wider practical use of this band-pass filter. An example application of the EKZFT in a data simulation is provided. |
17 pages, 4 figures |
Title | Date | Abstract | Comment |
---|---|---|---|
Target Prediction Under Deceptive Switching Strategies via Outlier-Robust Filtering of Partially Observed Incomplete Trajectories | 2025-04-04 | ShowMotivated by a study on deception and counter-deception, this paper addresses the problem of identifying an agent's target as it seeks to reach one of two targets in a given environment. In practice, an agent may initially follow a strategy to aim at one target but decide to switch to another midway. Such a strategy can be deceptive when the counterpart only has access to imperfect observations, which include heavily corrupted sensor noise and possible outliers, making it difficult to visually identify the agent's true intent. To counter deception and identify the true target, we utilize prior knowledge of the agent's dynamics and the imprecisely observed partial trajectory of the agent's states to dynamically update the estimation of the posterior probability of whether a deceptive switch has taken place. However, existing methods in the literature have not achieved effective deception identification within a reasonable computation time. We propose a set of outlier-robust change detection methods to track relevant change-related statistics efficiently, enabling the detection of deceptive strategies in hidden nonlinear dynamics with reasonable computational effort. The performance of the proposed framework is examined for Weapon-Target Assignment (WTA) detection under deceptive strategies using random simulations in the kinematics model with external forcing. |
|
Event-Triggered Polynomial Control for Trajectory Tracking by Unicycle Robots | 2025-04-04 | ShowThis paper proposes an event-triggered polynomial control method for trajectory tracking by unicycle robots. In this method, each control input between two consecutive events is a polynomial and its coefficients are chosen to minimize the error in approximating a continuous-time control signal. We design an event-triggering rule that guarantees uniform ultimate boundedness of the tracking error and non-Zeno behavior of inter-event times. We illustrate our results through a suite of numerical simulations and experiments, which indicate that the number of events generated by the proposed controller is significantly less compared to that by a time-triggered controller or a event-triggered controller based on zero-order hold while guaranteeing similar tracking performance. |
|
Data-Driven Hamiltonian for Direct Construction of Safe Set from Trajectory Data | 2025-04-04 | ShowIn continuous-time optimal control, evaluating the Hamiltonian requires solving a constrained optimization problem using the system's dynamics model. Hamilton-Jacobi reachability analysis for safety verification has demonstrated practical utility only when efficient evaluation of the Hamiltonian over a large state-time grid is possible. In this study, we introduce the concept of a data-driven Hamiltonian (DDH), which circumvents the need for an explicit dynamics model by relying only on mild prior knowledge (e.g., Lipschitz constants), thus enabling the construction of reachable sets directly from trajectory data. Recognizing that the Hamiltonian is the optimal inner product between a given costate and realizable state velocities, the DDH estimates the Hamiltonian using the worst-case realization of the velocity field based on the observed state trajectory data. This formulation ensures a conservative approximation of the true Hamiltonian for uncertain dynamics. The reachable set computed based on the DDH is also ensured to be a conservative approximation of the true reachable set. Next, we propose a data-efficient safe experiment framework for gradual expansion of safe sets using the DDH. This is achieved by iteratively conducting experiments within the computed data-driven safe set and updating the set using newly collected trajectory data. To demonstrate the capabilities of our approach, we showcase its effectiveness in safe flight envelope expansion for a tiltrotor vehicle transitioning from near-hover to forward flight. |
This ...This is the extended version of the article submitted to IEEE CDC 2025. This work has been submitted to the IEEE for possible publication |
Learning Lie Group Generators from Trajectories | 2025-04-04 | ShowThis work investigates the inverse problem of generator recovery in matrix Lie groups from discretized trajectories. Let |
7 pages, 12 figures |
VL-TGS: Trajectory Generation and Selection using Vision Language Models in Mapless Outdoor Environments | 2025-04-04 | ShowWe present a multi-modal trajectory generation and selection algorithm for real-world mapless outdoor navigation in human-centered environments. Such environments contain rich features like crosswalks, grass, and curbs, which are easily interpretable by humans, but not by mobile robots. We aim to compute suitable trajectories that (1) satisfy the environment-specific traversability constraints and (2) generate human-like paths while navigating on crosswalks, sidewalks, etc. Our formulation uses a Conditional Variational Autoencoder (CVAE) generative model enhanced with traversability constraints to generate multiple candidate trajectories for global navigation. We develop a visual prompting approach and leverage the Visual Language Model's (VLM) zero-shot ability of semantic understanding and logical reasoning to choose the best trajectory given the contextual information about the task. We evaluate our method in various outdoor scenes with wheeled robots and compare the performance with other global navigation algorithms. In practice, we observe an average improvement of 20.81% in satisfying traversability constraints and 28.51% in terms of human-like navigation in four different outdoor navigation scenarios. |
|
Quattro: Transformer-Accelerated Iterative Linear Quadratic Regulator Framework for Fast Trajectory Optimization | 2025-04-03 | ShowReal-time optimal control remains a fundamental challenge in robotics, especially for nonlinear systems with stringent performance requirements. As one of the representative trajectory optimization algorithms, the iterative Linear Quadratic Regulator (iLQR) faces limitations due to their inherently sequential computational nature, which restricts the efficiency and applicability of real-time control for robotic systems. While existing parallel implementations aim to overcome the above limitations, they typically demand additional computational iterations and high-performance hardware, leading to only modest practical improvements. In this paper, we introduce Quattro, a transformer-accelerated iLQR framework employing an algorithm-hardware co-design strategy to predict intermediate feedback and feedforward matrices. It facilitates effective parallel computations on resource-constrained devices without sacrificing accuracy. Experiments on cart-pole and quadrotor systems show an algorithm-level acceleration of up to 5.3$\times$ and 27$\times$ per iteration, respectively. When integrated into a Model Predictive Control (MPC) framework, Quattro achieves overall speedups of 2.8$\times$ for the cart-pole and 17.8$\times$ for the quadrotor compared to the one that applies traditional iLQR. Transformer inference is deployed on FPGA to maximize performance, achieving further up to 20.8$\times$ speedup over prevalent embedded CPUs with over 11$\times$ power reduction than GPU and low hardware resource overhead. |
|
Deep Representation Learning for Unsupervised Clustering of Myocardial Fiber Trajectories in Cardiac Diffusion Tensor Imaging | 2025-04-02 | ShowUnderstanding the complex myocardial architecture is critical for diagnosing and treating heart disease. However, existing methods often struggle to accurately capture this intricate structure from Diffusion Tensor Imaging (DTI) data, particularly due to the lack of ground truth labels and the ambiguous, intertwined nature of fiber trajectories. We present a novel deep learning framework for unsupervised clustering of myocardial fibers, providing a data-driven approach to identifying distinct fiber bundles. We uniquely combine a Bidirectional Long Short-Term Memory network to capture local sequential information along fibers, with a Transformer autoencoder to learn global shape features, with pointwise incorporation of essential anatomical context. Clustering these representations using a density-based algorithm identifies 33 to 62 robust clusters, successfully capturing the subtle distinctions in fiber trajectories with varying levels of granularity. Our framework offers a new, flexible, and quantitative way to analyze myocardial structure, achieving a level of delineation that, to our knowledge, has not been previously achieved, with potential applications in improving surgical planning, characterizing disease-related remodeling, and ultimately, advancing personalized cardiac care. |
10 pa...10 pages, 5 figures. Submitted to MICCAI 2025 (under review) |
End-to-End Driving with Online Trajectory Evaluation via BEV World Model | 2025-04-02 | ShowEnd-to-end autonomous driving has achieved remarkable progress by integrating perception, prediction, and planning into a fully differentiable framework. Yet, to fully realize its potential, an effective online trajectory evaluation is indispensable to ensure safety. By forecasting the future outcomes of a given trajectory, trajectory evaluation becomes much more effective. This goal can be achieved by employing a world model to capture environmental dynamics and predict future states. Therefore, we propose an end-to-end driving framework WoTE, which leverages a BEV World model to predict future BEV states for Trajectory Evaluation. The proposed BEV world model is latency-efficient compared to image-level world models and can be seamlessly supervised using off-the-shelf BEV-space traffic simulators. We validate our framework on both the NAVSIM benchmark and the closed-loop Bench2Drive benchmark based on the CARLA simulator, achieving state-of-the-art performance. Code is released at https://github.com/liyingyanUCAS/WoTE. |
|
Virtual Target Trajectory Prediction for Stochastic Targets | 2025-04-02 | ShowTrajectory prediction of other vehicles is crucial for autonomous vehicles, with applications from missile guidance to UAV collision avoidance. Typically, target trajectories are assumed deterministic, but real-world aerial vehicles exhibit stochastic behavior, such as evasive maneuvers or gliders circling in thermals. This paper uses Conditional Normalizing Flows, an unsupervised Machine Learning technique, to learn and predict the stochastic behavior of targets of guided missiles using trajectory data. The trained model predicts the distribution of future target positions based on initial conditions and parameters of the dynamics. Samples from this distribution are clustered using a time series k-means algorithm to generate representative trajectories, termed virtual targets. The method is fast and target-agnostic, requiring only training data in the form of target trajectories. Thus, it serves as a drop-in replacement for deterministic trajectory predictions in guidance laws and path planning. Simulated scenarios demonstrate the approach's effectiveness for aerial vehicles with random maneuvers, bridging the gap between deterministic predictions and stochastic reality, advancing guidance and control algorithms for autonomous vehicles. |
will ...will be submitted to Journal of Guidance, Control, and Dynamics |
ACTIVE: Continuous Similarity Search for Vessel Trajectories | 2025-04-01 | ShowPublicly available vessel trajectory data is emitted continuously from the global AIS system. Continuous trajectory similarity search on this data has applications in, e.g., maritime navigation and safety. Existing proposals typically assume an offline setting and focus on finding similarities between complete trajectories. Such proposals are less effective when applied to online scenarios, where similarity comparisons must be performed continuously as new trajectory data arrives and trajectories evolve. We therefore propose a real-time continuous trajectory similarity search method for vessels (ACTIVE). We introduce a novel similarity measure, object-trajectory real-time distance, that emphasizes the anticipated future movement trends of vessels, enabling more predictive and forward-looking comparisons. Next, we propose a segment-based vessel trajectory index structure that organizes historical trajectories into smaller and manageable segments, facilitating accelerated similarity computations. Leveraging this index, we propose an efficient continuous similar trajectory search (CSTS) algorithm together with a variety of search space pruning strategies that reduce unnecessary computations during the continuous similarity search, thereby further improving efficiency. Extensive experiments on two large real-world AIS datasets offer evidence that ACTIVE is capable of outperforming state-of-the-art methods considerably. ACTIVE significantly reduces index construction costs and index size while achieving a 70% reduction in terms of query time and a 60% increase in terms of hit rate. |
|
Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning | 2025-04-01 | ShowLearning from multiple domains is a primary factor that influences the generalization of a single unified robot system. In this paper, we aim to learn the trajectory prediction model by using broad out-of-domain data to improve its performance and generalization ability. Trajectory model is designed to predict any-point trajectories in the current frame given an instruction and can provide detailed control guidance for robotic policy learning. To handle the diverse out-of-domain data distribution, we propose a sparsely-gated MoE (\textbf{Top-1} gating strategy) architecture for trajectory model, coined as \textbf{Tra-MoE}. The sparse activation design enables good balance between parameter cooperation and specialization, effectively benefiting from large-scale out-of-domain data while maintaining constant FLOPs per token. In addition, we further introduce an adaptive policy conditioning technique by learning 2D mask representations for predicted trajectories, which is explicitly aligned with image observations to guide action prediction more flexibly. We perform extensive experiments on both simulation and real-world scenarios to verify the effectiveness of Tra-MoE and adaptive policy conditioning technique. We also conduct a comprehensive empirical study to train Tra-MoE, demonstrating that our Tra-MoE consistently exhibits superior performance compared to the dense baseline model, even when the latter is scaled to match Tra-MoE's parameter count. |
Accep...Accepted to CVPR 2025. Code Page: https://github.com/MCG-NJU/Tra-MoE |
Timely Trajectory Reconstruction in Finite Buffer Remote Tracking Systems | 2025-04-01 | ShowRemote tracking systems play a critical role in applications such as IoT, monitoring, surveillance and healthcare. In such systems, maintaining both real-time state awareness (for online decision making) and accurate reconstruction of historical trajectories (for offline post-processing) are essential. While the Age of Information (AoI) metric has been extensively studied as a measure of freshness, it does not capture the accuracy with which past trajectories can be reconstructed. In this work, we investigate reconstruction error as a complementary metric to AoI, addressing the trade-off between timely updates and historical accuracy. Specifically, we consider three policies, each prioritizing different aspects of information management: Keep-Old, Keep-Fresh, and our proposed Inter-arrival-Aware dropping policy. We compare these policies in terms of impact on both AoI and reconstruction error in a remote tracking system with a finite buffer. Through theoretical analysis and numerical simulations of queueing behavior, we demonstrate that while the Keep-Fresh policy minimizes AoI, it does not necessarily minimize reconstruction error. In contrast, our proposed Inter-arrival-Aware dropping policy dynamically adjusts packet retention decisions based on generation times, achieving a balance between AoI and reconstruction error. Our results provide key insights into the design of efficient update policies for resource-constrained IoT networks. |
|
Fine-Grained Behavior and Lane Constraints Guided Trajectory Prediction Method | 2025-04-01 | ShowTrajectory prediction, as a critical component of autonomous driving systems, has attracted the attention of many researchers. Existing prediction algorithms focus on extracting more detailed scene features or selecting more reasonable trajectory destinations. However, in the face of dynamic and evolving future movements of the target vehicle, these algorithms cannot provide a fine-grained and continuous description of future behaviors and lane constraints, which degrades the prediction accuracy. To address this challenge, we present BLNet, a novel dualstream architecture that synergistically integrates behavioral intention recognition and lane constraint modeling through parallel attention mechanisms. The framework generates fine-grained behavior state queries (capturing spatial-temporal movement patterns) and lane queries (encoding lane topology constraints), supervised by two auxiliary losses, respectively. Subsequently, a two-stage decoder first produces trajectory proposals, then performs point-level refinement by jointly incorporating both the continuity of passed lanes and future motion features. Extensive experiments on two large datasets, nuScenes and Argoverse, show that our network exhibits significant performance gains over existing direct regression and goal-based algorithms. |
This ...This work has been submitted to the IEEE for possible publication |
Joint Beamforming and Trajectory Optimization for Multi-UAV-Assisted Integrated Sensing and Communication Systems | 2025-04-01 | ShowIn this paper, we investigate beamforming design and trajectory optimization for a multi-unmanned aerial vehicle (UAV)-assisted integrated sensing and communication (ISAC) system. The proposed system employs multiple UAVs equipped with dual-functional radar-communication capabilities to simultaneously perform target sensing and provide communication services to users. We formulate a joint optimization problem that aims to maximize the sum rate of users while maintaining target sensing performance through coordinated beamforming and UAV trajectory design. To address this challenging non-convex problem, we develop a block coordinated descent (BCD)-based iterative algorithm that decomposes the original problem into tractable subproblems. Then, the beamforming design problem is addressed using fractional programming, while the UAV trajectory is refined through the deep deterministic policy gradient (DDPG) algorithm. The simulation results demonstrate that the proposed joint optimization approach achieves significant performance improvements in both communication throughput and sensing accuracy compared to conventional, separated designs. We also show that proper coordination of multiple UAVs through optimized trajectories and beamforming patterns can effectively balance the tradeoff between sensing and communication objectives. |
5 pages, 1 figure |
Design and Validation of an Intention-Aware Probabilistic Framework for Trajectory Prediction: Integrating COLREGS, Grounding Hazards, and Planned Routes | 2025-04-01 | ShowCollision avoidance capability is an essential component in an autonomous vessel navigation system. To this end, an accurate prediction of dynamic obstacle trajectories is vital. Traditional approaches to trajectory prediction face limitations in generalizability and often fail to account for the intentions of other vessels. While recent research has considered incorporating the intentions of dynamic obstacles, these efforts are typically based on the own-ship's interpretation of the situation. The current state-of-the-art in this area is a Dynamic Bayesian Network (DBN) model, which infers target vessel intentions by considering multiple underlying causes and allowing for different interpretations of the situation by different vessels. However, since its inception, there have not been any significant structural improvements to this model. In this paper, we propose enhancing the DBN model by incorporating considerations for grounding hazards and vessel waypoint information. The proposed model is validated using real vessel encounters extracted from historical Automatic Identification System (AIS) data. |
|
Geometry Based UAV Trajectory Planning for Mixed User Traffic in mmWave Communication | 2025-04-01 | ShowUnmanned aerial vehicle (UAV) assisted communication is a revolutionary technology that has been recently presented as a potential candidate for beyond fifth-generation millimeter wave (mmWave) communications. Although mmWaves can offer a notably high data rate, their high penetration and propagation losses mean that line of sight (LoS) is necessary for effective communication. Due to the presence of obstacles and user mobility, UAV trajectory planning plays a crucial role in improving system performance. In this work, we propose a novel computational geometry-based trajectory planning scheme by considering the user mobility, the priority of the delay sensitive ultra-reliable low-latency communications (URLLC) and the high throughput requirements of the enhanced mobile broadband (eMBB) traffic. Specifically, we use geometric tools like Apollonius circle and minimum enclosing ball of balls to find the optimal position of the UAV that supports uninterrupted connections to the URLLC users and maximizes the aggregate throughput of the eMBB users. Finally, the numerical results demonstrate the benefits of the suggested approach over an existing state of the art benchmark scheme in terms of sum throughput obtained by URLLC and eMBB users. |
This ...This work has been submitted to a reputed conference |
Harmonic model predictive control for tracking sinusoidal references and its application to trajectory tracking | 2025-04-01 | ShowHarmonic model predictive control (HMPC) is a recent model predictive control (MPC) formulation for tracking piece-wise constant references that includes a parameterized artificial harmonic reference as a decision variable, resulting in an increased performance and domain of attraction with respect to other MPC formulations. This article presents an extension of the HMPC formulation to track periodic harmonic/sinusoidal references and discusses its use for tracking arbitrary trajectories. The proposed formulation inherits the benefits of its predecessor, namely its good performance and large domain of attraction when using small prediction horizons, and that the complexity of its optimization problem does not depend on the period of the reference. We show closed-loop results discussing its performance and comparing it to other MPC formulations. |
Accep...Accepted version of the article published in IEEE Transactions on Automatic Control (8 pages, 5 figures) |
Aligning Diffusion Model with Problem Constraints for Trajectory Optimization | 2025-04-01 | ShowDiffusion models have recently emerged as effective generative frameworks for trajectory optimization, capable of producing high-quality and diverse solutions. However, training these models in a purely data-driven manner without explicit incorporation of constraint information often leads to violations of critical constraints, such as goal-reaching, collision avoidance, and adherence to system dynamics. To address this limitation, we propose a novel approach that aligns diffusion models explicitly with problem-specific constraints, drawing insights from the Dynamic Data-driven Application Systems (DDDAS) framework. Our approach introduces a hybrid loss function that explicitly measures and penalizes constraint violations during training. Furthermore, by statistically analyzing how constraint violations evolve throughout the diffusion steps, we develop a re-weighting strategy that aligns predicted violations to ground truth statistics at each diffusion step. Evaluated on a tabletop manipulation and a two-car reach-avoid problem, our constraint-aligned diffusion model significantly reduces constraint violations compared to traditional diffusion models, while maintaining the quality of trajectory solutions. This approach is well-suited for integration into the DDDAS framework for efficient online trajectory adaptation as new environmental data becomes available. |
|
Learning Velocity and Acceleration: Self-Supervised Motion Consistency for Pedestrian Trajectory Prediction | 2025-03-31 | ShowUnderstanding human motion is crucial for accurate pedestrian trajectory prediction. Conventional methods typically rely on supervised learning, where ground-truth labels are directly optimized against predicted trajectories. This amplifies the limitations caused by long-tailed data distributions, making it difficult for the model to capture abnormal behaviors. In this work, we propose a self-supervised pedestrian trajectory prediction framework that explicitly models position, velocity, and acceleration. We leverage velocity and acceleration information to enhance position prediction through feature injection and a self-supervised motion consistency mechanism. Our model hierarchically injects velocity features into the position stream. Acceleration features are injected into the velocity stream. This enables the model to predict position, velocity, and acceleration jointly. From the predicted position, we compute corresponding pseudo velocity and acceleration, allowing the model to learn from data-generated pseudo labels and thus achieve self-supervised learning. We further design a motion consistency evaluation strategy grounded in physical principles; it selects the most reasonable predicted motion trend by comparing it with historical dynamics and uses this trend to guide and constrain trajectory generation. We conduct experiments on the ETH-UCY and Stanford Drone datasets, demonstrating that our method achieves state-of-the-art performance on both datasets. |
|
Dynamic High-Order Control Barrier Functions with Diffuser for Safety-Critical Trajectory Planning at Signal-Free Intersections | 2025-03-31 | ShowPlanning safe and efficient trajectories through signal-free intersections presents significant challenges for autonomous vehicles (AVs), particularly in dynamic, multi-task environments with unpredictable interactions and an increased possibility of conflicts. This study aims to address these challenges by developing a unified, robust, adaptive framework to ensure safety and efficiency across three distinct intersection movements: left-turn, right-turn, and straight-ahead. Existing methods often struggle to reliably ensure safety and effectively learn multi-task behaviors from demonstrations in such environments. This study proposes a safety-critical planning method that integrates Dynamic High-Order Control Barrier Functions (DHOCBF) with a diffusion-based model, called Dynamic Safety-Critical Diffuser (DSC-Diffuser). The DSC-Diffuser leverages task-guided planning to enhance efficiency, allowing the simultaneous learning of multiple driving tasks from real-world expert demonstrations. Moreover, the incorporation of goal-oriented constraints significantly reduces displacement errors, ensuring precise trajectory execution. To further ensure driving safety in dynamic environments, the proposed DHOCBF framework dynamically adjusts to account for the movements of surrounding vehicles, offering enhanced adaptability and reduce the conservatism compared to traditional control barrier functions. Validity evaluations of DHOCBF, conducted through numerical simulations, demonstrate its robustness in adapting to variations in obstacle velocities, sizes, uncertainties, and locations, effectively maintaining driving safety across a wide range of complex and uncertain scenarios. Comprehensive performance evaluations demonstrate that DSC-Diffuser generates realistic, stable, and generalizable policies, providing flexibility and reliable safety assurance in complex multi-task driving scenarios. |
11 fi...11 figures, 5 tables, 15 pages |
Trajectory Planning for Automated Driving using Target Funnels | 2025-03-31 | ShowSelf-driving vehicles rely on sensory input to monitor their surroundings and continuously adapt to the most likely future road course. Predictive trajectory planning is based on snapshots of the (uncertain) road course as a key input. Under noisy perception data, estimates of the road course can vary significantly, leading to indecisive and erratic steering behavior. To overcome this issue, this paper introduces a predictive trajectory planning algorithm with a novel objective function: instead of targeting a single reference trajectory based on the most likely road course, tracking a series of target reference sets, called a target funnel, is considered. The proposed planning algorithm integrates probabilistic information about the road course, and thus implicitly considers regular updates to road perception. Our solution is assessed in a case study using real driving data collected from a prototype vehicle. The results demonstrate that the algorithm maintains tracking accuracy and substantially reduces undesirable steering commands in the presence of noisy road perception, achieving a 56% reduction in input costs compared to a certainty equivalent formulation. |
accep...accepted to European Control Conference 2025 (ECC25) |
Language-Guided Trajectory Traversal in Disentangled Stable Diffusion Latent Space for Factorized Medical Image Generation | 2025-03-30 | ShowText-to-image diffusion models have demonstrated a remarkable ability to generate photorealistic images from natural language prompts. These high-resolution, language-guided synthesized images are essential for the explainability of disease or exploring causal relationships. However, their potential for disentangling and controlling latent factors of variation in specialized domains like medical imaging remains under-explored. In this work, we present the first investigation of the power of pre-trained vision-language foundation models, once fine-tuned on medical image datasets, to perform latent disentanglement for factorized medical image generation and interpolation. Through extensive experiments on chest X-ray and skin datasets, we illustrate that fine-tuned, language-guided Stable Diffusion inherently learns to factorize key attributes for image generation, such as the patient's anatomical structures or disease diagnostic features. We devise a framework to identify, isolate, and manipulate key attributes through latent space trajectory traversal of generative models, facilitating precise control over medical image synthesis. |
10 pages |
OnSiteVRU: A High-Resolution Trajectory Dataset for High-Density Vulnerable Road Users | 2025-03-30 | ShowWith the acceleration of urbanization and the growth of transportation demands, the safety of vulnerable road users (VRUs, such as pedestrians and cyclists) in mixed traffic flows has become increasingly prominent, necessitating high-precision and diverse trajectory data to support the development and optimization of autonomous driving systems. However, existing datasets fall short in capturing the diversity and dynamics of VRU behaviors, making it difficult to meet the research demands of complex traffic environments. To address this gap, this study developed the OnSiteVRU datasets, which cover a variety of scenarios, including intersections, road segments, and urban villages. These datasets provide trajectory data for motor vehicles, electric bicycles, and human-powered bicycles, totaling approximately 17,429 trajectories with a precision of 0.04 seconds. The datasets integrate both aerial-view natural driving data and onboard real-time dynamic detection data, along with environmental information such as traffic signals, obstacles, and real-time maps, enabling a comprehensive reconstruction of interaction events. The results demonstrate that VRU_Data outperforms traditional datasets in terms of VRU density and scene coverage, offering a more comprehensive representation of VRU behavioral characteristics. This provides critical support for traffic flow modeling, trajectory prediction, and autonomous driving virtual testing. The dataset is publicly available for download at: https://www.kaggle.com/datasets/zcyan2/mixed-traffic-trajectory-dataset-in-from-shanghai. |
|
Efficient Twin Migration in Vehicular Metaverses: Multi-Agent Split Deep Reinforcement Learning with Spatio-Temporal Trajectory Generation | 2025-03-30 | ShowVehicle Twins (VTs) as digital representations of vehicles can provide users with immersive experiences in vehicular metaverse applications, e.g., Augmented Reality (AR) navigation and embodied intelligence. VT migration is an effective way that migrates the VT when the locations of physical entities keep changing to maintain seamless immersive VT services. However, an efficient VT migration is challenging due to the rapid movement of vehicles, dynamic workloads of Roadside Units (RSUs), and heterogeneous resources of the RSUs. To achieve efficient migration decisions and a minimum latency for the VT migration, we propose a multi-agent split Deep Reinforcement Learning (DRL) framework combined with spatio-temporal trajectory generation. In this framework, multiple split DRL agents utilize split architecture to efficiently determine VT migration decisions. Furthermore, we propose a spatio-temporal trajectory generation algorithm based on trajectory datasets and road network data to simulate vehicle trajectories, enhancing the generalization of the proposed scheme for managing VT migration in dynamic network environments. Finally, experimental results demonstrate that the proposed scheme not only enhances the Quality of Experience (QoE) by 29% but also reduces the computational parameter count by approximately 25% while maintaining similar performances, enhancing users' immersive experiences in vehicular metaverses. |
|
TRACE: Intra-visit Clinical Event Nowcasting via Effective Patient Trajectory Encoding | 2025-03-29 | ShowElectronic Health Records (EHR) have become a valuable resource for a wide range of predictive tasks in healthcare. However, existing approaches have largely focused on inter-visit event predictions, overlooking the importance of intra-visit nowcasting, which provides prompt clinical insights during an ongoing patient visit. To address this gap, we introduce the task of laboratory measurement prediction within a hospital visit. We study the laboratory data that, however, remained underexplored in previous work. We propose TRACE, a Transformer-based model designed for clinical event nowcasting by encoding patient trajectories. TRACE effectively handles long sequences and captures temporal dependencies through a novel timestamp embedding that integrates decay properties and periodic patterns of data. Additionally, we introduce a smoothed mask for denoising, improving the robustness of the model. Experiments on two large-scale electronic health record datasets demonstrate that the proposed model significantly outperforms previous methods, highlighting its potential for improving patient care through more accurate laboratory measurement nowcasting. The code is available at https://github.com/Amehi/TRACE. |
Accep...Accepted by WWW'25 short paper track |
Unified Uncertainty-Aware Diffusion for Multi-Agent Trajectory Modeling | 2025-03-29 | ShowMulti-agent trajectory modeling has primarily focused on forecasting future states, often overlooking broader tasks like trajectory completion, which are crucial for real-world applications such as correcting tracking data. Existing methods also generally predict agents' states without offering any state-wise measure of uncertainty. Moreover, popular multi-modal sampling methods lack any error probability estimates for each generated scene under the same prior observations, making it difficult to rank the predictions during inference time. We introduce U2Diff, a \textbf{unified} diffusion model designed to handle trajectory completion while providing state-wise \textbf{uncertainty} estimates jointly. This uncertainty estimation is achieved by augmenting the simple denoising loss with the negative log-likelihood of the predicted noise and propagating latent space uncertainty to the real state space. Additionally, we incorporate a Rank Neural Network in post-processing to enable \textbf{error probability} estimation for each generated mode, demonstrating a strong correlation with the error relative to ground truth. Our method outperforms the state-of-the-art solutions in trajectory completion and forecasting across four challenging sports datasets (NBA, Basketball-U, Football-U, Soccer-U), highlighting the effectiveness of uncertainty and error probability estimation. Video at https://youtu.be/ngw4D4eJToE |
Accep...Accepted to CVPR 2025 conference |
Simplification of Trajectory Streams | 2025-03-29 | ShowWhile there are software systems that simplify trajectory streams on the fly, few curve simplification algorithms with quality guarantees fit the streaming requirements. We present streaming algorithms for two such problems under the Fr'{e}chet distance |
\sigma |
SIGHT: Single-Image Conditioned Generation of Hand Trajectories for Hand-Object Interaction | 2025-03-28 | ShowWe introduce a novel task of generating realistic and diverse 3D hand trajectories given a single image of an object, which could be involved in a hand-object interaction scene or pictured by itself. When humans grasp an object, appropriate trajectories naturally form in our minds to use it for specific tasks. Hand-object interaction trajectory priors can greatly benefit applications in robotics, embodied AI, augmented reality and related fields. However, synthesizing realistic and appropriate hand trajectories given a single object or hand-object interaction image is a highly ambiguous task, requiring to correctly identify the object of interest and possibly even the correct interaction among many possible alternatives. To tackle this challenging problem, we propose the SIGHT-Fusion system, consisting of a curated pipeline for extracting visual features of hand-object interaction details from egocentric videos involving object manipulation, and a diffusion-based conditional motion generation model processing the extracted features. We train our method given video data with corresponding hand trajectory annotations, without supervision in the form of action labels. For the evaluation, we establish benchmarks utilizing the first-person FPHAB and HOI4D datasets, testing our method against various baselines and using multiple metrics. We also introduce task simulators for executing the generated hand trajectories and reporting task success rates as an additional metric. Experiments show that our method generates more appropriate and realistic hand trajectories than baselines and presents promising generalization capability on unseen objects. The accuracy of the generated hand trajectories is confirmed in a physics simulation setting, showcasing the authenticity of the created sequences and their applicability in downstream uses. |
|
Next-Best-Trajectory Planning of Robot Manipulators for Effective Observation and Exploration | 2025-03-28 | ShowVisual observation of objects is essential for many robotic applications, such as object reconstruction and manipulation, navigation, and scene understanding. Machine learning algorithms constitute the state-of-the-art in many fields but require vast data sets, which are costly and time-intensive to collect. Automated strategies for observation and exploration are crucial to enhance the efficiency of data gathering. Therefore, a novel strategy utilizing the Next-Best-Trajectory principle is developed for a robot manipulator operating in dynamic environments. Local trajectories are generated to maximize the information gained from observations along the path while avoiding collisions. We employ a voxel map for environment modeling and utilize raycasting from perspectives around a point of interest to estimate the information gain. A global ergodic trajectory planner provides an optional reference trajectory to the local planner, improving exploration and helping to avoid local minima. To enhance computational efficiency, raycasting for estimating the information gain in the environment is executed in parallel on the graphics processing unit. Benchmark results confirm the efficiency of the parallelization, while real-world experiments demonstrate the strategy's effectiveness. |
Accep...Accepted for publication at the IEEE International Conference on Robotics and Automation (ICRA), 2025 |
Robust Offline Imitation Learning Through State-level Trajectory Stitching | 2025-03-28 | ShowImitation learning (IL) has proven effective for enabling robots to acquire visuomotor skills through expert demonstrations. However, traditional IL methods are limited by their reliance on high-quality, often scarce, expert data, and suffer from covariate shift. To address these challenges, recent advances in offline IL have incorporated suboptimal, unlabeled datasets into the training. In this paper, we propose a novel approach to enhance policy learning from mixed-quality offline datasets by leveraging task-relevant trajectory fragments and rich environmental dynamics. Specifically, we introduce a state-based search framework that stitches state-action pairs from imperfect demonstrations, generating more diverse and informative training trajectories. Experimental results on standard IL benchmarks and real-world robotic tasks showcase that our proposed method significantly improves both generalization and performance. |
|
Reinforcement learning for efficient and robust multi-setpoint and multi-trajectory tracking in bioprocesses | 2025-03-28 | ShowEfficient and robust bioprocess control is essential for maximizing performance and adaptability in advanced biotechnological systems. In this work, we present a reinforcement-learning framework for multi-setpoint and multi-trajectory tracking. Tracking multiple setpoints and time-varying trajectories in reinforcement learning is challenging due to the complexity of balancing multiple objectives, a difficulty further exacerbated by system uncertainties such as uncertain initial conditions and stochastic dynamics. This challenge is relevant, e.g., in bioprocesses involving microbial consortia, where precise control over population compositions is required. We introduce a novel return function based on multiplicative reciprocal saturation functions, which explicitly couples reward gains to the simultaneous satisfaction of multiple references. Through a case study involving light-mediated cybergenetic growth control in microbial consortia, we demonstrate via computational experiments that our approach achieves faster convergence, improved stability, and superior control compliance compared to conventional quadratic-cost-based return functions. Moreover, our method enables tuning of the saturation function's parameters, shaping the learning process and policy updates. By incorporating system uncertainties, our framework also demonstrates robustness, a key requirement in industrial bioprocessing. Overall, this work advances reinforcement-learning-based control strategies in bioprocess engineering, with implications in the broader field of process and systems engineering. |
|
Follow Your Motion: A Generic Temporal Consistency Portrait Editing Framework with Trajectory Guidance | 2025-03-28 | ShowPre-trained conditional diffusion models have demonstrated remarkable potential in image editing. However, they often face challenges with temporal consistency, particularly in the talking head domain, where continuous changes in facial expressions intensify the level of difficulty. These issues stem from the independent editing of individual images and the inherent loss of temporal continuity during the editing process. In this paper, we introduce Follow Your Motion (FYM), a generic framework for maintaining temporal consistency in portrait editing. Specifically, given portrait images rendered by a pre-trained 3D Gaussian Splatting model, we first develop a diffusion model that intuitively and inherently learns motion trajectory changes at different scales and pixel coordinates, from the first frame to each subsequent frame. This approach ensures that temporally inconsistent edited avatars inherit the motion information from the rendered avatars. Secondly, to maintain fine-grained expression temporal consistency in talking head editing, we propose a dynamic re-weighted attention mechanism. This mechanism assigns higher weight coefficients to landmark points in space and dynamically updates these weights based on landmark loss, achieving more consistent and refined facial expressions. Extensive experiments demonstrate that our method outperforms existing approaches in terms of temporal consistency and can be used to optimize and compensate for temporally inconsistent outputs in a range of applications, such as text-driven editing, relighting, and various other applications. |
|
Multi-modal Knowledge Distillation-based Human Trajectory Forecasting | 2025-03-28 | ShowPedestrian trajectory forecasting is crucial in various applications such as autonomous driving and mobile robot navigation. In such applications, camera-based perception enables the extraction of additional modalities (human pose, text) to enhance prediction accuracy. Indeed, we find that textual descriptions play a crucial role in integrating additional modalities into a unified understanding. However, online extraction of text requires the use of VLM, which may not be feasible for resource-constrained systems. To address this challenge, we propose a multi-modal knowledge distillation framework: a student model with limited modality is distilled from a teacher model trained with full range of modalities. The comprehensive knowledge of a teacher model trained with trajectory, human pose, and text is distilled into a student model using only trajectory or human pose as a sole supplement. In doing so, we separately distill the core locomotion insights from intra-agent multi-modality and inter-agent interaction. Our generalizable framework is validated with two state-of-the-art models across three datasets on both ego-view (JRDB, SIT) and BEV-view (ETH/UCY) setups, utilizing both annotated and VLM-generated text captions. Distilled student models show consistent improvement in all prediction metrics for both full and instantaneous observations, improving up to ~13%. The code is available at https://github.com/Jaewoo97/KDTF. |
Accep...Accepted to CVPR 2025 |
LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis | 2025-03-28 | ShowThe intuitive nature of drag-based interaction has led to its growing adoption for controlling object trajectories in image-to-video synthesis. Still, existing methods that perform dragging in the 2D space usually face ambiguity when handling out-of-plane movements. In this work, we augment the interaction with a new dimension, i.e., the depth dimension, such that users are allowed to assign a relative depth for each point on the trajectory. That way, our new interaction paradigm not only inherits the convenience from 2D dragging, but facilitates trajectory control in the 3D space, broadening the scope of creativity. We propose a pioneering method for 3D trajectory control in image-to-video synthesis by abstracting object masks into a few cluster points. These points, accompanied by the depth information and the instance information, are finally fed into a video diffusion model as the control signal. Extensive experiments validate the effectiveness of our approach, dubbed LeviTor, in precisely manipulating the object movements when producing photo-realistic videos from static images. Our code is available at: https://github.com/ant-research/LeviTor. |
Proje...Project page available at https://github.com/ant-research/LeviTor |
Motion Prompting: Controlling Video Generation with Motion Trajectories | 2025-03-27 | ShowMotion control is crucial for generating expressive and compelling video content; however, most existing video generation models rely mainly on text prompts for control, which struggle to capture the nuances of dynamic actions and temporal compositions. To this end, we train a video generation model conditioned on spatio-temporally sparse or dense motion trajectories. In contrast to prior motion conditioning work, this flexible representation can encode any number of trajectories, object-specific or global scene motion, and temporally sparse motion; due to its flexibility we refer to this conditioning as motion prompts. While users may directly specify sparse trajectories, we also show how to translate high-level user requests into detailed, semi-dense motion prompts, a process we term motion prompt expansion. We demonstrate the versatility of our approach through various applications, including camera and object motion control, "interacting" with an image, motion transfer, and image editing. Our results showcase emergent behaviors, such as realistic physics, suggesting the potential of motion prompts for probing video models and interacting with future generative world models. Finally, we evaluate quantitatively, conduct a human study, and demonstrate strong performance. Video results are available on our webpage: https://motion-prompting.github.io/ |
CVPR ...CVPR 2025 camera ready. Project page: https://motion-prompting.github.io/ |
Model-Predictive Trajectory Generation for Aerial Search and Coverage | 2025-03-27 | ShowThis paper introduces a trajectory planning algorithm for search and coverage missions with an Unmanned Aerial Vehicle (UAV) based on an uncertainty map that represents prior knowledge of the target region, modeled by a Gaussian Mixture Model (GMM). The trajectory planning problem is formulated as an Optimal Control Problem (OCP), which aims to maximize the uncertainty reduction within a specified mission duration. However, this results in an intractable OCP whose objective functional cannot be expressed in closed form. To address this, we propose a Model Predictive Control (MPC) algorithm based on a relaxed formulation of the objective function to approximate the optimal solutions. This relaxation promotes efficient map exploration by penalizing overlaps in the UAV's visibility regions along the trajectory. The algorithm can produce efficient and smooth trajectories, and it can be efficiently implemented using standard Nonlinear Programming solvers, being suitable for real-time planning. Unlike traditional methods, which often rely on discretizing the mission space and using complex mixed-integer formulations, our approach is computationally efficient and easier to implement. The MPC algorithm is initially assessed in MATLAB, followed by Gazebo simulations and actual experimental tests conducted in an outdoor environment. The results demonstrate that the proposed strategy can generate efficient and smooth trajectories for search and coverage missions. |
|
Towards Optimizing a Convex Cover of Collision-Free Space for Trajectory Generation | 2025-03-27 | ShowWe propose an online iterative algorithm to optimize a convex cover to under-approximate the free space for autonomous navigation to delineate Safe Flight Corridors (SFC). The convex cover consists of a set of polytopes such that the union of the polytopes represents obstacle-free space, allowing us to find trajectories for robots that lie within the convex cover. In order to find the SFC that facilitates trajectory optimization, we iteratively find overlapping polytopes of maximum volumes that include specified waypoints initialized by a geometric or kinematic planner. Constraints at waypoints appear in two alternating stages of a joint optimization problem, which is solved by a novel heuristic-based iterative algorithm with partially distributed variables. We validate the effectiveness of our proposed algorithm using a range of parameterized environments and show its applications for two-stage motion planning. |
|
Consistency Trajectory Matching for One-Step Generative Super-Resolution | 2025-03-27 | ShowCurrent diffusion-based super-resolution (SR) approaches achieve commendable performance at the cost of high inference overhead. Therefore, distillation techniques are utilized to accelerate the multi-step teacher model into one-step student model. Nevertheless, these methods significantly raise training costs and constrain the performance of the student model by the teacher model. To overcome these tough challenges, we propose Consistency Trajectory Matching for Super-Resolution (CTMSR), a distillation-free strategy that is able to generate photo-realistic SR results in one step. Concretely, we first formulate a Probability Flow Ordinary Differential Equation (PF-ODE) trajectory to establish a deterministic mapping from low-resolution (LR) images with noise to high-resolution (HR) images. Then we apply the Consistency Training (CT) strategy to directly learn the mapping in one step, eliminating the necessity of pre-trained diffusion model. To further enhance the performance and better leverage the ground-truth during the training process, we aim to align the distribution of SR results more closely with that of the natural images. To this end, we propose to minimize the discrepancy between their respective PF-ODE trajectories from the LR image distribution by our meticulously designed Distribution Trajectory Matching (DTM) loss, resulting in improved realism of our recovered HR images. Comprehensive experimental results demonstrate that the proposed methods can attain comparable or even superior capabilities on both synthetic and real datasets while maintaining minimal inference latency. |
|
Latency Minimization for UAV-Enabled Federated Learning: Trajectory Design and Resource Allocation | 2025-03-27 | ShowFederated learning (FL) has become a transformative paradigm for distributed machine learning across wireless networks. However, the performance of FL is often hindered by the unreliable communication links between resource-constrained Internet of Things (IoT) devices and the central server. To overcome this challenge, we propose a novel framework that employs an unmanned aerial vehicle (UAV) as a mobile server to enhance the FL training process. By capitalizing on the UAV's mobility, we establish strong line-of-sight connections with IoT devices, thereby enhancing communication reliability and capacity. To maximize training efficiency, we formulate a latency minimization problem that jointly optimizes bandwidth allocation, computing frequencies, transmit power for both the UAV and IoT devices, and the UAV's flight trajectory. Subsequently, we analyze the required rounds of the IoT devices training and the UAV aggregation for FL convergence. Based on the convergence constraint, we transform the problem into three subproblems and develop an efficient alternating optimization algorithm to solve this problem effectively. Additionally, we provide a thorough analysis of the algorithm's convergence and computational complexity. Extensive numerical results demonstrate that our proposed scheme not only surpasses existing benchmark schemes in reducing latency up to 15.29%, but also achieves training efficiency that nearly matches the ideal scenario. |
This ...This manuscript has been submitted to IEEE |
A Multi-Modal Knowledge-Enhanced Framework for Vessel Trajectory Prediction | 2025-03-27 | ShowAccurate vessel trajectory prediction facilitates improved navigational safety, routing, and environmental protection. However, existing prediction methods are challenged by the irregular sampling time intervals of the vessel tracking data from the global AIS system and the complexity of vessel movement. These aspects render model learning and generalization difficult. To address these challenges and improve vessel trajectory prediction, we propose the multi-modal knowledge-enhanced framework (MAKER) for vessel trajectory prediction. To contend better with the irregular sampling time intervals, MAKER features a Large language model-guided Knowledge Transfer (LKT) module that leverages pre-trained language models to transfer trajectory-specific contextual knowledge effectively. To enhance the ability to learn complex trajectory patterns, MAKER incorporates a Knowledge-based Self-paced Learning (KSL) module. This module employs kinematic knowledge to progressively integrate complex patterns during training, allowing for adaptive learning and enhanced generalization. Experimental results on two vessel trajectory datasets show that MAKER can improve the prediction accuracy of state-of-the-art methods by 12.08%-17.86%. |
8 pages, 5 figures |
Vision-based Multi-future Trajectory Prediction: A Survey | 2025-03-26 | ShowVision-based trajectory prediction is an important task that supports safe and intelligent behaviours in autonomous systems. Many advanced approaches have been proposed over the years with improved spatial and temporal feature extraction. However, human behaviour is naturally diverse and uncertain. Given the past trajectory and surrounding environment information, an agent can have multiple plausible trajectories in the future. To tackle this problem, an essential task named multi-future trajectory prediction (MTP) has recently been studied. This task aims to generate a diverse, acceptable and explainable distribution of future predictions for each agent. In this paper, we present the first survey for MTP with our unique taxonomies and a comprehensive analysis of frameworks, datasets and evaluation metrics. We also compare models on existing MTP datasets and conduct experiments on the ForkingPath dataset. Finally, we discuss multiple future directions that can help researchers develop novel multi-future trajectory prediction systems and other diverse learning tasks similar to MTP. |
Accep...Accepted by TNNLS 2025 |
A Universal Model Combining Differential Equations and Neural Networks for Ball Trajectory Prediction | 2025-03-25 | ShowThis paper presents a data driven universal ball trajectory prediction method integrated with physics equations. Existing methods are designed for specific ball types and struggle to generalize. This challenge arises from three key factors. First, learning-based models require large datasets but suffer from accuracy drops in unseen scenarios. Second, physics-based models rely on complex formulas and detailed inputs, yet accurately obtaining ball states, such as spin, is often impractical. Third, integrating physical principles with neural networks to achieve high accuracy, fast inference, and strong generalization remains difficult. To address these issues, we propose an innovative approach that incorporates physics-based equations and neural networks. We first derive three generalized physical formulas. Then, using a neural network and observed trajectory points, we infer certain parameters while fitting the remaining ones. These formulas enable precise trajectory prediction with minimal training data: only a few dozen samples. Extensive experiments demonstrate our method superiority in generalization, real-time performance, and accuracy. |
This ...This submission was made without my advisor's consent, and I mistakenly uploaded an incorrect version of the paper. Additionally, some content in the paper should not be made publicly available at this time, as per my advisor's wishes. I apologize for any inconvenience this may have caused |
TraF-Align: Trajectory-aware Feature Alignment for Asynchronous Multi-agent Perception | 2025-03-25 | ShowCooperative perception presents significant potential for enhancing the sensing capabilities of individual vehicles, however, inter-agent latency remains a critical challenge. Latencies cause misalignments in both spatial and semantic features, complicating the fusion of real-time observations from the ego vehicle with delayed data from others. To address these issues, we propose TraF-Align, a novel framework that learns the flow path of features by predicting the feature-level trajectory of objects from past observations up to the ego vehicle's current time. By generating temporally ordered sampling points along these paths, TraF-Align directs attention from the current-time query to relevant historical features along each trajectory, supporting the reconstruction of current-time features and promoting semantic interaction across multiple frames. This approach corrects spatial misalignment and ensures semantic consistency across agents, effectively compensating for motion and achieving coherent feature fusion. Experiments on two real-world datasets, V2V4Real and DAIR-V2X-Seq, show that TraF-Align sets a new benchmark for asynchronous cooperative perception. |
Accep...Accepted to CVPR 2025 |
A Rapid Trajectory Optimization and Control Framework for Resource-Constrained Applications | 2025-03-24 | ShowThis paper presents a computationally efficient model predictive control formulation that uses an integral Chebyshev collocation method to enable rapid operations of autonomous agents. By posing the finite-horizon optimal control problem and recursive re-evaluation of the optimal trajectories, minimization of the L2 norms of the state and control errors are transcribed into a quadratic program. Control and state variable constraints are parameterized using Chebyshev polynomials and are accommodated in the optimal trajectory generation programs to incorporate the actuator limits and keep-out constraints. Differentiable collision detection of polytopes is leveraged for optimal collision avoidance. Results obtained from the collocation methods are benchmarked against the existing approaches on an edge computer to outline the performance improvements. Finally, collaborative control scenarios involving multi-agent space systems are considered to demonstrate the technical merits of the proposed work. |
This ...This work has been accepted for publication at the IEEE ACC 2025 |
Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training | 2025-03-24 | ShowReinforcement learning (RL) is a critical component of large language model (LLM) post-training. However, existing on-policy algorithms used for post-training are inherently incompatible with the use of experience replay buffers, which can be populated scalably by distributed off-policy actors to enhance exploration as compute increases. We propose efficiently obtaining this benefit of replay buffers via Trajectory Balance with Asynchrony (TBA), a massively scalable LLM RL system. In contrast to existing approaches, TBA uses a larger fraction of compute on search, constantly generating off-policy data for a central replay buffer. A training node simultaneously samples data from this buffer based on reward or recency to update the policy using Trajectory Balance (TB), a diversity-seeking RL objective introduced for GFlowNets. TBA offers three key advantages: (1) decoupled training and search, speeding up training wall-clock time by 4x or more; (2) improved diversity through large-scale off-policy sampling; and (3) scalable search for sparse reward settings. On mathematical reasoning, preference-tuning, and automated red-teaming (diverse and representative post-training tasks), TBA produces speed and performance improvements over strong baselines. |
|
CarPlanner: Consistent Auto-regressive Trajectory Planning for Large-scale Reinforcement Learning in Autonomous Driving | 2025-03-24 | ShowTrajectory planning is vital for autonomous driving, ensuring safe and efficient navigation in complex environments. While recent learning-based methods, particularly reinforcement learning (RL), have shown promise in specific scenarios, RL planners struggle with training inefficiencies and managing large-scale, real-world driving scenarios. In this paper, we introduce \textbf{CarPlanner}, a \textbf{C}onsistent \textbf{a}uto-\textbf{r}egressive \textbf{Planner} that uses RL to generate multi-modal trajectories. The auto-regressive structure enables efficient large-scale RL training, while the incorporation of consistency ensures stable policy learning by maintaining coherent temporal consistency across time steps. Moreover, CarPlanner employs a generation-selection framework with an expert-guided reward function and an invariant-view module, simplifying RL training and enhancing policy performance. Extensive analysis demonstrates that our proposed RL framework effectively addresses the challenges of training efficiency and performance enhancement, positioning CarPlanner as a promising solution for trajectory planning in autonomous driving. To the best of our knowledge, we are the first to demonstrate that the RL-based planner can surpass both IL- and rule-based state-of-the-arts (SOTAs) on the challenging large-scale real-world dataset nuPlan. Our proposed CarPlanner surpasses RL-, IL-, and rule-based SOTA approaches within this demanding dataset. |
CVPR 2025 |
DiffMove: Group Mobility Tendency Enhanced Trajectory Recovery via Diffusion Model | 2025-03-24 | ShowIn the real world, trajectory data is often sparse and incomplete due to low collection frequencies or limited device coverage. Trajectory recovery aims to recover these missing trajectory points, making the trajectories denser and more complete. However, this task faces two key challenges: 1) The excessive sparsity of individual trajectories makes it difficult to effectively leverage historical information for recovery; 2) Sparse trajectories make it harder to capture complex individual mobility preferences. To address these challenges, we propose a novel method called DiffMove. Firstly, we harness crowd wisdom for trajectory recovery. Specifically, we construct a group tendency graph using the collective trajectories of all users and then integrate the group mobility trends into the location representations via graph embedding. This solves the challenge of sparse trajectories being unable to rely on individual historical trajectories for recovery. Secondly, we capture individual mobility preferences from both historical and current perspectives. Finally, we integrate group mobility tendencies and individual preferences into the spatiotemporal distribution of the trajectory to recover high-quality trajectories. Extensive experiments on two real-world datasets demonstrate that DiffMove outperforms existing state-of-the-art methods. Further analysis validates the robustness of our method. |
|
Trajectory Imputation in Multi-Agent Sports with Derivative-Accumulating Self-Ensemble | 2025-03-23 | ShowMulti-agent trajectory data collected from domains such as team sports often suffer from missing values due to various factors. While many imputation methods have been proposed for spatiotemporal data, they are not well-suited for multi-agent sports scenarios where player movements are highly dynamic and inter-agent interactions continuously evolve. To address these challenges, we propose MIDAS (Multi-agent Imputer with Derivative-Accumulating Self-ensemble), a framework that imputes multi-agent trajectories with high accuracy and physical plausibility. It jointly predicts positions, velocities, and accelerations through a Set Transformer-based neural network and generates alternative estimates by recursively accumulating predicted velocity and acceleration values. These predictions are then combined using a learnable weighted ensemble to produce final imputed trajectories. Experiments on three sports datasets demonstrate that MIDAS significantly outperforms existing baselines in both positional accuracy and physical plausibility. Lastly, we showcase use cases of MIDAS, such as approximating total distance and pass success probability, to highlight its applicability to practical downstream tasks that require complete tracking data. |
|
EF-3DGS: Event-Aided Free-Trajectory 3D Gaussian Splatting | 2025-03-23 | ShowScene reconstruction from casually captured videos has wide applications in real-world scenarios. With recent advancements in differentiable rendering techniques, several methods have attempted to simultaneously optimize scene representations (NeRF or 3DGS) and camera poses. Despite recent progress, existing methods relying on traditional camera input tend to fail in high-speed (or equivalently low-frame-rate) scenarios. Event cameras, inspired by biological vision, record pixel-wise intensity changes asynchronously with high temporal resolution, providing valuable scene and motion information in blind inter-frame intervals. In this paper, we introduce the event camera to aid scene construction from a casually captured video for the first time, and propose Event-Aided Free-Trajectory 3DGS, called EF-3DGS, which seamlessly integrates the advantages of event cameras into 3DGS through three key components. First, we leverage the Event Generation Model (EGM) to fuse events and frames, supervising the rendered views observed by the event stream. Second, we adopt the Contrast Maximization (CMax) framework in a piece-wise manner to extract motion information by maximizing the contrast of the Image of Warped Events (IWE), thereby calibrating the estimated poses. Besides, based on the Linear Event Generation Model (LEGM), the brightness information encoded in the IWE is also utilized to constrain the 3DGS in the gradient domain. Third, to mitigate the absence of color information of events, we introduce photometric bundle adjustment (PBA) to ensure view consistency across events and frames. We evaluate our method on the public Tanks and Temples benchmark and a newly collected real-world dataset, RealEv-DAVIS. Our project page is https://lbh666.github.io/ef-3dgs/. |
Proje...Project Page: https://lbh666.github.io/ef-3dgs/ |
Combining longitudinal cohort studies to examine cardiovascular risk factor trajectories across the adult lifespan | 2025-03-22 | ShowWe introduce a statistical framework for combining data from multiple large longitudinal cardiovascular cohorts to enable the study of long-term cardiovascular health starting in early adulthood. Using data from seven cohorts belonging to the Lifetime Risk Pooling Project (LRPP), we present a Bayesian hierarchical multivariate approach that jointly models multiple longitudinal risk factors over time and across cohorts. Because few cohorts in our project cover the entire adult lifespan, our strategy uses information from all risk factors to increase precision for each risk factor trajectory and borrows information across cohorts to fill in unobserved risk factors. We develop novel diagnostic testing and model validation methods to ensure that our model robustly captures and maintains critical relationships over time and across risk factors. |
|
Physically-Feasible Reactive Synthesis for Terrain-Adaptive Locomotion via Trajectory Optimization and Symbolic Repair | 2025-03-21 | ShowWe propose an integrated planning framework for quadrupedal locomotion over dynamically changing, unforeseen terrains. Existing approaches either rely on heuristics for instantaneous foothold selection--compromising safety and versatility--or solve expensive trajectory optimization problems with complex terrain features and long time horizons. In contrast, our framework leverages reactive synthesis to generate correct-by-construction controllers at the symbolic level, and mixed-integer convex programming (MICP) for dynamic and physically feasible footstep planning for each symbolic transition. We use a high-level manager to reduce the large state space in synthesis by incorporating local environment information, improving synthesis scalability. To handle specifications that cannot be met due to dynamic infeasibility, and to minimize costly MICP solves, we leverage a symbolic repair process to generate only necessary symbolic transitions. During online execution, re-running the MICP with real-world terrain data, along with runtime symbolic repair, bridges the gap between offline synthesis and online execution. We demonstrate, in simulation, our framework's capabilities to discover missing locomotion skills and react promptly in safety-critical environments, such as scattered stepping stones and rebars. |
|
MultiNash-PF: A Particle Filtering Approach for Computing Multiple Local Generalized Nash Equilibria in Trajectory Games | 2025-03-21 | ShowModern robotic systems frequently engage in complex multi-agent interactions, many of which are inherently multi-modal, meaning they can lead to multiple distinct outcomes. To interact effectively, robots must recognize the possible interaction modes and adapt to the one preferred by other agents. In this work, we propose an efficient algorithm for capturing the multimodality in multi-agent interactions. We leverage a game-theoretic planner to model interaction outcomes as equilibria where \emph{each equilibrium} corresponds to a distinct interaction \emph{mode}. We then develop an efficient algorithm to identify all the equilibria, allowing robots to reason about multiple interaction modes. More specifically, we formulate interactive planning as Constrained Potential Trajectory Games (CPTGs) and model interaction outcomes by local Generalized Nash equilibria (GNEs) of the game. CPTGs are a class of games for which a local GNE can be found by solving a single constrained optimal control problem where a potential function is minimized. We propose to integrate the potential game approach with implicit particle filtering, a sample-efficient method for non-convex trajectory optimization. We utilize implicit particle filtering to identify the coarse estimates of multiple local minimizers of the game's potential function. MultiNash-PF then refines these estimates with optimization solvers, obtaining different local GNEs. We show through numerical simulations that MultiNash-PF reduces computation time by up to 50% compared to a baseline. We further demonstrate the effectiveness of our algorithm in real-world human-robot interaction scenarios, where it successfully accounts for the multi-modal nature of interactions and resolves potential conflicts in real-time. |
|
Physical Plausibility-aware Trajectory Prediction via Locomotion Embodiment | 2025-03-21 | ShowHumans can predict future human trajectories even from momentary observations by using human pose-related cues. However, previous Human Trajectory Prediction (HTP) methods leverage the pose cues implicitly, resulting in implausible predictions. To address this, we propose Locomotion Embodiment, a framework that explicitly evaluates the physical plausibility of the predicted trajectory by locomotion generation under the laws of physics. While the plausibility of locomotion is learned with an indifferentiable physics simulator, it is replaced by our differentiable Locomotion Value function to train an HTP network in a data-driven manner. In particular, our proposed Embodied Locomotion loss is beneficial for efficiently training a stochastic HTP network using multiple heads. Furthermore, the Locomotion Value filter is proposed to filter out implausible trajectories at inference. Experiments demonstrate that our method enhances even the state-of-the-art HTP methods across diverse datasets and problem settings. Our code is available at: https://github.com/ImIntheMiddle/EmLoco. |
CVPR2...CVPR2025. Project page: https://iminthemiddle.github.io/EmLoco-Page/ |
Prioritized Trajectory Replay: A Replay Memory for Data-driven Reinforcement Learning | 2025-03-21 | ShowIn recent years, data-driven reinforcement learning (RL), also known as offline RL, have gained significant attention. However, the role of data sampling techniques in offline RL has been overlooked despite its potential to enhance online RL performance. Recent research suggests applying sampling techniques directly to state-transitions does not consistently improve performance in offline RL. Therefore, in this study, we propose a memory technique, (Prioritized) Trajectory Replay (TR/PTR), which extends the sampling perspective to trajectories for more comprehensive information extraction from limited data. TR enhances learning efficiency by backward sampling of trajectories that optimizes the use of subsequent state information. Building on TR, we build the weighted critic target to avoid sampling unseen actions in offline training, and Prioritized Trajectory Replay (PTR) that enables more efficient trajectory sampling, prioritized by various trajectory priority metrics. We demonstrate the benefits of integrating TR and PTR with existing offline RL algorithms on D4RL. In summary, our research emphasizes the significance of trajectory-based data sampling techniques in enhancing the efficiency and performance of offline RL algorithms. |
Accep...Accepted by AAMAS 2024, see https://dl.acm.org/doi/10.5555/3635637.3662980 |
Parameter Adjustments in POMDP-Based Trajectory Planning for Unsignalized Intersections | 2025-03-20 | ShowThis paper investigates the problem of trajectory planning for autonomous vehicles at unsignalized intersections, specifically focusing on scenarios where the vehicle lacks the right of way and yet must cross safely. To address this issue, we have employed a method based on the Partially Observable Markov Decision Processes (POMDPs) framework designed for planning under uncertainty. The method utilizes the Adaptive Belief Tree (ABT) algorithm as an approximate solver for the POMDPs. We outline the POMDP formulation, beginning with discretizing the intersection's topology. Additionally, we present a dynamics model for the prediction of the evolving states of vehicles, such as their position and velocity. Using an observation model, we also describe the connection of those states with the imperfect (noisy) available measurements. Our results confirmed that the method is able to plan collision-free trajectories in a series of simulations utilizing real-world traffic data from aerial footage of two distinct intersections. Furthermore, we studied the impact of parameter adjustments of the ABT algorithm on the method's performance. This provides guidance in determining reasonable parameter settings, which is valuable for future method applications. |
Submitted version |
MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance | 2025-03-20 | ShowRecent advances in video generation have led to remarkable improvements in visual quality and temporal coherence. Upon this, trajectory-controllable video generation has emerged to enable precise object motion control through explicitly defined spatial paths. However, existing methods struggle with complex object movements and multi-object motion control, resulting in imprecise trajectory adherence, poor object consistency, and compromised visual quality. Furthermore, these methods only support trajectory control in a single format, limiting their applicability in diverse scenarios. Additionally, there is no publicly available dataset or benchmark specifically tailored for trajectory-controllable video generation, hindering robust training and systematic evaluation. To address these challenges, we introduce MagicMotion, a novel image-to-video generation framework that enables trajectory control through three levels of conditions from dense to sparse: masks, bounding boxes, and sparse boxes. Given an input image and trajectories, MagicMotion seamlessly animates objects along defined trajectories while maintaining object consistency and visual quality. Furthermore, we present MagicData, a large-scale trajectory-controlled video dataset, along with an automated pipeline for annotation and filtering. We also introduce MagicBench, a comprehensive benchmark that assesses both video quality and trajectory control accuracy across different numbers of objects. Extensive experiments demonstrate that MagicMotion outperforms previous methods across various metrics. Our project page are publicly available at https://quanhaol.github.io/magicmotion-site. |
|
Loop Closure from Two Views: Revisiting PGO for Scalable Trajectory Estimation through Monocular Priors | 2025-03-20 | Show(Visual) Simultaneous Localization and Mapping (SLAM) remains a fundamental challenge in enabling autonomous systems to navigate and understand large-scale environments. Traditional SLAM approaches struggle to balance efficiency and accuracy, particularly in large-scale settings where extensive computational resources are required for scene reconstruction and Bundle Adjustment (BA). However, this scene reconstruction, in the form of sparse pointclouds of visual landmarks, is often only used within the SLAM system because navigation and planning methods require different map representations. In this work, we therefore investigate a more scalable Visual SLAM (VSLAM) approach without reconstruction, mainly based on approaches for two-view loop closures. By restricting the map to a sparse keyframed pose graph without dense geometry representations, our '2GO' system achieves efficient optimization with competitive absolute trajectory accuracy. In particular, we find that recent advancements in image matching and monocular depth priors enable very accurate trajectory optimization from two-view edges. We conduct extensive experiments on diverse datasets, including large-scale scenarios, and provide a detailed analysis of the trade-offs between runtime, accuracy, and map size. Our results demonstrate that this streamlined approach supports real-time performance, scales well in map size and trajectory duration, and effectively broadens the capabilities of VSLAM for long-duration deployments to large environments. |
|
PoseTraj: Pose-Aware Trajectory Control in Video Diffusion | 2025-03-20 | ShowRecent advancements in trajectory-guided video generation have achieved notable progress. However, existing models still face challenges in generating object motions with potentially changing 6D poses under wide-range rotations, due to limited 3D understanding. To address this problem, we introduce PoseTraj, a pose-aware video dragging model for generating 3D-aligned motion from 2D trajectories. Our method adopts a novel two-stage pose-aware pretraining framework, improving 3D understanding across diverse trajectories. Specifically, we propose a large-scale synthetic dataset PoseTraj-10K, containing 10k videos of objects following rotational trajectories, and enhance the model perception of object pose changes by incorporating 3D bounding boxes as intermediate supervision signals. Following this, we fine-tune the trajectory-controlling module on real-world videos, applying an additional camera-disentanglement module to further refine motion accuracy. Experiments on various benchmark datasets demonstrate that our method not only excels in 3D pose-aligned dragging for rotational trajectories but also outperforms existing baselines in trajectory accuracy and video quality. |
Code,...Code, data and project page: https://robingg1.github.io/Pose-Traj/ |
Statistical modeling of categorical trajectories with multivariate functional principal components | 2025-03-20 | ShowThere are many examples in which the statistical units of interest are samples of a continuous time categorical random process, that is to say a continuous time stochastic process taking values in a finite state space. Without loosing any information, we associate to each state a binary random function, taking values in |
|
Tangles: Unpacking Extended Collision Experiences with Soma Trajectories | 2025-03-20 | ShowWe reappraise the idea of colliding with robots, moving from a position that tries to avoid or mitigate collisions to one that considers them an important facet of human interaction. We report on a soma design workshop that explored how our bodies could collide with telepresence robots, mobility aids, and a quadruped robot. Based on our findings, we employed soma trajectories to analyse collisions as extended experiences that negotiate key transitions of consent, preparation, launch, contact, ripple, sting, untangle, debris and reflect. We then employed these ideas to analyse two collision experiences, an accidental collision between a person and a drone, and the deliberate design of a robot to play with cats, revealing how real-world collisions involve the complex and ongoing entanglement of soma trajectories. We discuss how viewing collisions as entangled trajectories, or tangles, can be used analytically, as a design approach, and as a lens to broach ethical complexity. |
32 pages, 13 figures |
Perturb-and-Revise: Flexible 3D Editing with Generative Trajectories | 2025-03-20 | ShowRecent advancements in text-based diffusion models have accelerated progress in 3D reconstruction and text-based 3D editing. Although existing 3D editing methods excel at modifying color, texture, and style, they struggle with extensive geometric or appearance changes, thus limiting their applications. To this end, we propose Perturb-and-Revise, which makes possible a variety of NeRF editing. First, we perturb the NeRF parameters with random initializations to create a versatile initialization. The level of perturbation is determined automatically through analysis of the local loss landscape. Then, we revise the edited NeRF via generative trajectories. Combined with the generative process, we impose identity-preserving gradients to refine the edited NeRF. Extensive experiments demonstrate that Perturb-and-Revise facilitates flexible, effective, and consistent editing of color, appearance, and geometry in 3D. For 360{\deg} results, please visit our project page: https://susunghong.github.io/Perturb-and-Revise. |
CVPR ...CVPR 2025. Project page: https://susunghong.github.io/Perturb-and-Revise |
SSTP: Efficient Sample Selection for Trajectory Prediction | 2025-03-20 | ShowTrajectory prediction is a core task in autonomous driving. However, training advanced trajectory prediction models on large-scale datasets is both time-consuming and computationally expensive. In addition, the imbalanced distribution of driving scenarios often biases models toward data-rich cases, limiting performance in safety-critical, data-scarce conditions. To address these challenges, we propose the Sample Selection for Trajectory Prediction (SSTP) framework, which constructs a compact yet balanced dataset for trajectory prediction. SSTP consists of two main stages (1) Extraction, in which a pretrained trajectory prediction model computes gradient vectors for each sample to capture their influence on parameter updates; and (2) Selection, where a submodular function is applied to greedily choose a representative subset that covers diverse driving scenarios. This approach significantly reduces the dataset size and mitigates scenario imbalance, without sacrificing prediction accuracy and even improving in high-density cases. We evaluate our proposed SSTP on the Argoverse 1 and Argoverse 2 benchmarks using a wide range of recent state-of-the-art models. Our experiments demonstrate that SSTP achieves comparable performance to full-dataset training using only half the data while delivering substantial improvements in high-density traffic scenes and significantly reducing training time. Importantly, SSTP exhibits strong generalization and robustness, and the selected subset is model-agnostic, offering a broadly applicable solution. |
|
Reachable Sets-based Trajectory Planning Combining Reinforcement Learning and iLQR | 2025-03-19 | ShowThe driving risk field is applicable to more complex driving scenarios, providing new approaches for safety decision-making and active vehicle control in intricate environments. However, existing research often overlooks the driving risk field and fails to consider the impact of risk distribution within drivable areas on trajectory planning, which poses challenges for enhancing safety. This paper proposes a trajectory planning method for intelligent vehicles based on the risk reachable set to further improve the safety of trajectory planning. First, we construct the reachable set incorporating the driving risk field to more accurately assess and avoid potential risks in drivable areas. Then, the initial trajectory is generated based on safe reinforcement learning and projected onto the reachable set. Finally, we introduce a trajectory planning method based on a constrained iterative quadratic regulator to optimize the initial solution, ensuring that the planned trajectory achieves optimal comfort, safety, and efficiency. We conduct simulation tests of trajectory planning in high-speed lane-changing scenarios. The results indicate that the proposed method can guarantee trajectory comfort and driving efficiency, with the generated trajectory situated outside high-risk boundaries, thereby ensuring vehicle safety during operation. |
|
Sparse canonical correlation analysis for multiple measurements with latent trajectories | 2025-03-19 | ShowCanonical Correlation Analysis, CCA, is a widely used multivariate method in omics research for integrating high dimensional datasets. CCA identifies hidden links by deriving linear projections of features maximally correlating datasets. For standard CCA, observations must be independent of each other. As a result, it cannot properly deal with repeated measurements. Current CCA extensions dealing with these challenges either perform CCA on summarized data or estimate correlations for each measurement. While these techniques factor in the correlation between measurements, they are sub-optimal for high dimensional analysis and exploiting this datas longitudinal qualities. We propose a novel extension of sparse CCA that incorporates time dynamics at the latent level through longitudinal models. This approach addresses the correlation of repeated measurements while drawing latent paths, focusing on dynamics in the correlation structures. To aid interpretability and computational efficiency, we implement a penalty to enforce fixed sparsity levels. We estimate these trajectories fitting longitudinal models to the low dimensional latent variables, leveraging the clustered structure of high dimensional datasets, thus exploring shared longitudinal latent mechanisms. Furthermore, modeling time in the latent space significantly reduces computational burden. We validate our models performance using simulated data and show its real world applicability with data from the Human Microbiome Project. Our CCA method for repeated measurements enables efficient estimation of canonical correlations across measurements for clustered data. Compared to existing methods, ours substantially reduces computational time in high dimensional analyses as well as provides longitudinal trajectories that yield interpretable and insightful results. |
19 pages, 13 figures |
Scalable Trajectory-User Linking with Dual-Stream Representation Networks | 2025-03-19 | ShowTrajectory-user linking (TUL) aims to match anonymous trajectories to the most likely users who generated them, offering benefits for a wide range of real-world spatio-temporal applications. However, existing TUL methods are limited by high model complexity and poor learning of the effective representations of trajectories, rendering them ineffective in handling large-scale user trajectory data. In this work, we propose a novel $\underline{Scal}$abl$\underline{e}$ Trajectory-User Linking with dual-stream representation networks for large-scale |
The p...The paper has been accepted by AAAI 2025 |
Adaptive Trajectory Optimization for Task-Specific Human-Robot Collaboration | 2025-03-18 | ShowThis paper proposes a task-specific trajectory optimization framework for human-robot collaboration, enabling adaptive motion planning based on human interaction dynamics. Unlike conventional approaches that rely on predefined desired trajectories, the proposed framework optimizes the collaborative motion dynamically using the inverse differential Riccati equation, ensuring adaptability to task variations and human input. The generated trajectory serves as the reference for a neuro-adaptive PID controller, which leverages a neural network to adjust control gains in real time, addressing system uncertainties while maintaining low computational complexity. The combination of trajectory planning and the adaptive control law ensures stability and accurate joint-space tracking without requiring extensive parameter tuning. Numerical simulations validate the proposed approach. |
7 pag...7 pages, 6 figures, 1 table |
Stochastic Trajectory Prediction under Unstructured Constraints | 2025-03-18 | ShowTrajectory prediction facilitates effective planning and decision-making, while constrained trajectory prediction integrates regulation into prediction. Recent advances in constrained trajectory prediction focus on structured constraints by constructing optimization objectives. However, handling unstructured constraints is challenging due to the lack of differentiable formal definitions. To address this, we propose a novel method for constrained trajectory prediction using a conditional generative paradigm, named Controllable Trajectory Diffusion (CTD). The key idea is that any trajectory corresponds to a degree of conformity to a constraint. By quantifying this degree and treating it as a condition, a model can implicitly learn to predict trajectories under unstructured constraints. CTD employs a pre-trained scoring model to predict the degree of conformity (i.e., a score), and uses this score as a condition for a conditional diffusion model to generate trajectories. Experimental results demonstrate that CTD achieves high accuracy on the ETH/UCY and SDD benchmarks. Qualitative analysis confirms that CTD ensures adherence to unstructured constraints and can predict trajectories that satisfy combinatorial constraints. |
has b...has been accepted by ICRA 2025 |
Riemannian Variational Calculus: Optimal Trajectories Under Inertia, Gravity, and Drag Effects | 2025-03-18 | ShowRobotic motion optimization often focuses on task-specific solutions, overlooking fundamental motion principles. Building on Riemannian geometry and the calculus of variations (often appearing as indirect methods of optimal control), we derive an optimal control equation that expresses general forces as functions of configuration and velocity, revealing how inertia, gravity, and drag shape optimal trajectories. Our analysis identifies three key effects: (i) curvature effects of inertia manifold, (ii) curvature effects of potential field, and (iii) shortening effects from resistive force. We validate our approach on a two-link manipulator and a UR5, demonstrating a unified geometric framework for understanding optimal trajectories beyond geodesic-based planning. |
6 pag...6 pages, submitted to IEEE Control Systems Letters (L-CSS) |
A finite-sample bound for identifying partially observed linear switched systems from a single trajectory | 2025-03-17 | ShowWe derive a finite-sample probabilistic bound on the parameter estimation error of a system identification algorithm for Linear Switched Systems. The algorithm estimates Markov parameters from a single trajectory and applies a variant of the Ho-Kalman algorithm to recover the system matrices. Our bound guarantees statistical consistency under the assumption that the true system exhibits quadratic stability. The proof leverages the theory of weakly dependent processes. To the best of our knowledge, this is the first finite-sample bound for this algorithm in the single-trajectory setting. |
|
TraSCE: Trajectory Steering for Concept Erasure | 2025-03-17 | ShowRecent advancements in text-to-image diffusion models have brought them to the public spotlight, becoming widely accessible and embraced by everyday users. However, these models have been shown to generate harmful content such as not-safe-for-work (NSFW) images. While approaches have been proposed to erase such abstract concepts from the models, jail-breaking techniques have succeeded in bypassing such safety measures. In this paper, we propose TraSCE, an approach to guide the diffusion trajectory away from generating harmful content. Our approach is based on negative prompting, but as we show in this paper, a widely used negative prompting strategy is not a complete solution and can easily be bypassed in some corner cases. To address this issue, we first propose using a specific formulation of negative prompting instead of the widely used one. Furthermore, we introduce a localized loss-based guidance that enhances the modified negative prompting technique by steering the diffusion trajectory. We demonstrate that our proposed method achieves state-of-the-art results on various benchmarks in removing harmful content, including ones proposed by red teams, and erasing artistic styles and objects. Our proposed approach does not require any training, weight modifications, or training data (either image or prompt), making it easier for model owners to erase new concepts. |
|
Trajectory Optimization for Spatial Microstructure Control in Electron Beam Metal Additive Manufacturing | 2025-03-17 | ShowMetal additive manufacturing (AM) opens the possibility for spatial control of as-fabricated microstructure and properties. However, since the solid state diffusional transformations that drive microstructure outcomes are governed by nonlinear ODEs in terms of temperature, which is itself governed by PDEs over the entire part domain, solving for the system inputs needed to achieve desired microstructure distributions has proven difficult. In this work, we present a trajectory optimization approach for spatial control of microstructure in metal AM, which we demonstrate by controlling the hardness of a low-alloy steel in electron beam powder bed fusion (EB-PBF). To this end, we present models for thermal and microstructural dynamics. Next, we use experimental data to identify the parameters of the microstructure transformation dynamics. We then pose spatial microstructure control as a finite-horizon optimal control problem. The optimal power field trajectory is computed using an augmented Lagrangian differential dynamic programming (AL-DDP) method with GPU acceleration. The resulting time-varying power fields are then realized on an EB-PBF machine through an approximation scheme. Measurements of the resultant hardness shows that the optimized power field trajectory is able to closely produce the desired hardness distribution. |
6 pages, 6 figures |
A representational framework for learning and encoding structurally enriched trajectories in complex agent environments | 2025-03-17 | ShowThe ability of artificial intelligence agents to make optimal decisions and generalise them to different domains and tasks is compromised in complex scenarios. One way to address this issue has focused on learning efficient representations of the world and on how the actions of agents affect them, such as disentangled representations that exploit symmetries. Whereas such representations are procedurally efficient, they are based on the compression of low-level state-action transitions, which lack structural richness. To address this problem, we propose to enrich the agent's ontology and extend the traditional conceptualisation of trajectories to provide a more nuanced view of task execution. Structurally Enriched Trajectories (SETs) extend the encoding of sequences of states and their transitions by incorporating hierarchical relations between objects, interactions and affordances. SETs are built as multi-level graphs, providing a detailed representation of the agent dynamics and a transferable functional abstraction of the task. SETs are integrated into an architecture, Structurally Enriched Trajectory Learning and Encoding (SETLE), that employs a heterogeneous graph-based memory structure of multi-level relational dependencies essential for generalisation. Using reinforcement learning as a data generation tool, we demonstrate that SETLE can support downstream tasks, enabling agents to recognise task-relevant structural patterns across diverse environments. |
|
Advanced computer vision for extracting georeferenced vehicle trajectories from drone imagery | 2025-03-17 | ShowThis paper presents a framework for extracting georeferenced vehicle trajectories from high-altitude drone imagery, addressing key challenges in urban traffic monitoring and the limitations of traditional ground-based systems. Our approach integrates several novel contributions, including a tailored object detector optimized for high-altitude bird's-eye view perspectives, a unique track stabilization method that uses detected vehicle bounding boxes as exclusion masks during image registration, and an orthophoto and master frame-based georeferencing strategy that enhances consistent alignment across multiple drone viewpoints. Additionally, our framework features robust vehicle dimension estimation and detailed road segmentation, enabling comprehensive traffic analysis. Conducted in the Songdo International Business District, South Korea, the study utilized a multi-drone experiment covering 20 intersections, capturing approximately 12TB of 4K video data over four days. The framework produced two high-quality datasets: the Songdo Traffic dataset, comprising approximately 700,000 unique vehicle trajectories, and the Songdo Vision dataset, containing over 5,000 human-annotated images with about 300,000 vehicle instances in four classes. Comparisons with high-precision sensor data from an instrumented probe vehicle highlight the accuracy and consistency of our extraction pipeline in dense urban environments. The public release of Songdo Traffic and Songdo Vision, and the complete source code for the extraction pipeline, establishes new benchmarks in data quality, reproducibility, and scalability in traffic research. Results demonstrate the potential of integrating drone technology with advanced computer vision for precise and cost-effective urban traffic monitoring, providing valuable resources for developing intelligent transportation systems and enhancing traffic management strategies. |
|
Other Vehicle Trajectories Are Also Needed: A Driving World Model Unifies Ego-Other Vehicle Trajectories in Video Latent Space | 2025-03-17 | ShowAdvanced end-to-end autonomous driving systems predict other vehicles' motions and plan ego vehicle's trajectory. The world model that can foresee the outcome of the trajectory has been used to evaluate the end-to-end autonomous driving system. However, existing world models predominantly emphasize the trajectory of the ego vehicle and leave other vehicles uncontrollable. This limitation hinders their ability to realistically simulate the interaction between the ego vehicle and the driving scenario. In addition, it remains a challenge to match multiple trajectories with each vehicle in the video to control the video generation. To address above issues, a driving World Model named EOT-WM is proposed in this paper, unifying Ego-Other vehicle Trajectories in videos. Specifically, we first project ego and other vehicle trajectories in the BEV space into the image coordinate to match each trajectory with its corresponding vehicle in the video. Then, trajectory videos are encoded by the Spatial-Temporal Variational Auto Encoder to align with driving video latents spatially and temporally in the unified visual space. A trajectory-injected diffusion Transformer is further designed to denoise the noisy video latents for video generation with the guidance of ego-other vehicle trajectories. In addition, we propose a metric based on control latent similarity to evaluate the controllability of trajectories. Extensive experiments are conducted on the nuScenes dataset, and the proposed model outperforms the state-of-the-art method by 30% in FID and 55% in FVD. The model can also predict unseen driving scenes with self-produced trajectories. |
8 pages, 7 figures |
EMoTive: Event-guided Trajectory Modeling for 3D Motion Estimation | 2025-03-17 | ShowVisual 3D motion estimation aims to infer the motion of 2D pixels in 3D space based on visual cues. The key challenge arises from depth variation induced spatio-temporal motion inconsistencies, disrupting the assumptions of local spatial or temporal motion smoothness in previous motion estimation frameworks. In contrast, event cameras offer new possibilities for 3D motion estimation through continuous adaptive pixel-level responses to scene changes. This paper presents EMoTive, a novel event-based framework that models spatio-temporal trajectories via event-guided non-uniform parametric curves, effectively characterizing locally heterogeneous spatio-temporal motion. Specifically, we first introduce Event Kymograph - an event projection method that leverages a continuous temporal projection kernel and decouples spatial observations to encode fine-grained temporal evolution explicitly. For motion representation, we introduce a density-aware adaptation mechanism to fuse spatial and temporal features under event guidance, coupled with a non-uniform rational curve parameterization framework to adaptively model heterogeneous trajectories. The final 3D motion estimation is achieved through multi-temporal sampling of parametric trajectories, yielding optical flow and depth motion fields. To facilitate evaluation, we introduce CarlaEvent3D, a multi-dynamic synthetic dataset for comprehensive validation. Extensive experiments on both this dataset and a real-world benchmark demonstrate the effectiveness of the proposed method. |
|
Intrinsic Successive Convexification: Trajectory Optimization on Smooth Manifolds | 2025-03-17 | ShowA fundamental issue at the core of trajectory optimization on smooth manifolds is handling the implicit manifold constraint within the dynamics. The conventional approach is to enforce the dynamic model as a constraint. However, we show this approach leads to significantly redundant operations, as well as being heavily dependent on the state space representation. Specifically, we propose an intrinsic successive convexification methodology for optimal control on smooth manifolds. This so-called iSCvx is then applied to a representative example involving attitude trajectory optimization for a spacecraft subject to non-convex constraints. |
|
CDKFormer: Contextual Deviation Knowledge-Based Transformer for Long-Tail Trajectory Prediction | 2025-03-16 | ShowPredicting the future movements of surrounding vehicles is essential for ensuring the safe operation and efficient navigation of autonomous vehicles (AVs) in urban traffic environments. Existing vehicle trajectory prediction methods primarily focus on improving overall performance, yet they struggle to address long-tail scenarios effectively. This limitation often leads to poor predictions in rare cases, significantly increasing the risk of safety incidents. Taking Argoverse 2 motion forecasting dataset as an example, we first investigate the long-tail characteristics in trajectory samples from two perspectives, individual motion and group interaction, and deriving deviation features to distinguish abnormal from regular scenarios. On this basis, we propose CDKFormer, a Contextual Deviation Knowledge-based Transformer model for long-tail trajectory prediction. CDKFormer integrates an attention-based scene context fusion module to encode spatiotemporal interaction and road topology. An additional deviation feature fusion module is proposed to capture the dynamic deviations in the target vehicle status. We further introduce a dual query-based decoder, supported by a multi-stream decoder block, to sequentially decode heterogeneous scene deviation features and generate multimodal trajectory predictions. Extensive experiments demonstrate that CDKFormer achieves state-of-the-art performance, significantly enhancing prediction accuracy and robustness for long-tailed trajectories compared to existing methods, thus advancing the reliability of AVs in complex real-world environments. |
|
Power Swing Trajectory Influenced by Virtual Impedance-Based Current-Limiting Strategy | 2025-03-15 | ShowGrid-forming (GFM) inverter-based resources (IBRs) can emulate the external characteristics of synchronous generators (SGs) through appropriate control loop design. However, in systems with GFM IBRs, the apparent impedance trajectory under current limitation differs significantly from that of SG-based systems due to the limited overcurrent capability of power electronic devices. This difference challenges the power swing detection functions of distance relays designed for SG-based systems. This paper presents a theoretical analysis of the apparent impedance trajectory over a full power swing cycle under two typical current-limiting strategies: variable virtual impedance (VI) and adaptive VI. The analysis reveals that the trajectory under VI current-limiting strategies differs significantly from that of a conventional SG. The results also indicate that the control parameters affect the characteristics of the trajectory. In addition, the new trajectories challenge conventional power swing detection functions, increasing the risk of malfunction. Furthermore, the implementation of VI leads to a deterioration in system stability. The theoretical analysis is further validated through simulations on the MATLAB/Simulink platform. |
|
D4orm: Multi-Robot Trajectories with Dynamics-aware Diffusion Denoised Deformations | 2025-03-15 | ShowThis work presents an optimization method for generating kinodynamically feasible and collision-free multi-robot trajectories that exploits an incremental denoising scheme in diffusion models. Our key insight is that high-quality trajectories can be discovered merely by denoising noisy trajectories sampled from a distribution. This approach has no learning component, relying instead on only two ingredients: a dynamical model of the robots to obtain feasible trajectories via rollout, and a score function to guide denoising with Monte Carlo gradient approximation. The proposed framework iteratively optimizes the deformation from the previous round with this denoising process, allows \textit{anytime} refinement as time permits, supports different dynamics, and benefits from GPU acceleration. Our evaluations for differential-drive and holonomic teams with up to 16 robots in 2D and 3D worlds show its ability to discover high-quality solutions faster than other black-box optimization methods such as MPPI, approximately three times faster in a 3D holonomic case with 16 robots. As evidence for feasibility, we demonstrate zero-shot deployment of the planned trajectories on eight multirotors. |
|
Robust Dataset Distillation by Matching Adversarial Trajectories | 2025-03-15 | ShowDataset distillation synthesizes compact datasets that enable models to achieve performance comparable to training on the original large-scale datasets. However, existing distillation methods overlook the robustness of the model, resulting in models that are vulnerable to adversarial attacks when trained on distilled data. To address this limitation, we introduce the task of ``robust dataset distillation", a novel paradigm that embeds adversarial robustness into the synthetic datasets during the distillation process. We propose Matching Adversarial Trajectories (MAT), a method that integrates adversarial training into trajectory-based dataset distillation. MAT incorporates adversarial samples during trajectory generation to obtain robust training trajectories, which are then used to guide the distillation process. As experimentally demonstrated, even through natural training on our distilled dataset, models can achieve enhanced adversarial robustness while maintaining competitive accuracy compared to existing distillation methods. Our work highlights robust dataset distillation as a new and important research direction and provides a strong baseline for future research to bridge the gap between efficient training and adversarial robustness. |
|
Learning a Fast Mixing Exogenous Block MDP using a Single Trajectory | 2025-03-15 | ShowIn order to train agents that can quickly adapt to new objectives or reward functions, efficient unsupervised representation learning in sequential decision-making environments can be important. Frameworks such as the Exogenous Block Markov Decision Process (Ex-BMDP) have been proposed to formalize this representation-learning problem (Efroni et al., 2022b). In the Ex-BMDP framework, the agent's high-dimensional observations of the environment have two latent factors: a controllable factor, which evolves deterministically within a small state space according to the agent's actions, and an exogenous factor, which represents time-correlated noise, and can be highly complex. The goal of the representation learning problem is to learn an encoder that maps from observations into the controllable latent space, as well as the dynamics of this space. Efroni et al. (2022b) has shown that this is possible with a sample complexity that depends only on the size of the controllable latent space, and not on the size of the noise factor. However, this prior work has focused on the episodic setting, where the controllable latent state resets to a specific start state after a finite horizon. By contrast, if the agent can only interact with the environment in a single continuous trajectory, prior works have not established sample-complexity bounds. We propose STEEL, the first provably sample-efficient algorithm for learning the controllable dynamics of an Ex-BMDP from a single trajectory, in the function approximation setting. STEEL has a sample complexity that depends only on the sizes of the controllable latent space and the encoder function class, and (at worst linearly) on the mixing time of the exogenous noise factor. We prove that STEEL is correct and sample-efficient, and demonstrate STEEL on two toy problems. Code is available at: https://github.com/midi-lab/steel. |
ICLR 2025 |
Title | Date | Abstract | Comment |
---|---|---|---|
Optimization of a Triangular Delaunay Mesh Generator using Reinforcement Learning | 2025-04-04 | ShowIn this work we introduce a triangular Delaunay mesh generator that can be trained using reinforcement learning to maximize a given mesh quality metric. Our mesh generator consists of a graph neural network that distributes and modifies vertices, and a standard Delaunay algorithm to triangulate the vertices. We explore various design choices and evaluate our mesh generator on various tasks including mesh generation, mesh improvement, and producing variable resolution meshes. The learned mesh generator outputs meshes that are comparable to those produced by Triangle and DistMesh, two popular Delaunay-based mesh generators. |
|
Optimizing Quantum Circuits via ZX Diagrams using Reinforcement Learning and Graph Neural Networks | 2025-04-04 | ShowQuantum computing is currently strongly limited by the impact of noise, in particular introduced by the application of two-qubit gates. For this reason, reducing the number of two-qubit gates is of paramount importance on noisy intermediate-scale quantum hardware. To advance towards more reliable quantum computing, we introduce a framework based on ZX calculus, graph-neural networks and reinforcement learning for quantum circuit optimization. By combining reinforcement learning and tree search, our method addresses the challenge of selecting optimal sequences of ZX calculus rewrite rules. Instead of relying on existing heuristic rules for minimizing circuits, our method trains a novel reinforcement learning policy that directly operates on ZX-graphs, therefore allowing us to search through the space of all possible circuit transformations to find a circuit significantly minimizing the number of CNOT gates. This way we can scale beyond hard-coded rules towards discovering arbitrary optimization rules. We demonstrate our method's competetiveness with state-of-the-art circuit optimizers and generalization capabilities on large sets of diverse random circuits. |
|
Proximal Policy Optimization with Graph Neural Networks for Optimal Power Flow | 2025-04-04 | ShowOptimal Power Flow (OPF) is a very traditional research area within the power systems field that seeks for the optimal operation point of electric power plants, and which needs to be solved every few minutes in real-world scenarios. However, due to the nonconvexities that arise in power generation systems, there is not yet a fast, robust solution technique for the full Alternating Current Optimal Power Flow (ACOPF). In the last decades, power grids have evolved into a typical dynamic, non-linear and large-scale control system, known as the power system, so searching for better and faster ACOPF solutions is becoming crucial. Appearance of Graph Neural Networks (GNN) has allowed the natural use of Machine Learning (ML) algorithms on graph data, such as power networks. On the other hand, Deep Reinforcement Learning (DRL) is known for its powerful capability to solve complex decision-making problems. Although solutions that use these two methods separately are beginning to appear in the literature, none has yet combined the advantages of both. We propose a novel architecture based on the Proximal Policy Optimization algorithm with Graph Neural Networks to solve the Optimal Power Flow. The objective is to design an architecture that learns how to solve the optimization problem and that is at the same time able to generalize to unseen scenarios. We compare our solution with the DCOPF in terms of cost after having trained our DRL agent on IEEE 30 bus system and then computing the OPF on that base network with topology changes |
|
Certified Defense on the Fairness of Graph Neural Networks | 2025-04-04 | ShowGraph Neural Networks (GNNs) have emerged as a prominent graph learning model in various graph-based tasks over the years. Nevertheless, due to the vulnerabilities of GNNs, it has been empirically proved that malicious attackers could easily corrupt the fairness level of their predictions by adding perturbations to the input graph data. In this paper, we take crucial steps to study a novel problem of certifiable defense on the fairness level of GNNs. Specifically, we propose a principled framework named ELEGANT and present a detailed theoretical certification analysis for the fairness of GNNs. ELEGANT takes any GNNs as its backbone, and the fairness level of such a backbone is theoretically impossible to be corrupted under certain perturbation budgets for attackers. Notably, ELEGANT does not have any assumption over the GNN structure or parameters, and does not require re-training the GNNs to realize certification. Hence it can serve as a plug-and-play framework for any optimized GNNs ready to be deployed. We verify the satisfactory effectiveness of ELEGANT in practice through extensive experiments on real-world datasets across different backbones of GNNs, where ELEGANT is also demonstrated to be beneficial for GNN debiasing. Open-source code can be found at https://github.com/yushundong/ELEGANT. |
|
Graph Attention for Heterogeneous Graphs with Positional Encoding | 2025-04-03 | ShowGraph Neural Networks (GNNs) have emerged as the de facto standard for modeling graph data, with attention mechanisms and transformers significantly enhancing their performance on graph-based tasks. Despite these advancements, the performance of GNNs on heterogeneous graphs often remains complex, with networks generally underperforming compared to their homogeneous counterparts. This work benchmarks various GNN architectures to identify the most effective methods for heterogeneous graphs, with a particular focus on node classification and link prediction. Our findings reveal that graph attention networks excel in these tasks. As a main contribution, we explore enhancements to these attention networks by integrating positional encodings for node embeddings. This involves utilizing the full Laplacian spectrum to accurately capture both the relative and absolute positions of each node within the graph, further enhancing performance on downstream tasks such as node classification and link prediction. |
10 pages, 3 figures |
BioX-CPath: Biologically-driven Explainable Diagnostics for Multistain IHC Computational Pathology | 2025-04-03 | ShowThe development of biologically interpretable and explainable models remains a key challenge in computational pathology, particularly for multistain immunohistochemistry (IHC) analysis. We present BioX-CPath, an explainable graph neural network architecture for whole slide image (WSI) classification that leverages both spatial and semantic features across multiple stains. At its core, BioX-CPath introduces a novel Stain-Aware Attention Pooling (SAAP) module that generates biologically meaningful, stain-aware patient embeddings. Our approach achieves state-of-the-art performance on both Rheumatoid Arthritis and Sjogren's Disease multistain datasets. Beyond performance metrics, BioX-CPath provides interpretable insights through stain attention scores, entropy measures, and stain interaction scores, that permit measuring model alignment with known pathological mechanisms. This biological grounding, combined with strong classification performance, makes BioX-CPath particularly suitable for clinical applications where interpretability is key. Source code and documentation can be found at: https://github.com/AmayaGS/BioX-CPath. |
Accep...Accepted for publication at CVPR 2025 |
MI-HGNN: Morphology-Informed Heterogeneous Graph Neural Network for Legged Robot Contact Perception | 2025-04-03 | ShowWe present a Morphology-Informed Heterogeneous Graph Neural Network (MI-HGNN) for learning-based contact perception. The architecture and connectivity of the MI-HGNN are constructed from the robot morphology, in which nodes and edges are robot joints and links, respectively. By incorporating the morphology-informed constraints into a neural network, we improve a learning-based approach using model-based knowledge. We apply the proposed MI-HGNN to two contact perception problems, and conduct extensive experiments using both real-world and simulated data collected using two quadruped robots. Our experiments demonstrate the superiority of our method in terms of effectiveness, generalization ability, model efficiency, and sample efficiency. Our MI-HGNN improved the performance of a state-of-the-art model that leverages robot morphological symmetry by 8.4% with only 0.21% of its parameters. Although MI-HGNN is applied to contact perception problems for legged robots in this work, it can be seamlessly applied to other types of multi-body dynamical systems and has the potential to improve other robot learning frameworks. Our code is made publicly available at https://github.com/lunarlab-gatech/Morphology-Informed-HGNN. |
6 pag...6 pages, 5 figures; This work has been accepted to ICRA 2025 and will soon be published |
A Hybrid Similarity-Aware Graph Neural Network with Transformer for Node Classification | 2025-04-03 | ShowNode classification has gained significant importance in graph deep learning with real-world applications such as recommendation systems, drug discovery, and citation networks. Graph Convolutional Networks and Graph Transformers have achieved superior performance in node classification tasks. However, the key concern with Graph Convolutional Networks is over-squashing, which limits their ability to capture long-range dependencies in the network. Additionally, Graph Transformers face scalability challenges, making it difficult to process large graphs efficiently. To address this, we propose a novel framework, A Hybrid SImilarity-Aware Graph Neural Network with Transformer for Node Classification (SIGNNet), which capitalizes on local and global structural information, enhances the model's capability to effectively capture fine-grained relationships and broader contextual patterns within the graph structure. The proposed method leverages Graph Convolutional Networks alongside a score-based mechanism to effectively capture local and global node interactions while addressing the limitations of over-squashing. Our proposed method employs a novel Personalized PageRank-based node sampling method to address scalability issues by generating subgraphs of nodes. Additionally, SIGNNet incorporates a novel attention mechanism, Structure-Aware Multi-Head Attention (SA-MHA), which integrates node structural information for informed attention weighting, enabling the model to prioritize nodes based on topological significance. Extensive experiments demonstrate the significant improvements achieved by the proposed method over existing state-of-the-art methods, with average accuracy gains of 6.03%, 5.47%, 4.78%, 19.10%, 19.61%, 7.22%, 19.54%, and 14.94% on Cora, Citeseer, CS, Wisconsin, Texas, Actor, Cornell and Chameleon datasets, respectively. |
|
STEAK: Streaming Network for Continual Learning of Object Relocations under Household Context Drifts | 2025-04-03 | ShowIn real-world settings, robots are expected to assist humans across diverse tasks and still continuously adapt to dynamic changes over time. For example, in domestic environments, robots can proactively help users by fetching needed objects based on learned routines, which they infer by observing how objects move over time. However, data from these interactions are inherently non-independent and non-identically distributed (non-i.i.d.), e.g., a robot assisting multiple users may encounter varying data distributions as individuals follow distinct habits. This creates a challenge: integrating new knowledge without catastrophic forgetting. To address this, we propose STREAK (Spatio Temporal RElocation with Adaptive Knowledge retention), a continual learning framework for real-world robotic learning. It leverages a streaming graph neural network with regularization and rehearsal techniques to mitigate context drifts while retaining past knowledge. Our method is time- and memory-efficient, enabling long-term learning without retraining on all past data, which becomes infeasible as data grows in real-world interactions. We evaluate STREAK on the task of incrementally predicting human routines over 50+ days across different households. Results show that it effectively prevents catastrophic forgetting while maintaining generalization, making it a scalable solution for long-term human-robot interactions. |
|
Toward General and Robust LLM-enhanced Text-attributed Graph Learning | 2025-04-03 | ShowRecent advancements in Large Language Models (LLMs) and the proliferation of Text-Attributed Graphs (TAGs) across various domains have positioned LLM-enhanced TAG learning as a critical research area. By utilizing rich graph descriptions, this paradigm leverages LLMs to generate high-quality embeddings, thereby enhancing the representational capacity of Graph Neural Networks (GNNs). However, the field faces significant challenges: (1) the absence of a unified framework to systematize the diverse optimization perspectives arising from the complex interactions between LLMs and GNNs, and (2) the lack of a robust method capable of handling real-world TAGs, which often suffer from texts and edge sparsity, leading to suboptimal performance. To address these challenges, we propose UltraTAG, a unified pipeline for LLM-enhanced TAG learning. UltraTAG provides a unified comprehensive and domain-adaptive framework that not only organizes existing methodologies but also paves the way for future advancements in the field. Building on this framework, we propose UltraTAG-S, a robust instantiation of UltraTAG designed to tackle the inherent sparsity issues in real-world TAGs. UltraTAG-S employs LLM-based text propagation and text augmentation to mitigate text sparsity, while leveraging LLM-augmented node selection techniques based on PageRank and edge reconfiguration strategies to address edge sparsity. Our extensive experiments demonstrate that UltraTAG-S significantly outperforms existing baselines, achieving improvements of 2.12% and 17.47% in ideal and sparse settings, respectively. Moreover, as the data sparsity ratio increases, the performance improvement of UltraTAG-S also rises, which underscores the effectiveness and robustness of UltraTAG-S. |
|
Enhancing Customer Contact Efficiency with Graph Neural Networks in Credit Card Fraud Detection Workflow | 2025-04-03 | ShowCredit card fraud has been a persistent issue since the last century, causing significant financial losses to the industry. The most effective way to prevent fraud is by contacting customers to verify suspicious transactions. However, while these systems are designed to detect fraudulent activity, they often mistakenly flag legitimate transactions, leading to unnecessary declines that disrupt the user experience and erode customer trust. Frequent false positives can frustrate customers, resulting in dissatisfaction, increased complaints, and a diminished sense of security. To address these limitations, we propose a fraud detection framework incorporating Relational Graph Convolutional Networks (RGCN) to enhance the accuracy and efficiency of identifying fraudulent transactions. By leveraging the relational structure of transaction data, our model reduces the need for direct customer confirmation while maintaining high detection performance. Our experiments are conducted using the IBM credit card transaction dataset to evaluate the effectiveness of this approach. |
|
LLM-Augmented Graph Neural Recommenders: Integrating User Reviews | 2025-04-03 | ShowRecommender systems increasingly aim to combine signals from both user reviews and purchase (or other interaction) behaviors. While user-written comments provide explicit insights about preferences, merging these textual representations from large language models (LLMs) with graph-based embeddings of user actions remains a challenging task. In this work, we propose a framework that employs both a Graph Neural Network (GNN)-based model and an LLM to produce review-aware representations, preserving review semantics while mitigating textual noise. Our approach utilizes a hybrid objective that balances user-item interactions against text-derived features, ensuring that user's both behavioral and linguistic signals are effectively captured. We evaluate this method on multiple datasets from diverse application domains, demonstrating consistent improvements over a baseline GNN-based recommender model. Notably, our model achieves significant gains in recommendation accuracy when review data is sparse or unevenly distributed. These findings highlight the importance of integrating LLM-driven textual feedback with GNN-derived user behavioral patterns to develop robust, context-aware recommender systems. |
Under Review |
Reducing Smoothness with Expressive Memory Enhanced Hierarchical Graph Neural Networks | 2025-04-02 | ShowGraphical forecasting models learn the structure of time series data via projecting onto a graph, with recent techniques capturing spatial-temporal associations between variables via edge weights. Hierarchical variants offer a distinct advantage by analysing the time series across multiple resolutions, making them particularly effective in tasks like global weather forecasting, where low-resolution variable interactions are significant. A critical challenge in hierarchical models is information loss during forward or backward passes through the hierarchy. We propose the Hierarchical Graph Flow (HiGFlow) network, which introduces a memory buffer variable of dynamic size to store previously seen information across variable resolutions. We theoretically show two key results: HiGFlow reduces smoothness when mapping onto new feature spaces in the hierarchy and non-strictly enhances the utility of message-passing by improving Weisfeiler-Lehman (WL) expressivity. Empirical results demonstrate that HiGFlow outperforms state-of-the-art baselines, including transformer models, by at least an average of 6.1% in MAE and 6.2% in RMSE. Code is available at https://github.com/TB862/ HiGFlow.git. |
|
OmniCellTOSG: The First Cell Text-Omic Signaling Graphs Dataset for Joint LLM and GNN Modeling | 2025-04-02 | ShowComplex cell signaling systems -- governed by varying protein abundances and interactions -- generate diverse cell types across organs. These systems evolve under influences such as age, sex, diet, environmental exposures, and diseases, making them challenging to decode given the involvement of tens of thousands of genes and proteins. Recently, hundreds of millions of single-cell omics data have provided a robust foundation for understanding these signaling networks within various cell subpopulations and conditions. Inspired by the success of large foundation models (for example, large language models and large vision models) pre-trained on massive datasets, we introduce OmniCellTOSG, the first dataset of cell text-omic signaling graphs (TOSGs). Each TOSG represents the signaling network of an individual or meta-cell and is labeled with information such as organ, disease, sex, age, and cell subtype. OmniCellTOSG offers two key contributions. First, it introduces a novel graph model that integrates human-readable annotations -- such as biological functions, cellular locations, signaling pathways, related diseases, and drugs -- with quantitative gene and protein abundance data, enabling graph reasoning to decode cell signaling. This approach calls for new joint models combining large language models and graph neural networks. Second, the dataset is built from single-cell RNA sequencing data of approximately 120 million cells from diverse tissues and conditions (healthy and diseased) and is fully compatible with PyTorch. This facilitates the development of innovative cell signaling models that could transform research in life sciences, healthcare, and precision medicine. The OmniCellTOSG dataset is continuously expanding and will be updated regularly. The dataset and code are available at https://github.com/FuhaiLiAiLab/OmniCellTOSG. |
|
LL4G: Self-Supervised Dynamic Optimization for Graph-Based Personality Detection | 2025-04-02 | ShowGraph-based personality detection constructs graph structures from textual data, particularly social media posts. Current methods often struggle with sparse or noisy data and rely on static graphs, limiting their ability to capture dynamic changes between nodes and relationships. This paper introduces LL4G, a self-supervised framework leveraging large language models (LLMs) to optimize graph neural networks (GNNs). LLMs extract rich semantic features to generate node representations and to infer explicit and implicit relationships. The graph structure adaptively adds nodes and edges based on input data, continuously optimizing itself. The GNN then uses these optimized representations for joint training on node reconstruction, edge prediction, and contrastive learning tasks. This integration of semantic and structural information generates robust personality profiles. Experimental results on Kaggle and Pandora datasets show LL4G outperforms state-of-the-art models. |
|
From Text to Graph: Leveraging Graph Neural Networks for Enhanced Explainability in NLP | 2025-04-02 | ShowResearchers have relegated natural language processing tasks to Transformer-type models, particularly generative models, because these models exhibit high versatility when performing generation and classification tasks. As the size of these models increases, they achieve outstanding results. Given their widespread use, many explainability techniques are developed based on these models. However, this process becomes computationally expensive due to the large size of the models. Additionally, transformers interpret input information through tokens that fragment input words into sequences lacking inherent semantic meaning, complicating the explanation of the model from the very beginning. This study proposes a novel methodology to achieve explainability in natural language processing tasks by automatically converting sentences into graphs and maintaining semantics through nodes and relations that express fundamental linguistic concepts. It also allows the subsequent exploitation of this knowledge in subsequent tasks, making it possible to obtain trends and understand how the model associates the different elements inside the text with the explained task. The experiments delivered promising results in determining the most critical components within the text structure for a given classification. |
|
Graphically Speaking: Unmasking Abuse in Social Media with Conversation Insights | 2025-04-02 | ShowDetecting abusive language in social media conversations poses significant challenges, as identifying abusiveness often depends on the conversational context, characterized by the content and topology of preceding comments. Traditional Abusive Language Detection (ALD) models often overlook this context, which can lead to unreliable performance metrics. Recent Natural Language Processing (NLP) methods that integrate conversational context often depend on limited and simplified representations, and report inconsistent results. In this paper, we propose a novel approach that utilize graph neural networks (GNNs) to model social media conversations as graphs, where nodes represent comments, and edges capture reply structures. We systematically investigate various graph representations and context windows to identify the optimal configuration for ALD. Our GNN model outperform both context-agnostic baselines and linear context-aware methods, achieving significant improvements in F1 scores. These findings demonstrate the critical role of structured conversational context and establish GNNs as a robust framework for advancing context-aware abusive language detection. |
|
Geometric Reasoning in the Embedding Space | 2025-04-02 | ShowIn this contribution, we demonstrate that Graph Neural Networks and Transformers can learn to reason about geometric constraints. We train them to predict spatial position of points in a discrete 2D grid from a set of constraints that uniquely describe hidden figures containing these points. Both models are able to predict the position of points and interestingly, they form the hidden figures described by the input constraints in the embedding space during the reasoning process. Our analysis shows that both models recover the grid structure during training so that the embeddings corresponding to the points within the grid organize themselves in a 2D subspace and reflect the neighborhood structure of the grid. We also show that the Graph Neural Network we design for the task performs significantly better than the Transformer and is also easier to scale. |
|
Graph Representation Learning via Causal Diffusion for Out-of-Distribution Recommendation | 2025-04-02 | ShowGraph Neural Networks (GNNs)-based recommendation algorithms typically assume that training and testing data are drawn from independent and identically distributed (IID) spaces. However, this assumption often fails in the presence of out-of-distribution (OOD) data, resulting in significant performance degradation. In this study, we construct a Structural Causal Model (SCM) to analyze interaction data, revealing that environmental confounders (e.g., the COVID-19 pandemic) lead to unstable correlations in GNN-based models, thus impairing their generalization to OOD data. To address this issue, we propose a novel approach, graph representation learning via causal diffusion (CausalDiffRec) for OOD recommendation. This method enhances the model's generalization on OOD data by eliminating environmental confounding factors and learning invariant graph representations. Specifically, we use backdoor adjustment and variational inference to infer the real environmental distribution, thereby eliminating the impact of environmental confounders. This inferred distribution is then used as prior knowledge to guide the representation learning in the reverse phase of the diffusion process to learn the invariant representation. In addition, we provide a theoretical derivation that proves optimizing the objective function of CausalDiffRec can encourage the model to learn environment-invariant graph representations, thereby achieving excellent generalization performance in recommendations under distribution shifts. Our extensive experiments validate the effectiveness of CausalDiffRec in improving the generalization of OOD data, and the average improvement is up to 10.69% on Food, 18.83% on KuaiRec, 22.41% on Yelp2018, and 11.65% on Douban datasets. |
14 pa...14 pages, accepted by WWW2025 |
Identifying Obfuscated Code through Graph-Based Semantic Analysis of Binary Code | 2025-04-02 | ShowProtecting sensitive program content is a critical issue in various situations, ranging from legitimate use cases to unethical contexts. Obfuscation is one of the most used techniques to ensure such protection. Consequently, attackers must first detect and characterize obfuscation before launching any attack against it. This paper investigates the problem of function-level obfuscation detection using graph-based approaches, comparing algorithms, from elementary baselines to promising techniques like GNN (Graph Neural Networks), on different feature choices. We consider various obfuscation types and obfuscators, resulting in two complex datasets. Our findings demonstrate that GNNs need meaningful features that capture aspects of function semantics to outperform baselines. Our approach shows satisfactory results, especially in a challenging 11-class classification task and in a practical malware analysis example. |
The 1...The 13th International Conference on Complex Networks and their Applications, Dec 2024, Istabul, Turkey |
Adversarial Curriculum Graph-Free Knowledge Distillation for Graph Neural Networks | 2025-04-02 | ShowData-free Knowledge Distillation (DFKD) is a method that constructs pseudo-samples using a generator without real data, and transfers knowledge from a teacher model to a student by enforcing the student to overcome dimensional differences and learn to mimic the teacher's outputs on these pseudo-samples. In recent years, various studies in the vision domain have made notable advancements in this area. However, the varying topological structures and non-grid nature of graph data render the methods from the vision domain ineffective. Building upon prior research into differentiable methods for graph neural networks, we propose a fast and high-quality data-free knowledge distillation approach in this paper. Without compromising distillation quality, the proposed graph-free KD method (ACGKD) significantly reduces the spatial complexity of pseudo-graphs by leveraging the Binary Concrete distribution to model the graph structure and introducing a spatial complexity tuning parameter. This approach enables efficient gradient computation for the graph structure, thereby accelerating the overall distillation process. Additionally, ACGKD eliminates the dimensional ambiguity between the student and teacher models by increasing the student's dimensions and reusing the teacher's classifier. Moreover, it equips graph knowledge distillation with a CL-based strategy to ensure the student learns graph structures progressively. Extensive experiments demonstrate that ACGKD achieves state-of-the-art performance in distilling knowledge from GNNs without training data. |
|
Deep Graph Reinforcement Learning for UAV-Enabled Multi-User Secure Communications | 2025-04-02 | ShowWhile unmanned aerial vehicles (UAVs) with flexible mobility are envisioned to enhance physical layer security in wireless communications, the efficient security design that adapts to such high network dynamics is rather challenging. The conventional approaches extended from optimization perspectives are usually quite involved, especially when jointly considering factors in different scales such as deployment and transmission in UAV-related scenarios. In this paper, we address the UAV-enabled multi-user secure communications by proposing a deep graph reinforcement learning framework. Specifically, we reinterpret the security beamforming as a graph neural network (GNN) learning task, where mutual interference among users is managed through the message-passing mechanism. Then, the UAV deployment is obtained through soft actor-critic reinforcement learning, where the GNN-based security beamforming is exploited to guide the deployment strategy update. Simulation results demonstrate that the proposed approach achieves near-optimal security performance and significantly enhances the efficiency of strategy determination. Moreover, the deep graph reinforcement learning framework offers a scalable solution, adaptable to various network scenarios and configurations, establishing a robust basis for information security in UAV-enabled communications. |
Accepted at IEEE TMC |
Refining Interactions: Enhancing Anisotropy in Graph Neural Networks with Language Semantics | 2025-04-02 | ShowThe integration of Large Language Models (LLMs) with Graph Neural Networks (GNNs) has recently been explored to enhance the capabilities of Text Attribute Graphs (TAGs). Most existing methods feed textual descriptions of the graph structure or neighbouring nodes' text directly into LLMs. However, these approaches often cause LLMs to treat structural information simply as general contextual text, thus limiting their effectiveness in graph-related tasks. In this paper, we introduce LanSAGNN (Language Semantic Anisotropic Graph Neural Network), a framework that extends the concept of anisotropic GNNs to the natural language level. This model leverages LLMs to extract tailor-made semantic information for node pairs, effectively capturing the unique interactions within node relationships. In addition, we propose an efficient dual-layer LLMs finetuning architecture to better align LLMs' outputs with graph tasks. Experimental results demonstrate that LanSAGNN significantly enhances existing LLM-based methods without increasing complexity while also exhibiting strong robustness against interference. |
Accep...Accepted by ICME 2025 |
HCAF-DTA: drug-target binding affinity prediction with cross-attention fused hypergraph neural networks | 2025-04-02 | ShowAccurate prediction of the binding affinity between drugs and target proteins is a core task in computer-aided drug design. Existing deep learning methods tend to ignore the information of internal sub-structural features of drug molecules and drug-target interactions, resulting in limited prediction performance. In this paper, we propose a drug-target association prediction model HCAF-DTA based on cross-attention fusion hypergraph neural network. The model innovatively introduces hypergraph representation in the feature extraction stage: drug molecule hypergraphs are constructed based on the tree decomposition algorithm, and the sub-structural and global features extracted by fusing the hypergraph neural network with the graphical neural network through hopping connections, in which the hyper edges can efficiently characterise the functional functional groups and other key chemical features; for the protein feature extraction, a weighted graph is constructed based on the residues predicted by the ESM model contact maps to construct weighted graphs, and multilayer graph neural networks were used to capture spatial dependencies. In the prediction stage, a bidirectional multi-head cross-attention mechanism is designed to model intermolecular interactions from the dual viewpoints of atoms and amino acids, and cross-modal features with correlated information are fused by attention. Experiments on benchmark datasets such as Davis and KIBA show that HCAF-DTA outperforms state of the arts in all three performance evaluation metrics, with the MSE metrics reaching 0.198 and 0.122, respectively, with an improvement of up to 4% from the optimal baseline. |
|
Flexible and Explainable Graph Analysis for EEG-based Alzheimer's Disease Classification | 2025-04-02 | ShowAlzheimer's Disease is a progressive neurological disorder that is one of the most common forms of dementia. It leads to a decline in memory, reasoning ability, and behavior, especially in older people. The cause of Alzheimer's Disease is still under exploration and there is no all-inclusive theory that can explain the pathologies in each individual patient. Nevertheless, early intervention has been found to be effective in managing symptoms and slowing down the disease's progression. Recent research has utilized electroencephalography (EEG) data to identify biomarkers that distinguish Alzheimer's Disease patients from healthy individuals. Prior studies have used various machine learning methods, including deep learning and graph neural networks, to examine electroencephalography-based signals for identifying Alzheimer's Disease patients. In our research, we proposed a Flexible and Explainable Gated Graph Convolutional Network (GGCN) with Multi-Objective Tree-Structured Parzen Estimator (MOTPE) hyperparameter tuning. This provides a flexible solution that efficiently identifies the optimal number of GGCN blocks to achieve the optimized precision, specificity, and recall outcomes, as well as the optimized area under the Receiver Operating Characteristic (AUC). Our findings demonstrated a high efficacy with an over 0.9 Receiver Operating Characteristic score, alongside precision, specificity, and recall scores in distinguishing health control with Alzheimer's Disease patients in Moderate to Severe Dementia using the power spectrum density (PSD) of electroencephalography signals across various frequency bands. Moreover, our research enhanced the interpretability of the embedded adjacency matrices, revealing connectivity differences in frontal and parietal brain regions between Alzheimer's patients and healthy individuals. |
|
Learning Graph Quantized Tokenizers | 2025-04-02 | ShowTransformers serve as the backbone architectures of Foundational Models, where domain-specific tokenizers allow them to adapt to various domains. Graph Transformers (GTs) have recently emerged as leading models in geometric deep learning, outperforming Graph Neural Networks (GNNs) in various graph learning tasks. However, the development of tokenizers for graphs has lagged behind other modalities. To address this, we introduce GQT (\textbf{G}raph \textbf{Q}uantized \textbf{T}okenizer), which decouples tokenizer training from Transformer training by leveraging multi-task graph self-supervised learning, yielding robust and generalizable graph tokens. Furthermore, the GQT utilizes Residual Vector Quantization (RVQ) to learn hierarchical discrete tokens, resulting in significantly reduced memory requirements and improved generalization capabilities. By combining the GQT with token modulation, a Transformer encoder achieves state-of-the-art performance on 20 out of 22 benchmarks, including large-scale homophilic and heterophilic datasets. |
ICLR 2025 |
ComFairGNN: Community Fair Graph Neural Network | 2025-04-01 | ShowGraph Neural Networks (GNNs) have become the leading approach for addressing graph analytical problems in various real-world scenarios. However, GNNs may produce biased predictions against certain demographic subgroups due to node attributes and neighbors surrounding a node. Most current research on GNN fairness focuses predominantly on debiasing GNNs using oversimplified fairness evaluation metrics, which can give a misleading impression of fairness. Understanding the potential evaluation paradoxes due to the complicated nature of the graph structure is crucial for developing effective GNN debiasing mechanisms. In this paper, we examine the effectiveness of current GNN debiasing methods in terms of unfairness evaluation. Specifically, we introduce a community-level strategy to measure bias in GNNs and evaluate debiasing methods at this level. Further, We introduce ComFairGNN, a novel framework designed to mitigate community-level bias in GNNs. Our approach employs a learnable coreset-based debiasing function that addresses bias arising from diverse local neighborhood distributions during GNNs neighborhood aggregation. Comprehensive evaluations on three benchmark datasets demonstrate our model's effectiveness in both accuracy and fairness metrics. |
Publi...Published at PAKDD 2025 |
Neural Approaches to SAT Solving: Design Choices and Interpretability | 2025-04-01 | ShowIn this contribution, we provide a comprehensive evaluation of graph neural networks applied to Boolean satisfiability problems, accompanied by an intuitive explanation of the mechanisms enabling the model to generalize to different instances. We introduce several training improvements, particularly a novel closest assignment supervision method that dynamically adapts to the model's current state, significantly enhancing performance on problems with larger solution spaces. Our experiments demonstrate the suitability of variable-clause graph representations with recurrent neural network updates, which achieve good accuracy on SAT assignment prediction while reducing computational demands. We extend the base graph neural network into a diffusion model that facilitates incremental sampling and can be effectively combined with classical techniques like unit propagation. Through analysis of embedding space patterns and optimization trajectories, we show how these networks implicitly perform a process very similar to continuous relaxations of MaxSAT, offering an interpretable view of their reasoning process. This understanding guides our design choices and explains the ability of recurrent architectures to scale effectively at inference time beyond their training distribution, which we demonstrate with test-time scaling experiments. |
|
Efficient n-body simulations using physics informed graph neural networks | 2025-04-01 | ShowThis paper presents a novel approach for accelerating n-body simulations by integrating a physics-informed graph neural networks (GNN) with traditional numerical methods. Our method implements a leapfrog-based simulation engine to generate datasets from diverse astrophysical scenarios which are then transformed into graph representations. A custom-designed GNN is trained to predict particle accelerations with high precision. Experiments, conducted on 60 training and 6 testing simulations spanning from 3 to 500 bodies over 1000 time steps, demonstrate that the proposed model achieves extremely low prediction errors-loss values while maintaining robust long-term stability, with accumulated errors in position, velocity, and acceleration remaining insignificant. Furthermore, our method yields a modest speedup of approximately 17% over conventional simulation techniques. These results indicate that the integration of deep learning with traditional physical simulation methods offers a promising pathway to significantly enhance computational efficiency without compromising accuracy. |
10 pa...10 pages, 6 figures, 3 tables, accepted in conference MAEB 2025 (more info at https://www.uik.eus/es/curso/xvi-congreso-espanol-metaheuristicas-algoritmos-evolutivos-bioinspirados) |
GNN 101: Visual Learning of Graph Neural Networks in Your Web Browser | 2025-04-01 | ShowGraph Neural Networks (GNNs) have achieved significant success across various applications. However, their complex structures and inner workings can be challenging for non-AI experts to understand. To address this issue, this study presents \name{}, an educational visualization tool for interactive learning of GNNs. GNN 101 introduces a set of animated visualizations that seamlessly integrate mathematical formulas with visualizations via multiple levels of abstraction, including a model overview, layer operations, and detailed calculations. Users can easily switch between two complementary views: a node-link view that offers an intuitive understanding of the graph data, and a matrix view that provides a space-efficient and comprehensive overview of all features and their transformations across layers. GNN 101 was designed and developed based on close collaboration with four GNN experts and deployment in three GNN-related courses. We demonstrated the usability and effectiveness of GNN 101 via use cases and user studies with both GNN teaching assistants and students. To ensure broad educational access, GNN 101 is open-source and available directly in web browsers without requiring any installations. |
|
GKAN: Explainable Diagnosis of Alzheimer's Disease Using Graph Neural Network with Kolmogorov-Arnold Networks | 2025-04-01 | ShowAlzheimer's Disease (AD) is a progressive neurodegenerative disorder that poses significant diagnostic challenges due to its complex etiology. Graph Convolutional Networks (GCNs) have shown promise in modeling brain connectivity for AD diagnosis, yet their reliance on linear transformations limits their ability to capture intricate nonlinear patterns in neuroimaging data. To address this, we propose GCN-KAN, a novel single-modal framework that integrates Kolmogorov-Arnold Networks (KAN) into GCNs to enhance both diagnostic accuracy and interpretability. Leveraging structural MRI data, our model employs learnable spline-based transformations to better represent brain region interactions. Evaluated on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, GCN-KAN outperforms traditional GCNs by 4-8% in classification accuracy while providing interpretable insights into key brain regions associated with AD. This approach offers a robust and explainable tool for early AD diagnosis. |
12 pa...12 pages, 4 figures, under review of The Southwest Data Science Conference (SDSC 2025) |
Evaluating machine learning models for predicting pesticides toxicity to honey bees | 2025-04-01 | ShowSmall molecules play a critical role in the biomedical, environmental, and agrochemical domains, each with distinct physicochemical requirements and success criteria. Although biomedical research benefits from extensive datasets and established benchmarks, agrochemical data remain scarce, particularly with respect to species-specific toxicity. This work focuses on ApisTox, the most comprehensive dataset of experimentally validated chemical toxicity to the honey bee (Apis mellifera), an ecologically vital pollinator. We evaluate ApisTox using a diverse suite of machine learning approaches, including molecular fingerprints, graph kernels, and graph neural networks, as well as pretrained models. Comparative analysis with medicinal datasets from the MoleculeNet benchmark reveals that ApisTox represents a distinct chemical space. Performance degradation on non-medicinal datasets, such as ApisTox, demonstrates their limited generalizability of current state-of-the-art algorithms trained solely on biomedical data. Our study highlights the need for more diverse datasets and for targeted model development geared toward the agrochemical domain. |
|
Designing Heterogeneous GNNs with Desired Permutation Properties for Wireless Resource Allocation | 2025-04-01 | ShowGraph neural networks (GNNs) have been designed for learning a variety of wireless policies, i.e., the mappings from environment parameters to decision variables, thanks to their superior performance, and the potential in enabling scalability and size generalizability. These merits are rooted in leveraging permutation prior, i.e., satisfying the permutation property of the policy to be learned (referred to as desired permutation property). Many wireless policies are with complicated permutation properties. To satisfy these properties, heterogeneous GNNs (HetGNNs) should be used to learn such policies. There are two critical factors that enable a HetGNN to satisfy a desired permutation property: constructing an appropriate heterogeneous graph and judiciously designing the architecture of the HetGNN. However, both the graph and the HetGNN are designed heuristically so far. In this paper, we strive to provide a systematic approach for the design to satisfy the desired permutation property. We first propose a method for constructing a graph for a policy, where the edges and their types are defined for the sake of satisfying complicated permutation properties. Then, we provide and prove three sufficient conditions to design a HetGNN such that it can satisfy the desired permutation property when learning over an appropriate graph. These conditions suggest a method of designing the HetGNN with desired permutation property by sharing the processing, combining, and pooling functions according to the types of vertices and edges of the graph. We take power allocation and hybrid precoding policies as examples for demonstrating how to apply the proposed methods and validating the impact of the permutation prior by simulations. |
|
Simple yet Effective Node Property Prediction on Edge Streams under Distribution Shifts | 2025-04-01 | ShowThe problem of predicting node properties (e.g., node classes) in graphs has received significant attention due to its broad range of applications. Graphs from real-world datasets often evolve over time, with newly emerging edges and dynamically changing node properties, posing a significant challenge for this problem. In response, temporal graph neural networks (TGNNs) have been developed to predict dynamic node properties from a stream of emerging edges. However, our analysis reveals that most TGNN-based methods are (a) far less effective without proper node features and, due to their complex model architectures, (b) vulnerable to distribution shifts. In this paper, we propose SPLASH, a simple yet powerful method for predicting node properties on edge streams under distribution shifts. Our key contributions are as follows: (1) we propose feature augmentation methods and an automatic feature selection method for edge streams, which improve the effectiveness of TGNNs, (2) we propose a lightweight MLP-based TGNN architecture that is highly efficient and robust under distribution shifts, and (3) we conduct extensive experiments to evaluate the accuracy, efficiency, generalization, and qualitative performance of the proposed method and its competitors on dynamic node classification, dynamic anomaly detection, and node affinity prediction tasks across seven real-world datasets. |
14 pa...14 pages, 14 figures, To Appear in ICDE 2025 |
Lorentzian Graph Isomorphic Network | 2025-03-31 | ShowWe introduce the Lorentzian Graph Isomorphic Network (LGIN), a novel graph neural network (GNN) designed to operate in hyperbolic spaces, leveraging the Lorentzian model to enhance graph representation learning. Existing GNNs primarily operate in Euclidean spaces, which can limit their ability to capture hierarchical and multi-relational structures inherent to complex graphs. LGIN addresses this by incorporating curvature-aware aggregation functions that preserve the Lorentzian metric tensor, ensuring embeddings remain constrained within the hyperbolic space by proposing a new update rule that effectively captures both local neighborhood interactions and global structural properties, enabling LGIN to distinguish non-isomorphic graphs with expressiveness at least as powerful as the Weisfeiler-Lehman test. Through extensive evaluation across nine benchmark datasets, including molecular and protein structures, LGIN consistently outperforms or matches state-of-the-art GNNs, demonstrating its robustness and efficacy in modeling complex graph structures. To the best of our knowledge, this is the first study to extend the concept of a powerful graph neural network to Riemannian manifolds, paving the way for future advancements in hyperbolic graph learning. The code for our paper can be found at https://github.com/Deceptrax123/LGIN. |
Prepr...Preprint. Under Review |
Advances in Continual Graph Learning for Anti-Money Laundering Systems: A Comprehensive Review | 2025-03-31 | ShowFinancial institutions are required by regulation to report suspicious financial transactions related to money laundering. Therefore, they need to constantly monitor vast amounts of incoming and outgoing transactions. A particular challenge in detecting money laundering is that money launderers continuously adapt their tactics to evade detection. Hence, detection methods need constant fine-tuning. Traditional machine learning models suffer from catastrophic forgetting when fine-tuning the model on new data, thereby limiting their effectiveness in dynamic environments. Continual learning methods may address this issue and enhance current anti-money laundering (AML) practices, by allowing models to incorporate new information while retaining prior knowledge. Research on continual graph learning for AML, however, is still scarce. In this review, we critically evaluate state-of-the-art continual graph learning approaches for AML applications. We categorise methods into replay-based, regularization-based, and architecture-based strategies within the graph neural network (GNN) framework, and we provide in-depth experimental evaluations on both synthetic and real-world AML data sets that showcase the effect of the different hyperparameters. Our analysis demonstrates that continual learning improves model adaptability and robustness in the face of extreme class imbalances and evolving fraud patterns. Finally, we outline key challenges and propose directions for future research. |
|
Traffic Engineering in Large-scale Networks with Generalizable Graph Neural Networks | 2025-03-31 | ShowTraffic engineering (TE) in large-scale computer networks has become a fundamental yet challenging problem, owing to the swift growth of global-scale cloud wide-area networks or backbone low-Earth-orbit satellite constellations. To address the scalability issue of traditional TE algorithms, learning-based approaches have been proposed, showing potential of significant efficiency improvement over state-of-the-art methods. Nevertheless, the intrinsic limitations of existing learning-based methods hinder their practical application: they are not generalizable across diverse topologies and network conditions, incur excessive training overhead, and do not respect link capacities by default. This paper proposes TELGEN, a novel TE algorithm that learns to solve TE problems efficiently in large-scale networks, while achieving superior generalizability across diverse network conditions. TELGEN is based on the novel idea of transforming the problem of "predicting the optimal TE solution" into "predicting the optimal TE algorithm", which enables TELGEN to learn and efficiently approximate the end-to-end solving process of classical optimal TE algorithms. The learned algorithm is agnostic to the exact network topology or traffic patterns, and can efficiently solve TE problems given arbitrary inputs and generalize well to unseen topologies and demands. We trained and evaluated TELGEN on random and real-world networks with up to 5000 nodes and 106 links. TELGEN achieved less than 3% optimality gap while ensuring feasibility in all cases, even when the test network had up to 20x more nodes than the largest in training. It also saved up to 84% solving time than classical optimal solver, and could reduce training time per epoch and solving time by 2-4 orders of magnitude than latest learning algorithms on the largest networks. |
|
Backdoor Graph Condensation | 2025-03-31 | ShowGraph condensation has recently emerged as a prevalent technique to improve the training efficiency for graph neural networks (GNNs). It condenses a large graph into a small one such that a GNN trained on this small synthetic graph can achieve comparable performance to a GNN trained on the large graph. However, while existing graph condensation studies mainly focus on the best trade-off between graph size and the GNNs' performance (model utility), they overlook the security issues of graph condensation. To bridge this gap, we first explore backdoor attack against the GNNs trained on the condensed graphs. We introduce an effective backdoor attack against graph condensation, termed BGC. This attack aims to (1) preserve the condensed graph quality despite trigger injection, and (2) ensure trigger efficacy through the condensation process, achieving a high attack success rate. Specifically, BGC consistently updates triggers during condensation and targets representative nodes for poisoning. Extensive experiments demonstrate the effectiveness of our attack. BGC achieves a high attack success rate (close to 1.0) and good model utility in all cases. Furthermore, the results against multiple defense methods demonstrate BGC's resilience under their defenses. Finally, we analyze the key hyperparameters that influence the attack performance. Our code is available at: https://github.com/JiahaoWuGit/BGC. |
ICDE ...ICDE 2025 Camera Ready |
Graph Neural Network-Based Predictive Modeling for Robotic Plaster Printing | 2025-03-31 | ShowThis work proposes a Graph Neural Network (GNN) modeling approach to predict the resulting surface from a particle based fabrication process. The latter consists of spray-based printing of cementitious plaster on a wall and is facilitated with the use of a robotic arm. The predictions are computed using the robotic arm trajectory features, such as position, velocity and direction, as well as the printing process parameters. The proposed approach, based on a particle representation of the wall domain and the end effector, allows for the adoption of a graph-based solution. The GNN model consists of an encoder-processor-decoder architecture and is trained using data from laboratory tests, while the hyperparameters are optimized by means of a Bayesian scheme. The aim of this model is to act as a simulator of the printing process, and ultimately used for the generation of the robotic arm trajectory and the optimization of the printing parameters, towards the materialization of an autonomous plastering process. The performance of the proposed model is assessed in terms of the prediction error against unseen ground truth data, which shows its generality in varied scenarios, as well as in comparison with the performance of an existing benchmark model. The results demonstrate a significant improvement over the benchmark model, with notably better performance and enhanced error scaling across prediction steps. |
|
Inductive Graph Representation Learning with Quantum Graph Neural Networks | 2025-03-31 | ShowQuantum Graph Neural Networks (QGNNs) present a promising approach for combining quantum computing with graph-structured data processing. While classical Graph Neural Networks (GNNs) are renowned for their scalability and robustness, existing QGNNs often lack flexibility due to graph-specific quantum circuit designs, limiting their applicability to a narrower range of graph-structured problems, falling short of real-world scenarios. To address these limitations, we propose a versatile QGNN framework inspired by the classical GraphSAGE approach, utilizing quantum models as aggregators. In this work, we integrate established techniques for inductive representation learning on graphs with parametrized quantum convolutional and pooling layers, effectively bridging classical and quantum paradigms. The convolutional layer is flexible, enabling tailored designs for specific problems. Benchmarked on a node regression task with the QM9 dataset, we demonstrate that our framework successfully models a non-trivial molecular dataset, achieving performance comparable to classical GNNs. In particular, we show that our quantum approach exhibits robust generalization across molecules with varying numbers of atoms without requiring circuit modifications, slightly outperforming classical GNNs. Furthermore, we numerically investigate the scalability of the QGNN framework. Specifically, we demonstrate the absence of barren plateaus in our architecture as the number of qubits increases, suggesting that the proposed quantum model can be extended to handle larger and more complex graph-based problems effectively. |
18 pages, 6 figures |
A Concise Survey on Lane Topology Reasoning for HD Mapping | 2025-03-31 | ShowLane topology reasoning techniques play a crucial role in high-definition (HD) mapping and autonomous driving applications. While recent years have witnessed significant advances in this field, there has been limited effort to consolidate these works into a comprehensive overview. This survey systematically reviews the evolution and current state of lane topology reasoning methods, categorizing them into three major paradigms: procedural modeling-based methods, aerial imagery-based methods, and onboard sensors-based methods. We analyze the progression from early rule-based approaches to modern learning-based solutions utilizing transformers, graph neural networks (GNNs), and other deep learning architectures. The paper examines standardized evaluation metrics, including road-level measures (APLS and TLTS score), and lane-level metrics (DET and TOP score), along with performance comparisons on benchmark datasets such as OpenLane-V2. We identify key technical challenges, including dataset availability and model efficiency, and outline promising directions for future research. This comprehensive review provides researchers and practitioners with insights into the theoretical frameworks, practical implementations, and emerging trends in lane topology reasoning for HD mapping applications. |
Accep...Accepted by IEEE IV'25 |
Accelerating High-Efficiency Organic Photovoltaic Discovery via Pretrained Graph Neural Networks and Generative Reinforcement Learning | 2025-03-31 | ShowOrganic photovoltaic (OPV) materials offer a promising avenue toward cost-effective solar energy utilization. However, optimizing donor-acceptor (D-A) combinations to achieve high power conversion efficiency (PCE) remains a significant challenge. In this work, we propose a framework that integrates large-scale pretraining of graph neural networks (GNNs) with a GPT-2 (Generative Pretrained Transformer 2)-based reinforcement learning (RL) strategy to design OPV molecules with potentially high PCE. This approach produces candidate molecules with predicted efficiencies approaching 21%, although further experimental validation is required. Moreover, we conducted a preliminary fragment-level analysis to identify structural motifs recognized by the RL model that may contribute to enhanced PCE, thus providing design guidelines for the broader research community. To facilitate continued discovery, we are building the largest open-source OPV dataset to date, expected to include nearly 3,000 donor-acceptor pairs. Finally, we discuss plans to collaborate with experimental teams on synthesizing and characterizing AI-designed molecules, which will provide new data to refine and improve our predictive and generative models. |
AI fo...AI for Accelerated Materials Design - ICLR 2025 |
GNN-Based Candidate Node Predictor for Influence Maximization in Temporal Graphs | 2025-03-31 | ShowIn an age where information spreads rapidly across social media, effectively identifying influential nodes in dynamic networks is critical. Traditional influence maximization strategies often fail to keep up with rapidly evolving relationships and structures, leading to missed opportunities and inefficiencies. To address this, we propose a novel learning-based approach integrating Graph Neural Networks (GNNs) with Bidirectional Long Short-Term Memory (BiLSTM) models. This hybrid framework captures both structural and temporal dynamics, enabling accurate prediction of candidate nodes for seed set selection. The bidirectional nature of BiLSTM allows our model to analyze patterns from both past and future network states, ensuring adaptability to changes over time. By dynamically adapting to graph evolution at each time snapshot, our approach improves seed set calculation efficiency, achieving an average of 90% accuracy in predicting potential seed nodes across diverse networks. This significantly reduces computational overhead by optimizing the number of nodes evaluated for seed selection. Our method is particularly effective in fields like viral marketing and social network analysis, where understanding temporal dynamics is crucial. |
9 pag...9 pages, 5 figures, Accepted in AAAI25 to AI4TS Workshop@AAAI 2025 |
Graph neural networks extrapolate out-of-distribution for shortest paths | 2025-03-31 | ShowNeural networks (NNs), despite their success and wide adoption, still struggle to extrapolate out-of-distribution (OOD), i.e., to inputs that are not well-represented by their training dataset. Addressing the OOD generalization gap is crucial when models are deployed in environments significantly different from the training set, such as applying Graph Neural Networks (GNNs) trained on small graphs to large, real-world graphs. One promising approach for achieving robust OOD generalization is the framework of neural algorithmic alignment, which incorporates ideas from classical algorithms by designing neural architectures that resemble specific algorithmic paradigms (e.g. dynamic programming). The hope is that trained models of this form would have superior OOD capabilities, in much the same way that classical algorithms work for all instances. We rigorously analyze the role of algorithmic alignment in achieving OOD generalization, focusing on graph neural networks (GNNs) applied to the canonical shortest path problem. We prove that GNNs, trained to minimize a sparsity-regularized loss over a small set of shortest path instances, exactly implement the Bellman-Ford (BF) algorithm for shortest paths. In fact, if a GNN minimizes this loss within an error of |
|
Hierarchical graph sampling based minibatch learning with chain preservation and variance reduction | 2025-03-30 | ShowGraph sampling based Graph Convolutional Networks (GCNs) decouple the sampling from the forward and backward propagation during minibatch training, which exhibit good scalability in terms of layer depth and graph size. We propose HIS_GCNs, a hierarchical importance graph sampling based learning method. By constructing minibatches using sampled subgraphs, HIS_GCNs gives attention to the importance of both core and periphery nodes/edges in a scale-free training graph. Specifically, it preserves the centrum of the core to most minibatches, which maintains connectivity between periphery nodes, and samples periphery edges without core node interference, in order to keep more long chains composed entirely of low-degree nodes in the same minibatch. HIS_GCNs can maximize the discrete Ricci curvature (i.e., Ollivier-Ricci curvatures) of the edges in a subgraph that enables the preservation of important chains for information propagation, and can achieve a low node embedding variance and a high convergence speed. Diverse experiments on Graph Neural Networks (GNNs) with node classification tasks confirm superior performance of HIS_GCNs in both accuracy and training time. Open sourced code (https://github.com/HuQiaCHN/HIS-GCN). |
26 pages, 10 figures |
Simple Feedfoward Neural Networks are Almost All You Need for Time Series Forecasting | 2025-03-30 | ShowTime series data are everywhere -- from finance to healthcare -- and each domain brings its own unique complexities and structures. While advanced models like Transformers and graph neural networks (GNNs) have gained popularity in time series forecasting, largely due to their success in tasks like language modeling, their added complexity is not always necessary. In our work, we show that simple feedforward neural networks (SFNNs) can achieve performance on par with, or even exceeding, these state-of-the-art models, while being simpler, smaller, faster, and more robust. Our analysis indicates that, in many cases, univariate SFNNs are sufficient, implying that modeling interactions between multiple series may offer only marginal benefits. Even when inter-series relationships are strong, a basic multivariate SFNN still delivers competitive results. We also examine some key design choices and offer guidelines on making informed decisions. Additionally, we critique existing benchmarking practices and propose an improved evaluation protocol. Although SFNNs may not be optimal for every situation (hence the ``almost'' in our title) they serve as a strong baseline that future time series forecasting methods should always be compared against. |
|
Graph-Eq: Discovering Mathematical Equations using Graph Generative Models | 2025-03-30 | ShowThe ability to discover meaningful, accurate, and concise mathematical equations that describe datasets is valuable across various domains. Equations offer explicit relationships between variables, enabling deeper insights into underlying data patterns. Most existing equation discovery methods rely on genetic programming, which iteratively searches the equation space but is often slow and prone to overfitting. By representing equations as directed acyclic graphs, we leverage the use of graph neural networks to learn the underlying semantics of equations, and generate new, previously unseen equations. Although graph generative models have been shown to be successful in discovering new types of graphs in many fields, there application in discovering equations remains largely unexplored. In this work, we propose Graph-EQ, a deep graph generative model designed for efficient equation discovery. Graph-EQ uses a conditional variational autoencoder (CVAE) to learn a rich latent representation of the equation space by training it on a large corpus of equations in an unsupervised manner. Instead of directly searching the equation space, we employ Bayesian optimization to efficiently explore this learned latent space. We show that the encoder-decoder architecture of Graph-Eq is able to accurately reconstruct input equations. Moreover, we show that the learned latent representation can be sampled and decoded into valid equations, including new and previously unseen equations in the training data. Finally, we assess Graph-Eq's ability to discover equations that best fit a dataset by exploring the latent space using Bayesian optimization. Latent space exploration is done on 20 dataset with known ground-truth equations, and Graph-Eq is shown to successfully discover the grountruth equation in the majority of datasets. |
8 pages, 4 figures |
Krait: A Backdoor Attack Against Graph Prompt Tuning | 2025-03-30 | ShowGraph prompt tuning has emerged as a promising paradigm to effectively transfer general graph knowledge from pre-trained models to various downstream tasks, particularly in few-shot contexts. However, its susceptibility to backdoor attacks, where adversaries insert triggers to manipulate outcomes, raises a critical concern. We conduct the first study to investigate such vulnerability, revealing that backdoors can disguise benign graph prompts, thus evading detection. We introduce Krait, a novel graph prompt backdoor. Specifically, we propose a simple yet effective model-agnostic metric called label non-uniformity homophily to select poisoned candidates, significantly reducing computational complexity. To accommodate diverse attack scenarios and advanced attack types, we design three customizable trigger generation methods to craft prompts as triggers. We propose a novel centroid similarity-based loss function to optimize prompt tuning for attack effectiveness and stealthiness. Experiments on four real-world graphs demonstrate that Krait can efficiently embed triggers to merely 0.15% to 2% of training nodes, achieving high attack success rates without sacrificing clean accuracy. Notably, in one-to-one and all-to-one attacks, Krait can achieve 100% attack success rates by poisoning as few as 2 and 22 nodes, respectively. Our experiments further show that Krait remains potent across different transfer cases, attack types, and graph neural network backbones. Additionally, Krait can be successfully extended to the black-box setting, posing more severe threats. Finally, we analyze why Krait can evade both classical and state-of-the-art defenses, and provide practical insights for detecting and mitigating this class of attacks. |
Accep...Accepted by SaTML'2025 |
Question-Aware Knowledge Graph Prompting for Enhancing Large Language Models | 2025-03-30 | ShowLarge Language Models (LLMs) often struggle with tasks requiring external knowledge, such as knowledge-intensive Multiple Choice Question Answering (MCQA). Integrating Knowledge Graphs (KGs) can enhance reasoning; however, existing methods typically demand costly fine-tuning or retrieve noisy KG information. Recent approaches leverage Graph Neural Networks (GNNs) to generate KG-based input embedding prefixes as soft prompts for LLMs but fail to account for question relevance, resulting in noisy prompts. Moreover, in MCQA tasks, the absence of relevant KG knowledge for certain answer options remains a significant challenge. To address these issues, we propose Question-Aware Knowledge Graph Prompting (QAP), which incorporates question embeddings into GNN aggregation to dynamically assess KG relevance. QAP employs global attention to capture inter-option relationships, enriching soft prompts with inferred knowledge. Experimental results demonstrate that QAP outperforms state-of-the-art methods across multiple datasets, highlighting its effectiveness. |
|
POINT$^{2}$: A Polymer Informatics Training and Testing Database | 2025-03-30 | ShowThe advancement of polymer informatics has been significantly propelled by the integration of machine learning (ML) techniques, enabling the rapid prediction of polymer properties and expediting the discovery of high-performance polymeric materials. However, the field lacks a standardized workflow that encompasses prediction accuracy, uncertainty quantification, ML interpretability, and polymer synthesizability. In this study, we introduce POINT$^{2}$ (POlymer INformatics Training and Testing), a comprehensive benchmark database and protocol designed to address these critical challenges. Leveraging the existing labeled datasets and the unlabeled PI1M dataset, a collection of approximately one million virtual polymers generated via a recurrent neural network trained on the realistic polymers, we develop an ensemble of ML models, including Quantile Random Forests, Multilayer Perceptrons with dropout, Graph Neural Networks, and pretrained large language models. These models are coupled with diverse polymer representations such as Morgan, MACCS, RDKit, Topological, Atom Pair fingerprints, and graph-based descriptors to achieve property predictions, uncertainty estimations, model interpretability, and template-based polymerization synthesizability across a spectrum of properties, including gas permeability, thermal conductivity, glass transition temperature, melting temperature, fractional free volume, and density. The POINT$^{2}$ database can serve as a valuable resource for the polymer informatics community for polymer discovery and optimization. |
|
A Systematic Decade Review of Trip Route Planning with Travel Time Estimation based on User Preferences and Behavior | 2025-03-30 | ShowThis paper systematically explores the advancements in adaptive trip route planning and travel time estimation (TTE) through Artificial Intelligence (AI). With the increasing complexity of urban transportation systems, traditional navigation methods often struggle to accommodate dynamic user preferences, real-time traffic conditions, and scalability requirements. This study explores the contributions of established AI techniques, including Machine Learning (ML), Reinforcement Learning (RL), and Graph Neural Networks (GNNs), alongside emerging methodologies like Meta-Learning, Explainable AI (XAI), Generative AI, and Federated Learning. In addition to highlighting these innovations, the paper identifies critical challenges such as ethical concerns, computational scalability, and effective data integration, which must be addressed to advance the field. The paper concludes with recommendations for leveraging AI to build efficient, transparent, and sustainable navigation systems. |
6 pag...6 pages, 2 figures, 1 table |
TouchUp-G: Improving Feature Representation through Graph-Centric Finetuning | 2025-03-30 | ShowHow can we enhance the node features acquired from Pretrained Models (PMs) to better suit downstream graph learning tasks? Graph Neural Networks (GNNs) have become the state-of-the-art approach for many high-impact, real-world graph applications. For feature-rich graphs, a prevalent practice involves utilizing a PM directly to generate features, without incorporating any domain adaptation techniques. Nevertheless, this practice is suboptimal because the node features extracted from PM are graph-agnostic and prevent GNNs from fully utilizing the potential correlations between the graph structure and node features, leading to a decline in GNNs performance. In this work, we seek to improve the node features obtained from a PM for downstream graph tasks and introduce TOUCHUP-G, which has several advantages. It is (a) General: applicable to any downstream graph task, including link prediction which is often employed in recommender systems; (b) Multi-modal: able to improve raw features of any modality (e.g. images, texts, audio); (c) Principled: it is closely related to a novel metric, feature homophily, which we propose to quantify the potential correlations between the graph structure and node features and we show that TOUCHUP-G can effectively shrink the discrepancy between the graph structure and node features; (d) Effective: achieving state-of-the-art results on four real-world datasets spanning different tasks and modalities. |
SIGIR 2024 |
A QUBO Framework for Team Formation | 2025-03-29 | ShowThe team formation problem assumes a set of experts and a task, where each expert has a set of skills and the task requires some skills. The objective is to find a set of experts that maximizes coverage of the required skills while simultaneously minimizing the costs associated with the experts. Different definitions of cost have traditionally led to distinct problem formulations and algorithmic solutions. We introduce the unified TeamFormation formulation that captures all cost definitions for team formation problems that balance task coverage and expert cost. Specifically, we formulate three TeamFormation variants with different cost functions using quadratic unconstrained binary optimization (QUBO), and we evaluate two distinct general-purpose solution methods. We show that solutions based on the QUBO formulations of TeamFormation problems are at least as good as those produced by established baselines. Furthermore, we show that QUBO-based solutions leveraging graph neural networks can effectively learn representations of experts and skills to enable transfer learning, allowing node embeddings from one problem instance to be efficiently applied to another. |
|
Graph ODEs and Beyond: A Comprehensive Survey on Integrating Differential Equations with Graph Neural Networks | 2025-03-29 | ShowGraph Neural Networks (GNNs) and differential equations (DEs) are two rapidly advancing areas of research that have shown remarkable synergy in recent years. GNNs have emerged as powerful tools for learning on graph-structured data, while differential equations provide a principled framework for modeling continuous dynamics across time and space. The intersection of these fields has led to innovative approaches that leverage the strengths of both, enabling applications in physics-informed learning, spatiotemporal modeling, and scientific computing. This survey aims to provide a comprehensive overview of the burgeoning research at the intersection of GNNs and DEs. We will categorize existing methods, discuss their underlying principles, and highlight their applications across domains such as molecular modeling, traffic prediction, and epidemic spreading. Furthermore, we identify open challenges and outline future research directions to advance this interdisciplinary field. A comprehensive paper list is provided at https://github.com/Emory-Melody/Awesome-Graph-NDEs. This survey serves as a resource for researchers and practitioners seeking to understand and contribute to the fusion of GNNs and DEs |
|
Prediction of 30-day hospital readmission with clinical notes and EHR information | 2025-03-29 | ShowHigh hospital readmission rates are associated with significant costs and health risks for patients. Therefore, it is critical to develop predictive models that can support clinicians to determine whether or not a patient will return to the hospital in a relatively short period of time (e.g, 30-days). Nowadays, it is possible to collect both structured (electronic health records - EHR) and unstructured information (clinical notes) about a patient hospital event, all potentially containing relevant information for a predictive model. However, their integration is challenging. In this work we explore the combination of clinical notes and EHRs to predict 30-day hospital readmissions. We address the representation of the various types of information available in the EHR data, as well as exploring LLMs to characterize the clinical notes. We collect both information sources as the nodes of a graph neural network (GNN). Our model achieves an AUROC of 0.72 and a balanced accuracy of 66.7%, highlighting the importance of combining the multimodal information. |
|
ADAGE: Active Defenses Against GNN Extraction | 2025-03-29 | ShowGraph Neural Networks (GNNs) achieve high performance in various real-world applications, such as drug discovery, traffic states prediction, and recommendation systems. The fact that building powerful GNNs requires a large amount of training data, powerful computing resources, and human expertise turns the models into lucrative targets for model stealing attacks. Prior work has revealed that the threat vector of stealing attacks against GNNs is large and diverse, as an attacker can leverage various heterogeneous signals ranging from node labels to high-dimensional node embeddings to create a local copy of the target GNN at a fraction of the original training costs. This diversity in the threat vector renders the design of effective and general defenses challenging and existing defenses usually focus on one particular stealing setup. Additionally, they solely provide means to identify stolen model copies rather than preventing the attack. To close this gap, we propose the first and general Active Defense Against GNN Extraction (ADAGE). By analyzing the queries to the GNN, tracking their diversity in terms of proximity to different communities identified in the underlying graph, and increasing the defense strength with the growing fraction of communities that have been queried, ADAGE can prevent stealing in all common attack setups. Our extensive experimental evaluation using six benchmark datasets, four GNN models, and three types of adaptive attackers shows that ADAGE penalizes attackers to the degree of rendering stealing impossible, whilst not harming predictive performance for legitimate users. ADAGE, thereby, contributes towards securely sharing valuable GNNs in the future. |
Not a...Not all authors have given their explicit consent |
MIL vs. Aggregation: Evaluating Patient-Level Survival Prediction Strategies Using Graph-Based Learning | 2025-03-29 | ShowOncologists often rely on a multitude of data, including whole-slide images (WSIs), to guide therapeutic decisions, aiming for the best patient outcome. However, predicting the prognosis of cancer patients can be a challenging task due to tumor heterogeneity and intra-patient variability, and the complexity of analyzing WSIs. These images are extremely large, containing billions of pixels, making direct processing computationally expensive and requiring specialized methods to extract relevant information. Additionally, multiple WSIs from the same patient may capture different tumor regions, some being more informative than others. This raises a fundamental question: Should we use all WSIs to characterize the patient, or should we identify the most representative slide for prognosis? Our work seeks to answer this question by performing a comparison of various strategies for predicting survival at the WSI and patient level. The former treats each WSI as an independent sample, mimicking the strategy adopted in other works, while the latter comprises methods to either aggregate the predictions of the several WSIs or automatically identify the most relevant slide using multiple-instance learning (MIL). Additionally, we evaluate different Graph Neural Networks architectures under these strategies. We conduct our experiments using the MMIST-ccRCC dataset, which comprises patients with clear cell renal cell carcinoma (ccRCC). Our results show that MIL-based selection improves accuracy, suggesting that choosing the most representative slide benefits survival prediction. |
|
DP-GPL: Differentially Private Graph Prompt Learning | 2025-03-29 | ShowGraph Neural Networks (GNNs) have shown remarkable performance in various applications. Recently, graph prompt learning has emerged as a powerful GNN training paradigm, inspired by advances in language and vision foundation models. Here, a GNN is pre-trained on public data and then adapted to sensitive tasks using lightweight graph prompts. However, using prompts from sensitive data poses privacy risks. In this work, we are the first to investigate these practical risks in graph prompts by instantiating a membership inference attack that reveals significant privacy leakage. We also find that the standard privacy method, DP-SGD, fails to provide practical privacy-utility trade-offs in graph prompt learning, likely due to the small number of sensitive data points used to learn the prompts. As a solution, we propose DP-GPL for differentially private graph prompt learning based on the PATE framework, that generates a graph prompt with differential privacy guarantees. Our evaluation across various graph prompt learning methods, GNN architectures, and pre-training strategies demonstrates that our algorithm achieves high utility at strong privacy, effectively mitigating privacy concerns while preserving the powerful capabilities of prompted GNNs as powerful foundation models in the graph domain. |
Not a...Not all authors have given their explicit consent |
AuditVotes: A Framework Towards More Deployable Certified Robustness for Graph Neural Networks | 2025-03-29 | ShowDespite advancements in Graph Neural Networks (GNNs), adaptive attacks continue to challenge their robustness. Certified robustness based on randomized smoothing has emerged as a promising solution, offering provable guarantees that a model's predictions remain stable under adversarial perturbations within a specified range. However, existing methods face a critical trade-off between accuracy and robustness, as achieving stronger robustness requires introducing greater noise into the input graph. This excessive randomization degrades data quality and disrupts prediction consistency, limiting the practical deployment of certifiably robust GNNs in real-world scenarios where both accuracy and robustness are essential. To address this challenge, we propose \textbf{AuditVotes}, the first framework to achieve both high clean accuracy and certifiably robust accuracy for GNNs. It integrates randomized smoothing with two key components, \underline{au}gmentation and con\underline{dit}ional smoothing, aiming to improve data quality and prediction consistency. The augmentation, acting as a pre-processing step, de-noises the randomized graph, significantly improving data quality and clean accuracy. The conditional smoothing, serving as a post-processing step, employs a filtering function to selectively count votes, thereby filtering low-quality predictions and improving voting consistency. Extensive experimental results demonstrate that AuditVotes significantly enhances clean accuracy, certified robustness, and empirical robustness while maintaining high computational efficiency. Notably, compared to baseline randomized smoothing, AuditVotes improves clean accuracy by |
20 pages |
SuperEIO: Self-Supervised Event Feature Learning for Event Inertial Odometry | 2025-03-29 | ShowEvent cameras asynchronously output low-latency event streams, promising for state estimation in high-speed motion and challenging lighting conditions. As opposed to frame-based cameras, the motion-dependent nature of event cameras presents persistent challenges in achieving robust event feature detection and matching. In recent years, learning-based approaches have demonstrated superior robustness over traditional handcrafted methods in feature detection and matching, particularly under aggressive motion and HDR scenarios. In this paper, we propose SuperEIO, a novel framework that leverages the learning-based event-only detection and IMU measurements to achieve event-inertial odometry. Our event-only feature detection employs a convolutional neural network under continuous event streams. Moreover, our system adopts the graph neural network to achieve event descriptor matching for loop closure. The proposed system utilizes TensorRT to accelerate the inference speed of deep networks, which ensures low-latency processing and robust real-time operation on resource-limited platforms. Besides, we evaluate our method extensively on multiple public datasets, demonstrating its superior accuracy and robustness compared to other state-of-the-art event-based methods. We have also open-sourced our pipeline to facilitate research in the field: https://github.com/arclab-hku/SuperEIO. |
|
Graph Kolmogorov-Arnold Networks for Multi-Cancer Classification and Biomarker Identification, An Interpretable Multi-Omics Approach | 2025-03-29 | ShowThe integration of multi-omics data presents a major challenge in precision medicine, requiring advanced computational methods for accurate disease classification and biological interpretation. This study introduces the Multi-Omics Graph Kolmogorov-Arnold Network (MOGKAN), a deep learning model that integrates messenger RNA, micro RNA sequences, and DNA methylation data with Protein-Protein Interaction (PPI) networks for accurate and interpretable cancer classification across 31 cancer types. MOGKAN employs a hybrid approach combining differential expression with DESeq2, Linear Models for Microarray (LIMMA), and Least Absolute Shrinkage and Selection Operator (LASSO) regression to reduce multi-omics data dimensionality while preserving relevant biological features. The model architecture is based on the Kolmogorov-Arnold theorem principle, using trainable univariate functions to enhance interpretability and feature analysis. MOGKAN achieves classification accuracy of 96.28 percent and demonstrates low experimental variability with a standard deviation that is reduced by 1.58 to 7.30 percents compared to Convolutional Neural Networks (CNNs) and Graph Neural Networks (GNNs). The biomarkers identified by MOGKAN have been validated as cancer-related markers through Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis. The proposed model presents an ability to uncover molecular oncogenesis mechanisms by detecting phosphoinositide-binding substances and regulating sphingolipid cellular processes. By integrating multi-omics data with graph-based deep learning, our proposed approach demonstrates superior predictive performance and interpretability that has the potential to enhance the translation of complex multi-omics data into clinically actionable cancer diagnostics. |
|
Interpretability of Graph Neural Networks to Assess Effects of Global Change Drivers on Ecological Networks | 2025-03-28 | ShowPollinators play a crucial role for plant reproduction, either in natural ecosystem or in human-modified landscape. Global change drivers,including climate change or land use modifications, can alter the plant-pollinator interactions. To assess the potential influence of global change drivers on pollination, large-scale interactions, climate and land use data are required. While recent machine learning methods, such as graph neural networks (GNNs), allow the analysis of such datasets, interpreting their results can be challenging. We explore existing methods for interpreting GNNs in order to highlight the effects of various environmental covariates on pollination network connectivity. A large simulation study is performed to confirm whether these methods can detect the interactive effect between a covariate and a genus of plant on connectivity, and whether the application of debiasing techniques influences the estimation of these effects. An application on the Spipoll dataset, with and without accounting for sampling effects, highlights the potential impact of land use on network connectivity and shows that accounting for sampling effects partially alters the estimation of these effects. |
|
MCI-GRU: Stock Prediction Model Based on Multi-Head Cross-Attention and Improved GRU | 2025-03-28 | ShowAs financial markets grow increasingly complex in the big data era, accurate stock prediction has become more critical. Traditional time series models, such as GRUs, have been widely used but often struggle to capture the intricate nonlinear dynamics of markets, particularly in the flexible selection and effective utilization of key historical information. Recently, methods like Graph Neural Networks and Reinforcement Learning have shown promise in stock prediction but require high data quality and quantity, and they tend to exhibit instability when dealing with data sparsity and noise. Moreover, the training and inference processes for these models are typically complex and computationally expensive, limiting their broad deployment in practical applications. Existing approaches also generally struggle to capture unobservable latent market states effectively, such as market sentiment and expectations, microstructural factors, and participant behavior patterns, leading to an inadequate understanding of market dynamics and subsequently impact prediction accuracy. To address these challenges, this paper proposes a stock prediction model, MCI-GRU, based on a multi-head cross-attention mechanism and an improved GRU. First, we enhance the GRU model by replacing the reset gate with an attention mechanism, thereby increasing the model's flexibility in selecting and utilizing historical information. Second, we design a multi-head cross-attention mechanism for learning unobservable latent market state representations, which are further enriched through interactions with both temporal features and cross-sectional features. Finally, extensive experiments on four main stock markets show that the proposed method outperforms SOTA techniques across multiple metrics. Additionally, its successful application in real-world fund management operations confirms its effectiveness and practicality. |
|
Comparing Methods for Bias Mitigation in Graph Neural Networks | 2025-03-28 | ShowThis paper examines the critical role of Graph Neural Networks (GNNs) in data preparation for generative artificial intelligence (GenAI) systems, with a particular focus on addressing and mitigating biases. We present a comparative analysis of three distinct methods for bias mitigation: data sparsification, feature modification, and synthetic data augmentation. Through experimental analysis using the german credit dataset, we evaluate these approaches using multiple fairness metrics, including statistical parity, equality of opportunity, and false positive rates. Our research demonstrates that while all methods improve fairness metrics compared to the original dataset, stratified sampling and synthetic data augmentation using GraphSAGE prove particularly effective in balancing demographic representation while maintaining model performance. The results provide practical insights for developing more equitable AI systems while maintaining model performance. |
|
Efficient Data Selection for Training Genomic Perturbation Models | 2025-03-28 | ShowGenomic studies, including CRISPR-based PerturbSeq analyses, face a vast hypothesis space, while gene perturbations remain costly and time-consuming. Gene expression models based on graph neural networks are trained to predict the outcomes of gene perturbations to facilitate such experiments. Active learning methods are often employed to train these models due to the cost of the genomic experiments required to build the training set. However, poor model initialization in active learning can result in suboptimal early selections, wasting time and valuable resources. While typical active learning mitigates this issue over many iterations, the limited number of experimental cycles in genomic studies exacerbates the risk. To this end, we propose graph-based one-shot data selection methods for training gene expression models. Unlike active learning, one-shot data selection predefines the gene perturbations before training, hence removing the initialization bias. The data selection is motivated by theoretical studies of graph neural network generalization. The criteria are defined over the input graph and are optimized with submodular maximization. We compare them empirically to baselines and active learning methods that are state-of-the-art on this problem. The results demonstrate that graph-based one-shot data selection achieves comparable accuracy while alleviating the aforementioned risks. |
19 pages |
CFiCS: Graph-Based Classification of Common Factors and Microcounseling Skills | 2025-03-28 | ShowCommon factors and microcounseling skills are critical to the effectiveness of psychotherapy. Understanding and measuring these elements provides valuable insights into therapeutic processes and outcomes. However, automatic identification of these change principles from textual data remains challenging due to the nuanced and context-dependent nature of therapeutic dialogue. This paper introduces CFiCS, a hierarchical classification framework integrating graph machine learning with pretrained contextual embeddings. We represent common factors, intervention concepts, and microcounseling skills as a heterogeneous graph, where textual information from ClinicalBERT enriches each node. This structure captures both the hierarchical relationships (e.g., skill-level nodes linking to broad factors) and the semantic properties of therapeutic concepts. By leveraging graph neural networks, CFiCS learns inductive node embeddings that generalize to unseen text samples lacking explicit connections. Our results demonstrate that integrating ClinicalBERT node features and graph structure significantly improves classification performance, especially in fine-grained skill prediction. CFiCS achieves substantial gains in both micro and macro F1 scores across all tasks compared to baselines, including random forests, BERT-based multi-task models, and graph-based methods. |
10 pa...10 pages, 3 figures, 2 tables |
Invariant Control Strategies for Active Flow Control using Graph Neural Networks | 2025-03-28 | ShowReinforcement learning has gained traction for active flow control tasks, with initial applications exploring drag mitigation via flow field augmentation around a two-dimensional cylinder. RL has since been extended to more complex turbulent flows and has shown significant potential in learning complex control strategies. However, such applications remain computationally challenging due to its sample inefficiency and associated simulation costs. This fact is worsened by the lack of generalization capabilities of these trained policy networks, often being implicitly tied to the input configurations of their training conditions. In this work, we propose the use of graph neural networks to address this particular limitation, effectively increasing the range of applicability and getting more value out of the upfront RL training cost. GNNs can naturally process unstructured, threedimensional flow data, preserving spatial relationships without the constraints of a Cartesian grid. Additionally, they incorporate rotational, reflectional, and permutation invariance into the learned control policies, thus improving generalization and thereby removing the shortcomings of commonly used CNN or MLP architectures. To demonstrate the effectiveness of this approach, we revisit the well-established two-dimensional cylinder benchmark problem for active flow control. The RL training is implemented using Relexi, a high-performance RL framework, with flow simulations conducted in parallel using the high-order discontinuous Galerkin framework FLEXI. Our results show that GNN-based control policies achieve comparable performance to existing methods while benefiting from improved generalization properties. This work establishes GNNs as a promising architecture for RL-based flow control and highlights the capabilities of Relexi and FLEXI for large-scale RL applications in fluid dynamics. |
|
Data-driven modeling of fluid flow around rotating structures with graph neural networks | 2025-03-28 | ShowGraph neural networks, recently introduced into the field of fluid flow surrogate modeling, have been successfully applied to model the temporal evolution of various fluid flow systems. Existing applications, however, are mostly restricted to cases where the domain is time-invariant. The present work extends the application of graph neural network-based modeling to fluid flow around structures rotating with respect to a certain axis. Specifically, we propose to apply a graph neural network-based surrogate modeling for fluid flow with the mesh corotating with the structure. Unlike conventional data-driven approaches that rely on structured Cartesian meshes, our framework operates on unstructured co-rotating meshes, enforcing rotation equivariance of the learned model by leveraging co-rotating polar (2D) and cylindrical (3D) coordinate systems. To model the pressure for systems without Dirichlet pressure boundaries, we propose a novel local directed pressure difference formulation that is invariant to the reference pressure point and value. For flow systems with large mesh sizes, we introduce a scheme to train the network in single or distributed graphics processing units by accumulating the backpropagated gradients from partitions of the mesh. The effectiveness of our proposed framework is examined on two test cases: (i) fluid flow in a 2D rotating mixer, and (ii) the flow past a 3D rotating cube. Our results show that the model achieves stable and accurate rollouts for over 2000 time steps in periodic regimes while capturing accurate short-term dynamics in chaotic flow regimes. In addition, the drag and lift force predictions closely match the CFD calculations, highlighting the potential of the framework for modeling both periodic and chaotic fluid flow around rotating structures. |
|
A Social Dynamical System for Twitter Analysis | 2025-03-28 | ShowUnderstanding the evolution of public opinion is crucial for informed decision-making in various domains, particularly public affairs. The rapid growth of social networks, such as Twitter (now rebranded as X), provides an unprecedented opportunity to analyze public opinion at scale without relying on traditional surveys. With the rise of deep learning, Graph Neural Networks (GNNs) have shown great promise in modeling online opinion dynamics. Notably, classical opinion dynamics models, such as DeGroot, can be reformulated within a GNN framework. We introduce Latent Social Dynamical System (LSDS), a novel framework for modeling the latent dynamics of social media users' opinions based on textual content. Since expressed opinions may not fully reflect underlying beliefs, LSDS first encodes post content into latent representations. It then leverages a GraphODE framework, using a GNN-based ODE function to predict future opinions. A decoder subsequently utilizes these predicted latent opinions to perform downstream tasks, such as interaction prediction, which serve as benchmarks for model evaluation. Our framework is highly flexible, supporting various opinion dynamic models as ODE functions, provided they can be adapted into a GNN-based form. It also accommodates different encoder architectures and is compatible with diverse downstream tasks. To validate our approach, we constructed dynamic datasets from Twitter data. Experimental results demonstrate the effectiveness of LSDS, highlighting its potential for future applications. We plan to publicly release our dataset and code upon the publication of this paper. |
will ...will be submitted to a journal soon |
Few-Shot Graph Out-of-Distribution Detection with LLMs | 2025-03-28 | ShowExisting methods for graph out-of-distribution (OOD) detection typically depend on training graph neural network (GNN) classifiers using a substantial amount of labeled in-distribution (ID) data. However, acquiring high-quality labeled nodes in text-attributed graphs (TAGs) is challenging and costly due to their complex textual and structural characteristics. Large language models (LLMs), known for their powerful zero-shot capabilities in textual tasks, show promise but struggle to naturally capture the critical structural information inherent to TAGs, limiting their direct effectiveness. To address these challenges, we propose LLM-GOOD, a general framework that effectively combines the strengths of LLMs and GNNs to enhance data efficiency in graph OOD detection. Specifically, we first leverage LLMs' strong zero-shot capabilities to filter out likely OOD nodes, significantly reducing the human annotation burden. To minimize the usage and cost of the LLM, we employ it only to annotate a small subset of unlabeled nodes. We then train a lightweight GNN filter using these noisy labels, enabling efficient predictions of ID status for all other unlabeled nodes by leveraging both textual and structural information. After obtaining node embeddings from the GNN filter, we can apply informativeness-based methods to select the most valuable nodes for precise human annotation. Finally, we train the target ID classifier using these accurately annotated ID nodes. Extensive experiments on four real-world TAG datasets demonstrate that LLM-GOOD significantly reduces human annotation costs and outperforms state-of-the-art baselines in terms of both ID classification accuracy and OOD detection performance. |
|
Graph Sampling for Scalable and Expressive Graph Neural Networks on Homophilic Graphs | 2025-03-27 | ShowGraph Neural Networks (GNNs) excel in many graph machine learning tasks but face challenges when scaling to large networks. GNN transferability allows training on smaller graphs and applying the model to larger ones, but existing methods often rely on random subsampling, leading to disconnected subgraphs and reduced model expressivity. We propose a novel graph sampling algorithm that leverages feature homophily to preserve graph structure. By minimizing the trace of the data correlation matrix, our method better preserves the graph Laplacian trace -- a proxy for the graph connectivity -- than random sampling, while achieving lower complexity than spectral methods. Experiments on citation networks show improved performance in preserving Laplacian trace and GNN transferability compared to random sampling. |
|
Improving Equivariant Networks with Probabilistic Symmetry Breaking | 2025-03-27 | ShowEquivariance encodes known symmetries into neural networks, often enhancing generalization. However, equivariant networks cannot break symmetries: the output of an equivariant network must, by definition, have at least the same self-symmetries as the input. This poses an important problem, both (1) for prediction tasks on domains where self-symmetries are common, and (2) for generative models, which must break symmetries in order to reconstruct from highly symmetric latent spaces. This fundamental limitation can be addressed by considering equivariant conditional distributions, instead of equivariant functions. We present novel theoretical results that establish necessary and sufficient conditions for representing such distributions. Concretely, this representation provides a practical framework for breaking symmetries in any equivariant network via randomized canonicalization. Our method, SymPE (Symmetry-breaking Positional Encodings), admits a simple interpretation in terms of positional encodings. This approach expands the representational power of equivariant networks while retaining the inductive bias of symmetry, which we justify through generalization bounds. Experimental results demonstrate that SymPE significantly improves performance of group-equivariant and graph neural networks across diffusion models for graphs, graph autoencoders, and lattice spin system modeling. |
28 pages, 7 figures |
A Comprehensive Benchmark for RNA 3D Structure-Function Modeling | 2025-03-27 | ShowThe RNA structure-function relationship has recently garnered significant attention within the deep learning community, promising to grow in importance as nucleic acid structure models advance. However, the absence of standardized and accessible benchmarks for deep learning on RNA 3D structures has impeded the development of models for RNA functional characteristics. In this work, we introduce a set of seven benchmarking datasets for RNA structure-function prediction, designed to address this gap. Our library builds on the established Python library rnaglib, and offers easy data distribution and encoding, splitters and evaluation methods, providing a convenient all-in-one framework for comparing models. Datasets are implemented in a fully modular and reproducible manner, facilitating for community contributions and customization. Finally, we provide initial baseline results for all tasks using a graph neural network. Source code: https://github.com/cgoliver/rnaglib Documentation: https://rnaglib.org |
|
GNNMerge: Merging of GNN Models Without Accessing Training Data | 2025-03-27 | ShowModel merging has gained prominence in machine learning as a method to integrate multiple trained models into a single model without accessing the original training data. While existing approaches have demonstrated success in domains such as computer vision and NLP, their application to Graph Neural Networks (GNNs) remains unexplored. These methods often rely on the assumption of shared initialization, which is seldom applicable to GNNs. In this work, we undertake the first benchmarking study of model merging algorithms for GNNs, revealing their limited effectiveness in this context. To address these challenges, we propose GNNMerge, which utilizes a task-agnostic node embedding alignment strategy to merge GNNs. Furthermore, we establish that under a mild relaxation, the proposed optimization objective admits direct analytical solutions for widely used GNN architectures, significantly enhancing its computational efficiency. Empirical evaluations across diverse datasets, tasks, and architectures establish GNNMerge to be up to 24% more accurate than existing methods while delivering over 2 orders of magnitude speed-up compared to training from scratch. |
|
Fusion of Graph Neural Networks via Optimal Transport | 2025-03-27 | ShowIn this paper, we explore the idea of combining GCNs into one model. To that end, we align the weights of different models layer-wise using optimal transport (OT). We present and evaluate three types of transportation costs and show that the studied fusion method consistently outperforms the performance of vanilla averaging. Finally, we present results suggesting that model fusion using OT is harder in the case of GCNs than MLPs and that incorporating the graph structure into the process does not improve the performance of the method. |
|
Deep Cut-informed Graph Embedding and Clustering | 2025-03-27 | ShowGraph clustering aims to divide the graph into different clusters. The recently emerging deep graph clustering approaches are largely built on graph neural networks (GNN). However, GNN is designed for general graph encoding and there is a common issue of representation collapse in existing GNN-based deep graph clustering algorithms. We attribute two main reasons for such issues: (i) the inductive bias of GNN models: GNNs tend to generate similar representations for proximal nodes. Since graphs often contain a non-negligible amount of inter-cluster links, the bias results in error message passing and leads to biased clustering; (ii) the clustering guided loss function: most traditional approaches strive to make all samples closer to pre-learned cluster centers, which causes a degenerate solution assigning all data points to a single label thus make all samples and less discriminative. To address these challenges, we investigate graph clustering from a graph cut perspective and propose an innovative and non-GNN-based Deep Cut-informed Graph embedding and Clustering framework, namely DCGC. This framework includes two modules: (i) cut-informed graph encoding; (ii) self-supervised graph clustering via optimal transport. For the encoding module, we derive a cut-informed graph embedding objective to fuse graph structure and attributes by minimizing their joint normalized cut. For the clustering module, we utilize the optimal transport theory to obtain the clustering assignments, which can balance the guidance of "proximity to the pre-learned cluster center". With the above two tailored designs, DCGC is more suitable for the graph clustering task, which can effectively alleviate the problem of representation collapse and achieve better performance. We conduct extensive experiments to demonstrate that our method is simple but effective compared with benchmarks. |
|
CombiGCN: An effective GCN model for Recommender System | 2025-03-27 | ShowGraph Neural Networks (GNNs) have opened up a potential line of research for collaborative filtering (CF). The key power of GNNs is based on injecting collaborative signal into user and item embeddings which will contain information about user-item interactions after that. However, there are still some unsatisfactory points for a CF model that GNNs could have done better. The way in which the collaborative signal are extracted through an implicit feedback matrix that is essentially built on top of the message-passing architecture of GNNs, and it only helps to update the embedding based on the value of the items (or users) embeddings neighboring. By identifying the similarity weight of users through their interaction history, a key concept of CF, we endeavor to build a user-user weighted connection graph based on their similarity weight. In this study, we propose a recommendation framework, CombiGCN, in which item embeddings are only linearly propagated on the user-item interaction graph, while user embeddings are propagated simultaneously on both the user-user weighted connection graph and user-item interaction graph graphs with Light Graph Convolution (LGC) and combined in a simpler method by using the weighted sum of the embeddings for each layer. We also conducted experiments comparing CombiGCN with several state-of-the-art models on three real-world datasets. |
|
Improvement Graph Convolution Collaborative Filtering with Weighted addition input | 2025-03-27 | ShowGraph Neural Networks have been extensively applied in the field of machine learning to find features of graphs, and recommendation systems are no exception. The ratings of users on considered items can be represented by graphs which are input for many efficient models to find out the characteristics of the users and the items. From these insights, relevant items are recommended to users. However, user's decisions on the items have varying degrees of effects on different users, and this information should be learned so as not to be lost in the process of information mining. In this publication, we propose to build an additional graph showing the recommended weight of an item to a target user to improve the accuracy of GNN models. Although the users' friendships were not recorded, their correlation was still evident through the commonalities in consumption behavior. We build a model WiGCN (Weighted input GCN) to describe and experiment on well-known datasets. Conclusions will be stated after comparing our results with state-of-the-art such as GCMC, NGCF and LightGCN. The source code is also included at https://github.com/trantin84/WiGCN. |
|
Graph-to-Vision: Multi-graph Understanding and Reasoning using Vision-Language Models | 2025-03-27 | ShowGraph Neural Networks (GNNs), as the dominant paradigm for graph-structured learning, have long faced dual challenges of exponentially escalating computational complexity and inadequate cross-scenario generalization capability. With the rapid advancement of multimodal learning, Vision-Language Models (VLMs) have demonstrated exceptional cross-modal relational reasoning capabilities and generalization capacities, thereby opening up novel pathways for overcoming the inherent limitations of conventional graph learning paradigms. However, current research predominantly concentrates on investigating the single-graph reasoning capabilities of VLMs, which fundamentally fails to address the critical requirement for coordinated reasoning across multiple heterogeneous graph data in real-world application scenarios. To address these limitations, we propose the first multi-graph joint reasoning benchmark for VLMs. Our benchmark encompasses four graph categories: knowledge graphs, flowcharts, mind maps, and route maps,with each graph group accompanied by three progressively challenging instruction-response pairs. Leveraging this benchmark, we conducted comprehensive capability assessments of state-of-the-art VLMs and performed fine-tuning on open-source models. This study not only addresses the underexplored evaluation gap in multi-graph reasoning for VLMs but also empirically validates their generalization superiority in graph-structured learning. |
|
A Logic for Reasoning About Aggregate-Combine Graph Neural Networks | 2025-03-27 | ShowWe propose a modal logic in which counting modalities appear in linear inequalities. We show that each formula can be transformed into an equivalent graph neural network (GNN). We also show that a broad class of GNNs can be transformed efficiently into a formula, thus significantly improving upon the literature about the logical expressiveness of GNNs. We also show that the satisfiability problem is PSPACE-complete. These results bring together the promise of using standard logical methods for reasoning about GNNs and their properties, particularly in applications such as GNN querying, equivalence checking, etc. We prove that such natural problems can be solved in polynomial space. |
arXiv...arXiv admin note: text overlap with arXiv:2307.05150 |
AugWard: Augmentation-Aware Representation Learning for Accurate Graph Classification | 2025-03-27 | ShowHow can we accurately classify graphs? Graph classification is a pivotal task in data mining with applications in social network analysis, web analysis, drug discovery, molecular property prediction, etc. Graph neural networks have achieved the state-of-the-art performance in graph classification, but they consistently struggle with overfitting. To mitigate overfitting, researchers have introduced various representation learning methods utilizing graph augmentation. However, existing methods rely on simplistic use of graph augmentation, which loses augmentation-induced differences and limits the expressiveness of representations. In this paper, we propose AugWard (Augmentation-Aware Training with Graph Distance and Consistency Regularization), a novel graph representation learning framework that carefully considers the diversity introduced by graph augmentation. AugWard applies augmentation-aware training to predict the graph distance between the augmented graph and its original one, aligning the representation difference directly with graph distance at both feature and structure levels. Furthermore, AugWard employs consistency regularization to encourage the classifier to handle richer representations. Experimental results show that AugWard gives the state-of-the-art performance in supervised, semi-supervised graph classification, and transfer learning. |
Accep...Accepted to PAKDD 2025 (Oral Presentation) |
MCBLT: Multi-Camera Multi-Object 3D Tracking in Long Videos | 2025-03-26 | ShowObject perception from multi-view cameras is crucial for intelligent systems, particularly in indoor environments, e.g., warehouses, retail stores, and hospitals. Most traditional multi-target multi-camera (MTMC) detection and tracking methods rely on 2D object detection, single-view multi-object tracking (MOT), and cross-view re-identification (ReID) techniques, without properly handling important 3D information by multi-view image aggregation. In this paper, we propose a 3D object detection and tracking framework, named MCBLT, which first aggregates multi-view images with necessary camera calibration parameters to obtain 3D object detections in bird's-eye view (BEV). Then, we introduce hierarchical graph neural networks (GNNs) to track these 3D detections in BEV for MTMC tracking results. Unlike existing methods, MCBLT has impressive generalizability across different scenes and diverse camera settings, with exceptional capability for long-term association handling. As a result, our proposed MCBLT establishes a new state-of-the-art on the AICity'24 dataset with |
|
Symmetry-Informed Graph Neural Networks for Carbon Dioxide Isotherm and Adsorption Prediction in Aluminum-Substituted Zeolites | 2025-03-26 | ShowAccurately predicting adsorption properties in nanoporous materials using Deep Learning models remains a challenging task. This challenge becomes even more pronounced when attempting to generalize to structures that were not part of the training data.. In this work, we introduce SymGNN, a graph neural network architecture that leverages material symmetries to improve adsorption property prediction. By incorporating symmetry operations into the message-passing mechanism, our model enhances parameter sharing across different zeolite topologies, leading to improved generalization. We evaluate SymGNN on both interpolation and generalization tasks, demonstrating that it successfully captures key adsorption trends, including the influence of both the framework and aluminium distribution on CO$_2$ adsorption. Furthermore, we apply our model to the characterization of experimental adsorption isotherms, using a genetic algorithm to infer likely aluminium distributions. Our results highlight the effectiveness of machine learning models trained on simulations for studying real materials and suggest promising directions for fine-tuning with experimental data and generative approaches for the inverse design of multifunctional nanomaterials. |
|
$β$-GNN: A Robust Ensemble Approach Against Graph Structure Perturbation | 2025-03-26 | ShowGraph Neural Networks (GNNs) are playing an increasingly important role in the efficient operation and security of computing systems, with applications in workload scheduling, anomaly detection, and resource management. However, their vulnerability to network perturbations poses a significant challenge. We propose |
This ...This is the author's version of the paper accepted at EuroMLSys 2025 |
Valid Conformal Prediction for Dynamic GNNs | 2025-03-26 | ShowDynamic graphs provide a flexible data abstraction for modelling many sorts of real-world systems, such as transport, trade, and social networks. Graph neural networks (GNNs) are powerful tools allowing for different kinds of prediction and inference on these systems, but getting a handle on uncertainty, especially in dynamic settings, is a challenging problem. In this work we propose to use a dynamic graph representation known in the tensor literature as the unfolding, to achieve valid prediction sets via conformal prediction. This representation, a simple graph, can be input to any standard GNN and does not require any modification to existing GNN architectures or conformal prediction routines. One of our key contributions is a careful mathematical consideration of the different inference scenarios which can arise in a dynamic graph modelling context. For a range of practically relevant cases, we obtain valid prediction sets with almost no assumptions, even dispensing with exchangeability. In a more challenging scenario, which we call the semi-inductive regime, we achieve valid prediction under stronger assumptions, akin to stationarity. We provide real data examples demonstrating validity, showing improved accuracy over baselines, and sign-posting different failure modes which can occur when those assumptions are violated. |
25 pages, 6 figures |
Towards Efficient Training of Graph Neural Networks: A Multiscale Approach | 2025-03-26 | ShowGraph Neural Networks (GNNs) have emerged as a powerful tool for learning and inferring from graph-structured data, and are widely used in a variety of applications, often considering large amounts of data and large graphs. However, training on such data requires large memory and extensive computations. In this paper, we introduce a novel framework for efficient multiscale training of GNNs, designed to integrate information across multiscale representations of a graph. Our approach leverages a hierarchical graph representation, taking advantage of coarse graph scales in the training process, where each coarse scale graph has fewer nodes and edges. Based on this approach, we propose a suite of GNN training methods: such as coarse-to-fine, sub-to-full, and multiscale gradient computation. We demonstrate the effectiveness of our methods on various datasets and learning tasks. |
|
Multi-dataset and Transfer Learning Using Gene Expression Knowledge Graphs | 2025-03-26 | ShowGene expression datasets offer insights into gene regulation mechanisms, biochemical pathways, and cellular functions. Additionally, comparing gene expression profiles between disease and control patients can deepen the understanding of disease pathology. Therefore, machine learning has been used to process gene expression data, with patient diagnosis emerging as one of the most popular applications. Although gene expression data can provide valuable insights, challenges arise because the number of patients in expression datasets is usually limited, and the data from different datasets with different gene expressions cannot be easily combined. This work proposes a novel methodology to address these challenges by integrating multiple gene expression datasets and domain-specific knowledge using knowledge graphs, a unique tool for biomedical data integration. Then, vector representations are produced using knowledge graph embedding techniques, which are used as inputs for a graph neural network and a multi-layer perceptron. We evaluate the efficacy of our methodology in three settings: single-dataset learning, multi-dataset learning, and transfer learning. The experimental results show that combining gene expression datasets and domain-specific knowledge improves patient diagnosis in all three settings. |
Accep...Accepted at the Extended Semantic Web Conference 2025 |
Graph-Level Label-Only Membership Inference Attack against Graph Neural Networks | 2025-03-26 | ShowGraph neural networks (GNNs) are widely used for graph-structured data but are vulnerable to membership inference attacks (MIAs) in graph classification tasks, which determine if a graph was part of the training dataset, potentially causing data leakage. Existing MIAs rely on prediction probability vectors, but they become ineffective when only prediction labels are available. We propose a Graph-level Label-Only Membership Inference Attack (GLO-MIA), which is based on the intuition that the target model's predictions on training data are more stable than those on testing data. GLO-MIA generates a set of perturbed graphs for target graph by adding perturbations to its effective features and queries the target model with the perturbed graphs to get their prediction labels, which are then used to calculate robustness score of the target graph. Finally, by comparing the robustness score with a predefined threshold, the membership of the target graph can be inferred correctly with high probability. Our evaluation on three datasets and four GNN models shows that GLO-MIA achieves an attack accuracy of up to 0.825, outperforming baseline work by 8.5% and closely matching the performance of probability-based MIAs, even with only prediction labels. |
|
BeLightRec: A lightweight recommender system enhanced with BERT | 2025-03-26 | ShowThe trend of data mining using deep learning models on graph neural networks has proven effective in identifying object features through signal encoders and decoders, particularly in recommendation systems utilizing collaborative filtering methods. Collaborative filtering exploits similarities between users and items from historical data. However, it overlooks distinctive information, such as item names and descriptions. The semantic data of items should be further mined using models in the natural language processing field. Thus, items can be compared using text classification, similarity assessments, or identifying analogous sentence pairs. This research proposes combining two sources of item similarity signals: one from collaborative filtering and one from the semantic similarity measure between item names and descriptions. These signals are integrated into a graph convolutional neural network to optimize model weights, thereby providing accurate recommendations. Experiments are also designed to evaluate the contribution of each signal group to the recommendation results. |
|
TransPlace: Transferable Circuit Global Placement via Graph Neural Network | 2025-03-26 | ShowGlobal placement, a critical step in designing the physical layout of computer chips, is essential to optimize chip performance. Prior global placement methods optimize each circuit design individually from scratch. Their neglect of transferable knowledge limits solution efficiency and chip performance as circuit complexity drastically increases. This study presents TransPlace, a global placement framework that learns to place millions of mixed-size cells in continuous space. TransPlace introduces i) Netlist Graph to efficiently model netlist topology, ii) Cell-flow and relative position encoding to learn SE(2)-invariant representation, iii) a tailored graph neural network architecture for informed parameterization of placement knowledge, and iv) a two-stage strategy for coarse-to-fine placement. Compared to state-of-the-art placement methods, TransPlace-trained on a few high-quality placements-can place unseen circuits with 1.2x speedup while reducing congestion by 30%, timing by 9%, and wirelength by 5%. |
Accepted at KDD 2025 |
PowerGNN: A Topology-Aware Graph Neural Network for Electricity Grids | 2025-03-26 | ShowThe increasing penetration of renewable energy sources introduces significant variability and uncertainty in modern power systems, making accurate state prediction critical for reliable grid operation. Conventional forecasting methods often neglect the power grid's inherent topology, limiting their ability to capture complex spatio temporal dependencies. This paper proposes a topology aware Graph Neural Network (GNN) framework for predicting power system states under high renewable integration. We construct a graph based representation of the power network, modeling buses and transmission lines as nodes and edges, and introduce a specialized GNN architecture that integrates GraphSAGE convolutions with Gated Recurrent Units (GRUs) to model both spatial and temporal correlations in system dynamics. The model is trained and evaluated on the NREL 118 test system using realistic, time synchronous renewable generation profiles. Our results show that the proposed GNN outperforms baseline approaches including fully connected neural networks, linear regression, and rolling mean models, achieving substantial improvements in predictive accuracy. The GNN achieves average RMSEs of 0.13 to 0.17 across all predicted variables and demonstrates consistent performance across spatial locations and operational conditions. These results highlight the potential of topology aware learning for scalable and robust power system forecasting in future grids with high renewable penetration. |
|
Peer Disambiguation in Self-Reported Surveys using Graph Attention Networks | 2025-03-25 | ShowStudying peer relationships is crucial in solving complex challenges underserved communities face and designing interventions. The effectiveness of such peer-based interventions relies on accurate network data regarding individual attributes and social influences. However, these datasets are often collected through self-reported surveys, introducing ambiguities in network construction. These ambiguities make it challenging to fully utilize the network data to understand the issues and to design the best interventions. We propose and solve two variations of link ambiguities in such network data -- (i) which among the two candidate links exists, and (ii) if a candidate link exists. We design a Graph Attention Network (GAT) that accounts for personal attributes and network relationships on real-world data with real and simulated ambiguities. We also demonstrate that by resolving these ambiguities, we improve network accuracy, and in turn, improve suicide risk prediction. We also uncover patterns using GNNExplainer to provide additional insights into vital features and relationships. This research demonstrates the potential of Graph Neural Networks (GNN) to advance real-world network data analysis facilitating more effective peer interventions across various fields. |
|
Truck Parking Usage Prediction with Decomposed Graph Neural Networks | 2025-03-25 | ShowTruck parking on freight corridors faces the major challenge of insufficient parking spaces. This is exacerbated by the Hour-of-Service (HOS) regulations, which often result in unauthorized parking practices, causing safety concerns. It has been shown that providing accurate parking usage prediction can be a cost-effective solution to reduce unsafe parking practices. In light of this, existing studies have developed various methods to predict the usage of a truck parking site and have demonstrated satisfactory accuracy. However, these studies focused on a single parking site, and few approaches have been proposed to predict the usage of multiple truck parking sites considering spatio-temporal dependencies, due to the lack of data. This paper aims to fill this gap and presents the Regional Temporal Graph Convolutional Network (RegT-GCN) to predict parking usage across the entire state to provide more comprehensive truck parking information. The framework leverages the topological structures of truck parking site locations and historical parking data to predict the occupancy rate considering spatio-temporal dependencies across a state. To achieve this, we introduce a Regional Decomposition approach, which effectively captures the geographical characteristics of the truck parking locations and their spatial correlations. Evaluation results demonstrate that the proposed model outperforms other baseline models, showing the effectiveness of our regional decomposition. The code is available at https://github.com/raynbowy23/RegT-GCN. |