πŸ“„ Notable* Recent AI/ML arXiv Papers


πŸ“„ Robo-Saber: Generating and Simulating Virtual Reality Players
πŸ—“οΈ Published: 2/20/2026
πŸ”— http://arxiv.org/abs/2602.18319v1
πŸ‘₯ Authors: Nam Hee Kim, Jingjing May Liu, Jaakko Lehtinen (possible past Nvidia (United States) affiliation), Perttu HΓ€mΓ€lΓ€inen, James F. O'Brien, Xue Bin Peng (possible past University Of California, Berkeley affiliation)
Abstract

We present the first motion generation system for playtesting virtual reality (VR) games. Our player model generates VR headset and handheld controller movements from in-game object arrangements, guided by style exemplars and aligned to maximize simulated gameplay score. We train on the large BOXRR-23 dataset and apply our framework on the popular VR game Beat Saber. The resulting model Robo-Saber produces skilled gameplay and captures diverse player behaviors, mirroring the skill levels and mov...

πŸ“„ Reverso: Efficient Time Series Foundation Models for Zero-shot Forecasting
πŸ—“οΈ Published: 2/19/2026
πŸ”— http://arxiv.org/abs/2602.17634v1
πŸ‘₯ Authors: Xinghong Fu, Yanhong Li (possible past Baidu (China) affiliation), Georgios Papaioannou, Yoon Kim (possible past University Of Oxford affiliation)
Abstract

Learning time series foundation models has been shown to be a promising approach for zero-shot time series forecasting across diverse time series domains. Insofar as scaling has been a critical driver of performance of foundation models in other modalities such as language and vision, much recent work on time series foundation modeling has focused on scaling. This has resulted in time series foundation models with hundreds of millions of parameters that are, while performant, inefficient and exp...

πŸ“„ AI Gamestore: Scalable, Open-Ended Evaluation of Machine General Intelligence with Human Games
πŸ—“οΈ Published: 2/19/2026
πŸ”— http://arxiv.org/abs/2602.17594v1
πŸ‘₯ Authors: Lance Ying, Ryan Truong, Prafull Sharma, Kaiya Ivy Zhao, Nathan Cloos, Kelsey R. Allen, Thomas L. Griffiths (possible past University Of California, Berkeley affiliation), Katherine M. Collins, JosΓ© HernΓ‘ndez-Orallo, Phillip Isola (possible past University Of California, Berkeley affiliation), Samuel J. Gershman, Joshua B. Tenenbaum (possible past Massachusetts Institute Of Technology affiliation)
Abstract

Rigorously evaluating machine intelligence against the broad spectrum of human general intelligence has become increasingly important and challenging in this era of rapid technological advance. Conventional AI benchmarks typically assess only narrow capabilities in a limited range of human activity. Most are also static, quickly saturating as developers explicitly or implicitly optimize for them. We propose that a more promising way to evaluate human-like general intelligence in AI systems is th...

πŸ“„ Improving LLM-based Recommendation with Self-Hard Negatives from Intermediate Layers
πŸ—“οΈ Published: 2/19/2026
πŸ”— http://arxiv.org/abs/2602.17410v1
πŸ‘₯ Authors: Bingqian Li, Bowen Zheng, Xiaolei Wang, Long Zhang, Jinpeng Wang (possible past Tencent (China) affiliation), Sheng Chen, Wayne Xin Zhao (possible past Baidu (China) affiliation), Ji-Rong Wen
Abstract

Large language models (LLMs) have shown great promise in recommender systems, where supervised fine-tuning (SFT) is commonly used for adaptation. Subsequent studies further introduce preference learning to incorporate negative samples into the training process. However, existing methods rely on sequence-level, offline-generated negatives, making them less discriminative and informative when adapting LLMs to recommendation tasks with large negative item spaces. To address these challenges, we pro...

πŸ“„ Towards Cross-lingual Values Assessment: A Consensus-Pluralism Perspective
πŸ—“οΈ Published: 2/19/2026
πŸ”— http://arxiv.org/abs/2602.17283v1
πŸ‘₯ Authors: Yukun Chen, Xinyu Zhang (possible past Baidu (China) affiliation), Jialong Tang, Yu Wan, Baosong Yang (possible past Tencent (China) affiliation), Yiming Li (possible past Tsinghua University affiliation), Zhan Qin, Kui Ren
Abstract

While large language models (LLMs) have become pivotal to content safety, current evaluation paradigms primarily focus on detecting explicit harms (e.g., violence or hate speech), neglecting the subtler value dimensions conveyed in digital content. To bridge this gap, we introduce X-Value, a novel Cross-lingual Values Assessment Benchmark designed to evaluate LLMs' ability to assess deep-level values of content from a global perspective. X-Value consists of more than 5,000 QA pairs across 18 lan...

πŸ“„ Deeper detection limits in astronomical imaging using self-supervised spatiotemporal denoising
πŸ—“οΈ Published: 2/19/2026
πŸ”— http://arxiv.org/abs/2602.17205v1
πŸ‘₯ Authors: Yuduo Guo, Hao Zhang (possible past Tencent (China) affiliation), Mingyu Li, Fujiang Yu, Yunjing Wu, Yuhan Hao, Song Huang, Yongming Liang, Xiaojing Lin, Xinyang Li, Jiamin Wu, Zheng Cai, Qionghai Dai (possible past Tsinghua University affiliation)
Abstract

The detection limit of astronomical imaging observations is set by several noise sources. Some of that noise is correlated between neighbouring image pixels and exposures, so it could in principle be learned and corrected. We present ASTERIS, an astronomical self-supervised transformer-based denoising algorithm that integrates spatiotemporal information across multiple exposures. Benchmarking on mock data indicates that ASTERIS improves detection limits by 1.0 magnitude at 90% completeness an...

πŸ“„ LLM4Cov: Execution-Aware Agentic Learning for High-coverage Testbench Generation
πŸ—“οΈ Published: 2/18/2026
πŸ”— http://arxiv.org/abs/2602.16953v1
πŸ‘₯ Authors: Hejia Zhang, Zhongming Yu, Chia-Tung Ho, Haoxing Ren (possible past Nvidia (United States) affiliation), Brucek Khailany (possible past Nvidia (United States) affiliation), Jishen Zhao
Abstract

Execution-aware LLM agents offer a promising paradigm for learning from tool feedback, but such feedback is often expensive and slow to obtain, making online reinforcement learning (RL) impractical. High-coverage hardware verification exemplifies this challenge due to its reliance on industrial simulators and non-differentiable execution signals. We propose LLM4Cov, an offline agent-learning framework that models verification as memoryless state transitions guided by deterministic evaluators. Bu...

πŸ“„ The Geometry of Noise: Why Diffusion Models Don't Need Noise Conditioning
πŸ—“οΈ Published: 2/20/2026
πŸ”— http://arxiv.org/abs/2602.18428v1
πŸ‘₯ Authors: Mojtaba Sahraee-Ardakan, Mauricio Delbracio (possible past Google (United States) affiliation), Peyman Milanfar (possible past Google (United States) affiliation)
Abstract

Autonomous (noise-agnostic) generative models, such as Equilibrium Matching and blind diffusion, challenge the standard paradigm by learning a single, time-invariant vector field that operates without explicit noise-level conditioning. While recent work suggests that high-dimensional concentration allows these models to implicitly estimate noise levels from corrupted observations, a fundamental paradox remains: what is the underlying landscape being optimized when the noise level is treated as a...
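
The concentration argument the abstract alludes to can be made numerical: for a corrupted sample x_t = x0 + Οƒ·Ξ΅ in high dimension d, the squared norm concentrates around ||x0||Β² + σ²·d, so the noise level is recoverable from the corrupted sample alone. A minimal sketch under the assumption of unit-variance data (an illustration of the concentration effect, not the paper's model):

```python
import math
import random

random.seed(0)
d = 100_000            # high dimension, so norms concentrate
sigma_true = 0.7       # noise level the model is never told

# Clean sample with unit-variance entries, then its corrupted version.
x0 = [random.gauss(0.0, 1.0) for _ in range(d)]
x_t = [v + sigma_true * random.gauss(0.0, 1.0) for v in x0]

# ||x_t||^2 / d concentrates around 1 + sigma^2, so sigma can be read off x_t.
norm_sq = sum(v * v for v in x_t)
sigma_hat = math.sqrt(max(norm_sq / d - 1.0, 0.0))
```

With d = 100,000 the estimate lands within about one percent of the true Οƒ, which is why a time-invariant vector field can behave as if it were noise-conditioned.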

πŸ“„ A Deep Surrogate Model for Robust and Generalizable Long-Term Blast Wave Prediction
πŸ—“οΈ Published: 2/20/2026
πŸ”— http://arxiv.org/abs/2602.18168v1
πŸ‘₯ Authors: Danning Jing, Xinhai Chen, Xifeng Pu, Jie Hu, Chao Huang (possible past Tencent (China) affiliation), Xuguang Chen, Qinglin Wang, Jie Liu (possible past Tencent (China) affiliation)
Abstract

Accurately modeling the spatio-temporal dynamics of blast wave propagation remains a longstanding challenge due to its highly nonlinear behavior, sharp gradients, and burdensome computational cost. While machine learning-based surrogate models offer fast inference as a promising alternative, they suffer from degraded accuracy, particularly when evaluated on complex urban layouts or out-of-distribution scenarios. Moreover, autoregressive prediction strategies in such models are prone to error accumula...

πŸ“„ Asynchronous Heavy-Tailed Optimization
πŸ—“οΈ Published: 2/20/2026
πŸ”— http://arxiv.org/abs/2602.18002v1
πŸ‘₯ Authors: Junfei Sun, Dixi Yao, Xuchen Gong, Tahseen Rabbani, Manzil Zaheer (possible past Google (United States) affiliation), Tian Li (possible past Carnegie Mellon University affiliation)
Abstract

Heavy-tailed stochastic gradient noise, commonly observed in transformer models, can destabilize the optimization process. Recent works mainly focus on developing and understanding approaches to address heavy-tailed noise in the centralized or distributed, synchronous setting, leaving the interactions between such noise and asynchronous optimization underexplored. In this work, we investigate two communication schemes that handle stragglers with asynchronous updates in the presence of heavy-tail...
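
The standard defense in this line of work is gradient clipping, which bounds the damage any single heavy-tailed gradient sample can do to an update. A minimal sketch of a generic clipped update step (not the paper's asynchronous communication schemes):

```python
import math

def clip_by_norm(grad, max_norm):
    # Rescale the stochastic gradient so its L2 norm never exceeds max_norm,
    # bounding the influence of heavy-tailed outlier gradients on the update.
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    return [g * (max_norm / norm) for g in grad]

# An outlier draw from a heavy-tailed distribution is tamed before the step:
params = [1.0, 1.0]
grad = [300.0, 400.0]                     # norm 500, far beyond the clip level
step = clip_by_norm(grad, max_norm=1.0)   # rescaled to norm 1
params = [p - 0.1 * g for p, g in zip(params, step)]
```

Well-behaved gradients pass through unchanged; only the tail events are rescaled.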

πŸ“„ JAX-Privacy: A library for differentially private machine learning
πŸ—“οΈ Published: 2/19/2026
πŸ”— http://arxiv.org/abs/2602.17861v1
πŸ‘₯ Authors: Ryan McKenna, Galen Andrew (possible past Google (United States) affiliation), Borja Balle (possible past DeepMind (United Kingdom) affiliation), Vadym Doroshenko, Arun Ganesh, Weiwei Kong, Alex Kurakin, Brendan McMahan (possible past Google (United States) affiliation), Mikhail Pravilov
Abstract

JAX-Privacy is a library designed to simplify the deployment of robust and performant mechanisms for differentially private machine learning. Guided by design principles of usability, flexibility, and efficiency, JAX-Privacy serves both researchers requiring deep customization and practitioners who want a more out-of-the-box experience. The library provides verified, modular primitives for all critical components of the mechanism design, including batch selection, gradient clipping, n...
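
The clipping-and-noising pipeline such libraries build on fits in a few lines. This is a hedged pure-Python sketch of the core DP-SGD aggregation step, not JAX-Privacy's actual API (the function name and signature are invented for illustration):

```python
import math
import random

def dp_mean_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    # Core DP-SGD step: clip each per-example gradient to clip_norm, sum,
    # add Gaussian noise scaled to clip_norm, then average over the batch.
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for g in per_example_grads:
        norm = math.sqrt(sum(v * v for v in g))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i in range(dim):
            total[i] += g[i] * scale
    sigma = noise_multiplier * clip_norm  # noise calibrated to the sensitivity
    n = len(per_example_grads)
    return [(total[i] + rng.gauss(0.0, sigma)) / n for i in range(dim)]
```

Clipping caps each example's contribution (the sensitivity), which is what lets the added Gaussian noise yield a formal privacy guarantee; the privacy-accounting side is omitted here.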

πŸ“„ Dual Length Codes for Lossless Compression of BFloat16
πŸ—“οΈ Published: 2/19/2026
πŸ”— http://arxiv.org/abs/2602.17849v1
πŸ‘₯ Authors: Aditya Agrawal (possible past Nvidia (United States) affiliation), Albert Magyar, Hiteshwar Eswaraiah, Patrick Sheridan, Pradeep Janedula, Ravi Krishnan Venkatesan, Krishna Nair, Ravi Iyer (possible past Meta (United States) affiliation)
Abstract

Training and serving Large Language Models (LLMs) relies heavily on parallelization and collective operations, which are frequently bottlenecked by network bandwidth. Lossless compression using, e.g., Huffman codes can alleviate the issue; however, Huffman codes suffer from slow, bit-sequential decoding and high hardware complexity due to deep tree traversals. Universal codes, e.g., Exponential-Golomb codes, are faster to decode but do not exploit the symbol frequency distributions. To address thes...
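
For reference, the order-0 Exponential-Golomb code the abstract contrasts against takes only a few lines: a non-negative integer n is coded as bit_length(n+1) βˆ’ 1 zeros followed by n+1 in binary. A textbook sketch (not the paper's dual-length scheme):

```python
def eg_encode(n):
    # Order-0 Exp-Golomb: prefix of (bit_length(n+1) - 1) zeros,
    # then the binary form of n + 1.
    assert n >= 0
    prefix_len = (n + 1).bit_length() - 1
    return "0" * prefix_len + format(n + 1, "b")

def eg_decode(bits):
    # Count leading zeros, read that many bits plus one as n + 1.
    zeros = 0
    while bits[zeros] == "0":
        zeros += 1
    value = int(bits[zeros:2 * zeros + 1], 2) - 1
    return value, bits[2 * zeros + 1:]  # decoded value and remaining stream
```

For example, eg_encode(4) gives "00101". Decoding needs only a leading-zero count plus one fixed-width read, which is why such codes map to simpler, faster hardware than deep Huffman tree traversals.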

πŸ“„ A Theoretical Framework for Modular Learning of Robust Generative Models
πŸ—“οΈ Published: 2/19/2026
πŸ”— http://arxiv.org/abs/2602.17554v1
πŸ‘₯ Authors: Corinna Cortes (possible past Google (United States) affiliation), Mehryar Mohri (possible past Google (United States) affiliation), Yutao Zhong
Abstract

Training large-scale generative models is resource-intensive and relies heavily on heuristic dataset weighting. We address two fundamental questions: can we train Large Language Models (LLMs) modularly, combining small, domain-specific experts to match monolithic performance, and can we do so robustly for any data mixture, eliminating heuristic tuning? We present a theoretical framework for modular generative modeling where a set of pre-trained experts are combined via a gating mechanism. We defin...
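
The "experts combined via a gating mechanism" can be made concrete under one common reading: the modular model's likelihood is a gate-weighted mixture of expert likelihoods, computed stably in log space. A minimal sketch (names are illustrative, not from the paper):

```python
import math

def mixture_logprob(expert_logprobs, gate_weights):
    # log p(x) = log sum_k w_k * exp(logp_k(x)), via the log-sum-exp trick
    # so that very negative expert log-probs do not underflow.
    m = max(expert_logprobs)
    total = sum(w * math.exp(lp - m)
                for lp, w in zip(expert_logprobs, gate_weights))
    return m + math.log(total)
```

Two experts assigning probabilities 0.2 and 0.4 under equal gates give a mixture probability of 0.3; the gate weights are where a robust-combination rule would enter.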

πŸ“„ Retrospective In-Context Learning for Temporal Credit Assignment with Large Language Models
πŸ—“οΈ Published: 2/19/2026
πŸ”— http://arxiv.org/abs/2602.17497v1
πŸ‘₯ Authors: Wen-Tse Chen, Jiayu Chen, Fahim Tajwar, Hao Zhu (possible past Tsinghua University affiliation), Xintong Duan, Ruslan Salakhutdinov (possible past University Of Toronto affiliation), Jeff Schneider
Abstract

Learning from self-sampled data and sparse environmental feedback remains a fundamental challenge in training self-evolving agents. Temporal credit assignment mitigates this issue by transforming sparse feedback into dense supervision signals. However, previous approaches typically depend on learning task-specific value functions for credit assignment, which suffer from poor sample efficiency and limited generalization. In this work, we propose to leverage pretrained knowledge from large languag...
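
The baseline being improved on here, turning sparse terminal feedback into dense per-step supervision, is classically done with discounted returns-to-go. A minimal sketch of that generic transformation (not the paper's LLM-based method):

```python
def returns_to_go(rewards, gamma=0.99):
    # Convert a sparse reward sequence (e.g., only a terminal reward) into a
    # dense per-step signal via the recursion G_t = r_t + gamma * G_{t+1}.
    out = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out
```

A single terminal reward propagates backwards with geometric discounting, so every earlier step receives some credit; learned value functions refine this fixed rule, at the sample-efficiency cost the abstract criticizes.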

πŸ“„ Unified Latents (UL): How to train your latents
πŸ—“οΈ Published: 2/19/2026
πŸ”— http://arxiv.org/abs/2602.17270v1
πŸ‘₯ Authors: Jonathan Heek (possible past Google (United States) affiliation), Emiel Hoogeboom, Thomas Mensink, Tim Salimans (possible past OpenAI (United States) affiliation)
Abstract

We present Unified Latents (UL), a framework for learning latent representations that are jointly regularized by a diffusion prior and decoded by a diffusion model. By linking the encoder's output noise to the prior's minimum noise level, we obtain a simple training objective that provides a tight upper bound on the latent bitrate. On ImageNet-512, our approach achieves competitive FID of 1.4, with high reconstruction quality (PSNR) while requiring fewer training FLOPs than models trained on Sta...

πŸ“„ Anti-causal domain generalization: Leveraging unlabeled data
πŸ—“οΈ Published: 2/19/2026
πŸ”— http://arxiv.org/abs/2602.17187v1
πŸ‘₯ Authors: Sorawit Saengkyongam, Juan L. Gamella, Andrew C. Miller (possible past Google (United States) affiliation), Jonas Peters (possible past ETH Zurich affiliation), Nicolai Meinshausen, Christina Heinze-Deml
Abstract

The problem of domain generalization concerns learning predictive models that are robust to distribution shifts when deployed in new, previously unseen environments. Existing methods typically require labeled data from multiple training environments, limiting their applicability when labeled data are scarce. In this work, we study domain generalization in an anti-causal setting, where the outcome causes the observed covariates. Under this structure, environment perturbations that affect the cova...

*Notable papers are those with at least two authors from a "big" AI/ML lab.