📄 Notable* Recent AI/ML arXiv Papers


📄 Introspective Coupling: Self-Explanation Training Tracks Behavioral Change Despite Fixed Supervision
🗓️ Published: 6/30/2026
🔗 http://arxiv.org/abs/2606.32038v1
👥 Authors: Zifan Carl Guo, Laura Ruis, Jacob Andreas (possible past University of California, Berkeley affiliation), Belinda Z. Li (possible past University of Washington affiliation)
Abstract

When does training language models (LMs) to generate explanations of their predictions yield faithful introspection, rather than superficial imitation? We study LMs trained to explain which features of their inputs influenced their behavior, using models' counterfactual behavior on modified inputs as supervision. Surprisingly, we find that LMs trained on fixed counterfactual explanations derived from earlier checkpoints of themselves, or even from behaviorally similar models in different familie...

📄 AdaJEPA: An Adaptive Latent World Model
🗓️ Published: 6/30/2026
🔗 http://arxiv.org/abs/2606.32026v1
👥 Authors: Ying Wang (possible past Tsinghua University affiliation), Oumayma Bounou, Yann LeCun (possible past Meta (United States) affiliation), Mengye Ren (possible past University of Toronto affiliation)
Abstract

Latent world models enable planning from high-dimensional observations by predicting future states in a compact latent space. However, these models are typically kept frozen at test time: when their predictions become inaccurate, planning can fail, especially under test-time distribution shift. To address this, we propose AdaJEPA, an adaptive latent world model that performs test-time adaptation within the closed loop of model predictive control (MPC). After training, AdaJEPA plans and executes ...

📄 GR2 Technical Report
🗓️ Published: 6/30/2026
🔗 http://arxiv.org/abs/2606.31984v1
👥 Authors: Yufei Li, Zaiwei Zhang, Mingfu Liang, Kavosh Asadi, Jay Xu, Jimmy Kim, Chongyang Bai, Jieyi Zhang, Hongye Xie, Prachi Agrawal, Dian Yu (possible past Tencent (China) affiliation), Tianyi Chen, Jean-Pascal Billaud, Garret Buell, Yk, Zhu, Sachin Patil, Brooke Bian, Zhou Fang, Kevin Huang, Shiva Sudanagunta, Yuzhen Huang, Emma Lu, Chris O'Brien, Yang Song (possible past Stanford University affiliation), Lihong Li (possible past Microsoft (United States) affiliation), Jacob Tao, Zhicheng Zhu, Chao Li (possible past Baidu (China) affiliation), Gaoxiang Liu, Neil Wu, Zhongyin Hu, Li Han, Loki Chen, Ming Lei, Greg Rehm, Siyuan Song, Tianwei Zhang, Li Li (possible past Google (United States) affiliation), Ketan Singh, Yavuz Yetim, Ilyas Atishev, Satendra Gera, Ashkan Sadeghi, Rachel Yan, Nikko Mizutani, Shuaiwen Wang, Song Yang, Zhijing Li, Jiang Liu, Mengying Sun, Fei Tian, Xiaohan Wei, Chonglin Sun, Parish Aggarwal, Kaushik Rangadurai, Zhi Hua, Frank Shyu, Ruchit Sharma, Liyuan Li, Shike Mei, Wenlin Chen (possible past Meta (United States) affiliation), Santanu Kolay, Ben Schulte, Deepak Chandra (possible past Google (United States) affiliation), Adam, Song, Sandeep Pandey, Xi Liu, Hamed Firooz, Luke Simon
Abstract

Industrial recommendation systems serve billions of users through a multi-stage funnel -- retrieval, early-stage ranking, and re-ranking -- where the final re-ranking step disproportionately shapes user engagement and downstream performance, particularly for carousel and grid display formats. Despite growing enthusiasm for Large Language Models (LLMs) in recommendation, three gaps hinder industrial adoption: (1) most efforts target retrieval and ranking, leaving re-ranking -- the stage closest t...

📄 LUNA: Learning Universal 3D Human Animation Beyond Skinning
🗓️ Published: 6/30/2026
🔗 http://arxiv.org/abs/2606.31981v1
👥 Authors: Peng Li (possible past Tsinghua University affiliation), Rawal Khirodkar (possible past Carnegie Mellon University affiliation), Junxuan Li, Yuan Dong, Chen Cao, Yuan Liu (possible past Google (United States) affiliation), Wenhan Luo (possible past Tencent (China) affiliation), Yike Guo, Shunsuke Saito (possible past Meta (United States) affiliation)
Abstract

Creating photorealistic, animatable 3D human avatars from monocular images still largely depends on Linear Blend Skinning (LBS) and parametric body models, which constrain expressivity and often introduce artifacts due to imperfect fitting. We propose LUNA, an LBS-free universal neural animation model that directly maps multiple 2D controls like images, keypoints, sketches, and unseen characters into 3D Gaussian deformations, bypassing explicit body fitting. At its core, a transformer-based moti...

📄 ShopX: A Foundation Model for Intent-to-Item Fulfillment in Agentic Shopping
🗓️ Published: 6/30/2026
🔗 http://arxiv.org/abs/2606.31693v1
👥 Authors: Jiacheng Chen, Tao Zhang (possible past Nvidia (United States) affiliation), Manxi Lin, Dunxian Huang, Teng Shi, Honghao Fu, Mengyan Li, Xinming Zhang, Chenchi Zhang, Xuan Lu, Xiaoxiong Du, Haibin Chen, Shaolin Ye, Hao Chang, Xiaoqi Li, Shuwen Xiao, Yujin Yuan, Jingxuan Feng, Shaopan Xiong, Huimin Yi, Ju Huang, Qiu Shen, Ying Chen (possible past Baidu (China) affiliation), Junjun Zheng, Xiangheng Kong, Yuning Jiang
Abstract

The wave of AI-native applications is moving shopping beyond page- and feed-based browsing toward intent-driven experiences orchestrated by LLM agents. A common design wraps an LLM around existing search and recommendation pipelines, forcing complex intents through low-bandwidth retrieval or ranking interfaces and leaving a gap between language understanding and item-space fulfillment. Generative recommendation gives LLMs a direct item-space interface through semantic IDs (SIDs), but existing mo...

📄 WorldRoamBench: An Open-World Benchmark for Long-Horizon Stability of Interactive World Models
🗓️ Published: 6/30/2026
🔗 http://arxiv.org/abs/2606.31672v1
👥 Authors: Ting-Bing Xu, Jiacheng Sui, Zhe Gao, Kewei Shi, Wenjin Yang, Zhicheng Liu, Zhaoxu Sun, Mingchao Sun, Hongyu Pan, Fan Jiang (possible past Shanghai Jiao Tong University affiliation), Mu Xu, Qi Fan, Yong Li (possible past Tsinghua University affiliation), Baoquan Chen (possible past Peking University affiliation)
Abstract

Despite rapid progress in interactive world models (IWMs), existing benchmarks evaluate action following only at trajectory level and ignore memory and interaction physics. We introduce WorldRoamBench, an open-world benchmark for long-horizon stability across four dimensions, each with tailored innovations: (i) Action: per-frame action metric bypassing cross-model semantic scale disparity and exposing failures hidden by trajectory; (ii) Vision: segment-based drift metric capturing non-monotonic ...

📄 FLARE-AI: Flaw Reporting for AI
🗓️ Published: 6/30/2026
🔗 http://arxiv.org/abs/2606.31567v1
👥 Authors: Shayne Longpre (possible past Apple (United States) affiliation), Elaine Zhu, Carson Ezell, Avijit Ghosh, Sean McGregor, Kevin Paeth, Kevin Klyman, Sayash Kapoor, Rishi Bommasani, Ruth Appel, Gregory Strom, Lauren McIlvenny, Mark M. Jaycox, Peter Slattery, Nathan Butters, Arvind Narayanan, Percy Liang (possible past Stanford University affiliation), Alex Pentland (possible past Massachusetts Institute of Technology affiliation)
Abstract

Flaw reporting for deployed AI systems is fundamental to identifying system failures and improving AI safety. Yet the AI reporting ecosystem is fragmented: researchers who identify flaws often do not know what or where to report, and groups who receive reports rarely share them with other relevant stakeholders. As a result, good-faith reporters duplicate effort by submitting many different forms, and recipients lack standardized, triage-ready information. We audit 12 reporting systems published ...

📄 Robustness of Robotic Manipulation: Foundations and Frontiers
🗓️ Published: 6/30/2026
🔗 http://arxiv.org/abs/2606.31494v1
👥 Authors: Yifei Dong, Zhanyi Sun, Lujie Yang, Manuel Baum, Kei Ikemura, Shuran Song (possible past Google (United States) affiliation), Florian T. Pokorny (possible past University of California, Berkeley affiliation), Xianyi Cheng
Abstract

Humans and animals exhibit remarkable robustness in physical manipulation, yet robots remain far behind. Progress toward human-level manipulation robustness is hindered by the absence of a unified and systematic understanding: different subfields frame robustness in distinct ways, often leaving the concept ambiguous and limiting deeper analysis as well as communication across research areas. This paper presents a systematic study of manipulation robustness. We begin with a formal definition, cha...

📄 One Reflection Is Not Enough: Self-Correcting Autonomous Research via Multi-Hypothesis Failure Attribution
🗓️ Published: 6/30/2026
🔗 http://arxiv.org/abs/2606.31478v1
👥 Authors: Jie Ma (possible past University of Oxford affiliation), Binfei Chu, Jie Gao, Jinlu Zhang, Yiwei Ma, Yi Tan, Jiayi Ji, Xiaoshuai Sun, Rongrong Ji (possible past Tencent (China) affiliation)
Abstract

Autonomous research agents can now draft hypotheses, write code, run experiments, and produce papers, but they remain brittle when experiments fail. Under the prevailing paradigm, failure recovery is usually delegated to a single free-form reflection: a rich trajectory of metrics, logs, and design choices is compressed into one verbal critique, which often leads either to localized trial-and-error or to hard pivots that discard useful context. We propose SAGE, a Self-correcting, Autonomous, Grou...

📄 Stage-Transition Dense Reward Modeling for Reinforcement Learning
🗓️ Published: 6/30/2026
🔗 http://arxiv.org/abs/2606.31377v1
👥 Authors: Yang Yang (possible past Tencent (China) affiliation), Bingjie Chen, Zihan Wang (possible past Tsinghua University affiliation), Yizhe Li, Guoping Pan, Yi Cheng, Houde Liu
Abstract

Reinforcement learning for long-horizon robotic manipulation is often limited by sparse and delayed rewards, while manually designing dense shaping signals is costly and brittle to changes in environments and object configurations. This work proposes Stage-Transition Dense Reward (STDR), a visual reward-learning framework that converts unstructured expert videos into logically grounded dense rewards for training RL agents from scratch. STDR leverages semantic understanding to infer a task's stag...

📄 Learning from Failure: Inference-Time Self-Improvement for Computer-Use Agents
🗓️ Published: 6/30/2026
🔗 http://arxiv.org/abs/2606.31270v1
👥 Authors: Xueqiao Sun, Xiaohan Wang (possible past Baidu (China) affiliation), Ludwig Schmidt (possible past University of Washington affiliation), Serena Yeung-Levy, Yuhui Zhang
Abstract

Computer-use agents, which leverage multimodal large language models (MLLMs) to operate computers and complete tasks, have attracted significant attention for their utility and versatility. A major challenge in developing these agents is collecting large-scale, high-quality trajectories. The standard approach generates synthetic data through a self-improving loop: an agent is placed in a verifiable environment and iteratively fine-tuned on its successful trajectories. Despite its effectiveness, ...

📄 Revealing Safety-Critical Scenarios for UTM via Transformer
🗓️ Published: 6/30/2026
🔗 http://arxiv.org/abs/2606.31114v1
👥 Authors: Huaze Tang, Bill Zeng, Chao Wang (possible past Google (United States) affiliation), Zhenpeng Shi, Qian Zhang (possible past University of Washington affiliation), Wenbo Ding (possible past Tsinghua University affiliation)
Abstract

Unmanned Traffic Management (UTM) systems are cloud-based platforms designed to manage and coordinate multiple aerial vehicles remotely. UTM systems are safety-critical and cannot tolerate failures such as crashes or collisions. To reveal latent vulnerabilities, there are neither optimal failure-exposing demonstrations nor clear reward signals. Additionally, UTM's self-healing capability introduces the "long-tail effect" of critical failures. We propose framing UTM vulnerability discovery as a seq...

📄 Introduction to Stochastic Differential Equations for Generative Machine Learning: A Variational Perspective
🗓️ Published: 6/30/2026
🔗 http://arxiv.org/abs/2606.31576v1
👥 Authors: Ole Winther, Paul Jeha, Sander Dieleman (possible past Google (United States) affiliation), Andriy Mnih (possible past Google (United States) affiliation), Manfred Opper, Andrea Dittadi
Abstract

The use of ordinary and stochastic differential equations has led to substantial progress in generative machine learning with applications to, for example, image, video and biomolecule generation. This paper provides a self-contained and informal introduction to the differential equations, the probabilistic framework for using them in generative modeling and the Fokker-Planck equation that governs the temporal evolution of the marginal distribution of the stochastic variables of the differentia...

📄 Scaling Storm-Resolving Atmospheric AI Simulation to the Entire Planet
🗓️ Published: 6/30/2026
🔗 http://arxiv.org/abs/2606.31248v1
👥 Authors: Zeyuan Hu, Akshay Subramaniam (possible past Nvidia (United States) affiliation), Noel Keen, Tao Ge, Jaideep Pathak (possible past Nvidia (United States) affiliation), Mohammad Shoaib Abbas, Suman Ravuri (possible past Google (United States) affiliation), Karthik Kashinath (possible past Nvidia (United States) affiliation), Naser Mahfouz, Peter Caldwell, Mike Pritchard, Noah Brenowitz
Abstract

Kilometer-scale convection shapes precipitation extremes, tropical organization, and cloud feedbacks, but most global atmospheric models approximate these processes at 25-100 km resolution. Global storm-resolving physics models resolve convective systems explicitly, but at a cost -- roughly one MWh per simulated day on exascale supercomputers -- that limits long-duration simulation. We introduce STRATA (Storm-resolving Tile-based autoRegressive Atmosphere Transformer Architecture), the first aut...

📄 Can Tabular In-Context Learners Generalize to Biomolecular Property Prediction?
🗓️ Published: 6/30/2026
🔗 http://arxiv.org/abs/2606.31126v1
👥 Authors: Davy Guan, Lu Zhang (possible past Tencent (China) affiliation), Asiri Wijesinghe, Allen Zhu, He Zhao (possible past Tencent (China) affiliation), Helen Power, F. Hafna Ahmed, Andrew Warden, Cheng Soon Ong, Daniel M. Steinberg
Abstract

Predicting biomolecular properties from limited labeled data is a central bottleneck in protein engineering and small-molecule design. As strong pretrained encoders now supply rich fixed-length representations, the difficulty has shifted from representation learning to building a data-efficient predictor for the few-shot regime. Tabular foundation models such as TabPFN3 and TabICL are unlikely candidates for this role: they are in-context learners pretrained on synthetic tables drawn from random...

📄 Using AI Agents to Automate Black-Box Audits of Personalization Algorithms at Scale
🗓️ Published: 6/29/2026
🔗 http://arxiv.org/abs/2606.30801v1
👥 Authors: Alessandro Morosini, Sarah H. Cen (possible past DeepMind (United Kingdom) affiliation), Andrew Ilyas, Hedi Driss, Aleksander Mądry (possible past Massachusetts Institute of Technology affiliation), Chara Podimata
Abstract

Personalization algorithms determine what content users encounter on online platforms. Auditing these systems is difficult because independent auditors have only black-box access to the algorithms, while personalization depends on users' attributes, behavior, and evolving interaction histories. Existing auditing methods face a tradeoff: studies with real users capture realistic behavior but are costly and hard to control, whereas sock-puppet audits scale more easily but often rely on scripted be...

📄 Experience Augmented Policy Optimization for LLM Reasoning
🗓️ Published: 6/29/2026
🔗 http://arxiv.org/abs/2606.30420v1
👥 Authors: Jinda Lu, Kexin Huang (possible past Stanford University affiliation), Junkang Wu, Shuo Yang, Jinghan Li, Chiyu Ma, Shaohang Wei, Xiang Wang (possible past Tencent (China) affiliation), Guoyin Wang, Jingren Zhou
Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) is a powerful paradigm for improving the reasoning capabilities of large language models (LLMs). However, existing RLVR methods typically rely on on-policy optimization from scratch, resulting in high sampling costs and inefficient utilization of accumulated experience. As model capabilities and policy behaviors evolve during training, recent attempts to reuse experience via fixed reasoning trajectories further suffer from policy mismatch. Mo...

📄 Diffusion Fine-tuning with Rewarded Moment Matching Distillation
🗓️ Published: 6/29/2026
🔗 http://arxiv.org/abs/2606.30414v1
👥 Authors: Alexis Jacq, Guillaume Couairon, Valentin De Bortoli, Quentin Berthet (possible past University of Cambridge affiliation), Arnaud Doucet (possible past University of Oxford affiliation), Romuald Elie
Abstract

Distillation and Reinforcement Learning (RL) fine-tuning are the primary pillars of diffusion post-training. While traditionally studied in isolation, the interaction between these phases remains poorly understood, and in particular how fine-tuning impacts the generative quality of distilled models. We introduce Rewarded Moment Matching Distillation (RMMD), a novel framework that simultaneously distills diffusion models and maximizes a reward function. RMMD preserves the high-fidelity "naturaln...

📄 MOPD: Multi-Teacher On-Policy Distillation for Capability Integration in LLM Post-Training
🗓️ Published: 6/29/2026
🔗 http://arxiv.org/abs/2606.30406v1
👥 Authors: Wenhan Ma, Jianyu Wei, Liang Zhao (possible past Baidu (China) affiliation), Hailin Zhang, Bangjun Xiao, Lei Li (possible past Carnegie Mellon University affiliation), Qibin Yang, Bofei Gao, Yudong Wang, Rang Li, Jinhao Dong, Zhifang Sui (possible past Peking University affiliation), Fuli Luo (possible past Peking University affiliation)
Abstract

Modern large language models (LLMs) rely on reinforcement learning during post-training to push specific capabilities, yet integrating multiple capabilities into one model remains hard. Existing methods, such as Off-Policy Finetune and Mix-RL, are either inefficient or lose performance. In this work, we propose Multi-teacher On-Policy Distillation (MOPD), a post-training paradigm for combining the capabilities of multiple domain RL teachers: we first run per-domain specialised RL to obtain a set...

*Notable papers are those with at least two authors from a "big" AI/ML lab.