📄 Notable* Recent AI/ML arXiv Papers


📄 UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience
🗓️ Published: 3/25/2026
🔗 http://arxiv.org/abs/2603.24533v1
👥 Authors: Zichuan Lin, Feiyu Liu, Yijun Yang, Jiafei Lyu, Yiming Gao, Yicheng Liu, Zhicong Lu, Yangbin Yu, Mingyu Yang, Junyou Li, Deheng Ye (possible past Tencent (China) affiliation), Jie Jiang (possible past Tencent (China) affiliation)
Abstract

Autonomous mobile GUI agents have attracted increasing attention along with the advancement of Multimodal Large Language Models (MLLMs). However, existing methods still suffer from inefficient learning from failed trajectories and ambiguous credit assignment under sparse rewards for long-horizon GUI tasks. To that end, we propose UI-Voyager, a novel two-stage self-evolving mobile GUI agent. In the first stage, we employ Rejection Fine-Tuning (RFT), which enables the continuous co-evolution of da...
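The abstract cuts off before detailing the RFT stage, but the general rejection-sampling pattern it builds on can be sketched: sample several rollouts per task, keep only those a verifier accepts, and fine-tune on the survivors. Every name below is an illustrative placeholder, not the paper's API.

```python
# Minimal sketch of the data-filtering step behind Rejection Fine-Tuning:
# sample candidate trajectories, keep only verified successes, and use the
# survivors as fine-tuning data. Names are illustrative, not from the paper.
import random

def rejection_filter(tasks, sample_fn, verify_fn, k=4):
    """Return (task, trajectory) pairs whose trajectory passes verification."""
    kept = []
    for task in tasks:
        for _ in range(k):
            traj = sample_fn(task)
            if verify_fn(task, traj):
                kept.append((task, traj))
    return kept

# Toy usage: a "trajectory" is a guess at a target digit; the verifier
# accepts exact matches, so every kept pair is correct by construction.
random.seed(0)
data = rejection_filter(
    tasks=[3, 7],
    sample_fn=lambda t: random.randint(0, 9),
    verify_fn=lambda t, traj: traj == t,
    k=50,
)
assert all(traj == task for task, traj in data)
```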

📄 CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents
🗓️ Published: 3/25/2026
🔗 http://arxiv.org/abs/2603.24440v1
👥 Authors: Xiangru Jian, Shravan Nayak, Kevin Qinghong Lin (possible past National University Of Singapore affiliation), Aarash Feizi, Kaixin Li, Patrice Bechard, Spandana Gella (possible past University Of Edinburgh affiliation), Sai Rajeswar
Abstract

Computer-use agents (CUAs) hold great promise for automating complex desktop workflows, yet progress toward general-purpose agents is bottlenecked by the scarcity of continuous, high-quality human demonstration videos. Recent work emphasizes that continuous video, not sparse screenshots, is the critical missing ingredient for scaling these agents. However, the largest existing open dataset, ScaleCUA, contains only 2 million screenshots, equating to less than 20 hours of video. To address this bo...

📄 Boosting Document Parsing Efficiency and Performance with Coarse-to-Fine Visual Processing
🗓️ Published: 3/25/2026
🔗 http://arxiv.org/abs/2603.24326v1
👥 Authors: Cheng Cui, Ting Sun, Suyin Liang, Tingquan Gao, Zelun Zhang, Jiaxuan Liu, Xueqing Wang, Changda Zhou, Hongen Liu, Manhui Lin, Yue Zhang, Yubo Zhang (possible past Carnegie Mellon University affiliation), Jing Zhang (possible past University Of Washington affiliation), Jun Zhang (possible past Tencent (China) affiliation), Xing Wei, Yi Liu (possible past Google (United States) affiliation), Dianhai Yu (possible past Baidu (China) affiliation), Yanjun Ma (possible past Baidu (China) affiliation)
Abstract

Document parsing is a fine-grained task where image resolution significantly impacts performance. While advanced research leveraging vision-language models benefits from high-resolution input to boost model performance, this often leads to a quadratic increase in the number of vision tokens and significantly raises computational costs. We attribute this inefficiency to substantial redundancy in the visual regions of document images, such as the background. To tackle this, we propose PaddleOCR-VL, a novel coar...

📄 Powerful Teachers Matter: Text-Guided Multi-view Knowledge Distillation with Visual Prior Enhancement
🗓️ Published: 3/25/2026
🔗 http://arxiv.org/abs/2603.24208v1
👥 Authors: Xin Zhang (possible past Google (United States) affiliation), Jianyang Xu, Hao Peng (possible past Tsinghua University affiliation), Dongjing Wang, Jingyuan Zheng, Yu Li (possible past Tencent (China) affiliation), Yuyu Yin, Hongbo Wang
Abstract

Knowledge distillation transfers knowledge from large teacher models to smaller students for efficient inference. While existing methods primarily focus on distillation strategies, they often overlook the importance of enhancing teacher knowledge quality. In this paper, we propose Text-guided Multi-view Knowledge Distillation (TMKD), which leverages dual-modality teachers, a visual teacher and a text teacher (CLIP), to provide richer supervisory signals. Specifically, we enhance the visual teach...
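The dual-teacher objective is only outlined in this excerpt; as background, the standard temperature-scaled distillation loss that such methods typically build on can be sketched (a generic KD loss, not the paper's TMKD objective):

```python
# Generic temperature-scaled knowledge-distillation loss (standard KD, not
# the paper's TMKD): KL(teacher || student) on softened distributions,
# scaled by T^2 so gradient magnitudes stay comparable across temperatures.
import math

def softmax(logits, T=1.0):
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, T=2.0):
    p = softmax(teacher_logits, T)  # soft teacher targets
    q = softmax(student_logits, T)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return T * T * kl

# Matching logits give zero loss; disagreeing logits give a positive loss.
assert abs(kd_loss([1.0, 2.0], [1.0, 2.0])) < 1e-12
assert kd_loss([1.0, 2.0], [2.0, 1.0]) > 0.0
```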

📄 A Deep Dive into Scaling RL for Code Generation with Synthetic Data and Curricula
🗓️ Published: 3/25/2026
🔗 http://arxiv.org/abs/2603.24202v1
👥 Authors: Cansu Sancaktar, David Zhang (possible past Meta (United States) affiliation), Gabriel Synnaeve (possible past Meta (United States) affiliation), Taco Cohen
Abstract

Reinforcement learning (RL) has emerged as a powerful paradigm for improving large language models beyond supervised fine-tuning, yet sustaining performance gains at scale remains an open challenge, as data diversity and structure, rather than volume alone, become the limiting factor. We address this by introducing a scalable multi-turn synthetic data generation pipeline in which a teacher model iteratively refines problems based on in-context student performance summaries, producing structured ...

📄 Towards Effective Experiential Learning: Dual Guidance for Utilization and Internalization
🗓️ Published: 3/25/2026
🔗 http://arxiv.org/abs/2603.24093v1
👥 Authors: Fei Bai, Zhipeng Chen, Chuan Hao, Ming Yang (possible past Meta (United States) affiliation), Ran Tao, Bryan Dai, Wayne Xin Zhao (possible past Baidu (China) affiliation), Jian Yang, Hongteng Xu
Abstract

Recently, reinforcement learning (RL) has become an important approach for improving the capabilities of large language models (LLMs). In particular, reinforcement learning from verifiable rewards (RLVR) has emerged as a promising paradigm for reasoning tasks. However, existing RL-based training still remains only a rough approximation to human learning. Human learners leverage both external and internal experience to guide exploration and gradually internalize useful trajectories into stable kn...

📄 The Price Reversal Phenomenon: When Cheaper Reasoning Models End Up Costing More
🗓️ Published: 3/25/2026
🔗 http://arxiv.org/abs/2603.23971v1
👥 Authors: Lingjiao Chen, Chi Zhang (possible past Peking University affiliation), Yeye He, Ion Stoica (possible past University Of California, Berkeley affiliation), Matei Zaharia (possible past University Of California, Berkeley affiliation), James Zou
Abstract

Developers and consumers increasingly choose reasoning language models (RLMs) based on their listed API prices. However, how accurately do these prices reflect actual inference costs? We conduct the first systematic study of this question, evaluating 8 frontier RLMs across 9 diverse tasks covering competition math, science QA, code generation, and multi-domain reasoning. We uncover the pricing reversal phenomenon: in 21.8% of model-pair comparisons, the model with a lower listed price actually i...
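The reversal reduces to simple arithmetic: effective cost per query is the listed per-token price times the number of tokens the model actually generates, so a model that is cheaper per token but emits much longer reasoning traces can cost more per query. The numbers below are made up for illustration, not taken from the paper.

```python
# Effective cost per query = listed output price ($/1M tokens) x tokens
# the model actually generates. Illustrative numbers only.
def cost_per_query(price_per_mtok, tokens_generated):
    return price_per_mtok * tokens_generated / 1_000_000

cheap_verbose = cost_per_query(price_per_mtok=2.0, tokens_generated=8000)  # $0.016
pricey_terse = cost_per_query(price_per_mtok=5.0, tokens_generated=1500)   # $0.0075
assert cheap_verbose > pricey_terse  # lower listed price, higher actual cost
```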

📄 Self-Distillation for Multi-Token Prediction
🗓️ Published: 3/25/2026
🔗 http://arxiv.org/abs/2603.23911v1
👥 Authors: Guoliang Zhao, Ruobing Xie (possible past Tencent (China) affiliation), An Wang, Shuaipeng Li, Huaibing Xie, Xingwu Sun (possible past Baidu (China) affiliation)
Abstract

As Large Language Models (LLMs) scale up, inference efficiency becomes a critical bottleneck. Multi-Token Prediction (MTP) could accelerate LLM inference by predicting multiple future tokens in parallel. However, existing MTP approaches still face two challenges: limited acceptance rates of MTP heads, and difficulties in jointly training multiple MTP heads. Therefore, we propose MTP-D, a simple yet effective self-distillation method with minimal additional training cost, which boosts MTP head ac...
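The acceptance-rate notion is only outlined in this excerpt; as background, the greedy speculative-verification loop that multi-token drafting typically plugs into can be sketched (a generic scheme, not MTP-D itself):

```python
# Generic greedy speculative verification: accept draft tokens up to the
# first disagreement with the base model's own greedy prediction. A
# background sketch, not the paper's MTP-D method.
def accept_prefix(draft_tokens, base_next_token):
    accepted = []
    for tok in draft_tokens:
        if base_next_token(accepted) != tok:
            break  # first mismatch: discard the rest of the draft
        accepted.append(tok)
    return accepted

# Toy base model that always continues the sequence 1, 2, 3, ...
base = lambda prefix: len(prefix) + 1
assert accept_prefix([1, 2, 9, 4], base) == [1, 2]  # partial acceptance
assert accept_prefix([1, 2, 3], base) == [1, 2, 3]  # full acceptance
```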

📄 AgentChemist: A Multi-Agent Experimental Robotic Platform Integrating Chemical Perception and Precise Control
🗓️ Published: 3/25/2026
🔗 http://arxiv.org/abs/2603.23886v1
👥 Authors: Xiangyi Wei, Fei Wang, Haotian Zhang (possible past Stanford University affiliation), Xin An, Haitian Zhu, Lianrui Hu, Yang Li (possible past Google (United States) affiliation), Changbo Wang, Xiao He
Abstract

Chemical laboratory automation has long been constrained by rigid workflows and poor adaptability to the long-tail distribution of experimental tasks. While most automated platforms perform well on a narrow set of standardized procedures, real laboratories involve diverse, infrequent, and evolving operations that fall outside predefined protocols. This mismatch prevents existing systems from generalizing to novel reaction conditions, uncommon instrument configurations, and unexpected procedural ...

📄 Why the Maximum Second Derivative of Activations Matters for Adversarial Robustness
🗓️ Published: 3/25/2026
🔗 http://arxiv.org/abs/2603.23860v1
👥 Authors: Yunrui Yu, Hang Su (possible past Tsinghua University affiliation), Jun Zhu (possible past Tsinghua University affiliation)
Abstract

This work investigates the critical role of activation function curvature, quantified by the maximum second derivative max|σ''|, in adversarial robustness. Using the Recursive Curvature-Tunable Activation Family (RCT-AF), which enables precise control over curvature through parameters α and β, we systematically analyze this relationship. Our study reveals a fundamental trade-off: insufficient curvature limits model expressivity, while excessive curvature amplifies the normalized Hessi...
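As a concrete instance of the curvature quantity max|σ''| (the RCT-AF family itself is not specified in this excerpt): for softplus(x) = log(1 + e^x), the second derivative is sigmoid(x)(1 - sigmoid(x)), which peaks at 1/4 at x = 0. A quick numeric check:

```python
# Numeric check of max|sigma''| for a concrete activation. For
# softplus(x) = log(1 + e^x), sigma''(x) = sigmoid(x) * (1 - sigmoid(x)),
# which attains its maximum of 1/4 at x = 0.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softplus_curvature(x):
    s = sigmoid(x)
    return s * (1.0 - s)

grid = [i / 100.0 for i in range(-500, 501)]
max_curv = max(abs(softplus_curvature(x)) for x in grid)
assert abs(max_curv - 0.25) < 1e-9
```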

📄 The Diminishing Returns of Early-Exit Decoding in Modern LLMs
🗓️ Published: 3/24/2026
🔗 http://arxiv.org/abs/2603.23701v1
👥 Authors: Rui Wei, Rui Du, Hanfei Yu, Devesh Tiwari, Jian Li (possible past Tencent (China) affiliation), Zhaozhuo Xu, Hao Wang (possible past Tsinghua University affiliation)
Abstract

In Large Language Model (LLM) inference, early-exit refers to stopping computation at an intermediate layer once the prediction is sufficiently confident, thereby reducing latency and cost. However, recent LLMs adopt improved pretraining recipes and architectures that reduce layer redundancy, potentially limiting early-exit opportunities. We re-evaluate layer-wise early-exit in modern LLMs and analyze how intermediate representations evolve during training. We introduce a metric to quantify a mo...
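Layer-wise early exit in its simplest confidence-thresholded form can be sketched as: run layers in order and stop at the first whose softmax confidence clears a threshold. The function and toy numbers below are illustrative only and do not reproduce the paper's redundancy metric.

```python
# Confidence-thresholded early exit: stop at the first layer whose softmax
# confidence exceeds the threshold; otherwise fall through to the last layer.
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def early_exit_predict(layer_logits, threshold=0.9):
    """Return (predicted index, exit layer index)."""
    for depth, logits in enumerate(layer_logits):
        probs = softmax(logits)
        conf = max(probs)
        if conf >= threshold:
            return probs.index(conf), depth
    return probs.index(conf), len(layer_logits) - 1  # no early exit

# Layer 1 is already confident (~0.99), so we skip the final layer.
layers = [[0.1, 0.2, 0.0], [5.0, 0.1, 0.0], [6.0, 0.1, 0.0]]
assert early_exit_predict(layers) == (0, 1)
```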

📄 3DCity-LLM: Empowering Multi-modality Large Language Models for 3D City-scale Perception and Understanding
🗓️ Published: 3/24/2026
🔗 http://arxiv.org/abs/2603.23447v1
👥 Authors: Yiping Chen, Jinpeng Li, Wenyu Ke, Yang Luo, Jie Ouyang, Zhongjie He, Li Liu (possible past National University Of Defense Technology affiliation), Hongchao Fan, Hao Wu (possible past Tencent (China) affiliation)
Abstract

While multi-modality large language models excel in object-centric or indoor scenarios, scaling them to 3D city-scale environments remains a formidable challenge. To bridge this gap, we propose 3DCity-LLM, a unified framework designed for 3D city-scale vision-language perception and understanding. 3DCity-LLM employs a coarse-to-fine feature encoding strategy comprising three parallel branches for target object, inter-object relationship, and global scene. To facilitate large-scale training, we i...

📄 PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments
🗓️ Published: 3/24/2026
🔗 http://arxiv.org/abs/2603.23231v1
👥 Authors: Shuochen Liu, Junyi Zhu, Long Shu, Junda Lin, Yuhao Chen, Haotian Zhang (possible past Stanford University affiliation), Chao Zhang, Derong Xu, Jia Li (possible past Google (United States) affiliation), Bo Tang, Zhiyu Li, Feiyu Xiong, Enhong Chen (possible past Baidu (China) affiliation), Tong Xu (possible past Baidu (China) affiliation)
Abstract

Empowering large language models with long-term memory is crucial for building agents that adapt to users' evolving needs. However, prior evaluations typically interleave preference-related dialogues with irrelevant conversations, reducing the task to needle-in-a-haystack retrieval while ignoring relationships between events that drive the evolution of user preferences. Such settings overlook a fundamental characteristic of real-world personalization: preferences emerge gradually and accumulate ...

📄 ImplicitRM: Unbiased Reward Modeling from Implicit Preference Data for LLM alignment
🗓️ Published: 3/24/2026
🔗 http://arxiv.org/abs/2603.23184v1
👥 Authors: Hao Wang (possible past Tsinghua University affiliation), Haocheng Yang, Licheng Pan, Lei Shen, Xiaoxi Li, Yinuo Wang, Zhichao Chen, Yuan Lu, Haoxuan Li, Zhouchen Lin (possible past Peking University affiliation)
Abstract

Reward modeling represents a long-standing challenge in reinforcement learning from human feedback (RLHF) for aligning language models. Current reward modeling is heavily contingent upon experimental feedback data with high collection costs. In this work, we study implicit reward modeling, learning reward models from implicit human feedback (e.g., clicks and copies), as a cost-effective alternative. We identify two fundamental challenges in implicit reward modeling: (1) Implicit pre...

📄 MedCausalX: Adaptive Causal Reasoning with Self-Reflection for Trustworthy Medical Vision-Language Models
🗓️ Published: 3/24/2026
🔗 http://arxiv.org/abs/2603.23085v1
👥 Authors: Jianxin Lin, Chunzheng Zhu, Peter J. Kneuertz, Yunfei Bai (possible past Google (United States) affiliation), Yuan Xue (possible past Google (United States) affiliation)
Abstract

Vision-Language Models (VLMs) have enabled interpretable medical diagnosis by integrating visual perception with linguistic reasoning. Yet, existing medical chain-of-thought (CoT) models lack explicit mechanisms to represent and enforce causal reasoning, leaving them vulnerable to spurious correlations and limiting their clinical reliability. We pinpoint three core challenges in medical CoT reasoning: how to adaptively trigger causal correction, construct high-quality causal-spurious contrastive...

📄 StateLinFormer: Stateful Training Enhancing Long-term Memory in Navigation
🗓️ Published: 3/24/2026
🔗 http://arxiv.org/abs/2603.23571v1
👥 Authors: Zhiyuan Chen (possible past Google (United States) affiliation), Yuxuan Zhong, Fan Wang (possible past Baidu (China) affiliation), Bo Yu (possible past Baidu (China) affiliation), Pengtao Shao, Shaoshan Liu, Ning Ding (possible past Tsinghua University affiliation)
Abstract

Effective navigation intelligence relies on long-term memory to support both immediate generalization and sustained adaptation. However, existing approaches face a dilemma: modular systems rely on explicit mapping but lack flexibility, while Transformer-based end-to-end models are constrained by fixed context windows, limiting persistent memory across extended interactions. We introduce StateLinFormer, a linear-attention navigation model trained with a stateful memory mechanism that preserves re...

📄 JFTA-Bench: Evaluate LLM's Ability of Tracking and Analyzing Malfunctions Using Fault Trees
🗓️ Published: 3/24/2026
🔗 http://arxiv.org/abs/2603.22978v1
👥 Authors: Yuhui Wang, Zhixiong Yang, Ming Zhang (possible past Peking University affiliation), Shihan Dou, Zhiheng Xi, Enyu Zhou, Senjie Jin, Yujiong Shen, Dingwei Zhu, Yi Dong, Tao Gui, Qi Zhang (possible past Tencent (China) affiliation), Xuanjing Huang
Abstract

In the maintenance of complex systems, fault trees are used to locate problems and provide targeted solutions. To enable fault trees stored as images to be directly processed by large language models, which can assist in tracking and analyzing malfunctions, we propose a novel textual representation of fault trees. Building on it, we construct a benchmark for multi-turn dialogue systems that emphasizes robust interaction in complex environments, evaluating a model's ability to assist in malfuncti...

📄 DreamerAD: Efficient Reinforcement Learning via Latent World Model for Autonomous Driving
🗓️ Published: 3/25/2026
🔗 http://arxiv.org/abs/2603.24587v1
👥 Authors: Pengxuan Yang, Yupeng Zheng, Deheng Qian, Zebin Xing, Qichao Zhang, Linbo Wang, Yichen Zhang, Shaoyu Guo, Zhongpu Xia, Qiang Chen (possible past Baidu (China) affiliation), Junyu Han (possible past Baidu (China) affiliation), Lingyun Xu, Yifeng Pan (possible past Baidu (China) affiliation), Dongbin Zhao
Abstract

We introduce DreamerAD, the first latent world model framework that enables efficient reinforcement learning for autonomous driving by compressing diffusion sampling from 100 steps to 1, achieving an 80x speedup while maintaining visual interpretability. Training RL policies on real-world driving data incurs prohibitive costs and safety risks. While existing pixel-level diffusion world models enable safe imagination-based training, they suffer from multi-step diffusion inference latency (2s/frame)...

📄 Scaling Recurrence-aware Foundation Models for Clinical Records via Next-Visit Prediction
🗓️ Published: 3/25/2026
🔗 http://arxiv.org/abs/2603.24562v1
👥 Authors: Haresh Rengaraj Rajamohan, Xiang Gao, Weicheng Zhu, Shih-Lun Huang, Long Chen (possible past Tencent (China) affiliation), Gabe Schulman, Huizhen Jin, Shengduo Li, Yixuan Wang, Huidi Yang, Kyunghyun Cho (possible past Meta (United States) affiliation), Cem M. Deniz, Narges Razavian
Abstract

While large-scale pretraining has revolutionized language modeling, its potential remains underexplored in healthcare with structured electronic health records (EHRs). We present RAVEN, a novel generative pretraining strategy for sequential EHR data based on Recurrence-Aware next-Visit EveNt prediction. Leveraging a dataset of over one million unique individuals, our model learns to autoregressively generate tokenized clinical events for the next visit conditioned on patient history. We introduc...

📄 AVO: Agentic Variation Operators for Autonomous Evolutionary Search
🗓️ Published: 3/25/2026
🔗 http://arxiv.org/abs/2603.24517v1
👥 Authors: Terry Chen, Zhifan Ye, Bing Xu (possible past Tsinghua University affiliation), Zihao Ye, Timmy Liu, Ali Hassani, Tianqi Chen (possible past University Of Washington affiliation), Andrew Kerr, Haicheng Wu, Yang Xu, Yu-Jung Chen, Hanfeng Chen, Aditya Kane, Ronny Krashinsky (possible past Nvidia (United States) affiliation), Ming-Yu Liu (possible past Nvidia (United States) affiliation), Vinod Grover, Luis Ceze, Roger Bringmann, John Tran, Wei Liu (possible past Tsinghua University affiliation), Fung Xie, Michael Lightstone, Humphrey Shi
Abstract

Agentic Variation Operators (AVO) are a new family of evolutionary variation operators that replace the fixed mutation, crossover, and hand-designed heuristics of classical evolutionary search with autonomous coding agents. Rather than confining a language model to candidate generation within a prescribed pipeline, AVO instantiates variation as a self-directed agent loop that can consult the current lineage, a domain-specific knowledge base, and execution feedback to propose, repair, critique, a...

📄 Composer 2 Technical Report
🗓️ Published: 3/25/2026
🔗 http://arxiv.org/abs/2603.24477v1
👥 Authors: Cursor Research, Aaron Chan, Ahmed Shalaby, Alexander Wettig, Aman Sanger, Andrew Zhai, Anurag Ajay, Ashvin Nair (possible past University Of California, Berkeley affiliation), Charlie Snell, Chen Lu, Chen Shen (possible past Tencent (China) affiliation), Emily Jia, Federico Cassano, Hanpeng Liu, Haoyu Chen, Henry Wildermuth, Jacob Jackson, Janet Li, Jediah Katz, Jiajun Yao, Joey Hejna, Josh Warner, Julius Vering, Kevin Frans, Lee Danilek, Less Wright, Lujing Cen, Luke Melas-Kyriazi, Michael Truell, Michiel De Jong (possible past Google (United States) affiliation), Naman Jain, Nate Schmidt, Nathan Wang, Niklas Muennighoff, Oleg Rybkin, Paul Loh, Phillip Kravtsov, Rishabh Yadav, Sahil Shah, Sam Kottler, Alexander M Rush, Shengtong Zhang, Shomil Jain, Sriram Sankar, Stefan Heule (possible past Eth Zurich affiliation), Stuart H. Sul, Sualeh Asif, Victor Rong, Wanqi Zhu, William Lin, Yuchen Wu (possible past Google (United States) affiliation), Yuri Volkov, Yury Zemlyanskiy (possible past Google (United States) affiliation), Zack Holbrook, Zhiyuan Zhang
Abstract

Composer 2 is a specialized model designed for agentic software engineering. The model demonstrates strong long-term planning and coding intelligence while maintaining the ability to efficiently solve problems for interactive use. The model is trained in two phases: first, continued pretraining to improve the model's knowledge and latent coding ability, followed by large-scale reinforcement learning to improve end-to-end coding performance through stronger reasoning, accurate multi-step executio...

*Notable papers are those with at least two authors from a "big" AI/ML lab.