πŸ“„ Notable* Recent AI/ML arXiv Papers

πŸ“„ Vega: Learning to Drive with Natural Language Instructions
πŸ—“οΈ Published: 3/26/2026
πŸ”— http://arxiv.org/abs/2603.25741v1
πŸ‘₯ Authors: Sicheng Zuo, Yuxuan Li, Wenzhao Zheng, Zheng Zhu, Jie Zhou (possible past Tsinghua University affiliation), Jiwen Lu (possible past Tsinghua University affiliation)
Abstract

Vision-language-action models have reshaped autonomous driving by incorporating language into the decision-making process. However, most existing pipelines only utilize the language modality for scene descriptions or reasoning and lack the flexibility to follow diverse user instructions for personalized driving. To address this, we first construct a large-scale driving dataset (InstructScene) containing around 100,000 scenes annotated with diverse driving instructions and the corresponding traje...

πŸ“„ Back to Basics: Revisiting ASR in the Age of Voice Agents
πŸ—“οΈ Published: 3/26/2026
πŸ”— http://arxiv.org/abs/2603.25727v1
πŸ‘₯ Authors: Geeyang Tay, Wentao Ma, Jaewon Lee, Yuzhi Tang, Daniel Lee, Weisu Yin, Dongming Shen, Silin Meng, Yi Zhu, Mu Li (possible past Carnegie Mellon University affiliation), Alex Smola (possible past Google (United States) affiliation)
Abstract

Automatic speech recognition (ASR) systems have achieved near-human accuracy on curated benchmarks, yet still fail in real-world voice agents under conditions that current evaluations do not systematically cover. Without diagnostic tools that isolate specific failure factors, practitioners cannot anticipate which conditions, in which languages, will cause what degree of degradation. We introduce WildASR, a multilingual (four-language) diagnostic benchmark sourced entirely from real human speech ...
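Claims like "near-human accuracy on curated benchmarks" are typically word error rate (WER) numbers; as background, a minimal WER implementation (the standard word-level Levenshtein distance, not anything specific to this paper):

```python
def word_error_rate(reference, hypothesis):
    """Standard WER: word-level edit distance (substitutions + insertions
    + deletions) divided by the reference length -- the metric behind most
    ASR benchmark scores."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution/match
    return d[len(ref)][len(hyp)] / len(ref)

assert word_error_rate("the cat sat", "the cat sat") == 0.0
assert word_error_rate("the cat sat", "the bat sat") == 1 / 3  # one substitution
```

A benchmark like the one described would report such scores per language and per controlled condition.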

πŸ“„ Voxtral TTS
πŸ—“οΈ Published: 3/26/2026
πŸ”— http://arxiv.org/abs/2603.25551v1
πŸ‘₯ Authors: Alexander H. Liu, Alexis Tacnet, Andy Ehrenberg, Andy Lo, Chen-Yo Sun, Guillaume Lample, Henry Lagarde, Jean-Malo Delignon, Jaeyoung Kim, John Harvill, Khyathi Raghavi Chandu (possible past Carnegie Mellon University affiliation), Lorenzo Signoretti, Margaret Jennings, Patrick Von Platen, Pavankumar Reddy Muddireddy, Rohin Arora, Sanchit Gandhi, Samuel Humeau, Soham Ghosh, Srijan Mishra, Van Phung, Abdelaziz Bounhar, Abhinav Rastogi (possible past Google (United States) affiliation), Adrien SadΓ©, Alan Jeffares, Albert Jiang, Alexandre Cahill, Alexandre Gavaudan, Alexandre Sablayrolles, AmΓ©lie HΓ©liou, Amos You, Andrew Bai, Andrew Zhao, Angele Lenglemetz, Anmol Agarwal, Anton Eliseev, Antonia Calvi, Arjun Majumdar, Arthur Fournier, Artjom Joosen, Avi Sooriyarachchi, Aysenur Karaduman Utkur, Baptiste Bout, Baptiste RoziΓ¨re, Baudouin De Monicault, Benjamin Tibi, Bowen Yang, Charlotte CronjΓ€ger, ClΓ©mence Lanfranchi, Connor Chen, Corentin Barreau, Corentin Sautier, Cyprien Courtot, Darius Dabert, Diego De Las Casas (possible past Deepmind (United Kingdom) affiliation), Elizaveta Demyanenko, Elliot Chane-Sane, Emmanuel Gottlob, Enguerrand Paquin, Etienne Goffinet, Fabien Niel, Faruk Ahmed, Federico Baldassarre, Gabrielle Berrada, GaΓ«tan Ecrepont, Gauthier Guinet, Genevieve Hayes, Georgii Novikov, Giada Pistilli, Guillaume Kunsch, Guillaume Martin, Guillaume Raille, Gunjan Dhanuka, Gunshi Gupta, Han Zhou, Harshil Shah, Hope Mcgovern, Hugo Thimonier, Indraneel Mukherjee (possible past Google (United States) affiliation), Irene Zhang, Jacques Sun, Jan Ludziejewski, Jason Rute, JΓ©rΓ©mie Dentan, Joachim Studnia, Jonas Amar, JosΓ©phine Delas, Josselin Somerville Roberts, Julien Tauran, Karmesh Yadav, Kartik Khandelwal, Kilian Tep, Kush Jain, Laurence Aitchison, Laurent Fainsin, LΓ©onard Blier, Lingxiao Zhao, Louis Martin, Lucile Saulnier, Luyu Gao, Maarten Buyl, Manan Sharma, Marie Pellat, Mark Prins, Martin Alexandre, Mathieu PoirΓ©e, Mathieu Schmitt, 
Mathilde Guillaumin, Matthieu Dinot, Matthieu Futeral, Maxime Darrin, Maximilian Augustin, Mert Unsal, Mia Chiquier, Mikhail Biriuchinskii, Minh-Quang Pham, Mircea Lica, Morgane RiviΓ¨re (possible past Meta (United States) affiliation), Nathan Grinsztajn, Neha Gupta, Olivier Bousquet (possible past Google (United States) affiliation), Olivier Duchenne, Patricia Wang, Paul Jacob, Paul Wambergue, Paula Kurylowicz, Philippe Pinel, PhilomΓ¨ne Chagniot, Pierre Stock, Piotr MiΕ‚oΕ›, Prateek Gupta, Pravesh Agrawal, Quentin Torroba, Ram Ramrakhya, Randall Isenhour, Rishi Shah, Romain Sauvestre, Roman Soletskyi, Rosalie Millner, Rupert Menneer, Sagar Vaze, Samuel Barry, Samuel Belkadi, Sandeep Subramanian (possible past Carnegie Mellon University affiliation), Sean Cha, Shashwat Verma, Siddhant Waghjale, Siddharth Gandhi, Simon Lepage, Sumukh Aithal, Szymon Antoniak, Tarun Kumar Vangani, Teven Le Scao, ThΓ©o Cachet, Theo Simon Sorg, Thibaut Lavril (possible past Meta (United States) affiliation), Thomas Chabal, Thomas Foubert, Thomas Robert, Thomas Wang, Tim Lawson, Tom Bewley, Tom Edwards, Tyler Wang, Umar Jamil, Umberto Tomasini, Valeriia Nemychnikova, Vedant Nanda, Victor Jouault, Vincent MaladiΓ¨re, Vincent Pfister, Virgile Richard, Vladislav Bataev, Wassim Bouaziz, Wen-Ding Li, William Havard, William Marshall, Xinghui Li, Xingran Guo, Xinyu Yang, Yannic Neuhaus, Yassine El Ouahidi, Yassir Bendou, Yihan Wang, Yimu Pan, Zaccharie Ramzi, Zhenlin Xu
Abstract

We introduce Voxtral TTS, an expressive multilingual text-to-speech model that generates natural speech from as little as 3 seconds of reference audio. Voxtral TTS adopts a hybrid architecture that combines auto-regressive generation of semantic speech tokens with flow-matching for acoustic tokens. These tokens are encoded and decoded with Voxtral Codec, a speech tokenizer trained from scratch with a hybrid VQ-FSQ quantization scheme. In human evaluations conducted by native speakers, Voxtral TT...

πŸ“„ Evaluating Language Models for Harmful Manipulation
πŸ—“οΈ Published: 3/26/2026
πŸ”— http://arxiv.org/abs/2603.25326v1
πŸ‘₯ Authors: Canfer Akbulut, Rasmi Elasmar, Abhishek Roy, Anthony Payne, Priyanka Suresh, Lujain Ibrahim, Seliem El-Sayed, Charvi Rastogi, Ashyana Kachra, Will Hawkins (possible past Deepmind (United Kingdom) affiliation), Kristian Lum (possible past Google (United States) affiliation), Laura Weidinger (possible past Deepmind (United Kingdom) affiliation)
Abstract

Interest in the concept of AI-driven harmful manipulation is growing, yet current approaches to evaluating it are limited. This paper introduces a framework for evaluating harmful AI manipulation via context-specific human-AI interaction studies. We illustrate the utility of this framework by assessing an AI model with 10,101 participants spanning interactions in three AI use domains (public policy, finance, and health) and three locales (US, UK, and India). Overall, we find that the tested...

πŸ“„ Activation Matters: Test-time Activated Negative Labels for OOD Detection with Vision-Language Models
πŸ—“οΈ Published: 3/26/2026
πŸ”— http://arxiv.org/abs/2603.25250v1
πŸ‘₯ Authors: Yabin Zhang, Maya Varma, Yunhe Gao, Jean-Benoit Delbrouck, Jiaming Liu (possible past Baidu (China) affiliation), Chong Wang (possible past Google (United States) affiliation), Curtis Langlotz
Abstract

Out-of-distribution (OOD) detection aims to identify samples that deviate from the in-distribution (ID) data. One popular pipeline addresses this by introducing negative labels distant from ID classes and detecting OOD samples based on their distance to these labels. However, such labels may exhibit poor activation on OOD samples, failing to capture the OOD characteristics. To address this, we propose Test-time Activated Negative Labels (TANL) by dynamically evalua...
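A minimal sketch of the negative-label scoring pipeline the abstract describes, with random toy vectors standing in for CLIP-style embeddings (label sets, dimensions, and scoring details are illustrative assumptions; TANL's actual contribution, adapting the negative labels at test time, is not modeled here):

```python
import numpy as np

rng = np.random.default_rng(0)

def unit(v):
    # L2-normalize so dot products act as cosine similarities
    return v / np.linalg.norm(v)

def ood_score(image_feat, id_label_feats, neg_label_feats):
    """Toy negative-label OOD score: softmax the image feature's cosine
    similarities over ID and negative label embeddings; the score is the
    probability mass on the negative labels (higher = more OOD-like)."""
    logits = np.concatenate([id_label_feats @ image_feat,
                             neg_label_feats @ image_feat])
    probs = np.exp(logits) / np.exp(logits).sum()
    return probs[len(id_label_feats):].sum()

id_labels = np.stack([unit(rng.normal(size=8)) for _ in range(3)])
neg_labels = np.stack([unit(rng.normal(size=8)) for _ in range(3)])

id_like = unit(id_labels[0] + 0.1 * rng.normal(size=8))    # near an ID label
ood_like = unit(neg_labels[0] + 0.1 * rng.normal(size=8))  # near a negative label
assert ood_score(id_like, id_labels, neg_labels) < ood_score(ood_like, id_labels, neg_labels)
```

The abstract's critique is visible in this framing: if no negative label activates strongly on a given OOD sample, its score stays low and the sample is missed.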

πŸ“„ FluxEDA: A Unified Execution Infrastructure for Stateful Agentic EDA
πŸ—“οΈ Published: 3/26/2026
πŸ”— http://arxiv.org/abs/2603.25243v1
πŸ‘₯ Authors: Zhengrui Chen, Zixuan Song, Yu Li (possible past Tencent (China) affiliation), Qi Sun (possible past Google (United States) affiliation), Cheng Zhuo
Abstract

Large language models and autonomous agents are increasingly explored for EDA automation, but many existing integrations still rely on script-level or request-level interactions, which makes it difficult to preserve tool state and support iterative optimization in real production-oriented environments. In this work, we present FluxEDA, a unified and stateful infrastructure substrate for agentic EDA. FluxEDA introduces a managed gateway-based execution interface with structured request and respon...

πŸ“„ Photon: Speedup Volume Understanding with Efficient Multimodal Large Language Models
πŸ—“οΈ Published: 3/26/2026
πŸ”— http://arxiv.org/abs/2603.25155v1
πŸ‘₯ Authors: Chengyu Fang, Heng Guo (possible past Tencent (China) affiliation), Zheng Jiang, Chunming He, Xiu Li (possible past Tsinghua University affiliation), Minfeng Xu
Abstract

Multimodal large language models are promising for clinical visual question answering tasks, but scaling to 3D imaging is hindered by high computational costs. Prior methods often rely on 2D slices or fixed-length token compression, disrupting volumetric continuity and obscuring subtle findings. We present Photon, a framework that represents 3D medical volumes with token sequences of variable length. Photon introduces instruction-conditioned token scheduling and surrogate gradient propagation to...

πŸ“„ UniAI-GraphRAG: Synergizing Ontology-Guided Extraction, Multi-Dimensional Clustering, and Dual-Channel Fusion for Robust Multi-Hop Reasoning
πŸ—“οΈ Published: 3/26/2026
πŸ”— http://arxiv.org/abs/2603.25152v1
πŸ‘₯ Authors: Jie Wang (possible past Tsinghua University affiliation), Honghua Huang, Xi Ge, Jianhui Su, Wen Liu (possible past Tencent (China) affiliation), Shiguo Lian
Abstract

Retrieval-Augmented Generation (RAG) systems face significant challenges in complex reasoning, multi-hop queries, and domain-specific QA. While existing GraphRAG frameworks have made progress in structural knowledge organization, they still have limitations in cross-industry adaptability, community report integrity, and retrieval performance. This paper proposes UniAI-GraphRAG, an enhanced framework built upon open-source GraphRAG. The framework introduces three core innovations: (1) Ontology-Gu...

πŸ“„ MCLMR: A Model-Agnostic Causal Learning Framework for Multi-Behavior Recommendation
πŸ—“οΈ Published: 3/26/2026
πŸ”— http://arxiv.org/abs/2603.25126v1
πŸ‘₯ Authors: Ranxu Zhang, Junjie Meng, Ying Sun, Ziqi Xu, Bing Yin, Hao Li (possible past Tsinghua University affiliation), Yanyong Zhang, Chao Wang (possible past Google (United States) affiliation)
Abstract

Multi-Behavior Recommendation (MBR) leverages multiple user interaction types (e.g., views, clicks, purchases) to enrich preference modeling and alleviate data sparsity issues in traditional single-behavior approaches. However, existing MBR methods face fundamental challenges: they lack principled frameworks to model complex confounding effects from user behavioral habits and item multi-behavior distributions, struggle with effective aggregation of heterogeneous auxiliary behaviors, and fail to ...

πŸ“„ Reaching Beyond the Mode: RL for Distributional Reasoning in Language Models
πŸ—“οΈ Published: 3/25/2026
πŸ”— http://arxiv.org/abs/2603.24844v1
πŸ‘₯ Authors: Isha Puri, Mehul Damani, Idan Shenfeld, Marzyeh Ghassemi (possible past University Of Toronto affiliation), Jacob Andreas (possible past University Of California, Berkeley affiliation), Yoon Kim (possible past University Of Oxford affiliation)
Abstract

Given a question, a language model (LM) implicitly encodes a distribution over possible answers. In practice, post-training procedures for LMs often collapse this distribution onto a single dominant mode. While this is generally not a problem for benchmark-style evaluations that assume one correct answer, many real-world tasks inherently involve multiple valid answers or irreducible uncertainty. Examples include medical diagnosis, ambiguous question answering, and settings with incomplete inform...

πŸ“„ UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience
πŸ—“οΈ Published: 3/25/2026
πŸ”— http://arxiv.org/abs/2603.24533v1
πŸ‘₯ Authors: Zichuan Lin, Feiyu Liu, Yijun Yang, Jiafei Lyu, Yiming Gao, Yicheng Liu, Zhicong Lu, Yangbin Yu, Mingyu Yang, Junyou Li, Deheng Ye (possible past Tencent (China) affiliation), Jie Jiang (possible past Tencent (China) affiliation)
Abstract

Autonomous mobile GUI agents have attracted increasing attention along with the advancement of Multimodal Large Language Models (MLLMs). However, existing methods still suffer from inefficient learning from failed trajectories and ambiguous credit assignment under sparse rewards for long-horizon GUI tasks. To that end, we propose UI-Voyager, a novel two-stage self-evolving mobile GUI agent. In the first stage, we employ Rejection Fine-Tuning (RFT), which enables the continuous co-evolution of da...
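The data-selection step of Rejection Fine-Tuning, as the technique is generally described, amounts to filtering sampled trajectories by reward before supervised tuning; a toy sketch (not the paper's code; the trajectory format and reward are invented):

```python
def rejection_filter(trajectories, reward_fn, threshold=1.0):
    """Rejection Fine-Tuning (RFT), data side: keep only trajectories
    whose reward clears the threshold, then fine-tune on the survivors
    (the fine-tuning step itself is omitted here)."""
    return [t for t in trajectories if reward_fn(t) >= threshold]

# Invented (action, outcome) trajectories with a binary success reward.
trajs = [("tap", "ok"), ("swipe", "fail"), ("type", "ok")]
kept = rejection_filter(trajs, reward_fn=lambda t: 1.0 if t[1] == "ok" else 0.0)
assert kept == [("tap", "ok"), ("type", "ok")]
```

Plain rejection discards the failed trajectories entirely, which is exactly the inefficiency the abstract says UI-Voyager's failure-learning stage targets.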

πŸ“„ CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents
πŸ—“οΈ Published: 3/25/2026
πŸ”— http://arxiv.org/abs/2603.24440v1
πŸ‘₯ Authors: Xiangru Jian, Shravan Nayak, Kevin Qinghong Lin (possible past National University Of Singapore affiliation), Aarash Feizi, Kaixin Li, Patrice Bechard, Spandana Gella (possible past University Of Edinburgh affiliation), Sai Rajeswar
Abstract

Computer-use agents (CUAs) hold great promise for automating complex desktop workflows, yet progress toward general-purpose agents is bottlenecked by the scarcity of continuous, high-quality human demonstration videos. Recent work emphasizes that continuous video, not sparse screenshots, is the critical missing ingredient for scaling these agents. However, the largest existing open dataset, ScaleCUA, contains only 2 million screenshots, equating to less than 20 hours of video. To address this bo...
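The abstract's framing of 2 million screenshots as "less than 20 hours of video" checks out under a standard 30 fps assumption (the frame rate is our assumption, not stated in the excerpt):

```python
# Sanity check: 2M screenshots treated as individual video frames.
frames = 2_000_000
fps = 30                       # assumed frame rate
hours = frames / fps / 3600    # seconds of video, converted to hours
assert hours < 20              # ~18.5 hours at 30 fps
```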

πŸ“„ Boosting Document Parsing Efficiency and Performance with Coarse-to-Fine Visual Processing
πŸ—“οΈ Published: 3/25/2026
πŸ”— http://arxiv.org/abs/2603.24326v1
πŸ‘₯ Authors: Cheng Cui, Ting Sun, Suyin Liang, Tingquan Gao, Zelun Zhang, Jiaxuan Liu, Xueqing Wang, Changda Zhou, Hongen Liu, Manhui Lin, Yue Zhang, Yubo Zhang (possible past Carnegie Mellon University affiliation), Jing Zhang (possible past University Of Washington affiliation), Jun Zhang (possible past Tencent (China) affiliation), Xing Wei, Yi Liu (possible past Google (United States) affiliation), Dianhai Yu (possible past Baidu (China) affiliation), Yanjun Ma (possible past Baidu (China) affiliation)
Abstract

Document parsing is a fine-grained task where image resolution significantly impacts performance. While advanced research leveraging vision-language models benefits from high-resolution input to boost model performance, this often leads to a quadratic increase in the number of vision tokens and significantly raises computational costs. We attribute this inefficiency to substantial redundancy in the visual regions of document images, such as the background. To tackle this, we propose PaddleOCR-VL, a novel coar...
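The quadratic token growth the abstract points to follows directly from patch-based tokenization; a back-of-envelope sketch (patch size 14 is a common ViT choice, assumed here for illustration):

```python
def vision_tokens(height, width, patch=14):
    """ViT-style token count: one token per patch, so the count grows
    quadratically when both image sides scale up together."""
    return (height // patch) * (width // patch)

low = vision_tokens(448, 448)     # 32 * 32 = 1024 tokens
high = vision_tokens(1792, 1792)  # 4x the resolution per side
assert high == 16 * low           # 4x per side -> 16x the tokens
```

Dropping redundant background regions attacks exactly this multiplier.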

πŸ“„ Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale
πŸ—“οΈ Published: 3/26/2026
πŸ”— http://arxiv.org/abs/2603.25040v1
πŸ‘₯ Authors: Yicheng Zou, Dongsheng Zhu, Lin Zhu, Tong Zhu (possible past Nvidia (United States) affiliation), Yunhua Zhou, Peiheng Zhou, Xinyu Zhou, Dongzhan Zhou, Zhiwang Zhou, Yuhao Zhou, Bowen Zhou, Zhanping Zhong, Zhijie Zhong, Haiteng Zhao, Penghao Zhao, Xiaomeng Zhao, Zhiyuan Zhao, Yechen Zhang, Jin Zhang, Wenwei Zhang, Hongjie Zhang, Zhuo Zhang, Wenlong Zhang, Bo Zhang (possible past Tencent (China) affiliation), Chao Zhang, Chen Zhang (possible past Peking University affiliation), Yuhang Zang, Fei Yuan, Jiakang Yuan, Jiashuo Yu, Jinhui Yin, Haochen Ye, Qian Yao, Bowen Yang, Danni Yang, Kaichen Yang, Ziang Yan, Jun Xu (possible past Google (United States) affiliation), Yicheng Xu, Wanghan Xu, Xuenan Xu, Chao Xu, Ruiliang Xu, Shuhao Xing, Long Xing, Xinchen Xie, Ling-I Wu, Zijian Wu, Zhenyu Wu, Lijun Wu, Yue Wu, Jianyu Wu, Wen Wu, Fan Wu, Xilin Wei, Qi Wei, Bingli Wang, Rui Wang (possible past Tencent (China) affiliation), Ziyi Wang, Zun Wang, Yi Wang, Haomin Wang, Yizhou Wang (possible past Peking University affiliation), Lintao Wang, Yiheng Wang, Longjiang Wang, Bin Wang, Jian Tong, Zhongbo Tian, Huanze Tang, Chen Tang, Shixiang Tang, Yu Sun (possible past Baidu (China) affiliation), Qiushi Sun, Xuerui Su, Qisheng Su, Chenlin Su, Demin Song, Jin Shi, Fukai Shang, Yuchen Ren, Pengli Ren, Xiaoye Qu, Yuan Qu, Jiantao Qiu, Yu Qiao (possible past Shanghai Artificial Intelligence Laboratory affiliation), Runyu Peng, Tianshuo Peng, Jiahui Peng, Qizhi Pei, Zhuoshi Pan, Linke Ouyang, Wenchang Ning, Yichuan Ma, Zerun Ma, Ningsheng Ma, Runyuan Ma, Chengqi Lyu, Haijun Lv (possible past Baidu (China) affiliation), Han Lv, Lindong Lu, Kuikun Liu, Jiangning Liu, Yuhong Liu, Kai Liu (possible past Baidu (China) affiliation), Hongwei Liu, Zhoumianze Liu, Mengjie Liu, Ziyu Liu, Wenran Liu, Yang Liu (possible past Tsinghua University affiliation), Liwei Liu, Kaiwen Liu, Junyao Lin, Junming Lin, Tianyang Lin, Dahua Lin, Jianze Liang, Linyang Li, Peiji Li, Zonglin Li, Zehao Li, Pengze Li, Guoyan Li, Lingkai Kong, Linglin Jing, Zhenjiang Jin, Feifei Jiang, Qian Jiang, Junhao Huang, Zixian Huang, Haian Huang, Zhouqi Hua, Han Hu, Linfeng Hou, Yinan He, Conghui He (possible past Tsinghua University affiliation), Tianyao He, Xu Guo, Qipeng Guo, Aijia Guo, Yuzhe Gu, Lixin Gu, Jingyang Gong, Qiming Ge, Jiaye Ge, Songyang Gao, Jianfei Gao, Xinyu Fang, Caihua Fan, Yue Fan, Yanhui Duan, Zichen Ding, Shengyuan Ding, Xuanlang Dai, Erfei Cui, Ganqu Cui (possible past Tsinghua University affiliation), Pei Chu, Tao Chu, Guangran Cheng, Yu Cheng (possible past National University Of Singapore affiliation), Kai Chen (possible past Shanghai Jiao Tong University affiliation), Yongkang Chen, Chiyu Chen, Guanzhou Chen, Qiaosheng Chen, Sitao Chen, Xin Chen (possible past Tencent (China) affiliation), Haojiong Chen, Yicheng Chen, Weihan Cao, Yuhang Cao, Qinglong Cao, Lei Bai
Abstract

We introduce Intern-S1-Pro, the first one-trillion-parameter scientific multimodal foundation model. Scaling to this unprecedented size, the model delivers a comprehensive enhancement across both general and scientific domains. Beyond stronger reasoning and image-text understanding capabilities, its intelligence is augmented with advanced agent capabilities. Simultaneously, its scientific expertise has been vastly expanded to master over 100 specialized tasks across critical science fields, incl...

πŸ“„ Estimating near-verbatim extraction risk in language models with decoding-constrained beam search
πŸ—“οΈ Published: 3/26/2026
πŸ”— http://arxiv.org/abs/2603.24917v1
πŸ‘₯ Authors: A. Feder Cooper, Mark A. Lemley, Christopher De Sa, Lea Duesterwald, Allison Casasola, Jamie Hayes, Katherine Lee (possible past Google (United States) affiliation), Daniel E. Ho, Percy Liang (possible past Stanford University affiliation)
Abstract

Recent work shows that standard greedy-decoding extraction methods for quantifying memorization in LLMs miss how extraction risk varies across sequences. Probabilistic extraction -- computing the probability of generating a target suffix given a prefix under a decoding scheme -- addresses this, but is tractable only for verbatim memorization, missing near-verbatim instances that pose similar privacy and copyright risks. Quantifying near-verbatim extraction risk is expensive: the set of near-verb...
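The probabilistic-extraction quantity the abstract builds on, P(suffix | prefix), is just a product of next-token probabilities; a toy illustration with an invented three-context model (the near-verbatim case this paper targets would additionally have to cover a large set of similar suffixes, which is what makes it expensive):

```python
import math

# Hypothetical toy next-token model: maps a context tuple to a distribution
# over next tokens, standing in for an LLM's softmax output.
TOY_LM = {
    ("the",): {"cat": 0.6, "dog": 0.3, "end": 0.1},
    ("the", "cat"): {"sat": 0.7, "ran": 0.2, "end": 0.1},
    ("the", "cat", "sat"): {"end": 1.0},
}

def suffix_log_prob(prefix, suffix):
    """Probabilistic extraction: log P(suffix | prefix) is the sum of the
    model's next-token log-probabilities along the target suffix."""
    context = tuple(prefix)
    logp = 0.0
    for tok in suffix:
        logp += math.log(TOY_LM[context][tok])
        context = context + (tok,)
    return logp

lp = suffix_log_prob(["the"], ["cat", "sat"])  # log(0.6) + log(0.7)
assert abs(math.exp(lp) - 0.42) < 1e-9
```

Greedy decoding would report only whether "cat sat" is the argmax continuation; the probability above captures extraction risk even when it is not.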

πŸ“„ DreamerAD: Efficient Reinforcement Learning via Latent World Model for Autonomous Driving
πŸ—“οΈ Published: 3/25/2026
πŸ”— http://arxiv.org/abs/2603.24587v1
πŸ‘₯ Authors: Pengxuan Yang, Yupeng Zheng, Deheng Qian, Zebin Xing, Qichao Zhang, Linbo Wang, Yichen Zhang, Shaoyu Guo, Zhongpu Xia, Qiang Chen (possible past Baidu (China) affiliation), Junyu Han (possible past Baidu (China) affiliation), Lingyun Xu, Yifeng Pan (possible past Baidu (China) affiliation), Dongbin Zhao
Abstract

We introduce DreamerAD, the first latent world model framework that enables efficient reinforcement learning for autonomous driving by compressing diffusion sampling from 100 steps to 1, achieving an 80x speedup while maintaining visual interpretability. Training RL policies on real-world driving data incurs prohibitive costs and safety risks. While existing pixel-level diffusion world models enable safe imagination-based training, they suffer from multi-step diffusion inference latency (2s/frame)...

πŸ“„ Demystifying When Pruning Works via Representation Hierarchies
πŸ—“οΈ Published: 3/25/2026
πŸ”— http://arxiv.org/abs/2603.24652v1
πŸ‘₯ Authors: Shwai He, Guoheng Sun, Haichao Zhang (possible past Baidu (China) affiliation), Yun Fu, Ang Li (possible past Google (United States) affiliation)
Abstract

Network pruning, which removes less important parameters or architectures, is often expected to improve efficiency while preserving performance. However, this expectation does not consistently hold across language tasks: pruned models can perform well on non-generative tasks but frequently fail in generative settings. To understand this discrepancy, we analyze network pruning from a representation-hierarchy perspective, decomposing the internal computation of language models into three sequentia...
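For context, the most common instantiation of "removing less important parameters" is magnitude pruning; a minimal sketch (a generic baseline, not this paper's analysis method):

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Magnitude pruning: zero out the smallest-magnitude fraction of
    weights given by `sparsity`. Ties at the threshold may zero a few
    extra entries."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

w = np.array([[0.1, -2.0], [0.05, 1.5]])
pruned = magnitude_prune(w, sparsity=0.5)
# the two smallest-magnitude entries (0.1 and 0.05) are zeroed
assert np.count_nonzero(pruned) == 2
```

The paper's question is then where in the representation hierarchy such zeroed weights hurt generative versus non-generative behavior.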

πŸ“„ Scaling Recurrence-aware Foundation Models for Clinical Records via Next-Visit Prediction
πŸ—“οΈ Published: 3/25/2026
πŸ”— http://arxiv.org/abs/2603.24562v1
πŸ‘₯ Authors: Haresh Rengaraj Rajamohan, Xiang Gao, Weicheng Zhu, Shih-Lun Huang, Long Chen (possible past Tencent (China) affiliation), Gabe Schulman, Huizhen Jin, Shengduo Li, Yixuan Wang, Huidi Yang, Kyunghyun Cho (possible past Meta (United States) affiliation), Cem M. Deniz, Narges Razavian
Abstract

While large-scale pretraining has revolutionized language modeling, its potential remains underexplored in healthcare with structured electronic health records (EHRs). We present RAVEN, a novel generative pretraining strategy for sequential EHR data based on Recurrence-Aware next-Visit EveNt prediction. Leveraging a dataset of over one million unique individuals, our model learns to autoregressively generate tokenized clinical events for the next visit conditioned on patient history. We introduc...

πŸ“„ AVO: Agentic Variation Operators for Autonomous Evolutionary Search
πŸ—“οΈ Published: 3/25/2026
πŸ”— http://arxiv.org/abs/2603.24517v1
πŸ‘₯ Authors: Terry Chen, Zhifan Ye, Bing Xu (possible past Tsinghua University affiliation), Zihao Ye, Timmy Liu, Ali Hassani, Tianqi Chen (possible past University Of Washington affiliation), Andrew Kerr, Haicheng Wu, Yang Xu, Yu-Jung Chen, Hanfeng Chen, Aditya Kane, Ronny Krashinsky (possible past Nvidia (United States) affiliation), Ming-Yu Liu (possible past Nvidia (United States) affiliation), Vinod Grover, Luis Ceze, Roger Bringmann, John Tran, Wei Liu (possible past Tsinghua University affiliation), Fung Xie, Michael Lightstone, Humphrey Shi
Abstract

Agentic Variation Operators (AVO) are a new family of evolutionary variation operators that replace the fixed mutation, crossover, and hand-designed heuristics of classical evolutionary search with autonomous coding agents. Rather than confining a language model to candidate generation within a prescribed pipeline, AVO instantiates variation as a self-directed agent loop that can consult the current lineage, a domain-specific knowledge base, and execution feedback to propose, repair, critique, a...
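The structural shift AVO proposes can be seen by making the variation operator a pluggable callable in an otherwise ordinary evolutionary loop; in this toy sketch a numeric mutation stands in for the coding agent (everything here is illustrative, not the paper's system):

```python
import random

def evolve(population, variation_op, fitness, generations=200, seed=0):
    """Minimal (1+1)-style evolutionary loop. `variation_op` is a pluggable
    callable -- the slot AVO fills with an autonomous coding agent that can
    consult lineage, knowledge bases, and execution feedback."""
    rng = random.Random(seed)
    best = max(population, key=fitness)
    for _ in range(generations):
        child = variation_op(best, rng)
        if fitness(child) >= fitness(best):  # greedy acceptance
            best = child
    return best

# Toy task: maximize -x^2 from x=5, with a random numeric "mutation"
# standing in for an agent-proposed edit.
result = evolve([5.0], lambda x, rng: x + rng.uniform(-1, 1),
                fitness=lambda x: -x * x)
assert abs(result) < 5.0
```

Swapping the lambda for an agent that rewrites candidate programs, rather than nudging a number, is the substitution the abstract describes.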

πŸ“„ Composer 2 Technical Report
πŸ—“οΈ Published: 3/25/2026
πŸ”— http://arxiv.org/abs/2603.24477v2
πŸ‘₯ Authors: Cursor Research, Aaron Chan, Ahmed Shalaby, Alexander Wettig, Aman Sanger, Andrew Zhai, Anurag Ajay, Ashvin Nair (possible past University Of California, Berkeley affiliation), Charlie Snell, Chen Lu, Chen Shen (possible past Tencent (China) affiliation), Emily Jia, Federico Cassano, Hanpeng Liu, Haoyu Chen, Henry Wildermuth, Jacob Jackson, Janet Li, Jediah Katz, Jiajun Yao, Joey Hejna, Josh Warner, Julius Vering, Kevin Frans, Lee Danilek, Less Wright, Lujing Cen, Luke Melas-Kyriazi, Michael Truell, Michiel De Jong (possible past Google (United States) affiliation), Naman Jain, Nate Schmidt, Nathan Wang, Niklas Muennighoff, Oleg Rybkin, Paul Loh, Phillip Kravtsov, Rishabh Yadav, Sahil Shah, Sam Kottler, Alexander M Rush, Shengtong Zhang, Shomil Jain, Sriram Sankar, Stefan Heule (possible past Eth Zurich affiliation), Stuart H. Sul, Sualeh Asif, Victor Rong, Wanqi Zhu, William Lin, Yuchen Wu (possible past Google (United States) affiliation), Yuri Volkov, Yury Zemlyanskiy (possible past Google (United States) affiliation), Zack Holbrook, Zhiyuan Zhang
Abstract

Composer 2 is a specialized model designed for agentic software engineering. The model demonstrates strong long-term planning and coding intelligence while maintaining the ability to efficiently solve problems for interactive use. The model is trained in two phases: first, continued pretraining to improve the model's knowledge and latent coding ability, followed by large-scale reinforcement learning to improve end-to-end coding performance through stronger reasoning, accurate multi-step executio...

πŸ“„ A Deep Dive into Scaling RL for Code Generation with Synthetic Data and Curricula
πŸ—“οΈ Published: 3/25/2026
πŸ”— http://arxiv.org/abs/2603.24202v1
πŸ‘₯ Authors: Cansu Sancaktar, David Zhang (possible past Meta (United States) affiliation), Gabriel Synnaeve (possible past Meta (United States) affiliation), Taco Cohen
Abstract

Reinforcement learning (RL) has emerged as a powerful paradigm for improving large language models beyond supervised fine-tuning, yet sustaining performance gains at scale remains an open challenge, as data diversity and structure, rather than volume alone, become the limiting factor. We address this by introducing a scalable multi-turn synthetic data generation pipeline in which a teacher model iteratively refines problems based on in-context student performance summaries, producing structured ...

πŸ“„ Towards Effective Experiential Learning: Dual Guidance for Utilization and Internalization
πŸ—“οΈ Published: 3/25/2026
πŸ”— http://arxiv.org/abs/2603.24093v1
πŸ‘₯ Authors: Fei Bai, Zhipeng Chen, Chuan Hao, Ming Yang (possible past Meta (United States) affiliation), Ran Tao, Bryan Dai, Wayne Xin Zhao (possible past Baidu (China) affiliation), Jian Yang, Hongteng Xu
Abstract

Recently, reinforcement learning (RL) has become an important approach for improving the capabilities of large language models (LLMs). In particular, reinforcement learning from verifiable rewards (RLVR) has emerged as a promising paradigm for reasoning tasks. However, existing RL-based training remains only a rough approximation to human learning. Human learners leverage both external and internal experience to guide exploration and gradually internalize useful trajectories into stable kn...

*Notable papers are those with at least two authors from a "big" AI/ML lab.