πŸ“„ Notable* Recent AI/ML arXiv Papers

πŸ“„ Think Before You Lie: How Reasoning Improves Honesty
πŸ—“οΈ Published: 3/10/2026
πŸ”— http://arxiv.org/abs/2603.09957v1
πŸ‘₯ Authors: Ann Yuan (possible past Google (United States) affiliation), Asma Ghandeharioun, Carter Blum, Alicia Machado, Jessica Hoffmann, Daphne Ippolito (possible past Google (United States) affiliation), Martin Wattenberg (possible past Google (United States) affiliation), Lucas Dixon (possible past Google (United States) affiliation), Katja Filippova (possible past Google (United States) affiliation)
Abstract

While existing evaluations of large language models (LLMs) measure deception rates, the underlying conditions that give rise to deceptive behavior are poorly understood. We investigate this question using a novel dataset of realistic moral trade-offs where honesty incurs variable costs. Contrary to humans, who tend to become less honest given time to deliberate (Capraro, 2017; Capraro et al., 2019), we find that reasoning consistently increases honesty across scales and for several LLM families....

πŸ“„ Adaptive Clinical-Aware Latent Diffusion for Multimodal Brain Image Generation and Missing Modality Imputation
πŸ—“οΈ Published: 3/10/2026
πŸ”— http://arxiv.org/abs/2603.09931v1
πŸ‘₯ Authors: Rong Zhou (possible past Google (United States) affiliation), Houliang Zhou, Yao Su, Brian Y. Chen, Yu Zhang (possible past Google (United States) affiliation), Lifang He, Alzheimer's Disease Neuroimaging Initiative
Abstract

Multimodal neuroimaging provides complementary insights for Alzheimer's disease diagnosis, yet clinical datasets frequently suffer from missing modalities. We propose ACADiff, a framework that synthesizes missing brain imaging modalities through adaptive clinical-aware diffusion. ACADiff learns mappings between incomplete multimodal observations and target modalities by progressively denoising latent representations while attending to available imaging data and clinical metadata. The framework e...

πŸ“„ MedMASLab: A Unified Orchestration Framework for Benchmarking Multimodal Medical Multi-Agent Systems
πŸ—“οΈ Published: 3/10/2026
πŸ”— http://arxiv.org/abs/2603.09909v1
πŸ‘₯ Authors: Yunhang Qian, Xiaobin Hu (possible past Tencent (China) affiliation), Jiaquan Yu, Siyang Xin, Xiaokun Chen, Jiangning Zhang (possible past Tencent (China) affiliation), Peng-Tao Jiang, Jiawei Liu, Hongwei Bran Li
Abstract

While Multi-Agent Systems (MAS) show potential for complex clinical decision support, the field remains hindered by architectural fragmentation and the lack of standardized multimodal integration. Current medical MAS research suffers from non-uniform data ingestion pipelines, inconsistent visual-reasoning evaluation, and a lack of cross-specialty benchmarking. To address these challenges, we present MedMASLab, a unified framework and benchmarking platform for multimodal medical multi-agent syste...

πŸ“„ Emerging Extrinsic Dexterity in Cluttered Scenes via Dynamics-aware Policy Learning
πŸ—“οΈ Published: 3/10/2026
πŸ”— http://arxiv.org/abs/2603.09882v1
πŸ‘₯ Authors: Yixin Zheng, Jiangran Lyu, Yifan Zhang, Jiayi Chen, Mi Yan, Yuntian Deng, Xuesong Shi, Xiaoguang Zhao, Yizhou Wang (possible past Peking University affiliation), Zhizheng Zhang, He Wang (possible past Stanford University affiliation)
Abstract

Extrinsic dexterity leverages environmental contact to overcome the limitations of prehensile manipulation. However, achieving such dexterity in cluttered scenes remains challenging and underexplored, as it requires selectively exploiting contact among multiple interacting objects with inherently coupled dynamics. Existing approaches lack explicit modeling of such complex dynamics and therefore fall short in non-prehensile manipulation in cluttered environments, which in turn limits their practi...

πŸ“„ Logics-Parsing-Omni Technical Report
πŸ—“οΈ Published: 3/10/2026
πŸ”— http://arxiv.org/abs/2603.09677v1
πŸ‘₯ Authors: Xin An, Jingyi Cai, Xiangyang Chen, Huayao Liu, Peiting Liu, Peng Wang (possible past Peking University affiliation), Bei Yang, Xiuwen Zhu, Yongfan Chen, Baoyu Hou, Shuzhao Li, Weidong Ren, Fan Yang (possible past Tencent (China) affiliation), Jiangtao Zhang, Xiaoxiao Xu, Lin Qu
Abstract

Addressing the challenges of fragmented task definitions and the heterogeneity of unstructured data in multimodal parsing, this paper proposes the Omni Parsing framework. This framework establishes a Unified Taxonomy covering documents, images, and audio-visual streams, introducing a progressive parsing paradigm that bridges perception and cognition. Specifically, the framework integrates three hierarchical levels: 1) Holistic Detection, which achieves precise spatial-temporal grounding of objec...

πŸ“„ Efficiently Aligning Draft Models via Parameter- and Data-Efficient Adaptation
πŸ—“οΈ Published: 3/10/2026
πŸ”— http://arxiv.org/abs/2603.09527v1
πŸ‘₯ Authors: Luxi Lin, Zhihang Lin, Zhanpeng Zeng, Yuhao Chen, Qingyu Zhang, Jixiang Luo, Xuelong Li (possible past Tencent (China) affiliation), Rongrong Ji (possible past Tencent (China) affiliation)
Abstract

Speculative decoding accelerates LLM inference but suffers from performance degradation when target models are fine-tuned for specific domains. A naive solution is to retrain draft models for every target model, which is costly and inefficient. To address this, we introduce a parameter- and data-efficient framework named Efficient Draft Adaptation, abbreviated as EDA, for efficiently adapting draft models. EDA introduces three innovations: (1) a decoupled architecture that utilizes shared and pr...
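For context on what the draft model is being aligned to, here is a minimal sketch of the standard speculative-decoding accept/reject rule that such systems rely on. This is generic background, not EDA itself: a draft model proposes a token with probability q(x), and the target model accepts it with probability min(1, p(x)/q(x)), otherwise resampling from the renormalized residual max(0, p - q). The toy distributions below are illustrative only.

```python
import random

def speculative_step(p, q, proposed, rng=random):
    """One speculative-decoding verification step.

    p, q: dicts mapping token -> probability for the target and draft models.
    proposed: the token sampled from the draft distribution q.
    Returns (token, accepted). Over many steps the output tokens are
    distributed exactly according to p, so a misaligned draft model only
    lowers the acceptance rate (speed), never correctness.
    """
    if rng.random() < min(1.0, p[proposed] / q[proposed]):
        return proposed, True  # accepted: keep the draft's token
    # Rejected: resample from the residual distribution max(0, p - q).
    residual = {t: max(0.0, p[t] - q[t]) for t in p}
    z = sum(residual.values())
    r, acc = rng.random() * z, 0.0
    for t, w in residual.items():
        acc += w
        if r <= acc:
            return t, False
    return t, False  # numerical fallback

# Illustrative usage with made-up distributions:
p = {"yes": 0.9, "no": 0.1}  # target model
q = {"yes": 0.5, "no": 0.5}  # draft model
tok, accepted = speculative_step(p, q, "yes")
```

The correctness guarantee is why fine-tuning the target model without re-aligning the draft (the problem EDA addresses) degrades speed rather than output quality.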

πŸ“„ Evolving Prompt Adaptation for Vision-Language Models
πŸ—“οΈ Published: 3/10/2026
πŸ”— http://arxiv.org/abs/2603.09493v1
πŸ‘₯ Authors: Enming Zhang, Jiayang Li, Yanru Wu, Zhenyu Liu (possible past Google (United States) affiliation), Yang Li (possible past Google (United States) affiliation)
Abstract

The adaptation of large-scale vision-language models (VLMs) to downstream tasks with limited labeled data remains a significant challenge. While parameter-efficient prompt learning methods offer a promising path, they often suffer from catastrophic forgetting of pre-trained knowledge. Toward addressing this limitation, our work is grounded in the insight that governing the evolutionary path of prompts is essential for forgetting-free adaptation. To this end, we propose EvoPrompt, a novel framewo...

πŸ“„ EvoDriveVLA: Evolving Autonomous Driving Vision-Language-Action Model via Collaborative Perception-Planning Distillation
πŸ—“οΈ Published: 3/10/2026
πŸ”— http://arxiv.org/abs/2603.09465v1
πŸ‘₯ Authors: Jiajun Cao, Xiaoan Zhang, Xiaobao Wei, Liyuqiu Huang, Wang Zijian, Hanzhen Zhang, Zhengyu Jia, Wei Mao, Hao Wang (possible past Tsinghua University affiliation), Xianming Liu (possible past Meta (United States) affiliation), Shuchang Zhou Liu, Yang Wang (possible past Baidu (China) affiliation), Shanghang Zhang
Abstract

Vision-Language-Action models have shown great promise for autonomous driving, yet they suffer from degraded perception after unfreezing the visual encoder and struggle with accumulated instability in long-term planning. To address these challenges, we propose EvoDriveVLA, a novel collaborative perception-planning distillation framework that integrates self-anchored perceptual constraints and oracle-guided trajectory optimization. Specifically, self-anchored visual distillation leverages self-anc...

πŸ“„ An Empirical Study and Theoretical Explanation on Task-Level Model-Merging Collapse
πŸ—“οΈ Published: 3/10/2026
πŸ”— http://arxiv.org/abs/2603.09463v1
πŸ‘₯ Authors: Yuan Cao (possible past Google (United States) affiliation), Dezhi Ran, Yuzhe Guo, Mengzhou Wu, Simin Chen, Linyi Li, Wei Yang (possible past Tencent (China) affiliation), Tao Xie
Abstract

Model merging unifies independently fine-tuned LLMs from the same base, enabling reuse and integration of parallel development efforts without retraining. However, in practice we observe that merging does not always succeed: certain combinations of task-specialist models suffer from catastrophic performance degradation after merging. We refer to this failure mode as merging collapse. Intuitively, collapse arises when the learned representations or parameter adjustments for different tasks are fu...

πŸ“„ ICDAR 2025 Competition on End-to-End Document Image Machine Translation Towards Complex Layouts
πŸ—“οΈ Published: 3/10/2026
πŸ”— http://arxiv.org/abs/2603.09392v1
πŸ‘₯ Authors: Yaping Zhang, Yupu Liang, Zhiyang Zhang, Zhiyuan Chen (possible past Google (United States) affiliation), Lu Xiang, Yang Zhao (possible past Google (United States) affiliation), Yu Zhou, Chengqing Zong
Abstract

Document Image Machine Translation (DIMT) seeks to translate text embedded in document images from one language to another by jointly modeling both textual content and page layout, bridging optical character recognition (OCR) and natural language processing (NLP). The DIMT 2025 Challenge advances research on end-to-end document image translation, a rapidly evolving area within multimodal document understanding. The competition features two tracks, OCR-free and OCR-based, each with two subtasks f...

πŸ“„ Logos: An evolvable reasoning engine for rational molecular design
πŸ—“οΈ Published: 3/10/2026
πŸ”— http://arxiv.org/abs/2603.09268v1
πŸ‘₯ Authors: Haibin Wen, Zhe Zhao (possible past Tencent (China) affiliation), Fanfu Wang, Tianyi Xu, Hao Zhang (possible past Tencent (China) affiliation), Chao Yang, Ye Wei
Abstract

The discovery and design of functional molecules remain central challenges across chemistry, biology, and materials science. While recent advances in machine learning have accelerated molecular property prediction and candidate generation, existing models tend to excel either in physical fidelity without transparent reasoning, or in flexible reasoning without guarantees of chemical validity. This imbalance limits the reliability of artificial intelligence systems in real scientific design workflo...

πŸ“„ Latent-DARM: Bridging Discrete Diffusion And Autoregressive Models For Reasoning
πŸ—“οΈ Published: 3/10/2026
πŸ”— http://arxiv.org/abs/2603.09184v1
πŸ‘₯ Authors: Lina Berrayana, Ahmed Heakl, Abdullah Sohail, Thomas Hofmann (possible past Google (United States) affiliation), Salman Khan (possible past Inception Institute Of Artificial Intelligence affiliation), Wei Chen
Abstract

Most multi-agent systems rely exclusively on autoregressive language models (ARMs) that are based on sequential generation. Although effective for fluent text, ARMs limit global reasoning and plan revision. On the other hand, Discrete Diffusion Language Models (DDLMs) enable non-sequential, globally revisable generation and have shown strong planning capabilities, but their limited text fluency hinders direct collaboration with ARMs. We introduce Latent-DARM, a latent-space communication framewo...

πŸ“„ Reinforced Generation of Combinatorial Structures: Ramsey Numbers
πŸ—“οΈ Published: 3/10/2026
πŸ”— http://arxiv.org/abs/2603.09172v1
πŸ‘₯ Authors: Ansh Nagda, Prabhakar Raghavan (possible past Google (United States) affiliation), Abhradeep Thakurta (possible past Google (United States) affiliation)
Abstract

We present improved lower bounds for five classical Ramsey numbers: R(3, 13) is increased from 60 to 61, R(3, 18) from 99 to 100, R(4, 13) from 138 to 139, R(4, 14) from 147 to 148, and R(4, 15) from 158 to 159. These results were achieved using AlphaEvolve, an LLM-based code mutation agent. Beyond these new results, we successfully recovered lower bounds for all Ramsey numbers known to be exact, and matched the be...
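As background on what such a lower bound certifies: a witness for R(s, t) > n is a red/blue edge 2-coloring of the complete graph on n vertices with no red s-clique and no blue t-clique, and verifying a witness is a simple exhaustive check. The toy example below (not from the paper) uses the classic 5-cycle to certify R(3, 3) > 5.

```python
from itertools import combinations

# Witness: color the edges of the 5-cycle C5 red and all other
# edges of K5 blue. C5 is triangle-free, and its complement is
# also a 5-cycle, hence triangle-free too: this proves R(3, 3) > 5.
n = 5
red = {frozenset({i, (i + 1) % n}) for i in range(n)}
blue = {frozenset(e) for e in combinations(range(n), 2)} - red

def has_mono_clique(color_edges, size):
    """True if some size-subset of vertices has all its edges in one color."""
    return any(
        all(frozenset(e) in color_edges for e in combinations(vs, 2))
        for vs in combinations(range(n), size)
    )

assert not has_mono_clique(red, 3)   # no red triangle
assert not has_mono_clique(blue, 3)  # no blue triangle
print("C5 certifies R(3, 3) > 5")
```

The search problem the paper tackles is finding such colorings at much larger n, where brute force fails; verification of a found witness remains cheap by comparison.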

πŸ“„ ZeroWBC: Learning Natural Visuomotor Humanoid Control Directly from Human Egocentric Video
πŸ—“οΈ Published: 3/10/2026
πŸ”— http://arxiv.org/abs/2603.09170v1
πŸ‘₯ Authors: Haoran Yang, Jiacheng Bao, Yucheng Xin, Haoming Song, Yuyang Tian, Bin Zhao, Dong Wang (possible past Tsinghua University affiliation), Xuelong Li (possible past Tencent (China) affiliation)
Abstract

Achieving versatile and naturalistic whole-body control for humanoid robot scene-interaction remains a significant challenge. While some recent works have demonstrated autonomous humanoid interactive control, they are constrained to rigid locomotion patterns and expensive teleoperation data collection, lacking the versatility to execute more human-like natural behaviors such as sitting or kicking. Furthermore, acquiring the necessary real robot teleoperation data is prohibitively expensive and t...

πŸ“„ Deep Tabular Research via Continual Experience-Driven Execution
πŸ—“οΈ Published: 3/10/2026
πŸ”— http://arxiv.org/abs/2603.09151v1
πŸ‘₯ Authors: Junnan Dong, Chuang Zhou, Zheng Yuan, Yifei Yu, Siyu An, Di Yin, Xing Sun (possible past Tencent (China) affiliation), Feiyue Huang (possible past Tencent (China) affiliation)
Abstract

Large language models often struggle with complex long-horizon analytical tasks over unstructured tables, which typically feature hierarchical and bidirectional headers and non-canonical layouts. We formalize this challenge as Deep Tabular Research (DTR), requiring multi-step reasoning over interdependent table regions. To address DTR, we propose a novel agentic framework that treats tabular reasoning as a closed-loop decision-making process. We carefully design a coupled query and table compreh...

πŸ“„ Meissa: Multi-modal Medical Agentic Intelligence
πŸ—“οΈ Published: 3/9/2026
πŸ”— http://arxiv.org/abs/2603.09018v1
πŸ‘₯ Authors: Yixiong Chen, Xinyi Bai, Yue Pan, Zongwei Zhou (possible past Google (United States) affiliation), Alan Yuille (possible past Google (United States) affiliation)
Abstract

Multi-modal large language models (MM-LLMs) have shown strong performance in medical image understanding and clinical reasoning. Recent medical agent systems extend them with tool use and multi-agent collaboration, enabling complex decision-making. However, these systems rely almost entirely on frontier models (e.g., GPT), whose API-based deployment incurs high cost, high latency, and privacy risks that conflict with on-premise clinical requirements. We present Meissa, a lightweight 4B-parameter...

πŸ“„ Fish Audio S2 Technical Report
πŸ—“οΈ Published: 3/9/2026
πŸ”— http://arxiv.org/abs/2603.08823v1
πŸ‘₯ Authors: Shijia Liao, Yuxuan Wang (possible past Google (United States) affiliation), Songting Liu, Yifan Cheng, Ruoyi Zhang, Tianyu Li, Shidong Li, Yisheng Zheng, Xingwei Liu, Qingzheng Wang, Zhizhuo Zhou, Jiahua Liu, Xin Chen (possible past Tencent (China) affiliation), Dawei Han
Abstract

We introduce Fish Audio S2, an open-source text-to-speech system featuring multi-speaker, multi-turn generation, and, most importantly, instruction-following control via natural-language descriptions. To scale training, we develop a multi-stage training recipe together with a staged data pipeline covering video captioning and speech captioning, voice-quality assessment, and reward modeling. To push the frontier of open-source TTS, we release our model weights, fine-tuning code, and an SGLang-ba...

πŸ“„ A New Lower Bound for the Random Offerer Mechanism in Bilateral Trade using AI-Guided Evolutionary Search
πŸ—“οΈ Published: 3/9/2026
πŸ”— http://arxiv.org/abs/2603.08679v1
πŸ‘₯ Authors: Yang Cai, Vineet Gupta (possible past Google (United States) affiliation), Zun Li, Aranyak Mehta (possible past Google (United States) affiliation)
Abstract

The celebrated Myerson--Satterthwaite theorem shows that in bilateral trade, no mechanism can be simultaneously fully efficient, Bayesian incentive compatible (BIC), and budget balanced (BB). This naturally raises the question of how closely the gains from trade (GFT) achievable by a BIC and BB mechanism can approximate the first-best (fully efficient) benchmark. The optimal BIC and BB mechanism is typically complex and highly distribution-dependent, making it difficult to characterize directly....

πŸ“„ OfficeQA Pro: An Enterprise Benchmark for End-to-End Grounded Reasoning
πŸ—“οΈ Published: 3/9/2026
πŸ”— http://arxiv.org/abs/2603.08655v1
πŸ‘₯ Authors: Krista Opsahl-Ong, Arnav Singhvi, Jasmine Collins (possible past Google (United States) affiliation), Ivan Zhou, Cindy Wang, Ashutosh Baheti, Owen Oertell, Jacob Portes, Sam Havens, Erich Elsen (possible past Google (United States) affiliation), Michael Bendersky (possible past Google (United States) affiliation), Matei Zaharia (possible past University Of California, Berkeley affiliation), Xing Chen
Abstract

We introduce OfficeQA Pro, a benchmark for evaluating AI agents on grounded, multi-document reasoning over a large and heterogeneous document corpus. The corpus consists of U.S. Treasury Bulletins spanning nearly 100 years, comprising 89,000 pages and over 26 million numerical values. OfficeQA Pro consists of 133 questions that require precise document parsing, retrieval, and analytical reasoning across both unstructured text and tabular data. Frontier LLMs including Claude Opus 4.6, GPT-5.4, an...

πŸ“„ Towards Effective and Efficient Graph Alignment without Supervision
πŸ—“οΈ Published: 3/9/2026
πŸ”— http://arxiv.org/abs/2603.08526v1
πŸ‘₯ Authors: Songyang Chen, Youfang Lin, Yu Liu, Shuai Zheng (possible past University Of Oxford affiliation), Lei Zou (possible past Peking University affiliation)
Abstract

Unsupervised graph alignment aims to find the node correspondence across different graphs without any anchor node pairs. Despite the recent efforts utilizing deep learning-based techniques, such as the embedding and optimal transport (OT)-based approaches, we observe their limitations in terms of model accuracy-efficiency tradeoff. By focusing on the exploitation of local and global graph information, we formalize them as the "local representation, global alignment" paradigm, and present a new...

πŸ“„ A prospective clinical feasibility study of a conversational diagnostic AI in an ambulatory primary care clinic
πŸ—“οΈ Published: 3/9/2026
πŸ”— http://arxiv.org/abs/2603.08448v2
πŸ‘₯ Authors: Peter Brodeur, Jacob M. Koshy, Anil Palepu (possible past Google (United States) affiliation), Khaled Saab, Ava Homiar, Roma Ruparel, Charles Wu, Ryutaro Tanno (possible past Google (United States) affiliation), Joseph Xu, Amy Wang, David Stutz, Hannah M. Ferrera, David Barrett, Lindsey Crowley, Jihyeon Lee, Spencer E. Rittner, Ellery Wulczyn (possible past Google (United States) affiliation), Selena K. Zhang, Elahe Vedadi, Christine G. Kohn, Kavita Kulkarni, Vinay Kadiyala, Sara Mahdavi, Wendy Du, Jessica Williams, David Feinbloom, Renee Wong (possible past Google (United States) affiliation), Tao Tu (possible past Google (United States) affiliation), Petar Sirkovic, Alessio Orlandi, Christopher Semturs (possible past Google (United States) affiliation), Yun Liu (possible past Google (United States) affiliation), Juraj Gottweis (possible past Google (United States) affiliation), Dale R. Webster (possible past Google (United States) affiliation), JoΓ«lle Barral (possible past Google (United States) affiliation), Katherine Chou (possible past Google (United States) affiliation), Pushmeet Kohli (possible past Google (United States) affiliation), Avinatan Hassidim (possible past Google (United States) affiliation), Yossi Matias (possible past Google (United States) affiliation), James Manyika, Rob Fields, Jonathan X. Li, Marc L. Cohen, Vivek Natarajan (possible past Google (United States) affiliation), Mike Schaekermann (possible past Google (United States) affiliation), Alan Karthikesalingam (possible past Google (United States) affiliation), Adam Rodman
Abstract

Large language model (LLM)-based AI systems have shown promise for patient-facing diagnostic and management conversations in simulated settings. Translating these systems into clinical practice requires assessment in real-world workflows with rigorous safety oversight. We report a prospective, single-arm feasibility study of an LLM-based conversational AI, the Articulate Medical Intelligence Explorer (AMIE), conducting clinical history taking and presentation of potential diagnoses for patients ...

πŸ“„ Beyond Test-Time Training: Learning to Reason via Hardware-Efficient Optimal Control
πŸ—“οΈ Published: 3/10/2026
πŸ”— http://arxiv.org/abs/2603.09221v1
πŸ‘₯ Authors: Peihao Wang, Shan Yang (possible past Google (United States) affiliation), Xijun Wang, Tesi Xiao, Xin Liu, Changlong Yu, Yu Lou, Pan Li (possible past Baidu (China) affiliation), Zhangyang Wang, Ming Lin, RenΓ© Vidal
Abstract

Associative memory has long underpinned the design of sequential models. Beyond recall, humans reason by projecting future states and selecting goal-directed actions, a capability that modern language models increasingly require but do not natively encode. While prior work uses reinforcement learning or test-time training, planning remains external to the model architecture. We formulate reasoning as optimal control and introduce the Test-Time Control (TTC) layer, which performs finite-horizon L...

πŸ“„ SCALAR: Learning and Composing Skills through LLM Guided Symbolic Planning and Deep RL Grounding
πŸ—“οΈ Published: 3/10/2026
πŸ”— http://arxiv.org/abs/2603.09036v1
πŸ‘₯ Authors: Renos Zabounidis, Yue Wu, Simon Stepputtis, Woojun Kim, Yuanzhi Li (possible past University Of California, Berkeley affiliation), Tom Mitchell, Katia Sycara (possible past Carnegie Mellon University affiliation)
Abstract

LM-based agents excel when given high-level action APIs but struggle to ground language into low-level control. Prior work has LLMs generate skills or reward functions for RL, but these one-shot approaches lack feedback to correct specification errors. We introduce SCALAR, a bidirectional framework coupling LLM planning with RL through a learned skill library. The LLM proposes skills with preconditions and effects; RL trains policies for each skill and feeds back execution results to iteratively...

πŸ“„ Two Teachers Better Than One: Hardware-Physics Co-Guided Distributed Scientific Machine Learning
πŸ—“οΈ Published: 3/10/2026
πŸ”— http://arxiv.org/abs/2603.09032v1
πŸ‘₯ Authors: Yuchen Yuan (possible past Baidu (China) affiliation), Junhuan Yang, Hao Wan, Yipei Liu, Hanhan Wu, Youzuo Lin, Lei Yang (possible past Google (United States) affiliation)
Abstract

Scientific machine learning (SciML) is increasingly applied to in-field processing, controlling, and monitoring; however, wide-area sensing, real-time demands, and strict energy and reliability constraints make centralized SciML implementation impractical. Most SciML models assume raw data aggregation at a central node, incurring prohibitively high communication latency and energy costs; yet, distributing models developed for general-purpose ML often breaks essential physical principles, resulti...

πŸ“„ The Coupling Within: Flow Matching via Distilled Normalizing Flows
πŸ—“οΈ Published: 3/9/2026
πŸ”— http://arxiv.org/abs/2603.09014v1
πŸ‘₯ Authors: David Berthelot (possible past Google (United States) affiliation), Tianrong Chen, Jiatao Gu (possible past Meta (United States) affiliation), Marco Cuturi, Laurent Dinh (possible past Google (United States) affiliation), Bhavik Chandna, Michal Klein, Josh Susskind, Shuangfei Zhai
Abstract

Flow models have rapidly become the go-to method for training and deploying large-scale generators, owing their success to inference-time flexibility via adjustable integration steps. A crucial ingredient in flow training is the choice of coupling measure for sampling noise/data pairs that define the flow matching (FM) regression loss. While FM training defaults usually to independent coupling, recent works show that adaptive couplings informed by noise/data distributions (e.g., via optimal tran...
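The "independent coupling" default the abstract mentions can be sketched in a few lines. This is a generic flow-matching training target, not the paper's distilled coupling: noise x0 and data x1 are sampled independently, interpolated along a linear path, and a velocity field is regressed onto x1 - x0. The 2D toy data here is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def fm_batch(data, batch_size, rng):
    """One flow-matching training batch under independent coupling.

    Returns (x_t, t, target): the model v_theta(x_t, t) is trained to
    minimize the mean squared error against target = x1 - x0.
    """
    x1 = data[rng.integers(len(data), size=batch_size)]  # data samples
    x0 = rng.standard_normal(x1.shape)                   # noise, sampled independently
    t = rng.uniform(size=(batch_size, 1))                # random times in [0, 1]
    xt = (1.0 - t) * x0 + t * x1                         # linear interpolation path
    return xt, t, x1 - x0                                # constant-velocity target

# Toy 2D dataset centered at (5, 5):
data = rng.standard_normal((1000, 2)) + 5.0
xt, t, target = fm_batch(data, 64, rng)
```

Adaptive couplings, of the kind the paper builds on, replace the independent draw of (x0, x1) with a joint one (e.g. OT-matched pairs) so the regression targets straighten and become easier to fit.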

πŸ“„ How Far Can Unsupervised RLVR Scale LLM Training?
πŸ—“οΈ Published: 3/9/2026
πŸ”— http://arxiv.org/abs/2603.08660v1
πŸ‘₯ Authors: Bingxiang He, Yuxin Zuo, Zeyuan Liu, Shangziqi Zhao, Zixuan Fu, Junlin Yang, Cheng Qian, Kaiyan Zhang, Yuchen Fan, Ganqu Cui (possible past Tsinghua University affiliation), Xiusi Chen (possible past Peking University affiliation), Youbang Sun, Xingtai Lv, Xuekai Zhu, Li Sheng (possible past Google (United States) affiliation), Ran Li, Huan-Ang Gao, Yuchen Zhang (possible past University Of California, Berkeley affiliation), Bowen Zhou, Zhiyuan Liu (possible past Tsinghua University affiliation), Ning Ding (possible past Tsinghua University affiliation)
Abstract

Unsupervised reinforcement learning with verifiable rewards (URLVR) offers a pathway to scale LLM training beyond the supervision bottleneck by deriving rewards without ground truth labels. Recent works leverage model intrinsic signals, showing promising early gains, yet their potential and limitations remain unclear. In this work, we revisit URLVR and provide a comprehensive analysis spanning taxonomy, theory and extensive experiments. We first classify URLVR methods into intrinsic versus exter...

πŸ“„ Revealing Behavioral Plasticity in Large Language Models: A Token-Conditional Perspective
πŸ—“οΈ Published: 3/9/2026
πŸ”— http://arxiv.org/abs/2603.08398v1
πŸ‘₯ Authors: Liyuan Mao, Le Yu (possible past Tsinghua University affiliation), Jing Zhou, Chujie Zheng, Bowen Yu, Chang Gao, Shixuan Liu, An Yang, Weinan Zhang (possible past Shanghai Jiao Tong University affiliation), Junyang Lin
Abstract

In this work, we reveal that Large Language Models (LLMs) possess intrinsic behavioral plasticity, akin to chameleons adapting their coloration to environmental cues, that can be exposed through token-conditional generation and stabilized via reinforcement learning. Specifically, by conditioning generation on carefully selected token prefixes sampled from responses exhibiting desired behaviors, LLMs seamlessly adapt their behavioral modes at inference time (e.g., switching from step-by-step reason...

πŸ“„ PolyFormer: learning efficient reformulations for scalable optimization under complex physical constraints
πŸ—“οΈ Published: 3/9/2026
πŸ”— http://arxiv.org/abs/2603.08283v1
πŸ‘₯ Authors: Yilin Wen, Yi Guo, Bo Zhao (possible past National University Of Singapore affiliation), Wei Qi (possible past Baidu (China) affiliation), Zechun Hu, Colin Jones, Jian Sun (possible past Microsoft (United States) affiliation)
Abstract

Real-world optimization problems are often constrained by complex physical laws that limit computational scalability. These constraints are inherently tied to complex regions, and thus learning models that incorporate physical and geometric knowledge, i.e., physics-informed machine learning (PIML), offer a promising pathway for efficient solution. Here, we introduce PolyFormer, which opens a new direction for PIML in prescriptive optimization tasks, where physical and geometric knowledge is not ...

*Notable papers are those with at least two authors from a "big" AI/ML lab.