📄 Notable* Recent AI/ML arXiv Papers

Last updated just now...

📄 COMIC: Agentic Sketch Comedy Generation
🗓️ Published: 3/11/2026
🔗 http://arxiv.org/abs/2603.11048v1
👥 Authors: Susung Hong, Brian Curless (possible past University Of Washington affiliation), Ira Kemelmacher-Shlizerman (possible past University Of Washington affiliation), Steve Seitz (possible past Google (United States) affiliation)
Abstract

We propose a fully automated AI system that produces short comedic videos similar to sketch shows such as Saturday Night Live. Starting with character references, the system employs a population of agents loosely based on real production studio roles, structured to optimize the quality and diversity of ideas and outputs through iterative competition, evaluation, and improvement. A key contribution is the introduction of LLM critics aligned with real viewer preferences through the analysis of a c...

📄 Dynamics-Predictive Sampling for Active RL Finetuning of Large Reasoning Models
🗓️ Published: 3/11/2026
🔗 http://arxiv.org/abs/2603.10887v1
👥 Authors: Yixiu Mao, Yun Qu, Qi Wang (possible past Tsinghua University affiliation), Heming Zou, Xiangyang Ji (possible past Tsinghua University affiliation)
Abstract

Reinforcement learning (RL) finetuning has become a key technique for enhancing the reasoning abilities of large language models (LLMs). However, its effectiveness critically depends on the selection of training data. Recent advances underscore the importance of online prompt selection methods, which typically concentrate training on partially solved or moderately challenging examples under the current policy, thereby yielding more effective model updates. While significantly accelerating RL fin...

📄 Towards Cold-Start Drafting and Continual Refining: A Value-Driven Memory Approach with Application to NPU Kernel Synthesis
🗓️ Published: 3/11/2026
🔗 http://arxiv.org/abs/2603.10846v1
👥 Authors: Yujie Zheng, Zhuo Li, Shengtao Zhang, Hanjing Wang, Junjie Sheng, Jiaqian Wang, Junchi Yan (possible past Shanghai Jiao Tong University affiliation), Weinan Zhang (possible past Shanghai Jiao Tong University affiliation), Ying Wen, Bo Tang, Muning Wen
Abstract

Deploying Large Language Models to data-scarce programming domains poses significant challenges, particularly for kernel synthesis on emerging Domain-Specific Architectures where a "Data Wall" limits available training data. While models excel on data-rich platforms like CUDA, they suffer catastrophic performance drops on data-scarce ecosystems such as NPU programming. To overcome this cold-start barrier without expensive fine-tuning, we introduce EvoKernel, a self-evolving agentic framework tha...

📄 Emulating Clinician Cognition via Self-Evolving Deep Clinical Research
🗓️ Published: 3/11/2026
🔗 http://arxiv.org/abs/2603.10677v1
👥 Authors: Ruiyang Ren (possible past Baidu (China) affiliation), Yuhao Wang, Yunsen Liang, Lan Luo, Jing Liu (possible past Baidu (China) affiliation), Haifeng Wang (possible past Google (United States) affiliation), Cong Feng, Yinan Zhang, Chunyan Miao, Ji-Rong Wen, Wayne Xin Zhao (possible past Baidu (China) affiliation)
Abstract

Clinical diagnosis is a complex cognitive process, grounded in dynamic cue acquisition and continuous expertise accumulation. Yet most current artificial intelligence (AI) systems are misaligned with this reality, treating diagnosis as single-pass retrospective prediction while lacking auditable mechanisms for governed improvement. We developed DxEvolve, a self-evolving diagnostic agent that bridges these gaps through an interactive deep clinical research workflow. The framework autonomously req...

📄 Recover to Predict: Progressive Retrospective Learning for Variable-Length Trajectory Prediction
🗓️ Published: 3/11/2026
🔗 http://arxiv.org/abs/2603.10597v1
👥 Authors: Hao Zhou, Lu Qi (possible past Tencent (China) affiliation), Jason Li (possible past Nvidia (United States) affiliation), Jie Zhang, Yi Liu (possible past Google (United States) affiliation), Xu Yang, Mingyu Fan, Fei Luo
Abstract

Trajectory prediction is critical for autonomous driving, enabling safe and efficient planning in dense, dynamic traffic. Most existing methods optimize prediction accuracy under fixed-length observations. However, real-world driving often yields variable-length, incomplete observations, posing a challenge to these methods. A common strategy is to directly map features from incomplete observations to those from complete ones. This one-shot mapping, however, struggles to learn accurate representa...

📄 IH-Challenge: A Training Dataset to Improve Instruction Hierarchy on Frontier LLMs
🗓️ Published: 3/11/2026
🔗 http://arxiv.org/abs/2603.10521v1
👥 Authors: Chuan Guo, Juan Felipe Ceron Uribe, Sicheng Zhu, Christopher A. Choquette-Choo (possible past Google (United States) affiliation), Steph Lin, Nikhil Kandpal, Milad Nasr (possible past Google (United States) affiliation), Rai, Sam Toyer, Miles Wang, Yaodong Yu, Alex Beutel (possible past Google (United States) affiliation), Kai Xiao
Abstract

Instruction hierarchy (IH) defines how LLMs prioritize system, developer, user, and tool instructions under conflict, providing a concrete, trust-ordered policy for resolving instruction conflicts. IH is key to defending against jailbreaks, system prompt extractions, and agentic prompt injections. However, robust IH behavior is difficult to train: IH failures can be confounded with instruction-following failures, conflicts can be nuanced, and models can learn shortcuts such as overrefusing. We i...

📄 Aligning Large Language Models with Searcher Preferences
🗓️ Published: 3/11/2026
🔗 http://arxiv.org/abs/2603.10473v1
👥 Authors: Wei Wu (possible past Tencent (China) affiliation), Peilun Zhou, Liyi Chen, Qimeng Wang, Chengqiang Lu, Yan Gao, Yi Wu (possible past University Of California, Berkeley affiliation), Yao Hu, Hui Xiong (possible past Baidu (China) affiliation)
Abstract

The paradigm shift from item-centric ranking to answer-centric synthesis is redefining the role of search engines. While recent industrial progress has applied generative techniques to closed-set item ranking in e-commerce, research and deployment of open-ended generative search on large content platforms remain limited. This setting introduces challenges, including robustness to noisy retrieval, non-negotiable safety guarantees, and alignment with diverse user needs. In this work, we introduce ...

📄 Modeling Stage-wise Evolution of User Interests for News Recommendation
🗓️ Published: 3/11/2026
🔗 http://arxiv.org/abs/2603.10471v1
👥 Authors: Zhiyong Cheng, Yike Jin, Zhijie Zhang (possible past Tencent (China) affiliation), Huilin Chen, Zhangling Duan, Meng Wang (possible past Google (United States) affiliation)
Abstract

Personalized news recommendation is highly time-sensitive, as user interests are often driven by emerging events, trending topics, and shifting real-world contexts. These dynamics make it essential to model not only users' long-term preferences, which reflect stable reading habits and high-order collaborative patterns, but also their short-term, context-dependent interests that change rapidly over time. However, most existing approaches rely on a single static interaction graph, which struggles ...

📄 UniPINN: A Unified PINN Framework for Multi-task Learning of Diverse Navier-Stokes Equations
🗓️ Published: 3/11/2026
🔗 http://arxiv.org/abs/2603.10466v1
👥 Authors: Dengdi Sun, Jie Chen (possible past Tencent (China) affiliation), Xiao Wang (possible past Google (United States) affiliation), Jin Tang
Abstract

Physics-Informed Neural Networks (PINNs) have shown promise in solving incompressible Navier-Stokes equations, yet existing approaches are predominantly designed for single-flow settings. When extended to multi-flow scenarios, these methods face three key challenges: (1) difficulty in simultaneously capturing both shared physical principles and flow-specific characteristics, (2) susceptibility to inter-task negative transfer that degrades prediction accuracy, and (3) unstable training dynamics c...

📄 The Curse and Blessing of Mean Bias in FP4-Quantized LLM Training
🗓️ Published: 3/11/2026
🔗 http://arxiv.org/abs/2603.10444v1
👥 Authors: Hengjie Cao, Zhendong Huang, Mengyi Chen, Yifeng Yang, Fanqi Yu, Ruijun Huang, Fang Dong, Xin Zhang (possible past Google (United States) affiliation), Jixian Zhou, Anrui Chen, Mingzhi Dong, Yujiang Wang, Jinlong Hou (possible past Tencent (China) affiliation), Qin Lv, Yuan Cheng, Tun Lu, Fan Yang (possible past Tencent (China) affiliation), Li Shang
Abstract

Large language models trained on natural language exhibit pronounced anisotropy: a small number of directions concentrate disproportionate energy, while the remaining dimensions form a broad semantic tail. In low-bit training regimes, this geometry becomes numerically unstable. Because blockwise quantization scales are determined by extreme elementwise magnitudes, dominant directions stretch the dynamic range, compressing long-tail semantic variation into narrow numerical bins. We show that this...

📄 On the Learning Dynamics of Two-layer Linear Networks with Label Noise SGD
🗓️ Published: 3/11/2026
🔗 http://arxiv.org/abs/2603.10397v1
👥 Authors: Tongcheng Zhang, Zhanpeng Zhou, Mingze Wang, Andi Han, Wei Huang (possible past Google (United States) affiliation), Taiji Suzuki, Junchi Yan (possible past Shanghai Jiao Tong University affiliation)
Abstract

One crucial factor behind the success of deep learning lies in the implicit bias induced by noise inherent in gradient-based training algorithms. Motivated by empirical observations that training with noisy labels improves model generalization, we delve into the underlying mechanisms behind stochastic gradient descent (SGD) with label noise. Focusing on a two-layer over-parameterized linear network, we analyze the learning dynamics of label noise SGD, unveiling a two-phase learning behavior. In ...

📄 HEAL: Hindsight Entropy-Assisted Learning for Reasoning Distillation
🗓️ Published: 3/11/2026
🔗 http://arxiv.org/abs/2603.10359v1
👥 Authors: Wenjing Zhang, Jiangze Yan, Jieyun Huang, Yi Shen (possible past Baidu (China) affiliation), Shuming Shi (possible past Tencent (China) affiliation), Ping Chen, Ning Wang, Zhaoxiang Liu, Kai Wang, Shiguo Lian
Abstract

Distilling reasoning capabilities from Large Reasoning Models (LRMs) into smaller models is typically constrained by the limitation of rejection sampling. Standard methods treat the teacher as a static filter, discarding complex "corner-case" problems where the teacher fails to explore valid solutions independently, thereby creating an artificial "Teacher Ceiling" for the student. In this work, we propose Hindsight Entropy-Assisted Learning (HEAL), an RL-free framework designed to bridge this re...

📄 Mitigating Translationese Bias in Multilingual LLM-as-a-Judge via Disentangled Information Bottleneck
🗓️ Published: 3/11/2026
🔗 http://arxiv.org/abs/2603.10351v1
👥 Authors: Hongbin Zhang, Kehai Chen, Xuefen Bai, Youcheng Pan, Yang Xiang, Jinpeng Wang (possible past Tencent (China) affiliation), Min Zhang (possible past Tsinghua University affiliation)
Abstract

Large language models (LLMs) have become a standard for multilingual evaluation, yet they exhibit a severe systematic translationese bias. In this paper, translationese bias is characterized as LLMs systematically favoring machine-translated text over human-authored references, particularly in low-resource languages. We attribute this bias to spurious correlations with (i) latent manifold alignment with English and (ii) cross-lingual predictability. To mitigate this bias, we propose DIBJudge, a ...

📄 Think Before You Lie: How Reasoning Improves Honesty
🗓️ Published: 3/10/2026
🔗 http://arxiv.org/abs/2603.09957v1
👥 Authors: Ann Yuan (possible past Google (United States) affiliation), Asma Ghandeharioun, Carter Blum, Alicia Machado, Jessica Hoffmann, Daphne Ippolito (possible past Google (United States) affiliation), Martin Wattenberg (possible past Google (United States) affiliation), Lucas Dixon (possible past Google (United States) affiliation), Katja Filippova (possible past Google (United States) affiliation)
Abstract

While existing evaluations of large language models (LLMs) measure deception rates, the underlying conditions that give rise to deceptive behavior are poorly understood. We investigate this question using a novel dataset of realistic moral trade-offs where honesty incurs variable costs. Contrary to humans, who tend to become less honest given time to deliberate (Capraro, 2017; Capraro et al., 2019), we find that reasoning consistently increases honesty across scales and for several LLM families....

📄 Code-Space Response Oracles: Generating Interpretable Multi-Agent Policies with Large Language Models
🗓️ Published: 3/10/2026
🔗 http://arxiv.org/abs/2603.10098v1
👥 Authors: Daniel Hennes (possible past Deepmind (United Kingdom) affiliation), Zun Li, John Schultz, Marc Lanctot (possible past Google (United States) affiliation)
Abstract

Recent advances in multi-agent reinforcement learning, particularly Policy-Space Response Oracles (PSRO), have enabled the computation of approximate game-theoretic equilibria in increasingly complex domains. However, these methods rely on deep reinforcement learning oracles that produce `black-box' neural network policies, making them difficult to interpret, trust or debug. We introduce Code-Space Response Oracles (CSRO), a novel framework that addresses this challenge by replacing RL oracles w...

📄 Adaptive Clinical-Aware Latent Diffusion for Multimodal Brain Image Generation and Missing Modality Imputation
🗓️ Published: 3/10/2026
🔗 http://arxiv.org/abs/2603.09931v1
👥 Authors: Rong Zhou (possible past Google (United States) affiliation), Houliang Zhou, Yao Su, Brian Y. Chen, Yu Zhang (possible past Google (United States) affiliation), Lifang He, Alzheimer's Disease Neuroimaging Initiative
Abstract

Multimodal neuroimaging provides complementary insights for Alzheimer's disease diagnosis, yet clinical datasets frequently suffer from missing modalities. We propose ACADiff, a framework that synthesizes missing brain imaging modalities through adaptive clinical-aware diffusion. ACADiff learns mappings between incomplete multimodal observations and target modalities by progressively denoising latent representations while attending to available imaging data and clinical metadata. The framework e...

📄 MedMASLab: A Unified Orchestration Framework for Benchmarking Multimodal Medical Multi-Agent Systems
🗓️ Published: 3/10/2026
🔗 http://arxiv.org/abs/2603.09909v1
👥 Authors: Yunhang Qian, Xiaobin Hu (possible past Tencent (China) affiliation), Jiaquan Yu, Siyang Xin, Xiaokun Chen, Jiangning Zhang (possible past Tencent (China) affiliation), Peng-Tao Jiang, Jiawei Liu, Hongwei Bran Li
Abstract

While Multi-Agent Systems (MAS) show potential for complex clinical decision support, the field remains hindered by architectural fragmentation and the lack of standardized multimodal integration. Current medical MAS research suffers from non-uniform data ingestion pipelines, inconsistent visual-reasoning evaluation, and a lack of cross-specialty benchmarking. To address these challenges, we present MedMASLab, a unified framework and benchmarking platform for multimodal medical multi-agent syste...

📄 Emerging Extrinsic Dexterity in Cluttered Scenes via Dynamics-aware Policy Learning
🗓️ Published: 3/10/2026
🔗 http://arxiv.org/abs/2603.09882v1
👥 Authors: Yixin Zheng, Jiangran Lyu, Yifan Zhang, Jiayi Chen, Mi Yan, Yuntian Deng, Xuesong Shi, Xiaoguang Zhao, Yizhou Wang (possible past Peking University affiliation), Zhizheng Zhang, He Wang (possible past Stanford University affiliation)
Abstract

Extrinsic dexterity leverages environmental contact to overcome the limitations of prehensile manipulation. However, achieving such dexterity in cluttered scenes remains challenging and underexplored, as it requires selectively exploiting contact among multiple interacting objects with inherently coupled dynamics. Existing approaches lack explicit modeling of such complex dynamics and therefore fall short in non-prehensile manipulation in cluttered environments, which in turn limits their practi...

📄 Logics-Parsing-Omni Technical Report
🗓️ Published: 3/10/2026
🔗 http://arxiv.org/abs/2603.09677v1
👥 Authors: Xin An, Jingyi Cai, Xiangyang Chen, Huayao Liu, Peiting Liu, Peng Wang (possible past Peking University affiliation), Bei Yang, Xiuwen Zhu, Yongfan Chen, Baoyu Hou, Shuzhao Li, Weidong Ren, Fan Yang (possible past Tencent (China) affiliation), Jiangtao Zhang, Xiaoxiao Xu, Lin Qu
Abstract

Addressing the challenges of fragmented task definitions and the heterogeneity of unstructured data in multimodal parsing, this paper proposes the Omni Parsing framework. This framework establishes a Unified Taxonomy covering documents, images, and audio-visual streams, introducing a progressive parsing paradigm that bridges perception and cognition. Specifically, the framework integrates three hierarchical levels: 1) Holistic Detection, which achieves precise spatial-temporal grounding of objec...

📄 KernelSkill: A Multi-Agent Framework for GPU Kernel Optimization
🗓️ Published: 3/10/2026
🔗 http://arxiv.org/abs/2603.10085v1
👥 Authors: Qitong Sun, Jun Han, Tianlin Li, Zhe Tang (possible past Alibaba Group (China) affiliation), Sheng Chen, Fei Yang (possible past Meta (United States) affiliation), Aishan Liu, Xianglong Liu, Yang Liu (possible past Tsinghua University affiliation)
Abstract

Improving GPU kernel efficiency is crucial for advancing AI systems. Recent work has explored leveraging large language models (LLMs) for GPU kernel generation and optimization. However, existing LLM-based kernel optimization pipelines typically rely on opaque, implicitly learned heuristics within the LLMs to determine optimization strategies. This leads to inefficient trial-and-error and weakly interpretable optimizations. Our key insight is to replace implicit heuristics with expert optimizati...

📄 Efficiently Aligning Draft Models via Parameter- and Data-Efficient Adaptation
🗓️ Published: 3/10/2026
🔗 http://arxiv.org/abs/2603.09527v1
👥 Authors: Luxi Lin, Zhihang Lin, Zhanpeng Zeng, Yuhao Chen, Qingyu Zhang, Jixiang Luo, Xuelong Li (possible past Tencent (China) affiliation), Rongrong Ji (possible past Tencent (China) affiliation)
Abstract

Speculative decoding accelerates LLM inference but suffers from performance degradation when target models are fine-tuned for specific domains. A naive solution is to retrain draft models for every target model, which is costly and inefficient. To address this, we introduce a parameter- and data-efficient framework named Efficient Draft Adaptation, abbreviated as EDA, for efficiently adapting draft models. EDA introduces three innovations: (1) a decoupled architecture that utilizes shared and pr...

📄 Evolving Prompt Adaptation for Vision-Language Models
🗓️ Published: 3/10/2026
🔗 http://arxiv.org/abs/2603.09493v1
👥 Authors: Enming Zhang, Jiayang Li, Yanru Wu, Zhenyu Liu (possible past Google (United States) affiliation), Yang Li (possible past Google (United States) affiliation)
Abstract

The adaptation of large-scale vision-language models (VLMs) to downstream tasks with limited labeled data remains a significant challenge. While parameter-efficient prompt learning methods offer a promising path, they often suffer from catastrophic forgetting of pre-trained knowledge. Toward addressing this limitation, our work is grounded in the insight that governing the evolutionary path of prompts is essential for forgetting-free adaptation. To this end, we propose EvoPrompt, a novel framewo...

📄 EvoDriveVLA: Evolving Autonomous Driving Vision-Language-Action Model via Collaborative Perception-Planning Distillation
🗓️ Published: 3/10/2026
🔗 http://arxiv.org/abs/2603.09465v1
👥 Authors: Jiajun Cao, Xiaoan Zhang, Xiaobao Wei, Liyuqiu Huang, Wang Zijian, Hanzhen Zhang, Zhengyu Jia, Wei Mao, Hao Wang (possible past Tsinghua University affiliation), Xianming Liu (possible past Meta (United States) affiliation), Shuchang Zhou Liu, Yang Wang (possible past Baidu (China) affiliation), Shanghang Zhang
Abstract

Vision-Language-Action models have shown great promise for autonomous driving, yet they suffer from degraded perception after unfreezing the visual encoder and struggle with accumulated instability in long-term planning. To address these challenges, we propose EvoDriveVLA-a novel collaborative perception-planning distillation framework that integrates self-anchored perceptual constraints and oracle-guided trajectory optimization. Specifically, self-anchored visual distillation leverages self-anc...

📄 An Empirical Study and Theoretical Explanation on Task-Level Model-Merging Collapse
🗓️ Published: 3/10/2026
🔗 http://arxiv.org/abs/2603.09463v1
👥 Authors: Yuan Cao (possible past Google (United States) affiliation), Dezhi Ran, Yuzhe Guo, Mengzhou Wu, Simin Chen, Linyi Li, Wei Yang (possible past Tencent (China) affiliation), Tao Xie
Abstract

Model merging unifies independently fine-tuned LLMs from the same base, enabling reuse and integration of parallel development efforts without retraining. However, in practice we observe that merging does not always succeed: certain combinations of task-specialist models suffer from catastrophic performance degradation after merging. We refer to this failure mode as merging collapse. Intuitively, collapse arises when the learned representations or parameter adjustments for different tasks are fu...

📄 ICDAR 2025 Competition on End-to-End Document Image Machine Translation Towards Complex Layouts
🗓️ Published: 3/10/2026
🔗 http://arxiv.org/abs/2603.09392v1
👥 Authors: Yaping Zhang, Yupu Liang, Zhiyang Zhang, Zhiyuan Chen (possible past Google (United States) affiliation), Lu Xiang, Yang Zhao (possible past Google (United States) affiliation), Yu Zhou, Chengqing Zong
Abstract

Document Image Machine Translation (DIMT) seeks to translate text embedded in document images from one language to another by jointly modeling both textual content and page layout, bridging optical character recognition (OCR) and natural language processing (NLP). The DIMT 2025 Challenge advances research on end-to-end document image translation, a rapidly evolving area within multimodal document understanding. The competition features two tracks, OCR-free and OCR-based, each with two subtasks f...

📄 What do near-optimal learning rate schedules look like?
🗓️ Published: 3/11/2026
🔗 http://arxiv.org/abs/2603.10301v1
👥 Authors: Hiroki Naganuma, Atish Agarwala (possible past Stanford University affiliation), Priya Kasimbeg (possible past Stanford University affiliation), George E. Dahl (possible past University Of Toronto affiliation)
Abstract

A basic unanswered question in neural network training is: what is the best learning rate schedule shape for a given workload? The choice of learning rate schedule is a key factor in the success or failure of the training process, but beyond having some kind of warmup and decay, there is no consensus on what makes a good schedule shape. To answer this question, we designed a search procedure to find the best shapes within a parameterized schedule family. Our approach factors out the schedule sha...

📄 ReMix: Reinforcement routing for mixtures of LoRAs in LLM finetuning
🗓️ Published: 3/10/2026
🔗 http://arxiv.org/abs/2603.10160v1
👥 Authors: Ruizhong Qiu, Hanqing Zeng, Yinglong Xia, Yiwen Meng, Ren Chen, Jiarui Feng, Dongqi Fu, Qifan Wang (possible past Google (United States) affiliation), Jiayi Liu, Jun Xiao, Xiangjun Fan, Benyu Zhang, Hong Li, Zhining Liu, Hyunsik Yoo, Zhichen Zeng, Tianxin Wei, Hanghang Tong (possible past Ibm (United States) affiliation)
Abstract

Low-rank adapters (LoRAs) are a parameter-efficient finetuning technique that injects trainable low-rank matrices into pretrained models to adapt them to new tasks. Mixture-of-LoRAs models expand neural networks efficiently by routing each layer input to a small subset of specialized LoRAs of the layer. Existing Mixture-of-LoRAs routers assign a learned routing weight to each LoRA to enable end-to-end training of the router. Despite their empirical promise, we observe that the routing weights ar...

*Notable papers are those with at least two authors from a "big" AI/ML lab.