πŸ“„ Notable* Recent AI/ML arXiv Papers


πŸ“„ Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.19220v1
πŸ‘₯ Authors: Zhuolin Yang, Zihan Liu, Yang Chen (possible past Tencent (China) affiliation), Wenliang Dai, Boxin Wang, Sheng-Chieh Lin, Chankyu Lee, Yangyi Chen, Dongfu Jiang, Jiafan He, Renjie Pi, Grace Lam, Nayeon Lee, Alexander Bukharin, Mohammad Shoeybi (possible past Nvidia (United States) affiliation), Bryan Catanzaro (possible past University Of California, Berkeley affiliation), Wei Ping (possible past Baidu (China) affiliation)
Abstract

We introduce Nemotron-Cascade 2, an open 30B MoE model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities. Despite its compact size, its mathematical and coding reasoning performance approaches that of frontier open models. It is the second open-weight LLM, after DeepSeekV3.2-Speciale-671B-A37B, to achieve Gold Medal-level performance in the 2025 International Mathematical Olympiad (IMO), the International Olympiad in Informatics (IOI), and the ICP...

πŸ“„ SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.19173v1
πŸ‘₯ Authors: Edward Lin, Sahil Modi, Siva Kumar Sastry Hari (possible past Nvidia (United States) affiliation), Qijing Huang, Zhifan Ye, Nestor Qin, Fengzhe Zhou, Yuan Zhang (possible past Google (United States) affiliation), Jingquan Wang, Sana Damani, Dheeraj Peri, Ouye Xie, Aditya Kane, Moshe Maor, Michael Behar, Triston Cao, Rishabh Mehta, Vartika Singh, Vikram Sharma Mailthody, Terry Chen, Zihao Ye, Hanfeng Chen, Tianqi Chen (possible past University Of Washington affiliation), Vinod Grover, Wei Chen, Wei Liu (possible past Tsinghua University affiliation), Eric Chung, Luis Ceze, Roger Bringmann, Cyril Zeller, Michael Lightstone, Christos Kozyrakis (possible past Stanford University affiliation), Humphrey Shi
Abstract

As agentic AI systems become increasingly capable of generating and optimizing GPU kernels, progress is constrained by benchmarks that reward speedup over software baselines rather than proximity to hardware-efficient execution. We present SOL-ExecBench, a benchmark of 235 CUDA kernel optimization problems extracted from 124 production and emerging AI models spanning language, diffusion, vision, audio, video, and hybrid architectures, targeting NVIDIA Blackwell GPUs. The benchmark covers forward...

πŸ“„ LuMamba: Latent Unified Mamba for Electrode Topology-Invariant and Efficient EEG Modeling
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.19100v1
πŸ‘₯ Authors: DanaΓ© Broustail, Anna Tegon, Thorir Mar Ingolfsson, Yawei Li (possible past Google (United States) affiliation), Luca Benini (possible past Eth Zurich affiliation)
Abstract

Electroencephalography (EEG) enables non-invasive monitoring of brain activity across clinical and neurotechnology applications, yet building foundation models for EEG remains challenging due to differing electrode topologies and computational scalability, as Transformer architectures incur quadratic sequence complexity. As a joint solution, we propose LuMamba (Latent Unified Mamba), a self-supervised framework combining topology-invariant encodi...

πŸ“„ Em-Garde: A Propose-Match Framework for Proactive Streaming Video Understanding
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.19054v1
πŸ‘₯ Authors: Yikai Zheng, Xin Ding, Yifan Yang (possible past Tencent (China) affiliation), Shiqi Jiang, Hao Wu (possible past Tencent (China) affiliation), Qianxi Zhang, Weijun Wang (possible past Google (United States) affiliation), Ting Cao, Yunxin Liu
Abstract

Recent advances in Streaming Video Understanding have enabled a new interaction paradigm where models respond proactively to user queries. Current proactive VideoLLMs rely on per-frame triggering decision making, which suffers from an efficiency-accuracy dilemma. We propose Em-Garde, a novel framework that decouples semantic understanding from streaming perception. At query time, the Instruction-Guided Proposal Parser transforms user queries into structured, perceptually grounded visual proposals...

πŸ“„ Reasoning over mathematical objects: on-policy reward modeling and test time aggregation
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.18886v1
πŸ‘₯ Authors: Pranjal Aggarwal, Marjan Ghazvininejad (possible past Meta (United States) affiliation), Seungone Kim, Ilia Kulikov, Jack Lanchantin, Xian Li (possible past Meta (United States) affiliation), Tianjian Li, Bo Liu (possible past Meta (United States) affiliation), Graham Neubig (possible past Carnegie Mellon University affiliation), Anaelia Ovalle, Swarnadeep Saha, Sainbayar Sukhbaatar, Sean Welleck, Jason Weston (possible past Stanford University affiliation), Chenxi Whitehouse, Adina Williams (possible past Eth Zurich affiliation), Jing Xu (possible past Meta (United States) affiliation), Ping Yu, Weizhe Yuan, Jingyu Zhang, Wenting Zhao
Abstract

The ability to precisely derive mathematical objects is a core requirement for downstream STEM applications, including mathematics, physics, and chemistry, where reasoning must culminate in formally structured expressions. Yet, current LM evaluations of mathematical and scientific reasoning rely heavily on simplified answer formats such as numerical values or multiple choice options due to the convenience of automated assessment. In this paper we provide three contributions for improving reasoni...

πŸ“„ ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.18815v1
πŸ‘₯ Authors: Hao Zhang (possible past Tencent (China) affiliation), Mingjie Liu, Shaokun Zhang, Songyang Han, Jian Hu, Zhenghui Jin, Yuchi Zhang, Shizhe Diao, Ximing Lu, Binfeng Xu, Zhiding Yu (possible past Nvidia (United States) affiliation), Jan Kautz (possible past Nvidia (United States) affiliation), Yi Dong
Abstract

Multi-turn LLM agents are increasingly important for solving complex, interactive tasks, and reinforcement learning (RL) is a key ingredient for improving their long-horizon behavior. However, RL training requires generating large numbers of sandboxed rollout trajectories, and existing infrastructures often couple rollout orchestration with the training loop, making systems hard to migrate and maintain. Under the rollout-as-a-service philosophy, we present ProRL Agent, a scalable infrastructure...

πŸ“„ dTRPO: Trajectory Reduction in Policy Optimization of Diffusion Large Language Models
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.18806v1
πŸ‘₯ Authors: Wenxuan Zhang, Lemeng Wu, Changsheng Zhao, Ernie Chang, Mingchen Zhuge, Zechun Liu, Andy Su, Hanxian Huang, Jun Chen, Chong Zhou, Raghuraman Krishnamoorthi, Vikas Chandra (possible past Meta (United States) affiliation), Mohamed Elhoseiny (possible past Meta (United States) affiliation), Wei Wen (possible past Google (United States) affiliation)
Abstract

Diffusion Large Language Models (dLLMs) introduce a new paradigm for language generation, which in turn presents new challenges for aligning them with human preferences. In this work, we aim to improve the policy optimization for dLLMs by reducing the cost of the trajectory probability calculation, thereby enabling scaled-up offline policy training. We prove that: (i) under reference policy regularization, the probability ratio of the newly unmasked tokens is an unbiased estimate of that of inte...

πŸ“„ CausalRM: Causal-Theoretic Reward Modeling for RLHF from Observational User Feedbacks
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.18736v1
πŸ‘₯ Authors: Hao Wang (possible past Tsinghua University affiliation), Licheng Pan, Zhichao Chen, Chunyuan Zheng, Zhixuan Chu, Xiaoxi Li, Yuan Lu, Xinggao Liu, Haoxuan Li, Zhouchen Lin (possible past Peking University affiliation)
Abstract

Despite the success of reinforcement learning from human feedback (RLHF) in aligning language models, current reward modeling heavily relies on experimental feedback data collected from human annotators under controlled and costly conditions. In this work, we introduce observational reward modeling -- learning reward models with observational user feedback (e.g., clicks, copies, and upvotes) -- as a scalable and cost-effective alternative. We identify two fundamental challenges in this setting: ...

πŸ“„ OpenT2M: No-frill Motion Generation with Open-source, Large-scale, High-quality Data
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.18623v1
πŸ‘₯ Authors: Bin Cao (possible past Microsoft (United States) affiliation), Sipeng Zheng, Hao Luo, Boyuan Li, Jing Liu (possible past Baidu (China) affiliation), Zongqing Lu
Abstract

Text-to-motion (T2M) generation aims to create realistic human movements from text descriptions, with promising applications in animation and robotics. Despite recent progress, current T2M models perform poorly on unseen text descriptions due to the small scale and limited diversity of existing motion datasets. To address this problem, we introduce OpenT2M, a million-scale, high-quality, open-source motion dataset containing over 2800 hours of human motion. Each sequence undergoes rigorous q...

πŸ“„ Interplay: Training Independent Simulators for Reference-Free Conversational Recommendation
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.18573v1
πŸ‘₯ Authors: Jerome Ramos, Feng Xia (possible past Tencent (China) affiliation), Xi Wang (possible past Tsinghua University affiliation), Shubham Chatterjee, Xiao Fu, Hossein A. Rahmani, Aldo Lipani
Abstract

Training conversational recommender systems (CRS) requires extensive dialogue data, which is challenging to collect at scale. To address this, researchers have used simulated user-recommender conversations. Traditional simulation approaches often utilize a single large language model (LLM) that generates entire conversations with prior knowledge of the target items, leading to scripted and artificial dialogues. We propose a reference-free simulation framework that trains two independent LLMs, on...

πŸ“„ SpecForge: A Flexible and Efficient Open-Source Training Framework for Speculative Decoding
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.18567v1
πŸ‘₯ Authors: Shenggui Li, Chao Wang (possible past Google (United States) affiliation), Yikai Zhu, Yubo Wang, Fan Yin, Shuai Shi, Yefei Chen, Xiaomin Dong, Qiaoling Chen, Jin Pan, Ji Li, Laixin Xie, Yineng Zhang, Lei Yu (possible past University Of Oxford affiliation), Yonggang Wen, Ivor Tsang, Tianwei Zhang
Abstract

Large language models incur high inference latency due to sequential autoregressive decoding. Speculative decoding alleviates this bottleneck by using a lightweight draft model to propose multiple tokens for batched verification. However, its adoption has been limited by the lack of high-quality draft models and scalable training infrastructure. We introduce SpecForge, an open-source, production-oriented framework for training speculative decoding models with full support for EAGLE-3. SpecForge ...
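As background for the draft-and-verify mechanism this abstract summarizes, here is a toy greedy sketch of speculative decoding. The `draft_model` and `target_model` callables are stand-ins that each return one next token; this is not SpecForge's API or EAGLE-3's sampling-based verification, just the basic accept-longest-agreeing-prefix idea:

```python
def speculative_decode(target_model, draft_model, prompt, k=4, max_new=32):
    """Toy greedy speculative decoding: the draft model proposes k tokens,
    the target model verifies all k positions, and the longest agreeing
    prefix is accepted (plus one corrected token on a mismatch)."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        # Cheap draft model proposes k tokens autoregressively.
        proposed, ctx = [], tokens[:]
        for _ in range(k):
            t = draft_model(ctx)
            proposed.append(t)
            ctx.append(t)
        # Target model scores all k positions (batched in a real system).
        verified = [target_model(tokens + proposed[:i]) for i in range(k)]
        n_accept = 0
        for p, v in zip(proposed, verified):
            if p != v:
                break
            n_accept += 1
        # Accept the agreeing prefix; on mismatch, take the target's token.
        tokens.extend(proposed[:n_accept])
        if n_accept < k:
            tokens.append(verified[n_accept])
    return tokens
```

Because every emitted token is either verified or produced by the target model, the output matches plain target-model decoding; the speedup comes from verifying the k drafted positions in one batched forward pass.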

πŸ“„ CoDA: Exploring Chain-of-Distribution Attacks and Post-Hoc Token-Space Repair for Medical Vision-Language Models
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.18545v1
πŸ‘₯ Authors: Xiang Chen (possible past Tencent (China) affiliation), Fangfang Yang, Chunlei Meng, Chengyin Hu, Ang Li (possible past Google (United States) affiliation), Yiwei Wei, Jiahuan Long, Jiujiang Guo
Abstract

Medical vision-language models (MVLMs) are increasingly used as perceptual backbones in radiology pipelines and as the visual front end of multimodal assistants, yet their reliability under real clinical workflows remains underexplored. Prior robustness evaluations often assume clean, curated inputs or study isolated corruptions, overlooking routine acquisition, reconstruction, display, and delivery operations that preserve clinical readability while shifting image statistics. To address this g...

πŸ“„ Scaling Sim-to-Real Reinforcement Learning for Robot VLAs with Generative 3D Worlds
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.18532v1
πŸ‘₯ Authors: Andrew Choi, Xinjie Wang, Zhizhong Su (possible past Baidu (China) affiliation), Wei Xu (possible past Tencent (China) affiliation)
Abstract

The strong performance of large vision-language models (VLMs) trained with reinforcement learning (RL) has motivated similar approaches for fine-tuning vision-language-action (VLA) models in robotics. Many recent works fine-tune VLAs directly in the real world to avoid addressing the sim-to-real gap. While real-world RL circumvents sim-to-real issues, it inherently limits the generality of the resulting VLA, as scaling scene and object diversity in the physical world is prohibitively difficult. ...

πŸ“„ Cognitive Mismatch in Multimodal Large Language Models for Discrete Symbol Understanding
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.18472v1
πŸ‘₯ Authors: Yinghui Li, Jiayi Kuang, Peng Xing, Daixian Liu, Junnan Dong, Shu-Yu Guo, Yangning Li, Qingyu Zhou, Wenhao Jiang (possible past Tencent (China) affiliation), Hai-Tao Zheng (possible past Tsinghua University affiliation), Ying Shen, Liang Lin, Philip S. Yu (possible past Tsinghua University affiliation)
Abstract

While Multimodal Large Language Models (MLLMs) have achieved remarkable success in interpreting natural scenes, their ability to process discrete symbols -- the fundamental building blocks of human cognition -- remains a critical open question. Unlike continuous visual data, symbols such as mathematical formulas, chemical structures, and linguistic characters require precise, deeper interpretation. This paper introduces a comprehensive benchmark to evaluate how top-tier MLLMs navigate these "dis...

πŸ“„ AlignMamba-2: Enhancing Multimodal Fusion and Sentiment Analysis with Modality-Aware Mamba
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.18462v1
πŸ‘₯ Authors: Yan Li (possible past Tencent (China) affiliation), Yifei Xing, Xiangyuan Lan, Xin Li (possible past Google (United States) affiliation), Haifeng Chen, Dongmei Jiang
Abstract

In the era of large-scale pre-trained models, effectively adapting general knowledge to specific affective computing tasks remains a challenge, particularly regarding computational efficiency and multimodal heterogeneity. While Transformer-based methods have excelled at modeling inter-modal dependencies, their quadratic computational complexity limits their use with long-sequence data. Mamba-based models have emerged as a computationally efficient alternative; however, their inherent sequential ...

πŸ“„ Mind the Rarities: Can Rare Skin Diseases Be Reliably Diagnosed via Diagnostic Reasoning?
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.18418v1
πŸ‘₯ Authors: Yang Liu (possible past Tsinghua University affiliation), Jiyao Yang, Hongjin Zhao, Xiaoyong Li, Yanzhe Ji, Xingjian Li (possible past Baidu (China) affiliation), Runmin Jiang, Tianyang Wang, Saeed Anwar, Dongwoo Kim, Yue Yao, Zhenyue Qin, Min Xu
Abstract

Large vision-language models (LVLMs) demonstrate strong performance in dermatology; however, evaluating diagnostic reasoning for rare conditions remains largely unexplored. Existing benchmarks focus on common diseases and assess only final accuracy, overlooking the clinical reasoning process, which is critical for complex cases. We address this gap by constructing DermCase, a long-context benchmark derived from peer-reviewed case reports. Our dataset contains 26,030 multi-modal image-text pairs ...

πŸ“„ The Validity Gap in Health AI Evaluation: A Cross-Sectional Analysis of Benchmark Composition
πŸ—“οΈ Published: 3/18/2026
πŸ”— http://arxiv.org/abs/2603.18294v1
πŸ‘₯ Authors: Alvin Rajkomar (possible past Google (United States) affiliation), Pavan Sudarshan, Angela Lai, Lily Peng (possible past Google (United States) affiliation)
Abstract

Background: Clinical trials rely on transparent inclusion criteria to ensure generalizability. In contrast, benchmarks validating health-related large language models (LLMs) rarely characterize the "patient" or "query" populations they contain. Without defined composition, aggregate performance metrics may misrepresent model readiness for clinical use. Methods: We analyzed 18,707 consumer health queries across six public benchmarks using LLMs as automated coding instruments to apply a standard...

πŸ“„ VLM-AutoDrive: Post-Training Vision-Language Models for Safety-Critical Autonomous Driving Events
πŸ—“οΈ Published: 3/18/2026
πŸ”— http://arxiv.org/abs/2603.18178v1
πŸ‘₯ Authors: Mohammad Qazim Bhat, Yufan Huang, Niket Agarwal (possible past Nvidia (United States) affiliation), Hao Wang (possible past Tsinghua University affiliation), Michael Woods, John Kenyon, Tsung-Yi Lin (possible past Nvidia (United States) affiliation), Xiaodong Yang (possible past Nvidia (United States) affiliation), Ming-Yu Liu (possible past Nvidia (United States) affiliation), Kevin Xie
Abstract

The rapid growth of ego-centric dashcam footage presents a major challenge for detecting safety-critical events such as collisions and near-collisions, scenarios that are brief, rare, and difficult for generic vision models to capture. While multimodal large language models (MLLMs) demonstrate strong general reasoning ability, they underperform in driving contexts due to domain and temporal misalignment. We introduce VLM-AutoDrive, a modular post-training framework for adapting pretrained Visi...

πŸ“„ Loc3R-VLM: Language-based Localization and 3D Reasoning with Vision-Language Models
πŸ—“οΈ Published: 3/18/2026
πŸ”— http://arxiv.org/abs/2603.18002v1
πŸ‘₯ Authors: Kevin Qu, Haozhe Qi, Mihai Dusmanu, Mahdi Rad, Rui Wang (possible past Tencent (China) affiliation), Marc Pollefeys (possible past Google (United States) affiliation)
Abstract

Multimodal Large Language Models (MLLMs) have made impressive progress in connecting vision and language, but they still struggle with spatial understanding and viewpoint-aware reasoning. Recent efforts aim to augment the input representations with geometric cues rather than explicitly teaching models to reason in 3D space. We introduce Loc3R-VLM, a framework that equips 2D Vision-Language Models with advanced 3D understanding capabilities from monocular video input. Inspired by human spatial co...

πŸ“„ How do LLMs Compute Verbal Confidence
πŸ—“οΈ Published: 3/18/2026
πŸ”— http://arxiv.org/abs/2603.17839v1
πŸ‘₯ Authors: Dharshan Kumaran (possible past Google (United States) affiliation), Arthur Conmy, Federico Barbero, Simon Osindero (possible past Google (United States) affiliation), Viorica Patraucean, Petar Velickovic
Abstract

Verbal confidence -- prompting LLMs to state their confidence as a number or category -- is widely used to extract uncertainty estimates from black-box models. However, how LLMs internally generate such scores remains unknown. We address two questions: first, when confidence is computed - just-in-time when requested, or automatically during answer generation and cached for later retrieval; and second, what verbal confidence represents - token log-probabilities, or a richer evaluation of answer q...

πŸ“„ Spectrally-Guided Diffusion Noise Schedules
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.19222v1
πŸ‘₯ Authors: Carlos Esteves (possible past Google (United States) affiliation), Ameesh Makadia (possible past Google (United States) affiliation)
Abstract

Denoising diffusion models are widely used for high-quality image and video generation. Their performance depends on noise schedules, which define the distribution of noise levels applied during training and the sequence of noise levels traversed during sampling. Noise schedules are typically handcrafted and require manual tuning across different resolutions. In this work, we propose a principled way to design per-instance noise schedules for pixel diffusion, based on the image's spectral proper...
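For readers unfamiliar with what a "handcrafted" schedule looks like, here is the widely used cosine schedule (Nichol & Dhariwal, 2021), a fixed mapping from diffusion time to cumulative signal fraction. It is an example of the manual designs this paper aims to replace, not the spectrally-guided schedule itself:

```python
import math

def cosine_alpha_bar(t, s=0.008):
    """Cosine noise schedule: cumulative signal fraction alpha_bar(t)
    for t in [0, 1], normalized so alpha_bar(0) = 1. The small offset s
    keeps the schedule from being exactly flat at t = 0."""
    f = lambda u: math.cos((u + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)

# Signal decays monotonically from 1 (clean image) toward 0 (pure noise).
levels = [cosine_alpha_bar(t / 10) for t in range(11)]
```

A per-instance spectral schedule, as proposed here, would instead choose how fast this curve falls based on each image's frequency content rather than using one fixed curve for all inputs and resolutions.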

πŸ“„ DriveTok: 3D Driving Scene Tokenization for Unified Multi-View Reconstruction and Understanding
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.19219v1
πŸ‘₯ Authors: Dong Zhuo, Wenzhao Zheng, Sicheng Zuo, Siming Yan, Lu Hou, Jie Zhou (possible past Tsinghua University affiliation), Jiwen Lu (possible past Tsinghua University affiliation)
Abstract

With the growing adoption of vision-language-action models and world models in autonomous driving systems, scalable image tokenization becomes crucial as the interface for the visual modality. However, most existing tokenizers are designed for monocular and 2D scenes, leading to inefficiency and inter-view inconsistency when applied to high-resolution multi-view driving scenes. To address this, we propose DriveTok, an efficient 3D driving scene tokenizer for unified multi-view reconstruction and...

πŸ“„ Fast and Effective Computation of Generalized Symmetric Matrix Factorization
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.19147v1
πŸ‘₯ Authors: Lei Yang (possible past Google (United States) affiliation), Han Wan, Min Zhang (possible past Tsinghua University affiliation), Ling Liang
Abstract

In this paper, we study a nonconvex, nonsmooth, and non-Lipschitz generalized symmetric matrix factorization model that unifies a broad class of matrix factorization formulations arising in machine learning, image science, engineering, and related areas. We first establish two exactness properties. On the modeling side, we prove an exact penalty property showing that, under suitable conditions, the symmetry-inducing quadratic penalty enforces symmetry whenever the penalty parameter is sufficient...

πŸ“„ STEP: Scientific Time-Series Encoder Pretraining via Cross-Domain Distillation
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.18688v1
πŸ‘₯ Authors: Chen Zhang (possible past Peking University affiliation), Liwei Liu, Jun Tao, Xiaoyu Yang, Xuenan Xu, Kai Chen (possible past Shanghai Jiao Tong University affiliation), Bowen Zhou, Wen Wu, Chao Zhang
Abstract

Scientific time series are central to scientific AI but are typically sparse, highly heterogeneous, and limited in scale, making unified representation learning particularly challenging. Meanwhile, foundation models pretrained on relevant time series domains such as audio, general time series, and brain signals contain rich knowledge, but their applicability to scientific signals remains underexplored. In this paper, we investigate the transferability and complementarity of foundation models fro...

πŸ“„ Data-efficient pre-training by scaling synthetic megadocs
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.18534v1
πŸ‘₯ Authors: Konwoo Kim, Suhas Kotha, Yejin Choi (possible past Allen Institute For Artificial Intelligence affiliation), Tatsunori Hashimoto (possible past Stanford University affiliation), Nick Haber, Percy Liang (possible past Stanford University affiliation)
Abstract

Synthetic data augmentation has emerged as a promising solution when pre-training is constrained by data rather than compute. We study how to design synthetic data algorithms that achieve better loss scaling: not only lowering loss at finite compute but especially as compute approaches infinity. We first show that pre-training on web data mixed with synthetically generated rephrases improves i.i.d. validation loss on the web data, despite the synthetic data coming from an entirely different dist...

πŸ“„ AcceRL: A Distributed Asynchronous Reinforcement Learning and World Model Framework for Vision-Language-Action Models
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.18464v1
πŸ‘₯ Authors: Chengxuan Lu, Shukuan Wang, Yanjie Li, Wei Liu (possible past Tsinghua University affiliation), Shiji Jin, Fuyuan Qian, Peiming Li, Baigui Sun, Yang Liu (possible past Tsinghua University affiliation)
Abstract

Reinforcement learning (RL) for large-scale Vision-Language-Action (VLA) models faces significant challenges in computational efficiency and data acquisition. We propose AcceRL, a fully asynchronous and decoupled RL framework designed to eliminate synchronization barriers by physically isolating training, inference, and rollouts. Crucially, AcceRL is the first to integrate a plug-and-play, trainable world model into a distributed asynchronous RL pipeline to generate virtual experiences. Experime...

πŸ“„ Path-Constrained Mixture-of-Experts
πŸ—“οΈ Published: 3/18/2026
πŸ”— http://arxiv.org/abs/2603.18297v1
πŸ‘₯ Authors: Zijin Gu, Tatiana Likhomanenko (possible past Meta (United States) affiliation), Vimal Thilak, Jason Ramapuram, Navdeep Jaitly (possible past University Of Toronto affiliation)
Abstract

Sparse Mixture-of-Experts (MoE) architectures enable efficient scaling by activating only a subset of parameters for each input. However, conventional MoE routing selects each layer's experts independently, creating N^L possible expert paths for N experts across L layers. This far exceeds typical training set sizes, leading to statistical inefficiency as the model may not learn meaningful structure over such a vast path space. To constrain it, we propose PathMoE, which shares router paramete...
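The N^L path count in the abstract is easy to make concrete. With illustrative values (N and L here are arbitrary, not the paper's configuration), the number of distinct expert paths dwarfs any plausible training set:

```python
# Independent per-layer routing: each of L layers picks one of N experts,
# so the number of distinct expert paths is N ** L.
N, L = 8, 32  # illustrative values, not taken from the paper
paths = N ** L
print(f"{paths:.3e}")  # about 7.9e28 distinct paths
```

Even a trillion training examples would touch only a vanishing fraction of these paths, which is the statistical-inefficiency argument motivating the path constraint.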

πŸ“„ Transfer Learning for Contextual Joint Assortment-Pricing under Cross-Market Heterogeneity
πŸ—“οΈ Published: 3/18/2026
πŸ”— http://arxiv.org/abs/2603.18114v1
πŸ‘₯ Authors: Elynn Chen, Xi Chen (possible past University Of California, Berkeley affiliation), Yi Zhang (possible past Google (United States) affiliation)
Abstract

We study transfer learning for contextual joint assortment-pricing under a multinomial logit choice model with bandit feedback. A seller operates across multiple related markets and observes only posted prices and realized purchases. While data from source markets can accelerate learning in a target market, cross-market differences in customer preferences may introduce systematic bias if pooled indiscriminately. We model heterogeneity through a structured utility shift, where markets share a c...

πŸ“„ Towards Infinitely Long Neural Simulations: Self-Refining Neural Surrogate Models for Dynamical Systems
πŸ—“οΈ Published: 3/18/2026
πŸ”— http://arxiv.org/abs/2603.17750v1
πŸ‘₯ Authors: Qi Liu (possible past Tencent (China) affiliation), Laure Zanna, Joan Bruna (possible past University Of California, Berkeley affiliation)
Abstract

Recent advances in autoregressive neural surrogate models have enabled orders-of-magnitude speedups in simulating dynamical systems. However, autoregressive models are generally prone to distribution drift: compounding errors in autoregressive rollouts that severely degrade generation quality over long time horizons. Existing work attempts to address this issue by implicitly leveraging the inherent trade-off between short-time accuracy and long-time consistency through hyperparameter tuning. In ...

πŸ“„ VC-Soup: Value-Consistency Guided Multi-Value Alignment for Large Language Models
πŸ—“οΈ Published: 3/18/2026
πŸ”— http://arxiv.org/abs/2603.18113v1
πŸ‘₯ Authors: Hefei Xu, Le Wu, Yu Wang (possible past Tsinghua University affiliation), Min Hou, Han Wu, Zhen Zhang, Meng Wang (possible past Google (United States) affiliation)
Abstract

As large language models (LLMs) increasingly shape content generation, interaction, and decision-making across the Web, aligning them with human values has become a central objective in trustworthy AI. This challenge becomes even more pronounced when aligning multiple, potentially conflicting human values. Although recent approaches, such as reward reweighting, prompt-based supervised fine-tuning, and model merging, attempt to tackle multi-value alignment, they still face two major limitations: ...

*Notable papers are those with at least two authors from a "big" AI/ML lab.