📄 Notable* Recent AI/ML arXiv Papers

📄 SageBwd: A Trainable Low-bit Attention
🗓️ Published: 3/2/2026
🔗 http://arxiv.org/abs/2603.02170v1
👥 Authors: Jintao Zhang, Marco Chen, Haoxu Wang, Kai Jiang, Ion Stoica (possible past University of California, Berkeley affiliation), Joseph E. Gonzalez (possible past University of California, Berkeley affiliation), Jianfei Chen, Jun Zhu (possible past Tsinghua University affiliation)
Abstract

Low-bit attention, such as SageAttention, has emerged as an effective approach for accelerating model inference, but its applicability to training remains poorly understood. In prior work, we introduced SageBwd, a trainable INT8 attention that quantizes six of seven attention matrix multiplications while preserving fine-tuning performance. However, SageBwd exhibited a persistent performance gap to full-precision attention (FPA) during pre-training. In this work, we investigate why this gap occur...

📄 Robometer: Scaling General-Purpose Robotic Reward Models via Trajectory Comparisons
🗓️ Published: 3/2/2026
🔗 http://arxiv.org/abs/2603.02115v1
👥 Authors: Anthony Liang, Yigit Korkmaz, Jiahui Zhang, Minyoung Hwang, Abrar Anwar, Sidhant Kaushik, Aditya Shah, Alex S. Huang, Luke Zettlemoyer (possible past University of Washington affiliation), Dieter Fox (possible past University of Washington affiliation), Yu Xiang (possible past University of Washington affiliation), Anqi Li, Andreea Bobu, Abhishek Gupta (possible past University of California, Berkeley affiliation), Stephen Tu (possible past Google (United States) affiliation), Erdem Biyik, Jesse Zhang
Abstract

General-purpose robot reward models are typically trained to predict absolute task progress from expert demonstrations, providing only local, frame-level supervision. While effective for expert demonstrations, this paradigm scales poorly to large-scale robotics datasets where failed and suboptimal trajectories are abundant and assigning dense progress labels is ambiguous. We introduce Robometer, a scalable reward modeling framework that combines intra-trajectory progress supervision with inter-t...

📄 CharacterFlywheel: Scaling Iterative Improvement of Engaging and Steerable LLMs in Production
🗓️ Published: 3/2/2026
🔗 http://arxiv.org/abs/2603.01973v1
👥 Authors: Yixin Nie, Lin Guan, Zhongyao Ma, Anchit Gupta, Yipin Zhou, Xiao Li, Zhengping Zhou, Raymond Zeng, Gelin Zhou, Shigan Chu, Ajay Thampi, Wancen Mu, Nathan Shuster, Ketong Wang, Lin Chen, Jason Brewer, Derek Hao Hu, Alexander Mccauley, Jason Weston (possible past Stanford University affiliation), Sem Park, Na Zhang, Kevin Tang (possible past Stanford University affiliation)
Abstract

This report presents CharacterFlywheel, an iterative flywheel process for improving large language models (LLMs) in production social chat applications across Instagram, WhatsApp, and Messenger. Starting from LLaMA 3.1, we refined models across 15 generations using data from both internal and external real-user traffic. Through continuous deployments from July 2024 to April 2025, we conducted controlled 7-day A/B tests showing consistent engagement improvements: 7 of 8 newly deployed models demo...

📄 Closed-Loop Action Chunks with Dynamic Corrections for Training-Free Diffusion Policy
🗓️ Published: 3/2/2026
🔗 http://arxiv.org/abs/2603.01953v1
👥 Authors: Pengyuan Wu, Pingrui Zhang, Zhigang Wang (possible past Baidu (China) affiliation), Dong Wang (possible past Tsinghua University affiliation), Bin Zhao, Xuelong Li (possible past Tencent (China) affiliation)
Abstract

Diffusion-based policies have achieved remarkable results in robotic manipulation but often struggle to adapt rapidly in dynamic scenarios, leading to delayed responses or task failures. We present DCDP, a Dynamic Closed-Loop Diffusion Policy framework that integrates chunk-based action generation with real-time correction. DCDP integrates a self-supervised dynamic feature encoder, cross-attention fusion, and an asymmetric action encoder-decoder to inject environmental dynamics before action exe...

📄 Federated Agentic AI for Wireless Networks: Fundamentals, Approaches, and Applications
🗓️ Published: 3/2/2026
🔗 http://arxiv.org/abs/2603.01755v1
👥 Authors: Lingyi Cai, Yu Zhang (possible past Google (United States) affiliation), Ruichen Zhang, Yinqiu Liu, Tao Jiang (possible past Alibaba Group (China) affiliation), Dusit Niyato, Wei Ni, Abbas Jamalipour
Abstract

Agentic artificial intelligence (AI) presents a promising pathway toward realizing autonomous and self-improving wireless network services. However, the resource-constrained, widely distributed, and data-heterogeneous nature of wireless networks poses significant challenges to existing agentic AI that relies on centralized architectures, leading to high communication overhead, privacy risks, and non-independent and identically distributed (non-IID) data. Federated learning (FL) has the potential to ...

📄 FT-Dojo: Towards Autonomous LLM Fine-Tuning with Language Agents
🗓️ Published: 3/2/2026
🔗 http://arxiv.org/abs/2603.01712v1
👥 Authors: Qizheng Li, Yifei Zhang, Xiao Yang (possible past Tencent (China) affiliation), Xu Yang, Zhuo Wang, Weiqing Liu, Jiang Bian (possible past Baidu (China) affiliation)
Abstract

Fine-tuning large language models for vertical domains remains a labor-intensive and expensive process, requiring domain experts to curate data, configure training, and iteratively diagnose model behavior. Despite growing interest in autonomous machine learning, no prior work has tackled end-to-end LLM fine-tuning with agents. Can LLM-based agents automate this complete process? We frame this as a substantially open problem: agents must navigate an open-ended search space spanning data curation ...

📄 Reasoning as Gradient: Scaling MLE Agents Beyond Tree Search
🗓️ Published: 3/2/2026
🔗 http://arxiv.org/abs/2603.01692v1
👥 Authors: Yifei Zhang, Xu Yang, Xiao Yang (possible past Tencent (China) affiliation), Bowen Xian, Qizheng Li, Shikai Fang, Jingyuan Li, Jian Wang (possible past Baidu (China) affiliation), Mingrui Xu, Weiqing Liu, Jiang Bian (possible past Baidu (China) affiliation)
Abstract

LLM-based agents for machine learning engineering (MLE) predominantly rely on tree search, a form of gradient-free optimization that uses scalar validation scores to rank candidates. As LLM reasoning capabilities improve, exhaustive enumeration becomes increasingly inefficient compared to directed updates, analogous to how accurate gradients enable efficient descent over random search. We introduce Gome, an MLE agent that operationalizes gradient-based optimization. Gome maps s...

📄 CeProAgents: A Hierarchical Agents System for Automated Chemical Process Development
🗓️ Published: 3/2/2026
🔗 http://arxiv.org/abs/2603.01654v1
👥 Authors: Yuhang Yang, Ruikang Li, Jifei Ma, Kai Zhang, Qi Liu (possible past Tencent (China) affiliation), Jianyu Han, Yonggan Bu, Jibin Zhou, Defu Lian, Xin Li (possible past Google (United States) affiliation), Enhong Chen (possible past Baidu (China) affiliation)
Abstract

The development of chemical processes, a cornerstone of chemical engineering, presents formidable challenges due to its multi-faceted nature, integrating specialized knowledge, conceptual design, and parametric simulation. To address this, we propose CeProAgents, a hierarchical multi-agent system designed to automate the development of chemical processes through collaborative division of labor. Our architecture comprises three specialized agent cohorts focused on knowledge, concept, and param...

📄 Learning Structured Reasoning via Tractable Trajectory Control
🗓️ Published: 3/2/2026
🔗 http://arxiv.org/abs/2603.01641v1
👥 Authors: Po-Nien Kung, Zhen Yang (possible past Tsinghua University affiliation), Jeffrey Luo, Cheng-Fu Yang, Haikang Deng, Zi-Yi Dou (possible past Tencent (China) affiliation), Yinfei Yang (possible past Google (United States) affiliation), Nanyun Peng, Zhe Gan (possible past Microsoft (United States) affiliation), Kai-Wei Chang
Abstract

Large language models can exhibit emergent reasoning behaviors, often manifested as recurring lexical patterns (e.g., "wait," indicating verification). However, complex reasoning trajectories remain sparse in unconstrained sampling, and standard RL often fails to guarantee the acquisition of diverse reasoning behaviors. We propose a systematic discovery and reinforcement of diverse reasoning patterns through structured reasoning, a paradigm that requires targeted exploration of specific reasonin...

📄 SafeSci: Safety Evaluation of Large Language Models in Science Domains and Beyond
🗓️ Published: 3/2/2026
🔗 http://arxiv.org/abs/2603.01589v1
👥 Authors: Xiangyang Zhu, Yuan Tian, Qi Jia, Kaiwei Zhang, Zicheng Zhang, Chunyi Li, Kaiyuan Ji, Dongrui Liu, Zijian Chen, Lu Sun, Renrui Zhang, Yan Teng, Jing Shao, Wei Sun (possible past Google (United States) affiliation), Xia Hu, Yu Qiao (possible past Shanghai Artificial Intelligence Laboratory affiliation), Guangtao Zhai (possible past Shanghai Jiao Tong University affiliation)
Abstract

The success of large language models (LLMs) in scientific domains has heightened safety concerns, prompting numerous benchmarks to evaluate their scientific safety. Existing benchmarks often suffer from limited risk coverage and a reliance on subjective evaluation. To address these problems, we introduce SafeSci, a comprehensive framework for safety evaluation and enhancement in scientific contexts. SafeSci comprises SafeSciBench, a multi-disciplinary benchmark with 0.25M samples, and SafeSciTra...

📄 PhotoBench: Beyond Visual Matching Towards Personalized Intent-Driven Photo Retrieval
🗓️ Published: 3/2/2026
🔗 http://arxiv.org/abs/2603.01493v1
👥 Authors: Tianyi Xu, Rong Shan, Junjie Wu, Jiadeng Huang, Teng Wang, Jiachen Zhu, Wenteng Chen, Minxin Tu, Quantao Dou, Zhaoxiang Wang, Changwang Zhang (possible past Tencent (China) affiliation), Weinan Zhang (possible past Shanghai Jiao Tong University affiliation), Jun Wang (possible past Tencent (China) affiliation), Jianghao Lin
Abstract

Personal photo albums are not merely collections of static images but living, ecological archives defined by temporal continuity, social entanglement, and rich metadata, which makes personalized photo retrieval non-trivial. However, existing retrieval benchmarks rely heavily on context-isolated web snapshots, failing to capture the multi-source reasoning required to resolve authentic, intent-driven user queries. To bridge this gap, we introduce PhotoBench, the first benchmark constructed fro...

📄 From Verbatim to Gist: Distilling Pyramidal Multimodal Memory via Semantic Information Bottleneck for Long-Horizon Video Agents
🗓️ Published: 3/2/2026
🔗 http://arxiv.org/abs/2603.01455v1
👥 Authors: Niu Lian, Yuting Wang, Hanshu Yao, Jinpeng Wang (possible past Tencent (China) affiliation), Bin Chen, Yaowei Wang, Min Zhang (possible past Tsinghua University affiliation), Shu-Tao Xia
Abstract

While multimodal large language models have demonstrated impressive short-term reasoning, they struggle with long-horizon video understanding due to limited context windows and static memory mechanisms that fail to mirror human cognitive efficiency. Existing paradigms typically fall into two extremes: vision-centric methods that incur high latency and redundancy through dense visual accumulation, or text-centric approaches that suffer from detail loss and hallucination via aggressive captioning....

📄 VidDoS: Universal Denial-of-Service Attack on Video-based Large Language Models
🗓️ Published: 3/2/2026
🔗 http://arxiv.org/abs/2603.01454v1
👥 Authors: Duoxun Tang, Dasen Dai, Jiyao Wang, Xiao Yang (possible past Tencent (China) affiliation), Jianyu Wang (possible past Carnegie Mellon University affiliation), Siqi Cai
Abstract

Video-LLMs are increasingly deployed in safety-critical applications but are vulnerable to Energy-Latency Attacks (ELAs) that exhaust computational resources. Current image-centric methods fail because temporal aggregation mechanisms dilute individual frame perturbations. Additionally, real-time demands make instance-wise optimization impractical for continuous video streams. We introduce VidDoS, the first universal ELA framework tailored for Video-LLMs. Our method leverages universal o...

📄 Securing the Floor and Raising the Ceiling: A Merging-based Paradigm for Multi-modal Search Agents
🗓️ Published: 3/2/2026
🔗 http://arxiv.org/abs/2603.01416v1
👥 Authors: Zhixiang Wang, Jingxuan Xu, Dajun Chen, Yunfang Wu, Wei Jiang (possible past Apple (United States) affiliation), Yong Li (possible past Tsinghua University affiliation)
Abstract

Recent advances in Vision-Language Models (VLMs) have motivated the development of multi-modal search agents that can actively invoke external search tools and integrate retrieved evidence through multi-step reasoning. While promising, existing approaches typically rely on large-scale supervised trajectories or expensive reinforcement learning (RL), leading to high training cost, instability, and a severe cold-start problem for standard VLMs. We propose a training-free paradigm to empower VLMs w...

📄 ASTRA-bench: Evaluating Tool-Use Agent Reasoning and Action Planning with Personal User Context
🗓️ Published: 3/2/2026
🔗 http://arxiv.org/abs/2603.01357v1
👥 Authors: Zidi Xiu, David Q. Sun, Kevin Cheng, Maitrik Patel, Josh Date, Yizhe Zhang, Jiarui Lu, Omar Attia, Raviteja Vemulapalli (possible past Google (United States) affiliation), Oncel Tuzel (possible past Apple (United States) affiliation), Meng Cao, Samy Bengio (possible past Stanford University affiliation)
Abstract

Next-generation AI must manage vast personal data, diverse tools, and multi-step reasoning, yet most benchmarks remain context-free and single-turn. We present ASTRA-bench (Assistant Skills in Tool-use, Reasoning & Action-planning), a benchmark that uniquely unifies time-evolving personal context with an interactive toolbox and complex user intents. Our event-driven pipeline generates 2,413 scenarios across four protagonists, grounded in longitudinal life events and annotated by referential, fu...

📄 Theoretical Perspectives on Data Quality and Synergistic Effects in Pre- and Post-Training Reasoning Models
🗓️ Published: 3/1/2026
🔗 http://arxiv.org/abs/2603.01293v1
👥 Authors: Adel Javanmard, Baharan Mirzasoleiman (possible past ETH Zurich affiliation), Vahab Mirrokni (possible past Google (United States) affiliation)
Abstract

Large Language Models (LLMs) are pretrained on massive datasets and later instruction-tuned via supervised fine-tuning (SFT) or reinforcement learning (RL). Best practices emphasize large, diverse pretraining data, whereas post-training operates differently: SFT relies on smaller, high-quality datasets, while RL benefits more from scale, with larger amounts of feedback often outweighing label quality. Yet it remains unclear why pretraining and RL require large datasets, why SFT excels on smaller...

📄 Linking Knowledge to Care: Knowledge Graph-Augmented Medical Follow-Up Question Generation
🗓️ Published: 3/1/2026
🔗 http://arxiv.org/abs/2603.01252v1
👥 Authors: Liwen Sun, Xiang Yu (possible past University of Washington affiliation), Ming Tan (possible past Tencent (China) affiliation), Zhuohao Chen, Anqi Cheng, Ashutosh Joshi, Chenyan Xiong
Abstract

Clinical diagnosis is time-consuming, requiring intensive interactions between patients and medical professionals. While large language models (LLMs) could ease the pre-diagnostic workload, their limited domain knowledge hinders effective medical question generation. We introduce a Knowledge Graph-augmented LLM with active in-context learning to generate relevant and important follow-up questions, KG-Followup, serving as a critical module for the pre-diagnostic assessment. The structured medical...

📄 RMBench: Memory-Dependent Robotic Manipulation Benchmark with Insights into Policy Design
🗓️ Published: 3/1/2026
🔗 http://arxiv.org/abs/2603.01229v1
👥 Authors: Tianxing Chen, Yuran Wang, Mingleyang Li, Yan Qin, Hao Shi, Zixuan Li, Yifan Hu (possible past Tencent (China) affiliation), Yingsheng Zhang, Kaixuan Wang, Yue Chen (possible past Google (United States) affiliation), Hongcheng Wang, Renjing Xu, Ruihai Wu, Yao Mu, Yaodong Yang, Hao Dong, Ping Luo (possible past Shanghai Artificial Intelligence Laboratory affiliation)
Abstract

Robotic manipulation policies have made rapid progress in recent years, yet most existing approaches give limited consideration to memory capabilities. Consequently, they struggle to solve tasks that require reasoning over historical observations and maintaining task-relevant information over time, which are common requirements in real-world manipulation scenarios. Although several memory-aware policies have been proposed, systematic evaluation of memory-dependent manipulation remains underexplo...

📄 A Unified Framework to Quantify Cultural Intelligence of AI
🗓️ Published: 3/1/2026
🔗 http://arxiv.org/abs/2603.01211v1
👥 Authors: Sunipa Dev, Vinodkumar Prabhakaran (possible past Google (United States) affiliation), Rutledge Chin Feman, Aida Davani, Remi Denton, Charu Kalia, Piyawat L Kumjorn, Madhurima Maji, Rida Qadri, Negar Rostamzadeh, Renee Shelby, Romina Stella, Hayk Stepanyan, Erin Van Liemt, Aishwarya Verma, Oscar Wahltinez, Edem Wornyo, Andrew Zaldivar (possible past Google (United States) affiliation), Saška Mojsilović
Abstract

As generative AI technologies are increasingly being launched across the globe, assessing their competence to operate in different cultural contexts is urgently becoming a priority. While recent years have seen numerous and much-needed efforts on cultural benchmarking, these efforts have largely focused on specific aspects of culture and evaluation. While these efforts contribute to our understanding of cultural competence, a unified and systematic evaluation approach is needed for us as a fiel...

📄 How Well Does Agent Development Reflect Real-World Work?
🗓️ Published: 3/1/2026
🔗 http://arxiv.org/abs/2603.01203v1
👥 Authors: Zora Zhiruo Wang, Sanidhya Vijayvargiya, Aspen Chen, Hanmo Zhang, Venu Arvind Arangarajan, Jett Chen, Valerie Chen, Diyi Yang (possible past Stanford University affiliation), Daniel Fried, Graham Neubig (possible past Carnegie Mellon University affiliation)
Abstract

AI agents are increasingly developed and evaluated on benchmarks relevant to human work, yet it remains unclear how representative these benchmarking efforts are of the labor market as a whole. In this work, we systematically study the relationship between agent development efforts and the distribution of real-world human work by mapping benchmark instances to work domains and skills. We first analyze 43 benchmarks and 72,342 tasks, measuring their alignment with human employment and capital all...

📄 DeepResearch-9K: A Challenging Benchmark Dataset of Deep-Research Agent
🗓️ Published: 3/1/2026
🔗 http://arxiv.org/abs/2603.01152v1
👥 Authors: Tongzhou Wu, Yuhao Wang, Xinyu Ma, Xiuqiang He (possible past Tencent (China) affiliation), Shuaiqiang Wang (possible past Baidu (China) affiliation), Dawei Yin (possible past Baidu (China) affiliation), Xiangyu Zhao
Abstract

Deep-research agents are capable of executing multi-step web exploration, targeted retrieval, and sophisticated question answering. Despite their powerful capabilities, deep-research agents face two critical bottlenecks: (1) the lack of large-scale, challenging datasets with real-world difficulty, and (2) the absence of accessible, open-source frameworks for data synthesis and agent training. To bridge these gaps, we first construct DeepResearch-9K, a large-scale challenging dataset specifically...

📄 AutoSkill: Experience-Driven Lifelong Learning via Skill Self-Evolution
🗓️ Published: 3/1/2026
🔗 http://arxiv.org/abs/2603.01145v1
👥 Authors: Yutao Yang, Junsong Li, Qianjun Pan, Bihao Zhan, Yuxuan Cai, Lin Du, Jie Zhou (possible past Tsinghua University affiliation), Kai Chen (possible past Shanghai Jiao Tong University affiliation), Qin Chen, Xin Li (possible past Google (United States) affiliation), Bo Zhang (possible past Tencent (China) affiliation), Liang He
Abstract

In practical LLM applications, users repeatedly express stable preferences and requirements, such as reducing hallucinations, following institutional writing conventions, or avoiding overly technical wording, yet such interaction experience is seldom consolidated into reusable knowledge. Consequently, LLM agents often fail to accumulate personalized capabilities across sessions. We present AutoSkill, an experience-driven lifelong learning framework that enables LLM agents to automatically derive...

📄 The Expressive Limits of Diagonal SSMs for State-Tracking
🗓️ Published: 3/2/2026
🔗 http://arxiv.org/abs/2603.01959v1
👥 Authors: Mehran Shakerinava, Behnoush Khavari, Siamak Ravanbakhsh (possible past Carnegie Mellon University affiliation), Sarath Chandar (possible past Mila - Quebec Artificial Intelligence Institute affiliation)
Abstract

State-Space Models (SSMs) have recently been shown to achieve strong empirical performance on a variety of long-range sequence modeling tasks while remaining efficient and highly parallelizable. However, the theoretical understanding of their expressive power remains limited. In this work, we study the expressivity of input-Dependent Complex-valued Diagonal (DCD) SSMs on sequential state-tracking tasks. We show that single-layer DCD SSMs cannot express state-tracking of any non-Abelian group at ...

📄 TopoCurate: Modeling Interaction Topology for Tool-Use Agent Training
🗓️ Published: 3/2/2026
🔗 http://arxiv.org/abs/2603.01714v1
👥 Authors: Jinluan Yang, Yuxin Liu, Zhengyu Chen, Chengcheng Han, Yueqing Sun, Qi Gu, Hui Su (possible past Tencent (China) affiliation), Xunliang Cai, Fei Wu (possible past Google (United States) affiliation), Kun Kuang
Abstract

Training tool-use agents typically relies on outcome-based filtering: Supervised Fine-Tuning (SFT) on successful trajectories and Reinforcement Learning (RL) on pass-rate-selected tasks. However, this paradigm ignores interaction dynamics: successful trajectories may lack error recovery or exhibit redundancy, while pass rates fail to distinguish structurally informative tasks from trivial ones. We propose TopoCurate, an interaction-aware framework that projects multi-trial rollouts from...

📄 Adaptive Spectral Feature Forecasting for Diffusion Sampling Acceleration
🗓️ Published: 3/2/2026
🔗 http://arxiv.org/abs/2603.01623v1
👥 Authors: Jiaqi Han, Juntong Shi, Puheng Li, Haotian Ye (possible past Peking University affiliation), Qiushan Guo, Stefano Ermon (possible past Stanford University affiliation)
Abstract

Diffusion models have become the dominant tool for high-fidelity image and video generation, yet are critically bottlenecked by their inference speed due to the numerous iterative passes of Diffusion Transformers. To reduce the exhaustive compute, recent works resort to the feature caching and reusing scheme that skips network evaluations at selected diffusion steps by using cached features from previous steps. However, their preliminary design solely relies on local approximation, causing errors ...
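The cache-and-reuse scheme the abstract describes is a general acceleration technique; a minimal toy sketch of it (illustrative only — the function names, the fixed every-other-step skip schedule, and the toy update rule are assumptions here, not the paper's adaptive spectral method):

```python
import numpy as np

def toy_backbone(x, t):
    """Stand-in for an expensive Diffusion Transformer forward pass."""
    return np.tanh(x + 0.1 * t)

def sample_with_feature_cache(x, num_steps, skip_every=2):
    """Toy denoising loop that reuses the cached backbone output on
    skipped steps instead of recomputing it. Returns the sample and the
    number of full network evaluations actually performed."""
    cached = None
    evals = 0
    for t in range(num_steps, 0, -1):
        if cached is None or t % skip_every == 0:
            cached = toy_backbone(x, t)  # full (expensive) evaluation
            evals += 1
        # skipped steps reuse `cached` as a local approximation
        x = x - 0.05 * cached
    return x, evals
```

With `num_steps=10` and `skip_every=2`, only half the steps pay for a backbone call; methods like the one above differ mainly in how they choose which steps may safely reuse the cache.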

📄 KERV: Kinematic-Rectified Speculative Decoding for Embodied VLA Models
🗓️ Published: 3/2/2026
🔗 http://arxiv.org/abs/2603.01581v1
👥 Authors: Zihao Zheng, Zhihao Mao, Maoliang Li, Jiayu Chen, Xinhao Sun, Zhaobo Zhang, Donggang Cao, Hong Mei (possible past Peking University affiliation), Xiang Chen (possible past Tencent (China) affiliation)
Abstract

Vision-Language-Action (VLA) models build a token-domain robot control paradigm, yet suffer from low speed. Speculative Decoding (SD) is an optimization strategy that can boost inference speed. Two key issues emerge when integrating VLA and SD: first, SD relies on re-inference to address token errors, which is computationally expensive; second, to mitigate token errors, the acceptance threshold in SD requires careful adjustment. Existing works fail to address the above two issues effectively. Me...
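For context on the baseline being optimized: plain greedy speculative decoding (not KERV's kinematic rectification — all names and the toy "models" below are illustrative assumptions) accepts a cheap draft model's tokens until the first disagreement with the target model:

```python
def draft_model(prefix, k):
    """Toy draft model: cheaply guess the next k tokens (counts upward)."""
    return [prefix[-1] + 1 + i for i in range(k)]

def target_model(prefix):
    """Toy target model: the expensive model's true next token."""
    return prefix[-1] + 1 if prefix[-1] < 5 else 0

def speculative_step(prefix, k=4):
    """One round of greedy speculative decoding: keep the draft's tokens
    up to the first disagreement with the target, taking the target's
    token at the mismatch. (Real SD verifies all k draft tokens in one
    parallel target pass; this loop calls the target per token for
    clarity.) Returns the extended prefix."""
    guesses = draft_model(prefix, k)
    out = list(prefix)
    for g in guesses:
        t = target_model(out)
        out.append(t)   # the target's token is always what gets kept
        if t != g:      # first mismatch ends the speculation round
            break
    return out
```

When draft and target agree, one round emits up to k tokens for a single (parallel) target evaluation, which is the speedup SD-based VLA methods build on.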

📄 Fed-GAME: Personalized Federated Learning with Graph Attention Mixture-of-Experts For Time-Series Forecasting
🗓️ Published: 3/2/2026
🔗 http://arxiv.org/abs/2603.01363v1
👥 Authors: Yi Li (possible past University of Washington affiliation), Han Liu (possible past Tsinghua University affiliation), Mingfeng Fan, Guo Chen, Chaojie Li, Biplab Sikdar
Abstract

Federated learning (FL) on graphs shows promise for distributed time-series forecasting. Yet, existing methods rely on static topologies and struggle with client heterogeneity. We propose Fed-GAME, a framework that models personalized aggregation as message passing over a learnable dynamic implicit graph. The core is a decoupled parameter difference-based update protocol, where clients transmit parameter differences between their fine-tuned private model and a shared global model. On the server,...
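The parameter-difference protocol the abstract mentions can be illustrated generically — this is a plain FedAvg-style delta update under assumed names, not the authors' graph-attention server:

```python
import numpy as np

def client_update(global_params, local_grad_fn, lr=0.1, steps=3):
    """Fine-tune a private copy locally, then transmit only the
    parameter *difference* from the shared global model (clients never
    send raw weights under this protocol)."""
    params = global_params.copy()
    for _ in range(steps):
        params -= lr * local_grad_fn(params)
    return params - global_params  # the delta is what gets communicated

def server_aggregate(global_params, deltas, weights):
    """Combine client deltas into the global model. Uniform/static
    weighting here; Fed-GAME instead learns per-client mixing over a
    dynamic implicit graph."""
    avg = sum(w * d for w, d in zip(weights, deltas)) / sum(weights)
    return global_params + avg
```

Transmitting deltas rather than full models is what makes personalized server-side mixing (weighting each client's contribution differently) straightforward to express.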

📄 Learn Hard Problems During RL with Reference Guided Fine-tuning
🗓️ Published: 3/1/2026
🔗 http://arxiv.org/abs/2603.01223v1
👥 Authors: Yangzhen Wu, Shanda Li, Zixin Wen, Xin Zhou (possible past Stanford University affiliation), Ameet Talwalkar (possible past University of California, Berkeley affiliation), Yiming Yang (possible past Microsoft (United States) affiliation), Wenhao Huang, Tianle Cai
Abstract

Reinforcement learning (RL) for mathematical reasoning can suffer from reward sparsity: for challenging problems, the LLM fails to sample any correct trajectories, preventing RL from receiving meaningful positive feedback. At the same time, there often exist human-written reference solutions along with the problem (e.g., problems from AoPS), but directly fine-tuning on these solutions offers no benefit because models often cannot imitate human proofs that lie outside their own reasoning distribution...

📄 Thoth: Mid-Training Bridges LLMs to Time Series Understanding
🗓️ Published: 3/1/2026
🔗 http://arxiv.org/abs/2603.01042v1
👥 Authors: Jiafeng Lin, Yuxuan Wang (possible past Google (United States) affiliation), Jialong Wu, Huakun Luo, Zhongyi Pei, Jianmin Wang (possible past Tsinghua University affiliation)
Abstract

Large Language Models (LLMs) have demonstrated remarkable success in general-purpose reasoning. However, they still struggle to understand and reason about time series data, which limits their effectiveness in decision-making scenarios that depend on temporal dynamics. In this paper, we propose Thoth, the first family of mid-trained LLMs with general-purpose time series understanding capabilities. As a pivotal intermediate stage, mid-training achieves task- and domain-agnostic alignment between ...

*Notable papers are those with at least two authors from a "big" AI/ML lab.