📄 Notable* Recent AI/ML arXiv Papers


📄 HiRO-Nav: Hybrid ReasOning Enables Efficient Embodied Navigation
🗓️ Published: 4/9/2026
🔗 http://arxiv.org/abs/2604.08232v1
👥 Authors: He Zhao (possible past Tencent (China) affiliation), Yijun Yang, Zichuan Lin, Deheng Ye (possible past Tencent (China) affiliation), Chunyan Miao
Abstract

Embodied navigation agents built upon large reasoning models (LRMs) can handle complex, multimodal environmental input and perform grounded reasoning per step to improve sequential decision-making for long-horizon tasks. However, a critical question remains: how can the reasoning capabilities of LRMs be harnessed intelligently and efficiently for long-horizon navigation tasks? In simple scenes, agents are expected to act reflexively, while in complex ones they should engage in deliberat...

📄 AT-ADD: All-Type Audio Deepfake Detection Challenge Evaluation Plan
🗓️ Published: 4/9/2026
🔗 http://arxiv.org/abs/2604.08184v1
👥 Authors: Yuankun Xie, Haonan Cheng, Jiayi Zhou, Xiaoxuan Guo, Tao Wang (possible past Stanford University affiliation), Jian Liu, Weiqiang Wang, Ruibo Fu, Xiaopeng Wang, Hengyan Huang, Xiaoying Huang, Long Ye, Guangtao Zhai (possible past Shanghai Jiao Tong University affiliation)
Abstract

The rapid advancement of Audio Large Language Models (ALLMs) has enabled cost-effective, high-fidelity generation and manipulation of both speech and non-speech audio, including sound effects, singing voices, and music. While these capabilities foster creativity and content production, they also introduce significant security and trust challenges, as realistic audio deepfakes can now be generated and disseminated at scale. Existing audio deepfake detection (ADD) countermeasures (CMs) and benchma...

📄 ViVa: A Video-Generative Value Model for Robot Reinforcement Learning
🗓️ Published: 4/9/2026
🔗 http://arxiv.org/abs/2604.08168v1
👥 Authors: Jindi Lv, Hao Li (possible past Tsinghua University affiliation), Jie Li, Yifei Nie, Fankun Kong, Yang Wang (possible past Baidu (China) affiliation), Xiaofeng Wang, Zheng Zhu, Chaojun Ni, Qiuping Deng, Hengtao Li, Jiancheng Lv, Guan Huang
Abstract

Vision-language-action (VLA) models have advanced robot manipulation through large-scale pretraining, but real-world deployment remains challenging due to partial observability and delayed feedback. Reinforcement learning addresses this via value functions, which assess task progress and guide policy improvement. However, existing value models built on vision-language models (VLMs) struggle to capture temporal dynamics, undermining reliable value estimation in long-horizon tasks. In this paper, ...

📄 Face-D²CL: Multi-Domain Synergistic Representation with Dual Continual Learning for Facial DeepFake Detection
🗓️ Published: 4/9/2026
🔗 http://arxiv.org/abs/2604.08159v1
👥 Authors: Yushuo Zhang, Yu Cheng (possible past National University Of Singapore affiliation), Yongkang Hu, Jiuan Zhou, Jiawei Chen (possible past Tencent (China) affiliation), Yuan Xie, Zhaoxia Yin
Abstract

The rapid advancement of facial forgery techniques poses severe threats to public trust and information security, making facial DeepFake detection a critical research priority. Continual learning provides an effective approach to adapt facial DeepFake detection models to evolving forgery patterns. However, existing methods face two key bottlenecks in real-world continual learning scenarios: insufficient feature representation and catastrophic forgetting. To address these issues, we propose Face-...

📄 LegoDiffusion: Micro-Serving Text-to-Image Diffusion Workflows
🗓️ Published: 4/9/2026
🔗 http://arxiv.org/abs/2604.08123v1
👥 Authors: Lingyun Yang, Suyi Li (possible past Google (United States) affiliation), Tianyu Feng, Xiaoxiao Jiang, Zhipeng Di, Weiyi Lu, Kan Liu, Yinghao Yu, Tao Lan, Guodong Yang, Lin Qu, Liping Zhang, Wei Wang (possible past University Of Oxford affiliation)
Abstract

Text-to-image generation executes a diffusion workflow comprising multiple models centered on a base diffusion model. Existing serving systems treat each workflow as an opaque monolith, provisioning, placing, and scaling all constituent models together, which obscures internal dataflow, prevents model sharing, and enforces coarse-grained resource management. In this paper, we make a case for micro-serving diffusion workflows with LegoDiffusion, a system that decomposes a workflow into loosely co...

📄 Small Vision-Language Models are Smart Compressors for Long Video Understanding
🗓️ Published: 4/9/2026
🔗 http://arxiv.org/abs/2604.08120v1
👥 Authors: Junjie Fei, Jun Chen, Zechun Liu, Yunyang Xiong, Chong Zhou, Wei Wen (possible past Google (United States) affiliation), Junlin Han, Mingchen Zhuge, Saksham Suri, Qi Qian, Shuming Liu, Lemeng Wu, Raghuraman Krishnamoorthi, Vikas Chandra (possible past Meta (United States) affiliation), Mohamed Elhoseiny (possible past Meta (United States) affiliation), Chenchen Zhu
Abstract

Adapting Multimodal Large Language Models (MLLMs) for hour-long videos is bottlenecked by context limits. Dense visual streams saturate token budgets and exacerbate the lost-in-the-middle phenomenon. Existing heuristics, like sparse sampling or uniform pooling, blindly sacrifice fidelity by discarding decisive moments and wasting bandwidth on irrelevant backgrounds. We propose Tempo, an efficient query-aware framework compressing long videos for downstream understanding. Tempo leverages a Small ...

📄 PASK: Toward Intent-Aware Proactive Agents with Long-Term Memory
🗓️ Published: 4/9/2026
🔗 http://arxiv.org/abs/2604.08000v1
👥 Authors: Zhifei Xie, Zongzheng Hu, Fangda Ye, Xin Zhang (possible past Google (United States) affiliation), Haobo Chai, Zihang Liu, Pengcheng Wu, Guibin Zhang, Yue Liao, Xiaobin Hu (possible past Tencent (China) affiliation), Deheng Ye (possible past Tencent (China) affiliation), Chunyan Miao, Shuicheng Yan (possible past National University Of Singapore affiliation)
Abstract

Proactivity is a core expectation for AGI. Prior work remains largely confined to laboratory settings, leaving a clear gap for real-world proactive agents: depth, complexity, ambiguity, precision, and real-time constraints. We study this setting, where useful intervention requires inferring latent needs from ongoing context and grounding actions in evolving user memory under latency and long-horizon constraints. We first propose DD-MM-PAS (Demand Detection, Memory Modeling, Proactive Agent System) ...

📄 LogAct: Enabling Agentic Reliability via Shared Logs
🗓️ Published: 4/9/2026
🔗 http://arxiv.org/abs/2604.07988v1
👥 Authors: Mahesh Balakrishnan, Ashwin Bharambe (possible past Meta (United States) affiliation), Davide Testuggine, David Geraghty, David Mao, Vidhya Venkat, Ilya Mironov (possible past Meta (United States) affiliation), Rithesh Baradi, Gayathri Aiyer, Victoria Dudin
Abstract

Agents are LLM-driven components that can mutate environments in powerful, arbitrary ways. Extracting guarantees for the execution of agents in production environments can be challenging due to asynchrony and failures. In this paper, we propose a new abstraction called LogAct, where each agent is a deconstructed state machine playing a shared log. In LogAct, agentic actions are visible in the shared log before they are executed; can be stopped prior to execution by pluggable, decoupled voters; a...
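The shared-log idea described above can be illustrated with a minimal sketch (my own toy code, not the paper's system; all class and function names here are hypothetical): actions are appended to a log before execution, pluggable voters may veto them, and only unvetoed entries are executed.

```python
from dataclasses import dataclass


@dataclass
class Action:
    agent: str
    name: str
    vetoed: bool = False
    executed: bool = False


class SharedLog:
    """Toy sketch of the LogAct abstraction: actions are visible in the
    shared log before they run, and decoupled voters can stop them."""

    def __init__(self, voters=None):
        self.entries = []
        self.voters = voters or []

    def propose(self, agent, name):
        action = Action(agent, name)
        self.entries.append(action)  # visible in the log before execution
        if any(veto(action) for veto in self.voters):
            action.vetoed = True     # stopped prior to execution
        return action

    def execute_pending(self):
        for a in self.entries:
            if not a.vetoed and not a.executed:
                a.executed = True    # agents replay the log as state machines


# usage: a voter that blocks destructive actions by name
log = SharedLog(voters=[lambda a: a.name.startswith("delete")])
ok = log.propose("agent-1", "write_file")
bad = log.propose("agent-1", "delete_db")
log.execute_pending()
```

Because every action passes through the log, the same log also serves as an audit trail and a replay point for recovering from failures.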

📄 A Decomposition Perspective to Long-context Reasoning for LLMs
🗓️ Published: 4/9/2026
🔗 http://arxiv.org/abs/2604.07981v1
👥 Authors: Yanling Xiao, Huaibing Xie, Guoliang Zhao, Shihan Dou, Shaolei Wang, Yiting Liu, Nantao Zheng, Cheng Zhang, Pluto Zhou, Zhisong Zhang (possible past Shanghai Jiao Tong University affiliation), Lemao Liu (possible past Tencent (China) affiliation)
Abstract

Long-context reasoning is essential for complex real-world applications, yet remains a significant challenge for Large Language Models (LLMs). Despite the rapid evolution in long-context reasoning, current research often overlooks the internal complexity of the long-context reasoning task itself. In this paper, we move beyond this holistic view and decompose long-context reasoning into a set of fundamental atomic skills, and we then automatically synthesize a suite of pseudo datasets, each expli...

📄 How Far Are Large Multimodal Models from Human-Level Spatial Action? A Benchmark for Goal-Oriented Embodied Navigation in Urban Airspace
🗓️ Published: 4/9/2026
🔗 http://arxiv.org/abs/2604.07973v1
👥 Authors: Baining Zhao, Ziyou Wang, Jianjie Fang, Zile Zhou, Yanggang Xu, Yatai Ji, Jiacheng Xu, Qian Zhang (possible past University Of Washington affiliation), Weichen Zhang, Chen Gao, Xinlei Chen (possible past Tsinghua University affiliation)
Abstract

Large multimodal models (LMMs) show strong visual-linguistic reasoning, but their capacity for spatial decision-making and action remains unclear. In this work, we investigate whether LMMs can achieve embodied spatial action like humans through a challenging scenario: goal-oriented navigation in urban 3D spaces. We first spend over 500 hours constructing a dataset comprising 5,037 high-quality goal-oriented navigation samples, with an emphasis on 3D vertical actions and rich urban semantic informa...

📄 WorldMAP: Bootstrapping Vision-Language Navigation Trajectory Prediction with Generative World Models
🗓️ Published: 4/9/2026
🔗 http://arxiv.org/abs/2604.07957v1
👥 Authors: Hongjin Chen, Shangyun Jiang, Tonghua Su, Chen Gao, Xinlei Chen (possible past Tsinghua University affiliation), Yong Li (possible past Tsinghua University affiliation), Zhibo Chen (possible past Tencent (China) affiliation)
Abstract

Vision-language models (VLMs) and generative world models are opening new opportunities for embodied navigation. VLMs are increasingly used as direct planners or trajectory predictors, while world models support look-ahead reasoning by imagining future views. Yet predicting a reliable trajectory from a single egocentric observation remains challenging. Current VLMs often generate unstable trajectories, and world models, though able to synthesize plausible futures, do not directly provide the gro...

📄 Mitigating Entangled Steering in Large Vision-Language Models for Hallucination Reduction
🗓️ Published: 4/9/2026
🔗 http://arxiv.org/abs/2604.07914v1
👥 Authors: Yuanhong Zhang, Zhaoyang Wang, Xin Zhang (possible past Google (United States) affiliation), Weizhan Zhang, Joey Tianyi Zhou (possible past Tencent (China) affiliation)
Abstract

Large Vision-Language Models (LVLMs) have achieved remarkable success across cross-modal tasks but remain hindered by hallucinations, producing textual outputs inconsistent with visual content. Existing methods mitigate hallucinations but often alter generation behavior, resulting in shorter outputs and shifted token distributions, especially in latent space steering approaches. We identify that this issue stems from entangled steering signals, where suppressing hallucinations inadvertently disr...

📄 Data Selection for Multi-turn Dialogue Instruction Tuning
🗓️ Published: 4/9/2026
🔗 http://arxiv.org/abs/2604.07892v1
👥 Authors: Bo Li (possible past Tencent (China) affiliation), Shikun Zhang, Wei Ye (possible past Meta (United States) affiliation)
Abstract

Instruction-tuned language models increasingly rely on large multi-turn dialogue corpora, but these datasets are often noisy and structurally inconsistent, with topic drift, repetitive chitchat, and mismatched answer formats across turns. We address this from a data selection perspective and propose MDS (Multi-turn Dialogue Selection), a dialogue-level framework that scores whole conversations rather than isolated turns. MDS combines a global coverage stage that performs bin-wise select...

📄 QaRL: Rollout-Aligned Quantization-Aware RL for Fast and Stable Training under Training–Inference Mismatch
🗓️ Published: 4/9/2026
🔗 http://arxiv.org/abs/2604.07853v1
👥 Authors: Hao Gu, Hao Wang (possible past Tsinghua University affiliation), Jiacheng Liu, Lujun Li, Qiyuan Zhu, Bei Liu, Binxing Xu, Lei Wang (possible past Baidu (China) affiliation), Xintong Yang, Sida Lin, Sirui Han, Yike Guo
Abstract

Large language model (LLM) reinforcement learning (RL) pipelines are often bottlenecked by rollout generation, making end-to-end training slow. Recent work mitigates this by running rollouts with quantization to accelerate decoding, which is the most expensive stage of the RL loop. However, these setups destabilize optimization by amplifying the training-inference gap: rollouts are operated at low precision, while learning updates are computed at full precision. To address this challenge, we pro...
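The training-inference gap the abstract describes can be made concrete with a small sketch (my own illustration, not QaRL's method; the `quantize` helper is hypothetical): rollouts are generated from a low-precision copy of the weights, while gradient updates are applied to the full-precision copy, so the two policies RL implicitly assumes are identical actually differ.

```python
import numpy as np


def quantize(w, bits=4):
    """Hypothetical symmetric uniform fake-quantization of weights."""
    levels = 2 ** (bits - 1) - 1          # e.g. 7 levels per side for 4-bit
    scale = np.max(np.abs(w)) / levels
    if scale == 0:
        scale = 1.0
    return np.round(w / scale) * scale


# full-precision policy weights (the learning side)
w_full = np.random.randn(8)

# rollouts are generated with the low-precision copy...
w_rollout = quantize(w_full, bits=4)

# ...so the rollout policy and the trained policy disagree by this much:
mismatch = np.max(np.abs(w_full - w_rollout))
```

The larger this mismatch, the more the off-policy error grows between the distribution that generated the rollouts and the policy being optimized, which is the instability QaRL targets.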

📄 SPARD: Self-Paced Curriculum for RL Alignment via Integrating Reward Dynamics and Data Utility
🗓️ Published: 4/9/2026
🔗 http://arxiv.org/abs/2604.07837v1
👥 Authors: Xuyang Zhi, Peilun Zhou, Chengqiang Lu, Hang Lv, Yiwei Liang, Rongyang Zhang, Yan Gao, Yi Wu (possible past University Of California, Berkeley affiliation), Yao Hu, Hongchao Gu, Defu Lian, Hao Wang (possible past Tsinghua University affiliation), Enhong Chen (possible past Baidu (China) affiliation)
Abstract

The evolution of Large Language Models (LLMs) is shifting the focus from single, verifiable tasks toward complex, open-ended real-world scenarios, imposing significant challenges on the post-training phase. In these settings, the scale and complexity of reward systems have grown significantly, transitioning toward multi-objective formulations that encompass a comprehensive spectrum of model capabilities and application contexts. However, traditional methods typically rely on fixed reward weights...

📄 Lightweight LLM Agent Memory with Small Language Models
🗓️ Published: 4/9/2026
🔗 http://arxiv.org/abs/2604.07798v1
👥 Authors: Jiaquan Zhang, Chaoning Zhang, Shuxu Chen, Zhenzhen Huang, Pengcheng Zheng, Zhicheng Wang (possible past Google (United States) affiliation), Ping Guo, Fan Mo, Sung-Ho Bae, Jie Zou, Jiwei Wei, Yang Yang (possible past Tencent (China) affiliation)
Abstract

Although LLM agents can leverage tools for complex tasks, they still need memory to maintain cross-turn consistency and accumulate reusable information in long-horizon interactions. However, retrieval-based external memory systems incur low online overhead but suffer from unstable accuracy due to limited query construction and candidate filtering. In contrast, many systems use repeated large-model calls for online memory operations, improving accuracy but accumulating latency over long interacti...

📄 Emotion Concepts and their Function in a Large Language Model
🗓️ Published: 4/9/2026
🔗 http://arxiv.org/abs/2604.07729v1
👥 Authors: Nicholas Sofroniew, Isaac Kauvar, William Saunders, Runjin Chen, Tom Henighan (possible past Openai (United States) affiliation), Sasha Hydrie, Craig Citro (possible past Google (United States) affiliation), Adam Pearce (possible past Google (United States) affiliation), Julius Tarng, Wes Gurnee, Joshua Batson, Sam Zimmerman, Kelley Rivoire, Kyle Fish, Chris Olah (possible past Openai (United States) affiliation), Jack Lindsey
Abstract

Large language models (LLMs) sometimes appear to exhibit emotional reactions. We investigate why this is the case in Claude Sonnet 4.5 and explore implications for alignment-relevant behavior. We find internal representations of emotion concepts, which encode the broad concept of a particular emotion and generalize across contexts and behaviors it might be linked to. These representations track the operative emotion concept at a given token position in a conversation, activating in accordance wi...

📄 Squeeze Evolve: Unified Multi-Model Orchestration for Verifier-Free Evolution
🗓️ Published: 4/9/2026
🔗 http://arxiv.org/abs/2604.07725v1
👥 Authors: Monishwaran Maheswaran, Leon Lakhani, Zhongzhu Zhou, Shijia Yang, Junxiong Wang, Coleman Hooper, Yuezhou Hu, Rishabh Tiwari, Jue Wang (possible past Tencent (China) affiliation), Harman Singh, Qingyang Wu, Yuqing Jian, Ce Zhang (possible past Eth Zurich affiliation), Kurt Keutzer (possible past University Of California, Berkeley affiliation), Tri Dao, Xiaoxia Wu, Ben Athiwaratkun, James Zou, Chenfeng Xu (possible past University Of California, Berkeley affiliation)
Abstract

We show that verifier-free evolution is bottlenecked by both diversity and efficiency: without external correction, repeated evolution accelerates collapse toward narrow modes, while the uniform use of a high-cost model wastes compute and quickly becomes economically impractical. We introduce Squeeze Evolve, a unified multi-model orchestration framework for verifier-free evolutionary inference. Our approach is guided by a simple principle: allocate model capability where it has the highest margi...

📄 How Independent are Large Language Models? A Statistical Framework for Auditing Behavioral Entanglement and Reweighting Verifier Ensembles
🗓️ Published: 4/8/2026
🔗 http://arxiv.org/abs/2604.07650v1
👥 Authors: Chenchen Kuai, Jiwan Jiang, Zihao Zhu, Hao Wang (possible past Tsinghua University affiliation), Keshu Wu, Zihao Li, Yunlong Zhang, Chenxi Liu, Zhengzhong Tu (possible past Google (United States) affiliation), Zhiwen Fan, Yang Zhou
Abstract

The rapid growth of the large language model (LLM) ecosystem raises a critical question: are seemingly diverse models truly independent? Shared pretraining data, distillation, and alignment pipelines can induce hidden behavioral dependencies (latent entanglement) that undermine multi-model systems such as LLM-as-a-judge pipelines and ensemble verification, which implicitly assume independent signals. In practice, this manifests as correlated reasoning patterns and synchronized failures, where ap...
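One simple way to see why independence matters (my own illustration, not the paper's statistical framework): compare the correlation between the binary error indicators of two verifiers. Entangled verifiers fail on the same items, so their errors correlate strongly; independent ones hover near zero.

```python
import numpy as np


def error_correlation(errors):
    """Pearson correlation between the binary error indicators (1 = wrong)
    of two verifiers over the same set of items."""
    e = np.asarray(errors, dtype=float)
    return np.corrcoef(e)[0, 1]


# two verifiers that fail on mostly the same items (entangled)...
v1 = [0, 1, 1, 0, 1, 0, 0, 1]
v2 = [0, 1, 1, 0, 1, 0, 1, 1]
# ...versus one whose failures are unrelated to v1's
v3 = [1, 0, 1, 0, 0, 1, 0, 1]

rho_entangled = error_correlation([v1, v2])
rho_independent = error_correlation([v1, v3])
```

An ensemble that majority-votes over v1 and v2 gains little over either alone, because their synchronized failures survive the vote; this is the kind of dependency the paper proposes to audit and reweight.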

📄 Exponential quantum advantage in processing massive classical data
🗓️ Published: 4/8/2026
🔗 http://arxiv.org/abs/2604.07639v1
👥 Authors: Haimeng Zhao, Alexander Zlokapa (possible past Massachusetts Institute Of Technology affiliation), Hartmut Neven (possible past Google (United States) affiliation), Ryan Babbush (possible past Google (United States) affiliation), John Preskill, Jarrod R. Mcclean (possible past Google (United States) affiliation), Hsin-Yuan Huang (possible past Google (United States) affiliation)
Abstract

Broadly applicable quantum advantage, particularly in classical data processing and machine learning, has been a fundamental open problem. In this work, we prove that a small quantum computer of polylogarithmic size can perform large-scale classification and dimension reduction on massive classical data by processing samples on the fly, whereas any classical machine achieving the same prediction performance requires exponentially larger size. Furthermore, classical machines that are exponentiall...

📄 Learning is Forgetting: LLM Training As Lossy Compression
🗓️ Published: 4/8/2026
🔗 http://arxiv.org/abs/2604.07569v1
👥 Authors: Henry C. Conklin, Tom Hosking, Tan Yi-Chern, Julian Gold, Jonathan D. Cohen (possible past Deepmind (United Kingdom) affiliation), Thomas L. Griffiths (possible past University Of California, Berkeley affiliation), Max Bartolo, Seraphina Goldfarb-Tarrant
Abstract

Despite the increasing prevalence of large language models (LLMs), we still have a limited understanding of how their representational spaces are structured. This limits our ability to interpret how and what they learn, or to relate their learning to that of humans. We argue LLMs are best seen as an instance of lossy compression, where over training they learn by retaining only the information in their training data relevant to their objective(s). We show pre-training results in models that are optimally comp...

📄 ReflectRM: Boosting Generative Reward Models via Self-Reflection within a Unified Judgment Framework
🗓️ Published: 4/8/2026
🔗 http://arxiv.org/abs/2604.07506v1
👥 Authors: Kai Qin, Liangxin Liu, Yu Liang, Longzheng Wang, Yan Wang (possible past Tencent (China) affiliation), Yueyang Zhang (possible past Baidu (China) affiliation), Long Xia, Zhiyuan Sun, Houde Liu, Daiting Shi
Abstract

Reward Models (RMs) are critical components in the Reinforcement Learning from Human Feedback (RLHF) pipeline, directly determining the alignment quality of Large Language Models (LLMs). Recently, Generative Reward Models (GRMs) have emerged as a superior paradigm, offering higher interpretability and stronger generalization than traditional scalar RMs. However, existing methods for GRMs focus primarily on outcome-level supervision, neglecting analytical process quality, which constrains their p...

📄 Enabling Intrinsic Reasoning over Dense Geospatial Embeddings with DFR-Gemma
🗓️ Published: 4/8/2026
🔗 http://arxiv.org/abs/2604.07490v1
👥 Authors: Xuechen Zhang, Aviv Slobodkin, Joydeep Paul, Mandar Sharma, Samet Oymak, Shravya Shetty (possible past Google (United States) affiliation), Gautam Prasad (possible past Google (United States) affiliation)
Abstract

Representation learning for geospatial and spatio-temporal data plays a critical role in enabling general-purpose geospatial intelligence. Recent geospatial foundation models, such as the Population Dynamics Foundation Model (PDFM), encode complex population and mobility dynamics into compact embeddings. However, their integration with Large Language Models (LLMs) remains limited. Existing approaches to LLM integration treat these embeddings as retrieval indices or convert them into textual desc...

📄 ConsistRM: Improving Generative Reward Models via Consistency-Aware Self-Training
🗓️ Published: 4/8/2026
🔗 http://arxiv.org/abs/2604.07484v1
👥 Authors: Yu Liang, Liangxin Liu, Longzheng Wang, Yan Wang (possible past Tencent (China) affiliation), Yueyang Zhang (possible past Baidu (China) affiliation), Long Xia, Zhiyuan Sun, Daiting Shi
Abstract

Generative reward models (GRMs) have emerged as a promising approach for aligning Large Language Models (LLMs) with human preferences by offering greater representational capacity and flexibility than traditional scalar reward models. However, GRMs face two major challenges: reliance on costly human-annotated data restricts scalability, and self-training approaches often suffer from instability and vulnerability to reward hacking. To address these issues, we propose ConsistRM, a self-training fr...

📄 CMP: Robust Whole-Body Tracking for Loco-Manipulation via Competence Manifold Projection
🗓️ Published: 4/8/2026
🔗 http://arxiv.org/abs/2604.07457v1
👥 Authors: Ziyang Cheng, Haoyu Wei, Hang Yin, Xiuwei Xu, Bingyao Yu, Jie Zhou (possible past Tsinghua University affiliation), Jiwen Lu (possible past Tsinghua University affiliation)
Abstract

While decoupled control schemes for legged mobile manipulators have shown robustness, learning holistic whole-body control policies for tracking global end-effector poses remains fragile against Out-of-Distribution (OOD) inputs induced by sensor noise or infeasible user commands. To improve robustness against these perturbations without sacrificing task performance and continuity, we propose Competence Manifold Projection (CMP). Specifically, we utilize a Frame-Wise Safety Scheme that transforms...

📄 MoRight: Motion Control Done Right
🗓️ Published: 4/8/2026
🔗 http://arxiv.org/abs/2604.07348v1
👥 Authors: Shaowei Liu, Xuanchi Ren, Tianchang Shen, Huan Ling, Saurabh Gupta (possible past University Of California, Berkeley affiliation), Shenlong Wang (possible past University Of Toronto affiliation), Sanja Fidler (possible past University Of Toronto affiliation), Jun Gao (possible past Nvidia (United States) affiliation)
Abstract

Generating motion-controlled videos, where user-specified actions drive physically plausible scene dynamics under freely chosen viewpoints, demands two capabilities: (1) disentangled motion control, allowing users to separately control the object motion and adjust camera viewpoint; and (2) motion causality, ensuring that user-driven actions trigger coherent reactions from other objects rather than merely displacing pixels. Existing methods fall short on both fronts: they entangle camera and obje...

📄 GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents
🗓️ Published: 4/8/2026
🔗 http://arxiv.org/abs/2604.07429v1
👥 Authors: Mingyu Ouyang, Siyuan Hu, Kevin Qinghong Lin (possible past National University Of Singapore affiliation), Hwee Tou Ng, Mike Zheng Shou (possible past National University Of Singapore affiliation)
Abstract

Towards an embodied generalist for real-world interaction, Multimodal Large Language Model (MLLM) agents still face challenges from latency, sparse feedback, and irreversible mistakes. Video games offer an ideal testbed with rich visual observations and closed-loop interaction, demanding fine-grained perception, long-horizon planning, and precise control. However, systematically evaluating these capabilities is currently hindered by heterogeneous action interfaces and heuristic verification. ...

📄 Bit-by-Bit: Progressive QAT Strategy with Outlier Channel Splitting for Stable Low-Bit LLMs
🗓️ Published: 4/9/2026
🔗 http://arxiv.org/abs/2604.07888v1
👥 Authors: Binxing Xu, Hao Gu, Lujun Li, Hao Wang (possible past Tsinghua University affiliation), Bei Liu, Jiacheng Liu, Qiyuan Zhu, Xintong Yang, Chao Li (possible past Baidu (China) affiliation), Sirui Han, Yike Guo
Abstract

Training LLMs at ultra-low precision remains a formidable challenge. Direct low-bit QAT often suffers from convergence instability and substantial training costs, exacerbated by quantization noise from heavy-tailed outlier channels and error accumulation across layers. To address these issues, we present Bit-by-Bit, a progressive QAT framework with outlier channel splitting. Our approach integrates three key components: (1) block-wise progressive training that reduces precision stage by stage, e...
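Outlier channel splitting, one of the components named above, has a standard intuition that a short sketch can show (my own toy code under simplifying assumptions, not the paper's implementation): split the largest-magnitude input channel into two half-scale copies and duplicate the matching input entry, which halves the quantization range while leaving the layer's output unchanged.

```python
import numpy as np


def split_outlier_channels(W, x, k=1):
    """Toy outlier channel splitting for a linear layer y = W @ x.
    The k input channels with the largest weight magnitude are duplicated
    and halved, so the quantization range shrinks but W @ x is preserved."""
    idx = np.argsort(np.max(np.abs(W), axis=0))[-k:]   # outlier columns
    W2 = np.concatenate([W, W[:, idx] / 2], axis=1)    # append half-copies
    W2[:, idx] /= 2                                    # halve the originals
    x2 = np.concatenate([x, x[idx]])                   # duplicate inputs
    return W2, x2


# usage: column 1 holds the outlier weights (+-8)
W = np.array([[1.0, 8.0], [0.5, -8.0]])
x = np.array([2.0, 1.0])
W2, x2 = split_outlier_channels(W, x, k=1)
assert np.allclose(W @ x, W2 @ x2)  # output unchanged, range halved
```

With the maximum weight magnitude halved, a uniform quantizer spends its levels on a tighter range, reducing the heavy-tailed quantization noise the abstract identifies as a source of QAT instability.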

📄 FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling
🗓️ Published: 4/8/2026
🔗 http://arxiv.org/abs/2604.06916v1
👥 Authors: Yitong Li, Junsong Chen, Shuchen Xue, Pengcuo Zeren, Siyuan Fu, Dinghao Yang, Yangyang Tang, Junjie Bai, Ping Luo (possible past Shanghai Artificial Intelligence Laboratory affiliation), Song Han (possible past Stanford University affiliation), Enze Xie
Abstract

Reinforcement-Learning-based post-training has recently emerged as a promising paradigm for aligning text-to-image diffusion models with human preferences. In recent studies, increasing the rollout group size yields pronounced performance improvements, indicating substantial room for further alignment gains. However, scaling rollouts on large-scale foundational diffusion models (e.g., FLUX.1-12B) imposes a heavy computational burden. To alleviate this bottleneck, we explore the integration of FP...

*Notable papers are those with at least two authors from a "big" AI/ML lab.