📄 Notable* Recent AI/ML arXiv Papers

Last updated just now...

📄 Scaling Verification Can Be More Effective than Scaling Policy Learning for Vision-Language-Action Alignment
🗓️ Published: 2/12/2026
🔗 http://arxiv.org/abs/2602.12281v1
👥 Authors: Jacky Kwok, Xilun Zhang, Mengdi Xu, Yuejiang Liu, Azalia Mirhoseini (possible past Google (United States) affiliation), Chelsea Finn (possible past University Of California, Berkeley affiliation), Marco Pavone (possible past Stanford University affiliation)
Abstract

The long-standing vision of general-purpose robots hinges on their ability to understand and act upon natural language instructions. Vision-Language-Action (VLA) models have made remarkable progress toward this goal, yet their generated actions can still misalign with the given instructions. In this paper, we investigate test-time verification as a means to shrink the "intention-action gap." We first characterize the test-time scaling law for embodied instruction following and demonstrate that ...
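As context for what test-time verification typically looks like in practice, here is a minimal best-of-N sketch: draw several candidate actions from the policy and execute the one a learned verifier scores highest. The callables `policy_sample` and `verifier_score` are hypothetical stand-ins, not the paper's interfaces.

```python
def best_of_n_action(policy_sample, verifier_score, observation, instruction, n=8):
    """Generic best-of-N action selection with a verifier (illustrative only):
    sample n candidate actions from a VLA policy and keep the highest-scoring one."""
    candidates = [policy_sample(observation, instruction) for _ in range(n)]
    return max(candidates, key=lambda a: verifier_score(observation, instruction, a))
```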

📄 Agentic Test-Time Scaling for WebAgents
🗓️ Published: 2/12/2026
🔗 http://arxiv.org/abs/2602.12276v1
👥 Authors: Nicholas Lee, Lutfi Eren Erdogan, Chris Joseph John, Surya Krishnapillai, Michael W. Mahoney (possible past Stanford University affiliation), Kurt Keutzer (possible past University Of California, Berkeley affiliation), Amir Gholami
Abstract

Test-time scaling has become a standard way to improve performance and boost reliability of neural network models. However, its behavior on agentic, multi-step tasks remains less well-understood: small per-step errors can compound over long horizons; and we find that naive policies that uniformly increase sampling show diminishing returns. In this work, we present CATTS, a simple technique for dynamically allocating compute for multi-step agents. We first conduct an empirical study of inference-...
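To make "dynamically allocating compute for multi-step agents" concrete, here is an illustrative per-step allocation rule, not the CATTS algorithm itself: sample a few candidate actions and keep sampling only while they disagree, up to a cap. The `propose` callable is a hypothetical single-sample action proposer.

```python
from collections import Counter

def adaptive_step_sample(propose, state, min_k=2, max_k=8, agree_thresh=0.6):
    """Spend extra samples only on steps where candidates disagree (illustrative)."""
    candidates = [propose(state) for _ in range(min_k)]
    while len(candidates) < max_k:
        action, count = Counter(candidates).most_common(1)[0]
        if count / len(candidates) >= agree_thresh:
            break  # candidates largely agree; stop spending compute on this step
        candidates.append(propose(state))
    return Counter(candidates).most_common(1)[0][0]
```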

📄 Olmix: A Framework for Data Mixing Throughout LM Development
🗓️ Published: 2/12/2026
🔗 http://arxiv.org/abs/2602.12237v1
👥 Authors: Mayee F. Chen, Tyler Murray, David Heineman, Matt Jordan, Hannaneh Hajishirzi (possible past University Of Washington affiliation), Christopher Ré (possible past Stanford University affiliation), Luca Soldaini, Kyle Lo
Abstract

Data mixing -- determining the ratios of data from different domains -- is a first-order concern for training language models (LMs). While existing mixing methods show promise, they fall short when applied during real-world LM development. We present Olmix, a framework that addresses two such challenges. First, the configuration space for developing a mixing method is not well understood -- design choices across existing methods lack justification or consensus and overlook practical issues like ...
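For readers new to the term, "data mixing" just means choosing per-domain sampling ratios for the training stream; the toy sketch below uses made-up ratios purely for illustration (choosing them well is what Olmix addresses).

```python
import random

# Invented domain ratios, for illustration only.
mixture = {"web": 0.60, "code": 0.25, "academic": 0.15}

def sample_domain(mixture):
    """Draw a training domain according to the mixing ratios."""
    r, cumulative = random.random(), 0.0
    for domain, weight in mixture.items():
        cumulative += weight
        if r < cumulative:
            return domain
    return domain  # guard against floating-point rounding
```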

📄 STAR: Bridging Statistical and Agentic Reasoning for Large Model Performance Prediction
🗓️ Published: 2/12/2026
🔗 http://arxiv.org/abs/2602.12143v1
👥 Authors: Xiaoxiao Wang, Chunxiao Li, Junying Wang, Yijin Guo, Zijian Chen, Chunyi Li, Xiaohong Liu (possible past Shanghai Jiao Tong University affiliation), Zicheng Zhang, Guangtao Zhai (possible past Shanghai Jiao Tong University affiliation)
Abstract

As comprehensive large model evaluation becomes prohibitively expensive, predicting model performance from limited observations has become essential. However, existing statistical methods struggle with pattern shifts, data sparsity, and lack of explanation, while pure LLM methods remain unreliable. We propose STAR, a framework that bridges data-driven STatistical expectations with knowledge-driven Agentic Reasoning. STAR leverages specialized retrievers to gather external knowledge and embeds se...

📄 Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation
🗓️ Published: 2/12/2026
🔗 http://arxiv.org/abs/2602.12125v1
👥 Authors: Wenkai Yang, Weijie Liu (possible past Tencent (China) affiliation), Ruobing Xie (possible past Tencent (China) affiliation), Kai Yang, Saiyong Yang, Yankai Lin (possible past Tsinghua University affiliation)
Abstract

On-policy distillation (OPD), which aligns the student with the teacher's logit distribution on student-generated trajectories, has demonstrated strong empirical gains in improving student performance and often outperforms off-policy distillation and reinforcement learning (RL) paradigms. In this work, we first theoretically show that OPD is a special case of dense KL-constrained RL where the reward function and the KL regularization are always weighted equally and the reference model can be any...
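For reference, the two objectives being related can be written as follows, in generic notation not taken from the paper: on-policy distillation minimizes a token-level reverse KL to the teacher on the student's own rollouts, while KL-constrained RL maximizes a reward minus a KL penalty to a reference policy.

```latex
\mathcal{L}_{\mathrm{OPD}}(\theta)
  = \mathbb{E}_{y \sim \pi_\theta(\cdot \mid x)}
    \Big[ \textstyle\sum_t \mathrm{KL}\big(\pi_\theta(\cdot \mid x, y_{<t}) \,\|\, \pi_{\mathrm{teacher}}(\cdot \mid x, y_{<t})\big) \Big]

\mathcal{J}_{\mathrm{KL\text{-}RL}}(\theta)
  = \mathbb{E}_{y \sim \pi_\theta(\cdot \mid x)}\big[ r(x, y) \big]
    - \beta \, \mathrm{KL}\big(\pi_\theta \,\|\, \pi_{\mathrm{ref}}\big)
```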

📄 Stop Unnecessary Reflection: Training LRMs for Efficient Reasoning with Adaptive Reflection and Length Coordinated Penalty
🗓️ Published: 2/12/2026
🔗 http://arxiv.org/abs/2602.12113v1
👥 Authors: Zewei Yu, Lirong Gao, Yuke Zhu (possible past Stanford University affiliation), Bo Zheng, Sheng Guo (possible past Google (United States) affiliation), Haobo Wang, Junbo Zhao
Abstract

Large Reasoning Models (LRMs) have demonstrated remarkable performance on complex reasoning tasks by employing test-time scaling. However, they often generate over-long chains-of-thought that, driven by substantial reflections such as repetitive self-questioning and circular reasoning, lead to high token consumption, substantial computational overhead, and increased latency without improving accuracy, particularly in smaller models. Our observation reveals that increasing problem complexity indu...

📄 DeepSight: An All-in-One LM Safety Toolkit
🗓️ Published: 2/12/2026
🔗 http://arxiv.org/abs/2602.12092v1
👥 Authors: Bo Zhang (possible past Tencent (China) affiliation), Jiaxuan Guo, Lijun Li, Dongrui Liu, Sujin Chen, Guanxu Chen, Zhijie Zheng, Qihao Lin, Lewen Yan, Chen Qian (possible past Shanghai Jiao Tong University affiliation), Yijin Zhou, Yuyao Wu, Shaoxiong Guo, Tianyi Du, Jingyi Yang, Xuhao Hu, Ziqi Miao, Xiaoya Lu, Jing Shao, Xia Hu
Abstract

As the development of Large Models (LMs) progresses rapidly, their safety has become a pressing priority. In current Large Language Model (LLM) and Multimodal Large Language Model (MLLM) safety workflows, evaluation, diagnosis, and alignment are often handled by separate tools. Specifically, safety evaluation can only locate external behavioral risks but cannot identify internal root causes. Meanwhile, safety diagnosis often drifts from concrete risk scenarios and remains at the level of explanation. In t...

📄 CSEval: A Framework for Evaluating Clinical Semantics in Text-to-Image Generation
🗓️ Published: 2/12/2026
🔗 http://arxiv.org/abs/2602.12004v1
👥 Authors: Robert Cronshaw, Konstantinos Vilouras, Junyu Yan, Yuning Du (possible past Baidu (China) affiliation), Feng Chen, Steven McDonagh, Sotirios A. Tsaftaris (possible past University Of Edinburgh affiliation)
Abstract

Text-to-image generation has been increasingly applied in medical domains for various purposes such as data augmentation and education. Evaluating the quality and clinical reliability of these generated images is essential. However, existing methods mainly assess image realism or diversity, while failing to capture whether the generated images reflect the intended clinical semantics, such as anatomical location and pathology. In this study, we propose the Clinical Semantics Evaluator (CSEval), a...

📄 Towards Fair and Comprehensive Evaluation of Routers in Collaborative LLM Systems
🗓️ Published: 2/12/2026
🔗 http://arxiv.org/abs/2602.11877v1
👥 Authors: Wanxing Wu, He Zhu, Yixia Li, Lei Yang (possible past Google (United States) affiliation), Jiehui Zhao, Hongru Wang, Jian Yang, Benyou Wang (possible past Tencent (China) affiliation), Bingyi Jing, Guanhua Chen
Abstract

Large language models (LLMs) have achieved success, but cost and privacy constraints necessitate deploying smaller models locally while offloading complex queries to cloud-based models. Existing router evaluations are unsystematic, overlooking scenario-specific requirements and out-of-distribution robustness. We propose RouterXBench, a principled evaluation framework with three dimensions: router ability, scenario alignment, and cross-domain robustness. Unlike prior work that relies on output pr...
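The routers under evaluation decide where each query runs; a minimal confidence-threshold router of the kind such benchmarks stress-test looks like the sketch below (all three callables are hypothetical placeholders, not part of RouterXBench).

```python
def route(query, local_model, cloud_model, confidence, threshold=0.7):
    """Answer with the small local model when a confidence estimate clears the
    threshold; otherwise escalate to the larger cloud model (illustrative only)."""
    draft = local_model(query)
    if confidence(query, draft) >= threshold:
        return draft, "local"
    return cloud_model(query), "cloud"
```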

📄 Intelligent AI Delegation
🗓️ Published: 2/12/2026
🔗 http://arxiv.org/abs/2602.11865v1
👥 Authors: Nenad Tomašev (possible past Google (United States) affiliation), Matija Franklin, Simon Osindero (possible past Google (United States) affiliation)
Abstract

AI agents are able to tackle increasingly complex tasks. To achieve more ambitious goals, AI agents need to be able to meaningfully decompose problems into manageable sub-components, and safely delegate their completion to other AI agents and humans alike. Yet, existing task decomposition and delegation methods rely on simple heuristics, and are not able to dynamically adapt to environmental changes and robustly handle unexpected failures. Here we propose an adaptive framework for intelli...

📄 MiniCPM-SALA: Hybridizing Sparse and Linear Attention for Efficient Long-Context Modeling
🗓️ Published: 2/12/2026
🔗 http://arxiv.org/abs/2602.11761v1
👥 Authors: MiniCPM Team, Wenhao An, Yingfa Chen, Yewei Fang, Jiayi Li, Xin Li (possible past Google (United States) affiliation), Yaohui Li, Yishan Li, Yuxuan Li, Biyuan Lin, Chuan Liu, Hezi Liu, Siyuan Liu, Hongya Lyu, Yinxu Pan, Shixin Ren, Xingyu Shen, Zhou Su, Haojun Sun, Yangang Sun, Zhen Leng Thai, Xin Tian, Rui Wang (possible past Tencent (China) affiliation), Xiaorong Wang, Yudong Wang, Bo Wu (possible past Tencent (China) affiliation), Xiaoyue Xu, Dong Xu, Shuaikang Xue, Jiawei Yang, Bowen Zhang, Jinqian Zhang, Letian Zhang, Shengnan Zhang, Xinyu Zhang (possible past Baidu (China) affiliation), Xinyuan Zhang, Zhu Zhang, Hengyu Zhao, Jiacheng Zhao, Jie Zhou (possible past Tsinghua University affiliation), Zihan Zhou, Shuo Wang (possible past Nvidia (United States) affiliation), Chaojun Xiao, Xu Han (possible past Tsinghua University affiliation), Zhiyuan Liu (possible past Tsinghua University affiliation), Maosong Sun (possible past Tsinghua University affiliation)
Abstract

The evolution of large language models (LLMs) towards applications with ultra-long contexts faces challenges posed by the high computational and memory costs of the Transformer architecture. While existing sparse and linear attention mechanisms attempt to mitigate these issues, they typically involve a trade-off between memory efficiency and model performance. This paper introduces MiniCPM-SALA, a 9B-parameter hybrid architecture that integrates the high-fidelity long-context modeling of sparse ...

📄 Beyond Parameter Arithmetic: Sparse Complementary Fusion for Distribution-Aware Model Merging
🗓️ Published: 2/12/2026
🔗 http://arxiv.org/abs/2602.11717v1
👥 Authors: Weihong Lin (possible past Peking University affiliation), Lin Sun, Qilong Shi, Aomufei Yuan, Yuxuan Tian, Zhengyang Wang, Guangxiang Zhao, Xiangzheng Zhang, Tong Yang (possible past Peking University affiliation)
Abstract

Model merging has emerged as a promising paradigm for composing the capabilities of large language models by directly operating in weight space, enabling the integration of specialized models without costly retraining. However, existing merging methods largely rely on parameter-space heuristics, which often introduce severe interference, leading to degraded generalization and unstable generation behaviors such as repetition and incoherent outputs. In this work, we propose Sparse Complementary Fu...
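As a point of reference for the "parameter arithmetic" the title pushes beyond, the common task-arithmetic baseline merges models by adding scaled weight deltas to a shared base; the sketch below shows that baseline, not the proposed Sparse Complementary Fusion method.

```python
import torch

def task_arithmetic_merge(base_state, expert_states, alpha=0.5):
    """Baseline weight-space merging: add each expert's scaled delta from the
    shared base model. All state dicts must come from the same architecture."""
    merged = {name: tensor.clone() for name, tensor in base_state.items()}
    for expert in expert_states:
        for name in merged:
            merged[name] += alpha * (expert[name] - base_state[name])
    return merged
```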

📄 DRACO: a Cross-Domain Benchmark for Deep Research Accuracy, Completeness, and Objectivity
🗓️ Published: 2/12/2026
🔗 http://arxiv.org/abs/2602.11685v1
👥 Authors: Joey Zhong, Hao Zhang (possible past Tencent (China) affiliation), Clare Southern, Jeremy Yang, Thomas Wang, Kate Jung, Shu Zhang (possible past Google (United States) affiliation), Denis Yarats (possible past Meta (United States) affiliation), Johnny Ho, Jerry Ma
Abstract

We present DRACO (Deep Research Accuracy, Completeness, and Objectivity), a benchmark of complex deep research tasks. These tasks, which span 10 domains and draw on information sources from 40 countries, originate from anonymized real-world usage patterns within a large-scale deep research system. Tasks are sampled from a de-identified dataset of Perplexity Deep Research requests, then filtered and augmented to ensure that the tasks are anonymized, open-ended and complex, objectively evaluable, ...

📄 ThinkRouter: Efficient Reasoning via Routing Thinking between Latent and Discrete Spaces
🗓️ Published: 2/12/2026
🔗 http://arxiv.org/abs/2602.11683v1
👥 Authors: Xin Xu, Tong Yu (possible past Carnegie Mellon University affiliation), Xiang Chen (possible past Tencent (China) affiliation), Haoliang Wang, Julian McAuley, Saayan Mitra
Abstract

Recent work explores latent reasoning to improve reasoning efficiency by replacing explicit reasoning trajectories with continuous representations in a latent space, yet its effectiveness varies across settings. Analysis of model confidence dynamics under latent reasoning reveals that thinking trajectories ending in incorrect answers contain fewer low-confidence steps than those ending in correct answers. Meanwhile, we suggest that soft embeddings aggregated by multiple low-confidence thinking a...

📄 PLOT-CT: Pre-log Voronoi Decomposition Assisted Generation for Low-dose CT Reconstruction
🗓️ Published: 2/12/2026
🔗 http://arxiv.org/abs/2602.11625v1
👥 Authors: Bin Huang, Xun Yu, Yikun Zhang, Yi Zhang (possible past Google (United States) affiliation), Yang Chen (possible past Tencent (China) affiliation), Qiegen Liu
Abstract

Low-dose computed tomography (LDCT) reconstruction is fundamentally challenged by severe noise and compromised data fidelity under reduced radiation exposure. Most existing methods operate either in the image or post-log projection domain, which fails to fully exploit the rich structural information in pre-log measurements while being highly susceptible to noise. The requisite logarithmic transformation critically amplifies noise within these data, imposing exceptional demands on reconstruction ...
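The noise amplification mentioned here is standard CT detection physics rather than a result of the paper: the post-log projection is a logarithm of the pre-log photon counts, and first-order error propagation shows its variance grows as the expected count (i.e., the dose) shrinks.

```latex
p = -\ln\!\left(\frac{I}{I_0}\right), \qquad I \sim \mathrm{Poisson}(\bar{I}),
\qquad
\operatorname{Var}(p) \approx \left|\frac{\partial p}{\partial I}\right|^{2} \operatorname{Var}(I)
  = \frac{\bar{I}}{\bar{I}^{\,2}} = \frac{1}{\bar{I}}
```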

📄 ABot-N0: Technical Report on the VLA Foundation Model for Versatile Embodied Navigation
🗓️ Published: 2/12/2026
🔗 http://arxiv.org/abs/2602.11598v1
👥 Authors: Zedong Chu, Shichao Xie, Xiaolong Wu, Yanfen Shen, Minghua Luo, Zhengbo Wang, Fei Liu, Xiaoxu Leng, Junjun Hu, Mingyang Yin, Jia Lu, Yingnan Guo, Kai Yang, Jiawei Han (possible past Google (United States) affiliation), Xu Chen (possible past Tencent (China) affiliation), Yanqing Zhu, Yuxiang Zhao, Xin Liu, Yirong Yang, Ye He, Jiahang Wang, Yang Cai, Tianlin Zhang, Li Gao, Liu Liu, Mingchao Sun, Fan Jiang (possible past Shanghai Jiao Tong University affiliation), Chiyu Wang, Zhicheng Liu, Hongyu Pan, Honglin Han, Zhining Gu, Kuan Yang, Jianfang Zhang, Di Jing, Zihao Guan, Wei Guo, Guoqing Liu, Di Yang, Xiangpo Yang, Menglin Yang, Hongguang Xing, Weiguo Li, Mu Xu
Abstract

Embodied navigation has long been fragmented by task-specific architectures. We introduce ABot-N0, a unified Vision-Language-Action (VLA) foundation model that achieves a "Grand Unification" across 5 core tasks: Point-Goal, Object-Goal, Instruction-Following, POI-Goal, and Person-Following. ABot-N0 utilizes a hierarchical "Brain-Action" architecture, pairing an LLM-based Cognitive Brain for semantic reasoning with a Flow Matching-based Action Expert for precise, continuous trajectory generat...

📄 Adaptive Milestone Reward for GUI Agents
🗓️ Published: 2/12/2026
🔗 http://arxiv.org/abs/2602.11524v1
👥 Authors: Congmin Zheng, Xiaoyun Mo, Xinbei Ma, Qiqiang Lin, Yin Zhao, Jiachen Zhu, Xingyu Lou, Jun Wang (possible past Tencent (China) affiliation), Zhaoxiang Wang, Weiwen Liu, Zhuosheng Zhang, Yong Yu (possible past Shanghai Jiao Tong University affiliation), Weinan Zhang (possible past Shanghai Jiao Tong University affiliation)
Abstract

Reinforcement Learning (RL) has emerged as a mainstream paradigm for training Mobile GUI Agents, yet it struggles with the temporal credit assignment problem inherent in long-horizon tasks. A primary challenge lies in the trade-off between reward fidelity and density: outcome reward offers high fidelity but suffers from signal sparsity, while process reward provides dense supervision but remains prone to bias and reward hacking. To resolve this conflict, we propose the Adaptive Milestone Reward ...

📄 AgentNoiseBench: Benchmarking Robustness of Tool-Using LLM Agents Under Noisy Condition
🗓️ Published: 2/11/2026
🔗 http://arxiv.org/abs/2602.11348v1
👥 Authors: Ruipeng Wang, Yuxin Chen, Yukai Wang, Chang Wu, Junfeng Fang, Xiaodong Cai, Qi Gu, Hui Su (possible past Tencent (China) affiliation), An Zhang, Xiang Wang (possible past Tencent (China) affiliation), Xunliang Cai, Tat-Seng Chua
Abstract

Recent advances in large language models have enabled LLM-based agents to achieve strong performance on a variety of benchmarks. However, their performance in real-world deployments often falls short of that observed in benchmark settings, especially in complex and imperfect environments. This discrepancy largely arises because prevailing training and evaluation paradigms are typically built on idealized assumptions, overlooking the inherent stochasticity and noise present in real-world interactions. To bridge...

📄 MolmoSpaces: A Large-Scale Open Ecosystem for Robot Navigation and Manipulation
🗓️ Published: 2/11/2026
🔗 http://arxiv.org/abs/2602.11337v1
👥 Authors: Yejin Kim, Wilbert Pumacay, Omar Rayyan, Max Argus, Winson Han, Eli Vanderbilt, Jordi Salvador, Abhay Deshpande, Rose Hendrix, Snehal Jauhri, Shuo Liu, Nur Muhammad Mahi Shafiullah, Maya Guru, Ainaz Eftekhar, Karen Farley, Donovan Clay, Jiafei Duan, Arjun Guru, Piper Wolters, Alvaro Herrasti, Ying-Chun Lee, Georgia Chalvatzaki, Yuchen Cui, Ali Farhadi (possible past University Of Washington affiliation), Dieter Fox (possible past University Of Washington affiliation), Ranjay Krishna (possible past University Of Washington affiliation)
Abstract

Deploying robots at scale demands robustness to the long tail of everyday situations. The countless variations in scene layout, object geometry, and task specifications that characterize real environments are vast and underrepresented in existing robot benchmarks. Measuring this level of generalization requires infrastructure at a scale and diversity that physical evaluation alone cannot provide. We introduce MolmoSpaces, a fully open ecosystem to support large-scale benchmarking of robot polici...

📄 Voxtral Realtime
🗓️ Published: 2/11/2026
🔗 http://arxiv.org/abs/2602.11298v1
👥 Authors: Alexander H. Liu, Andy Ehrenberg, Andy Lo, Chen-Yo Sun, Guillaume Lample, Jean-Malo Delignon, Khyathi Raghavi Chandu (possible past Carnegie Mellon University affiliation), Patrick von Platen, Pavankumar Reddy Muddireddy, Rohin Arora, Sanchit Gandhi, Sandeep Subramanian (possible past Carnegie Mellon University affiliation), Soham Ghosh, Srijan Mishra, Abhinav Rastogi (possible past Google (United States) affiliation), Alan Jeffares, Albert Jiang, Alexandre Sablayrolles, Amélie Héliou, Andrew Bai, Angele Lenglemetz, Anmol Agarwal, Anton Eliseev, Antonia Calvi, Arjun Majumdar, Baptiste Bout, Baptiste Rozière, Baudouin De Monicault, Benjamin Tibi, Clémence Lanfranchi, Connor Chen, Corentin Barreau, Corentin Sautier, Cyprien Courtot, Darius Dabert, Diego de las Casas (possible past Deepmind (United Kingdom) affiliation), Elliot Chane-Sane, Enguerrand Paquin, Faruk Ahmed, Federico Baldassarre, Gabrielle Berrada, Gaëtan Ecrepont, Gauthier Guinet, Genevieve Hayes, Georgii Novikov, Giada Pistilli, Guillaume Martin, Gunjan Dhanuka, Gunshi Gupta, Han Zhou, Indraneel Mukherjee (possible past Google (United States) affiliation), Irene Zhang, Jaeyoung Kim, Jan Ludziejewski, Jason Rute, Joachim Studnia, John Harvill, Jonas Amar, Josselin Somerville Roberts, Julien Tauran, Karmesh Yadav, Kartik Khandelwal, Kush Jain, Laurence Aitchison, Léonard Blier, Lingxiao Zhao, Louis Martin, Lucile Saulnier, Luyu Gao, Maarten Buyl, Manan Sharma, Margaret Jennings, Marie Pellat, Mark Prins, Mathieu Poirée, Mathilde Guillaumin, Matthieu Dinot, Matthieu Futeral, Maxime Darrin, Maximilian Augustin, Mert Unsal, Mia Chiquier, Nathan Grinsztajn, Neha Gupta, Olivier Bousquet (possible past Google (United States) affiliation), Olivier Duchenne, Patricia Wang, Paul Jacob, Paul Wambergue, Paula Kurylowicz, Philomène Chagniot, Pierre Stock, Piotr Miłoś, Prateek Gupta, Pravesh Agrawal, Quentin Torroba, Ram Ramrakhya, Rishi Shah, Romain Sauvestre, Roman Soletskyi, Rosalie Millner, Sagar Vaze, Samuel Humeau, Siddharth Gandhi, Sumukh Aithal, Szymon Antoniak, Teven Le Scao, Théo Cachet, Theo Simon Sorg, Thibaut Lavril (possible past Meta (United States) affiliation), Thomas Chabal, Thomas Foubert, Thomas Robert, Thomas Wang, Tim Lawson, Tom Bewley, Tom Edwards, Tyler Wang, Valeriia Nemychnikova, Van Phung, Vedant Nanda, Victor Jouault, Virgile Richard, Vladislav Bataev, Wassim Bouaziz, Wen-Ding Li, William Marshall, Xinghui Li, Xingran Guo, Xinyu Yang, Yannic Neuhaus, Yihan Wang, Zaccharie Ramzi, Zhenlin Xu
Abstract

We introduce Voxtral Realtime, a natively streaming automatic speech recognition model that matches offline transcription quality at sub-second latency. Unlike approaches that adapt offline models through chunking or sliding windows, Voxtral Realtime is trained end-to-end for streaming, with explicit alignment between audio and text streams. Our architecture builds on the Delayed Streams Modeling framework, introducing a new causal audio encoder and Ada RMS-Norm for improved delay conditioning. ...

📄 HiFloat4 Format for Language Model Inference
🗓️ Published: 2/11/2026
🔗 http://arxiv.org/abs/2602.11287v1
👥 Authors: Yuanyong Luo, Jing Huang (possible past Meta (United States) affiliation), Yu Cheng (possible past National University Of Singapore affiliation), Ziwei Yu, Kaihua Zhang, Kehong Hong, Xinda Ma, Xin Wang (possible past University Of Edinburgh affiliation), Anping Tong, Guipeng Hu, Yun Xu, Mehran Taghian, Peng Wu, Guanglin Li, Yunke Peng, Tianchi Hu, Minqi Chen, Michael Bi Mi, Hu Liu, Xiping Zhou, Junsong Wang, Qiang Lin, Heng Liao
Abstract

This paper introduces HiFloat4 (HiF4), a block floating-point data format tailored for deep learning. Each HiF4 unit packs 64 4-bit elements with 32 bits of shared scaling metadata, averaging 4.5 bits per value. The metadata specifies a three-level scaling hierarchy, capturing inter- and intra-group dynamic range while improving the utilization of the representational space. In addition, the large 64-element group size enables matrix multiplications to be executed in a highly fixed-point manner,...
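The quoted storage figure follows directly from the numbers stated in the abstract; the arithmetic is just:

```python
elements_per_unit = 64   # 4-bit elements packed in each HiF4 unit
element_bits = 4
metadata_bits = 32       # shared scaling metadata per unit

total_bits = elements_per_unit * element_bits + metadata_bits  # 288 bits per unit
bits_per_value = total_bits / elements_per_unit                # 4.5 bits per value
print(bits_per_value)
```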

📄 Beyond VLM-Based Rewards: Diffusion-Native Latent Reward Modeling
🗓️ Published: 2/11/2026
🔗 http://arxiv.org/abs/2602.11146v1
👥 Authors: Gongye Liu, Bo Yang (possible past Tencent (China) affiliation), Yida Zhi, Zhizhou Zhong, Lei Ke (possible past Tencent (China) affiliation), Didan Deng, Han Gao (possible past Tencent (China) affiliation), Yongxiang Huang, Kaihao Zhang (possible past Tencent (China) affiliation), Hongbo Fu, Wenhan Luo (possible past Tencent (China) affiliation)
Abstract

Preference optimization for diffusion and flow-matching models relies on reward functions that are both discriminatively robust and computationally efficient. Vision-Language Models (VLMs) have emerged as the primary reward provider, leveraging their rich multimodal priors to guide alignment. However, their computation and memory cost can be substantial, and optimizing a latent diffusion generator through a pixel-space reward introduces a domain mismatch that complicates alignment. In this paper...

📄 T3D: Few-Step Diffusion Language Models via Trajectory Self-Distillation with Direct Discriminative Optimization
🗓️ Published: 2/12/2026
🔗 http://arxiv.org/abs/2602.12262v1
👥 Authors: Tunyu Zhang, Xinxi Zhang, Ligong Han (possible past Google (United States) affiliation), Haizhou Shi, Xiaoxiao He, Zhuowei Li, Hao Wang (possible past Tsinghua University affiliation), Kai Xu (possible past National University Of Defense Technology affiliation), Akash Srivastava, Hao Wang (possible past Tsinghua University affiliation), Vladimir Pavlovic, Dimitris N. Metaxas
Abstract

Diffusion large language models (DLLMs) have the potential to enable fast text generation by decoding multiple tokens in parallel. However, in practice, their inference efficiency is constrained by the need for many refinement steps, while aggressively reducing the number of steps leads to a substantial degradation in generation quality. To alleviate this, we propose a trajectory self-distillation framework that improves few-step decoding by distilling the model's own generative trajectories. We...

📄 Moonshine v2: Ergodic Streaming Encoder ASR for Latency-Critical Speech Applications
🗓️ Published: 2/12/2026
🔗 http://arxiv.org/abs/2602.12241v1
👥 Authors: Manjunath Kudlur (possible past Nvidia (United States) affiliation), Evan King, James Wang, Pete Warden (possible past Google (United States) affiliation)
Abstract

Latency-critical speech applications (e.g., live transcription, voice commands, and real-time translation) demand low time-to-first-token (TTFT) and high transcription accuracy, particularly on resource-constrained edge devices. Full-attention Transformer encoders remain a strong accuracy baseline for automatic speech recognition (ASR) because every frame can directly attend to every other frame, which resolves otherwise locally ambiguous acoustics using distant lexical context. However, this gl...

📄 Extending Puzzle for Mixture-of-Experts Reasoning Models with Application to GPT-OSS Acceleration
🗓️ Published: 2/12/2026
🔗 http://arxiv.org/abs/2602.11937v1
👥 Authors: Akhiad Bercovich, Nir Ailon (possible past Google (United States) affiliation), Vladimir Anisimov, Tomer Asida, Nave Assaf, Mohammad Dabbah, Ido Galil, Amnon Geifman, Yonatan Geifman, Izhak Golan, Roi Koren, Itay Levy, Zach Moshe, Pavlo Molchanov (possible past Nvidia (United States) affiliation), Najeeb Nabwani, Mostofa Patwari, Omri Puny, Tomer Ronen, Itamar Schen, Elad Segal, Ido Shahaf, Oren Tropp, Ran Zilberstein, Ran El-Yaniv
Abstract

Reasoning-focused LLMs improve answer quality by generating longer reasoning traces, but the additional tokens dramatically increase serving cost, motivating inference optimization. We extend and apply Puzzle, a post-training neural architecture search (NAS) framework, to gpt-oss-120B to produce gpt-oss-puzzle-88B, a deployment-optimized derivative. Our approach combines heterogeneous MoE expert pruning, selective replacement of full-context attention with window attention, FP8 KV-cache quantiza...

📄 Echo: Towards Advanced Audio Comprehension via Audio-Interleaved Reasoning
🗓️ Published: 2/12/2026
🔗 http://arxiv.org/abs/2602.11909v1
👥 Authors: Daiqing Wu, Xuan Zhang (possible past Meta (United States) affiliation), Dongbao Yang, Jiashu Yao, Longfei Chen, Qingsong Liu, Sicheng Zhao (possible past University Of California, Berkeley affiliation), Can Ma, Yangyang Kang, Yu Zhou
Abstract

The maturation of Large Audio Language Models (LALMs) has raised growing expectations for them to comprehend complex audio much like humans. Current efforts primarily replicate text-based reasoning by contextualizing audio content through a one-time encoding, which introduces a critical information bottleneck. Drawing inspiration from human cognition, we propose audio-interleaved reasoning to break through this bottleneck. It treats audio as an active reasoning component, enabling sustained audi...

📄 Temporal Difference Learning with Constrained Initial Representations
🗓️ Published: 2/12/2026
🔗 http://arxiv.org/abs/2602.11800v1
👥 Authors: Jiafei Lyu, Jingwen Yang, Zhongjian Qiao, Runze Liu, Zeyuan Liu, Deheng Ye (possible past Tencent (China) affiliation), Zongqing Lu, Xiu Li (possible past Tsinghua University affiliation)
Abstract

Recently, there have been numerous attempts to enhance the sample efficiency of off-policy reinforcement learning (RL) agents when interacting with the environment, including architecture improvements and new algorithms. Despite these advances, they overlook the potential of directly constraining the initial representations of the input data, which can intuitively alleviate the distribution shift issue and stabilize training. In this paper, we introduce the Tanh function into the initial layer t...
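A literal reading of "introduce the Tanh function into the initial layer" is to bound the first layer's features in (-1, 1) before the rest of the network; a minimal PyTorch sketch under that assumption (layer sizes invented, not the paper's exact architecture) is:

```python
import torch
import torch.nn as nn

class TanhBoundedEncoder(nn.Module):
    """Squash the initial representation into (-1, 1) so downstream critic/actor
    layers see a bounded, stable input range. Illustrative only."""
    def __init__(self, obs_dim, hidden_dim=256):
        super().__init__()
        self.first = nn.Linear(obs_dim, hidden_dim)
        self.rest = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU())

    def forward(self, obs):
        z = torch.tanh(self.first(obs))  # constrained initial representation
        return self.rest(z)
```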

📄 LAER-MoE: Load-Adaptive Expert Re-layout for Efficient Mixture-of-Experts Training
🗓️ Published: 2/12/2026
🔗 http://arxiv.org/abs/2602.11686v1
👥 Authors: Xinyi Liu, Yujie Wang, Fangcheng Fu (possible past Peking University affiliation), Xuefeng Xiao, Huixia Li, Jiashi Li, Bin Cui (possible past Peking University affiliation)
Abstract

Expert parallelism is vital for effectively training Mixture-of-Experts (MoE) models, enabling different devices to host distinct experts, with each device processing different input data. However, during expert parallel training, dynamic routing results in significant load imbalance among experts: a handful of overloaded experts hinder overall iteration, emerging as a training bottleneck. In this paper, we introduce LAER-MoE, an efficient MoE training framework. The core of LAER-MoE is a nove...
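One simple way to picture "expert re-layout" is greedy re-assignment of experts to devices based on measured load, heaviest expert to the currently lightest device; the sketch below only illustrates that idea and is not LAER-MoE's actual algorithm.

```python
import heapq

def relayout_experts(expert_load, num_devices):
    """Greedy longest-processing-time placement: repeatedly put the heaviest
    remaining expert on the least-loaded device (illustrative only)."""
    devices = [(0.0, d, []) for d in range(num_devices)]
    heapq.heapify(devices)
    for expert, load in sorted(expert_load.items(), key=lambda kv: -kv[1]):
        total, dev, assigned = heapq.heappop(devices)
        assigned.append(expert)
        heapq.heappush(devices, (total + load, dev, assigned))
    return {dev: assigned for _, dev, assigned in devices}

# e.g. relayout_experts({"e0": 9.0, "e1": 5.0, "e2": 4.0, "e3": 1.0}, num_devices=2)
```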

📄 Learn from Your Mistakes: Self-Correcting Masked Diffusion Models
🗓️ Published: 2/12/2026
🔗 http://arxiv.org/abs/2602.11590v1
👥 Authors: Yair Schiff, Omer Belhasin, Roy Uziel, Guanghan Wang, Marianne Arriola, Gilad Turok, Michael Elad (possible past Technion – Israel Institute Of Technology affiliation), Volodymyr Kuleshov (possible past Stanford University affiliation)
Abstract

Masked diffusion models (MDMs) have emerged as a promising alternative to autoregressive models, enabling parallel token generation while achieving competitive performance. Despite these advantages, MDMs face a fundamental limitation: once tokens are unmasked, they remain fixed, leading to error accumulation and ultimately degrading sample quality. We address this by proposing a framework that trains a model to perform both unmasking and correction. By reusing outputs from the MDM denoising netw...

📄 Brain4FMs: A Benchmark of Foundation Models for Electrical Brain Signal
🗓️ Published: 2/12/2026
🔗 http://arxiv.org/abs/2602.11558v1
👥 Authors: Fanqi Shen, Enhong Yang, Jiahe Li, Junru Hong, Xiaoran Pan, Zhizhang Yuan, Meng Li (possible past Meta (United States) affiliation), Yang Yang (possible past Tencent (China) affiliation)
Abstract

Brain Foundation Models (BFMs) are transforming neuroscience by enabling scalable and transferable learning from neural signals, advancing both clinical diagnostics and cutting-edge neuroscience exploration. Their emergence is powered by large-scale clinical recordings, particularly electroencephalography (EEG) and intracranial EEG, which provide rich temporal and spatial representations of brain dynamics. However, despite their rapid proliferation, the field lacks a unified understanding of exi...

📄 CADET: Context-Conditioned Ads CTR Prediction With a Decoder-Only Transformer
🗓️ Published: 2/11/2026
🔗 http://arxiv.org/abs/2602.11410v1
👥 Authors: David Pardoe, Neil Daftary, Miro Furtado, Aditya Aiyer, Yu Wang (possible past Tsinghua University affiliation), Liuqing Li, Tao Song, Lars Hertel, Young Jin Yun, Senthil Radhakrishnan, Zhiwei Wang, Tommy Li, Khai Tran, Ananth Nagarajan, Ali Naqvi, Yue Zhang, Renpeng Fang, Avi Romascanu, Arjun Kulothungun, Deepak Kumar, Praneeth Boda, Fedor Borisyuk (possible past Meta (United States) affiliation), Ruoyan Wang
Abstract

Click-through rate (CTR) prediction is fundamental to online advertising systems. While Deep Learning Recommendation Models (DLRMs) with explicit feature interactions have long dominated this domain, recent advances in generative recommenders have shown promising results in content recommendation. However, adapting these transformer-based architectures to ads CTR prediction still presents unique challenges, including handling post-scoring contextual signals, maintaining offline-online consistenc...

📄 Latent Forcing: Reordering the Diffusion Trajectory for Pixel-Space Image Generation
🗓️ Published: 2/11/2026
🔗 http://arxiv.org/abs/2602.11401v1
👥 Authors: Alan Baade, Eric Ryan Chan, Kyle Sargent, Changan Chen, Justin Johnson (possible past Stanford University affiliation), Ehsan Adeli, Li Fei-Fei (possible past Stanford University affiliation)
Abstract

Latent diffusion models excel at generating high-quality images but lose the benefits of end-to-end modeling. They discard information during image encoding, require a separately trained decoder, and model an auxiliary distribution to the raw data. In this paper, we propose Latent Forcing, a simple modification to existing architectures that achieves the efficiency of latent diffusion while operating on raw natural images. Our approach orders the denoising trajectory by jointly processing latent...

📄 YOR: Your Own Mobile Manipulator for Generalizable Robotics
🗓️ Published: 2/11/2026
🔗 http://arxiv.org/abs/2602.11150v1
👥 Authors: Manan H Anjaria, Mehmet Enes Erciyes, Vedant Ghatnekar, Neha Navarkar, Haritheja Etukuru, Xiaole Jiang, Kanad Patel, Dhawal Kabra, Nicholas Wojno, Radhika Ajay Prayage, Soumith Chintala (possible past Meta (United States) affiliation), Lerrel Pinto (possible past Carnegie Mellon University affiliation), Nur Muhammad Mahi Shafiullah, Zichen Jeff Cui
Abstract

Recent advances in robot learning have generated significant interest in capable platforms that may eventually approach human-level competence. This interest, combined with the commoditization of actuators, has propelled growth in low-cost robotic platforms. However, the optimal form factor for mobile manipulation, especially on a budget, remains an open question. We introduce YOR, an open-source, low-cost mobile manipulator that integrates an omnidirectional base, a telescopic vertical lift, an...

*Notable papers are those with at least two authors from a "big" AI/ML lab.