πŸ“„ Notable* Recent AI/ML arXiv Papers


πŸ“„ ELF: Embedded Language Flows
πŸ—“οΈ Published: 5/11/2026
πŸ”— http://arxiv.org/abs/2605.10938v1
πŸ‘₯ Authors: Keya Hu, Linlu Qiu, Yiyang Lu, Hanhong Zhao, Tianhong Li, Yoon Kim (possible past University Of Oxford affiliation), Jacob Andreas (possible past University Of California, Berkeley affiliation), Kaiming He (possible past Microsoft (United States) affiliation)
Abstract

Diffusion and flow-based models have become the de facto approaches for generating continuous data, e.g., in domains such as images and videos. Their success has attracted growing interest in applying them to language modeling. Unlike their image-domain counterparts, today's leading diffusion language models (DLMs) primarily operate over discrete tokens. In this paper, we show that continuous DLMs can be made effective with minimal adaptation to the discrete domain. We propose Embedded Language ...

πŸ“„ Unmasking On-Policy Distillation: Where It Helps, Where It Hurts, and Why
πŸ—“οΈ Published: 5/11/2026
πŸ”— http://arxiv.org/abs/2605.10889v1
πŸ‘₯ Authors: Mohammadreza Armandpour, Fatih Ilhan, David Harrison (possible past Google (United States) affiliation), Ajay Jaiswal, Duc N. M Hoang, Fartash Faghri (possible past University Of Toronto affiliation), Yizhe Zhang, Minsik Cho, Mehrdad Farajtabar (possible past Google (United States) affiliation)
Abstract

On-policy distillation offers dense, per-token supervision for training reasoning models; however, it remains unclear under which conditions this signal is beneficial and under which it is detrimental. Which teacher model should be used, and in the case of self-distillation, which specific context should serve as the supervisory signal? Does the optimal choice vary from one token to the next? At present, addressing these questions typically requires costly training runs whose aggregate performan...

πŸ“„ BenchCAD: A Comprehensive, Industry-Standard Benchmark for Programmatic CAD
πŸ—“οΈ Published: 5/11/2026
πŸ”— http://arxiv.org/abs/2605.10865v1
πŸ‘₯ Authors: Haozhe Zhang, Kaichen Liu, Miaomiao Chen, Lei Li (possible past Carnegie Mellon University affiliation), Shaojie Yang (possible past Tencent (China) affiliation), Cheng Peng, Hanjie Chen
Abstract

Industrial Computer-Aided Design (CAD) code generation requires models to produce executable parametric programs from visual or textual inputs. Beyond recognizing the outer shape of a part, this task involves understanding its 3D structure, inferring engineering parameters, and choosing CAD operations that reflect how the part would be designed and manufactured. Despite the promise of multimodal large language models (MLLMs) for this task, they are rarely evaluated on whether these capabilities ...

πŸ“„ MaD Physics: Evaluating information seeking under constraints in physical environments
πŸ—“οΈ Published: 5/11/2026
πŸ”— http://arxiv.org/abs/2605.10820v1
πŸ‘₯ Authors: Moksh Jain, Mehdi Bennani, Johannes Bausch, Yuri Chervonyi, Bogdan Georgiev, Simon Osindero (possible past Google (United States) affiliation), Nenad TomaΕ‘ev (possible past Google (United States) affiliation)
Abstract

Scientific discovery is fundamentally a resource-constrained process that requires navigating complex trade-offs between the quality and quantity of measurements due to physical and cost constraints. Measurements drive the scientific process by revealing novel phenomena to improve our understanding. Existing benchmarks for evaluating agents for scientific discovery focus on either static knowledge-based reasoning or unconstrained experimental design tasks, and do not capture the ability to make ...

πŸ“„ NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized Research Automation
πŸ—“οΈ Published: 5/11/2026
πŸ”— http://arxiv.org/abs/2605.10813v1
πŸ‘₯ Authors: Jinhang Xu, Qiyuan Zhu, Yujun Wu, Zirui Wang, Dongxu Zhang, Jianxin Tang, Marcia Tian, Yiling Duan, Siyuan Li (possible past Tencent (China) affiliation), Jingxuan Wei, Sirui Han, Yike Guo, Odin Zhang, Conghui He (possible past Tsinghua University affiliation), Cheng Tan
Abstract

LLM-powered multi-agent systems can now automate the full research pipeline from ideation to paper writing, but a fundamental question remains: automation for whom? Researchers operate under different resource configurations, hold different methodological preferences, and target different output formats. A system that produces uniform outputs regardless of these differences will systematically under-serve every individual user, making personalization a precondition for research automation to be ...

πŸ“„ Towards Understanding Continual Factual Knowledge Acquisition of Language Models: From Theory to Algorithm
πŸ—“οΈ Published: 5/11/2026
πŸ”— http://arxiv.org/abs/2605.10640v1
πŸ‘₯ Authors: Haoyu Wang (possible past Tencent (China) affiliation), Yifan Shang, Zhongxiang Sun, Weijie Yu, Xiao Zhang (possible past Tsinghua University affiliation), Jun Xu (possible past Google (United States) affiliation)
Abstract

Continual Pre-Training (CPT) is essential for enabling Language Models (LMs) to integrate new knowledge without erasing old knowledge. While classical CPT techniques like data replay have become the standard paradigm, the mechanisms underlying how LMs acquire and retain facts over time, termed continual Factual Knowledge Acquisition (cFKA), remain unclear. In this work, we present a theoretical framework that characterizes the training dynamics of cFKA using a single-layer Transformer, offering a unifi...

πŸ“„ Multi-layer attentive probing improves transfer of audio representations for bioacoustics
πŸ—“οΈ Published: 5/11/2026
πŸ”— http://arxiv.org/abs/2605.10494v1
πŸ‘₯ Authors: Marius Miron, David Robinson, Masato Hagiwara, Titouan Parcollet, Jules Cauzinille, Gagan Narula, Milad Alizadeh, Ellen Gilsenan-Mcmahon, Sara Keen, Emmanuel Chemla, Benjamin Hoffman, Maddie Cusimano, Diane Kim, Felix Effenberger, Jane K. Lawton, Aza Raskin, Olivier Pietquin (possible past Google (United States) affiliation), Matthieu Geist (possible past Google (United States) affiliation)
Abstract

Probing heads map the representations learned from audio by a machine learning model to downstream task labels and are a key component in evaluating representation learning. Most bioacoustic benchmarks use a fixed, low-capacity probe, such as a linear layer on the final encoder layer. While this standardization enables model comparisons, it may bias results by overlooking the interaction between encoder features and probe design. In this work, we systematically study different probing strategies...
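The contrast the abstract draws is between a linear probe on the final encoder layer and higher-capacity probes that attend over all layers. A minimal sketch of the latter idea, assuming a learned attention query that softmax-weights per-layer embeddings before a linear classifier (function names and shapes are illustrative, not the paper's implementation):

```python
import numpy as np

def attentive_layer_probe(layer_feats, query, w_out):
    """Attention-weighted pooling over encoder layers.

    Illustrative sketch only, not the paper's exact probe.
    layer_feats: (num_layers, dim) per-layer embeddings for one clip
    query:       (dim,) learned attention query
    w_out:       (dim, num_classes) linear classifier head
    """
    scores = layer_feats @ query                  # (num_layers,)
    scores = scores - scores.max()                # numerical stability
    attn = np.exp(scores) / np.exp(scores).sum()  # softmax over layers
    pooled = attn @ layer_feats                   # (dim,) weighted mixture
    return pooled @ w_out                         # class logits
```

A plain linear probe is the special case where `attn` is fixed to select only the final layer, which is why probe choice can confound encoder comparisons.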

πŸ“„ How Mobile World Model Guides GUI Agents?
πŸ—“οΈ Published: 5/11/2026
πŸ”— http://arxiv.org/abs/2605.10347v1
πŸ‘₯ Authors: Weikai Xu, Kun Huang, Yunren Feng, Jiaxing Li, Yuhan Chen, Yuxuan Liu, Zhizheng Jiang, Heng Qu, Pengzhi Gao, Wei Liu (possible past Tsinghua University affiliation), Jian Luan, Xiaolin Hu (possible past Tsinghua University affiliation), Bo An
Abstract

Recent advances in vision-language models have enabled mobile GUI agents to perceive visual interfaces and execute user instructions, but reliable prediction of action consequences remains critical for long-horizon and high-risk interactions. Existing mobile world models provide either text-based or image-based future states, yet it remains unclear which representation is useful, whether generated rollouts can replace real environments, and how test-time guidance helps agents of different streng...

πŸ“„ PaperFit: Vision-in-the-Loop Typesetting Optimization for Scientific Documents
πŸ—“οΈ Published: 5/11/2026
πŸ”— http://arxiv.org/abs/2605.10341v1
πŸ‘₯ Authors: Bihui Yu, Xinglong Xu, Junjie Jiang, Jiabei Cheng, Caijun Jia, Siyuan Li (possible past Tencent (China) affiliation), Conghui He (possible past Tsinghua University affiliation), Jingxuan Wei, Cheng Tan
Abstract

A LaTeX manuscript that compiles without error is not necessarily publication-ready. The resulting PDFs frequently suffer from misplaced floats, overflowing equations, inconsistent table scaling, widow and orphan lines, and poor page balance, forcing authors into repetitive compile-inspect-edit cycles. Rule-based tools are blind to rendered visuals, operating only on source code and log files. Text-only LLMs perform open-loop text editing, unable to predict or verify the two-dimensional layout c...

πŸ“„ PowerStep: Memory-Efficient Adaptive Optimization via $\ell_p$-Norm Steepest Descent
πŸ—“οΈ Published: 5/11/2026
πŸ”— http://arxiv.org/abs/2605.10335v1
πŸ‘₯ Authors: Yao Lu (possible past Google (United States) affiliation), Dengdong Fan, Shixun Zhang, Yonghong Tian (possible past Peking University affiliation)
Abstract

Adaptive optimizers, most notably Adam, have become the default standard for training large-scale neural networks such as Transformers. These methods maintain running estimates of gradient first and second moments, incurring substantial memory overhead. We introduce PowerStep, a memory-efficient optimizer that achieves coordinate-wise adaptivity without storing second-moment statistics. Motivated by steepest descent under an $\ell_p$-norm geometry, we show that applying a nonlinear transform dir...
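The classical fact the abstract invokes: the steepest-descent direction under an $\ell_p$-norm constraint is a coordinate-wise nonlinear transform of the gradient, $d_i \propto \mathrm{sign}(g_i)\,|g_i|^{q-1}$ with $1/p + 1/q = 1$, recovering signSGD as $p \to \infty$. A minimal sketch of that direction (this is the textbook construction, not PowerStep itself; the function name is hypothetical):

```python
import numpy as np

def lp_steepest_descent_direction(g, p):
    """Direction d maximizing <g, d> subject to ||d||_p = 1.

    By Hoelder's equality condition: d_i ∝ sign(g_i) * |g_i|^(q-1),
    where q is the dual exponent (1/p + 1/q = 1). p=2 recovers
    normalized gradient descent; p → ∞ recovers sign(g) (signSGD).
    Illustrative only — not the paper's PowerStep optimizer.
    """
    q = p / (p - 1.0)                       # dual exponent
    d = np.sign(g) * np.abs(g) ** (q - 1.0) # coordinate-wise transform
    norm = np.linalg.norm(d, ord=p)
    return d / norm if norm > 0 else d
```

Because the transform is applied directly to the current gradient, no per-coordinate second-moment buffer needs to be stored, which is the memory argument the abstract gestures at.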

πŸ“„ EmbodiSkill: Skill-Aware Reflection for Self-Evolving Embodied Agents
πŸ—“οΈ Published: 5/11/2026
πŸ”— http://arxiv.org/abs/2605.10332v1
πŸ‘₯ Authors: Ruofei Ju, Xinrui Wang, Xin Ding, Yifan Yang (possible past Tencent (China) affiliation), Hao Wu (possible past Tencent (China) affiliation), Shiqi Jiang, Qianxi Zhang, Hao Wen, Xiangyu Li, Weijun Wang (possible past Google (United States) affiliation), Kun Li, Yunxin Liu, Haipeng Dai, Wei Wang (possible past University Of Oxford affiliation), Ting Cao
Abstract

Embodied agents can benefit from skills that guide object search, action execution, and state changes across diverse environments. Since embodied environments vary across layouts, object states, and other execution factors, these skills must self-evolve from trajectories generated during task execution. However, existing skill self-evolution methods are mainly developed in digital environments and often convert trajectories into coarse skill updates. Directly applying this paradigm to embodied s...

πŸ“„ Verifiable Process Rewards for Agentic Reasoning
πŸ—“οΈ Published: 5/11/2026
πŸ”— http://arxiv.org/abs/2605.10325v1
πŸ‘₯ Authors: Huining Yuan, Zelai Xu, Huaijie Wang, Xiangmin Yi, Jiaxuan Gao, Xiao-Ping Zhang, Yu Wang (possible past Tsinghua University affiliation), Chao Yu, Yi Wu (possible past University Of California, Berkeley affiliation)
Abstract

Reinforcement learning from verifiable rewards (RLVR) has improved the reasoning abilities of large language models (LLMs), but most existing approaches rely on sparse outcome-level feedback. This sparsity creates a credit assignment challenge in long-horizon agentic reasoning: a trajectory may fail despite containing many correct intermediate decisions, or succeed despite containing flawed ones. In this work, we study a class of densely-verifiable agentic reasoning problems, where intermediate ...
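The credit-assignment contrast in the abstract can be made concrete with the standard discounted-return computation: a single outcome reward spreads identical credit over every step, while per-step (process) rewards differentiate correct from flawed intermediate decisions. A generic sketch, not the paper's method:

```python
def returns_from_rewards(rewards, gamma=1.0):
    """Discounted return G_t at each step, computed backwards.

    With only a sparse outcome reward, every step receives the same
    return; dense per-step verifiable rewards give each intermediate
    decision its own credit signal. Illustrative textbook RL, not the
    paper's reward scheme.
    """
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]
```

Running this on a sparse outcome reward versus a dense per-step reward for the same three-step trajectory shows why the sparse case cannot distinguish the steps.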

πŸ“„ Positive Alignment: Artificial Intelligence for Human Flourishing
πŸ—“οΈ Published: 5/11/2026
πŸ”— http://arxiv.org/abs/2605.10310v1
πŸ‘₯ Authors: Ruben Laukkonen, Seb Krier, ChloΓ© Bakalar, Shamil Chandaria, Morten Kringelbach, Adam Elwood, Daniel Ford, Fernando Rosas, Maty Bohacek, Matija Franklin, Nenad TomaΕ‘ev (possible past Google (United States) affiliation), Stephanie Chan, Verena Rieser, Roma Patel (possible past Google (United States) affiliation), Michael Levin (possible past Google (United States) affiliation), Arun Rao
Abstract

Existing alignment research is dominated by concerns about safety and preventing harm: safeguards, controllability, and compliance. This paradigm of alignment parallels early psychology's focus on mental illness: necessary but incomplete. What we call Positive Alignment is the development of AI systems that (i) actively support human and ecological flourishing in a pluralistic, polycentric, context-sensitive, and user-authored way while (ii) remaining safe and cooperative. It is a distinct and n...

πŸ“„ ProteinOPD: Towards Effective and Efficient Preference Alignment for Protein Design
πŸ—“οΈ Published: 5/11/2026
πŸ”— http://arxiv.org/abs/2605.10189v1
πŸ‘₯ Authors: Yulin Zhang, He Cao, Zihao Jiang, Chenyi Zi, Zhipeng Zhou, Zijing Liu, Yu Li (possible past Tencent (China) affiliation), Jia Li (possible past Google (United States) affiliation), Ziqi Gao
Abstract

Designing proteins with desired functions or properties represents a core goal in synthetic biology and drug discovery. Recent advances in protein language models (PLMs) have enabled the generation of highly designable protein sequences, while preference alignment provides a promising way to steer designs toward desired functions and properties. Nevertheless, preference-alignment methods often trigger catastrophic forgetting of pretrained knowledge, degrading basic designability and failing to balance multiple competin...

πŸ“„ ViSRA: A Video-based Spatial Reasoning Agent for Multi-modal Large Language Models
πŸ—“οΈ Published: 5/11/2026
πŸ”— http://arxiv.org/abs/2605.10106v1
πŸ‘₯ Authors: Tingshu Mou, Jiabo He, Renying Wang, Ce Liu (possible past Google (United States) affiliation), Hao Yang (possible past Tencent (China) affiliation), Tiehua Zhang, Jingjing Chen, Xingjun Ma
Abstract

Recent advances in Multi-modal Large Language Models (MLLMs) target 3D spatial intelligence, yet the progress has been largely driven by post-training on curated benchmarks, leaving the inference-time approach relatively underexplored. In this paper, we take a training-free perspective and introduce ViSRA, a human-aligned Video-based Spatial Reasoning Agent, as a framework to probe the spatial reasoning mechanism of MLLMs. ViSRA elicits spatial reasoning in a modular and extensible manner by lev...

πŸ“„ Metis: Learning to Jailbreak LLMs via Self-Evolving Metacognitive Policy Optimization
πŸ—“οΈ Published: 5/11/2026
πŸ”— http://arxiv.org/abs/2605.10067v1
πŸ‘₯ Authors: Huilin Zhou, Jian Zhao, Yilu Zhong, Zhen Liang, Xiuyuan Chen, Yuchen Yuan (possible past Baidu (China) affiliation), Tianle Zhang, Chi Zhang (possible past Peking University affiliation), Lan Zhang, Xuelong Li (possible past Tencent (China) affiliation)
Abstract

Red teaming is critical for uncovering vulnerabilities in Large Language Models (LLMs). While automated methods have improved scalability, existing approaches often rely on static heuristics or stochastic search, rendering them brittle against advanced safety alignment. To address this, we introduce Metis, a framework that reformulates jailbreaking as inference-time policy optimization within an adversarial Partially Observable Markov Decision Process (POMDP). Metis employs a self-evolving metac...

πŸ“„ Bridging the Cognitive Gap: A Unified Memory Paradigm for 6G Agentic AI-RAN
πŸ—“οΈ Published: 5/11/2026
πŸ”— http://arxiv.org/abs/2605.10036v1
πŸ‘₯ Authors: Xijun Wang, Zhaoyang Liu, Chenyuan Feng, Xiang Chen (possible past Tencent (China) affiliation), Howard H. Yang (possible past Google (United States) affiliation), Tony Q. S. Quek
Abstract

As 6G evolves, the radio access network must transcend traditional automation to embrace agentic AI capable of perception, reasoning, and evolution. A fundamental cognitive gap persists in current disaggregated architectures, where interfaces force the physical layer to compress high-dimensional states into low-dimensional metrics, trapping reasoning agents behind a semantic bottleneck. This article envisions a shift from interface-bound to memory-centric architectures. We propose a unified memo...

πŸ“„ DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices
πŸ—“οΈ Published: 5/11/2026
πŸ”— http://arxiv.org/abs/2605.10933v1
πŸ‘₯ Authors: Chenyang Song, Weilin Zhao, Xu Han (possible past Tsinghua University affiliation), Chaojun Xiao, Yingfa Chen, Zhiyuan Liu (possible past Tsinghua University affiliation)
Abstract

While Mixture-of-Experts (MoE) scales model capacity without proportionally increasing computation, its massive total parameter footprint creates significant storage and memory-access bottlenecks, which hinder efficient end-side deployment that simultaneously requires high performance, low computational cost, and small storage overhead. To achieve these properties, we present DECO, a sparse MoE architecture designed to match the performance of dense Transformers under identical total parameter b...
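The trade-off the abstract describes follows from standard top-k MoE routing: compute scales with the k experts actually run per token, while storage scales with all experts. A minimal sketch of that routing pattern (generic textbook MoE, not DECO's architecture; names and shapes are illustrative):

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Standard top-k Mixture-of-Experts routing (illustrative sketch,
    not DECO itself).

    x:       (dim,) token representation
    gate_w:  (dim, num_experts) router weights
    experts: list of callables, each an expert FFN

    Only k experts execute per token (compute cost), but all
    len(experts) parameter sets must be stored (memory cost) — the
    end-side bottleneck the paper targets.
    """
    logits = x @ gate_w                    # router scores, (num_experts,)
    topk = np.argsort(logits)[-k:]         # indices of the k largest scores
    weights = np.exp(logits[topk])
    weights /= weights.sum()               # softmax over selected experts only
    return sum(w * experts[i](x) for w, i in zip(weights, topk))
```

With, say, 64 experts and k=2, per-token FLOPs resemble a dense model 1/32 the size, while the parameter footprint (and memory traffic to page experts in) does not shrink — which is why total-parameter-matched comparisons against dense Transformers are the relevant baseline.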

πŸ“„ RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards
πŸ—“οΈ Published: 5/11/2026
πŸ”— http://arxiv.org/abs/2605.10899v1
πŸ‘₯ Authors: Gaotang Li, Bhavana Dalvi Mishra (possible past Carnegie Mellon University affiliation), Zifeng Wang, Jun Yan, Yanfei Chen, Chun-Liang Li, Long T. Le, Rujun Han, George Lee, Hanghang Tong (possible past Ibm (United States) affiliation), Chen-Yu Lee (possible past Google (United States) affiliation), Tomas Pfister (possible past University Of Oxford affiliation)
Abstract

Training deep research agents, namely systems that plan, search, evaluate evidence, and synthesize long-form reports, pushes reinforcement learning beyond the regime of verifiable rewards. Their outputs lack ground-truth answers, their trajectories span many tool-augmented decisions, and standard post-training offers little mechanism for turning past attempts into reusable experience. In this work, we argue that rubrics should serve not merely as final-answer evaluators, but as the shared interf...

πŸ“„ Masked Generative Transformer Is What You Need for Image Editing
πŸ—“οΈ Published: 5/11/2026
πŸ”— http://arxiv.org/abs/2605.10859v1
πŸ‘₯ Authors: Wei Chow, Linfeng Li, Xian Sun, Lingdong Kong, Zefeng Li, Qi Xu, Hang Song, Tian Ye, Xian Wang, Jinbin Bai, Shilin Xu, Xiangtai Li (possible past Peking University affiliation), Junting Pan, Shaoteng Liu, Ran Zhou, Tianshu Yang, Songhua Liu (possible past Baidu (China) affiliation)
Abstract

Diffusion models dominate image editing, yet their global denoising mechanism entangles edited regions with surrounding context, causing modifications to propagate into areas that should remain intact. We propose a fundamentally different approach by leveraging Masked Generative Transformers (MGTs), whose localized token-prediction paradigm naturally confines changes to intended regions. We present EditMGT, an MGT-based editing framework that is the first of its kind. Our approach employs multi-...

πŸ“„ FORGE: Fragment-Oriented Ranking and Generation for Context-Aware Molecular Optimization
πŸ—“οΈ Published: 5/11/2026
πŸ”— http://arxiv.org/abs/2605.10230v1
πŸ‘₯ Authors: Qingchuan Zhang, He Cao, Hao Li (possible past Tsinghua University affiliation), Yanjun Shao, Zhiyuan Liu (possible past Tsinghua University affiliation), Shihang Wang, Shufang Xie, Shenghua Gao, Xinwu Ye
Abstract

Molecular optimization seeks to improve a molecule through small structural edits while preserving similarity to the starting compound. Recent language-model approaches typically treat this task as prompt-conditioned sequence generation. However, relying on natural language introduces an inherent data-scaling bottleneck, often leads to chemical hallucinations, and ignores the strong context dependence of fragment effects. We present FORGE, a two-stage framework that reformulates molecular optimi...

*Notable papers are those with at least two authors from a "big" AI/ML lab.