📄 Notable* Recent AI/ML arXiv Papers

📄 SkillOpt: Executive Strategy for Self-Evolving Agent Skills
🗓️ Published: 5/22/2026
🔗 http://arxiv.org/abs/2605.23904v1
👥 Authors: Yifan Yang (possible past Tencent (China) affiliation), Ziyang Gong, Weiquan Huang, Qihao Yang, Ziwei Zhou, Zisu Huang, Yan Li (possible past Tencent (China) affiliation), Xuemei Gao, Qi Dai, Bei Liu, Kai Qiu, Yuqing Yang, Dongdong Chen, Xue Yang, Chong Luo (possible past Google (United States) affiliation)
Abstract

Agent skills today are hand-crafted, generated one-shot, or evolved through loosely controlled self-revision, none of which behaves like a deep-learning optimizer for the skill, and none of which reliably improves over its starting point under feedback. We argue the skill should instead be trained as the external state of a frozen agent, with the same discipline that makes weight-space optimization reproducible. SkillOpt is, to our knowledge, the first systematic controllable text-space optimize...

📄 From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills
🗓️ Published: 5/22/2026
🔗 http://arxiv.org/abs/2605.23899v1
👥 Authors: Zisu Huang, Jingwen Xu, Yifan Yang (possible past Tencent (China) affiliation), Ziyang Gong, Qihao Yang, Muzhao Tian, Xiaohua Wang, Changze Lv, Xuemei Gao, Qi Dai, Bei Liu, Kai Qiu, Xue Yang, Dongdong Chen, Xiaoqing Zheng, Chong Luo (possible past Google (United States) affiliation)
Abstract

Language agents increasingly improve by reusing skills, structured procedural artifacts distilled from past experience. In particular, domain-level and model-generated skills are especially promising. They offer fast adaptation within a domain by encoding domain-specific recurring procedures, and they scale beyond labor-intensive hand-crafting. However, while extraction methods continue to proliferate, understanding remains limited, with no comprehensive study spanning the...

📄 Good Token Hunting: A Hitchhiker's Guide to Token Selection for Visual Geometry Transformers
🗓️ Published: 5/22/2026
🔗 http://arxiv.org/abs/2605.23892v1
👥 Authors: Shuhong Zheng, Michael Oechsle, Erik Sandström, Marie-Julie Rakotosaona, Federico Tombari (possible past Google (United States) affiliation), Igor Gilitschenski (possible past Massachusetts Institute Of Technology affiliation)
Abstract

Visual geometry transformers have become powerful architectures for multi-view 3D reconstruction, enabling joint prediction of multiple 3D attributes in a feed-forward manner. However, their computational cost grows quadratically with the input sequence length due to the global attention layers inside these models. This limits both their scalability and efficiency. In this work, we address this challenge with a simple yet general strategy: restricting the number of key/value tokens that each que...
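
The core idea, letting each query attend to only a limited set of key/value tokens, can be illustrated with a small pure-Python sketch. The selection rule below (keep the `k` keys with the highest dot-product score) and the name `topk_attention` are hypothetical; the abstract is truncated before it says how tokens are actually chosen. The sketch still scores every key, so it shows the semantics of restricted attention, not the paper's compute savings.

```python
import math


def topk_attention(query, keys, values, k):
    """Attend from one query to only its k highest-scoring key/value tokens.

    Hypothetical illustration: top-k by scaled dot product is an assumed
    selection rule, not the method described in the paper.
    """
    d = len(query)
    # Scaled dot-product score of the query against every key.
    scores = [sum(q * kk for q, kk in zip(query, key)) / math.sqrt(d) for key in keys]
    # Keep the indices of the k largest scores; every other token is ignored.
    keep = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    # Numerically stable softmax over the kept scores only.
    m = max(scores[i] for i in keep)
    weights = [math.exp(scores[i] - m) for i in keep]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted sum of the kept value vectors.
    out = [0.0] * len(values[0])
    for w, i in zip(weights, keep):
        for j, v in enumerate(values[i]):
            out[j] += w * v
    return out
```

With `k` equal to the number of keys, this reduces to ordinary softmax attention for that query.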

📄 Any2Any: Efficient Cross-Embodiment Transfer for Humanoid Whole-Body Tracking
🗓️ Published: 5/22/2026
🔗 http://arxiv.org/abs/2605.23733v1
👥 Authors: Ming Yang (possible past Meta (United States) affiliation), Tao Yu (possible past University Of Washington affiliation), Feng Li, Hua Chen
Abstract

Whole-body tracking (WBT) models have become a key foundation for humanoid robots, enabling them to imitate diverse motions with high fidelity. Training such models from scratch requires large-scale data and computation, making rapid deployment on new humanoid platforms costly. This raises a natural question: Can pretrained WBT models transfer across embodiments with minimal adaptation? To answer this question, we propose Any2Any, a paradigm that efficiently transfers an existing WBT specialist ...

📄 CVSearch: Empowering Multimodal LLMs with Cognitive Visual Search for High-Resolution Image Perception
🗓️ Published: 5/22/2026
🔗 http://arxiv.org/abs/2605.23655v1
👥 Authors: Liupeng Li, Haoqian Kang, Zhenyu Lu, Jinpeng Wang (possible past Tencent (China) affiliation), Bin Chen, Ke Chen (possible past Tencent (China) affiliation), Yaowei Wang
Abstract

High-resolution (HR) image perception presents a key bottleneck for multimodal large language models (MLLMs). While visual search offers a promising solution, existing methods struggle with the trade-off between coverage and efficiency. Visual expert-assisted search is efficient but prone to blind spots when proposals fail, whereas scan-based search guarantees coverage at the cost of computational redundancy and semantic fragmentation. To address this dilemma, we introduce CVSearch, a training-f...

📄 DiLaDiff: Distilled Latent-Augmented Diffusion for Language Modeling
🗓️ Published: 5/22/2026
🔗 http://arxiv.org/abs/2605.23605v1
👥 Authors: Jean-Marie Lemercier, Tomas Geffner, Karsten Kreis (possible past Nvidia (United States) affiliation), Morteza Mardani (possible past Nvidia (United States) affiliation), Arash Vahdat (possible past Nvidia (United States) affiliation), Ante Jukić
Abstract

Diffusion language models intrinsically fail to capture correlations between decoded tokens, which leads to a harsh trade-off between sampling quality and throughput. To solve this issue, we propose DiLaDiff, a variant of masked diffusion language models with three components: (1) a continuous latent space with semantic capabilities, learned by an auto-encoder fine-tuned from an existing masked diffusion language model; (2) a latent diffusion model learning the prior over the encoder distributio...

📄 PathNavigate: A Training-Free Pathology Agent with Surprise-Guided Scan and Shared Slide Memory for Whole-Slide Image VQA
🗓️ Published: 5/22/2026
🔗 http://arxiv.org/abs/2605.23559v1
👥 Authors: Chunze Yang, Qidong Liu, Wenjie Zhao, Yue Tang, Jiusong Ge, Di Zhang, Jiashuai Liu, Lei Wu, Junbo Lu, Ni Zhang, Xian Wu (possible past Tencent (China) affiliation), Zeyu Gao, Chen Li (possible past Tencent (China) affiliation)
Abstract

Whole-slide image visual question answering (WSI-VQA) frames pathology as an extreme-context search problem: to answer a free-form clinical query, a system must first navigate a gigapixel slide under a strict inspection budget to locate sparse, high-resolution evidence. Existing approaches largely fall into two paradigms: i) supervised pathology multimodal large language models (MLLMs) and agents can absorb localization and reasoning into learned modules, but they often couple navigation to task...

📄 Precise: SDE-Consistent Stochastic Sampling for RL Post-Training of Flow-Matching Models
🗓️ Published: 5/22/2026
🔗 http://arxiv.org/abs/2605.23522v1
👥 Authors: Jade Zou, Tao Huang, Weijie Kong (possible past Peking University affiliation), Junzhe Li, Yue Wu, Qi Tian (possible past Huawei Technologies (China) affiliation), Jiangfeng Xiong, Jianwei Zhang, Liefeng Bo, Zhao Zhong
Abstract

Reinforcement learning (RL) has become an effective way to improve prompt alignment and perceptual quality in diffusion and flow-matching generators. A critical step for applying online RL to flow matching is turning the deterministic sampling trajectory into a stochastic policy, typically by replacing the reverse-time Ordinary Differential Equation (ODE) with a Stochastic Differential Equation (SDE). The stochastic sampler, controlling the exploration behavior and denoising dynamics, is thus pa...

📄 Metacognition as Reward: Reinforcing LLM Reasoning via Knowledge and Regulation Signals
🗓️ Published: 5/22/2026
🔗 http://arxiv.org/abs/2605.23384v1
👥 Authors: Sirui Chen, Lei Xu (possible past Tsinghua University affiliation), Yuying Zhao, Yutian Chen (possible past Deepmind (United Kingdom) affiliation), Yu Wang (possible past Tsinghua University affiliation), Beier Zhu, Hanwang Zhang, Shengjie Zhao, Chaochao Lu
Abstract

Recent RL methods have substantially improved the reasoning abilities of LLMs. Existing reward designs mainly follow two paradigms: (1) Reinforcement learning with verifiable rewards (RLVR) derives outcome signals from executable checks or ground-truth answers, but provides limited guidance for intermediate reasoning behaviors. (2) Rubrics-as-reward (RaR) goes beyond final-answer checking by using natural-language rubrics to assess reasoning quality and task compliance, but often requires instan...

📄 EvalVerse: Pipeline-Aware and Expert-Calibrated Benchmarking for Professional Cinematic Video Generation
🗓️ Published: 5/22/2026
🔗 http://arxiv.org/abs/2605.23271v1
👥 Authors: Songlin Yang, Haobin Zhong, Ruilin Zhang, Xiaotong Zhao, Shuai Li, Kai Zheng, Xuyi Yang, Zhe Wang (possible past Deepmind (United Kingdom) affiliation), Zhenchen Tang, Yang Li (possible past Google (United States) affiliation), Bohai Gu, Zhengwei Peng, Yidan Huang, Mengzhou Luo, Yihang Bo, Dalu Feng, Yujia Zhang, Juntao Ma, Ruiqi Wang, Lvmin Zhang, Yuwei Guo, Frank Guan, Maneesh Agrawala (possible past Stanford University affiliation), Hongbo Fu, Alan Zhao, Anyi Rao
Abstract

The rapid evolution of generative video foundation models has propelled the field toward professional-grade cinematic synthesis. To achieve such demanding quality, the community is transitioning toward Reinforcement Learning (RL) and agentic workflows. However, reliable evaluation has emerged as a critical bottleneck. Existing benchmarks predominantly evaluate "whether it is right" (basic prompt-following) while fundamentally neglecting "whether it is good" (cinematic quality, acting, and aesthe...

📄 Foundation Protocol: A Coordination Layer for Agentic Society
🗓️ Published: 5/22/2026
🔗 http://arxiv.org/abs/2605.23218v1
👥 Authors: Bang Liu, Yongfeng Gu, Jiayi Zhang, Zhaoyang Yu, Sirui Hong, Maojia Song, Xiaoqiang Wang, Mingyi Deng, Zijie Zhuang, Ronghao Wang, Mingzhe Cao, Yutong Zhu, Xingjian Li (possible past Baidu (China) affiliation), Yifan Wu (possible past Carnegie Mellon University affiliation), Jianhao Ruan, Yiran Peng, Shuangrui Chen, Jinlin Wang, Yizhang Lin, Dongjie Zhang, Dekun Wu, Chen Ma, Lizi Liao, Han Yu, Jian Pei, Heng Ji, Qiang Yang, Yuyu Luo, Chenglin Wu
Abstract

Autonomous agents are moving from tools into a layer of social infrastructure: they browse, purchase, deploy software, manage systems, and increasingly interact with one another. As these systems scale, the bottleneck shifts away from raw model capability toward coordination. Agents need to form reliable relationships, organize multi-agent work, exchange value, support an AI economy, and stay safe and accountable under real-world oversight. This paper introduces the Foundation Protocol (FP), a g...

📄 AutoResearch AI: Towards AI-Powered Research Automation for Scientific Discovery
🗓️ Published: 5/22/2026
🔗 http://arxiv.org/abs/2605.23204v1
👥 Authors: Guiyao Tie, Jiawen Shi, Dingjie Song, Yixiao Huang, Ziji Sheng, Xueyang Zhou, Daizong Liu, Pan Zhou, Yongchao Chen, Ran Xu, Lifang He, Qingsong Wen, Manling Li, Cong Lu, Shuai Li, Pengtao Xie, Yixuan Yuan, Rui Meng, Lei Xing (possible past Stanford University affiliation), Lichao Sun, Caiming Xiong (possible past Salesforce (United States) affiliation), Philip S. Yu (possible past Tsinghua University affiliation), Jianfeng Gao (possible past Microsoft (United States) affiliation)
Abstract

Scientific research is being reshaped by AI systems that move beyond isolated assistance toward longer-horizon workflows spanning literature grounding, hypothesis generation, experimentation, validation, reporting, and revision. This shift marks a transition from task-level AI for science to workflow-level research automation. Yet current systems remain fragmented, differing in autonomy, domain scope, execution environment, validation mechanism, and human oversight, while still struggling with e...

📄 Inductive Deductive Synthesis: Enabling AI to Generate Formally Verified Systems
🗓️ Published: 5/22/2026
🔗 http://arxiv.org/abs/2605.23109v1
👥 Authors: Shubham Agarwal, Alexander Krentsel, Shu Liu (possible past Tencent (China) affiliation), Mert Cemri, Audrey Cheng, Rui Meng, Tomas Pfister (possible past University Of Oxford affiliation), Chun-Liang Li, Sylvia Ratnasamy (possible past University Of California, Berkeley affiliation), Aditya Parameswaran (possible past Stanford University affiliation), Matei Zaharia (possible past University Of California, Berkeley affiliation), Ion Stoica (possible past University Of California, Berkeley affiliation), Mohsen Lesani
Abstract

AI agents increasingly excel at generating, testing, and refining code. However, they fall short on tasks requiring formal guarantees of full coverage that testing alone cannot provide. Distributed systems are a prime example: properties such as consistency between reads and writes must hold under every possible interleaving of events. Mechanized formal verification can guarantee such correctness, but typically demands months to years of expert effort. As evidence, even SOTA coding agents (Codex...

📄 A Proactive Multi-Agent Dialogue Framework for Assessing Social Language Disorder Traits in Autism
🗓️ Published: 5/21/2026
🔗 http://arxiv.org/abs/2605.22993v1
👥 Authors: Chuanbo Hu, Minglei Yin, Bin Liu, Wenqi Li (possible past Nvidia (United States) affiliation), Lynn K. Paul, Shuo Wang (possible past Nvidia (United States) affiliation), Xin Li (possible past Google (United States) affiliation)
Abstract

Characteristic linguistic behaviors associated with Social Language Disorder (SLD) in autism spectrum disorder, including echoic repetition, pronoun displacement, and stereotyped media quoting, are largely absent from spontaneous conversation and only emerge under specific conversational conditions. In structured clinical assessments, this latency means that questioning strategy selection is a critical yet underappreciated determinant of how much diagnostic information a conversation yields. Whe...

📄 Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention
🗓️ Published: 5/21/2026
🔗 http://arxiv.org/abs/2605.22791v1
👥 Authors: Ali Hatamizadeh (possible past Nvidia (United States) affiliation), Yejin Choi (possible past Allen Institute For Artificial Intelligence affiliation), Jan Kautz (possible past Nvidia (United States) affiliation)
Abstract

Linear attention replaces the unbounded cache of softmax attention with a fixed-size recurrent state, reducing sequence mixing to linear time and decoding to constant memory. The hard part is not just what to forget, but how to edit this compressed memory without scrambling existing associations. Delta-rule models subtract the current read before writing a new value, and Kimi Delta Attention (KDA) sharpens forgetting with channel-wise decay. But the active edit still uses a single scalar gate to...
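
For reference, the scalar-gated delta rule the abstract contrasts against can be sketched as follows, assuming the Gated DeltaNet-style update S ← αS + β(v − αSk)kᵀ with a fixed-size state S of shape d_v × d_k. This is not the paper's decoupled erase/write mechanism, which the truncated abstract does not spell out, and the helper names are hypothetical. The point it illustrates: the write adds only the error between the new value and what the state currently returns for that key, so rewriting a key overwrites rather than accumulates.

```python
def read_state(S, q):
    """Read from the recurrent state: returns S @ q (S has d_v rows, d_k columns)."""
    return [sum(s * qi for s, qi in zip(row, q)) for row in S]


def gated_delta_write(S, k, v, alpha, beta):
    """One scalar-gated delta-rule step (assumed Gated DeltaNet-style form).

    alpha: scalar decay applied to the whole state (forgetting).
    beta:  scalar write strength; it scales both the erase of the old
           association (-beta * pred * k^T) and the new write (beta * v * k^T).
    """
    # 1) Forget: uniformly decay the state.
    S = [[alpha * s for s in row] for row in S]
    # 2) Read what the decayed state currently returns for key k.
    pred = read_state(S, k)
    # 3) Write only the error (v - pred) along k.
    return [
        [S[i][j] + beta * (v[i] - pred[i]) * k[j] for j in range(len(k))]
        for i in range(len(v))
    ]
```

Writing [3, 4] and then [5, 6] under the same unit key leaves [5, 6] in the state rather than [8, 10], which a plain additive linear-attention update would produce; in this sketch a single β governs both how much old content is erased and how much new content is written.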

📄 Advancing Mathematics Research with AI-Driven Formal Proof Search
🗓️ Published: 5/21/2026
🔗 http://arxiv.org/abs/2605.22763v1
👥 Authors: George Tsoukalas, Anton Kovsharov, Sergey Shirobokov, Anja Surina, Moritz Firsching, Gergely Bérczi, Francisco J. R. Ruiz (possible past Deepmind (United Kingdom) affiliation), Arun Suggala, Adam Zsolt Wagner, Eric Wieser (possible past University Of Cambridge affiliation), Lei Yu (possible past University Of Oxford affiliation), Aja Huang (possible past Google (United States) affiliation), Miklós Z. Horváth, Andrew Ferrauiolo, Henryk Michalewski, Codrut Grosu, Thomas Hubert (possible past Deepmind (United Kingdom) affiliation), Matej Balog (possible past Deepmind (United Kingdom) affiliation), Pushmeet Kohli (possible past Google (United States) affiliation), Swarat Chaudhuri
Abstract

Large language models (LLMs) increasingly excel at mathematical reasoning, but their unreliability limits their utility in mathematics research. A mitigation is using LLMs to generate formal proofs in languages like Lean. We perform the first large-scale evaluation of this method's ability to solve open problems. Our most capable agent autonomously resolved 9 of 353 open Erdős problems at the per-problem cost of a few hundred dollars, proved 44/492 OEIS conjectures, and is being deployed in comb...

📄 Towards a General Intelligence and Interface for Wearable Health Data
🗓️ Published: 5/21/2026
🔗 http://arxiv.org/abs/2605.22759v1
👥 Authors: Girish Narayanswamy, Maxwell A. Xu, A. Ali Heydari, Samy Abdel-Ghaffar, Marius Guerard, Kara Vaillancourt, Zhihan Zhang, Jake Garrison, Levi Albuquerque, Dimitris Spathis, Hong Yu, Hamid Palangi, Xuhai "orson" Xu, David G. T. Barrett (possible past Google (United States) affiliation), Joseph Breda, Jed Mcgiffin, Yubin Kim, Yuwei Zhang, Naghmeh Rezaei, Samuel Solomon, Karan Ahuja, Tim Althoff, Jake Sunshine, Ming-Zher Poh, Benjamin Yetton, Ari Winbush, Nicholas B. Allen, James M. Rehg, Isaac Galatzer-Levy, Yun Liu (possible past Google (United States) affiliation), John Hernandez (possible past Google (United States) affiliation), Anupam Pathak, Conor Heneghan, Yuzhe Yang, Ahmed A. Metwally, Pushmeet Kohli (possible past Google (United States) affiliation), Mark Malhotra, Shwetak Patel, Xin Liu, Daniel Mcduff (possible past Google (United States) affiliation)
Abstract

While ubiquitous wearable sensors capture a wealth of behavioral and physiological information, effectively transforming these signals into personalized health insights is challenging. Specifically, converting low-level sensor data into representations capable of characterizing higher-level states is difficult due to high phenotypic diversity and variation in individual baseline health, physiology, and lifestyle factors. Moreover, collecting wearable data paired with health outcome annotations i...

📄 Forecasting Scientific Progress with Artificial Intelligence
🗓️ Published: 5/21/2026
🔗 http://arxiv.org/abs/2605.22681v1
👥 Authors: Sean Wu, Pan Lu (possible past Baidu (China) affiliation), Yupeng Chen, Jonathan Bragg, Yutaro Yamada, Peter Clark, David Clifton, Philip Torr (possible past University Of Oxford affiliation), James Zou, Junchi Yu
Abstract

Artificial intelligence (AI) is increasingly embedded in scientific discovery, yet whether it can anticipate scientific progress remains unclear. To study this question, we introduce a temporally grounded evaluation framework for forecasting scientific progress under controlled knowledge constraints. We present CUSP (Cutoff-conditioned Unseen Scientific Progress), a multi-disciplinary and event-level benchmark that evaluates scientific forecasting in AI systems through feasibility assessment, me...

📄 Claw AI Lab: An Autonomous Multi-Agent Research Team
🗓️ Published: 5/21/2026
🔗 http://arxiv.org/abs/2605.22662v1
👥 Authors: Fan Wu, Cheng Chen (possible past Google (United States) affiliation), Zhenshan Tan, Taiyu Zhang, Xinzhen Xu, Yanyu Qian, Dingcheng Gao, Lanyun Zhu, Qi Zhu, Yi Tan, Deyi Ji, Guosheng Lin, Tianrun Chen, Deheng Ye (possible past Tencent (China) affiliation), Fayao Liu
Abstract

We present Claw AI Lab, a lab-native autonomous research platform that advances automated research from a hidden prompt-to-paper pipeline into an interactive AI laboratory. Rather than centering the system around a single agent or a fixed serial workflow, we allow users to instantiate a full research team from one prompt, with customizable roles, collaborative workflows, real-time monitoring, artifact inspection, and rollback/resume control through a unified dashboard. The platform also supports...

📄 Hinge Regression Trees and HRT-Boost: Newton-Optimized Oblique Learning for Compact Tabular Models
🗓️ Published: 5/22/2026
🔗 http://arxiv.org/abs/2605.23422v1
👥 Authors: Hongyi Li (possible past Google (United States) affiliation), Jun Xu (possible past Google (United States) affiliation), Hong Yan
Abstract

Learning high-quality oblique decision trees remains a significant challenge due to the discrete and non-convex nature of split optimization. We present the Hinge Regression Tree (HRT) framework, which reframes each oblique split as a nonlinear least-squares problem over two linear predictors whose max/min envelope induces ReLU-like representation capacity. We show that the resulting node-level optimization can be interpreted as a damped Newton method, and we establish the monotonic decrease of ...
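
One way to read the "max/min envelope" description: two affine predictors f1(x) = w1·x + b1 and f2(x) = w2·x + b2 cross on the oblique hyperplane (w1 − w2)·x + (b1 − b2) = 0, and taking their maximum (or minimum) yields a hinge, a ReLU-like piecewise-linear function. The sketch below covers only prediction and routing under that reading; the function names are hypothetical and it does not implement the paper's damped-Newton fitting of the two predictors.

```python
def affine(w, b, x):
    """Linear predictor w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def hinge_predict(x, w1, b1, w2, b2, use_max=True):
    """Max (or min) envelope of two linear predictors: a ReLU-like hinge."""
    f1, f2 = affine(w1, b1, x), affine(w2, b2, x)
    return max(f1, f2) if use_max else min(f1, f2)


def route(x, w1, b1, w2, b2):
    """Oblique split induced by the envelope.

    The hyperplane (w1 - w2) . x + (b1 - b2) = 0 is where the two predictors
    cross; x goes to the side whose predictor is currently larger.
    """
    return "left" if affine(w1, b1, x) >= affine(w2, b2, x) else "right"
```

With w1 = [1], b1 = 0, w2 = [0], b2 = 0 the max envelope is exactly ReLU(x), and points with x ≥ 0 are routed left.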

📄 RelPrism: A Multi-Faceted Pre-training Framework with Self-Generated Tasks for Relational Databases
🗓️ Published: 5/22/2026
🔗 http://arxiv.org/abs/2605.23241v1
👥 Authors: Jinyu Yang, Cheng Yang (possible past Tsinghua University affiliation), Junze Chen, Zedi Liu, Muhan Zhang (possible past Meta (United States) affiliation), Hanyang Peng, Chuan Shi
Abstract

Relational databases (RDBs) remain the cornerstone of modern data systems and support diverse predictive tasks. Recent relational deep learning (RDL) methods enable end-to-end prediction by converting RDBs into graphs, where rows are represented as nodes and inter-table interactions are represented as edges, and then applying graph-based models for representation learning. Despite the strong capability of RDL, effective self-supervised pre-training for RDBs remains non-trivial. RDB tasks often r...
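
The row-to-node, link-to-edge conversion described above can be sketched with plain dictionaries. The table layout and the use of foreign-key references as the inter-table edges are assumptions made for illustration (the abstract does not specify the schema encoding), and `rdb_to_graph` is a hypothetical name.

```python
def rdb_to_graph(tables, foreign_keys):
    """Convert a toy relational database into a node/edge graph.

    tables:       {table_name: {primary_key: row_dict}}
    foreign_keys: list of (child_table, fk_column, parent_table)
    Returns (nodes, edges): nodes maps (table, pk) -> row features, and
    edges links each child row to the parent row its foreign key references.
    """
    # Every row becomes one node, keyed by (table, primary key).
    nodes = {}
    for table, rows in tables.items():
        for pk, row in rows.items():
            nodes[(table, pk)] = row
    # Every resolvable foreign-key reference becomes one edge.
    edges = []
    for child, col, parent in foreign_keys:
        for pk, row in tables[child].items():
            ref = row.get(col)
            if ref is not None and ref in tables[parent]:
                edges.append(((child, pk), (parent, ref)))
    return nodes, edges
```

A graph model can then learn representations over these nodes, using the edges to pass information between related rows across tables.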

📄 Remember to be Curious: Episodic Context and Persistent Worlds for 3D Exploration
🗓️ Published: 5/21/2026
🔗 http://arxiv.org/abs/2605.22814v1
👥 Authors: Lily Goli, Justin Kerr, Daniele Reda, Alec Jacobson, Andrea Tagliasacchi (possible past University Of Toronto affiliation), Angjoo Kanazawa (possible past University Of California, Berkeley affiliation)
Abstract

Exploration is a prerequisite for learning useful behaviors in sparse-reward, long-horizon tasks, particularly within 3D environments. Curiosity-driven reinforcement learning addresses this via intrinsic rewards derived from the mismatch between the agent's predictive model of the world and reality. However, translating this intrinsic motivation to complex, photorealistic environments remains difficult, as agents can become trapped in local loops and receive fresh rewards for revisiting forgotte...

📄 Clipping Bottleneck: Stabilizing RLVR via Stochastic Recovery of Near-Boundary Signals
🗓️ Published: 5/21/2026
🔗 http://arxiv.org/abs/2605.22703v1
👥 Authors: Shuo Yang, Jinda Lu, Chiyu Ma, Kexin Huang (possible past Stanford University affiliation), Haoming Meng, Qihui Zhang, Yuyang Liu, Bolin Ding, Guoyin Wang, Li Yuan (possible past National University Of Singapore affiliation), Jingren Zhou
Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a central paradigm for scaling LLM reasoning, yet its optimization often suffers from training instability and suboptimal convergence. Through a systematic dissection of clipping-based GRPO-style objectives, we identify the rigid clipping decision induced by hard clipping as a key practical bottleneck in the studied RLVR setups. Specifically, our analysis suggests that informative signals can lie in the near-boundary region jus...
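
The hard-clipping decision the abstract refers to comes from the standard PPO/GRPO-style clipped surrogate, min(r·A, clip(r, 1 − ε, 1 + ε)·A), where r is a token's probability ratio and A its group-relative advantage. The minimal sketch below (with the common but assumed default ε = 0.2 and hypothetical function names) shows why a token whose ratio sits just past the boundary on the favorable side contributes no gradient. It does not implement the paper's stochastic recovery, whose details are truncated here.

```python
import statistics


def group_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: reward standardized within its sampled group."""
    mu = statistics.fmean(rewards)
    sd = statistics.pstdev(rewards)
    return [(r - mu) / (sd + eps) for r in rewards]


def clipped_term(ratio, adv, clip_eps=0.2):
    """Hard-clipped surrogate for one token.

    Returns (value, has_grad). When the clipped branch is strictly smaller,
    the min selects a constant with respect to the ratio, so the token
    gets zero gradient (reported for nonzero advantages).
    """
    unclipped = ratio * adv
    clipped = max(min(ratio, 1 + clip_eps), 1 - clip_eps) * adv
    value = min(unclipped, clipped)
    has_grad = unclipped <= clipped
    return value, has_grad
```

For example, a token with A > 0 and r = 1.25 is clipped to 1.2·A and drops out of the update even though it sits only slightly past the boundary; this is the kind of near-boundary signal the abstract suggests can still be informative.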

*Notable papers are those with at least two authors from a "big" AI/ML lab.