📄 Notable* Recent AI/ML arXiv Papers

📄 Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention
🗓️ Published: 5/21/2026
🔗 http://arxiv.org/abs/2605.22791v1
👥 Authors: Ali Hatamizadeh (possible past Nvidia (United States) affiliation), Yejin Choi (possible past Allen Institute For Artificial Intelligence affiliation), Jan Kautz (possible past Nvidia (United States) affiliation)
Abstract

Linear attention replaces the unbounded cache of softmax attention with a fixed-size recurrent state, reducing sequence mixing to linear time and decoding to constant memory. The hard part is not just what to forget, but how to edit this compressed memory without scrambling existing associations. Delta-rule models subtract the current read before writing a new value, and Kimi Delta Attention (KDA) sharpens forgetting with channel-wise decay. But the active edit still uses a single scalar gate to...

📄 Advancing Mathematics Research with AI-Driven Formal Proof Search
🗓️ Published: 5/21/2026
🔗 http://arxiv.org/abs/2605.22763v1
👥 Authors: George Tsoukalas, Anton Kovsharov, Sergey Shirobokov, Anja Surina, Moritz Firsching, Gergely Bérczi, Francisco J. R. Ruiz (possible past Deepmind (United Kingdom) affiliation), Arun Suggala, Adam Zsolt Wagner, Eric Wieser (possible past University Of Cambridge affiliation), Lei Yu (possible past University Of Oxford affiliation), Aja Huang (possible past Google (United States) affiliation), Miklós Z. Horváth, Andrew Ferrauiolo, Henryk Michalewski, Codrut Grosu, Thomas Hubert (possible past Deepmind (United Kingdom) affiliation), Matej Balog (possible past Deepmind (United Kingdom) affiliation), Pushmeet Kohli (possible past Google (United States) affiliation), Swarat Chaudhuri
Abstract

Large language models (LLMs) increasingly excel at mathematical reasoning, but their unreliability limits their utility in mathematics research. A mitigation is using LLMs to generate formal proofs in languages like Lean. We perform the first large-scale evaluation of this method's ability to solve open problems. Our most capable agent autonomously resolved 9 of 353 open Erdős problems at the per-problem cost of a few hundred dollars, proved 44/492 OEIS conjectures, and is being deployed in comb...

📄 Towards a General Intelligence and Interface for Wearable Health Data
🗓️ Published: 5/21/2026
🔗 http://arxiv.org/abs/2605.22759v1
👥 Authors: Girish Narayanswamy, Maxwell A. Xu, A. Ali Heydari, Samy Abdel-Ghaffar, Marius Guerard, Kara Vaillancourt, Zhihan Zhang, Jake Garrison, Levi Albuquerque, Dimitris Spathis, Hong Yu, Hamid Palangi, Xuhai "Orson" Xu, David G. T. Barrett (possible past Google (United States) affiliation), Joseph Breda, Jed McGiffin, Yubin Kim, Yuwei Zhang, Naghmeh Rezaei, Samuel Solomon, Karan Ahuja, Tim Althoff, Jake Sunshine, Ming-Zher Poh, Benjamin Yetton, Ari Winbush, Nicholas B. Allen, James M. Rehg, Isaac Galatzer-Levy, Yun Liu (possible past Google (United States) affiliation), John Hernandez (possible past Google (United States) affiliation), Anupam Pathak, Conor Heneghan, Yuzhe Yang, Ahmed A. Metwally, Pushmeet Kohli (possible past Google (United States) affiliation), Mark Malhotra, Shwetak Patel, Xin Liu, Daniel McDuff (possible past Google (United States) affiliation)
Abstract

While ubiquitous wearable sensors capture a wealth of behavioral and physiological information, effectively transforming these signals into personalized health insights is challenging. Specifically, converting low-level sensor data into representations capable of characterizing higher-level states is difficult due to high phenotypic diversity and variation in individual baseline health, physiology, and lifestyle factors. Moreover, collecting wearable data paired with health outcome annotations i...

📄 Forecasting Scientific Progress with Artificial Intelligence
🗓️ Published: 5/21/2026
🔗 http://arxiv.org/abs/2605.22681v1
👥 Authors: Sean Wu, Pan Lu (possible past Baidu (China) affiliation), Yupeng Chen, Jonathan Bragg, Yutaro Yamada, Peter Clark, David Clifton, Philip Torr (possible past University Of Oxford affiliation), James Zou, Junchi Yu
Abstract

Artificial intelligence (AI) is increasingly embedded in scientific discovery, yet whether it can anticipate scientific progress remains unclear. To study this question, we introduce a temporally grounded evaluation framework for forecasting scientific progress under controlled knowledge constraints. We present CUSP (Cutoff-conditioned Unseen Scientific Progress), a multi-disciplinary and event-level benchmark that evaluates scientific forecasting in AI systems through feasibility assessment, me...

📄 Claw AI Lab: An Autonomous Multi-Agent Research Team
🗓️ Published: 5/21/2026
🔗 http://arxiv.org/abs/2605.22662v1
👥 Authors: Fan Wu, Cheng Chen (possible past Google (United States) affiliation), Zhenshan Tan, Taiyu Zhang, Xinzhen Xu, Yanyu Qian, Dingcheng Gao, Lanyun Zhu, Qi Zhu, Yi Tan, Deyi Ji, Guosheng Lin, Tianrun Chen, Deheng Ye (possible past Tencent (China) affiliation), Fayao Liu
Abstract

We present Claw AI Lab, a lab-native autonomous research platform that advances automated research from a hidden prompt-to-paper pipeline into an interactive AI laboratory. Rather than centering the system around a single agent or a fixed serial workflow, we allow users to instantiate a full research team from one prompt, with customizable roles, collaborative workflows, real-time monitoring, artifact inspection, and rollback/resume control through a unified dashboard. The platform also supports...

📄 Diffusion-guided Generalizable Enhancer for Urban Scene Reconstruction
🗓️ Published: 5/21/2026
🔗 http://arxiv.org/abs/2605.22420v1
👥 Authors: Henry Che, Jingkang Wang, Yun Chen, Ze Yang (possible past Tsinghua University affiliation), Sivabalan Manivasagam, Raquel Urtasun (possible past University Of Toronto affiliation)
Abstract

Urban scene reconstruction from real-world observations has emerged as a powerful tool for self-driving development and testing. While current neural rendering approaches achieve high-fidelity rendering along the recorded trajectories, their quality degrades significantly under large viewpoint shifts, limiting the applicability for closed-loop simulation. Recent works have shown promising results in using diffusion models to enhance quality at these challenging viewpoints and distill improvement...

📄 Towards Clinically Interpretable Ophthalmic VQA via Spatially-Grounded Lesion Evidence
🗓️ Published: 5/21/2026
🔗 http://arxiv.org/abs/2605.22414v1
👥 Authors: Xingyue Wang, Bo Liu (possible past Meta (United States) affiliation), Meng Wang (possible past Google (United States) affiliation), Zhixuan Zhang, Chengcheng Zhu, Huazhu Fu (possible past Inception Institute Of Artificial Intelligence affiliation), Jiang Liu
Abstract

Visual Question Answering (VQA) holds great promise for clinical support, particularly in ophthalmology, where retinal fundus photography is essential for diagnosis. However, ophthalmic VQA benchmarks primarily emphasize answer accuracy, neglecting the explicit visual evidence necessary for clinical interpretability. In this work, we introduce FundusGround, a new benchmark for clinically interpretable ophthalmic VQA with spatially-grounded lesion evidence. Specifically, we propose a three-stage ...

📄 Tailoring Teaching to Aptitude: Direction-Adaptive Self-Distillation for LLM Reasoning
🗓️ Published: 5/21/2026
🔗 http://arxiv.org/abs/2605.22263v1
👥 Authors: Hongbin Zhang, Chaozheng Wang, Kehai Chen, Youcheng Pan, Yang Xiang, Jinpeng Wang (possible past Tencent (China) affiliation), Min Zhang (possible past Tsinghua University affiliation)
Abstract

On-policy self-distillation (OPSD) is an emerging LLM post-training paradigm in which the model serves as its own teacher: conditioned on privileged information such as a reference trace or hint, the same policy provides dense token-level supervision on its own rollouts. However, recent studies show that OPSD degrades complex reasoning by suppressing predictive uncertainty, which supports exploration and hypothesis revision. Our token-level analysis shows that this failure arises from applying a...

📄 CLORE: Content-Level Optimization for Reasoning Efficiency
🗓️ Published: 5/21/2026
🔗 http://arxiv.org/abs/2605.22211v1
👥 Authors: Yuyang Wu, Qiyao Xue, Guanxing Lu, Weichen Liu, Zihan Wang (possible past Tsinghua University affiliation), Manling Li, Olexandr Isayev (possible past Carnegie Mellon University affiliation)
Abstract

Reinforcement learning post-training has improved the reasoning ability of large language models, but often produces unnecessarily long, repetitive, or semantically opaque reasoning traces. Existing efficient reasoning methods mainly regulate response length through explicit budgets or length-aware rewards, leaving intermediate reasoning content weakly supervised. We propose CLORE, a content-level optimization framework that improves reasoning efficiency by editing correct on-policy rollouts. CL...

📄 One-Way Policy Optimization for Self-Evolving LLMs
🗓️ Published: 5/21/2026
🔗 http://arxiv.org/abs/2605.22156v1
👥 Authors: Shuo Yang, Jinda Lu, Kexin Huang (possible past Stanford University affiliation), Chiyu Ma, Shaohang Wei, Yuyang Liu, Guoyin Wang, Jingren Zhou, Li Yuan (possible past National University Of Singapore affiliation)
Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has become a promising paradigm for scaling reasoning capabilities of Large Language Models (LLMs). However, the sparsity of binary verifier rewards often leads to low efficiency and optimization instability. To stabilize training, existing methods typically impose token-level constraints relative to a reference policy. We identify that such constraints penalize deviations indiscriminately; this can flip verifier-determined direction when the...

📄 Echo4DIR: 4D Implicit Heart Reconstruction from 2D Echocardiography Videos
🗓️ Published: 5/21/2026
🔗 http://arxiv.org/abs/2605.22066v1
👥 Authors: Yanan Liu, Qinya Li, Hao Zhang (possible past Tencent (China) affiliation), Kangjian He, Xuan Yang (possible past Stanford University affiliation), Hao Li (possible past Tsinghua University affiliation), Dan Xu (possible past University Of Oxford affiliation), Lei Li (possible past Carnegie Mellon University affiliation)
Abstract

Reconstructing 4D (3D+t) cardiac geometry from sparse 2D echocardiography is highly desirable yet fundamentally challenged by geometric ambiguity and temporal discontinuity. To tackle these issues, we propose Echo4DIR, a novel test-time 4D implicit reconstruction framework. Specifically, we learn robust 3D shape priors from statistical shape models (SSMs) via a cardiac conditional SDF, constructing an Epipolar Mask Encoder module with epipolar cross attention to effectively fuse multi-view featu...

📄 AgroVG: A Large-Scale Multi-Source Benchmark for Agricultural Visual Grounding
🗓️ Published: 5/21/2026
🔗 http://arxiv.org/abs/2605.22034v1
👥 Authors: Haocheng Li, Juepeng Zheng (possible past Tsinghua University affiliation), Zenghao Yang, Kaiqi Du, Guilong Xiao, Gengmeng Pu, Haohuan Fu (possible past Tsinghua University affiliation), Jianxi Huang
Abstract

Visual grounding, the task of localizing objects described by natural-language expressions, is a foundational capability for agricultural AI systems, enabling applications such as selective weeding, disease monitoring, and targeted harvesting. Reliable evaluation of agricultural visual grounding remains challenging because agricultural targets are often small, repetitive, occluded, or irregularly shaped, and instructions may refer to one, many, or no objects in an image. Evaluating this capabili...

📄 What Counts as AI Sycophancy? A Taxonomy and Expert Survey of a Fragmented Construct
🗓️ Published: 5/20/2026
🔗 http://arxiv.org/abs/2605.21778v1
👥 Authors: Meryl Ye, Lujain Ibrahim, Jessica Y. Bo, Myra Cheng (possible past Deepmind (United Kingdom) affiliation), Ida Mattsson, Daniel Vennemeyer, Robert Kraut (possible past Carnegie Mellon University affiliation), Steve Rathje (possible past University Of Cambridge affiliation)
Abstract

AI sycophancy has become a prominent concern in large language model (LLM) research. Yet the term lacks a consistent definition and has been applied to behaviors ranging from agreeing with a user's false claim to excessively praising the user to withholding corrective feedback. When researchers, companies, and policymakers use the same term to describe different behaviors, evaluation results become difficult to compare, mitigation strategies fail to transfer, and systems that are resistant to on...

📄 Investigating Concept Alignment Using Implausible Category Members
🗓️ Published: 5/20/2026
🔗 http://arxiv.org/abs/2605.21683v1
👥 Authors: Sunayana Rane, Brenden M. Lake (possible past Meta (United States) affiliation), Thomas L. Griffiths (possible past University Of California, Berkeley affiliation)
Abstract

Developing AI systems with a human-like understanding of everyday concepts is a key step towards developing safe, reliable systems whose behavior makes sense to humans. When probing concept understanding, asking questions about plausible category members (e.g., "Is a car a vehicle?") is likely to recall patterns in the model's vast training data. We pursue an alternative strategy, characterizing the boundaries of conceptual categories by asking about implausible category members (e.g., "Is an ol...

📄 Remember to be Curious: Episodic Context and Persistent Worlds for 3D Exploration
🗓️ Published: 5/21/2026
🔗 http://arxiv.org/abs/2605.22814v1
👥 Authors: Lily Goli, Justin Kerr, Daniele Reda, Alec Jacobson, Andrea Tagliasacchi (possible past University Of Toronto affiliation), Angjoo Kanazawa (possible past University Of California, Berkeley affiliation)
Abstract

Exploration is a prerequisite for learning useful behaviors in sparse-reward, long-horizon tasks, particularly within 3D environments. Curiosity-driven reinforcement learning addresses this via intrinsic rewards derived from the mismatch between the agent's predictive model of the world and reality. However, translating this intrinsic motivation to complex, photorealistic environments remains difficult, as agents can become trapped in local loops and receive fresh rewards for revisiting forgotte...

📄 Clipping Bottleneck: Stabilizing RLVR via Stochastic Recovery of Near-Boundary Signals
🗓️ Published: 5/21/2026
🔗 http://arxiv.org/abs/2605.22703v1
👥 Authors: Shuo Yang, Jinda Lu, Chiyu Ma, Kexin Huang (possible past Stanford University affiliation), Haoming Meng, Qihui Zhang, Yuyang Liu, Bolin Ding, Guoyin Wang, Li Yuan (possible past National University Of Singapore affiliation), Jingren Zhou
Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a central paradigm for scaling LLM reasoning, yet its optimization often suffers from training instability and suboptimal convergence. Through a systematic dissection of clipping-based GRPO-style objectives, we identify the rigid clipping decision induced by hard clipping as a key practical bottleneck in the studied RLVR setups. Specifically, our analysis suggests that informative signals can lie in the near-boundary region jus...

📄 SegCompass: Exploring Interpretable Alignment with Sparse Autoencoders for Enhanced Reasoning Segmentation
🗓️ Published: 5/21/2026
🔗 http://arxiv.org/abs/2605.22658v1
👥 Authors: Zhenyu Lu, Liupeng Li, Jinpeng Wang (possible past Tencent (China) affiliation), Haoqian Kang, Yan Feng, Ke Chen (possible past Tencent (China) affiliation), Yaowei Wang
Abstract

While large language models provide strong compositional reasoning, existing reasoning segmentation pipelines fail to transparently connect this reasoning to visual perception. Current methods, such as latent query alignment, are end-to-end yet opaque "black boxes". Conversely, textual localization readout is merely readable, not truly interpretable, often functioning as an unconstrained post-hoc step. To bridge this interpretability gap, we propose SegCompass, an end-to-end model that leverages...

📄 Lost in Tokenization: Fundamental Trade-offs in Graph Tokenization for Transformers
🗓️ Published: 5/21/2026
🔗 http://arxiv.org/abs/2605.22471v1
👥 Authors: Maya Bechler-Speicher, Gilad Yehudai, Gil Harari, Clayton Sanford, Amir Globerson (possible past Google (United States) affiliation), Joan Bruna (possible past University Of California, Berkeley affiliation)
Abstract

Transformers have become a central architecture for graph learning, but their application to graphs requires first choosing a tokenization: a graph-to-token map that determines which structural information is exposed at the input. In this work, we show that this choice is a fundamental component of transformer expressivity. We examine three tokenizations that serve as building blocks for many existing graph tokenizations: spectral, random-walk, and adjacency tokenizations. We prove that differen...

📄 From Snapshots to Trajectories: Learning Single-Cell Gene Expression Dynamics via Conditional Flow Matching
🗓️ Published: 5/21/2026
🔗 http://arxiv.org/abs/2605.22340v1
👥 Authors: Siyu Pu, Qingqing Long, Xiaohan Huang, Haotian Chen, Jiajia Wang, Meng Xiao, Xiao Luo, Hengshu Zhu (possible past Baidu (China) affiliation), Yuanchun Zhou, Xuezhi Wang (possible past Google (United States) affiliation)
Abstract

Single-cell RNA sequencing (scRNA-seq) provides high-dimensional profiles of cellular states, enabling data-driven modeling of cellular dynamics over time. In practice, time-resolved scRNA-seq is collected at only a few discrete time points as unpaired snapshot populations, leaving substantial temporal gaps. This motivates trajectory inference at unmeasured time points. Existing methods mainly follow two directions: optimal-transport (OT) alignment provides distribution-level matching between ob...

📄 ARC-STAR: Auditable Post-Hoc Correction for PDE Foundation Models
🗓️ Published: 5/21/2026
🔗 http://arxiv.org/abs/2605.22222v1
👥 Authors: Chengze Li, Lingwei Wei, Li Sun, Hongbo Lv, Jie Yang (possible past Shanghai Jiao Tong University affiliation), Hongrong Zhang, Kening Zheng, Wei-Chieh Huang, Enze Ma, Philip S. Yu (possible past Tsinghua University affiliation)
Abstract

Partial differential equation (PDE) foundation models are pretrained networks that forecast how physical fields like velocity and pressure evolve from a single reusable solver. On unfamiliar flows their predictions drift step by step, errors concentrate in a few regions, yet retraining destabilizes the network and uniform post-hoc correction overlooks this spatial concentration. To address this, we propose a frozen-solver post-hoc correction framework, Adaptive Risk-Calibrated Spatial Triage for...

📄 RADAR: Defending RAG Dynamically against Retrieval Corruption
🗓️ Published: 5/21/2026
🔗 http://arxiv.org/abs/2605.22041v1
👥 Authors: Ziyuan Chen, Yueming Lyu, Yi Liu (possible past Google (United States) affiliation), Weixiang Han, Jing Dong (possible past Meta (United States) affiliation), Caifeng Shan, Tieniu Tan
Abstract

While RAG systems are increasingly deployed in dynamic web search, temporal volatility amplifies their vulnerability to adversarial attacks. Existing static-oriented defenses struggle to handle evolving threats and incur prohibitive storage costs in dynamic settings. We propose RADAR, a framework that models reliable context selection as a graph-based energy minimization problem, solved exactly via Max-Flow Min-Cut. By incorporating a Bayesian memory node, RADAR recursively updates a belief stat...

*Notable papers are those with at least two authors from a "big" AI/ML lab.