📄 Notable* Recent AI/ML arXiv Papers

📄 Agent JIT Compilation for Latency-Optimizing Web Agent Planning and Scheduling
🗓️ Published: 5/20/2026
🔗 http://arxiv.org/abs/2605.21470v1
👥 Authors: Caleb Winston, Ron Yifeng Wang, Azalia Mirhoseini (possible past Google (United States) affiliation), Christos Kozyrakis (possible past Stanford University affiliation)
Abstract

Computer-use agents (CUA) automate tasks specified with natural language such as "order the cheapest item from Taco Bell" by generating sequences of calls to tools such as click, type, and scroll on a browser. Current implementations follow a sequential fetch-screenshot-execute loop where each iteration requires an LLM call, resulting in high latency and frequent errors from incorrect tool use. We present agent just-in-time (JIT) compilation, an alternative that compiles task descriptions direct...

📄 Mem-$π$: Adaptive Memory through Learning When and What to Generate
🗓️ Published: 5/20/2026
🔗 http://arxiv.org/abs/2605.21463v1
👥 Authors: Xiaoqiang Wang, Chao Wang (possible past Google (United States) affiliation), Hadi Nekoei, Christopher Pal, Alexandre Lacoste (possible past Google (United States) affiliation), Spandana Gella (possible past University Of Edinburgh affiliation), Bang Liu, Perouz Taslakian
Abstract

We present Mem-$π$, a framework for adaptive memory in large language model (LLM) agents, where useful guidance is generated on demand rather than retrieved from external memory stores. Existing memory-augmented agents typically rely on similarity-based retrieval from episodic memory banks or skill libraries, returning static entries that often misalign with the current context. In contrast, Mem-$π$ uses a dedicated language or vision-language model with its own parameters, separate from the dow...

📄 Learning Structural Latent Points for Efficient Visual Representations in Robotic Manipulation
🗓️ Published: 5/20/2026
🔗 http://arxiv.org/abs/2605.21258v1
👥 Authors: Yicheng Jiang, Jiaxu Wang, Junhao He, Zesen Gan, Junhao Li, Qiang Zhang (possible past Tsinghua University affiliation), Jingkai Sun, Jiahang Cao, Mingyuan Sun, Xiangyu Yue (possible past University Of California, Berkeley affiliation), Qiming Shao
Abstract

Current 3D-aware pretraining methods for embodied perception and manipulation are largely built on differentiable rendering frameworks, producing either fully implicit neural fields or fully explicit geometric primitives. Implicit representations, while expressive, lack explicit structural cues, whereas explicit ones preserve geometry but suffer from resolution limits and weak generalization. To address these limitations, we propose a novel pretraining framework that learns a hybrid representati...

📄 RePCM: Region-Specific and Phenotype-Adaptive Bi-Ventricular Cardiac Motion Synthesis
🗓️ Published: 5/20/2026
🔗 http://arxiv.org/abs/2605.21237v1
👥 Authors: Xuan Yang (possible past Stanford University affiliation), Xiaohan Yuan, Hao Li (possible past Tsinghua University affiliation), Lingyu Chen, Yanan Liu, Lei Li (possible past Carnegie Mellon University affiliation)
Abstract

Cardiac motion over a cardiac cycle is crucial for quantifying regional function and is strongly affected by cardiovascular diseases. Since temporally dense mesh sequences are difficult to obtain in practice, we focus on leveraging the more accessible end-diastolic frame to infer a full-cycle sequence. Due to strong regional and disease-specific differences, traditional methods often oversmooth the data by relying on generative models that are optimized for global patterns. To address this probl...

📄 Artificial Intelligence Reshapes Microwave Photonics
🗓️ Published: 5/20/2026
🔗 http://arxiv.org/abs/2605.21224v1
👥 Authors: Peng Li (possible past Tsinghua University affiliation), Xihua Zou, Jia Ye (possible past Google (United States) affiliation), Wei Pan, Lianshan Yan
Abstract

As a rapidly emerging interdisciplinary field that intrinsically integrates microwave and photonics, microwave photonics (MWP) provides disruptive solutions to overcome the fundamental bandwidth limits of conventional electronic systems. By exploiting the inherently ultra-wide bandwidth and low-loss characteristics of photonic technologies, MWP enables the generation, transmission, processing, and detection of microwave, millimeter-wave, and terahertz signals. Representative breakthroughs include fully...

📄 ELSA: An ELastic SNN Inference Architecture for Efficient Neuromorphic Computing
🗓️ Published: 5/20/2026
🔗 http://arxiv.org/abs/2605.20802v1
👥 Authors: Kang You, Chen Nie, Lee Jun Yan, Ziling Wei, Cheng Zou, Zekai Xu, Yu Feng (possible past University Of California, Berkeley affiliation), Honglan Jiang, Zhezhi He (possible past Shanghai Jiao Tong University affiliation)
Abstract

Spiking neural networks (SNNs) exploit event-driven and addition-only computation to substantially improve efficiency for intelligent computation. A key temporal property of SNNs, elastic inference, allows outputs to emerge progressively, enabling responses to salient inputs much earlier than full evaluation. However, existing SNN-specific accelerators cannot capitalize on this property. Layer-by-layer designs emit outputs only after all layers are complete, while time-step-by-time-step designs ...

📄 Distribution-Aware Reward: Reinforcement Learning over Predictive Distributions for LLM Regression
🗓️ Published: 5/20/2026
🔗 http://arxiv.org/abs/2605.20740v1
👥 Authors: Jungsoo Park, Hyungjoo Chae, Ethan Mendes, Jay Deyoung, Varsha Kishore, Wei Xu (possible past Tencent (China) affiliation), Alan Ritter (possible past Carnegie Mellon University affiliation)
Abstract

Large language models can predict real-valued quantities from heterogeneous inputs such as text, code, and molecular strings, but most training objectives score each decoded floating-point number independently, improving point estimates without ensuring calibrated predictive distributions. This limits applications requiring candidate ranking or uncertainty estimation. We introduce Distribution-Aware Reward, an on-policy reinforcement learning objective whose main contribution is to train languag...

📄 On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists
🗓️ Published: 5/20/2026
🔗 http://arxiv.org/abs/2605.20668v1
👥 Authors: Seungone Kim, Dongkeun Yoon, Kiril Gashteovski, Juyoung Suk, Jinheon Baek, Pranjal Aggarwal, Ian Wu, Viktor Zaverkin, Spase Petkoski, Daniel R. Schrider, Ilija Dukovski, Francesco Santini, Biljana Mitreska, Yong Jeong, Kyeongha Kwon, Young Min Sim, Dragana Manasova, Arthur Porto, Biljana Mojsoska, Makoto Takamoto, Marko Shuntov, Ruoqi Liu, Hyunjoo Jenny Lee, Niyazi Ulas Dinç, Yehhyun Jo, Sunkyu Han, Chungwoo Lee, Huishan Li, Esther H. R. Tsai, Ergun Simsek, Khushboo Shafi, Yeonseung Chung, Jihye Park, Aleksandar Shulevski, Henrik Christiansen, Yoosang Son, Elly Knight, Amanda Montoya, Jeongyoun Ahn, Christian Langkammer, Heera Moon, Changwon Yoon, Nikola Stikov, Mooseok Jang, Edward Choi (possible past Google (United States) affiliation), Junhan Kim, Yeon Sik Jung, Woo Youn Kim, Jae Kyoung Kim, Ishraq Md Anjum, Hyun Uk Kim, Drew Bridges, Carolin Lawrence, Xiang Yue, Alice Oh, Akari Asai (possible past Tencent (China) affiliation), Sean Welleck, Graham Neubig (possible past Carnegie Mellon University affiliation)
Abstract

With the advancement of AI capabilities, AI reviewers are beginning to be deployed in scientific peer review, yet their capability and credibility remain in question: many scientists simply view them as probabilistic systems without the expertise to evaluate research, while other researchers are more optimistic about their readiness without concrete evidence. Understanding what AI reviewers do well, where they fall short, and what challenges remain is essential. However, existing evaluations of ...

📄 STELLAR: Scaling 3D Perception Large Models for Autonomous Driving
🗓️ Published: 5/19/2026
🔗 http://arxiv.org/abs/2605.20390v1
👥 Authors: Yingwei Li (possible past Google (United States) affiliation), Xin Huang (possible past Baidu (China) affiliation), Yang Liu (possible past Tsinghua University affiliation), Yang Fu, Alex Zihao Zhu, Chen Song, Junwen Yao, Anant Subramanian, Hao Xiang, Weijing Shi, Yuliang Zou, Tom Hoddes, Zhaoqi Leng, Govind Thattai, Dragomir Anguelov (possible past Google (United States) affiliation), Mingxing Tan (possible past Google (United States) affiliation)
Abstract

Model scaling has demonstrated remarkable success through large-scale training on diverse datasets. It remains an open question whether the same paradigm would apply to autonomous driving perception systems due to unique challenges, such as fusing heterogeneous sensor data and the need for sophisticated 3D spatial understanding. To bridge this gap, we present a comprehensive study on systematically analyzing the impact of scale on these systems. We develop our STELLAR model based on Sparse Windo...

📄 ConceptSeg-R1: Segment Any Concept via Meta-Reinforcement Learning
🗓️ Published: 5/19/2026
🔗 http://arxiv.org/abs/2605.20385v1
👥 Authors: Yuan Zhao, Youwei Pang, Jiaming Zuo, Wei Ji (possible past Tencent (China) affiliation), Kailai Zhou, Bin Fan, Yunkang Cao, Lihe Zhang, Xiaofeng Liu (possible past Google (United States) affiliation), Huchuan Lu, Weisi Lin, Dacheng Tao, Xiaoqi Zhao
Abstract

Recent progress in promptable segmentation has shifted visual perception from object-level localization toward concept-level understanding. However, the notion of a concept remains under-specified, making it unclear whether current methods truly generalize beyond category recognition. In this work, we formalize generalized concept segmentation through a three-level taxonomy consisting of context-independent (CI), context-dependent (CD), and context-reasoning (CR) concepts, which reveals a clear ...

📄 SUGAR: A Scalable Human-Video-Driven Generalizable Humanoid Loco-Manipulation Learning Framework
🗓️ Published: 5/19/2026
🔗 http://arxiv.org/abs/2605.20373v1
👥 Authors: Tianshu Wu, Xiangqi Kong, Yue Chen (possible past Google (United States) affiliation), Qize Yu, Hang Ye, Jia Li (possible past Google (United States) affiliation), Yizhou Wang (possible past Peking University affiliation), Hao Dong
Abstract

Building humanoid robots capable of generalizable whole-body loco-manipulation in the real world remains a fundamental challenge. Existing methods either rely on laborious task-specific reward engineering, rigidly replay reference motions that fail to generalize, or depend on costly teleoperation that limits scalability. While human videos capture diverse human behaviors, motion priors inferred from them are inherently imperfect, suffering from occlusion, contact artifacts, and retargeting error...

📄 Atoms of Thought: Universal EEG Representation Learning with Microstates
🗓️ Published: 5/19/2026
🔗 http://arxiv.org/abs/2605.20182v1
👥 Authors: Xinyang Tian, Ruitao Liu, Ziyi Ye, Siyang Xue, Xin Wang (possible past University Of Edinburgh affiliation), Xuesong Chen (possible past Peking University affiliation)
Abstract

Learning universal representations from electroencephalogram (EEG) signals is a cutting-edge approach in the field of neuroinformatics and brain-computer interfaces (BCIs). Conventionally, EEG is treated as a multivariate temporal signal, where time- or frequency-domain features are extracted for representation learning. This paper investigates a simple yet effective EEG representation, i.e., microstates. Microstates represent the building blocks of brain activity patterns at a microscopic time ...

📄 DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards
🗓️ Published: 5/20/2026
🔗 http://arxiv.org/abs/2605.21467v1
👥 Authors: Kaiyi Zhang, Wei Wu (possible past Tencent (China) affiliation), Yankai Lin (possible past Tsinghua University affiliation)
Abstract

Reinforcement learning from verifiable rewards (RLVR) has emerged as a central technique for improving the reasoning capabilities of large language models. Despite its effectiveness, how response-level rewards translate into token-level probability changes remains poorly understood. We introduce a discriminator view of RLVR updates, showing that the policy-gradient update direction implicitly acts as a linear discriminator over token-gradient vectors and thereby determines which token probabilit...

📄 Multimodal LLMs under Pairwise Modalities
🗓️ Published: 5/20/2026
🔗 http://arxiv.org/abs/2605.21059v1
👥 Authors: Yan Li (possible past Tencent (China) affiliation), Yunlong Deng, Yuewen Sun, Gongxu Luo, Kun Zhang (possible past Google (United States) affiliation), Guangyi Chen
Abstract

Despite the impressive results achieved by multimodal large language models (MLLMs), their training typically relies on jointly curated multimodal data, requiring substantial human effort to construct multi-way aligned datasets and thereby limiting scalability across domains. In this work, we explore training MLLMs by only leveraging multiple paired modalities as a surrogate for the full joint multimodal distribution. Specifically, we first provide a theoretical analysis of the conditions under ...

📄 A Dialogue between Causal and Traditional Representation Learning: Toward Mutual Benefits in a Unified Formulation
🗓️ Published: 5/20/2026
🔗 http://arxiv.org/abs/2605.21058v1
👥 Authors: Yan Li (possible past Tencent (China) affiliation), Yuewen Sun, Shaoan Xie, Gongxu Luo, Yunlong Deng, Kun Zhang (possible past Google (United States) affiliation), Guangyi Chen
Abstract

Causal representation learning (CRL) and traditional representation learning have largely developed along different trajectories. Traditional representation learning has been driven mainly by applications and empirical objectives, whereas CRL has focused more on theoretical questions, particularly identifiability. This difference in emphasis has created a gap between the two fields in terminology, problem formulation, and evaluation, limiting communication and sometimes leading to disconnected o...

📄 PlexRL: Cluster-Level Orchestration of Serviceized LLM Execution for RLVR
🗓️ Published: 5/20/2026
🔗 http://arxiv.org/abs/2605.20863v1
👥 Authors: Yiqi Zhang, Fangzheng Jiao, Tian Tang, Boyu Tian, Hangyu Wang, Qiaoling Chen, Guoteng Wang, Zhen Jiang, Peng Sun (possible past Tencent (China) affiliation), Ping Zhang, Xiaohe Hu, Ziming Liu (possible past Massachusetts Institute Of Technology affiliation), Menghao Zhang, Yanmin Jia, Yang You (possible past University Of California, Berkeley affiliation), Siyuan Feng (possible past Carnegie Mellon University affiliation)
Abstract

Reinforcement learning with verifiable rewards (RLVR) has recently unlocked strong reasoning capabilities in large language models (LLMs), triggering rapid exploration of new algorithms and data. However, RLVR training is notoriously inefficient: long-tailed rollouts, tool-induced stalls, and asymmetric resource requirements between rollout and training introduce substantial idle time that cannot be eliminated by job-local optimizations such as synchronous pipelining, asynchronous rollout, or co...

📄 Most Transformer Modifications Still Do Not Transfer at 1-3B: A 2020-2026 Update to Narang et al. (2021) with Downstream Evaluation and a Noise Floor
🗓️ Published: 5/20/2026
🔗 http://arxiv.org/abs/2605.20798v1
👥 Authors: Yang Zhao (possible past Google (United States) affiliation), Jiahao Lu, Bin Huang, Guhua Zhang, Jie Zhou (possible past Tsinghua University affiliation)
Abstract

Narang et al. (2021) evaluated 40+ Transformer modifications at T5-base scale and concluded that most did not transfer. Five years later, the typical working regime has moved to 1-3B parameters, downstream evaluation has replaced pretraining perplexity, and a substantially different catalogue of modifications has emerged. We revisit their question by testing 20 post-2021 Transformer modifications at 1.2B and 3B under strict iso-data, iso-compute, iso-recipe control, with a multi-seed baseline no...

📄 The Illusion of Intervention: Your LLM-Simulated Experiment is an Observational Study
🗓️ Published: 5/20/2026
🔗 http://arxiv.org/abs/2605.20767v1
👥 Authors: Victoria Lin, Taedong Yun (possible past Google (United States) affiliation), Maja Matarić, John Canny (possible past University Of California, Berkeley affiliation), Arthur Gretton, Alexander D'amour
Abstract

Large language models (LLMs) show potential as simulators of human behavior, offering a scalable way to study responses to interventions. However, because LLMs are trained largely on observational data, interventions in experiments with LLM-simulated synthetic users can induce unintended shifts in latent user attributes, causing user drift where the implicit simulated population differs across treatment conditions, potentially distorting effect estimates. We formalize the confounding or selectio...

📄 Reinforcing Human Behavior Simulation via Verbal Feedback
🗓️ Published: 5/19/2026
🔗 http://arxiv.org/abs/2605.20506v1
👥 Authors: Weiwei Sun, Xuhui Zhou, Jiarui Liu, Weihua Du, Haojia Sun, Yiqing Xie, Qianou Ma, Sihao Chen, Mengting Wan, Longqi Yang, Pei Zhou, Sherry Wu, Sean Welleck, Graham Neubig (possible past Carnegie Mellon University affiliation), Yiming Yang (possible past Microsoft (United States) affiliation), Maarten Sap
Abstract

Humans learn social norms and behaviors from verbal feedback (e.g., a parent saying "that was rude" or a friend explaining "here's why that hurt"). Yet, learning from feedback for LLMs has largely focused on domains like code and math, where RL rewards are directly verifiable and condensed into scalar values. As LLMs are increasingly used to simulate human behavior, e.g., standing in for users, patients, students, and other personas, there is a pressing need to make them more human-like, which r...

*Notable papers are those with at least two authors from a "big" AI/ML lab.
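
The "at least two authors" rule in the footnote can be sketched in a few lines of Python. This is only an illustration, not the page's actual filter: it assumes that an author counts as coming from a "big" lab exactly when the author line carries a "(possible past ... affiliation)" note, and the names `AFFILIATION_NOTE`, `count_flagged_authors`, and `is_notable` are hypothetical.

```python
import re

# Assumption: an author is "flagged" when followed by a note of the form
# "(possible past <institution> affiliation)", as in the author lines above.
# The non-greedy group tolerates nested parentheses such as "Google (United States)".
AFFILIATION_NOTE = re.compile(r"\(possible past (.+?) affiliation\)")


def count_flagged_authors(authors_line: str) -> int:
    """Count the affiliation notes that appear in one author line."""
    return len(AFFILIATION_NOTE.findall(authors_line))


def is_notable(authors_line: str, min_flagged: int = 2) -> bool:
    """Apply the footnote's rule: at least `min_flagged` flagged authors."""
    return count_flagged_authors(authors_line) >= min_flagged
```

Under this assumption, every author line in the list above carries two or more notes, so each entry would pass `is_notable`. Whether every listed institution really counts as a "big" lab is not stated on the page.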