📄 Notable* Recent AI/ML arXiv Papers

📄 OpenJarvis: Personal AI, On Personal Devices
🗓️ Published: 5/16/2026
🔗 http://arxiv.org/abs/2605.17172v1
👥 Authors: Jon Saad-Falcon, Avanika Narayan, Robby Manihani, Tanvir Bhathal, Herumb Shandilya, Hakki Orhun Akengin, Gabriel Bo, Andrew Park, Matthew Hart, Caia Costello, Chuan Li, Christopher Ré (possible past Stanford University affiliation), Azalia Mirhoseini (possible past Google (United States) affiliation)
Abstract

Personal AI stacks, like OpenClaw and Hermes Agent, are becoming central to daily work, yet they route nearly every query (often over sensitive local data) to cloud-hosted frontier models. Replacing frontier models with local models inside existing stacks does not work: swapping Claude Opus 4.6 for Qwen3.5-9B drops accuracy by 25-39 pp across personal AI tasks like PinchBench and GAIA. Existing stacks bundle agentic prompts, tool descriptions, memory configuration, and runtime settings around a ...

📄 Scientific Logicality Enriched Methodology for LLM Reasoning: A Practice in Physics
🗓️ Published: 5/16/2026
🔗 http://arxiv.org/abs/2605.17104v1
👥 Authors: Zhaoxin Yu, Nan Xu (possible past Tsinghua University affiliation), Kun Chen, Jiahao Zhao, Lei Wang (possible past Baidu (China) affiliation), Wenji Mao
Abstract

With the continuous advancement of reasoning abilities in Large Language Models (LLMs), their application to scientific reasoning tasks has gained significant research attention. Current research primarily emphasizes boosting LLMs' performance on scientific QA benchmarks by training on larger, more comprehensive datasets with extended reasoning chains. However, these approaches neglect the essence of the scientific reasoning process -- logicality, which is the rational foundation to ensure the v...

📄 How to Instruct Your Robot: Dense Language Annotations Power Robot Policy Learning
🗓️ Published: 5/16/2026
🔗 http://arxiv.org/abs/2605.17077v1
👥 Authors: Bosung Kim, Ruiyi Wang, David Acuna (possible past University Of Toronto affiliation), Jaehun Jung, Alexander Trevithick, Brandon Cui, Yejin Choi (possible past Allen Institute For Artificial Intelligence affiliation), Prithviraj Ammanabrolu
Abstract

Scaling robot policy learning is bottlenecked by the cost of collecting demonstrations, while language annotations for existing demonstrations are comparatively cheap. We study language density as a lever for extracting more signal from a fixed robot or egocentric-video corpus. We introduce DeMiAn (Dense Multi-aspect Annotation), a two-stage approach that first re-labels demonstration segments with VLM-generated annotations along four complementary aspects: physical motion, scene composition, ar...

📄 Learning-Zone Energy: Online Data Selection for Efficient RL Post-Training
🗓️ Published: 5/16/2026
🔗 http://arxiv.org/abs/2605.17003v1
👥 Authors: Peng Cui (possible past Tsinghua University affiliation), Boyao Yang, Jun Zhu (possible past Tsinghua University affiliation)
Abstract

Reinforcement Learning (RL) post-training has emerged as the dominant paradigm for eliciting mathematical reasoning in Large Language Models (LLMs), yet prevailing techniques such as GRPO and DAPO distribute rollout and gradient budgets nearly uniformly across prompts, squandering compute on samples that are already mastered or remain far beyond the model's current capability. To address this fundamental inefficiency, we propose Learning-Zone Energy (LZE), a theoretically grounded, fully online ...

📄 Skills on the Fly: Test-Time Adaptive Skill Synthesis for LLM Agents
🗓️ Published: 5/16/2026
🔗 http://arxiv.org/abs/2605.16986v1
👥 Authors: Jingxing Wang, Chenyu Zhou, Zhihui Fu, Jun Wang (possible past Tencent (China) affiliation), Weiwen Liu, Weinan Zhang (possible past Shanghai Jiao Tong University affiliation), Jianghao Lin
Abstract

LLM agents benefit from reusable skills, yet test-time tasks often require guidance more specific than a static skill library can provide. We propose SkillTTA, a Test-Time Adaptive Skill Synthesis method that retrieves a small set of training trajectories relevant to the current task and synthesizes them into a temporary, task-specific textual skill. The solver model is kept fixed, so adaptation happens entirely through generated context rather than parameter updates. We evaluate the meth...

📄 Extending Pretrained 10-Second ECG Foundation Models to Longer Horizons
🗓️ Published: 5/16/2026
🔗 http://arxiv.org/abs/2605.16975v1
👥 Authors: Wei Tang, Jinpei Han, Kangning Cui, Mattia Carletti, Fredrik K. Gustafsson, Shreyank N Gowda (possible past University Of Edinburgh affiliation), Patitapaban Palo, Anshul Thakur, Lei Clifton, Jean-Michel Morel, Raymond H. Chan, David A. Clifton (possible past University Of Oxford affiliation), Xiao Gu
Abstract

Electrocardiogram (ECG) foundation models pretrained on typical diagnostic 10-second ECG segments have demonstrated strong transferability across a range of clinical applications. However, many real-world applications produce recordings that are typically longer and vary in duration at inference time. These 10-second models have no built-in way to combine information across time. Extending them to longer horizons introduces two challenges: structural incompatibilities arising from in...

📄 Harnessing AI for Inverse Partial Differential Equation Problems: Past, Present, and Prospects
🗓️ Published: 5/16/2026
🔗 http://arxiv.org/abs/2605.16966v1
👥 Authors: Zhentao Tan, Yuze Hao, Boyi Zou, Mingsheng Long (possible past Tsinghua University affiliation), Yi Yang (possible past Baidu (China) affiliation), Gang Bao
Abstract

Solving inverse partial differential equation (PDE) problems is a fundamental topic in scientific research due to its broad significance across a wide range of real-world applications. Inverse PDE problems arise across medical imaging, geophysics, materials science, and aerodynamics, where the goal is to infer hidden causes, design structures, or control physical states. In this paper, we provide a comprehensive review of recent advances in solving inverse PDE problems using artificial intellige...

📄 OmniVL-Guard Pro: A Tool-Augmented Agent for Omnibus Vision-Language Forensics
🗓️ Published: 5/16/2026
🔗 http://arxiv.org/abs/2605.16962v1
👥 Authors: Jinjie Shen, Zheng Huang, Yuchen Zhang (possible past University Of California, Berkeley affiliation), Yujiao Wu, Yaxiong Wang (possible past Tencent (China) affiliation), Lechao Cheng, Shengeng Tang, Tianrui Hui, Nan Pu, Zhun Zhong
Abstract

Existing vision-language forgery detection and grounding methods operate under a closed-world paradigm, assuming verification can be completed by the model alone. However, self-contained MLLMs are constrained by finite parametric knowledge, static training corpora, and limited perceptual resolution, creating a practical ceiling in dynamic open-world forensics -- particularly for real-time event verification requiring external clues and forgery segmentation demanding fine-grained scrutiny of loca...

📄 From Static Risk to Dynamic Trajectories: Toward World-Model-Inspired Clinical Prediction
🗓️ Published: 5/16/2026
🔗 http://arxiv.org/abs/2605.16927v1
👥 Authors: Pujun Feng, Xiaoyu Guo, Seyed Ehsan Saffari, Min Hun Lee, Siew-Kei Lam, Erik Cambria, Xibin Sun, Yangtao Zhou, Tong Yang (possible past Peking University affiliation), Xiaoyu Zhang, Tao Tan, Yue Sun, Bin Cui (possible past Peking University affiliation)
Abstract

Clinical decision-making is a feedback system where risk estimates influence treatment, which in turn changes disease trajectories, and both shape clinicians' measurement practices. Static prediction often fails clinically: models trained on observational care logs conflate disease biology with clinician behavior, particularly under treatment confounder feedback and irregular or informative observation. This Review focuses on intervention-aware disease trajectory modeling in clinical AI--methods...

📄 VGGT-CD: Training-Free Robust Registration for 3D Change Detection
🗓️ Published: 5/16/2026
🔗 http://arxiv.org/abs/2605.16859v1
👥 Authors: Wei Zhang (possible past Tsinghua University affiliation), Songhua Li, Yihang Wu, Qiang Li, Qi Wang (possible past Tsinghua University affiliation)
Abstract

3D change detection from multi-view images is essential for urban monitoring, disaster assessment, and autonomous driving. However, existing methods predominantly operate in the 2D domain, where viewpoint variations are mistaken for physical changes and depth is unavailable. While visual geometry foundation models like VGGT rapidly produce dense point clouds from unposed images, independent per-epoch reconstruction encounters fundamental obstacles: unpredictable inter-epoch scale ambiguity, regi...

📄 Sketch Then Paint: Hierarchical Reinforcement Learning for Diffusion Multi-Modal Large Language Models
🗓️ Published: 5/16/2026
🔗 http://arxiv.org/abs/2605.16842v1
👥 Authors: Siqi Luo, Jianghan Shen, Yi Xin, Huayu Zheng, Haoxing Chen, Yan Tai, Yue Li, Junjun He, Yihao Liu, Guangtao Zhai (possible past Shanghai Jiao Tong University affiliation), Yuewen Cao, Xiaohong Liu (possible past Shanghai Jiao Tong University affiliation)
Abstract

Diffusion Multi-Modal Large Language Models (dMLLMs) are powerful for image generation, but optimizing them through reinforcement learning (RL) remains a major challenge. One primary difficulty is that a single image can be generated through many different unmasking sequences, which makes calculating importance ratios often intractable. Additionally, existing methods tend to ignore the hierarchical generation process of dMLLMs, where early tokens define the global layout and later tokens focus o...

📄 Multi-Paradigm Agent Interaction in Practice: A Systematic Analysis of Generator-Evaluator, ReAct Loop, and Adversarial Evaluation in the buddyMe Framework
🗓️ Published: 5/16/2026
🔗 http://arxiv.org/abs/2605.16821v1
👥 Authors: Xiaohua Wang, Chao Han, Kai Yu (possible past Baidu (China) affiliation), Xiaoliang Xu, Liang Wang (possible past Tencent (China) affiliation)
Abstract

The rapid evolution of Large Language Model (LLM) agents has produced diverse interaction paradigms, yet few production systems integrate multiple paradigms within a unified architecture. This paper presents a systematic analysis of three principal agent interaction paradigms, including Multi-Agent Orchestration (Generator-Evaluator), ReAct Tool-Use Loops, and Memory-Augmented Interaction, as implemented in buddyMe, an open-source multi-model agent programming framework. We formalize a five-stag...

📄 AgentKernelArena: Generalization-Aware Benchmarking of GPU Kernel Optimization Agents
🗓️ Published: 5/16/2026
🔗 http://arxiv.org/abs/2605.16819v1
👥 Authors: Sharareh Younesian, Wenwen Ouyang, Sina Rafati, Mehdi Rezagholizadeh, Sharon Zhou, Ji Liu (possible past Tencent (China) affiliation), Yue Liu, Yuchen Yang, Hao Li (possible past Tsinghua University affiliation), Ziqiong Liu, Dong Li, Vikram Appia, Zhenyu Gu, Emad Barsoum
Abstract

GPU kernel optimization is increasingly critical for efficient deep learning systems, but writing high-performance kernels still requires substantial low-level expertise. Recent AI coding agents can iteratively read code, invoke compilers and profilers, and refine implementations, yet existing kernel benchmarks evaluate single LLM calls rather than full agent workflows, and none include both kernel-to-kernel optimization and unseen-configuration generalization testing. We present AgentKernelAren...

📄 CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows?
🗓️ Published: 5/15/2026
🔗 http://arxiv.org/abs/2605.16679v1
👥 Authors: Haolin Chen, Deon Metelski, Leon Qi, Tao Xia, Joonyul Lee, Steve Brown, Kevin Riley, Frank Wang, T. Y. Alvin Liu, Hank Capps Md, Zeyu Tang, Xiangchen Song, Lingjing Kong, Fan Feng, Tianyi Zeng, Zhiwei Liu, Zixian Ma, Hang Jiang, Fangli Geng, Yuan Yuan, Chenyu You, Qingsong Wen, Hua Wei (possible past Google (United States) affiliation), Yanjie Fu, Yue Zhao, Carl Yang, Biwei Huang, Kun Zhang (possible past Google (United States) affiliation), Caiming Xiong (possible past Salesforce (United States) affiliation), Sanmi Koyejo, Eric P. Xing, Philip S. Yu (possible past Tsinghua University affiliation), Weiran Yao
Abstract

End-to-end automation of realistic healthcare operations stresses three capabilities underrepresented in current benchmarks: policy density (decisions must be grounded in a large library of medical, insurance, and operational rules); multi-role composition (a single task requires the agent to play multiple roles with handoffs); and multilateral interaction (intermediate workflow steps are multi-turn dialogs, such as peer-to-peer review and patient outreach). We introduce $χ$-Bench, a benchmark of l...

📄 Sustainable Intelligence for the Wild: Democratizing Ecological Monitoring via Knowledge-Adaptive Edge Expert Agents
🗓️ Published: 5/15/2026
🔗 http://arxiv.org/abs/2605.16671v1
👥 Authors: Jiaxing Li, Hao Fang (possible past University Of Washington affiliation), Chi Xu, Miao Zhang (possible past Stanford University affiliation), Jiangchuan Liu, William I. Atlas, Katrina M. Connors, Mark A. Spoljaric
Abstract

Rapid biodiversity loss underscores the urgency of effective monitoring, yet manual surveys remain resource-intensive. While on-device AI offers a scalable alternative, its performance in the wild is often challenged by environmental variability. Current methods rely heavily on cloud resources, which requires continuous uploading of field data for model retraining. This approach is unsuitable for remote deployments because it consumes limited power and network connectivity. To address these constr...

📄 TTE-Flash: Accelerating Reasoning-based Multimodal Representations via Think-Then-Embed Tokens
🗓️ Published: 5/15/2026
🔗 http://arxiv.org/abs/2605.16638v1
👥 Authors: Jianpeng Cheng, Xian Wu (possible past Tencent (China) affiliation), Jiangfan Zhang, Wentao Bao, Chaitanya Ahuja, Shlok Kumar Mishra, Hanchao Yu, Yang Gao (possible past Tencent (China) affiliation), Fan Xia (possible past Tencent (China) affiliation), Qi Guo, Shaodan Zhai, Xiangjun Fan, Jun Xiao
Abstract

Recent research has demonstrated that Universal Multimodal Embedding (UME) benefits significantly from Chain-of-Thought (CoT) reasoning. In this paradigm, a generative model produces explicit reasoning traces for a multimodal query, with the final representation extracted from an embedding token attending to both the query and the reasoning. Despite its effectiveness, the computational overhead of generating explicit CoT traces is often prohibitive. In this work, we propose replacing expli...

📄 Hypergraph Pattern Machine: Compositional Tokenization for Higher-Order Interactions
🗓️ Published: 5/15/2026
🔗 http://arxiv.org/abs/2605.16527v1
👥 Authors: Kyrie Zhao, Zehong Wang, Tianyi Ma, Fang Wu, Xiangru Tang (possible past University Of Cambridge affiliation), Pietro Lio, Sheng Wang (possible past Tencent (China) affiliation), Yanfang Ye
Abstract

Hypergraphs model higher-order relations that drive real-world decisions, from drug prescriptions to recommendations. A central structural signal in such data, beyond what pairwise relations can express, is interaction compositionality: whether a higher-order relation is compositional, emergent, or inhibitory with respect to its observed or unobserved sets. In polypharmacy, the regime decides whether a drug should be dropped, kept, or excluded: a compositional drug triple can be safely simplifie...

📄 IVGT: Implicit Visual Geometry Transformer for Neural Scene Representation
🗓️ Published: 5/15/2026
🔗 http://arxiv.org/abs/2605.16258v1
👥 Authors: Yuqi Wu, Tianyu Hu, Wenzhao Zheng, Yuanhui Huang, Haowen Sun, Jie Zhou (possible past Tsinghua University affiliation), Jiwen Lu (possible past Tsinghua University affiliation)
Abstract

Reconstructing coherent 3D geometry and appearance from unposed multi-view images is a fundamental yet challenging problem in computer vision. Most existing visual geometry foundation models predict explicit geometry by regressing pixel-aligned pointmaps, often suffering from redundancy and limited geometric continuity. We propose IVGT, an Implicit Visual Geometry Transformer that implicitly models continuous and coherent geometry from pose-free multi-view images. This formulation learns a conti...

📄 MoleCode unlocks structural intelligence in large language models
🗓️ Published: 5/15/2026
🔗 http://arxiv.org/abs/2605.16480v1
👥 Authors: Zhiyuan Yan, Chen Liu, Boxuan Zhao, Kaiqing Lin, Jixiang Zhao, Yimi Wang, Liuzhenghao Lv, Hao Li (possible past Tsinghua University affiliation), Shanzhuo Zhang (possible past Baidu (China) affiliation), Li Yuan (possible past National University Of Singapore affiliation), Fanyang Mo
Abstract

Molecules are graphs, but large language models (LLMs) are usually asked to reason about them through linear strings. The most popular molecular representation, SMILES, compresses atoms, bonds, branches and rings into a compact sequence in which topology is implicit, forcing LLMs to reconstruct molecular structure before performing the requested chemical operation. Here we introduce MoleCode, an LLM-native, training-free, graph-explicit molecular language in which all molecular components are re...

📄 Look Before You Leap: Autonomous Exploration for LLM Agents
🗓️ Published: 5/15/2026
🔗 http://arxiv.org/abs/2605.16143v1
👥 Authors: Ziang Ye, Wentao Shi, Yuxin Liu, Yu Wang (possible past Tsinghua University affiliation), Zhengzhou Cai, Yaorui Shi, Qi Gu, Xunliang Cai, Fuli Feng (possible past National University Of Singapore affiliation)
Abstract

Large language model based agents often fail in unfamiliar environments due to premature exploitation: a tendency to act on prior knowledge before acquiring sufficient environment-specific information. We identify autonomous exploration as a critical yet underexplored capability for building adaptive agents. To formalize and quantify this capability, we introduce Exploration Checkpoint Coverage, a verifiable metric that measures how broadly an agent discovers key states, objects, and affordances...

📄 GenShield: Unified Detection and Artifact Correction for AI-Generated Images
🗓️ Published: 5/15/2026
🔗 http://arxiv.org/abs/2605.16122v1
👥 Authors: Zhipei Xu, Xuanyu Zhang, Youmin Xu, Qing Huang, Shen Chen, Taiping Yao (possible past Tencent (China) affiliation), Shouhong Ding (possible past Tencent (China) affiliation), Jian Zhang (possible past Tencent (China) affiliation)
Abstract

Diffusion-based image synthesis has made AI-generated images (AIGI) increasingly photorealistic, raising urgent concerns about authenticity in applications such as misinformation detection, digital forensics, and content moderation. Despite the substantial advances in AIGI detection, how to correct detected AI-generated images with visible artifacts and restore realistic appearance remains largely underexplored. Moreover, few existing works have established the connection between AIGI detection an...

📄 DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention
🗓️ Published: 5/18/2026
🔗 http://arxiv.org/abs/2605.18753v1
👥 Authors: Yuxiang Huang, Nuno M. T. Gonçalves, Federico Alvetreti, Lei Li (possible past Carnegie Mellon University affiliation), Xu Han (possible past Tsinghua University affiliation), Edoardo M. Ponti, André F. T. Martins, Marcos V. Treviso
Abstract

Current hierarchical attention methods, such as NSA and InfLLMv2, select the top-k relevant key-value (KV) blocks based on coarse attention scores and subsequently apply fine-grained softmax attention on the selected tokens. However, the top-k operation assumes the number of relevant tokens for any query is fixed and it precludes the gradient flow between the sparse and dense stages. In this work, we propose DashAttention (Differentiable and Adaptive Sparse Hierarchical Attention), which leverag...

📄 ESI-Bench: Towards Embodied Spatial Intelligence that Closes the Perception-Action Loop
🗓️ Published: 5/18/2026
🔗 http://arxiv.org/abs/2605.18746v1
👥 Authors: Yining Hong, Jiageng Liu, Han Yin, Manling Li, Leonidas Guibas (possible past Stanford University affiliation), Li Fei-Fei (possible past Stanford University affiliation), Jiajun Wu (possible past Massachusetts Institute Of Technology affiliation), Yejin Choi (possible past Allen Institute For Artificial Intelligence affiliation)
Abstract

Spatial intelligence unfolds through a perception-action loop: agents act to acquire observations, and reason about how observations vary as a function of action. Rather than passively processing what is seen, they actively uncover what is unseen - occluded structure, dynamics, containment, and functionality that cannot be resolved from passive sensing alone. We move beyond prior formulations of spatial intelligence that assume oracle observations by recasting the observer as an actor. We introd...

📄 Post-Trained MoE Can Skip Half Experts via Self-Distillation
🗓️ Published: 5/18/2026
🔗 http://arxiv.org/abs/2605.18643v1
👥 Authors: Xingtai Lv, Li Sheng (possible past Google (United States) affiliation), Kaiyan Zhang, Yichen You, Siyan Gao, Xueheng Luo, Yuxin Zuo, Yuchen Fan, Junlin Yang, Ganqu Cui (possible past Tsinghua University affiliation), Bingning Wang, Fan Yang (possible past Tencent (China) affiliation), Youbang Sun, Ning Ding (possible past Tsinghua University affiliation), Bowen Zhou
Abstract

Mixture-of-Experts (MoE) scales language models efficiently through sparse expert activation, and its dynamic variant further reduces computation by adjusting the activated experts in an input-dependent manner. Existing dynamic MoE methods usually rely on pre-training from scratch or task-specific adaptation, leaving the practical conversion of fully trained MoE underexplored. Enabling such adaptation would directly alleviate the inference costs by allowing easy tokens to bypass unnecessary expe...

📄 S2Aligner: Pair-Efficient and Transferable Pre-Training for Sparse Text-Attributed Graphs
🗓️ Published: 5/18/2026
🔗 http://arxiv.org/abs/2605.18579v1
👥 Authors: Yuhan Wang (possible past Tencent (China) affiliation), Haopeng Zhang, Yibo Ding, Jiaqi Yu, Xinyu Zhao, Yuhang Liu, Ziwei Zhang, Xiao Wang (possible past Google (United States) affiliation), Ruijie Wang
Abstract

Pre-training on text-attributed graphs (TAGs) is central to building transferable graph foundation models, where LLM-as-Aligner methods align graph and text representations through the semantic knowledge of large language models. However, these methods usually assume that node texts provide sufficient and reliable supervision, an assumption often violated in real-world sparse TAGs. When textual anchors are missing, noisy, or uneven across domains, graph structures must be aligned with weak seman...

📄 scHelix: Asymmetric Dual-Stream Integration via Explicit Gene-Level Disentanglement
🗓️ Published: 5/18/2026
🔗 http://arxiv.org/abs/2605.18576v1
👥 Authors: Xichen Yan, Zelin Zang, Changxi Chi, Jingbo Zhou (possible past Baidu (China) affiliation), Chang Yu, Jinlin Wu, Shenghui Cheng, Fuji Yang, Jiebo Luo, Zhen Lei (possible past Beijing Academy Of Artificial Intelligence affiliation), Stan Z. Li
Abstract

A critical challenge in single-cell RNA sequencing (scRNA-seq) integration is resolving the tension between eliminating batch effects and maintaining biological fidelity. While recent evidence indicates that batch effects manifest heterogeneously across genes, most existing methods process the transcriptome uniformly, frequently resulting in over-correction and loss of subtle biological signals. To address this, we present scHelix, a dataset-adaptive framework that fundamentally changes how feat...

📄 Continuous Diffusion Scales Competitively with Discrete Diffusion for Language
🗓️ Published: 5/18/2026
🔗 http://arxiv.org/abs/2605.18530v1
👥 Authors: Zhihan Yang, Wei Guo, Shuibai Zhang, Subham Sekhar Sahoo, Yongxin Chen, Arash Vahdat (possible past Nvidia (United States) affiliation), Morteza Mardani (possible past Nvidia (United States) affiliation), John Thickstun
Abstract

While diffusion has drawn considerable recent attention from the language modeling community, continuous diffusion has appeared less scalable than discrete approaches. To challenge this belief we revisit Plaid, a likelihood-based continuous diffusion language model (DLM), and construct RePlaid by aligning the architecture of Plaid with modern discrete DLMs. In this unified setting, we establish the first scaling law for continuous DLMs that rivals discrete DLMs: RePlaid exhibits a compute gap of...

📄 GAMMA: Global Bit Allocation for Mixed-Precision Models under Arbitrary Budgets
🗓️ Published: 5/18/2026
🔗 http://arxiv.org/abs/2605.18475v1
👥 Authors: Zhangyang Yao, Haiyan Zhao, Haoyu Wang (possible past Tencent (China) affiliation), Tianbo Huang, Lihua Zhang, Xu Han (possible past Tsinghua University affiliation)
Abstract

Mixed-precision quantization improves the budget-accuracy trade-off for large language models (LLMs) by allocating more bits to sensitive modules. However, automating this allocation at LLM scale faces a unique combination of constraints: learnable approaches require quantization-aware training, which is infeasible for billion-parameter models; training-free alternatives rely on static proxy metrics that miss cross-module interactions and must be recomputed per target budget; and search-based m...

📄 Prompt2Fingerprint: Plug-and-Play LLM Fingerprinting via Text-to-Weight Generation
🗓️ Published: 5/18/2026
🔗 http://arxiv.org/abs/2605.18474v1
👥 Authors: Sixu Chen, Xiang Chen (possible past Tencent (China) affiliation), Hongyao Yu, Jiaxin Hong, Hao Fang (possible past University Of Washington affiliation), Shuoyang Sun, Bin Chen, Shu-Tao Xia
Abstract

The widespread deployment and redistribution of large language models (LLMs) have made model provenance tracking a critical challenge. While existing LLM fingerprinting methods, particularly active approaches that embed identity signals via fine-tuning, achieve high accuracy and robustness, they suffer from significant scalability bottlenecks. These methods typically treat fingerprint injection as an independent, one-off optimization task rather than a reusable capability, necessitating separate...

📄 EvoMemBench: Benchmarking Agent Memory from a Self-Evolving Perspective
🗓️ Published: 5/18/2026
🔗 http://arxiv.org/abs/2605.18421v1
👥 Authors: Yuyao Wang, Zhongjian Zhang, Mo Chi, Kaichi Yu, Yuhan Li, Miao Peng, Bing Tong, Chen Zhang (possible past Peking University affiliation), Yan Zhou, Jia Li (possible past Google (United States) affiliation)
Abstract

Recent benchmarks for Large Language Model (LLM) agents mainly evaluate reasoning, planning, and execution. However, memory is also essential for agents, as it enables them to store, update, and retrieve information over time. This ability remains under-evaluated, largely because existing benchmarks do not provide a systematic way to assess memory mechanisms. In this paper, we study agent memory from a self-evolving perspective and introduce EvoMemBench, a unified benchmark organized along two a...

📄 Dual-Rate Diffusion: Accelerating diffusion models with an interleaved heavy-light network
🗓️ Published: 5/18/2026
🔗 http://arxiv.org/abs/2605.18190v1
👥 Authors: Grigory Bartosh, David Ruhe, Emiel Hoogeboom, Jonathan Heek (possible past Google (United States) affiliation), Thomas Mensink, Tim Salimans (possible past Openai (United States) affiliation)
Abstract

Diffusion models achieve state-of-the-art generative performance but suffer from high computational costs during inference due to the repeated evaluation of a heavy neural network. In this work, we propose Dual-Rate Diffusion, a method to accelerate sampling by interleaving the execution of a heavy high-capacity context encoder and a light efficient denoising model. The context encoder is evaluated sparsely to extract high-dimensional features, which are effectively reused by the light denoising...

📄 Attention Sinks and Outliers in Attention Residuals
🗓️ Published: 5/18/2026
🔗 http://arxiv.org/abs/2605.17887v1
👥 Authors: Haozheng Luo, Haoran Dai, Shaoyang Zhang, Xi Chen (possible past University Of California, Berkeley affiliation), Eric Hanchen Jiang, Yijiang Li, Jingyuan Huang, Chenghao Qiu, Chenwei Xu, Zhenyu Pan, Haotian Zhang (possible past Stanford University affiliation), Binghui Wang, Yan Chen
Abstract

We propose OASIS, an outlier- and sink-aware technique built on inter-layer null signaling. As AttnResidual architectures introduce an additional depth-wise normalization channel, they improve inter-layer routing flexibility but also exacerbate attention sinks, activation outliers, and the resulting degradation in inference stability and quantization robustness. OASIS addresses this issue by introducing a Softmax1-based null space and coupling token-level null evidence to depth routing through a...

📄 SNLP: Layer-Parallel Inference via Structured Newton Corrections
🗓️ Published: 5/18/2026
🔗 http://arxiv.org/abs/2605.17842v1
👥 Authors: Ligong Han (possible past Google (United States) affiliation), Kai Xu (possible past National University Of Defense Technology affiliation), Hao Wang (possible past Tsinghua University affiliation), Akash Srivastava
Abstract

Autoregressive language models execute Transformer layers sequentially, creating a latency bottleneck that is not removed by conventional tensor or pipeline parallelism. We study whether this layerwise dependency can be relaxed by treating the hidden-state trace across layers as the solution of a nonlinear residual equation and solving it with parallel Newton-style updates. While this view is principled, exact Newton corrections require expensive Jacobian-vector products and naive fixed-point it...

*Notable papers are those with at least two authors from a "big" AI/ML lab.