πŸ“„ Notable* Recent AI/ML arXiv Papers

πŸ“„ Understanding Usage and Engagement in AI-Powered Scientific Research Tools: The Asta Interaction Dataset
πŸ—“οΈ Published: 2/26/2026
πŸ”— http://arxiv.org/abs/2602.23335v1
πŸ‘₯ Authors: Dany Haddad, Dan Bareket, Joseph Chee Chang, Jay DeYoung, Jena D. Hwang, Uri Katz, Mark Polak, Sangho Suh, Harshit Surana, Aryeh Tiktinsky, Shriya Atmakuri, Jonathan Bragg, Mike D'Arcy, Sergey Feldman, Amal Hassan-Ali, Rubén Lozano, Bodhisattwa Prasad Majumder (possible past Google (United States) affiliation), Charles McGrady, Amanpreet Singh (possible past Meta (United States) affiliation), Brooke Vlahos, Yoav Goldberg (possible past Google (United States) affiliation), Doug Downey (possible past Allen Institute for Artificial Intelligence affiliation)
Abstract

AI-powered scientific research tools are rapidly being integrated into research workflows, yet the field lacks a clear lens into how researchers use these systems in real-world settings. We present and analyze the Asta Interaction Dataset, a large-scale resource comprising over 200,000 user queries and interaction logs from two deployed tools (a literature discovery interface and a scientific question-answering interface) within an LLM-powered retrieval-augmented generation platform. Using this ...

πŸ“„ AgentDropoutV2: Optimizing Information Flow in Multi-Agent Systems via Test-Time Rectify-or-Reject Pruning
πŸ—“οΈ Published: 2/26/2026
πŸ”— http://arxiv.org/abs/2602.23258v1
πŸ‘₯ Authors: Yutong Wang, Siyuan Xiong, Xuebo Liu, Wenkang Zhou, Liang Ding, Miao Zhang (possible past Stanford University affiliation), Min Zhang (possible past Tsinghua University affiliation)
Abstract

While Multi-Agent Systems (MAS) excel in complex reasoning, they suffer from the cascading impact of erroneous information generated by individual participants. Current solutions often resort to rigid structural engineering or expensive fine-tuning, limiting their deployability and adaptability. We propose AgentDropoutV2, a test-time rectify-or-reject pruning framework designed to dynamically optimize MAS information flow without retraining. Our approach acts as an active firewall, intercepting ...
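The rectify-or-reject idea reads like a message filter sitting between agents: a verifier scores each message, borderline messages are repaired, and clearly bad ones are pruned before they propagate. A minimal sketch of that pattern, where the verifier, thresholds, and messages are toy stand-ins and not the paper's actual components:

```python
# Hypothetical sketch of test-time "rectify-or-reject" message filtering in a
# multi-agent pipeline. The verifier, thresholds, and rectifier below are
# illustrative assumptions, not AgentDropoutV2's real implementation.

def filter_message(msg, verify, rectify, accept=0.8, salvage=0.4):
    """Return the message (possibly rectified), or None to prune it."""
    score = verify(msg)
    if score >= accept:
        return msg              # confident: pass through unchanged
    if score >= salvage:
        return rectify(msg)     # borderline: repair and forward
    return None                 # low quality: reject from the information flow

# Toy verifier: penalize an obviously wrong claim, down-weight hedged text.
def toy_verify(msg):
    if "2+2=5" in msg:
        return 0.2
    return 0.6 if "maybe" in msg else 0.9

def toy_rectify(msg):
    return msg.replace("maybe ", "")

messages = [
    "the answer is 4",
    "maybe the answer is 4",
    "2+2=5 so the answer is 5",
]
kept = [m for m in (filter_message(m, toy_verify, toy_rectify) for m in messages)
        if m is not None]
print(kept)
```

Here the hedged message is rectified and forwarded while the erroneous one is dropped, so downstream agents never see it.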

πŸ“„ The Trinity of Consistency as a Defining Principle for General World Models
πŸ—“οΈ Published: 2/26/2026
πŸ”— http://arxiv.org/abs/2602.23152v1
πŸ‘₯ Authors: Jingxuan Wei, Siyuan Li (possible past Tencent (China) affiliation), Yuhang Xu, Zheng Sun, Junjie Jiang, Hexuan Jin, Caijun Jia, Honghao He, Xinglong Xu, Xi Bai, Chang Yu, Yumou Liu, Junnan Zhu, Xuanhe Zhou, Jintao Chen, Xiaobin Hu (possible past Tencent (China) affiliation), Shancheng Pang, Bihui Yu, Ran He, Zhen Lei (possible past Beijing Academy of Artificial Intelligence affiliation), Stan Z. Li, Conghui He (possible past Tsinghua University affiliation), Shuicheng Yan (possible past National University of Singapore affiliation), Cheng Tan
Abstract

The construction of World Models capable of learning, simulating, and reasoning about objective physical laws constitutes a foundational challenge in the pursuit of Artificial General Intelligence. Recent advancements represented by video generation models like Sora have demonstrated the potential of data-driven scaling laws to approximate physical dynamics, while the emerging Unified Multimodal Model (UMM) offers a promising architectural paradigm for integrating perception, language, and reaso...

πŸ“„ MoDora: Tree-Based Semi-Structured Document Analysis System
πŸ—“οΈ Published: 2/26/2026
πŸ”— http://arxiv.org/abs/2602.23061v1
πŸ‘₯ Authors: Bangrui Xu, Qihang Yao, Zirui Tang, Xuanhe Zhou, Yeye He, Shihan Yu, Qianqian Xu, Bin Wang, Guoliang Li (possible past Tsinghua University affiliation), Conghui He (possible past Tsinghua University affiliation), Fan Wu
Abstract

Semi-structured documents integrate diverse interleaved data elements (e.g., tables, charts, hierarchical paragraphs) arranged in various and often irregular layouts. These documents are widely observed across domains and account for a large portion of real-world data. However, existing methods struggle to support natural language question answering over these documents due to three main technical challenges: (1) The elements extracted by techniques like OCR are often fragmented and stripped of ...

πŸ“„ Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search
πŸ—“οΈ Published: 2/26/2026
πŸ”— http://arxiv.org/abs/2602.22983v1
πŸ‘₯ Authors: Xun Huang (possible past Nvidia (United States) affiliation), Simeng Qin, Xiaoshuang Jia, Ranjie Duan, Huanqian Yan, Zhitao Zeng, Fei Yang (possible past Meta (United States) affiliation), Yang Liu (possible past Tsinghua University affiliation), Xiaojun Jia
Abstract

As Large Language Models (LLMs) are increasingly deployed, their security risks have drawn growing attention. Existing research reveals that LLMs are highly susceptible to jailbreak attacks, with effectiveness varying across language contexts. This paper investigates the role of classical Chinese in jailbreak attacks. Owing to its conciseness and obscurity, classical Chinese can partially bypass existing safety constraints, exposing notable vulnerabilities in LLMs. Based on this observation, this...

πŸ“„ FactGuard: Agentic Video Misinformation Detection via Reinforcement Learning
πŸ—“οΈ Published: 2/26/2026
πŸ”— http://arxiv.org/abs/2602.22963v1
πŸ‘₯ Authors: Zehao Li, Hongwei Yu, Hao Jiang, Qiang Sheng, Yilong Xu, Baolong Bi, Yang Li (possible past Google (United States) affiliation), Zhenlong Yuan (possible past Tsinghua University affiliation), Yujun Cai, Zhaoqi Wang
Abstract

Multimodal large language models (MLLMs) have substantially advanced video misinformation detection through unified multimodal reasoning, but they often rely on fixed-depth inference and place excessive trust in internally generated assumptions, particularly in scenarios where critical evidence is sparse, fragmented, or requires external verification. To address these limitations, we propose FactGuard, an agentic framework for video misinformation detection that formulates verification as an ite...

πŸ“„ Unleashing the Potential of Diffusion Models for End-to-End Autonomous Driving
πŸ—“οΈ Published: 2/26/2026
πŸ”— http://arxiv.org/abs/2602.22801v1
πŸ‘₯ Authors: Yinan Zheng, Tianyi Tan, Bin Huang, Enguang Liu, Ruiming Liang, Jianlin Zhang, Jianwei Cui, Guang Chen, Kun Ma, Hangjun Ye, Long Chen (possible past Tencent (China) affiliation), Ya-Qin Zhang, Xianyuan Zhan, Jingjing Liu (possible past Microsoft (United States) affiliation)
Abstract

Diffusion models have become a popular choice for decision-making tasks in robotics, and more recently, are also being considered for solving autonomous driving tasks. However, their applications and evaluations in autonomous driving remain limited to simulation-based or laboratory settings. The full strength of diffusion models for large-scale, complex real-world settings, such as End-to-End Autonomous Driving (E2E AD), remains underexplored. In this study, we conducted a systematic and large-s...

πŸ“„ Generative Data Transformation: From Mixed to Unified Data
πŸ—“οΈ Published: 2/26/2026
πŸ”— http://arxiv.org/abs/2602.22743v1
πŸ‘₯ Authors: Jiaqing Zhang, Mingjia Yin, Hao Wang (possible past Tsinghua University affiliation), Yuxin Tian, Yuyang Ye, Yawen Li, Wei Guo, Yong Liu, Enhong Chen (possible past Baidu (China) affiliation)
Abstract

Recommendation model performance is intrinsically tied to the quality, volume, and relevance of the training data. To address common challenges like data sparsity and cold start, recent research has leveraged data from multiple auxiliary domains to enrich information within the target domain. However, inherent domain gaps can degrade the quality of mixed-domain data, leading to negative transfer and diminished model performance. The prevailing model-centric paradigm -- which reli...

πŸ“„ RLHFless: Serverless Computing for Efficient RLHF
πŸ—“οΈ Published: 2/26/2026
πŸ”— http://arxiv.org/abs/2602.22718v1
πŸ‘₯ Authors: Rui Wei, Hanfei Yu, Shubham Jain, Yogarajan Sivakumar, Devesh Tiwari, Jian Li (possible past Tencent (China) affiliation), Seung-Jong Park, Hao Wang (possible past Tsinghua University affiliation)
Abstract

Reinforcement Learning from Human Feedback (RLHF) has been widely applied to Large Language Model (LLM) post-training to align model outputs with human preferences. Recent models, such as DeepSeek-R1, have also shown RLHF's potential to improve LLM reasoning on complex tasks. In RL, inference and training co-exist, creating dynamic resource demands throughout the workflow. Compared to traditional RL, RLHF further challenges training efficiency due to expanding model sizes and resource consumptio...

πŸ“„ Reinforcing Real-world Service Agents: Balancing Utility and Cost in Task-oriented Dialogue
πŸ—“οΈ Published: 2/26/2026
πŸ”— http://arxiv.org/abs/2602.22697v1
πŸ‘₯ Authors: Ning Gao, Wei Zhang (possible past Tsinghua University affiliation), Yuqin Dai, Ling Shi, Ziyin Wang, Yujie Wang, Wei He (possible past Baidu (China) affiliation), Jinpeng Wang (possible past Tencent (China) affiliation), Chaozheng Wang
Abstract

The rapid evolution of Large Language Models (LLMs) has accelerated the transition from conversational chatbots to general agents. However, effectively balancing empathetic communication with budget-aware decision-making remains an open challenge. Since existing methods fail to capture these complex strategic trade-offs, we propose InteractCS-RL, a framework that reframes task-oriented dialogue as a multi-granularity reinforcement learning process. Specifically, we first establish a User-centric...

πŸ“„ dLLM: Simple Diffusion Language Modeling
πŸ—“οΈ Published: 2/26/2026
πŸ”— http://arxiv.org/abs/2602.22661v1
πŸ‘₯ Authors: Zhanhui Zhou, Lingjie Chen, Hanghang Tong (possible past IBM (United States) affiliation), Dawn Song (possible past University of California, Berkeley affiliation)
Abstract

Although diffusion language models (DLMs) are evolving quickly, many recent models converge on a set of shared components. These components, however, are distributed across ad-hoc research codebases or lack transparent implementations, making them difficult to reproduce or extend. As the field accelerates, there is a clear need for a unified framework that standardizes these common components while remaining flexible enough to support new methods and architectures. To address this gap, we intr...

πŸ“„ MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios
πŸ—“οΈ Published: 2/26/2026
πŸ”— http://arxiv.org/abs/2602.22638v1
πŸ‘₯ Authors: Zhiheng Song, Jingshuai Zhang, Chuan Qin (possible past Baidu (China) affiliation), Chao Wang (possible past Google (United States) affiliation), Chao Chen (possible past Tencent (China) affiliation), Longfei Xu, Kaikui Liu, Xiangxiang Chu, Hengshu Zhu (possible past Baidu (China) affiliation)
Abstract

Route-planning agents powered by large language models (LLMs) have emerged as a promising paradigm for supporting everyday human mobility through natural language interaction and tool-mediated decision making. However, systematic evaluation in real-world mobility settings is hindered by diverse routing demands, non-deterministic mapping services, and limited reproducibility. In this study, we introduce MobilityBench, a scalable benchmark for evaluating LLM-based route-planning agents in real-wor...

πŸ“„ ContextRL: Enhancing MLLM's Knowledge Discovery Efficiency with Context-Augmented RL
πŸ—“οΈ Published: 2/26/2026
πŸ”— http://arxiv.org/abs/2602.22623v1
πŸ‘₯ Authors: Xingyu Lu, Jinpeng Wang (possible past Tencent (China) affiliation), Yifan Zhang, Shijie Ma, Xiao Hu, Tianke Zhang, Haonan Fan, Kaiyu Jiang, Changyi Liu, Kaiyu Tang, Bin Wen, Fan Yang (possible past Tencent (China) affiliation), Tingting Gao, Han Li, Chun Yuan
Abstract

We propose ContextRL, a novel framework that leverages context augmentation to overcome these bottlenecks. Specifically, to enhance Identifiability, we provide the reward model with full reference solutions as context, enabling fine-grained process verification to filter out false positives (samples with the right answer but low-quality reasoning process). To improve Reachability, we introduce a multi-turn sampling strategy where the reward model generates mistake reports for failed attempts, gu...

πŸ“„ S2O: Early Stopping for Sparse Attention via Online Permutation
πŸ—“οΈ Published: 2/26/2026
πŸ”— http://arxiv.org/abs/2602.22575v1
πŸ‘₯ Authors: Yu Zhang (possible past Google (United States) affiliation), Songwei Liu, Chenqian Yan, Sheng Lin, Beichen Ning, Fangmin Chen, Xing Wang (possible past Tencent (China) affiliation)
Abstract

Attention scales quadratically with sequence length, fundamentally limiting long-context inference. Existing block-granularity sparsification can reduce latency, but coarse blocks impose an intrinsic sparsity ceiling, making further improvements difficult even with carefully engineered designs. We present S2O, which performs early stopping for sparse attention via online permutation. Inspired by virtual-to-physical address mapping in memory systems, S2O revisits and factorizes FlashAttention exe...

πŸ“„ Addressing Climate Action Misperceptions with Generative AI
πŸ—“οΈ Published: 2/26/2026
πŸ”— http://arxiv.org/abs/2602.22564v1
πŸ‘₯ Authors: Miriam Remshard, Yara Kyrychenko, Sander van der Linden (possible past University of Cambridge affiliation), Matthew H. Goldberg, Anthony Leiserowitz, Elena Savoia, Jon Roozenbeek (possible past University of Cambridge affiliation)
Abstract

Mitigating climate change requires behaviour change. However, even climate-concerned individuals often hold misperceptions about which actions most reduce carbon emissions. We recruited 1201 climate-concerned individuals to examine whether discussing climate actions with a large language model (LLM) equipped with climate knowledge and prompted to provide personalised responses would foster more accurate perceptions of the impacts of climate actions and increase willingness to adopt feasible, hig...

πŸ“„ ArchAgent: Agentic AI-driven Computer Architecture Discovery
πŸ—“οΈ Published: 2/25/2026
πŸ”— http://arxiv.org/abs/2602.22425v1
πŸ‘₯ Authors: Raghav Gupta (possible past Google (United States) affiliation), Akanksha Jain, Abraham Gonzalez, Alexander Novikov (possible past Google (United States) affiliation), Po-Sen Huang (possible past Google (United States) affiliation), Matej Balog (possible past DeepMind (United Kingdom) affiliation), Marvin Eisenberger, Sergey Shirobokov, Ngân Vũ, Martin Dixon, Borivoje Nikolić (possible past University of California, Berkeley affiliation), Parthasarathy Ranganathan (possible past Google (United States) affiliation), Sagar Karandikar
Abstract

Agile hardware design flows are a critically needed force multiplier to meet the exploding demand for compute. Recently, agentic generative AI systems have demonstrated significant advances in algorithm design, improving code efficiency, and enabling discovery across scientific domains. Bridging these worlds, we present ArchAgent, an automated computer architecture discovery system built on AlphaEvolve. We show ArchAgent's ability to automatically design/implement state-of-the-art (SoTA) cache...

πŸ“„ GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL
πŸ—“οΈ Published: 2/25/2026
πŸ”— http://arxiv.org/abs/2602.22190v1
πŸ‘₯ Authors: Rui Yang, Qianhui Wu, Zhaoyang Wang, Hanyang Chen, Ke Yang (possible past Google (United States) affiliation), Hao Cheng (possible past Tencent (China) affiliation), Huaxiu Yao, Baoling Peng, Huan Zhang, Jianfeng Gao (possible past Microsoft (United States) affiliation), Tong Zhang (possible past Tencent (China) affiliation)
Abstract

Open-source native GUI agents still lag behind closed-source systems on long-horizon navigation tasks. This gap stems from two limitations: a shortage of high-quality, action-aligned reasoning data, and the direct adoption of generic post-training pipelines that overlook the unique challenges of GUI agents. We identify two fundamental issues in these pipelines: (i) standard SFT with CoT reasoning often hurts grounding, and (ii) step-wise RLVR-style training faces partial verifiability, where mult...

πŸ“„ ParamMem: Augmenting Language Agents with Parametric Reflective Memory
πŸ—“οΈ Published: 2/26/2026
πŸ”— http://arxiv.org/abs/2602.23320v1
πŸ‘₯ Authors: Tianjun Yao, Yongqiang Chen, Yujia Zheng, Pan Li (possible past Baidu (China) affiliation), Zhiqiang Shen, Kun Zhang (possible past Google (United States) affiliation)
Abstract

Self-reflection enables language agents to iteratively refine solutions, yet often produces repetitive outputs that limit reasoning performance. Recent studies have attempted to address this limitation through various approaches, among which increasing reflective diversity has shown promise. Our empirical analysis reveals a strong positive correlation between reflective diversity and task success, further motivating the need for diverse reflection signals. We introduce ParamMem, a parametric mem...

πŸ“„ Generative Recommendation for Large-Scale Advertising
πŸ—“οΈ Published: 2/26/2026
πŸ”— http://arxiv.org/abs/2602.22732v1
πŸ‘₯ Authors: Ben Xue, Dan Liu (possible past Google (United States) affiliation), Lixiang Wang, Mingjie Sun, Peng Wang (possible past Peking University affiliation), Pengfei Zhang, Shaoyun Shi, Tianyu Xu, Yunhao Sha, Zhiqiang Liu, Bo Kong, Bo Wang (possible past Tencent (China) affiliation), Hang Yang, Jieting Xue, Junhao Wang, Shengyu Wang, Shuping Hui, Wencai Ye, Xiao Lin, Yongzhi Li, Yuhang Chen, Zhihui Yin, Quan Chen, Shiyang Wen, Wenjin Wu, Han Li, Guorui Zhou, Changcheng Li (possible past Tencent (China) affiliation), Peng Jiang
Abstract

Generative recommendation has recently attracted widespread attention in industry due to its potential for scaling and stronger model capacity. However, deploying real-time generative recommendation in large-scale advertising requires designs beyond large-language-model (LLM)-style training and serving recipes. We present a production-oriented generative recommender co-designed across architecture, learning, and serving, named GR4AD (Generative Recommendation for ADvertising). As for tokenizati...

πŸ“„ Vectorizing the Trie: Efficient Constrained Decoding for LLM-based Generative Retrieval on Accelerators
πŸ—“οΈ Published: 2/26/2026
πŸ”— http://arxiv.org/abs/2602.22647v1
πŸ‘₯ Authors: Zhengyang Su, Isay Katsman, Yueqi Wang, Ruining He, Lukasz Heldt (possible past Google (United States) affiliation), Raghunandan Keshavan, Shao-Chuan Wang, Xinyang Yi (possible past Google (United States) affiliation), Mingyan Gao, Onkar Dalal, Lichan Hong (possible past Google (United States) affiliation), Ed Chi, Ningren Han
Abstract

Generative retrieval has emerged as a powerful paradigm for LLM-based recommendation. However, industrial recommender systems often benefit from restricting the output space to a constrained subset of items based on business logic (e.g. enforcing content freshness or product category), which standard autoregressive decoding cannot natively support. Moreover, existing constrained decoding methods that make use of prefix trees (Tries) incur severe latency penalties on hardware accelerators (TPUs/G...
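The Trie-masking idea behind constrained decoding can be sketched in a few lines: at each step, only tokens that extend a valid prefix in the Trie keep their logits, and everything else is masked to negative infinity. The toy vocabulary, function names, and greedy loop below are illustrative assumptions, not the paper's vectorized accelerator implementation:

```python
# Hypothetical sketch of Trie-constrained greedy decoding. The Trie stores
# the allowed item-ID token sequences (e.g. after a business-logic filter);
# decoding can only emit sequences the Trie contains.
import math

def build_trie(sequences):
    """Build a nested-dict Trie from allowed token-ID sequences."""
    root = {}
    for seq in sequences:
        node = root
        for tok in seq:
            node = node.setdefault(tok, {})
    return root

def mask_logits(logits, trie_node):
    """Keep logits only for tokens that extend a valid prefix."""
    allowed = set(trie_node.keys())
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]

def constrained_greedy_decode(step_logits, trie):
    """Greedy decode, restricted to sequences stored in the Trie."""
    node, out = trie, []
    for logits in step_logits:
        masked = mask_logits(logits, node)
        tok = max(range(len(masked)), key=lambda i: masked[i])
        out.append(tok)
        node = node[tok]
        if not node:  # reached a leaf: a complete allowed item ID
            break
    return out

trie = build_trie([[1, 2, 3], [1, 4], [5, 0]])
logits = [
    [0.1, 0.9, 0.0, 0.0, 0.2, 0.8],  # step 1: tokens 1 and 5 are allowed
    [0.0, 0.0, 0.3, 0.0, 0.7, 0.0],  # step 2: tokens 2 and 4 are allowed
    [0.0, 0.0, 0.0, 0.5, 0.0, 0.0],  # step 3 (unused if a leaf is reached)
]
print(constrained_greedy_decode(logits, trie))  # → [1, 4]
```

The pointer-chasing through nested dicts here is exactly the data-dependent control flow that runs poorly on TPUs/GPUs, which is presumably what the paper's vectorized formulation avoids.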

πŸ“„ pQuant: Towards Effective Low-Bit Language Models via Decoupled Linear Quantization-Aware Training
πŸ—“οΈ Published: 2/26/2026
πŸ”— http://arxiv.org/abs/2602.22592v1
πŸ‘₯ Authors: Wenzheng Zhang, Bingzheng Liu, Yang Hu, Xiaoying Bai, Wentao Zhang (possible past Mila - Quebec Artificial Intelligence Institute affiliation), Bin Cui (possible past Peking University affiliation)
Abstract

Quantization-Aware Training from scratch has emerged as a promising approach for building efficient large language models (LLMs) with extremely low-bit weights (sub 2-bit), which can offer substantial advantages for edge deployment. However, existing methods still fail to achieve satisfactory accuracy and scalability. In this work, we identify a parameter democratization effect as a key bottleneck: the sensitivity of all parameters becomes homogenized, severely limiting expressivity. To address ...
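The general mechanism of quantization-aware training can be illustrated with a straight-through estimator (STE): the forward pass uses quantized weights, while gradients update full-precision shadow weights as if quantization were the identity. This sketch shows only that generic QAT pattern with a ternary (sub-2-bit) quantizer; pQuant's decoupled linear scheme is not reproduced here:

```python
# Generic QAT-with-STE sketch, not pQuant's method. Weights are quantized
# to {-1, 0, +1} * scale on the forward pass; the backward pass updates the
# full-precision shadow weights directly (straight-through estimator).
import numpy as np

def quantize_ternary(w, scale):
    """Map full-precision weights to {-1, 0, +1} * scale."""
    return scale * np.clip(np.round(w / scale), -1, 1)

rng = np.random.default_rng(0)
w = rng.normal(0, 0.5, size=(4,))    # full-precision shadow weights
x = np.array([1.0, -2.0, 0.5, 3.0])  # one toy input sample
y_true = 1.0
lr = 0.05

for _ in range(100):
    scale = np.abs(w).mean() + 1e-8  # per-tensor scale (a common heuristic)
    wq = quantize_ternary(w, scale)
    y = wq @ x                       # forward with quantized weights
    grad_y = 2 * (y - y_true)        # dLoss/dy for squared error
    w -= lr * grad_y * x             # STE: gradient flows to shadow weights
```

The "parameter democratization" bottleneck the abstract names would show up inside `quantize_ternary`: every weight is forced through the same coarse grid, homogenizing their effective sensitivity.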

πŸ“„ IBCircuit: Towards Holistic Circuit Discovery with Information Bottleneck
πŸ—“οΈ Published: 2/26/2026
πŸ”— http://arxiv.org/abs/2602.22581v1
πŸ‘₯ Authors: Tian Bian (possible past Tsinghua University affiliation), Yifan Niu, Chaohao Yuan, Chengzhi Piao, Bingzhe Wu, Long-Kai Huang, Yu Rong (possible past Tencent (China) affiliation), Tingyang Xu (possible past Tencent (China) affiliation), Hong Cheng, Jia Li (possible past Google (United States) affiliation)
Abstract

Circuit discovery has recently attracted attention as a potential research direction to explain the non-trivial behaviors of language models. It aims to find the computational subgraphs, also known as circuits, within the model that are responsible for solving specific tasks. However, most existing studies overlook the holistic nature of these circuits and require designing specific corrupted activations for different tasks, which is inaccurate and inefficient. In this work, we propose an end-to...

πŸ“„ A Synergistic Approach: Dynamics-AI Ensemble in Tropical Cyclone Forecasting
πŸ—“οΈ Published: 2/26/2026
πŸ”— http://arxiv.org/abs/2602.22533v1
πŸ‘₯ Authors: Yonghui Li, Wansuo Duan, Hao Li (possible past Tsinghua University affiliation), Wei Han (possible past Google (United States) affiliation), Han Zhang (possible past Tsinghua University affiliation), Yinuo Li
Abstract

This study addresses a critical challenge in AI-based weather forecasting by developing an AI-driven optimized ensemble forecast system using Orthogonal Conditional Nonlinear Optimal Perturbations (O-CNOPs). The system bridges the gap between computational efficiency and dynamic consistency in tropical cyclone (TC) forecasting. Unlike conventional ensembles limited by computational costs or AI ensembles constrained by inadequate perturbation methods, O-CNOPs generate dynamically optimized pertur...

πŸ“„ Global River Forecasting with a Topology-Informed AI Foundation Model
πŸ—“οΈ Published: 2/25/2026
πŸ”— http://arxiv.org/abs/2602.22293v1
πŸ‘₯ Authors: Hancheng Ren, Gang Zhao, Shuo Wang (possible past Nvidia (United States) affiliation), Louise Slater (possible past University of Oxford affiliation), Dai Yamazaki, Shu Liu (possible past Tencent (China) affiliation), Jingfang Fan, Shibo Cui, Ziming Yu, Shengyu Kang, Depeng Zuo, Dingzhi Peng, Zongxue Xu, Bo Pang
Abstract

River systems operate as inherently interconnected continuous networks, meaning river hydrodynamic simulation ought to be a systemic process. However, widespread hydrology data scarcity often restricts data-driven forecasting to isolated predictions. To achieve systemic simulation and reduce reliance on river observations, we present GraphRiverCast (GRC), a topology-informed AI foundation model designed to simulate multivariate river hydrodynamics in global river systems. GRC is capable of opera...

*Notable papers are those with at least two authors from a "big" AI/ML lab.