📄 Notable* Recent AI/ML arXiv Papers

📄 Synthetic Computers at Scale for Long-Horizon Productivity Simulation
🗓️ Published: 4/30/2026
🔗 http://arxiv.org/abs/2604.28181v1
👥 Authors: Tao Ge, Baolin Peng, Hao Cheng (possible past Tencent (China) affiliation), Jianfeng Gao (possible past Microsoft (United States) affiliation)
Abstract

Realistic long-horizon productivity work is strongly conditioned on user-specific computer environments, where much of the work context is stored and organized through directory structures and content-rich artifacts. To scale synthetic data creation for such productivity scenarios, we introduce Synthetic Computers at Scale, a scalable methodology for creating such environments with realistic folder hierarchies and content-rich artifacts (e.g., documents, spreadsheets, and presentations). Conditi...

📄 Intern-Atlas: A Methodological Evolution Graph as Research Infrastructure for AI Scientists
🗓️ Published: 4/30/2026
🔗 http://arxiv.org/abs/2604.28158v1
👥 Authors: Yujun Wu, Dongxu Zhang, Xinchen Li, Jinhang Xu, Yiling Duan, Yumou Liu, Jiabao Pan, Xuanhe Zhou, Jingxuan Wei, Siyuan Li (possible past Tencent (China) affiliation), Jintao Chen, Conghui He (possible past Tsinghua University affiliation), Cheng Tan
Abstract

Existing research infrastructure is fundamentally document-centric, providing citation links between papers but lacking explicit representations of methodological evolution. In particular, it does not capture the structured relationships that explain how and why research methods emerge, adapt, and build upon one another. With the rise of AI-driven research agents as a new class of consumers of scientific knowledge, this limitation becomes increasingly consequential, as such agents cannot reliabl...

📄 Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows
🗓️ Published: 4/30/2026
🔗 http://arxiv.org/abs/2604.28139v1
👥 Authors: Chenxin Li, Zhengyang Tang, Huangxin Lin, Yunlong Lin, Shijue Huang, Shengyuan Liu, Bowen Ye, Rang Li, Lei Li (possible past Carnegie Mellon University affiliation), Benyou Wang (possible past Tencent (China) affiliation), Yixuan Yuan
Abstract

LLM agents are expected to complete end-to-end units of work across software tools, business services, and local workspaces. Yet many agent benchmarks freeze a curated task set at release time and grade mainly the final response, making it difficult to evaluate agents against evolving workflow demand or verify whether a task was executed. We introduce Claw-Eval-Live, a live benchmark for workflow agents that separates a refreshable signal layer, updated across releases from public workflow-deman...

📄 Crab: A Semantics-Aware Checkpoint/Restore Runtime for Agent Sandboxes
🗓️ Published: 4/30/2026
🔗 http://arxiv.org/abs/2604.28138v1
👥 Authors: Tianyuan Wu, Chaokun Chang, Lunxi Cao, Wei Gao (possible past Peking University affiliation), Wei Wang (possible past University of Oxford affiliation)
Abstract

Autonomous agents act through sandboxed containers and microVMs whose state spans filesystems, processes, and runtime artifacts. Checkpoint and restore (C/R) of this state is needed for fault tolerance, spot execution, RL rollout branching, and safe rollback, yet existing approaches fall into two extremes: application-level recovery preserves chat history but misses OS-side effects, while full per-turn checkpointing is correct but too expensive under dense co-location. The root cause is an agent-...

📄 From Mirage to Grounding: Towards Reliable Multimodal Circuit-to-Verilog Code Generation
🗓️ Published: 4/30/2026
🔗 http://arxiv.org/abs/2604.27969v1
👥 Authors: Guang Yang, Xing Hu (possible past Baidu (China) affiliation), Xiang Chen (possible past Tencent (China) affiliation), Xin Xi
Abstract

Multimodal large language models (MLLMs) are increasingly used to translate visual artifacts into code, from UI mockups into HTML to scientific plots into Python scripts. A circuit diagram can be viewed as a visual domain-specific language for hardware: it encodes timing, topology, and bit-level semantics that are invisible to casual inspection yet safety-critical once fabricated in silicon. Translating such diagrams into register-transfer-level (RTL) code therefore represents an extreme reliabil...

📄 CastFlow: Learning Role-Specialized Agentic Workflows for Time Series Forecasting
🗓️ Published: 4/30/2026
🔗 http://arxiv.org/abs/2604.27840v1
👥 Authors: Bokai Pan, Mingyue Cheng, Zhiding Liu, Shuo Yu, Xiaoyu Tao, Yuchong Wu, Qi Liu (possible past Tencent (China) affiliation), Defu Lian, Enhong Chen (possible past Baidu (China) affiliation)
Abstract

Recently, large language models (LLMs) have shown great promise in time series forecasting. However, most existing LLM-based forecasting methods still follow a static generative paradigm that directly maps historical observations to future values in a single pass. Under this paradigm, forecasting is constrained by limited temporal pattern extraction, single-round acquisition of contextual features, one-shot forecast generation, and lack of support from ensemble forecasts. To address these limita...

📄 AgentEconomist: An End-to-end Agentic System Translating Economic Intuitions into Executable Computational Experiments
🗓️ Published: 4/30/2026
🔗 http://arxiv.org/abs/2604.27725v1
👥 Authors: Jiaju Chen, Jinghua Piao, Xia Xu, Songwei Li, Tong Xia, Xiangnan He (possible past National University of Singapore affiliation), Yong Li (possible past Tsinghua University affiliation)
Abstract

A long-standing challenge in economics lies not in the lack of intuition, but in the difficulty of translating intuitive insights into verifiable research. To address this challenge, we introduce AgentEconomist, an end-to-end interactive system designed to translate abstract intuitions into executable computational experiments. Grounded in a domain-specific knowledge base covering over 13,000 high-quality academic papers, the system employs a modular multi-stage architecture. Specifically, the I...

📄 Bridging Values and Behavior: A Hierarchical Framework for Proactive Embodied Agents
🗓️ Published: 4/30/2026
🔗 http://arxiv.org/abs/2604.27699v1
👥 Authors: Chunhui Zhang, Yuxuan Wang (possible past Google (United States) affiliation), Aoyang Qin, Yi-Long Lu, Kunlun Wu, Yizhou Wang (possible past Peking University affiliation), Wei Wang (possible past University of Oxford affiliation)
Abstract

Current embodied agents are often limited to passive instruction-following or reactive need-satisfaction, lacking a stable, high-order value framework essential for long-term, self-directed behavior and resolving motivational conflicts. We introduce ValuePlanner, a hierarchical cognitive architecture that decouples high-level value scheduling from low-level action execution. ValuePlanner employs an LLM-based cognitive module to generate symbolic subgoals by reasoning through ab...

📄 PRTS: A Primitive Reasoning and Tasking System via Contrastive Representations
🗓️ Published: 4/30/2026
🔗 http://arxiv.org/abs/2604.27472v1
👥 Authors: Yang Zhang (possible past Tsinghua University affiliation), Jiangyuan Zhao, Chenyou Fan, Fangzheng Yan, Tian Li (possible past Carnegie Mellon University affiliation), Haitong Tang, Sen Fu, Xuan'er Wu, Qizhen Weng, Weinan Zhang (possible past Shanghai Jiao Tong University affiliation), Xiu Li (possible past Tsinghua University affiliation), Chi Zhang (possible past Peking University affiliation), Chenjia Bai, Xuelong Li (possible past Tencent (China) affiliation)
Abstract

Vision-Language-Action (VLA) models advance robotic control via strong visual-linguistic priors. However, existing VLAs predominantly frame pretraining as supervised behavior cloning, overlooking the fundamental nature of robot learning as a goal-reaching process that requires understanding temporal task progress. We present PRTS (Primitive Reasoning and Tasking System), a VLA foundation model that reformulates pretraining through Goal-Conditioned Rei...

📄 COHERENCE: Benchmarking Fine-Grained Image-Text Alignment in Interleaved Multimodal Contexts
🗓️ Published: 4/30/2026
🔗 http://arxiv.org/abs/2604.27389v1
👥 Authors: Bingli Wang, Huanze Tang, Haijun Lv (possible past Baidu (China) affiliation), Zhishan Lin, Lixin Gu, Lei Feng, Qipeng Guo, Kai Chen (possible past Shanghai Jiao Tong University affiliation)
Abstract

In recent years, Multimodal Large Language Models (MLLMs) have achieved remarkable progress on a wide range of multimodal benchmarks. Despite these advances, most existing benchmarks mainly focus on single-image or multi-image comprehension. In real-world scenarios such as document reading, information is often presented as interleaved multimodal contexts. This requires MLLMs not only to recognize the content of individual images, but also to identify relevant textual and visual evidence, establ...

📄 When 2D Tasks Meet 1D Serialization: On Serialization Friction in Structured Tasks
🗓️ Published: 4/29/2026
🔗 http://arxiv.org/abs/2604.27272v1
👥 Authors: Chung-Hsiang Lo, Lu Li, Diji Yang, Tianyu Zhang, Yunkai Zhang, Yoshua Bengio (possible past Mila - Quebec Artificial Intelligence Institute affiliation), Yi Zhang (possible past Google (United States) affiliation)
Abstract

Large language models (LLMs) conventionally process structured inputs as 1D token sequences. While natural for prose, such linearization may introduce additional representational burden for tasks whose computation depends directly on explicit 2D structure, because row-column alignment and local neighborhoods are no longer directly expressed in the input. We study this setting, which we refer to as serialization friction, on a small diagnostic testbed of synthetic tasks with explicit 2D structur...

📄 Web2BigTable: A Bi-Level Multi-Agent LLM System for Internet-Scale Information Search and Extraction
🗓️ Published: 4/29/2026
🔗 http://arxiv.org/abs/2604.27221v1
👥 Authors: Yuxuan Huang, Yihang Chen, Zhiyuan He, Yuxiang Chen, Ka Yiu Lee, Huichi Zhou, Weilin Luo, Meng Fang (possible past Tencent (China) affiliation), Jun Wang (possible past Tencent (China) affiliation)
Abstract

Agentic web search increasingly faces two distinct demands: deep reasoning over a single target, and structured aggregation across many entities and heterogeneous sources. Current systems struggle on both fronts. Breadth-oriented tasks demand schema-aligned outputs with wide coverage and cross-entity consistency, while depth-oriented tasks require coherent reasoning over long, branching search trajectories. We introduce Web2BigTable, a multi-agent framework for web-to-table search that ...

📄 Cost-Aware Learning
🗓️ Published: 4/30/2026
🔗 http://arxiv.org/abs/2604.28020v1
👥 Authors: Clara Mohri, Amir Globerson (possible past Google (United States) affiliation), Haim Kaplan, Tomer Koren, Yishay Mansour (possible past Google (United States) affiliation)
Abstract

We consider the problem of Cost-Aware Learning, where sampling different component functions of a finite-sum objective incurs different costs. The objective is to reach a target error while minimizing the total cost. First, we propose the Cost-Aware Stochastic Gradient Descent algorithm for convex functions, and derive its cost complexity to attain an error of $ε$. Furthermore, we establish a lower bound for this setting and provide a subset selection algorithm to further reduce the cost of trai...

📄 Beyond the Baseband: Adaptive Multi-Band Encoding for Full-Spectrum Bioacoustics Classification
🗓️ Published: 4/30/2026
🔗 http://arxiv.org/abs/2604.27936v1
👥 Authors: Eklavya Sarkar, Marius Miron, David Robinson, Gagan Narula, Milad Alizadeh, Ellen Gilsenan-Mcmahon, Emmanuel Chemla, Olivier Pietquin (possible past Google (United States) affiliation), Matthieu Geist (possible past Google (United States) affiliation)
Abstract

Animals hear and vocalize across frequency ranges that differ substantially from humans, often extending into the ultrasonic domain. Yet most computational bioacoustics systems rely on audio models pre-trained at 16 kHz, restricting their usable bandwidth to the 0-8 kHz baseband and discarding higher-frequency information present in many bioacoustic recordings. We investigate a multi-band encoding framework that decomposes the full spectrum of animal calls into band features and fuses them into ...

📄 Optimized Deferral for Imbalanced Settings
🗓️ Published: 4/30/2026
🔗 http://arxiv.org/abs/2604.27723v1
👥 Authors: Corinna Cortes (possible past Google (United States) affiliation), Anqi Mao, Mehryar Mohri (possible past Google (United States) affiliation), Yutao Zhong
Abstract

Learning algorithms can be significantly improved by routing complex or uncertain inputs to specialized experts, balancing accuracy with computational cost. This approach, known as learning to defer, is essential in domains like natural language generation, medical diagnosis, and computer vision, where an effective deferral can reduce errors at low extra resource consumption. However, the two-stage learning to defer setting, which leverages existing predictors such as a collection of LLMs or oth...

📄 Low Rank Adaptation for Adversarial Perturbation
🗓️ Published: 4/30/2026
🔗 http://arxiv.org/abs/2604.27487v1
👥 Authors: Han Liu (possible past Tsinghua University affiliation), Shanghao Shi, Yevgeniy Vorobeychik, Chongjie Zhang, Ning Zhang (possible past University of California, Berkeley affiliation)
Abstract

Low-Rank Adaptation (LoRA), which leverages the insight that model updates typically reside in a low-dimensional space, has significantly improved the training efficiency of Large Language Models (LLMs) by updating neural network layers using low-rank matrices. Since the generation of adversarial examples is an optimization process analogous to model training, this naturally raises the question: Do adversarial perturbations exhibit a similar low-rank structure? In this paper, we provide both t...

📄 Distributional Alignment Games for Answer-Level Fine-Tuning
🗓️ Published: 4/29/2026
🔗 http://arxiv.org/abs/2604.27166v1
👥 Authors: Mehryar Mohri (possible past Google (United States) affiliation), Jon Schneider, Yifan Wu (possible past Carnegie Mellon University affiliation)
Abstract

We focus on the problem of Answer-Level Fine-Tuning (ALFT), where the goal is to optimize a language model based on the correctness or properties of its final answers, rather than the specific reasoning traces used to produce them. Directly optimizing answer-level objectives is computationally intractable due to the need to marginalize over the vast space of latent reasoning paths. To overcome this, we propose a general game-theoretical framework that lifts the problem to a Distribu...

📄 Unifying Sparse Attention with Hierarchical Memory for Scalable Long-Context LLM Serving
🗓️ Published: 4/29/2026
🔗 http://arxiv.org/abs/2604.26837v1
👥 Authors: Zihan Zhao, Baotong Lu, Shengjie Lin, Yizou Chen, Jing Liu (possible past Baidu (China) affiliation), Yanqi Zhang, Ziming Miao, Ming-Chang Yang, Haiying Shen, Qi Chen (possible past Baidu (China) affiliation), Fan Yang (possible past Tencent (China) affiliation)
Abstract

Long-context LLM serving is bottlenecked by the cost of attending over ever-growing KV caches. Dynamic sparse attention promises relief by accessing only a small, query-dependent subset of the KV state per decoding step and extending the KV storage to CPU memory. In practice, however, these algorithmic savings rarely translate into end-to-end system-level gains because sparse methods typically operate at different granularities and thus rely on ad hoc, per-algorithm implementations. At the same ...

📄 Quantum Feature Selection with Higher-Order Binary Optimization on Trapped-Ion Hardware
🗓️ Published: 4/29/2026
🔗 http://arxiv.org/abs/2604.26834v1
👥 Authors: Carlos Flores-Garrigós, Anton Simen, Qi Zhang (possible past Tencent (China) affiliation), Enrique Solano, Narendra N. Hegade, Sayonee Ray, Claudio Girotto, Jason Iaconis, Martin Roetteler (possible past Microsoft (United States) affiliation)
Abstract

We present a quantum feature-selection framework based on a higher-order unconstrained binary optimization (HUBO) formulation that explicitly incorporates multivariate dependencies beyond standard quadratic encodings. In contrast to QUBO-based approaches, the proposed model includes one-, two-, and three-body interaction terms derived from mutual-information measures, enabling the objective function to capture feature relevance, pairwise redundancy, and higher-order statistical structure within ...

📄 CurEvo: Curriculum-Guided Self-Evolution for Video Understanding
🗓️ Published: 4/29/2026
🔗 http://arxiv.org/abs/2604.26707v1
👥 Authors: Guiyi Zeng, Junqing Yu, Yi-Ping Phoebe Chen, Xu Chen (possible past Tencent (China) affiliation), Wei Yang (possible past Tencent (China) affiliation), Zikai Song
Abstract

Recent advances in self-evolution video understanding frameworks have demonstrated the potential of autonomous learning without human annotations. However, existing methods often suffer from weakly controlled optimization and uncontrolled difficulty progression, as they lack structured guidance throughout the iterative learning process. To address these limitations, we propose CurEvo, a curriculum-guided self-evolution framework that introduces curriculum learning into self-evolution to achieve ...

📄 SciHorizon-DataEVA: An Agentic System for AI-Readiness Evaluation of Heterogeneous Scientific Data
🗓️ Published: 4/29/2026
🔗 http://arxiv.org/abs/2604.26645v1
👥 Authors: Dianyu Liu, Chuan Qin (possible past Baidu (China) affiliation), Xi Chen (possible past University of California, Berkeley affiliation), Xiaohan Li, Wenxi Xu, Yuyang Wang, Xin Chen (possible past Tencent (China) affiliation), Yuanchun Zhou, Hengshu Zhu (possible past Baidu (China) affiliation)
Abstract

AI-for-Science (AI4Science) is increasingly transforming scientific discovery by embedding machine learning models into prediction, simulation, and hypothesis generation workflows across domains. However, the effectiveness of these models is fundamentally constrained by the AI-readiness of scientific data, for which no scalable and systematic evaluation mechanism currently exists. In this work, we propose SciHorizon-DataEVA, a novel agentic system for scalable AI-readiness evaluation of heterogen...

*Notable papers are those with at least two authors from a "big" AI/ML lab.
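
The page does not say how this rule is applied. As a rough sketch only, the check below assumes that an author counts toward the threshold whenever the listing annotates them with "(possible past ... affiliation)". The function names `annotated_affiliations` and `is_notable` are hypothetical, and the page's actual list of "big" labs is not given here.

```python
import re

# Assumption: every "(possible past <institution> affiliation)" annotation
# marks an author who counts toward the "big lab" threshold. The real
# criterion used by the page is not stated.
AFFILIATION = re.compile(r"\(possible past (.+?) affiliation\)")


def annotated_affiliations(authors_line: str) -> list[str]:
    """Return the institutions named in the affiliation annotations."""
    return AFFILIATION.findall(authors_line)


def is_notable(authors_line: str, threshold: int = 2) -> bool:
    """True when at least `threshold` authors carry an annotation."""
    return len(annotated_affiliations(authors_line)) >= threshold


if __name__ == "__main__":
    line = (
        "Tao Ge, Baolin Peng, "
        "Hao Cheng (possible past Tencent (China) affiliation), "
        "Jianfeng Gao (possible past Microsoft (United States) affiliation)"
    )
    print(annotated_affiliations(line))
    print(is_notable(line))
```

The lazy `(.+?)` group is there so that institution names containing their own parentheses, such as "Tencent (China)", are captured whole.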