📄 Notable* Recent AI/ML arXiv Papers


📄 Unified 4D World Action Modeling from Video Priors with Asynchronous Denoising
🗓️ Published: 4/29/2026
🔗 http://arxiv.org/abs/2604.26694v1
👥 Authors: Jun Guo, Qiwei Li (possible past Tsinghua University affiliation), Peiyan Li, Zilong Chen, Nan Sun, Yifei Su, Heyun Wang, Yuan Zhang (possible past Google (United States) affiliation), Xinghang Li, Huaping Liu (possible past Tsinghua University affiliation)
Abstract

We propose X-WAM, a Unified 4D World Model that unifies real-time robotic action execution and high-fidelity 4D world synthesis (video + 3D reconstruction) in a single framework, addressing the critical limitations of prior unified world models (e.g., UWM) that only model 2D pixel-space and fail to balance action efficiency and world modeling quality. To leverage the strong visual priors of pretrained video diffusion models, X-WAM imagines the future world by predicting multi-view RGB-D videos, ...

📄 SciHorizon-DataEVA: An Agentic System for AI-Readiness Evaluation of Heterogeneous Scientific Data
🗓️ Published: 4/29/2026
🔗 http://arxiv.org/abs/2604.26645v1
👥 Authors: Dianyu Liu, Chuan Qin (possible past Baidu (China) affiliation), Xi Chen (possible past University of California, Berkeley affiliation), Xiaohan Li, Wenxi Xu, Yuyang Wang, Xin Chen (possible past Tencent (China) affiliation), Yuanchun Zhou, Hengshu Zhu (possible past Baidu (China) affiliation)
Abstract

AI-for-Science (AI4Science) is increasingly transforming scientific discovery by embedding machine learning models into prediction, simulation, and hypothesis generation workflows across domains. However, the effectiveness of these models is fundamentally constrained by the AI-readiness of scientific data, for which no scalable and systematic evaluation mechanism currently exists. In this work, we propose SciHorizon-DataEVA, a novel agentic system for scalable AI-readiness evaluation of heterogen...

📄 TimeMM: Time-as-Operator Spectral Filtering for Dynamic Multimodal Recommendation
🗓️ Published: 4/29/2026
🔗 http://arxiv.org/abs/2604.26247v1
👥 Authors: Wei Yang (possible past Tencent (China) affiliation), Rui Zhong, Zihan Lin, Xiaodan Wang, Cheng Chen (possible past Google (United States) affiliation), Huan Ren, Yao Hu
Abstract

Multimodal recommendation improves user modeling by integrating collaborative signals with heterogeneous item content. In real applications, user interests evolve over time and exhibit nonstationary dynamics, where different preference factors change at different rates. This challenge is amplified in multimodal settings because visual and textual cues can dominate decisions under different temporal regimes. Despite strong progress, most multimodal recommenders still rely on static interaction gr...

📄 AMMA: A Multi-Chiplet Memory-Centric Architecture for Low-Latency 1M Context Attention Serving
🗓️ Published: 4/28/2026
🔗 http://arxiv.org/abs/2604.26103v1
👥 Authors: Zhongkai Yu, Haotian Ye (possible past Peking University affiliation), Chenyang Zhou, Ohm Rishabh Venkatachalam, Zaifeng Pan, Zhengding Hu, Junsung Kim (possible past Carnegie Mellon University affiliation), Won Woo Ro, Po-An Tsai, Shuyi Pei, Yangwook Kang, Yufei Ding
Abstract

All current LLM serving systems place the GPU at the center, from production-level attention-FFN disaggregation to NVIDIA's Rubin GPU-LPU heterogeneous platform. Even academic PIM/PNM proposals still treat the GPU as the central hub for cross-device communication. Yet the GPU's compute-rich architecture is fundamentally mismatched with the memory-bound nature of decode-phase attention, inflating serving latency while wasting power and die area on idle compute units. The problem is compounded as ...

📄 Recursive Multi-Agent Systems
🗓️ Published: 4/28/2026
🔗 http://arxiv.org/abs/2604.25917v1
👥 Authors: Xiyuan Yang, Jiaru Zou, Rui Pan, Ruizhong Qiu, Pan Lu (possible past Baidu (China) affiliation), Shizhe Diao, Jindong Jiang, Hanghang Tong (possible past IBM (United States) affiliation), Tong Zhang (possible past Tencent (China) affiliation), Markus J. Buehler, Jingrui He, James Zou
Abstract

Recursive or looped language models have recently emerged as a new scaling axis by iteratively refining the same model computation over latent states to deepen reasoning. We extend this scaling principle from a single model to multi-agent systems, and ask: Can agent collaboration itself be scaled through recursion? To this end, we introduce RecursiveMAS, a recursive multi-agent framework that casts the entire system as a unified latent-space recursive computation. RecursiveMAS connects heterogen...

📄 Action-Aware Generative Sequence Modeling for Short Video Recommendation
🗓️ Published: 4/28/2026
🔗 http://arxiv.org/abs/2604.25834v1
👥 Authors: Wenhao Li, Zihan Lin, Zhengxiao Guo, Jie Zhou (possible past Tsinghua University affiliation), Shukai Liu (possible past Tencent (China) affiliation), Yongqi Liu, Chuan Luo, Chaoyi Ma, Ruiming Tang (possible past Huawei Technologies (China) affiliation), Han Li
Abstract

With the rapid development of the Internet, users have increasingly higher expectations for the recommendation accuracy of online content consumption platforms. However, short videos often contain diverse segments, and users may not hold the same attitude toward all of them. Traditional binary-classification recommendation models, which treat a video as a single holistic entity, face limitations in accurately capturing such nuanced preferences. Considering that user consumption is a temporal pro...

📄 MAIC-UI: Making Interactive Courseware with Generative UI
🗓️ Published: 4/28/2026
🔗 http://arxiv.org/abs/2604.25806v1
👥 Authors: Shangqing Tu, Yanjia Li, Keyu Chen, Sichen Zhang, Jifan Yu, Daniel Zhang-Li, Lei Hou (possible past Tsinghua University affiliation), Juanzi Li, Yu Zhang (possible past Google (United States) affiliation), Huiqin Liu
Abstract

Creating interactive STEM courseware traditionally requires HTML/CSS/JavaScript expertise, leaving barriers for educators. While generative AI can produce HTML code, existing tools generate static presentations rather than interactive simulations, struggle with long documents, and lack pedagogical accuracy mechanisms. Furthermore, full regeneration for modifications requires 200-600 seconds, disrupting creative flow. We present MAIC-UI, a zero-code authoring system that enables educators to cr...

📄 Learning Generalizable Multimodal Representations for Software Vulnerability Detection
🗓️ Published: 4/28/2026
🔗 http://arxiv.org/abs/2604.25711v1
👥 Authors: Zeming Dong, Yuejun Guo, Qiang Hu, Yao Zhang (possible past Tsinghua University affiliation), Maxime Cordy, Hao Liu (possible past Tencent (China) affiliation), Mike Papadakis, Yongqiang Lyu
Abstract

Source code and its accompanying comments are complementary yet naturally aligned modalities: code encodes structural logic while comments capture developer intent. However, existing vulnerability detection methods mostly rely on single-modality code representations, overlooking the complementary semantic information embedded in comments and thus limiting their generalization across complex code structures and logical relationships. To address this, we propose MultiVul, a multimodal contrastive f...

📄 LLM-ReSum: A Framework for LLM Reflective Summarization through Self-Evaluation
🗓️ Published: 4/28/2026
🔗 http://arxiv.org/abs/2604.25665v1
👥 Authors: Huyen Nguyen (possible past Nvidia (United States) affiliation), Haoxuan Zhang, Yang Zhang (possible past Tsinghua University affiliation), Junhua Ding, Haihua Chen
Abstract

Reliable evaluation of large language model (LLM)-generated summaries remains an open challenge, particularly across heterogeneous domains and document lengths. We conduct a comprehensive meta-evaluation of 14 automatic summarization metrics and LLM-based evaluators across seven datasets spanning five domains, covering documents from short news articles to long scientific, governmental, and legal texts (2K-27K words) with over 1,500 human-annotated summaries. Our results show that traditional le...

📄 Marco-MoE: Open Multilingual Mixture-of-Expert Language Models with Efficient Upcycling
🗓️ Published: 4/28/2026
🔗 http://arxiv.org/abs/2604.25578v1
👥 Authors: Fan Jiang (possible past Shanghai Jiao Tong University affiliation), Yu Zhao (possible past Tencent (China) affiliation), Chenyang Lyu, Tianqi Shi, Yichao Du, Feihu Jiang, Longyue Wang (possible past Tencent (China) affiliation), Weihua Luo
Abstract

We present Marco-MoE, a suite of fully open multilingual sparse Mixture-of-Experts (MoE) models. Marco-MoE features a highly sparse design in which only around 5% of the total parameters are activated per input token. This extreme sparsity, combined with upcycling from dense models, enables efficient pre-training on 5T tokens. Our models surpass similarly-sized competitors on English and multilingual benchmarks, achieving a best-in-class performance-to-compute ratio. We further post-train these...

📄 SciEval: A Benchmark for Automatic Evaluation of K-12 Science Instructional Materials
🗓️ Published: 4/28/2026
🔗 http://arxiv.org/abs/2604.25472v1
👥 Authors: Zhaohui Li, Peng He (possible past Tencent (China) affiliation), Zhiyuan Chen (possible past Google (United States) affiliation), Honglu Liu, Zeyuan Wang, Tingting Li, Jinjun Xiong
Abstract

The need to evaluate instructional materials for K-12 science education has become increasingly important, as more educators use generative AI to create instructional materials. However, the review of instructional materials is time-consuming, expertise-intensive, and difficult to scale, motivating interest in automated evaluation approaches. While large language models (LLMs) have shown strong performance on general evaluation tasks, their performance and reliability on instructional materials ...

📄 AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery
🗓️ Published: 4/28/2026
🔗 http://arxiv.org/abs/2604.25256v1
👥 Authors: Lei Xiong, Kun Luo, Ziyi Xia, Wenbo Zhang, Jin-Ge Yao, Zheng Liu, Jingying Shao, Jianlyu Chen, Hongjin Qian, Xi Yang, Qian Yu (possible past Google (United States) affiliation), Hao Li (possible past Tsinghua University affiliation), Chen Yue, Xiaan Du, Yuyang Wang, Yesheng Liu, Haiyu Xu, Zhicheng Dou
Abstract

Autonomous scientific research has advanced significantly thanks to the development of AI agents. One key step in this process is finding the right scientific literature, whether to explore existing knowledge for a research problem, or to acquire evidence for verifying assumptions and supporting claims. To assess AI agents' capability in driving this process, we present AutoResearchBench, a dedicated benchmark for autonomous scientific literature discovery. AutoResearchBench consists of two compl...

📄 Unifying Sparse Attention with Hierarchical Memory for Scalable Long-Context LLM Serving
🗓️ Published: 4/29/2026
🔗 http://arxiv.org/abs/2604.26837v1
👥 Authors: Zihan Zhao, Baotong Lu, Shengjie Lin, Yizou Chen, Jing Liu (possible past Baidu (China) affiliation), Yanqi Zhang, Ziming Miao, Ming-Chang Yang, Haiying Shen, Qi Chen (possible past Baidu (China) affiliation), Fan Yang (possible past Tencent (China) affiliation)
Abstract

Long-context LLM serving is bottlenecked by the cost of attending over ever-growing KV caches. Dynamic sparse attention promises relief by accessing only a small, query-dependent subset of the KV state per decoding step and extending the KV storage to CPU memory. In practice, however, these algorithmic savings rarely translate into end-to-end system-level gains because sparse methods typically operate at different granularities and thus rely on ad hoc, per-algorithm implementations. At the same ...

📄 Quantum Feature Selection with Higher-Order Binary Optimization on Trapped-Ion Hardware
🗓️ Published: 4/29/2026
🔗 http://arxiv.org/abs/2604.26834v1
👥 Authors: Carlos Flores-Garrigós, Anton Simen, Qi Zhang (possible past Tencent (China) affiliation), Enrique Solano, Narendra N. Hegade, Sayonee Ray, Claudio Girotto, Jason Iaconis, Martin Roetteler (possible past Microsoft (United States) affiliation)
Abstract

We present a quantum feature-selection framework based on a higher-order unconstrained binary optimization (HUBO) formulation that explicitly incorporates multivariate dependencies beyond standard quadratic encodings. In contrast to QUBO-based approaches, the proposed model includes one-, two-, and three-body interaction terms derived from mutual-information measures, enabling the objective function to capture feature relevance, pairwise redundancy, and higher-order statistical structure within ...

📄 CurEvo: Curriculum-Guided Self-Evolution for Video Understanding
🗓️ Published: 4/29/2026
🔗 http://arxiv.org/abs/2604.26707v1
👥 Authors: Guiyi Zeng, Junqing Yu, Yi-Ping Phoebe Chen, Xu Chen (possible past Tencent (China) affiliation), Wei Yang (possible past Tencent (China) affiliation), Zikai Song
Abstract

Recent advances in self-evolution video understanding frameworks have demonstrated the potential of autonomous learning without human annotations. However, existing methods often suffer from weakly controlled optimization and uncontrolled difficulty progression, as they lack structured guidance throughout the iterative learning process. To address these limitations, we propose CurEvo, a curriculum-guided self-evolution framework that introduces curriculum learning into self-evolution to achieve ...

📄 Understanding DNNs in Feature Interaction Models: A Dimensional Collapse Perspective
🗓️ Published: 4/29/2026
🔗 http://arxiv.org/abs/2604.26489v1
👥 Authors: Jiancheng Wang, Mingjia Yin, Hao Wang (possible past Tsinghua University affiliation), Enhong Chen (possible past Baidu (China) affiliation)
Abstract

DNNs have gained widespread adoption in feature interaction recommendation models. However, there has been a longstanding debate on their roles. On one hand, some works claim that DNNs possess the ability to implicitly capture high-order feature interactions. Conversely, recent studies have highlighted the limitations of DNNs in effectively learning dot products, specifically second-order interactions, let alone higher-order interactions. In this paper, we present a novel perspective to understa...

*Notable papers are those with at least two authors from a "big" AI/ML lab.
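
The notability rule above can be sketched in a few lines of Python. This is only an illustration, not the page's actual filter: it assumes that every "(possible past ... affiliation)" annotation in an author line marks a "big" lab, and the names `AFFILIATION`, `tagged_affiliations`, and `is_notable` are hypothetical.

```python
import re

# Assumption: each "(possible past X affiliation)" annotation marks an author
# with a past affiliation at a "big" AI/ML lab. The real list of labs is not
# given on this page, so no lab names are hard-coded here.
AFFILIATION = re.compile(r"\(possible past (.+?) affiliation\)")


def tagged_affiliations(author_line: str) -> list[str]:
    """Return the affiliation strings annotated in one author line."""
    return AFFILIATION.findall(author_line)


def is_notable(author_line: str, min_tagged: int = 2) -> bool:
    """Apply the 'at least two authors from a big lab' rule to one paper."""
    return len(tagged_affiliations(author_line)) >= min_tagged


example = (
    "Jiancheng Wang, Mingjia Yin, "
    "Hao Wang (possible past Tsinghua University affiliation), "
    "Enhong Chen (possible past Baidu (China) affiliation)"
)
print(tagged_affiliations(example))
print(is_notable(example))
```

The non-greedy `(.+?)` group lets the pattern cope with affiliations that contain their own parentheses, such as "Baidu (China)".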