📄 Notable* Recent AI/ML arXiv Papers

📄 GPIC: A Giant Permissive Image Corpus for Visual Generation
🗓️ Published: 5/28/2026
🔗 http://arxiv.org/abs/2605.30341v1
👥 Authors: Keshigeyan Chandrasegaran, Kyle Sargent, Suchir Agarwal, Michael Jang, Michael Poli, Juan Carlos Niebles (possible past Stanford University affiliation), Justin Johnson (possible past Stanford University affiliation), Jiajun Wu (possible past Massachusetts Institute Of Technology affiliation), Li Fei-Fei (possible past Stanford University affiliation)
Abstract

Studying scalable methods for visual generative modeling requires large, accessible, and stable datasets. We introduce GPIC, a Giant Permissive Image Corpus of approximately 28 trillion pixels. GPIC comprises diverse internet images captioned by a state-of-the-art vision-language model, including 100M training, 200K validation, and 1M test examples. Moreover, all GPIC images are permissively licensed for both research and commercial use. GPIC is safety-filtered, deduplicated, and centrally hoste...

📄 Demystifying Data Organization for Enhanced LLM Training
🗓️ Published: 5/28/2026
🔗 http://arxiv.org/abs/2605.30334v1
👥 Authors: Yalun Dai, Yangyu Huang, Tongshen Yang, Yonghan Wang, Xin Zhang (possible past Google (United States) affiliation), Wenshan Wu, Qihao Zhao, Hao Li (possible past Tsinghua University affiliation), Yuanyuan Gao, Kim-Hui Yap, Scarlett Li
Abstract

Large Language Models (LLMs) have revolutionized various fields, yet their training efficiency is heavily reliant on effective data curation. While data selection has been widely studied, the strategic data organization for enhanced training remains an underexplored area, particularly since current LLMs are often trained for only one or a few epochs. This paper systematically explores the influence of data organization on LLM training by reusing pre-computed sample-level scores originally genera...

📄 RoboWits: Unexpected Challenges for Robotic Creative Problem Solving
🗓️ Published: 5/28/2026
🔗 http://arxiv.org/abs/2605.30326v1
👥 Authors: Chunru Lin, Hongxin Zhang, Fenghao Yu, Zhehuan Chen, Thomas L. Griffiths (possible past University Of California, Berkeley affiliation), Yejin Choi (possible past Allen Institute For Artificial Intelligence affiliation), David Held (possible past University Of California, Berkeley affiliation), Chuang Gan (possible past Tsinghua University affiliation)
Abstract

The ability to reason, adapt, and creatively solve problems under unexpected challenges is essential for robots operating in real-world environments. However, current robotic benchmarks primarily emphasize skill-level execution and provide limited insight into such cognitive reasoning capabilities. We introduce RoboWits, a bi-manual robotic benchmark designed to systematically evaluate cognitive reasoning, creative tool use, and robustness to unexpected conditions. To enable scalable constructio...

📄 Archon: A Unified Multimodal Model for Holistic Digital Human Generation
🗓️ Published: 5/28/2026
🔗 http://arxiv.org/abs/2605.30311v1
👥 Authors: Chong Bao, Shichen Liu, Lijun Yu, David Futschik, Stylianos Moschoglou, Shefali Srivastava, Ziqian Bai, Feitong Tan, Guofeng Zhang, Zhaopeng Cui (possible past ETH Zurich affiliation), Sean Fanello (possible past Google (United States) affiliation), Yinda Zhang (possible past Google (United States) affiliation)
Abstract

Digital humans are fundamental to immersive interaction, yet creating a unified model for holistic modalities, including text, audio, motion, and visual content, remains an open challenge. In this paper, we present Archon, a fully pretrained, human-centric unified multimodal model for holistic avatar generation. Archon unifies seven modalities with modality-specific tokenizers, and a native autoregressive unified multimodal model pretrained on synchronized modalities and 72 diverse tasks to mode...

📄 MIRA: Mid-training Rubric Anchoring for Source-Aware Data Selection
🗓️ Published: 5/28/2026
🔗 http://arxiv.org/abs/2605.30288v1
👥 Authors: Haowen Wang, Yaxin Du, Jian Yang, Jiajun Wu (possible past Massachusetts Institute Of Technology affiliation), Shukai Liu (possible past Tencent (China) affiliation), Yuxuan Zhang, Pingjie Wang, Siheng Chen, Tuney Zheng, Ming Zhou, Xianglong Liu
Abstract

Mid-training has become an important stage in modern LLM development, using large-scale curated mixtures to strengthen capabilities before final post-training. Its data selection problem is distinct: the data are optimized under a pretraining-style objective at near-pretraining scale, but are curated toward downstream capabilities and drawn from heterogeneous sources with different formats and training roles. As a result, effective selection requires both scalability and source-adaptive semantic...

📄 Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments
🗓️ Published: 5/28/2026
🔗 http://arxiv.org/abs/2605.30280v1
👥 Authors: Qiuyue Wang, Mingsheng Li, Jian Guan, Jinhui Ye, Sicheng Xie, Yitao Liu, Junhao Chen, Zhixuan Liang, Jie Zhang, Xintong Hu, Xuhong Huang, Pei Lin, Junyang Lin, Dayiheng Liu, Shuai Bai, Jingren Zhou, Jiazhao Zhang, Haoqi Yuan, Gengze Zhou, Hang Yin, Ye Wang, Yiyang Huang, Zixing Lei, Wujian Peng, Delin Chen, Yingming Zheng, Jingyang Fan, Xianwei Zhuang, Xin Zhou (possible past Stanford University affiliation), Haoyang Li, Anzhe Chen, Tong Zhang (possible past Tencent (China) affiliation), Xuejing Liu, Yuchong Sun, Ruizhe Chen, Zhaohai Li, Chenxu Lü, Zhibo Yang, Tao Yu (possible past University Of Washington affiliation), Xionghui Chen
Abstract

Embodied intelligence is often studied through specialized models for individual tasks such as manipulation or navigation, resulting in fragmented capabilities and limited generalization across tasks, environments, and robot embodiments. In this work, we study whether heterogeneous embodied decision-making problems can be unified within a single vision-language-action model. We present Qwen-VLA, a unified embodied foundation model that extends Qwen's vision-language modeling stack from perceptio...

📄 Loong: A Human-Like Long Document Translation Agent with Observe-and-Act Adaptive Context Selection
🗓️ Published: 5/28/2026
🔗 http://arxiv.org/abs/2605.30274v1
👥 Authors: Yutong Wang, Xuebo Liu, Derek F. Wong (possible past Tencent (China) affiliation), Zhilin Li, Rongqing Jiang, Min Zhang (possible past Tsinghua University affiliation), Shimin Tao, Daimeng Wei, Min Zhang (possible past Tsinghua University affiliation)
Abstract

Document-level translation remains one of the most challenging tasks for large language models, which are constrained by limited context windows that impede global cohesion, while simultaneously suffering from redundant contextual information that degrades translation quality. To address this, we propose a human-like long document translation agent called Loong, which leverages a 3E memory module (Essence-Exemplar-Entity) to store summaries, sentence pairs, and entity records as historical conte...

📄 Do Language Models Track Entities Across State Changes?
🗓️ Published: 5/28/2026
🔗 http://arxiv.org/abs/2605.30233v1
👥 Authors: Zilu Tang, Qiao Zhao, Gabriel Franco, Derry Wijaya (possible past Carnegie Mellon University affiliation), Aaron Mueller, Sebastian Schuster (possible past Stanford University affiliation), Najoung Kim (possible past Google (United States) affiliation)
Abstract

Entity tracking (ET), the ability to keep track of states, is a fundamental skill that underlies complex reasoning. An increasing amount of work investigates how transformer language models (LMs) solve entity binding without state changes. However, there is limited understanding of how non-toy LMs address ET problems of realistic difficulties expressed in natural language. To this end, we investigate the mechanisms underlying ET in more complex scenarios featuring multiple state-chang...

📄 iLoRA: Bayesian Low-Rank Adaptation with Latent Interaction Graphs for Microbiome Diagnosis
🗓️ Published: 5/28/2026
🔗 http://arxiv.org/abs/2605.30179v1
👥 Authors: Yang Song (possible past Stanford University affiliation), Yixuan Zhang, Lingfa Meng, Tongyuan Hu, Haizhou Shi, Hao Wang (possible past Tsinghua University affiliation), Samir Bhatt, Hengguan Huang
Abstract

Parameter-efficient adaptation has made LLMs practical for domain prediction, but standard LoRA still relies on a static low-rank update and does not expose the latent interactions that often drive scientific labels. We introduce iLoRA. To our knowledge, it is the first Bayesian graph-conditioned LoRA framework. It infers a latent interaction graph from the input and uses it to generate input-conditioned LoRA updates. As a result, iLoRA learns prediction and latent interaction structure jointly,...

📄 Cookie-Bench: Continuous On-screen Key Interaction Evaluation for Web Generation
🗓️ Published: 5/28/2026
🔗 http://arxiv.org/abs/2605.30000v1
👥 Authors: Haoyue Yang, Zhangxiao Shen, Fan Ding, Hangting Lou, Yifeng Kou, Haoqing Yu, Jingyao Li, Zhengfan Wu, Siqi Bao (possible past Baidu (China) affiliation), Jing Liu (possible past Baidu (China) affiliation), Hua Wu (possible past Baidu (China) affiliation)
Abstract

Front-end web code has become a core product surface for every frontier LLM release, yet evaluating these interactive applications at development speed remains costly because human-judged leaderboards like Arena do not scale. Existing automated proxies typically lean on reference implementations, test suites, or rigid checklists, and tend to miss the reasoned synthesis a human reviewer performs over a live session. We articulate a new evaluation regime that is simultaneously reference-free, auto...

📄 Compass: Navigating Global Marine Lead Data Integration through Expert-Guided LLM Agent
🗓️ Published: 5/28/2026
🔗 http://arxiv.org/abs/2605.29966v1
👥 Authors: Yiming Liu, Bin Lu, Meng Jin, Ziyuan Sang, Shuo Jiang, Lei Zhou (possible past Apple (United States) affiliation), Xinbing Wang, Chenghu Zhou, Jing Zhang (possible past University Of Washington affiliation)
Abstract

Marine lead (Pb) and its isotopes are critical tracers for ocean circulation and anthropogenic pollution, yet in-situ observations remain costly and sparse. While vast historical records exist, they lie buried within the unstructured content of academic papers, creating "data silos" inaccessible to comprehensive analysis. Manual extraction is unscalable, while general-purpose Large Language Models (LLMs) lack the necessary domain-specific knowledge, leading to hallucinations and scientifically i...

📄 HoliTok: A Continuous Holistic Tokenization with Robust Dual Capabilities of Speech Generation and Understanding
🗓️ Published: 5/28/2026
🔗 http://arxiv.org/abs/2605.29948v1
👥 Authors: Bohan Li (possible past Google (United States) affiliation), Shi Lian, Hankun Wang, Yiwei Guo, Yu Xi, Zhihan Li (possible past Tsinghua University affiliation), Da Zheng, Colin Zhang, Kai Yu (possible past Baidu (China) affiliation)
Abstract

Unified speech foundation models require a holistic tokenization space that is both learnable by language models and decodable into high-quality waveforms. Existing speech tokenizers, however, often fail to satisfy these requirements simultaneously, leading to increased architectural complexity and more involved training designs. We propose HoliTok, a continuous Holistic speech Tokenization model designed for unified generation-understanding modeling. HoliTok encodes 48 kHz speech into a compact...

📄 OmniMatBench: A Human-Calibrated Multimodal Reasoning Benchmark Across 19 Materials Science Subfields
🗓️ Published: 5/28/2026
🔗 http://arxiv.org/abs/2605.29833v1
👥 Authors: Wanhao Liu, Jiaqing Xie, Qian Tan, Weida Wang, Jue Wang (possible past Tencent (China) affiliation), Ran Sun, Zhuo Yang, Wanli Ouyang, Lei Bai, Tianfan Fu, Lu Chen, Xin Chen (possible past Tencent (China) affiliation), Yuqiang Li
Abstract

As multimodal language models play an increasingly important role in scientific research, materials science offers a critical testbed due to its interdisciplinary, multimodal, and application-driven nature. However, existing materials benchmarks mainly focus on property prediction, knowledge QA, or characterization understanding, leaving the broader reasoning process from materials knowledge to application underexplored. To fill this gap, we present OmniMatBench, a human-calibrated multimodal re...

📄 AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security
🗓️ Published: 5/28/2026
🔗 http://arxiv.org/abs/2605.29801v1
👥 Authors: Dongrui Liu, Yu Li (possible past Tencent (China) affiliation), Zhonghao Yang, Peng Wang (possible past Peking University affiliation), Guanxu Chen, Yuejin Xie, Qinghua Mao, Wanying Qu, Yanxu Zhu, Tianyi Zhou (possible past University Of Washington affiliation), Leitao Yuan, Zhijie Zheng, Qihao Lin, Yimin Wang, Haoyu Luo, Shuai Shao, Chen Qian (possible past Shanghai Jiao Tong University affiliation), Qingyu Liu, Ling Tang, Ruiyang Qin, Qihan Ren, Junxiao Yang, Kun Wang, Zhiheng Xi, Linfeng Zhang, Ranjie Duan, Bo Zhang (possible past Tencent (China) affiliation), Wenjie Wang, Wen Shen, Qiaosheng Zhang, Yan Teng, Chaochao Lu, Rui Mei, Man Li, Jialing Tao, Xi Lin, Tianhang Zheng, Yong Liu, Quanshi Zhang, Lei Zhu, Xingjun Ma, Junhua Liu, Hui Xue, Xiaoxiang Zuo, Xiangnan He (possible past National University Of Singapore affiliation), Chao Shen, Xianglong Liu, Minlie Huang, Jing Shao, Xia Hu
Abstract

Modern open-world agents such as OpenClaw exhibit powerful cross-environment execution capabilities yet introduce broad new safety risk sources. Meanwhile, advanced frontier AI models drastically lower attack barriers, rendering current agent alignment frameworks inadequate for real-world deployment. To tackle these emerging threats, we propose a lightweight and scalable agent safety alignment framework. Specifically, we update the agent safety taxonomy to accommodate emergent risks from Codex a...

📄 Evolve as a Team: Collaborative Self-Evolution for LLM-based Multi-Agent Systems
🗓️ Published: 5/28/2026
🔗 http://arxiv.org/abs/2605.29790v1
👥 Authors: Zhezheng Hao, Tianfu Wang, Huanshuo Dong, Ziyan Liu, Hong Wang, Xiankun Lin, Qiang Lin, Can Wang (possible past Tsinghua University affiliation), Hande Dong, Jiawei Chen (possible past Tencent (China) affiliation)
Abstract

LLM-based multi-agent systems (MAS) have emerged as an effective paradigm for complex and long-horizon tasks. However, in real-world tasks, MAS often exhibit various failures during execution and such failures are difficult to eliminate during design. This motivates experience-driven MAS evolution, where a system improves based on its own execution experience. Yet such evolution is challenging because MAS experience is prolonged and intricate, interleaving multiple agents' execution chains and c...

📄 EviLink: Multi-Path Schema Linking with Uncertainty-Guided Evidence Acquisition for Large-Scale Text-to-SQL
🗓️ Published: 5/28/2026
🔗 http://arxiv.org/abs/2605.29670v1
👥 Authors: Huawei Zheng, Sen Yang (possible past Tencent (China) affiliation), Zhaorui Yang, Yuhui Zhang, Haozhe Feng, Haoxuan Li, Xuan Yi, Chao Hu, Defeng Xie, Chen Hou, Danqing Huang, Wei Chen, Yingcai Wu, Peng Chen (possible past Tencent (China) affiliation), Dazhen Deng
Abstract

Schema linking is a difficult and important step in large-scale Text-to-SQL, where systems must identify a compact yet sufficient schema context from large and ambiguous databases. Existing methods often treat schema linking as deterministic selection around a single SQL path, but complex questions may admit multiple valid realizations with different schema needs. We reframe schema linking as uncertainty-aware schema-need inference over multiple plausible SQL paths, where the system distinguishe...

📄 OccamToken: Efficient VLM Inference with Training-Free and Budget-Adaptive Token Pruning
🗓️ Published: 5/28/2026
🔗 http://arxiv.org/abs/2605.29657v1
👥 Authors: Geng Li, Guohao Chen, Ting Chen (possible past Google (United States) affiliation), Shilin Shan, Kuangji Zuo, Bofan Lyu, Tuo An, Gen Li (possible past University Of Edinburgh affiliation), Jianfei Yang
Abstract

Vision-language models (VLMs) rely on long visual token sequences for visual understanding, making the prefill stage expensive in both computation and memory. Most existing pruning methods follow an absolute-ranking paradigm, assigning importance scores to visual tokens and retaining a fixed top-K subset. In this work, we argue that this paradigm is fundamentally brittle: attention sinks distort token importance rankings, while image redundancy and query-dependent visual evidence make fixed toke...

📄 The Sample Complexity of Multiclass and Sparse Contextual Bandits
🗓️ Published: 5/28/2026
🔗 http://arxiv.org/abs/2605.29645v1
👥 Authors: Liad Erez, Fan Chen, Alon Cohen (possible past Google (United States) affiliation), Tomer Koren, Yishay Mansour (possible past Google (United States) affiliation), Shay Moran, Alexander Rakhlin
Abstract

We study contextual bandits in the stochastic i.i.d. setting, where a learner observes contexts drawn from an unknown distribution, selects actions from a finite set $A$, and aims to identify an approximately optimal policy from a given class based on bandit feedback. Motivated by bandit multiclass classification with zero-one rewards, we focus on the $s$-sparse setting in which, for every context, the reward vector has $L_1$-norm at most $s \ll |A|$. Our main result is the design of alg...

📄 Planning with the Views via Scene Self-Exploration
🗓️ Published: 5/28/2026
🔗 http://arxiv.org/abs/2605.29563v1
👥 Authors: Kangrui Wang, Linjie Li, Zhengyuan Yang, Shiqi Chen, Zihan Wang (possible past Tsinghua University affiliation), Li Fei-Fei (possible past Stanford University affiliation), Jiajun Wu (possible past Massachusetts Institute Of Technology affiliation), Leonidas Guibas (possible past Stanford University affiliation), Lijuan Wang, Manling Li
Abstract

Can VLMs predict how each camera move changes the view, and plan many such moves ahead? We call this capability view planning, requiring (1) understanding how a single action transforms the view, and (2) composing many such transformations across multi-turn plans to identify a target view. We probe both abilities in our proposed ViewSuite, a 3D point-cloud environment on real ScanNet scenes. Across 13 frontier VLMs, a critical planning gap emerges: they possess basic view-action knowledge but fail...

📄 Battery-Sim-Agent: Leveraging LLM-Agent for Inverse Battery Parameter Estimation
🗓️ Published: 5/28/2026
🔗 http://arxiv.org/abs/2605.29560v1
👥 Authors: Jiawei Chen (possible past Tencent (China) affiliation), Xiaofan Gui, Shikai Fang, Shengyu Tao, Shun Zheng, Weiqing Liu, Jiang Bian (possible past Baidu (China) affiliation)
Abstract

Parameterizing high-fidelity "digital twins" of batteries is a critical yet challenging inverse problem that hinders the pace of battery innovation. Prevailing methods formulate this as a black-box optimization (BBO) task, employing algorithms that are sample-inefficient and blind to the underlying physics. In this work, we introduce a new paradigm that reframes the inverse problem as a reasoning task, and present Battery-Sim-Agent, the first framework to deploy a Large Language Model (LLM) agen...

📄 Temporal Motif-aware Graph Test-time Adaptation for OOD Blockchain Anomaly Detection
🗓️ Published: 5/28/2026
🔗 http://arxiv.org/abs/2605.29526v1
👥 Authors: Runang He, Tongya Zheng, Huiling Peng, Yuanyu Wan, Bingde Hu, Jiawei Chen (possible past Tencent (China) affiliation), Canghong Jin, Mingli Song, Can Wang (possible past Tsinghua University affiliation)
Abstract

Ever-evolving transaction patterns have significantly hindered anomaly detection on emerging cryptocurrency blockchains due to the vast number of addresses and diverse anomalous behaviors. Recently, advanced Graph Anomaly Detection (GAD) approaches applied to blockchains have faced two critical challenges: adversarial pattern evolution by malicious actors and the out-of-distribution (OOD) problem caused by varied transaction semantics on blockchains. To address these challenges...

📄 MINDGAMES: A Live Arena for Evaluating Social and Strategic Reasoning in Multi-Agent LLMs
🗓️ Published: 5/28/2026
🔗 http://arxiv.org/abs/2605.29512v1
👥 Authors: Kevin Wang, Anna Thöni, Benjamin Kempinski, Bobby Cheng, Jianzhu Yao, Benjamin Finch, Leon Guertler, Viraj Nadkarni, Yihan Jiang (possible past University Of Washington affiliation), Aliaksei Korshuk, Alexander Buyantuev, Ilya Makarov, Siyuan Wu, Yu-Chi Cheng, Yan-Ru Ju, Ti-Rong Wu, I-Hsuan Chu, Yu-Yu Yang, I-Chen Wu, Yitian Huang, Qinlu Cao, Yiheng Sun, Yuhong Dai, Hongkun Yao, Jingxuan Fu, Jiwei Zhang, Hao Liao, Mossimo Ebeling, Govind Arun, Sadhvik Bathini, Mihir S Arya, Avinash Anish, Aditya Ranjan, Kirtana Sunil Phatnani, Paval Ks, Vrushali Mehta, Aravind S, Nikhil Arora, Tanya Upadhyay, Amol Bandagale, Yuan Lu, Chunen Hsiao, Yuting Lin, Arvin Chung, Jerry John Thomas, Mathieu Laurière, Leshem Choshen, Yoram Bachrach (possible past DeepMind (United Kingdom) affiliation), Pramod Viswanath, Maria Polukarov, Cheston Tan, Tal Kachman, Atlas Wang
Abstract

Large language models (LLMs) are increasingly deployed as interactive agents, yet their capacity for social and strategic reasoning over extended interaction remains poorly understood. Existing evaluations rely on static vignettes or single-game benchmarks that cannot capture the sustained, multi-faceted reasoning that real-world multi-agent settings demand. We introduce Mindgames, a multi-game arena and evaluation platform for LLM agents that operationalizes complementary reasoning demands rele...

📄 PhoneWorld: Scaling Phone-Use Agent Environments
🗓️ Published: 5/28/2026
🔗 http://arxiv.org/abs/2605.29486v1
👥 Authors: Zhengyang Tang, Yuxuan Liu, Xin Lai, Junyi Li, Pengyuan Lyu (possible past Tencent (China) affiliation), Jason, Yiduo Guo, Zhengyao Fang, Yang Ding, Yi Zhang (possible past Google (United States) affiliation), Weinong Wang, Huawen Shen, Xingran Zhou, Liang Wu, Fei Tang, Sunqi Fan, Shangpin Peng, Zheng Ruan, Anran Zhang, Benyou Wang (possible past Tencent (China) affiliation), Rui Yan (possible past Peking University affiliation), Ji-Rong Wen, Chengquan Zhang (possible past Baidu (China) affiliation), Han Hu
Abstract

A central bottleneck for phone-use agents is that controllable, reproducible environments covering real mobile behavior are hard to build at scale. Existing mobile-agent benchmarks have made important progress on evaluation, but they do not by themselves provide a scalable way to construct many new phone-use environments. We present PhoneWorld, a reusable pipeline that converts real GUI trajectories and screenshots into controllable phone-use environments, executable tasks, automatic verifiers, ...

📄 OOD-GraphLLM: Graph Large Language Model for Out-of-Distribution Generalized Drug Synergy Prediction
🗓️ Published: 5/28/2026
🔗 http://arxiv.org/abs/2605.30247v1
👥 Authors: Xin Wang (possible past University Of Edinburgh affiliation), Linxin Xiao, Yang Yao, Wenwu Zhu (possible past Tsinghua University affiliation)
Abstract

Drug synergy prediction (DSP) aims to identify efficacious drug combinations under various cellular contexts with different targets. However, the continual emergence of novel compounds results in variations in molecular scaffolds and sizes, causing drug synergy data to exhibit out-of-distribution (O.O.D.) shifts with respect to topological structure. Existing works rely on in-distribution (I.D.) assumption, failing to handle the O.O.D. shifts. To solve this problem, we study out-of-distribution ...

📄 GRASP: Plan-Guided Graph Retrieval with Adaptive Fusion and Reranking on Semi-Structured Knowledge Bases
🗓️ Published: 5/28/2026
🔗 http://arxiv.org/abs/2605.30237v1
👥 Authors: Yicheng Tao, Yiqun Wang, Xiangchen Song, Xin Luo, Kai Liu (possible past Baidu (China) affiliation), Jie Liu (possible past Tencent (China) affiliation)
Abstract

Semi-structured knowledge bases (SKBs) embed textual documents in a typed graph of entities and relations, and underpin applications such as product search, academic paper search, and precision-medicine inquiries. Existing hybrid retrieval systems on SKBs either use the graph only for query expansion, mix textual and structural branches under a global weighting, or rely on fine-tuned graph-traversal generators. We present GRASP, a three-stage SKB retrieval framework unifying plan-based graph ret...

📄 When Do Graph Foundation Models Transfer? A Data-Centric Theory
🗓️ Published: 5/28/2026
🔗 http://arxiv.org/abs/2605.29828v1
👥 Authors: Jiajun Zhu, Ying Chen (possible past Baidu (China) affiliation), Peihao Wang, Yixuan He, Pan Li (possible past Baidu (China) affiliation), Aditya Akella, Zhangyang Wang
Abstract

Graph foundation models (GFMs) aim to reuse a single backbone across diverse graph domains, yet their transfer is often uneven and can exhibit negative transfer. While most prior work improves transfer through architectural or adaptation choices, we ask a data-centric question: which properties of two graph domains determine how much a fixed representation model changes its outputs? Using a graphon-based continuous limit for dense graphs, we show that for both set-based and message-passing token...

📄 Cert-LAS: Toward Certified Model Ownership Verification for Text-to-Image Diffusion Models via Layer-Adaptive Smoothing
🗓️ Published: 5/28/2026
🔗 http://arxiv.org/abs/2605.29809v1
👥 Authors: Leyi Qi, Yiming Li (possible past Tsinghua University affiliation), Siyuan Liang, Zhengzhong Tu (possible past Google (United States) affiliation), Dacheng Tao
Abstract

Large-scale text-to-image (T2I) diffusion models have enabled unprecedented creative applications, but their unauthorized use has raised serious intellectual property concerns, making model ownership verification (MOV) increasingly critical. We find that existing backdoor-based diffusion watermarking methods often (implicitly) assume a "faithful" verification process, namely, that the verifier can query a suspicious model and obtain the faithful watermark response to complete MOV. However, in pr...

📄 Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention
🗓️ Published: 5/28/2026
🔗 http://arxiv.org/abs/2605.29548v1
👥 Authors: Jing Huang (possible past Meta (United States) affiliation), Daniel Wurgaft, Rachit Bansal, Laura Ruis, Naomi Saphra, David Alvarez-Melis, Andrew Kyle Lampinen, Christopher Potts (possible past Tencent (China) affiliation), Ekdeep Singh Lubana
Abstract

Larger models learn tasks smaller models do not. What drives this phenomenon? We develop a simple phenomenological argument that power-law scaling already suggests that a larger model will be able to learn a part of the data distribution that a smaller model fails to learn, even with infinite training data. To validate this claim and identify its causes, we study the effects of model scaling on a synthetic setup consisting of a mixture of tasks that show monotonic scaling curves. The results poi...

📄 Forget Less, Generalize More: Unifying Temporal and Structural Adaptation for Dynamic Graphs
🗓️ Published: 5/28/2026
🔗 http://arxiv.org/abs/2605.29453v1
👥 Authors: Qian Chang, Ciprian Doru Giurcaneanu, Runsong Jia, Xia Li (possible past Meta (United States) affiliation), Guoping Hu, Xiufeng Cheng, Jinqing Yang, Mengjia Wu, Yi Zhang (possible past Google (United States) affiliation)
Abstract

Representation learning on dynamic graphs requires capturing complex dependencies that evolve across both time and structure. Existing approaches typically adopt fixed temporal decay schemes or predetermined structural propagation depths, limiting their ability to generalize across graphs with diverse interaction frequencies and topological characteristics. We propose Dual-Scale Retentive Dynamics (DSRD), a unified framework that maintains a retentive representation state encoding both temporal ...

📄 PassNet: Scaling Large Language Models for Graph Compiler Pass Generation
🗓️ Published: 5/28/2026
🔗 http://arxiv.org/abs/2605.29357v1
👥 Authors: Yiqun Liu, Yingsheng Wu, Ruqi Yang, Enrong Zheng, Honglei Qiu, Sijun He, Tai Liang, Jingjing Wu, Yuhan Zhou, Yiwei Zhang (possible past Tsinghua University affiliation), Dongyan Chen, Weihan Yi, Xinqi Li, Siqi Bao (possible past Baidu (China) affiliation)
Abstract

Modern tensor compilers such as TorchInductor deliver substantial speedups on mainstream models, yet face a systematic performance ceiling on long-tail workloads -- our profiling shows that 43% of real-world subgraphs experience end-to-end slowdowns under default compilation. While LLMs offer a path toward automated optimization, existing efforts focus on standalone kernel generation. We argue that pass generation -- where LLMs author structured graph transformations that integrate directly into...

📄 LoopFM: Learning frOm HistOrical RePresentations of Foundation Model for Recommendation
🗓️ Published: 5/28/2026
🔗 http://arxiv.org/abs/2605.29280v1
👥 Authors: Shali Jiang, Hua Zheng, Boyang Liu, Laming Chen, Kenny Lov, Chuanqi Xu, Lisang Ding, Qinghai Zhou, Can Cui (possible past Baidu (China) affiliation), Xiaolong Liu, Xiaoyi Liu, Yasmine Badr, Xin Xu, Jiyan Yang (possible past Meta (United States) affiliation), Ellie Dingqiao Wen, Gerard Jonathan Mugisha Akkerhuis, Chenxiao Guan, Rong Jin, Ruichao Qiu, Xian Chen, Shifu Xu, Zhehui Zhou, Ping Chen, Rui Yang, Haicheng Chen, Xiangge Meng, Song Zhou, Dharak Kharod, Shuyu Xu, Qiang Jin, Qiao Yang, Wankun Zhu, Qin Huang, Yuzhen Huang, Darren Liu, Parish Aggarwal, Hui Zhou, Erzhuo Wang, Shuo Chang, Xiaorui Gan, Wenlin Chen (possible past Meta (United States) affiliation), Santanu Kolay, Huayu Li
Abstract

Knowledge distillation (KD) transfers a single scalar prediction from a large foundation model (FM) to compact vertical models (VMs), suffering from diminishing transfer ratio -- the fraction of FM improvement captured by the VM -- as a single scalar cannot convey the rich intermediate knowledge that larger FMs learn. To address this bottleneck, we propose LoopFM (Learning frOm HistOrical RePresentations of FM), a framework that opens a high-bandwidth transfer channel by structuring FM intermed...

📄 Implicit Identity Technologies for LLMs: Fingerprinting and Watermarking across Datasets, Models, and Generated Content
🗓️ Published: 5/28/2026
🔗 http://arxiv.org/abs/2605.29245v1
👥 Authors: Bing Liu (possible past Carnegie Mellon University affiliation), Shunping Wang, Yufan Zhu, Xinyi Yu, Jing Huang (possible past Meta (United States) affiliation), Linkang Du, Hongbin Pei, Wei Luo (possible past Baidu (China) affiliation)
Abstract

This paper presents a survey and taxonomy of LLM fingerprinting and watermarking for identity, ownership verification, provenance, and generated-content attribution. Large language models (LLMs) require substantial investments in data, computation, and expertise, and are increasingly deployed in high-stakes settings, making it critical to protect LLM-related assets and trace their origins. Existing work has rapidly expanded across dataset provenance, model ownership, and generated-content detect...

*Notable papers are those with at least two authors from a "big" AI/ML lab.
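
The "at least two authors" rule in the footnote can be sketched as a small filter over an Authors line. This is only an illustration, not the feed's actual implementation: it assumes that every "(possible past ... affiliation)" annotation marks a "big" lab, which the listing does not state, and the names AFFILIATION, lab_affiliations, and is_notable are hypothetical.

```python
import re

# Assumption: each "(possible past X affiliation)" annotation marks a "big" lab.
# The non-greedy group stops at the first " affiliation)", so nested
# parentheses such as "Google (United States)" are captured whole.
AFFILIATION = re.compile(r"\(possible past (.+?) affiliation\)")


def lab_affiliations(authors_line: str) -> list[str]:
    """Return the affiliation named in each annotation, in order of appearance."""
    return AFFILIATION.findall(authors_line)


def is_notable(authors_line: str, min_authors: int = 2) -> bool:
    """Apply the footnote's rule: at least `min_authors` annotated authors."""
    return len(lab_affiliations(authors_line)) >= min_authors


example = (
    "Authors: Xin Wang (possible past University Of Edinburgh affiliation), "
    "Linxin Xiao, Yang Yao, "
    "Wenwu Zhu (possible past Tsinghua University affiliation)"
)
print(lab_affiliations(example))  # ['University Of Edinburgh', 'Tsinghua University']
print(is_notable(example))        # True
```

Counting annotations rather than splitting the line on commas avoids miscounting affiliations that themselves contain commas, such as "University Of California, Berkeley".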