πŸ“„ Notable* Recent AI/ML arXiv Papers


πŸ“„ Learning, Fast and Slow: Towards LLMs That Adapt Continually
πŸ—“οΈ Published: 5/12/2026
πŸ”— http://arxiv.org/abs/2605.12484v1
πŸ‘₯ Authors: Rishabh Tiwari, Kusha Sareen, Lakshya A Agrawal, Joseph E. Gonzalez (possible past University Of California, Berkeley affiliation), Matei Zaharia (possible past University Of California, Berkeley affiliation), Kurt Keutzer (possible past University Of California, Berkeley affiliation), Inderjit S Dhillon, Rishabh Agarwal (possible past Google (United States) affiliation), Devvrit Khatri
Abstract

Large language models (LLMs) are trained for downstream tasks by updating their parameters (e.g., via RL). However, updating parameters forces them to absorb task-specific information, which can result in catastrophic forgetting and loss of plasticity. In contrast, in-context learning with fixed LLM parameters can cheaply and rapidly adapt to task-specific requirements (e.g., prompt optimization), but typically cannot by itself match the performance gains available through updating LLM parameter...

πŸ“„ OmniNFT: Modality-wise Omni Diffusion Reinforcement for Joint Audio-Video Generation
πŸ—“οΈ Published: 5/12/2026
πŸ”— http://arxiv.org/abs/2605.12480v1
πŸ‘₯ Authors: Guohui Zhang, Xiaoxiao Ma, Jie Huang, Hang Xu, Hu Yu, Siming Fu, Yuming Li, Zeyue Xue, Lin Song (possible past Tencent (China) affiliation), Haoyang Huang, Nan Duan, Feng Zhao (possible past Microsoft (United States) affiliation)
Abstract

Recent advances in joint audio-video generation have been remarkable, yet real-world applications demand strong per-modality fidelity, cross-modal alignment, and fine-grained synchronization. Reinforcement Learning (RL) offers a promising paradigm, but its extension to multi-objective and multi-modal joint audio-video generation remains unexplored. Notably, our in-depth analysis first reveals that the primary obstacles to applying RL in this setting stem from: (i) multi-objective advantages inconsistenc...

πŸ“„ Formalize, Don't Optimize: The Heuristic Trap in LLM-Generated Combinatorial Solvers
πŸ—“οΈ Published: 5/12/2026
πŸ”— http://arxiv.org/abs/2605.12421v1
πŸ‘₯ Authors: Haoyu Wang (possible past Tencent (China) affiliation), Yuliang Song, Tao Li (possible past Baidu (China) affiliation), Zhiwei Deng, Yaqing Wang (possible past Baidu (China) affiliation), Deepak Ramachandran, Eldan Cohen, Dan Roth
Abstract

Large Language Models (LLMs) struggle to solve complex combinatorial problems through direct reasoning, so recent neuro-symbolic systems increasingly use them to synthesize executable solvers. A central design question is how the LLM should represent the solver, and whether it should also attempt to optimize search. We introduce CP-SynC-XL, a benchmark of 100 combinatorial problems (4,577 instances), and evaluate three solver-construction paradigms: native algorithmic search (Python), constraint...

πŸ“„ Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space
πŸ—“οΈ Published: 5/12/2026
πŸ”— http://arxiv.org/abs/2605.12412v1
πŸ‘₯ Authors: Eric Bigelow, RaphaΓ«l Sarfati, Daniel Wurgaft, Owen Lewis, Thomas Mcgrath (possible past Google (United States) affiliation), Jack Merullo, Atticus Geiger (possible past Stanford University affiliation), Ekdeep Singh Lubana
Abstract

Large Language Models (LLMs) update their behavior in context, which can be viewed as a form of Bayesian inference. However, the structure of the latent hypothesis space over which this inference operates remains unclear. In this work, we propose that LLMs assign beliefs over a low-dimensional geometric space - a conceptual belief space - and that in-context learning corresponds to a trajectory through this space as beliefs are updated over time. Using story understanding as a natural setting fo...

πŸ“„ ProfiliTable: Profiling-Driven Tabular Data Processing via Agentic Workflows
πŸ—“οΈ Published: 5/12/2026
πŸ”— http://arxiv.org/abs/2605.12376v1
πŸ‘₯ Authors: Wei Liu (possible past Tsinghua University affiliation), Yang Gu, Xi Yan, Zihan Nan, Beicheng Xu, Keyao Ding, Bin Cui (possible past Peking University affiliation), Wentao Zhang (possible past Mila - Quebec Artificial Intelligence Institute affiliation)
Abstract

Table processing, including cleaning, transformation, augmentation, and matching, is a foundational yet error-prone stage in real-world data pipelines. While recent LLM-based approaches show promise for automating such tasks, they often struggle in practice due to ambiguous instructions, complex task structures, and the lack of structured feedback, resulting in syntactically correct but semantically flawed code. To address these challenges, we propose ProfiliTable, an autonomous multi-agent framew...

πŸ“„ Fill the GAP: A Granular Alignment Paradigm for Visual Reasoning in Multimodal Large Language Models
πŸ—“οΈ Published: 5/12/2026
πŸ”— http://arxiv.org/abs/2605.12374v1
πŸ‘₯ Authors: Yanting Miao, Yutao Sun, Dexin Wang, Mengyu Zhou, Pascal Poupart, Lei Lv, Qi Zhao (possible past Google (United States) affiliation), Li Wang (possible past Tesla (United States) affiliation), Hao Li (possible past Tsinghua University affiliation), Xiaoxi Jiang, Guanjun Jiang
Abstract

Visual latent reasoning lets a multimodal large language model (MLLM) create intermediate visual evidence as continuous tokens, avoiding external tools or image generators. However, existing methods usually follow an output-as-input latent paradigm and yield unstable gains. We identify evidence for a feature-space mismatch that can contribute to this instability: dominant visual-latent models build on pre-norm MLLMs and reuse decoder hidden states as predicted latent inputs, even though these st...

πŸ“„ No Action Without a NOD: A Heterogeneous Multi-Agent Architecture for Reliable Service Agents
πŸ—“οΈ Published: 5/12/2026
πŸ”— http://arxiv.org/abs/2605.12240v1
πŸ‘₯ Authors: Zixu Yang, Hang Zheng, Nan Jiang (possible past Stanford University affiliation), Zhiyang Tang, Situo Zhang, Xiaobao Wu, Lu Chen, Kai Yu (possible past Baidu (China) affiliation)
Abstract

Large language model (LLM) agents have increasingly advanced service applications, such as booking flight tickets. However, these service agents suffer from unreliability in long-horizon tasks, as they often produce policy violations, tool hallucinations, and misaligned actions, which greatly impedes their real-world deployment. To address these challenges, we propose NOD (Navigator-Operator-Director), a heterogeneous multi-agent architecture for service agents. Instead of maintaining task state...

πŸ“„ Mitigating Context-Memory Conflicts in LLMs through Dynamic Cognitive Reconciliation Decoding
πŸ—“οΈ Published: 5/12/2026
πŸ”— http://arxiv.org/abs/2605.12185v1
πŸ‘₯ Authors: Yigeng Zhou, Wu Li, Yifan Lu, Yequan Wang (possible past Tsinghua University affiliation), Xuebo Liu, Wenya Wang, Jun Yu, Min Zhang (possible past Tsinghua University affiliation), Jing Li (possible past Tencent (China) affiliation)
Abstract

Large language models accumulate extensive parametric knowledge through pre-training. However, knowledge conflicts occur when outdated or incorrect parametric knowledge conflicts with external knowledge in the context. Existing methods address knowledge conflicts through contrastive decoding, but in conflict-free scenarios, static approaches disrupt the output distribution. Other dynamic decoding methods attempt to measure the degree of conflict but still struggle with complex real-world situations....
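The static contrastive-decoding baseline this abstract contrasts with can be sketched as follows. This is an illustrative version of the standard context-amplification scheme, not the paper's dynamic method; the function name, the fixed `alpha`, and the toy logits are all assumptions for the example.

```python
def contrastive_logits(logits_with_context, logits_without_context, alpha=0.5):
    """Static contrastive decoding: amplify what the context adds over the
    model's parametric-only prediction. With a fixed alpha, this adjustment
    is applied even when context and memory do not conflict, which is the
    distribution-disruption issue the abstract points out."""
    return [
        (1 + alpha) * lc - alpha * lp
        for lc, lp in zip(logits_with_context, logits_without_context)
    ]

# Toy two-token vocabulary: context boosts token 0, parametric prior is flat.
adjusted = contrastive_logits([2.0, 0.0], [1.0, 1.0], alpha=0.5)
```

With `alpha=0` the adjusted logits reduce to the context-conditioned logits, which is why dynamic methods try to scale `alpha` by a measured degree of conflict.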

πŸ“„ MM-OptBench: A Solver-Grounded Benchmark for Multimodal Optimization Modeling
πŸ—“οΈ Published: 5/12/2026
πŸ”— http://arxiv.org/abs/2605.12154v1
πŸ‘₯ Authors: Zhong Li (possible past Tencent (China) affiliation), Qi Huang (possible past Meta (United States) affiliation), Yuxuan Zhu, Mohammad Mohammadi Amiri, Niki Van Stein, Thomas BΓ€ck, Matthijs Van Leeuwen, Zaiwen Wen, Lincen Yang
Abstract

Optimization modeling translates real decision-making problems into mathematical optimization models and solver-executable implementations. Although language models are increasingly used to generate optimization formulations and solver code, existing benchmarks are almost entirely text-only. This omits many optimization-modeling tasks that arise in operational practice, where requirements are described in text but instance information is conveyed through visual artifacts such as tables, graphs, ...

πŸ“„ Large Language Models as Amortized Pareto-Front Generators for Constrained Bi-Objective Convex Optimization
πŸ—“οΈ Published: 5/12/2026
πŸ”— http://arxiv.org/abs/2605.12106v1
πŸ‘₯ Authors: Peipei Xu, Siyuan Ma, Yaohua Liu, Yu Wu (possible past Baidu (China) affiliation), Guanliang Liu, Yang Zhang (possible past Tsinghua University affiliation), Yong Liu
Abstract

Generating feasible Pareto fronts for constrained bi-objective continuous optimization is central to multi-criteria decision-making. Existing methods usually rely on iterative scalarization, evolutionary search, or problem-specific solvers, requiring repeated optimization for each instance. We introduce DIPS, an end-to-end framework that fine-tunes large language models as amortized Pareto-front generators for constrained bi-objective convex optimization. Given a textual problem description, DIP...
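The iterative-scalarization baseline the abstract mentions (which DIPS aims to amortize away) can be shown on a toy unconstrained problem. Everything here is a hedged sketch: the objectives f1(x) = x^2 and f2(x) = (x-1)^2 are chosen so the weighted-sum minimizer has a closed form, and none of the names come from the paper.

```python
def scalarized_pareto_front(n_points=5):
    """Weighted-sum scalarization on a toy convex bi-objective problem:
    minimize w*f1(x) + (1-w)*f2(x) with f1(x) = x^2, f2(x) = (x-1)^2.
    Setting the derivative 2*w*x + 2*(1-w)*(x-1) to zero gives x* = 1 - w,
    so each weight w yields one Pareto-optimal point (f1(x*), f2(x*))."""
    front = []
    for i in range(n_points):
        w = i / (n_points - 1)
        x = 1 - w  # closed-form minimizer of the scalarized objective
        front.append((x * x, (x - 1) ** 2))
    return front

front = scalarized_pareto_front()
```

Note the cost structure this illustrates: one optimization per weight, repeated for every new instance, versus a single forward pass for an amortized generator.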

πŸ“„ Hölder Policy Optimisation
πŸ—“οΈ Published: 5/12/2026
πŸ”— http://arxiv.org/abs/2605.12058v1
πŸ‘₯ Authors: Yuxiang Chen, Dingli Liang, Yihang Chen, Ziqin Gong, Chenyang Le, Zhaokai Wang, Jiachen Zhu, Lingyu Yang, Jianghao Lin, Weinan Zhang (possible past Shanghai Jiao Tong University affiliation), Jun Wang (possible past Tencent (China) affiliation)
Abstract

Group Relative Policy Optimisation (GRPO) enhances large language models by estimating advantages across a group of sampled trajectories. However, mapping these trajectory-level advantages to policy updates requires aggregating token-level probabilities within each sequence. Relying on a fixed aggregation mechanism for this step fundamentally limits the algorithm's adaptability. Empirically, we observe a critical trade-off: certain fixed aggregations frequently suffer from training collapse, whi...
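The group-relative advantage estimate GRPO starts from can be sketched in a few lines; the function name and toy rewards below are assumptions for illustration, and the sketch covers only the advantage step, not the token-level aggregation the paper studies.

```python
import math

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: normalize each trajectory's reward against the
    mean and standard deviation of its sampled group. The per-trajectory
    advantage is then mapped back onto token-level updates via some
    aggregation over the sequence, which is the fixed mechanism the
    abstract argues limits adaptability."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]

# A group of four sampled trajectories with binary rewards:
adv = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

By construction the advantages sum to (approximately) zero within each group, so successful trajectories are reinforced only relative to their group peers.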

πŸ“„ OmniRefine: Alignment-Aware Cooperative Compression for Efficient Omnimodal Large Language Models
πŸ—“οΈ Published: 5/12/2026
πŸ”— http://arxiv.org/abs/2605.12056v1
πŸ‘₯ Authors: Yuchen Deng, Zidang Cai, Hai-Tao Zheng (possible past Tsinghua University affiliation), Jie Wang (possible past Tsinghua University affiliation), Feidiao Yang, Yuxing Han
Abstract

Omnimodal large language models (Omni-LLMs) show strong capability in audio-video understanding, but their practical deployment remains limited by high inference cost of long video streams and dense audio sequences. Despite recent progress, existing compression methods for Omni-LLMs typically rely on fixed or native compression units, which can disrupt cross-modal correspondence and the complementary information required for audio-video reasoning, making it difficult to improve inference efficie...

πŸ“„ L2P: Unlocking Latent Potential for Pixel Generation
πŸ—“οΈ Published: 5/12/2026
πŸ”— http://arxiv.org/abs/2605.12013v1
πŸ‘₯ Authors: Zhennan Chen, Junwei Zhu, Xu Chen (possible past Tencent (China) affiliation), Jiangning Zhang (possible past Tencent (China) affiliation), Jiawei Chen (possible past Tencent (China) affiliation), Zhuoqi Zeng, Wei Zhang (possible past Tsinghua University affiliation), Chengjie Wang (possible past Tencent (China) affiliation), Jian Yang, Ying Tai (possible past Tencent (China) affiliation)
Abstract

Pixel diffusion models have recently regained attention for visual generation. However, training advanced pixel-space models from scratch demands prohibitive computational and data resources. To address this, we propose the Latent-to-Pixel (L2P) transfer paradigm, an efficient framework that directly harnesses the rich knowledge of pre-trained LDMs to build powerful pixel-space models. Specifically, L2P discards the VAE in favor of large-patch tokenization and freezes the source LDM's intermedia...

πŸ“„ AccLock: Unlocking Identity with Heartbeat Using In-Ear Accelerometers
πŸ—“οΈ Published: 5/12/2026
πŸ”— http://arxiv.org/abs/2605.11901v1
πŸ‘₯ Authors: Lei Wang (possible past Baidu (China) affiliation), Jiangxuan Shen, Xi Zhang, Dalin Zhang, Jingyu Li, Haipeng Dai, Chenren Xu, Daqing Zhang, He Huang (possible past Baidu (China) affiliation)
Abstract

The widespread use of earphones has enabled various sensing applications, including activity recognition, health monitoring, and context-aware computing. Among these, earphone-based user authentication has become a key technique by leveraging unique biometric features. However, existing earphone-based authentication systems face key limitations: they either require explicit user interaction or active speaker output, or suffer from poor accessibility and vulnerability to environmental noise, whic...

πŸ“„ GEAR: Granularity-Adaptive Advantage Reweighting for LLM Agents via Self-Distillation
πŸ—“οΈ Published: 5/12/2026
πŸ”— http://arxiv.org/abs/2605.11853v1
πŸ‘₯ Authors: Sijia Li, Yuchen Huang, Zifan Liu, Yanping Li, Jingjing Fu, Li Zhao, Jiang Bian (possible past Baidu (China) affiliation), Ling Zhang (possible past Nvidia (United States) affiliation), Jun Zhang (possible past Tencent (China) affiliation), Rui Wang (possible past Tencent (China) affiliation)
Abstract

Reinforcement learning has become a widely used post-training approach for LLM agents, where training commonly relies on outcome-level rewards that provide only coarse supervision. While finer-grained credit assignment is promising for effective policy updates, obtaining reliable local credit and assigning it to the right parts of the long-horizon trajectory remains an open challenge. In this paper, we propose Granularity-adaptivE Advantage Reweighting (GEAR), an adaptive-granularity credit assi...

πŸ“„ MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare
πŸ—“οΈ Published: 5/12/2026
πŸ”— http://arxiv.org/abs/2605.11814v1
πŸ‘₯ Authors: Yihao Wang, Haoran Xu, Renjie Gu, Yixuan Ye, Xinyi Chen, Xinyu Mu, Yuan Gao (possible past Tencent (China) affiliation), Chunxiao Guo, Peng Wei, Jinjie Gu, Huan Li, Ke Chen (possible past Tencent (China) affiliation), Lidan Shou
Abstract

The large-scale deployment of personalized healthcare agents demands memory mechanisms that are exceptionally precise, safe, and capable of long-term clinical tracking. However, existing benchmarks primarily focus on daily open-domain conversations, failing to capture the high-stakes complexity of real-world medical applications. Motivated by the stringent production requirements of an industry-leading health management agent serving tens of millions of active users, we introduce MedMemoryBench....

πŸ“„ OptArgus: A Multi-Agent System to Detect Hallucinations in LLM-based Optimization Modeling
πŸ—“οΈ Published: 5/12/2026
πŸ”— http://arxiv.org/abs/2605.11738v1
πŸ‘₯ Authors: Zhong Li (possible past Tencent (China) affiliation), Zihan Guo, Xiaohan Lu, Juntao Wang, Jie Song (possible past ETH Zurich affiliation), Chao Shen, Jiageng Wu, Mingyang Sun
Abstract

Large language models (LLMs) are increasingly used to translate natural-language optimization problems into mathematical formulations and solver code, but matching the reference objective value is not a reliable test of correctness: an artifact may agree numerically while still changing the underlying optimization semantics. We formulate this issue as \emph{optimization-modeling hallucination detection}, namely structural consistency auditing over the problem description, symbolic model, and sol...

πŸ“„ CaC: Advancing Video Reward Models via Hierarchical Spatiotemporal Concentrating
πŸ—“οΈ Published: 5/12/2026
πŸ”— http://arxiv.org/abs/2605.11723v1
πŸ‘₯ Authors: Jiyuan Wang, Huan Ouyang, Jiuzhou Lin, Chunyu Lin, Dewen Fan, Boheng Zhang, Haonan Fan, Fei Zuo, Jia Sun, Huaiqing Wang, Honglie Wang, Yiyang Fan, Zhenlong Yuan (possible past Tsinghua University affiliation), Zijun Li, Yongrui Heng, Guosheng Lin, Fan Yang (possible past Tencent (China) affiliation), Tingting Gao
Abstract

In this paper, we propose Concentrate and Concentrate (CaC), a coarse-to-fine anomaly reward model based on Vision-Language Models. During inference, it first conducts a global temporal scan to anchor anomalous time windows, then performs fine-grained spatial grounding within the localized interval, and finally derives robust judgments via structured spatiotemporal Chain-of-Thought reasoning. To equip the model with these capabilities, we construct the first large-scale generated video anomaly d...

πŸ“„ Seirênes: Adversarial Self-Play with Evolving Distractions for LLM Reasoning
πŸ—“οΈ Published: 5/12/2026
πŸ”— http://arxiv.org/abs/2605.11636v1
πŸ‘₯ Authors: Chi Zhang (possible past Peking University affiliation), Haibo Qiu, Qiming Zhang, Yufei Xu, Xinbo Gao, Jing Zhang (possible past University Of Washington affiliation)
Abstract

We present Seirênes, a self-play RL framework that transforms contextual interference from a failure mode of LLM reasoning into an internal training signal for co-evolving more resilient reasoners. While RL with verifiable rewards has significantly advanced reasoning capabilities, models can still exhibit fragility when encountering non-idealized contexts: scenarios characterized by superfluous information, tangential instructions, or incidental correlations that differ from the clean distributi...

πŸ“„ Controllable User Simulation
πŸ—“οΈ Published: 5/12/2026
πŸ”— http://arxiv.org/abs/2605.11519v1
πŸ‘₯ Authors: Guy Tennenholtz, Ofer Meshi, Amir Globerson (possible past Google (United States) affiliation), Uri Shalit (possible past Technion – Israel Institute Of Technology affiliation), Jihwan Jeong, Craig Boutilier (possible past Google (United States) affiliation)
Abstract

Using offline datasets to evaluate conversational agents often fails to cover rare scenarios or to support testing new policies. This has motivated the use of controllable user simulators for targeted, counterfactual evaluation, typically implemented by prompting or fine-tuning large language models. In this work, we formalize controllable simulation as a causal inference problem. By bridging natural language evaluation with off-policy evaluation methodology, we show that the standard practice o...
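The off-policy evaluation machinery the abstract bridges to can be illustrated with the standard importance-sampling estimator. This is the textbook estimator, not the paper's formulation; the function name and toy numbers are assumptions.

```python
def importance_sampling_value(samples):
    """Off-policy value estimate: reweight rewards logged under a behavior
    policy mu by the ratio pi/mu, so data collected with one simulator or
    policy can estimate the value of another (counterfactual) one.
    Each sample is (pi_prob, mu_prob, reward) for the logged action."""
    total = 0.0
    for pi_prob, mu_prob, reward in samples:
        total += (pi_prob / mu_prob) * reward
    return total / len(samples)

# Target policy pi takes the rewarded action more often than mu did:
est = importance_sampling_value([(0.5, 0.25, 1.0), (0.5, 1.0, 0.0)])
```

The same reweighting logic is what makes controllable simulation a causal-inference problem: the simulator's conditioning plays the role of the behavior policy being corrected for.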

πŸ“„ ORCE: Order-Aware Alignment of Verbalized Confidence in Large Language Models
πŸ—“οΈ Published: 5/12/2026
πŸ”— http://arxiv.org/abs/2605.12446v1
πŸ‘₯ Authors: Chen Li (possible past Tencent (China) affiliation), Xiaoling Hu, Songzhu Zheng, Jiawei Zhou, Chao Chen (possible past Tencent (China) affiliation)
Abstract

Large language models (LLMs) often produce answers with high certainty even when they are incorrect, making reliable confidence estimation essential for deployment in real-world scenarios. Verbalized confidence, where models explicitly state their confidence in natural language, provides a flexible and user-facing uncertainty signal that can be applied even when token logits are unavailable. However, existing verbalized-confidence methods often optimize answer generation and confidence generatio...

πŸ“„ ORBIT: Preserving Foundational Language Capabilities in GenRetrieval via Origin-Regulated Merging
πŸ—“οΈ Published: 5/12/2026
πŸ”— http://arxiv.org/abs/2605.12419v1
πŸ‘₯ Authors: Neha Verma, Nikhil Mehta, Shao-Chuan Wang, Naijing Zhang, Alicia Tsai, Li Wei (possible past Google (United States) affiliation), Lukasz Heldt (possible past Google (United States) affiliation), Lichan Hong (possible past Google (United States) affiliation), Ed Chi, Xinyang Yi (possible past Google (United States) affiliation)
Abstract

Despite the rapid advancements in large language model (LLM) development, fine-tuning them for specific tasks often results in the catastrophic forgetting of their general, language-based reasoning abilities. This work investigates and addresses this challenge in the context of the Generative Retrieval (GenRetrieval) task. During GenRetrieval fine-tuning, we find this forgetting occurs rapidly and correlates with the distance between the fine-tuned and original model parameters. Given these obse...
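A common baseline for parameter-distance-aware merging is linear interpolation between the original and fine-tuned weights. To be clear, this is a generic merging baseline sketched under my own names and assumptions, not necessarily ORBIT's origin-regulated rule.

```python
def interpolate_weights(theta_orig, theta_ft, lam=0.5):
    """Linear weight-space merge: (1 - lam) * original + lam * fine-tuned.
    Smaller lam keeps the merged model closer to the origin model; since
    the abstract reports that forgetting correlates with distance from the
    original parameters, pulling toward the origin is the intuition behind
    origin-regulated merging."""
    return [(1 - lam) * o + lam * f for o, f in zip(theta_orig, theta_ft)]

merged = interpolate_weights([0.0, 0.0], [2.0, 4.0], lam=0.5)
```

At `lam=1.0` this recovers the fully fine-tuned (task-specialized) model, and at `lam=0.0` the original (general-capability) model, with the trade-off swept in between.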

πŸ“„ A Unified Graph Language Model for Multi-Domain Multi-Task Graph Alignment Instruction Tuning
πŸ—“οΈ Published: 5/12/2026
πŸ”— http://arxiv.org/abs/2605.12197v1
πŸ‘₯ Authors: Haibo Chen, Xin Wang (possible past University Of Edinburgh affiliation), Jiaheng Chao, Ling Feng, Wenwu Zhu (possible past Tsinghua University affiliation)
Abstract

Leveraging Graph Neural Networks (GNNs) as graph encoders and aligning the resulting representations with Large Language Models (LLMs) through alignment instruction tuning has become a mainstream paradigm for constructing Graph Language Models (GLMs), combining the generalization ability of LLMs with the structural modeling capacity of GNNs. However, existing GLMs that adopt GNNs as graph encoders largely overlook the problem of aligning GNN-encoded representations across domains and tasks with ...

πŸ“„ More Edits, More Stable: Understanding the Lifelong Normalization in Sequential Model Editing
πŸ—“οΈ Published: 5/12/2026
πŸ”— http://arxiv.org/abs/2605.11836v1
πŸ‘₯ Authors: Xin Ma, Wei Chen, Qi Liu (possible past Tencent (China) affiliation), Derong Xu, Zhi Zheng, Tong Xu (possible past Baidu (China) affiliation), Enhong Chen (possible past Baidu (China) affiliation)
Abstract

Lifelong Model Editing aims to continuously update evolving facts in Large Language Models while preserving unrelated knowledge and general capabilities, yet it remains plagued by catastrophic forgetting and model collapse. Empirically, we find that recent editors resilient over long horizons share the same core strategy: Lifelong Normalization (LN), which normalizes value gradients using running statistics. Removing LN causes immediate performance collapse, and we observe a counter-intuitive po...

πŸ“„ Slicing and Dicing: Configuring Optimal Mixtures of Experts
πŸ—“οΈ Published: 5/12/2026
πŸ”— http://arxiv.org/abs/2605.11689v1
πŸ‘₯ Authors: Margaret Li, Sneha Kudugunta (possible past Google (United States) affiliation), Danielle Rothermel, Luke Zettlemoyer (possible past University Of Washington affiliation)
Abstract

Mixture-of-Experts (MoE) architectures have become standard in large language models, yet many of their core design choices - expert count, granularity, shared experts, load balancing, token dropping - have only been studied one or two at a time over narrow configuration ranges. It remains an open question whether these choices can be optimized independently, without considering interactions. We present the first systematic study of over 2,000 pretraining runs spanning models up to 6.6B total pa...
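One of the interacting design choices listed above, expert count versus routing, can be made concrete with a minimal top-k softmax router. The function name and gate values are assumptions for illustration; this omits load balancing and token dropping, two of the other axes the study varies.

```python
import math

def topk_softmax_route(gate_logits, k=2):
    """Minimal MoE routing: select the top-k experts by gate logit and
    renormalize their softmax weights so the selected weights sum to 1.
    Expert count is len(gate_logits); k controls granularity of routing."""
    idx = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    exps = {i: math.exp(gate_logits[i]) for i in idx}
    z = sum(exps.values())
    return {i: exps[i] / z for i in idx}

# One token routed over 4 experts, keeping the top 2:
weights = topk_softmax_route([2.0, 1.0, 0.0, -1.0], k=2)
```

Even in this toy form, the coupling is visible: changing the expert count changes the gate distribution, which changes what a given `k` selects, which is why optimizing these choices independently is questionable.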

*Notable papers are those with at least two authors from a "big" AI/ML lab.