📄 Notable* Recent AI/ML arXiv Papers

📄 DiT-Reward: Generative Representations for Text-to-Image Reward Modeling
🗓️ Published: 6/22/2026
🔗 http://arxiv.org/abs/2606.23626v1
👥 Authors: Yuanming Yang, Guoqing Ma, Bo Wang (possible past Tencent (China) affiliation), Yuan Zhang (possible past Google (United States) affiliation), Wei Tang, Chenyi Li, Haoyang Huang, Nan Duan
Abstract

Can representations learned for image generation also support the evaluation of generated images? We study text-to-image reward prediction as a downstream task of generative representation learning. To this end, we introduce DiT-Reward, which converts a pretrained text-to-image Diffusion Transformer into a reward model by processing near-clean image latents and aggregating text-conditioned image representations across transformer layers. Under the same training data mixture as HPSv3, DiT-Reward ...

📄 VeriEvol: Scaling Multimodal Mathematical Reasoning via Verifiable Evol-Instruct
🗓️ Published: 6/22/2026
🔗 http://arxiv.org/abs/2606.23543v1
👥 Authors: Haoling Li, Kai Zheng, Jie Wu, Can Xu (possible past Google (United States) affiliation), Qingfeng Sun, Han Hu, Yujiu Yang (possible past Tsinghua University affiliation)
Abstract

Scaling reinforcement learning for visual mathematical reasoning requires more than generating harder questions: as data volume grows, the reward labels themselves must remain reliable. Yet existing data pipelines scale supervision while trusting the labeller, and policy-side methods assume the underlying answers are already correct. We instead treat scaling as a verifiable data-construction problem and decouple two axes before any policy update: prompt difficulty, expanded by route-specific evo...

📄 ReasoningLens: Hierarchical Visualization and Diagnostic Auditing for Large Reasoning Models
🗓️ Published: 6/22/2026
🔗 http://arxiv.org/abs/2606.23404v1
👥 Authors: Jun Zhang (possible past Tencent (China) affiliation), Jiasheng Zheng, Boxi Cao, Yaojie Lu, Hongyu Lin, Jia Zheng, Xianpei Han (possible past Tencent (China) affiliation), Le Sun
Abstract

The emergence of Large Reasoning Models has introduced exceptionally long Chain-of-Thought traces, creating a transparency burden where critical logic is often buried under massive procedural text. To address this, we present ReasoningLens, an open-source framework designed for the hierarchical visualization and diagnostic auditing of complex reasoning chains. ReasoningLens addresses information necropsy by: (1) structuring traces into interactive hierarchies that separate high-level strategy fr...

📄 VideoAgent: All-in-One Framework for Video Understanding and Editing
🗓️ Published: 6/22/2026
🔗 http://arxiv.org/abs/2606.23327v1
👥 Authors: Hengji Zhou, Lingxuan Huang, Jian Wang (possible past Baidu (China) affiliation), Bing Zhou, Si Wu, Lianghao Xia, Chao Huang (possible past Tencent (China) affiliation)
Abstract

Video editing has become essential in digital media creation, yet existing automated systems are restricted to short segment processing and domain-specific tasks. They face two critical limitations: i) inability to handle diverse video comprehension and editing operations, and ii) lack of long-video understanding for coherent narrative creation. We propose VideoAgent, an all-in-one agentic framework addressing these challenges through two key innovations. First, we develop automated video shot c...

📄 AdaReP: Adaptive Re-Planning under Model Mismatch for Neural World-Model Predictive Control
🗓️ Published: 6/22/2026
🔗 http://arxiv.org/abs/2606.23079v1
👥 Authors: Yutian Cheng, Xiaojian Ma, Xianhao Wang, Min Yang (possible past Baidu (China) affiliation), Rongpeng Su, Hangxin Liu, Xi Chen (possible past University of California, Berkeley affiliation), Shuai Li, Qing Li
Abstract

Neural world models coupled with model predictive control (MPC) replan at every environment step to bound accumulated prediction error, but this incurs substantial computational overhead. Reusing a cached plan reduces this overhead, yet its effectiveness depends on how prediction mismatch propagates through the local dynamics. We analyze this trade-off with a perturbation-based dynamic-regret framework and show that stale-plan penalties scale with the reuse tolerance, the accumulated mismatch si...
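
The trade-off described in this abstract (replanning every step versus reusing a cached plan) can be illustrated with a toy sketch. This is not the AdaReP method, whose details are cut off in the truncated abstract. It assumes a scalar system, a hand-written planner, and a simple rule that discards the cached plan once the one-step prediction error exceeds a tolerance. All names (`plan`, `run`, `tolerance`, `true_gain`, `model_gain`) are hypothetical.

```python
def plan(state, horizon, model_gain):
    """Toy planner: open-loop actions that halve the state each step
    under the (possibly wrong) model x' = x + model_gain * u."""
    actions, predicted = [], []
    x = state
    for _ in range(horizon):
        u = -0.5 * x / model_gain
        x = x + model_gain * u
        actions.append(u)
        predicted.append(x)
    return actions, predicted


def run(steps, tolerance, true_gain=0.8, model_gain=1.0, horizon=5, x0=1.0):
    """Execute cached plans on the true system x' = x + true_gain * u.

    A new plan is computed only when the cache is empty, exhausted, or was
    discarded because the observed mismatch exceeded `tolerance`.
    Returns the final state and the number of planner calls.
    """
    x = x0
    replans = 0
    actions, predicted, idx = [], [], 0
    for _ in range(steps):
        if idx >= len(actions):
            actions, predicted = plan(x, horizon, model_gain)
            idx = 0
            replans += 1
        x = x + true_gain * actions[idx]
        mismatch = abs(x - predicted[idx])
        idx += 1
        if mismatch > tolerance:
            idx = len(actions)  # assumed rule: drop the stale plan
    return x, replans


if __name__ == "__main__":
    for tol in (0.0, 0.05, 10.0):
        final_state, planner_calls = run(steps=20, tolerance=tol)
        print(f"tolerance={tol}: planner calls={planner_calls}, final state={final_state:.4f}")
```

With a tolerance of zero the sketch replans at every step, which matches the baseline the abstract describes. A very large tolerance replans only when the cached plan runs out, which saves planner calls but lets model mismatch accumulate between replans.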

📄 Attention-Spectrum Regularization for Replay-Free Continual Multimodal LLMs
🗓️ Published: 6/22/2026
🔗 http://arxiv.org/abs/2606.23063v1
👥 Authors: Chuangxin Zhao, Canran Xiao, Siyuan Ma, Mengyao Lyu, Yanbiao Ma, Jun Xia, Guiguang Ding (possible past Tsinghua University affiliation), Yang Liu (possible past Tsinghua University affiliation)
Abstract

Multimodal large language models (MLLMs) are increasingly required to adapt to non-stationary streams of visual domains, question types, and user instructions, yet continual fine-tuning often causes severe forgetting of previously acquired multimodal skills. Existing continual vision-language methods mainly preserve outputs, replay data or pseudo-data, regularize embedding geometry, or allocate task-specific parameters, but they provide limited control over how internal cross-modal attention pat...

📄 Training Open Models for Agentic Phone Use
🗓️ Published: 6/22/2026
🔗 http://arxiv.org/abs/2606.23049v1
👥 Authors: Zhengyang Tang, Xin Lai, Pengyuan Lyu (possible past Tencent (China) affiliation), Xinyuan Wang, Tianyi Bai, Chenxin Li, Yiduo Guo, Huawen Shen, Yuxuan Liu, Junyi Li, Zhengyao Fang, Yang Ding, Yi Zhang (possible past Google (United States) affiliation), Weinong Wang, Xingran Zhou, Liang Wu, Fei Tang, Sunqi Fan, Shangpin Peng, Zheng Ruan, Anran Zhang, Benyou Wang (possible past Tencent (China) affiliation), Ji-Rong Wen, Rui Yan (possible past Peking University affiliation), Chengquan Zhang (possible past Baidu (China) affiliation), Han Hu
Abstract

Phones are becoming an important execution surface for general-purpose agents, but training open models for reliable phone use remains difficult because the environment that matters at deployment, real devices running real apps, is slow, stateful, side-effectful, and hard to reset or verify, while scalable mock environments only approximate real behavior. We present PhoneBuddy, a training recipe and open-model line for agentic phone use that combines a real-app environment with a mock-app enviro...

📄 Agent-as-a-Router: Agentic Model Routing for Coding Tasks
🗓️ Published: 6/22/2026
🔗 http://arxiv.org/abs/2606.22902v1
👥 Authors: Pengfei Zhou, Zhiwei Tang, Yixing Ma, Jiasheng Tang, Yizeng Han, Zhenglin Wan, Fanqing Meng, Wei Wang (possible past University of Oxford affiliation), Bohan Zhuang, Wangbo Zhao, Yang You (possible past University of California, Berkeley affiliation)
Abstract

Real-world users typically have access to multiple Large Language Models (LLMs) from different providers, and these LLMs often excel at distinct domains, yet none dominate all. Consequently, routing each task to the most suitable model becomes critical for both performance and cost. Existing routers treat this as a static, one-off classification problem. However, we identify the performance bottleneck for these routers as information deficit: simply augmenting a vanilla LLM router with performan...

📄 RaMem: Contextual Reinstatement for Long-term Agentic Memory
🗓️ Published: 6/22/2026
🔗 http://arxiv.org/abs/2606.22844v1
👥 Authors: Wei Yang (possible past Tencent (China) affiliation), Bryce Kan, Shixuan Li, Li Li (possible past Google (United States) affiliation), Yuehan Qin, Jiate Li, Paul Bogdan, Jesse Thomason (possible past University of Washington affiliation)
Abstract

Long-term memory has become increasingly important for LLM agents that operate across extended interactions and evolving task contexts. Recent memory systems have made past experiences more persistent, compact, and retrievable, but retrieval alone does not ensure that a memory provides valid evidence for the current query. When experiences are compressed into reusable fragments, memories from different situations may appear equally relevant if they involve recurring entities or user states. We r...

📄 A Novel Approach to Temporal QoS Estimation via Extended Kalman Filter-Incorporated Latent Feature Analysis
🗓️ Published: 6/22/2026
🔗 http://arxiv.org/abs/2606.23010v1
👥 Authors: Ye Yuan (possible past Carnegie Mellon University affiliation), Song Wang, Hongxun Zhou, Ling Wang (possible past University of Oxford affiliation), Xin Luo
Abstract

Predicting temporal Quality of Service (QoS) data is critical for optimizing network services and rationalizing resource allocation in cloud computing and service-oriented systems. Existing mainstream methods have achieved promising predictive performance. However, their purely data-driven manner limits their ability to capture non-stationary temporal patterns, thereby leading to accuracy degradation when temporal QoS data exhibits fluctuations. To tackle this limitation, we propose a novel Exte...

*Notable papers are those with at least two authors from a "big" AI/ML lab.
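
As a rough illustration of the criterion in the footnote, the sketch below counts the authors on an entry's author line that carry a "possible past ... affiliation" tag and marks the entry notable when at least two do. It assumes that every tagged affiliation counts as a "big" lab, which this page does not confirm. The function names and the `min_flagged` parameter are hypothetical.

```python
import re

# Matches tags such as "(possible past Tencent (China) affiliation)".
AFFILIATION = re.compile(r"\(possible past (.+?) affiliation\)")


def flagged_affiliations(author_line: str) -> list[str]:
    """Return the affiliation named in each tag found on the author line."""
    return AFFILIATION.findall(author_line)


def is_notable(author_line: str, min_flagged: int = 2) -> bool:
    """Assumed rule: notable when at least `min_flagged` authors are tagged."""
    return len(flagged_affiliations(author_line)) >= min_flagged


if __name__ == "__main__":
    example = (
        "Yuanming Yang, Guoqing Ma, "
        "Bo Wang (possible past Tencent (China) affiliation), "
        "Yuan Zhang (possible past Google (United States) affiliation), Wei Tang"
    )
    print(flagged_affiliations(example))
    print(is_notable(example))
```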