📄 Notable* Recent AI/ML arXiv Papers


📄 Beyond VLM-Based Rewards: Diffusion-Native Latent Reward Modeling
🗓️ Published: 2/11/2026
🔗 http://arxiv.org/abs/2602.11146v1
👥 Authors: Gongye Liu, Bo Yang (possible past Tencent (China) affiliation), Yida Zhi, Zhizhou Zhong, Lei Ke (possible past Tencent (China) affiliation), Didan Deng, Han Gao (possible past Tencent (China) affiliation), Yongxiang Huang, Kaihao Zhang (possible past Tencent (China) affiliation), Hongbo Fu, Wenhan Luo (possible past Tencent (China) affiliation)
Abstract

Preference optimization for diffusion and flow-matching models relies on reward functions that are both discriminatively robust and computationally efficient. Vision-Language Models (VLMs) have emerged as the primary reward provider, leveraging their rich multimodal priors to guide alignment. However, their computation and memory cost can be substantial, and optimizing a latent diffusion generator through a pixel-space reward introduces a domain mismatch that complicates alignment. In this paper...

📄 Chatting with Images for Introspective Visual Thinking
🗓️ Published: 2/11/2026
🔗 http://arxiv.org/abs/2602.11073v1
👥 Authors: Junfei Wu, Jian Guan, Qiang Liu, Shu Wu, Liang Wang (possible past Tencent (China) affiliation), Wei Wu (possible past Tencent (China) affiliation), Tieniu Tan
Abstract

Current large vision-language models (LVLMs) typically rely on text-only reasoning based on a single-pass visual encoding, which often leads to loss of fine-grained visual information. The recent proposal of "thinking with images" attempts to alleviate this limitation by manipulating images via external tools or code; however, the resulting visual states are often insufficiently grounded in linguistic semantics, impairing effective cross-modal alignment - particularly when visual semantics o...

📄 The CLEF-2026 FinMMEval Lab: Multilingual and Multimodal Evaluation of Financial AI Systems
🗓️ Published: 2/11/2026
🔗 http://arxiv.org/abs/2602.10886v1
👥 Authors: Zhuohan Xie, Rania Elbadry, Fan Zhang, Georgi Georgiev, Xueqing Peng, Lingfei Qian, Jimin Huang, Dimitar Dimitrov, Vanshikaa Jani, Yuyang Dai, Jiahui Geng, Yuxia Wang, Ivan Koychev, Veselin Stoyanov (possible past Meta (United States) affiliation), Preslav Nakov (possible past Tencent (China) affiliation)
Abstract

We present the setup and the tasks of the FinMMEval Lab at CLEF 2026, which introduces the first multilingual and multimodal evaluation framework for financial Large Language Models (LLMs). While recent advances in financial natural language processing have enabled automated analysis of market reports, regulatory documents, and investor communications, existing benchmarks remain largely monolingual, text-only, and limited to narrow subtasks. FinMMEval 2026 addresses this gap by offering three in...

📄 Flow caching for autoregressive video generation
🗓️ Published: 2/11/2026
🔗 http://arxiv.org/abs/2602.10825v1
👥 Authors: Yuexiao Ma, Xuzhe Zheng, Jing Xu (possible past Meta (United States) affiliation), Xiwei Xu, Feng Ling, Xiawu Zheng, Huafeng Kuang, Huixia Li, Xing Wang (possible past Tencent (China) affiliation), Xuefeng Xiao, Fei Chao, Rongrong Ji (possible past Tencent (China) affiliation)
Abstract

Autoregressive models, often built on Transformer architectures, represent a powerful paradigm for generating ultra-long videos by synthesizing content in sequential chunks. However, this sequential generation process is notoriously slow. While caching strategies have proven effective for accelerating traditional video diffusion models, existing methods assume uniform denoising across all frames - an assumption that breaks down in autoregressive models where different video chunks exhibit varying ...

📄 LoCoMo-Plus: Beyond-Factual Cognitive Memory Evaluation Framework for LLM Agents
🗓️ Published: 2/11/2026
🔗 http://arxiv.org/abs/2602.10715v1
👥 Authors: Yifei Li, Weidong Guo, Lingling Zhang (possible past Google (United States) affiliation), Rongman Xu, Muye Huang, Hui Liu, Lijiao Xu, Yu Xu (possible past Tencent (China) affiliation), Jun Liu (possible past Tencent (China) affiliation)
Abstract

Long-term conversational memory is a core capability for LLM-based dialogue systems, yet existing benchmarks and evaluation protocols primarily focus on surface-level factual recall. In realistic interactions, appropriate responses often depend on implicit constraints such as user state, goals, or values that are not explicitly queried later. To evaluate this setting, we introduce LoCoMo-Plus, a benchmark for assessing cognitive memory under cue-trigger semantic disconnect, where model...

📄 Spend Search Where It Pays: Value-Guided Structured Sampling and Optimization for Generative Recommendation
🗓️ Published: 2/11/2026
🔗 http://arxiv.org/abs/2602.10699v1
👥 Authors: Jie Jiang (possible past Tencent (China) affiliation), Yangru Huang, Zeyu Wang, Changping Wang, Yuling Xiong, Jun Zhang (possible past Tencent (China) affiliation), Huan Yu
Abstract

Generative recommendation via autoregressive models has unified retrieval and ranking into a single conditional generation framework. However, fine-tuning these models with Reinforcement Learning (RL) often suffers from a fundamental probability-reward mismatch. Conventional likelihood-dominated decoding (e.g., beam search) exhibits a myopic bias toward locally probable prefixes, which causes two critical failures: (1) insufficient exploration, where high-reward items in low-probability branches...
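For intuition only (this toy is not the paper's algorithm), the sketch below shows how a value-guided score that mixes sequence log-probability with an estimated reward can surface a high-reward item that likelihood-dominated decoding would rank last; the candidate set, values, and the beta weight are all invented.

```python
# Toy illustration (not the paper's method): likelihood-only ranking vs. a
# value-guided score log p(item) + beta * value(item). Numbers are made up.
candidates = {          # item: (sequence log-prob, estimated reward/value)
    "item_A": (-0.5, 0.2),   # very likely under the model, low reward
    "item_B": (-2.0, 0.9),   # low-probability prefix, high reward
    "item_C": (-1.0, 0.4),
}

def likelihood_rank(cands):
    return sorted(cands, key=lambda k: cands[k][0], reverse=True)

def value_guided_rank(cands, beta=3.0):
    return sorted(cands, key=lambda k: cands[k][0] + beta * cands[k][1], reverse=True)

print("likelihood-only:", likelihood_rank(candidates))   # item_A first
print("value-guided:  ", value_guided_rank(candidates))  # item_B first
```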

📄 Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters
🗓️ Published: 2/11/2026
🔗 http://arxiv.org/abs/2602.10604v1
👥 Authors: Ailin Huang, Ang Li (possible past Google (United States) affiliation), Aobo Kong, Bin Wang, Binxing Jiao, Bo Dong, Bojun Wang, Boyu Chen, Brian Li, Buyun Ma, Chang Su, Changxin Miao, Changyi Wan, Chao Lou, Chen Hu, Chen Xu, Chenfeng Yu, Chengting Feng, Chengyuan Yao, Chunrui Han, Dan Ma, Dapeng Shi, Daxin Jiang, Dehua Ma, Deshan Sun, Di Qi, Enle Liu, Fajie Zhang, Fanqi Wan, Guanzhe Huang, Gulin Yan, Guoliang Cao, Guopeng Li, Han Cheng, Hangyu Guo, Hanshan Zhang, Hao Nie, Haonan Jia, Haoran Lv, Hebin Zhou, Hekun Lv, Heng Wang, Heung-Yeung Shum, Hongbo Huang, Hongbo Peng, Hongyu Zhou, Hongyuan Wang, Houyong Chen, Huangxi Zhu, Huimin Wu, Huiyong Guo, Jia Wang, Jian Zhou (possible past Tencent (China) affiliation), Jianjian Sun, Jiaoren Wu, Jiaran Zhang, Jiashu Lv, Jiashuo Liu, Jiayi Fu, Jiayu Liu, Jie Cheng, Jie Luo, Jie Yang (possible past Shanghai Jiao Tong University affiliation), Jie Zhou (possible past Tsinghua University affiliation), Jieyi Hou, Jing Bai, Jingcheng Hu, Jingjing Xie, Jingwei Wu, Jingyang Zhang, Jishi Zhou, Junfeng Liu, Junzhe Lin, Ka Man Lo, Kai Liang, Kaibo Liu, Kaijun Tan, Kaiwen Yan, Kaixiang Li, Kang An, Kangheng Lin, Lei Yang (possible past Google (United States) affiliation), Liang Lv, Liang Zhao (possible past Baidu (China) affiliation), Liangyu Chen, Lieyu Shi, Liguo Tan, Lin Lin, Lina Chen, Luck Ma, Mengqiang Ren, Michael Li, Ming Li, Mingliang Li, Mingming Zhang, Mingrui Chen, Mitt Huang, Na Wang, Peng Liu, Qi Han, Qian Zhao, Qinglin He, Qinxin Du, Qiuping Wu, Quan Sun, Rongqiu Yang, Ruihang Miao, Ruixin Han, Ruosi Wan, Ruyan Guo, Shan Wang, Shaoliang Pang, Shaowen Yang, Shengjie Fan, Shijie Shang, Shiliang Yang, Shiwei Li, Shuangshuang Tian, Siqi Liu (possible past University Of Oxford affiliation), Siye Wu, Siyu Chen, Song Yuan, Tiancheng Cao, Tianchi Yue, Tianhao Cheng, Tianning Li, Tingdan Luo, Wang You, Wei Ji (possible past Tencent (China) affiliation), Wei Yuan, Wei Zhang (possible past Tsinghua University affiliation), Weibo Wu, Weihao Xie, Wen Sun, Wenjin Deng, Wenzhen Zheng, Wuxun Xie, Xiangfeng Wang, Xiangwen Kong, Xiangyu Liu, Xiangyu Zhang, Xiaobo Yang, Xiaojia Liu, Xiaolan Yuan, Xiaoran Jiao, Xiaoxiao Ren, Xiaoyun Zhang (possible past Shanghai Jiao Tong University affiliation), Xin Li (possible past Google (United States) affiliation), Xin Liu, Xin Wu, Xing Chen, Xingping Yang, Xinran Wang, Xu Zhao, Xuan He, Xuanti Feng, Xuedan Cai, Xuqiang Zhou, Yanbo Yu, Yang Li (possible past Google (United States) affiliation), Yang Xu, Yanlin Lai, Yanming Xu, Yaoyu Wang, Yeqing Shen, Yibo Zhu, Yichen Lv, Yicheng Cao, Yifeng Gong, Yijing Yang, Yikun Yang, Yin Zhao, Yingxiu Zhao, Yinmin Zhang, Yitong Zhang, Yixuan Zhang, Yiyang Chen, Yongchi Zhao, Yongshen Long, Yongyao Wang, Yousong Guan, Yu Zhou, Yuang Peng, Yuanhao Ding, Yuantao Fan, Yuanzhen Yang, Yuchu Luo, Yudi Zhao, Yue Peng, Yueqiang Lin, Yufan Lu, Yuling Zhao, Yunzhou Ju, Yurong Zhang, Yusheng Li, Yuxiang Yang, Yuyang Chen, Yuzhu Cai, Zejia Weng, Zetao Hong, Zexi Li, Zhe Xie, Zheng Ge, Zheng Gong, Zheng Zeng, Zhenyi Lu, Zhewei Huang, Zhichao Chang, Zhiguo Huang, Zhiheng Hu, Zidong Yang, Zili Wang, Ziqi Ren, Zixin Zhang, Zixuan Wang
Abstract

We introduce Step 3.5 Flash, a sparse Mixture-of-Experts (MoE) model that bridges frontier-level agentic intelligence and computational efficiency. We focus on what matters most when building agents: sharp reasoning and fast, reliable execution. Step 3.5 Flash pairs a 196B-parameter foundation with 11B active parameters for efficient inference. It is optimized with interleaved 3:1 sliding-window/full attention and Multi-Token Prediction (MTP-3) to reduce the latency and cost of multi-round agent...
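As a rough illustration of the interleaved 3:1 sliding-window/full-attention layout mentioned in the abstract (not the model's actual implementation), the sketch below builds a layer schedule and the corresponding causal or windowed attention masks; the layer count and window size are arbitrary choices.

```python
# Hypothetical sketch of a 3:1 sliding-window / full-attention interleaving
# and the boolean masks each layer type would use. Not Step 3.5 Flash code.
import numpy as np

def layer_schedule(num_layers, ratio=3):
    """Every (ratio+1)-th layer uses full attention; the rest use a sliding window."""
    return ["full" if (i + 1) % (ratio + 1) == 0 else "sliding" for i in range(num_layers)]

def causal_mask(seq_len, window=None):
    """True where query i may attend to key j (causal, optionally windowed)."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    mask = j <= i                        # causal constraint
    if window is not None:
        mask &= (i - j) < window         # restrict attention to a local window
    return mask

print(layer_schedule(8))                 # three sliding layers for every full layer
print(causal_mask(6, window=3).astype(int))
```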

📄 Constructing Industrial-Scale Optimization Modeling Benchmark
🗓️ Published: 2/11/2026
🔗 http://arxiv.org/abs/2602.10450v1
👥 Authors: Zhong Li (possible past Tencent (China) affiliation), Hongliang Lu, Tao Wei (possible past Baidu (China) affiliation), Wenyu Liu, Yuxuan Chen, Yuan Lan, Fan Zhang, Zaiwen Wen
Abstract

Optimization modeling underpins decision-making in logistics, manufacturing, energy, and finance, yet translating natural-language requirements into correct optimization formulations and solver-executable code remains labor-intensive. Although large language models (LLMs) have been explored for this task, evaluation is still dominated by toy-sized or synthetic benchmarks, masking the difficulty of industrial problems with 10^3 to 10^6 (or more) variables and constraints. A key bottleneck i...
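To make the task concrete, here is a toy example of the kind of solver-executable code such a benchmark expects an LLM to produce from a natural-language requirement: a two-variable production-planning LP solved with SciPy's HiGHS backend. The problem and all numbers are invented and are far below the industrial scales the abstract describes.

```python
# Hypothetical example of NL-to-solver-code translation (not from the benchmark).
# "Maximize profit 3x + 5y subject to 2x + y <= 100 machine-hours,
#  x + 3y <= 90 labor-hours, x >= 0, y >= 0."
from scipy.optimize import linprog

result = linprog(
    c=[-3, -5],                        # linprog minimizes, so negate the profit
    A_ub=[[2, 1], [1, 3]],             # machine-hour and labor-hour constraints
    b_ub=[100, 90],
    bounds=[(0, None), (0, None)],
    method="highs",
)
print("optimal plan:", result.x, "profit:", -result.fun)
```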

📄 Breaking the Curse of Repulsion: Optimistic Distributionally Robust Policy Optimization for Off-Policy Generative Recommendation
🗓️ Published: 2/11/2026
🔗 http://arxiv.org/abs/2602.10430v1
👥 Authors: Jie Jiang (possible past Tencent (China) affiliation), Yusen Huo, Xiangxin Zhan, Changping Wang, Jun Zhang (possible past Tencent (China) affiliation)
Abstract

Policy-based Reinforcement Learning (RL) has established itself as the dominant paradigm in generative recommendation for optimizing sequential user interactions. However, when applied to offline historical logs, these methods suffer a critical failure: the dominance of low-quality data induces severe model collapse. We first establish the Divergence Theory of Repulsive Optimization, revealing that negative gradient updates inherently trigger exponential intensity explosion during off-policy tra...

📄 AI-rithmetic
🗓️ Published: 2/11/2026
🔗 http://arxiv.org/abs/2602.10416v1
👥 Authors: Alex Bie, Travis Dick, Alex Kulesza, Prabhakar Raghavan (possible past Google (United States) affiliation), Vinod Raman, Sergei Vassilvitskii (possible past Google (United States) affiliation)
Abstract

Modern AI systems have been successfully deployed to win medals at international math competitions, assist with research workflows, and prove novel technical lemmas. However, despite their progress at advanced levels of mathematics, they remain stubbornly bad at basic arithmetic, consistently failing on the simple task of adding two numbers. We present a systematic investigation of this phenomenon. We demonstrate empirically that all frontier models suffer significantly degraded accuracy for int...
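A minimal sketch of how one might measure addition accuracy as a function of operand length; the "model" here is a deliberately flaky stand-in (no real LLM is queried), and this is not the paper's evaluation harness.

```python
# Toy harness: accuracy on n-digit integer addition for any answer_fn.
import random

def make_problem(num_digits):
    lo, hi = 10 ** (num_digits - 1), 10 ** num_digits - 1
    return random.randint(lo, hi), random.randint(lo, hi)

def accuracy(answer_fn, num_digits, trials=500):
    correct = 0
    for _ in range(trials):
        a, b = make_problem(num_digits)
        correct += int(answer_fn(a, b) == a + b)
    return correct / trials

def flaky_adder(a, b):
    """Stand-in 'model': with probability growing in the number of digits,
    subtract a random power of ten (a crude proxy for a dropped carry)."""
    exact = a + b
    if random.random() < 0.03 * len(str(exact)):
        exact -= 10 ** random.randrange(len(str(exact)))
    return exact

for d in (2, 8, 32):
    print(f"{d}-digit accuracy: {accuracy(flaky_adder, d):.2f}")
```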

📄 Affordances Enable Partial World Modeling with LLMs
🗓️ Published: 2/11/2026
🔗 http://arxiv.org/abs/2602.10390v1
👥 Authors: Khimya Khetarpal (possible past Deepmind (United Kingdom) affiliation), Gheorghe Comanici, Jonathan Richens, Jeremy Shar, Fei Xia (possible past Stanford University affiliation), Laurent Orseau (possible past Deepmind (United Kingdom) affiliation), Aleksandra Faust (possible past Google (United States) affiliation), Doina Precup (possible past Deepmind (United Kingdom) affiliation)
Abstract

Full models of the world require complex knowledge of immense detail. While pre-trained large models have been hypothesized to contain similar knowledge due to extensive pre-training on vast amounts of internet-scale data, using them directly in a search procedure is inefficient and inaccurate. Conversely, partial models focus on making high-quality predictions for a subset of states and actions: those linked through affordances that achieve user intents (Khetarpal et al., 2020). Can we posit lar...

📄 Self-Evolving Recommendation System: End-To-End Autonomous Model Optimization With LLM Agents
🗓️ Published: 2/10/2026
🔗 http://arxiv.org/abs/2602.10226v1
👥 Authors: Haochen Wang, Yi Wu (possible past University Of California, Berkeley affiliation), Daryl Chang, Li Wei (possible past Google (United States) affiliation), Lukasz Heldt (possible past Google (United States) affiliation)
Abstract

Optimizing large-scale machine learning systems, such as recommendation models for global video platforms, requires navigating a massive hyperparameter search space and, more critically, designing sophisticated optimizers, architectures, and reward functions to capture nuanced user behaviors. Achieving substantial improvements in these areas is a non-trivial task, traditionally relying on extensive manual iterations to test new hypotheses. We propose a self-evolving system that leverages Large L...

📄 Causality in Video Diffusers is Separable from Denoising
🗓️ Published: 2/10/2026
🔗 http://arxiv.org/abs/2602.10095v1
👥 Authors: Xingjian Bai, Guande He, Zhengqi Li (possible past Google (United States) affiliation), Eli Shechtman, Xun Huang (possible past Nvidia (United States) affiliation), Zongze Wu
Abstract

Causality (temporal, uni-directional cause-effect relationships between components) underlies many complex generative processes, including videos, language, and robot trajectories. Current causal diffusion models entangle temporal reasoning with iterative denoising, applying causal attention across all layers, at every denoising step, and over the entire context. In this paper, we show that the causal reasoning in these models is separable from the multi-step denoising process. ...

📄 Towards Autonomous Mathematics Research
🗓️ Published: 2/10/2026
🔗 http://arxiv.org/abs/2602.10177v1
👥 Authors: Tony Feng, Trieu H. Trinh (possible past Google (United States) affiliation), Garrett Bingham, Dawsen Hwang, Yuri Chervonyi, Junehyuk Jung, Joonkyung Lee, Carlo Pagano, Sang-Hyun Kim, Federico Pasqualotto, Sergei Gukov, Jonathan N. Lee, Junsu Kim, Kaiying Hou, Golnaz Ghiasi (possible past Google (United States) affiliation), Yi Tay (possible past Stanford University affiliation), Yaguang Li, Chenkai Kuang, Yuan Liu (possible past Google (United States) affiliation), Hanzhao Lin, Evan Zheran Liu, Nigamaa Nayakanti, Xiaomeng Yang, Heng-Tze Cheng (possible past Google (United States) affiliation), Demis Hassabis (possible past Google (United States) affiliation), Koray Kavukcuoglu (possible past Google (United States) affiliation), Quoc V. Le (possible past Stanford University affiliation), Thang Luong (possible past Stanford University affiliation)
Abstract

Recent advances in foundational models have yielded reasoning systems capable of achieving a gold-medal standard at the International Mathematical Olympiad. The transition from competition-level problem-solving to professional research, however, requires navigating vast literature and constructing long-horizon proofs. In this work, we introduce Aletheia, a math research agent that iteratively generates, verifies, and revises solutions end-to-end in natural language. Specifically, Aletheia is pow...

📄 Chain of Mindset: Reasoning with Adaptive Cognitive Modes
🗓️ Published: 2/10/2026
🔗 http://arxiv.org/abs/2602.10063v1
👥 Authors: Tianyi Jiang, Arctanx An, Hengyi Feng, Naixin Zhai, Haodong Li, Xiaomin Yu, Jiahui Liu (possible past Google (United States) affiliation), Hanwen Du, Shuo Zhang (possible past National University Of Defense Technology affiliation), Zhi Yang, Jie Huang, Yuhua Li, Yongxin Ni, Huacan Wang, Ronghao Chen
Abstract

Human problem-solving is never the repetition of a single mindset (by which we mean a distinct mode of cognitive processing). When tackling a specific task, we do not rely on one mindset alone; instead, we integrate multiple mindsets within a single solution process. However, existing LLM reasoning methods fall into a common trap: they apply the same fixed mindset across all steps, overlooking that different stages of solving the same problem require fundamentally different mindsets. This single...

📄 Kunlun: Establishing Scaling Laws for Massive-Scale Recommendation Systems through Unified Architecture Design
🗓️ Published: 2/10/2026
🔗 http://arxiv.org/abs/2602.10016v1
👥 Authors: Bojian Hou, Xiaolong Liu, Xiaoyi Liu, Jiaqi Xu, Yasmine Badr, Mengyue Hang, Sudhanshu Chanpuriya, Junqing Zhou, Yuhang Yang, Han Xu, Qiuling Suo, Laming Chen, Yuxi Hu, Jiasheng Zhang, Huaqing Xiong, Yuzhen Huang, Chao Chen (possible past Tencent (China) affiliation), Yue Dong, Yi Yang (possible past Baidu (China) affiliation), Shuo Chang, Xiaorui Gan, Wenlin Chen (possible past Meta (United States) affiliation), Santanu Kolay, Darren Liu, Jade Nie, Chunzhi Yang, Jiyan Yang (possible past Meta (United States) affiliation), Huayu Li
Abstract

Deriving predictable scaling laws that govern the relationship between model performance and computational investment is crucial for designing and allocating resources in massive-scale recommendation systems. While such laws are established for large language models, they remain challenging for recommendation systems, especially those processing both user history and context features. We identify poor scaling efficiency as the main barrier to predictable power-law scaling, stemming from ineffici...
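As background on the scaling-law framing (this is a generic fit, not the paper's result), the snippet below fits a power law loss = a * compute^(-b) to synthetic points in log-log space and extrapolates one decade beyond the data.

```python
# Generic power-law fit on synthetic (compute, loss) points; not Kunlun's data.
import numpy as np

compute = np.array([1e18, 1e19, 1e20, 1e21])      # training FLOPs (synthetic)
loss = 3.1e4 * compute ** -0.25 * (1 + 0.02 * np.random.default_rng(0).normal(size=4))

# log(loss) = log(a) - b * log(compute) is linear, so fit with polyfit.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), deg=1)
a_hat, b_hat = np.exp(intercept), -slope
print(f"fitted: loss ~= {a_hat:.3g} * compute^(-{b_hat:.3f})")
print("extrapolated loss at 1e22 FLOPs:", a_hat * 1e22 ** (-b_hat))
```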

📄 Infusion: Shaping Model Behavior by Editing Training Data via Influence Functions
🗓️ Published: 2/10/2026
🔗 http://arxiv.org/abs/2602.09987v2
👥 Authors: J Rosser, Robert Kirk, Edward Grefenstette (possible past University Of Oxford affiliation), Jakob Foerster (possible past University Of Oxford affiliation), Laura Ruis
Abstract

Influence functions are commonly used to attribute model behavior to training documents. We explore the reverse: crafting training data that induces model behavior. Our framework, Infusion, uses scalable influence-function approximations to compute small perturbations to training documents that induce targeted changes in model behavior through parameter shifts. We evaluate Infusion on data poisoning tasks across vision and language domains. On CIFAR-10, we show that making subtle edits via Infus...
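A minimal sketch of the underlying idea under a crude identity-Hessian (gradient-alignment) approximation, not the scalable influence-function machinery the paper uses: differentiate the alignment between a training example's parameter gradient and a target behaviour's parameter gradient with respect to the training input, then nudge the input along that direction so a subsequent descent step on the edited example moves the target behaviour as desired. The toy model and data are invented.

```python
# Gradient-alignment sketch of "edit training data to induce a behaviour".
# Identity-Hessian approximation on a toy linear model; not Infusion itself.
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
loss_fn = torch.nn.MSELoss()

x_train = torch.randn(1, 4, requires_grad=True)            # document we may edit
y_train = torch.zeros(1, 1)
x_target, y_target = torch.randn(1, 4), torch.ones(1, 1)   # behaviour to induce

# Parameter gradient of the target loss (a fixed direction).
g_target = torch.autograd.grad(loss_fn(model(x_target), y_target),
                               list(model.parameters()))

# Parameter gradient of the training loss, keeping the graph so the alignment
# can be differentiated with respect to the training input itself.
g_train = torch.autograd.grad(loss_fn(model(x_train), y_train),
                              list(model.parameters()), create_graph=True)

# Influence proxy: alignment of the two parameter gradients. A gradient-descent
# step on the training example changes the target loss by about -lr * alignment,
# so increasing alignment makes that step push the target behaviour closer.
alignment = sum((gt * gr).sum() for gt, gr in zip(g_target, g_train))

edit_direction, = torch.autograd.grad(alignment, x_train)
x_edited = (x_train + 1e-2 * edit_direction).detach()      # small targeted edit
print(edit_direction)
```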

📄 Code2World: A GUI World Model via Renderable Code Generation
🗓️ Published: 2/10/2026
🔗 http://arxiv.org/abs/2602.09856v1
👥 Authors: Yuhao Zheng, Li'an Zhong, Yi Wang, Rui Dai, Kaikui Liu, Xiangxiang Chu, Linyuan Lv, Philip Torr (possible past University Of Oxford affiliation), Kevin Qinghong Lin (possible past National University Of Singapore affiliation)
Abstract

Autonomous GUI agents interact with environments by perceiving interfaces and executing actions. As a virtual sandbox, the GUI World model empowers agents with human-like foresight by enabling action-conditioned prediction. However, existing text- and pixel-based approaches struggle to simultaneously achieve high visual fidelity and fine-grained structural controllability. To this end, we propose Code2World, a vision-language coder that simulates the next visual state via renderable code generat...

📄 EvoCodeBench: A Human-Performance Benchmark for Self-Evolving LLM-Driven Coding Systems
🗓️ Published: 2/10/2026
🔗 http://arxiv.org/abs/2602.10171v1
👥 Authors: Wentao Zhang (possible past Mila - Quebec Artificial Intelligence Institute affiliation), Jianfeng Wang, Liheng Liang, Yilei Zhao, Haibin Wen, Zhe Zhao (possible past Tencent (China) affiliation)
Abstract

As large language models (LLMs) continue to advance in programming tasks, LLM-driven coding systems have evolved from one-shot code generation into complex systems capable of iterative improvement during inference. However, existing code benchmarks primarily emphasize static correctness and implicitly assume fixed model capability during inference. As a result, they do not capture inference-time self-evolution, such as whether accuracy and efficiency improve as an agent iteratively refines its s...

📄 YOR: Your Own Mobile Manipulator for Generalizable Robotics
🗓️ Published: 2/11/2026
🔗 http://arxiv.org/abs/2602.11150v1
👥 Authors: Manan H Anjaria, Mehmet Enes Erciyes, Vedant Ghatnekar, Neha Navarkar, Haritheja Etukuru, Xiaole Jiang, Kanad Patel, Dhawal Kabra, Nicholas Wojno, Radhika Ajay Prayage, Soumith Chintala (possible past Meta (United States) affiliation), Lerrel Pinto (possible past Carnegie Mellon University affiliation), Nur Muhammad Mahi Shafiullah, Zichen Jeff Cui
Abstract

Recent advances in robot learning have generated significant interest in capable platforms that may eventually approach human-level competence. This interest, combined with the commoditization of actuators, has propelled growth in low-cost robotic platforms. However, the optimal form factor for mobile manipulation, especially on a budget, remains an open question. We introduce YOR, an open-source, low-cost mobile manipulator that integrates an omnidirectional base, a telescopic vertical lift, an...

📄 RePO: Bridging On-Policy Learning and Off-Policy Knowledge through Rephrasing Policy Optimization
🗓️ Published: 2/11/2026
🔗 http://arxiv.org/abs/2602.10819v1
👥 Authors: Linxuan Xia, Xiaolong Yang (possible past Baidu (China) affiliation), Yongyuan Chen, Enyue Zhao, Deng Cai (possible past Shanghai Jiao Tong University affiliation), Yasheng Wang, Boxi Wu
Abstract

Aligning large language models (LLMs) on domain-specific data remains a fundamental challenge. Supervised fine-tuning (SFT) offers a straightforward way to inject domain knowledge but often degrades the model's generality. In contrast, on-policy reinforcement learning (RL) preserves generality but fails to effectively assimilate hard samples that exceed the model's current reasoning level. Recent off-policy RL attempts improve hard sample utilization, yet they suffer from severe training instabi...

📄 Why Does RL Generalize Better Than SFT? A Data-Centric Perspective on VLM Post-Training
🗓️ Published: 2/11/2026
🔗 http://arxiv.org/abs/2602.10815v1
👥 Authors: Aojun Lu, Tao Feng, Hangjie Yuan, Wei Li (possible past Peking University affiliation), Yanan Sun (possible past Tencent (China) affiliation)
Abstract

The adaptation of large-scale Vision-Language Models (VLMs) through post-training reveals a pronounced generalization gap: models fine-tuned with Reinforcement Learning (RL) consistently achieve superior out-of-distribution (OOD) performance compared to those trained with Supervised Fine-Tuning (SFT). This paper posits a data-centric explanation for this phenomenon, contending that RL's generalization advantage arises from an implicit data filtering mechanism that inherently prioritizes medium-d...
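For intuition about the "implicit data filtering" claim, a generic group-relative (GRPO-style) advantage computation is sketched below; it is not necessarily the paper's setup. Prompts where every rollout succeeds or every rollout fails yield zero advantage, so gradient signal concentrates on medium-difficulty samples.

```python
# Generic GRPO-style group advantages (illustrative, not the paper's analysis).
import numpy as np

def group_advantages(rewards):
    """Reward minus group mean, normalized; all-zero when rollouts agree."""
    centered = rewards - rewards.mean()
    std = rewards.std()
    return centered / std if std > 0 else np.zeros_like(rewards)

for name, rewards in [("easy (all correct)", np.ones(4)),
                      ("hard (all wrong)  ", np.zeros(4)),
                      ("medium (mixed)    ", np.array([1.0, 0.0, 1.0, 0.0]))]:
    print(name, "->", group_advantages(rewards))
```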

📄 PRISM: Parallel Residual Iterative Sequence Model
🗓️ Published: 2/11/2026
🔗 http://arxiv.org/abs/2602.10796v1
👥 Authors: Jie Jiang (possible past Tencent (China) affiliation), Ke Cheng, Xin Xu, Mengyang Pang, Tianhao Lu, Jiaheng Li, Yue Liu, Yuan Wang, Jun Zhang (possible past Tencent (China) affiliation), Huan Yu, Zhouchen Lin (possible past Peking University affiliation)
Abstract

Generative sequence modeling faces a fundamental tension between the expressivity of Transformers and the efficiency of linear sequence models. Existing efficient architectures are theoretically bounded by shallow, single-step linear updates, while powerful iterative methods like Test-Time Training (TTT) break hardware parallelism due to state-dependent gradients. We propose PRISM (Parallel Residual Iterative Sequence Model) to resolve this tension. PRISM introduces a solver-inspired inductive b...

📄 SnapMLA: Efficient Long-Context MLA Decoding via Hardware-Aware FP8 Quantized Pipelining
🗓️ Published: 2/11/2026
🔗 http://arxiv.org/abs/2602.10718v1
👥 Authors: Yifan Zhang, Zunhai Su, Shuhao Hu, Rui Yang, Wei Wu (possible past Tencent (China) affiliation), Yulei Qian (possible past Baidu (China) affiliation), Yuchen Xie, Xunliang Cai
Abstract

While FP8 attention has shown substantial promise in innovations like FlashAttention-3, its integration into the decoding phase of the DeepSeek Multi-head Latent Attention (MLA) architecture presents notable challenges. These challenges include numerical heterogeneity arising from the decoupling of positional embeddings, misalignment of quantization scales in FP8 PV GEMM, and the need for optimized system-level support. In this paper, we introduce SnapMLA, an FP8 MLA decoding framework optimized...
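A minimal sketch of the per-tile scale bookkeeping at issue (not SnapMLA itself): each operand of the PV GEMM carries its own FP8 scale and the product must be rescaled by both. Actual 8-bit rounding is omitted; the only FP8 detail used is E4M3's maximum magnitude of 448.

```python
# Per-tile FP8 scale handling for a PV GEMM (illustrative only; no real
# 8-bit rounding is performed here).
import numpy as np

FP8_E4M3_MAX = 448.0

def fp8_scale(tile):
    """Scale mapping the tile's dynamic range into the FP8 representable range."""
    return float(np.max(np.abs(tile))) / FP8_E4M3_MAX + 1e-12

rng = np.random.default_rng(0)
P = rng.normal(size=(8, 64))           # attention-probability tile (toy values)
V = rng.normal(size=(64, 16))          # value tile

s_p, s_v = fp8_scale(P), fp8_scale(V)
# The GEMM runs on the scaled operands; the output must be rescaled by s_p * s_v.
out = (P / s_p) @ (V / s_v) * (s_p * s_v)
print("max abs error vs. full precision:", np.max(np.abs(out - P @ V)))
```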

📄 Gauss-Newton Unlearning for the LLM Era
🗓️ Published: 2/11/2026
🔗 http://arxiv.org/abs/2602.10568v1
👥 Authors: Lev Mckinney, Anvith Thudi, Juhan Bae, Tara Rezaei, Nicolas Papernot (possible past University Of Toronto affiliation), Sheila A. Mcilraith, Roger Grosse (possible past University Of Toronto affiliation)
Abstract

Standard large language model training can create models that produce outputs their trainer deems unacceptable in deployment. The probability of these outputs can be reduced using methods such as LLM unlearning. However, unlearning a set of data (called the forget set) can degrade model performance on other distributions where the trainer wants to retain the model's behavior. To improve this trade-off, we demonstrate that using the forget set to compute only a few uphill Gauss-Newton steps provi...
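To ground the phrase "uphill Gauss-Newton steps" (this toy is not the paper's method), the sketch below computes the Gauss-Newton direction on a linear least-squares forget set and applies it with a plus sign, increasing loss on the forget set in a curvature-aware way.

```python
# Toy "uphill" Gauss-Newton step on a linear least-squares forget set.
# The GN direction d = (J^T J)^-1 J^T r normally reduces the residual;
# applying it with a '+' sign moves away from the forget-set minimizer.
import numpy as np

rng = np.random.default_rng(0)
X_forget = rng.normal(size=(8, 3))       # toy forget-set inputs
y_forget = rng.normal(size=8)            # toy forget-set targets
theta = rng.normal(size=3)               # current parameters of a linear model

def forget_residuals(theta):
    return X_forget @ theta - y_forget   # r(theta); the Jacobian is X_forget

J, r = X_forget, forget_residuals(theta)
damping = 1e-3 * np.eye(3)               # Levenberg-style damping for stability
gn_direction = np.linalg.solve(J.T @ J + damping, J.T @ r)

theta_unlearned = theta + 0.5 * gn_direction   # '+' = uphill on the forget loss
print("forget loss before:", 0.5 * np.sum(r ** 2))
print("forget loss after: ", 0.5 * np.sum(forget_residuals(theta_unlearned) ** 2))
```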

📄 Compute Only Once: UG-Separation for Efficient Large Recommendation Models
🗓️ Published: 2/11/2026
🔗 http://arxiv.org/abs/2602.10455v1
👥 Authors: Hui Lu, Zheng Chai, Shipeng Bai, Hao Zhang (possible past Tencent (China) affiliation), Zhifang Fan, Kunmin Bai, Yingwen Wu, Bingzheng Wei (possible past Tencent (China) affiliation), Xiang Sun, Ziyan Gong, Tianyi Liu, Hua Chen, Deping Xie (possible past Baidu (China) affiliation), Zhongkai Chen, Zhiliang Guo, Qiwei Chen, Yuchao Zheng
Abstract

Driven by scaling laws, recommender systems increasingly rely on large-scale models to capture complex feature interactions and user behaviors, but this trend also leads to prohibitive training and inference costs. While long-sequence models (e.g., LONGER) can reuse user-side computation through KV caching, such reuse is difficult in dense feature interaction architectures (e.g., RankMixer), where user and group (candidate item) features are deeply entangled across layers. In this work, we propose...
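A minimal sketch of the "compute user-side features once, reuse them for every candidate" pattern that deep user-item entanglement breaks; the two-tower scoring function below is a generic stand-in, not the paper's architecture.

```python
# Generic compute-once user tower reused across candidates (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
W_user = rng.normal(size=(32, 64))       # user tower (toy random weights)
W_item = rng.normal(size=(32, 16))       # item tower

def score_candidates(user_features, item_features_batch):
    user_repr = W_user @ user_features           # computed once per request
    item_reprs = item_features_batch @ W_item.T  # one row per candidate item
    return item_reprs @ user_repr                # dot-product scores

scores = score_candidates(rng.normal(size=64), rng.normal(size=(1000, 16)))
print(scores.shape)   # (1000,): the user tower ran once, not 1000 times
```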

📄 End-to-End Semantic ID Generation for Generative Advertisement Recommendation
🗓️ Published: 2/11/2026
🔗 http://arxiv.org/abs/2602.10445v1
👥 Authors: Jie Jiang (possible past Tencent (China) affiliation), Xinxun Zhang, Enming Zhang, Yuling Xiong, Jun Zhang (possible past Tencent (China) affiliation), Jingwen Wang (possible past Tencent (China) affiliation), Huan Yu, Yuxiang Wang, Hao Wang (possible past Tsinghua University affiliation), Xiao Yan, Jiawei Jiang (possible past Tencent (China) affiliation)
Abstract

Generative Recommendation (GR) has excelled by framing recommendation as next-token prediction. This paradigm relies on Semantic IDs (SIDs) to tokenize large-scale items into discrete sequences. Existing GR approaches predominantly generate SIDs via Residual Quantization (RQ), where items are encoded into embeddings and then quantized to discrete SIDs. However, this paradigm suffers from inherent limitations: 1) Objective misalignment and semantic degradation stemming from the two-stage compress...
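For reference, here is a toy version of the conventional two-stage Residual Quantization pipeline the abstract argues against, turning a dense item embedding into a tuple of discrete Semantic IDs; the codebooks are random rather than learned, and all sizes are arbitrary.

```python
# Toy residual quantization of an item embedding into Semantic IDs
# (illustrative baseline; not the paper's end-to-end method).
import numpy as np

rng = np.random.default_rng(0)
dim, codebook_size, num_levels = 16, 256, 3
codebooks = [rng.normal(size=(codebook_size, dim)) for _ in range(num_levels)]

def rq_encode(embedding):
    """Greedily quantize level by level on the running residual."""
    residual, sids = embedding.copy(), []
    for codebook in codebooks:
        idx = int(np.argmin(np.linalg.norm(codebook - residual, axis=1)))
        sids.append(idx)
        residual = residual - codebook[idx]
    return sids

item_embedding = rng.normal(size=dim)
print(rq_encode(item_embedding))   # one discrete code per level: the item's SID tuple
```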

📄 Hardware Co-Design Scaling Laws via Roofline Modelling for On-Device LLMs
🗓️ Published: 2/10/2026
🔗 http://arxiv.org/abs/2602.10377v1
👥 Authors: Luoyang Sun, Jiwen Jiang, Yifeng Ding, Fengfa Li, Yan Song (possible past Tencent (China) affiliation), Haifeng Zhang, Jian Ying, Lei Ren, Kun Zhan, Wei Chen, Yan Xie, Cheng Deng (possible past Tencent (China) affiliation)
Abstract

Vision-Language-Action Models (VLAs) have emerged as a key paradigm of Physical AI and are increasingly deployed in autonomous vehicles, robots, and smart spaces. In these resource-constrained on-device settings, selecting an appropriate large language model (LLM) backbone is a critical challenge: models must balance accuracy with strict inference latency and hardware efficiency constraints. This makes hardware-software co-design a game-changing requirement for on-device LLM deployment, where ea...
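A minimal sketch of the roofline bound the title refers to: attainable throughput = min(peak compute, memory bandwidth x arithmetic intensity). The device numbers below are hypothetical, not from the paper.

```python
# Roofline bound for a hypothetical accelerator (illustrative numbers only).
def roofline_tflops(peak_tflops, bandwidth_gbs, intensity_flop_per_byte):
    """Attainable TFLOP/s = min(compute roof, bandwidth * arithmetic intensity)."""
    memory_bound = bandwidth_gbs * intensity_flop_per_byte / 1e3  # GFLOP/s -> TFLOP/s
    return min(peak_tflops, memory_bound)

# Hypothetical edge device: 40 TFLOP/s peak, 200 GB/s DRAM bandwidth.
for intensity in (1, 16, 256):   # FLOPs per byte moved
    print(f"{intensity:4d} FLOP/B -> {roofline_tflops(40.0, 200.0, intensity):6.1f} TFLOP/s attainable")
```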

*Notable papers are those with at least two authors from a "big" AI/ML lab.