📄 Notable* Recent AI/ML arXiv Papers

📄 RISE-Video: Can Video Generators Decode Implicit World Rules?
🗓️ Published: 2/5/2026
🔗 http://arxiv.org/abs/2602.05986v1
👥 Authors: Mingxin Liu, Shuran Ma, Shibei Meng, Xiangyu Zhao, Zicheng Zhang, Shaofeng Zhang, Zhihang Zhong, Peixian Chen (possible past Tencent (China) affiliation), Haoyu Cao, Xing Sun (possible past Tencent (China) affiliation), Haodong Duan, Xue Yang
Abstract

While generative video models have achieved remarkable visual fidelity, their capacity to internalize and reason over implicit world rules remains a critical yet under-explored frontier. To bridge this gap, we present RISE-Video, a pioneering reasoning-oriented benchmark for Text-Image-to-Video (TI2V) synthesis that shifts the evaluative focus from surface-level aesthetics to deep cognitive reasoning. RISE-Video comprises 467 meticulously human-annotated samples spanning eight rigorous categorie...

📄 LSA: Localized Semantic Alignment for Enhancing Temporal Consistency in Traffic Video Generation
🗓️ Published: 2/5/2026
🔗 http://arxiv.org/abs/2602.05966v1
👥 Authors: Mirlan Karimov, Teodora Spasojevic, Markus Braun, Julian Wiederer, Vasileios Belagiannis (possible past University Of Oxford affiliation), Marc Pollefeys (possible past Google (United States) affiliation)
Abstract

Controllable video generation has emerged as a versatile tool for autonomous driving, enabling realistic synthesis of traffic scenarios. However, existing methods depend on control signals at inference time to guide the generative model towards temporally consistent generation of dynamic objects, limiting their utility as scalable and generalizable data engines. In this work, we propose Localized Semantic Alignment (LSA), a simple yet effective framework for fine-tuning pre-trained video generat...

📄 Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations
🗓️ Published: 2/5/2026
🔗 http://arxiv.org/abs/2602.05885v1
👥 Authors: Wei Liu (possible past Tsinghua University affiliation), Jiawei Xu, Yingru Li, Longtao Zheng, Tianjian Li, Qian Liu, Junxian He (possible past Carnegie Mellon University affiliation)
Abstract

High-quality kernels are critical for scalable AI systems, and enabling LLMs to generate such code would advance AI development. However, training LLMs for this task requires sufficient data and a robust environment, and the process is often vulnerable to reward hacking and lazy optimization. In these cases, models may hack training rewards and prioritize trivial correctness over meaningful speedup. In this paper, we systematically study reinforcement learning (RL) for kernel generation. We first des...
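
The reward-hacking failure mode described above (trivial correctness prioritized over speedup) is easiest to see in the shape of the reward itself. Below is a minimal, hypothetical sketch of a hack-resistant kernel reward, not the paper's actual design:

```python
import math

def kernel_reward(is_correct: bool, candidate_ms: float, baseline_ms: float) -> float:
    """Illustrative reward for kernel-generation RL: pay out only for correct
    kernels, and scale the payout with measured speedup, so trivially correct
    but slow code cannot game the objective."""
    if not is_correct:
        return -1.0                           # incorrect kernels are penalized outright
    speedup = baseline_ms / max(candidate_ms, 1e-6)
    return math.log(speedup)                  # ~0 at parity; positive only for real speedup
```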

📄 DLM-Scope: Mechanistic Interpretability of Diffusion Language Models via Sparse Autoencoders
🗓️ Published: 2/5/2026
🔗 http://arxiv.org/abs/2602.05859v1
👥 Authors: Xu Wang, Bingqing Jiang, Yu Wan, Baosong Yang (possible past Tencent (China) affiliation), Lingpeng Kong (possible past Google (United States) affiliation), Difan Zou
Abstract

Sparse autoencoders (SAEs) have become a standard tool for mechanistic interpretability in autoregressive large language models (LLMs), enabling researchers to extract sparse, human-interpretable features and intervene on model behavior. Recently, as diffusion language models (DLMs) have become an increasingly promising alternative to autoregressive LLMs, it is essential to develop tailored mechanistic interpretability tools for this emerging class of models. In this work, we present DLM-Sco...
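
For context on the tool being ported, a minimal SAE of the standard kind used on autoregressive LLMs looks like the following PyTorch sketch (generic, not DLM-Scope's architecture or training recipe):

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete ReLU autoencoder whose hidden activations are pushed
    toward sparsity, yielding candidate human-interpretable features."""
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)   # d_features >> d_model
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x: torch.Tensor):
        z = torch.relu(self.encoder(x))    # sparse feature activations
        return self.decoder(z), z

def sae_loss(x, x_hat, z, l1_coef: float = 1e-3):
    # Reconstruction fidelity plus an L1 sparsity penalty on activations.
    return ((x - x_hat) ** 2).mean() + l1_coef * z.abs().mean()
```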

📄 BABE: Biology Arena BEnchmark
🗓️ Published: 2/5/2026
🔗 http://arxiv.org/abs/2602.05857v1
👥 Authors: Junting Zhou, Jin Chen, Linfeng Hao, Denghui Cao, Zheyu Wang, Qiguang Chen, Chaoyou Fu, Jiaze Chen, Yuchen Wu (possible past Google (United States) affiliation), Ge Zhang, Mingxuan Wang (possible past Tencent (China) affiliation), Wenhao Huang, Tong Yang (possible past Peking University affiliation)
Abstract

The rapid evolution of large language models (LLMs) has expanded their capabilities from basic dialogue to advanced scientific reasoning. However, existing benchmarks in biology often fail to assess a critical skill required of researchers: the ability to integrate experimental results with contextual knowledge to derive meaningful conclusions. To address this gap, we introduce BABE (Biology Arena BEnchmark), a comprehensive benchmark designed to evaluate the experimental reasoning capabilities o...

📄 Variational Speculative Decoding: Rethinking Draft Training from Token Likelihood to Sequence Acceptance
🗓️ Published: 2/5/2026
🔗 http://arxiv.org/abs/2602.05774v1
👥 Authors: Xiandong Zou, Jianshu Li (possible past National University Of Singapore affiliation), Jing Huang (possible past Meta (United States) affiliation), Pan Zhou
Abstract

Speculative decoding accelerates inference for (M)LLMs, yet a training-decoding discrepancy persists: while existing methods optimize single greedy trajectories, decoding involves verifying and ranking multiple sampled draft paths. We propose Variational Speculative Decoding (VSD), formulating draft training as variational inference over latent proposals (draft paths). VSD maximizes the marginal probability of target-model acceptance, yielding an ELBO that promotes high-quality latent proposals ...
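
Schematically, writing z for a latent draft path proposed by the draft distribution q and A for the event that the target model accepts, a marginal-acceptance objective admits the usual variational lower bound (our reconstruction from the abstract, not the paper's exact notation):

```latex
\log p(A) \;=\; \log \mathbb{E}_{q(z)}\!\left[\frac{p(A, z)}{q(z)}\right]
\;\ge\; \mathbb{E}_{q(z)}\!\left[\log p(A \mid z)\right] \;-\; \mathrm{KL}\!\left(q(z)\,\|\,p(z)\right)
```

Maximizing the right-hand side trains the draft model to propose whole paths the target is likely to accept, rather than to imitate a single greedy trajectory token by token.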

📄 CSRv2: Unlocking Ultra-Sparse Embeddings
🗓️ Published: 2/5/2026
🔗 http://arxiv.org/abs/2602.05735v1
👥 Authors: Lixuan Guo, Yifei Wang, Tiansheng Wen, Yifan Wang (possible past Stanford University affiliation), Aosong Feng, Bo Chen (possible past Tencent (China) affiliation), Stefanie Jegelka, Chenyu You
Abstract

In the era of large foundation models, the quality of embeddings has become a central determinant of downstream task performance and overall system capability. Yet widely used dense embeddings are often extremely high-dimensional, incurring substantial costs in storage, memory, and inference latency. To address these costs, Contrastive Sparse Representation (CSR) was recently proposed as a promising direction, mapping dense embeddings into high-dimensional but k-sparse vectors, in contrast to compact d...
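
The mechanism CSR-style methods build on is compact enough to state directly; here is a generic top-k sparsification step in PyTorch (illustrative only; CSRv2's trained encoder and contrastive objective are not shown):

```python
import torch

def k_sparse(dense: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k largest-magnitude coordinates of each embedding and zero the
    rest. The result is high-dimensional but touches only k entries, which is
    what makes storage and retrieval cheap."""
    topk = dense.abs().topk(k, dim=-1)
    sparse = torch.zeros_like(dense)
    return sparse.scatter(-1, topk.indices, dense.gather(-1, topk.indices))
```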

📄 CompactRAG: Reducing LLM Calls and Token Overhead in Multi-Hop Question Answering
🗓️ Published: 2/5/2026
🔗 http://arxiv.org/abs/2602.05728v1
👥 Authors: Hao Yang (possible past Tencent (China) affiliation), Zhiyu Yang, Xupeng Zhang, Wei Wei (possible past Google (United States) affiliation), Yunjie Zhang, Lin Yang
Abstract

Retrieval-augmented generation (RAG) has become a key paradigm for knowledge-intensive question answering. However, existing multi-hop RAG systems remain inefficient, as they alternate between retrieval and reasoning at each step, resulting in repeated LLM calls, high token consumption, and unstable entity grounding across hops. We propose CompactRAG, a simple yet effective framework that decouples offline corpus restructuring from online reasoning. In the offline stage, an LLM reads the corpu...
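
The offline/online decoupling the abstract describes might be organized roughly as below; every helper here (`llm`, `retrieve`) is a hypothetical stand-in, not CompactRAG's API:

```python
def offline_restructure(corpus: list[str], llm) -> dict[int, str]:
    """One-time offline pass: condense each document into compact,
    entity-grounded notes so online questions need fewer hops and calls."""
    return {i: llm(f"Extract the key entities and facts:\n{doc}")
            for i, doc in enumerate(corpus)}

def online_answer(question: str, notes: dict[int, str], llm, retrieve) -> str:
    """A single generation over pre-restructured notes replaces the usual
    per-hop retrieve-then-reason loop."""
    context = "\n".join(retrieve(question, notes, top_k=5))
    return llm(f"Context:\n{context}\n\nQuestion: {question}")
```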

📄 Mining Generalizable Activation Functions
🗓️ Published: 2/5/2026
🔗 http://arxiv.org/abs/2602.05688v1
👥 Authors: Alex Vitvitskyi, Michael Boratko, Matej Grcic, Razvan Pascanu (possible past Google (United States) affiliation), Deep Shah, Petar Veličković (possible past University Of Cambridge affiliation)
Abstract

The choice of activation function is an active area of research, with different proposals aimed at improving optimization, while maintaining expressivity. Additionally, the activation function can significantly alter the implicit inductive bias of the architecture, controlling its non-linear behavior. In this paper, in line with previous work, we argue that evolutionary search provides a useful framework for finding new activation functions, while we also make two novel observations. The first i...
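
As a toy illustration of the evolutionary-search framework the paper argues for (the actual search space, mutation operators, and fitness measure are the paper's own):

```python
import random
import numpy as np

# Candidate activations are compositions f(g(x)) of simple primitives.
PRIMITIVES = {
    "relu": lambda x: np.maximum(x, 0.0),
    "tanh": np.tanh,
    "sigmoid": lambda x: 1.0 / (1.0 + np.exp(-x)),
    "identity": lambda x: x,
}

def random_candidate():
    return random.choice(list(PRIMITIVES)), random.choice(list(PRIMITIVES))

def evolve(eval_fn, population: int = 16, generations: int = 10):
    """eval_fn scores an activation callable, e.g. proxy-task validation accuracy."""
    def fitness(cand):
        f, g = PRIMITIVES[cand[0]], PRIMITIVES[cand[1]]
        return eval_fn(lambda x: f(g(x)))
    pop = [random_candidate() for _ in range(population)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: population // 2]
        # Mutate survivors by swapping the outer primitive for a fresh one.
        children = [(random.choice(list(PRIMITIVES)), c[1]) for c in survivors]
        pop = survivors + children
    return max(pop, key=fitness)
```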

📄 Phi-Former: A Pairwise Hierarchical Approach for Compound-Protein Interactions Prediction
🗓️ Published: 2/5/2026
🔗 http://arxiv.org/abs/2602.05479v1
👥 Authors: Zhe Wang (possible past Deepmind (United Kingdom) affiliation), Zijing Liu, Chencheng Xu, Yuan Yao (possible past Tsinghua University affiliation)
Abstract

Drug discovery remains time-consuming, labor-intensive, and expensive, often requiring years and substantial investment per drug candidate. Predicting compound-protein interactions (CPIs) is a critical component in this process, enabling the identification of molecular interactions between drug candidates and target proteins. Recent deep learning methods have successfully modeled CPIs at the atomic level, achieving improved efficiency and accuracy over traditional energy-based approaches. Howeve...

📄 ProAct: Agentic Lookahead in Interactive Environments
🗓️ Published: 2/5/2026
🔗 http://arxiv.org/abs/2602.05327v1
👥 Authors: Yangbin Yu, Mingyu Yang, Junyou Li, Yiming Gao, Feiyu Liu, Yijun Yang, Zichuan Lin, Jiafei Lyu, Yicheng Liu, Zhicong Lu, Deheng Ye (possible past Tencent (China) affiliation), Jie Jiang (possible past Tencent (China) affiliation)
Abstract

Existing Large Language Model (LLM) agents struggle in interactive environments requiring long-horizon planning, primarily due to compounding errors when simulating future states. To address this, we propose ProAct, a framework that enables agents to internalize accurate lookahead reasoning through a two-stage training paradigm. First, we introduce Grounded LookAhead Distillation (GLAD), where the agent undergoes supervised fine-tuning on trajectories derived from environment-based search. By co...
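
A hypothetical sketch of the first stage (GLAD) as the abstract describes it: derive lookahead trajectories from environment-grounded search, then fine-tune on them. `env`, `search_lookahead`, and the record fields are illustrative stand-ins, not ProAct's interfaces:

```python
def collect_glad_data(env, search_lookahead, n_episodes: int) -> list[dict]:
    """Build a supervised fine-tuning corpus from environment-based search,
    so the agent's lookahead reasoning is grounded in real transitions rather
    than in its own error-compounding simulations."""
    records = []
    for _ in range(n_episodes):
        obs, done = env.reset(), False
        while not done:
            plan = search_lookahead(env, obs)   # lookahead against the real environment
            records.append({"observation": obs,
                            "reasoning": plan.trace,
                            "action": plan.best_action})
            obs, _, done = env.step(plan.best_action)
    return records
```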

📄 Towards a Science of Collective AI: LLM-based Multi-Agent Systems Need a Transition from Blind Trial-and-Error to Rigorous Science
🗓️ Published: 2/5/2026
🔗 http://arxiv.org/abs/2602.05289v1
👥 Authors: Jingru Fan, Dewen Liu, Yufan Dang, Huatao Li, Yuheng Wang, Wei Liu (possible past Tsinghua University affiliation), Feiyu Duan, Xuanwen Ding, Shu Yao, Lin Wu, Ruijie Shi, Wai-Shing Leung, Yuan Cheng, Zhongyu Wei, Cheng Yang (possible past Tsinghua University affiliation), Chen Qian (possible past Shanghai Jiao Tong University affiliation), Zhiyuan Liu (possible past Tsinghua University affiliation), Maosong Sun (possible past Tsinghua University affiliation)
Abstract

Recent advancements in Large Language Models (LLMs) have greatly extended the capabilities of Multi-Agent Systems (MAS), demonstrating significant effectiveness across a wide range of complex and open-ended domains. However, despite this rapid progress, the field still relies heavily on empirical trial-and-error. It lacks a unified and principled scientific framework necessary for systematic optimization and improvement. This bottleneck stems from the ambiguity of attribution: first, the absence...

📄 EGSS: Entropy-guided Stepwise Scaling for Reliable Software Engineering
🗓️ Published: 2/5/2026
🔗 http://arxiv.org/abs/2602.05242v1
👥 Authors: Chenhui Mao, Yuanting Lei, Zhixiang Wei, Ming Liang (possible past Tsinghua University affiliation), Zhixiang Wang, Jingxuan Xu, Dajun Chen, Wei Jiang (possible past Apple (United States) affiliation), Yong Li (possible past Tsinghua University affiliation)
Abstract

Agentic Test-Time Scaling (TTS) has delivered state-of-the-art (SOTA) performance on complex software engineering tasks such as code generation and bug fixing. However, its practical adoption remains limited due to significant computational overhead, primarily driven by two key challenges: (1) the high cost associated with deploying excessively large ensembles, and (2) the lack of a reliable mechanism for selecting the optimal candidate solution, ultimately constraining the performance gains tha...
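
One plausible reading of "entropy-guided" selection, sketched below with hypothetical candidate records (not the paper's actual mechanism): spend extra sampling only where the model is uncertain, and prefer the candidate whose reasoning trace is most confident overall.

```python
import math

def token_entropy(probs: list[float]) -> float:
    """Shannon entropy of a next-token distribution; high values flag the
    uncertain steps where additional test-time scaling is worth its cost."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_candidate(candidates: list[dict]) -> dict:
    # Each candidate carries per-step entropies of its reasoning trace;
    # pick the one with the lowest mean entropy (highest overall confidence).
    return min(candidates,
               key=lambda c: sum(c["step_entropies"]) / len(c["step_entropies"]))
```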

📄 Position: Capability Control Should be a Separate Goal From Alignment
🗓️ Published: 2/5/2026
🔗 http://arxiv.org/abs/2602.05164v1
👥 Authors: Shoaib Ahmed Siddiqui, Eleni Triantafillou (possible past University Of Toronto affiliation), David Krueger, Adrian Weller (possible past University Of Cambridge affiliation)
Abstract

Foundation models are trained on broad data distributions, yielding generalist capabilities that enable many downstream applications but also expand the space of potential misuse and failures. This position paper argues that capability control -- imposing restrictions on permissible model behavior -- should be treated as a distinct goal from alignment. While alignment is often context- and preference-driven, capability control aims to impose hard operational limits on permissible behaviors, inclu...

📄 Understanding LLM Evaluator Behavior: A Structured Multi-Evaluator Framework for Merchant Risk Assessment
🗓️ Published: 2/4/2026
🔗 http://arxiv.org/abs/2602.05110v1
👥 Authors: Liang Wang (possible past Tencent (China) affiliation), Junpeng Wang, Chin-Chia Michael Yeh, Yan Zheng, Jiarui Sun (possible past Tencent (China) affiliation), Xiran Fan, Xin Dai, Yujie Fan, Yiwei Cai
Abstract

Large Language Models (LLMs) are increasingly used as evaluators of reasoning quality, yet their reliability and bias in payments-risk settings remain poorly understood. We introduce a structured multi-evaluator framework for assessing LLM reasoning in Merchant Category Code (MCC)-based merchant risk assessment, combining a five-criterion rubric with Monte-Carlo scoring to evaluate rationale quality and evaluator stability. Five frontier LLMs generate and cross-evaluate MCC risk rationales under...
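
The rubric-plus-Monte-Carlo protocol can be summarized in a few lines; the sketch below assumes an `evaluator` callable returning one numeric score per query, and is ours, not the paper's implementation:

```python
import statistics

def monte_carlo_score(evaluator, rationale: str, rubric: list[str], n_samples: int = 20):
    """Query the evaluator repeatedly per criterion and report mean and spread,
    so evaluator instability surfaces as variance instead of hiding inside a
    single-shot judgment."""
    results = {}
    for criterion in rubric:
        scores = [evaluator(rationale, criterion) for _ in range(n_samples)]
        results[criterion] = (statistics.mean(scores), statistics.stdev(scores))
    return results
```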

📄 Do Vision-Language Models Respect Contextual Integrity in Location Disclosure?
🗓️ Published: 2/4/2026
🔗 http://arxiv.org/abs/2602.05023v1
👥 Authors: Ruixin Yang, Ethan Mendes, Arthur Wang, James Hays, Sauvik Das (possible past Carnegie Mellon University affiliation), Wei Xu (possible past Tencent (China) affiliation), Alan Ritter (possible past Carnegie Mellon University affiliation)
Abstract

Vision-language models (VLMs) have demonstrated strong performance in image geolocation, a capability further sharpened by frontier multimodal large reasoning models (MLRMs). This poses a significant privacy risk, as these widely accessible models can be exploited to infer sensitive locations from casually shared photos, often at street-level precision, potentially surpassing the level of detail the sharer consented or intended to disclose. While recent work has proposed applying a blanket restr...

📄 DFPO: Scaling Value Modeling via Distributional Flow towards Robust and Generalizable LLM Post-Training
🗓️ Published: 2/5/2026
🔗 http://arxiv.org/abs/2602.05890v1
👥 Authors: Dingwei Zhu, Zhiheng Xi, Shihan Dou, Jiahan Li, Chenhao Huang, Junjie Ye, Sixian Li, Mingxu Chai, Yuhui Wang, Yajie Yang, Ming Zhang (possible past Peking University affiliation), Jiazheng Zhang, Shichun Liu, Caishuang Huang, Yunke Zhang, Yuran Wang, Tao Gui, Xipeng Qiu, Qi Zhang (possible past Tencent (China) affiliation), Xuanjing Huang
Abstract

Training reinforcement learning (RL) systems in real-world environments remains challenging due to noisy supervision and poor out-of-domain (OOD) generalization, especially in LLM post-training. Recent distributional RL methods improve robustness by modeling values with multiple quantile points, but they still learn each quantile independently as a scalar. This results in coarse-grained value representations that lack fine-grained conditioning on state information, struggling under complex and OO...
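
For background on the quantile baseline the abstract says DFPO improves on: distributional value heads are typically trained with the standard quantile-regression (pinball) loss, shown below (this is the baseline construction, not DFPO's distributional flow):

```python
import torch

def pinball_loss(pred_quantiles: torch.Tensor, target: torch.Tensor,
                 taus: torch.Tensor) -> torch.Tensor:
    """pred_quantiles: (batch, n_quantiles); target: (batch, 1);
    taus: (n_quantiles,) quantile levels in (0, 1). Each quantile is fit
    independently as a scalar, which is the coarseness DFPO targets."""
    diff = target - pred_quantiles                  # broadcasts over quantiles
    return torch.max(taus * diff, (taus - 1.0) * diff).mean()
```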

📄 Cross-Domain Offline Policy Adaptation via Selective Transition Correction
🗓️ Published: 2/5/2026
🔗 http://arxiv.org/abs/2602.05776v1
👥 Authors: Mengbei Yan, Jiafei Lyu, Shengjie Sun, Zhongjian Qiao, Jingwen Yang, Zichuan Lin, Deheng Ye (possible past Tencent (China) affiliation), Xiu Li (possible past Tsinghua University affiliation)
Abstract

It remains a critical challenge to adapt policies across domains with mismatched dynamics in reinforcement learning (RL). In this paper, we study cross-domain offline RL, where an offline dataset from another similar source domain can be accessed to enhance policy learning upon a target domain dataset. Directly merging the two datasets may lead to suboptimal performance due to potential dynamics mismatches. Existing approaches typically mitigate this issue through source domain transition filter...

📄 Adaptive Global and Fine-Grained Perceptual Fusion for MLLM Embeddings Compatible with Hard Negative Amplification
🗓️ Published: 2/5/2026
🔗 http://arxiv.org/abs/2602.05729v1
👥 Authors: Lexiang Hu, Youze Xue, Dian Li (possible past Tencent (China) affiliation), Gang Liu (possible past Tencent (China) affiliation), Zhouchen Lin (possible past Peking University affiliation)
Abstract

Multimodal embeddings serve as a bridge for aligning vision and language, with the two primary implementations -- CLIP-based and MLLM-based embedding models -- both limited to capturing only global semantic information. Although numerous studies have focused on fine-grained understanding, we observe that complex scenarios currently targeted by MLLM embeddings often involve a hybrid perceptual pattern of both global and fine-grained elements, thus necessitating a compatible fusion mechanism. In t...

📄 Empowering Time Series Analysis with Large-Scale Multimodal Pretraining
🗓️ Published: 2/5/2026
🔗 http://arxiv.org/abs/2602.05646v1
👥 Authors: Peng Chen (possible past Tencent (China) affiliation), Siyuan Wang, Shiyan Hu, Xingjian Wu, Yang Shu, Zhongwen Rao, Meng Wang (possible past Google (United States) affiliation), Yijie Li, Bin Yang, Chenjuan Guo
Abstract

While existing time series foundation models primarily rely on large-scale unimodal pretraining, they lack complementary modalities to enhance time series understanding. Building multimodal foundation models is a natural next step, but it faces key challenges: 1) lack of a unified multimodal pretraining paradigm and large-scale multimodal corpora for time series analysis; 2) how to effectively integrate heterogeneous modalities and enhance model generalization. To address these challenges, we ta...

📄 Hinge Regression Tree: A Newton Method for Oblique Regression Tree Splitting
🗓️ Published: 2/5/2026
🔗 http://arxiv.org/abs/2602.05371v1
👥 Authors: Hongyi Li (possible past Google (United States) affiliation), Han Lin, Jun Xu (possible past Google (United States) affiliation)
Abstract

Oblique decision trees combine the transparency of trees with the power of multivariate decision boundaries, but learning high-quality oblique splits is NP-hard, and practical methods still rely on slow search or theory-free heuristics. We present the Hinge Regression Tree (HRT), which reframes each split as a non-linear least-squares problem over two linear predictors whose max/min envelope induces ReLU-like expressive power. The resulting alternating fitting procedure is exactly equivalent to ...
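
The max-of-two-linear-predictors ("hinge") structure makes the alternating fit easy to sketch; the NumPy toy below shows that structure only, not the paper's Newton-equivalent procedure or its safeguards:

```python
import numpy as np

def fit_hinge_split(X: np.ndarray, y: np.ndarray, n_iters: int = 10):
    """Alternating least squares for y_hat(x) = max(w1.x + b1, w2.x + b2):
    assign each sample to the linear piece currently winning the max, then
    refit each piece by ordinary least squares on its own samples."""
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])            # append bias column
    rng = np.random.default_rng(0)
    w1, w2 = rng.normal(size=d + 1), rng.normal(size=d + 1)
    for _ in range(n_iters):
        side1 = Xb @ w1 >= Xb @ w2                  # current winner per sample
        if side1.sum() > d:                         # enough points to refit piece 1
            w1, *_ = np.linalg.lstsq(Xb[side1], y[side1], rcond=None)
        if (~side1).sum() > d:                      # enough points to refit piece 2
            w2, *_ = np.linalg.lstsq(Xb[~side1], y[~side1], rcond=None)
    return w1, w2
```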

*Notable papers are those with at least two authors from a "big" AI/ML lab.