📄 Notable* Recent AI/ML arXiv Papers

📄 Rethinking Diffusion Models with Symmetries through Canonicalization with Applications to Molecular Graph Generation
🗓️ Published: 2/16/2026
🔗 http://arxiv.org/abs/2602.15022v1
👥 Authors: Cai Zhou, Zijie Chen, Zian Li, Jike Wang, Kaiyi Jiang, Pan Li (possible past Baidu (China) affiliation), Rose Yu, Muhan Zhang (possible past Meta (United States) affiliation), Stephen Bates, Tommi Jaakkola
Abstract

Many generative tasks in chemistry and science involve distributions invariant to group symmetries (e.g., permutation and rotation). A common strategy enforces invariance and equivariance through architectural constraints such as equivariant denoisers and invariant priors. In this paper, we challenge this tradition through the alternative canonicalization perspective: first map each sample to an orbit representative with a canonical pose or order, train an unconstrained (non-equivariant) diffusi...
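The canonicalization idea can be sketched with a toy point cloud: map every input to a deterministic orbit representative (here, centering plus a PCA-derived rotation with a sign fix), so an unconstrained model only ever sees one pose per orbit. This is an illustrative stand-in, not the paper's actual canonicalization procedure.

```python
import numpy as np

def canonicalize(points: np.ndarray) -> np.ndarray:
    """Map a point cloud to an orbit representative: center it, then rotate
    into the frame of its principal axes, with a sign fix so the map is
    deterministic. Illustrative only; not the paper's procedure."""
    centered = points - points.mean(axis=0)
    _, eigvecs = np.linalg.eigh(centered.T @ centered)
    canon = centered @ eigvecs
    signs = np.sign(canon.sum(axis=0))  # resolve the +/- eigenvector ambiguity
    signs[signs == 0] = 1.0
    return canon * signs

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 3))
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
# Rotated copies of the same cloud map to (numerically) one representative.
a, b = canonicalize(x), canonicalize(x @ R.T)
print(np.allclose(a, b, atol=1e-6))
```

Training then happens on representatives only, which is why the denoiser itself needs no equivariance constraint.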

📄 PhyScensis: Physics-Augmented LLM Agents for Complex Physical Scene Arrangement
🗓️ Published: 2/16/2026
🔗 http://arxiv.org/abs/2602.14968v1
👥 Authors: Yian Wang, Han Yang (possible past Eth Zurich affiliation), Minghao Guo, Xiaowen Qiu, Tsun-Hsuan Wang, Wojciech Matusik, Joshua B. Tenenbaum (possible past Massachusetts Institute Of Technology affiliation), Chuang Gan (possible past Tsinghua University affiliation)
Abstract

Automatically generating interactive 3D environments is crucial for scaling up robotic data collection in simulation. While prior work has primarily focused on 3D asset placement, it often overlooks the physical relationships between objects (e.g., contact, support, balance, and containment), which are essential for creating complex and realistic manipulation scenarios such as tabletop arrangements, shelf organization, or box packing. Compared to classical 3D layout generation, producing complex...

📄 On the Learning Dynamics of RLVR at the Edge of Competence
🗓️ Published: 2/16/2026
🔗 http://arxiv.org/abs/2602.14872v1
👥 Authors: Yu Huang (possible past Tencent (China) affiliation), Zixin Wen, Yuejie Chi, Yuting Wei, Aarti Singh (possible past Carnegie Mellon University affiliation), Yingbin Liang, Yuxin Chen
Abstract

Reinforcement learning with verifiable rewards (RLVR) has been a main driver of recent breakthroughs in large reasoning models. Yet it remains a mystery how rewards based solely on final outcomes can help overcome the long-horizon barrier to extended reasoning. To understand this, we develop a theory of the training dynamics of RL for transformers on compositional reasoning tasks. Our theory characterizes how the effectiveness of RLVR is governed by the smoothness of the difficulty spectrum. Whe...

📄 CoCoDiff: Correspondence-Consistent Diffusion Model for Fine-grained Style Transfer
🗓️ Published: 2/16/2026
🔗 http://arxiv.org/abs/2602.14464v1
👥 Authors: Wenbo Nie, Zixiang Li, Renshuai Tao, Bin Wu, Yunchao Wei (possible past National University Of Singapore affiliation), Yao Zhao (possible past Microsoft (United States) affiliation)
Abstract

Transferring visual style between images while preserving semantic correspondence between similar objects remains a central challenge in computer vision. While existing methods have made great strides, most of them operate at the global level and overlook region-wise and even pixel-wise semantic correspondence. To address this, we propose CoCoDiff, a novel training-free and low-cost style transfer framework that leverages pretrained latent diffusion models to achieve fine-grained, semantically consi...

📄 Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report v1.5
🗓️ Published: 2/16/2026
🔗 http://arxiv.org/abs/2602.14457v1
👥 Authors: Dongrui Liu, Yi Yu, Jie Zhang, Guanxu Chen, Qihao Lin, Hanxi Zhu, Lige Huang, Yijin Zhou, Peng Wang (possible past Peking University affiliation), Shuai Shao, Boxuan Zhang, Zicheng Liu (possible past Microsoft (United States) affiliation), Jingwei Sun, Yu Li (possible past Tencent (China) affiliation), Yuejin Xie, Jiaxuan Guo, Jia Xu, Chaochao Lu, Bowen Zhou, Xia Hu, Jing Shao
Abstract

To understand and identify the unprecedented risks posed by rapidly advancing artificial intelligence (AI) models, Frontier AI Risk Management Framework in Practice presents a comprehensive assessment of their frontier risks. As the general capabilities of Large Language Models (LLMs) rapidly evolve and agentic AI proliferates, this version of the risk analysis technical report presents an updated and granular assessment of five critical dimensions: cyber offense, persuasion and manipulation, s...

📄 Precedent-Informed Reasoning: Mitigating Overthinking in Large Reasoning Models via Test-Time Precedent Learning
🗓️ Published: 2/16/2026
🔗 http://arxiv.org/abs/2602.14451v1
👥 Authors: Qianyue Wang, Jinwu Hu, Huanxiang Lin, Bolin Chen, Zhiquan Wen, Yaofo Chen, Yu Rong (possible past Tencent (China) affiliation), Mingkui Tan (possible past Baidu (China) affiliation)
Abstract

Reasoning in Large Language Models (LLMs) often suffers from inefficient long chain-of-thought traces with redundant self-exploration and validation, which inflate computational costs and even degrade performance. Inspired by human reasoning patterns, where people solve new problems by leveraging past related cases to constrain search spaces and reduce trial-and-error, we propose Precedent-Informed Reasoning (PIR), transforming LRMs' reasoning paradigm from exhaustive self-exploration to guided lea...
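The precedent idea can be illustrated with a toy retrieval step. Everything below (the precedent store, the Jaccard scorer, the prompt template) is hypothetical, not the paper's implementation; it only shows the shape of the mechanism: fetch the most similar solved case and prepend it so the model adapts a known solution instead of searching exhaustively.

```python
# Hypothetical precedent store; entries are invented for illustration.
PRECEDENTS = [
    {"problem": "sum of first n integers", "solution": "n*(n+1)//2"},
    {"problem": "sum of first n squares", "solution": "n*(n+1)*(2*n+1)//6"},
]

def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity; a stand-in for a learned retriever."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

def build_prompt(query: str) -> str:
    """Prepend the most similar solved precedent to constrain the search."""
    best = max(PRECEDENTS, key=lambda p: jaccard(p["problem"], query))
    return (f"Precedent: {best['problem']} -> {best['solution']}\n"
            f"New problem: {query}\nSolve by adapting the precedent.")

prompt = build_prompt("sum of first n odd integers")
print(prompt)
```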

📄 pFedNavi: Structure-Aware Personalized Federated Vision-Language Navigation for Embodied AI
🗓️ Published: 2/16/2026
🔗 http://arxiv.org/abs/2602.14401v1
👥 Authors: Qingqian Yang, Hao Wang (possible past Tsinghua University affiliation), Sai Qian Zhang, Jian Li (possible past Tencent (China) affiliation), Yang Hua, Miao Pan, Tao Song, Zhengwei Qi, Haibing Guan
Abstract

Vision-Language Navigation (VLN) requires large-scale trajectory instruction data from private indoor environments, raising significant privacy concerns. Federated Learning (FL) mitigates this by keeping data on-device, but vanilla FL struggles under VLN's extreme cross-client heterogeneity in environments and instruction styles, making a single global model suboptimal. This paper proposes pFedNavi, a structure-aware and dynamically adaptive personalized federated learning framework tailored for VLN...
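A common personalization pattern that such methods build on, sketched generically here (pFedNavi's structure-aware scheme is more sophisticated), is to synchronize shared parameters across clients while keeping client-specific parts local:

```python
import numpy as np

# Two toy clients with a shared "backbone" and a personalized "head"
# (parameter names and values are illustrative, not pFedNavi's).
clients = [
    {"backbone": np.array([1.0, 2.0]), "head": np.array([0.5])},
    {"backbone": np.array([3.0, 4.0]), "head": np.array([-0.5])},
]

# FedAvg-style aggregation on the shared part only.
global_backbone = np.mean([c["backbone"] for c in clients], axis=0)
for c in clients:
    c["backbone"] = global_backbone.copy()  # shared part synchronized

# Heads stay local, so each client keeps a personalized model.
print(global_backbone)  # [2. 3.]
```

The split decides what heterogeneity the global model must absorb versus what each client handles itself.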

📄 InnoEval: On Research Idea Evaluation as a Knowledge-Grounded, Multi-Perspective Reasoning Problem
🗓️ Published: 2/16/2026
🔗 http://arxiv.org/abs/2602.14367v1
👥 Authors: Shuofei Qiao, Yunxiang Wei, Xuehai Wang, Bin Wu, Boyang Xue, Ningyu Zhang (possible past Tencent (China) affiliation), Hossein A. Rahmani, Yanshan Wang, Qiang Zhang (possible past Tsinghua University affiliation), Keyan Ding, Jeff Z. Pan, Huajun Chen (possible past Alibaba Group (China) affiliation), Emine Yilmaz
Abstract

The rapid evolution of Large Language Models has catalyzed a surge in scientific idea production, yet this leap has not been accompanied by a matching advance in idea evaluation. The fundamental nature of scientific evaluation demands knowledgeable grounding, collective deliberation, and multi-criteria decision-making. However, existing idea evaluation methods often suffer from narrow knowledge horizons, flattened evaluation dimensions, and the inherent bias in LLM-as-a-Judge. To address these, we...

📄 AutoWebWorld: Synthesizing Infinite Verifiable Web Environments via Finite State Machines
🗓️ Published: 2/15/2026
🔗 http://arxiv.org/abs/2602.14296v1
👥 Authors: Yifan Wu (possible past Carnegie Mellon University affiliation), Yiran Peng, Yiyu Chen, Jianhao Ruan, Zijie Zhuang, Cheng Yang (possible past Tsinghua University affiliation), Jiayi Zhang, Man Chen, Yenchi Tseng, Zhaoyang Yu, Liang Chen (possible past Google (United States) affiliation), Yuyao Zhai, Bang Liu, Chenglin Wu, Yuyu Luo
Abstract

The performance of autonomous Web GUI agents heavily relies on the quality and quantity of their training data. However, a fundamental bottleneck persists: collecting interaction trajectories from real-world websites is expensive and difficult to verify. The underlying state transitions are hidden, leading to reliance on inconsistent and costly external verifiers to evaluate step-level correctness. To address this, we propose AutoWebWorld, a novel framework for synthesizing controllable and veri...
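The finite-state-machine idea can be sketched in a few lines: when transitions are explicit, the reward for a trajectory is verifiable by replay, with no external judge. The states and actions below are invented for illustration and are not AutoWebWorld's actual schema.

```python
# A toy web task as an FSM: (state, action) -> next state.
TRANSITIONS = {
    ("home", "click_login"): "login_page",
    ("login_page", "type_credentials"): "logged_in",
    ("logged_in", "click_cart"): "cart",
}

def run_episode(actions, start="home", goal="cart"):
    """Replay actions through the FSM; reward is 1 iff the goal is reached.
    Because transitions are explicit, step-level correctness is checkable
    without an external verifier."""
    state = start
    trace = [state]
    for a in actions:
        # Invalid actions are no-ops, as on a real page that ignores a click.
        state = TRANSITIONS.get((state, a), state)
        trace.append(state)
    return trace, float(state == goal)

trace, reward = run_episode(["click_login", "type_credentials", "click_cart"])
print(reward)  # 1.0
```

Synthesizing many such machines yields unlimited environments whose ground-truth transition graphs are known by construction.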

📄 KernelBlaster: Continual Cross-Task CUDA Optimization via Memory-Augmented In-Context Reinforcement Learning
🗓️ Published: 2/15/2026
🔗 http://arxiv.org/abs/2602.14293v1
👥 Authors: Kris Shengjun Dong, Sahil Modi, Dima Nikiforov, Sana Damani, Edward Lin, Siva Kumar Sastry Hari (possible past Nvidia (United States) affiliation), Christos Kozyrakis (possible past Stanford University affiliation)
Abstract

Optimizing CUDA code across multiple generations of GPU architectures is challenging, as achieving peak performance requires an extensive exploration of an increasingly complex, hardware-specific optimization space. Traditional compilers are constrained by fixed heuristics, whereas finetuning Large Language Models (LLMs) can be expensive. However, agentic workflows for CUDA code optimization have limited ability to aggregate knowledge from prior exploration, leading to biased sampling and subopt...

📄 Text Before Vision: Staged Knowledge Injection Matters for Agentic RLVR in Ultra-High-Resolution Remote Sensing Understanding
🗓️ Published: 2/15/2026
🔗 http://arxiv.org/abs/2602.14225v1
👥 Authors: Fengxiang Wang, Mingshuo Chen, Yueying Li, Yajie Yang, Yuhao Zhou, Di Wang, Yifan Zhang, Haoyu Wang (possible past Tencent (China) affiliation), Haiyan Zhao, Hongda Sun, Long Lan, Jun Song, Yulin Wang, Jing Zhang (possible past University Of Washington affiliation), Wenlong Zhang, Bo Du
Abstract

Multimodal reasoning for ultra-high-resolution (UHR) remote sensing (RS) is usually bottlenecked by visual evidence acquisition: the model must localize tiny task-relevant regions in massive pixel spaces. While Agentic Reinforcement Learning with Verifiable Rewards (RLVR) using zoom-in tools offers a path forward, we find that standard reinforcement learning struggles to navigate these vast visual spaces without structured domain priors. In this paper, we investigate the interplay betw...
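A zoom-in tool of the kind the abstract mentions can be sketched as a simple crop interface (an assumed interface, not the paper's implementation): the agent names a region, and the tool returns that region at full resolution for inspection.

```python
import numpy as np

def zoom_in(image: np.ndarray, cx: int, cy: int, size: int) -> np.ndarray:
    """Return a size x size crop centered at (cx, cy), clipped to bounds."""
    h, w = image.shape[:2]
    half = size // 2
    x0, y0 = max(0, cx - half), max(0, cy - half)
    x1, y1 = min(w, cx + half), min(h, cy + half)
    return image[y0:y1, x0:x1]

# A synthetic "UHR" image with one tiny task-relevant target.
uhr = np.zeros((10000, 10000), dtype=np.uint8)
uhr[5000:5010, 5000:5010] = 255
crop = zoom_in(uhr, 5005, 5005, 64)
print(crop.shape, int(crop.max()))  # the target is visible in the crop
```

The RL problem is then choosing (cx, cy, size) calls that find such targets, which is where the staged knowledge injection comes in.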

📄 SkillJect: Automating Stealthy Skill-Based Prompt Injection for Coding Agents with Trace-Driven Closed-Loop Refinement
🗓️ Published: 2/15/2026
🔗 http://arxiv.org/abs/2602.14211v1
👥 Authors: Xiaojun Jia, Jie Liao, Simeng Qin, Jindong Gu, Wenqi Ren (possible past Tencent (China) affiliation), Xiaochun Cao, Yang Liu (possible past Tsinghua University affiliation), Philip Torr (possible past University Of Oxford affiliation)
Abstract

Agent skills are becoming a core abstraction in coding agents, packaging long-form instructions and auxiliary scripts to extend tool-augmented behaviors. This abstraction introduces an under-measured attack surface: skill-based prompt injection, where poisoned skills can steer agents away from user intent and safety policies. In practice, naive injections often fail because the malicious intent is too explicit or drifts too far from the original skill, leading agents to ignore or refuse them; ex...

📄 UniWeTok: An Unified Binary Tokenizer with Codebook Size $\mathit{2^{128}}$ for Unified Multimodal Large Language Model
🗓️ Published: 2/15/2026
🔗 http://arxiv.org/abs/2602.14178v1
👥 Authors: Shaobin Zhuang, Yuang Ai, Jiaming Han, Weijia Mao, Xiaohui Li, Fangyikang Wang, Xiao Wang (possible past Google (United States) affiliation), Yan Li (possible past Tencent (China) affiliation), Shanchuan Lin, Kun Xu (possible past Tsinghua University affiliation), Zhenheng Yang (possible past Meta (United States) affiliation), Huaibo Huang, Xiangyu Yue (possible past University Of California, Berkeley affiliation), Hao Chen, Yali Wang
Abstract

Unified Multimodal Large Language Models (MLLMs) require a visual representation that simultaneously supports high-fidelity reconstruction, complex semantic extraction, and generative suitability. However, existing visual tokenizers typically struggle to satisfy these conflicting objectives within a single framework. In this paper, we introduce UniWeTok, a unified discrete tokenizer designed to bridge this gap using a massive binary codebook ($\mathit{2^{128}}$). For the training framework, we intro...

📄 Deep Dense Exploration for LLM Reinforcement Learning via Pivot-Driven Resampling
🗓️ Published: 2/15/2026
🔗 http://arxiv.org/abs/2602.14169v1
👥 Authors: Yiran Guo, Zhongjian Qiao, Yingqi Xie, Jie Liu (possible past Tencent (China) affiliation), Dan Ye, Ruiqing Zhang (possible past Baidu (China) affiliation), Shuang Qiu, Lijie Xu
Abstract

Effective exploration is a key challenge in reinforcement learning for large language models: discovering high-quality trajectories within a limited sampling budget from the vast natural language sequence space. Existing methods face notable limitations: GRPO samples exclusively from the root, saturating high-probability trajectories while leaving deep, error-prone states under-explored. Tree-based methods blindly disperse budgets across trivial or unrecoverable states, causing sampling dilution...
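The budget-allocation intuition can be sketched with a heuristic stand-in for the paper's pivot selection: concentrate resamples on states whose estimated success rate is intermediate, since states that are trivial (p near 1) or unrecoverable (p near 0) waste budget. The p*(1-p) weighting below is an illustrative choice, not the paper's criterion.

```python
def allocate_budget(success_rates, total_budget):
    """Split a sampling budget across candidate states, weighting each by
    p*(1-p) so allocation peaks at p=0.5 and vanishes at p=0 and p=1."""
    weights = [p * (1 - p) for p in success_rates]
    z = sum(weights) or 1.0
    return [round(total_budget * w / z) for w in weights]

# Unrecoverable, pivotal, easy, and trivial states respectively.
budget = allocate_budget([0.0, 0.5, 0.9, 1.0], 8)
print(budget)  # [0, 6, 2, 0]
```

The pivotal state (p=0.5) absorbs most of the budget, which is exactly the behavior that root-only sampling like GRPO lacks.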

📄 DenseMLLM: Standard Multimodal LLMs are Intrinsic Dense Predictors
🗓️ Published: 2/15/2026
🔗 http://arxiv.org/abs/2602.14134v1
👥 Authors: Yi Li (possible past University Of Washington affiliation), Hongze Shen, Lexiang Tang, Xin Li (possible past Google (United States) affiliation), Xinpeng Ding, Yinsong Liu, Deqiang Jiang, Xing Sun (possible past Tencent (China) affiliation), Xiaomeng Li
Abstract

Multimodal Large Language Models (MLLMs) have demonstrated exceptional capabilities in high-level visual understanding. However, extending these models to fine-grained dense prediction tasks, such as semantic segmentation and depth estimation, typically necessitates the incorporation of complex, task-specific decoders and other customizations. This architectural fragmentation increases model complexity and deviates from the generalist design of MLLMs, ultimately limiting their practicality. In t...

📄 GUI-GENESIS: Automated Synthesis of Efficient Environments with Verifiable Rewards for GUI Agent Post-Training
🗓️ Published: 2/15/2026
🔗 http://arxiv.org/abs/2602.14093v1
👥 Authors: Yuan Cao (possible past Google (United States) affiliation), Dezhi Ran, Mengzhou Wu, Yuzhe Guo, Xin Chen (possible past Tencent (China) affiliation), Ang Li (possible past Google (United States) affiliation), Gang Cao, Gong Zhi, Hao Yu, Linyi Li, Wei Yang (possible past Tencent (China) affiliation), Tao Xie
Abstract

Post-training GUI agents in interactive environments is critical for developing generalization and long-horizon planning capabilities. However, training on real-world applications is hindered by high latency, poor reproducibility, and unverifiable rewards relying on noisy visual proxies. To address these limitations, we present GUI-GENESIS, the first framework to automatically synthesize efficient GUI training environments with verifiable rewards. GUI-GENESIS reconstructs real-world applications i...

📄 Plan-MCTS: Plan Exploration for Action Exploitation in Web Navigation
🗓️ Published: 2/15/2026
🔗 http://arxiv.org/abs/2602.14083v1
👥 Authors: Weiming Zhang, Jihong Wang, Jiamu Zhou, Qingyao Li, Xinbei Ma, Congmin Zheng, Xingyu Lou, Weiwen Liu, Zhuosheng Zhang, Jun Wang (possible past Tencent (China) affiliation), Yong Yu (possible past Shanghai Jiao Tong University affiliation), Weinan Zhang (possible past Shanghai Jiao Tong University affiliation)
Abstract

Large Language Models (LLMs) have empowered autonomous agents to handle complex web navigation tasks. While recent studies integrate tree search to enhance long-horizon reasoning, applying these algorithms in web navigation faces two critical challenges: sparse valid paths that lead to inefficient exploration, and a noisy context that dilutes accurate state perception. To address this, we introduce Plan-MCTS, a framework that reformulates web navigation by shifting exploration to a semantic Plan...
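The selection step at the heart of any MCTS variant is the standard UCB1 rule, shown generically below (this is the textbook formula, not Plan-MCTS's plan-space specifics): balance exploiting high-value children against exploring rarely-visited ones, with untried children tried first.

```python
import math

def ucb_select(children, c=1.4):
    """Pick the child maximizing mean value + c * sqrt(ln(N) / n).
    Unvisited children score infinity, so they are expanded first."""
    total = sum(ch["visits"] for ch in children)
    def score(ch):
        if ch["visits"] == 0:
            return float("inf")
        return (ch["value"] / ch["visits"]
                + c * math.sqrt(math.log(total) / ch["visits"]))
    return max(children, key=score)

# Hypothetical candidate plans for a web task (names are illustrative).
plans = [
    {"name": "search-then-filter", "visits": 10, "value": 7.0},
    {"name": "direct-url", "visits": 2, "value": 1.0},
    {"name": "untried", "visits": 0, "value": 0.0},
]
best = ucb_select(plans)
print(best["name"])  # untried plans are explored first
```

Plan-MCTS's contribution is applying this search over semantic plans rather than raw page actions, so each node covers many concrete action sequences.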

📄 BitDance: Scaling Autoregressive Generative Models with Binary Tokens
🗓️ Published: 2/15/2026
🔗 http://arxiv.org/abs/2602.14041v1
👥 Authors: Yuang Ai, Jiaming Han, Shaobin Zhuang, Weijia Mao, Xuefeng Hu, Ziyan Yang, Zhenheng Yang (possible past Meta (United States) affiliation), Huaibo Huang, Xiangyu Yue (possible past University Of California, Berkeley affiliation), Hao Chen
Abstract

We present BitDance, a scalable autoregressive (AR) image generator that predicts binary visual tokens instead of codebook indices. With high-entropy binary latents, BitDance lets each token represent up to $2^{256}$ states, yielding a compact yet highly expressive discrete representation. Sampling from such a huge token space is difficult with standard classification. To resolve this, BitDance uses a binary diffusion head: instead of predicting an index with softmax, it employs continuous-space...
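The token representation can be sketched in a few lines (an assumed scheme, not BitDance's trained tokenizer): binarize a 256-dimensional latent by sign, so each token is a 256-bit vector ranging over up to 2**256 states, with no explicit codebook to store or look up.

```python
import numpy as np

def to_binary_token(latent: np.ndarray) -> np.ndarray:
    """Sign-binarize a latent vector into a 0/1 token."""
    return (latent > 0).astype(np.uint8)

rng = np.random.default_rng(1)
token = to_binary_token(rng.normal(size=256))
n_states = 2 ** token.size  # size of the implicit "codebook"
print(token.shape)  # (256,)
```

Predicting all 2**256 states with a softmax is impossible, which is why the paper replaces the classification head with a binary diffusion head.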

📄 A Deployment-Friendly Foundational Framework for Efficient Computational Pathology
🗓️ Published: 2/15/2026
🔗 http://arxiv.org/abs/2602.14010v1
👥 Authors: Yu Cai, Cheng Jin, Jiabo Ma, Fengtao Zhou, Yingxue Xu, Zhengrui Guo, Yihui Wang, Zhengyu Zhang, Ling Liang, Yonghao Tan, Pingcheng Dong, Du Cai, On Ki Tang, Chenglong Zhao, Xi Wang (possible past Tsinghua University affiliation), Can Yang, Yali Xu, Jing Cui, Zhenhui Li, Ronald Cheong Kin Chan, Yueping Liu, Feng Gao, Xiuming Zhang (possible past University Of California, Berkeley affiliation), Li Liang, Hao Chen, Kwang-Ting Cheng
Abstract

Pathology foundation models (PFMs) have enabled robust generalization in computational pathology through large-scale datasets and expansive architectures, but their substantial computational cost, particularly for gigapixel whole slide images, limits clinical accessibility and scalability. Here, we present LitePath, a deployment-friendly foundational framework designed to mitigate model over-parameterization and patch-level redundancy. LitePath integrates LiteFM, a compact model distilled from t...

📄 Eureka-Audio: Triggering Audio Intelligence in Compact Language Models
🗓️ Published: 2/15/2026
🔗 http://arxiv.org/abs/2602.13954v1
👥 Authors: Dan Zhang (possible past Google (United States) affiliation), Yishu Lei, Jing Hu, Shuwei He, Songhe Deng, Xianlong Luo, Danxiang Zhu, Shikun Feng (possible past Baidu (China) affiliation), Rui Liu, Jingzhou He, Yu Sun (possible past Baidu (China) affiliation), Hua Wu (possible past Baidu (China) affiliation), Haifeng Wang (possible past Google (United States) affiliation)
Abstract

We present Eureka-Audio, a compact yet high-performance audio language model that achieves competitive performance against models that are 4 to 18 times larger across a broad range of audio understanding benchmarks. Despite containing only 1.7B parameters, Eureka-Audio demonstrates strong performance on automatic speech recognition (ASR), audio understanding, and dense audio captioning, matching or surpassing multiple 7B to 30B audio and omni-modal baselines. The model adopts a unified end-to-en...

📄 Mean Flow Policy with Instantaneous Velocity Constraint for One-step Action Generation
🗓️ Published: 2/14/2026
🔗 http://arxiv.org/abs/2602.13810v1
👥 Authors: Guojian Zhan, Letian Tao, Pengcheng Wang, Yixiao Wang, Yiheng Li, Yuxin Chen, Masayoshi Tomizuka (possible past University Of California, Berkeley affiliation), Shengbo Eben Li (possible past Tsinghua University affiliation)
Abstract

Learning expressive and efficient policy functions is a promising direction in reinforcement learning (RL). While flow-based policies have recently proven effective in modeling complex action distributions with a fast deterministic sampling process, they still face a trade-off between expressiveness and computational burden, which is typically controlled by the number of flow steps. In this work, we propose mean velocity policy (MVP), a new generative policy function that models the mean velocit...
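The mean-velocity idea can be illustrated with a toy flow (this is a didactic example, not the paper's learned policy): for a straight-line flow from noise x0 to an action a*, the time-averaged velocity over [0, 1] equals the total displacement, so a single step x0 + u_mean reproduces what many Euler steps with the instantaneous velocity approximate.

```python
import numpy as np

def euler_many_steps(x0, v_fn, n_steps=100):
    """Integrate dx/dt = v(x, t) over [0, 1] with explicit Euler."""
    x, dt = x0.copy(), 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * v_fn(x, i * dt)
    return x

a_star = np.array([0.3, -1.2])   # target action
x0 = np.array([1.0, 1.0])        # initial noise sample
v = lambda x, t: a_star - x0     # instantaneous velocity of the line
u_mean = a_star - x0             # its time-average over [0, 1]

one_step = x0 + u_mean           # one-step generation via mean velocity
many_step = euler_many_steps(x0, v)
print(np.allclose(one_step, many_step))
```

A learned mean-velocity network plays the role of `u_mean` here; the paper's instantaneous-velocity constraint regularizes it toward consistency with the underlying flow.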

📄 BPP: Long-Context Robot Imitation Learning by Focusing on Key History Frames
🗓️ Published: 2/16/2026
🔗 http://arxiv.org/abs/2602.15010v1
👥 Authors: Max Sobol Mark, Jacky Liang (possible past Nvidia (United States) affiliation), Maria Attarian, Chuyuan Fu, Debidatta Dwibedi (possible past Google (United States) affiliation), Dhruv Shah, Aviral Kumar (possible past University Of California, Berkeley affiliation)
Abstract

Many robot tasks require attending to the history of past observations. For example, finding an item in a room requires remembering which places have already been searched. However, the best-performing robot policies typically condition only on the current observation, limiting their applicability to such tasks. Naively conditioning on past observations often fails due to spurious correlations: policies latch onto incidental features of training histories that do not generalize to out-of-distrib...

📄 Traceable Latent Variable Discovery Based on Multi-Agent Collaboration
🗓️ Published: 2/16/2026
🔗 http://arxiv.org/abs/2602.14456v1
👥 Authors: Huaming Du, Tao Hu (possible past Baidu (China) affiliation), Yijie Huang, Yu Zhao (possible past Tencent (China) affiliation), Guisong Liu, Tao Gu, Gang Kou, Carl Yang
Abstract

Revealing the underlying causal mechanisms in the real world is crucial for scientific and technological progress. Despite notable advances in recent decades, the lack of high-quality data and the reliance of traditional causal discovery algorithms (TCDA) on the assumption of no latent confounders, as well as their tendency to overlook the precise semantics of latent variables, have long been major obstacles to the broader application of causal discovery. To address this issue, we propose a nove...

📄 QuRL: Efficient Reinforcement Learning with Quantized Rollout
🗓️ Published: 2/15/2026
🔗 http://arxiv.org/abs/2602.13953v1
👥 Authors: Yuhang Li, Reena Elangovan, Xin Dong (possible past Tsinghua University affiliation), Priyadarshini Panda, Brucek Khailany (possible past Nvidia (United States) affiliation)
Abstract

Reinforcement learning with verifiable rewards (RLVR) has become a trending paradigm for training reasoning large language models (LLMs). However, due to the autoregressive decoding nature of LLMs, the rollout process becomes the efficiency bottleneck of RL training, accounting for up to 70% of the total training time. In this work, we propose Quantized Reinforcement Learning (QuRL), which uses a quantized actor to accelerate the rollout. We address two challenges in QuRL. First, we propose Ada...
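The core mechanic can be sketched with generic int8 weight quantization (QuRL's actual scheme is more involved): rollouts run on a quantized copy of the actor for speed, while gradient updates are applied to the full-precision weights.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: w ~ q * scale."""
    scale = float(np.abs(w).max()) / 127.0
    if scale == 0.0:
        scale = 1.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

# Full-precision actor weights stay the master copy for updates...
w_fp32 = np.random.default_rng(2).normal(size=(4, 4)).astype(np.float32)
# ...while the rollout uses the cheaper quantized view.
q, s = quantize_int8(w_fp32)
w_rollout = dequantize(q, s)
err = float(np.abs(w_fp32 - w_rollout).max())
print(q.dtype, err < s)  # rounding error stays below one quantization step
```

The mismatch between the quantized rollout policy and the full-precision learner is exactly the distribution-shift problem the paper's techniques address.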

*Notable papers are those with at least two authors from a "big" AI/ML lab.