πŸ“„ Notable* Recent AI/ML arXiv Papers

πŸ“„ Perceptive Humanoid Parkour: Chaining Dynamic Human Skills via Motion Matching
πŸ—“οΈ Published: 2/17/2026
πŸ”— http://arxiv.org/abs/2602.15827v1
πŸ‘₯ Authors: Zhen Wu, Xiaoyu Huang, Lujie Yang, Yuanhang Zhang, Koushil Sreenath, Xi Chen (possible past University Of California, Berkeley affiliation), Pieter Abbeel (possible past University Of California, Berkeley affiliation), Rocky Duan, Angjoo Kanazawa (possible past University Of California, Berkeley affiliation), Carmelo Sferrazza, Guanya Shi, C. Karen Liu (possible past Stanford University affiliation)
Abstract

While recent advances in humanoid locomotion have achieved stable walking on varied terrains, capturing the agility and adaptivity of highly dynamic human motions remains an open challenge. In particular, agile parkour in complex environments demands not only low-level robustness, but also human-like motion expressiveness, long-horizon skill composition, and perception-driven decision-making. In this paper, we present Perceptive Humanoid Parkour (PHP), a modular framework that enables humanoid r...

πŸ“„ MeshMimic: Geometry-Aware Humanoid Motion Learning through 3D Scene Reconstruction
πŸ—“οΈ Published: 2/17/2026
πŸ”— http://arxiv.org/abs/2602.15733v1
πŸ‘₯ Authors: Qiang Zhang (possible past Tsinghua University affiliation), Jiahao Ma, Peiran Liu, Shuai Shi, Zeran Su, Zifan Wang, Jingkai Sun, Wei Cui, Jialin Yu, Gang Han, Wen Zhao, Pihai Sun, Kangning Yin, Jiaxu Wang, Jiahang Cao, Lingfeng Zhang, Hao Cheng (possible past Tencent (China) affiliation), Xiaoshuai Hao, Yiding Ji, Junwei Liang (possible past Carnegie Mellon University affiliation), Jian Tang, Renjing Xu, Yijie Guo
Abstract

Humanoid motion control has witnessed significant breakthroughs in recent years, with deep reinforcement learning (RL) emerging as a primary catalyst for achieving complex, human-like behaviors. However, the high dimensionality and intricate dynamics of humanoid robots make manual motion design impractical, leading to a heavy reliance on expensive motion capture (MoCap) data. These datasets are not only costly to acquire but also frequently lack the necessary geometric context of the surrounding...

πŸ“„ Spanning the Visual Analogy Space with a Weight Basis of LoRAs
πŸ—“οΈ Published: 2/17/2026
πŸ”— http://arxiv.org/abs/2602.15727v1
πŸ‘₯ Authors: Hila Manor, Rinon Gal, Haggai Maron, Tomer Michaeli (possible past Technion – Israel Institute Of Technology affiliation), Gal Chechik (possible past Google (United States) affiliation)
Abstract

Visual analogy learning enables image manipulation through demonstration rather than textual description, allowing users to specify complex transformations difficult to articulate in words. Given a triplet $\{\mathbf{a}$, $\mathbf{a}'$, $\mathbf{b}\}$, the goal is to generate $\mathbf{b}'$ such that $\mathbf{a} : \mathbf{a}' :: \mathbf{b} : \mathbf{b}'$. Recent methods adapt text-to-image models to this task using a single Low-Rank Adaptation (LoRA) module, but they face a fundamental limitation...
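
The truncated abstract doesn't show the method, but the title's premise, treating LoRA weight deltas as a basis whose linear combinations span an analogy space, can be caricatured with toy matrices. Everything below (sizes, coefficients, helper names) is a hypothetical illustration, not the paper's actual procedure:

```python
def matmul(A, B):
    """Plain-Python matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def lora_delta(A, B):
    """A LoRA update is the low-rank product B @ A added to a frozen weight."""
    return matmul(B, A)

def span_basis(deltas, coeffs):
    """Linear combination of basis LoRA deltas: new edits become points
    in the span of a small weight basis."""
    rows, cols = len(deltas[0]), len(deltas[0][0])
    out = [[0.0] * cols for _ in range(rows)]
    for D, c in zip(deltas, coeffs):
        for i in range(rows):
            for j in range(cols):
                out[i][j] += c * D[i][j]
    return out

# Two rank-1 "basis" LoRAs on a 2x2 weight (toy sizes).
d1 = lora_delta([[1.0, 0.0]], [[1.0], [0.0]])   # B(2x1) @ A(1x2) -> 2x2
d2 = lora_delta([[0.0, 1.0]], [[0.0], [1.0]])
mixed = span_basis([d1, d2], [0.5, 2.0])
```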

πŸ“„ PERSONA: Dynamic and Compositional Inference-Time Personality Control via Activation Vector Algebra
πŸ—“οΈ Published: 2/17/2026
πŸ”— http://arxiv.org/abs/2602.15669v1
πŸ‘₯ Authors: Xiachong Feng, Liang Zhao (possible past Baidu (China) affiliation), Weihong Zhong, Yichong Huang, Yuxuan Gu, Lingpeng Kong (possible past Google (United States) affiliation), Xiaocheng Feng, Bing Qin
Abstract

Current methods for personality control in Large Language Models rely on static prompting or expensive fine-tuning, failing to capture the dynamic and compositional nature of human traits. We introduce PERSONA, a training-free framework that achieves fine-tuning level performance through direct manipulation of personality vectors in activation space. Our key insight is that personality traits appear as extractable, approximately orthogonal directions in the model's representation space that supp...
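
As a rough illustration of the activation-space idea (not the paper's actual procedure), a trait direction can be extracted as a difference of activation means and added at inference time. All data and names below are toy placeholders:

```python
import math
import random

random.seed(0)
D = 16  # toy hidden dimension

def randvec():
    return [random.gauss(0.0, 1.0) for _ in range(D)]

# Toy activations: hidden states from trait-exhibiting vs. neutral prompts
# (hypothetical data; a real pipeline would collect these from an LLM).
trait_acts = [[x + (1.0 if i < 4 else 0.0) for i, x in enumerate(randvec())]
              for _ in range(64)]
neutral_acts = [randvec() for _ in range(64)]

def mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

# Difference-of-means steering direction, normalized to unit length.
diff = [a - b for a, b in zip(mean(trait_acts), mean(neutral_acts))]
norm = math.sqrt(sum(x * x for x in diff))
trait_vec = [x / norm for x in diff]

def steer(hidden, vec, alpha):
    """Inference-time control: shift a hidden state along the trait direction."""
    return [h + alpha * v for h, v in zip(hidden, vec)]

h = randvec()
h_steered = steer(h, trait_vec, alpha=4.0)

# Projection onto the trait direction grows by exactly alpha (unit vector).
proj_before = sum(a * b for a, b in zip(h, trait_vec))
proj_after = sum(a * b for a, b in zip(h_steered, trait_vec))
```

Composition then amounts to adding several (approximately orthogonal) trait vectors with independent strengths.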

πŸ“„ STAPO: Stabilizing Reinforcement Learning for LLMs by Silencing Rare Spurious Tokens
πŸ—“οΈ Published: 2/17/2026
πŸ”— http://arxiv.org/abs/2602.15620v1
πŸ‘₯ Authors: Shiqi Liu, Zeyu He, Guojian Zhan, Letian Tao, Zhilong Zheng, Jiang Wu, Yinuo Wang, Yang Guan, Kehua Sheng, Bo Zhang (possible past Tencent (China) affiliation), Keqiang Li, Jingliang Duan (possible past Tsinghua University affiliation), Shengbo Eben Li (possible past Tsinghua University affiliation)
Abstract

Reinforcement Learning (RL) has significantly improved large language model reasoning, but existing RL fine-tuning methods rely heavily on heuristic techniques such as entropy regularization and reweighting to maintain stability. In practice, they often experience late-stage performance collapse, leading to degraded reasoning quality and unstable training. We derive that the magnitude of token-wise policy gradients in RL is negatively correlated with token probability and local policy entropy. B...
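
The abstract's observation is that per-token policy-gradient magnitude grows as token probability shrinks, so rare tokens can dominate updates. A probability floor that zeroes out such tokens is one simplified stand-in for "silencing"; the exact STAPO rule is not shown in the truncated abstract, and the numbers below are invented:

```python
import math

# Toy per-token REINFORCE terms for one sampled response.
# probs: policy probability of each sampled token; adv: shared advantage.
probs = [0.60, 0.45, 0.80, 0.002, 0.55]   # token 3 is a rare, spurious token
adv = 1.0

def pg_token_losses(probs, adv, min_prob=None):
    """Per-token loss -adv * log p. Its gradient magnitude scales like 1/p,
    so very rare tokens dominate; optionally silence tokens below a
    probability floor (a simplified stand-in for the paper's idea)."""
    losses = []
    for p in probs:
        if min_prob is not None and p < min_prob:
            losses.append(0.0)          # masked: contributes no gradient
        else:
            losses.append(-adv * math.log(p))
    return losses

plain = pg_token_losses(probs, adv)
silenced = pg_token_losses(probs, adv, min_prob=0.01)
```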

πŸ“„ Dynamic Training-Free Fusion of Subject and Style LoRAs
πŸ—“οΈ Published: 2/17/2026
πŸ”— http://arxiv.org/abs/2602.15539v1
πŸ‘₯ Authors: Qinglong Cao, Yuntian Chen, Chao Ma (possible past Shanghai Jiao Tong University affiliation), Xiaokang Yang (possible past Shanghai Jiao Tong University affiliation)
Abstract

Recent studies have explored the combination of multiple LoRAs to simultaneously generate user-specified subjects and styles. However, most existing approaches fuse LoRA weights using static statistical heuristics that deviate from LoRA's original purpose of learning adaptive feature adjustments and ignore the randomness of sampled inputs. To address this, we propose a dynamic training-free fusion framework that operates throughout the generation process. During the forward pass, at each LoRA-ap...

πŸ“„ CDRL: A Reinforcement Learning Framework Inspired by Cerebellar Circuits and Dendritic Computational Strategies
πŸ—“οΈ Published: 2/17/2026
πŸ”— http://arxiv.org/abs/2602.15367v1
πŸ‘₯ Authors: Sibo Zhang (possible past Baidu (China) affiliation), Rui Jing, Liangfu Lv, Jian Zhang (possible past Tencent (China) affiliation), Yunliang Zang
Abstract

Reinforcement learning (RL) has achieved notable performance in high-dimensional sequential decision-making tasks, yet remains limited by low sample efficiency, sensitivity to noise, and weak generalization under partial observability. Most existing approaches address these issues primarily through optimization strategies, while the role of architectural priors in shaping representation learning and decision dynamics is less explored. Inspired by structural principles of the cerebellum, we propo...

πŸ“„ On Surprising Effectiveness of Masking Updates in Adaptive Optimizers
πŸ—“οΈ Published: 2/17/2026
πŸ”— http://arxiv.org/abs/2602.15322v1
πŸ‘₯ Authors: Taejong Joo, Wenhan Xia, Cheolmin Kim, Ming Zhang (possible past Peking University affiliation), Eugene Ie (possible past Google (United States) affiliation)
Abstract

Training large language models (LLMs) relies almost exclusively on dense adaptive optimizers with increasingly sophisticated preconditioners. We challenge this by showing that randomly masking parameter updates can be highly effective, with a masked variant of RMSProp consistently outperforming recent state-of-the-art optimizers. Our analysis reveals that the random masking induces a curvature-dependent geometric regularization that smooths the optimization trajectory. Motivated by this finding,...
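
A masked variant of RMSProp is easy to sketch: keep the second-moment statistics, but randomly drop each coordinate's update. The toy quadratic, hyperparameters, and masking rule below are assumptions for illustration, not the paper's exact recipe:

```python
import random

random.seed(1)

def masked_rmsprop(grad_fn, x0, lr=0.05, beta=0.9, eps=1e-8,
                   keep_prob=0.5, steps=400):
    """RMSProp-style optimizer where each coordinate's update is randomly
    masked (dropped) with probability 1 - keep_prob at every step."""
    x = list(x0)
    v = [0.0] * len(x)
    for _ in range(steps):
        g = grad_fn(x)
        for i in range(len(x)):
            # Second-moment EMA is tracked regardless of the mask.
            v[i] = beta * v[i] + (1 - beta) * g[i] * g[i]
            if random.random() < keep_prob:  # the random update mask
                x[i] -= lr * g[i] / (v[i] ** 0.5 + eps)
    return x

# Toy objective: f(x) = sum_i (x_i - t_i)^2, minimized at x = target.
target = [0.0, 1.0, 2.0, 3.0]
grad = lambda x: [2 * (xi - ti) for xi, ti in zip(x, target)]

x_final = masked_rmsprop(grad, [5.0, 5.0, 5.0, 5.0])
err = max(abs(xi - ti) for xi, ti in zip(x_final, target))
```

Despite skipping roughly half of all coordinate updates, the iterate still lands near the minimizer, which is the surprising effectiveness the title refers to.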

πŸ“„ X-MAP: eXplainable Misclassification Analysis and Profiling for Spam and Phishing Detection
πŸ—“οΈ Published: 2/17/2026
πŸ”— http://arxiv.org/abs/2602.15298v1
πŸ‘₯ Authors: Qi Zhang (possible past Tencent (China) affiliation), Dian Chen (possible past University Of California, Berkeley affiliation), Lance M. Kaplan, Audun JΓΈsang, Dong Hyun Jeong, Feng Chen, Jin-Hee Cho
Abstract

Misclassifications in spam and phishing detection are very harmful, as false negatives expose users to attacks while false positives degrade trust. Existing uncertainty-based detectors can flag potential errors, but they can themselves be deceived and offer limited interpretability. This paper presents X-MAP, an eXplainable Misclassification Analysis and Profiling framework that reveals topic-level semantic patterns behind model failures. X-MAP combines SHAP-based feature attributions with non-negative ma...

πŸ“„ Rethinking Diffusion Models with Symmetries through Canonicalization with Applications to Molecular Graph Generation
πŸ—“οΈ Published: 2/16/2026
πŸ”— http://arxiv.org/abs/2602.15022v1
πŸ‘₯ Authors: Cai Zhou, Zijie Chen, Zian Li, Jike Wang, Kaiyi Jiang, Pan Li (possible past Baidu (China) affiliation), Rose Yu, Muhan Zhang (possible past Meta (United States) affiliation), Stephen Bates, Tommi Jaakkola
Abstract

Many generative tasks in chemistry and science involve distributions invariant to group symmetries (e.g., permutation and rotation). A common strategy enforces invariance and equivariance through architectural constraints such as equivariant denoisers and invariant priors. In this paper, we challenge this tradition through the alternative canonicalization perspective: first map each sample to an orbit representative with a canonical pose or order, train an unconstrained (non-equivariant) diffusi...
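
The canonicalization recipe, map each sample to an orbit representative and then train an unconstrained model, can be illustrated for permutation symmetry on a toy molecular graph. The tie-breaking key below is a deliberate simplification, not the paper's canonicalizer:

```python
# Canonicalization for permutation symmetry: map every node ordering of a
# (toy) molecular graph to one canonical representative, so a
# non-equivariant model sees a consistent input.

def canonicalize(nodes, edges):
    """Sort nodes by (atom label, degree) and relabel edges accordingly.
    A real canonicalizer must break ties more carefully; this toy key
    suffices when symmetric nodes are structurally interchangeable."""
    degree = {i: 0 for i in range(len(nodes))}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    order = sorted(range(len(nodes)), key=lambda i: (nodes[i], degree[i]))
    relabel = {old: new for new, old in enumerate(order)}
    new_nodes = [nodes[i] for i in order]
    new_edges = sorted(tuple(sorted((relabel[u], relabel[v]))) for u, v in edges)
    return new_nodes, new_edges

# The same molecule under two different node orderings...
g1 = (["O", "C", "H", "H"], [(0, 1), (1, 2), (1, 3)])
g2 = (["H", "O", "C", "H"], [(1, 2), (2, 0), (2, 3)])

# ...maps to the same canonical representative.
c1 = canonicalize(*g1)
c2 = canonicalize(*g2)
```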

πŸ“„ PhyScensis: Physics-Augmented LLM Agents for Complex Physical Scene Arrangement
πŸ—“οΈ Published: 2/16/2026
πŸ”— http://arxiv.org/abs/2602.14968v1
πŸ‘₯ Authors: Yian Wang, Han Yang (possible past ETH Zurich affiliation), Minghao Guo, Xiaowen Qiu, Tsun-Hsuan Wang, Wojciech Matusik, Joshua B. Tenenbaum (possible past Massachusetts Institute Of Technology affiliation), Chuang Gan (possible past Tsinghua University affiliation)
Abstract

Automatically generating interactive 3D environments is crucial for scaling up robotic data collection in simulation. While prior work has primarily focused on 3D asset placement, it often overlooks the physical relationships between objects (e.g., contact, support, balance, and containment), which are essential for creating complex and realistic manipulation scenarios such as tabletop arrangements, shelf organization, or box packing. Compared to classical 3D layout generation, producing complex...

πŸ“„ On the Learning Dynamics of RLVR at the Edge of Competence
πŸ—“οΈ Published: 2/16/2026
πŸ”— http://arxiv.org/abs/2602.14872v1
πŸ‘₯ Authors: Yu Huang (possible past Tencent (China) affiliation), Zixin Wen, Yuejie Chi, Yuting Wei, Aarti Singh (possible past Carnegie Mellon University affiliation), Yingbin Liang, Yuxin Chen
Abstract

Reinforcement learning with verifiable rewards (RLVR) has been a main driver of recent breakthroughs in large reasoning models. Yet it remains a mystery how rewards based solely on final outcomes can help overcome the long-horizon barrier to extended reasoning. To understand this, we develop a theory of the training dynamics of RL for transformers on compositional reasoning tasks. Our theory characterizes how the effectiveness of RLVR is governed by the smoothness of the difficulty spectrum. Whe...

πŸ“„ CoCoDiff: Correspondence-Consistent Diffusion Model for Fine-grained Style Transfer
πŸ—“οΈ Published: 2/16/2026
πŸ”— http://arxiv.org/abs/2602.14464v1
πŸ‘₯ Authors: Wenbo Nie, Zixiang Li, Renshuai Tao, Bin Wu, Yunchao Wei (possible past National University Of Singapore affiliation), Yao Zhao (possible past Microsoft (United States) affiliation)
Abstract

Transferring visual style between images while preserving semantic correspondence between similar objects remains a central challenge in computer vision. While existing methods have made great strides, most of them operate at the global level and overlook region-wise and even pixel-wise semantic correspondence. To address this, we propose CoCoDiff, a novel training-free and low-cost style transfer framework that leverages pretrained latent diffusion models to achieve fine-grained, semantically consi...


πŸ“„ Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report v1.5
πŸ—“οΈ Published: 2/16/2026
πŸ”— http://arxiv.org/abs/2602.14457v1
πŸ‘₯ Authors: Dongrui Liu, Yi Yu, Jie Zhang, Guanxu Chen, Qihao Lin, Hanxi Zhu, Lige Huang, Yijin Zhou, Peng Wang (possible past Peking University affiliation), Shuai Shao, Boxuan Zhang, Zicheng Liu (possible past Microsoft (United States) affiliation), Jingwei Sun, Yu Li (possible past Tencent (China) affiliation), Yuejin Xie, Jiaxuan Guo, Jia Xu, Chaochao Lu, Bowen Zhou, Xia Hu, Jing Shao
Abstract

To understand and identify the unprecedented risks posed by rapidly advancing artificial intelligence (AI) models, Frontier AI Risk Management Framework in Practice presents a comprehensive assessment of their frontier risks. As the general capabilities of Large Language Models (LLMs) rapidly evolve and agentic AI proliferates, this version of the risk analysis technical report presents an updated and granular assessment of five critical dimensions: cyber offense, persuasion and manipulation, s...

πŸ“„ Precedent-Informed Reasoning: Mitigating Overthinking in Large Reasoning Models via Test-Time Precedent Learning
πŸ—“οΈ Published: 2/16/2026
πŸ”— http://arxiv.org/abs/2602.14451v1
πŸ‘₯ Authors: Qianyue Wang, Jinwu Hu, Huanxiang Lin, Bolin Chen, Zhiquan Wen, Yaofo Chen, Yu Rong (possible past Tencent (China) affiliation), Mingkui Tan (possible past Baidu (China) affiliation)
Abstract

Reasoning in Large Language Models (LLMs) often suffers from inefficient long chain-of-thought traces with redundant self-exploration and validation, which inflate computational costs and even degrade performance. Inspired by human reasoning patterns, where people solve new problems by leveraging past related cases to constrain search spaces and reduce trial-and-error, we propose Precedent-Informed Reasoning (PIR), transforming LRMs' reasoning paradigm from exhaustive self-exploration to guided lea...

πŸ“„ Operationalising the Superficial Alignment Hypothesis via Task Complexity
πŸ—“οΈ Published: 2/17/2026
πŸ”— http://arxiv.org/abs/2602.15829v1
πŸ‘₯ Authors: TomΓ‘s Vergara-Browne, Darshan Patil, Ivan Titov, Siva Reddy (possible past University Of Edinburgh affiliation), Tiago Pimentel (possible past ETH Zurich affiliation), Marius Mosbach
Abstract

The superficial alignment hypothesis (SAH) posits that large language models learn most of their knowledge during pre-training, and that post-training merely surfaces this knowledge. The SAH, however, lacks a precise definition, which has led to (i) different and seemingly orthogonal arguments supporting it, and (ii) important critiques of it. We propose a new metric called task complexity: the length of the shortest program that achieves a target performance on a task. In this framework, the SA...

πŸ“„ Dex4D: Task-Agnostic Point Track Policy for Sim-to-Real Dexterous Manipulation
πŸ—“οΈ Published: 2/17/2026
πŸ”— http://arxiv.org/abs/2602.15828v1
πŸ‘₯ Authors: Yuxuan Kuang, Sungjae Park, Katerina Fragkiadaki (possible past University Of California, Berkeley affiliation), Shubham Tulsiani (possible past University Of California, Berkeley affiliation)
Abstract

Learning generalist policies capable of accomplishing a plethora of everyday tasks remains an open challenge in dexterous manipulation. In particular, collecting large-scale manipulation data via real-world teleoperation is expensive and difficult to scale. While learning in simulation provides a feasible alternative, designing multiple task-specific environments and rewards for training is similarly challenging. We propose Dex4D, a framework that instead leverages simulation for learning task-a...

πŸ“„ GLM-5: from Vibe Coding to Agentic Engineering
πŸ—“οΈ Published: 2/17/2026
πŸ”— http://arxiv.org/abs/2602.15763v1
πŸ‘₯ Authors: GLM-5 Team, Aohan Zeng, Xin Lv, Zhenyu Hou (possible past Baidu (China) affiliation), Zhengxiao Du, Qinkai Zheng, Bin Chen, Da Yin, Chendi Ge, Chengxing Xie, Cunxiang Wang, Gengzheng Pan, Hao Zeng, Haoke Zhang, Haoran Wang, Huilong Chen, Jiajie Zhang, Jian Jiao, Jiaqi Guo, Jingsen Wang, Jingzhao Du, Jinzhu Wu, Kedong Wang, Lei Li (possible past Carnegie Mellon University affiliation), Lin Fan, Lucen Zhong, Mingdao Liu, Mingming Zhao, Pengfan Du, Qian Dong, Rui Lu, Shuang-Li, Shulin Cao, Song Liu, Ting Jiang, Xiaodong Chen, Xiaohan Zhang, Xuancheng Huang, Xuezhen Dong, Yabo Xu, Yao Wei, Yifan An, Yilin Niu, Yitong Zhu, Yuanhao Wen, Yukuo Cen, Yushi Bai, Zhongpei Qiao, Zihan Wang (possible past Tsinghua University affiliation), Zikang Wang, Zilin Zhu, Ziqiang Liu, Zixuan Li, Bojie Wang, Bosi Wen, Can Huang, Changpeng Cai, Chao Yu, Chen Li (possible past Tencent (China) affiliation), Chenghua Huang, Chengwei Hu, Chenhui Zhang, Chenzheng Zhu, Congfeng Yin, Daoyan Lin, Dayong Yang, Di Wang, Ding Ai, Erle Zhu, Fangzhou Yi, Feiyu Chen, Guohong Wen, Hailong Sun, Haisha Zhao, Haiyi Hu, Hanchen Zhang, Hanrui Liu, Hanyu Zhang, Hao Peng (possible past Tsinghua University affiliation), Hao Tai, Haobo Zhang, He Liu (possible past Google (United States) affiliation), Hongwei Wang, Hongxi Yan, Hongyu Ge, Huan Liu (possible past Tsinghua University affiliation), Huanpeng Chu, Jia'ni Zhao, Jiachen Wang, Jiajing Zhao, Jiamin Ren, Jiapeng Wang, Jiaxin Zhang, Jiayi Gui, Jiayue Zhao, Jijie Li, Jing An, Jing Li (possible past Tencent (China) affiliation), Jingwei Yuan, Jinhua Du, Jinxin Liu, Junkai Zhi, Junwen Duan, Kaiyue Zhou, Kangjian Wei, Ke Wang (possible past Google (United States) affiliation), Keyun Luo, Laiqiang Zhang, Leigang Sha, Liang Xu, Lindong Wu, Lintao Ding, Lu Chen, Minghao Li, Nianyi Lin, Pan Ta, Qiang Zou, Rongjun Song, Ruiqi Yang, Shangqing Tu, Shangtong Yang, Shaoxiang Wu, Shengyan Zhang, Shijie Li, Shuang Li, Shuyi Fan, Wei Qin, Wei Tian, Weining Zhang, Wenbo Yu, Wenjie Liang, Xiang Kuang, Xiangmeng Cheng, Xiangyang Li, Xiaoquan Yan, Xiaowei Hu, Xiaoying Ling, Xing Fan, Xingye Xia, Xinyuan Zhang, Xinze Zhang, Xirui Pan, Xunkai Zhang, Yandong Wu, Yanfu Li, Yidong Wang, Yifan Zhu, Yijun Tan, Yilin Zhou, Yiming Pan, Ying Zhang (possible past Tencent (China) affiliation), Yinpei Su, Yipeng Geng, Yong Yan, Yonglin Tan, Yuean Bi, Yuhan Shen, Yuhao Yang, Yujiang Li, Yunan Liu, Yunqing Wang (possible past Google (United States) affiliation), Yuntao Li, Yurong Wu, Yutao Zhang, Yuxi Duan, Yuxuan Zhang, Zezhen Liu, Zhengtao Jiang, Zhenhe Yan, Zheyu Zhang, Zhixiang Wei, Zhuo Chen, Zhuoer Feng, Zijun Yao, Ziwei Chai, Ziyuan Wang, Zuzhou Zhang, Bin Xu, Minlie Huang, Hongning Wang, Juanzi Li, Yuxiao Dong (possible past Microsoft (United States) affiliation), Jie Tang (possible past Tsinghua University affiliation)
Abstract

We present GLM-5, a next-generation foundation model designed to transition the paradigm of vibe coding to agentic engineering. Building upon the agentic, reasoning, and coding (ARC) capabilities of its predecessor, GLM-5 adopts DSA to significantly reduce training and inference costs while maintaining long-context fidelity. To advance model alignment and autonomy, we implement a new asynchronous reinforcement learning infrastructure that drastically improves post-training efficiency by decoupli...

πŸ“„ ExpertWeaver: Unlocking the Inherent MoE in Dense LLMs with GLU Activation Patterns
πŸ—“οΈ Published: 2/17/2026
πŸ”— http://arxiv.org/abs/2602.15521v1
πŸ‘₯ Authors: Ziyu Zhao, Tong Zhu (possible past Nvidia (United States) affiliation), Zhi Zhang, Tiantian Fan, Jinluan Yang, Kun Kuang, Zhongyu Wei, Fei Wu (possible past Google (United States) affiliation), Yu Cheng (possible past National University Of Singapore affiliation)
Abstract

Mixture-of-Experts (MoE) effectively scales model capacity while preserving computational efficiency through sparse expert activation. However, training high-quality MoEs from scratch is prohibitively expensive. A promising alternative is to convert pretrained dense models into sparse MoEs. Existing dense-to-MoE methods fall into two categories: \textbf{dynamic structural pruning} that converts dense models into MoE architectures with moderate sparsity to balance performance and inference effici...
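
The truncated abstract doesn't show the conversion rule, but the title's premise, that GLU activation patterns reveal a latent expert structure inside dense FFNs, can be caricatured by grouping neurons with similar activation patterns. All data, thresholds, and names below are invented for illustration:

```python
# Toy: partition FFN "neurons" into experts by their gate-activation
# patterns over probe inputs (a simplified reading of dense-to-MoE).
# Each row: one neuron's activations on 4 hypothetical probe inputs.
acts = {
    "n0": [0.9, 0.8, 0.0, 0.1],
    "n1": [0.8, 0.9, 0.1, 0.0],
    "n2": [0.0, 0.1, 0.9, 0.8],
    "n3": [0.1, 0.0, 0.8, 0.9],
}

def similarity(a, b):
    """Cosine similarity between two activation patterns."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def group_neurons(acts, threshold=0.5):
    """Greedy grouping: a neuron joins the first expert whose seed neuron
    has a similar activation pattern; otherwise it seeds a new expert."""
    experts = []   # list of (seed_pattern, [neuron names])
    for name, pat in acts.items():
        for seed, members in experts:
            if similarity(seed, pat) >= threshold:
                members.append(name)
                break
        else:
            experts.append((pat, [name]))
    return [members for _, members in experts]

experts = group_neurons(acts)
```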

πŸ“„ ÜberWeb: Insights from Multilingual Curation for a 20-Trillion-Token Dataset
πŸ—“οΈ Published: 2/16/2026
πŸ”— http://arxiv.org/abs/2602.15210v1
πŸ‘₯ Authors: DatologyAI, Aldo Gael Carranza, Kaleigh Mentzer, Ricardo Pio Monti, Alex Fang, Alvin Deng, Amro Abbas, Anshuman Suri, Brett Larsen, Cody Blakeney, Darren Teh, David Schwab, Diego Kiner, Fan Pan, Haakon Mongstad, Jack Urbanek (possible past Meta (United States) affiliation), Jason Lee (possible past Stanford University affiliation), Jason Telanoff, Josh Wills, Luke Merrick, Parth Doshi, Paul Burstein, Pratyush Maini, Spandan Das, Tony Jiang, Vineeth Dorna, Zhengping Wang, Bogdan Gaza, Ari Morcos, Matthew Leavitt
Abstract

Multilinguality is a core capability for modern foundation models, yet training high-quality multilingual models remains challenging due to uneven data availability across languages. A further challenge is the performance interference that can arise from joint multilingual training, commonly referred to as the "curse of multilinguality". We study multilingual data curation across thirteen languages and find that many reported regressions are not inherent to multilingual scaling but instead stem ...

πŸ“„ BPP: Long-Context Robot Imitation Learning by Focusing on Key History Frames
πŸ—“οΈ Published: 2/16/2026
πŸ”— http://arxiv.org/abs/2602.15010v1
πŸ‘₯ Authors: Max Sobol Mark, Jacky Liang (possible past Nvidia (United States) affiliation), Maria Attarian, Chuyuan Fu, Debidatta Dwibedi (possible past Google (United States) affiliation), Dhruv Shah, Aviral Kumar (possible past University Of California, Berkeley affiliation)
Abstract

Many robot tasks require attending to the history of past observations. For example, finding an item in a room requires remembering which places have already been searched. However, the best-performing robot policies typically condition only on the current observation, limiting their applicability to such tasks. Naively conditioning on past observations often fails due to spurious correlations: policies latch onto incidental features of training histories that do not generalize to out-of-distrib...
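
As a generic illustration of "focusing on key history frames" (the truncated abstract does not give BPP's actual selection rule), one can greedily keep only frames that differ enough from the last kept frame; the observations and threshold below are toy placeholders:

```python
# Toy key-frame selection: keep only history frames that differ enough
# from the last kept frame, so a policy attends to informative history.
def key_frames(frames, threshold=1.0):
    """Greedy selection by L1 distance to the most recently kept frame."""
    kept = [0]
    for i in range(1, len(frames)):
        dist = sum(abs(a - b) for a, b in zip(frames[i], frames[kept[-1]]))
        if dist >= threshold:
            kept.append(i)
    return kept

# Hypothetical 1-D "observations": the scene changes at steps 3 and 6.
history = [[0.0], [0.1], [0.1], [2.0], [2.1], [2.0], [5.0], [5.1]]
selected = key_frames(history)
```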

*Notable papers are those with at least two authors from a "big" AI/ML lab.