📄 Notable* Recent AI/ML arXiv Papers

📄 Causality in Video Diffusers is Separable from Denoising
🗓️ Published: 2/10/2026
🔗 http://arxiv.org/abs/2602.10095v1
👥 Authors: Xingjian Bai, Guande He, Zhengqi Li (possible past Google (United States) affiliation), Eli Shechtman, Xun Huang (possible past Nvidia (United States) affiliation), Zongze Wu
Abstract

Causality -- referring to temporal, uni-directional cause-effect relationships between components -- underlies many complex generative processes, including videos, language, and robot trajectories. Current causal diffusion models entangle temporal reasoning with iterative denoising, applying causal attention across all layers, at every denoising step, and over the entire context. In this paper, we show that the causal reasoning in these models is separable from the multi-step denoising process. ...
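
The causal attention the abstract refers to can be illustrated with a minimal single-head sketch in plain NumPy (purely illustrative, not the paper's video diffuser): a mask blocks each position from attending to any later position.

```python
import numpy as np

def causal_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask:
    position t may only attend to positions <= t."""
    t, d = q.shape
    scores = q @ k.T / np.sqrt(d)                      # (t, t) similarity scores
    mask = np.triu(np.ones((t, t), dtype=bool), k=1)   # True strictly above diagonal
    scores[mask] = -np.inf                             # block attention to the future
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over allowed positions
    return weights @ v, weights

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(4, 8)) for _ in range(3))
out, attn = causal_attention(q, k, v)
```

The paper's question is, in effect, whether this masking needs to run in every layer at every denoising step, or can be factored out of the iterative denoiser.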

📄 Chain of Mindset: Reasoning with Adaptive Cognitive Modes
🗓️ Published: 2/10/2026
🔗 http://arxiv.org/abs/2602.10063v1
👥 Authors: Tianyi Jiang, Arctanx An, Hengyi Feng, Naixin Zhai, Haodong Li, Xiaomin Yu, Jiahui Liu (possible past Google (United States) affiliation), Hanwen Du, Shuo Zhang (possible past National University Of Defense Technology affiliation), Zhi Yang, Jie Huang, Yuhua Li, Yongxin Ni, Huacan Wang, Ronghao Chen
Abstract

Human problem-solving is never the repetition of a single mindset, by which we mean a distinct mode of cognitive processing. When tackling a specific task, we do not rely on a single mindset; instead, we integrate multiple mindsets within the single solution process. However, existing LLM reasoning methods fall into a common trap: they apply the same fixed mindset across all steps, overlooking that different stages of solving the same problem require fundamentally different mindsets. This single...

📄 Kunlun: Establishing Scaling Laws for Massive-Scale Recommendation Systems through Unified Architecture Design
🗓️ Published: 2/10/2026
🔗 http://arxiv.org/abs/2602.10016v1
👥 Authors: Bojian Hou, Xiaolong Liu, Xiaoyi Liu, Jiaqi Xu, Yasmine Badr, Mengyue Hang, Sudhanshu Chanpuriya, Junqing Zhou, Yuhang Yang, Han Xu, Qiuling Suo, Laming Chen, Yuxi Hu, Jiasheng Zhang, Huaqing Xiong, Yuzhen Huang, Chao Chen (possible past Tencent (China) affiliation), Yue Dong, Yi Yang (possible past Baidu (China) affiliation), Shuo Chang, Xiaorui Gan, Wenlin Chen (possible past Meta (United States) affiliation), Santanu Kolay, Darren Liu, Jade Nie, Chunzhi Yang, Jiyan Yang (possible past Meta (United States) affiliation), Huayu Li
Abstract

Deriving predictable scaling laws that govern the relationship between model performance and computational investment is crucial for designing and allocating resources in massive-scale recommendation systems. While such laws are established for large language models, they remain challenging for recommendation systems, especially those processing both user history and context features. We identify poor scaling efficiency as the main barrier to predictable power-law scaling, stemming from ineffici...
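
A scaling law of the kind the abstract mentions is typically a power law, L(C) = a · C^(-b), fit by linear regression in log-log space. A sketch on synthetic data (the numbers are invented for illustration, not from the paper):

```python
import numpy as np

# Synthetic compute/loss pairs that exactly follow L(C) = 50 * C**(-0.05).
compute = np.array([1e18, 1e19, 1e20, 1e21, 1e22])
loss = 50.0 * compute ** -0.05

# In log space the power law is linear: log L = log a - b * log C.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
a_hat, b_hat = np.exp(intercept), -slope   # recovered coefficient and exponent
```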

📄 Infusion: Shaping Model Behavior by Editing Training Data via Influence Functions
🗓️ Published: 2/10/2026
🔗 http://arxiv.org/abs/2602.09987v1
👥 Authors: J Rosser, Robert Kirk, Edward Grefenstette (possible past University Of Oxford affiliation), Jakob Foerster (possible past University Of Oxford affiliation), Laura Ruis
Abstract

Influence functions are commonly used to attribute model behavior to training documents. We explore the reverse: crafting training data that induces model behavior. Our framework, Infusion, uses scalable influence-function approximations to compute small perturbations to training documents that induce targeted changes in model behavior through parameter shifts. We evaluate Infusion on data poisoning tasks across vision and language domains. On CIFAR-10, we show that making subtle edits via Infus...
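
The standard influence-function approximation the framework builds on scores a training point z against a test point via I(z) = -g_test^T H^{-1} g_z. A closed-form sketch for ridge regression, where gradients and the Hessian are exact (this is the textbook first-order influence, not the paper's scalable variant):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

lam = 0.1
H = X.T @ X / len(X) + lam * np.eye(3)        # Hessian of the ridge objective
w = np.linalg.solve(H, X.T @ y / len(X))      # fitted weights

x_test, y_test = X[0], y[0]
g_test = (x_test @ w - y_test) * x_test       # gradient of the test loss
influences = np.array([
    -g_test @ np.linalg.solve(H, (xi @ w - yi) * xi)   # -g_test^T H^{-1} g_i
    for xi, yi in zip(X, y)
])
```

Infusion runs this in reverse: instead of attributing behavior to documents, it perturbs documents so that the induced parameter shift moves the model toward a target behavior.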

📄 Code2World: A GUI World Model via Renderable Code Generation
🗓️ Published: 2/10/2026
🔗 http://arxiv.org/abs/2602.09856v1
👥 Authors: Yuhao Zheng, Li'an Zhong, Yi Wang, Rui Dai, Kaikui Liu, Xiangxiang Chu, Linyuan Lv, Philip Torr (possible past University Of Oxford affiliation), Kevin Qinghong Lin (possible past National University Of Singapore affiliation)
Abstract

Autonomous GUI agents interact with environments by perceiving interfaces and executing actions. As a virtual sandbox, the GUI World model empowers agents with human-like foresight by enabling action-conditioned prediction. However, existing text- and pixel-based approaches struggle to simultaneously achieve high visual fidelity and fine-grained structural controllability. To this end, we propose Code2World, a vision-language coder that simulates the next visual state via renderable code generat...

📄 ClinAlign: Scaling Healthcare Alignment from Clinician Preference
🗓️ Published: 2/10/2026
🔗 http://arxiv.org/abs/2602.09653v1
👥 Authors: Shiwei Lyu, Xidong Wang, Lei Liu, Hao Zhu (possible past Tsinghua University affiliation), Chaohe Zhang, Jian Wang (possible past Baidu (China) affiliation), Jinjie Gu, Benyou Wang (possible past Tencent (China) affiliation), Yue Shen
Abstract

Although large language models (LLMs) demonstrate expert-level medical knowledge, aligning their open-ended outputs with fine-grained clinician preferences remains challenging. Existing methods often rely on coarse objectives or unreliable automated judges that are weakly grounded in professional guidelines. We propose a two-stage framework to address this gap. First, we introduce HealthRubrics, a dataset of 7,034 physician-verified preference examples in which clinicians refine LLM-drafted rubr...

📄 MieDB-100k: A Comprehensive Dataset for Medical Image Editing
🗓️ Published: 2/10/2026
🔗 http://arxiv.org/abs/2602.09587v1
👥 Authors: Yongfan Lai, Wen Qian, Bo Liu (possible past Meta (United States) affiliation), Hongyan Li, Hao Luo, Fan Wang (possible past Baidu (China) affiliation), Bohan Zhuang, Shenda Hong (possible past Peking University affiliation)
Abstract

The scarcity of high-quality data remains a primary bottleneck in adapting multimodal generative models for medical image editing. Existing medical image editing datasets often suffer from limited diversity, neglect of medical image understanding and inability to balance quality with scalability. To address these gaps, we propose MieDB-100k, a large-scale, high-quality and diverse dataset for text-guided medical image editing. It categorizes editing tasks into perspectives of Perception, Modific...

📄 Predictive Query Language: A Domain-Specific Language for Predictive Modeling on Relational Databases
🗓️ Published: 2/10/2026
🔗 http://arxiv.org/abs/2602.09572v1
👥 Authors: Vid Kocijan, Jinu Sunil, Jan Eric Lenssen (possible past Meta (United States) affiliation), Viman Deb, Xinwei Xe, Federco Reyes Gomez, Matthias Fey, Jure Leskovec (possible past Stanford University affiliation)
Abstract

The purpose of predictive modeling on relational data is to predict future or missing values in a relational database, for example, future purchases of a user, risk of readmission of the patient, or the likelihood that a financial transaction is fraudulent. Typically powered by machine learning methods, predictive models are used in recommendations, financial fraud detection, supply chain optimization, and other systems, providing billions of predictions every day. However, training a machine le...

📄 Learning to Discover Iterative Spectral Algorithms
🗓️ Published: 2/10/2026
🔗 http://arxiv.org/abs/2602.09530v1
👥 Authors: Zihang Liu, Oleg Balabanov, Yaoqing Yang (possible past University Of California, Berkeley affiliation), Michael W. Mahoney (possible past Stanford University affiliation)
Abstract

We introduce AutoSpec, a neural network framework for discovering iterative spectral algorithms for large-scale numerical linear algebra and numerical optimization. Our self-supervised models adapt to input operators using coarse spectral information (e.g., eigenvalue estimates and residual norms), and they predict recurrence coefficients for computing or applying a matrix polynomial tailored to a downstream task. The effectiveness of AutoSpec relies on three ingredients: an architecture whose i...

📄 SWE-AGI: Benchmarking Specification-Driven Software Construction with MoonBit in the Era of Autonomous Agents
🗓️ Published: 2/10/2026
🔗 http://arxiv.org/abs/2602.09447v1
👥 Authors: Zhirui Zhang (possible past Tencent (China) affiliation), Hongbo Zhang, Haoxiang Fei, Zhiyuan Bao, Yubin Chen, Zhengyu Lei, Ziyue Liu, Yixuan Sun, Mingkun Xiao, Zihang Ye, Yu Zhang (possible past Google (United States) affiliation), Hongcheng Zhu, Yuxiang Wen, Heung-Yeung Shum
Abstract

Although large language models (LLMs) have demonstrated impressive coding capabilities, their ability to autonomously build production-scale software from explicit specifications remains an open question. We introduce SWE-AGI, an open-source benchmark for evaluating end-to-end, specification-driven construction of software systems written in MoonBit. SWE-AGI tasks require LLM-based agents to implement parsers, interpreters, binary decoders, and SAT solvers strictly from authoritative standards a...

📄 P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads
🗓️ Published: 2/10/2026
🔗 http://arxiv.org/abs/2602.09443v1
👥 Authors: Yun Luo, Futing Wang, Qianjia Cheng, Fangchen Yu, Haodi Lei, Jianhao Yan, Chenxi Li, Jiacheng Chen, Yufeng Zhao, Haiyuan Wan, Yuchen Zhang (possible past University Of California, Berkeley affiliation), Shenghe Zheng, Junchi Yao, Qingyang Zhang, Haonan He, Wenxuan Zeng, Li Sheng (possible past Google (United States) affiliation), Chengxing Xie, Yuxin Zuo, Yizhuo Li, Yulun Wu, Rui Huang (possible past Google (United States) affiliation), Dongzhan Zhou, Kai Chen (possible past Shanghai Jiao Tong University affiliation), Yu Qiao (possible past Shanghai Artificial Intelligence Laboratory affiliation), Lei Bai, Yu Cheng (possible past National University Of Singapore affiliation), Ning Ding (possible past Tsinghua University affiliation), Bowen Zhou, Peng Ye, Ganqu Cui (possible past Tsinghua University affiliation)
Abstract

The transition from symbolic manipulation to science-grade reasoning represents a pivotal frontier for Large Language Models (LLMs), with physics serving as the critical test anchor for binding abstract logic to physical reality. Physics demands that a model maintain physical consistency with the laws governing the universe, a task that fundamentally requires multimodal perception to ground abstract logic in reality. At the Olympiad level, diagrams are often constitutive rather than illustrative...

📄 Diffusion-Guided Pretraining for Brain Graph Foundation Models
🗓️ Published: 2/10/2026
🔗 http://arxiv.org/abs/2602.09437v1
👥 Authors: Xinxu Wei, Rong Zhou (possible past Google (United States) affiliation), Lifang He, Yu Zhang (possible past Google (United States) affiliation)
Abstract

With the growing interest in foundation models for brain signals, graph-based pretraining has emerged as a promising paradigm for learning transferable representations from connectome data. However, existing contrastive and masked autoencoder methods typically rely on naive random dropping or masking for augmentation, which is ill-suited for brain graphs and hypergraphs as it disrupts semantically meaningful connectivity patterns. Moreover, commonly used graph-level readout and reconstruction sc...

📄 BiasScope: Towards Automated Detection of Bias in LLM-as-a-Judge Evaluation
🗓️ Published: 2/10/2026
🔗 http://arxiv.org/abs/2602.09383v1
👥 Authors: Peng Lai, Zhihao Ou, Yong Wang (possible past Baidu (China) affiliation), Longyue Wang (possible past Tencent (China) affiliation), Jian Yang, Yun Chen, Guanhua Chen
Abstract

LLM-as-a-Judge has been widely adopted across various research and practical applications, yet the robustness and reliability of its evaluation remain a critical issue. A core challenge it faces is bias, which has primarily been studied in terms of known biases and their impact on evaluation outcomes, while automated and systematic exploration of potential unknown biases is still lacking. Nevertheless, such exploration is crucial for enhancing the robustness and reliability of evaluations. To br...

📄 Auditing Multi-Agent LLM Reasoning Trees Outperforms Majority Vote and LLM-as-Judge
🗓️ Published: 2/10/2026
🔗 http://arxiv.org/abs/2602.09341v1
👥 Authors: Wei Yang (possible past Tencent (China) affiliation), Shixuan Li, Heng Ping, Peiyu Zhang, Paul Bogdan, Jesse Thomason (possible past University Of Washington affiliation)
Abstract

Multi-agent systems (MAS) can substantially extend the reasoning capacity of large language models (LLMs), yet most frameworks still aggregate agent outputs with majority voting. This heuristic discards the evidential structure of reasoning traces and is brittle under the confabulation consensus, where agents share correlated biases and converge on the same incorrect rationale. We introduce AgentAuditor, which replaces voting with a path search over a Reasoning Tree that explicitly represents ag...

📄 Effective Reasoning Chains Reduce Intrinsic Dimensionality
🗓️ Published: 2/9/2026
🔗 http://arxiv.org/abs/2602.09276v1
👥 Authors: Archiki Prasad, Mandar Joshi (possible past University Of Washington affiliation), Kenton Lee (possible past University Of Washington affiliation), Mohit Bansal, Peter Shaw (possible past Google (United States) affiliation)
Abstract

Chain-of-thought (CoT) reasoning and its variants have substantially improved the performance of language models on complex reasoning tasks, yet the precise mechanisms by which different strategies facilitate generalization remain poorly understood. While current explanations often point to increased test-time computation or structural guidance, establishing a consistent, quantifiable link between these factors and generalization remains challenging. In this work, we identify intrinsic dimension...
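
One common proxy for the intrinsic dimensionality of a set of representations (illustrative only; the paper's estimator may differ) is the participation ratio of the PCA spectrum, PR = (Σλᵢ)² / Σλᵢ², which is low when variance concentrates in few directions:

```python
import numpy as np

def participation_ratio(X):
    """Effective dimensionality of row-vector data X via its covariance spectrum."""
    X = X - X.mean(axis=0)
    lam = np.linalg.eigvalsh(X.T @ X / len(X))   # covariance eigenvalues
    return lam.sum() ** 2 / (lam ** 2).sum()

rng = np.random.default_rng(0)
low_d = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 16))   # rank-2 data in 16-d
high_d = rng.normal(size=(200, 16))                            # isotropic 16-d data
```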

📄 Gradient Residual Connections
🗓️ Published: 2/9/2026
🔗 http://arxiv.org/abs/2602.09190v1
👥 Authors: Yangchen Pan, Qizhen Ying, Philip Torr (possible past University Of Oxford affiliation), Bo Liu (possible past Meta (United States) affiliation)
Abstract

Existing work has linked properties of a function's gradient to the difficulty of function approximation. Motivated by these insights, we study how gradient information can be leveraged to improve neural network's ability to approximate high-frequency functions, and we propose a gradient-based residual connection as a complement to the standard identity skip connection used in residual networks. We provide simple theoretical intuition for why gradient information can help distinguish inputs and ...

📄 Data Science and Technology Towards AGI Part I: Tiered Data Management
🗓️ Published: 2/9/2026
🔗 http://arxiv.org/abs/2602.09003v1
👥 Authors: Yudong Wang, Zixuan Fu, Hengyu Zhao, Chen Zhao (possible past Stanford University affiliation), Chuyue Zhou, Xinle Lin, Hongya Lyu, Shuaikang Xue, Yi Yi, Yingjiao Wang, Zhi Zheng, Yuzhou Zhang (possible past Google (United States) affiliation), Jie Zhou (possible past Tsinghua University affiliation), Chaojun Xiao, Xu Han (possible past Tsinghua University affiliation), Zhiyuan Liu (possible past Tsinghua University affiliation), Maosong Sun (possible past Tsinghua University affiliation)
Abstract

The development of artificial intelligence can be viewed as an evolution of data-driven learning paradigms, with successive shifts in data organization and utilization continuously driving advances in model capability. Current LLM research is dominated by a paradigm that relies heavily on unidirectional scaling of data size, increasingly encountering bottlenecks in data availability, acquisition cost, and training efficiency. In this work, we argue that the development of AGI is entering a new p...

📄 iGRPO: Self-Feedback-Driven LLM Reasoning
🗓️ Published: 2/9/2026
🔗 http://arxiv.org/abs/2602.09000v1
👥 Authors: Ali Hatamizadeh (possible past Nvidia (United States) affiliation), Shrimai Prabhumoye (possible past Carnegie Mellon University affiliation), Igor Gitman, Ximing Lu, Seungju Han, Wei Ping (possible past Baidu (China) affiliation), Yejin Choi (possible past Allen Institute For Artificial Intelligence affiliation), Jan Kautz (possible past Nvidia (United States) affiliation)
Abstract

Large Language Models (LLMs) have shown promise in solving complex mathematical problems, yet they still fall short of producing accurate and consistent solutions. Reinforcement Learning (RL) is a framework for aligning these models with task-specific rewards, improving overall quality and reliability. Group Relative Policy Optimization (GRPO) is an efficient, value-function-free alternative to Proximal Policy Optimization (PPO) that leverages group-relative reward normalization. We introduce It...
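
GRPO's value-function-free trick, which iGRPO inherits, is to normalize rewards within a group of sampled responses to the same prompt; the standardized reward serves as the advantage. A minimal sketch of that normalization (standard GRPO; the self-feedback loop is not shown):

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Advantage for each sampled response: its reward standardized
    against the other responses in the same group (no critic needed)."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# e.g. 4 sampled solutions to one math problem, scored 1 if correct else 0.
adv = group_relative_advantages([1.0, 0.0, 1.0, 1.0])
```

Correct samples receive positive advantage, the incorrect one negative, and the group mean is zero by construction.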

📄 UI-Venus-1.5 Technical Report
🗓️ Published: 2/9/2026
🔗 http://arxiv.org/abs/2602.09082v1
👥 Authors: Venus-Team, Changlong Gao, Zhangxuan Gu, Yulin Liu, Xinyu Qiu, Shuheng Shen, Yue Wen, Tianyu Xia, Zhenyu Xu, Zhengwen Zeng, Beitong Zhou, Xingran Zhou, Weizhi Chen, Sunhao Dai, Jingya Dou, Yichen Gong, Yuan Guo, Zhenlin Guo, Feng Li, Qian Li (possible past National University Of Defense Technology affiliation), Jinzhen Lin, Yuqi Zhou, Linchao Zhu (possible past Baidu (China) affiliation), Liang Chen (possible past Google (United States) affiliation), Zhenyu Guo, Changhua Meng, Weiqiang Wang

Abstract

GUI agents have emerged as a powerful paradigm for automating interactions in digital environments, yet achieving both broad generality and consistently strong task performance remains challenging. In this report, we present UI-Venus-1.5, a unified, end-to-end GUI Agent designed for robust real-world applications. The proposed model family comprises two dense variants (2B and 8B) and one mixture-of-experts variant (30B-A3B) to meet various downstream application scenarios. Compared to our previous ...

📄 InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery
🗓️ Published: 2/9/2026
🔗 http://arxiv.org/abs/2602.08990v1
👥 Authors: Shiyang Feng, Runmin Ma, Xiangchao Yan, Yue Fan, Yusong Hu, Songtao Huang, Shuaiyu Zhang, Zongsheng Cao, Tianshuo Peng, Jiakang Yuan, Zijie Guo, Zhijie Zhong, Shangheng Du, Weida Wang, Jinxin Shi, Yuhao Zhou, Xiaohan He, Zhiyin Yu, Fangchen Yu, Qihao Zheng, Jiamin Wu, Mianxin Liu, Chi Zhang (possible past Peking University affiliation), Shaowei Hou, Shuya Li, Yankai Jiang, Wenjie Lou, Lilong Wang, Zifu Wang, Jiong Wang, Wanghan Xu, Yue Deng, Dongrui Liu, Yiheng Wang, Wenlong Zhang, Fenghua Ling, Shufei Zhang, Xiaosong Wang (possible past Nvidia (United States) affiliation), Shuangjia Zheng, Xun Huang (possible past Nvidia (United States) affiliation), Siqi Sun, Shuyue Hu, Peng Ye, Chunfeng Song, Bin Wang, Conghui He (possible past Tsinghua University affiliation), Yihao Liu, Xin Li (possible past Google (United States) affiliation), Qibin Hou, Tao Chen, Xiangyu Yue (possible past University Of California, Berkeley affiliation), Bin Wang, Liang He, Dahua Lin, Bowen Zhou, Bo Zhang (possible past Tencent (China) affiliation), Lei Bai
Abstract

We introduce InternAgent-1.5, a unified system designed for end-to-end scientific discovery across computational and empirical domains. The system is built on a structured architecture composed of three coordinated subsystems for generation, verification, and evolution. These subsystems are supported by foundational capabilities for deep research, solution optimization, and long horizon memory. The architecture allows InternAgent-1.5 to operate continuously across extended discovery cycles while...

📄 Looping Back to Move Forward: Recursive Transformers for Efficient and Flexible Large Multimodal Models
🗓️ Published: 2/9/2026
🔗 http://arxiv.org/abs/2602.09080v1
👥 Authors: Ruihan Xu, Yuting Gao (possible past Tencent (China) affiliation), Lan Wang, Jianing Li, Weihao Chen, Qingpei Guo, Ming Yang (possible past Meta (United States) affiliation), Shiliang Zhang
Abstract

Large Multimodal Models (LMMs) have achieved remarkable success in vision-language tasks, yet their vast parameter counts are often underutilized during both training and inference. In this work, we embrace the idea of looping back to move forward: reusing model parameters through recursive refinement to extract stronger multimodal representations without increasing model size. We propose RecursiveVLM, a recursive Transformer architecture tailored for LMMs. Two key innovations enable effective l...
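
The "looping back" idea in its simplest form is weight tying across depth: one block's parameters are reused for several refinement passes instead of stacking distinct layers. A toy sketch (tanh residual block, weight-tied; this is the generic recursion pattern, not RecursiveVLM itself):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 16)) * 0.1   # one shared block's weights

def recursive_forward(x, n_passes=3):
    """Apply the same residual block n_passes times; depth grows, parameters don't."""
    for _ in range(n_passes):
        x = x + np.tanh(x @ W)        # same parameters reused each pass
    return x

x = rng.normal(size=(4, 16))
y1, y3 = recursive_forward(x, 1), recursive_forward(x, 3)
```

Extra passes refine the representation at inference time without adding a single parameter, which is the efficiency/flexibility trade the title alludes to.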

📄 FlattenGPT: Depth Compression for Transformer with Layer Flattening
🗓️ Published: 2/9/2026
🔗 http://arxiv.org/abs/2602.08858v1
👥 Authors: Ruihan Xu, Qingpei Guo, Yao Zhu, Xiangyang Ji (possible past Tsinghua University affiliation), Ming Yang (possible past Meta (United States) affiliation), Shiliang Zhang
Abstract

Recent works have indicated redundancy across transformer blocks, prompting the research of depth compression to prune less crucial blocks. However, current ways of entire-block pruning suffer from risks of discarding meaningful cues learned in those blocks, leading to substantial performance degradation. As another line of model compression, channel pruning can better preserve performance, while it cannot reduce model depth and is challenged by inconsistent pruning ratios for individual layers....

📄 WildReward: Learning Reward Models from In-the-Wild Human Interactions
🗓️ Published: 2/9/2026
🔗 http://arxiv.org/abs/2602.08829v1
👥 Authors: Hao Peng (possible past Tsinghua University affiliation), Yunjia Qi, Xiaozhi Wang, Zijun Yao, Lei Hou (possible past Tsinghua University affiliation), Juanzi Li
Abstract

Reward models (RMs) are crucial for the training of large language models (LLMs), yet they typically rely on large-scale human-annotated preference pairs. With the widespread deployment of LLMs, in-the-wild interactions have emerged as a rich source of implicit reward signals. This raises the question: Can we develop reward models directly from in-the-wild interactions? In this work, we explore this possibility by adopting WildChat as an interaction source and proposing a pipeline to extract rel...

📄 CoFEH: LLM-driven Feature Engineering Empowered by Collaborative Bayesian Hyperparameter Optimization
🗓️ Published: 2/10/2026
🔗 http://arxiv.org/abs/2602.09851v1
👥 Authors: Beicheng Xu, Keyao Ding, Wei Liu (possible past Tsinghua University affiliation), Yupeng Lu, Bin Cui (possible past Peking University affiliation)
Abstract

Feature Engineering (FE) is pivotal in automated machine learning (AutoML) but remains a bottleneck for traditional methods, which treat it as a black-box search, operating within rigid, predefined search spaces and lacking domain awareness. While Large Language Models (LLMs) offer a promising alternative by leveraging semantic reasoning to generate unbounded operators, existing methods fail to construct free-form FE pipelines, remaining confined to isolated subtasks such as feature generation. ...

📄 PlugSI: Plug-and-Play Test-Time Graph Adaptation for Spatial Interpolation
🗓️ Published: 2/10/2026
🔗 http://arxiv.org/abs/2602.09824v1
👥 Authors: Xuhang Wu, Zhuoxuan Liang, Wei Li (possible past Peking University affiliation), Xiaohua Jia, Sumi Helal (possible past Google (United States) affiliation)
Abstract

With the rapid advancement of IoT and edge computing, sensor networks have become indispensable, driving the need for large-scale sensor deployment. However, the high deployment cost hinders their scalability. To tackle the issues, Spatial Interpolation (SI) introduces virtual sensors to infer readings from observed sensors, leveraging graph structure. However, current graph-based SI methods rely on pre-trained models, lack adaptation to larger and unseen graphs at test-time, and overlook test d...
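
For context, spatial interpolation in its simplest, graph-free form is inverse-distance weighting of observed sensor readings at the virtual sensor's location (a baseline illustration only; the paper's method is graph-based and adapts at test time):

```python
import numpy as np

def idw(obs_xy, obs_vals, query_xy, power=2.0, eps=1e-12):
    """Inverse-distance-weighted estimate at query_xy from observed sensors."""
    d = np.linalg.norm(obs_xy - query_xy, axis=1)
    w = 1.0 / (d ** power + eps)              # nearer sensors weigh more
    return (w * obs_vals).sum() / w.sum()

obs_xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # observed sensor positions
obs_vals = np.array([10.0, 20.0, 30.0])                  # their readings
estimate = idw(obs_xy, obs_vals, np.array([0.1, 0.1]))   # virtual sensor near (0, 0)
```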

📄 When Less is More: The LLM Scaling Paradox in Context Compression
🗓️ Published: 2/10/2026
🔗 http://arxiv.org/abs/2602.09789v1
👥 Authors: Ruishan Guo, Yibing Liu, Guoxin Ma, Yan Wang (possible past Tencent (China) affiliation), Yueyang Zhang (possible past Baidu (China) affiliation), Long Xia, Kecheng Chen, Zhiyuan Sun, Daiting Shi
Abstract

Scaling up model parameters has long been a prevalent training paradigm driven by the assumption that larger models yield superior generation capabilities. However, under lossy context compression in a compressor-decoder setup, we observe a Size-Fidelity Paradox: increasing the compressor size can lessen the faithfulness of reconstructed contexts though training loss decreases. Through extensive experiments across models from 0.6B to 90B, we coin this paradox arising from two dominant factors: 1...

📄 Sparse Layer Sharpness-Aware Minimization for Efficient Fine-Tuning
🗓️ Published: 2/10/2026
🔗 http://arxiv.org/abs/2602.09395v1
👥 Authors: Yifei Cheng, Xianglin Yang, Guoxia Wang, Chao Huang (possible past Tencent (China) affiliation), Fei Ma, Dianhai Yu (possible past Baidu (China) affiliation), Xiaochun Cao, Li Shen (possible past Tencent (China) affiliation)
Abstract

Sharpness-aware minimization (SAM) seeks the minima with a flat loss landscape to improve the generalization performance in machine learning tasks, including fine-tuning. However, its extra parameter perturbation step doubles the computation cost, which becomes the bottleneck of SAM in the practical implementation. In this work, we propose an approach SL-SAM to break this bottleneck by introducing the sparse technique to layers. Our key innovation is to frame the dynamic selection of layers for ...
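
The doubled computation the abstract describes comes from SAM's two gradient evaluations per update: one to find the adversarial weight perturbation, one at the perturbed point. A generic SAM step on a toy quadratic (plain SAM, without SL-SAM's sparse layer selection):

```python
import numpy as np

def sam_step(w, grad_fn, rho=0.05, lr=0.1):
    """One SAM update: perturb weights along the loss gradient, then
    descend using the gradient taken at the perturbed point."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascent step toward sharper loss
    g_perturbed = grad_fn(w + eps)               # second gradient: the extra cost
    return w - lr * g_perturbed

# Toy loss L(w) = 0.5 * ||w||^2, so grad(w) = w.
w = np.array([1.0, -2.0])
w_next = sam_step(w, lambda w: w)
```

SL-SAM's idea is to pay that second gradient only for a dynamically selected sparse subset of layers, keeping the flat-minima benefit while cutting the overhead.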

📄 Contact-Anchored Policies: Contact Conditioning Creates Strong Robot Utility Models
🗓️ Published: 2/9/2026
🔗 http://arxiv.org/abs/2602.09017v1
👥 Authors: Zichen Jeff Cui, Omar Rayyan, Haritheja Etukuru, Bowen Tan, Zavier Andrianarivo, Zicheng Teng, Yihang Zhou, Krish Mehta, Nicholas Wojno, Kevin Yuanbo Wu, Manan H Anjaria, Ziyuan Wu, Manrong Mao, Guangxun Zhang, Binit Shah, Yejin Kim, Soumith Chintala (possible past Meta (United States) affiliation), Lerrel Pinto (possible past Carnegie Mellon University affiliation), Nur Muhammad Mahi Shafiullah
Abstract

The prevalent paradigm in robot learning attempts to generalize across environments, embodiments, and tasks with language prompts at runtime. A fundamental tension limits this approach: language is often too abstract to guide the concrete physical understanding required for robust manipulation. In this work, we introduce Contact-Anchored Policies (CAP), which replace language conditioning with points of physical contact in space. Simultaneously, we structure CAP as a library of modular utility m...

📄 Analysis of Converged 3D Gaussian Splatting Solutions: Density Effects and Prediction Limit
🗓️ Published: 2/9/2026
🔗 http://arxiv.org/abs/2602.08909v1
👥 Authors: Zhendong Wang, Cihan Ruan, Jingchuan Xiao, Chuqing Shi, Wei Jiang (possible past Apple (United States) affiliation), Wei Wang (possible past University Of Oxford affiliation), Wenjie Liu, Nam Ling
Abstract

We investigate what structure emerges in 3D Gaussian Splatting (3DGS) solutions from standard multi-view optimization. We term these Rendering-Optimal References (RORs) and analyze their statistical properties, revealing stable patterns: mixture-structured scales and bimodal radiance across diverse scenes. To understand what determines these parameters, we apply learnability probes by training predictors to reconstruct RORs from point clouds without rendering supervision. Our analysis uncovers f...

*Notable papers are those with at least two authors from a "big" AI/ML lab.