📄 Notable* Recent AI/ML arXiv Papers


📄 Mixture-of-Depths Attention
🗓️ Published: 3/16/2026
🔗 http://arxiv.org/abs/2603.15619v1
👥 Authors: Lianghui Zhu, Yuxin Fang, Bencheng Liao, Shijie Wang, Tianheng Cheng, Zilong Huang (possible past Tencent (China) affiliation), Chen Chen (possible past Tencent (China) affiliation), Lai Wei, Yutao Zeng, Ya Wang (possible past Peking University affiliation), Yi Lin, Yu Li (possible past Tencent (China) affiliation), Xinggang Wang
Abstract

Scaling depth is a key driver for large language models (LLMs). Yet, as LLMs become deeper, they often suffer from signal degradation: informative features formed in shallow layers are gradually diluted by repeated residual updates, making them harder to recover in deeper layers. We introduce mixture-of-depths attention (MoDA), a mechanism that allows each attention head to attend to sequence KV pairs at the current layer and depth KV pairs from preceding layers. We further describe a hardware-e...
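The attention pattern the abstract describes — each head attending jointly over the current layer's sequence KV pairs and one KV pair per preceding layer — can be sketched in a few lines. This is a toy NumPy illustration of that idea only; the function name, shapes, and single-head framing are my own assumptions, not the paper's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def moda_head(q, seq_k, seq_v, depth_k, depth_v):
    """One attention head attending over sequence KV pairs from the
    current layer AND depth KV pairs carried from preceding layers.
    Shapes: q (d,), seq_k/seq_v (T, d), depth_k/depth_v (L, d)."""
    k = np.concatenate([seq_k, depth_k], axis=0)  # (T + L, d)
    v = np.concatenate([seq_v, depth_v], axis=0)
    scores = k @ q / np.sqrt(q.shape[-1])         # joint score over both sets
    return softmax(scores) @ v                     # (d,)
```

The point of the sketch is that the depth KV pairs give deep layers a direct read path back to features formed in shallow layers, rather than relying on the residual stream alone.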

📄 The PokeAgent Challenge: Competitive and Long-Context Learning at Scale
🗓️ Published: 3/16/2026
🔗 http://arxiv.org/abs/2603.15563v1
👥 Authors: Seth Karten, Jake Grigsby, Tersoo Upaa, Junik Bae, Seonghun Hong, Hyunyoung Jeong, Jaeyoon Jung, Kun Kerdthaisong, Gyungbo Kim, Hyeokgi Kim, Yujin Kim, Eunju Kwon, Dongyu Liu, Patrick Mariglia, Sangyeon Park, Benedikt Schink, Xianwei Shi, Anthony Sistilli, Joseph Twin, Arian Urdu, Matin Urdu, Qiao Wang, Ling Wu, Wenli Zhang (possible past Baidu (China) affiliation), Kunsheng Zhou, Stephanie Milani, Kiran Vodrahalli, Amy Zhang (possible past University Of California, Berkeley affiliation), Fei Fang, Yuke Zhu (possible past Stanford University affiliation), Chi Jin
Abstract

We present the PokeAgent Challenge, a large-scale benchmark for decision-making research built on Pokemon's multi-agent battle system and expansive role-playing game (RPG) environment. Partial observability, game-theoretic reasoning, and long-horizon planning remain open problems for frontier AI, yet few benchmarks stress all three simultaneously under realistic conditions. PokeAgent targets these limitations at scale through two complementary tracks: our Battling Track, which calls for strategi...

📄 Are Dilemmas and Conflicts in LLM Alignment Solvable? A View from Priority Graph
🗓️ Published: 3/16/2026
🔗 http://arxiv.org/abs/2603.15527v1
👥 Authors: Zhenheng Tang, Xiang Liu, Qian Wang, Eunsol Choi (possible past Google (United States) affiliation), Bo Li (possible past Tencent (China) affiliation), Xiaowen Chu
Abstract

As Large Language Models (LLMs) become more powerful and autonomous, they increasingly face conflicts and dilemmas in many scenarios. We first summarize and taxonomize these diverse conflicts. Then, we model the LLM's preferences among different choices as a priority graph, where instructions and values are nodes, and the edges represent context-specific priorities determined by the model's output distribution. This graph reveals that a unified stable LLM alignment is very challenging, because...
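The priority-graph framing can be made concrete with a small sketch. Assuming (my own toy formalization, not the paper's code) that an edge (u, v) means "u outranks v" in some context, a genuine dilemma — one that no fixed global ranking can resolve — shows up as a cycle in the graph:

```python
def has_priority_cycle(edges):
    """Given context-specific priority edges (u, v) meaning 'u outranks v',
    return True iff they admit no consistent global ordering (i.e. a cycle
    exists). Uses depth-first search with three-color marking."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
        graph.setdefault(v, [])
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in graph}

    def dfs(n):
        color[n] = GRAY                # on the current DFS path
        for m in graph[n]:
            if color[m] == GRAY or (color[m] == WHITE and dfs(m)):
                return True            # back edge: priorities form a cycle
        color[n] = BLACK               # fully explored
        return False

    return any(color[n] == WHITE and dfs(n) for n in graph)

# Hypothetical example: safety > helpfulness > honesty > safety is cyclic,
# so no single priority ranking can satisfy all three contexts at once.
dilemma = has_priority_cycle([("safety", "helpfulness"),
                              ("helpfulness", "honesty"),
                              ("honesty", "safety")])
```

Acyclic priorities, by contrast, can always be flattened into one stable ordering (a topological sort), which is the intuition behind the abstract's claim that conflicts make unified alignment hard.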

📄 Why AI systems don't learn and what to do about it: Lessons on autonomous learning from cognitive science
🗓️ Published: 3/16/2026
🔗 http://arxiv.org/abs/2603.15381v1
👥 Authors: Emmanuel Dupoux, Yann LeCun (possible past Meta (United States) affiliation), Jitendra Malik (possible past University Of California, Berkeley affiliation)
Abstract

We critically examine the limitations of current AI models in achieving autonomous learning and propose a learning architecture inspired by human and animal cognition. The proposed framework integrates learning from observation (System A) and learning from active behavior (System B) while flexibly switching between these learning modes as a function of internally generated meta-control signals (System M). We discuss how this could be built by taking inspiration from how organisms adapt to real-wor...

📄 FuXiWeather2: Learning accurate atmospheric state estimation for operational global weather forecasting
🗓️ Published: 3/16/2026
🔗 http://arxiv.org/abs/2603.15358v1
👥 Authors: Xiaoze Xu, Xiuyu Sun, Songling Zhu, Xiaohui Zhong, Yuanqing Huang, Zijian Zhu, Jun Liu (possible past Tencent (China) affiliation), Hao Li (possible past Tsinghua University affiliation)
Abstract

Numerical weather prediction has long been constrained by the computational bottlenecks inherent in data assimilation and numerical modeling. While machine learning has accelerated forecasting, existing models largely serve as "emulators of reanalysis products," thereby retaining their systematic biases and operational latencies. Here, we present FuXiWeather2, a unified end-to-end neural framework for assimilation and forecasting. We align training objectives directly with a combination of real-...

📄 From Documents to Spans: Code-Centric Learning for LLM-based ICD Coding
🗓️ Published: 3/16/2026
🔗 http://arxiv.org/abs/2603.15270v1
👥 Authors: Xu Zhang (possible past Tencent (China) affiliation), Wenxin Ma, Chenxu Wu, Rongsheng Wang, Kun Zhang (possible past Google (United States) affiliation), S. Kevin Zhou
Abstract

ICD coding is a critical yet challenging task in healthcare. Recently, LLM-based methods have demonstrated stronger generalization than discriminative methods in ICD coding. However, fine-tuning LLMs for ICD coding faces three major challenges. First, existing public ICD coding datasets provide limited coverage of the ICD code space, restricting a model's ability to generalize to unseen codes. Second, naive fine-tuning diminishes the interpretability of LLMs, as few public datasets contain explicit su...

📄 AGCD: Agent-Guided Cross-Modal Decoding for Weather Forecasting
🗓️ Published: 3/16/2026
🔗 http://arxiv.org/abs/2603.15260v1
👥 Authors: Jing Wu, Yang Liu (possible past Tsinghua University affiliation), Lin Zhang, Junbo Zeng, Jiabin Wang, Zi Ye, Guowen Li, Shilei Cao (possible past Tencent (China) affiliation), Jiashun Cheng, Fang Wang (possible past Tencent (China) affiliation), Meng Jin, Yerong Feng, Hong Cheng, Yutong Lu, Haohuan Fu (possible past Tsinghua University affiliation), Juepeng Zheng (possible past Tsinghua University affiliation)
Abstract

Accurate weather forecasting is more than grid-wise regression: it must preserve coherent synoptic structures and physical consistency of meteorological fields, especially under autoregressive rollouts where small one-step errors can amplify into structural bias. Existing physics-priors approaches typically impose global, once-for-all constraints via architectures, regularization, or NWP coupling, offering limited state-adaptive and sample-specific controllability at deployment. To bridge this g...

📄 Decision-Level Ordinal Modeling for Multimodal Essay Scoring with Large Language Models
🗓️ Published: 3/16/2026
🔗 http://arxiv.org/abs/2603.14891v1
👥 Authors: Han Zhang (possible past Tsinghua University affiliation), Jiamin Su, Li Liu (possible past National University Of Defense Technology affiliation)
Abstract

Automated essay scoring (AES) predicts multiple rubric-defined trait scores for each essay, where each trait follows an ordered discrete rating scale. Most LLM-based AES methods cast scoring as autoregressive token generation and obtain the final score via decoding and parsing, making the decision implicit. This formulation is particularly sensitive in multimodal AES, where the usefulness of visual inputs varies across essays and traits. To address these limitations, we propose Decision-Level Or...
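Since each trait follows an ordered discrete rating scale, the natural way to make the score decision explicit is an ordinal head rather than token decoding. The sketch below is a generic cumulative-logit model — a standard ordinal-regression construction, not necessarily the paper's exact decision layer:

```python
import numpy as np

def ordinal_probs(latent, thresholds):
    """Cumulative-logit ordinal head: P(score <= k) = sigmoid(t_k - latent).
    `thresholds` must be increasing; K thresholds define K + 1 ordered
    ratings. Per-rating probabilities are differences of adjacent
    cumulative probabilities, so they are nonnegative and sum to 1."""
    t = np.asarray(thresholds, dtype=float)
    cum = 1.0 / (1.0 + np.exp(-(t - latent)))  # P(score <= k), k = 0..K-1
    cum = np.concatenate([cum, [1.0]])         # P(score <= K) = 1
    return np.diff(cum, prepend=0.0)           # per-rating probabilities
```

Here `latent` would be a scalar produced from the essay (and optionally image) representation; the explicit thresholds are what make the decision ordinal instead of an implicit parse of generated tokens.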

📄 Towards Privacy-Preserving Machine Translation at the Inference Stage: A New Task and Benchmark
🗓️ Published: 3/16/2026
🔗 http://arxiv.org/abs/2603.14756v1
👥 Authors: Wei Shao (possible past Stanford University affiliation), Lemao Liu (possible past Tencent (China) affiliation), Yinqiao Li, Guoping Huang, Shuming Shi (possible past Tencent (China) affiliation), Linqi Song
Abstract

Current online translation services require sending user text to cloud servers, posing a risk of privacy leakage when the text contains sensitive information. This risk hinders the application of online translation services in privacy-sensitive scenarios. One way to mitigate this risk for online translation services is introducing privacy protection mechanisms targeting the inference stage of translation models. However, compared to subfields of NLP like text classification and summarization, th...

📄 MVHOI: Bridge Multi-view Condition to Complex Human-Object Interaction Video Reenactment via 3D Foundation Model
🗓️ Published: 3/16/2026
🔗 http://arxiv.org/abs/2603.14686v1
👥 Authors: Jinguang Tong, Jinbo Wu, Kaisiyuan Wang, Zhelun Shen, Xuan Huang, Mochu Xiang, Xuesong Li, Yingying Li, Haocheng Feng (possible past Baidu (China) affiliation), Chen Zhao (possible past Stanford University affiliation), Hang Zhou (possible past Baidu (China) affiliation), Wei He (possible past Baidu (China) affiliation), Chuong Nguyen, Jingdong Wang (possible past Baidu (China) affiliation), Hongdong Li
Abstract

Human-Object Interaction (HOI) video reenactment with realistic motion remains a frontier in expressive digital human creation. Existing approaches primarily handle simple image-plane motion (e.g., in-plane translations), struggling with complex non-planar manipulations like out-of-plane reorientation. In this paper, we propose MVHOI, a two-stage HOI video reenactment framework that bridges multi-view reference conditions and video foundation models via a 3D Foundation Model (3DFM). The 3DFM fir...

📄 CangjieBench: Benchmarking LLMs on a Low-Resource General-Purpose Programming Language
🗓️ Published: 3/15/2026
🔗 http://arxiv.org/abs/2603.14501v1
👥 Authors: Junhang Cheng, Fang Liu (possible past Massachusetts Institute Of Technology affiliation), Jia Li (possible past Google (United States) affiliation), Chengru Wu, Nanxiang Jiang, Li Zhang (possible past University Of Oxford affiliation)
Abstract

Large Language Models excel in high-resource programming languages but struggle with low-resource ones. Existing research related to low-resource programming languages primarily focuses on Domain-Specific Languages (DSLs), leaving general-purpose languages that suffer from data scarcity underexplored. To address this gap, we introduce CangjieBench, a contamination-free benchmark for Cangjie, a representative low-resource general-purpose language. The benchmark comprises 248 high-quality samples ...

📄 MBD: A Model-Based Debiasing Framework Across User, Content, and Model Dimensions
🗓️ Published: 3/15/2026
🔗 http://arxiv.org/abs/2603.14422v1
👥 Authors: Yuantong Li, Lei Yuan, Zhihao Zheng, Weimiao Wu, Songbin Liu, Jeong Min Lee (possible past Meta (United States) affiliation), Ali Selman Aydin, Shaofeng Deng, Junbo Chen, Xinyi Zhang, Hongjing Xia, Sam Fieldman, Matthew Kosko, Wei Fu, Du Zhang, Peiyu Yang, Albert Jin Chung, Xianlei Qiu, Miao Yu, Zhongwei Teng, Hao Chen, Sunny Baek, Hui Tang (possible past Tencent (China) affiliation), Yang Lv, Renze Wang, Qifan Wang (possible past Google (United States) affiliation), Zhan Li, Tiantian Xu, Peng Wu, Ji Liu (possible past Tencent (China) affiliation)
Abstract

Modern recommendation systems rank candidates by aggregating multiple behavioral signals through a value model. However, many commonly used signals are inherently affected by heterogeneous biases. For example, watch time naturally favors long-form content, loop rate favors short-form content, and comment probability favors videos over images. Such biases introduce two critical issues: (1) value model scores may be systematically misaligned with users' relative preferences - for instance, a see...

📄 Fold-CP: A Context Parallelism Framework for Biomolecular Modeling
🗓️ Published: 3/16/2026
🔗 http://arxiv.org/abs/2603.14806v1
👥 Authors: Dejun Lin, Simon Chu, Vishanth Iyer, Youhan Lee, John St John, Kevin Boyd, Brian Roland, Xiaowei Ren, Guoqing Zhou, Zhonglin Cao, Polina Binder, Yuliya Zhautouskaya, Jakub Zakrzewski, Maximilian Stadler, Kyle Gion, Yuxing Peng, Xi Chen (possible past University Of California, Berkeley affiliation), Tianjing Zhang, Philipp Junk, Michelle Dimon (possible past Google (United States) affiliation), Paweł Gniewek, Fabian Ortega, Mckinley Polen, Ivan Grubisic, Ali Bashir (possible past Google (United States) affiliation), Graham Holt, Danny Kovtun, Matthias Grass, Luca Naef, Rui Wang (possible past Tencent (China) affiliation), Jian Peng, Anthony Costa (possible past Nvidia (United States) affiliation), Saee Paliwal, Eddie Calleja, Timur Rvachov, Neha Tadimeti, Roy Tal, Emine Kucukbenli
Abstract

Understanding cellular machinery requires atomic-scale reconstruction of large biomolecular assemblies. However, predicting the structures of these systems has been constrained by hardware memory requirements of models like AlphaFold 3, imposing a practical ceiling of a few thousand residues that can be processed on a single GPU. Here we present NVIDIA BioNeMo Fold-CP, a context parallelism framework that overcomes this barrier by distributing the inference and training pipelines of co-folding m...
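Context parallelism in general works by sharding the sequence dimension (here, residues) across devices so that per-device activation memory scales with the shard rather than the full assembly. The toy single-process simulation below illustrates the principle only; Fold-CP's actual communication pattern is not described in the excerpt, and the in-process "all-gather" stands in for real collectives:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def context_parallel_attention(q, k, v, n_devices):
    """Toy context parallelism: the query rows (residues) are split across
    `n_devices` shards; each shard attends over the full key/value set and
    the shard outputs are concatenated. Activation memory per 'device' is
    proportional to its shard, not the full sequence length."""
    q_shards = np.array_split(q, n_devices, axis=0)
    outs = []
    for q_i in q_shards:                 # each iteration plays one device
        scores = q_i @ k.T / np.sqrt(q.shape[-1])
        outs.append(softmax(scores) @ v)
    return np.concatenate(outs, axis=0)
```

A useful sanity check on any such sharding is that the result is bitwise-comparable to the unsharded computation — parallelism changes where the work runs, not what is computed.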

*Notable papers are those with at least two authors from a "big" AI/ML lab.