πŸ“„ Notable* Recent AI/ML arXiv Papers

Last updated just now...

πŸ“„ Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
πŸ—“οΈ Published: 4/14/2026
πŸ”— http://arxiv.org/abs/2604.13016v1
πŸ‘₯ Authors: Yaxuan Li, Yuxin Zuo, Bingxiang He, Jinqian Zhang, Chaojun Xiao, Cheng Qian, Tianyu Yu, Huan-Ang Gao, Wenkai Yang, Zhiyuan Liu (possible past Tsinghua University affiliation), Ning Ding (possible past Tsinghua University affiliation)
Abstract

On-policy distillation (OPD) has become a core technique in the post-training of large language models, yet its training dynamics remain poorly understood. This paper provides a systematic investigation of OPD dynamics and mechanisms. We first identify that two conditions govern whether OPD succeeds or fails: (i) the student and teacher should share compatible thinking patterns; and (ii) even with consistent thinking patterns and higher scores, the teacher must offer genuinely new capabilities b...
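
For context, below is a minimal sketch of a generic on-policy distillation step, assuming Hugging Face-style `generate()`/`logits` interfaces: the student samples its own responses and is trained to match the teacher's token distributions via a reverse KL loss. This illustrates the general technique only, not the specific recipe studied in the paper.

```python
# Generic on-policy distillation sketch (illustrative; API names are assumptions).
import torch
import torch.nn.functional as F

def opd_step(student, teacher, prompt_ids, max_new_tokens=128):
    # 1) Sample a response from the student itself (the "on-policy" part).
    with torch.no_grad():
        seq_ids = student.generate(prompt_ids, max_new_tokens=max_new_tokens, do_sample=True)
    # 2) Re-score the sampled sequence with both models (no gradients for the teacher).
    student_logits = student(seq_ids).logits[:, :-1]
    with torch.no_grad():
        teacher_logits = teacher(seq_ids).logits[:, :-1]
    # 3) Per-token reverse KL(student || teacher); prompt positions would normally be masked out.
    s_logp = F.log_softmax(student_logits, dim=-1)
    t_logp = F.log_softmax(teacher_logits, dim=-1)
    reverse_kl = (s_logp.exp() * (s_logp - t_logp)).sum(dim=-1)
    loss = reverse_kl.mean()
    loss.backward()
    return loss.item()
```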

πŸ“„ Human-Centric Topic Modeling with Goal-Prompted Contrastive Learning and Optimal Transport
πŸ—“οΈ Published: 4/14/2026
πŸ”— http://arxiv.org/abs/2604.12663v1
πŸ‘₯ Authors: Rui Wang (possible past Tencent (China) affiliation), Yi Zheng, Dongxin Wang, Haiping Huang, Yuanzhi Yao, Yuxiang Zhou, Jialin Yu, Philip Torr (possible past University Of Oxford affiliation)
Abstract

Existing topic modeling methods, from LDA to recent neural and LLM-based approaches, focus mainly on statistical coherence and often produce redundant or off-target topics that miss the user's underlying intent. We introduce Human-centric Topic Modeling (Human-TM), a novel task formulation that integrates a human-provided goal directly into the topic modeling process to produce interpretable, diverse, and goal-oriented topics. To tackle this challenge, we propose the Goal-promp...

πŸ“„ KnowRL: Boosting LLM Reasoning via Reinforcement Learning with Minimal-Sufficient Knowledge Guidance
πŸ—“οΈ Published: 4/14/2026
πŸ”— http://arxiv.org/abs/2604.12627v1
πŸ‘₯ Authors: Linhao Yu, Tianmeng Yang, Siyu Ding, Renren Jin, Naibin Gu, Xiangzhao Hao, Shuaiyi Nie, Deyi Xiong, Weichong Yin (possible past Baidu (China) affiliation), Yu Sun (possible past Baidu (China) affiliation), Hua Wu (possible past Baidu (China) affiliation)
Abstract

Reinforcement learning with verifiable rewards (RLVR) improves reasoning in large language models, but its effectiveness is often limited by severe reward sparsity on hard problems. Recent hint-based RL methods mitigate sparsity by injecting partial solutions or abstract templates, yet they typically scale guidance by adding more tokens, which introduces redundancy, inconsistency, and extra training overhead. We propose KnowRL (Knowledge-Guided Reinforcement Learning), an RL training framework that treats hint design as a minimal-suffi...

πŸ“„ Neural Dynamic GI: Random-Access Neural Compression for Temporal Lightmaps in Dynamic Lighting Environments
πŸ—“οΈ Published: 4/14/2026
πŸ”— http://arxiv.org/abs/2604.12625v1
πŸ‘₯ Authors: Jianhui Wu, Jian Zhou (possible past Tencent (China) affiliation), Zhi Zhou, Zhangjin Huang, Chao Li (possible past Baidu (China) affiliation)
Abstract

High-quality global illumination (GI) in real-time rendering is commonly achieved using precomputed lighting techniques, with lightmaps as the standard choice. To support GI for static objects in dynamic lighting environments, multiple lightmaps under different lighting conditions need to be precomputed, which incurs substantial storage and memory overhead. To overcome this limitation, we propose Neural Dynamic GI (NDGI), a novel compression technique specifically designed for temporal lightmap se...

πŸ“„ KumoRFM-2: Scaling Foundation Models for Relational Learning
πŸ—“οΈ Published: 4/14/2026
πŸ”— http://arxiv.org/abs/2604.12596v1
πŸ‘₯ Authors: Valter Hudovernik, Federico LΓ³pez, Vid Kocijan, Akihiro Nitta, Jan Eric Lenssen (possible past Meta (United States) affiliation), Jure Leskovec (possible past Stanford University affiliation), Matthias Fey
Abstract

We introduce KumoRFM-2, the next iteration of a pre-trained foundation model for relational data. KumoRFM-2 supports in-context learning as well as fine-tuning and is applicable to a wide range of predictive tasks. In contrast to tabular foundation models, KumoRFM-2 natively operates on relational data, processing one or more connected tables simultaneously without manual table flattening or target variable generation, all while preserving temporal consistency. KumoRFM-2 leverages a large corpus...

πŸ“„ NTIRE 2026 The 3rd Restore Any Image Model (RAIM) Challenge: Professional Image Quality Assessment (Track 1)
πŸ—“οΈ Published: 4/14/2026
πŸ”— http://arxiv.org/abs/2604.12512v1
πŸ‘₯ Authors: Guanyi Qin, Jie Liang, Bingbing Zhang, Lishen Qu, Ya-Nan Guan, Hui Zeng, Lei Zhang, Radu Timofte (possible past Eth Zurich affiliation), Jianhui Sun, Xinli Yue, Tao Shao, Huan Hou, Wenjie Liao, Shuhao Han, Jieyu Yuan, Chunle Guo, Chongyi Li, Zewen Chen, Yunze Liu, Jian Guo, Juan Wang, Yun Zeng, Bing Li, Weiming Hu, Hesong Li, Dehua Liu, Xinjie Zhang, Qiang Li, Li Yan, Wei Dong, Qingsen Yan, Xingcan Li, Shenglong Zhou, Manjiang Yin, Yinxiang Zhang, Hongbo Wang, Jikai Xu, Zhaohui Fan, Dandan Zhu, Wei Sun (possible past Google (United States) affiliation), Weixia Zhang, Kun Zhu, Nana Zhang, Kaiwei Zhang, Qianqian Zhang, Zhihan Zhang, William Gordon, Linwei Wu, Jiachen Tu, Guoyi Xu, Yaoxin Jiang, Cici Liu, Yaokun Shi
Abstract

In this paper, we present an overview of the NTIRE 2026 challenge on the 3rd Restore Any Image Model in the Wild, specifically focusing on Track 1: Professional Image Quality Assessment. Conventional Image Quality Assessment (IQA) typically relies on scalar scores. By compressing complex visual characteristics into a single number, these methods fundamentally struggle to distinguish subtle differences among uniformly high-quality images. Furthermore, they fail to articulate why one image is supe...

πŸ“„ IAD-Unify: A Region-Grounded Unified Model for Industrial Anomaly Segmentation, Understanding, and Generation
πŸ—“οΈ Published: 4/14/2026
πŸ”— http://arxiv.org/abs/2604.12440v1
πŸ‘₯ Authors: Haoyu Zheng, Tianwei Lin (possible past Baidu (China) affiliation), Wei Wang (possible past University Of Oxford affiliation), Zhuonan Wang, Wenqiao Zhang, Jiaqi Zhu, Feifei Shao
Abstract

Real-world industrial inspection requires not only localizing defects, but also explaining them in natural language and generating controlled defect edits. However, existing approaches fail to jointly support all three capabilities within a unified framework and evaluation protocol. We propose IAD-Unify, a dual-encoder unified framework in which a frozen DINOv2-based region expert supplies precise anomaly evidence to a shared Qwen3.5-4B vision-language backbone via lightweight token injection, j...
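
As a rough illustration of the "lightweight token injection" idea, the sketch below projects features from a frozen region expert and prepends them to the language backbone's input embeddings. Dimensions, module names, and token counts are assumptions for illustration, not the authors' implementation.

```python
# Illustrative token-injection adapter (hypothetical shapes and names).
import torch
import torch.nn as nn

class RegionTokenInjector(nn.Module):
    def __init__(self, region_dim=1024, llm_dim=2560, num_region_tokens=16):
        super().__init__()
        self.proj = nn.Linear(region_dim, llm_dim)   # small trainable adapter
        self.num_region_tokens = num_region_tokens

    def forward(self, region_feats, text_embeds):
        # region_feats: [B, num_region_tokens, region_dim] from the frozen region expert
        # text_embeds:  [B, T, llm_dim] input embeddings of the VLM backbone
        region_tokens = self.proj(region_feats)
        return torch.cat([region_tokens, text_embeds], dim=1)  # injected token sequence
```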

πŸ“„ Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
πŸ—“οΈ Published: 4/14/2026
πŸ”— http://arxiv.org/abs/2604.12374v1
πŸ‘₯ Authors: Nvidia, :, Aakshita Chandiramani, Aaron Blakeman, Abdullahi Olaoye, Abhibha Gupta, Abhilash Somasamudramath, Abhinav Khattar, Adeola Adesoba, Adi Renduchintala, Adil Asif, Aditya Agrawal (possible past Nvidia (United States) affiliation), Aditya Vavre, Ahmad Kiswani, Aishwarya Padmakumar, Ajay Hotchandani, Akanksha Shukla, Akhiad Bercovich, Aleksander Ficek, Aleksandr Shaposhnikov, Alex Gronskiy, Alex Kondratenko, Alex Neefus, Alex Steiner, Alex Yang, Alexander Bukharin, Alexander Young, Ali Hatamizadeh (possible past Nvidia (United States) affiliation), Ali Taghibakhshi, Alina Galiautdinova, Alisa Liu, Alok Kumar (possible past Google (United States) affiliation), Ameya Sunil Mahabaleshwarkar, Amir Klein, Amit Zuker, Amnon Geifman, Anahita Bhiwandiwalla, Ananth Subramaniam, Andrew Tao (possible past Nvidia (United States) affiliation), Anjaney Shrivastava, Anjulie Agrusa, Ankur Srivastava, Ankur Verma, Ann Guan, Anna Shors, Annamalai Chockalingam, Anubhav Mandarwal, Aparnaa Ramani, Arham Mehta, Arti Jain, Arun Venkatesan, Asha Anoosheh, Ashwath Aithal, Ashwin Poojary, Asif Ahamed, Asit Mishra, Asli Sabanci Demiroz, Asma Kuriparambil Thekkumpate, Atefeh Sohrabizadeh, Avinash Kaur, Ayush Dattagupta, Barath Subramaniam Anandan, Bardiya Sadeghi, Barnaby Simkin, Ben Lanir, Benedikt Schifferer, Benjamin Chislett, Besmira Nushi, Bilal Kartal, Bill Thiede, Bita Darvish Rouhani, Bobby Chen, Boris Ginsburg (possible past Nvidia (United States) affiliation), Brandon Norick, Branislav Kisacanin, Brian Yu, Bryan Catanzaro (possible past University Of California, Berkeley affiliation), Buvaneswari Mani, Carlo Del Mundo, Chankyu Lee, Chanran Kim, Chantal Hwang, Chao Ni, Charles Wang, Charlie Truong, Cheng-Ping Hsieh, Chenhan Yu, Chenjie Luo, Cherie Wang, Chetan Mungekar, Chintan Patel, Chris Alexiuk, Chris Holguin, Chris Wing, Christian Munley, Christopher Parisien (possible past Nvidia (United States) affiliation), Chuck Desai, Chunyang Sheng, Collin Neale, Cyril Meurillon, Dakshi Kumar, Dan Gil, Dan Su (possible past Tencent (China) affiliation), Dane Corneil, Daniel Afrimi, Daniel Burkhardt Eliuth Triana, Daniel Egert, Daniel Fatade, Daniel Lo, Daniel Rohrer, Daniel Serebrenik, Daniil Sorokin, Daria Gitman, Daria Levy, Darko Stosic, David Edelsohn, David Messina, David Mosallanezhad, David Tamok, Deena Donia, Deepak Narayanan, Devin O'kelly, Dheeraj Peri, Dhruv Nathawani, Di Wu, Dima Rekesh, Dina Yared, Divyanshu Kakwani, Dmitry Konyagin Brandon Tuttle, Dong Ahn, Dongfu Jiang, Dorrin Poorkay, Douglas O'flaherty, Duncan Riach, Dusan Stosic, Dustin Van Stee, Edgar Minasyan, Edward Lin, Eileen Peters Long, Elad Segal, Elena Lantz, Elena Lewis, Ellie Evans, Elliott Ning, Eric Chung, Eric Harper, Eric Pham-Hung, Eric W. 
Tramel, Erick Galinkin, Erik Pounds, Esti Etrog, Evan Briones, Evan Wu, Evelina Bakhturina (possible past Nvidia (United States) affiliation), Evgeny Tsykunov, Ewa Dobrowolska, Farshad Saberi Movahed, Farzan Memarian, Fay Wang, Fei Jia, Felipe Soares, Felipe Vieira Frujeri, Feng Chen, Fengguang Lin, Ferenc Galko, Fortuna Zhang, Frankie Siino, Frida Hou, Gantavya Bhatt, Gargi Prasad, Geethapriya Venkataramani, Geetika Gupta, George Armstrong, Gerald Shen, Giulio Borghesi, Gordana Neskovic, Gorkem Batmaz, Grace Lam, Grace Wu, Greg Pauloski, Greyson Davis, Grigor Nalbandyan, Guoming Zhang, Guy Farber, Guyue Huang, Haifeng Qian, Haran Kumar Shiv Kumar, Harry Kim, Harsh Sharma, Hayate Iso, Hayley Ross, Herbert Hum, Herman Sahota, Hexin Wang, Himanshu Soni, Hiren Upadhyay, Huy Nguyen, Iain Cunningham, Ido Galil, Ido Shahaf, Igino Padovani, Igor Gitman, Igor Shovkun, Ikroop Dhillon, Ilya Loshchilov, Ingrid Kelly, Itamar Schen, Itay Levy, Ivan Moshkov, Izik Golan, Izzy Putterman, Jain Tu, Jan Baczek, Jan Kautz (possible past Nvidia (United States) affiliation), Jane Polak Scowcroft, Janica Rosenberg, Jared Casper, Jarrod Pflum, Jason Grant, Jason Sewall, Jatin Mitra, Jeffrey Glick, Jenny Chen, Jesse Oliver, Jiacheng Xu, Jiafan Zhu, Jialin Song, Jian Zhang (possible past Tencent (China) affiliation), Jiaqi Zeng, Jie Lou, Jill Milton, Jim Chow, Jimmy Zhang, Jinhang Choi, Jining Huang, Jocelyn Huang (possible past Nvidia (United States) affiliation), Joel Caruso, Joey Conway, Joey Guman, Johan Jatko, John Kamalu, Johnny Greco, Jonathan Cohen (possible past Nvidia (United States) affiliation), Jonathan Raiman (possible past Openai (United States) affiliation), Joseph Jennings, Joyjit Daw, Juan Yu, Julio Tapia, Junkeun Yi, Jupinder Parmar, Jyothi Achar, Kari Briski, Kartik Mattoo, Katherine Cheung, Katherine Luna, Keith Wyss, Kevin Shih, Kezhi Kong, Khanh Nguyen, Khushi Bhardwaj, Kirill Buryak, Kirthi Shankar Sivamani, Konstantinos Krommydas, Kris Murphy, Krishna C. 
Puvvada, Krzysztof Pawelec, Kumar Anik, Laikh Tewari, Laya Sleiman, Leo Du, Leon Derczynski, Li Ding, Lilach Ilan, Lingjie Wu, Lizzie Wei, Luis Vega, Lun Su, Maarten Van Segbroeck, Maer Rodrigues De Melo, Magaret Zhang, Mahan Fathi, Makesh Narsimhan Sreedhar, Makesh Sreedhar, Makesh Tarun Chandran, Manuel Reyes Gomez, Maor Ashkenazi, Marc Cuevas, Marc Romeijn, Margaret Zhang, Mark Cai, Mark Gabel, Markus Kliegl, Martyna Patelka, Maryam Moosaei, Matthew Varacalli, Matvei Novikov, Mauricio Ferrato, Mehrzad Samadi (possible past Nvidia (United States) affiliation), Melissa Corpuz, Meng Xin, Mengdi Wang, Mengru Wang, Meredith Price, Micah Schaffer, Michael Andersch, Michael Boone, Michael Evans, Michael Z Wang, Miguel Martinez, Mikail Khona, Mike Chrzanowski (possible past Google (United States) affiliation), Mike Hollinger, Mingyuan Ma, Minseok Lee, Mohammad Dabbah, Mohammad Shoeybi (possible past Nvidia (United States) affiliation), Mostofa Patwary (possible past Nvidia (United States) affiliation), Nabin Mulepati, Nader Khalil, Najeeb Nabwani, Nancy Agarwal, Nanthini Balasubramaniam, Narimane Hennouni, Narsi Kodukula, Natalie Hereth, Nathaniel Pinckney (possible past Nvidia (United States) affiliation), Nave Assaf, Negar Habibi, Nestor Qin, Neta Zmora, Netanel Haber, Nick Reamaroon, Nickson Quak, Nidhi Bhatia, Nikhil Jukar, Nikki Pope, Nikolai Ludwig, Nima Tajbakhsh, Nir Ailon (possible past Google (United States) affiliation), Nirmal Juluru, Nirmalya De, Nowel Pitt, Oleg Rybakov (possible past Google (United States) affiliation), Oleksii Hrinchuk, Oleksii Kuchaiev (possible past Nvidia (United States) affiliation), Olivier Delalleau, Oluwatobi Olabiyi, Omer Ullman Argov, Omri Almog, Omri Puny, Oren Tropp, Otavio Padovani, Ouye Xie, Parth Chadha, Pasha Shamis, Paul Gibbons, Pavlo Molchanov (possible past Nvidia (United States) affiliation), Peter Belcak, Peter Jin, Pinky Xu, Piotr Januszewski, Pooya Jannaty, Prachi Shevate, Pradeep Thalasta, Pranav Prashant Thombre, Prasoon Varshney, Prerana Gambhir, Pritam Gundecha, Przemek Tredak, Qing Miao, Qiyu Wan, Quan Tran Minh, Rabeeh Karimi Mahabadi (possible past Google (United States) affiliation), Rachel Oberman, Rachit Garg, Rahul Kandu, Raina Zhong, Ran El-Yaniv, Ran Zilberstein, Rasoul Shafipour, Renee Yao, Renjie Pi, Richard Mazzarese, Richard Wang (possible past Carnegie Mellon University affiliation), Rick Izzo, Ridhima Singla, Rima Shahbazyan, Rishabh Garg, Ritika Borkar, Ritu Gala, Riyad Islam, Robert Clark, Robert Hesse, Roger Waleffe, Rohit Varma Kalidindi, Rohit Watve, Roi Koren, Ron Fan, Ruchika Kharwar, Ruisi Cai, Ruoxi Zhang, Russell J. 
Hewett, Ryan Prenger (possible past Nvidia (United States) affiliation), Ryan Timbrook, Ryota Egashira, Sadegh Mahdavi, Sagar Singh Ashutosh Joshi, Sahil Modi, Samuel Kriman, Sandeep Pombra, Sanjay Kariyappa, Sanjeev Satheesh (possible past Baidu (China) affiliation), Santiago Pombo, Saori Kaji, Satish Pasumarthi, Saurav Mishra, Saurav Muralidharan, Scott Hara, Sean Narenthiran, Sebastian Rogawski, Seonjin Na, Seonmyeong Bak, Sepehr Sameni, Seth Poulos, Shahar Mor, Shantanu Acharya, Shaona Ghosh Adam Lord, Sharath Turuvekere Sreenivas, Shaun Kotek, Shaya Gharghabi, Shelby Thomas, Sheng-Chieh Lin, Shibani Likhite, Shiqing Fan, Shiyang Chen, Shreya Gopal, Shrimai Prabhumoye (possible past Carnegie Mellon University affiliation), Shubham Pachori, Shubham Toshniwal, Shuo Zhang (possible past National University Of Defense Technology affiliation), Shuoyang Ding, Shyam Renjith, Shyamala Prayaga, Siddhartha Jain, Simeng Sun, Sirisha Rella, Sirshak Das, Smita Ithape, Sneha Harishchandra S, Somshubra Majumdar (possible past Nvidia (United States) affiliation), Soumye Singhal, Sri Harsha Singudasu, Sriharsha Niverty, Stas Sergienko, Stefana Gloginic, Stefania Alborghetti, Stephen Ge, Stephen Mccullough, Sugam Dipak Devare, Suguna Varshini Velury, Sukrit Rao, Sumeet Kumar Barua, Sunny Gai, Suseella Panguluri, Sushil Koundinyan, Swathi Patnam, Sweta Priyadarshi, Swetha Bhendigeri, Syeda Nahida Akter, Sylendran Arunagiri, Tailling Yuan, Talor Abramovich, Tan Bui, Tan Yu (possible past Baidu (China) affiliation), Terry Kong, Thanh Do, Thomas Gburek, Thorgane Marques, Tiffany Moore, Tijmen Blankevoort, Tim Moon, Timothy Ma, Tiyasa Mitra, Tomasz Grzegorzek, Tomer Asida, Tomer Bar Natan, Tomer Keren, Tomer Ronen, Traian Rebedea, Trenton Starkey, Tugrul Konuk, Twinkle Vashishth, Tyler Condensa, Udi Karpas, Ushnish De, Vahid Noorozi, Vahid Noroozi, Vanshil Atul Shah, Veena Vaidyanathan, Venkat Srinivasan, Venmugil Elango, Victor Cui, Vijay Korthikanti, Vikas Mehta, Virginia Adams, Virginia Wu, Vitaly Kurin, Vitaly Lavrukhin (possible past Nvidia (United States) affiliation), Vladimir Anisimov, Wan Seo, Wanli Jiang, Wasi Uddin Ahmad, Wei Du, Wei Ping (possible past Baidu (China) affiliation), Wei-Ming Chen, Wendy Quan, Wenliang Dai, Wenwen Gao, Will Jennings, William Zhang, Xiaowei Ren, Xiaowen Xin, Xin Li (possible past Google (United States) affiliation), Yang Yu, Yangyi Chen, Yaniv Galron, Yashaswi Karnati, Yejin Choi (possible past Allen Institute For Artificial Intelligence affiliation), Yev Meyer, Yi-Fu Wu, Yian Zhang, Ying Lin, Yonatan Geifman, Yonggan Fu, Yoshi Suhara, Youngeun Kwon, Yuan Zhang (possible past Google (United States) affiliation), Yuki Huang, Zach Moshe, Zhilin Wang, Zhiyu Cheng, Zhongbo Zhu, Zhuolin Yang, Zihan Liu, Zijia Chen, Zijie Yan, Zuhair Ahmed
Abstract

We describe the pre-training, post-training, and quantization of Nemotron 3 Super, a 120-billion-parameter (12 billion active) hybrid Mamba-Attention Mixture-of-Experts model. Nemotron 3 Super is the first model in the Nemotron 3 family to 1) be pre-trained in NVFP4, 2) leverage LatentMoE, a new Mixture-of-Experts architecture that optimizes for both accuracy per FLOP and accuracy per parameter, and 3) include multi-token prediction (MTP) layers for inference acceleration through native speculative decoding. We pre-trai...

πŸ“„ GCA Framework: A Gulf-Grounded Dataset and Agentic Pipeline for Climate Decision Support
πŸ—“οΈ Published: 4/14/2026
πŸ”— http://arxiv.org/abs/2604.12306v1
πŸ‘₯ Authors: Muhammad Umer Sheikh, Khawar Shehzad, Salman Khan (possible past Inception Institute Of Artificial Intelligence affiliation), Fahad Shahbaz Khan (possible past Inception Institute Of Artificial Intelligence affiliation), Muhammad Haris Khan
Abstract

Climate decision-making in the Gulf increasingly demands systems that can translate heterogeneous scientific and policy evidence into actionable guidance, yet general-purpose large language models (LLMs) remain weak both in region-specific climate knowledge and grounded interaction with geospatial and forecasting tools. We present the GCA framework, which unifies (i) GCA-DS, a curated Gulf-focused multimodal dataset, and (ii) Gulf Climate Agent (GCA), a tool-augmented agent for climate analysis....

πŸ“„ Beyond Scores: Diagnostic LLM Evaluation via Fine-Grained Abilities
πŸ—“οΈ Published: 4/14/2026
πŸ”— http://arxiv.org/abs/2604.12191v1
πŸ‘₯ Authors: Xu Zhang (possible past Tencent (China) affiliation), Xudong Gong (possible past Tencent (China) affiliation), Jiacheng Qin, Qiang Wang, Jiaqi Liao, Zhe Wang (possible past Deepmind (United Kingdom) affiliation), Dawei Feng, Bo Ding
Abstract

Current evaluations of large language models aggregate performance across diverse tasks into single scores. This obscures fine-grained ability variation, limiting targeted model improvement and ability-guided selection for specific tasks. Motivated by this gap, we propose a cognitive diagnostic framework that estimates model abilities across multiple fine-grained dimensions. For mathematics, we construct a 35-dimensional ability taxonomy grounded in cognitive theory and domain knowledge. The fra...
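
As a toy illustration of the cognitive-diagnosis idea, the sketch below fits a per-dimension ability score with a Rasch-style (1-parameter logistic) model over benchmark items tagged with fine-grained abilities. The paper's diagnostic framework is presumably richer; this is only a minimal example of the general approach.

```python
# Toy Rasch-style ability estimation per fine-grained dimension (illustrative only).
import numpy as np

def fit_abilities(responses, item_dims, item_diffs, n_dims, lr=0.1, steps=200):
    # responses:  0/1 correctness per item for one model (float array)
    # item_dims:  integer ability-dimension index of each item
    # item_diffs: difficulty parameter of each item
    theta = np.zeros(n_dims)
    for _ in range(steps):
        logits = theta[item_dims] - item_diffs
        probs = 1.0 / (1.0 + np.exp(-logits))
        grad = np.zeros(n_dims)
        np.add.at(grad, item_dims, responses - probs)  # log-likelihood gradient per dimension
        theta += lr * grad
    return theta  # one ability score per fine-grained dimension
```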

πŸ“„ Long-Horizon Plan Execution in Large Tool Spaces through Entropy-Guided Branching
πŸ—“οΈ Published: 4/13/2026
πŸ”— http://arxiv.org/abs/2604.12126v1
πŸ‘₯ Authors: Rongzhe Wei, Ge Shi, Min Cheng, Na Zhang, Pan Li (possible past Baidu (China) affiliation), Sarthak Ghosh, Vaibhav Gorde, Leman Akoglu (possible past Carnegie Mellon University affiliation)
Abstract

Large Language Models (LLMs) have significantly advanced tool-augmented agents, enabling autonomous reasoning via API interactions. However, executing multi-step tasks within massive tool libraries remains challenging due to two critical bottlenecks: (1) the absence of rigorous, plan-level evaluation frameworks and (2) the computational demand of exploring vast decision spaces stemming from large toolsets and long-horizon planning. To bridge these gaps, we first introduce SLATE (Synthetic Large-...
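
The sketch below illustrates one plausible reading of entropy-guided branching: expand multiple candidate tool calls only when the policy's next-step distribution is high-entropy, and commit greedily otherwise. The `propose_actions` interface and the threshold are hypothetical.

```python
# Entropy-guided branching sketch (interfaces and thresholds are assumptions).
import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def expand(plan_prefix, propose_actions, entropy_threshold=1.0, top_k=3):
    actions, probs = propose_actions(plan_prefix)   # candidate next tool calls + probabilities
    if entropy(probs) < entropy_threshold:
        # Confident step: commit to the single most likely action.
        best = max(zip(actions, probs), key=lambda ap: ap[1])[0]
        return [plan_prefix + [best]]
    # Uncertain step: branch into the top-k candidates for later evaluation.
    ranked = sorted(zip(actions, probs), key=lambda ap: ap[1], reverse=True)[:top_k]
    return [plan_prefix + [a] for a, _ in ranked]
```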

πŸ“„ The Second Challenge on Cross-Domain Few-Shot Object Detection at NTIRE 2026: Methods and Results
πŸ—“οΈ Published: 4/13/2026
πŸ”— http://arxiv.org/abs/2604.11998v1
πŸ‘₯ Authors: Xingyu Qiu, Yuqian Fu, Jiawei Geng, Bin Ren, Jiancheng Pan, Zongwei Wu, Hao Tang, Yanwei Fu, Radu Timofte (possible past Eth Zurich affiliation), Nicu Sebe, Mohamed Elhoseiny (possible past Meta (United States) affiliation), Lingyi Hong, Mingxi Cheng, Xingqi He, Runze Li, Xingdong Sheng, Wenqiang Zhang, Jiacong Liu, Shu Luo, Yikai Qin, Yaze Zhao, Yongwei Jiang, Yixiong Zou, Zhe Zhang, Yang Yang (possible past Tencent (China) affiliation), Kaiyu Li, Bowen Fu, Zixuan Jiang, Ke Li (possible past University Of California, Berkeley affiliation), Hui Qiao, Xiangyong Cao, Xuanlong Yu, Youyang Sha, Longfei Liu, Di Yang, Xi Shen (possible past Tencent (China) affiliation), Kyeongryeol Go, Taewoong Jang, Saiprasad Meesiyawar, Ravi Kirasur, Rakshita Kulkarni, Bhoomi Deshpande, Harsh Patil, Uma Mudenagudi, Shuming Hu, Chao Chen (possible past Tencent (China) affiliation), Tao Wang (possible past Stanford University affiliation), Wei Zhou, Qi Xu, Zhenzhao Xing, Dandan Zhao, Hanzhe Xia, Dongdong Lu, Zhe Zhang, Jingru Wang, Guangwei Huang, Jiachen Tu, Yaokun Shi, Guoyi Xu, Yaoxin Jiang, Jiajia Liu, Liwei Zhou, Bei Dou, Tao Wu, Zekang Fan, Junjie Liu, AdhΓ©mar De Senneville, Flavien Armangeon, Mengbers, Yazhe Lyu, Zhimeng Xin, Zijian Zhuang, Hongchun Zhu, Li Wang (possible past Tesla (United States) affiliation)
Abstract

Cross-domain few-shot object detection (CD-FSOD) remains a challenging problem for existing object detectors and few-shot learning approaches, particularly when generalizing across distinct domains. As part of NTIRE 2026, we hosted the second CD-FSOD Challenge to systematically evaluate and promote progress in detecting objects in unseen target domains under limited annotation conditions. The challenge received strong community interest, with 128 registered participants and a total of 696 submis...

πŸ“„ How Transformers Learn to Plan via Multi-Token Prediction
πŸ—“οΈ Published: 4/13/2026
πŸ”— http://arxiv.org/abs/2604.11912v1
πŸ‘₯ Authors: Jianhao Huang, Zhanpeng Zhou, Renqiu Xia, Baharan Mirzasoleiman (possible past Eth Zurich affiliation), Weijie Su, Wei Huang (possible past Google (United States) affiliation)
Abstract

While next-token prediction (NTP) has been the standard objective for training language models, it often struggles to capture global structure in reasoning tasks. Multi-token prediction (MTP) has recently emerged as a promising alternative, yet its underlying mechanisms remain poorly understood. In this paper, we study how MTP facilitates reasoning, with a focus on planning. Empirically, we show that MTP consistently outperforms NTP on both synthetic graph path-finding tasks and more realistic r...
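
For readers unfamiliar with MTP, a minimal generic formulation is sketched below: auxiliary heads are trained to predict tokens several steps ahead from the same hidden state. This is illustrative and not necessarily the exact objective analyzed in the paper.

```python
# Generic multi-token prediction (MTP) loss sketch (illustrative formulation).
import torch.nn as nn
import torch.nn.functional as F

class MTPHeads(nn.Module):
    def __init__(self, hidden_dim, vocab_size, horizon=4):
        super().__init__()
        self.heads = nn.ModuleList([nn.Linear(hidden_dim, vocab_size) for _ in range(horizon)])

    def loss(self, hidden, targets):
        # hidden:  [B, T, H] transformer hidden states
        # targets: [B, T] token ids; head i predicts the token (i + 1) positions ahead
        total = 0.0
        for i, head in enumerate(self.heads):
            shift = i + 1
            logits = head(hidden[:, :-shift])                      # [B, T - shift, V]
            total = total + F.cross_entropy(
                logits.reshape(-1, logits.size(-1)),
                targets[:, shift:].reshape(-1))
        return total / len(self.heads)
```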

πŸ“„ Solving Physics Olympiad via Reinforcement Learning on Physics Simulators
πŸ—“οΈ Published: 4/13/2026
πŸ”— http://arxiv.org/abs/2604.11805v1
πŸ‘₯ Authors: Mihir Prabhudesai, Aryan Satpathy, Yangmin Li, Zheyang Qin, Nikash Bhardwaj, Amir Zadeh, Chuan Li, Katerina Fragkiadaki (possible past University Of California, Berkeley affiliation), Deepak Pathak (possible past University Of California, Berkeley affiliation)
Abstract

We have witnessed remarkable advances in LLM reasoning capabilities with the advent of DeepSeek-R1. However, much of this progress has been fueled by the abundance of internet question-answer (QA) pairs, a major bottleneck going forward, since such data is limited in scale and concentrated mainly in domains like mathematics. In contrast, other sciences such as physics lack large-scale QA datasets to effectively train reasoning-capable models. In this work, we show that physics simulators can ser...

πŸ“„ ClawGuard: A Runtime Security Framework for Tool-Augmented LLM Agents Against Indirect Prompt Injection
πŸ—“οΈ Published: 4/13/2026
πŸ”— http://arxiv.org/abs/2604.11790v1
πŸ‘₯ Authors: Wei Zhao (possible past Tencent (China) affiliation), Zhe Li (possible past Google (United States) affiliation), Peixin Zhang, Jun Sun
Abstract

Tool-augmented Large Language Model (LLM) agents have demonstrated impressive capabilities in automating complex, multi-step real-world tasks, yet remain vulnerable to indirect prompt injection. Adversaries exploit this weakness by embedding malicious instructions within tool-returned content, which agents directly incorporate into their conversation history as trusted observations. This vulnerability manifests across three primary attack channels: web and local content injection, MCP server inj...

πŸ“„ General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks
πŸ—“οΈ Published: 4/13/2026
πŸ”— http://arxiv.org/abs/2604.11778v1
πŸ‘₯ Authors: Junlin Liu, Shengnan An, Shuang Zhou, Dan Ma, Shixiong Luo, Ying Xie, Yuan Zhang (possible past Google (United States) affiliation), Wenling Yuan, Yifan Zhou, Xiaoyu Li (possible past Tencent (China) affiliation), Ziwen Wang, Xuezhi Cao, Xunliang Cai
Abstract

Contemporary large language models (LLMs) have demonstrated remarkable reasoning capabilities, particularly in specialized domains like mathematics and physics. However, their ability to generalize these reasoning skills to broader contexts, often termed general reasoning, remains under-explored. Unlike domain-specific reasoning, general reasoning relies less on expert knowledge but still presents formidable reasoning challenges, such as complex constraints, nested logical branc...

πŸ“„ StarVLA-Ξ±: Reducing Complexity in Vision-Language-Action Systems
πŸ—“οΈ Published: 4/13/2026
πŸ”— http://arxiv.org/abs/2604.11757v1
πŸ‘₯ Authors: Jinhui Ye, Ning Gao, Senqiao Yang, Jinliang Zheng, Zixuan Wang, Yuxin Chen, Pengguang Chen, Yilun Chen, Shu Liu (possible past Tencent (China) affiliation), Jiaya Jia (possible past Tencent (China) affiliation)
Abstract

Vision-Language-Action (VLA) models have recently emerged as a promising paradigm for building general-purpose robotic agents. However, the VLA landscape remains highly fragmented and complex, as existing approaches vary substantially in architectures, training data, embodiment configurations, and benchmark-specific engineering. In this work, we introduce StarVLA-Ξ±, a simple yet strong baseline designed to study VLA design choices under controlled conditions. StarVLA-Ξ± deliberately minimizes...

πŸ“„ Evolution of Optimization Methods: Algorithms, Scenarios, and Evaluations
πŸ—“οΈ Published: 4/14/2026
πŸ”— http://arxiv.org/abs/2604.12968v1
πŸ‘₯ Authors: Tong Zhang (possible past Tencent (China) affiliation), Jiangning Zhang (possible past Tencent (China) affiliation), Zhucun Xue, Juntao Jiang, Yicheng Xu, Chengming Xu, Teng Hu, Xingyu Xie, Xiaobin Hu (possible past Tencent (China) affiliation), Yabiao Wang (possible past Tencent (China) affiliation), Yong Liu, Shuicheng Yan (possible past National University Of Singapore affiliation)
Abstract

Balancing convergence speed, generalization capability, and computational efficiency remains a core challenge in deep learning optimization. First-order gradient descent methods, epitomized by stochastic gradient descent (SGD) and Adam, serve as the cornerstone of modern training pipelines. However, large-scale model training, stringent differential privacy requirements, and distributed learning paradigms expose critical limitations in these conventional approaches regarding privacy protection a...

πŸ“„ VideoFlexTok: Flexible-Length Coarse-to-Fine Video Tokenization
πŸ—“οΈ Published: 4/14/2026
πŸ”— http://arxiv.org/abs/2604.12887v1
πŸ‘₯ Authors: Andrei Atanov, Jesse Allardice, Roman Bachmann, Oğuzhan Fatih Kar, R Devon Hjelm (possible past Microsoft (United States) affiliation), David Griffiths, Peter Fu, Afshin Dehghan, Amir Zamir (possible past University Of California, Berkeley affiliation)
Abstract

Visual tokenizers map high-dimensional raw pixels into a compressed representation for downstream modeling. Beyond compression, tokenizers dictate what information is preserved and how it is organized. A de facto standard approach to video tokenization is to represent a video as a spatiotemporal 3D grid of tokens, each capturing the corresponding local information in the original signal. This requires the downstream model that consumes the tokens, e.g., a text-to-video model, to learn to predict...

πŸ“„ Labeled TrustSet Guided: Batch Active Learning with Reinforcement Learning
πŸ—“οΈ Published: 4/14/2026
πŸ”— http://arxiv.org/abs/2604.12303v1
πŸ‘₯ Authors: Guofeng Cui, Yang Liu (possible past Tsinghua University affiliation), Pichao Wang, Hankai Hsu, Xiaohang Sun, Xiang Hao, Zhu Liu (possible past Tsinghua University affiliation)
Abstract

Batch active learning (BAL) is a crucial technique for reducing labeling costs and improving data efficiency in training large-scale deep learning models. Traditional BAL methods often rely on metrics like Mahalanobis Distance to balance uncertainty and diversity when selecting data for annotation. However, these methods predominantly focus on the distribution of unlabeled data and fail to leverage feedback from labeled data or the model's performance. To address these limitations, we introduce ...

πŸ“„ SOLARIS: Speculative Offloading of Latent-bAsed Representation for Inference Scaling
πŸ—“οΈ Published: 4/13/2026
πŸ”— http://arxiv.org/abs/2604.12110v1
πŸ‘₯ Authors: Zikun Liu, Liang Luo, Qianru Li, Zhengyu Zhang, Wei Ling, Jingyi Shen, Zeliang Chen, Yaning Huang, Jingxian Huang, Abdallah Aboelela, Chonglin Sun, Feifan Gu, Fenggang Wu, Hang Qu, Huayu Li, Jill Pan, Kaidi Pei, Laming Chen, Longhao Jin, Qin Huang, Tongyi Tang, Varna Puvvada, Wenlin Chen (possible past Meta (United States) affiliation), Xiaohan Wei, Xu Cao, Yantao Yao, Yuan Jin, Yunchen Pu (possible past Meta (United States) affiliation), Yuxin Chen, Zijian Shen, Zhengkai Zhang, Dong Liang, Ellie Wen
Abstract

Recent advances in recommendation scaling laws have led to foundation models of unprecedented complexity. While these models offer superior performance, their computational demands make real-time serving impractical, often forcing practitioners to rely on knowledge distillation, compromising serving quality for efficiency. To address this challenge, we present SOLARIS (Speculative Offloading of Latent-bAsed Representation for Inference Scaling), a novel framework inspired by speculative decoding....

πŸ“„ LoSA: Locality Aware Sparse Attention for Block-Wise Diffusion Language Models
πŸ—“οΈ Published: 4/13/2026
πŸ”— http://arxiv.org/abs/2604.12056v1
πŸ‘₯ Authors: Haocheng Xi, Harman Singh, Yuezhou Hu, Coleman Hooper, Rishabh Tiwari, Aditya Tomar, Minjae Lee, Wonjun Kang, Michael Mahoney, Chenfeng Xu (possible past University Of California, Berkeley affiliation), Kurt Keutzer (possible past University Of California, Berkeley affiliation), Amir Gholami
Abstract

Block-wise diffusion language models (DLMs) generate multiple tokens in any order, offering a promising alternative to the autoregressive decoding pipeline. However, they remain bottlenecked by memory-bound attention in long-context scenarios. Naive sparse attention fails on DLMs due to a KV Inflation problem, where different queries select different prefix positions, making the union of accessed KV pages large. To address this, we observe that between consecutive denoising steps, only a s...

πŸ“„ Agentic LLM Reasoning in a Self-Driving Laboratory for Air-Sensitive Lithium Halide Spinel Conductors
πŸ—“οΈ Published: 4/13/2026
πŸ”— http://arxiv.org/abs/2604.11957v1
πŸ‘₯ Authors: Yuxing Fei (possible past University Of California, Berkeley affiliation), Bernardus Rendy (possible past University Of California, Berkeley affiliation), Xiaochen Yang, Junhee Woo, Xu Huang, Chang Li, Shilong Wang, David Milsted, Yan Zeng, Gerbrand Ceder
Abstract

Self-driving laboratories promise to accelerate materials discovery. Yet current automated solid-state synthesis platforms are limited to ambient conditions, thereby precluding their use for air-sensitive materials. Here, we present A-Lab for Glovebox Powder Solid-state Synthesis (A-Lab GPSS), a robotic platform capable of synthesizing and characterizing air-sensitive inorganic materials under strict air-free conditions. By integrating an agentic AI framework into the A-Lab GPSS platform, we str...

πŸ“„ CAGenMol: Condition-Aware Diffusion Language Model for Goal-Directed Molecular Generation
πŸ—“οΈ Published: 4/13/2026
πŸ”— http://arxiv.org/abs/2604.11483v1
πŸ‘₯ Authors: Yanting Li, Zhuoyang Jiang, Enyan Dai, Lei Wang (possible past Baidu (China) affiliation), Wen-Cai Ye, Li Liu (possible past National University Of Defense Technology affiliation)
Abstract

Goal-directed molecular generation requires satisfying heterogeneous constraints such as protein-ligand compatibility and multi-objective drug-like properties, yet existing methods often optimize these constraints in isolation, fail to reconcile conflicting objectives (e.g., affinity vs. safety), and struggle to navigate the non-differentiable chemical space without compromising structural validity. To address these challenges, we propose CAGenMol, a condition-aware discrete diffusion framew...

πŸ“„ Think Before you Write: QA-Guided Reasoning for Character Descriptions in Books
πŸ—“οΈ Published: 4/13/2026
πŸ”— http://arxiv.org/abs/2604.11435v1
πŸ‘₯ Authors: Argyrios Papoudakis, Mirella Lapata (possible past University Of Edinburgh affiliation), Frank Keller (possible past University Of Edinburgh affiliation)
Abstract

Character description generation is an important capability for narrative-focused applications such as summarization, story analysis, and character-driven simulations. However, generating accurate character descriptions from long-form narratives (e.g., novels) is challenging: models must track evolving attributes (e.g., relationships and events), integrate evidence scattered across the text, and infer implicit details. Despite the success of reasoning-enabled LLMs on many benchmarks, we find tha...

πŸ“„ The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping
πŸ—“οΈ Published: 4/13/2026
πŸ”— http://arxiv.org/abs/2604.11297v1
πŸ‘₯ Authors: Yang Liu (possible past Tsinghua University affiliation), Enxi Wang, Yufei Gao, Weixin Zhang, Bo Wang (possible past Tencent (China) affiliation), Zhiyuan Zeng, Yikai Zhang, Yining Zheng, Xipeng Qiu
Abstract

Despite the success of reinforcement learning for large language models, a common failure mode is reduced sampling diversity, where the policy repeatedly generates similar erroneous behaviors. Classical entropy regularization encourages randomness under the current policy, but does not explicitly discourage recurrent failure patterns across rollouts. We propose MEDS, a Memory-Enhanced Dynamic reward Shaping framework that incorporates historical behavioral signals into reward design. By storing ...
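
A toy sketch of the memory-based shaping idea appears below: store embeddings of failed rollouts and subtract a penalty proportional to a new rollout's similarity to the closest stored failure. The embedding and penalty form are assumptions, not the MEDS formulation itself.

```python
# Illustrative failure-memory reward shaping (embedding and penalty form are assumptions).
import numpy as np

def _cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

class FailureMemory:
    def __init__(self, penalty_weight=0.5):
        self.failures = []            # embeddings of past failed rollouts
        self.penalty_weight = penalty_weight

    def shaped_reward(self, base_reward, rollout_emb):
        # Penalize rollouts that closely resemble previously stored failures.
        penalty = 0.0
        if self.failures:
            penalty = self.penalty_weight * max(_cosine(rollout_emb, f) for f in self.failures)
        if base_reward == 0.0:        # unsolved rollout: remember it as a failure pattern
            self.failures.append(rollout_emb)
        return base_reward - penalty
```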

*Notable papers are those with at least two authors from a "big" AI/ML lab.