πŸ“„ Notable* Recent AI/ML arXiv Papers


πŸ“„ Vision2Web: A Hierarchical Benchmark for Visual Website Development with Agent Verification
πŸ—“οΈ Published: 3/27/2026
πŸ”— http://arxiv.org/abs/2603.26648v1
πŸ‘₯ Authors: Zehai He, Wenyi Hong, Zhen Yang (possible past Tsinghua University affiliation), Ziyang Pan, Mingdao Liu, Xiaotao Gu, Jie Tang (possible past Tsinghua University affiliation)
Abstract

Recent advances in large language models have improved the capabilities of coding agents, yet systematic evaluation of complex, end-to-end website development remains limited. To address this gap, we introduce Vision2Web, a hierarchical benchmark for visual website development, spanning static UI-to-code generation, interactive multi-page frontend reproduction, and long-horizon full-stack website development. The benchmark is constructed from real-world websites and comprises a total of 193 ...

πŸ“„ Stabilizing Rubric Integration Training via Decoupled Advantage Normalization
πŸ—“οΈ Published: 3/27/2026
πŸ”— http://arxiv.org/abs/2603.26535v1
πŸ‘₯ Authors: Zelin Tan, Zhouliang Yu, Bohan Lin, Zijie Geng, Hejia Geng, Yudong Zhang, Mulei Zhang, Yang Chen (possible past Tencent (China) affiliation), Shuyue Hu, Zhenfei Yin, Chen Zhang (possible past Peking University affiliation), Lei Bai
Abstract

We propose Process-Aware Policy Optimization (PAPO), a method that integrates process-level evaluation into Group Relative Policy Optimization (GRPO) through decoupled advantage normalization, to address two limitations of existing reward designs. Outcome reward models (ORM) evaluate only final-answer correctness, treating all correct responses identically regardless of reasoning quality, and gradually lose the advantage signal as groups become uniformly correct. Process reward models (PRM) offe...
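The decoupled normalization described above can be illustrated with a toy sketch. Assumptions (not from the paper): per-group mean/std standardization as in vanilla GRPO, and a hypothetical mixing weight `w` for combining the outcome and process channels.

```python
import statistics

def group_advantages(rewards):
    # GRPO-style normalization: each sampled response's advantage is
    # its reward standardized within the group.
    mu = statistics.mean(rewards)
    sd = statistics.pstdev(rewards) or 1.0  # guard: uniform group -> zero advantages
    return [(r - mu) / sd for r in rewards]

def decoupled_advantages(outcome_rewards, process_rewards, w=0.5):
    # Hypothetical decoupled scheme: normalize the outcome and process
    # reward streams separately, then mix. A uniformly correct group
    # (all outcome advantages zero) still receives a signal from the
    # process channel.
    a_out = group_advantages(outcome_rewards)
    a_proc = group_advantages(process_rewards)
    return [(1 - w) * o + w * p for o, p in zip(a_out, a_proc)]
```

This shows the failure mode the abstract names: with a single pooled normalization, a uniformly correct group yields no gradient signal, whereas the decoupled process channel still ranks responses by reasoning quality.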

πŸ“„ AIRA_2: Overcoming Bottlenecks in AI Research Agents
πŸ—“οΈ Published: 3/27/2026
πŸ”— http://arxiv.org/abs/2603.26499v1
πŸ‘₯ Authors: Karen Hambardzumyan, Nicolas Baldwin, Edan Toledo, Rishi Hazra, Michael Kuchnik, Bassel Al Omari, Thomas Simon Foster, Anton Protopopov, Jean-Christophe Gagnon-Audet, Ishita Mediratta, Kelvin Niu, Michael Shvartsman, Alisia Lupidi, Alexis Audran-Reiss, Parth Pathak, Tatiana Shavrina, Despoina Magka, Hela Momand, Derek Dunfield, Nicola Cancedda, Pontus Stenetorp, Carole-Jean Wu (possible past Meta (United States) affiliation), Jakob Nicolaus Foerster, Yoram Bachrach (possible past Deepmind (United Kingdom) affiliation), Martin Josifoski
Abstract

Existing research has identified three structural performance bottlenecks in AI research agents: (1) synchronous single-GPU execution constrains sample throughput, limiting the benefit of search; (2) a generalization gap where validation-based selection causes performance to degrade over extended search horizons; and (3) the limited capability of fixed, single-turn LLM operators imposes a ceiling on search performance. We introduce AIRA_2, which addresses these bottlenecks through three archit...

πŸ“„ Automated near-term quantum algorithm discovery for molecular ground states
πŸ—“οΈ Published: 3/27/2026
πŸ”— http://arxiv.org/abs/2603.26359v1
πŸ‘₯ Authors: Fabian Finger, Frederic Rapp, Pranav Kalidindi, Kerry He, Kante Yin, Alexander Koziell-Pipe, David Zsolt Manrique, Gabriel Greene-Diniz, Stephen Clark (possible past University Of Cambridge affiliation), Hamza Fawzi, Bernardino Romera Paredes, Alhussein Fawzi (possible past Google (United States) affiliation), Konstantinos Meichanetzidis
Abstract

Designing quantum algorithms is a complex and counterintuitive task, making it an ideal candidate for AI-driven algorithm discovery. To this end, we employ the Hive, an AI platform for program synthesis, which utilises large language models to drive a highly distributed evolutionary process for discovering new algorithms. We focus on the ground state problem in quantum chemistry, and discover efficient quantum heuristic algorithms that solve it for molecules LiH, H2O, and F2 while exhibiting sig...

πŸ“„ Knowdit: Agentic Smart Contract Vulnerability Detection with Auditing Knowledge Summarization
πŸ—“οΈ Published: 3/27/2026
πŸ”— http://arxiv.org/abs/2603.26270v1
πŸ‘₯ Authors: Ziqiao Kong, Wanxu Xia, Chong Wang (possible past Google (United States) affiliation), Yi Lu, Pan Li (possible past Baidu (China) affiliation), Shaohua Li, Zong Cao, Yang Liu (possible past Tsinghua University affiliation)
Abstract

Smart contracts govern billions of dollars in decentralized finance (DeFi), yet automated vulnerability detection remains challenging because many vulnerabilities are tightly coupled with project-specific business logic. We observe that recurring vulnerabilities across diverse DeFi business models often share the same underlying economic mechanisms, which we term DeFi semantics, and that capturing these shared abstractions can enable more systematic auditing. Building on this insight, we propose...

πŸ“„ ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners?
πŸ—“οΈ Published: 3/26/2026
πŸ”— http://arxiv.org/abs/2603.25823v1
πŸ‘₯ Authors: Haonan Han, Jiancheng Huang, Xiaopeng Sun, Junyan He, Rui Yang, Jie Hu, Xiaojiang Peng, Lin Ma (possible past Tencent (China) affiliation), Xiaoming Wei, Xiu Li (possible past Tsinghua University affiliation)
Abstract

Beneath the stunning visual fidelity of modern AIGC models lies a "logical desert", where systems fail tasks that require physical, causal, or complex spatial reasoning. Current evaluations largely rely on superficial metrics or fragmented benchmarks, creating a "performance mirage" that overlooks the generative process. To address this, we introduce ViGoR (Vision-Generative Reasoning-centric Benchmark), a unified framework designed to dismantle this mirage. ViGoR distinguishes itself through f...

πŸ“„ Vega: Learning to Drive with Natural Language Instructions
πŸ—“οΈ Published: 3/26/2026
πŸ”— http://arxiv.org/abs/2603.25741v1
πŸ‘₯ Authors: Sicheng Zuo, Yuxuan Li, Wenzhao Zheng, Zheng Zhu, Jie Zhou (possible past Tsinghua University affiliation), Jiwen Lu (possible past Tsinghua University affiliation)
Abstract

Vision-language-action models have reshaped autonomous driving by incorporating language into the decision-making process. However, most existing pipelines only utilize the language modality for scene descriptions or reasoning and lack the flexibility to follow diverse user instructions for personalized driving. To address this, we first construct a large-scale driving dataset (InstructScene) containing around 100,000 scenes annotated with diverse driving instructions and the corresponding traje...

πŸ“„ Back to Basics: Revisiting ASR in the Age of Voice Agents
πŸ—“οΈ Published: 3/26/2026
πŸ”— http://arxiv.org/abs/2603.25727v1
πŸ‘₯ Authors: Geeyang Tay, Wentao Ma, Jaewon Lee, Yuzhi Tang, Daniel Lee, Weisu Yin, Dongming Shen, Silin Meng, Yi Zhu, Mu Li (possible past Carnegie Mellon University affiliation), Alex Smola (possible past Google (United States) affiliation)
Abstract

Automatic speech recognition (ASR) systems have achieved near-human accuracy on curated benchmarks, yet still fail in real-world voice agents under conditions that current evaluations do not systematically cover. Without diagnostic tools that isolate specific failure factors, practitioners cannot anticipate which conditions, in which languages, will cause what degree of degradation. We introduce WildASR, a multilingual (four-language) diagnostic benchmark sourced entirely from real human speech ...

πŸ“„ Voxtral TTS
πŸ—“οΈ Published: 3/26/2026
πŸ”— http://arxiv.org/abs/2603.25551v1
πŸ‘₯ Authors: Alexander H. Liu, Alexis Tacnet, Andy Ehrenberg, Andy Lo, Chen-Yo Sun, Guillaume Lample, Henry Lagarde, Jean-Malo Delignon, Jaeyoung Kim, John Harvill, Khyathi Raghavi Chandu (possible past Carnegie Mellon University affiliation), Lorenzo Signoretti, Margaret Jennings, Patrick Von Platen, Pavankumar Reddy Muddireddy, Rohin Arora, Sanchit Gandhi, Samuel Humeau, Soham Ghosh, Srijan Mishra, Van Phung, Abdelaziz Bounhar, Abhinav Rastogi (possible past Google (United States) affiliation), Adrien SadΓ©, Alan Jeffares, Albert Jiang, Alexandre Cahill, Alexandre Gavaudan, Alexandre Sablayrolles, AmΓ©lie HΓ©liou, Amos You, Andrew Bai, Andrew Zhao, Angele Lenglemetz, Anmol Agarwal, Anton Eliseev, Antonia Calvi, Arjun Majumdar, Arthur Fournier, Artjom Joosen, Avi Sooriyarachchi, Aysenur Karaduman Utkur, Baptiste Bout, Baptiste RoziΓ¨re, Baudouin De Monicault, Benjamin Tibi, Bowen Yang, Charlotte CronjΓ€ger, ClΓ©mence Lanfranchi, Connor Chen, Corentin Barreau, Corentin Sautier, Cyprien Courtot, Darius Dabert, Diego De Las Casas (possible past Deepmind (United Kingdom) affiliation), Elizaveta Demyanenko, Elliot Chane-Sane, Emmanuel Gottlob, Enguerrand Paquin, Etienne Goffinet, Fabien Niel, Faruk Ahmed, Federico Baldassarre, Gabrielle Berrada, GaΓ«tan Ecrepont, Gauthier Guinet, Genevieve Hayes, Georgii Novikov, Giada Pistilli, Guillaume Kunsch, Guillaume Martin, Guillaume Raille, Gunjan Dhanuka, Gunshi Gupta, Han Zhou, Harshil Shah, Hope Mcgovern, Hugo Thimonier, Indraneel Mukherjee (possible past Google (United States) affiliation), Irene Zhang, Jacques Sun, Jan Ludziejewski, Jason Rute, JΓ©rΓ©mie Dentan, Joachim Studnia, Jonas Amar, JosΓ©phine Delas, Josselin Somerville Roberts, Julien Tauran, Karmesh Yadav, Kartik Khandelwal, Kilian Tep, Kush Jain, Laurence Aitchison, Laurent Fainsin, LΓ©onard Blier, Lingxiao Zhao, Louis Martin, Lucile Saulnier, Luyu Gao, Maarten Buyl, Manan Sharma, Marie Pellat, Mark Prins, Martin Alexandre, Mathieu PoirΓ©e, Mathieu Schmitt, 
Mathilde Guillaumin, Matthieu Dinot, Matthieu Futeral, Maxime Darrin, Maximilian Augustin, Mert Unsal, Mia Chiquier, Mikhail Biriuchinskii, Minh-Quang Pham, Mircea Lica, Morgane RiviΓ¨re (possible past Meta (United States) affiliation), Nathan Grinsztajn, Neha Gupta, Olivier Bousquet (possible past Google (United States) affiliation), Olivier Duchenne, Patricia Wang, Paul Jacob, Paul Wambergue, Paula Kurylowicz, Philippe Pinel, PhilomΓ¨ne Chagniot, Pierre Stock, Piotr MiΕ‚oΕ›, Prateek Gupta, Pravesh Agrawal, Quentin Torroba, Ram Ramrakhya, Randall Isenhour, Rishi Shah, Romain Sauvestre, Roman Soletskyi, Rosalie Millner, Rupert Menneer, Sagar Vaze, Samuel Barry, Samuel Belkadi, Sandeep Subramanian (possible past Carnegie Mellon University affiliation), Sean Cha, Shashwat Verma, Siddhant Waghjale, Siddharth Gandhi, Simon Lepage, Sumukh Aithal, Szymon Antoniak, Tarun Kumar Vangani, Teven Le Scao, ThΓ©o Cachet, Theo Simon Sorg, Thibaut Lavril (possible past Meta (United States) affiliation), Thomas Chabal, Thomas Foubert, Thomas Robert, Thomas Wang, Tim Lawson, Tom Bewley, Tom Edwards, Tyler Wang, Umar Jamil, Umberto Tomasini, Valeriia Nemychnikova, Vedant Nanda, Victor Jouault, Vincent MaladiΓ¨re, Vincent Pfister, Virgile Richard, Vladislav Bataev, Wassim Bouaziz, Wen-Ding Li, William Havard, William Marshall, Xinghui Li, Xingran Guo, Xinyu Yang, Yannic Neuhaus, Yassine El Ouahidi, Yassir Bendou, Yihan Wang, Yimu Pan, Zaccharie Ramzi, Zhenlin Xu
Abstract

We introduce Voxtral TTS, an expressive multilingual text-to-speech model that generates natural speech from as little as 3 seconds of reference audio. Voxtral TTS adopts a hybrid architecture that combines auto-regressive generation of semantic speech tokens with flow-matching for acoustic tokens. These tokens are encoded and decoded with Voxtral Codec, a speech tokenizer trained from scratch with a hybrid VQ-FSQ quantization scheme. In human evaluations conducted by native speakers, Voxtral TT...

πŸ“„ Evaluating Language Models for Harmful Manipulation
πŸ—“οΈ Published: 3/26/2026
πŸ”— http://arxiv.org/abs/2603.25326v1
πŸ‘₯ Authors: Canfer Akbulut, Rasmi Elasmar, Abhishek Roy, Anthony Payne, Priyanka Suresh, Lujain Ibrahim, Seliem El-Sayed, Charvi Rastogi, Ashyana Kachra, Will Hawkins (possible past Deepmind (United Kingdom) affiliation), Kristian Lum (possible past Google (United States) affiliation), Laura Weidinger (possible past Deepmind (United Kingdom) affiliation)
Abstract

Interest in the concept of AI-driven harmful manipulation is growing, yet current approaches to evaluating it are limited. This paper introduces a framework for evaluating harmful AI manipulation via context-specific human-AI interaction studies. We illustrate the utility of this framework by assessing an AI model with 10,101 participants spanning interactions in three AI use domains (public policy, finance, and health) and three locales (US, UK, and India). Overall, we find that the tested...

πŸ“„ Activation Matters: Test-time Activated Negative Labels for OOD Detection with Vision-Language Models
πŸ—“οΈ Published: 3/26/2026
πŸ”— http://arxiv.org/abs/2603.25250v1
πŸ‘₯ Authors: Yabin Zhang, Maya Varma, Yunhe Gao, Jean-Benoit Delbrouck, Jiaming Liu (possible past Baidu (China) affiliation), Chong Wang (possible past Google (United States) affiliation), Curtis Langlotz
Abstract

Out-of-distribution (OOD) detection aims to identify samples that deviate from in-distribution (ID). One popular pipeline addresses this by introducing negative labels distant from ID classes and detecting OOD based on their distance to these labels. However, such labels may present poor activation on OOD samples, failing to capture the OOD characteristics. To address this, we propose Test-time Activated Negative Labels (TANL) by dynamically evalua...
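Negative-label pipelines of this kind typically score a sample by how much softmax mass its label similarities place on the negative set. A minimal sketch, assuming CLIP-style similarity scores and a hypothetical temperature; this is the generic negative-label baseline, not the paper's TANL procedure:

```python
import math

def ood_score(sims_id, sims_neg, temperature=0.01):
    # Softmax over similarities to ID labels plus negative labels;
    # the probability mass landing on the negative labels serves as
    # the OOD score (higher = more likely out-of-distribution).
    logits = [s / temperature for s in sims_id + sims_neg]
    m = max(logits)                       # stabilize the softmax
    exps = [math.exp(x - m) for x in logits]
    return sum(exps[len(sims_id):]) / sum(exps)
```

A sample whose strongest activation is on a negative label scores near 1; the abstract's point is that statically chosen negative labels may fail to activate on real OOD inputs, which is what the test-time selection targets.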

πŸ“„ FluxEDA: A Unified Execution Infrastructure for Stateful Agentic EDA
πŸ—“οΈ Published: 3/26/2026
πŸ”— http://arxiv.org/abs/2603.25243v1
πŸ‘₯ Authors: Zhengrui Chen, Zixuan Song, Yu Li (possible past Tencent (China) affiliation), Qi Sun (possible past Google (United States) affiliation), Cheng Zhuo
Abstract

Large language models and autonomous agents are increasingly explored for EDA automation, but many existing integrations still rely on script-level or request-level interactions, which makes it difficult to preserve tool state and support iterative optimization in real production-oriented environments. In this work, we present FluxEDA, a unified and stateful infrastructure substrate for agentic EDA. FluxEDA introduces a managed gateway-based execution interface with structured request and respon...

πŸ“„ Photon: Speedup Volume Understanding with Efficient Multimodal Large Language Models
πŸ—“οΈ Published: 3/26/2026
πŸ”— http://arxiv.org/abs/2603.25155v1
πŸ‘₯ Authors: Chengyu Fang, Heng Guo (possible past Tencent (China) affiliation), Zheng Jiang, Chunming He, Xiu Li (possible past Tsinghua University affiliation), Minfeng Xu
Abstract

Multimodal large language models are promising for clinical visual question answering tasks, but scaling to 3D imaging is hindered by high computational costs. Prior methods often rely on 2D slices or fixed-length token compression, disrupting volumetric continuity and obscuring subtle findings. We present Photon, a framework that represents 3D medical volumes with token sequences of variable length. Photon introduces instruction-conditioned token scheduling and surrogate gradient propagation to...

πŸ“„ UniAI-GraphRAG: Synergizing Ontology-Guided Extraction, Multi-Dimensional Clustering, and Dual-Channel Fusion for Robust Multi-Hop Reasoning
πŸ—“οΈ Published: 3/26/2026
πŸ”— http://arxiv.org/abs/2603.25152v1
πŸ‘₯ Authors: Jie Wang (possible past Tsinghua University affiliation), Honghua Huang, Xi Ge, Jianhui Su, Wen Liu (possible past Tencent (China) affiliation), Shiguo Lian
Abstract

Retrieval-Augmented Generation (RAG) systems face significant challenges in complex reasoning, multi-hop queries, and domain-specific QA. While existing GraphRAG frameworks have made progress in structural knowledge organization, they still have limitations in cross-industry adaptability, community report integrity, and retrieval performance. This paper proposes UniAI-GraphRAG, an enhanced framework built upon open-source GraphRAG. The framework introduces three core innovations: (1) Ontology-Gu...

πŸ“„ MCLMR: A Model-Agnostic Causal Learning Framework for Multi-Behavior Recommendation
πŸ—“οΈ Published: 3/26/2026
πŸ”— http://arxiv.org/abs/2603.25126v1
πŸ‘₯ Authors: Ranxu Zhang, Junjie Meng, Ying Sun, Ziqi Xu, Bing Yin, Hao Li (possible past Tsinghua University affiliation), Yanyong Zhang, Chao Wang (possible past Google (United States) affiliation)
Abstract

Multi-Behavior Recommendation (MBR) leverages multiple user interaction types (e.g., views, clicks, purchases) to enrich preference modeling and alleviate data sparsity issues in traditional single-behavior approaches. However, existing MBR methods face fundamental challenges: they lack principled frameworks to model complex confounding effects from user behavioral habits and item multi-behavior distributions, struggle with effective aggregation of heterogeneous auxiliary behaviors, and fail to ...

πŸ“„ Machine Unlearning under Retain-Forget Entanglement
πŸ—“οΈ Published: 3/27/2026
πŸ”— http://arxiv.org/abs/2603.26569v1
πŸ‘₯ Authors: Jingpu Cheng, Ping Liu, Qianxiao Li (possible past National University Of Singapore affiliation), Chi Zhang (possible past Peking University affiliation)
Abstract

Forgetting a subset in machine unlearning is rarely an isolated task. Often, retained samples that are closely related to the forget set can be unintentionally affected, particularly when they share correlated features from pretraining or exhibit strong semantic similarities. To address this challenge, we propose a novel two-phase optimization framework specifically designed to handle such retain-forget entanglements. In the first phase, an augmented Lagrangian method increases the loss on the fo...
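The phase-one mechanism is a constrained optimization solved by augmented-Lagrangian iterations. A generic toy on a scalar problem, purely illustrative (the objective, constraint, and step sizes here are stand-ins, not the paper's forget/retain losses):

```python
def unlearn_phase1_toy():
    # Generic augmented-Lagrangian iteration on a stand-in problem:
    # minimize (x - 3)^2 subject to x - 1 = 0 (think: drive one loss
    # while pinning another to its budget).
    x, lam, rho = 0.0, 0.0, 10.0
    for _ in range(20):                    # outer multiplier updates
        for _ in range(500):               # inner gradient descent
            grad = 2.0 * (x - 3.0) + lam + rho * (x - 1.0)
            x -= 0.01 * grad
        lam += rho * (x - 1.0)             # multiplier ascent step
    return x, lam
```

The inner loop minimizes the penalized objective for a fixed multiplier; the outer step raises the multiplier until the constraint holds (here x converges to 1 with multiplier 4, matching the KKT condition 2(x - 3) + lam = 0).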

πŸ“„ DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models
πŸ—“οΈ Published: 3/27/2026
πŸ”— http://arxiv.org/abs/2603.26164v1
πŸ‘₯ Authors: Hao Liang, Zhengyang Zhao, Meiyi Qiang, Mingrui Chen, Lu Ma, Rongyi Yu, Hengyi Feng, Shixuan Sun, Zimo Meng, Xiaochen Ma, Xuanlin Yang, Qifeng Cai, Ruichuan An, Bohan Zeng, Zhen Hao Wong, Chengyu Shen, Runming He, Zhaoyang Han, Yaowei Zheng, Fangcheng Fu (possible past Peking University affiliation), Conghui He (possible past Tsinghua University affiliation), Bin Cui (possible past Peking University affiliation), Zhiyu Li, Weinan E, Wentao Zhang (possible past Mila - Quebec Artificial Intelligence Institute affiliation)
Abstract

Data-centric training has emerged as a promising direction for improving large language models (LLMs) by optimizing not only model parameters but also the selection, composition, and weighting of training data during optimization. However, existing approaches to data selection, data mixture optimization, and data reweighting are often developed in isolated codebases with inconsistent interfaces, hindering reproducibility, fair comparison, and practical integration. In this paper, we present Data...

πŸ“„ Accurate Precipitation Forecast by Efficiently Learning from Massive Atmospheric Variables and Unbalanced Distribution
πŸ—“οΈ Published: 3/27/2026
πŸ”— http://arxiv.org/abs/2603.26108v1
πŸ‘₯ Authors: Shuangliang Li, Siwei Li, Li Li (possible past Google (United States) affiliation), Weijie Zou, Jie Yang (possible past Shanghai Jiao Tong University affiliation), Maolin Zhang
Abstract

Short-term (0-24 hours) precipitation forecasting is highly valuable to socioeconomic activities and public safety. However, the highly complex evolution patterns of precipitation events, the extreme imbalance between precipitation and non-precipitation samples, and the inability of existing models to efficiently and effectively utilize large volumes of multi-source atmospheric observation data hinder improvements in precipitation forecasting accuracy and computational efficiency. To address the...

πŸ“„ QuitoBench: A High-Quality Open Time Series Forecasting Benchmark
πŸ—“οΈ Published: 3/27/2026
πŸ”— http://arxiv.org/abs/2603.26017v1
πŸ‘₯ Authors: Siqiao Xue, Zhaoyang Zhu, Wei Zhang (possible past Tsinghua University affiliation), Rongyao Cai, Rui Wang (possible past Tencent (China) affiliation), Yixiang Mu, Fan Zhou, Jianguo Li, Peng Di, Hang Yu
Abstract

Time series forecasting is critical across finance, healthcare, and cloud computing, yet progress is constrained by a fundamental bottleneck: the scarcity of large-scale, high-quality benchmarks. To address this gap, we introduce QuitoBench, a regime-balanced benchmark for time series forecasting with coverage across eight trendΓ—seasonalityΓ—forecastability (TSF) regimes, designed to capture forecasting-relevant properties rather than application-defined domain labels. The ...
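Eight TSF regimes implies a high/low split on each of the three properties. A hypothetical sketch, assuming each property is a strength score in [0, 1] thresholded at 0.5 (the paper's actual regime definition may differ):

```python
def regime_label(trend, seasonality, forecastability, threshold=0.5):
    # Hypothetical regime assignment: threshold each strength score
    # to high/low; the three bits name one of 2^3 = 8 regimes.
    parts = []
    for name, value in (("trend", trend),
                        ("seasonality", seasonality),
                        ("forecastability", forecastability)):
        parts.append(("high-" if value >= threshold else "low-") + name)
    return "-".join(parts)
```

Balancing the benchmark across these bins, rather than across application domains, is what lets it probe forecasting-relevant behavior directly.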

πŸ“„ Incorporating contextual information into KGWAS for interpretable GWAS discovery
πŸ—“οΈ Published: 3/26/2026
πŸ”— http://arxiv.org/abs/2603.25855v1
πŸ‘₯ Authors: Cheng Jiang (possible past Tencent (China) affiliation), Brady Ryan, Megan Crow, Kipper Fletez-Brant, Kashish Doshi, Sandra Melo Carlos, Kexin Huang (possible past Stanford University affiliation), Burkhard Hoeckendorf, Heming Yao, David Richmond
Abstract

Genome-Wide Association Studies (GWAS) identify associations between genetic variants and disease; however, moving beyond associations to causal mechanisms is critical for therapeutic target prioritization. The recently proposed Knowledge Graph GWAS (KGWAS) framework addresses this challenge by linking genetic variants to downstream gene-gene interactions via a knowledge graph (KG), thereby improving detection power and providing mechanistic insights. However, the original KGWAS implementation r...

πŸ“„ ExVerus: Verus Proof Repair via Counterexample Reasoning
πŸ—“οΈ Published: 3/26/2026
πŸ”— http://arxiv.org/abs/2603.25810v1
πŸ‘₯ Authors: Jun Yang (possible past Tsinghua University affiliation), Yuechun Sun, Yi Wu (possible past University Of California, Berkeley affiliation), Rodrigo Caridad, Yongwei Yuan, Jianan Yao (possible past Google (United States) affiliation), Shan Lu, Kexin Pei
Abstract

Large Language Models (LLMs) have shown promising results in automating formal verification. However, existing approaches treat proof generation as a static, end-to-end prediction over source code, relying on limited verifier feedback and lacking access to concrete program behaviors. We present ExVerus, a counterexample-guided framework that enables LLMs to reason about proofs using behavioral feedback via counterexamples. When a proof fails, ExVerus automatically generates and validates counter...
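The counterexample-guided loop can be sketched generically as follows; the function names, feedback format, and iteration budget are assumptions for illustration, not ExVerus's actual API:

```python
def repair_loop(generate_proof, verify, gen_counterexample, validate_cex,
                max_iters=5):
    # Counterexample-guided repair sketch: when verification fails,
    # derive a counterexample, keep it only if it validates against
    # the program, and feed (error, counterexample) back into the
    # next proof-generation attempt.
    feedback = None
    for _ in range(max_iters):
        proof = generate_proof(feedback)
        ok, err = verify(proof)
        if ok:
            return proof
        cex = gen_counterexample(err)
        feedback = (err, cex if validate_cex(cex) else None)
    return None  # give up after the iteration budget
```

The validation step matters: an unvalidated counterexample can mislead regeneration, so only behaviorally confirmed ones are passed back.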

πŸ“„ Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale
πŸ—“οΈ Published: 3/26/2026
πŸ”— http://arxiv.org/abs/2603.25040v1
πŸ‘₯ Authors: Yicheng Zou, Dongsheng Zhu, Lin Zhu, Tong Zhu (possible past Nvidia (United States) affiliation), Yunhua Zhou, Peiheng Zhou, Xinyu Zhou, Dongzhan Zhou, Zhiwang Zhou, Yuhao Zhou, Bowen Zhou, Zhanping Zhong, Zhijie Zhong, Haiteng Zhao, Penghao Zhao, Xiaomeng Zhao, Zhiyuan Zhao, Yechen Zhang, Jin Zhang, Wenwei Zhang, Hongjie Zhang, Zhuo Zhang, Wenlong Zhang, Bo Zhang (possible past Tencent (China) affiliation), Chao Zhang, Chen Zhang (possible past Peking University affiliation), Yuhang Zang, Fei Yuan, Jiakang Yuan, Jiashuo Yu, Jinhui Yin, Haochen Ye, Qian Yao, Bowen Yang, Danni Yang, Kaichen Yang, Ziang Yan, Jun Xu (possible past Google (United States) affiliation), Yicheng Xu, Wanghan Xu, Xuenan Xu, Chao Xu, Ruiliang Xu, Shuhao Xing, Long Xing, Xinchen Xie, Ling-I Wu, Zijian Wu, Zhenyu Wu, Lijun Wu, Yue Wu, Jianyu Wu, Wen Wu, Fan Wu, Xilin Wei, Qi Wei, Bingli Wang, Rui Wang (possible past Tencent (China) affiliation), Ziyi Wang, Zun Wang, Yi Wang, Haomin Wang, Yizhou Wang (possible past Peking University affiliation), Lintao Wang, Yiheng Wang, Longjiang Wang, Bin Wang, Jian Tong, Zhongbo Tian, Huanze Tang, Chen Tang, Shixiang Tang, Yu Sun (possible past Baidu (China) affiliation), Qiushi Sun, Xuerui Su, Qisheng Su, Chenlin Su, Demin Song, Jin Shi, Fukai Shang, Yuchen Ren, Pengli Ren, Xiaoye Qu, Yuan Qu, Jiantao Qiu, Yu Qiao (possible past Shanghai Artificial Intelligence Laboratory affiliation), Runyu Peng, Tianshuo Peng, Jiahui Peng, Qizhi Pei, Zhuoshi Pan, Linke Ouyang, Wenchang Ning, Yichuan Ma, Zerun Ma, Ningsheng Ma, Runyuan Ma, Chengqi Lyu, Haijun Lv (possible past Baidu (China) affiliation), Han Lv, Lindong Lu, Kuikun Liu, Jiangning Liu, Yuhong Liu, Kai Liu (possible past Baidu (China) affiliation), Hongwei Liu, Zhoumianze Liu, Mengjie Liu, Ziyu Liu, Wenran Liu, Yang Liu (possible past Tsinghua University affiliation), Liwei Liu, Kaiwen Liu, Junyao Lin, Junming Lin, Tianyang Lin, Dahua Lin, Jianze Liang, Linyang Li, Peiji Li, Zonglin Li, Zehao 
Li, Pengze Li, Guoyan Li, Lingkai Kong, Linglin Jing, Zhenjiang Jin, Feifei Jiang, Qian Jiang, Junhao Huang, Zixian Huang, Haian Huang, Zhouqi Hua, Han Hu, Linfeng Hou, Yinan He, Conghui He (possible past Tsinghua University affiliation), Tianyao He, Xu Guo, Qipeng Guo, Aijia Guo, Yuzhe Gu, Lixin Gu, Jingyang Gong, Qiming Ge, Jiaye Ge, Songyang Gao, Jianfei Gao, Xinyu Fang, Caihua Fan, Yue Fan, Yanhui Duan, Zichen Ding, Shengyuan Ding, Xuanlang Dai, Erfei Cui, Ganqu Cui (possible past Tsinghua University affiliation), Pei Chu, Tao Chu, Guangran Cheng, Yu Cheng (possible past National University Of Singapore affiliation), Kai Chen (possible past Shanghai Jiao Tong University affiliation), Yongkang Chen, Chiyu Chen, Guanzhou Chen, Qiaosheng Chen, Sitao Chen, Xin Chen (possible past Tencent (China) affiliation), Haojiong Chen, Yicheng Chen, Weihan Cao, Yuhang Cao, Qinglong Cao, Lei Bai
Abstract

We introduce Intern-S1-Pro, the first one-trillion-parameter scientific multimodal foundation model. Scaling to this unprecedented size, the model delivers a comprehensive enhancement across both general and scientific domains. Beyond stronger reasoning and image-text understanding capabilities, its intelligence is augmented with advanced agent capabilities. Simultaneously, its scientific expertise has been vastly expanded to master over 100 specialized tasks across critical science fields, incl...

*Notable papers are those with at least two authors from a "big" AI/ML lab.