πŸ“„ Notable* Recent AI/ML arXiv Papers


πŸ“„ From Masks to Pixels and Meaning: A New Taxonomy, Benchmark, and Metrics for VLM Image Tampering
πŸ—“οΈ Published: 3/20/2026
πŸ”— http://arxiv.org/abs/2603.20193v1
πŸ‘₯ Authors: Xinyi Shang, Yi Tang, Jiacheng Cui, Ahmed Elhagry, Salwa K. Al Khatib, Sondos Mahmoud Bsharat, Jiacheng Liu, Xiaohan Zhao, Jing-Hao Xue, Hao Li (possible past Tsinghua University affiliation), Salman Khan (possible past Inception Institute Of Artificial Intelligence affiliation), Zhiqiang Shen
Abstract

Existing tampering detection benchmarks largely rely on object masks, which severely misalign with the true edit signal: many pixels inside a mask are untouched or only trivially modified, while subtle yet consequential edits outside the mask are treated as natural. We reformulate VLM image tampering from coarse region labels to a pixel-grounded, meaning and language-aware task. First, we introduce a taxonomy spanning edit primitives (replace/remove/splice/inpaint/attribute/colorization, etc.) a...

πŸ“„ Pitfalls in Evaluating Interpretability Agents
πŸ—“οΈ Published: 3/20/2026
πŸ”— http://arxiv.org/abs/2603.20101v1
πŸ‘₯ Authors: Tal Haklay, Nikhil Prakash, Sana Pandey, Antonio Torralba (possible past Massachusetts Institute Of Technology affiliation), Aaron Mueller, Jacob Andreas (possible past University Of California, Berkeley affiliation), Tamar Rott Shaham (possible past Technion – Israel Institute Of Technology affiliation), Yonatan Belinkov
Abstract

Automated interpretability systems aim to reduce the need for human labor and scale analysis to increasingly large models and diverse tasks. Recent efforts toward this goal leverage large language models (LLMs) at increasing levels of autonomy, ranging from fixed one-shot workflows to fully autonomous interpretability agents. This shift creates a corresponding need to scale evaluation approaches to keep pace with both the volume and complexity of generated explanations. We investigate this chall...

πŸ“„ DIAL-KG: Schema-Free Incremental Knowledge Graph Construction via Dynamic Schema Induction and Evolution-Intent Assessment
πŸ—“οΈ Published: 3/20/2026
πŸ”— http://arxiv.org/abs/2603.20059v1
πŸ‘₯ Authors: Weidong Bao (possible past National University Of Defense Technology affiliation), Yilin Wang (possible past Google (United States) affiliation), Ruyu Gao, Fangling Leng, Yubin Bao, Ge Yu
Abstract

Knowledge Graphs (KGs) are foundational to applications such as search, question answering, and recommendation. Conventional knowledge graph construction methods are predominantly static, relying on a single-step construction from a fixed corpus with a predefined schema. However, such methods are suboptimal for real-world scenarios where data arrives dynamically, as incorporating new information requires complete and computationally expensive graph reconstructions. Furthermore, predefined ...

πŸ“„ Detached Skip-Links and $R$-Probe: Decoupling Feature Aggregation from Gradient Propagation for MLLM OCR
πŸ—“οΈ Published: 3/20/2026
πŸ”— http://arxiv.org/abs/2603.20020v1
πŸ‘₯ Authors: Ziye Yuan, Ruchang Yao, Chengxin Zheng, Yusheng Zhao, Daxiang Dong (possible past Baidu (China) affiliation), Ming Zhang (possible past Peking University affiliation)
Abstract

Multimodal large language models (MLLMs) excel at high-level reasoning yet fail on OCR tasks where fine-grained visual details are compromised or misaligned. We identify an overlooked optimization issue in multi-layer feature fusion. Skip pathways introduce direct back-propagation paths from high-level semantic objectives to early visual layers. This mechanism overwrites low-level signals and destabilizes training. To mitigate this gradient interference, we propose Detached Skip-Links, a minimal...

πŸ“„ X-World: Controllable Ego-Centric Multi-Camera World Models for Scalable End-to-End Driving
πŸ—“οΈ Published: 3/20/2026
πŸ”— http://arxiv.org/abs/2603.19979v1
πŸ‘₯ Authors: Chaoda Zheng, Sean Li, Jinhao Deng, Zhennan Wang, Shijia Chen, Liqiang Xiao, Ziheng Chi, Hongbin Lin, Kangjie Chen, Boyang Wang, Yu Zhang (possible past Google (United States) affiliation), Xianming Liu (possible past Meta (United States) affiliation)
Abstract

Scalable and reliable evaluation is increasingly critical in the end-to-end era of autonomous driving, where vision-language-action (VLA) policies directly map raw sensor streams to driving actions. Yet, current evaluation pipelines still rely heavily on real-world road testing, which is costly, biased toward limited scenario coverage, and difficult to reproduce. These challenges motivate a real-world simulator that can generate realistic future observations under proposed actions, while remai...

πŸ“„ FormalEvolve: Neuro-Symbolic Evolutionary Search for Diverse and Prover-Effective Autoformalization
πŸ—“οΈ Published: 3/20/2026
πŸ”— http://arxiv.org/abs/2603.19828v1
πŸ‘₯ Authors: Haijian Lu, Wei Wang (possible past University Of Oxford affiliation), Jing Liu (possible past Baidu (China) affiliation)
Abstract

Autoformalization aims to translate natural-language mathematics into compilable, machine-checkable statements. However, semantic consistency does not imply prover effectiveness: even semantically consistent formalizations can differ substantially in proof-search cost and success rate. In this work, we formulate autoformalization as a budgeted, test-time search for semantically consistent repertoires, and propose FormalEvolve, a compilation-gated neuro-symbolic evolutionary framework. FormalEvol...
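The "compilation-gated" evolutionary search named in the abstract can be sketched generically. Everything below is a hypothetical illustration, not FormalEvolve's actual interface: candidates are bit-strings, the toy `compiles` check stands in for a proof-assistant type check, and `fitness` stands in for prover effectiveness.

```python
import random

def compiles(c):
    """Stand-in for a Lean/Isabelle compilation check (hypothetical)."""
    return sum(c) % 2 == 0

def fitness(c):
    """Stand-in for prover effectiveness (hypothetical)."""
    return sum(c)

def evolve(pop_size=20, length=16, generations=50, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        for c in pop:
            # Point mutations flip each bit with small probability.
            child = [b ^ (rng.random() < 0.05) for b in c]
            # Gate: only compilable variants enter the candidate pool.
            if compiles(child):
                children.append(child)
        # Elitist selection keeps the fittest individuals.
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
    return max(pop, key=fitness)

best = evolve()
print(sum(best))
```

The gate is the key design point: variants that fail the symbolic check are discarded before selection, so the search only ever ranks well-formed candidates.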

πŸ“„ Embodied Science: Closing the Discovery Loop with Agentic Embodied AI
πŸ—“οΈ Published: 3/20/2026
πŸ”— http://arxiv.org/abs/2603.19782v1
πŸ‘₯ Authors: Xiang Zhuang, Chenyi Zhou, Kehua Feng, Zhihui Zhu, Yunfan Gao, Yijie Zhong, Yichi Zhang, Junjie Huang, Keyan Ding, Lei Bai, Haofen Wang, Qiang Zhang (possible past Tsinghua University affiliation), Huajun Chen (possible past Alibaba Group (China) affiliation)
Abstract

Artificial intelligence has demonstrated remarkable capability in predicting scientific properties, yet scientific discovery remains an inherently physical, long-horizon pursuit governed by experimental cycles. Most current computational approaches are misaligned with this reality, framing discovery as isolated, task-specific predictions rather than continuous interaction with the physical world. Here, we argue for embodied science, a paradigm that reframes scientific discovery as a closed loop ...

πŸ“„ Stepwise: Neuro-Symbolic Proof Search for Automated Systems Verification
πŸ—“οΈ Published: 3/20/2026
πŸ”— http://arxiv.org/abs/2603.19715v1
πŸ‘₯ Authors: Baoding He, Zenan Li, Wei Sun (possible past Google (United States) affiliation), Yuan Yao (possible past Tsinghua University affiliation), Taolue Chen, Xiaoxing Ma, Zhendong Su
Abstract

Formal verification via interactive theorem proving is increasingly used to ensure the correctness of critical systems, yet constructing large proof scripts remains highly manual and limits scalability. Advances in large language models (LLMs), especially in mathematical reasoning, make their integration into software verification increasingly promising. This paper introduces a neuro-symbolic proof generation framework designed to automate proof search for systems-level verification projects. Th...

πŸ“„ PolicySim: An LLM-Based Agent Social Simulation Sandbox for Proactive Policy Optimization
πŸ—“οΈ Published: 3/20/2026
πŸ”— http://arxiv.org/abs/2603.19649v1
πŸ‘₯ Authors: Renhong Huang, Ning Tang, Jiarong Xu, Yuxuan Cao, Qingqian Tu, Sheng Guo (possible past Google (United States) affiliation), Bo Zheng, Huiyuan Liu, Yang Yang (possible past Tencent (China) affiliation)
Abstract

Social platforms serve as central hubs for information exchange, where user behaviors and platform interventions jointly shape opinions. However, intervention policies such as recommendation and content filtering can unintentionally amplify echo chambers and polarization, posing significant societal risks. Proactively evaluating the impact of such policies is therefore crucial. Existing approaches primarily rely on reactive online A/B testing, where risks are identified only after deployment, maki...

πŸ“„ OmniDiT: Extending Diffusion Transformer to Omni-VTON Framework
πŸ—“οΈ Published: 3/20/2026
πŸ”— http://arxiv.org/abs/2603.19643v1
πŸ‘₯ Authors: Weixuan Zeng, Pengcheng Wei, Huaiqing Wang, Boheng Zhang, Jia Sun, Dewen Fan, Lin He, Long Chen (possible past Tencent (China) affiliation), Qianqian Gan, Fan Yang (possible past Tencent (China) affiliation), Tingting Gao
Abstract

Despite the rapid advancement of Virtual Try-On (VTON) and Try-Off (VTOFF) technologies, existing VTON methods face challenges in fine-grained detail preservation, generalization to complex scenes, pipeline complexity, and inference efficiency. To tackle these problems, we propose OmniDiT, an omni Virtual Try-On framework based on the Diffusion Transformer, which combines try-on and try-off tasks into one unified model. Specifically, we first establish a self-evolving data curation pipeline to...

πŸ“„ Physics-Informed Neural Network with Adaptive Clustering Learning Mechanism for Information Popularity Prediction
πŸ—“οΈ Published: 3/20/2026
πŸ”— http://arxiv.org/abs/2603.19599v1
πŸ‘₯ Authors: Guangyin Jin (possible past National University Of Defense Technology affiliation), Xiaohan Ni, Yanjie Song, Kun Wei, Jie Zhao (possible past Baidu (China) affiliation), Leiming Jia, Witold Pedrycz
Abstract

With society entering the Internet era, the volume and speed of data and information have been increasing. Predicting the popularity of information cascades can help with high-value information delivery and public opinion monitoring on internet platforms. The current state-of-the-art models for predicting information popularity utilize deep learning methods such as graph convolution networks (GCNs) and recurrent neural networks (RNNs) to capture early cascades and temporal features to predic...

πŸ“„ PFM-VEPAR: Prompting Foundation Models for RGB-Event Camera based Pedestrian Attribute Recognition
πŸ—“οΈ Published: 3/20/2026
πŸ”— http://arxiv.org/abs/2603.19565v1
πŸ‘₯ Authors: Minghe Xu, Rouying Wu, Chiawei Chu, Xiao Wang (possible past Google (United States) affiliation), Yu Li (possible past Tencent (China) affiliation)
Abstract

Event-based pedestrian attribute recognition (PAR) leverages motion cues to enhance RGB cameras in low-light and motion-blur scenarios, enabling more accurate inference of attributes like age and emotion. However, existing two-stream multimodal fusion methods introduce significant computational overhead and neglect the valuable guidance from contextual samples. To address these limitations, this paper proposes an Event Prompter. Discarding the computationally expensive auxiliary backbone, this m...

πŸ“„ Adaptive Layerwise Perturbation: Unifying Off-Policy Corrections for LLM RL
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.19470v1
πŸ‘₯ Authors: Chenlu Ye, Xuanchang Zhang, Yifan Hao, Zhou Yu, Ziji Zhang, Abhinav Gullapalli, Hao Chen, Jing Huang (possible past Meta (United States) affiliation), Tong Zhang (possible past Tencent (China) affiliation)
Abstract

Off-policy problems such as policy staleness and training-inference mismatch have become a major bottleneck for training stability and further exploration in LLM RL. As inference efficiency is pushed further, the distribution gap between the inference policy and the updated policy grows, leading to heavy-tailed importance ratios. Heavy-tailed ratios arise when the policy is locally sharp, which further inflates sharp gradients and can push updates outside the trust region. To address this, we propose Adaptive Layer...
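The heavy-tailed importance ratios described above can be illustrated with a toy trust-region clip. The log-probabilities below are invented numbers, and the clip shown is the standard PPO-style correction, not the paper's adaptive layerwise method.

```python
import numpy as np

# Hypothetical per-token log-probs under a stale inference policy (mu)
# and the current training policy (pi); values are made up for illustration.
logp_pi = np.array([-1.2, -0.8, -3.5, -0.9])
logp_mu = np.array([-1.1, -0.9, -1.0, -1.0])

# Per-token importance ratios pi/mu; staleness makes some ratios heavy-tailed.
ratios = np.exp(logp_pi - logp_mu)

# A PPO-style clip keeps updates inside a trust region around ratio = 1.
eps = 0.2
clipped = np.clip(ratios, 1 - eps, 1 + eps)

print(ratios.round(3))   # the third token's ratio is far from 1
print(clipped.round(3))  # after clipping, every ratio lies in [0.8, 1.2]
```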

πŸ“„ A Framework for Formalizing LLM Agent Security
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.19469v1
πŸ‘₯ Authors: Vincent Siu, Jingxuan He, Kyle Montgomery, Zhun Wang, Neil Gong, Chenguang Wang (possible past Amazon (United States) affiliation), Dawn Song (possible past University Of California, Berkeley affiliation)
Abstract

Security in LLM agents is inherently contextual. For example, the same action taken by an agent may represent legitimate behavior or a security violation depending on whose instruction led to the action, what objective is being pursued, and whether the action serves that objective. However, existing definitions of security attacks against LLM agents often fail to capture this contextual nature. As a result, defenses face a fundamental utility-security tradeoff: applying defenses uniformly across...

πŸ“„ Hyperagents
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.19461v1
πŸ‘₯ Authors: Jenny Zhang, Bingchen Zhao, Wannan Yang, Jakob Foerster (possible past University Of Oxford affiliation), Jeff Clune (possible past Openai (United States) affiliation), Minqi Jiang, Sam Devlin, Tatiana Shavrina
Abstract

Self-improving AI systems aim to reduce reliance on human engineering by learning to improve their own learning and problem-solving processes. Existing approaches to self-improvement rely on fixed, handcrafted meta-level mechanisms, fundamentally limiting how fast such systems can improve. The Darwin GΓΆdel Machine (DGM) demonstrates open-ended self-improvement in coding by repeatedly generating and evaluating self-modified variants. Because both evaluation and self-modification are coding tasks,...

πŸ“„ Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.19220v1
πŸ‘₯ Authors: Zhuolin Yang, Zihan Liu, Yang Chen (possible past Tencent (China) affiliation), Wenliang Dai, Boxin Wang, Sheng-Chieh Lin, Chankyu Lee, Yangyi Chen, Dongfu Jiang, Jiafan He, Renjie Pi, Grace Lam, Nayeon Lee, Alexander Bukharin, Mohammad Shoeybi (possible past Nvidia (United States) affiliation), Bryan Catanzaro (possible past University Of California, Berkeley affiliation), Wei Ping (possible past Baidu (China) affiliation)
Abstract

We introduce Nemotron-Cascade 2, an open 30B MoE model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities. Despite its compact size, its mathematical and coding reasoning performance approaches that of frontier open models. It is the second open-weight LLM, after DeepSeekV3.2-Speciale-671B-A37B, to achieve Gold Medal-level performance in the 2025 International Mathematical Olympiad (IMO), the International Olympiad in Informatics (IOI), and the ICP...

πŸ“„ SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.19173v1
πŸ‘₯ Authors: Edward Lin, Sahil Modi, Siva Kumar Sastry Hari (possible past Nvidia (United States) affiliation), Qijing Huang, Zhifan Ye, Nestor Qin, Fengzhe Zhou, Yuan Zhang (possible past Google (United States) affiliation), Jingquan Wang, Sana Damani, Dheeraj Peri, Ouye Xie, Aditya Kane, Moshe Maor, Michael Behar, Triston Cao, Rishabh Mehta, Vartika Singh, Vikram Sharma Mailthody, Terry Chen, Zihao Ye, Hanfeng Chen, Tianqi Chen (possible past University Of Washington affiliation), Vinod Grover, Wei Chen, Wei Liu (possible past Tsinghua University affiliation), Eric Chung, Luis Ceze, Roger Bringmann, Cyril Zeller, Michael Lightstone, Christos Kozyrakis (possible past Stanford University affiliation), Humphrey Shi
Abstract

As agentic AI systems become increasingly capable of generating and optimizing GPU kernels, progress is constrained by benchmarks that reward speedup over software baselines rather than proximity to hardware-efficient execution. We present SOL-ExecBench, a benchmark of 235 CUDA kernel optimization problems extracted from 124 production and emerging AI models spanning language, diffusion, vision, audio, video, and hybrid architectures, targeting NVIDIA Blackwell GPUs. The benchmark covers forward...

πŸ“„ LuMamba: Latent Unified Mamba for Electrode Topology-Invariant and Efficient EEG Modeling
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.19100v1
πŸ‘₯ Authors: DanaΓ© Broustail, Anna Tegon, Thorir Mar Ingolfsson, Yawei Li (possible past Google (United States) affiliation), Luca Benini (possible past Eth Zurich affiliation)
Abstract

Electroencephalography (EEG) enables non-invasive monitoring of brain activity across clinical and neurotechnology applications, yet building foundation models for EEG remains challenging due to differing electrode topologies and computational scalability, as Transformer architectures incur quadratic sequence complexity. As a joint solution, we propose LuMamba (Latent Unified Mamba), a self-supervised framework combining topology-invariant encodi...

πŸ“„ Em-Garde: A Propose-Match Framework for Proactive Streaming Video Understanding
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.19054v1
πŸ‘₯ Authors: Yikai Zheng, Xin Ding, Yifan Yang (possible past Tencent (China) affiliation), Shiqi Jiang, Hao Wu (possible past Tencent (China) affiliation), Qianxi Zhang, Weijun Wang (possible past Google (United States) affiliation), Ting Cao, Yunxin Liu
Abstract

Recent advances in Streaming Video Understanding have enabled a new interaction paradigm where models respond proactively to user queries. Current proactive VideoLLMs rely on per-frame triggering decisions, which suffer from an efficiency-accuracy dilemma. We propose Em-Garde, a novel framework that decouples semantic understanding from streaming perception. At query time, the Instruction-Guided Proposal Parser transforms user queries into structured, perceptually grounded visual proposals...

πŸ“„ Reasoning over mathematical objects: on-policy reward modeling and test time aggregation
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.18886v1
πŸ‘₯ Authors: Pranjal Aggarwal, Marjan Ghazvininejad (possible past Meta (United States) affiliation), Seungone Kim, Ilia Kulikov, Jack Lanchantin, Xian Li (possible past Meta (United States) affiliation), Tianjian Li, Bo Liu (possible past Meta (United States) affiliation), Graham Neubig (possible past Carnegie Mellon University affiliation), Anaelia Ovalle, Swarnadeep Saha, Sainbayar Sukhbaatar, Sean Welleck, Jason Weston (possible past Stanford University affiliation), Chenxi Whitehouse, Adina Williams (possible past Eth Zurich affiliation), Jing Xu (possible past Meta (United States) affiliation), Ping Yu, Weizhe Yuan, Jingyu Zhang, Wenting Zhao
Abstract

The ability to precisely derive mathematical objects is a core requirement for downstream STEM applications, including mathematics, physics, and chemistry, where reasoning must culminate in formally structured expressions. Yet, current LM evaluations of mathematical and scientific reasoning rely heavily on simplified answer formats such as numerical values or multiple choice options due to the convenience of automated assessment. In this paper we provide three contributions for improving reasoni...

πŸ“„ ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.18815v1
πŸ‘₯ Authors: Hao Zhang (possible past Tencent (China) affiliation), Mingjie Liu, Shaokun Zhang, Songyang Han, Jian Hu, Zhenghui Jin, Yuchi Zhang, Shizhe Diao, Ximing Lu, Binfeng Xu, Zhiding Yu (possible past Nvidia (United States) affiliation), Jan Kautz (possible past Nvidia (United States) affiliation), Yi Dong
Abstract

Multi-turn LLM agents are increasingly important for solving complex, interactive tasks, and reinforcement learning (RL) is a key ingredient for improving their long-horizon behavior. However, RL training requires generating large numbers of sandboxed rollout trajectories, and existing infrastructures often couple rollout orchestration with the training loop, making systems hard to migrate and maintain. Under the rollout-as-a-service philosophy, we present ProRL Agent, a scalable infrastructure...

πŸ“„ dTRPO: Trajectory Reduction in Policy Optimization of Diffusion Large Language Models
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.18806v1
πŸ‘₯ Authors: Wenxuan Zhang, Lemeng Wu, Changsheng Zhao, Ernie Chang, Mingchen Zhuge, Zechun Liu, Andy Su, Hanxian Huang, Jun Chen, Chong Zhou, Raghuraman Krishnamoorthi, Vikas Chandra (possible past Meta (United States) affiliation), Mohamed Elhoseiny (possible past Meta (United States) affiliation), Wei Wen (possible past Google (United States) affiliation)
Abstract

Diffusion Large Language Models (dLLMs) introduce a new paradigm for language generation, which in turn presents new challenges for aligning them with human preferences. In this work, we aim to improve the policy optimization for dLLMs by reducing the cost of the trajectory probability calculation, thereby enabling scaled-up offline policy training. We prove that: (i) under reference policy regularization, the probability ratio of the newly unmasked tokens is an unbiased estimate of that of inte...

πŸ“„ CausalRM: Causal-Theoretic Reward Modeling for RLHF from Observational User Feedbacks
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.18736v1
πŸ‘₯ Authors: Hao Wang (possible past Tsinghua University affiliation), Licheng Pan, Zhichao Chen, Chunyuan Zheng, Zhixuan Chu, Xiaoxi Li, Yuan Lu, Xinggao Liu, Haoxuan Li, Zhouchen Lin (possible past Peking University affiliation)
Abstract

Despite the success of reinforcement learning from human feedback (RLHF) in aligning language models, current reward modeling heavily relies on experimental feedback data collected from human annotators under controlled and costly conditions. In this work, we introduce observational reward modeling -- learning reward models with observational user feedback (e.g., clicks, copies, and upvotes) -- as a scalable and cost-effective alternative. We identify two fundamental challenges in this setting: ...

πŸ“„ Beyond Single Tokens: Distilling Discrete Diffusion Models via Discrete MMD
πŸ—“οΈ Published: 3/20/2026
πŸ”— http://arxiv.org/abs/2603.20155v1
πŸ‘₯ Authors: Emiel Hoogeboom, David Ruhe, Jonathan Heek (possible past Google (United States) affiliation), Thomas Mensink, Tim Salimans (possible past Openai (United States) affiliation)
Abstract

It is currently difficult to distill discrete diffusion models. In contrast, the continuous diffusion literature has many distillation methods that can reduce sampling steps to a handful. Our method, Discrete Moment Matching Distillation (D-MMD), leverages ideas that have been highly successful in the continuous domain. Whereas previous discrete distillation methods collapse, D-MMD maintains high quality and diversity (given sufficient sampling steps). This is demonstrated on both text ...

πŸ“„ Deep Autocorrelation Modeling for Time-Series Forecasting: Progress and Prospects
πŸ—“οΈ Published: 3/20/2026
πŸ”— http://arxiv.org/abs/2603.19899v1
πŸ‘₯ Authors: Hao Wang (possible past Tsinghua University affiliation), Licheng Pan, Qingsong Wen, Jialin Yu, Zhichao Chen, Chunyuan Zheng, Xiaoxi Li, Zhixuan Chu, Chao Xu, Mingming Gong, Haoxuan Li, Yuan Lu, Zhouchen Lin (possible past Peking University affiliation), Philip Torr (possible past University Of Oxford affiliation), Yan Liu (possible past Tencent (China) affiliation)
Abstract

Autocorrelation is a defining characteristic of time-series data, where each observation is statistically dependent on its predecessors. In the context of deep time-series forecasting, autocorrelation arises in both the input history and the label sequences, presenting two central research challenges: (1) designing neural architectures that model autocorrelation in history sequences, and (2) devising learning objectives that model autocorrelation in label sequences. Recent studies have made stri...
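The autocorrelation this survey centers on can be computed directly. A minimal sketch on a synthetic AR(1) series (a standard illustration, not tied to any model in the paper):

```python
import numpy as np

# Toy AR(1) series x_t = 0.8 * x_{t-1} + noise, whose autocorrelation
# decays roughly as 0.8**lag -- the dependence structure at issue.
rng = np.random.default_rng(0)
n = 2000
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.8 * x[t - 1] + rng.standard_normal()

def autocorr(series, lag):
    """Sample autocorrelation of a 1-D series at a given positive lag."""
    xc = series - series.mean()
    return np.dot(xc[:-lag], xc[lag:]) / np.dot(xc, xc)

print([round(autocorr(x, k), 2) for k in (1, 2, 5)])
```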

πŸ“„ FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization
πŸ—“οΈ Published: 3/20/2026
πŸ”— http://arxiv.org/abs/2603.19835v1
πŸ‘₯ Authors: Chiyu Ma, Shuo Yang, Kexin Huang (possible past Stanford University affiliation), Jinda Lu, Haoming Meng, Shangshang Wang, Bolin Ding, Soroush Vosoughi (possible past Google (United States) affiliation), Guoyin Wang, Jingren Zhou
Abstract

We present Future-KL Influenced Policy Optimization (FIPO), a reinforcement learning algorithm designed to overcome reasoning bottlenecks in large language models. While GRPO-style training scales effectively, it typically relies on outcome-based rewards (ORM) that distribute a global advantage uniformly across every token in a trajectory. We argue that this coarse-grained credit assignment imposes a performance ceiling by failing to distinguish critical logical pivots from trivial tokens. FIPO ...
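The uniform credit assignment the abstract argues against can be made concrete with a GRPO-style sketch. The rewards and token counts below are invented for illustration; this shows the baseline behavior, not FIPO itself.

```python
import numpy as np

# Hypothetical group of 4 sampled trajectories with scalar outcome rewards.
rewards = np.array([1.0, 0.0, 1.0, 0.0])
token_counts = [5, 3, 4, 6]  # tokens per trajectory (illustrative)

# Group-normalized advantage: (r - mean) / std, one scalar per trajectory.
adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# Outcome-based credit assignment broadcasts that single scalar to every
# token, so a decisive logical pivot gets the same credit as a filler token.
per_token_adv = [np.full(n, a) for n, a in zip(token_counts, adv)]

print(adv)               # one advantage per trajectory
print(per_token_adv[0])  # the same value repeated across all tokens
```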

πŸ“„ Spectrally-Guided Diffusion Noise Schedules
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.19222v1
πŸ‘₯ Authors: Carlos Esteves (possible past Google (United States) affiliation), Ameesh Makadia (possible past Google (United States) affiliation)
Abstract

Denoising diffusion models are widely used for high-quality image and video generation. Their performance depends on noise schedules, which define the distribution of noise levels applied during training and the sequence of noise levels traversed during sampling. Noise schedules are typically handcrafted and require manual tuning across different resolutions. In this work, we propose a principled way to design per-instance noise schedules for pixel diffusion, based on the image's spectral proper...
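As a rough illustration of the kind of per-instance spectral statistic such a schedule could condition on (not the paper's actual construction), here is a radially averaged power spectrum computed with NumPy on a synthetic image:

```python
import numpy as np

# Synthetic stand-in for an image; any 2D array works here.
rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))

# 2D power spectrum via FFT, shifted so zero frequency sits at the center.
spec = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2

# Bin spectral power by integer radial frequency around the center.
h, w = spec.shape
yy, xx = np.indices((h, w))
r = np.hypot(yy - h // 2, xx - w // 2).astype(int)
radial = np.bincount(r.ravel(), weights=spec.ravel()) / np.bincount(r.ravel())

print(radial.shape)  # mean power per radial frequency bin
```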

πŸ“„ DriveTok: 3D Driving Scene Tokenization for Unified Multi-View Reconstruction and Understanding
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.19219v1
πŸ‘₯ Authors: Dong Zhuo, Wenzhao Zheng, Sicheng Zuo, Siming Yan, Lu Hou, Jie Zhou (possible past Tsinghua University affiliation), Jiwen Lu (possible past Tsinghua University affiliation)
Abstract

With the growing adoption of vision-language-action models and world models in autonomous driving systems, scalable image tokenization becomes crucial as the interface for the visual modality. However, most existing tokenizers are designed for monocular and 2D scenes, leading to inefficiency and inter-view inconsistency when applied to high-resolution multi-view driving scenes. To address this, we propose DriveTok, an efficient 3D driving scene tokenizer for unified multi-view reconstruction and...

πŸ“„ Fast and Effective Computation of Generalized Symmetric Matrix Factorization
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.19147v1
πŸ‘₯ Authors: Lei Yang (possible past Google (United States) affiliation), Han Wan, Min Zhang (possible past Tsinghua University affiliation), Ling Liang
Abstract

In this paper, we study a nonconvex, nonsmooth, and non-Lipschitz generalized symmetric matrix factorization model that unifies a broad class of matrix factorization formulations arising in machine learning, image science, engineering, and related areas. We first establish two exactness properties. On the modeling side, we prove an exact penalty property showing that, under suitable conditions, the symmetry-inducing quadratic penalty enforces symmetry whenever the penalty parameter is sufficient...
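The symmetry-inducing quadratic penalty mentioned in the abstract can be sketched with plain gradient descent. The model below (A ~ X @ Y.T plus lam * ||X - Y||_F^2) is a simplified stand-in for illustration, not the paper's exact formulation or algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((6, 3))
A = B @ B.T                          # symmetric PSD target
X = rng.standard_normal((6, 3))
Y = rng.standard_normal((6, 3))
lam, lr = 2.0, 0.002                 # penalty weight, step size

init_res = np.linalg.norm(X @ Y.T - A)
init_diff = np.linalg.norm(X - Y)
init_obj = init_res**2 + lam * init_diff**2

for _ in range(5000):
    R = X @ Y.T - A                  # reconstruction residual
    # Gradients of ||X Y^T - A||_F^2 + lam * ||X - Y||_F^2.
    gX = 2 * R @ Y + 2 * lam * (X - Y)
    gY = 2 * R.T @ X - 2 * lam * (X - Y)
    X -= lr * gX
    Y -= lr * gY

# The penalty shrinks ||X - Y|| while the loss shrinks the residual.
print(np.linalg.norm(X - Y), np.linalg.norm(X @ Y.T - A))
```

With a large enough penalty parameter the two factors are driven together, which is the intuition behind the exact penalty property the paper proves under suitable conditions.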

πŸ“„ STEP: Scientific Time-Series Encoder Pretraining via Cross-Domain Distillation
πŸ—“οΈ Published: 3/19/2026
πŸ”— http://arxiv.org/abs/2603.18688v1
πŸ‘₯ Authors: Chen Zhang (possible past Peking University affiliation), Liwei Liu, Jun Tao, Xiaoyu Yang, Xuenan Xu, Kai Chen (possible past Shanghai Jiao Tong University affiliation), Bowen Zhou, Wen Wu, Chao Zhang
Abstract

Scientific time series are central to scientific AI but are typically sparse, highly heterogeneous, and limited in scale, making unified representation learning particularly challenging. Meanwhile, foundation models pretrained on relevant time series domains such as audio, general time series, and brain signals contain rich knowledge, but their applicability to scientific signals remains underexplored. In this paper, we investigate the transferability and complementarity of foundation models fro...

*Notable papers are those with at least two authors from a "big" AI/ML lab.