📄 Notable* Recent AI/ML arXiv Papers

📄 SpatialWorld: Benchmarking Interactive Spatial Reasoning of Multimodal Agents in Real-World Tasks
🗓️ Published: 6/8/2026
🔗 http://arxiv.org/abs/2606.09669v1
👥 Authors: Hongcheng Gao, Hailong Qu, Jingyi Tang, Jiahao Wang, Zihao Huang, Hengkang Qiao, Shihong Huang, Junming Yang, Yi Li (possible past University of Washington affiliation), Hongyixuan Yuan, Wenjie Li, Bohan Zeng, Wenbo Li, Bo Wang (possible past Tencent (China) affiliation), Jianhui Liu, Olive Huang, Haoyang Huang, Wentao Zhang (possible past Mila - Quebec Artificial Intelligence Institute affiliation), Guoqing Huang, Nan Duan, Yinpeng Dong (possible past Tsinghua University affiliation)
Abstract

Spatial reasoning is a foundational capability for multimodal large language models (MLLMs) to perceive and operate within the physical world. However, existing benchmarks predominantly rely on passive evaluation (e.g., static VQA) or simulator-specific pipelines, failing to assess general interactive spatial understanding. We introduce SpatialWorld, a unified benchmark designed specifically for evaluating the interactive spatial understanding of multimodal agents in complex real-world tasks. In...

📄 End-to-End Context Compression at Scale
🗓️ Published: 6/8/2026
🔗 http://arxiv.org/abs/2606.09659v1
👥 Authors: Ang Li (possible past Google (United States) affiliation), Sean Mcleish, Haozhe Chen, Nimit Kalra, Zaiqian Chen, Artem Gazizov, Venkata Anoop Suhas Kumar Morisetty, Bhavya Kailkhura, Harshitha Menon, Zhuang Liu (possible past University of California, Berkeley affiliation), Brian R. Bartoldson, Tom Goldstein (possible past Meta (United States) affiliation), Sanae Lotfi, Micah Goldblum, Pavel Izmailov
Abstract

Long-context language model inference is bottlenecked by memory, as the KV cache grows with context length. Recent techniques to compress the KV cache fall short: they either degrade model quality substantially or require considerable time and compute to compress a single long prompt. Furthermore, many methods require the input to fit within the target model's context window, and are generally incompatible with modern production inference engines. Encoder-decoder compressors, which map a long to...
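
The memory bottleneck described in this abstract follows from simple arithmetic. In a standard decoder-only transformer, the KV cache stores one key vector and one value vector per layer, per KV head, and per token, so its size grows linearly with context length. The sketch below computes that size; the model dimensions in the usage example are hypothetical and are not taken from the paper.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1,
                   bytes_per_elem: int = 2) -> int:
    """Size in bytes of a standard, uncompressed KV cache.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_elem=2 corresponds to fp16/bf16 storage.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


if __name__ == "__main__":
    # Hypothetical model: 32 layers, 8 KV heads of dimension 128.
    for seq_len in (8_192, 131_072, 1_048_576):
        gib = kv_cache_bytes(32, 8, 128, seq_len) / 2**30
        print(f"{seq_len:>9} tokens -> {gib:.1f} GiB")
```

With these assumed dimensions each token costs 128 KiB, so an 8,192-token prompt needs 1 GiB, a 131,072-token prompt needs 16 GiB, and a 1,048,576-token prompt needs 128 GiB for a single sequence.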

📄 Bayesian Selective Latent Inference for Wastewater-First Influenza Monitoring
🗓️ Published: 6/8/2026
🔗 http://arxiv.org/abs/2606.09433v1
👥 Authors: Yixuan Zhang, Yang Song (possible past Stanford University affiliation), Hao Wang (possible past Tsinghua University affiliation), Samir Bhatt, Hengguan Huang
Abstract

Wastewater influenza surveillance can reveal community circulation before clinical reporting, but wastewater alone is not a fully identifiable proxy for human burden. Existing wastewater models assume a fixed evidence set, while generic evidence-acquisition methods treat official surveillance streams as interchangeable costly features. We cast wastewater-first influenza monitoring as a selective decision problem: starting from mandatory wastewater evidence, the system must decide whether wastewa...

📄 Anything2Skill: Compiling External Knowledge into Reusable Skills for Agents
🗓️ Published: 6/8/2026
🔗 http://arxiv.org/abs/2606.09316v1
👥 Authors: Qianjun Pan, Yutao Yang, Junsong Li, Jie Zhou (possible past Tsinghua University affiliation), Kai Chen (possible past Shanghai Jiao Tong University affiliation), Xin Li (possible past Google (United States) affiliation), Qin Chen, Liang He
Abstract

Retrieval-augmented generation (RAG) enables agents to access external knowledge at inference time, but it primarily retrieves fragmented declarative evidence, leaving agents to repeatedly infer task procedures from passages, manuals, examples, logs, or trajectories. This raises a fundamental question: can skills extracted from external knowledge bases be installed into an agent, enabling it to rapidly approximate domain expertise? In this paper, we propose Anything2Skill, a taxonomy-guided fram...

📄 DynaOD: Dynamic Origin-Destination Flow Generation with Discrete-to-Continuous Temporal Semantic Modeling
🗓️ Published: 6/8/2026
🔗 http://arxiv.org/abs/2606.09086v1
👥 Authors: Jie Zhao (possible past Baidu (China) affiliation), Xianqi Dai, Jie Feng (possible past Tsinghua University affiliation), Huandong Wang, Yong Li (possible past Tsinghua University affiliation)
Abstract

Dynamic origin-destination (OD) flow generation seeks to synthesize realistic mobility dynamics from temporal context alone, without relying on historical OD observations. A key challenge is to translate semantic temporal signals into temporally coherent OD patterns while preserving the inherent spatial heterogeneity of urban regions. We propose DynaOD, a semantic-driven framework that models temporal dynamics through two complementary perspectives: discrete directional trends that characterize ...

📄 FlashMemory-DeepSeek-V4: Lightning Index Ultra-Long Context via Lookahead Sparse Attention
🗓️ Published: 6/8/2026
🔗 http://arxiv.org/abs/2606.09079v1
👥 Authors: Yan Wang (possible past Tencent (China) affiliation), Qifan Zhang, Jiachen Yu, Tian Liang, Dongyang Ma, Xiang Hu, Zibo Lin, Chunyang Li, Zhichao Wang, Jia Li (possible past Google (United States) affiliation), Yujiu Yang (possible past Tsinghua University affiliation), Haitao Mi, Dong Yu (possible past Tencent (China) affiliation)
Abstract

Conventional LLMs keep the full KV cache loaded during decoding, causing a severe GPU memory bottleneck for ultra-long context serving. In this report, we propose Lookahead Sparse Attention (LSA), a novel inference paradigm powered by a Neural Memory Indexer built upon the DeepSeek-V4 architecture. Rather than passively attending to all historical tokens, LSA proactively predicts future context demands and preserves only the query-critical KV chunks in the GPU memory. Crucially, we instantiate t...
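
As a rough illustration of keeping only query-critical KV chunks in GPU memory, the sketch below scores fixed-size chunks of cached keys against the current query and keeps the top-k. This is a generic top-k chunk-selection heuristic assumed for illustration; scoring a chunk by its maximum query-key dot product is our choice. The sketch does not model LSA's Neural Memory Indexer or its lookahead prediction of future context demands.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def select_chunks(query: Sequence[float], keys: List[Sequence[float]],
                  chunk_size: int, top_k: int) -> List[int]:
    """Return indices of the top_k chunks most relevant to `query`.

    A chunk's score is the maximum query-key dot product inside it
    (an assumed heuristic). Indices are returned in ascending order so the
    retained chunks keep their original sequence order.
    """
    scores = []
    for start in range(0, len(keys), chunk_size):
        chunk = keys[start:start + chunk_size]
        scores.append(max(dot(query, k) for k in chunk))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:top_k])


if __name__ == "__main__":
    q = [1.0, 0.0]
    ks = [[0.1, 0.9], [0.2, 0.8],   # chunk 0: low relevance
          [0.9, 0.1], [0.8, 0.3],   # chunk 1: high relevance
          [0.0, 1.0], [0.5, 0.5]]   # chunk 2: medium relevance
    print(select_chunks(q, ks, chunk_size=2, top_k=2))  # -> [1, 2]
```

Only the keys and values of the selected chunks would stay resident; the rest could be offloaded, which is where the memory saving comes from.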

📄 BareWave: Waveform-Native Flow-Matching Text-to-Speech
🗓️ Published: 6/8/2026
🔗 http://arxiv.org/abs/2606.09048v1
👥 Authors: Wei Fan (possible past Tencent (China) affiliation), Chao-Hong Tan, Qian Chen (possible past Shanghai Jiao Tong University affiliation), Wen Wang, Xiangang Li, Kejiang Chen, Weiming Zhang, Nenghai Yu
Abstract

Removing intermediate representations and separately trained decoding stages has become an important direction in generative modeling. In text-to-speech, however, high-quality systems are still commonly built through an intermediate acoustic representation before waveform synthesis. In this work, we present BareWave, a fully waveform-native framework for direct text-to-wave generation in flow-matching TTS. In this setting, we identify three training challenges: raw-waveform modeling lacks a...
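
For readers unfamiliar with flow matching, the sketch below builds one training target under the common linear (rectified-flow) interpolation path. It uses a point x_t = (1 - t) * x0 + t * x1 between Gaussian noise x0 and a data sample x1, and the target velocity at that point is x1 - x0. In a waveform-native setting, x1 would be raw audio samples rather than an intermediate acoustic representation. This is the generic conditional flow-matching objective, assumed here for illustration. BareWave's actual path, text conditioning, and loss are not described in the excerpt.

```python
import random
from typing import List, Tuple


def flow_matching_pair(x1: List[float], t: float,
                       rng: random.Random) -> Tuple[List[float], List[float]]:
    """Return (x_t, target_velocity) for a linear probability path.

    x0 is standard Gaussian noise and x1 is a data sample (e.g. waveform
    samples). A model v(x_t, t, text) would be trained to regress the target.
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    velocity = [b - a for a, b in zip(x0, x1)]
    return x_t, velocity


def mse(pred: List[float], target: List[float]) -> float:
    """Mean squared error used as the regression loss."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


if __name__ == "__main__":
    rng = random.Random(0)
    waveform = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]  # toy audio samples
    x_t, v = flow_matching_pair(waveform, t=rng.random(), rng=rng)
    # Loss of a placeholder model that always predicts zero velocity.
    print(mse([0.0] * len(v), v))
```

At sampling time, integrating the learned velocity field from t = 0 to t = 1 carries noise to a waveform.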

📄 Personalization Meets Safety: Mechanisms, Risks, and Mitigations in Personalized LLMs
🗓️ Published: 6/8/2026
🔗 http://arxiv.org/abs/2606.09038v1
👥 Authors: Yanyan Luo, Xue Han, Ruiqiao Bai, Xin Huang (possible past Baidu (China) affiliation), Yitong Wang (possible past Tencent (China) affiliation), Qian Hu, Qing Wang, Chunxu Zhao, Jie Liu (possible past Tencent (China) affiliation), Cong Geng, Lehao Xing, Pengwei Hu, Junlan Feng
Abstract

Large Language Models (LLMs) have enabled increasingly personalized interactions by adapting to users' preferences, contexts, and long-term histories. However, the mechanisms that enable personalization also expand the safety landscape in ways not systematically addressed by existing literature. Existing reviews typically focus either on personalization or safety, leaving their intersection largely unexplored. We present the first comprehensive, safety-aware review of personalized LLMs. We organ...

📄 PACT: Learning Diverse Diagnostic Strategies via Privileged Synthesis and Branch Consensus
🗓️ Published: 6/8/2026
🔗 http://arxiv.org/abs/2606.08938v1
👥 Authors: Gen Li (possible past University of Edinburgh affiliation), Yuanze Hu, Zhichao Yang, Qingchen Yu, Jianwei Lv, Yue Guo, Yujing Liu, Faguo Wu, Hongwei Zheng, Xiandong Li, Bo Yuan, Yifan Sun (possible past Baidu (China) affiliation), Zhaoxin Fan
Abstract

Clinical diagnosis requires flexible use of multiple reasoning paradigms under incomplete patient information. Existing LLM-based medical agents show strong medical reasoning ability, but single-paradigm or naively mixed dialogue supervision makes these paradigms difficult to learn without interference. We propose PACT (Periodic Anchor Consensus Training), a framework that couples supervised multi-paradigm dialogue synthesis with consensus-based Branch training. At the data level, ...

📄 PAI: Preserving Amplitude Information in Representation-Based Time-Series Anomaly Detection
🗓️ Published: 6/8/2026
🔗 http://arxiv.org/abs/2606.08935v1
👥 Authors: Kang Zhang, Wei Jian Lau, Shoushou Ren, Dong Lin (possible past Google (United States) affiliation), Joon Son Chung (possible past University of Oxford affiliation), Chuanhao Sun
Abstract

Representation-based time-series anomaly detection algorithms significantly outperform other methods on diverse anomaly detection tasks. However, in our evaluation we observe that they suffer from a major limitation: their learned embeddings are often amplitude-agnostic. Losing amplitude information can degrade performance on amplitude-related anomalies, and this failure is prevalent across all existing representation-based methods. To address the aforementioned issues, we propose a new anomaly scor...
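
A simple way to see how amplitude information can disappear is per-window z-normalization, a common preprocessing step assumed here for illustration. The excerpt does not say which mechanism makes the evaluated methods amplitude-agnostic. After normalization, a window and a scaled copy of it become identical, so any encoder applied afterward cannot tell them apart.

```python
import statistics
from typing import List


def z_normalize(window: List[float]) -> List[float]:
    """Subtract the mean and divide by the population standard deviation."""
    mu = statistics.fmean(window)
    sigma = statistics.pstdev(window)
    if sigma == 0:
        return [0.0 for _ in window]
    return [(x - mu) / sigma for x in window]


if __name__ == "__main__":
    normal = [1.0, 2.0, 3.0, 2.0, 1.0]
    scaled = [5.0 * x for x in normal]  # same shape, 5x the amplitude
    a, b = z_normalize(normal), z_normalize(scaled)
    # The normalized windows match up to float rounding,
    # so the amplitude anomaly is invisible downstream.
    print(all(abs(x - y) < 1e-9 for x, y in zip(a, b)))  # -> True
```

Any representation computed only from the normalized window inherits this blind spot. The amplitude has to be carried through some other channel, for example the discarded mean and standard deviation.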

📄 From Statute to Control Flow: Span-Grounded Deontic Trees for Defeasible Scope Parsing
🗓️ Published: 6/8/2026
🔗 http://arxiv.org/abs/2606.08932v1
👥 Authors: Jian Chen (possible past Baidu (China) affiliation), Siyuan Li (possible past Tencent (China) affiliation), Chucheng Wan, Zixuan Yuan
Abstract

Rule-following agents tasked with executing policies and regulations often fail via Silent Scope Omission (SSO): a model applies a general rule but silently drops nested exceptions or counter-exceptions, producing outputs that appear compliant yet break on important edge cases. Although such failures are often framed as an agentic-systems problem, the underlying bottleneck is statutory and policy understanding, a capability typically studied in legal NLP. However, most existing legal NLP benchma...
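
To make Silent Scope Omission concrete, the sketch below encodes a hypothetical policy as control flow: a general rule ("staff may access the repository"), an exception ("except contractors"), and a counter-exception ("unless the contractor has signed an NDA"). A second version silently drops the nested exception. The policy text, field names, and function names are invented for illustration. They are not taken from the paper's benchmark or its Span-Grounded Deontic Tree format.

```python
from dataclasses import dataclass


@dataclass
class Request:
    is_staff: bool
    is_contractor: bool
    signed_nda: bool


def may_access_full(r: Request) -> bool:
    """Hypothetical policy with its exception and counter-exception intact."""
    if not r.is_staff:
        return False              # general rule does not apply
    if r.is_contractor:
        return r.signed_nda       # exception, overridden by the counter-exception
    return True


def may_access_omitted(r: Request) -> bool:
    """The same policy with the nested exception silently dropped."""
    return r.is_staff


if __name__ == "__main__":
    typical = Request(is_staff=True, is_contractor=False, signed_nda=False)
    edge_case = Request(is_staff=True, is_contractor=True, signed_nda=False)
    print(may_access_full(typical), may_access_omitted(typical))      # -> True True
    print(may_access_full(edge_case), may_access_omitted(edge_case))  # -> False True
```

Both versions agree on the typical request, which is why the omission looks compliant; they diverge only on the edge case the exception was written for.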

📄 A multi-agent system for spine MRI report generation from multi-sequence imaging
🗓️ Published: 6/8/2026
🔗 http://arxiv.org/abs/2606.08897v1
👥 Authors: Zhiping Xiao, Junwei Yang, Gongbo Sun, Han Zhang (possible past Tsinghua University affiliation), Hanwen Xu, Yi Yao, Zachary D. Miller, William E. King, Mohammed M. Kanani, Jalal B. Andre, Sammy Chu, Ming Zhang (possible past Peking University affiliation), Paul E. Kinahan, Nathan M. Cross, Sheng Wang (possible past Tencent (China) affiliation)
Abstract

Spinal pathology is a leading cause of pain and disability worldwide. Spine MRI is central to clinical evaluation, yet its interpretation remains complex and time-consuming, requiring integration of information across multiple imaging sequences and anatomical regions. Despite recent advances in automated MRI analysis, effectively combining multi-sequence data while preserving sequence-specific diagnostic information remains an open challenge. Here we present SpineAgent, a multi-agent framework f...

📄 FiberTune: Preserving Action-Fiber Visual Residuals in Vision-Language-Action Fine-Tuning
🗓️ Published: 6/7/2026
🔗 http://arxiv.org/abs/2606.08653v1
👥 Authors: Haihao Lin, Xiangsheng Huang, Xiao Yang (possible past Tencent (China) affiliation), Weibang Zhou, Yiqi Zhang, Bo Yang (possible past Tencent (China) affiliation), Simin Zeng, Jiawei Yang, Zhengyang Wang, Jiahui Du
Abstract

Action-supervised fine-tuning of vision-language-action (VLA) policies fits demonstrations effectively but constrains only the directions that change predicted actions, leaving visual structure consistent across action-equivalent states free to collapse. We formalize this as residual visual collapse along local action fibers and propose FiberTune, a training-time objective that preserves teacher-structured visual residuals without adding inference-time overhead. FiberTune uses an online action p...

📄 Data-driven discovery of governing differential equations across physical systems
🗓️ Published: 6/8/2026
🔗 http://arxiv.org/abs/2606.09638v1
👥 Authors: Siyu Lou, Hao Xu, Wenguan Wang (possible past ETH Zurich affiliation), Lu Lu, Hao Sun, Yang Liu (possible past Tsinghua University affiliation), Linfeng Zhang, Dongxiao Zhang, Yuntian Chen
Abstract

Differential equations play a critical role in scientific discovery because they provide a mathematical framework to describe the behaviour of physical phenomena. As a promising alternative to traditional first principles, data-driven differential equation discovery has attracted increasing attention for its ability to infer governing laws directly from experimental or simulated data, especially when the underlying physics is unclear. However, the field has expanded rapidly along diverse methodo...

📄 Breaking the Tokenizer Barrier: On-Policy Distillation across Model Families
🗓️ Published: 6/8/2026
🔗 http://arxiv.org/abs/2606.09456v1
👥 Authors: Yifan Niu, Han Xiao, Dongyi Liu, Zelong Wang, Dihong Gong (possible past Tencent (China) affiliation), Yasheng Wang, Jia Li (possible past Google (United States) affiliation)
Abstract

On-Policy Distillation (OPD) has become a core technique in the post-training of Large Language Models (LLMs) for transferring knowledge from domain experts to student models. However, existing OPD methods require teacher and student models to share the same tokenizer, restricting OPD to models within the same series. Current mainstream practice typically employs Supervised Fine-Tuning (SFT) on teacher-generated responses for cross-tokenizer distillation, which fails to ...
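
Part of why cross-tokenizer distillation is hard is that two tokenizers segment the same text differently, so teacher and student positions do not correspond one-to-one. A common workaround, sketched below as an assumption rather than as this paper's method, is to align the two token sequences on shared character boundaries and compare them span by span. The tokenizations in the example are hand-written and were not produced by any real tokenizer.

```python
from itertools import accumulate
from typing import List, Tuple


def boundaries(tokens: List[str]) -> List[int]:
    """Character offset at which each token ends."""
    return list(accumulate(len(t) for t in tokens))


def aligned_spans(teacher: List[str],
                  student: List[str]) -> List[Tuple[List[str], List[str]]]:
    """Group tokens into spans whose end offsets are shared by both tokenizations."""
    if "".join(teacher) != "".join(student):
        raise ValueError("tokenizations must cover the same text")
    t_ends, s_ends = boundaries(teacher), boundaries(student)
    shared = sorted(set(t_ends) & set(s_ends))
    spans = []
    ti = si = 0
    for end in shared:
        t_start, s_start = ti, si
        while t_ends[ti] != end:  # advance to the teacher token ending here
            ti += 1
        while s_ends[si] != end:  # advance to the student token ending here
            si += 1
        ti += 1
        si += 1
        spans.append((teacher[t_start:ti], student[s_start:si]))
    return spans


if __name__ == "__main__":
    print(aligned_spans(["un", "believ", "able"], ["unbel", "iev", "able"]))
    # -> [(['un', 'believ'], ['unbel', 'iev']), (['able'], ['able'])]
```

Within each aligned span, span-level quantities such as the total log-probability each model assigns to the span's text are comparable across models. Per-token distributions over two different vocabularies are not directly comparable.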

📄 PBSD: Privileged Bayesian Self-Distillation for Long-Horizon Credit Assignment
🗓️ Published: 6/8/2026
🔗 http://arxiv.org/abs/2606.09348v1
👥 Authors: Yang Tian, Rui Wang (possible past Tencent (China) affiliation), Xumeng Wen, Junjie Li, Shizhao Sun, Lei Song, Jiang Bian (possible past Baidu (China) affiliation), Bo Zhao (possible past National University of Singapore affiliation)
Abstract

Long-horizon agentic tasks pose a fundamental credit assignment challenge for outcome-based reinforcement learning: trajectory-level rewards verify final correctness but provide limited guidance on which intermediate reasoning steps or tool interactions contribute to the outcome. The difficulty is especially pronounced in multi-turn search agents, where successful trajectories may contain misleading actions and failed trajectories may contain valuable evidence-gathering steps. We propose PBSD (Pr...
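
The credit assignment problem is visible in a few lines. When the only reward is a trajectory-level outcome at the final step and there is no discounting, the return-to-go is identical at every step. A misleading step in a successful trajectory therefore receives the same credit as a useful one. The step labels below are hypothetical.

```python
from typing import List


def returns_to_go(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Discounted return G_t = sum_k gamma**k * r_{t+k} for each step t."""
    out = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]


if __name__ == "__main__":
    steps = ["search A", "misleading search B", "read result", "answer"]
    rewards = [0.0, 0.0, 0.0, 1.0]  # outcome reward only at the final step
    for step, g in zip(steps, returns_to_go(rewards)):
        print(f"{step:<20} credit={g}")  # every step gets credit=1.0
```

Setting gamma below 1 does not fix this. It only makes credit decay with distance from the end of the trajectory, regardless of whether a step actually helped.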

📄 MilliVid: Hierarchical Latents for Long-Range Consistency in Video Generation
🗓️ Published: 6/8/2026
🔗 http://arxiv.org/abs/2606.09056v1
👥 Authors: Ishaan Preetam Chandratreya, David Charatan, Basile Van Hoorick, Sergey Zakharov, Vitor Guizilini, Phillip Isola (possible past University of California, Berkeley affiliation), Vincent Sitzmann (possible past Stanford University affiliation)
Abstract

Video generative models have become increasingly powerful, but long-range consistency remains challenging to achieve because even a few dozen frames require impractically long transformer sequence lengths. We show that this issue can be mitigated by generating video using coarse-to-fine rollout within a multi-scale token space. Our approach is simple: first, we pre-train an autoencoder that compresses each frame into a hierarchy of tokens, with levels ranging from the typical latent resolution t...
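
The sequence-length problem is easy to quantify. The numbers below are assumptions chosen only for illustration: a 32×32 latent grid per frame at the finest level and a hypothetical coarse level of 4×4 tokens per frame. The paper's actual hierarchy sizes are not given in the excerpt.

```python
def sequence_length(frames: int, height: int, width: int) -> int:
    """Transformer sequence length when every frame contributes height*width tokens."""
    return frames * height * width


if __name__ == "__main__":
    frames = 48                                  # "a few dozen frames"
    fine = sequence_length(frames, 32, 32)       # assumed typical latent resolution
    coarse = sequence_length(frames, 4, 4)       # hypothetical coarsest level
    print(fine, coarse, fine // coarse)          # -> 49152 768 64
    # The attention-score computation in standard self-attention is quadratic
    # in sequence length, so its cost shrinks by the square of that ratio.
    print((fine // coarse) ** 2)                 # -> 4096
```

Under these assumptions a coarse-to-fine rollout can attend over the whole clip at the coarse level with a 768-token sequence before adding detail. How MilliVid schedules the refinement across levels is not covered by the excerpt.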

📄 IR-SIM: A Lightweight Skill-Native Simulator for Navigation, Learning, and Benchmarking
🗓️ Published: 6/7/2026
🔗 http://arxiv.org/abs/2606.08729v1
👥 Authors: Ruihua Han, Shuai Wang, Chengyang Li, Rui Gao, Xinyi Wang (possible past Carnegie Mellon University affiliation), Zhe Liu, Guoliang Li (possible past Tsinghua University affiliation), Yupu Lu, Qi Hao, Jia Pan, Hengshuang Zhao (possible past University of Oxford affiliation)
Abstract

Simulation plays a key role in automated robotics research supported by large language models (LLMs). However, existing simulators often require custom code or complex interfaces, creating a barrier to rapid prototyping and automated algorithm development. To this end, we propose the Intelligent Robot Simulator (IR-SIM), a lightweight skill-native navigation simulator designed for rapid scenario construction, benchmarking, and robot learning. In IR-SIM, scenarios are entirely defined by YAML con...

📄 Lost in the Non-convex Loss Landscape: How to Fine-tune the Large Time Series Model?
🗓️ Published: 6/7/2026
🔗 http://arxiv.org/abs/2606.08578v1
👥 Authors: Xu Zhang (possible past Tencent (China) affiliation), Peang Wang, Wei Wang (possible past University of Oxford affiliation)
Abstract

Recently, large time series models (LTSMs) have gained increasing attention due to their similarities to large language models, including flexible context length, scalability, and task generality, outperforming advanced task-specific models. However, prior studies indicate that pre-trained LTSMs may exhibit a poorly conditioned non-convex loss landscape, leading to limited trainability. As a result, direct fine-tuning tends to cause overfitting and suboptimal performance, sometimes even worse th...

📄 SAEExplainer: Interpreting SAE Features with Activation-Guided Preference Optimization
🗓️ Published: 6/7/2026
🔗 http://arxiv.org/abs/2606.08496v1
👥 Authors: Jingyi He, Haiyan Zhao, Ruxue Shi, Yanguang Liu, Xin Wang (possible past University of Edinburgh affiliation), Fei Sun (possible past Meta (United States) affiliation), Mengnan Du
Abstract

Although Sparse Autoencoders (SAEs) have mitigated the opacity of large language models (LLMs) by decomposing dense representations into sparse features, explaining these features remains a central challenge. Current explanation methods, however, typically operate within an open-loop paradigm, failing to leverage mechanistic feedback for further refinement. In this paper, we propose SAEExplainer, a training framework that utilizes activation scores as an objective reward signal to train the mod...

📄 STELLAR: Spatio-Temporal Environmental Learning with Latent Alignment and Refinement for Long-Tailed Species Distribution Modeling
🗓️ Published: 6/7/2026
🔗 http://arxiv.org/abs/2606.08484v1
👥 Authors: Shufeng Kong, Tao Yu (possible past University of Washington affiliation), Yuanyuan Wei, Caihua Liu, Junwen Bai, Yingheng Wang, Marc Grimson, Daniel Fink (possible past Google (United States) affiliation), Carla P. Gomes
Abstract

Joint Species Distribution Modeling (JSDM) is a key enabler for biodiversity monitoring and conservation planning. However, accurate JSDM faces two coupled challenges: environmental drivers and species distributions are inherently spatio-temporal, while species co-occurrence patterns exhibit complex non-linear community structure and severe long-tail imbalance driven by rare species. Existing approaches often address these factors in isolation, learning from static covariates or neglecting the h...

📄 Physically Consistent Null Space Alignment for Detection of Low-Magnitude False Data Injection Attacks
🗓️ Published: 6/7/2026
🔗 http://arxiv.org/abs/2606.08473v1
👥 Authors: Xin Li (possible past Google (United States) affiliation), Chenhan Xiao, Jonathan Cohen (possible past Nvidia (United States) affiliation), Aviad Elyashar, Yang Weng, Rami Puzis
Abstract

False data injection attacks (FDIAs) introducing small measurement perturbations can still cause large deviations in power system state estimation when the injected signals align with the pseudo-null space of the system model. Existing model- and data-driven detectors may fail to identify such low-magnitude but high-impact attacks because residual tests ignore changes hidden in the pseudo-null space, while subspace learning methods capture correlation patterns without enforcing physical consiste...
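
The classic reason such attacks evade residual tests is linear algebra. Consider DC state estimation with measurement model z = Hx + e. An injected vector a = Hc lies in the column space of H, which is exactly the null space of the residual operator I - H(HᵀH)⁻¹Hᵀ. The least-squares estimate therefore shifts by c while the measurement residual stays unchanged. The sketch below checks this on a toy system with one state and three measurements; H, z, and c are made-up numbers. The paper's pseudo-null-space setting, where attacks are only approximately aligned, appears to generalize this exact case.

```python
from typing import List


def estimate(H: List[float], z: List[float]) -> float:
    """Least-squares estimate of a scalar state x from z ≈ H * x (H is a column vector)."""
    return sum(h * zi for h, zi in zip(H, z)) / sum(h * h for h in H)


def residual(H: List[float], z: List[float]) -> List[float]:
    """Measurement residual r = z - H * x_hat."""
    x_hat = estimate(H, z)
    return [zi - h * x_hat for h, zi in zip(H, z)]


if __name__ == "__main__":
    H = [1.0, 2.0, -1.0]       # toy measurement model (one state, three meters)
    z = [1.1, 1.9, -0.95]      # noisy measurements of a state near 1
    c = 0.5                    # attacker's intended bias on the state
    z_attacked = [zi + h * c for h, zi in zip(H, z)]  # injection a = H * c
    print(estimate(H, z_attacked) - estimate(H, z))   # estimate shifts by about c
    print(residual(H, z))                             # residual before the attack
    print(residual(H, z_attacked))                    # same residual, up to rounding
```

A residual-based bad-data detector sees no change, even though the state estimate has moved by the attacker's chosen amount.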

*Notable papers are those with at least two authors from a "big" AI/ML lab.