πŸ“„ Notable* Recent AI/ML arXiv Papers


πŸ“„ Learning a Generative Meta-Model of LLM Activations
πŸ—“οΈ Published: 2/6/2026
πŸ”— http://arxiv.org/abs/2602.06964v1
πŸ‘₯ Authors: Grace Luo, Jiahai Feng, Trevor Darrell (possible past University Of California, Berkeley affiliation), Alec Radford (possible past OpenAI (United States) affiliation), Jacob Steinhardt (possible past University Of California, Berkeley affiliation)
Abstract

Existing approaches for analyzing neural network activations, such as PCA and sparse autoencoders, rely on strong structural assumptions. Generative models offer an alternative: they can uncover structure without such assumptions and act as priors that improve intervention fidelity. We explore this direction by training diffusion models on one billion residual stream activations, creating "meta-models" that learn the distribution of a network's internal states. We find that diffusion loss decrea...
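As background, here is a minimal sketch of training a diffusion model on activation vectors in the spirit described above; the toy MLP denoiser, dimensions, and noise schedule are illustrative assumptions, not the paper's architecture:

```python
# Sketch only: epsilon-prediction diffusion training on a batch of
# residual-stream activation vectors (all hyperparameters assumed).
import torch
import torch.nn as nn

D_MODEL = 768          # assumed residual-stream width
T = 1000               # diffusion steps
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

denoiser = nn.Sequential(                  # toy stand-in for the meta-model
    nn.Linear(D_MODEL + 1, 1024), nn.SiLU(),
    nn.Linear(1024, D_MODEL),
)
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-4)

def training_step(acts: torch.Tensor) -> float:
    """One epsilon-prediction step on a batch of activations."""
    t = torch.randint(0, T, (acts.shape[0],))
    eps = torch.randn_like(acts)
    a = alphas_bar[t].unsqueeze(-1)
    noisy = a.sqrt() * acts + (1 - a).sqrt() * eps   # forward process
    t_feat = (t.float() / T).unsqueeze(-1)           # crude time embedding
    pred = denoiser(torch.cat([noisy, t_feat], dim=-1))
    loss = ((pred - eps) ** 2).mean()                # the "diffusion loss"
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

batch = torch.randn(64, D_MODEL)   # stand-in for captured activations
print(training_step(batch))
```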

πŸ“„ DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos
πŸ—“οΈ Published: 2/6/2026
πŸ”— http://arxiv.org/abs/2602.06949v1
πŸ‘₯ Authors: Shenyuan Gao, William Liang, Kaiyuan Zheng, Ayaan Malik, Seonghyeon Ye, Sihyun Yu, Wei-Cheng Tseng, Yuzhu Dong, Kaichun Mo (possible past Stanford University affiliation), Chen-Hsuan Lin (possible past Nvidia (United States) affiliation), Qianli Ma, Seungjun Nah, Loic Magne, Jiannan Xiang, Yuqi Xie, Ruijie Zheng, Dantong Niu, You Liang Tan, K. R. Zentner, George Kurian, Suneel Indupuru, Pooya Jannaty, Jinwei Gu (possible past Shanghai Artificial Intelligence Laboratory affiliation), Jun Zhang (possible past Tencent (China) affiliation), Jitendra Malik (possible past University Of California, Berkeley affiliation), Pieter Abbeel (possible past University Of California, Berkeley affiliation), Ming-Yu Liu (possible past Nvidia (United States) affiliation), Yuke Zhu (possible past Stanford University affiliation), Joel Jang, Linxi "Jim" Fan
Abstract

Being able to simulate the outcomes of actions in varied environments will revolutionize the development of generalist agents at scale. However, modeling these world dynamics, especially for dexterous robotics tasks, poses significant challenges due to limited data coverage and scarce action labels. As an endeavor towards this end, we introduce DreamDojo, a foundation world model that learns diverse interactions and dexterous controls from 44k hours of egocentric human videos. Our data mixture r...

πŸ“„ From Kepler to Newton: Inductive Biases Guide Learned World Models in Transformers
πŸ—“οΈ Published: 2/6/2026
πŸ”— http://arxiv.org/abs/2602.06923v1
πŸ‘₯ Authors: Ziming Liu (possible past Massachusetts Institute Of Technology affiliation), Sophia Sanborn, Surya Ganguli (possible past Stanford University affiliation), Andreas Tolias
Abstract

Can general-purpose AI architectures go beyond prediction to discover the physical laws governing the universe? True intelligence relies on "world models" -- causal abstractions that allow an agent to not only predict future states but understand the underlying governing dynamics. While previous "AI Physicist" approaches have successfully recovered such laws, they typically rely on strong, domain-specific priors that effectively "bake in" the physics. Conversely, Vafa et al. recently showed that...

πŸ“„ TraceCoder: A Trace-Driven Multi-Agent Framework for Automated Debugging of LLM-Generated Code
πŸ—“οΈ Published: 2/6/2026
πŸ”— http://arxiv.org/abs/2602.06875v1
πŸ‘₯ Authors: Jiangping Huang, Wenguang Ye, Weisong Sun, Jian Zhang (possible past Tencent (China) affiliation), Mingyue Zhang, Yang Liu (possible past Tsinghua University affiliation)
Abstract

Large Language Models (LLMs) often generate code with subtle but critical bugs, especially for complex tasks. Existing automated repair methods typically rely on superficial pass/fail signals, offering limited visibility into program behavior and hindering precise error localization. In addition, without a way to learn from prior failures, repair processes often fall into repetitive and inefficient cycles. To overcome these challenges, we present TraceCoder, a collaborative multi-agent framework...
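As background on what "trace-driven" means here, a minimal sketch of collecting an execution trace from candidate code follows; this is generic Python tracing, not TraceCoder's actual instrumentation:

```python
# Sketch: record (line, locals) events while running a candidate
# function, so a repair agent can localize where behavior diverges
# instead of seeing only a pass/fail signal.
import sys

def run_with_trace(fn, *args):
    events = []
    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is fn.__code__:
            events.append((frame.f_lineno, dict(frame.f_locals)))
        return tracer
    sys.settrace(tracer)
    try:
        result = fn(*args)
    finally:
        sys.settrace(None)
    return result, events

def buggy_mean(xs):                 # stand-in for LLM-generated code
    total = 0
    for x in xs:
        total += x
    return total / (len(xs) - 1)    # off-by-one bug

result, trace = run_with_trace(buggy_mean, [1, 2, 3])
print(result)                       # 3.0 instead of 2.0
for lineno, local_vars in trace:
    print(lineno, local_vars)
```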

πŸ“„ AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents
πŸ—“οΈ Published: 2/6/2026
πŸ”— http://arxiv.org/abs/2602.06855v1
πŸ‘₯ Authors: Alisia Lupidi, Bhavul Gauri, Thomas Simon Foster, Bassel Al Omari, Despoina Magka, Alberto Pepe, Alexis Audran-Reiss, Muna Aghamelu, Nicolas Baldwin, Lucia Cipolina-Kun, Jean-Christophe Gagnon-Audet, Chee Hau Leow, Sandra Lefdal, Hossam Mossalam, Abhinav Moudgil, Saba Nazir, Emanuel Tewolde, Isabel Urrego, Jordi Armengol Estape, Amar Budhiraja, Gaurav Chaurasia, Abhishek Charnalia, Derek Dunfield, Karen Hambardzumyan, Daniel Izcovich, Martin Josifoski, Ishita Mediratta, Kelvin Niu, Parth Pathak, Michael Shvartsman, Edan Toledo, Anton Protopopov, Roberta Raileanu, Alexander Miller (possible past Meta (United States) affiliation), Tatiana Shavrina, Jakob Foerster (possible past University Of Oxford affiliation), Yoram Bachrach (possible past DeepMind (United Kingdom) affiliation)
Abstract

LLM agents hold significant promise for advancing scientific research. To accelerate this progress, we introduce AIRS-Bench (the AI Research Science Benchmark), a suite of 20 tasks sourced from state-of-the-art machine learning papers. These tasks span diverse domains, including language modeling, mathematics, bioinformatics, and time series forecasting. AIRS-Bench tasks assess agentic capabilities over the full research lifecycle -- including idea generation, experiment analysis and iterative r...

πŸ“„ POP: Online Structural Pruning Enables Efficient Inference of Large Foundation Models
πŸ—“οΈ Published: 2/6/2026
πŸ”— http://arxiv.org/abs/2602.06822v1
πŸ‘₯ Authors: Yi Chen, Wonjin Shin, Shuhong Liu, Tho Mai, Jeongmo Lee, Chuanbo Hua, Kun Wang, Jun Liu (possible past Tencent (China) affiliation), Joo-Young Kim (possible past Microsoft (United States) affiliation)
Abstract

Large foundation models (LFMs) achieve strong performance through scaling, yet current structural pruning methods apply fixed pruning decisions during inference, overlooking sparsity patterns that emerge during autoregressive token generation. In this paper, we propose POP (Partition-guided Online Pruning), an efficient online structural pruning framework that enables context-conditioned dynamic pruning with minimal computational overhead. POP partitions model channels into retained, candidate,...
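To illustrate the general idea of context-conditioned online pruning, a toy sketch follows; this is a generic per-step magnitude gate, not POP's actual partition rule (whose channel groups are cut off in the abstract above):

```python
# Sketch: score channels from the current hidden state and gate out the
# weakest ones for this decoding step only.
import numpy as np

def online_channel_gate(h: np.ndarray, keep_ratio: float = 0.7):
    """h: (d,) hidden state for the current token.
    Returns the gated state and this step's retained-channel mask."""
    scores = np.abs(h)                        # context-conditioned score
    k = max(1, int(keep_ratio * h.shape[0]))
    thresh = np.partition(scores, -k)[-k]     # k-th largest score
    mask = scores >= thresh                   # channels retained this step
    return h * mask, mask

h = np.random.randn(16)
gated, mask = online_channel_gate(h)
print(mask.sum(), "of", h.shape[0], "channels kept this step")
```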

πŸ“„ SaDiT: Efficient Protein Backbone Design via Latent Structural Tokenization and Diffusion Transformers
πŸ—“οΈ Published: 2/6/2026
πŸ”— http://arxiv.org/abs/2602.06706v1
πŸ‘₯ Authors: Shentong Mo (possible past Baidu (China) affiliation), Lanqing Li (possible past Tencent (China) affiliation)
Abstract

Generative models for de novo protein backbone design have achieved remarkable success in creating novel protein structures. However, these diffusion-based approaches remain computationally intensive and slower than desired for large-scale structural exploration. While recent efforts like Proteina have introduced flow-matching to improve sampling efficiency, the potential of tokenization for structural compression and acceleration remains largely unexplored in the protein domain. In this work, w...

πŸ“„ Humanoid Manipulation Interface: Humanoid Whole-Body Manipulation from Robot-Free Demonstrations
πŸ—“οΈ Published: 2/6/2026
πŸ”— http://arxiv.org/abs/2602.06643v1
πŸ‘₯ Authors: Ruiqian Nai, Boyuan Zheng, Junming Zhao, Haodong Zhu, Sicong Dai, Zunhao Chen, Yihang Hu, Yingdong Hu, Tong Zhang (possible past Tencent (China) affiliation), Chuan Wen, Yang Gao (possible past Tencent (China) affiliation)
Abstract

Current approaches for humanoid whole-body manipulation, primarily relying on teleoperation or visual sim-to-real reinforcement learning, are hindered by hardware logistics and complex reward engineering. Consequently, demonstrated autonomous skills remain limited and are typically restricted to controlled environments. In this paper, we present the Humanoid Manipulation Interface (HuMI), a portable and efficient framework for learning diverse whole-body manipulation tasks across various environ...

πŸ“„ Scaling Speech Tokenizers with Diffusion Autoencoders
πŸ—“οΈ Published: 2/6/2026
πŸ”— http://arxiv.org/abs/2602.06602v1
πŸ‘₯ Authors: Yuancheng Wang, Zhenyu Tang, Yun Wang, Arthur Hinsvark, Yingru Liu, Yinghao Li, Kainan Peng (possible past Baidu (China) affiliation), Junyi Ao, Mingbo Ma, Mike Seltzer (possible past Microsoft (United States) affiliation), Qing He (possible past Tencent (China) affiliation), Xubo Liu
Abstract

Speech tokenizers are foundational to speech language models, yet existing approaches face two major challenges: (1) balancing trade-offs between encoding semantics for understanding and acoustics for reconstruction, and (2) achieving low bit rates and low token rates. We propose Speech Diffusion Tokenizer (SiTok), a diffusion autoencoder that jointly learns semantic-rich representations through supervised learning and enables high-fidelity audio reconstruction with diffusion. We scale SiTok to ...

πŸ“„ AgentCPM-Report: Interleaving Drafting and Deepening for Open-Ended Deep Research
πŸ—“οΈ Published: 2/6/2026
πŸ”— http://arxiv.org/abs/2602.06540v1
πŸ‘₯ Authors: Yishan Li, Wentong Chen, Yukun Yan, Mingwei Li, Sen Mei, Xiaorong Wang, Kunpeng Liu, Xin Cong, Shuo Wang (possible past Nvidia (United States) affiliation), Zhong Zhang, Yaxi Lu, Zhenghao Liu (possible past Tsinghua University affiliation), Yankai Lin (possible past Tsinghua University affiliation), Zhiyuan Liu (possible past Tsinghua University affiliation), Maosong Sun (possible past Tsinghua University affiliation)
Abstract

Generating deep research reports requires large-scale information acquisition and the synthesis of insight-driven analysis, posing a significant challenge for current language models. Most existing approaches follow a plan-then-write paradigm, whose performance heavily depends on the quality of the initial outline. However, constructing a comprehensive outline itself demands strong reasoning ability, causing current deep research systems to rely almost exclusively on closed-source or online larg...

πŸ“„ AgentCPM-Explore: Realizing Long-Horizon Deep Exploration for Edge-Scale Agents
πŸ—“οΈ Published: 2/6/2026
πŸ”— http://arxiv.org/abs/2602.06485v1
πŸ‘₯ Authors: Haotian Chen, Xin Cong, Shengda Fan, Yuyang Fu, Ziqin Gong, Yaxi Lu, Yishan Li, Boye Niu, Chengjun Pan, Zijun Song, Huadong Wang, Yesai Wu, Yueying Wu, Zihao Xie, Yukun Yan, Zhong Zhang, Yankai Lin (possible past Tsinghua University affiliation), Zhiyuan Liu (possible past Tsinghua University affiliation), Maosong Sun (possible past Tsinghua University affiliation)
Abstract

While Large Language Model (LLM)-based agents have shown remarkable potential for solving complex tasks, existing systems remain heavily reliant on large-scale models, leaving the capabilities of edge-scale models largely underexplored. In this paper, we present the first systematic study on training agentic models at the 4B-parameter scale. We identify three primary bottlenecks hindering the performance of edge-scale models: catastrophic forgetting during Supervised Fine-Tuning (SFT), sensitivi...

πŸ“„ TrajAD: Trajectory Anomaly Detection for Trustworthy LLM Agents
πŸ—“οΈ Published: 2/6/2026
πŸ”— http://arxiv.org/abs/2602.06443v1
πŸ‘₯ Authors: Yibing Liu, Chong Zhang (possible past Tencent (China) affiliation), Zhongyi Han, Hansong Liu, Yong Wang (possible past Baidu (China) affiliation), Yang Yu, Xiaoyan Wang, Yilong Yin
Abstract

We address the problem of runtime trajectory anomaly detection, a critical capability for enabling trustworthy LLM agents. Current safety measures predominantly focus on static input/output filtering. However, we argue that ensuring the reliability of LLM agents requires auditing the intermediate execution process. In this work, we formulate the task of Trajectory Anomaly Detection. The goal is not merely detection, but precise error localization. This capability is essential for enabling efficient rol...

πŸ“„ Difficulty-Estimated Policy Optimization
πŸ—“οΈ Published: 2/6/2026
πŸ”— http://arxiv.org/abs/2602.06375v1
πŸ‘₯ Authors: Yu Zhao (possible past Tencent (China) affiliation), Fan Jiang (possible past Shanghai Jiao Tong University affiliation), Tianle Liu, Bo Zeng, Yu Liu, Longyue Wang (possible past Tencent (China) affiliation), Weihua Luo
Abstract

Recent advancements in Large Reasoning Models (LRMs), exemplified by DeepSeek-R1, have underscored the potential of scaling inference-time compute through Group Relative Policy Optimization (GRPO). However, GRPO frequently suffers from gradient signal attenuation when encountering problems that are either too trivial or overly complex. In these scenarios, the disappearance of inter-group advantages makes the gradient signal susceptible to noise, thereby jeopardizing convergence stability. While ...
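The failure mode is easy to see in the standard group-relative advantage formula (sketch below uses the commonly cited GRPO normalization, not this paper's proposed fix):

```python
# Sketch: GRPO's advantage for each rollout is (r - mean(r)) / std(r)
# over a group sampled for one prompt. When a problem is trivially easy
# or hopelessly hard, all rewards match and the advantages vanish.
import numpy as np

def grpo_advantages(rewards, eps=1e-8):
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

print(grpo_advantages([1, 0, 1, 0]))  # mixed outcomes -> useful signal
print(grpo_advantages([1, 1, 1, 1]))  # too easy  -> all-zero advantages
print(grpo_advantages([0, 0, 0, 0]))  # too hard  -> all-zero advantages
```

The all-zero groups contribute no learning signal, leaving the gradient dominated by noise, which is the instability the abstract describes.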

πŸ“„ Revisiting Salient Object Detection from an Observer-Centric Perspective
πŸ—“οΈ Published: 2/6/2026
πŸ”— http://arxiv.org/abs/2602.06369v1
πŸ‘₯ Authors: Fuxi Zhang, Yifan Wang (possible past Stanford University affiliation), Hengrun Zhao, Zhuohan Sun, Changxing Xia, Lijun Wang (possible past ETH Zurich affiliation), Huchuan Lu, Yangrui Shao, Chen Yang (possible past Tencent (China) affiliation), Long Teng
Abstract

Salient object detection is inherently a subjective problem, as observers with different priors may perceive different objects as salient. However, existing methods predominantly formulate it as an objective prediction task with a single groundtruth segmentation map for each image, which renders the problem under-determined and fundamentally ill-posed. To address this issue, we propose Observer-Centric Salient Object Detection (OC-SOD), where salient regions are predicted by considering not only...

πŸ“„ RuleSmith: Multi-Agent LLMs for Automated Game Balancing
πŸ—“οΈ Published: 2/5/2026
πŸ”— http://arxiv.org/abs/2602.06232v1
πŸ‘₯ Authors: Ziyao Zeng, Chen Liu, Tianyu Liu, Hao Wang (possible past Tsinghua University affiliation), Xiatao Sun, Fengyu Yang, Xiaofeng Liu (possible past Google (United States) affiliation), Zhiwen Fan
Abstract

Game balancing is a longstanding challenge requiring repeated playtesting, expert intuition, and extensive manual tuning. We introduce RuleSmith, the first framework that achieves automated game balancing by leveraging the reasoning capabilities of multi-agent LLMs. It couples a game engine, multi-agent LLM self-play, and Bayesian optimization operating over a multi-dimensional rule space. As a proof of concept, we instantiate RuleSmith on CivMini, a simplified civilization-style game containin...

πŸ“„ RISE-Video: Can Video Generators Decode Implicit World Rules?
πŸ—“οΈ Published: 2/5/2026
πŸ”— http://arxiv.org/abs/2602.05986v1
πŸ‘₯ Authors: Mingxin Liu, Shuran Ma, Shibei Meng, Xiangyu Zhao, Zicheng Zhang, Shaofeng Zhang, Zhihang Zhong, Peixian Chen (possible past Tencent (China) affiliation), Haoyu Cao, Xing Sun (possible past Tencent (China) affiliation), Haodong Duan, Xue Yang
Abstract

While generative video models have achieved remarkable visual fidelity, their capacity to internalize and reason over implicit world rules remains a critical yet under-explored frontier. To bridge this gap, we present RISE-Video, a pioneering reasoning-oriented benchmark for Text-Image-to-Video (TI2V) synthesis that shifts the evaluative focus from surface-level aesthetics to deep cognitive reasoning. RISE-Video comprises 467 meticulously human-annotated samples spanning eight rigorous categorie...

πŸ“„ LSA: Localized Semantic Alignment for Enhancing Temporal Consistency in Traffic Video Generation
πŸ—“οΈ Published: 2/5/2026
πŸ”— http://arxiv.org/abs/2602.05966v1
πŸ‘₯ Authors: Mirlan Karimov, Teodora Spasojevic, Markus Braun, Julian Wiederer, Vasileios Belagiannis (possible past University Of Oxford affiliation), Marc Pollefeys (possible past Google (United States) affiliation)
Abstract

Controllable video generation has emerged as a versatile tool for autonomous driving, enabling realistic synthesis of traffic scenarios. However, existing methods depend on control signals at inference time to guide the generative model towards temporally consistent generation of dynamic objects, limiting their utility as scalable and generalizable data engines. In this work, we propose Localized Semantic Alignment (LSA), a simple yet effective framework for fine-tuning pre-trained video generat...

πŸ“„ Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations
πŸ—“οΈ Published: 2/5/2026
πŸ”— http://arxiv.org/abs/2602.05885v2
πŸ‘₯ Authors: Wei Liu (possible past Tsinghua University affiliation), Jiawei Xu, Yingru Li, Longtao Zheng, Tianjian Li, Qian Liu, Junxian He (possible past Carnegie Mellon University affiliation)
Abstract

High-quality kernels are critical for scalable AI systems, and enabling LLMs to generate such code would advance AI development. However, training LLMs for this task requires sufficient data and a robust environment, and the process is often vulnerable to reward hacking and lazy optimization. In these cases, models may hack training rewards and prioritize trivial correctness over meaningful speedup. In this paper, we systematically study reinforcement learning (RL) for kernel generation. We first des...
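The reward-hacking concern can be made concrete with a toy reward function (an illustration of the general design issue, not Dr. Kernel's actual reward):

```python
# Sketch: if correct-but-slow kernels already earn full reward, RL
# converges to "lazy" trivially correct code. Gating a capped speedup
# term on correctness keeps the incentive on meaningful gains.
def kernel_reward(correct: bool, t_kernel: float, t_reference: float) -> float:
    if not correct:
        return 0.0
    speedup = t_reference / t_kernel
    return min(speedup, 4.0)   # speedup reward, paid only when correct

print(kernel_reward(True, t_kernel=0.5, t_reference=1.0))  # 2.0
print(kernel_reward(True, t_kernel=1.0, t_reference=1.0))  # 1.0 (no gain)
print(kernel_reward(False, 0.1, 1.0))                      # 0.0
```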

πŸ“„ DLM-Scope: Mechanistic Interpretability of Diffusion Language Models via Sparse Autoencoders
πŸ—“οΈ Published: 2/5/2026
πŸ”— http://arxiv.org/abs/2602.05859v1
πŸ‘₯ Authors: Xu Wang, Bingqing Jiang, Yu Wan, Baosong Yang (possible past Tencent (China) affiliation), Lingpeng Kong (possible past Google (United States) affiliation), Difan Zou
Abstract

Sparse autoencoders (SAEs) have become a standard tool for mechanistic interpretability in autoregressive large language models (LLMs), enabling researchers to extract sparse, human-interpretable features and intervene on model behavior. Recently, as diffusion language models (DLMs) have become an increasingly promising alternative to the autoregressive LLMs, it is essential to develop tailored mechanistic interpretability tools for this emerging class of models. In this work, we present DLM-Sco...
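For readers new to SAEs, a minimal generic sketch follows (an overcomplete ReLU autoencoder with an L1 penalty; dimensions are assumptions, and this is not DLM-Scope itself):

```python
# Sketch: sparse autoencoder that maps activations to an overcomplete
# dictionary of sparse, potentially human-interpretable features.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=768, d_dict=8192):
        super().__init__()
        self.enc = nn.Linear(d_model, d_dict)
        self.dec = nn.Linear(d_dict, d_model)

    def forward(self, x):
        f = torch.relu(self.enc(x))      # sparse feature activations
        return self.dec(f), f

sae = SparseAutoencoder()
x = torch.randn(32, 768)                 # stand-in for model activations
x_hat, feats = sae(x)
l1 = 1e-3                                # sparsity penalty weight
loss = ((x_hat - x) ** 2).mean() + l1 * feats.abs().mean()
print(loss.item())
```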

πŸ“„ BABE: Biology Arena BEnchmark
πŸ—“οΈ Published: 2/5/2026
πŸ”— http://arxiv.org/abs/2602.05857v1
πŸ‘₯ Authors: Junting Zhou, Jin Chen, Linfeng Hao, Denghui Cao, Zheyu Wang, Qiguang Chen, Chaoyou Fu, Jiaze Chen, Yuchen Wu (possible past Google (United States) affiliation), Ge Zhang, Mingxuan Wang (possible past Tencent (China) affiliation), Wenhao Huang, Tong Yang (possible past Peking University affiliation)
Abstract

The rapid evolution of large language models (LLMs) has expanded their capabilities from basic dialogue to advanced scientific reasoning. However, existing benchmarks in biology often fail to assess a critical skill required of researchers: the ability to integrate experimental results with contextual knowledge to derive meaningful conclusions. To address this gap, we introduce BABE (Biology Arena BEnchmark), a comprehensive benchmark designed to evaluate the experimental reasoning capabilities o...

πŸ“„ Variational Speculative Decoding: Rethinking Draft Training from Token Likelihood to Sequence Acceptance
πŸ—“οΈ Published: 2/5/2026
πŸ”— http://arxiv.org/abs/2602.05774v1
πŸ‘₯ Authors: Xiandong Zou, Jianshu Li (possible past National University Of Singapore affiliation), Jing Huang (possible past Meta (United States) affiliation), Pan Zhou
Abstract

Speculative decoding accelerates inference for (M)LLMs, yet a training-decoding discrepancy persists: while existing methods optimize single greedy trajectories, decoding involves verifying and ranking multiple sampled draft paths. We propose Variational Speculative Decoding (VSD), formulating draft training as variational inference over latent proposals (draft paths). VSD maximizes the marginal probability of target-model acceptance, yielding an ELBO that promotes high-quality latent proposals ...
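As background on the verification step VSD trains for, here is the standard speculative-sampling accept/reject rule (the common baseline formulation, not VSD itself):

```python
# Sketch: each draft token is kept with probability
# min(1, p_target / p_draft); the first rejection truncates the draft.
import numpy as np

rng = np.random.default_rng(0)

def verify_draft(draft_tokens, p_draft, p_target):
    """p_draft / p_target: probability each model assigned to the
    drafted token at each position. Returns accepted prefix length."""
    for i, _tok in enumerate(draft_tokens):
        if rng.random() >= min(1.0, p_target[i] / p_draft[i]):
            return i          # first rejected position
    return len(draft_tokens)  # whole draft accepted

n = verify_draft(["the", "cat", "sat"],
                 p_draft=[0.9, 0.8, 0.5],
                 p_target=[0.85, 0.9, 0.1])
print("accepted prefix length:", n)
```

Because acceptance depends on whole sampled paths rather than a single greedy trajectory, training the draft model only on greedy likelihood mismatches the deployment objective, which is the discrepancy the abstract targets.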

πŸ“„ CSRv2: Unlocking Ultra-Sparse Embeddings
πŸ—“οΈ Published: 2/5/2026
πŸ”— http://arxiv.org/abs/2602.05735v2
πŸ‘₯ Authors: Lixuan Guo, Yifei Wang, Tiansheng Wen, Yifan Wang (possible past Stanford University affiliation), Aosong Feng, Bo Chen (possible past Tencent (China) affiliation), Stefanie Jegelka, Chenyu You
Abstract

In the era of large foundation models, the quality of embeddings has become a central determinant of downstream task performance and overall system capability. Yet widely used dense embeddings are often extremely high-dimensional, incurring substantial costs in storage, memory, and inference latency. To address these costs, Contrastive Sparse Representation (CSR) was recently proposed as a promising direction, mapping dense embeddings into high-dimensional but k-sparse vectors, in contrast to compact d...
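The k-sparse idea itself is simple; a generic top-k sparsification sketch follows (not CSRv2's training recipe):

```python
# Sketch: keep only the k largest-magnitude coordinates of a
# high-dimensional code, so storage and dot products scale with k
# rather than with the full dimension.
import numpy as np

def topk_sparsify(z: np.ndarray, k: int) -> np.ndarray:
    out = np.zeros_like(z)
    idx = np.argpartition(np.abs(z), -k)[-k:]
    out[idx] = z[idx]
    return out

z = np.random.randn(4096)           # dense (or expanded) embedding
z_sparse = topk_sparsify(z, k=32)
print(np.count_nonzero(z_sparse))   # 32 nonzeros out of 4096
```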

πŸ“„ CompactRAG: Reducing LLM Calls and Token Overhead in Multi-Hop Question Answering
πŸ—“οΈ Published: 2/5/2026
πŸ”— http://arxiv.org/abs/2602.05728v1
πŸ‘₯ Authors: Hao Yang (possible past Tencent (China) affiliation), Zhiyu Yang, Xupeng Zhang, Wei Wei (possible past Google (United States) affiliation), Yunjie Zhang, Lin Yang
Abstract

Retrieval-augmented generation (RAG) has become a key paradigm for knowledge-intensive question answering. However, existing multi-hop RAG systems remain inefficient, as they alternate between retrieval and reasoning at each step, resulting in repeated LLM calls, high token consumption, and unstable entity grounding across hops. We propose CompactRAG, a simple yet effective framework that decouples offline corpus restructuring from online reasoning. In the offline stage, an LLM reads the corpu...

πŸ“„ Mining Generalizable Activation Functions
πŸ—“οΈ Published: 2/5/2026
πŸ”— http://arxiv.org/abs/2602.05688v1
πŸ‘₯ Authors: Alex Vitvitskyi, Michael Boratko, Matej Grcic, Razvan Pascanu (possible past Google (United States) affiliation), Deep Shah, Petar VeličkoviΔ‡ (possible past University Of Cambridge affiliation)
Abstract

The choice of activation function is an active area of research, with different proposals aimed at improving optimization, while maintaining expressivity. Additionally, the activation function can significantly alter the implicit inductive bias of the architecture, controlling its non-linear behavior. In this paper, in line with previous work, we argue that evolutionary search provides a useful framework for finding new activation functions, while we also make two novel observations. The first i...
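To make the search framing concrete, here is a toy evolutionary loop over composed activation primitives; the primitive set and fitness function are stand-ins, not the paper's search space or evaluation:

```python
# Sketch: mutate candidates by composing primitives, then keep the
# fittest under a placeholder objective.
import random, math

PRIMITIVES = [math.tanh, abs, lambda x: max(0.0, x), math.sin]

def mutate(fn):
    g = random.choice(PRIMITIVES)
    return lambda x: g(fn(x))          # compose a new candidate

def fitness(fn):
    # Placeholder objective: reward non-linearity, penalize blow-up.
    xs = [i / 10 for i in range(-30, 31)]
    ys = [fn(x) for x in xs]
    nonlinearity = sum((y - x) ** 2 for x, y in zip(xs, ys))
    return nonlinearity - 0.1 * max(abs(y) for y in ys)

population = [lambda x: x for _ in range(8)]
for _ in range(20):                    # generations
    population += [mutate(random.choice(population)) for _ in range(8)]
    population.sort(key=fitness, reverse=True)
    population = population[:8]        # truncation selection
print(fitness(population[0]))
```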

πŸ“„ When RL Meets Adaptive Speculative Training: A Unified Training-Serving System
πŸ—“οΈ Published: 2/6/2026
πŸ”— http://arxiv.org/abs/2602.06932v1
πŸ‘₯ Authors: Junxiong Wang, Fengxiang Bie, Jisen Li, Zhongzhu Zhou, Zelei Shao, Yubo Wang, Yinghui Liu, Qingyang Wu, Avner May, Sri Yanamandra, Yineng Zhang, Ce Zhang (possible past ETH Zurich affiliation), Tri Dao, Percy Liang (possible past Stanford University affiliation), Ben Athiwaratkun, Shuaiwen Leon Song (possible past Microsoft (United States) affiliation), Chenfeng Xu (possible past University Of California, Berkeley affiliation), Xiaoxia Wu
Abstract

Speculative decoding can significantly accelerate LLM serving, yet most deployments today disentangle speculator training from serving, treating speculator training as a standalone offline modeling problem. We show that this decoupled formulation introduces substantial deployment and adaptation lag: (1) high time-to-serve, since a speculator must be trained offline for a considerable period before deployment; (2) delayed utility feedback, since the true end-to-end decoding speedup is only known ...

πŸ“„ DiTS: Multimodal Diffusion Transformers Are Time Series Forecasters
πŸ—“οΈ Published: 2/6/2026
πŸ”— http://arxiv.org/abs/2602.06597v1
πŸ‘₯ Authors: Haoran Zhang, Haixuan Liu, Yong Liu, Yunzhong Qiu, Yuxuan Wang (possible past Google (United States) affiliation), Jianmin Wang (possible past Tsinghua University affiliation), Mingsheng Long (possible past Tsinghua University affiliation)
Abstract

While generative modeling on time series facilitates more capable and flexible probabilistic forecasting, existing generative time series models do not address the multi-dimensional properties of time series data well. The prevalent architecture of Diffusion Transformers (DiT), which relies on simplistic conditioning controls and a single-stream Transformer backbone, tends to underutilize cross-variate dependencies in covariate-aware forecasting. Inspired by Multimodal Diffusion Transformers tha...

πŸ“„ Refining the Information Bottleneck via Adversarial Information Separation
πŸ—“οΈ Published: 2/6/2026
πŸ”— http://arxiv.org/abs/2602.06549v1
πŸ‘₯ Authors: Shuai Ning, Zhenpeng Wang, Lin Wang, Bing Chen, Shuangrong Liu, Xu Wu, Jin Zhou (possible past Google (United States) affiliation), Bo Yang (possible past Tencent (China) affiliation)
Abstract

Generalizing from limited data is particularly critical for models in domains such as materials science, where task-relevant features in experimental datasets are often heavily confounded by measurement noise and experimental artifacts. Standard regularization techniques fail to precisely separate meaningful features from noise, while existing adversarial adaptation methods are limited by their reliance on explicit separation labels. To address this challenge, we propose the Adversarial Informati...
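For reference, the standard information bottleneck objective this line of work refines (the textbook formulation, not the paper's adversarial loss):

```latex
% Compress X into a representation Z while preserving information
% about the target Y; \beta trades compression against relevance.
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
```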

πŸ“„ NECromancer: Breathing Life into Skeletons via BVH Animation
πŸ—“οΈ Published: 2/6/2026
πŸ”— http://arxiv.org/abs/2602.06548v1
πŸ‘₯ Authors: Mingxi Xu, Qi Wang (possible past Tsinghua University affiliation), Zhengyu Wen, Phong Dao Thien, Zhengyu Li, Ning Zhang (possible past University Of California, Berkeley affiliation), Xiaoyu He, Wei Zhao (possible past Tencent (China) affiliation), Kehong Gong, Mingyuan Zhang
Abstract

Motion tokenization is a key component of generalizable motion models, yet most existing approaches are restricted to species-specific skeletons, limiting their applicability across diverse morphologies. We propose NECromancer (NEC), a universal motion tokenizer that operates directly on arbitrary BVH skeletons. NEC consists of three components: (1) an Ontology-aware Skeletal Graph Encoder (OwO) that encodes structural priors from BVH files, including joint semantics, rest-pose offsets, and skel...

πŸ“„ SCONE: A Practical, Constraint-Aware Plug-in for Latent Encoding in Learned DNA Storage
πŸ—“οΈ Published: 2/5/2026
πŸ”— http://arxiv.org/abs/2602.06157v1
πŸ‘₯ Authors: Cihan Ruan, Lebin Zhou, Rongduo Han, Linyi Han, Bingqing Zhao, Chenchen Zhu, Wei Jiang (possible past Apple (United States) affiliation), Wei Wang (possible past University Of Oxford affiliation), Nam Ling
Abstract

DNA storage has matured from concept to practical stage, yet its integration with neural compression pipelines remains inefficient. Early DNA encoders applied redundancy-heavy constraint layers atop raw binary data - workable but primitive. Recent neural codecs compress data into learned latent representations with rich statistical structure, yet still convert these latents to DNA via naive binary-to-quaternary transcoding, discarding the entropy model's optimization. This mismatch undermines co...
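The "naive binary-to-quaternary transcoding" being criticized is the fixed two-bits-per-nucleotide mapping; a sketch of that baseline follows (the well-known mapping, shown here to clarify what SCONE replaces):

```python
# Sketch of the naive baseline: map each bit pair to one nucleotide,
# ignoring the codec's entropy model and biochemical constraints such
# as homopolymer runs or GC content.
BIT_PAIR_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}

def bits_to_dna(bits: str) -> str:
    assert len(bits) % 2 == 0
    return "".join(BIT_PAIR_TO_BASE[bits[i:i + 2]]
                   for i in range(0, len(bits), 2))

print(bits_to_dna("0011011000"))  # -> "ATCGA"
```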

*Notable papers are those with at least two authors from a "big" AI/ML lab.