πŸ“„ Notable* Recent AI/ML arXiv Papers


πŸ“„ Reverso: Efficient Time Series Foundation Models for Zero-shot Forecasting
πŸ—“οΈ Published: 2/19/2026
πŸ”— http://arxiv.org/abs/2602.17634v1
πŸ‘₯ Authors: Xinghong Fu, Yanhong Li (possible past Baidu (China) affiliation), Georgios Papaioannou, Yoon Kim (possible past University Of Oxford affiliation)
Abstract

Learning time series foundation models has been shown to be a promising approach for zero-shot time series forecasting across diverse time series domains. Insofar as scaling has been a critical driver of performance of foundation models in other modalities such as language and vision, much recent work on time series foundation modeling has focused on scaling. This has resulted in time series foundation models with hundreds of millions of parameters that, while performant, are inefficient and exp...

πŸ“„ AI Gamestore: Scalable, Open-Ended Evaluation of Machine General Intelligence with Human Games
πŸ—“οΈ Published: 2/19/2026
πŸ”— http://arxiv.org/abs/2602.17594v1
πŸ‘₯ Authors: Lance Ying, Ryan Truong, Prafull Sharma, Kaiya Ivy Zhao, Nathan Cloos, Kelsey R. Allen, Thomas L. Griffiths (possible past University Of California, Berkeley affiliation), Katherine M. Collins, JosΓ© HernΓ‘ndez-Orallo, Phillip Isola (possible past University Of California, Berkeley affiliation), Samuel J. Gershman, Joshua B. Tenenbaum (possible past Massachusetts Institute Of Technology affiliation)
Abstract

Rigorously evaluating machine intelligence against the broad spectrum of human general intelligence has become increasingly important and challenging in this era of rapid technological advance. Conventional AI benchmarks typically assess only narrow capabilities in a limited range of human activity. Most are also static, quickly saturating as developers explicitly or implicitly optimize for them. We propose that a more promising way to evaluate human-like general intelligence in AI systems is th...

πŸ“„ Improving LLM-based Recommendation with Self-Hard Negatives from Intermediate Layers
πŸ—“οΈ Published: 2/19/2026
πŸ”— http://arxiv.org/abs/2602.17410v1
πŸ‘₯ Authors: Bingqian Li, Bowen Zheng, Xiaolei Wang, Long Zhang, Jinpeng Wang (possible past Tencent (China) affiliation), Sheng Chen, Wayne Xin Zhao (possible past Baidu (China) affiliation), Ji-Rong Wen
Abstract

Large language models (LLMs) have shown great promise in recommender systems, where supervised fine-tuning (SFT) is commonly used for adaptation. Subsequent studies further introduce preference learning to incorporate negative samples into the training process. However, existing methods rely on sequence-level, offline-generated negatives, making them less discriminative and informative when adapting LLMs to recommendation tasks with large negative item spaces. To address these challenges, we pro...
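
A toy sketch of the general idea of mining hard negatives from hidden representations (the cosine criterion, the function names, and the random data are illustrative assumptions, not the paper's method):

```python
import numpy as np

def mine_hard_negatives(pos_vec, cand_vecs, k=4):
    """Rank candidate items by cosine similarity to the positive item in
    some hidden-representation space; the most similar candidates act as
    'hard' negatives for preference learning."""
    pos = pos_vec / np.linalg.norm(pos_vec)
    cands = cand_vecs / np.linalg.norm(cand_vecs, axis=1, keepdims=True)
    sims = cands @ pos                     # cosine similarity per candidate
    order = np.argsort(-sims)[:k]          # most similar (hardest) first
    return order, sims[order]

rng = np.random.default_rng(0)
idx, sims = mine_hard_negatives(rng.normal(size=64), rng.normal(size=(100, 64)))
```

The payoff of hard negatives is that they sit close to the decision boundary, so each preference-learning update is more informative than one against a random item.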

πŸ“„ Towards Cross-lingual Values Assessment: A Consensus-Pluralism Perspective
πŸ—“οΈ Published: 2/19/2026
πŸ”— http://arxiv.org/abs/2602.17283v1
πŸ‘₯ Authors: Yukun Chen, Xinyu Zhang (possible past Baidu (China) affiliation), Jialong Tang, Yu Wan, Baosong Yang (possible past Tencent (China) affiliation), Yiming Li (possible past Tsinghua University affiliation), Zhan Qin, Kui Ren
Abstract

While large language models (LLMs) have become pivotal to content safety, current evaluation paradigms primarily focus on detecting explicit harms (e.g., violence or hate speech), neglecting the subtler value dimensions conveyed in digital content. To bridge this gap, we introduce X-Value, a novel Cross-lingual Values Assessment Benchmark designed to evaluate LLMs' ability to assess deep-level values of content from a global perspective. X-Value consists of more than 5,000 QA pairs across 18 lan...

πŸ“„ Deeper detection limits in astronomical imaging using self-supervised spatiotemporal denoising
πŸ—“οΈ Published: 2/19/2026
πŸ”— http://arxiv.org/abs/2602.17205v1
πŸ‘₯ Authors: Yuduo Guo, Hao Zhang (possible past Tencent (China) affiliation), Mingyu Li, Fujiang Yu, Yunjing Wu, Yuhan Hao, Song Huang, Yongming Liang, Xiaojing Lin, Xinyang Li, Jiamin Wu, Zheng Cai, Qionghai Dai (possible past Tsinghua University affiliation)
Abstract

The detection limit of astronomical imaging observations is set by several noise sources. Some of this noise is correlated between neighbouring image pixels and exposures, so in principle it could be learned and corrected. We present an astronomical self-supervised transformer-based denoising algorithm (ASTERIS) that integrates spatiotemporal information across multiple exposures. Benchmarking on mock data indicates that ASTERIS improves detection limits by 1.0 magnitude at 90% completeness an...
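
The motivating claim, that correlated noise does not average away with more exposures, is easy to check numerically; this generic sketch is unrelated to the ASTERIS code itself:

```python
import numpy as np

rng = np.random.default_rng(42)
n_exposures, n_pix = 64, 10_000
indep = rng.normal(size=(n_exposures, n_pix))   # independent per-exposure noise
shared = rng.normal(size=(1, n_pix))            # noise correlated across exposures
stack = indep + shared                          # each exposure carries both

per_exposure_std = stack.std()                  # ~sqrt(2) per exposure
coadd_std = stack.mean(axis=0).std()
# The independent part averages down by ~1/sqrt(64) ~ 0.13, but the shared
# part does not, so the coadded noise floor stays near 1.0 instead of ~0.18.
```

A model that learns the correlated component can subtract it, which is the headroom the abstract points to.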

πŸ“„ LLM4Cov: Execution-Aware Agentic Learning for High-coverage Testbench Generation
πŸ—“οΈ Published: 2/18/2026
πŸ”— http://arxiv.org/abs/2602.16953v1
πŸ‘₯ Authors: Hejia Zhang, Zhongming Yu, Chia-Tung Ho, Haoxing Ren (possible past Nvidia (United States) affiliation), Brucek Khailany (possible past Nvidia (United States) affiliation), Jishen Zhao
Abstract

Execution-aware LLM agents offer a promising paradigm for learning from tool feedback, but such feedback is often expensive and slow to obtain, making online reinforcement learning (RL) impractical. High-coverage hardware verification exemplifies this challenge due to its reliance on industrial simulators and non-differentiable execution signals. We propose LLM4Cov, an offline agent-learning framework that models verification as memoryless state transitions guided by deterministic evaluators. Bu...

πŸ“„ Discovering Multiagent Learning Algorithms with Large Language Models
πŸ—“οΈ Published: 2/18/2026
πŸ”— http://arxiv.org/abs/2602.16928v1
πŸ‘₯ Authors: Zun Li, John Schultz, Daniel Hennes (possible past Deepmind (United Kingdom) affiliation), Marc Lanctot (possible past Google (United States) affiliation)
Abstract

Much of the advancement of Multi-Agent Reinforcement Learning (MARL) in imperfect-information games has historically depended on manual iterative refinement of baselines. While foundational families like Counterfactual Regret Minimization (CFR) and Policy Space Response Oracles (PSRO) rest on solid theoretical ground, the design of their most effective variants often relies on human intuition to navigate a vast algorithmic design space. In this work, we propose the use of AlphaEvolve, an evoluti...

πŸ“„ SimToolReal: An Object-Centric Policy for Zero-Shot Dexterous Tool Manipulation
πŸ—“οΈ Published: 2/18/2026
πŸ”— http://arxiv.org/abs/2602.16863v1
πŸ‘₯ Authors: Kushal Kedia, Tyler Ga Wei Lum, Jeannette Bohg (possible past Stanford University affiliation), C. Karen Liu (possible past Stanford University affiliation)
Abstract

The ability to manipulate tools significantly expands the set of tasks a robot can perform. Yet, tool manipulation represents a challenging class of dexterity, requiring grasping thin objects, in-hand object rotations, and forceful interactions. Since collecting teleoperation data for these behaviors is challenging, sim-to-real reinforcement learning (RL) is a promising alternative. However, prior approaches typically require substantial engineering effort to model objects and tune reward functi...

πŸ“„ Large-scale online deanonymization with LLMs
πŸ—“οΈ Published: 2/18/2026
πŸ”— http://arxiv.org/abs/2602.16800v1
πŸ‘₯ Authors: Simon Lermen, Daniel Paleka, Joshua Swanson, Michael Aerni, Nicholas Carlini (possible past Google (United States) affiliation), Florian TramΓ¨r (possible past Stanford University affiliation)
Abstract

We show that large language models can be used to perform at-scale deanonymization. With full Internet access, our agent can re-identify Hacker News users and Anthropic Interviewer participants at high precision, given pseudonymous online profiles and conversations alone, matching what would take hours for a dedicated human investigator. We then design attacks for the closed-world setting. Given two databases of pseudonymous individuals, each containing unstructured text written by or about that...
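
The closed-world setting described here is, at its core, a bipartite matching problem. A toy sketch with random feature vectors standing in for text embeddings (nothing below is the authors' pipeline):

```python
import numpy as np

def match_profiles(emb_a, emb_b):
    """Link each profile in database A to its most similar profile in B
    (cosine similarity). A real attack would embed the unstructured text;
    here random features stand in for those embeddings."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    return (a @ b.T).argmax(axis=1)        # best-match index in B per row of A

rng = np.random.default_rng(1)
true_emb = rng.normal(size=(50, 32))                      # one vector per person
db_a = true_emb + 0.1 * rng.normal(size=(50, 32))         # noisy view A
perm = rng.permutation(50)
db_b = true_emb[perm] + 0.1 * rng.normal(size=(50, 32))   # shuffled noisy view B

pred = match_profiles(db_a, db_b)
accuracy = (perm[pred] == np.arange(50)).mean()           # fraction re-identified
```

Even this naive matcher recovers most identities when the two views share signal, which is why pseudonymity across databases is fragile.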

πŸ“„ AI-Driven Structure Refinement of X-ray Diffraction
πŸ—“οΈ Published: 2/18/2026
πŸ”— http://arxiv.org/abs/2602.16372v1
πŸ‘₯ Authors: Bin Cao (possible past Microsoft (United States) affiliation), Qian Zhang (possible past University Of Washington affiliation), Zhenjie Feng, Taolue Zhang, Jiaqiang Huang, Lu-Tao Weng, Tong-Yi Zhang
Abstract

Artificial intelligence can rapidly propose candidate phases and structures from X-ray diffraction (XRD), but these hypotheses often fail in downstream refinement because peak intensities cannot be stably assigned under severe overlap and diffraction consistency is enforced only weakly. Here we introduce WPEM, a physics-constrained whole-pattern decomposition and refinement workflow that turns Bragg's law into an explicit constraint within a batch expectation-maximization framework. WPEM models...
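
For reference, the Bragg constraint invoked above fixes where diffraction peaks may appear. A quick calculator, assuming Cu K-alpha radiation and first-order reflection (not part of WPEM):

```python
import math

def bragg_two_theta(d, wavelength=1.5406):
    """Bragg's law n*lambda = 2*d*sin(theta); for n=1, return the
    diffraction angle 2*theta in degrees for lattice spacing d in
    angstroms (Cu K-alpha wavelength assumed)."""
    s = wavelength / (2 * d)
    if s > 1:
        return None            # spacing too small: no reflection possible
    return 2 * math.degrees(math.asin(s))

print(round(bragg_two_theta(3.1355), 1))   # silicon (111): ~28.4 degrees
```

Constraining peak positions this way leaves only intensities and profile shapes to be resolved, which is what makes the decomposition tractable under severe overlap.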

πŸ“„ Multi-agent cooperation through in-context co-player inference
πŸ—“οΈ Published: 2/18/2026
πŸ”— http://arxiv.org/abs/2602.16301v1
πŸ‘₯ Authors: Marissa A. Weis, Maciej WoΕ‚czyk, Rajai Nasser, Rif A. Saurous (possible past Google (United States) affiliation), Blaise AgΓΌera Y Arcas (possible past Google (United States) affiliation), JoΓ£o Sacramento (possible past Eth Zurich affiliation), Alexander Meulemans
Abstract

Achieving cooperation among self-interested agents remains a fundamental challenge in multi-agent reinforcement learning. Recent work showed that mutual cooperation can be induced between "learning-aware" agents that account for and shape the learning dynamics of their co-players. However, existing approaches typically rely on hardcoded, often inconsistent, assumptions about co-player learning rules or enforce a strict separation between "naive learners" updating on fast timescales and "meta-lea...

πŸ“„ A Theoretical Framework for Modular Learning of Robust Generative Models
πŸ—“οΈ Published: 2/19/2026
πŸ”— http://arxiv.org/abs/2602.17554v1
πŸ‘₯ Authors: Corinna Cortes (possible past Google (United States) affiliation), Mehryar Mohri (possible past Google (United States) affiliation), Yutao Zhong
Abstract

Training large-scale generative models is resource-intensive and relies heavily on heuristic dataset weighting. We address two fundamental questions: Can we train Large Language Models (LLMs) modularly, combining small, domain-specific experts to match monolithic performance, and can we do so robustly for any data mixture, eliminating heuristic tuning? We present a theoretical framework for modular generative modeling where a set of pre-trained experts are combined via a gating mechanism. We defin...
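
In its simplest form, the modular combination studied here can be pictured as a gated convex mixture of expert next-token distributions; this generic sketch carries none of the paper's theory:

```python
import numpy as np

def gated_mixture(expert_probs, gate_weights):
    """p(x) = sum_k w_k * p_k(x): convex combination of per-expert
    next-token distributions. expert_probs has shape (K, vocab)."""
    w = np.asarray(gate_weights)[:, None]
    return (w * np.asarray(expert_probs)).sum(axis=0)

experts = [[0.7, 0.2, 0.1],   # expert trained on domain 1
           [0.1, 0.1, 0.8]]   # expert trained on domain 2
mix = gated_mixture(experts, [0.5, 0.5])   # -> [0.4, 0.15, 0.45]
```

Because the gate weights form a convex combination, the mixture is always a valid distribution, and reweighting domains needs no retraining of the experts.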

πŸ“„ Retrospective In-Context Learning for Temporal Credit Assignment with Large Language Models
πŸ—“οΈ Published: 2/19/2026
πŸ”— http://arxiv.org/abs/2602.17497v1
πŸ‘₯ Authors: Wen-Tse Chen, Jiayu Chen, Fahim Tajwar, Hao Zhu (possible past Tsinghua University affiliation), Xintong Duan, Ruslan Salakhutdinov (possible past University Of Toronto affiliation), Jeff Schneider
Abstract

Learning from self-sampled data and sparse environmental feedback remains a fundamental challenge in training self-evolving agents. Temporal credit assignment mitigates this issue by transforming sparse feedback into dense supervision signals. However, previous approaches typically depend on learning task-specific value functions for credit assignment, which suffer from poor sample efficiency and limited generalization. In this work, we propose to leverage pretrained knowledge from large languag...
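
For context on turning sparse feedback into dense supervision, the classical value-style scheme spreads a terminal reward backward as a discounted return-to-go (shown only to frame the problem; the paper replaces learned value functions with LLM-based retrospection):

```python
def dense_credit(rewards, gamma=0.5):
    """Spread a sparse (often terminal) reward backward over the whole
    trajectory via discounted return-to-go, turning one sparse signal
    into a per-step supervision target."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]

print(dense_credit([0, 0, 0, 1.0]))   # [0.125, 0.25, 0.5, 1.0]
```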

πŸ“„ Unified Latents (UL): How to train your latents
πŸ—“οΈ Published: 2/19/2026
πŸ”— http://arxiv.org/abs/2602.17270v1
πŸ‘₯ Authors: Jonathan Heek (possible past Google (United States) affiliation), Emiel Hoogeboom, Thomas Mensink, Tim Salimans (possible past Openai (United States) affiliation)
Abstract

We present Unified Latents (UL), a framework for learning latent representations that are jointly regularized by a diffusion prior and decoded by a diffusion model. By linking the encoder's output noise to the prior's minimum noise level, we obtain a simple training objective that provides a tight upper bound on the latent bitrate. On ImageNet-512, our approach achieves a competitive FID of 1.4, with high reconstruction quality (PSNR) while requiring fewer training FLOPs than models trained on Sta...

πŸ“„ Anti-causal domain generalization: Leveraging unlabeled data
πŸ—“οΈ Published: 2/19/2026
πŸ”— http://arxiv.org/abs/2602.17187v1
πŸ‘₯ Authors: Sorawit Saengkyongam, Juan L. Gamella, Andrew C. Miller (possible past Google (United States) affiliation), Jonas Peters (possible past Eth Zurich affiliation), Nicolai Meinshausen, Christina Heinze-Deml
Abstract

The problem of domain generalization concerns learning predictive models that are robust to distribution shifts when deployed in new, previously unseen environments. Existing methods typically require labeled data from multiple training environments, limiting their applicability when labeled data are scarce. In this work, we study domain generalization in an anti-causal setting, where the outcome causes the observed covariates. Under this structure, environment perturbations that affect the cova...

πŸ“„ MeGU: Machine-Guided Unlearning with Target Feature Disentanglement
πŸ—“οΈ Published: 2/19/2026
πŸ”— http://arxiv.org/abs/2602.17088v1
πŸ‘₯ Authors: Haoyu Wang (possible past Tencent (China) affiliation), Zhuo Huang, Xiaolong Wang (possible past Carnegie Mellon University affiliation), Bo Han, Zhiwei Lin, Tongliang Liu
Abstract

The growing concern over training data privacy has elevated the "Right to be Forgotten" into a critical requirement, thereby raising the demand for effective Machine Unlearning. However, existing unlearning approaches commonly suffer from a fundamental trade-off: aggressively erasing the influence of target data often degrades model utility on retained data, while conservative strategies leave residual target information intact. In this work, the intrinsic representation properties learned durin...

πŸ“„ Multi-Probe Zero Collision Hash (MPZCH): Mitigating Embedding Collisions and Enhancing Model Freshness in Large-Scale Recommenders
πŸ—“οΈ Published: 2/19/2026
πŸ”— http://arxiv.org/abs/2602.17050v1
πŸ‘₯ Authors: Ziliang Zhao, Bi Xue, Emma Lin, Mengjiao Zhou, Kaustubh Vartak, Shakhzod Ali-Zade, Carson Lu, Tao Li (possible past Baidu (China) affiliation), Bin Kuang, Rui Jian, Bin Wen, Dennis Van Der Staay, Yixin Bao, Eddy Li, Chao Deng, Songbin Liu, Qifan Wang (possible past Google (United States) affiliation), Kai Ren
Abstract

Embedding tables are critical components of large-scale recommendation systems, facilitating the efficient mapping of high-cardinality categorical features into dense vector representations. However, as the volume of unique IDs expands, traditional hash-based indexing methods suffer from collisions that degrade model performance and personalization quality. We present Multi-Probe Zero Collision Hash (MPZCH), a novel indexing mechanism based on linear probing that effectively mitigates embedding ...
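
The collision problem is easy to see: a plain `hash(id) % table_size` index forces distinct IDs to share embedding rows. A minimal linear-probing sketch of collision-free slot assignment (an illustration of the general idea, not the MPZCH implementation):

```python
class ZeroCollisionTable:
    """Toy linear-probing ID-to-slot map: each unique ID claims its own
    embedding row while free slots remain."""
    def __init__(self, size):
        self.size = size
        self.slots = [None] * size          # slot -> ID, None means free

    def lookup(self, id_):
        start = hash(id_) % self.size
        for step in range(self.size):
            j = (start + step) % self.size  # linear probe sequence
            if self.slots[j] == id_:
                return j                    # already assigned: stable slot
            if self.slots[j] is None:
                self.slots[j] = id_         # claim a fresh, collision-free slot
                return j
        raise RuntimeError("table full; an eviction policy is needed")

t = ZeroCollisionTable(8)
a, b = t.lookup("user:1"), t.lookup("user:2")
```

The interesting engineering, which the sketch omits, is what to do when the table fills: a production system must evict stale IDs without disturbing fresh ones.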

πŸ“„ Characterizing the Predictive Impact of Modalities with Supervised Latent-Variable Modeling
πŸ—“οΈ Published: 2/19/2026
πŸ”— http://arxiv.org/abs/2602.16979v1
πŸ‘₯ Authors: Divyam Madaan, Sumit Chopra (possible past Meta (United States) affiliation), Kyunghyun Cho (possible past Meta (United States) affiliation)
Abstract

Despite the recent success of Multimodal Large Language Models (MLLMs), existing approaches predominantly assume the availability of multiple modalities during training and inference. In practice, multimodal data is often incomplete because modalities may be missing, collected asynchronously, or available only for a subset of examples. In this work, we propose PRIMO, a supervised latent-variable imputation model that quantifies the predictive impact of any missing modality within the multimodal ...

πŸ“„ Training Large Reasoning Models Efficiently via Progressive Thought Encoding
πŸ—“οΈ Published: 2/18/2026
πŸ”— http://arxiv.org/abs/2602.16839v1
πŸ‘₯ Authors: Zeliang Zhang, Xiaodong Liu, Hao Cheng (possible past Tencent (China) affiliation), Hao Sun, Chenliang Xu, Jianfeng Gao (possible past Microsoft (United States) affiliation)
Abstract

Large reasoning models (LRMs) excel on complex problems but face a critical barrier to efficiency: reinforcement learning (RL) training requires long rollouts for outcome-based rewards, where autoregressive decoding dominates time and memory usage. While sliding-window cache strategies can bound memory, they disrupt long-context reasoning and degrade performance. We introduce Progressive Thought Encoding, a parameter-efficient fine-tuning method that enables LRMs to reason effectively under fixe...
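
For context, the sliding-window caching contrasted here reduces to a bounded buffer: memory stays constant, but evicted entries are gone, which is exactly why long-range reasoning degrades. A schematic (not any real serving stack):

```python
from collections import deque

class SlidingKVCache:
    """Bounded KV buffer: keeps only the last `window` entries, so memory
    is constant in sequence length; everything older is silently evicted."""
    def __init__(self, window):
        self.kv = deque(maxlen=window)

    def append(self, entry):
        self.kv.append(entry)

cache = SlidingKVCache(window=4)
for t in range(100):
    cache.append((t, f"kv_{t}"))
# len(cache.kv) stays at 4; entries for tokens 0..95 have been dropped
```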

πŸ“„ Retrieval-Augmented Foundation Models for Matched Molecular Pair Transformations to Recapitulate Medicinal Chemistry Intuition
πŸ—“οΈ Published: 2/18/2026
πŸ”— http://arxiv.org/abs/2602.16684v1
πŸ‘₯ Authors: Bo Pan, Peter Zhiping Zhang, Hao-Wei Pang, Alex Zhu, Xiang Yu (possible past University Of Washington affiliation), Liying Zhang, Liang Zhao (possible past Baidu (China) affiliation)
Abstract

Matched molecular pairs (MMPs) capture the local chemical edits that medicinal chemists routinely use to design analogs, but existing ML approaches either operate at the whole-molecule level with limited edit controllability or learn MMP-style edits from restricted settings and small models. We propose a variable-to-variable formulation of analog generation and train a foundation model on large-scale MMP transformations (MMPTs) to generate diverse variables conditioned on an input variable. To e...

*Notable papers are those with at least two authors from a "big" AI/ML lab.