πŸ“„ Notable* Recent AI/ML arXiv Papers

πŸ“„ Towards Streaming Target Speaker Extraction via Chunk-wise Interleaved Splicing of Autoregressive Language Model
πŸ—“οΈ Published: 4/21/2026
πŸ”— http://arxiv.org/abs/2604.19635v1
πŸ‘₯ Authors: Shuhai Peng, Hui Lu, Jinjiang Liu, Liyang Chen, Guiping Zhong, Jiakui Li, Huimeng Wang, Haiyun Li, Liang Cao, Shiyin Kang (possible past Tencent (China) affiliation), Zhiyong Wu (possible past Tsinghua University affiliation)
Abstract

While generative models have set new benchmarks for Target Speaker Extraction (TSE), their inherent reliance on global context precludes deployment in real-time applications. Direct adaptation to streaming scenarios often leads to catastrophic inference performance degradation due to the severe mismatch between training and streaming inference. To bridge this gap, we present the first autoregressive (AR) models tailored for streaming TSE. Our approach introduces a Chunk-wise Interleaved Splicing...

πŸ“„ EgoSelf: From Memory to Personalized Egocentric Assistant
πŸ—“οΈ Published: 4/21/2026
πŸ”— http://arxiv.org/abs/2604.19564v1
πŸ‘₯ Authors: Yanshuo Wang, Yuan Xu, Xuesong Li, Jie Hong, Yizhou Wang (possible past Peking University affiliation), Chang Wen Chen, Wentao Zhu (possible past Nvidia (United States) affiliation)
Abstract

Egocentric assistants often rely on first-person view data to capture user behavior and context for personalized services. Since different users exhibit distinct habits, preferences, and routines, such personalization is essential for truly effective assistance. However, effectively integrating long-term user data for personalization remains a key challenge. To address this, we introduce EgoSelf, a system that includes a graph-based interaction memory constructed from past observations and a ded...

πŸ“„ Taming Actor-Observer Asymmetry in Agents via Dialectical Alignment
πŸ—“οΈ Published: 4/21/2026
πŸ”— http://arxiv.org/abs/2604.19548v1
πŸ‘₯ Authors: Bobo Li, Rui Wu (possible past Google (United States) affiliation), Zibo Ji, Meishan Zhang, Hao Fei, Min Zhang (possible past Tsinghua University affiliation), Mong-Li Lee, Wynne Hsu (possible past National University Of Singapore affiliation)
Abstract

Large Language Model agents have rapidly evolved from static text generators into dynamic systems capable of executing complex autonomous workflows. To enhance reliability, multi-agent frameworks assigning specialized roles are increasingly adopted to enable self-reflection and mutual auditing. While such role-playing effectively leverages domain expert knowledge, we find it simultaneously induces a human-like cognitive bias known as Actor-Observer Asymmetry (AOA). Specifically, an agent acting ...

πŸ“„ DT2IT-MRM: Debiased Preference Construction and Iterative Training for Multimodal Reward Modeling
πŸ—“οΈ Published: 4/21/2026
πŸ”— http://arxiv.org/abs/2604.19544v1
πŸ‘₯ Authors: Zhihong Zhang, Jie Zhao (possible past Baidu (China) affiliation), Xiaojian Huang, Jin Xu (possible past Tencent (China) affiliation), Zhuodong Luo, Xin Liu, Jiansheng Wei, Xuejin Chen
Abstract

Multimodal reward models (MRMs) play a crucial role in aligning Multimodal Large Language Models (MLLMs) with human preferences. Training a good MRM requires high-quality multimodal preference data. However, existing preference datasets face three key challenges: lack of granularity in preference strength, textual style bias, and unreliable preference signals. Moreover, existing open-source multimodal preference datasets suffer from substantial noise, yet there is a lack of effective and scalable...
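A common starting point for the preference-signal issues the abstract lists is the Bradley-Terry pairwise loss used to train reward models; the sketch below adds a margin term as one generic way to encode graded preference strength. This is an illustration of the standard technique, not the DT2IT-MRM method, and `bt_loss` plus all numbers are invented for illustration.

```python
import math

def bt_loss(r_chosen, r_rejected, margin=0.0):
    """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected - margin).
    A larger margin encodes a stronger stated preference."""
    z = r_chosen - r_rejected - margin
    return math.log1p(math.exp(-z))

# A stronger stated preference (larger margin) demands a larger reward gap
# before the loss vanishes.
weak = bt_loss(1.0, 0.0, margin=0.0)
strong = bt_loss(1.0, 0.0, margin=0.5)
assert strong > weak

# A confidently separated pair is penalized far less than a marginal one.
assert bt_loss(5.0, 0.0) < bt_loss(0.1, 0.0)
```

In practice the rewards come from a trainable model head and the loss is minimized over a preference dataset; the margin is one way a dataset's preference-strength labels could enter the objective.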

πŸ“„ From Experience to Skill: Multi-Agent Generative Engine Optimization via Reusable Strategy Learning
πŸ—“οΈ Published: 4/21/2026
πŸ”— http://arxiv.org/abs/2604.19516v1
πŸ‘₯ Authors: Beining Wu, Fuyou Mao, Jiong Lin, Cheng Yang (possible past Tsinghua University affiliation), Jiaxuan Lu, Yifu Guo, Siyu Zhang, Yifan Wu (possible past Carnegie Mellon University affiliation), Ying Huang, Fu Li (possible past Baidu (China) affiliation)
Abstract

Generative engines (GEs) are reshaping information access by replacing ranked links with citation-grounded answers, yet current Generative Engine Optimization (GEO) methods optimize each instance in isolation, unable to accumulate or transfer effective strategies across tasks and engines. We reframe GEO as a strategy learning problem and propose MAGEO, a multi-agent framework in which coordinated planning, editing, and fidelity-aware evaluation serve as the execution layer, while validated editi...

πŸ“„ LASER: Learning Active Sensing for Continuum Field Reconstruction
πŸ—“οΈ Published: 4/21/2026
πŸ”— http://arxiv.org/abs/2604.19355v1
πŸ‘₯ Authors: Huayu Deng, Jinghui Zhong, Xiangming Zhu, Yunbo Wang (possible past Tsinghua University affiliation), Xiaokang Yang (possible past Shanghai Jiao Tong University affiliation)
Abstract

High-fidelity measurements of continuum physical fields are essential for scientific discovery and engineering design but remain challenging under sparse and constrained sensing. Conventional reconstruction methods typically rely on fixed sensor layouts, which cannot adapt to evolving physical states. We propose LASER, a unified, closed-loop framework that formulates active sensing as a Partially Observable Markov Decision Process (POMDP). At its core, LASER employs a continuum field latent worl...
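The closed-loop POMDP framing can be pictured with a toy 1-D loop: place each new sensor at the most uncertain location, observe, and update a belief over the field. This is a crude stand-in for LASER's learned latent world model; the field, noise level, and belief update below are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D field to reconstruct, observed only at chosen sensor locations.
true_field = np.sin(np.linspace(0, np.pi, 20))

estimate = np.zeros_like(true_field)
uncertainty = np.ones_like(true_field)  # stand-in belief over the field

for step in range(5):
    # Act: place the next sensor where the belief is most uncertain.
    loc = int(np.argmax(uncertainty))
    # Observe: noisy point measurement of the field.
    obs = true_field[loc] + rng.normal(scale=0.01)
    # Update belief: collapse uncertainty at the observed location and
    # shrink it at the neighbors (a crude proxy for a learned world model).
    estimate[loc] = obs
    uncertainty[loc] = 0.0
    lo, hi = max(0, loc - 2), min(len(uncertainty), loc + 3)
    uncertainty[lo:hi] *= 0.5

# Sensing concentrated where it was informative, so total uncertainty dropped.
assert uncertainty.sum() < len(true_field)
assert uncertainty.min() == 0.0
```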

πŸ“„ Evaluation-driven Scaling for Scientific Discovery
πŸ—“οΈ Published: 4/21/2026
πŸ”— http://arxiv.org/abs/2604.19341v1
πŸ‘₯ Authors: Haotian Ye (possible past Peking University affiliation), Haowei Lin, Jingyi Tang, Yizhen Luo (possible past Tsinghua University affiliation), Caiyin Yang, Chang Su, Rahul Thapa, Rui Yang, Ruihua Liu, Zeyu Li (possible past Peking University affiliation), Chong Gao, Dachao Ding, Guangrong He, Miaolei Zhang, Lina Sun, Wenyang Wang, Yuchen Zhong, Zhuohao Shen, Di He, Jianzhu Ma, Stefano Ermon (possible past Stanford University affiliation), Tongyang Li, Xiaowen Chu, James Zou, Yuzhi Xu
Abstract

Language models are increasingly used in scientific discovery to generate hypotheses, propose candidate solutions, implement systems, and iteratively refine them. At the core of these trial-and-error loops lies evaluation: the process of obtaining feedback on candidate solutions via verifiers, simulators, or task-specific scoring functions. While prior work has highlighted the importance of evaluation, it has not explicitly formulated the problem of how evaluation-driven discovery loops can be s...

πŸ“„ Location Not Found: Exposing Implicit Local and Global Biases in Multilingual LLMs
πŸ—“οΈ Published: 4/21/2026
πŸ”— http://arxiv.org/abs/2604.19292v1
πŸ‘₯ Authors: Guy Mor-Lan, Omer Goldman, Matan Eyal, Adi Mayrav Gilady, Sivan Eiger, Idan Szpektor (possible past Google (United States) affiliation), Avinatan Hassidim (possible past Google (United States) affiliation), Yossi Matias (possible past Google (United States) affiliation), Reut Tsarfaty
Abstract

Multilingual large language models (LLMs) have minimized the fluency gap between languages. This advancement, however, exposes models to the risk of biased behavior, as knowledge and norms may propagate across languages. In this work, we aim to quantify models' inter- and intra-lingual biases, via their ability to answer locale-ambiguous questions. To this end, we present LocQA, a test set containing 2,156 questions in 12 languages, referring to various locale-dependent facts such as laws, dates...

πŸ“„ CulturALL: Benchmarking Multilingual and Multicultural Competence of LLMs on Grounded Tasks
πŸ—“οΈ Published: 4/21/2026
πŸ”— http://arxiv.org/abs/2604.19262v1
πŸ‘₯ Authors: Peiqin Lin, Chenyang Lyu, Wenjiang Luo, Haotian Ye (possible past Peking University affiliation), Md Mehrab Hossain, Chunlan Ma, Shaoxiong Ji, Younes Samih, Bo Zeng, Fan Jiang (possible past Shanghai Jiao Tong University affiliation), Yuanbin Cao, Dilda Duisenbek, Adrian Neo Sau Xun, Daria Pozdniakova, Liubou Misevich, Nevena MarinkoviΔ‡, Ngoc Gia Linh Nguyen, Thi Khanh Linh Do, Sarakmatak Sophy, Baotian Hu, Guanhua Chen, Gongbo Tang, Alham Fikri Aji, Longyue Wang (possible past Tencent (China) affiliation), Weihua Luo
Abstract

Large language models (LLMs) are now deployed worldwide, inspiring a surge of benchmarks that measure their multilingual and multicultural abilities. However, these benchmarks prioritize generic language understanding or superficial cultural trivia, leaving the evaluation of grounded tasks -- where models must reason within real-world, context-rich scenarios -- largely unaddressed. To fill this gap, we present CulturALL, a comprehensive and challenging benchmark to assess LLMs' multilingual and ...

πŸ“„ How Do Answer Tokens Read Reasoning Traces? Self-Reading Patterns in Thinking LLMs for Quantitative Reasoning
πŸ—“οΈ Published: 4/21/2026
πŸ”— http://arxiv.org/abs/2604.19149v1
πŸ‘₯ Authors: Haoyang Chen, Yi Liu (possible past Google (United States) affiliation), Jianzhi Shao, Tao Zhang (possible past Nvidia (United States) affiliation), Chengfu Huo, Wei Hu
Abstract

Thinking LLMs produce reasoning traces before answering. Prior activation steering work has mainly targeted shaping these traces; how answer tokens actually read and integrate the reasoning to produce reliable outcomes remains less well understood. Focusing on quantitative reasoning, we analyze the answer-to-reasoning attention and observe a benign self-reading pattern aligned with correctness, characterized by a forward drift of the reading focus along the reasoning trace and a persistent concentra...
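In its simplest form, a "reading focus" like the one the abstract describes can be computed as the attention-weighted mean position of each answer token over the reasoning trace. The random matrix below merely stands in for a real model's attention maps, so no drift pattern is implied; this is a generic measurement sketch, not the paper's analysis code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_answer, n_reason = 4, 10

# Attention from each answer token to the reasoning trace (rows sum to 1).
attn = rng.random(size=(n_answer, n_reason))
attn /= attn.sum(axis=1, keepdims=True)

# "Reading focus" of each answer token: attention-weighted mean position.
positions = np.arange(n_reason)
focus = attn @ positions

# A forward drift would appear as focus increasing across answer tokens.
assert focus.shape == (n_answer,)
assert ((0 <= focus) & (focus <= n_reason - 1)).all()
```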

πŸ“„ SAHM: A Benchmark for Arabic Financial and Shari'ah-Compliant Reasoning
πŸ—“οΈ Published: 4/21/2026
πŸ”— http://arxiv.org/abs/2604.19098v1
πŸ‘₯ Authors: Rania Elbadry, Sarfraz Ahmad, Ahmed Heakl, Dani Bouch, Momina Ahsan, Muhra Almahri, Marwa Elsaid Khalil, Yuxia Wang, Salem Lahlou, Sophia Ananiadou, Veselin Stoyanov (possible past Meta (United States) affiliation), Jimin Huang, Xueqing Peng, Preslav Nakov (possible past Tencent (China) affiliation), Zhuohan Xie
Abstract

English financial NLP has progressed rapidly through benchmarks for sentiment, document understanding, and financial question answering, while Arabic financial NLP remains comparatively under-explored despite strong practical demand for trustworthy finance and Islamic-finance assistants. We introduce SAHM, a document-grounded benchmark and instruction-tuning dataset for Arabic financial NLP and Shari'ah-compliant reasoning. SAHM contains 14,380 expert-verified instances spanning seven tasks: AAO...

πŸ“„ RoboWM-Bench: A Benchmark for Evaluating World Models in Robotic Manipulation
πŸ—“οΈ Published: 4/21/2026
πŸ”— http://arxiv.org/abs/2604.19092v1
πŸ‘₯ Authors: Feng Jiang, Yang Chen (possible past Tencent (China) affiliation), Kyle Xu, Yuchen Liu, Haifeng Wang (possible past Google (United States) affiliation), Zhenhao Shen, Jasper Lu, Shengze Huang, Yuanfei Wang, Chen Xie, Ruihai Wu
Abstract

Recent advances in large-scale video world models have enabled increasingly realistic future prediction, raising the prospect of leveraging imagined videos for robot learning. However, visual realism does not imply physical plausibility, and behaviors inferred from generated videos may violate dynamics and fail when executed by embodied agents. Existing benchmarks begin to incorporate notions of physical plausibility, but they largely remain perception- or diagnostic-oriented and do not systemat...

πŸ“„ ProjLens: Unveiling the Role of Projectors in Multimodal Model Safety
πŸ—“οΈ Published: 4/21/2026
πŸ”— http://arxiv.org/abs/2604.19083v1
πŸ‘₯ Authors: Kun Wang, Cheng Qian, Miao Yu, Lilan Peng, Liang Lin, Jiaming Zhang, Tianyu Zhang, Yu Cheng (possible past National University Of Singapore affiliation), Yang Wang (possible past Baidu (China) affiliation)
Abstract

Multimodal Large Language Models (MLLMs) have achieved remarkable success in cross-modal understanding and generation, yet their deployment is threatened by critical safety vulnerabilities. While prior works have demonstrated the feasibility of backdoors in MLLMs via fine-tuning data poisoning to manipulate inference, the underlying mechanisms of backdoor attacks remain opaque, complicating understanding and mitigation. To bridge this gap, we propose ProjLens, an interpretability framework d...

πŸ“„ Reducing the Offline-Streaming Gap for Unified ASR Transducer with Consistency Regularization
πŸ—“οΈ Published: 4/21/2026
πŸ”— http://arxiv.org/abs/2604.19079v1
πŸ‘₯ Authors: Andrei Andrusenko, Vladimir Bataev, Lilit Grigoryan, Nune Tadevosyan, Vitaly Lavrukhin (possible past Nvidia (United States) affiliation), Boris Ginsburg (possible past Nvidia (United States) affiliation)
Abstract

Unification of automatic speech recognition (ASR) systems reduces development and maintenance costs, but training a single model to perform well in both offline and low-latency streaming settings remains challenging. We present a Unified ASR framework for Transducer (RNNT) training that supports both offline and streaming decoding within a single model, using chunk-limited attention with right context and dynamic chunked convolutions. To further close the gap between offline and streaming perfor...
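Chunk-limited attention with right context, as named here, can be sketched as a boolean mask over query/key positions: each query sees all earlier chunks, its own chunk, and a few future frames. The actual masking and dynamic-convolution scheme in the paper may differ, and `chunk_attention_mask` is an illustrative name.

```python
import numpy as np

def chunk_attention_mask(seq_len, chunk, right_context=0):
    """True where query position i may attend to key position j:
    everything up to the end of i's chunk, plus `right_context`
    frames beyond it."""
    idx = np.arange(seq_len)
    chunk_end = (idx // chunk + 1) * chunk       # end of each query's chunk
    allowed_until = chunk_end + right_context    # exclusive upper bound on j
    return idx[None, :] < allowed_until[:, None]

m = chunk_attention_mask(seq_len=8, chunk=4, right_context=2)

# A query in the first chunk (position 1) sees its own 4-frame chunk
# plus 2 future frames: 6 key positions in total.
assert m[1].sum() == 6

# Offline decoding is recovered by letting right_context cover the sequence.
full = chunk_attention_mask(8, 4, right_context=8)
assert full.all()
```

The last two lines hint at why one mask family can serve both modes: streaming uses a small right context, while offline decoding is just the same mask with an unbounded one.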

πŸ“„ FedProxy: Federated Fine-Tuning of LLMs via Proxy SLMs and Heterogeneity-Aware Fusion
πŸ—“οΈ Published: 4/21/2026
πŸ”— http://arxiv.org/abs/2604.19015v1
πŸ‘₯ Authors: Tao Fan (possible past Tencent (China) affiliation), Guoqiang Ma, Yuanfeng Song, Lixin Fan, Kai Chen (possible past Shanghai Jiao Tong University affiliation), Qiang Yang
Abstract

Federated fine-tuning of Large Language Models (LLMs) is obstructed by a trilemma of challenges: protecting the LLM's intellectual property (IP), ensuring client privacy, and mitigating performance loss on heterogeneous data. Existing methods like Offsite-Tuning (OT) secure the LLM's IP by having clients train only lightweight adapters, yet our analysis reveals they suffer from a fundamental performance bottleneck, leaving a significant gap compared to centralized training. To bridge this gap, we intr...

πŸ“„ $R^2$-dLLM: Accelerating Diffusion Large Language Models via Spatio-Temporal Redundancy Reduction
πŸ—“οΈ Published: 4/21/2026
πŸ”— http://arxiv.org/abs/2604.18995v1
πŸ‘₯ Authors: Zhenbang Du, Kejing Xia, Xinrui Zhong, Yonggan Fu, Nicolai Oswald, Binfei Ji, Brucek Khailany (possible past Nvidia (United States) affiliation), Pavlo Molchanov (possible past Nvidia (United States) affiliation), Yingyan Lin
Abstract

Diffusion Large Language Models (dLLMs) have emerged as a promising alternative to autoregressive generation by enabling parallel token prediction. However, practical dLLM decoding still suffers from high inference latency, which limits deployment. In this work, we observe that a substantial part of this inefficiency comes from recurring redundancy in the decoding process, including spatial redundancy caused by confidence clusters and positional ambiguity, and temporal redundancy caused by repea...

πŸ“„ Low-Rank Adaptation for Critic Learning in Off-Policy Reinforcement Learning
πŸ—“οΈ Published: 4/21/2026
πŸ”— http://arxiv.org/abs/2604.18978v1
πŸ‘₯ Authors: Yuan Zhuang (possible past Baidu (China) affiliation), Yuexin Bian, Sihong He, Jie Feng (possible past Tsinghua University affiliation), Qing Su, Songyang Han, Jonathan Petit, Shihao Ji, Yuanyuan Shi, Fei Miao
Abstract

Scaling critic capacity is a promising direction for enhancing off-policy reinforcement learning (RL). However, larger critics are prone to overfitting and instability in replay-buffer-based bootstrap training. This paper leverages Low-Rank Adaptation (LoRA) as a structural-sparsity regularizer for off-policy critics. Our approach freezes randomly initialized base matrices and solely optimizes low-rank adapters, thereby constraining critic updates to a low-dimensional subspace. Built on top of Simb...
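The core mechanism described here, a frozen randomly initialized base matrix plus trainable low-rank factors, can be sketched in a few lines of numpy. This shows generic LoRA, not the paper's critic architecture; the dimensions, rank, and update step are made up, and `lora_forward` is an illustrative name.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 8, 4, 2

# Frozen, randomly initialized base weight (never updated).
W_base = rng.normal(size=(d_out, d_in))

# Trainable low-rank factors; B starts at zero so the adapted layer
# initially matches the frozen base exactly.
A = rng.normal(size=(rank, d_in)) * 0.01
B = np.zeros((d_out, rank))

def lora_forward(x):
    """y = (W_base + B @ A) x -- all updates live in a rank-2 subspace."""
    return (W_base + B @ A) @ x

x = rng.normal(size=d_in)
y0 = lora_forward(x)

# With B = 0 the adapter is inert: output equals the frozen base layer.
assert np.allclose(y0, W_base @ x)

# A (toy) update touches only B; W_base stays frozen yet the output moves.
B += 0.1 * rng.normal(size=B.shape)
y1 = lora_forward(x)
```

Constraining updates to the `rank`-dimensional subspace is exactly what gives the regularization effect the abstract attributes to LoRA here.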

πŸ“„ Handling and Interpreting Missing Modalities in Patient Clinical Trajectories via Autoregressive Sequence Modeling
πŸ—“οΈ Published: 4/20/2026
πŸ”— http://arxiv.org/abs/2604.18753v1
πŸ‘₯ Authors: Andrew Wang (possible past University Of California, Berkeley affiliation), Ellie Pavlick (possible past Google (United States) affiliation), Ritambhara Singh
Abstract

An active challenge in developing multimodal machine learning (ML) models for healthcare is handling missing modalities during training and deployment. As clinical datasets are inherently temporal and sparse in terms of modality presence, capturing the underlying predictive signal via diagnostic multimodal ML models while retaining model explainability remains an ongoing challenge. In this work, we address this by re-framing clinical diagnosis as an autoregressive sequence modeling task, utilizi...

πŸ“„ MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval
πŸ—“οΈ Published: 4/20/2026
πŸ”— http://arxiv.org/abs/2604.18584v1
πŸ‘₯ Authors: Shaden Alshammari, Kevin Wen, Abrar Zainal, Mark Hamilton, Navid Safaei, Sultan Albarakati, William T. Freeman (possible past Massachusetts Institute Of Technology affiliation), Antonio Torralba (possible past Massachusetts Institute Of Technology affiliation)
Abstract

Mathematical problem solving remains a challenging test of reasoning for large language and multimodal models, yet existing benchmarks are limited in size, language coverage, and task diversity. We introduce MathNet, a high-quality, large-scale, multimodal, and multilingual dataset of Olympiad-level math problems together with a benchmark for evaluating mathematical reasoning in generative models and mathematical retrieval in embedding-based systems. MathNet spans 47 countries, 17 languages, and...

πŸ“„ ClawEnvKit: Automatic Environment Generation for Claw-Like Agents
πŸ—“οΈ Published: 4/20/2026
πŸ”— http://arxiv.org/abs/2604.18543v1
πŸ‘₯ Authors: Xirui Li, Ming Li, Derry Xu, Wei-Lin Chiang, Ion Stoica (possible past University Of California, Berkeley affiliation), Cho-Jui Hsieh, Tianyi Zhou (possible past University Of Washington affiliation)
Abstract

Constructing environments for training and evaluating claw-like agents remains a manual, human-intensive process that does not scale. We argue that what is needed is not just a dataset, but an automated pipeline capable of generating diverse, verified environments on demand. To this end, we introduce ClawEnvKit, an autonomous generation pipeline that instantiates this formalism from natural language descriptions. The pipeline comprises three modules: (1) a parser that extracts structured generat...

πŸ“„ OGER: A Robust Offline-Guided Exploration Reward for Hybrid Reinforcement Learning
πŸ—“οΈ Published: 4/20/2026
πŸ”— http://arxiv.org/abs/2604.18530v1
πŸ‘₯ Authors: Xinyu Ma, Mingzhou Xu, Xuebo Liu, Chang Jin, Qiang Wang, Derek F. Wong (possible past Tencent (China) affiliation), Min Zhang (possible past Tsinghua University affiliation)
Abstract

Recent advancements in Reinforcement Learning with Verifiable Rewards (RLVR) have significantly improved Large Language Model (LLM) reasoning, yet models often struggle to explore novel trajectories beyond their initial latent space. While offline teacher guidance and entropy-driven strategies have been proposed to address this, they often lack deep integration or are constrained by the model's inherent capacity. In this paper, we propose OGER, a novel framework that unifies offline teacher guid...

πŸ“„ LLM Safety From Within: Detecting Harmful Content with Internal Representations
πŸ—“οΈ Published: 4/20/2026
πŸ”— http://arxiv.org/abs/2604.18519v1
πŸ‘₯ Authors: Difan Jiao, Yilun Liu, Ye Yuan (possible past Carnegie Mellon University affiliation), Zhenwei Tang, Linfeng Du, Haolun Wu, Ashton Anderson (possible past Stanford University affiliation)
Abstract

Guard models are widely used to detect harmful content in user prompts and LLM responses. However, state-of-the-art guard models rely solely on terminal-layer representations and overlook the rich safety-relevant features distributed across internal layers. We present SIREN, a lightweight guard model that harnesses these internal features. By identifying safety neurons via linear probing and combining them through an adaptive layer-weighted strategy, SIREN builds a harmfulness detector from LLM ...
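The layer-weighted probing idea can be illustrated generically: score each layer's hidden state with a linear probe, then mix the per-layer scores with softmax weights so that informative layers dominate. The probes, activations, and weights below are random placeholders, not SIREN's learned values.

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, d = 4, 16

# Per-layer hidden states for one input (stand-ins for real LLM activations).
hidden = rng.normal(size=(n_layers, d))

# One linear probe per layer (in practice fit on labeled safe/harmful data).
probes = rng.normal(size=(n_layers, d))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Raw per-layer harmfulness scores from each probe.
layer_scores = sigmoid((hidden * probes).sum(axis=1))

# Adaptive layer weighting: softmax over learned logits, so the final
# score leans on the layers whose probes are most reliable.
layer_logits = np.array([0.1, 0.5, 2.0, 0.3])
weights = np.exp(layer_logits) / np.exp(layer_logits).sum()
final_score = float(weights @ layer_scores)

assert 0.0 < final_score < 1.0
```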

πŸ“„ Asset Harvester: Extracting 3D Assets from Autonomous Driving Logs for Simulation
πŸ—“οΈ Published: 4/20/2026
πŸ”— http://arxiv.org/abs/2604.18468v1
πŸ‘₯ Authors: Tianshi Cao, Jiawei Ren, Yuxuan Zhang, Jaewoo Seo, Jiahui Huang, Shikhar Solanki, Haotian Zhang (possible past Stanford University affiliation), Mingfei Guo, Haithem Turki, Muxingzi Li, Yue Zhu, Sipeng Zhang, Zan Gojcic, Sanja Fidler (possible past University Of Toronto affiliation), Kangxue Yin
Abstract

Closed-loop simulation is a core component of autonomous vehicle (AV) development, enabling scalable testing, training, and safety validation before real-world deployment. Neural scene reconstruction converts driving logs into interactive 3D environments for simulation, but it does not produce complete 3D object assets required for agent manipulation and large-viewpoint novel-view synthesis. To address this challenge, we present Asset Harvester, an image-to-3D model and end-to-end pipeline that ...

πŸ“„ Using large language models for embodied planning introduces systematic safety risks
πŸ—“οΈ Published: 4/20/2026
πŸ”— http://arxiv.org/abs/2604.18463v1
πŸ‘₯ Authors: Tao Zhang (possible past Nvidia (United States) affiliation), Kaixian Qu, Zhibin Li, Jiajun Wu (possible past Massachusetts Institute Of Technology affiliation), Marco Hutter, Manling Li, Fan Shi
Abstract

Large language models are increasingly used as planners for robotic systems, yet how safely they plan remains an open question. To evaluate safe planning systematically, we introduce DESPITE, a benchmark of 12,279 tasks spanning physical and normative dangers with fully deterministic validation. Across 23 models, even near-perfect planning ability does not ensure safety: the best-planning model fails to produce a valid plan on only 0.4% of tasks but produces dangerous plans on 28.3%. Among 18 op...

πŸ“„ Planning in entropy-regularized Markov decision processes and games
πŸ—“οΈ Published: 4/21/2026
πŸ”— http://arxiv.org/abs/2604.19695v1
πŸ‘₯ Authors: Jean-Bastien Grill (possible past Deepmind (United Kingdom) affiliation), Omar Darwiche Domingues, Pierre MΓ©nard, RΓ©mi Munos (possible past Google (United States) affiliation), Michal Valko
Abstract

We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. SmoothCruiser makes use of the smoothness of the Bellman operator promoted by the regularization to achieve problem-independent sample complexity of order O~(1/epsilon^4) for a desired accuracy epsilon, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sa...
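The smoothness the algorithm exploits comes from the entropy-regularized (soft) Bellman backup, where the hard max over actions becomes a log-sum-exp. A minimal illustration of that operator, with the temperature `tau` standing in for the regularization strength (toy values, not the paper's setup):

```python
import math

def soft_value(q_values, tau):
    """Entropy-regularized state value: tau * log-sum-exp(Q / tau).
    Smooth in Q, and it approaches max(Q) as tau -> 0."""
    m = max(q_values)  # subtract the max to stabilize the exponentials
    return m + tau * math.log(sum(math.exp((q - m) / tau) for q in q_values))

q = [1.0, 2.0, 0.5]
v_soft = soft_value(q, tau=0.1)
v_hard = max(q)

# The soft value upper-bounds the hard max by at most tau * log(num_actions).
assert v_hard <= v_soft <= v_hard + 0.1 * math.log(len(q))
```

Unlike the hard max, this operator is differentiable everywhere, which is the kind of smoothness a planner can lean on for sample-complexity guarantees.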

πŸ“„ TEMPO: Scaling Test-time Training for Large Reasoning Models
πŸ—“οΈ Published: 4/21/2026
πŸ”— http://arxiv.org/abs/2604.19295v1
πŸ‘₯ Authors: Qingyang Zhang, Xinke Kong, Haitao Wu, Qinghua Hu, Minghao Wu, Baosong Yang (possible past Tencent (China) affiliation), Yu Cheng (possible past National University Of Singapore affiliation), Yun Luo, Ganqu Cui (possible past Tsinghua University affiliation), Changqing Zhang
Abstract

Test-time training (TTT) adapts model parameters on unlabeled test instances at inference time, continuously extending capabilities beyond the reach of offline training. Despite initial gains, existing TTT methods for large reasoning models (LRMs) plateau quickly and do not benefit from additional test-time compute. Without external calibration, the self-generated reward signal increasingly drifts as the policy model evolves, leading to both performance plateaus and diversity collapse. We propose TEMPO, a TTT fr...

πŸ“„ SAW-INT4: System-Aware 4-Bit KV-Cache Quantization for Real-World LLM Serving
πŸ—“οΈ Published: 4/21/2026
πŸ”— http://arxiv.org/abs/2604.19157v1
πŸ‘₯ Authors: Jinda Jia, Jisen Li, Zhongzhu Zhou, Jung Hwan Heo, Jue Wang (possible past Tencent (China) affiliation), Tri Dao, Shuaiwen Leon Song (possible past Microsoft (United States) affiliation), Ben Athiwaratkun, Chenfeng Xu (possible past University Of California, Berkeley affiliation), Tianyi Zhang, Xiaoxia Wu
Abstract

KV-cache memory is a major bottleneck in real-world LLM serving, where systems must simultaneously support latency-sensitive small-batch requests and high-throughput concurrent workloads. Although many KV-cache compression methods improve offline accuracy or compression ratio, they often violate practical serving constraints such as paged memory layouts, regular memory access, and fused attention execution, limiting their effectiveness in deployment. In this work, we identify the minimal set o...
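Independently of the paper's system-aware constraints, the arithmetic of 4-bit KV quantization is simple: per-group symmetric scaling into the signed int4 range [-8, 7]. The sketch below is a generic baseline, not SAW-INT4's layout-aware kernel; group size and tensor shapes are arbitrary.

```python
import numpy as np

def quant_int4(x, group=8):
    """Symmetric per-group 4-bit quantization: codes in [-8, 7]."""
    x = x.reshape(-1, group)
    scale = np.abs(x).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)  # guard all-zero groups
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale

def dequant_int4(q, scale, shape):
    return (q * scale).reshape(shape)

rng = np.random.default_rng(1)
kv = rng.normal(size=(4, 16)).astype(np.float32)  # stand-in KV tensor
q, s = quant_int4(kv)
kv_hat = dequant_int4(q, s, kv.shape)

# Codes fit in 4 bits; reconstruction error is bounded by half a step.
assert q.min() >= -8 and q.max() <= 7
assert np.abs(kv - kv_hat).max() <= s.max() / 2 + 1e-6
```

A real serving kernel would pack two codes per byte and respect paged memory layouts; this sketch only shows the numerics.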

πŸ“„ Train Separately, Merge Together: Modular Post-Training with Mixture-of-Experts
πŸ—“οΈ Published: 4/20/2026
πŸ”— http://arxiv.org/abs/2604.18473v1
πŸ‘₯ Authors: Jacob Morrison, Sanjay Adhikesaven, Akshita Bhagia, Matei Zaharia (possible past University Of California, Berkeley affiliation), Noah A. Smith (possible past University Of Washington affiliation), Sewon Min (possible past University Of Washington affiliation)
Abstract

Extending a fully post-trained language model with new domain capabilities is fundamentally limited by monolithic training paradigms: retraining from scratch is expensive and scales poorly, while continued training often degrades existing capabilities. We present BAR (Branch-Adapt-Route), which trains independent domain experts, each through its own mid-training, supervised finetuning, and reinforcement learning pipeline, and composes them via a Mixture-of-Experts architecture with lightweight r...
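The composition step, keeping each domain expert frozen and learning only a lightweight router, can be sketched as soft mixture-of-experts routing. The experts and router weights below are random placeholders standing in for independently trained modules, not BAR's actual components.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts = 8, 3

# Independently trained domain experts (random linear maps as stand-ins).
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]

# Lightweight router: scores experts per input, softmax to mixing weights.
router_w = rng.normal(size=(n_experts, d))

def moe_forward(x):
    logits = router_w @ x
    gates = np.exp(logits - logits.max())
    gates /= gates.sum()
    # Compose frozen experts by routing, without retraining any of them.
    return sum(g * (E @ x) for g, E in zip(gates, experts)), gates

x = rng.normal(size=d)
y, gates = moe_forward(x)

assert y.shape == (d,)
assert abs(gates.sum() - 1.0) < 1e-9
```

Because only `router_w` would be trained, adding a new domain means training one more expert and re-fitting the router, rather than retraining the whole model.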

πŸ“„ AutoPPA: Automated Circuit PPA Optimization via Contrastive Code-based Rule Library Learning
πŸ—“οΈ Published: 4/20/2026
πŸ”— http://arxiv.org/abs/2604.18445v1
πŸ‘₯ Authors: Chongxiao Li, Pengwei Jin, Di Huang (possible past Google (United States) affiliation), Guangrun Sun, Husheng Han, Jianan Mu, Xinyao Zheng, Jiaguo Zhu, Shuyi Xing, Hanjun Wei, Tianyun Ma, Shuyao Cheng, Rui Zhang, Ying Wang (possible past Tsinghua University affiliation), Zidong Du, Qi Guo, Xing Hu (possible past Baidu (China) affiliation)
Abstract

Performance, power, and area (PPA) optimization is a fundamental task in RTL design, requiring a precise understanding of circuit functionality and the relationship between circuit structures and PPA metrics. Recent studies attempt to automate this process using LLMs, but neither feedback-based nor knowledge-based methods are efficient enough, as they either operate without any prior knowledge or rely heavily on human-summarized optimization rules. In this paper, we propose AutoPPA, a fully aut...

*Notable papers are those with at least two authors from a "big" AI/ML lab.