GitHub - Xuchen-Li/llm-arxiv-daily: Automatically updates arXiv papers about LLM Reasoning, LLM Evaluation, LLM & MLLM, and Video Understanding using GitHub Actions.

Updated on 2025.02.28

Table of Contents

LLM Reasoning
LLM Evaluation
LLM MLLM
Video Understanding

LLM Reasoning

Publish Date	Title	Authors	PDF	Code
2025-02-27	FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving	Guizhen Chen et.al.	2502.20238	null
2025-02-27	Collaborative Stance Detection via Small-Large Language Model Consistency Verification	Yu Yan et.al.	2502.19954	null
2025-02-27	Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models	Yuan Sui et.al.	2502.19918	null
2025-02-27	Order Doesn't Matter, But Reasoning Does: Training LLMs with Order-Centric Augmentation	Qianxi He et.al.	2502.19907	null
2025-02-27	Towards Multimodal Large-Language Models for Parent-Child Interaction: A Focus on Joint Attention	Weiyan Shi et.al.	2502.19877	null
2025-02-26	Weaker LLMs' Opinions Also Matter: Mixture of Opinions Enhances LLM's Mathematical Reasoning	Yanan Chen et.al.	2502.19622	null
2025-02-26	General Reasoning Requires Learning to Reason from the Get-go	Seungwook Han et.al.	2502.19402	null
2025-02-26	BIG-Bench Extra Hard	Mehran Kazemi et.al.	2502.19187	null
2025-02-25	Scalable Best-of-N Selection for Large Language Models via Self-Certainty	Zhewei Kang et.al.	2502.18581	null
2025-02-25	SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution	Yuxiang Wei et.al.	2502.18449	null
2025-02-25	Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning	Wenkai Yang et.al.	2502.18080	null
2025-02-21	Improving Value-based Process Verifier via Structural Prior Injection	Zetian Sun et.al.	2502.17498	null
2025-02-24	Making LLMs Reason? The Intermediate Language Problem in Neurosymbolic Approaches	Alexander Beiser et.al.	2502.17216	null
2025-02-24	Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI	Syed Abdul Gaffar Shakhadri et.al.	2502.17092	null
2025-02-24	Understanding the Uncertainty of LLM Explanations: A Perspective Based on Reasoning Topology	Longchao Da et.al.	2502.17026	null
2025-02-24	All-in-one: Understanding and Generation in Multimodal Reasoning with the MAIA Benchmark	Davide Testa et.al.	2502.16989	null
2025-02-24	AutoLogi: Automated Generation of Logic Puzzles for Evaluating Reasoning Abilities of Large Language Models	Qin Zhu et.al.	2502.16906	link
2025-02-24	The Blessing of Reasoning: LLM-Based Contrastive Explanations in Black-Box Recommender Systems	Yuyan Wang et.al.	2502.16759	null
2025-02-23	Reasoning about Affordances: Causal and Compositional Reasoning in LLMs	Magnus F. Gjerde et.al.	2502.16606	null
2025-02-22	ThinkBench: Dynamic Out-of-Distribution Evaluation for Robust LLM Reasoning	Shulin Huang et.al.	2502.16268	null
2025-02-27	Dynamic Parallel Tree Search for Efficient LLM Reasoning	Yifu Ding et.al.	2502.16235	null
2025-02-22	Patterns Over Principles: The Fragility of Inductive Reasoning in LLMs under Noisy Observations	Chunyang Li et.al.	2502.16169	link
2025-02-22	Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models	Qianqi Yan et.al.	2502.16033	null
2025-02-21	MutaGReP: Execution-Free Repository-Grounded Plan Search for Code-Use	Zaid Khan et.al.	2502.15872	null
2025-02-21	Do Multilingual LLMs Think In English?	Lisa Schut et.al.	2502.15603	null
2025-02-21	Evaluating Social Biases in LLM Reasoning	Xuyang Wu et.al.	2502.15361	null
2025-02-21	Stepwise Informativeness Search for Improving LLM Reasoning	Siyuan Wang et.al.	2502.15335	null
2025-02-21	Latent Factor Models Meets Instructions: Goal-conditioned Latent Factor Discovery without Task Supervision	Zhouhang Xie et.al.	2502.15147	null
2025-02-19	SIFT: Grounding LLM Reasoning in Contexts via Stickers	Zihao Zeng et.al.	2502.14922	null
2025-02-18	Think Inside the JSON: Reinforcement Strategy for Strict LLM Schema Adherence	Bhavik Agarwal et.al.	2502.14905	null
2025-02-20	Exploring Advanced Techniques for Visual Question Answering: A Comprehensive Comparison	Aiswarya Baby et.al.	2502.14827	null
2025-02-20	Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning	Tian Xie et.al.	2502.14768	link
2025-02-19	Enhancing LLM-Based Recommendations Through Personalized Reasoning	Jiahao Liu et.al.	2502.13845	null
2025-02-19	MCTS-KBQA: Monte Carlo Tree Search for Knowledge Base Question Answering	Guanming Xiong et.al.	2502.13428	null
2025-02-19	MM-Verify: Enhancing Multimodal Reasoning with Chain-of-Thought Verification	Linzhuang Sun et.al.	2502.13383	link
2025-02-22	Grounding LLM Reasoning with Knowledge Graphs	Alfonso Amayuelas et.al.	2502.13247	null
2025-02-18	Theorem Prover as a Judge for Synthetic Data Generation	Joshua Ong Jun Leang et.al.	2502.13137	null
2025-02-18	Flow-of-Options: Diversified and Improved LLM Reasoning by Thinking Through Options	Lakshmi Nair et.al.	2502.12929	link
2025-02-18	S$^2$R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning	Ruotian Ma et.al.	2502.12853	link
2025-02-18	CutPaste&Find: Efficient Multimodal Hallucination Detector with Visual-aid Knowledge Base	Cong-Duy Nguyen et.al.	2502.12591	null
2025-02-18	Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights	Shubham Parashar et.al.	2502.12521	null
2025-02-18	HopRAG: Multi-Hop Reasoning for Logic-Aware Retrieval-Augmented Generation	Hao Liu et.al.	2502.12442	null
2025-02-17	Evaluating Step-by-step Reasoning Traces: A Survey	Jinu Lee et.al.	2502.12289	null
2025-02-17	SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs	Yige Xu et.al.	2502.12134	null
2025-02-17	TokenSkip: Controllable Chain-of-Thought Compression in LLMs	Heming Xia et.al.	2502.12067	link
2025-02-17	Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models	Hyunwoo Kim et.al.	2502.11881	null
2025-02-17	Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities	Hanbin Wang et.al.	2502.11829	link
2025-02-17	Language Models Can See Better: Visual Contrastive Decoding For LLM Multimodal Reasoning	Yuqi Pang et.al.	2502.11751	link
2025-02-17	DeFiScope: Detecting Various DeFi Price Manipulations with LLM Reasoning	Juantao Zhong et.al.	2502.11521	null
2025-02-16	Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls	Ante Wang et.al.	2502.11183	null
2025-02-16	LogiDynamics: Unraveling the Dynamics of Logical Inference in Large Language Model Reasoning	Tianshi Zheng et.al.	2502.11176	null
2025-02-15	A Tutorial on LLM Reasoning: Relevant Methods behind ChatGPT o1	Jun Wang et.al.	2502.10867	null
2025-02-15	USER-VLM 360: Personalized Vision Language Models with User-aware Tuning for Social Human-Robot Interactions	Hamed Rahimi et.al.	2502.10636	null
2025-02-14	Do Large Language Models Reason Causally Like Us? Even Better?	Hanna M. Dettki et.al.	2502.10215	null
2025-02-14	MathConstruct: Challenging LLM Reasoning with Constructive Proofs	Mislav Balunović et.al.	2502.10197	null
2025-02-13	MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency	Dongzhi Jiang et.al.	2502.09621	null
2025-02-14	EnigmaEval: A Benchmark of Long Multimodal Reasoning Challenges	Clinton J. Wang et.al.	2502.08859	null
2025-02-11	CIRCUIT: A Benchmark for Circuit Interpretation and Reasoning Capabilities of LLMs	Lejla Skelic et.al.	2502.07980	null
2025-02-05	Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment	Cheryl Li et.al.	2502.07803	null
2025-02-17	Bag of Tricks for Inference-time Computation of LLM Reasoning	Fan Liu et.al.	2502.07191	null
2025-02-15	Self-Supervised Prompt Optimization	Jinyu Xiang et.al.	2502.06855	link
2025-02-06	Vision-Integrated LLMs for Autonomous Driving Assistance: Human Performance Comparison and Trust Evaluation	Namhee Kim et.al.	2502.06843	null
2025-02-04	Policy Guided Tree Search for Enhanced LLM Reasoning	Yang Li et.al.	2502.06813	null
2025-02-10	ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates	Ling Yang et.al.	2502.06772	link
2025-02-10	Resurrecting saturated LLM benchmarks with adversarial encoding	Igor Ivanov et.al.	2502.06738	null
2025-02-13	LawGPT: Knowledge-Guided Data Generation and Its Application to Legal LLM	Zhi Zhou et.al.	2502.06572	link
2025-02-09	A Generative Framework for Bidirectional Image-Report Understanding in Chest Radiography	Nicholas Evans et.al.	2502.05926	null
2025-02-08	Evaluating Vision-Language Models for Emotion Recognition	Sree Bhattacharyya et.al.	2502.05660	null
2025-02-07	GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity?	Yang Zhou et.al.	2502.05252	link
2025-02-07	Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures	Tushar Pandey et.al.	2502.05078	link
2025-02-07	Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research	Junde Wu et.al.	2502.04644	link
2025-02-05	Enhancing Reasoning to Adapt Large Language Models for Domain-Specific Applications	Bo Wen et.al.	2502.04384	link
2025-02-05	Limitations of Large Language Models in Clinical Problem-Solving Arising from Inflexible Reasoning	Jonathan Kim et.al.	2502.04381	null
2025-02-04	Investigating the Robustness of Deductive Reasoning with Large Language Models	Fabian Hoppe et.al.	2502.04352	null
2025-02-04	Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search	Maohao Shen et.al.	2502.02508	null
2025-02-04	CoAT: Chain-of-Associated-Thoughts Framework for Enhancing Large Language Models Reasoning	Jianfeng Pan et.al.	2502.02390	null
2025-02-08	Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking	Jinyang Wu et.al.	2502.02339	null
2025-02-04	Mitigating Object Hallucinations in Large Vision-Language Models via Attention Calibration	Younan Zhu et.al.	2502.01969	null
2025-01-31	Improving Rule-based Reasoning in LLMs via Neurosymbolic Representations	Varun Dhanraj et.al.	2502.01657	null
2025-02-03	Position: Empowering Time Series Reasoning with Multimodal LLMs	Yaxuan Kong et.al.	2502.01477	null
2025-02-03	ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning	Bill Yuchen Lin et.al.	2502.01100	null
2025-02-16	Learning Autonomous Code Integration for Math Language Models	Haozhe Wang et.al.	2502.00691	null
2025-02-13	Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning	Zhi Zhou et.al.	2502.00511	null
2025-02-14	Reward-Guided Speculative Decoding for Efficient LLM Reasoning	Baohao Liao et.al.	2501.19324	null
2025-01-31	Efficient Reasoning with Hidden Thinking	Xuan Shen et.al.	2501.19201	link
2025-01-31	BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning	Han Zhong et.al.	2501.18858	null
2025-01-28	A Stochastic Dynamical Theory of LLM Self-Adversariality: Modeling Severity Drift as a Critical Process	Jack David Carson et.al.	2501.16783	null
2025-01-27	Explaining GitHub Actions Failures with Large Language Models: Challenges, Insights, and Limitations	Pablo Valenzuela-Toledo et.al.	2501.16495	null
2025-01-27	Large Models in Dialogue for Active Perception and Anomaly Detection	Tzoulio Chamiti et.al.	2501.16300	link
2025-01-26	TensorLLM: Tensorising Multi-Head Attention for Enhanced Reasoning and Compression in LLMs	Yuxuan Gu et.al.	2501.15674	null
2025-01-28	Rethinking External Slow-Thinking: From Snowball Errors to Probability of Correct Reasoning	Zeyu Gan et.al.	2501.15602	link
2025-01-26	Error Classification of Large Language Models on Math Word Problems: A Dynamically Adaptive Framework	Yuhong Sun et.al.	2501.15581	null
2025-02-15	Option-ID Based Elimination For Multiple Choice Questions	Zhenhao Zhu et.al.	2501.15175	null
2025-01-24	Domaino1s: Guiding LLM Reasoning for Explainable Answers in High-Stakes Domains	Xu Chu et.al.	2501.14431	null
2025-02-12	GraphSOS: Graph Sampling and Order Selection to Help LLMs Understand Graphs Better	Xu Chu et.al.	2501.14427	null
2025-01-23	Pseudocode-Injection Magic: Enabling LLMs to Tackle Graph Computational Tasks	Chang Gong et.al.	2501.13731	null
2025-02-10	Cognitive Paradigms for Evaluating VLMs on Visual Reasoning Task	Mohit Vaishnav et.al.	2501.13620	null
2025-01-22	EvidenceMap: Unleashing the Power of Small Language Models with Evidence Analysis for Biomedical Question Answering	Chang Zong et.al.	2501.12746	null
2025-01-17	LLM Reasoner and Automated Planner: A new NPC approach	Israel Puerta-Merino et.al.	2501.10106	null
2025-01-22	FRAG: A Flexible Modular Framework for Retrieval-Augmented Generation based on Knowledge Graphs	Zengyi Gao et.al.	2501.09957	null
2025-01-17	Evolving Deeper LLM Thinking	Kuang-Huei Lee et.al.	2501.09891	null
2025-01-23	Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models	Fengli Xu et.al.	2501.09686	null
2025-01-15	Multimodal LLMs Can Reason about Aesthetics in Zero-Shot	Ruixiang Jiang et.al.	2501.09012	link
2025-02-10	Ensemble of Large Language Models for Curated Labeling and Rating of Free-text Data	Jiaxing Qiu et.al.	2501.08413	link
2025-01-14	Reasoning with Graphs: Structuring Implicit Knowledge to Enhance LLMs Reasoning	Haoyu Han et.al.	2501.07845	null
2025-01-09	Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark	Yunzhuo Hao et.al.	2501.05444	link
2025-01-08	Enhancing Financial VQA in Vision Language Models using Intermediate Structured Representations	Archita Srivastava et.al.	2501.04675	null
2025-01-08	DRIVINGVQA: Analyzing Visual Chain-of-Thought Reasoning of Vision Language Models in Real-World Scenarios with Driving Theory Tests	Charles Corbière et.al.	2501.04671	null
2025-01-08	Understanding Before Reasoning: Enhancing Chain-of-Thought with Iterative Summarization Pre-Prompting	Dong-Hai Zhu et.al.	2501.04341	link
2025-01-07	Reasoning-Enhanced Self-Training for Long-Form Personalized Text Generation	Alireza Salemi et.al.	2501.04167	null
2025-01-07	Socratic Questioning: Learn to Self-guide Multimodal Reasoning in the Wild	Wanpeng Hu et.al.	2501.02964	link
2025-01-06	KG-CF: Knowledge Graph Completion with Context Filtering under the Guidance of Large Language Models	Zaiyi Zheng et.al.	2501.02711	null
2025-01-04	Table as Thought: Exploring Structured Thoughts in LLM Reasoning	Zhenjie Sun et.al.	2501.02152	null
2025-01-03	Recursive Decomposition of Logical Thoughts: Framework for Superior Reasoning and Knowledge Propagation in Large Language Models	Kaleem Ullah Qasim et.al.	2501.02026	null
2025-01-02	Enhancing Reasoning through Process Supervision with Monte Carlo Tree Search	Shuangtao Li et.al.	2501.01478	null
2025-01-02	HetGCoT-Rec: Heterogeneous Graph-Enhanced Chain-of-Thought LLM Reasoning for Journal Recommendation	Runsong Jia et.al.	2501.01203	null
2025-01-03	Enhancing LLM Reasoning with Multi-Path Collaborative Reactive and Reflection agents	Chengbo He et.al.	2501.00430	null
2024-12-31	EQUATOR: A Deterministic Framework for Evaluating LLM Reasoning with Open-Ended Questions. # v1.0.0-beta	Raymond Bernard et.al.	2501.00257	null
2024-12-30	Efficiently Serving LLM Reasoning Programs with Certaindex	Yichao Fu et.al.	2412.20993	null
2024-12-28	LLM Reasoning Engine: Specialized Training for Enhanced Mathematical Reasoning	Shuguang Chen et.al.	2412.20227	null
2025-02-17	Token-Budget-Aware LLM Reasoning	Tingxu Han et.al.	2412.18547	link
2024-12-23	StructTest: Benchmarking LLMs' Reasoning through Compositional Structured Outputs	Hailin Chen et.al.	2412.18011	null
2025-02-09	Evaluating LLM Reasoning in the Operations Research Domain with ORQA	Mahdi Mostajabdaveh et.al.	2412.17874	link
2024-12-23	Diving into Self-Evolving Training for Multimodal Reasoning	Wei Liu et.al.	2412.17451	null
2024-12-21	SilVar: Speech Driven Multimodal Model for Reasoning Visual Question Answering and Object Localization	Tan-Hanh Pham et.al.	2412.16771	null
2024-12-20	PruneVid: Visual Token Pruning for Efficient Video Large Language Models	Xiaohu Huang et.al.	2412.16117	link
2024-12-19	Eliciting Causal Abilities in Large Language Models for Reasoning Tasks	Yajing Wang et.al.	2412.15314	link
2024-12-19	Critical-Questions-of-Thought: Steering LLM reasoning with Argumentative Querying	Federico Castagna et.al.	2412.15177	link
2024-12-19	Progressive Multimodal Reasoning via Active Retrieval	Guanting Dong et.al.	2412.14835	null
2024-12-19	FiVL: A Framework for Improved Vision-Language Alignment	Estelle Aflalo et.al.	2412.14672	null
2024-12-19	FaultExplainer: Leveraging Large Language Models for Interpretable Fault Detection and Diagnosis	Abdullah Khan et.al.	2412.14492	link
2024-12-18	Cognition Chain for Explainable Psychological Stress Detection on Social Media	Xin Wang et.al.	2412.14009	null
2024-12-27	Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence	Jinghan He et.al.	2412.13949	null
2025-02-16	Do Language Models Understand Time?	Xi Ding et.al.	2412.13845	link
2024-12-18	Beyond Outcomes: Transparent Assessment of LLM Reasoning in Games	Wenye Lin et.al.	2412.13602	null
2024-12-17	ClarityEthic: Explainable Moral Judgment Utilizing Contrastive Ethical Insights from Large Language Models	Yuxi Sun et.al.	2412.12848	null
2024-12-12	A NotSo Simple Way to Beat Simple Bench	Soham Sane et.al.	2412.12173	null
2024-12-11	What Makes In-context Learning Effective for Mathematical Reasoning: A Theoretical Analysis	Jiayu Liu et.al.	2412.12157	null
2025-02-18	A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges	Yibo Yan et.al.	2412.11936	null
2024-12-24	Stepwise Reasoning Error Disruption Attack of LLMs	Jingyu Peng et.al.	2412.11934	null
2024-12-16	Leveraging Retrieval-Augmented Tags for Large Vision-Language Understanding in Complex Scenes	Antonio Carlos Rivera et.al.	2412.11396	null
2024-12-15	SceneLLM: Implicit Language Reasoning in LLM for Dynamic Scene Graph Generation	Hang Zhang et.al.	2412.11026	null
2024-12-15	Entropy-Regularized Process Reward Model	Hanning Zhang et.al.	2412.11006	link
2024-12-14	Optimizing Vision-Language Interactions Through Decoder-Only Models	Kaito Tanaka et.al.	2412.10758	null
2024-12-14	Chasing Progress, Not Perfection: Revisiting Strategies for End-to-End LLM Plan Generation	Sukai Huang et.al.	2412.10675	null
2024-12-14	Thinking with Knowledge Graphs: Enhancing LLM Reasoning Through Structured Data	Xue Wu et.al.	2412.10654	null
2024-12-13	EVLM: Self-Reflective Multimodal Reasoning for Cross-Dimensional Visual Editing	Umar Khalid et.al.	2412.10566	null
2024-12-13	Atomic Learning Objectives Labeling: A High-Resolution Approach for Physics Education	Naiming Liu et.al.	2412.09914	null
2025-01-18	Neptune: The Long Orbit to Benchmarking Long Video Understanding	Arsha Nagrani et.al.	2412.09582	link
2025-02-14	Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning	Zhenni Bi et.al.	2412.09078	link
2024-12-11	Training Large Language Models to Reason in a Continuous Latent Space	Shibo Hao et.al.	2412.06769	link
2025-01-23	GameArena: Evaluating LLM Reasoning through Live Computer Games	Lanxiang Hu et.al.	2412.06394	null
2024-12-08	Language hooks: a modular framework for augmenting LLM reasoning that decouples tool usage from the model and its prompt	Damien de Mijolla et.al.	2412.05967	null
2024-12-06	MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale	Jarvis Guo et.al.	2412.05237	null
2024-12-05	Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction	Yiheng Xu et.al.	2412.04454	null
2024-12-05	SocialMind: LLM-based Proactive AR Social Assistive System with Human-like Perception for In-situ Live Interactions	Bufang Yang et.al.	2412.04036	null
2024-12-04	DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation	Qingdong He et.al.	2412.03255	null
2024-12-03	Explainable CTR Prediction via LLM Reasoning	Xiaohan Yu et.al.	2412.02588	null
2025-02-12	NYT-Connections: A Deceptively Simple Text Classification Task that Stumps System-1 Thinkers	Angel Yahir Loredo Lopez et.al.	2412.01621	null
2025-01-13	Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability	Zicheng Lin et.al.	2411.19943	link
2024-11-29	TQA-Bench: Evaluating LLMs for Multi-Table Question Answering with Scalable Context and Symbolic Extension	Zipeng Qiu et.al.	2411.19504	link
2024-11-29	COLD: Causal reasOning in cLosed Daily activities	Abhinav Joshi et.al.	2411.19500	link
2024-12-16	Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning	Di Zhang et.al.	2411.18203	null
2024-11-26	NEMO: Can Multimodal LLMs Identify Attribute-Modified Objects?	Jiaxuan Li et.al.	2411.17794	null
2024-11-25	Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision	Zhiheng Xi et.al.	2411.16579	null
2024-11-22	On the Impact of Fine-Tuning on Chain-of-Thought Reasoning	Elita Lobo et.al.	2411.15382	null
2024-11-21	Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models	Yuhao Dong et.al.	2411.14432	link
2024-11-20	BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games	Davide Paglieri et.al.	2411.13543	null
2024-11-20	Hints of Prompt: Enhancing Visual Representation for Multimodal LLMs in Autonomous Driving	Hao Zhou et.al.	2411.13076	null
2024-11-15	Thinking Before Looking: Improving Multimodal LLM Reasoning via Mitigating Visual Hallucination	Haojie Zheng et.al.	2411.12591	link
2024-12-23	Enhancing Reasoning Capabilities of LLMs via Principled Synthetic Logic Corpus	Terufumi Morishita et.al.	2411.12498	link
2024-11-18	Semantic-Geometric-Physical-Driven Robot Manipulation Skill Transfer via Skill Library and Tactile Representation	Mingchao Qi et.al.	2411.11714	link
2024-12-31	Enhancing LLM Reasoning with Reward-guided Tree Search	Jinhao Jiang et.al.	2411.11694	null
2024-12-15	A dataset of questions on decision-theoretic reasoning in Newcomb-like problems	Caspar Oesterheld et.al.	2411.10588	link
2024-11-15	Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization	Weiyun Wang et.al.	2411.10442	null
2025-01-09	LLaVA-CoT: Let Vision Language Models Reason Step-by-Step	Guowei Xu et.al.	2411.10440	link
2024-11-15	Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level	Andong Deng et.al.	2411.09921	null
2024-11-14	Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering	Nghia Trung Ngo et.al.	2411.09213	null
2024-11-13	Tree-of-Table: Unleashing the Power of LLMs for Enhanced Large-Scale Table Understanding	Deyi Ji et.al.	2411.08516	null
2024-11-18	What Do Learning Dynamics Reveal About Generalization in LLM Reasoning?	Katie Kang et.al.	2411.07681	link
2024-11-27	Self-Training Meets Consistency: Improving LLMs' Reasoning With Consistency-Driven Rationale Evaluation	Jaehyeok Lee et.al.	2411.06387	link
2024-11-09	A Picture is Worth A Thousand Numbers: Enabling LLMs Reason about Time Series via Visualization	Haoxin Liu et.al.	2411.06018	null
2024-11-11	LLMs as Method Actors: A Model for Prompt Engineering and Architecture	Colin Doyle et.al.	2411.05778	link
2024-11-12	Kwai-STaR: Transform LLMs into State-Transition Reasoners	Xingyu Lu et.al.	2411.04799	null
2024-11-21	Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding	Haolin Chen et.al.	2411.04282	link
2024-11-05	CrowdGenUI: Enhancing LLM-Based UI Widget Generation with a Crowdsourced Preference Library	Yimeng Liu et.al.	2411.03477	null
2025-01-27	MetRex: A Benchmark for Verilog Code Metric Reasoning Using LLMs	Manar Abdelatty et.al.	2411.03471	link
2024-11-04	RuAG: Learned-rule-augmented Generation for Large Language Models	Yudi Zhang et.al.	2411.03349	null
2024-10-30	Vision-Language Models Can Self-Improve Reasoning via Reflection	Kanzhi Cheng et.al.	2411.00855	null
2024-11-01	Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling	Yiwen Ding et.al.	2411.00750	link
2024-11-01	STEM-POM: Evaluating Language Models Math-Symbol Reasoning in Document Parsing	Jiaru Zou et.al.	2411.00387	null
2024-11-08	GRS-QA -- Graph Reasoning-Structured Question Answering Dataset	Anish Pahilajani et.al.	2411.00369	null
2024-10-31	Thought Space Explorer: Navigating and Expanding Thought Space for Large Language Model Reasoning	Jinghan Zhang et.al.	2410.24155	null
2024-10-31	RL-STaR: Theoretical Analysis of Reinforcement Learning Frameworks for Self-Taught Reasoner	Fu-Chieh Chang et.al.	2410.23912	null
2024-10-31	OCEAN: Offline Chain-of-thought Evaluation and Alignment in Large Language Models	Junda Wu et.al.	2410.23703	null
2024-10-30	ReasoningRec: Bridging Personalized Recommendations and Human-Interpretable Explanations through LLM Reasoning	Millennium Bismay et.al.	2410.23180	link
2024-10-30	On Memorization of Large Language Models in Logical Reasoning	Chulin Xie et.al.	2410.23123	null
2024-10-28	Causal Interventions on Causal Paths: Mapping GPT-2's Reasoning From Syntax to Semantics	Isabelle Lee et.al.	2410.21353	null
2024-10-28	Guide-LLM: An Embodied LLM Agent and Text-Based Topological Map for Robotic Guidance of People with Visual Impairments	Sangmim Song et.al.	2410.20666	null
2024-10-25	Cooperative Strategic Planning Enhances Reasoning Capabilities in Large Language Models	Danqing Wang et.al.	2410.20007	null
2024-10-25	Can Stories Help LLMs Reason? Curating Information Space Through Narrative	Vahid Sadiri Javadi et.al.	2410.19221	null
2024-10-18	Make LLMs better zero-shot reasoners: Structure-orientated autonomous reasoning	Pengfei He et.al.	2410.19000	link
2024-10-25	CLR-Bench: Evaluating Large Language Models in College-level Reasoning	Junnan Dong et.al.	2410.17558	null
2024-10-28	Non-myopic Generation of Language Models for Reasoning and Planning	Chang Ma et.al.	2410.17195	link
2024-11-06	Improving Causal Reasoning in Large Language Models: A Survey	Longxuan Yu et.al.	2410.16676	link
2024-10-22	A Statistical Analysis of LLMs' Self-Evaluation Using Proverbs	Ryosuke Sonoda et.al.	2410.16640	null
2024-10-21	Rulebreakers Challenge: Revealing a Blind Spot in Large Language Models' Reasoning with Formal Logic	Jason Chan et.al.	2410.16502	null
2024-11-27	On Designing Effective RL Reward at Training Time for LLM Reasoning	Jiaxuan Gao et.al.	2410.15115	null
2025-01-28	Paths-over-Graph: Knowledge Graph Empowered Large Language Model Reasoning	Xingyu Tan et.al.	2410.14211	null
2024-10-21	Unconstrained Model Merging for Enhanced LLM Reasoning	Yiming Zhang et.al.	2410.13699	null
2024-10-16	Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models	Linhao Luo et.al.	2410.13080	link
2024-10-16	KcMF: A Knowledge-compliant Framework for Schema and Entity Matching with Fine-tuning-free LLMs	Yongqin Xu et.al.	2410.12480	null
2024-10-17	Enhancing LLM Trading Performance with Fact-Subjectivity Aware Reasoning	Qian Wang et.al.	2410.12464	null
2024-10-16	Reversal of Thought: Enhancing Large Language Models with Preference-Guided Reverse Reasoning Warm-up	Jiahao Yuan et.al.	2410.12323	link
2024-10-16	Exploiting LLMs' Reasoning Capability to Infer Implicit Concepts in Legal Information Retrieval	Hai-Long Nguyen et.al.	2410.12154	null
2024-10-15	Planning Anything with Rigor: General-Purpose Zero-Shot Planning with LLM-based Formalized Programming	Yilun Hao et.al.	2410.12112	null
2024-10-12	OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models	Jun Wang et.al.	2410.09671	null
2024-10-11	P-FOLIO: Evaluating and Improving Logical Reasoning with Abundant Human-Written Reasoning Chains	Simeng Han et.al.	2410.09207	null
2024-10-11	Aerial Vision-and-Language Navigation via Semantic-Topo-Metric Representation Guided LLM Reasoning	Yunpeng Gao et.al.	2410.08500	null
2024-10-10	SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation	Hang Yin et.al.	2410.08189	null
2024-10-10	Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning	Amrith Setlur et.al.	2410.08146	null
2024-10-10	Automatic Curriculum Expert Iteration for Reliable LLM Reasoning	Zirui Zhao et.al.	2410.07627	null
2024-10-09	Boosting Few-Shot Detection with Large Language Models and Layout-to-Image Synthesis	Ahmed Abdullah et.al.	2410.06841	null
2024-10-09	Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning	Xiyao Wang et.al.	2410.06508	null
2025-01-02	Filtering Discomforting Recommendations with Large Language Models	Jiahao Liu et.al.	2410.05411	null
2024-10-05	Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification	Zhenwen Liang et.al.	2410.05318	null
2024-10-06	Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval	Pengcheng Jiang et.al.	2410.04585	link
2024-10-03	The Role of Deductive and Inductive Reasoning in Large Language Models	Chengkun Cai et.al.	2410.02892	null
2024-10-02	Not All LLM Reasoners Are Created Equal	Arian Hosseini et.al.	2410.01748	null
2024-12-25	Interpretable Contrastive Monte Carlo Tree Search Reasoning	Zitian Gao et.al.	2410.01707	link
2024-10-02	VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment	Amirhossein Kazemnejad et.al.	2410.01679	link
2024-10-02	AHP-Powered LLM Reasoning for Multi-Criteria Evaluation of Open-Ended Responses	Xiaotian Lu et.al.	2410.01246	null
2024-10-01	Self-controller: Controlling LLMs with Multi-round Step-by-step Self-awareness	Xiao Peng et.al.	2410.00359	null
2024-10-01	Insight: A Multi-Modal Diagnostic Pipeline using LLMs for Ocular Surface Disease Diagnosis	Chun-Hsiao Yeh et.al.	2410.00292	null
2024-10-08	GUNDAM: Aligning Large Language Models with Graph Understanding	Sheng Ouyang et.al.	2409.20053	null
2024-09-27	Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs	Yanyuan Qiao et.al.	2409.18794	null
2024-10-23	Proof of Thought: Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning	Debargha Ganguly et.al.	2409.17270	null
2024-09-20	CSCE: Boosting LLM Reasoning by Simultaneous Enhancing of Casual Significance and Consistency	Kangsheng Wang et.al.	2409.17174	null
2024-09-20	Mufu: Multilingual Fused Learning for Low-Resource Translation with LLM	Zheng Wei Lim et.al.	2409.13949	null
2024-09-19	SituationAdapt: Contextual UI Optimization in Mixed Reality with Situation Awareness via LLM Reasoning	Zhipeng Li et.al.	2409.12836	null
2024-10-04	Unlocking Reasoning Potential in Large Language Models by Scaling Code-form Planning	Jiaxin Wen et.al.	2409.12452	link
2024-12-16	Enhancing Logical Reasoning in Large Language Models through Graph-based Synthetic Data	Jiaming Zhou et.al.	2409.12437	link
2024-09-18	MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning	Justin Chih-Yao Chen et.al.	2409.12147	link
2024-11-05	Improving LLM Reasoning with Multi-Agent Tree-of-Thought Validator Agent	Fatemeh Haji et.al.	2409.11527	link
2024-09-16	Enhancing RL Safety with Counterfactual LLM Reasoning	Dennis Gross et.al.	2409.10188	link
2024-09-11	Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation	SeongYeub Chu et.al.	2409.07355	link

(back to top)

LLM Evaluation

Publish Date	Title	Authors	PDF	Code
2025-02-26	Exploring Graph Tasks with Pure LLMs: A Comprehensive Benchmark and Investigation	Yuxiang Wang et.al.	2502.18771	link
2025-02-23	Recent Advances in Large Language Model Benchmarks against Data Contamination: From Static to Dynamic Evaluation	Simin Chen et.al.	2502.17521	link
2025-02-24	Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective	Chengyin Xu et.al.	2502.17262	null
2025-02-24	Detecting Benchmark Contamination Through Watermarking	Tom Sander et.al.	2502.17259	null
2025-02-24	Predicting Liquidity-Aware Bond Yields using Causal GANs and Deep Reinforcement Learning with LLM Evaluation	Jaskaran Singh Walia et.al.	2502.17011	null
2025-02-24	AlphaAgent: LLM-Driven Alpha Mining with Regularized Exploration to Counteract Alpha Decay	Ziyi Tang et.al.	2502.16789	null
2025-01-30	Retrieval Augmented Generation Based LLM Evaluation For Protocol State Machine Inference With Chain-of-Thought Reasoning	Youssef Maklad et.al.	2502.15727	null
2025-02-20	Prompt-to-Leaderboard	Evan Frick et.al.	2502.14855	link
2025-02-27	SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines	M-A-P Team et.al.	2502.14739	null
2025-02-20	SEA-HELM: Southeast Asian Holistic Evaluation of Language Models	Yosephine Susanto et.al.	2502.14301	null
2025-02-20	Transfer-Prompting: Enhancing Cross-Task Adaptation in Large Language Models via Dual-Stage Prompts Optimization	Yupeng Chang et.al.	2502.14211	link
2025-02-19	Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above	Nishant Balepur et.al.	2502.14127	null
2025-02-19	STEER-ME: Assessing the Microeconomic Reasoning of Large Language Models	Narun Raman et.al.	2502.13119	null
2025-02-18	HPSS: Heuristic Prompting Strategy Search for LLM Evaluators	Bosi Wen et.al.	2502.13031	null
2025-02-18	None of the Others: a General Technique to Distinguish Reasoning from Memorization in Multiple-Choice LLM Evaluation Benchmarks	Eva Sánchez Salido et.al.	2502.12896	null
2025-02-18	Safe at the Margins: A General Approach to Safety Alignment in Low-Resource English Languages -- A Singlish Case Study	Isaac Lim et.al.	2502.12485	null
2025-02-17	Deviation Ratings: A General, Clone-Invariant Rating Method	Luke Marris et.al.	2502.11645	null
2025-02-21	TituLLMs: A Family of Bangla LLMs with Comprehensive Benchmarking	Shahriar Kabir Nahin et.al.	2502.11187	null
2025-02-15	Rule-Bottleneck Reinforcement Learning: Joint Explanation and Decision Optimization for Resource Allocation with Language Agents	Mauricio Tec et.al.	2502.10732	null
2025-02-15	An Empirical Analysis of Uncertainty in Large Language Model Evaluations	Qiujie Xie et.al.	2502.10709	link
2025-02-25	Accelerating Unbiased LLM Evaluation via Synthetic Feedback	Zhaoyi Zhou et.al.	2502.10563	link
2025-02-14	MathConstruct: Challenging LLM Reasoning with Constructive Proofs	Mislav Balunović et.al.	2502.10197	null
2025-02-13	Enhancing Jailbreak Attacks via Compliance-Refusal-Based Initialization	Amit Levi et.al.	2502.09755	null
2025-02-13	NestQuant: Nested Lattice Quantization for Matrix Products and LLMs	Semyon Savkin et.al.	2502.09720	null
2025-02-12	The Science of Evaluating Foundation Models	Jiayi Yuan et.al.	2502.09670	null
2025-02-13	Copilot Arena: A Platform for Code LLM Evaluation in the Wild	Wayne Chi et.al.	2502.09328	null
2025-02-12	Revisiting 3D LLM Benchmarks: Are We Really Testing 3D Capabilities?	Jiahe Jin et.al.	2502.08503	link
2025-02-11	Forget What You Know about LLMs Evaluations -- LLMs are Like a Chameleon	Nurit Cohen-Inger et.al.	2502.07445	link
2025-02-10	Evaluating the Systematic Reasoning Abilities of Large Language Models through Graph Coloring	Alex Heyman et.al.	2502.07087	link
2025-02-10	Multi-turn Evaluation of Anthropomorphic Behaviours in Large Language Models	Lujain Ibrahim et.al.	2502.07077	null
2025-02-07	LLM-Supported Natural Language to Bash Translation	Finnian Westenfelder et.al.	2502.06858	link
2025-02-15	Self-Supervised Prompt Optimization	Jinyu Xiang et.al.	2502.06855	link
2025-02-10	Resurrecting saturated LLM benchmarks with adversarial encoding	Igor Ivanov et.al.	2502.06738	null
2025-02-10	Automatic Evaluation of Healthcare LLMs Beyond Question-Answering	Anna Arias-Duart et.al.	2502.06666	null
2025-02-10	Unbiased Evaluation of Large Language Models from a Causal Perspective	Meilin Chen et.al.	2502.06655	null
2025-02-10	LessLeak-Bench: A First Investigation of Data Leakage in LLMs Across 83 Software Engineering Benchmarks	Xin Zhou et.al.	2502.06215	null
2025-02-05	Aero-LLM: A Distributed Framework for Secure UAV Communication and Intelligent Decision-Making	Balakrishnan Dharmalingam et.al.	2502.05220	null
2025-02-06	TruthFlow: Truthful LLM Generation via Representation Flow Correction	Hanyu Wang et.al.	2502.04556	null
2025-02-05	How do Humans and Language Models Reason About Creativity? A Comparative Analysis	Antonio Laverghetta Jr. et.al.	2502.03253	null
2025-02-05	On Zero-Initialized Attention: Optimal Prompt and Gating Factor Estimation	Nghiem T. Diep et.al.	2502.03029	null
2025-02-02	LLM-Powered Benchmark Factory: Reliable, Generic, and Efficient	Peiwen Yuan et.al.	2502.01683	link
2025-02-02	HASSLE-free: A unified Framework for Sparse plus Low-Rank Matrix Decomposition for LLMs	Mehdi Makni et.al.	2502.00899	null
2025-02-01	DUET: Optimizing Training Data Mixtures via Feedback from Unseen Evaluation Tasks	Zhiliang Chen et.al.	2502.00270	null
2025-01-30	Overestimation in LLM Evaluation: A Controlled Large-Scale Study on Data Contamination's Impact on Machine Translation	Muhammed Yusuf Kocyigit et.al.	2501.18771	null
2025-01-31	ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation	Minghua He et.al.	2501.18460	null
2025-02-01	LLM Evaluation Based on Aerospace Manufacturing Expertise: Automated Generation and Multi-Model Question Answering	Beiming Liu et.al.	2501.17183	null
2025-01-28	An LLM Benchmark for Addressee Recognition in Multi-modal Multi-party Dialogue	Koji Inoue et.al.	2501.16643	null
2025-01-26	HardML: A Benchmark For Evaluating Data Science And Machine Learning knowledge and reasoning in AI	Tidor-Vlad Pricope et.al.	2501.15627	null
2025-01-23	Question Answering on Patient Medical Records with Private Fine-Tuned LLMs	Sara Kothari et.al.	2501.13687	null
2025-01-10	CodEv: An Automated Grading Framework Leveraging Large Language Models for Consistent and Constructive Feedback	En-Qi Tseng et.al.	2501.10421	null
2025-01-15	Towards Multilingual LLM Evaluation for Baltic and Nordic languages: A study on Lithuanian History	Yevhen Kostiuk et.al.	2501.09154	null
2025-01-13	Benchmarking Abstractive Summarisation: A Dataset of Human-authored Summaries of Norwegian News Articles	Samia Touileb et.al.	2501.07718	null
2025-01-03	FLAME: Financial Large-Language Model Assessment and Metrics Evaluation	Jiayu Guo et.al.	2501.06211	link
2025-01-07	MTRAG: A Multi-Turn Conversational Benchmark for Evaluating Retrieval-Augmented Generation Systems	Yannis Katsis et.al.	2501.03468	link
2025-01-05	Evaluating Large Language Models Against Human Annotators in Latent Content Analysis: Sentiment, Political Leaning, Emotional Intensity, and Sarcasm	Ljubisa Bojic et.al.	2501.02532	null
2025-01-04	LLMzSzŁ: a comprehensive LLM benchmark for Polish	Krzysztof Jassem et.al.	2501.02266	null
2025-01-08	VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM	Yuqian Yuan et.al.	2501.00599	link
2025-01-04	Setting Standards in Turkish NLP: TR-MMLU for Large Language Model Evaluation	M. Ali Bayram et.al.	2501.00593	null
2024-12-31	Echoes in AI: Quantifying Lack of Plot Diversity in LLM Outputs	Weijia Xu et.al.	2501.00273	null
2024-12-30	EVOLVE: Emotion and Visual Output Learning via LLM Evaluation	Jordan Sinclair et.al.	2412.20632	null
2024-12-24	Muse: A Multimodal Conversational Recommendation Dataset with Scenario-Grounded User Profiles	Zihan Wang et.al.	2412.18416	null
2024-12-24	A Statistical Framework for Ranking LLM-Based Chatbots	Siavash Ameli et.al.	2412.18407	link
2025-01-25	DeepCRCEval: Revisiting the Evaluation of Code Review Comment Generation	Junyi Lu et.al.	2412.18291	null
2024-12-23	CARL-GT: Evaluating Causal Reasoning Capabilities of Large Language Models	Ruibo Tu et.al.	2412.17970	link
2025-01-02	Baichuan4-Finance Technical Report	Hanyu Zhang et.al.	2412.15270	null
2024-12-19	ObjVariantEnsemble: Advancing Point Cloud LLM Evaluation in Challenging Scenes with Subtly Distinguished Objects	Qihang Cao et.al.	2412.14837	null
2024-12-18	AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge	Xiaobao Wu et.al.	2412.13670	link
2025-02-16	Mind Your Theory: Theory of Mind Goes Deeper Than Reasoning	Eitan Wagner et.al.	2412.13631	null
2025-02-17	OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain	Shuting Wang et.al.	2412.13018	link
2024-12-10	How to Choose a Threshold for an Evaluation Metric for Large Language Models	Bhaskarjit Sarmah et.al.	2412.12148	null
2024-12-15	Dual Traits in Probabilistic Reasoning of Large Language Models	Shenxiong Li et.al.	2412.11009	link
2024-12-30	LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation	Eunsu Kim et.al.	2412.10424	null
2024-12-13	Cultural Evolution of Cooperation among LLM Agents	Aron Vallinder et.al.	2412.10270	null
2024-12-12	Towards Understanding the Robustness of LLM-based Evaluations under Perturbations	Manav Chaudhary et.al.	2412.09269	null
2024-12-10	BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities	Sahal Shaji Mullappilly et.al.	2412.07769	link
2024-12-12	PediaBench: A Comprehensive Chinese Pediatric Dataset for Benchmarking Large Language Models	Qian Zhang et.al.	2412.06287	link
2024-12-02	AI Benchmarks and Datasets for LLM Evaluation	Todor Ivanov et.al.	2412.01020	null
2024-11-30	Evaluating the Consistency of LLM Evaluators	Noah Lee et.al.	2412.00543	null
2024-11-29	MIMDE: Exploring the Use of Synthetic vs Human Data for Evaluating Multi-Insight Multi-Document Extraction Tasks	John Francis et.al.	2411.19689	null
2024-11-29	Beyond Surface Structure: A Causal Assessment of LLMs' Comprehension Ability	Yujin Han et.al.	2411.19456	link
2024-11-27	Is my Meeting Summary Good? Estimating Quality with a Multi-LLM Evaluator	Frederic Kirstein et.al.	2411.18444	null
2025-01-17	CS-Eval: A Comprehensive Large Language Model Benchmark for CyberSecurity	Zhengmin Yu et.al.	2411.16239	link
2024-11-25	SAGEval: The frontiers of Satisfactory Agent based NLG Evaluation for reference-free open-ended text	Reshmi Ghosh et.al.	2411.16077	null
2024-11-26	Do LLMs Agree on the Creativity Evaluation of Alternative Uses?	Abdullah Al Rabeyah et.al.	2411.15560	null
2025-02-17	Ranking Unraveled: Recipes for LLM Rankings in Head-to-Head AI Combat	Roland Daynauth et.al.	2411.14483	link
2024-11-21	Lost in Inference: Rediscovering the Role of Natural Language Inference for Large Language Models	Lovish Madaan et.al.	2411.14103	null
2024-11-21	An Evaluation-Driven Approach to Designing LLM Agents: Process and Architecture	Boming Xia et.al.	2411.13768	null
2024-11-21	A Framework for Evaluating LLMs Under Task Indeterminacy	Luke Guerdan et.al.	2411.13760	null
2024-11-12	Large Language Models as Neurolinguistic Subjects: Identifying Internal Representations for Form and Meaning	Linyang He et.al.	2411.07533	null
2024-11-13	Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models	Yancheng He et.al.	2411.07140	null
2024-11-09	Golden Touchstone: A Comprehensive Bilingual Benchmark for Evaluating Financial Large Language Models	Xiaojun Wu et.al.	2411.06272	link
2025-02-09	ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding	Israel Abebe Azime et.al.	2411.05049	null
2024-11-07	Bayesian Calibration of Win Rate Estimation with LLM Evaluators	Yicheng Gao et.al.	2411.04424	link
2024-11-05	Enhancing LLM Evaluations: The Garbling Trick	William F. Bradley et.al.	2411.01533	null
2025-02-19	Varco Arena: A Tournament Approach to Reference-Free Benchmarking Large Language Models	Seonil Son et.al.	2411.01281	null
2025-02-07	Mastering the Craft of Data Synthesis for CodeLLMs	Meng Chen et.al.	2411.00005	link
2024-10-28	Project MPG: towards a generalized performance benchmark for LLM capabilities	Lucas Spangher et.al.	2410.22368	null
2024-10-29	Self-Preference Bias in LLM-as-a-Judge	Koki Wataoka et.al.	2410.21819	null
2024-10-28	Unveiling Context-Aware Criteria in Self-Assessing LLMs	Taneesh Gupta et.al.	2410.21545	null
2024-10-27	LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization	Jui-Nan Yen et.al.	2410.20625	null
2024-10-26	Limitations of the LLM-as-a-Judge Approach for Evaluating LLM Outputs in Expert Knowledge Tasks	Annalisa Szymanski et.al.	2410.20266	null
2024-10-23	MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning	Jingfan Zhang et.al.	2410.18035	null
2025-02-21	Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements	Isamu Isozaki et.al.	2410.17141	link
2024-10-21	CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution	Maosong Cao et.al.	2410.16256	link
2025-01-26	mHumanEval -- A Multilingual Benchmark to Evaluate Large Language Models for Code Generation	Nishat Raihan et.al.	2410.15037	link
2024-10-19	CAP: Data Contamination Detection via Consistency Amplification	Yi Zhao et.al.	2410.15005	null
2024-10-18	Enabling Scalable Evaluation of Bias Patterns in Medical LLMs	Hamed Fayyaz et.al.	2410.14763	link
2024-11-06	Diverging Preferences: When do Annotators Disagree and do Models Know?	Michael JQ Zhang et.al.	2410.14632	null
2024-10-18	Combining Entropy and Matrix Nuclear Norm for Enhanced Evaluation of Language Models	James Vo et.al.	2410.14480	null
2024-10-21	BenTo: Benchmark Task Reduction with In-Context Transferability	Hongyu Zhao et.al.	2410.13804	link
2024-10-16	BenchmarkCards: Large Language Model and Risk Reporting	Anna Sokol et.al.	2410.12974	null
2025-02-01	Language Model Preference Evaluation with Multiple Weak Evaluators	Zhengyu Hu et.al.	2410.12869	link
2024-10-11	Enterprise Benchmarks for Large Language Model Evaluation	Bing Zhang et.al.	2410.12857	link
2024-10-16	An Automatic and Cost-Efficient Peer-Review Framework for Language Generation Evaluation	Junjie Chen et.al.	2410.12265	null
2024-10-15	Leaving the barn door open for Clever Hans: Simple features predict LLM benchmark answers	Lorenzo Pacchiardi et.al.	2410.11672	link
2024-10-15	Black-box Uncertainty Quantification Method for LLM-as-a-Judge	Nico Wagner et.al.	2410.11594	null
2024-10-14	Jailbreak Instruction-Tuned LLMs via end-of-sentence MLP Re-weighting	Yifan Luo et.al.	2410.10150	null
2024-12-13	HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics	Jingxuan Fan et.al.	2410.09988	link
2024-10-15	LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models	Han Qiu et.al.	2410.09962	link
2024-10-17	Towards Multilingual LLM Evaluation for European Languages	Klaudia Thellmann et.al.	2410.08928	null
2024-10-11	Test-driven Software Experimentation with LASSO: an LLM Benchmarking Example	Marcus Kessel et.al.	2410.08911	null
2024-10-10	Assessing Episodic Memory in LLMs with Sequence Order Recall Tasks	Mathis Pink et.al.	2410.08133	null
2025-02-03	COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act	Philipp Guldimann et.al.	2410.07959	link
2024-11-06	News Reporter: A Multi-lingual LLM Framework for Broadcast T.V News	Tarun Jain et.al.	2410.07520	null
2024-10-09	Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates	Xiaosen Zheng et.al.	2410.07137	link
2024-10-09	ReIFE: Re-evaluating Instruction-Following Evaluation	Yixin Liu et.al.	2410.07069	link
2024-10-08	Active Evaluation Acquisition for Efficient LLM Benchmarking	Yang Li et.al.	2410.05952	null
2024-10-07	TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles	Qingchen Yu et.al.	2410.05262	link
2024-10-01	Language Enhanced Model for Eye (LEME): An Open-Source Ophthalmology-Specific Large Language Model	Aidan Gilson et.al.	2410.03740	null
2024-10-04	TICKing All the Boxes: Generated Checklists Improve LLM Evaluation and Generation	Jonathan Cook et.al.	2410.03608	null
2024-10-04	Towards Reproducible LLM Evaluation: Quantifying Uncertainty in LLM Benchmark Scores	Robert E. Blackwell et.al.	2410.03492	null
2024-10-29	AIME: AI System Optimization via Multiple LLM Evaluators	Bhrij Patel et.al.	2410.03131	null
2024-10-02	Comparing Criteria Development Across Domain Experts, Lay Users, and Models in Large Language Model Evaluation	Annalisa Szymanski et.al.	2410.02054	null
2024-10-02	Knowledge-Driven Feature Selection and Engineering for Genotype Data with Large Language Models	Joseph Lee et.al.	2410.01795	link
2024-10-03	Extending Context Window of Large Language Models from a Distributional Perspective	Yingsheng Wu et.al.	2410.01490	null
2024-10-02	ConServe: Harvesting GPUs for Low-Latency and High-Throughput Large Language Model Serving	Yifan Qiao et.al.	2410.01228	null
2024-10-01	ViDAS: Vision-based Danger Assessment and Scoring	Pranav Gupta et.al.	2410.00477	null
2024-10-01	PclGPT: A Large Language Model for Patronizing and Condescending Language Detection	Hongbo Wang et.al.	2410.00361	link
2024-11-26	LexEval: A Comprehensive Chinese Legal Benchmark for Evaluating Large Language Models	Haitao Li et.al.	2409.20288	link
2024-09-29	Does RAG Introduce Unfairness in LLMs? Evaluating Fairness in Retrieval-Augmented Generation Systems	Xuyang Wu et.al.	2409.19804	null
2024-10-19	Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models	Xin Li et.al.	2409.19667	link
2024-10-05	IDGen: Item Discrimination Induced Prompt Generation for LLM Evaluation	Fan Lin et.al.	2409.18892	link
2024-12-13	A Character-Centric Creative Story Generation via Imagination	Kyeongman Park et.al.	2409.16667	null
2024-09-25	Judgment of Thoughts: Courtroom of the Binary Logical Reasoning in Large Language Models	Sungjune Park et.al.	2409.16635	null
2024-12-18	Kalahi: A handcrafted, grassroots cultural LLM evaluation suite for Filipino	Jann Railey Montalan et.al.	2409.15380	link
2024-12-16	MQM-APE: Toward High-Quality Error Annotation Predictors with Automatic Post-Editing in LLM Translation Evaluators	Qingyu Lu et.al.	2409.14335	link
2024-09-21	ChemEval: A Comprehensive Multi-Level Chemical Evaluation for Large Language Models	Yuqing Huang et.al.	2409.13989	link
2024-12-17	AraDiCE: Benchmarks for Dialectal and Cultural Capabilities in LLMs	Basel Mousi et.al.	2409.11404	null
2024-10-02	LLM-as-a-Judge & Reward Model: What They Can and Cannot Do	Guijin Son et.al.	2409.11239	null
2024-12-08	Towards Data Contamination Detection for Modern Large Language Models: Limitations, Inconsistencies, and Oracle Challenges	Vinay Samuel et.al.	2409.09927	link
2024-09-13	Cracking the Code: Multi-domain LLM Evaluation on Real-World Professional Exams in Indonesia	Fajri Koto et.al.	2409.08564	null
2024-09-09	Assessing SPARQL capabilities of Large Language Models	Lars-Peter Meyer et.al.	2409.05925	link
2024-10-08	LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs	Yuhao Wu et.al.	2409.02076	link
2024-10-14	Polyrating: A Cost-Effective and Bias-Aware Rating System for LLM Evaluation	Jasper Dekoninck et.al.	2409.00696	null
2024-08-26	Evaluating ChatGPT on Nuclear Domain-Specific Data	Muhammad Anwar et.al.	2409.00090	null
2024-08-28	LLMSecCode: Evaluating Large Language Models for Secure Coding	Anton Rydén et.al.	2408.16100	link
2024-08-26	LLM-3D Print: Large Language Models To Monitor and Control 3D Printing	Yayati Jadhav et.al.	2408.14307	null
2024-08-26	Epidemic Information Extraction for Event-Based Surveillance using Large Language Models	Sergio Consoli et.al.	2408.14277	null
2024-10-04	MobileQuant: Mobile-friendly Quantization for On-device Language Models	Fuwen Tan et.al.	2408.13933	link
2024-08-23	LalaEval: A Holistic Human Evaluation Framework for Domain-Specific Large Language Models	Chongyan Sun et.al.	2408.13338	null
2024-08-23	Open Llama2 Model for the Lithuanian Language	Artūras Nakvosas et.al.	2408.12963	null
2024-08-23	LIMP: Large Language Model Enhanced Intent-aware Mobility Prediction	Songwei Li et.al.	2408.12832	link
2024-12-20	Recording for Eyes, Not Echoing to Ears: Contextualized Spoken-to-Written Conversion of ASR Transcripts	Jiaqing Liu et.al.	2408.09688	null
2024-08-20	Constructing Domain-Specific Evaluation Sets for LLM-as-a-judge	Ravi Raju et.al.	2408.08808	null
2024-10-16	The Fellowship of the LLMs: Multi-Agent Workflows for Synthetic Preference Optimization Dataset Generation	Samee Arif et.al.	2408.08688	link
2024-10-19	Persona is a Double-edged Sword: Mitigating the Negative Impact of Role-playing Prompts in Zero-shot Reasoning Tasks	Junseok Kim et.al.	2408.08631	null

(back to top)

LLM MLLM

Publish Date	Title	Authors	PDF	Code
2025-02-27	R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts	Zhongyang Li et.al.	2502.20395	null
2025-02-27	InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions	Sirui Xu et.al.	2502.20390	null
2025-02-27	Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation	Sucheng Ren et.al.	2502.20388	null
2025-02-27	Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis	Jeffrey Yang Fan Chiang et.al.	2502.20383	null
2025-02-27	Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers	Shalev Lifshitz et.al.	2502.20379	null
2025-02-27	PhantomWiki: On-Demand Datasets for Reasoning and Retrieval Evaluation	Albert Gong et.al.	2502.20377	null
2025-02-27	Constrained Generative Modeling with Manually Bridged Diffusion Models	Saeid Naderiparizi et.al.	2502.20371	null
2025-02-27	Bridging Legal Knowledge and AI: Retrieval-Augmented Generation with Vector Stores, Knowledge Graphs, and Hierarchical Non-negative Matrix Factorization	Ryan C. Barron et.al.	2502.20364	null
2025-02-27	Bridging the Creativity Understanding Gap: Small-Scale Human Alignment Enables Expert-Level Humor Ranking in LLMs	Kuan Lok Zhou et.al.	2502.20356	null
2025-02-27	KEDRec-LM: A Knowledge-distilled Explainable Drug Recommendation Large Language Model	Kai Zhang et.al.	2502.20350	null
2025-02-27	Sparse Auto-Encoder Interprets Linguistic Features in Large Language Models	Yi Jing et.al.	2502.20344	null
2025-02-27	Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners	Daniele Paliotta et.al.	2502.20339	null
2025-02-27	Expertise Is What We Want	Alan Ashworth et.al.	2502.20335	null
2025-02-27	Emergent Symbolic Mechanisms Support Abstract Reasoning in Large Language Models	Yukang Yang et.al.	2502.20332	null
2025-02-27	Long-Context Inference with Retrieval-Augmented Speculative Decoding	Guanzheng Chen et.al.	2502.20330	null
2025-02-27	EAIRA: Establishing a Methodology for Evaluating AI Models as Scientific Research Assistants	Franck Cappello et.al.	2502.20309	null
2025-02-27	M^3Builder: A Multi-Agent System for Automated Machine Learning in Medical Imaging	Jinghao Feng et.al.	2502.20301	null
2025-02-27	An exploration of features to improve the generalisability of fake news detection models	Nathaniel Hoy et.al.	2502.20299	null
2025-02-27	Judge a Book by its Cover: Investigating Multi-Modal LLMs for Multi-Page Handwritten Document Transcription	Benjamin Gutteridge et.al.	2502.20295	null
2025-02-27	Conformal Tail Risk Control for Large Language Model Alignment	Catherine Yu-Chi Chen et.al.	2502.20285	null
2025-02-27	Evaluating Human Trust in LLM-Based Planners: A Preliminary Study	Shenghui Chen et.al.	2502.20284	null
2025-02-27	Large Language Models as Attribution Regularizers for Efficient Model Training	Davor Vukadin et.al.	2502.20268	null
2025-02-27	Vector-Quantized Vision Foundation Models for Object-Centric Learning	Rongzhen Zhao et.al.	2502.20263	null
2025-02-27	LLM as a Broken Telephone: Iterative Generation Distorts Information	Amr Mohamed et.al.	2502.20258	null
2025-02-27	Do computer vision foundation models learn the low-level characteristics of the human visual system?	Yancheng Cai et.al.	2502.20256	null
2025-02-27	Beyond Natural Language Perplexity: Detecting Dead Code Poisoning in Code Generation Datasets	Chichien Tsai et.al.	2502.20246	null
2025-02-27	From Retrieval to Generation: Comparing Different Approaches	Abdelrahman Abdallah et.al.	2502.20245	null
2025-02-27	FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving	Guizhen Chen et.al.	2502.20238	null
2025-02-27	AI Will Always Love You: Studying Implicit Biases in Romantic AI Companions	Clare Grogan et.al.	2502.20231	null
2025-02-27	Avat3r: Large Animatable Gaussian Reconstruction Model for High-fidelity 3D Head Avatars	Tobias Kirschstein et.al.	2502.20220	null
2025-02-27	ChineseEcomQA: A Scalable E-commerce Concept Evaluation Benchmark for Large Language Models	Haibin Chen et.al.	2502.20196	null
2025-02-27	Model Checking Linear Temporal Logic with Standpoint Modalities	Rajab Aghamov et.al.	2502.20193	null
2025-02-27	Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge	Yan-Lun Chen et.al.	2502.20186	null
2025-02-27	DGFM: Full Body Dance Generation Driven by Music Foundation Models	Xinran Liu et.al.	2502.20176	null
2025-02-27	An Extensive Evaluation of PDDL Capabilities in off-the-shelf LLMs	Kaustubh Vyas et.al.	2502.20175	null
2025-02-27	Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think	Liang Chen et.al.	2502.20172	null
2025-02-27	Re-evaluating Open-ended Evaluation of Large Language Models	Siqi Liu et.al.	2502.20170	null
2025-02-27	Adaptive H&E-IHC information fusion staining framework based on feature extra	Yifan Jia et.al.	2502.20156	null
2025-02-27	Telephone Surveys Meet Conversational AI: Evaluating a LLM-Based Telephone Survey System at Scale	Max M. Lang et.al.	2502.20140	null
2025-02-27	Finite State Automata Inside Transformers with Chain-of-Thought: A Mechanistic Study on State Tracking	Yifan Zhang et.al.	2502.20129	null
2025-02-27	Self-Training Elicits Concise Reasoning in Large Language Models	Tergel Munkhbat et.al.	2502.20122	null
2025-02-27	LongRoPE2: Near-Lossless LLM Context Window Scaling	Ning Shang et.al.	2502.20082	null
2025-02-27	Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents	Haochen Sun et.al.	2502.20073	null
2025-02-27	A Generative Model Enhanced Multi-Agent Reinforcement Learning Method for Electric Vehicle Charging Navigation	Tianyang Qi et.al.	2502.20068	null
2025-02-27	Polish-ASTE: Aspect-Sentiment Triplet Extraction Datasets for Polish	Marta Lango et.al.	2502.20046	null
2025-02-27	3D-AffordanceLLM: Harnessing Large Language Models for Open-Vocabulary Affordance Detection in 3D Worlds	Hengshuo Chu et.al.	2502.20041	null
2025-02-27	AsymLoRA: Harmonizing Data Conflicts and Commonalities in MLLMs	Xuyang Wei et.al.	2502.20035	null
2025-02-27	Erasing Without Remembering: Safeguarding Knowledge Forgetting in Large Language Models	Huazheng Wang et.al.	2502.19982	null
2025-02-27	The Lookahead Limitation: Why Multi-Operand Addition is Hard for LLMs	Tanja Baeumel et.al.	2502.19981	null
2025-02-27	Can Large Language Models Unveil the Mysteries? An Exploration of Their Ability to Unlock Information in Complex Scenarios	Chao Wang et.al.	2502.19973	null
2025-02-27	Deterministic or probabilistic? The psychology of LLMs as random number generators	Javier Coronado-Blázquez et.al.	2502.19965	null
2025-02-27	SeisMoLLM: Advancing Seismic Monitoring via Cross-modal Transfer with Pre-trained Large Language Model	Xinghao Wang et.al.	2502.19960	link
2025-02-27	Collaborative Stance Detection via Small-Large Language Model Consistency Verification	Yu Yan et.al.	2502.19954	null
2025-02-27	GeoEdit: Geometric Knowledge Editing for Large Language Models	Yujie Feng et.al.	2502.19953	null
2025-02-27	Algebraic Machine Learning: Learning as computing an algebraic decomposition of a task	Fernando Martin-Maroto et.al.	2502.19944	null
2025-02-27	Alleviating Distribution Shift in Synthetic Data for Machine Translation Quality Estimation	Xiang Geng et.al.	2502.19941	null
2025-02-27	Playing Pokémon Red via Deep Reinforcement Learning	Marco Pleines et.al.	2502.19920	null
2025-02-27	Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models	Yuan Sui et.al.	2502.19918	null
2025-02-27	Picking the Cream of the Crop: Visual-Centric Data Selection with Collaborative Agents	Zhenyu Liu et.al.	2502.19917	null
2025-02-27	LLM-driven Effective Knowledge Tracing by Integrating Dual-channel Difficulty	Jiahui Cen et.al.	2502.19915	null
2025-02-27	SkipPipe: Partial and Reordered Pipelining Framework for Training LLMs in Heterogeneous Networks	Nikolay Blagoev et.al.	2502.19913	null
2025-02-27	Order Doesn't Matter, But Reasoning Does: Training LLMs with Order-Centric Augmentation	Qianxi He et.al.	2502.19907	null
2025-02-27	Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy	Zaijing Li et.al.	2502.19902	null
2025-02-27	GenPC: Zero-shot Point Cloud Completion via 3D Generative Priors	An Li et.al.	2502.19896	null
2025-02-27	Beyond the Tip of Efficiency: Uncovering the Submerged Threats of Jailbreak Attacks in Small Language Models	Sibo Yi et.al.	2502.19883	null
2025-02-27	Towards Multimodal Large-Language Models for Parent-Child Interaction: A Focus on Joint Attention	Weiyan Shi et.al.	2502.19877	null
2025-02-27	MMKE-Bench: A Multimodal Editing Benchmark for Diverse Visual Knowledge	Yuntao Du et.al.	2502.19870	link
2025-02-27	MIND: Towards Immersive Psychological Healing with Multi-agent Inner Dialogue	Yujia Chen et.al.	2502.19860	null
2025-02-27	ConvCodeWorld: Benchmarking Conversational Code Generation in Reproducible Feedback Environments	Hojae Han et.al.	2502.19852	null
2025-02-27	One-for-More: Continual Diffusion Model for Anomaly Detection	Xiaofan Li et.al.	2502.19848	null
2025-02-27	ProAPO: Progressively Automatic Prompt Optimization for Visual Classification	Xiangyan Qu et.al.	2502.19844	null
2025-02-27	Shared Stochastic Gaussian Process Latent Variable Models: A Multi-modal Generative Model for Quasar Spectra	Vidhi Lalchand et.al.	2502.19824	null
2025-02-27	Foot-In-The-Door: A Multi-turn Jailbreak for LLMs	Zixuan Weng et.al.	2502.19820	null
2025-02-27	Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts	Shulai Zhang et.al.	2502.19811	null
2025-02-27	Implicit Search via Discrete Diffusion: A Study on Chess	Jiacheng Ye et.al.	2502.19805	null
2025-02-27	Developmental Support Approach to AI's Autonomous Growth: Toward the Realization of a Mutually Beneficial Stage Through Experiential Learning	Taichiro Endo et.al.	2502.19798	null
2025-02-27	ChatMol: A Versatile Molecule Designer Based on the Numerically Enhanced Large Language Model	Chuanliu Fan et.al.	2502.19794	null
2025-02-27	Mixtera: A Data Plane for Foundation Model Training	Maximilian Böther et.al.	2502.19790	null
2025-02-27	Advancements in Natural Language Processing for Automatic Text Summarization	Nevidu Jayatilleke et.al.	2502.19773	null
2025-02-27	Does Your Voice Assistant Remember? Analyzing Conversational Context Recall and Utilization in Voice Interaction Models	Heeseung Kim et.al.	2502.19759	null
2025-02-27	PolyPrompt: Automating Knowledge Extraction from Multilingual Language Models with Dynamic Prompt Generation	Nathan Roll et.al.	2502.19756	null
2025-02-27	Beneath the Surface: How Large Language Models Reflect Hidden Bias	Jinhao Pan et.al.	2502.19749	null
2025-02-27	HaLoRA: Hardware-aware Low-Rank Adaptation for Large Language Models Based on Hybrid Compute-in-Memory Architecture	Taiqiang Wu et.al.	2502.19747	null
2025-02-27	R1-T1: Fully Incentivizing Translation Capability in LLMs via Reasoning Learning	Minggui He et.al.	2502.19735	null
2025-02-27	Preference Learning Unlocks LLMs' Psycho-Counseling Skills	Mian Zhang et.al.	2502.19731	null
2025-02-27	Do Expressions Change Decisions? Exploring the Impact of AI's Explanation Tone on Decision-Making	Ayano Okoso et.al.	2502.19730	null
2025-02-27	Tokens for Learning, Tokens for Unlearning: Mitigating Membership Inference Attacks in Large Language Models via Dual-Purpose Training	Toan Tran et.al.	2502.19726	null
2025-02-27	Few-Shot Multilingual Open-Domain QA from 5 Examples	Fan Jiang et.al.	2502.19722	null
2025-02-27	Sensing and Steering Stereotypes: Extracting and Applying Gender Representation Vectors in LLMs	Hannah Cyberey et.al.	2502.19721	null
2025-02-27	Teaching Dense Retrieval Models to Specialize with Listwise Distillation and LLM Data Augmentation	Manveer Singh Tamber et.al.	2502.19712	null
2025-02-27	AoECR: AI-ization of Elderly Care Robot	Linkun Zhou et.al.	2502.19706	null
2025-02-27	You Only Click Once: Single Point Weakly Supervised 3D Instance Segmentation for Autonomous Driving	Guangfeng Jiang et.al.	2502.19698	null
2025-02-27	M-LLM Based Video Frame Selection for Efficient Video Understanding	Kai Hu et.al.	2502.19680	null
2025-02-27	Old Experience Helps: Leveraging Survey Methodology to Improve AI Text Annotation Reliability in Social Sciences	Linzhuo Li et.al.	2502.19679	null
2025-02-27	Improving Adversarial Transferability in MLLMs via Dynamic Vision-Language Alignment Attack	Chenhe Gu et.al.	2502.19672	null
2025-02-27	SuPreME: A Supervised Pre-training Framework for Multimodal ECG Representation Learning	Mingsheng Cai et.al.	2502.19668	null
2025-02-27	Taxonomy, Opportunities, and Challenges of Representation Engineering for Large Language Models	Jan Wehner et.al.	2502.19649	null
2025-02-27	cMIM: A Contrastive Mutual Information Framework for Unified Generative and Discriminative Representation Learning	Micha Livne et.al.	2502.19642	null
2025-02-26	Agentic Mixture-of-Workflows for Multi-Modal Chemical Search	Tiffany J. Callahan et.al.	2502.19629	null
2025-02-26	Treatment Non-Adherence Bias in Clinical Machine Learning: A Real-World Study on Hypertension Medication	Zhongyuan Liang et.al.	2502.19625	null
2025-02-26	Norm Growth and Stability Challenges in Localized Sequential Knowledge Editing	Akshat Gupta et.al.	2502.19416	null
2025-02-26	Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs	Dayu Yang et.al.	2502.19411	null
2025-02-26	Less or More: Towards Glanceable Explanations for LLM Recommendations Using Ultra-Small Devices	Xinru Wang et.al.	2502.19410	null
2025-02-26	ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models	Danae Sánchez Villegas et.al.	2502.19409	null
2025-02-26	Learning Code-Edit Embedding to Model Student Debugging Behavior	Hasnain Heickal et.al.	2502.19407	null
2025-02-26	General Reasoning Requires Learning to Reason from the Get-go	Seungwook Han et.al.	2502.19402	null
2025-02-26	TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding	Max Ku et.al.	2502.19400	null
2025-02-26	Multi-modal Contrastive Learning for Tumor-specific Missing Modality Synthesis	Minjoo Lim et.al.	2502.19390	null
2025-02-26	LiDAR Registration with Visual Foundation Models	Niclas Vödisch et.al.	2502.19374	null
2025-02-26	Deep Learning For Time Series Analysis With Application On Human Motion	Ali Ismail-Fawaz et.al.	2502.19364	null
2025-02-26	DataMan: Data Manager for Pre-training Large Language Models	Ru Peng et.al.	2502.19363	null
2025-02-26	Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?	Yancheng He et.al.	2502.19361	null
2025-02-26	Evaluating LLMs and Pre-trained Models for Text Summarization Across Diverse Datasets	Tohida Rehman et.al.	2502.19339	null
2025-02-26	Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems	Hao Peng et.al.	2502.19328	null
2025-02-26	Shh, don't say that! Domain Certification in LLMs	Cornelius Emde et.al.	2502.19320	null
2025-02-26	Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond	Qizhou Wang et.al.	2502.19301	null
2025-02-26	Agent-centric Information Access	Evangelos Kanoulas et.al.	2502.19298	null
2025-02-26	Complex LLM Planning via Automated Heuristics Discovery	Hongyi Ling et.al.	2502.19295	null
2025-02-26	Efficient Federated Search for Retrieval-Augmented Generation	Rachid Guerraoui et.al.	2502.19280	null
2025-02-26	ArtInsight: Enabling AI-Powered Artwork Engagement for Mixed Visual-Ability Families	Arnavi Chheda-Kothary et.al.	2502.19263	null
2025-02-26	AI-Powered Bayesian Inference	Veronika Ročková et.al.	2502.19231	null
2025-02-26	Two Heads Are Better Than One: Dual-Model Verbal Reflection at Inference-Time	Jiazheng Li et.al.	2502.19230	null
2025-02-26	A Lightweight and Extensible Cell Segmentation and Classification Model for Whole Slide Images	Nikita Shvetsov et.al.	2502.19217	null
2025-02-26	A Hybrid Transformer Architecture with a Quantized Self-Attention Mechanism Applied to Molecular Generation	Anthony M. Smaldone et.al.	2502.19214	null
2025-02-26	Negation-Induced Forgetting in LLMs	Francesca Capuano et.al.	2502.19211	null
2025-02-26	Bi'an: A Bilingual Benchmark and Model for Hallucination Detection in Retrieval-Augmented Generation	Zhouyu Jiang et.al.	2502.19209	null
2025-02-26	Simulation of Language Evolution under Regulated Social Media Platforms: A Synergistic Approach of Large Language Models and Genetic Algorithms	Jinyu Cai et.al.	2502.19193	null
2025-02-26	BIG-Bench Extra Hard	Mehran Kazemi et.al.	2502.19187	null
2025-02-26	INFO-SEDD: Continuous Time Markov Chains as Scalable Information Metrics Estimators	Alberto Foresti et.al.	2502.19183	null
2025-02-26	UQABench: Evaluating User Embedding for Prompting LLMs in Personalized Question Answering	Langming Liu et.al.	2502.19178	null
2025-02-26	MEDDxAgent: A Unified Modular Agent Framework for Explainable Automatic Differential Diagnosis	Daniel Rose et.al.	2502.19175	null
2025-02-26	A Model-Centric Review of Deep Learning for Protein Design	Gregory W. Kyro et.al.	2502.19173	null
2025-02-26	CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation	Kaiwen Yan et.al.	2502.19166	null
2025-02-26	TestNUC: Enhancing Test-Time Computing Approaches through Neighboring Unlabeled Data Consistency	Henry Peng Zou et.al.	2502.19163	null
2025-02-26	Detecting Linguistic Indicators for Stereotype Assessment with Large Language Models	Rebekka Görge et.al.	2502.19160	null
2025-02-26	A Sliding Layer Merging Method for Efficient Depth-Wise Pruning in LLMs	Xuan Ding et.al.	2502.19159	null
2025-02-26	When Personalization Meets Reality: A Multi-Faceted Analysis of Personalized Preference Learning	Yijiang River Dong et.al.	2502.19158	null
2025-02-26	Isolating Language-Coding from Problem-Solving: Benchmarking LLMs with PseudoEval	Jiarong Wu et.al.	2502.19149	null
2025-02-26	Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs	Zhaowei Zhang et.al.	2502.19148	null
2025-02-26	Identification Under the Semantic Effective Secrecy Constraint	Abdalla Ibrahim et.al.	2502.19142	null
2025-02-26	A Temporal Planning Framework for Multi-Agent Systems via LLM-Aided Knowledge Base Management	Enrico Saccon et.al.	2502.19135	null
2025-02-26	Self-Memory Alignment: Mitigating Factual Hallucinations with Generalized Improvement	Siyuan Zhang et.al.	2502.19127	null
2025-02-26	A Survey on Foundation-Model-Based Industrial Defect Detection	Tianle Yang et.al.	2502.19106	null
2025-02-26	Evaluating Gender Bias in German Machine Translation	Michelle Kappl et.al.	2502.19104	null
2025-02-26	LongEval: A Comprehensive Analysis of Long-Text Generation Through a Plan-based Paradigm	Siwei Wu et.al.	2502.19103	null
2025-02-26	Nexus: A Lightweight and Scalable Multi-Agent Framework for Complex Tasks Automation	Humza Sami et.al.	2502.19091	link
2025-02-26	EndoMamba: An Efficient Foundation Model for Endoscopic Videos	Qingyao Tian et.al.	2502.19090	null
2025-02-26	Sparse Brains are Also Adaptive Brains: Cognitive-Load-Aware Dynamic Activation for LLMs	Yiheng Yang et.al.	2502.19078	null
2025-02-26	IndicEval-XL: Bridging Linguistic Diversity in Code Generation Across Indic Languages	Ujjwal Singh et.al.	2502.19067	null
2025-02-26	Can Large Language Models Outperform Non-Experts in Poetry Evaluation? A Comparative Study Using the Consensual Assessment Technique	Piotr Sawicki et.al.	2502.19064	null
2025-02-26	MathClean: A Benchmark for Synthetic Mathematical Data Cleaning	Hao Liang et.al.	2502.19058	null
2025-02-26	Beyond Surface-Level Patterns: An Essence-Driven Defense Framework Against Jailbreak Attacks in LLMs	Shiyu Xiang et.al.	2502.19041	null
2025-02-26	FungalZSL: Zero-Shot Fungal Classification with Image Captioning Using a Synthetic Data Approach	Anju Rani et.al.	2502.19038	null
2025-02-26	InternVQA: Advancing Compressed Video Quality Assessment with Distilling Large Foundation Model	Fengbin Guan et.al.	2502.19026	null
2025-02-26	Binary Neural Networks for Large Language Model: A Survey	Liangdong Liu et.al.	2502.19008	null
2025-02-26	The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training	Jinbo Wang et.al.	2502.19002	null
2025-02-26	MEBench: Benchmarking Large Language Models for Cross-Document Multi-Entity Question Answering	Teng Lin et.al.	2502.18993	null
2025-02-26	OntologyRAG: Better and Faster Biomedical Code Mapping with Retrieval-Augmented Generation (RAG) Leveraging Ontology Knowledge Graphs and Large Language Models	Hui Feng et.al.	2502.18992	null
2025-02-26	GenTool: Enhancing Tool Generalization in Language Models through Zero-to-One and Weak-to-Strong Simulation	Jie He et.al.	2502.18990	null
2025-02-26	PEToolLLM: Towards Personalized Tool Learning in Large Language Models	Qiancheng Xu et.al.	2502.18980	null
2025-02-26	Low-Confidence Gold: Refining Low-Confidence Samples for Efficient Instruction Tuning	Hongyi Cal et.al.	2502.18978	null
2025-02-26	(Mis)Fitting: A Survey of Scaling Laws	Margaret Li et.al.	2502.18969	null
2025-02-26	Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles	Kuang Wang et.al.	2502.18968	link
2025-02-26	OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment	Jiaxin Deng et.al.	2502.18965	null
2025-02-26	DualSpec: Text-to-spatial-audio Generation via Dual-Spectrogram Guided Diffusion Model	Lei Zhao et.al.	2502.18952	null
2025-02-26	Towards Label-Only Membership Inference Attack against Pre-trained Large Language Models	Yu He et.al.	2502.18943	null
2025-02-26	JailBench: A Comprehensive Chinese Security Assessment Benchmark for Large Language Models	Shuyi Liu et.al.	2502.18935	null
2025-02-26	Talking like Piping and Instrumentation Diagrams (P&IDs)	Achmad Anggawirya Alimin et.al.	2502.18928	null
2025-02-26	ClassInvGen: Class Invariant Synthesis using Large Language Models	Chuyue Sun et.al.	2502.18917	null
2025-02-26	END: Early Noise Dropping for Efficient and Effective Context Denoising	Hongye Jin et.al.	2502.18915	null
2025-02-26	CLLoRA: An Approach to Measure the Effects of the Context Length for LLM Fine-Tuning	Ping Zhang et.al.	2502.18910	null
2025-02-26	An Empirical Study on Commit Message Generation using LLMs via In-Context Learning	Yifan Wu et.al.	2502.18904	null
2025-02-26	From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens	Tong Wu et.al.	2502.18890	null
2025-02-26	Letters from Future Self: Augmenting the Letter-Exchange Exercise with LLM-based Agents to Enhance Young Adults' Career Exploration	Hayeon Jeon et.al.	2502.18881	null
2025-02-26	Learning to Generate Structured Output with Schema Reinforcement Learning	Yaxi Lu et.al.	2502.18878	null
2025-02-26	Learning to Align Multi-Faceted Evaluation: A Unified and Robust Framework	Kaishuai Xu et.al.	2502.18874	null
2025-02-26	Multi-LLM Collaborative Search for Complex Problem Solving	Sen Yang et.al.	2502.18873	null
2025-02-26	A Theoretical Perspective: How to Prevent Model Collapse in Self-consuming Training Loops	Shi Fu et.al.	2502.18865	null
2025-02-26	Sherlock: Towards Multi-scene Video Abnormal Event Extraction and Localization via a Global-local Spatial-sensitive LLM	Junxiao Ma et.al.	2502.18863	null
2025-02-26	A Causal Lens for Evaluating Faithfulness Metrics	Kerem Zaman et.al.	2502.18848	null
2025-02-26	Sliding Window Attention Training for Efficient Large Language Models	Zichuan Fu et.al.	2502.18845	null
2025-02-26	Evidence-Driven Marker Extraction for Social Media Suicide Risk Detection	Carter Adams et.al.	2502.18823	null
2025-02-26	Data-Efficient Multi-Agent Spatial Planning with LLMs	Huangyuan Su et.al.	2502.18822	null
2025-02-26	CAMEx: Curvature-aware Merging of Experts	Dung V. Nguyen et.al.	2502.18821	null
2025-02-26	Judge as A Judge: Improving the Evaluation of Retrieval-Augmented Generation through the Judge-Consistency of Large Language Models	Shuliang Liu et.al.	2502.18817	null
2025-02-26	Holistic Audit Dataset Generation for LLM Unlearning via Knowledge Graph Traversal and Redundancy Removal	Weipeng Jiang et.al.	2502.18810	null
2025-02-26	Optimal Stochastic Trace Estimation in Generative Modeling	Xinyang Liu et.al.	2502.18808	null
2025-02-26	SolEval: Benchmarking Large Language Models for Repository-level Solidity Code Generation	Zhiyuan Peng et.al.	2502.18793	null
2025-02-26	Active Few-Shot Learning for Text Classification	Saeed Ahmadnia et.al.	2502.18782	null
2025-02-26	Towards Optimal Multi-draft Speculative Decoding	Zhengmian Hu et.al.	2502.18779	null
2025-02-26	M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance	Qingpei Guo et.al.	2502.18778	null
2025-02-26	Plutus: Benchmarking Large Language Models in Low-Resource Greek Finance	Xueqing Peng et.al.	2502.18772	null
2025-02-26	Exploring Graph Tasks with Pure LLMs: A Comprehensive Benchmark and Investigation	Yuxiang Wang et.al.	2502.18771	link
2025-02-26	Reward Shaping to Mitigate Reward Hacking in RLHF	Jiayi Fu et.al.	2502.18770	null
2025-02-26	CommGPT: A Graph and Retrieval-Augmented Multimodal Communication Foundation Model	Feibo Jiang et.al.	2502.18763	null
2025-02-26	Training Large Recommendation Models via Graph-Language Token Alignment	Mingdai Yang et.al.	2502.18757	null
2025-02-26	M-ANT: Efficient Low-bit Group Quantization for LLMs via Mathematically Adaptive Numerical Type	Weiming Hu et.al.	2502.18755	null
2025-02-26	AgentSociety Challenge: Designing LLM Agents for User Modeling and Recommendation on Web Platforms	Yuwei Yan et.al.	2502.18754	null
2025-02-26	Spectral-Enhanced Transformers: Leveraging Large-Scale Pretrained Models for Hyperspectral Object Tracking	Shaheer Mohamed et.al.	2502.18748	null
2025-02-26	Automatic Prompt Optimization via Heuristic Search: A Survey	Wendi Cui et.al.	2502.18746	null
2025-02-25	DRAMA: Diverse Augmentation from Large Language Models to Smaller Dense Retrievers	Xueguang Ma et.al.	2502.18460	null
2025-02-25	LLM-Based Design Pattern Detection	Christian Schindler et.al.	2502.18458	null
2025-02-25	FRIDA to the Rescue! Analyzing Synthetic Data Effectiveness in Object-Based Common Sense Reasoning for Disaster Response	Mollie Shichman et.al.	2502.18452	null
2025-02-25	SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution	Yuxiang Wei et.al.	2502.18449	null
2025-02-25	MAPoRL: Multi-Agent Post-Co-Training for Collaborative Large Language Models with Reinforcement Learning	Chanwoo Park et.al.	2502.18439	null
2025-02-25	TextGames: Learning to Self-Play Text-Based Puzzle Games via Language Model Reasoning	Frederikus Hudi et.al.	2502.18431	null
2025-02-25	OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference	Xiangyu Zhao et.al.	2502.18411	null
2025-02-25	Enhancing DNA Foundation Models to Address Masking Inefficiencies	Monireh Safari et.al.	2502.18405	null
2025-02-25	Monte Carlo Temperature: a robust sampling strategy for LLM's uncertainty quantification methods	Nicola Cecere et.al.	2502.18389	null
2025-02-25	How Far are LLMs from Real Search? A Comprehensive Study on Efficiency, Completeness, and Inherent Capabilities	Minhua Lin et.al.	2502.18387	null
2025-02-25	MindMem: Multimodal for Predicting Advertisement Memorability Using LLMs and Deep Learning	Sepehr Asgarian et.al.	2502.18371	null
2025-02-25	Sparse Bayesian Generative Modeling for Joint Parameter and Channel Estimation	Benedikt Böck et.al.	2502.18369	null
2025-02-25	ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation	Yifan Pu et.al.	2502.18364	null
2025-02-25	Responsible AI Agents	Deven R. Desai et.al.	2502.18359	null
2025-02-25	Which Contributions Deserve Credit? Perceptions of Attribution in Human-AI Co-Creation	Jessica He et.al.	2502.18357	null
2025-02-25	BRIDO: Bringing Democratic Order to Abstractive Summarization	Junhyun Lee et.al.	2502.18342	null
2025-02-25	Mapping of Subjective Accounts into Interpreted Clusters (MOSAIC): Topic Modelling and LLM applied to Stroboscopic Phenomenology	Romy Beauté et.al.	2502.18318	null
2025-02-25	GCDance: Genre-Controlled 3D Full Body Dance Generation Driven By Music	Xinran Liu et.al.	2502.18309	null
2025-02-25	RefuteBench 2.0 -- Agentic Benchmark for Dynamic Evaluation of LLM Responses to Refutation Instruction	Jianhao Yan et.al.	2502.18308	null
2025-02-25	LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven Language Representation	Pengzhi Li et.al.	2502.18302	null
2025-02-25	Bayesian Computation in Deep Learning	Wenlong Chen et.al.	2502.18300	null
2025-02-25	DeepCircuitX: A Comprehensive Repository-Level Dataset for RTL Code Understanding, Generation, and PPA Analysis	Zeju Li et.al.	2502.18297	null
2025-02-25	AMPO: Active Multi-Preference Optimization	Taneesh Gupta et.al.	2502.18293	null
2025-02-25	Better Aligned with Survey Respondents or Training Data? Unveiling Political Leanings of LLMs on U.S. Supreme Court Cases	Shanshan Xu et.al.	2502.18282	null
2025-02-25	Citrus: Leveraging Expert Cognitive Pathways in a Medical Language Model for Advanced Medical Decision Support	Guoxin Wang et.al.	2502.18274	null
2025-02-25	Imperfect Knowledge Management (IKM) in GEFRED (GENeralized model for Fuzzy RElational Databases)	Leoncio Jimenez et.al.	2502.18255	null
2025-02-25	Iterative Counterfactual Data Augmentation	Mitchell Plyler et.al.	2502.18249	null
2025-02-25	Unveiling and Causalizing CoT: A Causal Perspective	Jiarun Fu et.al.	2502.18239	null
2025-02-25	Beyond the convexity assumption: Realistic tabular data generation under quantifier-free real linear constraints	Mihaela Cătălina Stoian et.al.	2502.18237	null
2025-02-25	Debt Collection Negotiations with Large Language Models: An Evaluation System and Optimizing Decision Making with Multi-Agent	Xiaofeng Wang et.al.	2502.18228	null
2025-02-25	From ChatGPT to DeepSeek: Can LLMs Simulate Humanity?	Qian Wang et.al.	2502.18210	null
2025-02-25	LAG: LLM agents for Leaderboard Auto Generation on Demanding	Jian Wu et.al.	2502.18209	null
2025-02-25	Grandes modelos de lenguaje: de la predicción de palabras a la comprensión?	Carlos Gómez-Rodríguez et.al.	2502.18205	null
2025-02-25	Intersubjective Model of AI-mediated Communication: Augmenting Human-Human Text Chat through LLM-based Adaptive Agent Pair	Shutaro Aoyama et.al.	2502.18201	null
2025-02-25	Task-Agnostic Semantic Communication with Multimodal Foundation Models	Jiangjing Hu et.al.	2502.18200	null
2025-02-25	Agnostic calculation of atomic free energies with the descriptor density of states	Thomas D Swinburne et.al.	2502.18191	null
2025-02-25	ChatMotion: A Multimodal Multi-Agent for Human Motion Analysis	Li Lei et.al.	2502.18180	null
2025-02-25	Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs	Gaye Colakoglu et.al.	2502.18179	null
2025-02-25	CLIPure: Purification in Latent Space via CLIP for Adversarially Robust Zero-Shot Classification	Mingkun Zhang et.al.	2502.18176	null
2025-02-25	SECURA: Sigmoid-Enhanced CUR Decomposition with Uninterrupted Retention and Low-Rank Adaptation in Large Language Models	Zhang Yuxuan et.al.	2502.18168	null
2025-02-25	Can LLMs Explain Themselves Counterfactually?	Zahra Dehghanighobadi et.al.	2502.18156	null
2025-02-25	Carbon and Silicon, Coexist or Compete? A Survey on Human-AI Interactions in Agent-based Modeling and Simulation	Ziyue Lin et.al.	2502.18145	null
2025-02-25	LevelRAG: Enhancing Retrieval-Augmented Generation with Multi-hop Logic Planning over Rewriting Augmented Searchers	Zhuocheng Zhang et.al.	2502.18139	null
2025-02-25	Large Language Model Driven Agents for Simulating Echo Chamber Formation	Chenhao Gu et.al.	2502.18138	null
2025-02-25	Inverse Materials Design by Large Language Model-Assisted Generative Framework	Yun Hao et.al.	2502.18127	null
2025-02-25	HyperG: Hypergraph-Enhanced LLMs for Structured Knowledge	Sirui Huang et.al.	2502.18125	null
2025-02-25	Bayesian Optimization for Controlled Image Editing via LLMs	Chengkun Cai et.al.	2502.18116	null
2025-02-25	PromptMID: Modal Invariant Descriptors Based on Diffusion and Vision Foundation Models for Optical-SAR Image Matching	Han Nie et.al.	2502.18104	null
2025-02-25	Detecting Offensive Memes with Social Biases in Singapore Context Using Multimodal Large Language Models	Cao Yuxuan et.al.	2502.18101	link
2025-02-25	Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning	Wenkai Yang et.al.	2502.18080	null
2025-02-25	Examining the Threat Landscape: Foundation Models and Model Stealing	Ankita Raj et.al.	2502.18077	null
2025-02-25	MRBTP: Efficient Multi-Robot Behavior Tree Planning and Collaboration	Yishuai Cai et.al.	2502.18072	null
2025-02-25	Golden Ratio Mixing of Real and Synthetic Data for Stabilizing Generative Model Training	Hengzhi He et.al.	2502.18049	null
2025-02-25	AutoCas: Autoregressive Cascade Predictor in Social Networks via Large Language Models	Yuhao Zheng et.al.	2502.18040	null
2025-02-25	Harnessing Multiple Large Language Models: A Survey on LLM Ensemble	Zhijun Chen et.al.	2502.18036	null
2025-02-25	Detecting Knowledge Boundary of Vision Large Language Models by Sampling-Based Inference	Zhuo Chen et.al.	2502.18023	null
2025-02-25	AfroXLMR-Comet: Multilingual Knowledge Distillation with Attention Matching for Low-Resource languages	Joshua Sakthivel Raju et.al.	2502.18020	null
2025-02-25	NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms	Yashan Wang et.al.	2502.18008	null
2025-02-25	Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning	Xinghao Chen et.al.	2502.18001	null
2025-02-25	Model-Free Adversarial Purification via Coarse-To-Fine Tensor Network Representation	Guang Lin et.al.	2502.17972	null
2025-02-25	LLM Knows Geometry Better than Algebra: Numerical Understanding of LLM-Based Agents in A Trading Arena	Tianmi Ma et.al.	2502.17967	null
2025-02-25	Towards Better Understanding of Program-of-Thought Reasoning in Cross-Lingual and Multilingual Environments	Patomporn Payoungkhamdee et.al.	2502.17956	null
2025-02-25	DeepSeek-R1 Outperforms Gemini 2.0 Pro, OpenAI o1, and o3-mini in Bilingual Complex Ophthalmology Reasoning	Pusheng Xu et.al.	2502.17947	null
2025-02-25	Assessing Large Language Models in Agentic Multilingual National Bias	Qianying Liu et.al.	2502.17945	null
2025-02-25	CaseGen: A Benchmark for Multi-Stage Legal Case Documents Generation	Haitao Li et.al.	2502.17943	null
2025-02-25	Advantage-Guided Distillation for Preference Alignment in Small Language Models	Shiping Gao et.al.	2502.17927	null
2025-02-25	LeanProgress: Guiding Search for Neural Theorem Proving via Proof Progress Prediction	Suozhi Huang et.al.	2502.17925	null
2025-02-25	FACT-AUDIT: An Adaptive Multi-Agent Framework for Dynamic Fact-Checking Evaluation of Large Language Models	Hongzhan Lin et.al.	2502.17924	null
2025-02-25	Towards Sustainable Web Agents: A Plea for Transparency and Dedicated Metrics for Energy Consumption	Lars Krupp et.al.	2502.17903	null
2025-02-25	Knowledge-enhanced Multimodal ECG Representation Learning with Arbitrary-Lead Inputs	Che Liu et.al.	2502.17900	null
2025-02-25	Can Large Language Models Identify Implicit Suicidal Ideation? An Empirical Evaluation	Tong Li et.al.	2502.17899	null
2025-02-25	FetchBot: Object Fetching in Cluttered Shelves via Zero-Shot Sim2Real	Weiheng Liu et.al.	2502.17894	null
2025-02-25	RankCoT: Refining Knowledge for Retrieval-Augmented Generation through Ranking Chain-of-Thoughts	Mingyan Wu et.al.	2502.17888	null
2025-02-25	Science Across Languages: Assessing LLM Multilingual Translation of Scientific Papers	Hannah Calzi Kleidermacher et.al.	2502.17882	null
2025-02-25	EEGM2: An Efficient Mamba-2-Based Self-Supervised Framework for Long-Sequence EEG Modeling	Jiazhen Hong et.al.	2502.17873	null
2025-02-25	ASurvey: Spatiotemporal Consistency in Video Generation	Zhiyu Yin et.al.	2502.17863	null
2025-02-25	HRR: Hierarchical Retrospection Refinement for Generated Image Detection	Peipei Yuan et.al.	2502.17862	null
2025-02-25	LR$^{2}$Bench: Evaluating Long-chain Reflective Reasoning Capabilities of Large Language Models via Constraint Satisfaction Problems	Jianghao Chen et.al.	2502.17848	null
2025-02-25	Quantifying interdisciplinary synergy in higher STEM education	Gahyoun Gim et.al.	2502.17841	null
2025-02-25	A Combinatorial Identities Benchmark for Theorem Proving via Automated Theorem Generation	Beibei Xiong et.al.	2502.17840	null
2025-02-25	TagGAN: A Generative Model for Data Tagging	Muhammad Nawaz et.al.	2502.17836	null
2025-02-25	MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks	Hyeonjeong Ha et.al.	2502.17832	null
2025-02-25	A General Framework to Enhance Fine-tuning-based LLM Unlearning	Jie Ren et.al.	2502.17823	null
2025-02-25	An Overview of Large Language Models for Statisticians	Wenlong Ji et.al.	2502.17814	null
2025-02-25	Can Multimodal LLMs Perform Time Series Anomaly Detection?	Xiongxiao Xu et.al.	2502.17812	null
2025-02-25	URO-Bench: A Comprehensive Benchmark for End-to-End Spoken Dialogue Models	Ruiqi Yan et.al.	2502.17810	null
2025-02-25	DocPuzzle: A Process-Aware Benchmark for Evaluating Realistic Long-Context Reasoning Capabilities	Tianyi Zhuang et.al.	2502.17807	null
2025-02-25	Your Language Model May Think Too Rigidly: Achieving Reasoning Consistency with Symmetry-Enhanced Training	Yihang Yao et.al.	2502.17800	null
2025-02-25	AIR: Complex Instruction Generation via Automatic Iterative Refinement	Wei Liu et.al.	2502.17787	null
2025-02-25	Exploring the Potential of Large Language Models for Estimating the Reading Comprehension Question Difficulty	Yoshee Jain et.al.	2502.17785	null
2025-02-25	Tip of the Tongue Query Elicitation for Simulated Evaluation	Yifan He et.al.	2502.17776	null
2025-02-25	FoREST: Frame of Reference Evaluation in Spatial Reasoning Tasks	Tanawan Premsri et.al.	2502.17775	null
2025-02-25	Uncertainty Quantification for LLM-Based Survey Simulations	Chengpiao Huang et.al.	2502.17773	null
2025-02-25	DeepSeek vs. ChatGPT: A Comparative Study for Scientific Computing and Scientific Machine Learning Tasks	Qile Jiang et.al.	2502.17764	null
2025-02-25	Design and implementation of a distributed security threat detection system integrating federated learning and multimodal LLM	Yuqing Wang et.al.	2502.17763	null
2025-02-25	Detection of LLM-Paraphrased Code and Identification of the Responsible LLM Using Coding Style Features	Shinwoo Park et.al.	2502.17749	null
2025-02-24	LLM Inference Acceleration via Efficient Operation Fusion	Mahsa Salmani et.al.	2502.17728	null
2025-02-24	Can Score-Based Generative Modeling Effectively Handle Medical Image Classification?	Sushmita Sarker et.al.	2502.17727	null
2025-02-24	Spontaneous Giving and Calculated Greed in Language Models	Yuxuan Li et.al.	2502.17720	null
2025-02-24	Mind the Gesture: Evaluating AI Sensitivity to Culturally Offensive Non-Verbal Gestures	Akhila Yerukola et.al.	2502.17710	null
2025-02-24	Fractal Generative Models	Tianhong Li et.al.	2502.17437	link
2025-02-24	Introducing Visual Perception Token into Multimodal Large Language Model	Runpeng Yu et.al.	2502.17425	link
2025-02-24	MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs	Jiarui Zhang et.al.	2502.17422	link
2025-02-24	LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification	Penghui Yang et.al.	2502.17421	link
2025-02-24	The Geometry of Refusal in Large Language Models: Concept Cones and Representational Independence	Tom Wollschläger et.al.	2502.17420	null
2025-02-24	From System 1 to System 2: A Survey of Reasoning Large Language Models	Zhong-Zhi Li et.al.	2502.17419	link
2025-02-24	Reasoning with Latent Thoughts: On the Power of Looped Transformers	Nikunj Saunshi et.al.	2502.17416	null
2025-02-24	COSMOS: A Hybrid Adaptive Optimizer for Memory-Efficient Training of LLMs	Liming Liu et.al.	2502.17410	link
2025-02-24	Large Language Models are Powerful EHR Encoders	Stefan Hegselmann et.al.	2502.17403	null
2025-02-24	What is a Good Question? Utility Estimation with LLM-based Simulations	Dong-Ho Lee et.al.	2502.17383	null
2025-02-24	KV-Edit: Training-Free Image Editing for Precise Background Preservation	Tianrui Zhu et.al.	2502.17363	link
2025-02-24	A Closer Look at TabPFN v2: Strength, Limitation, and Extension	Han-Jia Ye et.al.	2502.17361	null
2025-02-24	RELICT: A Replica Detection Framework for Medical Image Generation	Orhun Utku Aydin et.al.	2502.17360	null
2025-02-24	On Relation-Specific Neurons in Large Language Models	Yihong Liu et.al.	2502.17355	link
2025-02-24	How Scientists Use Large Language Models to Program	Gabrielle O'Brien et.al.	2502.17348	null
2025-02-24	Time series forecasting based on optimized LLM for fault prediction in distribution power grid insulators	João Pedro Matos-Carvalho et.al.	2502.17341	null
2025-02-24	HIPPO: Enhancing the Table Understanding Capability of Large Language Models through Hybrid-Modal Preference Optimization	Zhenghao Liu et.al.	2502.17315	link
2025-02-24	Delta Decompression for MoE-based LLMs Compression	Hao Gu et.al.	2502.17298	link
2025-02-24	Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts	Zhenghao Liu et.al.	2502.17297	null
2025-02-24	Integrating protein sequence embeddings with structure via graph-based deep learning for the prediction of single-residue properties	Kevin Michalewicz et.al.	2502.17294	null
2025-02-24	Capability Instruction Tuning: A New Paradigm for Dynamic LLM Routing	Yi-Kai Zhang et.al.	2502.17282	link
2025-02-24	MonoTODia: Translating Monologue Requests to Task-Oriented Dialogues	Sebastian Steindl et.al.	2502.17268	null
2025-02-24	Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective	Chengyin Xu et.al.	2502.17262	null
2025-02-24	Detecting Benchmark Contamination Through Watermarking	Tom Sander et.al.	2502.17259	null
2025-02-24	REINFORCE Adversarial Attacks on Large Language Models: An Adaptive, Distributional, and Semantic Objective	Simon Geisler et.al.	2502.17254	null
2025-02-24	Alpha-SQL: Zero-Shot Text-to-SQL using Monte Carlo Tree Search	Boyan Li et.al.	2502.17248	null
2025-02-24	Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction	Tianpeng Li et.al.	2502.17239	link
2025-02-24	Making LLMs Reason? The Intermediate Language Problem in Neurosymbolic Approaches	Alexander Beiser et.al.	2502.17216	null
2025-02-24	CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought	Boxuan Zhang et.al.	2502.17214	link
2025-02-24	Order Matters: Investigate the Position Bias in Multi-constraint Instruction Following	Jie Zeng et.al.	2502.17204	link
2025-02-24	IGDA: Interactive Graph Discovery through Large Language Model Agents	Alex Havrilla et.al.	2502.17189	null
2025-02-24	Evaluating Expert Contributions in a MoE LLM for Quiz-Based Tasks	Andrei Chernov et.al.	2502.17187	null
2025-02-24	Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric	Yuming Yang et.al.	2502.17184	link
2025-02-24	Unsupervised Accelerated MRI Reconstruction via Ground-Truth-Free Flow Matching	Xinzhe Luo et.al.	2502.17174	null
2025-02-24	Cheems: A Practical Guidance for Building and Evaluating Chinese Reward Models from Scratch	Xueru Wen et.al.	2502.17173	null
2025-02-24	Logic Haystacks: Probing LLMs Long-Context Logical Reasoning (Without Easily Identifiable Unrelated Padding)	Damien Sileo et.al.	2502.17169	null
2025-02-24	JUREX-4E: Juridical Expert-Annotated Four-Element Knowledge Base for Legal Reasoning	Huanghai Liu et.al.	2502.17166	link
2025-02-24	MEMERAG: A Multilingual End-to-End Meta-Evaluation Benchmark for Retrieval Augmented Generation	María Andrea Cruz Blandón et.al.	2502.17163	null
2025-02-24	Real-time Monitoring of Economic Shocks using Company Websites	Michael Koenig et.al.	2502.17161	null
2025-02-24	A Pragmatic Note on Evaluating Generative Models with Fréchet Inception Distance for Retinal Image Synthesis	Yuli Wu et.al.	2502.17160	null
2025-02-24	Parameter Efficient Merging for Multimodal Large Language Models with Complementary Parameter Adaptation	Fanhu Zeng et.al.	2502.17159	null
2025-02-24	CodeSwift: Accelerating LLM Inference for Efficient Code Generation	Qianhui Zhao et.al.	2502.17139	null
2025-02-24	Evaluating the Effectiveness of Large Language Models in Automated News Article Summarization	Lionel Richy Panlap Houamegni et.al.	2502.17136	null
2025-02-24	Applications of Large Models in Medicine	YunHe Su et.al.	2502.17132	null
2025-02-24	Thus Spake Long-Context Large Language Model	Xiaoran Liu et.al.	2502.17129	null
2025-02-24	Adversarial Training for Defense Against Label Poisoning Attacks	Melis Ilayda Bal et.al.	2502.17121	link
2025-02-24	Diffusion Models for Tabular Data: Challenges, Current Progress, and Future Directions	Zhong Li et.al.	2502.17119	link
2025-02-24	SFLD: Reducing the content bias for AI-generated Image Detection	Seoyeon Gye et.al.	2502.17105	null
2025-02-24	Generative Models in Decision Making: A Survey	Yinchuan Li et.al.	2502.17100	null
2025-02-24	Improved Diffusion-based Generative Model with Better Adversarial Robustness	Zekun Wang et.al.	2502.17099	link
2025-02-24	Conditional Diffusion-Flow models for generating 3D cosmic density fields: applications to f(R) cosmologies	Julieth Katherine Riveros et.al.	2502.17087	null
2025-02-24	Automatically Evaluating the Paper Reviewing Capability of Large Language Models	Hyungyu Shin et.al.	2502.17086	null
2025-02-24	Pleno-Generation: A Scalable Generative Face Video Compression Framework with Bandwidth Intelligence	Bolin Chen et.al.	2502.17085	null
2025-02-24	Systematic Weight Evaluation for Pruning Large Language Models: Enhancing Performance and Sustainability	Ashhadul Islam et.al.	2502.17071	null
2025-02-24	LLM-QE: Improving Query Expansion by Aligning Large Language Models with Ranking Preferences	Sijia Yao et.al.	2502.17057	link
2025-02-24	PrivaCI-Bench: Evaluating Privacy with Contextual Integrity and Legal Compliance	Haoran Li et.al.	2502.17041	null
2025-02-24	Evolution 6.0: Evolving Robotic Capabilities Through Generative Design	Muhammad Haris Khan et.al.	2502.17034	null
2025-02-24	Understanding the Uncertainty of LLM Explanations: A Perspective Based on Reasoning Topology	Longchao Da et.al.	2502.17026	null
2025-02-24	Towards Auto-Regressive Next-Token Prediction: In-Context Learning Emerges from Generalization	Zixuan Gong et.al.	2502.17024	null
2025-02-24	Quantifying Logical Consistency in Transformers via Query-Key Alignment	Eduard Tulchinskii et.al.	2502.17017	null
2025-02-24	Predicting Liquidity-Aware Bond Yields using Causal GANs and Deep Reinforcement Learning with LLM Evaluation	Jaskaran Singh Walia et.al.	2502.17011	null
2025-02-24	Be CIM or Be Memory: A Dual-mode-aware DNN Compiler for CIM Accelerators	Shixin Zhao et.al.	2502.17006	null
2025-02-24	An Enhanced Large Language Model For Cross Modal Query Understanding System Using DL-KeyBERT Based CAZSSCL-MPGPT	Shreya Singh et.al.	2502.17000	null
2025-02-24	Active Learning for Conditional Inverse Design with Crystal Generation and Foundation Atomic Models	Zhuoyuan Li et.al.	2502.16984	null
2025-02-24	LongSafety: Evaluating Long-Context Safety of Large Language Models	Yida Lu et.al.	2502.16971	link
2025-02-24	Autoregressive Image Generation Guided by Chains of Thought	Miaomiao Cai et.al.	2502.16965	null
2025-02-24	Make LLM Inference Affordable to Everyone: Augmenting GPU Memory with NDP-DIMM	Lian Liu et.al.	2502.16963	null
2025-02-24	UrduLLaMA 1.0: Dataset Curation, Preprocessing, and Evaluation in Low-Resource Settings	Layba Fiaz et.al.	2502.16961	null
2025-02-24	Lean and Mean: Decoupled Value Policy Optimization with Global Value Guidance	Chenghua Huang et.al.	2502.16944	null
2025-02-24	Reasoning Does Not Necessarily Improve Role-Playing Ability	Xiachong Feng et.al.	2502.16940	null
2025-02-24	BigMac: A Communication-Efficient Mixture-of-Experts Model Structure for Fast Training and Inference	Zewen Jin et.al.	2502.16927	null
2025-02-24	FilterLLM: Text-To-Distribution LLM for Billion-Scale Cold-Start Recommendation	Ruochen Liu et.al.	2502.16924	null
2025-02-24	A Systematic Survey of Automatic Prompt Optimization Techniques	Kiran Ramnath et.al.	2502.16923	null
2025-02-24	Benchmarking Temporal Reasoning and Alignment Across Chinese Dynasties	Zhenglin Wang et.al.	2502.16922	null
2025-02-24	SS-MPC: A Sequence-Structured Multi-Party Conversation System	Yoonjin Jang et.al.	2502.16920	null
2025-02-24	Multi-Dimensional Quality Assessment for Text-to-3D Assets: Dataset and Model	Kang Fu et.al.	2502.16915	null
2025-02-24	SPARC: Score Prompting and Adaptive Fusion for Zero-Shot Multi-Label Recognition in Vision-Language Models	Kevin Miller et.al.	2502.16911	null
2025-02-24	AutoLogi: Automated Generation of Logic Puzzles for Evaluating Reasoning Abilities of Large Language Models	Qin Zhu et.al.	2502.16906	null
2025-02-24	GuidedBench: Equipping Jailbreak Evaluation with Guidelines	Ruixuan Huang et.al.	2502.16903	null
2025-02-24	Culture-TRIP: Culturally-Aware Text-to-Image Generation with Iterative Prompt Refinment	Suchae Jeong et.al.	2502.16902	null
2025-02-24	Char-mander Use mBackdoor! A Study of Cross-lingual Backdoor Attacks in Multilingual LLMs	Himanshu Beniwal et.al.	2502.16901	link
2025-02-24	Zero-shot Load Forecasting for Integrated Energy Systems: A Large Language Model-based Framework with Multi-task Learning	Jiaheng Li et.al.	2502.16896	null
2025-02-24	Unlocking Scientific Concepts: How Effective Are LLM-Generated Analogies for Student Understanding and Classroom Practice?	Zekai Shao et.al.	2502.16895	null
2025-02-24	Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment	Chenghao Fan et.al.	2502.16894	null
2025-02-24	Applying LLMs to Active Learning: Towards Cost-Efficient Cross-Task Text Classification without Manually Labeled Data	Yejian Zhang et.al.	2502.16892	null
2025-02-24	Unveiling Institution-Specific Bias in Pathology Foundation Models: Detriments, Causes, and Potential Solutions	Weiping Lin et.al.	2502.16889	null
2025-02-24	DBudgetKV: Dynamic Budget in KV Cache Compression for Ensuring Optimal Performance	Xuanfan Ni et.al.	2502.16886	null
2025-02-24	CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter	Yepeng Weng et.al.	2502.16880	null
2025-02-24	A Multi-LLM-Agent-Based Framework for Economic and Public Policy Analysis	Yuzhi Hao et.al.	2502.16879	null
2025-02-24	Graphy'our Data: Towards End-to-End Modeling, Exploring and Generating Report from Raw Data	Longbin Lai et.al.	2502.16868	null
2025-02-24	Leveraging Large Language Models for Effective and Explainable Multi-Agent Credit Assignment	Kartik Nagpal et.al.	2502.16863	null
2025-02-24	LongAttn: Selecting Long-context Training Data via Token-level Attention	Longyun Wu et.al.	2502.16860	null
2025-02-24	Sarang at DEFACTIFY 4.0: Detecting AI-Generated Text Using Noised Data and an Ensemble of DeBERTa Models	Avinash Trivedi et.al.	2502.16857	null
2025-02-24	Improving LLM General Preference Alignment via Optimistic Online Mirror Descent	Yuheng Zhang et.al.	2502.16852	null
2025-02-24	Exploring Causes and Mitigation of Hallucinations in Large Vision Language Models	Yaqi Sun et.al.	2502.16842	null
2025-02-24	Fair Foundation Models for Medical Image Analysis: Challenges and Perspectives	Dilermando Queiroz et.al.	2502.16841	null
2025-02-24	In-context learning of evolving data streams with tabular foundational models	Afonso Lourenço et.al.	2502.16840	null
2025-02-24	"Actionable Help" in Crises: A Novel Dataset and Resource-Efficient Models for Identifying Request and Offer Social Media Posts	Rabindra Lamsal et.al.	2502.16839	null
2025-02-24	REGen: A Reliable Evaluation Framework for Generative Event Argument Extraction	Omar Sharif et.al.	2502.16838	null
2025-02-24	Finding the Sweet Spot: Preference Data Construction for Scaling Preference Optimization	Yao Xiao et.al.	2502.16825	null
2025-02-21	ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval	Guanqi Zhan et.al.	2502.15682	null
2025-02-21	Privacy Ripple Effects from Adding or Removing Personal Information in Language Model Training	Jaydeep Borkar et.al.	2502.15680	null
2025-02-21	FLEKE: Federated Locate-then-Edit Knowledge Editing	Zongkai Zhao et.al.	2502.15677	null
2025-02-21	AutoToM: Automated Bayesian Inverse Planning and Model Discovery for Open-ended Theory of Mind	Zhining Zhang et.al.	2502.15676	null
2025-02-21	VaViM and VaVAM: Autonomous Driving through Video Generative Modeling	Florent Bartoccioni et.al.	2502.15672	link
2025-02-21	Almost AI, Almost Human: The Challenge of Detecting AI-Polished Writing	Shoumik Saha et.al.	2502.15666	null
2025-02-21	Machine-generated text detection prevents language model collapse	George Drayson et.al.	2502.15654	null
2025-02-21	Empowering LLMs with Logical Reasoning: A Comprehensive Survey	Fengxiang Cheng et.al.	2502.15652	null
2025-02-21	Steering into New Embedding Spaces: Analyzing Cross-Lingual Alignment Induced by Model Interventions in Multilingual Language Models	Anirudh Sundar et.al.	2502.15639	null
2025-02-21	Mantis: Lightweight Calibrated Foundation Model for User-Friendly Time Series Classification	Vasilii Feofanov et.al.	2502.15637	null
2025-02-21	The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer	Marthe Ballon et.al.	2502.15631	null
2025-02-21	Probe Pruning: Accelerating LLMs through Dynamic Pruning via Model-Probing	Qi Le et.al.	2502.15618	null
2025-02-21	On the Robustness of Transformers against Context Hijacking for Linear Classification	Tianle Li et.al.	2502.15609	null
2025-02-21	Cross-Format Retrieval-Augmented Generation in XR with LLMs for Context-Aware Maintenance Assistance	Akos Nagy et.al.	2502.15604	null
2025-02-21	Do Multilingual LLMs Think In English?	Lisa Schut et.al.	2502.15603	null
2025-02-21	WorldCraft: Photo-Realistic 3D World Creation and Customization via LLM Agents	Xinhang Liu et.al.	2502.15601	null
2025-02-21	SafeInt: Shielding Large Language Models from Jailbreak Attacks via Safety-Aware Representation Intervention	Jiaqi Wu et.al.	2502.15594	null
2025-02-21	Generalizing From Short to Long: Effective Data Synthesis for Long-Context Instruction Tuning	Wenhao Zhu et.al.	2502.15592	null
2025-02-21	LightThinker: Thinking Step-by-Step Compression	Jintian Zhang et.al.	2502.15589	null
2025-02-21	Chats-Grid: An Iterative Retrieval Q&A Optimization Scheme Leveraging Large Model and Retrieval Enhancement Generation in smart grid	Yunfeng Li et.al.	2502.15583	null
2025-02-21	Fine-tuning foundation models of materials interatomic potentials with frozen transfer learning	Mariia Radova et.al.	2502.15582	null
2025-02-21	Interpreting and Steering LLMs with Mutual Information-based Explanations on Sparse Autoencoders	Xuansheng Wu et.al.	2502.15576	null
2025-02-21	DReSD: Dense Retrieval for Speculative Decoding	Milan Gritta et.al.	2502.15572	null
2025-02-21	A Cautionary Tale About "Neutrally" Informative AI Tools Ahead of the 2025 Federal Elections in Germany	Ina Dormuth et.al.	2502.15568	null
2025-02-21	PIP-KAG: Mitigating Knowledge Conflicts in Knowledge-Augmented Generation via Parametric Pruning	Pengcheng Huang et.al.	2502.15543	null
2025-02-21	Accurate and efficient machine learning interatomic potentials for finite temperature modeling of molecular crystals	Flaviano Della Pia et.al.	2502.15530	null
2025-02-21	Scaling Sparse and Dense Retrieval in Decoder-Only LLMs	Hansi Zeng et.al.	2502.15526	null
2025-02-21	Towards Swift Serverless LLM Cold Starts with ParaServe	Chiheng Lou et.al.	2502.15524	null
2025-02-21	Activation Steering in Neural Theorem Provers	Shashank Kirtania et.al.	2502.15507	null
2025-02-21	Construction and Evaluation of LLM-based agents for Semi-Autonomous penetration testing	Masaya Kobayashi et.al.	2502.15506	null
2025-02-21	Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models	Ya Wang et.al.	2502.15499	null
2025-02-21	Programmers Aren't Obsolete Yet: A Syllabus for Teaching CS Students to Responsibly Use Large Language Models for Code Generation	Bruno Pereira Cipriano et.al.	2502.15493	null
2025-02-21	ExpliCa: Evaluating Explicit Causal Reasoning in Large Language Models	Martina Miliani et.al.	2502.15487	null
2025-02-21	Enhancing RWKV-based Language Models for Long-Sequence Text Generation	Xinghan Pan et.al.	2502.15485	null
2025-02-21	FaultGPT: Industrial Fault Diagnosis Question Answering System by Vision Language Models	Jiao Chen et.al.	2502.15481	null
2025-02-21	PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System	Yintao He et.al.	2502.15470	null
2025-02-21	Mitigating Data Scarcity in Time Series Analysis: A Foundation Model with Series-Symbol Data Generation	Wenxuan Wang et.al.	2502.15466	null
2025-02-21	Memory Helps, but Confabulation Misleads: Understanding Streaming Events in Videos with MLLMs	Gengyuan Zhang et.al.	2502.15457	null
2025-02-21	R-LoRA: Random Initialization of Multi-Head LoRA for Multi-Task Learning	Jinda Liu et.al.	2502.15455	null
2025-02-21	A fast convergence algorithm based on binary integer programming for expert load balancing in MoE LLMs	Yuan Sun et.al.	2502.15451	null
2025-02-21	When Compression Meets Model Compression: Memory-Efficient Double Compression for Large Language Models	Weilan Wang et.al.	2502.15443	null
2025-02-21	On the Effectiveness of Large Language Models in Writing Alloy Formulas	Yang Hong et.al.	2502.15441	null
2025-02-21	Fed-SB: A Silver Bullet for Extreme Communication Efficiency and Performance in (Private) Federated LoRA Fine-Tuning	Raghav Singhal et.al.	2502.15436	link
2025-02-21	Single-pass Detection of Jailbreaking Input in Large Language Models	Leyla Naz Candogan et.al.	2502.15435	null
2025-02-21	Mixup Model Merge: Enhancing Model Merging Performance through Randomized Linear Interpolation	Yue Zhou et.al.	2502.15434	null
2025-02-21	Pub-Guard-LLM: Detecting Fraudulent Biomedical Articles with Reliable Explanations	Lihu Chen et.al.	2502.15429	null
2025-02-21	Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs	Giulio Zizzo et.al.	2502.15427	null
2025-02-21	Beyond Translation: LLM-Based Data Generation for Multilingual Fact-Checking	Yi-Ling Chung et.al.	2502.15419	null
2025-02-21	MHQA: A Diverse, Knowledge Intensive Mental Health Question Answering Challenge for Language Models	Suraj Racha et.al.	2502.15418	null
2025-02-21	HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings	Rasmus Aavang et.al.	2502.15411	null
2025-02-21	Problem-Solving Logic Guided Curriculum In-Context Learning for LLMs Complex Reasoning	Xuetao Ma et.al.	2502.15401	null
2025-02-21	Beyond Tools: Understanding How Heavy Users Integrate LLMs into Everyday Tasks and Decision-Making	Eunhye Kim et.al.	2502.15395	null
2025-02-21	Chitrarth: Bridging Vision and Language for a Billion People	Shaharukh Khan et.al.	2502.15392	null
2025-02-21	MOVE: A Mixture-of-Vision-Encoders Approach for Domain-Focused Vision-Language Processing	Matvey Skripkin et.al.	2502.15381	null
2025-02-21	Weakly Supervised Video Scene Graph Generation via Natural Language Supervision	Kibum Kim et.al.	2502.15370	null
2025-02-21	Identifying Features that Shape Perceived Consciousness in Large Language Model-based AI: A Quantitative Study of Human Responses	Kang Bongsu et.al.	2502.15365	null
2025-02-21	Evaluating Social Biases in LLM Reasoning	Xuyang Wu et.al.	2502.15361	null
2025-02-21	ARS: Automatic Routing Solver with Large Language Models	Kai Li et.al.	2502.15359	null
2025-02-21	AttentionEngine: A Versatile Framework for Efficient Attention Mechanisms on Diverse Hardware Platforms	Feiyang Chen et.al.	2502.15349	null
2025-02-21	Constructing a Norm for Children's Scientific Drawing: Distribution Features Based on Semantic Similarity of Large Language Models	Yi Zhang et.al.	2502.15348	null
2025-02-21	Efficiently Solving Discounted MDPs with Predictions on Transition Matrices	Lixing Lyu et.al.	2502.15345	null
2025-02-21	Exploring Embodied Multimodal Large Models: Development, Datasets, and Future Directions	Shoubin Chen et.al.	2502.15336	null
2025-02-21	Stepwise Informativeness Search for Improving LLM Reasoning	Siyuan Wang et.al.	2502.15335	null
2025-02-21	Attention Eclipse: Manipulating Attention to Bypass LLM Safety-Alignment	Pedram Zaree et.al.	2502.15334	null
2025-02-21	Detecting Future-related Contexts of Entity Mentions	Puneet Prashar et.al.	2502.15332	null
2025-02-21	DynamicGSG: Dynamic 3D Gaussian Scene Graphs for Environment Adaptation	Luzhou Ge et.al.	2502.15309	link
2025-02-21	SVDq: 1.25-bit and 410x Key Cache Compression for LLM Attention	Hong Yankun et.al.	2502.15304	null
2025-02-21	Round Attention: A Novel Round-Level Attention Mechanism to Accelerate LLM Inference	Yaohua Tang et.al.	2502.15294	null
2025-02-21	Bridging Bug Localization and Issue Fixing: A Hierarchical Localization Framework Leveraging Large Language Models	Jianming Chang et.al.	2502.15292	null
2025-02-21	BundleFlow: Deep Menus for Combinatorial Auctions by Diffusion-Based Optimization	Tonghan Wang et.al.	2502.15283	null
2025-02-21	A Training-free LLM-based Approach to General Chinese Character Error Correction	Houquan Zhou et.al.	2502.15266	null
2025-02-21	Retrieval-Augmented Speech Recognition Approach for Domain Challenges	Peng Shen et.al.	2502.15264	null
2025-02-21	LightMamba: Efficient Mamba Acceleration on FPGA with Quantization and Hardware Co-design	Renjie Wei et.al.	2502.15260	null
2025-02-21	An approach for API synthesis using large language models	Hua Zhong et.al.	2502.15246	null
2025-02-21	Comparative Analysis of Large Language Models for Context-Aware Code Completion using SAFIM Framework	Hang Zhang et.al.	2502.15243	null
2025-02-21	From Documents to Dialogue: Building KG-RAG Enhanced AI Assistants	Manisha Mukherjee et.al.	2502.15237	null
2025-02-21	A General Pseudonymization Framework for Cloud-Based LLMs: Replacing Privacy Information in Controlled Text Generation	Shilong Hou et.al.	2502.15233	null
2025-02-21	User Experience with LLM-powered Conversational Recommendation Systems: A Case of Music Recommendation	Sojeong Yun et.al.	2502.15229	null
2025-02-21	Understand User Opinions of Large Language Models via LLM-Powered In-the-Moment User Experience Interviews	Mengqiao Liu et.al.	2502.15226	null
2025-02-21	Auto-Bench: An Automated Benchmark for Scientific Discovery in LLMs	Tingting Chen et.al.	2502.15224	null
2025-02-21	FormalSpecCpp: A Dataset of C++ Formal Specifications created using LLMs	Madhurima Chakraborty et.al.	2502.15217	link
2025-02-21	The Evolving Landscape of LLM- and VLM-Integrated Reinforcement Learning	Sheila Schoepp et.al.	2502.15214	null
2025-02-21	Unveiling Attractor Cycles in Large Language Models: A Dynamical Systems View of Successive Paraphrasing	Zhilin Wang et.al.	2502.15208	null
2025-02-21	Lung-DDPM: Semantic Layout-guided Diffusion Models for Thoracic CT Image Synthesis	Yifan Jiang et.al.	2502.15204	null
2025-02-21	TETRIS: Optimal Draft Token Selection for Batch Speculative Decoding	Zhaoxuan Wu et.al.	2502.15197	null
2025-02-21	LEDD: Large Language Model-Empowered Data Discovery in Data Lakes	Qi An et.al.	2502.15182	null
2025-02-21	Enhancing Speech Large Language Models with Prompt-Aware Mixture of Audio Encoders	Weiqiao Shan et.al.	2502.15178	null
2025-02-21	Methods and Trends in Detecting Generated Images: A Comprehensive Review	Arpan Mahara et.al.	2502.15176	null
2025-02-21	M3-AGIQA: Multimodal, Multi-Round, Multi-Aspect AI-Generated Image Quality Assessment	Chuan Cui et.al.	2502.15167	null
2025-02-21	Extreme Speech Classification in the Era of LLMs: Exploring Open-Source and Proprietary Models	Sarthak Mahajan et.al.	2502.15155	null
2025-02-21	Investigating the Adaptive Robustness with Knowledge Conflicts in LLM-based Multi-Agent Systems	Tianjie Ju et.al.	2502.15153	null
2025-02-21	Do LLMs Make Mistakes Like Students? Exploring Natural Alignment between Language Models and Human Error Patterns	Naiming Liu et.al.	2502.15140	null
2025-02-21	Chain-of-Rank: Enhancing Large Language Models for Domain-Specific RAG in Edge Device	Juntae Lee et.al.	2502.15134	null
2025-02-21	TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba	Xiuwei Chen et.al.	2502.15130	null
2025-02-20	LUME: LLM Unlearning with Multitask Evaluations	Anil Ramakrishna et.al.	2502.15097	null
2025-02-20	Detecting Student Intent for Chat-Based Intelligent Tutoring Systems	Ella Cutler et.al.	2502.15096	null
2025-02-20	Judging It, Washing It: Scoring and Greenwashing Corporate Climate Disclosures using Large Language Models	Marianne Chuang et.al.	2502.15094	null
2025-02-20	Optimizing Singular Spectrum for Large Language Model Compression	Dengjie Li et.al.	2502.15092	null
2025-02-20	Analyze the Neurons, not the Embeddings: Understanding When and Where LLM Representations Align with Humans	Masha Fedzechkina et.al.	2502.15090	null
2025-02-20	Is Safety Standard Same for Everyone? User-Specific Safety Evaluation of Large Language Models	Yeonjun In et.al.	2502.15086	null
2025-02-20	LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention	Shang Yang et.al.	2502.14866	link
2025-02-20	Aligning LLMs to Ask Good Questions A Case Study in Clinical Reasoning	Shuyue Stella Li et.al.	2502.14860	link
2025-02-20	FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling	Weilin Zhao et.al.	2502.14856	null
2025-02-20	Prompt-to-Leaderboard	Evan Frick et.al.	2502.14855	link
2025-02-20	GATE: Graph-based Adaptive Tool Evolution Across Diverse Tasks	Jianwen Luo et.al.	2502.14848	null
2025-02-20	Red-Teaming LLM Multi-Agent Systems via Communication Attacks	Pengfei He et.al.	2502.14847	null
2025-02-20	Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation	Yue Yang et.al.	2502.14846	null
2025-02-20	Revealing and Mitigating Over-Attention in Knowledge Editing	Pinzheng Wang et.al.	2502.14838	link
2025-02-20	Middle-Layer Representation Alignment for Cross-Lingual Transfer in Fine-Tuned LLMs	Danni Liu et.al.	2502.14830	link
2025-02-20	A Survey of Model Architectures in Information Retrieval	Zhichao Xu et.al.	2502.14822	null
2025-02-20	eC-Tab2Text: Aspect-Based Text Generation from e-Commerce Product Tables	Luis Antonio Gutiérrez Guanilo et.al.	2502.14820	null
2025-02-20	Dynamic Low-Rank Sparse Adaptation for Large Language Models	Weizhong Huang et.al.	2502.14816	null
2025-02-20	FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis	Fadillah Maani et.al.	2502.14807	link
2025-02-20	From RAG to Memory: Non-Parametric Continual Learning for Large Language Models	Bernal Jiménez Gutiérrez et.al.	2502.14802	link
2025-02-20	A Multi-Agent Perspective on Modern Information Retrieval	Haya Nachimovsky et.al.	2502.14796	null
2025-02-20	Rapid Word Learning Through Meta In-Context Learning	Wentao Wang et.al.	2502.14791	null
2025-02-20	DC-ControlNet: Decoupling Inter- and Intra-Element Conditions in Image Generation with Diffusion Models	Hongji Yang et.al.	2502.14779	null
2025-02-20	SurveyX: Academic Survey Automation via Large Language Models	Xun Liang et.al.	2502.14776	null
2025-02-20	Determining Layer-wise Sparsity for Large Language Models Through a Theoretical Perspective	Weizhong Huang et.al.	2502.14770	null
2025-02-20	Tree-of-Debate: Multi-Persona Debate Trees Elicit Critical Thinking for Scientific Comparative Analysis	Priyanka Kargupta et.al.	2502.14767	link
2025-02-20	EquivaMap: Leveraging LLMs for Automatic Equivalence Checking of Optimization Formulations	Haotian Zhai et.al.	2502.14760	link
2025-02-20	On the Influence of Context Size and Model Choice in Retrieval-Augmented Generation Systems	Juraj Vladika et.al.	2502.14759	null
2025-02-20	TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators	Jianling Li et.al.	2502.14752	link
2025-02-20	Large Language Models Struggle to Describe the Haystack without Human Help: Human-in-the-loop Evaluation of LLMs	Zongxia Li et.al.	2502.14748	null
2025-02-20	Multi-Agent Coordination across Diverse Applications: A Survey	Lijun Sun et.al.	2502.14743	null
2025-02-20	SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines	M-A-P Team et.al.	2502.14739	null
2025-02-20	EAGER-LLM: Enhancing Large Language Models as Recommenders through Exogenous Behavior-Semantic Integration	Minjie Hong et.al.	2502.14735	null
2025-02-20	WavRAG: Audio-Integrated Retrieval Augmented Generation for Spoken Dialogue Models	Yifu Chen et.al.	2502.14727	null
2025-02-20	I-MCTS: Enhancing Agentic AutoML via Introspective Monte Carlo Tree Search	Zujie Liang et.al.	2502.14693	null
2025-02-20	Bridging the Gap: Transforming Natural Language Questions into SQL Queries via Abstract Query Pattern and Contextual Schema Markup	Yonghui Kong et.al.	2502.14682	null
2025-02-20	How to Get Your LLM to Generate Challenging Problems for Evaluation	Arkil Patel et.al.	2502.14678	link
2025-02-20	Data-Constrained Synthesis of Training Data for De-Identification	Thomas Vakili et.al.	2502.14677	null
2025-02-20	Explanations of Deep Language Models Explain Language Representations in the Brain	Maryam Rahimi et.al.	2502.14671	null
2025-02-20	AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO	Alan Dao et.al.	2502.14669	null
2025-02-20	Beyond the Surface: Uncovering Implicit Locations with LLMs for Personalized Local News	Gali Katz et.al.	2502.14660	null
2025-02-20	Edit Once, Update Everywhere: A Simple Framework for Cross-Lingual Knowledge Synchronization in LLMs	Yuchen Wu et.al.	2502.14645	null
2025-02-20	LIFT: Improving Long Context Understanding of Large Language Models through Long Input Fine-Tuning	Yansheng Mao et.al.	2502.14644	null
2025-02-20	Length-Controlled Margin-Based Preference Optimization without Reference Model	Gengxu Li et.al.	2502.14643	link
2025-02-20	ReQFlow: Rectified Quaternion Flow for Efficient and High-Quality Protein Backbone Generation	Angxiao Yue et.al.	2502.14637	link
2025-02-20	CER: Confidence Enhanced Reasoning in LLMs	Ali Razghandi et.al.	2502.14634	link
2025-02-20	Augmenting Coaching with GenAI: Insights into Use, Effectiveness, and Future Potential	Jennifer Haase et.al.	2502.14632	null
2025-02-20	Synergistic Fusion of Multi-Source Knowledge via Evidence Theory for High-Entropy Alloy Discovery	Minh-Quyet Ha et.al.	2502.14631	null
2025-02-20	PEARL: Towards Permutation-Resilient LLMs	Liang Chen et.al.	2502.14628	link
2025-02-20	Reward Models Identify Consistency, Not Causality	Yuhui Xu et.al.	2502.14619	null
2025-02-20	Serving Models, Fast and Slow: Optimizing Heterogeneous LLM Inferencing Workloads at Scale	Shashwat Jaiswal et.al.	2502.14617	null
2025-02-20	FIND: Fine-grained Information Density Guided Adaptive Retrieval-Augmented Generation for Disease Diagnosis	Mingyi Jia et.al.	2502.14614	null
2025-02-20	Behavioral Analysis of Information Salience in Large Language Models	Jan Trienes et.al.	2502.14613	link
2025-02-20	"Don't Forget the Teachers": Towards an Educator-Centered Understanding of Harms from Large Language Models in Education	Emma Harvey et.al.	2502.14592	null
2025-02-20	Vision Foundation Models in Medical Image Analysis: Advances and Challenges	Pengchen Liang et.al.	2502.14584	null
2025-02-20	A Theory for Conditional Generative Modeling on Multiple Data Sources	Rongzhen Wang et.al.	2502.14583	link
2025-02-20	ReVISE: Learning to Refine at Test-Time via Intrinsic Self-Verification	Hyunseok Lee et.al.	2502.14565	null
2025-02-20	Plan-over-Graph: Towards Parallelable LLM Agent Schedule	Shiqi Zhang et.al.	2502.14563	link
2025-02-20	Can LLMs Predict Citation Intent? An Experimental Analysis of In-context Learning and Fine-tuning on Open LLMs	Paris Koloveas et.al.	2502.14561	link
2025-02-20	Less is More: Improving LLM Alignment via Preference Data Selection	Xun Deng et.al.	2502.14560	null
2025-02-20	Multiscale Byte Language Models -- A Hierarchical Architecture for Causal Million-Length Sequence Modeling	Eric Egli et.al.	2502.14553	link
2025-02-20	Position: Graph Learning Will Lose Relevance Due To Poor Benchmarks	Maya Bechler-Speicher et.al.	2502.14546	null
2025-02-20	LLM-based User Profile Management for Recommender System	Seunghwan Bang et.al.	2502.14541	null
2025-02-20	LoRA-GGPO: Mitigating Double Descent in LoRA Fine-Tuning via Gradient-Guided Perturbation Optimization	Yupeng Chang et.al.	2502.14538	link
2025-02-20	CORBA: Contagious Recursive Blocking Attacks on Multi-Agent Systems Based on Large Language Models	Zhenhong Zhou et.al.	2502.14529	link
2025-02-20	Generative adversarial networks vs large language models: a comparative study on synthetic tabular data generation	Austin A. Barr et.al.	2502.14523	link
2025-02-20	Can LLMs Simulate L2-English Dialogue? An Information-Theoretic Analysis of L1-Dependent Biases	Rena Gao et.al.	2502.14507	link
2025-02-20	How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?	Sergey Pletenev et.al.	2502.14502	link
2025-02-20	MLGym: A New Framework and Benchmark for Advancing AI Research Agents	Deepak Nathani et.al.	2502.14499	null
2025-02-20	StructFlowBench: A Structured Flow Benchmark for Multi-turn Instruction Following	Jinnan Li et.al.	2502.14494	link
2025-02-20	How Jailbreak Defenses Work and Ensemble? A Mechanistic Investigation	Zhuohang Long et.al.	2502.14486	null
2025-02-20	NLoRA: Nyström-Initiated Low-Rank Adaptation for Large Language Models	Chenlu Guo et.al.	2502.14482	link
2025-02-20	Unshackling Context Length: An Efficient Selective Attention Approach through Query-Key Compression	Haoyu Wang et.al.	2502.14477	null
2025-02-20	Argument-Based Comparative Question Answering Evaluation Benchmark	Irina Nikishina et.al.	2502.14476	null
2025-02-20	Enhancing Smart Environments with Context-Aware Chatbots using Large Language Models	Aurora Polo-Rodríguez et.al.	2502.14469	null
2025-02-20	Narrative-Driven Travel Planning: Geoculturally-Grounded Script Generation with Evolutionary Itinerary Optimization	Ran Ding et.al.	2502.14456	link
2025-02-20	Optimal word order for non-causal text generation with Large Language Models: the Spanish case	Andrea Busto-Castiñeira et.al.	2502.14451	null
2025-02-20	LLM4FaaS: No-Code Application Development using LLMs and FaaS	Minghe Wang et.al.	2502.14450	null
2025-02-20	PredictaBoard: Benchmarking LLM Score Predictability	Lorenzo Pacchiardi et.al.	2502.14445	link
2025-02-20	Token-Level Density-Based Uncertainty Quantification Methods for Eliciting Truthfulness of Large Language Models	Artem Vazhentsev et.al.	2502.14427	link
2025-02-20	A Survey on Data Contamination for Large Language Models	Yuxing Cheng et.al.	2502.14425	link
2025-02-20	ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model	Zhongyi Zhou et.al.	2502.14420	null
2025-02-20	Towards Efficient Automatic Self-Pruning of Large Language Models	Weizhong Huang et.al.	2502.14413	null
2025-02-20	Evaluating Precise Geolocation Inference Capabilities of Vision Language Models	Neel Jay et.al.	2502.14412	link
2025-02-20	Unstructured Evidence Attribution for Long Context Query Focused Summarization	Dustin Wright et.al.	2502.14409	null
2025-02-20	HPS: Hard Preference Sampling for Human Preference Alignment	Xiandong Zou et.al.	2502.14400	null
2025-02-20	Enhancing Portuguese Variety Identification with Cross-Domain Approaches	Hugo Sousa et.al.	2502.14394	null
2025-02-20	Leveraging Small LLMs for Argument Mining in Education: Argument Component Identification, Classification, and Assessment	Lucile Favero et.al.	2502.14389	null
2025-02-20	S*: Test Time Scaling for Code Generation	Dacheng Li et.al.	2502.14382	link
2025-02-20	PPO-MI: Efficient Black-Box Model Inversion via Proximal Policy Optimization	Xinpeng Shou et.al.	2502.14370	null
2025-02-20	Entropy-UID: A Method for Optimizing Information Density	Xinpeng Shou et.al.	2502.14366	null
2025-02-20	Retrieval-Augmented Process Reward Model for Generalizable Mathematical Reasoning	Jiachen Zhu et.al.	2502.14361	null
2025-02-20	SR-LLM: Rethinking the Structured Representation in Large Language Model	Jiahuan Zhang et.al.	2502.14352	null
2025-02-20	SegAnyPET: Universal Promptable Segmentation from Positron Emission Tomography Images	Yichi Zhang et.al.	2502.14351	null
2025-02-20	FlowAgent: Achieving Compliance and Flexibility for Workflow Agents	Yuchen Shi et.al.	2502.14345	link
2025-02-20	Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective	Ruichen Shao et.al.	2502.14340	null
2025-02-20	A Survey on Feedback-based Multi-step Reasoning for Large Language Models on Mathematics	Ting-Ruen Wei et.al.	2502.14333	null
2025-02-20	SolSearch: An LLM-Driven Framework for Efficient SAT-Solving Code Generation	Junjie Sheng et.al.	2502.14328	null
2025-02-20	ChemHTS: Hierarchical Tool Stacking for Enhancing Chemical Agents	Zhucong Li et.al.	2502.14327	link
2025-02-20	Beyond Self-Talk: A Communication-Centric Survey of LLM-Based Multi-Agent Systems	Bingyu Yan et.al.	2502.14321	null
2025-02-20	Line Goes Up? Inherent Limitations of Benchmarks for Evaluating Large Language Models	James Fodor et.al.	2502.14318	null
2025-02-20	ParallelComp: Parallel Long-Context Compressor for Length Extrapolation	Jing Xiong et.al.	2502.14317	null
2025-02-20	Unveiling Cultural Blind Spots: Analyzing the Limitations of mLLMs in Procedural Text Comprehension	Amir Hossein Yari et.al.	2502.14315	null
2025-02-20	Efficient AI in Practice: Training and Deployment of Efficient LLMs for Industry Applications	Kayhan Behdin et.al.	2502.14305	null
2025-02-20	MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models	Shrey Pandit et.al.	2502.14302	null
2025-02-20	SEA-HELM: Southeast Asian Holistic Evaluation of Language Models	Yosephine Susanto et.al.	2502.14301	null
2025-02-19	Where's the Bug? Attention Probing for Scalable Fault Localization	Adam Stein et.al.	2502.13966	null
2025-02-19	Autellix: An Efficient Serving Engine for LLM Agents as General Programs	Michael Luo et.al.	2502.13965	null
2025-02-19	MuDAF: Long-Context Multi-Document Attention Focusing through Contrastive Learning on Attention Heads	Weihao Liu et.al.	2502.13963	link
2025-02-19	Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering	William Jurayj et.al.	2502.13962	null
2025-02-19	LIDDIA: Language-based Intelligent Drug Discovery Agent	Reza Averly et.al.	2502.13959	null
2025-02-19	Neurosymbolic artificial intelligence via large language models and coherence-driven inference	Steve Huntsman et.al.	2502.13953	null
2025-02-19	Why Safeguarded Ships Run Aground? Aligned Large Language Models' Safety Mechanisms Tend to Be Anchored in The Template Region	Chak Tou Leong et.al.	2502.13946	null
2025-02-19	Image compositing is all you need for data augmentation	Ang Jia Ning Shermaine et.al.	2502.13936	null
2025-02-19	LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization	Guanzheng Chen et.al.	2502.13922	link
2025-02-19	Exploring Code Language Models for Automated HLS-based Hardware Generation: Benchmark, Infrastructure and Analysis	Jiahao Gai et.al.	2502.13921	null
2025-02-19	Exploring Personalized Health Support through Data-Driven, Theory-Guided LLMs: A Case Study in Sleep Health	Xingbo Wang et.al.	2502.13920	null
2025-02-19	How Do LLMs Perform Two-Hop Reasoning in Context?	Tianyu Guo et.al.	2502.13913	null
2025-02-19	Lost in Sequence: Do Large Language Models Understand Sequential Recommendation?	Sein Kim et.al.	2502.13909	link
2025-02-19	Judging the Judges: A Collection of LLM-Generated Relevance Judgements	Hossein A. Rahmani et.al.	2502.13908	link
2025-02-19	DataSciBench: An LLM Agent Benchmark for Data Science	Dan Zhang et.al.	2502.13897	link
2025-02-19	NavigateDiff: Visual Predictors are Zero-Shot Navigation Assistants	Yiran Qin et.al.	2502.13894	null
2025-02-19	Refining embeddings with fill-tuning: data-efficient generalised performance improvements for materials foundation models	Matthew P. Wilson et.al.	2502.13886	link
2025-02-19	SPEX: Scaling Feature Interaction Explanations for LLMs	Justin Singh Kang et.al.	2502.13870	link
2025-02-19	MagicGeo: Training-Free Text-Guided Geometric Diagram Generation	Junxiao Wang et.al.	2502.13855	null
2025-02-19	Enhancing LLM-Based Recommendations Through Personalized Reasoning	Jiahao Liu et.al.	2502.13845	null
2025-02-19	Enhancing Cross-Domain Recommendations with Memory-Optimized LLM-Based User Agents	Jiahao Liu et.al.	2502.13843	null
2025-02-19	Inner Thinking Transformer: Leveraging Dynamic Depth Scaling to Foster Adaptive Internal Thinking	Yilong Chen et.al.	2502.13842	null
2025-02-19	Quantifying Memorization and Retriever Performance in Retrieval-Augmented Vision-Language Models	Peter Carragher et.al.	2502.13836	null
2025-02-19	Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning	Zenan Li et.al.	2502.13834	null
2025-02-19	ArtMentor: AI-Assisted Evaluation of Artworks to Explore Multimodal Large Language Models Capabilities	Chanjin Zheng et.al.	2502.13832	link
2025-02-19	LESA: Learnable LLM Layer Scaling-Up	Yifei Yang et.al.	2502.13794	link
2025-02-19	From Tools to Teammates: Evaluating LLMs in Multi-Session Coding Interactions	Nathanaël Carraz Rakotonirina et.al.	2502.13791	link
2025-02-19	From Correctness to Comprehension: AI Agents for Personalized Error Diagnosis in Education	Yi-Fan Zhang et.al.	2502.13789	null
2025-02-19	Helix-mRNA: A Hybrid Foundation Model For Full Sequence mRNA Therapeutics	Matthew Wood et.al.	2502.13785	link
2025-02-19	Generative Large Recommendation Models: Emerging Trends in LLMs for Recommendation	Hao Wang et.al.	2502.13783	null
2025-02-19	Translation in the Hands of Many: Centering Lay Users in Machine Translation Interactions	Beatrice Savoldi et.al.	2502.13780	null
2025-02-19	VITAL: A New Dataset for Benchmarking Pluralistic Alignment in Healthcare	Anudeex Shetty et.al.	2502.13775	null
2025-02-19	AI Software Engineer: Programming with Trust	Abhik Roychoudhury et.al.	2502.13767	null
2025-02-19	SCALAR: Scientific Citation-based Live Assessment of Long-context Academic Reasoning	Renxi Wang et.al.	2502.13753	null
2025-02-19	Reverse Markov Learning: Multi-Step Generative Models for Complex Distributions	Xinwei Shen et.al.	2502.13747	null
2025-02-19	Enhancing Input-Label Mapping in In-Context Learning with Contrastive Decoding	Keqin Peng et.al.	2502.13738	null
2025-02-19	CARE: Confidence-Aware Regression Estimation of building density fine-tuning EO Foundation Models	Nikolaos Dionelis et.al.	2502.13734	null
2025-02-19	Adapting Large Language Models for Time Series Modeling via a Novel Parameter-efficient Adaptation Method	Juyuan Zhang et.al.	2502.13725	null
2025-02-19	Direct Value Optimization: Improving Chain-of-Thought Reasoning in LLMs with Refined Values	Hongbo Zhang et.al.	2502.13723	null
2025-02-19	TALKPLAY: Multimodal Music Recommendation with Large Language Models	Seungheon Doh et.al.	2502.13713	null
2025-02-19	Is This Collection Worth My LLM's Time? Automatically Measuring Information Potential in Text Corpora	Tristan Karch et.al.	2502.13691	null
2025-02-19	An LLM-based Agent for Reliable Docker Environment Configuration	Ruida Hu et.al.	2502.13681	null
2025-02-19	SCOPE: A Self-supervised Framework for Improving Faithfulness in Conditional Text Generation	Song Duong et.al.	2502.13674	null
2025-02-19	Refining Sentence Embedding Model through Ranking Sentences Generation with Large Language Models	Liyang He et.al.	2502.13656	link
2025-02-19	C2T: A Classifier-Based Tree Construction Method in Speculative Decoding	Feiye Huo et.al.	2502.13652	null
2025-02-19	Reliability Across Parametric and External Knowledge: Understanding Knowledge Handling in LLMs	Youna Kim et.al.	2502.13648	null
2025-02-19	D.Va: Validate Your Demonstration First Before You Use It	Qi Zhang et.al.	2502.13646	null
2025-02-19	Qorgau: Evaluating LLM Safety in Kazakh-Russian Bilingual Contexts	Maiya Goloburda et.al.	2502.13640	null
2025-02-19	Concept Layers: Enhancing Interpretability and Intervenability via LLM Conceptualization	Or Raphael Bidusa et.al.	2502.13632	null
2025-02-19	AI-Empowered Catalyst Discovery: A Survey from Classical Machine Learning Approaches to Large Language Models	Yuanyuan Xu et.al.	2502.13626	null
2025-02-19	REFIND: Retrieval-Augmented Factuality Hallucination Detection in Large Language Models	DongGeon Lee et.al.	2502.13622	null
2025-02-19	Complex Ontology Matching with Large Language Model Embeddings	Guilherme Sousa et.al.	2502.13619	null
2025-02-19	LaVCa: LLM-assisted Visual Cortex Captioning	Takuya Matsuyama et.al.	2502.13606	null
2025-02-19	BeamLoRA: Beam-Constraint Low-Rank Adaptation	Naibin Gu et.al.	2502.13604	null
2025-02-19	MMTEB: Massive Multilingual Text Embedding Benchmark	Kenneth Enevoldsen et.al.	2502.13595	null
2025-02-19	Don't Stop the Multi-Party! On Generating Synthetic Multi-Party Conversations with Constraints	Nicolò Penzo et.al.	2502.13592	null
2025-02-19	Unraveling the Localized Latents: Learning Stratified Manifold Structures in LLM Embedding Space with Sparse Mixture-of-Experts	Xin Li et.al.	2502.13577	null
2025-02-19	LSR-Adapt: Ultra-Efficient Parameter Tuning with Matrix Low Separation Rank Kernel Adaptation	Xin Li et.al.	2502.13568	null
2025-02-19	Extracting Social Connections from Finnish Karelian Refugee Interviews Using LLMs	Joonatan Laato et.al.	2502.13566	null
2025-02-19	PRIV-QA: Privacy-Preserving Question Answering for Cloud Large Language Models	Guangwei Li et.al.	2502.13564	link
2025-02-19	Are Large Language Models In-Context Graph Learners?	Jintang Li et.al.	2502.13562	null
2025-02-19	Democratizing Large Language Model-Based Graph Data Augmentation via Latent Knowledge Graphs	Yushi Feng et.al.	2502.13555	link
2025-02-19	STaR-SQL: Self-Taught Reasoner for Text-to-SQL	Mingqian He et.al.	2502.13550	null
2025-02-19	Detecting Linguistic Bias in Government Documents Using Large language Models	Milena de Swart et.al.	2502.13548	null
2025-02-19	From Sub-Ability Diagnosis to Human-Aligned Generation: Bridging the Gap for Text Length Control via MARKERGEN	Peiwen Yuan et.al.	2502.13544	null
2025-02-19	Activation-aware Probe-Query: Effective Key-Value Retrieval for Long-Context LLMs Inference	Qingfa Xiao et.al.	2502.13542	null
2025-02-19	Bursting Filter Bubble: Enhancing Serendipity Recommendations with Aligned Large Language Models	Yunjia Xi et.al.	2502.13539	null
2025-02-19	Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models	Jun Zhang et.al.	2502.13533	link
2025-02-19	Exploiting Prefix-Tree in Structured Output Interfaces for Enhancing Jailbreak Attacking	Yanzeng Li et.al.	2502.13527	link
2025-02-19	SPPD: Self-training with Process Preference Learning Using Dynamic Value Margin	Hao Yi et.al.	2502.13516	null
2025-02-19	Unlocking Multimodal Integration in EHRs: A Prompt Learning Framework for Language and Time Series Fusion	Shuai Niu et.al.	2502.13509	null
2025-02-19	Reproducing NevIR: Negation in Neural Information Retrieval	Coen van Elsen et.al.	2502.13506	link
2025-02-19	PLDR-LLMs Learn A Generalizable Tensor Operator That Can Replace Its Own Deep Neural Net At Inference	Burc Gokden et.al.	2502.13502	link
2025-02-19	Towards Geo-Culturally Grounded LLM Generations	Piyawat Lertvittayakumjorn et.al.	2502.13497	null
2025-02-19	What are Models Thinking about? Understanding Large Language Model Hallucinations "Psychology" through Model Inner State Analysis	Peiran Wang et.al.	2502.13490	null
2025-02-19	LLM4Tag: Automatic Tagging System for Information Retrieval via Large Language Models	Ruiming Tang et.al.	2502.13481	null
2025-02-19	Integration of Agentic AI with 6G Networks for Mission-Critical Applications: Use-case and Challenges	Sunder Ali Khowaja et.al.	2502.13476	null
2025-02-19	LLM should think and action as a human	Haun Leung et.al.	2502.13475	null
2025-02-19	Towards Lightweight, Adaptive and Attribute-Aware Multi-Aspect Controllable Text Generation with Large Language Models	Chenyu Zhu et.al.	2502.13474	null
2025-02-19	ThinkGuard: Deliberative Slow Thinking Leads to Cautious Guardrails	Xiaofei Wen et.al.	2502.13458	link
2025-02-19	Interleaved Gibbs Diffusion for Constrained Generation	Gautham Govind Anil et.al.	2502.13450	null
2025-02-19	Enhancing Chest X-ray Classification through Knowledge Injection in Cross-Modality Learning	Yang Yan et.al.	2502.13447	null
2025-02-19	TreeCut: A Synthetic Unanswerable Math Word Problem Dataset for LLM Hallucination Evaluation	Jialin Ouyang et.al.	2502.13442	null
2025-02-19	The Self-Improvement Paradox: Can Language Models Bootstrap Reasoning Capabilities without External Scaffolding?	Yutao Sun et.al.	2502.13441	null
2025-02-19	MATS: An Audio Language Model under Text-only Supervision	Wen Wang et.al.	2502.13433	null
2025-02-19	Vision-Based Generic Potential Function for Policy Alignment in Multi-Agent Reinforcement Learning	Hao Ma et.al.	2502.13430	null
2025-02-19	MCTS-KBQA: Monte Carlo Tree Search for Knowledge Base Question Answering	Guanming Xiong et.al.	2502.13428	null
2025-02-19	TabSD: Large Free-Form Table Question Answering with SQL-Based Table Decomposition	Yuxiang Wang et.al.	2502.13422	null
2025-02-19	RLTHF: Targeted Human Feedback for LLM Alignment	Yifei Xu et.al.	2502.13417	null
2025-02-19	Detecting LLM Fact-conflicting Hallucinations Enhanced by Temporal-logic-based Reasoning	Ningke Li et.al.	2502.13416	null
2025-02-19	Explore-Construct-Filter: An Automated Framework for Rich and Reliable API Knowledge Graph Construction	Yanbang Sun et.al.	2502.13412	null
2025-02-19	Generative Predictive Control: Flow Matching Policies for Dynamic and Difficult-to-Demonstrate Tasks	Vince Kurtz et.al.	2502.13406	null
2025-02-19	$\mathtt{GeLLM^3O}$: Generalizing Large Language Models for Multi-property Molecule Optimization	Vishal Dey et.al.	2502.13398	null
2025-02-19	Prompting a Weighting Mechanism into LLM-as-a-Judge in Two-Step: A Case Study	Wenwen Xie et.al.	2502.13396	null
2025-02-19	Flow-based generative models as iterative algorithms in probability space	Yao Xie et.al.	2502.13394	null
2025-02-19	Reasoning with Reinforced Functional Token Tuning	Kongcheng Zhang et.al.	2502.13389	link
2025-02-19	Reflection of Episodes: Learning to Play Game from Expert and Self Experiences	Xiaojie Xu et.al.	2502.13388	null
2025-02-19	MM-Verify: Enhancing Multimodal Reasoning with Chain-of-Thought Verification	Linzhuang Sun et.al.	2502.13383	link
2025-02-19	AutoTEE: Automated Migration and Protection of Programs in Trusted Execution Environments	Ruidong Han et.al.	2502.13379	null
2025-02-19	Task-agnostic Prompt Compression with Context-aware Sentence Embedding and Reward-guided Task Descriptor	Barys Liskavets et.al.	2502.13374	null
2025-02-18	Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization	Shuo Xing et.al.	2502.13146	link
2025-02-18	Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation	Bencheng Liao et.al.	2502.13145	link
2025-02-18	Pre-training Auto-regressive Robotic Models with 4D Representations	Dantong Niu et.al.	2502.13142	null
2025-02-18	UniGuardian: A Unified Defense for Detecting Prompt Injection, Backdoor Attacks and Adversarial Attacks in Large Language Models	Huawei Lin et.al.	2502.13141	link
2025-02-18	AIDE: AI-Driven Exploration in the Space of Code	Zhengyao Jiang et.al.	2502.13138	link
2025-02-18	Theorem Prover as a Judge for Synthetic Data Generation	Joshua Ong Jun Leang et.al.	2502.13137	null
2025-02-18	AV-Flow: Transforming Text to Audio-Visual Human-like Interactions	Aggelina Chatziagapi et.al.	2502.13133	null
2025-02-18	Learning to Defer for Causal Discovery with Imperfect Experts	Oscar Clivio et.al.	2502.13132	null
2025-02-18	Rethinking Diverse Human Preference Learning through Principal Component Analysis	Feng Luo et.al.	2502.13131	null
2025-02-18	Magma: A Foundation Model for Multimodal AI Agents	Jianwei Yang et.al.	2502.13130	link
2025-02-18	Is Noise Conditioning Necessary for Denoising Generative Models?	Qiao Sun et.al.	2502.13129	null
2025-02-18	Facilitating Long Context Understanding via Supervised Chain-of-Thought Reasoning	Jingyang Lin et.al.	2502.13127	null
2025-02-18	RuozhiBench: Evaluating LLMs with Logical Fallacies and Misleading Premises	Zenan Zhai et.al.	2502.13125	null
2025-02-18	Adapting Psycholinguistic Research for LLMs: Gender-inclusive Language in a Coreference Context	Marion Bartl et.al.	2502.13120	null
2025-02-18	STEER-ME: Assessing the Microeconomic Reasoning of Large Language Models	Narun Raman et.al.	2502.13119	null
2025-02-18	Performance Evaluation of Large Language Models in Statistical Programming	Xinyi Song et.al.	2502.13117	link
2025-02-18	MatterChat: A Multi-Modal LLM for Material Science	Yingheng Tang et.al.	2502.13107	null
2025-02-18	Text2World: Benchmarking Large Language Models for Symbolic World Model Generation	Mengkang Hu et.al.	2502.13092	null
2025-02-18	A Neural Difference-of-Entropies Estimator for Mutual Information	Haoran Ni et.al.	2502.13085	null
2025-02-18	Personalized Image Generation with Deep Generative Models: A Decade Survey	Yuxiang Wei et.al.	2502.13081	link
2025-02-18	SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models	Xianfu Cheng et.al.	2502.13059	null
2025-02-18	LAMD: Context-driven Android Malware Detection and Classification with LLMs	Xingzhi Qian et.al.	2502.13055	null
2025-02-18	Do we still need Human Annotators? Prompting Large Language Models for Aspect Sentiment Quad Prediction	Nils Constantin Hellwig et.al.	2502.13044	null
2025-02-18	HPSS: Heuristic Prompting Strategy Search for LLM Evaluators	Bosi Wen et.al.	2502.13031	null
2025-02-18	A deep learning framework for efficient pathology image analysis	Peter Neidlinger et.al.	2502.13027	null
2025-02-18	Agentic Deep Graph Reasoning Yields Self-Organizing Knowledge Networks	Markus J. Buehler et.al.	2502.13025	null
2025-02-18	Oreo: A Plug-in Context Reconstructor to Enhance Retrieval-Augmented Generation	Sha Li et.al.	2502.13019	null
2025-02-18	Towards a Design Guideline for RPA Evaluation: A Survey of Large Language Model-Based Role-Playing Agents	Chaoran Chen et.al.	2502.13012	null
2025-02-18	Adaptive Knowledge Graphs Enhance Medical Question Answering: Bridging the Gap Between LLMs and Evolving Medical Knowledge	Mohammad Reza Rezaei et.al.	2502.13010	null
2025-02-18	You need to MIMIC to get FAME: Solving Meeting Transcript Scarcity with a Multi-Agent Conversations	Frederic Kirstein et.al.	2502.13001	null
2025-02-18	Personalized Top-k Set Queries Over Predicted Scores	Sohrab Namazi Nia et.al.	2502.12998	null
2025-02-18	Beyond Profile: From Surface-Level Facts to Deep Persona Simulation in LLMs	Zixiao Wang et.al.	2502.12988	null
2025-02-18	Towards Variational Flow Matching on General Geometries	Olga Zaghen et.al.	2502.12981	null
2025-02-18	Learning More Effective Representations for Dense Retrieval through Deliberate Thinking Before Search	Yifan Ji et.al.	2502.12974	link
2025-02-18	Reasoning-to-Defend: Safety-Aware Reasoning Can Defend Large Language Models from Jailbreaking	Junda Zhu et.al.	2502.12970	link
2025-02-18	Trust Me, I'm Wrong: High-Certainty Hallucinations in LLMs	Adi Simhi et.al.	2502.12964	null
2025-02-18	Infinite Retrieval: Attention Enhanced LLMs in Long-Context Processing	Xiaoju Ye et.al.	2502.12962	null
2025-02-18	Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger	Wenjun Li et.al.	2502.12961	null
2025-02-18	Guaranteed Conditional Diffusion: 3D Block-based Models for Scientific Data Compression	Jaemoon Lee et.al.	2502.12951	null
2025-02-18	Fake It Till You Make It: Using Synthetic Data and Domain Knowledge for Improved Text-Based Learning for LGE Detection	Athira J Jacob et.al.	2502.12948	null
2025-02-18	Every Expert Matters: Towards Effective Knowledge Distillation for Mixture-of-Experts Language Models	Gyeongman Kim et.al.	2502.12947	null
2025-02-18	LLMPopcorn: An Empirical Study of LLMs as Assistants for Popular Micro-video Generation	Junchen Fu et.al.	2502.12945	null
2025-02-18	Performance of Zero-Shot Time Series Foundation Models on Cloud Data	William Toner et.al.	2502.12944	null
2025-02-18	Flow-of-Options: Diversified and Improved LLM Reasoning by Thinking Through Options	Lakshmi Nair et.al.	2502.12929	link
2025-02-18	Finedeep: Mitigating Sparse Activation in Dense LLMs via Multi-Layer Fine-Grained Experts	Leiyu Pan et.al.	2502.12928	null
2025-02-18	SEFL: Harnessing Large Language Model Agents to Improve Educational Feedback Systems	Mike Zhang et.al.	2502.12927	null
2025-02-18	Towards more Contextual Agents: An extractor-Generator Optimization Framework	Mourad Aouini et.al.	2502.12926	null
2025-02-18	Keep what you need: extracting efficient subnetworks from large audio representation models	David Genova et.al.	2502.12925	link
2025-02-18	Conditioning LLMs to Generate Code-Switched Text: A Methodology Grounded in Naturally Occurring Data	Maite Heredia et.al.	2502.12924	link
2025-02-18	On-Device LLMs for Home Assistant: Dual Role in Intent Detection and Response Generation	Rune Birkmose et.al.	2502.12923	link
2025-02-18	Q-STRUM Debate: Query-Driven Contrastive Summarization for Recommendation Comparison	George-Kirollos Saad et.al.	2502.12921	null
2025-02-18	Lightweight Online Adaption for Time Series Foundation Model Forecasts	Thomas L. Lee et.al.	2502.12920	null
2025-02-18	GSQ-Tuning: Group-Shared Exponents Integer in Fully Quantized Training for LLMs On-Device Fine-tuning	Sifan Zhou et.al.	2502.12913	null
2025-02-18	Probabilistic neural operators for functional uncertainty quantification	Christopher Bülte et.al.	2502.12902	link
2025-02-18	Soundwave: Less is More for Speech-Text Alignment in LLMs	Yuhao Zhang et.al.	2502.12900	link
2025-02-18	Multilingual European Language Models: Benchmarking Approaches and Challenges	Fabio Barth et.al.	2502.12895	null
2025-02-18	CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image	Kaixin Yao et.al.	2502.12894	null
2025-02-18	Are Multilingual Language Models an Off-ramp for Under-resourced Languages? Will we arrive at Digital Language Equality in Europe in 2030?	Georg Rehm et.al.	2502.12886	null
2025-02-18	How desirable is alignment between LLMs and linguistically diverse human users?	Pia Knoeferle et.al.	2502.12884	null
2025-02-18	Continuous Learning Conversational AI: A Personalized Agent Framework via A2C Reinforcement Learning	Nandakishor M et.al.	2502.12876	null
2025-02-18	RobotIQ: Empowering Mobile Robots with Human-Level Planning for Real-World Execution	Emmanuel K. Raptis et.al.	2502.12862	link
2025-02-18	PAFT: Prompt-Agnostic Fine-Tuning	Chenxing Wei et.al.	2502.12859	null
2025-02-18	Rejected Dialects: Biases Against African American Language in Reward Models	Joel Mire et.al.	2502.12858	null
2025-02-18	MeMo: Towards Language Models with Associative Memory Mechanisms	Fabio Massimo Zanzotto et.al.	2502.12851	null
2025-02-18	MOLLM: Multi-Objective Large Language Model for Molecular Design -- Optimizing with Experts	Nian Ran et.al.	2502.12845	null
2025-02-18	Towards Adaptive Feedback with AI: Comparing the Feedback Quality of LLMs and Teachers on Experimentation Protocols	Kathrin Seßler et.al.	2502.12842	null
2025-02-18	Towards Equitable AI: Detecting Bias in Using Large Language Models for Marketing	Berk Yilmaz et.al.	2502.12838	null
2025-02-18	An LLM-Powered Agent for Physiological Data Analysis: A Case Study on PPG-based Heart Rate Estimation	Mohammad Feli et.al.	2502.12836	null
2025-02-18	KazMMLU: Evaluating Language Models on Kazakh, Russian, and Regional Knowledge of Kazakhstan	Mukhammed Togmanov et.al.	2502.12829	null
2025-02-18	Reasoning and the Trusting Behavior of DeepSeek and GPT: An Experiment Revealing Hidden Fault Lines in Large Language Models	Rubing Lu et.al.	2502.12825	null
2025-02-18	Pitfalls of Scale: Investigating the Inverse Task of Redefinition in Large Language Models	Elena Stringli et.al.	2502.12821	null
2025-02-18	Simulating User Diversity in Task-Oriented Dialogue Systems using Large Language Models	Adnan Ahmad et.al.	2502.12813	null
2025-02-18	Towards Text-Image Interleaved Retrieval	Xin Zhang et.al.	2502.12799	link
2025-02-18	RAPID: Retrieval Augmented Training of Differentially Private Diffusion Models	Tanqiu Jiang et.al.	2502.12794	link
2025-02-18	Commonsense Reasoning in Arab Culture	Abdelrahman Sadallah et.al.	2502.12788	null
2025-02-18	Portable Reward Tuning: Towards Reusable Fine-Tuning across Different Pretrained Models	Daiki Chijiwa et.al.	2502.12776	null
2025-02-18	How Much Do LLMs Hallucinate across Languages? On Multilingual Estimation of LLM Hallucination in the Wild	Saad Obaid ul Islam et.al.	2502.12769	link
2025-02-18	R2-KG: General-Purpose Dual-Agent Framework for Reliable Reasoning on Knowledge Graphs	Sumin Jo et.al.	2502.12767	null
2025-02-18	One-bit Compressed Sensing using Generative Models	Swatantra Kafle et.al.	2502.12762	null
2025-02-18	Efficient Machine Translation Corpus Generation: Integrating Human-in-the-Loop Post-Editing with Large Language Models	Kamer Ali Yuksel et.al.	2502.12755	link
2025-02-18	Architect of the Bits World: Masked Autoregressive Modeling for Circuit Generation Guided by Truth Table	Haoyuan Wu et.al.	2502.12751	null
2025-02-18	Self-Enhanced Reasoning Training: Activating Latent Reasoning in Small Models for Enhanced Reasoning Distillation	Yong Zhang et.al.	2502.12744	null
2025-02-18	"I know myself better, but not really greatly": Using LLMs to Detect and Explain LLM-Generated Texts	Jiazhou Ji et.al.	2502.12743	null
2025-02-18	Circuit Representation Learning with Masked Gate Modeling and Verilog-AIG Alignment	Haoyuan Wu et.al.	2502.12732	null
2025-02-18	TREND: A Whitespace Replacement Information Hiding Method	Malte Hellmeier et.al.	2502.12710	null
2025-02-18	Multi-Novelty: Improve the Diversity and Novelty of Contents Generated by Large Language Models via inference-time Multi-Views Brainstorming	Arash Lagzian et.al.	2502.12700	null
2025-02-18	Multi-Step Alignment as Markov Games: An Optimistic Online Gradient Descent Approach with Convergence Guarantees	Yongtao Wu et.al.	2502.12678	null
2025-02-18	Baichuan-M1: Pushing the Medical Capability of Large Language Models	Bingning Wang et.al.	2502.12671	null
2025-02-18	Perovskite-LLM: Knowledge-Enhanced Large Language Models for Perovskite Solar Cell Research	Xiang Liu et.al.	2502.12669	null
2025-02-18	Evaluation of Best-of-N Sampling Strategies for Language Model Alignment	Yuki Ichihara et.al.	2502.12668	null
2025-02-18	A$^2$ATS: Retrieval-Based KV Cache Reduction via Windowed Rotary Position Embedding and Query-Aware Vector Quantization	Junhui He et.al.	2502.12665	null
2025-02-18	Demystifying Multilingual Chain-of-Thought in Process Reward Modeling	Weixuan Wang et.al.	2502.12663	null
2025-02-18	The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1	Kaiwen Zhou et.al.	2502.12659	null
2025-02-18	R.R.: Unveiling LLM Training Privacy through Recollection and Ranking	Wenlong Meng et.al.	2502.12658	link
2025-02-18	NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation	Zhiyuan Liu et.al.	2502.12638	link
2025-02-18	Corrupted but Not Broken: Rethinking the Impact of Corrupted Data in Visual Instruction Tuning	Yunhao Gou et.al.	2502.12635	null
2025-02-18	One Size doesn't Fit All: A Personalized Conversational Tutoring Agent for Mathematics Instruction	Ben Liu et.al.	2502.12633	null
2025-02-18	Automating Prompt Leakage Attacks on Large Language Models Using Agentic Approach	Tvrtko Sternak et.al.	2502.12630	link
2025-02-18	DeepResonance: Enhancing Multimodal Music Understanding via Music-centric Multi-way Instruction Tuning	Zhuoyuan Mao et.al.	2502.12623	null
2025-02-18	Improving Chain-of-Thought Reasoning via Quasi-Symbolic Abstractions	Leonardo Ranaldi et.al.	2502.12616	null
2025-02-17	Idiosyncrasies in Large Language Models	Mingjie Sun et.al.	2502.12150	link
2025-02-17	HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation	Ling Yang et.al.	2502.12148	link
2025-02-17	Fast or Better? Balancing Accuracy and Cost in Retrieval-Augmented Generation with Flexible User Control	Jinyan Su et.al.	2502.12145	link
2025-02-17	Small Models Struggle to Learn from Strong Reasoners	Yuetai Li et.al.	2502.12143	null
2025-02-17	SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs	Yige Xu et.al.	2502.12134	null
2025-02-17	Transformer Dynamics: A neuroscientific approach to interpretability of large language models	Jesseba Fernando et.al.	2502.12131	null
2025-02-17	Scaling Autonomous Agents via Automatic Reward Modeling And Planning	Zhenfang Chen et.al.	2502.12130	null
2025-02-17	LaM-SLidE: Latent Space Modeling of Spatial Dynamical Systems via Linked Entities	Florian Sestak et.al.	2502.12128	link
2025-02-17	Minimal Ranks, Maximum Confidence: Parameter-efficient Uncertainty Quantification for LoRA	Patryk Marszałek et.al.	2502.12122	link
2025-02-17	LLMs on the Line: Data Determines Loss-to-Loss Scaling Laws	Prasanna Mayilvahanan et.al.	2502.12120	null
2025-02-17	PRISM: Self-Pruning Intrinsic Selection Method for Training-Free Multimodal Data Selection	Jinhe Bi et.al.	2502.12119	null
2025-02-17	A-MEM: Agentic Memory for LLM Agents	Wujiang Xu et.al.	2502.12110	link
2025-02-17	Personality Structured Interview for Large Language Model Simulation in Personality Research	Pengda Wang et.al.	2502.12109	null
2025-02-17	Relational Norms for Human-AI Cooperation	Brian D. Earp et.al.	2502.12102	null
2025-02-17	Token Communications: A Unified Framework for Cross-modal Context-aware Semantic Communications	Li Qiao et.al.	2502.12096	null
2025-02-17	How compositional generalization and creativity improve as diffusion models are trained	Alessandro Favero et.al.	2502.12089	null
2025-02-17	Meta-Statistical Learning: Supervised Learning of Statistical Inference	Maxime Peyrard et.al.	2502.12088	null
2025-02-17	APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs	Yuxiang Huang et.al.	2502.12085	link
2025-02-17	Can LLMs Simulate Social Media Engagement? A Study on Action-Guided Response Generation	Zhongyi Qiu et.al.	2502.12073	null
2025-02-17	TokenSkip: Controllable Chain-of-Thought Compression in LLMs	Heming Xia et.al.	2502.12067	link
2025-02-17	CONSTRUCTA: Automating Commercial Construction Schedules in Fabrication Facilities with Large Language Models	Yifan Zhang et.al.	2502.12066	null
2025-02-17	AI-generated Text Detection with a GLTR-based Approach	Lucía Yan Wu et.al.	2502.12064	null
2025-02-17	Designing Role Vectors to Improve LLM Inference Behaviour	Daniele Potertì et.al.	2502.12055	null
2025-02-17	PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning	Xinyu Zhang et.al.	2502.12054	null
2025-02-17	A Survey on Bridging EEG Signals and Generative AI: From Image and Text to Beyond	Shreya Shukla et.al.	2502.12048	null
2025-02-17	KnowPath: Knowledge-enhanced Reasoning via LLM-generated Inference Paths over Knowledge Graphs	Qi Zhao et.al.	2502.12029	null
2025-02-17	SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities	Fengqing Jiang et.al.	2502.12025	null
2025-02-17	Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving	Xin Xu et.al.	2502.12022	null
2025-02-17	Atom of Thoughts for Markov LLM Test-Time Scaling	Fengwei Teng et.al.	2502.12018	null
2025-02-17	Unsupervised Structural-Counterfactual Generation under Domain Shift	Krishn Vishwas Kher et.al.	2502.12013	null
2025-02-17	Design Considerations Based on Stability for a Class of TCP Algorithms	Sreekanth Prabhakar et.al.	2502.11983	null
2025-02-17	Image Inversion: A Survey from GANs to Diffusion and Beyond	Yinan Chen et.al.	2502.11974	link
2025-02-17	Generating Text from Uniform Meaning Representation	Emma Markle et.al.	2502.11973	link
2025-02-17	A MIMO Wireless Channel Foundation Model via CIR-CSI Consistency	Jun Jiang et.al.	2502.11965	null
2025-02-17	Navigating the Helpfulness-Truthfulness Trade-Off with Uncertainty-Aware Instruction Fine-Tuning	Tianyi Wu et.al.	2502.11962	null
2025-02-17	On Representational Dissociation of Language and Arithmetic in Large Language Models	Riku Kisako et.al.	2502.11932	null
2025-02-17	GRAPHGPT-O: Synergistic Multimodal Comprehension and Generation on Graphs	Yi Fang et.al.	2502.11925	null
2025-02-17	From Text to Trust: Empowering AI-assisted Decision Making with Adaptive LLM-powered Analysis	Zhuoyan Li et.al.	2502.11919	null
2025-02-17	EssayJudge: A Multi-Granular Benchmark for Assessing Automated Essay Scoring Capabilities of Multimodal Large Language Models	Jiamin Su et.al.	2502.11916	null
2025-02-17	Adversarial Alignment for LLMs Requires Simpler, Reproducible, and More Measurable Objectives	Leo Schwinn et.al.	2502.11910	null
2025-02-17	MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation	Haochen Xue et.al.	2502.11903	null
2025-02-17	DLFR-VAE: Dynamic Latent Frame Rate VAE for Video Generation	Zhihang Yuan et.al.	2502.11897	link
2025-02-17	CAMEL: Continuous Action Masking Enabled by Large Language Models for Reinforcement Learning	Yanxiao Zhao et.al.	2502.11896	null
2025-02-17	Continual Quantization-Aware Pre-Training: When to transition from 16-bit to 1.58-bit pre-training for BitNet language models?	Jacob Nielsen et.al.	2502.11895	null
2025-02-17	Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration	Shao Zhang et.al.	2502.11882	link
2025-02-17	Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models	Hyunwoo Kim et.al.	2502.11881	null
2025-02-17	Bitnet.cpp: Efficient Edge Inference for Ternary LLMs	Jinheng Wang et.al.	2502.11880	link
2025-02-17	JoLT: Joint Probabilistic Predictions on Tabular Data Using LLMs	Aliaksandra Shysheya et.al.	2502.11877	link
2025-02-17	FedEAT: A Robustness Optimization Framework for Federated LLMs	Yahao Pang et.al.	2502.11863	null
2025-02-17	Understanding In-Context Machine Translation for Low-Resource Languages: A Case Study on Manchu	Renhao Pei et.al.	2502.11862	null
2025-02-17	Exploring Large Language Models in Healthcare: Insights into Corpora Sources, Customization Strategies, and Evaluation Metrics	Shuqi Yang et.al.	2502.11861	null
2025-02-17	StructTransform: A Scalable Attack Surface for Safety-Aligned Large Language Models	Shehel Yoosuf et.al.	2502.11853	link
2025-02-17	BaxBench: Can LLMs Generate Correct and Secure Backends?	Mark Vero et.al.	2502.11844	null
2025-02-17	Can LLM Agents Maintain a Persona in Discourse?	Pranav Bhandari et.al.	2502.11843	null
2025-02-17	Model Generalization on Text Attribute Graphs: Principles with Large Language Models	Haoyu Wang et.al.	2502.11836	link
2025-02-17	HAAN: A Holistic Approach for Accelerating Normalization Operations in Large Language Models	Tianfan Peng et.al.	2502.11832	null
2025-02-17	Intuitive physics understanding emerges from self-supervised pretraining on natural videos	Quentin Garrido et.al.	2502.11831	link
2025-02-17	Text Classification in the LLM Era - Where do we stand?	Sowmya Vajjala et.al.	2502.11830	null
2025-02-17	Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities	Hanbin Wang et.al.	2502.11829	link
2025-02-17	M-ABSA: A Multilingual Dataset for Aspect-Based Sentiment Analysis	Chengyan Wu et.al.	2502.11824	link
2025-02-17	Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis	Xu Wang et.al.	2502.11812	null
2025-02-17	FineFilter: A Fine-grained Noise Filtering Mechanism for Retrieval-Augmented Large Language Models	Qianchi Zhang et.al.	2502.11811	null
2025-02-17	Exploring Translation Mechanism of Large Language Models	Hongbin Zhang et.al.	2502.11806	null
2025-02-17	Table-Critic: A Multi-Agent Framework for Collaborative Criticism and Refinement in Table Reasoning	Peiying Yu et.al.	2502.11799	null
2025-02-17	Personality Editing for Language Models through Relevant Knowledge Editing	Seojin Hwang et.al.	2502.11789	null
2025-02-17	Efficient Response Generation Method Selection for Fine-Tuning Large Language Models	Xuan Ren et.al.	2502.11779	null
2025-02-17	video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model	Guangzhi Sun et.al.	2502.11775	null
2025-02-17	The Validation Gap: A Mechanistic Analysis of How Language Models Compute Arithmetic but Fail to Validate It	Leonardo Bertolazzi et.al.	2502.11771	link
2025-02-17	Cognitive-Aligned Document Selection for Retrieval-augmented Generation	Bingyu Wan et.al.	2502.11770	null
2025-02-17	From Selection to Generation: A Survey of LLM-based Active Learning	Yu Xia et.al.	2502.11767	null
2025-02-17	Warmup-Distill: Bridge the Distribution Mismatch between Teacher and Student before Knowledge Distillation	Zengkui Sun et.al.	2502.11766	link
2025-02-17	HintsOfTruth: A Multimodal Checkworthiness Detection Dataset with Real and Synthetic Claims	Michiel van der Meer et.al.	2502.11753	null
2025-02-17	Language Models Can See Better: Visual Contrastive Decoding For LLM Multimodal Reasoning	Yuqi Pang et.al.	2502.11751	link
2025-02-17	ILIAS: Instance-Level Image retrieval At Scale	Giorgos Kordopatis-Zilos et.al.	2502.11748	null
2025-02-17	SQL-o1: A Self-Reward Heuristic Dynamic Search Method for Text-to-SQL	Shuai Lyu et.al.	2502.11741	link
2025-02-17	ReviewEval: An Evaluation Framework for AI-Generated Reviews	Chavvi Kirtani et.al.	2502.11736	null
2025-02-17	Plant in Cupboard, Orange on Table, Book on Shelf. Benchmarking Practical Reasoning and Situation Modelling in a Text-Simulated Situated Environment	Jonathan Jordan et.al.	2502.11733	null
2025-02-17	Energy-Conscious LLM Decoding: Impact of Text Generation Strategies on GPU Energy Consumption	Alireza Nik et.al.	2502.11723	null
2025-02-17	Enhancing Recommendation Explanations through User-Centric Refinement	Jingsen Zhang et.al.	2502.11721	null
2025-02-17	Can you pass that tool?: Implications of Indirect Speech in Physical Human-Robot Collaboration	Yan Zhang et.al.	2502.11720	null
2025-02-17	Component-aware Unsupervised Logical Anomaly Generation for Industrial Anomaly Detection	Xuan Tong et.al.	2502.11712	null
2025-02-17	Ad-hoc Concept Forming in the Game Codenames as a Means for Evaluating Large Language Models	Sherzod Hakimov et.al.	2502.11707	null
2025-02-17	LLM Agents Making Agent Tools	Georg Wölflein et.al.	2502.11705	null
2025-02-17	CMQCIC-Bench: A Chinese Benchmark for Evaluating Large Language Models in Medical Quality Control Indicator Calculation	Guangya Yu et.al.	2502.11703	null
2025-02-17	MVTokenFlow: High-quality 4D Content Generation using Multiview Token Flow	Hanzhuo Huang et.al.	2502.11697	null
2025-02-17	Improve LLM-as-a-Judge Ability as a General Ability	Jiachen Yu et.al.	2502.11689	null
2025-02-17	MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task	Yuchen Yan et.al.	2502.11684	null
2025-02-17	RIDE: Enhancing Large Language Model Alignment through Restyled In-Context Learning Demonstration Exemplars	Yuncheng Hua et.al.	2502.11681	link
2025-02-17	Exploring LLM-based Student Simulation for Metacognitive Cultivation	Haoxuan Li et.al.	2502.11678	null
2025-02-17	Towards Fully Exploiting LLM Internal States to Enhance Knowledge Boundary Perception	Shiyu Ni et.al.	2502.11677	null
2025-02-17	Diversity-Oriented Data Augmentation with Large Language Models	Zaitian Wang et.al.	2502.11671	null
2025-02-17	VRoPE: Rotary Position Embedding for Video Large Language Models	Zikang Liu et.al.	2502.11664	link
2025-02-17	An Innovative Brain-Computer Interface Interaction System Based on the Large Language Model	Jing Jina et.al.	2502.11659	null
2025-02-17	Competing LLM Agents in a Non-Cooperative Game of Opinion Polarisation	Amin Qasmi et.al.	2502.11649	null
2025-02-17	DELMAN: Dynamic Defense Against Large Language Model Jailbreaking with Model Editing	Yi Wang et.al.	2502.11647	null
2025-02-17	Hyperspherical Energy Transformer with Recurrent Depth	Yunzhe Hu et.al.	2502.11646	null
2025-02-17	Is Human-Like Text Liked by Humans? Multilingual Human Detection and Preference Against AI	Yuxia Wang et.al.	2502.11614	null
2025-02-17	Maximum Entropy Reinforcement Learning with Diffusion Policy	Xiaoyi Dong et.al.	2502.11612	link
2025-02-17	Accuracy Assessment of OpenAlex and Clarivate Scholar ID with an LLM-Assisted Benchmark	Renyu Zhao et.al.	2502.11610	null
2025-02-17	GraphThought: Graph Combinatorial Optimization with Thought Generation	Zixiao Huang et.al.	2502.11607	null
2025-02-14	MM-RLHF: The Next Step Forward in Multimodal LLM Alignment	Yi-Fan Zhang et.al.	2502.10391	null
2025-02-14	Aspect-Oriented Summarization for Psychiatric Short-Term Readmission Prediction	WonJin Yoon et.al.	2502.10388	null
2025-02-14	Robustness tests for biomedical foundation models should tailor to specification	R. Patrick Xian et.al.	2502.10374	link
2025-02-14	AffinityFlow: Guided Flows for Antibody Affinity Maturation	Can Chen et.al.	2502.10365	null
2025-02-14	Enhancing Multilingual LLM Pretraining with Model-Based Data Selection	Bettina Messmer et.al.	2502.10361	null
2025-02-14	Dimension-free Score Matching and Time Bootstrapping for Diffusion Models	Syamantak Kumar et.al.	2502.10354	null
2025-02-14	Organize the Web: Constructing Domains Enhances Pre-Training Data Curation	Alexander Wettig et.al.	2502.10341	null
2025-02-14	Evaluating the Meta- and Object-Level Reasoning of Large Language Models for Question Answering	Nick Ferguson et.al.	2502.10338	null
2025-02-14	Generalised Parallel Tempering: Flexible Replica Exchange via Flows and Diffusions	Leo Zhang et.al.	2502.10328	null
2025-02-14	LLM-Powered Preference Elicitation in Combinatorial Assignment	Ermis Soumalias et.al.	2502.10308	null
2025-02-14	SPIRIT: Short-term Prediction of solar IRradIance for zero-shot Transfer learning using Foundation Models	Aditya Mishra et.al.	2502.10307	null
2025-02-14	Open-Source AI-Powered Optimization in Scalene: Advancing Python Performance Profiling with DeepSeek-R1 and LLaMA 3.2	Saem Hasan et.al.	2502.10299	null
2025-02-14	Probabilistic Super-Resolution for High-Fidelity Physical System Simulations with Uncertainty Quantification	Pengyu Zhang et.al.	2502.10280	null
2025-02-14	Are Large Language Models the future crowd workers of Linguistics?	Iris Ferrazzo et.al.	2502.10266	null
2025-02-14	Large Language Models and Synthetic Data for Monitoring Dataset Mentions in Research Papers	Aivin V. Solatorio et.al.	2502.10263	null
2025-02-14	VisCon-100K: Leveraging Contextual Web Data for Fine-tuning Vision Language Models	Gokul Karthik Kumar et.al.	2502.10250	null
2025-02-14	Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model	Guoqing Ma et.al.	2502.10248	link
2025-02-14	Efficient Zero-Order Federated Finetuning of Language Models for Resource-Constrained Devices	Mohamed Aboelenien Ahmed et.al.	2502.10239	null
2025-02-14	Shaping Inductive Bias in Diffusion Models through Frequency-Based Noise Control	Thomas Jiralerspong et.al.	2502.10236	null
2025-02-14	AdaPTS: Adapting Univariate Foundation Models to Probabilistic Multivariate Time Series Forecasting	Abdelhakim Benechehab et.al.	2502.10235	link
2025-02-14	Do Large Language Models Reason Causally Like Us? Even Better?	Hanna M. Dettki et.al.	2502.10215	null
2025-02-14	Can Post-Training Quantization Benefit from an Additional QLoRA Integration?	Xiliang Zhu et.al.	2502.10202	null
2025-02-14	Prediction hubs are context-informed frequent tokens in LLMs	Beatrix M. G. Nielsen et.al.	2502.10201	null
2025-02-14	MathConstruct: Challenging LLM Reasoning with Constructive Proofs	Mislav Balunović et.al.	2502.10197	null
2025-02-14	Translating Common Security Assertions Across Processor Designs: A RISC-V Case Study	Sharjeel Imtiaz et.al.	2502.10194	null
2025-02-14	VideoDiff: Human-AI Video Co-Creation with Alternatives	Mina Huh et.al.	2502.10190	null
2025-02-14	Modeling biases in binary decision-making within the generalized nonlinear q-voter model	Maciej Doniec et.al.	2502.10172	null
2025-02-14	Video Soundtrack Generation by Aligning Emotions and Temporal Boundaries	Serkan Sulun et.al.	2502.10154	null
2025-02-14	Semantica: Decentralized Search using a LLM-Guided Semantic Tree Overlay	Petru Neague et.al.	2502.10151	link
2025-02-14	Cooperative Multi-Agent Planning with Adaptive Skill Synthesis	Zhiyuan Li et.al.	2502.10148	null
2025-02-14	Small Models, Big Impact: Efficient Corpus and Graph-Based Adaptation of Small Multilingual Language Models for Low-Resource Languages	Daniil Gurgurov et.al.	2502.10140	null
2025-02-14	Physics-Informed Generative Modeling of Wireless Channels	Benedikt Böck et.al.	2502.10137	null
2025-02-14	ScamFerret: Detecting Scam Websites Autonomously with Large Language Models	Hiroki Nakano et.al.	2502.10110	link
2025-02-14	NeuroXVocal: Detection and Explanation of Alzheimer's Disease through Non-invasive Analysis of Picture-prompted Speech	Nikolaos Ntampakis et.al.	2502.10108	null
2025-02-14	A novel approach to data generation in generative model	JaeHong Kim et.al.	2502.10092	null
2025-02-14	Enhancing Patient Acceptance of Robotic Ultrasound through Conversational Virtual Agent and Immersive Visualizations	Tianyu Song et.al.	2502.10088	null
2025-02-14	DiSciPLE: Learning Interpretable Programs for Scientific Visual Discovery	Utkarsh Mall et.al.	2502.10060	null
2025-02-14	A Generalized Modeling Approach to Liquid-driven Ballooning Membranes	Mirroyal Ismayilov et.al.	2502.10057	null
2025-02-14	ORI: O Routing Intelligence	Ahmad Shadid et.al.	2502.10051	null
2025-02-14	A Survey on LLM-powered Agents for Recommender Systems	Qiyao Peng et.al.	2502.10050	null
2025-02-14	ViRAC: A Vision-Reasoning Agent Head Movement Control Framework in Arbitrary Virtual Environments	Juyeong Hwang et.al.	2502.10046	null
2025-02-14	POI-Enhancer: An LLM-based Semantic Enhancement Framework for POI Representation Learning	Jiawei Cheng et.al.	2502.10038	null
2025-02-14	Probabilistic Lexical Manifold Construction in Large Language Models via Hierarchical Vector Field Interpolation	Clive Pendleton et.al.	2502.10013	null
2025-02-14	ChatGPT and Deepseek: Can They Predict the Stock Market and Macroeconomy?	Jian Chen et.al.	2502.10008	null
2025-02-14	EmbBERT-Q: Breaking Memory Barriers in Embedded NLP	Riccardo Bravin et.al.	2502.10001	null
2025-02-14	Decision Information Meets Large Language Models: The Future of Explainable Operations Research	Yansen Zhang et.al.	2502.09994	null
2025-02-14	Large Language Diffusion Models	Shen Nie et.al.	2502.09992	null
2025-02-14	V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models	Hsu-kuang Chiu et.al.	2502.09980	null
2025-02-14	LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs - No Silver Bullet for LC or RAG Routing	Kuan Li et.al.	2502.09977	null
2025-02-14	Has My System Prompt Been Used? Large Language Model Prompt Membership Inference	Roman Levin et.al.	2502.09974	null
2025-02-14	KGGen: Extracting Knowledge Graphs from Plain Text with Language Models	Belinda Mo et.al.	2502.09956	null
2025-02-14	A Preliminary Exploration with GPT-4o Voice Mode	Yu-Xiang Lin et.al.	2502.09940	null
2025-02-14	Precise Parameter Localization for Textual Generation in Diffusion Models	Łukasz Staniszewski et.al.	2502.09935	null
2025-02-14	MIR-Bench: Benchmarking LLM's Long-Context Intelligence via Many-Shot In-Context Inductive Reasoning	Kai Yan et.al.	2502.09933	null
2025-02-14	Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence	Granite Vision Team et.al.	2502.09927	null
2025-02-14	λScale: Enabling Fast Scaling for Serverless Large Language Model Inference	Minchen Yu et.al.	2502.09922	null
2025-02-14	INF^2: High-Throughput Generative Inference of Large Language Models using Near-Storage Processing	Hongsun Jang et.al.	2502.09921	null
2025-02-14	AutoS$^2$earch: Unlocking the Reasoning Potential of Large Models for Web-based Source Search	Zhengqiu Zhu et.al.	2502.09913	null
2025-02-14	Insect-Foundation: A Foundation Model and Large Multimodal Dataset for Vision-Language Insect Understanding	Thanh-Dat Truong et.al.	2502.09906	null
2025-02-14	The Ann Arbor Architecture for Agent-Oriented Programming	Wei Dong et.al.	2502.09903	null
2025-02-14	Artificial Intelligence in Spectroscopy: Advancing Chemistry from Prediction to Generation and Beyond	Kehan Guo et.al.	2502.09897	null
2025-02-14	ChatIoT: Large Language Model-based Security Assistant for Internet of Things with Retrieval-Augmented Generation	Ye Dong et.al.	2502.09896	null
2025-02-14	ArchRAG: Attributed Community-based Hierarchical Retrieval-Augmented Generation	Shu Wang et.al.	2502.09891	null
2025-02-14	Video2Policy: Scaling up Manipulation Tasks in Simulation through Internet Videos	Weirui Ye et.al.	2502.09886	null
2025-02-14	Solvable Dynamics of Self-Supervised Word Embeddings and the Emergence of Analogical Reasoning	Dhruva Karkada et.al.	2502.09863	null
2025-02-14	Microphone Array Geometry Independent Multi-Talker Distant ASR: NTT System for the DASR Task of the CHiME-8 Challenge	Naoyuki Kamo et.al.	2502.09859	null
2025-02-14	Automated Hypothesis Validation with Agentic Sequential Falsifications	Kexin Huang et.al.	2502.09858	link
2025-02-14	Port-LLM: A Port Prediction Method for Fluid Antenna based on Large Language Models	Yali Zhang et.al.	2502.09857	null
2025-02-14	Efficient Multitask Learning in Small Language Models Through Upside-Down Reinforcement Learning	Yu-Chen Lin et.al.	2502.09854	null
2025-02-14	HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation	Tianwei Lin et.al.	2502.09838	link
2025-02-13	A Solver-Aided Hierarchical Language for LLM-Driven CAD Design	Benjamin T. Jones et.al.	2502.09819	null
2025-02-13	Statistical Coherence Alignment for Large Language Model Representation Learning Through Tensor Field Convergence	Jonathan Gale et.al.	2502.09815	null
2025-02-13	INJONGO: A Multicultural Intent Detection and Slot-filling Dataset for 16 African Languages	Hao Yu et.al.	2502.09814	null
2025-02-13	AgentGuard: Repurposing Agentic Orchestrator for Safety Evaluation of Tool Orchestration	Jizhou Chen et.al.	2502.09809	null
2025-02-13	Unit Testing Past vs. Present: Examining LLMs' Impact on Defect Detection and Efficiency	Rudolf Ramler et.al.	2502.09801	null
2025-02-13	Co-designing Large Language Model Tools for Project-Based Learning with K12 Educators	Prerna Ravi et.al.	2502.09799	null
2025-02-13	A Survey on LLM-based News Recommender Systems	Rongyao Wang et.al.	2502.09797	null
2025-02-13	TableTalk: Scaffolding Spreadsheet Development with a Language Agent	Jenny T. Liang et.al.	2502.09787	null
2025-02-13	Improving Acoustic Side-Channel Attacks on Keyboards Using Transformers and Large Language Models	Jin Hyun Park et.al.	2502.09782	null
2025-02-13	CellFlow: Simulating Cellular Morphology Changes via Flow Matching	Yuhui Zhang et.al.	2502.09775	null
2025-02-13	Non-Markovian Discrete Diffusion with Causal Language Models	Yangtian Zhang et.al.	2502.09767	null
2025-02-13	LLM-Generated Microservice Implementations from RESTful API Definitions	Saurabh Chauhan et.al.	2502.09766	link
2025-02-13	Enhancing Jailbreak Attacks via Compliance-Refusal-Based Initialization	Amit Levi et.al.	2502.09755	null
2025-02-13	Vote-Tree-Planner: Optimizing Execution Order in LLM-based Task Planning Pipeline via Voting	Chaoyuan Zhang et.al.	2502.09749	null
2025-02-13	The Widespread Adoption of Large Language Model-Assisted Writing Across Society	Weixin Liang et.al.	2502.09747	null
2025-02-13	Fine-Tuning Foundation Models with Federated Learning for Privacy Preserving Medical Time Series Forecasting	Mahad Ali et.al.	2502.09744	null
2025-02-13	FoNE: Precise Single-Token Number Embeddings via Fourier Features	Tianyi Zhou et.al.	2502.09741	null
2025-02-13	Making Them a Malicious Database: Exploiting Query Code to Jailbreak Aligned Large Language Models	Qingsong Zou et.al.	2502.09723	link
2025-02-13	NestQuant: Nested Lattice Quantization for Matrix Products and LLMs	Semyon Savkin et.al.	2502.09720	null
2025-02-13	Genetic Data Governance in Crisis: Policy Recommendations for Safeguarding Privacy and Preventing Discrimination	Vivek Ramanan et.al.	2502.09716	null
2025-02-13	MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency	Dongzhi Jiang et.al.	2502.09621	null
2025-02-13	Exploring the Potential of Encoder-free Architectures in 3D LMMs	Yiwen Tang et.al.	2502.09620	link
2025-02-13	Designing a Conditional Prior Distribution for Flow-Based Generative Models	Noam Issachar et.al.	2502.09611	null
2025-02-14	Score-of-Mixture Training: Training One-Step Generative Models Made Simple via Score Estimation of Mixture Distributions	Tejas Jayashankar et.al.	2502.09609	null
2025-02-13	Human-LLM Coevolution: Evidence from Academic Writing	Mingmeng Geng et.al.	2502.09606	null
2025-02-13	SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models	Yung-Sung Chuang et.al.	2502.09604	link
2025-02-13	Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs	Siyan Zhao et.al.	2502.09597	link
2025-02-13	KIMAs: A Configurable Knowledge Integrated Multi-Agent System	Zitao Li et.al.	2502.09596	null
2025-02-13	Logical forms complement probability in understanding language model (and human) performance	Yixuan Wang et.al.	2502.09589	null
2025-02-13	Rolling Ahead Diffusion for Traffic Scene Simulation	Yunpeng Liu et.al.	2502.09587	null
2025-02-13	Polymind: Parallel Visual Diagramming with Large Language Models to Support Prewriting Through Microtasks	Qian Wan et.al.	2502.09577	null
2025-02-13	Zero-shot generation of synthetic neurosurgical data with large language models	Austin A. Barr et.al.	2502.09566	link
2025-02-13	MDCrow: Automating Molecular Dynamics Workflows with Large Language Models	Quintina Campbell et.al.	2502.09565	link
2025-02-13	EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents	Rui Yang et.al.	2502.09560	null
2025-02-13	Explainable AI-assisted Optimization for Feynman Integral Reduction	Zhuo-Yang Song et.al.	2502.09544	null
2025-02-13	Mind the Gap! Choice Independence in Using Multilingual LLMs for Persuasive Co-Writing Tasks in Different Languages	Shreyan Biswas et.al.	2502.09532	null
2025-02-13	SQ-GAN: Semantic Image Communications Using Masked Vector Quantization	Francesco Pezone et.al.	2502.09520	link
2025-02-13	Diffusion Models for Molecules: A Survey of Methods and Tasks	Liang Wang et.al.	2502.09511	link
2025-02-13	EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling	Theodoros Kouzelis et.al.	2502.09509	null
2025-02-13	Improve LLM-based Automatic Essay Scoring with Linguistic Features	Zhaoyi Joey Hou et.al.	2502.09497	null
2025-02-13	Foundation Neural-Network Quantum States	Riccardo Rende et.al.	2502.09488	null
2025-02-13	Objective quantification of mood states using large language models	Jakub Onysk et.al.	2502.09487	null
2025-02-13	DiffRenderGAN: Addressing Training Data Scarcity in Deep Segmentation Networks for Quantitative Nanomaterial Analysis through Differentiable Rendering and Generative Modelling	Dennis Possart et.al.	2502.09477	null
2025-02-13	Transformer-Enhanced Variational Autoencoder for Crystal Structure Prediction	Ziyi Chen et.al.	2502.09423	null
2025-02-13	ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation	Rotem Shalev-Arkushin et.al.	2502.09411	null
2025-02-13	SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models	Daniel Fleischer et.al.	2502.09390	link
2025-02-13	Truth Knows No Language: Evaluating Truthfulness Beyond English	Blanca Calvo Figueras et.al.	2502.09387	null
2025-02-13	APT-LLM: Embedding-Based Anomaly Detection of Cyber Advanced Persistent Threats Using Large Language Models	Sidahmed Benabderrahmane et.al.	2502.09385	null
2025-02-13	LoRA Training Provably Converges to a Low-Rank Global Minimum or It Fails Loudly (But it Probably Won't Fail)	Junsu Kim et.al.	2502.09376	null
2025-02-13	Inverse problems with experiment-guided AlphaFold	Advaith Maddipatla et.al.	2502.09372	null
2025-02-13	Language Agents as Digital Representatives in Collective Decision-Making	Daniel Jarrett et.al.	2502.09369	null
2025-02-13	Machine learning for modelling unstructured grid data in computational physics: a review	Sibo Cheng et.al.	2502.09346	null
2025-02-13	ThunderServe: High-performance and Cost-efficient LLM Serving in Cloud Environments	Youhe Jiang et.al.	2502.09334	null
2025-02-13	Beyond English: The Impact of Prompt Translation Strategies across Languages and Tasks in Multilingual LLMs	Itai Mondshine et.al.	2502.09331	null
2025-02-13	Copilot Arena: A Platform for Code LLM Evaluation in the Wild	Wayne Chi et.al.	2502.09328	null
2025-02-13	A Benchmark for Crime Surveillance Video Analysis with Large Models	Haoran Chen et.al.	2502.09325	null
2025-02-13	A Judge-free LLM Open-ended Generation Benchmark Based on the Distributional Hypothesis	Kentaro Imajo et.al.	2502.09316	link
2025-02-13	When the LM misunderstood the human chuckled: Analyzing garden path effects in humans and language models	Samuel Joseph Amouyal et.al.	2502.09307	null
2025-02-13	Non-asymptotic Analysis of Diffusion Annealed Langevin Monte Carlo for Generative Modelling	Paula Cordero-Encinar et.al.	2502.09306	null
2025-02-13	KET-RAG: A Cost-Efficient Multi-Granular Indexing Framework for Graph-RAG	Yiqian Huang et.al.	2502.09304	link
2025-02-13	When do neural networks learn world models?	Tianren Zhang et.al.	2502.09297	null
2025-02-13	SparQLe: Speech Queries to Text Translation Through LLMs	Amirbek Djanibekov et.al.	2502.09284	null
2025-02-13	GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation	Hongyin Zhang et.al.	2502.09268	null
2025-02-13	AnomalyGFM: Graph Foundation Model for Zero/Few-shot Anomaly Detection	Hezhe Qiao et.al.	2502.09254	null
2025-02-13	From large language models to multimodal AI: A scoping review on the potential of generative AI in medicine	Lukas Buess et.al.	2502.09242	null
2025-02-13	OpenBench: A New Benchmark and Baseline for Semantic Navigation in Smart Logistics	Junhui Wang et.al.	2502.09238	null
2025-02-13	Reliable Conversational Agents under ASP Control that Understand Natural Language	Yankai Zeng et.al.	2502.09237	null
2025-02-13	Data2Concept2Text: An Explainable Multilingual Framework for Data Analysis Narration	Flavio Bertini et.al.	2502.09218	null
2025-02-13	LP-LM: No Hallucinations in Question Answering with Logic Programming	Katherine Wu et.al.	2502.09212	link
2025-02-13	Visual Graph Question Answering with ASP and LLMs for Language Parsing	Jakob Johannes Bauer et.al.	2502.09211	null
2025-02-13	On LLM-generated Logic Programs and their Inference Execution Methods	Paul Tarau et.al.	2502.09209	null
2025-02-13	Logical Lease Litigation: Prolog and LLMs for Rental Law Compliance in New York	Sanskar Sehgal et.al.	2502.09204	null
2025-02-13	XAInomaly: Explainable and Interpretable Deep Contractive Autoencoder for O-RAN Traffic Anomaly Detection	Osman Tugay Basaran et.al.	2502.09194	null
2025-02-13	Thinking beyond the anthropomorphic paradigm benefits LLM research	Lujain Ibrahim et.al.	2502.09192	null
2025-02-13	Matina: A Large-Scale 73B Token Persian Text Corpus	Sara Bourbour Hosseinbeigi et.al.	2502.09188	null
2025-02-13	RefineCoder: Iterative Improving of Large Language Models via Adaptive Critique Refinement for Code Generation	Changzhi Zhou et.al.	2502.09183	null
2025-02-13	FLAME: Flexible LLM-Assisted Moderation Engine	Ivan Bakulin et.al.	2502.09175	null
2025-02-13	Two-Stage Representation Learning for Analyzing Movement Behavior Dynamics in People Living with Dementia	Jin Cui et.al.	2502.09173	null
2025-02-13	Improving TCM Question Answering through Tree-Organized Self-Reflective Retrieval with LLMs	Chang Liu et.al.	2502.09156	null
2025-02-13	Finite-Time Analysis of Discrete-Time Stochastic Interpolants	Yuhao Liu et.al.	2502.09130	null
2025-02-13	One-shot Federated Learning Methods: A Practical Guide	Xiang Liu et.al.	2502.09104	null
2025-02-13	Bridging the Gap Between LLMs and Human Intentions: Progresses and Challenges in Instruction Understanding, Intention Reasoning, and Reliable Generation	Zongyu Chang et.al.	2502.09101	null
2025-02-13	Logical Reasoning in Large Language Models: A Survey	Hanmeng Liu et.al.	2502.09100	null
2025-02-13	Show Me the Work: Fact-Checkers' Requirements for Explainable Automated Fact-Checking	Greta Warren et.al.	2502.09083	null
2025-02-13	CoSER: Coordinating LLM-Based Persona Simulation of Established Roles	Xintao Wang et.al.	2502.09082	link
2025-02-13	Enhancing RAG with Active Learning on Conversation Records: Reject Incapables and Answer Capables	Xuzhao Geng et.al.	2502.09073	null
2025-02-13	Unleashing the Power of Large Language Model for Denoising Recommendation	Shuyao Wang et.al.	2502.09058	null
2025-02-13	An Open Recipe: Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging	Kunat Pipatanakul et.al.	2502.09056	null
2025-02-13	Game Theory Meets Large Language Models: A Systematic Survey	Haoran Sun et.al.	2502.09053	null
2025-02-13	Typhoon T1: An Open Thai Reasoning Model	Pittawat Taveekitworachai et.al.	2502.09042	null
2025-02-13	Implementation of a Fuzzy Relational Database. Case Study: Chilean Cardboard Industry in the Maule Region	Leoncio Jimenez et.al.	2502.09035	null
2025-02-13	MTDP: Modulated Transformer Diffusion Policy Model	Qianhao Wang et.al.	2502.09029	null
2025-02-13	EventSTR: A Benchmark Dataset and Baselines for Event Stream based Scene Text Recognition	Xiao Wang et.al.	2502.09020	link
2025-02-13	Diversity Enhances an LLM's Performance in RAG and Long-context Task	Zhchao Wang et.al.	2502.09017	null
2025-02-13	Hope vs. Hate: Understanding User Interactions with LGBTQ+ News Content in Mainstream US News Media through the Lens of Hope Speech	Jonathan Pofcher et.al.	2502.09004	null
2025-02-13	RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models	Quan Wei et.al.	2502.09003	null
2025-02-13	End-to-End triplet loss based fine-tuning for network embedding in effective PII detection	Rishika Kohli et.al.	2502.09002	null
2025-02-13	Task Generalization With AutoRegressive Compositional Structure: Can Learning From $D$ Tasks Generalize to $D^{T}$ Tasks?	Amirhesam Abedsoltan et.al.	2502.08991	null
2025-02-13	Prophet Inequalities for Bandits, Cabinets, and DAGs	Robin Bowers et.al.	2502.08976	null
2025-02-13	Medicine on the Edge: Comparative Performance Analysis of On-Device LLMs for Clinical Reasoning	Leon Nissen et.al.	2502.08954	link
2025-02-13	Structured Convergence in Large Language Model Representations via Hierarchical Latent Space Folding	Fenella Harcourt et.al.	2502.08947	null
2025-02-13	Beyond the Singular: The Essential Role of Multiple Generations in Effective Benchmark Evaluation and Analysis	Wenbo Zhang et.al.	2502.08943	null
2025-02-13	Escaping Collapse: The Strength of Weak Data for Large Language Model Training	Kareem Amin et.al.	2502.08924	null
2025-02-13	Self-Consistency of the Internal Reward Models Improves Self-Rewarding Language Models	Xin Zhou et.al.	2502.08922	null
2025-02-13	Detecting Malicious Concepts Without Image Generation in AIGC	Kun Xu et.al.	2502.08921	null
2025-02-13	InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU	Heejun Lee et.al.	2502.08910	null
2025-02-13	Towards Automated Fact-Checking of Real-World Claims: Exploring Task Formulation and Assessment with LLMs	Premtim Sahitaj et.al.	2502.08909	null
2025-02-13	Reinforced Large Language Model is a formal theorem prover	Zhiling Luo et.al.	2502.08908	link
2025-02-13	DiffoRA: Enabling Parameter-Efficient LLM Fine-Tuning via Differential Low-Rank Matrix Adaptation	Tangyu Jiang et.al.	2502.08905	null
2025-02-13	MIH-TCCT: Mitigating Inconsistent Hallucinations in LLMs via Event-Driven Text-Code Cyclic Training	Xinxin You et.al.	2502.08904	null
2025-02-13	3D-Grounded Vision-Language Framework for Robotic Task Planning: Automated Prompt Synthesis and Supervised Reasoning	Guoqin Tang et.al.	2502.08903	null
2025-02-13	Communication is All You Need: Persuasion Dataset Construction via Multi-LLM Communication	Weicheng Ma et.al.	2502.08896	null
2025-02-13	ShapeLib: designing a library of procedural 3D shape abstractions with Large Language Models	R. Kenny Jones et.al.	2502.08884	null
2025-02-13	Utilizing Pre-trained and Large Language Models for 10-K Items Segmentation	Hsin-Min Lu et.al.	2502.08875	null
2025-02-13	Harnessing Vision Models for Time Series Analysis: A Survey	Jingchao Ni et.al.	2502.08869	link
2025-02-13	A Systematic Evaluation of Generative Models on Tabular Transportation Data	Chengen Wang et.al.	2502.08856	link
2025-02-12	Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation	Mohammad Mahdi Abootorabi et.al.	2502.08826	link
2025-02-12	DejAIvu: Identifying and Explaining AI Art on the Web in Real-Time with Saliency Maps	Jocelyn Dzuong et.al.	2502.08821	link
2025-02-12	Can a Single Model Master Both Multi-turn Conversations and Tool Use? CALM: A Unified Conversational Agentic Language Model	Emre Can Acikgoz et.al.	2502.08820	null
2025-02-12	Lexical Manifold Reconfiguration in Large Language Models: A Novel Architectural Approach for Contextual Modulation	Koinis Vassilis et.al.	2502.08818	null
2025-02-12	Examining Multilingual Embedding Models Cross-Lingually Through LLM-Generated Adversarial Examples	Andrianos Michail et.al.	2502.08638	null
2025-02-12	Ensemble based approach to quantifying uncertainty of LLM based classifications	Srijith Rajamohan et.al.	2502.08631	null
2025-02-12	Continuous Cardiac Arrest Prediction in ICU using PPG Foundation Model	Saurabh Kataria et.al.	2502.08612	null
2025-02-12	Causal Analysis of ASR Errors for Children: Quantifying the Impact of Physiological, Cognitive, and Extrinsic Factors	Vishwanath Pratap Singh et.al.	2502.08587	null
2025-02-12	Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks	Ang Li et.al.	2502.08586	null
2025-02-12	Statistically validated projection of bipartite signed networks	Anna Gallo et.al.	2502.08567	null
2025-02-12	QA-Expand: Multi-Question Answer Generation for Enhanced Query Expansion in Information Retrieval	Wonduk Seo et.al.	2502.08557	null
2025-02-12	Human-Centric Foundation Models: Perception, Generation and Agentic Modeling	Shixiang Tang et.al.	2502.08556	link
2025-02-12	Fostering Appropriate Reliance on Large Language Models: The Role of Explanations, Sources, and Inconsistencies	Sunnie S. Y. Kim et.al.	2502.08554	null
2025-02-12	LLMs can implicitly learn from mistakes in-context	Lisa Alazraki et.al.	2502.08550	null
2025-02-12	LLM Pretraining with Continuous Concepts	Jihoon Tack et.al.	2502.08524	null
2025-02-12	FedMHO: Heterogeneous One-Shot Federated Learning Towards Resource-Constrained Edge Devices	Dezhong Yao et.al.	2502.08518	link
2025-02-12	The Paradox of Stochasticity: Limited Creativity and Computational Decoupling in Temperature-Varied LLM Outputs of Structured Fictional Data	Evgenii Evstafev et.al.	2502.08515	null
2025-02-12	Faithful, Unfaithful or Ambiguous? Multi-Agent Debate with Initial Stance for Summary Evaluation	Mahnaz Koupaee et.al.	2502.08514	link
2025-02-12	Measuring Diversity in Synthetic Datasets	Yuchang Zhu et.al.	2502.08512	link
2025-02-12	Explanation based In-Context Demonstrations Retrieval for Multilingual Grammatical Error Correction	Wei Li et.al.	2502.08507	link
2025-02-12	Salamandra Technical Report	Aitor Gonzalez-Agirre et.al.	2502.08489	link
2025-02-12	One-Shot Federated Learning with Classifier-Free Diffusion Models	Obaidullah Zaland et.al.	2502.08488	null
2025-02-12	Computed fingertip touch for the instrumental control of musical sound with an excursion on the computed retinal afterimage	Staas de Jong et.al.	2502.08471	null
2025-02-12	mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data	Haonan Chen et.al.	2502.08468	link
2025-02-12	From Haystack to Needle: Label Space Reduction for Zero-shot Classification	Nathan Vandemoortele et.al.	2502.08436	null
2025-02-12	IssueBench: Millions of Realistic Prompts for Measuring Issue Bias in LLM Writing Assistance	Paul Röttger et.al.	2502.08395	null
2025-02-12	ViLa-MIL: Dual-scale Vision-Language Multiple Instance Learning for Whole Slide Image Classification	Jiangbo Shi et.al.	2502.08391	link
2025-02-12	Top-Theta Attention: Sparsifying Transformers by Compensated Thresholding	Konstantin Berestizshevsky et.al.	2502.08363	link
2025-02-12	Systematic Knowledge Injection into Large Language Models via Diverse Augmentation for Domain-Specific RAG	Kushagra Bhushan et.al.	2502.08356	null
2025-02-12	Trustworthy GNNs with LLMs: A Systematic Review and Taxonomy	Ruizhan Xue et.al.	2502.08353	null
2025-02-12	Graph Foundation Models for Recommendation: A Comprehensive Survey	Bin Wu et.al.	2502.08346	null
2025-02-12	Foundation Models in Computational Pathology: A Review of Challenges, Opportunities, and Impact	Mohsin Bilal et.al.	2502.08333	null
2025-02-12	Modification and Generated-Text Detection: Achieving Dual Detection Capabilities for the Outputs of LLM by Watermark	Yuhang Cai et.al.	2502.08332	null
2025-02-12	Contextual Compression Encoding for Large Language Models: A Novel Framework for Multi-Layered Parameter Space Pruning	Barnaby Schmitt et.al.	2502.08323	null
2025-02-12	MultiProSE: A Multi-label Arabic Dataset for Propaganda, Sentiment, and Emotion Detection	Lubna Al-Henaki et.al.	2502.08319	null
2025-02-12	Word Synchronization Challenge: A Benchmark for Word Association Responses for LLMs	Tanguy Cazalets et.al.	2502.08312	null
2025-02-12	Unlocking Scaling Law in Industrial Recommendation Systems with a Three-step Paradigm based Large User Model	Bencheng Yan et.al.	2502.08309	null
2025-02-12	HDT: Hierarchical Discrete Transformer for Multivariate Time Series Forecasting	Shibo Feng et.al.	2502.08302	link
2025-02-12	Compromising Honesty and Harmlessness in Language Models via Deception Attacks	Laurène Vaugrante et.al.	2502.08301	null
2025-02-12	Improving Existing Optimization Algorithms with LLMs	Camilo Chacón Sartori et.al.	2502.08298	null
2025-02-12	Redefining Simplicity: Benchmarking Large Language Models from Lexical to Document Simplification	Jipeng Qiang et.al.	2502.08281	null
2025-02-12	MoLoRec: A Generalizable and Efficient Framework for LLM-Based Recommendation	Min Hou et.al.	2502.08271	null
2025-02-12	Exploring the Potential of Large Language Models to Simulate Personality	Maria Molchanova et.al.	2502.08265	link
2025-02-12	GenIAS: Generator for Instantiating Anomalies in time Series	Zahra Zamanzadeh Darban et.al.	2502.08262	null
2025-02-12	FixDrive: Automatically Repairing Autonomous Vehicle Driving Behaviour for $0.08 per Violation	Yang Sun et.al.	2502.08260	link
2025-02-12	Learning Human Skill Generators at Key-Step Levels	Yilu Wu et.al.	2502.08234	null
2025-02-12	Flow-of-Action: SOP Enhanced LLM-Based Multi-Agent System for Root Cause Analysis	Changhua Pei et.al.	2502.08224	null
2025-02-12	Memory Offloading for Large Language Model Inference with Latency SLO Guarantees	Chenxiang Ma et.al.	2502.08182	null
2025-02-12	Enhancing LLM Character-Level Manipulation via Divide and Conquer	Zhen Xiong et.al.	2502.08180	null
2025-02-12	ParetoRAG: Leveraging Sentence-Context Attention for Robust and Efficient Retrieval-Augmented Generation	Ruobing Yao et.al.	2502.08178	null
2025-02-12	SycEval: Evaluating LLM Sycophancy	Aaron Fanous et.al.	2502.08177	null
2025-02-12	Intention is All You Need: Refining Your Code from Your Intention	Qi Guo et.al.	2502.08172	null
2025-02-12	Force Matching with Relativistic Constraints: A Physics-Inspired Approach to Stable and Efficient Generative Modeling	Yang Cao et.al.	2502.08150	null
2025-02-12	ACCESS: A Benchmark for Abstract Causal Event Discovery and Reasoning	Vy Vo et.al.	2502.08148	null
2025-02-12	Democratizing AI: Open-source Scalable LLM Training on GPU-based Supercomputers	Siddharth Singh et.al.	2502.08145	null
2025-02-12	Bridging the Safety Gap: A Guardrail Pipeline for Trustworthy LLM Inferences	Shanshan Han et.al.	2502.08142	null
2025-02-12	LowRA: Accurate and Efficient LoRA Fine-Tuning of LLMs under 2 Bits	Zikai Zhou et.al.	2502.08141	null
2025-02-12	Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models	Sonam Gupta et.al.	2502.08130	null
2025-02-12	Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance	Lingfei Qian et.al.	2502.08127	link
2025-02-12	HuDEx: Integrating Hallucination Detection and Explainability for Enhancing the Reliability of LLM responses	Sujeong Lee et.al.	2502.08109	null
2025-02-12	Large language models perpetuate bias in palliative care: development and analysis of the Palliative Care Adversarial Dataset (PCAD)	Naomi Akhras et.al.	2502.08073	null
2025-02-12	On Mechanistic Circuits for Extractive Question-Answering	Samyadeep Basu et.al.	2502.08059	null
2025-02-12	Break the Checkbox: Challenging Closed-Style Evaluations of Cultural Alignment in LLMs	Mohsinul Kabir et.al.	2502.08045	null
2025-02-12	Franken-Adapter: Cross-Lingual Adaptation of LLMs by Embedding Surgery	Fan Jiang et.al.	2502.08037	null
2025-02-12	Stochastic Kinetics of Transcription: Analysis and Computation	Yuntao Lu et.al.	2502.08028	null
2025-02-12	Contextual Subspace Manifold Projection for Structural Refinement of Large Language Model Representations	Alistair Wren et.al.	2502.08026	null
2025-02-11	Speculate, then Collaborate: Fusing Knowledge of Language Models during Decoding	Ziyao Wang et.al.	2502.08020	null
2025-02-11	The Geometry of Prompting: Unveiling Distinct Mechanisms of Task Adaptation in Language Models	Artem Kirsanov et.al.	2502.08009	null
2025-02-11	An Interactive Framework for Implementing Privacy-Preserving Federated Learning: Experiments on Large Language Models	Kasra Ahmadi et.al.	2502.08008	link
2025-02-11	Towards Training One-Step Diffusion Models Without Distillation	Mingtian Zhang et.al.	2502.08005	null
2025-02-11	Universal Adversarial Attack on Aligned Multimodal LLMs	Temurbek Rahmatullaev et.al.	2502.07987	null
2025-02-11	Deep Semantic Graph Learning via LLM based Node Enhancement	Chuanqi Shi et.al.	2502.07982	null
2025-02-11	CIRCUIT: A Benchmark for Circuit Interpretation and Reasoning Capabilities of LLMs	Lejla Skelic et.al.	2502.07980	null
2025-02-11	From Hazard Identification to Controller Design: Proactive and LLM-Supported Safety Engineering for ML-Powered Systems	Yining Hong et.al.	2502.07974	null
2025-02-11	Caught in the Web of Words: Do LLMs Fall for Spin in Medical Literature?	Hye Sun Yun et.al.	2502.07963	null
2025-02-11	Accelerating Scientific Research Through a Multi-LLM Framework	Joaquin Ramirez-Medina et.al.	2502.07960	null
2025-02-11	Bridging HCI and AI Research for the Evaluation of Conversational SE Assistants	Jonan Richards et.al.	2502.07956	null
2025-02-11	Symbiotic Cooperation for Web Agents: Harnessing Complementary Strengths of Large and Small LLMs	Ruichen Zhang et.al.	2502.07942	null
2025-02-11	Discrete Markov Probabilistic Models	Le-Tuyet-Nhi Pham et.al.	2502.07939	null
2025-02-11	Distributed Approach to Haskell Based Applications Refactoring with LLMs Based Multi-Agent Systems	Shahbaz Siddeeq et.al.	2502.07928	null
2025-02-11	Sign Operator for Coping with Heavy-Tailed Noise: High Probability Convergence Bounds with Extensions to Distributed Optimization and Comparison Oracle	Nikita Kornilov et.al.	2502.07923	null
2025-02-11	Elevating Legal LLM Responses: Harnessing Trainable Logical Structures and Semantic Knowledge with Legal Reasoning	Rujing Yao et.al.	2502.07912	link
2025-02-11	DeepSeek on a Trip: Inducing Targeted Visual Hallucinations via Representation Vulnerabilities	Chashi Mahiul Islam et.al.	2502.07905	null
2025-02-11	Intelligent Legal Assistant: An Interactive Clarification System for Legal Question Answering	Rujing Yao et.al.	2502.07904	null
2025-02-11	HexGen-2: Disaggregated Generative Inference of LLMs in Heterogeneous Environment	Youhe Jiang et.al.	2502.07903	null
2025-02-11	TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation	Alex Jinpeng Wang et.al.	2502.07870	link
2025-02-11	TransMLA: Multi-head Latent Attention Is All You Need	Fanxu Meng et.al.	2502.07864	link
2025-02-11	BalanceKV: KV Cache Compression through Discrepancy Theory	Insu Han et.al.	2502.07861	null
2025-02-11	Pippo: High-Resolution Multi-View Humans from a Single Image	Yash Kant et.al.	2502.07785	null
2025-02-11	DarwinLM: Evolutionary Structured Pruning of Large Language Models	Shengkun Tang et.al.	2502.07780	null
2025-02-11	Stay-Positive: A Case for Ignoring Real Image Features in Fake Image Detection	Anirudh Sundara Rajan et.al.	2502.07778	null
2025-02-11	Auditing Prompt Caching in Language Model APIs	Chenchen Gu et.al.	2502.07776	link
2025-02-11	Automatic Robot Task Planning by Integrating Large Language Model with Genetic Programming	Azizjon Kobilov et.al.	2502.07772	null
2025-02-11	Great Power Brings Great Responsibility: Personalizing Conversational AI for Diverse Problem-Solvers	Italo Santos et.al.	2502.07763	null
2025-02-11	Scalable Fingerprinting of Large Language Models	Anshul Nasery et.al.	2502.07760	null
2025-02-11	Towards Efficient Optimizer Design for LLM via Structured Fisher Approximation with a Low-Rank Extension	Wenbo Gong et.al.	2502.07752	null
2025-02-11	WHODUNIT: Evaluation benchmark for culprit detection in mystery stories	Kshitij Gupta et.al.	2502.07747	link
2025-02-11	The Economics of Large Language Models: Token Allocation, Fine-Tuning, and Optimal Pricing	Dirk Bergemann et.al.	2502.07736	null
2025-02-11	Revisiting Non-Acyclic GFlowNets in Discrete Environments	Nikita Morozov et.al.	2502.07735	link
2025-02-11	Economics of Sourcing Human Data	Sebastin Santy et.al.	2502.07732	null
2025-02-11	Verifying LLM-Generated Code in the Context of Software Verification with Ada/SPARK	Marcos Cramer et.al.	2502.07728	null
2025-02-11	Near-Optimal Sample Complexity in Reward-Free Kernel-Based Reinforcement Learning	Aya Kayal et.al.	2502.07715	null
2025-02-11	Magic 1-For-1: Generating One Minute Video Clips within One Minute	Hongwei Yi et.al.	2502.07701	link
2025-02-11	A Framework for LLM-powered Design Assistants	Swaroop Panda et.al.	2502.07698	null
2025-02-11	Large Language Models as Proxies for Theories of Human Linguistic Cognition	Imry Ziv et.al.	2502.07687	null
2025-02-11	Steering Protein Family Design through Profile Bayesian Flow	Jingjing Gong et.al.	2502.07671	null
2025-02-11	Guiding Time-Varying Generative Models with Natural Gradients on Exponential Family Manifold	Song Liu et.al.	2502.07650	null
2025-02-11	SymGPT: Auditing Smart Contracts via Combining Symbolic Execution with Large Language Models	Shihao Xia et.al.	2502.07644	null
2025-02-11	FoQA: A Faroese Question-Answering Dataset	Annika Simonsen et.al.	2502.07642	null
2025-02-11	Distributional Instrumental Variable Method	Anastasiia Holovchak et.al.	2502.07641	link
2025-02-11	Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving	Yong Lin et.al.	2502.07640	link
2025-02-11	Consistency Training with Physical Constraints	Che-Chia Chang et.al.	2502.07636	null
2025-02-11	Exploring Mobile Touch Interaction with Large Language Models	Tim Zindulka et.al.	2502.07629	null
2025-02-11	Tractable Transformers for Flexible Conditional Generation	Anji Liu et.al.	2502.07616	null
2025-02-11	Beyond Prompting: Time2Lang -- Bridging Time-Series Foundation Models and Large Language Models for Health Sensing	Arvind Pillai et.al.	2502.07608	null
2025-02-11	Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models	Jiacong Xu et.al.	2502.07601	null
2025-02-11	Towards spatial computing: recent advances in multimodal natural interaction for XR headsets	Zhimin Wang et.al.	2502.07598	null
2025-02-11	SEMU: Singular Value Decomposition for Efficient Machine Unlearning	Marcin Sendera et.al.	2502.07587	null
2025-02-11	Generative Modeling with Bayesian Sample Inference	Marten Lienen et.al.	2502.07580	link
2025-02-11	PIM Is All You Need: A CXL-Enabled GPU-Free System for Large Language Model Inference	Yufeng Gu et.al.	2502.07578	link
2025-02-11	Automated Capability Discovery via Model Self-Exploration	Cong Lu et.al.	2502.07577	link
2025-02-11	JBShield: Defending Large Language Models from Jailbreak Attacks through Activated Concept Analysis and Manipulation	Shenyi Zhang et.al.	2502.07557	link
2025-02-11	O1 Embedder: Let Retrievers Think Before Action	Ruin Yan et.al.	2502.07555	null
2025-02-11	Grammar Control in Dialogue Response Generation for Language Learning Chatbots	Dominik Glandorf et.al.	2502.07544	link
2025-02-11	NatureLM: Deciphering the Language of Nature for Scientific Discovery	Yingce Xia et.al.	2502.07527	null
2025-02-11	The Devil is in the Prompts: De-Identification Traces Enhance Memorization Risks in Synthetic Chest X-Ray Generation	Raman Dutt et.al.	2502.07516	link
2025-02-11	Enhance-A-Video: Better Generated Video for Free	Yang Luo et.al.	2502.07508	link
2025-02-11	Towards THz-based Obstacle Sensing: A Generative Radio Environment Awareness Framework	Tianyu Hu et.al.	2502.07504	null
2025-02-11	Unified Graph Networks (UGN): A Deep Neural Framework for Solving Graph Problems	Rudrajit Dawn et.al.	2502.07500	null
2025-02-11	LLM-Sketch: Enhancing Network Sketches with LLM	Yuanpeng Li et.al.	2502.07495	link
2025-02-11	Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More	Xialie Zhuang et.al.	2502.07490	link
2025-02-11	Improving Adaptive Moment Optimization via Preconditioner Diagonalization	Son Nguyen et.al.	2502.07488	null
2025-02-11	ETimeline: An Extensive Timeline Generation Dataset based on Large Language Model	Xiaochen Liu et.al.	2502.07474	null
2025-02-11	JamendoMaxCaps: A Large Scale Music-caption Dataset with Imputed Metadata	Abhinaba Roy et.al.	2502.07461	link
2025-02-11	Logarithmic Regret for Online KL-Regularized Reinforcement Learning	Heyang Zhao et.al.	2502.07460	null
2025-02-11	PerCul: A Story-Driven Cultural Evaluation of LLMs in Persian	Erfan Moosavi Monazzah et.al.	2502.07459	null
2025-02-11	RusCode: Russian Cultural Code Benchmark for Text-to-Image Generation	Viacheslav Vasilev et.al.	2502.07455	link
2025-02-11	Forget What You Know about LLMs Evaluations - LLMs are Like a Chameleon	Nurit Cohen-Inger et.al.	2502.07445	link
2025-02-11	Towards a Foundation Model for Physics-Informed Neural Networks: Multi-PDE Learning with Active Sampling	Keon Vin Park et.al.	2502.07425	null
2025-02-11	RomanLens: Latent Romanization and its role in Multilinguality in LLMs	Alan Saji et.al.	2502.07424	null
2025-02-11	Entity Linking using LLMs for Automated Product Carbon Footprint Estimation	Steffen Castle et.al.	2502.07418	null
2025-02-11	EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering	Sheng Zhou et.al.	2502.07411	link
2025-02-11	MGPATH: Vision-Language Model with Multi-Granular Prompt Learning for Few-Shot WSI Classification	Anh-Tien Nguyen et.al.	2502.07409	link
2025-02-11	On Iterative Evaluation and Enhancement of Code Quality Using GPT-4o	Rundong Liu et.al.	2502.07399	link
2025-02-11	FinRL-DeepSeek: LLM-Infused Risk-Sensitive Reinforcement Learning for Trading Agents	Mostapha Benhenda et.al.	2502.07393	link
2025-02-11	LLMs Can Easily Learn to Reason from Demonstrations: Structure, not content, is what matters!	Dacheng Li et.al.	2502.07374	link
2025-02-11	EvoFlow: Evolving Diverse Agentic Workflows On The Fly	Guibin Zhang et.al.	2502.07373	null
2025-02-11	LongReD: Mitigating Short-Text Degradation of Long-Context Large Language Models via Restoration Distillation	Zican Dong et.al.	2502.07365	null
2025-02-11	Bridging the Evaluation Gap: Leveraging Large Language Models for Topic Model Evaluation	Zhiyin Tan et.al.	2502.07352	link
2025-02-11	KABB: Knowledge-Aware Bayesian Bandits for Dynamic Expert Coordination in Multi-Agent Systems	Jusheng Zhang et.al.	2502.07350	null
2025-02-11	BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models	Xu Huang et.al.	2502.07346	link
2025-02-11	Aligning Large Language Models to Follow Instructions and Hallucinate Less via Effective Data Filtering	Shuzheng Si et.al.	2502.07340	link
2025-02-11	Music for All: Exploring Multicultural Representations in Music Generation Models (Camera Ready)	Atharva Mehta et.al.	2502.07328	link
2025-02-11	Generative Ghost: Investigating Ranking Bias Hidden in AI-Generated Videos	Haowen Gao et.al.	2502.07327	null
2025-02-11	MEMIT-Merge: Addressing MEMIT's Key-Value Conflicts in Same-Subject Batch Editing for LLMs	Zilu Dong et.al.	2502.07322	null
2025-02-11	CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction	Junlong Li et.al.	2502.07316	link
2025-02-11	Prompt-Based Document Modifications In Ranking Competitions	Niv Bardas et.al.	2502.07315	null
2025-02-11	CreAgent: Towards Long-Term Evaluation of Recommender System under Platform-Creator Information Asymmetry	Xiaopeng Ye et.al.	2502.07307	link
2025-02-11	TRAVEL: Training-Free Retrieval and Alignment for Vision-and-Language Navigation	Navid Rajabi et.al.	2502.07306	null
2025-02-11	Flow Matching for Collaborative Filtering	Chengkai Liu et.al.	2502.07303	link
2025-02-11	Generation of Drug-Induced Cardiac Reactions towards Virtual Clinical Trials	Qian Shao et.al.	2502.07297	null
2025-02-11	Small Language Model Makes an Effective Long Text Extractor	Yelin Chen et.al.	2502.07286	link
2025-02-11	Articulate That Object Part (ATOP): 3D Part Articulation from Text and Motion Personalization	Aditya Vora et.al.	2502.07278	null
2025-02-11	Cost-Efficient Continual Learning with Sufficient Exemplar Memory	Dongkyu Cho et.al.	2502.07274	null
2025-02-11	GENERator: A Long-Context Generative Genomic Foundation Model	Wei Wu et.al.	2502.07272	null
2025-02-11	When More is Less: Understanding Chain-of-Thought Length in LLMs	Yuyang Wu et.al.	2502.07266	null
2025-02-11	DrugImproverGPT: A Large Language Model for Drug Optimization with Fine-Tuning via Structured Policy Optimization	Xuefeng Liu et.al.	2502.07237	null
2025-02-11	A Memory Efficient Randomized Subspace Optimization Method for Training Large Language Models	Yiming Chen et.al.	2502.07222	null
2025-02-11	MLLM4PUE: Toward Universal Embeddings in Computational Pathology through Multimodal LLMs	Qifeng Zhou et.al.	2502.07221	null
2025-02-11	LUNAR: LLM Unlearning via Neural Activation Redirection	William F. Shen et.al.	2502.07218	null
2025-02-11	Playmate: Flexible Control of Portrait Animation via 3D-Implicit Space Guided Diffusion	Xingpei Ma et.al.	2502.07203	null
2025-02-11	Provably Efficient RLHF Pipeline: A Unified View from Contextual Bandits	Long-Fei Li et.al.	2502.07193	link
2025-02-11	Bag of Tricks for Inference-time Computation of LLM Reasoning	Fan Liu et.al.	2502.07191	null
2025-02-11	A Large-Scale Benchmark for Vietnamese Sentence Paraphrases	Sang Quang Nguyen et.al.	2502.07188	link
2025-02-11	Refine Knowledge of Large Language Models via Adaptive Contrastive Learning	Yinghui Li et.al.	2502.07184	null
2025-02-11	Does Training on Synthetic Data Make Models Less Robust?	Lingze Zhang et.al.	2502.07164	null
2025-02-11	Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning	Feng Chen et.al.	2502.07154	link
2025-02-11	Ask Patients with Patience: Enabling LLMs for Human-Centric Medical Dialogue with Grounded Reasoning	Jiayuan Zhu et.al.	2502.07143	null
2025-02-11	Language-TPP: Integrating Temporal Point Processes with Language Models for Event Analysis	Quyu Kong et.al.	2502.07139	null
2025-02-10	Cardiverse: Harnessing LLMs for Novel Card Game Prototyping	Danrui Li et.al.	2502.07128	null
2025-02-10	Structural Reformation of Large Language Model Neuron Encapsulation for Divergent Information Aggregation	Denis Bakushev et.al.	2502.07124	null
2025-02-10	Online Scheduling for LLM Inference with KV Cache Constraints	Patrick Jaillet et.al.	2502.07115	null
2025-02-10	Generative Distribution Prediction: A Unified Approach to Multimodal Learning	Xinyu Tian et.al.	2502.07090	null
2025-02-10	Evaluating the Systematic Reasoning Abilities of Large Language Models through Graph Coloring	Alex Heyman et.al.	2502.07087	link
2025-02-10	MPFBench: A Large Scale Dataset for SciML of Multi-Phase-Flows: Droplet and Bubble Dynamics	Mehdi Shadkhah et.al.	2502.07080	null
2025-02-10	Multi-turn Evaluation of Anthropomorphic Behaviours in Large Language Models	Lujain Ibrahim et.al.	2502.07077	null
2025-02-10	IRepair: An Intent-Aware Approach to Repair Data-Driven Errors in Large Language Models	Sayem Mohammad Imtiaz et.al.	2502.07072	null
2025-02-10	Specializing Large Language Models to Simulate Survey Response Distributions for Global Populations	Yong Cao et.al.	2502.07068	link
2025-02-10	Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT	Dongyang Liu et.al.	2502.06782	null
2025-02-10	Enhancing Performance of Explainable AI Models with Constrained Concept Refinement	Geyu Liang et.al.	2502.06775	null
2025-02-10	Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions	Jaeyeon Kim et.al.	2502.06768	null
2025-02-10	Rationalization Models for Text-to-SQL	Gaetano Rossiello et.al.	2502.06759	null
2025-02-10	Accelerating Data Processing and Benchmarking of AI Models for Pathology	Andrew Zhang et.al.	2502.06750	link
2025-02-10	Gradient Multi-Normalization for Stateless and Scalable LLM Training	Meyer Scetbon et.al.	2502.06742	null
2025-02-10	VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data	Thomas Zeng et.al.	2502.06737	null
2025-02-10	Señorita-2M: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists	Bojia Zi et.al.	2502.06734	null
2025-02-10	Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining	Daouda Sow et.al.	2502.06733	null
2025-02-10	Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling	Runze Liu et.al.	2502.06703	link
2025-02-10	No Trick, No Treat: Pursuits and Challenges Towards Simulation-free Training of Neural Samplers	Jiajun He et.al.	2502.06685	null
2025-02-10	EquiTabPFN: A Target-Permutation Equivariant Prior Fitted Networks	Michael Arbel et.al.	2502.06684	null
2025-02-10	Boosting Self-Efficacy and Performance of Large Language Models via Verbal Efficacy Stimulations	Rui Chen et.al.	2502.06669	null
2025-02-10	Automatic Evaluation of Healthcare LLMs Beyond Question-Answering	Anna Arias-Duart et.al.	2502.06666	null
2025-02-10	Evaluation of Deep Audio Representations for Hearables	Fabian Gröger et.al.	2502.06664	null
2025-02-10	EfficientLLM: Scalable Pruning-Aware Pretraining for Architecture-Agnostic Edge Language Models	Xingrun Xing et.al.	2502.06663	null
2025-02-10	Unbiased Evaluation of Large Language Models from a Causal Perspective	Meilin Chen et.al.	2502.06655	null
2025-02-10	In-Context Learning (and Unlearning) of Length Biases	Stephanie Schoch et.al.	2502.06653	null
2025-02-10	Transparent NLP: Using RAG and LLM Alignment for Privacy Q&A	Anna Leschanowsky et.al.	2502.06652	null
2025-02-10	Automatic Annotation Augmentation Boosts Translation between Molecules and Natural Language	Zhiqiang Zhong et.al.	2502.06634	null
2025-02-10	Combining Large Language Models with Static Analyzers for Code Review Generation	Imen Jaoua et.al.	2502.06633	null
2025-02-10	Multi-Scale Feature Fusion with Image-Driven Spatial Integration for Left Atrium Segmentation from Cardiac MRI Images	Bipasha Kundu et.al.	2502.06615	null
2025-02-10	A Large-scale AI-generated Image Inpainting Benchmark	Paschalis Giakoumoglou et.al.	2502.06593	null
2025-02-10	Hephaestus: Improving Fundamental Agent Capabilities of Large Language Models through Continual Pre-Training	Yuchen Zhuang et.al.	2502.06589	null
2025-02-10	A Survey on Video Analytics in Cloud-Edge-Terminal Collaborative Systems	Linxiao Gong et.al.	2502.06581	null
2025-02-10	LawGPT: Knowledge-Guided Data Generation and Its Application to Legal LLM	Zhi Zhou et.al.	2502.06572	link
2025-02-10	Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation	Chengwen Qi et.al.	2502.06563	null
2025-02-10	Is API Access to LLMs Useful for Generating Private Synthetic Tabular Data?	Marika Swanberg et.al.	2502.06555	null
2025-02-10	Efficient Scientific Full Text Classification: The Case of EICAT Impact Assessments	Marc Felix Brinner et.al.	2502.06551	null
2025-02-10	Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning	Jean Vassoyan et.al.	2502.06533	null
2025-02-10	Properties of Wasserstein Gradient Flows for the Sliced-Wasserstein Distance	Christophe Vauthier et.al.	2502.06525	null
2025-02-10	GuideLLM: Exploring LLM-Guided Conversation with Applications in Autobiography Interviewing	Jinhao Duan et.al.	2502.06494	null
2025-02-10	Recent Advances in Discrete Speech Tokens: A Review	Yiwei Guo et.al.	2502.06490	null
2025-02-10	Adaptive Prompting: Ad-hoc Prompt Composition for Social Bias Detection	Maximilian Spliethöver et.al.	2502.06487	null
2025-02-10	WyckoffDiff - A Generative Diffusion Model for Crystal Symmetry	Filip Ekström Kelvinius et.al.	2502.06485	null
2025-02-10	UniMoD: Efficient Unified Multimodal Transformers with Mixture-of-Depths	Weijia Mao et.al.	2502.06474	null
2025-02-10	KARMA: Leveraging Multi-Agent LLMs for Automated Knowledge Graph Enrichment	Yuxing Lu et.al.	2502.06472	link
2025-02-10	A Survey of Theory of Mind in Large Language Models: Evaluations, Representations, and Safety Risks	Hieu Minh "Jord" Nguyen et.al.	2502.06470	null
2025-02-10	MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations	Kaixuan Huang et.al.	2502.06453	null
2025-02-10	FEMBA: Efficient and Scalable EEG Analysis with a Bidirectional Mamba Foundation Model	Anna Tegon et.al.	2502.06438	null
2025-02-10	Prompt-SID: Learning Structural Representation Prompt via Latent Diffusion for Single-Image Denoising	Huaqiu Li et.al.	2502.06432	null
2025-02-10	CoS: Chain-of-Shot Prompting for Long Video Understanding	Jian Hu et.al.	2502.06428	null
2025-02-10	Generating Privacy-Preserving Personalized Advice with Zero-Knowledge Proofs and LLMs	Hiroki Watanabe et.al.	2502.06425	null
2025-02-10	Occ-LLM: Enhancing Autonomous Driving with Occupancy-Based Large Language Models	Tianshuo Xu et.al.	2502.06419	null
2025-02-10	Systematic Outliers in Large Language Models	Yongqi An et.al.	2502.06415	null
2025-02-10	AppVLM: A Lightweight Vision Language Model for Online App Control	Georgios Papoudakis et.al.	2502.06395	null
2025-02-10	How Humans Help LLMs: Assessing and Incentivizing Human Preference Annotators	Shang Liu et.al.	2502.06387	null
2025-02-10	Simulation as Reality? The Effectiveness of LLM-Generated Data in Open-ended Question Assessment	Long Zhang et.al.	2502.06371	null
2025-02-10	Calibrating LLMs with Information-Theoretic Evidential Deep Learning	Yawei Li et.al.	2502.06351	link
2025-02-10	Can AI Examine Novelty of Patents?: Novelty Evaluation Based on the Correspondence between Patent Claim and Prior Art	Hayato Ikoma et.al.	2502.06316	null
2025-02-10	Latent Convergence Modulation in Large Language Models: A Novel Approach to Iterative Contextual Realignment	Patricia Porretta et.al.	2502.06302	null
2025-02-10	SeaExam and SeaBench: Benchmarking LLMs with Local Multilingual Questions in Southeast Asia	Chaoqun Liu et.al.	2502.06298	null
2025-02-10	Is an Ultra Large Natural Image-Based Foundation Model Superior to a Retina-Specific Model for Detecting Ocular and Systemic Diseases?	Qingshan Hou et.al.	2502.06289	null
2025-02-10	Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE	Haiduo Huang et.al.	2502.06282	link
2025-02-10	DebateBench: A Challenging Long Context Reasoning Benchmark For Large Language Models	Utkarsh Tiwari et.al.	2502.06279	null
2025-02-10	Emergent Response Planning in LLM	Zhichen Dong et.al.	2502.06258	null
2025-02-10	K-ON: Stacking Knowledge On the Head Layer of Large Language Model	Lingbing Guo et.al.	2502.06257	null
2025-02-10	Find Central Dogma Again	Wang Liang et.al.	2502.06253	null
2025-02-10	Amplifying Minority Voices: AI-Mediated Devil's Advocate System for Inclusive Group Decision-Making	Soohwan Lee et.al.	2502.06251	null
2025-02-10	PiKE: Adaptive Data Mixing for Multi-Task Learning Under Low Gradient Conflicts	Zeman Li et.al.	2502.06244	null
2025-02-10	Fully Exploiting Vision Foundation Model's Profound Prior Knowledge for Generalizable RGB-Depth Driving Scene Parsing	Sicen Guo et.al.	2502.06219	null
2025-02-10	LessLeak-Bench: A First Investigation of Data Leakage in LLMs Across 83 Software Engineering Benchmarks	Xin Zhou et.al.	2502.06215	null
2025-02-10	Unveiling the Capabilities of Large Language Models in Detecting Offensive Language with Annotation Disagreement	Junyu Lu et.al.	2502.06207	null
2025-02-10	C-3PO: Compact Plug-and-Play Proxy Optimization to Achieve Human-like Retrieval-Augmented Generation	Guoxin Chen et.al.	2502.06205	null
2025-02-10	Non-literal Understanding of Number Words by Language Models	Polina Tsvilodub et.al.	2502.06204	null
2025-02-10	Timing Matters: How Using LLMs at Different Timings Influences Writers' Perceptions and Ideation Outcomes in AI-Assisted Ideation	Peinuan Qin et.al.	2502.06197	null
2025-02-10	Can LLMs Replace Human Evaluators? An Empirical Study of LLM-as-a-Judge in Software Engineering	Ruiqi Wang et.al.	2502.06193	null
2025-02-10	Uncertainty-Aware Adaptation of Large Language Models for Protein-Protein Interaction Analysis	Sanket Jantre et.al.	2502.06173	null
2025-02-10	A Data-Efficient Pan-Tumor Foundation Model for Oncology CT Interpretation	Wenhui Lei et.al.	2502.06171	null
2025-02-10	Universal Approximation of Visual Autoregressive Transformers	Yifang Chen et.al.	2502.06167	null
2025-02-10	Scaling Public Health Text Annotation: Zero-Shot Learning vs. Crowdsourcing for Improved Efficiency and Labeling Accuracy	Kamyar Kazari et.al.	2502.06150	null
2025-02-10	Optimizing Knowledge Integration in Retrieval-Augmented Generation with Self-Selection	Yan Weng et.al.	2502.06148	null
2025-02-10	LegalViz: Legal Text Visualization by Text To Diagram Generation	Eri Onami et.al.	2502.06147	null
2025-02-10	LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs	Sumin An et.al.	2502.06139	null
2025-02-10	Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models	Ce Zhang et.al.	2502.06130	null
2025-02-10	Foundation Model of Electronic Medical Records for Adaptive Risk Estimation	Pawel Renc et.al.	2502.06124	null
2025-02-10	Task-driven Layerwise Additive Activation Intervention	Hieu Trung Nguyen et.al.	2502.06115	null
2025-02-10	CSR-Bench: Benchmarking LLM Agents in Deployment of Computer Science Research Repositories	Yijia Xiao et.al.	2502.06111	null
2025-02-10	RALLRec: Improving Retrieval Augmented Large Language Model Recommendation with Representation Learning	Jian Xu et.al.	2502.06101	link
2025-02-10	ConMeC: A Dataset for Metonymy Resolution with Common Nouns	Saptarshi Ghosh et.al.	2502.06087	link
2025-02-10	Physics-Guided Foundation Model for Scientific Discovery: An Application to Aquatic Science	Runlong Yu et.al.	2502.06084	link
2025-02-10	Debiasing Guidance for Discrete Diffusion with Sequential Monte Carlo	Cheuk Kit Lee et.al.	2502.06079	null
2025-02-09	Deconstructing Depression Stigma: Integrating AI-driven Data Collection and Analysis with Causal Knowledge Graphs	Han Meng et.al.	2502.06075	null
2025-02-09	Allegro-FM: Towards Equivariant Foundation Model for Exascale Molecular Dynamics Simulations	Ken-ichi Nomura et.al.	2502.06073	null
2025-02-09	Benchmarking Prompt Sensitivity in Large Language Models	Amirhossein Razavi et.al.	2502.06065	null
2025-02-09	Online Reward-Weighted Fine-Tuning of Flow Matching with Wasserstein Regularization	Jiajun Fan et.al.	2502.06061	null
2025-02-09	Benchmarking Prompt Engineering Techniques for Secure Code Generation with GPT Models	Marc Bruni et.al.	2502.06039	null
2025-02-09	Investigating Compositional Reasoning in Time Series Foundation Models	Willa Potosnak et.al.	2502.06037	link
2025-02-09	A Multimodal PDE Foundation Model for Prediction and Scientific Text Descriptions	Elisa Negrini et.al.	2502.06026	link
2025-02-09	Dual Caption Preference Optimization for Diffusion Models	Amir Saeidi et.al.	2502.06023	null
2025-02-09	Temporal Working Memory: Query-Guided Segment Refinement for Enhanced Multimodal Understanding	Xingjian Diao et.al.	2502.06020	link
2025-02-09	Media Bias Detector: Designing and Implementing a Tool for Real-Time Selection and Framing Bias Analysis in News Coverage	Jenny S Wang et.al.	2502.06009	null
2025-02-09	Analysis of LLM as a grammatical feature tagger for African American English	Rahul Porwal et.al.	2502.06004	null
2025-02-09	HamRaz: A Culture-Based Persian Conversation Dataset for Person-Centered Therapy Using LLM Agents	Mohammad Amin Abbasi et.al.	2502.05982	null
2025-02-09	$μ$nit Scaling: Simple and Scalable FP8 LLM Training	Saaketh Narayan et.al.	2502.05967	null
2025-02-09	Redefining Robot Generalization Through Interactive Intelligence	Sharmita Dey et.al.	2502.05963	null
2025-02-09	MetaChain: A Fully-Automated and Zero-Code Framework for LLM Agents	Jiabin Tang et.al.	2502.05957	null
2025-02-09	Cyri: A Conversational AI-based Assistant for Supporting the Human User in Detecting and Responding to Phishing Attacks	Antonio La Torre et.al.	2502.05951	null
2025-02-09	Acceleration Multiple Heads Decoding for LLM via Dynamic Tree Attention	Zhendong Zhang et.al.	2502.05947	null
2025-02-09	"Let the AI conspiracy begin..." Language Model coordination is just one inference-intervention away	Paul Darm et.al.	2502.05945	null
2025-02-07	Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy	Yunhang Shen et.al.	2502.05177	link
2025-02-07	Fillerbuster: Multi-View Scene Completion for Casual Captures	Ethan Weber et.al.	2502.05175	null
2025-02-07	NoLiMa: Long-Context Evaluation Beyond Literal Matching	Ali Modarressi et.al.	2502.05167	null
2025-02-07	Multitwine: Multi-Object Compositing with Text and Layout Control	Gemma Canet Tarrés et.al.	2502.05165	null
2025-02-07	DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails	Yihe Deng et.al.	2502.05163	link
2025-02-07	A Lightweight Method to Disrupt Memorized Sequences in LLM	Parjanya Prajakta Prashant et.al.	2502.05159	null
2025-02-07	Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation	Steffen Eger et.al.	2502.05151	null
2025-02-07	CodeSCM: Causal Analysis for Multi-Modal Code Generation	Mukur Gupta et.al.	2502.05150	link
2025-02-07	An Annotated Reading of 'The Singer of Tales' in the LLM Era	Kush R. Varshney et.al.	2502.05148	null
2025-02-07	Chest X-ray Foundation Model with Global and Local Representations Integration	Zefan Yang et.al.	2502.05142	link
2025-02-07	Latent Swap Joint Diffusion for Long-Form Audio Generation	Yusheng Dai et.al.	2502.05130	null
2025-02-07	Refining Integration-by-Parts Reduction of Feynman Integrals with Machine Learning	Matt von Hippel et.al.	2502.05121	null
2025-02-07	Flexible and Efficient Grammar-Constrained Decoding	Kanghee Park et.al.	2502.05111	null
2025-02-07	Lost in Time: Clock and Calendar Understanding Challenges in Multimodal LLMs	Rohit Saxena et.al.	2502.05092	null
2025-02-07	Mitigating Unintended Memorization with LoRA in Federated Learning for LLMs	Thierry Bossy et.al.	2502.05087	link
2025-02-07	Causality can systematically address the monsters under the bench(marks)	Felix Leeb et.al.	2502.05085	null
2025-02-07	ChallengeMe: An Adversarial Learning-enabled Text Summarization Framework	Xiaoyu Deng et.al.	2502.05084	null
2025-02-07	Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures	Tushar Pandey et.al.	2502.05078	link
2025-02-07	Beautiful Images, Toxic Words: Understanding and Addressing Offensive Text in Generated Images	Aditya Kumar et.al.	2502.05066	link
2025-02-07	nvAgent: Automated Data Visualization from Natural Language via Collaborative Agent Workflow	Geliang Ouyang et.al.	2502.05036	link
2025-02-07	Prospects for detecting generic fast-time features in the neutrino lightcurve of nearby supernovae in neutrino telescopes	Jakob Beise et.al.	2502.05024	null
2025-02-07	QuEST: Stable Training of LLMs with 1-Bit Weights and Activations	Andrei Panferov et.al.	2502.05003	link
2025-02-07	Aligning Black-box Language Models with Human Judgments	Gerrit J. J. van den Burg et.al.	2502.04997	null
2025-02-07	C2GM: Cascading Conditional Generation of Multi-scale Maps from Remote Sensing Images Constrained by Geographic Features	Chenxing Sun et.al.	2502.04991	null
2025-02-07	MoGraphGPT: Creating Interactive Scenes Using Modular LLM and Graphical Control	Hui Ye et.al.	2502.04983	null
2025-02-07	Enhancing Pre-Trained Decision Transformers with Prompt-Tuning Bandits	Finn Rietz et.al.	2502.04979	null
2025-02-07	Towards Multimodal Empathetic Response Generation: A Rich Text-Speech-Vision Avatar-based Benchmark	Han Zhang et.al.	2502.04976	null
2025-02-07	CoCoA: A Generalized Approach to Uncertainty Quantification by Integrating Confidence and Consistency of LLM Outputs	Roman Vashurin et.al.	2502.04964	null
2025-02-07	The Rising Threat to Emerging AI-Powered Search Engines	Zeren Luo et.al.	2502.04951	null
2025-02-07	Mobile Network-specialized Large Language Models for 6G: Architectures, Innovations, Challenges, and Future Trends	Abdelaali Chaoub et.al.	2502.04933	null
2025-02-07	Generative-enhanced optimization for knapsack problems: an industry-relevant study	Yelyzaveta Vodovozova et.al.	2502.04928	null
2025-02-07	Classification or Prompting: A Case Study on Legal Requirements Traceability	Romina Etezadi et.al.	2502.04916	null
2025-02-07	Goku: Flow Based Video Generative Foundation Models	Shoufa Chen et.al.	2502.04896	null
2025-02-07	A Foundational Brain Dynamics Model via Stochastic Optimal Control	Joonhyeong Park et.al.	2502.04892	null
2025-02-07	Training-free Task-oriented Grasp Generation	Jiaming Wang et.al.	2502.04873	null
2025-02-07	Advancing Wasserstein Convergence Analysis of Score-Based Models: Insights from Discretization and Second-Order Acceleration	Yifeng Yu et.al.	2502.04849	null
2025-02-07	Developmentally-plausible Working Memory Shapes a Critical Period for Language Acquisition	Masato Mita et.al.	2502.04795	null
2025-02-07	S$^2$-MAD: Breaking the Token Barrier to Enhance Multi-Agent Debate Efficiency	Yuting Zeng et.al.	2502.04790	null
2025-02-07	Probing Internal Representations of Multi-Word Verbs in Large Language Models	Hassane Kissane et.al.	2502.04789	null
2025-02-07	Enhancing SQL Injection Detection and Prevention Using Generative Models	Naga Sai Dasari et.al.	2502.04786	null
2025-02-07	SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning	Wanjia Zhao et.al.	2502.04780	link
2025-02-07	SeDi-Instruct: Enhancing Alignment of Language Models through Self-Directed Instruction Generation	Jungwoo Kim et.al.	2502.04774	null
2025-02-07	Enhancing Phishing Email Identification with Large Language Models	Catherine Lee et.al.	2502.04759	null
2025-02-07	Concept Navigation and Classification via Open Source Large Language Model Processing	Maël Kubli et.al.	2502.04756	null
2025-02-07	Every Software as an Agent: Blueprint and Case Study	Mengwei Xu et.al.	2502.04747	null
2025-02-07	PhyloVAE: Unsupervised Learning of Phylogenetic Trees via Variational Autoencoders	Tianyu Xie et.al.	2502.04730	link
2025-02-07	Generating Symbolic World Models via Test-time Scaling of Large Language Models	Zhouliang Yu et.al.	2502.04728	link
2025-02-07	Evaluating Text Style Transfer Evaluation: Are There Any Reliable Metrics?	Sourabrata Mukherjee et.al.	2502.04718	null
2025-02-07	Enhancing Impression Change Prediction in Speed Dating Simulations Based on Speakers' Personalities	Kazuya Matsuo et.al.	2502.04706	null
2025-02-07	STRIDE: Automating Reward Design, Deep Reinforcement Learning Training and Feedback Optimization in Humanoid Robotics Locomotion	Zhenwei Wu et.al.	2502.04692	null
2025-02-07	ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning	Yuwei Yin et.al.	2502.04689	link
2025-02-07	M-IFEval: Multilingual Instruction-Following Evaluation	Antoine Dussolle et.al.	2502.04688	link
2025-02-07	Learning Strategic Language Agents in the Werewolf Game with Iterative Latent Space Policy Optimization	Zelai Xu et.al.	2502.04686	null
2025-02-07	G2PDiffusion: Genotype-to-Phenotype Prediction with Diffusion Models	Mengdi Liu et.al.	2502.04684	null
2025-02-07	CALF-SBM: A Covariate-Assisted Latent Factor Stochastic Block Model	Sydney Louit et.al.	2502.04681	null
2025-02-07	LLM Query Scheduling with Prefix Reuse and Latency Constraints	Gregory Dexter et.al.	2502.04677	null
2025-02-07	AdParaphrase: Paraphrase Dataset for Analyzing Linguistic Features toward Generating Attractive Ad Texts	Soichiro Murakami et.al.	2502.04674	link
2025-02-07	Unveiling the Mechanisms of Explicit CoT Training: How Chain-of-Thought Enhances Reasoning Generalization	Xinhao Yao et.al.	2502.04667	link
2025-02-07	Enhancing Health Information Retrieval with RAG by Prioritizing Topical Relevance and Factual Accuracy	Rishabh Upadhyay et.al.	2502.04666	null
2025-02-07	Importance Sampling via Score-based Generative Models	Heasung Kim et.al.	2502.04646	null
2025-02-07	Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research	Junde Wu et.al.	2502.04644	link
2025-02-07	Confidence Elicitation: A New Attack Vector for Large Language Models	Brian Formento et.al.	2502.04643	null
2025-02-07	Contrastive Learning-Enhanced Large Language Models for Monolith-to-Microservice Decomposition	Khaled Sellami et.al.	2502.04604	null
2025-02-07	Extracting and Understanding the Superficial Knowledge in Alignment	Runjin Chen et.al.	2502.04602	link
2025-02-07	The $α$-Alternator: Dynamic Adaptation To Varying Noise Levels In Sequences Using The Vendi Score For Improved Robustness and Performance	Mohammad Reza Rezaei et.al.	2502.04593	null
2025-02-07	Position-aware Automatic Circuit Discovery	Tal Haklay et.al.	2502.04577	link
2025-02-06	My LLM might Mimic AAE -- But When Should it?	Sandra C. Sandoval et.al.	2502.04564	link
2025-02-06	Speeding up Speculative Decoding via Approximate Verification	Meiyu Zhong et.al.	2502.04557	null
2025-02-06	TruthFlow: Truthful LLM Generation via Representation Flow Correction	Hanyu Wang et.al.	2502.04556	null
2025-02-06	Contextual Gradient Flow Modeling for Large Language Model Generalization in Multi-Scale Feature Spaces	Daphne Quillington et.al.	2502.04548	null
2025-02-06	Group-Adaptive Threshold Optimization for Robust AI-Generated Text Detection	Minseok Jung et.al.	2502.04528	null
2025-02-06	Safety is Essential for Responsible Open-Ended Systems	Ivaxi Sheth et.al.	2502.04512	null
2025-02-06	ULPT: Prompt Tuning with Ultra-Low-Dimensional Optimization	Zijun Wu et.al.	2502.04501	null
2025-02-06	Verifiable Format Control for Large Language Model Generations	Zhaoyang Wang et.al.	2502.04498	null
2025-02-06	Multi-Agent Reinforcement Learning with Focal Diversity Optimization	Selim Furkan Tekin et.al.	2502.04492	link
2025-02-06	Building A Unified AI-centric Language System: analysis, framework and future work	Edward Hong Wang et.al.	2502.04488	null
2025-02-06	Active Task Disambiguation with LLMs	Katarzyna Kobalczyk et.al.	2502.04485	link
2025-02-06	The ML Supply Chain in the Era of Software 2.0: Lessons Learned from Hugging Face	Trevor Stalnaker et.al.	2502.04484	null
2025-02-06	Near-Optimal Sample Complexity for MDPs via Anchoring	Jongmin Lee et.al.	2502.04477	null
2025-02-06	ADIFF: Explaining audio difference using natural language	Soham Deshmukh et.al.	2502.04476	link
2025-02-06	Augmented Conditioning Is Enough For Effective Training Image Generation	Jiahui Chen et.al.	2502.04475	null
2025-02-06	Iterative Importance Fine-tuning of Diffusion Models	Alexander Denker et.al.	2502.04468	null
2025-02-06	FocalCodec: Low-Bitrate Speech Coding via Focal Modulation Networks	Luca Della Libera et.al.	2502.04465	null
2025-02-06	Training Language Models to Reason Efficiently	Daman Arora et.al.	2502.04463	link
2025-02-06	Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization	Yu-Neng Chuang et.al.	2502.04428	null
2025-02-06	Decoding AI Judgment: How LLMs Assess News Credibility and Bias	Edoardo Loru et.al.	2502.04426	null
2025-02-06	EmoBench-M: Benchmarking Emotional Intelligence for Multimodal Large Language Models	He Hu et.al.	2502.04424	null
2025-02-06	Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment	Zuyan Liu et.al.	2502.04328	link
2025-02-06	Can Grammarly and ChatGPT accelerate language change? AI-powered technologies and their impact on the English language: wordiness vs. conciseness	Karolina Rudnicka et.al.	2502.04324	null
2025-02-06	Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions	Yik Siu Chan et.al.	2502.04322	link
2025-02-06	ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features	Alec Helbling et.al.	2502.04320	link
2025-02-06	sshELF: Single-Shot Hierarchical Extrapolation of Latent Features for 3D Reconstruction from Sparse-Views	Eyvaz Najafli et.al.	2502.04318	null
2025-02-06	ChamaleonLLM: Batch-Aware Dynamic Low-Rank Adaptation via Inference-Time Clusters	Kamer Ali Yuksel et.al.	2502.04315	link
2025-02-06	ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization	Yinjie Wang et.al.	2502.04306	link
2025-02-06	MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation	Jinbo Xing et.al.	2502.04299	null
2025-02-06	Learning Real-World Action-Video Dynamics with Heterogeneous Masked Autoregression	Lirui Wang et.al.	2502.04296	null
2025-02-06	Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization	Yuanye Liu et.al.	2502.04295	link
2025-02-06	PILAF: Optimal Human Preference Sampling for Reward Modeling	Yunzhen Feng et.al.	2502.04270	null
2025-02-06	Efficient Randomized Experiments Using Foundation Models	Piersilvio De Bartolomeis et.al.	2502.04262	link
2025-02-06	Realistic Image-to-Image Machine Unlearning via Decoupling and Knowledge Retention	Ayush K. Varshney et.al.	2502.04260	null
2025-02-06	MAGA: MAssive Genre-Audience Reformulation to Pretraining Corpus Expansion	Xintong Hao et.al.	2502.04235	null
2025-02-06	Can LLMs Hack Enterprise Networks? Autonomous Assumed Breach Penetration-Testing Active Directory Networks	Andreas Happe et.al.	2502.04227	null
2025-02-06	Keep It Light! Simplifying Image Clustering Via Text-Free Adapters	Yicen Li et.al.	2502.04226	null
2025-02-06	Éclair -- Extracting Content and Layout with Integrated Reading Order for Documents	Ilia Karmanov et.al.	2502.04223	null
2025-02-06	Sports and Women's Sports: Gender Bias in Text Generation with Olympic Data	Laura Biester et.al.	2502.04218	null
2025-02-06	Algorithmic causal structure emerging through compression	Liang Wendong et.al.	2502.04210	null
2025-02-06	"Short-length" Adversarial Training Helps LLMs Defend "Long-length" Jailbreak Attacks: Theoretical and Empirical Evidence	Shaopeng Fu et.al.	2502.04204	link
2025-02-06	The Best Instruction-Tuning Data are Those That Fit	Dylan Zhang et.al.	2502.04194	null
2025-02-06	PixFoundation: Are We Heading in the Right Direction with Pixel-level Vision Foundation Models?	Mennatullah Siam et.al.	2502.04192	link
2025-02-06	Automated Microservice Pattern Instance Detection Using Infrastructure-as-Code Artifacts and Large Language Models	Carlos Eduardo Duarte et.al.	2502.04188	null
2025-02-06	Multi-agent Architecture Search via Agentic Supernet	Guibin Zhang et.al.	2502.04180	null
2025-02-06	MRAMG-Bench: A BeyondText Benchmark for Multimodal Retrieval-Augmented Multimodal Generation	Qinhan Yu et.al.	2502.04176	null
2025-02-06	Diffusion-based mass map reconstruction from weak lensing data	Supranta S. Boruah et.al.	2502.04158	null
2025-02-06	UltraIF: Advancing Instruction Following from the Wild	Kaikai An et.al.	2502.04153	null
2025-02-06	The Order Effect: Investigating Prompt Sensitivity in Closed-Source LLMs	Bryan Guan et.al.	2502.04134	null
2025-02-06	Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis	Zhen Ye et.al.	2502.04128	null
2025-02-06	Generative Adversarial Networks Bridging Art and Machine Intelligence	Junhao Song et.al.	2502.04116	null
2025-02-06	VTutor: An Open-Source SDK for Generative AI-Powered Animated Pedagogical Agents with Multi-Media Output	Eason Chen et.al.	2502.04103	null
2025-02-06	LLMs to Support a Domain Specific Knowledge Assistant	Maria-Flavia Lovin et.al.	2502.04095	null
2025-02-06	AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference	Qingyue Yang et.al.	2502.04077	null
2025-02-06	Content-Rich AIGC Video Quality Assessment via Intricate Text Alignment and Motion-Aware Consistency	Shangkun Sun et.al.	2502.04076	link
2025-02-06	Predicting Large Language Model Capabilities on Closed-Book QA Tasks Using Only Information Available Prior to Training	Changhao Jiang et.al.	2502.04066	null
2025-02-06	TQ-DiT: Efficient Time-Aware Quantization for Diffusion Transformers	Younghye Hwang et.al.	2502.04056	null
2025-02-06	Exploring Imbalanced Annotations for Effective In-Context Learning	Hongfu Gao et.al.	2502.04037	null
2025-02-06	Fine, I'll Merge It Myself: A Multi-Fidelity Framework for Automated Model Merging	Guinan Su et.al.	2502.04030	null
2025-02-06	Echo-Teddy: Preliminary Design and Development of Large Language Model-based Social Robot for Autistic Students	Unggi Lee et.al.	2502.04029	null
2025-02-06	Quantification of Biodiversity from Historical Survey Text with LLM-based Best-Worst Scaling	Thomas Haider et.al.	2502.04022	null
2025-02-06	Automating a Complete Software Test Process Using LLMs: An Automotive Case Study	Shuai Wang et.al.	2502.04008	null
2025-02-06	CAD-Editor: A Locate-then-Infill Framework with Automated Training Data Synthesis for Text-Based CAD Editing	Yu Yuan et.al.	2502.03997	null
2025-02-06	Ontology-Guided, Hybrid Prompt Learning for Generalization in Knowledge Graph Question Answering	Longquan Jiang et.al.	2502.03992	link
2025-02-06	Tight Bounds on Jensen's Gap: Novel Approach with Applications in Generative Modeling	Marcin Mazur et.al.	2502.03988	null
2025-02-06	MultiFloodSynth: Multi-Annotated Flood Synthetic Dataset Generation	YoonJe Kang et.al.	2502.03966	null
2025-02-06	MAQInstruct: Instruction-based Unified Event Relation Extraction	Jun Xu et.al.	2502.03954	null
2025-02-06	LR0.FM: Low-Resolution Zero-shot Classification Benchmark For Foundation Models	Priyank Pathak et.al.	2502.03950	link
2025-02-06	Afrispeech-Dialog: A Benchmark Dataset for Spontaneous English Conversations in Healthcare and Beyond	Mardhiyah Sanni et.al.	2502.03945	null
2025-02-06	Unravelling Causal Genetic Biomarkers of Alzheimer's Disease via Neuron to Gene-token Backtracking in Neural Architecture: A Groundbreaking Reverse-Gene-Finder Approach	Victor OK Li et.al.	2502.03938	null
2025-02-06	Quantifying Correlations of Machine Learning Models	Yuanyuan Li et.al.	2502.03937	link
2025-02-06	HEP-JEPA: A foundation model for collider physics using joint embedding predictive architecture	Jai Bardhan et.al.	2502.03933	null
2025-02-06	Experiments with Large Language Models on Retrieval-Augmented Generation for Closed-Source Simulation Software	Andreas Baumann et.al.	2502.03916	null
2025-02-06	No Free Lunch in Annotation either: An objective evaluation of foundation models for streamlining annotation in animal tracking	Emil Mededovic et.al.	2502.03907	link
2025-02-06	LeAP: Consistent multi-domain 3D labeling using Foundation Models	Simon Gebraad et.al.	2502.03901	null
2025-02-06	InfinitePOD: Building Datacenter-Scale High-Bandwidth Domain for LLM with Optical Circuit Switching Transceivers	Chenchen Shou et.al.	2502.03885	null
2025-02-06	Rank Also Matters: Hierarchical Configuration for Mixture of Adapter Experts in LLM Fine-Tuning	Peizhuang Cong et.al.	2502.03884	null
2025-02-06	BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation	Bo Pang et.al.	2502.03860	null
2025-02-06	PAGNet: Pluggable Adaptive Generative Networks for Information Completion in Multi-Agent Communication	Zhuohui Zhang et.al.	2502.03845	null
2025-02-06	Improving Natural Language Understanding for LLMs via Large-Scale Instruction Synthesis	Lin Yuan et.al.	2502.03843	null
2025-02-06	FairT2I: Mitigating Social Bias in Text-to-Image Generation via Large Language Model-Assisted Detection and Attribute Rebalancing	Jinya Sakurai et.al.	2502.03826	null
2025-02-06	Synthetic Poisoning Attacks: The Impact of Poisoned MRI Image on U-Net Brain Tumor Segmentation	Tianhao Li et.al.	2502.03825	null
2025-02-06	PsyPlay: Personality-Infused Role-Playing Conversational Agents	Tao Yang et.al.	2502.03821	null
2025-02-06	Large Language Models for Multi-Robot Systems: A Survey	Peihan Li et.al.	2502.03814	null
2025-02-06	Identify Critical KV Cache in LLM Inference from an Output Perturbation Perspective	Yuan Feng et.al.	2502.03805	link
2025-02-06	Understanding and Supporting Formal Email Exchange by Answering AI-Generated Questions	Yusuke Miura et.al.	2502.03804	null
2025-02-06	Enhancing Hallucination Detection through Noise Injection	Litian Liu et.al.	2502.03799	null
2025-02-06	Distribution learning via neural differential equations: minimal energy regularization and approximation theory	Youssef Marzouk et.al.	2502.03795	null
2025-02-06	It's All in The [MASK]: Simple Instruction-Tuning Enables BERT-like Masked Language Models As Generative Classifiers	Benjamin Clavié et.al.	2502.03793	null
2025-02-06	Iterate to Accelerate: A Unified Framework for Iterative Reasoning and Feedback Convergence	Jacob Fein-Ashley et.al.	2502.03787	null
2025-02-06	GistVis: Automatic Generation of Word-scale Visualizations from Data-rich Documents	Ruishi Zou et.al.	2502.03784	link
2025-02-06	Adaptive Semantic Prompt Caching with VectorQ	Luis Gaspar Schroeder et.al.	2502.03771	null
2025-02-06	Hierarchical Contextual Manifold Alignment for Structuring Latent Representations in Large Language Models	Meiquan Dong et.al.	2502.03766	null
2025-02-06	Rethinking the Residual Distribution of Locate-then-Editing Methods in Model Editing	Xiaopeng Li et.al.	2502.03748	null
2025-02-06	Speaking the Language of Teamwork: LLM-Guided Credit Assignment in Multi-Agent Reinforcement Learning	Muhan Lin et.al.	2502.03723	null
2025-02-06	Boosting Knowledge Graph-based Recommendations through Confidence-Aware Augmentation with Large Language Models	Rui Cai et.al.	2502.03715	null
2025-02-06	MultiQ&A: An Analysis in Measuring Robustness via Automated Crowdsourcing of Question Perturbations and Answers	Nicole Cho et.al.	2502.03711	null
2025-02-06	Aggregate and conquer: detecting and steering LLM concepts by combining nonlinear predictors over multiple layers	Daniel Beaglehole et.al.	2502.03708	null
2025-02-06	LLM Alignment as Retriever Optimization: An Information Retrieval Perspective	Bowen Jin et.al.	2502.03699	null
2025-02-06	A Comparison of DeepSeek and Other LLMs	Tianchen Gao et.al.	2502.03688	null
2025-02-06	Conditional Diffusion Models are Medical Image Classifiers that Provide Explainability and Uncertainty for Free	Gian Mario Favero et.al.	2502.03687	null
2025-02-06	Controlled LLM Decoding via Discrete Auto-regressive Biasing	Patrick Pynadath et.al.	2502.03685	null
2025-02-05	Reflection-Window Decoding: Text Generation with Selective Refinement	Zeyu Tang et.al.	2502.03678	null
2025-02-05	Advancing Reasoning in Large Language Models: Promising Methods and Approaches	Avinash Patil et.al.	2502.03671	null
2025-02-05	Unrealized Expectations: Comparing AI Methods vs Classical Algorithms for Maximum Independent Set	Yikai Wu et.al.	2502.03669	null
2025-02-05	Privacy-Preserving Generative Models: A Comprehensive Survey	Debalina Padariya et.al.	2502.03668	null
2025-02-05	Context-Preserving Gradient Modulation for Large Language Models: A Novel Approach to Semantic Consistency in Long-Form Text Generation	Nirola Kobanov et.al.	2502.03643	null
2025-02-05	SymmCD: Symmetry-Preserving Crystal Generation with Diffusion Models	Daniel Levy et.al.	2502.03638	link
2025-02-05	AdaPhish: AI-Powered Adaptive Defense and Education Resource Against Deceptive Emails	Rei Meguro et.al.	2502.03622	null
2025-02-05	Bilevel ZOFO: Bridging Parameter-Efficient and Zeroth-Order Techniques for Efficient LLM Fine-Tuning and Meta-Training	Reza Shirkavand et.al.	2502.03604	null
2025-02-05	HACK: Homomorphic Acceleration via Compression of the Key-Value Cache for Disaggregated LLM Inference	Zeyu Zhang et.al.	2502.03589	null
2025-02-05	A Mixed-Methods Evaluation of LLM-Based Chatbots for Menopause	Roshini Deva et.al.	2502.03579	null
2025-02-05	Code Simulation as a Proxy for High-order Tasks in Large Language Models	Emanuele La Malfa et.al.	2502.03568	null
2025-02-05	Kronecker Mask and Interpretive Prompts are Language-Action Video Learners	Jingyi Yang et.al.	2502.03549	link
2025-02-05	YINYANG-ALIGN: Benchmarking Contradictory Objectives and Proposing Multi-Objective Optimization based DPO for Text-to-Image Alignment	Amitava Das et.al.	2502.03512	null
2025-02-05	Do Large Language Model Benchmarks Test Reliability?	Joshua Vendrow et.al.	2502.03461	link
2025-02-05	Adapt-Pruner: Adaptive Structural Pruning for Efficient Small Language Model Training	Boyao Wang et.al.	2502.03460	null
2025-02-05	A Schema-Guided Reason-while-Retrieve framework for Reasoning on Scene Graphs with Large-Language-Models (LLMs)	Yiye Chen et.al.	2502.03450	null
2025-02-05	Dress-1-to-3: Single Image to Simulation-Ready 3D Outfit with Diffusion Prior and Differentiable Physics	Xuan Li et.al.	2502.03449	null
2025-02-05	BFS-Prover: Scalable Best-First Tree Search for LLM-based Automatic Theorem Proving	Ran Xin et.al.	2502.03438	null
2025-02-05	Taking a Big Step: Large Learning Rates in Denoising Score Matching Prevent Memorization	Yu-Han Wu et.al.	2502.03435	null
2025-02-05	On Fairness of Unified Multimodal Large Language Model for Image Generation	Ming Liu et.al.	2502.03429	null
2025-02-05	Harnessing Large Language Models for Curated Code Reviews	Oussama Ben Sghaier et.al.	2502.03425	link
2025-02-05	Can Text-to-Image Generative Models Accurately Depict Age? A Comparative Study on Synthetic Portrait Generation and Age Estimation	Alexey A. Novikov et.al.	2502.03420	null
2025-02-05	Think or Step-by-Step? UnZIPping the Black Box in Zero-Shot Prompts	Nikta Gohari Sadr et.al.	2502.03418	null
2025-02-05	SPRI: Aligning Large Language Models with Context-Situated Principles	Hongli Zhan et.al.	2502.03397	null
2025-02-05	Benchmarking Time Series Forecasting Models: From Statistical Techniques to Foundation Models in Real-World Applications	Issar Arab et.al.	2502.03395	null
2025-02-05	LIMO: Less is More for Reasoning	Yixin Ye et.al.	2502.03387	link
2025-02-05	Transformers and Their Roles as Time Series Foundation Models	Dennis Wu et.al.	2502.03383	null
2025-02-05	Demystifying Long Chain-of-Thought Reasoning in LLMs	Edward Yeo et.al.	2502.03373	link
2025-02-05	PalimpChat: Declarative and Interactive AI analytics	Chunwei Liu et.al.	2502.03368	null
2025-02-05	RadVLM: A Multitask Conversational Vision-Language Model for Radiology	Nicolas Deperrois et.al.	2502.03333	null
2025-02-05	ECM: A Unified Electronic Circuit Model for Explaining the Emergence of In-Context Learning and Chain-of-Thought in Large Language Model	Qiguang Chen et.al.	2502.03325	null
2025-02-05	Out-of-Distribution Detection using Synthetic Data Generation	Momin Abbas et.al.	2502.03323	null
2025-02-05	Simplifying Formal Proof-Generating Models with ChatGPT and Basic Searching Techniques	Sangjun Han et.al.	2502.03321	null
2025-02-05	Intent Representation Learning with Large Language Model for Recommendation	Yu Wang et.al.	2502.03307	link
2025-02-05	Harmony in Divergence: Towards Fast, Accurate, and Memory-efficient Zeroth-order LLM Fine-tuning	Qitao Tan et.al.	2502.03304	null
2025-02-05	MeDiSumQA: Patient-Oriented Question-Answer Generation from Discharge Letters	Amin Dada et.al.	2502.03298	null
2025-02-05	SymAgent: A Neural-Symbolic Self-Learning Agent Framework for Complex Reasoning over Knowledge Graphs	Ben Liu et.al.	2502.03283	null
2025-02-05	Posterior SBC: Simulation-Based Calibration Checking Conditional on Data	Teemu Säilynoja et.al.	2502.03279	link
2025-02-05	Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning	DiJia Su et.al.	2502.03275	null
2025-02-05	ZISVFM: Zero-Shot Object Instance Segmentation in Indoor Robotic Environments with Vision Foundation Models	Ying Zhang et.al.	2502.03266	link
2025-02-05	General Time-series Model for Universal Knowledge Representation of Multivariate Time-Series data	Cheng He et.al.	2502.03264	null
2025-02-05	CARROT: A Cost Aware Rate Optimal Router	Seamus Somerstep et.al.	2502.03261	null
2025-02-05	RiemannGFM: Learning a Graph Foundation Model from Riemannian Geometry	Li Sun et.al.	2502.03251	null
2025-02-05	Exploring the Security Threats of Knowledge Base Poisoning in Retrieval-Augmented Code Generation	Bo Lin et.al.	2502.03233	null
2025-02-05	Improve Decoding Factuality by Token-wise Cross Layer Entropy of Large Language Models	Jialiang Wu et.al.	2502.03199	null
2025-02-05	MaxInfo: A Training-Free Key-Frame Selection Method Using Maximum Volume for Enhanced Video Understanding	Pengyi Li et.al.	2502.03183	null
2025-02-05	PICBench: Benchmarking LLMs for Photonic Integrated Circuits Design	Yuchao Wu et.al.	2502.03159	null
2025-02-05	Strategizing with AI: Insights from a Beauty Contest Experiment	Iuliia Alekseenko et.al.	2502.03158	null
2025-02-05	Scalable In-Context Learning on Tabular Data via Retrieval-Augmented Large Language Models	Xumeng Wen et.al.	2502.03147	null
2025-02-05	Symmetry-Aware Bayesian Flow Networks for Crystal Generation	Laura Ruple et.al.	2502.03146	null
2025-02-05	Teaching Large Language Models Number-Focused Headline Generation With Key Element Rationales	Zhen Qian et.al.	2502.03129	null
2025-02-05	Metis: A Foundation Speech Generation Model with Masked Generative Pre-training	Yuancheng Wang et.al.	2502.03128	link
2025-02-05	Structured Token Retention and Computational Memory Paths in Large Language Models	Jonathan Delena et.al.	2502.03102	null
2025-02-05	Reveal the Mystery of DPO: The Connection between DPO and RL Algorithms	Xuerui Su et.al.	2502.03095	null
2025-02-05	Implementing Large Quantum Boltzmann Machines as Generative AI Models for Dataset Balancing	Salvatore Sinno et.al.	2502.03086	null
2025-02-05	IAO Prompting: Making Knowledge Flow Explicit in LLMs through Structured Reasoning Templates	Aissatou Diallo et.al.	2502.03080	null
2025-02-05	Poisson Flow Joint Model for Multiphase contrast-enhanced CT	Rongjun Ge et.al.	2502.03079	null
2025-02-05	Automatic Prompt Optimization Techniques: Exploring the Potential for Synthetic Data Generation	Nina Freise et.al.	2502.03078	null
2025-02-05	Optimizing Electric Vehicles Charging using Large Language Models and Graph Neural Networks	Stavros Orfanoudakis et.al.	2502.03067	null
2025-02-05	Understanding and Enhancing the Transferability of Jailbreaking Attacks	Runqi Lin et.al.	2502.03052	link
2025-02-05	RepLoRA: Reparameterizing Low-Rank Adaptation via the Perspective of Mixture of Experts	Tuan Truong et.al.	2502.03044	null
2025-02-05	Large Language Models Are Universal Recommendation Learners	Junguang Jiang et.al.	2502.03041	null
2025-02-05	FuXi-$α$: Scaling Recommendation Model with Feature Interaction Enhanced Transformer	Yufei Ye et.al.	2502.03036	null
2025-02-05	Knowledge Distillation from Large Language Models for Household Energy Modeling	Mohannad Takrouri et.al.	2502.03034	null
2025-02-05	Analyze Feature Flow to Enhance Interpretation and Steering in Language Models	Daniil Laptev et.al.	2502.03032	null
2025-02-05	Scaling Laws for Upcycling Mixture-of-Experts Language Models	Seng Pei Liew et.al.	2502.03009	null
2025-02-05	MedBioLM: Optimizing Medical and Biological QA with Fine-Tuned Large Language Models and Retrieval-Augmented Generation	Seonok Kim et.al.	2502.03004	null
2025-02-05	Training an LLM-as-a-Judge Model: Pipeline, Insights, and Practical Lessons	Renjun Hu et.al.	2502.02988	null
2025-02-05	Membership Inference Attack Should Move On to Distributional Statistics for Distilled Generative Models	Muxing Li et.al.	2502.02970	null
2025-02-05	The Labeled Coupon Collector Problem with Random Sample Sizes and Partial Recovery	Shoham Shimon Berrebi et.al.	2502.02968	null
2025-02-05	Large Language Model Adversarial Landscape Through the Lens of Attack Objectives	Nan Wang et.al.	2502.02960	null
2025-02-05	Position: Editing Large Language Models Poses Serious Safety Risks	Paul Youssef et.al.	2502.02958	null
2025-02-05	Control Search Rankings, Control the World: What is a Good Search Engine?	Simon Coghlan et.al.	2502.02957	null
2025-02-05	LLM-KT: Aligning Large Language Models with Knowledge Tracing using a Plug-and-Play Instruction	Ziwei Wang et.al.	2502.02945	null
2025-02-05	Large Language Model Guided Self-Debugging Code Generation	Muntasir Adnan et.al.	2502.02928	null
2025-02-05	SPARC: Subspace-Aware Prompt Adaptation for Robust Continual Learning in LLMs	Dinithi Jayasuriya et.al.	2502.02909	null
2025-02-05	AI-driven materials design: a mini-review	Mouyang Cheng et.al.	2502.02905	null
2025-02-05	A Benchmark for the Detection of Metalinguistic Disagreements between LLMs and Knowledge Graphs	Bradley P. Allen et.al.	2502.02896	null
2025-02-05	Lowering the Barrier of Machine Learning: Achieving Zero Manual Labeling in Review Classification Using LLMs	Yejian Zhang et.al.	2502.02893	null
2025-02-05	Expertized Caption Auto-Enhancement for Video-Text Retrieval	Junxiang Chen et.al.	2502.02885	null
2025-02-05	SensorChat: Answering Qualitative and Quantitative Questions during Long-Term Multimodal Sensor Interactions	Xiaofan Yu et.al.	2502.02883	null
2025-02-05	Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning	Yibo Yan et.al.	2502.02871	null
2025-02-05	A Systematic Approach for Assessing Large Language Models' Test Case Generation Capability	Hung-Fu Chang et.al.	2502.02866	null
2025-02-05	OceanChat: The Effect of Virtual Conversational AI Agents on Sustainable Attitude and Behavior Change	Pat Pataranutaporn et.al.	2502.02863	null
2025-02-05	A Survey of Sample-Efficient Deep Learning for Change Detection in Remote Sensing: Tasks, Strategies, and Challenges	Lei Ding et.al.	2502.02835	null
2025-02-05	COFFE: A Code Efficiency Benchmark for Code Generation	Yun Peng et.al.	2502.02827	link
2025-02-05	Accessible and Portable LLM Inference by Compiling Computational Graphs into SQL	Wenbo Sun et.al.	2502.02818	null
2025-02-05	Mol-LLM: Generalist Molecular LLM with Improved Graph Utilization	Chanhui Lee et.al.	2502.02810	null
2025-02-05	CAMI: A Counselor Agent Supporting Motivational Interviewing through State Inference and Topic Exploration	Yizhe Yang et.al.	2502.02807	null
2025-02-05	Leveraging the true depth of LLMs	Ramón Calvo González et.al.	2502.02790	null
2025-02-05	Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation	Jingyu Liu et.al.	2502.02789	link
2025-02-05	SimMark: A Robust Sentence-Level Similarity-Based Watermarking Algorithm for Large Language Models	Amirhossein Dabiriaghdam et.al.	2502.02787	link
2025-02-04	Classroom Simulacra: Building Contextual Student Generative Agents in Online Education for Learning Behavioral Simulation	Songlin Xu et.al.	2502.02780	link
2025-02-04	3D Foundation AI Model for Generalizable Disease Detection in Head Computed Tomography	Weicheng Zhu et.al.	2502.02779	null
2025-02-04	Twilight: Adaptive Attention Sparsity with Hierarchical Top-$p$ Pruning	Chaofan Lin et.al.	2502.02770	null
2025-02-04	LLM-USO: Large Language Model-based Universal Sizing Optimizer	Karthik Somayaji N. S et.al.	2502.02764	null
2025-02-04	Rethinking Vision Transformer for Object Centric Foundation Models	Manuel Traub et.al.	2502.02763	null
2025-02-04	Too Noisy To Learn: Enhancing Data Quality for Code Review C	Chunhua Liu et.al.	2502.02757	null
2025-02-04	PatchPilot: A Stable and Cost-Efficient Agentic Patching Framework	Hongwei Li et.al.	2502.02747	null
2025-02-04	LLM Bandit: Cost-Efficient LLM Generation via Preference-Conditioned Dynamic Routing	Yang Li et.al.	2502.02743	null
2025-02-04	RFMedSAM 2: Automatic Prompt Refinement for Enhanced Volumetric Medical Image Segmentation with SAM 2	Bin Xie et.al.	2502.02741	null
2025-02-04	SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model	Loubna Ben Allal et.al.	2502.02737	null
2025-02-04	Peri-LN: Revisiting Layer Normalization in the Transformer Architecture	Jeonghoon Kim et.al.	2502.02732	null
2025-02-04	Cross-Lingual Transfer for Low-Resource Natural Language Processing	Iker García-Ferrero et.al.	2502.02722	null
2025-02-04	Astromer 2	Cristobal Donoso-Oliva et.al.	2502.02717	null
2025-02-04	A Unified Understanding and Evaluation of Steering Methods	Shawn Im et.al.	2502.02716	null
2025-02-04	An Analysis of LLM Fine-Tuning and Few-Shot Learning for Flaky Test Detection and Classification	Riddhi More et.al.	2502.02715	null
2025-02-04	Exploring LLMs Impact on Student-Created User Stories and Acceptance Testing in Software Development	Allan Brockenbrough et.al.	2502.02675	null
2025-02-04	MedRAX: Medical Reasoning Agent for Chest X-ray	Adibvafa Fallahpour et.al.	2502.02673	link
2025-02-04	Transformers Boost the Performance of Decision Trees on Tabular Data across Sample Sizes	Mayuka Jayawardhana et.al.	2502.02672	null
2025-02-04	Machine-learning approaches to accelerating lattice simulations	Scott Lawrence et.al.	2502.02670	null
2025-02-04	A Training-Free Length Extrapolation Approach for LLMs: Greedy Attention Logit Interpolation (GALI)	Yan Li et.al.	2502.02659	link
2025-02-04	Introducing the Rhea simulations of Milky-Way-like galaxies I: Effect of gravitational potential on morphology and star formation	Junia Göller et.al.	2502.02646	null
2025-02-04	COCONut-PanCap: Joint Panoptic Segmentation and Grounded Captions for Fine-Grained Understanding and Generation	Xueqing Deng et.al.	2502.02589	null
2025-02-04	Open Materials Generation with Stochastic Interpolants	Philipp Hoellmer et.al.	2502.02582	null
2025-02-04	A comparison of translation performance between DeepL and Supertext	Alex Flückiger et.al.	2502.02577	link
2025-02-04	Are Language Models Up to Sequential Optimization Problems? From Evaluation to a Hegelian-Inspired Enhancement	Soheil Abbasloo et.al.	2502.02573	null
2025-02-04	Learning the RoPEs: Better 2D and 3D Position Encodings with STRING	Connor Schenck et.al.	2502.02562	null
2025-02-04	Mosaic3D: Foundation Dataset and Model for Open-Vocabulary 3D Segmentation	Junha Lee et.al.	2502.02548	null
2025-02-04	LLMs for Generation of Architectural Components: An Exploratory Empirical Study in the Serverless World	Shrikara Arun et.al.	2502.02539	null
2025-02-04	Adaptive Self-improvement LLM Agentic System for ML Library Development	Genghan Zhang et.al.	2502.02534	link
2025-02-04	Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies	Han Zhou et.al.	2502.02533	null
2025-02-04	Generative Modeling on Lie Groups via Euclidean Generalized Score Matching	Marco Bertolini et.al.	2502.02513	null
2025-02-04	Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search	Maohao Shen et.al.	2502.02508	null
2025-02-04	Learning to generate physical ocean states: Towards hybrid climate modeling	Etienne Meunier et.al.	2502.02499	null
2025-02-04	EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization	Yize Wu et.al.	2502.02493	null
2025-02-04	Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical Study	Menglong Cui et.al.	2502.02481	null
2025-02-04	Style transfer as data augmentation: evaluating unpaired image-to-image translation models in mammography	Emir Ahmed et.al.	2502.02475	null
2025-02-04	Mind the Gap: Evaluating Patch Embeddings from General-Purpose and Histopathology Foundation Models for Cell Segmentation and Classification	Valentina Vadori et.al.	2502.02471	link
2025-02-04	SAISA: Towards Multimodal Large Language Models with Both Training and Inference Efficiency	Qianhao Yuan et.al.	2502.02458	null
2025-02-04	Personalization Toolkit: Training Free Personalization of Large Vision Language Models	Soroush Seifi et.al.	2502.02452	null
2025-02-04	Beyond English: Evaluating Automated Measurement of Moral Foundations in Non-English Discourse with a Chinese Case Study	Calvin Yixiang Cheng et.al.	2502.02451	link
2025-02-04	Generative Psycho-Lexical Approach for Constructing Value Systems in Large Language Models	Haoran Ye et.al.	2502.02444	null
2025-02-04	LLMER: Crafting Interactive Extended Reality Worlds with JSON Data Generated by Large Language Models	Jiangong Chen et.al.	2502.02441	link
2025-02-04	Medical Multimodal Model Stealing Attacks via Adversarial Domain Alignment	Yaling Shen et.al.	2502.02438	null
2025-02-04	TransformDAS: Mapping Φ-OTDR Signals to Riemannian Manifold for Robust Classification	Jiaju Kang et.al.	2502.02428	null
2025-02-04	Activation-Informed Merging of Large Language Models	Amin Heyrani Nobari et.al.	2502.02421	link
2025-02-04	Towards Fast Graph Generation via Autoregressive Noisy Filtration Modeling	Markus Krimmel et.al.	2502.02415	link
2025-02-04	AI-Powered, But Power-Hungry? Energy Efficiency of LLM-Generated Code	Lola Solovyeva et.al.	2502.02412	null
2025-02-04	Avoiding spurious sharpness minimization broadens applicability of SAM	Sidak Pal Singh et.al.	2502.02407	null
2025-02-04	LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models	Tzu-Tao Chang et.al.	2502.02406	null
2025-02-04	CoAT: Chain-of-Associated-Thoughts Framework for Enhancing Large Language Models Reasoning	Jianfeng Pan et.al.	2502.02390	null
2025-02-04	Hypergraph Link Prediction via Hyperedge Copying	Xie He et.al.	2502.02386	null
2025-02-04	STAIR: Improving Safety Alignment with Introspective Reasoning	Yichi Zhang et.al.	2502.02384	link
2025-02-04	Evaluating the Effectiveness of LLMs in Fixing Maintainability Issues in Real-World Projects	Henrique Nunes et.al.	2502.02368	null
2025-02-04	Field Matching: an Electrostatic Paradigm to Generate and Transfer Data	Alexander Kolesov et.al.	2502.02367	null
2025-02-04	Premise-Augmented Reasoning Chains Improve Error Identification in Math reasoning with LLMs	Sagnik Mukherjee et.al.	2502.02362	null
2025-02-04	SHIELD: APT Detection and Intelligent Explanation Using LLM	Parth Atulbhai Gandhi et.al.	2502.02342	null
2025-02-04	Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking	Jinyang Wu et.al.	2502.02339	null
2025-02-04	ReSpark: Leveraging Previous Data Reports as References to Generate New Reports with LLMs	Yuan Tian et.al.	2502.02329	null
2025-02-04	Information-Theoretic Proofs for Diffusion Sampling	Galen Reeves et.al.	2502.02305	null
2025-02-04	Density Ratio Estimation with Conditional Probability Paths	Hanlin Yu et.al.	2502.02300	null
2025-02-04	Evalita-LLM: Benchmarking Large Language Models on Italian	Bernardo Magnini et.al.	2502.02289	null
2025-02-04	Adaptive Resource Allocation Optimization Using Large Language Models in Dynamic Wireless Environments	Hyeonho Noh et.al.	2502.02287	null
2025-02-04	Conversation AI Dialog for Medicare powered by Finetuning and Retrieval Augmented Generation	Atharva Mangeshkumar Agrawal et.al.	2502.02249	null
2025-02-04	Flatten Graphs as Sequences: Transformers are Scalable Graph Generators	Dexiong Chen et.al.	2502.02216	null
2025-02-04	When Dimensionality Hurts: The Role of LLM Embedding Compression for Noisy Regression Tasks	Felix Drinkall et.al.	2502.02199	link
2025-02-04	Large language models in climate and sustainability policy: limits and opportunities	Francesca Larosa et.al.	2502.02191	null
2025-02-04	ShapeShifter: 3D Variations Using Multiscale and Sparse Point-Voxel Diffusion	Nissim Maruani et.al.	2502.02187	null
2025-02-04	Generative Kernel Spectral Clustering	David Winant et.al.	2502.02185	null
2025-02-04	Mass-Editing Memory with Attention in Transformers: A cross-lingual exploration of knowledge	Daniel Tamayo et.al.	2502.02173	link
2025-02-04	EditIQ: Automated Cinematic Editing of Static Wide-Angle Videos via Dialogue Interpretation and Saliency Cues	Rohit Girmaji et.al.	2502.02172	null
2025-02-04	Risk-Aware Driving Scenario Analysis with Large Language Models	Yuan Gao et.al.	2502.02145	link
2025-02-04	IPO: Iterative Preference Optimization for Text-to-Video Generation	Xiaomeng Yang et.al.	2502.02088	null
2025-02-04	Position Paper: Building Trust in Synthetic Data for Clinical AI	Krishan Agyakari Raja Babu et.al.	2502.02076	null
2025-02-04	Rethinking stance detection: A theoretically-informed research agenda for user-level inference using language models	Prasanta Bhattacharya et.al.	2502.02074	null
2025-02-04	ASCenD-BDS: Adaptable, Stochastic and Context-aware framework for Detection of Bias, Discrimination and Stereotyping	Rajiv Bahl et.al.	2502.02072	null
2025-02-04	Robust and Secure Code Watermarking for Large Language Models via ML/Crypto Codesign	Ruisi Zhang et.al.	2502.02068	null
2025-02-04	AdaptBot: Combining LLM with Knowledge Graphs and Human Input for Generic-to-Specific Task Decomposition and Knowledge Refinement	Shivam Singh et.al.	2502.02067	link
2025-02-04	Anticipate & Act: Integrating LLMs and Classical Planning for Efficient Task Execution in Household Environments	Raghav Arora et.al.	2502.02066	null
2025-02-04	CASIM: Composite Aware Semantic Injection for Text to Motion Generation	Che-Jui Chang et.al.	2502.02063	null
2025-02-04	Large Language Models for Recommendation with Deliberative User Preference Alignment	Yi Fang et.al.	2502.02061	null
2025-02-04	Efficient Domain Adaptation of Multimodal Embeddings using Constrastive Learning	Georgios Margaritis et.al.	2502.02048	null
2025-02-04	Contextual Memory Reweaving in Large Language Models Using Layered Latent State Reconstruction	Frederick Dillon et.al.	2502.02046	null
2025-02-04	M2R2: Mixture of Multi-Rate Residuals for Efficient Transformer Inference	Nikhil Bhendawade et.al.	2502.02040	null
2025-02-04	ContinuouSP: Generative Model for Crystal Structure Prediction with Invariance and Continuity	Yuji Tone et.al.	2502.02026	null
2025-02-04	From Accidents to Insights: Leveraging Multimodal Data for Scenario-Driven ADS Testing	Siwei Luo et.al.	2502.02025	null
2025-02-04	ComplexDec: A Domain-robust High-fidelity Neural Audio Codec with Complex Spectrum Modeling	Yi-Chiao Wu et.al.	2502.02019	null
2025-02-04	Multi-Domain Graph Foundation Models: Robust Knowledge Transfer via Topology Alignment	Shuo Wang et.al.	2502.02017	null
2025-02-04	A Periodic Bayesian Flow for Material Generation	Hanlin Wu et.al.	2502.02016	link
2025-02-04	Layer by Layer: Uncovering Hidden Representations in Language Models	Oscar Skean et.al.	2502.02013	null
2025-02-04	LLMSecConfig: An LLM-Based Approach for Fixing Software Container Misconfigurations	Ziyang Ye et.al.	2502.02009	null
2025-02-04	Reasoning Bias of Next Token Prediction Training	Pengxiao Lin et.al.	2502.02007	null
2025-02-04	FinRLlama: A Solution to LLM-Engineered Signals Challenge at FinRL Contest 2024	Arnav Grover et.al.	2502.01992	null
2025-02-04	Can LLMs Assist Annotators in Identifying Morality Frames? -- Case Study on Vaccination Debate on Social Media	Tunazzina Islam et.al.	2502.01991	null
2025-02-04	Generative Data Mining with Longtail-Guided Diffusion	David S. Hayden et.al.	2502.01980	null
2025-02-04	Gradient-Regularized Latent Space Modulation in Large Language Models for Structured Contextual Synthesis	Derek Yotheringhay et.al.	2502.01979	null
2025-02-04	AutoGUI: Scaling GUI Grounding with Automatic Functionality Annotations from LLMs	Hongxin Li et.al.	2502.01977	null
2025-02-04	CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing	Wenhao Zheng et.al.	2502.01976	null
2025-02-04	Token Cleaning: Fine-Grained Data Selection for LLM Supervised Fine-Tuning	Jinlong Pang et.al.	2502.01968	null
2025-02-04	MPIC: Position-Independent Multimodal Context Caching System for Efficient MLLM Serving	Shiju Zhao et.al.	2502.01960	null
2025-02-04	Local minima of the empirical risk in high dimension: General theorems and convex examples	Kiana Asgari et.al.	2502.01953	null
2025-02-04	DAMO: Data- and Model-aware Alignment of Multi-modal LLMs	Jinda Lu et.al.	2502.01943	null
2025-02-04	Can LLMs Maintain Fundamental Abilities under KV Cache Compression?	Xiang Liu et.al.	2502.01941	null
2025-02-04	Toward a Low-Cost Perception System in Autonomous Vehicles: A Spectrum Learning Approach	Mohammed Alsakabi et.al.	2502.01940	null
2025-02-04	Distributionally Robust Direct Preference Optimization	Zaiyan Xu et.al.	2502.01930	null
2025-02-04	PANDAS: Improving Many-shot Jailbreaking via Positive Affirmation, Negative Demonstration, and Adaptive Sampling	Avery Ma et.al.	2502.01925	null
2025-02-04	LAST SToP For Modeling Asynchronous Time Series	Shubham Gupta et.al.	2502.01922	null
2025-02-04	Anomaly Detection via Autoencoder Composite Features and NCE	Yalin Liao et.al.	2502.01920	null
2025-02-04	Unlocking Efficient Large Inference Models: One-Bit Unrolling Tips the Scales	Arian Eamaz et.al.	2502.01908	null
2025-02-04	Rethinking Homogeneity of Vision and Text Tokens in Large Vision-and-Language Models	Chia-Wen Kuo et.al.	2502.01906	null
2025-02-04	Conceptual Metaphor Theory as a Prompting Paradigm for Large Language Models	Oliver Kramer et.al.	2502.01901	null
2025-02-03	Latent Lexical Projection in Large Language Models: A Novel Approach to Implicit Representation Refinement	Ziad Shaker et.al.	2502.01882	null
2025-02-03	SE Arena: Benchmarking Software Engineering Chatbots with Iterative Interactions	Zhimin Zhao et.al.	2502.01860	null
2025-02-03	Security and Quality in LLM-Generated Code: A Multi-Language, Multi-Model Analysis	Mohammed Kharma et.al.	2502.01853	null
2025-02-03	Foundation Model-Based Apple Ripeness and Size Estimation for Selective Harvesting	Keyi Zhu et.al.	2502.01850	link
2025-02-03	Relatively-Secure LLM-Based Steganography via Constrained Markov Decision Processes	Yu-Shin Huang et.al.	2502.01827	link
2025-02-03	Agentic Bug Reproduction for Effective Automated Program Repair at Google	Runxiang Cheng et.al.	2502.01821	null
2025-02-03	Score as Action: Fine-Tuning Diffusion Generative Models by Continuous-time Reinforcement Learning	Hanyang Zhao et.al.	2502.01819	null
2025-02-03	SelfCheckAgent: Zero-Resource Hallucination Detection in Generative Large Language Models	Diyana Muhammed et.al.	2502.01812	null
2025-02-03	Toward Neurosymbolic Program Comprehension	Alejandro Velasco et.al.	2502.01806	null
2025-02-03	Discovering Chunks in Neural Embeddings for Interpretability	Shuchen Wu et.al.	2502.01803	null
2025-02-03	Harmful Terms and Where to Find Them: Measuring and Modeling Unfavorable Financial Terms and Conditions in Shopping Websites at Scale	Elisa Tsai et.al.	2502.01798	link
2025-01-31	Vintix: Action Model via In-Context Reinforcement Learning	Andrey Polubarov et.al.	2501.19400	link
2025-01-31	Do LLMs Strategically Reveal, Conceal, and Infer Information? A Theoretical and Empirical Analysis in The Chameleon Game	Mustafa O. Karabag et.al.	2501.19398	link
2025-01-31	Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models	Alina Shutova et.al.	2501.19392	link
2025-01-31	Federated Sketching LoRA: On-Device Collaborative Fine-Tuning of Large Language Models	Wenzhi Fang et.al.	2501.19389	link
2025-02-03	SELMA: A Speech-Enabled Language Model for Virtual Assistant Interactions	Dominik Wagner et.al.	2501.19377	null
2025-01-31	Beyond Fixed Horizons: A Theoretical Framework for Adaptive Denoising Diffusions	Sören Christensen et.al.	2501.19373	null
2025-01-31	We're Different, We're the Same: Creative Homogeneity Across LLMs	Emily Wenger et.al.	2501.19361	null
2025-01-31	Mechanical Properties of the Meninges: Large Language Model Assisted Systematic Review of over 25,000 Studies	Brandon P. Chelstrom et.al.	2501.19359	null
2025-01-31	The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking	Yuchun Miao et.al.	2501.19358	null
2025-01-31	Addressing the correlation of Stokes-shifted photons emitted from two quantum emitters	Adrián Juan-Delgado et.al.	2501.19356	null
2025-01-31	Do Large Multimodal Models Solve Caption Generation for Scientific Figures? Lessons Learned from SCICAP Challenge 2023	Ting-Yao E. Hsu et.al.	2501.19353	null
2025-01-31	Towards Adaptive Self-Improvement for Smarter Energy Systems	Alexander Sommer et.al.	2501.19340	null
2025-01-31	PixelWorld: Towards Perceiving Everything as Pixels	Zhiheng Lyu et.al.	2501.19339	null
2025-01-31	Homogeneity Bias as Differential Sampling Uncertainty in Language Models	Messi H. J. Lee et.al.	2501.19337	null
2025-01-31	Reward-Guided Speculative Decoding for Efficient LLM Reasoning	Baohao Liao et.al.	2501.19324	null
2025-01-31	MINDSTORES: Memory-Informed Neural Decision Synthesis for Task-Oriented Reinforcement in Embodied Systems	Anirudh Chari et.al.	2501.19318	null
2025-01-31	LLM-based Affective Text Generation Quality Based on Different Quantization Values	Yarik Menchaca Resendiz et.al.	2501.19317	null
2025-01-31	Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment	Gregor Bachmann et.al.	2501.19309	null
2025-02-03	SETS: Leveraging Self-Verification and Self-Correction for Improved Test-Time Scaling	Jiefeng Chen et.al.	2501.19306	null
2025-01-31	Beyond checkmate: exploring the creative chokepoints in AI text	Nafis Irtiza Tripto et.al.	2501.19301	link
2025-01-31	Offline Learning for Combinatorial Multi-armed Bandits	Xutong Liu et.al.	2501.19300	null
2025-01-31	Synthetic User Behavior Sequence Generation with Large Language Models for Smart Homes	Zhiyao Xu et.al.	2501.19298	null
2025-01-31	Analysis of LLMs vs Human Experts in Requirements Engineering	Cory Hymel et.al.	2501.19297	null
2025-01-31	Low-Cost and Comprehensive Non-textual Input Fuzzing with LLM-Synthesized Input Generators	Kunpeng Zhang et.al.	2501.19282	null
2025-01-31	Pheromone-based Learning of Optimal Reasoning Paths	Anirudh Chari et.al.	2501.19278	null
2025-01-31	From Assistance to Autonomy -- A Researcher Study on the Potential of AI Support for Qualitative Data Analysis	Elisabeth Kirsten et.al.	2501.19275	null
2025-01-31	Jackpot! Alignment as a Maximal Lottery	Roberto-Rafael Maura-Rivero et.al.	2501.19266	null
2025-01-31	Neuro-LIFT: A Neuromorphic, LLM-based Interactive Framework for Autonomous Drone FlighT at the Edge	Amogh Joshi et.al.	2501.19259	null
2025-01-31	A Zero-Shot Generalization Framework for LLM-Driven Cross-Domain Sequential Recommendation	Yunzhe Li et.al.	2501.19232	null
2025-01-31	Autonomous Legacy Web Application Upgrades Using a Multi-Agent System	Valtteri Ala-Salmi et.al.	2501.19204	link
2025-02-03	Improving the Robustness of Representation Misdirection for Large Language Model Unlearning	Dang Huu-Tien et.al.	2501.19202	link
2025-01-31	Efficient Reasoning with Hidden Thinking	Xuan Shen et.al.	2501.19201	link
2025-01-31	Enhancing Model Defense Against Jailbreaks with Proactive Safety Reasoning	Xianglin Yang et.al.	2501.19180	null
2025-01-31	No Foundations without Foundations -- Why semi-mechanistic models are essential for regulatory biology	Luka Kovačević et.al.	2501.19178	null
2025-01-31	Position: Contextual Integrity Washing for Language Models	Yan Shvartzshnaider et.al.	2501.19173	null
2025-01-31	Poison as Cure: Visual Noise for Mitigating Object Hallucinations in LVMs	Kejia Zhang et.al.	2501.19164	null
2025-01-31	A theoretical framework for overfitting in energy-based modeling	Giovanni Catania et.al.	2501.19158	null
2025-01-31	A Tensor-Train Decomposition based Compression of LLMs on Group Vector Systolic Accelerator	Sixiao Huang et.al.	2501.19135	null
2025-01-31	Unraveling Zeroth-Order Optimization through the Lens of Low-Dimensional Structured Perturbations	Sihwan Park et.al.	2501.19099	null
2025-01-31	Ambient Denoising Diffusion Generative Adversarial Networks for Establishing Stochastic Object Models from Noisy Image Data	Xichen Xu et.al.	2501.19094	null
2025-01-31	Pivoting Factorization: A Compact Meta Low-Rank Representation of Sparsity for Efficient Inference in Large Language Models	Jialin Zhao et.al.	2501.19090	null
2025-01-31	Fairness Analysis of CLIP-Based Foundation Models for X-Ray Image Classification	Xiangyu Sun et.al.	2501.19086	null
2025-01-31	Enhancing Code Generation for Low-Resource Languages: No Silver Bullet	Alessandro Giagnorio et.al.	2501.19085	null
2025-01-31	Concept Steerers: Leveraging K-Sparse Autoencoders for Controllable Generations	Dahye Kim et.al.	2501.19066	link
2025-01-31	TeZO: Empowering the Low-Rankness on the Temporal Dimension in the Zeroth-Order Optimization for Fine-tuning LLMs	Yan Sun et.al.	2501.19057	null
2025-01-31	Enabling Autonomic Microservice Management through Self-Learning Agents	Fenglin Yu et.al.	2501.19056	null
2025-01-31	Text-to-CAD Generation Through Infusing Visual Feedback in Large Language Models	Ruiyu Wang et.al.	2501.19054	null
2025-01-31	Swarm-Gen: Fast Generation of Diverse Feasible Swarm Behaviors	Simon Idoko et.al.	2501.19042	link
2025-01-31	Towards the Worst-case Robustness of Large Language Models	Huanran Chen et.al.	2501.19040	null
2025-01-31	Beyond Token Compression: A Training-Free Reduction Framework for Efficient Visual Processing in MLLMs	Hongliang Li et.al.	2501.19036	null
2025-01-31	XRF V2: A Dataset for Action Summarization with Wi-Fi Signals, and IMUs in Phones, Watches, Earbuds, and Glasses	Bo Lan et.al.	2501.19034	link
2025-01-31	Multilayer Networks in Neuroimaging	Vesna Vuksanovic et.al.	2501.19024	null
2025-01-31	Calling a Spade a Heart: Gaslighting Multimodal Large Language Models via Negation	Bin Zhu et.al.	2501.19017	null
2025-01-31	Importing Phantoms: Measuring LLM Package Hallucination Vulnerabilities	Arjun Krishna et.al.	2501.19012	null
2025-01-31	Visual Autoregressive Modeling for Image Super-Resolution	Yunpeng Qu et.al.	2501.18993	null
2025-01-31	Symmetric Pruning of Large Language Models	Kai Yi et.al.	2501.18980	null
2025-01-31	BCAT: A Block Causal Transformer for PDE Foundation Models for Fluid Dynamics	Yuxuan Liu et.al.	2501.18972	null
2025-01-31	Spend Wisely: Maximizing Post-Training Gains in Iterative Synthetic Data Boostrapping	Pu Yang et.al.	2501.18962	link
2025-01-31	Intrinsic Tensor Field Propagation in Large Language Models: A Novel Approach to Contextual Information Flow	Alfred Bexley et.al.	2501.18957	null
2025-01-31	LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models	Shenghao Fu et.al.	2501.18954	link
2025-01-31	TabFSBench: Tabular Benchmark for Feature Shifts in Open Environment	Zi-Jian Cheng et.al.	2501.18935	link
2025-01-31	Language Games as the Pathway to Artificial Superhuman Intelligence	Ying Wen et.al.	2501.18924	null
2025-01-31	KBQA-o1: Agentic Knowledge Base Question Answering with Monte Carlo Tree Search	Haoran Luo et.al.	2501.18922	link
2025-01-31	LLM Program Optimization via Retrieval Augmented Search	Sagnik Anupam et.al.	2501.18916	null
2025-01-31	Scaling Laws for Differentially Private Language Models	Ryan McKenna et.al.	2501.18914	null
2025-01-31	Streamlining Security Vulnerability Triage with Large Language Models	Mohammad Jalili Torkamani et.al.	2501.18908	null
2025-01-31	Trustworthy Evaluation of Generative AI Models	Zijun Gao et.al.	2501.18897	null
2025-01-31	Can We Predict the Effect of Prompts?	Jae Yong Lee et.al.	2501.18883	null
2025-01-31	Adaptivity and Convergence of Probability Flow ODEs in Diffusion Generative Models	Jiaqi Tang et.al.	2501.18863	null
2025-01-31	BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning	Han Zhong et.al.	2501.18858	null
2025-01-31	Equivariant Hypergraph Diffusion for Crystal Structure Prediction	Yang Liu et.al.	2501.18850	null
2025-01-31	Text Data Augmentation for Large Language Models: A Comprehensive Survey of Methods, Challenges, and Opportunities	Yaping Chai et.al.	2501.18845	null
2025-01-31	Trading Inference-Time Compute for Adversarial Robustness	Wojciech Zaremba et.al.	2501.18841	null
2025-01-31	Partially Rewriting a Transformer in Natural Language	Gonçalo Paulo et.al.	2501.18838	link
2025-01-31	Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming	Mrinank Sharma et.al.	2501.18837	null
2025-01-31	Pitfalls of defacing whole-head MRI: re-identification risk with diffusion models and compromised research potential	Chenyu Gao et.al.	2501.18834	null
2025-01-31	Structural Embedding Projection for Contextual Large Language Model Inference	Vincent Enoasmo et.al.	2501.18826	null
2025-01-31	Bridging the Reasoning Gap: Small LLMs Can Plan with Generalised Strategies	Andrey Borro et.al.	2501.18817	link
2025-01-31	Large Language Models as Common-Sense Heuristics	Andrey Borro et.al.	2501.18816	null
2025-01-30	Compositional Generalization Requires More Than Disentangled Representations	Qiyao Liang et.al.	2501.18797	null
2025-01-30	Rope to Nope and Back Again: A New Hybrid Attention Strategy	Bowen Yang et.al.	2501.18795	null
2025-01-30	Survey and Improvement Strategies for Gene Prioritization with Large Language Models	Matthew Neeley et.al.	2501.18794	null
2025-01-30	LLM-Generated Heuristics for AI Planning: Do We Even Need Domain-Independence Anymore?	Alexander Tuisov et.al.	2501.18784	null
2025-01-30	Navigating the Fragrance space Via Graph Generative Models And Predicting Odors	Mrityunjay Sharma et.al.	2501.18777	link
2025-01-30	Probabilistic Joint Recovery Method for CO$_2$ Plume Monitoring	Zijun Deng et.al.	2501.18761	null
2025-01-30	Synthetic Data Generation for Augmenting Small Samples	Dan Liu et.al.	2501.18741	null
2025-01-30	Examining the Robustness of Large Language Models across Language Complexity	Jiayi Zhang et.al.	2501.18738	null
2025-01-30	Exploring Audio Editing Features as User-Centric Privacy Defenses Against Emotion Inference Attacks	Mohd. Farhan Israk Soumik et.al.	2501.18727	null
2025-01-30	Strong and Controllable 3D Motion Generation	Canxuan Gang et.al.	2501.18726	null
2025-01-30	Zero-shot Large Language Models for Long Clinical Text Summarization with Temporal Reasoning	Maya Kruse et.al.	2501.18724	null
2025-02-03	Invisible Traces: Using Hybrid Fingerprinting to identify underlying LLMs in GenAI Apps	Devansh Bhardwaj et.al.	2501.18712	null
2025-01-30	Regularized second-order optimization of tensor-network Born machines	Matan Ben-Dov et.al.	2501.18691	null
2025-01-30	Drag Your Gaussian: Effective Drag-Based Editing with Score Distillation for 3D Gaussian Splatting	Yansong Qu et.al.	2501.18672	null
2025-01-30	Foundational Models for 3D Point Clouds: A Survey and Outlook	Vishal Thengane et.al.	2501.18594	null
2025-01-30	Diffusion Autoencoders are Scalable Image Tokenizers	Yinbo Chen et.al.	2501.18593	null
2025-02-03	Advances in Multimodal Adaptation and Generalization: From Traditional Approaches to Foundation Models	Hao Dong et.al.	2501.18592	link
2025-01-30	Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs	Yue Wang et.al.	2501.18585	null
2025-01-30	Token-Hungry, Yet Precise: DeepSeek R1 Highlights the Need for Multi-Step Reasoning Over Speed in MATH	Evgenii Evstafev et.al.	2501.18576	null
2025-01-30	BounTCHA: A CAPTCHA Utilizing Boundary Identification in AI-extended Videos	Lehao Lin et.al.	2501.18565	null
2025-01-30	SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation	Haoquan Fang et.al.	2501.18564	link
2025-01-30	Semantic Web and Creative AI -- A Technical Report from ISWS 2023	Raia Abu Ahmad et.al.	2501.18542	null
2025-01-30	Illusions of Relevance: Using Content Injection Attacks to Deceive Retrievers, Rerankers, and LLM Judges	Manveer Singh Tamber et.al.	2501.18536	link
2025-01-30	Differentially Private Steering for Large Language Model Alignment	Anmol Goel et.al.	2501.18532	link
2025-01-30	Learn from the Past: Language-conditioned Object Rearrangement with Large Language Models	Guanqun Cao et.al.	2501.18516	null
2025-01-30	Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch	Arthur Douillard et.al.	2501.18512	null
2025-01-30	WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training	Benjamin Feuer et.al.	2501.18511	link
2025-01-30	CLEAR: Cue Learning using Evolution for Accurate Recognition Applied to Sustainability Data Extraction	Peter J. Bentley et.al.	2501.18504	null
2025-01-30	Examining the Expanding Role of Synthetic Data Throughout the AI Development Pipeline	Shivani Kapania et.al.	2501.18493	null
2025-01-30	A Tool for In-depth Analysis of Code Execution Reasoning of Large Language Models	Changshu Liu et.al.	2501.18482	null
2025-01-30	CLoQ: Enhancing Fine-Tuning of Quantized LLMs via Calibrated LoRA Initialization	Yanxia Deng et.al.	2501.18475	null
2025-01-30	Tuning Vision Foundation Model via Test-Time Prompt-Guided Training for VFSS Segmentations	Chengxi Zeng et.al.	2501.18474	null
2025-01-30	ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation	Minghua He et.al.	2501.18460	null
2025-01-30	CALM: Unleashing the Cross-Lingual Self-Aligning Ability of Language Model Question Answering	Yumeng Wang et.al.	2501.18457	null
2025-01-30	GENIE: Generative Note Information Extraction model for structuring EHR data	Huaiyuan Ying et.al.	2501.18435	null
2025-01-30	Exploring Potential Prompt Injection Attacks in Federated Military LLMs and Their Mitigation	Youngjoon Lee et.al.	2501.18416	null
2025-01-30	RbFT: Robust Fine-tuning for Retrieval-Augmented Generation against Retrieval Defects	Yiteng Tu et.al.	2501.18365	link
2025-01-30	A Video-grounded Dialogue Dataset and Metric for Event-driven Activities	Wiradee Imrattanatrai et.al.	2501.18324	link
2025-01-30	Leveraging LLM Agents for Automated Optimization Modeling for SASP Problems: A Graph-RAG based Approach	Tianpeng Pan et.al.	2501.18320	null
2025-01-30	Mining for Species, Locations, Habitats, and Ecosystems from Scientific Papers in Invasion Biology: A Large-Scale Exploratory Study with Large Language Models	Jennifer D'Souza et.al.	2501.18287	null
2025-01-30	Jailbreaking LLMs' Safeguard with Universal Magic Words for Text Embedding Models	Haoyu Liang et.al.	2501.18280	null
2025-01-30	Collecting Cost-Effective, High-Quality Truthfulness Assessments with LLM Summarized Evidence	Kevin Roitero et.al.	2501.18265	null
2025-01-30	How to Select Datapoints for Efficient Human Evaluation of NLG Models?	Vilém Zouhar et.al.	2501.18251	link
2025-01-30	Statistical multi-metric evaluation and visualization of LLM system predictive performance	Samuel Ackerman et.al.	2501.18243	null
2025-01-30	Contextually Structured Token Dependency Encoding for Large Language Models	James Blades et.al.	2501.18205	null
2025-01-30	Economic Rationality under Specialization: Evidence of Decision Bias in AI Agents	ShuiDe Wen et.al.	2501.18190	null
2025-01-30	Investigating Tax Evasion Emergence Using Dual Large Language Model and Deep Reinforcement Learning Powered Agent-based Simulation	Teddy Lazebnik et.al.	2501.18177	null
2025-01-30	Continually Evolved Multimodal Foundation Models for Cancer Prognosis	Jie Peng et.al.	2501.18170	null
2025-01-30	RepoAudit: An Autonomous LLM-Agent for Repository-Level Code Auditing	Jinyao Guo et.al.	2501.18160	null
2025-01-30	Large Language Models for Cryptocurrency Transaction Analysis: A Bitcoin Case Study	Yuchen Lei et.al.	2501.18158	null
2025-01-30	Mixed-Precision Graph Neural Quantization for Low Bit Large Language Models	Wanlong Liu et.al.	2501.18154	null
2025-01-30	Self-supervised Quantized Representation for Seamlessly Integrating Knowledge Graphs with Large Language Models	Qika Lin et.al.	2501.18119	null
2025-01-30	Scaling Inference-Efficient Language Models	Song Bian et.al.	2501.18107	null
2025-01-30	Panacea: Mitigating Harmful Fine-tuning for Large Language Models via Post-fine-tuning Perturbation	Yibo Wang et.al.	2501.18100	link
2025-01-30	AlphaAdam: Asynchronous Masked Optimization with Dynamic Alpha for Selective Updates	Da Chang et.al.	2501.18094	null
2025-01-30	Normative Evaluation of Large Language Models with Everyday Moral Dilemmas	Pratik S. Sachdeva et.al.	2501.18081	null
2025-01-30	FinanceQA: A Benchmark for Evaluating Financial Analysis Capabilities of Large Language Models	Spencer Mateega et.al.	2501.18062	null
2025-01-29	RL-based Query Rewriting with Distilled LLM for online E-Commerce Systems	Duy A. Nguyen et.al.	2501.18056	null
2025-01-29	Current Pathology Foundation Models are unrobust to Medical Center Differences	Edwin D. de Jong et.al.	2501.18055	null
2025-01-29	A Proximal Operator for Inducing 2:4-Sparsity	Jonas M Kübler et.al.	2501.18015	null
2025-01-29	Large Language Models Think Too Fast To Explore Effectively	Lan Pan et.al.	2501.18009	null
2025-01-29	Fault Localization via Fine-tuning Large Language Models with Mutation Generated Stack Traces	Neetha Jambigi et.al.	2501.18005	null
2025-01-29	InnerThoughts: Disentangling Representations and Predictions in Large Language Models	Didier Chételat et.al.	2501.17994	null
2025-01-29	Can Generative LLMs Create Query Variants for Test Collections? An Exploratory Study	Marwah Alaofi et.al.	2501.17981	link
2025-01-29	Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization	Zishun Yu et.al.	2501.17974	null
2025-01-29	"I Would Never Trust Anything Western": Kumu (Educator) Perspectives on Use of LLMs for Culturally Revitalizing CS Education in Hawaiian Schools	Manas Mhasakar et.al.	2501.17942	null
2025-01-29	DReSS: Data-driven Regularized Structured Streamlining for Large Language Models	Mingkuan Feng et.al.	2501.17905	null
2025-01-29	Learning Beyond the Surface: How Far Can Continual Pre-Training with LoRA Enhance LLMs' Domain-Specific Insight Learning?	Pouya Pezeshkpour et.al.	2501.17840	link
2025-01-29	Aggregation Schemes for Single-Vector WSI Representation Learning in Digital Pathology	Sobhan Hemati et.al.	2501.17822	null
2025-01-30	Leveraging Multimodal LLM for Inspirational User Interface Search	Seokhyeon Park et.al.	2501.17799	link
2025-01-29	BreezyVoice: Adapting TTS for Taiwanese Mandarin with Enhanced Polyphone Disambiguation -- Challenges and Insights	Chan-Jan Hsu et.al.	2501.17790	null
2025-01-29	AdditiveLLM: Large Language Models Predict Defects in Additive Manufacturing	Peter Pak et.al.	2501.17784	null
2025-01-29	2SSP: A Two-Stage Framework for Structured Pruning of LLMs	Fabrizio Sandri et.al.	2501.17771	link
2025-01-29	Generative Unordered Flow for Set-Structured Data Generation	Yangming Li et.al.	2501.17770	null
2025-01-29	Hybrid Graphs for Table-and-Text based Question Answering using LLMs	Ankush Agarwal et.al.	2501.17767	null
2025-01-29	On the Partitioning of GPU Power among Multi-Instances	Tirth Vamja et.al.	2501.17752	null
2025-01-29	Early External Safety Testing of OpenAI's o3-mini: Insights from the Pre-Deployment Evaluation	Aitor Arrieta et.al.	2501.17749	null
2025-01-29	A technical review of multi-omics data integration methods: from classical statistical to deep generative approaches	Ana R. Baião et.al.	2501.17729	null
2025-01-29	Using Code Generation to Solve Open Instances of Combinatorial Design Problems	Christopher D. Rosin et.al.	2501.17725	link
2025-01-29	RICoTA: Red-teaming of In-the-wild Conversation with Test Attempts	Eujeong Choi et.al.	2501.17715	link
2025-01-29	Source-Channel Separation Theorems for Distortion Perception Coding	Chao Tian et.al.	2501.17706	null
2025-01-29	Planning with Vision-Language Models and a Use Case in Robot-Assisted Teaching	Xuzhe Dang et.al.	2501.17665	null
2025-01-30	In-Context Meta LoRA Generation	Yihua Shao et.al.	2501.17635	null
2025-01-29	Uncertainty Quantification and Decomposition for LLM-based Recommendation	Wonbin Kweon et.al.	2501.17630	link
2025-01-29	The Imitation Game According To Turing	Sharon Temtsin et.al.	2501.17629	null
2025-01-29	Structured Context Recomposition for Large Language Models Using Probabilistic Layer Realignment	Jonathan Teel et.al.	2501.17617	null
2025-01-29	Semantic Consistency Regularization with Large Language Models for Semi-supervised Sentiment Analysis	Kunrong Li et.al.	2501.17598	null
2025-01-30	Technical report on label-informed logit redistribution for better domain generalization in low-shot classification with foundation models	Behraj Khan et.al.	2501.17595	null
2025-01-29	GLLM: Self-Corrective G-Code Generation using Large Language Models with User Feedback	Mohamed Abdelaal et.al.	2501.17584	null
2025-01-29	CSEval: Towards Automated, Multi-Dimensional, and Reference-Free Counterspeech Evaluation using Auto-Calibrated LLMs	Amey Hengle et.al.	2501.17581	null
2025-01-29	Music2Latent2: Audio Compression with Summary Embeddings and Autoregressive Decoding	Marco Pasini et.al.	2501.17578	null
2025-01-29	Query-Aware Learnable Graph Pooling Tokens as Prompt for Large Language Models	Wooyoung Kim et.al.	2501.17549	null
2025-01-29	Towards Training-Free Open-World Classification with 3D Generative Models	Xinzhe Xia et.al.	2501.17547	null
2025-01-29	Is Conversational XAI All You Need? Human-AI Decision Making With a Conversational XAI Assistant	Gaole He et.al.	2501.17546	link
2025-01-29	Towards Supporting Penetration Testing Education with Large Language Models: an Evaluation and Comparison	Martin Nizon-Deladoeuille et.al.	2501.17539	null
2025-01-29	Neural Spelling: A Spell-Based BCI System for Language Neural Decoding	Xiaowei Jiang et.al.	2501.17489	null
2025-01-29	DFPE: A Diverse Fingerprint Ensemble for Enhancing LLM Performance	Seffi Cohen et.al.	2501.17479	link
2025-01-29	AugmenTest: Enhancing Tests with LLM-Driven Oracles	Shaker Mahmud Khandaker et.al.	2501.17461	null
2025-01-29	Large Language Models for Single-Step and Multi-Step Flight Trajectory Prediction	Kaiwei Luo et.al.	2501.17459	null
2025-01-29	Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation	Tiansheng Huang et.al.	2501.17433	link
2025-01-29	Actions Speak Louder than Words: Agent Decisions Reveal Implicit Biases in Language Models	Yuxuan Li et.al.	2501.17420	null
2025-01-29	MultiChallenge: A Realistic Multi-Turn Conversation Evaluation Benchmark Challenging to Frontier LLMs	Ved Sirdeshmukh et.al.	2501.17399	link
2025-01-29	Learning Free Token Reduction for Multi-Modal LLM	Zihui Zhao et.al.	2501.17391	null
2025-01-29	Context-Aware Semantic Recomposition Mechanism for Large Language Models	Richard Katrix et.al.	2501.17386	null
2025-01-28	Deep-and-Wide Learning: Enhancing Data-Driven Inference via Synergistic Learning of Inter- and Intra-Data Representations	Md Tauhidul Islam et.al.	2501.17347	null
2025-01-28	Memorize and Rank: Elevating Large Language Models for Clinical Diagnosis Prediction	Mingyu Derek Ma et.al.	2501.17326	null
2025-01-28	CardiCat: a Variational Autoencoder for High-Cardinality Tabular Data	Lee Carlin et.al.	2501.17324	null
2025-01-30	Probing LLM World Models: Enhancing Guesstimation with Wisdom of Crowds Decoding	Yun-Shiuan Chuang et.al.	2501.17310	null
2025-01-28	"Ownership, Not Just Happy Talk": Co-Designing a Participatory Large Language Model for Journalism	Emily Tseng et.al.	2501.17299	null
2025-01-28	Mitigating Hallucinated Translations in Large Language Models with Hallucination-focused Preference Optimization	Zilu Tang et.al.	2501.17295	null
2025-01-28	Fine-Tuning Open-Source Large Language Models to Improve Their Performance on Radiation Oncology Tasks: A Feasibility Study to Investigate Their Potential Clinical Applications in Radiation Oncology	Peilong Wang et.al.	2501.17286	null
2025-01-30	From Natural Language to Extensive-Form Game Representations	Shilong Deng et.al.	2501.17282	link
2025-01-28	Engineering Point Defects in MoS2 for Tailored Material Properties using Large Language Models	Abdalaziz Al-Maeeni et.al.	2501.17279	null
2025-01-28	Tailored Truths: Optimizing LLM Persuasion with Personalization and Fabricated Statistics	Jasper Timm et.al.	2501.17273	link
2025-01-28	Integrating Reinforcement Learning and AI Agents for Adaptive Robotic Interaction and Assistance in Dementia Care	Fengpei Yuan et.al.	2501.17206	null
2025-01-28	SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training	Tianzhe Chu et.al.	2501.17161	null
2025-01-28	FactCG: Enhancing Fact Checkers with Graph-Based Multi-Hop Data	Deren Lei et.al.	2501.17144	link
2025-01-28	ASTRAL: Automated Safety Testing of Large Language Models	Miriam Ugarte et.al.	2501.17132	null
2025-01-28	Optimizing Large Language Model Training Using FP4 Quantization	Ruizhe Wang et.al.	2501.17116	null
2025-01-28	Unlocking Transparent Alignment Through Enhanced Inverse Constitutional AI for Principle Extraction	Carl-Leander Henneking et.al.	2501.17112	null
2025-01-28	Goodness of Fit for Bayesian Generative Models with Applications in Population Genetics	Guillaume Le Mailloux et.al.	2501.17107	link
2025-01-28	Token-by-Token Regeneration and Domain Biases: A Benchmark of LLMs on Advanced Mathematical Problem-Solving	Evgenii Evstafev et.al.	2501.17084	null
2025-01-28	Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding	Akash Kumar et.al.	2501.17053	null
2025-01-28	Enhanced Retrieval of Long Documents: Leveraging Fine-Grained Block Representations with Large Language Models	Minghan Li et.al.	2501.17039	null
2025-01-28	Challenges in Ensuring AI Safety in DeepSeek-R1 Models: The Shortcomings of Reinforcement Learning Strategies	Manojkumar Parmar et.al.	2501.17030	null
2025-01-28	Automated Refactoring of Non-Idiomatic Python Code: A Differentiated Replication with LLMs	Alessandro Midolo et.al.	2501.17024	link
2025-01-28	Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement	Kei Katsumata et.al.	2501.17022	link
2025-01-28	MIDI-GPT: A Controllable Generative Model for Computer-Assisted Multitrack Music Composition	Philippe Pasquier et.al.	2501.17011	null
2025-01-28	Large Language Models for Code Generation: The Practitioners Perspective	Zeeshan Rasheed et.al.	2501.16998	link
2025-01-28	Artificial Intelligence Clones	Annie Liang et.al.	2501.16996	null
2025-01-28	FedEFM: Federated Endovascular Foundation Model with Unseen Data	Tuong Do et.al.	2501.16992	null
2025-01-28	Generative quantum combinatorial optimization by means of a novel conditional generative quantum eigensolver	Shunya Minami et.al.	2501.16986	null
2025-01-28	Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling	Hongzhi Huang et.al.	2501.16975	null
2025-01-28	Instantiation-based Formalization of Logical Reasoning Tasks using Language Models and Logical Solvers	Mohammad Raza et.al.	2501.16961	null
2025-01-28	Multiple Abstraction Level Retrieve Augment Generation	Zheng Zheng et.al.	2501.16952	null
2025-01-29	TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models	Makoto Shing et.al.	2501.16937	null
2025-01-28	Detecting harassment and defamation in cyberbullying with emotion-adaptive training	Peiling Yi et.al.	2501.16925	link
2025-01-28	RDMM: Fine-Tuned LLM Models for On-Device Robotic Decision Making with Enhanced Contextual Awareness in Specific Domains	Shady Nasrat et.al.	2501.16899	link
2025-01-28	Machine-learning semi-local exchange-correlation functionals for Kohn-Sham density functional theory of the Hubbard model	Eoghan Cronin et.al.	2501.16893	link
2025-01-28	Irony Detection, Reasoning and Understanding in Zero-shot Learning	Peiling Yi et.al.	2501.16884	null
2025-01-28	Comparing Human and LLM Generated Code: The Jury is Still Out!	Sherlock A. Licorish et.al.	2501.16857	null
2025-01-28	Adapting Network Information to Semantics for Generalizable and Plug-and-Play Multi-Scenario Network Diagnosis	Tiao Tan et.al.	2501.16842	null
2025-01-28	Misspellings in Natural Language Processing: A survey	Gianluca Sperduti et.al.	2501.16836	null
2025-01-28	DIRIGENt: End-To-End Robotic Imitation of Human Demonstrations Based on a Diffusion Model	Josua Spisak et.al.	2501.16800	null
2025-01-28	Algorithm for Automatic Legislative Text Consolidation	Matias Etcheverry et.al.	2501.16794	null
2025-01-28	Exponential Family Attention	Kevin Christian Wibisono et.al.	2501.16790	link
2025-01-28	Exploring the Role of Explicit Temporal Modeling in Multimodal Large Language Models for Video Understanding	Yun Li et.al.	2501.16786	null
2025-01-28	TORCHLIGHT: Shedding LIGHT on Real-World Attacks on Cloudless IoT Devices Concealed within the Tor Network	Yumingzhi Pan et.al.	2501.16784	null
2025-01-28	A Stochastic Dynamical Theory of LLM Self-Adversariality: Modeling Severity Drift as a Critical Process	Jack David Carson et.al.	2501.16783	null
2025-01-29	Beyond-Labels: Advancing Open-Vocabulary Segmentation With Vision-Language Models	Muhammad Atta ur Rahman et.al.	2501.16769	null
2025-01-28	DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation	Chenguo Lin et.al.	2501.16764	null
2025-01-28	HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns	Xinyue Shen et.al.	2501.16750	link
2025-01-28	Through the Prism of Culture: Evaluating LLMs' Understanding of Indian Subcultures and Traditions	Garima Chhikara et.al.	2501.16748	null
2025-01-28	LLM Assisted Anomaly Detection Service for Site Reliability Engineers: Enhancing Cloud Infrastructure Resilience	Nimesh Jha et.al.	2501.16744	null
2025-01-28	Distilling Large Language Models for Network Active Queue Management	Deol Satish et.al.	2501.16734	null
2025-01-28	xJailbreak: Representation Space Guided Reinforcement Learning for Interpretable LLM Jailbreaking	Sunbowen Lee et.al.	2501.16727	link
2025-01-28	One Head Eight Arms: Block Matrix based Low Rank Adaptation for CLIP-based Few-Shot Learning	Chunpeng Zhou et.al.	2501.16720	null
2025-01-28	Outlier Synthesis via Hamiltonian Monte Carlo for Out-of-Distribution Detection	Hengzhuang Li et.al.	2501.16718	link
2025-01-28	3D-MoE: A Mixture-of-Experts Multi-modal LLM for 3D Vision and Pose Diffusion via Rectified Flow	Yueen Ma et.al.	2501.16698	null
2025-01-28	MME-Industry: A Cross-Industry Multimodal Evaluation Benchmark	Dongyi Yi et.al.	2501.16688	null
2025-01-28	Auto-Differentiating Any LLM Workflow: A Farewell to Manual Prompting	Li Yin et.al.	2501.16673	link
2025-01-28	VeriFact: Verifying Facts in LLM-Generated Clinical Text with Electronic Health Records	Philip Chung et.al.	2501.16672	link
2025-01-28	Contextual Reinforcement in Multimodal Token Compression for Large Language Models	Naderdel Piero et.al.	2501.16658	null
2025-01-28	Large Language Model Critics for Execution-Free Evaluation of Code Changes	Aashish Yadavally et.al.	2501.16655	link
2025-01-28	Molecular-driven Foundation Model for Oncologic Pathology	Anurag Vaidya et.al.	2501.16652	link
2025-01-28	DOCS: Quantifying Weight Similarity for Deeper Insights into Large Language Models	Zeping Min et.al.	2501.16650	null
2025-01-28	An LLM Benchmark for Addressee Recognition in Multi-modal Multi-party Dialogue	Koji Inoue et.al.	2501.16643	null
2025-01-28	CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs	Jinlan Fu et.al.	2501.16629	link
2025-01-28	Few-Shot Optimized Framework for Hallucination Detection in Resource-Limited NLP Systems	Baraa Hikal et.al.	2501.16616	null
2025-01-28	Sparse Autoencoders Trained on the Same Data Learn Different Features	Gonçalo Paulo et.al.	2501.16615	null
2025-01-28	Fine-Tuned Language Models as Space Systems Controllers	Enrico M. Zucchelli et.al.	2501.16588	null
2025-01-27	AffectGPT: A New Dataset, Model, and Benchmark for Emotion Understanding with Multimodal Large Language Models	Zheng Lian et.al.	2501.16566	null
2025-01-27	LoRA-X: Bridging Foundation Models with Training-Free Cross-Model Adaptation	Farzad Farhadzadeh et.al.	2501.16559	null
2025-01-27	Distributional Information Embedding: A Framework for Multi-bit Watermarking	Haiyun He et.al.	2501.16558	null
2025-01-27	PackDiT: Joint Human Motion and Text Generation via Mutual Prompting	Zhongyu Jiang et.al.	2501.16551	null
2025-01-27	PhysAnimator: Physics-Guided Generative Cartoon Animation	Tianyi Xie et.al.	2501.16550	null
2025-01-27	Sample-Efficient Behavior Cloning Using General Domain Knowledge	Feiyu Zhu et.al.	2501.16546	null
2025-01-27	Generalized Mission Planning for Heterogeneous Multi-Robot Teams via LLM-constructed Hierarchical Trees	Piyush Gupta et.al.	2501.16539	null
2025-01-27	Targeting Alignment: Extracting Safety Classifiers of Aligned LLMs	Jean-Charles Noirot Ferrand et.al.	2501.16534	null
2025-01-27	A comparison of data filtering techniques for English-Polish LLM-based machine translation in the biomedical domain	Jorge del Pozo Lérida et.al.	2501.16533	null
2025-01-27	Programming by Examples Meets Historical Linguistics: A Large Language Model Based Approach to Sound Law Induction	Atharva Naik et.al.	2501.16524	null
2025-01-27	How well can LLMs Grade Essays in Arabic?	Rayed Ghazawi et.al.	2501.16516	null
2025-01-27	Deception in LLMs: Self-Preservation and Autonomous Goals in Large Language Models	Sudarshan Kamath Barkur et.al.	2501.16513	null
2025-01-27	Smoothed Embeddings for Robust Language Models	Ryo Hase et.al.	2501.16497	null
2025-01-27	Explaining GitHub Actions Failures with Large Language Models: Challenges, Insights, and Limitations	Pablo Valenzuela-Toledo et.al.	2501.16495	null
2025-01-27	Generating customized prompts for Zero-Shot Rare Event Medical Image Classification using LLM	Payal Kamboj et.al.	2501.16481	link
2025-01-27	Cross-Domain Semantic Segmentation with Large Language Model-Assisted Descriptor Generation	Philip Hughes et.al.	2501.16467	null
2025-01-27	CoCoNUT: Structural Code Understanding does not fall out of a tree	Claas Beger et.al.	2501.16456	link
2025-01-27	Detecting Zero-Day Attacks in Digital Substations via In-Context Learning	Faizan Manzoor et.al.	2501.16453	null
2025-01-27	360Brew: A Decoder-only Foundation Model for Personalized Ranking and Recommendation	Hamed Firooz et.al.	2501.16450	null
2025-01-27	DynAlign: Unsupervised Dynamic Taxonomy Alignment for Cross-Domain Segmentation	Han Sun et.al.	2501.16410	null
2025-01-27	Evaluating The Performance of Using Large Language Models to Automate Summarization of CT Simulation Orders in Radiation Oncology	Meiyun Cao et.al.	2501.16309	null
2025-01-27	RAPID: Retrieval-Augmented Parallel Inference Drafting for Text-Based Video Event Retrieval	Long Nguyen et.al.	2501.16303	null
2025-01-27	Matryoshka Re-Ranker: A Flexible Re-Ranking Architecture With Configurable Depth and Width	Zheng Liu et.al.	2501.16302	null
2025-01-27	Large Models in Dialogue for Active Perception and Anomaly Detection	Tzoulio Chamiti et.al.	2501.16300	link
2025-01-27	FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers	Renshan Zhang et.al.	2501.16297	null
2025-01-27	Brain-Adapter: Enhancing Neurological Disorder Analysis with Adapter-Tuning Multimodal Large Language Models	Jing Zhang et.al.	2501.16282	null
2025-01-27	Do LLMs Have Visualization Literacy? An Evaluation on Modified Visualizations to Test Generalization in Data Interpretation	Jiayi Hong et.al.	2501.16277	link
2025-01-27	URAG: Implementing a Unified Hybrid RAG for Precise Answers in University Admission Chatbots -- A Case Study at HCMUT	Long Nguyen et.al.	2501.16276	null
2025-01-27	A foundation model for human-AI collaboration in medical literature mining	Zifeng Wang et.al.	2501.16255	null
2025-01-27	Multi-Agent Geospatial Copilots for Remote Sensing Workflows	Chaehong Lee et.al.	2501.16254	null
2025-01-27	Zero-Shot Decision Tree Construction via Large Language Models	Lucas Carrasco et.al.	2501.16247	null
2025-01-27	CLISC: Bridging clip and sam by enhanced cam for unsupervised brain tumor segmentation	Xiaochuan Ma et.al.	2501.16246	null
2025-01-27	Phase Transitions in Large Language Models and the $O(N)$ Model	Youran Sun et.al.	2501.16241	null
2025-01-27	AiGet: Transforming Everyday Moments into Hidden Knowledge Discovery with AI Assistance on Smart Glasses	Runze Cai et.al.	2501.16240	null
2025-01-28	Distilling foundation models for robust and efficient models in digital pathology	Alexandre Filiot et.al.	2501.16239	null
2025-01-27	Language-Based Bayesian Optimization Research Assistant (BORA)	Abdoulatif Cissé et.al.	2501.16224	null
2025-01-27	Enhancing Visual Inspection Capability of Multi-Modal Large Language Models on Medical Time Series with Supportive Conformalized and Interpretable Small Specialized Models	Huayu Li et.al.	2501.16215	link
2025-01-27	Provence: efficient and robust context pruning for retrieval-augmented generation	Nadezhda Chirkova et.al.	2501.16214	null
2025-01-27	Raiders of the Lost Dependency: Fixing Dependency Conflicts in Python using LLMs	Antony Bartlett et.al.	2501.16191	null
2025-01-27	SWIFT: Mapping Sub-series with Wavelet Decomposition Improves Time Series Forecasting	Wenxuan Xie et.al.	2501.16178	link
2025-01-27	BAG: Body-Aligned 3D Wearable Asset Generation	Zhongjin Luo et.al.	2501.16177	null
2025-01-27	Will Systems of LLM Agents Cooperate: An Investigation into a Social Dilemma	Richard Willis et.al.	2501.16173	link
2025-01-27	MetaDecorator: Generating Immersive Virtual Tours through Multimodality	Shuang Xie et.al.	2501.16164	null
2025-01-27	CITYWALK: Enhancing LLM-Based C++ Unit Test Generation via Project-Dependency Awareness and Language-Specific Knowledge	Yuwei Zhang et.al.	2501.16155	null
2025-01-27	AdaCoT: Rethinking Cross-Lingual Factual Reasoning through Adaptive Chain-of-Thought	Xin Huang et.al.	2501.16154	null
2025-01-27	AI Agents for Computer Use: A Review of Instruction-based Computer Control, GUI Automation, and Operator Assistants	Pascal J. Sager et.al.	2501.16150	null
2025-01-27	PATCH: Empowering Large Language Model with Programmer-Intent Guidance and Collaborative-Behavior Simulation for Automatic Bug Fixing	Yuwei Zhang et.al.	2501.16149	null
2025-01-27	SampleLLM: Optimizing Tabular Data Synthesis in Recommendations	Jingtong Gao et.al.	2501.16125	null
2025-01-27	Using Generative Models to Produce Realistic Populations of UK Windstorms	Yee Chun Tsoi et.al.	2501.16110	null
2025-01-27	Integration of LLM Quality Assurance into an NLG System	Ching-Yi Chen et.al.	2501.16078	null
2025-01-27	PISCO: Pretty Simple Compression for Retrieval-Augmented Generation	Maxime Louis et.al.	2501.16075	null
2025-01-27	A generative material transformer using Wyckoff representation	Pierre-Paul De Breuck et.al.	2501.16051	null
2025-01-27	Skeleton-Guided-Translation: A Benchmarking Framework for Code Repository Translation with Fine-Grained Quality Evaluation	Xing Zhang et.al.	2501.16050	null
2025-01-27	PRISMe: A Novel LLM-Powered Tool for Interactive Privacy Policy Assessment	Vincent Freiberger et.al.	2501.16033	null
2025-01-27	FDLLM: A Text Fingerprint Detection Method for LLMs in Multi-Language, Multi-Domain Black-Box Environments	Zhiyuan Fu et.al.	2501.16029	null
2025-01-27	Transformability reveals the interplay of dynamics across different network orders	Ming Xie et.al.	2501.16016	null
2025-01-27	TOPLOC: A Locality Sensitive Hashing Scheme for Trustless Verifiable Inference	Jack Min Ong et.al.	2501.16007	null
2025-01-27	EDSep: An Effective Diffusion-Based Method for Speech Source Separation	Jinwei Dong et.al.	2501.15965	null
2025-01-27	Rethinking the Bias of Foundation Model under Long-tailed Distribution	Jiahao Chen et.al.	2501.15955	null
2025-01-27	Understanding Long Videos via LLM-Powered Entity Relation Graphs	Meng Chu et.al.	2501.15953	null
2025-01-27	TimeHF: Billion-Scale Time Series Models Guided by Human Feedback	Yongzhi Qi et.al.	2501.15942	null
2025-01-27	SkillScope: A Tool to Predict Fine-Grained Skills Needed to Solve Issues on GitHub	Benjamin C. Carter et.al.	2501.15922	null
2025-01-27	Parametric Retrieval Augmented Generation	Weihang Su et.al.	2501.15915	link
2025-01-27	Robust Mobile Robot Path Planning via LLM-Based Dynamic Waypoint Generation	Muhammad Taha Tariq et.al.	2501.15901	null
2025-01-27	Investigating the Sensitivity of Pre-trained Audio Embeddings to Common Effects	Victor Deng et.al.	2501.15900	null
2025-01-27	Adaptive Width Neural Networks	Federico Errica et.al.	2501.15889	null
2025-01-27	LCTG Bench: LLM Controlled Text Generation Benchmark	Kentaro Kurihara et.al.	2501.15875	link
2025-01-27	LLM-attacker: Enhancing Closed-loop Adversarial Scenario Generation for Autonomous Driving with Large Language Models	Yuewen Mei et.al.	2501.15850	null
2025-01-27	SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model	Delin Qu et.al.	2501.15830	null
2025-01-27	Aging-aware CPU Core Management for Embodied Carbon Amortization in Cloud LLM Inference	Tharindu B. Hewage et.al.	2501.15829	link
2025-01-27	MADP: Multi-Agent Deductive Planning for Enhanced Cognitive-Behavioral Mental Health Question Answer	Qi Chen et.al.	2501.15826	null
2025-01-27	LemmaHead: RAG Assisted Proof Generation Using Large Language Models	Tianbo Yang et.al.	2501.15797	null
2025-01-27	Can Multimodal Large Language Models be Guided to Improve Industrial Anomaly Detection?	Zhiling Chen et.al.	2501.15795	null
2025-01-27	Harnessing Diverse Perspectives: A Multi-Agent Framework for Enhanced Error Detection in Knowledge Graphs	Yu Li et.al.	2501.15791	link
2025-01-27	Memorization and Regularization in Generative Diffusion Models	Ricardo Baptista et.al.	2501.15785	link
2025-01-27	Large Language Models to Diffusion Finetuning	Edoardo Cetin et.al.	2501.15781	null
2025-01-27	Is It Navajo? Accurate Language Detection in Endangered Athabaskan Languages	Ivory Yang et.al.	2501.15773	link
2025-01-27	GraphICL: Unlocking Graph Learning Potential in LLMs through Structured Prompt Design	Yuanfu Sun et.al.	2501.15755	null
2025-01-27	IndicMMLU-Pro: Benchmarking the Indic Large Language Models	Sankalp KJ et.al.	2501.15747	null
2025-01-27	Gensors: Authoring Personalized Visual Sensors with Multimodal Foundation Models and Reasoning	Michael Xieyang Liu et.al.	2501.15727	null
2025-01-27	A Survey on Computational Pathology Foundation Models: Datasets, Adaptation Strategies, and Evaluation Tasks	Dong Li et.al.	2501.15724	null
2025-01-27	On Parallelism in Music and Language: A Perspective from Symbol Emergence Systems based on Probabilistic Generative Models	Tadahiro Taniguchi et.al.	2501.15721	null
2025-01-26	Adapting Biomedical Abstracts into Plain language using Large Language Models	Haritha Gangavarapu et.al.	2501.15700	null
2025-01-26	TensorLLM: Tensorising Multi-Head Attention for Enhanced Reasoning and Compression in LLMs	Yuxuan Gu et.al.	2501.15674	null
2025-01-26	Bringing Characters to New Stories: Training-Free Theme-Specific Image Generation via Dynamic Visual Prompting	Yuxin Zhang et.al.	2501.15641	null
2025-01-26	BoKDiff: Best-of-K Diffusion Alignment for Target-Specific 3D Molecule Generation	Ali Khodabandeh Yalabadi et.al.	2501.15631	link
2025-01-26	Improving Estonian Text Simplification through Pretrained Language Models and Custom Datasets	Eduard Barbu et.al.	2501.15624	null
2025-01-26	Rethinking External Slow-Thinking: From Snowball Errors to Probability of Correct Reasoning	Zeyu Gan et.al.	2501.15602	link
2025-01-26	Evaluating an LLM-Powered Chatbot for Cognitive Restructuring: Insights from Mental Health Professionals	Yinzhou Wang et.al.	2501.15599	null
2025-01-26	Diffusion Generative Modeling for Spatially Resolved Gene Expression Inference from Histology Images	Sichen Zhu et.al.	2501.15598	link
2025-01-26	SedarEval: Automated Evaluation using Self-Adaptive Rubrics	Zhiyuan Fan et.al.	2501.15595	link
2025-01-26	SCP-116K: A High-Quality Problem-Solution Dataset and a Generalized Pipeline for Automated Extraction in the Higher Education Science Domain	Dakuan Lu et.al.	2501.15587	link
2025-01-26	Error Classification of Large Language Models on Math Word Problems: A Dynamically Adaptive Framework	Yuhong Sun et.al.	2501.15581	null
2025-01-26	Instruction Tuning for Story Understanding and Generation with Weak Supervision	Yangshu Yuan et.al.	2501.15574	null
2025-01-26	Cross-Cultural Fashion Design via Interactive Large Language Models and Diffusion Models	Spencer Ramsey et.al.	2501.15571	null
2025-01-26	ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer	Lin Yueyu et.al.	2501.15570	link
2025-01-26	Ocean-OCR: Towards General OCR Application via a Vision-Language Model	Song Chen et.al.	2501.15558	link
2025-01-26	Advancing Generative Artificial Intelligence and Large Language Models for Demand Side Management with Electric Vehicles	Hanwen Zhang et.al.	2501.15544	null
2025-01-26	Estimating Committor Functions via Deep Adaptive Sampling on Rare Transition Paths	Yueyang Wang et.al.	2501.15522	null
2025-01-26	Domain Adaptation from Generated Multi-Weather Images for Unsupervised Maritime Object Classification	Dan Song et.al.	2501.15503	null
2025-01-26	Unveiling the Potential of Multimodal Retrieval Augmented Generation with Planning	Xiaohan Yu et.al.	2501.15470	null
2025-01-26	Data-adaptive Safety Rules for Training Reward Models	Xiaomin Li et.al.	2501.15453	null
2025-01-26	OpenCharacter: Training Customizable Role-Playing LLMs with Large-Scale Synthetic Personas	Xiaoyang Wang et.al.	2501.15427	null
2025-01-26	Visual Generation Without Guidance	Huayu Chen et.al.	2501.15420	link
2025-01-26	AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement	Junan Zhang et.al.	2501.15417	null
2025-01-26	The Potential of Large Language Models in Supply Chain Management: Advancing Decision-Making, Efficiency, and Innovation	Raha Aghaei et.al.	2501.15411	null
2025-01-26	Semantic Layered Embedding Diffusion in Large Language Models for Multi-Contextual Consistency	Irin Kabakum et.al.	2501.15405	null
2025-01-26	How Green are Neural Language Models? Analyzing Energy Consumption in Text Summarization Fine-tuning	Tohida Rehman et.al.	2501.15398	null
2025-01-26	Zero-Shot Interactive Text-to-Image Retrieval via Diffusion-Augmented Representations	Zijun Long et.al.	2501.15379	null
2025-01-26	How to Mitigate Information Loss in Knowledge Graphs for GraphRAG: Leveraging Triple Context Restoration and Query-Driven Feedback	Manzong Huang et.al.	2501.15378	null
2025-01-26	Evaluating the Effectiveness of XAI Techniques for Encoder-Based Language Models	Melkamu Abay Mersha et.al.	2501.15374	null
2025-01-26	Scaling Large Vision-Language Models for Enhanced Multimodal Comprehension In Biomedical Image Analysis	Robinson Umeike et.al.	2501.15370	null
2025-01-26	Decentralized Low-Rank Fine-Tuning of Large Language Models	Sajjad Ghiasvand et.al.	2501.15361	null
2025-01-26	Large Language Models as Theory of Mind Aware Generative Agents with Counterfactual Reflection	Bo Yang et.al.	2501.15355	null
2025-01-25	Fairness in LLM-Generated Surveys	Andrés Abeliuk et.al.	2501.15351	null
2025-01-25	Between Puppet and Actor: Reframing Authorship in this Age of AI Agents	Yuqian Sun et.al.	2501.15346	null
2025-01-25	Recognize Any Surgical Object: Unleashing the Power of Weakly-Supervised Data	Jiajie Li et.al.	2501.15326	null
2025-01-25	ToMoE: Converting Dense Large Language Models to Mixture-of-Experts through Dynamic Structural Pruning	Shangqian Gao et.al.	2501.15316	null
2025-01-25	The Multicultural Medical Assistant: Can LLMs Improve Medical ASR Errors Across Borders?	Ayo Adedeji et.al.	2501.15310	null
2025-01-25	You Only Prune Once: Designing Calibration-Free Model Compression With Policy Learning	Ayan Sengupta et.al.	2501.15296	null
2025-01-24	HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation	Xin Zhou et.al.	2501.14729	link
2025-01-24	Do LLMs Provide Consistent Answers to Health-Related Questions across Languages?	Ipek Baris Schlicht et.al.	2501.14719	null
2025-01-24	Towards Better Understanding Table Instruction Tuning: Decoupling the Effects from Data versus Models	Naihao Deng et.al.	2501.14717	null
2025-01-24	FlexiGPT: Pruning and Extending Large Language Models with Low-Rank Weight Sharing	James Seale Smith et.al.	2501.14713	null
2025-01-24	The Karp Dataset	Mason DiCicco et.al.	2501.14705	null
2025-01-24	Rethinking Table Instruction Tuning	Naihao Deng et.al.	2501.14693	null
2025-01-24	Rethinking Foundation Models for Medical Image Classification through a Benchmark Study on MedMNIST	Fuping Wu et.al.	2501.14685	null
2025-01-24	An Empirical Study on LLM-based Classification of Requirements-related Provisions in Food-safety Regulations	Shabnam Hassani et.al.	2501.14683	null
2025-01-24	Diffusion based Text-to-Music Generation with Global and Local Text based Conditioning	Jisi Zhang et.al.	2501.14680	null
2025-01-24	MedAgentBench: Dataset for Benchmarking LLMs as Agents in Medical Applications	Yixing Jiang et.al.	2501.14654	link
2025-01-24	Investigating the (De)Composition Capabilities of Large Language Models in Natural-to-Formal Language Conversion	Ziyao Xu et.al.	2501.14649	link
2025-01-24	Towards Scalable Topological Regularizers	Hiu-Tung Wong et.al.	2501.14641	null
2025-01-24	Recommending Actionable Strategies: A Semantic Approach to Integrating Analytical Frameworks with Decision Heuristics	Renato Ghisellini et.al.	2501.14634	null
2025-01-24	Extracting Problem Structure with LLMs for Optimized SAT Local Search	André Schilder et.al.	2501.14630	null
2025-01-24	Single-neuron deep generative model uncovers underlying physics of neuronal activity in Ca imaging data	Jordi Abante et.al.	2501.14615	null
2025-01-24	ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations	Tianming Liang et.al.	2501.14607	null
2025-01-24	Leveraging ChatGPT's Multimodal Vision Capabilities to Rank Satellite Images by Poverty Level: Advancing Tools for Social Science Research	Hamid Sarmadi et.al.	2501.14546	null
2025-01-24	VERUS-LM: a Versatile Framework for Combining LLMs with Symbolic Reasoning	Benjamin Callewaert et.al.	2501.14540	null
2025-01-24	Design and Implementation of a Psychiatry Resident Training System Based on Large Language Models	Zhenguang Zhong et.al.	2501.14530	link
2025-01-24	Scene Understanding Enabled Semantic Communication with Open Channel Coding	Zhe Xiang et.al.	2501.14520	null
2025-01-24	Real-world Edge Neural Network Implementations Leak Private Interactions Through Physical Side Channel	Zhuoran Liu et.al.	2501.14512	null
2025-01-24	Automated Assignment Grading with Large Language Models: Insights From a Bioinformatics Course	Pavlin G. Poličar et.al.	2501.14499	null
2025-01-24	Evaluating and Improving Graph to Text Generation with Large Language Models	Jie He et.al.	2501.14497	link
2025-01-24	RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques	Zhengyang Tang et.al.	2501.14492	link
2025-01-24	Pesti-Gen: Unleashing a Generative Molecule Approach for Toxicity Aware Pesticide Design	Taehan Kim et.al.	2501.14469	null
2025-01-24	Boundary Value Test Input Generation Using Prompt Engineering with LLMs: Fault Detection and Coverage Analysis	Xiujing Guo et.al.	2501.14465	null
2025-01-24	Understanding and Mitigating Gender Bias in LLMs via Interpretable Neuron Editing	Zeping Yu et.al.	2501.14457	null
2025-01-24	Domaino1s: Guiding LLM Reasoning for Explainable Answers in High-Stakes Domains	Xu Chu et.al.	2501.14431	null
2025-01-24	GraphBC: Improving LLMs for Better Graph Data Processing	Xu Chu et.al.	2501.14427	null
2025-01-24	CENTS: Generating synthetic electricity consumption time series for rare and unseen scenarios	Michael Fuest et.al.	2501.14426	null
2025-01-24	DeepFlow: Serverless Large Language Model Serving at Scale	Junhao Hu et.al.	2501.14417	null
2025-01-24	SKIL: Semantic Keypoint Imitation Learning for Generalizable Data-efficient Manipulation	Shengjie Wang et.al.	2501.14400	null
2025-01-24	ECTIL: Label-efficient Computational Tumour Infiltrating Lymphocyte (TIL) assessment in breast cancer: Multicentre validation in 2,340 patients with breast cancer	Yoni Schirris et.al.	2501.14379	link
2025-01-24	DRESSing Up LLM: Efficient Stylized Question-Answering via Style Subspace Editing	Xinyu Ma et.al.	2501.14371	link
2025-01-24	Uncovering the bias in the evidence for dynamical dark energy through minimal and generalized modeling approaches	Ziad Sakr et.al.	2501.14366	null
2025-01-24	FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration	Kai-Tuo Xu et.al.	2501.14350	link
2025-01-24	Chain-of-Retrieval Augmented Generation	Liang Wang et.al.	2501.14342	null
2025-01-24	Exploring the sustainable scaling of AI dilemma: A projective study of corporations' AI environmental impacts	Clément Desroches et.al.	2501.14334	null
2025-01-24	Assessing Large Language Models in Comprehending and Verifying Concurrent Programs across Memory Models	Ridhi Jain et.al.	2501.14326	null
2025-01-24	PAID: A Framework of Product-Centric Advertising Image Design	Hongyu Chen et.al.	2501.14316	null
2025-01-24	Locality-aware Fair Scheduling in LLM Serving	Shiyi Cao et.al.	2501.14312	null
2025-01-24	A Zero-Shot LLM Framework for Automatic Assignment Grading in Higher Education	Calvin Yeung et.al.	2501.14305	link
2025-01-24	MASTER: A Multi-Agent System with LLM Specialized MCTS	Bingzheng Gan et.al.	2501.14304	null
2025-01-24	Fast Think-on-Graph: Wider, Deeper and Faster Reasoning of Large Language Model on Knowledge Graph	Xujian Liang et.al.	2501.14300	link
2025-01-24	Multi-stage Large Language Model Pipelines Can Outperform GPT-4o in Relevance Assessment	Julian A. Schnabel et.al.	2501.14296	null
2025-01-24	Examining Alignment of Large Language Models through Representative Heuristics: The Case of Political Stereotypes	Sullam Jeoung et.al.	2501.14294	link
2025-01-24	Advances in Temporal Point Processes: Bayesian, Deep, and LLM Approaches	Feng Zhou et.al.	2501.14291	null
2025-01-24	Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation	Sadegh Mahdavi et.al.	2501.14275	link
2025-01-24	Siren: A Learning-Based Multi-Turn Attack Framework for Simulating Real-World Human Jailbreak Behaviors	Yi Zhao et.al.	2501.14250	link
2025-01-24	Humanity's Last Exam	Long Phan et.al.	2501.14249	null
2025-01-24	Multi-agent KTO: Reinforcing Strategic Interactions of Large Language Model in Language Game	Rong Ye et.al.	2501.14225	null
2025-01-24	Top Ten Challenges Towards Agentic Neural Graph Databases	Jiaxin Bai et.al.	2501.14224	null
2025-01-24	TFG-Flow: Training-free Guidance in Multimodal Generative Flow	Haowei Lin et.al.	2501.14216	null
2025-01-24	Serving Long-Context LLMs at the Mobile Edge: Test-Time Reinforcement Learning-based Model Caching and Inference Offloading	Minrui Xu et.al.	2501.14205	null
2025-01-24	VideoShield: Regulating Diffusion-based Video Generation Models via Watermarking	Runyi Hu et.al.	2501.14195	link
2025-01-24	Distributed Multi-Agent Coordination Using Multi-Modal Foundation Models	Saaduddin Mahmud et.al.	2501.14189	null
2025-01-24	GeoSim.AI: AI assistants for numerical simulations in geomechanics	Yared W. Bekele et.al.	2501.14186	null
2025-01-24	AI Chatbots as Professional Service Agents: Developing a Professional Identity	Wenwen Li et.al.	2501.14179	null
2025-01-24	Argos: Agentic Time-Series Anomaly Detection with Autonomous Rule Generation via Large Language Models	Yile Gu et.al.	2501.14170	null
2025-01-24	Test-Time Code-Switching for Cross-lingual Aspect Sentiment Triplet Extraction	Dongming Sheng et.al.	2501.14144	null
2025-01-23	Autonomous Structural Memory Manipulation for Large Language Models Using Hierarchical Embedding Augmentation	Derek Yotheringhay et.al.	2501.14119	null
2025-01-23	Domain-Factored Untrained Deep Prior for Spectrum Cartography	Subash Timilsina et.al.	2501.14116	null
2025-01-23	MedSlice: Fine-Tuned Large Language Models for Secure Clinical Note Sectioning	Joshua Davis et.al.	2501.14105	link
2025-01-23	StreamingRAG: Real-time Contextual Retrieval and Generation Framework	Murugan Sankaradas et.al.	2501.14101	null
2025-01-23	Enhancing Biomedical Relation Extraction with Directionality	Po-Ting Lai et.al.	2501.14079	link
2025-01-23	LLMs are Vulnerable to Malicious Prompts Disguised as Scientific Language	Yubin Ge et.al.	2501.14073	null
2025-01-23	Efficient 2D CT Foundation Model for Contrast Phase Classification	Benjamin Hou et.al.	2501.14066	null
2025-01-23	Revisiting CLIP: Efficient Alignment of 3D MRI and Tabular Data using Domain-Specific Foundation Models	Jakob Krogh Petersen et.al.	2501.14051	link
2025-01-23	LLM-guided Instance-level Image Manipulation with Diffusion U-Net Cross-Attention Maps	Andrey Palaev et.al.	2501.14046	link
2025-01-23	Leveraging Large Language Models to Analyze Emotional and Contextual Drivers of Teen Substance Use in Online Discussions	Jianfeng Zhu et.al.	2501.14037	null
2025-01-23	CRPO: Confidence-Reward Driven Preference Optimization for Machine Translation	Guofeng Cui et.al.	2501.13927	null
2025-01-23	Improving Video Generation with Human Feedback	Jie Liu et.al.	2501.13918	null
2025-01-23	Binary Diffusion Probabilistic Model	Vitaliy Kinakh et.al.	2501.13915	null
2025-01-23	Analysis of Indic Language Capabilities in LLMs	Aatman Vaidya et.al.	2501.13912	null
2025-01-23	Privacy-Preserving Personalized Federated Prompt Learning for Multimodal Large Language Models	Linh Tran et.al.	2501.13904	null
2025-01-23	Exploring Finetuned Audio-LLM on Heart Murmur Features	Adrian Florea et.al.	2501.13884	null
2025-01-23	The machine learning platform for developers of large systems	Alexey Naikov et.al.	2501.13881	null
2025-01-23	A RAG-Based Institutional Assistant	Gustavo Kuratomi et.al.	2501.13880	null
2025-01-23	On the Reasoning Capacity of AI Models and How to Quantify It	Santosh Kumar Radha et.al.	2501.13833	null
2025-01-23	Predicting Compact Phrasal Rewrites with Large Language Models for ASR Post Editing	Hao Zhang et.al.	2501.13831	null
2025-01-23	Hallucinations Can Improve Large Language Models in Drug Discovery	Shuzhou Yuan et.al.	2501.13824	null
2025-01-23	Large Language Model driven Policy Exploration for Recommender Systems	Jie Wang et.al.	2501.13816	null
2025-01-23	Enhancing LLMs for Governance with Human Oversight: Evaluating and Aligning LLMs on Expert Classification of Climate Misinformation for Detecting False or Misleading Claims about Climate Change	Mowafak Allaham et.al.	2501.13802	null
2025-01-23	Parameter-Efficient Fine-Tuning for Foundation Models	Dan Zhang et.al.	2501.13787	link
2025-01-23	Not Every AI Problem is a Data Problem: We Should Be Intentional About Data Scaling	Tanya Rodchenko et.al.	2501.13779	null
2025-01-23	Explainable XR: Understanding User Behaviors of XR Environments using LLM-assisted Analytics Framework	Yoonsang Kim et.al.	2501.13778	link
2025-01-23	Do Large Language Models Truly Understand Geometric Structures?	Xiaofeng Wang et.al.	2501.13773	link
2025-01-23	Tune In, Act Up: Exploring the Impact of Audio Modality-Specific Edits on Large Audio Language Models in Jailbreak	Erjia Xiao et.al.	2501.13772	null
2025-01-23	UGMathBench: A Diverse and Dynamic Benchmark for Undergraduate-Level Mathematical Reasoning with Large Language Models	Xin Xu et.al.	2501.13766	null
2025-01-23	EICopilot: Search and Explore Enterprise Information over Large-scale Knowledge Graphs with LLM-driven Agents	Yuhui Yun et.al.	2501.13746	null
2025-01-23	GPT-HTree: A Decision Tree Framework Integrating Hierarchical Clustering and Large Language Models for Explainable Classification	Te Pei et.al.	2501.13743	null
2025-01-23	An Empirical Study of Retrieval-Augmented Code Generation: Challenges and Opportunities	Zezhou Yang et.al.	2501.13742	link
2025-01-23	Pseudocode-Injection Magic: Enabling LLMs to Tackle Graph Computational Tasks	Chang Gong et.al.	2501.13731	null
2025-01-23	RPO: Retrieval Preference Optimization for Robust Retrieval-Augmented Generation	Shi-Qi Yan et.al.	2501.13726	null
2025-01-23	Musical ethnocentrism in Large Language Models	Anna Kruspe et.al.	2501.13720	null
2025-01-23	A Mutual Information Perspective on Multiple Latent Variable Generative Models for Positive View Generation	Dario Serez et.al.	2501.13718	null
2025-01-23	EventVL: Understand Event Streams via Multimodal Large Language Model	Pengteng Li et.al.	2501.13707	null
2025-01-23	DI-BENCH: Benchmarking Large Language Models on Dependency Inference with Testable Repositories at Scale	Linghao Zhang et.al.	2501.13699	null
2025-01-23	Question Answering on Patient Medical Records with Private Fine-Tuned LLMs	Sara Kothari et.al.	2501.13687	null
2025-01-23	HumorReject: Decoupling LLM Safety from Refusal Prefix via A Little Humor	Zihui Wu et.al.	2501.13677	link
2025-01-23	How to Complete Domain Tuning while Keeping General Ability in LLM: Adaptive Layer-wise and Element-wise Regularization	Shezheng Song et.al.	2501.13669	null
2025-01-23	LVPruning: An Effective yet Simple Language-Guided Vision Token Pruning Approach for Multi-modal Large Language Models	Yizheng Sun et.al.	2501.13652	null
2025-01-23	Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models	Zhenghao Lin et.al.	2501.13629	null
2025-01-23	Text-to-SQL based on Large Language Models and Database Keyword Search	Eduardo R. Nascimento et.al.	2501.13594	null
2025-01-23	Improving Contextual Faithfulness of Large Language Models via Retrieval Heads-Induced Optimization	Lei Huang et.al.	2501.13573	null
2025-01-23	One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt	Tao Liu et.al.	2501.13554	link
2025-01-23	LLMs Can Plan Only If We Tell Them	Bilgehan Sel et.al.	2501.13545	null
2025-01-23	ReasVQA: Advancing VideoQA with Imperfect Reasoning Process	Jianxin Liang et.al.	2501.13536	null
2025-01-23	RECALL: Library-Like Behavior In Language Models is Enhanced by Self-Referencing Causal Cycles	Munachiso Nwadike et.al.	2501.13491	link
2025-01-23	Adaptive Testing for LLM-Based Applications: A Diversity-based Approach	Juyeon Yoon et.al.	2501.13480	null
2025-01-23	LDR-Net: A Novel Framework for AI-generated Image Detection via Localized Discrepancy Representation	JiaXin Chen et.al.	2501.13475	null
2025-01-23	Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge	Haomiao Xiong et.al.	2501.13468	link
2025-01-23	Spurious Forgetting in Continual Learning of Language Models	Junhao Zheng et.al.	2501.13453	link
2025-01-23	Softplus Attention with Re-weighting Boosts Length Extrapolation in Large Language Models	Bo Gao et.al.	2501.13428	null
2025-01-23	Predicting Turbulence Structure In Street-Canyon Flows using Deep Generative Modeling	Tomek Jaroslawski et.al.	2501.13415	null
2025-01-23	VulnBot: Autonomous Penetration Testing for A Multi-Agent Collaborative Framework	He Kong et.al.	2501.13411	link
2025-01-23	Towards Intelligent Design: A Self-driven Framework for Collocated Clothing Synthesis Leveraging Fashion Styles and Textures	Minglong Dong et.al.	2501.13396	null
2025-01-23	Can Large Language Models Understand Preferences in Personalized Recommendation?	Zhaoxuan Tan et.al.	2501.13391	link
2025-01-23	Do as We Do, Not as You Think: the Conformity of Large Language Models	Zhiyuan Weng et.al.	2501.13381	link
2025-01-23	Scalable Evaluation Framework for Foundation Models in Musculoskeletal MRI Bridging Computational Innovation with Clinical Utility	Gabrielle Hoyer et.al.	2501.13376	null
2025-01-23	Generative Data Augmentation Challenge: Zero-Shot Speech Synthesis for Personalized Speech Enhancement	Jae-Sung Bae et.al.	2501.13372	null
2025-01-23	Meta-Feature Adapter: Integrating Environmental Metadata for Enhanced Animal Re-identification	Yuzhuo Li et.al.	2501.13368	null
2025-01-23	50 Shades of Deceptive Patterns: A Unified Taxonomy, Multimodal Detection, and Security Implications	Zewei Shi et.al.	2501.13351	link
2025-01-23	MSF: Efficient Diffusion Model Via Multi-Scale Latent Factorize	Haohang Xu et.al.	2501.13349	null
2025-01-23	Full-Stack Optimized Large Language Models for Lifelong Sequential Behavior Comprehension in Recommendation	Rong Shan et.al.	2501.13344	null
2025-01-23	Multi-aspect Knowledge Distillation with Large Language Model	Taegyeong Lee et.al.	2501.13341	link
2025-01-23	Generative Multi-Form Bayesian Optimization	Zhendong Guo et.al.	2501.13337	null
2025-01-23	SplitLLM: Hierarchical Split Learning for Large Language Model over Wireless Network	Songge Zhang et.al.	2501.13318	null
2025-01-23	Representing Visualization Insights as a Dense Insight Network	Jane Hoffswell et.al.	2501.13309	null
2025-01-23	OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia	Xuelong Geng et.al.	2501.13306	link
2025-01-23	Watching the AI Watchdogs: A Fairness and Robustness Analysis of AI Safety Moderation Classifiers	Akshit Achara et.al.	2501.13302	link
2025-01-23	Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents	Shrinidhi Kumbhar et.al.	2501.13299	null
2025-01-23	RAMQA: A Unified Framework for Retrieval-Augmented Multi-Modal Question Answering	Yang Bai et.al.	2501.13297	link
2025-01-23	Toyteller: AI-powered Visual Storytelling Through Toy-Playing with Character Symbols	John Joon Young Chung et.al.	2501.13284	null
2025-01-22	MEDFORM: A Foundation Model for Contrastive Learning of CT Imaging and Clinical Numeric Data in Multi-Cancer Analysis	Daeun Jung et.al.	2501.13277	link
2025-01-22	RAG-Reward: Optimizing RAG with Reward Modeling and RLHF	Hanning Zhang et.al.	2501.13264	null
2025-01-22	Exploring GPT's Ability as a Judge in Music Understanding	Kun Fang et.al.	2501.13261	link
2025-01-22	Bypassing Array Canaries via Autonomous Function Call Resolution	Nathaniel Oh et.al.	2501.13256	link
2025-01-22	S-LoRA: Scalable Low-Rank Adaptation for Class Incremental Learning	Yichen Wu et.al.	2501.13198	null
2025-01-22	Computational modelling of biological systems now and then: revisiting tools and visions from the beginning of the century	Axel Loewe et.al.	2501.13142	null
2025-01-23	VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding	Boqiang Zhang et.al.	2501.13106	link
2025-01-22	Robust Representation Consistency Model via Contrastive Denoising	Jiachen Lei et.al.	2501.13094	link
2025-01-22	Refining Input Guardrails: Enhancing LLM-as-a-Judge Efficiency Through Chain-of-Thought Fine-Tuning and Alignment	Melissa Kazemi Rad et.al.	2501.13080	null
2025-01-22	Does Table Source Matter? Benchmarking and Improving Multimodal Scientific Table Understanding and Reasoning	Bohao Yang et.al.	2501.13042	link
2025-01-22	Pairwise RM: Perform Best-of-N Sampling with Knockout Tournament	Yantao Liu et.al.	2501.13007	link
2025-01-22	Neural network enhanced cross entropy benchmark for monitored circuits	Yangrui Hu et.al.	2501.13005	null
2025-01-22	Large Language Model-Based Semantic Communication System for Image Transmission	Soheyb Ribouh et.al.	2501.12988	null
2025-01-22	LLM4WM: Adapting LLM for Wireless Multi-Tasking	Xuanyu Liu et.al.	2501.12983	null
2025-01-22	Low-dimensional adaptation of diffusion models: Convergence in total variation	Jiadong Liang et.al.	2501.12982	null
2025-01-22	OnionEval: An Unified Evaluation of Fact-conflicting Hallucination for Small-Large Language Models	Chongren Sun et.al.	2501.12975	link
2025-01-22	Accessible Smart Contracts Verification: Synthesizing Formal Models with Tamed LLMs	Jan Corazza et.al.	2501.12972	null
2025-01-22	It's complicated. The relationship of algorithmic fairness and non-discrimination regulations in the EU AI Act	Kristof Meding et.al.	2501.12962	null
2025-01-22	Efficient Prompt Compression with Evaluator Heads for Long-Context Transformer Inference	Weizhi Fei et.al.	2501.12959	null
2025-01-22	GANQ: GPU-Adaptive Non-Uniform Quantization for Large Language Models	Pengxiang Zhao et.al.	2501.12956	null
2025-01-22	3D Object Manipulation in a Single Image using Generative Models	Ruisi Zhao et.al.	2501.12935	null
2025-01-22	Correctness Assessment of Code Generated by Large Language Models Using Internal Representations	Tuan-Dung Bui et.al.	2501.12934	link
2025-01-22	DynamicEarth: How Far are We from Open-Vocabulary Change Detection?	Kaiyu Li et.al.	2501.12931	null
2025-01-22	A Functional Software Reference Architecture for LLM-Integrated Systems	Alessio Bucaioni et.al.	2501.12904	null
2025-01-22	Architectural Fusion Through Contextual Partitioning in Large Language Models: A Novel Approach to Parameterized Knowledge Integration	Offa Kingsleigh et.al.	2501.12901	null
2025-01-22	Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback	Yafu Li et.al.	2501.12895	link
2025-01-23	Generative AI Misuse Potential in Cyber Security Education: A Case Study of a UK Degree Program	Carlton Shepherd et.al.	2501.12883	null
2025-01-22	WisdomBot: Tuning Large Language Models with Artificial Intelligence Knowledge	Jingyuan Chen et.al.	2501.12877	null
2025-01-22	ACEBench: Who Wins the Match Point in Tool Learning?	Chen Chen et.al.	2501.12851	null
2025-01-22	AMM-Diff: Adaptive Multi-Modality Diffusion Network for Missing Modality Imputation	Aghiles Kebaili et.al.	2501.12840	null
2025-01-22	Adaptive Retrieval Without Self-Knowledge? Bringing Uncertainty Back Home	Viktor Moskvoretskii et.al.	2501.12835	null
2025-01-22	Open or Closed LLM for Lesser-Resourced Languages? Lessons from Greek	John Pavlopoulos et.al.	2501.12826	link
2025-01-22	Enhancing Monocular Depth Estimation with Multi-Source Auxiliary Tasks	Alessio Quercia et.al.	2501.12824	null
2025-01-22	Certified Guidance for Planning with Deep Generative Models	Francesco Giacomarra et.al.	2501.12815	null
2025-01-22	Revisit Self-Debugging with Self-Generated Tests for Code Generation	Xiancai Chen et.al.	2501.12793	null
2025-01-22	LLMs as Repositories of Factual Knowledge: Limitations and Solutions	Seyed Mahed Mousavi et.al.	2501.12774	null
2025-01-22	NExtLong: Toward Effective Long-Context Training without Long Documents	Chaochen Gao et.al.	2501.12766	link
2025-01-22	Online Preference Alignment for Language Models via Count-based Exploration	Chenjia Bai et.al.	2501.12735	link
2025-01-22	Paradigm-Based Automatic HDL Code Generation Using LLMs	Wenhao Sun et.al.	2501.12702	null
2025-01-22	Training Dialogue Systems by AI Feedback for Improving Overall Dialogue Impression	Kai Yoshida et.al.	2501.12698	null
2025-01-22	Combining Knowledge Graph and LLMs for Enhanced Zero-shot Visual Question Answering	Qian Tao et.al.	2501.12697	null
2025-01-22	SoundSpring: Loss-Resilient Audio Transceiver with Dual-Functional Masked Language Modeling	Shengshi Yao et.al.	2501.12696	null
2025-01-22	EchoLM: Accelerating LLM Serving with Real-time Knowledge Distillation	Yifan Yu et.al.	2501.12689	null
2025-01-22	Distillation Quantification for Large Language Models	Sunbowen Lee et.al.	2501.12619	link
2025-01-22	Deep Learning-Based Identification of Inconsistent Method Names: How Far Are We?	Taiming Wang et.al.	2501.12617	null
2025-01-22	Kimi k1.5: Scaling Reinforcement Learning with LLMs	Kimi Team et.al.	2501.12599	null
2025-01-22	Leveraging LLMs to Create a Haptic Devices' Recommendation System	Yang Liu et.al.	2501.12573	null
2025-01-22	Understanding the LLM-ification of CHI: Unpacking the Impact of LLMs at CHI through a Systematic Literature Review	Rock Yuren Pang et.al.	2501.12557	link
2025-01-21	Human-like conceptual representations emerge from language prediction	Ningyu Xu et.al.	2501.12547	null
2025-01-21	How Does the Spatial Distribution of Pre-training Data Affect Geospatial Foundation Models?	Mirali Purohit et.al.	2501.12535	null
2025-01-21	An Empirically-grounded tool for Automatic Prompt Linting and Repair: A Case Study on Bias, Vulnerability, and Optimization in Developer Prompts	Dhia Elhaq Rzig et.al.	2501.12521	null
2025-01-21	A Domain Adaptation Framework for Speech Recognition Systems with Only Synthetic data	Minh Tran et.al.	2501.12501	null
2025-01-21	The Journey Matters: Average Parameter Count over Pre-training Unifies Sparse and Dense Scaling Laws	Tian Jin et.al.	2501.12486	null
2025-01-21	An Empirical Characterization of Outages and Incidents in Public Services for Large Language Models	Xiaoyu Chu et.al.	2501.12469	link
2025-01-21	Adaptive PII Mitigation Framework for Large Language Models	Shubhi Asthana et.al.	2501.12465	null
2025-01-21	Empowering AIOps: Leveraging Large Language Models for IT Operations Management	Arthur Vitui et.al.	2501.12461	link
2025-01-21	Deploying Privacy Guardrails for LLMs: A Comparative Analysis of Real-World Applications	Shubhi Asthana et.al.	2501.12456	null
2025-01-21	Divide-Then-Aggregate: An Efficient Tool Learning Method via Parallel Tool Invocation	Dongsheng Zhu et.al.	2501.12432	null
2025-01-21	FREYR: A Framework for Recognizing and Executing Your Requests	Roberto Gallotta et.al.	2501.12423	link
2025-01-21	CroMe: Multimodal Fake News Detection using Cross-Modal Tri-Transformer and Metric Learning	Eunjee Choi et.al.	2501.12422	null
2025-01-22	InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling	Yi Wang et.al.	2501.12386	link
2025-01-21	Accelerating Pulsar Parameter Estimation Using Convolutional Neural Networks	Greg Olmschenk et.al.	2501.12383	null
2025-01-21	MMVU: Measuring Expert-Level Multi-Discipline Video Understanding	Yilun Zhao et.al.	2501.12380	link
2025-01-22	Video Depth Anything: Consistent Depth Estimation for Super-Long Videos	Sili Chen et.al.	2501.12375	null
2025-01-21	Expertise elevates AI usage: experimental evidence comparing laypeople and professional artists	Thomas F. Eisenmann et.al.	2501.12374	link
2025-01-21	Is Long Context All You Need? Leveraging LLM's Extended Context for NL2SQL	Yeounoh Chung et.al.	2501.12372	null
2025-01-21	Automatic Labelling with Open-source LLMs using Dynamic Label Schema Integration	Thomas Walshe et.al.	2501.12332	null
2025-01-21	Cinepro: Robust Training of Foundation Models for Cancer Detection in Prostate Ultrasound Cineloops	Mohamed Harmanani et.al.	2501.12331	link
2025-01-21	VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model	Xianwei Zhuang et.al.	2501.12327	link
2025-01-21	LLM-Assisted Knowledge Graph Completion for Curriculum and Domain Modelling in Personalized Higher Education Recommendations	Hasan Abu-Rasheed et.al.	2501.12300	null
2025-01-21	MoGERNN: An Inductive Traffic Predictor for Unobserved Locations in Dynamic Sensing Networks	Qishen Zhou et.al.	2501.12281	link
2025-01-21	Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement	Maosong Cao et.al.	2501.12273	link
2025-01-21	FOCUS: First Order Concentrated Updating Scheme	Yizhou Liu et.al.	2501.12243	null
2025-01-21	InsTALL: Context-aware Instructional Task Assistance with Multi-modal Large Language Models	Pha Nguyen et.al.	2501.12231	null
2025-01-21	CDW-CoT: Clustered Distance-Weighted Chain-of-Thoughts Reasoning	Yuanheng Fang et.al.	2501.12226	null
2025-01-21	Leveraging Large Language Models for Realizing Truly Intelligent User Interfaces	Allard Oelen et.al.	2501.12221	null
2025-01-21	You Can't Eat Your Cake and Have It Too: The Performance Degradation of LLMs with Jailbreak Defense	Wuyuao Mai et.al.	2501.12210	null
2025-01-21	Explainability for Vision Foundation Models: A Survey	Rémi Kazmierczak et.al.	2501.12203	null
2025-01-22	Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation	Zibo Zhao et.al.	2501.12202	link
2025-01-21	BiMarker: Enhancing Text Watermark Detection for Large Language Models with Bipolar Watermarks	Zhuang Li et.al.	2501.12174	null
2025-01-21	Contextualizing Recommendation Explanations with LLMs: A User Study	Yuanjun Feng et.al.	2501.12152	null
2025-01-21	Improving Influence-based Instruction Tuning Data Selection for Balanced Learning of Diverse Capabilities	Qirun Dai et.al.	2501.12147	null
2025-01-21	Do LLMs Provide Links to Code Similar to what they Generate? A Study with Gemini and Bing CoPilot	Daniele Bifolco et.al.	2501.12134	null
2025-01-21	Evaluating Efficiency and Engagement in Scripted and LLM-Enhanced Human-Robot Interactions	Tim Schreiter et.al.	2501.12128	null
2025-01-21	Can open source large language models be used for tumor documentation in Germany? -- An evaluation on urological doctors' notes	Stefan Lenz et.al.	2501.12106	link
2025-01-21	Dissecting the NVIDIA Hopper Architecture through Microbenchmarking and Multiple Level Analysis	Weile Luo et.al.	2501.12084	null
2025-01-21	Phishing Awareness via Game-Based Learning	Argianto Rahartomo et.al.	2501.12077	link
2025-01-21	PINNsAgent: Automated PDE Surrogation with Large Language Models	Qingpo Wuwu et.al.	2501.12053	null
2025-01-21	Harnessing Generative Pre-Trained Transformer for Datacenter Packet Trace Generation	Chen Griner et.al.	2501.12033	null
2025-01-21	Comparative Analysis of Pre-trained Deep Learning Models and DINOv2 for Cushing's Syndrome Diagnosis in Facial Analysis	Hongjun Liu et.al.	2501.12023	null
2025-01-21	Are Traditional Deep Learning Model Approaches as Effective as a Retinal-Specific Foundation Model for Ocular and Systemic Disease Detection?	Samantha Min Er Yew et.al.	2501.12016	null
2025-01-21	Rate-Aware Learned Speech Compression	Jun Xu et.al.	2501.11999	null
2025-01-21	Linear Feedback Control Systems for Iterative Prompt Optimization in Large Language Models	Rupesh Raj Karn et.al.	2501.11979	null
2025-01-21	Leveraging Graph Structures and Large Language Models for End-to-End Synthetic Task-Oriented Dialogues	Maya Medjad et.al.	2501.11977	link
2025-01-21	Bridging Visualization and Optimization: Multimodal Large Language Models on Graph-Structured Combinatorial Optimization	Jie Zhao et.al.	2501.11968	null
2025-01-21	A Hybrid Attention Framework for Fake News Detection with Large Language Models	Xiaochuan Xu et.al.	2501.11967	null
2025-01-21	TAD-Bench: A Comprehensive Benchmark for Embedding-Based Text Anom
