Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2025-02-27 | FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving | Guizhen Chen et.al. | 2502.20238 | null |
2025-02-27 | Collaborative Stance Detection via Small-Large Language Model Consistency Verification | Yu Yan et.al. | 2502.19954 | null |
2025-02-27 | Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models | Yuan Sui et.al. | 2502.19918 | null |
2025-02-27 | Order Doesn't Matter, But Reasoning Does: Training LLMs with Order-Centric Augmentation | Qianxi He et.al. | 2502.19907 | null |
2025-02-27 | Towards Multimodal Large-Language Models for Parent-Child Interaction: A Focus on Joint Attention | Weiyan Shi et.al. | 2502.19877 | null |
2025-02-26 | Weaker LLMs' Opinions Also Matter: Mixture of Opinions Enhances LLM's Mathematical Reasoning | Yanan Chen et.al. | 2502.19622 | null |
2025-02-26 | General Reasoning Requires Learning to Reason from the Get-go | Seungwook Han et.al. | 2502.19402 | null |
2025-02-26 | BIG-Bench Extra Hard | Mehran Kazemi et.al. | 2502.19187 | null |
2025-02-25 | Scalable Best-of-N Selection for Large Language Models via Self-Certainty | Zhewei Kang et.al. | 2502.18581 | null |
2025-02-25 | SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution | Yuxiang Wei et.al. | 2502.18449 | null |
2025-02-25 | Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning | Wenkai Yang et.al. | 2502.18080 | null |
2025-02-21 | Improving Value-based Process Verifier via Structural Prior Injection | Zetian Sun et.al. | 2502.17498 | null |
2025-02-24 | Making LLMs Reason? The Intermediate Language Problem in Neurosymbolic Approaches | Alexander Beiser et.al. | 2502.17216 | null |
2025-02-24 | Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI | Syed Abdul Gaffar Shakhadri et.al. | 2502.17092 | null |
2025-02-24 | Understanding the Uncertainty of LLM Explanations: A Perspective Based on Reasoning Topology | Longchao Da et.al. | 2502.17026 | null |
2025-02-24 | All-in-one: Understanding and Generation in Multimodal Reasoning with the MAIA Benchmark | Davide Testa et.al. | 2502.16989 | null |
2025-02-24 | AutoLogi: Automated Generation of Logic Puzzles for Evaluating Reasoning Abilities of Large Language Models | Qin Zhu et.al. | 2502.16906 | link |
2025-02-24 | The Blessing of Reasoning: LLM-Based Contrastive Explanations in Black-Box Recommender Systems | Yuyan Wang et.al. | 2502.16759 | null |
2025-02-23 | Reasoning about Affordances: Causal and Compositional Reasoning in LLMs | Magnus F. Gjerde et.al. | 2502.16606 | null |
2025-02-22 | ThinkBench: Dynamic Out-of-Distribution Evaluation for Robust LLM Reasoning | Shulin Huang et.al. | 2502.16268 | null |
2025-02-27 | Dynamic Parallel Tree Search for Efficient LLM Reasoning | Yifu Ding et.al. | 2502.16235 | null |
2025-02-22 | Patterns Over Principles: The Fragility of Inductive Reasoning in LLMs under Noisy Observations | Chunyang Li et.al. | 2502.16169 | link |
2025-02-22 | Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models | Qianqi Yan et.al. | 2502.16033 | null |
2025-02-21 | MutaGReP: Execution-Free Repository-Grounded Plan Search for Code-Use | Zaid Khan et.al. | 2502.15872 | null |
2025-02-21 | Do Multilingual LLMs Think In English? | Lisa Schut et.al. | 2502.15603 | null |
2025-02-21 | Evaluating Social Biases in LLM Reasoning | Xuyang Wu et.al. | 2502.15361 | null |
2025-02-21 | Stepwise Informativeness Search for Improving LLM Reasoning | Siyuan Wang et.al. | 2502.15335 | null |
2025-02-21 | Latent Factor Models Meets Instructions:Goal-conditioned Latent Factor Discovery without Task Supervision | Zhouhang Xie et.al. | 2502.15147 | null |
2025-02-19 | SIFT: Grounding LLM Reasoning in Contexts via Stickers | Zihao Zeng et.al. | 2502.14922 | null |
2025-02-18 | Think Inside the JSON: Reinforcement Strategy for Strict LLM Schema Adherence | Bhavik Agarwal et.al. | 2502.14905 | null |
2025-02-20 | Exploring Advanced Techniques for Visual Question Answering: A Comprehensive Comparison | Aiswarya Baby et.al. | 2502.14827 | null |
2025-02-20 | Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning | Tian Xie et.al. | 2502.14768 | link |
2025-02-19 | Enhancing LLM-Based Recommendations Through Personalized Reasoning | Jiahao Liu et.al. | 2502.13845 | null |
2025-02-19 | MCTS-KBQA: Monte Carlo Tree Search for Knowledge Base Question Answering | Guanming Xiong et.al. | 2502.13428 | null |
2025-02-19 | MM-Verify: Enhancing Multimodal Reasoning with Chain-of-Thought Verification | Linzhuang Sun et.al. | 2502.13383 | link |
2025-02-22 | Grounding LLM Reasoning with Knowledge Graphs | Alfonso Amayuelas et.al. | 2502.13247 | null |
2025-02-18 | Theorem Prover as a Judge for Synthetic Data Generation | Joshua Ong Jun Leang et.al. | 2502.13137 | null |
2025-02-18 | Flow-of-Options: Diversified and Improved LLM Reasoning by Thinking Through Options | Lakshmi Nair et.al. | 2502.12929 | link |
2025-02-18 | S | Ruotian Ma et.al. | 2502.12853 | link |
2025-02-18 | CutPaste&Find: Efficient Multimodal Hallucination Detector with Visual-aid Knowledge Base | Cong-Duy Nguyen et.al. | 2502.12591 | null |
2025-02-18 | Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights | Shubham Parashar et.al. | 2502.12521 | null |
2025-02-18 | HopRAG: Multi-Hop Reasoning for Logic-Aware Retrieval-Augmented Generation | Hao Liu et.al. | 2502.12442 | null |
2025-02-17 | Evaluating Step-by-step Reasoning Traces: A Survey | Jinu Lee et.al. | 2502.12289 | null |
2025-02-17 | SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs | Yige Xu et.al. | 2502.12134 | null |
2025-02-17 | TokenSkip: Controllable Chain-of-Thought Compression in LLMs | Heming Xia et.al. | 2502.12067 | link |
2025-02-17 | Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models | Hyunwoo Kim et.al. | 2502.11881 | null |
2025-02-17 | Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities | Hanbin Wang et.al. | 2502.11829 | link |
2025-02-17 | Language Models Can See Better: Visual Contrastive Decoding For LLM Multimodal Reasoning | Yuqi Pang et.al. | 2502.11751 | link |
2025-02-17 | DeFiScope: Detecting Various DeFi Price Manipulations with LLM Reasoning | Juantao Zhong et.al. | 2502.11521 | null |
2025-02-16 | Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls | Ante Wang et.al. | 2502.11183 | null |
2025-02-16 | LogiDynamics: Unraveling the Dynamics of Logical Inference in Large Language Model Reasoning | Tianshi Zheng et.al. | 2502.11176 | null |
2025-02-15 | A Tutorial on LLM Reasoning: Relevant Methods behind ChatGPT o1 | Jun Wang et.al. | 2502.10867 | null |
2025-02-15 | USER-VLM 360: Personalized Vision Language Models with User-aware Tuning for Social Human-Robot Interactions | Hamed Rahimi et.al. | 2502.10636 | null |
2025-02-14 | Do Large Language Models Reason Causally Like Us? Even Better? | Hanna M. Dettki et.al. | 2502.10215 | null |
2025-02-14 | MathConstruct: Challenging LLM Reasoning with Constructive Proofs | Mislav Balunović et.al. | 2502.10197 | null |
2025-02-13 | MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency | Dongzhi Jiang et.al. | 2502.09621 | null |
2025-02-14 | EnigmaEval: A Benchmark of Long Multimodal Reasoning Challenges | Clinton J. Wang et.al. | 2502.08859 | null |
2025-02-11 | CIRCUIT: A Benchmark for Circuit Interpretation and Reasoning Capabilities of LLMs | Lejla Skelic et.al. | 2502.07980 | null |
2025-02-05 | Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment | Cheryl Li et.al. | 2502.07803 | null |
2025-02-17 | Bag of Tricks for Inference-time Computation of LLM Reasoning | Fan Liu et.al. | 2502.07191 | null |
2025-02-15 | Self-Supervised Prompt Optimization | Jinyu Xiang et.al. | 2502.06855 | link |
2025-02-06 | Vision-Integrated LLMs for Autonomous Driving Assistance : Human Performance Comparison and Trust Evaluation | Namhee Kim et.al. | 2502.06843 | null |
2025-02-04 | Policy Guided Tree Search for Enhanced LLM Reasoning | Yang Li et.al. | 2502.06813 | null |
2025-02-10 | ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates | Ling Yang et.al. | 2502.06772 | link |
2025-02-10 | Resurrecting saturated LLM benchmarks with adversarial encoding | Igor Ivanov et.al. | 2502.06738 | null |
2025-02-13 | LawGPT: Knowledge-Guided Data Generation and Its Application to Legal LLM | Zhi Zhou et.al. | 2502.06572 | link |
2025-02-09 | A Generative Framework for Bidirectional Image-Report Understanding in Chest Radiography | Nicholas Evans et.al. | 2502.05926 | null |
2025-02-08 | Evaluating Vision-Language Models for Emotion Recognition | Sree Bhattacharyya et.al. | 2502.05660 | null |
2025-02-07 | GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity? | Yang Zhou et.al. | 2502.05252 | link |
2025-02-07 | Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures | Tushar Pandey et.al. | 2502.05078 | link |
2025-02-07 | Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research | Junde Wu et.al. | 2502.04644 | link |
2025-02-05 | Enhancing Reasoning to Adapt Large Language Models for Domain-Specific Applications | Bo Wen et.al. | 2502.04384 | link |
2025-02-05 | Limitations of Large Language Models in Clinical Problem-Solving Arising from Inflexible Reasoning | Jonathan Kim et.al. | 2502.04381 | null |
2025-02-04 | Investigating the Robustness of Deductive Reasoning with Large Language Models | Fabian Hoppe et.al. | 2502.04352 | null |
2025-02-04 | Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search | Maohao Shen et.al. | 2502.02508 | null |
2025-02-04 | CoAT: Chain-of-Associated-Thoughts Framework for Enhancing Large Language Models Reasoning | Jianfeng Pan et.al. | 2502.02390 | null |
2025-02-08 | Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking | Jinyang Wu et.al. | 2502.02339 | null |
2025-02-04 | Mitigating Object Hallucinations in Large Vision-Language Models via Attention Calibration | Younan Zhu et.al. | 2502.01969 | null |
2025-01-31 | Improving Rule-based Reasoning in LLMs via Neurosymbolic Representations | Varun Dhanraj et.al. | 2502.01657 | null |
2025-02-03 | Position: Empowering Time Series Reasoning with Multimodal LLMs | Yaxuan Kong et.al. | 2502.01477 | null |
2025-02-03 | ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning | Bill Yuchen Lin et.al. | 2502.01100 | null |
2025-02-16 | Learning Autonomous Code Integration for Math Language Models | Haozhe Wang et.al. | 2502.00691 | null |
2025-02-13 | Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning | Zhi Zhou et.al. | 2502.00511 | null |
2025-02-14 | Reward-Guided Speculative Decoding for Efficient LLM Reasoning | Baohao Liao et.al. | 2501.19324 | null |
2025-01-31 | Efficient Reasoning with Hidden Thinking | Xuan Shen et.al. | 2501.19201 | link |
2025-01-31 | BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning | Han Zhong et.al. | 2501.18858 | null |
2025-01-28 | A Stochastic Dynamical Theory of LLM Self-Adversariality: Modeling Severity Drift as a Critical Process | Jack David Carson et.al. | 2501.16783 | null |
2025-01-27 | Explaining GitHub Actions Failures with Large Language Models: Challenges, Insights, and Limitations | Pablo Valenzuela-Toledo et.al. | 2501.16495 | null |
2025-01-27 | Large Models in Dialogue for Active Perception and Anomaly Detection | Tzoulio Chamiti et.al. | 2501.16300 | link |
2025-01-26 | TensorLLM: Tensorising Multi-Head Attention for Enhanced Reasoning and Compression in LLMs | Yuxuan Gu et.al. | 2501.15674 | null |
2025-01-28 | Rethinking External Slow-Thinking: From Snowball Errors to Probability of Correct Reasoning | Zeyu Gan et.al. | 2501.15602 | link |
2025-01-26 | Error Classification of Large Language Models on Math Word Problems: A Dynamically Adaptive Framework | Yuhong Sun et.al. | 2501.15581 | null |
2025-02-15 | Option-ID Based Elimination For Multiple Choice Questions | Zhenhao Zhu et.al. | 2501.15175 | null |
2025-01-24 | Domaino1s: Guiding LLM Reasoning for Explainable Answers in High-Stakes Domains | Xu Chu et.al. | 2501.14431 | null |
2025-02-12 | GraphSOS: Graph Sampling and Order Selection to Help LLMs Understand Graphs Better | Xu Chu et.al. | 2501.14427 | null |
2025-01-23 | Pseudocode-Injection Magic: Enabling LLMs to Tackle Graph Computational Tasks | Chang Gong et.al. | 2501.13731 | null |
2025-02-10 | Cognitive Paradigms for Evaluating VLMs on Visual Reasoning Task | Mohit Vaishnav et.al. | 2501.13620 | null |
2025-01-22 | EvidenceMap: Unleashing the Power of Small Language Models with Evidence Analysis for Biomedical Question Answering | Chang Zong et.al. | 2501.12746 | null |
2025-01-17 | LLM Reasoner and Automated Planner: A new NPC approach | Israel Puerta-Merino et.al. | 2501.10106 | null |
2025-01-22 | FRAG: A Flexible Modular Framework for Retrieval-Augmented Generation based on Knowledge Graphs | Zengyi Gao et.al. | 2501.09957 | null |
2025-01-17 | Evolving Deeper LLM Thinking | Kuang-Huei Lee et.al. | 2501.09891 | null |
2025-01-23 | Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models | Fengli Xu et.al. | 2501.09686 | null |
2025-01-15 | Multimodal LLMs Can Reason about Aesthetics in Zero-Shot | Ruixiang Jiang et.al. | 2501.09012 | link |
2025-02-10 | Ensemble of Large Language Models for Curated Labeling and Rating of Free-text Data | Jiaxing Qiu et.al. | 2501.08413 | link |
2025-01-14 | Reasoning with Graphs: Structuring Implicit Knowledge to Enhance LLMs Reasoning | Haoyu Han et.al. | 2501.07845 | null |
2025-01-09 | Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark | Yunzhuo Hao et.al. | 2501.05444 | link |
2025-01-08 | Enhancing Financial VQA in Vision Language Models using Intermediate Structured Representations | Archita Srivastava et.al. | 2501.04675 | null |
2025-01-08 | DRIVINGVQA: Analyzing Visual Chain-of-Thought Reasoning of Vision Language Models in Real-World Scenarios with Driving Theory Tests | Charles Corbière et.al. | 2501.04671 | null |
2025-01-08 | Understanding Before Reasoning: Enhancing Chain-of-Thought with Iterative Summarization Pre-Prompting | Dong-Hai Zhu et.al. | 2501.04341 | link |
2025-01-07 | Reasoning-Enhanced Self-Training for Long-Form Personalized Text Generation | Alireza Salemi et.al. | 2501.04167 | null |
2025-01-07 | Socratic Questioning: Learn to Self-guide Multimodal Reasoning in the Wild | Wanpeng Hu et.al. | 2501.02964 | link |
2025-01-06 | KG-CF: Knowledge Graph Completion with Context Filtering under the Guidance of Large Language Models | Zaiyi Zheng et.al. | 2501.02711 | null |
2025-01-04 | Table as Thought: Exploring Structured Thoughts in LLM Reasoning | Zhenjie Sun et.al. | 2501.02152 | null |
2025-01-03 | Recursive Decomposition of Logical Thoughts: Framework for Superior Reasoning and Knowledge Propagation in Large Language Models | Kaleem Ullah Qasim et.al. | 2501.02026 | null |
2025-01-02 | Enhancing Reasoning through Process Supervision with Monte Carlo Tree Search | Shuangtao Li et.al. | 2501.01478 | null |
2025-01-02 | HetGCoT-Rec: Heterogeneous Graph-Enhanced Chain-of-Thought LLM Reasoning for Journal Recommendation | Runsong Jia et.al. | 2501.01203 | null |
2025-01-03 | Enhancing LLM Reasoning with Multi-Path Collaborative Reactive and Reflection agents | Chengbo He et.al. | 2501.00430 | null |
2024-12-31 | EQUATOR: A Deterministic Framework for Evaluating LLM Reasoning with Open-Ended Questions. # v1.0.0-beta | Raymond Bernard et.al. | 2501.00257 | null |
2024-12-30 | Efficiently Serving LLM Reasoning Programs with Certaindex | Yichao Fu et.al. | 2412.20993 | null |
2024-12-28 | LLM Reasoning Engine: Specialized Training for Enhanced Mathematical Reasoning | Shuguang Chen et.al. | 2412.20227 | null |
2025-02-17 | Token-Budget-Aware LLM Reasoning | Tingxu Han et.al. | 2412.18547 | link |
2024-12-23 | StructTest: Benchmarking LLMs' Reasoning through Compositional Structured Outputs | Hailin Chen et.al. | 2412.18011 | null |
2025-02-09 | Evaluating LLM Reasoning in the Operations Research Domain with ORQA | Mahdi Mostajabdaveh et.al. | 2412.17874 | link |
2024-12-23 | Diving into Self-Evolving Training for Multimodal Reasoning | Wei Liu et.al. | 2412.17451 | null |
2024-12-21 | SilVar: Speech Driven Multimodal Model for Reasoning Visual Question Answering and Object Localization | Tan-Hanh Pham et.al. | 2412.16771 | null |
2024-12-20 | PruneVid: Visual Token Pruning for Efficient Video Large Language Models | Xiaohu Huang et.al. | 2412.16117 | link |
2024-12-19 | Eliciting Causal Abilities in Large Language Models for Reasoning Tasks | Yajing Wang et.al. | 2412.15314 | link |
2024-12-19 | Critical-Questions-of-Thought: Steering LLM reasoning with Argumentative Querying | Federico Castagna et.al. | 2412.15177 | link |
2024-12-19 | Progressive Multimodal Reasoning via Active Retrieval | Guanting Dong et.al. | 2412.14835 | null |
2024-12-19 | FiVL: A Framework for Improved Vision-Language Alignment | Estelle Aflalo et.al. | 2412.14672 | null |
2024-12-19 | FaultExplainer: Leveraging Large Language Models for Interpretable Fault Detection and Diagnosis | Abdullah Khan et.al. | 2412.14492 | link |
2024-12-18 | Cognition Chain for Explainable Psychological Stress Detection on Social Media | Xin Wang et.al. | 2412.14009 | null |
2024-12-27 | Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence | Jinghan He et.al. | 2412.13949 | null |
2025-02-16 | Do Language Models Understand Time? | Xi Ding et.al. | 2412.13845 | link |
2024-12-18 | Beyond Outcomes: Transparent Assessment of LLM Reasoning in Games | Wenye Lin et.al. | 2412.13602 | null |
2024-12-17 | ClarityEthic: Explainable Moral Judgment Utilizing Contrastive Ethical Insights from Large Language Models | Yuxi Sun et.al. | 2412.12848 | null |
2024-12-12 | A NotSo Simple Way to Beat Simple Bench | Soham Sane et.al. | 2412.12173 | null |
2024-12-11 | What Makes In-context Learning Effective for Mathematical Reasoning: A Theoretical Analysis | Jiayu Liu et.al. | 2412.12157 | null |
2025-02-18 | A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges | Yibo Yan et.al. | 2412.11936 | null |
2024-12-24 | Stepwise Reasoning Error Disruption Attack of LLMs | Jingyu Peng et.al. | 2412.11934 | null |
2024-12-16 | Leveraging Retrieval-Augmented Tags for Large Vision-Language Understanding in Complex Scenes | Antonio Carlos Rivera et.al. | 2412.11396 | null |
2024-12-15 | SceneLLM: Implicit Language Reasoning in LLM for Dynamic Scene Graph Generation | Hang Zhang et.al. | 2412.11026 | null |
2024-12-15 | Entropy-Regularized Process Reward Model | Hanning Zhang et.al. | 2412.11006 | link |
2024-12-14 | Optimizing Vision-Language Interactions Through Decoder-Only Models | Kaito Tanaka et.al. | 2412.10758 | null |
2024-12-14 | Chasing Progress, Not Perfection: Revisiting Strategies for End-to-End LLM Plan Generation | Sukai Huang et.al. | 2412.10675 | null |
2024-12-14 | Thinking with Knowledge Graphs: Enhancing LLM Reasoning Through Structured Data | Xue Wu et.al. | 2412.10654 | null |
2024-12-13 | EVLM: Self-Reflective Multimodal Reasoning for Cross-Dimensional Visual Editing | Umar Khalid et.al. | 2412.10566 | null |
2024-12-13 | Atomic Learning Objectives Labeling: A High-Resolution Approach for Physics Education | Naiming Liu et.al. | 2412.09914 | null |
2025-01-18 | Neptune: The Long Orbit to Benchmarking Long Video Understanding | Arsha Nagrani et.al. | 2412.09582 | link |
2025-02-14 | Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning | Zhenni Bi et.al. | 2412.09078 | link |
2024-12-11 | Training Large Language Models to Reason in a Continuous Latent Space | Shibo Hao et.al. | 2412.06769 | link |
2025-01-23 | GameArena: Evaluating LLM Reasoning through Live Computer Games | Lanxiang Hu et.al. | 2412.06394 | null |
2024-12-08 | Language hooks: a modular framework for augmenting LLM reasoning that decouples tool usage from the model and its prompt | Damien de Mijolla et.al. | 2412.05967 | null |
2024-12-06 | MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale | Jarvis Guo et.al. | 2412.05237 | null |
2024-12-05 | Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction | Yiheng Xu et.al. | 2412.04454 | null |
2024-12-05 | SocialMind: LLM-based Proactive AR Social Assistive System with Human-like Perception for In-situ Live Interactions | Bufang Yang et.al. | 2412.04036 | null |
2024-12-04 | DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation | Qingdong He et.al. | 2412.03255 | null |
2024-12-03 | Explainable CTR Prediction via LLM Reasoning | Xiaohan Yu et.al. | 2412.02588 | null |
2025-02-12 | NYT-Connections: A Deceptively Simple Text Classification Task that Stumps System-1 Thinkers | Angel Yahir Loredo Lopez et.al. | 2412.01621 | null |
2025-01-13 | Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability | Zicheng Lin et.al. | 2411.19943 | link |
2024-11-29 | TQA-Bench: Evaluating LLMs for Multi-Table Question Answering with Scalable Context and Symbolic Extension | Zipeng Qiu et.al. | 2411.19504 | link |
2024-11-29 | COLD: Causal reasOning in cLosed Daily activities | Abhinav Joshi et.al. | 2411.19500 | link |
2024-12-16 | Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning | Di Zhang et.al. | 2411.18203 | null |
2024-11-26 | NEMO: Can Multimodal LLMs Identify Attribute-Modified Objects? | Jiaxuan Li et.al. | 2411.17794 | null |
2024-11-25 | Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision | Zhiheng Xi et.al. | 2411.16579 | null |
2024-11-22 | On the Impact of Fine-Tuning on Chain-of-Thought Reasoning | Elita Lobo et.al. | 2411.15382 | null |
2024-11-21 | Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models | Yuhao Dong et.al. | 2411.14432 | link |
2024-11-20 | BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games | Davide Paglieri et.al. | 2411.13543 | null |
2024-11-20 | Hints of Prompt: Enhancing Visual Representation for Multimodal LLMs in Autonomous Driving | Hao Zhou et.al. | 2411.13076 | null |
2024-11-15 | Thinking Before Looking: Improving Multimodal LLM Reasoning via Mitigating Visual Hallucination | Haojie Zheng et.al. | 2411.12591 | link |
2024-12-23 | Enhancing Reasoning Capabilities of LLMs via Principled Synthetic Logic Corpus | Terufumi Morishita et.al. | 2411.12498 | link |
2024-11-18 | Semantic-Geometric-Physical-Driven Robot Manipulation Skill Transfer via Skill Library and Tactile Representation | Mingchao Qi et.al. | 2411.11714 | link |
2024-12-31 | Enhancing LLM Reasoning with Reward-guided Tree Search | Jinhao Jiang et.al. | 2411.11694 | null |
2024-12-15 | A dataset of questions on decision-theoretic reasoning in Newcomb-like problems | Caspar Oesterheld et.al. | 2411.10588 | link |
2024-11-15 | Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization | Weiyun Wang et.al. | 2411.10442 | null |
2025-01-09 | LLaVA-CoT: Let Vision Language Models Reason Step-by-Step | Guowei Xu et.al. | 2411.10440 | link |
2024-11-15 | Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level | Andong Deng et.al. | 2411.09921 | null |
2024-11-14 | Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering | Nghia Trung Ngo et.al. | 2411.09213 | null |
2024-11-13 | Tree-of-Table: Unleashing the Power of LLMs for Enhanced Large-Scale Table Understanding | Deyi Ji et.al. | 2411.08516 | null |
2024-11-18 | What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? | Katie Kang et.al. | 2411.07681 | link |
2024-11-27 | Self-Training Meets Consistency: Improving LLMs' Reasoning With Consistency-Driven Rationale Evaluation | Jaehyeok Lee et.al. | 2411.06387 | link |
2024-11-09 | A Picture is Worth A Thousand Numbers: Enabling LLMs Reason about Time Series via Visualization | Haoxin Liu et.al. | 2411.06018 | null |
2024-11-11 | LLMs as Method Actors: A Model for Prompt Engineering and Architecture | Colin Doyle et.al. | 2411.05778 | link |
2024-11-12 | Kwai-STaR: Transform LLMs into State-Transition Reasoners | Xingyu Lu et.al. | 2411.04799 | null |
2024-11-21 | Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding | Haolin Chen et.al. | 2411.04282 | link |
2024-11-05 | CrowdGenUI: Enhancing LLM-Based UI Widget Generation with a Crowdsourced Preference Library | Yimeng Liu et.al. | 2411.03477 | null |
2025-01-27 | MetRex: A Benchmark for Verilog Code Metric Reasoning Using LLMs | Manar Abdelatty et.al. | 2411.03471 | link |
2024-11-04 | RuAG: Learned-rule-augmented Generation for Large Language Models | Yudi Zhang et.al. | 2411.03349 | null |
2024-10-30 | Vision-Language Models Can Self-Improve Reasoning via Reflection | Kanzhi Cheng et.al. | 2411.00855 | null |
2024-11-01 | Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling | Yiwen Ding et.al. | 2411.00750 | link |
2024-11-01 | STEM-POM: Evaluating Language Models Math-Symbol Reasoning in Document Parsing | Jiaru Zou et.al. | 2411.00387 | null |
2024-11-08 | GRS-QA -- Graph Reasoning-Structured Question Answering Dataset | Anish Pahilajani et.al. | 2411.00369 | null |
2024-10-31 | Thought Space Explorer: Navigating and Expanding Thought Space for Large Language Model Reasoning | Jinghan Zhang et.al. | 2410.24155 | null |
2024-10-31 | RL-STaR: Theoretical Analysis of Reinforcement Learning Frameworks for Self-Taught Reasoner | Fu-Chieh Chang et.al. | 2410.23912 | null |
2024-10-31 | OCEAN: Offline Chain-of-thought Evaluation and Alignment in Large Language Models | Junda Wu et.al. | 2410.23703 | null |
2024-10-30 | ReasoningRec: Bridging Personalized Recommendations and Human-Interpretable Explanations through LLM Reasoning | Millennium Bismay et.al. | 2410.23180 | link |
2024-10-30 | On Memorization of Large Language Models in Logical Reasoning | Chulin Xie et.al. | 2410.23123 | null |
2024-10-28 | Causal Interventions on Causal Paths: Mapping GPT-2's Reasoning From Syntax to Semantics | Isabelle Lee et.al. | 2410.21353 | null |
2024-10-28 | Guide-LLM: An Embodied LLM Agent and Text-Based Topological Map for Robotic Guidance of People with Visual Impairments | Sangmim Song et.al. | 2410.20666 | null |
2024-10-25 | Cooperative Strategic Planning Enhances Reasoning Capabilities in Large Language Models | Danqing Wang et.al. | 2410.20007 | null |
2024-10-25 | Can Stories Help LLMs Reason? Curating Information Space Through Narrative | Vahid Sadiri Javadi et.al. | 2410.19221 | null |
2024-10-18 | Make LLMs better zero-shot reasoners: Structure-orientated autonomous reasoning | Pengfei He et.al. | 2410.19000 | link |
2024-10-25 | CLR-Bench: Evaluating Large Language Models in College-level Reasoning | Junnan Dong et.al. | 2410.17558 | null |
2024-10-28 | Non-myopic Generation of Language Models for Reasoning and Planning | Chang Ma et.al. | 2410.17195 | link |
2024-11-06 | Improving Causal Reasoning in Large Language Models: A Survey | Longxuan Yu et.al. | 2410.16676 | link |
2024-10-22 | A Statistical Analysis of LLMs' Self-Evaluation Using Proverbs | Ryosuke Sonoda et.al. | 2410.16640 | null |
2024-10-21 | Rulebreakers Challenge: Revealing a Blind Spot in Large Language Models' Reasoning with Formal Logic | Jason Chan et.al. | 2410.16502 | null |
2024-11-27 | On Designing Effective RL Reward at Training Time for LLM Reasoning | Jiaxuan Gao et.al. | 2410.15115 | null |
2025-01-28 | Paths-over-Graph: Knowledge Graph Empowered Large Language Model Reasoning | Xingyu Tan et.al. | 2410.14211 | null |
2024-10-21 | Unconstrained Model Merging for Enhanced LLM Reasoning | Yiming Zhang et.al. | 2410.13699 | null |
2024-10-16 | Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models | Linhao Luo et.al. | 2410.13080 | link |
2024-10-16 | KcMF: A Knowledge-compliant Framework for Schema and Entity Matching with Fine-tuning-free LLMs | Yongqin Xu et.al. | 2410.12480 | null |
2024-10-17 | Enhancing LLM Trading Performance with Fact-Subjectivity Aware Reasoning | Qian Wang et.al. | 2410.12464 | null |
2024-10-16 | Reversal of Thought: Enhancing Large Language Models with Preference-Guided Reverse Reasoning Warm-up | Jiahao Yuan et.al. | 2410.12323 | link |
2024-10-16 | Exploiting LLMs' Reasoning Capability to Infer Implicit Concepts in Legal Information Retrieval | Hai-Long Nguyen et.al. | 2410.12154 | null |
2024-10-15 | Planning Anything with Rigor: General-Purpose Zero-Shot Planning with LLM-based Formalized Programming | Yilun Hao et.al. | 2410.12112 | null |
2024-10-12 | OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models | Jun Wang et.al. | 2410.09671 | null |
2024-10-11 | P-FOLIO: Evaluating and Improving Logical Reasoning with Abundant Human-Written Reasoning Chains | Simeng Han et.al. | 2410.09207 | null |
2024-10-11 | Aerial Vision-and-Language Navigation via Semantic-Topo-Metric Representation Guided LLM Reasoning | Yunpeng Gao et.al. | 2410.08500 | null |
2024-10-10 | SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation | Hang Yin et.al. | 2410.08189 | null |
2024-10-10 | Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning | Amrith Setlur et.al. | 2410.08146 | null |
2024-10-10 | Automatic Curriculum Expert Iteration for Reliable LLM Reasoning | Zirui Zhao et.al. | 2410.07627 | null |
2024-10-09 | Boosting Few-Shot Detection with Large Language Models and Layout-to-Image Synthesis | Ahmed Abdullah et.al. | 2410.06841 | null |
2024-10-09 | Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning | Xiyao Wang et.al. | 2410.06508 | null |
2025-01-02 | Filtering Discomforting Recommendations with Large Language Models | Jiahao Liu et.al. | 2410.05411 | null |
2024-10-05 | Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification | Zhenwen Liang et.al. | 2410.05318 | null |
2024-10-06 | Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval | Pengcheng Jiang et.al. | 2410.04585 | link |
2024-10-03 | The Role of Deductive and Inductive Reasoning in Large Language Models | Chengkun Cai et.al. | 2410.02892 | null |
2024-10-02 | Not All LLM Reasoners Are Created Equal | Arian Hosseini et.al. | 2410.01748 | null |
2024-12-25 | Interpretable Contrastive Monte Carlo Tree Search Reasoning | Zitian Gao et.al. | 2410.01707 | link |
2024-10-02 | VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment | Amirhossein Kazemnejad et.al. | 2410.01679 | link |
2024-10-02 | AHP-Powered LLM Reasoning for Multi-Criteria Evaluation of Open-Ended Responses | Xiaotian Lu et.al. | 2410.01246 | null |
2024-10-01 | Self-controller: Controlling LLMs with Multi-round Step-by-step Self-awareness | Xiao Peng et.al. | 2410.00359 | null |
2024-10-01 | Insight: A Multi-Modal Diagnostic Pipeline using LLMs for Ocular Surface Disease Diagnosis | Chun-Hsiao Yeh et.al. | 2410.00292 | null |
2024-10-08 | GUNDAM: Aligning Large Language Models with Graph Understanding | Sheng Ouyang et.al. | 2409.20053 | null |
2024-09-27 | Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs | Yanyuan Qiao et.al. | 2409.18794 | null |
2024-10-23 | Proof of Thought : Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning | Debargha Ganguly et.al. | 2409.17270 | null |
2024-09-20 | CSCE: Boosting LLM Reasoning by Simultaneous Enhancing of Casual Significance and Consistency | Kangsheng Wang et.al. | 2409.17174 | null |
2024-09-20 | Mufu: Multilingual Fused Learning for Low-Resource Translation with LLM | Zheng Wei Lim et.al. | 2409.13949 | null |
2024-09-19 | SituationAdapt: Contextual UI Optimization in Mixed Reality with Situation Awareness via LLM Reasoning | Zhipeng Li et.al. | 2409.12836 | null |
2024-10-04 | Unlocking Reasoning Potential in Large Langauge Models by Scaling Code-form Planning | Jiaxin Wen et.al. | 2409.12452 | link |
2024-12-16 | Enhancing Logical Reasoning in Large Language Models through Graph-based Synthetic Data | Jiaming Zhou et.al. | 2409.12437 | link |
2024-09-18 | MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning | Justin Chih-Yao Chen et.al. | 2409.12147 | link |
2024-11-05 | Improving LLM Reasoning with Multi-Agent Tree-of-Thought Validator Agent | Fatemeh Haji et.al. | 2409.11527 | link |
2024-09-16 | Enhancing RL Safety with Counterfactual LLM Reasoning | Dennis Gross et.al. | 2409.10188 | link |
2024-09-11 | Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation | SeongYeub Chu et.al. | 2409.07355 | link |
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2025-02-26 | Exploring Graph Tasks with Pure LLMs: A Comprehensive Benchmark and Investigation | Yuxiang Wang et.al. | 2502.18771 | link |
2025-02-23 | Recent Advances in Large Langauge Model Benchmarks against Data Contamination: From Static to Dynamic Evaluation | Simin Chen et.al. | 2502.17521 | link |
2025-02-24 | Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective | Chengyin Xu et.al. | 2502.17262 | null |
2025-02-24 | Detecting Benchmark Contamination Through Watermarking | Tom Sander et.al. | 2502.17259 | null |
2025-02-24 | Predicting Liquidity-Aware Bond Yields using Causal GANs and Deep Reinforcement Learning with LLM Evaluation | Jaskaran Singh Walia et.al. | 2502.17011 | null |
2025-02-24 | AlphaAgent: LLM-Driven Alpha Mining with Regularized Exploration to Counteract Alpha Decay | Ziyi Tang et.al. | 2502.16789 | null |
2025-01-30 | Retrieval Augmented Generation Based LLM Evaluation For Protocol State Machine Inference With Chain-of-Thought Reasoning | Youssef Maklad et.al. | 2502.15727 | null |
2025-02-20 | Prompt-to-Leaderboard | Evan Frick et.al. | 2502.14855 | link |
2025-02-27 | SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines | M-A-P Team et.al. | 2502.14739 | null |
2025-02-20 | SEA-HELM: Southeast Asian Holistic Evaluation of Language Models | Yosephine Susanto et.al. | 2502.14301 | null |
2025-02-20 | Transfer-Prompting: Enhancing Cross-Task Adaptation in Large Language Models via Dual-Stage Prompts Optimization | Yupeng Chang et.al. | 2502.14211 | link |
2025-02-19 | Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above | Nishant Balepur et.al. | 2502.14127 | null |
2025-02-19 | STEER-ME: Assessing the Microeconomic Reasoning of Large Language Models | Narun Raman et.al. | 2502.13119 | null |
2025-02-18 | HPSS: Heuristic Prompting Strategy Search for LLM Evaluators | Bosi Wen et.al. | 2502.13031 | null |
2025-02-18 | None of the Others: a General Technique to Distinguish Reasoning from Memorization in Multiple-Choice LLM Evaluation Benchmarks | Eva Sánchez Salido et.al. | 2502.12896 | null |
2025-02-18 | Safe at the Margins: A General Approach to Safety Alignment in Low-Resource English Languages -- A Singlish Case Study | Isaac Lim et.al. | 2502.12485 | null |
2025-02-17 | Deviation Ratings: A General, Clone-Invariant Rating Method | Luke Marris et.al. | 2502.11645 | null |
2025-02-21 | TituLLMs: A Family of Bangla LLMs with Comprehensive Benchmarking | Shahriar Kabir Nahin et.al. | 2502.11187 | null |
2025-02-15 | Rule-Bottleneck Reinforcement Learning: Joint Explanation and Decision Optimization for Resource Allocation with Language Agents | Mauricio Tec et.al. | 2502.10732 | null |
2025-02-15 | An Empirical Analysis of Uncertainty in Large Language Model Evaluations | Qiujie Xie et.al. | 2502.10709 | link |
2025-02-25 | Accelerating Unbiased LLM Evaluation via Synthetic Feedback | Zhaoyi Zhou et.al. | 2502.10563 | link |
2025-02-14 | MathConstruct: Challenging LLM Reasoning with Constructive Proofs | Mislav Balunović et.al. | 2502.10197 | null |
2025-02-13 | Enhancing Jailbreak Attacks via Compliance-Refusal-Based Initialization | Amit Levi et.al. | 2502.09755 | null |
2025-02-13 | NestQuant: Nested Lattice Quantization for Matrix Products and LLMs | Semyon Savkin et.al. | 2502.09720 | null |
2025-02-12 | The Science of Evaluating Foundation Models | Jiayi Yuan et.al. | 2502.09670 | null |
2025-02-13 | Copilot Arena: A Platform for Code LLM Evaluation in the Wild | Wayne Chi et.al. | 2502.09328 | null |
2025-02-12 | Revisiting 3D LLM Benchmarks: Are We Really Testing 3D Capabilities? | Jiahe Jin et.al. | 2502.08503 | link |
2025-02-11 | Forget What You Know about LLMs Evaluations -- LLMs are Like a Chameleon | Nurit Cohen-Inger et.al. | 2502.07445 | link |
2025-02-10 | Evaluating the Systematic Reasoning Abilities of Large Language Models through Graph Coloring | Alex Heyman et.al. | 2502.07087 | link |
2025-02-10 | Multi-turn Evaluation of Anthropomorphic Behaviours in Large Language Models | Lujain Ibrahim et.al. | 2502.07077 | null |
2025-02-07 | LLM-Supported Natural Language to Bash Translation | Finnian Westenfelder et.al. | 2502.06858 | link |
2025-02-15 | Self-Supervised Prompt Optimization | Jinyu Xiang et.al. | 2502.06855 | link |
2025-02-10 | Resurrecting saturated LLM benchmarks with adversarial encoding | Igor Ivanov et.al. | 2502.06738 | null |
2025-02-10 | Automatic Evaluation of Healthcare LLMs Beyond Question-Answering | Anna Arias-Duart et.al. | 2502.06666 | null |
2025-02-10 | Unbiased Evaluation of Large Language Models from a Causal Perspective | Meilin Chen et.al. | 2502.06655 | null |
2025-02-10 | LessLeak-Bench: A First Investigation of Data Leakage in LLMs Across 83 Software Engineering Benchmarks | Xin Zhou et.al. | 2502.06215 | null |
2025-02-05 | Aero-LLM: A Distributed Framework for Secure UAV Communication and Intelligent Decision-Making | Balakrishnan Dharmalingam et.al. | 2502.05220 | null |
2025-02-06 | TruthFlow: Truthful LLM Generation via Representation Flow Correction | Hanyu Wang et.al. | 2502.04556 | null |
2025-02-05 | How do Humans and Language Models Reason About Creativity? A Comparative Analysis | Antonio Laverghetta Jr. et.al. | 2502.03253 | null |
2025-02-05 | On Zero-Initialized Attention: Optimal Prompt and Gating Factor Estimation | Nghiem T. Diep et.al. | 2502.03029 | null |
2025-02-02 | LLM-Powered Benchmark Factory: Reliable, Generic, and Efficient | Peiwen Yuan et.al. | 2502.01683 | link |
2025-02-02 | HASSLE-free: A unified Framework for Sparse plus Low-Rank Matrix Decomposition for LLMs | Mehdi Makni et.al. | 2502.00899 | null |
2025-02-01 | DUET: Optimizing Training Data Mixtures via Feedback from Unseen Evaluation Tasks | Zhiliang Chen et.al. | 2502.00270 | null |
2025-01-30 | Overestimation in LLM Evaluation: A Controlled Large-Scale Study on Data Contamination's Impact on Machine Translation | Muhammed Yusuf Kocyigit et.al. | 2501.18771 | null |
2025-01-31 | ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation | Minghua He et.al. | 2501.18460 | null |
2025-02-01 | LLM Evaluation Based on Aerospace Manufacturing Expertise: Automated Generation and Multi-Model Question Answering | Beiming Liu et.al. | 2501.17183 | null |
2025-01-28 | An LLM Benchmark for Addressee Recognition in Multi-modal Multi-party Dialogue | Koji Inoue et.al. | 2501.16643 | null |
2025-01-26 | HardML: A Benchmark For Evaluating Data Science And Machine Learning knowledge and reasoning in AI | Tidor-Vlad Pricope et.al. | 2501.15627 | null |
2025-01-23 | Question Answering on Patient Medical Records with Private Fine-Tuned LLMs | Sara Kothari et.al. | 2501.13687 | null |
2025-01-10 | CodEv: An Automated Grading Framework Leveraging Large Language Models for Consistent and Constructive Feedback | En-Qi Tseng et.al. | 2501.10421 | null |
2025-01-15 | Towards Multilingual LLM Evaluation for Baltic and Nordic languages: A study on Lithuanian History | Yevhen Kostiuk et.al. | 2501.09154 | null |
2025-01-13 | Benchmarking Abstractive Summarisation: A Dataset of Human-authored Summaries of Norwegian News Articles | Samia Touileb et.al. | 2501.07718 | null |
2025-01-03 | FLAME: Financial Large-Language Model Assessment and Metrics Evaluation | Jiayu Guo et.al. | 2501.06211 | link |
2025-01-07 | MTRAG: A Multi-Turn Conversational Benchmark for Evaluating Retrieval-Augmented Generation Systems | Yannis Katsis et.al. | 2501.03468 | link |
2025-01-05 | Evaluating Large Language Models Against Human Annotators in Latent Content Analysis: Sentiment, Political Leaning, Emotional Intensity, and Sarcasm | Ljubisa Bojic et.al. | 2501.02532 | null |
2025-01-04 | LLMzSzŁ: a comprehensive LLM benchmark for Polish | Krzysztof Jassem et.al. | 2501.02266 | null |
2025-01-08 | VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM | Yuqian Yuan et.al. | 2501.00599 | link |
2025-01-04 | Setting Standards in Turkish NLP: TR-MMLU for Large Language Model Evaluation | M. Ali Bayram et.al. | 2501.00593 | null |
2024-12-31 | Echoes in AI: Quantifying Lack of Plot Diversity in LLM Outputs | Weijia Xu et.al. | 2501.00273 | null |
2024-12-30 | EVOLVE: Emotion and Visual Output Learning via LLM Evaluation | Jordan Sinclair et.al. | 2412.20632 | null |
2024-12-24 | Muse: A Multimodal Conversational Recommendation Dataset with Scenario-Grounded User Profiles | Zihan Wang et.al. | 2412.18416 | null |
2024-12-24 | A Statistical Framework for Ranking LLM-Based Chatbots | Siavash Ameli et.al. | 2412.18407 | link |
2025-01-25 | DeepCRCEval: Revisiting the Evaluation of Code Review Comment Generation | Junyi Lu et.al. | 2412.18291 | null |
2024-12-23 | CARL-GT: Evaluating Causal Reasoning Capabilities of Large Language Models | Ruibo Tu et.al. | 2412.17970 | link |
2025-01-02 | Baichuan4-Finance Technical Report | Hanyu Zhang et.al. | 2412.15270 | null |
2024-12-19 | ObjVariantEnsemble: Advancing Point Cloud LLM Evaluation in Challenging Scenes with Subtly Distinguished Objects | Qihang Cao et.al. | 2412.14837 | null |
2024-12-18 | AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge | Xiaobao Wu et.al. | 2412.13670 | link |
2025-02-16 | Mind Your Theory: Theory of Mind Goes Deeper Than Reasoning | Eitan Wagner et.al. | 2412.13631 | null |
2025-02-17 | OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain | Shuting Wang et.al. | 2412.13018 | link |
2024-12-10 | How to Choose a Threshold for an Evaluation Metric for Large Language Models | Bhaskarjit Sarmah et.al. | 2412.12148 | null |
2024-12-15 | Dual Traits in Probabilistic Reasoning of Large Language Models | Shenxiong Li et.al. | 2412.11009 | link |
2024-12-30 | LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation | Eunsu Kim et.al. | 2412.10424 | null |
2024-12-13 | Cultural Evolution of Cooperation among LLM Agents | Aron Vallinder et.al. | 2412.10270 | null |
2024-12-12 | Towards Understanding the Robustness of LLM-based Evaluations under Perturbations | Manav Chaudhary et.al. | 2412.09269 | null |
2024-12-10 | BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities | Sahal Shaji Mullappilly et.al. | 2412.07769 | link |
2024-12-12 | PediaBench: A Comprehensive Chinese Pediatric Dataset for Benchmarking Large Language Models | Qian Zhang et.al. | 2412.06287 | link |
2024-12-02 | AI Benchmarks and Datasets for LLM Evaluation | Todor Ivanov et.al. | 2412.01020 | null |
2024-11-30 | Evaluating the Consistency of LLM Evaluators | Noah Lee et.al. | 2412.00543 | null |
2024-11-29 | MIMDE: Exploring the Use of Synthetic vs Human Data for Evaluating Multi-Insight Multi-Document Extraction Tasks | John Francis et.al. | 2411.19689 | null |
2024-11-29 | Beyond Surface Structure: A Causal Assessment of LLMs' Comprehension Ability | Yujin Han et.al. | 2411.19456 | link |
2024-11-27 | Is my Meeting Summary Good? Estimating Quality with a Multi-LLM Evaluator | Frederic Kirstein et.al. | 2411.18444 | null |
2025-01-17 | CS-Eval: A Comprehensive Large Language Model Benchmark for CyberSecurity | Zhengmin Yu et.al. | 2411.16239 | link |
2024-11-25 | SAGEval: The frontiers of Satisfactory Agent based NLG Evaluation for reference-free open-ended text | Reshmi Ghosh et.al. | 2411.16077 | null |
2024-11-26 | Do LLMs Agree on the Creativity Evaluation of Alternative Uses? | Abdullah Al Rabeyah et.al. | 2411.15560 | null |
2025-02-17 | Ranking Unraveled: Recipes for LLM Rankings in Head-to-Head AI Combat | Roland Daynauth et.al. | 2411.14483 | link |
2024-11-21 | Lost in Inference: Rediscovering the Role of Natural Language Inference for Large Language Models | Lovish Madaan et.al. | 2411.14103 | null |
2024-11-21 | An Evaluation-Driven Approach to Designing LLM Agents: Process and Architecture | Boming Xia et.al. | 2411.13768 | null |
2024-11-21 | A Framework for Evaluating LLMs Under Task Indeterminacy | Luke Guerdan et.al. | 2411.13760 | null |
2024-11-12 | Large Language Models as Neurolinguistic Subjects: Identifying Internal Representations for Form and Meaning | Linyang He et.al. | 2411.07533 | null |
2024-11-13 | Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models | Yancheng He et.al. | 2411.07140 | null |
2024-11-09 | Golden Touchstone: A Comprehensive Bilingual Benchmark for Evaluating Financial Large Language Models | Xiaojun Wu et.al. | 2411.06272 | link |
2025-02-09 | ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding | Israel Abebe Azime et.al. | 2411.05049 | null |
2024-11-07 | Bayesian Calibration of Win Rate Estimation with LLM Evaluators | Yicheng Gao et.al. | 2411.04424 | link |
2024-11-05 | Enhancing LLM Evaluations: The Garbling Trick | William F. Bradley et.al. | 2411.01533 | null |
2025-02-19 | Varco Arena: A Tournament Approach to Reference-Free Benchmarking Large Language Models | Seonil Son et.al. | 2411.01281 | null |
2025-02-07 | Mastering the Craft of Data Synthesis for CodeLLMs | Meng Chen et.al. | 2411.00005 | link |
2024-10-28 | Project MPG: towards a generalized performance benchmark for LLM capabilities | Lucas Spangher et.al. | 2410.22368 | null |
2024-10-29 | Self-Preference Bias in LLM-as-a-Judge | Koki Wataoka et.al. | 2410.21819 | null |
2024-10-28 | Unveiling Context-Aware Criteria in Self-Assessing LLMs | Taneesh Gupta et.al. | 2410.21545 | null |
2024-10-27 | LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization | Jui-Nan Yen et.al. | 2410.20625 | null |
2024-10-26 | Limitations of the LLM-as-a-Judge Approach for Evaluating LLM Outputs in Expert Knowledge Tasks | Annalisa Szymanski et.al. | 2410.20266 | null |
2024-10-23 | MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning | Jingfan Zhang et.al. | 2410.18035 | null |
2025-02-21 | Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements | Isamu Isozaki et.al. | 2410.17141 | link |
2024-10-21 | CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution | Maosong Cao et.al. | 2410.16256 | link |
2025-01-26 | mHumanEval -- A Multilingual Benchmark to Evaluate Large Language Models for Code Generation | Nishat Raihan et.al. | 2410.15037 | link |
2024-10-19 | CAP: Data Contamination Detection via Consistency Amplification | Yi Zhao et.al. | 2410.15005 | null |
2024-10-18 | Enabling Scalable Evaluation of Bias Patterns in Medical LLMs | Hamed Fayyaz et.al. | 2410.14763 | link |
2024-11-06 | Diverging Preferences: When do Annotators Disagree and do Models Know? | Michael JQ Zhang et.al. | 2410.14632 | null |
2024-10-18 | Combining Entropy and Matrix Nuclear Norm for Enhanced Evaluation of Language Models | James Vo et.al. | 2410.14480 | null |
2024-10-21 | BenTo: Benchmark Task Reduction with In-Context Transferability | Hongyu Zhao et.al. | 2410.13804 | link |
2024-10-16 | BenchmarkCards: Large Language Model and Risk Reporting | Anna Sokol et.al. | 2410.12974 | null |
2025-02-01 | Language Model Preference Evaluation with Multiple Weak Evaluators | Zhengyu Hu et.al. | 2410.12869 | link |
2024-10-11 | Enterprise Benchmarks for Large Language Model Evaluation | Bing Zhang et.al. | 2410.12857 | link |
2024-10-16 | An Automatic and Cost-Efficient Peer-Review Framework for Language Generation Evaluation | Junjie Chen et.al. | 2410.12265 | null |
2024-10-15 | Leaving the barn door open for Clever Hans: Simple features predict LLM benchmark answers | Lorenzo Pacchiardi et.al. | 2410.11672 | link |
2024-10-15 | Black-box Uncertainty Quantification Method for LLM-as-a-Judge | Nico Wagner et.al. | 2410.11594 | null |
2024-10-14 | Jailbreak Instruction-Tuned LLMs via end-of-sentence MLP Re-weighting | Yifan Luo et.al. | 2410.10150 | null |
2024-12-13 | HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics | Jingxuan Fan et.al. | 2410.09988 | link |
2024-10-15 | LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models | Han Qiu et.al. | 2410.09962 | link |
2024-10-17 | Towards Multilingual LLM Evaluation for European Languages | Klaudia Thellmann et.al. | 2410.08928 | null |
2024-10-11 | Test-driven Software Experimentation with LASSO: an LLM Benchmarking Example | Marcus Kessel et.al. | 2410.08911 | null |
2024-10-10 | Assessing Episodic Memory in LLMs with Sequence Order Recall Tasks | Mathis Pink et.al. | 2410.08133 | null |
2025-02-03 | COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act | Philipp Guldimann et.al. | 2410.07959 | link |
2024-11-06 | News Reporter: A Multi-lingual LLM Framework for Broadcast T.V News | Tarun Jain et.al. | 2410.07520 | null |
2024-10-09 | Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates | Xiaosen Zheng et.al. | 2410.07137 | link |
2024-10-09 | ReIFE: Re-evaluating Instruction-Following Evaluation | Yixin Liu et.al. | 2410.07069 | link |
2024-10-08 | Active Evaluation Acquisition for Efficient LLM Benchmarking | Yang Li et.al. | 2410.05952 | null |
2024-10-07 | TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles | Qingchen Yu et.al. | 2410.05262 | link |
2024-10-01 | Language Enhanced Model for Eye (LEME): An Open-Source Ophthalmology-Specific Large Language Model | Aidan Gilson et.al. | 2410.03740 | null |
2024-10-04 | TICKing All the Boxes: Generated Checklists Improve LLM Evaluation and Generation | Jonathan Cook et.al. | 2410.03608 | null |
2024-10-04 | Towards Reproducible LLM Evaluation: Quantifying Uncertainty in LLM Benchmark Scores | Robert E. Blackwell et.al. | 2410.03492 | null |
2024-10-29 | AIME: AI System Optimization via Multiple LLM Evaluators | Bhrij Patel et.al. | 2410.03131 | null |
2024-10-02 | Comparing Criteria Development Across Domain Experts, Lay Users, and Models in Large Language Model Evaluation | Annalisa Szymanski et.al. | 2410.02054 | null |
2024-10-02 | Knowledge-Driven Feature Selection and Engineering for Genotype Data with Large Language Models | Joseph Lee et.al. | 2410.01795 | link |
2024-10-03 | Extending Context Window of Large Language Models from a Distributional Perspective | Yingsheng Wu et.al. | 2410.01490 | null |
2024-10-02 | ConServe: Harvesting GPUs for Low-Latency and High-Throughput Large Language Model Serving | Yifan Qiao et.al. | 2410.01228 | null |
2024-10-01 | ViDAS: Vision-based Danger Assessment and Scoring | Pranav Gupta et.al. | 2410.00477 | null |
2024-10-01 | PclGPT: A Large Language Model for Patronizing and Condescending Language Detection | Hongbo Wang et.al. | 2410.00361 | link |
2024-11-26 | LexEval: A Comprehensive Chinese Legal Benchmark for Evaluating Large Language Models | Haitao Li et.al. | 2409.20288 | link |
2024-09-29 | Does RAG Introduce Unfairness in LLMs? Evaluating Fairness in Retrieval-Augmented Generation Systems | Xuyang Wu et.al. | 2409.19804 | null |
2024-10-19 | Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models | Xin Li et.al. | 2409.19667 | link |
2024-10-05 | IDGen: Item Discrimination Induced Prompt Generation for LLM Evaluation | Fan Lin et.al. | 2409.18892 | link |
2024-12-13 | A Character-Centric Creative Story Generation via Imagination | Kyeongman Park et.al. | 2409.16667 | null |
2024-09-25 | Judgment of Thoughts: Courtroom of the Binary Logical Reasoning in Large Language Models | Sungjune Park et.al. | 2409.16635 | null |
2024-12-18 | Kalahi: A handcrafted, grassroots cultural LLM evaluation suite for Filipino | Jann Railey Montalan et.al. | 2409.15380 | link |
2024-12-16 | MQM-APE: Toward High-Quality Error Annotation Predictors with Automatic Post-Editing in LLM Translation Evaluators | Qingyu Lu et.al. | 2409.14335 | link |
2024-09-21 | ChemEval: A Comprehensive Multi-Level Chemical Evaluation for Large Language Models | Yuqing Huang et.al. | 2409.13989 | link |
2024-12-17 | AraDiCE: Benchmarks for Dialectal and Cultural Capabilities in LLMs | Basel Mousi et.al. | 2409.11404 | null |
2024-10-02 | LLM-as-a-Judge & Reward Model: What They Can and Cannot Do | Guijin Son et.al. | 2409.11239 | null |
2024-12-08 | Towards Data Contamination Detection for Modern Large Language Models: Limitations, Inconsistencies, and Oracle Challenges | Vinay Samuel et.al. | 2409.09927 | link |
2024-09-13 | Cracking the Code: Multi-domain LLM Evaluation on Real-World Professional Exams in Indonesia | Fajri Koto et.al. | 2409.08564 | null |
2024-09-09 | Assessing SPARQL capabilities of Large Language Models | Lars-Peter Meyer et.al. | 2409.05925 | link |
2024-10-08 | LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs | Yuhao Wu et.al. | 2409.02076 | link |
2024-10-14 | Polyrating: A Cost-Effective and Bias-Aware Rating System for LLM Evaluation | Jasper Dekoninck et.al. | 2409.00696 | null |
2024-08-26 | Evaluating ChatGPT on Nuclear Domain-Specific Data | Muhammad Anwar et.al. | 2409.00090 | null |
2024-08-28 | LLMSecCode: Evaluating Large Language Models for Secure Coding | Anton Rydén et.al. | 2408.16100 | link |
2024-08-26 | LLM-3D Print: Large Language Models To Monitor and Control 3D Printing | Yayati Jadhav et.al. | 2408.14307 | null |
2024-08-26 | Epidemic Information Extraction for Event-Based Surveillance using Large Language Models | Sergio Consoli et.al. | 2408.14277 | null |
2024-10-04 | MobileQuant: Mobile-friendly Quantization for On-device Language Models | Fuwen Tan et.al. | 2408.13933 | link |
2024-08-23 | LalaEval: A Holistic Human Evaluation Framework for Domain-Specific Large Language Models | Chongyan Sun et.al. | 2408.13338 | null |
2024-08-23 | Open Llama2 Model for the Lithuanian Language | Artūras Nakvosas et.al. | 2408.12963 | null |
2024-08-23 | LIMP: Large Language Model Enhanced Intent-aware Mobility Prediction | Songwei Li et.al. | 2408.12832 | link |
2024-12-20 | Recording for Eyes, Not Echoing to Ears: Contextualized Spoken-to-Written Conversion of ASR Transcripts | Jiaqing Liu et.al. | 2408.09688 | null |
2024-08-20 | Constructing Domain-Specific Evaluation Sets for LLM-as-a-judge | Ravi Raju et.al. | 2408.08808 | null |
2024-10-16 | The Fellowship of the LLMs: Multi-Agent Workflows for Synthetic Preference Optimization Dataset Generation | Samee Arif et.al. | 2408.08688 | link |
2024-10-19 | Persona is a Double-edged Sword: Mitigating the Negative Impact of Role-playing Prompts in Zero-shot Reasoning Tasks | Junseok Kim et.al. | 2408.08631 | null |
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2025-02-27 | R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts | Zhongyang Li et.al. | 2502.20395 | null |
2025-02-27 | InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions | Sirui Xu et.al. | 2502.20390 | null |
2025-02-27 | Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation | Sucheng Ren et.al. | 2502.20388 | null |
2025-02-27 | Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis | Jeffrey Yang Fan Chiang et.al. | 2502.20383 | null |
2025-02-27 | Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers | Shalev Lifshitz et.al. | 2502.20379 | null |
2025-02-27 | PhantomWiki: On-Demand Datasets for Reasoning and Retrieval Evaluation | Albert Gong et.al. | 2502.20377 | null |
2025-02-27 | Constrained Generative Modeling with Manually Bridged Diffusion Models | Saeid Naderiparizi et.al. | 2502.20371 | null |
2025-02-27 | Bridging Legal Knowledge and AI: Retrieval-Augmented Generation with Vector Stores, Knowledge Graphs, and Hierarchical Non-negative Matrix Factorization | Ryan C. Barron et.al. | 2502.20364 | null |
2025-02-27 | Bridging the Creativity Understanding Gap: Small-Scale Human Alignment Enables Expert-Level Humor Ranking in LLMs | Kuan Lok Zhou et.al. | 2502.20356 | null |
2025-02-27 | KEDRec-LM: A Knowledge-distilled Explainable Drug Recommendation Large Language Model | Kai Zhang et.al. | 2502.20350 | null |
2025-02-27 | Sparse Auto-Encoder Interprets Linguistic Features in Large Language Models | Yi Jing et.al. | 2502.20344 | null |
2025-02-27 | Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners | Daniele Paliotta et.al. | 2502.20339 | null |
2025-02-27 | Expertise Is What We Want | Alan Ashworth et.al. | 2502.20335 | null |
2025-02-27 | Emergent Symbolic Mechanisms Support Abstract Reasoning in Large Language Models | Yukang Yang et.al. | 2502.20332 | null |
2025-02-27 | Long-Context Inference with Retrieval-Augmented Speculative Decoding | Guanzheng Chen et.al. | 2502.20330 | null |
2025-02-27 | EAIRA: Establishing a Methodology for Evaluating AI Models as Scientific Research Assistants | Franck Cappello et.al. | 2502.20309 | null |
2025-02-27 | M^3Builder: A Multi-Agent System for Automated Machine Learning in Medical Imaging | Jinghao Feng et.al. | 2502.20301 | null |
2025-02-27 | An exploration of features to improve the generalisability of fake news detection models | Nathaniel Hoy et.al. | 2502.20299 | null |
2025-02-27 | Judge a Book by its Cover: Investigating Multi-Modal LLMs for Multi-Page Handwritten Document Transcription | Benjamin Gutteridge et.al. | 2502.20295 | null |
2025-02-27 | Conformal Tail Risk Control for Large Language Model Alignment | Catherine Yu-Chi Chen et.al. | 2502.20285 | null |
2025-02-27 | Evaluating Human Trust in LLM-Based Planners: A Preliminary Study | Shenghui Chen et.al. | 2502.20284 | null |
2025-02-27 | Large Language Models as Attribution Regularizers for Efficient Model Training | Davor Vukadin et.al. | 2502.20268 | null |
2025-02-27 | Vector-Quantized Vision Foundation Models for Object-Centric Learning | Rongzhen Zhao et.al. | 2502.20263 | null |
2025-02-27 | LLM as a Broken Telephone: Iterative Generation Distorts Information | Amr Mohamed et.al. | 2502.20258 | null |
2025-02-27 | Do computer vision foundation models learn the low-level characteristics of the human visual system? | Yancheng Cai et.al. | 2502.20256 | null |
2025-02-27 | Beyond Natural Language Perplexity: Detecting Dead Code Poisoning in Code Generation Datasets | Chichien Tsai et.al. | 2502.20246 | null |
2025-02-27 | From Retrieval to Generation: Comparing Different Approaches | Abdelrahman Abdallah et.al. | 2502.20245 | null |
2025-02-27 | FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving | Guizhen Chen et.al. | 2502.20238 | null |
2025-02-27 | AI Will Always Love You: Studying Implicit Biases in Romantic AI Companions | Clare Grogan et.al. | 2502.20231 | null |
2025-02-27 | Avat3r: Large Animatable Gaussian Reconstruction Model for High-fidelity 3D Head Avatars | Tobias Kirschstein et.al. | 2502.20220 | null |
2025-02-27 | ChineseEcomQA: A Scalable E-commerce Concept Evaluation Benchmark for Large Language Models | Haibin Chen et.al. | 2502.20196 | null |
2025-02-27 | Model Checking Linear Temporal Logic with Standpoint Modalities | Rajab Aghamov et.al. | 2502.20193 | null |
2025-02-27 | Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge | Yan-Lun Chen et.al. | 2502.20186 | null |
2025-02-27 | DGFM: Full Body Dance Generation Driven by Music Foundation Models | Xinran Liu et.al. | 2502.20176 | null |
2025-02-27 | An Extensive Evaluation of PDDL Capabilities in off-the-shelf LLMs | Kaustubh Vyas et.al. | 2502.20175 | null |
2025-02-27 | Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think | Liang Chen et.al. | 2502.20172 | null |
2025-02-27 | Re-evaluating Open-ended Evaluation of Large Language Models | Siqi Liu et.al. | 2502.20170 | null |
2025-02-27 | Adaptive H&E-IHC information fusion staining framework based on feature extra | Yifan Jia et.al. | 2502.20156 | null |
2025-02-27 | Telephone Surveys Meet Conversational AI: Evaluating a LLM-Based Telephone Survey System at Scale | Max M. Lang et.al. | 2502.20140 | null |
2025-02-27 | Finite State Automata Inside Transformers with Chain-of-Thought: A Mechanistic Study on State Tracking | Yifan Zhang et.al. | 2502.20129 | null |
2025-02-27 | Self-Training Elicits Concise Reasoning in Large Language Models | Tergel Munkhbat et.al. | 2502.20122 | null |
2025-02-27 | LongRoPE2: Near-Lossless LLM Context Window Scaling | Ning Shang et.al. | 2502.20082 | null |
2025-02-27 | Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents | Haochen Sun et.al. | 2502.20073 | null |
2025-02-27 | A Generative Model Enhanced Multi-Agent Reinforcement Learning Method for Electric Vehicle Charging Navigation | Tianyang Qi et.al. | 2502.20068 | null |
2025-02-27 | Polish-ASTE: Aspect-Sentiment Triplet Extraction Datasets for Polish | Marta Lango et.al. | 2502.20046 | null |
2025-02-27 | 3D-AffordanceLLM: Harnessing Large Language Models for Open-Vocabulary Affordance Detection in 3D Worlds | Hengshuo Chu et.al. | 2502.20041 | null |
2025-02-27 | AsymLoRA: Harmonizing Data Conflicts and Commonalities in MLLMs | Xuyang Wei et.al. | 2502.20035 | null |
2025-02-27 | Erasing Without Remembering: Safeguarding Knowledge Forgetting in Large Language Models | Huazheng Wang et.al. | 2502.19982 | null |
2025-02-27 | The Lookahead Limitation: Why Multi-Operand Addition is Hard for LLMs | Tanja Baeumel et.al. | 2502.19981 | null |
2025-02-27 | Can Large Language Models Unveil the Mysteries? An Exploration of Their Ability to Unlock Information in Complex Scenarios | Chao Wang et.al. | 2502.19973 | null |
2025-02-27 | Deterministic or probabilistic? The psychology of LLMs as random number generators | Javier Coronado-Blázquez et.al. | 2502.19965 | null |
2025-02-27 | SeisMoLLM: Advancing Seismic Monitoring via Cross-modal Transfer with Pre-trained Large Language Model | Xinghao Wang et.al. | 2502.19960 | link |
2025-02-27 | Collaborative Stance Detection via Small-Large Language Model Consistency Verification | Yu Yan et.al. | 2502.19954 | null |
2025-02-27 | GeoEdit: Geometric Knowledge Editing for Large Language Models | Yujie Feng et.al. | 2502.19953 | null |
2025-02-27 | Algebraic Machine Learning: Learning as computing an algebraic decomposition of a task | Fernando Martin-Maroto et.al. | 2502.19944 | null |
2025-02-27 | Alleviating Distribution Shift in Synthetic Data for Machine Translation Quality Estimation | Xiang Geng et.al. | 2502.19941 | null |
2025-02-27 | Playing Pokémon Red via Deep Reinforcement Learning | Marco Pleines et.al. | 2502.19920 | null |
2025-02-27 | Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models | Yuan Sui et.al. | 2502.19918 | null |
2025-02-27 | Picking the Cream of the Crop: Visual-Centric Data Selection with Collaborative Agents | Zhenyu Liu et.al. | 2502.19917 | null |
2025-02-27 | LLM-driven Effective Knowledge Tracing by Integrating Dual-channel Difficulty | Jiahui Cen et.al. | 2502.19915 | null |
2025-02-27 | SkipPipe: Partial and Reordered Pipelining Framework for Training LLMs in Heterogeneous Networks | Nikolay Blagoev et.al. | 2502.19913 | null |
2025-02-27 | Order Doesn't Matter, But Reasoning Does: Training LLMs with Order-Centric Augmentation | Qianxi He et.al. | 2502.19907 | null |
2025-02-27 | Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy | Zaijing Li et.al. | 2502.19902 | null |
2025-02-27 | GenPC: Zero-shot Point Cloud Completion via 3D Generative Priors | An Li et.al. | 2502.19896 | null |
2025-02-27 | Beyond the Tip of Efficiency: Uncovering the Submerged Threats of Jailbreak Attacks in Small Language Models | Sibo Yi et.al. | 2502.19883 | null |
2025-02-27 | Towards Multimodal Large-Language Models for Parent-Child Interaction: A Focus on Joint Attention | Weiyan Shi et.al. | 2502.19877 | null |
2025-02-27 | MMKE-Bench: A Multimodal Editing Benchmark for Diverse Visual Knowledge | Yuntao Du et.al. | 2502.19870 | link |
2025-02-27 | MIND: Towards Immersive Psychological Healing with Multi-agent Inner Dialogue | Yujia Chen et.al. | 2502.19860 | null |
2025-02-27 | ConvCodeWorld: Benchmarking Conversational Code Generation in Reproducible Feedback Environments | Hojae Han et.al. | 2502.19852 | null |
2025-02-27 | One-for-More: Continual Diffusion Model for Anomaly Detection | Xiaofan Li et.al. | 2502.19848 | null |
2025-02-27 | ProAPO: Progressively Automatic Prompt Optimization for Visual Classification | Xiangyan Qu et.al. | 2502.19844 | null |
2025-02-27 | Shared Stochastic Gaussian Process Latent Variable Models: A Multi-modal Generative Model for Quasar Spectra | Vidhi Lalchand et.al. | 2502.19824 | null |
2025-02-27 | Foot-In-The-Door: A Multi-turn Jailbreak for LLMs | Zixuan Weng et.al. | 2502.19820 | null |
2025-02-27 | Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts | Shulai Zhang et.al. | 2502.19811 | null |
2025-02-27 | Implicit Search via Discrete Diffusion: A Study on Chess | Jiacheng Ye et.al. | 2502.19805 | null |
2025-02-27 | Developmental Support Approach to AI's Autonomous Growth: Toward the Realization of a Mutually Beneficial Stage Through Experiential Learning | Taichiro Endo et.al. | 2502.19798 | null |
2025-02-27 | ChatMol: A Versatile Molecule Designer Based on the Numerically Enhanced Large Language Model | Chuanliu Fan et.al. | 2502.19794 | null |
2025-02-27 | Mixtera: A Data Plane for Foundation Model Training | Maximilian Böther et.al. | 2502.19790 | null |
2025-02-27 | Advancements in Natural Language Processing for Automatic Text Summarization | Nevidu Jayatilleke et.al. | 2502.19773 | null |
2025-02-27 | Does Your Voice Assistant Remember? Analyzing Conversational Context Recall and Utilization in Voice Interaction Models | Heeseung Kim et.al. | 2502.19759 | null |
2025-02-27 | PolyPrompt: Automating Knowledge Extraction from Multilingual Language Models with Dynamic Prompt Generation | Nathan Roll et.al. | 2502.19756 | null |
2025-02-27 | Beneath the Surface: How Large Language Models Reflect Hidden Bias | Jinhao Pan et.al. | 2502.19749 | null |
2025-02-27 | HaLoRA: Hardware-aware Low-Rank Adaptation for Large Language Models Based on Hybrid Compute-in-Memory Architecture | Taiqiang Wu et.al. | 2502.19747 | null |
2025-02-27 | R1-T1: Fully Incentivizing Translation Capability in LLMs via Reasoning Learning | Minggui He et.al. | 2502.19735 | null |
2025-02-27 | Preference Learning Unlocks LLMs' Psycho-Counseling Skills | Mian Zhang et.al. | 2502.19731 | null |
2025-02-27 | Do Expressions Change Decisions? Exploring the Impact of AI's Explanation Tone on Decision-Making | Ayano Okoso et.al. | 2502.19730 | null |
2025-02-27 | Tokens for Learning, Tokens for Unlearning: Mitigating Membership Inference Attacks in Large Language Models via Dual-Purpose Training | Toan Tran et.al. | 2502.19726 | null |
2025-02-27 | Few-Shot Multilingual Open-Domain QA from 5 Examples | Fan Jiang et.al. | 2502.19722 | null |
2025-02-27 | Sensing and Steering Stereotypes: Extracting and Applying Gender Representation Vectors in LLMs | Hannah Cyberey et.al. | 2502.19721 | null |
2025-02-27 | Teaching Dense Retrieval Models to Specialize with Listwise Distillation and LLM Data Augmentation | Manveer Singh Tamber et.al. | 2502.19712 | null |
2025-02-27 | AoECR: AI-ization of Elderly Care Robot | Linkun Zhou et.al. | 2502.19706 | null |
2025-02-27 | You Only Click Once: Single Point Weakly Supervised 3D Instance Segmentation for Autonomous Driving | Guangfeng Jiang et.al. | 2502.19698 | null |
2025-02-27 | M-LLM Based Video Frame Selection for Efficient Video Understanding | Kai Hu et.al. | 2502.19680 | null |
2025-02-27 | Old Experience Helps: Leveraging Survey Methodology to Improve AI Text Annotation Reliability in Social Sciences | Linzhuo Li et.al. | 2502.19679 | null |
2025-02-27 | Improving Adversarial Transferability in MLLMs via Dynamic Vision-Language Alignment Attack | Chenhe Gu et.al. | 2502.19672 | null |
2025-02-27 | SuPreME: A Supervised Pre-training Framework for Multimodal ECG Representation Learning | Mingsheng Cai et.al. | 2502.19668 | null |
2025-02-27 | Taxonomy, Opportunities, and Challenges of Representation Engineering for Large Language Models | Jan Wehner et.al. | 2502.19649 | null |
2025-02-27 | cMIM: A Contrastive Mutual Information Framework for Unified Generative and Discriminative Representation Learning | Micha Livne et.al. | 2502.19642 | null |
2025-02-26 | Agentic Mixture-of-Workflows for Multi-Modal Chemical Search | Tiffany J. Callahan et.al. | 2502.19629 | null |
2025-02-26 | Treatment Non-Adherence Bias in Clinical Machine Learning: A Real-World Study on Hypertension Medication | Zhongyuan Liang et.al. | 2502.19625 | null |
2025-02-26 | Norm Growth and Stability Challenges in Localized Sequential Knowledge Editing | Akshat Gupta et.al. | 2502.19416 | null |
2025-02-26 | Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs | Dayu Yang et.al. | 2502.19411 | null |
2025-02-26 | Less or More: Towards Glanceable Explanations for LLM Recommendations Using Ultra-Small Devices | Xinru Wang et.al. | 2502.19410 | null |
2025-02-26 | ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models | Danae Sánchez Villegas et.al. | 2502.19409 | null |
2025-02-26 | Learning Code-Edit Embedding to Model Student Debugging Behavior | Hasnain Heickal et.al. | 2502.19407 | null |
2025-02-26 | General Reasoning Requires Learning to Reason from the Get-go | Seungwook Han et.al. | 2502.19402 | null |
2025-02-26 | TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding | Max Ku et.al. | 2502.19400 | null |
2025-02-26 | Multi-modal Contrastive Learning for Tumor-specific Missing Modality Synthesis | Minjoo Lim et.al. | 2502.19390 | null |
2025-02-26 | LiDAR Registration with Visual Foundation Models | Niclas Vödisch et.al. | 2502.19374 | null |
2025-02-26 | Deep Learning For Time Series Analysis With Application On Human Motion | Ali Ismail-Fawaz et.al. | 2502.19364 | null |
2025-02-26 | DataMan: Data Manager for Pre-training Large Language Models | Ru Peng et.al. | 2502.19363 | null |
2025-02-26 | Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? | Yancheng He et.al. | 2502.19361 | null |
2025-02-26 | Evaluating LLMs and Pre-trained Models for Text Summarization Across Diverse Datasets | Tohida Rehman et.al. | 2502.19339 | null |
2025-02-26 | Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems | Hao Peng et.al. | 2502.19328 | null |
2025-02-26 | Shh, don't say that! Domain Certification in LLMs | Cornelius Emde et.al. | 2502.19320 | null |
2025-02-26 | Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond | Qizhou Wang et.al. | 2502.19301 | null |
2025-02-26 | Agent-centric Information Access | Evangelos Kanoulas et.al. | 2502.19298 | null |
2025-02-26 | Complex LLM Planning via Automated Heuristics Discovery | Hongyi Ling et.al. | 2502.19295 | null |
2025-02-26 | Efficient Federated Search for Retrieval-Augmented Generation | Rachid Guerraoui et.al. | 2502.19280 | null |
2025-02-26 | ArtInsight: Enabling AI-Powered Artwork Engagement for Mixed Visual-Ability Families | Arnavi Chheda-Kothary et.al. | 2502.19263 | null |
2025-02-26 | AI-Powered Bayesian Inference | Veronika Ročková et.al. | 2502.19231 | null |
2025-02-26 | Two Heads Are Better Than One: Dual-Model Verbal Reflection at Inference-Time | Jiazheng Li et.al. | 2502.19230 | null |
2025-02-26 | A Lightweight and Extensible Cell Segmentation and Classification Model for Whole Slide Images | Nikita Shvetsov et.al. | 2502.19217 | null |
2025-02-26 | A Hybrid Transformer Architecture with a Quantized Self-Attention Mechanism Applied to Molecular Generation | Anthony M. Smaldone et.al. | 2502.19214 | null |
2025-02-26 | Negation-Induced Forgetting in LLMs | Francesca Capuano et.al. | 2502.19211 | null |
2025-02-26 | Bi'an: A Bilingual Benchmark and Model for Hallucination Detection in Retrieval-Augmented Generation | Zhouyu Jiang et.al. | 2502.19209 | null |
2025-02-26 | Simulation of Language Evolution under Regulated Social Media Platforms: A Synergistic Approach of Large Language Models and Genetic Algorithms | Jinyu Cai et.al. | 2502.19193 | null |
2025-02-26 | BIG-Bench Extra Hard | Mehran Kazemi et.al. | 2502.19187 | null |
2025-02-26 | INFO-SEDD: Continuous Time Markov Chains as Scalable Information Metrics Estimators | Alberto Foresti et.al. | 2502.19183 | null |
2025-02-26 | UQABench: Evaluating User Embedding for Prompting LLMs in Personalized Question Answering | Langming Liu et.al. | 2502.19178 | null |
2025-02-26 | MEDDxAgent: A Unified Modular Agent Framework for Explainable Automatic Differential Diagnosis | Daniel Rose et.al. | 2502.19175 | null |
2025-02-26 | A Model-Centric Review of Deep Learning for Protein Design | Gregory W. Kyro et.al. | 2502.19173 | null |
2025-02-26 | CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation | Kaiwen Yan et.al. | 2502.19166 | null |
2025-02-26 | TestNUC: Enhancing Test-Time Computing Approaches through Neighboring Unlabeled Data Consistency | Henry Peng Zou et.al. | 2502.19163 | null |
2025-02-26 | Detecting Linguistic Indicators for Stereotype Assessment with Large Language Models | Rebekka Görge et.al. | 2502.19160 | null |
2025-02-26 | A Sliding Layer Merging Method for Efficient Depth-Wise Pruning in LLMs | Xuan Ding et.al. | 2502.19159 | null |
2025-02-26 | When Personalization Meets Reality: A Multi-Faceted Analysis of Personalized Preference Learning | Yijiang River Dong et.al. | 2502.19158 | null |
2025-02-26 | Isolating Language-Coding from Problem-Solving: Benchmarking LLMs with PseudoEval | Jiarong Wu et.al. | 2502.19149 | null |
2025-02-26 | Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs | Zhaowei Zhang et.al. | 2502.19148 | null |
2025-02-26 | Identification Under the Semantic Effective Secrecy Constraint | Abdalla Ibrahim et.al. | 2502.19142 | null |
2025-02-26 | A Temporal Planning Framework for Multi-Agent Systems via LLM-Aided Knowledge Base Management | Enrico Saccon et.al. | 2502.19135 | null |
2025-02-26 | Self-Memory Alignment: Mitigating Factual Hallucinations with Generalized Improvement | Siyuan Zhang et.al. | 2502.19127 | null |
2025-02-26 | A Survey on Foundation-Model-Based Industrial Defect Detection | Tianle Yang et.al. | 2502.19106 | null |
2025-02-26 | Evaluating Gender Bias in German Machine Translation | Michelle Kappl et.al. | 2502.19104 | null |
2025-02-26 | LongEval: A Comprehensive Analysis of Long-Text Generation Through a Plan-based Paradigm | Siwei Wu et.al. | 2502.19103 | null |
2025-02-26 | Nexus: A Lightweight and Scalable Multi-Agent Framework for Complex Tasks Automation | Humza Sami et.al. | 2502.19091 | link |
2025-02-26 | EndoMamba: An Efficient Foundation Model for Endoscopic Videos | Qingyao Tian et.al. | 2502.19090 | null |
2025-02-26 | Sparse Brains are Also Adaptive Brains: Cognitive-Load-Aware Dynamic Activation for LLMs | Yiheng Yang et.al. | 2502.19078 | null |
2025-02-26 | IndicEval-XL: Bridging Linguistic Diversity in Code Generation Across Indic Languages | Ujjwal Singh et.al. | 2502.19067 | null |
2025-02-26 | Can Large Language Models Outperform Non-Experts in Poetry Evaluation? A Comparative Study Using the Consensual Assessment Technique | Piotr Sawicki et.al. | 2502.19064 | null |
2025-02-26 | MathClean: A Benchmark for Synthetic Mathematical Data Cleaning | Hao Liang et.al. | 2502.19058 | null |
2025-02-26 | Beyond Surface-Level Patterns: An Essence-Driven Defense Framework Against Jailbreak Attacks in LLMs | Shiyu Xiang et.al. | 2502.19041 | null |
2025-02-26 | FungalZSL: Zero-Shot Fungal Classification with Image Captioning Using a Synthetic Data Approach | Anju Rani et.al. | 2502.19038 | null |
2025-02-26 | InternVQA: Advancing Compressed Video Quality Assessment with Distilling Large Foundation Model | Fengbin Guan et.al. | 2502.19026 | null |
2025-02-26 | Binary Neural Networks for Large Language Model: A Survey | Liangdong Liu et.al. | 2502.19008 | null |
2025-02-26 | The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training | Jinbo Wang et.al. | 2502.19002 | null |
2025-02-26 | MEBench: Benchmarking Large Language Models for Cross-Document Multi-Entity Question Answering | Teng Lin et.al. | 2502.18993 | null |
2025-02-26 | OntologyRAG: Better and Faster Biomedical Code Mapping with Retrieval-Augmented Generation (RAG) Leveraging Ontology Knowledge Graphs and Large Language Models | Hui Feng et.al. | 2502.18992 | null |
2025-02-26 | GenTool: Enhancing Tool Generalization in Language Models through Zero-to-One and Weak-to-Strong Simulation | Jie He et.al. | 2502.18990 | null |
2025-02-26 | PEToolLLM: Towards Personalized Tool Learning in Large Language Models | Qiancheng Xu et.al. | 2502.18980 | null |
2025-02-26 | Low-Confidence Gold: Refining Low-Confidence Samples for Efficient Instruction Tuning | Hongyi Cal et.al. | 2502.18978 | null |
2025-02-26 | (Mis)Fitting: A Survey of Scaling Laws | Margaret Li et.al. | 2502.18969 | null |
2025-02-26 | Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles | Kuang Wang et.al. | 2502.18968 | link |
2025-02-26 | OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment | Jiaxin Deng et.al. | 2502.18965 | null |
2025-02-26 | DualSpec: Text-to-spatial-audio Generation via Dual-Spectrogram Guided Diffusion Model | Lei Zhao et.al. | 2502.18952 | null |
2025-02-26 | Towards Label-Only Membership Inference Attack against Pre-trained Large Language Models | Yu He et.al. | 2502.18943 | null |
2025-02-26 | JailBench: A Comprehensive Chinese Security Assessment Benchmark for Large Language Models | Shuyi Liu et.al. | 2502.18935 | null |
2025-02-26 | Talking like Piping and Instrumentation Diagrams (P&IDs) | Achmad Anggawirya Alimin et.al. | 2502.18928 | null |
2025-02-26 | ClassInvGen: Class Invariant Synthesis using Large Language Models | Chuyue Sun et.al. | 2502.18917 | null |
2025-02-26 | END: Early Noise Dropping for Efficient and Effective Context Denoising | Hongye Jin et.al. | 2502.18915 | null |
2025-02-26 | CLLoRA: An Approach to Measure the Effects of the Context Length for LLM Fine-Tuning | Ping Zhang et.al. | 2502.18910 | null |
2025-02-26 | An Empirical Study on Commit Message Generation using LLMs via In-Context Learning | Yifan Wu et.al. | 2502.18904 | null |
2025-02-26 | From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens | Tong Wu et.al. | 2502.18890 | null |
2025-02-26 | Letters from Future Self: Augmenting the Letter-Exchange Exercise with LLM-based Agents to Enhance Young Adults' Career Exploration | Hayeon Jeon et.al. | 2502.18881 | null |
2025-02-26 | Learning to Generate Structured Output with Schema Reinforcement Learning | Yaxi Lu et.al. | 2502.18878 | null |
2025-02-26 | Learning to Align Multi-Faceted Evaluation: A Unified and Robust Framework | Kaishuai Xu et.al. | 2502.18874 | null |
2025-02-26 | Multi-LLM Collaborative Search for Complex Problem Solving | Sen Yang et.al. | 2502.18873 | null |
2025-02-26 | A Theoretical Perspective: How to Prevent Model Collapse in Self-consuming Training Loops | Shi Fu et.al. | 2502.18865 | null |
2025-02-26 | Sherlock: Towards Multi-scene Video Abnormal Event Extraction and Localization via a Global-local Spatial-sensitive LLM | Junxiao Ma et.al. | 2502.18863 | null |
2025-02-26 | A Causal Lens for Evaluating Faithfulness Metrics | Kerem Zaman et.al. | 2502.18848 | null |
2025-02-26 | Sliding Window Attention Training for Efficient Large Language Models | Zichuan Fu et.al. | 2502.18845 | null |
2025-02-26 | Evidence-Driven Marker Extraction for Social Media Suicide Risk Detection | Carter Adams et.al. | 2502.18823 | null |
2025-02-26 | Data-Efficient Multi-Agent Spatial Planning with LLMs | Huangyuan Su et.al. | 2502.18822 | null |
2025-02-26 | CAMEx: Curvature-aware Merging of Experts | Dung V. Nguyen et.al. | 2502.18821 | null |
2025-02-26 | Judge as A Judge: Improving the Evaluation of Retrieval-Augmented Generation through the Judge-Consistency of Large Language Models | Shuliang Liu et.al. | 2502.18817 | null |
2025-02-26 | Holistic Audit Dataset Generation for LLM Unlearning via Knowledge Graph Traversal and Redundancy Removal | Weipeng Jiang et.al. | 2502.18810 | null |
2025-02-26 | Optimal Stochastic Trace Estimation in Generative Modeling | Xinyang Liu et.al. | 2502.18808 | null |
2025-02-26 | SolEval: Benchmarking Large Language Models for Repository-level Solidity Code Generation | Zhiyuan Peng et.al. | 2502.18793 | null |
2025-02-26 | Active Few-Shot Learning for Text Classification | Saeed Ahmadnia et.al. | 2502.18782 | null |
2025-02-26 | Towards Optimal Multi-draft Speculative Decoding | Zhengmian Hu et.al. | 2502.18779 | null |
2025-02-26 | M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance | Qingpei Guo et.al. | 2502.18778 | null |
2025-02-26 | Plutus: Benchmarking Large Language Models in Low-Resource Greek Finance | Xueqing Peng et.al. | 2502.18772 | null |
2025-02-26 | Exploring Graph Tasks with Pure LLMs: A Comprehensive Benchmark and Investigation | Yuxiang Wang et.al. | 2502.18771 | link |
2025-02-26 | Reward Shaping to Mitigate Reward Hacking in RLHF | Jiayi Fu et.al. | 2502.18770 | null |
2025-02-26 | CommGPT: A Graph and Retrieval-Augmented Multimodal Communication Foundation Model | Feibo Jiang et.al. | 2502.18763 | null |
2025-02-26 | Training Large Recommendation Models via Graph-Language Token Alignment | Mingdai Yang et.al. | 2502.18757 | null |
2025-02-26 | M-ANT: Efficient Low-bit Group Quantization for LLMs via Mathematically Adaptive Numerical Type | Weiming Hu et.al. | 2502.18755 | null |
2025-02-26 | AgentSociety Challenge: Designing LLM Agents for User Modeling and Recommendation on Web Platforms | Yuwei Yan et.al. | 2502.18754 | null |
2025-02-26 | Spectral-Enhanced Transformers: Leveraging Large-Scale Pretrained Models for Hyperspectral Object Tracking | Shaheer Mohamed et.al. | 2502.18748 | null |
2025-02-26 | Automatic Prompt Optimization via Heuristic Search: A Survey | Wendi Cui et.al. | 2502.18746 | null |
2025-02-25 | DRAMA: Diverse Augmentation from Large Language Models to Smaller Dense Retrievers | Xueguang Ma et.al. | 2502.18460 | null |
2025-02-25 | LLM-Based Design Pattern Detection | Christian Schindler et.al. | 2502.18458 | null |
2025-02-25 | FRIDA to the Rescue! Analyzing Synthetic Data Effectiveness in Object-Based Common Sense Reasoning for Disaster Response | Mollie Shichman et.al. | 2502.18452 | null |
2025-02-25 | SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution | Yuxiang Wei et.al. | 2502.18449 | null |
2025-02-25 | MAPoRL: Multi-Agent Post-Co-Training for Collaborative Large Language Models with Reinforcement Learning | Chanwoo Park et.al. | 2502.18439 | null |
2025-02-25 | TextGames: Learning to Self-Play Text-Based Puzzle Games via Language Model Reasoning | Frederikus Hudi et.al. | 2502.18431 | null |
2025-02-25 | OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference | Xiangyu Zhao et.al. | 2502.18411 | null |
2025-02-25 | Enhancing DNA Foundation Models to Address Masking Inefficiencies | Monireh Safari et.al. | 2502.18405 | null |
2025-02-25 | Monte Carlo Temperature: a robust sampling strategy for LLM's uncertainty quantification methods | Nicola Cecere et.al. | 2502.18389 | null |
2025-02-25 | How Far are LLMs from Real Search? A Comprehensive Study on Efficiency, Completeness, and Inherent Capabilities | Minhua Lin et.al. | 2502.18387 | null |
2025-02-25 | MindMem: Multimodal for Predicting Advertisement Memorability Using LLMs and Deep Learning | Sepehr Asgarian et.al. | 2502.18371 | null |
2025-02-25 | Sparse Bayesian Generative Modeling for Joint Parameter and Channel Estimation | Benedikt Böck et.al. | 2502.18369 | null |
2025-02-25 | ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation | Yifan Pu et.al. | 2502.18364 | null |
2025-02-25 | Responsible AI Agents | Deven R. Desai et.al. | 2502.18359 | null |
2025-02-25 | Which Contributions Deserve Credit? Perceptions of Attribution in Human-AI Co-Creation | Jessica He et.al. | 2502.18357 | null |
2025-02-25 | BRIDO: Bringing Democratic Order to Abstractive Summarization | Junhyun Lee et.al. | 2502.18342 | null |
2025-02-25 | Mapping of Subjective Accounts into Interpreted Clusters (MOSAIC): Topic Modelling and LLM applied to Stroboscopic Phenomenology | Romy Beauté et.al. | 2502.18318 | null |
2025-02-25 | GCDance: Genre-Controlled 3D Full Body Dance Generation Driven By Music | Xinran Liu et.al. | 2502.18309 | null |
2025-02-25 | RefuteBench 2.0 -- Agentic Benchmark for Dynamic Evaluation of LLM Responses to Refutation Instruction | Jianhao Yan et.al. | 2502.18308 | null |
2025-02-25 | LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven Language Representation | Pengzhi Li et.al. | 2502.18302 | null |
2025-02-25 | Bayesian Computation in Deep Learning | Wenlong Chen et.al. | 2502.18300 | null |
2025-02-25 | DeepCircuitX: A Comprehensive Repository-Level Dataset for RTL Code Understanding, Generation, and PPA Analysis | Zeju Li et.al. | 2502.18297 | null |
2025-02-25 | AMPO: Active Multi-Preference Optimization | Taneesh Gupta et.al. | 2502.18293 | null |
2025-02-25 | Better Aligned with Survey Respondents or Training Data? Unveiling Political Leanings of LLMs on U.S. Supreme Court Cases | Shanshan Xu et.al. | 2502.18282 | null |
2025-02-25 | Citrus: Leveraging Expert Cognitive Pathways in a Medical Language Model for Advanced Medical Decision Support | Guoxin Wang et.al. | 2502.18274 | null |
2025-02-25 | Imperfect Knowledge Management (IKM) in GEFRED (GENeralized model for Fuzzy RElational Databases) | Leoncio Jimenez et.al. | 2502.18255 | null |
2025-02-25 | Iterative Counterfactual Data Augmentation | Mitchell Plyler et.al. | 2502.18249 | null |
2025-02-25 | Unveiling and Causalizing CoT: A Causal Pespective | Jiarun Fu et.al. | 2502.18239 | null |
2025-02-25 | Beyond the convexity assumption: Realistic tabular data generation under quantifier-free real linear constraints | Mihaela Cătălina Stoian et.al. | 2502.18237 | null |
2025-02-25 | Debt Collection Negotiations with Large Language Models: An Evaluation System and Optimizing Decision Making with Multi-Agent | Xiaofeng Wang et.al. | 2502.18228 | null |
2025-02-25 | From ChatGPT to DeepSeek: Can LLMs Simulate Humanity? | Qian Wang et.al. | 2502.18210 | null |
2025-02-25 | LAG: LLM agents for Leaderboard Auto Generation on Demanding | Jian Wu et.al. | 2502.18209 | null |
2025-02-25 | Grandes modelos de lenguaje: de la predicción de palabras a la comprensión? | Carlos Gómez-Rodríguez et.al. | 2502.18205 | null |
2025-02-25 | Intersubjective Model of AI-mediated Communication: Augmenting Human-Human Text Chat through LLM-based Adaptive Agent Pair | Shutaro Aoyama et.al. | 2502.18201 | null |
2025-02-25 | Task-Agnostic Semantic Communication with Multimodal Foundation Models | Jiangjing Hu et.al. | 2502.18200 | null |
2025-02-25 | Agnostic calculation of atomic free energies with the descriptor density of states | Thomas D Swinburne et.al. | 2502.18191 | null |
2025-02-25 | ChatMotion: A Multimodal Multi-Agent for Human Motion Analysis | Li Lei et.al. | 2502.18180 | null |
2025-02-25 | Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs | Gaye Colakoglu et.al. | 2502.18179 | null |
2025-02-25 | CLIPure: Purification in Latent Space via CLIP for Adversarially Robust Zero-Shot Classification | Mingkun Zhang et.al. | 2502.18176 | null |
2025-02-25 | SECURA: Sigmoid-Enhanced CUR Decomposition with Uninterrupted Retention and Low-Rank Adaptation in Large Language Models | Zhang Yuxuan et.al. | 2502.18168 | null |
2025-02-25 | Can LLMs Explain Themselves Counterfactually? | Zahra Dehghanighobadi et.al. | 2502.18156 | null |
2025-02-25 | Carbon and Silicon, Coexist or Compete? A Survey on Human-AI Interactions in Agent-based Modeling and Simulation | Ziyue Lin et.al. | 2502.18145 | null |
2025-02-25 | LevelRAG: Enhancing Retrieval-Augmented Generation with Multi-hop Logic Planning over Rewriting Augmented Searchers | Zhuocheng Zhang et.al. | 2502.18139 | null |
2025-02-25 | Large Language Model Driven Agents for Simulating Echo Chamber Formation | Chenhao Gu et.al. | 2502.18138 | null |
2025-02-25 | Inverse Materials Design by Large Language Model-Assisted Generative Framework | Yun Hao et.al. | 2502.18127 | null |
2025-02-25 | HyperG: Hypergraph-Enhanced LLMs for Structured Knowledge | Sirui Huang et.al. | 2502.18125 | null |
2025-02-25 | Bayesian Optimization for Controlled Image Editing via LLMs | Chengkun Cai et.al. | 2502.18116 | null |
2025-02-25 | PromptMID: Modal Invariant Descriptors Based on Diffusion and Vision Foundation Models for Optical-SAR Image Matching | Han Nie et.al. | 2502.18104 | null |
2025-02-25 | Detecting Offensive Memes with Social Biases in Singapore Context Using Multimodal Large Language Models | Cao Yuxuan et.al. | 2502.18101 | link |
2025-02-25 | Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning | Wenkai Yang et.al. | 2502.18080 | null |
2025-02-25 | Examining the Threat Landscape: Foundation Models and Model Stealing | Ankita Raj et.al. | 2502.18077 | null |
2025-02-25 | MRBTP: Efficient Multi-Robot Behavior Tree Planning and Collaboration | Yishuai Cai et.al. | 2502.18072 | null |
2025-02-25 | Golden Ratio Mixing of Real and Synthetic Data for Stabilizing Generative Model Training | Hengzhi He et.al. | 2502.18049 | null |
2025-02-25 | AutoCas: Autoregressive Cascade Predictor in Social Networks via Large Language Models | Yuhao Zheng et.al. | 2502.18040 | null |
2025-02-25 | Harnessing Multiple Large Language Models: A Survey on LLM Ensemble | Zhijun Chen et.al. | 2502.18036 | null |
2025-02-25 | Detecting Knowledge Boundary of Vision Large Language Models by Sampling-Based Inference | Zhuo Chen et.al. | 2502.18023 | null |
2025-02-25 | AfroXLMR-Comet: Multilingual Knowledge Distillation with Attention Matching for Low-Resource languages | Joshua Sakthivel Raju et.al. | 2502.18020 | null |
2025-02-25 | NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms | Yashan Wang et.al. | 2502.18008 | null |
2025-02-25 | Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning | Xinghao Chen et.al. | 2502.18001 | null |
2025-02-25 | Model-Free Adversarial Purification via Coarse-To-Fine Tensor Network Representation | Guang Lin et.al. | 2502.17972 | null |
2025-02-25 | LLM Knows Geometry Better than Algebra: Numerical Understanding of LLM-Based Agents in A Trading Arena | Tianmi Ma et.al. | 2502.17967 | null |
2025-02-25 | Towards Better Understanding of Program-of-Thought Reasoning in Cross-Lingual and Multilingual Environments | Patomporn Payoungkhamdee et.al. | 2502.17956 | null |
2025-02-25 | DeepSeek-R1 Outperforms Gemini 2.0 Pro, OpenAI o1, and o3-mini in Bilingual Complex Ophthalmology Reasoning | Pusheng Xu et.al. | 2502.17947 | null |
2025-02-25 | Assessing Large Language Models in Agentic Multilingual National Bias | Qianying Liu et.al. | 2502.17945 | null |
2025-02-25 | CaseGen: A Benchmark for Multi-Stage Legal Case Documents Generation | Haitao Li et.al. | 2502.17943 | null |
2025-02-25 | Advantage-Guided Distillation for Preference Alignment in Small Language Models | Shiping Gao et.al. | 2502.17927 | null |
2025-02-25 | LeanProgress: Guiding Search for Neural Theorem Proving via Proof Progress Prediction | Suozhi Huang et.al. | 2502.17925 | null |
2025-02-25 | FACT-AUDIT: An Adaptive Multi-Agent Framework for Dynamic Fact-Checking Evaluation of Large Language Models | Hongzhan Lin et.al. | 2502.17924 | null |
2025-02-25 | Towards Sustainable Web Agents: A Plea for Transparency and Dedicated Metrics for Energy Consumption | Lars Krupp et.al. | 2502.17903 | null |
2025-02-25 | Knowledge-enhanced Multimodal ECG Representation Learning with Arbitrary-Lead Inputs | Che Liu et.al. | 2502.17900 | null |
2025-02-25 | Can Large Language Models Identify Implicit Suicidal Ideation? An Empirical Evaluation | Tong Li et.al. | 2502.17899 | null |
2025-02-25 | FetchBot: Object Fetching in Cluttered Shelves via Zero-Shot Sim2Real | Weiheng Liu et.al. | 2502.17894 | null |
2025-02-25 | RankCoT: Refining Knowledge for Retrieval-Augmented Generation through Ranking Chain-of-Thoughts | Mingyan Wu et.al. | 2502.17888 | null |
2025-02-25 | Science Across Languages: Assessing LLM Multilingual Translation of Scientific Papers | Hannah Calzi Kleidermacher et.al. | 2502.17882 | null |
2025-02-25 | EEGM2: An Efficient Mamba-2-Based Self-Supervised Framework for Long-Sequence EEG Modeling | Jiazhen Hong et.al. | 2502.17873 | null |
2025-02-25 | A Survey: Spatiotemporal Consistency in Video Generation | Zhiyu Yin et.al. | 2502.17863 | null |
2025-02-25 | HRR: Hierarchical Retrospection Refinement for Generated Image Detection | Peipei Yuan et.al. | 2502.17862 | null |
2025-02-25 | LR | Jianghao Chen et.al. | 2502.17848 | null |
2025-02-25 | Quantifying interdisciplinary synergy in higher STEM education | Gahyoun Gim et.al. | 2502.17841 | null |
2025-02-25 | A Combinatorial Identities Benchmark for Theorem Proving via Automated Theorem Generation | Beibei Xiong et.al. | 2502.17840 | null |
2025-02-25 | TagGAN: A Generative Model for Data Tagging | Muhammad Nawaz et.al. | 2502.17836 | null |
2025-02-25 | MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks | Hyeonjeong Ha et.al. | 2502.17832 | null |
2025-02-25 | A General Framework to Enhance Fine-tuning-based LLM Unlearning | Jie Ren et.al. | 2502.17823 | null |
2025-02-25 | An Overview of Large Language Models for Statisticians | Wenlong Ji et.al. | 2502.17814 | null |
2025-02-25 | Can Multimodal LLMs Perform Time Series Anomaly Detection? | Xiongxiao Xu et.al. | 2502.17812 | null |
2025-02-25 | URO-Bench: A Comprehensive Benchmark for End-to-End Spoken Dialogue Models | Ruiqi Yan et.al. | 2502.17810 | null |
2025-02-25 | DocPuzzle: A Process-Aware Benchmark for Evaluating Realistic Long-Context Reasoning Capabilities | Tianyi Zhuang et.al. | 2502.17807 | null |
2025-02-25 | Your Language Model May Think Too Rigidly: Achieving Reasoning Consistency with Symmetry-Enhanced Training | Yihang Yao et.al. | 2502.17800 | null |
2025-02-25 | AIR: Complex Instruction Generation via Automatic Iterative Refinement | Wei Liu et.al. | 2502.17787 | null |
2025-02-25 | Exploring the Potential of Large Language Models for Estimating the Reading Comprehension Question Difficulty | Yoshee Jain et.al. | 2502.17785 | null |
2025-02-25 | Tip of the Tongue Query Elicitation for Simulated Evaluation | Yifan He et.al. | 2502.17776 | null |
2025-02-25 | FoREST: Frame of Reference Evaluation in Spatial Reasoning Tasks | Tanawan Premsri et.al. | 2502.17775 | null |
2025-02-25 | Uncertainty Quantification for LLM-Based Survey Simulations | Chengpiao Huang et.al. | 2502.17773 | null |
2025-02-25 | DeepSeek vs. ChatGPT: A Comparative Study for Scientific Computing and Scientific Machine Learning Tasks | Qile Jiang et.al. | 2502.17764 | null |
2025-02-25 | Design and implementation of a distributed security threat detection system integrating federated learning and multimodal LLM | Yuqing Wang et.al. | 2502.17763 | null |
2025-02-25 | Detection of LLM-Paraphrased Code and Identification of the Responsible LLM Using Coding Style Features | Shinwoo Park et.al. | 2502.17749 | null |
2025-02-24 | LLM Inference Acceleration via Efficient Operation Fusion | Mahsa Salmani et.al. | 2502.17728 | null |
2025-02-24 | Can Score-Based Generative Modeling Effectively Handle Medical Image Classification? | Sushmita Sarker et.al. | 2502.17727 | null |
2025-02-24 | Spontaneous Giving and Calculated Greed in Language Models | Yuxuan Li et.al. | 2502.17720 | null |
2025-02-24 | Mind the Gesture: Evaluating AI Sensitivity to Culturally Offensive Non-Verbal Gestures | Akhila Yerukola et.al. | 2502.17710 | null |
2025-02-24 | Fractal Generative Models | Tianhong Li et.al. | 2502.17437 | link |
2025-02-24 | Introducing Visual Perception Token into Multimodal Large Language Model | Runpeng Yu et.al. | 2502.17425 | link |
2025-02-24 | MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs | Jiarui Zhang et.al. | 2502.17422 | link |
2025-02-24 | LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification | Penghui Yang et.al. | 2502.17421 | link |
2025-02-24 | The Geometry of Refusal in Large Language Models: Concept Cones and Representational Independence | Tom Wollschläger et.al. | 2502.17420 | null |
2025-02-24 | From System 1 to System 2: A Survey of Reasoning Large Language Models | Zhong-Zhi Li et.al. | 2502.17419 | link |
2025-02-24 | Reasoning with Latent Thoughts: On the Power of Looped Transformers | Nikunj Saunshi et.al. | 2502.17416 | null |
2025-02-24 | COSMOS: A Hybrid Adaptive Optimizer for Memory-Efficient Training of LLMs | Liming Liu et.al. | 2502.17410 | link |
2025-02-24 | Large Language Models are Powerful EHR Encoders | Stefan Hegselmann et.al. | 2502.17403 | null |
2025-02-24 | What is a Good Question? Utility Estimation with LLM-based Simulations | Dong-Ho Lee et.al. | 2502.17383 | null |
2025-02-24 | KV-Edit: Training-Free Image Editing for Precise Background Preservation | Tianrui Zhu et.al. | 2502.17363 | link |
2025-02-24 | A Closer Look at TabPFN v2: Strength, Limitation, and Extension | Han-Jia Ye et.al. | 2502.17361 | null |
2025-02-24 | RELICT: A Replica Detection Framework for Medical Image Generation | Orhun Utku Aydin et.al. | 2502.17360 | null |
2025-02-24 | On Relation-Specific Neurons in Large Language Models | Yihong Liu et.al. | 2502.17355 | link |
2025-02-24 | How Scientists Use Large Language Models to Program | Gabrielle O'Brien et.al. | 2502.17348 | null |
2025-02-24 | Time series forecasting based on optimized LLM for fault prediction in distribution power grid insulators | João Pedro Matos-Carvalho et.al. | 2502.17341 | null |
2025-02-24 | HIPPO: Enhancing the Table Understanding Capability of Large Language Models through Hybrid-Modal Preference Optimization | Zhenghao Liu et.al. | 2502.17315 | link |
2025-02-24 | Delta Decompression for MoE-based LLMs Compression | Hao Gu et.al. | 2502.17298 | link |
2025-02-24 | Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts | Zhenghao Liu et.al. | 2502.17297 | null |
2025-02-24 | Integrating protein sequence embeddings with structure via graph-based deep learning for the prediction of single-residue properties | Kevin Michalewicz et.al. | 2502.17294 | null |
2025-02-24 | Capability Instruction Tuning: A New Paradigm for Dynamic LLM Routing | Yi-Kai Zhang et.al. | 2502.17282 | link |
2025-02-24 | MonoTODia: Translating Monologue Requests to Task-Oriented Dialogues | Sebastian Steindl et.al. | 2502.17268 | null |
2025-02-24 | Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective | Chengyin Xu et.al. | 2502.17262 | null |
2025-02-24 | Detecting Benchmark Contamination Through Watermarking | Tom Sander et.al. | 2502.17259 | null |
2025-02-24 | REINFORCE Adversarial Attacks on Large Language Models: An Adaptive, Distributional, and Semantic Objective | Simon Geisler et.al. | 2502.17254 | null |
2025-02-24 | Alpha-SQL: Zero-Shot Text-to-SQL using Monte Carlo Tree Search | Boyan Li et.al. | 2502.17248 | null |
2025-02-24 | Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction | Tianpeng Li et.al. | 2502.17239 | link |
2025-02-24 | Making LLMs Reason? The Intermediate Language Problem in Neurosymbolic Approaches | Alexander Beiser et.al. | 2502.17216 | null |
2025-02-24 | CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought | Boxuan Zhang et.al. | 2502.17214 | link |
2025-02-24 | Order Matters: Investigate the Position Bias in Multi-constraint Instruction Following | Jie Zeng et.al. | 2502.17204 | link |
2025-02-24 | IGDA: Interactive Graph Discovery through Large Language Model Agents | Alex Havrilla et.al. | 2502.17189 | null |
2025-02-24 | Evaluating Expert Contributions in a MoE LLM for Quiz-Based Tasks | Andrei Chernov et.al. | 2502.17187 | null |
2025-02-24 | Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric | Yuming Yang et.al. | 2502.17184 | link |
2025-02-24 | Unsupervised Accelerated MRI Reconstruction via Ground-Truth-Free Flow Matching | Xinzhe Luo et.al. | 2502.17174 | null |
2025-02-24 | Cheems: A Practical Guidance for Building and Evaluating Chinese Reward Models from Scratch | Xueru Wen et.al. | 2502.17173 | null |
2025-02-24 | Logic Haystacks: Probing LLMs Long-Context Logical Reasoning (Without Easily Identifiable Unrelated Padding) | Damien Sileo et.al. | 2502.17169 | null |
2025-02-24 | JUREX-4E: Juridical Expert-Annotated Four-Element Knowledge Base for Legal Reasoning | Huanghai Liu et.al. | 2502.17166 | link |
2025-02-24 | MEMERAG: A Multilingual End-to-End Meta-Evaluation Benchmark for Retrieval Augmented Generation | María Andrea Cruz Blandón et.al. | 2502.17163 | null |
2025-02-24 | Real-time Monitoring of Economic Shocks using Company Websites | Michael Koenig et.al. | 2502.17161 | null |
2025-02-24 | A Pragmatic Note on Evaluating Generative Models with Fréchet Inception Distance for Retinal Image Synthesis | Yuli Wu et.al. | 2502.17160 | null |
2025-02-24 | Parameter Efficient Merging for Multimodal Large Language Models with Complementary Parameter Adaptation | Fanhu Zeng et.al. | 2502.17159 | null |
2025-02-24 | CodeSwift: Accelerating LLM Inference for Efficient Code Generation | Qianhui Zhao et.al. | 2502.17139 | null |
2025-02-24 | Evaluating the Effectiveness of Large Language Models in Automated News Article Summarization | Lionel Richy Panlap Houamegni et.al. | 2502.17136 | null |
2025-02-24 | Applications of Large Models in Medicine | YunHe Su et.al. | 2502.17132 | null |
2025-02-24 | Thus Spake Long-Context Large Language Model | Xiaoran Liu et.al. | 2502.17129 | null |
2025-02-24 | Adversarial Training for Defense Against Label Poisoning Attacks | Melis Ilayda Bal et.al. | 2502.17121 | link |
2025-02-24 | Diffusion Models for Tabular Data: Challenges, Current Progress, and Future Directions | Zhong Li et.al. | 2502.17119 | link |
2025-02-24 | SFLD: Reducing the content bias for AI-generated Image Detection | Seoyeon Gye et.al. | 2502.17105 | null |
2025-02-24 | Generative Models in Decision Making: A Survey | Yinchuan Li et.al. | 2502.17100 | null |
2025-02-24 | Improved Diffusion-based Generative Model with Better Adversarial Robustness | Zekun Wang et.al. | 2502.17099 | link |
2025-02-24 | Conditional Diffusion-Flow models for generating 3D cosmic density fields: applications to f(R) cosmologies | Julieth Katherine Riveros et.al. | 2502.17087 | null |
2025-02-24 | Automatically Evaluating the Paper Reviewing Capability of Large Language Models | Hyungyu Shin et.al. | 2502.17086 | null |
2025-02-24 | Pleno-Generation: A Scalable Generative Face Video Compression Framework with Bandwidth Intelligence | Bolin Chen et.al. | 2502.17085 | null |
2025-02-24 | Systematic Weight Evaluation for Pruning Large Language Models: Enhancing Performance and Sustainability | Ashhadul Islam et.al. | 2502.17071 | null |
2025-02-24 | LLM-QE: Improving Query Expansion by Aligning Large Language Models with Ranking Preferences | Sijia Yao et.al. | 2502.17057 | link |
2025-02-24 | PrivaCI-Bench: Evaluating Privacy with Contextual Integrity and Legal Compliance | Haoran Li et.al. | 2502.17041 | null |
2025-02-24 | Evolution 6.0: Evolving Robotic Capabilities Through Generative Design | Muhammad Haris Khan et.al. | 2502.17034 | null |
2025-02-24 | Understanding the Uncertainty of LLM Explanations: A Perspective Based on Reasoning Topology | Longchao Da et.al. | 2502.17026 | null |
2025-02-24 | Towards Auto-Regressive Next-Token Prediction: In-Context Learning Emerges from Generalization | Zixuan Gong et.al. | 2502.17024 | null |
2025-02-24 | Quantifying Logical Consistency in Transformers via Query-Key Alignment | Eduard Tulchinskii et.al. | 2502.17017 | null |
2025-02-24 | Predicting Liquidity-Aware Bond Yields using Causal GANs and Deep Reinforcement Learning with LLM Evaluation | Jaskaran Singh Walia et.al. | 2502.17011 | null |
2025-02-24 | Be CIM or Be Memory: A Dual-mode-aware DNN Compiler for CIM Accelerators | Shixin Zhao et.al. | 2502.17006 | null |
2025-02-24 | An Enhanced Large Language Model For Cross Modal Query Understanding System Using DL-KeyBERT Based CAZSSCL-MPGPT | Shreya Singh et.al. | 2502.17000 | null |
2025-02-24 | Active Learning for Conditional Inverse Design with Crystal Generation and Foundation Atomic Models | Zhuoyuan Li et.al. | 2502.16984 | null |
2025-02-24 | LongSafety: Evaluating Long-Context Safety of Large Language Models | Yida Lu et.al. | 2502.16971 | link |
2025-02-24 | Autoregressive Image Generation Guided by Chains of Thought | Miaomiao Cai et.al. | 2502.16965 | null |
2025-02-24 | Make LLM Inference Affordable to Everyone: Augmenting GPU Memory with NDP-DIMM | Lian Liu et.al. | 2502.16963 | null |
2025-02-24 | UrduLLaMA 1.0: Dataset Curation, Preprocessing, and Evaluation in Low-Resource Settings | Layba Fiaz et.al. | 2502.16961 | null |
2025-02-24 | Lean and Mean: Decoupled Value Policy Optimization with Global Value Guidance | Chenghua Huang et.al. | 2502.16944 | null |
2025-02-24 | Reasoning Does Not Necessarily Improve Role-Playing Ability | Xiachong Feng et.al. | 2502.16940 | null |
2025-02-24 | BigMac: A Communication-Efficient Mixture-of-Experts Model Structure for Fast Training and Inference | Zewen Jin et.al. | 2502.16927 | null |
2025-02-24 | FilterLLM: Text-To-Distribution LLM for Billion-Scale Cold-Start Recommendation | Ruochen Liu et.al. | 2502.16924 | null |
2025-02-24 | A Systematic Survey of Automatic Prompt Optimization Techniques | Kiran Ramnath et.al. | 2502.16923 | null |
2025-02-24 | Benchmarking Temporal Reasoning and Alignment Across Chinese Dynasties | Zhenglin Wang et.al. | 2502.16922 | null |
2025-02-24 | SS-MPC: A Sequence-Structured Multi-Party Conversation System | Yoonjin Jang et.al. | 2502.16920 | null |
2025-02-24 | Multi-Dimensional Quality Assessment for Text-to-3D Assets: Dataset and Model | Kang Fu et.al. | 2502.16915 | null |
2025-02-24 | SPARC: Score Prompting and Adaptive Fusion for Zero-Shot Multi-Label Recognition in Vision-Language Models | Kevin Miller et.al. | 2502.16911 | null |
2025-02-24 | AutoLogi: Automated Generation of Logic Puzzles for Evaluating Reasoning Abilities of Large Language Models | Qin Zhu et.al. | 2502.16906 | null |
2025-02-24 | GuidedBench: Equipping Jailbreak Evaluation with Guidelines | Ruixuan Huang et.al. | 2502.16903 | null |
2025-02-24 | Culture-TRIP: Culturally-Aware Text-to-Image Generation with Iterative Prompt Refinment | Suchae Jeong et.al. | 2502.16902 | null |
2025-02-24 | Char-mander Use mBackdoor! A Study of Cross-lingual Backdoor Attacks in Multilingual LLMs | Himanshu Beniwal et.al. | 2502.16901 | link |
2025-02-24 | Zero-shot Load Forecasting for Integrated Energy Systems: A Large Language Model-based Framework with Multi-task Learning | Jiaheng Li et.al. | 2502.16896 | null |
2025-02-24 | Unlocking Scientific Concepts: How Effective Are LLM-Generated Analogies for Student Understanding and Classroom Practice? | Zekai Shao et.al. | 2502.16895 | null |
2025-02-24 | Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment | Chenghao Fan et.al. | 2502.16894 | null |
2025-02-24 | Applying LLMs to Active Learning: Towards Cost-Efficient Cross-Task Text Classification without Manually Labeled Data | Yejian Zhang et.al. | 2502.16892 | null |
2025-02-24 | Unveiling Institution-Specific Bias in Pathology Foundation Models: Detriments, Causes, and Potential Solutions | Weiping Lin et.al. | 2502.16889 | null |
2025-02-24 | DBudgetKV: Dynamic Budget in KV Cache Compression for Ensuring Optimal Performance | Xuanfan Ni et.al. | 2502.16886 | null |
2025-02-24 | CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter | Yepeng Weng et.al. | 2502.16880 | null |
2025-02-24 | A Multi-LLM-Agent-Based Framework for Economic and Public Policy Analysis | Yuzhi Hao et.al. | 2502.16879 | null |
2025-02-24 | Graphy'our Data: Towards End-to-End Modeling, Exploring and Generating Report from Raw Data | Longbin Lai et.al. | 2502.16868 | null |
2025-02-24 | Leveraging Large Language Models for Effective and Explainable Multi-Agent Credit Assignment | Kartik Nagpal et.al. | 2502.16863 | null |
2025-02-24 | LongAttn: Selecting Long-context Training Data via Token-level Attention | Longyun Wu et.al. | 2502.16860 | null |
2025-02-24 | Sarang at DEFACTIFY 4.0: Detecting AI-Generated Text Using Noised Data and an Ensemble of DeBERTa Models | Avinash Trivedi et.al. | 2502.16857 | null |
2025-02-24 | Improving LLM General Preference Alignment via Optimistic Online Mirror Descent | Yuheng Zhang et.al. | 2502.16852 | null |
2025-02-24 | Exploring Causes and Mitigation of Hallucinations in Large Vision Language Models | Yaqi Sun et.al. | 2502.16842 | null |
2025-02-24 | Fair Foundation Models for Medical Image Analysis: Challenges and Perspectives | Dilermando Queiroz et.al. | 2502.16841 | null |
2025-02-24 | In-context learning of evolving data streams with tabular foundational models | Afonso Lourenço et.al. | 2502.16840 | null |
2025-02-24 | "Actionable Help" in Crises: A Novel Dataset and Resource-Efficient Models for Identifying Request and Offer Social Media Posts | Rabindra Lamsal et.al. | 2502.16839 | null |
2025-02-24 | REGen: A Reliable Evaluation Framework for Generative Event Argument Extraction | Omar Sharif et.al. | 2502.16838 | null |
2025-02-24 | Finding the Sweet Spot: Preference Data Construction for Scaling Preference Optimization | Yao Xiao et.al. | 2502.16825 | null |
2025-02-21 | ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval | Guanqi Zhan et.al. | 2502.15682 | null |
2025-02-21 | Privacy Ripple Effects from Adding or Removing Personal Information in Language Model Training | Jaydeep Borkar et.al. | 2502.15680 | null |
2025-02-21 | FLEKE: Federated Locate-then-Edit Knowledge Editing | Zongkai Zhao et.al. | 2502.15677 | null |
2025-02-21 | AutoToM: Automated Bayesian Inverse Planning and Model Discovery for Open-ended Theory of Mind | Zhining Zhang et.al. | 2502.15676 | null |
2025-02-21 | VaViM and VaVAM: Autonomous Driving through Video Generative Modeling | Florent Bartoccioni et.al. | 2502.15672 | link |
2025-02-21 | Almost AI, Almost Human: The Challenge of Detecting AI-Polished Writing | Shoumik Saha et.al. | 2502.15666 | null |
2025-02-21 | Machine-generated text detection prevents language model collapse | George Drayson et.al. | 2502.15654 | null |
2025-02-21 | Empowering LLMs with Logical Reasoning: A Comprehensive Survey | Fengxiang Cheng et.al. | 2502.15652 | null |
2025-02-21 | Steering into New Embedding Spaces: Analyzing Cross-Lingual Alignment Induced by Model Interventions in Multilingual Language Models | Anirudh Sundar et.al. | 2502.15639 | null |
2025-02-21 | Mantis: Lightweight Calibrated Foundation Model for User-Friendly Time Series Classification | Vasilii Feofanov et.al. | 2502.15637 | null |
2025-02-21 | The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer | Marthe Ballon et.al. | 2502.15631 | null |
2025-02-21 | Probe Pruning: Accelerating LLMs through Dynamic Pruning via Model-Probing | Qi Le et.al. | 2502.15618 | null |
2025-02-21 | On the Robustness of Transformers against Context Hijacking for Linear Classification | Tianle Li et.al. | 2502.15609 | null |
2025-02-21 | Cross-Format Retrieval-Augmented Generation in XR with LLMs for Context-Aware Maintenance Assistance | Akos Nagy et.al. | 2502.15604 | null |
2025-02-21 | Do Multilingual LLMs Think In English? | Lisa Schut et.al. | 2502.15603 | null |
2025-02-21 | WorldCraft: Photo-Realistic 3D World Creation and Customization via LLM Agents | Xinhang Liu et.al. | 2502.15601 | null |
2025-02-21 | SafeInt: Shielding Large Language Models from Jailbreak Attacks via Safety-Aware Representation Intervention | Jiaqi Wu et.al. | 2502.15594 | null |
2025-02-21 | Generalizing From Short to Long: Effective Data Synthesis for Long-Context Instruction Tuning | Wenhao Zhu et.al. | 2502.15592 | null |
2025-02-21 | LightThinker: Thinking Step-by-Step Compression | Jintian Zhang et.al. | 2502.15589 | null |
2025-02-21 | Chats-Grid: An Iterative Retrieval Q&A Optimization Scheme Leveraging Large Model and Retrieval Enhancement Generation in smart grid | Yunfeng Li et.al. | 2502.15583 | null |
2025-02-21 | Fine-tuning foundation models of materials interatomic potentials with frozen transfer learning | Mariia Radova et.al. | 2502.15582 | null |
2025-02-21 | Interpreting and Steering LLMs with Mutual Information-based Explanations on Sparse Autoencoders | Xuansheng Wu et.al. | 2502.15576 | null |
2025-02-21 | DReSD: Dense Retrieval for Speculative Decoding | Milan Gritta et.al. | 2502.15572 | null |
2025-02-21 | A Cautionary Tale About "Neutrally" Informative AI Tools Ahead of the 2025 Federal Elections in Germany | Ina Dormuth et.al. | 2502.15568 | null |
2025-02-21 | PIP-KAG: Mitigating Knowledge Conflicts in Knowledge-Augmented Generation via Parametric Pruning | Pengcheng Huang et.al. | 2502.15543 | null |
2025-02-21 | Accurate and efficient machine learning interatomic potentials for finite temperature modeling of molecular crystals | Flaviano Della Pia et.al. | 2502.15530 | null |
2025-02-21 | Scaling Sparse and Dense Retrieval in Decoder-Only LLMs | Hansi Zeng et.al. | 2502.15526 | null |
2025-02-21 | Towards Swift Serverless LLM Cold Starts with ParaServe | Chiheng Lou et.al. | 2502.15524 | null |
2025-02-21 | Activation Steering in Neural Theorem Provers | Shashank Kirtania et.al. | 2502.15507 | null |
2025-02-21 | Construction and Evaluation of LLM-based agents for Semi-Autonomous penetration testing | Masaya Kobayashi et.al. | 2502.15506 | null |
2025-02-21 | Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models | Ya Wang et.al. | 2502.15499 | null |
2025-02-21 | Programmers Aren't Obsolete Yet: A Syllabus for Teaching CS Students to Responsibly Use Large Language Models for Code Generation | Bruno Pereira Cipriano et.al. | 2502.15493 | null |
2025-02-21 | ExpliCa: Evaluating Explicit Causal Reasoning in Large Language Models | Martina Miliani et.al. | 2502.15487 | null |
2025-02-21 | Enhancing RWKV-based Language Models for Long-Sequence Text Generation | Xinghan Pan et.al. | 2502.15485 | null |
2025-02-21 | FaultGPT: Industrial Fault Diagnosis Question Answering System by Vision Language Models | Jiao Chen et.al. | 2502.15481 | null |
2025-02-21 | PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System | Yintao He et.al. | 2502.15470 | null |
2025-02-21 | Mitigating Data Scarcity in Time Series Analysis: A Foundation Model with Series-Symbol Data Generation | Wenxuan Wang et.al. | 2502.15466 | null |
2025-02-21 | Memory Helps, but Confabulation Misleads: Understanding Streaming Events in Videos with MLLMs | Gengyuan Zhang et.al. | 2502.15457 | null |
2025-02-21 | R-LoRA: Random Initialization of Multi-Head LoRA for Multi-Task Learning | Jinda Liu et.al. | 2502.15455 | null |
2025-02-21 | A fast convergence algorithm based on binary integer programming for expert load balancing in MoE LLMs | Yuan Sun et.al. | 2502.15451 | null |
2025-02-21 | When Compression Meets Model Compression: Memory-Efficient Double Compression for Large Language Models | Weilan Wang et.al. | 2502.15443 | null |
2025-02-21 | On the Effectiveness of Large Language Models in Writing Alloy Formulas | Yang Hong et.al. | 2502.15441 | null |
2025-02-21 | Fed-SB: A Silver Bullet for Extreme Communication Efficiency and Performance in (Private) Federated LoRA Fine-Tuning | Raghav Singhal et.al. | 2502.15436 | link |
2025-02-21 | Single-pass Detection of Jailbreaking Input in Large Language Models | Leyla Naz Candogan et.al. | 2502.15435 | null |
2025-02-21 | Mixup Model Merge: Enhancing Model Merging Performance through Randomized Linear Interpolation | Yue Zhou et.al. | 2502.15434 | null |
2025-02-21 | Pub-Guard-LLM: Detecting Fraudulent Biomedical Articles with Reliable Explanations | Lihu Chen et.al. | 2502.15429 | null |
2025-02-21 | Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs | Giulio Zizzo et.al. | 2502.15427 | null |
2025-02-21 | Beyond Translation: LLM-Based Data Generation for Multilingual Fact-Checking | Yi-Ling Chung et.al. | 2502.15419 | null |
2025-02-21 | MHQA: A Diverse, Knowledge Intensive Mental Health Question Answering Challenge for Language Models | Suraj Racha et.al. | 2502.15418 | null |
2025-02-21 | HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings | Rasmus Aavang et.al. | 2502.15411 | null |
2025-02-21 | Problem-Solving Logic Guided Curriculum In-Context Learning for LLMs Complex Reasoning | Xuetao Ma et.al. | 2502.15401 | null |
2025-02-21 | Beyond Tools: Understanding How Heavy Users Integrate LLMs into Everyday Tasks and Decision-Making | Eunhye Kim et.al. | 2502.15395 | null |
2025-02-21 | Chitrarth: Bridging Vision and Language for a Billion People | Shaharukh Khan et.al. | 2502.15392 | null |
2025-02-21 | MOVE: A Mixture-of-Vision-Encoders Approach for Domain-Focused Vision-Language Processing | Matvey Skripkin et.al. | 2502.15381 | null |
2025-02-21 | Weakly Supervised Video Scene Graph Generation via Natural Language Supervision | Kibum Kim et.al. | 2502.15370 | null |
2025-02-21 | Identifying Features that Shape Perceived Consciousness in Large Language Model-based AI: A Quantitative Study of Human Responses | Kang Bongsu et.al. | 2502.15365 | null |
2025-02-21 | Evaluating Social Biases in LLM Reasoning | Xuyang Wu et.al. | 2502.15361 | null |
2025-02-21 | ARS: Automatic Routing Solver with Large Language Models | Kai Li et.al. | 2502.15359 | null |
2025-02-21 | AttentionEngine: A Versatile Framework for Efficient Attention Mechanisms on Diverse Hardware Platforms | Feiyang Chen et.al. | 2502.15349 | null |
2025-02-21 | Constructing a Norm for Children's Scientific Drawing: Distribution Features Based on Semantic Similarity of Large Language Models | Yi Zhang et.al. | 2502.15348 | null |
2025-02-21 | Efficiently Solving Discounted MDPs with Predictions on Transition Matrices | Lixing Lyu et.al. | 2502.15345 | null |
2025-02-21 | Exploring Embodied Multimodal Large Models: Development, Datasets, and Future Directions | Shoubin Chen et.al. | 2502.15336 | null |
2025-02-21 | Stepwise Informativeness Search for Improving LLM Reasoning | Siyuan Wang et.al. | 2502.15335 | null |
2025-02-21 | Attention Eclipse: Manipulating Attention to Bypass LLM Safety-Alignment | Pedram Zaree et.al. | 2502.15334 | null |
2025-02-21 | Detecting Future-related Contexts of Entity Mentions | Puneet Prashar et.al. | 2502.15332 | null |
2025-02-21 | DynamicGSG: Dynamic 3D Gaussian Scene Graphs for Environment Adaptation | Luzhou Ge et.al. | 2502.15309 | link |
2025-02-21 | SVDq: 1.25-bit and 410x Key Cache Compression for LLM Attention | Hong Yankun et.al. | 2502.15304 | null |
2025-02-21 | Round Attention: A Novel Round-Level Attention Mechanism to Accelerate LLM Inference | Yaohua Tang et.al. | 2502.15294 | null |
2025-02-21 | Bridging Bug Localization and Issue Fixing: A Hierarchical Localization Framework Leveraging Large Language Models | Jianming Chang et.al. | 2502.15292 | null |
2025-02-21 | BundleFlow: Deep Menus for Combinatorial Auctions by Diffusion-Based Optimization | Tonghan Wang et.al. | 2502.15283 | null |
2025-02-21 | A Training-free LLM-based Approach to General Chinese Character Error Correction | Houquan Zhou et.al. | 2502.15266 | null |
2025-02-21 | Retrieval-Augmented Speech Recognition Approach for Domain Challenges | Peng Shen et.al. | 2502.15264 | null |
2025-02-21 | LightMamba: Efficient Mamba Acceleration on FPGA with Quantization and Hardware Co-design | Renjie Wei et.al. | 2502.15260 | null |
2025-02-21 | An approach for API synthesis using large language models | Hua Zhong et.al. | 2502.15246 | null |
2025-02-21 | Comparative Analysis of Large Language Models for Context-Aware Code Completion using SAFIM Framework | Hang Zhang et.al. | 2502.15243 | null |
2025-02-21 | From Documents to Dialogue: Building KG-RAG Enhanced AI Assistants | Manisha Mukherjee et.al. | 2502.15237 | null |
2025-02-21 | A General Pseudonymization Framework for Cloud-Based LLMs: Replacing Privacy Information in Controlled Text Generation | Shilong Hou et.al. | 2502.15233 | null |
2025-02-21 | User Experience with LLM-powered Conversational Recommendation Systems: A Case of Music Recommendation | Sojeong Yun et.al. | 2502.15229 | null |
2025-02-21 | Understand User Opinions of Large Language Models via LLM-Powered In-the-Moment User Experience Interviews | Mengqiao Liu et.al. | 2502.15226 | null |
2025-02-21 | Auto-Bench: An Automated Benchmark for Scientific Discovery in LLMs | Tingting Chen et.al. | 2502.15224 | null |
2025-02-21 | FormalSpecCpp: A Dataset of C++ Formal Specifications created using LLMs | Madhurima Chakraborty et.al. | 2502.15217 | link |
2025-02-21 | The Evolving Landscape of LLM- and VLM-Integrated Reinforcement Learning | Sheila Schoepp et.al. | 2502.15214 | null |
2025-02-21 | Unveiling Attractor Cycles in Large Language Models: A Dynamical Systems View of Successive Paraphrasing | Zhilin Wang et.al. | 2502.15208 | null |
2025-02-21 | Lung-DDPM: Semantic Layout-guided Diffusion Models for Thoracic CT Image Synthesis | Yifan Jiang et.al. | 2502.15204 | null |
2025-02-21 | TETRIS: Optimal Draft Token Selection for Batch Speculative Decoding | Zhaoxuan Wu et.al. | 2502.15197 | null |
2025-02-21 | LEDD: Large Language Model-Empowered Data Discovery in Data Lakes | Qi An et.al. | 2502.15182 | null |
2025-02-21 | Enhancing Speech Large Language Models with Prompt-Aware Mixture of Audio Encoders | Weiqiao Shan et.al. | 2502.15178 | null |
2025-02-21 | Methods and Trends in Detecting Generated Images: A Comprehensive Review | Arpan Mahara et.al. | 2502.15176 | null |
2025-02-21 | M3-AGIQA: Multimodal, Multi-Round, Multi-Aspect AI-Generated Image Quality Assessment | Chuan Cui et.al. | 2502.15167 | null |
2025-02-21 | Extreme Speech Classification in the Era of LLMs: Exploring Open-Source and Proprietary Models | Sarthak Mahajan et.al. | 2502.15155 | null |
2025-02-21 | Investigating the Adaptive Robustness with Knowledge Conflicts in LLM-based Multi-Agent Systems | Tianjie Ju et.al. | 2502.15153 | null |
2025-02-21 | Do LLMs Make Mistakes Like Students? Exploring Natural Alignment between Language Models and Human Error Patterns | Naiming Liu et.al. | 2502.15140 | null |
2025-02-21 | Chain-of-Rank: Enhancing Large Language Models for Domain-Specific RAG in Edge Device | Juntae Lee et.al. | 2502.15134 | null |
2025-02-21 | TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba | Xiuwei Chen et.al. | 2502.15130 | null |
2025-02-20 | LUME: LLM Unlearning with Multitask Evaluations | Anil Ramakrishna et.al. | 2502.15097 | null |
2025-02-20 | Detecting Student Intent for Chat-Based Intelligent Tutoring Systems | Ella Cutler et.al. | 2502.15096 | null |
2025-02-20 | Judging It, Washing It: Scoring and Greenwashing Corporate Climate Disclosures using Large Language Models | Marianne Chuang et.al. | 2502.15094 | null |
2025-02-20 | Optimizing Singular Spectrum for Large Language Model Compression | Dengjie Li et.al. | 2502.15092 | null |
2025-02-20 | Analyze the Neurons, not the Embeddings: Understanding When and Where LLM Representations Align with Humans | Masha Fedzechkina et.al. | 2502.15090 | null |
2025-02-20 | Is Safety Standard Same for Everyone? User-Specific Safety Evaluation of Large Language Models | Yeonjun In et.al. | 2502.15086 | null |
2025-02-20 | LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention | Shang Yang et.al. | 2502.14866 | link |
2025-02-20 | Aligning LLMs to Ask Good Questions A Case Study in Clinical Reasoning | Shuyue Stella Li et.al. | 2502.14860 | link |
2025-02-20 | FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling | Weilin Zhao et.al. | 2502.14856 | null |
2025-02-20 | Prompt-to-Leaderboard | Evan Frick et.al. | 2502.14855 | link |
2025-02-20 | GATE: Graph-based Adaptive Tool Evolution Across Diverse Tasks | Jianwen Luo et.al. | 2502.14848 | null |
2025-02-20 | Red-Teaming LLM Multi-Agent Systems via Communication Attacks | Pengfei He et.al. | 2502.14847 | null |
2025-02-20 | Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation | Yue Yang et.al. | 2502.14846 | null |
2025-02-20 | Revealing and Mitigating Over-Attention in Knowledge Editing | Pinzheng Wang et.al. | 2502.14838 | link |
2025-02-20 | Middle-Layer Representation Alignment for Cross-Lingual Transfer in Fine-Tuned LLMs | Danni Liu et.al. | 2502.14830 | link |
2025-02-20 | A Survey of Model Architectures in Information Retrieval | Zhichao Xu et.al. | 2502.14822 | null |
2025-02-20 | eC-Tab2Text: Aspect-Based Text Generation from e-Commerce Product Tables | Luis Antonio Gutiérrez Guanilo et.al. | 2502.14820 | null |
2025-02-20 | Dynamic Low-Rank Sparse Adaptation for Large Language Models | Weizhong Huang et.al. | 2502.14816 | null |
2025-02-20 | FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis | Fadillah Maani et.al. | 2502.14807 | link |
2025-02-20 | From RAG to Memory: Non-Parametric Continual Learning for Large Language Models | Bernal Jiménez Gutiérrez et.al. | 2502.14802 | link |
2025-02-20 | A Multi-Agent Perspective on Modern Information Retrieval | Haya Nachimovsky et.al. | 2502.14796 | null |
2025-02-20 | Rapid Word Learning Through Meta In-Context Learning | Wentao Wang et.al. | 2502.14791 | null |
2025-02-20 | DC-ControlNet: Decoupling Inter- and Intra-Element Conditions in Image Generation with Diffusion Models | Hongji Yang et.al. | 2502.14779 | null |
2025-02-20 | SurveyX: Academic Survey Automation via Large Language Models | Xun Liang et.al. | 2502.14776 | null |
2025-02-20 | Determining Layer-wise Sparsity for Large Language Models Through a Theoretical Perspective | Weizhong Huang et.al. | 2502.14770 | null |
2025-02-20 | Tree-of-Debate: Multi-Persona Debate Trees Elicit Critical Thinking for Scientific Comparative Analysis | Priyanka Kargupta et.al. | 2502.14767 | link |
2025-02-20 | EquivaMap: Leveraging LLMs for Automatic Equivalence Checking of Optimization Formulations | Haotian Zhai et.al. | 2502.14760 | link |
2025-02-20 | On the Influence of Context Size and Model Choice in Retrieval-Augmented Generation Systems | Juraj Vladika et.al. | 2502.14759 | null |
2025-02-20 | TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators | Jianling Li et.al. | 2502.14752 | link |
2025-02-20 | Large Language Models Struggle to Describe the Haystack without Human Help: Human-in-the-loop Evaluation of LLMs | Zongxia Li et.al. | 2502.14748 | null |
2025-02-20 | Multi-Agent Coordination across Diverse Applications: A Survey | Lijun Sun et.al. | 2502.14743 | null |
2025-02-20 | SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines | M-A-P Team et.al. | 2502.14739 | null |
2025-02-20 | EAGER-LLM: Enhancing Large Language Models as Recommenders through Exogenous Behavior-Semantic Integration | Minjie Hong et.al. | 2502.14735 | null |
2025-02-20 | WavRAG: Audio-Integrated Retrieval Augmented Generation for Spoken Dialogue Models | Yifu Chen et.al. | 2502.14727 | null |
2025-02-20 | I-MCTS: Enhancing Agentic AutoML via Introspective Monte Carlo Tree Search | Zujie Liang et.al. | 2502.14693 | null |
2025-02-20 | Bridging the Gap: Transforming Natural Language Questions into SQL Queries via Abstract Query Pattern and Contextual Schema Markup | Yonghui Kong et.al. | 2502.14682 | null |
2025-02-20 | How to Get Your LLM to Generate Challenging Problems for Evaluation | Arkil Patel et.al. | 2502.14678 | link |
2025-02-20 | Data-Constrained Synthesis of Training Data for De-Identification | Thomas Vakili et.al. | 2502.14677 | null |
2025-02-20 | Explanations of Deep Language Models Explain Language Representations in the Brain | Maryam Rahimi et.al. | 2502.14671 | null |
2025-02-20 | AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO | Alan Dao et.al. | 2502.14669 | null |
2025-02-20 | Beyond the Surface: Uncovering Implicit Locations with LLMs for Personalized Local News | Gali Katz et.al. | 2502.14660 | null |
2025-02-20 | Edit Once, Update Everywhere: A Simple Framework for Cross-Lingual Knowledge Synchronization in LLMs | Yuchen Wu et.al. | 2502.14645 | null |
2025-02-20 | LIFT: Improving Long Context Understanding of Large Language Models through Long Input Fine-Tuning | Yansheng Mao et.al. | 2502.14644 | null |
2025-02-20 | Length-Controlled Margin-Based Preference Optimization without Reference Model | Gengxu Li et.al. | 2502.14643 | link |
2025-02-20 | ReQFlow: Rectified Quaternion Flow for Efficient and High-Quality Protein Backbone Generation | Angxiao Yue et.al. | 2502.14637 | link |
2025-02-20 | CER: Confidence Enhanced Reasoning in LLMs | Ali Razghandi et.al. | 2502.14634 | link |
2025-02-20 | Augmenting Coaching with GenAI: Insights into Use, Effectiveness, and Future Potential | Jennifer Haase et.al. | 2502.14632 | null |
2025-02-20 | Synergistic Fusion of Multi-Source Knowledge via Evidence Theory for High-Entropy Alloy Discovery | Minh-Quyet Ha et.al. | 2502.14631 | null |
2025-02-20 | PEARL: Towards Permutation-Resilient LLMs | Liang Chen et.al. | 2502.14628 | link |
2025-02-20 | Reward Models Identify Consistency, Not Causality | Yuhui Xu et.al. | 2502.14619 | null |
2025-02-20 | Serving Models, Fast and Slow: Optimizing Heterogeneous LLM Inferencing Workloads at Scale | Shashwat Jaiswal et.al. | 2502.14617 | null |
2025-02-20 | FIND: Fine-grained Information Density Guided Adaptive Retrieval-Augmented Generation for Disease Diagnosis | Mingyi Jia et.al. | 2502.14614 | null |
2025-02-20 | Behavioral Analysis of Information Salience in Large Language Models | Jan Trienes et.al. | 2502.14613 | link |
2025-02-20 | "Don't Forget the Teachers": Towards an Educator-Centered Understanding of Harms from Large Language Models in Education | Emma Harvey et.al. | 2502.14592 | null |
2025-02-20 | Vision Foundation Models in Medical Image Analysis: Advances and Challenges | Pengchen Liang et.al. | 2502.14584 | null |
2025-02-20 | A Theory for Conditional Generative Modeling on Multiple Data Sources | Rongzhen Wang et.al. | 2502.14583 | link |
2025-02-20 | ReVISE: Learning to Refine at Test-Time via Intrinsic Self-Verification | Hyunseok Lee et.al. | 2502.14565 | null |
2025-02-20 | Plan-over-Graph: Towards Parallelable LLM Agent Schedule | Shiqi Zhang et.al. | 2502.14563 | link |
2025-02-20 | Can LLMs Predict Citation Intent? An Experimental Analysis of In-context Learning and Fine-tuning on Open LLMs | Paris Koloveas et.al. | 2502.14561 | link |
2025-02-20 | Less is More: Improving LLM Alignment via Preference Data Selection | Xun Deng et.al. | 2502.14560 | null |
2025-02-20 | Multiscale Byte Language Models -- A Hierarchical Architecture for Causal Million-Length Sequence Modeling | Eric Egli et.al. | 2502.14553 | link |
2025-02-20 | Position: Graph Learning Will Lose Relevance Due To Poor Benchmarks | Maya Bechler-Speicher et.al. | 2502.14546 | null |
2025-02-20 | LLM-based User Profile Management for Recommender System | Seunghwan Bang et.al. | 2502.14541 | null |
2025-02-20 | LoRA-GGPO: Mitigating Double Descent in LoRA Fine-Tuning via Gradient-Guided Perturbation Optimization | Yupeng Chang et.al. | 2502.14538 | link |
2025-02-20 | CORBA: Contagious Recursive Blocking Attacks on Multi-Agent Systems Based on Large Language Models | Zhenhong Zhou et.al. | 2502.14529 | link |
2025-02-20 | Generative adversarial networks vs large language models: a comparative study on synthetic tabular data generation | Austin A. Barr et.al. | 2502.14523 | link |
2025-02-20 | Can LLMs Simulate L2-English Dialogue? An Information-Theoretic Analysis of L1-Dependent Biases | Rena Gao et.al. | 2502.14507 | link |
2025-02-20 | How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM? | Sergey Pletenev et.al. | 2502.14502 | link |
2025-02-20 | MLGym: A New Framework and Benchmark for Advancing AI Research Agents | Deepak Nathani et.al. | 2502.14499 | null |
2025-02-20 | StructFlowBench: A Structured Flow Benchmark for Multi-turn Instruction Following | Jinnan Li et.al. | 2502.14494 | link |
2025-02-20 | How Jailbreak Defenses Work and Ensemble? A Mechanistic Investigation | Zhuohang Long et.al. | 2502.14486 | null |
2025-02-20 | NLoRA: Nyström-Initiated Low-Rank Adaptation for Large Language Models | Chenlu Guo et.al. | 2502.14482 | link |
2025-02-20 | Unshackling Context Length: An Efficient Selective Attention Approach through Query-Key Compression | Haoyu Wang et.al. | 2502.14477 | null |
2025-02-20 | Argument-Based Comparative Question Answering Evaluation Benchmark | Irina Nikishina et.al. | 2502.14476 | null |
2025-02-20 | Enhancing Smart Environments with Context-Aware Chatbots using Large Language Models | Aurora Polo-Rodríguez et.al. | 2502.14469 | null |
2025-02-20 | Narrative-Driven Travel Planning: Geoculturally-Grounded Script Generation with Evolutionary Itinerary Optimization | Ran Ding et.al. | 2502.14456 | link |
2025-02-20 | Optimal word order for non-causal text generation with Large Language Models: the Spanish case | Andrea Busto-Castiñeira et.al. | 2502.14451 | null |
2025-02-20 | LLM4FaaS: No-Code Application Development using LLMs and FaaS | Minghe Wang et.al. | 2502.14450 | null |
2025-02-20 | PredictaBoard: Benchmarking LLM Score Predictability | Lorenzo Pacchiardi et.al. | 2502.14445 | link |
2025-02-20 | Token-Level Density-Based Uncertainty Quantification Methods for Eliciting Truthfulness of Large Language Models | Artem Vazhentsev et.al. | 2502.14427 | link |
2025-02-20 | A Survey on Data Contamination for Large Language Models | Yuxing Cheng et.al. | 2502.14425 | link |
2025-02-20 | ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model | Zhongyi Zhou et.al. | 2502.14420 | null |
2025-02-20 | Towards Efficient Automatic Self-Pruning of Large Language Models | Weizhong Huang et.al. | 2502.14413 | null |
2025-02-20 | Evaluating Precise Geolocation Inference Capabilities of Vision Language Models | Neel Jay et.al. | 2502.14412 | link |
2025-02-20 | Unstructured Evidence Attribution for Long Context Query Focused Summarization | Dustin Wright et.al. | 2502.14409 | null |
2025-02-20 | HPS: Hard Preference Sampling for Human Preference Alignment | Xiandong Zou et.al. | 2502.14400 | null |
2025-02-20 | Enhancing Portuguese Variety Identification with Cross-Domain Approaches | Hugo Sousa et.al. | 2502.14394 | null |
2025-02-20 | Leveraging Small LLMs for Argument Mining in Education: Argument Component Identification, Classification, and Assessment | Lucile Favero et.al. | 2502.14389 | null |
2025-02-20 | S\*: Test Time Scaling for Code Generation | Dacheng Li et.al. | 2502.14382 | link |
2025-02-20 | PPO-MI: Efficient Black-Box Model Inversion via Proximal Policy Optimization | Xinpeng Shou et.al. | 2502.14370 | null |
2025-02-20 | Entropy-UID: A Method for Optimizing Information Density | Xinpeng Shou et.al. | 2502.14366 | null |
2025-02-20 | Retrieval-Augmented Process Reward Model for Generalizable Mathematical Reasoning | Jiachen Zhu et.al. | 2502.14361 | null |
2025-02-20 | SR-LLM: Rethinking the Structured Representation in Large Language Model | Jiahuan Zhang et.al. | 2502.14352 | null |
2025-02-20 | SegAnyPET: Universal Promptable Segmentation from Positron Emission Tomography Images | Yichi Zhang et.al. | 2502.14351 | null |
2025-02-20 | FlowAgent: Achieving Compliance and Flexibility for Workflow Agents | Yuchen Shi et.al. | 2502.14345 | link |
2025-02-20 | Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective | Ruichen Shao et.al. | 2502.14340 | null |
2025-02-20 | A Survey on Feedback-based Multi-step Reasoning for Large Language Models on Mathematics | Ting-Ruen Wei et.al. | 2502.14333 | null |
2025-02-20 | SolSearch: An LLM-Driven Framework for Efficient SAT-Solving Code Generation | Junjie Sheng et.al. | 2502.14328 | null |
2025-02-20 | ChemHTS: Hierarchical Tool Stacking for Enhancing Chemical Agents | Zhucong Li et.al. | 2502.14327 | link |
2025-02-20 | Beyond Self-Talk: A Communication-Centric Survey of LLM-Based Multi-Agent Systems | Bingyu Yan et.al. | 2502.14321 | null |
2025-02-20 | Line Goes Up? Inherent Limitations of Benchmarks for Evaluating Large Language Models | James Fodor et.al. | 2502.14318 | null |
2025-02-20 | ParallelComp: Parallel Long-Context Compressor for Length Extrapolation | Jing Xiong et.al. | 2502.14317 | null |
2025-02-20 | Unveiling Cultural Blind Spots: Analyzing the Limitations of mLLMs in Procedural Text Comprehension | Amir Hossein Yari et.al. | 2502.14315 | null |
2025-02-20 | Efficient AI in Practice: Training and Deployment of Efficient LLMs for Industry Applications | Kayhan Behdin et.al. | 2502.14305 | null |
2025-02-20 | MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models | Shrey Pandit et.al. | 2502.14302 | null |
2025-02-20 | SEA-HELM: Southeast Asian Holistic Evaluation of Language Models | Yosephine Susanto et.al. | 2502.14301 | null |
2025-02-19 | Where's the Bug? Attention Probing for Scalable Fault Localization | Adam Stein et.al. | 2502.13966 | null |
2025-02-19 | Autellix: An Efficient Serving Engine for LLM Agents as General Programs | Michael Luo et.al. | 2502.13965 | null |
2025-02-19 | MuDAF: Long-Context Multi-Document Attention Focusing through Contrastive Learning on Attention Heads | Weihao Liu et.al. | 2502.13963 | link |
2025-02-19 | Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering | William Jurayj et.al. | 2502.13962 | null |
2025-02-19 | LIDDIA: Language-based Intelligent Drug Discovery Agent | Reza Averly et.al. | 2502.13959 | null |
2025-02-19 | Neurosymbolic artificial intelligence via large language models and coherence-driven inference | Steve Huntsman et.al. | 2502.13953 | null |
2025-02-19 | Why Safeguarded Ships Run Aground? Aligned Large Language Models' Safety Mechanisms Tend to Be Anchored in The Template Region | Chak Tou Leong et.al. | 2502.13946 | null |
2025-02-19 | Image compositing is all you need for data augmentation | Ang Jia Ning Shermaine et.al. | 2502.13936 | null |
2025-02-19 | LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization | Guanzheng Chen et.al. | 2502.13922 | link |
2025-02-19 | Exploring Code Language Models for Automated HLS-based Hardware Generation: Benchmark, Infrastructure and Analysis | Jiahao Gai et.al. | 2502.13921 | null |
2025-02-19 | Exploring Personalized Health Support through Data-Driven, Theory-Guided LLMs: A Case Study in Sleep Health | Xingbo Wang et.al. | 2502.13920 | null |
2025-02-19 | How Do LLMs Perform Two-Hop Reasoning in Context? | Tianyu Guo et.al. | 2502.13913 | null |
2025-02-19 | Lost in Sequence: Do Large Language Models Understand Sequential Recommendation? | Sein Kim et.al. | 2502.13909 | link |
2025-02-19 | Judging the Judges: A Collection of LLM-Generated Relevance Judgements | Hossein A. Rahmani et.al. | 2502.13908 | link |
2025-02-19 | DataSciBench: An LLM Agent Benchmark for Data Science | Dan Zhang et.al. | 2502.13897 | link |
2025-02-19 | NavigateDiff: Visual Predictors are Zero-Shot Navigation Assistants | Yiran Qin et.al. | 2502.13894 | null |
2025-02-19 | Refining embeddings with fill-tuning: data-efficient generalised performance improvements for materials foundation models | Matthew P. Wilson et.al. | 2502.13886 | link |
2025-02-19 | SPEX: Scaling Feature Interaction Explanations for LLMs | Justin Singh Kang et.al. | 2502.13870 | link |
2025-02-19 | MagicGeo: Training-Free Text-Guided Geometric Diagram Generation | Junxiao Wang et.al. | 2502.13855 | null |
2025-02-19 | Enhancing LLM-Based Recommendations Through Personalized Reasoning | Jiahao Liu et.al. | 2502.13845 | null |
2025-02-19 | Enhancing Cross-Domain Recommendations with Memory-Optimized LLM-Based User Agents | Jiahao Liu et.al. | 2502.13843 | null |
2025-02-19 | Inner Thinking Transformer: Leveraging Dynamic Depth Scaling to Foster Adaptive Internal Thinking | Yilong Chen et.al. | 2502.13842 | null |
2025-02-19 | Quantifying Memorization and Retriever Performance in Retrieval-Augmented Vision-Language Models | Peter Carragher et.al. | 2502.13836 | null |
2025-02-19 | Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning | Zenan Li et.al. | 2502.13834 | null |
2025-02-19 | ArtMentor: AI-Assisted Evaluation of Artworks to Explore Multimodal Large Language Models Capabilities | Chanjin Zheng et.al. | 2502.13832 | link |
2025-02-19 | LESA: Learnable LLM Layer Scaling-Up | Yifei Yang et.al. | 2502.13794 | link |
2025-02-19 | From Tools to Teammates: Evaluating LLMs in Multi-Session Coding Interactions | Nathanaël Carraz Rakotonirina et.al. | 2502.13791 | link |
2025-02-19 | From Correctness to Comprehension: AI Agents for Personalized Error Diagnosis in Education | Yi-Fan Zhang et.al. | 2502.13789 | null |
2025-02-19 | Helix-mRNA: A Hybrid Foundation Model For Full Sequence mRNA Therapeutics | Matthew Wood et.al. | 2502.13785 | link |
2025-02-19 | Generative Large Recommendation Models: Emerging Trends in LLMs for Recommendation | Hao Wang et.al. | 2502.13783 | null |
2025-02-19 | Translation in the Hands of Many: Centering Lay Users in Machine Translation Interactions | Beatrice Savoldi et.al. | 2502.13780 | null |
2025-02-19 | VITAL: A New Dataset for Benchmarking Pluralistic Alignment in Healthcare | Anudeex Shetty et.al. | 2502.13775 | null |
2025-02-19 | AI Software Engineer: Programming with Trust | Abhik Roychoudhury et.al. | 2502.13767 | null |
2025-02-19 | SCALAR: Scientific Citation-based Live Assessment of Long-context Academic Reasoning | Renxi Wang et.al. | 2502.13753 | null |
2025-02-19 | Reverse Markov Learning: Multi-Step Generative Models for Complex Distributions | Xinwei Shen et.al. | 2502.13747 | null |
2025-02-19 | Enhancing Input-Label Mapping in In-Context Learning with Contrastive Decoding | Keqin Peng et.al. | 2502.13738 | null |
2025-02-19 | CARE: Confidence-Aware Regression Estimation of building density fine-tuning EO Foundation Models | Nikolaos Dionelis et.al. | 2502.13734 | null |
2025-02-19 | Adapting Large Language Models for Time Series Modeling via a Novel Parameter-efficient Adaptation Method | Juyuan Zhang et.al. | 2502.13725 | null |
2025-02-19 | Direct Value Optimization: Improving Chain-of-Thought Reasoning in LLMs with Refined Values | Hongbo Zhang et.al. | 2502.13723 | null |
2025-02-19 | TALKPLAY: Multimodal Music Recommendation with Large Language Models | Seungheon Doh et.al. | 2502.13713 | null |
2025-02-19 | Is This Collection Worth My LLM's Time? Automatically Measuring Information Potential in Text Corpora | Tristan Karch et.al. | 2502.13691 | null |
2025-02-19 | An LLM-based Agent for Reliable Docker Environment Configuration | Ruida Hu et.al. | 2502.13681 | null |
2025-02-19 | SCOPE: A Self-supervised Framework for Improving Faithfulness in Conditional Text Generation | Song Duong et.al. | 2502.13674 | null |
2025-02-19 | Refining Sentence Embedding Model through Ranking Sentences Generation with Large Language Models | Liyang He et.al. | 2502.13656 | link |
2025-02-19 | C2T: A Classifier-Based Tree Construction Method in Speculative Decoding | Feiye Huo et.al. | 2502.13652 | null |
2025-02-19 | Reliability Across Parametric and External Knowledge: Understanding Knowledge Handling in LLMs | Youna Kim et.al. | 2502.13648 | null |
2025-02-19 | D.Va: Validate Your Demonstration First Before You Use It | Qi Zhang et.al. | 2502.13646 | null |
2025-02-19 | Qorgau: Evaluating LLM Safety in Kazakh-Russian Bilingual Contexts | Maiya Goloburda et.al. | 2502.13640 | null |
2025-02-19 | Concept Layers: Enhancing Interpretability and Intervenability via LLM Conceptualization | Or Raphael Bidusa et.al. | 2502.13632 | null |
2025-02-19 | AI-Empowered Catalyst Discovery: A Survey from Classical Machine Learning Approaches to Large Language Models | Yuanyuan Xu et.al. | 2502.13626 | null |
2025-02-19 | REFIND: Retrieval-Augmented Factuality Hallucination Detection in Large Language Models | DongGeon Lee et.al. | 2502.13622 | null |
2025-02-19 | Complex Ontology Matching with Large Language Model Embeddings | Guilherme Sousa et.al. | 2502.13619 | null |
2025-02-19 | LaVCa: LLM-assisted Visual Cortex Captioning | Takuya Matsuyama et.al. | 2502.13606 | null |
2025-02-19 | BeamLoRA: Beam-Constraint Low-Rank Adaptation | Naibin Gu et.al. | 2502.13604 | null |
2025-02-19 | MMTEB: Massive Multilingual Text Embedding Benchmark | Kenneth Enevoldsen et.al. | 2502.13595 | null |
2025-02-19 | Don't Stop the Multi-Party! On Generating Synthetic Multi-Party Conversations with Constraints | Nicolò Penzo et.al. | 2502.13592 | null |
2025-02-19 | Unraveling the Localized Latents: Learning Stratified Manifold Structures in LLM Embedding Space with Sparse Mixture-of-Experts | Xin Li et.al. | 2502.13577 | null |
2025-02-19 | LSR-Adapt: Ultra-Efficient Parameter Tuning with Matrix Low Separation Rank Kernel Adaptation | Xin Li et.al. | 2502.13568 | null |
2025-02-19 | Extracting Social Connections from Finnish Karelian Refugee Interviews Using LLMs | Joonatan Laato et.al. | 2502.13566 | null |
2025-02-19 | PRIV-QA: Privacy-Preserving Question Answering for Cloud Large Language Models | Guangwei Li et.al. | 2502.13564 | link |
2025-02-19 | Are Large Language Models In-Context Graph Learners? | Jintang Li et.al. | 2502.13562 | null |
2025-02-19 | Democratizing Large Language Model-Based Graph Data Augmentation via Latent Knowledge Graphs | Yushi Feng et.al. | 2502.13555 | link |
2025-02-19 | STaR-SQL: Self-Taught Reasoner for Text-to-SQL | Mingqian He et.al. | 2502.13550 | null |
2025-02-19 | Detecting Linguistic Bias in Government Documents Using Large Language Models | Milena de Swart et.al. | 2502.13548 | null |
2025-02-19 | From Sub-Ability Diagnosis to Human-Aligned Generation: Bridging the Gap for Text Length Control via MARKERGEN | Peiwen Yuan et.al. | 2502.13544 | null |
2025-02-19 | Activation-aware Probe-Query: Effective Key-Value Retrieval for Long-Context LLMs Inference | Qingfa Xiao et.al. | 2502.13542 | null |
2025-02-19 | Bursting Filter Bubble: Enhancing Serendipity Recommendations with Aligned Large Language Models | Yunjia Xi et.al. | 2502.13539 | null |
2025-02-19 | Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models | Jun Zhang et.al. | 2502.13533 | link |
2025-02-19 | Exploiting Prefix-Tree in Structured Output Interfaces for Enhancing Jailbreak Attacking | Yanzeng Li et.al. | 2502.13527 | link |
2025-02-19 | SPPD: Self-training with Process Preference Learning Using Dynamic Value Margin | Hao Yi et.al. | 2502.13516 | null |
2025-02-19 | Unlocking Multimodal Integration in EHRs: A Prompt Learning Framework for Language and Time Series Fusion | Shuai Niu et.al. | 2502.13509 | null |
2025-02-19 | Reproducing NevIR: Negation in Neural Information Retrieval | Coen van Elsen et.al. | 2502.13506 | link |
2025-02-19 | PLDR-LLMs Learn A Generalizable Tensor Operator That Can Replace Its Own Deep Neural Net At Inference | Burc Gokden et.al. | 2502.13502 | link |
2025-02-19 | Towards Geo-Culturally Grounded LLM Generations | Piyawat Lertvittayakumjorn et.al. | 2502.13497 | null |
2025-02-19 | What are Models Thinking about? Understanding Large Language Model Hallucinations "Psychology" through Model Inner State Analysis | Peiran Wang et.al. | 2502.13490 | null |
2025-02-19 | LLM4Tag: Automatic Tagging System for Information Retrieval via Large Language Models | Ruiming Tang et.al. | 2502.13481 | null |
2025-02-19 | Integration of Agentic AI with 6G Networks for Mission-Critical Applications: Use-case and Challenges | Sunder Ali Khowaja et.al. | 2502.13476 | null |
2025-02-19 | LLM should think and action as a human | Haun Leung et.al. | 2502.13475 | null |
2025-02-19 | Towards Lightweight, Adaptive and Attribute-Aware Multi-Aspect Controllable Text Generation with Large Language Models | Chenyu Zhu et.al. | 2502.13474 | null |
2025-02-19 | ThinkGuard: Deliberative Slow Thinking Leads to Cautious Guardrails | Xiaofei Wen et.al. | 2502.13458 | link |
2025-02-19 | Interleaved Gibbs Diffusion for Constrained Generation | Gautham Govind Anil et.al. | 2502.13450 | null |
2025-02-19 | Enhancing Chest X-ray Classification through Knowledge Injection in Cross-Modality Learning | Yang Yan et.al. | 2502.13447 | null |
2025-02-19 | TreeCut: A Synthetic Unanswerable Math Word Problem Dataset for LLM Hallucination Evaluation | Jialin Ouyang et.al. | 2502.13442 | null |
2025-02-19 | The Self-Improvement Paradox: Can Language Models Bootstrap Reasoning Capabilities without External Scaffolding? | Yutao Sun et.al. | 2502.13441 | null |
2025-02-19 | MATS: An Audio Language Model under Text-only Supervision | Wen Wang et.al. | 2502.13433 | null |
2025-02-19 | Vision-Based Generic Potential Function for Policy Alignment in Multi-Agent Reinforcement Learning | Hao Ma et.al. | 2502.13430 | null |
2025-02-19 | MCTS-KBQA: Monte Carlo Tree Search for Knowledge Base Question Answering | Guanming Xiong et.al. | 2502.13428 | null |
2025-02-19 | TabSD: Large Free-Form Table Question Answering with SQL-Based Table Decomposition | Yuxiang Wang et.al. | 2502.13422 | null |
2025-02-19 | RLTHF: Targeted Human Feedback for LLM Alignment | Yifei Xu et.al. | 2502.13417 | null |
2025-02-19 | Detecting LLM Fact-conflicting Hallucinations Enhanced by Temporal-logic-based Reasoning | Ningke Li et.al. | 2502.13416 | null |
2025-02-19 | Explore-Construct-Filter: An Automated Framework for Rich and Reliable API Knowledge Graph Construction | Yanbang Sun et.al. | 2502.13412 | null |
2025-02-19 | Generative Predictive Control: Flow Matching Policies for Dynamic and Difficult-to-Demonstrate Tasks | Vince Kurtz et.al. | 2502.13406 | null |
2025-02-19 | | Vishal Dey et.al. | 2502.13398 | null |
2025-02-19 | Prompting a Weighting Mechanism into LLM-as-a-Judge in Two-Step: A Case Study | Wenwen Xie et.al. | 2502.13396 | null |
2025-02-19 | Flow-based generative models as iterative algorithms in probability space | Yao Xie et.al. | 2502.13394 | null |
2025-02-19 | Reasoning with Reinforced Functional Token Tuning | Kongcheng Zhang et.al. | 2502.13389 | link |
2025-02-19 | Reflection of Episodes: Learning to Play Game from Expert and Self Experiences | Xiaojie Xu et.al. | 2502.13388 | null |
2025-02-19 | MM-Verify: Enhancing Multimodal Reasoning with Chain-of-Thought Verification | Linzhuang Sun et.al. | 2502.13383 | link |
2025-02-19 | AutoTEE: Automated Migration and Protection of Programs in Trusted Execution Environments | Ruidong Han et.al. | 2502.13379 | null |
2025-02-19 | Task-agnostic Prompt Compression with Context-aware Sentence Embedding and Reward-guided Task Descriptor | Barys Liskavets et.al. | 2502.13374 | null |
2025-02-18 | Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization | Shuo Xing et.al. | 2502.13146 | link |
2025-02-18 | Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation | Bencheng Liao et.al. | 2502.13145 | link |
2025-02-18 | Pre-training Auto-regressive Robotic Models with 4D Representations | Dantong Niu et.al. | 2502.13142 | null |
2025-02-18 | UniGuardian: A Unified Defense for Detecting Prompt Injection, Backdoor Attacks and Adversarial Attacks in Large Language Models | Huawei Lin et.al. | 2502.13141 | link |
2025-02-18 | AIDE: AI-Driven Exploration in the Space of Code | Zhengyao Jiang et.al. | 2502.13138 | link |
2025-02-18 | Theorem Prover as a Judge for Synthetic Data Generation | Joshua Ong Jun Leang et.al. | 2502.13137 | null |
2025-02-18 | AV-Flow: Transforming Text to Audio-Visual Human-like Interactions | Aggelina Chatziagapi et.al. | 2502.13133 | null |
2025-02-18 | Learning to Defer for Causal Discovery with Imperfect Experts | Oscar Clivio et.al. | 2502.13132 | null |
2025-02-18 | Rethinking Diverse Human Preference Learning through Principal Component Analysis | Feng Luo et.al. | 2502.13131 | null |
2025-02-18 | Magma: A Foundation Model for Multimodal AI Agents | Jianwei Yang et.al. | 2502.13130 | link |
2025-02-18 | Is Noise Conditioning Necessary for Denoising Generative Models? | Qiao Sun et.al. | 2502.13129 | null |
2025-02-18 | Facilitating Long Context Understanding via Supervised Chain-of-Thought Reasoning | Jingyang Lin et.al. | 2502.13127 | null |
2025-02-18 | RuozhiBench: Evaluating LLMs with Logical Fallacies and Misleading Premises | Zenan Zhai et.al. | 2502.13125 | null |
2025-02-18 | Adapting Psycholinguistic Research for LLMs: Gender-inclusive Language in a Coreference Context | Marion Bartl et.al. | 2502.13120 | null |
2025-02-18 | STEER-ME: Assessing the Microeconomic Reasoning of Large Language Models | Narun Raman et.al. | 2502.13119 | null |
2025-02-18 | Performance Evaluation of Large Language Models in Statistical Programming | Xinyi Song et.al. | 2502.13117 | link |
2025-02-18 | MatterChat: A Multi-Modal LLM for Material Science | Yingheng Tang et.al. | 2502.13107 | null |
2025-02-18 | Text2World: Benchmarking Large Language Models for Symbolic World Model Generation | Mengkang Hu et.al. | 2502.13092 | null |
2025-02-18 | A Neural Difference-of-Entropies Estimator for Mutual Information | Haoran Ni et.al. | 2502.13085 | null |
2025-02-18 | Personalized Image Generation with Deep Generative Models: A Decade Survey | Yuxiang Wei et.al. | 2502.13081 | link |
2025-02-18 | SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models | Xianfu Cheng et.al. | 2502.13059 | null |
2025-02-18 | LAMD: Context-driven Android Malware Detection and Classification with LLMs | Xingzhi Qian et.al. | 2502.13055 | null |
2025-02-18 | Do we still need Human Annotators? Prompting Large Language Models for Aspect Sentiment Quad Prediction | Nils Constantin Hellwig et.al. | 2502.13044 | null |
2025-02-18 | HPSS: Heuristic Prompting Strategy Search for LLM Evaluators | Bosi Wen et.al. | 2502.13031 | null |
2025-02-18 | A deep learning framework for efficient pathology image analysis | Peter Neidlinger et.al. | 2502.13027 | null |
2025-02-18 | Agentic Deep Graph Reasoning Yields Self-Organizing Knowledge Networks | Markus J. Buehler et.al. | 2502.13025 | null |
2025-02-18 | Oreo: A Plug-in Context Reconstructor to Enhance Retrieval-Augmented Generation | Sha Li et.al. | 2502.13019 | null |
2025-02-18 | Towards a Design Guideline for RPA Evaluation: A Survey of Large Language Model-Based Role-Playing Agents | Chaoran Chen et.al. | 2502.13012 | null |
2025-02-18 | Adaptive Knowledge Graphs Enhance Medical Question Answering: Bridging the Gap Between LLMs and Evolving Medical Knowledge | Mohammad Reza Rezaei et.al. | 2502.13010 | null |
2025-02-18 | You need to MIMIC to get FAME: Solving Meeting Transcript Scarcity with a Multi-Agent Conversations | Frederic Kirstein et.al. | 2502.13001 | null |
2025-02-18 | Personalized Top-k Set Queries Over Predicted Scores | Sohrab Namazi Nia et.al. | 2502.12998 | null |
2025-02-18 | Beyond Profile: From Surface-Level Facts to Deep Persona Simulation in LLMs | Zixiao Wang et.al. | 2502.12988 | null |
2025-02-18 | Towards Variational Flow Matching on General Geometries | Olga Zaghen et.al. | 2502.12981 | null |
2025-02-18 | Learning More Effective Representations for Dense Retrieval through Deliberate Thinking Before Search | Yifan Ji et.al. | 2502.12974 | link |
2025-02-18 | Reasoning-to-Defend: Safety-Aware Reasoning Can Defend Large Language Models from Jailbreaking | Junda Zhu et.al. | 2502.12970 | link |
2025-02-18 | Trust Me, I'm Wrong: High-Certainty Hallucinations in LLMs | Adi Simhi et.al. | 2502.12964 | null |
2025-02-18 | Infinite Retrieval: Attention Enhanced LLMs in Long-Context Processing | Xiaoju Ye et.al. | 2502.12962 | null |
2025-02-18 | Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger | Wenjun Li et.al. | 2502.12961 | null |
2025-02-18 | Guaranteed Conditional Diffusion: 3D Block-based Models for Scientific Data Compression | Jaemoon Lee et.al. | 2502.12951 | null |
2025-02-18 | Fake It Till You Make It: Using Synthetic Data and Domain Knowledge for Improved Text-Based Learning for LGE Detection | Athira J Jacob et.al. | 2502.12948 | null |
2025-02-18 | Every Expert Matters: Towards Effective Knowledge Distillation for Mixture-of-Experts Language Models | Gyeongman Kim et.al. | 2502.12947 | null |
2025-02-18 | LLMPopcorn: An Empirical Study of LLMs as Assistants for Popular Micro-video Generation | Junchen Fu et.al. | 2502.12945 | null |
2025-02-18 | Performance of Zero-Shot Time Series Foundation Models on Cloud Data | William Toner et.al. | 2502.12944 | null |
2025-02-18 | Flow-of-Options: Diversified and Improved LLM Reasoning by Thinking Through Options | Lakshmi Nair et.al. | 2502.12929 | link |
2025-02-18 | Finedeep: Mitigating Sparse Activation in Dense LLMs via Multi-Layer Fine-Grained Experts | Leiyu Pan et.al. | 2502.12928 | null |
2025-02-18 | SEFL: Harnessing Large Language Model Agents to Improve Educational Feedback Systems | Mike Zhang et.al. | 2502.12927 | null |
2025-02-18 | Towards more Contextual Agents: An extractor-Generator Optimization Framework | Mourad Aouini et.al. | 2502.12926 | null |
2025-02-18 | Keep what you need : extracting efficient subnetworks from large audio representation models | David Genova et.al. | 2502.12925 | link |
2025-02-18 | Conditioning LLMs to Generate Code-Switched Text: A Methodology Grounded in Naturally Occurring Data | Maite Heredia et.al. | 2502.12924 | link |
2025-02-18 | On-Device LLMs for Home Assistant: Dual Role in Intent Detection and Response Generation | Rune Birkmose et.al. | 2502.12923 | link |
2025-02-18 | Q-STRUM Debate: Query-Driven Contrastive Summarization for Recommendation Comparison | George-Kirollos Saad et.al. | 2502.12921 | null |
2025-02-18 | Lightweight Online Adaption for Time Series Foundation Model Forecasts | Thomas L. Lee et.al. | 2502.12920 | null |
2025-02-18 | GSQ-Tuning: Group-Shared Exponents Integer in Fully Quantized Training for LLMs On-Device Fine-tuning | Sifan Zhou et.al. | 2502.12913 | null |
2025-02-18 | Probabilistic neural operators for functional uncertainty quantification | Christopher Bülte et.al. | 2502.12902 | link |
2025-02-18 | Soundwave: Less is More for Speech-Text Alignment in LLMs | Yuhao Zhang et.al. | 2502.12900 | link |
2025-02-18 | Multilingual European Language Models: Benchmarking Approaches and Challenges | Fabio Barth et.al. | 2502.12895 | null |
2025-02-18 | CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image | Kaixin Yao et.al. | 2502.12894 | null |
2025-02-18 | Are Multilingual Language Models an Off-ramp for Under-resourced Languages? Will we arrive at Digital Language Equality in Europe in 2030? | Georg Rehm et.al. | 2502.12886 | null |
2025-02-18 | How desirable is alignment between LLMs and linguistically diverse human users? | Pia Knoeferle et.al. | 2502.12884 | null |
2025-02-18 | Continuous Learning Conversational AI: A Personalized Agent Framework via A2C Reinforcement Learning | Nandakishor M et.al. | 2502.12876 | null |
2025-02-18 | RobotIQ: Empowering Mobile Robots with Human-Level Planning for Real-World Execution | Emmanuel K. Raptis et.al. | 2502.12862 | link |
2025-02-18 | PAFT: Prompt-Agnostic Fine-Tuning | Chenxing Wei et.al. | 2502.12859 | null |
2025-02-18 | Rejected Dialects: Biases Against African American Language in Reward Models | Joel Mire et.al. | 2502.12858 | null |
2025-02-18 | MeMo: Towards Language Models with Associative Memory Mechanisms | Fabio Massimo Zanzotto et.al. | 2502.12851 | null |
2025-02-18 | MOLLM: Multi-Objective Large Language Model for Molecular Design -- Optimizing with Experts | Nian Ran et.al. | 2502.12845 | null |
2025-02-18 | Towards Adaptive Feedback with AI: Comparing the Feedback Quality of LLMs and Teachers on Experimentation Protocols | Kathrin Seßler et.al. | 2502.12842 | null |
2025-02-18 | Towards Equitable AI: Detecting Bias in Using Large Language Models for Marketing | Berk Yilmaz et.al. | 2502.12838 | null |
2025-02-18 | An LLM-Powered Agent for Physiological Data Analysis: A Case Study on PPG-based Heart Rate Estimation | Mohammad Feli et.al. | 2502.12836 | null |
2025-02-18 | KazMMLU: Evaluating Language Models on Kazakh, Russian, and Regional Knowledge of Kazakhstan | Mukhammed Togmanov et.al. | 2502.12829 | null |
2025-02-18 | Reasoning and the Trusting Behavior of DeepSeek and GPT: An Experiment Revealing Hidden Fault Lines in Large Language Models | Rubing Lu et.al. | 2502.12825 | null |
2025-02-18 | Pitfalls of Scale: Investigating the Inverse Task of Redefinition in Large Language Models | Elena Stringli et.al. | 2502.12821 | null |
2025-02-18 | Simulating User Diversity in Task-Oriented Dialogue Systems using Large Language Models | Adnan Ahmad et.al. | 2502.12813 | null |
2025-02-18 | Towards Text-Image Interleaved Retrieval | Xin Zhang et.al. | 2502.12799 | link |
2025-02-18 | RAPID: Retrieval Augmented Training of Differentially Private Diffusion Models | Tanqiu Jiang et.al. | 2502.12794 | link |
2025-02-18 | Commonsense Reasoning in Arab Culture | Abdelrahman Sadallah et.al. | 2502.12788 | null |
2025-02-18 | Portable Reward Tuning: Towards Reusable Fine-Tuning across Different Pretrained Models | Daiki Chijiwa et.al. | 2502.12776 | null |
2025-02-18 | How Much Do LLMs Hallucinate across Languages? On Multilingual Estimation of LLM Hallucination in the Wild | Saad Obaid ul Islam et.al. | 2502.12769 | link |
2025-02-18 | R2-KG: General-Purpose Dual-Agent Framework for Reliable Reasoning on Knowledge Graphs | Sumin Jo et.al. | 2502.12767 | null |
2025-02-18 | One-bit Compressed Sensing using Generative Models | Swatantra Kafle et.al. | 2502.12762 | null |
2025-02-18 | Efficient Machine Translation Corpus Generation: Integrating Human-in-the-Loop Post-Editing with Large Language Models | Kamer Ali Yuksel et.al. | 2502.12755 | link |
2025-02-18 | Architect of the Bits World: Masked Autoregressive Modeling for Circuit Generation Guided by Truth Table | Haoyuan Wu et.al. | 2502.12751 | null |
2025-02-18 | Self-Enhanced Reasoning Training: Activating Latent Reasoning in Small Models for Enhanced Reasoning Distillation | Yong Zhang et.al. | 2502.12744 | null |
2025-02-18 | "I know myself better, but not really greatly": Using LLMs to Detect and Explain LLM-Generated Texts | Jiazhou Ji et.al. | 2502.12743 | null |
2025-02-18 | Circuit Representation Learning with Masked Gate Modeling and Verilog-AIG Alignment | Haoyuan Wu et.al. | 2502.12732 | null |
2025-02-18 | TREND: A Whitespace Replacement Information Hiding Method | Malte Hellmeier et.al. | 2502.12710 | null |
2025-02-18 | Multi-Novelty: Improve the Diversity and Novelty of Contents Generated by Large Language Models via inference-time Multi-Views Brainstorming | Arash Lagzian et.al. | 2502.12700 | null |
2025-02-18 | Multi-Step Alignment as Markov Games: An Optimistic Online Gradient Descent Approach with Convergence Guarantees | Yongtao Wu et.al. | 2502.12678 | null |
2025-02-18 | Baichuan-M1: Pushing the Medical Capability of Large Language Models | Bingning Wang et.al. | 2502.12671 | null |
2025-02-18 | Perovskite-LLM: Knowledge-Enhanced Large Language Models for Perovskite Solar Cell Research | Xiang Liu et.al. | 2502.12669 | null |
2025-02-18 | Evaluation of Best-of-N Sampling Strategies for Language Model Alignment | Yuki Ichihara et.al. | 2502.12668 | null |
2025-02-18 | A | Junhui He et.al. | 2502.12665 | null |
2025-02-18 | Demystifying Multilingual Chain-of-Thought in Process Reward Modeling | Weixuan Wang et.al. | 2502.12663 | null |
2025-02-18 | The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1 | Kaiwen Zhou et.al. | 2502.12659 | null |
2025-02-18 | R.R.: Unveiling LLM Training Privacy through Recollection and Ranking | Wenlong Meng et.al. | 2502.12658 | link |
2025-02-18 | NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation | Zhiyuan Liu et.al. | 2502.12638 | link |
2025-02-18 | Corrupted but Not Broken: Rethinking the Impact of Corrupted Data in Visual Instruction Tuning | Yunhao Gou et.al. | 2502.12635 | null |
2025-02-18 | One Size doesn't Fit All: A Personalized Conversational Tutoring Agent for Mathematics Instruction | Ben Liu et.al. | 2502.12633 | null |
2025-02-18 | Automating Prompt Leakage Attacks on Large Language Models Using Agentic Approach | Tvrtko Sternak et.al. | 2502.12630 | link |
2025-02-18 | DeepResonance: Enhancing Multimodal Music Understanding via Music-centric Multi-way Instruction Tuning | Zhuoyuan Mao et.al. | 2502.12623 | null |
2025-02-18 | Improving Chain-of-Thought Reasoning via Quasi-Symbolic Abstractions | Leonardo Ranaldi et.al. | 2502.12616 | null |
2025-02-17 | Idiosyncrasies in Large Language Models | Mingjie Sun et.al. | 2502.12150 | link |
2025-02-17 | HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation | Ling Yang et.al. | 2502.12148 | link |
2025-02-17 | Fast or Better? Balancing Accuracy and Cost in Retrieval-Augmented Generation with Flexible User Control | Jinyan Su et.al. | 2502.12145 | link |
2025-02-17 | Small Models Struggle to Learn from Strong Reasoners | Yuetai Li et.al. | 2502.12143 | null |
2025-02-17 | SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs | Yige Xu et.al. | 2502.12134 | null |
2025-02-17 | Transformer Dynamics: A neuroscientific approach to interpretability of large language models | Jesseba Fernando et.al. | 2502.12131 | null |
2025-02-17 | Scaling Autonomous Agents via Automatic Reward Modeling And Planning | Zhenfang Chen et.al. | 2502.12130 | null |
2025-02-17 | LaM-SLidE: Latent Space Modeling of Spatial Dynamical Systems via Linked Entities | Florian Sestak et.al. | 2502.12128 | link |
2025-02-17 | Minimal Ranks, Maximum Confidence: Parameter-efficient Uncertainty Quantification for LoRA | Patryk Marszałek et.al. | 2502.12122 | link |
2025-02-17 | LLMs on the Line: Data Determines Loss-to-Loss Scaling Laws | Prasanna Mayilvahanan et.al. | 2502.12120 | null |
2025-02-17 | PRISM: Self-Pruning Intrinsic Selection Method for Training-Free Multimodal Data Selection | Jinhe Bi et.al. | 2502.12119 | null |
2025-02-17 | A-MEM: Agentic Memory for LLM Agents | Wujiang Xu et.al. | 2502.12110 | link |
2025-02-17 | Personality Structured Interview for Large Language Model Simulation in Personality Research | Pengda Wang et.al. | 2502.12109 | null |
2025-02-17 | Relational Norms for Human-AI Cooperation | Brian D. Earp et.al. | 2502.12102 | null |
2025-02-17 | Token Communications: A Unified Framework for Cross-modal Context-aware Semantic Communications | Li Qiao et.al. | 2502.12096 | null |
2025-02-17 | How compositional generalization and creativity improve as diffusion models are trained | Alessandro Favero et.al. | 2502.12089 | null |
2025-02-17 | Meta-Statistical Learning: Supervised Learning of Statistical Inference | Maxime Peyrard et.al. | 2502.12088 | null |
2025-02-17 | APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs | Yuxiang Huang et.al. | 2502.12085 | link |
2025-02-17 | Can LLMs Simulate Social Media Engagement? A Study on Action-Guided Response Generation | Zhongyi Qiu et.al. | 2502.12073 | null |
2025-02-17 | TokenSkip: Controllable Chain-of-Thought Compression in LLMs | Heming Xia et.al. | 2502.12067 | link |
2025-02-17 | CONSTRUCTA: Automating Commercial Construction Schedules in Fabrication Facilities with Large Language Models | Yifan Zhang et.al. | 2502.12066 | null |
2025-02-17 | AI-generated Text Detection with a GLTR-based Approach | Lucía Yan Wu et.al. | 2502.12064 | null |
2025-02-17 | Designing Role Vectors to Improve LLM Inference Behaviour | Daniele Potertì et.al. | 2502.12055 | null |
2025-02-17 | PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning | Xinyu Zhang et.al. | 2502.12054 | null |
2025-02-17 | A Survey on Bridging EEG Signals and Generative AI: From Image and Text to Beyond | Shreya Shukla et.al. | 2502.12048 | null |
2025-02-17 | KnowPath: Knowledge-enhanced Reasoning via LLM-generated Inference Paths over Knowledge Graphs | Qi Zhao et.al. | 2502.12029 | null |
2025-02-17 | SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities | Fengqing Jiang et.al. | 2502.12025 | null |
2025-02-17 | Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving | Xin Xu et.al. | 2502.12022 | null |
2025-02-17 | Atom of Thoughts for Markov LLM Test-Time Scaling | Fengwei Teng et.al. | 2502.12018 | null |
2025-02-17 | Unsupervised Structural-Counterfactual Generation under Domain Shift | Krishn Vishwas Kher et.al. | 2502.12013 | null |
2025-02-17 | Design Considerations Based on Stability for a Class of TCP Algorithms | Sreekanth Prabhakar et.al. | 2502.11983 | null |
2025-02-17 | Image Inversion: A Survey from GANs to Diffusion and Beyond | Yinan Chen et.al. | 2502.11974 | link |
2025-02-17 | Generating Text from Uniform Meaning Representation | Emma Markle et.al. | 2502.11973 | link |
2025-02-17 | A MIMO Wireless Channel Foundation Model via CIR-CSI Consistency | Jun Jiang et.al. | 2502.11965 | null |
2025-02-17 | Navigating the Helpfulness-Truthfulness Trade-Off with Uncertainty-Aware Instruction Fine-Tuning | Tianyi Wu et.al. | 2502.11962 | null |
2025-02-17 | On Representational Dissociation of Language and Arithmetic in Large Language Models | Riku Kisako et.al. | 2502.11932 | null |
2025-02-17 | GRAPHGPT-O: Synergistic Multimodal Comprehension and Generation on Graphs | Yi Fang et.al. | 2502.11925 | null |
2025-02-17 | From Text to Trust: Empowering AI-assisted Decision Making with Adaptive LLM-powered Analysis | Zhuoyan Li et.al. | 2502.11919 | null |
2025-02-17 | EssayJudge: A Multi-Granular Benchmark for Assessing Automated Essay Scoring Capabilities of Multimodal Large Language Models | Jiamin Su et.al. | 2502.11916 | null |
2025-02-17 | Adversarial Alignment for LLMs Requires Simpler, Reproducible, and More Measurable Objectives | Leo Schwinn et.al. | 2502.11910 | null |
2025-02-17 | MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation | Haochen Xue et.al. | 2502.11903 | null |
2025-02-17 | DLFR-VAE: Dynamic Latent Frame Rate VAE for Video Generation | Zhihang Yuan et.al. | 2502.11897 | link |
2025-02-17 | CAMEL: Continuous Action Masking Enabled by Large Language Models for Reinforcement Learning | Yanxiao Zhao et.al. | 2502.11896 | null |
2025-02-17 | Continual Quantization-Aware Pre-Training: When to transition from 16-bit to 1.58-bit pre-training for BitNet language models? | Jacob Nielsen et.al. | 2502.11895 | null |
2025-02-17 | Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration | Shao Zhang et.al. | 2502.11882 | link |
2025-02-17 | Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models | Hyunwoo Kim et.al. | 2502.11881 | null |
2025-02-17 | Bitnet.cpp: Efficient Edge Inference for Ternary LLMs | Jinheng Wang et.al. | 2502.11880 | link |
2025-02-17 | JoLT: Joint Probabilistic Predictions on Tabular Data Using LLMs | Aliaksandra Shysheya et.al. | 2502.11877 | link |
2025-02-17 | FedEAT: A Robustness Optimization Framework for Federated LLMs | Yahao Pang et.al. | 2502.11863 | null |
2025-02-17 | Understanding In-Context Machine Translation for Low-Resource Languages: A Case Study on Manchu | Renhao Pei et.al. | 2502.11862 | null |
2025-02-17 | Exploring Large Language Models in Healthcare: Insights into Corpora Sources, Customization Strategies, and Evaluation Metrics | Shuqi Yang et.al. | 2502.11861 | null |
2025-02-17 | StructTransform: A Scalable Attack Surface for Safety-Aligned Large Language Models | Shehel Yoosuf et.al. | 2502.11853 | link |
2025-02-17 | BaxBench: Can LLMs Generate Correct and Secure Backends? | Mark Vero et.al. | 2502.11844 | null |
2025-02-17 | Can LLM Agents Maintain a Persona in Discourse? | Pranav Bhandari et.al. | 2502.11843 | null |
2025-02-17 | Model Generalization on Text Attribute Graphs: Principles with Large Language Models | Haoyu Wang et.al. | 2502.11836 | link |
2025-02-17 | HAAN: A Holistic Approach for Accelerating Normalization Operations in Large Language Models | Tianfan Peng et.al. | 2502.11832 | null |
2025-02-17 | Intuitive physics understanding emerges from self-supervised pretraining on natural videos | Quentin Garrido et.al. | 2502.11831 | link |
2025-02-17 | Text Classification in the LLM Era - Where do we stand? | Sowmya Vajjala et.al. | 2502.11830 | null |
2025-02-17 | Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities | Hanbin Wang et.al. | 2502.11829 | link |
2025-02-17 | M-ABSA: A Multilingual Dataset for Aspect-Based Sentiment Analysis | Chengyan Wu et.al. | 2502.11824 | link |
2025-02-17 | Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis | Xu Wang et.al. | 2502.11812 | null |
2025-02-17 | FineFilter: A Fine-grained Noise Filtering Mechanism for Retrieval-Augmented Large Language Models | Qianchi Zhang et.al. | 2502.11811 | null |
2025-02-17 | Exploring Translation Mechanism of Large Language Models | Hongbin Zhang et.al. | 2502.11806 | null |
2025-02-17 | Table-Critic: A Multi-Agent Framework for Collaborative Criticism and Refinement in Table Reasoning | Peiying Yu et.al. | 2502.11799 | null |
2025-02-17 | Personality Editing for Language Models through Relevant Knowledge Editing | Seojin Hwang et.al. | 2502.11789 | null |
2025-02-17 | Efficient Response Generation Method Selection for Fine-Tuning Large Language Models | Xuan Ren et.al. | 2502.11779 | null |
2025-02-17 | video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model | Guangzhi Sun et.al. | 2502.11775 | null |
2025-02-17 | The Validation Gap: A Mechanistic Analysis of How Language Models Compute Arithmetic but Fail to Validate It | Leonardo Bertolazzi et.al. | 2502.11771 | link |
2025-02-17 | Cognitive-Aligned Document Selection for Retrieval-augmented Generation | Bingyu Wan et.al. | 2502.11770 | null |
2025-02-17 | From Selection to Generation: A Survey of LLM-based Active Learning | Yu Xia et.al. | 2502.11767 | null |
2025-02-17 | Warmup-Distill: Bridge the Distribution Mismatch between Teacher and Student before Knowledge Distillation | Zengkui Sun et.al. | 2502.11766 | link |
2025-02-17 | HintsOfTruth: A Multimodal Checkworthiness Detection Dataset with Real and Synthetic Claims | Michiel van der Meer et.al. | 2502.11753 | null |
2025-02-17 | Language Models Can See Better: Visual Contrastive Decoding For LLM Multimodal Reasoning | Yuqi Pang et.al. | 2502.11751 | link |
2025-02-17 | ILIAS: Instance-Level Image retrieval At Scale | Giorgos Kordopatis-Zilos et.al. | 2502.11748 | null |
2025-02-17 | SQL-o1: A Self-Reward Heuristic Dynamic Search Method for Text-to-SQL | Shuai Lyu et.al. | 2502.11741 | link |
2025-02-17 | ReviewEval: An Evaluation Framework for AI-Generated Reviews | Chavvi Kirtani et.al. | 2502.11736 | null |
2025-02-17 | Plant in Cupboard, Orange on Table, Book on Shelf. Benchmarking Practical Reasoning and Situation Modelling in a Text-Simulated Situated Environment | Jonathan Jordan et.al. | 2502.11733 | null |
2025-02-17 | Energy-Conscious LLM Decoding: Impact of Text Generation Strategies on GPU Energy Consumption | Alireza Nik et.al. | 2502.11723 | null |
2025-02-17 | Enhancing Recommendation Explanations through User-Centric Refinement | Jingsen Zhang et.al. | 2502.11721 | null |
2025-02-17 | Can you pass that tool?: Implications of Indirect Speech in Physical Human-Robot Collaboration | Yan Zhang et.al. | 2502.11720 | null |
2025-02-17 | Component-aware Unsupervised Logical Anomaly Generation for Industrial Anomaly Detection | Xuan Tong et.al. | 2502.11712 | null |
2025-02-17 | Ad-hoc Concept Forming in the Game Codenames as a Means for Evaluating Large Language Models | Sherzod Hakimov et.al. | 2502.11707 | null |
2025-02-17 | LLM Agents Making Agent Tools | Georg Wölflein et.al. | 2502.11705 | null |
2025-02-17 | CMQCIC-Bench: A Chinese Benchmark for Evaluating Large Language Models in Medical Quality Control Indicator Calculation | Guangya Yu et.al. | 2502.11703 | null |
2025-02-17 | MVTokenFlow: High-quality 4D Content Generation using Multiview Token Flow | Hanzhuo Huang et.al. | 2502.11697 | null |
2025-02-17 | Improve LLM-as-a-Judge Ability as a General Ability | Jiachen Yu et.al. | 2502.11689 | null |
2025-02-17 | MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task | Yuchen Yan et.al. | 2502.11684 | null |
2025-02-17 | RIDE: Enhancing Large Language Model Alignment through Restyled In-Context Learning Demonstration Exemplars | Yuncheng Hua et.al. | 2502.11681 | link |
2025-02-17 | Exploring LLM-based Student Simulation for Metacognitive Cultivation | Haoxuan Li et.al. | 2502.11678 | null |
2025-02-17 | Towards Fully Exploiting LLM Internal States to Enhance Knowledge Boundary Perception | Shiyu Ni et.al. | 2502.11677 | null |
2025-02-17 | Diversity-Oriented Data Augmentation with Large Language Models | Zaitian Wang et.al. | 2502.11671 | null |
2025-02-17 | VRoPE: Rotary Position Embedding for Video Large Language Models | Zikang Liu et.al. | 2502.11664 | link |
2025-02-17 | An Innovative Brain-Computer Interface Interaction System Based on the Large Language Model | Jing Jina et.al. | 2502.11659 | null |
2025-02-17 | Competing LLM Agents in a Non-Cooperative Game of Opinion Polarisation | Amin Qasmi et.al. | 2502.11649 | null |
2025-02-17 | DELMAN: Dynamic Defense Against Large Language Model Jailbreaking with Model Editing | Yi Wang et.al. | 2502.11647 | null |
2025-02-17 | Hyperspherical Energy Transformer with Recurrent Depth | Yunzhe Hu et.al. | 2502.11646 | null |
2025-02-17 | Is Human-Like Text Liked by Humans? Multilingual Human Detection and Preference Against AI | Yuxia Wang et.al. | 2502.11614 | null |
2025-02-17 | Maximum Entropy Reinforcement Learning with Diffusion Policy | Xiaoyi Dong et.al. | 2502.11612 | link |
2025-02-17 | Accuracy Assessment of OpenAlex and Clarivate Scholar ID with an LLM-Assisted Benchmark | Renyu Zhao et.al. | 2502.11610 | null |
2025-02-17 | GraphThought: Graph Combinatorial Optimization with Thought Generation | Zixiao Huang et.al. | 2502.11607 | null |
2025-02-14 | MM-RLHF: The Next Step Forward in Multimodal LLM Alignment | Yi-Fan Zhang et.al. | 2502.10391 | null |
2025-02-14 | Aspect-Oriented Summarization for Psychiatric Short-Term Readmission Prediction | WonJin Yoon et.al. | 2502.10388 | null |
2025-02-14 | Robustness tests for biomedical foundation models should tailor to specification | R. Patrick Xian et.al. | 2502.10374 | link |
2025-02-14 | AffinityFlow: Guided Flows for Antibody Affinity Maturation | Can Chen et.al. | 2502.10365 | null |
2025-02-14 | Enhancing Multilingual LLM Pretraining with Model-Based Data Selection | Bettina Messmer et.al. | 2502.10361 | null |
2025-02-14 | Dimension-free Score Matching and Time Bootstrapping for Diffusion Models | Syamantak Kumar et.al. | 2502.10354 | null |
2025-02-14 | Organize the Web: Constructing Domains Enhances Pre-Training Data Curation | Alexander Wettig et.al. | 2502.10341 | null |
2025-02-14 | Evaluating the Meta- and Object-Level Reasoning of Large Language Models for Question Answering | Nick Ferguson et.al. | 2502.10338 | null |
2025-02-14 | Generalised Parallel Tempering: Flexible Replica Exchange via Flows and Diffusions | Leo Zhang et.al. | 2502.10328 | null |
2025-02-14 | LLM-Powered Preference Elicitation in Combinatorial Assignment | Ermis Soumalias et.al. | 2502.10308 | null |
2025-02-14 | SPIRIT: Short-term Prediction of solar IRradIance for zero-shot Transfer learning using Foundation Models | Aditya Mishra et.al. | 2502.10307 | null |
2025-02-14 | Open-Source AI-Powered Optimization in Scalene: Advancing Python Performance Profiling with DeepSeek-R1 and LLaMA 3.2 | Saem Hasan et.al. | 2502.10299 | null |
2025-02-14 | Probabilistic Super-Resolution for High-Fidelity Physical System Simulations with Uncertainty Quantification | Pengyu Zhang et.al. | 2502.10280 | null |
2025-02-14 | Are Large Language Models the future crowd workers of Linguistics? | Iris Ferrazzo et.al. | 2502.10266 | null |
2025-02-14 | Large Language Models and Synthetic Data for Monitoring Dataset Mentions in Research Papers | Aivin V. Solatorio et.al. | 2502.10263 | null |
2025-02-14 | VisCon-100K: Leveraging Contextual Web Data for Fine-tuning Vision Language Models | Gokul Karthik Kumar et.al. | 2502.10250 | null |
2025-02-14 | Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model | Guoqing Ma et.al. | 2502.10248 | link |
2025-02-14 | Efficient Zero-Order Federated Finetuning of Language Models for Resource-Constrained Devices | Mohamed Aboelenien Ahmed et.al. | 2502.10239 | null |
2025-02-14 | Shaping Inductive Bias in Diffusion Models through Frequency-Based Noise Control | Thomas Jiralerspong et.al. | 2502.10236 | null |
2025-02-14 | AdaPTS: Adapting Univariate Foundation Models to Probabilistic Multivariate Time Series Forecasting | Abdelhakim Benechehab et.al. | 2502.10235 | link |
2025-02-14 | Do Large Language Models Reason Causally Like Us? Even Better? | Hanna M. Dettki et.al. | 2502.10215 | null |
2025-02-14 | Can Post-Training Quantization Benefit from an Additional QLoRA Integration? | Xiliang Zhu et.al. | 2502.10202 | null |
2025-02-14 | Prediction hubs are context-informed frequent tokens in LLMs | Beatrix M. G. Nielsen et.al. | 2502.10201 | null |
2025-02-14 | MathConstruct: Challenging LLM Reasoning with Constructive Proofs | Mislav Balunović et.al. | 2502.10197 | null |
2025-02-14 | Translating Common Security Assertions Across Processor Designs: A RISC-V Case Study | Sharjeel Imtiaz et.al. | 2502.10194 | null |
2025-02-14 | VideoDiff: Human-AI Video Co-Creation with Alternatives | Mina Huh et.al. | 2502.10190 | null |
2025-02-14 | Modeling biases in binary decision-making within the generalized nonlinear q-voter model | Maciej Doniec et.al. | 2502.10172 | null |
2025-02-14 | Video Soundtrack Generation by Aligning Emotions and Temporal Boundaries | Serkan Sulun et.al. | 2502.10154 | null |
2025-02-14 | Semantica: Decentralized Search using a LLM-Guided Semantic Tree Overlay | Petru Neague et.al. | 2502.10151 | link |
2025-02-14 | Cooperative Multi-Agent Planning with Adaptive Skill Synthesis | Zhiyuan Li et.al. | 2502.10148 | null |
2025-02-14 | Small Models, Big Impact: Efficient Corpus and Graph-Based Adaptation of Small Multilingual Language Models for Low-Resource Languages | Daniil Gurgurov et.al. | 2502.10140 | null |
2025-02-14 | Physics-Informed Generative Modeling of Wireless Channels | Benedikt Böck et.al. | 2502.10137 | null |
2025-02-14 | ScamFerret: Detecting Scam Websites Autonomously with Large Language Models | Hiroki Nakano et.al. | 2502.10110 | link |
2025-02-14 | NeuroXVocal: Detection and Explanation of Alzheimer's Disease through Non-invasive Analysis of Picture-prompted Speech | Nikolaos Ntampakis et.al. | 2502.10108 | null |
2025-02-14 | A novel approach to data generation in generative model | JaeHong Kim et.al. | 2502.10092 | null |
2025-02-14 | Enhancing Patient Acceptance of Robotic Ultrasound through Conversational Virtual Agent and Immersive Visualizations | Tianyu Song et.al. | 2502.10088 | null |
2025-02-14 | DiSciPLE: Learning Interpretable Programs for Scientific Visual Discovery | Utkarsh Mall et.al. | 2502.10060 | null |
2025-02-14 | A Generalized Modeling Approach to Liquid-driven Ballooning Membranes | Mirroyal Ismayilov et.al. | 2502.10057 | null |
2025-02-14 | ORI: O Routing Intelligence | Ahmad Shadid et.al. | 2502.10051 | null |
2025-02-14 | A Survey on LLM-powered Agents for Recommender Systems | Qiyao Peng et.al. | 2502.10050 | null |
2025-02-14 | ViRAC: A Vision-Reasoning Agent Head Movement Control Framework in Arbitrary Virtual Environments | Juyeong Hwang et.al. | 2502.10046 | null |
2025-02-14 | POI-Enhancer: An LLM-based Semantic Enhancement Framework for POI Representation Learning | Jiawei Cheng et.al. | 2502.10038 | null |
2025-02-14 | Probabilistic Lexical Manifold Construction in Large Language Models via Hierarchical Vector Field Interpolation | Clive Pendleton et.al. | 2502.10013 | null |
2025-02-14 | ChatGPT and Deepseek: Can They Predict the Stock Market and Macroeconomy? | Jian Chen et.al. | 2502.10008 | null |
2025-02-14 | EmbBERT-Q: Breaking Memory Barriers in Embedded NLP | Riccardo Bravin et.al. | 2502.10001 | null |
2025-02-14 | Decision Information Meets Large Language Models: The Future of Explainable Operations Research | Yansen Zhang et.al. | 2502.09994 | null |
2025-02-14 | Large Language Diffusion Models | Shen Nie et.al. | 2502.09992 | null |
2025-02-14 | V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models | Hsu-kuang Chiu et.al. | 2502.09980 | null |
2025-02-14 | LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs - No Silver Bullet for LC or RAG Routing | Kuan Li et.al. | 2502.09977 | null |
2025-02-14 | Has My System Prompt Been Used? Large Language Model Prompt Membership Inference | Roman Levin et.al. | 2502.09974 | null |
2025-02-14 | KGGen: Extracting Knowledge Graphs from Plain Text with Language Models | Belinda Mo et.al. | 2502.09956 | null |
2025-02-14 | A Preliminary Exploration with GPT-4o Voice Mode | Yu-Xiang Lin et.al. | 2502.09940 | null |
2025-02-14 | Precise Parameter Localization for Textual Generation in Diffusion Models | Łukasz Staniszewski et.al. | 2502.09935 | null |
2025-02-14 | MIR-Bench: Benchmarking LLM's Long-Context Intelligence via Many-Shot In-Context Inductive Reasoning | Kai Yan et.al. | 2502.09933 | null |
2025-02-14 | Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence | Granite Vision Team et.al. | 2502.09927 | null |
2025-02-14 | λScale: Enabling Fast Scaling for Serverless Large Language Model Inference | Minchen Yu et.al. | 2502.09922 | null |
2025-02-14 | INF^2: High-Throughput Generative Inference of Large Language Models using Near-Storage Processing | Hongsun Jang et.al. | 2502.09921 | null |
2025-02-14 | AutoS | Zhengqiu Zhu et.al. | 2502.09913 | null |
2025-02-14 | Insect-Foundation: A Foundation Model and Large Multimodal Dataset for Vision-Language Insect Understanding | Thanh-Dat Truong et.al. | 2502.09906 | null |
2025-02-14 | The Ann Arbor Architecture for Agent-Oriented Programming | Wei Dong et.al. | 2502.09903 | null |
2025-02-14 | Artificial Intelligence in Spectroscopy: Advancing Chemistry from Prediction to Generation and Beyond | Kehan Guo et.al. | 2502.09897 | null |
2025-02-14 | ChatIoT: Large Language Model-based Security Assistant for Internet of Things with Retrieval-Augmented Generation | Ye Dong et.al. | 2502.09896 | null |
2025-02-14 | ArchRAG: Attributed Community-based Hierarchical Retrieval-Augmented Generation | Shu Wang et.al. | 2502.09891 | null |
2025-02-14 | Video2Policy: Scaling up Manipulation Tasks in Simulation through Internet Videos | Weirui Ye et.al. | 2502.09886 | null |
2025-02-14 | Solvable Dynamics of Self-Supervised Word Embeddings and the Emergence of Analogical Reasoning | Dhruva Karkada et.al. | 2502.09863 | null |
2025-02-14 | Microphone Array Geometry Independent Multi-Talker Distant ASR: NTT System for the DASR Task of the CHiME-8 Challenge | Naoyuki Kamo et.al. | 2502.09859 | null |
2025-02-14 | Automated Hypothesis Validation with Agentic Sequential Falsifications | Kexin Huang et.al. | 2502.09858 | link |
2025-02-14 | Port-LLM: A Port Prediction Method for Fluid Antenna based on Large Language Models | Yali Zhang et.al. | 2502.09857 | null |
2025-02-14 | Efficient Multitask Learning in Small Language Models Through Upside-Down Reinforcement Learning | Yu-Chen Lin et.al. | 2502.09854 | null |
2025-02-14 | HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation | Tianwei Lin et.al. | 2502.09838 | link |
2025-02-13 | A Solver-Aided Hierarchical Language for LLM-Driven CAD Design | Benjamin T. Jones et.al. | 2502.09819 | null |
2025-02-13 | Statistical Coherence Alignment for Large Language Model Representation Learning Through Tensor Field Convergence | Jonathan Gale et.al. | 2502.09815 | null |
2025-02-13 | INJONGO: A Multicultural Intent Detection and Slot-filling Dataset for 16 African Languages | Hao Yu et.al. | 2502.09814 | null |
2025-02-13 | AgentGuard: Repurposing Agentic Orchestrator for Safety Evaluation of Tool Orchestration | Jizhou Chen et.al. | 2502.09809 | null |
2025-02-13 | Unit Testing Past vs. Present: Examining LLMs' Impact on Defect Detection and Efficiency | Rudolf Ramler et.al. | 2502.09801 | null |
2025-02-13 | Co-designing Large Language Model Tools for Project-Based Learning with K12 Educators | Prerna Ravi et.al. | 2502.09799 | null |
2025-02-13 | A Survey on LLM-based News Recommender Systems | Rongyao Wang et.al. | 2502.09797 | null |
2025-02-13 | TableTalk: Scaffolding Spreadsheet Development with a Language Agent | Jenny T. Liang et.al. | 2502.09787 | null |
2025-02-13 | Improving Acoustic Side-Channel Attacks on Keyboards Using Transformers and Large Language Models | Jin Hyun Park et.al. | 2502.09782 | null |
2025-02-13 | CellFlow: Simulating Cellular Morphology Changes via Flow Matching | Yuhui Zhang et.al. | 2502.09775 | null |
2025-02-13 | Non-Markovian Discrete Diffusion with Causal Language Models | Yangtian Zhang et.al. | 2502.09767 | null |
2025-02-13 | LLM-Generated Microservice Implementations from RESTful API Definitions | Saurabh Chauhan et.al. | 2502.09766 | link |
2025-02-13 | Enhancing Jailbreak Attacks via Compliance-Refusal-Based Initialization | Amit Levi et.al. | 2502.09755 | null |
2025-02-13 | Vote-Tree-Planner: Optimizing Execution Order in LLM-based Task Planning Pipeline via Voting | Chaoyuan Zhang et.al. | 2502.09749 | null |
2025-02-13 | The Widespread Adoption of Large Language Model-Assisted Writing Across Society | Weixin Liang et.al. | 2502.09747 | null |
2025-02-13 | Fine-Tuning Foundation Models with Federated Learning for Privacy Preserving Medical Time Series Forecasting | Mahad Ali et.al. | 2502.09744 | null |
2025-02-13 | FoNE: Precise Single-Token Number Embeddings via Fourier Features | Tianyi Zhou et.al. | 2502.09741 | null |
2025-02-13 | Making Them a Malicious Database: Exploiting Query Code to Jailbreak Aligned Large Language Models | Qingsong Zou et.al. | 2502.09723 | link |
2025-02-13 | NestQuant: Nested Lattice Quantization for Matrix Products and LLMs | Semyon Savkin et.al. | 2502.09720 | null |
2025-02-13 | Genetic Data Governance in Crisis: Policy Recommendations for Safeguarding Privacy and Preventing Discrimination | Vivek Ramanan et.al. | 2502.09716 | null |
2025-02-13 | MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency | Dongzhi Jiang et.al. | 2502.09621 | null |
2025-02-13 | Exploring the Potential of Encoder-free Architectures in 3D LMMs | Yiwen Tang et.al. | 2502.09620 | link |
2025-02-13 | Designing a Conditional Prior Distribution for Flow-Based Generative Models | Noam Issachar et.al. | 2502.09611 | null |
2025-02-14 | Score-of-Mixture Training: Training One-Step Generative Models Made Simple via Score Estimation of Mixture Distributions | Tejas Jayashankar et.al. | 2502.09609 | null |
2025-02-13 | Human-LLM Coevolution: Evidence from Academic Writing | Mingmeng Geng et.al. | 2502.09606 | null |
2025-02-13 | SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models | Yung-Sung Chuang et.al. | 2502.09604 | link |
2025-02-13 | Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs | Siyan Zhao et.al. | 2502.09597 | link |
2025-02-13 | KIMAs: A Configurable Knowledge Integrated Multi-Agent System | Zitao Li et.al. | 2502.09596 | null |
2025-02-13 | Logical forms complement probability in understanding language model (and human) performance | Yixuan Wang et.al. | 2502.09589 | null |
2025-02-13 | Rolling Ahead Diffusion for Traffic Scene Simulation | Yunpeng Liu et.al. | 2502.09587 | null |
2025-02-13 | Polymind: Parallel Visual Diagramming with Large Language Models to Support Prewriting Through Microtasks | Qian Wan et.al. | 2502.09577 | null |
2025-02-13 | Zero-shot generation of synthetic neurosurgical data with large language models | Austin A. Barr et.al. | 2502.09566 | link |
2025-02-13 | MDCrow: Automating Molecular Dynamics Workflows with Large Language Models | Quintina Campbell et.al. | 2502.09565 | link |
2025-02-13 | EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents | Rui Yang et.al. | 2502.09560 | null |
2025-02-13 | Explainable AI-assisted Optimization for Feynman Integral Reduction | Zhuo-Yang Song et.al. | 2502.09544 | null |
2025-02-13 | Mind the Gap! Choice Independence in Using Multilingual LLMs for Persuasive Co-Writing Tasks in Different Languages | Shreyan Biswas et.al. | 2502.09532 | null |
2025-02-13 | SQ-GAN: Semantic Image Communications Using Masked Vector Quantization | Francesco Pezone et.al. | 2502.09520 | link |
2025-02-13 | Diffusion Models for Molecules: A Survey of Methods and Tasks | Liang Wang et.al. | 2502.09511 | link |
2025-02-13 | EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling | Theodoros Kouzelis et.al. | 2502.09509 | null |
2025-02-13 | Improve LLM-based Automatic Essay Scoring with Linguistic Features | Zhaoyi Joey Hou et.al. | 2502.09497 | null |
2025-02-13 | Foundation Neural-Network Quantum States | Riccardo Rende et.al. | 2502.09488 | null |
2025-02-13 | Objective quantification of mood states using large language models | Jakub Onysk et.al. | 2502.09487 | null |
2025-02-13 | DiffRenderGAN: Addressing Training Data Scarcity in Deep Segmentation Networks for Quantitative Nanomaterial Analysis through Differentiable Rendering and Generative Modelling | Dennis Possart et.al. | 2502.09477 | null |
2025-02-13 | Transformer-Enhanced Variational Autoencoder for Crystal Structure Prediction | Ziyi Chen et.al. | 2502.09423 | null |
2025-02-13 | ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation | Rotem Shalev-Arkushin et.al. | 2502.09411 | null |
2025-02-13 | SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models | Daniel Fleischer et.al. | 2502.09390 | link |
2025-02-13 | Truth Knows No Language: Evaluating Truthfulness Beyond English | Blanca Calvo Figueras et.al. | 2502.09387 | null |
2025-02-13 | APT-LLM: Embedding-Based Anomaly Detection of Cyber Advanced Persistent Threats Using Large Language Models | Sidahmed Benabderrahmane et.al. | 2502.09385 | null |
2025-02-13 | LoRA Training Provably Converges to a Low-Rank Global Minimum or It Fails Loudly (But it Probably Won't Fail) | Junsu Kim et.al. | 2502.09376 | null |
2025-02-13 | Inverse problems with experiment-guided AlphaFold | Advaith Maddipatla et.al. | 2502.09372 | null |
2025-02-13 | Language Agents as Digital Representatives in Collective Decision-Making | Daniel Jarrett et.al. | 2502.09369 | null |
2025-02-13 | Machine learning for modelling unstructured grid data in computational physics: a review | Sibo Cheng et.al. | 2502.09346 | null |
2025-02-13 | ThunderServe: High-performance and Cost-efficient LLM Serving in Cloud Environments | Youhe Jiang et.al. | 2502.09334 | null |
2025-02-13 | Beyond English: The Impact of Prompt Translation Strategies across Languages and Tasks in Multilingual LLMs | Itai Mondshine et.al. | 2502.09331 | null |
2025-02-13 | Copilot Arena: A Platform for Code LLM Evaluation in the Wild | Wayne Chi et.al. | 2502.09328 | null |
2025-02-13 | A Benchmark for Crime Surveillance Video Analysis with Large Models | Haoran Chen et.al. | 2502.09325 | null |
2025-02-13 | A Judge-free LLM Open-ended Generation Benchmark Based on the Distributional Hypothesis | Kentaro Imajo et.al. | 2502.09316 | link |
2025-02-13 | When the LM misunderstood the human chuckled: Analyzing garden path effects in humans and language models | Samuel Joseph Amouyal et.al. | 2502.09307 | null |
2025-02-13 | Non-asymptotic Analysis of Diffusion Annealed Langevin Monte Carlo for Generative Modelling | Paula Cordero-Encinar et.al. | 2502.09306 | null |
2025-02-13 | KET-RAG: A Cost-Efficient Multi-Granular Indexing Framework for Graph-RAG | Yiqian Huang et.al. | 2502.09304 | link |
2025-02-13 | When do neural networks learn world models? | Tianren Zhang et.al. | 2502.09297 | null |
2025-02-13 | SparQLe: Speech Queries to Text Translation Through LLMs | Amirbek Djanibekov et.al. | 2502.09284 | null |
2025-02-13 | GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation | Hongyin Zhang et.al. | 2502.09268 | null |
2025-02-13 | AnomalyGFM: Graph Foundation Model for Zero/Few-shot Anomaly Detection | Hezhe Qiao et.al. | 2502.09254 | null |
2025-02-13 | From large language models to multimodal AI: A scoping review on the potential of generative AI in medicine | Lukas Buess et.al. | 2502.09242 | null |
2025-02-13 | OpenBench: A New Benchmark and Baseline for Semantic Navigation in Smart Logistics | Junhui Wang et.al. | 2502.09238 | null |
2025-02-13 | Reliable Conversational Agents under ASP Control that Understand Natural Language | Yankai Zeng et.al. | 2502.09237 | null |
2025-02-13 | Data2Concept2Text: An Explainable Multilingual Framework for Data Analysis Narration | Flavio Bertini et.al. | 2502.09218 | null |
2025-02-13 | LP-LM: No Hallucinations in Question Answering with Logic Programming | Katherine Wu et.al. | 2502.09212 | link |
2025-02-13 | Visual Graph Question Answering with ASP and LLMs for Language Parsing | Jakob Johannes Bauer et.al. | 2502.09211 | null |
2025-02-13 | On LLM-generated Logic Programs and their Inference Execution Methods | Paul Tarau et.al. | 2502.09209 | null |
2025-02-13 | Logical Lease Litigation: Prolog and LLMs for Rental Law Compliance in New York | Sanskar Sehgal et.al. | 2502.09204 | null |
2025-02-13 | XAInomaly: Explainable and Interpretable Deep Contractive Autoencoder for O-RAN Traffic Anomaly Detection | Osman Tugay Basaran et.al. | 2502.09194 | null |
2025-02-13 | Thinking beyond the anthropomorphic paradigm benefits LLM research | Lujain Ibrahim et.al. | 2502.09192 | null |
2025-02-13 | Matina: A Large-Scale 73B Token Persian Text Corpus | Sara Bourbour Hosseinbeigi et.al. | 2502.09188 | null |
2025-02-13 | RefineCoder: Iterative Improving of Large Language Models via Adaptive Critique Refinement for Code Generation | Changzhi Zhou et.al. | 2502.09183 | null |
2025-02-13 | FLAME: Flexible LLM-Assisted Moderation Engine | Ivan Bakulin et.al. | 2502.09175 | null |
2025-02-13 | Two-Stage Representation Learning for Analyzing Movement Behavior Dynamics in People Living with Dementia | Jin Cui et.al. | 2502.09173 | null |
2025-02-13 | Improving TCM Question Answering through Tree-Organized Self-Reflective Retrieval with LLMs | Chang Liu et.al. | 2502.09156 | null |
2025-02-13 | Finite-Time Analysis of Discrete-Time Stochastic Interpolants | Yuhao Liu et.al. | 2502.09130 | null |
2025-02-13 | One-shot Federated Learning Methods: A Practical Guide | Xiang Liu et.al. | 2502.09104 | null |
2025-02-13 | Bridging the Gap Between LLMs and Human Intentions: Progresses and Challenges in Instruction Understanding, Intention Reasoning, and Reliable Generation | Zongyu Chang et.al. | 2502.09101 | null |
2025-02-13 | Logical Reasoning in Large Language Models: A Survey | Hanmeng Liu et.al. | 2502.09100 | null |
2025-02-13 | Show Me the Work: Fact-Checkers' Requirements for Explainable Automated Fact-Checking | Greta Warren et.al. | 2502.09083 | null |
2025-02-13 | CoSER: Coordinating LLM-Based Persona Simulation of Established Roles | Xintao Wang et.al. | 2502.09082 | link |
2025-02-13 | Enhancing RAG with Active Learning on Conversation Records: Reject Incapables and Answer Capables | Xuzhao Geng et.al. | 2502.09073 | null |
2025-02-13 | Unleashing the Power of Large Language Model for Denoising Recommendation | Shuyao Wang et.al. | 2502.09058 | null |
2025-02-13 | An Open Recipe: Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging | Kunat Pipatanakul et.al. | 2502.09056 | null |
2025-02-13 | Game Theory Meets Large Language Models: A Systematic Survey | Haoran Sun et.al. | 2502.09053 | null |
2025-02-13 | Typhoon T1: An Open Thai Reasoning Model | Pittawat Taveekitworachai et.al. | 2502.09042 | null |
2025-02-13 | Implementation of a Fuzzy Relational Database. Case Study: Chilean Cardboard Industry in the Maule Region | Leoncio Jimenez et.al. | 2502.09035 | null |
2025-02-13 | MTDP: Modulated Transformer Diffusion Policy Model | Qianhao Wang et.al. | 2502.09029 | null |
2025-02-13 | EventSTR: A Benchmark Dataset and Baselines for Event Stream based Scene Text Recognition | Xiao Wang et.al. | 2502.09020 | link |
2025-02-13 | Diversity Enhances an LLM's Performance in RAG and Long-context Task | Zhchao Wang et.al. | 2502.09017 | null |
2025-02-13 | Hope vs. Hate: Understanding User Interactions with LGBTQ+ News Content in Mainstream US News Media through the Lens of Hope Speech | Jonathan Pofcher et.al. | 2502.09004 | null |
2025-02-13 | RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models | Quan Wei et.al. | 2502.09003 | null |
2025-02-13 | End-to-End triplet loss based fine-tuning for network embedding in effective PII detection | Rishika Kohli et.al. | 2502.09002 | null |
2025-02-13 | Task Generalization With AutoRegressive Compositional Structure: Can Learning From | Amirhesam Abedsoltan et.al. | 2502.08991 | null |
2025-02-13 | Prophet Inequalities for Bandits, Cabinets, and DAGs | Robin Bowers et.al. | 2502.08976 | null |
2025-02-13 | Medicine on the Edge: Comparative Performance Analysis of On-Device LLMs for Clinical Reasoning | Leon Nissen et.al. | 2502.08954 | link |
2025-02-13 | Structured Convergence in Large Language Model Representations via Hierarchical Latent Space Folding | Fenella Harcourt et.al. | 2502.08947 | null |
2025-02-13 | Beyond the Singular: The Essential Role of Multiple Generations in Effective Benchmark Evaluation and Analysis | Wenbo Zhang et.al. | 2502.08943 | null |
2025-02-13 | Escaping Collapse: The Strength of Weak Data for Large Language Model Training | Kareem Amin et.al. | 2502.08924 | null |
2025-02-13 | Self-Consistency of the Internal Reward Models Improves Self-Rewarding Language Models | Xin Zhou et.al. | 2502.08922 | null |
2025-02-13 | Detecting Malicious Concepts Without Image Generation in AIGC | Kun Xu et.al. | 2502.08921 | null |
2025-02-13 | InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU | Heejun Lee et.al. | 2502.08910 | null |
2025-02-13 | Towards Automated Fact-Checking of Real-World Claims: Exploring Task Formulation and Assessment with LLMs | Premtim Sahitaj et.al. | 2502.08909 | null |
2025-02-13 | Reinforced Large Language Model is a formal theorem prover | Zhiling Luo et.al. | 2502.08908 | link |
2025-02-13 | DiffoRA: Enabling Parameter-Efficient LLM Fine-Tuning via Differential Low-Rank Matrix Adaptation | Tangyu Jiang et.al. | 2502.08905 | null |
2025-02-13 | MIH-TCCT: Mitigating Inconsistent Hallucinations in LLMs via Event-Driven Text-Code Cyclic Training | Xinxin You et.al. | 2502.08904 | null |
2025-02-13 | 3D-Grounded Vision-Language Framework for Robotic Task Planning: Automated Prompt Synthesis and Supervised Reasoning | Guoqin Tang et.al. | 2502.08903 | null |
2025-02-13 | Communication is All You Need: Persuasion Dataset Construction via Multi-LLM Communication | Weicheng Ma et.al. | 2502.08896 | null |
2025-02-13 | ShapeLib: designing a library of procedural 3D shape abstractions with Large Language Models | R. Kenny Jones et.al. | 2502.08884 | null |
2025-02-13 | Utilizing Pre-trained and Large Language Models for 10-K Items Segmentation | Hsin-Min Lu et.al. | 2502.08875 | null |
2025-02-13 | Harnessing Vision Models for Time Series Analysis: A Survey | Jingchao Ni et.al. | 2502.08869 | link |
2025-02-13 | A Systematic Evaluation of Generative Models on Tabular Transportation Data | Chengen Wang et.al. | 2502.08856 | link |
2025-02-12 | Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation | Mohammad Mahdi Abootorabi et.al. | 2502.08826 | link |
2025-02-12 | DejAIvu: Identifying and Explaining AI Art on the Web in Real-Time with Saliency Maps | Jocelyn Dzuong et.al. | 2502.08821 | link |
2025-02-12 | Can a Single Model Master Both Multi-turn Conversations and Tool Use? CALM: A Unified Conversational Agentic Language Model | Emre Can Acikgoz et.al. | 2502.08820 | null |
2025-02-12 | Lexical Manifold Reconfiguration in Large Language Models: A Novel Architectural Approach for Contextual Modulation | Koinis Vassilis et.al. | 2502.08818 | null |
2025-02-12 | Examining Multilingual Embedding Models Cross-Lingually Through LLM-Generated Adversarial Examples | Andrianos Michail et.al. | 2502.08638 | null |
2025-02-12 | Ensemble based approach to quantifying uncertainty of LLM based classifications | Srijith Rajamohan et.al. | 2502.08631 | null |
2025-02-12 | Continuous Cardiac Arrest Prediction in ICU using PPG Foundation Model | Saurabh Kataria et.al. | 2502.08612 | null |
2025-02-12 | Causal Analysis of ASR Errors for Children: Quantifying the Impact of Physiological, Cognitive, and Extrinsic Factors | Vishwanath Pratap Singh et.al. | 2502.08587 | null |
2025-02-12 | Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks | Ang Li et.al. | 2502.08586 | null |
2025-02-12 | Statistically validated projection of bipartite signed networks | Anna Gallo et.al. | 2502.08567 | null |
2025-02-12 | QA-Expand: Multi-Question Answer Generation for Enhanced Query Expansion in Information Retrieval | Wonduk Seo et.al. | 2502.08557 | null |
2025-02-12 | Human-Centric Foundation Models: Perception, Generation and Agentic Modeling | Shixiang Tang et.al. | 2502.08556 | link |
2025-02-12 | Fostering Appropriate Reliance on Large Language Models: The Role of Explanations, Sources, and Inconsistencies | Sunnie S. Y. Kim et.al. | 2502.08554 | null |
2025-02-12 | LLMs can implicitly learn from mistakes in-context | Lisa Alazraki et.al. | 2502.08550 | null |
2025-02-12 | LLM Pretraining with Continuous Concepts | Jihoon Tack et.al. | 2502.08524 | null |
2025-02-12 | FedMHO: Heterogeneous One-Shot Federated Learning Towards Resource-Constrained Edge Devices | Dezhong Yao et.al. | 2502.08518 | link |
2025-02-12 | The Paradox of Stochasticity: Limited Creativity and Computational Decoupling in Temperature-Varied LLM Outputs of Structured Fictional Data | Evgenii Evstafev et.al. | 2502.08515 | null |
2025-02-12 | Faithful, Unfaithful or Ambiguous? Multi-Agent Debate with Initial Stance for Summary Evaluation | Mahnaz Koupaee et.al. | 2502.08514 | link |
2025-02-12 | Measuring Diversity in Synthetic Datasets | Yuchang Zhu et.al. | 2502.08512 | link |
2025-02-12 | Explanation based In-Context Demonstrations Retrieval for Multilingual Grammatical Error Correction | Wei Li et.al. | 2502.08507 | link |
2025-02-12 | Salamandra Technical Report | Aitor Gonzalez-Agirre et.al. | 2502.08489 | link |
2025-02-12 | One-Shot Federated Learning with Classifier-Free Diffusion Models | Obaidullah Zaland et.al. | 2502.08488 | null |
2025-02-12 | Computed fingertip touch for the instrumental control of musical sound with an excursion on the computed retinal afterimage | Staas de Jong et.al. | 2502.08471 | null |
2025-02-12 | mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data | Haonan Chen et.al. | 2502.08468 | link |
2025-02-12 | From Haystack to Needle: Label Space Reduction for Zero-shot Classification | Nathan Vandemoortele et.al. | 2502.08436 | null |
2025-02-12 | IssueBench: Millions of Realistic Prompts for Measuring Issue Bias in LLM Writing Assistance | Paul Röttger et.al. | 2502.08395 | null |
2025-02-12 | ViLa-MIL: Dual-scale Vision-Language Multiple Instance Learning for Whole Slide Image Classification | Jiangbo Shi et.al. | 2502.08391 | link |
2025-02-12 | Top-Theta Attention: Sparsifying Transformers by Compensated Thresholding | Konstantin Berestizshevsky et.al. | 2502.08363 | link |
2025-02-12 | Systematic Knowledge Injection into Large Language Models via Diverse Augmentation for Domain-Specific RAG | Kushagra Bhushan et.al. | 2502.08356 | null |
2025-02-12 | Trustworthy GNNs with LLMs: A Systematic Review and Taxonomy | Ruizhan Xue et.al. | 2502.08353 | null |
2025-02-12 | Graph Foundation Models for Recommendation: A Comprehensive Survey | Bin Wu et.al. | 2502.08346 | null |
2025-02-12 | Foundation Models in Computational Pathology: A Review of Challenges, Opportunities, and Impact | Mohsin Bilal et.al. | 2502.08333 | null |
2025-02-12 | Modification and Generated-Text Detection: Achieving Dual Detection Capabilities for the Outputs of LLM by Watermark | Yuhang Cai et.al. | 2502.08332 | null |
2025-02-12 | Contextual Compression Encoding for Large Language Models: A Novel Framework for Multi-Layered Parameter Space Pruning | Barnaby Schmitt et.al. | 2502.08323 | null |
2025-02-12 | MultiProSE: A Multi-label Arabic Dataset for Propaganda, Sentiment, and Emotion Detection | Lubna Al-Henaki et.al. | 2502.08319 | null |
2025-02-12 | Word Synchronization Challenge: A Benchmark for Word Association Responses for LLMs | Tanguy Cazalets et.al. | 2502.08312 | null |
2025-02-12 | Unlocking Scaling Law in Industrial Recommendation Systems with a Three-step Paradigm based Large User Model | Bencheng Yan et.al. | 2502.08309 | null |
2025-02-12 | HDT: Hierarchical Discrete Transformer for Multivariate Time Series Forecasting | Shibo Feng et.al. | 2502.08302 | link |
2025-02-12 | Compromising Honesty and Harmlessness in Language Models via Deception Attacks | Laurène Vaugrante et.al. | 2502.08301 | null |
2025-02-12 | Improving Existing Optimization Algorithms with LLMs | Camilo Chacón Sartori et.al. | 2502.08298 | null |
2025-02-12 | Redefining Simplicity: Benchmarking Large Language Models from Lexical to Document Simplification | Jipeng Qiang et.al. | 2502.08281 | null |
2025-02-12 | MoLoRec: A Generalizable and Efficient Framework for LLM-Based Recommendation | Min Hou et.al. | 2502.08271 | null |
2025-02-12 | Exploring the Potential of Large Language Models to Simulate Personality | Maria Molchanova et.al. | 2502.08265 | link |
2025-02-12 | GenIAS: Generator for Instantiating Anomalies in time Series | Zahra Zamanzadeh Darban et.al. | 2502.08262 | null |
2025-02-12 | FixDrive: Automatically Repairing Autonomous Vehicle Driving Behaviour for $0.08 per Violation | Yang Sun et.al. | 2502.08260 | link |
2025-02-12 | Learning Human Skill Generators at Key-Step Levels | Yilu Wu et.al. | 2502.08234 | null |
2025-02-12 | Flow-of-Action: SOP Enhanced LLM-Based Multi-Agent System for Root Cause Analysis | Changhua Pei et.al. | 2502.08224 | null |
2025-02-12 | Memory Offloading for Large Language Model Inference with Latency SLO Guarantees | Chenxiang Ma et.al. | 2502.08182 | null |
2025-02-12 | Enhancing LLM Character-Level Manipulation via Divide and Conquer | Zhen Xiong et.al. | 2502.08180 | null |
2025-02-12 | ParetoRAG: Leveraging Sentence-Context Attention for Robust and Efficient Retrieval-Augmented Generation | Ruobing Yao et.al. | 2502.08178 | null |
2025-02-12 | SycEval: Evaluating LLM Sycophancy | Aaron Fanous et.al. | 2502.08177 | null |
2025-02-12 | Intention is All You Need: Refining Your Code from Your Intention | Qi Guo et.al. | 2502.08172 | null |
2025-02-12 | Force Matching with Relativistic Constraints: A Physics-Inspired Approach to Stable and Efficient Generative Modeling | Yang Cao et.al. | 2502.08150 | null |
2025-02-12 | ACCESS : A Benchmark for Abstract Causal Event Discovery and Reasoning | Vy Vo et.al. | 2502.08148 | null |
2025-02-12 | Democratizing AI: Open-source Scalable LLM Training on GPU-based Supercomputers | Siddharth Singh et.al. | 2502.08145 | null |
2025-02-12 | Bridging the Safety Gap: A Guardrail Pipeline for Trustworthy LLM Inferences | Shanshan Han et.al. | 2502.08142 | null |
2025-02-12 | LowRA: Accurate and Efficient LoRA Fine-Tuning of LLMs under 2 Bits | Zikai Zhou et.al. | 2502.08141 | null |
2025-02-12 | Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models | Sonam Gupta et.al. | 2502.08130 | null |
2025-02-12 | Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance | Lingfei Qian et.al. | 2502.08127 | link |
2025-02-12 | HuDEx: Integrating Hallucination Detection and Explainability for Enhancing the Reliability of LLM responses | Sujeong Lee et.al. | 2502.08109 | null |
2025-02-12 | Large language models perpetuate bias in palliative care: development and analysis of the Palliative Care Adversarial Dataset (PCAD) | Naomi Akhras et.al. | 2502.08073 | null |
2025-02-12 | On Mechanistic Circuits for Extractive Question-Answering | Samyadeep Basu et.al. | 2502.08059 | null |
2025-02-12 | Break the Checkbox: Challenging Closed-Style Evaluations of Cultural Alignment in LLMs | Mohsinul Kabir et.al. | 2502.08045 | null |
2025-02-12 | Franken-Adapter: Cross-Lingual Adaptation of LLMs by Embedding Surgery | Fan Jiang et.al. | 2502.08037 | null |
2025-02-12 | Stochastic Kinetics of Transcription: Analysis and Computation | Yuntao Lu et.al. | 2502.08028 | null |
2025-02-12 | Contextual Subspace Manifold Projection for Structural Refinement of Large Language Model Representations | Alistair Wren et.al. | 2502.08026 | null |
2025-02-11 | Speculate, then Collaborate: Fusing Knowledge of Language Models during Decoding | Ziyao Wang et.al. | 2502.08020 | null |
2025-02-11 | The Geometry of Prompting: Unveiling Distinct Mechanisms of Task Adaptation in Language Models | Artem Kirsanov et.al. | 2502.08009 | null |
2025-02-11 | An Interactive Framework for Implementing Privacy-Preserving Federated Learning: Experiments on Large Language Models | Kasra Ahmadi et.al. | 2502.08008 | link |
2025-02-11 | Towards Training One-Step Diffusion Models Without Distillation | Mingtian Zhang et.al. | 2502.08005 | null |
2025-02-11 | Universal Adversarial Attack on Aligned Multimodal LLMs | Temurbek Rahmatullaev et.al. | 2502.07987 | null |
2025-02-11 | Deep Semantic Graph Learning via LLM based Node Enhancement | Chuanqi Shi et.al. | 2502.07982 | null |
2025-02-11 | CIRCUIT: A Benchmark for Circuit Interpretation and Reasoning Capabilities of LLMs | Lejla Skelic et.al. | 2502.07980 | null |
2025-02-11 | From Hazard Identification to Controller Design: Proactive and LLM-Supported Safety Engineering for ML-Powered Systems | Yining Hong et.al. | 2502.07974 | null |
2025-02-11 | Caught in the Web of Words: Do LLMs Fall for Spin in Medical Literature? | Hye Sun Yun et.al. | 2502.07963 | null |
2025-02-11 | Accelerating Scientific Research Through a Multi-LLM Framework | Joaquin Ramirez-Medina et.al. | 2502.07960 | null |
2025-02-11 | Bridging HCI and AI Research for the Evaluation of Conversational SE Assistants | Jonan Richards et.al. | 2502.07956 | null |
2025-02-11 | Symbiotic Cooperation for Web Agents: Harnessing Complementary Strengths of Large and Small LLMs | Ruichen Zhang et.al. | 2502.07942 | null |
2025-02-11 | Discrete Markov Probabilistic Models | Le-Tuyet-Nhi Pham et.al. | 2502.07939 | null |
2025-02-11 | Distributed Approach to Haskell Based Applications Refactoring with LLMs Based Multi-Agent Systems | Shahbaz Siddeeq et.al. | 2502.07928 | null |
2025-02-11 | Sign Operator for Coping with Heavy-Tailed Noise: High Probability Convergence Bounds with Extensions to Distributed Optimization and Comparison Oracle | Nikita Kornilov et.al. | 2502.07923 | null |
2025-02-11 | Elevating Legal LLM Responses: Harnessing Trainable Logical Structures and Semantic Knowledge with Legal Reasoning | Rujing Yao et.al. | 2502.07912 | link |
2025-02-11 | DeepSeek on a Trip: Inducing Targeted Visual Hallucinations via Representation Vulnerabilities | Chashi Mahiul Islam et.al. | 2502.07905 | null |
2025-02-11 | Intelligent Legal Assistant: An Interactive Clarification System for Legal Question Answering | Rujing Yao et.al. | 2502.07904 | null |
2025-02-11 | HexGen-2: Disaggregated Generative Inference of LLMs in Heterogeneous Environment | Youhe Jiang et.al. | 2502.07903 | null |
2025-02-11 | TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation | Alex Jinpeng Wang et.al. | 2502.07870 | link |
2025-02-11 | TransMLA: Multi-head Latent Attention Is All You Need | Fanxu Meng et.al. | 2502.07864 | link |
2025-02-11 | BalanceKV: KV Cache Compression through Discrepancy Theory | Insu Han et.al. | 2502.07861 | null |
2025-02-11 | Pippo: High-Resolution Multi-View Humans from a Single Image | Yash Kant et.al. | 2502.07785 | null |
2025-02-11 | DarwinLM: Evolutionary Structured Pruning of Large Language Models | Shengkun Tang et.al. | 2502.07780 | null |
2025-02-11 | Stay-Positive: A Case for Ignoring Real Image Features in Fake Image Detection | Anirudh Sundara Rajan et.al. | 2502.07778 | null |
2025-02-11 | Auditing Prompt Caching in Language Model APIs | Chenchen Gu et.al. | 2502.07776 | link |
2025-02-11 | Automatic Robot Task Planning by Integrating Large Language Model with Genetic Programming | Azizjon Kobilov et.al. | 2502.07772 | null |
2025-02-11 | Great Power Brings Great Responsibility: Personalizing Conversational AI for Diverse Problem-Solvers | Italo Santos et.al. | 2502.07763 | null |
2025-02-11 | Scalable Fingerprinting of Large Language Models | Anshul Nasery et.al. | 2502.07760 | null |
2025-02-11 | Towards Efficient Optimizer Design for LLM via Structured Fisher Approximation with a Low-Rank Extension | Wenbo Gong et.al. | 2502.07752 | null |
2025-02-11 | WHODUNIT: Evaluation benchmark for culprit detection in mystery stories | Kshitij Gupta et.al. | 2502.07747 | link |
2025-02-11 | The Economics of Large Language Models: Token Allocation, Fine-Tuning, and Optimal Pricing | Dirk Bergemann et.al. | 2502.07736 | null |
2025-02-11 | Revisiting Non-Acyclic GFlowNets in Discrete Environments | Nikita Morozov et.al. | 2502.07735 | link |
2025-02-11 | Economics of Sourcing Human Data | Sebastin Santy et.al. | 2502.07732 | null |
2025-02-11 | Verifying LLM-Generated Code in the Context of Software Verification with Ada/SPARK | Marcos Cramer et.al. | 2502.07728 | null |
2025-02-11 | Near-Optimal Sample Complexity in Reward-Free Kernel-Based Reinforcement Learning | Aya Kayal et.al. | 2502.07715 | null |
2025-02-11 | Magic 1-For-1: Generating One Minute Video Clips within One Minute | Hongwei Yi et.al. | 2502.07701 | link |
2025-02-11 | A Framework for LLM-powered Design Assistants | Swaroop Panda et.al. | 2502.07698 | null |
2025-02-11 | Large Language Models as Proxies for Theories of Human Linguistic Cognition | Imry Ziv et.al. | 2502.07687 | null |
2025-02-11 | Steering Protein Family Design through Profile Bayesian Flow | Jingjing Gong et.al. | 2502.07671 | null |
2025-02-11 | Guiding Time-Varying Generative Models with Natural Gradients on Exponential Family Manifold | Song Liu et.al. | 2502.07650 | null |
2025-02-11 | SymGPT: Auditing Smart Contracts via Combining Symbolic Execution with Large Language Models | Shihao Xia et.al. | 2502.07644 | null |
2025-02-11 | FoQA: A Faroese Question-Answering Dataset | Annika Simonsen et.al. | 2502.07642 | null |
2025-02-11 | Distributional Instrumental Variable Method | Anastasiia Holovchak et.al. | 2502.07641 | link |
2025-02-11 | Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving | Yong Lin et.al. | 2502.07640 | link |
2025-02-11 | Consistency Training with Physical Constraints | Che-Chia Chang et.al. | 2502.07636 | null |
2025-02-11 | Exploring Mobile Touch Interaction with Large Language Models | Tim Zindulka et.al. | 2502.07629 | null |
2025-02-11 | Tractable Transformers for Flexible Conditional Generation | Anji Liu et.al. | 2502.07616 | null |
2025-02-11 | Beyond Prompting: Time2Lang -- Bridging Time-Series Foundation Models and Large Language Models for Health Sensing | Arvind Pillai et.al. | 2502.07608 | null |
2025-02-11 | Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models | Jiacong Xu et.al. | 2502.07601 | null |
2025-02-11 | Towards spatial computing: recent advances in multimodal natural interaction for XR headsets | Zhimin Wang et.al. | 2502.07598 | null |
2025-02-11 | SEMU: Singular Value Decomposition for Efficient Machine Unlearning | Marcin Sendera et.al. | 2502.07587 | null |
2025-02-11 | Generative Modeling with Bayesian Sample Inference | Marten Lienen et.al. | 2502.07580 | link |
2025-02-11 | PIM Is All You Need: A CXL-Enabled GPU-Free System for Large Language Model Inference | Yufeng Gu et.al. | 2502.07578 | link |
2025-02-11 | Automated Capability Discovery via Model Self-Exploration | Cong Lu et.al. | 2502.07577 | link |
2025-02-11 | JBShield: Defending Large Language Models from Jailbreak Attacks through Activated Concept Analysis and Manipulation | Shenyi Zhang et.al. | 2502.07557 | link |
2025-02-11 | O1 Embedder: Let Retrievers Think Before Action | Ruin Yan et.al. | 2502.07555 | null |
2025-02-11 | Grammar Control in Dialogue Response Generation for Language Learning Chatbots | Dominik Glandorf et.al. | 2502.07544 | link |
2025-02-11 | NatureLM: Deciphering the Language of Nature for Scientific Discovery | Yingce Xia et.al. | 2502.07527 | null |
2025-02-11 | The Devil is in the Prompts: De-Identification Traces Enhance Memorization Risks in Synthetic Chest X-Ray Generation | Raman Dutt et.al. | 2502.07516 | link |
2025-02-11 | Enhance-A-Video: Better Generated Video for Free | Yang Luo et.al. | 2502.07508 | link |
2025-02-11 | Towards THz-based Obstacle Sensing: A Generative Radio Environment Awareness Framework | Tianyu Hu et.al. | 2502.07504 | null |
2025-02-11 | Unified Graph Networks (UGN): A Deep Neural Framework for Solving Graph Problems | Rudrajit Dawn et.al. | 2502.07500 | null |
2025-02-11 | LLM-Sketch: Enhancing Network Sketches with LLM | Yuanpeng Li et.al. | 2502.07495 | link |
2025-02-11 | Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More | Xialie Zhuang et.al. | 2502.07490 | link |
2025-02-11 | Improving Adaptive Moment Optimization via Preconditioner Diagonalization | Son Nguyen et.al. | 2502.07488 | null |
2025-02-11 | ETimeline: An Extensive Timeline Generation Dataset based on Large Language Model | Xiaochen Liu et.al. | 2502.07474 | null |
2025-02-11 | JamendoMaxCaps: A Large Scale Music-caption Dataset with Imputed Metadata | Abhinaba Roy et.al. | 2502.07461 | link |
2025-02-11 | Logarithmic Regret for Online KL-Regularized Reinforcement Learning | Heyang Zhao et.al. | 2502.07460 | null |
2025-02-11 | PerCul: A Story-Driven Cultural Evaluation of LLMs in Persian | Erfan Moosavi Monazzah et.al. | 2502.07459 | null |
2025-02-11 | RusCode: Russian Cultural Code Benchmark for Text-to-Image Generation | Viacheslav Vasilev et.al. | 2502.07455 | link |
2025-02-11 | Forget What You Know about LLMs Evaluations - LLMs are Like a Chameleon | Nurit Cohen-Inger et.al. | 2502.07445 | link |
2025-02-11 | Towards a Foundation Model for Physics-Informed Neural Networks: Multi-PDE Learning with Active Sampling | Keon Vin Park et.al. | 2502.07425 | null |
2025-02-11 | RomanLens: Latent Romanization and its role in Multilinguality in LLMs | Alan Saji et.al. | 2502.07424 | null |
2025-02-11 | Entity Linking using LLMs for Automated Product Carbon Footprint Estimation | Steffen Castle et.al. | 2502.07418 | null |
2025-02-11 | EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering | Sheng Zhou et.al. | 2502.07411 | link |
2025-02-11 | MGPATH: Vision-Language Model with Multi-Granular Prompt Learning for Few-Shot WSI Classification | Anh-Tien Nguyen et.al. | 2502.07409 | link |
2025-02-11 | On Iterative Evaluation and Enhancement of Code Quality Using GPT-4o | Rundong Liu et.al. | 2502.07399 | link |
2025-02-11 | FinRL-DeepSeek: LLM-Infused Risk-Sensitive Reinforcement Learning for Trading Agents | Mostapha Benhenda et.al. | 2502.07393 | link |
2025-02-11 | LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters! | Dacheng Li et.al. | 2502.07374 | link |
2025-02-11 | EvoFlow: Evolving Diverse Agentic Workflows On The Fly | Guibin Zhang et.al. | 2502.07373 | null |
2025-02-11 | LongReD: Mitigating Short-Text Degradation of Long-Context Large Language Models via Restoration Distillation | Zican Dong et.al. | 2502.07365 | null |
2025-02-11 | Bridging the Evaluation Gap: Leveraging Large Language Models for Topic Model Evaluation | Zhiyin Tan et.al. | 2502.07352 | link |
2025-02-11 | KABB: Knowledge-Aware Bayesian Bandits for Dynamic Expert Coordination in Multi-Agent Systems | Jusheng Zhang et.al. | 2502.07350 | null |
2025-02-11 | BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models | Xu Huang et.al. | 2502.07346 | link |
2025-02-11 | Aligning Large Language Models to Follow Instructions and Hallucinate Less via Effective Data Filtering | Shuzheng Si et.al. | 2502.07340 | link |
2025-02-11 | Music for All: Exploring Multicultural Representations in Music Generation Models (Camera Ready) | Atharva Mehta et.al. | 2502.07328 | link |
2025-02-11 | Generative Ghost: Investigating Ranking Bias Hidden in AI-Generated Videos | Haowen Gao et.al. | 2502.07327 | null |
2025-02-11 | MEMIT-Merge: Addressing MEMIT's Key-Value Conflicts in Same-Subject Batch Editing for LLMs | Zilu Dong et.al. | 2502.07322 | null |
2025-02-11 | CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction | Junlong Li et.al. | 2502.07316 | link |
2025-02-11 | Prompt-Based Document Modifications In Ranking Competitions | Niv Bardas et.al. | 2502.07315 | null |
2025-02-11 | CreAgent: Towards Long-Term Evaluation of Recommender System under Platform-Creator Information Asymmetry | Xiaopeng Ye et.al. | 2502.07307 | link |
2025-02-11 | TRAVEL: Training-Free Retrieval and Alignment for Vision-and-Language Navigation | Navid Rajabi et.al. | 2502.07306 | null |
2025-02-11 | Flow Matching for Collaborative Filtering | Chengkai Liu et.al. | 2502.07303 | link |
2025-02-11 | Generation of Drug-Induced Cardiac Reactions towards Virtual Clinical Trials | Qian Shao et.al. | 2502.07297 | null |
2025-02-11 | Small Language Model Makes an Effective Long Text Extractor | Yelin Chen et.al. | 2502.07286 | link |
2025-02-11 | Articulate That Object Part (ATOP): 3D Part Articulation from Text and Motion Personalization | Aditya Vora et.al. | 2502.07278 | null |
2025-02-11 | Cost-Efficient Continual Learning with Sufficient Exemplar Memory | Dongkyu Cho et.al. | 2502.07274 | null |
2025-02-11 | GENERator: A Long-Context Generative Genomic Foundation Model | Wei Wu et.al. | 2502.07272 | null |
2025-02-11 | When More is Less: Understanding Chain-of-Thought Length in LLMs | Yuyang Wu et.al. | 2502.07266 | null |
2025-02-11 | DrugImproverGPT: A Large Language Model for Drug Optimization with Fine-Tuning via Structured Policy Optimization | Xuefeng Liu et.al. | 2502.07237 | null |
2025-02-11 | A Memory Efficient Randomized Subspace Optimization Method for Training Large Language Models | Yiming Chen et.al. | 2502.07222 | null |
2025-02-11 | MLLM4PUE: Toward Universal Embeddings in Computational Pathology through Multimodal LLMs | Qifeng Zhou et.al. | 2502.07221 | null |
2025-02-11 | LUNAR: LLM Unlearning via Neural Activation Redirection | William F. Shen et.al. | 2502.07218 | null |
2025-02-11 | Playmate: Flexible Control of Portrait Animation via 3D-Implicit Space Guided Diffusion | Xingpei Ma et.al. | 2502.07203 | null |
2025-02-11 | Provably Efficient RLHF Pipeline: A Unified View from Contextual Bandits | Long-Fei Li et.al. | 2502.07193 | link |
2025-02-11 | Bag of Tricks for Inference-time Computation of LLM Reasoning | Fan Liu et.al. | 2502.07191 | null |
2025-02-11 | A Large-Scale Benchmark for Vietnamese Sentence Paraphrases | Sang Quang Nguyen et.al. | 2502.07188 | link |
2025-02-11 | Refine Knowledge of Large Language Models via Adaptive Contrastive Learning | Yinghui Li et.al. | 2502.07184 | null |
2025-02-11 | Does Training on Synthetic Data Make Models Less Robust? | Lingze Zhang et.al. | 2502.07164 | null |
2025-02-11 | Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning | Feng Chen et.al. | 2502.07154 | link |
2025-02-11 | Ask Patients with Patience: Enabling LLMs for Human-Centric Medical Dialogue with Grounded Reasoning | Jiayuan Zhu et.al. | 2502.07143 | null |
2025-02-11 | Language-TPP: Integrating Temporal Point Processes with Language Models for Event Analysis | Quyu Kong et.al. | 2502.07139 | null |
2025-02-10 | Cardiverse: Harnessing LLMs for Novel Card Game Prototyping | Danrui Li et.al. | 2502.07128 | null |
2025-02-10 | Structural Reformation of Large Language Model Neuron Encapsulation for Divergent Information Aggregation | Denis Bakushev et.al. | 2502.07124 | null |
2025-02-10 | Online Scheduling for LLM Inference with KV Cache Constraints | Patrick Jaillet et.al. | 2502.07115 | null |
2025-02-10 | Generative Distribution Prediction: A Unified Approach to Multimodal Learning | Xinyu Tian et.al. | 2502.07090 | null |
2025-02-10 | Evaluating the Systematic Reasoning Abilities of Large Language Models through Graph Coloring | Alex Heyman et.al. | 2502.07087 | link |
2025-02-10 | MPFBench: A Large Scale Dataset for SciML of Multi-Phase-Flows: Droplet and Bubble Dynamics | Mehdi Shadkhah et.al. | 2502.07080 | null |
2025-02-10 | Multi-turn Evaluation of Anthropomorphic Behaviours in Large Language Models | Lujain Ibrahim et.al. | 2502.07077 | null |
2025-02-10 | IRepair: An Intent-Aware Approach to Repair Data-Driven Errors in Large Language Models | Sayem Mohammad Imtiaz et.al. | 2502.07072 | null |
2025-02-10 | Specializing Large Language Models to Simulate Survey Response Distributions for Global Populations | Yong Cao et.al. | 2502.07068 | link |
2025-02-10 | Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT | Dongyang Liu et.al. | 2502.06782 | null |
2025-02-10 | Enhancing Performance of Explainable AI Models with Constrained Concept Refinement | Geyu Liang et.al. | 2502.06775 | null |
2025-02-10 | Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions | Jaeyeon Kim et.al. | 2502.06768 | null |
2025-02-10 | Rationalization Models for Text-to-SQL | Gaetano Rossiello et.al. | 2502.06759 | null |
2025-02-10 | Accelerating Data Processing and Benchmarking of AI Models for Pathology | Andrew Zhang et.al. | 2502.06750 | link |
2025-02-10 | Gradient Multi-Normalization for Stateless and Scalable LLM Training | Meyer Scetbon et.al. | 2502.06742 | null |
2025-02-10 | VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data | Thomas Zeng et.al. | 2502.06737 | null |
2025-02-10 | Señorita-2M: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists | Bojia Zi et.al. | 2502.06734 | null |
2025-02-10 | Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining | Daouda Sow et.al. | 2502.06733 | null |
2025-02-10 | Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling | Runze Liu et.al. | 2502.06703 | link |
2025-02-10 | No Trick, No Treat: Pursuits and Challenges Towards Simulation-free Training of Neural Samplers | Jiajun He et.al. | 2502.06685 | null |
2025-02-10 | EquiTabPFN: A Target-Permutation Equivariant Prior Fitted Networks | Michael Arbel et.al. | 2502.06684 | null |
2025-02-10 | Boosting Self-Efficacy and Performance of Large Language Models via Verbal Efficacy Stimulations | Rui Chen et.al. | 2502.06669 | null |
2025-02-10 | Automatic Evaluation of Healthcare LLMs Beyond Question-Answering | Anna Arias-Duart et.al. | 2502.06666 | null |
2025-02-10 | Evaluation of Deep Audio Representations for Hearables | Fabian Gröger et.al. | 2502.06664 | null |
2025-02-10 | EfficientLLM: Scalable Pruning-Aware Pretraining for Architecture-Agnostic Edge Language Models | Xingrun Xing et.al. | 2502.06663 | null |
2025-02-10 | Unbiased Evaluation of Large Language Models from a Causal Perspective | Meilin Chen et.al. | 2502.06655 | null |
2025-02-10 | In-Context Learning (and Unlearning) of Length Biases | Stephanie Schoch et.al. | 2502.06653 | null |
2025-02-10 | Transparent NLP: Using RAG and LLM Alignment for Privacy Q&A | Anna Leschanowsky et.al. | 2502.06652 | null |
2025-02-10 | Automatic Annotation Augmentation Boosts Translation between Molecules and Natural Language | Zhiqiang Zhong et.al. | 2502.06634 | null |
2025-02-10 | Combining Large Language Models with Static Analyzers for Code Review Generation | Imen Jaoua et.al. | 2502.06633 | null |
2025-02-10 | Multi-Scale Feature Fusion with Image-Driven Spatial Integration for Left Atrium Segmentation from Cardiac MRI Images | Bipasha Kundu et.al. | 2502.06615 | null |
2025-02-10 | A Large-scale AI-generated Image Inpainting Benchmark | Paschalis Giakoumoglou et.al. | 2502.06593 | null |
2025-02-10 | Hephaestus: Improving Fundamental Agent Capabilities of Large Language Models through Continual Pre-Training | Yuchen Zhuang et.al. | 2502.06589 | null |
2025-02-10 | A Survey on Video Analytics in Cloud-Edge-Terminal Collaborative Systems | Linxiao Gong et.al. | 2502.06581 | null |
2025-02-10 | LawGPT: Knowledge-Guided Data Generation and Its Application to Legal LLM | Zhi Zhou et.al. | 2502.06572 | link |
2025-02-10 | Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation | Chengwen Qi et.al. | 2502.06563 | null |
2025-02-10 | Is API Access to LLMs Useful for Generating Private Synthetic Tabular Data? | Marika Swanberg et.al. | 2502.06555 | null |
2025-02-10 | Efficient Scientific Full Text Classification: The Case of EICAT Impact Assessments | Marc Felix Brinner et.al. | 2502.06551 | null |
2025-02-10 | Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning | Jean Vassoyan et.al. | 2502.06533 | null |
2025-02-10 | Properties of Wasserstein Gradient Flows for the Sliced-Wasserstein Distance | Christophe Vauthier et.al. | 2502.06525 | null |
2025-02-10 | GuideLLM: Exploring LLM-Guided Conversation with Applications in Autobiography Interviewing | Jinhao Duan et.al. | 2502.06494 | null |
2025-02-10 | Recent Advances in Discrete Speech Tokens: A Review | Yiwei Guo et.al. | 2502.06490 | null |
2025-02-10 | Adaptive Prompting: Ad-hoc Prompt Composition for Social Bias Detection | Maximilian Spliethöver et.al. | 2502.06487 | null |
2025-02-10 | WyckoffDiff - A Generative Diffusion Model for Crystal Symmetry | Filip Ekström Kelvinius et.al. | 2502.06485 | null |
2025-02-10 | UniMoD: Efficient Unified Multimodal Transformers with Mixture-of-Depths | Weijia Mao et.al. | 2502.06474 | null |
2025-02-10 | KARMA: Leveraging Multi-Agent LLMs for Automated Knowledge Graph Enrichment | Yuxing Lu et.al. | 2502.06472 | link |
2025-02-10 | A Survey of Theory of Mind in Large Language Models: Evaluations, Representations, and Safety Risks | Hieu Minh "Jord" Nguyen et.al. | 2502.06470 | null |
2025-02-10 | MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations | Kaixuan Huang et.al. | 2502.06453 | null |
2025-02-10 | FEMBA: Efficient and Scalable EEG Analysis with a Bidirectional Mamba Foundation Model | Anna Tegon et.al. | 2502.06438 | null |
2025-02-10 | Prompt-SID: Learning Structural Representation Prompt via Latent Diffusion for Single-Image Denoising | Huaqiu Li et.al. | 2502.06432 | null |
2025-02-10 | CoS: Chain-of-Shot Prompting for Long Video Understanding | Jian Hu et.al. | 2502.06428 | null |
2025-02-10 | Generating Privacy-Preserving Personalized Advice with Zero-Knowledge Proofs and LLMs | Hiroki Watanabe et.al. | 2502.06425 | null |
2025-02-10 | Occ-LLM: Enhancing Autonomous Driving with Occupancy-Based Large Language Models | Tianshuo Xu et.al. | 2502.06419 | null |
2025-02-10 | Systematic Outliers in Large Language Models | Yongqi An et.al. | 2502.06415 | null |
2025-02-10 | AppVLM: A Lightweight Vision Language Model for Online App Control | Georgios Papoudakis et.al. | 2502.06395 | null |
2025-02-10 | How Humans Help LLMs: Assessing and Incentivizing Human Preference Annotators | Shang Liu et.al. | 2502.06387 | null |
2025-02-10 | Simulation as Reality? The Effectiveness of LLM-Generated Data in Open-ended Question Assessment | Long Zhang et.al. | 2502.06371 | null |
2025-02-10 | Calibrating LLMs with Information-Theoretic Evidential Deep Learning | Yawei Li et.al. | 2502.06351 | link |
2025-02-10 | Can AI Examine Novelty of Patents?: Novelty Evaluation Based on the Correspondence between Patent Claim and Prior Art | Hayato Ikoma et.al. | 2502.06316 | null |
2025-02-10 | Latent Convergence Modulation in Large Language Models: A Novel Approach to Iterative Contextual Realignment | Patricia Porretta et.al. | 2502.06302 | null |
2025-02-10 | SeaExam and SeaBench: Benchmarking LLMs with Local Multilingual Questions in Southeast Asia | Chaoqun Liu et.al. | 2502.06298 | null |
2025-02-10 | Is an Ultra Large Natural Image-Based Foundation Model Superior to a Retina-Specific Model for Detecting Ocular and Systemic Diseases? | Qingshan Hou et.al. | 2502.06289 | null |
2025-02-10 | Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE | Haiduo Huang et.al. | 2502.06282 | link |
2025-02-10 | DebateBench: A Challenging Long Context Reasoning Benchmark For Large Language Models | Utkarsh Tiwari et.al. | 2502.06279 | null |
2025-02-10 | Emergent Response Planning in LLM | Zhichen Dong et.al. | 2502.06258 | null |
2025-02-10 | K-ON: Stacking Knowledge On the Head Layer of Large Language Model | Lingbing Guo et.al. | 2502.06257 | null |
2025-02-10 | Find Central Dogma Again | Wang Liang et.al. | 2502.06253 | null |
2025-02-10 | Amplifying Minority Voices: AI-Mediated Devil's Advocate System for Inclusive Group Decision-Making | Soohwan Lee et.al. | 2502.06251 | null |
2025-02-10 | PiKE: Adaptive Data Mixing for Multi-Task Learning Under Low Gradient Conflicts | Zeman Li et.al. | 2502.06244 | null |
2025-02-10 | Fully Exploiting Vision Foundation Model's Profound Prior Knowledge for Generalizable RGB-Depth Driving Scene Parsing | Sicen Guo et.al. | 2502.06219 | null |
2025-02-10 | LessLeak-Bench: A First Investigation of Data Leakage in LLMs Across 83 Software Engineering Benchmarks | Xin Zhou et.al. | 2502.06215 | null |
2025-02-10 | Unveiling the Capabilities of Large Language Models in Detecting Offensive Language with Annotation Disagreement | Junyu Lu et.al. | 2502.06207 | null |
2025-02-10 | C-3PO: Compact Plug-and-Play Proxy Optimization to Achieve Human-like Retrieval-Augmented Generation | Guoxin Chen et.al. | 2502.06205 | null |
2025-02-10 | Non-literal Understanding of Number Words by Language Models | Polina Tsvilodub et.al. | 2502.06204 | null |
2025-02-10 | Timing Matters: How Using LLMs at Different Timings Influences Writers' Perceptions and Ideation Outcomes in AI-Assisted Ideation | Peinuan Qin et.al. | 2502.06197 | null |
2025-02-10 | Can LLMs Replace Human Evaluators? An Empirical Study of LLM-as-a-Judge in Software Engineering | Ruiqi Wang et.al. | 2502.06193 | null |
2025-02-10 | Uncertainty-Aware Adaptation of Large Language Models for Protein-Protein Interaction Analysis | Sanket Jantre et.al. | 2502.06173 | null |
2025-02-10 | A Data-Efficient Pan-Tumor Foundation Model for Oncology CT Interpretation | Wenhui Lei et.al. | 2502.06171 | null |
2025-02-10 | Universal Approximation of Visual Autoregressive Transformers | Yifang Chen et.al. | 2502.06167 | null |
2025-02-10 | Scaling Public Health Text Annotation: Zero-Shot Learning vs. Crowdsourcing for Improved Efficiency and Labeling Accuracy | Kamyar Kazari et.al. | 2502.06150 | null |
2025-02-10 | Optimizing Knowledge Integration in Retrieval-Augmented Generation with Self-Selection | Yan Weng et.al. | 2502.06148 | null |
2025-02-10 | LegalViz: Legal Text Visualization by Text To Diagram Generation | Eri Onami et.al. | 2502.06147 | null |
2025-02-10 | LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs | Sumin An et.al. | 2502.06139 | null |
2025-02-10 | Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models | Ce Zhang et.al. | 2502.06130 | null |
2025-02-10 | Foundation Model of Electronic Medical Records for Adaptive Risk Estimation | Pawel Renc et.al. | 2502.06124 | null |
2025-02-10 | Task-driven Layerwise Additive Activation Intervention | Hieu Trung Nguyen et.al. | 2502.06115 | null |
2025-02-10 | CSR-Bench: Benchmarking LLM Agents in Deployment of Computer Science Research Repositories | Yijia Xiao et.al. | 2502.06111 | null |
2025-02-10 | RALLRec: Improving Retrieval Augmented Large Language Model Recommendation with Representation Learning | Jian Xu et.al. | 2502.06101 | link |
2025-02-10 | ConMeC: A Dataset for Metonymy Resolution with Common Nouns | Saptarshi Ghosh et.al. | 2502.06087 | link |
2025-02-10 | Physics-Guided Foundation Model for Scientific Discovery: An Application to Aquatic Science | Runlong Yu et.al. | 2502.06084 | link |
2025-02-10 | Debiasing Guidance for Discrete Diffusion with Sequential Monte Carlo | Cheuk Kit Lee et.al. | 2502.06079 | null |
2025-02-09 | Deconstructing Depression Stigma: Integrating AI-driven Data Collection and Analysis with Causal Knowledge Graphs | Han Meng et.al. | 2502.06075 | null |
2025-02-09 | Allegro-FM: Towards Equivariant Foundation Model for Exascale Molecular Dynamics Simulations | Ken-ichi Nomura et.al. | 2502.06073 | null |
2025-02-09 | Benchmarking Prompt Sensitivity in Large Language Models | Amirhossein Razavi et.al. | 2502.06065 | null |
2025-02-09 | Online Reward-Weighted Fine-Tuning of Flow Matching with Wasserstein Regularization | Jiajun Fan et.al. | 2502.06061 | null |
2025-02-09 | Benchmarking Prompt Engineering Techniques for Secure Code Generation with GPT Models | Marc Bruni et.al. | 2502.06039 | null |
2025-02-09 | Investigating Compositional Reasoning in Time Series Foundation Models | Willa Potosnak et.al. | 2502.06037 | link |
2025-02-09 | A Multimodal PDE Foundation Model for Prediction and Scientific Text Descriptions | Elisa Negrini et.al. | 2502.06026 | link |
2025-02-09 | Dual Caption Preference Optimization for Diffusion Models | Amir Saeidi et.al. | 2502.06023 | null |
2025-02-09 | Temporal Working Memory: Query-Guided Segment Refinement for Enhanced Multimodal Understanding | Xingjian Diao et.al. | 2502.06020 | link |
2025-02-09 | Media Bias Detector: Designing and Implementing a Tool for Real-Time Selection and Framing Bias Analysis in News Coverage | Jenny S Wang et.al. | 2502.06009 | null |
2025-02-09 | Analysis of LLM as a grammatical feature tagger for African American English | Rahul Porwal et.al. | 2502.06004 | null |
2025-02-09 | HamRaz: A Culture-Based Persian Conversation Dataset for Person-Centered Therapy Using LLM Agents | Mohammad Amin Abbasi et.al. | 2502.05982 | null |
2025-02-09 | | Saaketh Narayan et.al. | 2502.05967 | null |
2025-02-09 | Redefining Robot Generalization Through Interactive Intelligence | Sharmita Dey et.al. | 2502.05963 | null |
2025-02-09 | MetaChain: A Fully-Automated and Zero-Code Framework for LLM Agents | Jiabin Tang et.al. | 2502.05957 | null |
2025-02-09 | Cyri: A Conversational AI-based Assistant for Supporting the Human User in Detecting and Responding to Phishing Attacks | Antonio La Torre et.al. | 2502.05951 | null |
2025-02-09 | Acceleration Multiple Heads Decoding for LLM via Dynamic Tree Attention | Zhendong Zhang et.al. | 2502.05947 | null |
2025-02-09 | "Let the AI conspiracy begin..." Language Model coordination is just one inference-intervention away | Paul Darm et.al. | 2502.05945 | null |
2025-02-07 | Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy | Yunhang Shen et.al. | 2502.05177 | link |
2025-02-07 | Fillerbuster: Multi-View Scene Completion for Casual Captures | Ethan Weber et.al. | 2502.05175 | null |
2025-02-07 | NoLiMa: Long-Context Evaluation Beyond Literal Matching | Ali Modarressi et.al. | 2502.05167 | null |
2025-02-07 | Multitwine: Multi-Object Compositing with Text and Layout Control | Gemma Canet Tarrés et.al. | 2502.05165 | null |
2025-02-07 | DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails | Yihe Deng et.al. | 2502.05163 | link |
2025-02-07 | A Lightweight Method to Disrupt Memorized Sequences in LLM | Parjanya Prajakta Prashant et.al. | 2502.05159 | null |
2025-02-07 | Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation | Steffen Eger et.al. | 2502.05151 | null |
2025-02-07 | CodeSCM: Causal Analysis for Multi-Modal Code Generation | Mukur Gupta et.al. | 2502.05150 | link |
2025-02-07 | An Annotated Reading of 'The Singer of Tales' in the LLM Era | Kush R. Varshney et.al. | 2502.05148 | null |
2025-02-07 | Chest X-ray Foundation Model with Global and Local Representations Integration | Zefan Yang et.al. | 2502.05142 | link |
2025-02-07 | Latent Swap Joint Diffusion for Long-Form Audio Generation | Yusheng Dai et.al. | 2502.05130 | null |
2025-02-07 | Refining Integration-by-Parts Reduction of Feynman Integrals with Machine Learning | Matt von Hippel et.al. | 2502.05121 | null |
2025-02-07 | Flexible and Efficient Grammar-Constrained Decoding | Kanghee Park et.al. | 2502.05111 | null |
2025-02-07 | Lost in Time: Clock and Calendar Understanding Challenges in Multimodal LLMs | Rohit Saxena et.al. | 2502.05092 | null |
2025-02-07 | Mitigating Unintended Memorization with LoRA in Federated Learning for LLMs | Thierry Bossy et.al. | 2502.05087 | link |
2025-02-07 | Causality can systematically address the monsters under the bench(marks) | Felix Leeb et.al. | 2502.05085 | null |
2025-02-07 | ChallengeMe: An Adversarial Learning-enabled Text Summarization Framework | Xiaoyu Deng et.al. | 2502.05084 | null |
2025-02-07 | Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures | Tushar Pandey et.al. | 2502.05078 | link |
2025-02-07 | Beautiful Images, Toxic Words: Understanding and Addressing Offensive Text in Generated Images | Aditya Kumar et.al. | 2502.05066 | link |
2025-02-07 | nvAgent: Automated Data Visualization from Natural Language via Collaborative Agent Workflow | Geliang Ouyang et.al. | 2502.05036 | link |
2025-02-07 | Prospects for detecting generic fast-time features in the neutrino lightcurve of nearby supernovae in neutrino telescopes | Jakob Beise et.al. | 2502.05024 | null |
2025-02-07 | QuEST: Stable Training of LLMs with 1-Bit Weights and Activations | Andrei Panferov et.al. | 2502.05003 | link |
2025-02-07 | Aligning Black-box Language Models with Human Judgments | Gerrit J. J. van den Burg et.al. | 2502.04997 | null |
2025-02-07 | C2GM: Cascading Conditional Generation of Multi-scale Maps from Remote Sensing Images Constrained by Geographic Features | Chenxing Sun et.al. | 2502.04991 | null |
2025-02-07 | MoGraphGPT: Creating Interactive Scenes Using Modular LLM and Graphical Control | Hui Ye et.al. | 2502.04983 | null |
2025-02-07 | Enhancing Pre-Trained Decision Transformers with Prompt-Tuning Bandits | Finn Rietz et.al. | 2502.04979 | null |
2025-02-07 | Towards Multimodal Empathetic Response Generation: A Rich Text-Speech-Vision Avatar-based Benchmark | Han Zhang et.al. | 2502.04976 | null |
2025-02-07 | CoCoA: A Generalized Approach to Uncertainty Quantification by Integrating Confidence and Consistency of LLM Outputs | Roman Vashurin et.al. | 2502.04964 | null |
2025-02-07 | The Rising Threat to Emerging AI-Powered Search Engines | Zeren Luo et.al. | 2502.04951 | null |
2025-02-07 | Mobile Network-specialized Large Language Models for 6G: Architectures, Innovations, Challenges, and Future Trends | Abdelaali Chaoub et.al. | 2502.04933 | null |
2025-02-07 | Generative-enhanced optimization for knapsack problems: an industry-relevant study | Yelyzaveta Vodovozova et.al. | 2502.04928 | null |
2025-02-07 | Classification or Prompting: A Case Study on Legal Requirements Traceability | Romina Etezadi et.al. | 2502.04916 | null |
2025-02-07 | Goku: Flow Based Video Generative Foundation Models | Shoufa Chen et.al. | 2502.04896 | null |
2025-02-07 | A Foundational Brain Dynamics Model via Stochastic Optimal Control | Joonhyeong Park et.al. | 2502.04892 | null |
2025-02-07 | Training-free Task-oriented Grasp Generation | Jiaming Wang et.al. | 2502.04873 | null |
2025-02-07 | Advancing Wasserstein Convergence Analysis of Score-Based Models: Insights from Discretization and Second-Order Acceleration | Yifeng Yu et.al. | 2502.04849 | null |
2025-02-07 | Developmentally-plausible Working Memory Shapes a Critical Period for Language Acquisition | Masato Mita et.al. | 2502.04795 | null |
2025-02-07 | S | Yuting Zeng et.al. | 2502.04790 | null |
2025-02-07 | Probing Internal Representations of Multi-Word Verbs in Large Language Models | Hassane Kissane et.al. | 2502.04789 | null |
2025-02-07 | Enhancing SQL Injection Detection and Prevention Using Generative Models | Naga Sai Dasari et.al. | 2502.04786 | null |
2025-02-07 | SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning | Wanjia Zhao et.al. | 2502.04780 | link |
2025-02-07 | SeDi-Instruct: Enhancing Alignment of Language Models through Self-Directed Instruction Generation | Jungwoo Kim et.al. | 2502.04774 | null |
2025-02-07 | Enhancing Phishing Email Identification with Large Language Models | Catherine Lee et.al. | 2502.04759 | null |
2025-02-07 | Concept Navigation and Classification via Open Source Large Language Model Processing | Maël Kubli et.al. | 2502.04756 | null |
2025-02-07 | Every Software as an Agent: Blueprint and Case Study | Mengwei Xu et.al. | 2502.04747 | null |
2025-02-07 | PhyloVAE: Unsupervised Learning of Phylogenetic Trees via Variational Autoencoders | Tianyu Xie et.al. | 2502.04730 | link |
2025-02-07 | Generating Symbolic World Models via Test-time Scaling of Large Language Models | Zhouliang Yu et.al. | 2502.04728 | link |
2025-02-07 | Evaluating Text Style Transfer Evaluation: Are There Any Reliable Metrics? | Sourabrata Mukherjee et.al. | 2502.04718 | null |
2025-02-07 | Enhancing Impression Change Prediction in Speed Dating Simulations Based on Speakers' Personalities | Kazuya Matsuo et.al. | 2502.04706 | null |
2025-02-07 | STRIDE: Automating Reward Design, Deep Reinforcement Learning Training and Feedback Optimization in Humanoid Robotics Locomotion | Zhenwei Wu et.al. | 2502.04692 | null |
2025-02-07 | ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning | Yuwei Yin et.al. | 2502.04689 | link |
2025-02-07 | M-IFEval: Multilingual Instruction-Following Evaluation | Antoine Dussolle et.al. | 2502.04688 | link |
2025-02-07 | Learning Strategic Language Agents in the Werewolf Game with Iterative Latent Space Policy Optimization | Zelai Xu et.al. | 2502.04686 | null |
2025-02-07 | G2PDiffusion: Genotype-to-Phenotype Prediction with Diffusion Models | Mengdi Liu et.al. | 2502.04684 | null |
2025-02-07 | CALF-SBM: A Covariate-Assisted Latent Factor Stochastic Block Model | Sydney Louit et.al. | 2502.04681 | null |
2025-02-07 | LLM Query Scheduling with Prefix Reuse and Latency Constraints | Gregory Dexter et.al. | 2502.04677 | null |
2025-02-07 | AdParaphrase: Paraphrase Dataset for Analyzing Linguistic Features toward Generating Attractive Ad Texts | Soichiro Murakami et.al. | 2502.04674 | link |
2025-02-07 | Unveiling the Mechanisms of Explicit CoT Training: How Chain-of-Thought Enhances Reasoning Generalization | Xinhao Yao et.al. | 2502.04667 | link |
2025-02-07 | Enhancing Health Information Retrieval with RAG by Prioritizing Topical Relevance and Factual Accuracy | Rishabh Uapadhyay et.al. | 2502.04666 | null |
2025-02-07 | Importance Sampling via Score-based Generative Models | Heasung Kim et.al. | 2502.04646 | null |
2025-02-07 | Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research | Junde Wu et.al. | 2502.04644 | link |
2025-02-07 | Confidence Elicitation: A New Attack Vector for Large Language Models | Brian Formento et.al. | 2502.04643 | null |
2025-02-07 | Contrastive Learning-Enhanced Large Language Models for Monolith-to-Microservice Decomposition | Khaled Sellami et.al. | 2502.04604 | null |
2025-02-07 | Extracting and Understanding the Superficial Knowledge in Alignment | Runjin Chen et.al. | 2502.04602 | link |
2025-02-07 | The | Mohammad Reza Rezaei et.al. | 2502.04593 | null |
2025-02-07 | Position-aware Automatic Circuit Discovery | Tal Haklay et.al. | 2502.04577 | link |
2025-02-06 | My LLM might Mimic AAE -- But When Should it? | Sandra C. Sandoval et.al. | 2502.04564 | link |
2025-02-06 | Speeding up Speculative Decoding via Approximate Verification | Meiyu Zhong et.al. | 2502.04557 | null |
2025-02-06 | TruthFlow: Truthful LLM Generation via Representation Flow Correction | Hanyu Wang et.al. | 2502.04556 | null |
2025-02-06 | Contextual Gradient Flow Modeling for Large Language Model Generalization in Multi-Scale Feature Spaces | Daphne Quillington et.al. | 2502.04548 | null |
2025-02-06 | Group-Adaptive Threshold Optimization for Robust AI-Generated Text Detection | Minseok Jung et.al. | 2502.04528 | null |
2025-02-06 | Safety is Essential for Responsible Open-Ended Systems | Ivaxi Sheth et.al. | 2502.04512 | null |
2025-02-06 | ULPT: Prompt Tuning with Ultra-Low-Dimensional Optimization | Zijun Wu et.al. | 2502.04501 | null |
2025-02-06 | Verifiable Format Control for Large Language Model Generations | Zhaoyang Wang et.al. | 2502.04498 | null |
2025-02-06 | Multi-Agent Reinforcement Learning with Focal Diversity Optimization | Selim Furkan Tekin et.al. | 2502.04492 | link |
2025-02-06 | Building A Unified AI-centric Language System: analysis, framework and future work | Edward Hong Wang et.al. | 2502.04488 | null |
2025-02-06 | Active Task Disambiguation with LLMs | Katarzyna Kobalczyk et.al. | 2502.04485 | link |
2025-02-06 | The ML Supply Chain in the Era of Software 2.0: Lessons Learned from Hugging Face | Trevor Stalnaker et.al. | 2502.04484 | null |
2025-02-06 | Near-Optimal Sample Complexity for MDPs via Anchoring | Jongmin Lee et.al. | 2502.04477 | null |
2025-02-06 | ADIFF: Explaining audio difference using natural language | Soham Deshmukh et.al. | 2502.04476 | link |
2025-02-06 | Augmented Conditioning Is Enough For Effective Training Image Generation | Jiahui Chen et.al. | 2502.04475 | null |
2025-02-06 | Iterative Importance Fine-tuning of Diffusion Models | Alexander Denker et.al. | 2502.04468 | null |
2025-02-06 | FocalCodec: Low-Bitrate Speech Coding via Focal Modulation Networks | Luca Della Libera et.al. | 2502.04465 | null |
2025-02-06 | Training Language Models to Reason Efficiently | Daman Arora et.al. | 2502.04463 | link |
2025-02-06 | Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization | Yu-Neng Chuang et.al. | 2502.04428 | null |
2025-02-06 | Decoding AI Judgment: How LLMs Assess News Credibility and Bias | Edoardo Loru et.al. | 2502.04426 | null |
2025-02-06 | EmoBench-M: Benchmarking Emotional Intelligence for Multimodal Large Language Models | He Hu et.al. | 2502.04424 | null |
2025-02-06 | Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment | Zuyan Liu et.al. | 2502.04328 | link |
2025-02-06 | Can Grammarly and ChatGPT accelerate language change? AI-powered technologies and their impact on the English language: wordiness vs. conciseness | Karolina Rudnicka et.al. | 2502.04324 | null |
2025-02-06 | Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions | Yik Siu Chan et.al. | 2502.04322 | link |
2025-02-06 | ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features | Alec Helbling et.al. | 2502.04320 | link |
2025-02-06 | sshELF: Single-Shot Hierarchical Extrapolation of Latent Features for 3D Reconstruction from Sparse-Views | Eyvaz Najafli et.al. | 2502.04318 | null |
2025-02-06 | ChamaleonLLM: Batch-Aware Dynamic Low-Rank Adaptation via Inference-Time Clusters | Kamer Ali Yuksel et.al. | 2502.04315 | link |
2025-02-06 | ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization | Yinjie Wang et.al. | 2502.04306 | link |
2025-02-06 | MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation | Jinbo Xing et.al. | 2502.04299 | null |
2025-02-06 | Learning Real-World Action-Video Dynamics with Heterogeneous Masked Autoregression | Lirui Wang et.al. | 2502.04296 | null |
2025-02-06 | Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization | Yuanye Liu et.al. | 2502.04295 | link |
2025-02-06 | PILAF: Optimal Human Preference Sampling for Reward Modeling | Yunzhen Feng et.al. | 2502.04270 | null |
2025-02-06 | Efficient Randomized Experiments Using Foundation Models | Piersilvio De Bartolomeis et.al. | 2502.04262 | link |
2025-02-06 | Realistic Image-to-Image Machine Unlearning via Decoupling and Knowledge Retention | Ayush K. Varshney et.al. | 2502.04260 | null |
2025-02-06 | MAGA: MAssive Genre-Audience Reformulation to Pretraining Corpus Expansion | Xintong Hao et.al. | 2502.04235 | null |
2025-02-06 | Can LLMs Hack Enterprise Networks? Autonomous Assumed Breach Penetration-Testing Active Directory Networks | Andreas Happe et.al. | 2502.04227 | null |
2025-02-06 | Keep It Light! Simplifying Image Clustering Via Text-Free Adapters | Yicen Li et.al. | 2502.04226 | null |
2025-02-06 | Éclair -- Extracting Content and Layout with Integrated Reading Order for Documents | Ilia Karmanov et.al. | 2502.04223 | null |
2025-02-06 | Sports and Women's Sports: Gender Bias in Text Generation with Olympic Data | Laura Biester et.al. | 2502.04218 | null |
2025-02-06 | Algorithmic causal structure emerging through compression | Liang Wendong et.al. | 2502.04210 | null |
2025-02-06 | "Short-length" Adversarial Training Helps LLMs Defend "Long-length" Jailbreak Attacks: Theoretical and Empirical Evidence | Shaopeng Fu et.al. | 2502.04204 | link |
2025-02-06 | The Best Instruction-Tuning Data are Those That Fit | Dylan Zhang et.al. | 2502.04194 | null |
2025-02-06 | PixFoundation: Are We Heading in the Right Direction with Pixel-level Vision Foundation Models? | Mennatullah Siam et.al. | 2502.04192 | link |
2025-02-06 | Automated Microservice Pattern Instance Detection Using Infrastructure-as-Code Artifacts and Large Language Models | Carlos Eduardo Duarte et.al. | 2502.04188 | null |
2025-02-06 | Multi-agent Architecture Search via Agentic Supernet | Guibin Zhang et.al. | 2502.04180 | null |
2025-02-06 | MRAMG-Bench: A BeyondText Benchmark for Multimodal Retrieval-Augmented Multimodal Generation | Qinhan Yu et.al. | 2502.04176 | null |
2025-02-06 | Diffusion-based mass map reconstruction from weak lensing data | Supranta S. Boruah et.al. | 2502.04158 | null |
2025-02-06 | UltraIF: Advancing Instruction Following from the Wild | Kaikai An et.al. | 2502.04153 | null |
2025-02-06 | The Order Effect: Investigating Prompt Sensitivity in Closed-Source LLMs | Bryan Guan et.al. | 2502.04134 | null |
2025-02-06 | Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis | Zhen Ye et.al. | 2502.04128 | null |
2025-02-06 | Generative Adversarial Networks Bridging Art and Machine Intelligence | Junhao Song et.al. | 2502.04116 | null |
2025-02-06 | VTutor: An Open-Source SDK for Generative AI-Powered Animated Pedagogical Agents with Multi-Media Output | Eason Chen et.al. | 2502.04103 | null |
2025-02-06 | LLMs to Support a Domain Specific Knowledge Assistant | Maria-Flavia Lovin et.al. | 2502.04095 | null |
2025-02-06 | AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference | Qingyue Yang et.al. | 2502.04077 | null |
2025-02-06 | Content-Rich AIGC Video Quality Assessment via Intricate Text Alignment and Motion-Aware Consistency | Shangkun Sun et.al. | 2502.04076 | link |
2025-02-06 | Predicting Large Language Model Capabilities on Closed-Book QA Tasks Using Only Information Available Prior to Training | Changhao Jiang et.al. | 2502.04066 | null |
2025-02-06 | TQ-DiT: Efficient Time-Aware Quantization for Diffusion Transformers | Younghye Hwang et.al. | 2502.04056 | null |
2025-02-06 | Exploring Imbalanced Annotations for Effective In-Context Learning | Hongfu Gao et.al. | 2502.04037 | null |
2025-02-06 | Fine, I'll Merge It Myself: A Multi-Fidelity Framework for Automated Model Merging | Guinan Su et.al. | 2502.04030 | null |
2025-02-06 | Echo-Teddy: Preliminary Design and Development of Large Language Model-based Social Robot for Autistic Students | Unggi Lee et.al. | 2502.04029 | null |
2025-02-06 | Quantification of Biodiversity from Historical Survey Text with LLM-based Best-Worst Scaling | Thomas Haider et.al. | 2502.04022 | null |
2025-02-06 | Automating a Complete Software Test Process Using LLMs: An Automotive Case Study | Shuai Wang et.al. | 2502.04008 | null |
2025-02-06 | CAD-Editor: A Locate-then-Infill Framework with Automated Training Data Synthesis for Text-Based CAD Editing | Yu Yuan et.al. | 2502.03997 | null |
2025-02-06 | Ontology-Guided, Hybrid Prompt Learning for Generalization in Knowledge Graph Question Answering | Longquan Jiang et.al. | 2502.03992 | link |
2025-02-06 | Tight Bounds on Jensen's Gap: Novel Approach with Applications in Generative Modeling | Marcin Mazur et.al. | 2502.03988 | null |
2025-02-06 | MultiFloodSynth: Multi-Annotated Flood Synthetic Dataset Generation | YoonJe Kang et.al. | 2502.03966 | null |
2025-02-06 | MAQInstruct: Instruction-based Unified Event Relation Extraction | Jun Xu et.al. | 2502.03954 | null |
2025-02-06 | LR0.FM: Low-Resolution Zero-shot Classification Benchmark For Foundation Models | Priyank Pathak et.al. | 2502.03950 | link |
2025-02-06 | Afrispeech-Dialog: A Benchmark Dataset for Spontaneous English Conversations in Healthcare and Beyond | Mardhiyah Sanni et.al. | 2502.03945 | null |
2025-02-06 | Unravelling Causal Genetic Biomarkers of Alzheimer's Disease via Neuron to Gene-token Backtracking in Neural Architecture: A Groundbreaking Reverse-Gene-Finder Approach | Victor OK Li et.al. | 2502.03938 | null |
2025-02-06 | Quantifying Correlations of Machine Learning Models | Yuanyuan Li et.al. | 2502.03937 | link |
2025-02-06 | HEP-JEPA: A foundation model for collider physics using joint embedding predictive architecture | Jai Bardhan et.al. | 2502.03933 | null |
2025-02-06 | Experiments with Large Language Models on Retrieval-Augmented Generation for Closed-Source Simulation Software | Andreas Baumann et.al. | 2502.03916 | null |
2025-02-06 | No Free Lunch in Annotation either: An objective evaluation of foundation models for streamlining annotation in animal tracking | Emil Mededovic et.al. | 2502.03907 | link |
2025-02-06 | LeAP: Consistent multi-domain 3D labeling using Foundation Models | Simon Gebraad et.al. | 2502.03901 | null |
2025-02-06 | InfinitePOD: Building Datacenter-Scale High-Bandwidth Domain for LLM with Optical Circuit Switching Transceivers | Chenchen Shou et.al. | 2502.03885 | null |
2025-02-06 | Rank Also Matters: Hierarchical Configuration for Mixture of Adapter Experts in LLM Fine-Tuning | Peizhuang Cong et.al. | 2502.03884 | null |
2025-02-06 | BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation | Bo Pang et.al. | 2502.03860 | null |
2025-02-06 | PAGNet: Pluggable Adaptive Generative Networks for Information Completion in Multi-Agent Communication | Zhuohui Zhang et.al. | 2502.03845 | null |
2025-02-06 | Improving Natural Language Understanding for LLMs via Large-Scale Instruction Synthesis | Lin Yuan et.al. | 2502.03843 | null |
2025-02-06 | FairT2I: Mitigating Social Bias in Text-to-Image Generation via Large Language Model-Assisted Detection and Attribute Rebalancing | Jinya Sakurai et.al. | 2502.03826 | null |
2025-02-06 | Synthetic Poisoning Attacks: The Impact of Poisoned MRI Image on U-Net Brain Tumor Segmentation | Tianhao Li et.al. | 2502.03825 | null |
2025-02-06 | PsyPlay: Personality-Infused Role-Playing Conversational Agents | Tao Yang et.al. | 2502.03821 | null |
2025-02-06 | Large Language Models for Multi-Robot Systems: A Survey | Peihan Li et.al. | 2502.03814 | null |
2025-02-06 | Identify Critical KV Cache in LLM Inference from an Output Perturbation Perspective | Yuan Feng et.al. | 2502.03805 | link |
2025-02-06 | Understanding and Supporting Formal Email Exchange by Answering AI-Generated Questions | Yusuke Miura et.al. | 2502.03804 | null |
2025-02-06 | Enhancing Hallucination Detection through Noise Injection | Litian Liu et.al. | 2502.03799 | null |
2025-02-06 | Distribution learning via neural differential equations: minimal energy regularization and approximation theory | Youssef Marzouk et.al. | 2502.03795 | null |
2025-02-06 | It's All in The [MASK]: Simple Instruction-Tuning Enables BERT-like Masked Language Models As Generative Classifiers | Benjamin Clavié et.al. | 2502.03793 | null |
2025-02-06 | Iterate to Accelerate: A Unified Framework for Iterative Reasoning and Feedback Convergence | Jacob Fein-Ashley et.al. | 2502.03787 | null |
2025-02-06 | GistVis: Automatic Generation of Word-scale Visualizations from Data-rich Documents | Ruishi Zou et.al. | 2502.03784 | link |
2025-02-06 | Adaptive Semantic Prompt Caching with VectorQ | Luis Gaspar Schroeder et.al. | 2502.03771 | null |
2025-02-06 | Hierarchical Contextual Manifold Alignment for Structuring Latent Representations in Large Language Models | Meiquan Dong et.al. | 2502.03766 | null |
2025-02-06 | Rethinking the Residual Distribution of Locate-then-Editing Methods in Model Editing | Xiaopeng Li et.al. | 2502.03748 | null |
2025-02-06 | Speaking the Language of Teamwork: LLM-Guided Credit Assignment in Multi-Agent Reinforcement Learning | Muhan Lin et.al. | 2502.03723 | null |
2025-02-06 | Boosting Knowledge Graph-based Recommendations through Confidence-Aware Augmentation with Large Language Models | Rui Cai et.al. | 2502.03715 | null |
2025-02-06 | MultiQ&A: An Analysis in Measuring Robustness via Automated Crowdsourcing of Question Perturbations and Answers | Nicole Cho et.al. | 2502.03711 | null |
2025-02-06 | Aggregate and conquer: detecting and steering LLM concepts by combining nonlinear predictors over multiple layers | Daniel Beaglehole et.al. | 2502.03708 | null |
2025-02-06 | LLM Alignment as Retriever Optimization: An Information Retrieval Perspective | Bowen Jin et.al. | 2502.03699 | null |
2025-02-06 | A Comparison of DeepSeek and Other LLMs | Tianchen Gao et.al. | 2502.03688 | null |
2025-02-06 | Conditional Diffusion Models are Medical Image Classifiers that Provide Explainability and Uncertainty for Free | Gian Mario Favero et.al. | 2502.03687 | null |
2025-02-06 | Controlled LLM Decoding via Discrete Auto-regressive Biasing | Patrick Pynadath et.al. | 2502.03685 | null |
2025-02-05 | Reflection-Window Decoding: Text Generation with Selective Refinement | Zeyu Tang et.al. | 2502.03678 | null |
2025-02-05 | Advancing Reasoning in Large Language Models: Promising Methods and Approaches | Avinash Patil et.al. | 2502.03671 | null |
2025-02-05 | Unrealized Expectations: Comparing AI Methods vs Classical Algorithms for Maximum Independent Set | Yikai Wu et.al. | 2502.03669 | null |
2025-02-05 | Privacy-Preserving Generative Models: A Comprehensive Survey | Debalina Padariya et.al. | 2502.03668 | null |
2025-02-05 | Context-Preserving Gradient Modulation for Large Language Models: A Novel Approach to Semantic Consistency in Long-Form Text Generation | Nirola Kobanov et.al. | 2502.03643 | null |
2025-02-05 | SymmCD: Symmetry-Preserving Crystal Generation with Diffusion Models | Daniel Levy et.al. | 2502.03638 | link |
2025-02-05 | AdaPhish: AI-Powered Adaptive Defense and Education Resource Against Deceptive Emails | Rei Meguro et.al. | 2502.03622 | null |
2025-02-05 | Bilevel ZOFO: Bridging Parameter-Efficient and Zeroth-Order Techniques for Efficient LLM Fine-Tuning and Meta-Training | Reza Shirkavand et.al. | 2502.03604 | null |
2025-02-05 | HACK: Homomorphic Acceleration via Compression of the Key-Value Cache for Disaggregated LLM Inference | Zeyu Zhang et.al. | 2502.03589 | null |
2025-02-05 | A Mixed-Methods Evaluation of LLM-Based Chatbots for Menopause | Roshini Deva et.al. | 2502.03579 | null |
2025-02-05 | Code Simulation as a Proxy for High-order Tasks in Large Language Models | Emanuele La Malfa et.al. | 2502.03568 | null |
2025-02-05 | Kronecker Mask and Interpretive Prompts are Language-Action Video Learners | Jingyi Yang et.al. | 2502.03549 | link |
2025-02-05 | YINYANG-ALIGN: Benchmarking Contradictory Objectives and Proposing Multi-Objective Optimization based DPO for Text-to-Image Alignment | Amitava Das et.al. | 2502.03512 | null |
2025-02-05 | Do Large Language Model Benchmarks Test Reliability? | Joshua Vendrow et.al. | 2502.03461 | link |
2025-02-05 | Adapt-Pruner: Adaptive Structural Pruning for Efficient Small Language Model Training | Boyao Wang et.al. | 2502.03460 | null |
2025-02-05 | A Schema-Guided Reason-while-Retrieve framework for Reasoning on Scene Graphs with Large-Language-Models (LLMs) | Yiye Chen et.al. | 2502.03450 | null |
2025-02-05 | Dress-1-to-3: Single Image to Simulation-Ready 3D Outfit with Diffusion Prior and Differentiable Physics | Xuan Li et.al. | 2502.03449 | null |
2025-02-05 | BFS-Prover: Scalable Best-First Tree Search for LLM-based Automatic Theorem Proving | Ran Xin et.al. | 2502.03438 | null |
2025-02-05 | Taking a Big Step: Large Learning Rates in Denoising Score Matching Prevent Memorization | Yu-Han Wu et.al. | 2502.03435 | null |
2025-02-05 | On Fairness of Unified Multimodal Large Language Model for Image Generation | Ming Liu et.al. | 2502.03429 | null |
2025-02-05 | Harnessing Large Language Models for Curated Code Reviews | Oussama Ben Sghaier et.al. | 2502.03425 | link |
2025-02-05 | Can Text-to-Image Generative Models Accurately Depict Age? A Comparative Study on Synthetic Portrait Generation and Age Estimation | Alexey A. Novikov et.al. | 2502.03420 | null |
2025-02-05 | Think or Step-by-Step? UnZIPping the Black Box in Zero-Shot Prompts | Nikta Gohari Sadr et.al. | 2502.03418 | null |
2025-02-05 | SPRI: Aligning Large Language Models with Context-Situated Principles | Hongli Zhan et.al. | 2502.03397 | null |
2025-02-05 | Benchmarking Time Series Forecasting Models: From Statistical Techniques to Foundation Models in Real-World Applications | Issar Arab et.al. | 2502.03395 | null |
2025-02-05 | LIMO: Less is More for Reasoning | Yixin Ye et.al. | 2502.03387 | link |
2025-02-05 | Transformers and Their Roles as Time Series Foundation Models | Dennis Wu et.al. | 2502.03383 | null |
2025-02-05 | Demystifying Long Chain-of-Thought Reasoning in LLMs | Edward Yeo et.al. | 2502.03373 | link |
2025-02-05 | PalimpChat: Declarative and Interactive AI analytics | Chunwei Liu et.al. | 2502.03368 | null |
2025-02-05 | RadVLM: A Multitask Conversational Vision-Language Model for Radiology | Nicolas Deperrois et.al. | 2502.03333 | null |
2025-02-05 | ECM: A Unified Electronic Circuit Model for Explaining the Emergence of In-Context Learning and Chain-of-Thought in Large Language Model | Qiguang Chen et.al. | 2502.03325 | null |
2025-02-05 | Out-of-Distribution Detection using Synthetic Data Generation | Momin Abbas et.al. | 2502.03323 | null |
2025-02-05 | Simplifying Formal Proof-Generating Models with ChatGPT and Basic Searching Techniques | Sangjun Han et.al. | 2502.03321 | null |
2025-02-05 | Intent Representation Learning with Large Language Model for Recommendation | Yu Wang et.al. | 2502.03307 | link |
2025-02-05 | Harmony in Divergence: Towards Fast, Accurate, and Memory-efficient Zeroth-order LLM Fine-tuning | Qitao Tan et.al. | 2502.03304 | null |
2025-02-05 | MeDiSumQA: Patient-Oriented Question-Answer Generation from Discharge Letters | Amin Dada et.al. | 2502.03298 | null |
2025-02-05 | SymAgent: A Neural-Symbolic Self-Learning Agent Framework for Complex Reasoning over Knowledge Graphs | Ben Liu et.al. | 2502.03283 | null |
2025-02-05 | Posterior SBC: Simulation-Based Calibration Checking Conditional on Data | Teemu Säilynoja et.al. | 2502.03279 | link |
2025-02-05 | Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning | DiJia Su et.al. | 2502.03275 | null |
2025-02-05 | ZISVFM: Zero-Shot Object Instance Segmentation in Indoor Robotic Environments with Vision Foundation Models | Ying Zhang et.al. | 2502.03266 | link |
2025-02-05 | General Time-series Model for Universal Knowledge Representation of Multivariate Time-Series data | Cheng He et.al. | 2502.03264 | null |
2025-02-05 | CARROT: A Cost Aware Rate Optimal Router | Seamus Somerstep et.al. | 2502.03261 | null |
2025-02-05 | RiemannGFM: Learning a Graph Foundation Model from Riemannian Geometry | Li Sun et.al. | 2502.03251 | null |
2025-02-05 | Exploring the Security Threats of Knowledge Base Poisoning in Retrieval-Augmented Code Generation | Bo Lin et.al. | 2502.03233 | null |
2025-02-05 | Improve Decoding Factuality by Token-wise Cross Layer Entropy of Large Language Models | Jialiang Wu et.al. | 2502.03199 | null |
2025-02-05 | MaxInfo: A Training-Free Key-Frame Selection Method Using Maximum Volume for Enhanced Video Understanding | Pengyi Li et.al. | 2502.03183 | null |
2025-02-05 | PICBench: Benchmarking LLMs for Photonic Integrated Circuits Design | Yuchao Wu et.al. | 2502.03159 | null |
2025-02-05 | Strategizing with AI: Insights from a Beauty Contest Experiment | Iuliia Alekseenko et.al. | 2502.03158 | null |
2025-02-05 | Scalable In-Context Learning on Tabular Data via Retrieval-Augmented Large Language Models | Xumeng Wen et.al. | 2502.03147 | null |
2025-02-05 | Symmetry-Aware Bayesian Flow Networks for Crystal Generation | Laura Ruple et.al. | 2502.03146 | null |
2025-02-05 | Teaching Large Language Models Number-Focused Headline Generation With Key Element Rationales | Zhen Qian et.al. | 2502.03129 | null |
2025-02-05 | Metis: A Foundation Speech Generation Model with Masked Generative Pre-training | Yuancheng Wang et.al. | 2502.03128 | link |
2025-02-05 | Structured Token Retention and Computational Memory Paths in Large Language Models | Jonathan Delena et.al. | 2502.03102 | null |
2025-02-05 | Reveal the Mystery of DPO: The Connection between DPO and RL Algorithms | Xuerui Su et.al. | 2502.03095 | null |
2025-02-05 | Implementing Large Quantum Boltzmann Machines as Generative AI Models for Dataset Balancing | Salvatore Sinno et.al. | 2502.03086 | null |
2025-02-05 | IAO Prompting: Making Knowledge Flow Explicit in LLMs through Structured Reasoning Templates | Aissatou Diallo et.al. | 2502.03080 | null |
2025-02-05 | Poisson Flow Joint Model for Multiphase contrast-enhanced CT | Rongjun Ge et.al. | 2502.03079 | null |
2025-02-05 | Automatic Prompt Optimization Techniques: Exploring the Potential for Synthetic Data Generation | Nina Freise et.al. | 2502.03078 | null |
2025-02-05 | Optimizing Electric Vehicles Charging using Large Language Models and Graph Neural Networks | Stavros Orfanoudakis et.al. | 2502.03067 | null |
2025-02-05 | Understanding and Enhancing the Transferability of Jailbreaking Attacks | Runqi Lin et.al. | 2502.03052 | link |
2025-02-05 | RepLoRA: Reparameterizing Low-Rank Adaptation via the Perspective of Mixture of Experts | Tuan Truong et.al. | 2502.03044 | null |
2025-02-05 | Large Language Models Are Universal Recommendation Learners | Junguang Jiang et.al. | 2502.03041 | null |
2025-02-05 | FuXi- | Yufei Ye et.al. | 2502.03036 | null |
2025-02-05 | Knowledge Distillation from Large Language Models for Household Energy Modeling | Mohannad Takrouri et.al. | 2502.03034 | null |
2025-02-05 | Analyze Feature Flow to Enhance Interpretation and Steering in Language Models | Daniil Laptev et.al. | 2502.03032 | null |
2025-02-05 | Scaling Laws for Upcycling Mixture-of-Experts Language Models | Seng Pei Liew et.al. | 2502.03009 | null |
2025-02-05 | MedBioLM: Optimizing Medical and Biological QA with Fine-Tuned Large Language Models and Retrieval-Augmented Generation | Seonok Kim et.al. | 2502.03004 | null |
2025-02-05 | Training an LLM-as-a-Judge Model: Pipeline, Insights, and Practical Lessons | Renjun Hu et.al. | 2502.02988 | null |
2025-02-05 | Membership Inference Attack Should Move On to Distributional Statistics for Distilled Generative Models | Muxing Li et.al. | 2502.02970 | null |
2025-02-05 | The Labeled Coupon Collector Problem with Random Sample Sizes and Partial Recovery | Shoham Shimon Berrebi et.al. | 2502.02968 | null |
2025-02-05 | Large Language Model Adversarial Landscape Through the Lens of Attack Objectives | Nan Wang et.al. | 2502.02960 | null |
2025-02-05 | Position: Editing Large Language Models Poses Serious Safety Risks | Paul Youssef et.al. | 2502.02958 | null |
2025-02-05 | Control Search Rankings, Control the World: What is a Good Search Engine? | Simon Coghlan et.al. | 2502.02957 | null |
2025-02-05 | LLM-KT: Aligning Large Language Models with Knowledge Tracing using a Plug-and-Play Instruction | Ziwei Wang et.al. | 2502.02945 | null |
2025-02-05 | Large Language Model Guided Self-Debugging Code Generation | Muntasir Adnan et.al. | 2502.02928 | null |
2025-02-05 | SPARC: Subspace-Aware Prompt Adaptation for Robust Continual Learning in LLMs | Dinithi Jayasuriya et.al. | 2502.02909 | null |
2025-02-05 | AI-driven materials design: a mini-review | Mouyang Cheng et.al. | 2502.02905 | null |
2025-02-05 | A Benchmark for the Detection of Metalinguistic Disagreements between LLMs and Knowledge Graphs | Bradley P. Allen et.al. | 2502.02896 | null |
2025-02-05 | Lowering the Barrier of Machine Learning: Achieving Zero Manual Labeling in Review Classification Using LLMs | Yejian Zhang et.al. | 2502.02893 | null |
2025-02-05 | Expertized Caption Auto-Enhancement for Video-Text Retrieval | Junxiang Chen et.al. | 2502.02885 | null |
2025-02-05 | SensorChat: Answering Qualitative and Quantitative Questions during Long-Term Multimodal Sensor Interactions | Xiaofan Yu et.al. | 2502.02883 | null |
2025-02-05 | Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning | Yibo Yan et.al. | 2502.02871 | null |
2025-02-05 | A Systematic Approach for Assessing Large Language Models' Test Case Generation Capability | Hung-Fu Chang et.al. | 2502.02866 | null |
2025-02-05 | OceanChat: The Effect of Virtual Conversational AI Agents on Sustainable Attitude and Behavior Change | Pat Pataranutaporn et.al. | 2502.02863 | null |
2025-02-05 | A Survey of Sample-Efficient Deep Learning for Change Detection in Remote Sensing: Tasks, Strategies, and Challenges | Lei Ding et.al. | 2502.02835 | null |
2025-02-05 | COFFE: A Code Efficiency Benchmark for Code Generation | Yun Peng et.al. | 2502.02827 | link |
2025-02-05 | Accessible and Portable LLM Inference by Compiling Computational Graphs into SQL | Wenbo Sun et.al. | 2502.02818 | null |
2025-02-05 | Mol-LLM: Generalist Molecular LLM with Improved Graph Utilization | Chanhui Lee et.al. | 2502.02810 | null |
2025-02-05 | CAMI: A Counselor Agent Supporting Motivational Interviewing through State Inference and Topic Exploration | Yizhe Yang et.al. | 2502.02807 | null |
2025-02-05 | Leveraging the true depth of LLMs | Ramón Calvo González et.al. | 2502.02790 | null |
2025-02-05 | Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation | Jingyu Liu et.al. | 2502.02789 | link |
2025-02-05 | SimMark: A Robust Sentence-Level Similarity-Based Watermarking Algorithm for Large Language Models | Amirhossein Dabiriaghdam et.al. | 2502.02787 | link |
2025-02-04 | Classroom Simulacra: Building Contextual Student Generative Agents in Online Education for Learning Behavioral Simulation | Songlin Xu et.al. | 2502.02780 | link |
2025-02-04 | 3D Foundation AI Model for Generalizable Disease Detection in Head Computed Tomography | Weicheng Zhu et.al. | 2502.02779 | null |
2025-02-04 | Twilight: Adaptive Attention Sparsity with Hierarchical Top- | Chaofan Lin et.al. | 2502.02770 | null |
2025-02-04 | LLM-USO: Large Language Model-based Universal Sizing Optimizer | Karthik Somayaji N. S et.al. | 2502.02764 | null |
2025-02-04 | Rethinking Vision Transformer for Object Centric Foundation Models | Manuel Traub et.al. | 2502.02763 | null |
2025-02-04 | Too Noisy To Learn: Enhancing Data Quality for Code Review C | Chunhua Liu et.al. | 2502.02757 | null |
2025-02-04 | PatchPilot: A Stable and Cost-Efficient Agentic Patching Framework | Hongwei Li et.al. | 2502.02747 | null |
2025-02-04 | LLM Bandit: Cost-Efficient LLM Generation via Preference-Conditioned Dynamic Routing | Yang Li et.al. | 2502.02743 | null |
2025-02-04 | RFMedSAM 2: Automatic Prompt Refinement for Enhanced Volumetric Medical Image Segmentation with SAM 2 | Bin Xie et.al. | 2502.02741 | null |
2025-02-04 | SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model | Loubna Ben Allal et.al. | 2502.02737 | null |
2025-02-04 | Peri-LN: Revisiting Layer Normalization in the Transformer Architecture | Jeonghoon Kim et.al. | 2502.02732 | null |
2025-02-04 | Cross-Lingual Transfer for Low-Resource Natural Language Processing | Iker García-Ferrero et.al. | 2502.02722 | null |
2025-02-04 | Astromer 2 | Cristobal Donoso-Oliva et.al. | 2502.02717 | null |
2025-02-04 | A Unified Understanding and Evaluation of Steering Methods | Shawn Im et.al. | 2502.02716 | null |
2025-02-04 | An Analysis of LLM Fine-Tuning and Few-Shot Learning for Flaky Test Detection and Classification | Riddhi More et.al. | 2502.02715 | null |
2025-02-04 | Exploring LLMs Impact on Student-Created User Stories and Acceptance Testing in Software Development | Allan Brockenbrough et.al. | 2502.02675 | null |
2025-02-04 | MedRAX: Medical Reasoning Agent for Chest X-ray | Adibvafa Fallahpour et.al. | 2502.02673 | link |
2025-02-04 | Transformers Boost the Performance of Decision Trees on Tabular Data across Sample Sizes | Mayuka Jayawardhana et.al. | 2502.02672 | null |
2025-02-04 | Machine-learning approaches to accelerating lattice simulations | Scott Lawrence et.al. | 2502.02670 | null |
2025-02-04 | A Training-Free Length Extrapolation Approach for LLMs: Greedy Attention Logit Interpolation (GALI) | Yan Li et.al. | 2502.02659 | link |
2025-02-04 | Introducing the Rhea simulations of Milky-Way-like galaxies I: Effect of gravitational potential on morphology and star formation | Junia Göller et.al. | 2502.02646 | null |
2025-02-04 | COCONut-PanCap: Joint Panoptic Segmentation and Grounded Captions for Fine-Grained Understanding and Generation | Xueqing Deng et.al. | 2502.02589 | null |
2025-02-04 | Open Materials Generation with Stochastic Interpolants | Philipp Hoellmer et.al. | 2502.02582 | null |
2025-02-04 | A comparison of translation performance between DeepL and Supertext | Alex Flückiger et.al. | 2502.02577 | link |
2025-02-04 | Are Language Models Up to Sequential Optimization Problems? From Evaluation to a Hegelian-Inspired Enhancement | Soheil Abbasloo et.al. | 2502.02573 | null |
2025-02-04 | Learning the RoPEs: Better 2D and 3D Position Encodings with STRING | Connor Schenck et.al. | 2502.02562 | null |
2025-02-04 | Mosaic3D: Foundation Dataset and Model for Open-Vocabulary 3D Segmentation | Junha Lee et.al. | 2502.02548 | null |
2025-02-04 | LLMs for Generation of Architectural Components: An Exploratory Empirical Study in the Serverless World | Shrikara Arun et.al. | 2502.02539 | null |
2025-02-04 | Adaptive Self-improvement LLM Agentic System for ML Library Development | Genghan Zhang et.al. | 2502.02534 | link |
2025-02-04 | Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies | Han Zhou et.al. | 2502.02533 | null |
2025-02-04 | Generative Modeling on Lie Groups via Euclidean Generalized Score Matching | Marco Bertolini et.al. | 2502.02513 | null |
2025-02-04 | Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search | Maohao Shen et.al. | 2502.02508 | null |
2025-02-04 | Learning to generate physical ocean states: Towards hybrid climate modeling | Etienne Meunier et.al. | 2502.02499 | null |
2025-02-04 | EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization | Yize Wu et.al. | 2502.02493 | null |
2025-02-04 | Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical Study | Menglong Cui et.al. | 2502.02481 | null |
2025-02-04 | Style transfer as data augmentation: evaluating unpaired image-to-image translation models in mammography | Emir Ahmed et.al. | 2502.02475 | null |
2025-02-04 | Mind the Gap: Evaluating Patch Embeddings from General-Purpose and Histopathology Foundation Models for Cell Segmentation and Classification | Valentina Vadori et.al. | 2502.02471 | link |
2025-02-04 | SAISA: Towards Multimodal Large Language Models with Both Training and Inference Efficiency | Qianhao Yuan et.al. | 2502.02458 | null |
2025-02-04 | Personalization Toolkit: Training Free Personalization of Large Vision Language Models | Soroush Seifi et.al. | 2502.02452 | null |
2025-02-04 | Beyond English: Evaluating Automated Measurement of Moral Foundations in Non-English Discourse with a Chinese Case Study | Calvin Yixiang Cheng et.al. | 2502.02451 | link |
2025-02-04 | Generative Psycho-Lexical Approach for Constructing Value Systems in Large Language Models | Haoran Ye et.al. | 2502.02444 | null |
2025-02-04 | LLMER: Crafting Interactive Extended Reality Worlds with JSON Data Generated by Large Language Models | Jiangong Chen et.al. | 2502.02441 | link |
2025-02-04 | Medical Multimodal Model Stealing Attacks via Adversarial Domain Alignment | Yaling Shen et.al. | 2502.02438 | null |
2025-02-04 | TransformDAS: Mapping Φ-OTDR Signals to Riemannian Manifold for Robust Classification | Jiaju Kang et.al. | 2502.02428 | null |
2025-02-04 | Activation-Informed Merging of Large Language Models | Amin Heyrani Nobari et.al. | 2502.02421 | link |
2025-02-04 | Towards Fast Graph Generation via Autoregressive Noisy Filtration Modeling | Markus Krimmel et.al. | 2502.02415 | link |
2025-02-04 | AI-Powered, But Power-Hungry? Energy Efficiency of LLM-Generated Code | Lola Solovyeva et.al. | 2502.02412 | null |
2025-02-04 | Avoiding spurious sharpness minimization broadens applicability of SAM | Sidak Pal Singh et.al. | 2502.02407 | null |
2025-02-04 | LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models | Tzu-Tao Chang et.al. | 2502.02406 | null |
2025-02-04 | CoAT: Chain-of-Associated-Thoughts Framework for Enhancing Large Language Models Reasoning | Jianfeng Pan et.al. | 2502.02390 | null |
2025-02-04 | Hypergraph Link Prediction via Hyperedge Copying | Xie He et.al. | 2502.02386 | null |
2025-02-04 | STAIR: Improving Safety Alignment with Introspective Reasoning | Yichi Zhang et.al. | 2502.02384 | link |
2025-02-04 | Evaluating the Effectiveness of LLMs in Fixing Maintainability Issues in Real-World Projects | Henrique Nunes et.al. | 2502.02368 | null |
2025-02-04 | Field Matching: an Electrostatic Paradigm to Generate and Transfer Data | Alexander Kolesov et.al. | 2502.02367 | null |
2025-02-04 | Premise-Augmented Reasoning Chains Improve Error Identification in Math reasoning with LLMs | Sagnik Mukherjee et.al. | 2502.02362 | null |
2025-02-04 | SHIELD: APT Detection and Intelligent Explanation Using LLM | Parth Atulbhai Gandhi et.al. | 2502.02342 | null |
2025-02-04 | Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking | Jinyang Wu et.al. | 2502.02339 | null |
2025-02-04 | ReSpark: Leveraging Previous Data Reports as References to Generate New Reports with LLMs | Yuan Tian et.al. | 2502.02329 | null |
2025-02-04 | Information-Theoretic Proofs for Diffusion Sampling | Galen Reeves et.al. | 2502.02305 | null |
2025-02-04 | Density Ratio Estimation with Conditional Probability Paths | Hanlin Yu et.al. | 2502.02300 | null |
2025-02-04 | Evalita-LLM: Benchmarking Large Language Models on Italian | Bernardo Magnini et.al. | 2502.02289 | null |
2025-02-04 | Adaptive Resource Allocation Optimization Using Large Language Models in Dynamic Wireless Environments | Hyeonho Noh et.al. | 2502.02287 | null |
2025-02-04 | Conversation AI Dialog for Medicare powered by Finetuning and Retrieval Augmented Generation | Atharva Mangeshkumar Agrawal et.al. | 2502.02249 | null |
2025-02-04 | Flatten Graphs as Sequences: Transformers are Scalable Graph Generators | Dexiong Chen et.al. | 2502.02216 | null |
2025-02-04 | When Dimensionality Hurts: The Role of LLM Embedding Compression for Noisy Regression Tasks | Felix Drinkall et.al. | 2502.02199 | link |
2025-02-04 | Large language models in climate and sustainability policy: limits and opportunities | Francesca Larosa et.al. | 2502.02191 | null |
2025-02-04 | ShapeShifter: 3D Variations Using Multiscale and Sparse Point-Voxel Diffusion | Nissim Maruani et.al. | 2502.02187 | null |
2025-02-04 | Generative Kernel Spectral Clustering | David Winant et.al. | 2502.02185 | null |
2025-02-04 | Mass-Editing Memory with Attention in Transformers: A cross-lingual exploration of knowledge | Daniel Tamayo et.al. | 2502.02173 | link |
2025-02-04 | EditIQ: Automated Cinematic Editing of Static Wide-Angle Videos via Dialogue Interpretation and Saliency Cues | Rohit Girmaji et.al. | 2502.02172 | null |
2025-02-04 | Risk-Aware Driving Scenario Analysis with Large Language Models | Yuan Gao et.al. | 2502.02145 | link |
2025-02-04 | IPO: Iterative Preference Optimization for Text-to-Video Generation | Xiaomeng Yang et.al. | 2502.02088 | null |
2025-02-04 | Position Paper: Building Trust in Synthetic Data for Clinical AI | Krishan Agyakari Raja Babu et.al. | 2502.02076 | null |
2025-02-04 | Rethinking stance detection: A theoretically-informed research agenda for user-level inference using language models | Prasanta Bhattacharya et.al. | 2502.02074 | null |
2025-02-04 | ASCenD-BDS: Adaptable, Stochastic and Context-aware framework for Detection of Bias, Discrimination and Stereotyping | Rajiv Bahl et.al. | 2502.02072 | null |
2025-02-04 | Robust and Secure Code Watermarking for Large Language Models via ML/Crypto Codesign | Ruisi Zhang et.al. | 2502.02068 | null |
2025-02-04 | AdaptBot: Combining LLM with Knowledge Graphs and Human Input for Generic-to-Specific Task Decomposition and Knowledge Refinement | Shivam Singh et.al. | 2502.02067 | link |
2025-02-04 | Anticipate & Act : Integrating LLMs and Classical Planning for Efficient Task Execution in Household Environments | Raghav Arora et.al. | 2502.02066 | null |
2025-02-04 | CASIM: Composite Aware Semantic Injection for Text to Motion Generation | Che-Jui Chang et.al. | 2502.02063 | null |
2025-02-04 | Large Language Models for Recommendation with Deliberative User Preference Alignment | Yi Fang et.al. | 2502.02061 | null |
2025-02-04 | Efficient Domain Adaptation of Multimodal Embeddings using Constrastive Learning | Georgios Margaritis et.al. | 2502.02048 | null |
2025-02-04 | Contextual Memory Reweaving in Large Language Models Using Layered Latent State Reconstruction | Frederick Dillon et.al. | 2502.02046 | null |
2025-02-04 | M2R2: Mixture of Multi-Rate Residuals for Efficient Transformer Inference | Nikhil Bhendawade et.al. | 2502.02040 | null |
2025-02-04 | ContinuouSP: Generative Model for Crystal Structure Prediction with Invariance and Continuity | Yuji Tone et.al. | 2502.02026 | null |
2025-02-04 | From Accidents to Insights: Leveraging Multimodal Data for Scenario-Driven ADS Testing | Siwei Luo et.al. | 2502.02025 | null |
2025-02-04 | ComplexDec: A Domain-robust High-fidelity Neural Audio Codec with Complex Spectrum Modeling | Yi-Chiao Wu et.al. | 2502.02019 | null |
2025-02-04 | Multi-Domain Graph Foundation Models: Robust Knowledge Transfer via Topology Alignment | Shuo Wang et.al. | 2502.02017 | null |
2025-02-04 | A Periodic Bayesian Flow for Material Generation | Hanlin Wu et.al. | 2502.02016 | link |
2025-02-04 | Layer by Layer: Uncovering Hidden Representations in Language Models | Oscar Skean et.al. | 2502.02013 | null |
2025-02-04 | LLMSecConfig: An LLM-Based Approach for Fixing Software Container Misconfigurations | Ziyang Ye et.al. | 2502.02009 | null |
2025-02-04 | Reasoning Bias of Next Token Prediction Training | Pengxiao Lin et.al. | 2502.02007 | null |
2025-02-04 | FinRLlama: A Solution to LLM-Engineered Signals Challenge at FinRL Contest 2024 | Arnav Grover et.al. | 2502.01992 | null |
2025-02-04 | Can LLMs Assist Annotators in Identifying Morality Frames? -- Case Study on Vaccination Debate on Social Media | Tunazzina Islam et.al. | 2502.01991 | null |
2025-02-04 | Generative Data Mining with Longtail-Guided Diffusion | David S. Hayden et.al. | 2502.01980 | null |
2025-02-04 | Gradient-Regularized Latent Space Modulation in Large Language Models for Structured Contextual Synthesis | Derek Yotheringhay et.al. | 2502.01979 | null |
2025-02-04 | AutoGUI: Scaling GUI Grounding with Automatic Functionality Annotations from LLMs | Hongxin Li et.al. | 2502.01977 | null |
2025-02-04 | CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing | Wenhao Zheng et.al. | 2502.01976 | null |
2025-02-04 | Token Cleaning: Fine-Grained Data Selection for LLM Supervised Fine-Tuning | Jinlong Pang et.al. | 2502.01968 | null |
2025-02-04 | MPIC: Position-Independent Multimodal Context Caching System for Efficient MLLM Serving | Shiju Zhao et.al. | 2502.01960 | null |
2025-02-04 | Local minima of the empirical risk in high dimension: General theorems and convex examples | Kiana Asgari et.al. | 2502.01953 | null |
2025-02-04 | DAMO: Data- and Model-aware Alignment of Multi-modal LLMs | Jinda Lu et.al. | 2502.01943 | null |
2025-02-04 | Can LLMs Maintain Fundamental Abilities under KV Cache Compression? | Xiang Liu et.al. | 2502.01941 | null |
2025-02-04 | Toward a Low-Cost Perception System in Autonomous Vehicles: A Spectrum Learning Approach | Mohammed Alsakabi et.al. | 2502.01940 | null |
2025-02-04 | Distributionally Robust Direct Preference Optimization | Zaiyan Xu et.al. | 2502.01930 | null |
2025-02-04 | PANDAS: Improving Many-shot Jailbreaking via Positive Affirmation, Negative Demonstration, and Adaptive Sampling | Avery Ma et.al. | 2502.01925 | null |
2025-02-04 | LAST SToP For Modeling Asynchronous Time Series | Shubham Gupta et.al. | 2502.01922 | null |
2025-02-04 | Anomaly Detection via Autoencoder Composite Features and NCE | Yalin Liao et.al. | 2502.01920 | null |
2025-02-04 | Unlocking Efficient Large Inference Models: One-Bit Unrolling Tips the Scales | Arian Eamaz et.al. | 2502.01908 | null |
2025-02-04 | Rethinking Homogeneity of Vision and Text Tokens in Large Vision-and-Language Models | Chia-Wen Kuo et.al. | 2502.01906 | null |
2025-02-04 | Conceptual Metaphor Theory as a Prompting Paradigm for Large Language Models | Oliver Kramer et.al. | 2502.01901 | null |
2025-02-03 | Latent Lexical Projection in Large Language Models: A Novel Approach to Implicit Representation Refinement | Ziad Shaker et.al. | 2502.01882 | null |
2025-02-03 | SE Arena: Benchmarking Software Engineering Chatbots with Iterative Interactions | Zhimin Zhao et.al. | 2502.01860 | null |
2025-02-03 | Security and Quality in LLM-Generated Code: A Multi-Language, Multi-Model Analysis | Mohammed Kharma et.al. | 2502.01853 | null |
2025-02-03 | Foundation Model-Based Apple Ripeness and Size Estimation for Selective Harvesting | Keyi Zhu et.al. | 2502.01850 | link |
2025-02-03 | Relatively-Secure LLM-Based Steganography via Constrained Markov Decision Processes | Yu-Shin Huang et.al. | 2502.01827 | link |
2025-02-03 | Agentic Bug Reproduction for Effective Automated Program Repair at Google | Runxiang Cheng et.al. | 2502.01821 | null |
2025-02-03 | Score as Action: Fine-Tuning Diffusion Generative Models by Continuous-time Reinforcement Learning | Hanyang Zhao et.al. | 2502.01819 | null |
2025-02-03 | SelfCheckAgent: Zero-Resource Hallucination Detection in Generative Large Language Models | Diyana Muhammed et.al. | 2502.01812 | null |
2025-02-03 | Toward Neurosymbolic Program Comprehension | Alejandro Velasco et.al. | 2502.01806 | null |
2025-02-03 | Discovering Chunks in Neural Embeddings for Interpretability | Shuchen Wu et.al. | 2502.01803 | null |
2025-02-03 | Harmful Terms and Where to Find Them: Measuring and Modeling Unfavorable Financial Terms and Conditions in Shopping Websites at Scale | Elisa Tsai et.al. | 2502.01798 | link |
2025-01-31 | Vintix: Action Model via In-Context Reinforcement Learning | Andrey Polubarov et.al. | 2501.19400 | link |
2025-01-31 | Do LLMs Strategically Reveal, Conceal, and Infer Information? A Theoretical and Empirical Analysis in The Chameleon Game | Mustafa O. Karabag et.al. | 2501.19398 | link |
2025-01-31 | Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models | Alina Shutova et.al. | 2501.19392 | link |
2025-01-31 | Federated Sketching LoRA: On-Device Collaborative Fine-Tuning of Large Language Models | Wenzhi Fang et.al. | 2501.19389 | link |
2025-02-03 | SELMA: A Speech-Enabled Language Model for Virtual Assistant Interactions | Dominik Wagner et.al. | 2501.19377 | null |
2025-01-31 | Beyond Fixed Horizons: A Theoretical Framework for Adaptive Denoising Diffusions | Sören Christensen et.al. | 2501.19373 | null |
2025-01-31 | We're Different, We're the Same: Creative Homogeneity Across LLMs | Emily Wenger et.al. | 2501.19361 | null |
2025-01-31 | Mechanical Properties of the Meninges: Large Language Model Assisted Systematic Review of over 25,000 Studies | Brandon P. Chelstrom et.al. | 2501.19359 | null |
2025-01-31 | The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking | Yuchun Miao et.al. | 2501.19358 | null |
2025-01-31 | Addressing the correlation of Stokes-shifted photons emitted from two quantum emitters | Adrián Juan-Delgado et.al. | 2501.19356 | null |
2025-01-31 | Do Large Multimodal Models Solve Caption Generation for Scientific Figures? Lessons Learned from SCICAP Challenge 2023 | Ting-Yao E. Hsu et.al. | 2501.19353 | null |
2025-01-31 | Towards Adaptive Self-Improvement for Smarter Energy Systems | Alexander Sommer et.al. | 2501.19340 | null |
2025-01-31 | PixelWorld: Towards Perceiving Everything as Pixels | Zhiheng Lyu et.al. | 2501.19339 | null |
2025-01-31 | Homogeneity Bias as Differential Sampling Uncertainty in Language Models | Messi H. J. Lee et.al. | 2501.19337 | null |
2025-01-31 | Reward-Guided Speculative Decoding for Efficient LLM Reasoning | Baohao Liao et.al. | 2501.19324 | null |
2025-01-31 | MINDSTORES: Memory-Informed Neural Decision Synthesis for Task-Oriented Reinforcement in Embodied Systems | Anirudh Chari et.al. | 2501.19318 | null |
2025-01-31 | LLM-based Affective Text Generation Quality Based on Different Quantization Values | Yarik Menchaca Resendiz et.al. | 2501.19317 | null |
2025-01-31 | Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment | Gregor Bachmann et.al. | 2501.19309 | null |
2025-02-03 | SETS: Leveraging Self-Verification and Self-Correction for Improved Test-Time Scaling | Jiefeng Chen et.al. | 2501.19306 | null |
2025-01-31 | Beyond checkmate: exploring the creative chokepoints in AI text | Nafis Irtiza Tripto et.al. | 2501.19301 | link |
2025-01-31 | Offline Learning for Combinatorial Multi-armed Bandits | Xutong Liu et.al. | 2501.19300 | null |
2025-01-31 | Synthetic User Behavior Sequence Generation with Large Language Models for Smart Homes | Zhiyao Xu et.al. | 2501.19298 | null |
2025-01-31 | Analysis of LLMs vs Human Experts in Requirements Engineering | Cory Hymel et.al. | 2501.19297 | null |
2025-01-31 | Low-Cost and Comprehensive Non-textual Input Fuzzing with LLM-Synthesized Input Generators | Kunpeng Zhang et.al. | 2501.19282 | null |
2025-01-31 | Pheromone-based Learning of Optimal Reasoning Paths | Anirudh Chari et.al. | 2501.19278 | null |
2025-01-31 | From Assistance to Autonomy -- A Researcher Study on the Potential of AI Support for Qualitative Data Analysis | Elisabeth Kirsten et.al. | 2501.19275 | null |
2025-01-31 | Jackpot! Alignment as a Maximal Lottery | Roberto-Rafael Maura-Rivero et.al. | 2501.19266 | null |
2025-01-31 | Neuro-LIFT: A Neuromorphic, LLM-based Interactive Framework for Autonomous Drone FlighT at the Edge | Amogh Joshi et.al. | 2501.19259 | null |
2025-01-31 | A Zero-Shot Generalization Framework for LLM-Driven Cross-Domain Sequential Recommendation | Yunzhe Li et.al. | 2501.19232 | null |
2025-01-31 | Autonomous Legacy Web Application Upgrades Using a Multi-Agent System | Valtteri Ala-Salmi et.al. | 2501.19204 | link |
2025-02-03 | Improving the Robustness of Representation Misdirection for Large Language Model Unlearning | Dang Huu-Tien et.al. | 2501.19202 | link |
2025-01-31 | Efficient Reasoning with Hidden Thinking | Xuan Shen et.al. | 2501.19201 | link |
2025-01-31 | Enhancing Model Defense Against Jailbreaks with Proactive Safety Reasoning | Xianglin Yang et.al. | 2501.19180 | null |
2025-01-31 | No Foundations without Foundations -- Why semi-mechanistic models are essential for regulatory biology | Luka Kovačević et.al. | 2501.19178 | null |
2025-01-31 | Position: Contextual Integrity Washing for Language Models | Yan Shvartzshnaider et.al. | 2501.19173 | null |
2025-01-31 | Poison as Cure: Visual Noise for Mitigating Object Hallucinations in LVMs | Kejia Zhang et.al. | 2501.19164 | null |
2025-01-31 | A theoretical framework for overfitting in energy-based modeling | Giovanni Catania et.al. | 2501.19158 | null |
2025-01-31 | A Tensor-Train Decomposition based Compression of LLMs on Group Vector Systolic Accelerator | Sixiao Huang et.al. | 2501.19135 | null |
2025-01-31 | Unraveling Zeroth-Order Optimization through the Lens of Low-Dimensional Structured Perturbations | Sihwan Park et.al. | 2501.19099 | null |
2025-01-31 | Ambient Denoising Diffusion Generative Adversarial Networks for Establishing Stochastic Object Models from Noisy Image Data | Xichen Xu et.al. | 2501.19094 | null |
2025-01-31 | Pivoting Factorization: A Compact Meta Low-Rank Representation of Sparsity for Efficient Inference in Large Language Models | Jialin Zhao et.al. | 2501.19090 | null |
2025-01-31 | Fairness Analysis of CLIP-Based Foundation Models for X-Ray Image Classification | Xiangyu Sun et.al. | 2501.19086 | null |
2025-01-31 | Enhancing Code Generation for Low-Resource Languages: No Silver Bullet | Alessandro Giagnorio et.al. | 2501.19085 | null |
2025-01-31 | Concept Steerers: Leveraging K-Sparse Autoencoders for Controllable Generations | Dahye Kim et.al. | 2501.19066 | link |
2025-01-31 | TeZO: Empowering the Low-Rankness on the Temporal Dimension in the Zeroth-Order Optimization for Fine-tuning LLMs | Yan Sun et.al. | 2501.19057 | null |
2025-01-31 | Enabling Autonomic Microservice Management through Self-Learning Agents | Fenglin Yu et.al. | 2501.19056 | null |
2025-01-31 | Text-to-CAD Generation Through Infusing Visual Feedback in Large Language Models | Ruiyu Wang et.al. | 2501.19054 | null |
2025-01-31 | Swarm-Gen: Fast Generation of Diverse Feasible Swarm Behaviors | Simon Idoko et.al. | 2501.19042 | link |
2025-01-31 | Towards the Worst-case Robustness of Large Language Models | Huanran Chen et.al. | 2501.19040 | null |
2025-01-31 | Beyond Token Compression: A Training-Free Reduction Framework for Efficient Visual Processing in MLLMs | Hongliang Li et.al. | 2501.19036 | null |
2025-01-31 | XRF V2: A Dataset for Action Summarization with Wi-Fi Signals, and IMUs in Phones, Watches, Earbuds, and Glasses | Bo Lan et.al. | 2501.19034 | link |
2025-01-31 | Multilayer Networks in Neuroimaging | Vesna Vuksanovic et.al. | 2501.19024 | null |
2025-01-31 | Calling a Spade a Heart: Gaslighting Multimodal Large Language Models via Negation | Bin Zhu et.al. | 2501.19017 | null |
2025-01-31 | Importing Phantoms: Measuring LLM Package Hallucination Vulnerabilities | Arjun Krishna et.al. | 2501.19012 | null |
2025-01-31 | Visual Autoregressive Modeling for Image Super-Resolution | Yunpeng Qu et.al. | 2501.18993 | null |
2025-01-31 | Symmetric Pruning of Large Language Models | Kai Yi et.al. | 2501.18980 | null |
2025-01-31 | BCAT: A Block Causal Transformer for PDE Foundation Models for Fluid Dynamics | Yuxuan Liu et.al. | 2501.18972 | null |
2025-01-31 | Spend Wisely: Maximizing Post-Training Gains in Iterative Synthetic Data Boostrapping | Pu Yang et.al. | 2501.18962 | link |
2025-01-31 | Intrinsic Tensor Field Propagation in Large Language Models: A Novel Approach to Contextual Information Flow | Alfred Bexley et.al. | 2501.18957 | null |
2025-01-31 | LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models | Shenghao Fu et.al. | 2501.18954 | link |
2025-01-31 | TabFSBench: Tabular Benchmark for Feature Shifts in Open Environment | Zi-Jian Cheng et.al. | 2501.18935 | link |
2025-01-31 | Language Games as the Pathway to Artificial Superhuman Intelligence | Ying Wen et.al. | 2501.18924 | null |
2025-01-31 | KBQA-o1: Agentic Knowledge Base Question Answering with Monte Carlo Tree Search | Haoran Luo et.al. | 2501.18922 | link |
2025-01-31 | LLM Program Optimization via Retrieval Augmented Search | Sagnik Anupam et.al. | 2501.18916 | null |
2025-01-31 | Scaling Laws for Differentially Private Language Models | Ryan McKenna et.al. | 2501.18914 | null |
2025-01-31 | Streamlining Security Vulnerability Triage with Large Language Models | Mohammad Jalili Torkamani et.al. | 2501.18908 | null |
2025-01-31 | Trustworthy Evaluation of Generative AI Models | Zijun Gao et.al. | 2501.18897 | null |
2025-01-31 | Can We Predict the Effect of Prompts? | Jae Yong Lee et.al. | 2501.18883 | null |
2025-01-31 | Adaptivity and Convergence of Probability Flow ODEs in Diffusion Generative Models | Jiaqi Tang et.al. | 2501.18863 | null |
2025-01-31 | BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning | Han Zhong et.al. | 2501.18858 | null |
2025-01-31 | Equivariant Hypergraph Diffusion for Crystal Structure Prediction | Yang Liu et.al. | 2501.18850 | null |
2025-01-31 | Text Data Augmentation for Large Language Models: A Comprehensive Survey of Methods, Challenges, and Opportunities | Yaping Chai et.al. | 2501.18845 | null |
2025-01-31 | Trading Inference-Time Compute for Adversarial Robustness | Wojciech Zaremba et.al. | 2501.18841 | null |
2025-01-31 | Partially Rewriting a Transformer in Natural Language | Gonçalo Paulo et.al. | 2501.18838 | link |
2025-01-31 | Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming | Mrinank Sharma et.al. | 2501.18837 | null |
2025-01-31 | Pitfalls of defacing whole-head MRI: re-identification risk with diffusion models and compromised research potential | Chenyu Gao et.al. | 2501.18834 | null |
2025-01-31 | Structural Embedding Projection for Contextual Large Language Model Inference | Vincent Enoasmo et.al. | 2501.18826 | null |
2025-01-31 | Bridging the Reasoning Gap: Small LLMs Can Plan with Generalised Strategies | Andrey Borro et.al. | 2501.18817 | link |
2025-01-31 | Large Language Models as Common-Sense Heuristics | Andrey Borro et.al. | 2501.18816 | null |
2025-01-30 | Compositional Generalization Requires More Than Disentangled Representations | Qiyao Liang et.al. | 2501.18797 | null |
2025-01-30 | Rope to Nope and Back Again: A New Hybrid Attention Strategy | Bowen Yang et.al. | 2501.18795 | null |
2025-01-30 | Survey and Improvement Strategies for Gene Prioritization with Large Language Models | Matthew Neeley et.al. | 2501.18794 | null |
2025-01-30 | LLM-Generated Heuristics for AI Planning: Do We Even Need Domain-Independence Anymore? | Alexander Tuisov et.al. | 2501.18784 | null |
2025-01-30 | Navigating the Fragrance space Via Graph Generative Models And Predicting Odors | Mrityunjay Sharma et.al. | 2501.18777 | link |
2025-01-30 | Probabilistic Joint Recovery Method for CO | Zijun Deng et.al. | 2501.18761 | null |
2025-01-30 | Synthetic Data Generation for Augmenting Small Samples | Dan Liu et.al. | 2501.18741 | null |
2025-01-30 | Examining the Robustness of Large Language Models across Language Complexity | Jiayi Zhang et.al. | 2501.18738 | null |
2025-01-30 | Exploring Audio Editing Features as User-Centric Privacy Defenses Against Emotion Inference Attacks | Mohd. Farhan Israk Soumik et.al. | 2501.18727 | null |
2025-01-30 | Strong and Controllable 3D Motion Generation | Canxuan Gang et.al. | 2501.18726 | null |
2025-01-30 | Zero-shot Large Language Models for Long Clinical Text Summarization with Temporal Reasoning | Maya Kruse et.al. | 2501.18724 | null |
2025-02-03 | Invisible Traces: Using Hybrid Fingerprinting to identify underlying LLMs in GenAI Apps | Devansh Bhardwaj et.al. | 2501.18712 | null |
2025-01-30 | Regularized second-order optimization of tensor-network Born machines | Matan Ben-Dov et.al. | 2501.18691 | null |
2025-01-30 | Drag Your Gaussian: Effective Drag-Based Editing with Score Distillation for 3D Gaussian Splatting | Yansong Qu et.al. | 2501.18672 | null |
2025-01-30 | Foundational Models for 3D Point Clouds: A Survey and Outlook | Vishal Thengane et.al. | 2501.18594 | null |
2025-01-30 | Diffusion Autoencoders are Scalable Image Tokenizers | Yinbo Chen et.al. | 2501.18593 | null |
2025-02-03 | Advances in Multimodal Adaptation and Generalization: From Traditional Approaches to Foundation Models | Hao Dong et.al. | 2501.18592 | link |
2025-01-30 | Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs | Yue Wang et.al. | 2501.18585 | null |
2025-01-30 | Token-Hungry, Yet Precise: DeepSeek R1 Highlights the Need for Multi-Step Reasoning Over Speed in MATH | Evgenii Evstafev et.al. | 2501.18576 | null |
2025-01-30 | BounTCHA: A CAPTCHA Utilizing Boundary Identification in AI-extended Videos | Lehao Lin et.al. | 2501.18565 | null |
2025-01-30 | SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation | Haoquan Fang et.al. | 2501.18564 | link |
2025-01-30 | Semantic Web and Creative AI -- A Technical Report from ISWS 2023 | Raia Abu Ahmad et.al. | 2501.18542 | null |
2025-01-30 | Illusions of Relevance: Using Content Injection Attacks to Deceive Retrievers, Rerankers, and LLM Judges | Manveer Singh Tamber et.al. | 2501.18536 | link |
2025-01-30 | Differentially Private Steering for Large Language Model Alignment | Anmol Goel et.al. | 2501.18532 | link |
2025-01-30 | Learn from the Past: Language-conditioned Object Rearrangement with Large Language Models | Guanqun Cao et.al. | 2501.18516 | null |
2025-01-30 | Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch | Arthur Douillard et.al. | 2501.18512 | null |
2025-01-30 | WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training | Benjamin Feuer et.al. | 2501.18511 | link |
2025-01-30 | CLEAR: Cue Learning using Evolution for Accurate Recognition Applied to Sustainability Data Extraction | Peter J. Bentley et.al. | 2501.18504 | null |
2025-01-30 | Examining the Expanding Role of Synthetic Data Throughout the AI Development Pipeline | Shivani Kapania et.al. | 2501.18493 | null |
2025-01-30 | A Tool for In-depth Analysis of Code Execution Reasoning of Large Language Models | Changshu Liu et.al. | 2501.18482 | null |
2025-01-30 | CLoQ: Enhancing Fine-Tuning of Quantized LLMs via Calibrated LoRA Initialization | Yanxia Deng et.al. | 2501.18475 | null |
2025-01-30 | Tuning Vision Foundation Model via Test-Time Prompt-Guided Training for VFSS Segmentations | Chengxi Zeng et.al. | 2501.18474 | null |
2025-01-30 | ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation | Minghua He et.al. | 2501.18460 | null |
2025-01-30 | CALM: Unleashing the Cross-Lingual Self-Aligning Ability of Language Model Question Answering | Yumeng Wang et.al. | 2501.18457 | null |
2025-01-30 | GENIE: Generative Note Information Extraction model for structuring EHR data | Huaiyuan Ying et.al. | 2501.18435 | null |
2025-01-30 | Exploring Potential Prompt Injection Attacks in Federated Military LLMs and Their Mitigation | Youngjoon Lee et.al. | 2501.18416 | null |
2025-01-30 | RbFT: Robust Fine-tuning for Retrieval-Augmented Generation against Retrieval Defects | Yiteng Tu et.al. | 2501.18365 | link |
2025-01-30 | A Video-grounded Dialogue Dataset and Metric for Event-driven Activities | Wiradee Imrattanatrai et.al. | 2501.18324 | link |
2025-01-30 | Leveraging LLM Agents for Automated Optimization Modeling for SASP Problems: A Graph-RAG based Approach | Tianpeng Pan et.al. | 2501.18320 | null |
2025-01-30 | Mining for Species, Locations, Habitats, and Ecosystems from Scientific Papers in Invasion Biology: A Large-Scale Exploratory Study with Large Language Models | Jennifer D'Souza et.al. | 2501.18287 | null |
2025-01-30 | Jailbreaking LLMs' Safeguard with Universal Magic Words for Text Embedding Models | Haoyu Liang et.al. | 2501.18280 | null |
2025-01-30 | Collecting Cost-Effective, High-Quality Truthfulness Assessments with LLM Summarized Evidence | Kevin Roitero et.al. | 2501.18265 | null |
2025-01-30 | How to Select Datapoints for Efficient Human Evaluation of NLG Models? | Vilém Zouhar et.al. | 2501.18251 | link |
2025-01-30 | Statistical multi-metric evaluation and visualization of LLM system predictive performance | Samuel Ackerman et.al. | 2501.18243 | null |
2025-01-30 | Contextually Structured Token Dependency Encoding for Large Language Models | James Blades et.al. | 2501.18205 | null |
2025-01-30 | Economic Rationality under Specialization: Evidence of Decision Bias in AI Agents | ShuiDe Wen et.al. | 2501.18190 | null |
2025-01-30 | Investigating Tax Evasion Emergence Using Dual Large Language Model and Deep Reinforcement Learning Powered Agent-based Simulation | Teddy Lazebnik et.al. | 2501.18177 | null |
2025-01-30 | Continually Evolved Multimodal Foundation Models for Cancer Prognosis | Jie Peng et.al. | 2501.18170 | null |
2025-01-30 | RepoAudit: An Autonomous LLM-Agent for Repository-Level Code Auditing | Jinyao Guo et.al. | 2501.18160 | null |
2025-01-30 | Large Language Models for Cryptocurrency Transaction Analysis: A Bitcoin Case Study | Yuchen Lei et.al. | 2501.18158 | null |
2025-01-30 | Mixed-Precision Graph Neural Quantization for Low Bit Large Language Models | Wanlong Liu et.al. | 2501.18154 | null |
2025-01-30 | Self-supervised Quantized Representation for Seamlessly Integrating Knowledge Graphs with Large Language Models | Qika Lin et.al. | 2501.18119 | null |
2025-01-30 | Scaling Inference-Efficient Language Models | Song Bian et.al. | 2501.18107 | null |
2025-01-30 | Panacea: Mitigating Harmful Fine-tuning for Large Language Models via Post-fine-tuning Perturbation | Yibo Wang et.al. | 2501.18100 | link |
2025-01-30 | AlphaAdam:Asynchronous Masked Optimization with Dynamic Alpha for Selective Updates | Da Chang et.al. | 2501.18094 | null |
2025-01-30 | Normative Evaluation of Large Language Models with Everyday Moral Dilemmas | Pratik S. Sachdeva et.al. | 2501.18081 | null |
2025-01-30 | FinanceQA: A Benchmark for Evaluating Financial Analysis Capabilities of Large Language Models | Spencer Mateega et.al. | 2501.18062 | null |
2025-01-29 | RL-based Query Rewriting with Distilled LLM for online E-Commerce Systems | Duy A. Nguyen et.al. | 2501.18056 | null |
2025-01-29 | Current Pathology Foundation Models are unrobust to Medical Center Differences | Edwin D. de Jong et.al. | 2501.18055 | null |
2025-01-29 | A Proximal Operator for Inducing 2:4-Sparsity | Jonas M Kübler et.al. | 2501.18015 | null |
2025-01-29 | Large Language Models Think Too Fast To Explore Effectively | Lan Pan et.al. | 2501.18009 | null |
2025-01-29 | Fault Localization via Fine-tuning Large Language Models with Mutation Generated Stack Traces | Neetha Jambigi et.al. | 2501.18005 | null |
2025-01-29 | InnerThoughts: Disentangling Representations and Predictions in Large Language Models | Didier Chételat et.al. | 2501.17994 | null |
2025-01-29 | Can Generative LLMs Create Query Variants for Test Collections? An Exploratory Study | Marwah Alaofi et.al. | 2501.17981 | link |
2025-01-29 | Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization | Zishun Yu et.al. | 2501.17974 | null |
2025-01-29 | "I Would Never Trust Anything Western": Kumu (Educator) Perspectives on Use of LLMs for Culturally Revitalizing CS Education in Hawaiian Schools | Manas Mhasakar et.al. | 2501.17942 | null |
2025-01-29 | DReSS: Data-driven Regularized Structured Streamlining for Large Language Models | Mingkuan Feng et.al. | 2501.17905 | null |
2025-01-29 | Learning Beyond the Surface: How Far Can Continual Pre-Training with LoRA Enhance LLMs' Domain-Specific Insight Learning? | Pouya Pezeshkpour et.al. | 2501.17840 | link |
2025-01-29 | Aggregation Schemes for Single-Vector WSI Representation Learning in Digital Pathology | Sobhan Hemati et.al. | 2501.17822 | null |
2025-01-30 | Leveraging Multimodal LLM for Inspirational User Interface Search | Seokhyeon Park et.al. | 2501.17799 | link |
2025-01-29 | BreezyVoice: Adapting TTS for Taiwanese Mandarin with Enhanced Polyphone Disambiguation -- Challenges and Insights | Chan-Jan Hsu et.al. | 2501.17790 | null |
2025-01-29 | AdditiveLLM: Large Language Models Predict Defects in Additive Manufacturing | Peter Pak et.al. | 2501.17784 | null |
2025-01-29 | 2SSP: A Two-Stage Framework for Structured Pruning of LLMs | Fabrizio Sandri et.al. | 2501.17771 | link |
2025-01-29 | Generative Unordered Flow for Set-Structured Data Generation | Yangming Li et.al. | 2501.17770 | null |
2025-01-29 | Hybrid Graphs for Table-and-Text based Question Answering using LLMs | Ankush Agarwal et.al. | 2501.17767 | null |
2025-01-29 | On the Partitioning of GPU Power among Multi-Instances | Tirth Vamja et.al. | 2501.17752 | null |
2025-01-29 | Early External Safety Testing of OpenAI's o3-mini: Insights from the Pre-Deployment Evaluation | Aitor Arrieta et.al. | 2501.17749 | null |
2025-01-29 | A technical review of multi-omics data integration methods: from classical statistical to deep generative approaches | Ana R. Baião et.al. | 2501.17729 | null |
2025-01-29 | Using Code Generation to Solve Open Instances of Combinatorial Design Problems | Christopher D. Rosin et.al. | 2501.17725 | link |
2025-01-29 | RICoTA: Red-teaming of In-the-wild Conversation with Test Attempts | Eujeong Choi et.al. | 2501.17715 | link |
2025-01-29 | Source-Channel Separation Theorems for Distortion Perception Coding | Chao Tian et.al. | 2501.17706 | null |
2025-01-29 | Planning with Vision-Language Models and a Use Case in Robot-Assisted Teaching | Xuzhe Dang et.al. | 2501.17665 | null |
2025-01-30 | In-Context Meta LoRA Generation | Yihua Shao et.al. | 2501.17635 | null |
2025-01-29 | Uncertainty Quantification and Decomposition for LLM-based Recommendation | Wonbin Kweon et.al. | 2501.17630 | link |
2025-01-29 | The Imitation Game According To Turing | Sharon Temtsin et.al. | 2501.17629 | null |
2025-01-29 | Structured Context Recomposition for Large Language Models Using Probabilistic Layer Realignment | Jonathan Teel et.al. | 2501.17617 | null |
2025-01-29 | Semantic Consistency Regularization with Large Language Models for Semi-supervised Sentiment Analysis | Kunrong Li et.al. | 2501.17598 | null |
2025-01-30 | Technical report on label-informed logit redistribution for better domain generalization in low-shot classification with foundation models | Behraj Khan et.al. | 2501.17595 | null |
2025-01-29 | GLLM: Self-Corrective G-Code Generation using Large Language Models with User Feedback | Mohamed Abdelaal et.al. | 2501.17584 | null |
2025-01-29 | CSEval: Towards Automated, Multi-Dimensional, and Reference-Free Counterspeech Evaluation using Auto-Calibrated LLMs | Amey Hengle et.al. | 2501.17581 | null |
2025-01-29 | Music2Latent2: Audio Compression with Summary Embeddings and Autoregressive Decoding | Marco Pasini et.al. | 2501.17578 | null |
2025-01-29 | Query-Aware Learnable Graph Pooling Tokens as Prompt for Large Language Models | Wooyoung Kim et.al. | 2501.17549 | null |
2025-01-29 | Towards Training-Free Open-World Classification with 3D Generative Models | Xinzhe Xia et.al. | 2501.17547 | null |
2025-01-29 | Is Conversational XAI All You Need? Human-AI Decision Making With a Conversational XAI Assistant | Gaole He et.al. | 2501.17546 | link |
2025-01-29 | Towards Supporting Penetration Testing Education with Large Language Models: an Evaluation and Comparison | Martin Nizon-Deladoeuille et.al. | 2501.17539 | null |
2025-01-29 | Neural Spelling: A Spell-Based BCI System for Language Neural Decoding | Xiaowei Jiang et.al. | 2501.17489 | null |
2025-01-29 | DFPE: A Diverse Fingerprint Ensemble for Enhancing LLM Performance | Seffi Cohen et.al. | 2501.17479 | link |
2025-01-29 | AugmenTest: Enhancing Tests with LLM-Driven Oracles | Shaker Mahmud Khandaker et.al. | 2501.17461 | null |
2025-01-29 | Large Language Models for Single-Step and Multi-Step Flight Trajectory Prediction | Kaiwei Luo et.al. | 2501.17459 | null |
2025-01-29 | Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation | Tiansheng Huang et.al. | 2501.17433 | link |
2025-01-29 | Actions Speak Louder than Words: Agent Decisions Reveal Implicit Biases in Language Models | Yuxuan Li et.al. | 2501.17420 | null |
2025-01-29 | MultiChallenge: A Realistic Multi-Turn Conversation Evaluation Benchmark Challenging to Frontier LLMs | Ved Sirdeshmukh et.al. | 2501.17399 | link |
2025-01-29 | Learning Free Token Reduction for Multi-Modal LLM | Zihui Zhao et.al. | 2501.17391 | null |
2025-01-29 | Context-Aware Semantic Recomposition Mechanism for Large Language Models | Richard Katrix et.al. | 2501.17386 | null |
2025-01-28 | Deep-and-Wide Learning: Enhancing Data-Driven Inference via Synergistic Learning of Inter- and Intra-Data Representations | Md Tauhidul Islam et.al. | 2501.17347 | null |
2025-01-28 | Memorize and Rank: Elevating Large Language Models for Clinical Diagnosis Prediction | Mingyu Derek Ma et.al. | 2501.17326 | null |
2025-01-28 | CardiCat: a Variational Autoencoder for High-Cardinality Tabular Data | Lee Carlin et.al. | 2501.17324 | null |
2025-01-30 | Probing LLM World Models: Enhancing Guesstimation with Wisdom of Crowds Decoding | Yun-Shiuan Chuang et.al. | 2501.17310 | null |
2025-01-28 | "Ownership, Not Just Happy Talk": Co-Designing a Participatory Large Language Model for Journalism | Emily Tseng et.al. | 2501.17299 | null |
2025-01-28 | Mitigating Hallucinated Translations in Large Language Models with Hallucination-focused Preference Optimization | Zilu Tang et.al. | 2501.17295 | null |
2025-01-28 | Fine-Tuning Open-Source Large Language Models to Improve Their Performance on Radiation Oncology Tasks: A Feasibility Study to Investigate Their Potential Clinical Applications in Radiation Oncology | Peilong Wang et.al. | 2501.17286 | null |
2025-01-30 | From Natural Language to Extensive-Form Game Representations | Shilong Deng et.al. | 2501.17282 | link |
2025-01-28 | Engineering Point Defects in MoS2 for Tailored Material Properties using Large Language Models | Abdalaziz Al-Maeeni et.al. | 2501.17279 | null |
2025-01-28 | Tailored Truths: Optimizing LLM Persuasion with Personalization and Fabricated Statistics | Jasper Timm et.al. | 2501.17273 | link |
2025-01-28 | Integrating Reinforcement Learning and AI Agents for Adaptive Robotic Interaction and Assistance in Dementia Care | Fengpei Yuan et.al. | 2501.17206 | null |
2025-01-28 | SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training | Tianzhe Chu et.al. | 2501.17161 | null |
2025-01-28 | FactCG: Enhancing Fact Checkers with Graph-Based Multi-Hop Data | Deren Lei et.al. | 2501.17144 | link |
2025-01-28 | ASTRAL: Automated Safety Testing of Large Language Models | Miriam Ugarte et.al. | 2501.17132 | null |
2025-01-28 | Optimizing Large Language Model Training Using FP4 Quantization | Ruizhe Wang et.al. | 2501.17116 | null |
2025-01-28 | Unlocking Transparent Alignment Through Enhanced Inverse Constitutional AI for Principle Extraction | Carl-Leander Henneking et.al. | 2501.17112 | null |
2025-01-28 | Goodness of Fit for Bayesian Generative Models with Applications in Population Genetics | Guillaume Le Mailloux et.al. | 2501.17107 | link |
2025-01-28 | Token-by-Token Regeneration and Domain Biases: A Benchmark of LLMs on Advanced Mathematical Problem-Solving | Evgenii Evstafev et.al. | 2501.17084 | null |
2025-01-28 | Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding | Akash Kumar et.al. | 2501.17053 | null |
2025-01-28 | Enhanced Retrieval of Long Documents: Leveraging Fine-Grained Block Representations with Large Language Models | Minghan Li et.al. | 2501.17039 | null |
2025-01-28 | Challenges in Ensuring AI Safety in DeepSeek-R1 Models: The Shortcomings of Reinforcement Learning Strategies | Manojkumar Parmar et.al. | 2501.17030 | null |
2025-01-28 | Automated Refactoring of Non-Idiomatic Python Code: A Differentiated Replication with LLMs | Alessandro Midolo et.al. | 2501.17024 | link |
2025-01-28 | Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement | Kei Katsumata et.al. | 2501.17022 | link |
2025-01-28 | MIDI-GPT: A Controllable Generative Model for Computer-Assisted Multitrack Music Composition | Philippe Pasquier et.al. | 2501.17011 | null |
2025-01-28 | Large Language Models for Code Generation: The Practitioners Perspective | Zeeshan Rasheed et.al. | 2501.16998 | link |
2025-01-28 | Artificial Intelligence Clones | Annie Liang et.al. | 2501.16996 | null |
2025-01-28 | FedEFM: Federated Endovascular Foundation Model with Unseen Data | Tuong Do et.al. | 2501.16992 | null |
2025-01-28 | Generative quantum combinatorial optimization by means of a novel conditional generative quantum eigensolver | Shunya Minami et.al. | 2501.16986 | null |
2025-01-28 | Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling | Hongzhi Huang et.al. | 2501.16975 | null |
2025-01-28 | Instantiation-based Formalization of Logical Reasoning Tasks using Language Models and Logical Solvers | Mohammad Raza et.al. | 2501.16961 | null |
2025-01-28 | Multiple Abstraction Level Retrieve Augment Generation | Zheng Zheng et.al. | 2501.16952 | null |
2025-01-29 | TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models | Makoto Shing et.al. | 2501.16937 | null |
2025-01-28 | Detecting harassment and defamation in cyberbullying with emotion-adaptive training | Peiling Yi et.al. | 2501.16925 | link |
2025-01-28 | RDMM: Fine-Tuned LLM Models for On-Device Robotic Decision Making with Enhanced Contextual Awareness in Specific Domains | Shady Nasrat et.al. | 2501.16899 | link |
2025-01-28 | Machine-learning semi-local exchange-correlation functionals for Kohn-Sham density functional theory of the Hubbard model | Eoghan Cronin et.al. | 2501.16893 | link |
2025-01-28 | Irony Detection, Reasoning and Understanding in Zero-shot Learning | Peiling Yi et.al. | 2501.16884 | null |
2025-01-28 | Comparing Human and LLM Generated Code: The Jury is Still Out! | Sherlock A. Licorish et.al. | 2501.16857 | null |
2025-01-28 | Adapting Network Information to Semantics for Generalizable and Plug-and-Play Multi-Scenario Network Diagnosis | Tiao Tan et.al. | 2501.16842 | null |
2025-01-28 | Misspellings in Natural Language Processing: A survey | Gianluca Sperduti et.al. | 2501.16836 | null |
2025-01-28 | DIRIGENt: End-To-End Robotic Imitation of Human Demonstrations Based on a Diffusion Model | Josua Spisak et.al. | 2501.16800 | null |
2025-01-28 | Algorithm for Automatic Legislative Text Consolidation | Matias Etcheverry et.al. | 2501.16794 | null |
2025-01-28 | Exponential Family Attention | Kevin Christian Wibisono et.al. | 2501.16790 | link |
2025-01-28 | Exploring the Role of Explicit Temporal Modeling in Multimodal Large Language Models for Video Understanding | Yun Li et.al. | 2501.16786 | null |
2025-01-28 | TORCHLIGHT: Shedding LIGHT on Real-World Attacks on Cloudless IoT Devices Concealed within the Tor Network | Yumingzhi Pan et.al. | 2501.16784 | null |
2025-01-28 | A Stochastic Dynamical Theory of LLM Self-Adversariality: Modeling Severity Drift as a Critical Process | Jack David Carson et.al. | 2501.16783 | null |
2025-01-29 | Beyond-Labels: Advancing Open-Vocabulary Segmentation With Vision-Language Models | Muhammad Atta ur Rahman et.al. | 2501.16769 | null |
2025-01-28 | DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation | Chenguo Lin et.al. | 2501.16764 | null |
2025-01-28 | HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns | Xinyue Shen et.al. | 2501.16750 | link |
2025-01-28 | Through the Prism of Culture: Evaluating LLMs' Understanding of Indian Subcultures and Traditions | Garima Chhikara et.al. | 2501.16748 | null |
2025-01-28 | LLM Assisted Anomaly Detection Service for Site Reliability Engineers: Enhancing Cloud Infrastructure Resilience | Nimesh Jha et.al. | 2501.16744 | null |
2025-01-28 | Distilling Large Language Models for Network Active Queue Management | Deol Satish et.al. | 2501.16734 | null |
2025-01-28 | xJailbreak: Representation Space Guided Reinforcement Learning for Interpretable LLM Jailbreaking | Sunbowen Lee et.al. | 2501.16727 | link |
2025-01-28 | One Head Eight Arms: Block Matrix based Low Rank Adaptation for CLIP-based Few-Shot Learning | Chunpeng Zhou et.al. | 2501.16720 | null |
2025-01-28 | Outlier Synthesis via Hamiltonian Monte Carlo for Out-of-Distribution Detection | Hengzhuang Li et.al. | 2501.16718 | link |
2025-01-28 | 3D-MoE: A Mixture-of-Experts Multi-modal LLM for 3D Vision and Pose Diffusion via Rectified Flow | Yueen Ma et.al. | 2501.16698 | null |
2025-01-28 | MME-Industry: A Cross-Industry Multimodal Evaluation Benchmark | Dongyi Yi et.al. | 2501.16688 | null |
2025-01-28 | Auto-Differentiating Any LLM Workflow: A Farewell to Manual Prompting | Li Yin et.al. | 2501.16673 | link |
2025-01-28 | VeriFact: Verifying Facts in LLM-Generated Clinical Text with Electronic Health Records | Philip Chung et.al. | 2501.16672 | link |
2025-01-28 | Contextual Reinforcement in Multimodal Token Compression for Large Language Models | Naderdel Piero et.al. | 2501.16658 | null |
2025-01-28 | Large Language Model Critics for Execution-Free Evaluation of Code Changes | Aashish Yadavally et.al. | 2501.16655 | link |
2025-01-28 | Molecular-driven Foundation Model for Oncologic Pathology | Anurag Vaidya et.al. | 2501.16652 | link |
2025-01-28 | DOCS: Quantifying Weight Similarity for Deeper Insights into Large Language Models | Zeping Min et.al. | 2501.16650 | null |
2025-01-28 | An LLM Benchmark for Addressee Recognition in Multi-modal Multi-party Dialogue | Koji Inoue et.al. | 2501.16643 | null |
2025-01-28 | CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs | Jinlan Fu et.al. | 2501.16629 | link |
2025-01-28 | Few-Shot Optimized Framework for Hallucination Detection in Resource-Limited NLP Systems | Baraa Hikal et.al. | 2501.16616 | null |
2025-01-28 | Sparse Autoencoders Trained on the Same Data Learn Different Features | Gonçalo Paulo et.al. | 2501.16615 | null |
2025-01-28 | Fine-Tuned Language Models as Space Systems Controllers | Enrico M. Zucchelli et.al. | 2501.16588 | null |
2025-01-27 | AffectGPT: A New Dataset, Model, and Benchmark for Emotion Understanding with Multimodal Large Language Models | Zheng Lian et.al. | 2501.16566 | null |
2025-01-27 | LoRA-X: Bridging Foundation Models with Training-Free Cross-Model Adaptation | Farzad Farhadzadeh et.al. | 2501.16559 | null |
2025-01-27 | Distributional Information Embedding: A Framework for Multi-bit Watermarking | Haiyun He et.al. | 2501.16558 | null |
2025-01-27 | PackDiT: Joint Human Motion and Text Generation via Mutual Prompting | Zhongyu Jiang et.al. | 2501.16551 | null |
2025-01-27 | PhysAnimator: Physics-Guided Generative Cartoon Animation | Tianyi Xie et.al. | 2501.16550 | null |
2025-01-27 | Sample-Efficient Behavior Cloning Using General Domain Knowledge | Feiyu Zhu et.al. | 2501.16546 | null |
2025-01-27 | Generalized Mission Planning for Heterogeneous Multi-Robot Teams via LLM-constructed Hierarchical Trees | Piyush Gupta et.al. | 2501.16539 | null |
2025-01-27 | Targeting Alignment: Extracting Safety Classifiers of Aligned LLMs | Jean-Charles Noirot Ferrand et.al. | 2501.16534 | null |
2025-01-27 | A comparison of data filtering techniques for English-Polish LLM-based machine translation in the biomedical domain | Jorge del Pozo Lérida et.al. | 2501.16533 | null |
2025-01-27 | Programming by Examples Meets Historical Linguistics: A Large Language Model Based Approach to Sound Law Induction | Atharva Naik et.al. | 2501.16524 | null |
2025-01-27 | How well can LLMs Grade Essays in Arabic? | Rayed Ghazawi et.al. | 2501.16516 | null |
2025-01-27 | Deception in LLMs: Self-Preservation and Autonomous Goals in Large Language Models | Sudarshan Kamath Barkur et.al. | 2501.16513 | null |
2025-01-27 | Smoothed Embeddings for Robust Language Models | Ryo Hase et.al. | 2501.16497 | null |
2025-01-27 | Explaining GitHub Actions Failures with Large Language Models: Challenges, Insights, and Limitations | Pablo Valenzuela-Toledo et.al. | 2501.16495 | null |
2025-01-27 | Generating customized prompts for Zero-Shot Rare Event Medical Image Classification using LLM | Payal Kamboj et.al. | 2501.16481 | link |
2025-01-27 | Cross-Domain Semantic Segmentation with Large Language Model-Assisted Descriptor Generation | Philip Hughes et.al. | 2501.16467 | null |
2025-01-27 | CoCoNUT: Structural Code Understanding does not fall out of a tree | Claas Beger et.al. | 2501.16456 | link |
2025-01-27 | Detecting Zero-Day Attacks in Digital Substations via In-Context Learning | Faizan Manzoor et.al. | 2501.16453 | null |
2025-01-27 | 360Brew: A Decoder-only Foundation Model for Personalized Ranking and Recommendation | Hamed Firooz et.al. | 2501.16450 | null |
2025-01-27 | DynAlign: Unsupervised Dynamic Taxonomy Alignment for Cross-Domain Segmentation | Han Sun et.al. | 2501.16410 | null |
2025-01-27 | Evaluating The Performance of Using Large Language Models to Automate Summarization of CT Simulation Orders in Radiation Oncology | Meiyun Cao et.al. | 2501.16309 | null |
2025-01-27 | RAPID: Retrieval-Augmented Parallel Inference Drafting for Text-Based Video Event Retrieval | Long Nguyen et.al. | 2501.16303 | null |
2025-01-27 | Matryoshka Re-Ranker: A Flexible Re-Ranking Architecture With Configurable Depth and Width | Zheng Liu et.al. | 2501.16302 | null |
2025-01-27 | Large Models in Dialogue for Active Perception and Anomaly Detection | Tzoulio Chamiti et.al. | 2501.16300 | link |
2025-01-27 | FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers | Renshan Zhang et.al. | 2501.16297 | null |
2025-01-27 | Brain-Adapter: Enhancing Neurological Disorder Analysis with Adapter-Tuning Multimodal Large Language Models | Jing Zhang et.al. | 2501.16282 | null |
2025-01-27 | Do LLMs Have Visualization Literacy? An Evaluation on Modified Visualizations to Test Generalization in Data Interpretation | Jiayi Hong et.al. | 2501.16277 | link |
2025-01-27 | URAG: Implementing a Unified Hybrid RAG for Precise Answers in University Admission Chatbots -- A Case Study at HCMUT | Long Nguyen et.al. | 2501.16276 | null |
2025-01-27 | A foundation model for human-AI collaboration in medical literature mining | Zifeng Wang et.al. | 2501.16255 | null |
2025-01-27 | Multi-Agent Geospatial Copilots for Remote Sensing Workflows | Chaehong Lee et.al. | 2501.16254 | null |
2025-01-27 | Zero-Shot Decision Tree Construction via Large Language Models | Lucas Carrasco et.al. | 2501.16247 | null |
2025-01-27 | CLISC: Bridging clip and sam by enhanced cam for unsupervised brain tumor segmentation | Xiaochuan Ma et.al. | 2501.16246 | null |
2025-01-27 | Phase Transitions in Large Language Models and the | Youran Sun et.al. | 2501.16241 | null |
2025-01-27 | AiGet: Transforming Everyday Moments into Hidden Knowledge Discovery with AI Assistance on Smart Glasses | Runze Cai et.al. | 2501.16240 | null |
2025-01-28 | Distilling foundation models for robust and efficient models in digital pathology | Alexandre Filiot et.al. | 2501.16239 | null |
2025-01-27 | Language-Based Bayesian Optimization Research Assistant (BORA) | Abdoulatif Cissé et.al. | 2501.16224 | null |
2025-01-27 | Enhancing Visual Inspection Capability of Multi-Modal Large Language Models on Medical Time Series with Supportive Conformalized and Interpretable Small Specialized Models | Huayu Li et.al. | 2501.16215 | link |
2025-01-27 | Provence: efficient and robust context pruning for retrieval-augmented generation | Nadezhda Chirkova et.al. | 2501.16214 | null |
2025-01-27 | Raiders of the Lost Dependency: Fixing Dependency Conflicts in Python using LLMs | Antony Bartlett et.al. | 2501.16191 | null |
2025-01-27 | SWIFT: Mapping Sub-series with Wavelet Decomposition Improves Time Series Forecasting | Wenxuan Xie et.al. | 2501.16178 | link |
2025-01-27 | BAG: Body-Aligned 3D Wearable Asset Generation | Zhongjin Luo et.al. | 2501.16177 | null |
2025-01-27 | Will Systems of LLM Agents Cooperate: An Investigation into a Social Dilemma | Richard Willis et.al. | 2501.16173 | link |
2025-01-27 | MetaDecorator: Generating Immersive Virtual Tours through Multimodality | Shuang Xie et.al. | 2501.16164 | null |
2025-01-27 | CITYWALK: Enhancing LLM-Based C++ Unit Test Generation via Project-Dependency Awareness and Language-Specific Knowledge | Yuwei Zhang et.al. | 2501.16155 | null |
2025-01-27 | AdaCoT: Rethinking Cross-Lingual Factual Reasoning through Adaptive Chain-of-Thought | Xin Huang et.al. | 2501.16154 | null |
2025-01-27 | AI Agents for Computer Use: A Review of Instruction-based Computer Control, GUI Automation, and Operator Assistants | Pascal J. Sager et.al. | 2501.16150 | null |
2025-01-27 | PATCH: Empowering Large Language Model with Programmer-Intent Guidance and Collaborative-Behavior Simulation for Automatic Bug Fixing | Yuwei Zhang et.al. | 2501.16149 | null |
2025-01-27 | SampleLLM: Optimizing Tabular Data Synthesis in Recommendations | Jingtong Gao et.al. | 2501.16125 | null |
2025-01-27 | Using Generative Models to Produce Realistic Populations of UK Windstorms | Yee Chun Tsoi et.al. | 2501.16110 | null |
2025-01-27 | Integration of LLM Quality Assurance into an NLG System | Ching-Yi Chen et.al. | 2501.16078 | null |
2025-01-27 | PISCO: Pretty Simple Compression for Retrieval-Augmented Generation | Maxime Louis et.al. | 2501.16075 | null |
2025-01-27 | A generative material transformer using Wyckoff representation | Pierre-Paul De Breuck et.al. | 2501.16051 | null |
2025-01-27 | Skeleton-Guided-Translation: A Benchmarking Framework for Code Repository Translation with Fine-Grained Quality Evaluation | Xing Zhang et.al. | 2501.16050 | null |
2025-01-27 | PRISMe: A Novel LLM-Powered Tool for Interactive Privacy Policy Assessment | Vincent Freiberger et.al. | 2501.16033 | null |
2025-01-27 | FDLLM: A Text Fingerprint Detection Method for LLMs in Multi-Language, Multi-Domain Black-Box Environments | Zhiyuan Fu et.al. | 2501.16029 | null |
2025-01-27 | Transformability reveals the interplay of dynamics across different network orders | Ming Xie et.al. | 2501.16016 | null |
2025-01-27 | TOPLOC: A Locality Sensitive Hashing Scheme for Trustless Verifiable Inference | Jack Min Ong et.al. | 2501.16007 | null |
2025-01-27 | EDSep: An Effective Diffusion-Based Method for Speech Source Separation | Jinwei Dong et.al. | 2501.15965 | null |
2025-01-27 | Rethinking the Bias of Foundation Model under Long-tailed Distribution | Jiahao Chen et.al. | 2501.15955 | null |
2025-01-27 | Understanding Long Videos via LLM-Powered Entity Relation Graphs | Meng Chu et.al. | 2501.15953 | null |
2025-01-27 | TimeHF: Billion-Scale Time Series Models Guided by Human Feedback | Yongzhi Qi et.al. | 2501.15942 | null |
2025-01-27 | SkillScope: A Tool to Predict Fine-Grained Skills Needed to Solve Issues on GitHub | Benjamin C. Carter et.al. | 2501.15922 | null |
2025-01-27 | Parametric Retrieval Augmented Generation | Weihang Su et.al. | 2501.15915 | link |
2025-01-27 | Robust Mobile Robot Path Planning via LLM-Based Dynamic Waypoint Generation | Muhammad Taha Tariq et.al. | 2501.15901 | null |
2025-01-27 | Investigating the Sensitivity of Pre-trained Audio Embeddings to Common Effects | Victor Deng et.al. | 2501.15900 | null |
2025-01-27 | Adaptive Width Neural Networks | Federico Errica et.al. | 2501.15889 | null |
2025-01-27 | LCTG Bench: LLM Controlled Text Generation Benchmark | Kentaro Kurihara et.al. | 2501.15875 | link |
2025-01-27 | LLM-attacker: Enhancing Closed-loop Adversarial Scenario Generation for Autonomous Driving with Large Language Models | Yuewen Mei et.al. | 2501.15850 | null |
2025-01-27 | SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model | Delin Qu et.al. | 2501.15830 | null |
2025-01-27 | Aging-aware CPU Core Management for Embodied Carbon Amortization in Cloud LLM Inference | Tharindu B. Hewage et.al. | 2501.15829 | link |
2025-01-27 | MADP: Multi-Agent Deductive Planning for Enhanced Cognitive-Behavioral Mental Health Question Answer | Qi Chen et.al. | 2501.15826 | null |
2025-01-27 | LemmaHead: RAG Assisted Proof Generation Using Large Language Models | Tianbo Yang et.al. | 2501.15797 | null |
2025-01-27 | Can Multimodal Large Language Models be Guided to Improve Industrial Anomaly Detection? | Zhiling Chen et.al. | 2501.15795 | null |
2025-01-27 | Harnessing Diverse Perspectives: A Multi-Agent Framework for Enhanced Error Detection in Knowledge Graphs | Yu Li et.al. | 2501.15791 | link |
2025-01-27 | Memorization and Regularization in Generative Diffusion Models | Ricardo Baptista et.al. | 2501.15785 | link |
2025-01-27 | Large Language Models to Diffusion Finetuning | Edoardo Cetin et.al. | 2501.15781 | null |
2025-01-27 | Is It Navajo? Accurate Language Detection in Endangered Athabaskan Languages | Ivory Yang et.al. | 2501.15773 | link |
2025-01-27 | GraphICL: Unlocking Graph Learning Potential in LLMs through Structured Prompt Design | Yuanfu Sun et.al. | 2501.15755 | null |
2025-01-27 | IndicMMLU-Pro: Benchmarking the Indic Large Language Models | Sankalp KJ et.al. | 2501.15747 | null |
2025-01-27 | Gensors: Authoring Personalized Visual Sensors with Multimodal Foundation Models and Reasoning | Michael Xieyang Liu et.al. | 2501.15727 | null |
2025-01-27 | A Survey on Computational Pathology Foundation Models: Datasets, Adaptation Strategies, and Evaluation Tasks | Dong Li et.al. | 2501.15724 | null |
2025-01-27 | On Parallelism in Music and Language: A Perspective from Symbol Emergence Systems based on Probabilistic Generative Models | Tadahiro Taniguchi et.al. | 2501.15721 | null |
2025-01-26 | Adapting Biomedical Abstracts into Plain language using Large Language Models | Haritha Gangavarapu et.al. | 2501.15700 | null |
2025-01-26 | TensorLLM: Tensorising Multi-Head Attention for Enhanced Reasoning and Compression in LLMs | Yuxuan Gu et.al. | 2501.15674 | null |
2025-01-26 | Bringing Characters to New Stories: Training-Free Theme-Specific Image Generation via Dynamic Visual Prompting | Yuxin Zhang et.al. | 2501.15641 | null |
2025-01-26 | BoKDiff: Best-of-K Diffusion Alignment for Target-Specific 3D Molecule Generation | Ali Khodabandeh Yalabadi et.al. | 2501.15631 | link |
2025-01-26 | Improving Estonian Text Simplification through Pretrained Language Models and Custom Datasets | Eduard Barbu et.al. | 2501.15624 | null |
2025-01-26 | Rethinking External Slow-Thinking: From Snowball Errors to Probability of Correct Reasoning | Zeyu Gan et.al. | 2501.15602 | link |
2025-01-26 | Evaluating an LLM-Powered Chatbot for Cognitive Restructuring: Insights from Mental Health Professionals | Yinzhou Wang et.al. | 2501.15599 | null |
2025-01-26 | Diffusion Generative Modeling for Spatially Resolved Gene Expression Inference from Histology Images | Sichen Zhu et.al. | 2501.15598 | link |
2025-01-26 | SedarEval: Automated Evaluation using Self-Adaptive Rubrics | Zhiyuan Fan et.al. | 2501.15595 | link |
2025-01-26 | SCP-116K: A High-Quality Problem-Solution Dataset and a Generalized Pipeline for Automated Extraction in the Higher Education Science Domain | Dakuan Lu et.al. | 2501.15587 | link |
2025-01-26 | Error Classification of Large Language Models on Math Word Problems: A Dynamically Adaptive Framework | Yuhong Sun et.al. | 2501.15581 | null |
2025-01-26 | Instruction Tuning for Story Understanding and Generation with Weak Supervision | Yangshu Yuan et.al. | 2501.15574 | null |
2025-01-26 | Cross-Cultural Fashion Design via Interactive Large Language Models and Diffusion Models | Spencer Ramsey et.al. | 2501.15571 | null |
2025-01-26 | ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer | Lin Yueyu et.al. | 2501.15570 | link |
2025-01-26 | Ocean-OCR: Towards General OCR Application via a Vision-Language Model | Song Chen et.al. | 2501.15558 | link |
2025-01-26 | Advancing Generative Artificial Intelligence and Large Language Models for Demand Side Management with Electric Vehicles | Hanwen Zhang et.al. | 2501.15544 | null |
2025-01-26 | Estimating Committor Functions via Deep Adaptive Sampling on Rare Transition Paths | Yueyang Wang et.al. | 2501.15522 | null |
2025-01-26 | Domain Adaptation from Generated Multi-Weather Images for Unsupervised Maritime Object Classification | Dan Song et.al. | 2501.15503 | null |
2025-01-26 | Unveiling the Potential of Multimodal Retrieval Augmented Generation with Planning | Xiaohan Yu et.al. | 2501.15470 | null |
2025-01-26 | Data-adaptive Safety Rules for Training Reward Models | Xiaomin Li et.al. | 2501.15453 | null |
2025-01-26 | OpenCharacter: Training Customizable Role-Playing LLMs with Large-Scale Synthetic Personas | Xiaoyang Wang et.al. | 2501.15427 | null |
2025-01-26 | Visual Generation Without Guidance | Huayu Chen et.al. | 2501.15420 | link |
2025-01-26 | AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement | Junan Zhang et.al. | 2501.15417 | null |
2025-01-26 | The Potential of Large Language Models in Supply Chain Management: Advancing Decision-Making, Efficiency, and Innovation | Raha Aghaei et.al. | 2501.15411 | null |
2025-01-26 | Semantic Layered Embedding Diffusion in Large Language Models for Multi-Contextual Consistency | Irin Kabakum et.al. | 2501.15405 | null |
2025-01-26 | How Green are Neural Language Models? Analyzing Energy Consumption in Text Summarization Fine-tuning | Tohida Rehman et.al. | 2501.15398 | null |
2025-01-26 | Zero-Shot Interactive Text-to-Image Retrieval via Diffusion-Augmented Representations | Zijun Long et.al. | 2501.15379 | null |
2025-01-26 | How to Mitigate Information Loss in Knowledge Graphs for GraphRAG: Leveraging Triple Context Restoration and Query-Driven Feedback | Manzong Huang et.al. | 2501.15378 | null |
2025-01-26 | Evaluating the Effectiveness of XAI Techniques for Encoder-Based Language Models | Melkamu Abay Mersha et.al. | 2501.15374 | null |
2025-01-26 | Scaling Large Vision-Language Models for Enhanced Multimodal Comprehension In Biomedical Image Analysis | Robinson Umeike et.al. | 2501.15370 | null |
2025-01-26 | Decentralized Low-Rank Fine-Tuning of Large Language Models | Sajjad Ghiasvand et.al. | 2501.15361 | null |
2025-01-26 | Large Language Models as Theory of Mind Aware Generative Agents with Counterfactual Reflection | Bo Yang et.al. | 2501.15355 | null |
2025-01-25 | Fairness in LLM-Generated Surveys | Andrés Abeliuk et.al. | 2501.15351 | null |
2025-01-25 | Between Puppet and Actor: Reframing Authorship in this Age of AI Agents | Yuqian Sun et.al. | 2501.15346 | null |
2025-01-25 | Recognize Any Surgical Object: Unleashing the Power of Weakly-Supervised Data | Jiajie Li et.al. | 2501.15326 | null |
2025-01-25 | ToMoE: Converting Dense Large Language Models to Mixture-of-Experts through Dynamic Structural Pruning | Shangqian Gao et.al. | 2501.15316 | null |
2025-01-25 | The Multicultural Medical Assistant: Can LLMs Improve Medical ASR Errors Across Borders? | Ayo Adedeji et.al. | 2501.15310 | null |
2025-01-25 | You Only Prune Once: Designing Calibration-Free Model Compression With Policy Learning | Ayan Sengupta et.al. | 2501.15296 | null |
2025-01-24 | HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation | Xin Zhou et.al. | 2501.14729 | link |
2025-01-24 | Do LLMs Provide Consistent Answers to Health-Related Questions across Languages? | Ipek Baris Schlicht et.al. | 2501.14719 | null |
2025-01-24 | Towards Better Understanding Table Instruction Tuning: Decoupling the Effects from Data versus Models | Naihao Deng et.al. | 2501.14717 | null |
2025-01-24 | FlexiGPT: Pruning and Extending Large Language Models with Low-Rank Weight Sharing | James Seale Smith et.al. | 2501.14713 | null |
2025-01-24 | The Karp Dataset | Mason DiCicco et.al. | 2501.14705 | null |
2025-01-24 | Rethinking Table Instruction Tuning | Naihao Deng et.al. | 2501.14693 | null |
2025-01-24 | Rethinking Foundation Models for Medical Image Classification through a Benchmark Study on MedMNIST | Fuping Wu et.al. | 2501.14685 | null |
2025-01-24 | An Empirical Study on LLM-based Classification of Requirements-related Provisions in Food-safety Regulations | Shabnam Hassani et.al. | 2501.14683 | null |
2025-01-24 | Diffusion based Text-to-Music Generation with Global and Local Text based Conditioning | Jisi Zhang et.al. | 2501.14680 | null |
2025-01-24 | MedAgentBench: Dataset for Benchmarking LLMs as Agents in Medical Applications | Yixing Jiang et.al. | 2501.14654 | link |
2025-01-24 | Investigating the (De)Composition Capabilities of Large Language Models in Natural-to-Formal Language Conversion | Ziyao Xu et.al. | 2501.14649 | link |
2025-01-24 | Towards Scalable Topological Regularizers | Hiu-Tung Wong et.al. | 2501.14641 | null |
2025-01-24 | Recommending Actionable Strategies: A Semantic Approach to Integrating Analytical Frameworks with Decision Heuristics | Renato Ghisellini et.al. | 2501.14634 | null |
2025-01-24 | Extracting Problem Structure with LLMs for Optimized SAT Local Search | André Schilder et.al. | 2501.14630 | null |
2025-01-24 | Single-neuron deep generative model uncovers underlying physics of neuronal activity in Ca imaging data | Jordi Abante et.al. | 2501.14615 | null |
2025-01-24 | ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations | Tianming Liang et.al. | 2501.14607 | null |
2025-01-24 | Leveraging ChatGPT's Multimodal Vision Capabilities to Rank Satellite Images by Poverty Level: Advancing Tools for Social Science Research | Hamid Sarmadi et.al. | 2501.14546 | null |
2025-01-24 | VERUS-LM: a Versatile Framework for Combining LLMs with Symbolic Reasoning | Benjamin Callewaert et.al. | 2501.14540 | null |
2025-01-24 | Design and Implementation of a Psychiatry Resident Training System Based on Large Language Models | Zhenguang Zhong et.al. | 2501.14530 | link |
2025-01-24 | Scene Understanding Enabled Semantic Communication with Open Channel Coding | Zhe Xiang et.al. | 2501.14520 | null |
2025-01-24 | Real-world Edge Neural Network Implementations Leak Private Interactions Through Physical Side Channel | Zhuoran Liu et.al. | 2501.14512 | null |
2025-01-24 | Automated Assignment Grading with Large Language Models: Insights From a Bioinformatics Course | Pavlin G. Poličar et.al. | 2501.14499 | null |
2025-01-24 | Evaluating and Improving Graph to Text Generation with Large Language Models | Jie He et.al. | 2501.14497 | link |
2025-01-24 | RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques | Zhengyang Tang et.al. | 2501.14492 | link |
2025-01-24 | Pesti-Gen: Unleashing a Generative Molecule Approach for Toxicity Aware Pesticide Design | Taehan Kim et.al. | 2501.14469 | null |
2025-01-24 | Boundary Value Test Input Generation Using Prompt Engineering with LLMs: Fault Detection and Coverage Analysis | Xiujing Guo et.al. | 2501.14465 | null |
2025-01-24 | Understanding and Mitigating Gender Bias in LLMs via Interpretable Neuron Editing | Zeping Yu et.al. | 2501.14457 | null |
2025-01-24 | Domaino1s: Guiding LLM Reasoning for Explainable Answers in High-Stakes Domains | Xu Chu et.al. | 2501.14431 | null |
2025-01-24 | GraphBC: Improving LLMs for Better Graph Data Processing | Xu Chu et.al. | 2501.14427 | null |
2025-01-24 | CENTS: Generating synthetic electricity consumption time series for rare and unseen scenarios | Michael Fuest et.al. | 2501.14426 | null |
2025-01-24 | DeepFlow: Serverless Large Language Model Serving at Scale | Junhao Hu et.al. | 2501.14417 | null |
2025-01-24 | SKIL: Semantic Keypoint Imitation Learning for Generalizable Data-efficient Manipulation | Shengjie Wang et.al. | 2501.14400 | null |
2025-01-24 | ECTIL: Label-efficient Computational Tumour Infiltrating Lymphocyte (TIL) assessment in breast cancer: Multicentre validation in 2,340 patients with breast cancer | Yoni Schirris et.al. | 2501.14379 | link |
2025-01-24 | DRESSing Up LLM: Efficient Stylized Question-Answering via Style Subspace Editing | Xinyu Ma et.al. | 2501.14371 | link |
2025-01-24 | Uncovering the bias in the evidence for dynamical dark energy through minimal and generalized modeling approaches | Ziad Sakr et.al. | 2501.14366 | null |
2025-01-24 | FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration | Kai-Tuo Xu et.al. | 2501.14350 | link |
2025-01-24 | Chain-of-Retrieval Augmented Generation | Liang Wang et.al. | 2501.14342 | null |
2025-01-24 | Exploring the sustainable scaling of AI dilemma: A projective study of corporations' AI environmental impacts | Clément Desroches et.al. | 2501.14334 | null |
2025-01-24 | Assessing Large Language Models in Comprehending and Verifying Concurrent Programs across Memory Models | Ridhi Jain et.al. | 2501.14326 | null |
2025-01-24 | PAID: A Framework of Product-Centric Advertising Image Design | Hongyu Chen et.al. | 2501.14316 | null |
2025-01-24 | Locality-aware Fair Scheduling in LLM Serving | Shiyi Cao et.al. | 2501.14312 | null |
2025-01-24 | A Zero-Shot LLM Framework for Automatic Assignment Grading in Higher Education | Calvin Yeung et.al. | 2501.14305 | link |
2025-01-24 | MASTER: A Multi-Agent System with LLM Specialized MCTS | Bingzheng Gan et.al. | 2501.14304 | null |
2025-01-24 | Fast Think-on-Graph: Wider, Deeper and Faster Reasoning of Large Language Model on Knowledge Graph | Xujian Liang et.al. | 2501.14300 | link |
2025-01-24 | Multi-stage Large Language Model Pipelines Can Outperform GPT-4o in Relevance Assessment | Julian A. Schnabel et.al. | 2501.14296 | null |
2025-01-24 | Examining Alignment of Large Language Models through Representative Heuristics: The Case of Political Stereotypes | Sullam Jeoung et.al. | 2501.14294 | link |
2025-01-24 | Advances in Temporal Point Processes: Bayesian, Deep, and LLM Approaches | Feng Zhou et.al. | 2501.14291 | null |
2025-01-24 | Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation | Sadegh Mahdavi et.al. | 2501.14275 | link |
2025-01-24 | Siren: A Learning-Based Multi-Turn Attack Framework for Simulating Real-World Human Jailbreak Behaviors | Yi Zhao et.al. | 2501.14250 | link |
2025-01-24 | Humanity's Last Exam | Long Phan et.al. | 2501.14249 | null |
2025-01-24 | Multi-agent KTO: Reinforcing Strategic Interactions of Large Language Model in Language Game | Rong Ye et.al. | 2501.14225 | null |
2025-01-24 | Top Ten Challenges Towards Agentic Neural Graph Databases | Jiaxin Bai et.al. | 2501.14224 | null |
2025-01-24 | TFG-Flow: Training-free Guidance in Multimodal Generative Flow | Haowei Lin et.al. | 2501.14216 | null |
2025-01-24 | Serving Long-Context LLMs at the Mobile Edge: Test-Time Reinforcement Learning-based Model Caching and Inference Offloading | Minrui Xu et.al. | 2501.14205 | null |
2025-01-24 | VideoShield: Regulating Diffusion-based Video Generation Models via Watermarking | Runyi Hu et.al. | 2501.14195 | link |
2025-01-24 | Distributed Multi-Agent Coordination Using Multi-Modal Foundation Models | Saaduddin Mahmud et.al. | 2501.14189 | null |
2025-01-24 | GeoSim.AI: AI assistants for numerical simulations in geomechanics | Yared W. Bekele et.al. | 2501.14186 | null |
2025-01-24 | AI Chatbots as Professional Service Agents: Developing a Professional Identity | Wenwen Li et.al. | 2501.14179 | null |
2025-01-24 | Argos: Agentic Time-Series Anomaly Detection with Autonomous Rule Generation via Large Language Models | Yile Gu et.al. | 2501.14170 | null |
2025-01-24 | Test-Time Code-Switching for Cross-lingual Aspect Sentiment Triplet Extraction | Dongming Sheng et.al. | 2501.14144 | null |
2025-01-23 | Autonomous Structural Memory Manipulation for Large Language Models Using Hierarchical Embedding Augmentation | Derek Yotheringhay et.al. | 2501.14119 | null |
2025-01-23 | Domain-Factored Untrained Deep Prior for Spectrum Cartography | Subash Timilsina et.al. | 2501.14116 | null |
2025-01-23 | MedSlice: Fine-Tuned Large Language Models for Secure Clinical Note Sectioning | Joshua Davis et.al. | 2501.14105 | link |
2025-01-23 | StreamingRAG: Real-time Contextual Retrieval and Generation Framework | Murugan Sankaradas et.al. | 2501.14101 | null |
2025-01-23 | Enhancing Biomedical Relation Extraction with Directionality | Po-Ting Lai et.al. | 2501.14079 | link |
2025-01-23 | LLMs are Vulnerable to Malicious Prompts Disguised as Scientific Language | Yubin Ge et.al. | 2501.14073 | null |
2025-01-23 | Efficient 2D CT Foundation Model for Contrast Phase Classification | Benjamin Hou et.al. | 2501.14066 | null |
2025-01-23 | Revisiting CLIP: Efficient Alignment of 3D MRI and Tabular Data using Domain-Specific Foundation Models | Jakob Krogh Petersen et.al. | 2501.14051 | link |
2025-01-23 | LLM-guided Instance-level Image Manipulation with Diffusion U-Net Cross-Attention Maps | Andrey Palaev et.al. | 2501.14046 | link |
2025-01-23 | Leveraging Large Language Models to Analyze Emotional and Contextual Drivers of Teen Substance Use in Online Discussions | Jianfeng Zhu et.al. | 2501.14037 | null |
2025-01-23 | CRPO: Confidence-Reward Driven Preference Optimization for Machine Translation | Guofeng Cui et.al. | 2501.13927 | null |
2025-01-23 | Improving Video Generation with Human Feedback | Jie Liu et.al. | 2501.13918 | null |
2025-01-23 | Binary Diffusion Probabilistic Model | Vitaliy Kinakh et.al. | 2501.13915 | null |
2025-01-23 | Analysis of Indic Language Capabilities in LLMs | Aatman Vaidya et.al. | 2501.13912 | null |
2025-01-23 | Privacy-Preserving Personalized Federated Prompt Learning for Multimodal Large Language Models | Linh Tran et.al. | 2501.13904 | null |
2025-01-23 | Exploring Finetuned Audio-LLM on Heart Murmur Features | Adrian Florea et.al. | 2501.13884 | null |
2025-01-23 | The machine learning platform for developers of large systems | Alexey Naikov et.al. | 2501.13881 | null |
2025-01-23 | A RAG-Based Institutional Assistant | Gustavo Kuratomi et.al. | 2501.13880 | null |
2025-01-23 | On the Reasoning Capacity of AI Models and How to Quantify It | Santosh Kumar Radha et.al. | 2501.13833 | null |
2025-01-23 | Predicting Compact Phrasal Rewrites with Large Language Models for ASR Post Editing | Hao Zhang et.al. | 2501.13831 | null |
2025-01-23 | Hallucinations Can Improve Large Language Models in Drug Discovery | Shuzhou Yuan et.al. | 2501.13824 | null |
2025-01-23 | Large Language Model driven Policy Exploration for Recommender Systems | Jie Wang et.al. | 2501.13816 | null |
2025-01-23 | Enhancing LLMs for Governance with Human Oversight: Evaluating and Aligning LLMs on Expert Classification of Climate Misinformation for Detecting False or Misleading Claims about Climate Change | Mowafak Allaham et.al. | 2501.13802 | null |
2025-01-23 | Parameter-Efficient Fine-Tuning for Foundation Models | Dan Zhang et.al. | 2501.13787 | link |
2025-01-23 | Not Every AI Problem is a Data Problem: We Should Be Intentional About Data Scaling | Tanya Rodchenko et.al. | 2501.13779 | null |
2025-01-23 | Explainable XR: Understanding User Behaviors of XR Environments using LLM-assisted Analytics Framework | Yoonsang Kim et.al. | 2501.13778 | link |
2025-01-23 | Do Large Language Models Truly Understand Geometric Structures? | Xiaofeng Wang et.al. | 2501.13773 | link |
2025-01-23 | Tune In, Act Up: Exploring the Impact of Audio Modality-Specific Edits on Large Audio Language Models in Jailbreak | Erjia Xiao et.al. | 2501.13772 | null |
2025-01-23 | UGMathBench: A Diverse and Dynamic Benchmark for Undergraduate-Level Mathematical Reasoning with Large Language Models | Xin Xu et.al. | 2501.13766 | null |
2025-01-23 | EICopilot: Search and Explore Enterprise Information over Large-scale Knowledge Graphs with LLM-driven Agents | Yuhui Yun et.al. | 2501.13746 | null |
2025-01-23 | GPT-HTree: A Decision Tree Framework Integrating Hierarchical Clustering and Large Language Models for Explainable Classification | Te Pei et.al. | 2501.13743 | null |
2025-01-23 | An Empirical Study of Retrieval-Augmented Code Generation: Challenges and Opportunities | Zezhou Yang et.al. | 2501.13742 | link |
2025-01-23 | Pseudocode-Injection Magic: Enabling LLMs to Tackle Graph Computational Tasks | Chang Gong et.al. | 2501.13731 | null |
2025-01-23 | RPO: Retrieval Preference Optimization for Robust Retrieval-Augmented Generation | Shi-Qi Yan et.al. | 2501.13726 | null |
2025-01-23 | Musical ethnocentrism in Large Language Models | Anna Kruspe et.al. | 2501.13720 | null |
2025-01-23 | A Mutual Information Perspective on Multiple Latent Variable Generative Models for Positive View Generation | Dario Serez et.al. | 2501.13718 | null |
2025-01-23 | EventVL: Understand Event Streams via Multimodal Large Language Model | Pengteng Li et.al. | 2501.13707 | null |
2025-01-23 | DI-BENCH: Benchmarking Large Language Models on Dependency Inference with Testable Repositories at Scale | Linghao Zhang et.al. | 2501.13699 | null |
2025-01-23 | Question Answering on Patient Medical Records with Private Fine-Tuned LLMs | Sara Kothari et.al. | 2501.13687 | null |
2025-01-23 | HumorReject: Decoupling LLM Safety from Refusal Prefix via A Little Humor | Zihui Wu et.al. | 2501.13677 | link |
2025-01-23 | How to Complete Domain Tuning while Keeping General Ability in LLM: Adaptive Layer-wise and Element-wise Regularization | Shezheng Song et.al. | 2501.13669 | null |
2025-01-23 | LVPruning: An Effective yet Simple Language-Guided Vision Token Pruning Approach for Multi-modal Large Language Models | Yizheng Sun et.al. | 2501.13652 | null |
2025-01-23 | Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models | Zhenghao Lin et.al. | 2501.13629 | null |
2025-01-23 | Text-to-SQL based on Large Language Models and Database Keyword Search | Eduardo R. Nascimento et.al. | 2501.13594 | null |
2025-01-23 | Improving Contextual Faithfulness of Large Language Models via Retrieval Heads-Induced Optimization | Lei Huang et.al. | 2501.13573 | null |
2025-01-23 | One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt | Tao Liu et.al. | 2501.13554 | link |
2025-01-23 | LLMs Can Plan Only If We Tell Them | Bilgehan Sel et.al. | 2501.13545 | null |
2025-01-23 | ReasVQA: Advancing VideoQA with Imperfect Reasoning Process | Jianxin Liang et.al. | 2501.13536 | null |
2025-01-23 | RECALL: Library-Like Behavior In Language Models is Enhanced by Self-Referencing Causal Cycles | Munachiso Nwadike et.al. | 2501.13491 | link |
2025-01-23 | Adaptive Testing for LLM-Based Applications: A Diversity-based Approach | Juyeon Yoon et.al. | 2501.13480 | null |
2025-01-23 | LDR-Net: A Novel Framework for AI-generated Image Detection via Localized Discrepancy Representation | JiaXin Chen et.al. | 2501.13475 | null |
2025-01-23 | Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge | Haomiao Xiong et.al. | 2501.13468 | link |
2025-01-23 | Spurious Forgetting in Continual Learning of Language Models | Junhao Zheng et.al. | 2501.13453 | link |
2025-01-23 | Softplus Attention with Re-weighting Boosts Length Extrapolation in Large Language Models | Bo Gao et.al. | 2501.13428 | null |
2025-01-23 | Predicting Turbulence Structure In Street-Canyon Flows using Deep Generative Modeling | Tomek Jaroslawski et.al. | 2501.13415 | null |
2025-01-23 | VulnBot: Autonomous Penetration Testing for A Multi-Agent Collaborative Framework | He Kong et.al. | 2501.13411 | link |
2025-01-23 | Towards Intelligent Design: A Self-driven Framework for Collocated Clothing Synthesis Leveraging Fashion Styles and Textures | Minglong Dong et.al. | 2501.13396 | null |
2025-01-23 | Can Large Language Models Understand Preferences in Personalized Recommendation? | Zhaoxuan Tan et.al. | 2501.13391 | link |
2025-01-23 | Do as We Do, Not as You Think: the Conformity of Large Language Models | Zhiyuan Weng et.al. | 2501.13381 | link |
2025-01-23 | Scalable Evaluation Framework for Foundation Models in Musculoskeletal MRI Bridging Computational Innovation with Clinical Utility | Gabrielle Hoyer et.al. | 2501.13376 | null |
2025-01-23 | Generative Data Augmentation Challenge: Zero-Shot Speech Synthesis for Personalized Speech Enhancement | Jae-Sung Bae et.al. | 2501.13372 | null |
2025-01-23 | Meta-Feature Adapter: Integrating Environmental Metadata for Enhanced Animal Re-identification | Yuzhuo Li et.al. | 2501.13368 | null |
2025-01-23 | 50 Shades of Deceptive Patterns: A Unified Taxonomy, Multimodal Detection, and Security Implications | Zewei Shi et.al. | 2501.13351 | link |
2025-01-23 | MSF: Efficient Diffusion Model Via Multi-Scale Latent Factorize | Haohang Xu et.al. | 2501.13349 | null |
2025-01-23 | Full-Stack Optimized Large Language Models for Lifelong Sequential Behavior Comprehension in Recommendation | Rong Shan et.al. | 2501.13344 | null |
2025-01-23 | Multi-aspect Knowledge Distillation with Large Language Model | Taegyeong Lee et.al. | 2501.13341 | link |
2025-01-23 | Generative Multi-Form Bayesian Optimization | Zhendong Guo et.al. | 2501.13337 | null |
2025-01-23 | SplitLLM: Hierarchical Split Learning for Large Language Model over Wireless Network | Songge Zhang et.al. | 2501.13318 | null |
2025-01-23 | Representing Visualization Insights as a Dense Insight Network | Jane Hoffswell et.al. | 2501.13309 | null |
2025-01-23 | OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia | Xuelong Geng et.al. | 2501.13306 | link |
2025-01-23 | Watching the AI Watchdogs: A Fairness and Robustness Analysis of AI Safety Moderation Classifiers | Akshit Achara et.al. | 2501.13302 | link |
2025-01-23 | Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents | Shrinidhi Kumbhar et.al. | 2501.13299 | null |
2025-01-23 | RAMQA: A Unified Framework for Retrieval-Augmented Multi-Modal Question Answering | Yang Bai et.al. | 2501.13297 | link |
2025-01-23 | Toyteller: AI-powered Visual Storytelling Through Toy-Playing with Character Symbols | John Joon Young Chung et.al. | 2501.13284 | null |
2025-01-22 | MEDFORM: A Foundation Model for Contrastive Learning of CT Imaging and Clinical Numeric Data in Multi-Cancer Analysis | Daeun Jung et.al. | 2501.13277 | link |
2025-01-22 | RAG-Reward: Optimizing RAG with Reward Modeling and RLHF | Hanning Zhang et.al. | 2501.13264 | null |
2025-01-22 | Exploring GPT's Ability as a Judge in Music Understanding | Kun Fang et.al. | 2501.13261 | link |
2025-01-22 | Bypassing Array Canaries via Autonomous Function Call Resolution | Nathaniel Oh et.al. | 2501.13256 | link |
2025-01-22 | S-LoRA: Scalable Low-Rank Adaptation for Class Incremental Learning | Yichen Wu et.al. | 2501.13198 | null |
2025-01-22 | Computational modelling of biological systems now and then: revisiting tools and visions from the beginning of the century | Axel Loewe et.al. | 2501.13142 | null |
2025-01-23 | VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding | Boqiang Zhang et.al. | 2501.13106 | link |
2025-01-22 | Robust Representation Consistency Model via Contrastive Denoising | Jiachen Lei et.al. | 2501.13094 | link |
2025-01-22 | Refining Input Guardrails: Enhancing LLM-as-a-Judge Efficiency Through Chain-of-Thought Fine-Tuning and Alignment | Melissa Kazemi Rad et.al. | 2501.13080 | null |
2025-01-22 | Does Table Source Matter? Benchmarking and Improving Multimodal Scientific Table Understanding and Reasoning | Bohao Yang et.al. | 2501.13042 | link |
2025-01-22 | Pairwise RM: Perform Best-of-N Sampling with Knockout Tournament | Yantao Liu et.al. | 2501.13007 | link |
2025-01-22 | Neural network enhanced cross entropy benchmark for monitored circuits | Yangrui Hu et.al. | 2501.13005 | null |
2025-01-22 | Large Language Model-Based Semantic Communication System for Image Transmission | Soheyb Ribouh et.al. | 2501.12988 | null |
2025-01-22 | LLM4WM: Adapting LLM for Wireless Multi-Tasking | Xuanyu Liu et.al. | 2501.12983 | null |
2025-01-22 | Low-dimensional adaptation of diffusion models: Convergence in total variation | Jiadong Liang et.al. | 2501.12982 | null |
2025-01-22 | OnionEval: An Unified Evaluation of Fact-conflicting Hallucination for Small-Large Language Models | Chongren Sun et.al. | 2501.12975 | link |
2025-01-22 | Accessible Smart Contracts Verification: Synthesizing Formal Models with Tamed LLMs | Jan Corazza et.al. | 2501.12972 | null |
2025-01-22 | It's complicated. The relationship of algorithmic fairness and non-discrimination regulations in the EU AI Act | Kristof Meding et.al. | 2501.12962 | null |
2025-01-22 | Efficient Prompt Compression with Evaluator Heads for Long-Context Transformer Inference | Weizhi Fei et.al. | 2501.12959 | null |
2025-01-22 | GANQ: GPU-Adaptive Non-Uniform Quantization for Large Language Models | Pengxiang Zhao et.al. | 2501.12956 | null |
2025-01-22 | 3D Object Manipulation in a Single Image using Generative Models | Ruisi Zhao et.al. | 2501.12935 | null |
2025-01-22 | Correctness Assessment of Code Generated by Large Language Models Using Internal Representations | Tuan-Dung Bui et.al. | 2501.12934 | link |
2025-01-22 | DynamicEarth: How Far are We from Open-Vocabulary Change Detection? | Kaiyu Li et.al. | 2501.12931 | null |
2025-01-22 | A Functional Software Reference Architecture for LLM-Integrated Systems | Alessio Bucaioni et.al. | 2501.12904 | null |
2025-01-22 | Architectural Fusion Through Contextual Partitioning in Large Language Models: A Novel Approach to Parameterized Knowledge Integration | Offa Kingsleigh et.al. | 2501.12901 | null |
2025-01-22 | Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback | Yafu Li et.al. | 2501.12895 | link |
2025-01-23 | Generative AI Misuse Potential in Cyber Security Education: A Case Study of a UK Degree Program | Carlton Shepherd et.al. | 2501.12883 | null |
2025-01-22 | WisdomBot: Tuning Large Language Models with Artificial Intelligence Knowledge | Jingyuan Chen et.al. | 2501.12877 | null |
2025-01-22 | ACEBench: Who Wins the Match Point in Tool Learning? | Chen Chen et.al. | 2501.12851 | null |
2025-01-22 | AMM-Diff: Adaptive Multi-Modality Diffusion Network for Missing Modality Imputation | Aghiles Kebaili et.al. | 2501.12840 | null |
2025-01-22 | Adaptive Retrieval Without Self-Knowledge? Bringing Uncertainty Back Home | Viktor Moskvoretskii et.al. | 2501.12835 | null |
2025-01-22 | Open or Closed LLM for Lesser-Resourced Languages? Lessons from Greek | John Pavlopoulos et.al. | 2501.12826 | link |
2025-01-22 | Enhancing Monocular Depth Estimation with Multi-Source Auxiliary Tasks | Alessio Quercia et.al. | 2501.12824 | null |
2025-01-22 | Certified Guidance for Planning with Deep Generative Models | Francesco Giacomarra et.al. | 2501.12815 | null |
2025-01-22 | Revisit Self-Debugging with Self-Generated Tests for Code Generation | Xiancai Chen et.al. | 2501.12793 | null |
2025-01-22 | LLMs as Repositories of Factual Knowledge: Limitations and Solutions | Seyed Mahed Mousavi et.al. | 2501.12774 | null |
2025-01-22 | NExtLong: Toward Effective Long-Context Training without Long Documents | Chaochen Gao et.al. | 2501.12766 | link |
2025-01-22 | Online Preference Alignment for Language Models via Count-based Exploration | Chenjia Bai et.al. | 2501.12735 | link |
2025-01-22 | Paradigm-Based Automatic HDL Code Generation Using LLMs | Wenhao Sun et.al. | 2501.12702 | null |
2025-01-22 | Training Dialogue Systems by AI Feedback for Improving Overall Dialogue Impression | Kai Yoshida et.al. | 2501.12698 | null |
2025-01-22 | Combining Knowledge Graph and LLMs for Enhanced Zero-shot Visual Question Answering | Qian Tao et.al. | 2501.12697 | null |
2025-01-22 | SoundSpring: Loss-Resilient Audio Transceiver with Dual-Functional Masked Language Modeling | Shengshi Yao et.al. | 2501.12696 | null |
2025-01-22 | EchoLM: Accelerating LLM Serving with Real-time Knowledge Distillation | Yifan Yu et.al. | 2501.12689 | null |
2025-01-22 | Distillation Quantification for Large Language Models | Sunbowen Lee et.al. | 2501.12619 | link |
2025-01-22 | Deep Learning-Based Identification of Inconsistent Method Names: How Far Are We? | Taiming Wang et.al. | 2501.12617 | null |
2025-01-22 | Kimi k1.5: Scaling Reinforcement Learning with LLMs | Kimi Team et.al. | 2501.12599 | null |
2025-01-22 | Leveraging LLMs to Create a Haptic Devices' Recommendation System | Yang Liu et.al. | 2501.12573 | null |
2025-01-22 | Understanding the LLM-ification of CHI: Unpacking the Impact of LLMs at CHI through a Systematic Literature Review | Rock Yuren Pang et.al. | 2501.12557 | link |
2025-01-21 | Human-like conceptual representations emerge from language prediction | Ningyu Xu et.al. | 2501.12547 | null |
2025-01-21 | How Does the Spatial Distribution of Pre-training Data Affect Geospatial Foundation Models? | Mirali Purohit et.al. | 2501.12535 | null |
2025-01-21 | An Empirically-grounded tool for Automatic Prompt Linting and Repair: A Case Study on Bias, Vulnerability, and Optimization in Developer Prompts | Dhia Elhaq Rzig et.al. | 2501.12521 | null |
2025-01-21 | A Domain Adaptation Framework for Speech Recognition Systems with Only Synthetic data | Minh Tran et.al. | 2501.12501 | null |
2025-01-21 | The Journey Matters: Average Parameter Count over Pre-training Unifies Sparse and Dense Scaling Laws | Tian Jin et.al. | 2501.12486 | null |
2025-01-21 | An Empirical Characterization of Outages and Incidents in Public Services for Large Language Models | Xiaoyu Chu et.al. | 2501.12469 | link |
2025-01-21 | Adaptive PII Mitigation Framework for Large Language Models | Shubhi Asthana et.al. | 2501.12465 | null |
2025-01-21 | Empowering AIOps: Leveraging Large Language Models for IT Operations Management | Arthur Vitui et.al. | 2501.12461 | link |
2025-01-21 | Deploying Privacy Guardrails for LLMs: A Comparative Analysis of Real-World Applications | Shubhi Asthana et.al. | 2501.12456 | null |
2025-01-21 | Divide-Then-Aggregate: An Efficient Tool Learning Method via Parallel Tool Invocation | Dongsheng Zhu et.al. | 2501.12432 | null |
2025-01-21 | FREYR: A Framework for Recognizing and Executing Your Requests | Roberto Gallotta et.al. | 2501.12423 | link |
2025-01-21 | CroMe: Multimodal Fake News Detection using Cross-Modal Tri-Transformer and Metric Learning | Eunjee Choi et.al. | 2501.12422 | null |
2025-01-22 | InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling | Yi Wang et.al. | 2501.12386 | link |
2025-01-21 | Accelerating Pulsar Parameter Estimation Using Convolutional Neural Networks | Greg Olmschenk et.al. | 2501.12383 | null |
2025-01-21 | MMVU: Measuring Expert-Level Multi-Discipline Video Understanding | Yilun Zhao et.al. | 2501.12380 | link |
2025-01-22 | Video Depth Anything: Consistent Depth Estimation for Super-Long Videos | Sili Chen et.al. | 2501.12375 | null |
2025-01-21 | Expertise elevates AI usage: experimental evidence comparing laypeople and professional artists | Thomas F. Eisenmann et.al. | 2501.12374 | link |
2025-01-21 | Is Long Context All You Need? Leveraging LLM's Extended Context for NL2SQL | Yeounoh Chung et.al. | 2501.12372 | null |
2025-01-21 | Automatic Labelling with Open-source LLMs using Dynamic Label Schema Integration | Thomas Walshe et.al. | 2501.12332 | null |
2025-01-21 | Cinepro: Robust Training of Foundation Models for Cancer Detection in Prostate Ultrasound Cineloops | Mohamed Harmanani et.al. | 2501.12331 | link |
2025-01-21 | VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model | Xianwei Zhuang et.al. | 2501.12327 | link |
2025-01-21 | LLM-Assisted Knowledge Graph Completion for Curriculum and Domain Modelling in Personalized Higher Education Recommendations | Hasan Abu-Rasheed et.al. | 2501.12300 | null |
2025-01-21 | MoGERNN: An Inductive Traffic Predictor for Unobserved Locations in Dynamic Sensing Networks | Qishen Zhou et.al. | 2501.12281 | link |
2025-01-21 | Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement | Maosong Cao et.al. | 2501.12273 | link |
2025-01-21 | FOCUS: First Order Concentrated Updating Scheme | Yizhou Liu et.al. | 2501.12243 | null |
2025-01-21 | InsTALL: Context-aware Instructional Task Assistance with Multi-modal Large Language Models | Pha Nguyen et.al. | 2501.12231 | null |
2025-01-21 | CDW-CoT: Clustered Distance-Weighted Chain-of-Thoughts Reasoning | Yuanheng Fang et.al. | 2501.12226 | null |
2025-01-21 | Leveraging Large Language Models for Realizing Truly Intelligent User Interfaces | Allard Oelen et.al. | 2501.12221 | null |
2025-01-21 | You Can't Eat Your Cake and Have It Too: The Performance Degradation of LLMs with Jailbreak Defense | Wuyuao Mai et.al. | 2501.12210 | null |
2025-01-21 | Explainability for Vision Foundation Models: A Survey | Rémi Kazmierczak et.al. | 2501.12203 | null |
2025-01-22 | Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation | Zibo Zhao et.al. | 2501.12202 | link |
2025-01-21 | BiMarker: Enhancing Text Watermark Detection for Large Language Models with Bipolar Watermarks | Zhuang Li et.al. | 2501.12174 | null |
2025-01-21 | Contextualizing Recommendation Explanations with LLMs: A User Study | Yuanjun Feng et.al. | 2501.12152 | null |
2025-01-21 | Improving Influence-based Instruction Tuning Data Selection for Balanced Learning of Diverse Capabilities | Qirun Dai et.al. | 2501.12147 | null |
2025-01-21 | Do LLMs Provide Links to Code Similar to what they Generate? A Study with Gemini and Bing CoPilot | Daniele Bifolco et.al. | 2501.12134 | null |
2025-01-21 | Evaluating Efficiency and Engagement in Scripted and LLM-Enhanced Human-Robot Interactions | Tim Schreiter et.al. | 2501.12128 | null |
2025-01-21 | Can open source large language models be used for tumor documentation in Germany? -- An evaluation on urological doctors' notes | Stefan Lenz et.al. | 2501.12106 | link |
2025-01-21 | Dissecting the NVIDIA Hopper Architecture through Microbenchmarking and Multiple Level Analysis | Weile Luo et.al. | 2501.12084 | null |
2025-01-21 | Phishing Awareness via Game-Based Learning | Argianto Rahartomo et.al. | 2501.12077 | link |
2025-01-21 | PINNsAgent: Automated PDE Surrogation with Large Language Models | Qingpo Wuwu et.al. | 2501.12053 | null |
2025-01-21 | Harnessing Generative Pre-Trained Transformer for Datacenter Packet Trace Generation | Chen Griner et.al. | 2501.12033 | null |
2025-01-21 | Comparative Analysis of Pre-trained Deep Learning Models and DINOv2 for Cushing's Syndrome Diagnosis in Facial Analysis | Hongjun Liu et.al. | 2501.12023 | null |
2025-01-21 | Are Traditional Deep Learning Model Approaches as Effective as a Retinal-Specific Foundation Model for Ocular and Systemic Disease Detection? | Samantha Min Er Yew et.al. | 2501.12016 | null |
2025-01-21 | Rate-Aware Learned Speech Compression | Jun Xu et.al. | 2501.11999 | null |
2025-01-21 | Linear Feedback Control Systems for Iterative Prompt Optimization in Large Language Models | Rupesh Raj Karn et.al. | 2501.11979 | null |
2025-01-21 | Leveraging Graph Structures and Large Language Models for End-to-End Synthetic Task-Oriented Dialogues | Maya Medjad et.al. | 2501.11977 | link |
2025-01-21 | Bridging Visualization and Optimization: Multimodal Large Language Models on Graph-Structured Combinatorial Optimization | Jie Zhao et.al. | 2501.11968 | null |
2025-01-21 | A Hybrid Attention Framework for Fake News Detection with Large Language Models | Xiaochuan Xu et.al. | 2501.11967 | null |
2025-01-21 | **TAD-Bench: A Comprehensive Benchmark for Embedding-Based Text Anom |