
Commit d7bd03d

Merge pull request #14 from ScalingIntelligence/archon
Create archon.md
2 parents 8e42dc5 + f46d5eb

File tree

1 file changed: +38 −0 lines

_pubs/archon.md

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
---
title: 'Archon: An Architecture Search Framework for Inference-Time Techniques'
authors:
  - key: jonsaadfalcon
  - key: adriangamarralafuente
  - name: Shlok Natarajan
    affiliation: Stanford
  - name: Nahum Maru
    affiliation: Stanford
  - name: Hristo Todorov
    affiliation: Stanford
  - name: E. Kelly Buchanan
    affiliation: Stanford
  - name: Mayee Chen
    affiliation: Stanford
  - name: Neel Guha
    affiliation: Stanford
  - name: Christopher Ré
    affiliation: Stanford
  - key: azaliamirhoseini
venue: preprint
year: 2024
has_pdf: true
doi: 10.48550/arXiv.2409.15254
tags:
  - machine learning
  - generative AI
  - inference-time techniques
teaser: Archon, a modular framework for designing inference-time architectures, outperforms top language models like GPT-4o and Claude 3.5 Sonnet on various benchmarks by optimally combining LLMs and inference-time techniques.
materials:
  - name: Paper
    url: https://arxiv.org/abs/2409.15254
    type: file-pdf
  - name: Codebase
    url: https://github.com/ScalingIntelligence/Archon
    type: code
---
Inference-time techniques are emerging as highly effective tools for increasing large language model (LLM) capabilities. However, there is still limited understanding of the best practices for developing systems that combine inference-time techniques with one or more LLMs. Challenges include: (1) effectively allocating inference compute budget, (2) understanding the interactions between different combinations of inference-time techniques and their impact on downstream performance, and (3) efficiently searching over the large space of model choices, inference-time techniques, and their compositions. To address these challenges, we introduce Archon, an automated framework for designing inference-time architectures. Archon defines an extensible design space, encompassing methods such as generation ensembling, multi-sampling, ranking, fusion, critiquing, verification, and unit testing. It then transforms the problem of selecting and combining LLMs and inference-time techniques into a hyperparameter optimization objective. To optimize this objective, we introduce automated Inference-Time Architecture Search (ITAS) algorithms. Given target benchmark(s), an inference compute budget, and available LLMs, ITAS outputs optimized architectures. We evaluate Archon architectures across a wide range of instruction-following and reasoning benchmarks, including MT-Bench, Arena-Hard-Auto, AlpacaEval 2.0, MixEval, MixEval Hard, MATH, and CodeContests. We show that inference-time architectures automatically designed by Archon outperform strong models such as GPT-4o and Claude 3.5 Sonnet on these benchmarks, achieving average increases of 14.1 and 10.3 percentage points with all-source models and open-source models, respectively.
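The search problem the abstract describes — composing inference-time layers and optimizing the composition against a benchmark under a compute budget — can be sketched in miniature. Everything below is an illustrative assumption, not Archon's actual API: the stub "models", the `rank`/`fuse` layers, and the tiny benchmark are stand-ins for real LLM calls and real evaluation.

```python
import itertools
import random

# Stub "models": deterministic stand-ins for LLM generations (hypothetical).
MODELS = {
    "model_a": lambda q: q.count("a"),
    "model_b": lambda q: len(q) % 5,
}

def generate_ensemble(question, models):
    """Generation ensembling: sample one candidate answer per model."""
    return [m(question) for m in models.values()]

def rank(candidates, top_k=2):
    """Ranking layer: keep the top-k candidates under a stub critic score."""
    return sorted(candidates, reverse=True)[:top_k]

def fuse(candidates):
    """Fusion layer: merge remaining candidates into one answer (here: max)."""
    return max(candidates)

# The extensible design space of inference-time layers (illustrative subset).
LAYER_POOL = {"rank": rank, "fuse": fuse}

def run_architecture(layer_names, question):
    """Run an ordered composition of layers over an initial ensemble."""
    candidates = generate_ensemble(question, MODELS)
    for name in layer_names:
        out = LAYER_POOL[name](candidates)
        candidates = out if isinstance(out, list) else [out]
    return candidates[0]

def score(layer_names, benchmark):
    """Objective: fraction of benchmark items answered correctly."""
    hits = sum(run_architecture(layer_names, q) == gold for q, gold in benchmark)
    return hits / len(benchmark)

def itas_search(benchmark, budget=8):
    """Toy ITAS: search layer compositions, capped by an evaluation budget."""
    space = [combo for r in range(1, 3)
             for combo in itertools.permutations(LAYER_POOL, r)]
    random.shuffle(space)  # random search order within the budget
    return max(space[:budget], key=lambda arch: score(arch, benchmark))

benchmark = [("banana", 3), ("cherry", 1)]
best = itas_search(benchmark)
print("best architecture:", best)
```

In the real system the design space is far larger (multi-sampling, critiquing, verification, unit testing, and model selection all enter the search), so the exhaustive enumeration here would be replaced by the budget-aware search algorithms the paper develops.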
