`contextualize` is a package to quickly retrieve and format file contents for use with LLMs.
You can install the package using pip:
pip install contextualize
or with pipx to use the CLI globally:
pipx install contextualize
Define `FileReference` objects for specified file paths and optional ranges.
- set `range` to a tuple of line numbers to include only a portion of the file, e.g. `range=(1, 10)`
- set `format` to `"md"` (default) or `"xml"` to wrap file contents in Markdown code blocks or `<file>` tags
- set `label` to `"relative"` (default), `"name"`, or `"ext"` to determine what label is affixed to the enclosing Markdown/XML string:
  - `"relative"` uses the relative path from the current working directory
  - `"name"` uses the file name only
  - `"ext"` uses the file extension only
Retrieve the wrapped contents from the `output` attribute.
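To make the `range`, `format`, and `label` options concrete, here is a minimal sketch of the wrapping behavior described above. It is not the `contextualize` source: the class name `SketchFileReference` is hypothetical, and the range semantics (1-based, inclusive), the `<file path="...">` attribute, and the placement of the label after the opening fence are all assumptions.

```python
import os
from pathlib import Path
from typing import Optional, Tuple

FENCE = "`" * 3  # built this way so the snippet itself never contains a literal fence


class SketchFileReference:
    """Hypothetical stand-in for FileReference; not the contextualize source.

    Assumptions: `range` is 1-based and inclusive, the XML form is
    <file path="...">, and the Markdown label follows the opening fence.
    """

    def __init__(
        self,
        path: str,
        range: Optional[Tuple[int, int]] = None,  # mirrors the documented parameter name
        format: str = "md",
        label: str = "relative",
    ) -> None:
        if format not in ("md", "xml"):
            raise ValueError(f"unsupported format: {format!r}")
        self.path = path
        self.range = range
        self.format = format
        self.label = label
        self.output = self._wrap(self._read())

    def _read(self) -> str:
        with open(self.path, encoding="utf-8") as f:
            lines = f.read().splitlines()
        if self.range is not None:
            start, end = self.range
            lines = lines[start - 1 : end]  # 1-based, inclusive (assumed)
        return "\n".join(lines)

    def _label(self) -> str:
        if self.label == "name":
            return os.path.basename(self.path)
        if self.label == "ext":
            return Path(self.path).suffix.lstrip(".")
        return os.path.relpath(self.path)  # relative to the current working directory

    def _wrap(self, contents: str) -> str:
        tag = self._label()
        if self.format == "xml":
            return f'<file path="{tag}">\n{contents}\n</file>'
        return f"{FENCE}{tag}\n{contents}\n{FENCE}"
```

For a file `demo.py`, `SketchFileReference("demo.py", range=(2, 3), format="xml", label="name").output` would hold only lines 2 and 3, enclosed in `<file path="demo.py">` tags.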
A CLI (`cli.py`) is provided to print file contents to the console from the command line.
- `cat`: Prepare and concatenate file references
  - `paths`: Positional arguments for target file(s) or directories
  - `--ignore`: File(s) to ignore (optional)
  - `--format`: Output format (`md`, `xml`, or `shell`; default is `md`):
    - `shell` mimics `cat` output in a live shell prompt
    - `xml` encloses file contents in `<file>` tags
    - `md` encloses file contents in triple backticks
  - `--label`: Label style (`relative` for relative file path, `name` for file name only, `ext` for file extension only; default is `relative`)
  - `--output`: Output target (`console` (default) or `clipboard`)
  - `--output-file`: Output file path (optional, compatible with `--output clipboard`)
- `ls`: List token counts
  - `paths`: Positional arguments for target file(s) or directories to process
  - `--openai-encoding`: OpenAI encoding to use for tokenization, e.g. `cl100k_base` (default), `p50k_base`, `r50k_base`
  - `--openai-model`: OpenAI model (e.g. `gpt-3.5-turbo`/`gpt-4` (default), `text-davinci-003`, `code-davinci-002`) used to determine which encoding to use for tokenization
  - `--anthropic-model`: Anthropic model to use for token counting (e.g. `claude-3-5-sonnet-latest`)
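The `cat` behavior described above (expanding directories, skipping `--ignore` files, and choosing among the three formats) can be approximated with the standard library alone. This is a rough sketch, not the CLI's implementation: `sketch_cat`, `expand_paths`, and `wrap` are hypothetical names, the `$ ` prompt used for `shell` output is an assumption, and ignoring a directory here does not skip the files inside it.

```python
import os
from typing import Iterable, List, Optional

FENCE = "`" * 3


def expand_paths(paths: Iterable[str], ignore: Optional[Iterable[str]] = None) -> List[str]:
    """Expand directories into their files (sorted) and drop ignored file paths."""
    ignored = {os.path.normpath(p) for p in (ignore or [])}
    files: List[str] = []
    for path in paths:
        if os.path.isdir(path):
            for root, dirs, names in os.walk(path):
                dirs.sort()  # walk subdirectories in a stable order
                files.extend(os.path.join(root, n) for n in sorted(names))
        else:
            files.append(path)
    return [f for f in files if os.path.normpath(f) not in ignored]


def wrap(path: str, contents: str, fmt: str) -> str:
    label = os.path.relpath(path)  # default "relative" label style
    if fmt == "shell":
        # Assumed prompt string; the real CLI's prompt may differ.
        return f"$ cat {label}\n{contents}"
    if fmt == "xml":
        return f'<file path="{label}">\n{contents}\n</file>'
    return f"{FENCE}{label}\n{contents}\n{FENCE}"


def sketch_cat(paths: Iterable[str], ignore: Optional[Iterable[str]] = None, fmt: str = "md") -> str:
    """Hypothetical equivalent of `contextualize cat PATHS --ignore ... --format FMT`."""
    parts = []
    for path in expand_paths(paths, ignore):
        with open(path, encoding="utf-8") as f:
            parts.append(wrap(path, f.read().rstrip("\n"), fmt))
    return "\n\n".join(parts)
```

Sorting both file names and subdirectories keeps the concatenated output deterministic, which matters when the result is pasted into a prompt and compared across runs.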
- `cat`:
  - `contextualize cat README.md` will print the wrapped contents of `README.md` to the console with default settings (Markdown format, relative path label).
  - `contextualize cat README.md --format xml` will print the wrapped contents of `README.md` to the console in XML format.
  - `contextualize cat README.md --format shell` will print the contents as if a user were running `cat README.md` in a live shell prompt.
  - `contextualize cat contextualize/ dev/ README.md --format xml` will prepare file references for the files in the `contextualize/` and `dev/` directories and for `README.md`, and print each file's contents (wrapped in the corresponding XML tags) to the console.
- `ls`:
  - `contextualize ls README.md` will count and print the number of tokens in `README.md` using the default `cl100k_base` encoding, unless `ANTHROPIC_API_KEY` is set, in which case the Anthropic token counting API will be used.
  - `contextualize ls contextualize/ --openai-model text-davinci-003` will count and print the number of tokens in each file in the `contextualize/` directory using the `p50k_base` encoding associated with the `text-davinci-003` model, then print the total tokens for all processed files.
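The selection logic behind `ls` (Anthropic when `ANTHROPIC_API_KEY` is set, otherwise an OpenAI encoding chosen by model or by name, then per-file counts plus a total) is sketched below. The model-to-encoding pairs match those tiktoken publishes; everything else is illustrative. The function names are hypothetical, the precedence of `--openai-model` over `--openai-encoding` is an assumption, and `whitespace_count` is a crude placeholder rather than a real tokenizer. With tiktoken installed, you could pass `counter=lambda t: len(tiktoken.get_encoding(enc).encode(t))` instead.

```python
import os
from typing import Callable, Dict, List, Mapping, Optional, Tuple

# Encodings tiktoken associates with these models.
MODEL_ENCODINGS: Dict[str, str] = {
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
    "text-davinci-003": "p50k_base",
    "code-davinci-002": "p50k_base",
}


def pick_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Documented rule: use Anthropic's token counting when ANTHROPIC_API_KEY is set."""
    env = os.environ if env is None else env
    return "anthropic" if env.get("ANTHROPIC_API_KEY") else "openai"


def resolve_encoding(model: Optional[str] = None, encoding: str = "cl100k_base") -> str:
    """A known model decides the encoding; otherwise use the encoding flag (assumed precedence)."""
    if model is not None and model in MODEL_ENCODINGS:
        return MODEL_ENCODINGS[model]
    return encoding


def whitespace_count(text: str) -> int:
    """Placeholder counter: splits on whitespace, so it is NOT a BPE token count."""
    return len(text.split())


def count_tokens(
    files: Mapping[str, str], counter: Callable[[str], int] = whitespace_count
) -> Tuple[List[Tuple[str, int]], int]:
    """Return per-file counts plus the total, as `ls` reports for multiple files."""
    per_file = [(name, counter(text)) for name, text in files.items()]
    return per_file, sum(n for _, n in per_file)
```

Keeping the counter as a parameter separates the bookkeeping (per-file and total) from the choice of tokenizer, so the same loop works whether counts come from a local encoding or a remote counting API.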