Pyterrier

A Python API for Terrier

Installation

The easiest way to get started with PyTerrier is to use one of our Colab notebooks; look for the badges below.

Linux or Google Colab

pip install --upgrade git+https://github.com/terrier-org/pyterrier.git#egg=python-terrier
You may need to set the JAVA_HOME environment variable if Pyjnius cannot find your Java installation.
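If Java is not being found, a quick check from Python can show whether JAVA_HOME is set and points at a real directory. The following is only a minimal sketch; the helper name `check_java_home` is hypothetical and is not part of PyTerrier or Pyjnius.

```python
import os


def check_java_home(env=None):
    """Return (ok, message) describing the JAVA_HOME setting.

    Hypothetical helper for illustration; not part of PyTerrier or Pyjnius.
    """
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return False, "JAVA_HOME is not set"
    if not os.path.isdir(java_home):
        return False, f"JAVA_HOME points to a missing directory: {java_home}"
    return True, f"JAVA_HOME = {java_home}"


ok, message = check_java_home()
print(message)
```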

macOS

You need to have Java installed. Pyjnius/PyTerrier will pick up the location automatically.
pip install --upgrade git+https://github.com/terrier-org/pyterrier.git#egg=python-terrier

Windows

PyTerrier is not available for Windows because pytrec_eval isn't available for Windows. If you can compile and install pytrec_eval yourself, it should work fine.

Indexing

PyTerrier has a number of useful classes for creating indices:

You can create an index from a TREC-formatted collection using TRECCollectionIndexer.
For TXT, PDF, Microsoft Word and similar files, you can use FilesIndexer.
For a Pandas DataFrame, you can use DFIndexer.
For any arbitrary iterable of dictionaries, you can use IterDictIndexer.

See examples in the indexing notebook
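For the IterDictIndexer case, the sketch below builds an iterable of dictionaries from in-memory text. The `docno` and `text` key names and the commented-out indexing call are assumptions about typical usage, so check the indexing notebook for the exact signature in your version.

```python
def iter_docs(texts):
    """Yield one dictionary per document, with a generated docno.

    The key names 'docno' and 'text' are assumptions for illustration.
    """
    for i, text in enumerate(texts, start=1):
        yield {"docno": f"d{i}", "text": text}


docs = list(iter_docs([
    "terrier is an information retrieval platform",
    "pyterrier exposes terrier from python",
]))
print(docs[0])

# With PyTerrier installed and started, such an iterable could be indexed
# along these lines (assumed usage, not executed here):
#   indexer = pt.IterDictIndexer("./toy_index")
#   indexref = indexer.index(iter_docs(texts))
```

A generator keeps memory use flat, since documents are produced one at a time rather than held in a list.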

Retrieval and Evaluation

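import pyterrier as pt
pt.init()  # start the JVM before using PyTerrier

# topicsFile, qrelsFile and index are assumed to be defined already
topics = pt.io.read_topics(topicsFile)
qrels = pt.io.read_qrels(qrelsFile)
BM25_br = pt.BatchRetrieve(index, wmodel="BM25")
res = BM25_br.transform(topics)
pt.Utils.evaluate(res, qrels, metrics=["map"])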

There is a worked example in the retrieval and evaluation notebook
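To make the "map" metric concrete, the sketch below computes mean average precision by hand over a tiny result list and set of relevance assessments. It uses plain Python lists of dicts rather than dataframes, and the column names (qid, docno, score, label) are assumptions for illustration. It is not PyTerrier's implementation of the measure.

```python
from collections import defaultdict


def mean_average_precision(results, qrels):
    """Compute MAP over queries that have at least one relevant document.

    results: list of {"qid", "docno", "score"} rows
    qrels:   list of {"qid", "docno", "label"} rows (label > 0 means relevant)
    """
    relevant = defaultdict(set)
    for row in qrels:
        if row["label"] > 0:
            relevant[row["qid"]].add(row["docno"])

    ranked = defaultdict(list)
    for row in results:
        ranked[row["qid"]].append(row)

    ap_values = []
    for qid, rel_docs in relevant.items():
        # Rank this query's documents by descending score
        rows = sorted(ranked[qid], key=lambda r: r["score"], reverse=True)
        hits, precision_sum = 0, 0.0
        for rank, row in enumerate(rows, start=1):
            if row["docno"] in rel_docs:
                hits += 1
                precision_sum += hits / rank
        ap_values.append(precision_sum / len(rel_docs))
    return sum(ap_values) / len(ap_values) if ap_values else 0.0


results = [
    {"qid": "q1", "docno": "d1", "score": 3.0},
    {"qid": "q1", "docno": "d2", "score": 2.0},
    {"qid": "q1", "docno": "d3", "score": 1.0},
    {"qid": "q2", "docno": "d4", "score": 2.0},
    {"qid": "q2", "docno": "d5", "score": 1.0},
]
qrels = [
    {"qid": "q1", "docno": "d1", "label": 1},
    {"qid": "q1", "docno": "d3", "label": 1},
    {"qid": "q2", "docno": "d5", "label": 1},
    {"qid": "q2", "docno": "d4", "label": 0},
]
print(mean_average_precision(results, qrels))
```

Here q1 has average precision (1/1 + 2/3) / 2 and q2 has 1/2, so the mean over both queries is 2/3.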

Experiment - Perform Retrieval and Evaluation with a single function

PyTerrier provides an Experiment object, which allows you to compare multiple retrieval approaches on the same queries and relevance assessments:

pt.Experiment([BM25_br, PL2_br], topics, qrels, ["map", "ndcg"])

There is a worked example in the experiment notebook
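Conceptually, an experiment runs every system over the same topics and scores each output against the same qrels. The sketch below mimics that loop in plain Python; `run_experiment`, the toy systems and the `precision_at_1` metric are all hypothetical names for illustration and are not PyTerrier's implementation.

```python
def precision_at_1(ranking, relevant):
    """1.0 if the top-ranked document is relevant, else 0.0."""
    return 1.0 if ranking and ranking[0] in relevant else 0.0


def run_experiment(systems, topics, qrels, metric):
    """Apply each system to every topic and average the metric.

    systems: dict mapping a name to a function query -> ranked list of docnos
    topics:  dict mapping qid -> query text
    qrels:   dict mapping qid -> set of relevant docnos
    """
    table = []
    for name, system in systems.items():
        scores = [metric(system(query), qrels.get(qid, set()))
                  for qid, query in topics.items()]
        table.append({"name": name, "score": sum(scores) / len(scores)})
    return table


topics = {"q1": "terrier", "q2": "python"}
qrels = {"q1": {"d1"}, "q2": {"d2"}}
systems = {
    "always_d1": lambda query: ["d1", "d2"],
    "always_d2_first": lambda query: ["d2", "d1"],
    "keyword": lambda query: ["d1", "d2"] if query == "terrier" else ["d2", "d1"],
}
for row in run_experiment(systems, topics, qrels, precision_at_1):
    print(row)
```

Because every system sees identical topics and qrels, the resulting scores are directly comparable row by row.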

Pipelines

PyTerrier makes it easy to develop complex retrieval pipelines using Python operators such as >> to chain different retrieval components. Each retrieval approach is a transformer with one key method, transform(), which takes a single Pandas dataframe as input and returns another dataframe. Two examples are a pipeline that applies the sequential dependence model and one that applies a query expansion process:

sdm_bm25 = pt.rewrite.SDM() >> pt.BatchRetrieve(indexref, wmodel="BM25")
bo1_qe = BM25_br >> pt.rewrite.Bo1QueryExpansion() >> BM25_br

Our example pipelines show other common use cases. For more information, see the Pyterrier data model.
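The >> operator can be understood through Python operator overloading. The following is a minimal sketch of the idea, using lists of dicts instead of Pandas dataframes so that it has no dependencies. The class names `Transformer`, `Compose`, `Lowercase` and `AddTerm` are hypothetical, and this is not PyTerrier's actual implementation.

```python
class Transformer:
    """Base class: transform() maps a list of row dicts to a new list."""

    def transform(self, rows):
        raise NotImplementedError

    def __rshift__(self, other):
        # a >> b builds a pipeline that applies a, then b
        return Compose(self, other)


class Compose(Transformer):
    """Pipeline of two transformers applied in sequence."""

    def __init__(self, first, second):
        self.first, self.second = first, second

    def transform(self, rows):
        return self.second.transform(self.first.transform(rows))


class Lowercase(Transformer):
    """Toy query rewriter: lowercases each query."""

    def transform(self, rows):
        return [{**row, "query": row["query"].lower()} for row in rows]


class AddTerm(Transformer):
    """Toy query expansion: appends a fixed term to each query."""

    def __init__(self, term):
        self.term = term

    def transform(self, rows):
        return [{**row, "query": f'{row["query"]} {self.term}'} for row in rows]


pipeline = Lowercase() >> AddTerm("retrieval") >> AddTerm("engine")
print(pipeline.transform([{"qid": "q1", "query": "Terrier"}]))
```

Since a composed pipeline is itself a transformer, it can be chained further or passed anywhere a single component is accepted.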

Learning to Rank

Complex retrieval pipelines, including for learning-to-rank, can be constructed using PyTerrier's operator language. For example, to combine two features and make them available for learning, we can use the ** operator.

two_features = BM25_br >> (
    pt.BatchRetrieve(indexref, wmodel="DirichletLM") **
    pt.BatchRetrieve(indexref, wmodel="PL2")
)

There are several worked examples in the learning-to-rank notebook. Some pipelines can be automatically optimised; more detail about pipeline optimisation is included in our ICTIR 2020 paper.
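The ** operator used above can be pictured as running two scorers over the same candidate documents and placing their scores side by side as a feature vector. A minimal sketch of that idea follows, with plain lists of dicts standing in for dataframes. `Scorer`, `FeatureUnion` and the toy scoring functions are hypothetical and do not reflect PyTerrier's internals.

```python
class Scorer:
    """Toy transformer that scores each (query, text) row with a function."""

    def __init__(self, score_fn):
        self.score_fn = score_fn

    def transform(self, rows):
        return [{**row, "features": [self.score_fn(row)]} for row in rows]

    def __pow__(self, other):
        # a ** b concatenates the feature lists produced by a and b
        return FeatureUnion(self, other)


class FeatureUnion(Scorer):
    """Runs two scorers on the same rows and joins their features."""

    def __init__(self, left, right):
        self.left, self.right = left, right

    def transform(self, rows):
        left_rows = self.left.transform(rows)
        right_rows = self.right.transform(rows)
        return [{**a, "features": a["features"] + b["features"]}
                for a, b in zip(left_rows, right_rows)]


def term_count(row):
    """Number of times the query occurs as a token in the text."""
    return row["text"].split().count(row["query"])


def text_length(row):
    """Number of tokens in the text."""
    return len(row["text"].split())


two_features = Scorer(term_count) ** Scorer(text_length)
candidates = [
    {"qid": "q1", "query": "terrier", "docno": "d1", "text": "terrier terrier dog"},
    {"qid": "q1", "query": "terrier", "docno": "d2", "text": "python api"},
]
for row in two_features.transform(candidates):
    print(row["docno"], row["features"])
```

Note that in Python ** binds more tightly than >>, which is why a feature union placed after >> is usually wrapped in parentheses for readability.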

Dataset API

Pyterrier allows simple access to standard information retrieval test collections through its dataset API, which can download the topics, qrels, corpus or, for some test collections, a ready-made Terrier index.

topics = pt.datasets.get_dataset("trec-robust-2004").get_topics()
qrels = pt.datasets.get_dataset("trec-robust-2004").get_qrels()
pt.Experiment([BM25_br, PL2_br], topics, qrels, eval_metrics)

You can use pt.datasets.list_datasets() to see available test collections - if your favourite test collection is missing, you can submit a Pull Request.

Index API

All of the standard Terrier Index API can be accessed easily from PyTerrier.

For instance, accessing term statistics is a single call on an index:

index.getLexicon()["circuit"].getDocumentFrequency()

There are lots of examples in the index API notebook
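For readers unfamiliar with the term, document frequency is the number of documents that contain a term at least once. The toy inverted index below illustrates the statistic in plain Python; it is not the Terrier Index API, and `build_lexicon` is a hypothetical helper.

```python
from collections import defaultdict


def build_lexicon(docs):
    """Map each term to the set of docnos that contain it."""
    postings = defaultdict(set)
    for docno, text in docs.items():
        for term in text.lower().split():
            postings[term].add(docno)
    return postings


docs = {
    "d1": "integrated circuit design",
    "d2": "circuit board circuit layout",
    "d3": "information retrieval",
}
lexicon = build_lexicon(docs)
print(len(lexicon["circuit"]))  # document frequency of "circuit"
```

The term "circuit" occurs twice in d2 but is counted once per document, so repeated occurrences do not inflate the document frequency.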

Open Source Licence

PyTerrier is subject to the terms detailed in the Mozilla Public License Version 2.0. The Mozilla Public License can be found in the file LICENSE.txt. By using this software, you have agreed to the licence.

Citation Licence

The source and binary forms of PyTerrier are subject to the following citation licence:

By downloading and using PyTerrier, you agree to cite the undernoted paper describing PyTerrier in any kind of material you produce where PyTerrier was used to conduct search or experimentation, whether it be a research paper, dissertation, article, poster, presentation, or documentation. By using this software, you have agreed to the citation licence.

Declarative Experimentation in Information Retrieval using PyTerrier. Craig Macdonald and Nicola Tonellotto. In Proceedings of ICTIR 2020.

@inproceedings{pyterrier2020ictir,
    author = {Craig Macdonald and Nicola Tonellotto},
    title = {Declarative Experimentation in Information Retrieval using PyTerrier},
    booktitle = {Proceedings of ICTIR 2020},
    year = {2020}
}

Credits

Alex Tsolov, University of Glasgow
Craig Macdonald, University of Glasgow
Nicola Tonellotto, University of Pisa
Arthur Câmara, Delft University
Alberto Ueda, Federal University of Minas Gerais
Sean MacAvaney, Georgetown University
