ofirnachum/pyltr

This branch is 6 commits ahead of, 51 commits behind jma127/pyltr:master.

Latest commit: a7c167c by Ofir Nachum, Apr 27, 2016

pyltr

pyltr is a Python learning-to-rank toolkit with ranking models, evaluation metrics, data wrangling helpers, and more.

This software is licensed under the BSD 3-clause license (see LICENSE.txt).

Example

Import pyltr:

import pyltr

Load a LETOR dataset (e.g. MQ2007):

with open('train.txt') as trainfile, \
        open('vali.txt') as valifile, \
        open('test.txt') as evalfile:
    TX, Ty, Tqids, _ = pyltr.data.letor.read_dataset(trainfile)
    VX, Vy, Vqids, _ = pyltr.data.letor.read_dataset(valifile)
    EX, Ey, Eqids, _ = pyltr.data.letor.read_dataset(evalfile)
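
For orientation, here is a minimal sketch of what one line of a LETOR-style file contains: a relevance label, a qid:<id> token, index:value feature pairs, and an optional trailing comment after '#'. The helper name parse_letor_line and the sample line are hypothetical and only illustrate the file format; this is not pyltr's loader.

```python
def parse_letor_line(line):
    """Parse one LETOR-style line into (relevance, qid, features, comment).

    Hypothetical helper for illustration only; not part of pyltr.
    """
    # Everything after '#' is a free-form comment (often a document id).
    body, _, comment = line.partition('#')
    tokens = body.split()
    # The first token is the integer relevance label.
    relevance = int(tokens[0])
    # The second token has the form 'qid:<id>'.
    qid = tokens[1].split(':', 1)[1]
    # Remaining tokens are '<feature index>:<value>' pairs.
    features = {}
    for token in tokens[2:]:
        index, value = token.split(':', 1)
        features[int(index)] = float(value)
    return relevance, qid, features, comment.strip()


# Made-up sample line in the LETOR layout.
sample = "2 qid:10 1:0.5 2:0.25 3:0.0 #docid = GX008-86-4444840"
relevance, qid, features, comment = parse_letor_line(sample)
print(relevance, qid, features, comment)
```

The loader call above returns four values per file, of which the snippet keeps the feature matrix, the labels, and the query ids.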

Train a LambdaMART model, using the validation set for early stopping and trimming:

metric = pyltr.metrics.NDCG(k=10)

# Only needed if you want to perform validation (early stopping & trimming)
monitor = pyltr.models.monitors.ValidationMonitor(
    VX, Vy, Vqids, metric=metric, stop_after=250)

model = pyltr.models.LambdaMART(
    metric=metric,
    n_estimators=1000,
    learning_rate=0.02,
    max_features=0.5,
    query_subsample=0.5,
    max_leaf_nodes=10,
    min_samples_leaf=64,
    verbose=1,
)

model.fit(TX, Ty, Tqids, monitor=monitor)
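
The general idea behind validation-based early stopping and trimming can be sketched as follows. It assumes that stop_after is the number of iterations without improvement that are tolerated before training halts, and that trimming means keeping only the estimators up to the best validation score. Both are assumptions about the behaviour, and early_stopping_point is a hypothetical name, not a pyltr function.

```python
def early_stopping_point(val_scores, stop_after):
    """Return (best_iteration, last_iteration_run) for a list of validation scores.

    Hypothetical illustration of early stopping; not pyltr's monitor code.
    Training halts once `stop_after` iterations pass without a new best score.
    """
    best_score = float('-inf')
    best_iter = -1
    for i, score in enumerate(val_scores):
        if score > best_score:
            # New best validation score: remember where it happened.
            best_score, best_iter = score, i
        elif i - best_iter >= stop_after:
            # No improvement for `stop_after` iterations: stop here.
            return best_iter, i
    # Ran out of iterations without triggering the stop condition.
    return best_iter, len(val_scores) - 1


# Made-up validation scores, one per boosting iteration.
scores = [0.1, 0.3, 0.2, 0.25, 0.28, 0.4]
best, stopped = early_stopping_point(scores, stop_after=3)
print(best, stopped)
```

Under these assumptions, a trimmed model would keep the estimators up to and including index best and discard the rest.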

Evaluate the model on the test data:

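# Predict a relevance score for every test document.
Epred = model.predict(EX)
# print() calls so the example also runs on Python 3.
print('Random ranking:', metric.calc_mean_random(Eqids, Ey))
print('Our model:', metric.calc_mean(Eqids, Ey, Epred))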
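
To make the reported number concrete, the following is a minimal pure-Python sketch of mean NDCG@k over queries, using the pow2 gain (2^rel - 1) and a log2 position discount. The function names are hypothetical, and pyltr's own handling of ties and of queries without relevant documents may differ; here such queries are assumed to score 0.

```python
import math
from collections import defaultdict


def dcg_at_k(relevances, k):
    """DCG over the first k labels of an already ranked list, pow2 gain."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(y_true, y_score, k):
    """NDCG@k for one query: DCG of the predicted order over the ideal DCG."""
    order = sorted(range(len(y_true)), key=lambda i: y_score[i], reverse=True)
    ranked = [y_true[i] for i in order]
    ideal_dcg = dcg_at_k(sorted(y_true, reverse=True), k)
    if ideal_dcg == 0:
        # Assumption: a query with no relevant documents scores 0.
        return 0.0
    return dcg_at_k(ranked, k) / ideal_dcg


def mean_ndcg(qids, y_true, y_score, k=10):
    """Average NDCG@k over all queries, grouping rows by query id."""
    groups = defaultdict(list)
    for i, qid in enumerate(qids):
        groups[qid].append(i)
    per_query = [ndcg_at_k([y_true[i] for i in idx],
                           [y_score[i] for i in idx], k)
                 for idx in groups.values()]
    return sum(per_query) / len(per_query)


# Made-up data: query 'a' is ranked perfectly, query 'b' is inverted.
qids = ['a', 'a', 'a', 'b', 'b']
y_true = [2, 0, 1, 1, 0]
y_score = [0.9, 0.1, 0.5, 0.2, 0.8]
result = mean_ndcg(qids, y_true, y_score, k=10)
print(result)
```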

Features

Below are some of the features currently implemented in pyltr.

Models

  • LambdaMART (pyltr.models.LambdaMART)
    • Validation & early stopping
    • Query subsampling

Metrics

  • (N)DCG (pyltr.metrics.DCG, pyltr.metrics.NDCG)
    • pow2 and identity gain functions
  • ERR (pyltr.metrics.ERR)
    • pow2 and identity gain functions
  • (M)AP (pyltr.metrics.AP)
  • Kendall's Tau (pyltr.metrics.KendallTau)
  • AUC-ROC -- Area under the ROC curve (pyltr.metrics.AUCROC)
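
As a rough illustration of one of the listed metrics, Expected Reciprocal Rank (ERR) with the pow2 gain can be sketched in a few lines. This follows the standard cascade-model definition and is not taken from pyltr's source; the function name and the max_grade parameter are assumptions made for this sketch.

```python
def expected_reciprocal_rank(relevances, max_grade, k=None):
    """ERR of a ranked list of integer relevance labels, pow2 gain.

    Illustrative sketch only. `max_grade` is the highest label on the
    relevance scale and normalises the gain into a stopping probability.
    """
    if k is not None:
        relevances = relevances[:k]
    score = 0.0
    p_continue = 1.0  # probability the user reaches the current rank
    for rank, rel in enumerate(relevances, start=1):
        # Probability that the document at this rank satisfies the user.
        p_stop = (2 ** rel - 1) / 2 ** max_grade
        score += p_continue * p_stop / rank
        p_continue *= 1 - p_stop
    return score


print(expected_reciprocal_rank([1, 1], max_grade=1))
```

An identity gain would use the label itself in place of 2^rel - 1; how that variant is normalised is left out here because it depends on the implementation.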

Data Wrangling

  • Data loaders (e.g. pyltr.data.letor.read)
  • Query groupers and validators (pyltr.util.group.check_qids, pyltr.util.group.get_groups)
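
Ranking data is usually stored with all rows of a query adjacent to each other. The sketch below shows one way to find those contiguous groups and to reject data where a query id reappears after another query. The names contiguous_groups and check_contiguous are hypothetical stand-ins, and the return shapes are assumptions rather than the documented behaviour of pyltr.util.group.

```python
from itertools import groupby


def contiguous_groups(qids):
    """Yield (qid, start, end) for each run of equal query ids; end is exclusive."""
    start = 0
    for qid, run in groupby(qids):
        length = sum(1 for _ in run)
        yield qid, start, start + length
        start += length


def check_contiguous(qids):
    """Raise ValueError if any query id appears in more than one run."""
    seen = set()
    for qid, _, _ in contiguous_groups(qids):
        if qid in seen:
            raise ValueError('rows for query %r are not contiguous' % (qid,))
        seen.add(qid)


groups = list(contiguous_groups(['a', 'a', 'b', 'c', 'c', 'c']))
print(groups)
```

The (start, end) pairs can be used to slice the feature matrix and label array one query at a time.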

Running Tests

Use the run_tests.sh script to run all unit tests.

Building Docs

From the docs/ directory, run make html. The generated docs are written to the docs/_build directory.
