Docs Revamp: Models documentation #117

Merged · 10 commits · May 11, 2022
Binary file added docs/assets/hub/models-linked-datasets.png
Binary file added docs/assets/hub/models-linked-spaces.png
Binary file added docs/assets/hub/models-usage-modal.png
Binary file added docs/assets/hub/models-usage.png
46 changes: 2 additions & 44 deletions docs/hub/_sections.yml
@@ -4,47 +4,5 @@
 - local: repositories-main
   title: Repositories
 
-- local: main
-  title: Hub documentation
-
-- local: model-repos
-  title: Everything you wanted to know about repos
-
-- local: adding-a-model
-  title: Adding a model to the Hub
-
-- local: libraries
-  title: Libraries
-
-- local: inference
-  title: Inference
-
-- local: endpoints
-  title: Hub API Endpoints
-
-- local: spaces
-  title: Spaces documentation
-
-- local: org-cards
-  title: Organization cards
-
-- local: adding-a-library
-  title: Integrate a library with the Hub
-
-- local: searching-the-hub
-  title: How to search the Hub efficiently
-
-- local: how-to-downstream
-  title: How to download files from the Hub
-
-- local: how-to-upstream
-  title: How to create repositories and upload files to the Hub
-
-- local: how-to-inference
-  title: How to use the Inference API from the Hub library
-
-- local: adding-a-task
-  title: Adding a new task to the Hub
-
-- local: security
-  title: Security tips
+- local: models-main
+  title: Models
18 files renamed without changes.
2 changes: 1 addition & 1 deletion docs/hub/hugging-face-hub.md
@@ -24,7 +24,7 @@ On it, you'll be able to upload and discover...
 Unlike other hosting solutions, the Hub offers **versioning, commit history, diffs, branches, and over a dozen library integrations**! You can learn more about the features that all repositories share in the **Repositories documentation**.
 
 ## Models
-Models on the Hugging Face Hub allow for simple discovery and usage to maximize model impact. To promote responsible model usage and development, model repos are equipped with [Model Cards](TODO) to inform users of each model's limitations and biases. Additional [metadata](/docs/hub/model-repos#model-card-metadata) about info such as their tasks, languages, and metrics can be included, with training metrics charts even added if the repository contains [TensorBoard traces](https://huggingface.co/models?filter=tensorboard). It's also easy to add an **inference widget** to your model, allowing anyone to play with the model directly in the browser! For production settings, an API is provided to **instantly serve your model**.
+Models on the Hugging Face Hub allow for simple discovery and usage to maximize model impact. To promote responsible model usage and development, model repos are equipped with [Model Cards](./models-cards) to inform users of each model's limitations and biases. Additional [metadata](/docs/hub/model-repos#model-card-metadata), such as tasks, languages, and metrics, can be included, and training metrics charts are displayed if the repository contains [TensorBoard traces](https://huggingface.co/models?filter=tensorboard). It's also easy to add an **inference widget** to your model, allowing anyone to play with the model directly in the browser! For production settings, an API is provided to **instantly serve your model**.
 
 To upload models to the Hub, or download models and integrate them into your work, explore the **Models documentation**. You can also choose from [**over a dozen frameworks**](/docs/hub/libraries) such as 🤗 Transformers, Asteroid, and ESPnet that support the Hugging Face Hub.

181 changes: 181 additions & 0 deletions docs/hub/models-adding-libraries.md
@@ -0,0 +1,181 @@
---
title: Integrate a library with the Hub
---

# Integrate your library with the Hub

The Hugging Face Hub aims to facilitate sharing machine learning models, checkpoints, and artifacts. This endeavor includes integrating the Hub into many of the amazing third-party libraries in the community. Some of the ones already integrated include [spaCy](https://spacy.io/usage/projects#huggingface_hub), [AllenNLP](https://allennlp.org/), and [timm](https://rwightman.github.io/pytorch-image-models/), among many others. Integration means users can download and upload files to the Hub directly from your library. We hope you will integrate your library and join us in democratizing artificial intelligence for everyone!

Integrating the Hub with your library provides many benefits, including:

- Free model hosting for you and your users.
- Built-in file versioning - even for huge files - made possible by [Git-LFS](https://git-lfs.github.com/).
- All public models are powered by the [Inference API](https://api-inference.huggingface.co/docs/python/html/index.html).
- In-browser widgets allow users to interact with your hosted models directly.

This tutorial will help you integrate the Hub into your library so your users can benefit from all the features offered by the Hub.

Before you begin, we recommend you create a [Hugging Face account](https://huggingface.co/join) from which you can manage your repositories and files.

If you need help with the integration, feel free to open an [issue](https://github.com/huggingface/huggingface_hub/issues/new/choose), and we would be more than happy to help you!

## Installation

1. Install the `huggingface_hub` library with pip in your environment:

```bash
python -m pip install huggingface_hub
```

2. Once you have successfully installed the `huggingface_hub` library, log in to your Hugging Face account:

```bash
huggingface-cli login
```

```bash
_| _| _| _| _|_|_| _|_|_| _|_|_| _| _| _|_|_| _|_|_|_| _|_| _|_|_| _|_|_|_|
_| _| _| _| _| _| _| _|_| _| _| _| _| _| _| _|
_|_|_|_| _| _| _| _|_| _| _|_| _| _| _| _| _| _|_| _|_|_| _|_|_|_| _| _|_|_|
_| _| _| _| _| _| _| _| _| _| _|_| _| _| _| _| _| _| _|
_| _| _|_| _|_|_| _|_|_| _|_|_| _| _| _|_|_| _| _| _| _|_|_| _|_|_|_|


Username:
Password:
```

3. Alternatively, if you prefer working from a Jupyter or Colaboratory notebook, log in with `notebook_login`:

```python
>>> from huggingface_hub import notebook_login
>>> notebook_login()
```

`notebook_login` will launch a widget in your notebook from which you can enter your Hugging Face credentials.

## Download files from the Hub

Integration allows users to download your hosted files directly from the Hub using your library.

Use the `hf_hub_download` function to download a file from your repository and return its local path. Downloaded files are stored in your cache at `~/.cache/huggingface/hub`, so the file doesn't have to be re-downloaded the next time you use it, which can save a lot of time for larger files. Furthermore, if the repository is updated with a new version of the file, `huggingface_hub` will automatically download the latest version and store it in the cache for you, so users don't have to worry about updating their files.

For example, download the `config.json` file from the [lysandre/arxiv-nlp](https://huggingface.co/lysandre/arxiv-nlp) repository:

```python
>>> from huggingface_hub import hf_hub_download
>>> hf_hub_download(repo_id="lysandre/arxiv-nlp", filename="config.json")
```

Download a specific version of the file by specifying the `revision` parameter. The `revision` parameter can be a branch name, tag, or commit hash.

The commit hash must be a full-length hash instead of the shorter 7-character commit hash:

```python
>>> from huggingface_hub import hf_hub_download
>>> hf_hub_download(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="877b84a8f93f2d619faa2a6e514a32beef88ab0a")
```

Use the `cache_dir` parameter to change where a file is stored:

```python
>>> from huggingface_hub import hf_hub_download
>>> hf_hub_download(repo_id="lysandre/arxiv-nlp", filename="config.json", cache_dir="/home/lysandre/test")
```

### Code sample

We recommend adding a code snippet to explain how to use a model in your downstream library.

![Code snippet displayed on a model page](/docs/assets/hub/code_snippet.png)

Add a code snippet by updating the [Libraries Typescript file](https://github.com/huggingface/hub-docs/blob/main/js/src/lib/interfaces/Libraries.ts) with instructions for your model. For example, the [Asteroid](https://huggingface.co/asteroid-team) integration includes a brief code snippet for how to load and use an Asteroid model:

```typescript
const asteroid = (model: ModelData) =>
`from asteroid.models import BaseModel

model = BaseModel.from_pretrained("${model.id}")`;
```

Doing so will also add a tag to your model so users can quickly identify models from your library.

![Library tag displayed on a model page](/docs/assets/hub/libraries-tags.png)

## Upload files to the Hub

You might also want to provide a method for creating model repositories and uploading files to the Hub directly from your library. The `huggingface_hub` library offers two ways to assist you with creating repositories and uploading files:

- `create_repo` creates a repository on the Hub.
- `upload_file` directly uploads files to a repository on the Hub.

### `create_repo`

The `create_repo` method creates a repository on the Hub. Use the `repo_id` parameter to provide a name for your repository:

```python
>>> from huggingface_hub import create_repo
>>> create_repo(repo_id="test-model")
'https://huggingface.co/lysandre/test-model'
```

When you check your Hugging Face account, you should now see a `test-model` repository under your namespace.

### `upload_file`

The `upload_file` method uploads files to the Hub. This method requires the following:

- A path to the file to upload.
- The final path in the repository.
- The repository you wish to push the files to.

For example:

```python
>>> from huggingface_hub import upload_file
>>> upload_file(
... path_or_fileobj="/home/lysandre/dummy-test/README.md",
... path_in_repo="README.md",
... repo_id="lysandre/test-model"
... )
'https://huggingface.co/lysandre/test-model/blob/main/README.md'
```

If you need to upload more than one file, look at the utilities offered by the `Repository` class [here](TODO).
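
For instance, here is a minimal sketch of pushing a whole folder with `Repository`. It clones the repository locally and pushes with Git, so `git` and `git-lfs` must be installed; the paths and commit message are hypothetical:

```python
>>> from huggingface_hub import Repository
>>> # Clone the repository created above into a local folder
>>> repo = Repository(local_dir="test-model", clone_from="lysandre/test-model")
>>> # ... copy your model files into "test-model" ...
>>> repo.push_to_hub(commit_message="Add model files")
```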

Once again, if you check your Hugging Face account, you should see the file inside your repository.

Lastly, it is important to add a model card so users understand how to use your model. See [here](/docs/hub/model-repos#what-are-model-cards-and-why-are-they-useful) for more details about how to create a model card.
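
A model card is simply a `README.md` in the repository that starts with a metadata section. As a minimal, hypothetical sketch, the metadata could look like:

```yaml
---
language: en
license: apache-2.0
tags:
- your-awesome-library
---
```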

## Set up the Inference API

Our Inference API powers models uploaded to the Hub through your library.

All third-party libraries are Dockerized, so you can install the dependencies you'll need for your library to work correctly. Add your library to the existing Docker images by navigating to the [Docker images folder](https://github.com/huggingface/api-inference-community/tree/main/docker_images).

1. Copy the `common` folder and rename it with the name of your library, e.g. `cp -r docker_images/common docker_images/your-awesome-library`.
2. There are four files you need to edit:
* List the packages required for your library to work in `requirements.txt`.
* Update `app/main.py` with the tasks supported by your model (see [here](https://github.com/huggingface/api-inference-community) for a complete list of available tasks). Look out for the `IMPLEMENT_THIS` flag to add your supported task.

```python
ALLOWED_TASKS: Dict[str, Type[Pipeline]] = {
"token-classification": TokenClassificationPipeline
}
```

* For each task your library supports, modify the `app/pipelines/task_name.py` files accordingly (a sketch of what a completed pipeline file can look like follows this list). We have also added an `IMPLEMENT_THIS` flag in the pipeline files to guide you. If there isn't a pipeline that supports your task, feel free to add one. Open an [issue](https://github.com/huggingface/hub-docs/issues/new), and we will be happy to help you.
* Add your model and task to the `tests/test_api.py` file. For example, if you have a text generation model:

```python
TESTABLE_MODELS: Dict[str, str] = {
    "text-generation": "my-gpt2-model"
}
```
3. Finally, run the following test to ensure everything works as expected:

```bash
pytest -sv --rootdir docker_images/your-awesome-library/ docker_images/your-awesome-library/
```
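
For reference, a rough sketch of what a completed pipeline file can look like. The names `MyAwesomeModel` and `predict_entities` below are hypothetical stand-ins for your library's own API; the actual base class and `IMPLEMENT_THIS` placeholders come from the template you copied:

```python
from typing import Any, Dict, List

from app.pipelines import Pipeline


class TokenClassificationPipeline(Pipeline):
    def __init__(self, model_id: str):
        # Load the model from the Hub once, when the container starts.
        self.model = MyAwesomeModel.from_pretrained(model_id)  # hypothetical loader

    def __call__(self, inputs: str) -> List[Dict[str, Any]]:
        # Return one dict per detected entity (label, score, offsets).
        return self.model.predict_entities(inputs)  # hypothetical method
```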

With these simple but powerful methods, you've brought the full functionality of the Hub into your library. Users can download files stored on the Hub from your library with `hf_hub_download`, create repositories with `create_repo`, and upload files with `upload_file`. You've also set up the Inference API with your library, allowing users to interact with your models on the Hub from inside a browser.
54 changes: 54 additions & 0 deletions docs/hub/models-cards-co2.md
@@ -0,0 +1,54 @@
---
title: Carbon Emissions
---

<h1>Displaying carbon emissions for your model</h1>

## Why is it beneficial to calculate the carbon emissions of my model?

Training ML models is often energy-intensive and can produce a substantial carbon footprint, as described by [Strubell et al.](https://arxiv.org/abs/1906.02243). It's therefore important to *track* and *report* the emissions of models to get a better idea of the environmental impacts of our field.


## What information should I include about the carbon footprint of my model?

If you can, you should include information about:
- where the model was trained (in terms of location)
- the hardware used -- e.g. GPU, TPU, or CPU, and how many
- training type: pre-training or fine-tuning
- the estimated carbon footprint of the model, calculated in real-time with the [Code Carbon](https://github.com/mlco2/codecarbon) package or after training using the [ML CO2 Calculator](https://mlco2.github.io/impact/).

## Carbon footprint metadata

You can add the carbon footprint data to the model card metadata (in the README.md file). The structure of the metadata should be:

```yaml
---
co2_eq_emissions:
emissions: "in grams of CO2"
source: "source of the information, either directly from AutoTrain, code carbon or from a scientific article documenting the model"
training_type: "pre-training or fine-tuning"
geographical_location: "as granular as possible, for instance Quebec, Canada or Brooklyn, NY, USA"
hardware_used: "how much compute and what kind, e.g. 8 v100 GPUs"
---
```

## How is the carbon footprint of my model calculated? 🌎

Considering the computing hardware, location, usage, and training time, you can estimate how much CO<sub>2</sub> the model produced.

The math is pretty simple! ➕

First, you take the *carbon intensity* of the electric grid used for the training -- this is how much CO<sub>2</sub> is produced per kWh of electricity used. The carbon intensity depends on the location of the hardware and the [energy mix](https://electricitymap.org/) used at that location -- whether it's renewable energy like solar 🌞, wind 🌬️ and hydro 💧, or non-renewable energy like coal ⚫ and natural gas 💨. The more renewable energy gets used for training, the less carbon-intensive it is!

Then, you measure the power consumption of the GPU during training, for example with the `pynvml` library (for NVIDIA GPUs).

Finally, you multiply the power consumption and carbon intensity by the training time of the model, and you have an estimate of the CO<sub>2</sub> emission.
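
Putting those three steps together, a back-of-the-envelope sketch (all numbers below are made up for illustration):

```python
power_kw = 0.3            # average GPU power draw in kilowatts (e.g. measured with pynvml)
training_hours = 48       # total training time
grid_intensity = 475.0    # grams of CO2 per kWh for the local grid (varies by region)

energy_kwh = power_kw * training_hours          # 14.4 kWh
emissions_grams = energy_kwh * grid_intensity   # 14.4 * 475 = 6840 g of CO2
print(f"Estimated emissions: {emissions_grams:.0f} g CO2")
```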

Keep in mind that this isn't an exact number because other factors come into play -- like the energy used for data center heating and cooling -- which will increase carbon emissions. But this will give you a good idea of the scale of CO<sub>2</sub> emissions that your model is producing!

To add **Carbon Emissions** metadata to your models:

1. If you are using **AutoTrain**, this is tracked for you 🔥
2. Otherwise, use a tracker like Code Carbon in your training code, then specify `co2_eq_emissions.emissions: 1.2345` in your model card metadata, where `1.2345` is the emissions value in **grams** (see the sketch below).
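
As a sketch, tracking with Code Carbon can be as simple as the following; `train_model` stands in for your own training loop, and note that `stop()` returns kilograms while the metadata field expects grams:

```python
from codecarbon import EmissionsTracker

tracker = EmissionsTracker()
tracker.start()
train_model()  # your training loop (hypothetical)
emissions_kg = tracker.stop()  # estimated emissions in kilograms of CO2
print(f"co2_eq_emissions.emissions: {emissions_kg * 1000:.4f}")  # report in grams
```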

To learn more about the carbon footprint of Transformers, check out the [video](https://www.youtube.com/watch?v=ftWlj4FBHTg), part of the Hugging Face Course!