Docs Revamp: new "Repositories" page for hub-docs #92

Merged 13 commits on Apr 18, 2022
Binary file modified docs/assets/hub/empty_repo.png
Binary file modified docs/assets/hub/new_repo.png
Binary file added docs/assets/hub/repo_history.png
Binary file modified docs/assets/hub/repo_with_files.png
3 changes: 3 additions & 0 deletions docs/hub/_sections.yml
@@ -1,6 +1,9 @@
- local: hugging-face-hub
  title: Hugging Face Hub

- local: repositories-main
  title: Repositories

- local: main
  title: Hub documentation

25 changes: 25 additions & 0 deletions docs/hub/repositories-best-practices.md
@@ -0,0 +1,25 @@
---
title: Getting Started with Repositories
---

Review comment (Contributor): Very nice section!

<h1>Best practices with repositories</h1>

Here are some additional best practices to help you get the most out of your repository.

Review comment (Contributor): Could you please add a section named "Handling multiple experiments" with a large TODO? (I can help with that in a follow-up PR from my side.) I think we can add things from huggingface/huggingface_hub#769 and especially from #53.


## Private repositories

You can choose a repository's visibility when you create it, and you can toggle the visibility of any repository that you own between *public* and *private* in the **Settings** tab. Unless your repository is owned by an organization (more about that [**here!**](TODO)), you are the only user who can make changes to your repo or upload any code. Setting your visibility to *private* means that:

- Your repo is not discoverable by other users searching the Hub.
- Other users who visit the URL of your private repo receive a `404 - Repo not found` error.
- Other users cannot clone your repo.

## Handling multiple experiments
### TODO
Can use content from https://github.com/huggingface/huggingface_hub/issues/769 and https://github.com/huggingface/hub-docs/issues/53

## Licenses

You can add a license to any repo that you create on the Hugging Face Hub to let other users know which permissions you grant for your code. The license can also be declared in the metadata section of your repository's `README.md` file, known as a *card* on the Hub. Remember to look up and respect a project's license if you're considering using its code.

A [**full list of the available licenses**](TODO) is provided in these docs.
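
As a sketch of what the card metadata can look like, the snippet below writes a `README.md` whose YAML front matter declares a license. The `apache-2.0` identifier, the heading, and the scratch directory are illustrative assumptions; use the identifier that matches your project.

```bash
# Work in a scratch directory so no existing README.md is overwritten.
workdir="$(mktemp -d)"
cd "$workdir"

# The block between the two '---' lines is the card's metadata section.
cat > README.md <<'EOF'
---
license: apache-2.0
---

# My model
EOF

# Show the metadata section that was just written.
sed -n '1,3p' README.md
```

In a real repository you would edit the existing `README.md` instead, then `add`, `commit`, and `push` it like any other file.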
114 changes: 114 additions & 0 deletions docs/hub/repositories-getting-started.md
@@ -0,0 +1,114 @@
---
title: Getting Started with Repositories
---

<h1>Getting Started with Repositories</h1>

This beginner-friendly guide will help you get the basic skills you need to create and manage your repository on the Hub. Each section builds on the previous one, but feel free to jump straight to the section you need!

## Requirements

If you do not have `git` available as a CLI command yet, you will need to [install Git](https://git-scm.com/downloads) for your platform. You will also need to [install Git LFS](https://git-lfs.github.com/), which will be used to handle large files such as images and model weights.

To push your code to the Hub, you'll need to authenticate. The easiest way to do this is by installing the [`huggingface_hub` CLI](https://huggingface.co/docs/huggingface_hub/index) and running the login command:

```bash
python -m pip install huggingface_hub
huggingface-cli login
```

The content in the **Getting Started** section of this document is also available as a video!

<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/rkCly_cbMBk" title="Managing a repo" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

## Creating a repository

Using the Hub's web interface you can easily create repositories, add files (even large ones!), explore models, visualize diffs, and much more. There are three kinds of repositories on the Hub (models, datasets, and Spaces), and in this guide we'll be creating a **model repository** for demonstration purposes. For information on creating and managing models, datasets, and Spaces, refer to their respective documentation.

1. To create a new repository, visit [huggingface.co/new](http://huggingface.co/new):

![/docs/assets/hub/new_repo.png](/docs/assets/hub/new_repo.png)

2. First, specify the owner of the repository: this can be either you or any of the organizations you’re affiliated with.

3. Next, enter your model’s name. This will also be the name of the repository. Finally, you can specify whether you want your model to be public or private.

You can leave the *License* field blank for now. To learn about licenses, visit the **Licenses** (TODO: LINK TO LICENSES) section of this document.

After creating your model repository, you should see a page like this:

![/docs/assets/hub/empty_repo.png](/docs/assets/hub/empty_repo.png)

Note that the Hub prompts you to create a *Model Card*, which you can learn about in the **Model Cards documentation** (TODO: LINK). Including a Model Card in your model repo is best practice, but since we're only making a test repo at the moment we can skip this.


## Cloning repositories

Downloading repositories to your local machine is called *cloning*. You can use the following commands to download the repo that we made and navigate into it:

```bash
git clone https://huggingface.co/<your-username>/<your-model-id>
cd <your-model-id>
```

## Adding files to a repository

Now's the time: you can add any files you want to the repository! 🔥


Do you have files larger than 10MB? Those files should be tracked with `git-lfs`, which you can initialize with:

```bash
git lfs install
```

Note that if your files are larger than **5GB** you'll also need to run:

```bash
huggingface-cli lfs-enable-largefiles
```

When you use Hugging Face to create a repository, we automatically provide a list of common file extensions for these files in the `.gitattributes` file, which `git-lfs` uses to efficiently track changes to your large files. However, you might need to add new extensions if your file types are not already handled. You can do so with `git lfs track "*.your_extension"`.
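
To check whether anything still needs a `git lfs track` rule, one option is to compare the large files in your working tree against `.gitattributes`. The following is a minimal sketch, not an official tool: the function name `list_large_files_missing_lfs` is made up for this example, and it assumes it is called from the root of a cloned repository. It relies on `git check-attr`, which reports `filter: lfs` for paths covered by a Git LFS rule.

```bash
# Hypothetical helper: print files over 10 MB that no Git LFS rule covers.
# Call it from the root of a cloned repository.
list_large_files_missing_lfs() {
  # Skip Git's own .git directory, then keep regular files larger than
  # 10 MB (find's "M" unit counts mebibytes).
  find . -path ./.git -prune -o -type f -size +10M -print |
  while IFS= read -r file; do
    file="${file#./}"   # drop the leading "./" that find adds
    # For LFS-tracked paths, check-attr prints "<path>: filter: lfs".
    if ! git check-attr filter -- "$file" | grep -q ': filter: lfs$'; then
      echo "not tracked by LFS: $file"
    fi
  done
}
```

Running `list_large_files_missing_lfs` prints one line per large file that is not yet routed through LFS; each of those is a candidate for `git lfs track`.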


You can use Git to save new files and any changes to already existing files as a bundle of changes called a *commit*, which can be thought of as a "revision" to your project. To create a commit, we have to `add` the files to let Git know that we're planning on saving the changes and then `commit` those changes. In order to sync the new commit with the Hugging Face Hub, we then `push` the commit to the Hub.

```bash
# Create any files you like! Then...
git add .
git commit -m "First model version" # You can choose any descriptive message
git push
```

And we're done! You can check your repository on Hugging Face with all the recently added files. For example, in the screenshot below the user added a number of files. Note that one of the files in this example has a size of `413 MB`, so the repo uses Git LFS to track it.

![/docs/assets/hub/repo_with_files.png](/docs/assets/hub/repo_with_files.png)


## Viewing a repo's history
Every time you go through the `add`-`commit`-`push` cycle, the repo will keep track of every change you've made to your files. The UI allows you to explore the model files and commits and to see the difference (also known as *diff*) introduced by each commit. To see the history, you can click on the **History: X commits** link.

![/docs/assets/hub/repo_history.png](/docs/assets/hub/repo_history.png)

You can click on an individual commit to see what changes that commit introduced:

![/docs/assets/hub/explore_history.gif](/docs/assets/hub/explore_history.gif)
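
The same history is available from the command line. As a rough sketch using standard Git commands, the snippet below builds a throwaway repository (the file name, commit messages, and identity are placeholders) and then inspects it. In a repository cloned from the Hub, only the last three commands are needed.

```bash
# Throwaway repo so the commands can be tried without touching a real project.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity for this scratch repo
git config user.email "user@example.com"

# Two commits, so there is a history to look at.
echo "v1" > config.json
git add config.json
git commit -q -m "First model version"
echo "v2" > config.json
git commit -q -am "Update config"

git log --oneline        # one line per commit, newest first
git show --stat HEAD     # files touched by the latest commit
git diff HEAD~1 HEAD     # the diff introduced by the latest commit
```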


## Renaming or transferring a repo


If you own a repository, you will be able to visit the **Settings** tab to manage the name and ownership. Note that there are certain limitations in terms of use cases.

Moving can be used in these use cases ✅
- Renaming a repository within the same user account.
- Renaming a repository within the same organization. The user must be part of the organization and have "write" or "admin" rights in the organization.
- Transferring a repository from a user to an organization. The user must be part of the organization and have "write" or "admin" rights in the organization.
- Transferring a repository from an organization to yourself. You must be part of the organization and have "admin" rights in the organization.
- Transferring a repository from a source organization to a target organization. The user must have "admin" rights in the source organization **and** either "write" or "admin" rights in the target organization.

Moving does not work for ❌
- Transferring a repository from an organization to another user who is not yourself.
- Transferring a repository from a source organization to another target organization if the user does not have both "admin" rights in the source organization **and** either "write" or "admin" rights in the target organization.
- Transferring a repository from user A to user B.

If these are use cases you need help with, please send us an email at **website at huggingface.co**.
15 changes: 15 additions & 0 deletions docs/hub/repositories-main.md
@@ -0,0 +1,15 @@
---
title: Repositories
---

<h1>Repositories</h1>

Models, Spaces, and datasets are hosted on the Hugging Face Hub as [Git repositories](https://git-scm.com/about), which means that version control and collaboration are core elements of the Hub. In a nutshell, a repository (also known as a **repo**) is a place where code and assets can be stored to back up your work, share it with the community, and work in a team.

In these pages, we will go over the basics of getting started with Git and interacting with repositories on the Hub. Once you get the hang of it, you can explore the best practices and next steps that we've compiled for effective repository usage.

## Contents

- [Getting Started](./repositories-getting-started)
- [Best Practices](./repositories-best-practices)
- [Next Steps](./repositories-next-steps)
94 changes: 94 additions & 0 deletions docs/hub/repositories-next-steps.md
@@ -0,0 +1,94 @@
---
title: Next Steps
---

<h1>Next steps</h1>

These next sections highlight features and additional information that you may find useful to make the most out of the Git repositories on the Hugging Face Hub.

## Learning more about Git

A good place to visit if you want to continue learning about Git is [this Git tutorial](https://learngitbranching.js.org/). For even more background on Git, you can take a look at [GitHub's Git Guides](https://github.com/git-guides).

## How to use branches

To use Git repos collaboratively and to work on features without releasing premature code, you can use **branches**. Branches allow you to separate your "work in progress" code from your "production-ready" code, with the additional benefit of letting multiple people work on a project without frequently conflicting with each other's contributions. You can use branches to isolate each experiment in its own branch, and even [adopt team-wide practices for managing branches](https://ericmjl.github.io/essays-on-data-science/workflow/gitflow/).

To learn about Git branching, you can try out the [Learn Git Branching interactive tutorial](https://learngitbranching.js.org/).
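
As an illustration of isolating an experiment, here is a small sketch that uses only standard Git commands in a throwaway repository. The branch name `experiment-lr-sweep`, the file names, and the identity are hypothetical placeholders.

```bash
# Throwaway repo so the commands can be tried without touching a real project.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity for this scratch repo
git config user.email "user@example.com"

echo "baseline" > README.md
git add README.md
git commit -q -m "Initial commit"
git branch -M main                         # make sure the default branch is called main

# Start the experiment on its own branch and commit work there.
git checkout -q -b experiment-lr-sweep
echo "lr: 3e-5" > hparams.yaml
git add hparams.yaml
git commit -q -m "Try a lower learning rate"

# Back on main, the experiment's files are absent until the branch is merged.
git checkout -q main
git branch                                 # lists both branches; the asterisk marks the current one

# In a repo cloned from the Hub, the branch can be published with:
#   git push -u origin experiment-lr-sweep
```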

## Using tags

Git allows you to *tag* commits so that you can easily note milestones in your project. As such, you can use tags to mark commits in your Hub repos! To learn about using tags, you can visit [this DevConnected post](https://devconnected.com/how-to-create-git-tags/).

Beyond making it easy to identify important commits in your repo's history, using Git tags also allows you to [clone a repository at a specific tag](https://www.techiedelight.com/clone-specific-tag-with-git/). The `huggingface_hub` library also supports working with tags, such as [downloading files from a specific tagged commit](https://huggingface.co/docs/huggingface_hub/main/en/how-to-downstream#hfhuburl).
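
To make tagging concrete, the following sketch creates an annotated tag and then clones the repository at that tag. It runs against a throwaway local repository; the tag name `v1.0`, the file name, and the identity are placeholders, and cloning from a Hub URL instead of a local path is assumed to work the same way, since `--branch` is a standard `git clone` option.

```bash
# Throwaway repo so the commands can be tried without touching a real project.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity for this scratch repo
git config user.email "user@example.com"

echo "weights placeholder" > model.txt
git add model.txt
git commit -q -m "First model version"

# Mark the current commit as a milestone with an annotated tag.
git tag -a v1.0 -m "First release"
git tag --list                             # prints: v1.0

# Clone the repository at that tag (the clone ends up on a detached HEAD).
copy="$(mktemp -d)"
git clone -q --branch v1.0 "$repo" "$copy"
git -C "$copy" describe --tags             # prints: v1.0

# In a repo cloned from the Hub, the tag can be published with:
#   git push origin v1.0
```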

## How to duplicate or fork a repo (including LFS pointers)

If you'd like to copy a repository, there are two options, depending on whether you want to preserve the Git history.

### Duplicating without Git history

In many scenarios, if you want your own copy of a particular codebase you might not be concerned about the previous Git history. In this case, you can quickly duplicate a repo with the handy [Repo Duplicator](https://huggingface.co/spaces/osanseviero/repo_duplicator)! You'll have to create a User Access Token, which you can read more about in the [security documentation](TODO).

Review comment (Contributor): Nice!

### Duplicating with the Git history (Fork)

A duplicate of a repository with the commit history preserved is called a *fork*. You may choose to fork one of your own repos, but it is also common to fork other people's projects if you would like to tinker with them.

**Note that you will need to [install Git LFS](https://git-lfs.github.com/) and the [`huggingface_hub` CLI](https://huggingface.co/docs/huggingface_hub/index) to follow this process**. When you want to fork or [rebase](https://git-scm.com/docs/git-rebase) a repository with LFS files you cannot use the usual Git approach that you might be familiar with since you need to be careful to not break the LFS pointers. Forking can take time depending on your bandwidth because you will have to fetch and re-upload all the LFS files in your fork.

For example, say you want to fork an upstream repository, **upstream**, into a repository of your own on the Hub, called **myfork** in this example.

1. Create a destination repository (e.g. **myfork**) in https://huggingface.co

2. Clone your fork repository:

```
git lfs clone https://huggingface.co/me/myfork.git
```

3. Fetch non-LFS files:

```
cd myfork
git lfs install --skip-smudge --local # affects only this clone
git remote add upstream https://huggingface.co/friend/upstream.git
git fetch upstream
```

4. Fetch large files. This can take some time depending on your download bandwidth:

```
git lfs fetch --all upstream # this can take time depending on your download bandwidth
```

4.a. If you want to completely override the fork history (which should only have an initial commit), run:

```
git reset --hard upstream/main
```

4.b. If you want to rebase instead of overriding, run the following command and resolve any conflicts:

```
git rebase upstream/main
```

5. Prepare your LFS files to push:

```
git lfs install --force --local # this reinstalls the LFS hooks
huggingface-cli lfs-enable-largefiles . # needed if some files are bigger than 5GB
```

6. And finally push:

```
git push --force origin main # this can take time depending on your upload bandwidth
```

Now you have your own fork or rebased repo in the Hub!


## How to programmatically manage repositories

Review comment (Contributor): Nice reference to huggingface_hub

So far, we've looked at using the Git CLI and the Hugging Face Hub to work with our repos. But Hugging Face also supports accessing repos with Python via the [`huggingface_hub` library](https://huggingface.co/docs/huggingface_hub/index). The operations that we've explored such as downloading repositories and uploading files are available through the library, as well as other useful functions!