Investigate overhead when listing spaces #2017

@Wauplin

Two things:

  1. huggingface_hub seems to add significant overhead on top of plain requests, especially when listing Spaces.
  2. huggingface_hub 0.19.4 seems to take significantly less time than 0.20.2 (cc @hysts, who discovered this and reproduced it in an AWS Lambda).

See the related Slack thread (private).

It would be worth taking a look to check that we are not doing something too stupid 😁. The first conversation was about listing Spaces, but the overhead is most likely not specific to this endpoint.

(For the record, listing with token=False is also significantly faster than listing with authentication, since the server doesn't have to handle auth.)

import collections
import cProfile
import time

import requests

from huggingface_hub import HfApi


def count_sdks_hf_hub(limit=None):
    num_spaces = collections.defaultdict(int)
    api = HfApi(user_agent={"is_ci": True}, token=False)
    for space in api.list_spaces(limit=limit):
        if not space.private:
            num_spaces[space.sdk] += 1
    return dict(num_spaces)

def count_sdks_requests(limit=None):
    session = requests.Session()
    session.headers.update({"user-agent": "is_ci/true"})
    url = "https://huggingface.co/api/spaces"

    num_spaces = collections.defaultdict(int)
    n = 0
    while True:
        response = session.get(url)
        response.raise_for_status()
        for space in response.json():
            n += 1
            if not space.get("private"):
                num_spaces[space.get("sdk")] += 1
        url = response.links.get("next", {}).get("url")
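        # Stop once the limit is reached or there is no next page.
        # Always return a plain dict so both exit paths have the same type.
        if (limit is not None and n >= limit) or url is None:
            return dict(num_spaces)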

for fn in (count_sdks_hf_hub, count_sdks_requests):
    start_t = time.perf_counter()
    res = fn()
    elapsed = time.perf_counter() - start_t
    print(fn.__name__, elapsed, res)

# or
# cProfile.run('count_sdks_hf_hub()')

Results from two runs:

count_sdks_hf_hub 37.29020519400001 {'streamlit': 29277, 'gradio': 110611, 'static': 5360, None: 867, 'docker': 37227}
count_sdks_requests 29.098300531999485 {'streamlit': 29277, 'gradio': 110611, 'static': 5360, None: 867, 'docker': 37227}

count_sdks_hf_hub 33.34559893700134 {'streamlit': 29274, 'gradio': 110618, 'static': 5360, None: 867, 'docker': 37230}
count_sdks_requests 30.013438875999782 {'streamlit': 29274, 'gradio': 110617, 'static': 5360, None: 867, 'docker': 37230}
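
To see where the extra time goes, the commented-out cProfile call in the script can be expanded into a small helper that sorts by cumulative time and keeps only the top entries. This is a minimal sketch, not an established workflow for this repo: `profile_top` and its `top` parameter are hypothetical names introduced here, and the usage line assumes `count_sdks_hf_hub` from the script above is in scope.

```python
import cProfile
import io
import pstats


def profile_top(fn, *args, top=15, **kwargs):
    """Run fn under cProfile; return (result, report sorted by cumulative time).

    Hypothetical helper for this issue, not part of huggingface_hub.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args, **kwargs)

    # Write the report to a string so it can be saved or diffed between versions.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


# Assumed usage with the function defined in the script above:
# _, report = profile_top(count_sdks_hf_hub, limit=5000)
# print(report)
```

Running this once under 0.19.4 and once under 0.20.2 and comparing the two reports should help show whether the difference comes from response parsing, object construction, or the HTTP layer.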
