parallel merge index #590

Open
wants to merge 16 commits into
base: main
update
XiaohanZhangCMU committed Feb 15, 2024
commit 6add8eafbbbaf7f5d2db13af9c8c4da09ab5f8c7
2 changes: 2 additions & 0 deletions streaming/base/util.py
@@ -346,6 +346,8 @@ def _merge_index_from_list(index_file_urls: Sequence[Union[str, Tuple[str, str]]
    cpu_count = max(psutil.cpu_count() - 2, 1)
    n_processes = n_processes if (n_processes is not None and 1 <= n_processes <= cpu_count) else 1

    logger.warning(f'Got n_processes = {n_processes}. download and merge index in parallel')

    # Prepare a temp folder to download index.json from remote if necessary. Removed in the end.
    with tempfile.TemporaryDirectory() as temp_root:
        logging.info(f'Created temporary folder {temp_root} to store index files')
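
A note on the process-count logic in this hunk: a requested `n_processes` outside the range `1..cpu_count` is not clamped to the nearest bound but reset to 1, so asking for too many workers silently disables parallelism. The sketch below isolates that rule so it can be checked on its own. The helper name `resolve_n_processes` and the `total_cpus` parameter are hypothetical and not part of the PR, and `os.cpu_count()` stands in for `psutil.cpu_count()` to keep the example free of third-party imports. Both calls can return `None` when the core count cannot be determined, which the sketch guards against.

```python
import os
from typing import Optional


def resolve_n_processes(n_processes: Optional[int],
                        total_cpus: Optional[int] = None) -> int:
    """Hypothetical helper mirroring the range check in the hunk above."""
    if total_cpus is None:
        # os.cpu_count() may return None; fall back to a single core.
        total_cpus = os.cpu_count() or 1
    # Leave two cores free, but always allow at least one worker.
    cpu_count = max(total_cpus - 2, 1)
    # Missing or out-of-range requests fall back to 1 instead of being clamped.
    if n_processes is not None and 1 <= n_processes <= cpu_count:
        return n_processes
    return 1


print(resolve_n_processes(4, total_cpus=8))     # 4
print(resolve_n_processes(16, total_cpus=8))    # 1
print(resolve_n_processes(None, total_cpus=8))  # 1
```

If clamping to `cpu_count` is the intended behavior for oversized requests, the fallback branch would need to distinguish "too large" from "missing or below 1"; the sketch keeps the behavior shown in the diff.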