Commit 4e488b0

Fix the pre-process branch (#147)
1 parent 9f62039 commit 4e488b0
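
The change in this commit appears to fix a missing comma between two entries of a `data_files` list. Python joins adjacent string literals into a single string, so the old line yields one combined file name rather than two separate ones. The snippet below is a minimal sketch of that behavior; the variable names `before` and `after` are illustrative and not part of the notebook, and the lists are shortened.

```python
# Illustrative reconstruction only: `before` and `after` are hypothetical
# names, and the lists are shortened versions of the ones in the diff.

# Without the comma, Python concatenates the two adjacent string literals.
before = [
    "00.jsonl.zst",
    "01.jsonl.zst" "02.jsonl.zst",
    "03.jsonl.zst",
]

# With the comma restored, each file name is its own list element.
after = [
    "00.jsonl.zst",
    "01.jsonl.zst",
    "02.jsonl.zst",
    "03.jsonl.zst",
]

print(len(before))  # 3
print(before[1])    # 01.jsonl.zst02.jsonl.zst
print(len(after))   # 4
```

With the old list, the files `01.jsonl.zst` and `02.jsonl.zst` would never be requested by name. How the resulting bogus name is handled (an error or a silent skip) depends on the loader and is not shown in this diff.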

1 file changed: +4 -2 lines changed

docs/content/pre-process-datasets.ipynb

Lines changed: 4 additions & 2 deletions
@@ -214,7 +214,8 @@
 " # Get just the first few (each file is 11GB so this should be enough for a large dataset)\n",
 " data_files=[\n",
 " \"00.jsonl.zst\",\n",
-" \"01.jsonl.zst\" \"02.jsonl.zst\",\n",
+" \"01.jsonl.zst\",\n",
+" \"02.jsonl.zst\",\n",
 " \"03.jsonl.zst\",\n",
 " \"04.jsonl.zst\",\n",
 " \"05.jsonl.zst\",\n",
@@ -226,7 +227,8 @@
 " tokenizer_name=\"EleutherAI/gpt-neox-20b\",\n",
 " data_files=[\n",
 " \"00.jsonl.zst\",\n",
-" \"01.jsonl.zst\" \"02.jsonl.zst\",\n",
+" \"01.jsonl.zst\",\n",
+" \"02.jsonl.zst\",\n",
 " \"03.jsonl.zst\",\n",
 " \"04.jsonl.zst\",\n",
 " \"05.jsonl.zst\",\n",
