Fix chunk_size to be 1000 tokens, not characters #1
Hello, my friend. Thank you for the video and for the repository.
When we use just `RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)`, the chunks contain up to 1000 characters, not tokens. In the video and in the code it is said that one chunk has 1000 tokens. To get that behavior, we can use `RecursiveCharacterTextSplitter.from_tiktoken_encoder(chunk_size=1000, chunk_overlap=100)` with the tiktoken library installed. Then each chunk will contain ~1000 tokens.
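For reference, here is a minimal sketch contrasting the two splitters, assuming the `langchain` and `tiktoken` packages are installed; the sample text and variable names are just illustrative:

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

text = "..."  # illustrative placeholder; load your own document text here

# Counts *characters*: each chunk holds up to 1000 characters.
char_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)

# Counts *tokens* via tiktoken: each chunk holds roughly 1000 tokens.
token_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=1000, chunk_overlap=100
)

char_chunks = char_splitter.split_text(text)
token_chunks = token_splitter.split_text(text)
```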