@yaroslavpoltoran commented Oct 9, 2023

Hello, my friend. Thank you for the video and for the repository.
When we use plain RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100), each chunk contains up to 1000 characters, not tokens. Both the video and the code say that one chunk has 1000 tokens. To get that, we can use RecursiveCharacterTextSplitter.from_tiktoken_encoder(chunk_size=1000, chunk_overlap=100) with the tiktoken library installed. Then each chunk will have ~1000 tokens.
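
To illustrate why the unit matters, here is a minimal standard-library sketch. It is not LangChain's implementation: `greedy_chunks` and `rough_token_count` are hypothetical helpers made up for this example, and the word-and-punctuation counter is only a crude stand-in for a real tokenizer such as tiktoken. It shows that the same `chunk_size` yields very different chunks depending on whether length is measured in characters or in tokens.

```python
import re
from typing import Callable, List


def rough_token_count(text: str) -> int:
    """Crude stand-in for a real tokenizer: counts words and punctuation marks."""
    return len(re.findall(r"\w+|[^\w\s]", text))


def greedy_chunks(words: List[str], chunk_size: int,
                  length_function: Callable[[str], int]) -> List[str]:
    """Pack words into chunks whose measured length stays within chunk_size."""
    chunks: List[str] = []
    current: List[str] = []
    for word in words:
        candidate = " ".join(current + [word])
        if current and length_function(candidate) > chunk_size:
            # Adding this word would exceed the limit, so close the chunk.
            chunks.append(" ".join(current))
            current = [word]
        else:
            current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks


text = "LangChain splits documents into chunks before embedding them. " * 40
words = text.split()

# Same chunk_size, different unit of measurement.
by_chars = greedy_chunks(words, 100, len)                 # limit = 100 characters
by_tokens = greedy_chunks(words, 100, rough_token_count)  # limit = 100 rough tokens

print("character-based:", len(by_chars), "chunks")
print("token-based:    ", len(by_tokens), "chunks")
```

With the character limit the text falls into many small chunks, while the token limit produces far fewer and larger ones, so a comment claiming "1000 tokens" next to a character-based splitter overstates the chunk size considerably.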
