It would be convenient to be able to register your own Chunkers #1005
                  
                    
aropb started this conversation in 2. Feature requests
              
            Replies: 2 comments 2 replies
The recommended approach is to load and use custom handlers. When loading files, the "steps" parameter lets you choose which handlers to execute, which in turn lets you customize every aspect of ingestion: extraction, chunking, storage, etc.
  
That's what I'm doing now, but why rewrite the entire handler if you only need to replace the chunker?
  
Currently, the creation of Chunker instances is hard-coded in the handlers (SummarizationHandler, TextPartitioningHandler).
It would be very convenient to be able to register custom Chunkers in place of the standard ones and have the handlers receive them through dependency injection.
It is also important to be able to use a custom Tokenizer instead of CL100KTokenizer().
At the moment, SummarizationHandler and TextPartitioningHandler create a CL100KTokenizer by default for the default Chunkers.
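To illustrate the shape of what is being asked for, here is a minimal, library-agnostic sketch in Python. It is not the library's API: `Tokenizer`, `Chunker`, `WhitespaceTokenizer`, `SentenceChunker` and `PartitioningHandler` are all hypothetical names invented for this example. The point is only that the handler receives its chunker (and the chunker its tokenizer) from outside instead of constructing them itself.

```python
from typing import Protocol


class Tokenizer(Protocol):
    def count_tokens(self, text: str) -> int: ...


class Chunker(Protocol):
    def split(self, text: str, max_tokens: int) -> list[str]: ...


class WhitespaceTokenizer:
    """Hypothetical stand-in tokenizer: one token per whitespace-separated word."""

    def count_tokens(self, text: str) -> int:
        return len(text.split())


class SentenceChunker:
    """Hypothetical custom chunker: packs whole sentences into chunks."""

    def __init__(self, tokenizer: Tokenizer) -> None:
        self._tokenizer = tokenizer  # injected, not created here

    def split(self, text: str, max_tokens: int) -> list[str]:
        chunks: list[str] = []
        current = ""
        for sentence in (s.strip() for s in text.split(".") if s.strip()):
            candidate = f"{current} {sentence}.".strip()
            # Close the current chunk when adding the sentence would exceed the budget.
            if current and self._tokenizer.count_tokens(candidate) > max_tokens:
                chunks.append(current)
                current = f"{sentence}."
            else:
                current = candidate
        if current:
            chunks.append(current)
        return chunks


class PartitioningHandler:
    """Hypothetical handler that receives its chunker instead of creating it."""

    def __init__(self, chunker: Chunker, max_tokens: int = 8) -> None:
        self._chunker = chunker
        self._max_tokens = max_tokens

    def run(self, text: str) -> list[str]:
        return self._chunker.split(text, self._max_tokens)


# Composition root: the only place that knows the concrete types.
tokenizer = WhitespaceTokenizer()
handler = PartitioningHandler(SentenceChunker(tokenizer), max_tokens=8)
parts = handler.run(
    "Chunkers split text. Tokenizers count tokens. Handlers only orchestrate the two."
)
print(parts)
```

With this arrangement, swapping the chunker or the tokenizer means changing one line at the composition root, and the handler itself does not need to be rewritten.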
Thanks.