Compare data generated with HybridChunker versus our existing custom chunking implementation #506

bbrowning · 2025-01-27T15:02:01Z

We need to compare the data samples generated with Docling's new HybridChunker versus our existing Docling / custom chunking implementation for the same set of input document(s).

Do the new chunks look reasonable? Are the chunks substantially larger or smaller than before? This may be a manual spot-checking effort, unless we can integrate an external tool, with reasonable effort, to analyze our generated data samples.
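As a starting point for the size question, here is a minimal sketch that compares chunk-size statistics between two chunk lists. It assumes the chunks from each implementation have already been collected as plain strings; `chunk_size_summary` and `compare_chunkers` are hypothetical helper names, not part of Docling or of our codebase, and whitespace-separated word counts are used only as a rough stand-in for token counts.

```python
from statistics import mean, median


def chunk_size_summary(chunks):
    """Return basic size statistics, in whitespace-separated words, for a list of chunk strings."""
    sizes = [len(chunk.split()) for chunk in chunks]
    if not sizes:
        # An empty chunk list gets an all-zero summary instead of raising.
        return {"count": 0, "min": 0, "max": 0, "mean": 0.0, "median": 0.0}
    return {
        "count": len(sizes),
        "min": min(sizes),
        "max": max(sizes),
        "mean": mean(sizes),
        "median": median(sizes),
    }


def compare_chunkers(old_chunks, new_chunks):
    """Summarize both chunk lists and report the ratio of mean sizes (new / old)."""
    old = chunk_size_summary(old_chunks)
    new = chunk_size_summary(new_chunks)
    # The ratio is undefined when the old list is empty, so report None in that case.
    ratio = new["mean"] / old["mean"] if old["mean"] else None
    return {"old": old, "new": new, "mean_ratio": ratio}


if __name__ == "__main__":
    # Hypothetical inputs; in practice these would come from each chunker's output.
    old_chunks = ["a b c d", "e f"]
    new_chunks = ["a b c d e f"]
    print(compare_chunkers(old_chunks, new_chunks))
```

A `mean_ratio` well above or below 1 would suggest the new chunks are substantially larger or smaller than before. This only covers size; whether the chunks look reasonable would still need manual spot-checking.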

bbrowning added this to the 0.8.0 milestone Jan 27, 2025
