You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Output should be empty, but instead returns these chunks (above MAX_TOKENS):
=== 4 ===chunker.serialize(chunk) (129 tokens):"IBM\n1910s–1950s\nIBM originated with several technological innovations developed and commercialized in the late 19th century. Julius E. Pitrap patented the computing scale in 1885;[17] Alexander Dey invented the dial recorder (1888);[18] Herman Hollerith patented the Electric Tabulating Machine (1889);[19] and Willard Bundy invented a time clock to record workers' arrival and departure times on a paper tape (1889).[20] On June 16, 1911, their four companies were amalgamated in New York State by Charles Ranlett Flint forming a fifth company, the"=== 6 ===chunker.serialize(chunk) (130 tokens):"IBM\n1910s–1950s\nCollectively, the companies manufactured a wide array of machinery for sale and lease, ranging from commercial scales and industrial time recorders, meat and cheese slicers, to tabulators and punched cards. Thomas J. Watson, Sr., fired from the National Cash Register Company by John Henry Patterson, called on Flint and, in 1914, was offered a position at CTR.[23] Watson joined CTR as general manager and then, 11 months later, was made President when antitrust cases relating to his time at NCR were resolved.[24] Having learned Patterson's pioneering business practices, Watson proceeded to put the stamp"
Bug
The HybridChunker sometimes produces chunks slightly exceeding the configured token limit.
Steps to reproduce
Output should be empty, but instead returns these chunks (above
MAX_TOKENS
):Docling version
Python version
Python 3.12.8
The text was updated successfully, but these errors were encountered: