Releases: ccprocessor/llm-webkit
v4.3.0-released
What's Changed
- docs: modify output spec by @drunkpig in #589
- feat: update code and math content_list by @e06084 in #590
- feat: reformat content list of image and list by @papayalove in #591
- fix: fix all newlines by @papayalove in #593
- feat: use new spec to represent title and paragraph by @drunkpig in #592
- refactor: restructure table output format by @LollipopsAndWine in #595
- refactor: remove redundant newlines in markdown by @LollipopsAndWine in #596
- v4.3.0-released by @e06084 in #597
Full Changelog: v4.2.1-released...v4.3.0-released
v4.2.1-released
What's Changed
- remove replace_math logic by @1041206149 in #580
- fix: content_list_spec.md by @1041206149 in #581
- docs: update content list spec by @e06084 in #582
- fix: content_list_spec.md_2 by @1041206149 in #583
- fix: content_list_spec.md_3 by @1041206149 in #584
- fix: content_list_spec.md_4 by @1041206149 in #586
- Dev api add mysql by @papayalove in #585
- fix: fix code newline by @papayalove in #587
- Release v4.2.1 by @e06084 in #588
Full Changelog: v4.2.0-released...v4.2.1-released
v4.2.0-released
What's Changed
- sync main to dev by @e06084 in #560
- fix: fix datajson concatenation error when a paragraph ends with a newline by @LollipopsAndWine in #561
- fix: add post_main_html_processer_demo.py & update post main html by @renpengli01 in #563
- fix: handle the case where a paragraph may be None by @LollipopsAndWine in #564
- fix: update html-pre-dedup & layout-clustering & readme by @renpengli01 in #565
- feat: simple api add use_raw_image_url in mm_md format by @e06084 in #566
- Dev feat api2 by @papayalove in #568
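- fix: 1. fix formulas in titles rendering incorrectly in md 2. fix regex failing to match consecutive formulas such as $...$$...$$...$ 3. fix handling of unclosed formulas 4. remove $ escaping by @LollipopsAndWine in #567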
- feat: add html parse api url parsing by @papayalove in #570
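- fix: 1. fix rendering errors for formulas in tables that were not wrapped in $$ 2. remove hidden tags that carry the hidden attribute in the original HTML by @LollipopsAndWine in #574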
- Dev speed up by @papayalove in #575
- fix: Handle encoding errors in selectolax by switching to BeautifulSoup by @ideaflow in #569
- release 4.2.0 by @e06084 in #576
Full Changelog: v4.1.1-released...v4.2.0-released
v4.1.1-released
What's Changed
Full Changelog: v4.1.0-released...v4.1.1-released
v4.1.0-released
What's Changed
- fix: update post main html README.md by @renpengli01 in #540
- fix: extract caption for image by @LollipopsAndWine in #542
- docs: update readme by @e06084 in #545
- fix: escape '%' in MathML formula v2 by @1041206149 in #548
- fix: update post main html new plan & unit test by @renpengli01 in #546
- fix: fix tail content bug and improve multiple same first dynamic class id match by @papayalove in #553
- fix: add to_plain_md by datajson.py & unit by @renpengli01 in #550
- new simplify for dripper v1.5 by @ideaflow in #554
- math scans the full text with regex matching by default by @e06084 in #556
- git commit -m ': add single parse api' by @papayalove in #544
- fix: 1. simplify tables by removing non-table tags 2. fix superscripts and subscripts being separated from their base text in table, list, title, and text by @LollipopsAndWine in #557
- Release v4.1.0 by @e06084 in #558
- update pydantic requirement by @e06084 in #559
Full Changelog: v4.0.1-released...v4.1.0-released
v4.0.1-released
What's Changed
- feat: simple api add language parameter by @e06084 in #536
- : fix element dict layer key error where html has deeper layer than the template by @papayalove in #538
- v4.0.1-released by @e06084 in #539
Full Changelog: v4.0.0-released...v4.0.1-released
v4.0.0-released
What's Changed
- fix: fix extraction logic for complex nested tables by @LollipopsAndWine in #534
- v4.0.0-released by @e06084 in #535
Full Changelog: v3.2.3-released...v4.0.0-released
v3.2.3-released
What's Changed
- : fix match failure if there are too many same ids in one html, fix incomplete html tags that cause structure chaos and fix natural language detection method for chinese by @papayalove in #527
- feat: Refactor html extractor to two stages by @e06084 in #521
- : fix main html loss due to br tail and p tag by @papayalove in #530
- feat: post main html by @renpengli01 in #528
- feat: post main html README.md by @renpengli01 in #529
- feat: add extract plain text from html source method by @drunkpig in #532
- fix: remove script/style nodes from cchtml by @e06084 in #531
- v3.2.3-released by @e06084 in #533
Full Changelog: v3.2.2-released...v3.2.3-released
v3.2.2-released
What's Changed
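- expand the scope in which the mathjax renderer applies by @1041206149 in #520
- fix duplicate extraction of formulas whose class name is class by @1041206149 in #522
- fix: allow getting the content_list of lists with non-standard structure by @LollipopsAndWine in #523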
- v3.2.2-released by @e06084 in #526
Full Changelog: v3.2.1-released...v3.2.2-released
v3.2.1-released
What's Changed
- fix: set logging ERROR level in ASCIIMath2Tex by @e06084 in #508
- : improve dealing with response 0 by @papayalove in #509
- refactor: refactor simple by @LollipopsAndWine in #511
- fix: combine_text with empty text by @e06084 in #510
- fix: noclip pre-extract problem by @drunkpig in #513
- : fix None error by @papayalove in #514
- feat: change HTMLStripSpacePostExtractor to ContentListStripSpacePostExtractor by @e06084 in #515
- fix: change test_ContentListStripSpacePostExtractor.py filename by @e06084 in #516
- add a default inline/display (interline) math setting by @1041206149 in #517
- fix by @e06084 in #519
- Release 3.2.1 by @e06084 in #518
Full Changelog: v3.2.0-released...v3.2.1-released