A high-performance semantic text deduplication tool built on Transformer + FAISS, designed for large-scale corpora
Updated Aug 12, 2025 - Python
Tool to deduplicate file contents
High-performance website content extractor