Releases: AuvaLab/itext2kg
ATOM
- Complete Architectural Redesign: ATOM now uses a three-module parallel pipeline to construct and update dynamic temporal knowledge graphs (DTKGs).
- Atomic Fact Decomposition: A new first module splits text into minimal "atomic facts," addressing the "forgetting effect" where LLMs omit facts in longer contexts.
- Enhanced Exhaustivity and Stability: The new architecture achieves significant gains: ~31% in factual exhaustivity, ~18% in temporal exhaustivity, and ~17% in stability.
- Dual-Time Modeling: Implemented dual-time modeling (t_obs vs. t_start/t_end) to prevent temporal misattribution in dynamic KGs.
- Parallel 5-Tuple Extraction: Module-2 now directly extracts 5-tuples (subject, predicate, object, t_start, t_end) in parallel from atomic facts.
- Parallel Atomic Merge Architecture: Module-3 uses an efficient, parallel pairwise merge algorithm, achieving 93.8% latency reduction vs. Graphiti and 95.3% vs. iText2KG.
- LLM-Independent Resolution: Replaced slow LLM-based resolution with distance metrics (cosine similarity) for scalable, parallel merging.
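The dual-time 5-tuple and the LLM-independent pairwise merge described above can be sketched roughly as follows. This is an illustration only, not ATOM's actual code: the `TemporalFact` class, the `cosine`, `merge_pair`, and `pairwise_merge` helpers, the toy embeddings, and the 0.9 threshold are all assumptions made for this example. The merge rounds run sequentially here; the pairs within a round are independent of each other, which is what would allow them to run in parallel.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Optional


@dataclass(frozen=True)
class TemporalFact:
    """Hypothetical 5-tuple plus observation time (not ATOM's real data model)."""
    subject: str
    predicate: str
    obj: str
    t_start: Optional[str]  # when the fact starts being valid
    t_end: Optional[str]    # when the fact stops being valid (None = still valid)
    t_obs: str              # when the source text was observed


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def merge_pair(left, right, threshold=0.9):
    """Merge two {entity_name: embedding} maps without calling an LLM.

    An entity from `right` is dropped as a duplicate when its embedding is
    at least `threshold`-similar to an entity already kept.
    """
    merged = dict(left)
    for name, vec in right.items():
        if not any(cosine(vec, known) >= threshold for known in merged.values()):
            merged[name] = vec
    return merged


def pairwise_merge(graphs, threshold=0.9):
    """Merge neighbouring maps round by round until one map remains."""
    while len(graphs) > 1:
        merged_round = [
            merge_pair(graphs[i], graphs[i + 1], threshold)
            for i in range(0, len(graphs) - 1, 2)
        ]
        if len(graphs) % 2:  # an odd map out is carried to the next round
            merged_round.append(graphs[-1])
        graphs = merged_round
    return graphs[0] if graphs else {}


# The fact has been valid since 2020 but was only observed in 2024.
fact = TemporalFact("Alice", "works_at", "Acme", "2020-01-01", None, "2024-06-01")

# Toy 2-dimensional embeddings, made up for illustration.
graphs = [{"Acme": [1.0, 0.0]}, {"Acme Corp": [0.99, 0.05]}, {"Globex": [0.0, 1.0]}]
entities = pairwise_merge(graphs)
print(sorted(entities))
```

Keeping `t_obs` separate from `t_start`/`t_end` means a fact read from a 2024 document is not mistaken for a fact that became true in 2024.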
iText2KG v0.0.9
Improvements
- Fixed bug #38 in the Neo4j storage.
iText2KG v0.0.8
New Features
- iText2KG_Star
  - Direct relationship extraction (faster)
  - Eliminates the separate entity extraction step
  - No need to handle isolated or invented entities
- Dynamic Knowledge Graphs
  - Dynamic KGs with temporal tracking
  - Incremental updates with existing graphs
- Facts-Based Construction
  - Structured fact extraction via Document Distiller
  - More exhaustive knowledge graphs
Improvements
- Async Migration: All methods now async/await
- Enhanced Logging: Structured logging system
- Universal LangChain Support: All chat/embedding models
- Better Error Handling: Production-ready reliability
Technical
- Enhanced relationship extraction
- Python 3.10+ compatibility
- Comprehensive examples
Fixed the following issues: #34, #33, #29, #28, #26, #22
Refactoring the iText2KG code
- The entire iText2KG codebase has been refactored around data models that describe an Entity, a Relationship, and a Knowledge Graph.
- Each entity is embedded using both its name and label to avoid merging concepts with similar names but different labels, such as Python: Language and Python: Snake.
- The weights for entity name embedding and entity label are configurable, with defaults set to 0.4 for the entity label and 0.6 for the entity name.
- A max_tries parameter has been added to the iText2KG.build_graph function to limit retries of entity and relation extraction when the LLM hallucinates while structuring its output. A max_tries_isolated_entities parameter has been added to the same method to limit retries when hallucinations occur while processing isolated entities.
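The effect of embedding both the name and the label can be sketched with a small worked example. This is a minimal sketch, assuming the two embeddings are combined as a weighted sum and then L2-normalised; the release notes do not state the exact formula. The `combine_embeddings` and `cosine` helpers and the toy 3-dimensional vectors are hypothetical, and only the 0.6/0.4 default weights come from the notes above.

```python
from math import sqrt


def combine_embeddings(name_vec, label_vec, name_weight=0.6, label_weight=0.4):
    """Weighted sum of name and label embeddings, L2-normalised (assumed formula)."""
    mixed = [name_weight * n + label_weight * l for n, l in zip(name_vec, label_vec)]
    norm = sqrt(sum(x * x for x in mixed))
    return [x / norm for x in mixed] if norm else mixed


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy embeddings, made up for illustration: same name, different labels.
python_name = [1.0, 0.0, 0.0]
language_label = [0.0, 1.0, 0.0]
snake_label = [0.0, 0.0, 1.0]

python_language = combine_embeddings(python_name, language_label)
python_snake = combine_embeddings(python_name, snake_label)

name_only_similarity = cosine(python_name, python_name)   # identical names
similarity = cosine(python_language, python_snake)        # names plus labels
print(name_only_similarity, similarity)
```

With name-only embeddings the two entities are indistinguishable, whereas mixing in the label pulls their similarity below that of identical entities, so a merge threshold can keep Python: Language and Python: Snake apart.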
Latest update
- Fixed the bug reported in #7.
- Updated the iText2KG build_graph function to match newly constructed graphs against existing graphs after the construction process.
Supporting other LLMs
- Now, iText2KG is compatible with all the chat/embeddings models LangChain supports. (#1)
- The constructed graph can be expanded by passing the already extracted entities and relationships as arguments to the build_graph function in iText2KG.
- iText2KG is compatible with all Python versions above 3.9. (#2)
- Fixed several bugs in the overall architecture.
Latest update
- Updating the version of neo4j in the requirements.
- Adding datasets for threshold estimation.
- Adding the paper link.
Fixing the workflow
V0.0.2 Update dependencies in publish_on_pypi.yml and setup.cfg
First Release
V0.0.1 Add PyPI publishing workflow and configuration files