⭐ Highlights

Long-Form Question Answering (LFQA)

Haystack now provides LFQA with a Seq2SeqGenerator for generative QA and a Retribert Retriever thanks to community member @vblagoje. #1086
If you would like to ask questions where the answer is not a short phrase explicitly given in one of the documents but a more elaborate answer, then LFQA is interesting for you. These elaborate answers are generated by combining information from multiple relevant documents.

Document Re-Ranking

For pure "semantic document search" use cases that do not need question answering functionality but only document ranking, there is now a new type of node: Ranker. While the Retriever is a perfect fit for document retrieval, we can further improve its results with the Ranker. #1025
To this end, the Ranker uses a pre-trained model to calculate the semantic similarity of the question and each of the top-k retrieved documents. Documents with a high semantic similarity are ranked higher. The combination of a Retriever and a Ranker is especially powerful if you pair a sparse retriever, e.g., ElasticsearchRetriever based on BM25, with a dense Ranker.
A pipeline with a Retriever and a Ranker can be set up in just a few lines of code:

...
retriever = ElasticsearchRetriever(document_store=document_store)
ranker = FARMRanker(model_name_or_path="deepset/gbert-base-germandpr-reranking")

p = Pipeline()
p.add_node(component=retriever, name="ESRetriever", inputs=["Query"])
p.add_node(component=ranker, name="Ranker", inputs=["ESRetriever"])
...
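The two-stage idea behind this pipeline can be sketched without Haystack. The following is a toy illustration only: the term-overlap retriever stands in for BM25, the cosine similarity over term counts stands in for the pre-trained ranking model, and all function names are hypothetical rather than part of the Haystack API.

```python
import math
from collections import Counter

# Toy corpus; in Haystack these would be documents in a DocumentStore.
DOCS = [
    "the wall the wall the wall was painted",
    "berlin wall history",
    "cooking pasta at home",
]


def tokenize(text):
    return text.lower().split()


def sparse_retrieve(query, docs, top_k):
    """First stage (stand-in for BM25): rank by number of matching tokens."""
    query_terms = set(tokenize(query))
    scored = []
    for doc in docs:
        overlap = sum(1 for tok in tokenize(doc) if tok in query_terms)
        if overlap > 0:
            scored.append((overlap, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def cosine(a, b):
    """Cosine similarity of two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rerank(query, candidates):
    """Second stage (stand-in for a trained ranker): re-order the candidates."""
    query_vec = Counter(tokenize(query))
    return sorted(
        candidates,
        key=lambda doc: cosine(query_vec, Counter(tokenize(doc))),
        reverse=True,
    )


candidates = sparse_retrieve("berlin wall", DOCS, top_k=2)
reranked = rerank("berlin wall", candidates)
```

Here the first stage prefers the document that merely repeats "wall", while the second stage moves the document that matches both query terms to the top. A real Ranker would score each query-document pair with a neural model instead of term counts.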

Weaviate

Thanks to a contribution by our community member @venuraja79, Weaviate is integrated into Haystack as another DocumentStore. #1064
It allows a combination of vector search and scalar filtering, i.e., you can filter for a certain tag and do dense retrieval on that subset. After starting a Weaviate server with Docker, it's as simple as:

from haystack.document_store import WeaviateDocumentStore
document_store = WeaviateDocumentStore()

Haystack uses the most recent Weaviate version, 1.4.0, and updating embeddings has also been optimized. #1181
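To make the combination of scalar filtering and vector search concrete, here is a minimal sketch in plain Python. It does not use Weaviate or Haystack: the in-memory list, the `tag` field, the two-dimensional embeddings, and the function names are all assumptions chosen for illustration.

```python
import math

# Hypothetical documents: a scalar field ("tag") plus a dense embedding.
DOCS = [
    {"id": "a", "tag": "news", "embedding": [1.0, 0.0]},
    {"id": "b", "tag": "news", "embedding": [0.6, 0.8]},
    {"id": "c", "tag": "blog", "embedding": [0.0, 1.0]},
]


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def filtered_dense_search(query_emb, docs, filters, top_k):
    """Keep only documents matching every filter, then rank by similarity."""
    subset = [
        d for d in docs
        if all(d.get(key) == value for key, value in filters.items())
    ]
    ranked = sorted(
        subset, key=lambda d: cosine(query_emb, d["embedding"]), reverse=True
    )
    return [d["id"] for d in ranked[:top_k]]


news_hits = filtered_dense_search([0.0, 1.0], DOCS, {"tag": "news"}, top_k=1)
all_hits = filtered_dense_search([0.0, 1.0], DOCS, {}, top_k=1)
```

With the filter, the closest document overall ("c") is excluded because it carries a different tag, so the best match within the "news" subset is returned instead.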

Query Classifier

Some search applications need to distinguish between incoming keyword queries and longer textual questions. To maximize the accuracy of results and minimize computation costs, you can now route only the longer questions to the Reader branch and send keyword queries to a Document Retriever. This is done with a QueryClassifier node, thanks to a contribution by @shahrukhx01. #1099
In such a pipeline, the QueryClassifier passes each query on to exactly one of the two branches.
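The routing idea can be sketched in plain Python. This is a toy stand-in, not the QueryClassifier itself: the rule-based heuristic replaces the trained baseline models, and the function and branch names are hypothetical.

```python
# Words that typically open a natural-language question (illustrative list).
QUESTION_WORDS = {
    "who", "what", "when", "where", "why", "how", "which",
    "is", "are", "does", "do", "can",
}


def classify_query(query):
    """Label a query as "question" or "keyword" with a simple heuristic."""
    stripped = query.strip()
    tokens = stripped.lower().split()
    if not tokens:
        return "keyword"
    if stripped.endswith("?") or tokens[0] in QUESTION_WORDS:
        return "question"
    return "keyword"


def route(query):
    """Return the (hypothetical) branch that would process the query."""
    if classify_query(query) == "question":
        # Questions take the more expensive branch that ends in a Reader.
        return ["Retriever", "Reader"]
    # Keyword queries only need document retrieval.
    return ["Retriever"]


question_branch = route("Who is the father of Arya Stark?")
keyword_branch = route("arya stark father")
```

Only queries labeled as questions pay the cost of the Reader, which is the trade-off described above.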

New Tutorials

Tutorial 11: Pipelines #991
Tutorial 12: Generative QA with LFQA #1086

⚠️ Breaking Changes

Remove Python 3.6 support #1059
Refactor REST APIs to use Pipelines #922
Bump to FARM 0.8.0, torch 1.8.1 and transformers 4.6.1 #1192

🤓 Detailed Changes

Connector

Add crawler to get texts from websites #775

Preprocessor

Add white space normalization warning #1022
Preserve whitespace during PreProcessor.split() #1121
Fix equality check in preprocessor #969

Pipeline

Add validation for root node in Pipeline #987
Fix passing a list as parameter value in Pipeline YAML #952
Add export of Pipeline YAML config #1003
Add config to JoinDocuments node to allow yaml export in pipelines #1134

Document Stores

Integrate Weaviate as another DocumentStore #957 #1064
Add OpenDistro init #1101
Rename the delete_all_documents() method of all document stores to delete_documents() #1047
Fix Elasticsearch connection for non-admin users #1028
Fix update_embeddings() for FAISSDocumentStore #978
Feature: Enable AWS Elasticsearch IAM connection #965
Fix optional FAISS import #971
Make FAISS import conditional #970
Benchmark Milvus #850
Improve Milvus HNSW Performance #1127
Update Milvus benchmarks #1128
Upgrade Milvus to 1.1.0 #1066
Update tests for FAISSDocumentStore #999
Add L2 support for FAISS HNSW #1138
Improve the speed of FAISSDocumentStore.delete_documents() #1095
Add options for handling duplicate documents (skip, fail, overwrite) #1088
Update Embeddings - Use update instead of replace #1181
Improve the progress bar in update_embeddings() + Fix filters in update_embeddings() #1063
Using text hash as id to prevent document duplication #1000
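As a rough illustration of two entries in this list, deriving a document ID from a hash of its text and choosing how to handle duplicates, consider the sketch below. It is not Haystack's implementation: the SHA-256 hash, the dict-based store, the `duplicate_documents` parameter name, and the exception class are assumptions made for the example.

```python
import hashlib


class DuplicateDocumentError(Exception):
    """Raised when a duplicate is written under the "fail" policy."""


def text_id(text):
    """Derive a deterministic ID from the document text (SHA-256 assumed)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def write_documents(store, docs, duplicate_documents="overwrite"):
    """Write docs into a dict-based store using one of three policies."""
    if duplicate_documents not in ("skip", "fail", "overwrite"):
        raise ValueError(f"Unknown policy: {duplicate_documents}")
    for doc in docs:
        doc_id = text_id(doc["text"])
        if doc_id in store:
            if duplicate_documents == "skip":
                continue  # keep the existing document untouched
            if duplicate_documents == "fail":
                raise DuplicateDocumentError(doc_id)
        store[doc_id] = doc  # new document, or "overwrite" policy
    return store
```

Because identical text always hashes to the same ID, writing the same document twice never creates a second entry; the policy only decides which version is kept or whether the write is rejected.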

Retriever

DPR Training parameter #989
Remove single_model_path; add infer_tokenizer to DPR load() #1060
Integrate sentence transformers into benchmarks #843
Add use_amp to the train method to enable mixed precision training #1048

Ranker

Re-ranking component for document search without QA #1025
Remove quickfix from reader and ranker #1196
Distinguish labels for calculating similarity scores #1124

Query Classifier

Fix typo in Query Classifier Exception Message #1190
Add QueryClassifier incl. baseline models #1099

Reader

Filtering duplicate answers #1021
Add ONNXRuntime support #157
Remove unused function _get_pseudo_prob #1201

Generator

Integrate LFQA with Haystack - inferencing #1086

Evaluation Nodes

Reduce precision in pipeline eval print functions #943
Fix division by zero error in EvalRetriever #938
Add evaluation nodes for Pipelines #904
Add more top_k handling to EvalDocuments #1133
Prevent merge of same questions on different documents during evaluation #1119

REST API

Add root_path option #982
Add PDF converter dependencies Docker #1107
Disable Gunicorn preload option #960

User Interface

Change file-upload response to sidebar #1018
Add File Upload Functionality in UI #995
Streamlit UI Evaluation mode #920
Fix evaluation mode in UI #1024
Fix typo in streamlit UI #1106

Documentation and Tutorials

Add about sections to Tutorial 12 #1195
Tutorial update #1166
Documentation update #1162
Add FAQ page #1151
Refresh API docs #1152
Add documentation of confidence scores and calibration method #1131
Adding indentation to markup files #947
Update preprocessing.md #1087
Add badges to readme #1136
Regen api docs #1015
Docs: Add usage information details for AWS Elasticsearch Service #1008
Add tutorial pages #1013
Pipelines tutorial #991
Knowledge graph documentation #979
Knowledge graph example #934
Add Milvus to the retriever / document store table #931
New docs version #964
Update Documentation #976
Update API markdown files and add markdown file for Ranker #1198
Reformat FAQ page #1177
Minor change with a link to the Weaviate docs #1180
Add links to GitHub Discussion and SO #984
Update Milvus links and docstrings #959
Fix link to DPR #962
Removed comma from last item in json list #1114
Fixing inconsistency #926

Misc

Squad tools #1029
Bugfix setting of device by defaulting to "cpu" #1182
Fix issues caused by mypy upgrade #1165
Remove Duplicate Benchmark Run #1132
Fixing grpcio-tools to version of colab's pre-installed grpcio #1113
Update farm version #936

🙏 Big thanks to all contributors! ❤️

A big thank you to all the contributors for this release: @PiffPaffM @oryx1729 @jacksbox @guillim @Timoeller @aantti @tholor @brandenchan @julian-risch @bhadreshpsavani @akkefa @mosheber @lalitpagaria @Avi777 @MichaelBitard @AlviseSembenico @shahrukhx01 @venuraja79 @bobvanluijt @vblagoje @cvgoudar

We would like to thank everyone who participated in the insightful discussions on GitHub and our community Slack!

v0.9.0
