v1.16.0
⭐️ Highlights
Using GPT-4 through `PromptNode` and `Agent`
Haystack now supports GPT-4 through `PromptNode` and `Agent`. This means you can use the latest advancements in large language modeling to make your NLP applications more accurate and efficient.
To get started, create a `PromptModel` for GPT-4 and plug it into your `PromptNode`. Just like with ChatGPT, you can use GPT-4 in a chat scenario and ask follow-up questions, as shown in this example:
from haystack.nodes import PromptModel, PromptNode

# api_key is assumed to hold your OpenAI API key
prompt_model = PromptModel("gpt-4", api_key=api_key)
prompt_node = PromptNode(prompt_model)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
    {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
    {"role": "user", "content": "Where was it played?"},
]
result = prompt_node(messages)
More flexible routing of Documents with RouteDocuments
This release includes an enhancement to the `RouteDocuments` node, which makes Document routing even more flexible.
The `RouteDocuments` node now not only returns Documents matched by the `split_by` or `metadata_values` parameter, but also creates an extra route for unmatched Documents. This means that you won't accidentally filter out any Documents due to missing metadata fields. Additionally, the update adds support for using `List[List[str]]` as input type to `metadata_values`, so multiple metadata values can be grouped into a single output.
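The routing behavior described above can be sketched in plain Python. This is only an illustration of the idea, not Haystack's implementation: `route_by_metadata`, the dict-based documents, and the `"unmatched"` route name are hypothetical stand-ins chosen for this sketch.

```python
from typing import Dict, List, Union


def route_by_metadata(
    documents: List[dict],
    field: str,
    metadata_values: List[Union[str, List[str]]],
) -> Dict[str, List[dict]]:
    """Group documents by a metadata field; unmatched documents get their own route."""
    # Normalize every entry to a list so "a" and ["a", "b"] are handled alike.
    groups = [[v] if isinstance(v, str) else list(v) for v in metadata_values]
    routes: Dict[str, List[dict]] = {f"output_{i + 1}": [] for i in range(len(groups))}
    routes["unmatched"] = []  # hypothetical name for the extra route
    for doc in documents:
        value = doc.get("meta", {}).get(field)
        for i, group in enumerate(groups):
            if value in group:
                routes[f"output_{i + 1}"].append(doc)
                break
        else:
            # No group matched (or the field is missing): keep the document anyway.
            routes["unmatched"].append(doc)
    return routes


docs = [
    {"content": "a", "meta": {"source": "wiki"}},
    {"content": "b", "meta": {"source": "blog"}},
    {"content": "c", "meta": {"source": "news"}},
    {"content": "d", "meta": {}},
]
# "wiki" and "blog" are grouped into one output; "news" gets its own.
routes = route_by_metadata(docs, "source", [["wiki", "blog"], ["news"]])
```

In this sketch the document without a `source` field lands in the extra route instead of being dropped, which is the failure mode the new behavior is meant to prevent.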
Deprecating `RAGenerator` and `Seq2SeqGenerator`
`RAGenerator` and `Seq2SeqGenerator` are deprecated and will be removed in version 1.18. We advise using the more powerful `PromptNode` instead, which can use RAG and Seq2Seq models as well. The following example shows how to use `PromptNode` as a replacement for `Seq2SeqGenerator`:
from haystack import Document
from haystack.nodes import PromptNode, PromptTemplate

p = PromptNode("vblagoje/bart_lfqa")

# Start by defining a question/query
query = "Why does water heated to room temperature feel colder than the air around it?"

# Given the question above, suppose the documents below were found in some document store
documents = [
    "when the skin is completely wet. The body continuously loses water by...",
    "at greater pressures. There is an ambiguity, however, as to the meaning of the terms 'heating' and 'cooling'...",
    "are not in a relation of thermal equilibrium, heat will flow from the hotter to the colder, by whatever pathway...",
    "air condition and moving along a line of constant enthalpy toward a state of higher humidity. A simple example ...",
    "Thermal contact conductance. In physics, thermal contact conductance is the study of heat conduction between solid ...",
]

# Manually concatenate the question and support documents into BART input
# conditioned_doc = "<P> " + " <P> ".join([d for d in documents])
# query_and_docs = "question: {} context: {}".format(query, conditioned_doc)

# Or use the PromptTemplate as shown here
pt = PromptTemplate("lfqa", "question: {query} context: {join(documents, delimiter='<P>')}")
res = p.prompt(prompt_template=pt, query=query, documents=[Document(d) for d in documents])
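For reference, the manual concatenation mentioned in the commented-out lines can be run on its own, without Haystack. The sketch below only shows the string that approach produces; `build_lfqa_input` is a hypothetical helper name, and whether the `PromptTemplate` join yields exactly the same whitespace is not verified here.

```python
from typing import List


def build_lfqa_input(query: str, documents: List[str]) -> str:
    """Concatenate a question and its supporting passages into one input string."""
    # Prefix every passage with the "<P>" separator, as in the commented-out lines.
    conditioned_doc = "<P> " + " <P> ".join(documents)
    return "question: {} context: {}".format(query, conditioned_doc)


text = build_lfqa_input(
    "Why is the sky blue?",
    ["Rayleigh scattering ...", "Sunlight is ..."],
)
```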
⚠️ Breaking Changes
Refactoring of our dependency management
We added the following extras as optional dependencies for Haystack: `stats`, `metrics`, `preprocessing`, `file-conversion`, and `elasticsearch`. To keep using certain components, you need to install `farm-haystack` with these new extras:
| Component | Installation extra |
|---|---|
| PreProcessor | farm-haystack[preprocessing] |
| DocxToTextConverter | farm-haystack[file-conversion] |
| TikaConverter | farm-haystack[file-conversion] |
| LangdetectDocumentLanguageClassifier | farm-haystack[file-conversion] |
| ElasticsearchDocumentStore | farm-haystack[elasticsearch] |
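As a small illustration of how the table maps to an install command, the sketch below builds a `pip install` line for a set of components. The mapping is copied from the table; the helper name `pip_command` is hypothetical and not part of Haystack.

```python
from typing import Iterable

# Component-to-extra mapping copied from the table above.
EXTRA_FOR_COMPONENT = {
    "PreProcessor": "preprocessing",
    "DocxToTextConverter": "file-conversion",
    "TikaConverter": "file-conversion",
    "LangdetectDocumentLanguageClassifier": "file-conversion",
    "ElasticsearchDocumentStore": "elasticsearch",
}


def pip_command(components: Iterable[str]) -> str:
    """Build a pip install command covering the extras the given components need."""
    # A set removes duplicates when several components share one extra.
    extras = sorted({EXTRA_FOR_COMPONENT[name] for name in components})
    joined = ",".join(extras)
    return f'pip install "farm-haystack[{joined}]"'


command = pip_command(["PreProcessor", "TikaConverter", "ElasticsearchDocumentStore"])
```

The quotes around the package specifier keep shells such as zsh from interpreting the square brackets.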
Dropping support for Python 3.7
Since Python 3.7 will reach end of life in June 2023, we will no longer support it as of Haystack version 1.16.
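If your own project needs to fail early on an unsupported interpreter, a version guard along these lines is one option. This is a sketch: `MIN_PYTHON` and `is_supported` are hypothetical names, and the 3.8 minimum is inferred from the statement above rather than taken from Haystack's packaging metadata.

```python
import sys
import warnings
from typing import Tuple

# Assumption: 3.8 is the oldest supported version once 3.7 is dropped.
MIN_PYTHON: Tuple[int, int] = (3, 8)


def is_supported(version: Tuple[int, ...]) -> bool:
    """Return True if the (major, minor, ...) version meets the minimum."""
    return tuple(version[:2]) >= MIN_PYTHON


if not is_supported(sys.version_info):
    warnings.warn(
        "Python {}.{} is older than the assumed minimum {}.{}".format(
            sys.version_info[0], sys.version_info[1], *MIN_PYTHON
        )
    )
```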
Smaller Breaking Changes
- Using `TableCell` instead of `Span` to indicate the coordinates of a table cell (#4616)
- Default `save_dir` for `FARMReader`'s `train` method changed to `f"./saved_models/{self.inferencer.model.language_model.name}"` (#4553)
- Using `PreProcessor` with `split_respect_sentence_boundary` set to `True` might return a different set of Documents than in v1.15 (#4470)
What's Changed
Breaking Changes
- feat: Deduplicate duplicate Answers resulting from overlapping Documents in `FARMReader` by @bogdankostic in #4470
- feat: Change default save_dir for FARMReader.train by @GitIgnoreMaybe in #4553
- feat!: drop Python3.7 support by @ZanSara in #4421
- refactor!: extract evaluation and statistical dependencies by @ZanSara in #4457
- refactor!: extract preprocessing and file conversion deps by @ZanSara in #4605
- feat: Implementation of Table Cell Proposal by @sjrl in #4616
Pipeline
- fix: Fix pipeline config and agent tools hashing for telemetry by @silvanocerza in #4508
- refactor: Adjust WhisperTranscriber to pipeline run methods by @vblagoje in #4510
- Adding filtering support for Weaviate when used for BM25 querying by @zoltan-fedor in #4385
- test: Remove duplicate whisper test by @julian-risch in #4567
- fix: provide a fallback for PyMuPDF by @masci in #4564
- Docs: Shaper API update by @agnieszka-m in #4542
- Docs: Update Whisper API. by @agnieszka-m in #4539
- refactor: remove variadic parameters in `WebSearch` initialization; make new nodes directly importable by @anakin87 in #4581
- test: Add pytest fixture to block requests in unit tests by @silvanocerza in #4433
- test: Rework conftest by @silvanocerza in #4614
- feat: arbitrary `crawler_depth` for `Crawler` class by @benheckmann in #4623
- fix: ParsrConverter list element added by @Namoush in #4562
- fix: make `langdetect` truly optional by @ZanSara in #4686
- feat: More flexible routing for RouteDocuments node by @sjrl in #4690
- docs: Adapt Shaper docstrings regarding dropping metadata by @bogdankostic in #4655
DocumentStores
- fix: Check for date fields in weaviate meta update by @joekitsmith in #4371
- chore: skip Milvus tests by @ZanSara in #4654
- docs: Add deprecation information to doc string of `MilvusDocumentStore` by @bogdankostic in #4658
- Ignore cross-reference properties when loading documents by @masci in #4664
- fix: PineconeDocumentStore error when delete_documents right after initialization by @Namoush in #4609
- fix: remove warnings from the more recent Elasticsearch client by @masci in #4602
- fix: Fixing the Weaviate BM25 query builder bug by @zoltan-fedor in #4703
Documentation
- Docs: Update Seq2SeqGen models and docstrings lg by @agnieszka-m in #4595
- feat: Load documents from remote - helper function by @TuanaCelik in #4545
- refactor: Remove unecessary literal_eval when parsing env var by @silvanocerza in #4570
- Docs: Fix QuestionGenerator and Summarizer docstrings by @agnieszka-m in #4594
- refactor: Rework prompt tests by @silvanocerza in #4600
- feat: Add util method to make HTTP requests with configurable retry by @silvanocerza in #4627
- refactor: Rework invocation layers by @silvanocerza in #4615
- refactor: Add 503 as status code that triggers retry in request_with_retry by @silvanocerza in #4640
- feat: initial implementation of `MemoryDocumentStore` for new Pipelines by @ZanSara in #4447
- docs: Add PDFToTextOCRConverter to API Docs by @bogdankostic in #4656
- Docs: Add max length unit to PromptNode API docs by @agnieszka-m in #4601
- fix: Add model_max_length model_kwargs parameter to HF PromptNode by @vblagoje in #4651
- feat: Add chatgpt streaming by @vblagoje in #4659
- feat: Add Hugging Face inferencing PromptNode layer by @vblagoje in #4641
- refactor: `node->component` by @ZanSara in #4687
- feat: Add AzureChatGPT Capability using new InvocationLayer style by @recrudesce in #4675
- docs: add deprecation notes to docstrings by @masci in #4708
Other Changes
- test: disable posthog in rest api tests by @ZanSara in #4507
- ci: Enhance release_docs.py by @silvanocerza in #4459
- ci: Use new Slack action to send failure messages by @silvanocerza in #4464
- ci: Fix docker release process after PyPi release by @silvanocerza in #4513
- Docs: Add whisper api by @agnieszka-m in #4511
- proposal: `DocumentStores` and `Retrievers` by @ZanSara in #4370
- fix: do not override bake's platform definitions by @masci in #4518
- ci: Fix Slack messages formatting on job failure by @silvanocerza in #4520
- fix: update envs for the backend image of annotation tool by @oryx1729 in #4535
- refactor: `OpenAIAnswerGenerator` - avoid tokenizing all documents several times by @anakin87 in #4504
- Docs: Update PromptNode API docs by @agnieszka-m in #4549
- ci: Checkout correct ref in docstring-labeler.yml by @silvanocerza in #4563
- test: Skip flaky prompt node integration test by @silvanocerza in #4572
- refactor: Refactor prompt node by @silvanocerza in #4580
- feat: Haystack CLI by @ZanSara in #4568
- fix: Adjust HF stop words (single stop word) by @vblagoje in #4584
- build xpdf on bionic by @masci in #4606
- feat: support for gpt-4 by @ZanSara in #4620
- ci: Fix docstring-labeler.yml not working in PR from forks by @silvanocerza in #4648
- feat: Add GenerationConfig option to PromptNode's HuggingFace invocation layer by @vblagoje in #4649
- test: Add requests blocker fixture by @silvanocerza in #4671
- test: Block requests_cache in unit tests by @silvanocerza in #4696
- docs: add web retriever to api docs by @dfokina in #4699
- fix: Log 'Observation on new line by @TuanaCelik in #4704
- build: Update weaviate-client by @bogdankostic in #4715
- fix: Tiktoken does not support Azure gpt-35-turbo by @recrudesce in #4739
- refactor!: extract elasticsearch by @ZanSara in #4668
- Revert "fix: Log 'Observation' on new line (#4704)" by @bogdankostic in #4751
- fix: gpt-3.5-turbo is an agent streaming model by @vblagoje in #4673
- fix: recursion of death while loading PromptTemplate from yaml by @tstadel in #4691
- fix: Deprecate Seq2SeqGenerator and RAGenerator by @vblagoje in #4745
- Enhance the error logging in PromptTemplate variable resolution by @vblagoje in #4730
- fix: Add separate query method for OpenSearchDocumentStore by @bogdankostic in #4764
New Contributors
- @GitIgnoreMaybe made their first contribution in #4553
- @erendabanlioglu made their first contribution in #4559
- @benheckmann made their first contribution in #4623
- @joekitsmith made their first contribution in #4371
Full Changelog: v1.15.1...v1.16.0