⭐️ Highlights

Using GPT-4 through `PromptNode` and `Agent`

Haystack now supports GPT-4 through PromptNode and Agent, so you can build your NLP applications on top of OpenAI's GPT-4 model.

To get started, create a PromptModel for GPT-4 and plug it into your PromptNode. Just like with ChatGPT, you can use GPT-4 in a chat scenario and ask follow-up questions, as shown in this example:

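from haystack.nodes import PromptModel, PromptNode

api_key = "YOUR_OPENAI_API_KEY"  # placeholder: replace with your own OpenAI API key
prompt_model = PromptModel("gpt-4", api_key=api_key)
prompt_node = PromptNode(prompt_model)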
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
    {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
    {"role": "user", "content": "Where was it played?"},
]
result = prompt_node(messages)
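
To ask a further follow-up question, the conversation history can be extended with the model's reply before the next call. The following is a minimal sketch in plain Python: `extend_conversation` is a hypothetical helper, not part of Haystack, and it assumes the reply is available as a plain string.

```python
from typing import Dict, List

Message = Dict[str, str]


def extend_conversation(messages: List[Message], assistant_reply: str, follow_up: str) -> List[Message]:
    """Return a new list with the assistant's reply and the next user question appended."""
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": follow_up},
    ]


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
]
# In a real application the reply would come from the PromptNode call; it is hard-coded here.
history = extend_conversation(
    history,
    "The Los Angeles Dodgers won the World Series in 2020.",
    "Where was it played?",
)
```

The helper returns a new list instead of mutating its input, so earlier states of the conversation stay available for logging or retries.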

Deprecating `RAGenerator` and `Seq2SeqGenerator`

RAGenerator and Seq2SeqGenerator are deprecated and will be removed in version 1.18. We advise using the more powerful PromptNode instead, which can use RAG and Seq2Seq models as well. The following example shows how to use PromptNode as a replacement for Seq2SeqGenerator:

from haystack import Document
from haystack.nodes import PromptNode, PromptTemplate

p = PromptNode("vblagoje/bart_lfqa")

# Start by defining a question/query
query = "Why does water heated to room temperature feel colder than the air around it?"

# Given the question above, suppose the documents below were found in some document store
documents = [
    "when the skin is completely wet. The body continuously loses water by...",
    "at greater pressures. There is an ambiguity, however, as to the meaning of the terms 'heating' and 'cooling'...",
    "are not in a relation of thermal equilibrium, heat will flow from the hotter to the colder, by whatever pathway...",
    "air condition and moving along a line of constant enthalpy toward a state of higher humidity. A simple example ...",
    "Thermal contact conductance. In physics, thermal contact conductance is the study of heat conduction between solid ...",
]


# Manually concatenate the question and support documents into BART input
# conditioned_doc = "<P> " + " <P> ".join([d for d in documents])
# query_and_docs = "question: {} context: {}".format(query, conditioned_doc)

# Or use the PromptTemplate as shown here
pt = PromptTemplate("lfqa", "question: {query} context: {join(documents, delimiter='<P>')}")

res = p.prompt(prompt_template=pt, query=query, documents=[Document(d) for d in documents])
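
For reference, the manual concatenation described in the commented-out lines can be written out in plain Python. This sketch only illustrates the string layout. `build_lfqa_input` is a hypothetical helper name, and whether the template's `join` produces exactly the same spacing around `<P>` is not confirmed here.

```python
from typing import List


def build_lfqa_input(query: str, documents: List[str]) -> str:
    """Build a 'question: ... context: ...' string with each document prefixed by '<P> '."""
    conditioned_doc = "<P> " + " <P> ".join(documents)
    return "question: {} context: {}".format(query, conditioned_doc)


example = build_lfqa_input("Why is the sky blue?", ["first passage", "second passage"])
print(example)
# prints: question: Why is the sky blue? context: <P> first passage <P> second passage
```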

⚠️ Breaking Changes

Refactoring of our dependency management

We added the following extras as optional dependencies for Haystack: stats, metrics, preprocessing, file-conversion, and elasticsearch. To keep using certain components, you need to install farm-haystack with these new extras:

| Component | Installation extra |
| --- | --- |
| `PreProcessor` | `farm-haystack[preprocessing]` |
| `DocxToTextConverter` | `farm-haystack[file-conversion]` |
| `TikaConverter` | `farm-haystack[file-conversion]` |
| `LangdetectDocumentLanguageClassifier` | `farm-haystack[file-conversion]` |
| `ElasticsearchDocumentStore` | `farm-haystack[elasticsearch]` |
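
The table can also be turned into a small lookup, for example to print the install command for the components a project uses. This is a sketch only: the mapping copies the table above, and `install_command` is a hypothetical helper, not something Haystack provides.

```python
from typing import Iterable

# Component-to-extra mapping copied from the table in these release notes.
EXTRA_FOR_COMPONENT = {
    "PreProcessor": "preprocessing",
    "DocxToTextConverter": "file-conversion",
    "TikaConverter": "file-conversion",
    "LangdetectDocumentLanguageClassifier": "file-conversion",
    "ElasticsearchDocumentStore": "elasticsearch",
}


def install_command(components: Iterable[str]) -> str:
    """Return a pip command that installs farm-haystack with the extras the components need."""
    # A set removes duplicates; sorting keeps the output stable.
    extras = sorted({EXTRA_FOR_COMPONENT[name] for name in components})
    return "pip install 'farm-haystack[{}]'".format(",".join(extras))


command = install_command(["PreProcessor", "TikaConverter", "ElasticsearchDocumentStore"])
print(command)
# prints: pip install 'farm-haystack[elasticsearch,file-conversion,preprocessing]'
```

Quoting the requirement keeps shells such as zsh from interpreting the square brackets.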

Dropping support for Python 3.7

Since Python 3.7 will reach end of life in June 2023, we will no longer support it as of Haystack version 1.16.
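
Projects that run in mixed environments may want to fail early on an unsupported interpreter. This is a minimal sketch, assuming Python 3.8 is the lowest supported version from Haystack 1.16 on (these notes only state that 3.7 is dropped). `is_supported_python` is a hypothetical helper.

```python
import sys

# Assumption: 3.8 is the minimum version once 3.7 support is dropped.
MIN_SUPPORTED = (3, 8)


def is_supported_python(version=None) -> bool:
    """Check a (major, minor) pair against the assumed minimum; defaults to the running interpreter."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version) >= MIN_SUPPORTED


if not is_supported_python():
    print("This Python version is older than {}.{}".format(*MIN_SUPPORTED))
```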

Smaller Breaking Changes

`TableCell` is now used instead of `Span` to indicate the coordinates of a table cell (#4616)
The default `save_dir` for `FARMReader`'s `train` method changed to `f"./saved_models/{self.inferencer.model.language_model.name}"` (#4553)
Using `PreProcessor` with `split_respect_sentence_boundary` set to `True` might return a different set of Documents than in v1.15 (#4470)

What's Changed

Breaking Changes

feat: Deduplicate duplicate Answers resulting from overlapping Documents in FARMReader by @bogdankostic in #4470
feat: Change default save_dir for FARMReader.train by @GitIgnoreMaybe in #4553
feat!: drop Python3.7 support by @ZanSara in #4421
refactor!: extract evaluation and statistical dependencies by @ZanSara in #4457
refactor!: extract preprocessing and file conversion deps by @ZanSara in #4605
feat: Implementation of Table Cell Proposal by @sjrl in #4616

Pipeline

fix: Fix pipeline config and agent tools hashing for telemetry by @silvanocerza in #4508
refactor: Adjust WhisperTranscriber to pipeline run methods by @vblagoje in #4510
Adding filtering support for Weaviate when used for BM25 querying by @zoltan-fedor in #4385
test: Remove duplicate whisper test by @julian-risch in #4567
fix: provide a fallback for PyMuPDF by @masci in #4564
Docs: Shaper API update by @agnieszka-m in #4542
Docs: Update Whisper API. by @agnieszka-m in #4539
refactor: remove variadic parameters in WebSearch initialization; make new nodes directly importable by @anakin87 in #4581
test: Add pytest fixture to block requests in unit tests by @silvanocerza in #4433
test: Rework conftest by @silvanocerza in #4614
feat: arbitrary crawler_depth for Crawler class by @benheckmann in #4623
fix: ParsrConverter list element added by @Namoush in #4562
fix: make langdetect truly optional by @ZanSara in #4686
feat: More flexible routing for RouteDocuments node by @sjrl in #4690
docs: Adapt Shaper docstrings regarding dropping metadata by @bogdankostic in #4655

DocumentStores

fix: Check for date fields in weaviate meta update by @joekitsmith in #4371
chore: skip Milvus tests by @ZanSara in #4654
docs: Add deprecation information to doc string of MilvusDocumentStore by @bogdankostic in #4658
Ignore cross-reference properties when loading documents by @masci in #4664
fix: PineconeDocumentStore error when delete_documents right after initialization by @Namoush in #4609
fix: remove warnings from the more recent Elasticsearch client by @masci in #4602
fix: Fixing the Weaviate BM25 query builder bug by @zoltan-fedor in #4703

Documentation

Docs: Update Seq2SeqGen models and docstrings lg by @agnieszka-m in #4595
feat: Load documents from remote - helper function by @TuanaCelik in #4545
refactor: Remove unnecessary literal_eval when parsing env var by @silvanocerza in #4570
Docs: Fix QuestionGenerator and Summarizer docstrings by @agnieszka-m in #4594
refactor: Rework prompt tests by @silvanocerza in #4600
feat: Add util method to make HTTP requests with configurable retry by @silvanocerza in #4627
refactor: Rework invocation layers by @silvanocerza in #4615
refactor: Add 503 as status code that triggers retry in request_with_retry by @silvanocerza in #4640
feat: initial implementation of MemoryDocumentStore for new Pipelines by @ZanSara in #4447
docs: Add PDFToTextOCRConverter to API Docs by @bogdankostic in #4656
Docs: Add max length unit to PromptNode API docs by @agnieszka-m in #4601
fix: Add model_max_length model_kwargs parameter to HF PromptNode by @vblagoje in #4651
feat: Add chatgpt streaming by @vblagoje in #4659
feat: Add Hugging Face inferencing PromptNode layer by @vblagoje in #4641
refactor: node->component by @ZanSara in #4687
feat: Add AzureChatGPT Capability using new InvocationLayer style by @recrudesce in #4675
docs: add deprecation notes to docstrings by @masci in #4708

Other Changes

test: disable posthog in rest api tests by @ZanSara in #4507
ci: Enhance release_docs.py by @silvanocerza in #4459
ci: Use new Slack action to send failure messages by @silvanocerza in #4464
ci: Fix docker release process after PyPi release by @silvanocerza in #4513
Docs: Add whisper api by @agnieszka-m in #4511
proposal: DocumentStores and Retrievers by @ZanSara in #4370
fix: do not override bake's platform definitions by @masci in #4518
ci: Fix Slack messages formatting on job failure by @silvanocerza in #4520
fix: update envs for the backend image of annotation tool by @oryx1729 in #4535
refactor: OpenAIAnswerGenerator - avoid tokenizing all documents several times by @anakin87 in #4504
Docs: Update PromptNode API docs by @agnieszka-m in #4549
ci: Checkout correct ref in docstring-labeler.yml by @silvanocerza in #4563
test: Skip flaky prompt node integration test by @silvanocerza in #4572
refactor: Refactor prompt node by @silvanocerza in #4580
feat: Haystack CLI by @ZanSara in #4568
fix: Adjust HF stop words (single stop word) by @vblagoje in #4584
build xpdf on bionic by @masci in #4606
feat: support for gpt-4 by @ZanSara in #4620
ci: Fix docstring-labeler.yml not working in PR from forks by @silvanocerza in #4648
feat: Add GenerationConfig option to PromptNode's HuggingFace invocation layer by @vblagoje in #4649
test: Add requests blocker fixture by @silvanocerza in #4671
test: Block requests_cache in unit tests by @silvanocerza in #4696
docs: add web retriever to api docs by @dfokina in #4699
fix: Log 'Observation' on new line by @TuanaCelik in #4704
build: Update weaviate-client by @bogdankostic in #4715
fix: Tiktoken does not support Azure gpt-35-turbo by @recrudesce in #4739
refactor!: extract elasticsearch by @ZanSara in #4668
Revert "fix: Log 'Observation' on new line (#4704)" by @bogdankostic in #4751
fix: gpt-3.5-turbo is an agent streaming model by @vblagoje in #4673
fix: recursion of death while loading PromptTemplate from yaml by @tstadel in #4691
fix: Deprecate Seq2SeqGenerator and RAGenerator by @vblagoje in #4745
Enhance the error logging in PromptTemplate variable resolution by @vblagoje in #4730
fix: Add separate query method for OpenSearchDocumentStore by @bogdankostic in #4764

New Contributors

@GitIgnoreMaybe made their first contribution in #4553
@erendabanlioglu made their first contribution in #4559
@benheckmann made their first contribution in #4623
@joekitsmith made their first contribution in #4371

Full Changelog: v1.15.1...v1.16.0
