Commit c6f23dc (parent: 50317d7)

upgrade haystack version number to 1.1.0 (#2039)

* upgrade haystack version number to 1.1.0
* copy docs to new version folder

Showing 130 changed files with 15,965 additions and 1 deletion.
```makefile
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.

SPHINXBUILD := sphinx-build
MAKEINFO    := makeinfo

BUILDDIR    := build
SOURCE      := _src/
# SPHINXFLAGS := -a -W -n -A local=1 -d $(BUILDDIR)/doctree
SPHINXFLAGS := -A local=1 -d $(BUILDDIR)/doctree
SPHINXOPTS  := $(SPHINXFLAGS) $(SOURCE)

# Put it first so that "make" without argument is like "make help".
# Note: SPHINXOPTS already carries the source dir ($(SOURCE)).
help:
	@$(SPHINXBUILD) -M help "$(SOURCE)" "$(BUILDDIR)" $(SPHINXFLAGS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	$(SPHINXBUILD) -M $@ $(SPHINXOPTS) $(BUILDDIR)/$@
```
```makefile
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS    ?=
SPHINXBUILD   ?= sphinx-build
SOURCEDIR     = .
BUILDDIR      = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
```
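The `%: Makefile` catch-all rule above forwards any unknown target (e.g. `html`, `latexpdf`) to `sphinx-build` in "make mode". A minimal sketch of how the forwarding works, runnable without Sphinx installed by using `make -n` (dry run); the temp-directory Makefile here is a simplified stand-in written only for this demo, not part of the commit (it uses `.RECIPEPREFIX` to avoid literal tabs):

```shell
# Build a tiny Makefile that mimics the catch-all rule, then dry-run it.
demo_dir=$(mktemp -d)
cd "$demo_dir"
{
  printf 'SPHINXBUILD ?= sphinx-build\n'
  printf 'SOURCEDIR = .\n'
  printf 'BUILDDIR = _build\n'
  printf '.RECIPEPREFIX := >\n'
  printf '%%: Makefile\n'
  printf '>@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)"\n'
} > Makefile
# "html" matches the catch-all pattern rule; -n prints the expanded
# command without running sphinx-build.
make -n html
```

The dry run prints the command Sphinx would receive, e.g. `sphinx-build -M html "." "_build"`, which is exactly why `make html`, `make latexpdf`, etc. all work without being declared as explicit targets.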
```css
div.sphinxsidebarwrapper {
    position: relative;
    top: 0px;
    padding: 0;
}

div.sphinxsidebar {
    margin: 0;
    padding: 0 15px 0 15px;
    width: 210px;
    float: left;
    font-size: 1em;
    text-align: left;
}

div.sphinxsidebar .logo {
    font-size: 1.8em;
    color: #0A507A;
    font-weight: 300;
    text-align: center;
}

div.sphinxsidebar .logo img {
    vertical-align: middle;
}

div.sphinxsidebar .download a img {
    vertical-align: middle;
}
```
```jinja
{# put the sidebar before the body #}
{% block sidebar1 %}{{ sidebar() }}{% endblock %}
{% block sidebar2 %}{% endblock %}

{% block extrahead %}
  <link href='https://fonts.googleapis.com/css?family=Open+Sans:300,400,700'
        rel='stylesheet' type='text/css' />
  {{ super() }}
  {#- if not embedded #}
  <style type="text/css">
    table.right { float: left; margin-left: 20px; }
    table.right td { border: 1px solid #ccc; }
    {% if pagename == 'index' %}
    .related { display: none; }
    {% endif %}
  </style>
  <script>
    // intelligent scrolling of the sidebar content
    $(window).scroll(function() {
      var sb = $('.sphinxsidebarwrapper');
      var win = $(window);
      var sbh = sb.height();
      var offset = $('.sphinxsidebar').position()['top'];
      var wintop = win.scrollTop();
      var winbot = wintop + win.innerHeight();
      var curtop = sb.position()['top'];
      var curbot = curtop + sbh;
      // does sidebar fit in window?
      if (sbh < win.innerHeight()) {
        // yes: easy case -- always keep at the top
        sb.css('top', $u.min([$u.max([0, wintop - offset - 10]),
                              $(document).height() - sbh - 200]));
      } else {
        // no: only scroll if top/bottom edge of sidebar is at
        // top/bottom edge of window
        if (curtop > wintop && curbot > winbot) {
          sb.css('top', $u.max([wintop - offset - 10, 0]));
        } else if (curtop < wintop && curbot < winbot) {
          sb.css('top', $u.min([winbot - sbh - offset - 20,
                                $(document).height() - sbh - 200]));
        }
      }
    });
  </script>
  {#- endif #}
{% endblock %}
```
<a name="base"></a>
# Module base

<a name="base.BaseGenerator"></a>
## BaseGenerator Objects

```python
class BaseGenerator(BaseComponent)
```

Abstract class for Generators

<a name="base.BaseGenerator.predict"></a>
#### predict

```python
 | @abstractmethod
 | predict(query: str, documents: List[Document], top_k: Optional[int]) -> Dict
```

Abstract method to generate answers.

**Arguments**:

- `query`: Query
- `documents`: Related documents (e.g. coming from a retriever) that the answer shall be conditioned on.
- `top_k`: Number of returned answers

**Returns**:

Generated answers plus additional info in a dict
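The abstract-base pattern above can be sketched in plain Python without importing haystack; the `Document` stand-in and the `EchoGenerator` subclass below are hypothetical, written only to illustrate the interface shape, not real haystack classes:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Optional

# Minimal stand-in for haystack's Document, for illustration only.
@dataclass
class Document:
    content: str

class BaseGenerator(ABC):
    @abstractmethod
    def predict(self, query: str, documents: List[Document],
                top_k: Optional[int]) -> Dict:
        """Generate answers conditioned on the given documents."""

class EchoGenerator(BaseGenerator):
    # Toy subclass: "answers" by echoing the top documents' content.
    def predict(self, query, documents, top_k=1):
        answers = [{"query": query, "answer": d.content}
                   for d in documents[: (top_k or 1)]]
        return {"query": query, "answers": answers}

result = EchoGenerator().predict("q", [Document("A"), Document("B")], top_k=1)
```

Any concrete generator must implement `predict` with this signature; instantiating a subclass that omits it raises `TypeError`, which is how the abstract contract is enforced.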
<a name="transformers"></a>
# Module transformers

<a name="transformers.RAGenerator"></a>
## RAGenerator Objects

```python
class RAGenerator(BaseGenerator)
```

Implementation of Facebook's Retrieval-Augmented Generator (https://arxiv.org/abs/2005.11401) based on
HuggingFace's transformers (https://huggingface.co/transformers/model_doc/rag.html).

Instead of "finding" the answer within a document, these models **generate** the answer.
In that sense, RAG follows a similar approach to GPT-3, but it comes with two huge advantages
for real-world applications:
a) it has a manageable model size
b) the answer generation is conditioned on retrieved documents,
i.e. the model can easily adjust to domain documents even after training has finished
(in contrast: GPT-3 relies on the web data seen during training)

**Example**

```python
 | query = "who got the first nobel prize in physics?"
 |
 | # Retrieve related documents from retriever
 | retrieved_docs = retriever.retrieve(query=query)
 |
 | # Now generate answer from query and retrieved documents
 | generator.predict(
 |    query=query,
 |    documents=retrieved_docs,
 |    top_k=1
 | )
 |
 | # Answer
 |
 | {'query': 'who got the first nobel prize in physics',
 |  'answers':
 |      [{'query': 'who got the first nobel prize in physics',
 |        'answer': ' albert einstein',
 |        'meta': { 'doc_ids': [...],
 |                  'doc_scores': [80.42758 ...],
 |                  'doc_probabilities': [40.71379089355469, ...
 |                  'content': ['Albert Einstein was a ...]
 |                  'titles': ['"Albert Einstein"', ...]
 |        }}]}
```
<a name="transformers.RAGenerator.__init__"></a>
#### \_\_init\_\_

```python
 | __init__(model_name_or_path: str = "facebook/rag-token-nq", model_version: Optional[str] = None, retriever: Optional[DensePassageRetriever] = None, generator_type: RAGeneratorType = RAGeneratorType.TOKEN, top_k: int = 2, max_length: int = 200, min_length: int = 2, num_beams: int = 2, embed_title: bool = True, prefix: Optional[str] = None, use_gpu: bool = True)
```

Load a RAG model from Transformers along with passage_embedding_model.
See https://huggingface.co/transformers/model_doc/rag.html for more details.

**Arguments**:

- `model_name_or_path`: Directory of a saved model or the name of a public model, e.g.
'facebook/rag-token-nq', 'facebook/rag-sequence-nq'.
See https://huggingface.co/models for a full list of available models.
- `model_version`: The version of the model to use from the HuggingFace model hub. Can be a tag name, branch name, or commit hash.
- `retriever`: `DensePassageRetriever` used to embed passages for the docs passed to `predict()`. This is optional and is only needed if the docs you pass don't already contain embeddings in `Document.embedding`.
- `generator_type`: Which RAG generator implementation to use: RAG-TOKEN or RAG-SEQUENCE.
- `top_k`: Number of independently generated texts to return
- `max_length`: Maximum length of generated text
- `min_length`: Minimum length of generated text
- `num_beams`: Number of beams for beam search. 1 means no beam search.
- `embed_title`: Embed the title of each passage while generating its embedding
- `prefix`: The prefix used by the generator's tokenizer.
- `use_gpu`: Whether to use GPU (if available)
<a name="transformers.RAGenerator.predict"></a>
#### predict

```python
 | predict(query: str, documents: List[Document], top_k: Optional[int] = None) -> Dict
```

Generate the answer to the input query. The generation will be conditioned on the supplied documents.
These documents can, for example, be retrieved via the Retriever.

**Arguments**:

- `query`: Query
- `documents`: Related documents (e.g. coming from a retriever) that the answer shall be conditioned on.
- `top_k`: Number of returned answers

**Returns**:

Generated answers plus additional info in a dict like this:

```python
 | {'query': 'who got the first nobel prize in physics',
 |  'answers':
 |      [{'query': 'who got the first nobel prize in physics',
 |        'answer': ' albert einstein',
 |        'meta': { 'doc_ids': [...],
 |                  'doc_scores': [80.42758 ...],
 |                  'doc_probabilities': [40.71379089355469, ...
 |                  'content': ['Albert Einstein was a ...]
 |                  'titles': ['"Albert Einstein"', ...]
 |        }}]}
```
<a name="transformers.Seq2SeqGenerator"></a>
## Seq2SeqGenerator Objects

```python
class Seq2SeqGenerator(BaseGenerator)
```

A generic sequence-to-sequence generator based on HuggingFace's transformers.

Text generation is supported by so-called auto-regressive language models like GPT2,
XLNet, XLM, Bart, T5 and others. In fact, any HuggingFace language model that extends
GenerationMixin can be used by Seq2SeqGenerator.

Moreover, as language models prepare model input in their specific encoding, each model
specified with the model_name_or_path parameter in this Seq2SeqGenerator should have an
accompanying model input converter that takes care of prefixes, separator tokens etc.
By default, we provide model input converters for a few well-known seq2seq language models (e.g. ELI5).
It is the responsibility of the Seq2SeqGenerator user to ensure an appropriate model input converter
is either already registered or specified on a per-model basis in the Seq2SeqGenerator constructor.

For more details on custom model input converters, refer to _BartEli5Converter.

See https://huggingface.co/transformers/main_classes/model.html?transformers.generation_utils.GenerationMixin#transformers.generation_utils.GenerationMixin
as well as https://huggingface.co/blog/how-to-generate

For a list of all text-generation models see https://huggingface.co/models?pipeline_tag=text-generation

**Example**

```python
 | query = "Why is Dothraki language important?"
 |
 | # Retrieve related documents from retriever
 | retrieved_docs = retriever.retrieve(query=query)
 |
 | # Now generate answer from query and retrieved documents
 | generator.predict(
 |    query=query,
 |    documents=retrieved_docs,
 |    top_k=1
 | )
 |
 | # Answer
 |
 | {'answers': [" The Dothraki language is a constructed fictional language. It's important because George R.R. Martin wrote it."],
 |  'query': 'Why is Dothraki language important?'}
```
<a name="transformers.Seq2SeqGenerator.__init__"></a>
#### \_\_init\_\_

```python
 | __init__(model_name_or_path: str, input_converter: Optional[Callable] = None, top_k: int = 1, max_length: int = 200, min_length: int = 2, num_beams: int = 8, use_gpu: bool = True)
```

**Arguments**:

- `model_name_or_path`: a HF model name for an auto-regressive language model such as GPT2, XLNet, XLM, Bart, T5, etc.
- `input_converter`: an optional Callable to prepare model input for the underlying language model
specified in the model_name_or_path parameter. The required __call__ method signature for
the Callable is:
__call__(tokenizer: PreTrainedTokenizer, query: str, documents: List[Document],
top_k: Optional[int] = None) -> BatchEncoding
- `top_k`: Number of independently generated texts to return
- `max_length`: Maximum length of generated text
- `min_length`: Minimum length of generated text
- `num_beams`: Number of beams for beam search. 1 means no beam search.
- `use_gpu`: Whether to use GPU (if available)
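The shape of an `input_converter` Callable can be sketched without transformers installed; `FakeTokenizer`, `Document`, and `my_input_converter` below are hypothetical stand-ins (the real signature takes a `PreTrainedTokenizer` and returns a `BatchEncoding`), written only to show how the pieces fit together:

```python
from typing import Dict, List, Optional

# Stand-in for transformers' PreTrainedTokenizer -- illustrative only.
# A real tokenizer returns a BatchEncoding with input_ids, attention_mask, etc.
class FakeTokenizer:
    def __call__(self, texts: List[str], **kwargs) -> Dict:
        # Fake "tokenization": one id per text, its character length.
        return {"input_ids": [[len(t)] for t in texts]}

# Stand-in for haystack's Document.
class Document:
    def __init__(self, content: str):
        self.content = content

def my_input_converter(tokenizer, query: str, documents: List[Document],
                       top_k: Optional[int] = None) -> Dict:
    # Join query and document texts the way the target model expects,
    # e.g. with a model-specific separator ("<sep>" here is made up).
    text = query + " <sep> " + " ".join(d.content for d in documents)
    return tokenizer([text])

batch = my_input_converter(FakeTokenizer(), "why?", [Document("because")])
```

A converter like this would be passed as `Seq2SeqGenerator(model_name_or_path=..., input_converter=my_input_converter)`; its job is purely to encode the query/document pair into whatever prompt format the chosen model was trained on.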
<a name="transformers.Seq2SeqGenerator.predict"></a>
#### predict

```python
 | predict(query: str, documents: List[Document], top_k: Optional[int] = None) -> Dict
```

Generate the answer to the input query. The generation will be conditioned on the supplied documents.
These documents can be retrieved via the Retriever or supplied directly via the predict method.

**Arguments**:

- `query`: Query
- `documents`: Related documents (e.g. coming from a retriever) that the answer shall be conditioned on.
- `top_k`: Number of returned answers

**Returns**:

Generated answers