
Commit c6f23dc

upgrade haystack version number to 1.1.0 (#2039)
* upgrade haystack version number to 1.1.0
* copy docs to new version folder
1 parent 50317d7 commit c6f23dc

130 files changed: 15965 additions, 1 deletion


docs/v1.1.0/Makefile

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
```make
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.

SPHINXBUILD := sphinx-build
MAKEINFO := makeinfo

BUILDDIR := build
SOURCE := _src/
# SPHINXFLAGS := -a -W -n -A local=1 -d $(BUILDDIR)/doctree
SPHINXFLAGS := -A local=1 -d $(BUILDDIR)/doctree
SPHINXOPTS := $(SPHINXFLAGS) $(SOURCE)

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	$(SPHINXBUILD) -M $@ $(SPHINXOPTS) $(BUILDDIR)/$@
```

docs/v1.1.0/_src/api/Makefile

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
```make
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
```
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
```css
div.sphinxsidebarwrapper {
    position: relative;
    top: 0px;
    padding: 0;
}

div.sphinxsidebar {
    margin: 0;
    padding: 0 15px 0 15px;
    width: 210px;
    float: left;
    font-size: 1em;
    text-align: left;
}

div.sphinxsidebar .logo {
    font-size: 1.8em;
    color: #0A507A;
    font-weight: 300;
    text-align: center;
}

div.sphinxsidebar .logo img {
    vertical-align: middle;
}

div.sphinxsidebar .download a img {
    vertical-align: middle;
}
```
Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
```jinja
{# put the sidebar before the body #}
{% block sidebar1 %}{{ sidebar() }}{% endblock %}
{% block sidebar2 %}{% endblock %}

{% block extrahead %}
  <link href='https://fonts.googleapis.com/css?family=Open+Sans:300,400,700'
        rel='stylesheet' type='text/css' />
  {{ super() }}
{#- if not embedded #}
  <style type="text/css">
    table.right { float: left; margin-left: 20px; }
    table.right td { border: 1px solid #ccc; }
    {% if pagename == 'index' %}
    .related { display: none; }
    {% endif %}
  </style>
  <script>
    // intelligent scrolling of the sidebar content
    $(window).scroll(function() {
      var sb = $('.sphinxsidebarwrapper');
      var win = $(window);
      var sbh = sb.height();
      var offset = $('.sphinxsidebar').position()['top'];
      var wintop = win.scrollTop();
      var winbot = wintop + win.innerHeight();
      var curtop = sb.position()['top'];
      var curbot = curtop + sbh;
      // does sidebar fit in window?
      if (sbh < win.innerHeight()) {
        // yes: easy case -- always keep at the top
        sb.css('top', $u.min([$u.max([0, wintop - offset - 10]),
                              $(document).height() - sbh - 200]));
      } else {
        // no: only scroll if top/bottom edge of sidebar is at
        // top/bottom edge of window
        if (curtop > wintop && curbot > winbot) {
          sb.css('top', $u.max([wintop - offset - 10, 0]));
        } else if (curtop < wintop && curbot < winbot) {
          sb.css('top', $u.min([winbot - sbh - offset - 20,
                                $(document).height() - sbh - 200]));
        }
      }
    });
  </script>
{#- endif #}
{% endblock %}
```
Lines changed: 232 additions & 0 deletions
@@ -0,0 +1,232 @@
<a name="base"></a>
# Module base

<a name="base.BaseGenerator"></a>
## BaseGenerator Objects

```python
class BaseGenerator(BaseComponent)
```

Abstract class for Generators

<a name="base.BaseGenerator.predict"></a>
#### predict

```python
| @abstractmethod
| predict(query: str, documents: List[Document], top_k: Optional[int]) -> Dict
```

Abstract method to generate answers.

**Arguments**:

- `query`: Query
- `documents`: Related documents (e.g. coming from a retriever) that the answer shall be conditioned on.
- `top_k`: Number of returned answers

**Returns**:

Generated answers plus additional info in a dict
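To make the abstract interface concrete, here is a minimal hypothetical subclass. The `EchoGenerator` name and the import paths are illustrative assumptions for Haystack v1.x, not part of the library:

```python
from typing import Dict, List, Optional

# Import paths assumed for Haystack v1.x; adjust to your installed version.
from haystack.nodes.answer_generator import BaseGenerator
from haystack.schema import Document


class EchoGenerator(BaseGenerator):
    """Toy generator: echoes document contents back as 'answers'."""

    def predict(self, query: str, documents: List[Document], top_k: Optional[int] = None) -> Dict:
        # Build one answer dict per document, keeping the documented result shape.
        answers = [
            {"query": query, "answer": doc.content, "meta": {"doc_ids": [doc.id]}}
            for doc in documents[: top_k or 1]
        ]
        return {"query": query, "answers": answers}
```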
<a name="transformers"></a>
34+
# Module transformers
35+
36+
<a name="transformers.RAGenerator"></a>
37+
## RAGenerator Objects
38+
39+
```python
40+
class RAGenerator(BaseGenerator)
41+
```
42+
43+
Implementation of Facebook's Retrieval-Augmented Generator (https://arxiv.org/abs/2005.11401) based on
44+
HuggingFace's transformers (https://huggingface.co/transformers/model_doc/rag.html).
45+
46+
Instead of "finding" the answer within a document, these models **generate** the answer.
47+
In that sense, RAG follows a similar approach as GPT-3 but it comes with two huge advantages
48+
for real-world applications:
49+
a) it has a manageable model size
50+
b) the answer generation is conditioned on retrieved documents,
51+
i.e. the model can easily adjust to domain documents even after training has finished
52+
(in contrast: GPT-3 relies on the web data seen during training)
53+
54+
**Example**
55+
56+
```python
57+
| query = "who got the first nobel prize in physics?"
58+
|
59+
| # Retrieve related documents from retriever
60+
| retrieved_docs = retriever.retrieve(query=query)
61+
|
62+
| # Now generate answer from query and retrieved documents
63+
| generator.predict(
64+
| query=query,
65+
| documents=retrieved_docs,
66+
| top_k=1
67+
| )
68+
|
69+
| # Answer
70+
|
71+
| {'query': 'who got the first nobel prize in physics',
72+
| 'answers':
73+
| [{'query': 'who got the first nobel prize in physics',
74+
| 'answer': ' albert einstein',
75+
| 'meta': { 'doc_ids': [...],
76+
| 'doc_scores': [80.42758 ...],
77+
| 'doc_probabilities': [40.71379089355469, ...
78+
| 'content': ['Albert Einstein was a ...]
79+
| 'titles': ['"Albert Einstein"', ...]
80+
| }}]}
81+
```
82+
83+
<a name="transformers.RAGenerator.__init__"></a>
84+
#### \_\_init\_\_
85+
86+
```python
87+
| __init__(model_name_or_path: str = "facebook/rag-token-nq", model_version: Optional[str] = None, retriever: Optional[DensePassageRetriever] = None, generator_type: RAGeneratorType = RAGeneratorType.TOKEN, top_k: int = 2, max_length: int = 200, min_length: int = 2, num_beams: int = 2, embed_title: bool = True, prefix: Optional[str] = None, use_gpu: bool = True)
88+
```
89+
90+
Load a RAG model from Transformers along with passage_embedding_model.
91+
See https://huggingface.co/transformers/model_doc/rag.html for more details
92+
93+
**Arguments**:
94+
95+
- `model_name_or_path`: Directory of a saved model or the name of a public model e.g.
96+
'facebook/rag-token-nq', 'facebook/rag-sequence-nq'.
97+
See https://huggingface.co/models for full list of available models.
98+
- `model_version`: The version of model to use from the HuggingFace model hub. Can be tag name, branch name, or commit hash.
99+
- `retriever`: `DensePassageRetriever` used to embedded passages for the docs passed to `predict()`. This is optional and is only needed if the docs you pass don't already contain embeddings in `Document.embedding`.
100+
- `generator_type`: Which RAG generator implementation to use? RAG-TOKEN or RAG-SEQUENCE
101+
- `top_k`: Number of independently generated text to return
102+
- `max_length`: Maximum length of generated text
103+
- `min_length`: Minimum length of generated text
104+
- `num_beams`: Number of beams for beam search. 1 means no beam search.
105+
- `embed_title`: Embedded the title of passage while generating embedding
106+
- `prefix`: The prefix used by the generator's tokenizer.
107+
- `use_gpu`: Whether to use GPU (if available)
108+
109+
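As a usage sketch, assuming the v1.x import path `haystack.nodes`, the generator can be instantiated with the documented defaults made explicit:

```python
from haystack.nodes import RAGenerator  # assumed v1.x import path

generator = RAGenerator(
    model_name_or_path="facebook/rag-token-nq",
    top_k=2,           # number of independently generated answers
    max_length=200,    # maximum length of generated text
    min_length=2,      # minimum length of generated text
    num_beams=2,       # beam search width; 1 disables beam search
    embed_title=True,  # embed passage titles along with their content
    use_gpu=True,      # use GPU if one is available
)
```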
<a name="transformers.RAGenerator.predict"></a>
110+
#### predict
111+
112+
```python
113+
| predict(query: str, documents: List[Document], top_k: Optional[int] = None) -> Dict
114+
```
115+
116+
Generate the answer to the input query. The generation will be conditioned on the supplied documents.
117+
These document can for example be retrieved via the Retriever.
118+
119+
**Arguments**:
120+
121+
- `query`: Query
122+
- `documents`: Related documents (e.g. coming from a retriever) that the answer shall be conditioned on.
123+
- `top_k`: Number of returned answers
124+
125+
**Returns**:
126+
127+
Generated answers plus additional infos in a dict like this:
128+
129+
```python
130+
| {'query': 'who got the first nobel prize in physics',
131+
| 'answers':
132+
| [{'query': 'who got the first nobel prize in physics',
133+
| 'answer': ' albert einstein',
134+
| 'meta': { 'doc_ids': [...],
135+
| 'doc_scores': [80.42758 ...],
136+
| 'doc_probabilities': [40.71379089355469, ...
137+
| 'content': ['Albert Einstein was a ...]
138+
| 'titles': ['"Albert Einstein"', ...]
139+
| }}]}
140+
```
141+
142+
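A short hypothetical snippet showing how the returned dict can be consumed; `result` and `best` are illustrative names, and `generator` and `retrieved_docs` are as in the example above:

```python
result = generator.predict(query=query, documents=retrieved_docs, top_k=1)

# 'answers' is a list of dicts; each carries the answer text plus metadata
# about the documents the generation was conditioned on.
best = result["answers"][0]
print(best["answer"])              # e.g. ' albert einstein'
print(best["meta"]["doc_ids"])     # ids of the conditioning documents
print(best["meta"]["doc_scores"])  # retrieval scores of those documents
```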
<a name="transformers.Seq2SeqGenerator"></a>
143+
## Seq2SeqGenerator Objects
144+
145+
```python
146+
class Seq2SeqGenerator(BaseGenerator)
147+
```
148+
149+
A generic sequence-to-sequence generator based on HuggingFace's transformers.
150+
151+
Text generation is supported by so called auto-regressive language models like GPT2,
152+
XLNet, XLM, Bart, T5 and others. In fact, any HuggingFace language model that extends
153+
GenerationMixin can be used by Seq2SeqGenerator.
154+
155+
Moreover, as language models prepare model input in their specific encoding, each model
156+
specified with model_name_or_path parameter in this Seq2SeqGenerator should have an
157+
accompanying model input converter that takes care of prefixes, separator tokens etc.
158+
By default, we provide model input converters for a few well-known seq2seq language models (e.g. ELI5).
159+
It is the responsibility of Seq2SeqGenerator user to ensure an appropriate model input converter
160+
is either already registered or specified on a per-model basis in the Seq2SeqGenerator constructor.
161+
162+
For mode details on custom model input converters refer to _BartEli5Converter
163+
164+
165+
See https://huggingface.co/transformers/main_classes/model.html?transformers.generation_utils.GenerationMixin#transformers.generation_utils.GenerationMixin
166+
as well as https://huggingface.co/blog/how-to-generate
167+
168+
For a list of all text-generation models see https://huggingface.co/models?pipeline_tag=text-generation
169+
170+
**Example**
171+
172+
```python
173+
| query = "Why is Dothraki language important?"
174+
|
175+
| # Retrieve related documents from retriever
176+
| retrieved_docs = retriever.retrieve(query=query)
177+
|
178+
| # Now generate answer from query and retrieved documents
179+
| generator.predict(
180+
| query=query,
181+
| documents=retrieved_docs,
182+
| top_k=1
183+
| )
184+
|
185+
| # Answer
186+
|
187+
| {'answers': [" The Dothraki language is a constructed fictional language. It's important because George R.R. Martin wrote it."],
188+
| 'query': 'Why is Dothraki language important?'}
189+
|
190+
```
191+
192+
<a name="transformers.Seq2SeqGenerator.__init__"></a>
193+
#### \_\_init\_\_
194+
195+
```python
196+
| __init__(model_name_or_path: str, input_converter: Optional[Callable] = None, top_k: int = 1, max_length: int = 200, min_length: int = 2, num_beams: int = 8, use_gpu: bool = True)
197+
```
198+
199+
**Arguments**:
200+
201+
- `model_name_or_path`: a HF model name for auto-regressive language model like GPT2, XLNet, XLM, Bart, T5 etc
202+
- `input_converter`: an optional Callable to prepare model input for the underlying language model
203+
specified in model_name_or_path parameter. The required __call__ method signature for
204+
the Callable is:
205+
__call__(tokenizer: PreTrainedTokenizer, query: str, documents: List[Document],
206+
top_k: Optional[int] = None) -> BatchEncoding:
207+
- `top_k`: Number of independently generated text to return
208+
- `max_length`: Maximum length of generated text
209+
- `min_length`: Minimum length of generated text
210+
- `num_beams`: Number of beams for beam search. 1 means no beam search.
211+
- `use_gpu`: Whether to use GPU (if available)
212+
213+
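A hypothetical custom input converter matching the required `__call__` signature. The `T5QAConverter` class and its prompt format are illustrative assumptions, not a converter shipped with Haystack, and the `haystack.schema` import path is assumed for v1.x:

```python
from typing import List, Optional

from transformers import BatchEncoding, PreTrainedTokenizer

from haystack.schema import Document  # assumed v1.x import path


class T5QAConverter:
    """Toy converter: joins documents into a T5-style question/context prompt."""

    def __call__(self, tokenizer: PreTrainedTokenizer, query: str,
                 documents: List[Document], top_k: Optional[int] = None) -> BatchEncoding:
        context = " ".join(doc.content for doc in documents)
        prompt = f"question: {query} context: {context}"
        # Tokenize into a BatchEncoding, as the interface requires.
        return tokenizer([prompt], truncation=True, padding=True, return_tensors="pt")
```

It could then be passed on a per-model basis, e.g. `Seq2SeqGenerator(model_name_or_path="t5-small", input_converter=T5QAConverter())`.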
<a name="transformers.Seq2SeqGenerator.predict"></a>
214+
#### predict
215+
216+
```python
217+
| predict(query: str, documents: List[Document], top_k: Optional[int] = None) -> Dict
218+
```
219+
220+
Generate the answer to the input query. The generation will be conditioned on the supplied documents.
221+
These document can be retrieved via the Retriever or supplied directly via predict method.
222+
223+
**Arguments**:
224+
225+
- `query`: Query
226+
- `documents`: Related documents (e.g. coming from a retriever) that the answer shall be conditioned on.
227+
- `top_k`: Number of returned answers
228+
229+
**Returns**:
230+
231+
Generated answers
232+
