Commit 794505c

Merge pull request #2022 from reebhub/RDoc-3248_TermVectors
Explain term vectors
2 parents 90a7971 + 963b114 commit 794505c

9 files changed: +361 -2 lines changed

Documentation/4.0/Raven.Documentation.Pages/indexes/using-term-vectors.dotnet.markdown

+22-2
@@ -1,8 +1,26 @@
 # Indexes: Term Vectors
+---
 
-[Term Vector](https://en.wikipedia.org/wiki/Vector_space_model) is a representation of a text document as a vector of identifiers that can be used for similarity searches, information filtering, information retrieval, and indexing. In RavenDB the feature like [MoreLikeThis](../client-api/session/querying/how-to-use-morelikethis) is leveraging the term vectors to accomplish its purposes.
+{NOTE: }
 
-To create an index and enable Term Vectors on a specific field we can create an index using the `AbstractIndexCreationTask`, then specify the term vectors there, or define our term vectors in the `IndexDefinition` (directly or using the `IndexDefinitionBuilder`).
+* A [Term Vector](https://en.wikipedia.org/wiki/Vector_space_model) is a representation of a text document
+  as a vector of identifiers.
+* A term vector can be used for similarity searches, information filtering, information retrieval, and indexing.
+* In RavenDB, features like [MoreLikeThis](../client-api/session/querying/how-to-use-morelikethis) leverage
+  term vectors to accomplish their goals.
+
+* In this page:
+    * Creating an index that enables term vectors
+
+{NOTE/}
+
+---
+
+{PANEL: }
+
+To create an index and enable Term Vectors on a specific field we can create an index using
+the `AbstractIndexCreationTask`, then specify the term vectors there, or define our term vectors
+in the `IndexDefinition` (directly or using the `IndexDefinitionBuilder`).
 
 {CODE-TABS}
 {CODE-TAB:csharp:AbstractIndexCreationTask term_vectors_1@Indexes\TermVectors.cs /}
@@ -13,6 +31,8 @@ The available Term Vector options are:
 
 {CODE term_vectors_3@Indexes\TermVectors.cs /}
 
+{PANEL/}
+
 ## Related articles
 
 ### Indexes
@@ -0,0 +1,76 @@
+# Indexes: Term Vectors
+---
+
+{NOTE: }
+
+* A [Term Vector](https://en.wikipedia.org/wiki/Vector_space_model) is a representation of a text document
+  as a vector of identifiers.
+  Lucene indexes can contain term vectors for documents they index.
+* Term vectors can be used for various purposes, including similarity searches, information filtering
+  and retrieval, and indexing.
+  A book's index, for example, may have term vectors enabled on the book's **subject** field, to be able
+  to use this field to search for books with similar subjects.
+* RavenDB features like [MoreLikeThis](../client-api/session/querying/how-to-use-morelikethis) leverage
+  stored term vectors to accomplish their goals.
+
+* In this page:
+    * [Creating an index and enabling Term Vectors on a field](../indexes/using-term-vectors#creating-an-index-and-enabling-term-vectors-on-a-field)
+        * [Using the API](../indexes/using-term-vectors#using-the-api)
+        * [Using Studio](../indexes/using-term-vectors#using-studio)
+
+{NOTE/}
+
+---
+
+{PANEL: Creating an index and enabling Term Vectors on a field}
+
+Indexes that include term vectors can be created and configured using the API
+or Studio.
+
+## Using the API
+
+To create an index and enable Term Vectors on a specific field, we can either:
+
+A. Create an index using the `AbstractIndexCreationTask` and specify the term vectors there.
+B. Define the term vectors in the `IndexDefinition` (directly or using the `IndexDefinitionBuilder`).
+
+{CODE-TABS}
+{CODE-TAB:csharp:AbstractIndexCreationTask term_vectors_1@Indexes\TermVectors.cs /}
+{CODE-TAB:csharp:Operation term_vectors_2@Indexes\TermVectors.cs /}
+{CODE-TABS/}
+
+Available Term Vector options include:
+
+{CODE term_vectors_3@Indexes\TermVectors.cs /}
+
+Learn which Lucene API methods and constants are available [here](https://lucene.apache.org/core/3_6_2/api/all/org/apache/lucene/document/Field.TermVector.html).
+
+## Using Studio
+
+As an example, let's use one of Studio's sample indexes, `Product/Search`, which has term vectors
+enabled on its `Name` field so that a feature like [MoreLikeThis](../client-api/session/querying/how-to-use-morelikethis)
+can use this field to select a product and find products similar to it.
+
+![Term vector enabled on index field](images/term-vector-enabled.png "Term vector enabled on index field")
+
+We can now use a query like:
+
+{CODE-BLOCK:sql}
+from index 'Product/Search'
+where morelikethis(id() = 'products/7-A')
+{CODE-BLOCK/}
+
+{PANEL/}
+
+## Related articles
+
+### Indexes
+
+- [Boosting](../indexes/boosting)
+- [Analyzers](../indexes/using-analyzers)
+- [Storing Data in Index](../indexes/storing-data-in-index)
+- [Dynamic Fields](../indexes/using-dynamic-fields)
+
+## External articles
+
+- [Lucene API](https://lucene.apache.org/core/3_6_2/api/all/org/apache/lucene/document/Field.TermVector.html)
@@ -0,0 +1,23 @@
+# Indexes: Term Vectors
+
+A [Term Vector](https://en.wikipedia.org/wiki/Vector_space_model) is a representation of a text document as a vector of identifiers that can be used for similarity searches, information filtering, information retrieval, and indexing. In RavenDB, features like [MoreLikeThis](../client-api/session/querying/how-to-use-morelikethis) leverage term vectors to accomplish their goals.
+
+To create an index and enable Term Vectors on a specific field, we can either create an index using the `AbstractIndexCreationTask` and specify the term vectors there, or define the term vectors in the `IndexDefinition` (directly or using the `IndexDefinitionBuilder`).
+
+{CODE-TABS}
+{CODE-TAB:java:AbstractIndexCreationTask term_vectors_1@Indexes\TermVectors.java /}
+{CODE-TAB:java:Operation term_vectors_2@Indexes\TermVectors.java /}
+{CODE-TABS/}
+
+The available Term Vector options are:
+
+{CODE:java term_vectors_3@Indexes\TermVectors.java /}
+
+## Related articles
+
+### Indexes
+
+- [Boosting](../indexes/boosting)
+- [Analyzers](../indexes/using-analyzers)
+- [Storing Data in Index](../indexes/storing-data-in-index)
+- [Dynamic Fields](../indexes/using-dynamic-fields)
@@ -0,0 +1,29 @@
+# Indexes: Term Vectors
+
+A [Term Vector](https://en.wikipedia.org/wiki/Vector_space_model) is a representation of a text document as a vector of identifiers that can be used for similarity searches, information filtering, information retrieval, and indexing. In RavenDB, features like [MoreLikeThis](../client-api/session/querying/how-to-use-morelikethis) leverage term vectors to accomplish their goals.
+
+To create an index and enable Term Vectors on a specific field, we can either create an index using the `AbstractIndexCreationTask` and specify the term vectors there, or define the term vectors in the `IndexDefinition` (directly or using the `IndexDefinitionBuilder`).
+
+{CODE-TABS}
+{CODE-TAB:nodejs:AbstractIndexCreationTask term_vectors_1@indexes\termVectors.js /}
+{CODE-TAB:nodejs:Operation term_vectors_2@indexes\termVectors.js /}
+{CODE-TABS/}
+
+The available Term Vector options are:
+
+| Term Vector | Description |
+| ----------- | ----------- |
+| `"No"` | Do not store term vectors |
+| `"Yes"` | Store the term vectors of each document. A term vector is a list of the document's terms and their number of occurrences in that document. |
+| `"WithPositions"` | Store the term vector + token position information |
+| `"WithOffsets"` | Store the term vector + token offset information |
+| `"WithPositionsAndOffsets"` | Store the term vector + token position and offset information |
+
+## Related articles
+
+### Indexes
+
+- [Boosting](../indexes/boosting)
+- [Analyzers](../indexes/using-analyzers)
+- [Storing Data in Index](../indexes/storing-data-in-index)
+- [Dynamic Fields](../indexes/using-dynamic-fields)
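
Reviewer note: the term vector options in the Node.js page describe what gets stored per term (occurrence counts, token positions, character offsets). The following is a minimal sketch of that data in plain JavaScript, assuming a simple regex tokenizer. `buildTermVector` is a hypothetical helper written for this illustration; it is not part of the RavenDB client and does not reproduce what a Lucene analyzer would emit.

```javascript
// Hypothetical helper for illustration only; it is not part of the RavenDB client.
// The regex tokenizer is an assumption and does not reproduce a Lucene analyzer.
function buildTermVector(text) {
    const vector = new Map();
    const tokenPattern = /[A-Za-z0-9]+/g;
    let position = 0;
    let match;

    while ((match = tokenPattern.exec(text)) !== null) {
        const term = match[0].toLowerCase();
        if (!vector.has(term)) {
            vector.set(term, { frequency: 0, positions: [], offsets: [] });
        }

        const entry = vector.get(term);
        entry.frequency += 1;            // the occurrence count described for "Yes"
        entry.positions.push(position);  // token index, as described for "WithPositions"
        entry.offsets.push({             // character range, as described for "WithOffsets"
            start: match.index,
            end: match.index + match[0].length
        });
        position += 1;
    }

    return vector;
}

const vector = buildTermVector("Raven indexes store what Raven queries need");
console.log(vector.get("raven"));
```

In this sketch the term `raven` ends up with a frequency of 2 and token positions 0 and 4. `"WithPositionsAndOffsets"` would correspond to keeping both the `positions` and `offsets` arrays.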
@@ -0,0 +1,94 @@
+using System.Linq;
+using Raven.Client.Documents;
+using Raven.Client.Documents.Indexes;
+using Raven.Client.Documents.Operations.Indexes;
+
+namespace Raven.Documentation.Samples.Indexes
+{
+    namespace Foo
+    {
+        #region term_vectors_3
+        public enum FieldTermVector
+        {
+            /// <summary>
+            /// Do not store term vectors
+            /// </summary>
+            No,
+
+            /// <summary>
+            /// Store the term vectors of each document. A term vector is a list of the document's
+            /// terms and their number of occurrences in that document.
+            /// </summary>
+            Yes,
+
+            /// <summary>
+            /// Store the term vector + token position information
+            /// </summary>
+            WithPositions,
+
+            /// <summary>
+            /// Store the term vector + token offset information
+            /// </summary>
+            WithOffsets,
+
+            /// <summary>
+            /// Store the term vector + token position and offset information
+            /// </summary>
+            WithPositionsAndOffsets
+        }
+        #endregion
+    }
+
+    public class TermVectors
+    {
+        #region term_vectors_1
+        public class BlogPosts_ByTagsAndContent : AbstractIndexCreationTask<BlogPost>
+        {
+            public BlogPosts_ByTagsAndContent()
+            {
+                Map = posts => from post in posts
+                               select new
+                               {
+                                   post.Tags,
+                                   post.Content
+                               };
+
+                Indexes.Add(x => x.Content, FieldIndexing.Search);
+                TermVectors.Add(x => x.Content, FieldTermVector.WithPositionsAndOffsets);
+            }
+        }
+        #endregion
+
+        public TermVectors()
+        {
+            using (var store = new DocumentStore())
+            {
+                #region term_vectors_2
+                IndexDefinitionBuilder<BlogPost> indexDefinitionBuilder =
+                    new IndexDefinitionBuilder<BlogPost>("BlogPosts/ByTagsAndContent")
+                    {
+                        Map = posts => from post in posts
+                                       select new
+                                       {
+                                           post.Tags,
+                                           post.Content
+                                       },
+                        Indexes =
+                        {
+                            { x => x.Content, FieldIndexing.Search }
+                        },
+                        TermVectors =
+                        {
+                            { x => x.Content, FieldTermVector.WithPositionsAndOffsets }
+                        }
+                    };
+
+                IndexDefinition indexDefinition = indexDefinitionBuilder
+                    .ToIndexDefinition(store.Conventions);
+
+                store.Maintenance.Send(new PutIndexesOperation(indexDefinition));
+                #endregion
+            }
+        }
+    }
+}
@@ -0,0 +1,74 @@
+package net.ravendb.Indexes;
+
+import net.ravendb.client.documents.DocumentStore;
+import net.ravendb.client.documents.IDocumentStore;
+import net.ravendb.client.documents.indexes.*;
+import net.ravendb.client.documents.operations.indexes.PutIndexesOperation;
+
+public class TermVectors {
+
+    public static class Foo {
+        //region term_vectors_3
+        public enum FieldTermVector {
+            /**
+             * Do not store term vectors
+             */
+            NO,
+
+            /**
+             * Store the term vectors of each document. A term vector is a list of the document's
+             * terms and their number of occurrences in that document.
+             */
+            YES,
+            /**
+             * Store the term vector + token position information
+             */
+            WITH_POSITIONS,
+            /**
+             * Store the term vector + Token offset information
+             */
+            WITH_OFFSETS,
+
+            /**
+             * Store the term vector + Token position and offset information
+             */
+            WITH_POSITIONS_AND_OFFSETS
+        }
+        //endregion
+    }
+
+
+
+    //region term_vectors_1
+    public static class BlogPosts_ByTagsAndContent extends AbstractIndexCreationTask {
+        public BlogPosts_ByTagsAndContent() {
+            map = "docs.Posts.Select(post => new { " +
+                "    Tags = post.Tags, " +
+                "    Content = post.Content " +
+                "})";
+
+            index("Content", FieldIndexing.SEARCH);
+            termVector("Content", FieldTermVector.WITH_POSITIONS_AND_OFFSETS);
+        }
+    }
+    //endregion
+
+    public TermVectors() {
+        try (IDocumentStore store = new DocumentStore()) {
+            //region term_vectors_2
+            IndexDefinitionBuilder builder = new IndexDefinitionBuilder("BlogPosts/ByTagsAndContent");
+            builder.setMap("docs.Posts.Select(post => new { " +
+                "    Tags = post.Tags, " +
+                "    Content = post.Content " +
+                "})");
+
+            builder.getIndexesStrings().put("Content", FieldIndexing.SEARCH);
+            builder.getTermVectorsStrings().put("Content", FieldTermVector.WITH_POSITIONS_AND_OFFSETS);
+
+            IndexDefinition indexDefinition = builder.toIndexDefinition(store.getConventions());
+
+            store.maintenance().send(new PutIndexesOperation(indexDefinition));
+            //endregion
+        }
+    }
+}
@@ -0,0 +1,43 @@
+import {
+    AbstractIndexCreationTask,
+    DocumentStore,
+    IndexDefinition,
+    PutIndexesOperation,
+    IndexDefinitionBuilder
+} from "ravendb";
+
+const store = new DocumentStore();
+const session = store.openSession();
+
+//region term_vectors_1
+class BlogPosts_ByTagsAndContent extends AbstractIndexCreationTask {
+    constructor() {
+        super();
+
+        this.map = `docs.Posts.Select(post => new {
+            tags = post.tags,
+            content = post.content
+        })`;
+
+        this.index("content", "Search");
+        this.termVector("content", "WithPositionsAndOffsets");
+    }
+}
+//endregion
+
+async function termVectors() {
+    //region term_vectors_2
+    const builder = new IndexDefinitionBuilder("BlogPosts/ByTagsAndContent");
+    builder.map = `docs.Posts.Select(post => new {
+        tags = post.tags,
+        content = post.content
+    })`;
+
+    builder.indexesStrings["content"] = "Search";
+    builder.termVectorsStrings["content"] = "WithPositionsAndOffsets";
+
+    const indexDefinition = builder.toIndexDefinition(store.conventions);
+
+    await store.maintenance.send(new PutIndexesOperation(indexDefinition));
+    //endregion
+}
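
Reviewer note: the pages in this commit say term vectors support similarity searches but do not show how two vectors are compared. As a rough sketch of the vector space model the pages link to, the snippet below scores two texts by the cosine similarity of their term-frequency vectors. The tokenizer and both function names are assumptions made for this illustration, and nothing here claims to be the scoring that MoreLikeThis or Lucene applies.

```javascript
// Build a term-frequency map (a simple term vector) from raw text.
// The lowercase/regex tokenizer is an assumption made for this sketch.
function termFrequencies(text) {
    const counts = new Map();
    for (const term of text.toLowerCase().match(/[a-z0-9]+/g) || []) {
        counts.set(term, (counts.get(term) || 0) + 1);
    }
    return counts;
}

// Cosine similarity between two term-frequency maps:
// dot product divided by the product of the vector lengths.
function cosineSimilarity(a, b) {
    let dot = 0;
    for (const [term, count] of a) {
        dot += count * (b.get(term) || 0);
    }
    const norm = (v) => Math.sqrt([...v.values()].reduce((sum, c) => sum + c * c, 0));
    const denominator = norm(a) * norm(b);
    return denominator === 0 ? 0 : dot / denominator;
}

const query = termFrequencies("indexing text with term vectors");
const similar = termFrequencies("term vectors help with text indexing");
const unrelated = termFrequencies("seasonal recipes for apple pie");

console.log(cosineSimilarity(query, similar));   // high: five shared terms
console.log(cosineSimilarity(query, unrelated)); // 0: no shared terms
```

Texts that share many terms score close to 1, and texts with no terms in common score 0, which is the intuition behind using stored term vectors to find similar documents.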
