Index to search #1276
qbey
left a comment
First review, I know work is still ongoing and I did not read all the tests... :)
Search in Docs relies on an external project like "La Suite Find". We need to declare a common external network in order to connect to the search app and index our documents.
We need content in our demo documents so that we can test indexing.
Add an indexer that loops across documents in the database, formats them as JSON objects and indexes them in the remote "Find" micro-service.
On document content or permission changes, start a celery job that will call the indexation API of the app "Find". Signed-off-by: Fabre Florian <[email protected]>
Signed-off-by: Fabre Florian <[email protected]>
Signed-off-by: Fabre Florian <[email protected]>
New API view that calls the indexed documents search view (resource server) of app "Find". Signed-off-by: Fabre Florian <[email protected]>
New SEARCH_INDEXER_CLASS setting to define the indexer service class. Raise ImproperlyConfigured errors instead of RuntimeError in index service. Signed-off-by: Fabre Florian <[email protected]>
Signed-off-by: Fabre Florian <[email protected]>
Filter deleted documents from visited ones. Set default ordering to the Find API search call (-updated_at). BaseDocumentIndexer.search now returns a list of document ids instead of models. Do not call the indexer in signals when SEARCH_INDEXER_CLASS is not defined or properly configured. Signed-off-by: Fabre Florian <[email protected]>
Only documents without title and content are ignored by indexer.
Add SEARCH_INDEXER_COUNTDOWN as a configurable setting. Make the search backend creation simpler (only 'get_document_indexer' now). Allow indexation of deleted documents. Signed-off-by: Fabre Florian <[email protected]>
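As a rough illustration of how a pluggable indexer class could be resolved from a setting, here is a minimal sketch. It is not the code from this pull request: only the names SEARCH_INDEXER_CLASS and get_document_indexer come from the commit messages, while `load_indexer_class`, the function signatures and the local `ImproperlyConfigured` class (a stand-in for Django's exception of the same name) are assumptions made for the example.

```python
import importlib


class ImproperlyConfigured(Exception):
    """Local stand-in for django.core.exceptions.ImproperlyConfigured."""


def load_indexer_class(dotted_path):
    """Resolve a dotted path such as 'package.module.ClassName' (hypothetical helper)."""
    if not dotted_path:
        raise ImproperlyConfigured("SEARCH_INDEXER_CLASS must be set.")
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        raise ImproperlyConfigured(f"{dotted_path!r} is not a dotted path.")
    try:
        module = importlib.import_module(module_path)
        return getattr(module, class_name)
    except (ImportError, AttributeError) as exc:
        raise ImproperlyConfigured(f"Cannot import {dotted_path!r}.") from exc


def get_document_indexer(dotted_path):
    """Return an indexer instance, or None when the setting is unset or invalid.

    Returning None lets callers (for example signal handlers) skip indexing
    silently instead of failing. The signature is an assumption.
    """
    try:
        return load_indexer_class(dotted_path)()
    except ImproperlyConfigured:
        return None
```

With this shape, a caller only has to check `if indexer is None` before scheduling any indexing work.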
Add bin/fernetkey that generates a key for the OIDC_STORE_REFRESH_TOKEN_KEY setting. Signed-off-by: Fabre Florian <[email protected]>
Add nginx with 'nginx' alias to the 'lasuite-net' network (keycloak calls). Add celery-dev to the 'lasuite-net' network (Find API calls in jobs). Set app-dev alias as 'impress' in the 'lasuite-net' network. Add indexer configuration in common settings. Signed-off-by: Fabre Florian <[email protected]>
Rename FindDocumentIndexer as SearchIndexer. Rename FindDocumentSerializer as SearchDocumentSerializer. Rename package core.tasks.find as core.task.search. Remove logs on http errors in SearchIndexer. Factorise some code in search API view. Signed-off-by: Fabre Florian <[email protected]>
Replace the indexer_debounce_lock|release functions with indexer_throttle_acquire(). Instead of a mutex-like mechanism, simply set a flag in the cache for an amount of time that prevents any other task creation. Signed-off-by: Fabre Florian <[email protected]>
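The throttle idea described above can be sketched as follows. This is an assumption-laden illustration, not the implementation from this pull request: `TTLCache` is a hypothetical in-memory stand-in whose `add` method mimics the "set only if absent" behaviour of a cache backend, and the signature and key format of `indexer_throttle_acquire` are invented for the example.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for a cache backend (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock

    def add(self, key, value, timeout):
        """Store the key only if it is absent or expired; return True when stored."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return False
        self._store[key] = (value, now + timeout)
        return True


def indexer_throttle_acquire(cache, document_id, timeout):
    """Return True if the caller may create an indexing task for this document.

    The flag simply expires after `timeout` seconds; nothing has to release it,
    which is what distinguishes this from a lock/release pair.
    """
    return cache.add(f"doc-indexer-throttle-{document_id}", 1, timeout)
```

While the flag is present, further changes to the same document create no new task; the next change after expiry does.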
Keep the ordering by score from the Find API on search/ results; the fallback search still uses "-updated_at" ordering as default. Refactor pagination to work with a list instead of a queryset. Signed-off-by: Fabre Florian <[email protected]>
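To make the ordering concern concrete: when the search service returns ids ranked by score and the documents are then fetched from the database, the ranking has to be reapplied on a plain list. The sketch below shows one way to do that; `order_by_ranking`, its parameters and the dict-based lookup are all assumptions for illustration and do not reflect the actual view code.

```python
def order_by_ranking(ranked_ids, documents_by_id, limit=None):
    """Return documents in the order given by ranked_ids (hypothetical helper).

    Ids with no matching document (for example deleted ones) are skipped.
    `limit` optionally truncates the resulting list.
    """
    ordered = [documents_by_id[doc_id] for doc_id in ranked_ids if doc_id in documents_by_id]
    return ordered if limit is None else ordered[:limit]
```

Because the result is a list rather than a queryset, any pagination or truncation applied afterwards has to slice the list directly.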
Set SEARCH_INDEXER_CLASS=None as default configuration for dev. Rename docker network 'lasuite-net' as 'lasuite' to match with Drive configuration. Signed-off-by: Fabre Florian <[email protected]>
Add documentation for env & Find+Docs configuration in dev mode. Signed-off-by: Fabre Florian <[email protected]>
Reduce the number of Find API calls by grouping all the latest changes for indexation: send all the documents updated or deleted since the triggering of the task. Signed-off-by: Fabre Florian <[email protected]>
As we filter the empty documents from the batch during indexing, some batches can be empty and cause an error. Now they are ignored. Add --batch-size argument to the index command. Signed-off-by: Fabre Florian <[email protected]>
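The batching behaviour described in these commits can be sketched roughly as below. The helper names (`iter_batches`, `is_indexable`, `index_all`), the dict-shaped documents and the `push` callback are assumptions for illustration; the real command works on database models and calls the Find API.

```python
def iter_batches(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def is_indexable(doc):
    """A document is skipped only when both title and content are empty."""
    title = (doc.get("title") or "").strip()
    content = (doc.get("content") or "").strip()
    return bool(title or content)


def index_all(documents, batch_size, push):
    """Push non-empty batches through `push` and return the number of indexed documents."""
    count = 0
    for batch in iter_batches(documents, batch_size):
        payload = [doc for doc in batch if is_indexable(doc)]
        if not payload:
            # Every document in this batch was empty: skip the remote call.
            continue
        push(payload)
        count += len(payload)
    return count
```

Skipping the call for an empty payload is the point of the fix: filtering happens per batch, so a batch made only of empty documents would otherwise send an empty request.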
Use nb_results instead of page/page_size argument for /search API. Signed-off-by: Fabre Florian <[email protected]>
Purpose
We want to add full-text search to Docs, with semantic search in a second phase.
The goal is to enable efficient and scalable search across document content by pushing relevant data to a dedicated search backend, such as OpenSearch. The backend should be pluggable.
Proposal
Fixes #322