RFC with enhancements to MultimodalQnA #208

Merged (38 commits) on Oct 17, 2024
Commits (38):
| Commit | Message | Author | Date |
|---|---|---|---|
| ebd4e93 | Initial add of MM RAG enhancements RFC | dmsuehir | Oct 3, 2024 |
| a2b9b34 | Formatting | dmsuehir | Oct 3, 2024 |
| 23d554a | Minor changes to design section | dmsuehir | Oct 3, 2024 |
| 5fe0eec | Initial add of MM RAG enhancements RFC | dmsuehir | Oct 3, 2024 |
| 1f2026d | Formatting | dmsuehir | Oct 3, 2024 |
| c64e3c6 | Minor changes to design section | dmsuehir | Oct 3, 2024 |
| 144b6a8 | Merge branch 'dina/mm-rfc' of github.com:dmsuehir/opea-project-docs i… | dmsuehir | Oct 3, 2024 |
| 421846a | Merge branch 'opea-project:main' into dina/mm-rfc | dmsuehir | Oct 3, 2024 |
| 1f9c64c | Merge branch 'main' of github.com:dmsuehir/opea-project-docs into din… | dmsuehir | Oct 3, 2024 |
| dd96c45 | Merge branch 'dina/mm-rfc' of github.com:dmsuehir/opea-project-docs i… | dmsuehir | Oct 3, 2024 |
| 67da09c | Added use case descriptions | mhbuehler | Oct 4, 2024 |
| 9de00ef | Add WIP table of dataprep endpoints | dmsuehir | Oct 7, 2024 |
| b092b78 | Added use case descriptions | mhbuehler | Oct 4, 2024 |
| 8c310a7 | Add WIP table of dataprep endpoints | dmsuehir | Oct 7, 2024 |
| c53fce4 | Merge branch 'dina/mm-rfc' of github.com:dmsuehir/opea-project-docs i… | mhbuehler | Oct 7, 2024 |
| 34833ae | Update to combine image/audio dataprep with existing endpoints and ad… | dmsuehir | Oct 9, 2024 |
| 23f6723 | Merge branch 'main' of github.com:dmsuehir/opea-project-docs into din… | dmsuehir | Oct 9, 2024 |
| 07dc126 | Adds UI section & mockups | mhbuehler | Oct 9, 2024 |
| 051eca5 | Add diagram and provide more detail for the gateway/embedding service… | dmsuehir | Oct 9, 2024 |
| 546620b | Improved audio ingestion UI description & mockup | mhbuehler | Oct 9, 2024 |
| 1b6dc36 | Update dataprep endpoint paths | dmsuehir | Oct 9, 2024 |
| 288cf8d | Filled out the compatiblity and miscellaneous sections and updated di… | dmsuehir | Oct 10, 2024 |
| 1010066 | Merge branch 'opea-project:main' into dina/mm-rfc | dmsuehir | Oct 10, 2024 |
| 60516cd | Merge branch 'main' of github.com:dmsuehir/opea-project-docs into din… | dmsuehir | Oct 10, 2024 |
| 0d8a423 | Merge branch 'dina/mm-rfc' of github.com:dmsuehir/opea-project-docs i… | dmsuehir | Oct 10, 2024 |
| 93b5ed8 | Formatting | dmsuehir | Oct 10, 2024 |
| c11569f | Added TTS to the diagram and RFC for spoken responses | dmsuehir | Oct 11, 2024 |
| 434ec3c | Merge branch 'main' of github.com:dmsuehir/opea-project-docs into din… | dmsuehir | Oct 11, 2024 |
| 32eea00 | Improved language and added UI alternative | mhbuehler | Oct 11, 2024 |
| 7c692f4 | Merge pull request #1 from dmsuehir/melanie/rfc_edit | dmsuehir | Oct 11, 2024 |
| f98babb | Rename file | dmsuehir | Oct 11, 2024 |
| ceeb954 | PDF processing (#2) | mhbuehler | Oct 14, 2024 |
| 1ea893e | Merge branch 'main' of github.com:dmsuehir/opea-project-docs into din… | dmsuehir | Oct 14, 2024 |
| 5bfc86d | Update notes about compatibility (#3) | dmsuehir | Oct 14, 2024 |
| 3eac196 | Added Other Enhancements Section (#4) | okhleif-IL | Oct 14, 2024 |
| 9e15e65 | Merge branch 'main' of github.com:dmsuehir/opea-project-docs into din… | dmsuehir | Oct 15, 2024 |
| 61bccd7 | Minor updates and formatting (#5) | dmsuehir | Oct 16, 2024 |
| f23e413 | Updated video UI image (#6) | mhbuehler | Oct 16, 2024 |
Formatting
Signed-off-by: Dina Suehiro Jones
dmsuehir committed Oct 10, 2024
commit 93b5ed8b6f1bee69fb067bd71f1dbb371e806308
@@ -146,9 +146,12 @@ We list each proposed change in detail below and then provide mockups of the new
## Alternatives Considered

The following alternatives can be considered:
* Instead of having the embedding microservice use the ASR microservice, it could directly use the whisper model
(similar to how the multimodal data prep uses the whisper model to transcribe video audio). Using the whisper model
directly instead of going through ASR would reduce the number of running containers/services.
* Using the ASR microservice would add 2 more containers (`opea/asr` and `opea/whisper`/`opea/whisper-gaudi`)
to the `compose.yaml` file, and when using Gaudi, the whisper service container would use 1 HPU. Instead of having the
embedding microservice use ASR, it could directly use the whisper model (similar to how the multimodal data prep uses
the whisper model to transcribe video audio). However, running the whisper model directly in a container on CPU
(like the embedding service or data prep) means that we would not get the performance benefits of Gaudi when
converting speech to text with the whisper model.
* In data prep, we could have separate endpoints for different types of media. For example, instead of having
`/v1/ingest_with_text`, we could break that out into separate `/v1/videos_with_transcript` and `/v1/images_with_text`
endpoints.
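
To make the last alternative more concrete, here is a minimal Python sketch of the branching that a single combined
`/v1/ingest_with_text` endpoint would need. The helper name `pair_media_with_text`, the supported extensions, and the
rule that each media file is paired with the text file sharing its base name are all assumptions for illustration, not
the data prep service's actual implementation.

```python
from pathlib import Path

# Hypothetical extension sets, chosen for illustration only (not taken from the OPEA code).
VIDEO_EXTENSIONS = {".mp4"}
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif"}
TEXT_EXTENSIONS = {".vtt", ".txt"}


def pair_media_with_text(filenames: list[str]) -> dict[str, list[tuple[str, str]]]:
    """Group uploaded files the way a combined ingest-with-text endpoint might.

    Each media file is matched to the text file with the same stem
    (e.g. "clip.mp4" + "clip.vtt") and placed in a per-media-type bucket.
    """
    # Index text files (transcripts or captions) by their base name.
    texts = {Path(f).stem: f for f in filenames if Path(f).suffix.lower() in TEXT_EXTENSIONS}
    routed: dict[str, list[tuple[str, str]]] = {"videos": [], "images": []}
    for name in filenames:
        path = Path(name)
        ext = path.suffix.lower()
        if ext in TEXT_EXTENSIONS:
            continue  # text files are consumed via the lookup above
        if ext in VIDEO_EXTENSIONS:
            bucket = "videos"
        elif ext in IMAGE_EXTENSIONS:
            bucket = "images"
        else:
            raise ValueError(f"Unsupported media type: {name}")
        if path.stem not in texts:
            raise ValueError(f"No matching text file for {name}")
        routed[bucket].append((name, texts[path.stem]))
    return routed
```

With separate `/v1/videos_with_transcript` and `/v1/images_with_text` endpoints, each handler would receive only one of
these buckets, so the per-extension branching would move out of the handler and into the endpoint routing.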
@@ -174,12 +177,7 @@ List other information user and developer may care about, such as:
- TODO List or staging plan.
-->

Note that adding [ASR](https://github.com/opea-project/GenAIComps/tree/main/comps/asr/whisper) introduces another
microservice to MultimodalQnA. The `compose.yaml` files will need to start 2 more containers (`opea/asr` and
`opea/whisper`), and when using Gaudi, the whisper service container will use 1 HPU. If this is deemed too expensive,
the embedding service can use the whisper model directly (without ASR); however, this may mean that speech-to-text
transcription runs on CPU, since the embedding service in the Gaudi example currently runs on CPU.

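
The compatibility trade-off above can be summarized as a small decision helper. This is a hypothetical Python sketch:
`plan_speech_to_text` and `SttPlan` are illustrative names, the container names come from this RFC, and the assumption
that a non-Gaudi deployment uses no HPUs is ours rather than something stated in the compose files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SttPlan:
    """Hypothetical summary of where speech-to-text runs and what it adds to compose.yaml."""

    backend: str
    extra_containers: tuple[str, ...]
    hpus: int
    runs_on: str


def plan_speech_to_text(use_asr_service: bool, gaudi: bool) -> SttPlan:
    """Return the deployment cost of each speech-to-text option discussed in this RFC."""
    if use_asr_service:
        # ASR path: two extra containers; on Gaudi the whisper container takes 1 HPU.
        whisper_image = "opea/whisper-gaudi" if gaudi else "opea/whisper"
        return SttPlan(
            backend="asr-microservice",
            extra_containers=("opea/asr", whisper_image),
            hpus=1 if gaudi else 0,
            runs_on="HPU" if gaudi else "CPU",
        )
    # Direct path: the embedding service loads whisper itself, so no new containers,
    # but transcription runs where the embedding service runs (CPU in the Gaudi example).
    return SttPlan(backend="whisper-in-embedding", extra_containers=(), hpus=0, runs_on="CPU")
```

For example, `plan_speech_to_text(use_asr_service=True, gaudi=True)` reports the two extra containers and 1 HPU, while
`use_asr_service=False` adds no containers but keeps transcription on CPU.
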
### Development Phases

We have planned the following development phases based on the priority of the features and their development effort: