
refactor(benchmarks): evaluation pipeline #71

Merged: 38 commits, merged Aug 6, 2024


@micpst (Collaborator) commented Jul 10, 2024

This PR restructures the benchmarks folder, integrates the benchmarking pipeline with Hugging Face, and adds new metrics for IQL evaluation: accuracy, precision, and recall for decision making, plus IQL generation, parseability, and correctness rates.
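To illustrate what these metrics measure, here is a minimal Python sketch. It is not the benchmark code added in this PR: the `EvalExample` record, its field names, the choice of denominators, and the use of exact string matching for correctness are all hypothetical assumptions made for the example. "Decision making" is treated here as a binary choice of whether a question should be answered with IQL at all.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class EvalExample:
    """One benchmark case (hypothetical schema, for illustration only)."""

    should_generate: bool  # ground truth: should the question be answered with IQL?
    did_generate: bool  # the model's decision
    generated_iql: Optional[str] = None  # IQL produced by the model, if any
    reference_iql: Optional[str] = None  # expected IQL, if any
    parsed_ok: bool = False  # whether the generated IQL parsed successfully


def _ratio(num: int, den: int) -> float:
    # Return 0.0 for empty denominators instead of raising ZeroDivisionError.
    return num / den if den else 0.0


def decision_metrics(examples: Sequence[EvalExample]) -> Dict[str, float]:
    """Accuracy, precision and recall of the binary 'generate IQL?' decision."""
    tp = sum(e.should_generate and e.did_generate for e in examples)
    fp = sum(not e.should_generate and e.did_generate for e in examples)
    fn = sum(e.should_generate and not e.did_generate for e in examples)
    tn = sum(not e.should_generate and not e.did_generate for e in examples)
    return {
        "accuracy": _ratio(tp + tn, len(examples)),
        "precision": _ratio(tp, tp + fp),
        "recall": _ratio(tp, tp + fn),
    }


def iql_rates(examples: Sequence[EvalExample]) -> Dict[str, float]:
    """Generation rate over all cases; parseability and correctness over generated IQL."""
    generated = [e for e in examples if e.generated_iql is not None]
    parsed = [e for e in generated if e.parsed_ok]
    # Assumption: exact string match stands in for a real semantic comparison.
    correct = [e for e in parsed if e.generated_iql == e.reference_iql]
    return {
        "generation_rate": _ratio(len(generated), len(examples)),
        "parseability_rate": _ratio(len(parsed), len(generated)),
        "correctness_rate": _ratio(len(correct), len(generated)),
    }
```

Normalizing parseability and correctness by the generated subset (rather than by all examples) is a design choice in this sketch: it keeps those rates focused on IQL quality, independent of how often the decision step chose to generate IQL.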

@micpst added the refactor label (Code change that neither fixes a bug nor adds a feature) on Jul 10, 2024
@micpst self-assigned this on Jul 10, 2024

Trivy scanning results.


github-actions bot commented Jul 10, 2024


Code Coverage Summary

Filename                                                  Stmts    Miss  Cover    Missing
------------------------------------------------------  -------  ------  -------  ---------------------------------------------------------------------
dbally/_main.py                                              13       1  92.31%   10
dbally/_types.py                                              8       1  87.50%   24
dbally/exceptions.py                                          1       0  100.00%
dbally/assistants/base.py                                    24       0  100.00%
dbally/assistants/openai.py                                  59       2  96.61%   59-76
dbally/audit/event_tracker.py                                36       8  77.78%   38-40, 53, 64, 74, 91, 97
dbally/audit/events.py                                       34       0  100.00%
dbally/audit/spans.py                                         7       0  100.00%
dbally/audit/event_handlers/base.py                          15       0  100.00%
dbally/audit/event_handlers/buffer_event_handler.py           8       0  100.00%
dbally/audit/event_handlers/cli_event_handler.py             56      35  37.50%   11-13, 47-55, 65-66, 79-98, 120-127, 138-145
dbally/audit/event_handlers/langsmith_event_handler.py       29      25  13.79%   6-106
dbally/audit/event_handlers/otel_event_handler.py            74      27  63.51%   19, 123, 126, 138-139, 159-179, 191-199, 209-219, 222-223
dbally/collection/collection.py                             126       3  97.62%   136, 153, 325
dbally/collection/exceptions.py                              13       0  100.00%
dbally/collection/results.py                                 14       0  100.00%
dbally/embeddings/base.py                                     5       0  100.00%
dbally/embeddings/exceptions.py                              15       6  60.00%   10-11, 20, 29-30, 39
dbally/embeddings/litellm.py                                 28      12  57.14%   7-8, 44, 68-84
dbally/gradio/gradio_interface.py                           111     111  0.00%    1-301
dbally/iql/_exceptions.py                                    49       1  97.96%   74
dbally/iql/_processor.py                                     84       5  94.05%   20, 75, 81, 87, 102
dbally/iql/_query.py                                         17       1  94.12%   8
dbally/iql/_type_validators.py                               38       2  94.74%   24, 28
dbally/iql/syntax.py                                         36       9  75.00%   6-9, 27, 36, 60, 63-66
dbally/iql_generator/iql_generator.py                        31       2  93.55%   89-90
dbally/iql_generator/prompt.py                               16       1  93.75%   33
dbally/llms/base.py                                          28       1  96.43%   34
dbally/llms/litellm.py                                       24      10  58.33%   8-9, 48-54, 61, 78
dbally/llms/local.py                                         18      18  0.00%    1-60
dbally/llms/clients/base.py                                  23       2  91.30%   46-47
dbally/llms/clients/exceptions.py                            15       6  60.00%   10-11, 20, 29-30, 39
dbally/llms/clients/litellm.py                               44      21  52.27%   8-9, 65-71, 97-120
dbally/llms/clients/local.py                                 33      33  0.00%    1-95
dbally/nl_responder/nl_responder.py                          24       4  83.33%   74-85
dbally/nl_responder/prompts.py                               19       4  78.95%   64-67
dbally/prompt/elements.py                                    25       1  96.00%   29
dbally/prompt/template.py                                    65       7  89.23%   33, 41, 49, 110, 127, 153, 205
dbally/similarity/chroma_store.py                            37       0  100.00%
dbally/similarity/elastic_vector_search.py                   19      16  15.79%   5-102
dbally/similarity/elasticsearch_store.py                     22      19  13.64%   5-107
dbally/similarity/faiss_store.py                             38      35  7.90%    5-103
dbally/similarity/fetcher.py                                  5       0  100.00%
dbally/similarity/index.py                                   26       0  100.00%
dbally/similarity/sqlalchemy_base.py                         44      19  56.82%   35-37, 46, 68, 77, 86-89, 99-105, 123-126
dbally/similarity/store.py                                    7       0  100.00%
dbally/view_selection/base.py                                 7       0  100.00%
dbally/view_selection/llm_view_selector.py                   17       0  100.00%
dbally/view_selection/prompt.py                               9       0  100.00%
dbally/view_selection/random_view_selector.py                10      10  0.00%    1-36
dbally/views/base.py                                         16       1  93.75%   52
dbally/views/decorators.py                                    6       0  100.00%
dbally/views/exceptions.py                                    8       4  50.00%   23-26
dbally/views/exposed_functions.py                            33       1  96.97%   24
dbally/views/methods_base.py                                 34       2  94.12%   75, 83
dbally/views/pandas_base.py                                  33       1  96.97%   64
dbally/views/sqlalchemy_base.py                              37       7  81.08%   48, 63-65, 83-87
dbally/views/structured.py                                   46       4  91.30%   80-87
dbally/views/freeform/text2sql/config.py                     21       1  95.24%   47
dbally/views/freeform/text2sql/exceptions.py                  7       3  57.14%   12-14
dbally/views/freeform/text2sql/prompt.py                     11       0  100.00%
dbally/views/freeform/text2sql/view.py                       93      22  76.34%   49, 52, 55, 58, 61, 73, 151, 155-158, 161, 191, 206-207, 215, 228-233
dbally_cli/main.py                                            5       5  0.00%    1-13
dbally_cli/text2sql.py                                       94      94  0.00%    1-248
dbally_codegen/autodiscovery.py                             122      18  85.25%   54-57, 241-243, 264-278, 281-284, 353-354, 450-455
dbally_codegen/generator.py                                 175       7  96.00%   81, 91, 314, 342, 360, 374, 420
TOTAL                                                      2247     628  72.05%

Diff against main

Filename                                 Stmts    Miss  Cover
-------------------------------------  -------  ------  -------
dbally/iql_generator/iql_generator.py       +4      +2  -6.45%
dbally/views/exceptions.py                  +8      +4  +50.00%
dbally/views/structured.py                  +8      +4  -8.70%
TOTAL                                      +20     +10  -0.20%

Results for commit: cb70ee9

Minimum allowed coverage is 60%

♻️ This comment has been updated with latest results

@micpst force-pushed the mp/refactor-benchmarks branch from 1731da4 to db2207c on July 10, 2024 08:32
@micpst marked this pull request as ready for review on July 24, 2024 08:56
@mhordynski (Member) left a comment

impressive work, thanks @micpst 🥇

@micpst merged commit cd2cece into main on Aug 6, 2024 (3 checks passed)
@micpst deleted the mp/refactor-benchmarks branch on August 6, 2024 09:25
Labels
refactor Code change that neither fixes a bug nor adds a feature
Projects
Status: Done

2 participants