Commit 9f6b5df

docs(aggregations): structured view aggregations (#87)
1 parent 64086a6 commit 9f6b5df

13 files changed

+377
-18
lines changed

README.md

+4-2
Original file line number | Diff line number | Diff line change
@@ -33,7 +33,7 @@ The benefits of db-ally can be described in terms of its four main characteristi
3333

3434
## Quickstart
3535

36-
In db-ally, developers define their use cases by implementing [**views**](https://db-ally.deepsense.ai/concepts/views) and **filters**. A list of possible filters is presented to the LLM in terms of [**IQL**](https://db-ally.deepsense.ai/concepts/iql) (Intermediate Query Language). Views are grouped and registered within a [**collection**](https://db-ally.deepsense.ai/concepts/views), which then serves as an entry point for asking questions in natural language.
36+
In db-ally, developers define their use cases by implementing [**views**](https://db-ally.deepsense.ai/concepts/views), **filters** and **aggregations**. A list of possible filters and aggregations is presented to the LLM in terms of [**IQL**](https://db-ally.deepsense.ai/concepts/iql) (Intermediate Query Language). Views are grouped and registered within a [**collection**](https://db-ally.deepsense.ai/concepts/views), which then serves as an entry point for asking questions in natural language.
3737

3838
This is a basic implementation of a db-ally view for an example HR application, which retrieves candidates from an SQL database:
3939

@@ -60,8 +60,10 @@ class CandidateView(SqlAlchemyBaseView):
6060
"""
6161
return Candidate.country == country
6262

63-
engine = create_engine('sqlite:///examples/recruiting/data/candidates.db')
63+
6464
llm = LiteLLM(model_name="gpt-3.5-turbo")
65+
engine = create_engine("sqlite:///examples/recruiting/data/candidates.db")
66+
6567
my_collection = create_collection("collection_name", llm)
6668
my_collection.add(CandidateView, lambda: CandidateView(engine))
6769

docs/about/roadmap.md

+1-1
@@ -9,7 +9,7 @@ Below you can find a list of planned features and integrations.
99

1010
## Planned Features
1111

12-
- [ ] **Support analytical queries**: support for exposing operations beyond filtering.
12+
- [x] **Support analytical queries**: support for exposing operations beyond filtering.
1313
- [x] **Few-shot prompting configuration**: allow users to configure the few-shot prompting in View definition to
1414
improve IQL generation accuracy.
1515
- [ ] **Request contextualization**: allow to provide extra context for db-ally runs, such as user asking the question.

docs/concepts/iql.md

+36-3
@@ -1,12 +1,45 @@
11
# Concept: IQL
22

3-
Intermediate Query Language (IQL) is a simple language that serves as an abstraction layer between natural language and data source-specific query syntax, such as SQL. With db-ally's [structured views](./structured_views.md), LLM utilizes IQL to express complex queries in a simplified way.
3+
Intermediate Query Language (IQL) is a simple language that serves as an abstraction layer between natural language and data source-specific query syntax, such as SQL. With db-ally's [structured views](structured_views.md), LLM utilizes IQL to express complex queries in a simplified way. IQL allows developers to model operations such as filtering and aggregation on the underlying data.
4+
5+
## Filtering
46

57
For instance, an LLM might generate an IQL query like this when asked "Find me French candidates suitable for a senior data scientist position":
68

9+
```python
10+
from_country("France") AND senior_data_scientist_position()
711
```
8-
from_country('France') AND senior_data_scientist_position()
12+
13+
The capabilities made available to the AI model via IQL differ between projects. Developers control these by defining special [views](structured_views.md). db-ally automatically exposes special methods defined in structured views, known as "filters", via IQL. For instance, the expression above suggests that the specific project contains a view that includes the `from_country` and `senior_data_scientist_position` methods (and possibly others that the LLM did not choose to use for this particular question). Additionally, the LLM can use boolean operators (`AND`, `OR`, `NOT`) to combine individual filters into more complex expressions.
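As a rough illustration of how boolean operators combine individual filters, the sketch below models each filter as a plain Python predicate. It is not db-ally's implementation: the `Candidate` dataclass, the helper functions `and_`, `or_` and `not_`, and the "at least 5 years of experience" definition of seniority are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Candidate:
    name: str
    country: str
    position: str
    years_of_experience: int


# A filter is modelled here as a predicate over a single candidate.
Filter = Callable[[Candidate], bool]


def from_country(country: str) -> Filter:
    return lambda c: c.country == country


def senior_data_scientist_position() -> Filter:
    # Assumed definition of "senior": at least 5 years of experience.
    return lambda c: c.position == "Data Scientist" and c.years_of_experience >= 5


# Hypothetical helpers standing in for the IQL operators AND, OR and NOT.
def and_(left: Filter, right: Filter) -> Filter:
    return lambda c: left(c) and right(c)


def or_(left: Filter, right: Filter) -> Filter:
    return lambda c: left(c) or right(c)


def not_(inner: Filter) -> Filter:
    return lambda c: not inner(c)


candidates = [
    Candidate("Alice", "France", "Data Scientist", 7),
    Candidate("Bruno", "France", "Data Scientist", 2),
    Candidate("Chen", "China", "Data Scientist", 9),
]

# Equivalent of: from_country("France") AND senior_data_scientist_position()
expression = and_(from_country("France"), senior_data_scientist_position())
matching = [c.name for c in candidates if expression(c)]
print(matching)
```

In db-ally itself the filters return data-source expressions (for example SQLAlchemy conditions) rather than Python predicates, but the way the operators compose is the same idea.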
14+
15+
## Aggregation
16+
17+
Similar to filtering, developers can define special methods in [structured views](structured_views.md) that perform aggregation. These methods are also exposed to the LLM via IQL. For example, an LLM might generate the following IQL query when asked "What's the average salary for each country?":
18+
19+
```python
20+
average_salary_by_country()
921
```
1022

11-
The capabilities made available to the AI model via IQL differ between projects. Developers control these by defining special [Views](structured_views.md). db-ally automatically exposes special methods defined in structured views, known as "filters", via IQL. For instance, the expression above suggests that the specific project contains a view that includes the `from_country` and `senior_data_scientist_position` methods (and possibly others that the LLM did not choose to use for this particular question). Additionally, the LLM can use Boolean operators (`and`,`or`, `not`) to combine individual filters into more complex expressions.
23+
The `average_salary_by_country` method groups candidates by country and calculates the average salary for each group.
24+
25+
The aggregation IQL call has access to the raw query, so it can perform even more complex aggregations, such as grouping by different columns or applying custom functions. We can ask db-ally to generate a candidate report with the following IQL query:
26+
27+
```python
28+
candidate_report()
29+
```
30+
31+
In this case, the `candidate_report` method is defined in a structured view, and it performs a series of aggregations and calculations to produce a report with the average salary, number of candidates, and other metrics, by country.
32+
33+
## Operation chaining
34+
35+
Some queries require both filtering and aggregation. For example, to calculate the average salary for a senior data scientist in the US, we first need to filter the data to include only US candidates who are senior data scientists, and then calculate the average salary. In this case, db-ally will first generate an IQL query to filter the data, and then another IQL query to calculate the average salary.
36+
37+
```python
38+
from_country("USA") AND senior_data_scientist_position()
39+
```
40+
41+
```python
42+
average_salary()
43+
```
1244

45+
In this case, db-ally applies the two IQL queries in sequence to build a single query plan, which is then executed on the data source.
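To make the chaining idea concrete, here is a minimal sketch using only Python's built-in `sqlite3` module. It does not use db-ally and does not reflect its internals. The table layout, the sample rows, and the way the filter is represented as a `WHERE` clause are assumptions chosen for illustration. The point is that the filter step and the aggregation step end up in one SQL statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE candidates (name TEXT, country TEXT, position TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO candidates VALUES (?, ?, ?, ?)",
    [
        ("Alice", "USA", "Senior Data Scientist", 150000.0),
        ("Bob", "USA", "Senior Data Scientist", 130000.0),
        ("Carol", "USA", "Data Engineer", 120000.0),
        ("Dan", "France", "Senior Data Scientist", 90000.0),
    ],
)

# Step 1: the filter expression, represented here as a WHERE clause with
# bound parameters (a stand-in for the filtering IQL query).
where_clause = "country = ? AND position = ?"
params = ("USA", "Senior Data Scientist")

# Step 2: the aggregation is applied on top of the same filtered selection,
# so only one statement is sent to the database.
query = f"SELECT avg(salary) AS average_salary FROM candidates WHERE {where_clause}"
average_salary = conn.execute(query, params).fetchone()[0]
print(average_salary)
```

Only the two US senior data scientists contribute to the average. The other rows are excluded by the filter before the aggregation is computed.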

docs/concepts/structured_views.md

+3-3
@@ -7,7 +7,7 @@ Structured views are a type of [view](../concepts/views.md), which provide a way
77

88
Given different natural language queries, a db-ally view will produce different responses while maintaining a consistent data structure. This consistency offers a reliable interface for integration - the code consuming responses from a particular structured view knows what data structure to expect and can utilize this knowledge when displaying or processing the data. This feature of db-ally makes it stand out in terms of reliability and stability compared to standard text-to-SQL approaches.
99

10-
Each structured view can contain one or more filters, which the LLM may decide to choose and apply to the extracted data so that it meets the criteria specified in the natural language query. Given such a query, LLM chooses which filters to use, provides arguments to the filters, and connects the filters with Boolean operators. The LLM expresses these filter combinations using a special language called [IQL](iql.md), in which the defined view filters provide a layer of abstraction between the LLM and the raw syntax used to query the data source (e.g., SQL).
10+
Each structured view can contain one or more **filters** or **aggregations**, which the LLM may choose to apply to the extracted data so that it meets the criteria specified in the natural language query. Given such a query, the LLM chooses which filters to use, provides arguments to the filters, and connects the filters with boolean operators. For aggregations, the LLM selects an appropriate aggregation method and applies it to the data. The LLM expresses these filter combinations and aggregations using a special language called [IQL](iql.md), in which the defined view filters and aggregations provide a layer of abstraction between the LLM and the raw syntax used to query the data source (e.g., SQL).
1111

1212
!!! example
1313
For instance, this is a simple [view that uses SQLAlchemy](../how-to/views/sql.md) to select data from specific columns in a SQL database. It contains a single filter, that the LLM may optionally use to control which table rows to fetch:
@@ -18,14 +18,14 @@ Each structured view can contain one or more “filters”, which the LLM may de
1818
A view for retrieving candidates from the database.
1919
"""
2020

21-
def get_select(self):
21+
def get_select(self) -> Select:
2222
"""
2323
Defines which columns to select
2424
"""
2525
return sqlalchemy.select(Candidate.id, Candidate.name, Candidate.country)
2626

2727
@decorators.view_filter()
28-
def from_country(self, country: str):
28+
def from_country(self, country: str) -> ColumnElement:
2929
"""
3030
Filter candidates from a specific country.
3131
"""

docs/index.md

+6-4
@@ -10,8 +10,8 @@ hide:
1010
</style>
1111

1212
<div align="center" markdown="span">
13-
![dbally logo](https://raw.githubusercontent.com/deepsense-ai/db-ally/mp/update-logo/docs/assets/banner-light.svg#only-light){ width="30%" }
14-
![dbally logo](https://raw.githubusercontent.com/deepsense-ai/db-ally/mp/update-logo/docs/assets/banner-dark.svg#only-dark){ width="30%" }
13+
![dbally logo](https://raw.githubusercontent.com/deepsense-ai/db-ally/main/docs/assets/banner-light.svg#only-light){ width="30%" }
14+
![dbally logo](https://raw.githubusercontent.com/deepsense-ai/db-ally/main/docs/assets/banner-dark.svg#only-dark){ width="30%" }
1515
</div>
1616

1717
<p align="center">
@@ -49,7 +49,7 @@ The benefits of db-ally can be described in terms of its four main characteristi
4949

5050
## Quickstart
5151

52-
In db-ally, developers define their use cases by implementing [**views**](https://db-ally.deepsense.ai/concepts/views) and **filters**. A list of possible filters is presented to the LLM in terms of [**IQL**](https://db-ally.deepsense.ai/concepts/iql) (Intermediate Query Language). Views are grouped and registered within a [**collection**](https://db-ally.deepsense.ai/concepts/views), which then serves as an entry point for asking questions in natural language.
52+
In db-ally, developers define their use cases by implementing [**views**](https://db-ally.deepsense.ai/concepts/views), **filters** and **aggregations**. A list of possible filters and aggregations is presented to the LLM in terms of [**IQL**](https://db-ally.deepsense.ai/concepts/iql) (Intermediate Query Language). Views are grouped and registered within a [**collection**](https://db-ally.deepsense.ai/concepts/views), which then serves as an entry point for asking questions in natural language.
5353

5454
This is a basic implementation of a db-ally view for an example HR application, which retrieves candidates from an SQL database:
5555

@@ -76,8 +76,10 @@ class CandidateView(SqlAlchemyBaseView):
7676
"""
7777
return Candidate.country == country
7878

79-
engine = create_engine('sqlite:///examples/recruiting/data/candidates.db')
79+
8080
llm = LiteLLM(model_name="gpt-3.5-turbo")
81+
engine = create_engine("sqlite:///examples/recruiting/data/candidates.db")
82+
8183
my_collection = create_collection("collection_name", llm)
8284
my_collection.add(CandidateView, lambda: CandidateView(engine))
8385

docs/quickstart/aggregations.md

+93
@@ -0,0 +1,93 @@
1+
# Quickstart: Aggregations
2+
3+
This guide is a continuation of the [Intro](./index.md) guide. It assumes that you have already set up the views and the collection. If not, please refer to the complete Part 1 code on [GitHub](https://github.com/deepsense-ai/db-ally/blob/main/examples/intro.py){:target="_blank"}.
4+
5+
In this guide, we will add aggregations to our view to calculate general metrics about the candidates.
6+
7+
## View Definition
8+
9+
To add aggregations to our [structured view](../concepts/structured_views.md), we'll define new methods. These methods will allow the LLM to perform calculations and summarize data across multiple rows. Let's add three aggregation methods to our `CandidateView`:
10+
11+
```python
12+
class CandidateView(SqlAlchemyBaseView):
13+
"""
14+
A view for retrieving candidates from the database.
15+
"""
16+
17+
def get_select(self) -> sqlalchemy.Select:
18+
"""
19+
Creates the initial SqlAlchemy select object, which will be used to build the query.
20+
"""
21+
return sqlalchemy.select(Candidate)
22+
23+
@decorators.view_aggregation()
24+
def average_years_of_experience(self) -> sqlalchemy.Select:
25+
"""
26+
Calculates the average years of experience of candidates.
27+
"""
28+
return self.select.with_only_columns(
29+
sqlalchemy.func.avg(Candidate.years_of_experience).label("average_years_of_experience")
30+
)
31+
32+
@decorators.view_aggregation()
33+
def positions_per_country(self) -> sqlalchemy.Select:
34+
"""
35+
Returns the number of candidates per position per country.
36+
"""
37+
return (
38+
self.select.with_only_columns(
39+
sqlalchemy.func.count(Candidate.position).label("number_of_positions"),
40+
Candidate.position,
41+
Candidate.country,
42+
)
43+
.group_by(Candidate.position, Candidate.country)
44+
.order_by(sqlalchemy.desc("number_of_positions"))
45+
)
46+
47+
@decorators.view_aggregation()
48+
def candidates_per_country(self) -> sqlalchemy.Select:
49+
"""
50+
Returns the number of candidates per country.
51+
"""
52+
return (
53+
self.select.with_only_columns(
54+
sqlalchemy.func.count(Candidate.id).label("number_of_candidates"),
55+
Candidate.country,
56+
)
57+
.group_by(Candidate.country)
58+
)
59+
```
60+
61+
By setting up these aggregations, you enable the LLM to calculate metrics about the average years of experience, the number of candidates per position per country, and the number of candidates per country.
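For readers less familiar with grouped aggregations, the sketch below runs roughly the kind of SQL that a method like `candidates_per_country` describes, using only Python's built-in `sqlite3` module. The table contents are made up, the exact SQL emitted by SQLAlchemy may differ, and the `ORDER BY` clause is added here only to make the printed output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE candidates (id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO candidates (name, country) VALUES (?, ?)",
    [("Alice", "France"), ("Bruno", "France"), ("Chen", "China")],
)

# Count candidates in each country: one output row per distinct country.
rows = conn.execute(
    "SELECT count(id) AS number_of_candidates, country "
    "FROM candidates GROUP BY country ORDER BY country"
).fetchall()
print(rows)
```

Each returned tuple pairs a count with its country, which mirrors the two columns selected by the aggregation method.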
62+
63+
## Query Execution
64+
65+
Having already defined and registered the view with the collection, we can now execute the query:
66+
67+
```python
68+
result = await collection.ask("What is the average years of experience of candidates?")
69+
print(result.results)
70+
```
71+
72+
This will return the average years of experience of candidates.
73+
74+
<details>
75+
<summary>The expected output</summary>
76+
```
77+
The generated SQL query is: SELECT avg(candidates.years_of_experience) AS average_years_of_experience
78+
FROM candidates
79+
80+
Number of rows: 1
81+
{'average_years_of_experience': 4.98}
82+
```
83+
</details>
84+
85+
Feel free to try other questions like: "What's the distribution of candidates across different positions and countries?" or "How many candidates are from China?".
86+
87+
## Full Example
88+
89+
Access the full example on [GitHub](https://github.com/deepsense-ai/db-ally/blob/main/examples/aggregations.py){:target="_blank"}.
90+
91+
## Next Steps
92+
93+
Explore [Quickstart Part 3: Semantic Similarity](./semantic-similarity.md) to expand on the example and learn about using semantic similarity.

docs/quickstart/index.md

+2-2
@@ -52,7 +52,7 @@ Candidate = Base.classes.candidates
5252

5353
## View Definition
5454

55-
To use db-ally, define the views you want to use. A [structured view](../concepts/structured_views.md) is a class that specifies what to select from the database and includes methods that the AI model can use to filter rows. These methods are known as "filters".
55+
To use db-ally, define the views you want to use. A [structured view](../concepts/structured_views.md) is a class that specifies what to select from the database and includes methods that the AI model can use to filter rows. These methods are known as **filters**.
5656

5757
```python
5858
from dbally import decorators, SqlAlchemyBaseView
@@ -174,4 +174,4 @@ Access the full example on [GitHub](https://github.com/deepsense-ai/db-ally/blob
174174

175175
## Next Steps
176176

177-
Explore [Quickstart Part 2: Semantic Similarity](./semantic-similarity.md) to expand on the example and learn about using semantic similarity.
177+
Explore [Quickstart Part 2: Aggregations](./aggregations.md) to expand on the example and learn about using aggregations.

docs/quickstart/multiple-views.md

+2-1
@@ -1,6 +1,6 @@
11
# Quickstart: Multiple Views
22

3-
This guide continues from [Semantic Similarity](./semantic-similarity.md) guide. It assumes that you have already set up the views and the collection. If not, please refer to the complete Part 2 code on [GitHub](https://github.com/deepsense-ai/db-ally/blob/main/examples/semantic_similarity.py){:target="_blank"}.
3+
This guide continues from the [Semantic Similarity](./semantic-similarity.md) guide. It assumes that you have already set up the views and the collection. If not, please refer to the complete Part 3 code on [GitHub](https://github.com/deepsense-ai/db-ally/blob/main/examples/semantic_similarity.py){:target="_blank"}.
44

55
The guide illustrates how to use multiple views to handle queries requiring different types of data. `CandidateView` and `JobView` are used as examples.
66

@@ -28,6 +28,7 @@ jobs_data = pd.DataFrame.from_records([
2828
{"title": "Machine Learning Engineer", "company": "Company C", "location": "Berlin", "salary": 90000},
2929
{"title": "Data Scientist", "company": "Company D", "location": "London", "salary": 110000},
3030
{"title": "Data Scientist", "company": "Company E", "location": "Warsaw", "salary": 80000},
31+
{"title": "Data Scientist", "company": "Company F", "location": "Warsaw", "salary": 100000},
3132
])
3233
```
3334

docs/quickstart/semantic-similarity.md

+2-2
@@ -1,6 +1,6 @@
11
# Quickstart: Semantic Similarity
22

3-
This guide is a continuation of the [Intro](./index.md) guide. It assumes that you have already set up the views and the collection. If not, please refer to the complete Part 1 code on [GitHub](https://github.com/deepsense-ai/db-ally/blob/main/examples/intro.py){:target="_blank"}.
3+
This guide is a continuation of the [Aggregations](./aggregations.md) guide. It assumes that you have already set up the views and the collection. If not, please refer to the complete Part 2 code on [GitHub](https://github.com/deepsense-ai/db-ally/blob/main/examples/aggregations.py){:target="_blank"}.
44

55
This guide will demonstrate how to use semantic similarity to handle queries in which the filter values are similar to those in the database, without requiring an exact match. We will use filtering by country as an example.
66

@@ -150,4 +150,4 @@ To see the full example, you can find the code on [GitHub](https://github.com/de
150150

151151
## Next Steps
152152

153-
Explore [Quickstart Part 3: Multiple Views](./multiple-views.md) to learn how to run queries with multiple views and display the results based on the view that was used to fetch the data.
153+
Explore [Quickstart Part 4: Multiple Views](./multiple-views.md) to learn how to run queries with multiple views and display the results.
