At some point during a project, whether personal or professional, you have likely needed to use SQL to interact with a relational database from your application code. If the tables are ones your team owns, you may have used an object-relational mapper (ORM) such as SQLAlchemy, the Django ORM, Advanced Alchemy, or Piccolo. However, if the operations are primarily read-only (for example, reading and presenting information from tables that are updated externally by another process), integrating an ORM either isn't feasible or adds more complexity than it's worth. In that case, options are fairly limited outside of writing raw SQL queries in code, which introduces a different kind of complexity around sanitizing and validating inputs, ensuring proper syntax, and all the other things engineers (likely) don't want to expend energy on.
While LLMs are fairly adept at building queries given the quantity of SQL on the internet, they still require the table structure as context, whether via DDL, a verbal description, or an external tool that enables table metadata discovery. Additionally, when making updates, coding agents need to ingest the raw query strings and may make changes that go untested.
This is where pysqlscribe comes in: a query-building library that lets you construct SQL queries using Python objects. Its key features:
- Dialect Support: Currently supports `mysql`, `postgres`, and `oracle`. More dialects can be added by subclassing the `Query` class and registering it with the `QueryRegistry`.
- Dependency Free: No external dependencies outside of the Python standard library.
- Multiple APIs: Offers multiple APIs for building queries, including a `Query` class, a `Table` class, and a `Schema` class.
- DDL Parser/Loader: Can parse DDL files to create `Table` objects, facilitating integration with existing database schema definitions.
- Safe by default: All identifiers are escaped by default to prevent SQL injection attacks, with the option to disable this behavior if desired.
To install, you can simply run:
pip install pysqlscribe

pysqlscribe currently offers several APIs for building queries.
A `Query` object can be constructed using the `QueryRegistry`'s `get_builder` method if you supply a valid dialect (e.g., "mysql", "postgres", "oracle"). For example, with "mysql":
from pysqlscribe.query import QueryRegistry
query_builder = QueryRegistry.get_builder("mysql")
query = query_builder.select("test_column", "another_test_column").from_("test_table").build()

Alternatively, you can create the corresponding `Query` class associated with the dialect directly:
from pysqlscribe.query import MySQLQuery
query_builder = MySQLQuery()
query = query_builder.select("test_column", "another_test_column").from_("test_table").build()

In both cases, the output is:
SELECT `test_column`,`another_test_column` FROM `test_table`

Furthermore, if there are any dialects that we currently don't support, you can create your own by subclassing `Query` and registering it with the `QueryRegistry`:
from pysqlscribe.query import QueryRegistry, Query
@QueryRegistry.register("custom")
class CustomQuery(Query):
    ...
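Once registered, a builder for the custom dialect can be retrieved the same way as the built-in ones. A minimal sketch (the `CustomQuery` subclass above would still need to fill in its dialect-specific details for the generated SQL to be meaningful):

from pysqlscribe.query import QueryRegistry

custom_builder = QueryRegistry.get_builder("custom")  # returns the builder registered under "custom"
query = custom_builder.select("test_column").from_("test_table").build()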
An alternative method for building queries is through the `Table` object:

from pysqlscribe.table import MySQLTable
table = MySQLTable("test_table", "test_column", "another_test_column")
query = table.select("test_column").build()

Output:
SELECT `test_column` FROM `test_table`

A schema for the table can also be provided as a keyword argument, after the columns:
from pysqlscribe.table import MySQLTable
table = MySQLTable("test_table", "test_column", "another_test_column", schema="test_schema")
query = table.select("test_column").build()

Output:
SELECT `test_column` FROM `test_schema.test_table`

`Table` also offers a `create` method in the event you've added a new dialect which doesn't have an associated `Table` implementation, or if you need to change it for different environments (e.g., sqlite for local development, mysql/postgres/oracle/etc. for deployment):
from pysqlscribe.table import Table
new_dialect_table_class = Table.create("new-dialect")  # assuming you've registered "new-dialect" with the `QueryRegistry`
table = new_dialect_table_class("test_table", "test_column", "another_test_column")

You can overwrite the original columns supplied to a `Table` as well, which will delete the old attributes and set new ones:
from pysqlscribe.table import MySQLTable
table = MySQLTable("test_table", "test_column", "another_test_column")
table.test_column # valid
table.fields = ['new_test_column']
table.select("new_test_column")
table.new_test_column # now valid - but `table.test_column` is not anymore

Additionally, you can reference the Column attributes of a Table object when constructing queries. For example, in a WHERE clause:
from pysqlscribe.table import PostgresTable
table = PostgresTable("employee", "first_name", "last_name", "salary", "location")
table.select("first_name", "last_name", "location").where(table.salary > 1000).build()Output:
SELECT "first_name","last_name","location" FROM "employee" WHERE salary > 1000
and in a JOIN:
from pysqlscribe.table import PostgresTable
employee_table = PostgresTable(
"employee", "first_name", "last_name", "dept", "payroll_id"
)
payroll_table = PostgresTable("payroll", "id", "salary", "category")
query = (
employee_table.select(
employee_table.first_name, employee_table.last_name, employee_table.dept
)
.join(payroll_table, "inner", payroll_table.id == employee_table.payroll_id)
.build()
)

Output:
SELECT "first_name","last_name","dept" FROM "employee" INNER JOIN "payroll" ON payroll.id = employee.payroll_id
For associating multiple Tables with a single schema, you can use the Schema:
from pysqlscribe.schema import Schema
schema = Schema("test_schema", tables=["test_table", "another_test_table"], dialect="postgres")
schema.tables # a list of two `Table` objects

This is functionally equivalent to:
from pysqlscribe.table import PostgresTable
table = PostgresTable("test_table", schema="test_schema")
another_table = PostgresTable("another_test_table", schema="test_schema")

Instead of supplying a dialect directly to `Schema`, you can also set the environment variable PYSQLSCRIBE_BUILDER_DIALECT:
export PYSQLSCRIBE_BUILDER_DIALECT='postgres'

from pysqlscribe.schema import Schema
schema = Schema("test_schema", tables=["test_table", "another_test_table"])
schema.tables # a list of two `PostgresTable` objects

Alternatively, if you already have existing `Table` objects you want to associate with the schema, you can supply them directly (in this case, dialect is not needed):
from pysqlscribe.schema import Schema
from pysqlscribe.table import PostgresTable
table = PostgresTable("test_table")
another_table = PostgresTable("another_test_table")
schema = Schema("test_schema", [table, another_table])Schema also has each table set as an attribute, so in the example above, you can do the following:
schema.test_table # will return the supplied table object with the name `"test_table"`
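Since these attributes are ordinary `Table` objects, they can be used to build queries directly. Continuing the example above (a small sketch; the exact escaping in the output depends on the dialect):

query = schema.test_table.select().build()  # expected to build a SELECT * against the schema-qualified table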
Arithmetic operations can be performed on columns, both on Column objects and scalar values:

from pysqlscribe.table import MySQLTable
table = MySQLTable("employees", "salary", "bonus", "lti")
query = table.select(
(table.salary + table.bonus + table.lti).as_("total_compensation")
).build()

Output:
SELECT employees.salary + employees.bonus + employees.lti AS total_compensation FROM `employees`

from pysqlscribe.table import MySQLTable
table = MySQLTable("employees", "salary", "bonus", "lti")
query = table.select((table.salary * 0.75).as_("salary_after_taxes")).build()

Output:
SELECT employees.salary * 0.75 AS salary_after_taxes FROM `employees`

Membership operations such as IN and NOT IN are supported:
from pysqlscribe.table import MySQLTable
table = MySQLTable("employees", "salary", "bonus", "department_id")
query = table.select().where(table.department_id.in_([1, 2, 3])).build()

Output:
SELECT * FROM `employees` WHERE department_id IN (1,2,3)
For computing aggregations (e.g., MAX, AVG, COUNT) or performing scalar operations (e.g., ABS, SQRT, UPPER), we have functions available in the aggregate_functions and scalar_functions modules which accept both strings and columns:
from pysqlscribe.table import PostgresTable
from pysqlscribe.aggregate_functions import max_
from pysqlscribe.scalar_functions import upper
table = PostgresTable(
"employee", "first_name", "last_name", "store_location", "salary"
)
query = (
table.select(upper(table.store_location), max_(table.salary))
.group_by(table.store_location)
.build()
)
# Equivalently:
query_with_strs = (
table.select(upper("store_location"), max_("salary"))
.group_by("store_location")
.build()
)

Output:
SELECT UPPER(store_location),MAX(salary) FROM "employee" GROUP BY "store_location"
You can combine queries using the union, intersect, and except methods, providing either another Query object or a string:
from pysqlscribe.query import QueryRegistry
query_builder = QueryRegistry.get_builder("mysql")
another_query_builder = QueryRegistry.get_builder("mysql")
query = (
query_builder.select("test_column", "another_test_column")
.from_("test_table")
.union(
another_query_builder.select("test_column", "another_test_column")
.from_("another_test_table")
)
.build()
)

Output:
SELECT `test_column`,`another_test_column` FROM `test_table` UNION SELECT `test_column`,`another_test_column` FROM `another_test_table`

To perform the ALL variant of each combination operation (e.g., UNION ALL), supply the all_ argument:
from pysqlscribe.query import QueryRegistry
query_builder = QueryRegistry.get_builder("mysql")
another_query_builder = QueryRegistry.get_builder("mysql")
query = (
query_builder.select("test_column", "another_test_column")
.from_("test_table")
.union(
another_query_builder.select("test_column", "another_test_column")
.from_("another_test_table"), all_=True
)
.build()
)

Output:
SELECT `test_column`,`another_test_column` FROM `test_table` UNION ALL SELECT `test_column`,`another_test_column` FROM `another_test_table`
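The intersect and except methods mentioned above follow the same chaining pattern as union. As a sketch (I haven't verified the exact signatures beyond the method names given earlier):

from pysqlscribe.query import QueryRegistry

query_builder = QueryRegistry.get_builder("postgres")
another_query_builder = QueryRegistry.get_builder("postgres")

# same structure as the union example, but producing an INTERSECT of the two result sets
query = (
    query_builder.select("test_column")
    .from_("test_table")
    .intersect(another_query_builder.select("test_column").from_("another_test_table"))
    .build()
)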
For aliasing tables and columns, you can use the as_ method on the Table or Column objects:

from pysqlscribe.table import PostgresTable
employee_table = PostgresTable(
"employee", "first_name", "last_name", "dept", "payroll_id"
)
query = (
employee_table.as_("e").select(employee_table.first_name.as_("name")).build()
)

Output:
SELECT "first_name" AS name FROM "employee" AS e
Subqueries can be used when evaluating Columns in membership checks:
from pysqlscribe.table import MySQLTable
employees = MySQLTable("employees", "salary", "bonus", "department_id")
departments = MySQLTable("departments", "id", "name", "manager_id")
subquery = departments.select("id").where(departments.name == "Engineering")
query = employees.select().where(employees.department_id.in_(subquery)).build()

Output:
SELECT * FROM `employees` WHERE employees.department_id IN (SELECT `id` FROM `departments` WHERE departments.name = 'Engineering')

While the primary focus of this library is on building retrieval ("SELECT") queries, you can also build INSERT queries:
from pysqlscribe.query import QueryRegistry
query_builder = QueryRegistry.get_builder("mysql")
query = query_builder.insert(
"test_column",
"another_test_column",
into="test_table",
values=(1, 2),
).build()

Output:
INSERT INTO `test_table` (`test_column`,`another_test_column`) VALUES (1,2)

While into and values are required keyword arguments, if no positional column arguments are supplied, the column list is omitted from the query:
from pysqlscribe.query import QueryRegistry
query_builder = QueryRegistry.get_builder("mysql")
query = query_builder.insert(
into="test_table",
values=(1, 2),
).build()

Output:
INSERT INTO `test_table` VALUES (1,2)

Multiple rows of values can also be supplied:
from pysqlscribe.query import QueryRegistry
query_builder = QueryRegistry.get_builder("mysql")
query = query_builder.insert(
"test_column", "another_test_column", into="test_table", values=[(1, 2), (3, 4)]
).build()

Output:
INSERT INTO `test_table` (`test_column`,`another_test_column`) VALUES (1,2),(3,4)

RETURNING is also supported:
from pysqlscribe.query import QueryRegistry
query_builder = QueryRegistry.get_builder("postgres")
query = (
query_builder.insert(
"id", "employee_name", into="employees", values=(1, "'john doe'")
)
.returning("id", "employee_name")
.build()
)

Output:
INSERT INTO "employees" ("id","employee_name") VALUES (1,'john doe') RETURNING "id","employee_name"
The Table API offers the insert capability. Similar to select, the into argument is inferred from the table name:
from pysqlscribe.table import MySQLTable
table = MySQLTable("employees", "salary", "bonus")
query = table.insert(table.salary, table.bonus, values=(100, 200)).build()

Output:
INSERT INTO `employees` (`salary`,`bonus`) VALUES (100,200)

By default, all identifiers are escaped using the corresponding dialect's escape character, as can be seen in the various examples. This is done to prevent SQL injection attacks and to ensure different column name variations are handled (e.g., a column with a space in the name, or a column name which coincides with a keyword). Admittedly, this also makes the queries less aesthetically pleasing. If you want to disable this behavior, you can use the disable_escape_identifiers method:
from pysqlscribe.query import QueryRegistry
query_builder = QueryRegistry.get_builder("mysql").disable_escape_identifiers()
query = (
query_builder.select("test_column", "another_test_column")
.from_("test_table")
.where("test_column = 1", "another_test_column > 2")
.build()
)

Output:
SELECT test_column,another_test_column FROM test_table WHERE test_column = 1 AND another_test_column > 2 # look ma, no backticks!

If you want to switch preferences, there's a corresponding enable_escape_identifiers method:
from pysqlscribe.query import QueryRegistry
query_builder = QueryRegistry.get_builder("mysql").disable_escape_identifiers()
query = (
query_builder.select("test_column", "another_test_column")
.enable_escape_identifiers()
.from_("test_table")
.where("test_column = 1", "another_test_column > 2")
.build()
)

Output:
SELECT test_column,another_test_column FROM `test_table` WHERE test_column = 1 AND another_test_column > 2 # note the table name is escaped while the columns are not

Alternatively, if you don't want to change existing code, or you have several Query or Table objects you want to apply this setting to (and don't plan on swapping settings), you can set the environment variable PYSQLSCRIBE_ESCAPE_IDENTIFIERS to "False" or "0".
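For example, mirroring the dialect environment variable shown earlier:

export PYSQLSCRIBE_ESCAPE_IDENTIFIERS='0'  # or 'False'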
pysqlscribe also has a simple DDL parser which can load/create Table objects from a DDL file (or a directory containing DDL files):
from pysqlscribe.utils.ddl_loader import load_tables_from_ddls
tables = load_tables_from_ddls(
"path/to/ddl_file.sql", # can be a file or directory
dialect="mysql" # specify the dialect of the DDL
)

Alternatively, if you have a string containing the DDL, you can use:
from pysqlscribe.utils.ddl_parser import parse_create_tables
from pysqlscribe.utils.ddl_loader import create_tables_from_parsed
sql = """
CREATE TABLE cool_company.employees (
employee_id INT,
salary INT,
role VARCHAR(50)
);
"""
parsed = parse_create_tables(sql) # will be a dictionary of table name to table metadata, e.g., columns, schema
parsed # {'employees': {'columns': ['employee_id', 'salary', 'role'], 'schema': 'cool_company'}}
tables = create_tables_from_parsed(
parsed,
dialect="mysql"
) # dictionary of table name to `Table` object
tables # {'employees': MysqlTable(name=cool_company.employees, columns=('employee_id', 'salary', 'role'))}
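Since the result is a dictionary of ordinary Table objects, the loaded tables can be used to build queries right away. Continuing the example above (a small sketch):

employees = tables["employees"]
query = employees.select("employee_id", "salary").build()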
The currently supported dialects are listed below. This list is anticipated to grow, and there are certainly operations that are still missing within individual dialects:

- MySQL
- Oracle
- Postgres
- SQLite
On the roadmap:

- Add more dialects
- Support OFFSET for Oracle and SQL Server
- Improved injection mitigation
- Support more aggregate and scalar functions
- Enhance how where clauses are handled
💡 Interested in contributing? Check out the Local Development & Contributions Guide.