With pgAssistant, you can leverage a Large Language Model (LLM) to enhance your SQL analyses. Here are the available options:
- A locally deployed model served through Ollama.
- OpenAI: requires an OpenAI API key.
- Any other compatible model: by copying and pasting the prompt generated by pgAssistant.
To enable integration with an LLM, configure the following environment variables:
| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | Your OpenAI API key. When using a locally deployed LLM, any non-empty value works, but the variable is still required. |
| `OPENAI_API_MODEL` | Model to use, such as `o1-mini`, `codellama:13b`, or `llama3.2:latest`. |
| `LOCAL_LLM_URI` | URI of your local LLM (e.g., `http://localhost:11434`). Leave blank to use OpenAI. |
Example for OpenAI:

```
OPENAI_API_KEY = myOpenAPIKey
OPENAI_API_MODEL = gpt-4
```

Example for a locally deployed LLM (Ollama):

```
OPENAI_API_KEY = anyValueButRequired
OPENAI_API_MODEL = llama3.2:latest
LOCAL_LLM_URI = http://localhost:11434
```
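The way these variables interact can be sketched as follows (hypothetical code, not pgAssistant's actual implementation; the function name is illustrative):

```python
import os

def resolve_llm_endpoint() -> str:
    """Return the base URI the LLM client should target:
    LOCAL_LLM_URI if set (locally deployed LLM), otherwise the OpenAI API."""
    local_uri = os.environ.get("LOCAL_LLM_URI", "").strip()
    # A non-empty LOCAL_LLM_URI means a locally deployed LLM (e.g. Ollama);
    # an empty or unset value means the hosted OpenAI API is used.
    return local_uri if local_uri else "https://api.openai.com/v1"

os.environ["LOCAL_LLM_URI"] = "http://localhost:11434"
print(resolve_llm_endpoint())  # http://localhost:11434
```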
pgAssistant provides three pieces of information to the LLM you have chosen:

- The DDL of the tables involved in the query you want to analyze (from the "Top Queries" or "Rank Queries" menus). To generate the DDL, pgAssistant uses `pg_dump`.
- The SQL query.
- The result of `EXPLAIN ANALYZE` for your query.

The prompt is generated in Markdown format. Here is an example:
I have a PostgreSQL query that I would like to optimize. Below, I have provided the necessary details:
- DDL of the table(s) involved:

```sql
CREATE TABLE public.books (
    book_id integer NOT NULL,
    title character varying(150) NOT NULL,
    publication_year integer,
    genre character varying(50),
    author_id integer NOT NULL
);

ALTER TABLE public.books OWNER TO demo;

CREATE SEQUENCE public.books_book_id_seq
    AS integer
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1;

ALTER SEQUENCE public.books_book_id_seq OWNER TO demo;
ALTER SEQUENCE public.books_book_id_seq OWNED BY public.books.book_id;

CREATE TABLE public.borrowers (
    borrower_id integer NOT NULL,
    name character varying(100) NOT NULL,
    email character varying(100) NOT NULL,
    membership_date date DEFAULT CURRENT_DATE
);

ALTER TABLE public.borrowers OWNER TO demo;

CREATE SEQUENCE public.borrowers_borrower_id_seq
    AS integer
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1;

ALTER SEQUENCE public.borrowers_borrower_id_seq OWNER TO demo;
ALTER SEQUENCE public.borrowers_borrower_id_seq OWNED BY public.borrowers.borrower_id;

CREATE TABLE public.loans (
    loan_id integer NOT NULL,
    book_id integer NOT NULL,
    borrower_id integer NOT NULL,
    loan_date date DEFAULT CURRENT_DATE,
    return_date date
);

ALTER TABLE public.loans OWNER TO demo;

CREATE SEQUENCE public.loans_loan_id_seq
    AS integer
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1;

ALTER SEQUENCE public.loans_loan_id_seq OWNER TO demo;
ALTER SEQUENCE public.loans_loan_id_seq OWNED BY public.loans.loan_id;

ALTER TABLE ONLY public.books ALTER COLUMN book_id SET DEFAULT nextval('public.books_book_id_seq'::regclass);
ALTER TABLE ONLY public.borrowers ALTER COLUMN borrower_id SET DEFAULT nextval('public.borrowers_borrower_id_seq'::regclass);
ALTER TABLE ONLY public.loans ALTER COLUMN loan_id SET DEFAULT nextval('public.loans_loan_id_seq'::regclass);

ALTER TABLE ONLY public.books
    ADD CONSTRAINT books_pkey PRIMARY KEY (book_id);
ALTER TABLE ONLY public.borrowers
    ADD CONSTRAINT borrowers_email_key UNIQUE (email);
ALTER TABLE ONLY public.borrowers
    ADD CONSTRAINT borrowers_pkey PRIMARY KEY (borrower_id);
ALTER TABLE ONLY public.loans
    ADD CONSTRAINT loans_pkey PRIMARY KEY (loan_id);
ALTER TABLE ONLY public.books
    ADD CONSTRAINT books_author_id_fkey FOREIGN KEY (author_id) REFERENCES public.authors(author_id) ON DELETE CASCADE;
ALTER TABLE ONLY public.loans
    ADD CONSTRAINT loans_book_id_fkey FOREIGN KEY (book_id) REFERENCES public.books(book_id) ON DELETE CASCADE;
ALTER TABLE ONLY public.loans
    ADD CONSTRAINT loans_borrower_id_fkey FOREIGN KEY (borrower_id) REFERENCES public.borrowers(borrower_id) ON DELETE CASCADE;
```
- The SQL query:

```sql
EXPLAIN ANALYZE SELECT l.loan_id, b.title, br.name AS borrower_name, l.loan_date
FROM loans l
JOIN books b ON l.book_id = b.book_id
JOIN borrowers br ON l.borrower_id = br.borrower_id
WHERE l.return_date IS NULL AND b.title LIKE 'The Hob%';
```
- `EXPLAIN ANALYZE` output:

```
Nested Loop  (cost=0.57..2522.08 rows=3 width=544) (actual time=0.052..33.863 rows=4 loops=1)
  ->  Nested Loop  (cost=0.29..2516.82 rows=3 width=330) (actual time=0.032..33.832 rows=4 loops=1)
        ->  Seq Scan on loans l  (cost=0.00..1638.90 rows=516 width=16) (actual time=0.007..6.412 rows=33335 loops=1)
              Filter: (return_date IS NULL)
              Rows Removed by Filter: 66668
        ->  Index Scan using books_pkey on books b  (cost=0.29..1.70 rows=1 width=322) (actual time=0.001..0.001 rows=0 loops=33335)
              Index Cond: (book_id = l.book_id)
              Filter: ((title)::text ~~ 'The Hob%'::text)
              Rows Removed by Filter: 1
  ->  Index Scan using borrowers_pkey on borrowers br  (cost=0.29..1.75 rows=1 width=222) (actual time=0.007..0.007 rows=1 loops=4)
        Index Cond: (borrower_id = l.borrower_id)
Planning Time: 1.513 ms
Execution Time: 33.936 ms
```
Could you analyze this query and suggest optimizations? If optimizations are possible, please provide the necessary SQL statements (e.g., additional indexes or query rewrites) and explain the reasoning behind your recommendations.
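For this example, the kind of answer you can expect from an LLM targets the two filters visible in the plan: the sequential scan on `loans` (filtering `return_date IS NULL`) and the `LIKE 'The Hob%'` prefix match on `books.title`. A sketch of such indexes (the names are illustrative, and the actual LLM answer may differ):

```sql
-- Partial index covering only open loans (return_date IS NULL),
-- so the planner can avoid the sequential scan on loans.
CREATE INDEX loans_open_idx ON public.loans (book_id, borrower_id)
    WHERE return_date IS NULL;

-- Index supporting LIKE 'prefix%' searches on title; varchar_pattern_ops
-- is needed for LIKE prefix matching unless the database uses the C collation.
CREATE INDEX books_title_pattern_idx ON public.books (title varchar_pattern_ops);
```

As always, verify with `EXPLAIN ANALYZE` that the planner actually uses the new indexes.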
We conducted numerous tests with:

- ChatGPT, by copying and pasting the prompt generated by pgAssistant;
- OpenAI;
- Ollama, using several models including `codellama:latest`, `codellama:13b`, `mistral`, and `llama3.2:latest`.
By far, the best results were consistently obtained with ChatGPT, especially for complex queries. However, open-source models also delivered good results.
The `llama3.2:latest` model will provide better results than `codellama`. The `mistral` model yields results comparable to `llama3.2`.
On a Mac M4 Pro, 7b models respond relatively quickly (<20 seconds) and are usable. With 13b models and above, response times are much slower, so patience is required.
Unlike with ChatGPT, we recommend asking open-source models the same question several times, as their answers can vary noticeably between runs.
Regardless, try applying the recommendations suggested by the LLMs, but above all, TEST! Reset the `pg_stat_statements` statistics and re-execute the queries you have optimized.
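To reset the statistics collected by `pg_stat_statements`, run:

```sql
SELECT pg_stat_statements_reset();
```

After re-executing your optimized queries, the fresh statistics let you compare their new timings against the old ones without interference from earlier runs.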