Using LLM(s) with pgAssistant

With pgAssistant, you can leverage a Large Language Model (LLM) to enhance your SQL analyses. Here are the available options:

  • A locally deployed model, served for example with Ollama.
  • OpenAI: requires an OpenAI API key.
  • Any other model: by copying and pasting the prompt generated by pgAssistant into its interface.

Environment Variable Configuration

To enable integration with an LLM, configure the following environment variables:

Variable           Description
OPENAI_API_KEY     Your OpenAI API key. Still required when using a locally deployed LLM, but any value is accepted in that case.
OPENAI_API_MODEL   The model to use, such as o1-mini, codellama:13b, or llama3.2:latest.
LOCAL_LLM_URI      URI of your local LLM (e.g., http://localhost:11434). Leave blank when using OpenAI.

Sample usage

With OpenAI

OPENAI_API_KEY = myOpenAPIKey
OPENAI_API_MODEL = gpt-4

With local Ollama

OPENAI_API_KEY = anyValueButRequired
OPENAI_API_MODEL = llama3.2:latest
LOCAL_LLM_URI = http://localhost:11434

How does pgAssistant create a prompt to query LLMs?

pgAssistant sends three pieces of information to the LLM you have chosen:

  • The DDL of the tables involved in the query you want to analyze (from the "Top Queries" or "Rank Queries" menus). To generate the DDL, pgAssistant uses pg_dump.
  • The SQL query.
  • The EXPLAIN ANALYZE output for that query.

The prompt is generated in Markdown format.

Example of a prompt generated by pgAssistant

I have a PostgreSQL query that I would like to optimize. Below, I have provided the necessary details:

  1. DDL of the table(s) involved:
CREATE TABLE public.books (
    book_id integer NOT NULL,
    title character varying(150) NOT NULL,
    publication_year integer,
    genre character varying(50),
    author_id integer NOT NULL
);

ALTER TABLE public.books OWNER TO demo;

CREATE SEQUENCE public.books_book_id_seq
    AS integer
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1;

ALTER SEQUENCE public.books_book_id_seq OWNER TO demo;

ALTER SEQUENCE public.books_book_id_seq OWNED BY public.books.book_id;

CREATE TABLE public.borrowers (
    borrower_id integer NOT NULL,
    name character varying(100) NOT NULL,
    email character varying(100) NOT NULL,
    membership_date date DEFAULT CURRENT_DATE
);

ALTER TABLE public.borrowers OWNER TO demo;

CREATE SEQUENCE public.borrowers_borrower_id_seq
    AS integer
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1;

ALTER SEQUENCE public.borrowers_borrower_id_seq OWNER TO demo;

ALTER SEQUENCE public.borrowers_borrower_id_seq OWNED BY public.borrowers.borrower_id;

CREATE TABLE public.loans (
    loan_id integer NOT NULL,
    book_id integer NOT NULL,
    borrower_id integer NOT NULL,
    loan_date date DEFAULT CURRENT_DATE,
    return_date date
);

ALTER TABLE public.loans OWNER TO demo;

CREATE SEQUENCE public.loans_loan_id_seq
    AS integer
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1;

ALTER SEQUENCE public.loans_loan_id_seq OWNER TO demo;

ALTER SEQUENCE public.loans_loan_id_seq OWNED BY public.loans.loan_id;

ALTER TABLE ONLY public.books ALTER COLUMN book_id SET DEFAULT nextval('public.books_book_id_seq'::regclass);

ALTER TABLE ONLY public.borrowers ALTER COLUMN borrower_id SET DEFAULT nextval('public.borrowers_borrower_id_seq'::regclass);

ALTER TABLE ONLY public.loans ALTER COLUMN loan_id SET DEFAULT nextval('public.loans_loan_id_seq'::regclass);

ALTER TABLE ONLY public.books
    ADD CONSTRAINT books_pkey PRIMARY KEY (book_id);

ALTER TABLE ONLY public.borrowers
    ADD CONSTRAINT borrowers_email_key UNIQUE (email);

ALTER TABLE ONLY public.borrowers
    ADD CONSTRAINT borrowers_pkey PRIMARY KEY (borrower_id);

ALTER TABLE ONLY public.loans
    ADD CONSTRAINT loans_pkey PRIMARY KEY (loan_id);

ALTER TABLE ONLY public.books
    ADD CONSTRAINT books_author_id_fkey FOREIGN KEY (author_id) REFERENCES public.authors(author_id) ON DELETE CASCADE;

ALTER TABLE ONLY public.loans
    ADD CONSTRAINT loans_book_id_fkey FOREIGN KEY (book_id) REFERENCES public.books(book_id) ON DELETE CASCADE;

ALTER TABLE ONLY public.loans
    ADD CONSTRAINT loans_borrower_id_fkey FOREIGN KEY (borrower_id) REFERENCES public.borrowers(borrower_id) ON DELETE CASCADE;
  2. The SQL query:
EXPLAIN ANALYZE  SELECT l.loan_id, b.title, br.name AS borrower_name, l.loan_date
FROM loans l
JOIN books b ON l.book_id = b.book_id
JOIN borrowers br ON l.borrower_id = br.borrower_id
WHERE l.return_date IS null and b.title like 'The Hob%'
  3. EXPLAIN ANALYZE output:
Nested Loop  (cost=0.57..2522.08 rows=3 width=544) (actual time=0.052..33.863 rows=4 loops=1)
  ->  Nested Loop  (cost=0.29..2516.82 rows=3 width=330) (actual time=0.032..33.832 rows=4 loops=1)
        ->  Seq Scan on loans l  (cost=0.00..1638.90 rows=516 width=16) (actual time=0.007..6.412 rows=33335 loops=1)
              Filter: (return_date IS NULL)
              Rows Removed by Filter: 66668
        ->  Index Scan using books_pkey on books b  (cost=0.29..1.70 rows=1 width=322) (actual time=0.001..0.001 rows=0 loops=33335)
              Index Cond: (book_id = l.book_id)
              Filter: ((title)::text ~~ 'The Hob%'::text)
              Rows Removed by Filter: 1
  ->  Index Scan using borrowers_pkey on borrowers br  (cost=0.29..1.75 rows=1 width=222) (actual time=0.007..0.007 rows=1 loops=4)
        Index Cond: (borrower_id = l.borrower_id)
Planning Time: 1.513 ms
Execution Time: 33.936 ms

Could you analyze this query and suggest optimizations? If optimizations are possible, please provide the necessary SQL statements (e.g., additional indexes or query rewrites) and explain the reasoning behind your recommendations.
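For this plan, a typical answer targets the sequential scan on loans and the title prefix filter on books. As an illustration only, here is a hedged sketch of the kind of indexes such an answer might propose (the index names are ours, not something pgAssistant or a given model is guaranteed to return):

-- Drive the join from books: a LIKE 'The Hob%' prefix search can use an
-- index, but needs a pattern-ops operator class unless the database runs
-- under the C collation.
CREATE INDEX books_title_pattern_idx
    ON public.books (title varchar_pattern_ops);

-- Probe loans by book_id instead of scanning it sequentially; the partial
-- predicate matches the WHERE clause and keeps the index small.
CREATE INDEX loans_book_id_open_idx
    ON public.loans (book_id)
    WHERE return_date IS NULL;

Whether the planner actually picks such indexes depends on your data, so any suggestion still has to be verified, as described below.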

Tests Conducted

We conducted numerous tests with:

  • ChatGPT, by copying and pasting the prompt generated by pgAssistant;
  • The OpenAI API;
  • Ollama, with several models including codellama:latest, codellama:13b, mistral, and llama3.2:latest.

The best results were consistently obtained with ChatGPT, by far, especially for complex queries. However, open-source models also delivered good results.

The llama3.2:latest model gives better results than codellama; Mistral yields results comparable to llama3.2.

On a Mac M4 Pro, 7b models respond relatively quickly (<20 seconds) and are usable. With 13b models and above, response times are much slower, so patience is required.

Unlike with ChatGPT, we recommend asking open-source models the same question several times, since their answers tend to vary between runs.

Whichever model you use, apply the recommendations suggested by the LLM, but above all, TEST! Reset the pg_stat_statements statistics and re-execute the queries you have optimized.
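As a minimal sketch, assuming the pg_stat_statements extension is installed and you have the privileges to reset it, re-testing the query from the example above could look like this:

-- Clears the statistics collected for all statements, so the next
-- measurements reflect only the re-executed, optimized queries.
SELECT pg_stat_statements_reset();

-- Re-run the optimized query and compare the new plan and timings with
-- the original EXPLAIN ANALYZE output.
EXPLAIN (ANALYZE, BUFFERS)
SELECT l.loan_id, b.title, br.name AS borrower_name, l.loan_date
FROM loans l
JOIN books b ON l.book_id = b.book_id
JOIN borrowers br ON l.borrower_id = br.borrower_id
WHERE l.return_date IS NULL AND b.title LIKE 'The Hob%';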


Prompt sample

(screenshot: prompt generated by pgAssistant)

Mistral results

(screenshot: results returned by the mistral model)