Matching patients to clinical trials is a challenge due to the complex eligibility criteria associated with each study. The specificity of clinical trial eligibility can exceed that of the structured data in EHR systems, meaning that matching often requires significant manual review and input.
The scope of this project is to create a clinical trial matching tool that helps bridge the specificity gap between EHR systems and clinical trial eligibility requirements. It does so by processing unstructured natural-language patient-doctor conversations into structured search criteria and delivering a curated selection of potentially relevant trials, with an intuitive UI/UX that helps users understand and navigate patient eligibility. The final product thus lets stakeholders accelerate the matching process by acting as a context-driven search engine.
- Pertinent EHR data that is not typically stated aloud in a routine visit (weight, age, height, medications, current conditions, etc.) is assumed to be available; I felt it would not be realistic for all of this information to be verbally exchanged in a typical visit.
- The ClinicalTrials.gov API returns eligibility criteria (both inclusion and exclusion criteria) as plain text, so effective natural language processing of this information, and using it to evaluate patient eligibility, is a critical feature; it is what sets this tool apart from simply learning to query the ClinicalTrials.gov API effectively.
- Relevancy and eligibility are key: the goal is not to recommend specifically that the patient be matched to {X} trial, but to build software-driven tools for navigating the complex eligibility criteria in order to surface patient eligibility/relevancy across {X, Y, Z, ...} trials.
Frontend: Vite/TS/React -> Vercel
Backend: Python/FastAPI/GPT-4o mini API -> Railway
The backend implements a transcript + EHR -> eligibility-ranked trials pipeline and serves it via a REST API. Relevant trials, rankings (plus ranking reasoning), and the extracted patient context are rendered in the frontend after a user submits the necessary context:

- Context extraction: structured patient context is extracted from the transcript using an LLM (GPT-4o mini); see the extraction sketch after this list.
- Context-to-query mapping: queries are constructed from patient conditions, location, intervention preference, status (is the trial actively recruiting?), and filters (e.g., patient age, prognosis, risk tolerance, ...).
- Progressive fallback trial querying: if the query is too specific/restrictive (no results returned), query fields are progressively relaxed until a threshold number of trials is returned. Fallback flags are tracked so that users can be notified that search results may not fully reflect the initially specified search criteria (e.g., proximity to the patient's home town); see the query/fallback sketch below.
- Eligibility ranking: for each returned trial, an LLM (GPT-4o mini) is prompted to score the patient's eligibility on a scale of 0-3 (minimizing output tokens). The LLM is given the extracted transcript context, EHR data, and the transcript itself as a static prefix in the prompt (enabling prompt caching for all of the patient context), with the specific trial's eligibility criteria appended as the prompt suffix; see the scoring sketch below.
- REST API serving the frontend for matching, ranking, and providing detailed context for all returned trials; the endpoint sketch below ties these steps together.
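To make the pipeline concrete, here is a minimal sketch of the context-extraction step using the openai Python SDK's structured-output helper. The fields on `PatientContext` are illustrative assumptions, not the project's actual schema:

```python
# Sketch of transcript -> structured context; field names are illustrative.
from openai import OpenAI
from pydantic import BaseModel

class PatientContext(BaseModel):
    conditions: list[str]                # e.g., ["non-small cell lung cancer"]
    location: str | None                 # patient home town, if mentioned
    intervention_preference: str | None  # e.g., "immunotherapy"

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_context(transcript: str) -> PatientContext:
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Extract structured patient context from this doctor-patient conversation."},
            {"role": "user", "content": transcript},
        ],
        response_format=PatientContext,  # SDK parses the reply into the model
    )
    return completion.choices[0].message.parsed
```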
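The context-to-query mapping and progressive fallback might look like the following sketch, reusing `PatientContext` from above. The parameter names (`query.cond`, `query.locn`, `query.intr`, `filter.overallStatus`) come from the public ClinicalTrials.gov v2 API; the relaxation order and result threshold are assumptions:

```python
# Sketch of context -> query mapping with progressive fallback.
import requests

CTGOV_URL = "https://clinicaltrials.gov/api/v2/studies"
MIN_RESULTS = 5  # assumed threshold, not the project's actual value

def search_trials(ctx: PatientContext) -> tuple[list[dict], list[str]]:
    params = {
        "query.cond": " OR ".join(ctx.conditions),
        "query.locn": ctx.location,
        "query.intr": ctx.intervention_preference,
        "filter.overallStatus": "RECRUITING",  # actively recruiting only
        "pageSize": 25,
    }
    relax_order = ["query.locn", "query.intr"]  # assumed: drop least critical fields first
    fallback_flags: list[str] = []
    while True:
        resp = requests.get(CTGOV_URL, params={k: v for k, v in params.items() if v})
        studies = resp.json().get("studies", [])
        if len(studies) >= MIN_RESULTS or not relax_order:
            return studies, fallback_flags
        dropped = relax_order.pop(0)
        params.pop(dropped, None)
        fallback_flags.append(dropped)  # surfaced to the user as a warning
```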
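The 0-3 scoring step could then look like this sketch. The prompt wording is illustrative, but the structure matches the description above: the static patient block (context + EHR + transcript) comes first so the prefix is prompt-cache friendly, and only the trial's eligibility text varies per call:

```python
# Sketch of the 0-3 eligibility scoring call; prompt wording is illustrative.
def score_eligibility(client: OpenAI, patient_block: str, eligibility_text: str) -> int:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Score the patient's eligibility for the trial from 0 (ineligible) "
                        "to 3 (strong match). Reply with a single digit."},
            {"role": "user",
             "content": f"{patient_block}\n\nTrial eligibility criteria:\n{eligibility_text}"},
        ],
        max_tokens=1,  # the score itself needs only one output token
    )
    return int(resp.choices[0].message.content.strip())
```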
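Finally, a minimal FastAPI endpoint tying the helpers above together. The route name, payload shape, and response keys are assumptions; the eligibility text path (`protocolSection.eligibilityModule.eligibilityCriteria`) follows the v2 API's study schema:

```python
# Sketch of the matching endpoint; route and payload shapes are assumptions.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class MatchRequest(BaseModel):
    transcript: str
    ehr: str  # pre-formatted EHR summary to include in the prompt

@app.post("/match")
def match(req: MatchRequest) -> dict:
    ctx = extract_context(req.transcript)   # transcript -> structured context
    trials, flags = search_trials(ctx)      # query + progressive fallback
    patient_block = f"{ctx.model_dump_json()}\n\nEHR:\n{req.ehr}\n\nTranscript:\n{req.transcript}"
    scored = [
        (score_eligibility(client, patient_block,
                           t.get("protocolSection", {})
                            .get("eligibilityModule", {})
                            .get("eligibilityCriteria", "")), t)
        for t in trials
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return {
        "trials": [{"score": s, "study": t} for s, t in scored],
        "fallback_flags": flags,
        "patient_context": ctx.model_dump(),
    }
```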
The frontend then provides:

- Search construction: guided input for patient EHR data + the doctor-patient transcript, used to send trial match/rank requests to the API. A number of example patients are provided as drop-down fill-ins.
- Results display: compact cards showing relevant information for each trial, such as study status, location, and the eligibility score and its resulting rank. Clicking a card expands it to show exhaustive details of the study.
- User warnings: warns users when search terms had to be loosened to find relevant trials, meaning the returned trials may be further away or involve interventions the patient did not originally prefer.
- Patient context sidebar: the extracted transcript context + patient data is displayed in a sidebar alongside the trial results, allowing quick manual comparison of patient information against eligibility criteria + intervention types.
Frontend: Clinical Trial Matching
Backend: API Docs
Frontend: copy the .env.example as the .env file. From the project root:

```
cd frontend
npm i
npm run dev
```

Backend: copy the .env.example as the .env; fill in a suitable OpenAI API key (with gpt-4o mini access). From the project root:

```
cd backend
python -m venv venv
.\venv\Scripts\activate
pip install -r requirements.txt
uvicorn src.main:app --host 0.0.0.0 --port 8000
```

Once the Vite localhost and API server are both up and running, the webapp should work as intended!
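For reference, the backend .env minimally needs the OpenAI key. The variable name below is the OpenAI SDK's default and is an assumption; check the project's .env.example if it differs:

```
# backend/.env (variable name assumed; check .env.example)
OPENAI_API_KEY=<your OpenAI API key with gpt-4o mini access>
```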