Andrej Karpathy writes on Twitter: "One built-in UI/UX feature of LLM interfaces I'd love is proof... A feature that automatically brings in original material / reputable sources and highlights relevant sections as proof alongside factual generations would be very cool."
Surface is a website for quick AI answers, verified against web resources. It also demonstrates an interface and prompting approach for verifying claims in model responses, which could be integrated into conversational interfaces like claude.ai and chatgpt.com, where a user today typically has to verify a fact with a manual web search.
See more examples on the website.
Prompting Approach
- The model responds directly, while identifying the claims it wishes to check and suggesting a search query to investigate each claim. Then, for each claim and search query pair:
- The search query is sent to the Bing Search API to retrieve URLs.
- Each URL is fetched as plain webpage text using Jina AI's Reader.
- The webpage texts are used to identify direct evidence and judge whether the claim is true.
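The per-claim loop above can be sketched as a small pipeline. This is an illustrative sketch, not the repo's actual code: `verifyClaim` and the injected function names are assumptions, with the search, read, and judge steps passed in so the structure is visible without live API keys.

```typescript
// Hypothetical sketch of the per-claim verification loop.
// searchFn / readFn / judgeFn are injected dependencies standing in for
// the Bing Search API, Jina AI's Reader (GET https://r.jina.ai/<url>),
// and an Anthropic call, respectively. Names are illustrative only.

type Verdict = { claim: string; evidence: string; supported: boolean };

async function verifyClaim(
  claim: string,
  query: string,
  searchFn: (q: string) => Promise<string[]>,
  readFn: (url: string) => Promise<string>,
  judgeFn: (claim: string, pages: string[]) => Promise<Verdict>,
): Promise<Verdict> {
  // 1. Search query -> candidate URLs.
  const urls = await searchFn(query);
  // 2. URLs -> webpage text (cap at a few pages to bound latency).
  const pages = await Promise.all(urls.slice(0, 3).map(readFn));
  // 3. Webpage texts -> direct evidence and a truth judgment.
  return judgeFn(claim, pages);
}
```

Injecting the three steps also makes the loop easy to test with stubs.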
Dev Setup
- Env vars
In the backend, you will need an Anthropic API key and a Bing Search API key:
# backend/.env
NODE_ENV=development
ANTHROPIC_API_KEY=...
SECRET_CODES_ANSWER=exampleKey1, exampleKey2
BING_SEARCH_V7_SUBSCRIPTION_KEY=...
TUNNEL_TOKEN=dummy_value
In the frontend:
# frontend/.env
NEXT_PUBLIC_API_PREFIX=http://localhost:80
- Install deps in `frontend/`: `npm install`
- Install deps in `backend/`: `npm install`
- Start the Next.js frontend in `frontend/`: `npm run dev`
- Start the Docker backend in `backend/`: `npm run dev`
- Navigate to `http://localhost:3000` in your browser and enter one of the secret codes from the `.env` file above.
Known Issues
- Parsing the claims out of the model response happens on the frontend; it should have been done on the backend.
- The Docker backend does not auto-reload on code changes.
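Moving claim parsing to the backend would look something like the sketch below. The actual tag format Surface's prompt uses is not shown in this README, so the `<claim query="...">` markup here is purely an assumed illustration of how the model might annotate its own response.

```typescript
// Hypothetical backend parser for claim-tagged model output.
// Assumes (for illustration only) that the prompt asks the model to wrap
// each checkable statement as: <claim query="search query">statement</claim>

interface TaggedClaim {
  text: string;  // the claim as stated in the response
  query: string; // the model's suggested search query for it
}

function parseClaims(response: string): TaggedClaim[] {
  const re = /<claim query="([^"]*)">([\s\S]*?)<\/claim>/g;
  const claims: TaggedClaim[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(response)) !== null) {
    claims.push({ query: m[1], text: m[2] });
  }
  return claims;
}
```

Keeping this on the backend means the frontend never needs to know the prompt's tag format, and the format can change without a frontend deploy.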
Next Steps
- Present the option to regenerate the response, with the relevant webpage text included in the context.
- The model sometimes recognizes that it needs more information. In that case, fall back to the typical RAG approach (as Perplexity does). This mitigates cases where the model does not have the information it needs.
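For the regeneration idea above, the main moving part is folding the fetched webpage text back into the model's context. A minimal sketch, with all names and the prompt wording assumed for illustration:

```typescript
// Hypothetical helper for regenerating a response with retrieved webpage
// text in context. Function name, prompt wording, and the truncation
// limit are illustrative assumptions, not the repo's actual code.

function buildRegenerationPrompt(
  question: string,
  evidence: { url: string; text: string }[],
  maxCharsPerSource = 2000, // truncate each page so the context stays bounded
): string {
  const sources = evidence
    .map((e, i) => `Source ${i + 1} (${e.url}):\n${e.text.slice(0, maxCharsPerSource)}`)
    .join("\n\n");
  return `Answer the question using only the sources below.\n\n${sources}\n\nQuestion: ${question}`;
}
```

The resulting string would be sent as the user message of a fresh model call, replacing the original unverified answer.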