ScholarVerse

ScholarVerse is a self-hosted academic research platform for literature, linguistics, and humanities work. The stack in this repo is built around evidence-first retrieval, citation traceability, asynchronous document intelligence, and open-source inference endpoints instead of proprietary hosted AI APIs.

What is here

scholarverse-web/ Next.js 16 App Router frontend and server routes.
migrations/ InsForge/PostgreSQL schema with pgvector-backed RAG tables, policies, and retrieval SQL.
services/ingestion-worker/ FastAPI + Redis Queue ingestion service for OCR, metadata extraction, chunking, and callback delivery.
docker-compose.yml Local orchestration for the web app, Redis, vLLM, optional Ollama, and the ingestion services.
docs/architecture.md Production deployment notes and system flow.
scholarverse-web/training/ Fine-tuning dataset export and supervised training helpers.

Core platform design

scholarverse-web handles the research workspace, streaming responses, admin surface, and upload entrypoint.
InsForge owns authentication, PostgreSQL, pgvector persistence, storage buckets, and chat/document metadata once the project is linked.
A self-hosted OpenAI-compatible endpoint, preferably vLLM, serves the primary research model and embedding model.
The ingestion worker performs OCR, parsing, semantic segmentation, duplicate detection, and callback delivery asynchronously through Redis-backed jobs.
Retrieval happens before generation. The app is designed to pass page-scoped source excerpts into the generation step and surface citations back to the user interface.

Local development

Copy .env.example to .env and fill the InsForge and model settings.
Install the web app dependencies:
```
cd scholarverse-web
npm install
```
Run the web app:
```
npm run dev
```
Start the worker stack when you want document ingestion or self-hosted inference:
```
docker compose up redis vllm ingestion-api ingestion-worker
```

Apply the database schema to your linked InsForge project:

npx @insforge/cli whoami --json
npx @insforge/cli current --json
npx @insforge/cli db migrations up --all

Fine-tuning workflow

ScholarVerse now includes a practical prompt-to-training path:

Export grounded chat examples from the live corpus and saved conversations:

cd scholarverse-web
npm run finetune:export -- --output training/datasets/scholarverse-sft.jsonl

Train a ScholarVerse-specific model adapter:

uv run training/train_scholarverse_sft.py \
  --dataset training/datasets/scholarverse-sft.jsonl \
  --model Qwen/Qwen2.5-32B-Instruct \
  --output-dir training/output/scholarverse-qwen-sft

Serve the tuned checkpoint from infrastructure you control and point SELF_HOSTED_CHAT_MODEL at it.

Deployment model

Frontend: Vercel or InsForge-managed frontend deployment from scholarverse-web/
Backend primitives: InsForge
Self-hosted inference: dedicated GPU server running vLLM
Async ingestion: containerized FastAPI + RQ worker on a compute service
Redis: managed or self-hosted Redis for queueing

Current runtime behavior

The web app runs in a demo-backed mode until you provide real InsForge and self-hosted AI environment variables. That fallback is deliberate: it keeps the interface operable while you wire the actual infrastructure.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ScholarVerse

What is here

Core platform design

Local development

Fine-tuning workflow

Deployment model

Current runtime behavior

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
.github/workflows		.github/workflows
docs		docs
migrations		migrations
scholarverse-web		scholarverse-web
services/ingestion-worker		services/ingestion-worker
.env.example		.env.example
.gitignore		.gitignore
README.md		README.md
docker-compose.yml		docker-compose.yml

Folders and files

Latest commit

History

Repository files navigation

ScholarVerse

What is here

Core platform design

Local development

Fine-tuning workflow

Deployment model

Current runtime behavior

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages