A web application for comparing search results across multiple academic search engines, including ADS/SciX, Google Scholar, Semantic Scholar, and Web of Science.

## Features

- Compare search results from multiple academic search engines
- Analyze similarities and differences between result sets
- Experiment with boosting factors to improve search rankings
- Perform A/B testing of different search algorithms
- Debug tools for API testing and diagnostics
- Direct Solr proxy for ADS/SciX queries (no API key required)
## Quepid Integration

The application now integrates with Quepid, a search relevance testing platform. This integration allows you to:

- Connect to your Quepid cases containing relevance judgments
- Evaluate search results using industry-standard metrics such as nDCG@10
- Compare relevance performance across different search engines
- Test how changes to search algorithms affect relevance scores
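As a reference for the metric mentioned above, here is a minimal sketch of how nDCG@10 is typically computed from graded relevance judgments. The function names are illustrative, not the backend's actual code:

```python
import math

def dcg(gains):
    """Discounted cumulative gain: each gain is discounted by log2(rank + 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg_at_k(ranked_gains, k=10):
    """Normalized DCG: DCG of the top-k results divided by the ideal DCG."""
    top = ranked_gains[:k]
    ideal = sorted(ranked_gains, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(top) / ideal_dcg if ideal_dcg > 0 else 0.0

# Graded judgments (0 = not relevant, 3 = highly relevant) in retrieved order:
score = ndcg_at_k([3, 2, 3, 0, 1, 2], k=10)
```

A perfectly ordered result list scores 1.0; a list with no judged-relevant documents scores 0.0.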
To use the Quepid integration, set the following environment variables:

```
QUEPID_API_URL=https://app.quepid.com/api/
QUEPID_API_KEY=your_api_key_here
```
The following endpoint has been added:

- `POST /experiments/quepid-evaluation`: Evaluate search results against Quepid judgments

Example request:

```json
{
  "query": "katabatic wind",
  "sources": ["ads", "scholar", "semantic_scholar"],
  "case_id": 123,
  "max_results": 20
}
```
Example response:

```json
{
  "query": "katabatic wind",
  "case_id": 123,
  "case_name": "Atmospheric Sciences",
  "source_results": [
    {
      "source": "ads",
      "metrics": [
        {
          "name": "ndcg@10",
          "value": 0.85,
          "description": "Normalized Discounted Cumulative Gain at 10"
        },
        {
          "name": "p@10",
          "value": 0.7,
          "description": "Precision at 10"
        }
      ],
      "judged_retrieved": 15,
      "relevant_retrieved": 12,
      "results_count": 20
    }
  ],
  "total_judged": 25,
  "total_relevant": 18
}
```
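Assuming the backend is running locally on port 8000 (as in the setup instructions below), the endpoint could be exercised from Python like this. `build_evaluation_request` and `run_evaluation` are hypothetical helpers for illustration, not part of the codebase:

```python
import json
import urllib.request

def build_evaluation_request(query, sources, case_id, max_results=20):
    """Assemble the JSON body expected by POST /experiments/quepid-evaluation."""
    return {
        "query": query,
        "sources": sources,
        "case_id": case_id,
        "max_results": max_results,
    }

def run_evaluation(base_url="http://localhost:8000"):
    # base_url assumes a local run; adjust for a deployed instance.
    body = build_evaluation_request("katabatic wind", ["ads", "scholar"], 123)
    req = urllib.request.Request(
        f"{base_url}/experiments/quepid-evaluation",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```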
## Query Intent Detection

The repository now includes a feature that uses lightweight open-source LLMs to interpret user search queries, detect intent, and transform queries to be more effective. This feature is accessible through the "Query Intent" tab in the UI.

- Query analysis using local LLM models via Ollama
- Automatic query transformation based on detected intent
- Support for multiple lightweight models (Llama 2, Mistral, Gemma)
- Rule-based fallbacks when intent is clear
- Docker Compose setup for easy deployment

To use this feature:

1. Set up the backend service following the instructions in `backend/README.md`
2. Use the "Query Intent" tab in the UI for semantic query transformation

For details, see the backend documentation.
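A minimal sketch of how a local Ollama model might be asked to classify and rewrite a query. The prompt wording, intent labels, and helper names are assumptions for illustration; Ollama's HTTP API (`POST /api/generate`) must be running locally:

```python
import json
import urllib.request

# Illustrative prompt template; the backend's actual prompt may differ.
INTENT_PROMPT = (
    "Classify the intent of this academic search query as one of: "
    "author_search, topic_search, citation_lookup. "
    "Then rewrite the query for a literature search engine.\n\nQuery: {query}"
)

def build_prompt(query: str) -> str:
    """Fill the intent-classification prompt template."""
    return INTENT_PROMPT.format(query=query)

def ask_ollama(query: str, model: str = "llama2", host: str = "http://localhost:11434"):
    # Ollama serves local models at /api/generate; stream=False returns one JSON object.
    body = {"model": model, "prompt": build_prompt(query), "stream": False}
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]
```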
## Project Structure

```
backend/            # FastAPI backend with search services
  app/              # Application code
    api/            # API routes and models
    core/           # Core configuration and utilities
    services/       # Search engine integration services
    utils/          # Utility functions
  tests/            # Backend tests
frontend/           # React frontend application
  public/           # Static files
  src/              # React source code
    components/     # React components
    services/       # API service functions
```
## Prerequisites

- Python 3.9+
- Node.js 14+
- API keys for academic search engines (optional)
## Backend Setup

1. Navigate to the backend directory:

   ```
   cd backend
   ```

2. Create and activate a virtual environment:

   ```
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:

   ```
   pip install -r requirements.txt
   ```

4. Create a `.env.local` file in the project root with your API keys:

   ```
   ADS_API_TOKEN=your_ads_token
   SEMANTIC_SCHOLAR_API_KEY=your_ss_key
   WEB_OF_SCIENCE_API_KEY=your_wos_key
   ```
### Solr Proxy Configuration

The application supports querying ADS/SciX directly through a Solr proxy, which offers faster results and doesn't require an API key. Configure this in your environment file:

```
# Solr proxy URL (default: https://scix-solr-proxy.onrender.com/solr/select)
ADS_SOLR_PROXY_URL=https://scix-solr-proxy.onrender.com/solr/select

# Query method (options: solr_only, api_only, solr_first)
# - solr_only: Only use the Solr proxy
# - api_only: Only use the ADS API
# - solr_first: Try Solr first, fall back to the API if needed (default)
ADS_QUERY_METHOD=solr_first
```
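The routing behavior can be pictured roughly as follows. `query_ads` and the injected fetcher callables are illustrative, not the actual service code in `backend/app/services`:

```python
import os

def query_ads(query, solr_fetch, api_fetch):
    """Dispatch an ADS/SciX query according to ADS_QUERY_METHOD.

    solr_fetch / api_fetch are callables returning a list of results,
    injected here so the routing logic is easy to test in isolation.
    """
    method = os.getenv("ADS_QUERY_METHOD", "solr_first")
    if method == "solr_only":
        return solr_fetch(query)
    if method == "api_only":
        return api_fetch(query)
    # solr_first: try the proxy, fall back to the authenticated API
    # if the proxy errors out or returns nothing.
    try:
        results = solr_fetch(query)
        if results:
            return results
    except Exception:
        pass
    return api_fetch(query)
```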
## Frontend Setup

1. Navigate to the frontend directory:

   ```
   cd frontend
   ```

2. Install dependencies:

   ```
   npm install
   ```

## Running the Application

1. Start both the frontend and backend servers:

   ```
   ./start_local.sh
   ```

   Or run them separately:

   - Backend:

     ```
     cd backend
     python -m uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
     ```

   - Frontend:

     ```
     cd frontend
     npm start
     ```

2. Open your browser and navigate to http://localhost:3000
Run backend tests:

```
cd backend
pytest
```
## Deployment

This application is configured for deployment on Render.com using the `render.yaml` configuration file.

## Environments

The application supports different environments:

- `local`: For local development
- `development`: For development deployment
- `staging`: For staging deployment
- `production`: For production deployment

Environment-specific configuration is loaded from:

- `.env.local`
- `.env.dev`
- `.env.staging`
- `.env.prod`

## API Documentation

When running locally, the API documentation is available at:

- Swagger UI: http://localhost:8000/api/docs
- ReDoc: http://localhost:8000/api/redoc
## Deploying to Render

### Prerequisites

- A Render account
- An ADS API token
- A Git repository containing the code
### Backend Service

1. Create a new Web Service on Render
2. Connect your Git repository
3. Configure the following settings:
   - Name: `search-comparisons-backend`
   - Environment: Python
   - Build Command: `pip install -r requirements.txt`
   - Start Command: `uvicorn app.main:app --host 0.0.0.0 --port $PORT`
   - Environment Variables:
     - `LLM_PROVIDER`: ollama
     - `LLM_MODEL_NAME`: llama2
     - `LLM_TEMPERATURE`: 0.7
     - `LLM_MAX_TOKENS`: 1000
     - `ADS_API_TOKEN`: (your ADS API token)
     - `SOLR_URL`: https://api.adsabs.harvard.edu/v1/search/query
     - `CORS_ORIGINS`: https://search-comparisons-frontend.onrender.com
     - `ENVIRONMENT`: production
### Frontend Service

1. Create a new Web Service on Render
2. Connect your Git repository
3. Configure the following settings:
   - Name: `search-comparisons-frontend`
   - Environment: Node
   - Build Command: `npm install && npm run build`
   - Start Command: `npm start`
   - Environment Variables:
     - `REACT_APP_API_URL`: https://search-comparisons-backend.onrender.com
     - `NODE_ENV`: production
### Docker Deployment

Alternatively, you can deploy using Docker:

1. Create a new Web Service on Render
2. Select "Docker" as the environment
3. Point to your Dockerfile
4. Configure the same environment variables as above
### Health Checks

The backend service includes a health check endpoint at `/api/health`. Render will automatically monitor this endpoint.
### Environment Variables

Make sure to set up the following environment variables in your Render dashboard:

- `ADS_API_TOKEN`: Your ADS API token
- `LLM_PROVIDER`: The LLM provider to use (default: ollama)
- `LLM_MODEL_NAME`: The model name to use (default: llama2)
- `LLM_TEMPERATURE`: The temperature for LLM generation (default: 0.7)
- `LLM_MAX_TOKENS`: Maximum tokens to generate (default: 1000)
- `SOLR_URL`: The Solr API endpoint
- `CORS_ORIGINS`: Allowed CORS origins
- `ENVIRONMENT`: Set to "production" for production deployment
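One way the backend might read the LLM-related variables with the defaults listed above. This is a sketch with a hypothetical helper name; the project's actual configuration presumably lives in `backend/app/core`:

```python
import os

def load_llm_settings() -> dict:
    """Read the LLM environment variables, applying the documented defaults."""
    return {
        "provider": os.getenv("LLM_PROVIDER", "ollama"),
        "model_name": os.getenv("LLM_MODEL_NAME", "llama2"),
        "temperature": float(os.getenv("LLM_TEMPERATURE", "0.7")),
        "max_tokens": int(os.getenv("LLM_MAX_TOKENS", "1000")),
    }
```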
### Monitoring

Render provides built-in monitoring and logging. You can view:

- Application logs
- Build logs
- Health check status
- Resource usage

### Scaling

The service can be scaled horizontally by:

1. Going to the service settings
2. Adjusting the instance count
3. Setting up auto-scaling rules if needed

### Custom Domains

You can set up custom domains for both services:

1. Go to the service settings
2. Click on "Custom Domains"
3. Follow the instructions to add your domain
## Local Development

1. Clone the repository
2. Create a virtual environment:

   ```
   python -m venv venv
   ```

3. Activate the virtual environment:

   ```
   source venv/bin/activate
   ```

4. Install dependencies:

   ```
   pip install -r requirements.txt
   ```

5. Set up environment variables in a `.env` file
6. Run the development server:

   ```
   uvicorn app.main:app --reload
   ```
## Testing

Run tests with:

```
pytest
```

## Code Quality

The project uses:

- Ruff for linting
- Black for code formatting
- MyPy for type checking

Run these tools with:

```
ruff check .
black .
mypy .
```