A web-based tool for comparing search result similarities across multiple academic search indexers, including SciX, Google Scholar, Web of Science, and Semantic Scholar.
NOTE: Web of Science is not yet configured in the current code.
- Multi-Source Search: Query multiple academic search engines simultaneously
- Advanced Similarity Metrics: Compare results using Jaccard similarity, rank-biased overlap, and more
- Visualization: Interactive charts and Venn diagrams to visualize overlap and similarities
- SciX Ranking Modifier: Experiment with modifications to SciX's ranking algorithm
- Flexible Metadata Comparison: Configure which metadata fields to consider in comparisons
- Detailed Analysis: View comprehensive tables and charts of comparison results
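The two metrics named in the feature list can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the function names are hypothetical, each result is assumed to be reduced to a comparable identifier (a DOI, for example) with no duplicates within a list, and the rank-biased overlap shown is the truncated form without extrapolation, so identical lists of finite length score below 1.

```python
from typing import Hashable, Sequence


def jaccard(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Set overlap |A & B| / |A | B|; rank order is ignored."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0  # convention chosen here: two empty result sets count as identical
    return len(set_a & set_b) / len(union)


def rbo_truncated(a: Sequence[Hashable], b: Sequence[Hashable], p: float = 0.9) -> float:
    """Truncated rank-biased overlap: (1 - p) * sum over d of p**(d-1) * A_d,
    where A_d is the fraction of shared items in the top d of both lists.
    Evaluated only down to the shorter list's length (no extrapolation)."""
    depth = min(len(a), len(b))
    seen_a, seen_b = set(), set()
    score = 0.0
    for d in range(1, depth + 1):
        seen_a.add(a[d - 1])
        seen_b.add(b[d - 1])
        agreement = len(seen_a & seen_b) / d
        score += p ** (d - 1) * agreement
    return (1 - p) * score


# Hypothetical ranked identifiers from two sources
scix = ["A", "B", "C"]
scholar = ["B", "A", "D"]
print(jaccard(scix, scholar))  # 0.5
print(round(rbo_truncated(scix, scholar), 3))
```

Jaccard treats the result lists as sets, while rank-biased overlap weights agreement near the top of the lists more heavily; a smaller `p` concentrates more of the weight on the first few ranks.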
For a streamlined setup experience, you can use the included startup script. Run the following commands:
git clone https://github.com/sjarmak/search-engine-comparator.git
cd search-engine-comparator
chmod +x startup.sh
./startup.sh
If you do not already have a .env file in the backend folder, create one before running search engine comparisons that require API keys (ADS/SciX, WoS):
cd backend
nano .env
Add your API keys to the file, then press Ctrl+X followed by Y and Enter to save and exit. Then cd back to the root directory and run ./startup.sh
This script will automatically:
- Check prerequisites
- Create and activate a Python virtual environment
- Install backend and frontend dependencies
- Create a template .env file if one doesn't exist
- Apply SSL certificate fixes for macOS users
- Start both the backend and frontend servers
After running the script, the application will be available at:
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
Press Ctrl+C in the terminal running the script to stop both services.
academic-search-comparator/
├── startup.sh                  # Automated setup and startup script
├── frontend/                   # React frontend
│   ├── public/
│   ├── src/
│   │   ├── components/         # React components
│   │   │   ├── ComparisonResults.js
│   │   │   ├── MetricsTable.js
│   │   │   ├── ResultsTable.js
│   │   │   ├── SciXModifier.js
│   │   │   └── VennDiagram.js
│   │   ├── App.js              # Main application component
│   │   └── index.js            # Entry point
│   └── package.json            # Frontend dependencies
│
└── backend/                    # FastAPI backend
    ├── main.py                 # Main API implementation
    ├── fix_macos_certs.py      # macOS SSL certificate fix script
    └── requirements.txt        # Backend dependencies
If you prefer to set up each component manually, or if you encounter issues with the startup script, follow these instructions:
- Python 3.8+ for the backend
- Node.js and npm for the frontend
- API keys for the academic search services:
  - NASA SciX (ADS) API key
  - Semantic Scholar API key (optional)
  - Web of Science API key (if using WoS)
- Create a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
cd backend
pip install -r requirements.txt
Make sure your requirements.txt includes:
fastapi
uvicorn
httpx
python-dotenv
nltk
scholarly
beautifulsoup4
certifi
python-certifi-win32 # for Windows users
- Create a .env file in the backend directory with your API keys:
ADS_API_KEY=your_ads_api_key
SEMANTIC_SCHOLAR_API_KEY=your_semantic_scholar_api_key
WOS_API_KEY=your_web_of_science_api_key
- For macOS users: Fix SSL certificate issues by running:
python fix_macos_certs.py
Then add the recommended environment variables to your shell profile (.zshrc or .bash_profile).
- Run the backend server:
For macOS users:
PYTHONHTTPSVERIFY=0 python -m uvicorn main:app --reload
For Windows/Linux users:
uvicorn main:app --reload
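Before starting the backend, it can help to confirm that the .env file from the earlier step holds real keys rather than the template placeholders. The sketch below uses only the standard library and is not part of the project: the helper names are hypothetical, the required/optional split is an assumption drawn from the prerequisites list, and the `your_` prefix check simply mirrors the placeholder values shown in the template.

```python
from pathlib import Path
from typing import Dict, List, Sequence

# Assumption: only the ADS/SciX key is treated as mandatory here.
REQUIRED_KEYS = ("ADS_API_KEY",)
OPTIONAL_KEYS = ("SEMANTIC_SCHOLAR_API_KEY", "WOS_API_KEY")


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines; blank lines and # comments are skipped."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: Dict[str, str], required: Sequence[str] = REQUIRED_KEYS) -> List[str]:
    """Return required keys that are absent, empty, or still template placeholders."""
    return [k for k in required if not values.get(k) or values[k].startswith("your_")]


def check_env_file(path: str = ".env") -> List[str]:
    """Read the file at `path` (if it exists) and report unusable required keys."""
    env_path = Path(path)
    if not env_path.exists():
        return list(REQUIRED_KEYS)
    return missing_keys(parse_env_text(env_path.read_text()))
```

Calling `check_env_file()` from the backend directory returns an empty list when every required key looks usable.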
- Install dependencies:
cd frontend
npm install
- Start the development server:
npm start
The application should now be running at http://localhost:3000.
If you encounter issues with the startup script:
- Check that you have the necessary permissions to execute the script
- Try running the script with bash explicitly:
bash startup.sh
- Verify that both frontend and backend directories exist in the same folder as the script
- Follow the manual setup instructions as an alternative
If you encounter SSL certificate verification errors on macOS:
- Run the fix_macos_certs.py script to configure your certificates
- Add the environment variables to your shell profile as instructed by the script
- Run the backend server with the PYTHONHTTPSVERIFY=0 environment variable set, as shown above
Google Scholar sometimes blocks automated access. If you encounter issues:
- Ensure you're using the latest version of the code with the direct HTML parsing approach
- Try different search terms to test functionality
- The application has fallback mechanisms to use alternative Google Scholar access methods
If you need to use proxies for enhanced reliability:
- The application uses free proxies by default
- For production use, consider configuring a dedicated proxy service
- Update the setup_scholarly_proxy() function in main.py with your proxy details
- Enter a Search Query: Type your academic search query in the main search box.
- Select Data Sources: Choose which academic search engines to query.
- Choose Similarity Metrics: Select which metrics to use for comparing results.
- Select Metadata Fields: Choose which fields to consider in comparisons (Title, Abstract, Authors, DOI, Year).
- Run Comparison: Click "Compare Search Results" to execute the search and view results.
- Analyze Results: Explore the Overview, Detailed Results, and SciX Modifier tabs to analyze the comparison data.
- Experiment with SciX Ranking: In the SciX Modifier tab, adjust parameters to see how they affect the ranking of search results.
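The choice of metadata fields determines when two results from different sources count as the same paper. One way to picture this is a normalized key built from the selected fields. The sketch below is an assumption about how such matching could work, not the tool's actual logic; the record layout and function names are hypothetical.

```python
import re
from typing import Iterable, List, Sequence, Set, Tuple


def normalize(value) -> str:
    """Lowercase, join list values, and collapse punctuation and whitespace."""
    if isinstance(value, (list, tuple)):
        text = " ".join(str(v) for v in value)
    else:
        text = str(value or "")
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def record_key(record: dict, fields: Sequence[str]) -> Tuple[str, ...]:
    """Build a comparison key from the selected metadata fields only."""
    return tuple(normalize(record.get(f)) for f in fields)


def overlap(results_a: Iterable[dict], results_b: Iterable[dict],
            fields: Sequence[str] = ("title",)) -> Set[Tuple[str, ...]]:
    """Keys present in both result lists under the chosen fields."""
    keys_a = {record_key(r, fields) for r in results_a}
    keys_b = {record_key(r, fields) for r in results_b}
    return keys_a & keys_b


# Hypothetical records from two sources
source_a: List[dict] = [{"title": "Dark Matter: A Review", "year": 2020}]
source_b: List[dict] = [{"title": "dark matter - a review", "year": 2021}]

print(len(overlap(source_a, source_b, fields=("title",))))
print(len(overlap(source_a, source_b, fields=("title", "year"))))
```

Matching on title alone treats the two records as the same paper, while adding the year makes them distinct, which is why the selected fields can change the reported overlap.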
The SciX Modifier tab allows you to experiment with different ranking parameters:
- Title Keywords: Boost articles with specific keywords in the title.
- Recency Boost: Give higher ranking to more recent publications.
- Weight Adjustments: Modify the weights for authors and citations in the ranking algorithm.
Apply modifications to see how they affect the ranked results compared to the original SciX ordering.
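To make the effect of these parameters concrete, the following sketch re-ranks a small list with a keyword boost, a recency boost, and a citation weight. Everything in it is hypothetical: the `Paper` fields, the parameter names, and the scoring formula are illustrative assumptions, not SciX's ranking algorithm or the code behind the SciX Modifier tab, and the author weight is left out because the text does not say how it is defined.

```python
import math
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Sequence


@dataclass
class Paper:
    title: str
    year: int
    citations: int
    base_score: float  # stand-in for the original relevance score


def modified_score(paper: Paper,
                   title_keywords: Sequence[str] = (),
                   keyword_boost: float = 0.0,
                   recency_boost: float = 0.0,
                   citation_weight: float = 0.0,
                   current_year: Optional[int] = None) -> float:
    """Add illustrative boosts on top of the base score."""
    year_now = current_year if current_year is not None else date.today().year
    score = paper.base_score
    title = paper.title.lower()
    if any(k.lower() in title for k in title_keywords):
        score += keyword_boost  # flat bonus when any keyword appears in the title
    age = max(year_now - paper.year, 0)
    score += recency_boost / (1 + age)  # bonus decays with publication age
    score += citation_weight * math.log1p(paper.citations)  # damped citation count
    return score


def rerank(papers: List[Paper], **params) -> List[Paper]:
    """Return papers sorted by modified score, highest first."""
    return sorted(papers, key=lambda p: modified_score(p, **params), reverse=True)


papers = [
    Paper("Old survey of galaxy clusters", 2000, 10, 1.0),
    Paper("New exoplanet result", 2024, 10, 0.9),
]
boosted = rerank(papers, recency_boost=1.0, current_year=2025)
print([p.title for p in boosted])
```

With all boosts at zero the original order is preserved; raising `recency_boost` moves the newer paper ahead, which is the kind of before-and-after comparison the tab is meant to support.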
The backend API is documented using FastAPI's automatic documentation. Once the backend server is running, visit:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
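FastAPI also serves the underlying OpenAPI schema as JSON, by default at /openapi.json, which is useful for listing endpoints from a script. The sketch below assumes the backend is running on the default port and that the schema URL has not been changed in main.py; the helper names are hypothetical.

```python
import json
from typing import List
from urllib.request import urlopen

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def fetch_schema(base_url: str = "http://localhost:8000") -> dict:
    """Download the OpenAPI schema from a running backend."""
    with urlopen(f"{base_url}/openapi.json", timeout=5) as resp:
        return json.load(resp)


def list_endpoints(schema: dict) -> List[str]:
    """Return sorted 'METHOD /path' strings from an OpenAPI schema dict."""
    endpoints = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            if method.lower() in HTTP_METHODS:  # skip non-operation keys such as "parameters"
                endpoints.append(f"{method.upper()} {path}")
    return sorted(endpoints)
```

With the server running, `list_endpoints(fetch_schema())` prints the same routes that the Swagger UI page displays.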
To add a new academic search source, you'll need to:
- Implement a new function in the backend to retrieve results from the source
- Update the frontend to include the new source in the selection options
- Modify the comparison functions to handle the new source
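One possible shape for the backend side of these steps is a small registry that maps a source name to a search function with a common signature. This is a sketch of the idea only: the registry, the decorator, the `exampleIndex` source, and the result fields are all hypothetical and do not reflect how main.py is organized.

```python
from typing import Callable, Dict, List, Sequence

# Assumed common signature: (query, limit) -> list of result dicts
SearchFn = Callable[[str, int], List[dict]]
SOURCES: Dict[str, SearchFn] = {}


def register_source(name: str) -> Callable[[SearchFn], SearchFn]:
    """Decorator that adds a search function to the registry under `name`."""
    def decorator(fn: SearchFn) -> SearchFn:
        SOURCES[name] = fn
        return fn
    return decorator


@register_source("exampleIndex")
def search_example_index(query: str, limit: int) -> List[dict]:
    """Placeholder source returning canned data; a real one would call the service's API."""
    canned = [
        {"title": f"{query} paper {i}", "doi": f"10.0000/example.{i}", "rank": i}
        for i in range(1, 4)
    ]
    return canned[:limit]


def search_all(query: str, sources: Sequence[str], limit: int = 10) -> Dict[str, List[dict]]:
    """Run the query against each requested source and group results by source name."""
    unknown = [s for s in sources if s not in SOURCES]
    if unknown:
        raise KeyError(f"Unregistered sources: {unknown}")
    return {name: SOURCES[name](query, limit) for name in sources}


results = search_all("dark matter", ["exampleIndex"], limit=2)
print([r["title"] for r in results["exampleIndex"]])
```

Keeping every source behind the same signature means the comparison functions only need to handle one result shape, and the frontend's list of selectable sources can be derived from the registry keys.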
This project is licensed under the MIT License - see the LICENSE file for details.
- NASA ADS/SciX for providing the ADS API
- Google Scholar for their academic search service
- Semantic Scholar for their open research API
- Web of Science for their comprehensive academic database