This project is a full-stack research tool designed to help game developers analyze the competitive landscape on Steam. By inputting a game idea or description, developers can discover similar existing games, understand their market positioning, and gain insights into their strengths and weaknesses based on player reviews.
- Competitor Identification: Uses natural language processing to find the top 5 most similar games to your concept.
- Uniqueness Score: Calculates a "uniqueness score" to gauge how crowded the market is for your game idea.
- AI-Powered Review Analysis: Fetches recent reviews for competitor games and uses the Gemini API to summarize common praises and criticisms.
- Interactive Frontend: A clean, modern web interface built with Next.js and React to easily input your ideas and visualize the results.
- RESTful API: A robust backend built with FastAPI that serves the analysis results to the frontend.
The tool consists of a data pipeline, a backend API, and a web frontend.
- Data Pipeline:
  - A Kaggle dataset of Steam games is downloaded and processed.
  - Game descriptions and tags are cleaned and combined.
  - The `sentence-transformers` library is used to generate vector embeddings for each game's text, creating a numerical representation of its content.
  - The processed data and embeddings are saved for the API to use.
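The cleaning-and-combining step above can be sketched as follows. The column names (`description`, `tags`) and the exact cleaning rules are assumptions for illustration; the real logic lives in `preprocess.py`.

```python
import pandas as pd

def build_corpus(df: pd.DataFrame) -> pd.Series:
    """Clean each game's description and combine it with its tags into one text field."""
    desc = (
        df["description"].fillna("")
        .str.replace(r"<[^>]+>", " ", regex=True)   # strip store-page HTML markup
        .str.replace(r"\s+", " ", regex=True)       # collapse runs of whitespace
        .str.strip()
    )
    tags = df["tags"].fillna("").str.replace(",", " ")
    return (desc + " " + tags).str.strip()

# Toy rows standing in for the Kaggle dataset.
games = pd.DataFrame({
    "description": ["A <b>roguelike</b> dungeon crawler.", "Cozy farming sim."],
    "tags": ["roguelike,rpg", "farming,casual"],
})
corpus = build_corpus(games)
# The embedding step would then be roughly:
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("all-MiniLM-L6-v2")
#   embeddings = model.encode(corpus.tolist())
```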
- Backend API:
  - The user submits their game description to the `/analyze` endpoint.
  - The backend generates an embedding for the user's description.
  - Cosine similarity is used to compare the user's embedding to all game embeddings in the database, identifying the most similar games.
  - A uniqueness score is calculated based on the similarity of the top match.
  - The API asynchronously fetches the latest reviews for the top 5 competitor games from Steam.
  - The fetched reviews are passed to the Gemini API with a specialized prompt to extract common challenges and praises.
  - The results are sent back to the frontend as a JSON object.
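The similarity search and scoring steps can be sketched with NumPy. The `1 - top_similarity` formula is an assumption for illustration; the project's exact scoring isn't shown here.

```python
import numpy as np

def top_matches(query_vec: np.ndarray, game_vecs: np.ndarray, k: int = 5):
    """Return indices and scores of the k most similar games by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    g = game_vecs / np.linalg.norm(game_vecs, axis=1, keepdims=True)
    sims = g @ q                       # cosine similarity of every game to the query
    order = np.argsort(sims)[::-1][:k]
    return order, sims[order]

# Toy 3-dimensional "embeddings" standing in for MiniLM's 384-dim vectors.
games = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]])
query = np.array([1.0, 0.05, 0.0])
idx, sims = top_matches(query, games, k=2)

# One plausible uniqueness score (an assumption, not the project's exact formula):
uniqueness = 1.0 - float(sims[0])
```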
- Frontend:
  - The user enters their game idea into a textarea.
  - The frontend calls the backend API and displays the results in a user-friendly format, including the uniqueness score, a list of similar games with links to their Steam pages, and the AI-generated review summary.
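The payload the frontend consumes might look like the following. The field names, game name, and URL are illustrative; the real schema is defined by the backend's Pydantic response model.

```python
# Illustrative shape of the /analyze response (field names are assumptions).
response = {
    "uniqueness_score": 0.42,
    "similar_games": [
        {
            "name": "Dungeon Looter",  # hypothetical game
            "steam_url": "https://store.steampowered.com/app/0/",  # placeholder URL
            "similarity": 0.91,
        }
    ],
    "review_summary": {
        "praises": ["tight combat"],
        "criticisms": ["short campaign"],
    },
}

# The frontend renders the score, links each game to its Steam page,
# and shows the AI-generated summary.
lines = [
    f"{g['name']} ({g['similarity']:.0%}) -> {g['steam_url']}"
    for g in response["similar_games"]
]
```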
This project demonstrates a range of modern software engineering and data science skills:
- Backend Development:
  - Built a high-performance, asynchronous REST API using FastAPI.
  - Utilized Pydantic for robust data validation.
  - Implemented asynchronous network requests with `aiohttp` to efficiently fetch external data from the Steam API.
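The concurrent review fetching might look like this sketch, which targets Steam's public `appreviews` endpoint; the exact query parameters the project uses are an assumption.

```python
import asyncio

import aiohttp

STEAM_REVIEWS = "https://store.steampowered.com/appreviews/{appid}"

def review_params(num: int = 20) -> dict:
    # Query parameters for Steam's public appreviews endpoint; the exact
    # values used by the project are assumed.
    return {"json": 1, "filter": "recent", "language": "english", "num_per_page": num}

async def fetch_reviews(session: aiohttp.ClientSession, appid: int) -> list[str]:
    """Fetch recent review texts for one game."""
    url = STEAM_REVIEWS.format(appid=appid)
    async with session.get(url, params=review_params()) as resp:
        data = await resp.json()
        return [r["review"] for r in data.get("reviews", [])]

async def fetch_all(appids: list[int]) -> list[list[str]]:
    """Fetch reviews for all competitors concurrently over one session."""
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch_reviews(session, a) for a in appids))

# Example (not run here): asyncio.run(fetch_all([570, 730]))
```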
- Natural Language Processing (NLP):
  - Applied sentence embeddings (the `all-MiniLM-L6-v2` model) to represent and compare the semantic content of game descriptions.
  - Used cosine similarity to perform a semantic search for competitor games.
  - Leveraged TF-IDF to predict relevant tags for a given game description.
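One plausible way TF-IDF can drive tag prediction is to pool the tags of the nearest games in TF-IDF space; the project's exact method may differ, and the corpus below is a toy stand-in for the Kaggle dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus; in the project the vectorizer would be fit on the full dataset.
descriptions = [
    "roguelike dungeon crawler with permadeath",
    "cozy farming and life simulation",
    "fast paced dungeon crawler rpg",
]
tags = [["roguelike", "rpg"], ["farming", "casual"], ["dungeon-crawler", "rpg"]]

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(descriptions)

def predict_tags(description: str, k: int = 2) -> list[str]:
    """Predict tags by pooling the tags of the k nearest games in TF-IDF space."""
    sims = cosine_similarity(vectorizer.transform([description]), matrix)[0]
    nearest = sims.argsort()[::-1][:k]
    predicted: list[str] = []
    for i in nearest:
        for t in tags[i]:
            if t not in predicted:  # de-duplicate while preserving order
                predicted.append(t)
    return predicted

predicted = predict_tags("a dungeon crawler roguelike")
```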
- Generative AI:
  - Integrated the Google Gemini API for advanced text summarization, crafting a specific prompt to extract actionable insights from unstructured review data.
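A minimal sketch of that integration, assuming the `google-generativeai` package: the prompt wording and model name are illustrative, not the project's exact choices.

```python
import os

def build_review_prompt(game_name: str, reviews: list[str]) -> str:
    """Assemble the summarization prompt (wording is illustrative, not the
    project's exact prompt)."""
    joined = "\n".join(f"- {r}" for r in reviews)
    return (
        f"You are analyzing Steam reviews for the game '{game_name}'.\n"
        "From the reviews below, list the most common praises and the most "
        "common criticisms as short bullet points.\n\n"
        f"Reviews:\n{joined}"
    )

def summarize_reviews(prompt: str) -> str:
    """Send the prompt to Gemini. Requires the google-generativeai package and
    a GOOGLE_API_KEY environment variable; the model name is an assumption."""
    import google.generativeai as genai  # imported lazily so the sketch stays self-contained
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-1.5-flash")
    return model.generate_content(prompt).text

prompt = build_review_prompt("Dungeon Looter", ["Great combat!", "Too short."])
```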
- Frontend Development:
  - Developed a responsive and interactive user interface with Next.js, React, and TypeScript.
  - Styled the application with TailwindCSS for a modern and clean aesthetic.
- Full-Stack Architecture:
  - Designed and built a complete full-stack application, from data preprocessing to a user-facing web interface.
  - Managed a data pipeline that includes fetching, cleaning, and processing data for use in a machine learning application.
To run this project, you need to run the backend API and the frontend application separately.
- Navigate to the API directory: `cd api`
- Install Python dependencies: `pip install -r ../requirements.txt`
- Set up your environment variables:
  - Create a file named `.env` in the `api/` directory.
  - Add your Google API key to the `.env` file: `GOOGLE_API_KEY="YOUR_API_KEY_HERE"`
- Run the preprocessing script: `python ../preprocess.py`
  - If updating the data, run the preprocessor first, then upload the data to GitHub.
- Start the FastAPI server: `uvicorn main:app --reload`
- Navigate to the web directory: `cd web`
- Install Node.js dependencies and build the app: `npm install && npm run build`
- Start the Next.js server (serves the production build): `npm run start`
- Backend: Python, FastAPI, `sentence-transformers`, `scikit-learn`, `pandas`, `numpy`, Google Gemini API
- Frontend: Next.js, React, TypeScript, TailwindCSS
- Data: Kaggle, SQLite
- DevOps: `update_data.sh` script for data pipeline management