- 🔗 Published Project: http://13.200.181.230/
- 📑 Solution Deck: https://drive.google.com/file/d/1vC7BAlbA2t6g1qsu0TcUiXaOebk1pPnB/view?usp=sharing/
- 🧪 API Documentation: http://13.200.181.230/api/docs
- 📁 GitHub Repo: https://github.com/Vrittigyl/DataReplica
DataReplica is an AI-powered web tool that creates high-quality, privacy-safe synthetic datasets from limited or sensitive data. Users can analyze, customize, generate, and download synthetic data, which makes data sharing and ML prototyping safer and faster.
- 📥 Upload CSV datasets for analysis
- 🔍 Auto-detect column types (Numerical, Categorical, Datetime)
- 🤖 Generate synthetic data using:
  - CTGAN
  - TVAE
  - GaussianCopula
  - Auto-select best model
- ⚙️ Custom options:
  - Choose number of rows
  - Omit specific columns
  - Change column data types
- 📊 Generate detailed reports:
  - Data Quality Report (DQR)
  - Synthetic vs Real Data Comparison
- 🔐 Privacy score for synthetic data
- ⬇️ Download all outputs (data + reports) in CSV format
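
The column type auto-detection listed above can be illustrated with a small standard-library sketch. This is not DataReplica's actual implementation: the helper names (`detect_column_types`, `_is_number`, `_is_datetime`) and the list of date formats are assumptions made for illustration only.

```python
import csv
import io
from datetime import datetime

# Assumption: only these date formats are tried; a real detector would need more.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y-%m-%d %H:%M:%S")


def _is_number(value: str) -> bool:
    """True if the cell parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def _is_datetime(value: str) -> bool:
    """True if the cell matches one of the assumed date formats."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def detect_column_types(csv_text: str) -> dict[str, str]:
    """Label each CSV column as Numerical, Datetime, or Categorical."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = {name: [] for name in (reader.fieldnames or [])}
    for row in reader:
        for name in columns:
            value = (row.get(name) or "").strip()
            if value:  # ignore missing cells when inferring the type
                columns[name].append(value)

    types = {}
    for name, values in columns.items():
        if values and all(_is_number(v) for v in values):
            types[name] = "Numerical"
        elif values and all(_is_datetime(v) for v in values):
            types[name] = "Datetime"
        else:
            types[name] = "Categorical"  # fallback, also used for empty columns
    return types


sample = "age,city,joined\n34,Pune,2023-01-15\n29,Delhi,2022-11-02\n"
print(detect_column_types(sample))
# {'age': 'Numerical', 'city': 'Categorical', 'joined': 'Datetime'}
```

A column is only labelled Numerical or Datetime when every non-empty cell parses, so mixed columns fall back to Categorical. That is one reason a manual type override is useful.
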
- Upload your CSV dataset
- The system analyzes the data and displays the detected type of each column
- Select your preferences:
  - Synthetic model
  - Row count
  - Omitted columns
  - Data type overrides
  - Enable/disable reports
- Generate synthetic data
- View privacy score and download outputs
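
As a rough illustration of how the preferences in the steps above might be combined with the detected column types before generation, here is a minimal sketch. The `GenerationPreferences` class, its field names, its defaults, and the `resolve_schema` function are all hypothetical and are not taken from the DataReplica codebase.

```python
from dataclasses import dataclass, field

# Model and type names come from this README; "auto" stands in for auto-selection.
ALLOWED_MODELS = {"CTGAN", "TVAE", "GaussianCopula", "auto"}
ALLOWED_TYPES = {"Numerical", "Categorical", "Datetime"}


@dataclass
class GenerationPreferences:
    """Hypothetical container for the choices made in the workflow."""
    model: str = "auto"
    num_rows: int = 1000  # assumed default, not from the project
    omit_columns: list[str] = field(default_factory=list)
    type_overrides: dict[str, str] = field(default_factory=dict)
    include_dqr: bool = True
    include_comparison: bool = True


def resolve_schema(detected: dict[str, str], prefs: GenerationPreferences) -> dict[str, str]:
    """Validate the preferences and return the final column -> type mapping."""
    if prefs.model not in ALLOWED_MODELS:
        raise ValueError(f"unknown model: {prefs.model}")
    if prefs.num_rows <= 0:
        raise ValueError("num_rows must be positive")

    referenced = set(prefs.omit_columns) | set(prefs.type_overrides)
    unknown = referenced - set(detected)
    if unknown:
        raise ValueError(f"columns not in dataset: {sorted(unknown)}")

    bad_types = set(prefs.type_overrides.values()) - ALLOWED_TYPES
    if bad_types:
        raise ValueError(f"unsupported types: {sorted(bad_types)}")

    # Drop omitted columns first, then apply overrides to what remains.
    schema = {c: t for c, t in detected.items() if c not in prefs.omit_columns}
    for column, new_type in prefs.type_overrides.items():
        if column in schema:
            schema[column] = new_type
    return schema


detected = {"age": "Numerical", "zip": "Numerical", "name": "Categorical"}
prefs = GenerationPreferences(
    model="CTGAN",
    num_rows=500,
    omit_columns=["name"],
    type_overrides={"zip": "Categorical"},
)
print(resolve_schema(detected, prefs))
# {'age': 'Numerical', 'zip': 'Categorical'}
```

Validating column names up front means a typo in an omitted or overridden column raises an error before any model training starts.
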
📍 URL: http://13.200.181.230/api/docs
- Use the `/generate/` endpoint to submit:
  - CSV file
  - Model type
  - Number of synthetic rows
  - Column modifications
  - Boolean flags for reports
- Download synthetic data and reports from the returned links
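
A call to the endpoint described above might look like the following sketch. The form field names (`file`, `model_type`, `num_rows`, `column_modifications`, `generate_dqr`, `generate_comparison`), the JSON encoding of the column modifications, the default base URL, and the shape of the response are all assumptions. Check the interactive API docs for the real schema before using it.

```python
import json


def build_generate_form(model, num_rows, omit_columns=(), type_overrides=None,
                        dqr=True, comparison=True):
    """Build the form fields for a /generate/ request (field names are hypothetical)."""
    return {
        "model_type": model,
        "num_rows": str(num_rows),
        # Assumption: column modifications are sent as a JSON string.
        "column_modifications": json.dumps(
            {"omit": list(omit_columns), "types": type_overrides or {}}
        ),
        "generate_dqr": str(dqr).lower(),
        "generate_comparison": str(comparison).lower(),
    }


def submit(csv_path, form, base_url="http://localhost:8000"):
    """POST the CSV and form fields; requires the third-party `requests` package."""
    import requests  # imported lazily so the builder above works without it

    with open(csv_path, "rb") as fh:
        response = requests.post(
            f"{base_url}/generate/",
            data=form,
            files={"file": fh},
            timeout=300,
        )
    response.raise_for_status()
    # Assumption: the API answers with JSON containing the download links.
    return response.json()


form = build_generate_form("CTGAN", 500, omit_columns=["id"])
print(form["column_modifications"])
# {"omit": ["id"], "types": {}}
```

With the backend running locally, `submit("data.csv", form)` would send the request, where `data.csv` is a placeholder path.
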
- Clone the repository:

      git clone https://github.com/Vrittigyl/DataReplica.git
      cd DataReplica

- Set up the environment and run the backend:

      pip install -r requirements.txt
      uvicorn backend.main:app --reload

- Open the FastAPI docs in your browser:

      http://localhost:8000/docs

- To start the frontend (optional):

      cd frontend
      npm install
      npm run dev