REST API backend for serving DigitalChild data to the Phase 4 research dashboard.
# Activate virtual environment
source .LittleRainbow/bin/activate
# Install dependencies
pip install -r requirements.txt -r api_requirements.txt
# Create .env file from template
cp .env.example .env
# Edit .env and configure as needed
python run_api.py
The API will be available at http://127.0.0.1:5000
# Test health check
curl http://127.0.0.1:5000/api/health
# Test system info
curl http://127.0.0.1:5000/api/info
GET /api/health
- Returns API health status
- Use for monitoring and load balancer health checks
GET /api/info
- Returns system information and data statistics
- Includes document counts, scorecard coverage, and data freshness
GET /api/documents
- List documents with filtering and pagination
- Query parameters:
  - country: Filter by country name
  - region: Filter by region
  - source: Filter by source (e.g., "au_policy", "upr")
  - doc_type: Filter by document type
  - tags: Comma-separated list of tags
  - year: Filter by specific year
  - year_min, year_max: Filter by year range
  - page: Page number (default: 1)
  - per_page: Items per page (default: 20, max: 100)
  - sort_by: Field to sort by (default: "last_processed")
  - sort_order: "asc" or "desc" (default: "desc")
Example:
curl "http://localhost:5000/api/documents?region=Africa&year_min=2020&per_page=10"
GET /api/documents/:id
- Get detailed information for a single document
- Returns full document metadata with tags_history
- Cached for 15 minutes
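The filter and pagination parameters of GET /api/documents can also be assembled from Python. The following is a minimal client-side sketch, not part of the project: build_documents_url and BASE_URL are hypothetical names, and the only facts taken from this document are the parameter names, the comma-separated tags format, and the per_page maximum of 100.

```python
from urllib.parse import urlencode

# Assumption: the API runs at the local development address used in the quick start.
BASE_URL = "http://127.0.0.1:5000"


def build_documents_url(base_url=BASE_URL, **filters):
    """Build a /api/documents URL, dropping filters that are unset (None)."""
    params = {name: value for name, value in filters.items() if value is not None}
    # The API documents `tags` as a comma-separated list.
    if isinstance(params.get("tags"), (list, tuple)):
        params["tags"] = ",".join(params["tags"])
    # The documented maximum page size is 100, so clamp on the client side.
    if "per_page" in params:
        params["per_page"] = max(1, min(int(params["per_page"]), 100))
    query = urlencode(params)
    return f"{base_url}/api/documents" + (f"?{query}" if query else "")


url = build_documents_url(region="Africa", year_min=2020, per_page=10)
print(url)
```

Clamping per_page locally only avoids sending a request that exceeds the documented maximum; the server-side validators remain the authority on what is accepted.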
GET /api/scorecard
- List all countries in scorecard with summary
- Query parameters:
  - region: Filter by region (optional)
  - page: Page number (default: 1)
  - per_page: Items per page (default: 20, max: 100)
Example:
curl "http://localhost:5000/api/scorecard?region=Africa&per_page=20"
GET /api/scorecard/:country
- Get full scorecard details for a specific country
- Returns all 10 indicators with sources
- Cached for 1 hour
Example:
curl "http://localhost:5000/api/scorecard/Kenya"
GET /api/scorecard/indicators/statistics
- Get statistics about indicator values across all countries
- Returns value distribution for each indicator
- Cached for 1 hour
GET /api/tags
- Get tag frequency analysis across documents
- Query parameters:
  - version: Tag version (e.g., "tags_v3", "digital", "queerai")
  - country: Filter by country name
  - region: Filter by region
  - year: Filter by specific year
  - year_min, year_max: Filter by year range
Example:
curl "http://localhost:5000/api/tags?version=tags_v3&region=Africa&year_min=2020"
GET /api/tags/versions
- Get list of available tag versions
- Returns array of version identifiers
Example:
curl "http://localhost:5000/api/tags/versions"
GET /api/timeline/tags
- Get temporal analysis of tags over time (year × tag matrix)
- Query parameters:
  - version: Tag version (optional)
  - year_min, year_max: Filter by year range (optional)
  - country: Filter by country (optional)
  - region: Filter by region (optional)
Example:
curl "http://localhost:5000/api/timeline/tags?version=tags_v3&year_min=2018&year_max=2024"
GET /api/export
- List available export formats
- Returns format ID, filename, and description for each format
Example:
curl "http://localhost:5000/api/export"
GET /api/export/:format
- Download dataset in CSV format
- Available formats:
  - scorecard_summary: Scorecard data for all countries
  - tags_summary: Tag frequency across all documents
  - documents_list: Complete document list with metadata
- Query parameters (for tags_summary):
version: Tag version (optional)
Example:
curl "http://localhost:5000/api/export/scorecard_summary" -o scorecard.csv
curl "http://localhost:5000/api/export/tags_summary?version=tags_v3" -o tags.csv
All CSV exports include SPDX license headers (CC-BY-4.0) for data attribution.
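Because the exports carry license header lines ahead of the data, a consumer may need to separate the two before parsing. The sketch below shows one way to do that under a stated assumption: it treats leading lines that start with "#" as the license header. The exact header layout is not specified in this document, so check a real export first. The function name read_export_csv and the sample columns are hypothetical.

```python
import csv
import io


def read_export_csv(text, comment_prefix="#"):
    """Split leading comment lines (assumed license header) from the CSV rows."""
    lines = text.splitlines(keepends=True)
    n = 0
    # Only the leading run of comment lines is treated as header material.
    while n < len(lines) and lines[n].startswith(comment_prefix):
        n += 1
    header = [line[len(comment_prefix):].strip() for line in lines[:n]]
    # Re-join the remaining text unchanged so quoted fields stay intact.
    body = io.StringIO("".join(lines[n:]), newline="")
    rows = list(csv.DictReader(body))
    return header, rows


# Hypothetical sample; real exports may use different columns and header text.
sample = (
    "# SPDX-License-Identifier: CC-BY-4.0\n"
    "country,region\n"
    "Kenya,Africa\n"
)
header, rows = read_export_csv(sample)
print(header)
print(rows)
```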
- ✅ API directory structure created
- ✅ Configuration management (development, production, testing)
- ✅ Flask extensions (CORS, Caching, Rate Limiting)
- ✅ Flask app factory pattern
- ✅ Metadata service layer with caching
- ✅ Scorecard service layer (works with pandas DataFrames)
- ✅ Health check routes
- ✅ Standard response formatting and error handling
- ✅ Request validators
- ✅ API requirements file
- ✅ Environment configuration template
- ✅ Development and production entry points
- ✅ Documents API (list with filters, detail)
- ✅ Scorecard API (summary, country detail, statistics)
- ✅ Caching decorators (15min documents, 1hr scorecard)
- ✅ Request validation for all parameters
- ✅ Pagination support (configurable page size)
- ✅ Sorting support (any field, asc/desc)
- ✅ 104 test cases written (100% pass rate)
- ✅ All 14 endpoints working and tested
- ✅ Tags API (frequency analysis, version management)
  - GET /api/tags (with filters)
  - GET /api/tags/versions
- ✅ Timeline API (temporal analysis)
  - GET /api/timeline/tags (year × tag matrix)
- ✅ Export API (CSV downloads)
  - GET /api/export (list formats)
  - GET /api/export/:format (download CSV)
- ✅ SPDX license headers in CSV exports
- ✅ 31 test cases written for Week 3 endpoints
- ✅ All 14 endpoints now working (76 total tests passing)
- ✅ API key authentication middleware
  - @require_api_key decorator for protected endpoints
  - @optional_api_key for flexible authentication
  - X-API-Key header validation
  - Development mode auto-allow for testing
- ✅ Rate limiting implementation
  - Dynamic limits based on authentication status
  - Public: 100 requests/hour default
  - Authenticated: 1000 requests/hour default
  - Custom limits for expensive operations (exports: 20/200 per hour)
  - Search operations: 200/2000 per hour
- ✅ Flask-Limiter integration
  - Custom rate limit key function (API key or IP)
  - Redis storage for production
  - Memory storage for development
- ✅ Applied to key endpoints
  - Documents list with search rate limits
  - Export downloads with strict limits
  - Optional authentication throughout
- ✅ 28 test cases for authentication and rate limiting
- ✅ All 104 tests passing (100% success rate)
- ✅ Docker deployment
  - Multi-stage Dockerfile with security best practices
  - docker-compose.yml with Redis and Nginx
  - Health checks and non-root user
- ✅ Nginx configuration
  - Reverse proxy setup
  - SSL/TLS configuration
  - Security headers
  - Gzip compression
- ✅ Production deployment guide
  - Complete setup instructions
  - Docker and manual deployment options
  - SSL certificate setup (Let's Encrypt)
  - Monitoring and logging configuration
  - Security checklist
  - Troubleshooting guide
- ✅ Configuration management
  - Environment-based settings
  - Production validation
  - API key management
- ✅ Ready for production deployment
- ✅ Standard JSON response format
- ✅ Error handling with custom exceptions
- ✅ File modification time caching for metadata
- ✅ Pandas DataFrame support for scorecard data
- ✅ Environment-based configuration
- ✅ CORS support for frontend integration
- ✅ Rate limiting ready (in-memory for dev, Redis for prod)
- ✅ Logging with configurable levels
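The rate-limiting behavior listed above (a key derived from the API key or the client IP, with separate public and authenticated tiers) could be sketched in plain Python as follows. This is an illustration under assumptions, not the project's middleware: RATE_LIMITS, rate_limit_key, and limit_for are hypothetical names, and only the per-hour numbers come from the list above. The "N per hour" strings follow the notation Flask-Limiter accepts.

```python
# Requests per hour, taken from the status list above.
RATE_LIMITS = {
    "default": {"public": 100, "authenticated": 1000},
    "export": {"public": 20, "authenticated": 200},
    "search": {"public": 200, "authenticated": 2000},
}


def rate_limit_key(headers, remote_addr, valid_keys):
    """Bucket requests by API key when a valid one is sent, else by client IP."""
    api_key = headers.get("X-API-Key")
    if api_key and api_key in valid_keys:
        return f"key:{api_key}"
    return f"ip:{remote_addr}"


def limit_for(key, operation="default"):
    """Return the limit string for a bucket key and an operation type."""
    tier = "authenticated" if key.startswith("key:") else "public"
    per_hour = RATE_LIMITS.get(operation, RATE_LIMITS["default"])[tier]
    return f"{per_hour} per hour"


valid = {"key1", "key2"}
anonymous = rate_limit_key({}, "203.0.113.7", valid)
trusted = rate_limit_key({"X-API-Key": "key1"}, "203.0.113.7", valid)
print(anonymous, limit_for(anonymous, "export"))
print(trusted, limit_for(trusted, "search"))
```

Keying on the API key rather than the IP means several authenticated clients behind one shared address do not consume each other's quota.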
api/
├── __init__.py               # Package initialization
├── app.py                    # Flask app factory
├── config.py                 # Configuration classes
├── extensions.py             # Flask extensions init
├── routes/                   # API endpoints
│   ├── health.py             # Health & info endpoints
│   └── ...                   # (More routes in Week 2+)
├── services/                 # Business logic layer
│   ├── metadata_service.py   # Document metadata
│   ├── scorecard_service.py  # Scorecard data
│   └── ...                   # (More services in Week 2+)
├── middleware/               # Request/response processing
│   └── error_handlers.py     # Exception handling
└── utils/                    # Helper functions
    ├── response.py           # Response formatting
    └── validators.py         # Input validation
Services wrap existing processors/ modules with API-friendly formatting:
# Example: metadata_service.py
from processors.logger import get_logger

def get_documents(filters, page, per_page):
    """Load metadata.json, apply filters, paginate"""
    metadata = load_metadata()  # With file mtime caching
    docs = metadata.get("documents", [])
    # Apply filters...
    # Paginate...
    return {"documents": [...], "pagination": {...}}

All endpoints return standardized JSON:
Success:
{
  "status": "success",
  "data": {...},
  "timestamp": "2026-01-25T09:13:43Z"
}
Error:
{
  "status": "error",
  "error": {
    "code": "NOT_FOUND",
    "message": "Resource not found",
    "details": {}
  },
  "timestamp": "2026-01-25T09:13:43Z"
}
Environment variables (see .env.example):
- FLASK_ENV: development | production | testing
- SECRET_KEY: Flask secret key (required in production)
- API_KEYS: Comma-separated API keys (required in production)
- CORS_ORIGINS: Allowed CORS origins
- CACHE_TYPE: SimpleCache (dev) | RedisCache (prod)
- METADATA_FILE: Path to metadata.json
- SCORECARD_FILE: Path to scorecard_main.xlsx
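To illustrate how these variables might be read and validated at startup, here is a minimal sketch. It is not the contents of api/config.py: load_settings is a hypothetical helper, and the fallback values (development as the default environment, and the cache type chosen per environment) are assumptions based on the descriptions above.

```python
import os


def load_settings(env=None):
    """Read API settings from a mapping (defaults to os.environ) and validate them."""
    env = os.environ if env is None else env
    flask_env = env.get("FLASK_ENV", "development")  # assumed default
    # API_KEYS is documented as a comma-separated list; drop blanks and whitespace.
    api_keys = [k.strip() for k in env.get("API_KEYS", "").split(",") if k.strip()]
    default_cache = "RedisCache" if flask_env == "production" else "SimpleCache"
    settings = {
        "FLASK_ENV": flask_env,
        "SECRET_KEY": env.get("SECRET_KEY"),
        "API_KEYS": api_keys,
        "CACHE_TYPE": env.get("CACHE_TYPE", default_cache),
    }
    # SECRET_KEY and API_KEYS are documented as required in production.
    if flask_env == "production":
        missing = [name for name in ("SECRET_KEY", "API_KEYS") if not settings[name]]
        if missing:
            raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return settings


dev = load_settings({"API_KEYS": "key1, key2,,key3"})
print(dev["API_KEYS"], dev["CACHE_TYPE"])
```

Failing fast on missing production secrets surfaces a misconfigured deployment at startup instead of at the first authenticated request.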
All 5 weeks of the Phase 4 API implementation are complete:
- ✅ Week 1: Foundation (app factory, config, extensions, middleware)
- ✅ Week 2: Core APIs (documents, scorecard endpoints)
- ✅ Week 3: Extended APIs (tags, timeline, exports)
- ✅ Week 4: Authentication & rate limiting
- ✅ Week 5: Production deployment ready
Final Statistics:
- 14 REST endpoints operational
- 104 integration tests passing (100% success rate)
- Authentication: API key based with flexible decorators
- Rate limiting: Dynamic limits (100-2000 req/hr based on auth)
- Deployment: Docker + docker-compose + Nginx ready
- Documentation: Complete API docs + production guide
Optional improvements for future iterations:
- Swagger/OpenAPI specification
- Interactive API explorer at /api/docs
- Auto-generated client libraries
- GraphQL endpoint for flexible queries
- Webhook support for data updates
- Batch operations API
- API versioning (v2)
- Database integration (PostgreSQL)
- Full-text search (Elasticsearch)
- CDN integration for exports
- Query result streaming
- API usage analytics dashboard
- Per-endpoint performance metrics
- User behavior tracking
- Cost per API call analysis
- OAuth 2.0 / JWT authentication
- IP whitelisting
- Request signature validation
- DDoS protection (Cloudflare integration)
# Install production dependencies
pip install -r api_requirements.txt
# Set environment
export FLASK_ENV=production
export SECRET_KEY=your-secret-key
export API_KEYS=key1,key2,key3
# Run with gunicorn
gunicorn -w 4 -b 0.0.0.0:5000 wsgi:app
# Build image
docker build -t digitalchild-api .
# Run container
docker run -p 5000:5000 --env-file .env digitalchild-api
- Requires Python 3.12+
- All data files must exist before starting API
- Run python init_project.py if metadata.json doesn't exist
- Services use file modification time caching for efficiency
- Scorecard service works with pandas DataFrames from processors/scorecard.py
- Always run from project root for imports to work correctly
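The file modification time caching mentioned in the notes above can be sketched as follows: the file is re-read only when its mtime changes. This is an assumed illustration of the general technique, not the code in metadata_service.py. The module-level _cache and this load_metadata signature are hypothetical.

```python
import json
import os

# Single-entry cache: remembers the last file read and its modification time.
_cache = {"path": None, "mtime": None, "data": None}


def load_metadata(path):
    """Return parsed JSON from `path`, re-reading only when the file has changed."""
    mtime = os.path.getmtime(path)  # raises FileNotFoundError if the file is missing
    if _cache["path"] != path or _cache["mtime"] != mtime:
        with open(path, encoding="utf-8") as handle:
            data = json.load(handle)
        _cache.update(path=path, mtime=mtime, data=data)
    return _cache["data"]
```

Repeated calls return the same cached object until the file is rewritten, so callers should treat the result as read-only.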
ImportError: No module named 'api'
- Make sure you're running from the project root directory
FileNotFoundError: metadata.json
- Run python init_project.py to create required files
KeyError: 'Region'
- Scorecard columns use "Region - Broad" not "Region"
- Service layer handles this mapping
TypeError: '<' not supported between instances of 'NoneType' and 'str'
- Fixed in metadata_service.py by converting None to "unknown"
- All dictionary keys must be non-None for JSON serialization
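The TypeError above is what Python raises when None and str values are compared during sorting. A minimal sketch of the described workaround (mapping None to "unknown" before grouping and sorting) is shown below. The helper names safe_key and count_by and the sample documents are hypothetical, not taken from metadata_service.py.

```python
def safe_key(value):
    """Map a missing value to the string "unknown" so keys are always comparable."""
    return "unknown" if value is None else str(value)


def count_by(docs, field):
    """Count documents per field value, with None or absent values grouped together."""
    counts = {}
    for doc in docs:
        key = safe_key(doc.get(field))
        counts[key] = counts.get(key, 0) + 1
    # Sorting is safe here because every key is now a string.
    return dict(sorted(counts.items()))


# Hypothetical sample documents with a missing and an absent country.
docs = [{"country": "Kenya"}, {"country": None}, {"country": "Ghana"}, {}]
result = count_by(docs, "country")
print(result)
```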