An automated AI-powered system that discovers, analyzes, and creates content about startups from various online sources. The application continuously crawls the web for startup information, generates AI-powered summaries, and automatically posts engaging content to social media platforms.
- Automated Web Crawling - Continuously discovers startup information from various online sources
- AI-Powered Analysis - Uses Google Gemini AI to generate intelligent summaries and insights
- Multi-Platform Content Generation - Automatically creates tweets and blog posts about startups
- Social Media Integration - Posts generated content to Twitter and Dev.to
- Data Aggregation - Collects startup data from Product Hunt, websites, and other sources
- Scheduled Operations - Runs automated workflows with different intervals for various tasks
- Structured Logging - Comprehensive logging with Winston and Better Stack integration
- Runtime: Node.js with TypeScript
- Database: PostgreSQL with Drizzle ORM + MongoDB for additional storage
- AI Integration: Google Gemini API
- Web Crawling: Crawlee with Playwright
- Social APIs: Twitter API v2, Dev.to API, Product Hunt API
- Logging: Winston with daily rotation and Better Stack integration
- Task Scheduling: Custom timing system with configurable delays
src/
├── modules/
│   ├── ai/                       # AI-powered content generation
│   │   ├── startups/             # Startup analysis and summarization
│   │   ├── tweet/                # Tweet generation and posting
│   │   └── blog/                 # Blog generation and posting
│   └── fetch-data-from-online/   # Data collection modules
│       ├── product-hunt/         # Product Hunt integration
│       └── website-crawler/      # Web scraping functionality
├── db/                           # Database configurations
├── utils/                        # Shared utilities and helpers
├── connection.ts                 # Database connection setup
├── winston.ts                    # Logging configuration
└── index.ts                      # Application entry point
interface Startup {
  id: string;
  name?: string;
  VC_firm: string;
  website: string;
  founder_names: string[];
  foundedAt?: string;
}

interface AIGeneratedSummary {
  id: string;
  summary: string[];
  startupId: string;
  isUsedForTweets: boolean;
  isUsedForBlogs: boolean;
}

interface Tweet {
  id: string;
  startupId: string;
  tweet: string;
  isUsed: boolean;
}

interface Blog {
  id: string;
  startupId: string;
  title: string;
  blog: string;
  isUsed: boolean;
}

interface WebPageData {
  id: string;
  url: string;
  title: string;
  description: string;
  text: string;
  isUsed: boolean;
  startupId: string;
}

- Node.js (v18 or higher)
- pnpm
- PostgreSQL
- MongoDB
- Install dependencies: `pnpm install --frozen-lockfile`
- Set up environment variables: `cp .env.example .env`, then edit `.env` with your configuration:
  - `GEMINI_API_KEY`: Your Google Gemini API key
  - `MONGODB_CONNECT_URL`: MongoDB connection string
  - `DATABASE_URL`: PostgreSQL connection string
  - `X_*`: Twitter API credentials
  - `DEVTO_API_KEY`: Dev.to API key
  - `BEARER_TOKEN`: Product Hunt API token
  - `BETTER_STACK_*`: Better Stack logging configuration
- Run database migrations: `pnpm db:migrate`
- Start the application:
  - Development: `pnpm run dev`
  - Production: `pnpm run build`, then `pnpm run start`
The application runs in a continuous loop with the following schedule:
- Every Loop Iteration:
  - Crawl websites for startup data
  - Generate AI summaries of startups
  - Generate tweet content
  - Generate blog content
- Every Hour:
  - Post generated tweets to Twitter
- Every Day:
  - Post generated blogs to Dev.to
  - Fetch fresh data from Product Hunt
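The schedule above can be sketched as a small "which tasks are due" check that the main loop calls on every iteration. This is a minimal sketch, not the project's actual timing system: the task names, the `INTERVALS` table, and the `dueTasks` helper are all hypothetical and only mirror the hourly and daily cadence described here.

```typescript
// Hypothetical task names for the hourly and daily jobs listed above.
type TaskName = "postTweets" | "postBlogs" | "fetchProductHunt";

const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

// Assumed intervals: tweets hourly, blogs and Product Hunt daily.
const INTERVALS: Record<TaskName, number> = {
  postTweets: HOUR_MS,
  postBlogs: DAY_MS,
  fetchProductHunt: DAY_MS,
};

// Returns the tasks whose interval has elapsed since their last run.
// A task with no recorded run is treated as due immediately.
function dueTasks(
  now: number,
  lastRun: Partial<Record<TaskName, number>>
): TaskName[] {
  return (Object.keys(INTERVALS) as TaskName[]).filter((task) => {
    const last = lastRun[task];
    return last === undefined || now - last >= INTERVALS[task];
  });
}
```

In a loop, each iteration would run the crawl and generation steps unconditionally, then call `dueTasks(Date.now(), lastRun)` and record the current time in `lastRun` for every task it executes.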
- Data Collection:
  - Product Hunt API for trending startups
  - Web crawler for detailed startup information
  - Website content extraction and analysis
- AI Processing:
  - Google Gemini analyzes collected data
  - Generates comprehensive summaries
  - Creates engaging social media content
- Content Distribution:
  - Automated posting to Twitter
  - Blog publication on Dev.to
  - Tracking of used content to prevent duplicates
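The duplicate prevention mentioned above relies on the `isUsed` flag in the data models. The following is a minimal in-memory sketch of that idea; `takeNextUnused` is a hypothetical helper, and in the real application the flag would presumably be updated in the database rather than in an array.

```typescript
// Same shape as the Tweet interface in this README.
interface Tweet {
  id: string;
  startupId: string;
  tweet: string;
  isUsed: boolean;
}

// Picks the first tweet that has not been posted yet and returns it together
// with a copy of the list in which that tweet is flagged as used.
// Returns null when every tweet has already been used.
function takeNextUnused(
  tweets: Tweet[]
): { next: Tweet; updated: Tweet[] } | null {
  const next = tweets.find((t) => !t.isUsed);
  if (!next) return null;
  const updated = tweets.map((t) =>
    t.id === next.id ? { ...t, isUsed: true } : t
  );
  return { next, updated };
}
```

Because the input array is copied rather than mutated, a failed post can simply discard `updated` and retry later with the original list.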
- Product Hunt API:
  - Fetches daily and trending startup data
  - Requires an API bearer token for authentication
- Twitter API v2:
  - Posts generated tweets automatically
  - Uses OAuth 1.0a authentication
  - Supports media attachments and threading
- Dev.to API:
  - Publishes comprehensive blog posts
  - Uses API key authentication
  - Supports Markdown formatting
- Google Gemini API:
  - Powers content generation and analysis
  - Provides intelligent summaries
  - Creates engaging social media copy
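As an illustration of the API key authentication and Markdown support noted for Dev.to, the sketch below builds the request for creating an article. It assumes the public Dev.to endpoint `POST https://dev.to/api/articles` with an `api-key` header. `buildDevtoRequest` and `HttpRequest` are hypothetical names, and this README does not show how the project actually sends the request.

```typescript
// Same shape as the Blog interface in this README.
interface Blog {
  id: string;
  startupId: string;
  title: string;
  blog: string;
  isUsed: boolean;
}

// Hypothetical plain description of an HTTP call, so the sketch
// does not depend on any particular HTTP client.
interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the article-creation request. The Markdown in blog.blog is sent
// as body_markdown. Assumption: published defaults to false (a draft).
function buildDevtoRequest(
  blog: Blog,
  apiKey: string,
  published: boolean = false
): HttpRequest {
  return {
    url: "https://dev.to/api/articles",
    method: "POST",
    headers: { "api-key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify({
      article: { title: blog.title, body_markdown: blog.blog, published },
    }),
  };
}
```

The returned object can be passed to any HTTP client, for example by splitting off `url` and handing the remaining fields to `fetch`.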
| Variable | Description | Required |
|---|---|---|
| `GEMINI_API_KEY` | Google Gemini API key | Yes |
| `MONGODB_CONNECT_URL` | MongoDB connection string | Yes |
| `DATABASE_URL` | PostgreSQL connection string | Yes |
| `X_APP_KEY` | Twitter app key | Yes |
| `X_APP_SECRET` | Twitter app secret | Yes |
| `X_ACCESS_TOKEN` | Twitter access token | Yes |
| `X_ACCESS_TOKEN_SECRET` | Twitter access token secret | Yes |
| `DEVTO_API_KEY` | Dev.to API key | Yes |
| `BEARER_TOKEN` | Product Hunt bearer token | Yes |
| `HEADLESS` | Run browser in headless mode | No |
| `MAX_REQUESTS` | Maximum requests per crawling session | No |
| `DELAY_MS` | Delay between API requests | No |
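A startup check for the variables listed above might look like the following sketch. The required names are taken from the table. The helpers `missingEnvVars` and `readOptionalNumber` are hypothetical, and any fallback values are chosen by the caller because this README does not state defaults for the optional variables.

```typescript
// Variables marked "Yes" in the Required column.
const REQUIRED_VARS = [
  "GEMINI_API_KEY",
  "MONGODB_CONNECT_URL",
  "DATABASE_URL",
  "X_APP_KEY",
  "X_APP_SECRET",
  "X_ACCESS_TOKEN",
  "X_ACCESS_TOKEN_SECRET",
  "DEVTO_API_KEY",
  "BEARER_TOKEN",
] as const;

type Env = Record<string, string | undefined>;

// Returns the names of required variables that are unset or blank.
function missingEnvVars(env: Env): string[] {
  return REQUIRED_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Reads an optional numeric variable such as MAX_REQUESTS or DELAY_MS,
// falling back to the caller-supplied value when unset, blank, or not a number.
function readOptionalNumber(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return Number.isFinite(parsed) ? parsed : fallback;
}
```

Calling `missingEnvVars(process.env)` before anything else starts lets the application fail fast with a list of missing names instead of failing later inside an API call.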
The application uses Winston for structured logging with:
- Daily log rotation
- Better Stack integration for centralized monitoring
- Different log levels for various components
- Child loggers for better traceability
- Generate new migrations: `pnpm db:generate`
- Run migrations: `pnpm db:migrate`
- Open database studio: `pnpm studio`

- Check logs in the console or log files
- Monitor Better Stack dashboard for centralized logging
- Track API usage and rate limits
- Monitor database performance and connections
Contributions, issues, and feature requests are welcome! Please follow the guidelines outlined in the contributing.md file.
MIT License
For questions or support, please open an issue on the GitHub repository.