This project scrapes technology news from various sources and serves the data through a JSON API and RSS feeds. Currently, it supports fetching daily and trending papers from Hugging Face.
The application is built using Hono, a lightweight web framework, and is intended to be deployed on Cloudflare Workers. It uses Cloudflare's HTMLRewriter, a streaming HTML parser, to scrape pages directly on the edge.
Due to its reliance on Cloudflare-specific APIs like HTMLRewriter, this project is exclusively designed for deployment on the Cloudflare platform.
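To make the scraping approach concrete, here is a minimal sketch of how an HTMLRewriter handler can collect links from a fetched page. It is not the project's actual scraper: the `PaperLinkCollector` class, the `scrapePapers` helper, and whatever selector and page URL you pass in are hypothetical. The small `declare` block exists only so the snippet type-checks outside the Workers toolchain.

```typescript
// Ambient declarations for illustration only. On Cloudflare Workers,
// HTMLRewriter is a runtime global; drop this block in a real Workers
// project that already has the Workers type definitions installed.
interface ElementLike {
  getAttribute(name: string): string | null;
}
interface TextLike {
  text: string;
  lastInTextNode: boolean;
}
declare class HTMLRewriter {
  on(
    selector: string,
    handlers: { element?(el: ElementLike): void; text?(t: TextLike): void }
  ): HTMLRewriter;
  transform(response: Response): Response;
}

interface Paper {
  title: string;
  url: string;
}

// Hypothetical handler: records the href and inner text of each matched element.
class PaperLinkCollector {
  readonly papers: Paper[] = [];
  private current: Paper | null = null;

  // Called once for every element that matches the selector.
  element(el: ElementLike): void {
    const href = el.getAttribute("href");
    if (href === null) {
      this.current = null;
      return;
    }
    this.current = { title: "", url: href };
    this.papers.push(this.current);
  }

  // Called for each text chunk inside a matched element; text may arrive in pieces.
  text(chunk: TextLike): void {
    if (this.current !== null) {
      this.current.title += chunk.text;
    }
  }
}

// Hypothetical helper: the caller supplies the page URL and CSS selector.
async function scrapePapers(pageUrl: string, selector: string): Promise<Paper[]> {
  const collector = new PaperLinkCollector();
  const response = await fetch(pageUrl);
  // Handlers only run while the transformed body is read, so drain it fully.
  await new HTMLRewriter().on(selector, collector).transform(response).text();
  return collector.papers.map((p) => ({ title: p.title.trim(), url: p.url }));
}
```

Because HTMLRewriter streams the document, the handler accumulates text chunk by chunk instead of assuming the whole title arrives in one call.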
- JSON API: Get structured data of scraped content.
- RSS Feeds: Subscribe to your favorite tech news.
- Built for Cloudflare: Optimized for performance and scalability on the Cloudflare network.
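As an illustration of how one list of scraped items can back both output formats, the sketch below serializes the same data as JSON and as an RSS 2.0 document. The `FeedItem` fields and the `escapeXml` and `toRss` helpers are assumptions made for this example; the project's real data shape and feed generation may differ.

```typescript
// Hypothetical item shape; the project's actual fields are not documented here.
interface FeedItem {
  title: string;
  link: string;
  description?: string;
}

// Escape the five characters that are special in XML text and attributes.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build a minimal RSS 2.0 document; title, link, and description are the
// channel elements the RSS 2.0 format requires.
function toRss(channelTitle: string, channelLink: string, items: FeedItem[]): string {
  const body = items
    .map((item) => {
      const description =
        item.description !== undefined
          ? `<description>${escapeXml(item.description)}</description>`
          : "";
      return (
        "<item>" +
        `<title>${escapeXml(item.title)}</title>` +
        `<link>${escapeXml(item.link)}</link>` +
        description +
        "</item>"
      );
    })
    .join("");
  return (
    '<?xml version="1.0" encoding="UTF-8"?>' +
    '<rss version="2.0"><channel>' +
    `<title>${escapeXml(channelTitle)}</title>` +
    `<link>${escapeXml(channelLink)}</link>` +
    `<description>${escapeXml(channelTitle)}</description>` +
    body +
    "</channel></rss>"
  );
}

// The same items can be returned as JSON with no extra work.
function toJson(items: FeedItem[]): string {
  return JSON.stringify(items);
}
```

Escaping matters here because scraped titles routinely contain characters such as `&` and `<`, which would otherwise produce an invalid feed.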
The following endpoints are currently available, routed under the /huggingface path:
- JSON: /huggingface/dailypapers
- RSS: /huggingface/dailypapers/rss
- JSON: /huggingface/trendingpapers
- RSS: /huggingface/trendingpapers/rss
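The endpoints listed above can be called from any HTTP client. The following is a small client sketch, assuming the worker is reachable at a base URL you supply (for example the local development server). The `endpointUrl` and `fetchFeed` helpers are hypothetical, and because the JSON response shape is not documented here, the result is left as `unknown`.

```typescript
type Feed = "dailypapers" | "trendingpapers";
type Format = "json" | "rss";

// Build the full URL for one of the four documented endpoints.
function endpointUrl(baseUrl: string, feed: Feed, format: Format): string {
  const suffix = format === "rss" ? "/rss" : "";
  return new URL(`/huggingface/${feed}${suffix}`, baseUrl).toString();
}

// Fetch a feed: JSON is parsed (shape unknown), RSS is returned as raw XML text.
async function fetchFeed(baseUrl: string, feed: Feed, format: Format): Promise<unknown> {
  const response = await fetch(endpointUrl(baseUrl, feed, format));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return format === "json" ? response.json() : response.text();
}
```

For instance, `fetchFeed("http://localhost:5173", "trendingpapers", "rss")` would request the trending papers RSS feed, assuming the worker routes are served from that origin.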
- Node.js and npm
- A Cloudflare account
- Clone the repository:
git clone <repository-url>
- Install dependencies:
npm install
- Rename wrangler.toml.example to wrangler.toml and update it with your Cloudflare account details.
Start the development server with:
npm run dev
Your application will be available at http://localhost:5173. The frontend is a simple React application reserved for a potential future UI; the worker endpoints can be tested with tools like curl or Postman.
- Build the project for production:
npm run build
- Deploy to Cloudflare Workers:
npm run deploy