Welcome to the Email-Folder AI Agent Hackathon! 🎉 In this challenge, you’ll build an intelligent agent that:
- Reads customer emails from a designated folder
- Parses sender, subject, body text, and attachments (mostly PDFs)
- Invokes a document‐extraction API or model to pull structured content
- Generates a concise summary of both email context and extracted document details
- (Optional) Presents the summary on a lightweight web front‐end for user confirmation
Many businesses receive dozens—or even hundreds—of emails per day, each potentially containing valuable documents. Your goal is to automate this pipeline end to end:
- Ingest: Load raw `.eml` or `.msg` files from a given folder
- Parse: Extract metadata (sender, subject), body text, and attachments
- Extract: Feed attachments into a document‐extraction API or your own model to obtain structured data
- Summarize & Confirm: Produce a human‐readable summary (JSON or HTML) of email + document contents, ready for user confirmation via a simple web UI
- Deliver: Package your solution into a GitHub repo, complete with a flowchart and clear run instructions
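
The Ingest and Parse stages above can be sketched with Python's standard `email` package. This is a minimal illustration, not a required design: the function names `parse_eml` and `ingest_folder` and the shape of the returned dictionary are assumptions made for this example. `.msg` files are not covered by the standard library and would need a separate parser.

```python
from email import policy
from email.parser import BytesParser
from pathlib import Path


def parse_eml(raw: bytes) -> dict:
    """Parse raw .eml bytes into metadata, body text, and attachments.

    The returned dictionary layout is hypothetical; adapt it to your pipeline.
    """
    msg = BytesParser(policy=policy.default).parsebytes(raw)

    # Prefer the plain-text body; fall back to HTML if that is all there is.
    body_part = msg.get_body(preferencelist=("plain", "html"))

    attachments = [
        {
            "filename": part.get_filename(),
            "content_type": part.get_content_type(),
            "payload": part.get_payload(decode=True),  # decoded bytes
        }
        for part in msg.iter_attachments()
    ]

    return {
        "sender": str(msg.get("From", "")),
        "subject": str(msg.get("Subject", "")),
        "body": body_part.get_content() if body_part else "",
        "attachments": attachments,
    }


def ingest_folder(folder: str) -> list[dict]:
    """Parse every .eml file in a folder, in a stable (sorted) order."""
    return [parse_eml(p.read_bytes()) for p in sorted(Path(folder).glob("*.eml"))]
```

Each attachment's `payload` holds the decoded bytes, which is what the Extract stage would pass on to a document-extraction API or model.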
- Correctness & Coverage
  - All emails in the folder are processed reliably
  - Attachments (PDFs, DOCX, images) are parsed and extracted
  - Summaries include both email metadata and document content details
- Architecture Clarity
  - A clear flowchart (PNG, SVG, or Mermaid in Markdown) illustrating each stage
  - Well‐structured README with step‐by‐step setup/run guide
- Code Quality & Modularity
  - Clean, maintainable code (Python, Node.js, Java, etc.)
  - Robust error handling (missing fields, corrupt files, network issues)
- Ease of Deployment
  - Simple local setup instructions (`pip install`, `npm install`)
  - Bonus: Dockerfile or `docker-compose.yml`
- Innovation & UX
  - Bonus for a lightweight web front‐end (React/Flask/Express) for manual confirmation/editing of extracted data
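
One way to meet the reliability and error-handling criteria above is to isolate failures per file, so that a single corrupt email does not stop the whole batch. The sketch below is an assumption about how that might look; `process_folder`, the `handler` callback, and the logger name `email_agent` are hypothetical names chosen for this example.

```python
import logging
from pathlib import Path
from typing import Callable

logger = logging.getLogger("email_agent")  # hypothetical logger name


def process_folder(
    folder: str, handler: Callable[[Path], dict]
) -> tuple[list[dict], list[dict]]:
    """Run `handler` on every .eml file and keep successes and failures apart."""
    results: list[dict] = []
    failures: list[dict] = []
    for path in sorted(Path(folder).glob("*.eml")):
        try:
            results.append(handler(path))
        except Exception as exc:
            # Broad on purpose: a corrupt file, a missing field, or a network
            # error in the handler should be recorded, not abort the batch.
            logger.warning("Skipping %s: %s", path.name, exc)
            failures.append({"file": path.name, "error": str(exc)})
    return results, failures
```

Returning the failures alongside the results makes it straightforward to report which emails were skipped and why.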
A ZIP archive containing sample emails (included in this repo under /emails):
- `emails/sample1.eml`
- `emails/sample2.eml`
- …
Each may include zero or more attachments (.pdf, .docx, .jpeg).
Tip: You can generate your own test emails or leverage open‐source libraries like mailparser or Apache Tika for parsing.
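
If you want to generate your own test emails, a short sketch using Python's standard `email` package might look like the following. The function name `write_sample_email`, the addresses, and the subject line are placeholders invented for this example; the attachment bytes do not need to be a valid document for testing the parsing step.

```python
import mimetypes
from email.message import EmailMessage
from pathlib import Path


def write_sample_email(path: str, attachment: bytes, filename: str) -> None:
    """Write a test .eml file with one attachment (all field values are placeholders)."""
    msg = EmailMessage()
    msg["From"] = "customer@example.com"
    msg["To"] = "support@example.com"
    msg["Subject"] = "Invoice attached"
    msg.set_content("Hi, please find the document attached.")

    # Guess the MIME type from the file name; fall back to a generic binary type.
    mime_type, _ = mimetypes.guess_type(filename)
    maintype, subtype = (mime_type or "application/octet-stream").split("/", 1)
    msg.add_attachment(attachment, maintype=maintype, subtype=subtype, filename=filename)

    Path(path).write_bytes(msg.as_bytes())
```

For example, `write_sample_email("emails/test1.eml", b"%PDF-1.4 test", "invoice.pdf")` would create a file that can be fed through the same pipeline as the provided samples, assuming the `emails` folder exists.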
- `README.md` (this file)
- `flowchart.*`: Diagram (PNG/SVG/Mermaid) illustrating your end-to-end pipeline
- Source code under `/src`
- Dependency & Run Instructions, for example:

      git clone https://github.com/your-org/your-repo.git
      cd your-repo
      pip install -r requirements.txt   # or npm install
      ./run_agent.sh                    # or npm start

- Sample outputs for the provided emails under `/output`
- (Optional) Dockerfile or `docker-compose.yml` for containerized setup
- (Optional) Web UI under `/web` demonstrating manual confirmation/editing
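
As an illustration of what a sample output under `/output` could look like, the sketch below builds a JSON summary and writes one file per email. No output schema is prescribed by this challenge: the field names (`email`, `body_preview`, `documents`, `confirmed`) and the functions `build_summary` and `write_summary` are assumptions for this example only.

```python
import json
from pathlib import Path


def build_summary(email: dict, documents: list[dict]) -> dict:
    """Combine email metadata and extracted document data (hypothetical schema)."""
    return {
        "email": {"sender": email["sender"], "subject": email["subject"]},
        "body_preview": email["body"][:200],  # short preview, not a real summary
        "documents": documents,
        "confirmed": False,  # to be flipped once a user confirms in the web UI
    }


def write_summary(summary: dict, source_name: str, out_dir: str = "output") -> Path:
    """Write the summary as <source stem>.json inside out_dir and return the path."""
    out_path = Path(out_dir) / f"{Path(source_name).stem}.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(
        json.dumps(summary, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return out_path
```

Naming each output after its source email (for example `sample1.eml` to `sample1.json`) keeps it easy to match outputs to inputs during review.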
- Fork this repository
- Implement your solution, add your flowchart, and update this README
- Push to your fork and open a Pull Request against `main`
- In your PR description, include:
  - A brief overview of your approach
  - Any special dependencies or setup steps
  - (If applicable) Link to a live demo or screenshot
| Category | Weight |
|---|---|
| Functionality | 40% |
| Architecture Clarity | 25% |
| Code Quality | 15% |
| Ease of Deployment | 10% |
| Innovation & UX | 10% |
Winners will be selected based on the combined score across these areas.
- Kick-off: May 30, 2025
- Submission Deadline: June 10, 2025, 23:59 IST
- Winners Announced: June 16, 2025
- Teams of 1 participant
- All code must be original or properly attributed
- No plagiarism—automated and manual checks will be performed
- Keep your fork public until winners are announced
- mailparser (Node.js)
- Python `email` library
- Apache Tika for document parsing
- Mermaid or Graphviz for flowcharts
If you have any questions, please:
- Open an issue in this repo
- Email us at hr@cargoa.io
Good luck, and happy hacking! 🚀