Analyze Elon Musk's tweets using Apache Spark (RDD API) and visualize insights with a stunning HTML dashboard.
This project analyzes Elon Musk's tweets using Apache Spark (RDD API) with Scala.
It produces statistical summaries and an interactive dashboard built with HTML, Bootstrap, and glassmorphism effects.
The dashboard includes:
- Keyword frequency distribution
- Tweet length statistics (mean and standard deviation)
- Per-keyword insights
- A dynamic video/space-themed background 🌌
| Feature | Description |
|---|---|
| 🧠 Dynamic Input | User enters keywords interactively via console |
| 💬 Text Analytics | Calculates the percentage of tweets that contain each keyword |
| 📏 Statistical Metrics | Mean and standard deviation of tweet length, overall and per keyword |
| 🌐 HTML Dashboard | Interactive Bootstrap-based dashboard with video background |
| 🎁 Bonus Analytics | Optional export of results to HDFS for distributed environments |
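
The text analytics and statistical metrics listed above could be computed with the RDD API roughly as follows. This is a minimal sketch, not the code in `KeywordAnalyzer.scala`: the object name `KeywordStatsSketch`, its helper functions, and the sample tweets are all hypothetical, matching is assumed to be a case-insensitive substring test, and tweet length is assumed to be measured in characters. It needs `spark-core` on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical sketch; names and matching rules are assumptions, not the project's code.
object KeywordStatsSketch {

  // Tweets that contain the keyword (assumed: case-insensitive substring match).
  def matching(tweets: RDD[String], keyword: String): RDD[String] = {
    val needle = keyword.toLowerCase
    tweets.filter(_.toLowerCase.contains(needle))
  }

  // Percentage of all tweets that contain the keyword; 0.0 for an empty input.
  def keywordPercentage(tweets: RDD[String], keyword: String): Double = {
    val total = tweets.count()
    if (total == 0L) 0.0
    else 100.0 * matching(tweets, keyword).count() / total
  }

  // Mean and population standard deviation of tweet length in characters.
  def lengthStats(tweets: RDD[String]): (Double, Double) = {
    val stats = tweets.map(_.length.toDouble).stats()
    (stats.mean, stats.stdev)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KeywordStatsSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Made-up rows standing in for the tweet text column of the real dataset.
      val tweets = sc.parallelize(Seq(
        "Tesla production is ramping",
        "Starship launch soon",
        "Tesla and SpaceX updates",
        "Mars"
      )).cache()

      val (allMean, allStd) = lengthStats(tweets)
      println(s"all tweets: mean length $allMean, std $allStd")

      Seq("tesla", "mars").foreach { kw =>
        val pct = keywordPercentage(tweets, kw)
        val (mean, std) = lengthStats(matching(tweets, kw))
        println(s"$kw: $pct% of tweets, mean length $mean, std $std")
      }
    } finally {
      sc.stop()
    }
  }
}
```

`stats()` gathers count, mean, and variance in a single pass over the RDD, which avoids separate jobs for the mean and the standard deviation. It reports the population standard deviation; `sampleStdev` on the same result gives the sample version if that is what the dashboard should show.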
elon-tweets/
├── build.sbt
├── project/
│   └── build.properties
├── src/
│   └── main/
│       ├── scala/
│       │   ├── ElonTweetsApp.scala      # Main Spark application
│       │   ├── DataLoader.scala         # Reads CSV (local/HDFS)
│       │   ├── KeywordAnalyzer.scala    # RDD-based analytics logic
│       │   └── ReportGenerator.scala    # HTML report generator
│       └── resources/
│           ├── log4j.properties         # Logging config
│           └── application-example.properties
├── data/                                # Local input data (listed in .gitignore)
├── output/                              # Generated HTML reports
├── README.md
└── .gitignore
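
To illustrate the report-generation step in the layout above, here is a rough sketch of turning per-keyword results into an HTML table and writing it under `output/`. It is not the contents of `ReportGenerator.scala`: the `ReportSketch` object, the `KeywordRow` type, and the file name `output/report.html` are hypothetical. The markup uses only Bootstrap's `table` classes and leaves out the stylesheet link, the video background, and the glassmorphism styling.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, Paths}
import java.util.Locale

// Hypothetical sketch; not the project's actual report generator.
object ReportSketch {

  // One dashboard row: keyword, share of tweets, mean and std of tweet length.
  final case class KeywordRow(keyword: String, percentage: Double, meanLength: Double, stdLength: Double)

  // Escape HTML special characters so user-entered keywords render literally.
  def escape(s: String): String =
    s.map {
      case '&' => "&amp;"
      case '<' => "&lt;"
      case '>' => "&gt;"
      case '"' => "&quot;"
      case c   => c.toString
    }.mkString

  // Locale-independent fixed-point formatting (always a dot as decimal separator).
  def fmt(x: Double, digits: Int): String =
    String.format(Locale.ROOT, s"%.${digits}f", Double.box(x))

  // Build the full HTML document as a string.
  def render(rows: Seq[KeywordRow]): String = {
    val tableRows = rows.map { r =>
      s"<tr><td>${escape(r.keyword)}</td><td>${fmt(r.percentage, 1)}%</td>" +
        s"<td>${fmt(r.meanLength, 2)}</td><td>${fmt(r.stdLength, 2)}</td></tr>"
    }
    val lines =
      Seq(
        "<!DOCTYPE html>",
        "<html lang=\"en\">",
        "<head><meta charset=\"utf-8\"><title>Tweet keyword report</title></head>",
        "<body>",
        "<table class=\"table table-striped\">",
        "<thead><tr><th>Keyword</th><th>Tweets</th><th>Mean length</th><th>Std</th></tr></thead>",
        "<tbody>"
      ) ++ tableRows ++ Seq("</tbody>", "</table>", "</body>", "</html>")
    lines.mkString("\n")
  }

  // Write the report, creating the parent directory if it does not exist yet.
  def write(path: Path, rows: Seq[KeywordRow]): Path = {
    Option(path.toAbsolutePath.getParent).foreach(dir => Files.createDirectories(dir))
    Files.write(path, render(rows).getBytes(StandardCharsets.UTF_8))
  }

  def main(args: Array[String]): Unit = {
    // Made-up numbers for illustration only.
    val rows = Seq(KeywordRow("tesla", 50.0, 25.5, 1.5), KeywordRow("mars", 25.0, 4.0, 0.0))
    val out = write(Paths.get("output", "report.html"), rows)
    println(s"Report written to $out")
  }
}
```

Keeping `render` as a pure function from rows to a string makes the HTML easy to check without touching the filesystem or starting Spark.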
📊 Keyword-based statistics and insights
sbt clean compile
