📊 AI Hype Decoded: Sentiment Spillover & Stock Price Prediction

FinHack 2026 — Case 4

Can AI-related text sentiment — both direct and spillover from related companies — predict whether a stock goes up or down in the 5 trading days after earnings?

What This Project Does

We build a sentiment-driven prediction system that tests whether AI news signals improve post-earnings stock return prediction. The key innovation is spillover — sentiment about NVDA might predict AMD's earnings move, not just NVDA's.

The 5-Model Progression

Each model adds either features or modeling complexity to the one before it. The output table shows whether accuracy increases at each step:

#	Model	Features	Method	Question Answered
1	Baseline	Market only	Logistic Regression	What can prices alone predict?
2	+ Sentiment	Market + direct sentiment	Logistic Regression	Does sentiment help?
3	+ Spillover	All features	Logistic Regression	Does cross-company sentiment help?
4	XGBoost	All features	Gradient Boosting	Do nonlinear interactions help?
5	LSTM	All features	Deep Learning	Does deep learning help?
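As a rough sketch of how the progression above nests, the feature columns for each model can be written as growing lists. The feature names are the ones listed under Step 3 of this README; the list names, the `MODEL_SPECS` structure, and the estimator labels are hypothetical and may not match `src/04_model.py`.

```python
# Feature names come from the Step 3 feature table in this README.
# The variable names below are hypothetical, chosen for illustration.
MARKET = ["pre_ret_5d", "pre_ret_20d", "pre_vol_5d", "mcap_bucket"]
DIRECT = ["sent_mean", "sent_count", "sent_trend", "sent_extreme"]
SPILLOVER = ["spill_sent_mean", "spill_sent_gap", "spill_negative_count"]

ALL_FEATURES = MARKET + DIRECT + SPILLOVER

# Each entry: (model name, estimator family, feature columns).
MODEL_SPECS = [
    ("Baseline", "logreg", MARKET),
    ("+ Sentiment", "logreg", MARKET + DIRECT),
    ("+ Spillover", "logreg", ALL_FEATURES),
    ("XGBoost", "xgboost", ALL_FEATURES),
    ("LSTM", "lstm", ALL_FEATURES),
]

for name, family, cols in MODEL_SPECS:
    print(f"{name:<12} {family:<8} {len(cols)} features")
```

Models 1 to 3 change only the feature set and models 3 to 5 change only the estimator, so each accuracy difference can be attributed to a single change.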

Project Structure

finhack/
├── README.md                ← You are here
├── requirements.txt         ← Python dependencies
├── src/
│   ├── config.py            ← Stock universe, relationships, constants
│   ├── 01_collect_data.py   ← Step 1: Earnings dates, stock prices, news
│   ├── 02_sentiment.py      ← Step 2: FinBERT sentiment extraction
│   ├── 03_features.py       ← Step 3: Feature engineering (market, direct, spillover)
│   └── 04_model.py          ← Step 4: Train 5 models, print comparison table
├── dashboard/
│   └── app.py               ← Streamlit dashboard (4 tabs)
├── docs/
│   └── methodology.md       ← Full methodology writeup for presentation
└── data/                    ← Generated data files (created by pipeline, gitignored)

Quick Start

Prerequisites

Python 3.10+
pip
~2GB disk space (for FinBERT model download)
Internet connection (for stock data + model download on first run)

1. Clone and install

git clone git@github.com:Palash-Mehta/finhack.git
cd finhack
pip install -r requirements.txt

2. Run the pipeline (4 steps, in order)

# Step 1: Collect stock prices, earnings dates, and news headlines (~2 min)
python src/01_collect_data.py

# Step 2: Run FinBERT sentiment on all headlines (~3-5 min on CPU)
python src/02_sentiment.py

# Step 3: Engineer features — market, direct sentiment, spillover (~1 min)
python src/03_features.py

# Step 4: Train all 5 models and print comparison table (~2-3 min)
python src/04_model.py

Each step saves output to data/ so you can re-run later steps without re-running earlier ones.

3. Launch the dashboard

streamlit run dashboard/app.py

Opens at http://localhost:8501 with four tabs:

📈 Sentiment Timeline — Per-stock sentiment in the 7-day pre-earnings window
🕸️ Spillover Network — Interactive graph of cross-company relationships
🏆 Model Comparison — Side-by-side metrics + accuracy progression chart
🔍 Prediction Explorer — Drill into individual earnings events

What Each Step Does

Step 1: `01_collect_data.py` — Data Collection

Data	Source	Output
Daily stock prices	Yahoo Finance (`yfinance`)	`data/prices.parquet`
Earnings dates	Yahoo Finance earnings calendar	`data/earnings.csv`
News headlines	Synthetic (see note below)	`data/news.parquet`

⚠️ Note on news data: The prototype generates synthetic headlines for demo purposes. To use real data, replace the collect_news() function with API calls to NewsAPI, Polygon.io, or EODHD. The rest of the pipeline works identically.

Step 2: `02_sentiment.py` — FinBERT Sentiment

Runs ProsusAI/finbert on every headline
FinBERT is BERT fine-tuned on financial text, so it scores phrases such as "beats expectations" as positive
Outputs a continuous score: P(positive) - P(negative), range [-1, +1]
Auto-detects GPU; falls back to CPU
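The score described above can be illustrated with a minimal, dependency-free sketch that turns three class logits into P(positive) - P(negative). The function names and the example logits are made up for illustration, and the label order is an assumption; in real code it should be read from the model's config rather than hard-coded.

```python
import math


def softmax(logits):
    """Convert raw class logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sentiment_score(logits, labels=("positive", "negative", "neutral")):
    """Return P(positive) - P(negative), a value in [-1, +1].

    The default label order is an assumption, not read from the model.
    """
    probs = dict(zip(labels, softmax(logits)))
    return probs["positive"] - probs["negative"]


# Made-up logits for one headline: strongly positive, weakly neutral.
print(round(sentiment_score([2.0, -1.0, 0.5]), 3))
```

A neutral headline pushes both probabilities toward zero, so the score shrinks toward 0 without needing a separate neutral feature.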

Step 3: `03_features.py` — Feature Engineering

Three feature groups per earnings event:

Group	Features	Description
A — Market	`pre_ret_5d`, `pre_ret_20d`, `pre_vol_5d`, `mcap_bucket`	Price-based signals before earnings
B — Direct Sentiment	`sent_mean`, `sent_count`, `sent_trend`, `sent_extreme`	Sentiment about the target stock
C — Spillover	`spill_sent_mean`, `spill_sent_gap`, `spill_negative_count`	Sentiment about related companies
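The README names the Group C features but does not define them, so the following is only one plausible reading, sketched in plain Python. The definitions of `spill_sent_gap` (related mean minus the target's own mean) and `spill_negative_count` (related headlines scoring below zero), the neutral fallback, and the function name are all assumptions.

```python
from statistics import mean


def spillover_features(target_mean, related_scores):
    """Aggregate sentiment about related companies for one earnings event.

    target_mean    -- mean sentiment about the target stock (sent_mean)
    related_scores -- headline scores for related tickers in the same window
    """
    if not related_scores:
        # Assumed fallback: no related news in the window means neutral values.
        return {"spill_sent_mean": 0.0, "spill_sent_gap": 0.0,
                "spill_negative_count": 0}
    spill_mean = mean(related_scores)
    return {
        "spill_sent_mean": spill_mean,
        # Assumed definition: related-company sentiment minus the target's own.
        "spill_sent_gap": spill_mean - target_mean,
        # Assumed definition: number of related headlines scoring below zero.
        "spill_negative_count": sum(1 for s in related_scores if s < 0),
    }


print(spillover_features(0.25, [0.5, -0.25, 0.0, -0.5]))
```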

Target: Binary — is the 5-day post-earnings return positive (1) or negative (0)?
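A small sketch of how one market feature and the binary target might be derived from a list of daily closes follows. It assumes `pre_ret_5d` is the five-day return ending on the last close before the earnings date, and it reads the [+1, +5] target window as entering at the close of day +1 and exiting at the close of day +5; the actual definitions in `src/03_features.py` may differ. The function name and the prices are illustrative only.

```python
def price_features_and_label(closes, event_idx):
    """Compute one market feature and the binary target for one event.

    closes    -- daily closing prices in chronological order
    event_idx -- index of the earnings date within closes
    """
    if event_idx < 6 or event_idx + 5 >= len(closes):
        raise ValueError("need 6 closes before and 5 closes after the event")
    # Five-day return ending on the last close before the earnings date.
    pre_ret_5d = closes[event_idx - 1] / closes[event_idx - 6] - 1.0
    # Assumed reading of [+1, +5]: enter at close of day +1, exit at day +5.
    post_ret = closes[event_idx + 5] / closes[event_idx + 1] - 1.0
    return {"pre_ret_5d": pre_ret_5d, "target": 1 if post_ret > 0 else 0}


# Illustrative prices only; index 6 plays the role of the earnings date.
closes = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0,
          110.0, 108.0, 107.0, 109.0, 111.0, 112.0]
print(price_features_and_label(closes, event_idx=6))
```

Keeping the feature window strictly before `event_idx` and the target window strictly after it is what prevents the label from leaking into the features.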

Step 4: `04_model.py` — Modeling

Trains 5 models and prints a comparison table like:

══════════════════════════════════════════════════════════════════════════════
  MODEL COMPARISON — ACCURACY PROGRESSION (Test Set)
══════════════════════════════════════════════════════════════════════════════
  #    Model                                              Acc      F1     AUC   Δ Acc
  ---- -------------------------------------------------- ------- ------- ------- -------
  1    Baseline Logreg                                    0.XXXX  0.XXXX  0.XXXX    —
  2    Sentiment Logreg                                   0.XXXX  0.XXXX  0.XXXX +0.XXXX
  3    Spillover Logreg                                   0.XXXX  0.XXXX  0.XXXX +0.XXXX
  4    Xgboost                                            0.XXXX  0.XXXX  0.XXXX +0.XXXX
  5    Lstm                                               0.XXXX  0.XXXX  0.XXXX +0.XXXX
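The Δ Acc column is not defined in this README. Given the "progression" framing, a reasonable guess is that it is the change in test accuracy relative to the previous row, which could be computed as sketched below. The function name is hypothetical, and the accuracies are placeholders for illustration, not project results.

```python
def accuracy_deltas(results):
    """Attach the change in accuracy versus the previous model to each row.

    results -- list of (model_name, accuracy) in progression order
    Returns (name, accuracy, delta) tuples; delta is None for the first row.
    """
    rows = []
    prev = None
    for name, acc in results:
        delta = None if prev is None else acc - prev
        rows.append((name, acc, delta))
        prev = acc
    return rows


# Placeholder accuracies for illustration only, not project results.
demo = [("Baseline", 0.50), ("+ Sentiment", 0.75), ("+ Spillover", 0.625)]
for name, acc, delta in accuracy_deltas(demo):
    shown = "-" if delta is None else f"{delta:+.4f}"
    print(f"{name:<14} {acc:.4f} {shown:>8}")
```

A negative delta is a legitimate outcome: it indicates that the added features or model did not help on the test set.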

Data Leakage Controls

This is the #1 thing judges will scrutinize. Our controls:

Rule	Implementation
Sentiment window	Only news dated within [earnings_date - 7 days, earnings_date] is used
Market features	Only price data from before the earnings date is used
Target return	Computed over trading days [+1, +5] after earnings
Validation split	Chronological only — train on 2023, validate 2024-H1, test 2024-H2+
No shuffling	Time-series split, never k-fold
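Two of the rules above, the sentiment window and the chronological split, can be sketched with the standard library alone. This is an illustration rather than the project's code: the function names and the event dictionary shape are hypothetical, the 7-day window is assumed to mean calendar days, and the split assumes the dataset starts in 2023.

```python
from datetime import date, timedelta


def in_sentiment_window(news_date, earnings_date, days=7):
    """True if news_date lies in [earnings_date - days, earnings_date]."""
    return earnings_date - timedelta(days=days) <= news_date <= earnings_date


def chronological_split(events):
    """Split events by earnings date with no shuffling.

    Train: before 2024 (assumes data starts in 2023),
    validation: 2024-H1, test: 2024-H2 onward.
    """
    train, val, test = [], [], []
    for ev in events:
        d = ev["earnings_date"]
        if d < date(2024, 1, 1):
            train.append(ev)
        elif d < date(2024, 7, 1):
            val.append(ev)
        else:
            test.append(ev)
    return train, val, test


events = [
    {"ticker": "NVDA", "earnings_date": date(2023, 5, 24)},
    {"ticker": "AMD", "earnings_date": date(2024, 4, 30)},
    {"ticker": "MSFT", "earnings_date": date(2024, 7, 30)},
]
train, val, test = chronological_split(events)
print(len(train), len(val), len(test))
```

Because every test event is later than every training event, the model never sees information from the period it is evaluated on.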

Stock Universe (20 tickers)

Tier	Stocks	Role
Mega-cap AI	NVDA, MSFT, GOOGL, META, AMZN, AAPL	Primary sentiment generators
AI Infrastructure	AMD, AVGO, SMCI, MRVL, TSM	Hardware supply chain
AI Software	CRM, PLTR, SNOW, NOW, AI	Enterprise AI
AI Adjacent	TSLA, ORCL, IBM, INTC	Broader tech

Cross-company relationships (for spillover) are defined in src/config.py via SUPPLY_CHAIN_LINKS — e.g., NVDA → AMD, SMCI, MRVL, AVGO, TSM.
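For illustration, the relationship map might look like a dictionary from a source ticker to the tickers it influences. Only the NVDA entry is taken from this README; the dictionary shape, the direction of the arrow (sentiment about the key spills onto the listed tickers), and the helper function are assumptions about `src/config.py`.

```python
# Only the NVDA entry comes from this README. The dict shape and the
# helper below are hypothetical and may not match src/config.py.
SUPPLY_CHAIN_LINKS = {
    "NVDA": ["AMD", "SMCI", "MRVL", "AVGO", "TSM"],
}


def spillover_sources(target, links=SUPPLY_CHAIN_LINKS):
    """Tickers whose sentiment is assumed to spill over onto `target`."""
    return sorted(src for src, dests in links.items() if target in dests)


print(spillover_sources("AMD"))   # ['NVDA']
print(spillover_sources("NVDA"))  # []
```

With this layout, the spillover features for an AMD earnings event would be built from NVDA headlines in the same pre-earnings window.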

Key Files to Modify

If you want to...	Edit this
Add/remove stocks	`src/config.py` → `STOCK_UNIVERSE`
Change relationships	`src/config.py` → `SUPPLY_CHAIN_LINKS`
Use real news API	`src/01_collect_data.py` → `collect_news()`
Add features	`src/03_features.py` → add to compute functions + feature group lists
Tune models	`src/04_model.py` → hyperparameters in each train function
Change dashboard	`dashboard/app.py`

For the Presentation

Read docs/methodology.md for the full writeup (can be adapted into slides)
The dashboard is the live demo — walk through all 4 tabs
The model comparison table is the punchline — show accuracy progression
Key narrative: "When NVDA sentiment drops, AMD stock reacts before AMD even reports. Can we capture that spillover signal?"

Troubleshooting

Problem	Fix
`ModuleNotFoundError`	Run `pip install -r requirements.txt`
`FileNotFoundError: data/...`	Run the pipeline steps in order (01 → 02 → 03 → 04)
FinBERT download slow	First run downloads ~400MB model. Subsequent runs use cache.
`CUDA out of memory`	FinBERT falls back to CPU automatically. LSTM uses CPU by default on Mac.
Dashboard won't start	Make sure you've run all 4 pipeline steps first
yfinance rate limited	Wait a minute and re-run step 01, or reduce `ALL_TICKERS` in `src/config.py`
