nategalit/amazon-malloy

Who Is the Amazon Health & Wellness Buyer?

A Malloy Analysis of 1.85 Million Amazon Purchases


The Question

Does Amazon health and wellness spending follow the demographic lines you'd expect? My assumption going in: wealthier, more educated, older buyers would dominate supplement and medication purchases. The data had other ideas.

This analysis uses the Harvard Dataverse Amazon Purchases Dataset — 1,850,717 purchases from 5,027 households collected between 2018 and 2023, linked to a detailed demographic survey covering income, education, age, household size, and self-reported health conditions.


What I Found

Surprise #1: Income barely matters

I expected a steep income gradient in health spending. Instead, only about 0.2 percentage points separate the lowest and highest earners. Households making under $25,000 per year dedicate 7.9% of their Amazon orders to health products; households making $150,000 or more dedicate 7.7%. Amazon's health aisle is, against all expectation, remarkably democratic.
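As a rough illustration of how a "health share of orders" figure like the ones above can be computed per income bracket, here is a minimal Python sketch. The rows are made-up toy data, and the names `orders` and `health_share_by_group` are hypothetical; the repo's actual Malloy measures are not shown in this README and may be defined differently.

```python
from collections import defaultdict

# Hypothetical toy rows: (income_bracket, is_health_order).
# These are invented for illustration and are NOT taken from the dataset.
orders = [
    ("Under $25K", True), ("Under $25K", False),
    ("Under $25K", False), ("Under $25K", False),
    ("$150K+", True), ("$150K+", False), ("$150K+", False),
    ("$150K+", False), ("$150K+", False),
]

def health_share_by_group(rows):
    """Return {group: health orders / all orders} for each group."""
    totals = defaultdict(int)
    health = defaultdict(int)
    for group, is_health in rows:
        totals[group] += 1
        if is_health:
            health[group] += 1
    return {group: health[group] / totals[group] for group in totals}

shares = health_share_by_group(orders)
for group, share in shares.items():
    print(f"{group}: {share:.1%} of orders are health products")

# Gap between the two shares reported in the text (7.9% vs 7.7%),
# expressed in percentage points.
gap_points = round(7.9 - 7.7, 1)
print(f"Reported gap: {gap_points} percentage points")
```

The share is computed within each bracket, so a bracket that places many more orders overall can still show the same share as one that places few.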

[Figure: Health spending by income]

Surprise #2: Education runs backwards

People with some high school or less dedicate 9.5% of their Amazon orders to health and wellness products. Graduate degree holders (doctors, lawyers, MBAs) come in last at 7.7%. The more education a household has, the smaller the share of its orders that goes to health products. This is the opposite of what most consumer behavior research would predict.

The explanation likely isn't that educated buyers are less health-conscious. It's that graduate-degree households buy vastly more of everything on Amazon, so health products become a smaller slice of a much larger pie.
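The "smaller slice of a larger pie" explanation can be made concrete with a small sketch. The order counts below are hypothetical, chosen only so the shares land near the 9.5% and 7.7% reported above; the README does not report absolute order counts per education level.

```python
# Hypothetical per-household order counts. Only the mechanism matters here,
# not the specific numbers, which are invented for illustration.
households = {
    "some high school or less": {"health_orders": 19, "total_orders": 200},
    "graduate degree": {"health_orders": 31, "total_orders": 400},
}

def health_share(household):
    """Health orders as a fraction of all orders for one household."""
    return household["health_orders"] / household["total_orders"]

for name, h in households.items():
    print(
        f"{name}: {h['health_orders']} health orders, "
        f"{health_share(h):.2%} of all orders"
    )
```

In this toy case the graduate-degree household buys more health products in absolute terms yet shows a lower share, because its total order count doubles. Whether the real data behaves this way would need a check on absolute health orders per household.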

[Figure: Health spending by education]

Surprise #3: Diabetes predicts almost nothing — until you look closer

Households that self-reported having diabetes devote 8.048% of their orders to health products. Households without diabetes: 8.049%. The difference is essentially zero. On the surface, a chronic health condition predicts nothing about health purchasing behavior.

But break down which health categories diabetic buyers choose, and a sharper picture emerges. Diabetic households over-index heavily on PROFESSIONAL_HEALTHCARE (20.9% of purchases in that category come from diabetic buyers), OTC_MEDICATION (19.4%), and VITAMIN (14.4%). They are not buying more health products overall — they are buying more medically serious ones. The signal is hidden because the health category is dominated by supplements and skincare bought by people without chronic conditions.
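"Over-indexing" is usually measured against a baseline: the group's share of a category divided by its share of all orders. The sketch below uses the three category shares quoted above, but the baseline is an assumption. The README does not state what fraction of all orders comes from diabetic households, so `BASELINE_SHARE` is a placeholder to be replaced with the real figure.

```python
# Share of each category's purchases that come from diabetic households,
# as quoted in the text above.
category_share = {
    "PROFESSIONAL_HEALTHCARE": 0.209,
    "OTC_MEDICATION": 0.194,
    "VITAMIN": 0.144,
}

# ASSUMPTION: diabetic households' share of ALL orders. This value is not
# reported in the README; 0.12 is a hypothetical placeholder.
BASELINE_SHARE = 0.12

def over_index(share, baseline):
    """Return an index where 100 means 'in proportion to the baseline'."""
    return round(100 * share / baseline)

indices = {cat: over_index(s, BASELINE_SHARE) for cat, s in category_share.items()}
for cat, idx in indices.items():
    print(f"{cat}: index {idx}")
```

An index above 100 means the group buys the category more than its overall presence would suggest. The ranking of the three categories does not depend on the baseline, but whether each one is truly over-indexed does.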

[Figure: Diabetic buyer categories]

Surprise #4: "Health" means something completely different by age

Young buyers (18–24) and older buyers (55–64) shop the Amazon health aisle at overall rates that are not far apart: 8.0% vs 10.1%. But they are buying entirely different things. Older buyers lead in MEDICATION, VITAMIN, and HERBAL_SUPPLEMENT. Young buyers actually beat older buyers in SKIN_MOISTURIZER (1,571 vs 1,322 orders) and dominate BEAUTY. Amazon's health aisle serves two completely different customers who happen to shop in the same place.

[Figure: Young vs older health categories]


The Numbers at a Glance

| Metric | Value |
| --- | --- |
| Total purchases analyzed | 1,850,717 |
| Unique households | 5,027 |
| Date range | Jan 2018 – Mar 2023 |
| Total spend in dataset | $44,053,633 |
| Avg spend per household | ~$8,765 over 5 years |
| Health & wellness orders | ~160,000 (~8.6% of total) |
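The derived rows can be recomputed from the raw totals as a sanity check. This is a minimal sketch using only the figures listed here; the health order count is the approximate value quoted, so the resulting share is approximate too.

```python
# Raw totals as listed in the summary.
total_purchases = 1_850_717
households = 5_027
total_spend = 44_053_633   # US dollars
health_orders = 160_000    # approximate, as quoted

avg_spend_per_household = total_spend / households
purchases_per_household = total_purchases / households
health_share = health_orders / total_purchases

print(f"Average spend per household: ${avg_spend_per_household:,.0f}")
print(f"Average purchases per household: {purchases_per_household:,.0f}")
print(f"Health & wellness share of orders: {health_share:.1%}")
```

The purchases-per-household figure is not in the summary itself; it simply follows from dividing the first two rows.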

How to Run This Yourself

Requirements: VS Code + the Malloy extension (bundles DuckDB, no setup needed)

    git clone https://github.com/YOUR_USERNAME/amazon-malloy
    cd amazon-malloy

  1. Download the dataset from Harvard Dataverse and place amazon-purchases.csv and survey.csv in the data/ folder
  2. Open the project folder in VS Code
  3. Open notebooks/amazon-health-story.malloynb to follow the story
  4. Open models/amazon.malloy to explore or write your own queries
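Before opening the notebook, it can help to confirm that the data files from step 1 are in place. The following is a small optional helper, not part of the repo: the function name `missing_data_files` is hypothetical, and it assumes the script is run from the project root with the files living in `data/` as described above.

```python
from pathlib import Path

# File names taken from the setup steps above.
REQUIRED_FILES = ["amazon-purchases.csv", "survey.csv"]

def missing_data_files(project_root="."):
    """Return the required file names that are not present in <root>/data."""
    data_dir = Path(project_root) / "data"
    return [name for name in REQUIRED_FILES if not (data_dir / name).is_file()]

if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing from data/:", ", ".join(missing))
    else:
        print("All required data files are in place.")
```

An empty list means both CSVs were found. Any names returned still need to be downloaded from Harvard Dataverse.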

Files in This Repo

    amazon-malloy/
      data/
        survey.csv
        fields.csv
      models/
        amazon.malloy                    ← source definitions, joins, measures
        hello.malloy                     ← initial sanity check query
      notebooks/
        amazon-health-story.malloynb     ← the narrative notebook
      README.md

Note: amazon-purchases.csv (299MB) exceeds GitHub's file limit and is not included. Download it directly from Harvard Dataverse (see the DOI in the dataset citation below).


So What?

These findings matter most to three audiences.

Consumer brands and health product marketers should rethink demographic targeting. If income and education don't predict health spending on Amazon, then campaigns built on those proxies are misfiring. The better signal appears to be age — but only if you're willing to accept that "health" means skincare to one age group and medications to another, requiring entirely different product strategies and creative approaches.

Healthcare researchers and insurers should note the diabetes finding. Diabetic households buy health products at the same rate as households without diabetes, but shift toward medications and professional healthcare equipment. That suggests Amazon purchase data could be a meaningful supplementary signal for identifying unmet health needs, without requiring any clinical data at all.

Amazon itself sits on a segmentation goldmine. The platform currently presents a unified "Health & Wellness" category to all buyers. This data suggests that category is actually four or five distinct markets layered on top of each other, each driven by different demographics and different needs. A more granular category structure — or personalization that accounts for these splits — could meaningfully improve both discovery and conversion.


Dataset: Berke, A., et al. (2023). Open e-commerce 1.0: Five years of crowdsourced U.S. Amazon purchase histories with user demographics. Harvard Dataverse. https://doi.org/10.7910/DVN/YGLYDY
