Download data from your LinkedIn company page using the Pages Data Portability API (DMA, EU). Produces raw JSON dumps under ./out/<org>/<timestamp>/ for downstream analysis.
See API_REFERENCE.md for the full endpoint catalogue.
- A LinkedIn developer app at https://www.linkedin.com/developers/apps that has been approved for the Pages Data Portability API product.
- The app's Client ID and Client Secret (Auth tab).
- A Redirect URL added under the Auth tab. Since this script runs in a terminal, http://localhost:8000/callback works fine: the page won't load, so you just copy the URL out of the browser bar.
- Your LinkedIn organization ID (the numeric ID of your company page).
pip install -r requirements.txt
cp .env.example .env
# edit .env: fill in LINKEDIN_CLIENT_ID, LINKEDIN_CLIENT_SECRET, LINKEDIN_ORG_ID
python linkedin_export.py --auth-only
The script prints an authorization URL. Open it in a browser, sign in, click Allow, then paste either the redirect URL or just the code=... value back into the terminal. The token is saved to ./.token.json (gitignored, mode 600) and refreshed automatically on later runs.
If you already have a token from https://www.linkedin.com/developers/tools/oauth/token-generator, set LINKEDIN_ACCESS_TOKEN in .env and skip the flow.
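
The "paste either the redirect URL or just the code=... value" step can be sketched as follows. This is a minimal illustration, not the script's actual parser: the function name `extract_auth_code` is hypothetical, and it assumes that a pasted value without `://` or `=` is the bare code.

```python
from urllib.parse import parse_qs, urlparse


def extract_auth_code(pasted: str) -> str:
    """Return the authorization code from a redirect URL, a code=... pair, or a bare code."""
    text = pasted.strip()
    if "://" in text:
        # Full redirect URL: the code lives in the query string.
        query = urlparse(text).query
    elif "=" in text:
        # Query fragment such as "code=AQT...&state=xyz".
        query = text.lstrip("?")
    else:
        # Assumption: anything else is the code value on its own.
        return text
    values = parse_qs(query).get("code")
    if not values:
        raise ValueError("no 'code' parameter found in pasted input")
    return values[0]
```

If the user denied access, the redirect URL carries an `error=...` parameter instead of `code=...`, and this sketch raises `ValueError` rather than returning an empty code.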
# Just LIST post URNs + dates (no download). Good first run.
python linkedin_export.py --list-only
# List only posts published in 2025 (creation-time filter).
python linkedin_export.py --list-only --year 2025
# Year ranges and lists are supported.
python linkedin_export.py --list-only --year 2023-2025
python linkedin_export.py --list-only --year 2024,2025
# Posts + analytics for 2024 only (analytics window auto-derived from --year).
python linkedin_export.py --posts --analytics --year 2024
# One-shot: posts + engagement + analytics + followers, all years.
python linkedin_export.py --all
# Just followers (slow — 1 request / 60 seconds rate limit).
python linkedin_export.py --followers
# Cap pages while testing.
python linkedin_export.py --posts --max-pages 1
The listing phase prints one line per post like 42. [2025-03-14] urn:li:share:1234. Posts with no extractable timestamp on the listing element are kept and shown as [????-??-??]. The listing is creation-time-descending, so when filtering by year the script short-circuits pagination once it drops below the earliest requested year.
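
A rough sketch of how the `--year` values and the short-circuit described above could work. The names `parse_years` and `filter_descending` are hypothetical, and posts are simplified to `(urn, year)` pairs; the real script may structure this differently.

```python
from typing import Optional, Set


def parse_years(spec: str) -> Optional[Set[int]]:
    """Parse "2025", "2024,2025", "2022-2024" or "all" into a set of years (None = all)."""
    spec = spec.strip().lower()
    if spec == "all":
        return None
    years: Set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = (int(p) for p in part.split("-", 1))
            if first > last:
                first, last = last, first
            years.update(range(first, last + 1))
        else:
            years.add(int(part))
    return years


def filter_descending(posts, years):
    """Keep posts in `years`; stop once a dated post falls below the earliest year.

    `posts` is an iterable of (urn, year_or_None) pairs, newest first.
    """
    if years is None:
        return list(posts)
    earliest = min(years)
    kept = []
    for urn, year in posts:
        if year is None:
            kept.append((urn, year))  # undated posts are kept, as in the listing
            continue
        if year < earliest:
            break  # listing is newest-first, so nothing further can match
        if year in years:
            kept.append((urn, year))
    return kept
```

The early `break` is only safe because the listing is sorted newest-first; with an unsorted listing every page would have to be read.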
Flags:
| Flag | Effect |
|---|---|
| --posts | List post URNs via dmaFeedContentsExternal?q=postsByAuthor, then hydrate via BATCH_GET /rest/dmaPosts. |
| --list-only | Only list post URNs (with dates) and write post_urns.json. Skips hydration, engagement, analytics, followers. |
| --year | Filter by post creation year. 2025, 2024,2025, 2022-2024, or all (default). |
| --engagement | Per post: comment URNs + reaction URNs (via dmaFeedContentsExternal), hydrated via BATCH_GET /rest/dmaComments and /rest/dmaReactions; plus BATCH_GET /rest/dmaSocialMetadata for counts. Implies --posts. |
| --analytics | Org-level trend (impressions, reactions, comments, reposts, clicks, engagement rate, CTR) + per-post trend via /rest/dmaOrganizationalPageContentAnalytics?q=trend. Implies --posts. |
| --followers | dmaOrganizationalPageFollows?q=followee&edgeType=MEMBER_FOLLOWS_ORGANIZATIONAL_PAGE. Cursor-paginated, 1 request / 60 seconds, max 1000 per page, up to 48h delayed. |
| --all | All of the above. |
| --analytics-start-ms / --analytics-end-ms | Analytics time window. Defaults: derived from --year if set, else last 365 days. |
| --max-pages N | Stop after N pages on each listing (debug). |
| --reauth | Force a fresh OAuth flow. |
| --auth-only | Run OAuth, save the token, then exit. |
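
To illustrate the default analytics window ("derived from --year if set, else last 365 days"), here is one way the millisecond bounds could be computed. It is a sketch under assumptions: the helper name `analytics_window_ms` is hypothetical, year boundaries are taken in UTC, and the end is clamped to the current time, none of which is confirmed by the script.

```python
from datetime import datetime, timedelta, timezone


def to_ms(moment: datetime) -> int:
    """Convert an aware datetime to epoch milliseconds."""
    return int(moment.timestamp() * 1000)


def analytics_window_ms(years=None, now=None):
    """Return (start_ms, end_ms) for the requested years, or the last 365 days."""
    now = now or datetime.now(timezone.utc)
    if years:
        # Assumption: the window spans Jan 1 of the first year to Jan 1 after the last.
        start = datetime(min(years), 1, 1, tzinfo=timezone.utc)
        end = datetime(max(years) + 1, 1, 1, tzinfo=timezone.utc)
        # Assumption: never ask for a window that ends in the future.
        return to_ms(start), to_ms(min(end, now))
    return to_ms(now - timedelta(days=365)), to_ms(now)
```

The resulting pair corresponds to what you would otherwise pass by hand as --analytics-start-ms and --analytics-end-ms.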
out/
  org-76457805/
    2026-05-06T...Z/
      post_urns.json
      post_listing_raw_pages.json
      posts.json
      followers.json
      followers_raw_pages.json
      analytics_org_trend.json
      per_post/
        urn_li_share_<id>/
          comments.json
          reactions.json
          social_metadata.json
          analytics.json
*.json files are the raw API responses (each call's full body, list of pages where pagination was used). No transformation — keep the originals so you can iterate on the analysis.
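
Because the dumps are untransformed, downstream analysis usually starts by flattening pages. The sketch below assumes a dump is either a single response body or a list of page bodies, each with an `elements` array; that shape is an assumption about the files, and `iter_elements` is a hypothetical helper, so check one of your own dumps first.

```python
import json
from pathlib import Path


def iter_elements(path):
    """Yield every item of every page's "elements" array from one raw dump."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # Assumption: paginated dumps are lists of page bodies, others a single body.
    pages = data if isinstance(data, list) else [data]
    for page in pages:
        yield from page.get("elements", [])
```

For example, `sum(1 for _ in iter_elements("followers_raw_pages.json"))` would count follower records across all pages, provided the file has that shape.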
Every call sends:
Authorization: Bearer <token>
LinkedIn-Version: 202604
X-Restli-Protocol-Version: 2.0.0
OAuth scope requested: r_dma_admin_pages_content.
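
If you want to reproduce a call outside the script, the three headers above can be built like this. The function name `build_headers` is hypothetical; the header names and values are the ones listed above.

```python
def build_headers(access_token: str, version: str = "202604") -> dict:
    """Return the headers sent with every request, as listed above."""
    return {
        "Authorization": f"Bearer {access_token}",
        "LinkedIn-Version": version,
        "X-Restli-Protocol-Version": "2.0.0",
    }
```

The returned dictionary can be passed as the headers argument of whichever HTTP client you use.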
- Member privacy: members who have not opted in to the "Page owners exporting your data" setting will appear obfuscated (no actor/follower URN).
- Analytics randomization: very small values may be rounded to 0; identical requests can return slightly different values (privacy-preserving randomization).
- Followers rate limit: 1 request per 60 seconds. The script sleeps automatically; budget time for many followers.
- Analytics quotas: hitting the privacy budget returns 0 values with a quota refresh time in metadata.
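
The automatic sleep for the followers rate limit can be pictured as a minimum-interval throttle. This is a sketch of the general technique, not the script's code; the class name `MinIntervalThrottle` is hypothetical, and the clock and sleep functions are injectable only so the behaviour can be tested without waiting.

```python
import time


class MinIntervalThrottle:
    """Ensure at least `interval_s` seconds pass between consecutive calls."""

    def __init__(self, interval_s=60.0, clock=time.monotonic, sleep=time.sleep):
        self.interval_s = interval_s
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        """Block until the next request is allowed, then record its time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.interval_s - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` before each followers page request means the first request goes out immediately and every later one waits out the rest of the 60 seconds. At up to 1000 followers per page, a page with 50,000 followers needs about 50 requests, so roughly 50 minutes.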
The script calls /rest/dmaOrganizationalPageContentAnalytics?q=trend for both an org-level summary (analytics_org_trend.json) and per-post (per_post/<slug>/analytics.json). A few quirks of that endpoint to know about:
- 14-month max window per call. The API rejects any timeIntervals whose start–end span is more than 14 months (400 BAD_REQUEST: "Start time and end time must be less than 14 months apart."). The script chunks the requested range into ≤12-month windows and merges the elements from each call into a single response, so you can ask for the org's full multi-year history with --all and the chunking is transparent.
- Per-post windows are scoped to the post. Each post's trend is fetched in a single call with the window [post_date − 1 day, post_date + 12 months]. Most engagement on a LinkedIn post happens within weeks of publication, so this captures everything without chunking.
- No upper-bound override needed for old posts. When you run without --year, the org-level window auto-sets to [earliest post date − 1 day, now]. Override with --analytics-start-ms / --analytics-end-ms if you need a different window.
- Some old windows return 500. When the page had no activity in a particular window (typical for spans before the page's first post), LinkedIn sometimes returns a persistent HTTP 500 instead of an empty elements list. The script catches each such window, prints skipping window YYYY-MM-DD..YYYY-MM-DD: ..., records it under metadata.failedWindows in the merged JSON, and continues with the next window. The other windows still produce valid data; only the empty-data window is missing from the merged result.
- Empty elements ≠ broken. A window that returns 200 OK with an empty elements array genuinely had no activity. Don't confuse this with the 500 case above.
- Older posts may show no per-post stats. The endpoint applies privacy thresholds: very small impression / reaction / comment counts are rounded to 0 (you can tell because metadata carries the quota-refresh marker). Posts older than ~12 months also occasionally return zero per-post elements regardless of what the trend was at the time.
- What the rendered markdown shows. The ## Stats block in each post's markdown sums totals across the trend window for IMPRESSIONS, REACTIONS, COMMENTS, REPOSTS, and CLICKS. If a metric is missing it renders as —.
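
The chunk-and-merge behaviour described in the first and fourth points can be sketched as below. This is an illustration of the idea, not the script's implementation: `add_months`, `chunk_windows` and `fetch_merged` are hypothetical names, `fetch_window` stands in for the real API call, and the shape of each `failedWindows` entry is an assumption.

```python
import calendar
from datetime import datetime, timezone


def add_months(moment: datetime, months: int) -> datetime:
    """Shift a datetime by whole calendar months, clamping the day if needed."""
    index = moment.year * 12 + (moment.month - 1) + months
    year, month0 = divmod(index, 12)
    day = min(moment.day, calendar.monthrange(year, month0 + 1)[1])
    return moment.replace(year=year, month=month0 + 1, day=day)


def chunk_windows(start: datetime, end: datetime, months: int = 12):
    """Split [start, end) into consecutive windows of at most `months` months."""
    windows = []
    cursor = start
    while cursor < end:
        upper = min(add_months(cursor, months), end)
        windows.append((cursor, upper))
        cursor = upper
    return windows


def fetch_merged(start, end, fetch_window):
    """Call `fetch_window(lo, hi)` per chunk and merge the `elements` lists.

    `fetch_window` is a placeholder for the real request; it should return the
    parsed JSON body and raise on an HTTP error such as a persistent 500.
    """
    merged = {"elements": [], "metadata": {"failedWindows": []}}
    for lo, hi in chunk_windows(start, end):
        try:
            body = fetch_window(lo, hi)
        except Exception as exc:
            print(f"skipping window {lo:%Y-%m-%d}..{hi:%Y-%m-%d}: {exc}")
            # Assumed entry shape; the script's actual fields may differ.
            merged["metadata"]["failedWindows"].append(
                {"start": lo.isoformat(), "end": hi.isoformat(), "error": str(exc)}
            )
            continue
        merged["elements"].extend(body.get("elements", []))
    return merged
```

Chunking at 12 months rather than the 14-month limit leaves headroom, so a window never gets rejected because of month-length differences.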
- 401 EMPTY_ACCESS_TOKEN → token expired; rerun and the script will refresh it, or pass --reauth.
- 403 FORBIDDEN → app not provisioned for r_dma_admin_pages_content, or you don't have an admin role on the page.
- 429 TOO_MANY_REQUESTS → script waits per Retry-After; for follows this is intrinsic.
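
For the 429 case, waiting "per Retry-After" means turning the header value into a number of seconds. The helper below is a sketch with a hypothetical name, `retry_after_seconds`; it handles both forms the HTTP specification allows (delay in seconds or an HTTP date) and falls back to an assumed default of 60 seconds when the header is missing or unparsable.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(header_value, default=60.0, now=None):
    """Translate a Retry-After header value into seconds to sleep."""
    if not header_value:
        return default
    value = header_value.strip()
    if value.isdigit():
        # Delay-seconds form, e.g. "120".
        return float(value)
    try:
        # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```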
Access to this API is gated — you can't just register an app and call the endpoints. LinkedIn has to verify and approve you first. The flow:
- Be a Page admin. You must have an ADMINISTRATOR, CONTENT_ADMINISTRATOR, or ANALYST role on the LinkedIn Page whose data you want to export. If you don't, ask whoever does to add you (Page → Admin tools → Manage admins). The role is granted to your personal LinkedIn account, not a separate "company" account.
- Create a developer app. Go to https://www.linkedin.com/developers/apps/new, fill in the basics (name, link the LinkedIn Page, upload a logo, accept the API terms). The app is the OAuth client your script authenticates as.
- Apply for the Pages Data Portability product. Open your app → Products tab → find Pages Data Portability API → click Request access. This opens a form asking for:
  - Business / legal name and address.
  - Country of operation (the API exists to satisfy the EU Digital Markets Act, so EU-relevant business context helps).
  - Use case description: what you intend to do with the data, what page(s) you'll export, why.
  - Confirmation that you're a Page admin.
  - Acceptance of the LinkedIn DMA Portability API Terms.
- Wait for review. LinkedIn's documentation says the decision typically comes within 7 business days. You'll get a notification (and an email) when it's approved or denied. Until it's approved, the Products tab still shows "Request access" and OAuth attempts with scope=r_dma_admin_pages_content will fail with Invalid scope / 403 FORBIDDEN.
- Configure OAuth. Once approved, in your app → Auth tab:
  - Note the Client ID and Client Secret (paste into .env).
  - Add http://localhost:8000/callback (or whatever you set as LINKEDIN_REDIRECT_URI) under Authorized redirect URLs. The redirect URL string must match byte-for-byte.
- Run the OAuth flow (python linkedin_export.py --auth-only) and you're done. The script's three-legged OAuth handshake is in linkedin_export.py and is also documented in LinkedIn's 3-legged OAuth Flow docs.
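
For orientation, the first leg of that handshake is just a URL with the right query parameters. The sketch below builds one against LinkedIn's standard authorization endpoint; the function name `build_authorization_url` is hypothetical and the script's own implementation may differ, for example in how it generates `state`.

```python
import secrets
from urllib.parse import urlencode

# LinkedIn's 3-legged OAuth authorization endpoint.
AUTHORIZE_URL = "https://www.linkedin.com/oauth/v2/authorization"


def build_authorization_url(client_id, redirect_uri,
                            scope="r_dma_admin_pages_content", state=None):
    """Return (url, state); open the URL in a browser to start the flow."""
    # `state` is an opaque anti-CSRF value echoed back on the redirect.
    state = state or secrets.token_urlsafe(16)
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    })
    return f"{AUTHORIZE_URL}?{query}", state
```

The `redirect_uri` passed here has to be the same string registered under Authorized redirect URLs, which is why the byte-for-byte match matters. Compare the `state` returned on the redirect with the one you generated before exchanging the code.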
Useful LinkedIn-side references:
- Pages Data Portability API Overview — official endpoint catalogue.
- LinkedIn Pages Data Portability API application review and developer support — what the review checks for and how to follow up if it's slow.
- Page owners exporting your data — the member-side privacy toggle that controls whether commenters / reactors are obfuscated in your export.