dpella/linked-company-pages-export

LinkedIn Pages Data Portability Export

Download data from your LinkedIn company page using the Pages Data Portability API (DMA, EU). Produces raw JSON dumps under ./out/<org>/<timestamp>/ for downstream analysis.

See API_REFERENCE.md for the full endpoint catalogue.

Prerequisites

  1. A LinkedIn developer app at https://www.linkedin.com/developers/apps that has been approved for the Pages Data Portability API product.
  2. The app's Client ID and Client Secret (Auth tab).
  3. A Redirect URL added under the Auth tab. Because the script runs in a terminal, http://localhost:8000/callback works fine: the page won't load, but you can copy the redirected URL out of the browser's address bar.
  4. Your LinkedIn organization ID (the numeric ID of your company page).

Setup

pip install -r requirements.txt
cp .env.example .env
# edit .env: fill in LINKEDIN_CLIENT_ID, LINKEDIN_CLIENT_SECRET, LINKEDIN_ORG_ID

OAuth (first run)

python linkedin_export.py --auth-only

The script prints an authorization URL. Open it in a browser, sign in, click Allow, then paste either the redirect URL or just the code=... value back into the terminal. The token is saved to ./.token.json (gitignored, mode 600) and refreshed automatically on later runs.
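As a rough illustration of the paste-back step and the mode-600 token file described above, here is a minimal sketch. The function names `extract_code` and `save_token` are hypothetical and are not taken from linkedin_export.py; the permission handling assumes a POSIX system.

```python
import json
import os
from urllib.parse import parse_qs, urlparse


def extract_code(pasted: str) -> str:
    """Accept a full redirect URL, a 'code=...' fragment, or a bare code."""
    text = pasted.strip()
    if "://" in text:
        query = urlparse(text).query          # full redirect URL
    elif "=" in text:
        query = text.lstrip("?")              # 'code=...&state=...' fragment
    else:
        return text                           # bare code value
    values = parse_qs(query).get("code")
    if not values:
        raise ValueError("no 'code' parameter found in the pasted value")
    return values[0]


def save_token(token: dict, path: str = ".token.json") -> None:
    """Write the token so that only the owner can read it (mode 600)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(token, fh)
    os.chmod(path, 0o600)  # also tightens a file that already existed
```

Creating the file with `os.open` and an explicit mode avoids a short window in which the token would be readable by other users.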

If you already have a token from https://www.linkedin.com/developers/tools/oauth/token-generator, set LINKEDIN_ACCESS_TOKEN in .env and skip the flow.
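The choice between a pre-made token and the OAuth flow could be sketched as below. This is an assumption about how such a decision might be coded, not the script's actual logic: `load_config` and the keys of the returned dictionary are hypothetical, and the sketch expects the .env values to already be present in the environment (for example via a dotenv loader).

```python
import os
from typing import Mapping, Optional


def load_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect settings and decide whether the OAuth flow is needed."""
    env = os.environ if env is None else env
    config = {
        "org_id": env.get("LINKEDIN_ORG_ID", ""),
        "client_id": env.get("LINKEDIN_CLIENT_ID", ""),
        "client_secret": env.get("LINKEDIN_CLIENT_SECRET", ""),
        "access_token": env.get("LINKEDIN_ACCESS_TOKEN") or None,
    }
    if not config["org_id"]:
        raise ValueError("LINKEDIN_ORG_ID is required")
    # A pre-made token lets us skip the interactive flow entirely.
    config["needs_oauth"] = config["access_token"] is None
    if config["needs_oauth"] and not (config["client_id"] and config["client_secret"]):
        raise ValueError("set LINKEDIN_CLIENT_ID and LINKEDIN_CLIENT_SECRET, "
                         "or provide LINKEDIN_ACCESS_TOKEN")
    return config
```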

Usage

# Just LIST post URNs + dates (no download). Good first run.
python linkedin_export.py --list-only

# List only posts published in 2025 (creation-time filter).
python linkedin_export.py --list-only --year 2025

# Year ranges and lists are supported.
python linkedin_export.py --list-only --year 2023-2025
python linkedin_export.py --list-only --year 2024,2025

# Posts + analytics for 2024 only (analytics window auto-derived from --year).
python linkedin_export.py --posts --analytics --year 2024

# One-shot: posts + engagement + analytics + followers, all years.
python linkedin_export.py --all

# Just followers (slow — 1 request / 60 seconds rate limit).
python linkedin_export.py --followers

# Cap pages while testing.
python linkedin_export.py --posts --max-pages 1

The listing phase prints one line per post like 42. [2025-03-14] urn:li:share:1234. Posts with no extractable timestamp on the listing element are kept and shown as [????-??-??]. The listing is creation-time-descending, so when filtering by year the script short-circuits pagination once it drops below the earliest requested year.
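The year filter and its short-circuit can be sketched roughly as follows. `parse_years` and `filter_by_year` are hypothetical names, and posts are modelled as `(urn, created_ms)` tuples purely for illustration; whether the real script accepts mixed specs such as `2022-2024,2026` is not stated, although this sketch does.

```python
from datetime import datetime, timezone
from typing import Iterable, Iterator, Optional, Set, Tuple

Post = Tuple[str, Optional[int]]  # (urn, creation time in epoch ms, or None)


def parse_years(spec: str) -> Optional[Set[int]]:
    """'all' -> None (no filter); '2025', '2024,2025', '2022-2024' -> a set."""
    spec = spec.strip().lower()
    if spec == "all":
        return None
    years: Set[int] = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = (int(p) for p in part.split("-", 1))
            if lo > hi:
                raise ValueError(f"bad year range: {part}")
            years.update(range(lo, hi + 1))
        else:
            years.add(int(part))
    return years


def filter_by_year(posts: Iterable[Post], years: Optional[Set[int]]) -> Iterator[Post]:
    """Filter a newest-first listing, stopping below the earliest requested year."""
    for urn, created_ms in posts:
        if years is None or created_ms is None:
            yield urn, created_ms  # undated posts are kept
            continue
        year = datetime.fromtimestamp(created_ms / 1000, tz=timezone.utc).year
        if year < min(years):
            break  # listing is creation-time-descending: nothing older can match
        if year in years:
            yield urn, created_ms
```

Because the generator stops consuming its input at the `break`, a lazily paginated listing would not fetch any further pages once it reaches that point.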

Flags:

| Flag | Effect |
| --- | --- |
| --posts | List post URNs via dmaFeedContentsExternal?q=postsByAuthor, then hydrate via BATCH_GET /rest/dmaPosts. |
| --list-only | Only list post URNs (with dates) and write post_urns.json. Skips hydration, engagement, analytics, followers. |
| --year | Filter by post creation year. 2025, 2024,2025, 2022-2024, or all (default). |
| --engagement | Per post: comment URNs + reaction URNs (via dmaFeedContentsExternal), hydrated via BATCH_GET /rest/dmaComments and /rest/dmaReactions; plus BATCH_GET /rest/dmaSocialMetadata for counts. Implies --posts. |
| --analytics | Org-level trend (impressions, reactions, comments, reposts, clicks, engagement rate, CTR) + per-post trend via /rest/dmaOrganizationalPageContentAnalytics?q=trend. Implies --posts. |
| --followers | dmaOrganizationalPageFollows?q=followee&edgeType=MEMBER_FOLLOWS_ORGANIZATIONAL_PAGE. Cursor-paginated, 1 request / 60 seconds, max 1000 per page, up to 48h delayed. |
| --all | All of the above. |
| --analytics-start-ms / --analytics-end-ms | Analytics time window. Defaults: derived from --year if set, else last 365 days. |
| --max-pages N | Stop after N pages on each listing (debug). |
| --reauth | Force a fresh OAuth flow. |
| --auth-only | Run OAuth, save the token, then exit. |
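The "implies" relationships between the flags could be expressed along these lines. This is a sketch with `argparse`, not the script's actual parser: `build_parser` and `resolve` are hypothetical names, and `--all` is assumed to expand to posts, engagement, analytics, and followers as described in the Usage section.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the flags from the table above (help texts omitted)."""
    parser = argparse.ArgumentParser(prog="linkedin_export.py")
    for flag in ("posts", "list-only", "engagement", "analytics",
                 "followers", "all", "reauth", "auth-only"):
        parser.add_argument(f"--{flag}", action="store_true")
    parser.add_argument("--year", default="all")
    parser.add_argument("--analytics-start-ms", type=int)
    parser.add_argument("--analytics-end-ms", type=int)
    parser.add_argument("--max-pages", type=int, metavar="N")
    return parser


def resolve(args: argparse.Namespace) -> argparse.Namespace:
    """Expand --all and apply the 'Implies --posts' rules."""
    if args.all:
        args.posts = args.engagement = args.analytics = args.followers = True
    if args.engagement or args.analytics:
        args.posts = True
    return args
```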

Output layout

out/
  org-76457805/
    2026-05-06T...Z/
      post_urns.json
      post_listing_raw_pages.json
      posts.json
      followers.json
      followers_raw_pages.json
      analytics_org_trend.json
      per_post/
        urn_li_share_<id>/
          comments.json
          reactions.json
          social_metadata.json
          analytics.json

*.json files are the raw API responses: each call's full body, or a list of page bodies where pagination was used. Nothing is transformed, so you keep the originals and can iterate on the analysis.
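A minimal sketch of how paths in this layout could be derived is shown below. The helper names are hypothetical, and the timestamp format is an assumption, since the layout above elides it; only the `org-<id>` prefix and the `urn_li_share_<id>` slug shape come from the listing.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def urn_to_slug(urn: str) -> str:
    """'urn:li:share:1234' -> 'urn_li_share_1234' (filesystem-safe)."""
    return urn.replace(":", "_")


def run_dir(org_id: str, root: Path = Path("out"),
            now: Optional[datetime] = None) -> Path:
    """Directory for one export run: <root>/org-<id>/<UTC timestamp>/."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H%M%SZ")  # assumed format, not the script's
    return root / f"org-{org_id}" / stamp


def write_raw(path: Path, payload) -> Path:
    """Dump a raw API response unchanged, creating parent folders as needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path
```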

Headers / API version

Every call sends:

Authorization: Bearer <token>
LinkedIn-Version: 202604
X-Restli-Protocol-Version: 2.0.0

OAuth scope requested: r_dma_admin_pages_content.
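For reference, the three headers could be assembled like this. `build_headers` and `build_request` are hypothetical helpers, and the `https://api.linkedin.com` base URL is an assumption based on LinkedIn's public REST documentation rather than something stated in this README. The request object is only constructed here, not sent.

```python
import urllib.request


def build_headers(token: str, version: str = "202604") -> dict:
    """Headers sent with every call, per the list above."""
    return {
        "Authorization": f"Bearer {token}",
        "LinkedIn-Version": version,
        "X-Restli-Protocol-Version": "2.0.0",
    }


def build_request(path: str, token: str,
                  base: str = "https://api.linkedin.com") -> urllib.request.Request:
    """Prepare (but do not send) a GET request for a /rest/... path."""
    return urllib.request.Request(base + path, headers=build_headers(token))
```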

Caveats

  • Member privacy: members who have not opted in to the "Page owners exporting your data" setting appear obfuscated (no actor / follower URN).
  • Analytics randomization: very small values may be rounded to 0; identical requests can return slightly different values (privacy-preserving randomization).
  • Followers rate limit: 1 request per 60 seconds. The script sleeps automatically; budget time for many followers.
  • Analytics quotas: hitting the privacy budget returns 0 values with a quota refresh time in metadata.
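The followers rate limit amounts to a throttled cursor loop, which might look like the sketch below. `fetch_page` is a hypothetical callable that takes a cursor and returns one page as a dict, and the `next_cursor` key is an illustrative assumption; the real response shape is not shown in this README.

```python
import time
from typing import Callable, Iterator, Optional


def paginate_followers(fetch_page: Callable[[Optional[str]], dict],
                       min_interval: float = 60.0,
                       max_pages: Optional[int] = None,
                       clock: Callable[[], float] = time.monotonic,
                       sleep: Callable[[float], None] = time.sleep) -> Iterator[dict]:
    """Yield pages, leaving at least min_interval seconds between requests."""
    cursor: Optional[str] = None
    last_call: Optional[float] = None
    pages = 0
    while max_pages is None or pages < max_pages:
        if last_call is not None:
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)  # respect the 1 request / 60 seconds limit
        last_call = clock()
        page = fetch_page(cursor)
        yield page
        pages += 1
        cursor = page.get("next_cursor")  # assumed key name
        if not cursor:
            break  # no further pages
```

As a budgeting example: at up to 1000 followers per page, 25,000 followers need 25 requests, which means at least 24 minutes of waiting between calls.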

Notes on analytics

The script calls /rest/dmaOrganizationalPageContentAnalytics?q=trend for both an org-level summary (analytics_org_trend.json) and per-post (per_post/<slug>/analytics.json). A few quirks of that endpoint to know about:

  • 14-month max window per call. The API rejects any timeIntervals whose start–end span is more than 14 months (400 BAD_REQUEST: "Start time and end time must be less than 14 months apart."). The script chunks the requested range into ≤12-month windows and merges the elements from each call into a single response, so you can ask for the org's full multi-year history with --all and the chunking is transparent.
  • Per-post windows are scoped to the post. Each post's trend is fetched in a single call with the window [post_date − 1 day, post_date + 12 months]. Most engagement on a LinkedIn post happens within weeks of publication, so this captures everything without chunking.
  • No upper-bound override needed for old posts. When you run without --year, the org-level window auto-sets to [earliest post date − 1 day, now]. Override with --analytics-start-ms / --analytics-end-ms if you need a different window.
  • Some old windows return 500. When the page had no activity in a particular window (typical for spans before the page's first post), LinkedIn sometimes returns a persistent HTTP 500 instead of an empty elements list. The script catches each such window, prints skipping window YYYY-MM-DD..YYYY-MM-DD: ..., records it under metadata.failedWindows in the merged JSON, and continues with the next window. The other windows still produce valid data — only the empty-data window is missing from the merged result.
  • Empty elements ≠ broken. A window that returns 200 OK with an empty elements array genuinely had no activity. Don't confuse this with the 500 case above.
  • Older posts may show no per-post stats. The endpoint applies privacy thresholds: very small impression / reaction / comment counts are rounded to 0 (you can tell because metadata carries the quota-refresh marker). Posts older than ~12 months also occasionally return zero per-post elements regardless of what the trend was at the time.
  • What the rendered markdown shows. The ## Stats block in each post's markdown sums totals across the trend window for IMPRESSIONS, REACTIONS, COMMENTS, REPOSTS, and CLICKS. A missing metric renders as a placeholder instead of a number.
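The windowing behaviour described in the notes above can be sketched as follows. All names here are hypothetical: `fetch_window` stands in for a single analytics call that raises on a persistent HTTP 500, and the exact shape of each `failedWindows` entry is an assumption.

```python
import calendar
from datetime import datetime, timedelta
from typing import Callable, List, Tuple

Window = Tuple[datetime, datetime]


def add_months(dt: datetime, months: int) -> datetime:
    """Shift dt by whole months, clamping the day (e.g. Jan 31 -> Feb 28)."""
    index = dt.month - 1 + months
    year, month = dt.year + index // 12, index % 12 + 1
    day = min(dt.day, calendar.monthrange(year, month)[1])
    return dt.replace(year=year, month=month, day=day)


def chunk_windows(start: datetime, end: datetime, max_months: int = 12) -> List[Window]:
    """Split [start, end] into consecutive windows of at most max_months."""
    windows, cursor = [], start
    while cursor < end:
        stop = min(add_months(cursor, max_months), end)
        windows.append((cursor, stop))
        cursor = stop
    return windows


def per_post_window(post_date: datetime) -> Window:
    """[post_date - 1 day, post_date + 12 months], well under the 14-month cap."""
    return post_date - timedelta(days=1), add_months(post_date, 12)


def fetch_merged(start: datetime, end: datetime,
                 fetch_window: Callable[[datetime, datetime], dict]) -> dict:
    """Call fetch_window per chunk, merge elements, and record failed windows."""
    merged = {"elements": [], "metadata": {"failedWindows": []}}
    for lo, hi in chunk_windows(start, end):
        try:
            body = fetch_window(lo, hi)
        except Exception as exc:  # e.g. a persistent HTTP 500 for an empty span
            print(f"skipping window {lo:%Y-%m-%d}..{hi:%Y-%m-%d}: {exc}")
            merged["metadata"]["failedWindows"].append(
                {"start": lo.isoformat(), "end": hi.isoformat(), "error": str(exc)})
            continue
        merged["elements"].extend(body.get("elements", []))
    return merged
```

A window that succeeds with an empty `elements` list simply contributes nothing to the merge, which keeps it distinguishable from a window listed under `failedWindows`.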

Troubleshooting

  • 401 EMPTY_ACCESS_TOKEN → token expired. Rerun and the script will refresh it, or pass --reauth to force a fresh OAuth flow.
  • 403 FORBIDDEN → app not provisioned for r_dma_admin_pages_content, or you don't have an admin role on the page.
  • 429 TOO_MANY_REQUESTS → the script waits as directed by Retry-After. For the followers endpoint this is expected, given its limit of 1 request per 60 seconds.
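Waiting on a 429 according to Retry-After might be implemented roughly as below. This is a sketch under assumptions: `send` is a hypothetical callable returning `(status, headers, body)`, and the 60-second fallback is chosen to match the followers limit rather than taken from the script.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Callable, Optional, Tuple


def retry_after_seconds(value: Optional[str], default: float = 60.0,
                        now: Optional[datetime] = None) -> float:
    """Interpret Retry-After as delta-seconds or an HTTP date."""
    if not value:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default  # unparseable header: fall back
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def call_with_retry(send: Callable[[], Tuple[int, dict, object]],
                    max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep):
    """Repeat send() while it answers 429, sleeping as the server requests."""
    for _ in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        sleep(retry_after_seconds(headers.get("Retry-After")))
    raise RuntimeError(f"still rate-limited after {max_attempts} attempts")
```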

Getting access to the Pages Data Portability API

Access to this API is gated — you can't just register an app and call the endpoints. LinkedIn has to verify and approve you first. The flow:

  1. Be a Page admin. You must have an ADMINISTRATOR, CONTENT_ADMINISTRATOR, or ANALYST role on the LinkedIn Page whose data you want to export. If you don't, ask whoever does to add you (Page → Admin tools → Manage admins). The role goes to your personal LinkedIn account, not a separate "company" account.

  2. Create a developer app. Go to https://www.linkedin.com/developers/apps/new, fill in the basics (name, link the LinkedIn Page, upload a logo, accept the API terms). The app is the OAuth client your script authenticates as.

  3. Apply for the Pages Data Portability product. Open your app → Products tab → find Pages Data Portability API → click Request access. This opens a form asking for:

    • Business / legal name and address.
    • Country of operation (the API exists to satisfy the EU Digital Markets Act, so EU-relevant business context helps).
    • Use case description — what you intend to do with the data, what page(s) you'll export, why.
    • Confirmation that you're a Page admin.
    • Acceptance of the LinkedIn DMA Portability API Terms.
  4. Wait for review. LinkedIn's documentation says the decision typically comes within 7 business days. You'll get a notification (and an email) when it's approved or denied. Until it's approved, the Products tab still shows "Request access" and OAuth attempts with scope=r_dma_admin_pages_content will fail with Invalid scope / 403 FORBIDDEN.

  5. Configure OAuth. Once approved, in your app → Auth tab:

    • Note the Client ID and Client Secret (paste into .env).
    • Add http://localhost:8000/callback (or whatever you set as LINKEDIN_REDIRECT_URI) under Authorized redirect URLs. The redirect URL string must match byte-for-byte.
  6. Run the OAuth flow (python linkedin_export.py --auth-only) and you're done. The script's three-legged OAuth handshake is in linkedin_export.py and is also documented in LinkedIn's 3-legged OAuth Flow docs.
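The two URLs involved in the handshake can be sketched as follows. The endpoints are the ones given in LinkedIn's public 3-legged OAuth documentation; the function names are hypothetical and this is not a copy of what linkedin_export.py does. Nothing is sent over the network here.

```python
import secrets
from typing import Optional
from urllib.parse import urlencode

AUTH_URL = "https://www.linkedin.com/oauth/v2/authorization"
TOKEN_URL = "https://www.linkedin.com/oauth/v2/accessToken"
SCOPE = "r_dma_admin_pages_content"


def authorization_url(client_id: str, redirect_uri: str,
                      state: Optional[str] = None) -> str:
    """Step 1: the URL the user opens in a browser to grant access."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,  # must match the app's Auth tab exactly
        "state": state or secrets.token_urlsafe(16),  # CSRF protection
        "scope": SCOPE,
    }
    return f"{AUTH_URL}?{urlencode(params)}"


def token_request_body(code: str, client_id: str, client_secret: str,
                       redirect_uri: str) -> bytes:
    """Step 2: form-encoded body to POST to TOKEN_URL to exchange the code."""
    return urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "client_id": client_id,
        "client_secret": client_secret,
        "redirect_uri": redirect_uri,  # same string as in step 1
    }).encode("ascii")
```

The same `redirect_uri` string is passed to both steps because the token exchange is rejected when it differs from the one used to obtain the code.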

About

Script to export all posts published by a European company
