alopes/source-maps-downloader

Webpack Source File Extractor

A Python tool that automatically discovers and extracts original source files from webpack bundles using sourcemaps. This tool is useful for analyzing and understanding the structure of webpack-based applications by recovering the original source code from production builds.

Features

  • Automatic Discovery: Scans web pages to find JavaScript files and their associated sourcemaps
  • Recursive Chunk Detection: Identifies and downloads dynamically loaded webpack chunks
  • Source Extraction: Extracts original source files from sourcemaps while preserving directory structure
  • Concurrent Downloads: Uses threading for efficient parallel downloading
  • Smart Filtering: Excludes node_modules and webpack internals
  • Next.js Support: Includes patterns for Next.js applications
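The concurrent-download feature can be illustrated with a small thread pool. This is a minimal sketch and not the tool's actual code: `download_all` and its injectable `fetch` callable are hypothetical names, used so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Optional


def download_all(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    workers: int = 10,
) -> Dict[str, Optional[bytes]]:
    """Fetch every URL in parallel; failed downloads are recorded as None."""
    results: Dict[str, Optional[bytes]] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map each future back to the URL it is downloading.
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception:
                # One failing URL should not abort the whole batch.
                results[url] = None
    return results
```

With `requests` installed, `fetch` could be something like `lambda url: requests.get(url, timeout=10).content`.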

Requirements

  • Python 3.6+
  • Required packages (auto-installed if missing):
    • requests
    • beautifulsoup4

Installation

Clone the repository or download the sourcemap_downloader.py script to your local machine.

# Install dependencies manually if needed
pip install requests beautifulsoup4

Usage

Basic Usage

python sourcemap_downloader.py <URL>

Command Line Options

python sourcemap_downloader.py <URL> [options]

Arguments:
  URL                    Target website URL (e.g., https://example.com)

Options:
  -o, --output DIR       Output directory for extracted files (default: webpack_sources)
  -w, --workers N        Number of concurrent download workers (default: 10)
  -h, --help             Show help message
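For readers who want to wrap or reimplement this interface, the options above map naturally onto `argparse`. The sketch below is a reconstruction from the help text only. The `build_parser` function name is an assumption, not taken from the script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(
        prog="sourcemap_downloader.py",
        description="Extract original source files from webpack sourcemaps.",
    )
    parser.add_argument("url", metavar="URL",
                        help="Target website URL (e.g., https://example.com)")
    parser.add_argument("-o", "--output", metavar="DIR", default="webpack_sources",
                        help="Output directory for extracted files")
    parser.add_argument("-w", "--workers", metavar="N", type=int, default=10,
                        help="Number of concurrent download workers")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["https://example.com", "-w", "20"])
print(args.url, args.output, args.workers)
```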

Examples

  1. Extract sources from a website:

    python sourcemap_downloader.py https://example.com

  2. Specify custom output directory:

    python sourcemap_downloader.py https://example.com -o my_sources

  3. Adjust concurrent workers for faster/slower downloads:

    python sourcemap_downloader.py https://example.com -w 20

How It Works

  1. Discovery Phase:

    • Fetches the main page HTML
    • Identifies all JavaScript files linked in <script> tags
    • Finds preloaded JavaScript resources

  2. Recursive Chunk Detection:

    • Downloads each JavaScript file
    • Searches for references to other webpack chunks
    • Builds a complete list of all JavaScript assets

  3. Sourcemap Extraction:

    • Looks for sourcemap references in JavaScript files (//# sourceMappingURL=)
    • Attempts to download .map files for each .js file
    • Validates that downloaded files are valid sourcemaps

  4. Source File Recovery:

    • Parses sourcemap JSON data
    • Extracts original source code from sourcesContent field
    • Recreates original directory structure
    • Saves files with their original paths
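The discovery phase (step 1 above) can be sketched as follows. The tool itself lists beautifulsoup4 as a dependency. This example uses the standard library's `html.parser` instead so it runs without extra packages, and the names `ScriptCollector` and `find_script_urls` are hypothetical.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class ScriptCollector(HTMLParser):
    """Collect script URLs from <script src> and <link rel="preload" as="script">."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        ref = None
        if tag == "script":
            ref = attr.get("src")  # inline scripts have no src and are skipped
        elif tag == "link":
            rel = (attr.get("rel") or "").lower().split()
            if "preload" in rel and (attr.get("as") or "").lower() == "script":
                ref = attr.get("href")
        if ref:
            url = urljoin(self.base_url, ref)  # resolve relative references
            if url not in self.urls:
                self.urls.append(url)


def find_script_urls(html: str, base_url: str) -> List[str]:
    """Return absolute script URLs in document order, without duplicates."""
    collector = ScriptCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.urls
```

The returned URLs would then feed the chunk-detection and sourcemap steps.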

Output Structure

The tool creates a directory structure that mirrors the original source code organization:

webpack_sources/
├── components/
│   ├── Header.tsx
│   ├── Footer.tsx
│   └── ...
├── pages/
│   ├── index.tsx
│   ├── about.tsx
│   └── ...
├── utils/
│   └── helpers.ts
└── ...
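Entries in a sourcemap's `sources` array often carry a scheme prefix such as `webpack:///` and may contain `..` segments, so they need to be normalized before they are used as file paths. The following is one possible approach, not necessarily what the tool does. `safe_output_path` is a hypothetical helper.

```python
import re
from pathlib import Path


def safe_output_path(source: str, output_dir: str = "webpack_sources") -> Path:
    """Map a sourcemap source name to a path that stays inside output_dir."""
    # Drop a URL-style scheme such as "webpack://" (assumed prefix format).
    stripped = re.sub(r"^[a-zA-Z][a-zA-Z0-9+.-]*://", "", source)
    # Drop any query string, e.g. "App.vue?1234".
    stripped = stripped.split("?", 1)[0]
    # Remove empty, "." and ".." segments so the result cannot escape output_dir.
    parts = [
        part
        for part in stripped.replace("\\", "/").split("/")
        if part not in ("", ".", "..")
    ]
    return Path(output_dir).joinpath(*parts)


print(safe_output_path("webpack:///./src/components/Header.tsx").as_posix())
# prints "webpack_sources/src/components/Header.tsx"
```

This sketch is not hardened for every platform. For example, it does not handle Windows drive letters.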

Supported Patterns

The tool recognizes various webpack and framework patterns:

  • Standard webpack sourcemap references
  • Next.js chunk loading patterns
  • Dynamic imports and code splitting
  • Various sourcemap URL formats
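As an illustration of handling sourcemap references, the sketch below looks for a trailing `sourceMappingURL` comment and otherwise falls back to appending `.map` to the script URL. The exact patterns the tool recognizes are not listed here, so the regular expression and the `find_sourcemap_url` name are assumptions.

```python
import re
from urllib.parse import urljoin, urlsplit, urlunsplit

# Matches "//# sourceMappingURL=..." and the legacy "//@ sourceMappingURL=..." form.
SOURCE_MAP_RE = re.compile(r"//[#@]\s*sourceMappingURL=(\S+)")


def find_sourcemap_url(js_url: str, js_text: str) -> str:
    """Return the sourcemap URL (or inline data: URI) for a JavaScript file."""
    matches = SOURCE_MAP_RE.findall(js_text)
    if matches:
        ref = matches[-1]  # the last comment in the file takes precedence
        if ref.startswith("data:"):
            return ref  # inline sourcemap, nothing to download
        return urljoin(js_url, ref)  # resolve relative to the script URL
    # Fallback guess: same path with ".map" appended, query string preserved.
    parts = urlsplit(js_url)
    return urlunsplit(parts._replace(path=parts.path + ".map"))
```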

Limitations

  • Only works with websites that include sourcemaps in production
  • Requires sourcemaps to contain the sourcesContent field
  • Cannot recover sources if sourcemaps are missing or incomplete
  • Skips minified/processed node_modules code
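These limitations follow from the sourcemap format: original code is only recoverable when `sourcesContent` holds an entry for the corresponding item in `sources`. The sketch below shows that pairing, under assumptions. `extract_sources` is a hypothetical helper, and the `node_modules` and `/webpack/` substring checks are a guess at what the filter might look like.

```python
import json
from typing import Dict


def extract_sources(sourcemap_text: str) -> Dict[str, str]:
    """Return {source name: original code} for recoverable application files."""
    data = json.loads(sourcemap_text)
    if not isinstance(data, dict):
        return {}  # not a sourcemap object
    sources = data.get("sources") or []
    contents = data.get("sourcesContent") or []
    recovered: Dict[str, str] = {}
    for index, name in enumerate(sources):
        # sourcesContent may be missing, shorter than sources, or contain nulls.
        content = contents[index] if index < len(contents) else None
        if content is None:
            continue
        # Assumed filter: skip dependencies and webpack runtime modules.
        if "node_modules" in name or "/webpack/" in name:
            continue
        recovered[name] = content
    return recovered
```

A sourcemap without `sourcesContent` yields an empty result, which matches the "Incomplete source extraction" case described under Troubleshooting.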

Security Considerations

This tool is designed for legitimate purposes such as:

  • Analyzing your own applications
  • Security research with proper authorization
  • Educational purposes
  • Understanding webpack bundle structures

Always ensure you have permission to analyze the target website and comply with applicable laws and terms of service.

Troubleshooting

No sourcemaps found:

  • The website may not include sourcemaps in production
  • Check whether the site is running a development build, which is more likely to expose sourcemaps

Incomplete source extraction:

  • Some sourcemaps may not include source content
  • Check if all JavaScript chunks were discovered

Connection errors:

  • Reduce the number of workers with -w 5
  • Check your internet connection
  • Verify the URL is accessible

License

This tool is provided as-is for educational and analysis purposes. Users are responsible for ensuring their use complies with all applicable laws and regulations.
