A Python tool that automatically discovers and extracts original source files from webpack bundles using sourcemaps. This tool is useful for analyzing and understanding the structure of webpack-based applications by recovering the original source code from production builds.
- Automatic Discovery: Scans web pages to find JavaScript files and their associated sourcemaps
- Recursive Chunk Detection: Identifies and downloads dynamically loaded webpack chunks
- Source Extraction: Extracts original source files from sourcemaps while preserving directory structure
- Concurrent Downloads: Uses threading for efficient parallel downloading
- Smart Filtering: Excludes node_modules and webpack internals
- Next.js Support: Includes patterns for Next.js applications
- Python 3.6+
- Required packages (auto-installed if missing):
  - requests
  - beautifulsoup4
Clone or download the sourcemap_downloader.py script to your local machine.
# Install dependencies manually if needed
pip install requests beautifulsoup4
python sourcemap_downloader.py <URL>
python sourcemap_downloader.py <URL> [options]
Arguments:
  URL                 Target website URL (e.g., https://example.com)
Options:
  -o, --output DIR    Output directory for extracted files (default: webpack_sources)
  -w, --workers N     Number of concurrent download workers (default: 10)
  -h, --help          Show help message
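The options above map naturally onto Python's argparse. The following is a minimal sketch, assuming the script builds its command line with argparse (the source does not say which parser it uses). `build_parser` is a hypothetical name; only the flags and defaults are taken from the option list.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented options; not the script's actual code."""
    parser = argparse.ArgumentParser(
        prog="sourcemap_downloader.py",
        description="Extract original sources from webpack bundles via sourcemaps.",
    )
    parser.add_argument("url", metavar="URL",
                        help="Target website URL (e.g., https://example.com)")
    parser.add_argument("-o", "--output", metavar="DIR", default="webpack_sources",
                        help="Output directory for extracted files")
    parser.add_argument("-w", "--workers", metavar="N", type=int, default=10,
                        help="Number of concurrent download workers")
    # argparse adds -h/--help automatically.
    return parser
```

Calling `build_parser().parse_args(["https://example.com", "-w", "20"])` yields a namespace with `url`, `output`, and `workers` attributes, with `output` falling back to its default.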
- Extract sources from a website:
python sourcemap_downloader.py https://example.com
- Specify a custom output directory:
python sourcemap_downloader.py https://example.com -o my_sources
- Adjust concurrent workers for faster/slower downloads:
python sourcemap_downloader.py https://example.com -w 20
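To illustrate how a worker count might translate into parallel downloads, here is a sketch that assumes a thread pool from `concurrent.futures` (the source only says the tool uses threading). `download_all` and `fetch` are hypothetical names; `fetch` stands for any callable that takes a URL and returns the response body, for example a small wrapper around `requests.get`.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(urls, fetch, workers=10):
    """Run fetch(url) for every URL on a pool of `workers` threads.

    Returns (results, errors), both dicts keyed by URL. A failed download
    is recorded in `errors` instead of aborting the whole batch.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # network errors, HTTP errors, timeouts
                errors[url] = exc
    return results, errors
```

In a design like this, a smaller `workers` value opens fewer simultaneous connections to the target server, at the cost of a longer total run.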
- Discovery Phase:
  - Fetches the main page HTML
  - Identifies all JavaScript files linked in <script> tags
  - Finds preloaded JavaScript resources
- Recursive Chunk Detection:
  - Downloads each JavaScript file
  - Searches for references to other webpack chunks
  - Builds a complete list of all JavaScript assets
- Sourcemap Extraction:
  - Looks for sourcemap references in JavaScript files (//# sourceMappingURL=)
  - Attempts to download .map files for each .js file
  - Validates that downloaded files are valid sourcemaps
- Source File Recovery:
  - Parses sourcemap JSON data
  - Extracts original source code from the sourcesContent field
  - Recreates original directory structure
  - Saves files with their original paths
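The discovery phase can be sketched as follows. This is an illustration rather than the tool's own code: it uses the standard library's `html.parser` in place of beautifulsoup4 so that it runs without extra packages, and `ScriptCollector` and `discover_scripts` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ScriptCollector(HTMLParser):
    """Collects JavaScript URLs from <script src=...> and preloaded script links."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = None
        if tag == "script":
            href = attrs.get("src")  # inline scripts have no src and are skipped
        elif tag == "link" and "preload" in (attrs.get("rel") or "") \
                and attrs.get("as") == "script":
            href = attrs.get("href")
        if href:
            url = urljoin(self.base_url, href)  # resolve relative paths
            if url not in self.urls:  # keep first occurrence only
                self.urls.append(url)


def discover_scripts(html, base_url):
    """Return absolute JavaScript URLs referenced by the page, in document order."""
    collector = ScriptCollector(base_url)
    collector.feed(html)
    return collector.urls
```

The resulting list would then be the starting point for chunk detection, where each downloaded file is searched for references to further JavaScript assets.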
The tool creates a directory structure that mirrors the original source code organization:
webpack_sources/
├── components/
│ ├── Header.tsx
│ ├── Footer.tsx
│ └── ...
├── pages/
│ ├── index.tsx
│ ├── about.tsx
│ └── ...
├── utils/
│ └── helpers.ts
└── ...
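A sketch of how sourcemap entries could be turned into a directory tree like this one. It assumes the standard sourcemap layout, where `sources` and `sourcesContent` are parallel arrays, and it treats any path containing a `node_modules` or `webpack` segment as filtered; that filter rule is an assumption, as are the names `path_parts` and `extract_sources`.

```python
import json
import os


def path_parts(source):
    """Split a `sources` entry into safe path segments.

    Drops a scheme prefix such as "webpack://", any query string, and the
    "", "." and ".." segments so the result cannot escape the output directory.
    """
    path = source.split("://", 1)[-1].split("?", 1)[0]
    return [p for p in path.replace("\\", "/").split("/") if p not in ("", ".", "..")]


def extract_sources(sourcemap_text, output_dir):
    """Write every embedded source under output_dir; return the written paths."""
    data = json.loads(sourcemap_text)
    sources = data.get("sources") or []
    contents = data.get("sourcesContent") or []
    written = []
    for source, content in zip(sources, contents):
        if not source or content is None:
            continue  # nothing embedded for this entry
        parts = path_parts(source)
        # Assumed filter: skip dependencies and webpack runtime files.
        if not parts or "node_modules" in parts or "webpack" in parts:
            continue
        target = os.path.join(output_dir, *parts)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "w", encoding="utf-8") as handle:
            handle.write(content)
        written.append(target)
    return written
```

With this approach, an entry such as `webpack:///./components/Header.tsx` ends up at `components/Header.tsx` inside the output directory.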
The tool recognizes various webpack and framework patterns:
- Standard webpack sourcemap references
- Next.js chunk loading patterns
- Dynamic imports and code splitting
- Various sourcemap URL formats
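As an illustration of the sourcemap reference formats, the sketch below matches the `//# sourceMappingURL=` comment, the legacy `//@` form, and the `/*# ... */` block-comment form, then resolves the reference against the script's URL. The regular expression and the name `sourcemap_url` are assumptions, not the tool's actual patterns, and the `.map` fallback assumes the script URL has no query string.

```python
import re
from urllib.parse import urljoin

# "//# sourceMappingURL=x", legacy "//@ sourceMappingURL=x", or "/*# sourceMappingURL=x */"
SOURCE_MAP_RE = re.compile(r"(?://|/\*)[#@]\s*sourceMappingURL=([^\s*]+)")


def sourcemap_url(js_url, js_text):
    """Return the most likely sourcemap URL for a script."""
    matches = SOURCE_MAP_RE.findall(js_text)
    if matches:
        # Use the last match, since the comment conventionally sits at the end of the file.
        return urljoin(js_url, matches[-1])
    # No comment found: guess the conventional "<script>.map" location.
    return js_url + ".map"
```

A guessed `.map` URL still has to be downloaded and checked, because many servers answer unknown paths with an HTML page instead of a 404.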
- Only works with websites that include sourcemaps in production
- Requires sourcemaps to contain the sourcesContent field
- Cannot recover sources if sourcemaps are missing or incomplete
- Skips minified/processed node_modules code
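To see whether a downloaded file falls under these limitations, one plausible check is to count how many of its sources carry embedded content. This is a sketch only: `sourcemap_coverage` is a hypothetical helper, and requiring the `sources` and `mappings` keys is an assumed definition of a "valid" sourcemap.

```python
import json


def sourcemap_coverage(sourcemap_text):
    """Return (total_sources, recoverable_sources) for a sourcemap.

    Raises ValueError when the text is not a sourcemap at all, for example
    an HTML error page served in place of the .map file.
    """
    try:
        data = json.loads(sourcemap_text)
    except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
        raise ValueError("not JSON, so not a sourcemap") from exc
    if not isinstance(data, dict) or "sources" not in data or "mappings" not in data:
        raise ValueError("JSON, but missing the 'sources' or 'mappings' key")
    sources = data["sources"]
    contents = data.get("sourcesContent") or []
    # Only entries with non-null embedded content can be written back to disk.
    recoverable = sum(1 for c in contents[:len(sources)] if c is not None)
    return len(sources), recoverable
```

A result such as `(120, 0)` would mean the map exists but was built without embedded sources, so nothing can be recovered from it.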
This tool is designed for legitimate purposes such as:
- Analyzing your own applications
- Security research with proper authorization
- Educational purposes
- Understanding webpack bundle structures
Always ensure you have permission to analyze the target website and comply with applicable laws and terms of service.
No sourcemaps found:
- The website may not include sourcemaps in production
- Try checking if the site is in development mode
Incomplete source extraction:
- Some sourcemaps may not include source content
- Check if all JavaScript chunks were discovered
Connection errors:
- Reduce the number of workers with -w 5
- Check your internet connection
- Verify the URL is accessible
This tool is provided as-is for educational and analysis purposes. Users are responsible for ensuring their use complies with all applicable laws and regulations.