A Python package for working with SEC filings at scale. Full Documentation | Website
- Download SEC filings quickly and efficiently
- Monitor EDGAR for new filings in real-time
- Parse filings at scale
- Access comprehensive datasets (10-Ks, SIC codes, etc.)
- Build datasets directly from unstructured text
- Interact with SEC data using MuleBot
pip install datamule
from datamule import Portfolio
# Create a Portfolio object
portfolio = Portfolio('output_dir') # can be an existing directory or a new one
# Download submissions
portfolio.download_submissions(
    filing_date=('2023-01-01', '2023-01-03'),
    submission_type=['10-K']
)
# Iterate through documents by document type
for ten_k in portfolio.document_type('10-K'):
    ten_k.parse()
    print(ten_k.data['document']['part2']['item7'])
# Iterate through documents by the strings they contain
for document in portfolio.contains_string('United States'):
    print(document.path)
# You can also use regex patterns
for document in portfolio.contains_string(r'(?i)covid-19'):
    print(document.type)
# For faster operations, use the built-in threading by passing a callback function
def callback(submission):
    print(submission.path)
submission_results = portfolio.process_submissions(callback)
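The callback and regex ideas above can be tried without downloading anything. The following is a minimal sketch using only the Python standard library, not datamule's implementation: `process_all` and `mentions_covid` are hypothetical names, and the sample strings stand in for document text.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Same pattern as in the example above; (?i) makes the match case-insensitive.
PATTERN = re.compile(r"(?i)covid-19")


def mentions_covid(text):
    """Hypothetical callback: True if the text matches the pattern."""
    return PATTERN.search(text) is not None


def process_all(items, callback, max_workers=4):
    """Run callback on every item using a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(callback, items))


# Stand-in strings, not real filing text.
texts = [
    "Impact of COVID-19 on revenue",
    "No material changes",
    "covid-19 relief programs",
]
flags = process_all(texts, mentions_covid)
print(flags)
```

Threads mainly help when the callback waits on I/O such as reading files from disk; whether `process_submissions` works this way internally is not stated here.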
Create a Discord bot, use insider trading disclosures to map relationships in Silicon Valley, and more in the examples.
The default source is the SEC, but for faster downloads you can use datamule as the provider.
from datamule import Config
config = Config()
config.set_default_source("datamule") # set default source to datamule, can also be "sec"
print(f"Default source: {config.get_default_source()}")
To use datamule as a provider, you need an API key. It costs $1 per 100,000 downloads.
| File Size | Examples | Downloader | Premium Downloader |
|---|---|---|---|
| Small Files | 3, 4, 5 | 5/s | 300/s |
| Medium Files | 8-K | 5/s | 60/s |
| Large Files | 10-K | 3/s | 5/s |
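For a rough sense of what these numbers mean for a given job, the rates and price can be turned into a back-of-the-envelope estimate. This is only a sketch: it assumes the table's rates are sustained averages in files per second and that the $1 per 100,000 price applies to every file downloaded through datamule. The function names are hypothetical and not part of the package.

```python
# Rates in files per second, copied from the table above.
RATES = {
    "small": {"downloader": 5, "premium": 300},
    "medium": {"downloader": 5, "premium": 60},
    "large": {"downloader": 3, "premium": 5},
}
# Pricing stated above: $1 per 100,000 downloads.
DOWNLOADS_PER_USD = 100_000


def estimate_seconds(n_files, size, tier):
    """Estimated wall-clock seconds, assuming the table rate is sustained."""
    return n_files / RATES[size][tier]


def estimate_cost_usd(n_files):
    """Estimated cost in USD, assuming each file counts as one download."""
    return n_files / DOWNLOADS_PER_USD


n_files = 6_000  # illustrative job size, not a real filing count
standard = estimate_seconds(n_files, "large", "downloader")
premium = estimate_seconds(n_files, "large", "premium")
print(f"standard: {standard / 60:.1f} min, premium: {premium / 60:.1f} min")
print(f"cost for 250,000 downloads: ${estimate_cost_usd(250_000):.2f}")
```

The gap between the two downloaders is widest for small files, so the estimate is most useful for deciding whether a bulk job of small forms is worth running through the premium path.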
- How to download SEC filings in 2025
- How to host the SEC Archive for $20/month
- Creating Structured Datasets from SEC filings
- Deploy a Financial Chatbot in 5 Minutes