A modern, fully typed Python library for converting HTML to Markdown. This library is a completely rewritten fork of markdownify with a modernized codebase, strict type safety and support for Python 3.9+.
- Full type safety with strict MyPy adherence
- Functional API design
- Extensive test coverage
- Configurable conversion options
- CLI tool for easy conversions
- Support for pre-configured BeautifulSoup instances
- Strict semver versioning
pip install html-to-markdown
Convert HTML to Markdown with a single function call:
from html_to_markdown import convert_to_markdown
html = '''
<article>
<h1>Welcome</h1>
<p>This is a <strong>sample</strong> with a <a href="https://example.com">link</a>.</p>
<ul>
<li>Item 1</li>
<li>Item 2</li>
</ul>
</article>
'''
markdown = convert_to_markdown(html)
print(markdown)
Output:
# Welcome
This is a **sample** with a [link](https://example.com).
* Item 1
* Item 2
If you need more control over HTML parsing, you can pass a pre-configured BeautifulSoup instance:
from bs4 import BeautifulSoup
from html_to_markdown import convert_to_markdown
# Configure BeautifulSoup with your preferred parser
# (html is the HTML string from the previous example)
# Note: the lxml parser must be installed separately (pip install lxml)
soup = BeautifulSoup(html, 'lxml')
markdown = convert_to_markdown(soup)
The library offers extensive customization through various options:
from html_to_markdown import convert_to_markdown
html = '<div>Your content here...</div>'
markdown = convert_to_markdown(
    html,
    heading_style="atx",  # Use # style headers
    strong_em_symbol="*",  # Use * for bold/italic
    bullets="*+-",  # Define bullet point characters
    wrap=True,  # Enable text wrapping
    wrap_width=100,  # Set wrap width
    escape_asterisks=True,  # Escape * characters
    code_language="python",  # Default code block language
)
Option | Type | Default | Description |
---|---|---|---|
autolinks | bool | True | Auto-convert URLs to Markdown links |
bullets | str | '*+-' | Characters to use for bullet points |
code_language | str | '' | Default language for code blocks |
heading_style | str | 'underlined' | Header style ('underlined', 'atx', 'atx_closed') |
escape_asterisks | bool | True | Escape * characters |
escape_underscores | bool | True | Escape _ characters |
wrap | bool | False | Enable text wrapping |
wrap_width | int | 80 | Text wrap width |
For a complete list of options, see the Configuration section below.
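To make the three heading_style values concrete, here is a minimal sketch of how each style conventionally looks in Markdown for a first-level heading. The render_heading helper is hypothetical and exists only for illustration; it is not the library's implementation, and the library's exact surrounding whitespace may differ.

```python
def render_heading(text: str, style: str = "underlined") -> str:
    """Illustrative only: format a level-1 heading in the given style."""
    if style == "underlined":
        # Setext style: the title followed by a row of '=' of the same length
        return f"{text}\n{'=' * len(text)}"
    if style == "atx":
        # ATX style: a leading '#'
        return f"# {text}"
    if style == "atx_closed":
        # Closed ATX style: a leading and a trailing '#'
        return f"# {text} #"
    raise ValueError(f"unknown heading style: {style!r}")


for style in ("underlined", "atx", "atx_closed"):
    print(f"{style}:")
    print(render_heading("Welcome", style))
```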
Convert HTML files directly from the command line:
# Convert a file
html_to_markdown input.html > output.md
# Process stdin
cat input.html | html_to_markdown > output.md
# Use custom options
html_to_markdown --heading-style atx --wrap --wrap-width 100 input.html > output.md
View all available options:
html_to_markdown --help
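If you want to drive the CLI from a Python script, the sketch below shows one possible wrapper. It uses only the flags that appear in the examples above; build_command and convert_file are hypothetical helper names, not part of the library, and convert_file assumes the html_to_markdown executable is on your PATH.

```python
import subprocess
from typing import List, Optional


def build_command(
    input_path: str,
    heading_style: Optional[str] = None,
    wrap: bool = False,
    wrap_width: Optional[int] = None,
) -> List[str]:
    """Assemble the argument list from the flags shown in the examples above."""
    command = ["html_to_markdown"]
    if heading_style is not None:
        command += ["--heading-style", heading_style]
    if wrap:
        command.append("--wrap")
    if wrap_width is not None:
        command += ["--wrap-width", str(wrap_width)]
    command.append(input_path)
    return command


def convert_file(input_path: str, output_path: str, **options) -> None:
    """Run the CLI and write its stdout to output_path (assumes the CLI is installed)."""
    result = subprocess.run(
        build_command(input_path, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    with open(output_path, "w", encoding="utf-8") as handle:
        handle.write(result.stdout)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.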
For existing projects using Markdownify, a compatibility layer is provided:
# Old code
from markdownify import markdownify as md
# New code - works the same way
from html_to_markdown import markdownify as md
The markdownify function is an alias for convert_to_markdown and provides identical functionality.
Full list of configuration options:
- autolinks: Convert valid URLs to Markdown links automatically
- bullets: Characters to use for bullet points in lists
- code_language: Default language for fenced code blocks
- code_language_callback: Function to determine code block language
- convert: List of HTML tags to convert (None = all supported tags)
- default_title: Use default titles for elements like links
- escape_asterisks: Escape * characters
- escape_misc: Escape miscellaneous Markdown characters
- escape_underscores: Escape _ characters
- heading_style: Header style (underlined/atx/atx_closed)
- keep_inline_images_in: Tags where inline images should be kept
- newline_style: Style for handling newlines (spaces/backslash)
- strip: Tags to remove from output
- strong_em_symbol: Symbol for strong/emphasized text (* or _)
- sub_symbol: Symbol for subscript text
- sup_symbol: Symbol for superscript text
- wrap: Enable text wrapping
- wrap_width: Width for text wrapping
- convert_as_inline: Treat content as inline elements
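The code_language_callback option is the least self-explanatory entry in this list, so here is a hedged sketch of what such a callback might look like. It assumes the callback receives the parsed element for the code block, that the element exposes a dict-like get() as BeautifulSoup tags do, and that the HTML marks the language with a language-xxx class; none of these details are confirmed by this README. The name language_from_class is hypothetical.

```python
from typing import Any


def language_from_class(element: Any) -> str:
    """Return the language named by a 'language-xxx' class, or '' if none."""
    classes = element.get("class") or []
    if isinstance(classes, str):
        # Some parsers return the class attribute as one space-separated string
        classes = classes.split()
    for name in classes:
        if name.startswith("language-"):
            return name[len("language-"):]
    return ""


# A plain dict stands in for a parsed tag so the sketch runs without dependencies.
fake_pre = {"class": ["highlight", "language-python"]}
print(language_from_class(fake_pre))
```

If those assumptions hold, the function would be passed as convert_to_markdown(html, code_language_callback=language_from_class).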
This library is open to contributions. Feel free to open issues or submit PRs. It's better to discuss an issue before submitting a PR, so that your work is not rejected for reasons that could have been raised earlier.
- Clone the repo
- Install the system dependencies
- Install the full dependencies with: uv sync
- Install the pre-commit hooks with: pre-commit install && pre-commit install --hook-type commit-msg
- Make your changes and submit a PR
This library uses the MIT license.
Special thanks to the original markdownify project creators and contributors.