This utility converts an HTML file to Obsidian-style Markdown.
- Supported tags: <body>, <section>, <aside>, <hr>, <p>, <h1>, <h2>, <h3>, <h4>, <h5>, <h6>, <ul>, <ol>, <li>, <a>, <blockquote>, <table>, <tr>, <th>, <td>, <img>, <b>, <strong>, <i>, <em>, <mark>, <del>, <s>, <sub>, <sup>, <pre>, <div> (partial), <code>, <samp>, <kbd>, <span> (partial)
- Math style support: $...$, \(...\), $$...$$, \[...\], and MathML
- Within-document link support
- Within-site hyperlink/image support
This utility requires python>=3.9.
To install:
git clone https://github.com/kkew3/html2obsidian.git && cd html2obsidian
pip install -e .
or use uv (recommended):
uv tool install git+https://github.com/kkew3/html2obsidian.git
Check sample_html for example input HTML and sample_output for the corresponding output Markdown.
Example usage:
curl -fsSL the-url | html2obsidian --url the-url - > output.md
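The same pipeline can be driven from Python. The following is a minimal sketch, assuming the `html2obsidian` executable is installed and on `PATH` and that its output is UTF-8. The helper names `build_command` and `convert_url` are hypothetical and not part of the package; only the `--url` flag and the `-` stdin argument come from the help text.

```python
import subprocess
import urllib.request


def build_command(url):
    """Return the argv equivalent to `html2obsidian --url URL -`."""
    return ["html2obsidian", "--url", url, "-"]


def convert_url(url, timeout=30):
    """Download `url` and pipe the HTML through the html2obsidian CLI.

    Assumes the `html2obsidian` executable is on PATH (hypothetical helper).
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        html_bytes = response.read()
    completed = subprocess.run(
        build_command(url),
        input=html_bytes,        # fed to the tool's stdin, like the shell pipe
        stdout=subprocess.PIPE,  # capture the generated Markdown
        check=True,              # raise CalledProcessError on a non-zero exit
    )
    return completed.stdout.decode("utf-8")
```

Calling `convert_url("the-url")` would then return the Markdown as a string instead of redirecting it to `output.md`.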
For detailed help, refer to html2obsidian --help, which is quoted below for reference:
usage: html2obsidian [-h] [--ul-bullet {-,+,*}]
[--strong-symbol {*,_}] [--em-symbol {*,_}]
[--sub-start-symbol CHARS]
[--sub-end-symbol CHARS]
[--sup-start-symbol CHARS]
[--sup-end-symbol CHARS] [--join]
[--elevate-header-to N]
[--indent-list-with-tab]
[--write-base64-img-to WRITE_BASE64_IMG_TO]
[--url URL]
html_file
Convert an HTML file to Obsidian-style markdown and write to stdout.
positional arguments:
html_file the html file to read; pass `-` to read from
stdin
options:
-h, --help show this help message and exit
--ul-bullet {-,+,*}
--strong-symbol {*,_}
--em-symbol {*,_}
--sub-start-symbol CHARS
--sub-end-symbol CHARS
--sup-start-symbol CHARS
--sup-end-symbol CHARS
--join
--elevate-header-to N
--indent-list-with-tab
--write-base64-img-to WRITE_BASE64_IMG_TO
--url URL url if the html is downloaded from web; this
helps resolve within-doc link
Note that warnings are sometimes issued, e.g.
/path/to/convert_html.py:1183: UserWarning: illegal linebreaks in <a>; ignored
warnings.warn('illegal linebreaks in <a>; ignored')
Most of the time, however, such warnings do not indicate errors in the conversion.
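When calling the converter from a script, it may be convenient to collect such warnings rather than let them print to stderr, so they can be reviewed after the run. The sketch below uses only the standard `warnings` module; `noisy_conversion` is a hypothetical stand-in that emits the message quoted above, not a function from this package.

```python
import warnings


def noisy_conversion():
    """Hypothetical stand-in for a conversion call that emits a warning."""
    warnings.warn('illegal linebreaks in <a>; ignored')
    return 'converted markdown'


def run_and_collect_warnings(func):
    """Call `func` and return (result, list of warning messages)."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones that would normally be deduplicated.
        warnings.simplefilter('always')
        result = func()
    return result, [str(w.message) for w in caught]


result, messages = run_and_collect_warnings(noisy_conversion)
for message in messages:
    print('warning:', message)
```

The conversion result is still returned as usual; the collected messages can then be inspected to decide whether any of them matter for the document at hand.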
Example usage as a library:
from lxml import etree
from html2obsidian import convert_html
html_file = ...
options = ... # may be empty dict
url = ... # may be None
with open(html_file, encoding='utf-8') as infile:
    html = infile.read()
parser = etree.HTMLParser(target=convert_html.KeepOnlySupportedTarget(strict=True))
elements = etree.HTML(html, parser)
# this is the string output containing the markdown
output = convert_html.StackMarkdownGenerator(options, elements, url)
Please refer to convert_html.StackMarkdownGenerator.default_options for help on available options.
To run the tests, simply run:
pytest
Note that warnings are sometimes issued, as mentioned above.
Please refer to test_convert_html.py (in particular, the comments) to see whether such warnings imply errors.
$$...$$-style math is not recognized when embedded in <p> rather than in <div class="math">.
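One possible workaround is to preprocess the HTML before conversion. The sketch below rewrites any `<p>` whose entire content is a single `$$...$$` block into `<div class="math">`. It is only an assumption that this is enough for the converter to pick the math up, based on the limitation as stated above. `wrap_display_math` is a hypothetical helper, not part of the package, and paragraphs that mix text with `$$...$$` are left untouched.

```python
import re

# A <p> element whose whole content is one $$...$$ block. The inner part may
# not contain another "$$" or a closing </p>, so a match never spans paragraphs.
_DISPLAY_MATH_P = re.compile(
    r'<p(?:\s[^>]*)?>\s*(\$\$(?:(?!\$\$|</p>).)+\$\$)\s*</p>',
    flags=re.DOTALL | re.IGNORECASE,
)


def wrap_display_math(html):
    """Replace <p>$$...$$</p> with <div class="math">$$...$$</div>."""
    # A function is used as the replacement so that backslashes in the
    # LaTeX source are copied verbatim instead of being read as escapes.
    return _DISPLAY_MATH_P.sub(
        lambda m: '<div class="math">' + m.group(1) + '</div>',
        html,
    )
```

The returned string can then be passed to the converter in place of the original HTML.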