html2obsidian

Introduction

This library (with a simple __main__ command-line interface attached) converts HTML to Obsidian-style Markdown.

Features

Supported tags: <hr>, <p>, <h1>, <h2>, <h3>, <h4>, <h5>, <h6>, <ul>, <ol>, <li>, <a>, <blockquote>, <table>, <tr>, <th>, <td>, <img>, <b>, <strong>, <i>, <em>, <mark>, <del>, <s>, <sub>, <sup>, <pre>, <div> (partial), <code>, <samp>, <kbd>, <span> (partial)
Math style support: $...$, \(...\), $$...$$, \[...\], and MathML
Within-document link support
Within-site hyperlink/image support
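To illustrate what math-style support involves: Obsidian renders $...$ as inline math and $$...$$ as display math, so the TeX-style \(...\) and \[...\] delimiters have to be rewritten into those forms. The sketch below shows that rewriting on plain text. It is a hypothetical illustration (the function name normalize_math_delimiters is made up here) and is not how html2obsidian itself implements the conversion.

```python
import re

# Hypothetical illustration, not html2obsidian's actual implementation.
# \[...\] (display) is handled before \(...\) (inline); DOTALL lets math span lines.
_DISPLAY = re.compile(r'\\\[(.+?)\\\]', re.DOTALL)
_INLINE = re.compile(r'\\\((.+?)\\\)', re.DOTALL)


def normalize_math_delimiters(text: str) -> str:
    """Rewrite \\[...\\] as $$...$$ and \\(...\\) as $...$."""
    # Lambdas are used so backslashes inside the math are copied verbatim
    # instead of being interpreted as replacement-string escapes.
    text = _DISPLAY.sub(lambda m: '$$' + m.group(1) + '$$', text)
    # Inline math is stripped so the result has no space right after the opening $.
    text = _INLINE.sub(lambda m: '$' + m.group(1).strip() + '$', text)
    return text


if __name__ == '__main__':
    print(normalize_math_delimiters(r'Euler: \(e^{i\pi} + 1 = 0\), and \[\int_0^1 x\,dx\]'))
```

Text already using $...$ or $$...$$ passes through unchanged.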

Installation

This library requires Python >= 3.6.

To install the dependencies (see requirements.txt for details), run one of the commands below. Note in particular that MathML is supported only when lxml is installed.

pip install -r requirements.txt

or

conda install pytest lxml
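Since MathML support depends on lxml being present, it can be useful to check for it before converting. The following is a minimal sketch using only the standard library; the helper name mathml_supported is hypothetical and not part of convert_html.

```python
import importlib.util


def mathml_supported() -> bool:
    """Return True if lxml is importable (hypothetical helper).

    Per this README, MathML is converted only when lxml is installed,
    so this looks lxml up without actually importing it.
    """
    return importlib.util.find_spec('lxml') is not None


if __name__ == '__main__':
    if mathml_supported():
        print('lxml found: MathML conversion is available')
    else:
        print('lxml not found: install it, e.g. `pip install -r requirements.txt`')
```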

Usage

Run as executable

To use the attached __main__, refer to

python3 convert_html.py --help

for help. Note that warnings are sometimes issued, e.g.

/path/to/convert_html.py:1183: UserWarning: illegal linebreaks in <a>; ignored
  warnings.warn('illegal linebreaks in <a>; ignored')

Most of the time, however, such warnings do not indicate errors in the conversion.
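When calling the converter from your own code, you may prefer to collect these warnings instead of having them printed to stderr, for example to log them next to the converted file. The sketch below uses the standard warnings module; the helper name run_collecting_warnings is hypothetical, and the warning message is simply copied from the example above.

```python
import warnings
from typing import Callable, List, Tuple, TypeVar

T = TypeVar('T')


def run_collecting_warnings(func: Callable[[], T]) -> Tuple[T, List[str]]:
    """Call func() and return its result together with the messages of
    any UserWarnings it emitted (hypothetical helper)."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" records repeated warnings instead of showing each only once.
        warnings.simplefilter('always', UserWarning)
        result = func()
    messages = [str(w.message) for w in caught
                if issubclass(w.category, UserWarning)]
    return result, messages


if __name__ == '__main__':
    def fake_convert() -> str:
        # Stand-in for a real conversion call; emits the warning quoted above.
        warnings.warn('illegal linebreaks in <a>; ignored')
        return 'converted'

    output, msgs = run_collecting_warnings(fake_convert)
    print(output, msgs)
```

With the library, func could be a lambda wrapping the convert_html.StackMarkdownGenerator(...) call shown under "Use the library".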

Use the library

from lxml import etree
import convert_html

html_file = ...
options = ...  # may be empty dict
url = ...  # may be None

with open(html_file, encoding='utf-8') as infile:
    html = infile.read()
parser = etree.HTMLParser(target=convert_html.KeepOnlySupportedTarget(strict=True))
elements = etree.HTML(html, parser)
# this is the string output containing the markdown
output = convert_html.StackMarkdownGenerator(options, elements, url)

See convert_html.StackMarkdownGenerator.default_options for the available options.
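To convert a whole folder of HTML files rather than a single one, the per-file steps can be wrapped in a loop. The sketch below assumes only that you supply a function mapping HTML text to Markdown text; convert_tree is a hypothetical name, not part of convert_html, and it simply mirrors the source directory layout with .md files.

```python
from pathlib import Path
from typing import Callable, List


def convert_tree(src_dir: str, dst_dir: str,
                 convert: Callable[[str], str]) -> List[Path]:
    """Convert every *.html file under src_dir into a .md file under dst_dir,
    keeping the relative directory layout (hypothetical helper).

    `convert` maps HTML text to Markdown text.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    written = []
    for html_path in sorted(src.rglob('*.html')):
        if not html_path.is_file():
            continue  # skip directories that happen to end in .html
        md_path = dst / html_path.relative_to(src).with_suffix('.md')
        md_path.parent.mkdir(parents=True, exist_ok=True)
        html = html_path.read_text(encoding='utf-8')
        md_path.write_text(convert(html), encoding='utf-8')
        written.append(md_path)
    return written
```

With the library, convert would be a small function that runs the etree.HTMLParser / etree.HTML / StackMarkdownGenerator steps from the example above on its html argument.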

Run tests

To run the tests, simply

pytest

As mentioned above, warnings are sometimes issued. See test_convert_html.py (in particular, its comments) to check whether a given warning indicates an error.

Bugs

$$...$$-style math is not recognized when it is embedded in <p> rather than in <div class="math">.
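A possible workaround is to preprocess the HTML so that a paragraph consisting only of one $$...$$ block becomes a <div class="math">, which the library does recognize. This is a hypothetical sketch (wrap_display_math is not part of convert_html), it has not been tested against html2obsidian, and regex-based HTML rewriting is fragile, so it only targets the simple case described in the comments.

```python
import re

# Matches a <p> (attributes allowed, then dropped) whose entire content is a
# single $$...$$ block, optionally surrounded by whitespace. The inner part may
# not contain another $$ or a </p>, so paragraphs mixing text and math are left alone.
_P_DISPLAY_MATH = re.compile(
    r'<p(?:\s[^>]*)?>\s*(\$\$(?:(?!\$\$|</p>).)+\$\$)\s*</p>',
    re.DOTALL | re.IGNORECASE)


def wrap_display_math(html: str) -> str:
    """Rewrite <p>$$...$$</p> as <div class="math">$$...$$</div>."""
    return _P_DISPLAY_MATH.sub(
        lambda m: '<div class="math">' + m.group(1) + '</div>', html)


if __name__ == '__main__':
    print(wrap_display_math('<p>$$x^2 + y^2 = 1$$</p>'))
```

The rewritten HTML would then be passed to the parser as in the "Use the library" example.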

Repository: kkew3/html2obsidian