rewrite markdown parser #159

Open
wants to merge 19 commits into
base: dev
192 changes: 100 additions & 92 deletions docs/source/topics/text-formatting.rst
@@ -40,8 +40,98 @@ list of the basic styles currently supported by Pyrogram.
- spoiler
- `text URL <https://telegramplayground.github.io/pyrogram/>`_
- `user text mention <tg://user?id=123456789>`_
- :emoji:`👍`


HTML Style
----------

To strictly use this mode, pass :obj:`~pyrogram.enums.ParseMode.HTML` to the *parse_mode* parameter when using
:meth:`~pyrogram.Client.send_message`. The following tags are currently supported:

.. code-block:: text

<b>bold</b>, <strong>bold</strong>

<i>italic</i>, <em>italic</em>

<u>underline</u>

<s>strike</s>, <del>strike</del>, <strike>strike</strike>

<spoiler>spoiler</spoiler>

<a href="https://telegramplayground.github.io/pyrogram/">text URL</a>

<a href="tg://user?id=123456789">inline mention</a>

<code>inline fixed-width code</code>

<emoji id="5469770542288478598">👍</emoji>

<pre language="py">
pre-formatted
fixed-width
code block
</pre>

**Example**:

.. code-block:: python

from pyrogram.enums import ParseMode

await app.send_message(
chat_id="me",
text=(
"<b>bold</b>, <strong>bold</strong>"
"<i>italic</i>, <em>italic</em>"
"<u>underline</u>, <ins>underline</ins>"
"<s>strike</s>, <strike>strike</strike>, <del>strike</del>"
"<spoiler>spoiler</spoiler>\n\n"

"<b>bold <i>italic bold <s>italic bold strike <spoiler>italic bold strike spoiler</spoiler></s> <u>underline italic bold</u></i> bold</b>\n\n"

"<a href=\"https://telegramplayground.github.io/pyrogram/\">inline URL</a> "
"<a href=\"tg://user?id=23122162\">inline mention of a user</a>\n"
"<emoji id=5469770542288478598>👍</emoji> "
"<code>inline fixed-width code</code> "
"<pre>pre-formatted fixed-width code block</pre>\n\n"
"<pre language='py'>"
"for i in range(10):\n"
" print(i)"
"</pre>\n\n"

"<blockquote>Block quotation started"
"Block quotation continued"
"The last line of the block quotation</blockquote>"
"<blockquote expandable>Expandable block quotation started"
"Expandable block quotation continued"
"Expandable block quotation continued"
"Hidden by default part of the block quotation started"
"Expandable block quotation continued"
"The last line of the block quotation</blockquote>"
),
parse_mode=ParseMode.HTML
)

.. note::

All ``<``, ``>`` and ``&`` symbols that are not a part of a tag or an HTML entity must be replaced with the
corresponding HTML entities (``<`` with ``&lt;``, ``>`` with ``&gt;`` and ``&`` with ``&amp;``). You can use this
snippet to quickly escape those characters:

.. code-block:: python

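text = "<my & text>"
# Escape "&" first so the entities produced for "<" and ">" are not escaped again.
text = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

print(text)

.. code-block:: text

&lt;my &amp; text&gt;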

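As a minimal sketch of the escaping rule in the note above, the standard library's ``html.escape`` replaces ``&`` first and then ``<`` and ``>``, so already-produced entities are not escaped twice. The helper name ``escape_for_html_mode`` and the choice of ``quote=False`` (which leaves quotation marks untouched) are assumptions made for this illustration, not part of Pyrogram.

```python
import html


def escape_for_html_mode(text: str) -> str:
    """Escape &, < and > so the text can sit safely between HTML tags."""
    # html.escape handles "&" before "<" and ">", which avoids double-escaping.
    # quote=False leaves " and ' as they are; only &, < and > are replaced.
    return html.escape(text, quote=False)


user_text = "<my & text>"
# Escape only the untrusted part, then wrap it in the formatting tag.
message = f"<b>{escape_for_html_mode(user_text)}</b>"
print(message)
```

Escaping only the user-supplied fragment, and not the whole message, keeps the surrounding formatting tags intact.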

Markdown Style
--------------
@@ -107,11 +197,12 @@ To strictly use this mode, pass :obj:`~pyrogram.enums.ParseMode.MARKDOWN` to the
"~~strike~~, "
"||spoiler||, "
"[URL](https://telegramplayground.github.io/pyrogram/), "
+"![👍](tg://emoji?id=5469770542288478598)"
"`code`, "
-"```"
+"```py"
"for i in range(10):\n"
" print(i)"
-"```"
+"```\n"

">blockquote\n"

@@ -135,96 +226,6 @@ To strictly use this mode, pass :obj:`~pyrogram.enums.ParseMode.MARKDOWN` to the
parse_mode=ParseMode.MARKDOWN
)
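The custom emoji syntax used in the Markdown example, ``![👍](tg://emoji?id=...)``, can be illustrated with a small standalone sketch. The regular expression and the ``find_custom_emoji`` helper below are hypothetical and written only for this illustration. They are not Pyrogram's parser and may differ from how the library actually tokenizes Markdown.

```python
import re

# Hypothetical pattern for illustration only: "![alt](tg://emoji?id=<digits>)".
CUSTOM_EMOJI_RE = re.compile(
    r"!\[(?P<alt>[^\]]*)\]\(tg://emoji\?id=(?P<id>\d+)\)"
)


def find_custom_emoji(markdown: str) -> list:
    """Return (alt_text, emoji_id) pairs for every custom emoji in the text."""
    return [
        (match.group("alt"), int(match.group("id")))
        for match in CUSTOM_EMOJI_RE.finditer(markdown)
    ]


sample = (
    "[URL](https://telegramplayground.github.io/pyrogram/), "
    "![👍](tg://emoji?id=5469770542288478598)"
)
# The plain link is skipped: it has no leading "!" and no tg://emoji target.
print(find_custom_emoji(sample))
```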

HTML Style
----------

To strictly use this mode, pass :obj:`~pyrogram.enums.HTML` to the *parse_mode* parameter when using
:meth:`~pyrogram.Client.send_message`. The following tags are currently supported:

.. code-block:: text

<b>bold</b>, <strong>bold</strong>

<i>italic</i>, <em>italic</em>

<u>underline</u>

<s>strike</s>, <del>strike</del>, <strike>strike</strike>

<spoiler>spoiler</spoiler>

<a href="https://telegramplayground.github.io/pyrogram/">text URL</a>

<a href="tg://user?id=123456789">inline mention</a>

<code>inline fixed-width code</code>

<emoji id="12345678901234567890">🔥</emoji>

<pre language="py">
pre-formatted
fixed-width
code block
</pre>

**Example**:

.. code-block:: python

from pyrogram.enums import ParseMode

await app.send_message(
chat_id="me",
text=(
"<b>bold</b>, <strong>bold</strong>"
"<i>italic</i>, <em>italic</em>"
"<u>underline</u>, <ins>underline</ins>"
"<s>strike</s>, <strike>strike</strike>, <del>strike</del>"
"<spoiler>spoiler</spoiler>\n\n"

"<b>bold <i>italic bold <s>italic bold strike <spoiler>italic bold strike spoiler</spoiler></s> <u>underline italic bold</u></i> bold</b>\n\n"

"<a href=\"https://telegramplayground.github.io/pyrogram/\">inline URL</a> "
"<a href=\"tg://user?id=23122162\">inline mention of a user</a>\n"
"<emoji id=5368324170671202286>👍</emoji> "
"<code>inline fixed-width code</code> "
"<pre>pre-formatted fixed-width code block</pre>\n\n"
"<pre language='py'>"
"for i in range(10):\n"
" print(i)"
"</pre>\n\n"

"<blockquote>Block quotation started"
"Block quotation continued"
"The last line of the block quotation</blockquote>"
"<blockquote expandable>Expandable block quotation started"
"Expandable block quotation continued"
"Expandable block quotation continued"
"Hidden by default part of the block quotation started"
"Expandable block quotation continued"
"The last line of the block quotation</blockquote>"
),
parse_mode=ParseMode.HTML
)

.. note::

All ``<``, ``>`` and ``&`` symbols that are not a part of a tag or an HTML entity must be replaced with the
corresponding HTML entities (``<`` with ``&lt;``, ``>`` with ``&gt;`` and ``&`` with ``&amp;``). You can use this
snippet to quickly escape those characters:

.. code-block:: python

import html

text = "<my text>"
text = html.escape(text)

print(text)

.. code-block:: text

&lt;my text&gt;

Different Styles
----------------
@@ -272,6 +273,13 @@ Result:
Nested and Overlapping Entities
-------------------------------

.. warning::

The Markdown style is not recommended for complex text formatting.

If you need complex text formatting, such as nested or overlapping entities, use the HTML style instead.


You can also style texts with more than one decoration at once by nesting entities together. For example, you can send
a text message with both :bold-underline:`bold and underline` styles, or a text that has both :strike-italic:`italic and
strike` styles, and you can still combine both Markdown and HTML together.
2 changes: 1 addition & 1 deletion pyrogram/parser/__init__.py
@@ -1,5 +1,5 @@
# Pyrogram - Telegram MTProto API Client Library for Python
-# Copyright (C) 2017-present Dan <https://github.com/delivrance>
+# Copyright (C) 2017-present <https://github.com/TelegramPlayGround>
#
# This file is part of Pyrogram.
#
7 changes: 2 additions & 5 deletions pyrogram/parser/html.py
@@ -1,5 +1,5 @@
# Pyrogram - Telegram MTProto API Client Library for Python
-# Copyright (C) 2017-present Dan <https://github.com/delivrance>
+# Copyright (C) 2017-present <https://github.com/TelegramPlayGround>
#
# This file is part of Pyrogram.
#
@@ -178,16 +178,13 @@ def parse_one(entity):
language = getattr(entity, "language", "") or ""
start_tag = f'<{name} language="{language}">' if language else f"<{name}>"
end_tag = f"</{name}>"
-elif entity_type == MessageEntityType.BLOCKQUOTE:
-name = entity_type.name.lower()
-start_tag = f"<{name}>"
-end_tag = f"</{name}>"
elif entity_type == MessageEntityType.EXPANDABLE_BLOCKQUOTE:
name = "blockquote"
start_tag = f"<{name} expandable>"
end_tag = f"</{name}>"
elif entity_type in (
MessageEntityType.CODE,
+MessageEntityType.BLOCKQUOTE,
MessageEntityType.SPOILER,
):
name = entity_type.name.lower()
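
The effect of this hunk can be sketched in isolation. The plain blockquote case appears to need no dedicated branch, because the generic branch derives the tag name from the enum member name and so produces the same tags. ``EntityType`` and ``tags_for`` below are hypothetical stand-ins written for this illustration. They are not Pyrogram's ``MessageEntityType`` or its ``parse_one`` function.

```python
from enum import Enum, auto
from typing import Tuple


class EntityType(Enum):
    # Hypothetical stand-in for pyrogram.enums.MessageEntityType.
    CODE = auto()
    BLOCKQUOTE = auto()
    EXPANDABLE_BLOCKQUOTE = auto()
    SPOILER = auto()


def tags_for(entity_type: EntityType) -> Tuple[str, str]:
    """Return the (start_tag, end_tag) pair for a simple entity type."""
    if entity_type == EntityType.EXPANDABLE_BLOCKQUOTE:
        # The expandable variant reuses the blockquote tag with an attribute.
        name = "blockquote"
        return f"<{name} expandable>", f"</{name}>"
    if entity_type in (
        EntityType.CODE,
        EntityType.BLOCKQUOTE,
        EntityType.SPOILER,
    ):
        # The tag name is just the lowercased enum member name.
        name = entity_type.name.lower()
        return f"<{name}>", f"</{name}>"
    raise ValueError(f"unsupported entity type: {entity_type}")


print(tags_for(EntityType.BLOCKQUOTE))
print(tags_for(EntityType.EXPANDABLE_BLOCKQUOTE))
```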