Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recursively skip verbatim elements #847

Merged
merged 1 commit into from
Dec 12, 2022
Merged

Recursively skip verbatim elements #847

merged 1 commit into from
Dec 12, 2022

Conversation

mre
Copy link
Member

@mre mre commented Nov 28, 2022

This handles cases like

<code>
    <pre>
        <span>https://example.org</span>
    </pre>
</code>

where a link was embedded in a verbatim block, but embedded in a non-verbatim element (<span>).
This was correctly handled in html5ever, but not html5gum (which is our default HTML parser in lychee)

Fixes #259 (comment).

@untitaker, is there a better way to handle nested tags in html5gum or would you have resolved the issue in a similar way?

@mre mre requested a review from untitaker November 28, 2022 17:29
@mre mre merged commit ef391ce into master Dec 12, 2022
@mre mre deleted the exclude-recursive branch December 12, 2022 00:06
@untitaker
Copy link
Collaborator

untitaker commented Dec 12, 2022

@mre sorry for the slow response. I believe a better fix would be untitaker/hyperlink@07a7f20, which uses naive_next_state to make the parser not emit <a> tags at all inside <script> tags (and I think inside pre too)

@mre
Copy link
Member Author

mre commented Dec 12, 2022

Oh I see. Would you have the time to create a pull request to change it to the new structure? As the author of the library it's probably way easier to do for you than for me. 😅 Maybe you want to ignore what I did in this PR and base it off of the last version of master just before this commit?

@untitaker
Copy link
Collaborator

Ah I see, you are extracting from text as well even if there's no <a> tag. I don't think it will help here. This is probably fine.

@mre
Copy link
Member Author

mre commented Dec 12, 2022

Probably not the cleanest implementation, but yeah. Thanks for the feedback. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Exclude URLs based on HTML tags
2 participants