Select Markdown links based on their tag #270

T145 · 2021-06-25T18:04:04Z

Something like:

./lychee README.md -t 3 -m 2 --exclude-mail -v --md-tags link

This would then select only links formatted like this:

[link](https://github.com/lycheeverse/lychee)


MichaIng · 2021-06-25T22:43:40Z

Somewhat similar to #259.

What you call "tag" here is the link text: <a href="https://github.com/lycheeverse/lychee">text</a>. I wonder whether this is a generic enough component to base an include/exclude option on, as this text is usually different for every link. Alternatively, you use <https://github.com/lycheeverse/lychee> so that it matches the URL itself, and even the brackets are optional for most interpreters.
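To make the distinction concrete, here is a rough Python sketch (standard library `re` only; `classify_links` and its patterns are hypothetical, not part of lychee) that separates the three forms mentioned above: an inline link with free-form text, an autolink whose text is the URL itself, and a bare URL. It deliberately ignores reference-style links, titles, nested brackets and other flavor-specific syntax.

```python
import re

# Rough patterns for three common Markdown link forms. Real Markdown
# (CommonMark, GFM, kramdown, ...) has many more edge cases.
INLINE = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")   # [text](url)
AUTOLINK = re.compile(r"<(https?://[^>\s]+)>")       # <url>
BARE = re.compile(r"(?<![(<])\bhttps?://[^\s)>]+")   # url not preceded by ( or <


def classify_links(markdown: str) -> list[tuple[str, str, str]]:
    """Return (form, text, url) tuples, grouped by form."""
    found = []
    for m in INLINE.finditer(markdown):
        # The link text is arbitrary and unrelated to the URL.
        found.append(("inline", m.group(1), m.group(2)))
    for m in AUTOLINK.finditer(markdown):
        # An autolink's visible text is the URL itself.
        found.append(("autolink", m.group(1), m.group(1)))
    for m in BARE.finditer(markdown):
        found.append(("bare", m.group(0), m.group(0)))
    return found
```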

EDIT: I'm just collecting a few probably dumb ideas below about how to combine both requests, and possibly more similar ones that may arise. But I'm too tired to come up with a clever one, I guess. I'll leave it and review tomorrow; maybe some approach is useful after all 😄 😴.

Using an attribute or class to exclude would be great IMO, but no such thing exists in Markdown without extensions. The attribute extension lets you do something like [link](https://github.com/lycheeverse/lychee){: .exclude } to add the HTML element to the "exclude" class in the resulting HTML document. But that is hard to parse, as the colon is optional and class="... exclude ..." can be used as well to assign one or multiple classes. This is difficult without a Markdown parser library that supports this extension, and of course not all Markdown files are converted with this extension/syntax enabled, or converted at all.
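A rough Python sketch of what parsing that syntax involves, assuming kramdown/Python-Markdown-style attribute lists placed directly after an inline link. The names `links_without_class` and `has_class` are illustrative only; a robust implementation would need a Markdown parser that actually understands the extension.

```python
import re

# [text](url) optionally followed by {: ... } or { ... } (the colon is optional).
LINK_WITH_ATTRS = re.compile(
    r"\[(?P<text>[^\]]*)\]\((?P<url>[^)\s]+)\)"
    r"(?:\{:?\s*(?P<attrs>[^}]*)\})?"
)


def has_class(attrs: str, name: str) -> bool:
    """True if an attribute list assigns the class `name`."""
    # Shorthand form: .exclude (but not .excluded)
    if re.search(r"(?:^|\s)\." + re.escape(name) + r"(?=\s|$)", attrs):
        return True
    # Long form: class="a exclude b" (single or double quotes)
    m = re.search(r"""class\s*=\s*["']([^"']*)["']""", attrs)
    return bool(m) and name in m.group(1).split()


def links_without_class(markdown: str, name: str = "exclude") -> list[str]:
    """Return URLs of inline links that are NOT tagged with class `name`."""
    urls = []
    for m in LINK_WITH_ATTRS.finditer(markdown):
        attrs = m.group("attrs") or ""
        if not has_class(attrs, name):
            urls.append(m.group("url"))
    return urls
```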

I'm thinking about a single option that covers HTML and Markdown (and possibly all other types of) documents; otherwise we might end up with a never-ending list of options...

Other linters often respect comments in the document. So a flexible approach would be to, e.g., let lychee skip (or explicitly include) the next URL when a  line or similar is seen before it, with the respective comment syntax for other formats.
While flexible, this potentially requires adding a lot of comments to the source, whereas a tag/attribute/class-based include/exclude rule would be much nicer IMO, especially for HTML, though it is difficult to find an equivalent for Markdown.
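A minimal Python sketch of the comment-based idea. The marker `<!-- skip-next-link -->` is purely hypothetical (lychee does not define it); here each marker suppresses the first URL found after it, anywhere later in the text.

```python
import re

# Hypothetical marker, chosen only for this illustration.
SKIP_MARKER = "<!-- skip-next-link -->"
# Match either the marker or a URL, in document order.
TOKEN = re.compile(re.escape(SKIP_MARKER) + r"|https?://[^\s)>\]]+")


def urls_to_check(text: str) -> list[str]:
    """Collect URLs, dropping the first URL that follows a marker."""
    urls = []
    skip_next = False
    for m in TOKEN.finditer(text):
        if m.group(0) == SKIP_MARKER:
            # Consecutive markers still skip only one URL.
            skip_next = True
        elif skip_next:
            skip_next = False  # this URL is skipped; reset the flag
        else:
            urls.append(m.group(0))
    return urls
```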

And you ask for an include option, while my request was for an exclude option. Perhaps both could be merged by using an inversion flag: something like --filter 'excludeThis' for exclusions, and --include --filter 'includeThis' to turn all filter rules into exclusive includes instead of excludes. Code-wise, the filter values would then decide whether a URL is checked or not, and in the presence of the --include flag the result is simply inverted in all cases.
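The inversion idea fits in a few lines of Python. `LinkFilter`, `patterns` and `include` are hypothetical names modelling the proposed --filter/--include options, and treating filter values as regular expressions is an assumption.

```python
import re
from dataclasses import dataclass


@dataclass
class LinkFilter:
    patterns: list[str]      # values of the (hypothetical) repeatable --filter
    include: bool = False    # presence of the (hypothetical) --include flag

    def should_check(self, url: str) -> bool:
        matched = any(re.search(p, url) for p in self.patterns)
        # Default (exclude) mode checks URLs that do NOT match;
        # --include simply inverts that decision.
        return matched if self.include else not matched
```

With --include and no patterns, nothing would be checked, which a real CLI would likely want to reject.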

T145 · 2021-06-25T22:53:09Z

@MichaIng There could be a similar --html-tags flag that, when present, only includes the specified elements. If you want to get granular about specific elements, a valid tag option could use something like CSS selector syntax to select an element.

MichaIng · 2021-06-25T23:37:23Z

CSS selector syntax would be awesome indeed. But I wouldn't want to put the burden of implementing such a complex parser on the devs, so I guess it depends on whether there is a reliable library that can do it nicely.
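Short of a full CSS selector engine, Python's standard `html.parser` already covers the simplest case: selecting `<a>` elements by tag and class, roughly `a.someclass`. `LinkSelector` and `select_links` are hypothetical helpers for illustration; descendant combinators, attribute selectors and the like would need a dedicated library.

```python
from html.parser import HTMLParser


class LinkSelector(HTMLParser):
    """Collect href values of <a> elements, optionally only those with a class."""

    def __init__(self, required_class=None):
        super().__init__()
        self.required_class = required_class
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        # Valueless attributes come through as None, hence `or ""`.
        classes = (attrs.get("class") or "").split()
        if href and (self.required_class is None or self.required_class in classes):
            self.hrefs.append(href)


def select_links(html, required_class=None):
    parser = LinkSelector(required_class)
    parser.feed(html)
    parser.close()
    return parser.hrefs
```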

lebensterben · 2021-06-25T23:54:41Z

The implementation is not hard.
But in what kind of scenario would a typical user want to filter links by link text?

T145 · 2021-06-26T00:59:22Z

Grabbing specific links aids automation immensely. It really makes using this in a GitHub Actions environment more attractive.

lebensterben · 2021-06-26T01:22:28Z

In CI, the normal use case is to blindly check all links found, with optional filtering based on link patterns (that's already supported).

The suggested feature just doesn't add extra utility for normal CI users.

lebensterben · 2021-06-26T01:27:45Z

There's an alternative solution that could probably suit your needs.
We could have lychee log failed links to a file, and add an option to lychee or its CI workflow to 'resume' the previous job by only checking the failed links.

T145 · 2021-06-26T01:47:24Z

Other utilities can be used to do that, though. My point is that removing some of the "blindness" would be good in general. The reason that use case is the one you've seen most is b/c anyone who needs just that picks up this utility. Anyone who needs something different will immediately look for a better solution. I haven't been able to find another all-in-one utility that can just pick out links that match a specific Markdown or HTML tag.

MichaIng · 2021-06-26T09:04:32Z

But is it really the link text that you want to match against? I mean, do you have a lot of "Read more" links and need to check only those? I also can't really imagine a use case without at least a more generic identifier, like a class or another kind of marker, as mentioned above, which indeed is difficult in pure Markdown. This is also why we do not check the Markdown files but the resulting HTML files after they have been generated. But not everyone does that, I agree.

So it would be interesting to hear or see an example of where and how you'd use this feature, to better understand the pattern of possible use cases.

T145 · 2021-06-26T17:09:31Z

The link text wouldn't be what's matched: it'd be the tags.

Another cool thing I thought of would be combining the HTML and Markdown tag selection flags, with a single comma indicating that you want none. Normally, multiple values would be delimited by commas. E.g. this:

lychee README.md --html-tags , --md-tags link

This would select all Markdown links in the [link](mylink) format and ignore all HTML links in the document. Redundant, yes, but it's just to illustrate the point in one command.
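For illustration, the proposed value semantics could be parsed as follows. `parse_tag_list` is hypothetical, and neither --html-tags nor --md-tags exists in lychee; `None` stands for "flag not given" and a lone comma for "select none".

```python
from typing import Optional


def parse_tag_list(value: Optional[str]) -> Optional[list[str]]:
    """Parse a value of the proposed --html-tags / --md-tags flags."""
    if value is None:
        return None  # flag not given: no restriction
    if value.strip() == ",":
        return []    # lone comma: select nothing of this kind
    # Otherwise a comma-separated list; blanks and empty items are dropped.
    return [item.strip() for item in value.split(",") if item.strip()]
```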

lebensterben · 2021-06-26T17:14:20Z

This doesn't fit standard *nix CLI style.

MichaIng · 2021-06-26T17:57:44Z

@T145
Did you try it? In [link](mylink), link is the link text. The concept of a "tag" doesn't really exist in Markdown:

[link](mylink) => <a href="mylink">link</a>

In the translated HTML document, "<a>" itself is the tag, not the text between the start and end tags.
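A small Python sketch of that translation for the simplest case, using only `re` and `html.escape`. `inline_links_to_html` is a hypothetical helper that handles nothing beyond plain [text](url) links; it shows that the tag is always "a", while the link text is just the element's content.

```python
import html
import re

INLINE_LINK = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")


def inline_links_to_html(markdown):
    """Rewrite [text](url) as <a href="url">text</a>."""
    def to_anchor(m):
        text, url = m.group(1), m.group(2)
        # The element name is fixed; only href and content vary per link.
        return f'<a href="{html.escape(url, quote=True)}">{html.escape(text)}</a>'
    return INLINE_LINK.sub(to_anchor, markdown)
```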

T145 · 2021-06-26T18:06:11Z

Yes, I'm aware. Whatever way it can be handled is fine w/ me, just so long as there can be some level that links can be selected at. Idk if this program parses raw Markdown or converts it into HTML first.

MichaIng · 2021-06-26T18:20:24Z

I'm quite sure that lychee does not convert Markdown (correct me if I'm wrong), and it's good that it doesn't even try, since, as mentioned, Markdown has no strict syntax: it comes in different flavors and can be extended, which allows many different link syntaxes that lychee practically cannot handle reliably. So URLs are most likely found in the raw text input without interpreting Markdown in any special way (again, correct me if I'm wrong).

just so long as there can be some level that links can be selected at

IMO it does not make sense to implement a feature only as a stopgap until another, better feature has been added; that would be a waste of development time. Without a convincing example of a document where selecting Markdown links by link text makes sense, I would vote against this, as other requested features have wider use cases IMO. An option to select and/or exclude links in HTML documents based on tags or CSS selectors would see more use, and could help with Markdown documents as well when those are translated into HTML within the CI/CD pipeline using a defined Markdown parser and extensions.

T145 · 2021-06-26T18:27:59Z

You can't see that the use cases are similar? A lot of assumptions are being made on either end about how this program works, so let the developers make their assessments on our respective recommendations.

lebensterben · 2021-06-26T18:28:33Z

@MichaIng
You're right. lychee doesn't convert input file(s).

mre · 2021-12-03T12:14:18Z

An alternative approach would be to use a Markdown command-line processor for extracting the tags and only use lychee on its output:

md --md-tags link | lychee -

Note that md does not exist. It would be a utility similar to jq and could be helpful for many use cases. I wonder if there's a tool like this out there; at least a quick search didn't reveal anything. If not, then it's either very hard or somebody should build it.
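A minimal stand-in for the hypothetical md --md-tags link can be written in Python with only the standard library. The script name md_links.py and the function `inline_link_urls` are made up for this example, and the regex is only a rough approximation of inline links: it also picks up images (![alt](src)) and skips links that carry a title.

```python
#!/usr/bin/env python3
"""Read Markdown on stdin and print the URL of every inline [text](url)
link, one per line, so the output can be piped into `lychee -`."""
import re
import sys

INLINE_LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def inline_link_urls(markdown: str) -> list[str]:
    # One capture group, so findall returns just the URLs.
    return INLINE_LINK.findall(markdown)


def main() -> None:
    for url in inline_link_urls(sys.stdin.read()):
        print(url)


if __name__ == "__main__":
    main()
```

Usage: python3 md_links.py < README.md | lychee -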

mre · 2025-01-06T23:37:52Z

Today I heard of mdq, which is helpful for this task. https://github.com/yshavit/mdq

Perhaps people can try it out to see if they can extract links based on tags and feed that into lychee.

MichaIng mentioned this issue Jun 26, 2021 in: Exclude URLs based on HTML tags #259 (Closed)

mre added the request-for-comments label Sep 4, 2021

mre added the enhancement (New feature or request) and workaround labels Feb 4, 2022
