
missing features in comparison table #68

Closed
4 of 18 tasks
anarcat opened this issue Dec 5, 2020 · 6 comments

Comments

@anarcat
Contributor

anarcat commented Dec 5, 2020

Hi!

By default, some crawlers (like linkchecker and the w3c link checker) respect robots.txt (because they are, after all, bots). One of the things that got me involved in linkchecker is the ability to disable that for some sites, actually.

Is that a feature that lychee supports? Either way, it should probably be listed in the table.

Same with GUI support: linkchecker has a GUI, but I'm not sure if that's the case for lychee or the others. Oh, and plugins: we have plugins too. :)

I suspect there might be other such features missing here... heck, just looking at the linkchecker README, I find:

  • recurses
  • GUI
  • web interface
  • robots.txt
  • plugins
  • multiple output (HTML, SQL, CSV, XML, Sitemap, text)
  • regex filters (or are those included in the "exclude filters" bit?)
  • proxy support
  • telnet, FTP, news:, nntp support
  • cookie support
  • html5 support (although to be honest I have no idea what that actually means)

Those are provided through plugins:

  • anchor checks
  • PDF parsing
  • word document parsing
  • HTTPS expiration check
  • virus checks
  • content search for regex
  • w3c syntax check
@mre
Member

mre commented Dec 5, 2020

Oh wow that's quite a nice list!

When I initially compiled the comparison, I picked whatever I could find on various repos.
It's quite biased at the moment, but it took quite some time to check a single feature for all checkers, so I had to prioritize, heh.
Would you mind sending a PR for adding the missing items to the comparison?

Some thoughts

regex filters (or are those included in the "exclude filters" bit?)

Yup. We support multiple exclude regexes as well as include regexes. They can be combined as well.
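A minimal sketch of how include and exclude patterns might combine, assuming (this thread does not confirm it) that a URL matching any include pattern is always checked, even when an exclude pattern also matches. The `should_check` helper is a hypothetical name for illustration and may not match lychee's actual filtering rules.

```python
import re
from typing import Sequence


def should_check(url: str, includes: Sequence[str], excludes: Sequence[str]) -> bool:
    """Decide whether a URL gets checked.

    Assumption for illustration: include patterns take precedence over excludes.
    """
    # An include match wins outright, even if an exclude also matches.
    if any(re.search(pattern, url) for pattern in includes):
        return True
    # Otherwise, skip the URL if any exclude pattern matches it.
    if any(re.search(pattern, url) for pattern in excludes):
        return False
    # URLs matching neither list are checked by default.
    return True
```

For example, excluding `example\.com` while including `/docs/` would skip most of that domain but still check its documentation pages.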

GUI / web interface

lychee is a pure CLI tool. Don't think we'll ever add a graphical interface. It's more a design decision than a lacking feature. 😉 So tbh I'd skip that part.

I'd also skip the individual plugins but definitely mention plugin support in general.

HTML5 support is an odd one. lychee reads HTML files but I'm not sure what's missing for HTML5 support.
The rest of the list is great. lychee has none of that, so I'm glad there's still room for improvement. 😉

The best thing would be if we could add an entry for each link checker but I realize I ask a lot. 😃

@anarcat
Contributor Author

anarcat commented Dec 6, 2020

Thanks for the quick reply! I understand some features might seem out of scope for lychee, and I think that's fine, but I also think a fair comparison would nevertheless include those...

The best thing would be if we could add an entry for each link checker but I realize I ask a lot. 😃

As much as I would love to see that happen, I don't have much more time beyond the tiny PR I submitted.

A friend also pointed out that lychee doesn't recurse (which I found surprising), so I added that to the checklist above.

@mre
Member

mre commented Dec 8, 2020

Yeah, recursion is still on the to-do list. See #21.

@pawroman
Member

pawroman commented Dec 8, 2020

GUI / web interface

lychee is a pure CLI tool. Don't think we'll ever add a graphical interface. It's more a design decision than a lacking feature. 😉 So tbh I'd skip that part.

GUI is kind of an anti-feature for me because of its (arguably) limited usefulness. Some people might need or want a GUI, but I don't think it's that useful for most use cases. I agree that it's a design decision. I also think that with JSON/CSV output, a simple GUI can be bolted on without major problems, e.g. using zenity, https://github.com/BashGui/easybashgui, or a simple web interface.

That being said, I'd welcome any contributions adding GUI if people wanted to do it 😄

By default, some crawlers (like linkchecker and the w3c link checker) respect robots.txt (because they are, after all, bots).

In my opinion robots.txt should be an opt-in feature, because the very purpose of a link checker is to check links (which usually requires doing a GET request). I think robots.txt is mostly meant to prevent abuse from crawlers/spiders. Link checkers (at least in my use cases) are targeting particular links (e.g. here's a list of links to check, go check them), which is not what crawlers do.
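A minimal sketch of robots.txt as an opt-in check, using Python's standard `urllib.robotparser`. The `is_allowed` helper, the `respect_robots` flag, and the user-agent string are hypothetical names for illustration, not options of lychee or linkchecker; the robots.txt lines are passed in directly so the sketch makes no network request.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(url, robots_lines, respect_robots=False, agent="example-link-checker"):
    # Default: ignore robots.txt, since a link checker targets specific links
    # rather than crawling a site.
    if not respect_robots:
        return True
    # Opt-in: parse the site's robots.txt rules and honour them for this agent.
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)
```

With `Disallow: /private/` in the rules, a URL under `/private/` is still checked by default and is only skipped when the caller opts in.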

@anarcat
Contributor Author

anarcat commented Dec 8, 2020

Just to be clear, the feature list here is not something I'm proposing lychee necessarily implement. It's fine if lychee doesn't want a GUI, but it would still be nice to see how lychee differs from other similar tools, and if there is a feature it decided not to have, that's the place to document it, IMHO.

@mre
Member

mre commented Jan 7, 2021

We've since added JSON output support, which I deem to be the most useful format for machine-readability. I've also mentioned the lack of recursion functionality in the docs now and updated the remaining tools to the best of my knowledge.

From your list, telnet, FTP, news:, and nntp support might be another nice addition, but I honestly don't have the time to check all tools for that functionality right now. Also, adding them might make the README.md longer than it already is, and it was intended as a quick primer.

I have to agree with @pawroman that a GUI is something I would not expect from a link checker. Same for robots.txt. The intent is not to scrape pages, but to check their health. No information has to be stored on disk for this. Respecting robots.txt could lead to unexpected results.

On a more general note, I don't believe the README.md should be an exhaustive survey of the features of various link checkers; that could make for a nice blog post. Rather, the intention is to give a potential user a quick overview of whether they should consider lychee for their use case. As with other software, I expect a lack of documentation for a given functionality to mean that the tool does not support it (yet). Therefore I guess it's fair to stop here and perhaps revisit this issue at a later point. Plugin support in particular still sounds interesting (hello WASM), but I don't see that as a feature lacking in lychee, but rather as a design decision about how some features were implemented. I might come around on this in the future, though.

In any case, healthy criticism is very valuable for every project, so thank you very much for the conversation @anarcat and for your help to make the docs a little more complete. ❤️

mre closed this as completed Jan 7, 2021

3 participants