Added Pipeline for scheduled link rot checker #3649

Open
wants to merge 8 commits into
base: master

@krishnaduttPanchagnula krishnaduttPanchagnula commented Jun 21, 2025

Closes #3635

@alexandear alexandear requested a review from Copilot June 21, 2025 17:33

@Copilot Copilot AI left a comment

Pull Request Overview

This pull request adds a scheduled link rot checker that scans markdown files for links and validates them by making HEAD requests.

  • Added a Python script that extracts and processes URLs from markdown files.
  • Introduced a GitHub Actions workflow to schedule the execution of the link rot checker.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| hack/lin-rot-checker.py | New script to extract URLs from markdown files and verify links. |
| .github/workflows/lin-rot-checker.yml | Workflow configuration to run the link rot checker on a schedule. |


for link in links:
    try:
        if requests.head(link).status_code==200:

Copilot AI Jun 21, 2025

Consider adding a timeout to the requests.head call (e.g., requests.head(link, timeout=5)) to prevent the script from hanging on unresponsive links.

Suggested change
-         if requests.head(link).status_code==200:
+         if requests.head(link, timeout=5).status_code==200:
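
A minimal sketch of what the suggestion above might look like in a full loop; it is not the PR's actual script. Only `requests.head(link, timeout=5)` comes from the review comment. The function name `check_links`, its return shape, and the injectable `head` parameter are hypothetical and exist so the loop can be exercised without network access.

```python
from typing import Callable, Dict, Iterable, Optional, Union


def check_links(
    links: Iterable[str],
    timeout: float = 5.0,
    head: Optional[Callable] = None,
) -> Dict[str, Union[int, str]]:
    """Map each URL to its HTTP status code, or to an error string on failure.

    `head` is a hypothetical injection point (an assumption, not part of the PR);
    when omitted, the third-party `requests.head` is used.
    """
    if head is None:
        import requests  # imported lazily so the sketch can be tested offline

        head = requests.head

    results: Dict[str, Union[int, str]] = {}
    for link in links:
        try:
            # The timeout bounds how long we wait on the server, so one
            # unresponsive host cannot stall the whole run.
            results[link] = head(link, timeout=timeout).status_code
        except Exception as exc:  # e.g. a timeout or a connection error
            results[link] = f"error: {type(exc).__name__}"
    return results
```

Catching the exception per link keeps a single dead host from aborting the scan, which seems to be the intent of the surrounding `try` block.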

@jandubois
Member

I think we don't care about the languages used for external tools, as long as they are easy to install both locally and in CI.

But for tools that should become part of our repo, unless there is a really good reason, they should be written in Go or bash [1].

Footnotes

  1. I realize that is somewhat ironic, given that I wrote test-port-forwarding.pl in Perl, but that was ages ago, and I would now argue that it should be written in Go instead, if it didn't already exist.

@krishnaduttPanchagnula
Author

@jandubois Can you please review and let me know whether this cron time works?

@alexandear
Member

alexandear commented Jun 26, 2025

Please fix a typo in the PR's title: "pipline" -> "pipeline". Also, update the title to match the implementation (we don't have a script anymore).

@krishnaduttPanchagnula krishnaduttPanchagnula changed the title from "added script and pipline for scheduled link rot checker" to "added Pipeline for scheduled link rot checker" Jun 27, 2025
@krishnaduttPanchagnula krishnaduttPanchagnula changed the title from "added Pipeline for scheduled link rot checker" to "Added Pipeline for scheduled link rot checker" Jun 27, 2025
Member

Please squash the commits

@krishnaduttPanchagnula
Author

@alexandear can we rerun this pipeline? My changes only concern the new pipeline and should not affect any integration tests.

Member

@alexandear alexandear left a comment

Currently, the workflow fails when running locally with the help of act:

❯ act -j linkChecker --container-architecture linux/amd64
INFO[0000] Using docker host 'unix:///Users/alexandear/.colima/default/docker.sock', and daemon socket 'unix:///Users/alexandear/.colima/default/docker.sock' 
[Automated Link Health Check/linkChecker] ⭐ Run Set up job
[Automated Link Health Check/linkChecker] 🚀  Start image=node:16-buster-slim
[Automated Link Health Check/linkChecker]   🐳  docker pull image=node:16-buster-slim platform=linux/amd64 username= forcePull=true
[Automated Link Health Check/linkChecker]   🐳  docker create image=node:16-buster-slim platform=linux/amd64 entrypoint=["tail" "-f" "/dev/null"] cmd=[] network="host"
[Automated Link Health Check/linkChecker]   🐳  docker run image=node:16-buster-slim platform=linux/amd64 entrypoint=["tail" "-f" "/dev/null"] cmd=[] network="host"
[Automated Link Health Check/linkChecker]   🐳  docker exec cmd=[node --no-warnings -e console.log(process.execPath)] user= workdir=
[Automated Link Health Check/linkChecker]   ✅  Success - Set up job
[Automated Link Health Check/linkChecker]   ☁  git clone 'https://github.com/lycheeverse/lychee-action' # ref=82202e5e9c2f4ef1a55a3d02563e1cb6041e5332i
[Automated Link Health Check/linkChecker] Unable to resolve 82202e5e9c2f4ef1a55a3d02563e1cb6041e5332i: reference not found
[Automated Link Health Check/linkChecker] Unable to resolve 82202e5e9c2f4ef1a55a3d02563e1cb6041e5332i: reference not found
[Automated Link Health Check/linkChecker] reference not found
[Automated Link Health Check/linkChecker] ⭐ Run Complete job
[Automated Link Health Check/linkChecker]   ✅  Success - Complete job
[Automated Link Health Check/linkChecker] 🏁  Job failed
Error: reference not found
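
In the log above, the ref that act tries to resolve ends in a stray `i`, making it 41 characters rather than a 40-character hexadecimal commit SHA, which may be why the reference is not found. Below is a rough sketch of a check that could catch this kind of slip before a workflow runs. It assumes actions are meant to be pinned by full commit SHA; `find_bad_pins` and both regular expressions are hypothetical helpers, not part of this PR or of act.

```python
import re
from typing import List, Tuple

# Matches "uses: owner/repo@ref" and captures the action name and the ref.
USES_RE = re.compile(r"uses:\s*(?P<action>[\w./-]+)@(?P<ref>\S+)")
# A full commit SHA is exactly 40 lowercase hexadecimal characters.
SHA_RE = re.compile(r"[0-9a-f]{40}")


def find_bad_pins(workflow_text: str) -> List[Tuple[int, str, str]]:
    """Return (line number, action, ref) for every `uses:` ref that is not a full SHA.

    Assumption: the project pins actions by commit SHA, so tags such as
    `@v4` are reported as well.
    """
    bad = []
    for lineno, line in enumerate(workflow_text.splitlines(), start=1):
        match = USES_RE.search(line)
        if match and not SHA_RE.fullmatch(match.group("ref")):
            bad.append((lineno, match.group("action"), match.group("ref")))
    return bad
```

Running it over the workflow text line by line keeps the sketch free of a YAML dependency, at the cost of ignoring `uses:` values that span unusual layouts.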

Signed-off-by: krishnaduttPanchagnula <[email protected]>
Signed-off-by: krishnaduttPanchagnula <[email protected]>
Signed-off-by: krishnaduttPanchagnula <[email protected]>
Signed-off-by: krishnaduttPanchagnula <[email protected]>
Signed-off-by: krishnaduttPanchagnula <[email protected]>
Member

Please add a space after # in comments. For example, it should be:

# v4.2.2

Signed-off-by: krishnaduttPanchagnula <[email protected]>
Signed-off-by: krishnaduttPanchagnula <[email protected]>
  uses: lycheeverse/lychee-action@82202e5e9c2f4ef1a55a3d02563e1cb6041e5332 # v2.4.1
  with:
    fail: false
    output: ./lychee/out.md
Member

I propose adding this step to output results to a log:

Suggested change
-     output: ./lychee/out.md
+     output: ./lychee/out.md
+ - name: Show Report
+   run: cat lychee/out.md

- name: Link Checker
  id: lychee
  uses: lycheeverse/lychee-action@82202e5e9c2f4ef1a55a3d02563e1cb6041e5332 # v2.4.1
  with:
Member

It makes sense to add these args (inspired by https://github.com/open-policy-agent/opa/blob/f72110de200666efeed73887707f8207a767ec9a/.github/workflows/link-checker.yaml):

Suggested change
-   with:
+   with:
+     args: |
+       --max-concurrency 1 \
+       --no-progress \
+       --scheme https \
+       --scheme http

Successfully merging this pull request may close these issues.

Add scheduled CI job to check for link rot in repo and website
4 participants