Skip to content

Remove ssr.paths configuration and replace with ssr.excludePathRegexes which excludes specific paths from SSR #4227

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

FrancescoMolinaro
Copy link
Contributor

@FrancescoMolinaro FrancescoMolinaro commented Apr 22, 2025

References

Fixes #4132

Description

Adapted filtering of SSR pages blocking the rendering on specific pages instead of allowing it only on some pages.

Instructions for Reviewers

Navigating on pages 500, 404 or 403, or any invalid URL I should see the correct error status in the request.
In general the response status should be always correct for pages rendered in SSR.

List of changes in this PR:

Removed ssr.paths configuration and replaced with ssr.excludePathRegexes.
Add regexes to exclude community and collection browse, global browse, global search, community list, and statistics.

Checklist

This checklist provides a reminder of what we are going to look for when reviewing your PR. You do not need to complete this checklist prior creating your PR (draft PRs are always welcome).
However, reviewers may request that you complete any actions in this list if you have not done so. If you are unsure about an item in the checklist, don't hesitate to ask. We're here to help!

  • My PR is created against the main branch of code (unless it is a backport or is fixing an issue specific to an older branch).
  • My PR is small in size (e.g. less than 1,000 lines of code, not including comments & specs/tests), or I have provided reasons as to why that's not possible.
  • My PR passes ESLint validation using npm run lint
  • My PR doesn't introduce circular dependencies (verified via npm run check-circ-deps)
  • My PR includes TypeDoc comments for all new (or modified) public methods and classes. It also includes TypeDoc for large or complex private methods.
  • My PR passes all specs/tests and includes new/updated specs or tests based on the Code Testing Guide.
  • My PR aligns with Accessibility guidelines if it makes changes to the user interface.
  • My PR uses i18n (internationalization) keys instead of hardcoded English text, to allow for translations.
  • My PR includes details on how to test it. I've provided clear instructions to reviewers on how to successfully test this fix or feature.
  • If my PR includes new libraries/dependencies (in package.json), I've made sure their licenses align with the DSpace BSD License based on the Licensing of Contributions documentation.
  • If my PR includes new features or configurations, I've provided basic technical documentation in the PR itself.
  • If my PR fixes an issue ticket, I've linked them together.

@alanorth alanorth added component: SEO Search Engine Optimization high priority port to dspace-7_x This PR needs to be ported to `dspace-7_x` branch for next bug-fix release port to dspace-8_x This PR needs to be ported to `dspace-8_x` branch for next bug-fix release labels Apr 22, 2025
@alanorth alanorth added this to the 9.0 milestone Apr 22, 2025
@alanorth
Copy link
Contributor

Thanks @FrancescoMolinaro. I tested this on DSpace 7.6.4-SNAPSHOT (patch applies cleanly) and saw that the 403, 404, and 500 error pages were returning the correct codes. I also tested a few other paths for existing and non-existing items (for example) to make sure the codes didn't change.

@tdonohue tdonohue changed the title [DURACOM-344] set error pages status in response Set error pages status code in SSR response Apr 22, 2025
@tdonohue tdonohue changed the title Set error pages status code in SSR response Set error pages status code in response Apr 22, 2025
@tdonohue tdonohue self-requested a review April 23, 2025 14:02
Copy link
Member

@tdonohue tdonohue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FrancescoMolinaro : The code looks good overall now, but my tests are not all successful. Here's what I'm seeing:

  1. Running this PR in production mode (npm run build:prod; npm run serve:ssr)
  2. Test http://localhost:4000/500 -> Returns 500 error code (SUCCESS)
  3. Test http://localhost:4000/403 -> Returns 403 error code (SUCCESS)
  4. Test http://localhost:4000/404 -> Returns 404 error code (SUCCESS)
  5. Test http://localhost:4000/invalid-url -> Shows the /404 page, but returns 200 OK (FAILURE)

So, this works for the pages that it's checking for. But, it doesn't seem to work in scenarios where a different path results in a 404 error. It also therefore might not work if a different path resulted in a 500 or 403 error.

Overall, this PR provides better behavior than the current main. But, it doesn't seem to fully fix the problem of returning a proper error code when an error occurs from the backend. Instead, it just makes sure that if you access these specific error pages directly in your browser that you'll get a proper error code.

@tdonohue tdonohue moved this to 🙋 Needs Reviewers Assigned in DSpace 9.0 Release Apr 23, 2025
@FrancescoMolinaro
Copy link
Contributor Author

Hi @tdonohue , thanks for the feedback!

I agree that there are some scenario where this solution might not work.
I digged a bit more into the problem and found out the issue stems from this change we made #3682 ; so I reverted my changes and adapted the code accordingly.
The problem with unrecognized urls, e.g. http://localhost:4000/invalid-url , is that we skip SSR completely and a blank document with a code of 200 is returned. when the redirect to the 404 page happens we are already in CSR therefore too late to change the status.

I don't think there is an easy solution to this, what comes to mind to make a bit more robust the handling of the requests' status while keeping some pages out of SSR, could be importing the APP_ROUTES in the server.ts and checking the starting of the request's path so we can set a 404 in case none of the app's paths matches the starting of the request's one.

This would still leave a lot of options uncovered, for instance home-invalid, would still result as a known route with a 200 code.
Anyway I would really appreaciate a second opinion on this as it seems to me that we have a wider issue.

@tdonohue
Copy link
Member

Assigning @artlowel as a secondary reviewer since he offered to help take a look at this today.

@tdonohue tdonohue requested a review from artlowel April 24, 2025 19:24
@AndreaBarbasso
Copy link
Contributor

@tdonohue, @artlowel, the same situation (using a 200 error code for something that is not explicitly - eg. /404, /500 - a redirect/error) can be seen here: #4042.
I think @FrancescoMolinaro is on the right track, we might want to re-think the solution applied here and use a black-list approach instead of a white-list approach.

@tdonohue
Copy link
Member

tdonohue commented Apr 28, 2025

@AndreaBarbasso and @FrancescoMolinaro : If we can find a way to solve this by changing our settings to exclude specific paths from SSR, instead of including only specific paths in SSR, then I'd be OK with that approach. That does seem like it may provide better behavior for unexpected errors / redirects.

If we want to go that route, I'd recommend renaming the existing ssr.paths setting in config.*.yml to make the new behavior clear. Perhaps calling it something like ssr.excludePaths or similar. So, something like this:

ssr:
  excludePaths: [ '/browse/', '/community-list', '/search/', '/statistics']

The challenge may be that we may need to also support regex values if we also want to exclude browse pages for communities & collections which appear under /communities/[uuid]/browse/ and /collections/[uuid]/browse/ respectively.

Whatever solution is created, we'll have to make sure to also backport it to 7.6.x and 8.x.

@FrancescoMolinaro
Copy link
Contributor Author

Hi @tdonohue , thanks for the feedback!

I agree that a blacklist approach would make the solution more stable for error codes and redirects, so I have refactored the implementation following your suggestion.
I still think there will be some edge cases where we can't show a correct error code, e.g. redirect to login page if a user access a restricted collection or community, but I believe this change will anyway reduce considerably the number of these cases.

I will look forward to your feedback and thanks again!

Copy link
Member

@tdonohue tdonohue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FrancescoMolinaro : I haven't had a chance to test this yet, but I had some feedback on how we could clean up the new code. It may make since to just have one excludePaths setting, if you agree. See more inline below.

# Regexes to be run against the URl of the page to check if SSR is allowed.
# This configuration offers more flexibility and precision than excludePaths as we can match more complex URLs.
# If the entire URL match any of the regexes it will be served directly in CSR.
ssrBlacklistRegexes: [ /^\/communities\/[0-9a-fA-F-]{36}(\/.*)?$/, /^\/collections\/[0-9a-fA-F-]{36}(\/.*)?$/]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, we should rename this to be similar to the other setting, or find a way to just combine the two settings into one.

So, either rename this to be excludePathRegexes, or maybe see if we can just have a single excludePaths setting which supports regex & define all paths via regex.

It might just be easiest to have a single excludePaths setting which uses regex, as that provides more power over which paths are excluded.

server.ts Outdated
* @param path
* @param blacklist
*/
function isBlacklistedPathForSsr(url: string, path: string, blacklist: RegExp[]): boolean {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should remove the term "blacklist" from our code, as it can have negative connotations.

Please change the terms in your code to either say "exclusion list" or "block list" or similar. Those terms are more descriptive as well. So, this method could be named isExcludedFromSsr() and the last param could be called exclusionList or excludeRegex or similar.

@github-project-automation github-project-automation bot moved this from 🙋 Needs Reviewers Assigned to 👀 Under Review in DSpace 9.0 Release Apr 29, 2025
@FrancescoMolinaro
Copy link
Contributor Author

Hi @tdonohue , thanks for the feedback and sorry for the poor naming choice, I appreciated a lot the article you shared.

I have refactored the solution so to have a single parameter based on regexes as you proposed, I agree that this give a better flexibility and power in handling the various paths compositiions.

I will look forward to your next feedback and thanks again.

Copy link
Member

@tdonohue tdonohue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FrancescoMolinaro : Thanks for the updates. Overall, this now looks good to me. I only have minor suggestions inline below.

I'd also recommend we update the Title & Description of this PR, as it's changed completely from its original purpose. If possible, we may also want to remove the earlier commits in this PR (by rebasing the PR & squashing commits). But, if you aren't able to easily do that, I can always merge squash this (reducing the entire PR to a single commit).

(UPDATE: I've changed the PR title. But, I haven't yet updated the description)

Overall, though, I'm +1. I also tested this today and verified that this approach works. I can verify that the excluded paths are not triggering SSR. I can also verify that the error pages & invalid URLs are returning correct error codes again.

paths: [ '/home', '/items/', '/entities/', '/collections/', '/communities/', '/bitstream/', '/bitstreams/', '/handle/', '/reload/' ]
# Regexes to be run against the path of the page to check if SSR is allowed.
# If the path match any of the regexes it will be served directly in CSR.
excludePathRegexes: [ /^\/communities\/[0-9a-fA-F-]{36}(\/.*)?$/, /^\/collections\/[0-9a-fA-F-]{36}(\/.*)?$/]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we updated this to have the same list as your have in environment.production.ts? I'd also recommend we add a basic explanation of these regexes. Something like:

# By default, excludes community and collection browse, global browse, global search, community list, and statistics.

server.ts Outdated
@@ -221,7 +221,7 @@ export function app() {
* The callback function to serve server side angular
*/
function ngApp(req, res, next) {
if (environment.ssr.enabled && req.method === 'GET' && (req.path === '/' || environment.ssr.paths.some(pathPrefix => req.path.startsWith(pathPrefix)))) {
if (environment.ssr.enabled && req.method === 'GET' && (req.path === '/' || !isExcludedFromSsr(req.path, environment.ssr.excludePathRegexes))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an extra space before this if that seems unnecessary. I'm surprised that lint didn't catch it.

@tdonohue tdonohue changed the title Set error pages status code in response Remove "ssr.paths" configuration and replace with "ssr.excludePathRegexes" which excludes specific paths from SSR Apr 30, 2025
@tdonohue tdonohue changed the title Remove "ssr.paths" configuration and replace with "ssr.excludePathRegexes" which excludes specific paths from SSR Remove ssr.paths configuration and replace with ssr.excludePathRegexes which excludes specific paths from SSR Apr 30, 2025
@FrancescoMolinaro FrancescoMolinaro force-pushed the task/main/DURACOM-344 branch from 4af4306 to c399976 Compare May 2, 2025 09:31
@FrancescoMolinaro FrancescoMolinaro force-pushed the task/main/DURACOM-344 branch from c399976 to b1c5460 Compare May 2, 2025 09:37
@FrancescoMolinaro
Copy link
Contributor Author

FrancescoMolinaro commented May 2, 2025

Hi @tdonohue , thanks again for the feedback.

I did implement the last changes you requested and I have updated the description and squashed all the commits so that we now have a single one.
The PR should be now in a cleaner state but please let me know if there is something else that needs improvement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component: SEO Search Engine Optimization high priority port to dspace-7_x This PR needs to be ported to `dspace-7_x` branch for next bug-fix release port to dspace-8_x This PR needs to be ported to `dspace-8_x` branch for next bug-fix release
Projects
Status: 👀 Under Review
Development

Successfully merging this pull request may close these issues.

Error pages return HTTP 200 instead of correct status codes (403, 404, 500)
4 participants