Is there a more efficient way to block unwanted downloads in PlaywrightCrawler? #1534
Unanswered
loic-bellinger asked this question in Q&A
Replies: 1 comment 1 reply
Hello, I think that using the pre-navigation hooks is the common way to deal with similar problems.
-
Hi team,

I'm using `PlaywrightCrawler` (Python) and trying to prevent unwanted PDF/media downloads from URLs that cannot be filtered by the `exclude` parameter of `enqueue_links` (or that are only reached after redirections).

Currently, I'm thinking of doing this in a `pre_navigation_hook` by setting up a route handler. This seems to work, but I'm not sure it's ideal:

- Is `resource_type` really reliable?
- It relies on `context.page.route`, which feels somewhat inefficient.

I did use `browser_new_context_options={"accept_downloads": False}` when instantiating my crawler to avoid downloading the files anyway, but I don't want it to spend time trying, and retrying, to download them.

Any help would be appreciated.

Below is a code sample with an unwanted URL to highlight my issue and make prototyping faster:
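The snippets from the original post were not preserved here, so the following is only a minimal sketch of the approach described above, not the author's code. It assumes a recent Crawlee for Python release that exposes `PlaywrightCrawler` and `PlaywrightPreNavCrawlingContext` under `crawlee.crawlers`. The names `BLOCKED_RESOURCE_TYPES`, `BLOCKED_EXTENSIONS`, `should_block`, `abort_unwanted` and `build_crawler` are hypothetical. Because a PDF opened by navigation is typically reported with resource type `document`, the sketch also checks the URL's file extension instead of relying on `resource_type` alone.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical block lists; adjust them to the crawl at hand.
BLOCKED_RESOURCE_TYPES = {"media"}
BLOCKED_EXTENSIONS = {".pdf", ".mp4", ".mp3", ".zip"}


def should_block(url: str, resource_type: str) -> bool:
    """Return True when a request looks like an unwanted download."""
    if resource_type in BLOCKED_RESOURCE_TYPES:
        return True
    # Only the path is inspected, so query strings do not hide the extension.
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    return suffix in BLOCKED_EXTENSIONS


async def abort_unwanted(route) -> None:
    """Playwright route handler: abort unwanted requests, let the rest through."""
    request = route.request
    if should_block(request.url, request.resource_type):
        await route.abort()
    else:
        await route.continue_()


def build_crawler():
    """Wire the handler into a crawler (assumed Crawlee for Python API)."""
    # Imported lazily so the helpers above can be exercised without Crawlee.
    from crawlee.crawlers import PlaywrightCrawler, PlaywrightPreNavCrawlingContext

    crawler = PlaywrightCrawler(
        # Same option as in the question: never save downloads to disk.
        browser_new_context_options={"accept_downloads": False},
    )

    @crawler.pre_navigation_hook
    async def block_downloads(context: PlaywrightPreNavCrawlingContext) -> None:
        # Register the route handler on the page before navigation starts.
        await context.page.route("**/*", abort_unwanted)

    return crawler
```

Keeping the decision in a plain function such as `should_block` makes it easy to check against a list of known problem URLs before running a full crawl. An extension check cannot catch URLs that only reveal their content type in the response, so this remains a heuristic.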