Scrape Failed. Caught exception during request #107
Comments
I'm having the same issue with Best Buy. They probably added something to their page to block scrapers.
E2020-12-14 17:32:45,364 [bstby_1] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
I increased the minimum timeout interval from 5 seconds to 15 seconds in this commit: 1fa0cba. Can you try again using the latest image? A lot of factors can cause requests to time out (a slow or congested internet connection, using Wi-Fi instead of ethernet, other processes competing for CPU and network resources, the list goes on). I think 15 seconds should be a reasonable timeout to download a webpage, but let me know if you still see timeout issues and I can raise it more.
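For anyone who wants to experiment with timeouts locally, here is a minimal sketch of retrying a request with progressively longer timeouts. It is not the code from the commit above; the function name `fetch_with_escalating_timeout`, the `fetch` callable, and the default timeout values are all assumptions made for illustration.

```python
import time
from typing import Callable, Optional, Sequence, Tuple, Type


def fetch_with_escalating_timeout(
    fetch: Callable[[float], str],
    timeouts: Sequence[float] = (5.0, 15.0, 30.0),  # hypothetical values
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
    pause: float = 0.0,
) -> Optional[str]:
    """Call fetch(timeout) once per entry in `timeouts`, in order.

    Returns the first successful result, or None if every attempt
    raised one of the exception types listed in `retry_on`.
    """
    for attempt, timeout in enumerate(timeouts, start=1):
        try:
            return fetch(timeout)
        except retry_on as exc:
            # Report the failure and move on to the next, longer timeout.
            print(f"attempt {attempt} timed out after {timeout}s: {exc}")
            time.sleep(pause)
    return None
```

If the scraper uses the `requests` library, `fetch` could be something like `lambda t: requests.get(url, timeout=t).text` with `retry_on=(requests.exceptions.Timeout,)`, since that exception is not a subclass of the built-in `TimeoutError`.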
Hi,
I am running this on a brand new Raspberry Pi 4 (8GB). I am randomly getting errors like these:
E2020-12-14 05:44:33,770 [mzn_2] scrape failed
I2020-12-14 05:44:46,499 [mzn_3] not in stock
I2020-12-14 05:44:56,268 [mzn_4] not in stock
E2020-12-14 05:45:01,436 [bstby_1] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 05:45:01,437 [bstby_1] scrape failed
E2020-12-14 05:45:06,575 [bstby_2] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 05:45:06,575 [bstby_2] scrape failed
E2020-12-14 05:45:36,778 [mzn_2] caught exception during request: Message: invalid session id
E2020-12-14 07:13:20,149 [mzn_2] caught exception during request: Message: unknown error: session deleted because of page crash
from unknown error: cannot determine loading status
from tab crashed
(Session info: headless chrome=83.0.4103.116)
E2020-12-14 07:13:20,150 [mzn_2] scrape failed
I tried running it on my 16-inch MacBook Pro and there was no problem.
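To compare failure rates between the two machines, it may help to tally "scrape failed" lines per scraper. Below is a rough sketch that assumes the log format seen above (a level letter, a timestamp, a bracketed scraper name, then a message); the regex and the function name `count_scrape_failures` are my own, not part of the project.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Assumed format: "E2020-12-14 05:44:33,770 [mzn_2] scrape failed"
LOG_LINE = re.compile(
    r"^(?P<level>[A-Z])"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<scraper>\w+)\] "
    r"(?P<msg>.*)$"
)


def count_scrape_failures(lines: Iterable[str]) -> Dict[str, int]:
    """Count error-level 'scrape failed' lines per scraper name.

    Lines that do not match the assumed format (for example the
    continuation lines of a multi-line exception) are skipped.
    """
    failures: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match["level"] == "E" and match["msg"] == "scrape failed":
            failures[match["scraper"]] += 1
    return dict(failures)
```

Running it over a saved log, e.g. `count_scrape_failures(open("log.txt"))` (the filename is hypothetical), gives a per-scraper count that can be compared across runs or hosts.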