
Scrape Failed. Caught exception during request #107

Closed
c2peng opened this issue Dec 14, 2020 · 2 comments

@c2peng

c2peng commented Dec 14, 2020

Hi,
I am running this on a brand-new Raspberry Pi 4 8GB. I randomly get responses like this:
E2020-12-14 05:44:33,770 [mzn_2] scrape failed
I2020-12-14 05:44:46,499 [mzn_3] not in stock
I2020-12-14 05:44:56,268 [mzn_4] not in stock
E2020-12-14 05:45:01,436 [bstby_1] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 05:45:01,437 [bstby_1] scrape failed
E2020-12-14 05:45:06,575 [bstby_2] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 05:45:06,575 [bstby_2] scrape failed
E2020-12-14 05:45:36,778 [mzn_2] caught exception during request: Message: invalid session id
E2020-12-14 07:13:20,149 [mzn_2] caught exception during request: Message: unknown error: session deleted because of page crash
from unknown error: cannot determine loading status
from tab crashed
(Session info: headless chrome=83.0.4103.116)

E2020-12-14 07:13:20,150 [mzn_2] scrape failed

I tried running it on my 16-inch MacBook Pro and there was no problem.

@gbasile17

I'm having the same issue with Best Buy. They probably added something to their page to block scrapers.

E2020-12-14 17:32:45,364 [bstby_1] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 17:32:45,365 [bstby_1] scrape failed
E2020-12-14 17:32:50,443 [bstby_2] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 17:32:50,443 [bstby_2] scrape failed
E2020-12-14 17:32:55,511 [bstby_3] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 17:32:55,512 [bstby_3] scrape failed
E2020-12-14 17:33:00,585 [bstby_4] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 17:33:00,585 [bstby_4] scrape failed
E2020-12-14 17:33:05,657 [bstby_5] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 17:33:05,658 [bstby_5] scrape failed
E2020-12-14 17:33:10,729 [bstby_6] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 17:33:10,729 [bstby_6] scrape failed
E2020-12-14 17:33:15,802 [bstby_7] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 17:33:15,803 [bstby_7] scrape failed
E2020-12-14 17:33:20,868 [bstby_8] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 17:33:20,868 [bstby_8] scrape failed
E2020-12-14 17:33:25,934 [bstby_9] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 17:33:25,934 [bstby_9] scrape failed
E2020-12-14 17:33:30,999 [bstby_10] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 17:33:31,000 [bstby_10] scrape failed
E2020-12-14 17:33:36,074 [bstby_11] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 17:33:36,074 [bstby_11] scrape failed
E2020-12-14 17:33:41,145 [bstby_12] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 17:33:41,145 [bstby_12] scrape failed
E2020-12-14 17:33:46,223 [bstby_13] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 17:33:46,223 [bstby_13] scrape failed
E2020-12-14 17:33:51,304 [bstby_14] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 17:33:51,305 [bstby_14] scrape failed
E2020-12-14 17:33:56,372 [bstby_15] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 17:33:56,372 [bstby_15] scrape failed
E2020-12-14 17:34:01,454 [bstby_16] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 17:34:01,454 [bstby_16] scrape failed
E2020-12-14 17:34:06,532 [bstby_17] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 17:34:06,532 [bstby_17] scrape failed
E2020-12-14 17:34:11,609 [bstby_18] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 17:34:11,610 [bstby_18] scrape failed

@EricJMarti
Owner

I increased the minimum timeout interval from 5 seconds to 15 seconds in this commit: 1fa0cba

Can you try again using the latest image?

A lot of factors can cause requests to time out (a slow or congested internet connection, using Wi-Fi instead of ethernet, other processes competing for CPU and network resources, the list goes on). I think 15 seconds should be a reasonable timeout to download a webpage, but let me know if you still see timeout issues and I can raise it more.
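
For anyone who wants to experiment with timeouts on their own machine, here is a minimal sketch of retrying a request with progressively longer timeouts. It is not this project's code: the function name `fetch_with_escalating_timeout`, the `timeouts` tuple, and the `opener` parameter are all hypothetical, and it uses the standard library's `urllib` rather than whatever HTTP client the scraper actually uses.

```python
import socket
import urllib.error
import urllib.request


def is_timeout(exc):
    """Return True if exc is a timeout, either raw or wrapped in a URLError."""
    if isinstance(exc, socket.timeout):
        return True
    return isinstance(exc, urllib.error.URLError) and isinstance(
        exc.reason, socket.timeout
    )


def fetch_with_escalating_timeout(
    url, timeouts=(5, 15, 30), opener=urllib.request.urlopen
):
    """Fetch url, retrying with the next (longer) timeout after each timeout.

    The timeout values are illustrative assumptions, not the project's
    settings. Errors that are not timeouts are re-raised immediately.
    """
    last_error = None
    for timeout in timeouts:
        try:
            with opener(url, timeout=timeout) as response:
                return response.read()
        except OSError as exc:  # socket.timeout and URLError are both OSErrors
            if not is_timeout(exc):
                raise
            last_error = exc
    # Every attempt timed out: surface the last timeout to the caller.
    raise last_error
```

Calling `fetch_with_escalating_timeout("https://www.bestbuy.com")` would try 5 seconds first, then 15, then 30. If a slow device only succeeds at the longer values, that points to the timeout rather than the site blocking the scraper.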
