Scrape Failed. Caught exception during request #107

c2peng · 2020-12-14T05:46:25Z

Hi,
I am running this on a brand-new Raspberry Pi 4 8GB. I am randomly getting responses like this:
E2020-12-14 05:44:33,770 [mzn_2] scrape failed
I2020-12-14 05:44:46,499 [mzn_3] not in stock
I2020-12-14 05:44:56,268 [mzn_4] not in stock
E2020-12-14 05:45:01,436 [bstby_1] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 05:45:01,437 [bstby_1] scrape failed
E2020-12-14 05:45:06,575 [bstby_2] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 05:45:06,575 [bstby_2] scrape failed
E2020-12-14 05:45:36,778 [mzn_2] caught exception during request: Message: invalid session id
E2020-12-14 07:13:20,149 [mzn_2] caught exception during request: Message: unknown error: session deleted because of page crash
from unknown error: cannot determine loading status
from tab crashed
(Session info: headless chrome=83.0.4103.116)

E2020-12-14 07:13:20,150 [mzn_2] scrape failed

I tried running it on my MacBook Pro 16 and there was no problem.

gbasile17 · 2020-12-14T17:39:26Z

I'm having the same issue with Best Buy. They probably added something to their page to prevent scrapers.

E2020-12-14 17:32:45,364 [bstby_1] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 17:32:45,365 [bstby_1] scrape failed
E2020-12-14 17:32:50,443 [bstby_2] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 17:32:50,443 [bstby_2] scrape failed
E2020-12-14 17:32:55,511 [bstby_3] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 17:32:55,512 [bstby_3] scrape failed
E2020-12-14 17:33:00,585 [bstby_4] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 17:33:00,585 [bstby_4] scrape failed
E2020-12-14 17:33:05,657 [bstby_5] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 17:33:05,658 [bstby_5] scrape failed
E2020-12-14 17:33:10,729 [bstby_6] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 17:33:10,729 [bstby_6] scrape failed
E2020-12-14 17:33:15,802 [bstby_7] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 17:33:15,803 [bstby_7] scrape failed
E2020-12-14 17:33:20,868 [bstby_8] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 17:33:20,868 [bstby_8] scrape failed
E2020-12-14 17:33:25,934 [bstby_9] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 17:33:25,934 [bstby_9] scrape failed
E2020-12-14 17:33:30,999 [bstby_10] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 17:33:31,000 [bstby_10] scrape failed
E2020-12-14 17:33:36,074 [bstby_11] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 17:33:36,074 [bstby_11] scrape failed
E2020-12-14 17:33:41,145 [bstby_12] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 17:33:41,145 [bstby_12] scrape failed
E2020-12-14 17:33:46,223 [bstby_13] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 17:33:46,223 [bstby_13] scrape failed
E2020-12-14 17:33:51,304 [bstby_14] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 17:33:51,305 [bstby_14] scrape failed
E2020-12-14 17:33:56,372 [bstby_15] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 17:33:56,372 [bstby_15] scrape failed
E2020-12-14 17:34:01,454 [bstby_16] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 17:34:01,454 [bstby_16] scrape failed
E2020-12-14 17:34:06,532 [bstby_17] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 17:34:06,532 [bstby_17] scrape failed
E2020-12-14 17:34:11,609 [bstby_18] caught exception during request: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Read timed out. (read timeout=5)
E2020-12-14 17:34:11,610 [bstby_18] scrape failed

EricJMarti · 2020-12-14T22:26:22Z

I increased the minimum timeout interval from 5 seconds to 15 seconds in this commit: 1fa0cba

Can you try again using the latest image?

A lot of factors can cause requests to time out (a slow or congested internet connection, using Wi-Fi instead of ethernet, other processes competing for CPU and network resources, the list goes on). I think 15 seconds should be a reasonable timeout to download a webpage, but let me know if you still see timeout issues and I can raise it more.
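If a fixed timeout still turns out to be too short on slower hardware, one option is to retry with progressively longer timeouts instead of failing on the first attempt. The following is only a minimal sketch of that idea, not the code from the commit above: `fetch_with_retries`, its parameters, and the `(15, 30, 60)` schedule are all hypothetical names and values chosen for illustration, and `fetch` stands in for whatever function actually performs the request.

```python
import time


def fetch_with_retries(fetch, url, timeouts=(15, 30, 60),
                       retry_on=(TimeoutError,), pause=1.0):
    """Call fetch(url, timeout) with each timeout in turn until one succeeds.

    fetch    -- hypothetical callable that performs the request
    timeouts -- illustrative schedule of timeouts in seconds
    retry_on -- exception types that count as a timeout for the caller's client
    pause    -- seconds to wait between attempts
    """
    if not timeouts:
        raise ValueError("timeouts must contain at least one value")

    last_error = None
    for attempt, timeout in enumerate(timeouts):
        try:
            return fetch(url, timeout)
        except retry_on as exc:
            last_error = exc
            # Sleep only if another attempt is coming.
            if attempt < len(timeouts) - 1:
                time.sleep(pause)

    # Every attempt timed out; surface the last error to the caller.
    raise last_error
```

With the `requests` library, for example, `fetch` could be `lambda url, t: requests.get(url, timeout=t)` and `retry_on` could be `(requests.exceptions.Timeout,)`. Passing the exception types in keeps the helper independent of any particular HTTP client.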

EricJMarti closed this as completed Dec 14, 2020
