Description
Which package is the feature request for? If unsure which one to select, leave blank
crawlee
Because I'm actively using PuppeteerCrawler from crawlee, I'll focus on testing with it first.
Feature
I migrated from puppeteer-cluster to crawlee, and I miss its monitor feature for local development.
Motivation
It's handy for tracking progress and estimating the remaining time.
Ideal solution or implementation, and any additional constraints
- Reuse the existing statistics on completed tasks and add only what the monitor is missing. I don't currently know which file holds them, but I'm fairly sure the RequestQueue and concurrency features already have this data.
- Imagined CLI UI:
Start: START_TIME
Now: CURRENT_TIME (running for CONSUMED_TIME)
Progress: FINISHED / TOTAL_TASK (FINISHED_PERCENTAGE), failed: FAILED (FAILED_PERCENTAGE)
Remaining: ESTIMATED_TIME (SPEED)
Sys. load: CPU_LOAD / MEM_LOAD
Concurrencies: CONCURRENCY_INFO
CONCURRENCY_LIST
- Add a new Monitor class in packages/core/src/monitor.ts to handle the monitor UI. It will contain the logic to gather and calculate the monitor data and to write it to the output.
- Integrate the Monitor class into the BasicCrawler class in packages/basic-crawler/src/internals/basic-crawler.ts.
- The Monitor class tracks and displays the time estimate and concurrency status in the CLI output at regular intervals, following the proposed UI template.
- Update the run function in packages/basic-crawler/src/internals/basic-crawler.ts to initialize and start the Monitor class.
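To make the proposal more concrete, here is a rough sketch of what the Monitor class could look like. It is only an illustration: the MonitorSnapshot shape, the getSnapshot callback, and the write parameter are all hypothetical names I made up, and it does not read from crawlee's real statistics yet. The idea is that BasicCrawler would supply the snapshot from whatever data it already tracks.

```typescript
// Hypothetical snapshot of the numbers the monitor needs. In crawlee these
// would presumably come from existing statistics; the field names here are
// assumptions for the sake of the sketch.
interface MonitorSnapshot {
    startedAt: Date;
    now: Date;
    finished: number;
    failed: number;
    total: number;
    cpuLoad: number; // fraction between 0 and 1
    memLoad: number; // fraction between 0 and 1
    currentConcurrency: number;
    desiredConcurrency: number;
}

// Formats a millisecond duration as "Xh Ym Zs".
function formatDuration(ms: number): string {
    const totalSeconds = Math.max(0, Math.round(ms / 1000));
    const h = Math.floor(totalSeconds / 3600);
    const m = Math.floor((totalSeconds % 3600) / 60);
    const s = totalSeconds % 60;
    return `${h}h ${m}m ${s}s`;
}

// Formats part/whole as a percentage, guarding against division by zero.
function percent(part: number, whole: number): string {
    return whole > 0 ? `${((part / whole) * 100).toFixed(1)}%` : 'n/a';
}

// Renders the proposed UI template from a snapshot.
function renderMonitor(s: MonitorSnapshot): string {
    const elapsedMs = s.now.getTime() - s.startedAt.getTime();
    const done = s.finished + s.failed;
    // Average speed so far, in handled requests per second.
    const speed = elapsedMs > 0 ? done / (elapsedMs / 1000) : 0;
    const remaining = Math.max(0, s.total - done);
    const eta = speed > 0 ? formatDuration((remaining / speed) * 1000) : 'unknown';
    return [
        `Start: ${s.startedAt.toISOString()}`,
        `Now: ${s.now.toISOString()} (running for ${formatDuration(elapsedMs)})`,
        `Progress: ${s.finished} / ${s.total} (${percent(s.finished, s.total)}), failed: ${s.failed} (${percent(s.failed, s.total)})`,
        `Remaining: ${eta} (${speed.toFixed(2)} req/s)`,
        `Sys. load: ${percent(s.cpuLoad, 1)} / ${percent(s.memLoad, 1)}`,
        `Concurrencies: ${s.currentConcurrency} / ${s.desiredConcurrency}`,
    ].join('\n');
}

// Minimal Monitor: polls a snapshot provider and writes the rendered panel.
class Monitor {
    private timer: ReturnType<typeof setInterval> | undefined;

    constructor(
        private readonly getSnapshot: () => MonitorSnapshot,
        private readonly write: (text: string) => void = console.log,
    ) {}

    start(intervalMs = 1000): void {
        if (this.timer !== undefined) return; // already running
        this.timer = setInterval(() => this.write(renderMonitor(this.getSnapshot())), intervalMs);
    }

    stop(): void {
        if (this.timer !== undefined) clearInterval(this.timer);
        this.timer = undefined;
    }
}
```

In this sketch the run function would create the Monitor, call start() before crawling begins, and call stop() when the crawl finishes. The time estimate is a simple average over the whole run; a moving average might work better if the speed changes a lot.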
Alternative solutions or implementations
No response
Other context
crawlee already uses its built-in log, so to make sure the monitor output does not overwrite the log, we should find out how to write the monitor and log output on separate lines.
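One possible approach, sketched below, is to keep the monitor panel pinned at the bottom of the terminal: before each log line is written, erase the panel with ANSI escape sequences, print the log line, then redraw the panel underneath. MonitorScreen and its methods are hypothetical names, and how crawlee's logger would be routed through log() is still an open question. The sketch also assumes an ANSI-capable terminal and panel lines that do not wrap.

```typescript
// Keeps a multi-line panel below the log output. All names are hypothetical.
class MonitorScreen {
    private drawnLines = 0; // how many panel lines are currently on screen
    private lastPanel: string | undefined;

    constructor(private readonly write: (chunk: string) => void) {}

    // Escape sequence that moves the cursor up to the first panel line
    // (CSI n A), returns to column 0, and clears to the end of the screen (CSI 0 J).
    private erase(): string {
        if (this.drawnLines === 0) return '';
        return `\x1b[${this.drawnLines}A\r\x1b[0J`;
    }

    // Replaces the panel currently on screen with a new one.
    draw(panel: string): void {
        this.lastPanel = panel;
        this.write(this.erase() + panel + '\n');
        this.drawnLines = panel.split('\n').length;
    }

    // Writes a log line above the panel, then redraws the panel below it.
    log(line: string): void {
        this.write(this.erase() + line + '\n');
        this.drawnLines = 0;
        if (this.lastPanel !== undefined) this.draw(this.lastPanel);
    }
}

// Usage example: in a real terminal one would pass a writer such as
// (chunk) => process.stdout.write(chunk). A collecting writer is used here
// so that the sketch has no environment dependencies.
const demoChunks: string[] = [];
const demoScreen = new MonitorScreen((chunk) => demoChunks.push(chunk));
demoScreen.draw('Progress: 1 / 10\nRemaining: unknown');
demoScreen.log('INFO  Crawling page 2');
```

When stdout is not a TTY (for example when piped to a file), the escape sequences would end up in the output, so the monitor should probably be disabled in that case.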