Language for Scraper #44
-
I agree with the points provided, though I'd like to offer an argument in favor of using TypeScript instead of Python for this.
Pros
Cons
That said, the main point I'd like to bring up is Selenium. Python, JS, and various other languages have full Selenium libraries available, and I think we should use it regardless of which language we choose. Combining Selenium with plain web requests will let us automate sophisticated traversal of web pages, so we can scrape data that would otherwise be very difficult to reach. For instance, Selenium makes it easy to parse a page's HTML, find a specific form element, and then interact with it by clicking, typing, or some other means. This would make our scraping much more powerful and ease the development process.
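To make the form-interaction idea concrete, here is a rough TypeScript sketch of the flow (open a page, type into a form field, click submit, read the results). Everything named here is an assumption for illustration: the `Browser` adapter, the `SearchPage` selectors, and `scrapeSearch` are hypothetical, not part of any existing scraper. In a real version the adapter methods would presumably wrap selenium-webdriver calls such as `driver.get()`, `findElement(By.css(...))` with `sendKeys()` or `click()`, and `findElements(By.css(...))` with `getText()`.

```typescript
// Hypothetical adapter: the four browser actions this sketch needs.
// A Selenium-backed implementation would wrap the WebDriver calls named above.
interface Browser {
  open(url: string): Promise<void>;
  type(selector: string, text: string): Promise<void>;
  click(selector: string): Promise<void>;
  texts(selector: string): Promise<string[]>;
}

// Hypothetical page description; real URLs and CSS selectors would come
// from inspecting the site we decide to scrape.
interface SearchPage {
  url: string;
  queryInput: string;
  submitButton: string;
  resultRows: string;
}

// Fill in the search form, submit it, and return the non-empty result rows.
async function scrapeSearch(
  browser: Browser,
  page: SearchPage,
  query: string
): Promise<string[]> {
  await browser.open(page.url);
  await browser.type(page.queryInput, query);
  await browser.click(page.submitButton);
  const rows = await browser.texts(page.resultRows);
  return rows.map((row) => row.trim()).filter((row) => row.length > 0);
}

// In-memory stand-in so the flow can be exercised without a real browser.
class FakeBrowser implements Browser {
  readonly log: string[] = [];
  private data: Record<string, string[]>;
  private typed = "";
  private submitted = false;

  constructor(data: Record<string, string[]>) {
    this.data = data;
  }

  async open(url: string): Promise<void> {
    this.log.push(`open ${url}`);
  }

  async type(selector: string, text: string): Promise<void> {
    this.typed = text;
    this.log.push(`type ${selector}`);
  }

  async click(selector: string): Promise<void> {
    this.submitted = true;
    this.log.push(`click ${selector}`);
  }

  // Results only exist after the form was submitted, as on a real page.
  async texts(_selector: string): Promise<string[]> {
    if (!this.submitted) return [];
    return this.data[this.typed] ?? [];
  }
}
```

Keeping the scraping logic behind a small adapter like this would also let us test it without launching a browser, and swap Selenium for plain web requests on pages that don't need interaction.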
-
Is everyone in agreement with using TypeScript instead of Python for writing the scraper, per @hochladen's response above?
-
Since there doesn't seem to be much opposition to this, I'll start researching and planning how to begin development. In the meantime, we should all talk here over the coming days to figure out what we want to scrape and how. I think the data collected by the existing scraper is a good baseline, but with the added power of Selenium we should be able to pull more.
-
I have ~250 lines of a prototype TS/Selenium coursebook scraper written up and should be finished with it by EOD tomorrow. It seems promising so far. I'll make a repo for it and share it here when it's ready.
-
What language should we use for web scraping?
Currently, we are using Python for scraping data, but since we are essentially rebuilding the scraper from scratch, this is the best time to change the language. I'd like to hear @hochladen's and @AdamMcAdamson's take on this, as well as anyone else's. If you are suggesting a different language, please give concrete advantages and disadvantages it has over Python.
My Take
I recognize that I'm biased: I personally like Python and have built web scraping projects with it in the past, so I'm comfortable with it. I'm going to try to give some pros and cons on why I think we should still use it. I'm sure I'm missing some things, so feel free to add to them, but these were the biggest ones that came to mind.
Some pros
Some cons