In: Computer Science
Question Describe the pros and cons and the applicable situations of the following approaches that are used to collect data from a website - API call - Beautifulsoup - Selenium
API calls, BeautifulSoup, and Selenium are three of the most widely used approaches developers use to collect data from websites.
API call: an API (Application Programming Interface) lets a website expose its data and functionality to outside programs in a structured way.
Pros:
Automation: with an API, computers can handle the data collection rather than people.
Flexible delivery of service: an API exposes specific application components, so a client can request exactly the data it needs.
Customization: through APIs, users or companies can tailor the requested data and experience to their own needs.
Cons:
Not always up to date: when a website changes, its API does not always reflect the change immediately, so the data returned can lag behind the live site.
Rate limits: one cannot harvest large amounts of data quickly through an API; providers restrict how many requests may be made within a given time window.
Legal uncertainty: data collected this way is sometimes not copyrightable, so ownership and reuse rights can be unclear.
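To make the rate-limit point concrete, here is a minimal Python sketch of calling a JSON API while backing off on HTTP 429 responses. The endpoint URL is hypothetical, and the `requests` library is assumed to be installed:

```python
import time
import requests  # third-party HTTP client (pip install requests)

API_URL = "https://api.example.com/items"  # hypothetical endpoint

def fetch_with_rate_limit(url, max_retries=3):
    """GET a JSON resource, backing off when the server answers
    429 Too Many Requests (a common API rate-limit response)."""
    for attempt in range(max_retries):
        resp = requests.get(url, timeout=10)
        if resp.status_code == 429:
            # Honour the Retry-After header if present, else back off exponentially.
            wait = float(resp.headers.get("Retry-After", 2 ** attempt))
            time.sleep(wait)
            continue
        resp.raise_for_status()  # surface other HTTP errors
        return resp.json()
    raise RuntimeError("rate limit not lifted after retries")

if __name__ == "__main__":
    print(fetch_with_rate_limit(API_URL))
```

Real providers document their own limits and headers, so the exact back-off policy should follow the API's terms of use.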
BeautifulSoup:
Pros:
Easy to learn and master. For example, it can extract all the links on a webpage with a single command.
The documentation is very good, which helps newcomers learn it quickly.
It has a strong community, so one can easily resolve issues that arise.
Cons:
The main disadvantage of BeautifulSoup is that it cannot do everything on its own: it is only an HTML/XML parser, so it requires other libraries (for example, an HTTP client to download pages) to perform a complete scraping job.
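The "extract all links with a single command" strength can be shown in a few lines. This sketch parses an illustrative HTML snippet (not a real page) with `bs4` and Python's built-in `html.parser`:

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

# A small, hypothetical HTML snippet used purely for illustration.
html = """
<html><body>
  <a href="https://example.com/a">First</a>
  <a href="https://example.com/b">Second</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")  # html.parser ships with Python
# One call collects every <a> tag; a list comprehension pulls the href values.
links = [a["href"] for a in soup.find_all("a")]
print(links)
```

Note the dependency point from the cons: in practice the `html` string would first have to be fetched by a separate library such as `requests`, since BeautifulSoup itself does no networking.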
Selenium:
Pros:
It works well with JavaScript-heavy pages, since it drives a real browser.
It can handle AJAX requests (content loaded after the initial page load).
Selenium is very beginner-friendly and provides browser-automation features.
It works with a wide range of programming languages (Java, Python, C#, and others).
It supports various operating systems as well as browsers.
Cons:
Technical support relies mainly on the community rather than a vendor, so it is not always reliable.
It supports web-based applications only.
There is no built-in test repository or reporting, so creating and maintaining scripts takes more time.
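A minimal sketch of using Selenium from Python to collect the links on a JavaScript-rendered page. It assumes Chrome and a matching chromedriver are installed and on the PATH, and uses example.com purely as a placeholder URL:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # launches a real Chrome browser via chromedriver
try:
    # The page loads in a full browser, so any JavaScript/AJAX runs normally.
    driver.get("https://example.com")
    links = driver.find_elements(By.TAG_NAME, "a")
    print([a.get_attribute("href") for a in links])
finally:
    driver.quit()  # always release the browser process
```

This illustrates both the main pro (the browser executes JavaScript for you) and the main con (you must manage a browser and driver, which is slower and heavier than an API call or plain HTML parsing).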