Web Scraping vs. Web Crawling
Web scraping and web crawling are closely related, and many people use the two terms interchangeably, but they describe different activities.
In simple terms, web crawling is the process of repeatedly finding and fetching hyperlinks, starting from a list of seed URLs.
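The loop described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular search engine works: the `crawl` function, the `LinkExtractor` class, and the in-memory `PAGES` dictionary (which stands in for real HTTP requests) are all hypothetical names chosen for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_urls, fetch, max_pages=100):
    """Breadth-first crawl. `fetch(url)` returns HTML text, or None on failure."""
    queue = deque(seed_urls)
    seen = set(seed_urls)
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Hypothetical in-memory "site" so the sketch runs without network access.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/a#top">A again</a>',
}

order = crawl(["http://example.test/"], PAGES.get)
print(order)
```

The `seen` set is what keeps the crawler from fetching the same page twice when pages link to each other. A real crawler would replace `PAGES.get` with a function that performs an HTTP request.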
Broadly speaking, web crawling is the process of locating information on the World Wide Web (WWW): indexing all the words in a document, adding them to a database, then following every hyperlink in that document and indexing the linked pages into the database as well.
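The "indexing all the words" step can be illustrated with a small inverted index, which maps each word to the pages that contain it. This is a rough sketch under simplifying assumptions: `build_index` and the `DOCS` sample are hypothetical, the page text is assumed to be already extracted from the HTML, and a plain dictionary stands in for the database.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


# Hypothetical page texts keyed by URL.
DOCS = {
    "http://example.test/a": "Web crawling follows links",
    "http://example.test/b": "Web scraping extracts data",
}

index = build_index(DOCS)
print(sorted(index["web"]))
print(sorted(index["links"]))
```

Looking up a word in `index` then returns every crawled page that mentions it, which is the basic lookup a search engine needs.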
A web crawler is a software program that visits websites and reads their pages and other related information in order to build entries for a search engine index. The major search engines, such as Google, Yahoo, and Bing, all run such a program, which is also known as a “web spider” or a “bot.”
In simple terms, web scraping is the process of automatically requesting a web document and collecting information from it. In practice, web scraping usually involves some degree of web crawling to move around the target websites.
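As a rough illustration of "requesting a document and collecting information from it", the sketch below pulls prices out of a page using only the Python standard library. The `PriceScraper` class, the `class="price"` markup, and the `SAMPLE` page are assumptions made for this example; a real site will have its own structure, and the parser here assumes the price elements contain no nested tags.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class PriceScraper(HTMLParser):
    """Collects the text inside every element whose class is exactly "price"."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if ("class", "price") in attrs:
            self._in_price = True

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current price element.
        self._in_price = False

    def handle_data(self, data):
        if self._in_price and data.strip():
            self.prices.append(data.strip())


def fetch(url):
    """Download a page as text (defined for completeness, not called below)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape_prices(html):
    parser = PriceScraper()
    parser.feed(html)
    return parser.prices


# Hypothetical product page used in place of a live request.
SAMPLE = """
<ul>
  <li>Widget <span class="price">$10.00</span></li>
  <li>Gadget <span class="price">$24.50</span></li>
</ul>
"""

prices = scrape_prices(SAMPLE)
print(prices)
```

Unlike the crawler, the scraper does not care where the page links to; it only looks for the specific pieces of data it was written to collect.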
Web crawling is generally what Google, Yahoo, Bing, etc. do: searching for any kind of information. Web scraping is targeted at specific websites for specific data, e.g. stock market data, business leads, or supplier product listings, and is often offered by web scraping service providers.
A web scraper may do things that a well-behaved web crawler typically would not, e.g.:
- Ignoring robots.txt
- Submitting forms with data
- Transforming the data into the required form and format
- Saving the extracted data into a database
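Some of the items in the list above can be made concrete with the standard library. The sketch below first shows the robots.txt check that a well-behaved crawler performs before fetching a URL, then transforms scraped price strings into integers and saves them to a database. The robots.txt rules, the `example-bot` user agent, the `to_cents` helper, and the `products` table are all hypothetical, and an in-memory SQLite database is used so the example leaves nothing on disk.

```python
import sqlite3
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real one is served at /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())

# A polite crawler asks before fetching; a scraper may skip this check.
allowed = robots.can_fetch("example-bot", "http://example.test/products")
blocked = robots.can_fetch("example-bot", "http://example.test/private/report")


def to_cents(price_text):
    """Transform a string such as "$1,024.50" into an integer number of cents."""
    cleaned = price_text.replace("$", "").replace(",", "")
    dollars, _, cents = cleaned.partition(".")
    return int(dollars) * 100 + int((cents + "00")[:2])


# Hypothetical scraped records: (product name, raw price text).
scraped = [("Widget", "$10.00"), ("Gadget", "$24.50")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price_cents INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(name, to_cents(price)) for name, price in scraped],
)
conn.commit()

rows = conn.execute(
    "SELECT name, price_cents FROM products ORDER BY name"
).fetchall()
print(allowed, blocked, rows)
```

Storing prices as integer cents rather than text is one example of transforming data into the required format: it lets the database sort and sum the values correctly.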