Learning to program is expensive, and many people need to earn money along the way. Some of us therefore look for a marketable skill we can pick up mid-journey without spending a long time learning it. Even if you don't need the money, learning new skills like this deepens your grasp of the programming language and makes you multi-skilled.
Here I explain one of these skills that can help you on your journey: web scraping.
First, what is web scraping?
Web scraping is the process of extracting data from a website. The data can be text, photos, videos, or anything else that is public on the internet. It is also a way to collect huge amounts of data from different sites and save it in local files or in a database in tabular form. Imagine being able to extract all the product data on Amazon, all the hotel prices on Airbnb, the likes and comments on Facebook, or anything else on the internet.
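To make "extract data and save it in tabular form" concrete, here is a minimal sketch. It assumes a made-up HTML snippet (`SAMPLE_HTML`, with hypothetical hotel names and prices) instead of a real website, and it uses only Python's standard library rather than the scraping libraries discussed later, so it runs without installing anything. The names `TableCellParser`, `scrape_rows`, and `rows_to_csv` are mine, chosen for illustration.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this from a website.
SAMPLE_HTML = """
<table>
  <tr><td>Hotel Nile</td><td>80</td></tr>
  <tr><td>Hotel Cairo</td><td>95</td></tr>
</table>
"""


class TableCellParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])      # start a new row
        elif tag == "td":
            self._in_cell = True
            self.rows[-1].append("")  # start a new, empty cell

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Keep only text that sits inside a <td> cell.
        if self._in_cell:
            self.rows[-1][-1] += data


def scrape_rows(html):
    """Extract the table rows from an HTML string."""
    parser = TableCellParser()
    parser.feed(html)
    return parser.rows


def rows_to_csv(rows):
    """Write the rows as CSV text, with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["hotel", "price"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = scrape_rows(SAMPLE_HTML)
print(rows_to_csv(rows))
```

In a real project you would write the CSV text to a file or insert the rows into a database instead of printing them.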
Why is scraping important?
The result of the scraping process is data, and data is the new oil for many industries. Look at the AI industry: if you don't have data, you can't develop AI. "No data, no AI algorithm."
What is the size of the web scraping market?
The web scraping software market was valued at $420.84 million in 2019 and is expected to reach $948.60 million by 2026, a compound annual growth rate of 13.1% over the 2020-2026 forecast period (The Reference).
From these figures we can see that the web scraping market is sizable and growing, which means there is a real chance of finding good work in it. Note that this market is not only about software products; it also includes people who work as freelancers.
In this tutorial we use Python.
First, you need to know what type of website you are dealing with, because the type determines which library to use in your project.
Every website is one of two types: static or dynamic.
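The two types are usually called static (the data is already in the HTML the server sends) and dynamic (the data is filled in by JavaScript after the page loads). One rough way to tell them apart is to check whether the text you see in the browser is present in the raw HTML. The sketch below assumes two made-up pages; `looks_static` is a hypothetical helper of mine, and `fetchPrice()` is an imaginary JavaScript function standing in for whatever the real page would call.

```python
# Hypothetical static page: the price is already in the HTML.
STATIC_PAGE = "<html><body><h1>Prices</h1><p id='price'>80 USD</p></body></html>"

# Hypothetical dynamic page: the price element is empty and a script
# fills it in later, so the raw HTML does not contain the value.
DYNAMIC_PAGE = """
<html><body>
  <h1>Prices</h1>
  <p id="price"></p>
  <script>document.getElementById("price").textContent = fetchPrice();</script>
</body></html>
"""


def looks_static(raw_html, expected_text):
    """Return True if the text we want is already in the downloaded HTML."""
    return expected_text in raw_html


print(looks_static(STATIC_PAGE, "80 USD"))   # True
print(looks_static(DYNAMIC_PAGE, "80 USD"))  # False
```

This is only a heuristic: if the text is missing from the raw HTML, the page probably needs a tool that runs JavaScript, such as a browser automation library.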
Now, which libraries can we use for scraping?
Python has many scraping libraries, but the most common are:
Beautiful Soup: This library is popular because it is easy to use, but on its own it only handles static websites.
Selenium: Selenium is a library for automation and testing, but we use it for scraping as well. It is a good choice for dynamic websites, but I prefer not to use it because it is slow.
Scrapy: This is a framework built specifically for scraping, and it includes all the tools you will need. It is the one I recommend learning because it is fast, purpose-built, and has an active community. The catch is that Scrapy on its own only handles static websites, so you need an additional library such as scrapy-splash or scrapy-playwright to scrape dynamic websites.
Playwright: Playwright is one of the tools I love and use in some projects, because it is fast. It is an alternative to Selenium: it does what Selenium does, but faster. It can scrape any website, dynamic or static, but it is still slower than Scrapy, so we use the two together to get the features of both.
Now you know the libraries used in web scraping. Is that all you need to scrape websites? No. You also need some knowledge of HTML, because you pick out the data you want by identifying the HTML elements that contain it.
So all you need to scrape websites is the basics of HTML and some knowledge of one of these libraries. I recommend Playwright with Beautiful Soup, or Scrapy with Playwright for big projects.
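To show why HTML basics matter, here is a minimal sketch of picking out data by tag name and `class` attribute. It assumes a made-up product page (`SAMPLE_HTML`) and uses the standard library's `html.parser` so it runs anywhere; the libraries above offer much more convenient selectors for the same job. `ClassTextParser` and `find_text` are hypothetical names, and the sketch does not handle a matching tag nested inside another matching tag.

```python
from html.parser import HTMLParser

# Hypothetical page content with two products.
SAMPLE_HTML = """
<div class="product">
  <h2 class="title">Laptop</h2>
  <span class="price">900</span>
</div>
<div class="product">
  <h2 class="title">Phone</h2>
  <span class="price">500</span>
</div>
"""


class ClassTextParser(HTMLParser):
    """Collects the text of every element with a given tag and class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.matches = []
        self._capturing = False

    def handle_starttag(self, tag, attrs):
        # The class attribute may hold several space-separated names.
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self._capturing = True
            self.matches.append("")

    def handle_endtag(self, tag):
        if tag == self.tag:
            self._capturing = False

    def handle_data(self, data):
        if self._capturing:
            self.matches[-1] += data.strip()


def find_text(html, tag, css_class):
    """Return the text of all <tag class="css_class"> elements."""
    parser = ClassTextParser(tag, css_class)
    parser.feed(html)
    return parser.matches


titles = find_text(SAMPLE_HTML, "h2", "title")
prices = find_text(SAMPLE_HTML, "span", "price")
print(list(zip(titles, prices)))
```

The important part is not the parser but the two arguments: once you can read the HTML and see that titles live in `h2` elements with class `title`, you know what to ask any scraping library for.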
In the next blog post, I will talk about one of my projects, Facebook scraper. It is a website for scraping Facebook pages: all you need to do is enter the page URL. You can watch a video to learn more about this project from here.
If you need anything, I will be glad to connect with you.