How to Scrape Data from a Website Using Python

Image source: Edureka
Web scraping, web harvesting, or web data extraction is the process of extracting data from websites. There are different ways to scrape a website, such as online services, APIs, or writing your own code. In this article, we'll see how to implement web scraping with Python, using one of the websites I have built.
I will skip the installation of Python in this tutorial.
Using your preferred text editor, create a Python file and name it whatever you want; I'll name mine scrapper.py. We'll import all the libraries we'll need to build our scraper. A library is a collection of precompiled routines that a program can use. The routines, sometimes called modules, are stored in object format.
# import libraries
import requests
import csv
from bs4 import BeautifulSoup
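The requests and bs4 libraries don't ship with Python, so if you don't have them yet, they can usually be installed with pip:

pip install requests beautifulsoup4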
Now let's get the URL of the website we want to scrape data from. In this case, we'll use https://windhoeknamibia.github.io
Using our URL, we'll now use the requests library to fetch data from the website.
# import libraries
import requests
import csv
from bs4 import BeautifulSoup

# url to scrape
url = "https://windhoeknamibia.github.io/"

# send an http request
resp = requests.get(url)

# prints the html of the website
print(resp.text)
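Before parsing anything, it's worth checking that the request actually succeeded. A minimal sanity check, reusing the resp object from above:

# stop early if the server returned anything other than 200 OK
if resp.status_code != 200:
    print(f"Request failed with status code {resp.status_code}")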
Next, create a soup object and use it to get the title of the website.
# create a soup object
soup = BeautifulSoup(resp.content, 'html.parser')

# get the title of the website
title = soup.find(id="title")
print(title)
print(title.string)
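Note that find(id="title") works here because this particular page happens to have an element with the id "title"; that's an assumption about how the site is built. On most websites you could grab the <title> tag directly instead:

# the <title> tag works on almost any page
print(soup.title.string)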
Now let's use our soup object to find the places!
# get all <h4> elements with the class "place-name"
places_obj = soup.find_all("h4", {"class": "place-name"})
print(places_obj)
Write all the place names to a csv file.
# create a list of place names, starting with a header row
list_of_places = [["Place Names"]]

# loop through the places object and append each name to your list of places
for place in places_obj:
    list_of_places.append([place.string])

print(list_of_places)

# create a new csv file
with open('places.csv', 'w', newline='') as csv_file:
    writerObj = csv.writer(csv_file)  # create a csv writer object
    writerObj.writerows(list_of_places)  # writerows writes the data into your csv file
# the csv file will be displayed in your workspace
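To double-check what was written, you can read the file straight back with csv.reader (just a quick sanity check, reusing the csv import from above):

# read the csv file back and print each row
with open('places.csv', newline='') as csv_file:
    for row in csv.reader(csv_file):
        print(row)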
Let's use the soup object to help us get all the image src links.
# all images are inside a div with the class "whk-place"
img_obj = soup.find_all('div', {'class': 'whk-place'})
Now let's print out all the image links.
imgLinks = []
for link in img_obj:
    imgLinks.append(link.find('img').get('src'))  # get the src attribute of each image
print(imgLinks)
# prints a list of image links
I hope this article helped you understand web scraping and how to use Python libraries to scrape websites. You can keep practising with more examples on different websites.
Be careful not to scrape data from websites that do not give you permission to do so.
To know whether a website allows web scraping, you can look at its “robots.txt” file. You can find this file by appending “/robots.txt” to the root URL of the site you want to scrape.
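If you'd rather check this from Python, the standard library's urllib.robotparser can read the file for you; here's a minimal sketch using the same site as above:

from urllib.robotparser import RobotFileParser

# download and parse the site's robots.txt
parser = RobotFileParser("https://windhoeknamibia.github.io/robots.txt")
parser.read()

# check whether a generic crawler ("*") may fetch the home page
print(parser.can_fetch("*", "https://windhoeknamibia.github.io/"))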
To do!
Try to write all the image src links into your csv file.
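If you get stuck, here is one possible approach (just a sketch, reusing the imgLinks list from above; the filename images.csv is my own choice):

# one way to do the exercise: write each image link as its own row
rows = [["Image Links"]] + [[link] for link in imgLinks]
with open('images.csv', 'w', newline='') as csv_file:  # images.csv is an example name
    csv.writer(csv_file).writerows(rows)

Happy coding!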