Python Django Web Scraping

March 6, 2021, 1:11 p.m.

In this tutorial we are going to look at how we can extract data from other websites using Beautiful Soup. we are going to get most popular movies from this website below

https://www.imdb.com/chart/moviemeter/

If you go to that website you will see a list of most popular movies

imdb

We will write a python program that will extract this information.

First of all install requests and BeautifulSoup using the following commands


pip install requests
pip install beautifulsoup4

If you inspect this page you will realize that all these movies are contained in a table with a class of chart full-width

imdb

Now lets inspect the image besides the movie title by right clicking on it and clicking inspect , you will see that the title of the movie is found in the alt attribute of the image. This is what we are going to get with our python program.

inspect

Our complete python program will look like this


import requests

from bs4 import BeautifulSoup


url = "https://www.imdb.com/chart/moviemeter/"

response = requests.get(url)

soup = BeautifulSoup(response.content, "html.parser")

table = soup.find('table',  {'class': 'chart full-width'})

rows = table.find_all('tr')

movies = []

for row in rows:
    image = row.find('img')
    if image:
        movies.append(image['alt'])

The above program does the following , it goes to this page "https://www.imdb.com/chart/moviemeter/" and it looks for a table with a class of chart full-width and then we find all the images in the rows of this table , after that we get the alt attribute of these images which contain the movie title that we need.

After we extract the movie titles , we append them to the movies list and now we can use django to display this list like this


def home(request):
    url = "https://www.imdb.com/chart/moviemeter/"
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    table = soup.find('table',  {'class': 'chart full-width'})
    rows = table.find_all('tr')
    movies = []
    for row in rows:
        image = row.find('img')
        if image:
            movies.append(image['alt'])
    return render(request, "movies/home.html", {'movies': movies})

On the html side our list will be displayed like this



{% extends 'base.html' %}

{% block content %}
<div class="container">

  <h1 class='text-center'> Movies List </h1>

  <div class="row">
    <div class="col-sm-8 offset-sm-2">

      <ul class="list-group">
        {% for movie in movies %}
        <li class="list-group-item">
          <p>{{ movie }}</p> 
        </li>
        {% endfor %}
      </ul>

    </div>
  </div>
</div>
{% endblock %}

Our Django app will now display the movies list as shown in image below, remember this list was extracted from another website and now it is being shown in our app.

movies home page

GET SOURCE CODE

https://github.com/felix13/djangowebscraper

Conclusion

We just created a web scraper to get a list of popular movies from the imdb site, Now feel free to modify this scraper to scrape other things on the internet, this could be prices of products in other websites or information from wikipedia.

Keep Learning