Extracting data from a website using BeautifulSoup

I am trying to extract the model names and any other available details about the models. When I fetch the page text, I can't find anything distinctive that I can use to pull out the data.

Does anyone know how to fetch data from these URLs?

https://www.audi.de/de/brand/de.html

or https://www.opel.de/auswahlhilfe/modelle.html

I would like to get the list of models and any available property, such as price.

So far I am trying to use this:

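    from selenium import webdriver
    from bs4 import BeautifulSoup
    import pandas as pd
    from urllib.request import urlopen

    url = "https://www.opel.de/auswahlhilfe/modelle.html"
    html = urlopen(url)
    # parse the response first; soup was never defined in the original snippet
    soup = BeautifulSoup(html, "html.parser")
    text = soup.get_text()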

But I am not getting anything useful. Any experts here?

If you open the Network tab in your browser's developer tools, you will find the link below, which returns the data in JSON format. You don't need Selenium for this.

https://www.opel.de/apps/atomic/getVehicleTeasers.path=L2NvbnRlbnQvb3BlbC93b3JsZHdpZGUvZ2VybWFueS9kZS9pbmRleC9iYXNlYmFsbC1jYXJkcy9iYmMtY29sbGVjdGlvbnMvdmVoaWNsZXMtb25seS1jb2xsZWN0aW9uPuGlIfE.feefoEnabled=false.expandingMenuEnabled=false.json

Try the code below.

from bs4 import BeautifulSoup
import requests
import pandas as pd

url = 'https://www.opel.de/apps/atomic/getVehicleTeasers.path=L2NvbnRlbnQvb3BlbC93b3JsZHdpZGUvZ2VybWFueS9kZS9pbmRleC9iYXNlYmFsbC1jYXJkcy9iYmMtY29sbGVjdGlvbnMvdmVoaWNsZXMtb25seS1jb2xsZWN0aW9uPuGlIfE.feefoEnabled=false.expandingMenuEnabled=false.json'
# the endpoint returns JSON; 'bbcTeaser' holds a list of HTML fragments
rs = requests.get(url).json()
html = ''.join(rs['bbcTeaser'])
soup = BeautifulSoup(html, 'html.parser')
car_name = []
car_price = []
# pair each model name with its price by position
for name, price in zip(soup.select('.q-carline'), soup.select('.q-value')):
    car_name.append(name.text)
    car_price.append(price.text)

df = pd.DataFrame({"car_name": car_name, "car_price": car_price})
print(df)

Output:

                           car_name                       car_price
0                              ADAM  € 14.120,00 nur Lagerfahrzeuge
1                        ADAM ROCKS  € 16.475,00 nur Lagerfahrzeuge
2                      ADAM ROCKS S                     € 20.430,00
3                            ADAM S                    € 19.330,00 
4                          Ampera-e                     € 42.990,00
5                     Astra 5-Türer                     € 19.990,00
6               Astra Sports Tourer                     € 20.990,00
7                           Cascada  € 33.995,00 nur Lagerfahrzeuge
8                        Combo Life                     € 21.645,00
9                       Neuer Corsa                     € 13.990,00
10                          Corsa-e                     € 29.900,00
11                    Corsa 3-Türer                     € 13.255,00
12                    Corsa 5-Türer                     € 14.055,00
13                      Crossland X                     € 18.750,00
14                      Grandland X                     € 24.700,00
15             Insignia Grand Sport                     € 28.505,00
16                     Insignia GSi                     € 46.695,00
17           Insignia Sports Tourer                     € 29.505,00
18          Insignia Country Tourer                     € 41.385,00
19                             KARL                     € 13.350,00
20                       KARL ROCKS                     € 12.965,00
21                          Mokka X                     € 20.495,00
22                           Zafira                     € 28.495,00
23                      Zafira Life                     € 34.780,00
24                      Combo Cargo                     € 20.230,00
25                     Movano Cargo                     € 27.925,00
26              Movano Doppelkabine                     € 38.288,25
27  Movano Fahrgestell Normalkabine                     € 34.777,75
28  Movano Fahrgestell Doppelkabine                     € 35.967,75
29      Movano Plattformfahrgestell                     € 34.777,75
30              Movano Kofferaufbau                     € 46.320,75
31     Movano Pritsche Normalkabine                     € 37.574,25
32     Movano Pritsche Doppelkabine                     € 38.764,25
33              Movano Kofferaufbau                     € 46.320,75
34     Movano Pritsche Normalkabine                     € 37.574,25
35     Movano Pritsche Doppelkabine                     € 38.764,25
36       Movano Kipper Normalkabine                     € 39.894,75
37       Movano Kipper Doppelkabine                     € 44.178,75
38                     Vivaro Cargo                     € 29.750,00
39              Vivaro Doppelkabine                     € 33.082,00
40                     Vivaro Kombi                     € 31.237,50
41              Grandland X Hybrid4                     € 51.165,00
42                     Movano Kombi                     € 30.905,00
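
The car_price column above still holds text such as "€ 14.120,00 nur Lagerfahrzeuge". If you need numbers, here is a minimal sketch of one way to convert them. The helper name parse_price is my own, and it assumes every price uses German notation ('.' for thousands, ',' for decimals) as in the output above.

```python
import re
from decimal import Decimal

def parse_price(text):
    """Return the first German-formatted amount in text as a Decimal, or None."""
    # digits, optional '.'-separated thousands groups, optional ',' decimals
    match = re.search(r"\d+(?:\.\d{3})*(?:,\d+)?", text)
    if match is None:
        return None
    # drop the thousands separators, then turn the decimal comma into a point
    return Decimal(match.group(0).replace(".", "").replace(",", "."))

print(parse_price("€ 14.120,00 nur Lagerfahrzeuge"))
print(parse_price("€ 38.288,25"))
```

With the DataFrame from the answer you could then apply it column-wise, for example df["car_price"].map(parse_price).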

Use requests rather than urllib.request...

import requests
from bs4 import BeautifulSoup

url = "https://www.opel.de/auswahlhilfe/modelle.html"
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
text = soup.get_text()
print(text)
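
If the printed text does not contain the model names, it may be that the page fills them in with JavaScript after loading, so they never appear in the HTML that requests receives. A rough way to check is to count how often a known class shows up in the downloaded HTML. The sketch below uses two made-up HTML strings instead of the real page (both are assumptions for illustration), and the class name q-carline is taken from the other answers.

```python
from bs4 import BeautifulSoup

def count_matches(html, selector):
    """Count the elements in html that match a CSS selector."""
    return len(BeautifulSoup(html, "html.parser").select(selector))

# hypothetical HTML as a plain HTTP client might receive it (no models yet)
static_html = '<div id="app"></div><script src="bundle.js"></script>'
# hypothetical HTML after the browser has run the page's JavaScript
rendered_html = '<div id="app"><span class="q-carline">ADAM</span></div>'

print(count_matches(static_html, ".q-carline"))
print(count_matches(rendered_html, ".q-carline"))
```

A count of zero on r.text would suggest looking for the underlying JSON request or using a browser driver instead.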

I'm not sure if you are committed to using BeautifulSoup, but you can solve this problem with some plain old Selenium.

Using the URL https://www.opel.de/auswahlhilfe/modelle.html you provided, these samples may get you started.

from selenium import webdriver
from time import sleep
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

# start the driver
driver = webdriver.Chrome()

# navigate to the URL
driver.get("https://www.opel.de/auswahlhilfe/modelle.html")

# invoke WebDriverWait to wait for page to load
# get the model names
model_names = WebDriverWait(driver, 10).until(
        EC.presence_of_all_elements_located((By.XPATH, "//a/div/div/div/span[contains(@class, 'q-carline')]")))

for model in model_names:
    print(model.text)

To get a list of prices:

# find_elements_by_xpath was removed in Selenium 4; use find_elements with By.XPATH
prices = driver.find_elements(By.XPATH, "//a/div/div/div/span[contains(@class, 'q-price')]/span[contains(@class, 'q-value')]")

for price in prices:
    print(price.text)

You can get some of the data back from the JSON response. You would just have to fish out what you need from there:

import requests
from bs4 import BeautifulSoup

url = 'https://www.opel.de/apps/atomic/getVehicleTeasers.path=L2NvbnRlbnQvb3BlbC93b3JsZHdpZGUvZ2VybWFueS9kZS9pbmRleC9iYXNlYmFsbC1jYXJkcy9iYmMtY29sbGVjdGlvbnMvdmVoaWNsZXMtb25seS1jb2xsZWN0aW9uPuGlIfE.feefoEnabled=false.expandingMenuEnabled=false.json'

jsonData = requests.get(url).json()['bbcTeaser']

for each in jsonData:
    # each entry is an HTML fragment; name the parser explicitly to avoid a warning
    soup = BeautifulSoup(each, 'html.parser')

    carline = soup.find('span', {'class':'q-carline'}).text
    price = soup.find('span', {'class':'q-value'}).text

    print(carline, price)

Output:

ADAM € 14.120,00 nur Lagerfahrzeuge
ADAM ROCKS € 16.475,00 nur Lagerfahrzeuge
ADAM ROCKS S € 20.430,00
ADAM S € 19.330,00 
Ampera-e € 42.990,00
Astra 5-Türer € 19.990,00
Astra Sports Tourer € 20.990,00
Cascada € 33.995,00 nur Lagerfahrzeuge
Combo Life € 21.645,00
Neuer Corsa € 13.990,00
Corsa-e € 29.900,00
Corsa 3-Türer € 13.255,00
Corsa 5-Türer € 14.055,00
Crossland X € 18.750,00
Grandland X € 24.700,00
Insignia Grand Sport € 28.505,00
Insignia GSi € 46.695,00
Insignia Sports Tourer € 29.505,00
Insignia Country Tourer € 41.385,00
KARL € 13.350,00
KARL ROCKS € 12.965,00
Mokka X € 20.495,00
Zafira € 28.495,00
Zafira Life € 34.780,00
Combo Cargo € 20.230,00
Movano Cargo € 27.925,00
Movano Doppelkabine € 38.288,25
Movano Fahrgestell Normalkabine € 34.777,75
Movano Fahrgestell Doppelkabine € 35.967,75
Movano Plattformfahrgestell € 34.777,75
Movano Kofferaufbau € 46.320,75
Movano Pritsche Normalkabine € 37.574,25
Movano Pritsche Doppelkabine € 38.764,25
Movano Kofferaufbau € 46.320,75
Movano Pritsche Normalkabine € 37.574,25
Movano Pritsche Doppelkabine € 38.764,25
Movano Kipper Normalkabine € 39.894,75
Movano Kipper Doppelkabine € 44.178,75
Vivaro Cargo € 29.750,00
Vivaro Doppelkabine € 33.082,00
Vivaro Kombi € 31.237,50
Grandland X Hybrid4 € 51.165,00
Movano Kombi € 30.905,00
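
One caveat with the loop above: find() returns None when a fragment has no matching span, and calling .text on None raises AttributeError. Here is a small sketch of a more defensive version. The function name parse_teaser and the two sample fragments are made up for illustration; only the class names q-carline and q-value come from the real response.

```python
from bs4 import BeautifulSoup

def parse_teaser(fragment):
    """Return (carline, price) for one HTML fragment; a missing part becomes None."""
    soup = BeautifulSoup(fragment, "html.parser")
    carline = soup.select_one(".q-carline")
    price = soup.select_one(".q-value")
    return (
        carline.get_text(strip=True) if carline else None,
        price.get_text(strip=True) if price else None,
    )

# hypothetical fragments: the second one has no price span
samples = [
    '<a><span class="q-carline">ADAM</span>'
    '<span class="q-price"><span class="q-value">€ 14.120,00</span></span></a>',
    '<a><span class="q-carline">Concept</span></a>',
]

rows = [parse_teaser(s) for s in samples]
print(rows)
```

Parsing each fragment on its own also keeps every name next to its own price, even when some fragments have no price.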

Comments
  • Amazing! This is what I want :) Thanks @KunduK
  • @KunduK I tried to find this JSON URL in the Network tab, but I wasn't able to find that specific URL. Can you explain a bit more...
  • Once you are on the Network tab, click the XHR tab and press Ctrl+R; you will see the link that returns the data as JSON when you click the Response tab.
  • Thanks @Moshe, but how can I generate a list? I am fetching the text and it's clean text, but I'm not sure how to create a list of models from it...
  • @MuhammadSalmanShahid see KunduK's answer... he did the work!
  • Thanks for your answer. Are you using any other library, since you used 'driver'?
  • name 'driver' is not defined. I'm just getting an error.
  • Driver is the Selenium Webdriver. You have to actually start a driver first. Updated my code sample.
  • Thanks. But I'm still getting an exception: WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home
  • You need to do exactly what the error message is telling you to do -- add chromedriver.exe to your Path environment variable. There are plenty of guides and tutorials to show you exactly how to do this. Here's one: stackoverflow.com/questions/29858752/…