How to scrape this piece of HTML with BeautifulSoup

I'm trying to scrape the following kind of HTML with BeautifulSoup.

<div …. > <div…..>
<div class="class1">Jill</div> <div class="class2">50</div>
<div class="class1">Jane</div>
<div class="class1">Joe</div>  <div class="class2">12</div>
</div></div>

Not every person has a second item to scrape, so something like soup.find_all("div", attrs={"class": "class2"}) does not work correctly: it returns both 50 and 12, but nothing connects the 12 to the right person.
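To make the mismatch concrete, here is a minimal sketch. It assumes the elided outer divs are plain wrappers; the "wrapper" class name and the choice of html.parser are stand-ins, not part of the real page.

```python
from bs4 import BeautifulSoup

# Assumed stand-in for the markup above; the outer "wrapper" div is hypothetical.
html = """
<div class="wrapper">
<div class="class1">Jill</div> <div class="class2">50</div>
<div class="class1">Jane</div>
<div class="class1">Joe</div>  <div class="class2">12</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Collect names and values independently, as the find_all approach does.
names = [tag.get_text() for tag in soup.find_all("div", attrs={"class": "class1"})]
values = [tag.get_text() for tag in soup.find_all("div", attrs={"class": "class2"})]

# Pairing the two lists by position attaches 12 to Jane instead of Joe.
naive_pairs = list(zip(names, values))
print(naive_pairs)
```

The two lists have different lengths, so any positional pairing is wrong as soon as one person lacks a value.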

Wanted result (in variables):

Jill 50 Jane Joe 12

You could get all name ('class1') elements and check whether each one has a corresponding age ('class2') element.

from bs4 import BeautifulSoup

html = """
<div class='parent'>
    <div class="class1">Jill</div> <div class="class2">50</div>
    <div class="class1">Jane</div>
    <div class="class1">Joe</div> <div class="class2">12</div>
</div>
"""

# Name the parser explicitly to avoid bs4's "no parser specified" warning
soup = BeautifulSoup(html, 'html.parser')

name_tags = soup.find_all('div', {'class': 'class1'})

name_age_pairs = []

# Iterate through all 'class1' elements and see if the next sibling div is 'class2'
for name_tag in name_tags:
    name_next_div = name_tag.find_next_sibling('div')
    age = None
    # The last name may have no following div at all, so guard against None
    if name_next_div is not None and 'class2' in name_next_div.get('class', []):
        age = int(name_next_div.string)
    name_age_pairs.append((name_tag.string, age))

print(name_age_pairs)

name_age_pairs will contain:

[('Jill', 50), ('Jane', None), ('Joe', 12)]

Here None means that no age is associated with that person (Jane, the second entry).

Try this:

pairs = []
for div in soup.find_all('div', {'class': 'class1'}):
    name = div.text
    item = ''
    tmp = div.find_next('div')
    # tmp is None when no div follows the last name
    if tmp is not None and 'class2' in tmp.get('class', []):
        item = tmp.text
    pairs.append([name, item])

This is what I finally used. Works for multiple values and spaces inside class names.

# default values for vars
Item1 = Item2 = Item3 = ""

for item in soup.find_all('div'):

    # class is a multi-valued attribute, so bs4 returns it as a list of names
    classes = item.get('class', [])

    # skip divs that do not hold a single text node (e.g. wrapper divs)
    if item.string is None:
        continue

    if 'class1' in classes:

        if Item1 != "": # if you have None as default change this
            print(Item1, Item2, Item3) # or make list, dict, json, csv, sql......

        Item2 = Item3 = "" # default values for vars
        Item1 = item.string

    elif 'class2' in classes:
        Item2 = item.string

    elif 'class3' in classes:
        Item3 = item.string

    # and so on....

# don't forget to process the last one...
print(Item1, Item2, Item3) # or make list, dict, json, csv, sql......
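If a list of dicts is more convenient than printing, the same idea can be sketched roughly as below. The function name collect_records, its parameters, and the sample markup (including the "parent" wrapper and the extra class on Joe) are made up for illustration, and it assumes each 'class1' div starts a new record.

```python
from bs4 import BeautifulSoup

# Sample markup based on the question; the "parent" wrapper and the
# "extra" class are assumptions added for illustration.
html = """
<div class="parent">
    <div class="class1">Jill</div> <div class="class2">50</div>
    <div class="class1">Jane</div>
    <div class="class1 extra">Joe</div> <div class="class2">12</div>
</div>
"""

def collect_records(soup, first_class="class1", other_classes=("class2", "class3")):
    """Group divs into records; each first_class div starts a new record."""
    records = []
    for div in soup.find_all("div"):
        # bs4 returns class as a list, so multiple class names are handled
        classes = div.get("class", [])
        if first_class in classes:
            # Start a new record with empty defaults for the optional fields.
            record = {first_class: div.get_text(strip=True)}
            record.update({name: "" for name in other_classes})
            records.append(record)
        elif records:
            # Attach an optional field to the most recent record.
            for name in other_classes:
                if name in classes:
                    records[-1][name] = div.get_text(strip=True)
    return records

records = collect_records(BeautifulSoup(html, "html.parser"))
for record in records:
    print(record)
```

Because each record is appended as soon as its 'class1' div is seen, there is no separate "process the last one" step to forget.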

Comments
  • Yes, find_all() will return all elements with the class name class2. If you use find(), it will return only the first match. However, it's not clear what your expected output is. I guess you need the class2 value that belongs to each class1 name?
  • I've updated the question with the expected result.
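The difference between find() and find_all() mentioned in the first comment can be checked with a small sketch (the two-item markup here is a made-up example, not the real page):

```python
from bs4 import BeautifulSoup

# Made-up two-item example, just to compare the two calls.
html = '<div><div class="class2">50</div><div class="class2">12</div></div>'
soup = BeautifulSoup(html, "html.parser")

# find() returns only the first matching tag (or None if nothing matches).
first_match = soup.find("div", class_="class2")

# find_all() returns every matching tag in document order.
all_matches = soup.find_all("div", class_="class2")

print(first_match.get_text())
print([tag.get_text() for tag in all_matches])
```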