Beautiful Soup find() isn't finding all results for Class

I have code trying to pull all the HTML within the tracklist container, which should have 88 songs. The information is definitely there (I printed the soup to check), so I'm not sure why everything after the first 30 react-contextmenu-wrapper elements is lost.

from bs4 import BeautifulSoup
from urllib.request import urlopen
import re


spotify = 'https://open.spotify.com/playlist/3vSFv2hZICtgyBYYK6zqrP'
html = urlopen(spotify)
soup = BeautifulSoup(html, "html5lib")

main = soup.find(class_ = 'tracklist-container')
print(main)

Thank you for the help. Current output from printing is as follows:

                  1.
              </div></div><div class="tracklist-col name"><div class="top-align track-name-wrapper"><span class="track-name" dir="auto">Move On - Teen Daze Remix</span><span class="artists-albums"><a href="/artist/3HrczLBDJXJu6dJWEMbKHa" tabindex="-1"><span dir="auto">Garden City Movement</span></a>     • <a href="/album/4p8FxnuYzykCcN7xbjA9jq" tabindex="-1"><span dir="auto">Entertainment</span></a></span></div></div><div class="tracklist-col explicit"></div><div class="tracklist-col duration"><div class="top-align"><span class="total-duration">5:11</span><span class="preview-duration">0:30</span></div></div><div class="progress-bar-outer"><div class="progress-bar"></div></div></li><li class="tracklist-row js-track-row tracklist-row--track track-has-preview" data-position="2" role="button" tabindex="0"><div class="tracklist-col position-outer"><div class="play-pause top-align"><svg aria-label="Play" class="svg-play" role="button"><use xlink:href="#icon-play" xmlns:xlink="http://www.w3.org/1999/xlink"></use></svg><svg aria-label="Pause" class="svg-pause" role="button"><use xlink:href="#icon-pause" xmlns:xlink="http://www.w3.org/1999/xlink"></use></svg></div><div class="tracklist-col__track-number position top-align">
                  2.
              </div></div><div class="tracklist-col name"><div class="top-align track-name-wrapper"><span class="track-name" dir="auto">Flicker</span><span class="artists-albums"><a href="/artist/4qpWUfUAeI34HzvCORn1ze" tabindex="-1"><span dir="auto">Forhill</span></a>     • <a href="/album/0gfz1Tbst40swwL357cRqG" tabindex="-1"><span dir="auto">Flicker</span></a></span></div></div><div class="tracklist-col explicit"></div><div class="tracklist-col duration"><div class="top-align"><span class="total-duration">3:45</span><span class="preview-duration">0:30</span></div></div><div class="progress-bar-outer"><div class="progress-bar"></div></div></li><li class="tracklist-row js-track-row tracklist-row--track track-has-preview" data-position="3" role="button" tabindex="0"><div class="tracklist-col position-outer"><div class="play-pause top-align"><svg aria-label="Play" class="svg-play" role="button"><use xlink:href="#icon-play" xmlns:xlink="http://www.w3.org/1999/xlink"></use></svg><svg aria-label="Pause" class="svg-pause" role="button"><use xlink:href="#icon-pause" xmlns:xlink="http://www.w3.org/1999/xlink"></use></svg></div><div class="tracklist-col__track-number position top-align">

...

                  30.
              </div></div><div class="tracklist-col name"><div class="top-align track-name-wrapper"><span class="track-name" dir="auto">Trapdoor</span><span class="artists-albums"><a href="/artist/3nqTFzjmi1LLM6pn0TRMv8" tabindex="-1"><span dir="auto">Eagle Eyed Tiger</span></a>     • <a href="/album/48Q8Jgk1x4wiHWecV4nlz6" tabindex="-1"><span dir="auto">Future or Past</span></a></span></div></div><div class="tracklist-col explicit"></div><div class="tracklist-col duration"><div class="top-align"><span class="total-duration">4:14</span><span class="preview-duration">0:30</span></div></div><div class="progress-bar-outer"><div class="progress-bar"></div></div></li></ol><button class="link js-action-button" data-track-type="view-all-button">View all on Spotify</button></div>

The last entry should be the 88th. It feels like my search results got truncated.


It is all there in the response, just within a script tag.

You can see the start of the relevant JavaScript object, Spotify.Entity, in the page source.

I would regex out the required string and parse it with the json library.


Py:

import requests, re, json

# Fetch the page and pull the Spotify.Entity object out of the inline script tag.
r = requests.get('https://open.spotify.com/playlist/3vSFv2hZICtgyBYYK6zqrP')
p = re.compile(r'Spotify\.Entity = (.*?);')
data = json.loads(p.findall(r.text)[0])
print(len(data['tracks']['items']))
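
As a rough follow-up sketch of what you could do with the parsed object: the 'track', 'name' and 'artists' keys below are my assumptions about the JSON layout, not something confirmed above, so inspect data yourself before relying on them.

import requests, re, json

# Same fetch as above; the key names used below are guesses at the JSON structure.
r = requests.get('https://open.spotify.com/playlist/3vSFv2hZICtgyBYYK6zqrP')
data = json.loads(re.search(r'Spotify\.Entity = (.*?);', r.text).group(1))

for item in data['tracks']['items']:
    track = item.get('track', {})
    artists = ', '.join(a.get('name', '') for a in track.get('artists', []))
    print(track.get('name', 'unknown'), '-', artists)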


Since it seemed you were on the right track, I did not try to solve the full problem and instead tried to give you a hint that could be helpful: do dynamic web scraping.

"Why Selenium? Isn’t Beautiful Soup enough?

Web scraping with Python often requires no more than the use of the Beautiful Soup to reach the goal. Beautiful Soup is a very powerful library that makes web scraping by traversing the DOM (document object model) easier to implement. But it does only static scraping. Static scraping ignores JavaScript. It fetches web pages from the server without the help of a browser. You get exactly what you see in "view page source", and then you slice and dice it. If the data you are looking for is available in "view page source" only, you don’t need to go any further. But if you need data that are present in components which get rendered on clicking JavaScript links, dynamic scraping comes to the rescue. The combination of Beautiful Soup and Selenium will do the job of dynamic scraping. Selenium automates web browser interaction from python. Hence the data rendered by JavaScript links can be made available by automating the button clicks with Selenium and then can be extracted by Beautiful Soup." https://medium.com/ymedialabs-innovation/web-scraping-using-beautiful-soup-and-selenium-for-dynamic-page-2f8ad15efe25

Here is what I see at the end of the 30 songs in the DOM, which is a button:

    </li>
   </ol>
   <button class="link js-action-button" data-track-type="view-all-button">
    View all on Spotify
   </button>
  </div>
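
A minimal sketch of that dynamic approach, assuming Selenium and a Chrome driver are installed; the class names come from the question's code and output, and I have not verified whether all 88 tracks actually render without logging in.

from bs4 import BeautifulSoup
from selenium import webdriver

# Load the playlist in a real browser so any JavaScript-rendered rows end up in the DOM.
driver = webdriver.Chrome()
driver.get('https://open.spotify.com/playlist/3vSFv2hZICtgyBYYK6zqrP')

# Hand the fully rendered HTML to Beautiful Soup for the usual static parsing.
soup = BeautifulSoup(driver.page_source, 'html5lib')
rows = soup.find_all(class_='tracklist-row')
print(len(rows))

driver.quit()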


It's because you're doing

main = soup.find(class_ = 'tracklist-container')

the class "tracklist-container" only holds these 30 items. I'm not sure what you're trying to accomplish, but if you want what comes afterwards, try parsing the class that follows it.

In other words, the class contains 30 songs. I visited the site and found only 30 songs myself, so the full list might only be shown to logged-in users.
