selecting second child in beautiful soup with soup.select?

I have:

<h2 id='names'>Names</h2>
<p>John</p>
<p>Peter</p>

Now, what's the easiest way to get Peter here if I already have the h2 tag? I've tried:

soup.select("#names > p:nth-child(1)")

but with nth-child I get a NotImplementedError:

NotImplementedError: Only the following pseudo-classes are implemented: nth-of-type.

So I'm not sure what's going on here. The second option was to get all the 'p' tag children and hard-select [1], but then there's a danger of an index out of range, which would require surrounding every attempt to get Peter with try/except, which is a bit silly.

Any way to select nth-child with soup.select() function?

EDIT: replacing nth-child with nth-of-type seemed to do the trick, so the correct line is:

soup.select("#names > p:nth-of-type(1)")

I'm not sure why it doesn't accept nth-child, but it seems that both nth-child and nth-of-type return the same results.

Adding your edit as an answer so that it can be more easily found by others:

Use nth-of-type instead of nth-child:

soup.select("#names > p:nth-of-type(1)")
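nth-child and nth-of-type only give the same results when every sibling has the same tag name, as in the question. A minimal sketch, assuming Beautiful Soup 4.7+ (which uses SoupSieve and accepts both pseudo-classes) and a hypothetical <div> wrapper around the markup, shows where they diverge:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the h2 and the <p> tags share a parent <div>.
html = """
<div id="names">
    <h2>Names</h2>
    <p>John</p>
    <p>Peter</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# :nth-child counts every element child of #names, so the h2 is child 1
# and John is child 2.
print(soup.select("#names > p:nth-child(2)"))    # [<p>John</p>]

# :nth-of-type counts only the <p> siblings, so Peter is the second <p>.
print(soup.select("#names > p:nth-of-type(2)"))  # [<p>Peter</p>]
```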

'nth-child' is simply not implemented in beautifulsoup4 (at the time of writing); there is no code in the Beautiful Soup codebase to handle it. The authors explicitly added the NotImplementedError to explain this, as the source code shows.

Given the HTML you quote in your question, you are not looking for a child of h2#names.

What you are really looking for is the second adjacent sibling. I'm not a CSS selector guru, but I found that this worked:

soup.select("#names + p + p")
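Since you mention already having the h2 tag, here is a non-CSS sketch that starts from it with find_next_siblings. The limit=2 argument and the length check are illustrative choices rather than the only way to do this; the check stands in for the try/except around [1] that the question worries about.

```python
from bs4 import BeautifulSoup

html = """
<h2 id='names'>Names</h2>
<p>John</p>
<p>Peter</p>
"""
soup = BeautifulSoup(html, "html.parser")
h2 = soup.find("h2", id="names")

# The <p> tags are siblings of the h2, not its children, so walk forward
# through the siblings; limit=2 stops the search after two matches.
paragraphs = h2.find_next_siblings("p", limit=2)

# A length check replaces a try/except around paragraphs[1].
peter = paragraphs[1] if len(paragraphs) > 1 else None
print(peter)  # <p>Peter</p>
```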

Beautiful Soup 4.7.0 (released at the beginning of 2019) now supports most selectors, including :nth-child:

As of version 4.7.0, Beautiful Soup supports most CSS4 selectors via the SoupSieve project. If you installed Beautiful Soup through pip, SoupSieve was installed at the same time, so you don’t have to do anything extra.

So, if you upgrade your version:

pip install -U beautifulsoup4

You'll be able to use nearly all selectors you'd ever need to, including nth-child.
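To confirm that an installed copy actually accepts :nth-child before relying on it, a small feature check can help. This is a sketch: the helper name supports_nth_child is hypothetical, and it simply relies on older releases raising the NotImplementedError quoted in the question.

```python
import bs4
from bs4 import BeautifulSoup


def supports_nth_child() -> bool:
    """Hypothetical helper: True if this Beautiful Soup install accepts :nth-child."""
    probe = BeautifulSoup("<div><p>x</p></div>", "html.parser")
    try:
        probe.select("p:nth-child(1)")
    except NotImplementedError:
        # Pre-4.7 releases raise this for unsupported pseudo-classes.
        return False
    return True


print(bs4.__version__, supports_nth_child())
```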

That said, note that in your input HTML, the #names h2 tag does not actually have any child elements:

<h2 id='names'>Names</h2>
<p>John</p>
<p>Peter</p>

Here, there are just 3 elements, which are all siblings, so

#names > p:nth-child(1)

wouldn't match anything, even in CSS or JavaScript.
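A quick check with a recent Beautiful Soup (4.7+, assumed here) makes this concrete: the child combinator > finds nothing under the h2, while the general sibling combinator ~ (standard CSS, supported by SoupSieve) matches both paragraphs that follow it.

```python
from bs4 import BeautifulSoup

html = """
<h2 id='names'>Names</h2>
<p>John</p>
<p>Peter</p>
"""
soup = BeautifulSoup(html, "html.parser")

# The h2 has no element children (only the text "Names"), so ">" finds nothing.
print(soup.select("#names > p:nth-child(1)"))               # []
print(soup.select_one("#names").find_all(recursive=False))  # []

# The general sibling combinator "~" matches every <p> that follows the h2.
print(soup.select("#names ~ p"))  # [<p>John</p>, <p>Peter</p>]
```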

If the #names element had the <p>s as children, your selector would work, to an extent:

from bs4 import BeautifulSoup

html = '''
<div id='names'>
    <p>John</p>
    <p>Peter</p>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
print(soup.select("#names > p:nth-child(1)"))

Output:

[<p>John</p>]

Of course, the John <p> is the first child of the #names parent. If you want Peter, use :nth-child(2).
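Continuing the div example, :nth-child(2) picks Peter. select_one also sidesteps the index-out-of-range worry from the question, because it returns None instead of raising when nothing matches. A minimal sketch:

```python
from bs4 import BeautifulSoup

html = '''
<div id='names'>
    <p>John</p>
    <p>Peter</p>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')

# Peter is the second element child of #names.
peter = soup.select_one("#names > p:nth-child(2)")
print(peter)  # <p>Peter</p>

# select_one returns None when nothing matches, so a missing element
# does not raise IndexError the way indexing a list would.
print(soup.select_one("#names > p:nth-child(3)"))  # None
```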

If the elements are all adjacent siblings, you can use + to select the next sibling:

from bs4 import BeautifulSoup

html = '''
<h2 id='names'>Names</h2>
<p>John</p>
<p>Peter</p>
'''
soup = BeautifulSoup(html, 'html.parser')
print(soup.select("#names + p + p"))

Output:

[<p>Peter</p>]
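The + chain generalizes to the n-th following <p> by repeating " + p". The helper below, nth_following_p, is hypothetical; note that it only works while every <p> immediately follows the previous element, since + matches the adjacent sibling only.

```python
from bs4 import BeautifulSoup

html = '''
<h2 id='names'>Names</h2>
<p>John</p>
<p>Peter</p>
'''
soup = BeautifulSoup(html, 'html.parser')


def nth_following_p(doc, anchor, n):
    """Hypothetical helper: the n-th <p> chained directly after `anchor`.

    Builds e.g. "#names + p + p" for n=2; returns None if nothing matches.
    """
    selector = anchor + " + p" * n
    return doc.select_one(selector)


print(nth_following_p(soup, "#names", 1))  # <p>John</p>
print(nth_following_p(soup, "#names", 2))  # <p>Peter</p>
print(nth_following_p(soup, "#names", 3))  # None
```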

Comments
  • It did work indeed; however, I did it with nth-of-type instead of nth-child, which seemed to do the trick as well.