Get the integers from a list, created with BeautifulSoup in Python

I'm a beginner in Python and I need some help with this code:

from urllib.request import *
from bs4 import BeautifulSoup
import re

req = Request("https://adrianchifu.com/teachings/AMSE/MAG1/project/Xlrda/dsuR/2/J9ED27Y.html")
a = urlopen(req).read()
soup=BeautifulSoup(a,'html.parser')
nombres=[]
tout = (soup.find_all('td'))
str_tout=str(tout)     
tout = [float(s) for s in re.findall(r'\d+\.\d+', str_tout)]
nombres.append(tout)
print(nombres)

I need to get all the numeric values contained in a web page (the code above is just one part of the whole program). I have succeeded in extracting the floats, but I can't get the integers. I have tried many things but couldn't figure out how to do it. Thanks for your help.

EDIT: For this link (https://adrianchifu.com/teachings/AMSE/MAG1/project/Xlrda/dsuR/2/9GYIGO.html), the method given just below doesn't work, because the list contains integers and floats but also character strings. Some of those strings start with a digit, which complicates things. How can I catch the integers but not the strings that start with a digit?

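You can keep your own approach and finish the job by using split. The code below reads the text of each td cell and, for cells that start with a decimal number, keeps only the integer part (the text before the decimal point). All other cells are left as strings.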

from urllib.request import *
from bs4 import BeautifulSoup
import re

req = Request("https://adrianchifu.com/teachings/AMSE/MAG1/project/Xlrda/dsuR/2/J9ED27Y.html")
a = urlopen(req).read()
soup = BeautifulSoup(a,'html.parser')
nombres = []
tout = [ele.text for ele in soup.find_all('td')]
tout = [text if not re.findall(r"^\d+\.\d+",text) else int(text.split(".")[0]) for text in tout]
print(tout)
# [89, 54, 19, 'OIK3XF02PS', 87, 2, 99, '6190', 83, 'E2RYAFAE']

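Integers don't have the form \d+\.\d+, so make the decimal point and the digits after it optional: ^\d+(?:\.\d+)?$. Note the non-capturing group (?:...). It matters because re.findall returns the contents of a capturing group rather than the whole match, which yields empty strings for integers. The ^ and $ anchors reject strings that merely start with or contain a number.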
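To illustrate why the group type matters, here is a minimal sketch run on a made-up string rather than the real page (the sample text is an assumption, chosen to mix floats and an integer). It compares what re.findall returns for the capturing and the non-capturing variant of the pattern:

```python
import re

# Hypothetical sample text, not fetched from the page in the question.
sample = "89.169 6190 54.893"

# With a capturing group, findall returns only what the group matched.
# For an integer the optional group matches nothing, so we get ''.
capturing = re.findall(r'\d+(\.\d+)?', sample)

# With a non-capturing group, findall returns the whole match.
non_capturing = re.findall(r'\d+(?:\.\d+)?', sample)

print(capturing)      # ['.169', '', '.893']
print(non_capturing)  # ['89.169', '6190', '54.893']

# Only the non-capturing result can be converted safely.
numbers = [float(s) for s in non_capturing]
print(numbers)        # [89.169, 6190.0, 54.893]
```

Calling float('') on the empty string from the first list is what raises "ValueError: could not convert string to float".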

Then, I'd try to match each td.text by itself:

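from urllib.request import Request, urlopen
from bs4 import BeautifulSoup
import re

req = Request("https://adrianchifu.com/teachings/AMSE/MAG1/project/Xlrda/dsuR/2/J9ED27Y.html")
a = urlopen(req).read()
soup = BeautifulSoup(a, 'html.parser')
nombres = []
tds = soup.find_all('td')
for td in tds:
    # Keep only cells whose whole text is an integer or a decimal number
    if re.match(r'^\d+(?:\.\d+)?$', td.text):
        nombres.append(float(td.text))
print(nombres)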

This outputs

[89.169, 54.893, 19.212, 87.045, 2.248, 99.947, 6190.0, 83.096]

As a last improvement, I'd use a list comprehension with a compiled regex to improve performance a bit:

from urllib.request import Request, urlopen
from bs4 import BeautifulSoup
import re

req = Request("https://adrianchifu.com/teachings/AMSE/MAG1/project/Xlrda/dsuR/2/J9ED27Y.html")
a = urlopen(req).read()
soup = BeautifulSoup(a, 'html.parser')
tds = soup.find_all('td')
# Compile once, then reuse the pattern for every cell
numbers_regex = re.compile(r'^\d+(?:\.\d+)?$')
nombres = [float(td.text) for td in tds if numbers_regex.match(td.text)]
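If you want integers to stay integers instead of becoming floats such as 6190.0, one possible variation is sketched below. It is not part of the answer above: the helper name parse_number is hypothetical, and the list cells stands in for the td.text values so the example runs without network access. The sample strings are taken from the outputs and error messages quoted in this thread.

```python
import re

NUMBER_RE = re.compile(r'\d+(?:\.\d+)?')

def parse_number(text):
    """Return an int or a float if text is entirely numeric, else None.

    Hypothetical helper for illustration only.
    """
    text = text.strip()
    # fullmatch requires the whole string to match, like ^...$ anchors
    if not NUMBER_RE.fullmatch(text):
        return None
    return float(text) if '.' in text else int(text)

# Stand-in for [td.text for td in soup.find_all('td')]
cells = ['89.169', 'OIK3XF02PS', '6190', '3ID1386', ' 2.248 ']
nombres = [n for n in map(parse_number, cells) if n is not None]
print(nombres)  # [89.169, 6190, 2.248]
```

Strings such as '3ID1386' are skipped because the pattern has to match the whole cell, not just its first characters.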

If you are looking for a regex that matches integers only:

^[1-9][0-9]{0,2}$

This matches all positive, non-zero integers between 1 and 999. You can adjust the upper bound by changing the second number (i.e. 2) in the {0,2} part of the expression: {0,n} allows up to n + 1 digits.

Courtesy: http://regexlib.com
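As a rough sketch of how the upper bound can be adjusted, the snippet below builds the same pattern for a chosen maximum number of digits. The helper bounded_int_pattern is hypothetical and the test strings are taken from values quoted elsewhere in this thread.

```python
import re

def bounded_int_pattern(max_digits):
    """Build ^[1-9][0-9]{0,n}$ with n = max_digits - 1.

    Hypothetical helper for illustration only.
    """
    return re.compile(r'^[1-9][0-9]{0,%d}$' % (max_digits - 1))

three_digits = bounded_int_pattern(3)  # same as ^[1-9][0-9]{0,2}$
four_digits = bounded_int_pattern(4)   # same as ^[1-9][0-9]{0,3}$

print(bool(three_digits.match('999')))      # True
print(bool(three_digits.match('6190')))     # False: four digits
print(bool(four_digits.match('6190')))      # True
print(bool(four_digits.match('0')))         # False: zero is excluded
print(bool(four_digits.match('3ID1386')))   # False: not purely digits
print(bool(four_digits.match('89.169')))    # False: floats are rejected
```

Note that this pattern rejects floats as well, so it only helps if the integers are to be collected separately from the decimal values.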

Comments
  • Try \d+(\.\d+)? instead of just \d+\.\d+.
  • Traceback (most recent call last): File "test4.py", line 11, in <module> tout = [float(s) for s in re.findall(r'\d+(\.\d+)?', str_tout)] File "test4.py", line 11, in <listcomp> tout = [float(s) for s in re.findall(r'\d+(\.\d+)?', str_tout)] ValueError: could not convert string to float:
  • Python is returning this; it must be because I also have to deal with characters in my HTML file.
  • I tried running the regexp in JS directly on your webpage and I can confirm that the regexp \d+(\.\d+)? is correct: document.body.textContent.match(/\d+(\.\d+)?/g).map(parseFloat). However, I would advise querying only the table (quick and dirty JS snippet): document.querySelector("table").textContent.match(/\d+(\.\d+)?/g).map(parseFloat)
  • @Derek朕會功夫 This question is about Python, how is JS code going to help OP? Anyway, you'd have to use a non-capturing group as my answer suggests, otherwise float() will get empty strings for integers
  • Thanks for your detailed answer. However, what I sent you is a test for one of the links, and I have a lot of links to process. For this one, adrianchifu.com/teachings/AMSE/MAG1/project/Xlrda/dsuR/2/…, the code is not working. I guess it's because one of the character strings starts with a number. How do I have to modify the code you sent me to get around this?
  • This is the error: Traceback (most recent call last): File "test4.py", line 13, in <module> nombres.append(float(td.text)) ValueError: could not convert string to float: '3ID1386'
  • @Shikantaza I've updated the regex you should use (^\d+(?:\.\d+)?$)