BeautifulSoup XML Only printing first line

I'm using BeautifulSoup4 (with lxml) to parse an XML file. For some reason, when I print soup.prettify(), it only prints the first line:

from bs4 import BeautifulSoup

# Open the file in a context manager so it is closed after parsing
with open('xmlDoc.xml', 'r') as f:
    soup = BeautifulSoup(f, 'xml')

print(soup.prettify())

#>>> <?xml version="1.0" encoding="utf-8"?>

Any idea why it's not grabbing everything?

UPDATE: here is the XML file:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<!-- Data Junction generated file.
Macro type "1000" is reserved. -->
<djmacros>
  <macro name="Test" type="5000" value="TestValue">
    <description>test</description>
  </macro>
  <macro name="AnotherTest" type="0" value="TestValue2"/>
  <macro name="TestLocation" type="1000" value="C:\RandomLocation">
    <description> </description>
  </macro>
<djmacros>

Either the file position is already at EOF, so there is nothing left to parse:

>>> soup = BeautifulSoup("", 'xml')
>>> soup.prettify()
'<?xml version="1.0" encoding="utf-8">\n'

Or the content is not valid XML:

>>> soup = BeautifulSoup("no <root/> element", 'xml')
>>> soup.prettify()
'<?xml version="1.0" encoding="utf-8">\n'
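
The EOF case can be reproduced without any XML library. The following is a minimal sketch, assuming an in-memory io.StringIO object stands in for the opened file; the helper name read_all is hypothetical and not part of BeautifulSoup. It shows that a second read() on the same file object returns an empty string until the position is rewound with seek(0):

```python
import io

def read_all(f):
    """Read a file object from the start, whatever its current position.

    Hypothetical helper for illustration only.
    """
    f.seek(0)          # rewind to the beginning of the stream
    return f.read()

# io.StringIO stands in for open('xmlDoc.xml') in this sketch
f = io.StringIO("<root/>")

first = f.read()       # consumes the whole stream
second = f.read()      # position is now at EOF, so nothing is returned
rewound = read_all(f)  # rewinding makes the content available again
```

If the file object was already read (or iterated over) before being passed to BeautifulSoup, calling f.seek(0) first, or reopening the file, should give the parser the full content.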


As per J.F. Sebastian's answer, the XML is invalid.

Your final tag is an opening tag where a closing tag is expected:

<djmacros>

The correct tag is:

</djmacros>

You can confirm this with an XML validator, e.g. http://www.w3schools.com/xml/xml_validator.asp
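
The same check can also be done locally. This is a rough sketch, assuming the standard library's xml.etree.ElementTree is an acceptable stand-in for an online validator (it only checks well-formedness, not validity against a schema). The function name is_well_formed and the shortened BROKEN sample are illustrative, not taken from the original posts:

```python
import xml.etree.ElementTree as ET

# Shortened version of the posted file: the last tag opens a second
# <djmacros> element instead of closing the first one.
BROKEN = """<djmacros>
  <macro name="AnotherTest" type="0" value="TestValue2"/>
<djmacros>"""

# Same document with the final tag corrected to </djmacros>
FIXED = BROKEN[:-len("<djmacros>")] + "</djmacros>"

def is_well_formed(text):
    """Return True if text parses as XML, otherwise report the error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError as exc:
        print("not well-formed:", exc)
        return False

print(is_well_formed(BROKEN))
print(is_well_formed(FIXED))
```

The parser error message usually includes a line and column, which helps locate the offending tag in a larger file.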


If the file is saved as UTF-8 with a BOM (byte order mark) rather than plain UTF-8, the parser may have problems even if the XML is otherwise valid.
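
One possible workaround, sketched here under the assumption that the BOM is the only problem, is to decode the data with Python's "utf-8-sig" codec, which drops a leading BOM if one is present. The sample bytes below are made up for illustration:

```python
import codecs

# Made-up sample: a UTF-8 BOM followed by a small XML document
raw = codecs.BOM_UTF8 + '<?xml version="1.0" encoding="UTF-8"?><root/>'.encode("utf-8")

plain = raw.decode("utf-8")      # keeps the BOM as a leading U+FEFF character
clean = raw.decode("utf-8-sig")  # strips the BOM if it is present

print(repr(plain[:6]))
print(repr(clean[:6]))
```

In Python 3, the same codec can be given when opening the file, for example open('xmlDoc.xml', encoding='utf-8-sig'), so that the text handed to the parser starts directly with the XML declaration.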
