How to format an XML document in Linux


I have a large number of XML elements like the following.

<SERVICE>
<NAME>
sh_SEET15002GetReKeyDetails
</NAME>
<ID>642</ID>
</SERVICE>

I want them reformatted as shown below. I have tried xmllint, but it does not give me this result. Please help.

<SERVICE>
<NAME>sh_SEET15002GetReKeyDetails</NAME>
<ID>642</ID>
</SERVICE>

xmllint --format --recover nonformatted.xml > formatted.xml

For tab indentation:

export XMLLINT_INDENT=`echo -e '\t'`

For four space indentation:

export XMLLINT_INDENT='    '
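
One caveat, offered as a sketch rather than a definitive fix: xmllint --format only re-indents, and as far as I know it leaves the text inside elements untouched, so the line breaks inside <NAME> in the question would remain. The Python snippet below trims the text of leaf elements and then re-indents. It assumes Python 3.9+ (for ET.indent) and uses only the standard library; the function name normalize is hypothetical.

```python
import xml.etree.ElementTree as ET


def normalize(xml_text, indent="  "):
    """Trim whitespace around the text of leaf elements, then re-indent."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Only touch elements that have no child elements (pure text content).
        if len(elem) == 0 and elem.text is not None:
            elem.text = elem.text.strip()
    ET.indent(root, space=indent)  # available from Python 3.9
    return ET.tostring(root, encoding="unicode")


sample = """<SERVICE>
<NAME>
sh_SEET15002GetReKeyDetails
</NAME>
<ID>642</ID>
</SERVICE>"""

print(normalize(sample))
```

Passing indent="" keeps one element per line without any leading spaces, which matches the layout shown in the question.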

How to pretty print XML from the command line? One option is to redirect or pipe the file into tidy, telling it that the input is XML (-xml), asking it to indent (-i), and using quiet mode (-q) to suppress nonessential output.


Without programming, you can use the Eclipse XML Source Editor. Have a look at this answer.

By the way, have you tried xmllint --format --recover nonformatted.xml > formatted.xml?

EDIT:

You can also try the XMLStarlet command-line XML toolkit.

5. Formatting XML documents
====================================================

xml fo --help
XMLStarlet Toolkit: Format XML document
Usage: xml fo [<options>] <xml-file>
where <options> are
   -n or --noindent            - do not indent
   -t or --indent-tab          - indent output with tabulation
   -s or --indent-spaces <num> - indent output with <num> spaces
   -o or --omit-decl           - omit xml declaration <?xml version="1.0"?>
   -R or --recover             - try to recover what is parsable
   -D or --dropdtd             - remove the DOCTYPE of the input docs
   -C or --nocdata             - replace cdata section with text nodes
   -N or --nsclean             - remove redundant namespace declarations
   -e or --encode <encoding>   - output in the given encoding (utf-8, unicode...)
   -H or --html                - input is HTML
   -h or --help                - print help
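
If XMLStarlet is not installed, a few of these options can be approximated in Python. This is a rough standard-library analogue, not XMLStarlet's implementation: the function format_xml and its parameters are hypothetical names chosen to mirror --indent-spaces, --indent-tab and --omit-decl, and it assumes Python 3.9+ for ET.indent.

```python
import xml.etree.ElementTree as ET


def format_xml(text, indent_spaces=2, indent_tab=False, omit_decl=False):
    """Re-indent an XML string, loosely mirroring some `xml fo` options."""
    root = ET.fromstring(text)
    # --indent-tab takes precedence over --indent-spaces in this sketch.
    unit = "\t" if indent_tab else " " * indent_spaces
    ET.indent(root, space=unit)
    body = ET.tostring(root, encoding="unicode")
    if omit_decl:
        return body
    # Prepend a plain declaration, as `xml fo` does unless -o is given.
    return '<?xml version="1.0"?>\n' + body


print(format_xml("<SERVICE><NAME>x</NAME><ID>642</ID></SERVICE>", indent_spaces=4))
```

Unlike xml fo -R, this sketch makes no attempt to recover from malformed input; ET.fromstring raises ParseError instead.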

How To Pretty Print and Format XML In Command Line Linux: an XML file named data.xml can be formatted by passing the --format option to xmllint.

How to format and indent xml file, Ubuntu, Linux (written by Krzysztof Dryja on January 26, 2013, in IT Stuff): there is a very simple way to format and indent ugly.xml into pretty.xml on Ubuntu Linux, using xmllint --format.


I do it from gedit. In gedit, you can add any script, in particular a Python script, as an External Tool. The script below reads data from stdin and writes output to stdout, so it can also be used as a stand-alone program. It lays out the XML and sorts child nodes.

#!/usr/bin/env python
# encoding: utf-8

"""
This is a gedit plug-in to sort and layout XML.

In gedit, to add this tool, open: menu -- Tools -- Manage External Tools...
Create a new tool: click [+] under the list of tools, type in "Sort XML" as tool name,
paste the whole text from this file in the "Edit:" box, then 
configure the tool:
Input: Current selection
Output: Replace current selection

In gedit, to run this tool,
FIRST SELECT THE XML,
then open: menu -- Tools -- External Tools > -- Sort XML

"""


from lxml import etree
import sys
import io

def headerFirst(node):
    """Return the sorting key prefix, so that 'header' will go before any other node
    """
    nodetag=('%s' % node.tag).lower()
    if nodetag.endswith('}header') or nodetag == 'header':
        return '0'
    else:
        return '1'

def get_node_key(node, attr=None):
    """Return the sorting key of an xml node
    using tag and attributes
    """
    if attr is None:
        return '%s' % node.tag + ':'.join([node.get(attr)
                                        for attr in sorted(node.attrib)])
    if attr in node.attrib:
        return '%s:%s' % (node.tag, node.get(attr))
    return '%s' % node.tag


def sort_children(node, attr=None):
    """ Sort children along tag and given attribute.
    if attr is None, sort along all attributes"""
    if not isinstance(node.tag, str):  # PYTHON 2: use basestring instead
        # not a TAG, it is comment or DATA
        # no need to sort
        return
    # sort child along attr
    node[:] = sorted(node, key=lambda child: (headerFirst(child) + get_node_key(child, attr)))
    # and recurse
    for child in node:
        sort_children(child, attr)


def sort(unsorted_stream, sorted_stream, attr=None):
    """Sort unsorted xml file and save to sorted_file"""
    parser = etree.XMLParser(remove_blank_text=True)
    tree = etree.parse(unsorted_stream,parser=parser)
    root = tree.getroot()
    sort_children(root, attr)

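    sorted_unicode = etree.tostring(tree, pretty_print=True, xml_declaration=True, encoding="UTF-8")

    # tostring() returns bytes when an encoding is given, so write them unmodified.
    # Python 3 text streams expose a binary .buffer; Python 2 streams take bytes directly.
    getattr(sorted_stream, 'buffer', sorted_stream).write(sorted_unicode)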


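# We could simply call sort(sys.stdin, sys.stdout),
# but we want to check that something was selected:

# Read raw bytes so that lxml can honour the encoding declared in the XML.
# Python 3 exposes them as sys.stdin.buffer; Python 2 reads bytes from sys.stdin.
inputstr = getattr(sys.stdin, 'buffer', sys.stdin).read()
if not inputstr.strip():
    sys.stderr.write('no XML selected!\n')
    sys.exit(100)

sort(io.BytesIO(inputstr), sys.stdout)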

There are two tricky things:

    parser = etree.XMLParser(remove_blank_text=True)
    tree = etree.parse(unsorted_stream,parser=parser)

By default, the parser keeps whitespace-only text between elements, and that leftover whitespace prevents pretty_print from re-indenting the output.

    sorted_unicode = etree.tostring(tree, pretty_print=True, xml_declaration=True, encoding="UTF-8")

Likewise, tostring() does not pretty-print unless pretty_print=True is passed.
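
The whitespace pitfall is not specific to lxml. As an illustration only, the sketch below reproduces it with the standard library's xml.dom.minidom as a stand-in (the script above uses lxml's remove_blank_text instead); the helper strip_blank_text is a hypothetical name.

```python
from xml.dom import minidom


def strip_blank_text(node):
    """Recursively remove whitespace-only text nodes below `node`."""
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_blank_text(child)


raw = "<a>\n<b>1</b>\n<c>2</c>\n</a>"

# Pretty-printing with the original whitespace kept: blank lines appear.
kept = minidom.parseString(raw).toprettyxml(indent="  ")

# Pretty-printing after dropping whitespace-only text nodes: clean output.
doc = minidom.parseString(raw)
strip_blank_text(doc)
stripped = doc.toprettyxml(indent="  ")

print(repr(kept))
print(stripped)
```

Comparing the two printed results shows why the blank text has to go before the serializer can lay the document out cleanly.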

I configure this tool to work on the current selection and replace the current selection because usually there are HTTP headers in the same file, YMMV.

$ python --version
Python 2.7.6

$ lsb_release -a
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.5 LTS
Release:    14.04
Codename:   trusty

If you do not need child node sorting, just comment the corresponding line out.


UPDATE v2 places header in front of anything else; fixed spaces

UPDATE getting lxml on Ubuntu 18.04.3 LTS bionic:

sudo apt install python-pip
pip install --upgrade lxml
$ python --version
Python 2.7.15+



Format your xml document using xmllint (Vim Tips Wiki): if you open an XML document that is either totally or partially unindented, you can reformat it with xmllint from within Vim. On Unix/Linux, if you don't like the default indentation (2 spaces), set the XMLLINT_INDENT environment variable.


Reformatting a large number of XML files: this can be done from find directly using -exec:

find . -name "*.xml" -type f -exec xmllint --output '{}' --format '{}' \;

What's passed to -exec will be invoked once for each file found.
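
For comparison, the same batch job can be sketched in Python. This is a minimal standard-library version under stated assumptions: Python 3.9+ for ET.indent, a hypothetical function name reformat_xml_files, and files that are well-formed (malformed files are skipped rather than recovered). Note that ElementTree drops comments and the DOCTYPE by default, so this is not a drop-in replacement for xmllint.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def reformat_xml_files(top, indent="  "):
    """Re-indent every *.xml file under `top` in place; return rewritten paths."""
    rewritten = []
    for path in sorted(Path(top).rglob("*.xml")):
        if not path.is_file():
            continue
        try:
            tree = ET.parse(str(path))
        except ET.ParseError:
            continue  # skip files that are not well-formed XML
        ET.indent(tree, space=indent)  # available from Python 3.9
        tree.write(str(path), encoding="utf-8", xml_declaration=True)
        rewritten.append(path)
    return rewritten


# Example call (rewrites files in place, so try it on a copy first):
# reformat_xml_files("path/to/xml/dir")
```

Because the files are overwritten in place, as with xmllint --output '{}', it is worth running this on a copy of the directory first.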


How to pretty print XML on GNU/Linux: obfuscated XML files are difficult to read because the content of the file occupies a single line. Today LibreByte offers 3 tools that allow you to format…

How do you format code in Visual Studio Code (VSCode)? On Linux: Ctrl + Shift… Sometimes I copy and paste XML into a new file and want to format it without saving.