Python AST with preserved comments

I can get an AST without comments using:

import ast

with open('/path/to/module.py') as source_file:
    module = ast.parse(source_file.read())

Could you show an example of getting an AST with preserved comments (and whitespace)?


The ast module doesn't include comments. The tokenize module can give you comments, but doesn't provide other program structure.
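For illustration, here is a minimal sketch of the tokenize side: it lists every comment in a source string together with its position. The helper name extract_comments is made up for this example.

```python
import io
import tokenize


def extract_comments(source):
    """Return (line, column, text) for every comment token in source."""
    readline = io.StringIO(source).readline
    return [
        (tok.start[0], tok.start[1], tok.string)
        for tok in tokenize.generate_tokens(readline)
        if tok.type == tokenize.COMMENT
    ]
```

The result is a flat list of comments: it says where each comment sits, but nothing about which statement or block it belongs to.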


An AST that keeps information about formatting, comments, etc. is called a Full Syntax Tree.

redbaron is able to do this. Install it with pip install redbaron and try the following code:

import redbaron

with open("/path/to/module.py", "r") as source_code:
    red = redbaron.RedBaron(source_code.read())

print(red.fst())

One thing about PEP 484 (https://www.python.org/dev/peps/pep-0484/) is that it makes comments potentially semantically meaningful. Unfortunately, the AST doesn't carry comments with it in any way, which makes it difficult to build a PEP 484 linter using purely the ast module. Even if comments were only carried along side-band, so that they could be correlated with nodes by line number, that would be useful in this scenario.
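As a rough illustration of the side-band idea, the sketch below collects comments with tokenize and pairs them with statement nodes by line number. The function names are hypothetical, and only a comment on a statement's first line is matched, so trailing comments of multi-line statements are missed.

```python
import ast
import io
import tokenize


def comments_by_line(source):
    """Map line number -> comment text, using the tokenize module."""
    found = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            found[tok.start[0]] = tok.string
    return found


def statements_with_comments(source):
    """Return (lineno, node type, comment or None) for every statement."""
    comments = comments_by_line(source)
    rows = [
        (node.lineno, type(node).__name__, comments.get(node.lineno))
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.stmt)
    ]
    # ast.walk is breadth-first, so sort the rows back into source order.
    return sorted(rows, key=lambda row: row[0])
```

A linter could then look for a "# type:" prefix in the third element of each row.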


This question naturally arises when writing any kind of Python code beautifier, PEP 8 checker, etc. In such cases, you are doing a source-to-source transformation: you expect the input to be written by a human, and you not only want the output to be human-readable, but also expect it to:

  1. include all comments, exactly where they appear in the original.
  2. output the exact spelling of strings, including docstrings, as in the original.

This is far from easy to do with the ast module. You could call it a hole in the API, but there seems to be no easy way to extend the API to do 1 and 2.
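To see the problem concretely, here is a small round trip through the ast module. It assumes Python 3.9 or later, where ast.unparse is available; the sample source line is made up.

```python
import ast

source = 'greeting = "hello"  # keep me\n'

# Parse to an AST and turn it straight back into source text.
round_tripped = ast.unparse(ast.parse(source))
print(round_tripped)
```

The comment does not survive, and the string literal is re-spelled from its value rather than copied from the original text, so both requirement 1 and requirement 2 fail.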

Andrei's suggestion to use both ast and tokenize together is a brilliant workaround. The idea also came to me when writing a Python to CoffeeScript converter, but the code is far from trivial.

The TokenSync (ts) class starting at line 1305 in py2cs.py coordinates communication between the token-based data and the ast traversal. Given the source string s, the TokenSync class tokenizes s and inits internal data structures that support several interface methods:

ts.leading_lines(node): Return a list of the preceding comment and blank lines.

ts.trailing_comment(node): Return a string containing the trailing comment for the node, if any.

ts.sync_string(node): Return the spelling of the string at the given node.
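To give a feel for what two of these methods do, here is a much simplified sketch. It is not the code from py2cs.py: the class name MiniTokenSync and its internals are assumptions made for illustration, and it ignores corner cases such as lines inside multi-line strings that happen to start with '#'.

```python
import ast
import io
import tokenize


class MiniTokenSync:
    """Hypothetical, simplified stand-in for the TokenSync idea."""

    def __init__(self, source):
        self.lines = source.splitlines(keepends=True)
        self.comments = {}  # line number -> comment text
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            if tok.type == tokenize.COMMENT:
                self.comments[tok.start[0]] = tok.string

    def trailing_comment(self, node):
        """Return the comment on the node's first line, plus a newline."""
        comment = self.comments.get(node.lineno)
        return ' ' + comment + '\n' if comment else '\n'

    def leading_lines(self, node):
        """Return the comment and blank lines directly above the node."""
        result = []
        n = node.lineno - 1  # 1-based number of the line above the node
        while n >= 1:
            text = self.lines[n - 1]
            stripped = text.strip()
            if stripped and not stripped.startswith('#'):
                break  # reached a line of code
            result.append(text)
            n -= 1
        result.reverse()
        return result
```

For the source "x = 1\n\n# about y\ny = 2  # two\n", the node for the second assignment gets the blank line and "# about y" as leading lines and " # two" as its trailing comment.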

It is straightforward, but just a bit clumsy, for the ast visitors to use these methods. Here are some examples from the CoffeeScriptTraverser (cst) class in py2cs.py:

def do_Str(self, node):
    '''A string constant, including docstrings.'''
    if hasattr(node, 'lineno'):
        return self.sync_string(node)

This works provided that ast.Str nodes (ast.Constant nodes on Python 3.8 and later) are visited in the order they appear in the sources. This happens naturally in most traversals.
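The ordering argument can be sketched as follows. This is an illustration only, not the py2cs.py implementation: the function name string_spellings is made up, it uses ast.Constant (Python 3.8 and later), and it assumes one STRING token per string node, so implicitly concatenated literals and f-strings would throw the pairing off.

```python
import ast
import io
import tokenize


def string_spellings(source):
    """Pair each string constant's value with its spelling in the source."""
    spellings = [
        tok.string
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type == tokenize.STRING
    ]
    nodes = [
        node for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Constant)
        and isinstance(node.value, (str, bytes))
    ]
    # Put the constants in source order so they line up with the tokens.
    nodes.sort(key=lambda node: (node.lineno, node.col_offset))
    return [(node.value, text) for node, text in zip(nodes, spellings)]
```

Each pair holds the value the AST stores and the exact text, quotes and prefixes included, that the author typed.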

Here is the ast.If visitor. It shows how to use ts.leading_lines and ts.trailing_comment:

def do_If(self, node):

    result = self.leading_lines(node)
    tail = self.trailing_comment(node)
    s = 'if %s:%s' % (self.visit(node.test), tail)
    result.append(self.indent(s))
    for z in node.body:
        self.level += 1
        result.append(self.visit(z))
        self.level -= 1
    if node.orelse:
        tail = self.tail_after_body(node.body, node.orelse, result)
        result.append(self.indent('else:' + tail))
        for z in node.orelse:
            self.level += 1
            result.append(self.visit(z))
            self.level -= 1
    return ''.join(result)

The ts.tail_after_body method compensates for the fact that there are no ast nodes representing 'else' clauses. It's not rocket science, but it isn't pretty:

def tail_after_body(self, body, aList, result):
    '''
    Return the tail of the 'else' or 'finally' statement following the given body.
    aList is the node.orelse or node.finalbody list.
    '''
    node = self.last_node(body)
    if node:
        max_n = node.lineno
        leading = self.leading_lines(aList[0])
        if leading:
            result.extend(leading)
            max_n += len(leading)
        tail = self.trailing_comment_at_lineno(max_n + 1)
    else:
        tail = '\n'
    return tail

Note that cst.tail_after_body just calls ts.tail_after_body.
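The "missing node" problem is easy to reproduce: ast.If only has the fields test, body and orelse, so nothing records where the else keyword itself sits. A minimal sketch of recovering that line from the tokens, assuming Python 3.8 or later for end_lineno and end_col_offset (the helper name else_line is hypothetical):

```python
import ast
import io
import tokenize


def else_line(source, if_node):
    """Return the line number of if_node's 'else' keyword, or None."""
    if not if_node.orelse:
        return None
    last = if_node.body[-1]
    after_body = (last.end_lineno, last.end_col_offset)
    first = if_node.orelse[0]
    before_orelse = (first.lineno, first.col_offset)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Only an 'else' between the end of the body and the start of
        # orelse can be the keyword that belongs to this if statement.
        if (tok.type == tokenize.NAME and tok.string == 'else'
                and after_body <= tok.start < before_orelse):
            return tok.start[0]
    return None  # for example an 'elif' chain, which has no 'else' token
```

Once the line of the else keyword is known, its trailing comment can be looked up the same way as for any other line.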

Summary

The TokenSync class encapsulates most of the complexities involved in making token-oriented data available to ast traversal code. Using the TokenSync class is straightforward, but the ast visitors for all Python statements (and ast.Str) must include calls to ts.leading_lines, ts.trailing_comment and ts.sync_string. Furthermore, the ts.tail_after_body hack is needed to handle "missing" ast nodes.

In short, the code works well, but is just a bit clumsy.

@Andrei: your short answer might suggest that you know of a more elegant way. If so, I would love to see it.

Edward K. Ream

horast (PyPI): a human-oriented AST parser/unparser.


google/py-ast-utils (GitHub): a library for "round tripping" between Python source and an AST, with formatting (line breaks, spaces, comments, docstrings) preserved.

With the Python ast module, we can do a lot of things, such as inspecting and modifying Python code. The code can be parsed and modified before it is compiled to bytecode. It is important to understand that an abstract syntax tree represents each element of the Python code as an object.


If you're using Python 3, you can use bowler, which is based on lib2to3, but provides a much nicer API and CLI for creating transformation scripts.

https://pybowler.io/

However, when executed as Python, debugging will always end up False because directives are preserved as usual comments in Python and are therefore ignored. Therefore, the Directive node is not meant to enable preprocessing of Python, at least for now.


typed_ast is a Python 3 package that provides a Python 2.7 and Python 3 parser similar to the standard ast library. Unlike ast, the parsers in typed_ast include PEP 484 type comments and are independent of the version of Python under which they are run. The typed_ast parsers produce the standard Python AST (plus type comments), and are both fast and correct, as they are based on the CPython 2.7 and 3.6 parsers.


Unlike an AST, a Lossless Syntax Tree must preserve comments and formatting.

An abstract syntax tree can be generated by passing ast.PyCF_ONLY_AST as a flag to the compile() built-in function, or using the parse() helper provided in the ast module. The result will be a tree of objects whose classes all inherit from ast.AST. An abstract syntax tree can be compiled into a Python code object using the built-in compile() function.
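A short sketch of both directions, using only the standard library (the sample source and the "<example>" filename are made up):

```python
import ast

source = "answer = 6 * 7\n"

# Two equivalent ways of getting the tree.
tree_a = compile(source, "<example>", "exec", ast.PyCF_ONLY_AST)
tree_b = ast.parse(source, filename="<example>")

# Compile the tree into a code object and run it.
code = compile(tree_b, "<example>", "exec")
namespace = {}
exec(code, namespace)
print(namespace["answer"])
```

Neither route keeps comments: they are discarded before the tree is built.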


We make a small number of changes to Grammar (inserting optional TYPE_COMMENT tokens) and to Python.asdl (adding fields to a few node types to hold the optional type comment), and a fair number of changes to ast.c to extract the type comments. By default, ast.parse() does not return type comments, since this would reject some perfectly good Python code (with a type comment in a place where the grammar doesn’t allow it). But passing a new flag will cause the tokenizer to process type comments.
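On Python 3.8 and later, this behaviour is available through the type_comments keyword of ast.parse(). A minimal sketch with a made-up source line:

```python
import ast

source = "values = []  # type: List[int]\n"

plain = ast.parse(source)                      # default: type comments ignored
typed = ast.parse(source, type_comments=True)  # type comments kept on nodes

print(plain.body[0].type_comment)
print(typed.body[0].type_comment)
```

Only comments of the "# type:" form are captured this way; ordinary comments are still dropped.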