How to clean comments from a raw SQL file

I have a problem with cleaning comments and empty lines from an existing SQL file. The file has over 10k lines, so cleaning it manually is not an option.

I have a little Python script, but I have no idea how to handle comments inside multi-line inserts.

Code:
with open('file.sql') as f:
    # keep lines that are neither '--' comment lines nor whitespace-only
    t = [x for x in f
         if not x.startswith('--') and not x.isspace()]
t  # <- here the cleaned data should be
How it should work:

This should be cleaned:

-- normal sql comment

This should stay as it is:

CREATE FUNCTION func1(a integer) RETURNS void
    LANGUAGE plpgsql
    AS $$
BEGIN
        -- comment
       [...]
END;
$$;

INSERT INTO public.texts (multilinetext) VALUES ('
and more lines here \'
-- part of text 
\'
[...]

');

Try the sqlparse module.

Updated example: it leaves comments inside INSERT values and within CREATE FUNCTION blocks untouched. You can tweak it further to tune the behavior:

import sqlparse
from sqlparse import tokens

queries = '''
CREATE FUNCTION func1(a integer) RETURNS void
    LANGUAGE plpgsql
        AS $$
        BEGIN
                -- comment
       END;
       $$;
SELECT -- comment
* FROM -- comment
TABLE foo;
-- comment
INSERT INTO foo VALUES ('a -- foo bar');
INSERT INTO foo
VALUES ('
a 
-- foo bar'
);

'''

IGNORE = set(['CREATE FUNCTION',])  # extend this

def _filter(stmt, allow=0):
    ddl = [t for t in stmt.tokens if t.ttype in (tokens.DDL, tokens.Keyword)]
    start = ' '.join(d.value for d in ddl[:2])
    if ddl and start in IGNORE:
        allow = 1
    for tok in stmt.tokens:
        if allow or not isinstance(tok, sqlparse.sql.Comment):
            yield tok

for stmt in sqlparse.split(queries):
    sql = sqlparse.parse(stmt)[0]
    print(sqlparse.sql.TokenList([t for t in _filter(sql)]))

Output:

CREATE FUNCTION func1(a integer) RETURNS void
    LANGUAGE plpgsql
        AS $$
        BEGIN
                -- comment
       END;
       $$;

SELECT * FROM TABLE foo;

INSERT INTO foo VALUES ('a -- foo bar');

INSERT INTO foo
VALUES ('
a
-- foo bar'
);

Adding an updated answer :)

import sqlparse

sql_example = """--comment
SELECT * from test;
INSERT INTO test VALUES ('
-- test
a
');
 """
print(sqlparse.format(sql_example, strip_comments=True).strip())

Output:

SELECT * from test;
INSERT INTO test VALUES ('
-- test
a
');

It achieves the same result, handles the other corner cases as well, and is more concise.

This is an extension of samplebias's answer that works with your example:

import sqlparse

sql_example = """--comment
SELECT * from test;
INSERT INTO test VALUES ('
-- test
a
');
"""

new_sql = []

for statement in sqlparse.parse(sql_example):
    # drop top-level comment groups; comments inside string literals are untouched
    new_tokens = [stm for stm in statement.tokens
                  if not isinstance(stm, sqlparse.sql.Comment)]

    new_statement = sqlparse.sql.TokenList(new_tokens)
    new_sql.append(str(new_statement))

print(sqlparse.format("\n".join(new_sql)))

Output:

SELECT * from test;

INSERT INTO test VALUES ('
-- test
a
');

It is possible to do this with regular expressions. First split the file on string literals (keeping the literals), then remove the comments from the parts that are not strings. The following Perl program does that:

#! /usr/bin/perl -w

# Read whole file.
my $file = join ('', <>);

# Split by strings including the strings.
my @major_parts = split (/('(?:[^'\\]++|\\.)*+')/, $file);

foreach my $part (@major_parts) {
    if ($part =~ /^'/) {
        # Print the part if it is a string.
        print $part; 
    }
    else {
        # Split by comments removing the comments
        my @minor_parts = split (/^--.*$/m, $part);
        # Print the remaining parts.
        print join ('', @minor_parts);
    }
}
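
For readers who want to stay in Python, the same idea can be sketched with the standard re module. This is only a minimal sketch, not a full SQL parser. The names TOKEN and strip_sql_comments are made up for illustration, and the $$ ... $$ branch is an assumption added here to cover the CREATE FUNCTION case from the question. Unlike the Perl program, it also drops trailing comments, and it leaves the resulting empty lines in place.

```python
import re

# One alternation, tried left to right at each position:
#   string  : single-quoted literal, backslash escapes stay inside it
#   dollar  : dollar-quoted body such as $$ ... $$ or $tag$ ... $tag$
#   comment : "--" up to the end of the line
TOKEN = re.compile(
    r"""
    (?P<string>'(?:[^'\\]|\\.)*')
  | (?P<dollar>(?P<tag>\$\w*\$).*?(?P=tag))
  | (?P<comment>--[^\n]*)
    """,
    re.VERBOSE | re.DOTALL,
)


def strip_sql_comments(sql):
    """Remove '--' comments found outside strings and dollar-quoted bodies."""
    def keep_or_drop(match):
        # Comments become empty text; protected regions are copied verbatim.
        return "" if match.group("comment") is not None else match.group(0)
    return TOKEN.sub(keep_or_drop, sql)


if __name__ == "__main__":
    sample = (
        "-- normal sql comment\n"
        "SELECT 1; -- trailing\n"
        "INSERT INTO t VALUES ('\n"
        "-- part of text \\'\n"
        "');\n"
        "CREATE FUNCTION f() RETURNS void AS $$\n"
        "BEGIN\n"
        "  -- comment\n"
        "END;\n"
        "$$;\n"
    )
    print(strip_sql_comments(sample))
```

Because strings and dollar-quoted bodies are matched before comments, a "--" inside them is never seen as a comment start. C-style /* ... */ comments are not handled by this sketch.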

Comments
  • Can you keep track of how many non-escaped quotes you've passed? An odd number means the comment is part of a string and thus should not be removed.
  • but it will still remove comments from stored procedures
  • I'm slightly off topic, but maybe your real problem is that your SQL file is 10K lines. Or that you're not using version control. Or both. Something like our process might help you. See my answer for stackoverflow.com/questions/5330065/…
  • @Catcall I have both, the file is produced from several smaller files, but for production environment deployment the comments are not needed (they take around 60% of file)
  • It might be easier to change the makefile. It's certainly easier to skip removing comments from functions and stored procedures--just don't add the "remove comments" code to the part of the make that builds the functions (or the file full of functions).
  • thanks for a tip, but this is also stripping comments from inside functions, is there a way to prevent it ?
  • @Szymon Lukaszczyk: I was just leaving the same comment for @samplebias :), look at my answer if you want something that works with your example.
  • @Szymon updated to show the raw tokens, which you can filter by iterating over them.
  • almost, still fails on the INSERT from my example
  • @Szymon I updated it to handle that case. This should provide enough of a baseline which you can extend and modify.
  • print(sqlparse.format("\n".join(new_sql))) returns u'\n' after copying the example to Python
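
The quote-counting idea from the first comment can be sketched roughly as follows. This is a line-based illustration under simplifying assumptions: the names UNESCAPED_QUOTE and drop_comment_lines are hypothetical, a quote preceded by a backslash is treated as escaped, and an apostrophe inside a trailing comment would throw the count off. As the second comment points out, it would still remove comments from function bodies.

```python
import re

# A quote counts only when it is not directly preceded by a backslash.
UNESCAPED_QUOTE = re.compile(r"(?<!\\)'")


def drop_comment_lines(lines):
    """Yield lines, skipping blank lines and whole-line '--' comments
    that occur outside single-quoted strings."""
    quotes_seen = 0
    for line in lines:
        inside_string = quotes_seen % 2 == 1  # odd count: a string is open
        if not inside_string:
            stripped = line.strip()
            if not stripped or stripped.startswith("--"):
                continue  # skipped lines do not contribute to the count
        quotes_seen += len(UNESCAPED_QUOTE.findall(line))
        yield line


if __name__ == "__main__":
    lines = [
        "-- normal sql comment\n",
        "\n",
        "INSERT INTO t VALUES ('\n",
        "and more lines here \\'\n",
        "-- part of text\n",
        "\\'\n",
        "\n",
        "');\n",
    ]
    print("".join(drop_comment_lines(lines)), end="")
```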