Removing all whitespace lines from a multi-line string efficiently

In C#, what's the best way to remove blank lines (i.e., lines that contain only whitespace) from a string? I'm happy to use a Regex if that's the best solution.

EDIT: I should add I'm using .NET 2.0.


Bounty update: I'll roll this back after the bounty is awarded, but I wanted to clarify a few things.

First, any Perl 5 compat regex will work. This is not limited to .NET developers. The title and tags have been edited to reflect this.

Second, while I gave a quick example in the bounty details, it isn't the only test you must satisfy. Your solution must remove all lines which consist of nothing but whitespace, as well as the last newline. If a string, after running through your regex, ends with "\r\n" or any other whitespace characters, it fails.

If you want to remove lines that are empty or contain only whitespace (tabs, spaces), try:

string fix = Regex.Replace(original, @"^\s*$\n", string.Empty, RegexOptions.Multiline);

Edit (for @Will): The simplest solution to trim trailing newlines would be to use TrimEnd on the resulting string, e.g.:

string fix =
    Regex.Replace(original, @"^\s*$\n", string.Empty, RegexOptions.Multiline)
         .TrimEnd();
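
To make the behaviour concrete, here is a minimal sketch that wraps the pattern and the TrimEnd call in a helper. The class and method names (BlankLines.Strip) are hypothetical and only there for illustration; the pattern and options are the ones shown above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BlankLines
{
    // Hypothetical helper: removes empty and whitespace-only lines,
    // then trims whatever whitespace is left at the end of the string.
    public static string Strip(string original)
    {
        // With RegexOptions.Multiline, ^ matches at the start of each line
        // and $ matches just before each \n, so every match is one blank line.
        return Regex.Replace(original, @"^\s*$\n", string.Empty, RegexOptions.Multiline)
                    .TrimEnd();
    }
}
```

For the bounty input, BlankLines.Strip("test\r\n \r\nthis\r\n\r\n") gives "test\r\nthis". Note that TrimEnd also strips trailing spaces from the last content line, and that it allocates a second string, which may matter for very large inputs.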

The first half catches all whitespace at the start of the string until the first non-whitespace line, or all whitespace between non-whitespace lines. The second half snags the remaining whitespace in the string, including the last non-whitespace line's newline.

string outputString;
using (StringReader reader = new StringReader(originalString))
using (StringWriter writer = new StringWriter())
{
    string line;
    // ReadLine returns null once the end of the input is reached.
    while ((line = reader.ReadLine()) != null)
    {
        // Keep only lines that contain something other than whitespace.
        if (line.Trim().Length > 0)
            writer.WriteLine(line);
    }
    outputString = writer.ToString();
}

Off the top of my head:

string result = Regex.Replace(input, @"\s*(\n)", "$1");

turns this:

fdasdf
asdf
[tabs]

[spaces]  

asdf


into this:

fdasdf
asdf
asdf
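
A small sketch for trying this pattern out, assuming it is written as a verbatim string so that \s and \n reach the regex engine. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NewlineCollapse
{
    // Hypothetical helper: replaces every run of whitespace that ends in \n
    // with a single \n (the captured group).
    public static string Collapse(string input)
    {
        return Regex.Replace(input, @"\s*(\n)", "$1");
    }
}
```

Two things to be aware of with this approach: the result keeps one newline at the very end if the input ended with one, and because \s also consumes \r, "\r\n" line endings come out as plain "\n" (for example, "a\r\n\r\nb" becomes "a\nb").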

Using LINQ:

var result = string.Join("\r\n",
                 multilineString.Split(new string[] { "\r\n" }, ...None)
                                .Where(s => !string.IsNullOrWhiteSpace(s)));

If you're dealing with large inputs and/or inconsistent line endings you should use a StringReader and do the above old-school with a foreach loop instead.
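
As a rough sketch of that old-school approach, the following reads the input line by line and rebuilds it without the blank lines. It uses only .NET 2.0 features; the names BlankLineFilter and RemoveBlankLines are hypothetical, and joining with "\r\n" is an assumption about the desired output line ending.

```csharp
using System;
using System.IO;
using System.Text;

public static class BlankLineFilter
{
    // Hypothetical helper: keeps non-blank lines unchanged and joins them
    // with "\r\n", without adding a newline after the last line.
    public static string RemoveBlankLines(string input)
    {
        StringBuilder output = new StringBuilder(input.Length);
        using (StringReader reader = new StringReader(input))
        {
            string line;
            bool first = true;
            // ReadLine treats "\r\n", "\n" and "\r" as line terminators,
            // so mixed line endings in the input are handled.
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Trim().Length == 0)
                    continue; // skip empty and whitespace-only lines

                if (!first)
                    output.Append("\r\n");
                output.Append(line);
                first = false;
            }
        }
        return output.ToString();
    }
}
```

Unlike the regex answers, this does not trim trailing whitespace from the last content line; call TrimEnd on the result if that is required.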

Alright, this answer is in accordance with the clarified requirements specified in the bounty:

I also need to remove any trailing newlines, and my Regex-fu is failing. My bounty goes to anyone who can give me a regex which passes this test: StripWhitespace("test\r\n \r\nthis\r\n\r\n") == "test\r\nthis"

So here's the answer:

(?<=\r?\n)(\s*$\r?\n)+|(?<=\r?\n)(\r?\n)+|(\r?\n)+\z

Or in the C# code provided by @Chris Schmich:

string fix = Regex.Replace("test\r\n \r\nthis\r\n\r\n", @"(?<=\r?\n)(\s*$\r?\n)+|(?<=\r?\n)(\r?\n)+|(\r?\n)+\z", string.Empty, RegexOptions.Multiline);

Now let's try to understand it. There are three alternative patterns in here, each of which I want to replace with string.Empty.

  1. (?<=\r?\n)(\s*$\r?\n)+ - matches one or more lines containing only whitespace and preceded by a line break (but does not match that preceding line break).
  2. (?<=\r?\n)(\r?\n)+ - matches one or more empty lines with no content that are preceded by a line break (but does not match that preceding line break).
  3. (\r?\n)+\z - matches one or more line breaks at the end of the tested string (trailing line breaks, as you called them).

That satisfies your test, and it handles both \r\n and \n line break styles. Test it out! A simpler expression would pass your specified bounty test, but I believe this is the more correct answer because this regex also passes more complex conditions.

EDIT: @Will pointed out a potential flaw in the last pattern match of the above regex in that it won't match multiple line breaks containing white space at the end of the test string. So let's change that last pattern to this:

\b\s+\z

The \b is a word boundary (beginning or END of a word), the \s+ is one or more whitespace characters, and the \z is the end of the test string (end of "file"). So now it will match any assortment of whitespace at the end of the file, including tabs and spaces in addition to carriage returns and line breaks. I tested both of @Will's provided test cases.

So all together now, it should be:

(?<=\r?\n)(\s*$\r?\n)+|(?<=\r?\n)(\r?\n)+|\b\s+\z

EDIT #2: Alright, there is one more possible case @Will found that the last regex doesn't cover: inputs that have line breaks at the beginning of the file before any content. So let's add one more pattern to match the beginning of the file.

\A\s+ - The \A matches the beginning of the file, and the \s+ matches one or more whitespace characters.

So now we've got:

\A\s+|(?<=\r?\n)(\s*$\r?\n)+|(?<=\r?\n)(\r?\n)+|\b\s+\z

So now we have four patterns for matching:

  1. whitespace at the beginning of the file,
  2. redundant line breaks containing white space, (ex: \r\n \r\n\t\r\n)
  3. redundant line breaks with no content, (ex: \r\n\r\n)
  4. whitespace at the end of the file
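
To check the four cases, here is a minimal sketch that compiles the final pattern once and applies it. The method name StripWhitespace is borrowed from the bounty's test expression; the class name and the extra inputs mentioned below are assumptions added for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BountyRegex
{
    // The four-alternative pattern from above. RegexOptions.Multiline makes
    // $ match before each \n instead of only at the end of the string.
    private static readonly Regex BlankLinePattern = new Regex(
        @"\A\s+|(?<=\r?\n)(\s*$\r?\n)+|(?<=\r?\n)(\r?\n)+|\b\s+\z",
        RegexOptions.Multiline);

    // Replaces every match (leading whitespace, blank lines between content,
    // trailing whitespace) with the empty string.
    public static string StripWhitespace(string input)
    {
        return BlankLinePattern.Replace(input, string.Empty);
    }
}
```

With this, "test\r\n \r\nthis\r\n\r\n", "\r\n\r\ntest\r\nthis" and "test\r\nthis\r\n \t\r\n" all come out as "test\r\nthis". One caveat worth testing against your own data: because the last alternative starts with \b, trailing whitespace is only removed when the final non-whitespace character is a word character, so an input such as "done.\r\n" appears to be left unchanged.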

Comments
  • A regex is quick and simple. What aspect are you trying to optimize when you say "the best way"? Readability? Time? Memory Use?
  • I'd say readability would be the most important in this case.
  • Readability rarely equates to regular expressions
  • Agreed they can get pretty hairy, but I think the one by Chris Schmich, for example, is fine.
  • \s+ instead of \s* would be better I think
  • @Salman Chris' rx is correct, as is my lonely, unappreciated answer. ;-(
  • @Salman A: \s+ would not work on totally empty lines, e.g. "foo\n\nbar".
  • This works, but it can leave an extraneous newline at the end.
  • @ChrisSchmich: Yes, purely regex. When you have several 100 MB strings in memory, you don't want to create new instances that differ by only "\r\n". If I can get it in one pass, I can rest a little easier on the memory pressure.
  • +1 This one is nice since it should scale well for large strings.
  • Shouldn't this really be if (line.Trim().Length > 0) writer.WriteLine(line)? The OP did not request that all lines be trimmed in the output string.
  • What?! no love for the elegant regex? I am crushed.