Split string by line breaks, keeping quoted segments

I have a CSV file that is read into a string, and I need to split that string on line breaks while keeping quoted segments intact. Quotes are used because some fields in the file contain line breaks.

Basically, I have a file that is like this (I'm using | to represent the separator):

This is | a | line

This is | a line too | "but this field has

a line break"

This is | another | line

I know I can use a regex with the .split() function, but I'm having trouble with it. Can anyone help?

I'm expecting an array like ["This is | a | line", "This is | a line too | but this field has\na line break", "This is | another | line"]


The simplest solution is to first mark (replace with some identifier) the line breaks that we do not want to split at.

Then split at all the other line breaks, and finally replace the identifiers with line breaks (\n) again.

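// Mark line breaks inside quoted segments, split on the rest, then restore the marked ones.
const arr = str.replace(/("[\s\S]*?")/g, (m, cg) => {
        return cg.replace(/\n/g, "LINE-BREAK-TO-PRESERVE");
      })
      .split('\n')
      .filter(i => Boolean(i.trim())) // drop empty lines
      .map(i => i.replace(/LINE-BREAK-TO-PRESERVE/g, '\n'));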

The above code should serve your purpose :)


Honestly this is a simple enough problem that even regex seems like overkill. I would just iterate through the string, and whenever you find a line break and aren't inside a quote, push the substring that you've found so far onto the array:

var arr = []
var inQuote = false;
var str = `This is | a | line
This is | a line too | "but this field has
a line break"
This is | another | line`
for (var pos = 0; pos < str.length; pos++) {
    if (str.charAt(pos) == "\n" && !inQuote) {
        arr.push(str.slice(0, pos));
        str = str.slice(pos + 1);
        pos = -1; // the loop's pos++ brings this back to 0, so the first character of the next line is not skipped
    } else if (str.charAt(pos) == '"') {
        inQuote = !inQuote;
        // if you want to get rid of the quotes:
        str = str.slice(0, pos) + str.slice(pos + 1)
        pos--
    }
}
arr.push(str)
console.log(arr)
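
The same idea can be wrapped in a function that leaves the input string untouched. This is only a minimal sketch: the name splitLinesOutsideQuotes and the stripQuotes flag are made up for illustration, it assumes plain \n line breaks (a \r from \r\n input would stay at the end of each line), and a doubled quote ("") used as an escape is simply treated as two toggles.

```javascript
// Hypothetical helper: splits text on line breaks that fall outside double quotes.
function splitLinesOutsideQuotes(text, stripQuotes = true) {
  const lines = [];
  let current = "";
  let inQuote = false;
  for (const ch of text) {
    if (ch === '"') {
      inQuote = !inQuote;               // every quote character toggles the state
      if (!stripQuotes) current += ch;  // keep the quote only when asked to
    } else if (ch === "\n" && !inQuote) {
      lines.push(current);              // an unquoted line break ends the record
      current = "";
    } else {
      current += ch;
    }
  }
  lines.push(current);                  // the last record has no trailing line break
  return lines;
}

const sample = 'This is | a | line\nThis is | a line too | "but this field has\na line break"\nThis is | another | line';
console.log(splitLinesOutsideQuotes(sample));
```

Building each record character by character avoids re-slicing the string and resetting the loop index, which is where the off-by-one mistakes tend to creep in.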


Try this:

("[^"\n]*)\r?\n(?!(([^"]*"){2})*[^"]*$)

Demo: https://regex101.com/r/wL9sQ4/82
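
The answer does not say how the pattern is meant to be used. Reading it, it matches a line break that sits inside a quoted field (the lookahead fails only when an even number of quotes follows), so one way to apply it is to swap those breaks for a marker, split on the remaining breaks, and put the marked ones back. The sketch below assumes that reading. The names quotedBreak, MARK and splitWithPattern are made up, and the marker is assumed never to occur in the data.

```javascript
// The pattern from above, with the global flag added so every match is replaced.
const quotedBreak = /("[^"\n]*)\r?\n(?!(([^"]*"){2})*[^"]*$)/g;
// Hypothetical marker; assumed never to appear in the input.
const MARK = "\u0000";

function splitWithPattern(text) {
  return text
    .replace(quotedBreak, (match, before) => before + MARK) // keep the captured text, mark the break
    .split(/\r?\n/)                                         // split on the breaks that are left
    .map(line => line.split(MARK).join("\n"));              // restore the marked breaks
}

const input = 'This is | a | line\nThis is | a line too | "but this field has\na line break"\nThis is | another | line';
console.log(splitWithPattern(input));
```

Note that the capture group allows no line break between the opening quote and the matched break, so only the first break in a quoted field is caught. A field with two or more line breaks would still be split.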


As another answer suggests, a loop may be the better choice: even when the separator is known, it is hard for a regex to tell whether a quote is literal data or is delimiting a field.

That said, this regex should serve the purpose for the given case:

/(?<!\|\s+"[\w\s]+)\n/
console.log(
`This is | a | line
This is | a line too | "but this field has
a line break"
This is | another | line`.split(/(?<!\|\s+"[\w\s]+)\n/)
)

(?<! ... ) is a negative lookbehind: the \n outside the parentheses matches only when the text immediately before it does not match the pattern inside the parentheses.

That pattern is the separator (\|), followed by one or more whitespace characters (\s+), then a quote ("), then one or more word or whitespace characters ([\w\s]+).

Hope this helps. \s+ can be changed to \s* and [\w\s]+ to [^"]* as desired.
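
As a rough illustration of that last suggestion, here is the relaxed variant run against a field with no space after the separator and some punctuation inside the quotes. The sample strings and variable names are made up for the example, and lookbehind needs an ES2018-capable engine.

```javascript
// Variant suggested above: \s* instead of \s+, and [^"]* instead of [\w\s]+.
const relaxed = /(?<!\|\s*"[^"]*)\n/;

const text = 'This is | a | line\nThis is | a line too |"but this field has, um,\na line break"\nThis is | another | line';
const rows = text.split(relaxed);
console.log(rows.length); // 3

// Limitation: a quoted field at the start of a line has no separator before it,
// so the lookbehind does not protect its line break.
const leading = '"first field\nwith a break" | b';
console.log(leading.split(relaxed).length); // 2
```

The second case shows why this stays a solution for the given input rather than for CSV in general.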
