I have a multiline string which is delimited by a set of different delimiters:


I can split this string into its parts using String.split, but it seems that I can't get the actual string that matched the delimiter regex.

In other words, this is what I get:

  • Text1
  • Text2
  • Text3
  • Text4

This is what I want:

  • Text1
  • DelimiterA
  • Text2
  • DelimiterC
  • Text3
  • DelimiterB
  • Text4

Is there any JDK way to split the string using a delimiter regex but also keep the delimiters?

You can use Lookahead and Lookbehind. Like this:


And you will get:

[a;, b;, c;, d]
[a, ;b, ;c, ;d]
[a, ;, b, ;, c, ;, d]

The last one is what you want.

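((?<=;)|(?=;)) matches the empty (zero-width) position immediately after or before a ;, so the split happens on both sides of each delimiter and the delimiter itself is never consumed.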
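
A minimal sketch that reproduces the three results listed above, assuming the input string is "a;b;c;d". The class and method names are hypothetical and only serve the illustration:

```java
import java.util.Arrays;

// Hypothetical demo class; not part of the JDK or of the original answer.
class LookaroundSplitDemo {
    // Lookbehind only: split after each ';', so the delimiter stays with the preceding token.
    static String[] splitAfter(String s) {
        return s.split("(?<=;)");
    }

    // Lookahead only: split before each ';', so the delimiter stays with the following token.
    static String[] splitBefore(String s) {
        return s.split("(?=;)");
    }

    // Both: split on either side of ';', so the delimiter becomes its own element.
    static String[] splitAround(String s) {
        return s.split("((?<=;)|(?=;))");
    }

    public static void main(String[] args) {
        String input = "a;b;c;d"; // assumed input
        System.out.println(Arrays.toString(splitAfter(input)));  // [a;, b;, c;, d]
        System.out.println(Arrays.toString(splitBefore(input))); // [a, ;b, ;c, ;d]
        System.out.println(Arrays.toString(splitAround(input))); // [a, ;, b, ;, c, ;, d]
    }
}
```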

Hope this helps.

EDIT: Fabian Steeg's comment on readability is valid. Readability is always a problem with regular expressions. One thing I do to ease this is to create a variable whose name describes what the regex does, and use String.format to fill in the delimiter. Like this:

public static final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";
public void someMethod() {
    final String[] aEach = "a;b;c;d".split(String.format(WITH_DELIMITER, ";"));
}

This helps a little bit. :-D

A very naive solution that doesn't involve lookarounds would be to perform a string replace on your delimiter, along these lines (assuming a comma as the delimiter):

fullString.replace(",", "~,~")

Here you can replace the tilde (~) with any marker that is guaranteed not to occur in the string.

If you then split on the new marker, I believe you will get the desired result.
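
A rough sketch of this replace-then-split idea in Java. The helper name splitKeeping and its marker parameter are made up for the example, and it assumes the marker never appears in the input:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class for the marker-based approach.
class MarkerSplitDemo {
    // Wraps every literal delimiter in the marker, then splits on the marker.
    // Assumption: the marker does not occur anywhere in the input text.
    static String[] splitKeeping(String text, String delimiter, String marker) {
        String marked = text.replace(delimiter, marker + delimiter + marker);
        // Pattern.quote makes sure the marker is treated literally by split().
        return marked.split(Pattern.quote(marker));
    }

    public static void main(String[] args) {
        // "a,b,c" becomes "a~,~b~,~c", which splits into a | , | b | , | c
        System.out.println(Arrays.toString(splitKeeping("a,b,c", ",", "~")));
    }
}
```

Two adjacent delimiters leave two markers side by side, so an empty string shows up between them in the result.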


import java.util.regex.*;
import java.util.LinkedList;

public class Splitter {
    private static final Pattern DEFAULT_PATTERN = Pattern.compile("\\s+");

    private Pattern pattern;
    private boolean keep_delimiters;

    public Splitter(Pattern pattern, boolean keep_delimiters) {
        this.pattern = pattern;
        this.keep_delimiters = keep_delimiters;
    }
    public Splitter(String pattern, boolean keep_delimiters) {
        this(Pattern.compile(pattern==null?"":pattern), keep_delimiters);
    }
    public Splitter(Pattern pattern) { this(pattern, true); }
    public Splitter(String pattern) { this(pattern, true); }
    public Splitter(boolean keep_delimiters) { this(DEFAULT_PATTERN, keep_delimiters); }
    public Splitter() { this(DEFAULT_PATTERN); }

    public String[] split(String text) {
        if (text == null) {
            text = "";
        }

        int last_match = 0;
        LinkedList<String> splitted = new LinkedList<String>();

        Matcher m = this.pattern.matcher(text);

        while (m.find()) {
            // Text between the previous delimiter and this one.
            splitted.add(text.substring(last_match, m.start()));

            if (this.keep_delimiters) {
                // The delimiter itself becomes its own element.
                splitted.add(m.group());
            }

            last_match = m.end();
        }

        // Whatever follows the last delimiter (possibly the empty string).
        splitted.add(text.substring(last_match));

        return splitted.toArray(new String[splitted.size()]);
    }

    public static void main(String[] argv) {
        if (argv.length != 2) {
            System.err.println("Syntax: java Splitter <pattern> <text>");
            return;
        }

        Pattern pattern = null;
        try {
            pattern = Pattern.compile(argv[0]);
        }
        catch (PatternSyntaxException e) {
            System.err.println(e);
            return;
        }

        Splitter splitter = new Splitter(pattern);

        String text = argv[1];
        int counter = 1;
        for (String part : splitter.split(text)) {
            System.out.printf("Part %d: \"%s\"\n", counter++, part);
        }
    }
}

    > java Splitter "\W+" "Hello World!"
    Part 1: "Hello"
    Part 2: " "
    Part 3: "World"
    Part 4: "!"
    Part 5: ""

I don't really like the other way, where you get an empty element in front and back. A delimiter is usually not at the beginning or at the end of the string, thus you most often end up wasting two good array slots.

Edit: Fixed limit cases. Commented source with test cases can be found here:

How can I split a string in Java and retain the delimiters?

I got here late, but returning to the original question, why not just use lookarounds?

Pattern p = Pattern.compile("(?<=\\w)(?=\\W)|(?<=\\W)(?=\\w)");


[', ab, ',', cd, ',', eg, ']
[boo, :, and, :, foo]

EDIT: What you see above is what appears on the command line when I run that code, but I now see that it's a bit confusing. It's difficult to keep track of which commas are part of the result and which were added by Arrays.toString(). SO's syntax highlighting isn't helping either. In hopes of getting the highlighting to work with me instead of against me, here's how those arrays would look if I were declaring them in source code:

{ "'", "ab", "','", "cd", "','", "eg", "'" }
{ "boo", ":", "and", ":", "foo" }

I hope that's easier to read. Thanks for the heads-up, @finnw.
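
For reference, a short sketch that produces the two arrays shown above. The input strings are an assumption, inferred by concatenating the elements of each result, and the class name is hypothetical:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class for splitting at word/non-word boundaries.
class BoundarySplitDemo {
    // Matches the zero-width position between a word character and a
    // non-word character, in either order.
    static final Pattern BOUNDARY = Pattern.compile("(?<=\\w)(?=\\W)|(?<=\\W)(?=\\w)");

    static String[] split(String text) {
        return BOUNDARY.split(text);
    }

    public static void main(String[] args) {
        // Assumed inputs, reconstructed from the printed results.
        System.out.println(Arrays.toString(split("'ab','cd','eg'"))); // [', ab, ',', cd, ',', eg, ']
        System.out.println(Arrays.toString(split("boo:and:foo")));    // [boo, :, and, :, foo]
    }
}
```

Because the pattern only splits where the character class changes, a run of several non-word characters such as ',' stays together as one element.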

  • Come to think of it, where do you want to keep the delimiters? Along with words or separate? In the first case, would you attach them to preceding or following word? In the second case, my answer is what you need...
  • Just implemented a class which should help you achieve what you are looking for. See below
  • Very nice! Here we can see again the power of regular expressions!!
  • Nice to see there is a way to do this with String#split, though I wish there was a way to include the delimiters as there was for the StringTokenizer - split(";", true) would be so much more readable than split("((?<=;)|(?=;))").
  • That should be: String.format(WITH_DELIMITER, ";"); as format is a static method.
  • One complication I just encountered is variable-length delimiters (say [\\s,]+) that you want to match completely. The required regexes get even longer, as you need additional negative look{ahead,behind}s to avoid matching them in the middle, e.g. (?<=[\\s,]+)(?![\\s,])|(?<![\\s,])(?=[\\s,]+).
  • What if I want to split by two delimiters? Let's say ';' or '.'
  • Note that this will only work for relatively simple expressions; I got a "Look-behind group does not have an obvious maximum length" trying to use this with a regex representing all real numbers.
  • FYI: Merged from…
  • Wahoo... Thank you for participating! Interesting approach. I am not sure it can help consistently (with that, sometimes there is a delimiter, sometimes there is not), but +1 for the effort. However, you still need to properly address the limit cases (empty or null values).
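
On the question of splitting by two delimiters such as ';' or '.': one possible approach, sketched here under the assumption that both delimiters are single characters, is to pass a character class into the WITH_DELIMITER format string from the answer. The class and method names below are hypothetical:

```java
import java.util.Arrays;

// Hypothetical demo class; reuses the WITH_DELIMITER format string from the answer.
class MultiDelimiterSplitDemo {
    static final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";

    // delimiterRegex must have a bounded length, because it ends up inside a lookbehind.
    static String[] splitKeeping(String text, String delimiterRegex) {
        return text.split(String.format(WITH_DELIMITER, delimiterRegex));
    }

    public static void main(String[] args) {
        // The character class [;.] matches either ';' or a literal '.'.
        System.out.println(Arrays.toString(splitKeeping("a;b.c", "[;.]"))); // [a, ;, b, ., c]
    }
}
```

Inside a character class the dot is literal, so it needs no escaping there.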