Why does String.replaceAll() in Java require four backslashes ("\\\\") in the regex to replace a single "\"?


I recently noticed that String.replaceAll(regex, replacement) behaves very strangely when it comes to the escape character "\" (backslash).

For example, suppose a string holds a file path, String text = "E:\\dummypath", and we want to replace the "\\" with "/".

text.replace("\\","/") gives the output "E:/dummypath", whereas text.replaceAll("\\","/") throws java.util.regex.PatternSyntaxException.

To get the same result with replaceAll(), we need to write text.replaceAll("\\\\","/").
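The three calls described above can be reproduced with a small sketch. The class and method names here are made up for illustration; only String.replace, String.replaceAll and PatternSyntaxException come from the JDK.

```java
import java.util.regex.PatternSyntaxException;

class BackslashReplaceDemo {
    // Literal replacement: replace() treats both arguments as plain character sequences.
    static String withReplace(String path) {
        return path.replace("\\", "/");
    }

    // Regex replacement: the Java literal "\\\\" is the two-character regex \\,
    // which matches exactly one backslash.
    static String withReplaceAll(String path) {
        return path.replaceAll("\\\\", "/");
    }

    // Returns true if the one-character regex \ is rejected by the regex compiler.
    static boolean singleBackslashFails(String path) {
        try {
            path.replaceAll("\\", "/");
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String text = "E:\\dummypath"; // the actual characters are E:\dummypath
        System.out.println(withReplace(text));
        System.out.println(withReplaceAll(text));
        System.out.println(singleBackslashFails(text));
    }
}
```

The lone backslash fails because, as a regex, it is an escape with nothing after it to escape.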

One notable difference is that replaceAll() takes its first argument as a regex, whereas replace() takes its arguments as plain character sequences!

But text.replaceAll("\n","/") works exactly the same as its char-sequence equivalent text.replace("\n","/")

Digging deeper: even stranger behavior shows up when we try some other inputs.

Let's assign text="Hello\nWorld\n"

Now, text.replaceAll("\n","/"), text.replaceAll("\\n","/") and text.replaceAll("\\\n","/") all give the same output: Hello/World/

I feel Java has really messed up regex in the best possible way! No other language seems to have these playful regex behaviors. Is there any specific reason why Java ended up like this?

@Peter Lawrey's answer describes the mechanics. The "problem" is that backslash is an escape character in both Java string literals, and in the mini-language of regexes. So when you use a string literal to represent a regex, there are two sets of escaping to consider ... depending on what you want the regex to mean.

But why is it like that?

It is a historical thing. Java originally didn't have regexes at all. The syntax rules for Java String literals were borrowed from C / C++, which also didn't have built-in regex support. The awkwardness of double escaping didn't become apparent in Java until regex support was added in the form of the Pattern class ... in Java 1.4.

So how do other languages manage to avoid this?

They do it by providing direct or indirect syntactic support for regexes in the programming language itself. For instance, in Perl, Ruby, JavaScript and many other languages, there is a syntax for patterns / regexes (e.g. '/pattern/') where string literal escaping rules do not apply. C# and Python provide an alternative "raw" string literal syntax in which backslashes are not escapes. (But note that if you use the normal C# / Python string syntax, you have the Java problem of double escaping.)


Why do text.replaceAll("\n","/"), text.replaceAll("\\n","/"), and text.replaceAll("\\\n","/") all give the same output?

The first case is a newline character at the String level. The Java regex language treats all non-special characters as matching themselves.

The second case is a backslash followed by an "n" at the String level. The Java regex language interprets a backslash followed by an "n" as a newline.

The final case is a backslash followed by a newline character at the String level. The Java regex language doesn't recognize this as a specific (regex) escape sequence. However in the regex language, a backslash followed by any non-alphabetic character means the latter character. So, a backslash followed by a newline character ... means the same thing as a newline.
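The three cases can be checked side by side. This is only a sketch with made-up class and method names; it prints the length of each regex string so the difference at the String level is visible even though the results agree.

```java
class NewlinePatternDemo {
    // The three regex arguments from the question, written as Java string literals.
    static final String[] PATTERNS = {
        "\n",   // one char: a newline, which the regex matches as itself
        "\\n",  // two chars: backslash + 'n', the regex escape for a newline
        "\\\n"  // two chars: backslash + newline, an escaped non-alphabetic char
    };

    static String replaceNewlines(String text, String regex) {
        return text.replaceAll(regex, "/");
    }

    public static void main(String[] args) {
        String text = "Hello\nWorld\n";
        for (String regex : PATTERNS) {
            System.out.println(regex.length() + " char(s) -> " + replaceNewlines(text, regex));
        }
    }
}
```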

Note that the doubling is required because \ is an escape character in Java string (and character) literals. Because replace doesn't treat its inputs as regular expression patterns, there's no need for further doubling, unlike with replaceAll.

You need to escape twice, once for Java and once for the regex.

The Java source literal

"\\\\"

produces a regex string of

\\ (two characters)

and the regex engine treats the first backslash as an escape for the second, so the pattern matches

\ (one character)


1) Let's say you want to replace a single \ using Java's replaceAll method:

   \
   ˪--- 1) the final backslash

2) Java's replaceAll method takes a regex as first argument. In a regex literal, \ has a special meaning, e.g. in \d which is a shortcut for [0-9] (any digit). The way to escape a metachar in a regex literal is to precede it with a \, which leads to:

 \ \
 | ˪--- 1) the final backslash
 |
 ˪----- 2) the backslash needed to escape 1) in a regex literal

3) In Java, there is no regex literal: you write a regex in a string literal (unlike JavaScript for example, where you can write /\d+/). But in a string literal, \ also has a special meaning, e.g. in \n (a new line) or \t (a tab). The way to escape a metachar in a string literal is to precede it with a \, which leads to:

\\\\
|||˪--- 1) the final backslash
||˪---- 3) the backslash needed to escape 1) in a string literal
|˪----- 2) the backslash needed to escape 1) in a regex literal
˪------ 3) the backslash needed to escape 2) in a string literal
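The two layers in the diagram can be observed separately in code. The following is a minimal sketch (the class and method names are invented here): the string's length shows what is left after the compiler's pass, and Pattern.matches shows what the regex engine makes of it.

```java
import java.util.regex.Pattern;

class EscapeLayersDemo {
    // Four backslashes in source; the compiler stores only two characters.
    static final String REGEX = "\\\\";

    static int storedLength() {
        return REGEX.length();
    }

    // Pattern.matches requires the whole input to match the regex.
    static boolean matchesOneBackslash() {
        return Pattern.matches(REGEX, "\\");   // input is a single backslash
    }

    static boolean matchesTwoBackslashes() {
        return Pattern.matches(REGEX, "\\\\"); // input is two backslashes
    }

    public static void main(String[] args) {
        System.out.println(storedLength());
        System.out.println(matchesOneBackslash());
        System.out.println(matchesTwoBackslashes());
    }
}
```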


This is because Java tries to give \ a special meaning in the replacement string, so that \$ will be a literal $ sign, but in the process they seem to have removed the actual special meaning of \

While text.replaceAll("\\\\","/") can at least be considered okay in some sense (though it is not entirely right), the fact that all three calls, text.replaceAll("\n","/"), text.replaceAll("\\n","/") and text.replaceAll("\\\n","/"), give the same output seems even stranger. It contradicts the reasoning by which text.replaceAll("\\","/") is rejected.

Java didn't mess up with regular expressions. It is because, Java likes to mess up with coders by trying to do something unique and different, when it is not at all required.

The signature of replaceAll() is replaceAll(String regex, String replacement): the first parameter is a regular expression (regex) and the second is the replacement. To match a metacharacter literally, it must first be escaped so that it is treated as a regular character.

One way around this problem is to replace backslash with another character, use that stand-in character for intermediate replacements, then convert it back into backslash at the end. For example, to convert "\r\n" to "\n":

String out = in.replace('\\','@').replaceAll("@r@n","@n").replace('@','\\');

Of course, that won't work very well if you choose a replacement character that can occur in the input string.
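As a rough illustration of both the trick and its caveat, here is a sketch that wraps the one-liner above in a method and compares it with a plain literal replace(), which needs no stand-in for this particular conversion. The class and method names are made up, and the input is assumed to contain the literal characters backslash-r backslash-n rather than real line breaks.

```java
class StandInDemo {
    // The stand-in approach: '@' temporarily takes the place of the backslash.
    static String viaStandIn(String in) {
        return in.replace('\\', '@').replaceAll("@r@n", "@n").replace('@', '\\');
    }

    // Literal replacement: no regex is involved, so only Java-level escaping is needed.
    static String viaLiteralReplace(String in) {
        return in.replace("\\r\\n", "\\n");
    }

    public static void main(String[] args) {
        String in = "one\\r\\ntwo"; // the characters one\r\ntwo, with literal backslashes
        System.out.println(viaStandIn(in));
        System.out.println(viaLiteralReplace(in));
        // Caveat: an '@' already present in the input is turned into a backslash.
        System.out.println(viaStandIn("a@b"));
    }
}
```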


The problem is actually that you need to double-escape backslashes in the replacement string. "\\/" (as I'm sure you know) means the replacement string is \/, and (as you probably don't know) the replacement string \/ actually just inserts /, because Java gives \ a special meaning in the replacement string.
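The effect of the backslash in the replacement string can be seen in a short sketch. The class and method names and the sample inputs are hypothetical; the behavior shown is that of String.replaceAll.

```java
class ReplacementEscapeDemo {
    // Replacement literal "\\/" is the two characters \/ ; the backslash
    // escapes '/', so only '/' is inserted.
    static String slashToEscapedSlash(String s) {
        return s.replaceAll("/", "\\/");
    }

    // Replacement literal "\\\\" is the two characters \\ ; one backslash is inserted.
    static String slashToBackslash(String s) {
        return s.replaceAll("/", "\\\\");
    }

    // Replacement literal "\\$5" is \$5 ; the escaped '$' is inserted literally
    // instead of being read as the start of a group reference.
    static String priceTag(String s) {
        return s.replaceAll("PRICE", "\\$5");
    }

    public static void main(String[] args) {
        System.out.println(slashToEscapedSlash("a/b"));
        System.out.println(slashToBackslash("a/b"));
        System.out.println(priceTag("PRICE"));
    }
}
```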

A backslash is a special character in a Java string literal, and it is also a special character in a regex. The backslash has to be escaped with another backslash for the regex engine to treat it as a literal '\' character, and both of those backslashes have to be escaped again so that the string literal contains them as literal '\' characters.

The problem here is that a backslash is (1) an escape character in Java string literals and (2) an escape character in regular expressions. Each of these uses requires doubling the character, so in effect you need 4 \ in a row. Of course, as Bozho said, you need to do something with the result (assign it to some variable) and not throw it away.

Comments
  • I completely agree with your last statement.
  • String literals and regex expressions.
  • Yesterday I wrote str.replaceAll("\\\\", "\\\\\\\\") So don't feel too bad.
  • @Cruncher I understand :) To replace "\" with "\\" we have to go through that!
  • @Bharath And the reason I replaced "\" with "\\" was so that I would get the escape for a regular expression used later.
  • Escape twice? Once for Java and once for regex? Then by the same logic, can you explain to me why text.replaceAll("\n","/"), text.replaceAll("\\n","/") and text.replaceAll("\\\n","/") all give the same output? Where does the "escape twice, once for Java and once for regex" logic go there?
  • \n is plain newline character, this conversion is at compile time. \\n tells the regex Pattern processors to decode the newline, something it does at runtime. The \\\n means take the \n as a literal. This ends up being the same thing in a more complicated way. The reason there are multiple ways of doing this, is that the `\` is supported for all literal characters and just happens to work for characters which are taken literally anyway.
  • Yeah I understand now, particularly after Stephen's answer :) Thanks a lot for the help :)
  • I think this is an awesome answer, which clearly explains what is going on behind the scenes. Surprising that I am first upvoter !
  • The behavior you're talking about is not incorrect, it's merely inconvenient. If you understand the escaping rules for string literals and regexes, it makes perfect sense. It might have been better if they had used $$ (like the .NET flavor does) instead of \$ to escape dollar signs in the replacement string, but the current design is not broken.
  • You are right. Before Stephen's answer I didn't understand that they had to do it this way because regexes were added later, when "\" already had a special meaning in string literals, hence the double escaping. Now I understand the escaping rules of both string literals and regexes, and it makes perfect sense :)
  • quoteReplacement() is meant to be used on the replacement string, which is the second argument. The first argument is the regex, and to escape it you use java.util.regex.Pattern.quote(String).
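A short sketch of the two helpers mentioned in the last comment, applied to the "replace \ with \\" case from the earlier comment. Pattern.quote and Matcher.quoteReplacement are JDK methods; the class name, method names and sample input are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteHelpersDemo {
    // Hand-escaped version: regex \\ (one backslash), replacement \\\\ (two backslashes).
    static String doubleBackslashesManually(String s) {
        return s.replaceAll("\\\\", "\\\\\\\\");
    }

    // Same operation, letting the library add the regex-level escaping.
    static String doubleBackslashesQuoted(String s) {
        String regex = Pattern.quote("\\");                    // matches one literal backslash
        String replacement = Matcher.quoteReplacement("\\\\"); // inserts two literal backslashes
        return s.replaceAll(regex, replacement);
    }

    public static void main(String[] args) {
        String path = "E:\\dummypath"; // the characters E:\dummypath
        System.out.println(doubleBackslashesManually(path));
        System.out.println(doubleBackslashesQuoted(path));
    }
}
```

With the helpers, only the Java string-literal layer of escaping remains in the source.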