Trouble Percent-Encoding Spaces in Java

I am using the URLUTF8Encoder.java class from W3C (www.w3.org/International/URLUTF8Encoder.java).

Currently, it encodes every space ' ' as a plus sign '+'.

I am having difficulty modifying the code so that a space is percent-encoded as '%20' instead. Unfortunately, I am not too familiar with hex. Can anyone help me out? I need to modify this snippet...

else if (ch == ' ') { // space
                sbuf.append('+');

in the following code:

final static String[] hex = { "%00", "%01", "%02", "%03", "%04", "%05",
            "%06", "%07", "%08", "%09", "%0A", "%0B", "%0C", "%0D", "%0E",
            "%0F", "%10", "%11", "%12", "%13", "%14", "%15", "%16", "%17",
            "%18", "%19", "%1A", "%1B", "%1C", "%1D", "%1E", "%1F", "%20",
            "%21", "%22", "%23", "%24", "%25", "%26", "%27", "%28", "%29",
            "%2A", "%2B", "%2C", "%2D", "%2E", "%2F", "%30", "%31", "%32",
            "%33", "%34", "%35", "%36", "%37", "%38", "%39", "%3A", "%3B",
            "%3C", "%3D", "%3E", "%3F", "%40", "%41", "%42", "%43", "%44",
            "%45", "%46", "%47", "%48", "%49", "%4A", "%4B", "%4C", "%4D",
            "%4E", "%4F", "%50", "%51", "%52", "%53", "%54", "%55", "%56",
            "%57", "%58", "%59", "%5A", "%5B", "%5C", "%5D", "%5E", "%5F",
            "%60", "%61", "%62", "%63", "%64", "%65", "%66", "%67", "%68",
            "%69", "%6A", "%6B", "%6C", "%6D", "%6E", "%6F", "%70", "%71",
            "%72", "%73", "%74", "%75", "%76", "%77", "%78", "%79", "%7A",
            "%7B", "%7C", "%7D", "%7E", "%7F", "%80", "%81", "%82", "%83",
            "%84", "%85", "%86", "%87", "%88", "%89", "%8A", "%8B", "%8C",
            "%8D", "%8E", "%8F", "%90", "%91", "%92", "%93", "%94", "%95",
            "%96", "%97", "%98", "%99", "%9A", "%9B", "%9C", "%9D", "%9E",
            "%9F", "%A0", "%A1", "%A2", "%A3", "%A4", "%A5", "%A6", "%A7",
            "%A8", "%A9", "%AA", "%AB", "%AC", "%AD", "%AE", "%AF", "%B0",
            "%B1", "%B2", "%B3", "%B4", "%B5", "%B6", "%B7", "%B8", "%B9",
            "%BA", "%BB", "%BC", "%BD", "%BE", "%BF", "%C0", "%C1", "%C2",
            "%C3", "%C4", "%C5", "%C6", "%C7", "%C8", "%C9", "%CA", "%CB",
            "%CC", "%CD", "%CE", "%CF", "%D0", "%D1", "%D2", "%D3", "%D4",
            "%D5", "%D6", "%D7", "%D8", "%D9", "%DA", "%DB", "%DC", "%DD",
            "%DE", "%DF", "%E0", "%E1", "%E2", "%E3", "%E4", "%E5", "%E6",
            "%E7", "%E8", "%E9", "%EA", "%EB", "%EC", "%ED", "%EE", "%EF",
            "%F0", "%F1", "%F2", "%F3", "%F4", "%F5", "%F6", "%F7", "%F8",
            "%F9", "%FA", "%FB", "%FC", "%FD", "%FE", "%FF" };

public static String encode(String s) {
        StringBuffer sbuf = new StringBuffer();
        int len = s.length();
        for (int i = 0; i < len; i++) {
            int ch = s.charAt(i);
            if ('A' <= ch && ch <= 'Z') { // 'A'..'Z'
                sbuf.append((char) ch);
            } else if ('a' <= ch && ch <= 'z') { // 'a'..'z'
                sbuf.append((char) ch);
            } else if ('0' <= ch && ch <= '9') { // '0'..'9'
                sbuf.append((char) ch);
            } else if (ch == ' ') { // space
                sbuf.append('+');
            } else if (ch == '-'
                    || ch == '_' // unreserved
                    || ch == '.' || ch == '!' || ch == '~' || ch == '*'
                    || ch == '\'' || ch == '(' || ch == ')') {
                sbuf.append((char) ch);
            } else if (ch <= 0x007f) { // other ASCII
                sbuf.append(hex[ch]);
            } else if (ch <= 0x07FF) { // non-ASCII <= 0x7FF
                sbuf.append(hex[0xc0 | (ch >> 6)]);
                sbuf.append(hex[0x80 | (ch & 0x3F)]);
            } else { // 0x7FF < ch <= 0xFFFF
                sbuf.append(hex[0xe0 | (ch >> 12)]);
                sbuf.append(hex[0x80 | ((ch >> 6) & 0x3F)]);
                sbuf.append(hex[0x80 | (ch & 0x3F)]);
            }
        }
        return sbuf.toString();
    }

Thanks!

I won't ask why you're doing this; I'll just answer your question directly. Please read the other answers to decide whether you really want to be modifying this code. If you just remove the code:

else if (ch == ' ') { // space
   sbuf.append('+');
} 

It will do what you want, because the space character (0x20) then falls through to the "other ASCII" branch, where hex[0x20] is "%20":

else if (ch <= 0x007f) { // other ASCII
   sbuf.append(hex[ch]);
} 
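
As a minimal sketch of what the encoder looks like after that change (this is not the W3C file itself): the class name PercentEncoder is hypothetical, and the hex table is built in a loop instead of being written out, but the branches otherwise follow the code in the question with the space branch dropped.

```java
// Hypothetical class name; a sketch of the question's encoder without the ' ' -> '+' branch.
class PercentEncoder {
    private static final String DIGITS = "0123456789ABCDEF";

    // Same contents as the question's table ("%00" .. "%FF"), generated instead of typed out.
    static final String[] hex = new String[256];
    static {
        for (int i = 0; i < 256; i++) {
            hex[i] = "%" + DIGITS.charAt(i >> 4) + DIGITS.charAt(i & 0x0F);
        }
    }

    // Like the original, this works on one UTF-16 char at a time,
    // so characters outside the Basic Multilingual Plane are not handled.
    static String encode(String s) {
        StringBuffer sbuf = new StringBuffer();
        for (int i = 0; i < s.length(); i++) {
            int ch = s.charAt(i);
            if (('A' <= ch && ch <= 'Z') || ('a' <= ch && ch <= 'z')
                    || ('0' <= ch && ch <= '9')) {
                sbuf.append((char) ch);
            } else if (ch == '-' || ch == '_' || ch == '.' || ch == '!' || ch == '~'
                    || ch == '*' || ch == '\'' || ch == '(' || ch == ')') {
                sbuf.append((char) ch); // unreserved
            } else if (ch <= 0x007F) {
                sbuf.append(hex[ch]); // other ASCII, which now includes ' ' (0x20)
            } else if (ch <= 0x07FF) { // two-byte UTF-8 sequence
                sbuf.append(hex[0xC0 | (ch >> 6)]);
                sbuf.append(hex[0x80 | (ch & 0x3F)]);
            } else { // three-byte UTF-8 sequence
                sbuf.append(hex[0xE0 | (ch >> 12)]);
                sbuf.append(hex[0x80 | ((ch >> 6) & 0x3F)]);
                sbuf.append(hex[0x80 | (ch & 0x3F)]);
            }
        }
        return sbuf.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Forrest Gump")); // Forrest%20Gump
    }
}
```

A side effect worth noticing is that a literal '+' in the input is still encoded as %2B by the "other ASCII" branch, so no '+' appears in the output at all.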

You might want to check out the Apache Commons Codec package, which is probably a lot more robust: http://commons.apache.org/codec/. The class you're using is about 14 years old and only produces one type of encoding (application/x-www-form-urlencoded), which REQUIRES spaces to be encoded as '+'. If you're trying to do standard URL encoding (which wants spaces as %20), you'll need to use a different package entirely.

Why are you using this class instead of the API method?

java.net.URLEncoder.encode("your string", "utf-8");

And why is it a problem that spaces are encoded as + characters? That is exactly how URL safe character encoding is supposed to work.

Just do this:

String str = "Hello World+You";
String encodedStr = URLEncoder.encode(str, "UTF-8");
encodedStr = encodedStr.replace("+", "%20");
System.out.println("Encoded String: " + encodedStr);
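
The snippet above can be wrapped in a small helper, sketched here under the assumption that java.net.URLEncoder is available (the class name SpaceAsPercent20 is hypothetical). The replacement is safe because URLEncoder turns a literal '+' in the input into %2B, so every '+' left in its output stands for a space.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper class; not part of any library.
class SpaceAsPercent20 {
    static String encode(String s) {
        try {
            // URLEncoder emits '+' only for spaces; a literal '+' comes out as "%2B".
            return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("Hello World+You")); // Hello%20World%2BYou
    }
}
```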

You can use the built-in java.net.URI class. It is normally created through its static factory method, as in URI.create("http://example.com/search?param=42"), but when a parameter contains a literal space you can use the multi-argument constructor, which quotes illegal characters for you:

URI uri = new URI("http", // scheme
    null,                 // user authentication info
    "example.com",        // domain
    -1,                   // port (-1 leaves the port out of the URI)
    "/search",            // path
    "param=four and two", // one or more parameters
    null);                // fragment (appended with the # char)
System.out.println(uri);
// OUTPUT:
// http://example.com/search?param=four%20and%20two

Passing -1 as the port leaves it undefined, so no port appears in the result and the scheme's default (80 for http) applies. Explicitly passing 80 will create a URL like http://example.com:80/search?param=four%20and%20two, which you probably do not want.

The same constructor can be used to build only the query part of the URL which you can append to an existing string:

URI uri2 = new URI(null, null, null, -1, null, "param=four and two", null);
System.out.println(uri2);
// OUTPUT:
// ?param=four%20and%20two

Might be worth mentioning that a URI is not the same as a URL: file:/// is a valid URI but not a valid URL.
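
One caveat with the URI constructor approach, sketched below with a hypothetical helper class UriQueryDemo: the constructor only quotes characters that are illegal in a URI, so characters such as '&', '=' and '+' that are legal in a query are passed through unchanged. If a parameter value itself may contain those characters, it has to be encoded separately before it is placed in the query string.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class; host and path are the example values used above.
class UriQueryDemo {
    // Returns the query exactly as it would appear in the final URI.
    static String rawQuery(String query) {
        try {
            URI uri = new URI("http", null, "example.com", -1, "/search", query, null);
            return uri.getRawQuery();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rawQuery("param=four and two")); // param=four%20and%20two
        System.out.println(rawQuery("q=a&b c"));            // q=a&b%20c
        System.out.println(rawQuery("q=1+1"));              // q=1+1
    }
}
```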

Comments
  • why do you need the + to be %20? they are both equivalent? permadi.com/tutorial/urlEncoding
  • Please see my response below, thanks.
  • @fuzzy lollipop: Alas, no. HTTP says it should be %20, it's the HTML specification that allows + instead of space. So, example.com/something%20here.php?q=a+string+with+spaces is valid, but example.com/something+here.php?q=a+string+with+spaces is not.
  • That did the trick, thanks very much for your help. I am not crazy about using such an old class, but like I mentioned in another post, I have no other choice (as I am developing on a BlackBerry platform) Thanks!
  • I am developing on a BlackBerry platform, which does not include the java.net API, unfortunately. The problem is that when I form a URL to make a request, ie: api.netflix.com/titles?more_stuff&term=Forrest+Gump it will not work unless it looks like this Forrest%20Gump (according to the Netflix API that I am using)
  • So why don't you simply remove the special if branch for spaces (else if (ch == ' ') { sbuf.append('+'); }) in the code you've pasted? In that case, spaces should fall into the "other ASCII" branch and be encoded as you expect.
  • "And why is it a problem that spaces are encoded as + characters? That is exactly how URL safe character encoding is supposed to work." HTML allows + for spaces, HTTP requires %20 for spaces.
  • Powerlord - for some reason the Netflix API doesn't like the "+" symbol
  • I wonder, could there be + in some meaning other than a space? Anyway, this rather seems like a fugly hack than a clean solution.
  • It is slow and includes unnecessary memory allocations (it can easily become hot code when processing lots of data). A regex Pattern is created from "+" implicitly by replace(), so it should be pre-compiled if you are URL-encoding in a loop. It also requires handling UnsupportedEncodingException (nonsensical, since UTF-8 is used).