URL encoding the space character: + or %20?

When is a space in a URL encoded to +, and when is it encoded to %20?

From Wikipedia (emphasis and link added):

When data that has been entered into HTML forms is submitted, the form field names and values are encoded and sent to the server in an HTTP request message using method GET or POST, or, historically, via email. The encoding used by default is based on a very early version of the general URI percent-encoding rules, with a number of modifications such as newline normalization and replacing spaces with "+" instead of "%20". The MIME type of data encoded this way is application/x-www-form-urlencoded, and it is currently defined (still in a very outdated manner) in the HTML and XForms specifications.

So, the real percent encoding uses %20 while form data in URLs is in a modified form that uses +. So you're most likely to only see + in URLs in the query string after an ?.

URL encoding the space character: + or %20?, The characters allowed in a URI are either reserved or unreserved (or a percent character as part of a percent-encoding). Reserved characters are those... To submit these characters in a URL, they are converted into a special format called URL encoding or percent-encoding. Instead of the character itself, its code in the character set is given in hexadecimal, preceded by a percent sign. Thus, a space " " will turn into %20, and the umlaut ä into %E4 (its ISO-8859-1 code; encoded as UTF-8 it becomes %C3%A4).

This confusion is because URLs are still 'broken' to this day.

Take "http://www.google.com" for instance. This is a URL. A URL is a Uniform Resource Locator and is really a pointer to a web page (in most cases). URLs have actually had a very well-defined structure since the first specification in 1994.

We can extract detailed information about the "http://www.google.com" URL:

+---------------+-------------------+
|      Part     |      Data         |
+---------------+-------------------+
|  Scheme       | http              |
|  Host         | www.google.com    |
+---------------+-------------------+

If we look at a more complex URL such as:

"https://bob:bobby@www.lunatech.com:8080/file;p=1?q=2#third"

we can extract the following information:

+-------------------+---------------------+
|        Part       |       Data          |
+-------------------+---------------------+
|  Scheme           | https               |
|  User             | bob                 |
|  Password         | bobby               |
|  Host             | www.lunatech.com    |
|  Port             | 8080                |
|  Path             | /file;p=1           |
|  Path parameter   | p=1                 |
|  Query            | q=2                 |
|  Fragment         | third               |
+-------------------+---------------------+

https://bob:bobby@www.lunatech.com:8080/file;p=1?q=2#third
\___/   \_/ \___/ \______________/ \__/\_______/ \_/ \___/
  |      |    |          |          |      | \_/  |    |
Scheme User Password    Host       Port  Path |   | Fragment
        \_____________________________/       | Query
                       |               Path parameter
                   Authority
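
As a rough illustration of the breakdown above, Python's standard urllib.parse module can split the same URL into its parts. This is only a sketch with variable names of my own choosing. Note that urlparse() reports the path parameter separately, while urlsplit() leaves it inside the path as in the table.

```python
from urllib.parse import urlparse, urlsplit

url = "https://bob:bobby@www.lunatech.com:8080/file;p=1?q=2#third"

# urlparse() separates the parameter of the last path segment from the path.
parts = urlparse(url)
print(parts.scheme)    # https
print(parts.username)  # bob
print(parts.password)  # bobby
print(parts.hostname)  # www.lunatech.com
print(parts.port)      # 8080
print(parts.path)      # /file
print(parts.params)    # p=1
print(parts.query)     # q=2
print(parts.fragment)  # third

# urlsplit() keeps the parameter as part of the path.
full_path = urlsplit(url).path
print(full_path)       # /file;p=1
```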

The reserved characters are different for each part.

For HTTP URLs, a space in a path segment has to be encoded to "%20" (not, absolutely not "+"), while the "+" character in a path segment can be left unencoded.

Now in the query part, spaces may be encoded to either "+" (for backwards compatibility: do not try to search for it in the URI standard) or "%20" while the "+" character (as a result of this ambiguity) has to be escaped to "%2B".

This means that the "blue+light blue" string has to be encoded differently in the path and query parts:

"http://example.com/blue+light%20blue?blue%2Blight+blue".

From there you can deduce that encoding a fully constructed URL is impossible without a syntactical awareness of the URL structure.
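
To make this concrete, here is a minimal Python sketch that reproduces the example URL by encoding each part on its own before joining them. The variable names are mine, and passing safe="+" to quote() is my way of leaving "+" untouched in the path; other libraries will spell this differently.

```python
from urllib.parse import quote, quote_plus

text = "blue+light blue"

# Path segment: space -> %20, "+" may stay as it is.
path_segment = quote(text, safe="+")

# Query (form style): space -> "+", so a literal "+" must become %2B.
query_value = quote_plus(text)

url = "http://example.com/" + path_segment + "?" + query_value
print(url)  # http://example.com/blue+light%20blue?blue%2Blight+blue
```

The same input string yields two different encodings, which is why the parts have to be encoded before the URL is assembled, not after.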

This boils down to:

You should have %20 before the ? and + after.
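
The same rule applies when decoding. The sketch below (Python, names are my own) splits the example URL first and then decodes the path with unquote() and the query with unquote_plus(). Decoding the whole URL in one go with unquote_plus() would wrongly turn the "+" in the path into a space.

```python
from urllib.parse import urlsplit, unquote, unquote_plus

url = "http://example.com/blue+light%20blue?blue%2Blight+blue"
parts = urlsplit(url)

# In the path, "+" is a literal plus sign; only %20 is a space.
decoded_path = unquote(parts.path)
print(decoded_path)   # /blue+light blue

# In a form-encoded query, "+" means space and %2B is a literal plus.
decoded_query = unquote_plus(parts.query)
print(decoded_query)  # blue+light blue

# Decoding without looking at the structure corrupts the path.
naive = unquote_plus(url)
print(naive)          # http://example.com/blue light blue?blue+light blue
```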

Source

URL Encode and Decode, For example, spaces in a string are either encoded with %20 or replaced with the plus sign ( + ). If you use a pipe character ( | ) as a separator, be... The space's position in the character set is 20 in hexadecimal. So you can use %20 in place of a space when passing your request to the server: http://www.example.com/new%20pricing.htm. This URL actually retrieves a document named "new pricing.htm" from www.example.com.

I would recommend %20.

Are you hard-coding them?

This is not very consistent across languages, though. If I'm not mistaken, in PHP urlencode() treats spaces as + whereas Python's urlencode() treats them as %20.

EDIT:

It seems I'm mistaken. Python's urlencode() (at least in 2.7.2) uses quote_plus() instead of quote() and thus encodes spaces as "+". It seems also that the W3C recommendation is the "+" as per here: http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.1

And in fact, you can follow this interesting debate on Python's own issue tracker about what to use to encode spaces: http://bugs.python.org/issue13866.
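
For what it's worth, here is a small sketch of how this looks in Python 3, where the function lives in urllib.parse. By default urlencode() uses quote_plus(), but from Python 3.5 onwards you can pass quote_via=quote to get %20 instead. The params dictionary is just an illustrative value.

```python
from urllib.parse import urlencode, quote

params = {"q": "hello world"}

# Default: form-style encoding via quote_plus(), space -> "+".
form_style = urlencode(params)
print(form_style)     # q=hello+world

# With quote_via=quote (Python 3.5+), space -> %20.
percent_style = urlencode(params, quote_via=quote)
print(percent_style)  # q=hello%20world
```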

EDIT #2:

I understand that the most common way of encoding " " is as "+", but just a note, it may be just me, but I find this a bit confusing:

from urllib.parse import urlencode  # Python 3; in Python 2.7 this was urllib.urlencode
print(urlencode({' ': '+ '}))

# prints: +=%2B+

URL Encoding | Maps URLs, URLs use the ASCII charset. URL encoding replaces space characters with "%20" (percent followed by the ASCII code for a blank space). No. Spaces in URIs/URLs should be encoded using %20.

A space may only be encoded to "+" in the "application/x-www-form-urlencoded" content-type key-value pairs in the query part of a URL. In my opinion, this is a MAY, not a MUST. In the rest of the URL, it is encoded as %20.

In my opinion, it's better to always encode spaces as %20, not as "+", even in the query part of a URL, because it was the HTML specification (RFC 1866), not the URI standard, that specified that space characters should be encoded as "+" in "application/x-www-form-urlencoded" content-type key-value pairs (see section 8.2.1, subparagraph 1).

This way of encoding form data is also given in later HTML specifications. For example, look for relevant paragraphs about application/x-www-form-urlencoded in HTML 4.01 Specification, and so on.

Here is a sample URL where the HTML specification allows encoding spaces as pluses: "http://example.com/over/there?name=foo+bar". So, only after "?" can spaces be replaced by pluses. In other cases, spaces should be encoded as %20. But since it's hard to correctly determine the context, the best practice is to never encode spaces as "+".

I would recommend percent-encoding all characters except the "unreserved" ones defined in RFC 3986, section 2.3:

unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"

The implementation depends on the programming language that you choose.

If your URL contains national characters, first encode them to UTF-8 and then percent-encode the result.
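
In Python, for example, this could be sketched as follows. encode_component() is a hypothetical helper name of mine, not a library function. It relies on urllib.parse.quote() with safe="" so that nothing beyond the unreserved set is left alone (Python 3.7 and later treat "~" as unreserved; older versions encode it).

```python
from urllib.parse import quote

def encode_component(text: str) -> str:
    """Hypothetical helper: percent-encode everything except RFC 3986
    unreserved characters, encoding non-ASCII text as UTF-8 first."""
    return quote(text, safe="", encoding="utf-8")

print(encode_component("new pricing.htm"))  # new%20pricing.htm
print(encode_component("ä"))                # %C3%A4
print(encode_component("a b+c/d"))          # a%20b%2Bc%2Fd
```

Because "/" is encoded too, this is meant for a single path segment or a single query key or value, not for a whole path.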

Spaces in Link and Image URL/URI (HTML Encoding), The <space> character also needs to be encoded because it is not allowed in a valid URL. Also, some characters, such as "~", might not transport properly... The <space> character has been URL encoded as "+". The & character has been URL encoded as "%26". The <space> and & characters are just some of the special characters that need to be encoded.

Introduction to URL Encoding, Whitespace characters in URLs are considered unsafe, so most modern CMS platforms will not generate URLs with whitespace. Through encoding (either as... Space: One of the most frequent URL-encoded characters you're likely to encounter is the space. The ASCII value of the space character in decimal is 32, which when converted to hex comes out to be 20. Now we just precede the hexadecimal representation with a percent sign (%), which gives us the URL-encoded value: %20.

URL contains whitespace, Typically it is used to encode a name in a given name space, or an algorithm for... The percent sign ("%", ASCII 25 hex) is used as the escape character in the... In this case the %20 is the escape sequence for the space.

Universal Resource Identifiers: Recommendations, URL Encoding converts reserved, unsafe, and non-ASCII characters in URLs to a... URLs consist of a sequence of characters within the US-ASCII coded character set. URL encoding is a method for representing special characters with a "%" followed by the two hexadecimal digits that form the hexadecimal value of the character within the US-ASCII coded character set. Safe characters include alphanumerical values [0-9, a-z, A-Z] and certain special characters -_.* (dash, underscore, period and asterisk).

Comments
  • This question would be more helpful as several language-specific questions, right?
  • Possible duplicate of When to encode space to plus (+) or %20?
  • @user the question you link to was asked later, making it the dupe, not this one.
  • So + encoding would technically be multipart/form-data encoding, while percent encoding is application/x-www-form-urlencoded?
  • @BC: no - multipart/form-data uses MIME encoding; application/x-www-form-urlencoded uses + and properly encoded URIs use %20.
  • "So you're most likely to only see + in URLs in the query string after an ?" Is an understatement. You should never see "+" in the path part of the URL because it will not do what you expect (space).
  • So basically: Target of GET submission is http://www.bing.com/search?q=hello+world and a resource with space in the name http://camera.phor.net/cameralife/folders/2012/2012-06%20Pool%20party/
  • Note that for email links, you do need %20 and not + after the ?. For example, mailto:support@example.org?subject=I%20need%20help. If you tried that with +, the email will open with +es instead of spaces.
  • >> you should have %20 before the ? and + after Sorry for the silly question. I know a bit somehow that hashtag parameter is used after "?" question mark parameter. Though it is somehow different because using "#" does not reload the page. But I've been trying to use %20 and + sign after the "#" hashtag, and it seems not working. Which one needs to be used after "#"?
  • @Philcyb You might wanna read this en.wikipedia.org/wiki/Percent-encoding
  • Does the query part actually have an "official" standard? I thought basically that part is application specific. 99.99% of apps use key1=value1&key1=value2 where keys and values are encoded with whatever rules encodeURIComponent follows, but AFAIK the contents of the query part are entirely 100% up to the app. Other than that it only goes up to the first #, there's no official encoding.