String comparison technique used by Python

I'm wondering how Python does string comparison, more specifically how it determines the outcome when a less than (<) or greater than (>) operator is used.

For instance, print('abc' < 'bac') gives True. I understand that it compares corresponding characters in the two strings, but it's unclear why more "weight", for lack of a better term, is placed on the fact that a is less than b in the first position than on the fact that, in the second position, the a of the second string is less than the b of the first.

From the docs:

The comparison uses lexicographical ordering: first the first two items are compared, and if they differ this determines the outcome of the comparison; if they are equal, the next two items are compared, and so on, until either sequence is exhausted.

Also:

Lexicographical ordering for strings uses the Unicode code point number to order individual characters.

or on Python 2:

Lexicographical ordering for strings uses the ASCII ordering for individual characters.

As an example:

>>> 'abc' > 'bac'
False
>>> ord('a'), ord('b')
(97, 98)

The result False is returned as soon as a is found to be less than b. The further items are not compared (as you can see for the second items: b > a is True).
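To make the "first difference decides" rule concrete, here is a minimal sketch. The helper name first_difference is hypothetical (it is not part of Python); it only locates the position where the two strings first differ, which is where the built-in comparison is settled.

```python
def first_difference(a, b):
    """Return (index, char_a, char_b) for the first differing position, or None."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i, x, y
    # No difference within the common length (one string may still be longer).
    return None


pos = first_difference('abc', 'bac')
print(pos)  # (0, 'a', 'b')

i, x, y = pos
# The outcome of the whole comparison matches the comparison at that position.
print(('abc' < 'bac') == (ord(x) < ord(y)))  # True
```

The later positions (b versus a, then c versus c) never influence the result.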

Be aware of lower and uppercase:

>>> import string
>>> abc = string.ascii_lowercase
>>> [(x, ord(x)) for x in abc]
[('a', 97), ('b', 98), ('c', 99), ('d', 100), ('e', 101), ('f', 102), ('g', 103), ('h', 104), ('i', 105), ('j', 106), ('k', 107), ('l', 108), ('m', 109), ('n', 110), ('o', 111), ('p', 112), ('q', 113), ('r', 114), ('s', 115), ('t', 116), ('u', 117), ('v', 118), ('w', 119), ('x', 120), ('y', 121), ('z', 122)]
>>> [(x, ord(x)) for x in abc.upper()]
[('A', 65), ('B', 66), ('C', 67), ('D', 68), ('E', 69), ('F', 70), ('G', 71), ('H', 72), ('I', 73), ('J', 74), ('K', 75), ('L', 76), ('M', 77), ('N', 78), ('O', 79), ('P', 80), ('Q', 81), ('R', 82), ('S', 83), ('T', 84), ('U', 85), ('V', 86), ('W', 87), ('X', 88), ('Y', 89), ('Z', 90)]
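Because every uppercase ASCII letter has a lower code point than every lowercase one, a plain comparison puts 'Z' before 'a'. If a case-insensitive order is wanted, one common approach (a sketch, not the only option) is to compare case-folded copies with str.casefold(). The word list is made up for illustration.

```python
words = ['banana', 'Apple', 'cherry', 'Zebra']  # illustrative data

default_order = sorted(words)                      # code point order
caseless_order = sorted(words, key=str.casefold)   # ignores case

print(default_order)   # ['Apple', 'Zebra', 'banana', 'cherry']
print(caseless_order)  # ['Apple', 'banana', 'cherry', 'Zebra']

print('apple' < 'Banana')                        # False: ord('a') > ord('B')
print('apple'.casefold() < 'Banana'.casefold())  # True
```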

The character with the lower Unicode value is considered smaller. To test two strings for equality, compare them with the == operator, for example inside an if statement.

Python string comparison is lexicographic:

From Python Docs: http://docs.python.org/reference/expressions.html

Strings are compared lexicographically using the numeric equivalents (the result of the built-in function ord()) of their characters. Unicode and 8-bit strings are fully interoperable in this behavior.

Hence in your example, 'abc' < 'bac', 'a' comes before (less-than) 'b' numerically (in ASCII and Unicode representations), so the comparison ends right there.

Two distinct objects can have the same value, so use == rather than is to check whether two strings are equal. Python string comparison can be performed using the equality (==, !=) and ordering (<, >, <=, >=) operators; there are no special methods to compare two strings. The characters in both strings are compared one by one, and when differing characters are found, their Unicode values are compared.
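The remark that two distinct objects can have the same value can be illustrated with a short sketch. The strings are arbitrary; whether the two objects really are distinct is an implementation detail (CPython may reuse string objects in some cases), so only the == result should be relied on.

```python
a = 'hello world'
b = ' '.join(['hello', 'world'])  # built at run time, same characters as a

print(a == b)  # True: the values are equal
print(a is b)  # identity check; the result depends on the implementation
```

For comparing string contents, == is the operator to use; is only answers whether both names refer to the same object.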

Python and just about every other computer language use the same principles as (I hope) you would use when finding a word in a printed dictionary:

(1) Depending on the human language involved, you have a notion of character ordering: 'a' < 'b' < 'c' etc

(2) First character has more weight than second character: 'az' < 'za' (whether the language is written left-to-right or right-to-left or boustrophedon is quite irrelevant)

(3) If you run out of characters to test, the shorter string is less than the longer string: 'foo' < 'food'

Typically, in a computer language the "notion of character ordering" is rather primitive: each character has a human-language-independent number ord(character) and characters are compared and sorted using that number. Often that ordering is not appropriate to the human language of the user, and then you need to get into "collating", a fun topic.

If, rather than asking which string sorts first, you want to measure how different two strings are, you can calculate how many transformations you need to perform on string A to make it equal to string B. This measure is known as edit distance (or Levenshtein distance). To use it in Python you need to install a package, for example through pip: pip install python-Levenshtein
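As a rough illustration of the edit distance idea, here is a pure-Python sketch of the classic dynamic-programming approach. It is not the code of the python-Levenshtein package; the function name edit_distance is an assumption made for this example.

```python
def edit_distance(a, b):
    """Count the single-character insertions, deletions and substitutions
    needed to turn a into b (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))  # distances from '' to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ''
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(edit_distance('kitten', 'sitting'))  # 3
```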

Take a look also at How do I sort unicode strings alphabetically in Python? where the discussion is about sorting rules given by the Unicode Collation Algorithm (http://www.unicode.org/reports/tr10/).

To reply to the comment

What? How else can ordering be defined other than left-to-right?

by S.Lott, there is a famous counter-example when sorting French language. It involves accents: indeed, one could say that, in French, letters are sorted left-to-right and accents right-to-left. Here is the counter-example: we have e < é and o < ô, so you would expect the words cote, coté, côte, côté to be sorted as cote < coté < côte < côté. Well, this is not what happens, in fact you have: cote < côte < coté < côté, i.e., if we remove "c" and "t", we get oe < ôe < oé < ôé, which is exactly right-to-left ordering.
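The French example can be reproduced with a small sketch. This is not the Unicode Collation Algorithm, only an approximation under simple assumptions: each character is split into a base letter and its accent marks via NFD decomposition, base letters are compared forward, and accents are compared backward. The names split_accents and french_key are hypothetical.

```python
import unicodedata

def split_accents(word):
    """Return (base letters, list of accent marks per character)."""
    base, accents = [], []
    for ch in unicodedata.normalize('NFC', word):
        decomposed = unicodedata.normalize('NFD', ch)
        base.append(decomposed[0])      # the unaccented letter
        accents.append(decomposed[1:])  # '' when there is no accent
    return ''.join(base), accents

def french_key(word):
    base, accents = split_accents(word)
    # Base letters forward, accents backward (last character first).
    return base, accents[::-1]

cote = 'cote'
cote_acute = 'cot\u00e9'      # coté
cote_circ = 'c\u00f4te'       # côte
cote_both = 'c\u00f4t\u00e9'  # côté
words = [cote_both, cote_acute, cote_circ, cote]

forward = sorted(words)                   # plain code point order
backward = sorted(words, key=french_key)  # accents compared right-to-left

print(forward)   # ['cote', 'coté', 'côte', 'côté']
print(backward)  # ['cote', 'côte', 'coté', 'côté']
```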

And a last remark: you shouldn't be talking about left-to-right and right-to-left sorting but rather about forward and backward sorting.

Indeed there are languages written from right to left and if you think Arabic and Hebrew are sorted right-to-left you may be right from a graphical point of view, but you are wrong on the logical level!

Indeed, Unicode considers character strings encoded in logical order, and writing direction is a phenomenon occurring on the glyph level. In other words, even if in the word שלום the letter shin appears on the right of the lamed, logically it occurs before it. To sort this word one will first consider the shin, then the lamed, then the vav, then the mem, and this is forward ordering (although Hebrew is written right-to-left), while French accents are sorted backwards (although French is written left-to-right).
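The logical order can be checked directly in Python. In this sketch the word is written with escape sequences so that the display direction of an editor cannot confuse the picture; index 0 is the first letter logically, whatever side it is drawn on.

```python
import unicodedata

word = '\u05e9\u05dc\u05d5\u05dd'  # the Hebrew word from the text, in logical order

names = [unicodedata.name(ch) for ch in word]
for index, name in enumerate(names):
    print(index, name)
# 0 HEBREW LETTER SHIN
# 1 HEBREW LETTER LAMED
# 2 HEBREW LETTER VAV
# 3 HEBREW LETTER FINAL MEM
```

String comparison walks the characters in this index order, so the shin is compared first.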

Python string comparison can be performed using the equality (==, !=) and ordering (<, >, <=, >=) operators. There are no special methods to compare two strings. Python compares strings lexicographically, i.e. using the ASCII (more generally, Unicode code point) values of the characters. Suppose you have str1 as "Mary" and str2 as "Mac". The first characters of str1 and str2 (M and M) are compared. As they are equal, the second characters are compared. Because they are also equal, the third characters (r and c) are compared; r has the higher value, so "Mary" is greater than "Mac".

This is a lexicographical ordering. It just puts things in dictionary order.

Comments
  • What? How else can ordering be defined other than left-to-right?
  • @S.Lott: right-to-left. Not that anyone would do so, but it's not the only possibility.
  • @katrielalex: If you allow that, you'd have to allow random and even-only and odd-only and every other possibility. Then you'd have to "parameterize" the operator to pick which ordering. If there's going to be a default, how could it be other than left-to-right?
  • @S.Lott: I agree -- lex is the only sensible order to use. I just nitpicked that it's certainly not the only possible order!
  • @S.Lott: To answer your question, you might use sorted(range(10), key=lambda i: i ^ 123) for numbers or sorted('How else can ordering be defined other than left-to-right?'.split(), key= lambda s: s[::-1]) for text. They are definite (if unhelpful) orderings.
  • Just wanted to add that if one sequence is exhausted, that sequence is less: 'abc' < 'abcd'.
  • Thank you for this, might be useful to add that it works for number strings too. I was just having this issue "24" > 40 = True due to ord("2") = 50
  • @vaultah: Just to save other people reading your comment the need to read the question you're linking to, the relevant rule for Python 2 is "When you order a numeric and a non-numeric type, the numeric type comes first." (Python 3 raises a TypeError exception instead, btw.)
  • So, does it end the comparison as soon as it finds that one of the characters is less than the one it corresponds with?
  • @David: Yes. Either less than or greater than. If they are equal, the next items are compared.
  • This is actually wrong, because a dictionary doesn't make a difference between lowercase and uppercase letters, for instance 'a' < 'z' is True while 'a' < 'Z' is False
  • In both cases your loops terminate at the end of whichever string is shortest. You cannot then just return False unconditionally, that is wrong if string1 is longer than string2 (e.g. doggy & dog ), you need to check....
  • @JonBrave Seems correct what you say. You mean adding a if len(string1) < len(string2): return True before the final return False? I'm not at a computer currently, so I cannot check. Will do later :)
  • Yes, you need some test at the end deciding whether to return False or True according as either you have reached the end of both strings (False, because they are equal), string1 is longer (also False) or string2 is longer (True). The whole thing could be coded as return len(string1) < len(string2).
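The loop discussed in the last comments is not shown here, so the following is only a sketch of what such a hand-written comparison might look like, ending with the suggested return len(string1) < len(string2). The function name less_than is an assumption.

```python
def less_than(string1, string2):
    """Hand-written equivalent of string1 < string2, for illustration only."""
    for c1, c2 in zip(string1, string2):
        if c1 != c2:
            # The first differing pair decides the outcome.
            return ord(c1) < ord(c2)
    # One string is a prefix of the other (or they are equal):
    # only a strictly shorter string1 counts as less.
    return len(string1) < len(string2)


print(less_than('dog', 'doggy'))  # True
print(less_than('doggy', 'dog'))  # False
print(less_than('dog', 'dog'))    # False
```

In real code the built-in < operator does the same job; the function only spells out the steps.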