Fast string to integer conversion in Python

A simple problem, really: you have one billion (1e+9) unsigned 32-bit integers stored as decimal ASCII strings in a TSV (tab-separated values) file. Conversion using int() is horribly slow compared to other tools working on the same dataset. Why? And more importantly: how to make it faster?

Hence the question: what is the fastest possible way to convert a string to an integer in Python?

What I'm really thinking about is some semi-hidden Python functionality that could be (ab)used for this purpose, not unlike Guido's use of array.array in his "Optimization Anecdote".

Sample data (with tabs expanded to spaces):

38262904        "pfv"              2002-11-15T00:37:20+00:00
12311231        "tnealzref"        2008-01-21T20:46:51+00:00
26783384        "hayb"             2004-02-14T20:43:45+00:00
812874          "qevzasdfvnp"      2005-01-11T00:29:46+00:00
22312733        "bdumtddyasb"      2009-01-17T20:41:04+00:00

The time it takes to read the data is irrelevant here; processing the data is the bottleneck.

Microbenchmarks

All of the following are interpreted languages. The host machine is running 64-bit Linux.

Python 2.6.2 with IPython 0.9.1, ~214k conversions per second (100%):

In [1]: strings = map(str, range(int(1e7)))

In [2]: %timeit map(int, strings);
10 loops, best of 3: 4.68 s per loop
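On Python 3, map() returns a lazy iterator, so timing map(int, strings) alone would measure little more than creating the map object. The following is a sketch of an equivalent benchmark using the standard timeit module. It times both the list-building form and the loop-only form that the REBOL 2 comparison below refers to. The helper names and the smaller default of 1e6 strings are illustrative assumptions, and absolute rates will differ by machine and Python version.

```python
import timeit

def make_strings(n):
    """Decimal ASCII strings '0' .. str(n - 1), like the IPython session above."""
    return [str(i) for i in range(n)]

def convert_list(strings):
    """Build a list of ints (what Python 2's map(int, strings) did)."""
    return list(map(int, strings))

def convert_loop(strings):
    """Convert every string but discard the results (no list is built)."""
    for s in strings:
        int(s)

def rate(func, strings, repeat=3):
    """Best-of-`repeat` conversions per second for func(strings)."""
    best = min(timeit.repeat(lambda: func(strings), number=1, repeat=repeat))
    return len(strings) / best

if __name__ == "__main__":
    strings = make_strings(10**6)
    print("list(map(int, ...)):  %.0f conversions/s" % rate(convert_list, strings))
    print("for s in ...: int(s): %.0f conversions/s" % rate(convert_loop, strings))
```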

REBOL 3.0 Version 2.100.76.4.2, ~231kcps (108%):

>> strings: array n: to-integer 1e7 repeat i n [poke strings i mold (i - 1)]
== "9999999"

>> delta-time [map str strings [to integer! str]]
== 0:00:04.328675

REBOL 2.7.6.4.2 (15-Mar-2008), ~523kcps (261%):

As John noted in the comments, this version does not build a list of converted integers, so the speed ratio given is relative to Python's 4.99 s runtime for "for str in strings: int(str)".

>> delta-time: func [c /local t] [t: now/time/precise do c now/time/precise - t]

>> strings: array n: to-integer 1e7 repeat i n [poke strings i mold (i - 1)]
== "9999999"

>> delta-time [foreach str strings [to integer! str]]
== 0:00:01.913193

KDB+ 2.6t 2009.04.15, ~2016kcps (944%):

q)strings:string til "i"$1e7

q)\t "I"$strings
496

I might suggest that for raw speed, Python isn't the right tool for this task. A hand-coded C implementation will beat Python easily.

You will gain some speed by ensuring that only local variables are used in your tightest loops. The int function is a builtin rather than a local, so each lookup has to search the module globals and then the builtins, which is more expensive than a local lookup.
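Here is a minimal sketch of the local-binding trick, assuming the strings are already in a list. Binding int (and the list's append method) to local names turns each lookup inside the loop into a local access. The default-argument form is one common idiom, and both function names are illustrative. Any gain is likely to be modest, and list(map(int, strings)) avoids the Python-level loop altogether.

```python
def convert_global(strings):
    result = []
    for s in strings:
        # `int` is resolved via globals, then builtins, on every iteration;
        # `result.append` is also looked up on every iteration.
        result.append(int(s))
    return result

def convert_local(strings, int=int):
    # `int` is a local here, bound once when the function is defined,
    # and the bound method `append` is looked up once before the loop.
    result = []
    append = result.append
    for s in strings:
        append(int(s))
    return result
```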

Do you really need all billion numbers in memory at all times? Consider using iterators that give you only a few values at a time. A billion numbers will take a lot of storage, and appending them to a list one at a time is going to require several large reallocations.
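Here is a sketch of the iterator approach for the TSV layout shown in the sample data. It assumes the integer is the first tab-separated field of every line. A generator yields one int at a time, and itertools.islice pulls a bounded batch so that only one chunk is held in memory at once. The function names and the chunk size are illustrative choices.

```python
import io
import itertools

def first_column_ints(lines):
    """Lazily yield the integer in the first tab-separated field of each line."""
    for line in lines:
        field, _, _ = line.partition("\t")
        yield int(field)  # int() tolerates surrounding whitespace such as "\n"

def chunks(iterable, size):
    """Yield lists of at most `size` items until the iterable is exhausted."""
    it = iter(iterable)
    while True:
        batch = list(itertools.islice(it, size))
        if not batch:
            return
        yield batch

if __name__ == "__main__":
    sample = io.StringIO(
        '38262904\t"pfv"\t2002-11-15T00:37:20+00:00\n'
        '12311231\t"tnealzref"\t2008-01-21T20:46:51+00:00\n'
        '812874\t"qevzasdfvnp"\t2005-01-11T00:29:46+00:00\n'
    )
    for batch in chunks(first_column_ints(sample), 2):
        print(batch)  # prints [38262904, 12311231], then [812874]
```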

Get your looping out of Python entirely if possible. The map function can be your friend here. I'm not sure how your data is stored. If it is a single number per line, you could reduce the code to:

values = map(int, open("numberfile.txt"))

If there are multiple whitespace-separated values per line, dig into itertools to keep the looping code out of Python. This version has the added benefit of creating a number iterator, so you can spool one or several numbers out of the file at a time instead of one billion in one shot.

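import itertools
numfile = open("numberfile.txt")
# chain.from_iterable flattens the per-line lists from str.split into one stream
valIter = itertools.imap(int, itertools.chain.from_iterable(itertools.imap(str.split, numfile)))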
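In Python 3, map() is already lazy and itertools.imap no longer exists. The same pipeline can therefore be written with the builtins plus itertools.chain.from_iterable, which flattens the per-line lists produced by str.split into one stream of fields. The sketch below uses an in-memory file in place of numberfile.txt, and the helper name iter_ints is illustrative.

```python
import io
import itertools

def iter_ints(lines):
    """Lazily yield every whitespace-separated integer in `lines`."""
    fields = itertools.chain.from_iterable(map(str.split, lines))
    return map(int, fields)

numfile = io.StringIO("1 2 3\n40 50\n\n600\n")
val_iter = iter_ints(numfile)
print(next(val_iter))  # 1  (only the first value has been consumed)
print(list(val_iter))  # [2, 3, 40, 50, 600]
```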

Even the following simplistic C extension improves heavily on the builtin, converting about three times as many strings per second (650kcps vs 214kcps):

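#include <Python.h>

/* Convert a string of decimal digits to an unsigned 32-bit value.
   No validation: assumes only '0'..'9' and a result that fits in 32 bits. */
static PyObject *fastint_int(PyObject *self, PyObject *args) {
    const char *s;
    unsigned int r = 0;
    if (!PyArg_ParseTuple(args, "s", &s)) return NULL;
    while (*s) r = r * 10 + (unsigned int)(*s++ - '0');
    /* "I" builds from an unsigned int, so values above INT_MAX stay positive */
    return Py_BuildValue("I", r);
}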

This obviously does not cater for integers of arbitrary length and various other special cases, but that's no problem in our scenario.
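When testing an extension like this, it can help to have a pure-Python model of exactly what the C digit loop computes. The helper below is hypothetical, far slower than int() itself, and meant only as a test oracle. It mirrors the loop for ASCII input: nothing is validated, and because the accumulator r is an unsigned int, results wrap modulo 2**32. This assumes a 32-bit unsigned int, as on common 64-bit Linux builds.

```python
def c_loop_model(s):
    """Model of the C digit loop with a 32-bit unsigned accumulator r."""
    r = 0
    for ch in s:
        # Non-digit characters are not rejected; they just contribute
        # ord(ch) - ord('0'). The modulo reproduces unsigned wraparound.
        r = (r * 10 + ord(ch) - ord("0")) % 2**32
    return r
```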

Agree with Greg; Python, as an interpreted language, is generally slow. You could try compiling the source code on the fly with the Psyco library, or coding the app in a lower-level language such as C/C++.

As others have said you could code up your own C module to do the parsing/conversion for you. Then you could simply import that and call on it. You might be able to use Pyrex or its Cython derivative to generate your C from your Python (by adding a few type constraining hints to the Python).

You can read more about Cython and see if that will help.

Another question that comes to mind, though: what are you going to be doing with these billion integers? Is it possible that you might load them as strings, search for them as strings and perform a lazy conversion as necessary? Or could you parallelize the conversion and the other computations using the threading or multiprocessing modules and Queues? (Have one or more threads or processes perform the conversion and feed a Queue from which your processing engine fetches the integers.) In other words, would a producer/consumer design alleviate the problem?
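Here is a minimal sketch of that producer/consumer design using the standard threading and queue modules. It assumes each line is a string whose first tab-separated field is the integer. One thread converts lines and puts batches on a bounded Queue. The consumer, a running sum standing in for the real processing engine, drains the queue until a sentinel arrives. In CPython, threads share the GIL, so this structure mainly overlaps conversion with I/O or with work that releases the GIL. Spreading CPU-bound parsing over several cores would need the same structure built on multiprocessing instead. Names and batch sizes are illustrative.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream

def producer(lines, q, batch_size=10000):
    """Convert the first TSV field of each line and put batches of ints on q."""
    batch = []
    for line in lines:
        batch.append(int(line.partition("\t")[0]))
        if len(batch) == batch_size:
            q.put(batch)
            batch = []
    if batch:
        q.put(batch)
    q.put(SENTINEL)

def consume(q):
    """Stand-in processing engine: sum every integer until the sentinel arrives."""
    total = 0
    while True:
        batch = q.get()
        if batch is SENTINEL:
            return total
        total += sum(batch)

def run(lines, batch_size=10000):
    q = queue.Queue(maxsize=8)  # bounded, so the producer cannot run far ahead
    t = threading.Thread(target=producer, args=(lines, q, batch_size))
    t.start()
    total = consume(q)
    t.join()
    return total
```

For example, run(["1\ta\n", "2\tb\n", "3\tc\n"], batch_size=2) returns 6. Batching amortizes the locking cost of each Queue operation over many integers.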

Comments
  • Try numpy.fromfile to load 'one billion positive integers'. (By the way, what do you mean by 'billion'? It is 10**9 in the US; it might be 10**12 in Britain.)
  • Good catch about the billion, even though the latter usage went out of vogue in Britain in the 1970s.
  • Have you tried compiling the code?
  • (1) Please be more explicit than "stored as ASCII strings in a text file". Fixed columns or delimited? Is this the only type of data in the file? Show a few sample lines. (2) Show us the code that YOU are currently using, if you want us to believe that int() is the problem and that this isn't a homework question (3) Please express the speed in SI units rather than "horribly slow". (4) What other tools? (5) What platform and what version of Python?
  • (6) What is the average number of digits in an integer? (7) Are the digits decimal/hex/octal/something else?
  • I totally agree, but that's not really the point of my question. I added a paragraph of what I'm looking for. A custom Python extension would be an option, though.
  • Is there any reason not to use C standard lib's functions e.g., strtoul()?
  • -1 on the interpreted ==> slow corollary. A C implementation will be faster in this case, but your generalization is simply wrong.
  • An interpreted language must be translated into machine code at the time of execution, and that is simply slower than executing compiled object code. I still don't understand your downvote. Please explain why you think "my generalization" is wrong.
  • Interpreted languages can make optimizations on the bytecode during runtime, sometimes leading to better performance than native machine code. Look it up, it has been discussed to death.
  • Well, I suppose 90% of the cases isn't enough to generalize, so it's edited.
  • Moving as much as I can out of the inner loop and running this for 1e7 iterations takes 27 seconds using psyco.full(). So it would take something like 45 minutes on my machine to do 1e9. I'm tempted to believe that C/C++/C# would be faster, though I have not benchmarked them.