Python: efficient way of parsing a string into a list of floats from a file

This file has a word and tens of thousands of floats per line. I want to turn it into a dictionary with the word as the key and a vector of all the floats as the value. This is how I am doing it, but because of the size of the file (about 20k lines, each with about 10k values) the process is taking a bit too long. I could not find a more efficient way of doing the parsing, only some alternatives that were not guaranteed to decrease the run time.

with open("googlenews.word2vec.300d.txt") as g_file:
  i = 0;
  #dict of words: [lots of floats]
  google_words = {}

  for line in g_file:
    google_words[line.split()[0]] = [float(line.split()[i]) for i in range(1, len(line.split()))]

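In your solution you perform the slow line.split() over and over: once for every value on the line, plus once for the key and once for the length. Consider the following modification, which splits each line only once: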

with open("googlenews.word2vec.300d.txt") as g_file:
    # dict of words: [lots of floats]
    google_words = {}

    for line in g_file:
        # split once; the first field is the word, the rest are the numbers
        word, *numbers = line.split()
        google_words[word] = [float(number) for number in numbers]

One advanced concept I used here is "unpacking": word, *numbers = line.split()

Python allows you to unpack the values of an iterable into multiple variables:

a, b, c = [1, 2, 3]
# This is practically equivalent to
a = 1
b = 2
c = 3

The * is a shortcut for "take the leftovers, put them in a list and assign that list to the name":

a, *rest = [1, 2, 3, 4]
# results in
a == 1
rest == [2, 3, 4]
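
Applied to the file format in the question, the same unpacking can be wrapped in a small helper. This is only a sketch: the name parse_line and the sample line are made up for illustration, and it assumes every line is a word followed by whitespace-separated numbers.

```python
def parse_line(line):
    """Split one 'word f1 f2 ...' line into (word, [floats]). Hypothetical helper."""
    # Split once; the first field is the key, the starred name collects the rest.
    word, *numbers = line.split()
    return word, [float(number) for number in numbers]


word, vector = parse_line("king 0.5 -1.25 3.0\n")
print(word, vector)
```

A blank line would raise ValueError at the unpacking step, so skip such lines first if the file may contain them.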

Just don't call line.split() more than once.

with open("googlenews.word2vec.300d.txt") as g_file:
    # dict of words: [lots of floats]
    google_words = {}

    for line in g_file:
        temp = line.split()  # split the line only once
        google_words[temp[0]] = [float(temp[i]) for i in range(1, len(temp))]

Here's a simple generator for one line of such a file:

s = "x"
for i in range(10000):
    s += " 1.2345"
print(s)

The former version takes some time. The version with only one split call is instant.
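
To check this on your own machine, a rough timing sketch along these lines may help. The function names parse_slow and parse_fast and the value of n_values are assumptions chosen for illustration; the actual numbers will depend on your hardware and Python version.

```python
import time


def parse_slow(line):
    # Mirrors the question: line.split() is re-run for every value.
    return [float(line.split()[i]) for i in range(1, len(line.split()))]


def parse_fast(line):
    # Split once and reuse the result.
    fields = line.split()
    return [float(field) for field in fields[1:]]


# One synthetic line: a word followed by n_values floats (n_values is an assumption).
n_values = 2000
line = "x" + " 1.2345" * n_values

start = time.perf_counter()
slow_result = parse_slow(line)
slow_seconds = time.perf_counter() - start

start = time.perf_counter()
fast_result = parse_fast(line)
fast_seconds = time.perf_counter() - start

print(f"repeated split: {slow_seconds:.4f}s, single split: {fast_seconds:.4f}s")
```

The repeated-split version re-scans the whole line for every value, so its cost grows roughly with the square of the number of values per line, while the single-split version grows linearly.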

You could also use the csv module, which should be more efficient than what you are doing.

It would be something like:

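import csv

d = {}
# newline="" is what the csv docs recommend when opening files for csv.reader
with open("huge_file_so_huge.txt", "r", newline="") as g_file:
    for row in csv.reader(g_file, delimiter=" "):
        # row[0] is the word; the remaining fields are the numbers
        d[row[0]] = list(map(float, row[1:]))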
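
One thing to watch with this approach: unlike line.split(), csv.reader with a single-space delimiter treats consecutive spaces as empty fields, and float("") raises ValueError. A minimal sketch of a more defensive loop follows; the in-memory sample data and the name vectors are assumptions for illustration, not part of the original file.

```python
import csv
import io

# Stand-in for the real file; note the double space in the second row.
sample = io.StringIO("king 0.5 -1.25\nqueen 1.0  2.0\n")

vectors = {}
for row in csv.reader(sample, delimiter=" "):
    if not row:
        continue  # skip blank lines
    # Consecutive delimiters produce empty strings, which float() rejects.
    vectors[row[0]] = [float(field) for field in row[1:] if field]

print(vectors)
```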

Comments
  • Define "taking a bit too long" better - how long is it taking, and being realistic how long would you like it to take?
  • I'd probably call line.split() once, and assign it to a variable rather than keeping that call in your list comprehension. That way you can iterate over it specifically
  • Just what I was going to say...
  • Thanks! It works much better now. One noob question though, what is the * doing in *numbers? Is it a pointer like in C, C++?
  • @VictorZuanazzi No, Python doesn't have built-in pointers ;) I explained this syntactic sugar in the updated answer.