Open and read a space-delimited txt file
I have a space-separated txt file like the following:
    2004 Temperature for KATHMANDU AIRPORT
    Tmax Tmin
    1 18.8 2.4
    2 19.0 1.1
    3 18.3 1.7
    4 18.3 1.0
    5 17.8 1.3
I want to calculate the mean of Tmax and Tmin separately, but I am having a hard time reading the txt file. I tried something like this:
    import re

    list_b = []
    list_d = []
    with open('TA103019.95.txt', 'r') as f:
        for line in f:
            list_line = re.findall(r"[\d.\d+']+", line)
            list_b.append(float(list_line[1]))  # appends second column
            list_d.append(float(list_line[3]))  # appends fourth column
    print(list_b)
    print(list_d)
But it gives me this error:
IndexError: list index out of range
What is wrong here?
A simple way to solve that is to use str.split() without a separator argument.
Of course, you need to drop the first two lines:
    import io

    with io.open("path/to/file.txt", mode="r", encoding="utf-8") as f:
        next(f)  # skip the title line
        next(f)  # skip the "Tmax Tmin" line
        for line in f:
            print(line.split())
    ['1', '18.8', '2.4']
    ['2', '19.0', '1.1']
    ['3', '18.3', '1.7']
    ['4', '18.3', '1.0']
    ['5', '17.8', '1.3']
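Putting this together for the original goal, here is a minimal sketch that skips the two header lines with next(f), splits each remaining line on whitespace and averages the Tmax and Tmin columns. It assumes the layout shown in the question (two header lines, then day, Tmax, Tmin rows) and also skips blank lines; the function name mean_tmax_tmin is hypothetical.

```python
import io


def mean_tmax_tmin(path):
    """Return (mean Tmax, mean Tmin) for a file laid out like the question's sample.

    Assumption: two header lines followed by rows of "day Tmax Tmin".
    """
    tmax, tmin = [], []
    with io.open(path, mode="r", encoding="utf-8") as f:
        next(f)  # skip "2004 Temperature for KATHMANDU AIRPORT"
        next(f)  # skip "Tmax Tmin"
        for line in f:
            fields = line.split()  # any run of whitespace is one separator
            if not fields:  # ignore blank lines, e.g. a trailing newline
                continue
            tmax.append(float(fields[1]))
            tmin.append(float(fields[2]))
    return sum(tmax) / len(tmax), sum(tmin) / len(tmin)
```

Called as mean_tmax_tmin('TA103019.95.txt') on the five sample rows, this should give roughly 18.44 for Tmax and 1.5 for Tmin.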
Quoting the documentation for str.split:
If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace.
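The difference matters here because columns in a space-delimited file may be separated by more than one space. A short illustration on a made-up row with extra spaces (the row text is an assumption, not taken from the actual file):

```python
row = "  1   18.8  2.4\n"

# With an explicit separator, every single space is a split point, so runs of
# spaces produce empty strings and the trailing newline stays attached.
with_sep = row.split(" ")
print(with_sep)  # ['', '', '1', '', '', '18.8', '', '2.4\n']

# With no argument, runs of whitespace (including the newline) act as one separator.
default = row.split()
print(default)  # ['1', '18.8', '2.4']
```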
    import re

    list_b = []
    list_d = []
    with open('TA103019.95.txt', 'r') as f:
        for line in f:
            # regex is corrected to match the decimal values only
            list_line = re.findall(r"\d+\.\d+", line)
            # error condition handled where the values are not found (header lines)
            if len(list_line) < 2:
                continue
            # indexes are corrected below
            list_b.append(float(list_line[0]))  # appends Tmax
            list_d.append(float(list_line[1]))  # appends Tmin
    print(list_b)
    print(list_d)
I have explained the changes in comments in the code itself.
You were getting the IndexError: list index out of range because list_line had only a single element (2004, from the first line of the file), and you were trying to access index 1 and index 3 of list_line.
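To see the failure directly, the two patterns can be run on the header line and on one data row taken from the sample. Note that even on a data row the old pattern yields only three items, so index 3 would be out of range there too:

```python
import re

header = "2004 Temperature for KATHMANDU AIRPORT\n"
data_row = "1 18.8 2.4\n"

# The question's pattern is a character class of digits, '.', '+' and "'",
# so it also picks up integers such as the year and the day number.
old_pattern = r"[\d.\d+']+"
# The corrected pattern only matches numbers that contain a decimal point.
new_pattern = r"\d+\.\d+"

old_header = re.findall(old_pattern, header)   # ['2004'] -> index 1 is out of range
old_row = re.findall(old_pattern, data_row)    # ['1', '18.8', '2.4'] -> index 3 is out of range
new_header = re.findall(new_pattern, header)   # [] -> skipped by the len() check
new_row = re.findall(new_pattern, data_row)    # ['18.8', '2.4'] -> Tmax at 0, Tmin at 1

print(old_header, old_row, new_header, new_row)
```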
    def readit(file_name, start_line=2):
        # start_line: where the data starts (2 means the 3rd line, because counting starts at 0)
        with open(file_name, 'r') as f:
            data = f.read().split('\n')
        data = [i.split(' ') for i in data[start_line:]]
        for i in range(len(data)):
            # drop the empty strings left by repeated spaces
            row = [sub for sub in data[i] if len(sub) != 0]
            if not row:  # skip blank lines, e.g. the one after a trailing newline
                continue
            yield int(row[0]), float(row[1]), float(row[2])

    iterator = readit('TA103019.95.txt')
    index, tmax, tmin = zip(*iterator)
    mean_Tmax = sum(tmax) / len(tmax)
    mean_Tmin = sum(tmin) / len(tmin)
    print('Mean Tmax: ', mean_Tmax)
    print('Mean Tmin: ', mean_Tmin)

Output (Python 2):
    ('Mean Tmax: ', 18.439999999999998)
    ('Mean Tmin: ', 1.5)
Thanks to Dan D. for a more elegant solution.
Simplify your life and avoid 're' for this problem.
Perhaps you are mistakenly reading the header rows? If the format of the file is fixed, I usually "burn" the header rows with line reads before starting the loop, like:
    with open(file_name, 'r') as f:
        f.readline()  # burn the first header row
        f.readline()  # burn the second header row ("Tmax Tmin")
        for line in f:
            tokens = line.split()  # tokenize the row on runs of whitespace
Then you have a list of tokens, which will be strings that you'll need to convert to int or float or whatever and go from there!
Put in a couple of print statements to see what you are picking up.
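As a variation on counting header lines, the token conversion can decide for itself which rows are data: try to convert the tokens and skip any row that does not parse as day, Tmax, Tmin. This is a swapped-in technique, not the answer's exact code, and the helpers parse_row and read_rows are hypothetical; it trades some strictness for not needing to know how many header lines there are.

```python
def parse_row(line):
    """Return (day, tmax, tmin) for a data row, or None for headers and blank lines.

    Hypothetical helper: assumes every data row has exactly three tokens.
    """
    tokens = line.split()
    if len(tokens) != 3:
        return None
    try:
        return int(tokens[0]), float(tokens[1]), float(tokens[2])
    except ValueError:  # e.g. a three-word header whose tokens are not numbers
        return None


def read_rows(lines):
    """Yield parsed rows, silently dropping anything parse_row rejects."""
    for line in lines:
        row = parse_row(line)
        if row is not None:
            yield row
```

Because read_rows accepts any iterable of lines, it works on an open file object as well as on a plain list of strings, which makes it easy to print what is being picked up.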
- It gives me this error: NameError: name 'io' is not defined
- You could have skipped two lines by calling next(f) twice and then, in a single for line in f: loop, parsed and yielded each line. This would eliminate both lists, data and processed. And the transpose can be done with index, tmax, tmin = zip(*iterator).
- Thanks. I partly edited the solution, but not fully (I didn't know exactly how to implement next(f) and didn't want to spend too much time on it). Feel free to edit the answer.
- It gives me this error: IndexError: list index out of range
- It's because we didn't deal with the first two lines, which are not data.
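The approach suggested in the comments (skip two lines with next(f), then parse and yield each line in a single loop, and transpose with zip) can be sketched as follows. It assumes the question's layout of two header lines followed by "day Tmax Tmin" rows; the function names read_temperatures and column_means are hypothetical.

```python
def read_temperatures(f):
    """Yield (day, tmax, tmin) tuples from an open file or other line iterator.

    Assumption: two header lines, then rows of "day Tmax Tmin".
    """
    next(f)  # skip the title line
    next(f)  # skip the "Tmax Tmin" line
    for line in f:
        fields = line.split()
        if fields:  # skip blank lines such as a trailing newline
            yield int(fields[0]), float(fields[1]), float(fields[2])


def column_means(f):
    """Transpose the rows with zip(*...) and return (mean Tmax, mean Tmin)."""
    index, tmax, tmin = zip(*read_temperatures(f))
    return sum(tmax) / len(tmax), sum(tmin) / len(tmin)
```

Typical use would be with open('TA103019.95.txt') as f: print(column_means(f)). Since next() needs an iterator, pass the file object itself (or iter(list_of_lines)) rather than a plain list.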