(Python) numpy genfromtxt convert problem

I use

netdata = num.genfromtxt('resultscut.rw', dtype=None, delimiter = '|', usecols=(0,1,2,3,4))

to build an array from a text data file. This works nicely, but when I try to convert a bigger data file I get this error:

  File "/home/.local/lib/python2.7/site-packages/numpy/lib/npyio.py", line 2047, in genfromtxt
    for (i, conv) in enumerate(converters)]))
MemoryError

Is the file too big for genfromtxt? How can I fix it?

Thank you in advance, Greetings :)

Be specific. How big is the file that breaks the program? How big is the file that does not?

Use ls -l your_file to find its size.

1 Answer

As discussed in the comments, the resulting object is probably too large for your memory.

NumPy can store arrays on disk instead of in RAM (preferably on an SSD; on an HDD this will probably be too slow). This is called a memmap (numpy.memmap).
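As a minimal sketch of what a memmap looks like in practice: the file name, the shape and the float64 dtype below are assumptions made up for illustration, not values taken from the question.

```python
import os
import tempfile

import numpy as np

# Hypothetical file name and shape, chosen only for illustration.
path = os.path.join(tempfile.mkdtemp(), "netdata.dat")
shape = (1000, 5)

# mode="w+" creates (or overwrites) the file and maps it for reading and writing.
mm = np.memmap(path, dtype=np.float64, mode="w+", shape=shape)
mm[0, :] = [1, 2, 3, 4, 5]   # assignments go to the mapped file
mm.flush()                   # push pending changes to disk
del mm

# Reopen read-only. dtype and shape must be given again,
# because the raw file does not store them.
ro = np.memmap(path, dtype=np.float64, mode="r", shape=shape)
first_row = ro[0].tolist()
print(first_row)
```

The array is indexed like a normal ndarray, but only the parts you touch are paged into memory.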

It is possible to store datatypes such as strings in a memmap, but this can become tricky; see the question "numpy.memmap for an array of strings?".

Getting the data into the memmap in the first place can also be complicated. You might want to split the file and load it in several passes, writing each portion into the memmap as you go.
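One way the chunked loading could look is sketched below. It assumes that all five columns are numeric (which may not hold for your file, since you use dtype=None); load_into_memmap, the chunk size and the demo file names are hypothetical names introduced here for illustration.

```python
import itertools
import os
import tempfile

import numpy as np


def load_into_memmap(src, dst, n_cols=5, chunk_rows=1000):
    """Parse a '|'-delimited numeric text file chunk by chunk into a memmap."""
    # First pass: count the non-empty rows so the memmap can be sized up front.
    with open(src) as f:
        n_rows = sum(1 for line in f if line.strip())

    out = np.memmap(dst, dtype=np.float64, mode="w+", shape=(n_rows, n_cols))
    row = 0
    with open(src) as f:
        lines = (line for line in f if line.strip())
        while True:
            # Take at most chunk_rows lines; only this chunk is parsed in RAM.
            chunk = list(itertools.islice(lines, chunk_rows))
            if not chunk:
                break
            block = np.genfromtxt(chunk, dtype=np.float64, delimiter="|",
                                  usecols=tuple(range(n_cols)))
            # A one-line chunk comes back 1-D, so force a 2-D shape.
            block = block.reshape(-1, n_cols)
            out[row:row + len(block)] = block
            row += len(block)
    out.flush()
    return out


# Small demo file with 25 rows, loaded 10 rows at a time.
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "resultscut_demo.rw")
with open(src, "w") as f:
    for i in range(25):
        f.write("|".join(str(i * 10 + j) for j in range(5)) + "\n")

data = load_into_memmap(src, os.path.join(tmp, "netdata.dat"), chunk_rows=10)
print(data.shape)
```

The file is read twice (once to count rows, once to parse), which trades some I/O for never holding more than one chunk of parsed data in memory.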

Another important point is the dtype. You specify None and read many columns. Do the columns have different datatypes? If so, you might want to switch from numpy to pandas, which gives you a proper per-column datatype for this spreadsheet-like data. Be sure to choose an appropriate datatype for every column; that can significantly reduce your memory footprint (and might already solve your problem): https://www.dataquest.io/blog/pandas-big-data/
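A rough sketch of the pandas approach with explicit per-column dtypes follows. The sample rows, column names and chosen dtypes are invented for illustration; the real ones depend on what resultscut.rw actually contains.

```python
import io

import pandas as pd

# Hypothetical sample in the same '|'-delimited layout (no header row).
raw = "1|alpha|0.5|10|x\n2|beta|1.5|20|y\n3|alpha|2.5|30|x\n"

# Assumed column names and dtypes; small integer/float types and
# 'category' for repeated strings usually take less memory than the defaults.
names = ["id", "label", "value", "count", "flag"]
dtypes = {
    "id": "int32",
    "label": "category",
    "value": "float32",
    "count": "int16",
    "flag": "category",
}

df = pd.read_csv(io.StringIO(raw), sep="|", header=None,
                 names=names, dtype=dtypes)

print(df.dtypes)
print(df.memory_usage(deep=True))  # per-column memory in bytes
```

For a file that still does not fit, read_csv also accepts a chunksize argument, which returns an iterator of DataFrames instead of loading everything at once.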
