I want to process quite large ARFF files in scikit-learn. The files are stored in a zip archive, and I do not want to extract them into a folder before processing them. Therefore, I use the zipfile module of Python 3.6:
    from zipfile import ZipFile
    from scipy.io.arff import loadarff

    archive = ZipFile('archive.zip', 'r')
    datafile = archive.open('datafile.arff')
    data = loadarff(datafile)
    # ...
    datafile.close()
    archive.close()
However, this produces the following error:
    Traceback (most recent call last):
      File "./m.py", line 6, in <module>
        data = loadarff(datafile)
      File "/usr/lib64/python3.6/site-packages/scipy/io/arff/arffread.py", line 541, in loadarff
        return _loadarff(ofile)
      File "/usr/lib64/python3.6/site-packages/scipy/io/arff/arffread.py", line 550, in _loadarff
        rel, attr = read_header(ofile)
      File "/usr/lib64/python3.6/site-packages/scipy/io/arff/arffread.py", line 323, in read_header
        while r_comment.match(i):
    TypeError: cannot use a string pattern on a bytes-like object
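The last frame of the traceback suggests what goes wrong: the ARFF reader matches str regular expressions against the lines it reads, but ZipFile.open always returns a binary stream, so those lines are bytes. The following minimal, stdlib-only reproduction illustrates this under stated assumptions: the archive is built in memory with made-up ARFF contents, and the regex only approximates the comment pattern used inside scipy.

```python
import io
import re
import zipfile

# Build a tiny zip archive in memory; the ARFF contents are made up for illustration.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("datafile.arff", "% a comment line\n@relation example\n")

# A str (not bytes) pattern, similar in spirit to the comment regex in the ARFF reader.
r_comment = re.compile(r"^%")

with zipfile.ZipFile(buf) as archive:
    with archive.open("datafile.arff") as raw:
        # ZipFile.open returns a binary stream, so readline() yields bytes here.
        line = raw.readline()

try:
    r_comment.match(line)
    error = None
except TypeError as exc:
    # Matching a str pattern against a bytes line raises the same TypeError as above.
    error = str(exc)

print(type(line).__name__, "->", error)
```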
According to the documentation, loadarff requires a file-like object. Therefore, my question is: how can I get the contents of the ARFF file out of the zip archive as an object of this type?
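A commonly used approach is to wrap the binary member stream in io.TextIOWrapper, which decodes it to str lines on the fly without extracting anything to disk. This is a minimal sketch, assuming the ARFF member is UTF-8 encoded; the helper names open_text_member and load_arff_from_zip are hypothetical, not part of zipfile or scipy.

```python
import io
import zipfile


def open_text_member(archive: zipfile.ZipFile, name: str,
                     encoding: str = "utf-8") -> io.TextIOWrapper:
    """Hypothetical helper: open a zip member as a text-mode file-like object."""
    # ZipFile.open always returns a binary stream; TextIOWrapper decodes it
    # to str while reading, so nothing is extracted to a folder.
    # Assumption: the member uses the given encoding (UTF-8 by default).
    return io.TextIOWrapper(archive.open(name, "r"), encoding=encoding)


def load_arff_from_zip(zip_path: str, member: str, encoding: str = "utf-8"):
    """Hypothetical helper: load one ARFF member of a zip archive with loadarff."""
    # Imported here so open_text_member can be used without SciPy installed.
    from scipy.io.arff import loadarff

    with zipfile.ZipFile(zip_path, "r") as archive:
        with open_text_member(archive, member, encoding) as datafile:
            # loadarff returns a (data, meta) tuple.
            return loadarff(datafile)
```

With the names from the question, usage would be data, meta = load_arff_from_zip('archive.zip', 'datafile.arff'). The with blocks close both the member stream and the archive, which the original script tried to do with explicit close() calls.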
Note: If I extract the archive manually and load the ARFF file directly with
data = loadarff('datafile.arff'), everything works fine.
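Loading by file name works because loadarff opens the path itself and gets text lines. An in-memory counterpart to extracting manually is to decompress the whole member, decode it, and wrap the result in io.StringIO. This is a sketch assuming UTF-8 content; read_member_as_text is a hypothetical helper name. Unlike the streaming TextIOWrapper approach, it holds the full decompressed bytes and the decoded text in memory at once, which may matter for large files.

```python
import io
import zipfile


def read_member_as_text(archive: zipfile.ZipFile, name: str,
                        encoding: str = "utf-8") -> io.StringIO:
    """Hypothetical helper: decompress a zip member into an in-memory text file."""
    # ZipFile.read returns the entire decompressed member as bytes.
    raw = archive.read(name)
    # Decoding and wrapping in StringIO mimics a file opened in text mode.
    # Assumption: the member uses the given encoding (UTF-8 by default).
    return io.StringIO(raw.decode(encoding))
```

The returned object could then be passed to loadarff in place of the file name, e.g. loadarff(read_member_as_text(archive, 'datafile.arff')).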