csv - Splitting a Dataframe Column in Python -
i trying drop rows pandas dataframe df
. looks , has 180 rows , 2745 columns. want rid of rows have curv_typ
of pyc_rt
, ycif_rt
. want rid of geo\time
column. extracting data csv file , have realize curv_typ,maturity,bonds,geo\time
, characters below pyc_rt,y1,gbaaa,ea
in 1 column:
curv_typ,maturity,bonds,geo\time 2015m06d16 2015m06d15 2015m06d11 \ 0 pyc_rt,y1,gbaaa,ea -0.24 -0.24 -0.24 1 pyc_rt,y1,gba_aaa,ea -0.02 -0.03 -0.10 2 pyc_rt,y10,gbaaa,ea 0.94 0.92 0.99 3 pyc_rt,y10,gba_aaa,ea 1.67 1.70 1.60 4 pyc_rt,y11,gbaaa,ea 1.03 1.01 1.09
i decided try , split column , drop resulting individual columns, getting error keyerror: 'curv_typ,maturity,bonds,geo\time'
in last line of code df_new = pd.dataframe(df['curv_typ,maturity,bonds,geo\time'].str.split(',').tolist(), df[1:]).stack()
import os import urllib2 import gzip import stringio import pandas pd baseurl = "http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/bulkdownloadlisting?file=" filename = "data/irt_euryld_d.tsv.gz" outfilepath = filename.split('/')[1][:-3] response = urllib2.urlopen(baseurl + filename) compressedfile = stringio.stringio() compressedfile.write(response.read()) compressedfile.seek(0) decompressedfile = gzip.gzipfile(fileobj=compressedfile, mode='rb') open(outfilepath, 'w') outfile: outfile.write(decompressedfile.read()) #now have deal tsv file import csv open(outfilepath,'rb') tsvin, open('ecb.csv', 'wb') csvout: tsvin = csv.reader(tsvin, delimiter='\t') writer = csv.writer(csvout) data in tsvin: writer.writerow(data) csvout = 'c:\users\sidney\ecb.csv' #df = pd.dataframe.from_csv(csvout) df = pd.read_csv('c:\users\sidney\ecb.csv', delimiter=',', encoding="utf-8-sig") print df df_new = pd.dataframe(df['curv_typ,maturity,bonds,geo\time'].str.split(',').tolist(), df[1:]).stack()
edit: reptilicus's answer used code below:
#now have deal tsv file import csv outfilepath = filename.split('/')[1][:-3] #as in code above, put here reference csvout = 'c:\users\sidney\ecb.tsv' outfile = open(csvout, "w") open(outfilepath, "rb") f: line in f.read(): line.replace(",", "\t") outfile.write(line) outfile.close() df = pd.dataframe.from_csv("ecb.tsv", sep="\t", index_col=false)
i still same exact output before.
thank you
the format of csv awful, there comma , tab separated data in there.
get rid of commas first:
tr ',' '\t' < irt_euryld_d.tsv > test.tsv
if can't use tr
can in python:
outfile = open("outfile.tsv", "w") open("irt_euryld_d.tsz", "rb") f: line in f.read(): line.replace(",", "\t") outfile.write(line) outfile.close()
then can load nicely in pandas:
in [9]: df = dataframe.from_csv("test.tsv", sep="\t", index_col=false) in [10]: df out[10]: curv_typ maturity bonds geo\time 2015m06d17 2015m06d16 \ 0 pyc_rt y1 gbaaa ea -0.23 -0.24 1 pyc_rt y1 gba_aaa ea -0.05 -0.02 2 pyc_rt y10 gbaaa ea 0.94 0.94 3 pyc_rt y10 gba_aaa ea 1.66 1.67 in [11]: df[df["curv_typ"] != "pyc_rt"] out[11]: curv_typ maturity bonds geo\time 2015m06d17 2015m06d16 \ 60 ycif_rt y1 gbaaa ea -0.22 -0.23 61 ycif_rt y1 gba_aaa ea 0.04 0.08 62 ycif_rt y10 gbaaa ea 2.00 1.97
Comments
Post a Comment