csv - Splitting a Dataframe Column in Python -


i trying drop rows pandas dataframe df. looks , has 180 rows , 2745 columns. want rid of rows have curv_typ of pyc_rt , ycif_rt. want rid of geo\time column. extracting data csv file , have realize curv_typ,maturity,bonds,geo\time , characters below pyc_rt,y1,gbaaa,ea in 1 column:

 curv_typ,maturity,bonds,geo\time  2015m06d16   2015m06d15   2015m06d11   \ 0                 pyc_rt,y1,gbaaa,ea        -0.24        -0.24        -0.24    1               pyc_rt,y1,gba_aaa,ea        -0.02        -0.03        -0.10    2                pyc_rt,y10,gbaaa,ea         0.94         0.92         0.99    3              pyc_rt,y10,gba_aaa,ea         1.67         1.70         1.60    4                pyc_rt,y11,gbaaa,ea         1.03         1.01         1.09  

i decided try , split column , drop resulting individual columns, getting error keyerror: 'curv_typ,maturity,bonds,geo\time' in last line of code df_new = pd.dataframe(df['curv_typ,maturity,bonds,geo\time'].str.split(',').tolist(), df[1:]).stack()

import os import urllib2 import gzip import stringio import pandas pd  baseurl = "http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/bulkdownloadlisting?file=" filename = "data/irt_euryld_d.tsv.gz" outfilepath = filename.split('/')[1][:-3]  response = urllib2.urlopen(baseurl + filename) compressedfile = stringio.stringio() compressedfile.write(response.read())  compressedfile.seek(0)  decompressedfile = gzip.gzipfile(fileobj=compressedfile, mode='rb')   open(outfilepath, 'w') outfile:     outfile.write(decompressedfile.read())  #now have deal tsv file import csv  open(outfilepath,'rb') tsvin, open('ecb.csv', 'wb') csvout:     tsvin = csv.reader(tsvin, delimiter='\t')     writer = csv.writer(csvout)     data in tsvin:         writer.writerow(data)   csvout = 'c:\users\sidney\ecb.csv' #df = pd.dataframe.from_csv(csvout) df = pd.read_csv('c:\users\sidney\ecb.csv', delimiter=',', encoding="utf-8-sig") print df df_new = pd.dataframe(df['curv_typ,maturity,bonds,geo\time'].str.split(',').tolist(), df[1:]).stack() 

edit: reptilicus's answer used code below:

#now have deal tsv file import csv  outfilepath = filename.split('/')[1][:-3] #as in code above, put here reference csvout = 'c:\users\sidney\ecb.tsv' outfile = open(csvout, "w") open(outfilepath, "rb") f:     line in f.read():         line.replace(",", "\t")         outfile.write(line) outfile.close()  df = pd.dataframe.from_csv("ecb.tsv", sep="\t", index_col=false) 

i still same exact output before.

thank you

the format of csv awful, there comma , tab separated data in there.

get rid of commas first:

tr ',' '\t' < irt_euryld_d.tsv > test.tsv 

if can't use tr can in python:

outfile = open("outfile.tsv", "w") open("irt_euryld_d.tsz", "rb") f:     line in f.read():         line.replace(",", "\t")         outfile.write(line) outfile.close() 

then can load nicely in pandas:

in [9]: df = dataframe.from_csv("test.tsv", sep="\t", index_col=false)  in [10]: df out[10]:     curv_typ maturity    bonds geo\time  2015m06d17   2015m06d16   \ 0     pyc_rt       y1    gbaaa       ea        -0.23        -0.24 1     pyc_rt       y1  gba_aaa       ea        -0.05        -0.02 2     pyc_rt      y10    gbaaa       ea         0.94         0.94 3     pyc_rt      y10  gba_aaa       ea         1.66         1.67 in [11]: df[df["curv_typ"] != "pyc_rt"] out[11]:     curv_typ maturity    bonds geo\time  2015m06d17   2015m06d16   \ 60   ycif_rt       y1    gbaaa       ea        -0.22        -0.23 61   ycif_rt       y1  gba_aaa       ea         0.04         0.08 62   ycif_rt      y10    gbaaa       ea         2.00         1.97 

Comments

Popular posts from this blog

PHP DOM loadHTML() method unusual warning -

python - How to create jsonb index using GIN on SQLAlchemy? -

c# - TransactionScope not rolling back although no complete() is called -