python - KeyError when using melt to restructure Dataframe -


i have dataframe looks follows , has 2628 rows , 101 columns. want convert years row associated numbers 0.08333 0.16666 0.249999 , on, column:

years      currency  0.08333333  0.16666666  0.24999999  0.33333332  \ 2005-01-04      gbp    4.709456    4.633861    4.586271    4.567017    2005-01-05      gbp    4.713099    4.649220    4.606802    4.588313    2005-01-06      gbp    4.707237    4.646861    4.609294    4.593076 

the code follows, combined_data dataframe. used melt error keyerror: 'years' , don't know how handle this:

from pandas.io.excel import read_excel import pandas pd import numpy np  url = 'http://www.bankofengland.co.uk/statistics/documents/yieldcurve/uknom05_mdaily.xls'  # check sheet number, spot: 9/9, short end 7/9 spot_curve = read_excel(url, sheetname=8) short_end_spot_curve = read_excel(url, sheetname=6)  # cleaning, keep nan now, forward fill nan not recommended yield curve spot_curve.columns = spot_curve.loc['years:'] spot_curve.columns.name = 'years' valid_index = spot_curve.index[4:] spot_curve = spot_curve.loc[valid_index] # remove maturities within 5 years duplicated in short-end file col_mask = spot_curve.columns.values > 5 spot_curve = spot_curve.iloc[:, col_mask]   short_end_spot_curve.columns = short_end_spot_curve.loc['years:'] short_end_spot_curve.columns.name = 'years' valid_index = short_end_spot_curve.index[4:] short_end_spot_curve = short_end_spot_curve.loc[valid_index]  # merge these two, time index identical # ============================================== combined_data = pd.concat([short_end_spot_curve, spot_curve], axis=1, join='outer') # sort maturity short end long end combined_data.sort_index(axis=1, inplace=true)  def filter_func(group):     return group.isnull().sum(axis=1) <= 50  combined_data = combined_data.groupby(level=0).filter(filter_func)  idx = 0 values = ['gbp'] * len(combined_data.index) combined_data.insert(idx, 'currency', values) print combined_data  pd.melt(combined_data,id_vars=['years']) #error! 

edit: desired results:

years                     currency   0.08333333   2005-01-04      gbp    4.709456    4.633861    4.586271    4.567017    0.16666666   2005-01-05      gbp    4.713099    4.649220    4.606802    4.588313    0.24999999   2005-01-06      gbp    4.707237    4.646861    4.609294    4.593076 

this may need tweaked based on how many rows have relative columns, give desired results (more or less):

        years currency  0.08333333  0.16666666  0.24999999  0.33333332 0  2005-01-04      gbp    4.709456    4.633861    4.586271    4.567017 1  2005-01-05      gbp    4.713099    4.649220    4.606802    4.588313 2  2005-01-06      gbp    4.707237    4.646861    4.609294    4.593076  df['x'] = df.columns.values[-4:-1] df = df.set_index('x',drop=true) df.columns = ['years','currency','v1','v2','v3','v4']                   years currency        v1        v2        v3        v4 x                                                                       0.08333333  2005-01-04      gbp  4.709456  4.633861  4.586271  4.567017 0.16666666  2005-01-05      gbp  4.713099  4.649220  4.606802  4.588313 0.24999999  2005-01-06      gbp  4.707237  4.646861  4.609294  4.593076 

Comments

Popular posts from this blog

PHP DOM loadHTML() method unusual warning -

python - How to create jsonb index using GIN on SQLAlchemy? -

c# - TransactionScope not rolling back although no complete() is called -