python - Placing every value in its percentile in Pandas


Consider a series with the following percentiles:

    df['col_1'].describe(percentiles=np.linspace(0, 1, 20))

    count      13859.000000
    mean         421.772842
    std        14665.298998
    min            1.201755
    0%             1.201755
    5.3%           1.430695
    10.5%          1.438417
    15.8%          1.466462
    21.1%          1.473050
    26.3%          1.500834
    31.6%          1.512218
    36.8%          1.542935
    42.1%          1.579845
    47.4%          1.647162
    50%            1.690612
    52.6%          1.749047
    57.9%          1.955589
    63.2%          2.344475
    68.4%          3.075641
    73.7%          4.466094
    78.9%          8.410964
    84.2%         14.998738
    89.5%         41.363612
    94.7%        162.865079
    100%     1511013.790233
    max      1511013.790233
    Name: col_1, dtype: float64

I want a column col_2 that holds the percentile bin each row falls into, according to the calculation made above.

How can I do this in pandas?

    import pandas as pd
    df2 = pd.DataFrame(range(1000))
    df2.columns = ['a1']
    df2['percentile'] = pd.qcut(df2.a1, 100, labels=False)  # integer bin number 0-99 for each row
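pd.qcut also accepts an array of quantile levels instead of a bin count, so the same np.linspace(0, 1, 20) levels passed to describe() can be reused to bin the question's column. A minimal sketch, assuming synthetic data standing in for df['col_1']; the column col_2_pct is a hypothetical extra that shows each row's bin as the lower quantile edge in percent, comparable to the describe() row labels.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's df['col_1']: 1000 distinct floats.
df = pd.DataFrame({"col_1": np.arange(1, 1001, dtype=float)})

# The same 20 quantile levels used in describe(): 0, 1/19, 2/19, ..., 1.
quantiles = np.linspace(0, 1, 20)

# 20 edges define 19 bins; labels=False gives each row its bin number 0..18.
df["col_2"] = pd.qcut(df["col_1"], quantiles, labels=False)

# Hypothetical helper column: lower edge of each row's bin as a percent (0.0, 5.3, ...).
df["col_2_pct"] = (quantiles[df["col_2"].to_numpy()] * 100).round(1)

print(df.head())
```

If the real column contains many repeated values, some quantile edges can coincide and qcut raises a ValueError; in pandas 0.20 and later, passing duplicates="drop" removes the repeated edges at the cost of getting fewer bins.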

Or leave out labels=False to see the value range of each bin instead of its number.
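As a sketch of what leaving out labels=False gives you: in recent pandas versions qcut then returns a Categorical whose categories are pd.Interval objects describing each bin's range. Four bins are used here only to keep the output short; the column name bin is illustrative.

```python
import pandas as pd

df2 = pd.DataFrame({"a1": range(1000)})

# Without labels=False, each row gets the Interval (bin range) it falls into.
df2["bin"] = pd.qcut(df2["a1"], 4)

# The distinct bin ranges, in order from lowest to highest.
print(df2["bin"].cat.categories)
```

The integer codes from labels=False are still available afterwards through df2["bin"].cat.codes, so both views can be kept from a single qcut call.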


Note: in Python 3 with pandas 0.16.2 (the latest version as of today), you need to use list(range(1000)) instead of range(1000) for the code above to work.

