python - Placing every value in its percentile in Pandas
Consider a series with the following percentiles:
    > df['col_1'].describe(percentiles=np.linspace(0, 1, 20))
    count      13859.000000
    mean         421.772842
    std        14665.298998
    min            1.201755
    0%             1.201755
    5.3%           1.430695
    10.5%          1.438417
    15.8%          1.466462
    21.1%          1.473050
    26.3%          1.500834
    31.6%          1.512218
    36.8%          1.542935
    42.1%          1.579845
    47.4%          1.647162
    50%            1.690612
    52.6%          1.749047
    57.9%          1.955589
    63.2%          2.344475
    68.4%          3.075641
    73.7%          4.466094
    78.9%          8.410964
    84.2%         14.998738
    89.5%         41.363612
    94.7%        162.865079
    100%     1511013.790233
    max      1511013.790233
    Name: col_1, dtype: float64
I would like to create a column col_2 holding the percentile that each row is assigned to in the calculation made above.
How can I do that in pandas?
    import pandas as pd

    df2 = pd.DataFrame(range(1000))
    df2.columns = ['a1']
    # labels=False returns the integer bin number (0 to 99) for each row
    df2['percentile'] = pd.qcut(df2.a1, 100, labels=False)
Or leave out the labels argument to see the range of each bin instead of its number.
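As a minimal sketch of the difference between the two calls, the snippet below builds the same 1000-row frame and stores both results side by side. The column name bin_range is made up for this illustration, and the exact type of the values without labels depends on the pandas version (recent versions return interval objects).

```python
import pandas as pd

df2 = pd.DataFrame({'a1': list(range(1000))})

# With labels=False, qcut returns the integer bin number (0 to 99).
df2['percentile'] = pd.qcut(df2['a1'], 100, labels=False)

# Without the labels argument, qcut returns the value range of each bin.
# 'bin_range' is a hypothetical column name used only for this sketch.
df2['bin_range'] = pd.qcut(df2['a1'], 100)

print(df2.head())
print(df2.tail())
```

With 1000 evenly spaced values and 100 bins, every bin ends up holding 10 rows.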
Note that in Python 3 with pandas 0.16.2 (the latest version as of today), you need to use list(range(1000)) instead of range(1000) for the above to work.
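The same idea can be applied to the quantile points used in the question by passing them to pd.qcut instead of a bin count. This is only a sketch under assumptions: the data is randomly generated as a stand-in for the real col_1, and the helper column col_2_pct is a name invented here.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data: a skewed, positive column.
rng = np.random.default_rng(0)
df = pd.DataFrame({'col_1': rng.lognormal(mean=1.0, sigma=2.0, size=5000)})

# Same 20 quantile points as in the describe() call: 0, 1/19, ..., 1.
quantiles = np.linspace(0, 1, 20)

# 20 edges give 19 bins; labels=False yields the bin number (0 to 18) per row.
df['col_2'] = pd.qcut(df['col_1'], quantiles, labels=False)

# Optionally map each bin number to the upper percentile bound of its bin.
# 'col_2_pct' is a hypothetical column name used only for this sketch.
df['col_2_pct'] = quantiles[1:][df['col_2'].to_numpy()] * 100

print(df.head())
```

If the real data has many tied values, some quantile edges may coincide and pd.qcut raises an error by default; in newer pandas versions, passing duplicates='drop' merges those edges, at the cost of fewer bins.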