Pandas: row_number within groups defined by multiple keys, ordered by a column

import pandas as pd

df = pd.DataFrame({"key1" : ["a","a","a","b","b"],
    "key2" : ["c","d","c","c","d"],
    "data" : [1,10,2,3,30]})

>>> df
  key1 key2  data
0    a    c     1
1    a    d    10
2    a    c     2
3    b    c     3
4    b    d    30


Desired output:

  key1 key2  data  row_number
0    a    c     1           1
1    a    d    10           1
2    a    c     2           2
3    b    c     3           1
4    b    d    30           1

I want to group by key1 and key2, sort by data within each group, and get each row's sequence number inside its group (like SQL's row_number). How can this be done? The following approach, which I found by searching, did not work:

df["row_number"] = df["data"].groupby(df["key1","key2"]).rank(ascending=True,method="first")
May 28, 2022

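def cumsum_seq(v):
    # Sort the rows of one (key1, key2) group by 'data'.
    sub = v.sort_values('data')
    # Every row starts with seq == 1, so the running sum numbers the rows 1, 2, 3, ...
    sub['seq'] = sub['seq'].cumsum()
    return sub.loc[:, ['data', 'seq']]

# Helper column of ones; its cumulative sum within each group becomes the row number.
df['seq'] = 1
# The group keys and the original row label ('level_2') land in the index;
# reset_index() turns them back into columns, then the row label is dropped.
df.groupby(['key1', 'key2']).apply(cumsum_seq).reset_index().drop(columns='level_2')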

Result:

  key1 key2  data  seq
0    a    c     1    1
1    a    c     2    2
2    a    d    10    1
3    b    c     3    1
4    b    d    30    1
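
For comparison, the attempt in the question fails because df["key1","key2"] looks up a single column labelled by the tuple ("key1", "key2"), which does not exist, so pandas raises a KeyError. A minimal sketch of two alternatives that keep the original row order is shown below; it assumes the same sample frame, and the column name row_number_alt is made up for illustration only.

```python
import pandas as pd

df = pd.DataFrame({"key1": ["a", "a", "a", "b", "b"],
                   "key2": ["c", "d", "c", "c", "d"],
                   "data": [1, 10, 2, 3, 30]})

# Pass the key names as a list to groupby, then rank 'data' inside each group.
# method="first" breaks ties by order of appearance, like SQL's row_number.
df["row_number"] = (df.groupby(["key1", "key2"])["data"]
                      .rank(method="first", ascending=True)
                      .astype(int))

# Same idea without rank: sort by 'data', then count rows within each group.
# cumcount() is 0-based, so add 1; the assignment aligns on the original index.
df["row_number_alt"] = (df.sort_values("data")
                          .groupby(["key1", "key2"])
                          .cumcount() + 1)

print(df)
```

rank returns floats, hence the cast to int; the cast would fail if data contained missing values, in which case the float result could be kept as is.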