How to automatically bin a column into intervals and compute statistics per interval

V1   V2
0.5  2
0.8  13
0.7  5
0.9  25
1.2  4
...

How can I group V2 into intervals (splitting the range from its minimum to its maximum into n intervals) and compute the average of the V1 values in each interval?
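The arithmetic behind equal-width intervals: with n intervals over [min, max], each interval has width step = (max - min) / n, and a value v falls in interval floor((v - min) / step), with the maximum itself clamped into the last interval. A minimal NumPy sketch of that arithmetic, assuming the sample V1/V2 values above and n = 3; the helper name `equal_width_bin_means` is hypothetical, and values very close to an edge may land on either side because of floating-point rounding:

```python
import numpy as np


def equal_width_bin_means(v1, v2, n):
    """Hypothetical helper: mean of v1 inside each of n equal-width bins of v2."""
    v1 = np.asarray(v1, dtype=float)
    v2 = np.asarray(v2, dtype=float)
    lo, hi = v2.min(), v2.max()
    step = (hi - lo) / n
    # Bin index = floor((v - min) / step); the maximum would get index n,
    # so clamp it back into the last bin (n - 1).
    idx = np.minimum(np.floor((v2 - lo) / step).astype(int), n - 1)
    means = np.full(n, np.nan)  # bins with no rows stay NaN
    for i in range(n):
        in_bin = idx == i
        if in_bin.any():
            means[i] = v1[in_bin].mean()
    edges = lo + step * np.arange(n + 1)
    return edges, means


edges, means = equal_width_bin_means([0.5, 0.8, 0.7, 0.9, 1.2], [2, 13, 5, 25, 4], n=3)
print(edges)
print(means)
```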

Mar.23,2021

In R, you can define the bins by hand with case_when:

library(tidyverse)

# `data` is the data frame with columns V1 and V2
data %>%
  mutate(V2 = case_when(between(V2, 0, 1) ~ "0-1",
                        between(V2, 1, 2) ~ "1-2",
                        TRUE ~ ">2")) %>%
  group_by(V2) %>%
  summarize(mean_value = mean(V1))

I'm not sure whether this R code is more user-friendly than the approach above.
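For readers working in pandas, np.select plays a role similar to case_when: the conditions are checked in order, the first match wins, and `default` acts like the final `TRUE ~ ...` branch. A minimal sketch on the sample data from the question; the bin edges and labels are purely illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"V1": [0.5, 0.8, 0.7, 0.9, 1.2],
                   "V2": [2, 13, 5, 25, 4]})

# Conditions are evaluated in order, like case_when: the first True wins,
# so a V2 of exactly 1 is labelled "0-1".
conditions = [df.V2.between(0, 1), df.V2.between(1, 2)]
labels = ["0-1", "1-2"]
# default covers every remaining row, like the catch-all TRUE branch in R.
df["V2_bin"] = np.select(conditions, labels, default=">2")

result = df.groupby("V2_bin")["V1"].mean()
print(result)
```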


Bin the v2 column, map each value to a bin ID stored in a new column, and then compute the average of v1 per bin ID.

Please refer to the following code:

import numpy as np
import pandas as pd


df = pd.DataFrame([
    [0.1, 1],
    [0.2, 2],
    [0.3, 3],
    [0.4, 4],
    [0.5, 5],
    [0.6, 6],
], columns=['v1', 'v2'])


# compute the edges of 3 equal-width bins over v2;
# np.histogram returns (counts, bin_edges), so take element [1]
ranges = np.histogram(df.v2.values, 3)[1]


def tag_v2(value, ranges):
    '''Return the bin ID of a v2 value, or -1 if it lies outside all bins.'''
    for i in range(len(ranges) - 1):
        if value >= ranges[i] and value <= ranges[i+1]:
            return i
    return -1


# tag each v2 value with its bin ID
df['v2_tag'] = df.v2.apply(lambda i: tag_v2(i, ranges))

print(df)

Output:

      v1    v2    v2_tag
0    0.1    1    0
1    0.2    2    0
2    0.3    3    1
3    0.4    4    1
4    0.5    5    2
5    0.6    6    2
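The loop in tag_v2 runs in Python once per row. A vectorized sketch of the same tagging with np.digitize, comparing only against the interior edges; with right=True a value that sits exactly on an interior edge goes to the lower bin, as in tag_v2. One difference to keep in mind: values outside [min, max] are clamped to the first or last bin instead of getting -1, which does not matter here because the edges come from the same column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"v1": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
                   "v2": [1, 2, 3, 4, 5, 6]})

# Edges of 3 equal-width bins, computed the same way as in the answer.
ranges = np.histogram(df.v2.values, 3)[1]

# Compare only against interior edges; right=True puts a value equal to an
# interior edge into the lower bin, matching tag_v2's first-match rule.
df["v2_tag"] = np.digitize(df.v2.values, ranges[1:-1], right=True)
print(df)
```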

Compute the average of v1 per bin:

print(df.groupby('v2_tag')['v1'].mean())

Output:

v2_tag
0    0.15
1    0.35
2    0.55
Name: v1, dtype: float64
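The title also mentions sums. Once each row carries a bin ID, several statistics can be computed in one pass with groupby(...).agg(...). A minimal sketch on the same toy data; here the bin ID comes from pd.cut(..., labels=False, include_lowest=True) instead of tag_v2, which for this data yields the same IDs (0, 0, 1, 1, 2, 2):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"v1": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
                   "v2": [1, 2, 3, 4, 5, 6]})
ranges = np.histogram(df.v2.values, 3)[1]

# labels=False returns integer bin IDs; include_lowest=True keeps the minimum
# v2 inside the first (right-closed) interval.
df["v2_tag"] = pd.cut(df.v2, bins=ranges, labels=False, include_lowest=True)

# A list of function names gives one output column per statistic.
stats = df.groupby("v2_tag")["v1"].agg(["mean", "sum", "count"])
print(stats)
```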

Alternatively, use pandas.cut together with groupby:

import pandas as pd
from io import StringIO

s = """
V1,V2
0.5,2
0.8,13
0.7,5
0.9,25
1.2,4"""

df = pd.read_csv(StringIO(s))

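# split the V2 range into 10 equal-width intervals
step = (df.V2.max() - df.V2.min()) / 10
bins = [df.V2.min() + i * step for i in range(11)]
# include_lowest=True keeps the row with the minimum V2 (right-closed intervals
# would otherwise exclude it); observed=False keeps empty intervals as NaN
result = df.groupby(pd.cut(df.V2, bins, include_lowest=True), observed=False).V1.mean()
print(result)

output:

V2
(1.999, 4.3]    0.85
(4.3, 6.6]      0.70
(6.6, 8.9]       NaN
(8.9, 11.2]      NaN
(11.2, 13.5]    0.80
(13.5, 15.8]     NaN
(15.8, 18.1]     NaN
(18.1, 20.4]     NaN
(20.4, 22.7]     NaN
(22.7, 25.0]    0.90
Name: V1, dtype: float64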
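If plain equal-width intervals are all you need, pd.cut can compute the edges itself when `bins` is an integer. In that case pandas widens the outer edge of the range by 0.1%, so the minimum V2 is not dropped and no manual step calculation is needed; the resulting edges therefore differ very slightly from the hand-computed ones. A minimal sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({"V1": [0.5, 0.8, 0.7, 0.9, 1.2],
                   "V2": [2, 13, 5, 25, 4]})

# An integer bins argument asks pd.cut for that many equal-width intervals;
# pandas widens the range by 0.1% so the minimum V2 stays in the first one.
binned = pd.cut(df.V2, bins=10)

# observed=False keeps the empty intervals in the result (their mean is NaN).
result = df.groupby(binned, observed=False).V1.mean()
print(result)
```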