How do I compute the standard deviation of data given as a column of values and a column of frequencies?

I have a sample in which column A holds the values and column B holds the number of times each value occurs. The sample is very large. I need the standard deviation of this sample and a normal-distribution plot of it. How should I do this in Python? I've searched a lot, but I still don't quite understand it.
A B
100 2
200 3
300 4
...

Mar.28,2021

Although @Leo Li Shiting's method solves the problem, it is not efficient and does not take advantage of numpy's vectorized (matrix) operations.

Below is a more concise and efficient approach, which also shows how neatly vectorized numpy/pandas operations handle this kind of problem.

This assumes you already know how to calculate the standard deviation of a set of numbers; if not, see https://zh.wikipedia.org/zh-h.
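For reference, these are the formulas the code below implements, written with $a_i$ for the values in column a and $b_i$ for their counts in column b. This is the population standard deviation (dividing by $n$); the sample version would divide by $n - 1$ instead.

```latex
n = \sum_i b_i, \qquad
m = \frac{1}{n} \sum_i a_i b_i, \qquad
\mathrm{sd} = \sqrt{\frac{1}{n} \sum_i b_i \, (a_i - m)^2}
```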


For example, take the values [100, 200, 300] with corresponding counts [1, 2, 3]:

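import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'a': [100, 200, 300],   # the values
    'b': [1, 2, 3],         # how many times each value in a occurs
})

# n: total number of observations, m: weighted mean, sd: standard deviation
n = df.b.sum()
m = (df.a * df.b).sum() / n
sd = ((df.b * ((df.a - m) ** 2)).sum() / n) ** 0.5

# histogram of the values, each weighted by its count
plt.hist(df.a, weights=df.b)
plt.show()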
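The question also asks for a normal-distribution plot, which `plt.hist` alone does not draw. A minimal sketch, assuming that what is wanted is a normal curve with the computed mean and standard deviation overlaid on the count-weighted histogram. The normal density is written out by hand with numpy to avoid an extra dependency; the `Agg` backend and the output file name `normal_fit.png` are assumptions for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this also runs without a display
import matplotlib.pyplot as plt
import numpy as np

values = np.array([100, 200, 300])  # column a
counts = np.array([1, 2, 3])        # column b: occurrences of each value

# weighted mean and population standard deviation, same formulas as above
n = counts.sum()
m = (values * counts).sum() / n
sd = np.sqrt((counts * (values - m) ** 2).sum() / n)

# normal probability density with the same mean and standard deviation
x = np.linspace(m - 4 * sd, m + 4 * sd, 200)
pdf = np.exp(-((x - m) ** 2) / (2 * sd ** 2)) / (sd * np.sqrt(2 * np.pi))

fig, ax = plt.subplots()
# density=True scales the weighted histogram to unit area so it is comparable to the pdf
ax.hist(values, bins=10, weights=counts, density=True, alpha=0.5, label="data")
ax.plot(x, pdf, label=f"normal(m={m:.1f}, sd={sd:.1f})")
ax.legend()
fig.savefig("normal_fit.png")  # hypothetical output file name
```

With a real, large dataset you would likely want more bins than distinct values allow here; `bins=10` is only an illustrative choice.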

As for the data

A B
100 2
200 3
300 4
...

it can be treated as a list in which each value in A is repeated B times.

Standard deviation

You can compute the standard deviation with numpy's std(), or write the formula yourself. For example, first expand the data into a list:

import pandas as pd
df = pd.DataFrame({'A':[100,200,300],'B':[2,3,4]})
"""
df 

     A  B
0  100  2
1  200  3
2  300  4
"""

# expand the data: repeat each value in A as many times as given in B
l = []
for i, j in zip(df['A'], df['B']):
    l.extend([i] * j)

"""
l
[100, 100, 200, 200, 200, 300, 300, 300, 300]
"""