How can I use Python pandas to handle the following kinds of table problems?

Recently I have needed to process some tables for analysis. Because I have little experience with SQL or CSV files, I have run into the following problems and would like help solving them:

question 1:

companyID  fresh
23865      13-2
23866      10-3
23867      8-12

How do I create a new field Dash whose value depends on the two numbers on either side of the "-" in each row's fresh value? If the number to the left of "-" is greater than the number to the right, Dash is 1; otherwise it is 0. For example, 13-2 gives 1 and 8-12 gives 0.

question 2:

companyID  productID   A     B     C     D  
23865      1          3.6   3.0   4.0   5.0
23865      2          3.0   3.5   4.5   5.5
23865      3          3.6   3.2   4.2   5.3
23866      1          2.2   3.3   3.8   5.1
23866      2          3.2   3.4   4.4   5.6
23866      3          3.5   3.4   4.6   5.3
23866      4          4.2   4.2   5.3   5.8
23867      2          3.2   3.3   4.4   5.4
23867      3          3.6   3.5   4.6   5.4
23867      4          4.4   4.1   5.5   5.8

(1) productID is the product category, and the values in columns A, B, C and D are the prices of the corresponding models. How do I convert the table so that each company has only one row, with the prices of all its products and models placed in that row?
(2) Unlike (1), after collapsing each company's data into one row, only the prices of the four models for products 2 and 4 are needed.
(3) Building on (2), sort the prices of the four models A, B, C and D in ascending or descending order (the corresponding productID has to be reordered to match).


Thanks to fellow netizen and expert Lao Gua.


import pandas as pd

# Values taken from the table in question 2
data = {
    "companyID": [23865, 23865, 23865, 23866, 23866, 23866, 23866, 23867, 23867, 23867],
    "productID": [1, 2, 3, 1, 2, 3, 4, 2, 3, 4],
    "A": [3.6, 3.0, 3.6, 2.2, 3.2, 3.5, 4.2, 3.2, 3.6, 4.4],
    "B": [3.0, 3.5, 3.2, 3.3, 3.4, 3.4, 4.2, 3.3, 3.5, 4.1],
    "C": [4.0, 4.5, 4.2, 3.8, 4.4, 4.6, 5.3, 4.4, 4.6, 5.5],
    "D": [5.0, 5.5, 5.3, 5.1, 5.6, 5.3, 5.8, 5.4, 5.4, 5.8],
}
dd = pd.DataFrame(data)
dd



Further:
(4) What if the price of a model is missing and I want to replace it with the average of the two nearest prices of that model for the same product? For example, suppose the model D price is missing for productID 2 at companyID 23866. It should then be filled with the average of the model D prices for productID 2 at companyID 23865 and at companyID 23867.
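One possible reading of (4) can be sketched as follows. This is only a sketch under assumptions: "nearest" is taken to mean the previous and the next companyID for the same productID, and the missing value below is introduced purely for illustration (the table above has no gaps).

```python
import numpy as np
import pandas as pd

# Product 2 rows from the table in question 2, with D blanked out for
# company 23866. The gap is hypothetical, added only to show the technique.
df = pd.DataFrame({
    "companyID": [23865, 23866, 23867],
    "productID": [2, 2, 2],
    "D": [5.5, np.nan, 5.4],
})

# Order the rows so that neighbouring companies sit next to each other
# within each product.
df = df.sort_values(["productID", "companyID"])
by_product = df.groupby("productID")["D"]

# Mean of the previous and the next company's price for the same product.
neighbour_mean = (by_product.shift(1) + by_product.shift(-1)) / 2

# Only the missing entries are replaced; existing prices are untouched.
df["D"] = df["D"].fillna(neighbour_mean)
print(df)
```

If the gap is at the first or last company of a product, one neighbour is absent and the value stays missing, so that case would need its own rule.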


I. df['Dash'] = df['fresh'].apply(lambda x: 1 if int(x.split('-')[0]) > int(x.split('-')[1]) else 0)
II.

  1. df.groupby('companyID').agg(list)
  2. df[df['productID'].isin([2, 4])].groupby('companyID').agg(list)
  3. df[df['productID'].isin([2, 4])].groupby('companyID').agg(lambda x: sorted(list(x), reverse=True))
     The sorted function does the sorting, and its reverse parameter switches between ascending (False) and descending (True) order. Each column, including productID, is sorted on its own.
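The three steps for question 2 can be tried on the sample table roughly like this. It is a minimal sketch, not a definitive solution: for step (3) it sorts the rows before grouping so that productID stays aligned with the prices, and the choice of model A as the sort key is an assumption made for illustration.

```python
import pandas as pd

# Sample data copied from the table in question 2.
df = pd.DataFrame({
    "companyID": [23865, 23865, 23865, 23866, 23866, 23866, 23866, 23867, 23867, 23867],
    "productID": [1, 2, 3, 1, 2, 3, 4, 2, 3, 4],
    "A": [3.6, 3.0, 3.6, 2.2, 3.2, 3.5, 4.2, 3.2, 3.6, 4.4],
    "B": [3.0, 3.5, 3.2, 3.3, 3.4, 3.4, 4.2, 3.3, 3.5, 4.1],
    "C": [4.0, 4.5, 4.2, 3.8, 4.4, 4.6, 5.3, 4.4, 4.6, 5.5],
    "D": [5.0, 5.5, 5.3, 5.1, 5.6, 5.3, 5.8, 5.4, 5.4, 5.8],
})

# (1) One row per company; every other column becomes a list.
one_row = df.groupby("companyID").agg(list)

# (2) Same, but keep only products 2 and 4.
subset = df[df["productID"].isin([2, 4])]
only_2_4 = subset.groupby("companyID").agg(list)

# (3) Sort the rows by model A (descending) within each company first,
# so productID and the other models follow the order of the A prices.
# Sorting by "A" is an assumption; use whichever model should drive the order.
sorted_rows = subset.sort_values(["companyID", "A"], ascending=[True, False])
sorted_2_4 = sorted_rows.groupby("companyID").agg(list)

print(sorted_2_4)
```

Sorting before grouping keeps each price paired with its product, whereas sorting inside agg reorders every column separately.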

question 1:

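import pandas as pd

new_list = []
for index, row in df.iterrows():
    # Split a value such as "13-2" into its left and right numbers
    first_item, second_item = str(row["fresh"]).split("-")
    # Dash is 1 when the left number is larger, otherwise 0
    if int(first_item) > int(second_item):
        row["Dash"] = 1
    else:
        row["Dash"] = 0
    new_list.append(row)

# Rebuild a DataFrame from the modified rows
new_df = pd.DataFrame(new_list)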
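The same result can probably be had without an explicit loop. The following is a small sketch that assumes every fresh value has exactly the form "number-number"; the sample frame is rebuilt from the table in question 1.

```python
import pandas as pd

# Sample data from question 1.
df = pd.DataFrame({
    "companyID": [23865, 23866, 23867],
    "fresh": ["13-2", "10-3", "8-12"],
})

# Split "13-2" into two columns and convert both to integers.
parts = df["fresh"].str.split("-", expand=True).astype(int)

# 1 where the left number is greater than the right one, otherwise 0.
df["Dash"] = (parts[0] > parts[1]).astype(int)

print(df)
```

Converting to int matters here: compared as strings, "8" would sort after "12".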