Batch-inserting a large amount of data into MySQL is slow: why does insert speed drop so sharply once the data volume gets large?

Java project using MyBatis + MySQL. The rows are inserted in chunks like this:

    int batchAmount = 20000;           // rows per multi-row INSERT statement
    int size = userList.size();
    int nInserted = 0;
    for (int startIndex = 0; startIndex < size; startIndex += batchAmount) {
        int endIndex = Math.min(startIndex + batchAmount, size);
        // each call sends one INSERT IGNORE ... VALUES (...),(...),... statement
        nInserted += userMapper.batchInsertIgnoreDuplicate(userList.subList(startIndex, endIndex));
    }
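To find out where the slowdown starts, it can help to time every batch instead of the whole run. A minimal sketch, assuming the mapper call is passed in as a function that returns the affected-row count; BatchTimer, insertInBatches and the inserter parameter are hypothetical names, not part of the original project:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

final class BatchTimer {

    /** Timing of one batch: first row index, rows sent, rows reported inserted, elapsed ns. */
    static final class BatchStat {
        final int startIndex;
        final int rows;
        final int inserted;
        final long nanos;

        BatchStat(int startIndex, int rows, int inserted, long nanos) {
            this.startIndex = startIndex;
            this.rows = rows;
            this.inserted = inserted;
            this.nanos = nanos;
        }

        double rowsPerSecond() {
            return nanos == 0 ? Double.POSITIVE_INFINITY : rows * 1e9 / nanos;
        }

        @Override
        public String toString() {
            return String.format("start=%d rows=%d inserted=%d %.0f rows/s",
                    startIndex, rows, inserted, rowsPerSecond());
        }
    }

    /** Same chunking loop as the question's code, but each call of inserter is timed separately. */
    static <T> List<BatchStat> insertInBatches(List<T> rows, int batchAmount,
                                               ToIntFunction<List<T>> inserter) {
        List<BatchStat> stats = new ArrayList<>();
        int size = rows.size();
        for (int startIndex = 0; startIndex < size; startIndex += batchAmount) {
            int endIndex = Math.min(startIndex + batchAmount, size);
            long t0 = System.nanoTime();
            int inserted = inserter.applyAsInt(rows.subList(startIndex, endIndex));
            long elapsed = System.nanoTime() - t0;
            stats.add(new BatchStat(startIndex, endIndex - startIndex, inserted, elapsed));
        }
        return stats;
    }
}
```

In the real code the call would look like BatchTimer.insertInBatches(userList, 20000, userMapper::batchInsertIgnoreDuplicate). Printing each entry (stats.forEach(System.out::println)) shows whether throughput falls off gradually as the table grows or drops suddenly at one point.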

The MyBatis statement behind batchInsertIgnoreDuplicate is shown below. The Table, Columns and Values fragments hold the table name, the column list and the column values, respectively.

<insert id="batchInsertIgnoreDuplicate">
    INSERT IGNORE INTO
    <include refid="Table"/>
    <include refid="Columns"/>
    VALUES
    <foreach collection="userList" item="user" index="index" separator=",">
        <include refid="Values"/>
    </foreach>
</insert>
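For reference, the foreach above expands into one multi-row statement, so with batchAmount = 20000 every call sends a single very long INSERT IGNORE. The sketch below builds the equivalent placeholder SQL by hand, only to illustrate its shape; the table and column names in the usage are made up, since the real ones come from the Table, Columns and Values fragments:

```java
import java.util.Collections;
import java.util.List;
import java.util.StringJoiner;

final class MultiRowInsertPreview {

    /** Builds "INSERT IGNORE INTO t (c1,c2) VALUES (?,?),(?,?),..." for rowCount rows. */
    static String build(String table, List<String> columns, int rowCount) {
        if (rowCount < 1 || columns.isEmpty()) {
            throw new IllegalArgumentException("need at least one row and one column");
        }
        // One "(?,?,...)" group per row, mirroring what each <include refid="Values"/> emits.
        String rowPlaceholders = "(" + String.join(",", Collections.nCopies(columns.size(), "?")) + ")";
        StringJoiner values = new StringJoiner(",");
        for (int i = 0; i < rowCount; i++) {
            values.add(rowPlaceholders);
        }
        return "INSERT IGNORE INTO " + table
                + " (" + String.join(",", columns) + ") VALUES " + values;
    }
}
```

For example, build("t_user", List.of("id", "name"), 2) returns INSERT IGNORE INTO t_user (id,name) VALUES (?,?),(?,?). With 20,000 rows and, say, 10 columns that is 200,000 values in one statement; with Connector/J's default client-side prepared statements the whole statement is sent as one packet and has to fit within the server's max_allowed_packet.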

I found that while the data volume is up to about a million rows, inserts run at roughly 2,000-5,000 rows per second, but once it reaches about 4 million rows the speed drops sharply.
I don't think GC is the problem: collection frequency and pause times look normal. The jstat samples below were taken during the 4-million-row insert and show nothing unusual, so this should not be a Java/JVM GC issue. Why, then, does insert speed degrade so badly when the data volume is large?

    [root@localhost bin]# ./jstat -gcutil 10442 10000
      S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
     24.04   0.00  63.84  49.04  97.65  95.86    580   11.725     6    0.351   12.076
     24.04   0.00  63.84  49.04  97.65  95.86    580   11.725     6    0.351   12.076
     24.04   0.00  63.84  49.04  97.65  95.86    580   11.725     6    0.351   12.076
     24.04   0.00  63.84  49.04  97.65  95.86    580   11.725     6    0.351   12.076
     24.04   0.00  63.84  49.04  97.65  95.86    580   11.725     6    0.351   12.076
     24.04   0.00  63.84  49.04  97.65  95.86    580   11.725     6    0.351   12.076
     24.04   0.00  63.84  49.04  97.65  95.86    580   11.725     6    0.351   12.076
     24.04   0.00  63.84  49.04  97.65  95.86    580   11.725     6    0.351   12.076
     24.04   0.00  63.84  49.04  97.65  95.86    580   11.725     6    0.351   12.076
     24.04   0.00  63.84  49.04  97.65  95.86    580   11.725     6    0.351   12.076
      0.00  23.32  71.10  49.04  97.65  95.86    581   11.731     6    0.351   12.082
      0.00  23.32  71.10  49.04  97.65  95.86    581   11.731     6    0.351   12.082
      0.00  23.32  71.10  49.04  97.65  95.86    581   11.731     6    0.351   12.082
      0.00  23.32  71.10  49.04  97.65  95.86    581   11.731     6    0.351   12.082
      0.00  23.32  71.10  49.04  97.65  95.86    581   11.731     6    0.351   12.082
      0.00  23.32  71.11  49.04  97.65  95.86    581   11.731     6    0.351   12.082
      0.00  23.32  71.11  49.04  97.65  95.86    581   11.731     6    0.351   12.082
      0.00  23.32  71.11  49.04  97.65  95.86    581   11.731     6    0.351   12.082
      0.00  23.32  71.11  49.04  97.65  95.86    581   11.731     6    0.351   12.082
      0.00  23.32  71.11  49.04  97.65  95.86    581   11.731     6    0.351   12.082
     12.85   0.00  78.61  49.04  97.65  95.86    582   11.736     6    0.351   12.087
     12.85   0.00  78.61  49.04  97.65  95.86    582   11.736     6    0.351   12.087
     12.85   0.00  78.61  49.04  97.65  95.86    582   11.736     6    0.351   12.087
     12.85   0.00  78.61  49.04  97.65  95.86    582   11.736     6    0.351   12.087
     12.85   0.00  78.61  49.04  97.65  95.86    582   11.736     6    0.351   12.087
     12.85   0.00  78.61  49.04  97.65  95.86    582   11.736     6    0.351   12.087
     12.85   0.00  78.61  49.04  97.65  95.86    582   11.736     6    0.351   12.087
     12.85   0.00  78.61  49.04  97.65  95.86    582   11.736     6    0.351   12.087
     12.85   0.00  78.61  49.04  97.65  95.86    582   11.736     6    0.351   12.087
     12.85   0.00  78.61  49.04  97.65  95.86    582   11.736     6    0.351   12.087
Feb. 27, 2021

The bottleneck may be on the database side. Try removing the indexes and constraints from the table and see whether performance improves.
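One way to try this for a bulk load is to drop the non-unique secondary indexes before loading and recreate them afterwards; rebuilding an index once is often cheaper than maintaining it row by row, though that should be measured. A sketch that only generates the SQL, with hypothetical index names and columns. The PRIMARY KEY and any UNIQUE index must stay in place, because INSERT IGNORE relies on them to detect duplicates:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class IndexRebuildPlan {

    private final String table;
    // Non-unique secondary indexes only (never PRIMARY/UNIQUE): name -> column list, in order.
    private final Map<String, String> secondaryIndexes = new LinkedHashMap<>();

    IndexRebuildPlan(String table) {
        this.table = table;
    }

    IndexRebuildPlan index(String name, String columns) {
        secondaryIndexes.put(name, columns);
        return this;
    }

    /** One ALTER TABLE that drops every registered index before the bulk load. */
    String beforeLoad() {
        List<String> clauses = new ArrayList<>();
        for (String name : secondaryIndexes.keySet()) {
            clauses.add("DROP INDEX " + name);
        }
        return alter(clauses);
    }

    /** One ALTER TABLE that recreates the same indexes after the load. */
    String afterLoad() {
        List<String> clauses = new ArrayList<>();
        for (Map.Entry<String, String> e : secondaryIndexes.entrySet()) {
            clauses.add("ADD INDEX " + e.getKey() + " (" + e.getValue() + ")");
        }
        return alter(clauses);
    }

    private String alter(List<String> clauses) {
        if (clauses.isEmpty()) {
            throw new IllegalStateException("no secondary indexes registered");
        }
        return "ALTER TABLE " + table + " " + String.join(", ", clauses);
    }
}
```

The two statements can be run with a plain JDBC Statement or the mysql client before and after the load; comparing the insert rate with and without the indexes shows how much of the slowdown they account for.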


The 4 million rows are not all held in userList at once, are they?
Where does the data come from? A file? The network?
If the whole data set is loaded into memory first, then unless the machine has a lot of memory, it is no wonder it is slow. It is more efficient to read and insert the data in batches.
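If the data comes from a file, reading and inserting can be interleaved so that only one chunk is in memory at a time. A minimal sketch, assuming one row per line; ChunkedLoader and the sink callback (which would parse the lines and call the mapper) are hypothetical:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class ChunkedLoader {

    /**
     * Reads source line by line and hands at most chunkSize lines at a time to sink.
     * Returns the number of lines read; only the current chunk is held in memory.
     */
    static long load(Reader source, int chunkSize, Consumer<List<String>> sink) throws IOException {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be >= 1");
        }
        long total = 0;
        List<String> chunk = new ArrayList<>(chunkSize);
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                chunk.add(line);
                total++;
                if (chunk.size() == chunkSize) {
                    sink.accept(chunk);
                    chunk = new ArrayList<>(chunkSize); // new list in case the sink keeps a reference
                }
            }
        }
        if (!chunk.isEmpty()) {
            sink.accept(chunk); // last, partially filled chunk
        }
        return total;
    }
}
```

In the question's setting the call might be ChunkedLoader.load(Files.newBufferedReader(path), 20000, lines -> userMapper.batchInsertIgnoreDuplicate(parse(lines))), where parse is a hypothetical line-to-User converter.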


Please use batch mode
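One reading of "batch mode" is JDBC statement batching: a single parameterized INSERT executed many times through addBatch()/executeBatch(). MyBatis exposes the same mechanism via sqlSessionFactory.openSession(ExecutorType.BATCH). A sketch in plain JDBC; the t_user table, its id/name columns and the UserRow type are hypothetical stand-ins for the project's own. With MySQL Connector/J, the connection property rewriteBatchedStatements=true lets the driver rewrite a batch into multi-row INSERTs; without it, each row is sent as a separate statement:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class JdbcBatchInsert {

    /** Hypothetical row type standing in for the project's User class. */
    static final class UserRow {
        final long id;
        final String name;

        UserRow(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** Appends rewriteBatchedStatements=true to a MySQL JDBC URL unless it is already set. */
    static String withBatchRewrite(String jdbcUrl) {
        if (jdbcUrl.contains("rewriteBatchedStatements=")) {
            return jdbcUrl;
        }
        return jdbcUrl + (jdbcUrl.contains("?") ? "&" : "?") + "rewriteBatchedStatements=true";
    }

    /** Inserts rows through one prepared statement, executing and committing every batchSize rows. */
    static void insert(Connection conn, List<UserRow> rows, int batchSize) throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT IGNORE INTO t_user (id, name) VALUES (?, ?)")) {
            int pending = 0;
            for (UserRow row : rows) {
                ps.setLong(1, row.id);
                ps.setString(2, row.name);
                ps.addBatch();
                if (++pending == batchSize) {
                    ps.executeBatch(); // sends the batch and clears it
                    conn.commit();     // keeps each transaction bounded
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch();
                conn.commit();
            }
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```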


If the table is indexed, the more data it holds, the more it costs to maintain the indexes on every insert, so inserts get slower (as in your case). Check whether the table has any unnecessary indexes and review the table design; the number of indexes can probably be reduced.
Also check whether the table has any INSERT triggers.
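Both checks can be done from SQL. A sketch that lists a table's indexes and its INSERT triggers through information_schema over plain JDBC; TableInspector is a hypothetical helper and the schema and table names are whatever you pass in:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class TableInspector {

    static final String INDEX_SQL =
            "SELECT INDEX_NAME, NON_UNIQUE, GROUP_CONCAT(COLUMN_NAME ORDER BY SEQ_IN_INDEX) AS COLS "
          + "FROM information_schema.STATISTICS "
          + "WHERE TABLE_SCHEMA = ? AND TABLE_NAME = ? "
          + "GROUP BY INDEX_NAME, NON_UNIQUE";

    static final String TRIGGER_SQL =
            "SELECT TRIGGER_NAME, ACTION_TIMING FROM information_schema.TRIGGERS "
          + "WHERE EVENT_OBJECT_SCHEMA = ? AND EVENT_OBJECT_TABLE = ? AND EVENT_MANIPULATION = 'INSERT'";

    /** Formats one result row; a custom interface because ResultSet getters throw SQLException. */
    interface RowFormatter {
        String format(ResultSet rs) throws SQLException;
    }

    /** One line per index, e.g. "idx_city_age (city,age) non-unique". */
    static List<String> indexes(Connection conn, String schema, String table) throws SQLException {
        return query(conn, INDEX_SQL, schema, table, rs -> describeIndex(
                rs.getString("INDEX_NAME"), rs.getString("COLS"), rs.getInt("NON_UNIQUE") == 1));
    }

    /** One line per INSERT trigger, e.g. "trg_audit BEFORE INSERT". */
    static List<String> insertTriggers(Connection conn, String schema, String table) throws SQLException {
        return query(conn, TRIGGER_SQL, schema, table,
                rs -> rs.getString("TRIGGER_NAME") + " " + rs.getString("ACTION_TIMING") + " INSERT");
    }

    static String describeIndex(String name, String columns, boolean nonUnique) {
        return name + " (" + columns + ") " + (nonUnique ? "non-unique" : "unique");
    }

    private static List<String> query(Connection conn, String sql, String schema, String table,
                                      RowFormatter formatter) throws SQLException {
        List<String> out = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, schema);
            ps.setString(2, table);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    out.add(formatter.format(rs));
                }
            }
        }
        return out;
    }
}
```

Every index listed is maintained on each insert. The PRIMARY and unique ones are also what INSERT IGNORE uses to detect duplicates, so only the non-unique ones are candidates for removal. SHOW INDEX FROM <table> and SHOW TRIGGERS give the same information interactively.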
