Does it affect disk efficiency that two java processes read a file at the same time (for a long time)?

Scenario

A CSV file with more than 1 million rows needs to be read and each line processed; the average processing time per line is 250 ms.

  • scenario 1: read a line, process it, then read the next line.
  • scenario 2: read the whole file into memory at once, then iterate over each line (both approaches are sketched below).
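A minimal sketch of the two approaches for reference, assuming a hypothetical file name `data.csv` and using the standard `java.nio.file.Files` API (not mentioned in the question):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Stream;

public class TwoWaysToRead {
    public static void main(String[] args) throws IOException {
        Path csv = Paths.get("data.csv"); // hypothetical file name

        // Scenario 1: stream the file, holding roughly one line in memory at a time.
        try (Stream<String> lines = Files.lines(csv)) {
            lines.forEach(line -> { /* process one line (~250 ms) */ });
        }

        // Scenario 2: load all ~1 million lines into memory first, then iterate.
        List<String> all = Files.readAllLines(csv);
        for (String line : all) { /* process one line (~250 ms) */ }
    }
}
```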

Later I found that one Java process was not fast enough, so I wanted to start a few more. But when I started 3, Linux would automatically kill 1-2 of them for me.

Which approach is more reasonable in this scenario, and why?

Apr.03,2021

Use a Scanner to read the file as a stream, call nextLine() to get each line, wrap the processing logic for each line in a task, and submit it to a thread pool (see the sketch after the list below).

  1. Large files with many lines are best processed as a stream, to avoid loading too much into memory at once.
  2. A thread pool reuses threads and saves resources.
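A minimal sketch of this answer's approach, assuming a hypothetical file name `data.csv`, a pool size of 8, and a placeholder `processLine` method standing in for the real 250 ms processing logic:

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CsvStreamProcessor {

    // Hypothetical per-line handler; replace with the real processing logic.
    static void processLine(String line) {
        // ... parse and handle one CSV record ...
    }

    public static void main(String[] args) throws FileNotFoundException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(8); // pool size is an assumption

        // Stream the file line by line instead of loading it all into memory.
        try (Scanner scanner = new Scanner(new File("data.csv"))) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine();
                pool.submit(() -> processLine(line)); // each line becomes a task
            }
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}
```

One caveat: Executors.newFixedThreadPool uses an unbounded work queue, so if lines are read much faster than they are processed, pending tasks can still pile up in memory; the bounded-buffer approach in the next answer avoids that.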

  1. Start one thread to read the file and put lines into a buffer in batches; pause reading when the buffer reaches a certain threshold.
  2. Start multiple threads that take data from the buffer and process it (a sketch follows).
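A minimal producer-consumer sketch of this answer's idea, assuming a hypothetical file name `data.csv`, a buffer threshold of 1000 lines, 8 worker threads, and a placeholder `processLine` method; a bounded BlockingQueue makes the reader pause automatically when the threshold is reached:

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedPipeline {

    private static final String POISON = "__EOF__"; // assumed sentinel marking end of input

    // Hypothetical per-line handler standing in for the real processing logic.
    static void processLine(String line) { /* ... */ }

    public static void main(String[] args) throws InterruptedException {
        // Bounded buffer: the reader blocks once 1000 lines are waiting.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1000);
        int workers = 8; // worker count is an assumption; tune to the workload

        // Single reader thread (producer): streams the file into the queue.
        Thread reader = new Thread(() -> {
            try (Scanner scanner = new Scanner(new File("data.csv"))) {
                while (scanner.hasNextLine()) {
                    queue.put(scanner.nextLine()); // blocks when the queue is full
                }
                for (int i = 0; i < workers; i++) {
                    queue.put(POISON); // one sentinel per worker to signal shutdown
                }
            } catch (FileNotFoundException | InterruptedException e) {
                throw new RuntimeException(e);
            }
        });
        reader.start();

        // Worker threads (consumers): take lines from the queue and process them.
        Thread[] pool = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            pool[i] = new Thread(() -> {
                try {
                    for (String line = queue.take(); !POISON.equals(line); line = queue.take()) {
                        processLine(line);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            pool[i].start();
        }

        reader.join();
        for (Thread t : pool) t.join();
    }
}
```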