How can I quickly merge 1000 txt files, each 2 GB in size?

1. Is there a fast grouping-and-merging algorithm?
2. Single-threaded merging is too slow. Would merging in parallel improve efficiency significantly?
3. Any comments or suggestions are welcome.

Mar.05,2021

You don't say whether the merge has any special requirements, so let's assume the files are simply concatenated with no extra processing. In that case, the biggest bottleneck in the whole process is file I/O, so I don't think multithreading will improve performance. Plainly reading the files one by one and appending each to the end of the output is probably the fastest approach, as in the sketch below.
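For reference, a minimal sketch of that sequential approach in Python, assuming the result is a plain byte-for-byte concatenation; the part*.txt and merged.txt file names are hypothetical:

import shutil

# Hypothetical input names: part0000.txt .. part0999.txt
files = ["part%04d.txt" % i for i in range(1000)]

with open("merged.txt", "wb") as out:
    for path in files:
        with open(path, "rb") as src:
            # Stream in large chunks so a 2 GB file never sits in memory.
            shutil.copyfileobj(src, out, length=16 * 1024 * 1024)

Opening everything in binary mode avoids newline translation, and the large chunk size keeps the disk doing long sequential writes.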


I think you could use a divide-and-conquer approach: open multiple threads and split the task, similar to Java's Fork/Join framework. The process looks roughly like this:

def combine(L, R):
    # Merge the files with indices L..R and return the resulting file.
    if R - L < 10:
        return combine10files_in_thread(L, R)
    else:
        mid = (L + R) // 2
        a = combine(L, mid)
        b = combine(mid + 1, R)
        return combine2file(a, b)

def combine10files_in_thread(L, R):
    # Merge up to 10 files in a single worker thread; return the output file.
    pass

def combine2file(a, b):
    # Merge two intermediate files into one; return the output file.
    pass
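If you do want to try the parallel route, here is a runnable sketch of one way to drive pairwise merges with a thread pool, again assuming the merge is plain concatenation; the concat_pair and parallel_merge names are my own, not from the answer above:

import shutil
from concurrent.futures import ThreadPoolExecutor

def concat_pair(pair):
    # Append the second file onto the first; the first becomes the merged result.
    a, b = pair
    with open(a, "ab") as out, open(b, "rb") as src:
        shutil.copyfileobj(src, out, length=16 * 1024 * 1024)
    return a

def parallel_merge(paths, workers=4):
    # Each round merges adjacent pairs concurrently until one file remains.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while len(paths) > 1:
            pairs = list(zip(paths[0::2], paths[1::2]))
            merged = list(pool.map(concat_pair, pairs))
            if len(paths) % 2:  # an odd leftover carries over to the next round
                merged.append(paths[-1])
            paths = merged
    return paths[0]

Note the trade-off: this scheme rewrites every byte about log2(1000) ≈ 10 times, so on a single disk it will usually lose to one sequential pass; it only pays off if the files live on storage that can serve several streams at once.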