There are two sets of data, each containing 500 million URLs, but only 4 GB of memory is available. How can I find the URLs that appear in both sets?

This is an interview question from Ali that has been bothering me for a long time. Please help.

Jul. 12, 2021

Interview, Ali: a big data problem. Given two files a and b, each storing 5 billion URLs, where each URL occupies 64 bytes, and a memory limit of 4 GB, find the URLs common to files a and b.
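A rough size estimate, using the figures stated in the problem (5 billion URLs of 64 bytes per file) and a hypothetical choice of 1000 partitions, shows why a whole file cannot be loaded but one partition can, assuming URLs spread roughly evenly across partitions:

```latex
\begin{aligned}
\text{size of one file} &= 5 \times 10^{9} \times 64\ \text{B} = 3.2 \times 10^{11}\ \text{B} \approx 320\ \text{GB} \gg 4\ \text{GB} \\
\text{one partition of one file} &\approx \frac{320\ \text{GB}}{1000} = 320\ \text{MB} < 4\ \text{GB}
\end{aligned}
```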


There is a similar problem, and its solution uses the idea of divide and conquer.
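A minimal sketch of that divide-and-conquer idea, assuming one URL per line in each file. Both files are hash-partitioned with the same hash function (`zlib.crc32` here, chosen because Python's built-in `hash()` on strings is salted per process), so any URL common to both files lands in the same-numbered partition of each. Each partition pair is then checked by loading the partition of a into a set and streaming the partition of b. The function names, file naming scheme, and bucket count are illustrative assumptions, not taken from the thread.

```python
import os
import tempfile
import zlib
from typing import Iterator, List


def bucket_of(url: str, num_buckets: int) -> int:
    # crc32 is deterministic across processes (unlike Python's salted hash()),
    # so a given URL maps to the same bucket index for file a and file b.
    return zlib.crc32(url.encode("utf-8")) % num_buckets


def partition(path: str, out_dir: str, prefix: str, num_buckets: int) -> List[str]:
    """Stream one input file line by line and split it into num_buckets files."""
    paths = [os.path.join(out_dir, f"{prefix}_{i}.txt") for i in range(num_buckets)]
    outs = [open(p, "w", encoding="utf-8") for p in paths]
    try:
        with open(path, encoding="utf-8") as f:
            for line in f:
                url = line.strip()
                if url:
                    outs[bucket_of(url, num_buckets)].write(url + "\n")
    finally:
        for out in outs:
            out.close()
    return paths


def read_urls(path: str) -> Iterator[str]:
    with open(path, encoding="utf-8") as f:
        for line in f:
            url = line.strip()
            if url:
                yield url


def common_urls(path_a: str, path_b: str, num_buckets: int, work_dir: str) -> Iterator[str]:
    """Yield each URL present in both files once, holding one bucket of a in memory."""
    buckets_a = partition(path_a, work_dir, "a", num_buckets)
    buckets_b = partition(path_b, work_dir, "b", num_buckets)
    for bucket_a, bucket_b in zip(buckets_a, buckets_b):
        # A shared URL must sit in bucket i of a AND bucket i of b,
        # so each bucket pair can be checked independently.
        seen = set(read_urls(bucket_a))
        for url in read_urls(bucket_b):
            if url in seen:
                seen.discard(url)  # report each common URL only once
                yield url
```

With billions of URLs, `num_buckets` would be sized (for example around 1000) so that one partition fits comfortably in 4 GB, and results would be written to a file rather than collected in memory. Opening that many partition files at once may require raising the per-process file-descriptor limit, and heavily skewed data can still overflow a single partition, which would then need to be split again with a different hash.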


A guess: sort first, and then split into blocks?
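A minimal sketch of the sort-based guess, assuming an external merge sort: each input is cut into memory-sized chunks that are sorted and written out as runs, the runs are k-way merged with `heapq.merge` into one sorted stream per file, and the two sorted streams are walked in lockstep to emit common URLs. The function names and the tiny `chunk_size` are illustrative assumptions; in practice the chunk size would be chosen to fit within the 4 GB limit.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List, Optional


def sorted_runs(urls: Iterable[str], chunk_size: int, work_dir: str) -> List[str]:
    """Phase 1 of an external sort: write sorted runs of at most chunk_size URLs."""
    paths: List[str] = []
    chunk: List[str] = []

    def flush() -> None:
        chunk.sort()
        fd, path = tempfile.mkstemp(dir=work_dir, suffix=".run")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.writelines(u + "\n" for u in chunk)
        paths.append(path)
        chunk.clear()

    for url in urls:
        chunk.append(url)
        if len(chunk) >= chunk_size:
            flush()
    if chunk:
        flush()
    return paths


def merged(paths: List[str]) -> Iterator[str]:
    """Phase 2: k-way merge of the sorted runs into a single sorted stream."""
    files = [open(p, encoding="utf-8") for p in paths]
    try:
        streams = [(line.rstrip("\n") for line in f) for f in files]
        yield from heapq.merge(*streams)
    finally:
        for f in files:
            f.close()


def intersect_sorted(xs: Iterator[str], ys: Iterator[str]) -> Iterator[str]:
    """Walk two sorted streams in lockstep and yield each common URL once."""
    x, y = next(xs, None), next(ys, None)
    last: Optional[str] = None
    while x is not None and y is not None:
        if x < y:
            x = next(xs, None)
        elif y < x:
            y = next(ys, None)
        else:
            if x != last:  # skip duplicates of a URL already reported
                yield x
                last = x
            x, y = next(xs, None), next(ys, None)


def common_sorted(urls_a: Iterable[str], urls_b: Iterable[str],
                  chunk_size: int, work_dir: str) -> Iterator[str]:
    runs_a = sorted_runs(urls_a, chunk_size, work_dir)
    runs_b = sorted_runs(urls_b, chunk_size, work_dir)
    yield from intersect_sorted(merged(runs_a), merged(runs_b))
```

Sorting blocks alone is not enough; the merge step is what brings equal URLs from the two files next to each other. Compared with hash partitioning, this approach reads and writes the data more times, but its output comes out sorted and it is not affected by skew in the hash distribution.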


There are dozens of problems like this every year, but big companies are just different. Tsk.
