algorithm - Finding median of large set of numbers too big to fit into memory -
i asked question in interview recently.
there n numbers, many fit memory. split across k database tables (unsorted), each of can fit memory. find median of numbers.
wasn't quite sure answer one.
there's few potential solutions:
- external merge sort - o(n log n)
sort numbers on first pass, find median on second. - order statistics distributed selection algorithm - o(n)
simplify problem original problem of finding kth number in unsorted array. - counting sort histogram o(n)
have assume properties range of numbers - can range fit in memory? - if known distribution of numbers other algorithms can produced.
for more details , implementation see:
http://www.fusu.us/2013/07/median-in-large-set-across-1000-servers.html
Comments
Post a Comment