algorithm - Finding median of large set of numbers too big to fit into memory -


i asked question in interview recently.

there n numbers, many fit memory. split across k database tables (unsorted), each of can fit memory. find median of numbers.

wasn't quite sure answer one.

there's few potential solutions:

  • external merge sort - o(n log n)
    sort numbers on first pass, find median on second.
  • order statistics distributed selection algorithm - o(n)
    simplify problem original problem of finding kth number in unsorted array.
  • counting sort histogram o(n)
    have assume properties range of numbers - can range fit in memory?
  • if known distribution of numbers other algorithms can produced.

for more details , implementation see:
http://www.fusu.us/2013/07/median-in-large-set-across-1000-servers.html


Comments

Popular posts from this blog

ASP.NET/SQL find the element ID and update database -

jquery - appear modal windows bottom -

c++ - Compiling static TagLib 1.6.3 libraries for Windows -