indexing - Inverted index binary format


I'm trying to figure out what kind of binary file format can support the needs of an inverted index. Let's say I have documents, each identified by a unique ID, and each document has 360 fixed values in the range 0-65535, like this:

document0: [1, 10, 123, ...] // 360 values

document1: [1, 10, 345, ...] // 360 values

Now, the inverted index itself is easy: for each possible value I can create a list of the documents that contain it, and queries can then be executed fast, e.g.:

1: [document0, document1]

10: [document0, document1]

123: [document0]

345: [document1]
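In memory, the lookup table above is just a map from each value to the IDs of the documents that contain it. A minimal sketch, assuming document IDs are plain integers (document0 is ID 0) and that each posting list is kept sorted so lists can be intersected for multi-value queries; the function names are illustrative only:

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each value (0-65535) to the sorted list of document IDs containing it.

    docs: dict of document ID -> list of values (360 per document in the question).
    """
    index = defaultdict(set)  # a set, so a value repeated inside one document counts once
    for doc_id, values in docs.items():
        for v in values:
            if not 0 <= v <= 0xFFFF:
                raise ValueError(f"value {v} out of range 0-65535")
            index[v].add(doc_id)
    # Sorted posting lists make later merging and intersection straightforward.
    return {v: sorted(ids) for v, ids in index.items()}


def docs_with_all(index, values):
    """Documents containing every value in `values` (intersection of posting lists)."""
    result = None
    for v in values:
        ids = set(index.get(v, ()))
        result = ids if result is None else result & ids
    return sorted(result or ())


docs = {0: [1, 10, 123], 1: [1, 10, 345]}
idx = build_inverted_index(docs)
print(idx[1])                        # [0, 1]
print(idx[123])                      # [0]
print(docs_with_all(idx, [1, 345]))  # [1]
```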

But I want to store a large number of documents in some kind of (binary) file, and be able to both query it fast and add new documents without recreating the whole structure.

Now I'm struggling with how to organize the file. If I want fast access, I need fixed-length document arrays so that I can seek in the file and read. But a fixed size means a lot of empty space in the document lists. My idea is to have some kind of bucketing system, where each value's list belongs to a bucket of a specific size, e.g. buckets of size 1, 2, 4, 8, 16, 32, ... (or similar), plus some kind of header that tells me where each bucket starts and what its size is. This would optimize the storage size, but again I'm having problems with adding new documents.
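The bucketing idea, including additions, can be sketched as follows. This is a hypothetical layout, not an existing format: a fixed header with one (offset, capacity, count) entry per value, followed by buckets whose capacities are powers of two, assuming 32-bit little-endian document IDs. When a value's bucket is full, its list is copied into a new bucket of twice the size at the end of the file and the header entry is updated, so adding a document never rewrites the whole file; the old bucket is simply left behind as dead space.

```python
import os
import struct
import tempfile

NUM_VALUES = 65536
HEADER_ENTRY = struct.Struct("<QII")   # bucket offset, capacity, count
HEADER_SIZE = NUM_VALUES * HEADER_ENTRY.size
DOC_ID = struct.Struct("<I")           # assumption: document IDs fit in 32 bits


class BucketedIndex:
    """Hypothetical file layout: fixed header, then power-of-two posting buckets."""

    def __init__(self, path):
        if not os.path.exists(path):
            with open(path, "wb") as f:
                f.write(b"\0" * HEADER_SIZE)  # every entry starts as offset 0, capacity 0, count 0
        self.f = open(path, "r+b")

    def close(self):
        self.f.close()

    def _read_entry(self, value):
        self.f.seek(value * HEADER_ENTRY.size)
        return HEADER_ENTRY.unpack(self.f.read(HEADER_ENTRY.size))

    def _write_entry(self, value, offset, capacity, count):
        self.f.seek(value * HEADER_ENTRY.size)
        self.f.write(HEADER_ENTRY.pack(offset, capacity, count))

    def add(self, value, doc_id):
        offset, capacity, count = self._read_entry(value)
        if count == capacity:
            # Full (or never allocated): move to a bucket of 1, 2, 4, 8, ... slots at EOF.
            new_capacity = max(1, capacity * 2)
            self.f.seek(0, os.SEEK_END)
            new_offset = self.f.tell()
            old = b""
            if count:
                self.f.seek(offset)
                old = self.f.read(count * DOC_ID.size)
            self.f.seek(new_offset)
            self.f.write(old + b"\0" * ((new_capacity - count) * DOC_ID.size))
            offset, capacity = new_offset, new_capacity
        # Write the new ID into the next free slot, then bump the count in the header.
        self.f.seek(offset + count * DOC_ID.size)
        self.f.write(DOC_ID.pack(doc_id))
        self._write_entry(value, offset, capacity, count + 1)

    def query(self, value):
        offset, _, count = self._read_entry(value)
        if count == 0:
            return []
        self.f.seek(offset)
        return [d for (d,) in DOC_ID.iter_unpack(self.f.read(count * DOC_ID.size))]


with tempfile.TemporaryDirectory() as tmp:
    index = BucketedIndex(os.path.join(tmp, "values.idx"))
    try:
        for doc_id, values in {0: [1, 10, 123], 1: [1, 10, 345]}.items():
            for v in values:
                index.add(v, doc_id)
        print(index.query(1))  # [0, 1]
    finally:
        index.close()
```

Because capacities double, the buckets a value has abandoned add up to less than its current capacity, and less than half of the current bucket is empty, so the space used per value stays within a small constant factor of its live entries. Reclaiming the dead space would need an occasional compaction pass (not shown), and writing the data and the header entry separately is not crash-safe.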

Any idea how to organize an 'inverted index' file?

Best.

I would go with 65536 files, one per value, each holding the IDs of the documents that contain that value. If you want to go gentle on the filesystem, divide them into 256 directories with 256 files each:

00\00.idx, 00\01.idx, ..., ff\ff.idx
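A minimal sketch of this one-file-per-value layout, assuming 32-bit little-endian document IDs and lowercase hex names (the high byte of the value picks the directory, the low byte the file); the function names are hypothetical. Each posting file is append-only, so adding a document never rewrites existing data:

```python
import os
import struct
import tempfile

DOC_ID = struct.Struct("<I")  # assumption: 32-bit little-endian document IDs


def posting_path(root, value):
    """Value 0x1234 -> <root>/12/34.idx (256 directories x 256 files)."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value must be in 0-65535")
    return os.path.join(root, f"{value >> 8:02x}", f"{value & 0xFF:02x}.idx")


def add_document(root, doc_id, values):
    """Append doc_id to the posting file of every distinct value it contains."""
    for v in set(values):
        path = posting_path(root, v)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "ab") as f:  # append mode: existing IDs are never rewritten
            f.write(DOC_ID.pack(doc_id))


def query(root, value):
    """Return the document IDs stored for value, in insertion order."""
    path = posting_path(root, value)
    if not os.path.exists(path):
        return []
    with open(path, "rb") as f:
        return [d for (d,) in DOC_ID.iter_unpack(f.read())]


with tempfile.TemporaryDirectory() as root:
    add_document(root, 0, [1, 10, 123])
    add_document(root, 1, [1, 10, 345])
    print(query(root, 10))  # [0, 1]
```

Adding one document touches one file per distinct value (up to 360 here), so batching additions may be worth it. If document IDs are assigned in increasing order, every posting file also stays sorted, which keeps intersecting lists cheap.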
