algorithm - Suggestions for a Data Structure for related features

July 15, 2013

We have a set of documents, each of which has a set of features. Given feature A, I need to know the probability of feature B appearing in the same document.

I thought of building a probability matrix M such that M(i,j) = the probability of feature j being in a document, given that feature i is there.
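A minimal sketch of that matrix, assuming each document is given as a set of feature names. It is stored sparsely as a dict of dicts, so m[a][b] = P(b | a) is estimated as (documents containing both a and b) / (documents containing a), and pairs that never co-occur take no space. The function name and input format are illustrative assumptions.

```python
from collections import Counter, defaultdict
from itertools import permutations

def conditional_probabilities(documents):
    """Return a sparse matrix m[a][b] = P(b in doc | a in doc).

    documents: iterable of feature sets (assumed input format).
    Only pairs that actually co-occur are stored.
    """
    feature_count = Counter()  # number of documents containing each feature
    pair_count = Counter()     # number of documents containing both a and b
    for doc in documents:
        feats = set(doc)
        feature_count.update(feats)
        # ordered pairs (a, b), a != b; cost is quadratic in this document's size
        pair_count.update(permutations(feats, 2))
    m = defaultdict(dict)
    for (a, b), both in pair_count.items():
        m[a][b] = both / feature_count[a]
    return m

docs = [{"x", "y"}, {"x", "z"}, {"x", "y", "z"}]
m = conditional_probabilities(docs)
print(m["x"]["y"])  # 2 of the 3 documents with x also contain y
print(m["y"]["x"])  # every document with y also contains x
```

Memory grows with the number of co-occurring pairs, which is still n^2 in the worst case.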

However, I have an additional requirement: given feature A in a document, find all features that have probability > p of being in the same document.

In the meantime, I'm thinking of using a sparse matrix for the probability matrix and, after it's computed, running over the column of each feature, sorting by probability, and keeping the result in a linked list somewhere (so that for each feature I have a list of its corresponding features).
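A sketch of the sort-and-list idea, assuming the sparse matrix is a dict of dicts with m[a][b] = P(b | a). A Python list stands in for the linked list, and the helper names are hypothetical. Because each list is sorted by descending probability, a threshold query can stop at the first entry that is not above p, so its cost is proportional to the number of results rather than to n.

```python
def build_sorted_lists(m):
    """For each feature a, return (probability, b) pairs sorted by descending probability."""
    return {
        a: sorted(((p, b) for b, p in row.items()), key=lambda t: (-t[0], t[1]))
        for a, row in m.items()
    }

def features_above(sorted_lists, a, p):
    """Return the features b with P(b | a) > p by scanning the sorted prefix."""
    result = []
    for prob, b in sorted_lists.get(a, []):
        if prob <= p:
            break  # sorted descending, so nothing after this can qualify
        result.append(b)
    return result

m = {
    "x": {"y": 2 / 3, "z": 2 / 3},
    "y": {"x": 1.0, "z": 0.5},
    "z": {"x": 1.0, "y": 0.5},
}
lists = build_sorted_lists(m)
print(features_above(lists, "y", 0.6))  # only x clears the threshold
```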

The space complexity of this is quite big (worst case n^2, and n is large!), and the time complexity of each search is O(n).

Any better ideas?

If the number of features is comparable to the number of documents, or greater, consider holding an inverted index: for each feature, hold a (e.g. sorted) list of the documents it is present in. You can then work out the probability of B given A by running a merge on the sorted lists for features A and B.
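A minimal sketch of the inverted-index approach, assuming documents are numbered by their position in a list and each is a set of feature names. P(B | A) is the size of the intersection of the two posting lists divided by the length of A's list, and the intersection is counted with a linear merge of the sorted lists. Function names are illustrative, and returning 0.0 for an unseen feature A is an assumption.

```python
def build_inverted_index(documents):
    """Map each feature to a sorted list of the IDs of documents containing it."""
    index = {}
    for doc_id, feats in enumerate(documents):
        for f in set(feats):
            index.setdefault(f, []).append(doc_id)
    # IDs are appended in increasing order, so each posting list is already sorted
    return index

def count_common(list_a, list_b):
    """Size of the intersection of two sorted lists, via a linear merge."""
    i = j = common = 0
    while i < len(list_a) and j < len(list_b):
        if list_a[i] == list_b[j]:
            common += 1
            i += 1
            j += 1
        elif list_a[i] < list_b[j]:
            i += 1
        else:
            j += 1
    return common

def prob_b_given_a(index, a, b):
    docs_a = index.get(a, [])
    if not docs_a:
        return 0.0  # assumption: an unseen feature A yields probability 0
    return count_common(docs_a, index.get(b, [])) / len(docs_a)

docs = [{"x", "y"}, {"x", "z"}, {"x", "y", "z"}, {"w"}]
index = build_inverted_index(docs)
print(prob_b_given_a(index, "x", "y"))  # x is in docs 0, 1, 2; y is in 0, 2
```

Storage is one entry per (feature, document) occurrence, and a single query costs O(|docs(A)| + |docs(B)|).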

For the "which features are likely given A" question, I can think of nothing better than pre-computing the answer for each A and hoping that the resulting list of features isn't too long.
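A sketch of that pre-computation, assuming documents are given as a list of feature sets and the threshold p is fixed in advance. For each feature A, it walks only the documents that contain A (found through an inverted index), counts the other features in them, and keeps those with P(B | A) > p. Only pairs that clear the threshold are stored. The function name and output shape are hypothetical.

```python
from collections import Counter

def precompute_related(documents, p):
    """For every feature a, list the features b != a with P(b | a) > p.

    documents: list of feature sets (assumed format).
    Returns {a: [(b, probability), ...]} sorted by descending probability.
    """
    docs_with = {}  # feature -> list of document IDs (an inverted index)
    for doc_id, feats in enumerate(documents):
        for f in set(feats):
            docs_with.setdefault(f, []).append(doc_id)

    related = {}
    for a, doc_ids in docs_with.items():
        co = Counter()  # how many documents with a also contain each other feature
        for d in doc_ids:
            co.update(f for f in set(documents[d]) if f != a)
        n_a = len(doc_ids)
        hits = [(b, c / n_a) for b, c in co.items() if c / n_a > p]
        hits.sort(key=lambda t: (-t[1], t[0]))
        related[a] = hits
    return related

docs = [{"x", "y"}, {"x", "z"}, {"x", "y", "z"}]
related = precompute_related(docs, 0.6)
print(related["y"])  # x appears in every document that has y
```

The pre-computation time still depends on how many features co-occur, but each later query is just a dictionary lookup.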
