algorithm - Suggestions for a Data Structure for related features -


we have set of documents , each has set of features. given feature a, need know probability of having feature b in same document.

i thought of building probability matrix , s.t: m(i,j) = probability of having feature b in document , given feature there.

however , have additional requirement: given feature in document , features have probability > p of being in same document.

in mean while think off sparse matrix probability matrix , , after it's computed , each feature run on column , sort p , , keep in linked list somewhere. (so , have each feature , list of corresponding features

this space complexity quite big (worst case: n^2, , n large!) , , time complexity each search o(n).

any better idea?

if number of features comparable number of documents, or greater, consider holding inverted index: each feature hold (e.g. sorted list of) documents in present. can work out probability of b given running merge on sorted lists features , b.

for "common features expected given a" question, can think of nothing better pre-computing answer each , hoping resulting list of features isn't long.


Comments

Popular posts from this blog

ASP.NET/SQL find the element ID and update database -

jquery - appear modal windows bottom -

c++ - Compiling static TagLib 1.6.3 libraries for Windows -