algorithm - Suggestions for a Data Structure for related features -
we have set of documents , each has set of features. given feature a, need know probability of having feature b in same document.
i thought of building probability matrix , s.t: m(i,j) = probability of having feature b in document , given feature there.
however , have additional requirement: given feature in document , features have probability > p of being in same document.
in mean while think off sparse matrix probability matrix , , after it's computed , each feature run on column , sort p , , keep in linked list somewhere. (so , have each feature , list of corresponding features
this space complexity quite big (worst case: n^2, , n large!) , , time complexity each search o(n).
any better idea?
if number of features comparable number of documents, or greater, consider holding inverted index: each feature hold (e.g. sorted list of) documents in present. can work out probability of b given running merge on sorted lists features , b.
for "common features expected given a" question, can think of nothing better pre-computing answer each , hoping resulting list of features isn't long.
Comments
Post a Comment