r - finding unique values from a list


Suppose I have a list of values, e.g.

x <- list(a = c(1, 2, 3), b = c(2, 3, 4), c = c(4, 5, 6))

I would like to find the unique values from all the list elements combined. So far, the following code did the trick:

unique(unlist(x))

Does anyone know of a more efficient way? I have a hefty list with a lot of values and would appreciate any speed-up.

This solution was suggested by Marek as the best answer to the original question. See below for a discussion of the other approaches and why Marek's is the most useful.

> unique(unlist(x, use.names = FALSE))
[1] 1 2 3 4 5 6

Discussion

A faster solution is to compute unique() on the components of x first and then do a final unique() on those results. This will only work if the components of the list have the same number of unique values, as they do in both examples below. E.g.:

First the version from the question, then the double-unique approach:

> unique(unlist(x))
[1] 1 2 3 4 5 6
> unique.default(sapply(x, unique))
[1] 1 2 3 4 5 6

We have to call unique.default directly there, as there is also a matrix method for unique that keeps one margin fixed; this is fine because the matrix returned by sapply can be treated as a vector.
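To see why sapply() hands unique.default a matrix here, a small sketch using the x from the question (assuming each component has the same number of unique values):

```r
# x from the question: every component has exactly three unique values
x <- list(a = c(1, 2, 3), b = c(2, 3, 4), c = c(4, 5, 6))

# sapply() simplifies the per-component results to a 3-row matrix,
# one column per component
m <- sapply(x, unique)
is.matrix(m)       # TRUE

# unique.default() ignores the dim attribute and treats m as a
# plain vector, giving the combined unique values
unique.default(m)  # 1 2 3 4 5 6
```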

Marek, in the comments to this answer, notes that the slow speed of the unlist approach is potentially due to the names on the list. Marek's solution is to make use of the use.names argument to unlist, which, when used, results in a faster solution than the double-unique version above. For the simple x of Roman's post we get

> unique(unlist(x, use.names = FALSE))
[1] 1 2 3 4 5 6

Marek's solution will also work when the number of unique elements differs between components.

Here is a larger example with timings of the three methods:

## create a large list (1000 components of length 1000 each)
df <- as.list(data.frame(matrix(sample(1:10, 1000*1000, replace = TRUE),
                                ncol = 1000)))

Here are the results of the three approaches using df:

> ## all three approaches give the same result:
> all.equal(unique.default(sapply(df, unique)), unique(unlist(df)))
[1] TRUE
> all.equal(unique(unlist(df, use.names = FALSE)), unique(unlist(df)))
[1] TRUE
> ## timing Roman's original:
> system.time(replicate(10, unique(unlist(df))))
   user  system elapsed 
 12.884   0.077  12.966 
> ## timing the double-unique version:
> system.time(replicate(10, unique.default(sapply(df, unique))))
   user  system elapsed 
  0.648   0.000   0.653 
> ## timing Marek's solution:
> system.time(replicate(10, unique(unlist(df, use.names = FALSE))))
   user  system elapsed 
  0.510   0.000   0.512 

This shows that the double unique (applying unique() to the individual components and then unique() on those smaller sets of unique values) is a lot quicker, but that this speed-up is purely due to the names on the list df. If we tell unlist not to use the names, Marek's solution is marginally quicker than the double unique for this problem. As Marek's solution is using the correct tool properly, and it is quicker than the work-around, it is the preferred solution.
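As a small sketch of where that overhead comes from (using a two-component version of the x from the question): unlist() on a named list manufactures a compound name for every single element, and use.names = FALSE skips that work entirely.

```r
x <- list(a = c(1, 2, 3), b = c(2, 3, 4))

# unlist() builds a name like "a1", "a2", ... for every element
names(unlist(x))
# "a1" "a2" "a3" "b1" "b2" "b3"

# with use.names = FALSE no names vector is built at all
names(unlist(x, use.names = FALSE))
# NULL
```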

The big gotcha is that the double-unique approach will only work if, as in the two examples here, each component of the input list (df or x) has the same number of unique values. In such cases sapply simplifies the result to a matrix, which allows us to apply unique.default. If the components of the input list have differing numbers of unique values, sapply returns a list and the double-unique solution will fail.
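A minimal sketch of that failure mode, using a hypothetical list y whose components have two and three unique values respectively:

```r
y <- list(a = c(1, 1, 2), b = c(3, 4, 5))

# sapply() cannot simplify per-component results of differing
# lengths to a matrix, so it returns a list ...
s <- sapply(y, unique)
is.matrix(s)  # FALSE

# ... and unique.default() on that list does not flatten anything:
# it just returns the list of per-component unique vectors
length(unique.default(s))  # 2, not 5

# Marek's version is unaffected
unique(unlist(y, use.names = FALSE))  # 1 2 3 4 5
```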

