R - Finding unique values from a list
Suppose I have a list of values:
x <- list(a=c(1,2,3), b = c(2,3,4), c=c(4,5,6))
I would like to find the unique values across all list elements combined. So far, the following code has done the trick:
unique(unlist(x))
Does anyone know a more efficient way? I have a hefty list with a lot of values and would appreciate any speed-up.
This solution, suggested by Marek, is the best answer to the original question. See below for a discussion of other approaches and why Marek's is the most useful.
> unique(unlist(x, use.names = FALSE))
[1] 1 2 3 4 5 6
Discussion
A faster solution is to compute unique() on the components of x first and then do a final unique() on those results. This will only work if the components of the list all have the same number of unique values, as they do in both examples below. E.g.:
First the original version, then the double unique approach:
> unique(unlist(x))
[1] 1 2 3 4 5 6
> unique.default(sapply(x, unique))
[1] 1 2 3 4 5 6
We have to call unique.default because there is a matrix method for unique that keeps one margin fixed; this is fine, as a matrix can be treated as a vector.
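To make the difference between the two methods concrete, here is a minimal sketch using the x from the question. The variable names m, by_rows and as_vector are only illustrative.

```r
# The example list from the question
x <- list(a = c(1, 2, 3), b = c(2, 3, 4), c = c(4, 5, 6))

# Every component has 3 unique values, so sapply() simplifies to a 3 x 3 matrix
m <- sapply(x, unique)

# unique() dispatches to the matrix method, which only drops duplicated rows
# and returns a matrix, not the combined unique values
by_rows <- unique(m)

# unique.default() works on the underlying values and returns a plain vector
as_vector <- unique.default(m)

print(dim(by_rows))
print(as_vector)
```

Here no row of m is duplicated, so the matrix method hands back all nine values, while unique.default() gives the six distinct ones.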
Marek, in the comments to this answer, notes that the slow speed of the unlist approach is potentially due to the names on the list. Marek's solution is to make use of the use.names argument to unlist, which, if set to FALSE, results in a faster solution than the double unique version above. For the simple x of Roman's post we get
> unique(unlist(x, use.names = FALSE))
[1] 1 2 3 4 5 6
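As a rough illustration of what use.names changes, the sketch below compares the two calls on the small x. The names with_names and without_names are made up for this example.

```r
x <- list(a = c(1, 2, 3), b = c(2, 3, 4), c = c(4, 5, 6))

# By default unlist() builds a name for every element of the result
with_names <- unlist(x)
print(names(with_names))

# With use.names = FALSE no names are constructed at all
without_names <- unlist(x, use.names = FALSE)
print(names(without_names))

# The unique values are the same either way
print(identical(unique(with_names), unique(without_names)))
```

On a list with a million elements the default call has to build a million such names, which is the overhead Marek's comment points to.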
Marek's solution will work even when the number of unique elements differs between components.
Here is a larger example with some timings of all three methods:
## Create a large list (1000 components of length 1000 each)
df <- as.list(data.frame(matrix(sample(1:10, 1000*1000, replace = TRUE), ncol = 1000)))
Here are the results for the three approaches using df:
> ## All 3 approaches give the same result:
> all.equal(unique.default(sapply(df, unique)), unique(unlist(df)))
[1] TRUE
> all.equal(unique(unlist(df, use.names = FALSE)), unique(unlist(df)))
[1] TRUE
> ## Timing Roman's original:
> system.time(replicate(10, unique(unlist(df))))
   user  system elapsed
 12.884   0.077  12.966
> ## Timing the double unique version:
> system.time(replicate(10, unique.default(sapply(df, unique))))
   user  system elapsed
  0.648   0.000   0.653
> ## Timing of Marek's solution:
> system.time(replicate(10, unique(unlist(df, use.names = FALSE))))
   user  system elapsed
  0.510   0.000   0.512
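This shows that the double unique approach is a lot quicker than the original, because it applies unique() to the individual components and then unique() to those smaller sets of unique values. However, this speed-up is purely due to the names on the list df. If we tell unlist not to use the names, Marek's solution is marginally quicker than the double unique approach for this problem. As Marek's solution uses the correct tool properly, and is quicker than the work-around, it is the preferred solution.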
The big gotcha with the double unique approach is that it will only work if, as in the two examples here, each component of the input list (df or x) has the same number of unique values. In such cases sapply simplifies the result to a matrix, which allows us to apply unique.default. If the components of the input list have differing numbers of unique values, the double unique solution will fail.
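The gotcha can be reproduced with a small sketch. The list y below is a hypothetical input, not one from the original post, chosen so that its components have different numbers of unique values.

```r
# Hypothetical list: a has 3 unique values, b has only 2
y <- list(a = c(1, 2, 3), b = c(2, 2, 4))

# sapply() cannot simplify results of unequal length to a matrix,
# so it returns a list instead
per_component <- sapply(y, unique)
print(is.list(per_component))

# unique.default() on that list compares whole components,
# so the result is still a list, not a flat vector of unique values
double_unique <- unique.default(per_component)
print(is.list(double_unique))

# Marek's solution is unaffected by the unequal lengths
marek <- unique(unlist(y, use.names = FALSE))
print(marek)
```

Note that in this sketch the double unique version does not raise an error; it simply returns the wrong kind of result, which makes the failure easy to miss.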