R - Classify or cut a data frame by a list of class ranges and summarize it with ddply
I have a question about ddply and subset.
I have a data frame df:
df <- read.table(textConnection("
id v_idn v_seed v_time v_pop v_rank v_perco
1  15  125648  0 150 1 15
2  17  125648  0 120 2  5
3  18  125648  0 100 3  6
4  52  125648  0  25 4  1
5  17  125648 10 220 1  5
6  15  125648 10 160 2 15
7  18  125648 10 110 3  6
8  52  125648 10  50 4  1
9  56  -11152  0 250 1 17
10 15  -11152  0 180 2 15
11 18  -11152  0 110 3  6
12 22  -11152  0   5 4 14
13 56  -11152 10 250 1 17
14 15  -11152 10 180 2 15
15 22  -11152 10 125 3 14
16 18  -11152 10 120 4  6
"), header = TRUE)
Step 1:
I have a list of equal intervals built with cut_interval:
myinterval <- cut_interval(c(15,5,6,1,17,14), length=10)
So I have 2 levels here: [0,10) and (10,20]
Step 2:
I want each row assigned to the group/class defined by these 2 levels, in a new column v_cut:
id v_idn v_seed v_time v_pop v_rank v_perco v_cut
1  15  125648  0 150 1 15 (10,20]
2  17  125648  0 120 2  5 [0,10)
3  18  125648  0 100 3  6 [0,10)
4  52  125648  0  25 4  1 [0,10)
5  17  125648 10 220 1  5 [0,10)
6  15  125648 10 160 2 15 (10,20]
7  18  125648 10 110 3  6 [0,10)
8  52  125648 10  50 4  1 [0,10)
9  56  -11152  0 250 1 17 (10,20]
10 15  -11152  0 180 2 15 (10,20]
11 18  -11152  0 110 3  6 [0,10)
12 22  -11152  0   5 4 14 (10,20]
13 56  -11152 10 250 1 17 (10,20]
14 15  -11152 10 180 2 15 (10,20]
15 22  -11152 10 125 3 14 (10,20]
16 18  -11152 10 120 4  6 [0,10)
Step 3:
I want to know the variability of v_rank on the x axis, with time on the y axis, for each group of v_cut, so I need to compute the min, mean, max and sd of the v_rank values:
ddply(df, .(v_cut, v_time), summarize,
      mean = mean(v_rank), min = min(v_rank),
      max = max(v_rank), sd = sd(v_rank))
Result wanted:
id v_time mean.v_rank ... v_cut
1   0     2.25        ... (10,20]
2   0     2.42        ... [0,10)
3  10     2.25        ... [0,10)
4  10     2.42        ... (10,20]
My problem:
I don't know how to get from step 1 to step 2 :/
And is it possible to group by v_cut as in the example in step 3?
Is there a possibility to do the same thing with the "subset" option of ddply?
Once again, thanks a lot, great R gurus!
Update 1:
I have an answer to go from step 1 to step 2:
df$v_cut <- cut_interval(df$v_perco,n=10)
I'm using plyr; is there perhaps a better answer in this case?
Any answer to go from step 2 to step 3?
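A note on the step 1 to step 2 assignment above: as far as I know, `cut_interval(x, n = 10)` asks ggplot2 for ten equal-width classes, whereas `length = 10` asks for classes of width 10. Below is a minimal base R sketch of the two-class split. The class edges 0, 10 and 20 are my assumption, inferred from the levels listed in step 1. Base `cut()` labels the first class `[0,10]` rather than `[0,10)`.

```r
# v_perco values copied from step 1 of the question
v_perco <- c(15, 5, 6, 1, 17, 14)

# Assumed class edges (not stated explicitly in the question): 0, 10, 20.
# include.lowest = TRUE closes the first interval on the left.
v_cut <- cut(v_perco, breaks = c(0, 10, 20), include.lowest = TRUE)

levels(v_cut)  # the two class labels
table(v_cut)   # how many values fall in each class
```

On the full data frame the same call would be `df$v_cut <- cut(df$v_perco, breaks = c(0, 10, 20), include.lowest = TRUE)`.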
Update 2:
Brandon Bertelsen gave me an answer with melt + cast, but (to understand it) I want to do the same operation with plyr and ddply, with a different result:
id v_idn v_time mean.v_rank ... v_cut
1  15     0     2.25        ... (10,20]
2  15    10     2.45        ... (10,20]
2  17     0     1.52        ... [0,10)
2  17    10     2.42        ... [0,10)
etc.
I'm trying:
r('sumdata <- ddply(df, .(v_idn,v_time), summarize,min = min(v_rank),mean = mean(v_rank), max = max(v_rank), sd=sd(v_rank))')
But I want to have v_cut in the sumdata data frame. How can I do that with ddply? Is there an option for this? Or is the answer to merge with the initial df, using key = v_idn, to add the v_cut column to sumdata?
You don't need plyr for this; you can use reshape:
library(reshape)

## pull what you need
dfx <- df[c("v_seed", "v_time", "v_rank", "v_perco")]
## bring in the cuts
dfx <- data.frame(dfx, ifelse(df$v_perco > 10, "(10,20]", "[0,10)"))
## rename to v_cut
colnames(dfx)[ncol(dfx)] <- "v_cut"
## melt it
dfx <- melt(dfx, id = c("v_cut", "v_seed", "v_time"))
## cast it
dfx <- cast(dfx, v_cut + v_time + v_seed ~ variable, c(mean, min, max, sd))
If you only want the mean, replace the last line with:
dfx <- cast(dfx, v_cut + v_time + v_seed ~ variable, mean)
Type "dfx" and you'll see the data frame you asked for.
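For the ddply variant asked about in Update 2, one possible sketch follows; it is not the melt/cast approach above. In the sample data every v_idn has a single v_perco value, and therefore a single v_cut. Under that assumption, adding v_cut to the grouping variables should keep it as a column of the summary without splitting any group. Here v_cut is built with base `cut()` and assumed edges 0, 10 and 20, so the first label reads `[0,10]`.

```r
library(plyr)

# Sample data copied from the question
df <- read.table(text = "
id v_idn v_seed v_time v_pop v_rank v_perco
1  15  125648  0 150 1 15
2  17  125648  0 120 2  5
3  18  125648  0 100 3  6
4  52  125648  0  25 4  1
5  17  125648 10 220 1  5
6  15  125648 10 160 2 15
7  18  125648 10 110 3  6
8  52  125648 10  50 4  1
9  56  -11152  0 250 1 17
10 15  -11152  0 180 2 15
11 18  -11152  0 110 3  6
12 22  -11152  0   5 4 14
13 56  -11152 10 250 1 17
14 15  -11152 10 180 2 15
15 22  -11152 10 125 3 14
16 18  -11152 10 120 4  6
", header = TRUE)

# Assumed class edges 0, 10, 20 (an assumption, see the note above)
df$v_cut <- cut(df$v_perco, breaks = c(0, 10, 20), include.lowest = TRUE)

# Grouping by v_cut in addition to v_idn and v_time carries v_cut
# through to the summary data frame
sumdata <- ddply(df, .(v_idn, v_cut, v_time), summarize,
                 min = min(v_rank), mean = mean(v_rank),
                 max = max(v_rank), sd = sd(v_rank))
sumdata
```

Groups that contain a single row get `sd = NA`. For the step 3 summary per class and time, the grouping would be `.(v_cut, v_time)` instead.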