r - Classify or cut dataframe by list of class range and summarize it with ddply

- May 15, 2012

I have a question about ddply and subset.

I have a data frame df:

df <- read.table(textConnection("
id v_idn v_seed v_time v_pop v_rank v_perco
1  15    125648 0      150   1      15
2  17    125648 0      120   2      5
3  18    125648 0      100   3      6
4  52    125648 0      25    4      1
5  17    125648 10     220   1      5
6  15    125648 10     160   2      15
7  18    125648 10     110   3      6
8  52    125648 10     50    4      1
9  56   -11152  0      250   1      17
10 15   -11152  0      180   2      15
11 18   -11152  0      110   3      6
12 22   -11152  0      5     4      14
13 56   -11152  10     250   1      17
14 15   -11152  10     180   2      15
15 22   -11152  10     125   3      14
16 18   -11152  10     120   4      6
"), header=TRUE)

Step 1:

I have a list of equal-width intervals built with cut_interval:

myinterval <- cut_interval(c(15,5,6,1,17,14), length=10)

So I have 2 levels here: [0,10) and (10,20]

Step 2:

I want to assign each row to the group/class defined by these 2 levels, in a new column v_cut:

id v_idn v_seed v_time v_pop v_rank v_perco v_cut
1  15    125648 0      150   1      15      (10,20]
2  17    125648 0      120   2      5       [0,10)
3  18    125648 0      100   3      6       [0,10)
4  52    125648 0      25    4      1       [0,10)
5  17    125648 10     220   1      5       [0,10)
6  15    125648 10     160   2      15      (10,20]
7  18    125648 10     110   3      6       [0,10)
8  52    125648 10     50    4      1       [0,10)
9  56   -11152  0      250   1      17      (10,20]
10 15   -11152  0      180   2      15      (10,20]
11 18   -11152  0      110   3      6       [0,10)
12 22   -11152  0      5     4      14      (10,20]
13 56   -11152  10     250   1      17      (10,20]
14 15   -11152  10     180   2      15      (10,20]
15 22   -11152  10     125   3      14      (10,20]
16 18   -11152  10     120   4      6       [0,10)

Step 3:

I want to know the variability of v_rank (on the x axis) over time (on the y axis) for each v_cut group, so I need to compute the min, mean, max and sd of the v_rank values:

ddply(df, .(v_cut,v_time), summarize ,mean = mean(v_rank), min = min(v_rank), max = max(v_rank), sd = sd(v_rank))

Result wanted:

id  v_time mean.v_rank ... v_cut
1   0      2.25            (10,20]
2   0      2.42            [0,10)
3   10     2.25            [0,10)
4   10     2.42            (10,20]

My problem

I don't know how to get from step 1 to step 2 :/

And is it possible to group by v_cut as in the example in step 3?

Is it possible to do the same thing with the "subset" option of ddply?

Thanks a lot once more, great R gurus!

Update 1:

I have an answer for going from step 1 to step 2:

df$v_cut <- cut_interval(df$v_perco,n=10)
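A note on the line above: in cut_interval, n is the number of intervals and length is the width of each one, so n=10 asks for ten bins, whereas step 1 used length=10. As a minimal sketch of the two-bin version, here is the same binning done with base R's cut(). The explicit breaks c(0, 10, 20) are an assumption taken from the two levels shown in step 1, and the labels differ slightly from the ones in the question because cut() closes the first bin on both sides.

```r
# v_perco values from the first eight rows of the question's sample data
v_perco <- c(15, 5, 6, 1, 5, 15, 6, 1)

# Two bins of width 10. The breaks are assumed from the levels in step 1.
# include.lowest = TRUE makes the first bin include its lower bound.
v_cut <- cut(v_perco, breaks = c(0, 10, 20), include.lowest = TRUE)

levels(v_cut)  # the two bin labels
table(v_cut)   # how many values fall in each bin
```

On the real data the same call would be df$v_cut <- cut(df$v_perco, breaks = c(0, 10, 20), include.lowest = TRUE).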

I'm using plyr; is there perhaps a better answer in this case?

Is there an answer for going from step 2 to step 3?

Update 2:

Brandon Bertelsen gave me an answer using melt + cast, but (in order to understand) I want to do the same operation with plyr and ddply, with a different result:

id  v_idn v_time mean.v_rank ... v_cut
1   15   0      2.25            (10,20]
2   15   10     2.45            (10,20]
2   17   0      1.52            [0,10)
2   17   10     2.42            [0,10)
etc.

I'm trying:

r('sumdata <- ddply(df, .(v_idn,v_time), summarize,min = min(v_rank),mean =  mean(v_rank), max = max(v_rank), sd=sd(v_rank))')

But I want to have v_cut in the sumdata data frame; how can I do that with ddply? Is there an option for this? Or is the answer to merge with the initial df, using key = v_idn, to add the v_cut column to sumdata?
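One possible way to keep v_cut in the summary, sketched here rather than taken from the answers, is to add v_cut to the grouping variables of ddply. This assumes every v_idn maps to a single v_cut, which holds for the sample data because each v_idn has only one v_perco value, so the extra grouping variable does not split any group. The bins are built with base R's cut() and assumed breaks c(0, 10, 20), so the first label prints as [0,10] instead of [0,10).

```r
library(plyr)

# Sample data from the question
df <- read.table(text = "
id v_idn v_seed v_time v_pop v_rank v_perco
1  15    125648 0      150   1      15
2  17    125648 0      120   2      5
3  18    125648 0      100   3      6
4  52    125648 0      25    4      1
5  17    125648 10     220   1      5
6  15    125648 10     160   2      15
7  18    125648 10     110   3      6
8  52    125648 10     50    4      1
9  56   -11152  0      250   1      17
10 15   -11152  0      180   2      15
11 18   -11152  0      110   3      6
12 22   -11152  0      5     4      14
13 56   -11152  10     250   1      17
14 15   -11152  10     180   2      15
15 22   -11152  10     125   3      14
16 18   -11152  10     120   4      6
", header = TRUE)

# Assign each row to one of two bins (assumed breaks: 0, 10, 20)
df$v_cut <- cut(df$v_perco, breaks = c(0, 10, 20), include.lowest = TRUE)

# Grouping by v_cut as well carries the column into the result
sumdata <- ddply(df, .(v_cut, v_idn, v_time), summarize,
                 min  = min(v_rank),
                 mean = mean(v_rank),
                 max  = max(v_rank),
                 sd   = sd(v_rank))

sumdata
```

Groups that contain a single row get sd = NA, since a standard deviation needs at least two values. If v_cut could vary within a v_idn, this grouping would split those groups, and merging on v_idn would no longer be equivalent.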

You don't need plyr for this; you can use reshape.

library(reshape)
## pull what you need
dfx <- df[c("v_seed", "v_time", "v_rank", "v_perco")]
## bring in your cuts
dfx <- data.frame(dfx, ifelse(df$v_perco > 10, "(10,20]", "[0,10)"))
## rename it to v_cut
colnames(dfx)[ncol(dfx)] <- "v_cut"
## melt it
dfx <- melt(dfx, id=c("v_cut", "v_seed", "v_time"))
## cast it
dfx <- cast(dfx, v_cut + v_time + v_seed ~ variable, c(mean,min,max,sd))

If you only want the mean, replace the last line with:

dfx <- cast(dfx, v_cut + v_time + v_seed ~ variable, mean)

Type "dfx" and you'll see the data frame you asked for.
