# Find frequency of events in groups with dplyr

By : Jeff Gao
Date : August 01 2020, 03:00 PM
I have a grouped data frame with groups of different lengths, and I want to count the yes/no events within each group.

We can count each group/outcome combination and then calculate the percentage within each group:
``````
library(dplyr)

df %>%
  count(group, outcome) %>%      # one row per group/outcome pair, with its count in n
  group_by(group) %>%
  mutate(n = n / sum(n) * 100)   # overwrite n with the percentage within each group

#  group outcome   n
#  <int> <fct>   <dbl>
#1     1 no       40
#2     1 yes      60
#3     2 no       40
#4     2 yes      60
#5     3 no       35.3
#6     3 yes      64.7
#7     4 no       50
#8     4 yes      50
``````
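Or, in base R (assuming `df` contains only the `group` and `outcome` columns, so `table(df)` is a two-way table with groups as rows):
``````
prop.table(table(df), 1) * 100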

#    outcome
#group       no      yes
#    1 40.00000 60.00000
#    2 40.00000 60.00000
#    3 35.29412 64.70588
#    4 50.00000 50.00000
``````
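Since the question's data is not shown, here is a self-contained sketch on hypothetical toy data (the values are made up; the column names `group` and `outcome` follow the answer) that checks the dplyr pipeline and the base R `prop.table()` approach agree. It writes the percentage to a new `pct` column instead of overwriting `n`:

```r
library(dplyr)

# Hypothetical toy data: two groups of different sizes with yes/no outcomes
toy <- data.frame(
  group   = c(1, 1, 1, 1, 1, 2, 2, 2),
  outcome = c("yes", "yes", "yes", "no", "no", "no", "yes", "no"),
  stringsAsFactors = FALSE
)

# dplyr: count each group/outcome pair, then convert counts to within-group percentages
props <- toy %>%
  count(group, outcome) %>%
  group_by(group) %>%
  mutate(pct = n / sum(n) * 100) %>%
  ungroup()

# Base R: cross-tabulate, then divide each row (margin 1) by its row total
tab <- prop.table(table(toy$group, toy$outcome), 1) * 100

print(props)
print(tab)
```

Keeping the raw counts in `n` and the percentages in `pct` means both remain available for later steps.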

## Use dplyr to find genotype frequency across SNPs

By : user2689900
Date : March 29 2020, 07:55 AM
Hope I am not misunderstanding. Is this what you are looking for?

Assume the data structure is:
``````
df <- structure(list(Assay = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L), .Label = c("One_apoe-83", "One_CD9-269"), class = "factor"),
Final = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L
), .Label = c("Invalid", "No Call", "NTC", "XX", "YX", "YY"
), class = "factor"), n = c(2L, 9L, 2L, 4L, 41L, 134L, 2L,
5L, 2L, 99L)), .Names = c("Assay", "Final", "n"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10"))
``````
To get each result's percentage within its assay:
``````
library(dplyr)
df %>% group_by(Assay) %>% mutate(n_percent = n / sum(n) * 100)
#          Assay   Final   n n_percent
# 1  One_apoe-83 Invalid   2  1.041667
# 2  One_apoe-83 No Call   9  4.687500
# 3  One_apoe-83     NTC   2  1.041667
# 4  One_apoe-83      XX   4  2.083333
# 5  One_apoe-83      YX  41 21.354167
# 6  One_apoe-83      YY 134 69.791667
# 7  One_CD9-269 Invalid   2  1.851852
# 8  One_CD9-269 No Call   5  4.629630
# 9  One_CD9-269     NTC   2  1.851852
# 10 One_CD9-269      XX  99 91.666667
``````
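To exclude the non-genotype results (Invalid, No Call, NTC) before computing the percentages, filter them out first:
``````
df %>%
  filter(!Final %in% c("Invalid", "No Call", "NTC")) %>%
  group_by(Assay) %>%
  mutate(n_percent = n / sum(n) * 100)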

# Source: local data frame [4 x 4]
# Groups: Assay
#
#         Assay Final   n  n_percent
# 1 One_apoe-83    XX   4   2.234637
# 2 One_apoe-83    YX  41  22.905028
# 3 One_apoe-83    YY 134  74.860335
# 4 One_CD9-269    XX  99 100.000000
``````
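If you would rather keep the non-genotype rows in the output, one option is to flag the genotype calls and compute each call's percentage among genotype calls only. A sketch, with the counts copied from the `df` above (stored as plain character and numeric columns) and hypothetical column names `is_call` and `pct_of_calls`:

```r
library(dplyr)

# Counts copied from the df above; Final is kept as character here
df <- data.frame(
  Assay = rep(c("One_apoe-83", "One_CD9-269"), c(6, 4)),
  Final = c("Invalid", "No Call", "NTC", "XX", "YX", "YY",
            "Invalid", "No Call", "NTC", "XX"),
  n = c(2, 9, 2, 4, 41, 134, 2, 5, 2, 99),
  stringsAsFactors = FALSE
)

genotype_pct <- df %>%
  group_by(Assay) %>%
  mutate(
    is_call = Final %in% c("XX", "YX", "YY"),  # TRUE for actual genotype calls
    # the denominator sums only the genotype calls within the assay;
    # Invalid / No Call / NTC rows are kept but get NA
    pct_of_calls = ifelse(is_call, n / sum(n[is_call]) * 100, NA)
  ) %>%
  ungroup()

print(genotype_pct)
```

The percentages for the call rows match the filtered result above, while the full row set is preserved.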

## Dplyr: How to recode groups that have frequency less than 1% into "other" category using only dplyr

By : Natalia Pasichnyk
Date : March 29 2020, 07:55 AM
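I am looking for a way to recode groups that make up less than 1% of the rows into an "other" category, using only dplyr. This should work:
``````
DF %>%
  group_by(group_name) %>%
  # n() is the size of the current group; nrow(DF) is the total number of rows.
  # as.character() keeps ifelse() from returning integer codes when group_name is a factor.
  mutate(new_group_name = ifelse(n() / nrow(DF) < 0.01, "other", as.character(group_name))) %>%
  ungroup()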
``````
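A variant of the 1% rule on hypothetical data, again using only dplyr. `add_count()` attaches each group's size to every row, so no `group_by()`/`ungroup()` pair is needed; `group_n` and `share` are helper column names chosen for illustration:

```r
library(dplyr)

# Hypothetical data: 200 rows, where "c" is a single row (0.5% of the data)
toy <- data.frame(
  group_name = c(rep("a", 150), rep("b", 49), "c"),
  stringsAsFactors = FALSE
)

recoded <- toy %>%
  add_count(group_name, name = "group_n") %>%   # size of each row's group
  mutate(
    share = group_n / n(),                      # n() is the total row count (data is ungrouped)
    new_group_name = if_else(share < 0.01, "other", group_name)
  ) %>%
  select(-group_n, -share)

table(recoded$new_group_name)
```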

## dplyr: Find mean for each bin by groups

By : user80411
Date : March 29 2020, 07:55 AM
You seem to be flailing a bit: you've got correct code, and then you've got extra code.
Starting from a fresh R session and defining your data, this is all you need:
``````
library(dplyr)
res <- df %>%
  group_by(id, bin, sign) %>%
  summarise(Num = n(), value = mean(value, na.rm = TRUE))
``````
``````
head(res)
#   id   bin sign Num       value
# 1  A [0,1]    - 122 -0.08330338
# 2  A [0,1]    + 111  0.11394381
# 3  A [0,1] NULL   2  0.75232462
# 4  A (1,2]    -  54 -0.09236725
# 5  A (1,2]    +  45  0.20581095
# 6  A (2,3]    -  12 -0.08998771
``````
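As a check, subsetting one group by hand gives the same mean as row 3 of `res`. Note that the bin label, as printed in `res`, is `"[0,1]"` with no space:
``````
groupA <- df[df$id == "A" & df$bin == "[0,1]" & df$sign == "NULL", ]
mean(groupA$value, na.rm = TRUE)
# [1] 0.7523246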
``````
To get the total count per `id`:
``````
res %>% group_by(id) %>%
  summarise(total = sum(Num))
``````
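The plyr equivalent, called with the `plyr::` prefix so plyr does not need to be attached (attaching plyr after dplyr masks dplyr's `summarise()`):
``````
plyr::ddply(df, c("id", "bin", "sign"), plyr::summarize, mean = mean(value, na.rm = TRUE))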
``````
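A self-contained sketch on hypothetical data, showing how `cut()` labels its bins and how `.groups = "drop"` (dplyr 1.0.0 or later) returns an ungrouped result. With `include.lowest = TRUE`, the first label is `"[0,1]"`, which is the exact string a by-hand subset has to compare against:

```r
library(dplyr)

# Hypothetical data: a numeric x to bin and a value to average
toy <- data.frame(
  id    = rep(c("A", "B"), each = 6),
  x     = c(0.5, 1.5, 2.5, 0.2, 1.2, 2.8, 0.1, 0.9, 1.1, 1.9, 2.1, 2.9),
  value = 1:12,
  stringsAsFactors = FALSE
)

# include.lowest = TRUE closes the first interval, giving the label "[0,1]"
toy$bin <- cut(toy$x, breaks = c(0, 1, 2, 3), include.lowest = TRUE)

res <- toy %>%
  group_by(id, bin) %>%
  summarise(Num = n(), mean_value = mean(value, na.rm = TRUE), .groups = "drop")

print(levels(toy$bin))
print(res)
```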

## Dplyr: how to find the first non-missing string by groups?

By : yared
Date : March 29 2020, 07:55 AM
`summarise()` gives one entry per group. Here, the first non-missing value is found with `which()`, keeping only its first position (`[1]`) so that each group returns a single value:
``````
data %>%
  group_by(group) %>%
  summarise(first_non_missing = names[which(!is.na(names))[1]])
``````
``````
  group first_non_missing
  <chr>             <chr>
1     A              fred
2     B              josh
``````
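A self-contained sketch on hypothetical data, including a group with no non-missing name. Because `which()` returns an empty vector there, `[1]` yields `NA`, so that group still gets exactly one row:

```r
library(dplyr)

# Hypothetical data; group C has no non-missing name at all
data <- data.frame(
  group = c("A", "A", "A", "B", "B", "C"),
  names = c(NA, "fred", "bill", NA, "josh", NA),
  stringsAsFactors = FALSE
)

out <- data %>%
  group_by(group) %>%
  # which() gives the positions of non-missing values; [1] keeps the first one
  summarise(first_non_missing = names[which(!is.na(names))[1]])

print(out)
```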

## Creating Groups with Dplyr's "group_by" then Using Stringr to Find Differences Between Groups

By : GCA CHD
Date : March 29 2020, 07:55 AM
First, check whether Task matches Task2. If it does not, return Task2 as a new variable. I stored the result in a new data frame, df2:
``````
df2 <- Df %>%
df2

# 1        John   Chris      Feed cat    Feed cat  TRUE
# 2        John   Chris   Make dinner Make dinner  TRUE
# 3        John   Chris    Iron shirt  Iron shirt  TRUE
# 4        John     Tom   Make dinner Make dinner  TRUE
# 5        John     Tom   Do homework Do homework  TRUE
# 6        John     Tom    Make lunch    Feed cat FALSE   Feed cat
# 7     Melanie Valerie   Make dinner Make dinner  TRUE
# 8     Melanie Valerie      Feed cat    Feed cat  TRUE
# 9     Melanie Valerie Buy groceries  Iron shirt FALSE Iron shirt
# 10    Melanie     Tim   Do homework Do homework  TRUE
# 11    Melanie     Tim    Iron shirt  Iron shirt  TRUE
# 12    Melanie     Tim    Make lunch  Make lunch  TRUE
``````
Then, for each caseworker/client pair, count how many tasks matched and whether all of them did:
``````
df2 %>%
  group_by(CaseWorker, Client) %>%
  summarise(n = n(),
            matches = sum(match),
            all_match = n == matches)

#   CaseWorker  Client     n matches all_match
#        <chr>   <chr> <int>   <int>     <lgl>
# 1       John   Chris     3       3      TRUE
# 2       John     Tom     3       2     FALSE
# 3    Melanie     Tim     3       3      TRUE
# 4    Melanie Valerie     3       2     FALSE
`````` 
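The code that builds `df2` above is incomplete. A minimal sketch of both steps on a hypothetical subset of the rows printed above, assuming an exact string comparison with `==` (swapped in for a stringr function, since the text only describes checking whether the two tasks match) and a hypothetical column name `new_task` for the differing task:

```r
library(dplyr)

# Hypothetical subset of the rows printed above
Df <- data.frame(
  CaseWorker = c("John", "John", "John", "Melanie", "Melanie"),
  Client     = c("Tom", "Tom", "Tom", "Valerie", "Valerie"),
  Task       = c("Make dinner", "Do homework", "Make lunch", "Feed cat", "Buy groceries"),
  Task2      = c("Make dinner", "Do homework", "Feed cat", "Feed cat", "Iron shirt"),
  stringsAsFactors = FALSE
)

df2 <- Df %>%
  mutate(
    match    = Task == Task2,                        # exact string comparison
    new_task = if_else(match, NA_character_, Task2)  # keep Task2 only where it differs
  )

summary_df <- df2 %>%
  group_by(CaseWorker, Client) %>%
  summarise(n = n(), matches = sum(match), all_match = n == matches, .groups = "drop")

print(df2)
print(summary_df)
```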