find frequency of events in groups dplyr


By : Jeff Gao
Date : August 01 2020, 03:00 PM
I have a grouped data frame whose groups differ in length, and I want to count the yes/no events within each group. We can count the events and then calculate the proportion by group.
code :
library(dplyr)

# Count each group/outcome pair, then turn the counts into within-group percentages
df %>%
  count(group, outcome) %>%
  group_by(group) %>%
  mutate(n = n / sum(n) * 100)

#  group outcome   n
#  <int> <fct>   <dbl>
#1     1 no       40  
#2     1 yes      60  
#3     2 no       40  
#4     2 yes      60  
#5     3 no       35.3
#6     3 yes      64.7
#7     4 no       50  
#8     4 yes      50  

# Base R alternative: row-wise percentages from the contingency table
prop.table(table(df), 1) * 100

#    outcome
#group       no      yes
#    1 40.00000 60.00000
#    2 40.00000 60.00000
#    3 35.29412 64.70588
#    4 50.00000 50.00000
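
The question's data is not shown, so here is a minimal, self-contained sketch of the same count-then-proportion idea. The data frame below is hypothetical, and the percentage goes into a separate column (`pct`, a name chosen for illustration) so the raw counts in `n` stay available.

```r
library(dplyr)

# Hypothetical data: two groups of different sizes with a yes/no outcome.
df <- data.frame(
  group   = c(1, 1, 1, 1, 2, 2),
  outcome = c("yes", "yes", "yes", "no", "yes", "no")
)

res <- df %>%
  count(group, outcome) %>%           # one row per group/outcome pair
  group_by(group) %>%
  mutate(pct = n / sum(n) * 100) %>%  # share of each outcome within its group
  ungroup()

res
```

Within each group the `pct` values add up to 100, which is a quick sanity check on the grouping.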



Use dplyr to find genotype frequency across SNPs


By : user2689900
Date : March 29 2020, 07:55 AM
Hope I am not misunderstanding. Are you looking for the following?
Assume the data structure is:
code :
df <- structure(list(Assay = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L), .Label = c("One_apoe-83", "One_CD9-269"), class = "factor"), 
    Final = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L
    ), .Label = c("Invalid", "No Call", "NTC", "XX", "YX", "YY"
    ), class = "factor"), n = c(2L, 9L, 2L, 4L, 41L, 134L, 2L, 
    5L, 2L, 99L)), .Names = c("Assay", "Final", "n"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10"))

# Percentage of each call within its assay
df %>% group_by(Assay) %>% mutate(n_percent = n/sum(n)*100)
#          Assay   Final   n n_percent
# 1  One_apoe-83 Invalid   2  1.041667
# 2  One_apoe-83 No Call   9  4.687500
# 3  One_apoe-83     NTC   2  1.041667
# 4  One_apoe-83      XX   4  2.083333
# 5  One_apoe-83      YX  41 21.354167
# 6  One_apoe-83      YY 134 69.791667
# 7  One_CD9-269 Invalid   2  1.851852
# 8  One_CD9-269 No Call   5  4.629630
# 9  One_CD9-269     NTC   2  1.851852
# 10 One_CD9-269      XX  99 91.666667

# Same calculation after dropping the calls that are not genotypes
df %>% 
  filter(! Final %in% c("Invalid", "No Call", "NTC")) %>% 
  group_by(Assay) %>% 
  mutate(n_percent = n/sum(n)*100)

# Source: local data frame [4 x 4]
# Groups: Assay
# 
#         Assay Final   n  n_percent
# 1 One_apoe-83    XX   4   2.234637
# 2 One_apoe-83    YX  41  22.905028
# 3 One_apoe-83    YY 134  74.860335
# 4 One_CD9-269    XX  99 100.000000
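
The answer assumes the calls are already counted. If the data instead has one row per sample, a sketch along the following lines would do the counting first. The `calls` data frame and its assay names are hypothetical, made up for illustration.

```r
library(dplyr)

# Hypothetical raw calls: one row per sample, not yet counted.
calls <- data.frame(
  Assay = c(rep("A1", 5), rep("A2", 4)),
  Final = c("XX", "YY", "YY", "No Call", "YY", "XX", "XX", "NTC", "YX")
)

freq <- calls %>%
  filter(!Final %in% c("Invalid", "No Call", "NTC")) %>%  # keep genotype calls only
  count(Assay, Final) %>%                                 # genotype counts per assay
  group_by(Assay) %>%
  mutate(n_percent = n / sum(n) * 100) %>%                # frequency within each assay
  ungroup()

freq
```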

Dplyr: How to recode groups that have frequency less than 1% into "other" category using only dplyr


By : Natalia Pasichnyk
Date : March 29 2020, 07:55 AM
I am looking for a way to recode groups with a frequency below 1% into an "other" category. This should work:
code :
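# Keep the name of any group with more than 10 rows; relabel the rest as 'others'.
# as.character() stops ifelse() from returning integer codes if group_name is a factor.
DF %>% 
  group_by(group_name) %>% 
  mutate(new_group_name = ifelse(n() > 10, as.character(group_name), 'others'))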
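
The answer above uses a fixed row count as the cutoff. A possible variant that applies the 1% threshold from the question directly is sketched below. The data frame is hypothetical, and the helper column `share` is a name introduced here for illustration.

```r
library(dplyr)

# Hypothetical data: 200 rows, where group "c" makes up 0.5% of them.
DF <- data.frame(
  group_name = c(rep("a", 150), rep("b", 49), "c"),
  stringsAsFactors = FALSE
)

recoded <- DF %>%
  group_by(group_name) %>%
  mutate(share = n() / nrow(DF)) %>%  # fraction of all rows in this group
  ungroup() %>%
  mutate(new_group_name = ifelse(share < 0.01, "other", group_name))

table(recoded$new_group_name)
```

Storing the share in its own column keeps the threshold easy to inspect and change.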

dplyr: Find mean for each bin by groups


By : user80411
Date : March 29 2020, 07:55 AM
You seem to be flailing a bit. You've got correct code, and then you've got extra code.
Starting from a fresh R session and defining your data, then
code :
library(dplyr)
res <- df %>% group_by(id, bin, sign) %>%
        summarise(Num = n(), value = mean(value,na.rm=TRUE))
head(res)
#   id   bin sign Num       value
# 1  A [0,1]    - 122 -0.08330338
# 2  A [0,1]    + 111  0.11394381
# 3  A [0,1] NULL   2  0.75232462
# 4  A (1,2]    -  54 -0.09236725
# 5  A (1,2]    +  45  0.20581095
# 6  A (2,3]    -  12 -0.08998771
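# Check one group by hand: id "A", bin "[0,1]", sign "NULL"
groupA = df[df$id=="A" & df$bin=="[0,1]" & df$sign=="NULL", ]
# mean(groupA$value, na.rm=T)
# [1] 0.7523246

# Total number of observations per id
res %>% group_by(id) %>%
                summarise(total= sum(Num))

# plyr equivalent of the grouped mean, called via plyr:: so that plyr does not mask dplyr
plyr::ddply(df, c("id", "bin", "sign"), plyr::summarize, mean = mean(value,na.rm=TRUE))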
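
Since the question's data is not reproduced here, the following is a small self-contained sketch of the same grouped summary. The data frame is hypothetical, the bins are built with `cut()` as an assumption about how they were made, and `.groups = "drop"` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical data: x is binned, value is averaged; one value is missing.
df <- data.frame(
  id    = c("A", "A", "A", "B", "B"),
  x     = c(0.2, 0.8, 1.5, 0.5, 0.9),
  value = c(1, 3, 10, 4, NA)
)

# Bin x into [0,1] and (1,2]; note the labels contain no space after the comma.
df$bin <- cut(df$x, breaks = 0:2, include.lowest = TRUE)

res <- df %>%
  group_by(id, bin) %>%
  summarise(Num = n(), value = mean(value, na.rm = TRUE), .groups = "drop")

res
```

Note that `Num` counts every row in the bin, including rows whose `value` is missing, while the mean skips them because of `na.rm = TRUE`.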

Dplyr: how to find the first non-missing string by groups?


By : yared
Date : March 29 2020, 07:55 AM
summarise() gives one row per group; here, which() is used to find the first non-missing value:
code :
data %>%
  group_by(group) %>%
  summarise(first_non_missing = names[which(!is.na(names))[1]])

#   group first_non_missing
#   <chr>             <chr>
# 1     A              fred
# 2     B              josh
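
A runnable sketch of the same idea follows, using logical indexing instead of `which()`. The data is hypothetical, modelled on the output above, with an extra all-missing group added to show what happens in that case.

```r
library(dplyr)

# Hypothetical data; group "C" has no non-missing name at all.
data <- data.frame(
  group = c("A", "A", "B", "B", "B", "C"),
  names = c(NA, "fred", NA, NA, "josh", NA),
  stringsAsFactors = FALSE
)

res <- data %>%
  group_by(group) %>%
  # Drop the NAs, then take the first remaining element;
  # indexing past the end of an empty vector yields NA.
  summarise(first_non_missing = names[!is.na(names)][1])

res
```

A group with only missing values gets `NA` rather than an error, so every group still produces exactly one row.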

Creating Groups with Dplyr's "group_by" then Using Stringr to Find Differences Between Groups


By : GCA CHD
Date : March 29 2020, 07:55 AM
See if this is what you're after.
First, check whether Task matches Task2. If it does not, return Task2 as a new variable. I stored the result in a new data frame, df2.
code :
df2 <- Df %>% 
    mutate(match = Task == Task2,
           non_match = ifelse(!match, Task2, "")) 
df2

#    CaseWorker  Client          Task       Task2 match  non_match
# 1        John   Chris      Feed cat    Feed cat  TRUE           
# 2        John   Chris   Make dinner Make dinner  TRUE           
# 3        John   Chris    Iron shirt  Iron shirt  TRUE           
# 4        John     Tom   Make dinner Make dinner  TRUE           
# 5        John     Tom   Do homework Do homework  TRUE           
# 6        John     Tom    Make lunch    Feed cat FALSE   Feed cat
# 7     Melanie Valerie   Make dinner Make dinner  TRUE           
# 8     Melanie Valerie      Feed cat    Feed cat  TRUE           
# 9     Melanie Valerie Buy groceries  Iron shirt FALSE Iron shirt
# 10    Melanie     Tim   Do homework Do homework  TRUE           
# 11    Melanie     Tim    Iron shirt  Iron shirt  TRUE           
# 12    Melanie     Tim    Make lunch  Make lunch  TRUE           

# Per CaseWorker/Client pair: number of tasks, number that match, and whether all match
df2 %>% 
   group_by(CaseWorker, Client) %>% 
   summarise(n = n(),
             matches = sum(match),
             all_match = n == matches)

#   CaseWorker  Client     n matches all_match
#        <chr>   <chr> <int>   <int>     <lgl>
# 1       John   Chris     3       3      TRUE
# 2       John     Tom     3       2     FALSE
# 3    Melanie     Tim     3       3      TRUE
# 4    Melanie Valerie     3       2     FALSE
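
The two steps can also be folded into one grouped summary that lists the differing tasks per group. This is only a sketch: the data is a hypothetical four-row subset modelled on the table above, and `.groups = "drop"` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical subset of the case-worker data shown above.
Df <- data.frame(
  CaseWorker = c("John", "John", "Melanie", "Melanie"),
  Client     = c("Tom", "Tom", "Tim", "Tim"),
  Task       = c("Do homework", "Make lunch", "Iron shirt", "Make lunch"),
  Task2      = c("Do homework", "Feed cat", "Iron shirt", "Make lunch"),
  stringsAsFactors = FALSE
)

res <- Df %>%
  group_by(CaseWorker, Client) %>%
  summarise(
    all_match = all(Task == Task2),                           # TRUE if every task agrees
    non_match = paste(Task2[Task != Task2], collapse = ", "), # differing Task2 values, if any
    .groups = "drop"
  )

res
```

Groups without any difference get an empty string in `non_match`, mirroring the blank entries in `df2`.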