
How to find the difference of max & min values in one group in a variable in a dataframe


By : user2176158
Date : October 14 2020, 02:22 PM
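I have three variables, A, B and C, in the format shown in the data below, and want the difference between the maximum and minimum of C within each group of consecutive non-missing C values. A solution using dplyr and tidyr: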
code :
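library(dplyr)
library(tidyr)

# Example data: C is NA except in runs of observed values
dat <- read.table(text = "A     B   C
Cat1  1   NA
Cat1  2   NA
Cat1  1   NA
Cat1  2   NA
Cat1  NA  4
Cat1  NA  1
Cat1  NA  6
Cat1  NA  4
Cat1  7   NA
Cat1  9   NA
Cat1  3   NA
Cat1  2   NA
Cat1  NA  2
Cat1  NA  4
Cat1  NA  5
Cat1  NA  9",
                  header = TRUE, stringsAsFactors = FALSE)

dat2 <- dat %>%
  # every NA in C bumps the counter, so each run of non-NA C values shares one id
  mutate(trip = cumsum(is.na(C))) %>%
  drop_na(C) %>%
  # renumber the surviving runs as 1, 2, ... (used in place of group_indices(., trip),
  # which newer dplyr versions deprecate)
  mutate(trip = dense_rank(trip)) %>%
  group_by(trip) %>%
  summarize(Diff = max(C) - min(C)) %>%
  ungroup()
dat2

# # A tibble: 2 x 2
#    trip  Diff
#   <int> <dbl>
# 1     1     5
# 2     2     7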
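To show the same logic outside dplyr, here is a minimal plain-Python sketch (standard library only). It assumes missing values are stored as None in place of R's NA, and the function name run_ranges is hypothetical. Each run of consecutive non-missing values plays the role of one "trip".

```python
from itertools import groupby


def run_ranges(values):
    """Return max - min for each run of consecutive non-missing values.

    Missing values are represented as None (standing in for R's NA).
    """
    ranges = []
    # groupby splits the sequence into alternating runs of missing / present values
    for present, run in groupby(values, key=lambda v: v is not None):
        if present:
            run = list(run)
            ranges.append(max(run) - min(run))
    return ranges


# Column C from the example data, with NA written as None
C = [None, None, None, None, 4, 1, 6, 4,
     None, None, None, None, 2, 4, 5, 9]
print(run_ranges(C))  # [5, 7]
```

The result matches the Diff column of dat2 above.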



How to find difference between values in two rows in an R dataframe using dplyr


By : John Jackson
Date : March 29 2020, 07:55 AM
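I have an R dataframe with a cumulative column cumVol (plus period, farm and other) and want the volume added in each period, i.e. the difference between consecutive cumVol values within each farm. In dplyr: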
code :
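library(dplyr)
df %>%
  group_by(farm) %>%
  # lag() takes the previous row's cumVol within each farm (rows assumed sorted by period);
  # default = cumVol[1] makes the first period's volume 0
  mutate(volume = cumVol - lag(cumVol, default = cumVol[1]))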

Source: local data frame [8 x 5]
Groups: farm

  period farm cumVol other volume
1      1    A      1     1      0
2      2    A      5     2      4
3      3    A     15     3     10
4      4    A     31     4     16
5      1    B     10     5      0
6      2    B     12     6      2
7      3    B     16     7      4
8      4    B     24     8      8
With default = 0 instead, the first period of each farm keeps its full cumulative value:
df %>%
  group_by(farm) %>%
  mutate(volume = cumVol - lag(cumVol, default = 0))

  period farm cumVol other volume
1      1    A      1     1      1
2      2    A      5     2      4
3      3    A     15     3     10
4      4    A     31     4     16
5      1    B     10     5     10
6      2    B     12     6      2
7      3    B     16     7      4
8      4    B     24     8      8
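The lag-difference idea can be sketched in plain Python as well. This is a minimal illustration, not dplyr's implementation: it assumes rows are (farm, period, cum_vol) tuples, and the function name diff_from_cumulative and its first_default parameter are hypothetical. The two settings of first_default mirror the two default choices shown above.

```python
from itertools import groupby


def diff_from_cumulative(rows, first_default=None):
    """Per-period volume from a cumulative column, computed separately per farm.

    rows: (farm, period, cum_vol) tuples (hypothetical layout).
    first_default=None mirrors lag(cumVol, default = cumVol[1]): first row gives 0.
    first_default=0 mirrors lag(cumVol, default = 0): first row keeps its value.
    """
    out = []
    # sort so each farm's rows are contiguous and in period order
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for farm, group in groupby(ordered, key=lambda r: r[0]):
        group = list(group)
        # value subtracted from the first row of this farm
        prev = group[0][2] if first_default is None else first_default
        for _, period, cum in group:
            out.append((farm, period, cum - prev))
            prev = cum  # the next row is compared with this one (the "lag")
    return out


rows = [("A", 1, 1), ("A", 2, 5), ("A", 3, 15), ("A", 4, 31),
        ("B", 1, 10), ("B", 2, 12), ("B", 3, 16), ("B", 4, 24)]
print([v for _, _, v in diff_from_cumulative(rows)])     # [0, 4, 10, 16, 0, 2, 4, 8]
print([v for _, _, v in diff_from_cumulative(rows, 0)])  # [1, 4, 10, 16, 10, 2, 4, 8]
```

Sorting explicitly by (farm, period) avoids relying on the input order, which the dplyr version assumes.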
If the cumulative volumes are recorded per crop (cumCropVol), sum them by period and farm first, then take the lagged difference within each farm:
df1 <- data.frame(period = rep(1:4, 4),
                  farm = rep(c(rep('A', 4), rep('B', 4)), 2),
                  crop = c(rep('apple', 8), rep('pear', 8)),
                  cumCropVol = c(1, 5, 15, 31, 10, 12, 16, 24, 11, 15, 25, 31, 20, 22, 26, 34),
                  other = rep(1:8, 2))

# total cumulative volume per period and farm, summed over crops
df <- df1 %>%
  group_by(period, farm) %>%
  summarise(cumVol = sum(cumCropVol)) %>%
  ungroup()

# sort by period within farm so that lag() sees the previous period
df %>%
  arrange(farm, period) %>%
  group_by(farm) %>%
  mutate(volume = cumVol - lag(cumVol, default = 0))

Source: local data frame [8 x 4]
Groups: farm

  period farm cumVol volume
1      1    A     12     12
2      2    A     20      8
3      3    A     40     20
4      4    A     62     22
5      1    B     30     30
6      2    B     34      4
7      3    B     42      8
8      4    B     58     16
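The same two steps (aggregate over crops, then difference within farm) can be sketched in plain Python. This is an illustrative sketch only: it assumes rows are (period, farm, crop, cum_crop_vol) tuples built from the df1 values, and the function name volume_by_farm is hypothetical.

```python
from collections import defaultdict


def volume_by_farm(rows):
    """rows: (period, farm, crop, cum_crop_vol) tuples, mirroring df1.

    Returns {(farm, period): volume}, where volume is the crop-summed cumulative
    volume minus that of the previous period of the same farm (0 for the first).
    """
    # 1) sum the cumulative volume over crops for each (farm, period)
    totals = defaultdict(int)
    for period, farm, _crop, cum in rows:
        totals[(farm, period)] += cum
    # 2) walk each farm in period order and subtract the previous total
    volume, prev = {}, {}
    for farm, period in sorted(totals):
        volume[(farm, period)] = totals[(farm, period)] - prev.get(farm, 0)
        prev[farm] = totals[(farm, period)]
    return volume


# Same values as df1 in the R code above
period = [1, 2, 3, 4] * 4
farm = (["A"] * 4 + ["B"] * 4) * 2
crop = ["apple"] * 8 + ["pear"] * 8
cum_crop_vol = [1, 5, 15, 31, 10, 12, 16, 24, 11, 15, 25, 31, 20, 22, 26, 34]

vol = volume_by_farm(list(zip(period, farm, crop, cum_crop_vol)))
print([vol[("A", p)] for p in range(1, 5)])  # [12, 8, 20, 22]
print([vol[("B", p)] for p in range(1, 5)])  # [30, 4, 8, 16]
```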

Find difference (subtraction) for each variable in dataset by group


By : user2610067
Date : March 29 2020, 07:55 AM
I have a dataset with about 20 variables. Data was collected over three years (2012-2014), and for each year each observation can be grouped by Site and Plot. I want, for each variable, the difference between years within each Site/Plot group.
code :
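library(dplyr)

# For every variable except Year: difference from the value 1 and 2 rows earlier
# in Year order, within each Site/Plot group. across() (dplyr 1.0+) replaces the
# deprecated funs()/mutate_each() form and skips the grouping columns automatically.
df %>%
  group_by(Site, Plot) %>%
  mutate(across(-Year, list(
    lag1 = ~ .x - lag(.x, order_by = Year, n = 1),
    lag2 = ~ .x - lag(.x, order_by = Year, n = 2)
  )))

# How order_by works: lag() is applied after sorting by order_by,
# and the result is returned in the original row order
lag(c(1, 2, 3))                                 # NA  1  2
lag(c(1, 2, 3), order_by = c(2, 1, 3), n = 1)   #  2 NA  1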
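As a language-neutral illustration of what the grouped, ordered lag computes, here is a minimal plain-Python sketch. It assumes each observation is a dict, and the function lag_diffs and the column name Height are hypothetical. Like dplyr's lag(), it looks back n rows in Year order (not n calendar years) and keeps the original row order.

```python
from collections import defaultdict


def lag_diffs(records, group_keys=("Site", "Plot"), order_key="Year", lags=(1, 2)):
    """Add <var>_lag<n> = value minus the value n rows earlier (in order_key order)
    within each group, for every column except the group and order keys.
    None marks rows with no earlier row n positions back (R's NA).
    Original row order is kept, as with dplyr::mutate()."""
    # collect row indices per (Site, Plot) group
    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        groups[tuple(rec[k] for k in group_keys)].append(idx)

    out = [dict(rec) for rec in records]
    skip = set(group_keys) | {order_key}
    for idxs in groups.values():
        idxs.sort(key=lambda j: records[j][order_key])  # Year order within group
        for pos, j in enumerate(idxs):
            for var, value in records[j].items():
                if var in skip:
                    continue
                for n in lags:
                    earlier = records[idxs[pos - n]][var] if pos >= n else None
                    out[j][f"{var}_lag{n}"] = None if earlier is None else value - earlier
    return out


records = [
    {"Site": "S1", "Plot": 1, "Year": 2014, "Height": 9},
    {"Site": "S1", "Plot": 1, "Year": 2012, "Height": 4},
    {"Site": "S1", "Plot": 1, "Year": 2013, "Height": 6},
    {"Site": "S2", "Plot": 1, "Year": 2012, "Height": 3},
]
for row in lag_diffs(records):
    print(row)
```

For the 2014 row of S1/1 this gives Height_lag1 = 9 - 6 = 3 and Height_lag2 = 9 - 4 = 5.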

Add line numbers per group of variable values in R dataframe


By : Mike B
Date : March 29 2020, 07:55 AM
I have an R dataframe with blogger and date columns and want to number the rows of each blogger in date order. Try a group_by on blogger:
code :
library(dplyr)
df %>%
  arrange(blogger, date) %>%
  group_by(blogger) %>%
  # row_number() restarts at 1 in every blogger group
  mutate(linenumber = row_number()) %>%
  ungroup()
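The arrange / group_by / row_number() pattern can be sketched in plain Python as a sort followed by a per-group counter. This is a minimal sketch assuming rows are dicts with blogger and date keys and ISO-formatted date strings (which sort correctly as text); the function name add_line_numbers is hypothetical.

```python
from itertools import groupby


def add_line_numbers(rows):
    """Return rows sorted by blogger then date, each with a 1-based 'linenumber'
    that restarts for every blogger (like row_number() within group_by)."""
    ordered = sorted(rows, key=lambda r: (r["blogger"], r["date"]))
    out = []
    for _, group in groupby(ordered, key=lambda r: r["blogger"]):
        # enumerate restarts at 1 for each blogger's contiguous block
        for n, row in enumerate(group, start=1):
            out.append({**row, "linenumber": n})
    return out


rows = [
    {"blogger": "b", "date": "2020-01-02"},
    {"blogger": "a", "date": "2020-03-01"},
    {"blogger": "b", "date": "2020-01-01"},
    {"blogger": "a", "date": "2020-02-01"},
]
for row in add_line_numbers(rows):
    print(row)
```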

How to find the correlation between a group of values in a pandas dataframe column


By : Anneta Tselepi
Date : March 29 2020, 07:55 AM
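I have a dataframe df with an ID column and two variables, Var1 and Var2, and want the correlation between them within each ID. Group by ID and call corr():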
code :
df.groupby('ID').corr()
             Var1      Var2
ID                         
1  Var1  1.000000  0.981981
   Var2  0.981981  1.000000
2  Var1  1.000000  0.970725
   Var2  0.970725  1.000000
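df_out = df.groupby('ID').corr()
# drop the diagonal rows (self-correlation = 1), leaving the Var2 row of each ID,
# whose Var1 column holds the Var1/Var2 coefficient
(df_out[~df_out['Var1'].eq(1)]
          .reset_index(1, drop=True)['Var1']
          .rename('Corr_Coef')
          .reset_index())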
   ID  Corr_Coef
0   1   0.981981
1   2   0.970725
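To make explicit what groupby('ID').corr() computes (pandas' default method is Pearson), here is a minimal plain-Python sketch without pandas. It assumes rows are (ID, Var1, Var2) tuples; the functions pearson and corr_by_group and the sample values are hypothetical.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    # raises ZeroDivisionError if either variable is constant (pandas gives NaN)
    return sxy / math.sqrt(sxx * syy)


def corr_by_group(rows):
    """rows: (ID, Var1, Var2) tuples. Returns {ID: Pearson r of Var1 vs Var2}."""
    groups = defaultdict(lambda: ([], []))
    for gid, v1, v2 in rows:
        groups[gid][0].append(v1)
        groups[gid][1].append(v2)
    return {gid: pearson(xs, ys) for gid, (xs, ys) in groups.items()}


rows = [(1, 1, 2), (1, 2, 4), (1, 3, 6),
        (2, 1, 3), (2, 2, 2), (2, 3, 1),
        (3, 1, 1), (3, 2, 3), (3, 3, 2), (3, 4, 4)]
print(corr_by_group(rows))  # {1: 1.0, 2: -1.0, 3: 0.8}
```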

Python 2.7: DataFrame groupby and find the percentage distribution of values within group


By : Balz Rittmeyer
Date : March 29 2020, 07:55 AM
I have a dataframe and I would like to find the percentage distribution of values in a column within a group (each value's share of its group total). If I understand correctly what you need, this might help:
code :
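# total of 'total diff' within each race/tyre/stint group
sums = df.groupby(['race', 'tyre', 'stint'])['total diff'].sum()
# align the group totals onto every row through the group index
df = df.set_index(['race', 'tyre', 'stint']).assign(pct=sums).reset_index()
# each row's share of its group total
df['pct'] = df['total diff'] / df['pct']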

#                     race        tyre  stint  driverRef    total diff       pct
# 0  Australian Grand Prix  Super soft    1.0     vettel  1.251475e+05  0.027613
# 1  Australian Grand Prix  Super soft    1.0  raikkonen  2.812920e+05  0.062065
# 2  Australian Grand Prix  Super soft    1.0    rosberg  1.662784e+05  0.036688
# 3  Australian Grand Prix  Super soft    1.0   hamilton  6.404423e+04  0.014131
# 4  Australian Grand Prix  Super soft    1.0  ricciardo  6.483833e+05  0.143060
# 5  Australian Grand Prix  Super soft    1.0     alonso  4.006758e+05  0.088406
# 6  Australian Grand Prix  Super soft    1.0   haryanto  2.846411e+06  0.628037
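The calculation is "value divided by its group's total". A minimal plain-Python sketch of that, assuming rows are dicts keyed by the column names above (the function name share_of_group_total and the sample values are hypothetical):

```python
from collections import defaultdict


def share_of_group_total(rows, keys=("race", "tyre", "stint"), value="total diff"):
    """Return copies of rows with 'pct' = value / sum of value in the row's group."""
    totals = defaultdict(float)
    for r in rows:
        totals[tuple(r[k] for k in keys)] += r[value]
    return [{**r, "pct": r[value] / totals[tuple(r[k] for k in keys)]} for r in rows]


rows = [
    {"race": "R1", "tyre": "soft", "stint": 1, "total diff": 1.0},
    {"race": "R1", "tyre": "soft", "stint": 1, "total diff": 3.0},
    {"race": "R2", "tyre": "soft", "stint": 1, "total diff": 2.0},
]
print([r["pct"] for r in share_of_group_total(rows)])  # [0.25, 0.75, 1.0]
```

In pandas itself, df.groupby(['race', 'tyre', 'stint'])['total diff'].transform('sum') returns the group total already aligned to every row, so the set_index/assign round trip is not needed.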
Related Posts:
  • Multinomial probit regression with mixed type explanatory variables
  • How can I make a variable in a dataset containing a vector of all numbers between two other variables?
  • How to extract the trailing digits from a string in R?
  • Select values based on other columns
  • readLines killing R in purrr::map
  • Subset rows based on "start and stop" strings
  • How to add a column to lists within a list without losing their names?
  • Plotting the means in ggplot, without using stat_summary()
  • R :Looping through each 5 rows of data frame and imputing incremental value
  • In R, is growing a list just as inefficient as growing a vector?
  • Flexdashboard, rhandsontable: how to programmatically access user updated table?
  • Creating Summary Table from R Variables
  • Average over groups and include previous groups
  • R: data.table count rows on specific columns > 0
  • Transform (shuffle) just 2 Fields in a Dataframe
  • Issue with replacing string by match in R
  • (very) Simple quantstrat trading model using logistic regression
  • R - count maximum number of consecutive dates
  • Problems using tidyr separate on "|"
  • Default value when calling a function in a for loop
  • Finding values in a matrix from list of values in R
  • count 0's in a zoo (or dataframe) object
  • Finding the first non-zero year in data frame for multiple variables using tidyverse
  • ggplot2 - how to assign geom_text with arrow icon to second yaxis scale
  • regex fails with dollar sign
  • Drop first element of list of lists, condense list of lists? Too many elements?
  • R - how to apply output of ifelse(str_detect ...) to whole group
  • caret package confusion matrix define positive case with multiple classes
  • Generating a pairwise 'distance' matrix
  • Change all R columns names using a reference file
  • In R & dabestr, how do I get grouped differences correctly?
  • Exclude or set a unique color to the bottom triangle of a correlation matrix heatmap
  • r shiny observe function clears text input
  • Split column by multiple delimiters, keeping delimiters
  • How to random search in a specified grid in caret package?
  • merge 2 data frames in a loop for each column in one of them
  • how to edit the codes for the summary of R S4 Object?
  • Remove specific rows in R
  • Flatten JSON list into data frame
  • Filtering a dataset and making a ggplot
  • Align cells vertically to be at the bottom flextable
  • R speed up sapply
  • invalid subscript type 'list' Azure Machine Learning
  • Use rollapply with xts object and an anonymous defined function
  • Isolate data frames from a spreadsheet to create a list
  • Error in xts, as.POSIXct "'order.by' cannot contain 'NA', 'NaN', or 'Inf'"
  • Column splitting in R
  • number similar/duplicated rows in R
  • Count the number of times each value appears in a row dataframe r
  • how to vectorise my code in r using for loop?
  • A function to fill in a column with NA of the same type
  • Network flow balancing constraint in R
  • Adding main titles from list to graphs in for loop
  • create a matrix in Perl or R if data is provided in CSV file
  • Passing column names as string to with
  • R - filtering rows and summing
  • How to change the order of fill aesthetic in faceted ggplot?
  • Function to remove outliers by group from dataframe
  • Convert unicode to a readable string
  • Wrong scale/difficult to interpret times on time series object using 'ts'
    Privacy Policy - Terms - Contact Us © voile276.org