R: How to fill in NA Values within a Column based on grouping?


By : Olivia
Date : November 21 2020, 03:00 PM
I'm looking to replace the NA values in this example data frame with either 'A' or 'B' depending on their 'second' column category (A for A1, B for B1).

No need for dplyr. This should work in base R. The regex keeps the leading letter of 'second' and drops the digit after it, and the result is assigned only to the rows where 'first' is NA:
code :
df$first[is.na(df$first)] <- gsub("(\\w)\\d", "\\1", df$second[is.na(df$first)])
  first second
1     A     A1
2     A     A1
3     A     A1
4     A     A1
5     B     B1
6     B     B1
7     B     B1
8     B     B1


Excel - calculate average of values in one column based on another grouping column. The number of rows is not constant p

By : Healthy Syntax
Date : March 29 2020, 07:55 AM
Two columns, one with ID and one with values. I want to calculate the average per ID. The number of rows per ID is not constant.

Just use either of these formulas (they assume the IDs are in column A and the values in column B, with the current row's ID in A2):
code :
=AVERAGEIF(A:A,A2,B:B)
=SUMIF(A:A,A2,B:B)/COUNTIF(A:A,A2)
Creating a new r data.table column based on values in another column and grouping

By : Nandita
Date : March 29 2020, 07:55 AM
I have a data.table with date, zipcode and purchase amounts.

This seems to work. The non-equi join sums purchaseAmount within each row's zip over the 10 days up to and including that row's date:
code :
DT[, new_col := 
  DT[.(zip = zip, d0 = date - 10, d1 = date), on=.(zip, date >= d0, date <= d1), 
    sum(purchaseAmount)
  , by=.EACHI ]$V1
]


          date  zip purchaseAmount new_col
 1: 2016-01-08 1150              5       5
 2: 2016-01-15 3000             15      15
 3: 2016-02-15 1150             16      16
 4: 2016-02-20 2000             18      18
 5: 2016-03-07 2000             19      19
 6: 2016-03-15 2000             11      30
 7: 2016-03-17 2000              6      36
 8: 2016-04-02 1150             17      17
 9: 2016-04-08 3000              7       7
10: 2016-04-09 3000             20      27

To see how this works, peel the expression apart step by step:
DT[.(zip = zip, d0 = date - 10, d1 = date), on=.(zip, date >= d0, date <= d1), 
  sum(purchaseAmount)
, by=.EACHI ]$V1
DT[.(zip = zip, d0 = date - 10, d1 = date), on=.(zip, date >= d0, date <= d1), 
  sum(purchaseAmount)
, by=.EACHI ]
# note that V1 is the default name for computed columns

DT[.(zip = zip, d0 = date - 10, d1 = date), on=.(zip, date >= d0, date <= d1)]
# now we're down to just the join
Create a new column based on Grouping of similar values in another column in pandas

By : Nurdin Hishasy
Date : March 29 2020, 07:55 AM
Hi, I have an event data frame with datetimes, event ids and sensor ids. I would like to group events that happen within one hour per sensor and, if possible, tag them with the group count.

You can use pd.Grouper + ngroup to group by time frequency (pd.Grouper replaces the older pd.TimeGrouper, which is no longer available in current pandas).
code :
df['time'] = pd.to_datetime(df.time)
# number every (sensor, hourly bin) combination
df['group'] = df.set_index('time').groupby(['sensor_id', 
                    pd.Grouper(freq='1H')], sort=False).ngroup().values
# renumber so that the groups start at 1 within each sensor
df['group'] = df['group'] - df.groupby('sensor_id')['group'].transform('min') + 1

df

  sensor_id event_id                time  group
0         A       e1 2017-02-14 05:30:00      1
1         A       e2 2017-02-14 05:45:00      1
2         A       e3 2017-02-14 08:30:00      2
3         B       e3 2017-02-14 05:20:00      1
4         B       e4 2017-02-14 05:30:00      1
5         B       e6 2017-02-14 05:45:00      1
6         C       e1 2017-02-14 05:30:00      1
7         C       e3 2017-02-14 07:30:00      2
8         C       e7 2017-02-14 09:35:00      3
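
A minimal, self-contained sketch of the same grouping, with the data rebuilt from the output table above. It is not the answer's exact code: it floors each timestamp to its clock hour with dt.floor instead of passing a frequency grouper, which should give the same hourly buckets. The names hour and bucket are illustrative.

```python
import pandas as pd

# Data copied from the output table above
df = pd.DataFrame({
    "sensor_id": list("AAABBBCCC"),
    "event_id": ["e1", "e2", "e3", "e3", "e4", "e6", "e1", "e3", "e7"],
    "time": [
        "2017-02-14 05:30:00", "2017-02-14 05:45:00", "2017-02-14 08:30:00",
        "2017-02-14 05:20:00", "2017-02-14 05:30:00", "2017-02-14 05:45:00",
        "2017-02-14 05:30:00", "2017-02-14 07:30:00", "2017-02-14 09:35:00",
    ],
})
df["time"] = pd.to_datetime(df["time"])

# Floor every timestamp to the start of its clock hour
hour = df["time"].dt.floor("1h")

# Number the observed (sensor, hour) buckets in sorted order
bucket = df.groupby(["sensor_id", hour]).ngroup()

# Shift the numbering so that each sensor starts again at 1
df["group"] = bucket - bucket.groupby(df["sensor_id"]).transform("min") + 1

print(df)
```

Keep in mind that this buckets by clock hour: two events on the same sensor at 05:59 and 06:01 land in different groups even though they are two minutes apart.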
Pandas fill missing values of a column based on the datetime values of another column

By : Sanjana Rajavel
Date : March 29 2020, 07:55 AM
Your intuition seems fine to me, but you can't apply it this way since your dataframe foo doesn't have the same size as your groupby dataframe. What you could do is map the values like this:
code :
foo['last'] = foo.sess_id.map(foo.groupby('sess_id').DATE.max())
foo['first'] = foo.sess_id.map(foo.groupby('sess_id').DATE.min())
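# my_agg holds each session's first and last DATE, indexed by sess_id
my_agg = foo.groupby('sess_id').DATE.agg(['min', 'max'])

def my_custom_function(time):
    # sessions whose time span strictly contains this timestamp
    current_sessions = my_agg.loc[(my_agg['min']<time) & (my_agg['max']>time)]
    count = len(current_sessions)
    if count == 0:
        return 0      # no session was open at that time
    if count > 1:
        return -99    # ambiguous: several sessions were open
    return current_sessions.index[0]

# fill only the rows where sess_id is missing
foo.loc[foo.sess_id.isnull(),'sess_id'] = foo.loc[foo.sess_id.isnull(),'DATE'].apply(my_custom_function)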
    DATE                    sess_id
0   2018-01-01 00:19:01     a
1   2018-01-01 00:19:05     b
2   2018-01-01 00:21:07     a
3   2018-01-01 00:22:07     b
4   2018-01-01 00:25:09     c
5   2018-01-01 00:25:11     -99
6   2018-01-01 00:27:28     c
7   2018-01-01 00:29:29     a
8   2018-01-01 00:30:35     b
9   2018-01-01 00:31:16     b
10  2018-01-01 00:35:22     0
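
The size mismatch is easier to see on concrete data. The sketch below rebuilds foo from the output table above, with rows 5 and 10 starting out as missing. The groupby result has one row per session, and map looks each row's sess_id up in it, so the result is as long as foo again. The names spans and session_at are illustrative, not from the original answer.

```python
import pandas as pd

# Data copied from the output table above; rows 5 and 10 start out missing
foo = pd.DataFrame({
    "DATE": pd.to_datetime([
        "2018-01-01 00:19:01", "2018-01-01 00:19:05", "2018-01-01 00:21:07",
        "2018-01-01 00:22:07", "2018-01-01 00:25:09", "2018-01-01 00:25:11",
        "2018-01-01 00:27:28", "2018-01-01 00:29:29", "2018-01-01 00:30:35",
        "2018-01-01 00:31:16", "2018-01-01 00:35:22",
    ]),
    "sess_id": pd.Series(
        ["a", "b", "a", "b", "c", None, "c", "a", "b", "b", None],
        dtype=object,
    ),
})

# One row per session (3 rows), while foo has 11 rows
spans = foo.groupby("sess_id")["DATE"].agg(["min", "max"])

# map() looks every row's sess_id up in the per-session result
foo["first"] = foo["sess_id"].map(spans["min"])
foo["last"] = foo["sess_id"].map(spans["max"])

def session_at(time):
    """Return the single session open at `time`, 0 if none, -99 if several."""
    hits = spans.index[(spans["min"] < time) & (spans["max"] > time)]
    if len(hits) == 0:
        return 0
    if len(hits) > 1:
        return -99
    return hits[0]

# Fill only the rows whose sess_id is missing
missing = foo["sess_id"].isna()
foo.loc[missing, "sess_id"] = foo.loc[missing, "DATE"].apply(session_at)

print(foo[["DATE", "sess_id"]])
```

Row 5 falls inside the spans of a, b and c at once, so it gets -99. Row 10 comes after every session has ended, so it gets 0.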
counting all string values in given column of a table and grouping it based on third column

By : archal
Date : March 29 2020, 07:55 AM
I have three columns.

Using extractall and crosstab:
code :
s = df.names.str.extractall(r'(\w+)').reset_index(1, drop=True).join(df.tag)

pd.crosstab(s[0], s['tag'])
tag    0  1
0
john   0  1
robin  0  2
sam    1  1
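
A runnable sketch of the extractall + crosstab approach. The question's input table is not shown here, so the df below is hypothetical, chosen only so that it reproduces the counts above. extractall returns one row per matched word, and joining tag back on the original row index lets crosstab count each name per tag.

```python
import pandas as pd

# Hypothetical input: the original table is not shown in the question
df = pd.DataFrame({
    "names": ["john sam", "robin", "sam", "robin"],
    "tag": [1, 1, 0, 1],
})

# One row per matched word; the index is (original row, match number)
words = df["names"].str.extractall(r"(\w+)")

# Drop the match number so the index is the original row label again
words = words.reset_index(level=1, drop=True)

# Attach each word's tag by joining on the original row label
s = words.join(df["tag"])

# Count how often each name occurs under each tag
counts = pd.crosstab(s[0], s["tag"])
print(counts)
```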