
Expand a data frame to have as many rows as the range of two columns in the original row



By : Bing Yue
Date : November 22 2020, 03:01 PM
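Question: I have a data frame with one row per symbol and two columns giving a range (the sample data was not preserved here). Answer: With dplyr, we can use rowwise() with do() to turn each row into one row per value from start to end.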
code :
library(dplyr)
df1 %>% 
   rowwise() %>% 
   do(data.frame(symbol= .$symbol, value = .$start:.$end)) %>% 
   arrange(symbol)
# A tibble: 30 x 2
#   symbol value
#    <chr> <int>
# 1      a     7
# 2      a     8
# 3      a     9
# 4      a    10
# 5      a    11
# 6      i     8
# 7      i     9
# 8      i    10
# 9      i    11
#10      i    12
# ... with 20 more rows
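
The same expansion can be sketched outside R as well. The following is a minimal Python illustration, not part of the original answer. The input rows and the function name expand_rows are assumptions, chosen to match the two symbols and ranges visible in the output above.

```python
def expand_rows(rows):
    """Turn each (symbol, start, end) row into one (symbol, value) row per integer in start..end."""
    expanded = []
    for symbol, start, end in rows:
        # range() excludes its upper bound, so add 1 to keep `end` inclusive
        for value in range(start, end + 1):
            expanded.append((symbol, value))
    # Sort by symbol, mirroring arrange(symbol); the sort is stable, so values keep their order
    return sorted(expanded, key=lambda row: row[0])


# Hypothetical input: only these two symbols are visible in the printed output
rows = [("i", 8, 12), ("a", 7, 11)]
result = expand_rows(rows)
print(result[:5])
```

Each input row contributes end - start + 1 output rows, so the two rows here yield ten rows in total.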


Expand data frame rows by date range with NA values



By : Cledir J. S.
Date : March 29 2020, 07:55 AM
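To fix the issue you can do one of two things. 1- Using gsub, extract the year from each date in a row and form a sequence from the first year to the last. Then use expand.grid to pair the value of IndID with that sequence. Finally, rbind the list of data frames into one data frame. 2- Alternatively, convert both date columns to years with format first, which removes the need for gsub. In both cases, missing dates are first replaced with 01-01-2017.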
code :
dat[is.na(dat$CptrDt), "CptrDt"] <- as.Date("01-01-2017", "%m-%d-%Y")
dat[is.na(dat$MortDt), "MortDt"] <- as.Date("01-01-2017", "%m-%d-%Y")

do.call('rbind', apply(dat, 1, function(x) {
                                             pattern <- '([0-9]{4})-[0-9]{2}-[0-9]{2}';
                                             y <- as.numeric( gsub( pattern, '\\1', x[2:3] ) );
                                             expand.grid( IndID = x[1], 
                                                          Year = seq( y[1], y[2], by = 1 ) )
                                            }))

#    IndID Year
# 1    AAA 2013
# 2    AAA 2014
# 3    AAA 2015
# 4    BBB 2013
# 5    BBB 2014
# 6    BBB 2015
# 7    BBB 2016
# 8    CCC 2014
# 9    CCC 2015
# 10   CCC 2016
# 11   CCC 2017

# 2- Alternative: convert the dates to years with format() first
dat[is.na(dat$CptrDt), "CptrDt"] <- as.Date("01-01-2017", "%m-%d-%Y")
dat[is.na(dat$MortDt), "MortDt"] <- as.Date("01-01-2017", "%m-%d-%Y")

dat$CptrDt <- format(dat$CptrDt, "%Y")
dat$MortDt <- format(dat$MortDt, "%Y")

do.call('rbind', apply(dat, 1, function(x) { expand.grid( IndID = x[1], 
                                                          Year = seq( as.numeric( x[2] ), as.numeric( x[3] ), by = 1 ) ) }))

# Sample data
dat <- data.frame(IndID = c("AAA","BBB","CCC"),
                  CptrDt  = as.Date(c("01-01-2013" ,"01-01-2013", "01-01-2014"),"%m-%d-%Y"),
                  MortDt =  as.Date(c("01-01-2015" ,"01-01-2016", NA),"%m-%d-%Y"))
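
As a rough cross-check of the logic, here is a small Python sketch of the same year expansion. It is an illustration under assumptions rather than a translation of the answer: the names expand_years and FALLBACK are hypothetical, and a missing date is represented as None instead of NA.

```python
from datetime import date

# Same placeholder date the R code substitutes for missing values
FALLBACK = date(2017, 1, 1)


def expand_years(records, fallback=FALLBACK):
    """Return one (ind_id, year) row for every year between the two dates, inclusive."""
    rows = []
    for ind_id, captured, died in records:
        # A missing date (None) falls back to the placeholder before taking the year
        first = (captured or fallback).year
        last = (died or fallback).year
        rows.extend((ind_id, year) for year in range(first, last + 1))
    return rows


# Sample data mirroring the data frame above; None stands in for NA
records = [
    ("AAA", date(2013, 1, 1), date(2015, 1, 1)),
    ("BBB", date(2013, 1, 1), date(2016, 1, 1)),
    ("CCC", date(2014, 1, 1), None),
]
years = expand_years(records)
for row in years:
    print(row)
```

With this input the sketch produces eleven rows, three for AAA and four each for BBB and CCC.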
Expand Rows and Add Columns in Data Frame Based On Another Data Frame



By : Mr X Debilal Baskey
Date : March 29 2020, 07:55 AM
I hope this fixes the issue. The OP's approach passes 'team.df' directly as input to Map/mapply. 'team.df' is a data.frame, which is a list of columns, so the basic unit of input is a column vector. Map therefore loops through the columns instead of the whole dataset or the rows (which the desired output needs). To prevent that, we can wrap it with list, so that it is a single unit which is recycled to each of the list elements of 'list.of.all.stars'.
code :
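# wrapping team.df in list() passes it to cbind as a single unit
do.call(rbind, Map(cbind, list(team.df), list.of.all.stars))
# splitting team.df by row pairs each team with its own list element
res <- do.call(rbind, Map(cbind,  split(team.df, seq_len(nrow(team.df))), list.of.all.stars))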
row.names(res) <- NULL
res
#   Team_Name Team_Location         Player Captain
#1 Cavaliers Cleveland, OH   LeBron James    TRUE
#2 Cavaliers Cleveland, OH     Kevin Love   FALSE
#3  Warriors   Oakland, CA  Stephen Curry    TRUE
#4  Warriors   Oakland, CA   Kevin Durant   FALSE
#5  Warriors   Oakland, CA  Klay Thompson   FALSE
#6  Warriors   Oakland, CA Draymond Green   FALSE

# tidyverse alternative
library(tidyverse)
team.df %>% 
      group_by_all() %>%
      nest %>% 
      mutate(data = list.of.all.stars) %>% 
      unnest
# A tibble: 6 x 4
#  Team_Name Team_Location Player         Captain
#  <chr>     <chr>         <chr>          <lgl>  
# 1 Cavaliers Cleveland, OH LeBron James   T      
# 2 Cavaliers Cleveland, OH Kevin Love     F      
# 3 Warriors  Oakland, CA   Stephen Curry  T      
# 4 Warriors  Oakland, CA   Kevin Durant   F      
# 5 Warriors  Oakland, CA   Klay Thompson  F      
# 6 Warriors  Oakland, CA   Draymond Green F      
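
The row-wise pairing can also be sketched in plain Python, where zip walks the teams and the player lists in step. This is only an illustration: the inputs were not shown in the answer, so the teams and all_stars values below are reconstructed from the printed output, and expand_teams is a hypothetical name.

```python
def expand_teams(teams, players_per_team):
    """Repeat each team row once per player in the matching list and append the player columns."""
    rows = []
    # zip pairs the i-th team with the i-th list of players
    for team, players in zip(teams, players_per_team):
        for player in players:
            rows.append(team + player)  # tuple concatenation: team columns + player columns
    return rows


# Assumed inputs, reconstructed from the output shown above
teams = [("Cavaliers", "Cleveland, OH"), ("Warriors", "Oakland, CA")]
all_stars = [
    [("LeBron James", True), ("Kevin Love", False)],
    [("Stephen Curry", True), ("Kevin Durant", False),
     ("Klay Thompson", False), ("Draymond Green", False)],
]
expanded = expand_teams(teams, all_stars)
for row in expanded:
    print(row)
```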
Sum of data frame's rows in range defined by columns



By : Jonathan
Date : March 29 2020, 07:55 AM
Question: I have an integer-based data frame with positional coordinates in one column and a variable in the second. The coordinates range from 1 to 10 million, the variable from 0 to 950. I'm interested in returning the sum of the variable over ranges defined in a separate data frame containing the start and end points of each desired range. Answer (edit): @Frank's data.table solution is short and fast.
code :
# non-equi join; needs library(data.table) and setDT() on df1 and df2 first
df2[, s := df1[df2, on=.(a >= c, a <= d), sum(b), by=.EACHI]$V1]

    # output
       c d s
    1: 1 3 1
    2: 1 4 1
    3: 2 3 1
    4: 2 5 3
    5: 3 4 1

# Original approach: apply a helper function to each row of df2
library(data.table)
setDT(df1)
setDT(df2)

## helper: sum of b over the rows of df1 whose a lies between x[1] and x[2]
get_magic <- function(x)
{
    one <- unlist(x[1])
    two <- unlist(x[2])

    # between() is inclusive on both ends by default
    df1[between(a, one, two), sum(b)]
}


# applies to row
d <- apply(df2, 1, get_magic)

print(d)
# output
[1] 1 1 1 3 1
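
For comparison, the range sums can be sketched in Python with a different technique than either R solution: prefix sums plus binary search, which answers each range in logarithmic time after one sort. The input data was not shown in the answer, so the points below are hypothetical values chosen to reproduce the printed sums, and range_sums is an assumed name.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


def range_sums(points, ranges):
    """Sum the values of all (position, value) points inside each inclusive (start, end) range."""
    points = sorted(points)
    positions = [pos for pos, _ in points]
    # prefix[i] holds the sum of the first i values, so a slice sum is one subtraction
    prefix = [0] + list(accumulate(val for _, val in points))
    sums = []
    for start, end in ranges:
        lo = bisect_left(positions, start)   # first index with position >= start
        hi = bisect_right(positions, end)    # first index with position > end
        sums.append(prefix[hi] - prefix[lo])
    return sums


# Hypothetical data: (a, b) pairs for df1, and the (c, d) ranges of df2 from the output above
points = [(3, 1), (5, 2)]
ranges = [(1, 3), (1, 4), (2, 3), (2, 5), (3, 4)]
sums = range_sums(points, ranges)
print(sums)
```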
Is there a way in python to expand data frame rows and columns at the same time?



By : user3061679
Date : March 29 2020, 07:55 AM
If it helps, here is a sample of code:
code :
def myFunction(base, plus):
    #Initialize result array
    result = []
    #For Each tuple in entry
    for bas in base:
        #Get Last Element
        lastElem = bas[-1]
        #For Each element to add
        for x in plus:
            # Append a tuple composed of base + sum(lastElement & element to add)
            result.append(bas + ( (lastElem+x),) )
    # Return result
    return result
first_elem = [(18,)]
add = [6,0,-6]
print(myFunction(first_elem, add))
#[(18, 24), (18, 18), (18, 12)]
print(myFunction([(18, 24), (18, 18), (18, 12)], [6,0,-6]))
#[(18, 24, 30), (18, 24, 24), (18, 24, 18), (18, 18, 24), (18, 18, 18), (18, 18, 12), (18, 12, 18), (18, 12, 12), (18, 12, 6)]
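
When the number of expansion steps is known up front, the repeated calls can be collapsed into one. The sketch below is an alternative formulation, not the answer's method: it enumerates every combination of increments with itertools.product and builds each tuple as a running sum. The name expand_paths and its depth parameter are assumptions.

```python
from itertools import accumulate, product


def expand_paths(start, steps, depth):
    """Return every tuple that begins at `start` and adds one of `steps` at each of `depth` levels."""
    paths = []
    # product(...) yields the increment combinations in the same order as nested loops
    for combo in product(steps, repeat=depth):
        # Running sum over (start, step1, step2, ...) gives the successive values
        paths.append(tuple(accumulate((start,) + combo)))
    return paths


paths = expand_paths(18, [6, 0, -6], 2)
print(paths)
```

With depth=1 this gives the same three tuples as the first call above, and with depth=2 the same nine tuples as the second.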
Expand nested lists to rows, create headers, and map back to original columns



By : dotiendiep
Date : March 29 2020, 07:55 AM
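I hope this helps you. Explode the results column so that each sublist gets its own row, and assign the result to df1. Then create the new dataframe from the list of sublists in df1.results, using column_name as the index, and call reset_index to turn it back into a regular column.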
code :
import numpy as np
import pandas as pd

df1 = df.explode('results')
pd.DataFrame(df1.results.tolist(), 
             index=df1.column_name,
             columns=['num', 'pct', 'index']).reset_index()

Out[562]:
    column_name  num  pct  index
0  income_level    0   12     13
1  income_level    0   98     43
2  income_level    1   29     73
3  income_level    2   12     34
4     geo_level    0   78     23
5     geo_level    1   56     67
6     geo_level    2   67     34

# Alternative without explode: concatenate the lists and repeat column_name to match
pd.DataFrame(df.results.sum(), 
             index=np.repeat(df.column_name, df.results.str.len()), 
             columns=['num', 'pct', 'index']).reset_index()

Out[572]:
    column_name  num  pct  index
0  income_level    0   12     13
1  income_level    0   98     43
2  income_level    1   29     73
3  income_level    2   12     34
4     geo_level    0   78     23
5     geo_level    1   56     67
6     geo_level    2   67     34
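
To make the reshaping explicit, here is a library-free Python sketch of the same idea: one output row per sublist, with the headers mapped onto the sublist values. It is an illustration only. The input structure is assumed from the printed output, since the original df was not shown, and explode_results is a hypothetical name.

```python
def explode_results(records, headers=("num", "pct", "index")):
    """Flatten (column_name, list_of_sublists) records into one dict per sublist."""
    rows = []
    for column_name, results in records:
        for sublist in results:
            row = {"column_name": column_name}
            # zip pairs each header with the value at the same position in the sublist
            row.update(zip(headers, sublist))
            rows.append(row)
    return rows


# Assumed input, reconstructed from the output shown above
records = [
    ("income_level", [[0, 12, 13], [0, 98, 43], [1, 29, 73], [2, 12, 34]]),
    ("geo_level", [[0, 78, 23], [1, 56, 67], [2, 67, 34]]),
]
flat = explode_results(records)
for row in flat:
    print(row)
```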