Create a loop to generate a series of dataframes in R


By : Emanuel Grech
Date : July 29 2020, 10:00 PM
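I have a df and I would like to get the average and sd of X1, X2, X3 for each batch at each duration. A solution that iterates over the column names with Map (or lapply) and calls Rmisc::summarySE once per column is shown here. Note that it groups by duration only; to summarise per batch as well, use groupvars = c("batch", "duration").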
code :
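# Names of the columns to summarise (those starting with "X")
to_use <- names(df)[grepl("^X", names(df))]

# Map returns a list named by column ($X1, $X2, $X3), as in the output below
Map(function(x) Rmisc::summarySE(df, x, groupvars = c("duration"),
                                 na.rm = FALSE,
                                 conf.interval = 0.95, .drop = TRUE),
    to_use)

# lapply gives the same data frames in an unnamed list
lapply(to_use, function(x) Rmisc::summarySE(df, x, groupvars = c("duration"),
                                            na.rm = FALSE,
                                            conf.interval = 0.95, .drop = TRUE))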
$X1
  duration N   X1         sd   se        ci
1        0 2 0.10 0.00000000 0.00 0.0000000
2        1 2 0.15 0.07071068 0.05 0.6353102
3        2 2 0.20 0.14142136 0.10 1.2706205

$X2
  duration N    X2         sd    se        ci
1        0 2 0.100 0.00000000 0.000 0.0000000
2        1 2 0.125 0.03535534 0.025 0.3176551
3        2 2 0.150 0.07071068 0.050 0.6353102

$X3
  duration N    X3         sd    se        ci
1        0 2 0.200 0.00000000 0.000 0.0000000
2        1 2 0.175 0.03535534 0.025 0.3176551
3        2 2 0.150 0.07071068 0.050 0.6353102

# Sample data used above (dput output):
df <- structure(list(batch = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("B1", 
"B2"), class = "factor"), duration = c(0L, 1L, 2L, 0L, 1L, 2L
), X1 = c(0.1, 0.2, 0.3, 0.1, 0.1, 0.1), X2 = c(0.1, 0.15, 0.2, 
0.1, 0.1, 0.1), X3 = c(0.2, 0.15, 0.1, 0.2, 0.2, 0.2)), class = "data.frame", row.names = c(NA, 
-6L))
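
For comparison with the pandas answers further down this page, here is a rough pandas sketch of the same per-duration summary. It is not a port of Rmisc::summarySE: it only computes N, mean and sd (no standard error or confidence interval), and the names `summaries` and `to_use` are just illustrative.

```python
import pandas as pd

# Same six rows as the R data frame above.
df = pd.DataFrame({
    "batch": ["B1", "B1", "B1", "B2", "B2", "B2"],
    "duration": [0, 1, 2, 0, 1, 2],
    "X1": [0.1, 0.2, 0.3, 0.1, 0.1, 0.1],
    "X2": [0.1, 0.15, 0.2, 0.1, 0.1, 0.1],
    "X3": [0.2, 0.15, 0.1, 0.2, 0.2, 0.2],
})

# Pick every column whose name starts with "X", as grepl("^X", ...) does.
to_use = [col for col in df.columns if col.startswith("X")]

# One summary frame per column, keyed by column name (like the Map result).
summaries = {
    col: df.groupby("duration")[col]
           .agg(N="count", mean="mean", sd="std")  # std uses ddof=1, like R's sd()
           .reset_index()
    for col in to_use
}

print(summaries["X1"])
```

With this sample data, grouping by both batch and duration would leave one row per group, so the sd column would come out as NaN.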



Pandas Create Two Dataframes Based on Series Membership


By : Greg Morton
Date : March 29 2020, 07:55 AM
It seems you need the difference of the two MultiIndexes, and then to select those rows with loc:
code :
print (df1.index)
MultiIndex(levels=[['IL', 'NY'], ['Chicago', 'Long_Island', 
                                  'NYC', 'South', 'Suburbs', 'Upstate']],
           labels=[[1, 1, 1, 0, 0, 0], [2, 5, 1, 0, 3, 4]],
           names=['State', 'Region'])

print (df2.index)
Int64Index([0, 1, 2], dtype='int64', name='index')

print (df1.index.names)
['State', 'Region']
# create the index from both columns
df2 = df2.set_index(df1.index.names)
# which is the same as
# df2 = df2.set_index(['State','Region'])

mux = df1.index.difference(df2.index)
print (mux)
MultiIndex(levels=[['IL', 'NY'], ['South', 'Suburbs', 'Upstate']],
           labels=[[0, 0, 1], [0, 1, 2]],
           names=['State', 'Region'],
           sortorder=0)

print (df1.loc[mux])
               2000  2010  Diff
State Region                   
IL    South      50    35    15
      Suburbs   800   650  -150
NY    Upstate   200   270    70

# The same steps in short form:
df2 = df2.set_index(df1.index.names)
df = df1.loc[df1.index.difference(df2.index)]
print (df)
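
Since the question asks for two dataframes, a boolean mask built with `isin` may be handy as well, because it yields both halves at once. The sketch below is self-contained. The three rows that appear in the output above keep their 2000/2010 values, while the other three rows and the contents of df2 are placeholders made up for illustration.

```python
import pandas as pd

# df1 is indexed by (State, Region). Values for NYC, Long_Island and Chicago
# are placeholders; the other three rows follow the output shown above.
df1 = pd.DataFrame(
    {
        "State": ["NY", "NY", "NY", "IL", "IL", "IL"],
        "Region": ["NYC", "Upstate", "Long_Island", "Chicago", "South", "Suburbs"],
        "2000": [100, 200, 300, 400, 50, 800],
        "2010": [110, 270, 310, 420, 35, 650],
    }
).set_index(["State", "Region"])

# df2 lists the State/Region pairs that define membership (assumed contents).
df2 = pd.DataFrame(
    {"State": ["NY", "NY", "IL"], "Region": ["NYC", "Long_Island", "Chicago"]}
)

# Turn df2's columns into a MultiIndex with the same level names as df1.
members = df2.set_index(list(df1.index.names)).index

# True where a df1 row's (State, Region) pair appears in df2.
mask = df1.index.isin(members)

in_df2 = df1[mask]       # rows whose key is listed in df2
not_in_df2 = df1[~mask]  # the remaining rows, i.e. the index difference

print(in_df2)
print(not_in_df2)
```

Unlike `difference`, which returns a sorted index, the mask keeps the rows in df1's original order.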

empty dataframes when appending series with for loop


By : anwyn
Date : March 29 2020, 07:55 AM
The problem is with the append function. DataFrame.append does not modify the original dataframe; it returns a new one, so you must assign the result back:
code :
df = df.append(stuff)
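
Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on a current install the usual pattern is to collect the pieces in a plain Python list and call `pd.concat` once after the loop. A minimal sketch, where the `pieces` list and the toy columns are made up for illustration:

```python
import pandas as pd

# Hypothetical pieces standing in for whatever the loop produces.
pieces = []
for i in range(3):
    piece = pd.DataFrame({"step": [i], "value": [i * 10]})
    pieces.append(piece)  # list.append does modify the list in place

# Build the final frame once, after the loop.
df = pd.concat(pieces, ignore_index=True)

print(df)
```

Concatenating once at the end also avoids copying the growing frame on every iteration.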

Concat dataframes/series with axis=1 in a loop


By : user3191453
Date : March 29 2020, 07:55 AM
Consider pivot_table after calculating month_end (see @Root's answer), and use reindex to fill in missing months. In pandas, grouped aggregations such as a count per sender per month usually do not require looping or temporary helper data frames.
code :
import pandas as pd
from pandas.tseries.offsets import MonthEnd

df_in['month_end'] = (df_in['time'] + MonthEnd(0)).dt.normalize()

agg_df = (df_in.pivot_table(index='month_end', columns='sender', values='time', aggfunc='count')
               .reindex(pd.date_range('1998-01-01', '2000-01-31', freq='M').values, axis='index')  # month-end frequency; spelled 'ME' in pandas 2.2+
               .fillna(0)                
          )
print(agg_df)  
# sender      Able Boy  Mark L. Taylor  james h. madison  james joyce  scott kirk
# month_end                                                                      
# 1998-01-31       0.0             0.0               0.0          0.0         0.0
# 1998-02-28       0.0             0.0               0.0          0.0         0.0
# 1998-03-31       0.0             0.0               0.0          0.0         0.0
# 1998-04-30       0.0             0.0               0.0          0.0         0.0
# 1998-05-31       0.0             0.0               0.0          0.0         0.0
# 1998-06-30       0.0             0.0               0.0          0.0         0.0
# 1998-07-31       0.0             0.0               0.0          0.0         0.0
# 1998-08-31       0.0             0.0               0.0          0.0         0.0
# 1998-09-30       0.0             0.0               0.0          0.0         0.0
# 1998-10-31       0.0             0.0               0.0          0.0         0.0
# 1998-11-30       4.0             3.0               0.0          0.0         0.0
# 1998-12-31       1.0             0.0               0.0          0.0         4.0
# 1999-01-31       1.0             0.0               4.0          0.0         0.0
# 1999-02-28       0.0             0.0               0.0          0.0         0.0
# 1999-03-31       0.0             0.0               0.0          0.0         0.0
# 1999-04-30       0.0             0.0               0.0          0.0         0.0
# 1999-05-31       0.0             0.0               0.0          0.0         0.0
# 1999-06-30       0.0             0.0               0.0          0.0         0.0
# 1999-07-31       0.0             0.0               0.0          0.0         0.0
# 1999-08-31       0.0             0.0               0.0          0.0         0.0
# 1999-09-30       0.0             0.0               0.0          0.0         0.0
# 1999-10-31       0.0             0.0               0.0          0.0         0.0
# 1999-11-30       0.0             0.0               0.0          0.0         0.0
# 1999-12-31       0.0             0.0               0.0          0.0         0.0
# 2000-01-31       0.0             0.0               0.0          4.0         1.0
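
To try the approach without the original data, here is a small self-contained sketch. The messages in `df_in` are made up (only the column names follow the answer above), and the month range is built from month starts plus `MonthEnd(0)` so it does not depend on which month-end alias your pandas version accepts.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Made-up messages; only the column names follow the answer above.
df_in = pd.DataFrame(
    {
        "sender": ["Able Boy", "Able Boy", "scott kirk", "Able Boy"],
        "time": pd.to_datetime(
            ["1998-11-03 10:00", "1998-11-20 16:30", "1998-12-05 09:15", "1999-01-15 12:00"]
        ),
    }
)

# Roll every timestamp forward to its month end and drop the time of day.
df_in["month_end"] = (df_in["time"] + MonthEnd(0)).dt.normalize()

# Month ends from Nov 1998 to Feb 1999, built from month starts.
all_months = pd.date_range("1998-11-01", "1999-02-01", freq="MS") + MonthEnd(0)

agg_df = (
    df_in.pivot_table(index="month_end", columns="sender", values="time", aggfunc="count")
    .reindex(all_months)  # adds the months in which nobody sent anything
    .fillna(0)
    .astype(int)
)

print(agg_df)
```

February 1999 has no messages in the sample, so it only shows up because of the reindex step.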

Printing a series of dataframes in a for loop


By : user3218473
Date : March 29 2020, 07:55 AM
To print several dataframes from inside a loop with the same formatting a notebook gives to the last expression of a cell, you can use the display function from IPython:
code :
from IPython.display import display 
display(df1.head()) 
display(df2.head())
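
Inside an actual for loop this might look like the sketch below. The `frames` dict and its contents are hypothetical, and the fallback to `print` is only there so the snippet also runs where IPython is not installed.

```python
import pandas as pd

try:
    # Rich rendering inside Jupyter/IPython.
    from IPython.display import display
except ImportError:
    # Plain Python without IPython installed: fall back to print.
    display = print

# Hypothetical frames; any collection of DataFrames works the same way.
frames = {
    "df1": pd.DataFrame({"a": [1, 2, 3]}),
    "df2": pd.DataFrame({"b": [4.0, 5.0, 6.0]}),
}

shown = []
for name, frame in frames.items():
    print(name)            # label so the outputs can be told apart
    display(frame.head())  # one rendered table per iteration
    shown.append(name)
```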

Create new renamed dataframes based on subset of current dataframes in a loop


By : user3522483
Date : March 29 2020, 07:55 AM
You can apply the same code that works for one dataframe to a list of dataframes inside lapply. This assumes list_df holds the names of all your dataframes (mget fetches the objects by those names; if list_df is already a list of dataframes, pass it to lapply directly) and that columnname is the date-time column.
code :
library(lubridate)
# & (not |): both conditions must hold, otherwise every hour passes the filter
out <- lapply(mget(list_df), function(df) subset(df, hour(columnname) > 5 & hour(columnname) < 20))
names(out) <- paste0("df_t", seq_along(out))
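
For readers doing the same thing in pandas, a rough equivalent is a dict comprehension over the frames. This is only a sketch: the `frames` list, the `when` column and the sample rows are made up, and it assumes the intent is to keep rows whose hour is after 5 and before 20.

```python
import pandas as pd

# Hypothetical input: two frames with a datetime column called "when".
frames = [
    pd.DataFrame({"when": pd.to_datetime(["2020-01-01 03:00", "2020-01-01 12:00"]), "v": [1, 2]}),
    pd.DataFrame({"when": pd.to_datetime(["2020-01-02 08:00", "2020-01-02 22:00"]), "v": [3, 4]}),
]

# Filter each frame on the hour and store the result under a new name
# (df_t1, df_t2, ...), mirroring paste0("df_t", seq_along(out)).
out = {
    f"df_t{i}": frame[(frame["when"].dt.hour > 5) & (frame["when"].dt.hour < 20)]
    for i, frame in enumerate(frames, start=1)
}

for name, frame in out.items():
    print(name)
    print(frame)
```

Keeping the results in a dict rather than creating separate variables makes it easy to loop over them again later.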