Create groups, breaks and conditions using If Else statements in R or Python


By : Armando Luna
Date : July 30 2020, 06:00 PM
I have a large dataset (2 million records), df, in which I am trying to create groups and "breaks" within datetimes when certain conditions apply. (This is a large dataset, and I do not know the contents of the subject, recipients and length columns.)

Using dplyr, keep only the rows where Edit is TRUE, group consecutive rows together, and take the first and last date of each group:
code :
library(dplyr)

df %>%
  #Add row number
  mutate(row = row_number(), 
  #Convert to Posixct
         Date = lubridate::mdy_hms(Date)) %>%
  #Keep only TRUE rows
  filter(Edit) %>%
  #Create groups
  group_by(gr = cumsum(c(TRUE, diff(row) > 1))) %>%
  #Get first, last and difference between the dates
  summarise(Start = first(Date), 
            End = last(Date), 
            Duration = difftime(End, Start, units = "secs"), 
            Group = "A", Subject = "hey", Length = 80) %>%
   select(-gr)

# A tibble: 3 x 6
#  Start               End                 Duration Group Subject Length
#  <dttm>              <dttm>              <drtn>   <chr> <chr>    <dbl>
#1 2020-01-02 01:00:01 2020-01-02 01:00:30 29 secs  A     hey         80
#2 2020-01-02 01:02:00 2020-01-02 01:02:05  5 secs  A     hey         80
#3 2020-01-02 01:03:00 2020-01-02 01:03:20 20 secs  A     hey         80
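
Since the question also asks about Python, here is a rough pandas sketch of the same idea: keep the rows where Edit is True, start a new group wherever the row numbers stop being consecutive, and take the first and last date per group. The sample data below is hypothetical (the original data is not shown), and only the Date and Edit columns are assumed.

```python
import pandas as pd

# Hypothetical sample data; only the Date and Edit columns are assumed.
df = pd.DataFrame({
    "Date": [
        "2020-01-02 01:00:01", "2020-01-02 01:00:30", "2020-01-02 01:01:00",
        "2020-01-02 01:02:00", "2020-01-02 01:02:05", "2020-01-02 01:02:30",
        "2020-01-02 01:03:00", "2020-01-02 01:03:20",
    ],
    "Edit": [True, True, False, True, True, False, True, True],
})

df["Date"] = pd.to_datetime(df["Date"])
# Row number, used to detect gaps after filtering
df["row"] = range(len(df))

# Keep only the rows where Edit is True
kept = df[df["Edit"]]

# A new group starts wherever the row number jumps by more than 1
gr = kept["row"].diff().gt(1).cumsum()

# First and last date per group, plus the duration in seconds
result = (
    kept.groupby(gr)["Date"]
    .agg(Start="first", End="last")
    .reset_index(drop=True)
)
result["Duration"] = (result["End"] - result["Start"]).dt.total_seconds()

print(result)
```

The `diff().gt(1).cumsum()` step plays the role of `cumsum(c(TRUE, diff(row) > 1))` in the dplyr version, except that the group labels start at 0 instead of 1.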



If you have to put breaks in the if statements, why do we need to bother with conditions in the while operation?


By : keshar gurung
Date : March 29 2020, 07:55 AM
Edit (as the question's title was changed):
The condition of the while/do-while loop is what is primarily checked to decide whether the program stays in or leaves the loop, not the break statement.
code :
// The two conditions below are equivalent (De Morgan's law)
while (!(Carselect == 1 || Carselect == 2 || Carselect == 3));
while (Carselect != 1 && Carselect != 2 && Carselect != 3);
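
To make the point concrete, here is a small Python sketch (not the original poster's program) showing that the two conditions agree for every value, and that the loop condition alone is enough to end the loop without any break. The function names and the list of inputs are made up for illustration.

```python
def invalid_a(car_select):
    # Negated "or" form of the condition
    return not (car_select == 1 or car_select == 2 or car_select == 3)


def invalid_b(car_select):
    # Equivalent "and" form (De Morgan's law)
    return car_select != 1 and car_select != 2 and car_select != 3


def first_valid(inputs):
    """Emulate a do-while: read once, then repeat while the condition holds."""
    it = iter(inputs)
    car_select = next(it)
    while invalid_b(car_select):  # the condition alone decides when to stop
        car_select = next(it)
    return car_select


# Hypothetical sequence of user inputs; 7 and 0 are rejected, 2 is accepted
chosen = first_valid([7, 0, 2, 3])
print(chosen)
```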

How to write if statements , multiple conditions to create new column in data frame


By : J.Doe
Date : March 29 2020, 07:55 AM
You could simply use logical subsetting: first initialize a column, then assign it values based on your conditions. Here DF is my data frame, and TEMP is the parameter I use to classify the new column Control_Temp (the conditions also require TEMP2 == -1).
code :
DF$Control_Temp <- NA
DF$Control_Temp[DF$TEMP <= 50 & DF$TEMP2 == -1] <- 'Y'
DF$Control_Temp[DF$TEMP > 50 & DF$TEMP <= 100 & DF$TEMP2 == -1] <- 'N'
DF$Control_Temp[DF$TEMP > 100 & DF$TEMP2 == -1 ] <- 'Y'
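
For readers working in Python, the same pattern might look roughly like this in pandas: initialize the column, then fill it through boolean masks with `.loc`. The column names are taken from the R snippet above; the sample values are invented for illustration.

```python
import pandas as pd

# Hypothetical sample values; column names follow the R snippet
DF = pd.DataFrame({
    "TEMP": [30, 75, 120, 40],
    "TEMP2": [-1, -1, -1, 0],
})

# Initialize the new column, then assign values where each condition holds
DF["Control_Temp"] = None
DF.loc[(DF["TEMP"] <= 50) & (DF["TEMP2"] == -1), "Control_Temp"] = "Y"
DF.loc[(DF["TEMP"] > 50) & (DF["TEMP"] <= 100) & (DF["TEMP2"] == -1), "Control_Temp"] = "N"
DF.loc[(DF["TEMP"] > 100) & (DF["TEMP2"] == -1), "Control_Temp"] = "Y"

print(DF)
```

Rows that match none of the conditions (here the last one, where TEMP2 is 0) keep the initial missing value, just as they stay NA in the R version.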

Create groups/classes based on conditions within columns


By : YellowDev
Date : March 29 2020, 07:55 AM
You can do this without looping or iterating through your dataframe. Per Wes McKinney, you can call .apply() on a groupby object with a function that you define. Combined with .shift(), this gives you the result without any loops.
Terse example:
code :
# Group by Employee ID
grouped = df.groupby("Employee ID")
# Define function 
def get_unique_events(group):
    # Convert to date and sort by date, like @Khris did
    group["Effective Date"] = pd.to_datetime(group["Effective Date"])
    group = group.sort_values("Effective Date")
    event_series = (group["Effective Date"] - group["Effective Date"].shift(1) > pd.Timedelta('365 days')).apply(lambda x: int(x)).cumsum()+1
    return event_series

event_df = pd.DataFrame(grouped.apply(get_unique_events).rename("Unique Event")).reset_index(level=0)
df = pd.merge(df, event_df[['Unique Event']], left_index=True, right_index=True)
df['Output'] = df['Unique Event'].apply(lambda x: "Unique Leave Event " + str(x))
df['Match'] = df['Desired Output'] == df['Output']

print(df)
  Employee ID Effective Date        Desired Output  Unique Event  \
3         100     2013-01-01  Unique Leave Event 1             1
2         100     2014-07-01  Unique Leave Event 2             2
1         100     2015-06-05  Unique Leave Event 2             2
0         100     2016-01-01  Unique Leave Event 2             2
6         200     2013-01-01  Unique Leave Event 1             1
5         200     2015-01-01  Unique Leave Event 2             2
4         200     2016-01-01  Unique Leave Event 2             2
7         300        2014-01  Unique Leave Event 1             1

                 Output Match
3  Unique Leave Event 1  True
2  Unique Leave Event 2  True
1  Unique Leave Event 2  True
0  Unique Leave Event 2  True
6  Unique Leave Event 1  True
5  Unique Leave Event 2  True
4  Unique Leave Event 2  True
7  Unique Leave Event 1  True
Full example with comments:
code :
import pandas as pd

data = {'Employee ID': ["100", "100", "100","100","200","200","200","300"],
        'Effective Date': ["2016-01-01","2015-06-05","2014-07-01","2013-01-01","2016-01-01","2015-01-01","2013-01-01","2014-01"],
        'Desired Output': ["Unique Leave Event 2","Unique Leave Event 2","Unique Leave Event 2","Unique Leave Event 1","Unique Leave Event 2","Unique Leave Event 2","Unique Leave Event 1","Unique Leave Event 1"]}
df = pd.DataFrame(data, columns=['Employee ID','Effective Date','Desired Output'])

# Group by Employee ID
grouped = df.groupby("Employee ID")

# Define a function to get the unique events
def get_unique_events(group):
    # Convert to date and sort by date, like @Khris did
    group["Effective Date"] = pd.to_datetime(group["Effective Date"])
    group = group.sort_values("Effective Date")
    # Define a series of booleans to determine whether the time between dates is over 365 days
    # Use .shift(1) to look back one row
    is_year = group["Effective Date"] - group["Effective Date"].shift(1) > pd.Timedelta('365 days')
    # Convert booleans to integers (0 for False, 1 for True)
    is_year_int = is_year.apply(lambda x: int(x))    
    # Use the cumulative sum function in pandas to get the cumulative adjustment from the first date.
    # Add one to start the first event as 1 instead of 0
    event_series = is_year_int.cumsum() + 1
    return event_series

# Run function on df and put results into a new dataframe
# Convert Employee ID back from an index to a column with .reset_index(level=0)
event_df = pd.DataFrame(grouped.apply(get_unique_events).rename("Unique Event")).reset_index(level=0)

# Merge the dataframes
df = pd.merge(df, event_df[['Unique Event']], left_index=True, right_index=True)

# Add string to match desired format
df['Output'] = df['Unique Event'].apply(lambda x: "Unique Leave Event " + str(x))

# Check to see if output matches desired output
df['Match'] = df['Desired Output'] == df['Output']

print(df)
  Employee ID Effective Date        Desired Output  Unique Event  \
3         100     2013-01-01  Unique Leave Event 1             1
2         100     2014-07-01  Unique Leave Event 2             2
1         100     2015-06-05  Unique Leave Event 2             2
0         100     2016-01-01  Unique Leave Event 2             2
6         200     2013-01-01  Unique Leave Event 1             1
5         200     2015-01-01  Unique Leave Event 2             2
4         200     2016-01-01  Unique Leave Event 2             2
7         300        2014-01  Unique Leave Event 1             1

                 Output Match
3  Unique Leave Event 1  True
2  Unique Leave Event 2  True
1  Unique Leave Event 2  True
0  Unique Leave Event 2  True
6  Unique Leave Event 1  True
5  Unique Leave Event 2  True
4  Unique Leave Event 2  True
7  Unique Leave Event 1  True
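
As a possible variation (not part of the answer above), the same numbering can be sketched without `.apply()` at all, using `groupby().diff()` and a grouped `cumsum()`. The data mirrors the example above, except that the last date is written as "2014-01-01" rather than "2014-01" so that all dates share one format; that change is an assumption made for this sketch.

```python
import pandas as pd

# Same employees and dates as above; "2014-01" is written out in full here
df = pd.DataFrame({
    "Employee ID": ["100", "100", "100", "100", "200", "200", "200", "300"],
    "Effective Date": ["2016-01-01", "2015-06-05", "2014-07-01", "2013-01-01",
                       "2016-01-01", "2015-01-01", "2013-01-01", "2014-01-01"],
})

df["Effective Date"] = pd.to_datetime(df["Effective Date"])
df = df.sort_values(["Employee ID", "Effective Date"])

# True where the gap to the previous date of the same employee exceeds 365 days
# (the first row of each employee has no previous date, so it compares as False)
gap = df.groupby("Employee ID")["Effective Date"].diff() > pd.Timedelta("365 days")

# Running count of gaps per employee, starting the first event at 1
df["Unique Event"] = gap.astype(int).groupby(df["Employee ID"]).cumsum() + 1
df["Output"] = "Unique Leave Event " + df["Unique Event"].astype(str)

print(df)
```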

How to create groups based on conditions


By : user7253776
Date : March 29 2020, 07:55 AM
A variation on @ycw's answer:
code :
library(data.table)
setDT(df)

df[, g := rleid( z <- out==0 | shift(out==0) )*NA^(!z) ]

    group size      int      out  g
 1:     A 1000 5.585529 0.000000  1
 2:     A 1000 5.709466 0.000000  1
 3:     A 1000 4.890697 0.000000  1
 4:     A 1000 0.000000 0.000000  1
 5:     A 1000 0.000000 0.000000  1
 6:     A    0 0.000000 4.080678  1
 7:     A    0 0.000000 4.883752 NA
 8:     A    0 0.000000 6.817312 NA
 9:     A 2000 4.546503 0.000000  3
10:     A 2000 5.605887 0.000000  3
11:     A 2000 3.182044 0.000000  3
12:     A 2000 0.000000 0.000000  3
13:     A 2000 0.000000 0.000000  3
14:     A 2000 0.000000 0.000000  3
15:     A 2000 0.000000 0.000000  3
16:     A    0 0.000000 5.370628  3
17:     A    0 0.000000 5.520216 NA
18:     A    0 0.000000 4.249468 NA
19:     A 5000 5.630099 0.000000  5
20:     A 5000 4.723816 0.000000  5
21:     A 5000 4.715840 0.000000  5
22:     A 5000 0.000000 0.000000  5
23:     A 5000 0.000000 0.000000  5
24:     A    0 0.000000 5.816900  5
25:     A    0 0.000000 4.113642 NA
26:     A    0 0.000000 4.668422 NA
    group size      int      out  g
df[, g := match(g, unique(na.omit(g)))]
w = df[.(unique(na.omit(g))), on=.(g), which=TRUE, mult="first"]
df[, g2 := cumsum(.I %in% w)]
    group size      int      out  g g2
 1:     A 1000 5.585529 0.000000  1  1
 2:     A 1000 5.709466 0.000000  1  1
 3:     A 1000 4.890697 0.000000  1  1
 4:     A 1000 0.000000 0.000000  1  1
 5:     A 1000 0.000000 0.000000  1  1
 6:     A    0 0.000000 4.080678  1  1
 7:     A    0 0.000000 4.883752 NA  1
 8:     A    0 0.000000 6.817312 NA  1
 9:     A 2000 4.546503 0.000000  2  2
10:     A 2000 5.605887 0.000000  2  2
11:     A 2000 3.182044 0.000000  2  2
12:     A 2000 0.000000 0.000000  2  2
13:     A 2000 0.000000 0.000000  2  2
14:     A 2000 0.000000 0.000000  2  2
15:     A 2000 0.000000 0.000000  2  2
16:     A    0 0.000000 5.370628  2  2
17:     A    0 0.000000 5.520216 NA  2
18:     A    0 0.000000 4.249468 NA  2
19:     A 5000 5.630099 0.000000  3  3
20:     A 5000 4.723816 0.000000  3  3
21:     A 5000 4.715840 0.000000  3  3
22:     A 5000 0.000000 0.000000  3  3
23:     A 5000 0.000000 0.000000  3  3
24:     A    0 0.000000 5.816900  3  3
25:     A    0 0.000000 4.113642 NA  3
26:     A    0 0.000000 4.668422 NA  3
    group size      int      out  g g2
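
For comparison, a rough pandas sketch of the same grouping idea is shown below. It is an approximation of the data.table logic, not a line-by-line port: a row is flagged when out is 0 or the previous out was 0, a new group starts at the first row of each flagged run, and g is left missing on unflagged rows. The short out column is made up for illustration.

```python
import pandas as pd

# Hypothetical, shortened version of the out column
df = pd.DataFrame({"out": [0.0, 0.0, 4.1, 4.9, 6.8, 0.0, 0.0, 5.4, 5.5]})

# Flag rows where out is 0, or where the previous row's out was 0
z = (df["out"] == 0) | (df["out"].shift() == 0)

# A group starts where a flagged run begins
starts = z & ~z.shift(fill_value=False)

# g2: group id carried through every row; g: same id, missing on unflagged rows
df["g2"] = starts.cumsum()
df["g"] = df["g2"].where(z)

print(df)
```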

How do I create a JOIN between 2 SELECT Statements with GROUPs?


By : Sathya Moorthy
Date : March 29 2020, 07:55 AM
I think you can do what you want by joining them like this (updated):
code :
    strSQL = "INSERT INTO Weekly (Symbol, WeekEnd, WeeklyHi, WeeklyLow, AdjClose) " &
"Select A.Symbol, A.WeekEnd, A.WeeklyHi, A.WeeklyLow, ISNULL(B.AdjClose, 0) as AdjClose " &
"FROM " &
    "(SELECT Symbol, WeekEnd, MAX(DailyHi) as WeeklyHi, MIN(DailyLow) as WeeklyLow " &
    "FROM TempDaily " &
    "GROUP BY Symbol, WeekEnd ) A " &
    "LEFT JOIN " &
    "(Select wdata.WeekEnd, MAX(wdata.AdjClose) as AdjClose " &
    "FROM " &
        "(Select CloseDate, WeekEnd, " &
        "FIRST_VALUE(AdjClose) OVER (PARTITION BY WeekEnd ORDER BY CloseDate " &
        "DESC ROWS UNBOUNDED PRECEDING) As AdjClose " &
        "FROM TempDaily) wdata " &
     "GROUP BY wdata.WeekEnd) B ON A.WeekEnd = B.WeekEnd "
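
To illustrate what the query computes, here is a small pandas sketch of the same weekly roll-up: the highest DailyHi, the lowest DailyLow, and the AdjClose of the latest CloseDate in each week. The column names come from the SQL above; the rows are invented. One deliberate difference: this sketch groups the closing price by Symbol as well as WeekEnd, whereas the SQL subquery B partitions by WeekEnd only.

```python
import pandas as pd

# Hypothetical daily rows; column names follow the SQL above
daily = pd.DataFrame({
    "Symbol":    ["AAA", "AAA", "AAA", "AAA", "AAA"],
    "CloseDate": ["2020-03-27", "2020-03-25", "2020-03-26", "2020-03-31", "2020-03-30"],
    "WeekEnd":   ["2020-03-27", "2020-03-27", "2020-03-27", "2020-04-03", "2020-04-03"],
    "DailyHi":   [11, 10, 12, 14, 13],
    "DailyLow":  [7, 8, 9, 11, 10],
    "AdjClose":  [10.0, 9.0, 11.0, 13.5, 12.0],
})

weekly = (
    daily.sort_values("CloseDate")           # so "last" means the latest close
    .groupby(["Symbol", "WeekEnd"], as_index=False)
    .agg(
        WeeklyHi=("DailyHi", "max"),
        WeeklyLow=("DailyLow", "min"),
        AdjClose=("AdjClose", "last"),
    )
)

print(weekly)
```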