
Removing last rows of each group based on condition in a pandas dataframe



By : Andi Hasbi
Date : November 23 2020, 03:01 PM
I have the following dataframe and want to remove the last row of each group, unless the group contains only one row. Use groupby with apply:
code :
In [4598]: (df.groupby('name').apply(lambda x: x.iloc[:-1] if len(x)>1 else x)
              .reset_index(drop=True))
Out[4598]:
  name gender  count
0    A      M      3
1    A      F      2
2    B    NaN      2
3    C      F      4
4    D      M      5
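The same result can also be had without apply, using a vectorized mask. A minimal sketch with made-up data, since the question's original input dataframe is not shown:

```python
import pandas as pd

# Hypothetical input (the question's original dataframe is not shown)
df = pd.DataFrame({'name':  ['A', 'A', 'B', 'C', 'C', 'C'],
                   'count': [3, 1, 2, 4, 7, 1]})

# cumcount(ascending=False) == 0 marks the last row of each group
last_in_group = df.groupby('name').cumcount(ascending=False).eq(0)
# groups of size 1 are kept whole
singleton = df.groupby('name')['name'].transform('size').eq(1)

result = df[~last_in_group | singleton]
```

This avoids the per-group Python-level calls that `apply` makes, which matters on large frames.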


Drop some Pandas dataframe rows using group based condition



By : M Venkat
Date : March 29 2020, 07:55 AM
I've got some data on sales, say, and want to look at how different postcodes compare: do some deliver more profitable business than others? I'm grouping by postcode and can easily get various stats out on a per-postcode basis. However, a few very high-value jobs distort the stats, so I'd like to ignore the outliers. For various reasons, I'd like to define the outliers per group: for example, drop the rows that are in the top x-th percentile of their group, or the top n of their group. You can use the apply() method:
code :
import pandas as pd
import io


txt="""     A         C         D
0  foo -0.536732  0.061055
1  bar  1.470956  1.350996
2  foo  1.981810  0.676978
3  bar -0.072829  0.417285
4  foo -0.910537 -1.634047
5  bar -0.346749 -0.127740
6  foo  0.959957 -1.068385
7  foo -0.640706  2.635910"""

df = pd.read_csv(io.StringIO(txt), sep=r"\s+", index_col=0)

def f(df):
    return df.sort_values("C").iloc[:-2]
df2 = df.groupby("A", group_keys=False).apply(f)
print(df2)
     A         C         D
5  bar -0.346749 -0.127740
4  foo -0.910537 -1.634047
7  foo -0.640706  2.635910
0  foo -0.536732  0.061055
print(df2.reindex(df.index[df.index.isin(df2.index)]))
    A         C         D
0  foo -0.536732  0.061055
4  foo -0.910537 -1.634047
5  bar -0.346749 -0.127740
7  foo -0.640706  2.635910
def f(df):
    return df[df.C > df.C.mean()]
df3 = df.groupby("A", group_keys=False).apply(f)
print(df3)
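The percentile variant mentioned in the question can also be done without a custom apply, by comparing each value against its group's quantile via transform. A sketch with made-up sales data ('postcode' and 'value' are illustrative column names):

```python
import pandas as pd

# Made-up data: two postcodes, each with one extreme high-value job
df = pd.DataFrame({'postcode': ['X', 'X', 'X', 'X', 'Y', 'Y', 'Y', 'Y'],
                   'value':    [10, 20, 30, 1000, 5, 15, 25, 500]})

# Keep only rows at or below their own group's 75th percentile of 'value'
q75 = df.groupby('postcode')['value'].transform(lambda s: s.quantile(0.75))
trimmed = df[df['value'] <= q75]
```

Because transform returns a result aligned to the original index, the comparison is a plain elementwise mask on the full frame.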
removing duplicate rows in pandas DataFrame based on a condition



By : yusuga
Date : March 29 2020, 07:55 AM
I want to delete duplicate rows with respect to column 'a' in a DataFrame, keeping the last occurrence (the old take_last=True argument, now keep='last'), unless some condition holds. For instance, if I had the following DataFrame:
code :
#   a  b      c
#0  1  S   Blue
#1  2  M  Black
#2  2  L   Blue
#3  1  L  Green

#get first rows of groups, sort them and reset index
df1 = df.groupby('a').head(1).sort_values('a').reset_index(drop=True)

#get last rows of groups, sort them and reset index
df2 = df.groupby('a').tail(1).sort_values('a').reset_index(drop=True)
print(df1)
#   a  b      c
#0  1  S   Blue
#1  2  M  Black
print(df2)
#   a  b      c
#0  1  L  Green
#1  2  L   Blue

#if value in col c in df1 is 'Blue' replace this row with row from df2 (indexes are same)
df1.loc[df1['c'].isin(['Blue'])] = df2
print(df1)
#   a  b      c
#0  1  L  Green
#1  2  M  Black
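The head(1)/tail(1) pairs above can also be expressed with drop_duplicates, whose keep='first'/'last' argument replaces the deprecated take_last. A sketch reproducing the same data and conditional override:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 2, 1],
                   'b': ['S', 'M', 'L', 'L'],
                   'c': ['Blue', 'Black', 'Blue', 'Green']})

# keep='last' is the modern spelling of take_last=True
last = df.drop_duplicates('a', keep='last').sort_values('a').reset_index(drop=True)
first = df.drop_duplicates('a', keep='first').sort_values('a').reset_index(drop=True)

# Prefer the first occurrence, but fall back to the last one where c == 'Blue'
result = first.copy()
result.loc[result['c'].eq('Blue')] = last
```

Both frames share the same index after reset_index, so the row-for-row replacement aligns cleanly.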
Pandas dataframe in python: Removing rows from df1 based on rows in df2



By : servcoappliance
Date : March 29 2020, 07:55 AM
I have two dataframes and want to remove rows from df1 based on rows in df2. Simply filter df1 using isin():
code :
df1[df1.rowname.isin(df2.rowname)]

  rowname  a  b  c  d
0      R1  1  2  0  1
1      R2  2  2  0  1
3      R4  1  2  0  1
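The same isin() mask, negated with ~, drops the matching rows instead. A self-contained sketch, with df1 reconstructed from the output above (R3's values are made up, since that row is not shown):

```python
import pandas as pd

# df1 reconstructed from the answer's output; R3's values are made up
df1 = pd.DataFrame({'rowname': ['R1', 'R2', 'R3', 'R4'],
                    'a': [1, 2, 9, 1],
                    'b': [2, 2, 9, 2],
                    'c': [0, 0, 9, 0],
                    'd': [1, 1, 9, 1]})
df2 = pd.DataFrame({'rowname': ['R1', 'R2', 'R4']})

kept = df1[df1.rowname.isin(df2.rowname)]      # rows also present in df2
dropped = df1[~df1.rowname.isin(df2.rowname)]  # the complement
```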
Pandas dataframe apply lambda to selected rows only (based on a condition) within the dataframe



By : Rex
Date : March 29 2020, 07:55 AM
Update: this solution works:
code :
df['GlobalName'] = np.where(df['GlobalName']=='', df['IsPerson'].apply(lambda x: x if x==True else ''), df['GlobalName'])
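In context, np.where picks the lambda's result only for rows where GlobalName is empty and leaves the other rows untouched. A runnable sketch with illustrative data (the column names are taken from the one-liner above; the values are made up):

```python
import numpy as np
import pandas as pd

# Illustrative data; only the column names come from the one-liner above
df = pd.DataFrame({'GlobalName': ['Acme', '', ''],
                   'IsPerson':   [False, True, False]})

# Where GlobalName is empty, take the lambda's result; otherwise keep the value
df['GlobalName'] = np.where(df['GlobalName'] == '',
                            df['IsPerson'].apply(lambda x: x if x is True else ''),
                            df['GlobalName'])
```

Note that this writes the boolean True itself into the column, so the result is object-dtyped with mixed values.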
Removing rows from a dataframe based on condition or value



By : Nick Wang
Date : March 29 2020, 07:55 AM
Is there a way I can remove data from a df that has been grouped and sorted based on column values? You could use boolean masking:
code :
mask = df['df'].ne('mdf') & df['rank'].eq(0)
excl_id = df.loc[mask, 'id'].unique()

df[~df['id'].isin(excl_id)]
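The three lines above can be exercised end to end on made-up data matching the column names in the snippet ('id', 'df', 'rank'):

```python
import pandas as pd

# Made-up data; ids 1 and 3 each have a non-'mdf' row with rank 0
df = pd.DataFrame({'id':   [1, 1, 2, 2, 3],
                   'df':   ['mdf', 'adf', 'mdf', 'mdf', 'adf'],
                   'rank': [0, 0, 0, 1, 0]})

# Flag ids that have a non-'mdf' row with rank 0, then drop all their rows
mask = df['df'].ne('mdf') & df['rank'].eq(0)
excl_id = df.loc[mask, 'id'].unique()
result = df[~df['id'].isin(excl_id)]
```

The two-step shape matters: the mask flags individual offending rows, while the isin() on the collected ids removes every row belonging to those ids, not just the flagged ones.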