Drop some Pandas dataframe rows using a group-based condition
By : M Venkat
Date : March 29 2020, 07:55 AM
I've got some data on sales and want to look at how different post codes compare: do some deliver more profitable business than others? I'm grouping by postcode and can easily get various stats out on a per-postcode basis. However, a few very high-value jobs distort the stats, so I'd like to ignore the outliers. For various reasons, I'd like to define the outliers by group: for example, drop the rows that are in the top xth percentile of their group, or the top n in their group. You can use the apply() method: code :
import pandas as pd
import io
txt=""" A C D
0 foo -0.536732 0.061055
1 bar 1.470956 1.350996
2 foo 1.981810 0.676978
3 bar -0.072829 0.417285
4 foo -0.910537 -1.634047
5 bar -0.346749 -0.127740
6 foo 0.959957 -1.068385
7 foo -0.640706 2.635910"""
df = pd.read_csv(io.StringIO(txt), sep=r"\s+", index_col=0)
def f(df):
    # drop the two rows with the largest C within each group
    return df.sort_values("C").iloc[:-2]
df2 = df.groupby("A", group_keys=False).apply(f)
print(df2)
A C D
5 bar -0.346749 -0.127740
4 foo -0.910537 -1.634047
7 foo -0.640706 2.635910
0 foo -0.536732 0.061055
print(df2.reindex(df.index[df.index.isin(df2.index)]))
A C D
0 foo -0.536732 0.061055
4 foo -0.910537 -1.634047
5 bar -0.346749 -0.127740
7 foo -0.640706 2.635910
def f(df):
    # keep only rows whose C exceeds their group's mean
    return df[df.C > df.C.mean()]
df3 = df.groupby("A", group_keys=False).apply(f)
print(df3)
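The question also asks for a percentile-based cut rather than a fixed top-n per group. A minimal sketch of that variant, using invented postcode/profit data (the column names and values here are assumptions, not from the question):

```python
import pandas as pd

# Hypothetical sales data: profit per job, keyed by postcode.
df = pd.DataFrame({
    "postcode": ["AB1", "AB1", "AB1", "AB1", "CD2", "CD2", "CD2", "CD2"],
    "profit":   [10, 12, 11, 500, 20, 22, 21, 900],
})

# Keep only rows at or below the 75th percentile of their own group,
# so each postcode's outliers are judged against that postcode alone.
cutoff = df.groupby("postcode")["profit"].transform(lambda s: s.quantile(0.75))
trimmed = df[df["profit"] <= cutoff]
print(trimmed)
```

With linear interpolation (the default), the 500 and 900 outliers fall above their groups' 75th-percentile cutoffs and are dropped.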
Removing duplicate rows in a pandas DataFrame based on a condition
By : yusuga
Date : March 29 2020, 07:55 AM
This should help you out. I want to delete duplicate rows with respect to column 'a' in a DataFrame, keeping the last occurrence (take_last=True), unless some condition holds. For instance, if I had the following DataFrame: code :
# a b c
#0 1 S Blue
#1 2 M Black
#2 2 L Blue
#3 1 L Green
#get first row of each group and sort; drop=True avoids a redundant index column
df1 = df.groupby('a').head(1).sort_values('a').reset_index(drop=True)
#get last row of each group and sort likewise
df2 = df.groupby('a').tail(1).sort_values('a').reset_index(drop=True)
print(df1)
# a b c
#0 1 S Blue
#1 2 M Black
print(df2)
# a b c
#0 1 L Green
#1 2 L Blue
#if value in col c in df1 is 'Blue' replace this row with row from df2 (indexes are same)
df1.loc[df1['c'].isin(['Blue'])] = df2
print(df1)
# a b c
#0 1 L Green
#1 2 M Black
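On current pandas the same idea can be sketched with drop_duplicates() and sort_values() — a rough modern equivalent of the steps above, not the author's exact code:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 2, 1],
    "b": ["S", "M", "L", "L"],
    "c": ["Blue", "Black", "Blue", "Green"],
})

# Start from the first row of each 'a' group, then swap in the last row
# wherever the kept row's colour is 'Blue' (indexes line up after reset).
first = df.drop_duplicates("a", keep="first").sort_values("a").reset_index(drop=True)
last = df.drop_duplicates("a", keep="last").sort_values("a").reset_index(drop=True)
first.loc[first["c"] == "Blue"] = last
print(first)
```

This reproduces the original output: the a=1 row becomes (1, L, Green) while the a=2 row keeps its first occurrence (2, M, Black).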
Pandas dataframe in python: Removing rows from df1 based on rows in df2
By : servcoappliance
Date : March 29 2020, 07:55 AM
This may help you. I have two dataframes and want to remove rows from df1 based on rows in df2. Simply filter the dataframe using isin(): code :
df1[df1.rowname.isin(df2.rowname)]
rowname a b c d
0 R1 1 2 0 1
1 R2 2 2 0 1
3 R4 1 2 0 1
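The one-liner above assumes df1 and df2 already exist. A self-contained sketch with made-up data; negate the mask with ~ if you want to drop the matching rows instead of keeping them:

```python
import pandas as pd

# Stand-in frames; 'rowname' is the join column used in the answer.
df1 = pd.DataFrame({"rowname": ["R1", "R2", "R3", "R4"],
                    "a": [1, 2, 3, 1]})
df2 = pd.DataFrame({"rowname": ["R1", "R2", "R4"]})

# Keep only the rows of df1 whose rowname also appears in df2.
kept = df1[df1.rowname.isin(df2.rowname)]
print(kept)
```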
Pandas dataframe apply lambda to selected rows only (based on a condition) within the dataframe
By : Rex
Date : March 29 2020, 07:55 AM
This may help you. Update: this solution works: code :
df['GlobalName'] = np.where(df['GlobalName'] == '',
                            df['IsPerson'].apply(lambda x: x if x == True else ''),
                            df['GlobalName'])
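A runnable sketch of that line with a toy frame — the data is invented; only the column names GlobalName and IsPerson come from the answer:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the asker's data.
df = pd.DataFrame({"GlobalName": ["", "Acme", ""],
                   "IsPerson": [True, False, False]})

# Only rows with an empty GlobalName are rewritten; the lambda runs over
# IsPerson and keeps True values, mapping everything else back to ''.
df["GlobalName"] = np.where(df["GlobalName"] == "",
                            df["IsPerson"].apply(lambda x: x if x == True else ""),
                            df["GlobalName"])
print(df)
```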
Removing rows from a dataframe based on condition or value
By : Nick Wang
Date : March 29 2020, 07:55 AM
Hope this helps fix your issue. Is there a way I can remove data from a df that has been grouped and sorted based on column values? You could use boolean masking: code :
mask = df['df'].ne('mdf') & df['rank'].eq(0)
excl_id = df.loc[mask, 'id'].unique()
df[~df['id'].isin(excl_id)]
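A self-contained sketch with made-up data matching the answer's column names, to show what the mask excludes — every id that has at least one row where df != 'mdf' and rank == 0 is dropped entirely:

```python
import pandas as pd

# Invented frame; 'id', 'df', and 'rank' are the columns from the answer.
df = pd.DataFrame({
    "id":   [1, 1, 2, 2, 3],
    "df":   ["mdf", "x", "mdf", "mdf", "y"],
    "rank": [0, 0, 0, 1, 2],
})

# Rows violating the condition mark their whole id for exclusion.
mask = df["df"].ne("mdf") & df["rank"].eq(0)
excl_id = df.loc[mask, "id"].unique()
result = df[~df["id"].isin(excl_id)]
print(result)
```

Here id 1 has a row with df == 'x' and rank == 0, so both of its rows are removed; ids 2 and 3 survive.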