How to use groupby on a single column and perform comparisons for multiple columns in Pandas?



By : user2173100
Date : October 21 2020, 08:10 PM
I have a dataframe of users, whether or not they have signed up, and the model's prediction for whether or not they have signed up. I want to find per user: the TP (they signed up and the model predicted they did), FP (they didn't sign up but the model predicted they did), FN (they signed up but the model predicted no), and TN (they didn't sign up and the model predicted no). Here 1 means they signed up and 0 means they did not. I want to groupby on users, and then perform comparisons using the other two columns.

Do the comparison before you groupby, and then groupby + sum:
code :
(df.assign(TP = df.Signed_up & df.Prediction, 
           TN = (df.Signed_up == 0) & (df.Prediction == 0),
           FN = df.Signed_up & (df.Prediction == 0), 
           FP = (df.Signed_up == 0) & df.Prediction)
   .groupby('Users')[['TP', 'TN', 'FN', 'FP']].sum())

       TP   TN   FN   FP
Users                   
User1   1  0.0  1.0  0.0
User2   0  2.0  0.0  1.0
User3   1  0.0  0.0  0.0
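# Alternative: count every (Users, Signed_up, Prediction) combination, then unstack the last two levels
df.groupby([*df]).size().unstack([1,2]).fillna(0)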

Signed_up     1         0     
Prediction    0    1    0    1
Users                         
User1       1.0  1.0  0.0  0.0
User2       0.0  0.0  2.0  1.0
User3       0.0  1.0  0.0  0.0
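The original input frame is not shown, so the following is only a sketch with hypothetical sample rows chosen to reproduce the per-user counts above. It computes the four flags as boolean columns first and sums them per user afterwards; the helper names `actual` and `predicted` are introduced here for readability.

```python
import pandas as pd

# Hypothetical sample data, chosen to match the counts shown above
df = pd.DataFrame({
    "Users": ["User1", "User1", "User2", "User2", "User2", "User3"],
    "Signed_up": [1, 1, 0, 0, 0, 1],
    "Prediction": [1, 0, 0, 0, 1, 1],
})

# Boolean masks make the four cases explicit
actual = df["Signed_up"].eq(1)
predicted = df["Prediction"].eq(1)

confusion = (
    df.assign(
        TP=actual & predicted,
        TN=~actual & ~predicted,
        FN=actual & ~predicted,
        FP=~actual & predicted,
    )
    .groupby("Users")[["TP", "TN", "FN", "FP"]]
    .sum()  # summing booleans counts the True values per user
)
print(confusion)
```

Because every flag is a proper boolean column, the sums come out as integers for all four columns.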


Pandas groupby, pivot, or stack? Turn groups of a single column into multiple columns



By : Rachel
Date : March 29 2020, 07:55 AM
You can use cumcount with pivot:
code :
print (df)
      A               B      C
0     2  PresentationID  12954
1     5       Attendees     65
2     6       Downloads      0
3     7       Questions      0
4     8           Likes     11
5     9          Tweets      0
6    10           Polls      0
7    73  PresentationID  12953
8    76       Attendees     64
9    77       Downloads     31
10   78       Questions      0
11   79           Likes     11
12   80          Tweets      0
13   81           Polls      0
14  143  PresentationID  12951
15  146       Attendees     64
16  147       Downloads     28
17  148       Questions      2
18  149           Likes      2
19  150          Tweets      0
20  151           Polls      0

df['G'] = df.groupby('B').cumcount()
df = df.pivot(index='G', columns='B', values='C')
print (df)
B  Attendees  Downloads  Likes  Polls  PresentationID  Questions  Tweets
G                                                                       
0         65          0     11      0           12954          0       0
1         64         31     11      0           12953          0       0
2         64         28      2      0           12951          2       0
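# Alternative without a helper column, starting again from the original df:
# index by the per-group counter and B, then unstack B into columns
df = df.set_index([df.groupby('B').cumcount(), 'B'])['C'].unstack()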
print (df)
B  Attendees  Downloads  Likes  Polls  PresentationID  Questions  Tweets
0         65          0     11      0           12954          0       0
1         64         31     11      0           12953          0       0
2         64         28      2      0           12951          2       0
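As a self-contained sketch of the cumcount + pivot idea, the snippet below uses a shortened version of the data above (only three of the labels, two repetitions). The names `long_df` and `wide` are illustrative and not part of the original answer.

```python
import pandas as pd

# Shortened sample: each label in B repeats once per presentation
long_df = pd.DataFrame({
    "B": ["PresentationID", "Attendees", "Likes"] * 2,
    "C": [12954, 65, 11, 12953, 64, 11],
})

# cumcount numbers the occurrences of each label: 0 for the first, 1 for the second, ...
long_df["G"] = long_df.groupby("B").cumcount()

# The counter becomes the row index, the labels become columns
wide = long_df.pivot(index="G", columns="B", values="C")
print(wide)
```

The counter is what makes the pivot possible: without it, each label in `B` would map to several rows and `pivot` would have no unique index to place the values on.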
Pandas- Groupby multiple columns and mean from a single column



By : Thalita Alves
Date : March 29 2020, 07:55 AM
The problem is that you also need to aggregate the string and time columns, e.g. with first; otherwise they are omitted.
A possible solution is to create a dict of aggregation functions and use groupby + agg + reset_index + reindex (reindex_axis has been removed from pandas):
code :
print (df)

   A      B      C  D  E  J  K
0  a  date1  time1  1  1  1  1
1  b  date2  time2  2  2  2  2
2  c  date2  time3  1  1  1  1

cols = ['A','B','C']
d = {x:'mean' for x in df.columns.difference(cols)}
d['A'] = 'first'
d['C'] = 'first'
print (d)
{'E': 'mean', 'D': 'mean', 'J': 'mean', 'A': 'first', 'C': 'first', 'K': 'mean'}

df1 = df.groupby('B').agg(d).reset_index().reindex(columns=df.columns)
print (df1)
   A      B      C    D    E    J    K
0  a  date1  time1  1.0  1.0  1.0  1.0
1  b  date2  time2  1.5  1.5  1.5  1.5
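For reference, here is a runnable sketch of the same approach on the sample frame shown above, written against current pandas. The variable names (`records`, `key`, `text_cols`, `agg_funcs`, `result`) are illustrative choices rather than anything from the original answer.

```python
import pandas as pd

records = pd.DataFrame({
    "A": ["a", "b", "c"],
    "B": ["date1", "date2", "date2"],
    "C": ["time1", "time2", "time3"],
    "D": [1, 2, 1],
    "E": [1, 2, 1],
    "J": [1, 2, 1],
    "K": [1, 2, 1],
})

key = "B"
text_cols = ["A", "C"]

# Numeric columns get 'mean'; text columns keep their first value per group
agg_funcs = {col: "mean" for col in records.columns.difference([key] + text_cols)}
agg_funcs.update({col: "first" for col in text_cols})

result = (
    records.groupby(key)
    .agg(agg_funcs)
    .reset_index()
    .reindex(columns=records.columns)  # restore the original column order
)
print(result)
```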
How to perform operation between different columns after groupby a column in pandas?



By : Syed Kalim Ullah
Date : March 29 2020, 07:55 AM
You can sum both columns, then calculate the average after the groupby:
code :
gp = batting.groupby('playerID')[['H', 'AB']].sum()
gp['ba'] = gp.H/gp.AB
print(gp)

#              H     AB        ba
#playerID                        
#aardsda01     0      4  0.000000
#aaronha01  3771  12364  0.304998
#aaronto01   216    944  0.228814
#aasedo01      0      5  0.000000
#abadan01      2     21  0.095238
#abadfe01      1      9  0.111111
#abadijo01    11     49  0.224490
# Or as a single chained expression with eval:
batting.groupby('playerID')[['H', 'AB']].sum().eval('ba = H / AB')
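The batting table itself is not included here, so this sketch uses a tiny made-up frame with hypothetical player IDs `p1` and `p2`. It shows why the order matters: dividing the summed hits by the summed at-bats is generally not the same as averaging the per-row ratios.

```python
import pandas as pd

# Made-up data: two seasons for each of two hypothetical players
batting = pd.DataFrame({
    "playerID": ["p1", "p1", "p2", "p2"],
    "H": [10, 20, 5, 0],
    "AB": [40, 60, 20, 5],
})

# Sum first, then divide: career hits over career at-bats
totals = batting.groupby("playerID")[["H", "AB"]].sum()
totals["ba"] = totals["H"] / totals["AB"]

# For comparison: averaging the per-season ratios weights every season equally
per_row_mean = (batting["H"] / batting["AB"]).groupby(batting["playerID"]).mean()

print(totals)
print(per_row_mean)
```

For `p1` the summed version gives 30 / 100, while the mean of the two season ratios (0.25 and about 0.333) is lower, because the shorter season counts as much as the longer one.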
Perform multiple operations in a single groupby call with pandas?



By : sof.zkzhai
Date : March 29 2020, 07:55 AM
The solution is specific to your problem, but you can do this using a single groupby call. To get "avg_delay_pos", you just have to remove negative (and zero) values.
code :
df['delay_pos'] = df['delay'].where(df['delay'] > 0)

(df.filter(like='delay')
   .groupby(pd.to_datetime(df[['year', 'month', 'day']]))
   .mean()
   .add_prefix('avg_'))                                                                                                                                 

            avg_delay  avg_delay_pos
2013-01-01          0            NaN
2013-02-01         -4            NaN
2013-03-01         50           50.0
2014-01-01        -60            NaN
2014-02-01          9            9.0
2014-03-01         10           10.0
# Step by step. First, where() keeps the positive delays and turns the rest into NaN:
df['delay_pos'] = df['delay'].where(df['delay'] > 0)
# df['delay'].where(df['delay'] > 0)

0     NaN
1     NaN
2    50.0
3     NaN
4     9.0
5    10.0
Name: delay, dtype: float64
df.filter(like='delay')                                                                                                             

   delay  delay_pos
0      0        NaN
1     -4        NaN
2     50       50.0
3    -60        NaN
4      9        9.0
5     10       10.0
# `_` is the previous result in an IPython session, here df.filter(like='delay')
_.groupby(pd.to_datetime(df[['year', 'month', 'day']])).mean()

            delay  delay_pos
2013-01-01      0        NaN
2013-02-01     -4        NaN
2013-03-01     50       50.0
2014-01-01    -60        NaN
2014-02-01      9        9.0
2014-03-01     10       10.0
pd.to_datetime(df[['year', 'month', 'day']])                                                                                        

0   2013-01-01
1   2013-02-01
2   2013-03-01
3   2014-01-01
4   2014-02-01
5   2014-03-01
dtype: datetime64[ns]
# Alternatively, group on the year/month/day columns directly:
df['delay_pos'] = df['delay'].where(df['delay'] > 0)
df.groupby(['year', 'month', 'day']).mean().add_prefix('avg_').reset_index()

   year  month  day  avg_delay  avg_delay_pos
0  2013      1    1          0            NaN
1  2013      2    1         -4            NaN
2  2013      3    1         50           50.0
3  2014      1    1        -60            NaN
4  2014      2    1          9            9.0
5  2014      3    1         10           10.0
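The input frame for this answer is not shown in full, so the sketch below uses a small made-up frame (named `flights` here, an assumption) with two rows per date. That makes it visible that `mean()` skips the NaN values produced by `where()`, which is what turns `avg_delay_pos` into the average of the positive delays only.

```python
import pandas as pd

# Made-up data: two observations per date
flights = pd.DataFrame({
    "year": [2013, 2013, 2013, 2013],
    "month": [1, 1, 2, 2],
    "day": [1, 1, 1, 1],
    "delay": [10, -4, 0, -6],
})

# Non-positive delays become NaN and are ignored by mean()
flights["delay_pos"] = flights["delay"].where(flights["delay"] > 0)

# Build one datetime key from the year/month/day columns
dates = pd.to_datetime(flights[["year", "month", "day"]])

avg = (
    flights.filter(like="delay")  # keeps 'delay' and 'delay_pos'
    .groupby(dates)
    .mean()
    .add_prefix("avg_")
)
print(avg)
```

On 2013-01-01 the plain average is 3.0 while the positive-only average is 10.0; on 2013-02-01 there is no positive delay, so the positive-only average is NaN.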
Pandas GroupBy a single column and display multiple columns as value counts



By : user3163918
Date : March 29 2020, 07:55 AM
Use GroupBy.agg with value_counts and convert the result to a dict:
code :
print (df.groupby('id')[['gender', 'category']].agg(lambda x: x.value_counts().to_dict()))

# Alternative: collections.Counter counts the values of each group directly
from collections import Counter

print (df.groupby('id')[['gender', 'category']].agg(lambda x: Counter(x)))
                    gender                     category
id                                                     
1   {'Men': 2, 'Women': 1}  {'western': 2, 'formal': 1}
2             {'Women': 1}                {'formal': 1}
3   {'Women': 1, 'Men': 1}                {'casual': 2}
# To collect the raw values per group instead of counts:
print (df.groupby('id')[['gender', 'category']].agg(list))
               gender                    category
id                                               
1   [Men, Women, Men]  [western, western, formal]
2             [Women]                    [formal]
3        [Men, Women]            [casual, casual]
print (pd.concat([df.groupby('id')['gender'].value_counts(),
                  df.groupby('id')['category'].value_counts()]))

id  gender 
1   Men        2
    Women      1
2   Women      1
3   Men        1
    Women      1
1   western    2
    formal     1
2   formal     1
3   casual     2
dtype: int64
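The input frame can be reconstructed from the list output above, so here is a self-contained sketch of the dict-per-cell variant on that data. The frame name `shoppers` is an assumption made for this example.

```python
import pandas as pd

# Rows reconstructed from the agg(list) output above
shoppers = pd.DataFrame({
    "id": [1, 1, 1, 2, 3, 3],
    "gender": ["Men", "Women", "Men", "Women", "Men", "Women"],
    "category": ["western", "western", "formal", "formal", "casual", "casual"],
})

# One dict of value counts per id and per column
counts = (
    shoppers.groupby("id")[["gender", "category"]]
    .agg(lambda s: s.value_counts().to_dict())
)
print(counts)

# Individual cells are plain dicts and can be queried as such
print(counts.loc[1, "gender"]["Men"])
```

Storing dicts in cells is convenient for display, but the flat `value_counts` result from the last variant above is usually easier to filter or join later.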