Pandas Dataframe, how to group columns together in Python



By : Caleb Bertsch
Date : October 19 2020, 08:10 AM
I have a pandas DataFrame and I want to group some of the columns to build higher-level columns. One way is a groupby / concat hack:
code :
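import pandas as pd
m = {'A': 'AB', 'B': 'AB', 'C': 'CD', 'D': 'CD'}  # column -> top-level label
# group the columns by label, then concat the (label, sub-frame) pairs side by side
pd.concat(dict((*df.groupby(m, axis=1),)), axis=1)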

         AB         CD      
          A    B     C     D
Index                       
1      0.25  0.3  0.25  0.66
2      0.25  0.3  0.25  0.66
3      0.25  0.3  0.25  0.66
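
Recent pandas releases deprecate grouping with axis=1, so the hack above may emit a warning. As an alternative sketch (not the answer's method), the same two-level columns can be built by assigning a MultiIndex directly. The sample frame below is reconstructed from the output shown and is only an assumption about the original data.

```python
import pandas as pd

# Sample frame reconstructed from the printed output (assumed, not from the question).
df = pd.DataFrame(
    {"A": [0.25] * 3, "B": [0.3] * 3, "C": [0.25] * 3, "D": [0.66] * 3},
    index=pd.Index([1, 2, 3], name="Index"),
)
m = {"A": "AB", "B": "AB", "C": "CD", "D": "CD"}

# Pair every column with its top-level label and use the pairs as the new columns.
grouped = df.copy()
grouped.columns = pd.MultiIndex.from_arrays([df.columns.map(m), df.columns])

print(grouped)
```

Selecting `grouped["AB"]` then returns the sub-frame holding columns A and B.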


Python pandas: dataframe grouped by a column(such as, name), and get the value of some columns in each group



By : Tickers
Date : March 29 2020, 07:55 AM
I'll elaborate a bit more than in the comment. The problem is that extract_text can only handle individual strings. However, when you groupby and then apply, you're sending it all the strings in the group at once.
There are two solutions. The first is the one I indicated (sending individual strings):
code :
index_num = df.groupby('age')['text'].apply(lambda x: [extract_text(_) for _ in x])
The second is to rewrite extract_text so that it accepts the whole group of strings:
code :
def extract_text(list_texts):
    list_index = []
    for text in list_texts:
        # position of the last 'I' in this string, or None if there is none
        index_n = None
        for i in range(len(text)):
            if text[i] == 'I':
                index_n = i
        list_index.append(index_n)
    return list_index

index_num = df.groupby('age')['text'].apply(extract_text)
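
The first solution can be tried end to end with a small sketch. The single-string extract_text below is a hypothetical stand-in (the original function is not shown); it returns the position of the last 'I', which is what the loop version above computes. The sample data is made up.

```python
import pandas as pd

def extract_text(text):
    """Hypothetical single-string version: index of the last 'I', or None."""
    pos = text.rfind("I")
    return pos if pos != -1 else None

# Made-up sample data for illustration.
df = pd.DataFrame({"age": [20, 20, 30], "text": ["I am I", "no match", "xI"]})

# Call extract_text once per string inside each group.
index_num = df.groupby("age")["text"].apply(
    lambda texts: [extract_text(t) for t in texts]
)
print(index_num)
```

Each group ends up with one list, holding one result per string in that group.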
pandas update dataframe by another dataframe with group by columns



By : Hila Levy
Date : March 29 2020, 07:55 AM
I have two DataFrames like this. Edit to add: I think this will be more space efficient:
code :
In [22]: df1,df2 = df1.align(df2,join='left',axis=0)

In [23]: df1
Out[23]: 
   A  B   C   D  E
0  1  1   A  B0  A
1  2  1  A1  B1  A
2  3  1  A2  B2  S
3  4  1  A3  B3  S
4  5  1  A4  B4  S

In [24]: df2
Out[24]: 
     A    C    D
0    1    c   d1
1    6   c1   d1
2    9   c2   d2
3    4   c3   d3
4  NaN  NaN  NaN
In [26]: equal_rows = df1.A == df2.A

In [27]: df1.loc[equal_rows]
Out[27]: 
   A  B   C   D  E
0  1  1   A  B0  A
3  4  1  A3  B3  S

In [28]: df1.loc[equal_rows,['C','D']] = df2.loc[equal_rows,['C','D']]

In [29]: df1
Out[29]: 
   A  B   C   D  E
0  1  1   c  d1  A
1  2  1  A1  B1  A
2  3  1  A2  B2  S
3  4  1  c3  d3  S
4  5  1  A4  B4  S
In [30]: df2.dropna(how='all',axis=0, inplace=True)

In [31]: df2
Out[31]: 
   A   C   D
0  1   c  d1
1  6  c1  d1
2  9  c2  d2
3  4  c3  d3
In [13]: merged = pd.merge(df1,df2,how='left', on=['A'])

In [14]: merged
Out[14]: 
   A  B C_x D_x  E  C_y  D_y
0  1  1   A  B0  A    c   d1
1  2  1  A1  B1  A  NaN  NaN
2  3  1  A2  B2  S  NaN  NaN
3  4  1  A3  B3  S   c3   d3
4  5  1  A4  B4  S  NaN  NaN

In [15]: merged.fillna({'C_y':df1.C,'D_y':df1.D},inplace=True)
Out[15]: 
   A  B C_x D_x  E C_y D_y
0  1  1   A  B0  A   c  d1
1  2  1  A1  B1  A  A1  B1
2  3  1  A2  B2  S  A2  B2
3  4  1  A3  B3  S  c3  d3
4  5  1  A4  B4  S  A4  B4

In [16]: merged.drop(['C_x','D_x'],axis=1,inplace=True)

In [17]: merged
Out[17]: 
   A  B  E C_y D_y
0  1  1  A   c  d1
1  2  1  A  A1  B1
2  3  1  S  A2  B2
3  4  1  S  c3  d3
4  5  1  S  A4  B4
In [20]: merged.rename(columns={"C_y":'C','D_y':'D'},inplace=True)

In [21]: merged
Out[21]: 
   A  B  E   C   D
0  1  1  A   c  d1
1  2  1  A  A1  B1
2  3  1  S  A2  B2
3  4  1  S  c3  d3
4  5  1  S  A4  B4
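
As a further sketch, not taken from the answer above, DataFrame.update can do the overwrite in one step once both frames are indexed by the key column. Note the difference: this matches rows by the value of A, whereas the align approach compares A at the same row position. On the sample data both give the same result.

```python
import pandas as pd

# Frames rebuilt from the transcript above.
df1 = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [1, 1, 1, 1, 1],
    "C": ["A", "A1", "A2", "A3", "A4"],
    "D": ["B0", "B1", "B2", "B3", "B4"],
    "E": ["A", "A", "S", "S", "S"],
})
df2 = pd.DataFrame({
    "A": [1, 6, 9, 4],
    "C": ["c", "c1", "c2", "c3"],
    "D": ["d1", "d1", "d2", "d3"],
})

# Index both frames by the key, then overwrite matching cells in place.
# Keys present only in df2 (6 and 9) are ignored.
updated = df1.set_index("A")
updated.update(df2.set_index("A"))
updated = updated.reset_index()

print(updated)
```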
When using python pandas dataframe, how do you group columns?



By : Satish Jadhav
Date : March 29 2020, 07:55 AM
My input Excel (xlsx) file has repeated column names, which pandas reads as n, n.1, n.2 and so on. IIUC, split each column name on '.' and group on the first part:
code :
df.groupby(df.columns.str.split('.').str[0], axis=1).sum()
Result:
   g_1  g_2  mz   n
0   13   24   1  14
1   13   24   1  14
2   13   24   1  14
Input:
   mz  n  n.1  n.2  n.3  g_1  g_1.1  g_2  g_2.1  g_2.2
0   1  2    3    4    5    6      7    8      8      8
1   1  2    3    4    5    6      7    8      8      8
2   1  2    3    4    5    6      7    8      8      8
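
Since newer pandas versions deprecate groupby(..., axis=1), here is a hedged sketch of the same idea that transposes first and groups the rows instead. It uses the input frame shown above; the transpose workaround is a swapped-in technique, not part of the original answer.

```python
import pandas as pd

# Input frame as printed above.
df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6, 7, 8, 8, 8]] * 3,
    columns=["mz", "n", "n.1", "n.2", "n.3", "g_1", "g_1.1", "g_2", "g_2.1", "g_2.2"],
)

# Part of each column name before the first '.', e.g. 'n.2' -> 'n'.
keys = df.columns.str.split(".").str[0]

# Transpose so columns become rows, group those rows by prefix, sum, transpose back.
result = df.T.groupby(keys.to_numpy()).sum().T

print(result)
```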
How to perform group by on multiple columns in a pandas dataframe and predict future values using fbProphet in python?



By : Nikhil Jadhav
Date : March 29 2020, 07:55 AM
I am trying to aggregate (sum) the Amount column of my DataFrame by Date and Group. I was able to aggregate the column successfully, but I am not sure how to pass the result to fbprophet to predict future values for each Date and Group. Note: I am a beginner in Python, so please provide an explanation with the code.
You're suffering from a few problems. First, the Amount values need to be numeric:
code :
'Amount':[float(n) for n in ['12.1','13.2','15.1','10.7','12.9','9.0','5.6','6.7','4.3','2.3','4.0','5.6','7.8','2.3','5.6','8.9']]}
>>> grouped.reset_index(inplace=True)
>>> grouped.dtypes
Group     object
Date      object
Amount   float64
dtype: object
# Date is still object dtype (strings), so convert it to real timestamps
import datetime as dt

def to_timestamp(day: str):
    return dt.datetime.strptime(day, '%Y-%m-%d')

grouped['Date'] = grouped.Date.apply(to_timestamp)
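
To connect this to the forecasting step, here is a minimal sketch of how the aggregated frame could be split into one frame per Group with the two columns Prophet expects, ds (dates) and y (values). The sample values are made up, pd.to_datetime is used in place of the helper above, and the model fitting itself is left out.

```python
import pandas as pd

# Made-up aggregated data standing in for `grouped` (assumption).
grouped = pd.DataFrame({
    "Group": ["A", "A", "B", "B"],
    "Date": ["2019-01-01", "2019-01-02", "2019-01-01", "2019-01-02"],
    "Amount": [12.1, 13.2, 5.6, 6.7],
})

# Vectorised alternative to applying strptime row by row.
grouped["Date"] = pd.to_datetime(grouped["Date"], format="%Y-%m-%d")

# One frame per Group, renamed to the ds / y column names Prophet works with.
frames = {
    name: g.rename(columns={"Date": "ds", "Amount": "y"})[["ds", "y"]].reset_index(drop=True)
    for name, g in grouped.groupby("Group")
}

for name, frame in frames.items():
    print(name)
    print(frame)
```

Each entry of frames could then be passed to its own model, one forecast per Group.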
How to group by a pandas.Dataframe's columns based on the indexes and values of another pandas.Series?



By : user3551437
Date : March 29 2020, 07:55 AM
After further research, I have found a solution. Here's what I came up with using pandas.Index.map on the DataFrame's columns:
code :
def sum_weights_by_classification_labels(security_weights, security_classification):

    classification_weights = security_weights.copy()
    classification_weights.columns = classification_weights.columns.map(security_classification)
    classification_weights = classification_weights.groupby(classification_weights.columns, axis=1).sum()

    return classification_weights

A variant that transposes and merges on the classification Series instead:
code :
def sum_weights_by_classification_labels(security_weights, security_classification):

    security_weights_transposed = security_weights.transpose()
    merged_data = security_weights_transposed.merge(security_classification, how='left', left_index=True, 
                                                    right_index=True)
    classification_weights = merged_data.groupby(security_classification.name).sum().transpose()

    return classification_weights
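
A small usage sketch of the mapping idea follows. The tickers, sector labels and weights are hypothetical, and the grouping is done on the transposed frame because newer pandas versions deprecate groupby(..., axis=1).

```python
import pandas as pd

# Hypothetical weights per security (columns) over two dates (rows).
security_weights = pd.DataFrame({
    "AAPL": [0.5, 0.25],
    "MSFT": [0.25, 0.25],
    "XOM": [0.25, 0.5],
})
# Hypothetical classification: security -> sector label.
security_classification = pd.Series(
    {"AAPL": "Tech", "MSFT": "Tech", "XOM": "Energy"}, name="sector"
)

# Translate every column name into its classification label.
labels = security_weights.columns.map(security_classification)

# Sum the weights of all securities sharing a label.
classification_weights = security_weights.T.groupby(labels.to_numpy()).sum().T

print(classification_weights)
```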