
Pandas: merge data frame but summing overlapping columns


By : Siju Mohan MM
Date : October 14 2020, 02:21 PM
Here's an attempt. Please leave a comment if I understood correctly.
Given:
code :
>>> import numpy as np
>>> import pandas as pd
>>> df1
Month  Dec  Nov
ID             
XXX    4.0  1.0
YYY    8.0  3.0
ZZZ    4.0  1.0
>>> df2                                                                                                                
Month  Dec  Nov  Oct
ID                  
AAA    1.0  7.0  9.0
BBB    0.0  NaN  2.0
YYY    5.0  5.0  0.0
>>> pd.concat([df1, df2]).reset_index().groupby('ID', sort=False).sum(min_count=1)
      Dec  Nov  Oct
ID                 
XXX   4.0  1.0  NaN
YYY  13.0  8.0  0.0
ZZZ   4.0  1.0  NaN
AAA   1.0  7.0  9.0
BBB   0.0  NaN  2.0
>>> cat = pd.concat([df1, df2])                                                                                        
>>> cat                                                                                                                
     Dec  Nov  Oct
ID                
XXX  4.0  1.0  NaN
YYY  8.0  3.0  NaN
ZZZ  4.0  1.0  NaN
AAA  1.0  7.0  9.0
BBB  0.0  NaN  2.0
YYY  5.0  5.0  0.0
>>> cat = cat.reset_index()                                                                                            
>>> cat                                                                                                                
    ID  Dec  Nov  Oct
0  XXX  4.0  1.0  NaN
1  YYY  8.0  3.0  NaN
2  ZZZ  4.0  1.0  NaN
3  AAA  1.0  7.0  9.0
4  BBB  0.0  NaN  2.0
5  YYY  5.0  5.0  0.0
>>> cat.groupby('ID', sort=False).size()                                                                               
ID
XXX    1
YYY    2
ZZZ    1
AAA    1
BBB    1
dtype: int64
>>> cat.groupby('ID', sort=False).sum(min_count=1)                                                      
      Dec  Nov  Oct
ID                 
XXX   4.0  1.0  NaN
YYY  13.0  8.0  0.0
ZZZ   4.0  1.0  NaN
AAA   1.0  7.0  9.0
BBB   0.0  NaN  2.0
>>> s = pd.Series([np.nan, np.nan])                                                                                    
>>> s                                                                                                                  
0   NaN
1   NaN
dtype: float64
>>>                                                                                                                    
>>> s.sum()                                                                                                            
0.0
>>> s.sum(min_count=1)                                                                                                 
nan
>>> s[0] = 1                                                                                                           
>>> s                                                                                                                  
0    1.0
1    NaN
dtype: float64
>>> s.sum()                                                                                                            
1.0
>>> s.sum(min_count=1)                                                                                                 
1.0
>>> s.sum(min_count=2)                                                                                                 
nan



Summing values of a pandas data frame given a list of columns


By : Jia-Long Johnny Yeh
Date : March 29 2020, 07:55 AM
You can select a subset of the columns of df and sum it:
code :
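import pandas as pd

df = pd.DataFrame({'x1': [1, 3, 1], 'x2': [2, 4, 2], 'x3': [3, 5, 3],
                   'x4': [4, 6, 6], 'x5': [5, 3, 1], 'x6': [6, 3, 2]})
print(df)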
   x1  x2  x3  x4  x5  x6
0   1   2   3   4   5   6
1   3   4   5   6   3   3
2   1   2   3   6   1   2

print(df[['x1', 'x3', 'x4']])
   x1  x3  x4
0   1   3   4
1   3   5   6
2   1   3   6

li = ['x1', 'x3', 'x4']
print(df[li])
   x1  x3  x4
0   1   3   4
1   3   5   6
2   1   3   6

print(df[li].sum())           # column totals
x1     5
x3    11
x4    16
dtype: int64

print(df[li].sum(axis=1))     # row totals
0     8
1    14
2    10
dtype: int64
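If the row-wise total should be kept as a new column, the same selection can be wrapped in a helper. A minimal sketch; add_row_total and the default column name 'total' are hypothetical, not part of pandas, and the helper returns a copy rather than modifying df.

```python
import pandas as pd


def add_row_total(df: pd.DataFrame, cols: list[str], name: str = 'total') -> pd.DataFrame:
    # hypothetical helper: copy of df with the row-wise sum of `cols` in a new column
    return df.assign(**{name: df[cols].sum(axis=1)})


df = pd.DataFrame({'x1': [1, 3, 1], 'x2': [2, 4, 2], 'x3': [3, 5, 3],
                   'x4': [4, 6, 6], 'x5': [5, 3, 1], 'x6': [6, 3, 2]})
li = ['x1', 'x3', 'x4']

column_totals = df[li].sum()          # one value per column: x1=5, x3=11, x4=16
with_total = add_row_total(df, li)    # new 'total' column: 8, 14, 10
print(column_totals)
print(with_total)
```

Selecting a name that is not a column raises a KeyError, and sum skips NaN values by default.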

Pandas Data Frame how to merge columns


By : Wildfly
Date : March 29 2020, 07:55 AM
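This is not possible.
The data underlying a pandas.DataFrame is stored in NumPy arrays, which do not group values the way you suggest, so an arbitrary column cannot be displayed as grouped data. What you can do is move the columns you want grouped into the index: when a MultiIndex is printed, repeated labels in the outer levels are blanked out, but the innermost level is shown on every row, which is why the second example adds a dummy level.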
code :
import pandas as pd

df = pd.DataFrame([['AAA', 8, 2, 'BBB'],
                   ['AAA', 9, 5, 'BBB'],
                   ['AAA', 10, 6, 'BBB']],
                  columns=['Name', 'Score1', 'Score2', 'PM'])

res = df.set_index(['Name', 'PM'])
print(res)
          Score1  Score2
Name PM                 
AAA  BBB       8       2
     BBB       9       5
     BBB      10       6
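
# the innermost index level is printed on every row, so add a constant
# 'dummy' level to let 'PM' be collapsed as well
df['dummy'] = 0
res = df.set_index(['Name', 'PM', 'dummy'])
print(res)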
                Score1  Score2
Name PM  dummy                
AAA  BBB 0           8       2
         0           9       5
         0          10       6
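The same grouped view can be built without adding a column to df itself, because assign returns a copy. A minimal sketch; grouped_view is a hypothetical helper name. display.multi_sparse is the pandas option that controls the blanking of repeated labels; it is on by default, so the option_context only makes that dependency explicit.

```python
import pandas as pd


def grouped_view(df: pd.DataFrame, keys: list[str]) -> pd.DataFrame:
    # hypothetical helper: put `keys` plus a constant innermost level into the
    # index, so every key level is an outer level and repeats are blanked out
    return df.assign(dummy=0).set_index([*keys, 'dummy'])


df = pd.DataFrame([['AAA', 8, 2, 'BBB'],
                   ['AAA', 9, 5, 'BBB'],
                   ['AAA', 10, 6, 'BBB']],
                  columns=['Name', 'Score1', 'Score2', 'PM'])

view = grouped_view(df, ['Name', 'PM'])
with pd.option_context('display.multi_sparse', True):
    print(view)

# the grouping is display-only: resetting the index recovers the original rows
restored = view.reset_index().drop(columns='dummy')
```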

Merge two columns into one within the same data frame in pandas/python


By : lio
Date : March 29 2020, 07:55 AM
I have a question about merging two columns into one in the same dataframe (start_end) while also removing the null values. I intend to merge 'Start station' and 'End station' into 'station', and keep 'Duration' aligned with the new 'station' column. I have tried pd.merge, pd.concat and append, but I cannot work it out.
code :
>>> df
   Duration      End station                         Start station
0      1407              NaN                        14th & V St NW
1       509              NaN                        21st & I St NW
2       638  15th & P St NW.                                   NaN
3      1532              NaN  Massachusetts Ave & Dupont Circle NW
4       759              NaN           Adams Mill & Columbia Rd NW
>>> df.columns = df.columns.str.replace('.*?station', 'station', regex=True)
>>> df
   Duration          station                               station
0      1407              NaN                        14th & V St NW
1       509              NaN                        21st & I St NW
2       638  15th & P St NW.                                   NaN
3      1532              NaN  Massachusetts Ave & Dupont Circle NW
4       759              NaN           Adams Mill & Columbia Rd NW
>>> s = df.stack()
>>> s
0  Duration                                    1407
   station                           14th & V St NW
1  Duration                                     509
   station                           21st & I St NW
2  Duration                                     638
   station                          15th & P St NW.
3  Duration                                    1532
   station     Massachusetts Ave & Dupont Circle NW
4  Duration                                     759
   station              Adams Mill & Columbia Rd NW
dtype: object
>>> df = s.unstack()
>>> df
  Duration                               station
0     1407                        14th & V St NW
1      509                        21st & I St NW
2      638                       15th & P St NW.
3     1532  Massachusetts Ave & Dupont Circle NW
4      759           Adams Mill & Columbia Rd NW
>>> 
>>> # without changing column names
>>> s.index
MultiIndex(levels=[[0, 1, 2, 3, 4], ['Duration', 'End station', 'Start station']],
           labels=[[0, 0, 1, 1, 2, 2, 3, 3, 4, 4], [0, 2, 0, 2, 0, 1, 0, 2, 0, 2]])

>>> # column names the same
>>> s.index
MultiIndex(levels=[[0, 1, 2, 3, 4], ['Duration', 'station']],
           labels=[[0, 0, 1, 1, 2, 2, 3, 3, 4, 4], [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]])
>>> # alternative approach, starting again from the original df (columns not renamed)
>>> stations = pd.concat([df.iloc[:, 1], df.iloc[:, 2]]).dropna()
>>> stations.name = 'stations'
>>> stations
2                         15th & P St NW.
0                          14th & V St NW
1                          21st & I St NW
3    Massachusetts Ave & Dupont Circle NW
4             Adams Mill & Columbia Rd NW
Name: stations, dtype: object

>>> df2 = pd.concat([df['Duration'], stations], axis=1)
>>> df2
   Duration                              stations
0      1407                        14th & V St NW
1       509                        21st & I St NW
2       638                       15th & P St NW.
3      1532  Massachusetts Ave & Dupont Circle NW
4       759           Adams Mill & Columbia Rd NW
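A shorter route, not taken in the answer above, is to coalesce the two columns with Series.combine_first, which fills the missing values of one Series from another Series with the same index. This sketch assumes each row has at most one non-null station, as in the sample data. If both were filled, 'Start station' would win here, whereas the stack/unstack route would raise an error on the duplicate (row, 'station') entries.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Duration': [1407, 509, 638, 1532, 759],
    'End station': [np.nan, np.nan, '15th & P St NW.', np.nan, np.nan],
    'Start station': ['14th & V St NW', '21st & I St NW', np.nan,
                      'Massachusetts Ave & Dupont Circle NW',
                      'Adams Mill & Columbia Rd NW'],
})

# take 'Start station' where present, otherwise fall back to 'End station'
station = df['Start station'].combine_first(df['End station'])
out = df[['Duration']].assign(station=station)
print(out)
```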

Pandas data frame merge select columns


By : user2376966
Date : March 29 2020, 07:55 AM
I think you need to select only the no column from df1; then the on and how parameters are not necessary, because merge joins on the common columns with an inner join by default:
code :
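# inner join on the only common column, 'no'
resDF = pd.merge(df1[['no']], df2)

# or, without a merge: keep the rows of df2 whose 'no' appears in df1
resDF = df2[df2['no'].isin(df1['no'])]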
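The two lines are not always interchangeable. A small sketch with made-up frames (the column names other and val are hypothetical) shows the difference when df1 repeats a key: the merge returns one row per matching pair, while the isin filter returns each matching df2 row once and keeps df2's original index.

```python
import pandas as pd

# made-up data; df1 deliberately contains no=2 twice
df1 = pd.DataFrame({'no': [1, 2, 2], 'other': ['p', 'q', 'r']})
df2 = pd.DataFrame({'no': [1, 2, 3], 'val': ['a', 'b', 'c']})

# merge on the only shared column; how defaults to 'inner'
merged = pd.merge(df1[['no']], df2)
# boolean filter: each matching df2 row appears exactly once
filtered = df2[df2['no'].isin(df1['no'])]

print(merged)    # 3 rows: the no=2 row of df2 appears twice
print(filtered)  # 2 rows, original index 0 and 1
```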

Overlapping density plots of multiple pandas data frame columns


By : user2822827
Date : March 29 2020, 07:55 AM
I think I understood your question. Here's how I would do it in matplotlib:
code :
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

col1 = np.random.normal(0, 1, (1000, ))
col2 = np.random.normal(1, 1, (1000, ))
col3 = np.random.normal(-1, 1, (1000, ))
df = pd.DataFrame({'col1':col1, 'col2':col2, 'col3':col3})

df['col1_bins'] = pd.cut(df['col1'], bins=np.arange(-10, 11, 0.5))
df['col2_bins'] = pd.cut(df['col2'], bins=np.arange(-10, 11, 0.5))
df['col3_bins'] = pd.cut(df['col3'], bins=np.arange(-10, 11, 0.5))

# count the values in each bin; observed=False keeps empty bins as zero counts
col1_counts = df[['col1_bins', 'col1']].groupby(['col1_bins'], observed=False).count().reset_index()
col2_counts = df[['col2_bins', 'col2']].groupby(['col2_bins'], observed=False).count().reset_index()
col3_counts = df[['col3_bins', 'col3']].groupby(['col3_bins'], observed=False).count().reset_index()

# one line per column, using the bin labels as x positions
plt.plot(col1_counts['col1_bins'].astype(str), col1_counts['col1'], 'r')
plt.plot(col2_counts['col2_bins'].astype(str), col2_counts['col2'], 'b')
plt.plot(col3_counts['col3_bins'].astype(str), col3_counts['col3'], 'g')
plt.show()
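As an alternative to counting per pd.cut bin and plotting the bin labels as strings, np.histogram with density=True returns per-bin densities directly, and plotting them against numeric bin centres keeps the x axis to scale. This is a swapped-in technique and only a sketch, not the answer's method; binned_density is a hypothetical helper, and the seeded generator is there only for reproducibility.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def binned_density(values, edges):
    # hypothetical helper: histogram normalised so the area under it is 1
    density, _ = np.histogram(values, bins=edges, density=True)
    centers = (edges[:-1] + edges[1:]) / 2
    return centers, density


rng = np.random.default_rng(0)
df = pd.DataFrame({'col1': rng.normal(0, 1, 1000),
                   'col2': rng.normal(1, 1, 1000),
                   'col3': rng.normal(-1, 1, 1000)})
edges = np.arange(-10, 11, 0.5)

fig, ax = plt.subplots()
for name, color in zip(df.columns, ['r', 'b', 'g']):
    centers, density = binned_density(df[name], edges)
    ax.plot(centers, density, color, label=name)
ax.legend()
plt.show()
```

For smooth curves instead of binned ones, DataFrame.plot.kde() draws a kernel density estimate per column, but it requires SciPy.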
© voile276.org