Python dataframe combining rows where first two column values inverse



By : stb5573
Date : November 20 2020, 03:01 PM
I'm new to Python and would very much appreciate your help! I have a dataframe with three columns and would like to combine the rows where the first two columns hold the same pair of values regardless of order (i.e. whether a value sits in column A or column B doesn't matter in this situation) and sum their values in the third column. For example, starting with this dataframe: [not shown]

You can sort the values of columns A and B within each row and then use groupby:
code :
import numpy as np
# Sort each (A, B) pair within its row so that (y, x) and (x, y) share one key
df[['A','B']] = np.sort(df[['A','B']].values, axis=1)
df.groupby(['A', 'B']).C.sum().reset_index()


    A   B   C
0   x   y   9
1   y   z   5
2   z   z   6
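
A self-contained sketch of the same approach. The input frame below is hypothetical (the original example data is not shown); its values were chosen only so that the result matches the output above.

```python
import numpy as np
import pandas as pd

# Hypothetical input: (x, y) and (y, x) should count as the same pair
df = pd.DataFrame({
    'A': ['x', 'y', 'y', 'z'],
    'B': ['y', 'x', 'z', 'z'],
    'C': [2, 7, 5, 6],
})

# Sort the two key columns within each row, then group and sum C
df[['A', 'B']] = np.sort(df[['A', 'B']].to_numpy(), axis=1)
result = df.groupby(['A', 'B'], as_index=False)['C'].sum()
print(result)
```

Assigning the sorted NumPy array directly (rather than wrapping it in a new DataFrame) should avoid any index alignment surprises when df does not have a default integer index.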


Combining a dataframe with multiple rows to another dataframe with a single row based on a column row



By : Hye-Ryun Cho
Date : March 29 2020, 07:55 AM
I have a dataframe, say: [not shown]

If you are trying to use merge, you can do a left join like this:
code :
df1.merge(df2, how='left', on='A')
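
A minimal sketch of what the left join does. The frames and the column names B and C are hypothetical, since the original data is not shown; only the key column A comes from the answer.

```python
import pandas as pd

# Hypothetical data: df1 has several rows per key, df2 has a single row
df1 = pd.DataFrame({'A': ['k1', 'k1', 'k2'], 'B': [1, 2, 3]})
df2 = pd.DataFrame({'A': ['k1'], 'C': [10]})

# Every row of df1 is kept; rows without a match in df2 get NaN in C
merged = df1.merge(df2, how='left', on='A')
print(merged)
```

The single row of df2 is repeated for each matching row of df1, which is usually what is wanted when attaching one-row metadata to many rows.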
Python inverse/negative/non-present values in pandas dataframe column



By : Sang Nguyen
Date : March 29 2020, 07:55 AM
Reshape to wide format, mask the cells of existing pairs as NaN, and reshape back; stacking drops the NaN cells, so only the missing (col1, col2) pairs remain:
code :
(pd.crosstab(df.col1, df.col2)
   .where(lambda x: x == 0)
   .stack().reset_index()
   .drop(columns=0))

#   col1  col2
#0     A     3
#1     A     4
#2     A     5
#3     A     6
#4     B     1
#5     B     2
#6     B     4
#7     B     5
#8     B     6
#9     C     1
#10    C     2
#11    C     3

# Alternative: build the full product of unique values and subtract the existing pairs
idx_full = pd.MultiIndex.from_product([df.col1.unique(), df.col2.unique()])
idx_now = pd.MultiIndex.from_tuples(df.values.tolist())
pd.DataFrame(idx_full.difference(idx_now).tolist(), columns=df.columns)

# The same idea as a function that works for any number of columns
def anti_complete(df):
    idx_full = pd.MultiIndex.from_product([df[col] for col in df.columns])
    idx_now = pd.MultiIndex.from_tuples(df.values.tolist())
    return pd.DataFrame(idx_full.difference(idx_now).tolist(), columns=df.columns)

print(anti_complete(df))
#   col1  col2
#0     A     3
#1     A     4
#2     A     5
#3     A     6
#4     B     1
#5     B     2
#6     B     4
#7     B     5
#8     B     6
#9     C     1
#10    C     2
#11    C     3
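
A runnable sketch of the MultiIndex approach. The sample frame is an assumption, inferred from the printed result above (the original input is not shown), and MultiIndex.from_frame is used here in place of from_tuples.

```python
import pandas as pd

# Assumed sample data, inferred from the printed result above
df = pd.DataFrame({'col1': ['A', 'A', 'B', 'C', 'C', 'C'],
                   'col2': [1, 2, 3, 4, 5, 6]})

# All possible (col1, col2) combinations
idx_full = pd.MultiIndex.from_product([df['col1'].unique(), df['col2'].unique()])
# Combinations that actually occur in the data
idx_now = pd.MultiIndex.from_frame(df)

# Pairs in the full product that are absent from the data
missing = pd.DataFrame(idx_full.difference(idx_now).tolist(), columns=df.columns)
print(missing)
```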
Python Pandas : compare the rows of two csv(dataframe) for similar values along one column and return content of rows (c



By : Ankit Jadav
Date : March 29 2020, 07:55 AM
This can be done with the iterrows() function: for each row of df2, find the row of df1 whose time is closest and copy its vel and yaw values.
Here is the code:
code :
import pandas as pd

value=[(0,11,10,20),(1,22,11,21),(2,33,12,22),(3,44,13,23),(4,55,14,24),
(5,66,15,25),(6,77,16,26),(7,88,17,27),(8,99,18,28)]
header=["index","time","vel","yaw"]
df1 = pd.DataFrame.from_records(value, columns=header)
value=[(0,67,"nan","nan"),(1,75,"nan","nan"),(2,87,"nan","nan"),
(3,99,"nan","nan")]
header=["index","time","vel","yaw"]
df2 = pd.DataFrame.from_records(value, columns=header)
for index, row in df2.iterrows():
    # smallest time difference seen so far (named to avoid shadowing the built-in min)
    min_diff = float('inf')
    for indexer, rows in df1.iterrows():
        diff = abs(row['time']-rows['time'])
        if diff < min_diff:
            min_diff = diff
            # store the position of the closest df1 row so far
            pos = indexer
    # copy the values from the closest df1 row
    df2.loc[index,'vel'] = df1.loc[pos,'vel']
    df2.loc[index,'yaw'] = df1.loc[pos,'yaw']
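
For larger frames the nested iterrows() loops can get slow. As an alternative sketch (not the method used above), pd.merge_asof with direction="nearest" performs the same closest-time match without an explicit loop. It assumes both frames are sorted by time and that the time columns share a dtype.

```python
import pandas as pd

header = ["index", "time", "vel", "yaw"]
df1 = pd.DataFrame.from_records(
    [(0, 11, 10, 20), (1, 22, 11, 21), (2, 33, 12, 22), (3, 44, 13, 23),
     (4, 55, 14, 24), (5, 66, 15, 25), (6, 77, 16, 26), (7, 88, 17, 27),
     (8, 99, 18, 28)],
    columns=header,
)
# Only the columns of df2 that carry information; vel and yaw are filled by the merge
df2 = pd.DataFrame({"index": [0, 1, 2, 3], "time": [67, 75, 87, 99]})

# merge_asof requires both inputs to be sorted by the key column
matched = pd.merge_asof(
    df2.sort_values("time"),
    df1[["time", "vel", "yaw"]].sort_values("time"),
    on="time",
    direction="nearest",
)
print(matched)
```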
Combining numeric column values in large pandas DataFrame for duplicate rows without combining string



By : user2004462
Date : March 29 2020, 07:55 AM
Filter only the Standard rows with boolean indexing, sum their cost, and build the new DataFrame with the constructor:
code :
a = df.loc[df['account_type'] == 'Standard', 'cost'].sum()
print (a)
2.0

df = pd.DataFrame([['Standard', a]], columns=['account_type',  'cost'])
print (df)
  account_type  cost
0     Standard   2.0
df = pd.DataFrame([['Standard', df['cost'].sum()]], columns=['account_type',  'cost'])
df = pd.DataFrame([
['Standard1', 0.2],
['Standard1', 0.3],
['Standard1', 0.2],
['Standard2', 0.4],
['Standard2', 0.6],
['Standard', 0.3]], columns=['account_type',  'cost'])

print (df)
  account_type  cost
0    Standard1   0.2
1    Standard1   0.3
2    Standard1   0.2
3    Standard2   0.4
4    Standard2   0.6
5     Standard   0.3

df1 = df.groupby('account_type', as_index=False)['cost'].sum()
print (df1)
  account_type  cost
0     Standard   0.3
1    Standard1   0.7
2    Standard2   1.0
df = pd.DataFrame({
         'account_type':['Standard'] * 5 + ['another val'],
         'B':[4,5,4,5,5,4],
         'C':[7,8,9,4,2,3],
         'D':[1,3,5,7,1,0],
         'E':[5,3,6,9,2,4],
         'F':list('aaabbb')
})

print (df)
  account_type  B  C  D  E  F
0     Standard  4  7  1  5  a
1     Standard  5  8  3  3  a
2     Standard  4  9  5  6  a
3     Standard  5  4  7  9  b
4     Standard  5  2  1  2  b
5  another val  4  3  0  4  b

cols = df.select_dtypes(np.number).columns
s = df.loc[df['account_type'] == 'Standard', cols].sum()
print (s)
B    23
C    30
D    17
E    25
dtype: int64

df1 = s.to_frame().T
df1.insert(0, 'account_type', 'Standard')
print (df1)
  account_type   B   C   D   E
0     Standard  23  30  17  25
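
If totals are wanted for every account_type at once rather than only for Standard, one possible variant (a sketch, not part of the answer above) is to group and sum with numeric_only=True, which leaves the string column F out of the result:

```python
import pandas as pd

df = pd.DataFrame({
    'account_type': ['Standard'] * 5 + ['another val'],
    'B': [4, 5, 4, 5, 5, 4],
    'C': [7, 8, 9, 4, 2, 3],
    'D': [1, 3, 5, 7, 1, 0],
    'E': [5, 3, 6, 9, 2, 4],
    'F': list('aaabbb'),
})

# Sum only the numeric columns per account_type; string columns are skipped
totals = df.groupby('account_type', as_index=False).sum(numeric_only=True)
print(totals)
```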
Combining duplicate dataframe rows with concatenating values for a specific column



By : user2852013
Date : March 29 2020, 07:55 AM
In my opinion the simplest approach is to sort the values inside the join, so that value_counts works correctly ('a b' and 'b a' are then counted as the same combination):
code :
df2 = df.groupby('id')['words'].apply(lambda x: ' '.join(sorted(x))).reset_index()
print (df2)
  id words
0  1   a b
1  2     b
2  3   a c
3  4   a b
4  6   a c

print (df2.words.value_counts())
a b    2
a c    2
b      1
Name: words, dtype: int64
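
A self-contained sketch of the same idea. The input frame is hypothetical (the original data is not shown); it was chosen so that the joined strings match the output above.

```python
import pandas as pd

# Hypothetical input: the words for each id appear in varying order
df = pd.DataFrame({
    'id':    [1, 1, 2, 3, 3, 4, 4, 6, 6],
    'words': ['b', 'a', 'b', 'c', 'a', 'a', 'b', 'a', 'c'],
})

# Sorting before joining makes 'b a' and 'a b' collapse to the same string
df2 = df.groupby('id')['words'].apply(lambda x: ' '.join(sorted(x))).reset_index()
counts = df2['words'].value_counts()
print(df2)
print(counts)
```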