Pandas DataFrame: programmatically splitting the rows of a dataframe on conditions over multiple columns


By : Allan Breck
Date : October 18 2020, 08:10 AM
After several attempts, I managed to achieve my goal.
Here is the code:
code :
import pandas
import numpy

# assume dataframe exists
df = ...
# list_cols (the names of the columns to test) is assumed to exist as well

# initialize an array of False, one entry per row of df
resulting_bools = numpy.zeros(len(df.index), dtype=bool)

for col in list_cols:
    # obtain an array of booleans for the given column: True where the value is negative
    # same condition for each column, different conditions would have been more difficult (for me)
    criterion = (df[col] < 0).to_numpy()

    # perform cumulative boolean evaluation (logical OR) across columns
    resulting_bools |= criterion

# use the array of booleans to build the required dataframes
# .copy() avoids possible SettingWithCopyWarning from pandas if the results are modified later
negative_values_matches = df[resulting_bools].copy()
positive_values_matches = df[~resulting_bools].copy()
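
The same mask can also be built without an explicit loop. This is a minimal sketch, not the author's code: the small frame and the contents of `list_cols` are made up for illustration. The comparison is applied to all selected columns at once, and `any(axis=1)` takes the place of the cumulative `|=`.

```python
import pandas as pd

# Hypothetical data; the frame from the original question is not shown.
df = pd.DataFrame({"a": [1, -2, 3, 4],
                   "b": [5, 6, -7, 8],
                   "c": [-1, -1, -1, -1]})
list_cols = ["a", "b"]  # assumed: the columns the condition applies to

# True for every row that has a negative value in at least one listed column
mask = (df[list_cols] < 0).any(axis=1)

negative_values_matches = df[mask].copy()
positive_values_matches = df[~mask].copy()

print(negative_values_matches)
print(positive_values_matches)
```

Column `c` is negative everywhere but is not in `list_cols`, so it does not influence the split.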



Pandas: Update Multiple Dataframe Columns Using Duplicate Rows From Another Dataframe


By : El ouardi Mohamed
Date : March 29 2020, 07:55 AM
I think that combine_first would be an elegant solution, as per JohnE, provided you set 'Display Name' as the index. This brings me to another point: I think your task is well defined only if each 'Display Name' corresponds to exactly one set of attributes within each table. Assuming that, you can drop duplicates, set the index, and use .update like so:
code :
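# remove duplicate rows from df1
df1 = df1.drop_duplicates()

# align both frames on 'Display Name'
df1 = df1.set_index('Display Name')
df2 = df2.set_index('Display Name')

# keep the original df2, because the next update modifies df2 in place
df2_c = df2.copy()

df2.update(df1)    # overwrite df2 with the non-NA values of df1
df1.update(df2_c)  # overwrite df1 with the non-NA values of the original df2

del df2_c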
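
For comparison, here is a minimal sketch of the combine_first route mentioned above. The two tables and the 'Team' column are hypothetical; only 'Display Name' comes from the question. Unlike .update, which changes the caller in place and only touches labels it already has, combine_first returns a new frame whose index is the union of both.

```python
import pandas as pd

# Hypothetical tables; 'Team' is a made-up attribute column.
df1 = pd.DataFrame({"Display Name": ["A", "B"],
                    "Team": ["red", None]}).set_index("Display Name")
df2 = pd.DataFrame({"Display Name": ["B", "C"],
                    "Team": ["blue", "green"]}).set_index("Display Name")

# Values from df1 take precedence; missing values in df1 are filled from df2
merged = df1.combine_first(df2)
print(merged)
```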

How to read two rows and two columns of a pandas dataframe at once and apply conditions on those rows/column values?


By : CharliePS
Date : March 29 2020, 07:55 AM
Here is one way to achieve the result. This function uses shift, concat, and apply to pass each pair of consecutive rows to a function that does the prod/sum thing based on matching _index values.
code :
import itertools as it

def crazy_prod_sum_thing(frame):
    # get the labels which do not end with _index
    labels = [(l, l + '_index')
              for l in frame.columns.values if not l.endswith('_index')]

    def func(row):
        # get row n and row n-1
        front = row[:len(row) >> 1]
        back = row[len(row) >> 1:]

        # loop through the labels
        results = []
        for l, i in labels:
            x = front[l].split(',')
            y = back[l].split(',')
            if front[i] == back[i]:
                results.append(x[0] + y[0] + ',' + x[1] + x[1])
            else:
                results.append(
                    ','.join([x1 + y1 for x1, y1 in it.product(x, y)]))

        return pd.Series(results)

    # take this function and apply it to pandas dataframe:
    df = pd.concat([frame, frame.shift(1)], axis=1)[1:].apply(
        func, axis=1)

    df.rename(columns={i: x[0] + '_cpst' for i, x in enumerate(labels)},
              inplace=True)
    return pd.concat([frame, df], axis=1)

# test code
import pandas as pd
from io import StringIO

data = [x.strip() for x in """
      alfa  alfa_index beta  beta_index delta  delta_index
    0  a,b          23  c,d          36   a,c           32
    1  a,c          23  b,e          37   c,d           32
    2  g,h          28  d,f          37   e,g           32
    3  a,b          28  c,d          39   a,c           34
    4  c,e          28  b,g          39   d,k           34
""".split('\n')[1:-1]]
# raw string for the regex separator avoids an invalid-escape warning
df = pd.read_csv(StringIO(u'\n'.join(data)), sep=r'\s+')
print(df)

print(crazy_prod_sum_thing(df))

output :
  alfa  alfa_index beta  beta_index delta  delta_index
0  a,b          23  c,d          36   a,c           32
1  a,c          23  b,e          37   c,d           32
2  g,h          28  d,f          37   e,g           32
3  a,b          28  c,d          39   a,c           34
4  c,e          28  b,g          39   d,k           34

1          [aa,cc, bc,bd,ec,ed, ca,dd]
2          [ga,gc,ha,hc, db,ff, ec,gg]
3    [ag,bb, cd,cf,dd,df, ae,ag,ce,cg]
4                [ca,ee, bc,gg, da,kk]
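
The pairing step that the function relies on (shift plus concat) can be looked at in isolation. This is a minimal sketch with a made-up two-column frame; it also adds a `_prev` suffix to the shifted columns, which the answer's code does not do (it splits the combined row by position instead).

```python
import pandas as pd

# Hypothetical frame with one value column and its index column
frame = pd.DataFrame({"val": ["a", "b", "c"], "val_index": [1, 1, 2]})

# Put row n and row n-1 side by side; the first row has no predecessor and is dropped
paired = pd.concat([frame, frame.shift(1).add_suffix("_prev")], axis=1).iloc[1:]

# True where a row and its predecessor share the same index value
same_index = paired["val_index"] == paired["val_index_prev"]

print(paired)
print(same_index)
```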

Split/explode cells into multiple rows based on conditions in pandas dataframe


By : user1843340
Date : March 29 2020, 07:55 AM
See whether this meets your requirements. The comments explain how it works.
code :
#!/usr/bin/env python
import re
import pandas as pd # originally tested with pd.__version__ 0.19.2

df = pd.DataFrame([{'Column1': '((CC ) + (A11/ABC/ZZ) + (!AAA))',
                    'Column2': 'XYZ + XXX/YYY'}])   # your input dataframe
items = ['AAA', 'BBB', 'CCC']                       # your input list (renamed so it does not shadow the built-in list)
to_replace = dict()
for item in items:  # prepare the dictionary for the '!' replacements
    to_replace["!"+item+'\\b'] = '/'.join([i for i in items if i != item])
df = df.replace(to_replace, regex=True) # do all the '!' replacements

def expanded(s):    # expand series s to multiple string list around '/'
    # regex=True is needed on pandas >= 2.0, where str.replace is literal by default
    l = s.str.replace('[()]', '', regex=True).tolist()
    while True:     # in each loop cycle, handle one A/B/C... expression
        xl = []     # expanded list for this cycle
        for s in l: # for each string in the list so far
            m = re.search(r'\w+(/\w+)+', s) # look for a A/B/C... expression
            if m:   # if there is, add the individual expansions to the list
                xl.extend([m.string[:m.start()]+i+m.string[m.end():]
                                            for i in m.group().split('/')])
            else:   # if not, we're done
                return l
        l = xl      # expanded list for this cycle is now the current list

def expand(c):      # expands the column named c to multiple rows
    new = expanded(df[c])                       # get the new contents
    xdf = pd.concat(len(new)//len(df[c])*[df])  # create required rows (integer division for Python 3)
    xdf[c] = sorted(new)                        # set the new contents
    return xdf                                  # return new dataframe

df = expand('Column1')
df = expand('Column2')
print(df)

output :
           Column1    Column2
0  CC  + A11 + BBB  XYZ + XXX
0  CC  + A11 + CCC  XYZ + XXX
0  CC  + ABC + BBB  XYZ + XXX
0  CC  + ABC + CCC  XYZ + XXX
0   CC  + ZZ + BBB  XYZ + XXX
0   CC  + ZZ + CCC  XYZ + XXX
0  CC  + A11 + BBB  XYZ + YYY
0  CC  + A11 + CCC  XYZ + YYY
0  CC  + ABC + BBB  XYZ + YYY
0  CC  + ABC + CCC  XYZ + YYY
0   CC  + ZZ + BBB  XYZ + YYY
0   CC  + ZZ + CCC  XYZ + YYY
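
The '/' expansion for a single string can also be written with itertools.product. This is a sketch of an alternative technique, not the answer's method; `expand_alternatives` is a hypothetical helper name, and the sketch assumes the input text contains no `{` or `}` characters, because it uses `str.format` placeholders.

```python
import itertools
import re

def expand_alternatives(text):
    """Return every string obtained by picking one option from each A/B/C group."""
    pattern = re.compile(r"\w+(?:/\w+)+")
    # one list of options per A/B/C group, in order of appearance
    groups = [m.group().split("/") for m in pattern.finditer(text)]
    # replace each group with a positional placeholder
    template = pattern.sub("{}", text)
    return [template.format(*choice) for choice in itertools.product(*groups)]

for line in expand_alternatives("CC + A11/ABC/ZZ + BBB/CCC"):
    print(line)
```

A string without any '/' group comes back unchanged as a one-element list, since the product of zero groups yields a single empty choice.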

How do I split data out from one column of a pandas dataframe into multiple columns of a new dataframe


By : user3620422
Date : March 29 2020, 07:55 AM
Use pivot + filter + add_suffix:
code :
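# df is assumed to have exactly three columns, in the order: index, columns, values
idx, cols, vals = df.columns
out = (df.pivot(index=idx, columns=cols, values=vals)  # keywords: pandas >= 2.0 rejects df.pivot(*df)
         .filter(['XXXX','ZZZZ']).add_suffix('_DIFF')
         .reset_index().rename_axis(None,axis=1))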
print(out)

   YEAR  XXXX_DIFF  ZZZZ_DIFF
0  2013        5.5        6.5
1  2014        4.5        3.5
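
To make the chain reproducible, here is a self-contained sketch with a hypothetical long-format input. Only YEAR, XXXX and ZZZZ appear in the output above; the column names NAME and DIFF and the YYYY rows are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical long-format input: one row per (YEAR, NAME) pair
df = pd.DataFrame({
    "YEAR": [2013, 2013, 2013, 2014, 2014, 2014],
    "NAME": ["XXXX", "YYYY", "ZZZZ", "XXXX", "YYYY", "ZZZZ"],
    "DIFF": [5.5, 1.0, 6.5, 4.5, 2.0, 3.5],
})

out = (df.pivot(index="YEAR", columns="NAME", values="DIFF")  # one column per NAME
         .filter(["XXXX", "ZZZZ"])        # keep only the wanted columns
         .add_suffix("_DIFF")             # XXXX -> XXXX_DIFF
         .reset_index()                   # YEAR back to a regular column
         .rename_axis(None, axis=1))      # drop the leftover 'NAME' columns label
print(out)
```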

split one python list or dataframe with single column into a pandas dataframe multiple columns


By : Jerome Gauzins
Date : March 29 2020, 07:55 AM
I think you can create a numpy.array, reshape it with numpy.reshape, and pass the result to the DataFrame constructor.
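
A minimal sketch of that idea, assuming a flat list whose length is a multiple of the desired number of columns. The values and the column names are made up; the original answer's code is not available here.

```python
import numpy as np
import pandas as pd

values = ["a", "b", "c", "d", "e", "f"]  # hypothetical flat list

# Reshape to 3 columns; -1 lets NumPy infer the number of rows
# (this raises ValueError if len(values) is not divisible by 3)
arr = np.array(values).reshape(-1, 3)

df = pd.DataFrame(arr, columns=["col1", "col2", "col3"])
print(df)
```

For a single-column dataframe instead of a list, the column's values can be passed to `np.array` in the same way.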