How to join pandas dataframe with itself?


By : Ryoga San Bacon
Date : October 22 2020, 08:10 PM
I think this is a groupby + shift problem: group the rows by Account, then shift Application Date up by one row within each group.
code :
df['New']=df.groupby('Account')['Application Date'].shift(-1)
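A minimal, self-contained sketch of the same idea follows. The sample rows are hypothetical (the question's original data is not shown here); only the column names are taken from the snippet above. Sorting first is an assumption about what "next" should mean, namely the next application in time for the same account.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the answer above.
df = pd.DataFrame({
    "Account": ["A", "A", "A", "B", "B"],
    "Application Date": pd.to_datetime(
        ["2020-01-01", "2020-02-01", "2020-03-01", "2020-01-15", "2020-04-15"]
    ),
})

# Sort so that "the next row" is the next application in time per account.
df = df.sort_values(["Account", "Application Date"]).reset_index(drop=True)

# shift(-1) pulls the following row's value up within each group.
# The last row of each account has no successor, so it gets NaT.
df["New"] = df.groupby("Account")["Application Date"].shift(-1)
print(df)
```

Because the shift happens inside each group, the last application of account A never picks up a date from account B.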


Rpy2 and Pandas: join output from predict to pandas dataframe


By : AshoK
Date : March 29 2020, 07:55 AM
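There is a pull request that adds conversion from R factors to pandas Categorical. It has not yet been merged into the pandas master branch. When it is, the first two lines below should be enough; until then, the convert_factor helper taken from that pull request can be used instead: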
code :
import pandas.rpy.common as rcom
rcom.convert_robj(pr)
def convert_factor(obj):
    """
    Taken from jseabold's PR: https://github.com/pydata/pandas/pull/9187
    """
    ordered = r["is.ordered"](obj)[0]
    categories = list(obj.levels)
    codes = np.asarray(obj) - 1  # zero-based indexing
    values = pd.Categorical.from_codes(codes, categories=categories,
                                       ordered=ordered)
    return values
import pandas as pd
import numpy as np
import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri
pandas2ri.activate()
r = robjects.r
r.library("randomForest")
r.library("caret")

def convert_factor(obj):
    """
    Taken from jseabold's PR: https://github.com/pydata/pandas/pull/9187
    """
    ordered = r["is.ordered"](obj)[0]
    categories = list(obj.levels)
    codes = np.asarray(obj) - 1  # zero-based indexing
    values = pd.Categorical.from_codes(codes, categories=categories,
                                       ordered=ordered)
    return values


df = pd.DataFrame(data=np.random.rand(100, 10), 
                  columns=["a{}".format(i) for i in range(10)])
df["b"] = ['a' if x < 0.5 else 'b' for x in np.random.sample(size=100)]
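# .ix has been removed from pandas; .loc does the same boolean selection.
# .copy() lets us add a column to withheld later without a chained-assignment warning.
train = df.loc[df.a0 < .75].copy()
withheld = df.loc[df.a0 >= .75].copy()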

rf = r.randomForest(robjects.Formula('b ~ .'), data=train)
pr = convert_factor(r.predict(rf, withheld))

withheld['pr'] = pr
print(withheld)
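The core of convert_factor can be tried without R installed. The sketch below replaces the R factor with two hypothetical stand-ins, an array of one-based integer codes and a list of levels, and shows only the code-to-category mapping done with pd.Categorical.from_codes. It is an illustration of that one step, not a replacement for the rpy2 conversion.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for an R factor: R stores one-based integer codes
# plus a vector of level names.
r_codes = np.array([1, 2, 2, 1])
r_levels = ["a", "b"]

# pandas expects zero-based codes, so subtract 1 before converting.
values = pd.Categorical.from_codes(r_codes - 1, categories=r_levels, ordered=False)
print(list(values))
```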
Pandas - looping through columns of a dataframe and join this column with other dataframe


By : BullXL
Date : March 29 2020, 07:55 AM
I think you can try concat:
code :
for col in df2.columns:
    print(pd.concat([df1.reset_index(), df2[col]], axis=1))

      index  abc  bcd  def  a
0  20150101  0.5  0.3  0.2  0
1  20150102  0.7  0.9  1.6  9
2  20150103  1.7  2.9  4.6  2
      index  abc  bcd  def  b
0  20150101  0.5  0.3  0.2  1
1  20150102  0.7  0.9  1.6  5
2  20150103  1.7  2.9  4.6  3
      index  abc  bcd  def  c
0  20150101  0.5  0.3  0.2  8
1  20150102  0.7  0.9  1.6  3
2  20150103  1.7  2.9  4.6  7
dfs = {}

for col in df2.columns:
    df = pd.concat([df1.reset_index(),df2[col]], axis=1)
    # print(df)
    if (df2[col] == 1).any():
        print(df)
        # store the result in a dictionary of dataframes
        dfs[col] = df
      index  abc  bcd  def  b
0  20150101  0.5  0.3  0.2  1
1  20150102  0.7  0.9  1.6  5
2  20150103  1.7  2.9  4.6  3       

print(dfs['b'])
      index  abc  bcd  def  b
0  20150101  0.5  0.3  0.2  1
1  20150102  0.7  0.9  1.6  5
2  20150103  1.7  2.9  4.6  3
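For reference, here is a runnable Python 3 sketch of the filtered loop. The values in df1 and df2 are reconstructed from the printed output above; that df2 uses a default 0..2 index is an assumption, made so that it lines up with df1 after reset_index().

```python
import pandas as pd

# Values reconstructed from the printed output above.
df1 = pd.DataFrame(
    {"abc": [0.5, 0.7, 1.7], "bcd": [0.3, 0.9, 2.9], "def": [0.2, 1.6, 4.6]},
    index=[20150101, 20150102, 20150103],
)
# Assumption: df2 has a default RangeIndex, matching df1.reset_index().
df2 = pd.DataFrame({"a": [0, 9, 2], "b": [1, 5, 3], "c": [8, 3, 7]})

left = df1.reset_index()  # the old index becomes a column named "index"

# Keep only the columns of df2 that contain the value 1,
# each one concatenated to the right of df1.
dfs = {
    col: pd.concat([left, df2[col]], axis=1)
    for col in df2.columns
    if (df2[col] == 1).any()
}
print(dfs["b"])
```

Note that concat with axis=1 aligns rows on the index, so both frames need matching row labels; that is why df1 is reset first.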
Pandas Dataframe and Series join returns empty Dataframe or NaN column


By : user6622595
Date : March 29 2020, 07:55 AM
The problem is that the two indexes do not have the same dtype, so the join cannot align them and you get NaN.
The solution is to cast both indexes to int, or both to str, so that they align:
code :
# either cast both indexes to int
series1.index = series1.index.astype(int)
df1.index = df1.index.astype(int)
# or cast both indexes to str
series1.index = series1.index.astype(str)
df1.index = df1.index.astype(str)
#inner join
merged = df1.join(series1, how='inner')
print (merged)
        mean        std  val
index                       
1110  -6.375  12.915982  0.0
#default left join
merged = df1.join(series1)
#same as:  
merged = df1.join(series1, how='left')
print (merged)
            mean         std  val
index                            
1101  -41.000000   46.305225  NaN
1102  -58.724998  126.810371  NaN
1110   -6.375000   12.915982  0.0
merged = df1.join(series1, how='outer')
print (merged)
            mean         std       val
index                                 
110          NaN         NaN  0.135135
1101  -41.000000   46.305225       NaN
1102  -58.724998  126.810371       NaN
111          NaN         NaN  0.000000
1110   -6.375000   12.915982  0.000000
merged = df1.join(series1, how='right')
print (merged)
        mean        std       val
index                            
110      NaN        NaN  0.135135
111      NaN        NaN  0.000000
1110  -6.375  12.915982  0.000000
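The cast can be checked end to end with a small sketch. The two objects below are hypothetical reductions of the data printed above, built so that df1 carries string labels while series1 carries integers; which side held which dtype in the original question is an assumption.

```python
import pandas as pd

# Hypothetical reduction of the data above: string labels on one side...
df1 = pd.DataFrame(
    {"mean": [-41.0, -6.375]},
    index=pd.Index(["1101", "1110"], name="index"),
)
# ...and integer labels on the other.
series1 = pd.Series([0.0], index=pd.Index([1110], name="index"), name="val")

# The dtypes differ, so the label "1110" and the label 1110 are not equal.
print(df1.index.dtype, series1.index.dtype)

# Cast the string index to integers so that both sides use the same dtype.
df1.index = df1.index.astype("int64")

merged = df1.join(series1)  # default left join on the index
print(merged)
```

After the cast, row 1110 picks up its val, while 1101 stays NaN simply because series1 has no such label.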
inner join/merge in pandas dataframe give more rows than left dataframe


By : Jey
Date : March 29 2020, 07:55 AM
The only way I can see this happening, particularly with the 14,000 being exactly the number of records in df2, is if the key column combinations in df2 are not unique.
You can check whether they are unique with either of the following (each returns True if unique):
code :
df2.duplicated(['device number', 'date']).sum() == 0
# or, equivalently
df2.set_index(['device number', 'date']).index.is_unique
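The effect is easy to reproduce on toy data. In the sketch below both frames and their values are hypothetical; only the key column names come from the check above. It also uses the validate argument of merge, which raises an error instead of silently multiplying rows.

```python
import pandas as pd

keys = ["device number", "date"]

# Hypothetical data: df1 has unique keys, df2 repeats the key (1, "2020-01-01").
df1 = pd.DataFrame(
    {"device number": [1, 2], "date": ["2020-01-01", "2020-01-01"], "x": [10, 20]}
)
df2 = pd.DataFrame(
    {"device number": [1, 1, 2], "date": ["2020-01-01"] * 3, "y": [0.1, 0.2, 0.3]}
)

has_dupes = bool(df2.duplicated(keys).any())

# Each left row matches every right row with the same key,
# so the duplicated key yields two output rows for one left row.
merged = df1.merge(df2, on=keys, how="inner")
print(len(df1), len(merged))

# validate="one_to_one" makes pandas check key uniqueness on both sides.
try:
    df1.merge(df2, on=keys, how="inner", validate="one_to_one")
    raised = False
except pd.errors.MergeError:
    raised = True
print(raised)
```

If the duplicates in df2 are not meaningful, dropping them with df2.drop_duplicates(keys) before merging is one option; whether that is appropriate depends on the data.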
Join an empty pandas DataFrame with a Multiindex DataFrame


By : Giuliano Berrettaros
Date : March 29 2020, 07:55 AM
A MultiIndex is not always recognized when you assign into a frame whose columns are a simple index, so create the empty frame with MultiIndex columns first and then assign:
code :
df1 = pd.DataFrame(index=range(6),columns=pd.MultiIndex.from_arrays([[],[]]))
df1[df2.columns] = df2
df1
Out[697]: 
          A                    
          a         b         c
0 -0.755397  0.574920  0.901570
1 -0.165472 -1.865715  1.583416
2 -0.403287  1.358329  0.706650
3  0.028019  1.432543 -0.586325
4 -0.414851  0.825253  0.745090
5  0.389917  0.940657  0.125837