Pandas .min() method doesn't seem fastest



By : ChrisJung
Date : October 22 2020, 08:10 AM
When you select a single column, you get a pandas Series, which is backed by a numpy array but has a lot more wrapped around it. Pandas objects are optimized for spreadsheet- or database-style operations such as joins and lookups.
When you call .values on a column, you get the underlying numpy ndarray, a type optimized for mathematical and vectorized operations implemented in C. Even counting the cost of 'unwrapping' to an ndarray, the mathematical operation beats the Series hands-down: in the timings below, 696 µs versus 6.68 ms, roughly ten times faster.
code :
type(df['a'])

pandas.core.series.Series

%timeit df['a'].min()

6.68 ms ± 121 µs per loop

type(df['a'].values)

numpy.ndarray

%timeit df['a'].values.min()

696 µs ± 18 µs per loop
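The comparison above can be reproduced outside IPython with the standard timeit module. This is a minimal sketch, assuming a hypothetical DataFrame with one float column named 'a'; the data, its size, and the repeat count are illustrative choices, and the absolute timings will depend on hardware and library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: one million random floats in a column named 'a'.
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.random(1_000_000)})

series_min = df["a"].min()            # pandas reduction (skips NaN by default)
array_min = df["a"].to_numpy().min()  # plain NumPy reduction on the ndarray

# Time each call; only the relative difference is meaningful.
runs = 20
t_series = timeit.timeit(lambda: df["a"].min(), number=runs)
t_array = timeit.timeit(lambda: df["a"].to_numpy().min(), number=runs)
print(f"Series.min:  {t_series / runs * 1e3:.3f} ms per call")
print(f"ndarray.min: {t_array / runs * 1e3:.3f} ms per call")
```

One caveat when swapping one for the other: Series.min() skips NaN by default, while ndarray.min() returns nan if any element is NaN, so the two only agree on data without missing values.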


Remove duplicate method for Python Pandas doesnt work



By : toldyou123
Date : March 29 2020, 07:55 AM
You need to assign the result of drop_duplicates back to df. By default inplace=False, so the method returns a modified copy; unless you pass inplace=True, your original df is left unmodified:
code :
In [106]:

# take_last=False from old pandas is spelled keep='first' in current versions
df = df.drop_duplicates('new', keep='first')
df.groupby('new').max()
Out[106]:
            A         B         C         D  new2
new                                              
1   -1.698741 -0.550839 -0.073692  0.618410     1
3    0.519596  1.686003  1.395585  1.298783     2
4    1.557550  1.249577  0.214546 -0.077569     4
5   -0.183454 -0.789351 -0.374092 -1.824240     5
7   -1.176468  0.546904  0.666383 -0.315945     7
8   -1.224640 -0.650131 -0.394125  0.765916     8
10  -1.045131  0.726485 -0.194906 -0.558927     5
In [108]:

# take_last=False from old pandas is spelled keep='first' in current versions
df.drop_duplicates('new', keep='first', inplace=True)
df.groupby('new').max()
Out[108]:
            A         B         C         D  new2
new                                              
1    0.334352 -0.355528  0.098418 -0.464126     1
3   -0.394350  0.662889 -1.012554 -0.004122     2
4   -0.288626  0.839906  1.335405  0.701339     4
5    0.973462 -0.818985  1.020348 -0.306149     5
7   -0.710495  0.580081  0.251572 -0.855066     7
8   -1.524862 -0.323492 -0.292751  1.395512     8
10  -1.164393  0.455825 -0.483537  1.357744     5
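The difference between discarding the result, assigning it, and using inplace=True can be seen on a small frame. This is a sketch with made-up data; the column names 'new' and 'A' and the values are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical frame with duplicated values in the 'new' column.
df = pd.DataFrame({"new": [1, 1, 3, 3, 4], "A": [0.1, 0.2, 0.3, 0.4, 0.5]})

# Without assignment the returned copy is discarded; df keeps all rows.
df.drop_duplicates(subset="new")
rows_unassigned = len(df)

# Assigning the returned copy keeps the first row for each value of 'new'.
deduped = df.drop_duplicates(subset="new", keep="first")
rows_assigned = len(deduped)

# inplace=True modifies df itself and returns None.
result = df.drop_duplicates(subset="new", inplace=True)
rows_inplace = len(df)

print(rows_unassigned, rows_assigned, rows_inplace, result)
```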
Fastest method of finding data from another row in Pandas DataFrame based upon column data calculation?



By : Natalia Brzeska
Date : March 29 2020, 07:55 AM
Question: Without looping through each individual row of the dataframe, which can be very slow for large datasets, how do I use the calculated result of two columns in a row, 2*A - B, to find a matching value in column B, and from that matching row pull the value in column C into column D of the original row?

Answer: Use pd.DataFrame.eval to compute the key, then map it through a Series of C indexed by B:
code :
df1.assign(D=df1.eval('2 * A - B').map(df1.set_index('B').C))

   A  B  C  D
0  3  1  3  5
1  3  3  4  4
2  3  5  5  3
# Alternative: build a plain dict from B to C and look up each computed key
m = dict(zip(df1.B.values.tolist(), df1.C.values.tolist()))
a = df1.A.values
b = df1.B.values
z = 2 * a - b

df1.assign(D=[m[i] for i in z.tolist()])

   A  B  C  D
0  3  1  3  5
1  3  3  4  4
2  3  5  5  3
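The two approaches above behave differently when a computed key has no match in column B. The sketch below is an illustration under assumptions: df1 is rebuilt from the printed output, and df2 is a hypothetical variant in which one key (15) is absent from B. Series.map leaves NaN for a missing key, whereas indexing a dict with m[i] would raise KeyError, so the dict version here uses .get instead.

```python
import pandas as pd

# df1 reconstructed from the output shown above.
df1 = pd.DataFrame({"A": [3, 3, 3], "B": [1, 3, 5], "C": [3, 4, 5]})
lookup = df1.set_index("B")["C"]   # Series mapping B -> C
key = df1.eval("2 * A - B")        # keys 5, 3, 1
with_map = df1.assign(D=key.map(lookup))

# Hypothetical variant: the last row yields key 15, which is not in B.
df2 = pd.DataFrame({"A": [3, 3, 10], "B": [1, 3, 5], "C": [3, 4, 5]})
key2 = df2.eval("2 * A - B")       # keys 5, 3, 15
missing_map = df2.assign(D=key2.map(df2.set_index("B")["C"]))

# Dict-based lookup that tolerates missing keys via .get (returns None).
m = dict(zip(df2["B"].tolist(), df2["C"].tolist()))
missing_dict = [m.get(k) for k in key2.tolist()]

print(with_map)
print(missing_map)
print(missing_dict)
```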
Fastest method of filtering a pandas data frame by category



By : Imran khan
Date : March 29 2020, 07:55 AM
Here's a similar but different approach that compares each value directly rather than using isin.
Basic map / lambda comparison:
code :
%timeit df[df['categories'].map(lambda x: x in string.ascii_lowercase)]
> 1 loop, best of 3: 12.3 s per loop
%timeit df[df['categories'].isin(list(string.ascii_lowercase))]
> 1 loop, best of 3: 55.1 s per loop
df.shape
> (100000000, 2)
df.dtypes
> categories    category
 values           int64
 dtype: object
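A small check that the two filters select the same rows may be useful before timing them on real data. This sketch assumes a tiny hypothetical frame with the same column names and dtypes as above; it uses a set for the membership test, which is a substitution for the string containment check in the original lambda.

```python
import string

import pandas as pd

# Hypothetical six-row frame with a categorical column, as in df.dtypes above.
df = pd.DataFrame({
    "categories": pd.Categorical(["a", "B", "c", "1", "z", "Q"]),
    "values": [1, 2, 3, 4, 5, 6],
})

wanted = set(string.ascii_lowercase)

# map / lambda comparison; astype(bool) guards against an object-dtype mask.
mask_map = df["categories"].map(lambda x: x in wanted).astype(bool)
by_map = df[mask_map]

# isin comparison against the same set of letters.
by_isin = df[df["categories"].isin(list(string.ascii_lowercase))]

print(by_map)
print(by_map.equals(by_isin))
```

Which of the two is faster can vary with the pandas version and the number of distinct categories, so it is worth re-running the %timeit comparison on your own data.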
Fastest method of finding and replacing row-specific data in a pandas DataFrame



By : Kyle Beggs
Date : March 29 2020, 07:55 AM
The fastest method I found was to use the apply function together with a replacer function built on the basic str.replace() method. It's very fast, even with a for loop inside it, and it works with any number of columns:
code :
def value_replacement(df_to_replace, replace_col):
    """ In the <replace_col> column, replace each lowercased column name with that row's value from the column """

    cols = [col for col in df_to_replace.columns if col != replace_col]

    def replacer(rep_df):
        """ function to be used in the apply function """
        for col in cols:
            rep_df[replace_col] = \
                str(rep_df[replace_col]).replace(col.lower(), str(rep_df[col]))

        return rep_df[replace_col]

    df_to_replace[replace_col] = df_to_replace.apply(replacer, axis=1)

    return df_to_replace
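To show what the function does on concrete input, here is a compact variant with a usage example. It is a sketch, not the author's code: the name fill_template, the column names, and the sample rows are hypothetical, and unlike the original it returns a copy instead of modifying the input frame.

```python
import pandas as pd


def fill_template(df, template_col):
    """Return a copy of df where, in template_col, each lowercased column
    name is replaced by that row's value from the corresponding column."""
    other_cols = [c for c in df.columns if c != template_col]

    def fill_row(row):
        text = str(row[template_col])
        for col in other_cols:
            # Plain substring replacement, same idea as in the answer above.
            text = text.replace(col.lower(), str(row[col]))
        return text

    out = df.copy()
    out[template_col] = df.apply(fill_row, axis=1)
    return out


# Hypothetical data: placeholders 'name' and 'city' inside the Message column.
df = pd.DataFrame({
    "Message": ["hello name from city", "city welcomes name"],
    "Name": ["Alice", "Bob"],
    "City": ["Paris", "Oslo"],
})
filled = fill_template(df, "Message")
print(filled["Message"].tolist())
# → ['hello Alice from Paris', 'Oslo welcomes Bob']
```

Because the replacement is a plain substring match, a value that itself contains a later column's lowercased name would be rewritten again on the next pass; real data may need stricter placeholders.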
Pandas Interview Question - Compare Pandas-Joins and Ideally Provide the Fastest Method



By : user2687754
Date : March 29 2020, 07:55 AM
You can switch the last method to join, as Siddharth suggested. Suppose your DataFrame is much larger:
code :
hist_df = pd.DataFrame(columns=['HoD', 'Volume'])
hist_df['HoD'] = np.random.randint(0, 10000, 365 * 10000)
hist_df['Volume'] = np.random.uniform(1, 10000, 365 * 10000)
%timeit merged_df = pd.merge(hist_df, tariffs_df, how='left', left_on='bin', right_on='Time range')

1 loop, best of 3: 740 ms per loop


%timeit hist = hist_df.set_index('bin')
%timeit tariffs = tariffs_df.set_index('Time range')
%timeit merged_df = hist.join(tariffs)

10 loops, best of 3: 20.1 ms per loop
1000 loops, best of 3: 449 µs per loop
100 loops, best of 3: 3.59 ms per loop
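Before choosing join for speed, it helps to confirm that it returns the same data as merge. The sketch below is self-contained and rests on assumptions: the contents of tariffs_df and the 'bin' column are not shown in the answer, so both frames here are hypothetical stand-ins with a 'Price' column invented for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the frames used in the timings above.
hist_df = pd.DataFrame({
    "bin": rng.integers(0, 24, size=1000),
    "Volume": rng.uniform(1, 10000, size=1000),
})
tariffs_df = pd.DataFrame({
    "Time range": np.arange(24),
    "Price": np.linspace(0.10, 0.33, 24),
})

# Column-based left merge.
merged_df = pd.merge(hist_df, tariffs_df, how="left",
                     left_on="bin", right_on="Time range")

# Index-based join: set the key as index on both sides, then join
# (DataFrame.join is a left join by default).
hist = hist_df.set_index("bin")
tariffs = tariffs_df.set_index("Time range")
joined_df = hist.join(tariffs)

print(merged_df.shape, joined_df.shape)
```

Note that the set_index calls have their own cost, as the separate timings above show; the join pays off mainly when the indexed frames are reused across several lookups.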