series.unique vs list of set - performance


By : Jeff Ward
Date : November 21 2020, 03:00 PM
It depends on the data type. For numeric types, pd.unique should be significantly faster.
For strings, which are stored as Python objects, the difference is much smaller, and set() is usually competitive, because it does very similar work.
code :
import numpy as np
import pandas as pd

# 30,000 strings stored as Python objects (dtype='O')
strs = np.repeat(np.array(['a', 'b', 'c'], dtype='O'), 10000)

In [11]: %timeit pd.unique(strs)
558 µs ± 16.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [12]: %timeit list(set(strs))
531 µs ± 13.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

nums = np.repeat(np.array([1, 2, 3]), 10000)

In [13]: %timeit pd.unique(nums)
230 µs ± 9.28 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [14]: %timeit list(set(nums))
2.16 ms ± 71 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
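
The comparison above can also be reproduced outside IPython. The following is a minimal sketch using the standard timeit module; the helper name bench and the repeat count are assumptions made for illustration, and absolute timings will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Same test data as above: 30,000 object strings and 30,000 integers
strs = np.repeat(np.array(["a", "b", "c"], dtype="O"), 10000)
nums = np.repeat(np.array([1, 2, 3]), 10000)


def bench(func, number=20):
    """Hypothetical helper: average seconds per call over `number` runs."""
    return timeit.timeit(func, number=number) / number


timings = {
    "pd.unique(strs)": bench(lambda: pd.unique(strs)),
    "list(set(strs))": bench(lambda: list(set(strs))),
    "pd.unique(nums)": bench(lambda: pd.unique(nums)),
    "list(set(nums))": bench(lambda: list(set(nums))),
}
for label, seconds in timings.items():
    print(f"{label:>16}: {seconds * 1e6:10.1f} µs")

# Both approaches find the same values. pd.unique returns them in order of
# first appearance, while the iteration order of a set is not guaranteed.
unique_strs = pd.unique(strs).tolist()
unique_nums = pd.unique(nums).tolist()
```

Besides speed, ordering can matter when choosing between the two: if the order of first appearance is needed, pd.unique provides it directly.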


Converting pandas series timestamp to list of unique dates


By : Emon Selorio
Date : March 29 2020, 07:55 AM
I have a column in a pandas DataFrame in timestamp format and want to extract the unique dates (without the time) into a list. The approaches I tried did not work.

You can use the dt accessor on a Series to reach its datetime properties. Try this:
code :
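import pandas as pd
df = pd.DataFrame({"EventTime": ["2014-01-01", "2014-01-01", "2014-01-02 10:12:00", "2014-01-02 09:12:00"]})
# Parse to datetime, keep only the date part, then deduplicate.
# (With mixed string formats like these, pandas 2.0+ may need format='mixed'.)
pd.to_datetime(df['EventTime']).dt.date.unique().tolist()
# [datetime.date(2014, 1, 1), datetime.date(2014, 1, 2)]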
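
As a self-contained variation on the answer above, the sketch below shows two ways to get unique days. The sample timestamps and the variable names (events, unique_dates, unique_days) are assumptions for illustration; all strings share one format so parsing does not depend on the pandas version.

```python
import datetime

import pandas as pd

# Hypothetical sample data, all in the same timestamp format
events = pd.Series(pd.to_datetime([
    "2014-01-01 00:00:00",
    "2014-01-01 08:30:00",
    "2014-01-02 10:12:00",
    "2014-01-02 09:12:00",
]))

# Option 1: convert to Python date objects, then deduplicate
unique_dates = events.dt.date.unique().tolist()

# Option 2: truncate each timestamp to midnight and drop duplicates,
# which keeps pandas Timestamp values instead of datetime.date objects
unique_days = events.dt.normalize().drop_duplicates().tolist()
```

Option 2 may be preferable when the result feeds back into further pandas datetime operations.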
Confusion with series, list and unique elements


By : Miju
Date : March 29 2020, 07:55 AM
Your tags variable is a list of pandas.Series objects. This happens when you build the list with loc-based selection from the DataFrame:
code :
for user in users2:
    tags.append(user_counters["tags"].loc[user])

When an index label occurs more than once, .loc returns a Series rather than a scalar:

>>> import pandas as pd
>>> df = pd.DataFrame({'c':[1.4,3.9, 2.8, 6.9], 'tag':[1,2,3, 4]}, index=['ted','sara','anne', 'ted'])
>>> df
        c  tag
ted   1.4    1
sara  3.9    2
anne  2.8    3
ted   6.9    4
>>>
>>> df['tag'].loc['ted']
ted    1
ted    4
Name: tag, dtype: int64
>>> type(df['tag'].loc['ted'])
<class 'pandas.core.series.Series'>
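
One possible way to end up with a flat list of unique tags is sketched below. The names user_counters and users2 come from the question, but the sample data is made up, and neither option is claimed to be the original poster's final solution.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a duplicated index label ('ted')
user_counters = pd.DataFrame(
    {"tags": [1, 2, 3, 4]},
    index=["ted", "sara", "anne", "ted"],
)
users2 = ["ted", "sara"]

# Option 1: select all users in one call. Passing a list of labels to .loc
# returns a single Series, so .unique() can be applied directly.
unique_tags = user_counters.loc[users2, "tags"].unique().tolist()

# Option 2: keep the loop, but turn each result (scalar or Series) into a
# 1-D array before combining, so the pieces can be concatenated.
pieces = [np.atleast_1d(user_counters["tags"].loc[user]) for user in users2]
unique_tags_loop = pd.unique(np.concatenate(pieces)).tolist()
```

Option 1 avoids the mixed scalar/Series results entirely; Option 2 is closer to the original loop.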
List unique identifier of missing rows in a series


By : LH_FCCS
Date : March 29 2020, 07:55 AM
Is it possible to return the row number of missing values within a given series?

You can do df.index[pd.isnull(df['Age'])]
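
A minimal sketch of that expression on made-up data follows. The column name Age is from the answer; the sample values, index labels, and the positional variant using np.flatnonzero are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two missing ages
df = pd.DataFrame(
    {"Age": [22.0, np.nan, 35.0, np.nan]},
    index=["a", "b", "c", "d"],
)

# Index labels of the rows where Age is missing (the expression from the answer)
missing_labels = df.index[pd.isnull(df["Age"])].tolist()

# Integer row positions instead of labels, in case the index is not 0..n-1
missing_positions = np.flatnonzero(df["Age"].isna().to_numpy()).tolist()
```

With a default RangeIndex the labels and positions coincide, so the first form alone is enough.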
Iterate through a dask series (getting unique values from dask series to list)


By : Mazher Hussain
Date : March 29 2020, 07:55 AM
I need to iterate through the unique values from a dask dataframe. I used .unique() to get the unique values of a column, but that gives me a dask object that I cannot iterate over. How do I get these unique values out of the dask object into a list (or something similar) so I can use them to iterate through the dask dataframe?

This issue has been resolved in dask 2.3:
code :
In [1]: import pandas as pd
   ...: import dask.dataframe as dd
   ...: import dask

In [2]: dask.__version__
Out[2]: '2.3.0'

In [3]: df = pd.DataFrame({"temp1":[1,2,2,4],"temp2":[1,2,2,4]})
   ...: ddf = dd.from_pandas(df,npartitions=2)
   ...: for unique_value in ddf.temp1.unique():
   ...:     print(unique_value)
   ...:     
1
2
4
How to put unique values of 1 series as columns and count each occurence of the unique values from the series per quarte


By : Omveer Singh
Date : March 29 2020, 07:55 AM
If I understand correctly, you want pd.crosstab. Two groupby-based alternatives follow it:
code :
# Option 1: pd.crosstab
new_df = pd.crosstab(df['date'].dt.to_period('Q'),df['col1'],
                     rownames=['dateresponded'],
                     colnames=[None])
print(new_df)

# Option 2: groupby + size
new_df = (df.groupby([df['date'].dt.to_period('Q'),'col1'])
            .size()
            .unstack(fill_value = 0)
            .rename_axis(columns = None,index = 'dateresponded'))
print(new_df)

# Option 3: groupby + value_counts
new_df = (df.groupby(df['date'].dt.to_period('Q'))
            .col1
            .value_counts()
            .unstack(fill_value = 0)
            .rename_axis(columns = None,index = 'dateresponded'))
print(new_df)

Output:
               a  b  c
dateresponded         
2019Q4         0  1  1
2020Q1         2  1  0
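
To try this end to end, here is a self-contained sketch. The column names date and col1 follow the answer, but the input rows are invented for illustration (the original question's data is not shown), chosen so that the counts match the table above.

```python
import pandas as pd

# Hypothetical input: one row per response, with a date and a category
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-10-15", "2019-12-01",                 # fall in 2019Q4
        "2020-01-10", "2020-02-20", "2020-03-05",   # fall in 2020Q1
    ]),
    "col1": ["b", "c", "a", "a", "b"],
})

# Map each date to its calendar quarter
quarters = df["date"].dt.to_period("Q")

# Count occurrences of each col1 value per quarter
counts = (pd.crosstab(quarters, df["col1"], rownames=["dateresponded"])
            .rename_axis(columns=None))

# The groupby/size route should give the same counts
alt = df.groupby([quarters, "col1"]).size().unstack(fill_value=0)

print(counts)
```

Combinations that never occur (such as a in 2019Q4) are filled with 0 in both variants.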