How to convert 'NaN' strings in a pandas Series to null values for dropna?


By : زينب قريع أبو نعمة
Date : October 17 2020, 08:10 AM
I tried a couple of methods to clean rows containing NaN from a particular Series in my DataFrame, only to realize that every NaN entry is a 'NaN' string, not a null value. First convert your strings to NaN values, then drop the rows using any one of the three equivalent forms that follow:
code :
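import numpy as np

df = df.replace('NaN', np.nan)                      # turn 'NaN' strings into real nulls
# then drop the null rows with any ONE of these:
df = df.dropna(subset=['GDP per Capita'])           # not in place version
df.dropna(subset=['GDP per Capita'], inplace=True)  # in place version
df = df.loc[df['GDP per Capita'].notnull()]         # boolean mask version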
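
The steps above can be tried on a small made-up frame. This is only a sketch: the country labels and values are hypothetical, and only the column name 'GDP per Capita' comes from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the missing entry is the literal string 'NaN'
df = pd.DataFrame({
    'Country': ['A', 'B', 'C'],
    'GDP per Capita': ['1000', 'NaN', '3000'],
})

# The string is not a null yet, so dropna would remove nothing at this point
n_null_before = df['GDP per Capita'].isnull().sum()

# Replace the strings with real nulls, then drop those rows
cleaned = df.replace('NaN', np.nan).dropna(subset=['GDP per Capita'])

print(n_null_before)
print(cleaned)
```

Note that the remaining values are still strings here; converting them to numbers is a separate step.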



pandas dropna on series


By : gneuner
Date : March 29 2020, 07:55 AM
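I have a pandas table df. The basic issue with your code is in the first line of the code below: drop_duplicates is referenced but never called. After that, the empty strings in the Price column still have to be turned into nulls before notnull() can filter them out.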
code :
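# Bug: the method is referenced, not called
filtered_df = df.drop_duplicates
# Fix: call it
filtered_df = df.drop_duplicates()
# notnull() alone keeps the rows whose Price is an empty string
filtered_df = filtered_df[filtered_df['Price'].notnull()]
# so coerce Price to numeric first (convert_objects exists only in older pandas)
filtered_df = filtered_df[filtered_df['Price'].convert_objects(convert_numeric=True).notnull()]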
In [42]: df = pd.DataFrame({'sku': ('SKU123', 'SKU124', 'SKU124', 'SKU125', 'SKU126', 'SKU127'), 'Cat':('CatA', 'CatB', 'CatB', 'CatA', 'CatB', 'CatC'), 'Price':(4.5, 4.7, 4.7, '', '', 4.5)})

In [43]: filtered_df = df.drop_duplicates()

In [44]: filtered_df = filtered_df[filtered_df['Price'].convert_objects(convert_numeric=True).notnull()]

In [45]: filtered_df
Out[45]:
    Cat Price     sku
0  CatA   4.5  SKU123
1  CatB   4.7  SKU124
5  CatC   4.5  SKU127
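
Series.convert_objects was deprecated and is not available in recent pandas releases. A sketch of the same filter using pd.to_numeric with errors='coerce' follows; this is a substitution for the call used in the session above, not part of the original answer.

```python
import pandas as pd

# Same data as in the session above
df = pd.DataFrame({
    'sku': ('SKU123', 'SKU124', 'SKU124', 'SKU125', 'SKU126', 'SKU127'),
    'Cat': ('CatA', 'CatB', 'CatB', 'CatA', 'CatB', 'CatC'),
    'Price': (4.5, 4.7, 4.7, '', '', 4.5),
})

filtered_df = df.drop_duplicates()

# errors='coerce' turns anything non-numeric (here the empty strings) into NaN
numeric_price = pd.to_numeric(filtered_df['Price'], errors='coerce')
filtered_df = filtered_df[numeric_price.notnull()]

print(filtered_df)
```

The rows that survive should be the same three as in the session above, although newer pandas versions keep the columns in insertion order rather than sorting them.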

Pandas convert Series of strings to Series of lists of strings (of size 1) for encoding


By : user1893227
Date : March 29 2020, 07:55 AM
I know the title is confusing, but let me explain. I'm trying to prepare a Series for sklearn's MultiLabelBinarizer, with each string being a separate user id that I want to one-hot-encode. Erroneously, it iterates over each individual character of the string. Doing series.apply(list) does the same thing, splitting each string into its individual characters.

Using s.apply(lambda x: [x]) works perfectly: it wraps each string in a one-element list instead of splitting it.
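
A short sketch of the difference between the two calls, assuming a hypothetical Series of user ids (the example series from the question is not shown here):

```python
import pandas as pd

# Hypothetical user ids, for illustration only
s = pd.Series(['user1', 'user22', 'user333'])

split_chars = s.apply(list)        # list('user1') splits the string into characters
wrapped = s.apply(lambda x: [x])   # wraps each string in a one-element list

print(split_chars[0])
print(wrapped.tolist())
```

The wrapped Series holds one list per row, which is the list-of-labels shape the question is aiming for.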

Quickly convert Pandas Series of labels into Series of indirect values from corresponding columns


By : user3285607
Date : March 29 2020, 07:55 AM
You can do it using Series.mask or numpy.where, as shown below. numpy.where performs better on this dataset (about 1.2 ms versus 6.7 ms per loop).
code :
import numpy as np
import pandas as pd

N = np.arange(1, 10)
df_b = pd.DataFrame({
    'ref': [ 'a',  'b',  'c',  'd',  'c',  'b',  'a',  'b',  'c'],
    'a':   [   1,    2,    3,    4,    5,    6,    7,    8,    9],
    'b':   [  10,   20,   30,   40,   50,   60,   70,   80,   90],
    'c':   [ 100,  200,  300,  400,  500,  600,  700,  800,  900],
    'd':   [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000],
})

df_b
%%timeit
df = df_b.copy()
cols = df.columns[1:]
df["ind"] = df["ref"]

for col in cols:
    # assign the result back instead of relying on a chained inplace=True
    df["ind"] = df["ind"].mask(df["ind"] == col, df[col])
df
## 6.73 ms ± 129 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
df = df_b.copy()
arr = df.ref.values

cols = df.columns[1:]
for col in cols:
    arr2 = df[col].values
    arr = np.where(arr==col, arr2, arr)

df["ind"] = arr
df

## 1.21 ms ± 73 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    ref a   b   c   d   ind
0   a   1   10  100 1000    1
1   b   2   20  200 2000    20
2   c   3   30  300 3000    300
3   d   4   40  400 4000    4000
4   c   5   50  500 5000    500
5   b   6   60  600 6000    60
6   a   7   70  700 7000    7
7   b   8   80  800 8000    80
8   c   9   90  900 9000    900
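
Both loops above visit every column once. As an alternative sketch that is not part of the original answer, the same lookup can be done in one step with NumPy integer indexing, using Index.get_indexer to translate the labels in ref into column positions. No timing is claimed for it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ref': ['a', 'b', 'c', 'd', 'c', 'b', 'a', 'b', 'c'],
    'a': [1, 2, 3, 4, 5, 6, 7, 8, 9],
    'b': [10, 20, 30, 40, 50, 60, 70, 80, 90],
    'c': [100, 200, 300, 400, 500, 600, 700, 800, 900],
    'd': [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000],
})

value_cols = df.columns[1:]                  # 'a' .. 'd'
col_pos = value_cols.get_indexer(df['ref'])  # column position for each row's label
row_pos = np.arange(len(df))

# Pick one element per row: (row i, column named by df['ref'][i])
df['ind'] = df[value_cols].to_numpy()[row_pos, col_pos]

print(df['ind'].tolist())
```

get_indexer returns -1 for a label that is not among the columns, which would silently select the last column, so this sketch assumes every value in ref is a valid column name.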

Iterate over all columns of a pandas dataframe and count the values in each column (pd.Series.value_counts(dropna=False))


By : user3607112
Date : March 29 2020, 07:55 AM
You can use a lambda to pass the argument dropna=False to value_counts for each column:
code :
df.apply(lambda x: x.value_counts(dropna=False))
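
A small sketch of what this returns, assuming a hypothetical two-column frame with missing values (the data is made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values in both columns
df = pd.DataFrame({
    'x': [1.0, 1.0, np.nan],
    'y': [2.0, np.nan, np.nan],
})

# One value_counts per column; dropna=False keeps a row for NaN
counts = df.apply(lambda col: col.value_counts(dropna=False))

print(counts)
```

The result has one row per distinct value found in any column, including NaN. A cell is itself NaN where that value never occurs in the column.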

Get the most common values in a column according to another / Convert pandas series of series into a dictionary of list


By : user3707142
Date : March 29 2020, 07:55 AM
A pandas solution is possible, but it is slower because it takes several operations: create a DataFrame from the MultiIndex, then aggregate the values into lists. The plain Python loop with defaultdict that follows it gives the same result.
code :
# pandas version: MultiIndex -> DataFrame -> lists of B per A
d = common_values.index.to_frame(index=False).groupby('A')['B'].apply(list).to_dict()
print(d)
# {1: [10, 11], 3: [30]}

# plain Python version
from collections import defaultdict

d = defaultdict(list)
for a, b in common_values.index:
    d[a].append(b)

d = dict(d)
print(d)
# {1: [10, 11], 3: [30]}
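
To run either version, common_values has to exist. A self-contained sketch follows, assuming it is a Series whose MultiIndex has levels named A and B. The counts stored in it are hypothetical, and the index pairs are chosen to match the output shown above.

```python
from collections import defaultdict

import pandas as pd

# Hypothetical stand-in for common_values: a Series indexed by (A, B) pairs
index = pd.MultiIndex.from_tuples([(1, 10), (1, 11), (3, 30)], names=['A', 'B'])
common_values = pd.Series([5, 4, 7], index=index)

# pandas route: turn the index into a frame and collect B per A
d_pandas = (common_values.index.to_frame(index=False)
            .groupby('A')['B'].apply(list).to_dict())

# plain Python route: iterate over the (A, B) tuples directly
d_plain = defaultdict(list)
for a, b in common_values.index:
    d_plain[a].append(b)
d_plain = dict(d_plain)

print(d_pandas)
print(d_plain)
```

Only the index is used in both routes, so the values held in the Series do not affect the resulting dictionary.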