How to create two bins from 4 levels in pandas dataframe?


By : user2174679
Date : October 17 2020, 08:10 AM
One of the columns in my pandas DataFrame is called "Daughter". I would like to bin it so that rows with 0 receive the label "None" and rows containing 1, 2, 3, or 4 receive the label "Some". Answer: pd.cut with the bin edges -inf, 0, and inf splits the column into exactly these two groups. Is this what you need?
code :
pd.cut(df.Daughter,[-np.inf,0,np.inf],labels=['None','some'])
Out[35]: 
0    None
1    None
2    some
3    some
4    some
5    some
Name: Daughter, dtype: category
Categories (2, object): [None < some]
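
For readers who want to run the answer end to end, here is a minimal sketch. The sample values in the "Daughter" column and the output column names are assumptions, since the original data is not shown in the post. The np.where line is an alternative to the answer's method, and it only matches pd.cut when the column has no negative or missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the original column values are not shown in the post.
df = pd.DataFrame({"Daughter": [0, 0, 1, 2, 3, 4]})

# pd.cut uses right-closed intervals by default:
# (-inf, 0] -> "None" and (0, inf] -> "Some"
df["Daughter_bin"] = pd.cut(
    df["Daughter"],
    bins=[-np.inf, 0, np.inf],
    labels=["None", "Some"],
)

# Alternative two-way split without pd.cut (assumes non-negative, non-missing values)
df["Daughter_flag"] = np.where(df["Daughter"] == 0, "None", "Some")

print(df)
```

pd.cut returns an ordered categorical column, while np.where returns plain strings, so the first form is the better fit if the labels need to be sorted or compared later.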



python pandas dataframe create bins only for data in threshold


By : tongtong
Date : March 29 2020, 07:55 AM
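In a pandas DataFrame "df" with three columns, including play_count, one approach is to compute nine equal-width bins for the rows with play_count below 200 and then append one more edge that covers everything above: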
code :
import numpy as np
import pandas as pd

# get thresholds for the first 9 bins
_, bins = pd.cut(df[df.play_count < 200].play_count, bins=9, retbins=True)

# append a threshold representing the class with play_count >= 200
# (pd.np was removed from pandas; call numpy directly)
new_bins = np.append(bins, float(df.play_count.max()))

# our categorized data
out = pd.cut(df.play_count,bins=new_bins)

# a histogram of the data with the updated bins
df.play_count.hist(bins=new_bins)
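
The same steps as a self-contained sketch. The play_count values below are made up for illustration, and the histogram call is left out so the example runs without matplotlib. It assumes at least one value lies above the largest value under 200, because pd.cut requires strictly increasing bin edges.

```python
import numpy as np
import pandas as pd

# Hypothetical play counts; the original data is not shown in the post.
df = pd.DataFrame({"play_count": [1, 5, 20, 50, 100, 150, 199, 250, 1000, 5000]})

# Nine equal-width bins over the values below 200 (10 edges)
below = df.loc[df["play_count"] < 200, "play_count"]
_, bins = pd.cut(below, bins=9, retbins=True)

# One extra edge so that a tenth bin covers everything up to the maximum
new_bins = np.append(bins, float(df["play_count"].max()))

# Categorize the full column with the extended edges
out = pd.cut(df["play_count"], bins=new_bins)
print(out.value_counts(sort=False))
```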

Pandas Dataframe create arbitrary Bins by row count


By : ChunGang Xu
Date : March 29 2020, 07:55 AM
Let's start with a random DataFrame with 50 rows:
code :
df = pd.DataFrame(np.random.randn(50, 4), columns=list("ABCD"))
print(df)
           A         B         C         D
0   0.113454  3.357840 -0.413755 -1.089784
1   0.800012  0.655826  0.688414  0.012480
2   0.604902 -0.332028  0.470119 -0.370570
3   0.661120  0.635879 -0.441816 -0.847047
4   0.836218  2.597254  1.029996  0.554012
..  0.076679  0.262971  0.687525  0.195338
49  1.948361 -0.801236  2.075301 -0.540771
# integer-divide each row position by the chunk size to group consecutive rows into chunks of 5
for sub_df_index, sub_df in df.groupby(np.arange(len(df)) // 5):
    print(sub_df.head(10))
          A         B         C         D
0  0.113454  3.357840 -0.413755 -1.089784
1  0.800012  0.655826  0.688414  0.012480
2  0.604902 -0.332028  0.470119 -0.370570
3  0.661120  0.635879 -0.441816 -0.847047
4  0.836218  2.597254  1.029996  0.554012
          A         B         C         D
5 -0.236094  1.714750 -0.091074  0.182944
6  0.928875 -1.125854  0.493389  0.309107
7 -0.238064  1.566493 -0.244627  0.744391
8  0.041049  0.423166  1.020502 -0.467028
9  0.290232  2.119993 -0.174697  0.784637
           A         B         C         D
10 -0.600395  0.604698  0.220617  2.122293
11  0.717157 -0.067665 -1.150331 -0.683567
12  1.006764 -0.869975 -1.646339  0.632909
13  0.076679  0.262971  0.687525  0.195338
14 -0.582238  0.236346 -0.903972 -0.223720
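# the group key is the chunk number, so it can be stored as a column
for sub_df_index, sub_df in df.groupby(np.arange(len(df)) // 5):
    # assign returns a new frame, which avoids SettingWithCopyWarning
    sub_df = sub_df.assign(sub_index=sub_df_index)
    print(sub_df.head(10))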
         A         B         C         D  sub_index
0  0.113454  3.357840 -0.413755 -1.089784          0
1  0.800012  0.655826  0.688414  0.012480          0
2  0.604902 -0.332028  0.470119 -0.370570          0
3  0.661120  0.635879 -0.441816 -0.847047          0
4  0.836218  2.597254  1.029996  0.554012          0
          A         B         C         D  sub_index
5 -0.236094  1.714750 -0.091074  0.182944          1
6  0.928875 -1.125854  0.493389  0.309107          1
7 -0.238064  1.566493 -0.244627  0.744391          1
8  0.041049  0.423166  1.020502 -0.467028          1
9  0.290232  2.119993 -0.174697  0.784637          1
           A         B         C         D  sub_index
10 -0.600395  0.604698  0.220617  2.122293          2
11  0.717157 -0.067665 -1.150331 -0.683567          2
12  1.006764 -0.869975 -1.646339  0.632909          2
13  0.076679  0.262971  0.687525  0.195338          2
14 -0.582238  0.236346 -0.903972 -0.223720          2
# or label the chunks directly on the original frame, without a loop
df["sub_index"] = np.arange(len(df)) // 5
           A         B         C         D  sub_index
0  -1.381390  0.523980  1.306372  0.000278          0
1  -0.425316  0.937133  0.627025 -0.439032          0
2  -0.443357  0.160292  0.450645 -0.366276          0
3  -2.222720 -1.768990 -0.067939  1.239722          0
4   2.039943  0.774243  0.108462  0.192314          0
5  -0.702514 -1.258634 -1.086802  1.151799          1
6   1.269017  1.115269 -0.417813  1.161220          1
7  -0.620205 -0.054393  0.431089  0.436805          1
8  -2.321976 -1.269446  0.927542 -0.069101          1
9   0.387243  0.055290  1.519623 -0.732410          1
10 -0.227690 -1.991782 -0.712146  0.003375          2
11 -1.396515 -0.074016 -1.141520 -0.226016          2
12 -0.430559  1.347512 -0.773859  1.016727          2
13  0.867294  0.924141 -0.484293 -0.666916          2
14 -0.224497  0.818024  1.057355  1.700363          2
15 -0.790723 -0.039521  1.529804 -0.415783          3
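
Putting the pieces together, a minimal runnable sketch of chunking by row count. The names chunk_size, sizes, and chunks are illustrative and do not come from the original answer, and the random seed is fixed only to make the run repeatable.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((50, 4)), columns=list("ABCD"))

chunk_size = 5  # hypothetical chunk size; any positive integer works

# Row position // chunk_size gives 0 for the first chunk, 1 for the next, ...
df["sub_index"] = np.arange(len(df)) // chunk_size

# Number of rows per chunk
sizes = df.groupby("sub_index").size()

# The chunks themselves, as a list of DataFrames
chunks = [chunk for _, chunk in df.groupby("sub_index")]
print(sizes)
```

If len(df) is not a multiple of chunk_size, the last chunk is simply shorter than the others.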

From pandas Dataframe to dataframe timeseries in 15minute Bins by sum rows


By : alanois
Date : March 29 2020, 07:55 AM
First use dt.floor to round each timestamp down to the start of its 15-minute interval, then use groupby.count with resample so that intervals without arrivals also appear:
code :
# '15min' replaces the deprecated '15T' alias; fillna(..., downcast=...) is deprecated as well
df = (df.groupby(df['Arrival_time'].dt.floor('15min'))['Arrival_time'].count()
        .resample('15min')
        .mean()
        .fillna(0)
        .astype(int)
        .reset_index(name='Counted_Arrival'))

print(df)
         Arrival_time  Counted_Arrival
0 2019-01-01 05:30:00                3
1 2019-01-01 05:45:00                0
2 2019-01-01 06:00:00                2
3 2019-01-01 06:15:00                1
# the input DataFrame, printed before the transformation above
print(df)
   ID        Arrival_time
0  22 2019-01-01 05:34:10
1  23 2019-01-01 05:36:18
2  24 2019-01-01 05:44:24
3  25 2019-01-01 06:10:26
4  26 2019-01-01 06:08:28
5  27 2019-01-01 06:22:29
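
As a cross-check, here is a shorter variant that reaches the same counts. Note that it swaps in a different technique from the answer: it resamples on a DatetimeIndex and uses size() instead of floor, groupby, and fillna. The result name counts is an assumption; the input values are the ones printed above.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [22, 23, 24, 25, 26, 27],
    "Arrival_time": pd.to_datetime([
        "2019-01-01 05:34:10",
        "2019-01-01 05:36:18",
        "2019-01-01 05:44:24",
        "2019-01-01 06:10:26",
        "2019-01-01 06:08:28",
        "2019-01-01 06:22:29",
    ]),
})

# resample needs a datetime index; size() counts rows per 15-minute bin
# and reports 0 for bins that contain no rows
counts = (
    df.set_index("Arrival_time")
      .resample("15min")
      .size()
      .reset_index(name="Counted_Arrival")
)
print(counts)
```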

Variable bins for each row in pandas dataframe


By : user3470366
Date : March 29 2020, 07:55 AM
Given a coordinate DataFrame such as df1 = pd.DataFrame({'x': np.tile(np.arange(20),5), 'y': np.repeat(np.arange(5),20)}), where the rows of each y group are cut into y + 1 bins, if I understand correctly:
code :
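# use .iloc[0] for the first row of each group: d['y'][0] looks up the index
# label 0, which only exists in the first group
df1['xbinned'] = (df1.groupby('y')
                     .apply(lambda d: pd.cut(d['x'], bins=d['y'].iloc[0]+1))
                     .reset_index(level=0, drop=True)
                 )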
     x  y         xbinned
18  18  0  (-0.019, 19.0]
19  19  0  (-0.019, 19.0]
38  18  1     (9.5, 19.0]
39  19  1     (9.5, 19.0]
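
The same idea written as an explicit loop, which may be easier to follow than groupby.apply. This is a sketch rather than the answer's exact method: the names parts, y_value, and group are illustrative, and each group's result is converted to object dtype so that groups with different bin edges can be concatenated predictably.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "x": np.tile(np.arange(20), 5),
    "y": np.repeat(np.arange(5), 20),
})

parts = []
for y_value, group in df1.groupby("y"):
    # Cut this group's x values into y + 1 equal-width bins
    binned = pd.cut(group["x"], bins=y_value + 1)
    # Each group has its own categories, so store plain Interval objects
    parts.append(binned.astype(object))

# pd.concat keeps the original row labels, so the assignment aligns by index
df1["xbinned"] = pd.concat(parts)
print(df1.loc[[18, 19, 38, 39]])
```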

How to write pandas dataframe containing bins to a file so it can be read back into pandas?


By : George Fuller
Date : March 29 2020, 07:55 AM
You might want to check out pandas.DataFrame.to_pickle and pandas.read_pickle, which keep the binned column as Interval objects when it is read back:
code :
>>> df.to_pickle("./test.pkl")
...
...
>>> df = pd.read_pickle("./test.pkl")
>>> type(df['aBins'].iloc[0]) 
pandas._libs.interval.Interval
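
A small round-trip sketch of the pickle approach. The column "a", its values, and the bin edges are invented for illustration; only the column name aBins comes from the answer. A temporary directory is used so the example leaves no file behind.

```python
import os
import tempfile

import pandas as pd

# Hypothetical frame with a binned column
df = pd.DataFrame({"a": [1, 5, 9]})
df["aBins"] = pd.cut(df["a"], bins=[0, 3, 6, 10])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test.pkl")
    df.to_pickle(path)               # write the frame, dtypes included
    restored = pd.read_pickle(path)  # read it back

# The bins are still Interval objects, not strings
print(type(restored["aBins"].iloc[0]))
```

By contrast, a CSV round trip stores each bin as text such as "(0, 3]", so the intervals would have to be parsed again after reading. Pickle files should only be loaded from sources you trust.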