
aggregate in multilevel index



By : Dreqnoid
Date : November 20 2020, 03:01 PM
I have a DataFrame with two-level column indices and need to apply different aggregation functions to two keys (columns), but my code raises an error. How can I aggregate on multiple columns in a multilevel DataFrame?
You could flatten the column MultiIndex before grouping:
code :
df1 = pd.DataFrame(dic1)
df2 = df1.to_timestamp(how='end')
df2 = df2.rename_axis(['operation', 'YN'], axis=1)
df3 = df2.stack(level='YN').reset_index('YN')
# operation     YN  count       sum
# 1993-01-31  N.A.      0       NaN
# 1993-01-31    No      1    6.5820
# 1993-01-31   Yes      0       NaN
# 1993-02-28  N.A.      0       NaN
# 1993-02-28    No      1  131.1865
# 1993-02-28   Yes      0       NaN
# 1993-03-31  N.A.      0       NaN
# 1993-03-31    No      1  133.3105
# 1993-03-31   Yes      0       NaN
import numpy as np
import pandas as pd
Period = pd.Period
nan = np.nan

dic1 = {('count', 'N.A.'): {Period('1993-01', 'M'): 0, Period('1993-02', 'M'): 0, Period('1993-03', 'M'): 0}, ('count', 'No'): {Period('1993-01', 'M'): 1, Period('1993-02', 'M'): 1, Period('1993-03', 'M'): 1}, ('count', 'Yes'): {Period('1993-01', 'M'): 0, Period('1993-02', 'M'): 0, Period('1993-03', 'M'): 0}, ('sum', 'N.A.'): {Period('1993-01', 'M'): nan, Period('1993-02', 'M'): nan, Period('1993-03', 'M'): nan}, ('sum', 'No'): {Period('1993-01', 'M'): 6.5820000000000007, Period('1993-02', 'M'): 131.1865, Period('1993-03', 'M'): 133.31049999999999}, ('sum', 'Yes'): {Period('1993-01', 'M'): nan, Period('1993-02', 'M'): nan, Period('1993-03', 'M'): nan}}

df1 = pd.DataFrame(dic1)
df2 = df1.to_timestamp(how='end')
df2 = df2.rename_axis(['operation', 'YN'], axis=1)
df3 = df2.stack(level='YN').reset_index('YN')

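# pd.TimeGrouper was removed from pandas; pd.Grouper(freq=...) replaces it
# (recent pandas versions spell the year-end alias 'YE' instead of 'A')
grouped = df3.groupby([pd.Grouper(freq='A'), 'YN'])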
result = grouped.agg(
    {'count':['max', 'min', 'median', 'last'],  'sum':['mean', 'max' , 'last']})
result = result.unstack('YN')
print(result)
            sum                                                      count  \
           mean                 max               last                 max   
YN         N.A.         No Yes N.A.        No Yes N.A.        No Yes  N.A.   
1993-12-31  NaN  90.359667 NaN  NaN  133.3105 NaN  NaN  133.3105 NaN     0   

           ...                                            
           ...      min        median        last         
YN         ... Yes N.A. No Yes   N.A. No Yes N.A. No Yes  
1993-12-31 ...   0    0  1   0      0  1   0    0  1   0  
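
The same idea can be tried on a smaller frame. The following is a minimal sketch, not the original poster's data: the toy values are hypothetical, and grouping by `flat.index.year` is swapped in for the year-end Grouper so that the example does not depend on a frequency alias.

```python
import pandas as pd

# Hypothetical toy data: two-level columns (operation, YN) on a monthly index.
columns = pd.MultiIndex.from_product(
    [["count", "sum"], ["No", "Yes"]], names=["operation", "YN"]
)
index = pd.to_datetime(["1993-01-31", "1993-02-28", "1993-03-31"])
df = pd.DataFrame(
    [[1, 0, 6.5, 0.0], [1, 2, 131.0, 4.0], [1, 0, 133.5, 0.0]],
    index=index,
    columns=columns,
)

# Move the 'YN' level from the columns into the rows, so that 'count' and
# 'sum' become ordinary single-level columns.
flat = df.stack(level="YN").reset_index("YN")

# Group by calendar year and YN, with a different list of functions per column.
result = flat.groupby([flat.index.year, "YN"]).agg(
    {"count": ["max", "min"], "sum": ["mean", "max"]}
)
print(result)
```

Once the columns are flat, the dict passed to `agg` can name each column directly, which is what the two-level columns prevented.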


Aggregate arrays in multilevel array if their values match



By : Clayton
Date : March 29 2020, 07:55 AM
My favorite solution uses array_reduce():
code :
$filtered = array_reduce(
    // Reduce the original list
    $arrays,
    // The callback function adds $item to $carry (the partial result)
    function (array $carry, array $item) {
        // Generate a key that contains the first 3 properties
        $key = $item['product-id'].'|'.$item['product'].'|'.$item['description'];
        // Search into the partial list generated until now
        if (array_key_exists($key, $carry)) {
            // Update the existing item
            $carry[$key]['quantity'] += $item['quantity'];
        } else {
            // Add the new item
            $carry[$key] = $item;
        }
        // The array_reduce() callback must return the updated $carry
        return $carry;
    },
    // Start with an empty list
    array()
);
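
For readers more at home in Python, the same key-and-merge reduction might look roughly like this. It is a sketch only: the sample rows are hypothetical, and `functools.reduce` stands in for PHP's array_reduce().

```python
from functools import reduce

# Hypothetical sample rows; the field names mirror the PHP snippet.
rows = [
    {"product-id": 1, "product": "pen", "description": "blue", "quantity": 2},
    {"product-id": 1, "product": "pen", "description": "blue", "quantity": 3},
    {"product-id": 2, "product": "ink", "description": "black", "quantity": 1},
]


def merge(carry, item):
    # Build a key from the three properties that must match.
    key = (item["product-id"], item["product"], item["description"])
    if key in carry:
        # Matching item already seen: add up the quantities.
        carry[key]["quantity"] += item["quantity"]
    else:
        # First occurrence: store a copy so the input rows stay untouched.
        carry[key] = dict(item)
    # Like the PHP callback, return the updated partial result.
    return carry


filtered = reduce(merge, rows, {})
print(list(filtered.values()))
```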
Reset index on multilevel columns in pandas so that higher index prefaces lower index



By : Tear
Date : March 29 2020, 07:55 AM
I have a df with two-level columns like the one shown below. df.columns.map("".join) is enough, as follows:
code :
In [12]: df
Out[12]: 
   A        B      
   C  D  E  F  G  H
0  0  1  2  3  4  5
1  0  1  2  3  4  5
2  0  1  2  3  4  5
3  0  1  2  3  4  5
4  0  1  2  3  4  5
5  0  1  2  3  4  5

In [13]: df.columns = df.columns.map("".join)

In [14]: df
Out[14]: 
   AC  AD  AE  BF  BG  BH
0   0   1   2   3   4   5
1   0   1   2   3   4   5
2   0   1   2   3   4   5
3   0   1   2   3   4   5
4   0   1   2   3   4   5
5   0   1   2   3   4   5
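
To reproduce this outside an IPython session, a self-contained sketch could look like the following. The frame is rebuilt by hand from the printed output, so the construction step is an assumption about how the original df was made.

```python
import pandas as pd

# Rebuild a frame with the same two-level columns as in the session output.
columns = pd.MultiIndex.from_tuples(
    [("A", "C"), ("A", "D"), ("A", "E"), ("B", "F"), ("B", "G"), ("B", "H")]
)
df = pd.DataFrame([[0, 1, 2, 3, 4, 5]] * 6, columns=columns)

# Each column label is a tuple such as ("A", "C"); "".join turns it into "AC".
df.columns = df.columns.map("".join)
print(df.columns.tolist())
```

Using a separator string, for example `"_".join`, gives labels such as `A_C` instead.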
Aggregate document multilevel



By : user1094953
Date : March 29 2020, 07:55 AM
I have a document with a collection-like structure, plus a separate order collection. You can try the aggregation below:
code :
db.brand.aggregate([
    {
        $lookup: {
            from: "order",
            localField: "_id",
            foreignField: "brand_id",
            as: "orders"
        }
    },
    {
        $unwind: "$orders"
    },
    {
        $lookup: {
            from: "category",
            localField: "orders.category_id",
            foreignField: "_id",
            as: "categories"
        }
    },
    {
        $unwind: "$categories"
    },
    {
        $group: {
            _id: "$_id",
            name: { $first: "$name" },
            description: { $first: "$description" },
            updated_at: { $first: "$updated_at" },
            created_at: { $first: "$created_at" },
            categories: { $addToSet: "$categories" },
            orders: { $addToSet: "$orders" }
        }
    },
    {
        $addFields: {
            categories: {
                $map: {
                    input: "$categories",
                    as: "category",
                    in: {
                        $mergeObjects: [ 
                            "$$category", { 
                                orders: [ { 
                                    $filter: { 
                                        input: "$orders", 
                                        as: "order", 
                                        cond: { $eq: [ "$$category._id", "$$order.category_id" ] } 
                                    } 
                                } ]
                         } ]
                    }
                }
            }
        }
    },
    {
        $project: {
            orders: 0
        }
    }
])
{
    "_id" : ObjectId("5b0e52f058b8287a446f9f05"),
    "name" : "brand1",
    "description" : "brand1",
    "updated_at" : ISODate("2017-07-05T09:18:13.951Z"),
    "created_at" : ISODate("2017-07-05T09:18:13.951Z"),
    "categories" : [
            {
                    "_id" : ObjectId("5693d170a2191f9020b8c814"),
                    "name" : "Category1",
                    "created_at" : ISODate("2016-01-11T20:32:17.832Z"),
                    "updated_at" : ISODate("2016-01-11T20:32:17.832Z")
            }
    ],
    "orders" : [
            {
                    "_id" : ObjectId("5788fcd1d8159c2366dd5d93"),
                    "color" : "Blue",
                    "code" : "1",
                    "category_id" : ObjectId("5693d170a2191f9020b8c814"),
                    "description" : "julia tried",
                    "name" : "Order1",
                    "brand_id" : ObjectId("5b0e52f058b8287a446f9f05")
            }
    ]
}
// Same pipeline, but listing the category fields explicitly instead of using $mergeObjects
db.brand.aggregate([
    {
        $lookup: {
            from: "order",
            localField: "_id",
            foreignField: "brand_id",
            as: "orders"
        }
    },
    {
        $unwind: "$orders"
    },
    {
        $lookup: {
            from: "category",
            localField: "orders.category_id",
            foreignField: "_id",
            as: "categories"
        }
    },
    {
        $unwind: "$categories"
    },
    {
        $group: {
            _id: "$_id",
            name: { $first: "$name" },
            description: { $first: "$description" },
            updated_at: { $first: "$updated_at" },
            created_at: { $first: "$created_at" },
            categories: { $addToSet: "$categories" },
            orders: { $addToSet: "$orders" }
        }
    },
    {
        $addFields: {
            categories: {
                $map: {
                    input: "$categories",
                    as: "category",
                    in: {
                        _id: "$$category._id",
                        name: "$$category.name",
                        created_at: "$$category.created_at",
                        updated_at: "$$category.updated_at",
                        orders: [ 
                            { 
                                $filter: { 
                                    input: "$orders", 
                                    as: "order", 
                                    cond: { $eq: [ "$$category._id", "$$order.category_id" ] } 
                                } 
                            } 
                        ]
                    }
                }
            }
        }
    },
    {
        $project: {
            orders: 0
        }
    }
])
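
What the $map / $filter stage does can be illustrated in plain Python. This is only a sketch of the nesting logic, with hypothetical in-memory lists in place of the grouped arrays. It does not talk to MongoDB, and it attaches the matching orders as a flat list, whereas the pipelines above wrap the $filter result in an extra one-element array.

```python
# Hypothetical stand-ins for the "categories" and "orders" arrays that the
# $group stage collects for one brand.
categories = [
    {"_id": "c1", "name": "Category1"},
    {"_id": "c2", "name": "Category2"},
]
orders = [
    {"_id": "o1", "name": "Order1", "category_id": "c1"},
    {"_id": "o2", "name": "Order2", "category_id": "c2"},
    {"_id": "o3", "name": "Order3", "category_id": "c1"},
]

# For each category, keep only the orders whose category_id matches its _id
# and attach them under an "orders" key.
nested = [
    {**category, "orders": [o for o in orders if o["category_id"] == category["_id"]]}
    for category in categories
]
print(nested)
```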
Aggregate pandas Series/DataFrame with MultiLevel Index And Insert Result



By : user3303286
Date : March 29 2020, 07:55 AM
Given a pandas Series (or DataFrame) with a multi-level index, here is one way, using unstack:
code :
s=df['count'].unstack()
s['sum']=s.sum(1)
s=s.stack()
name  month  
A     2019-05     8.0
      2019-06     8.0
      2019-07     3.0
      2019-08     4.0
      2019-09     7.0
      sum        30.0
B     2019-06    10.0
      2019-07     5.0
      2019-08    23.0
      2019-09    10.0
      2019-10    13.0
      sum        61.0
dtype: float64
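
A runnable version of this round trip, on hypothetical data, might look as follows. Every name has a value for every month here, so no NaN handling comes into play.

```python
import pandas as pd

# Hypothetical data: a 'count' column indexed by (name, month).
index = pd.MultiIndex.from_product(
    [["A", "B"], ["2019-05", "2019-06"]], names=["name", "month"]
)
df = pd.DataFrame({"count": [8, 8, 10, 5]}, index=index)

# Pivot months into columns, add a per-name total, then pivot back.
s = df["count"].unstack()
s["sum"] = s.sum(axis=1)
s = s.stack()
print(s)
```

The total ends up as an extra entry labelled `sum` in the month level of each name.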
Python MultiLevel Index on DataFrame. Access first row of first index level to apply function



By : user3695655
Date : March 29 2020, 07:55 AM
Use MultiIndex.get_level_values to extract the second level, then read DatetimeIndex.quarter:
code :
fake_data['qtr'] = fake_data.index.get_level_values(1).quarter
print (fake_data.head(20))
                       z  qtr
x y                          
0 2020-04-01    0.000000    2
  2020-05-01    0.000000    2
  2020-06-01    0.000000    2
  2020-07-01    0.000000    3
  2020-08-01    0.000000    3
  2020-09-01    0.000000    3
  2020-10-01    0.000000    4
  2020-11-01    0.000000    4
  2020-12-01    0.000000    4
  2021-01-01    0.000000    1
  2021-02-01    0.000000    1
1 2020-04-01  983.538088    2
  2020-05-01  983.538088    2
  2020-06-01  983.538088    2
  2020-07-01  983.538088    3
  2020-08-01  983.538088    3
  2020-09-01  983.538088    3
  2020-10-01  983.538088    4
  2020-11-01  983.538088    4
  2020-12-01  983.538088    4
# pd.factorize relabels the quarter values as 0, 1, 2, ... in order of first appearance
fake_data['qtr'] = pd.factorize(fake_data.index.get_level_values(1).quarter)[0]
print (fake_data.head(20))
                      z  qtr
x y                         
0 2020-04-01   0.000000    0
  2020-05-01   0.000000    0
  2020-06-01   0.000000    0
  2020-07-01   0.000000    1
  2020-08-01   0.000000    1
  2020-09-01   0.000000    1
  2020-10-01   0.000000    2
  2020-11-01   0.000000    2
  2020-12-01   0.000000    2
  2021-01-01   0.000000    3
  2021-02-01   0.000000    3
1 2020-04-01  80.286425    0
  2020-05-01  80.286425    0
  2020-06-01  80.286425    0
  2020-07-01  80.286425    1
  2020-08-01  80.286425    1
  2020-09-01  80.286425    1
  2020-10-01  80.286425    2
  2020-11-01  80.286425    2
  2020-12-01  80.286425    2
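
Both variants can be reproduced with a small stand-in for fake_data. The construction below is an assumption, since the original frame is not shown: two x values crossed with eleven month starts from 2020-04-01, and a constant z.

```python
import pandas as pd

# Hypothetical stand-in for fake_data: index levels x (int) and y (dates).
dates = pd.date_range("2020-04-01", periods=11, freq="MS")
index = pd.MultiIndex.from_product([[0, 1], dates], names=["x", "y"])
fake_data = pd.DataFrame({"z": 0.0}, index=index)

# Calendar quarter (1 to 4) of the second index level.
quarters = fake_data.index.get_level_values(1).quarter
fake_data["qtr"] = quarters

# Alternative labelling: codes 0, 1, 2, ... in order of first appearance.
fake_data["qtr_code"] = pd.factorize(quarters)[0]
print(fake_data.head(11))
```

Note that factorize only looks at the quarter number, so a date range longer than one year would reuse the same code for, say, Q2 2020 and Q2 2021.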