How to summarize all possible combinations of variables?



By : Alex1290
Date : November 20 2020, 03:01 PM
For this sort of query, the built-in aggregate tools (GROUP BY CUBE together with GROUPING and GROUPING_ID) make the job quite straightforward.
First, set up some sample data based on your sample image, then count the rows for every combination of non-zero columns:
code :
declare @Table1 as table
    ([id] int, [a] int, [b] int, [c] int)
;

INSERT INTO @Table1
    ([id], [a], [b], [c])
VALUES
    (10001, 1, 3, 3),
    (10002, 0, 0, 0),
    (10003, 3, 6, 0),
    (10004, 7, 0, 0),
    (10005, 0, 0, 0)
;
with t1 as (
select case a when 0 then null else 'a' end a
     , case b when 0 then null else 'b' end b
     , case c when 0 then null else 'c' end c
     , id
  from @Table1
)
select a, b, c, count(id) cnt
  from t1
  group by cube(a,b,c)
  having (a is not null or grouping(a) = 1) -- For each attribute
     and (b is not null or grouping(b) = 1) -- only allow nulls as
     and (c is not null or grouping(c) = 1) -- a result of grouping.
     and grouping_id(a,b,c) <> 7  -- exclude the grand total
  order by grouping_id(a,b,c);
Result:
a       b       c       cnt
a       b       c       1
a       b       NULL    2
a       NULL    c       1
a       NULL    NULL    3
NULL    b       c       1
NULL    b       NULL    2
NULL    NULL    c       1
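A dynamic version builds the same query from the column list in sys.columns. It assumes the data is in a permanent table named Table1 rather than the table variable used above:
code :
declare @sql varchar(max), @table varchar(30), @col varchar(30);
set @table = 'Table1'; -- table to summarize
set @col = 'id';       -- key column, excluded from the combinations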
with x(object_id, column_id, name, names, proj, pred, max_col, cnt) 
  as (
    select object_id, column_id, name, cast(name as varchar(max))
     , cast('case '+name+' when 0 then null else '''+name+''' end '+name as varchar(4000))
     , cast('('+name+' is not null or grouping('+name+') = 1)' as varchar(4000))
     , (select max(column_id) from sys.columns m where m.object_id = c.object_id and m.name <> @col)
     , 1
     from sys.columns c
    where object_id = OBJECT_ID(@Table)
      and column_id = (select min(column_id) from sys.columns m where m.object_id = c.object_id and m.name <> @col)
    union all
    select x.object_id, c.column_id, c.name, cast(x.names+', '+c.name as varchar(max))
     , cast(proj+char(13)+char(10)+'     , case '+c.name+' when 0 then null else '''+c.name+''' end '+c.name as varchar(4000))
     , cast(pred+char(13)+char(10)+'   and ('+c.name+' is not null or grouping('+c.name+') = 1)' as varchar(4000))
     , max_col
     , cnt+1
      from x join sys.columns c on c.object_id = x.object_id and c.column_id = x.column_id+1
)
select @sql='with t1 as (
select '+proj+'
     , '+@col+'
  from '+@Table+'
)
select '+names+'
     , count('+@col+') cnt 
  from t1
 group by cube('+names+')
having '+pred+'
   and grouping_id('+names+') <> '+cast(power(2,cnt)-1 as varchar(10))+'
 order by grouping_id('+names+');'
  from x where column_id = max_col;

select @sql sql;
exec (@sql);
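For readers who want to check the counts outside the database, here is a minimal Python sketch of the same idea. It is not the answer's method, only an illustration: for every non-empty subset of columns it counts the rows in which all of those columns are non-zero. The rows are copied from the INSERT statement above; the function name `count_combinations` is made up for this example.

```python
from itertools import combinations

# Sample rows copied from the INSERT statement above: (id, a, b, c)
rows = [
    (10001, 1, 3, 3),
    (10002, 0, 0, 0),
    (10003, 3, 6, 0),
    (10004, 7, 0, 0),
    (10005, 0, 0, 0),
]
columns = ["a", "b", "c"]


def count_combinations(rows, columns):
    """Count, for every non-empty subset of columns, the rows in which
    all columns of that subset are non-zero (hypothetical helper)."""
    counts = {}
    for size in range(len(columns), 0, -1):
        for subset in combinations(range(len(columns)), size):
            names = tuple(columns[i] for i in subset)
            # i + 1 skips the leading id field in each row
            counts[names] = sum(
                all(row[i + 1] != 0 for i in subset) for row in rows
            )
    return counts


result = count_combinations(rows, columns)
for names, cnt in result.items():
    print(names, cnt)
```

Because the subsets are generated from the column list, adding a column only means extending `columns` and the row tuples, which plays the same role as the dynamic SQL version.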


using ddply to summarize when not all combinations exist



By : Baiyang Zhang
Date : November 21 2020, 07:35 AM
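I am trying to count the number of observations that occurred in each month for each combination of site and variable. The example below builds sample data in that format and tabulates the months with ddply, using a factor with all 12 month levels so that months without observations are reported as 0: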
code :
## make some data like yours
set.seed(1)
dat <- seq(as.POSIXct(42, origin = "1990-01-01"), Sys.time(), length.out = 100)
seasons <- data.frame(
  station = sample(LETTERS[1:10], length(dat), TRUE),
  variable = paste0("v", sample(1:5, length(dat), TRUE)),
  date = dat,
  month = as.integer(format(dat, "%m"))
  )

head(seasons)
##   station variable                date month
## 1       C       v4 1989-12-31 19:00:42    12
## 2       D       v2 1990-03-30 18:45:47     3
## 3       F       v2 1990-06-27 19:30:52     6
## 4       J       v5 1990-09-24 19:15:57     9
## 5       C       v4 1990-12-22 18:01:02    12
## 6       I       v2 1991-03-21 17:46:07     3

library(plyr)

out <- ddply(seasons, .(station, variable), function(x)
             table(factor(x$month, levels = 1:12, labels = month.abb)))

head(out)
##   station variable Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1       A       v1   0   0   1   0   1   0   0   0   0   0   0   0
## 2       A       v2   0   0   1   0   0   0   0   0   0   0   0   0
## 3       A       v3   0   1   1   0   1   0   0   0   0   0   0   0
## 4       A       v4   0   0   0   0   0   0   1   0   0   0   0   0
## 5       B       v1   0   0   0   0   0   0   0   1   0   0   0   0
## 6       B       v3   1   0   0   0   0   0   0   0   1   0   0   0
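The key step above is tabulating against a fixed list of 12 month levels, so that missing months still appear as zeros. A rough Python sketch of that step follows; the observations and the function name `monthly_counts` are hypothetical and are not taken from the data above.

```python
from collections import Counter
from datetime import date

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical observations: (station, variable, date)
observations = [
    ("A", "v1", date(1990, 3, 5)),
    ("A", "v1", date(1991, 3, 9)),
    ("A", "v1", date(1990, 5, 1)),
    ("B", "v3", date(1992, 1, 20)),
]


def monthly_counts(observations):
    """Return {(station, variable): [count for Jan, ..., count for Dec]}."""
    counts = Counter((s, v, d.month) for s, v, d in observations)
    groups = sorted({(s, v) for s, v, _ in observations})
    # A Counter returns 0 for keys it has never seen, which fills the
    # months that have no observations.
    return {
        group: [counts[(*group, m)] for m in range(1, 13)]
        for group in groups
    }


table = monthly_counts(observations)
print("station variable", *MONTHS)
for (station, variable), row in table.items():
    print(station, variable, *row)
```

As in the ddply version, only station/variable pairs that occur in the data get a row; the zero filling applies to the months within each pair.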
Summarize two columns given unique combinations



By : new2spark
Date : March 29 2020, 07:55 AM
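I have a data frame of trips denoting start locations, finish locations, and the distance between each location combination. The code below uses data.table to build an order-independent key from Start and Finish, count the trips per key, and keep one row per key: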
code :
library(data.table)
myDT <- data.table(myDF)
x <- paste(myDT$Start, myDT$Finish, sep = "|")
myDT$v <- vapply(x, function(xi) paste(sort(strsplit(xi, "[|]")[[1]]), collapse=''), '')
myDT[, Count := length(Distance), by = v]
myDT <- myDT[!duplicated(v), ]
myDT

#          Start      Finish Distance           v Count
#1:  Johns House Mikes House     1000  JohnsMikes     2
#2: Franks House Lisas House      500 FranksLisas     1
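The same order-independent key can be sketched in Python by sorting each (Start, Finish) pair before using it as a dictionary key. The trip list below is an assumption reconstructed from the output shown above, since the original input is not included, and `summarize_pairs` is a made-up name.

```python
# Hypothetical input, reconstructed from the output shown above
trips = [
    ("Johns House", "Mikes House", 1000),
    ("Mikes House", "Johns House", 1000),
    ("Franks House", "Lisas House", 500),
]


def summarize_pairs(trips):
    """Keep the first row seen for each unordered location pair and
    count how many trips share that pair."""
    summary = {}
    for start, finish, distance in trips:
        # Sorting makes (A, B) and (B, A) map to the same key
        key = tuple(sorted((start, finish)))
        if key not in summary:
            summary[key] = {"Start": start, "Finish": finish,
                            "Distance": distance, "Count": 0}
        summary[key]["Count"] += 1
    return list(summary.values())


pairs = summarize_pairs(trips)
for row in pairs:
    print(row)
```

Using a tuple of the two names as the key, rather than pasting them into one string, avoids collisions between different pairs whose concatenated names happen to match.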
How to summarize on different groupby combinations?



By : Tim Nevermore
Date : March 29 2020, 07:55 AM
I am compiling a table of the top-3 crops by county. Some counties have the same crop varieties in the same order; other counties have the same crop varieties in a different order.
Method 1:
Combine the crop columns into one list, sort it, and group on the sorted tuple
code :
>>> df1['combined_temp'] = df1.apply(lambda x : list([x['Crop1'],
...                           x['Crop2'],
...                           x['Crop3']]),axis=1)
>>> df1.head()
       County   Crop1    Crop2    Crop3  Total_pop              combined_temp
0      Harney   grain   melons   apples       2000    [grain, melons, apples]
1       Baker  melons    grain   apples       1500    [melons, grain, apples]
2     Wheeler  melons    grain   apples       3000    [melons, grain, apples]
3  Hood River  apples   melons    grain       1500    [apples, melons, grain]
4       Wasco   pears  carrots  raddish       2000  [pears, carrots, raddish]
>>> df1['sorted'] = df1.apply(lambda x : tuple(sorted(x['combined_temp'])),axis=1)
>>> df1.head()
       County   Crop1    Crop2            ...             Total_pop              combined_temp                     sorted
0      Harney   grain   melons            ...                  2000    [grain, melons, apples]    (apples, grain, melons)
1       Baker  melons    grain            ...                  1500    [melons, grain, apples]    (apples, grain, melons)
2     Wheeler  melons    grain            ...                  3000    [melons, grain, apples]    (apples, grain, melons)
3  Hood River  apples   melons            ...                  1500    [apples, melons, grain]    (apples, grain, melons)
4       Wasco   pears  carrots            ...                  2000  [pears, carrots, raddish]  (carrots, pears, raddish)
>>> df1_grouped = df1.groupby(['sorted'])['Total_pop'].sum().reset_index()
>>> df1_grouped
                      sorted  Total_pop
0    (apples, grain, melons)       8000
1  (carrots, pears, raddish)       9200
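Method 2:
Sort the crop values within each row, then group on the crop columns
code :
import pandas as pd

df = df1.copy()

grouping_cols = ['Crop1', 'Crop2', 'Crop3']

# Sort each row's crops so that the column order no longer matters.
# Note: set() drops duplicates, so this assumes the three crops in a row are distinct.
df[grouping_cols] = pd.DataFrame(df.loc[:, grouping_cols]
                            .apply(set, axis=1)
                            .apply(sorted)
                            .values
                            .tolist(), columns=grouping_cols)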

>>> df.head()
       County    Crop1  Crop2    Crop3  Total_pop
0      Harney   apples  grain   melons       2000
1       Baker   apples  grain   melons       1500
2     Wheeler   apples  grain   melons       3000
3  Hood River   apples  grain   melons       1500
4       Wasco  carrots  pears  raddish       2000
>>> df.groupby(grouping_cols).Total_pop.sum()
Crop1    Crop2  Crop3  
apples   grain  melons     8000
carrots  pears  raddish    9200
Name: Total_pop, dtype: int64
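Both methods rely on the same idea: sort the crops within a row so that the order stops mattering, then sum per sorted combination. A dependency-free sketch of that idea, using only the five rows visible in the head() output above, might look like this (the function name is made up). Because the remaining rows of df1 are not shown, the carrots/pears/raddish total here is 2000 rather than the 9200 reported above.

```python
from collections import defaultdict

# The five rows visible in df1.head() above:
# (County, Crop1, Crop2, Crop3, Total_pop)
counties = [
    ("Harney", "grain", "melons", "apples", 2000),
    ("Baker", "melons", "grain", "apples", 1500),
    ("Wheeler", "melons", "grain", "apples", 3000),
    ("Hood River", "apples", "melons", "grain", 1500),
    ("Wasco", "pears", "carrots", "raddish", 2000),
]


def total_pop_by_crop_set(rows):
    """Sum Total_pop per crop combination, ignoring crop order."""
    totals = defaultdict(int)
    for _county, crop1, crop2, crop3, pop in rows:
        key = tuple(sorted((crop1, crop2, crop3)))
        totals[key] += pop
    return dict(totals)


totals = total_pop_by_crop_set(counties)
for crops, pop in totals.items():
    print(crops, pop)
```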
How to summarize key statistics by two variables?



By : user2754760
Date : March 29 2020, 07:55 AM
Here is a sample using your data, grouping by sex; further grouping variables can be added to group_by():
code :
library(dplyr)
dat %>% 
  group_by(sex) %>%  
  summarise(mean = mean(income), 
            var = var(income),
            sd = sd(income))
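To illustrate grouping by two variables at once, here is a small Python sketch that uses a tuple of both variables as the group key. The records and the second variable `region` are invented for the example; only `sex` and `income` appear in the code above. `statistics.variance` and `statistics.stdev` compute the sample (n - 1) versions, which matches R's var() and sd().

```python
from collections import defaultdict
from statistics import mean, stdev, variance

# Hypothetical records: (sex, region, income); region is an assumed
# second grouping variable that is not in the original data.
records = [
    ("F", "north", 30000),
    ("F", "north", 34000),
    ("M", "north", 28000),
    ("M", "north", 36000),
    ("M", "north", 32000),
    ("F", "south", 40000),
    ("F", "south", 44000),
]


def summarize_by_two(records):
    """Return {(sex, region): {"mean": ..., "var": ..., "sd": ...}}."""
    groups = defaultdict(list)
    for sex, region, income in records:
        groups[(sex, region)].append(income)
    # variance() and stdev() need at least two values per group
    return {
        key: {"mean": mean(vals), "var": variance(vals), "sd": stdev(vals)}
        for key, vals in groups.items()
    }


stats = summarize_by_two(records)
for key, row in stats.items():
    print(key, row)
```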
Summarize variables beside



By : AspiringDev1981
Date : March 29 2020, 07:55 AM
I am looking for a solution to my problem, which I can currently only solve by rearranging the columns manually. One option would be to post-process with the appropriate select helpers:
code :
library(dplyr)
summarized %>% 
    select(Z, starts_with('W'), everything())
# A tibble: 2 x 5
#  Z     W_mean W_median X_mean X_median
#  <fct>  <dbl>    <dbl>  <dbl>    <dbl>
#1 cat     5.25      5.5   3.75      3.5
#2 dog     5.67      5.5   6.67      7  
library(stringr)
summarized %>% 
         select(Z, order(str_remove(names(.), "_.*")))
# A tibble: 2 x 5
#  Z     W_mean W_median X_mean X_median
#  <fct>  <dbl>    <dbl>  <dbl>    <dbl>
#1 cat     5.25      5.5   3.75      3.5
#2 dog     5.67      5.5   6.67      7  
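The second option orders the columns by the part of the name before the underscore. The same ordering rule can be sketched in Python on a plain list of column names. The starting order below is an assumption, since the original column order of `summarized` is not shown, and `order_columns` is a made-up name.

```python
# Hypothetical starting order of the summarized columns
columns = ["Z", "X_mean", "W_mean", "X_median", "W_median"]


def order_columns(columns, key_column):
    """Put key_column first, then group the remaining columns by the
    prefix before the first underscore."""
    rest = [c for c in columns if c != key_column]
    # list.sort() is stable, so columns sharing a prefix keep their
    # original relative order (mean before median here).
    rest.sort(key=lambda name: name.split("_", 1)[0])
    return [key_column] + rest


ordered = order_columns(columns, "Z")
print(ordered)
```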