Dummy variable with 2 classes. Should it be in a single or multiple columns?


By : nadeem ahmed
Date : November 19 2020, 03:01 PM
I personally like to use n - 1 columns for a field with n categories. When using the get_dummies method, this means setting drop_first to True.
As for why I like to do this, a former instructor of mine explains it well in his answer to "one hot encoding vs dummy encoding in scikit-learn". It boils down to eliminating collinearity: with one column per category, the columns always sum to 1, so any one of them is fully determined by the others.
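As a minimal sketch of the n - 1 approach, assuming pandas and a made-up two-class column named color (the data and names are invented for illustration, not taken from the question):

```python
import pandas as pd

# Hypothetical two-class column, invented for illustration.
df = pd.DataFrame({"color": ["red", "blue", "blue", "red"]})

# One column per class: the two columns always sum to 1, so each one
# is fully determined by the other (perfect collinearity).
full = pd.get_dummies(df["color"], dtype=int)

# n - 1 columns: drop_first=True drops the first class ("blue"),
# which becomes the baseline encoded as 0 in the remaining column.
reduced = pd.get_dummies(df["color"], drop_first=True, dtype=int)

print(full)
print(reduced)
```

For a field with two classes this leaves a single 0/1 column, which carries the same information as the two-column version.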
Going from multiple dummy variables to a single variable


By : user2885997
Date : March 29 2020, 07:55 AM
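How can I take n dummy variables that are mutually exclusive in a data frame and concatenate them into a single variable? Assuming a data frame k whose columns are all dummy variables, the following stores, for each row, the name of the column that is set:
code :
# k == 1 yields a logical matrix, so which() works for both 0/1 and TRUE/FALSE dummies
k$i <- names(k)[apply(k == 1, 1, which)]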
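For readers who work in pandas rather than R, a rough analogue of this idea is idxmax along the columns. This is only a sketch: the frame k and its column names a, b, c are invented for illustration, and it assumes exactly one dummy is set per row.

```python
import pandas as pd

# Hypothetical mutually exclusive dummy columns, invented for illustration.
k = pd.DataFrame({"a": [1, 0, 0], "b": [0, 1, 0], "c": [0, 0, 1]})

# For each row, idxmax(axis=1) returns the label of the column that
# holds the row maximum, i.e. the dummy that is set to 1.
k["i"] = k[["a", "b", "c"]].idxmax(axis=1)

print(k)
```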
How to Create a Single Dummy Variable with conditions in multiple columns?


By : Ren
Date : March 29 2020, 07:55 AM
I am trying to efficiently create a binary dummy variable (1/0) in my data set based on whether one or more of 7 variables (col9 to col15) take on a specific value (35), but I don't want to test every column one by one. You can use rowSums (a vectorized solution) like this:
code :
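set.seed(123)
# 20 x 15 matrix of sample data; 35 is listed twice so that it is drawn more often
dat <- matrix(sample(c(35, 1:100), size = 15 * 20, replace = TRUE), ncol = 15, byrow = TRUE)
# Append a 16th column: 1 if any of columns 9 to 15 equals 35, otherwise 0
cbind(dat, rowSums(dat[, 9:15] == 35) > 0)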
   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16]
 [1,]   29   79   41   89   94    4   53   90   55    46    96    45    68    57    10     0
 [2,]   90   24    4   33   96   89   69   64  100    66    71    54    60    29    14     0
 [3,]   97   91   69   80    2   48   76   21   32    23    14    41    41    37    15     0
 [4,]   14   23   47   26   86    4   44   80   12    56    20    12    76    90    37     0
 [5,]   67    9   38   27   82   45   81   82   80    44    76    63    71    35    48     1
 [6,]   22   38   61   35   11   24   67   42   79    10    43    99    90    89    17     0
 [7,]   13   65   34   66   32   18   79    9   47    51    60    33    49    96    48     0
 [8,]   89   92   61   41   14   94   30    6   95    72    14    55    96    59    40     0
 [9,]   65   32   31   22   37   99   15    9   14    69    62    90    67    74    52     0
[10,]   66   83   79   98   44   31   41    1   18    85    23    24     7    24    73     0
[11,]   85   50   39   24   11   39   57   21   44    22    50    35    65    37    35     1
[12,]   53   74   22   41   26   63   18   87   75    67    62    37    53    88    58     0
[13,]   84   31   71   26   60   48   26   57   92    91    27    32    99    62    94     0
[14,]   47   41   66   15   57   24   97   60   52    40    88    36    29    17    17     0
[15,]   48   25   21   68    4   70   35   41   82    92    28    97    73    69     5     0
[16,]   39   48   56   70   92   62   43   54    5    26    40    19    84    15    81     0
[17,]   55   66   17   63   31   73   40   97   97    73    25    22    59    27    53     0
[18,]   79   16   40   47   87   93   89   68   95    52    58    33    35     2    50     1
[19,]   87   35    7   16   77   74   98   47    7    65    76    13    40    22     5     0
[20,]   39    6   22    5   67   30   10    7   88    76    82    99    10    10    80     0
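# The same indicator as a named column x (transform() coerces the matrix to a data frame)
transform(dat, x = as.numeric(rowSums(dat[, 9:15] == 35) > 0))
# For a data frame whose columns are named col9 to col15
data$indicator <- as.integer(rowSums(data[paste0("col", 9:15)] == 35) > 0)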
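For readers who work in pandas rather than R, the same "any of these columns equals 35" test can be sketched as below. The frame and its values are invented for illustration; only the column naming (col9 to col15) and the target value 35 follow the question.

```python
import pandas as pd

# Hypothetical data frame with columns col1 ... col15, invented for illustration.
cols = [f"col{i}" for i in range(1, 16)]
data = pd.DataFrame([[1] * 15, [2] * 15, [3] * 15], columns=cols)
data.loc[0, "col12"] = 35   # inside the tested range col9..col15
data.loc[1, "col3"] = 35    # outside the tested range, must not count

# Compare only the tested columns to 35, then reduce each row with any().
tested = [f"col{i}" for i in range(9, 16)]
data["indicator"] = data[tested].eq(35).any(axis=1).astype(int)

print(data["indicator"].tolist())  # [1, 0, 0]
```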
R: Recoding multiple dummy variables into a single variable and replacing the corresponding dummy value with the variabl


By : user3673867
Date : March 29 2020, 07:55 AM
You could use max.col to get the index of the column that has a value of 1 in each row for columns 5 to 9. (The 'df' example in the question is not correct, as most of its rows were all 0s. The corrected one is below.)
code :
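# Option 1: max.col returns, per row, the index of the column holding the maximum (the 1)
df$QUEUE <- names(df)[-c(1:4)][max.col(df[-c(1:4)])]
# Option 2 (alternative to option 1, not to be run after it): multiply the dummy matrix by the column positions
df$QUEUE <- names(df)[-(1:4)][(as.matrix(df[-(1:4)]) %*%
                         seq_along(df[-(1:4)]))[,1]]
# With the original data, where some rows are all 0s, set CONTENT to 1 for those rows first
df$CONTENT[!rowSums(df[5:9])] <- 1
df$QUEUE1 <- names(df)[5:9][max.col(df[5:9])]
df$QUEUE1
 #[1] "CLAIMS"      "CONTENT"     "CONTENT"     "DEDUCT_BILL" "CONTENT"    
 #[6] "CONTENT"     "CONTENT"     "CONTENT"     "CONTENT"     "CONTENT" 
# Corrected example data
df <- structure(list(MON1_12 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 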
1L), WEEK1_53 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), 
AGENT_ID = structure(c(3L, 
4L, 7L, 8L, 1L, 6L, 5L, 9L, 2L, 10L), .Label = c("A129", "A360", 
"A407", "B891", "D197", "L145", "L722", "O518", "T443", "W764"
), class = "factor"), CallsHandled = c(1L, 4L, 2L, 14L, 1L, 2L, 
5L, 1L, 1L, 3L), CONTENT = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0), CLAIMS = c(1, 
0, 0, 0, 1, 0, 0, 0, 0, 0), CREDIT_CARD = c(0, 0, 0, 0, 0, 1, 
1, 0, 0, 0), DEDUCT_BILL = c(0, 1, 0, 1, 0, 0, 0, 0, 0, 1),
 HCREFORM = c(0, 
0, 0, 0, 0, 0, 0, 1, 1, 0)), .Names = c("MON1_12", "WEEK1_53", 
"AGENT_ID", "CallsHandled", "CONTENT", "CLAIMS", "CREDIT_CARD", 
"DEDUCT_BILL", "HCREFORM"), row.names = c(NA, -10L), class = "data.frame")
R Dummy-variable to be populated from multiple columns


By : Roshan Bagla
Date : March 29 2020, 07:55 AM
We can use mtabulate from qdapTools: transpose 'Dataset1', convert it to a data.frame, apply mtabulate, change the column names (if needed), and cbind the result with the original 'Dataset1'.
code :
library(qdapTools)
d1 <- mtabulate(as.data.frame(t(Dataset1)))
row.names(d1) <- NULL
names(d1) <- paste0("dummy.", names(d1))
cbind(Dataset1, d1)
#   T1 T2 T3 dummy.A dummy.B dummy.C dummy.D dummy.E dummy.F
#1  A  C  B       1       1       1       0       0       0
#2  A  C  B       1       1       1       0       0       0
#3  A  C  B       1       1       1       0       0       0
#4  A  D  C       1       0       1       1       0       0
#5  B  D  C       0       1       1       1       0       0
#6  B  E  F       0       1       0       0       1       1
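For readers who work in pandas rather than R, a rough analogue can be sketched with stack and get_dummies. This is a swapped-in technique, not the mtabulate approach itself; the T1, T2, T3 values are copied from the output shown above, and the variable names are otherwise assumptions.

```python
import pandas as pd

# Values of T1..T3 taken from the output shown above.
Dataset1 = pd.DataFrame({
    "T1": ["A", "A", "A", "A", "B", "B"],
    "T2": ["C", "C", "C", "D", "D", "E"],
    "T3": ["B", "B", "B", "C", "C", "F"],
})

# stack() turns the frame into one long Series indexed by (row, column);
# get_dummies() one-hot encodes it, and grouping by the row level with
# max() collapses the three entries per row into one indicator row.
d1 = pd.get_dummies(Dataset1.stack(), dtype=int).groupby(level=0).max()
d1 = d1.add_prefix("dummy.")

result = pd.concat([Dataset1, d1], axis=1)
print(result)
```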
Create dummy variable of multiple columns with python


By : user2537257
Date : March 29 2020, 07:55 AM
If you need indicators in the output, use max; if you need counts, use sum. Apply either one after get_dummies with the extra prefix parameters, casting the values to strings first:
code :
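import pandas as pd
# max(level=...) and sum(level=...) were removed in pandas 2.0, so group the duplicate column labels instead
df = pd.get_dummies(df.astype(str), prefix='', prefix_sep='', dtype=int).T.groupby(level=0).max().T
# count alternative
#df = pd.get_dummies(df.astype(str), prefix='', prefix_sep='', dtype=int).T.groupby(level=0).sum().T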
print (df)
   1  2  3  4
0  1  1  0  0
1  0  1  1  0
2  0  0  1  1
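To make the difference between max and sum concrete, here is a small sketch on an invented input (the columns a and b and their values are hypothetical, chosen so that the value 1 appears twice in the first row). It groups the duplicate column labels through a transpose, which is one way to do this that does not rely on the level argument of max or sum.

```python
import pandas as pd

# Hypothetical input, invented for illustration: row 0 contains the value 1 twice.
df = pd.DataFrame({"a": [1, 2, 3], "b": [1, 3, 4]})

# An empty prefix and separator make the dummy columns carry only the value,
# so the same value coming from different source columns gets the same label.
dummies = pd.get_dummies(df.astype(str), prefix="", prefix_sep="", dtype=int)

# Collapse duplicate labels: max gives 0/1 indicators, sum gives counts.
indicators = dummies.T.groupby(level=0).max().T
counts = dummies.T.groupby(level=0).sum().T

print(indicators)
print(counts)
```

In the first row the indicator for value 1 is 1, while its count is 2; everywhere else the two results agree.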