
Transform (shuffle) just 2 Fields in a Dataframe



By : user2173445
Date : October 20 2020, 08:10 PM
I have a data frame with firstname and lastname columns, and I want to permute them, but ONLY for the rows that have values. There are many null fields, and I don't want to reorder them in a way that ever leaves a firstname value without a lastname value.

What helps is shuffling only the nonempty entries:
code :
permtest$lastname[permtest$lastname != ''] <- sample(permtest$lastname[permtest$lastname != ''])
permtest
#   number firstname  lastname
# 1      1                    
# 2      2     Eddie Van Halen
# 3      3    Edward    Vedder
# 4      4                    
# 5      5  Edurardo    Norton
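
For readers working in pandas rather than R, the same idea can be sketched as below. This is only an illustration under stated assumptions: the sample values are made up to resemble the output above, empty strings (not NaN) mark the missing names, and the seed is there only to make the run repeatable.

```python
import numpy as np
import pandas as pd

# Hypothetical data resembling the example: empty strings mark missing names.
permtest = pd.DataFrame({
    "number": [1, 2, 3, 4, 5],
    "firstname": ["", "Eddie", "Edward", "", "Edurardo"],
    "lastname": ["", "Vedder", "Van Halen", "", "Norton"],
})

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

# Boolean mask selecting the rows that actually hold a last name.
mask = permtest["lastname"] != ""

# Permute the non-empty values and write them back into the same row positions.
nonempty = permtest.loc[mask, "lastname"].to_numpy()
permtest.loc[mask, "lastname"] = rng.permutation(nonempty)
```

Because only the masked positions are overwritten, rows that were empty stay empty, so a firstname never ends up next to a blank lastname.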


Shuffle DataFrame rows



By : user3464817
Date : March 29 2020, 07:55 AM
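The idiomatic way to do this with pandas is to use the .sample method of your DataFrame to sample all rows without replacement. Passing frac=1 returns every row in random order; chaining reset_index(drop=True) replaces the shuffled index with a fresh 0..n-1 index. The memory_profiler output below shows the memory increment of the shuffle line: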
code :
df.sample(frac=1)
df = df.sample(frac=1).reset_index(drop=True)
$ python3 -m memory_profiler .\test.py
Filename: .\test.py

Line #    Mem usage    Increment   Line Contents
================================================
     5     68.5 MiB     68.5 MiB   @profile
     6                             def shuffle():
     7    847.8 MiB    779.3 MiB       df = pd.DataFrame(np.random.randn(100, 1000000))
     8    847.9 MiB      0.1 MiB       df = df.sample(frac=1).reset_index(drop=True)
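
A small, reproducible variant of the same call is sketched here. The frame contents and the seed value are placeholders chosen for illustration; `random_state` is the `DataFrame.sample` parameter that fixes the random order.

```python
import pandas as pd

# Hypothetical toy frame: column "b" is tied to column "a" row by row.
df = pd.DataFrame({"a": range(5), "b": list("vwxyz")})

# frac=1 samples every row once (no replacement); random_state makes the
# order repeatable; reset_index(drop=True) discards the shuffled old index.
shuffled = df.sample(frac=1, random_state=42).reset_index(drop=True)
```

Rows move as whole units, so each value in "a" keeps its original partner in "b".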

When I shuffle a copy of a DataFrame, why is the original DataFrame also shuffled?



By : Ross Taylor
Date : March 29 2020, 07:55 AM
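The IDs of df1.index and df2.index are different, but df1.index.values and df2.index.values have the same ID, which suggests that shuffling those values in place affects both frames. Assigning a new permuted index to df2 instead (In [73] onward) leaves df1.index unchanged: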
code :
In [68]: id(df1.index), id(df2.index)
Out[68]: (140032214366920, 140032214391720)

In [69]: id(df1.index.values), id(df2.index.values)
Out[69]: (140032213182304, 140032213182304)
In [73]: df2.index = np.random.permutation(df2.index)

In [74]: df1.index
Out[74]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64')

In [75]: df2.index
Out[75]: Int64Index([6, 2, 1, 8, 7, 0, 4, 5, 3, 9], dtype='int64')
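
The behaviour in the session above can be checked with a short sketch. The frame below is a hypothetical stand-in for df1 and df2, and `numpy.random.default_rng` is used in place of `np.random.permutation` purely so the run can be seeded.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for df1/df2 from the session above.
df1 = pd.DataFrame({"x": np.arange(10)})
df2 = df1.copy()

rng = np.random.default_rng(1)  # seeded only for repeatability

# permutation() returns a NEW shuffled array rather than shuffling in place,
# so assigning it to df2.index cannot disturb df1.
df2.index = rng.permutation(df2.index.to_numpy())
```

After this, df1 still has its original 0..9 index, while df2 carries the same labels in a different order.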
Shuffle DataFrame rows except the first row



By : xja
Date : March 29 2020, 07:55 AM
I am trying to randomize all rows in a data frame except for the first. I would like the first row to always appear first, while the remaining rows can be in any random order.

Try this:
code :
df = pd.concat([df[:1], df[1:].sample(frac=1)]).reset_index(drop=True)
In [38]: df
Out[38]:
          a         b         c         d         e
0  2.070074  2.216060 -0.015823  0.686516 -0.738393
1 -1.213517  0.994057  0.634805  0.517844 -0.128375
2  0.937532  0.814923 -0.231120  1.970019  1.438927
3  1.499967  0.105707  1.255207  0.929084 -3.359826
4  0.418702 -0.894226 -1.088968  0.631398  0.152026
5  1.214119 -0.122633  0.983818 -0.445202 -0.807955
6  0.252078 -0.258703 -0.445209 -0.179094  1.180077
7  1.428827 -0.569009 -0.718485  0.161108  1.300349
8 -1.403100  2.154548 -0.492264 -0.544538 -0.061745
9  0.468671  0.004839 -0.738240 -0.385624 -0.532640

In [39]: df = pd.concat([df[:1], df[1:].sample(frac=1)]).reset_index(drop=True)

In [40]: df
Out[40]:
          a         b         c         d         e
0  2.070074  2.216060 -0.015823  0.686516 -0.738393
1  0.468671  0.004839 -0.738240 -0.385624 -0.532640
2  0.418702 -0.894226 -1.088968  0.631398  0.152026
3 -1.213517  0.994057  0.634805  0.517844 -0.128375
4  1.428827 -0.569009 -0.718485  0.161108  1.300349
5  0.937532  0.814923 -0.231120  1.970019  1.438927
6  0.252078 -0.258703 -0.445209 -0.179094  1.180077
7  1.499967  0.105707  1.255207  0.929084 -3.359826
8 -1.403100  2.154548 -0.492264 -0.544538 -0.061745
9  1.214119 -0.122633  0.983818 -0.445202 -0.807955
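
The one-liner above can be wrapped in a small helper, sketched below. The function name `shuffle_except_first` and its `random_state` argument are illustrative choices, not part of the original answer.

```python
import pandas as pd

def shuffle_except_first(df, random_state=None):
    """Return a copy of df with row 0 fixed and the other rows shuffled."""
    head = df.iloc[:1]                                        # first row, kept in place
    rest = df.iloc[1:].sample(frac=1, random_state=random_state)  # shuffle the remainder
    return pd.concat([head, rest]).reset_index(drop=True)

# Hypothetical toy frame for demonstration.
df = pd.DataFrame({"a": range(6)})
out = shuffle_except_first(df, random_state=0)
```

Using `iloc` makes the split positional, so it behaves the same even if the frame's index labels are not 0..n-1.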
R: Shuffle dataframe columnwise



By : Tùng Lê
Date : March 29 2020, 07:55 AM
This link answers part of my question: How to randomize (or permute) a dataframe rowwise and columnwise?

You might want to just sample the column names. Something like:
code :
names(df) <- names(df)[sample(ncol(df))]
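
For comparison, a pandas sketch of a column shuffle follows. Note the difference from the R line above, which permutes the names while the data stays in place: the sketch below reorders whole columns, so each name stays attached to its data. The frame and seed are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

rng = np.random.default_rng(0)  # seeded only for repeatability

# Select the columns in a random order; names travel with their data.
shuffled = df[rng.permutation(df.columns.to_numpy())]
```

If relabeling (the R behaviour) is what is wanted instead, assigning a permuted list of names to `df.columns` would be the closer analogue.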
How to shuffle the rows in a Spark dataframe?



By : pradeep pasupuleti
Date : March 29 2020, 07:55 AM
I have a DataFrame and want to shuffle its rows.

You need to use the orderBy method of the DataFrame, ordering by a random column generated with rand():
code :
import org.apache.spark.sql.functions.rand
val shuffledDF = dataframe.orderBy(rand())