Converting a Spark Dataframe to a mutable Map

By : Mohamed.Khaled
Date : November 20 2020, 03:01 PM
Hope this helps. This can be broken down into three steps, each one already solved on SO:

  1. Convert the DataFrame to an RDD[(String, Int)].
  2. Call collectAsMap() on that RDD to get an immutable map.
  3. Convert that map into a mutable one (e.g. as described here).
code :
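A minimal sketch of the three steps, assuming a DataFrame df with a string key column and an int value column (df and its columns stand in for the question's data):

import scala.collection.mutable

// Step 1: DataFrame -> RDD[(String, Int)]
val pairRdd = df.rdd.map(row => (row.getString(0), row.getInt(1)))

// Step 2: collectAsMap() returns an immutable scala.collection.Map on the driver
val immutableMap: scala.collection.Map[String, Int] = pairRdd.collectAsMap()

// Step 3: copy the entries into a mutable Map
val mutableMap: mutable.Map[String, Int] = mutable.Map(immutableMap.toSeq: _*)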


Sparkling Water: out of memory when converting spark dataframe to H2o dataframe

By : checkhavoc
Date : March 29 2020, 07:55 AM
I hope this helps. The problem is that you are running out of PermGen memory, which is not the same memory space as the one you usually configure for your driver and executors with:

.set("spark.driver.memory", "4g")
.set("spark.executor.memory", "4g")
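A sketch of raising the PermGen limit instead, assuming a pre-Java-8 JVM (PermGen was removed in Java 8); the 512m value is illustrative:

import org.apache.spark.SparkConf

// PermGen is sized with JVM flags rather than spark.driver.memory /
// spark.executor.memory, so it is passed through extraJavaOptions.
// In client mode the driver JVM is already running when SparkConf is
// read, so the driver flag must go on the spark-submit command line.
val conf = new SparkConf()
  .set("spark.driver.extraJavaOptions", "-XX:MaxPermSize=512m")
  .set("spark.executor.extraJavaOptions", "-XX:MaxPermSize=512m")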
why spark scala JDBC converting NUMBER(1) in oracle to boolean in spark dataframe

By : Margo Clement
Date : March 29 2020, 07:55 AM
I think the issue was the following. I solved it by wrapping any such Boolean-mapped column in Oracle's to_char function inside the SELECT of the JDBC query. Below is the code used (to_char(HEARTBEATS_ENABLED)). I tried to_number as well, but it produced results like 1.0000, so I used to_char to achieve the desired result.
code :
val result = sqlcontext.read.format("jdbc")
  .option("url", "jdbc:oracle:thin:@//Host:1521/QAM")
  .option("driver", "oracle.jdbc.driver.OracleDriver")
  .option("dbtable", "(select to_char(HEARTBEATS_ENABLED) as HEARTBEATS_ENABLED, APPLICATION_ID, APPLICATION_TYPE_ID, NAME, DESCR, to_char(ACTIVE_STAT) as ACTIVE_STAT, PROGRAM_ID from MGPH.APPLICATION where APPLICATION_ID in (11,12))")
  .option("user", "myuser")
  .option("password", "my password")
  .option("fetchsize", "100")
  .load()

result.show()
result.printSchema
Converting Scala mutable arrays to a spark dataframe

By : IncompetentIntern
Date : March 29 2020, 07:55 AM
This should still fix the issue. Mutable structures like ArrayBuffer are evil, especially in a parallelized context, and here they can be avoided quite easily.
func1 can return a tuple of (String, Array[Double]), where the first element corresponds to the id (the former id buffer) and the second element is the quartiles returned from approxQuantile:
code :
import org.apache.spark.sql.Row
import spark.implicits._ // assumes a SparkSession named spark; needed for $ and toDF

def func1(x: Row): (String, Array[Double]) = {
  val cardNum1 = x(0).toString
  // approximate 25th and 75th percentiles of tran_amt for this id
  val quartiles = df_auth_for_qnt.where($"id" === cardNum1).stat.approxQuantile("tran_amt", Array(0.25, 0.75), 0.001)
  (cardNum1, quartiles)
}

// func1 queries df_auth_for_qnt, so it must run on the driver: collect first, then map locally
val rows = df_auth_for_qnt_gb.where($"tran_cnt" > 8).select("card_num_1").collect()
val resultDf = rows.map(func1).toSeq.toDF("id", "quartiles")
val resultMap = rows.map(func1).toMap
Spark - avoid mutable dataframe

By : bjshdq
Date : March 29 2020, 07:55 AM
This will help. You can use foldLeft with withColumn to add n new columns, as follows:
code :
import org.apache.spark.sql.functions.lit
import spark.implicits._ // assumes a SparkSession named spark; needed for toDF

// demo data
val df = Seq(
  (1, "a"),
  (2, "b"),
  (3, "c")
).toDF("id", "name")

val n = 5

// use foldLeft to add each new column with a literal value
val newDF = (1 to n).foldLeft(df) { (tempDF, number) =>
  tempDF.withColumn(number.toString, lit(number))
}

newDF.show(false)
+---+----+---+---+---+---+---+
|id |name|1  |2  |3  |4  |5  |
+---+----+---+---+---+---+---+
|1  |a   |1  |2  |3  |4  |5  |
|2  |b   |1  |2  |3  |4  |5  |
|3  |c   |1  |2  |3  |4  |5  |
+---+----+---+---+---+---+---+
Scala: adding value from spark DataFrame to mutable list in a for loop

By : Thuc Nguyen
Date : March 29 2020, 07:55 AM
To get around this issue. The question: "I want to update the elements of a MutableList, declared outside of a for loop, with values from a dataframe. I initialized the list as empty and expect it to have n elements added by the time the loop terminates. However, the list never seems to get updated with new additions, and when the loop terminates it is back to empty." Can you try it like this?
code :
for (i <- df.collect) {
  my_list += "ok"
  println(my_list)
}
scala> val a = scala.collection.mutable.ListBuffer[String]()
a: scala.collection.mutable.ListBuffer[String] = ListBuffer()

scala> for ( i <- df.collect) {a+="ok"; println(a)}
ListBuffer(ok)
ListBuffer(ok, ok)
ListBuffer(ok, ok, ok)
ListBuffer(ok, ok, ok, ok)
ListBuffer(ok, ok, ok, ok, ok)
ListBuffer(ok, ok, ok, ok, ok, ok)
ListBuffer(ok, ok, ok, ok, ok, ok, ok)
ListBuffer(ok, ok, ok, ok, ok, ok, ok, ok)

scala> a
res11: scala.collection.mutable.ListBuffer[String] = ListBuffer(ok, ok, ok, ok, ok, ok, ok, ok)
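The collect() call is what makes this work: it brings the rows to the driver, so the loop body runs locally and mutates the driver's own ListBuffer. For contrast, a sketch of the pattern that fails on a cluster (same hypothetical df and my_list as above):

// The closure below is serialized to the executors, each of which
// appends to its own deserialized copy of my_list, so the driver's
// ListBuffer is never updated.
df.foreach { row => my_list += "ok" }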