
How to parse a column that has a custom json format from a spark DataFrame


By : VBPatient
Date : October 22 2020, 08:10 AM
Hope this helps. I don't see a simple way to parse this input directly; you need to break up the string and construct proper JSON with a UDF. Check this out:
code :
scala> val df = Seq(("{a=6236.0, b=0.0}"),("{a=323, b=2.3} ")).toDF("data")
df: org.apache.spark.sql.DataFrame = [data: string]

scala> import org.apache.spark.sql.types._
import org.apache.spark.sql.types._

scala> val sch1 = new StructType().add($"a".string).add($"b".string)
sch1: org.apache.spark.sql.types.StructType = StructType(StructField(a,StringType,true), StructField(b,StringType,true))

scala> def  json1(x:String):String=
     | {
     |  val coly = x.replaceAll("[{}]","").split(",")
     |  val cola = coly(0).trim.split("=")
     |  val colb = coly(1).trim.split("=")
     |  "{\""+cola(0)+"\":"+cola(1)+ "," + "\"" +colb(0) + "\":" + colb(1) + "}"
     | }
json1: (x: String)String

scala>  val my_udf = udf( json1(_:String):String )
my_udf: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,StringType,Some(List(StringType)))

scala> df.withColumn("n1",my_udf('data)).select(from_json($"n1",sch1) as "data").select("data.*").show(false)
+------+---+
|a     |b  |
+------+---+
|6236.0|0.0|
|323   |2.3|
+------+---+
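As an alternative sketch, not from the original answer: Spark's built-in `str_to_map` (available since roughly Spark 2.0.1) can parse this `key=value` layout without a custom UDF, assuming pairs are always separated by `", "` and keys never contain `=`:

```scala
// Alternative sketch: parse "{a=6236.0, b=0.0}" with str_to_map instead of a UDF.
// Assumptions: pair delimiter is ", ", key/value delimiter is "=".
import org.apache.spark.sql.functions.expr

val parsed = df
  .withColumn("m", expr("str_to_map(regexp_replace(trim(data), '[{}]', ''), ', ', '=')"))
  .select($"m".getItem("a").as("a"), $"m".getItem("b").as("b"))

parsed.show(false)
```

This keeps the whole transformation inside Catalyst expressions, which generally optimizes better than an opaque UDF.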




spark dataframe parse csv with non US format strange error


By : Tristan Eason
Date : March 29 2020, 07:55 AM
Hope this helps. Thread safety was actually the problem, so change the parsing function to:
code :
def toDouble: UserDefinedFunction = udf[Double, String](_.replace(',', '.').toDouble)
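For context, a hypothetical application of that UDF (the column name "amount" is an assumption, not taken from the question) might look like:

```scala
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.expressions.UserDefinedFunction

// The stateless string replace is safe across threads, unlike a shared
// NumberFormat/DecimalFormat parser instance.
def toDouble: UserDefinedFunction = udf[Double, String](_.replace(',', '.').toDouble)

// Hypothetical usage: "amount" holds strings like "1,5" (comma as decimal separator).
val fixed = df.withColumn("amount", toDouble($"amount"))
```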
How to parse each row JSON to columns of Spark 2 DataFrame?


By : arabji yacine
Date : March 29 2020, 07:55 AM
This will help you. You can use select($"value.*") at the end to expand the struct column into separate columns:
code :
val result = df.withColumn("value", from_json($"value", schema)).select($"value.*")
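The `schema` used above has to be built beforehand; a minimal sketch, where the field names `a` and `b` are placeholders for your actual JSON keys:

```scala
import org.apache.spark.sql.types.{StructType, StringType}
import org.apache.spark.sql.functions.from_json

// Placeholder schema: replace the field names/types with those of your JSON.
val schema = new StructType()
  .add("a", StringType)
  .add("b", StringType)

val result = df.withColumn("value", from_json($"value", schema)).select($"value.*")
```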
Apache Spark: Convert column with a JSON String to new Dataframe in Scala spark


By : Darshit
Date : March 29 2020, 07:55 AM
If you do not have a predefined schema, the other option is to convert the column to an RDD[String] or Dataset[String] and load it as JSON.
Here is how you can do it:
code :
//convert to RDD[String]
val rdd = originalDF.rdd.map(_.getString(0))

val ds = rdd.toDS
val df = spark.read.json(rdd) // or spark.read.json(ds)

df.show(false)
+---+-----+
|a  |b    |
+---+-----+
|2  |hello|
|1  |hi   |
+---+-----+
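One caveat worth noting: the `RDD[String]` overload of `spark.read.json` has been deprecated since Spark 2.2, so on newer versions the `Dataset[String]` variant is the safer choice:

```scala
// On Spark 2.2+, json(RDD[String]) is deprecated; prefer the Dataset[String] overload.
import spark.implicits._

val ds  = originalDF.rdd.map(_.getString(0)).toDS()
val df2 = spark.read.json(ds)
```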
Generic way to Parse Spark DataFrame to JSON Object/Array Using Spray JSON


By : user3279187
Date : March 29 2020, 07:55 AM
This one helps. After trying various approaches with various libraries, I finally settled on the simple approach below:
code :
val list = sc.parallelize(List(("a1","b1","c1","d1"),("a2","b2","c2","d2"))).toDF

val jsonArray = list.toJSON.collect
/*jsonArray: Array[String] = Array({"_1":"a1","_2":"b1","_3":"c1","_4":"d1"}, {"_1":"a2","_2":"b2","_3":"c2","_4":"d2"})*/

val finalOutput = jsonArray.mkString("[", ",", "]")

/*finalOutput: String = [{"_1":"a2","_2":"b2","_3":"c2","_4":"d2"},{"_1":"a1","_2":"b1","_3":"c1","_4":"d1"}]*/
How can I convert a spark dataframe column, containing serialized json, into a dataframe itself?


By : I.Kulakov
Date : March 29 2020, 07:55 AM
The key turned out to be in the Spark source code: the path argument passed to spark.read.json may be an "RDD of Strings storing JSON objects".
Here's how to inject the id into each JSON string and load the result:
code :
import json

def inject_id(row):
    js = json.loads(row['fields'])
    js['id'] = row['id']
    return json.dumps(js)

json_df = spark.read.json(df.rdd.map(inject_id))
© voile276.org