Can we write generic, dynamically generated Scala/Spark code?


I am trying to create Spark Scala code that can read any file with a different number of columns. Can I dynamically write Scala/Spark code, then compile and execute it? Do I really need SBT? What is the best way to achieve this goal?

When I run the Scala code via a shell script or scalac code.scala, it says:

hadoop@namenode1:/usr/local/scala/examples$ ./survey.sh 
/usr/local/scala/examples/./survey.sh:6: error: not found: value spark
val survey = spark.read.format("com.databricks.spark.csv").option("header","true").option("nullValue","NA").option("timestampFormat","yyyy-MM-dd'T'HH:mm:ss").option("mode","failfast").option("inferchema","true").load("/tmp/survey.csv")
             ^
/usr/local/scala/examples/./survey.sh:19: error: not found: type paste
:paste
 ^
/usr/local/scala/examples/./survey.sh:37: error: not found: value udf
val parseGenderUDF = udf( parseGender _ )
                     ^
three errors found

I want something like this:

Dynamically generate file.scala using a shell script, then compile it using

scalac file.scala

then execute it

scala file.scala

But is this possible? What is the way to do it?

hadoop@namenode1:/usr/local/spark/examples/src/main/scala/org/apache/spark/examples$ cat Survey.scala 
import org.apache.spark.sql.{SparkSession}

object Survey {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local")
      .appName("Survey")
      .getOrCreate()

    val survey = spark.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .option("nullValue", "NA")
      .option("timestampFormat", "yyyy-MM-dd'T'HH:mm:ss")
      .option("mode", "failfast")
      .option("inferSchema", "true") // note: "inferchema" in the original is a typo and would be silently ignored
      .load("/tmp/survey.csv")
    survey.show()
  }
}

Error when compiled:

hadoop@namenode1:/usr/local/spark/examples/src/main/scala/org/apache/spark/examples$ scalac Survey.scala
Survey.scala:1: error: object apache is not a member of package org
import org.apache.spark.sql.{SparkSession}
           ^
Survey.scala:5: error: not found: value SparkSession
val spark= SparkSession.builder
           ^
two errors found

To submit Spark jobs, you either have to use the spark-submit command or execute Scala scripts in spark-shell; plain scalac fails because the Spark jars are not on the compiler's classpath. Apache Livy also provides a REST API for submitting Spark jobs.
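For example, assuming the application above has been packaged as survey.jar with main class Survey (both names are illustrative), the submission would look like:

    spark-submit --class Survey --master local[*] /path/to/survey.jar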

On the command line, you can use spark-shell -i file.scala to run code written in file.scala.
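With this approach the script does not need to build its own SparkSession, because spark-shell already provides one as spark. A minimal survey.scala for this style might look like the following sketch (the System.exit(0) at the end is optional; without it the shell stays open after the script runs):

    // survey.scala: run with spark-shell -i survey.scala
    // `spark` is already provided by spark-shell, so no SparkSession.builder is needed
    val survey = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/tmp/survey.csv")
    survey.show()
    System.exit(0) // optional: quit the shell once the script finishes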


You need to create a SparkSession. Example:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .master("local")
  .appName("MYAPP")
  .getOrCreate()

val survey = spark.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("nullValue", "NA")
  .option("timestampFormat", "yyyy-MM-dd'T'HH:mm:ss")
  .option("mode", "failfast")
  .option("inferSchema", "true") // "inferSchema", not "inferchema"
  .load("/tmp/survey.csv")

// for a UDF you also need:

import org.apache.spark.sql.functions._
val parseGenderUDF = udf( parseGender _ )
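Since parseGender itself is not shown above, here is a sketch of how the full definition and use might look; the normalization rules inside parseGender are only an illustration, adjust them for your data:

    import org.apache.spark.sql.functions._

    // Illustrative gender normalizer; the matching rules are assumptions.
    // Option(g) guards against nulls (e.g. from the "nullValue" -> "NA" option).
    def parseGender(g: String): String =
      Option(g).map(_.trim.toLowerCase) match {
        case Some("m") | Some("male")   => "Male"
        case Some("f") | Some("female") => "Female"
        case _                          => "Unknown"
      }

    val parseGenderUDF = udf( parseGender _ )
    // apply it to a column, e.g. normalizing a hypothetical Gender column:
    val cleaned = survey.withColumn("Gender", parseGenderUDF(col("Gender")))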

I hope this helps you.



I have found an alternative (from cricket-007):

spark-shell -i survey.scala

But this takes time configuring spark-shell on every run, and it is not what I want.
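For the underlying goal, reading any file whose column count is not known in advance, a single compiled program is usually enough: with header and inferSchema, the schema is discovered at runtime, so there is no need to generate and recompile Scala code per file. A minimal sketch of such a generic loader (the object name, defaults, and argument handling are assumptions):

    import org.apache.spark.sql.{DataFrame, SparkSession}

    object GenericLoader {
      // The schema (column count, names, types) is discovered at runtime
      // from the header row and sampled values, so one compiled binary
      // handles files with any number of columns.
      def load(spark: SparkSession, path: String, delimiter: String = ","): DataFrame =
        spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .option("delimiter", delimiter)
          .csv(path)

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.master("local").appName("GenericLoader").getOrCreate()
        val df = load(spark, args(0)) // the path arrives at runtime; no regenerated code
        df.printSchema()
        df.show()
        spark.stop()
      }
    }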
