Spark on K8s - getting error: kube mode does not support referencing app dependencies in local

I am trying to set up a Spark cluster on k8s. I've managed to create and set up a cluster with three nodes by following this article: https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/

After that, when I tried to deploy Spark on the cluster, it failed at the spark-submit step. I used this command:

~/opt/spark/spark-2.3.0-bin-hadoop2.7/bin/spark-submit \
--master k8s://https://206.189.126.172:6443 \
--deploy-mode cluster \
--name word-count \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=docker.io/garfiny/spark:v2.3.0 \
--conf spark.kubernetes.driver.pod.name=word-count \
local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar

And it gives me this error:

Exception in thread "main" org.apache.spark.SparkException: The Kubernetes mode does not yet support referencing application dependencies in the local file system.
    at org.apache.spark.deploy.k8s.submit.DriverConfigOrchestrator.getAllConfigurationSteps(DriverConfigOrchestrator.scala:122)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:229)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:227)
    at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2585)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:227)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:192)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

2018-06-04 10:58:24 INFO ShutdownHookManager:54 - Shutdown hook called
2018-06-04 10:58:24 INFO ShutdownHookManager:54 - Deleting directory /private/var/folders/lz/0bb8xlyd247cwc3kvh6pmrz00000gn/T/spark-3967f4ae-e8b3-428d-ba22-580fc9c840cd

Note: I followed this article for installing Spark on k8s: https://spark.apache.org/docs/latest/running-on-kubernetes.html

This is tracked as SPARK-22962, "Kubernetes app fails if local files are used": dependencies can be added to the driver's classpath by referencing them with local:// URIs that point inside the image, but using files from the submission client's local file system is not yet supported.

According to the mentioned documentation:

Dependency Management

If your application’s dependencies are all hosted in remote locations like HDFS or HTTP servers, they may be referred to by their appropriate remote URIs. Also, application dependencies can be pre-mounted into custom-built Docker images. Those dependencies can be added to the classpath by referencing them with local:// URIs and/or setting the SPARK_EXTRA_CLASSPATH environment variable in your Dockerfiles. The local:// scheme is also required when referring to dependencies in custom-built Docker images in spark-submit.

Note that using application dependencies from the submission client’s local file system is currently not yet supported.
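
In practice this means the application jar has to exist inside the driver/executor image, not on the machine running spark-submit. A minimal sketch of one way to do that, assuming the stock Spark 2.3.0 distribution layout and the docker.io/garfiny repository from the question (the docker-image-tool.sh script ships with the 2.3.0 distribution and already copies jars/ and examples/ into /opt/spark in the image):

# build and push the bundled Spark image; it produces docker.io/garfiny/spark:v2.3.0
cd ~/opt/spark/spark-2.3.0-bin-hadoop2.7
./bin/docker-image-tool.sh -r docker.io/garfiny -t v2.3.0 build
./bin/docker-image-tool.sh -r docker.io/garfiny -t v2.3.0 push

# submit with a local:// URI so the driver resolves the jar inside the image
bin/spark-submit \
--master k8s://https://206.189.126.172:6443 \
--deploy-mode cluster \
--name word-count \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=docker.io/garfiny/spark:v2.3.0 \
--conf spark.kubernetes.driver.pod.name=word-count \
local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar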

I have the same case.

I do not know what to do or how to fix it. Spark version is 2.3.0.

I copied and renamed spark-kubernetes_2.11-2.3.1.jar -> spark-kubernetes_2.11-2.3.0.jar.

Spark does not find the corresponding kubernetes files.

bin/spark-submit \
--master k8s://https://lubernetes:6443 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.kubernetes.namespace=spark \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=gcr.io/cloud-solutions-images/spark:v2.3.0-gcs \
--conf spark.kubernetes.authenticate.submission.caCertFile=/var/run/secrets/kubernetes.io/serviceaccount/k8.crt \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
local:///usr/spark-2.3.0/examples/jars/spark-examples_2.11-2.3.0.jar

Thanks for the help!
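
A quick check worth doing here (a sketch; the path assumes the unpacked 2.3.0 distribution is $SPARK_HOME): the client-side Kubernetes support lives in the spark-kubernetes jar under $SPARK_HOME/jars, and renaming a 2.3.1 jar to the 2.3.0 name does not change its contents, so it helps to confirm which jar the launcher actually picks up.

# list the Kubernetes-related jars on the submission client's classpath
ls "$SPARK_HOME"/jars/ | grep -i kubernetes
# a clean 2.3.0 install ships spark-kubernetes_2.11-2.3.0.jar here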

Comments
  • Thanks VonC. I used the newest spark-kubernetes jar to replace the one in the spark-2.3.0-bin-hadoop2.7 package. The exception is gone, but there are still other problems I need to solve. I'll post the final testing result once I get everything working.
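
For reference, a sketch of that jar swap (the 2.3.1 version and the Maven Central URL are assumptions, not details given in the comment):

# back up the spark-kubernetes jar that ships with the 2.3.0 distribution
cd ~/opt/spark/spark-2.3.0-bin-hadoop2.7/jars
mv spark-kubernetes_2.11-2.3.0.jar spark-kubernetes_2.11-2.3.0.jar.bak

# fetch a newer spark-kubernetes client jar (URL assumed to be Maven Central)
curl -LO https://repo1.maven.org/maven2/org/apache/spark/spark-kubernetes_2.11/2.3.1/spark-kubernetes_2.11-2.3.1.jar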