Can't connect to MongoDB via Spark


I'm trying to read data from MongoDB through an Apache Spark master.

I'm using 3 machines for this:

  • M1 - with a MongoDB instance on it
  • M2 - with a Spark Master, with Mongo connector, running on it
  • M3 - with a python application that connects to M2's Spark master

The application (M3) gets a connection to the Spark master like this:

_sparkSession = SparkSession.builder.master(masterPath).appName(appName)\
.config("spark.mongodb.input.uri", "mongodb://")\
.config("spark.mongodb.output.uri", "mongodb://").getOrCreate()

The application (M3) then tries to read data from the DB:

sqlContext = SQLContext(_sparkSession.sparkContext)
df ="com.mongodb.spark.sql.DefaultSource").option("uri", "mongodb://user:pass@").load()

but fails with this exception:

    py4j.protocol.Py4JJavaError: An error occurred while calling o56.load.
: java.lang.ClassNotFoundException: Failed to find data source: com.mongodb.spark.sql.DefaultSource. Please find packages at
        at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:594)
        at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:86)
        at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:86)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:325)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(
        at java.lang.reflect.Method.invoke(
        at py4j.reflection.MethodInvoker.invoke(
        at py4j.reflection.ReflectionEngine.invoke(
        at py4j.Gateway.invoke(
        at py4j.commands.AbstractCommand.invokeMethod(
        at py4j.commands.CallCommand.execute(
Caused by: java.lang.ClassNotFoundException: com.mongodb.spark.sql.DefaultSource.DefaultSource
        at java.lang.ClassLoader.loadClass(
        at java.lang.ClassLoader.loadClass(
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25$$anonfun$apply$13.apply(DataSource.scala:579)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25$$anonfun$apply$13.apply(DataSource.scala:579)
        at scala.util.Try$.apply(Try.scala:192)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25.apply(DataSource.scala:579)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25.apply(DataSource.scala:579)
        at scala.util.Try.orElse(Try.scala:84)
        at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:579)
        ... 16 more

Spark can't find the com.mongodb.spark.sql.DefaultSource package, hence the error message.

Everything else looks good; you just need to include the Mongo Spark connector package:

> $SPARK_HOME/bin/pyspark --packages org.mongodb.spark:mongo-spark-connector_2.11:2.2.0

Or ensure that the jar file is on the correct path.

Make sure you check the version of the Mongo Spark connector package required for your version of Spark.
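If you launch the application with spark-submit instead of the interactive shell, the same --packages flag applies. A sketch, assuming a standalone master; the master URL and script name are placeholders, not values from the question:

```shell
# Pull the MongoDB connector from Maven at launch time so the driver and
# executors can resolve com.mongodb.spark.sql.DefaultSource.
# <master-host> and are placeholders for your own values.
$SPARK_HOME/bin/spark-submit \
  --master spark://<master-host>:7077 \
  --packages org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 \
```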


I am a pyspark user; here is what my code looks like, and it works:

MongoDB connection configuration in pyspark:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder\
        .config('spark.mongodb.input.uri', 'mongodb://user:password@ip.x.x.x:27017/')\
        .config('spark.mongodb.output.uri', 'mongodb://user:password@ip.x.x.x:27017/')\
        .config('spark.jars.packages', 'org.mongodb.spark:mongo-spark-connector_2.11:2.3.1')\
        .getOrCreate()

Read from MongoDB:

    df01 ="com.mongodb.spark.sql.DefaultSource")\
        .option("database", "db01")\
        .option("collection", "collection01")\
        .load()

Write to MongoDB:

    df01.write.format("com.mongodb.spark.sql.DefaultSource")\
        .mode("append")\
        .option("database", "db01")\
        .option("collection", "collection02")\
        .save()


I had quite a hard time configuring the Spark connection to Cosmos DB (MongoDB API), so I decided to post the code that worked for me as a contribution.

I used Spark 2.4.0 through a Databricks notebook.

from pyspark.sql import SparkSession

# Connect to CosmosDB to write on the collection
userName = "userName"
primaryKey = "myReadAndWritePrimaryKey"
host = "ipAddress"
port = "10255"
database = "dbName"
collection = "collectionName"

# Structure the connection
connectionString = "mongodb://{0}:{1}@{2}:{3}/{4}.{5}?ssl=true&replicaSet=globaldb".format(userName, primaryKey, host, port, database, collection)

spark = SparkSession.builder\
    .config('spark.mongodb.input.uri', connectionString)\
    .config('spark.mongodb.output.uri', connectionString)\
    .config('spark.jars.packages', 'org.mongodb.spark:mongo-spark-connector_2.11:2.3.1')\
    .getOrCreate()

# Reading from CosmosDB
df ="com.mongodb.spark.sql.DefaultSource")\
    .option("uri", connectionString)\
    .option("database", database)\
    .option("collection", collection)\
    .load()

# Writing on CosmosDB (appending new information without replacing documents)
df.write.format("com.mongodb.spark.sql.DefaultSource")\
    .mode("append")\
    .option("uri", connectionString)\
    .option("replaceDocument", False)\
    .option("maxBatchSize", 100)\
    .option("database", database)\
    .option("collection", collection)\
    .save()

I found the options to configure the connector at the link.
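One gotcha with connection strings like the one above: Cosmos DB primary keys are base64 and often contain `/`, `+`, or a trailing `==`, which are not legal raw characters in a MongoDB URI, so the credentials should be percent-encoded first. A standard-library-only sketch; the credential values below are made up:

```python
from urllib.parse import quote_plus

# Hypothetical credentials; Cosmos DB primary keys are base64 and often end in "==".
user_name = "userName"
primary_key = "abc/def+ghi=="
host, port = "ipAddress", 10255
database, collection = "dbName", "collectionName"

# Percent-encode the credentials before building the URI so reserved
# characters in the key don't break URI parsing.
connection_string = "mongodb://{0}:{1}@{2}:{3}/{4}.{5}?ssl=true&replicaSet=globaldb".format(
    quote_plus(user_name), quote_plus(primary_key), host, port, database, collection)

print(connection_string)
```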


  • See this solution, implemented in Jupyter Notebooks:
  • Thank you for your answer. I specified that I ran the app through a remote Python application, and not via the PySpark shell. So, as a noob Python developer, I ask again: how do I run my application with the connector package? Or do I need to run the Spark master with the package?
  • Please update the question with more information on how you submit your spark jobs and I'll look to update my answer.
  • I changed the way I use the Spark master. I initiate the Spark master and its slaves, and after that I run spark-submit with the mongo-spark-connector package and the Python script. I guess that's the recommended way. Thanks all
  • @Ross I have the same issue and can't seem to resolve it. any ideas?
  • When I ran into this, I didn't have the mongodb-spark-connector_2.11-2.2.3.jar in my $SPARK_HOME/jars (e.g. /usr/local/spark-2.2.2-bin-hadoop2.7/jars).
  • This should be the accepted answer. The spark.jars.packages option is documented at…