How to round decimal in Scala Spark

I have a large (~1 million rows) Spark DataFrame in Scala with the following data:


How do I discretise/round the scores to the nearest 0.05?

Expected result:


I would like to avoid using a UDF, to maximise performance.

You can do it using Spark's built-in functions, like so:

import org.apache.spark.sql.functions.{col, round}

dataframe.withColumn("rounded_score", round(col("score") * 100 / 5) * 5 / 100)

  1. Multiply by 100 so that the precision you want (two decimal places) becomes a whole number.
  2. Divide that number by 5 and round; this gives the number of 0.05 steps.
  3. Multiply by 5, so the result is a whole number divisible by 5.
  4. Divide by 100 to get back to the original scale.


| id|score|rounded_score|
|  1|0.956|         0.95|
|  2|0.977|          1.0|
|  3|0.855|         0.85|
|  4|0.866|         0.85|
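
The four steps can be checked without Spark. The following is a minimal plain-Scala sketch of the same arithmetic applied to a single Double; the object and function names are made up for illustration, and math.round stands in for Spark's round (the two agree on the sample scores, but this is not Spark's implementation).

```scala
object RoundToStepSketch {
  // Mirrors round(score * 100 / 5) * 5 / 100 for a single Double.
  def roundToNearest005(score: Double): Double = {
    val scaled = score * 100            // 1. make two decimals a whole number
    val steps  = math.round(scaled / 5) // 2. number of 0.05 steps, rounded (a Long)
    val whole  = steps * 5              // 3. whole number divisible by 5
    whole / 100.0                       // 4. back to the original scale
  }

  def main(args: Array[String]): Unit = {
    // Scores and expected values taken from the result table above.
    val cases = Seq(0.956 -> 0.95, 0.977 -> 1.0, 0.855 -> 0.85, 0.866 -> 0.85)
    cases.foreach { case (score, expected) =>
      println(s"$score -> ${roundToNearest005(score)} (expected $expected)")
    }
  }
}
```

Doing the multiplication and division with whole numbers (5 and 100) rather than multiplying by 0.05 directly keeps floating-point noise out of the final value.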

While the accepted answer works and is more general, in this case you can also use round. You just need to make the column typed after rounding using .as[T] (defining the type for avg also becomes necessary):

.agg(
  // Alternative ways to define a type for avg
  round(avg((r: MyRow) => r.c1)).as[Double],
  round(avg[MyRow](_.c2)).as[Double]
)

You can specify the schema when converting the data into a DataFrame.

Example: use DecimalType(10, 2) for the column in your custom schema when loading the data.

import org.apache.spark.sql.types._

val mySchema = StructType(Array(
  StructField("id", IntegerType, true),
  StructField("score", DecimalType(10, 2), true)
))

// Pass the schema to the reader; the CSV format and the path are placeholders.
val df = spark.read
  .option("header", "true").option("nullValue", "?")
  .schema(mySchema)
  .csv("path/to/data.csv")

The answer can be simpler:

dataframe.withColumn("rounded_score", round(col("score"), 2))

There is a method

def round(e: Column, scale: Int)

which rounds the value of e to scale decimal places with HALF_UP round mode.
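
To see what HALF_UP at a given scale means, here is a small plain-Scala sketch using scala.math.BigDecimal. It is only an analogue for illustrating the rounding mode, not Spark's internal implementation; the object name, helper name and sample values are made up for illustration.

```scala
import scala.math.BigDecimal.RoundingMode

object HalfUpSketch {
  // Round a decimal literal to `scale` places using HALF_UP (ties go away from zero).
  def roundHalfUp(value: String, scale: Int): BigDecimal =
    BigDecimal(value).setScale(scale, RoundingMode.HALF_UP)

  def main(args: Array[String]): Unit = {
    // Illustrative inputs, not taken from the question's data.
    Seq("0.855", "0.854", "2.345").foreach { v =>
      println(s"$v -> ${roundHalfUp(v, 2)}")
    }
  }
}
```

Note that rounding to 2 decimal places is not the same as rounding to the nearest 0.05: 0.866 becomes 0.87 here, whereas the step-based approach gives 0.85.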

Here 5 is the number of decimal places you want to show. Note that format_number returns a string column: format_number(Column x, int d) formats numeric column x to a format like '#,###,###.##', rounded to d decimal places, and returns the result as a string column.