## How to round decimals in Scala Spark


I have a (large ~ 1million) Scala Spark DataFrame with the following data:

id,score
1,0.956
2,0.977
3,0.855
4,0.866
...

How do I discretise/round the scores to the nearest 0.05 decimal place?

Expected result:

id,score
1,0.95
2,1.00
3,0.85
4,0.85
...

I would like to avoid using a UDF to maximise performance.

You can do it using Spark built-in functions like so:

dataframe.withColumn("rounded_score", round(col("score") * 100 / 5) * 5 / 100)

- Multiply the score by 100 so that the precision you want (0.05) becomes a whole number (5).
- Divide that number by 5 and round, giving a whole number of 5-unit steps.
- Multiply by 5 to get back a number divisible by 5.
- Divide by 100 to restore the original scale.
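The steps above can be checked in plain Scala, since the column expression applies the same arithmetic per row (a minimal sketch; `roundToNearest005` is an illustrative helper name, not part of the Spark API):

```scala
// Hypothetical helper mirroring round(col("score") * 100 / 5) * 5 / 100:
// scale up, round to a whole number of 5-unit steps, scale back down.
def roundToNearest005(x: Double): Double =
  math.round(x * 100 / 5) * 5 / 100.0

roundToNearest005(0.956) // 0.95
roundToNearest005(0.977) // 1.0
```

The same pattern works for any step size: replace 5 with the step expressed in hundredths (e.g. 25 to round to the nearest 0.25).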

Result:

+---+-----+-------------+
| id|score|rounded_score|
+---+-----+-------------+
|  1|0.956|         0.95|
|  2|0.977|          1.0|
|  3|0.855|         0.85|
|  4|0.866|         0.85|
+---+-----+-------------+


You can specify your schema when converting into a DataFrame.

Example :

Use DecimalType(10, 2) for the column in your custom schema when loading the data.

import org.apache.spark.sql.types._

val mySchema = StructType(Array(
  StructField("id", IntegerType, true),
  StructField("score", DecimalType(10, 2), true)
))

spark.read.format("csv").schema(mySchema)
  .option("header", "true").option("nullValue", "?")
  .load("/path/to/csvfile").show


The answer can be simpler (note that this rounds to the nearest 0.01, not to the nearest 0.05 as the question asks):

dataframe.withColumn("rounded_score", round(col("score"), 2))

There is a method

def round(e: Column, scale: Int)

which rounds the value of `e` to `scale` decimal places with HALF_UP round mode.
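Outside of Spark, the same HALF_UP rounding to 2 decimal places can be done with plain Scala `BigDecimal` (a minimal sketch of the standard-library API, not Spark-specific):

```scala
// Constructing from a String avoids Double representation noise;
// setScale rounds to 2 decimal places with HALF_UP.
val rounded = BigDecimal("0.956").setScale(2, BigDecimal.RoundingMode.HALF_UP)
// rounded == 0.96
```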

Note that there is also `bround(e: Column, scale: Int)`, which rounds with HALF_EVEN mode, and `format_number(x: Column, d: Int)`, which formats a numeric column like '#,###,###.##' rounded to `d` decimal places but returns the result as a *string* column.