Replicate Spark Row N-times

I want to duplicate a Row in a DataFrame; how can I do that?

For example, I have a DataFrame consisting of 1 Row, and I want to make a DataFrame with 100 identical Rows. I came up with the following solution:

  var data: DataFrame = singleRowDF

  for (i <- 1 to 100 - 1) {
    data = data.unionAll(singleRowDF)
  }

But this introduces many transformations, and my subsequent actions seem to become very slow. Is there another way to do it?

You can add a column whose literal value is an array of size 100, then use explode to create one row per array element, and finally get rid of this "dummy" column:

import org.apache.spark.sql.functions._

// Attach a 100-element array literal, explode it so each element produces its
// own copy of the row, then keep only the original columns to drop the helper.
val result = singleRowDF
  .withColumn("dummy", explode(array((1 to 100).map(lit): _*)))
  .selectExpr(singleRowDF.columns: _*)
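
For reference, here is a minimal sketch of wrapping the same trick in a reusable helper; replicateRows is a hypothetical name, not part of the original answer:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// Hypothetical helper: replicate every row of df n times with the array/explode
// trick, then drop the helper column.
def replicateRows(df: DataFrame, n: Int): DataFrame =
  df.withColumn("dummy", explode(array((1 to n).map(lit): _*)))
    .drop("dummy")

// 100 identical copies of the single input row.
val hundredRows = replicateRows(singleRowDF, 100)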


You could pick out the single row, build a list of a hundred copies of it, and convert that back into a DataFrame.

import org.apache.spark.sql.DataFrame

val testDf = sc.parallelize(Seq(
    (1, 2, 3), (4, 5, 6)
)).toDF("one", "two", "three")

// Take the first row on the driver and build an RDD of n copies of it,
// reusing the original schema.
def replicateDf(n: Int, df: DataFrame) = sqlContext.createDataFrame(
    sc.parallelize(List.fill(n)(df.take(1)(0))),
    df.schema)

val replicatedDf = replicateDf(100, testDf)
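
On Spark 2.x and later you can do the same with SparkSession instead of sqlContext; a minimal sketch, assuming a SparkSession named spark is in scope (replicateDf2 is just an illustrative name):

import org.apache.spark.sql.{DataFrame, Row}

// Sketch: same idea with the SparkSession API.
def replicateDf2(n: Int, df: DataFrame): DataFrame = {
  val firstRow: Row = df.head()   // single Row pulled to the driver
  val copies = spark.sparkContext.parallelize(Seq.fill(n)(firstRow))
  spark.createDataFrame(copies, df.schema)
}

val replicated2 = replicateDf2(100, testDf)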


You could use a flatMap or a for-comprehension, as described here.

I encourage you to use Datasets whenever you can, but if that's not possible, the last example in the link works with DataFrames as well:

val df = Seq(
  (0, "Lorem ipsum dolor", 1.0, List("prp1", "prp2", "prp3"))
).toDF("id", "text", "value", "properties")

val df2 = for {
  row <- df
  p <- row.getAs[Seq[String]]("properties")
} yield (row.getAs[Int]("id"), row.getAs[String]("text"), row.getAs[Double]("value"), p)

Also keep in mind that the Dataset explode method is deprecated (the explode function in org.apache.spark.sql.functions is not); see here.
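
For reference, a minimal sketch of the flatMap formulation of the same example (equivalent to the for-comprehension above; it assumes spark.implicits._ is in scope, as in the spark-shell):

// One output tuple per element of the "properties" array; to replicate a single
// row N times instead, flatMap each row to Seq.fill(n)(...) of the same tuple.
val df3 = df.flatMap { row =>
  row.getAs[Seq[String]]("properties").map { p =>
    (row.getAs[Int]("id"), row.getAs[String]("text"), row.getAs[Double]("value"), p)
  }
}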

Comments
  • I don't see why this question should be closed as a duplicate, since it is older than the other question; if anything, the other question should be marked as the duplicate
  • you could drop('dummy) instead of the more complicated selectExpr
  • @TzachZohar That's great, although I still have trouble understanding how it works :)
  • How do I rewrite this in pyspark? I tried df.withColumn("dummy", explode(map(lit, range(repeated)))).drop("dummy"), and it fails with a "literals, use 'lit', 'array', 'struct' or 'create ..." error