Create Tuple out of Array(Array[String]) of Varying Sizes using Scala

I am new to Scala and I am trying to make tuple pairs out of an RDD of type Array(Array[String]) that looks like:

(122abc,223cde,334vbn,445das),(221bca,321dsa),(231dsa,653asd,698poq,897qwa)

I am trying to create tuple pairs out of these arrays so that the first element of each array is the key and every other element of the array is a value. For example, the output would look like:

122abc    223cde
122abc    334vbn
122abc    445das
221bca    321dsa
231dsa    653asd
231dsa    698poq
231dsa    897qwa

I can't figure out how to separate the first element from each array and then map it to every other element.

If I'm reading it correctly, the core of your question is how to separate the head (first element) of each inner array from the tail (remaining elements), which you can do with the head and tail methods. RDDs expose many of the same operations as Scala lists (map, flatMap, and so on), so you can do this all with what looks like pure Scala code.
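
As a minimal illustration of head and tail on a single plain Scala array (no Spark involved; the names row, key, values and pairs are only for this sketch):

```scala
// One inner array taken from the question's sample data.
val row = Array("122abc", "223cde", "334vbn", "445das")

val key = row.head      // first element: "122abc"
val values = row.tail   // every element after the first

// Pair the key with each remaining element.
val pairs = values.map(value => key -> value)

println(pairs.mkString(", "))
// prints: (122abc,223cde), (122abc,334vbn), (122abc,445das)
```

The RDD version applies this same step to every inner array and flattens the results.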

Given the following input RDD:

val input: RDD[Array[Array[String]]] = sc.parallelize(
  Seq(
    Array(
      Array("122abc","223cde","334vbn","445das"),
      Array("221bca","321dsa"),
      Array("231dsa","653asd","698poq","897qwa")
    )
  )
)

The following should do what you want:

val output: RDD[(String,String)] =
  input.flatMap { arrArrStr: Array[Array[String]] =>
    arrArrStr.flatMap { arrStrs: Array[String] =>
      arrStrs.tail.map { value => arrStrs.head -> value }
    }
  }

And in fact, because of how the flatMap/map calls are composed, you could rewrite it as a for-comprehension:

val output: RDD[(String,String)] =
  for {
    arrArrStr: Array[Array[String]] <- input
    arrStr: Array[String] <- arrArrStr
    str: String <- arrStr.tail
  } yield (arrStr.head -> str)

Which one you go with is ultimately a matter of personal preference (though in this case, I prefer the latter, as you don't have to indent code as much).

For verification:

output.collect().foreach(println)

Should print out:

(122abc,223cde)
(122abc,334vbn)
(122abc,445das)
(221bca,321dsa)
(231dsa,653asd)
(231dsa,698poq)
(231dsa,897qwa)
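
One caveat: calling head or tail on an empty array throws an exception. If empty inner arrays can occur in the data, a guarded version might look like the sketch below. It uses plain Scala collections so it runs without Spark, and toKeyedPairs, rows and safePairs are hypothetical names introduced here, not part of the question.

```scala
// Hypothetical helper: turns one inner array into (key, value) pairs,
// yielding nothing for an empty array instead of throwing.
def toKeyedPairs(arr: Array[String]): Seq[(String, String)] =
  arr.headOption match {
    case Some(key) => arr.drop(1).toSeq.map(value => key -> value)
    case None      => Seq.empty
  }

val rows = Seq(
  Array("122abc", "223cde", "334vbn", "445das"),
  Array.empty[String],          // arr.head would throw on this row
  Array("221bca", "321dsa")
)

// The empty row contributes no pairs.
val safePairs = rows.flatMap(toKeyedPairs)
```

On the RDD from the answer, the same helper should slot into the inner flatMap in place of the head/tail expression.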

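This is a classic fold operation. In Spark, the closest equivalent is aggregate, which also takes a second function for merging per-partition results and returns a plain Array to the driver rather than an RDD (data is assumed here to be an RDD[Array[String]]):

// Start with an empty array
data.aggregate(Array.empty[(String, String)])(
  // seqOp: `arr.drop(1).map(e => (arr.head, e))` pairs the first element
  // of each row with every other element; append these to the accumulator.
  (acc, arr) => acc ++ arr.drop(1).map(e => (arr.head, e)),
  // combOp: concatenate the accumulators built on different partitions.
  (acc1, acc2) => acc1 ++ acc2
)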

The same solution in a non-Spark environment:

scala> val data = Array(Array("122abc","223cde","334vbn","445das"),Array("221bca","321dsa"),Array("231dsa","653asd","698poq","897qwa"))
scala> data.foldLeft(Array.empty[(String, String)]) { case (acc, arr) =>
     |     acc ++ arr.drop(1).map(e => (arr.head, e))
     | }
res0: Array[(String, String)] = Array((122abc,223cde), (122abc,334vbn), (122abc,445das), (221bca,321dsa), (231dsa,653asd), (231dsa,698poq), (231dsa,897qwa))
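
For comparison, here is a small plain-Scala sketch showing that the fold above produces the same pairs as a single flatMap over the rows. The names viaFold and viaFlatMap are introduced only for this illustration.

```scala
val data = Array(
  Array("122abc", "223cde", "334vbn", "445das"),
  Array("221bca", "321dsa"),
  Array("231dsa", "653asd", "698poq", "897qwa")
)

// Fold version: append each row's pairs to an accumulator.
val viaFold = data.foldLeft(Array.empty[(String, String)]) { case (acc, arr) =>
  acc ++ arr.drop(1).map(e => (arr.head, e))
}

// flatMap version: same pairs, no explicit accumulator.
val viaFlatMap = data.flatMap(arr => arr.drop(1).map(e => (arr.head, e)))

// Arrays compare by reference with ==, so compare element by element.
println(viaFold.sameElements(viaFlatMap))
// prints: true
```

Because arr.head is only evaluated inside the map function, the drop(1) form never calls head on an empty row, whereas a direct arr.tail call would throw.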

Convert your input elements to a Seq, then write a wrapper that gives you List(List(item1,item2), List(item1,item2), ...).

Try the code below:

val seqs = Seq("122abc","223cde","334vbn","445das")++
Seq("221bca","321dsa")++
Seq("231dsa","653asd","698poq","897qwa")

Write a wrapper that converts a seq into pairs of adjacent elements:

def toPairs[A](xs: Seq[A]): Seq[(A,A)] = xs.zip(xs.tail)

Now pass your seq as the parameter and it will give you the pairs:

toPairs(seqs).mkString(" ")

After converting it to a string, you will get output like:

res8: String = (122abc,223cde) (223cde,334vbn) (334vbn,445das) (445das,221bca) (221bca,321dsa) (321dsa,231dsa) (231dsa,653asd) (653asd,698poq) (698poq,897qwa)

Now you can convert the string however you want.

Using a DataFrame and explode:

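import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{explode, struct}
import spark.implicits._  // assumes a SparkSession named `spark`, as in spark-shell

val df = Seq(
  Array("122abc","223cde","334vbn","445das"),
  Array("221bca","321dsa"),
  Array("231dsa","653asd","698poq","897qwa")
).toDF("arr")
// Take the first element as the key, explode the array into one row per
// element, then drop the rows where the element is the key itself.
val df2 = df.withColumn("key", 'arr(0)).withColumn("values", explode('arr))
  .filter('key =!= 'values).drop('arr)
  .withColumn("tuple", struct('key, 'values))
df2.show(false)
df2.rdd.map(x => Row((x(0), x(1)))).collect.foreach(println)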

Output:

+------+------+---------------+
|key   |values|tuple          |
+------+------+---------------+
|122abc|223cde|[122abc,223cde]|
|122abc|334vbn|[122abc,334vbn]|
|122abc|445das|[122abc,445das]|
|221bca|321dsa|[221bca,321dsa]|
|231dsa|653asd|[231dsa,653asd]|
|231dsa|698poq|[231dsa,698poq]|
|231dsa|897qwa|[231dsa,897qwa]|
+------+------+---------------+


[(122abc,223cde)]
[(122abc,334vbn)]
[(122abc,445das)]
[(221bca,321dsa)]
[(231dsa,653asd)]
[(231dsa,698poq)]
[(231dsa,897qwa)]

Update 1:

Using a pair RDD:

val df =   Seq(
  Array("122abc","223cde","334vbn","445das"),
  Array("221bca","321dsa"),
  Array("231dsa","653asd","698poq","897qwa")
).toDF("arr")
import org.apache.spark.rdd.PairRDDFunctions
import scala.collection.mutable
val rdd1 = df.rdd.map( x => { val y = x.getAs[mutable.WrappedArray[String]]("arr")(0); (y,x)} )
val pair = new PairRDDFunctions(rdd1)
pair.flatMapValues( x => x.getAs[mutable.WrappedArray[String]]("arr") )
    .filter( x=> x._1 != x._2)
    .collect.foreach(println)

Results:

(122abc,223cde)
(122abc,334vbn)
(122abc,445das)
(221bca,321dsa)
(231dsa,653asd)
(231dsa,698poq)
(231dsa,897qwa)

Comments
  • Why do you have two of 221bca 321dsa?
  • @smac89 that was a typo sorry. Changed now.
  • Are you trying to map an RDD[Array[Array[String]]] to an RDD[(String,String)]?
  • @JackLeow yes I am trying to map RDD[Array[Array[String]]] to an RDD[(String,String)]. Sorry if I was not being clear enough.
  • I'm not sure, but your output doesn't really look like OP's.
  • toPairs(seqs) will give you List(List(item1,item2),List(item1,item2)...), which is pretty much what is supposed to come out, and then you can convert it however you want.
  • No, that's not what OP wants. OP wants to create a single array of tuples where the tuples came from each subarray's first element combined with the rest of elements of the subarray for each subarray in the original RDD.