Sometimes we want to produce multiple output elements for each input element. The operation to do this is called flatMap() . As with map() , the function we provide to flatMap() is called individually for each element in our input RDD. Instead of returning a single element, we return an iterator with our return values. Rather than producing an RDD of iterators, we get back an RDD that consists of the elements from all of the iterators.
For example, to print to the screen each string using map:
object LocalPi {
val conf: SparkConf = new SparkConf().setMaster("local[*]").setAppName("test")
def main(args: Array[String]): Unit = {
val sc = new SparkContext(conf)
val input = sc.parallelize(List("akka netflix", "production", "ata", "abc"))
val result = input.map(x => x)
println(result.collect().mkString(","))
}
}
Show the result
akka netflix,production,ata,abc
When we need to work with traversable value, we will use flatMap, which will divide each word into multiple characters.
object LocalPi {
val conf: SparkConf = new SparkConf().setMaster("local[*]").setAppName("test")
def main(args: Array[String]): Unit = {
val sc = new SparkContext(conf)
val input = sc.parallelize(List("akka netflix", "production", "ata", "abc"))
val result = input.flatMap(x => x)
println(result.collect().mkString(","))
}
}
The result will be shown as:
a,k,k,a, ,n,e,t,f,l,i,x,p,r,o,d,u,c,t,i,o,n,a,t,a,a,b,c
Hope it helps
~~PEACE~~