flatMap vs map in Apache Spark

Donald Le
2 min read · Dec 31, 2020

Sometimes we want to produce multiple output elements for each input element. The operation that does this is called flatMap(). As with map(), the function we provide to flatMap() is called individually for each element in our input RDD. Instead of returning a single element, the function returns an iterator over its return values. Rather than producing an RDD of iterators, flatMap() gives us back an RDD that consists of the elements from all of the iterators. By contrast, map() always yields exactly one output element per input element.
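To make the difference concrete, here is a minimal sketch in plain Python that imitates the two behaviours on an ordinary list. It does not use Spark itself: the helpers `map_elements` and `flat_map_elements` are hypothetical names made up for this illustration, and the list `lines` stands in for an RDD.

```python
from itertools import chain


def map_elements(func, elements):
    """Imitates map(): exactly one output element per input element."""
    return [func(x) for x in elements]


def flat_map_elements(func, elements):
    """Imitates flatMap(): func returns an iterable for each input,
    and all of those iterables are concatenated into one flat result."""
    return list(chain.from_iterable(func(x) for x in elements))


# A plain list standing in for an RDD of text lines (illustrative data).
lines = ["hello world", "hi"]

# With map-like behaviour, each line becomes one list of words.
mapped = map_elements(lambda line: line.split(" "), lines)

# With flatMap-like behaviour, the words of all lines end up in one sequence.
flattened = flat_map_elements(lambda line: line.split(" "), lines)

print(mapped)     # [['hello', 'world'], ['hi']]
print(flattened)  # ['hello', 'world', 'hi']
```

In PySpark, assuming an existing SparkContext named `sc`, the same comparison would be `sc.parallelize(lines).map(lambda line: line.split(" ")).collect()` against `sc.parallelize(lines).flatMap(lambda line: line.split(" ")).collect()`.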
