Computation in Spark can be expensive, so we should decide which data is worth recomputing and which is not. To keep data cached so it does not need to be recalculated on every action, we can use the persist method.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object LocalPi {
  val conf: SparkConf = new SparkConf().setMaster("local[*]").setAppName("test")

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(conf)
    val input = sc.parallelize(List(1, 2, 3, 4))
    val result = input.map(x => (x * x, x + x))
    // Persist on disk so later actions reuse the result instead of recomputing it
    result.persist(StorageLevel.DISK_ONLY)
    println(result.count())
    sc.stop()
  }
}
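Beyond DISK_ONLY, Spark offers several storage levels; in particular, cache() is shorthand for persist(StorageLevel.MEMORY_ONLY). A minimal sketch of this variation (the object and variable names here are illustrative, not from the original example):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PersistLevels {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[*]").setAppName("persist-demo"))
    val squares = sc.parallelize(1 to 100).map(x => x * x)

    // cache() is equivalent to persist(StorageLevel.MEMORY_ONLY)
    squares.cache()
    println(squares.count()) // first action computes and caches the RDD

    // Release the cached partitions once they are no longer needed
    squares.unpersist()
    sc.stop()
  }
}
```

Choosing between MEMORY_ONLY and DISK_ONLY is a trade-off: memory is faster to read back, while disk persistence frees executor memory at the cost of I/O on each reuse.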