The difference between cache and persist on an RDD




The difference between cache and persist is clear from the source code in RDD.scala:


def persist(newLevel: StorageLevel): this.type = {
  if (storageLevel != StorageLevel.NONE && newLevel != storageLevel) {
    throw new UnsupportedOperationException(
      "Cannot change storage level of an RDD after it was already assigned a level")
  }
  sc.persistRDD(this)
  sc.cleaner.foreach(_.registerRDDForCleanup(this))
  storageLevel = newLevel
  this
}

/** Persist this RDD with the default storage level (`MEMORY_ONLY`). */
def persist(): this.type = persist(StorageLevel.MEMORY_ONLY)

/** Persist this RDD with the default storage level (`MEMORY_ONLY`). */
def cache(): this.type = persist()
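
The storage-level check in persist(newLevel) is easiest to see by triggering it. The following is a minimal sketch, assuming Spark is on the classpath and a local master is acceptable; the object name StorageLevelGuardDemo, the method run, and the app name are made up for illustration. It caches an RDD, tries to re-persist it with a different level, and then calls unpersist(), which resets the level to NONE so that a new level can be assigned.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel
import scala.util.Try

// Hypothetical demo object; only the Spark calls themselves come from the real API.
object StorageLevelGuardDemo {
  // Returns: (level after cache(),
  //           whether changing the level was rejected,
  //           level after unpersist() followed by a new persist call).
  def run(sc: SparkContext): (StorageLevel, Boolean, StorageLevel) = {
    val rdd = sc.parallelize(1 to 4)

    // cache() delegates to persist(), which assigns MEMORY_ONLY.
    rdd.cache()
    val afterCache = rdd.getStorageLevel

    // A different level on an already-assigned RDD hits the guard
    // and throws UnsupportedOperationException.
    val changeRejected = Try(rdd.persist(StorageLevel.MEMORY_AND_DISK)).isFailure

    // unpersist() resets the level to NONE, so a new level is accepted.
    rdd.unpersist()
    rdd.persist(StorageLevel.MEMORY_AND_DISK)

    (afterCache, changeRejected, rdd.getStorageLevel)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("storage-level-guard")
    val sc = new SparkContext(conf)
    try println(run(sc)) finally sc.stop()
  }
}
```

Note that calling persist again with the same level is allowed by the check, since newLevel != storageLevel is then false.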













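From this we can see:

1) An RDD's cache() method simply calls the no-argument persist(), so both use the MEMORY_ONLY storage level;

2) persist(newLevel) lets you choose a StorageLevel explicitly when the project needs a different one;

3) Neither cache nor persist is an action: each only marks the RDD for persistence, and nothing is computed or stored until an action runs on it.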
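
Points 2) and 3) can be checked with a small experiment. This is a sketch under stated assumptions: Spark on the classpath, a local master, and an accumulator used as a rough counter of how often the map function runs; the names LazyPersistDemo, run, and "evaluations" are illustrative only.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical demo object; only the Spark calls themselves come from the real API.
object LazyPersistDemo {
  // Returns the number of map-function evaluations seen on the driver
  // (right after persist, after the first action, after the second action).
  def run(sc: SparkContext): (Long, Long, Long) = {
    val evaluations = sc.longAccumulator("evaluations")
    val doubled = sc.parallelize(1 to 4).map { x =>
      evaluations.add(1) // counts each time an element is actually computed
      x * 2
    }

    // Explicit storage level instead of the MEMORY_ONLY default of cache().
    doubled.persist(StorageLevel.MEMORY_AND_DISK)
    val afterPersist = evaluations.sum // persist is not an action: nothing ran yet

    doubled.count() // first action computes the partitions and stores them
    val afterFirstAction = evaluations.sum

    doubled.count() // second action should read the persisted blocks
    val afterSecondAction = evaluations.sum

    (afterPersist, afterFirstAction, afterSecondAction)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("lazy-persist")
    val sc = new SparkContext(conf)
    try println(run(sc)) finally sc.stop()
  }
}
```

On a local master with no task retries, the counter should read 0 after persist, 4 after the first count(), and still 4 after the second, which shows that persist itself does no work and that the second action reuses the stored data.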