Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Usage
revoscalepy.rx_spark_cache_data(in_data: revoscalepy.datasource.RxDataSource.RxDataSource,
    cache: bool)
Description
Use this function to set the cache flag to control whether data objects should be cached in Spark memory system, applicable for RxXdfData, RxHiveData, RxParquetData, RxOrcData and RxSparkDataFrame.
Arguments
in_data
A data source that can be RxXdfData, RxHiveData, RxParquetData, RxOrcData or RxSparkDataFrame. RxTextData is currently not supported.
cache
Bool value controlling whether the data should be cached or not. If TRUE the data will be cached in the Spark memory system after the first use for performance enhancement.
Returns
Data object with new cache flag.
Example
from revoscalepy import RxHiveData, rx_spark_connect, rx_spark_cache_data
hive_data = RxHiveData(table = "mytable")
# The cache flag of hiveData is now set to TRUE.
hive_data = rx_spark_cache_data(hive_data, True)
# set cache value to False
hive_data = rx_spark_cache_data(hive_data, False)