Usage
revoscalepy.rx_read_xdf(file: str, vars_to_keep: list = None,
    vars_to_drop: list = None, row_var_name: str = None,
    start_row: int = 1, num_rows: int = None,
    return_data_frame: bool = True,
    strings_as_factors: bool = False,
    max_rows_by_columns: int = None, report_progress: int = None,
    read_by_block: bool = False, cpp_interp: list = None)
Description
Read data from an “.xdf” file into a data frame.
Arguments
file
Either an RxXdfData object or a character string specifying the “.xdf” file.
vars_to_keep
List of variable names (strings) to include when reading from the input data file. If None, the argument is ignored. Cannot be used with vars_to_drop.
vars_to_drop
List of variable names (strings) to exclude when reading from the input data file. If None, the argument is ignored. Cannot be used with vars_to_keep.
row_var_name
Optional character string specifying the variable in the data file to use as row names for the output data frame.
start_row
Starting row for retrieval.
num_rows
Number of rows of data to retrieve. If -1, all are read.
return_data_frame
Bool indicating whether or not to create a data frame. If False, a list is returned.
strings_as_factors
Bool indicating whether or not to convert strings into factors.
max_rows_by_columns
The maximum size of the data frame to read in, measured as the number of rows times the number of columns. If the number of rows times the number of columns being extracted from the “.xdf” file exceeds this value, a warning is reported and fewer rows are read than requested. Setting max_rows_by_columns too high can cause memory problems when a very large data frame is loaded. To extract a subset of rows and/or columns from an “.xdf” file, use rx_data_step.
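The rows-times-columns check described above can be sketched with plain arithmetic. This is only an illustration: the helper names and the numbers are hypothetical, and the description does not state exactly how many rows rx_read_xdf reads once the cap is exceeded, so the floor division below is an assumption about the largest request that stays within the cap.

```python
def exceeds_cap(num_rows: int, num_cols: int, max_rows_by_columns: int) -> bool:
    """Return True when rows x columns is larger than the cap."""
    return num_rows * num_cols > max_rows_by_columns


def max_rows_within_cap(num_cols: int, max_rows_by_columns: int) -> int:
    """Largest row count whose rows x columns product stays within the cap.

    Assumption: this is not necessarily the row count rx_read_xdf picks.
    """
    if num_cols <= 0:
        raise ValueError("num_cols must be positive")
    return max_rows_by_columns // num_cols


# Illustrative numbers only: 6 columns and a hypothetical cap of 3,000,000 cells.
print(exceeds_cap(1_000_000, 6, 3_000_000))   # True
print(max_rows_within_cap(6, 3_000_000))      # 500000
```

A check like this can help choose num_rows before calling rx_read_xdf, so that the request stays under the cap instead of triggering the warning.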
report_progress
Integer value with options:
0: No progress is reported.
1: The number of processed rows is printed and updated.
2: Rows processed and timings are reported.
3: Rows processed and all timings are reported.
read_by_block
Read data by blocks. This argument is deprecated.
cpp_interp
List of information sent to the C++ interpreter.
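Some of the argument rules above can be checked before the file is touched. The following is a minimal sketch of a hypothetical helper, build_read_kwargs, which is not part of revoscalepy. It enforces that vars_to_keep and vars_to_drop are not combined and, assuming rows are numbered from 1 as the default for start_row suggests, rejects smaller values.

```python
from typing import Any, Dict, List, Optional


def build_read_kwargs(
    vars_to_keep: Optional[List[str]] = None,
    vars_to_drop: Optional[List[str]] = None,
    start_row: int = 1,
    num_rows: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: validate arguments and return kwargs for rx_read_xdf."""
    # The two column filters are documented as mutually exclusive.
    if vars_to_keep is not None and vars_to_drop is not None:
        raise ValueError("vars_to_keep and vars_to_drop cannot be used together")
    # Assumption: row numbering starts at 1 (the documented default).
    if start_row < 1:
        raise ValueError("start_row must be 1 or greater")

    kwargs: Dict[str, Any] = {"start_row": start_row}
    # Only pass the optional arguments that were actually set.
    if vars_to_keep is not None:
        kwargs["vars_to_keep"] = list(vars_to_keep)
    if vars_to_drop is not None:
        kwargs["vars_to_drop"] = list(vars_to_drop)
    if num_rows is not None:
        kwargs["num_rows"] = num_rows
    return kwargs


kwargs = build_read_kwargs(vars_to_keep=["creditScore", "default"], num_rows=10)
print(kwargs)
# With revoscalepy installed, the result could be passed on as:
# mort_df = rx_read_xdf(mort_file, **kwargs)
```

Keeping the validation separate from the read makes a conflicting pair of filters fail early with a clear message.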
Returns
A data frame, or a list if return_data_frame is False.
Example
import os
from revoscalepy import RxOptions, rx_read_xdf

# Locate the sample data shipped with revoscalepy.
sample_data_path = RxOptions.get_option("sampleDataDir")
mort_file = os.path.join(sample_data_path, "mortDefaultSmall.xdf")

# Read only the first 10 rows of the .xdf file into a data frame.
mort_df = rx_read_xdf(mort_file, num_rows = 10)
print(mort_df)
Output:
Rows Processed: 10
Time to read data file: 0.00 secs.
Time to convert to data frame: less than .001 secs.
   creditScore  houseAge  yearsEmploy  ccDebt  year  default
0          691        16            9    6725  2000        0
1          691         4            4    5077  2000        0
2          743        18            3    3080  2000        0
3          728        22            1    4345  2000        0
4          745        17            3    2969  2000        0
5          539        15            3    4588  2000        0
6          724         6            5    4057  2000        0
7          659        14            3    6456  2000        0
8          621        18            3    1861  2000        0
9          720        14            7    4568  2000        0
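The example above reads a single block of 10 rows. To walk through a file in consecutive blocks, start_row and num_rows can be advanced together. The sketch below only computes the (start_row, num_rows) pairs. The helper row_windows is hypothetical, the total row count is assumed to be known to the caller, and row numbering is assumed to start at 1, as the default for start_row suggests.

```python
from typing import Iterator, Tuple


def row_windows(total_rows: int, block_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start_row, num_rows) pairs that cover rows 1..total_rows."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    start_row = 1  # assumption: rows are numbered from 1
    while start_row <= total_rows:
        # The last block may be shorter than block_size.
        num_rows = min(block_size, total_rows - start_row + 1)
        yield start_row, num_rows
        start_row += num_rows


print(list(row_windows(25, 10)))  # [(1, 10), (11, 10), (21, 5)]
```

Each pair could then be passed as rx_read_xdf(mort_file, start_row=start, num_rows=count), which keeps every data frame small regardless of the file size.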