Local Data Split
Introduction
Local data split is a module that splits raw data into a train dataset and a validation dataset.
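The idea can be sketched in a few lines of Python. This is only an illustration, not the module's implementation: the function name `split_rows`, the `seed` argument, and the rule that the train share equals `train_weight / (train_weight + val_weight)` are assumptions made for the example.

```python
import random


def split_rows(rows, train_weight, val_weight, shuffle=True, seed=0):
    """Split rows into (train, val) in the ratio train_weight : val_weight.

    Hypothetical helper for illustration; the weights are assumed to be
    relative, non-negative integers.
    """
    total = train_weight + val_weight
    if train_weight < 0 or val_weight < 0 or total == 0:
        raise ValueError("weights must be non-negative and not both zero")

    rows = list(rows)  # copy so the caller's data is left untouched
    if shuffle:
        random.Random(seed).shuffle(rows)

    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = len(rows) * train_weight // total
    return rows[:n_train], rows[n_train:]


# Ten toy rows of the form [feature, label].
rows = [[i, i % 2] for i in range(10)]
train, val = split_rows(rows, train_weight=8, val_weight=2)
print(len(train), len(val))
```

With weights 8 and 2, the ten toy rows are divided into 8 train rows and 2 validation rows, and every input row ends up in exactly one of the two outputs.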
Parameter List
identity: str Federated identity of the party; should be label_trainer or trainer.
- model_info:
    - name: str Model name; should be local_data_split.
- input:
    - dataset:
        - type: str Input dataset type; csv is supported.
        - path: str Folder path of the input dataset.
        - name: str File name of the input dataset. If None, all csv files under the folder path are concatenated to form the input dataset.
        - has_label: bool Whether the dataset has a label column.
        - has_header: bool Whether the dataset has a header. If True, the header of every input file must be the same.
- output:
    - path: str Folder path of the output.
    - trainset:
        - name: str File name of the output train dataset.
    - valset:
        - name: str File name of the output validation dataset.
- train_info:
    - train_params:
        - shuffle: bool If True, the input data will be shuffled.
        - max_num_cores: int Number of workers for parallel computing.
        - batch_size: int Size of each small file used in the shuffle process.
        - train_weight: int The proportion of the train dataset.
        - val_weight: int The proportion of the validation dataset.
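
As a rough illustration of how these parameters fit together, the sketch below builds a configuration as a Python dictionary and prints it as JSON. The nesting mirrors the parameter list above, but the exact container types (for example, whether `dataset` is a single object or a list), the file paths, the file names, and all numeric values are assumptions chosen for the example.

```python
import json

# Hypothetical configuration; every path, file name and number is a placeholder.
config = {
    "identity": "label_trainer",
    "model_info": {"name": "local_data_split"},
    "input": {
        "dataset": {
            "type": "csv",
            "path": "/path/to/input",  # placeholder folder
            "name": None,              # None: use all csv files in the folder
            "has_label": True,
            "has_header": True,
        }
    },
    "output": {
        "path": "/path/to/output",     # placeholder folder
        "trainset": {"name": "train.csv"},
        "valset": {"name": "val.csv"},
    },
    "train_info": {
        "train_params": {
            "shuffle": True,
            "max_num_cores": 4,
            "batch_size": 100000,
            "train_weight": 8,
            "val_weight": 2,
        }
    },
}

# Basic checks taken directly from the parameter descriptions.
assert config["identity"] in {"label_trainer", "trainer"}
assert config["model_info"]["name"] == "local_data_split"
assert config["input"]["dataset"]["type"] == "csv"

print(json.dumps(config, indent=2))
```

Here `name` is set to `None` so that every csv file in the input folder would be used, and the weights 8 and 2 express the intended balance between the train and validation datasets.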