Vertical Pearson
Introduction
This algorithm calculates the Pearson correlation coefficient matrix over the data held by all parties. The Pearson correlation coefficient of two variables is the ratio of their covariance to the product of their standard deviations.
\(\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\mathbb{E}[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y}\)
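To make the formula concrete, here is a minimal plaintext sketch in pure Python. It is not the federated implementation: the party names, column names, and values are made up for illustration, and the columns of both parties are simply merged in the clear, whereas a real vertical setting would compute the cross-party entries under the configured encryption.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n factors of the covariance and of both variances cancel,
    # so plain sums of products are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # A constant column has zero variance and raises ZeroDivisionError here.
    return cov / math.sqrt(var_x * var_y)


def pearson_matrix(columns):
    """Correlation matrix for a dict mapping column name -> list of values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical vertically partitioned data: same rows, different columns.
party_a = {"x1": [1.0, 2.0, 3.0, 4.0], "x2": [2.0, 4.0, 6.0, 8.0]}
party_b = {"x3": [4.0, 3.0, 2.0, 1.0]}

corr = pearson_matrix({**party_a, **party_b})
for name, row in corr.items():
    print(name, {other: round(value, 3) for other, value in row.items()})
```

In this toy data `x2` is an exact multiple of `x1` and `x3` is `x1` reversed, so the off-diagonal entries sit at the two extremes of the coefficient's range.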
Parameter List
- identity (str): The role of each participant in federated learning. Should be label_trainer or trainer.
- model_info:
    - name (str): Model name. Should be vertical_pearson.
- input:
    - trainset:
        - type (str): Train dataset type. Currently only csv is supported.
        - path (str): The folder path of the train dataset.
        - name (str): The file name of the train dataset.
        - has_id (bool): Whether the dataset has an id column.
        - has_label (bool): Whether the dataset has a label column.
- output:
    - path (str): Output folder path.
    - model:
        - name (str): File name of the output model.
- train_info:
    - train_params:
        - col_index (list or int): Indexes of the columns involved in the calculation. If it is -1, all columns participate in the calculation.
        - col_names (str): Names of the columns involved in the calculation, in the format "name1, …, nameN". If both names and indexes are provided, their union is used.
        - encryption:
            - paillier:
                - key_bit_size (int): Bit length of the Paillier key. A value greater than or equal to 2048 is recommended.
                - precision (int): Precision.
                - djn_on (bool): Whether to use the DJN method to generate the key pair.
                - parallelize_on (bool): Whether to use multiple cores for computing.
            - plain (map): No encryption, an alternative to otp encryption. To use it, set "plain": {}.
        - max_num_core (int): Maximum number of CPU cores used for computing.
        - sample_size (int): Number of rows to sample in order to speed up the Pearson computation.
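The parameters above can be pictured as one nested configuration. The following is a hedged sketch written as a Python dict that mirrors the nesting of the list; the exact serialization (for example, whether trainset is a single map or a list of datasets) is not stated here and is an assumption, and every path, file name, and numeric value is a placeholder. The helper `resolve_columns` is hypothetical and only illustrates one reading of the "union of names and indexes" rule, with indexes interpreted as positions in a made-up header.

```python
import json

# Hypothetical configuration; all values are placeholders, not defaults.
config = {
    "identity": "label_trainer",
    "model_info": {"name": "vertical_pearson"},
    "input": {
        "trainset": {
            "type": "csv",
            "path": "/path/to/dataset",  # placeholder folder
            "name": "train.csv",         # placeholder file name
            "has_id": True,
            "has_label": True,
        }
    },
    "output": {
        "path": "/path/to/output",       # placeholder folder
        "model": {"name": "vertical_pearson.model"},  # placeholder name
    },
    "train_info": {
        "train_params": {
            "col_index": [0, 2],
            "col_names": "x2, x3",
            # Use either "paillier" as below or "plain": {} for no encryption.
            "encryption": {
                "paillier": {
                    "key_bit_size": 2048,
                    "precision": 6,      # placeholder value
                    "djn_on": True,
                    "parallelize_on": True,
                }
            },
            "max_num_core": 4,
            "sample_size": 1000,
        }
    },
}


def resolve_columns(header, col_index, col_names):
    """Hypothetical helper: union of index- and name-selected columns."""
    if col_index == -1:
        return list(header)  # -1 selects every column
    indexes = [col_index] if isinstance(col_index, int) else list(col_index)
    picked = {header[i] for i in indexes}
    picked |= {name.strip() for name in col_names.split(",") if name.strip()}
    unknown = picked - set(header)
    if unknown:
        raise ValueError(f"unknown column names: {sorted(unknown)}")
    # Return the union in header order so the result is deterministic.
    return [name for name in header if name in picked]


header = ["x1", "x2", "x3", "x4"]  # made-up feature columns
params = config["train_info"]["train_params"]
selected = resolve_columns(header, params["col_index"], params["col_names"])
print(selected)
print(json.dumps(config["train_info"], indent=2))
```

With these placeholder settings, indexes 0 and 2 pick `x1` and `x3`, the names add `x2`, and the union covers three of the four columns.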