21. A student on attachment is preparing a dataset for training a linear regression model in scikit-learn. During exploratory data analysis, he detected multiple feature columns with missing values. About 15% of the data across the whole training dataset is missing. The student is worried that this might bias his model and lead to inaccurate results. Which approach will MOST likely yield the best result in reducing the bias caused by missing values?

A. Compute the mean of non-missing values in the same column and use the result to replace missing values.
B. Use supervised learning methods to estimate the missing values for each feature.
C. Compute the mean of non-missing values in the same row and use the result to replace missing values.
D. Drop the columns that include missing values because they only account for 10% of the training data.
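To make the difference between column-mean imputation and supervised estimation concrete, here is a minimal sketch in scikit-learn. The toy array is invented for illustration (its second column is exactly twice the first), and using LinearRegression as the estimator is an assumption, not something the question specifies.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression

# Toy data (invented for illustration): column 1 is exactly 2 * column 0,
# and two entries of column 1 are missing.
X = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, np.nan],
    [4.0, 8.0],
    [10.0, np.nan],
])

# Column-mean imputation: every missing entry gets the same value,
# regardless of what the other features say about that row.
X_mean = SimpleImputer(strategy="mean").fit_transform(X)

# Supervised imputation: treat the incomplete column as a regression target,
# fit on the rows where it is observed, and predict it where it is missing.
missing = np.isnan(X[:, 1])
model = LinearRegression().fit(X[~missing, :1], X[~missing, 1])
X_model = X.copy()
X_model[missing, 1] = model.predict(X[missing, :1])

print("column-mean fill:", X_mean[missing, 1])
print("regression fill: ", X_model[missing, 1])
```

In this constructed case the regression fill follows the relationship between the two columns, while the mean fill ignores it. When several columns have gaps, scikit-learn's experimental IterativeImputer applies the same idea by modelling each incomplete feature from the others in turn.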
