An Outlier in Linear Regression

Brain Glitch
2 min read · Feb 3, 2023

Outlier: A data point that lies far away from the rest of the observations (in regression, far from the other data points and the trend they follow).

1. How Outliers Are Introduced into the Data

  1. Data Entry Error: Refers to errors made during manual entry of data into a database or spreadsheet.
  2. Measurement Error: Refers to errors made during data collection due to inaccuracies in instruments or techniques.
  3. Intentional Error: Refers to errors that are deliberately made, such as using fake or dummy data in a dataset.
  4. Sampling Error: Refers to errors made when incorrect or biased samples, or the wrong sources, are selected for analysis, leading to incorrect results.
  5. Natural Error: Refers to outliers that are not mistakes at all but genuine values that are much larger (or smaller) than the average value.

2. Impact of Outliers

  1. Reduced statistical power: Outliers can reduce the power of statistical tests, making the results less accurate and reliable.
  2. Mean and std impact: Outliers can strongly shift the mean and inflate the standard deviation of a dataset.
  3. Algorithm assumptions: Outliers can violate assumptions that many algorithms rely on, such as normality and homoscedasticity (constant error variance), making the algorithms less effective or their results invalid.
  4. Algorithm performance: Outliers can lower accuracy, precision, and recall, and increase mean squared error.
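To make the mean and standard deviation point concrete, here is a minimal sketch using only Python's standard-library `statistics` module. The sample values are made up for illustration, and the helper name `summarize` is hypothetical. It compares the same small sample with and without one extreme value.

```python
import statistics

# Hypothetical sample: ten typical values, then the same values plus one extreme point.
clean = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11]
with_outlier = clean + [100]

def summarize(values):
    """Return (mean, sample standard deviation, median) of a list of numbers."""
    return (
        statistics.mean(values),
        statistics.stdev(values),
        statistics.median(values),
    )

for name, data in [("clean", clean), ("with outlier", with_outlier)]:
    mean, std, median = summarize(data)
    print(f"{name:>12}: mean={mean:.2f} std={std:.2f} median={median:.2f}")
```

With these made-up numbers the median only moves from 11.5 to 12, while the mean jumps by more than 8 and the standard deviation grows by more than an order of magnitude.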

2.1 ML Algorithms Impacted by Outliers

2.1.1 Linear Regression
2.1.2 Logistic Regression
2.1.3 K-Nearest Neighbours (KNN)
2.1.4 Support Vector Machine (SVM)
2.1.5 K-Means Clustering
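Linear regression heads this list because ordinary least squares (OLS) minimises squared residuals, so a single distant point can pull the fitted line toward itself. The sketch below fits a one-feature OLS line with the closed-form formulas slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The data and the helper name `fit_ols` are hypothetical, chosen only to illustrate the effect.

```python
def fit_ols(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxy and Sxx: co-variation of x and y, and variation of x, around their means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

xs = list(range(1, 11))
ys_clean = [2 * x + 1 for x in xs]      # points lying exactly on y = 2x + 1
ys_outlier = ys_clean[:-1] + [60]       # last point (x = 10) moved from 21 to 60

print("clean:       slope=%.3f intercept=%.3f" % fit_ols(xs, ys_clean))
print("one outlier: slope=%.3f intercept=%.3f" % fit_ols(xs, ys_outlier))
```

On the clean data the fit recovers slope 2 and intercept 1; moving a single point roughly doubles the slope (about 4.13) and drops the intercept to about -6.8.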

2.2 ML Algorithms Less Affected by Outliers (in the Input Features)

2.2.1 Decision Tree
2.2.2 Random Forest
2.2.3 AdaBoost
2.2.4 XGBoost
2.2.5 Gradient Boosting
2.2.6 Naive Bayes (Multinomial NB, Bernoulli NB)
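One way to see why tree-based models are less sensitive to extreme feature values: a split depends on the ordering of the feature values, so the partition it produces does not change when an extreme value keeps its rank. The sketch below is a hypothetical single-split regression stump (not the implementation of any particular library) that picks the threshold minimising summed squared error; the data and the names `sse` and `best_split` are illustrative assumptions.

```python
def sse(values):
    """Sum of squared errors around the mean of `values`."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split(xs, ys):
    """Return the threshold on one feature that minimises the stump's squared error.

    Each side of the split predicts the mean of its y values.
    """
    pairs = sorted(zip(xs, ys))          # order points by feature value
    best_err, best_threshold = float("inf"), None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                     # cannot split between identical feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        err = sse(left) + sse(right)
        if err < best_err:
            best_err = err
            # threshold is the midpoint between neighbouring feature values
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_threshold

xs = [1, 2, 3, 10, 11, 12]
ys = [1, 1, 1, 5, 5, 5]
xs_outlier = [1, 2, 3, 10, 11, 1000]     # largest feature value made extreme

print(best_split(xs, ys), best_split(xs_outlier, ys))
```

In this example both calls choose the same threshold (6.5). Note that this robustness covers extreme feature values only: an extreme target value would still shift the mean predicted in its leaf.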

3. Methods to Detect Outliers

  1. Z-score: numerical method
  2. IQR (interquartile range) method: numerical method
  3. Boxplot: visualization method
  4. Scatterplot: visualization method
  5. KDE plot (kernel density estimate): visualization method
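The two numerical methods can be sketched in a few lines of standard-library Python. The cut-offs used here (|z| > 3, and Tukey's fences at 1.5 * IQR beyond the quartiles) are common conventions rather than values given in this article, the sample data and function names are hypothetical, and `statistics.quantiles` requires Python 3.8 or later.

```python
import statistics

# Hypothetical sample: twenty typical values and one extreme value (100).
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11,
        12, 11, 10, 13, 12, 11, 12, 13, 10, 11, 100]

def zscore_outliers(values, threshold=3.0):
    """Flag values more than `threshold` sample standard deviations from the mean."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / std > threshold]

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)   # quartiles Q1, median, Q3
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

print("Z-score outliers:", zscore_outliers(data))
print("IQR outliers:    ", iqr_outliers(data))
```

The 1.5 * IQR rule is also what a boxplot uses by default to decide which points to draw beyond its whiskers (matplotlib's `whis=1.5`). In small samples the Z-score method can miss outliers, because the outlier itself inflates the standard deviation; quartiles are much less affected by a single extreme value.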

4. How to Handle Outliers

How to handle outliers is covered in more detail in a separate article.
