Another common interview question
• What are the assumptions of Linear Regression?
• How do we check them?
• How can we fix them?
Here's the answer ↓
0/5
1. Linear Relationship
The relationship between the dependent and independent variables is assumed to be linear.
! How to check:
• Plot residuals vs fitted values; there shouldn't be any evident pattern.
☑ How to fix:
• Apply non-linear transformation to the dependent/independent variable.
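A quick sketch of this check in Python, using synthetic data (all variable names here are illustrative, not from a specific library):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, 200)
y = 3.0 * X + rng.normal(0, 1, 200)      # a truly linear relationship

slope, intercept = np.polyfit(X, y, 1)   # ordinary least squares fit
fitted = slope * X + intercept
residuals = y - fitted

# In practice, scatter residuals against fitted values, e.g. with matplotlib:
#   plt.scatter(fitted, residuals); plt.axhline(0)
# A patternless cloud around zero supports the linearity assumption;
# a visible curve or trend suggests a non-linear transformation is needed.
```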
1/5
2. Multivariate Normality
We assume that the error terms (and hence the targets, conditional on the features) are normally distributed.
! How to check:
• Use Q-Q plot
• Shapiro-Wilk Test
• Kolmogorov-Smirnov Test
• Check that skewness ≈ 0 and kurtosis ≈ 3
☑ How to fix:
• Verify that outliers aren't impacting the distribution.
• Apply non-linear transformation on dependent/independent variables.
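These normality checks can be sketched with scipy (synthetic residuals stand in for real model residuals; names are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
residuals = rng.normal(0, 1, 500)        # stand-in for model residuals

# Shapiro-Wilk: null hypothesis = the sample is normally distributed
sw_stat, sw_p = stats.shapiro(residuals)

# Kolmogorov-Smirnov against a normal with the sample's mean and std
ks_stat, ks_p = stats.kstest(residuals, "norm",
                             args=(residuals.mean(), residuals.std()))

skewness = stats.skew(residuals)                # ≈ 0 for normal data
kurt = stats.kurtosis(residuals, fisher=False)  # ≈ 3 for normal data
```

Large p-values (above your significance level) mean the tests fail to reject normality.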
2/5
3. No Auto-correlation
There should be no autocorrelation in the error terms; autocorrelated errors tend to underestimate the true standard errors.
! How to check:
• Durbin-Watson Test (range 0 to 4):
DW < 2 → positive AC, DW ≈ 2 → no AC, DW > 2 → negative AC
☑ How to fix:
• Add lagged variables, or use generalized least squares (e.g. Cochrane-Orcutt).
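The Durbin-Watson statistic is simple enough to compute by hand; here's a sketch with synthetic independent errors (names are illustrative):

```python
import numpy as np

def durbin_watson(residuals):
    # DW = sum of squared successive differences / sum of squared residuals.
    # Near 2: no autocorrelation; below 2: positive; above 2: negative.
    diff = np.diff(residuals)
    return np.sum(diff ** 2) / np.sum(residuals ** 2)

rng = np.random.default_rng(0)
e = rng.normal(0, 1, 1000)   # independent errors, so DW should land near 2
dw = durbin_watson(e)
```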
3/5
4. No Multi-collinearity
None of the features should be highly correlated with each other; otherwise we can't distinguish the individual effect of each independent variable.
! How to check:
• Variance Inflation Factor (VIF)
VIF = 1 → no correlation
VIF > 5 → generally indicates high multicollinearity
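VIF can be computed by regressing each feature on the others; a minimal numpy sketch with synthetic data (illustrative names):

```python
import numpy as np

def vif(X, j):
    # Regress column j on the remaining columns (plus an intercept)
    # and return VIF_j = 1 / (1 - R^2).
    y = X[:, j]
    A = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(0)
x1 = rng.normal(size=300)
x2 = rng.normal(size=300)             # independent of x1 -> VIF near 1
x3 = x1 + 0.1 * rng.normal(size=300)  # nearly collinear with x1 -> huge VIF
X = np.column_stack([x1, x2, x3])
```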
☑ How to fix:
• Drop the features that are highly correlated.
• Merge the collinear variables using a linear combination or some other method.
4/5
5. Homoscedasticity
The error terms should have constant variance. Heteroscedasticity often arises from outliers, which get too much weight and distort the model's fit.
! How to check:
• Plot residuals vs fitted values; it shouldn't show a funnel shape.
• Breusch-Pagan Test
• Cook-Weisberg Test
• White's general test
☑ How to fix:
• Transform the dependent variable.
• Redefine the independent variable.
• Use weighted regression.
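A hand-rolled sketch of the Breusch-Pagan test (the LM-statistic form; synthetic data and names are illustrative, not from a specific library):

```python
import numpy as np
from scipy import stats

def breusch_pagan(X, residuals):
    # Regress squared residuals on the regressors; the LM statistic
    # n * R^2 is chi-square with (number of regressors) degrees of freedom.
    n = len(residuals)
    A = np.column_stack([np.ones(n), X])
    y = residuals ** 2
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ beta
    r2 = 1 - ((y - fitted) ** 2).sum() / ((y - y.mean()) ** 2).sum()
    lm = n * r2
    return lm, stats.chi2.sf(lm, df=A.shape[1] - 1)

rng = np.random.default_rng(0)
x = rng.uniform(1, 10, 400).reshape(-1, 1)
e_homo = rng.normal(0, 1, 400)              # constant variance
e_hetero = rng.normal(0, 1, 400) * x[:, 0]  # variance grows with x (funnel)
_, p_homo = breusch_pagan(x, e_homo)
_, p_hetero = breusch_pagan(x, e_hetero)
```

A small p-value rejects constant variance, which is when transformations or weighted regression come in.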
5/5
For more such content on machine learning and programming, consider following @capeandcode.