45. Statistics – Regression Analysis Intro

Regression analysis is used to predict or forecast a dependent variable Y from a new value of an independent variable x, that is, Y = f(x). It uses existing input and output data to build a model that can then be used for prediction.

In general, there are five major types of regression model used for prediction:

Linear regression is the most common model: Y = ax + b, where a is the slope (the rate of change of Y per unit change in x) and b is the y-intercept, the value of Y when the independent variable x is 0.
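As a minimal sketch, assuming plain Python and ordinary least squares (the post does not name a fitting method), the slope a and intercept b can be estimated from paired data as follows. The function `fit_linear` and the sample values are hypothetical:

```python
from statistics import fmean


def fit_linear(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (a, b) for the least-squares line Y = a*x + b."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = fmean(xs), fmean(ys)
    # Sums of squared deviations (Sxx) and cross-products (Sxy) about the means
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s_xx == 0:
        raise ValueError("x values must not all be equal")
    a = s_xy / s_xx          # slope: change in Y per unit change in x
    b = y_bar - a * x_bar    # y-intercept: predicted Y when x = 0
    return a, b


# Hypothetical sample data
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
a, b = fit_linear(xs, ys)
print(f"Y = {a:.3f}x + {b:.3f}")
print(f"prediction at x = 6: {a * 6 + b:.3f}")
```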

Polynomial regression is applied when a linear equation is insufficient to estimate the dependent variable. Quadratic (Y = a + bx + cx²) or cubic (Y = a + bx + cx² + dx³) equations raise the polynomial degree to estimate more precisely. However, the higher the polynomial degree, the more complex the model's behavior becomes, and the harder it is to estimate the actual output reliably.
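Raising the polynomial degree can be sketched as a least-squares fit that solves the normal equations. This is an assumed pure-Python illustration (`fit_polynomial` and `solve_linear_system` are hypothetical helpers); in practice a numerical library is preferable, because the normal equations become ill-conditioned as the degree grows. The returned list [a, b, c] follows the quadratic form Y = a + bx + cx²:

```python
def solve_linear_system(matrix: list[list[float]], rhs: list[float]) -> list[float]:
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]  # augmented matrix [A | rhs]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few distinct x values for this degree")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution, from the last unknown to the first
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        rest = sum(aug[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (aug[r][n] - rest) / aug[r][r]
    return sol


def fit_polynomial(xs: list[float], ys: list[float], degree: int) -> list[float]:
    """Return coefficients [c0, c1, ..., c_degree] of the least-squares polynomial."""
    k = degree + 1
    # Normal equations: (X^T X)[i][j] = sum(x^(i+j)), (X^T y)[i] = sum(y * x^i)
    xtx = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated exactly from Y = 1 + 2x + 3x^2
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]
a, b, c = fit_polynomial(xs, ys, degree=2)
print(f"Y = {a:.3f} + {b:.3f}x + {c:.3f}x^2")
```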

Logarithmic regression, Y = a·ln(x) + b, where ln is the natural logarithm, can be applied to estimate quantities such as population or progress. This type of model rises rapidly at the beginning and levels off (saturates) toward the end. Because ln(x) is defined only for positive x, every x value must be greater than 0.

Power regression, Y = b·x^a, is used to evaluate acceleration or deceleration within the data. The dataset must not contain negative values (in practice, x and Y must both be positive, because the model is usually fitted on their logarithms). Here b is the scaling coefficient and a is the exponent that sets the rate of acceleration or deceleration.

Exponential regression, Y = a·e^(bx), uses the exponential function and can be applied, for example, to forecast radioactive decay. As with the power model, the data must not contain negative values (specifically, every Y value must be positive).

The following criteria can be used to judge a regression model:

One criterion is how well the regression model matches the trend or nature of the data.

For instance, when estimating likely college grades, the grade scores from junior high or high school cannot simply serve as the independent variables; otherwise, the model will not match the distribution of college scores.

A regression model should be applied with a clear purpose and calculation method. If users cannot fully understand or operate the model, its value diminishes. Therefore, it is better to find a model that suits the population and clearly illustrates the trend.

If the regression model is highly sensitive to sampling variance, minimizing the variation that comes from sampling becomes critical for obtaining the correct distribution.


In general, linear regression is less affected by sampling variance than polynomial or other more complex regression models.
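The effect of sampling variance can be illustrated with a small simulation. The setup is hypothetical, not from the post: the true relationship is linear, and each trial draws a fresh noisy sample, fits either a straight line or a degree-5 polynomial, and records the fitted value at the edge of the data (x = 1). A larger variance means the fitted model changes more from sample to sample. `fit_polynomial`, `solve_linear_system`, and `prediction_variance` are illustrative helpers:

```python
import random
from statistics import pvariance


def solve_linear_system(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        rest = sum(aug[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (aug[r][n] - rest) / aug[r][r]
    return sol


def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [c0, ..., c_degree] via the normal equations."""
    k = degree + 1
    xtx = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def prediction_variance(degree, x0=1.0, trials=300, seed=7):
    """Variance of the fitted value at x0 across repeated noisy samples.

    Hypothetical setup: true relationship Y = 2x + 1, observed at ten evenly
    spaced x values in [-1, 1] with Gaussian noise (standard deviation 1).
    """
    rng = random.Random(seed)  # same seed, so every degree sees identical samples
    xs = [-1 + 2 * i / 9 for i in range(10)]
    fitted = []
    for _ in range(trials):
        ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]
        coeffs = fit_polynomial(xs, ys, degree)
        fitted.append(sum(c * x0 ** i for i, c in enumerate(coeffs)))
    return pvariance(fitted)


print(f"linear (degree 1):     {prediction_variance(1):.3f}")
print(f"polynomial (degree 5): {prediction_variance(5):.3f}")
```

Under these assumptions the degree-5 fit should show a noticeably larger variance at x = 1, because its extra terms let the curve follow the noise in each individual sample.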

If a regression model contains variables that are difficult to obtain or interpret, the model's value will not be maximized. Therefore, selecting correct and appropriate independent variables is the right way to illustrate the trend.

A regression model can be built and evaluated in the following four steps.

Step 1 uses the existing data (observed inputs and outputs) to estimate the regression equation.
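For the linear model Y = ax + b, this estimation is usually done by ordinary least squares, which chooses a and b to minimize the sum of squared residuals. The post's slides may use different symbols; the standard closed-form result is:

$$
\mathrm{SSE}(a, b) = \sum_{i=1}^{n} \left(y_i - a x_i - b\right)^2,
\qquad
\frac{\partial\,\mathrm{SSE}}{\partial a} = \frac{\partial\,\mathrm{SSE}}{\partial b} = 0
$$

$$
\hat{a} = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{b} = \bar{y} - \hat{a}\,\bar{x}
$$

The second equation shows that the fitted line always passes through the point (x̄, ȳ).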

Step 2 evaluates whether the slope of the regression is significant, that is, whether x actually helps estimate Y.
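In simple linear regression, the slope check is commonly a t-test of H₀: a = 0, using t = â / SE(â) with n − 2 degrees of freedom. A minimal sketch, assuming this t-test is what the slides intend; `slope_t_statistic`, the data, and the critical value (taken from a t table, since the Python standard library has no t distribution) are illustrative:

```python
import math
from statistics import fmean


def slope_t_statistic(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Return (a, b, t), where t tests H0: a = 0 for the line Y = a*x + b."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three points (n - 2 degrees of freedom)")
    x_bar, y_bar = fmean(xs), fmean(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = s_xy / s_xx
    b = y_bar - a * x_bar
    sse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))  # residual sum of squares
    mse = sse / (n - 2)                   # estimated error variance
    se_a = math.sqrt(mse / s_xx)          # standard error of the slope
    if se_a == 0:                         # perfect fit: t is unbounded
        return a, b, math.copysign(math.inf, a)
    return a, b, a / se_a


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
a, b, t = slope_t_statistic(xs, ys)
T_CRIT = 3.182  # two-sided 5% critical value of t with n - 2 = 3 df, from a t table
print(f"slope = {a:.3f}, t = {t:.2f}, significant at 5%: {abs(t) > T_CRIT}")
```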

Step 3 uses the ANOVA method to test the effectiveness of the regression and checks the coefficient of determination (R²) to see how closely the regression matches the actual data.
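The ANOVA step splits the total variation of Y into a part explained by the regression and a residual part. A minimal sketch for simple linear regression (one independent variable); `regression_anova` and the data are hypothetical:

```python
from statistics import fmean


def regression_anova(xs: list[float], ys: list[float]) -> dict[str, float]:
    """ANOVA quantities and R^2 for the simple linear regression Y = a*x + b."""
    n = len(xs)
    x_bar, y_bar = fmean(xs), fmean(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = s_xy / s_xx
    b = y_bar - a * x_bar
    fitted = [a * x + b for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)               # total variation
    ssr = sum((f - y_bar) ** 2 for f in fitted)           # explained by the regression
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # left in the residuals
    msr = ssr / 1            # regression df = 1 (one independent variable)
    mse = sse / (n - 2)      # error df = n - 2
    return {"SSR": ssr, "SSE": sse, "SST": sst, "F": msr / mse, "R2": ssr / sst}


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
table = regression_anova(xs, ys)
for name, value in table.items():
    print(f"{name}: {value:.4f}")
```

F is compared with the F-table value for 1 and n − 2 degrees of freedom. In simple linear regression, F equals the square of the slope's t statistic, and R² equals the squared correlation between x and Y.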

Step 4 calculates the standard error of the estimated mean from the estimated regression line and uses it to build a confidence interval for the estimated value.
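For simple linear regression, the standard error of the estimated mean at x₀ is commonly computed as SE = √(MSE · (1/n + (x₀ − x̄)² / Sxx)), and the interval is ŷ₀ ± t · SE. A minimal sketch under that assumption; `mean_response_interval`, the data, and the table value of t are illustrative:

```python
import math
from statistics import fmean


def mean_response_interval(xs, ys, x0, t_crit):
    """Estimate the mean of Y at x0 with its standard error and confidence interval.

    t_crit is the two-sided critical value t(alpha/2, n - 2), looked up in a t table.
    """
    n = len(xs)
    x_bar, y_bar = fmean(xs), fmean(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = s_xy / s_xx
    b = y_bar - a * x_bar
    mse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    y0 = a * x0 + b  # estimated mean of Y at x0
    # The error grows as x0 moves away from the mean of the observed x values
    se_mean = math.sqrt(mse * (1 / n + (x0 - x_bar) ** 2 / s_xx))
    half_width = t_crit * se_mean
    return y0, se_mean, (y0 - half_width, y0 + half_width)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
T_CRIT = 3.182  # 95% two-sided, n - 2 = 3 degrees of freedom
for x0 in (3.0, 5.0):
    y0, se, (lo, hi) = mean_response_interval(xs, ys, x0, T_CRIT)
    print(f"x0 = {x0}: mean {y0:.3f}, SE {se:.4f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```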

The following slides set up the equations before the examples are given:

The following example uses a single data set to walk through all the steps, from Step 1 to Step 4.

Regression Estimate Calculation

Regression Estimate Effectiveness Calculation

Regression Estimate Effectiveness and R² Calculation

Estimated Mean’s Error Calculation & Interval

Estimated Mean’s Error Calculation & Interval Extended
