# ANOVA & Regression Intro

This post briefly introduces one-way ANOVA, the basics of regression, and multiple regression.

**One Way ANOVA**

Analysis of Variance (ANOVA) is generally applied to **compare a continuous variable across more than 2 groups (categories)**, and it is used to evaluate whether the **group averages are statistically similar or different**.

For ANOVA, the statistical significance is calculated from the critical elements listed below.
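As a minimal sketch, assuming the usual one-way ANOVA quantities (the sums of squares between and within groups, their degrees of freedom, and the F ratio), the calculation could look like the following. The function name `one_way_anova` and the sample measurements are hypothetical and only serve as an illustration.

```python
def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, f_stat)."""
    all_values = [v for g in groups for v in g]
    n_total = len(all_values)
    grand_mean = sum(all_values) / n_total

    # Between-group sum of squares: group size times the squared distance
    # of each group mean from the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Within-group sum of squares: squared distance of each value
    # from its own group mean.
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups for v in g
    )

    df_between = len(groups) - 1
    df_within = n_total - len(groups)

    # F ratio: mean square between divided by mean square within.
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


# Hypothetical measurements for three groups (illustrative only).
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [8.0, 9.0, 10.0]]
ss_b, ss_w, df_b, df_w, f_stat = one_way_anova(groups)
print(ss_b, ss_w, df_b, df_w, f_stat)
```

The F ratio is then compared against the critical value of the F distribution for the two degrees of freedom (from a table or a statistics library) to decide whether the group averages differ.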

For a multiple ANOVA example and its setup, please refer to the Gage R&R demonstration.

**Regression Introduction**

Regression analysis is used to predict or forecast the **dependent variable Y** from a new value of the **independent variable x** (**Y = f[x]**). Existing input and output data are used to build the model that makes the prediction.

In general, there are 5 major types of regression model used for prediction: linear, polynomial, logarithmic, power, and exponential.

Linear regression is the most common model, where **Y = ax + b**. **a is the rate of change, or the slope**, while **b is the y-intercept**, the value of Y when the **independent variable x is 0**.
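To make the roles of a and b concrete, here is a minimal sketch of an ordinary least-squares fit for Y = ax + b. It assumes the standard closed-form estimates; the helper name `fit_line` and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates of slope a and intercept b in Y = ax + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    a = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b = y_mean - a * x_mean
    return a, b


# Hypothetical data lying exactly on Y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
a, b = fit_line(xs, ys)

# Predict Y for a new independent value x = 5.
prediction = a * 5.0 + b
print(a, b, prediction)
```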

Polynomial regression is applied when the linear equation is insufficient to estimate the dependent variable. Equations such as the quadratic (**Y = a + bx + cx²**) or the cubic (**Y = a + bx + cx² + dx³**) use higher polynomial degrees to estimate more precisely. However, the higher the polynomial degree, the more complex the equation's behavior becomes when estimating the actual output.
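A polynomial fit can be sketched by solving the least-squares normal equations directly. The code below is an illustration in plain Python under that assumption; `solve` and `fit_polynomial` are hypothetical helper names, and the data are made up to lie exactly on a quadratic.

```python
def solve(A, rhs):
    """Solve A * c = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest pivot into position.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    # Back-substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (M[i][n] - s) / M[i][i]
    return coeffs


def fit_polynomial(xs, ys, degree):
    """Return coefficients [a, b, c, ...] of Y = a + bx + cx^2 + ..."""
    size = degree + 1
    # Normal equations: sum(x^(i+j)) * coeff_j = sum(y * x^i).
    A = [[sum(x ** (i + j) for x in xs) for j in range(size)]
         for i in range(size)]
    rhs = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    return solve(A, rhs)


# Hypothetical data lying exactly on Y = 1 + 2x + 3x^2.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [1.0 + 2.0 * x + 3.0 * x ** 2 for x in xs]
coeffs = fit_polynomial(xs, ys, degree=2)
print(coeffs)
```

The normal equations become poorly conditioned as the degree grows, which is one practical reason to keep the degree low; the sketch also does not guard against a singular system.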

The logarithmic model, **Y = a·log(x) + b**, where log is the natural logarithm, can be applied to estimate population or progress. This type of model increases rapidly at the beginning and flattens out towards the end.

The power model, **Y = bx^a**, is used to evaluate acceleration or deceleration within the data. The dataset should contain no negative numbers. In this case b is the scaling coefficient, while a is the rate of acceleration or deceleration.

The exponential model, **Y = ae^(bx)**, is used, for example, to forecast radioactive decay. As with the power model, the dataset should not contain any negative values.
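One common way to fit the logarithmic, power, and exponential models is to transform them into a straight line and reuse a linear fit. The sketch below assumes that approach; all function names and data are hypothetical. It also shows why negative (and zero) values are a problem: the logarithm is undefined for them.

```python
import math


def fit_line(us, vs):
    """Least-squares slope and intercept of v = slope * u + intercept."""
    n = len(us)
    u_mean = sum(us) / n
    v_mean = sum(vs) / n
    suv = sum((u - u_mean) * (v - v_mean) for u, v in zip(us, vs))
    suu = sum((u - u_mean) ** 2 for u in us)
    slope = suv / suu
    return slope, v_mean - slope * u_mean


def fit_logarithmic(xs, ys):
    """Y = a*ln(x) + b is already linear in ln(x). Returns (a, b)."""
    return fit_line([math.log(x) for x in xs], ys)


def fit_power(xs, ys):
    """Y = b*x^a becomes ln(Y) = a*ln(x) + ln(b). Returns (a, b)."""
    a, ln_b = fit_line([math.log(x) for x in xs],
                       [math.log(y) for y in ys])
    return a, math.exp(ln_b)


def fit_exponential(xs, ys):
    """Y = a*e^(bx) becomes ln(Y) = b*x + ln(a). Returns (a, b)."""
    b, ln_a = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(ln_a), b


# Hypothetical, noise-free data for each model (all values positive).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
log_a, log_b = fit_logarithmic(xs, [4.0 * math.log(x) + 1.0 for x in xs])
pow_a, pow_b = fit_power(xs, [3.0 * x ** 2 for x in xs])
exp_a, exp_b = fit_exponential(xs, [2.0 * math.exp(0.5 * x) for x in xs])
print(log_a, log_b, pow_a, pow_b, exp_a, exp_b)
```

Fitting in log space minimizes the error of ln(Y) rather than of Y itself, so with noisy data the result can differ slightly from a direct nonlinear fit.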

The criteria for deciding on a regression model are as follows:

This describes how well the regression model matches the trend or the nature of the data.

For instance, when predicting potential college grades, the independent variables cannot be grade scores from junior high or high school; otherwise the model will not match the distribution of the college scores.

A regression model should be applied with a clear purpose and calculation method. If the user cannot fully understand or operate the model, its value diminishes. Therefore, a model that suits the population is the better choice for illustrating the trend.

If the regression model relies heavily on the sampling variance, then minimizing the variation that comes from sampling becomes more critical for obtaining the correct distribution.

In general, linear regression is less likely to be affected by sampling variance than polynomial or other complex regression models.

If the regression model contains difficult variables, its value will not be maximized. Therefore, selecting correct and appropriate independent variables is the right approach.

The following are examples of regression calculation setups for different occasions.

**Multivariate Regression**

The approach to multiple regression is similar to linear regression, except that a matrix formulation is applied, since the model is a set of linear equations. Bold letters indicate matrix form.

This uses the existing values (the observed outputs and inputs) to estimate the equation of the regression model.
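Assuming the standard least-squares solution in matrix form, the coefficients come from the normal equations (XᵀX)β = Xᵀy, where X holds a column of ones plus one column per independent variable. The following is a minimal plain-Python sketch of that idea; `solve`, `fit_multiple`, and the data are hypothetical.

```python
def solve(A, rhs):
    """Solve A * beta = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    beta = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (M[i][n] - s) / M[i][i]
    return beta


def fit_multiple(rows, ys):
    """Return [intercept, coef_1, coef_2, ...] from the normal equations."""
    # Design matrix X: a leading 1 for the intercept, then the inputs.
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # X^T X (k x k) and X^T y (length k).
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data generated from Y = 1 + 2*x1 + 3*x2 without noise.
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0)]
ys = [1.0 + 2.0 * x1 + 3.0 * x2 for x1, x2 in rows]
beta = fit_multiple(rows, ys)
print(beta)
```

In practice a numerical library's least-squares routine is usually preferred over forming XᵀX by hand, since it handles ill-conditioned inputs more robustly.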

This evaluates whether the slope of the regression is sufficient for the regression model's estimation.

Use the ANOVA method to calculate the effectiveness of the regression, and verify the coefficient of determination (R² value) to see how well the regression correlates with and matches the actual data.
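The ANOVA decomposition for a regression can be sketched as follows, assuming the usual definitions: total, regression, and error sums of squares, R² = 1 − SSE/SST, and F as the ratio of the regression mean square to the error mean square. The function name and the data are hypothetical; the predictions come from the least-squares line Y = 0.6x + 2.2 for these points.

```python
def regression_anova(actual, predicted, n_predictors):
    """Return the sums of squares, R squared, and F for a fitted model."""
    n = len(actual)
    mean_y = sum(actual) / n
    sst = sum((y - mean_y) ** 2 for y in actual)                # total
    ssr = sum((p - mean_y) ** 2 for p in predicted)             # regression
    sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))  # error
    r_squared = 1.0 - sse / sst
    df_regression = n_predictors
    df_error = n - n_predictors - 1
    f_stat = (ssr / df_regression) / (sse / df_error)
    return {"SST": sst, "SSR": ssr, "SSE": sse,
            "R2": r_squared, "F": f_stat}


# Hypothetical observations and their least-squares predictions.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
actual = [2.0, 4.0, 5.0, 4.0, 5.0]
predicted = [0.6 * x + 2.2 for x in xs]

result = regression_anova(actual, predicted, n_predictors=1)
print(result)
```

An R² close to 1 means the regression explains most of the variation in the data, and a large F suggests the model as a whole is significant once compared against the F distribution's critical value.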

This finalizes the average error relative to the estimated regression line. It also helps to evaluate the confidence interval of the estimated value.
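Assuming the quantity described here is the standard error of the estimate, s = sqrt(SSE / (n − p − 1)), a minimal sketch of it and of a confidence interval half-width for the mean response in simple linear regression might look like this. The function names and data are hypothetical, and the t critical value 3.182 (95%, two-sided, 3 degrees of freedom) is taken from a t-table rather than computed.

```python
import math


def standard_error_of_estimate(actual, predicted, n_predictors):
    """s = sqrt(SSE / (n - p - 1)): typical distance from the fitted line."""
    sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return math.sqrt(sse / (len(actual) - n_predictors - 1))


def mean_response_half_width(xs, x0, s, t_crit):
    """Half-width of the confidence interval for the mean of Y at x0
    (simple linear regression with one independent variable)."""
    n = len(xs)
    x_mean = sum(xs) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return t_crit * s * math.sqrt(1.0 / n + (x0 - x_mean) ** 2 / sxx)


# Hypothetical observations and their least-squares line Y = 0.6x + 2.2.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
actual = [2.0, 4.0, 5.0, 4.0, 5.0]
predicted = [0.6 * x + 2.2 for x in xs]

s = standard_error_of_estimate(actual, predicted, n_predictors=1)

# Assumed table value: t for 95% confidence with n - 2 = 3 degrees of freedom.
T_CRIT_95_DF3 = 3.182
half_width = mean_response_half_width(xs, x0=3.0, s=s, t_crit=T_CRIT_95_DF3)
print(s, half_width)
```

The interval is narrowest at the mean of x and widens as x0 moves away from it, which is why predictions far outside the observed range carry more uncertainty.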

The following illustration will demonstrate the full multivariate regression with given examples: