Linear regression is one of the simplest statistical methods for modeling and predicting the relationship between continuous variables.
Simple Linear Regression
Linear regression involving two variables: a dependent (response) variable $y$ and an independent (explanatory) variable $x$.
Indications of a Linear Relationship
- Scatter Diagram
  Plots observed pairs $(x_i, y_i)$ to visualize the relationship.
- Correlation Coefficient
  Measures the strength and direction of the linear relationship.
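
As a rough illustration of the correlation coefficient, the sketch below computes the sample (Pearson) value $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$ in plain Python. The data and the function name `pearson_r` are made up for this example.

```python
import math

# Hypothetical sample data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def pearson_r(xs, ys):
    """Sample correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    s_xx = sum((xi - x_bar) ** 2 for xi in xs)
    s_yy = sum((yi - y_bar) ** 2 for yi in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

r = pearson_r(x, y)
print(f"r = {r:.4f}")  # r = 0.7746
```

A value near $+1$ or $-1$ suggests a strong linear relationship, while a value near $0$ suggests little or none.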
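General population model:

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

For an individual observation:

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

where
- $\beta_0$ = intercept (value of $y$ when $x = 0$)
- $\beta_1$ = slope (rate of change of $y$ with respect to $x$)
- $\varepsilon_i$ = random error, assumed $\varepsilon_i \sim N(0, \sigma^2)$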
Coefficient of Determination
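Shows how much of the variance in the response ($y$) is explained by the explanatory variable ($x$). Denoted by $R^2$, it lies between 0 and 1, and in simple linear regression it equals the square of the correlation coefficient, $R^2 = r^2$.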
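
A minimal sketch of the computation, assuming the common form $R^2 = 1 - \text{ESS}/\text{TSS}$, where ESS is the sum of squared residuals and TSS is the total sum of squares about the mean. The helper `r_squared` and the data are hypothetical. The fitted values come from the line $\hat{y} = 2.2 + 0.6x$ evaluated at $x = 1, \dots, 5$.

```python
def r_squared(ys, y_hats):
    """R^2 = 1 - ESS / TSS for observed values ys and fitted values y_hats."""
    n = len(ys)
    y_bar = sum(ys) / n
    ess = sum((yi - fi) ** 2 for yi, fi in zip(ys, y_hats))  # unexplained variation
    tss = sum((yi - y_bar) ** 2 for yi in ys)                # total variation
    return 1 - ess / tss

# Hypothetical observations and the fitted values from y_hat = 2.2 + 0.6x.
y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]

r2 = r_squared(y, y_hat)
print(f"R^2 = {r2:.3f}")  # R^2 = 0.600
```

Here about 60% of the variation in $y$ is accounted for by the fitted line.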
Error Sum of Squares
Denoted by $\text{ESS}$, the sum of squared differences between the observed and fitted values:

$$\text{ESS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
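
To make the quantity concrete, the sketch below evaluates the error sum of squares for two candidate lines on the same hypothetical data. The function name `error_sum_of_squares` and both candidate lines are illustrative assumptions. The first candidate happens to be the least squares line for this data, so it yields the smaller value.

```python
def error_sum_of_squares(xs, ys, b0, b1):
    """Sum of squared residuals for the candidate line y_hat = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(xs, ys))

# Hypothetical sample data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

ess_a = error_sum_of_squares(x, y, 2.2, 0.6)  # candidate line y_hat = 2.2 + 0.6x
ess_b = error_sum_of_squares(x, y, 1.0, 1.0)  # candidate line y_hat = 1 + x

print(f"ESS for 2.2 + 0.6x: {ess_a:.2f}")  # ESS for 2.2 + 0.6x: 2.40
print(f"ESS for 1 + x:      {ess_b:.2f}")  # ESS for 1 + x:      4.00
```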
Estimation of Parameters
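Suppose the fitted regression line is $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$. The goal is to find the $\hat{\beta}_0$ and $\hat{\beta}_1$ that minimize the error sum of squares $\text{ESS} = \sum (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$. The least squares method is used for this.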
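Setting the partial derivatives of the error sum of squares with respect to $\hat{\beta}_0$ and $\hat{\beta}_1$ to zero gives the normal equations:

$$\sum y_i = n\hat{\beta}_0 + \hat{\beta}_1 \sum x_i$$

$$\sum x_i y_i = \hat{\beta}_0 \sum x_i + \hat{\beta}_1 \sum x_i^2$$

Solving gives:

$$\hat{\beta}_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

Alternate form using deviations:

$$\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}}$$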
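
The least squares estimates can be sketched in a few lines of plain Python. The example below computes the slope using both the raw-sums form and the deviation form, which should agree up to floating-point rounding. The function names and data are hypothetical.

```python
def slope_from_raw_sums(xs, ys):
    """Slope from the solved normal equations (raw-sums form)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(xi * yi for xi, yi in zip(xs, ys))
    sum_x2 = sum(xi ** 2 for xi in xs)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

def fit_simple_linear(xs, ys):
    """Return (b0, b1) using the deviation form b1 = S_xy / S_xx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    s_xx = sum((xi - x_bar) ** 2 for xi in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar  # the fitted line passes through (x_bar, y_bar)
    return b0, b1

# Hypothetical sample data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_simple_linear(x, y)
b1_raw = slope_from_raw_sums(x, y)
print(f"y_hat = {b0:.1f} + {b1:.1f} x")    # y_hat = 2.2 + 0.6 x
print(f"raw-sums slope = {b1_raw:.1f}")    # raw-sums slope = 0.6
```

The deviation form is usually preferred in numerical work because the raw-sums form subtracts two large, nearly equal quantities when the $x$ values are far from zero.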
Sampling Distribution of Beta
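Under the normal error assumption $\varepsilon_i \sim N(0, \sigma^2)$:

$$\hat{\beta}_1 \sim N\!\left(\beta_1, \frac{\sigma^2}{S_{xx}}\right)$$

where $S_{xx} = \sum (x_i - \bar{x})^2$.

When $\sigma^2$ is unknown, estimate it using:

$$s^2 = \hat{\sigma}^2 = \frac{\sum (y_i - \hat{y}_i)^2}{n - 2}$$

A $100(1 - \alpha)\%$ confidence interval for the true slope $\beta_1$ is:

$$\hat{\beta}_1 \pm t_{\alpha/2,\, n-2} \, \frac{s}{\sqrt{S_{xx}}}$$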
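
A minimal sketch of the interval, assuming a 95% level and $n = 5$ observations, so that the critical value $t_{0.025,\,3} = 3.182$ can be copied from a standard t-table (the Python standard library has no t-distribution). The function name and data are hypothetical.

```python
import math

# Assumption: 95% interval with n - 2 = 3 degrees of freedom;
# critical value t_{0.025, 3} taken from a standard t-table.
T_CRIT = 3.182

def slope_confidence_interval(xs, ys, t_crit):
    """Return (lower, upper) bounds of b1 +/- t_crit * s / sqrt(S_xx)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in xs)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    # Estimate sigma^2 by the error sum of squares divided by n - 2.
    ess = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(xs, ys))
    s = math.sqrt(ess / (n - 2))
    half_width = t_crit * s / math.sqrt(s_xx)
    return b1 - half_width, b1 + half_width

# Hypothetical sample data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

lower, upper = slope_confidence_interval(x, y, T_CRIT)
print(f"95% CI for slope: ({lower:.3f}, {upper:.3f})")  # 95% CI for slope: (-0.300, 1.500)
```

Because this interval contains zero, the data alone do not rule out a slope of zero at the 5% level.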
Hypothesis Testing on the Regression Coefficient
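To test whether $x$ significantly predicts $y$:

$$H_0: \beta_1 = 0 \quad \text{vs.} \quad H_1: \beta_1 \neq 0$$

Test statistic:

$$t = \frac{\hat{\beta}_1}{s / \sqrt{S_{xx}}} \sim t_{n-2} \quad \text{under } H_0$$

where $s^2 = \sum (y_i - \hat{y}_i)^2 / (n - 2)$ and $S_{xx} = \sum (x_i - \bar{x})^2$.

If $|t| > t_{\alpha/2,\, n-2}$, reject $H_0$, which means the linear relationship between $x$ and $y$ is statistically significant.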
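
The test can be sketched as follows, assuming a two-sided test at $\alpha = 0.05$ with $n = 5$, so the table value $t_{0.025,\,3} = 3.182$ applies. The function name and data are hypothetical.

```python
import math

# Assumption: two-sided test at alpha = 0.05 with n - 2 = 3 degrees of freedom;
# critical value copied from a standard t-table.
T_CRIT = 3.182

def slope_t_statistic(xs, ys):
    """t = b1 / (s / sqrt(S_xx)) for testing H0: beta_1 = 0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in xs)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    ess = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(xs, ys))
    s = math.sqrt(ess / (n - 2))  # residual standard error
    return b1 / (s / math.sqrt(s_xx))

# Hypothetical sample data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

t_stat = slope_t_statistic(x, y)
reject_h0 = abs(t_stat) > T_CRIT
print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")  # t = 2.121, reject H0: False
```

With only five observations the slope estimate of 0.6 is not large enough relative to its standard error to reject $H_0$.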
Analysis of Variance (ANOVA) for Regression
Tests whether the regression line fits the data well.
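| Source of Variation | Sum of Squares (SS) | df | Mean Square (MS) |
|---|---|---|---|
| Regression (RSS) | $\sum (\hat{y}_i - \bar{y})^2$ | $1$ | $\text{MSR} = \text{RSS} / 1$ |
| Error (ESS) | $\sum (y_i - \hat{y}_i)^2$ | $n - 2$ | $\text{MSE} = \text{ESS} / (n - 2)$ |
| Total (TSS) | $\sum (y_i - \bar{y})^2$ | $n - 1$ | |

Here:
- $y_i$: Actual observed value of the dependent variable for observation $i$
- $\hat{y}_i$: Predicted value of $y_i$ from the regression line
- $\bar{y}$: Mean of all observed $y$ values (overall average)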
If RSS is large relative to ESS, the model fits well. This comparison is quantified by the F-ratio.
F-ratio
$$F = \frac{\text{MSR}}{\text{MSE}} = \frac{\text{RSS} / 1}{\text{ESS} / (n - 2)}$$

Decision Rule
Reject $H_0: \beta_1 = 0$ if $F > F_{\alpha;\, 1,\, n-2}$.
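
A minimal sketch of the ANOVA decomposition and F-test, assuming $\alpha = 0.05$ and $n = 5$, so the table value $F_{0.05;\,1,\,3} = 10.13$ applies. The function name `regression_anova` and the data are hypothetical. In simple linear regression the F-ratio equals the square of the slope's t statistic.

```python
# Assumption: alpha = 0.05 with (1, n - 2) = (1, 3) degrees of freedom;
# critical value copied from a standard F-table.
F_CRIT = 10.13

def regression_anova(xs, ys):
    """Return (rss, ess, tss, f_ratio) for a simple linear regression fit."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in xs)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    y_hats = [b0 + b1 * xi for xi in xs]
    rss = sum((fi - y_bar) ** 2 for fi in y_hats)            # regression SS, df = 1
    ess = sum((yi - fi) ** 2 for yi, fi in zip(ys, y_hats))  # error SS, df = n - 2
    tss = sum((yi - y_bar) ** 2 for yi in ys)                # total SS, df = n - 1
    f_ratio = (rss / 1) / (ess / (n - 2))
    return rss, ess, tss, f_ratio

# Hypothetical sample data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

rss, ess, tss, f_ratio = regression_anova(x, y)
print(f"RSS = {rss:.2f}, ESS = {ess:.2f}, TSS = {tss:.2f}")  # RSS = 3.60, ESS = 2.40, TSS = 6.00
print(f"F = {f_ratio:.3f}, reject H0: {f_ratio > F_CRIT}")   # F = 4.500, reject H0: False
```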