Simple linear regression

In statistics, linear regression is a method of estimating the conditional expected value of one variable y given the values of some other variable or variables x.

A linear regression model is typically stated in the form

y=\alpha+\beta*x+\varepsilon.

Usually, we assume x is deterministic and \varepsilon\sim N(0,\sigma_e^2). Conditionally on x,

y|x\sim N(\alpha+\beta*x,\sigma_e^2).

However, if x is itself random, with x\sim N(\mu_x,\sigma_x^2) independent of \varepsilon, then marginally

y\sim N(\alpha+\beta*\mu_x, \beta^2*\sigma_x^2+\sigma_e^2).

The mean follows from E(y)=E[E(y|x)]=\alpha+\beta*\mu_x, and the variance from the law of total variance:

var(y)=var[E(y|x)] + E[var(y|x)].

var(y_i|x_i)=\sigma_e^2, thus E[var(y_i|x_i)]=\sigma_e^2.

E(y_i|x_i)=\alpha+\beta*x_i, thus

var[E(y_i|x_i)]=var(\alpha+\beta*x_i)=\beta^2*\sigma_x^2.
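
The decomposition above can be checked numerically. The following is a minimal simulation sketch, not part of the original derivation: it assumes x and the error are independent normals, and the parameter values and the helper name simulate_y are made up for illustration.

```python
import random

def simulate_y(alpha, beta, mu_x, sigma_x, sigma_e, n, seed=0):
    """Draw n values of y = alpha + beta*x + e (hypothetical helper).

    Assumes x ~ N(mu_x, sigma_x^2) and e ~ N(0, sigma_e^2), independent.
    """
    rng = random.Random(seed)
    ys = []
    for _ in range(n):
        x = rng.gauss(mu_x, sigma_x)
        e = rng.gauss(0.0, sigma_e)
        ys.append(alpha + beta * x + e)
    return ys

# Illustrative parameter values, chosen arbitrarily.
alpha, beta, mu_x, sigma_x, sigma_e = 1.0, 2.0, 3.0, 1.5, 0.5
ys = simulate_y(alpha, beta, mu_x, sigma_x, sigma_e, n=200_000)

n = len(ys)
sample_mean = sum(ys) / n
sample_var = sum((y - sample_mean) ** 2 for y in ys) / (n - 1)

theory_mean = alpha + beta * mu_x
# var[E(y|x)] + E[var(y|x)] = beta^2 * sigma_x^2 + sigma_e^2
theory_var = beta ** 2 * sigma_x ** 2 + sigma_e ** 2

print("mean: sample", sample_mean, "theory", theory_mean)
print("var:  sample", sample_var, "theory", theory_var)
```

With a large sample, the two printed pairs should agree closely, up to simulation noise.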

R square, the proportion of the variance in y that is explained by x, is equal to

R^2=\frac{\beta^2*\sigma_x^2}{\beta^2*\sigma_x^2+\sigma_e^2}.
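
To see how this population formula relates to the R square reported by a fitted regression, here is a rough sketch that fits ordinary least squares to simulated data and compares the two. The function names population_r2 and ols_fit and all parameter values are assumptions made for this example.

```python
import random

def population_r2(beta, sigma_x, sigma_e):
    """R^2 implied by the model parameters (formula from the text)."""
    explained = beta ** 2 * sigma_x ** 2
    return explained / (explained + sigma_e ** 2)

def ols_fit(xs, ys):
    """Least-squares fit of y on x; returns (alpha_hat, beta_hat, r2)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    beta_hat = sxy / sxx
    alpha_hat = my - beta_hat * mx
    ss_res = sum((y - alpha_hat - beta_hat * x) ** 2 for x, y in zip(xs, ys))
    return alpha_hat, beta_hat, 1.0 - ss_res / syy

# Illustrative values: a weak slope relative to the noise.
alpha, beta, mu_x, sigma_x, sigma_e = 1.0, 0.5, 0.0, 1.0, 1.0
rng = random.Random(1)
xs = [rng.gauss(mu_x, sigma_x) for _ in range(50_000)]
ys = [alpha + beta * x + rng.gauss(0.0, sigma_e) for x in xs]

alpha_hat, beta_hat, r2_sample = ols_fit(xs, ys)
r2_pop = population_r2(beta, sigma_x, sigma_e)  # 0.25 / 1.25 = 0.2

print("population R^2:", r2_pop)
print("sample R^2:    ", r2_sample)
```

In this setup most of the variance in y comes from the error term, so R square is small even though the slope is clearly nonzero.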

Adjusted R square =1-(1-R^2)\frac{n-1}{n-k-1}, where n is the sample size and k is the number of predictors (k=1 for simple regression).
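
As a small illustration of the adjustment, the formula can be written as a function. This is only a sketch; the name adjusted_r2 and the example inputs are hypothetical, with n taken as the sample size and k as the number of predictors.

```python
def adjusted_r2(r2, n, k=1):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1).

    n is the sample size, k the number of predictors (k = 1 for
    simple regression). Requires n > k + 1.
    """
    if n <= k + 1:
        raise ValueError("need n > k + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# The penalty matters for small samples and fades as n grows.
for n in (10, 30, 1000):
    print(n, adjusted_r2(0.2, n, k=1))
```

For a fixed R square, the adjusted value is always smaller and approaches R square as n increases.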

R square is sometimes used to judge how well x can predict y. A large R square means that x is a good predictor of y. A small R square means we may need other variables to predict y well.

R square by itself does not tell us whether the model fits. For simple regression, the F-test is equivalent to the t-test of H_0: \beta=0 (F=t^2). If this test is significant, there is evidence of a linear relationship between y and x. Significance is not determined by the magnitude of R square alone: since F=(n-2)R^2/(1-R^2), a small R square can still be significant when n is large. However, if R square is very small, it usually means x is not a good predictor of y.
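
The relationship between the t-test, the F-test and R square in simple regression can be sketched in a few lines. The function name slope_tests and the simulated data are assumptions for this example. With the settings below the population R square is about 0.01, yet the sample is large.

```python
import math
import random

def slope_tests(xs, ys):
    """Return (t, F, R^2) for simple regression of y on x with intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    beta_hat = sxy / sxx
    alpha_hat = my - beta_hat * mx
    ss_res = sum((y - alpha_hat - beta_hat * x) ** 2 for x, y in zip(xs, ys))
    ss_reg = beta_hat * sxy            # regression sum of squares
    s2 = ss_res / (n - 2)              # residual variance estimate
    t = beta_hat / math.sqrt(s2 / sxx) # t statistic for H_0: beta = 0
    f = ss_reg / s2                    # F statistic with (1, n - 2) df
    r2 = ss_reg / syy
    return t, f, r2

# Illustrative data: weak slope (beta = 0.1), unit-variance x and error.
rng = random.Random(2)
n = 20_000
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
ys = [0.1 * x + rng.gauss(0.0, 1.0) for x in xs]

t, f, r2 = slope_tests(xs, ys)
print("t^2:", t * t, "F:", f)
print("(n-2)R^2/(1-R^2):", (n - 2) * r2 / (1 - r2))
print("R^2:", r2)
```

The first two printed lines should show the same number three ways, which is why a test can be significant at a large n even when R square is small.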

A related discussion of R square can be found at http://www.statisticalexperts.com/jianxu/2006/10/08/r2-confusion/.

Any comments are welcome.
