1.6 Correlation & Regression

1.6.1 Why This Chapter is Important

The discovery that the universe is expanding relied on the ideas in this chapter!

In this chapter we learn:

  • How to make bread without wheat/flour

1.6.2 Scatter Plot

1.6.3 Sequence

Scatter Plot \(\rightarrow\) Correlation \(\rightarrow\) Regression

| Scatter Plot | Correlation | Regression |
|---|---|---|
| Preliminary idea about relationship | Measures linear relationship | Measures influence |
| Either variable can be independent (usually) | Does not clarify dependency | Predicts dependent variable based on independent one |

1.6.4 Correlation

Linear relationship between two variables

Correlation, \(r = \frac{\frac{1}{n}\sum (x_i - \bar x)(y_i - \bar y)}{\sqrt{\frac{\sum(x_i - \bar x)^2}{n}\frac{\sum(y_i - \bar y)^2}{n}}}; -1 \le r \le 1\)

  • \(r = \frac{Cov(x,y)}{\sigma_x \sigma_y}\)

  • Compare with \[\sigma ^2 = \sum_{i=1}^n \frac{(x_i-\bar x)^2}{n}\]
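As a short worked step, assuming the covariance uses the same divisor \(n\) as the variance above, i.e. \(Cov(x,y) = \frac{1}{n}\sum (x_i - \bar x)(y_i - \bar y)\), the factors of \(n\) cancel and r can be computed from three sums alone:

\[
r = \frac{Cov(x,y)}{\sigma_x \sigma_y}
  = \frac{\frac{1}{n}\sum (x_i - \bar x)(y_i - \bar y)}{\sqrt{\frac{\sum (x_i - \bar x)^2}{n}}\,\sqrt{\frac{\sum (y_i - \bar y)^2}{n}}}
  = \frac{\sum (x_i - \bar x)(y_i - \bar y)}{\sqrt{\sum (x_i - \bar x)^2 \, \sum (y_i - \bar y)^2}}
\]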

1.6.5 Scatter Plot And Correlation

\(r^2=R^2 \rightarrow\) Coefficient of determination

\(R^2 = 80\% \rightarrow\) 80% of total variation in Y (say, brightness of stars) can be explained by X (say, distance).
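The interpretation of \(R^2\) as "explained variation" can be checked numerically. The sketch below uses hypothetical data and assumes a least-squares straight line fitted to Y on X (the source does not specify the fitting method). It computes \(R^2\) once as 1 minus the unexplained share of the variation and once as \(r^2\).

```python
# Hypothetical sample data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of products and squares of deviations from the means.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)  # total variation in y

# Least-squares line of y on x (assumed fitting method).
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * xi for xi in x]

# Variation in y that the line leaves unexplained.
unexplained = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

r_squared_from_fit = 1 - unexplained / s_yy
r_squared_from_r = s_xy ** 2 / (s_xx * s_yy)

print(round(r_squared_from_fit, 4), round(r_squared_from_r, 4))
```

For this data both values come out as 0.6, so 60% of the variation in y is explained by x.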

1.6.6 r: Estimating Mechanism

Make a table with columns for

  • \((x_i-\bar x)\)
  • \((y_i-\bar y)\)
  • \((x_i-\bar x)(y_i-\bar y)\)
  • \((x_i-\bar x)^2\)
  • \((y_i-\bar y)^2\)

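Then sum the last three columns and substitute the totals into the formula for r. (The first two columns each sum to zero, which is a useful check.)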
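The table method can be sketched in a few lines of Python. The data and the helper names `correlation_table` and `pearson_r` are hypothetical, introduced only for this illustration. Each row of the table holds the five columns listed above.

```python
from math import sqrt

# Hypothetical sample data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def correlation_table(x, y):
    """Build one row per observation: (dx, dy, dx*dy, dx^2, dy^2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    rows = []
    for xi, yi in zip(x, y):
        dx = xi - x_bar
        dy = yi - y_bar
        rows.append((dx, dy, dx * dy, dx ** 2, dy ** 2))
    return rows

def pearson_r(x, y):
    """Sum the last three columns and substitute them into the formula for r."""
    rows = correlation_table(x, y)
    s_xy = sum(row[2] for row in rows)
    s_xx = sum(row[3] for row in rows)
    s_yy = sum(row[4] for row in rows)
    return s_xy / sqrt(s_xx * s_yy)

table = correlation_table(x, y)
r = pearson_r(x, y)
print(round(r, 4))
```

Here the column totals are 6, 10 and 6, so r = 6 / sqrt(60), roughly 0.77.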

1.6.7 Example of r

1.6.8 Features of r

  • Independent of origin and scale
  • \(-1 \le r \le 1\)
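  • \(r = \pm\sqrt{b_{yx} \cdot b_{xy}}\), taking the common sign of the two regression coefficients (in magnitude, r is their geometric mean)
  • \(\left|\frac{b_{yx}+b_{xy}}{2}\right| \ge |r|\) (in magnitude, the arithmetic mean of the regression coefficients is at least r)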
  • \(r = 0 \rightarrow\) no linear relationship
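These features can be checked on a small example. The sketch below uses hypothetical data and assumes the usual least-squares definitions \(b_{yx} = \frac{\sum (x_i-\bar x)(y_i-\bar y)}{\sum (x_i-\bar x)^2}\) and \(b_{xy} = \frac{\sum (x_i-\bar x)(y_i-\bar y)}{\sum (y_i-\bar y)^2}\), which the source does not spell out. The helper names are illustrative only.

```python
from math import sqrt

def sums_of_deviations(x, y):
    """Return (S_xy, S_xx, S_yy): sums of products and squares of deviations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy, s_xx, s_yy

def pearson_r(x, y):
    s_xy, s_xx, s_yy = sums_of_deviations(x, y)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical sample data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)

# Feature 1: shifting the origin and rescaling by positive factors
# leaves r unchanged.
u = [10 + 2 * xi for xi in x]
v = [-3 + 0.5 * yi for yi in y]
r_shifted = pearson_r(u, v)

# Features 3 and 4: geometric and arithmetic mean of the regression
# coefficients (assumed least-squares slopes; both are positive here).
s_xy, s_xx, s_yy = sums_of_deviations(x, y)
b_yx = s_xy / s_xx
b_xy = s_xy / s_yy
geometric_mean = sqrt(b_yx * b_xy)
arithmetic_mean = (b_yx + b_xy) / 2

# Feature 5: r = 0 rules out a linear relationship, not every relationship.
# Here y is exactly x squared, yet r is zero.
x_sym = [-2, -1, 0, 1, 2]
y_sq = [xi ** 2 for xi in x_sym]
r_parabola = pearson_r(x_sym, y_sq)

print(round(r, 4), round(r_shifted, 4))
print(round(geometric_mean, 4), round(arithmetic_mean, 4))
print(r_parabola)
```

Note that rescaling by a negative factor would flip the sign of r while keeping its magnitude.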

1.6.9 Rank Correlation

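| Competitor | Judge_1 | Judge_2 | rank_1 | rank_2 |
|---|---|---|---|---|
| 1 | 20 | 15 | 1 | 4 |
| 2 | 18 | 20 | 3 | 1 |
| 3 | 16 | 14 | 5 | 5 |
| 4 | 17 | 13 | 4 | 6 |
| 5 | 15 | 18 | 6 | 2 |
| 6 | 12 | 10 | 9 | 8 |
| 7 | 11 | 17 | 10 | 3 |
| 8 | 19 | 9 | 2 | 9 |
| 9 | 14 | 12 | 7 | 7 |
| 10 | 13 | 8 | 8 | 10 |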

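Rank correlation coefficient, \(\rho = 1- \frac{6 \sum d_i^2}{n(n^2-1)}\), where \(d_i\) is the difference between the two ranks of competitor i and n is the number of competitors. This form of the formula applies when there are no tied ranks.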
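The coefficient for the judges' table can be computed as in the sketch below. The scores are copied from the table; the helper name `ranks_descending` is hypothetical, and the ranking assumes no tied scores (which holds for this data), with rank 1 given to the highest score.

```python
# Scores given by the two judges to competitors 1..10 (from the table).
judge_1 = [20, 18, 16, 17, 15, 12, 11, 19, 14, 13]
judge_2 = [15, 20, 14, 13, 18, 10, 17, 9, 12, 8]

def ranks_descending(scores):
    """Rank 1 goes to the highest score; assumes there are no ties."""
    ordered = sorted(scores, reverse=True)
    return [ordered.index(s) + 1 for s in scores]

rank_1 = ranks_descending(judge_1)
rank_2 = ranks_descending(judge_2)

# Squared rank differences d_i^2 for each competitor.
d_squared = [(a - b) ** 2 for a, b in zip(rank_1, rank_2)]

n = len(judge_1)
rho = 1 - 6 * sum(d_squared) / (n * (n ** 2 - 1))

print(sum(d_squared), round(rho, 4))
```

For this table the squared differences sum to 136, giving a coefficient of about 0.18, so the two judges agree only weakly.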

1.6.10 Linear Equation/ Straight Lines

\(y = c + mx\), where m is the slope and c is the intercept.

\(m = \frac{dy}{dx} = \tan \theta\): the change in y per unit change in x, where \(\theta\) is the angle the line makes with the positive x-axis.
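As a small worked example with hypothetical points, take the line through (1, 3) and (4, 9):

\[
m = \frac{9 - 3}{4 - 1} = 2, \qquad c = 3 - 2 \cdot 1 = 1, \qquad y = 1 + 2x, \qquad \theta = \arctan 2 \approx 63.4^\circ
\]

Each unit increase in x raises y by 2.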

Bread without flour or wheat!

1.6.11 Purity of Coefficients

  • \(r = \frac{\frac{1}{n}\sum (x_i - \bar x)(y_i - \bar y)}{\sqrt{\frac{\sum(x_i - \bar x)^2}{n}\frac{\sum(y_i - \bar y)^2}{n}}}\)
  • \(\rho = 1- \frac{6 \sum d_i^2}{n(n^2-1)}\)
  • b or \(\beta\)
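The list gives no formula for b. Assuming b denotes the usual least-squares regression coefficients (the source does not state this), they can be written with the same sums that appear in r:

\[
b_{yx} = \frac{\sum (x_i - \bar x)(y_i - \bar y)}{\sum (x_i - \bar x)^2} = r\,\frac{\sigma_y}{\sigma_x},
\qquad
b_{xy} = \frac{\sum (x_i - \bar x)(y_i - \bar y)}{\sum (y_i - \bar y)^2} = r\,\frac{\sigma_x}{\sigma_y},
\qquad
b_{yx}\, b_{xy} = r^2
\]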