| Abdullah Al Mahmud | www.statmania.info |
Why needed?
##                    mpg cyl  disp  hp drat
## Mazda RX4         21.0   6 160.0 110 3.90
## Mazda RX4 Wag     21.0   6 160.0 110 3.90
## Datsun 710        22.8   4 108.0  93 3.85
## Hornet 4 Drive    21.4   6 258.0 110 3.08
## Hornet Sportabout 18.7   8 360.0 175 3.15
## Valiant           18.1   6 225.0 105 2.76
## Duster 360        14.3   8 360.0 245 3.21
## Merc 240D         24.4   4 146.7  62 3.69
##       mpg             cyl           disp
##  Min.   :14.30   Min.   :4.0   Min.   :108.0
##  1st Qu.:18.55   1st Qu.:5.5   1st Qu.:156.7
##  Median :21.00   Median :6.0   Median :192.5
##  Mean   :20.21   Mean   :6.0   Mean   :222.2
##  3rd Qu.:21.75   3rd Qu.:6.5   3rd Qu.:283.5
##  Max.   :24.40   Max.   :8.0   Max.   :360.0
\(AM=\bar x=\frac{\sum x}{n}\)
If there are frequencies or weights
\(\bar x=\frac{\sum f_i x_i}{\sum f_i} \text{ or } \frac{\sum w_i x_i}{\sum w_i}\)
## [1] 10 55 38 48 51 25 14 44 23 22
## [1] 33
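As a quick check of the definition, here is a minimal Python sketch that recomputes the mean of the ten values shown above. The variable names are illustrative, and Python is used here only for demonstration.

```python
from statistics import mean

# Values copied from the output shown above.
x = [10, 55, 38, 48, 51, 25, 14, 44, 23, 22]

am_manual = sum(x) / len(x)  # definition: sum of the values divided by their count
am_library = mean(x)         # the standard library gives the same answer

print(am_manual, am_library)
```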
Find AM: 2, 2, 3, 3, 5, 5, 5, 8, 8, 9
There are 2 ways.
\(\bar x = \frac{2+2+\dots+9}{10}\)
## [1] 5
For grouped data
Working hours (x) | Employees (f) | fx |
---|---|---|
2 | 2 | 4 |
3 | 2 | 6 |
5 | 3 | 15 |
8 | 2 | 16 |
9 | 1 | 9 |
Total | \(\sum f =10\) | \(\sum fx = 50\) |
\(\therefore \bar x = \frac{\sum fx}{\sum f} =\frac{50}{10}=5\)
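The grouped-data calculation can be sketched in Python as follows. The list names are illustrative; the values are the working hours and employee counts from the table.

```python
# Working hours (x) and number of employees (f) from the table.
hours = [2, 3, 5, 8, 9]
employees = [2, 2, 3, 2, 1]

# The fx column: each value multiplied by its frequency.
fx = [f * x for x, f in zip(hours, employees)]

total_f = sum(employees)   # sum of f
total_fx = sum(fx)         # sum of fx
grouped_mean = total_fx / total_f

print(fx, total_f, total_fx, grouped_mean)
```

This gives the same mean as adding the ten raw values one by one, which is the first of the two ways mentioned above.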
Suppose different judges give different scores, but not all evaluations carry the same weight.
Judge | Rating (x) | Weight (w) | wx |
---|---|---|---|
1 | 8 | 2 | 16 |
2 | 7 | 3 | 21 |
3 | 4 | 5 | 20 |
4 | 5 | 1 | 5 |
5 | 7 | 3 | 21 |
Total | | \(\sum w_i = 14\) | \(\sum w_ix_i = 83\) |
\(\therefore \bar x = \frac{\sum w_ix_i}{\sum w_i} = \frac{83}{14}\)
## [1] 5.93
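A minimal sketch of the same weighted mean in Python, using the ratings and weights of the five judges. The names are illustrative.

```python
# Ratings (x) and weights (w) of the five judges.
ratings = [8, 7, 4, 5, 7]
weights = [2, 3, 5, 1, 3]

sum_w = sum(weights)
sum_wx = sum(w * x for w, x in zip(weights, ratings))
weighted_mean = sum_wx / sum_w

print(sum_w, sum_wx, round(weighted_mean, 2))
```

Note that the unweighted mean of the ratings is 6.2, so the heavy weight on the lowest rating pulls the weighted mean down.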
Calculate the mean in a smart way
## [1] 1009 1037 1047 1024 1013 1043
Subtract a constant from every value, say 1020
## [1] "The new values are"
## [1] -11 17 27 4 -7 23
## [1] "Mean of y is 8.83"
## [1] "Mean of x is 1028.83"
Consider the values: 1005, 1010, 1015
If 1000 is subtracted: 5, 10, 15
If the results are then divided by 5: 1, 2, 3
Converted Mean = 2
Original Mean = \(2 \times 5 + 1000=1010\)
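The shortcut (change of origin and scale) can be sketched as a small Python function. The function name `mean_by_coding` and its parameters are hypothetical, chosen only for this illustration.

```python
def mean_by_coding(values, origin, scale=1):
    """Mean via coded values y = (x - origin) / scale, then transformed back."""
    coded = [(v - origin) / scale for v in values]
    coded_mean = sum(coded) / len(coded)
    return coded_mean * scale + origin  # undo the scaling, then the shift

# First example: subtract 1020 from every value.
x = [1009, 1037, 1047, 1024, 1013, 1043]
mean_x = mean_by_coding(x, origin=1020)

# Second example: subtract 1000, then divide by 5.
mean_small = mean_by_coding([1005, 1010, 1015], origin=1000, scale=5)

print(round(mean_x, 2), mean_small)
```

The result does not depend on which origin or scale is chosen; a convenient choice only makes the arithmetic smaller.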
x = 1005, 1010, 1015
\(GM=(x_1 \times x_2 \times ... \times x_n)^{(1/n)}\) or \(GM=(x_1^{f_1} \times x_2^{f_2} \times ... \times x_n^{f_n})^{(1/\sum f_i)}\)
Find GM: 2, 4, 6
3.63
Try this one: 20020, 30080, 50086, 40130
“An admirable artifice which, by reducing to a few days the labour of many months, doubles the life of the astronomer, and spares him the errors and disgust inseparable from long calculations.”
— Pierre-Simon Laplace
x = 20020, 30080, 50086, 40130
33168.96
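For large values the product becomes unwieldy, which is why logarithms help. Below is a minimal Python sketch that computes the GM as the antilog of the mean of the logs and compares it with the direct product formula. The function name is illustrative.

```python
import math

def geometric_mean_via_logs(values):
    """GM as the antilog (base 10) of the mean of log10 of the values."""
    logs = [math.log10(v) for v in values]
    return 10 ** (sum(logs) / len(logs))

gm_small = geometric_mean_via_logs([2, 4, 6])

large = [20020, 30080, 50086, 40130]
gm_large = geometric_mean_via_logs(large)

# Direct formula for comparison: n-th root of the product.
gm_direct = math.prod(large) ** (1 / len(large))

print(round(gm_small, 2), round(gm_large, 2), round(gm_direct, 2))
```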
Marks | # Students |
---|---|
10-12 | 4 |
12-14 | 5 |
14-16 | 3 |
16-18 | 5 |
18-20 | 7 |
20-22 | 2 |
Make a table using these columns: \(x_i, f_i, \log x_i, f_i \log x_i\)
15.5865773
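The table-based calculation can be sketched in Python. This assumes, as is usual for grouped data, that \(x_i\) is the midpoint of each class; the names are illustrative.

```python
import math

# Class limits and frequencies from the marks table.
classes = [(10, 12), (12, 14), (14, 16), (16, 18), (18, 20), (20, 22)]
freqs = [4, 5, 3, 5, 7, 2]

# Assumption: each class is represented by its midpoint x_i.
midpoints = [(lo + hi) / 2 for lo, hi in classes]

# Columns log x_i and f_i * log x_i.
log_x = [math.log10(x) for x in midpoints]
f_log_x = [f * lx for f, lx in zip(freqs, log_x)]

grouped_gm = 10 ** (sum(f_log_x) / sum(freqs))
print(round(grouped_gm, 4))
```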
Less affected by outliers than AM
x = 5, 10, 15, 20, 100, 1000
log(x) = 0.7, 1, 1.18, 1.3, 2, 3
Less affected by sample fluctuation
Suitable for further analysis
What if one or some x = 0?
What if one or some x < 0?
S = 150 km
\(v_1 = 10 \text{ km/h}, v_2 = 15 \text{ km/h}, v_3 = 20 \text{ km/h}\)
What is the average speed?
Formula: Reciprocal of Mean of \(\frac{1}{x_i}\)
Reciprocal of \(\frac{\frac{1}{x_1}+\frac{1}{x_2}+...+\frac{1}{x_n}}{n}\)
Thus, \(HM = \frac{n}{\sum \frac{1}{x_i}}\)
Calculate: 2, 4, 8
3.43
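A short Python sketch of the HM formula follows. It also applies the formula to the three speeds given earlier, under the assumption that the same distance is covered at each speed. The helper name `hm` is illustrative; `statistics.harmonic_mean` is the standard-library equivalent.

```python
from statistics import harmonic_mean

def hm(values):
    """Harmonic mean: n divided by the sum of the reciprocals."""
    return len(values) / sum(1 / v for v in values)

hm_248 = hm([2, 4, 8])

# Assumption: equal distances are travelled at 10, 15 and 20 km/h.
hm_speeds = hm([10, 15, 20])

print(round(hm_248, 2), round(hm_speeds, 2), round(harmonic_mean([2, 4, 8]), 2))
```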
For grouped data
Suppose a bus travels 10 km at 10 kph, another 15 km at 20 kph, and another 20 km at 25 kph. What is the average speed?
HM \(\rightarrow\) consider distances as weights
AM \(\rightarrow\) consider times as weights
Time, \(t=\frac d v=\) 1, 0.75, 0.8
A passerby travels 10 km at 20 kph, 5 km at 15 kph, and 4 km at 12 kph. What is the average speed? (Use weighted HM)
Here, distances are different. Consider them weights.
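Both speed problems can be sketched in Python as total distance divided by total time, which is the HM weighted by distance. The function name `average_speed` is illustrative. For the bus, the sketch also checks that the AM weighted by time gives the same answer.

```python
def average_speed(distances, speeds):
    """Distance-weighted harmonic mean: total distance over total time."""
    times = [d / v for d, v in zip(distances, speeds)]
    return sum(distances) / sum(times)

# Bus: 10 km at 10 kph, 15 km at 20 kph, 20 km at 25 kph.
bus_speeds = [10, 20, 25]
bus = average_speed([10, 15, 20], bus_speeds)

# The same answer from the AM with times (1, 0.75, 0.8 hours) as weights.
bus_times = [1, 0.75, 0.8]
bus_am = sum(t * v for t, v in zip(bus_times, bus_speeds)) / sum(bus_times)

# Passerby: 10 km at 20 kph, 5 km at 15 kph, 4 km at 12 kph.
passerby = average_speed([10, 5, 4], [20, 15, 12])

print(round(bus, 2), round(bus_am, 2), round(passerby, 2))
```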
\(QM=\sqrt{\frac{x_1^2+x_2^2+...+x_n^2}{n}}=\sqrt{\frac{\sum x_i^2}{n}}\)
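A minimal Python sketch of the quadratic mean. The values 2, 4, 8 are reused from the HM exercise purely for illustration; they are not part of the original QM material.

```python
import math

def quadratic_mean(values):
    """Square root of the mean of the squared values."""
    return math.sqrt(sum(v * v for v in values) / len(values))

values = [2, 4, 8]          # illustrative values only
qm = quadratic_mean(values)
am = sum(values) / len(values)

print(round(qm, 2), round(am, 2))
```

For these values the QM is larger than the AM, which is the usual ordering for non-negative data that are not all equal.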
x = 4,5,6,8,9,11,16
y = 4,5,6,8,9,11,16,19
\(Q_1= \frac{n+1}{4}\text{th value}\)
\(Q_2, Q_3=?\)
Find \(Q_1, Q_2, Q_3\)
X = 4, 6, 7, 10, 12, 13, 14, 15, 16, 17, 19
For odd n,
\(A_i= \frac{i \times (n+1)}{k}\text{th value}\)
For even n,
\(A_i=\frac{\left(\frac{i \times n}{k}\right)\text{th value}+\left(\frac{i\times n}{k}+1\right)\text{th value}}{2}\)
where k = number of partitions
For median, for example, k = 2.
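The odd-n position formula can be sketched in Python for the 11 values of X listed earlier. The function name `partition_value` is hypothetical. When the position is not a whole number, this sketch interpolates linearly between the two neighbouring values and clamps the position to the range 1 to n; both choices are assumptions, since the formula above does not specify them.

```python
import math

def partition_value(sorted_x, i, k):
    """i-th of k partition values using position i*(n+1)/k (1-based)."""
    n = len(sorted_x)
    pos = i * (n + 1) / k
    pos = min(max(pos, 1), n)      # assumption: clamp to a valid position
    lower = math.floor(pos)
    frac = pos - lower
    if frac == 0:
        return sorted_x[lower - 1]
    # Assumption: linear interpolation for fractional positions.
    return sorted_x[lower - 1] + frac * (sorted_x[lower] - sorted_x[lower - 1])

X = [4, 6, 7, 10, 12, 13, 14, 15, 16, 17, 19]

q1 = partition_value(X, 1, 4)   # 3rd value
q2 = partition_value(X, 2, 4)   # 6th value (the median)
q3 = partition_value(X, 3, 4)   # 9th value
d3 = partition_value(X, 3, 10)  # position 3.6, interpolated

print(q1, q2, q3, d3)
```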
X = 4, 6, 7, 10, 12, 13, 14, 15, 16, 17, 19
Find \(D_3, D_8, P_{17}, P_{56}, P_{93}\)
Age | # Employees | CF |
---|---|---|
20-24 | 40 | |
25-29 | 60 | |
30-34 | 200 | |
35-39 | 180 | |
40-44 | 150 | |
45-49 | 110 | |
50-54 | 175 | |
55-59 | 60 | |
60-64 | 25 | |
Median: \(Me = L + \frac{\frac{n}{2}-F_c}{f_m}\times c\)
Mode: \(M_o=L+\frac{\Delta_1}{\Delta_1+\Delta_2}\times c\)
Quartiles: \(Q_i = L + \frac{\frac{in}{4}-F_c}{f}\times c\)
Find AM, Me, Mo, Q, and \(P_{30}\)
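The grouped formulas can be sketched in Python for the age table. This sketch assumes the inclusive classes (20-24, 25-29, ...) are converted to class boundaries (19.5-24.5, ...) so that \(L\) is a lower boundary and \(c = 5\); the function and variable names are illustrative.

```python
from itertools import accumulate

lower_limits = [20, 25, 30, 35, 40, 45, 50, 55, 60]
freqs = [40, 60, 200, 180, 150, 110, 175, 60, 25]
width = 5

# Assumption: lower class boundary L = lower limit - 0.5.
boundaries = [lo - 0.5 for lo in lower_limits]
cum_freq = list(accumulate(freqs))   # the CF column
n = cum_freq[-1]

def grouped_quantile(i, k):
    """L + ((i*n/k - F_c) / f) * c for the class containing position i*n/k."""
    target = i * n / k
    for idx, cf in enumerate(cum_freq):
        if cf >= target:
            below = cum_freq[idx - 1] if idx > 0 else 0   # F_c
            return boundaries[idx] + (target - below) / freqs[idx] * width
    raise ValueError("position lies beyond the last class")

median = grouped_quantile(1, 2)
q1 = grouped_quantile(1, 4)
p30 = grouped_quantile(30, 100)

# Mode: class with the highest frequency (here an interior class).
m = freqs.index(max(freqs))
delta1 = freqs[m] - freqs[m - 1]
delta2 = freqs[m] - freqs[m + 1]
mode = boundaries[m] + delta1 / (delta1 + delta2) * width

print(cum_freq, round(median, 2), q1, p30, mode)
```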
Property | AM | GM | HM | Median | Mode |
---|---|---|---|---|---|
Formula | Easy to understand | Easy | Easy | Easy | Easy |
Considers values | All | All | All | Middle term(s) | Highest frequency |
Computation | Easy | Easy | Easy | Easy | Easy |
Effect of Outliers | Highly affected | Less affected than AM | Less affected than GM | Unaffected | Can be highly affected |
Effect of Sampling Fluctuation | Less affected | Less affected | Less affected | Can be highly affected | Can be highly affected |
Suitability for further analysis | Possible | Possible | Possible | Possible | Possible |
If \(x_1=x_2=x_3\); prove it
\[\sum_{i=1}^n (x_i-\bar x)=0\]
\[\sum_{i=1}^n f_i(x_i-\bar x)=0\]
Theorem 03
\[\sum_{i=1}^n (x_i-\bar x)^2 \lt \sum_{i=1}^n (x_i-a)^2; a\ne\bar x\]
\[\sum_{i=1}^n f_i(x_i-\bar x)^2 \lt \sum_{i=1}^n f_i(x_i-a)^2; a\ne\bar x\]
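One possible route to the inequality above (for the unweighted case) is to add and subtract \(\bar x\) inside the square. This is a sketch, not necessarily the intended proof:

\[\begin{aligned} \sum_{i=1}^n (x_i-a)^2 &= \sum_{i=1}^n \left[(x_i-\bar x)+(\bar x-a)\right]^2 \\ &= \sum_{i=1}^n (x_i-\bar x)^2 + 2(\bar x-a)\sum_{i=1}^n (x_i-\bar x) + n(\bar x-a)^2 \\ &= \sum_{i=1}^n (x_i-\bar x)^2 + n(\bar x-a)^2 \end{aligned}\]

The middle term vanishes because \(\sum (x_i-\bar x)=0\), and \(n(\bar x-a)^2>0\) whenever \(a\ne\bar x\). The weighted version follows in the same way, with \(f_i\) inside each sum and \(\sum f_i\) in place of \(n\).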
Prove that the AM depends on both origin and scale. Use frequencies as well, i.e.,
If the GM of \(n_1\) values is \(G_1\), and that of \(n_2\) values is \(G_2\), show that the GM of all \(n_1+n_2\) values is \(G=(G_1^{n_1}G_2^{n_2})^{1/(n_1+n_2)}\), which reduces to \(G=\sqrt{G_1G_2}\) when \(n_1=n_2\)
Let the two positive numbers be a and b
\(\therefore AM = \frac{a+b}{2}\)
\(GM = \sqrt{ab}\)
\(HM = \frac{2}{\frac 1 a +\frac 1 b}\)
We know, \[\begin{eqnarray} & &(a-b)^2\ge 0 \nonumber \\ & \Rightarrow & (a+b)^2-4ab \ge 0 \nonumber \\ & \Rightarrow & (a+b)^2 \ge 4ab \nonumber \\ & \Rightarrow & (a+b) \ge 2 \sqrt{ab} \nonumber \\ & \Rightarrow & \frac{a+b} 2 \ge \sqrt{ab} \nonumber \\ & \Rightarrow & AM \ge GM \nonumber \\ \end{eqnarray}\]
Similarly, \[\begin{eqnarray} & &(\frac{1}{a}-\frac{1}{b})^2\ge 0 \nonumber \\ & \Rightarrow & (\frac{1}{a}+\frac{1}{b})^2 -4 \cdot \frac 1 a \cdot \frac 1 b\ge 0 \nonumber \\ & \Rightarrow & (\frac{1}{a}+\frac{1}{b})^2\ge \frac 4 {ab} \nonumber \\ & \Rightarrow & (\frac{1}{a}+\frac{1}{b}) \ge \frac 2 {\sqrt{ab}} \nonumber \\ & \Rightarrow & \sqrt{ab}(\frac{1}{a}+\frac{1}{b}) \ge 2 \nonumber \\ & \Rightarrow & \sqrt{ab} \ge \frac{2}{(\frac{1}{a}+\frac{1}{b})} \nonumber \\ & \Rightarrow & GM \ge HM \nonumber \\ \end{eqnarray}\]
For two non-zero positive numbers, \(AM \times HM =(GM)^2\)
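A quick numerical check of both results for one pair of positive numbers, sketched in Python. The pair 45 and 5 is chosen only for illustration; a numerical check is not a proof.

```python
import math

a, b = 45, 5   # illustrative positive numbers

am = (a + b) / 2
gm = math.sqrt(a * b)
hm = 2 / (1 / a + 1 / b)

print(am, gm, round(hm, 6))

# AM >= GM >= HM, and AM * HM = GM^2 (up to floating-point rounding).
ordering_holds = am >= gm >= hm
identity_holds = math.isclose(am * hm, gm ** 2, rel_tol=1e-12)
print(ordering_holds, identity_holds)
```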
The mean and the median of the first n natural numbers are both \(\frac {n+1} 2\)
If \(\bar x_1\) and \(\bar x_2\) are means of 2 data sets of sizes \(n_1\) and \(n_2\), respectively, the combined mean is \(\bar x_c=\frac{n_1 \bar x_1+n_2 \bar x_2}{n_1+n_2}\)
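The combined-mean formula can be checked against the mean of the pooled data. The two small data sets below are made up for illustration.

```python
# Two hypothetical data sets of different sizes.
set1 = [2, 4, 6]
set2 = [10, 20]

n1, n2 = len(set1), len(set2)
mean1 = sum(set1) / n1
mean2 = sum(set2) / n2

# Combined mean from the formula.
combined = (n1 * mean1 + n2 * mean2) / (n1 + n2)

# Mean of the pooled observations, for comparison.
pooled = sum(set1 + set2) / (n1 + n2)

print(combined, pooled)
```

The simple average of the two means, \((4+15)/2=9.5\), would be wrong here because the sets differ in size.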
If \(u=x+y\) and both variables have the same number of observations (\(n_1=n_2=n\)), then \(\bar u=\bar x + \bar y\)
Given \(u=x+y\)
\[\begin{eqnarray} \bar u &=& \frac{\sum u}{n} \nonumber \\ &=& \frac{\sum (x+y)}{n} \nonumber \\ &=& \frac{\sum x}{n}+ \frac{\sum y}{n} \nonumber \\ &=& \bar x + \bar y \nonumber \\ \end{eqnarray}\]
For an equal number of observations, the GM of the product of two variables is equal to the product of their individual GMs.
\(GM=Antilog(\frac{\sum \log x_i}{n})\) or \(Antilog(\frac{\sum f_i \log x_i}{\sum f_i})\)
If \(y = a + bx\), then \(\bar y = a + b \bar x\)
If \(z_i=ax_i+by_i\), then \(\bar z=a \bar x + b \bar y\)
We know, \(AM \times HM=(GM)^2\)
Let the numbers be \(a\) and \(b\), with \(a>b\), \(\frac{a+b}{2}=25\) and \(\sqrt{ab}=15\)
Thus, \(a+b=50\), and \(ab=15^2=225\)
\(\therefore (a-b)^2=(a+b)^2-4ab=2500-900=1600\) \(\Rightarrow a-b =\) 40
Hence \(a=45\) and \(b=5\)
The mean of 200 numbers was 50. Later it was revealed that two observations were incorrectly given as 92 and 8, instead of 192 and 88, respectively. Find the correct mean.
\(n=200, \bar x = 50\)
\(\therefore\) Incorrect total, \(\sum x = n \times \bar x=\) 10000
Correct total, \(\sum x'=10,000-92-8+192+88=\) 10180
Correct mean, \(\bar x'=\frac{10180}{200}=\) 50.9
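The correction can be sketched in Python. The variable names are illustrative.

```python
n = 200
reported_mean = 50

wrong_values = [92, 8]     # values that were recorded
right_values = [192, 88]   # values that should have been recorded

incorrect_total = n * reported_mean
correct_total = incorrect_total - sum(wrong_values) + sum(right_values)
correct_mean = correct_total / n

print(incorrect_total, correct_total, correct_mean)
```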
If \(\sum f_i(x_i-k)=0\), what is the value of k?
Given \(\sum f_i(x_i-k)=0\)
\(\Rightarrow \sum f_i x_i - k \sum f_i =0\)
\(\Rightarrow k = \frac{\sum f_i x_i}{\sum f_i}\)