# Concept of dispersion

## What is Dispersion?

• Which plot has the largest dispersion?
• Which plot has the smallest dispersion?

## Why?

To

• estimate reliability
• compare variability of several sets of data
• enable further analysis
• have a perception of probability

## Criteria of a Good Measure

• Well-defined
• Easy to understand and compute
• Based on all observations
• Suitable for further mathematical/statistical analysis
• Less affected by sample fluctuation
• Unaffected by outliers

## Range

Ungrouped Data Range, $$R = X_H - X_L$$

Grouped Data Range, $$R = L_u - L_l$$

$$L_u$$ = Upper boundary of the highest class

$$L_l$$ = Lower boundary of the lowest class

$$X=$$ 14, 10, 2, 25, 21, 9, 19, 27, 16, 13, 12, 7, 20, 18, 17

• Range = 25

• Simple & quick
• Influenced by outliers
• Not suitable for further mathematical analysis
• Cannot be computed for open-ended distribution
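The range calculation above can be sketched in a few lines of Python. The helper name `data_range` and the extra value 200 are illustrative assumptions, not part of the slides; the second call only shows how strongly one outlier moves the range.

```python
# Ungrouped data from the slide above.
x = [14, 10, 2, 25, 21, 9, 19, 27, 16, 13, 12, 7, 20, 18, 17]

def data_range(values):
    """Range R = X_H - X_L (highest minus lowest observation)."""
    return max(values) - min(values)

r = data_range(x)
print(r)  # 27 - 2 = 25

# One extreme value (hypothetical) changes the range drastically.
r_outlier = data_range(x + [200])
print(r_outlier)
```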

## Mean Deviation (MD)

Ungrouped Data

$$MD(k)=\frac{\sum_{i=1}^n |x_i-k|} n$$, not $$MD(k)=\frac{|\sum_{i=1}^n(x_i-k)|} n$$; Beware!

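Grouped Data $$MD(k)=\frac{\sum_{i=1}^m f_i|x_i-k|} n$$, where $$x_i$$ = class mid-point, $$f_i$$ = class frequency, $$m$$ = number of classes, and $$n=\sum f_i$$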

k = $\begin{cases} \bar x, & \text{for MD about mean} \\ \tilde x, & \text{for MD about median} \\ Mo, & \text{for MD about Mode} \end{cases}$

## Compute MD

Compute MD about the mean, median, and mode

• X = 14, 10, 2, 25, 21, 9, 19, 27, 16, 13, 12, 7, 20, 18, 17

• Simple
• Less affected by outliers than SD
• Useful for symmetric data only
• Not suitable for further mathematical analysis
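A minimal sketch of the MD computation for this data set, assuming the ungrouped formula given earlier. The function name `mean_deviation` is made up for illustration. Every value occurs exactly once here, so the sketch covers only the mean and the median; treating the mode as uninformative for this data is my reading, not something the slide states.

```python
from statistics import mean, median

x = [14, 10, 2, 25, 21, 9, 19, 27, 16, 13, 12, 7, 20, 18, 17]

def mean_deviation(values, k):
    """MD(k): average absolute deviation of the values from the point k."""
    return sum(abs(v - k) for v in values) / len(values)

md_mean = mean_deviation(x, mean(x))      # about the mean (230 / 15)
md_median = mean_deviation(x, median(x))  # about the median (16)

print(round(md_mean, 3), round(md_median, 3))

# The formula warned against: signed deviations about the mean cancel out.
wrong = abs(sum(v - mean(x) for v in x)) / len(x)
print(round(wrong, 10))
```

The last value printed is essentially zero, which is why the absolute value must be taken inside the sum.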

## Variance and SD

### Variance

$\sigma ^2 = \sum_{i=1}^n \frac{(x_i-\bar x)^2}{n}$

$$\quad$$ = $$\sum \frac{x_i^2}{n}-(\frac{\sum x_i}{n})^2$$

$$\quad$$ = Mean of square - square of mean

### Standard Deviation ($$\sigma$$)

Positive square root of variance

• $\sigma = \sqrt{\sum_{i=1}^n \frac{(x_i-\bar x)^2}{n}}$
• For grouped data?

## Variance Estimation

| $$x$$ | $$x^2$$ |
|---|---|
| 12 | 144 |
| 11 | 121 |
| 3 | 9 |
| $$\sum x = 26$$ | $$\sum x^2 = 274$$ |

$$\sigma^2$$=Mean of square - square of mean

$$\quad$$ =$$\bar {x^2}-{\bar x}^2$$

• $$\therefore \sigma^2 = \frac{274}{3}-(\frac{26}{3})^2 =$$ 16.222
• $$\sigma =$$ 4.028
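The shortcut formula can be checked with a short Python sketch. It uses the divisor $$n$$ from the variance definition in these slides; the last line shows, for comparison only, what the divisor $$n-1$$ (the sample variance in Python's `statistics` module) returns for the same three values.

```python
from statistics import pvariance, variance

x = [12, 11, 3]
n = len(x)

# Shortcut formula: mean of squares minus square of mean.
mean_of_squares = sum(v * v for v in x) / n   # 274 / 3
square_of_mean = (sum(x) / n) ** 2            # (26 / 3) ** 2
var_n = mean_of_squares - square_of_mean

print(round(var_n, 3))          # 16.222 (divisor n)
print(round(var_n ** 0.5, 3))   # 4.028  (sigma)
print(round(variance(x), 3))    # 24.333 (divisor n - 1)
```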

## Thoughts on Variance?

### Think

• What is the unit of variance?
• What is the unit of sd?
• Why is variance determined before sd?
• What do variance and sd mean?

• Well-defined
• Less affected by sample fluctuation
• Useful for further mathematical analysis
• Measures consistency
• Difficult to compute
• Affected by outliers

## Variance Example

X = 29, 32, 21, 34, 31, 35, 30, 22

Find the variance and standard deviation

• Variance, $$\sigma^2=$$ 23.438 (divisor $$n$$; the divisor $$n-1$$ gives 26.786)
• SD, $$\sigma$$ = 4.841 (5.175 with the divisor $$n-1$$)

## Quartile Deviation (QD)

$$QD=\frac{Q_3-Q_1} 2$$

$$Q_3-Q_1$$ is called the interquartile range.

• Simple
• Unaffected by outliers
• Can work with open-ended distribution
• Not based on all observations
• Not suitable for further mathematical analysis

## Coefficients

• Coefficient of Range, $$CR=\frac{X_H-X_L}{X_H+X_L}\times 100= \frac{\text{Range}}{X_H+X_L}\times 100$$; for grouped data, $$CR=\frac{L_u-L_l}{L_u+L_l}\times 100$$

• Coefficient of Mean Deviation, $$CMD=\frac{MD(\bar x)}{\bar x} \times 100$$ (about mean, median and mode similarly)
• Coefficient of Variation, $$CV=\frac{\sigma}{\bar x} \times 100$$
• Coefficient of Quartile Deviation, $$CQD=\frac{Q_3-Q_1}{Q_3+Q_1} \times 100$$

## Compute Coefficients

X = 29, 32, 21, 34, 31, 35, 30, 22

• CR = 25
• $$CMD(\bar x)$$ = 13.462
• CV = 16.551 (17.694 if the SD uses the divisor $$n-1$$)
• CQD = 8.787
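The four coefficients can be reproduced with the sketch below. Two choices in it are assumptions rather than anything the slides specify: the SD uses the divisor $$n$$ (matching the variance formula in these slides), and the quartiles use linear interpolation via `method="inclusive"`, one of several common conventions.

```python
from statistics import mean, pstdev, quantiles

x = [29, 32, 21, 34, 31, 35, 30, 22]
x_bar = mean(x)

# Coefficient of range.
cr = (max(x) - min(x)) / (max(x) + min(x)) * 100

# Coefficient of mean deviation about the mean.
md = sum(abs(v - x_bar) for v in x) / len(x)
cmd = md / x_bar * 100

# Coefficient of variation with the divisor-n standard deviation.
cv = pstdev(x) / x_bar * 100

# Quartiles by linear interpolation (assumed convention; others exist).
q1, _, q3 = quantiles(x, n=4, method="inclusive")
cqd = (q3 - q1) / (q3 + q1) * 100

print(round(cr, 3), round(cmd, 3), round(cv, 3), round(cqd, 3))
```

With a different quartile convention, the CQD can differ noticeably on a data set this small.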

## SD vs CV

• 60 students earned GPA 5 from VNC
• 45 students earned GPA 5 from Udayan college.

Which college is better?

• Average batting score of Soumya = 35
• Average batting score of Mushfiq = 34

Who is better?

## Use of CV

• For estimating skewness, correlation etc.
• A measure of consistency/performance

## Properties of SD

• Depends on scale, but not on origin (prove)
• For two unequal observations, $$SD=\frac R 2$$, where $$R$$ = Range
• $$MD \le SD$$
• For n positive observations, $$\bar X \sqrt{n-1}\ge \sigma$$
• For the first n natural numbers, $$\sigma = \sqrt{\frac{n^2-1}{12}}$$
• For two positive observations, $$AM>SD$$
• For n number of unequal values, $$\sigma \lt R$$
• $$SD \le \text{Root Mean Square Deviation}$$ (equality holds only when the deviations are taken about the mean)

## Is $$\sigma^2$$ always $$\gt \sigma$$?

Exceptions

We know, $$\sigma=\sqrt{\sigma^2}$$ (N.B. $$-2=-\sqrt 4; -2\ne \sqrt 4$$)

• If $$\sigma^2=0$$ or $$\sigma^2=1$$, $$\sigma = \sigma^2$$
• If $$0 \lt \sigma^2 \lt 1$$, $$\sigma^2 \lt \sigma$$
• Example: $$\sigma^2=0.05, \sigma = \sqrt{0.05} = 0.2236068$$

## CV is a Pure Number

• No Unit
• Absolute number

## Minimum Value of $$\sigma$$

$$\sigma \ge 0$$

$$\therefore$$ The least possible value of $$\sigma$$ is 0. When $$\sigma = 0$$:

$$\Rightarrow \frac{\sum(x_i-\bar x)^2}{n} = 0$$

$$\Rightarrow \sum(x_i-\bar x)^2 = 0$$

$$\Rightarrow (x_1-\bar x)^2 + (x_2-\bar x)^2 + \cdots + (x_n-\bar x)^2= 0$$

Every squared term is non-negative, so the sum can be 0 only if each term is 0:

$$\therefore (x_1-\bar x)^2 =0, (x_2-\bar x)^2=0, \ldots, (x_n-\bar x)^2=0$$

$$\Rightarrow x_1=\bar x, x_2=\bar x, \ldots, x_n=\bar x$$

$$\Rightarrow x_1=x_2= \cdots =x_n$$

$$\therefore$$ SD is least (i.e., 0) when all values are equal.

## Comparison with Mean

| Subject | Bangla | English | Mathematics | Statistics | Average |
|---|---|---|---|---|---|
| Student X | 70 | 70 | 80 | 72 | 73 |
| Student Y | 98 | 96 | 48 | 50 | 73 |

Who is better?
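Both students have the same total (292), so the mean alone cannot separate them. A rough sketch of how spread could inform the comparison, assuming the divisor-$$n$$ SD and the CV formula from the Coefficients slide:

```python
from statistics import mean, pstdev

marks = {
    "Student X": [70, 70, 80, 72],
    "Student Y": [98, 96, 48, 50],
}

summary = {}
for name, scores in marks.items():
    avg = mean(scores)
    sd = pstdev(scores)      # standard deviation with divisor n
    cv = sd / avg * 100      # relative spread, in percent
    summary[name] = (avg, sd, cv)
    print(name, avg, round(sd, 3), round(cv, 3))
```

A smaller CV points to more consistent marks. Whether consistency makes a student "better" is a judgment the numbers alone do not settle.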

# Theorems

## $$\sigma^2$$ Origin-Scale

$$\sigma_x^2=\frac{\sum(x_i-\bar x)^2}{n}$$

Let, $$d_i=\frac{x_i-a}{c}$$ (a = origin, c = scale)

$\begin{eqnarray} &\Rightarrow& x_i=a+cd_i \nonumber \\ &\Rightarrow& \bar x = a + c \bar d \nonumber \\ \end{eqnarray}$

$\begin{eqnarray} \sigma_x^2&=&\frac{\sum(x_i-\bar x)^2}{n} \nonumber \\ &=& \frac{\sum(a+cd_i-a-c \bar d)^2}{n} \nonumber \\ &=& \frac{\sum(cd_i-c \bar d)^2}{n} \nonumber \\ &=& \frac{c^2\sum(d_i-\bar d)^2}{n} \nonumber \\ &=& c^2 \sigma_d^2 \nonumber \\ \end{eqnarray}$

$$\therefore \sigma_x^2=c^2 \sigma_d^2$$

• The same procedure for standard deviation gives $$\sigma_x=|c| \sigma_d$$
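A numerical check of the origin-scale result, as a sketch. The values of `a` and `c` are arbitrary picks for illustration, and the data set is reused from the earlier variance example.

```python
from statistics import pvariance

x = [29, 32, 21, 34, 31, 35, 30, 22]
a, c = 20, 4   # arbitrary origin and scale, chosen only for illustration

d = [(v - a) / c for v in x]        # d_i = (x_i - a) / c

var_x = pvariance(x)                # divisor n
var_d = pvariance(d)

# Changing only the origin leaves the variance untouched.
var_shifted = pvariance([v - a for v in x])

print(var_x, c ** 2 * var_d, var_shifted)
```

All three printed values agree, which illustrates that variance depends on scale but not on origin.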

## MD, SD and Range

$$MD=SD=\frac R 2$$ for $$x_1\ne x_2$$ (two unequal observations)

$$\bar x = \frac {x_1+x_2} 2$$ and $$R=|x_1-x_2|$$

Mean Deviation,

$\begin{eqnarray} MD &=& \frac{\sum_{i=1}^2 |x_i-\bar x|}{2} \nonumber \\ &=& \frac{|x_1-\bar x|+|x_2-\bar x|}{2} \nonumber \\ &=& \frac{|x_1-\frac{x_1+x_2} 2|+|x_2-\frac{x_1+x_2} 2|}{2} \nonumber \\ &=& \frac{|\frac{x_1-x_2}{2}|+|\frac{x_2-x_1}{2}|}{2} \nonumber \\ &=& \frac{2|\frac{x_1-x_2}{2}|}{2} \nonumber \\ &=& \frac{|x_1-x_2|}{2} \nonumber \\ &=& \frac R 2 \nonumber \\ \end{eqnarray}$

The process is similar for SD; start from the SD formula.

## SD and Range

For two unequal observations, $$SD=\frac R 2$$

$\begin{eqnarray} SD&=&\sqrt{\frac{\sum_{i=1}^2 (x_i-\bar x)^2}{2}} \nonumber \\ &=& \sqrt{\frac{(x_1-\bar x)^2+(x_2-\bar x)^2}{2}} \nonumber \\ &=&\sqrt{\frac{(x_1-\frac{x_1+x_2} 2)^2+(x_2-\frac{x_1+x_2} 2)^2}{2}} \nonumber \\ &=&\sqrt{\frac{(\frac{x_1-x_2}{2})^2+(\frac{x_2-x_1}{2})^2}{2}}\nonumber \\ &=&\sqrt{\frac{2 (\frac{x_1-x_2}{2})^2}{2}}\nonumber \\ &=&\sqrt{\frac{(x_1-x_2)^2} 4}=\frac{|x_1-x_2|}{2} = \frac R 2 \nonumber \\ \end{eqnarray}$

## Variance of First n Natural Numbers

$\begin{eqnarray} \sigma^2 &=& \frac{\sum x_i^2}{n} - (\frac{\sum x_i}{n})^2 \nonumber \\ &=& \frac{1^2+2^2+3^2+ \cdots + n^2}{n} - (\frac{1+2+3+ \cdots + n}{n})^2 \nonumber \\ &=& \frac{\frac{n(n+1)(2n+1)}{6}}{n} - (\frac{\frac{n(n+1)}{2}}{n})^2 \nonumber \\ &=& \frac{(n+1)(2n+1)}{6} - (\frac{n+1}{2})^2 \nonumber \\ &=& \frac{n+1}{2} (\frac{2n+1}{3}-\frac {n+1}{2}) \nonumber \\ &=& \frac{n+1}{2} (\frac{4n+2-3n-3}{6}) = \frac{n+1}{2}(\frac{n-1}{6}) = \frac{n^2-1}{12} \nonumber \\ \end{eqnarray}$
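The closed form can be compared against a direct computation for a few values of $$n$$. This is a sketch; the function name `variance_first_n` and the chosen values of $$n$$ are arbitrary.

```python
from statistics import pvariance

def variance_first_n(n):
    """Closed form (n^2 - 1) / 12 for the values 1, 2, ..., n."""
    return (n * n - 1) / 12

checks = []
for n in (1, 2, 5, 10, 100):
    direct = pvariance(list(range(1, n + 1)))   # divisor n
    checks.append(abs(direct - variance_first_n(n)) < 1e-9)
    print(n, direct, variance_first_n(n))
```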

## Mean, SD, and CV

For n positive observations, $$\bar X \sqrt{n-1}\ge \sigma$$, or equivalently $$CV \le 100 \sqrt{n-1}$$
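One possible justification, sketched under the assumption that all observations are non-negative and not all zero (so that $$\bar x \gt 0$$). The cross terms $$2x_ix_j$$ in the expanded square are then non-negative:

$$\sum x_i^2 \le \left(\sum x_i\right)^2 = n^2 {\bar x}^2$$

$$\Rightarrow \sigma^2 = \frac{\sum x_i^2}{n} - {\bar x}^2 \le n{\bar x}^2 - {\bar x}^2 = (n-1){\bar x}^2$$

$$\Rightarrow \sigma \le \bar x \sqrt{n-1}$$

$$\Rightarrow CV = \frac{\sigma}{\bar x}\times 100 \le 100\sqrt{n-1}$$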

## Problem 01

Two numbers are 10 and 20. Determine the range and CV.
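One way to work this, using the two-observation result $$SD=\frac R 2$$ from the Properties of SD slide (divisor $$n$$ assumed):

$$R = 20 - 10 = 10$$

$$\bar x = \frac{10+20}{2} = 15, \quad \sigma = \frac R 2 = 5$$

$$CV = \frac{\sigma}{\bar x}\times 100 = \frac{5}{15}\times 100 \approx 33.333$$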

## Find SD & MD (3)

Find SD and MD of three observations: -3, 0, 3

Solution
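A possible working, taking MD about the mean and the divisor-$$n$$ SD from the earlier slides:

$$\bar x = \frac{-3+0+3}{3} = 0$$

$$MD(\bar x) = \frac{|-3|+|0|+|3|}{3} = 2$$

$$\sigma = \sqrt{\frac{(-3)^2+0^2+3^2}{3}} = \sqrt 6 \approx 2.449$$

The result agrees with the property $$MD \le SD$$.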

## Missing Numbers for Mean and SD (11)

The mean and SD of 5 observations are 4.4 and $$\sqrt{8.24}$$, respectively. If three of the five observations are 1, 2, and 6, find the other two.

Solution
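A possible working, where the names $$a$$ and $$b$$ for the two unknown observations are introduced here for convenience and the divisor-$$n$$ variance is assumed:

$$a+b = 5(4.4) - (1+2+6) = 22 - 9 = 13$$

$$\frac{\sum x_i^2}{5} = \sigma^2 + {\bar x}^2 = 8.24 + 19.36 = 27.6 \Rightarrow \sum x_i^2 = 138$$

$$a^2+b^2 = 138 - (1+4+36) = 97$$

$$ab = \frac{(a+b)^2-(a^2+b^2)}{2} = \frac{169-97}{2} = 36$$

$$t^2 - 13t + 36 = 0 \Rightarrow t = 4 \text{ or } t = 9$$

$$\therefore$$ The other two observations are 4 and 9.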

## Converting Series of Natural Numbers

Scale, $$c$$ = common difference

Origin, $$a$$ = first observation $$- c$$

Example
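A small sketch of the conversion, combining it with the origin-scale theorem and the variance of the first $$n$$ natural numbers. The series 5, 8, 11, 14, 17 is invented purely for illustration.

```python
from statistics import pvariance

x = [5, 8, 11, 14, 17]          # arithmetic series, invented for illustration
c = x[1] - x[0]                 # scale = common difference
a = x[0] - c                    # origin = first observation - c

d = [(v - a) // c for v in x]   # the series becomes 1, 2, ..., n
n = len(x)

var_d = (n * n - 1) / 12        # variance of the first n natural numbers
var_x = c ** 2 * var_d          # origin-scale theorem

print(d, var_x, pvariance(x))
```

The shortcut and the direct computation give the same variance, so an arithmetic series never needs the full sum of squared deviations.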

## Thanks

Visit

https://lecture.statmania.info

to see all lecture slides.