Statistics: Sampling Distributions


Suppose that a random sample of size [math]n[/math] has been obtained from the probability distribution of a random variable [math]X.[/math] This gives [math]n[/math] sample random variables [math]X_{1},...,X_{n},[/math] and the sample of data consists of the values [math]x_{1},...,x_{n}[/math] of these random variables. In the section on Basic Descriptive Statistics, the sample mean,

[math]\bar{x}=\dfrac{1}{n}\sum_{i=1}^{n}x_{i}[/math]

was considered an appropriate measure of location, and the sample variance

[math]s^{2}=\dfrac{1}{n-1}\sum_{i=1}^{n}\left( x_{i}-\bar{x}\right) ^{2}[/math]

an appropriate measure of dispersion for a data set. Note the change of notation here compared with the section on Basic Descriptive Statistics; this change is important.

The sample mean and the sample variance are both examples of a statistic, a quantity which is a function of the data. We now consider them as functions of the values [math]x_{1},...,x_{n}[/math] of the sample random variables [math]X_{1},...,X_{n}.[/math] There are expressions corresponding to [math]\bar{x}[/math] and [math]s^{2}[/math] in terms of these sample random variables [math]X_{1},...,X_{n}:[/math]

[math]\begin{aligned} \bar{X} &=&\dfrac{1}{n}\sum_{i=1}^{n}X_{i}, \\ S^{2} &=&\dfrac{1}{n-1}\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}.\end{aligned}[/math]

These expressions show that [math]\bar{X}[/math] and [math]S^{2}[/math] are functions of the sample random variables [math]X_{1},...,X_{n},[/math] and are therefore random variables themselves, having probability distributions, expected values etc.

  • The probability distributions of [math]\bar{X}[/math] and [math]S^{2}[/math] are called sampling distributions because they depend on a random sampling procedure.

Notice that with this new perspective, the statistic [math]\bar{x}[/math] is a sample value of the sample statistic [math]\bar{X},[/math] and the statistic [math]s^{2}[/math] is a sample value of the sample statistic [math]S^{2}.[/math]

  • It is important to distinguish between a statistic and a sample statistic: the former is a numerical value, the latter a random variable with a probability distribution referred to as a sampling distribution. A short sketch below illustrates the computation of the numerical values.
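
To make the distinction concrete, here is a minimal Python sketch (the data values are hypothetical) computing the statistics [math]\bar{x}[/math] and [math]s^{2}[/math] from a given sample. Note that ddof=1 in numpy selects the [math]1/(n-1)[/math] divisor used above.

```python
import numpy as np

# Hypothetical data values, for illustration only.
x = np.array([0.0, 1.0, 2.0, 1.0, 2.0])

x_bar = np.mean(x)       # sample mean: (1/n) * sum of the x_i
s2 = np.var(x, ddof=1)   # sample variance: ddof=1 gives the 1/(n-1) divisor

print(x_bar, s2)         # 1.2 0.7
```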

Sampling Distributions

How are sampling distributions found? The following example shows that it can be quite laborious to find them from first principles.

Example

Suppose that a random sample of size [math]2[/math] is to be drawn from the probability distribution of the random variable [math]X[/math], where this is given in the table

Values of [math]X[/math] [math]0[/math] [math]1[/math] [math]2[/math] [math]E\left[ X\right] [/math]
Probability [math]0.2[/math] [math]0.3[/math] [math]0.5[/math] [math]1.3[/math]

The random sample will consist of the independent random variables [math]X_{1},X_{2}[/math], each with this probability distribution. So, for example, the probability of obtaining the sample [math]x_{1}=0,x_{2}=1[/math] is, by independence,

[math]\begin{aligned} \Pr \left( X_{1}=0,X_{2}=1\right) &=&\Pr \left( X_{1}=0\right) \Pr \left( X_{2}=1\right) \\ &=&\left( 0.2\right) \left( 0.3\right) \\ &=&0.06.\end{aligned}[/math]

Here, the sample mean is

[math]\bar{X}=\dfrac{1}{2}\left( X_{1}+X_{2}\right) .[/math]

What is its probability distribution?

The strategy is to find out what possible samples can be drawn, what their probability of occurrence is, and the value of [math]\bar{X}[/math] implied by that sample. From this information we can deduce the probability distribution of [math]\bar{X}.[/math] All of these pieces of information are displayed in the table below:

Samples Probability Value of [math]\bar{X}[/math]
[math]\left( 0,0\right) [/math] [math]0.04[/math] [math]0[/math]
[math]\left( 0,1\right) [/math] [math]0.06[/math] [math]0.5[/math]
[math]\left( 0,2\right) [/math] [math]0.1[/math] [math]1[/math]
[math]\left( 1,0\right) [/math] [math]0.06[/math] [math]0.5[/math]
[math]\left( 1,1\right) [/math] [math]0.09[/math] [math]1[/math]
[math]\left( 1,2\right) [/math] [math]0.15[/math] [math]1.5[/math]
[math]\left( 2,0\right) [/math] [math]0.1[/math] [math]1[/math]
[math]\left( 2,1\right) [/math] [math]0.15[/math] [math]1.5[/math]
[math]\left( 2,2\right) [/math] [math]0.25[/math] [math]2[/math]

We can see what the possible values for [math]\bar{X}[/math] are, and the probabilities of the samples which are favourable to each value of [math]\bar{X}[/math]. This leads to a table displaying the probability distribution of [math]\bar{X}:[/math]

Value of [math]\bar{X}[/math] [math]0[/math] [math]0.5[/math] [math]1[/math] [math]1.5[/math] [math]2[/math] [math]E\left[ \bar{X}\right] [/math]
Probability [math]0.04[/math] [math]0.12[/math] [math]0.29[/math] [math]0.3[/math] [math]0.25[/math] [math]1.3[/math]

It is easily checked that the probabilities add up to 1, and that the expected value calculation for [math]\bar{X}[/math] is correct.
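
This first-principles construction is easy to automate. The following Python sketch, which assumes only the probability table given above, enumerates all ordered samples of size 2, accumulates the probability attached to each value of [math]\bar{X}[/math], and verifies that the probabilities sum to 1 and that [math]E\left[ \bar{X}\right] =1.3[/math]:

```python
from itertools import product

# Probability table of X from the example above.
pmf = {0: 0.2, 1: 0.3, 2: 0.5}

# Enumerate all ordered samples (x1, x2); by independence the joint
# probability is the product of the marginal probabilities.
dist = {}
for x1, x2 in product(pmf, repeat=2):
    xbar = (x1 + x2) / 2
    dist[xbar] = dist.get(xbar, 0.0) + pmf[x1] * pmf[x2]

print(sorted(dist.items()))                  # probabilities 0.04, 0.12, 0.29, 0.30, 0.25
print(sum(dist.values()))                    # 1.0
print(sum(v * p for v, p in dist.items()))   # E[X-bar] = 1.3
```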

An important aspect of this example is that the expected value of [math]\bar{X}[/math] is equal to the expected value of [math]X[/math], which could here be described as the population mean.

Another Example: the Binomial Distribution

This distribution is described in this Section. Suppose that

  • a (large) population consists of values [math]0[/math] or [math]1[/math],
  • [math]1[/math] indicates (for example) a household in the UK owning a Tablet Computer,
  • the (unknown) population relative frequency of [math]1[/math]'s is [math]\pi[/math].

A random variable which can describe this population is one which takes on values [math]0[/math] and [math]1[/math] with probabilities [math]1-\pi [/math] and [math]\pi [/math] respectively. Denote this random variable by [math]X[/math], a Bernoulli random variable, discussed in this Section.

Imagine that the experiment generating the value of a Bernoulli random variable is repeated [math]n[/math] times under identical conditions, in such a way that the potential outcome of one experiment is independent of the other experiments. Then, these repetitions are equivalent to drawing a random sample of size [math]n[/math] from this Bernoulli distribution. If the outcome of an experiment is a value [math]1[/math], call it a “success”.

Each experiment generates the value of a Bernoulli random variable [math]X_{i}[/math], having the same distribution as [math]X[/math]. Let the random variable [math]T[/math] be the total number of successes,

[math]T=\sum_{i=1}^{n}X_{i}.[/math]

As explained here, in this situation the probability distribution of [math]T[/math] is a binomial distribution:

[math]\Pr \left( T=t\right) =\binom{n}{t}\pi ^{t}\left( 1-\pi \right)^{n-t},\;\;\;t=0,...,n,\;\;\;0\lt \pi \lt 1.[/math]

How does this relate to the previous example?

We can use this to deduce the sampling distribution of the sample mean [math]\bar{X}[/math], since it is related very simply to [math]T[/math]:

[math]\bar{X}=\dfrac{1}{n}\sum_{i=1}^{n}X_{i}=\dfrac{T}{n}.[/math]

We can deduce that if [math]T[/math] can take on values [math]0,1,2,...,n-1,n[/math] then [math]\bar{X}[/math] can take on values

[math]0,\dfrac{1}{n},\dfrac{2}{n},...,\dfrac{n-1}{n},1[/math]

and that

[math]\Pr \left( \bar{X}=\dfrac{t}{n}\right) =\Pr \left( T=t\right),\;\;\;\;\;t=0,...,n.[/math]

In principle, we could now try to show that the expected value of [math]\bar{X}[/math] is equal to the expected value of [math]X[/math] (which is [math]\pi [/math]), the population mean. This will follow from the discussion in the next section.
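
As an illustration of this relationship, the following Python sketch (assuming scipy is available; the values [math]n=10[/math] and [math]\pi =0.3[/math] are arbitrary choices) tabulates the distribution of [math]\bar{X}=T/n[/math] from the binomial probabilities and confirms numerically that [math]E\left[ \bar{X}\right] =\pi [/math]:

```python
import numpy as np
from scipy.stats import binom

n, pi = 10, 0.3                 # illustrative sample size and success probability
t = np.arange(n + 1)            # possible values of T
probs = binom.pmf(t, n, pi)     # Pr(T = t)
xbar_vals = t / n               # corresponding values of X-bar = T/n

print(np.sum(xbar_vals * probs))   # E[X-bar] = pi = 0.3
```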

Sampling Distribution of [math]\bar{X}[/math]

Assuming random sampling, we can find the mean and variance of the sampling distribution of [math]\bar{X}[/math], without actually knowing what the sampling distribution is. This is a very useful and important result.

Suppose that a random sample of size [math]n[/math] is drawn from a population with mean [math]\mu [/math] and variance [math]\sigma ^{2}[/math], or equivalently, from the probability distribution of a random variable [math]X[/math] with [math]E\left[ X\right]=\mu [/math], [math]var\left[X\right] =\sigma ^{2}[/math]. Note that, at this stage, we make no assumption about the actual type of distribution for [math]X[/math], just about its first two moments.

Since

[math]\bar{X}=\dfrac{1}{n}\sum_{i=1}^{n}X_{i},[/math]

[math]\bar{X}[/math] is a linear combination of the sample random variables [math]X_{1},...,X_{n}[/math], so that the results for the expectation and the variance of linear combinations of random variables can be used to find [math]E\left[ \bar{X}\right] [/math] and [math]var\left[\bar{X}\right] [/math]. The weights in the linear combination are all the same:

[math]a_{i}=\dfrac{1}{n},[/math]

in the notation of those sections.

In turn, from the properties of random sampling, we know that

[math]\begin{aligned} E\left[ X_{i}\right] &=&E\left[ X\right] =\mu , \\ var\left[ X_{i}\right] &=&var\left[ X\right] =\sigma^{2}.\end{aligned}[/math]

The mean of the sample mean

This heading is deliberately misleading: what is meant precisely is the expected value or expectation of the sampling distribution of the sample mean.

  • In random sampling, the expected value of [math]\bar{X}[/math] is equal to the population mean, [math]\mu [/math].

Proof: (refer to this section),

[math]\begin{aligned} E\left[ \bar{X}\right] &=&E\left[ \dfrac{1}{n}\sum_{i=1}^{n}X_{i}\right] \\ &=&\dfrac{1}{n}\sum_{i=1}^{n}E\left[ X_{i}\right] \\ &=&\dfrac{1}{n}\sum_{i=1}^{n}\mu \\ &=&\mu .\end{aligned}[/math]

The variance of the sample mean

Random sampling makes the sample random variables [math]X_{1},...,X_{n}[/math] independent and therefore uncorrelated. This is convenient, because it simplifies the job of obtaining the variance of [math]\bar{X}[/math]: for uncorrelated random variables, the variance of a weighted sum is the sum of the variances multiplied by the squared weights (see this section for a discussion of this rule). The result is that

  • in random sampling, the variance of the sample mean is [math]\dfrac{\sigma^{2}}{n}[/math].

Proof:

[math]\begin{aligned} var\left[ \bar{X}\right] &=&var\left[ \dfrac{1}{n}\sum_{i=1}^{n}X_{i}\right] \\ &=&\sum_{i=1}^{n}var\left[ \dfrac{1}{n}X_{i}\right] \\ &=&\sum_{i=1}^{n}\left( \dfrac{1}{n}\right) ^{2}var\left[ X_{i}\right] \\ &=&\sum_{i=1}^{n}\left( \dfrac{1}{n}\right) ^{2}\sigma ^{2} \\ &=&\dfrac{n\sigma ^{2}}{n^{2}} \\ &=&\dfrac{\sigma ^{2}}{n}.\end{aligned}[/math]

The square root of [math]var\left[ \bar{X}\right] [/math] is the standard deviation of [math]\bar{X}[/math], and this is usually given the specific name of standard error. So far, we have identified population parameters with the parameters of the distribution of the corresponding random variable [math]X[/math]. We can extend this to cover characteristics or parameters of the probability distributions of sample statistics like [math]\bar{X}[/math]. So, [math]var\left[ \bar{X}\right] [/math] is a parameter of the probability distribution of [math]\bar{X}[/math], and so is the standard error,

[math]SE\left[ \bar{X}\right] =\dfrac{\sigma }{\sqrt{n}}.[/math]
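
Both results lend themselves to a simulation check. In the Python sketch below, the Exponential(1) population (so [math]\mu =\sigma =1[/math]), the sample size and the seed are all arbitrary choices: it draws many random samples, computes each sample mean, and compares the average and standard deviation of those sample means with [math]\mu [/math] and [math]\sigma /\sqrt{n}[/math].

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 25, 100_000

# An Exponential(1) population has mu = 1 and sigma = 1.
xbars = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

print(xbars.mean())        # close to mu = 1
print(xbars.std(ddof=1))   # close to sigma / sqrt(n) = 0.2
```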

Example

In the earlier Example of this Section, we found [math]E\left[ X\right] =1.3[/math], and it is easy to find that

[math]E\left[ X^{2}\right] =\left( 0\right) ^{2}\left( 0.2\right) +\left( 1\right)^{2}\left( 0.3\right) +\left( 2\right) ^{2}\left( 0.5\right) =2.3,[/math]

so that the population variance is

[math]var\left[ X\right] =2.3-\left( 1.3\right) ^{2}=0.61.[/math]

We also found that [math]E\left[ \bar{X}\right] =1.3[/math], and we can calculate from the probability distribution of [math]\bar{X}[/math] that

[math]\begin{aligned} E\left[ \bar{X}^{2}\right] &=&\left( 0\right) ^{2}\left( 0.04\right) +\left(0.5\right) ^{2}\left( 0.12\right) +\left( 1\right) ^{2}\left( 0.29\right)+\left( 1.5\right) ^{2}\left( 0.3\right) +\left( 2\right) ^{2}\left(0.25\right) \\ &=&1.995.\end{aligned}[/math]

This gives

[math]var\left[ \bar{X}\right] =1.995-\left( 1.3\right) ^{2}=0.305[/math]

which is precisely

[math]\dfrac{\sigma ^{2}}{2}=\dfrac{0.61}{2},[/math]

matching the theoretical result exactly, since [math]n=2[/math] here.

Here, the standard error is

[math]SE\left[ \bar{X}\right] =\sqrt{0.305}=0.552.[/math]
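
The arithmetic in this example can be verified directly from the two probability tables, for instance with this short Python sketch:

```python
# The two probability tables from this example.
pmf_X = {0: 0.2, 1: 0.3, 2: 0.5}
pmf_Xbar = {0: 0.04, 0.5: 0.12, 1: 0.29, 1.5: 0.30, 2: 0.25}

def mean_var(pmf):
    m = sum(v * p for v, p in pmf.items())        # expected value
    m2 = sum(v ** 2 * p for v, p in pmf.items())  # second moment
    return m, m2 - m ** 2                         # mean, variance

print(mean_var(pmf_X))               # (1.3, 0.61)
print(mean_var(pmf_Xbar))            # (1.3, 0.305) = (mu, sigma^2 / 2)
print(mean_var(pmf_Xbar)[1] ** 0.5)  # standard error, about 0.552
```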

Summary and outlook

The results presented above are so important that they need to be stated compactly.

  • If a random sample of size [math]n[/math] is drawn from a population with mean [math]\mu [/math] and variance [math]\sigma ^{2}[/math],
  • the expected value of [math]\bar{X}[/math] is equal to the population mean, [math]\mu: [/math] [math]E\left[ \bar{X}\right] =\mu [/math],
  • the variance of the sample mean is [math]\dfrac{\sigma ^{2}}{n}:[/math] [math]var\left[ \bar{X}\right] =\dfrac{\sigma ^{2}}{n}[/math].

Notice that the variance of [math]\bar{X}[/math] declines, relative to the population variance, as the sample size [math]n[/math] increases. This is due explicitly to the averaging effect contained in the sample mean. Clearly, the same applies to the standard error [math]SE\left[ \bar{X}\right] [/math] as well.

Here we stated that if we know the expected value and the variance of an independently sampled random variable, then we know the expected value and the variance of the sample mean. If you think that these results are already pretty cool, then hold your breath, it will get better. Later we will show that, without any additional information about the original random variable, we will be able to say what type of distribution the sample average will follow! This is certainly worth reading on for!

The Binomial case

Here, [math]\bar{X}[/math] is obtained by random sampling from a Bernoulli distribution, with success probability [math]\pi [/math] - see the Example above. So, if [math]X[/math] has a Bernoulli distribution, the population mean and population variance are

[math]\begin{aligned} E\left[ X\right] &=&\left( 0\right) \left( 1-\pi \right) +\left( 1\right) \left( \pi \right) =\pi , \\ E\left[ X^{2}\right] &=&\left( 0\right) ^{2}\left( 1-\pi \right) +\left(1\right) ^{2}\left( \pi \right) =\pi , \\ var\left[ X\right] &=&E\left[ X^{2}\right] -\left( E\left[ X\right]\right) ^{2}=\pi -\pi ^{2}=\pi \left( 1-\pi \right) .\end{aligned}[/math]

Using the general properties from the above Sections on the mean and variance of a sample mean, it will follow that

[math]E\left[ \bar{X}\right] =\pi ,\ \ \ \ \ var\left[ \bar{X}\right] =\dfrac{\pi \left( 1-\pi \right) }{n}.[/math]

In turn, we can use the relationship

[math]\bar{X}=\dfrac{T}{n}[/math]

to deduce that if [math]T[/math] has a Binomial distribution,

[math]E\left[ T\right] =n\pi ,\ \ \ \ \ var\left[ T\right] =n^{2}var\left[ \bar{X}\right] =n\pi \left( 1-\pi \right) .[/math]

This confirms the results given in the Section on the Moments of distributions.
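
A quick numerical cross-check of these binomial moments, assuming scipy is available (the values [math]n=10[/math] and [math]\pi =0.3[/math] are again arbitrary):

```python
from scipy.stats import binom

n, pi = 10, 0.3
mean, var = binom.stats(n, pi, moments='mv')

print(mean, n * pi)              # both 3.0
print(var, n * pi * (1 - pi))    # both 2.1
```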

The Sampling Distribution

It is worth emphasising that the results above have been deduced without knowing the nature of the distribution which has been sampled. Without such information we cannot calculate, for any value [math]x[/math],

[math]\Pr \left( \bar{X}\leqslant x\right) .[/math]

In the initial example of this Section, it was easy to find the probability distribution of [math]\bar{X}[/math] from first principles. In the next example we saw that the nature of the population and its corresponding Bernoulli random variable generated a sampling distribution which was a Binomial probability distribution. So, the sampling distribution of [math]\bar{X}[/math] changes as we change the population probability distribution.

The classical example of the sampling distribution of [math]\bar{X}[/math] is where the population probability distribution is a normal distribution. We discussed previously that a linear combination of normal random variables also has a normal distribution. Since [math]\bar{X}[/math] is such a linear combination, it follows that [math]\bar{X}[/math] also has a normal distribution.

This result is very important. Almost all of the rest of this course depends on this result.

  • If a random sample of size [math]n[/math] is drawn from the distribution [math]X\sim N\left( \mu ,\sigma ^{2}\right) [/math], then

    [math]\bar{X}\sim N\left( \mu ,\dfrac{\sigma ^{2}}{n}\right) .[/math]

    Note that the mean and variance of this distribution are exactly those deduced above without using knowledge of the population probability distribution.

Example

IQ tests are designed to behave as if they are drawings from a normal distribution with mean [math]100[/math] and variance [math]400:[/math] [math]X\sim N\left(100,400\right) [/math]. Suppose that a random sample of [math]25[/math] individuals is obtained. Then,

[math]\bar{X}\sim N\left( 100,\dfrac{400}{25}\right) ,\;\;\;\text{or,\ \ \ }\bar{X}\sim N\left( 100,16\right) .[/math]

We can then calculate, for example,

[math]\begin{aligned} \Pr \left( \bar{X}\lt 90\right) &=&\Pr \left( \dfrac{\bar{X}-100}{4}\lt \dfrac{90-100}{4}\right) \\ &=&\Pr \left( Z\lt -2.5\right) \\ &=&0.0062.\end{aligned}[/math]
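
This probability is easy to reproduce in Python, assuming scipy is available:

```python
from scipy.stats import norm

mu, sigma2, n = 100, 400, 25
se = (sigma2 / n) ** 0.5               # standard error = 4

print(norm.cdf(90, loc=mu, scale=se))  # 0.0062
print(norm.cdf((90 - mu) / se))        # the same, via standardisation: Phi(-2.5)
```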

Sampling from Non-Normal distributions

In this case, the sampling distribution of [math]\bar{X}[/math] will not be normal. However, if we imagine that the sample size [math]n[/math] is allowed to increase without bound, so that [math]n\rightarrow \infty [/math], we can appeal to a famous theorem (more accurately, a collection of theorems) in probability theory called the Central Limit Theorem, often abbreviated as CLT. This states that

  • if [math]\bar{X}[/math] is obtained from a random sample of size [math]n[/math] from a population with mean [math]\mu [/math] and variance [math]\sigma ^{2}[/math], then, irrespective of the distribution sampled,

    [math]\dfrac{\bar{X}-\mu }{SE\left[ \bar{X}\right] }=\dfrac{\bar{X}-\mu }{\left( \sigma /\sqrt{n}\right) }\rightarrow N\left( 0,1\right) \;\;\;\text{as }n\rightarrow \infty .[/math]

    That is, the probability distribution of [math]\dfrac{\bar{X}-\mu }{SE\left[ \bar{X}\right] }[/math] approaches the standard normal distribution as [math]n\rightarrow \infty [/math].

We interpret this as saying that

[math]\dfrac{\bar{X}-\mu }{SE\left[ \bar{X}\right] }=\dfrac{\bar{X}-\mu}{\left( \sigma /\sqrt{n}\right) }\sim N\left( 0,1\right) ,\;\;\;\text{\textbf{approximately}}[/math]

for finite [math]n[/math].

  • An alternative is to say that

    [math]\bar{X}\sim N\left( \mu ,\dfrac{\sigma ^{2}}{n}\right) \;\;\;\text{\textbf{approximately}}[/math]

    for finite [math]n[/math].

The rate at which the standard normal distribution is approached influences the quality of the approximation. This is expected to improve as [math]n[/math] increases, and textbooks usually claim that the approximation is good enough if

[math]n\geqslant 20\;\;\;\text{or\ \ \ }n\geqslant 30.[/math]

The idea that [math]\bar{X}[/math] has an approximate normal distribution as [math]n\rightarrow \infty [/math] is often described as the large sample normality of the sample mean. The textbook claim here is that a “large” sample is at least 20. This is not really reasonable, but is adequate for use in a course like this.

So, in the IQ example above, we can argue that [math]\Pr \left( \bar{X}\lt 90\right) =0.0062[/math] approximately if in fact IQ’s are not normally distributed, but do have population mean [math]100[/math] and population variance [math]400[/math].
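
This claim can be explored by simulation. The Python sketch below uses a shifted Gamma population, an arbitrary non-normal choice engineered to have mean [math]100[/math] and variance [math]400[/math], and estimates [math]\Pr \left( \bar{X}\lt 90\right) [/math] for samples of size [math]25[/math]:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 25, 200_000

# A skewed, non-normal population with mean 100 and variance 400:
# Gamma(shape=4, scale=10) has mean 40 and variance 400; shift it by 60.
samples = 60 + rng.gamma(shape=4.0, scale=10.0, size=(reps, n))
xbars = samples.mean(axis=1)

print(np.mean(xbars < 90))   # compare with the normal approximation 0.0062
```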

Additional Resources

  • Our friends from the Khan Academy have produced a series of clips that illustrate the workings of the Central Limit Theorem. They are here: Clip 1, Clip 2, Clip 3 and Clip 4. An example problem is discussed in this clip.
  • A central part of these clips is the set of small demonstrations, which you can try yourself at this excellent site: Online Stats Book
