<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
		<id>http://eclr.humanities.manchester.ac.uk/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Admin</id>
		<title>ECLR - User contributions [en]</title>
		<link rel="self" type="application/atom+xml" href="http://eclr.humanities.manchester.ac.uk/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Admin"/>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php/Special:Contributions/Admin"/>
		<updated>2026-05-16T02:56:10Z</updated>
		<subtitle>User contributions</subtitle>
		<generator>MediaWiki 1.30.1</generator>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=MediaWiki:CustomBlockMainNav&amp;diff=3789</id>
		<title>MediaWiki:CustomBlockMainNav</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=MediaWiki:CustomBlockMainNav&amp;diff=3789"/>
				<updated>2015-07-07T13:06:05Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;!-- Don't forget to adjust VoWi:Navigation when you change this page! --&amp;gt;&lt;br /&gt;
* [[Main_Page|Home]]&lt;br /&gt;
** [[MATLAB|MATLAB]]&lt;br /&gt;
** [[R|RStudio]]&lt;br /&gt;
** [[EVIEWS|EVIEWS]]&lt;br /&gt;
** [[Statistics|Statistics]]&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Conditional&amp;diff=2880</id>
		<title>Probability Conditional</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Conditional&amp;diff=2880"/>
				<updated>2013-08-09T10:59:22Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: /* Exercises */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
&lt;br /&gt;
= Conditional Probability =&lt;br /&gt;
&lt;br /&gt;
An important consideration in the development of probability is that of &amp;#039;&amp;#039;conditional probability&amp;#039;&amp;#039;. This refers to the updating of probabilities in the light of revealed information. For example, insurance companies nearly always set their home contents insurance premiums on the basis of the postcode in which the home is located. That is to say, insurance companies believe the risk depends upon the location; i.e., the probability of property crime is assessed conditional upon the location of the property. (A similar calculation is made to set car insurance premiums.) As a result, the premiums for two identical households located in different parts of the country can differ substantially.&lt;br /&gt;
&lt;br /&gt;
* In general, the probability of an event, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, occurring &amp;#039;&amp;#039;given&amp;#039;&amp;#039; that an event, &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;, has occurred is called the &amp;#039;&amp;#039;conditional probability&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; given &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; and is denoted &amp;lt;math&amp;gt;\Pr (E|F)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
As another example, it has been well documented that the ability of a newborn baby to survive is closely associated with its birth-weight. A birth-weight of less than 1500&amp;#039;&amp;#039;g&amp;#039;&amp;#039; is regarded as dangerously low. Consider &amp;lt;math&amp;gt;E=&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;birth weight of a baby is less than&amp;#039;&amp;#039; 1500&amp;#039;&amp;#039;g&amp;#039;&amp;#039;, &amp;lt;math&amp;gt;F=&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;mother smoked during pregnancy&amp;#039;&amp;#039;; then evidence as to whether &amp;lt;math&amp;gt;\Pr(E|F)&amp;gt;\Pr (E|\bar{F})&amp;lt;/math&amp;gt; is of considerable interest.&lt;br /&gt;
&lt;br /&gt;
As a preliminary to the main development, consider the simple experiment of rolling a fair die and observing the number of dots on the upturned face. Then &amp;lt;math&amp;gt;S=\left\{ 1,2,3,4,5,6\right\} &amp;lt;/math&amp;gt; and define events, &amp;lt;math&amp;gt;E=\left\{4\right\} &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F=\left\{ 4,5,6\right\} ;&amp;lt;/math&amp;gt; we are interested in &amp;lt;math&amp;gt;\Pr \left(E|F\right)&amp;lt;/math&amp;gt;. To work this out we take &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; as known. Given this knowledge the sample space becomes restricted to simply &amp;lt;math&amp;gt;\left\{ 4,5,6\right\} &amp;lt;/math&amp;gt; and, given no other information, each of these &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; outcomes remains equally likely. So the required event, &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt;, is just one of three equally likely outcomes. It therefore seems reasonable that &amp;lt;math&amp;gt;\Pr (E|F)=\frac{1}{3}&amp;lt;/math&amp;gt;.&lt;br /&gt;
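The counting argument above can be checked by direct enumeration. A minimal Python sketch (not part of the original page) that restricts the sample space to F and counts equally likely outcomes:

```python
from fractions import Fraction

# Sample space for one roll of a fair die, and the two events from the text.
S = {1, 2, 3, 4, 5, 6}
E = {4}          # "a four is rolled"
F = {4, 5, 6}    # "the roll shows at least four"

# Conditioning on F restricts the sample space to F; within it the
# outcomes stay equally likely, so we simply count.
pr_E_given_F = Fraction(len(E & F), len(F))
print(pr_E_given_F)  # 1/3
```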
&lt;br /&gt;
We shall now develop this idea more fully, using Venn Diagrams with the implied notion of area giving probability. Consider an abstract sample space, denoted by &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, with events &amp;lt;math&amp;gt;E\subset S,\,\,F\subset S&amp;lt;/math&amp;gt;. This is illustrated in the following Figure. Eventually we will want to construct the conditional probability, &amp;lt;math&amp;gt;\Pr \left( E|F\right)&amp;lt;/math&amp;gt;. Sticking with the above example, this could be the probability that &amp;#039;&amp;#039;a child is underweight&amp;#039;&amp;#039;, given that &amp;#039;&amp;#039;the mother is a smoker&amp;#039;&amp;#039;. Two important areas used in the construction of this conditional probability are highlighted as &amp;lt;math&amp;gt;\mathbf{a}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\mathbf{b}&amp;lt;/math&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
[[File:ProbCond_venn1.jpg|frameless|400px]]&lt;br /&gt;
&lt;br /&gt;
In general, it is useful to think of &amp;lt;math&amp;gt;\Pr (E)&amp;lt;/math&amp;gt; as &amp;lt;math&amp;gt;\frac{area\left( E\right)}{area\left( S\right) }&amp;lt;/math&amp;gt;; and similarly for &amp;lt;math&amp;gt;\Pr (F)&amp;lt;/math&amp;gt;. The &amp;lt;math&amp;gt;\Pr (E\cap F)&amp;lt;/math&amp;gt; could equally be thought of as &amp;lt;math&amp;gt;\frac{area\left( a\right)}{area\left( S\right) }&amp;lt;/math&amp;gt;. With this in mind, consider what happens if we are now told that &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; has occurred. Incorporating this information implies that the effective sample space becomes restricted to &amp;lt;math&amp;gt;S^{*}=F&amp;lt;/math&amp;gt;, since &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; now defines what can happen. This now covers the sample area &amp;lt;math&amp;gt;a+b.&amp;lt;/math&amp;gt; On this new, restricted, sample space an outcome in &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; can only be observed if that outcome also belongs to &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;, the restricted sample space &amp;lt;math&amp;gt;S^*&amp;lt;/math&amp;gt;. And this only occurs in area &amp;lt;math&amp;gt;a&amp;lt;/math&amp;gt; which corresponds to the event &amp;lt;math&amp;gt;E\cap F&amp;lt;/math&amp;gt;. Thus the event of interest &amp;#039;&amp;#039;now&amp;#039;&amp;#039; is &amp;lt;math&amp;gt;E^{*}=E\cap F,&amp;lt;/math&amp;gt; as defined on the &amp;#039;&amp;#039;restricted &amp;#039;&amp;#039;sample space of &amp;lt;math&amp;gt;S^{*}=F&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In order to proceed with the construction of the conditional probability, &amp;lt;math&amp;gt;\Pr \left( E|F\right)&amp;lt;/math&amp;gt;, let &amp;lt;math&amp;gt;area(S)=z&amp;lt;/math&amp;gt;. Then, since the ratio of the area of the event of interest to that of the sample space gives probability, we have (on this restricted sample space):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr (E|F) &amp;amp;=&amp;amp;\frac{area\left( E\cap F\right) }{area\left( F\right) } \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{a}{a+b} \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{a/z}{\left( a+b\right) /z} \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{\Pr \left( E\cap F\right) }{\Pr \left( F\right) },\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We have shown, for this example, how a conditional probability can be expressed as a function of the joint probability &amp;lt;math&amp;gt;\Pr \left( E\cap F\right)&amp;lt;/math&amp;gt; and the marginal probability &amp;lt;math&amp;gt;\Pr \left( F\right)&amp;lt;/math&amp;gt;. This is a profound result and should be formulated in more general terms:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;The probability that &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; occurs, given that &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is known to have occurred, gives the &amp;#039;&amp;#039;&amp;#039;conditional probability&amp;#039;&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; given &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;. This is denoted &amp;lt;math&amp;gt;Pr(E|F)&amp;lt;/math&amp;gt; and is calculated as&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (E|F)=\frac{\Pr (E\cap F)}{\Pr (F)}&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;and from the axioms of probability will generate a number lying between 0 and 1, since &amp;lt;math&amp;gt;\Pr (F)\geq \Pr (E\cap F)\geq 0.&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example: &amp;#039;&amp;#039;A manufacturer of electrical components knows that the probability is 0.8 that an order will be ready for shipment on time, and 0.6 that it will be both ready and delivered on time. What is the probability that such an order will be delivered on time given that it was ready for shipment on time?&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Let &amp;lt;math&amp;gt;R=&amp;lt;/math&amp;gt; READY, &amp;lt;math&amp;gt;D=&amp;lt;/math&amp;gt; DELIVERED ON TIME. &amp;lt;math&amp;gt;Pr(R)=0.8,Pr(R\cap D)=0.6.&amp;lt;/math&amp;gt; From this we need to calculate &amp;lt;math&amp;gt;Pr(D|R),&amp;lt;/math&amp;gt; using the above formula. This gives &amp;lt;math&amp;gt;Pr(D|R)=Pr(R\cap D)/Pr(R)=0.6/0.8=0.75,&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;75\%&amp;lt;/math&amp;gt;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
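The arithmetic of this example can be reproduced exactly with Python's fractions module (a sketch using only the two probabilities stated in the text):

```python
from fractions import Fraction

# Probabilities stated in the example: Pr(R) and the joint Pr(R and D).
pr_R = Fraction(8, 10)        # order ready for shipment on time
pr_R_and_D = Fraction(6, 10)  # ready AND delivered on time

# Conditional probability formula: Pr(D|R) = Pr(R and D) / Pr(R).
pr_D_given_R = pr_R_and_D / pr_R
print(pr_D_given_R)  # 3/4, i.e. 75%
```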
&lt;br /&gt;
If we re-arrange the above formula for conditional probability, we obtain the so-called &amp;#039;&amp;#039;multiplication rule of probability &amp;#039;&amp;#039;for &amp;#039;&amp;#039;intersections&amp;#039;&amp;#039; of events:&lt;br /&gt;
&lt;br /&gt;
== Multiplication rule of probability ==&lt;br /&gt;
&lt;br /&gt;
The multiplication rule of probability can be stated as follows:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;math&amp;gt;\Pr (E\cap F)=\Pr (E|F)\times \Pr (F)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that for any two events, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F,&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;(E\cap F)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;(E\cap \bar{F})&amp;lt;/math&amp;gt; are mutually exclusive with &amp;lt;math&amp;gt;E=(E\cap F)\cup (E\cap \bar{F})&amp;lt;/math&amp;gt;; this has been seen before. So the &amp;#039;&amp;#039;addition rule&amp;#039;&amp;#039; and &amp;#039;&amp;#039;multiplication rule&amp;#039;&amp;#039; of probability together give:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr (E) &amp;amp;=&amp;amp;\Pr (E\cap F)+\Pr (E\cap \bar{F}) \\&lt;br /&gt;
&amp;amp;=&amp;amp;\Pr (E|F)\times \Pr (F)+\Pr (E|\bar{F})\times \Pr (\bar{F}).\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is an extremely important and useful result, in practice, as we shall see shortly.&lt;br /&gt;
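As a small numerical illustration of this decomposition, here is a Python sketch; the values of Pr(F), Pr(E|F) and Pr(E|not F) are invented for the example, not taken from the text:

```python
from fractions import Fraction

# Hypothetical inputs (invented for illustration):
pr_F = Fraction(3, 10)            # Pr(F)
pr_E_given_F = Fraction(1, 2)     # Pr(E|F)
pr_E_given_notF = Fraction(1, 5)  # Pr(E|not F)

# Addition rule + multiplication rule combined:
# Pr(E) = Pr(E|F) Pr(F) + Pr(E|not F) Pr(not F).
pr_E = pr_E_given_F * pr_F + pr_E_given_notF * (1 - pr_F)
print(pr_E)  # 29/100
```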
&lt;br /&gt;
=== Additional resources ===&lt;br /&gt;
&lt;br /&gt;
Khan Academy&lt;br /&gt;
&lt;br /&gt;
* Another application of this rule [https://www.khanacademy.org/math/probability/independent-dependent-probability/dependent_probability/v/dependent-probability-example-1]&lt;br /&gt;
&lt;br /&gt;
== Statistical Independence ==&lt;br /&gt;
&lt;br /&gt;
If the knowledge that &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; has occurred does NOT alter our probability assessment of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; are said to be (statistically) &amp;#039;&amp;#039;independent&amp;#039;&amp;#039;. In this sense, &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; carries no information about &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Formally, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; are &amp;#039;&amp;#039;&amp;#039;independent&amp;#039;&amp;#039;&amp;#039; events if and only if&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;Pr(E|F)=Pr(E)&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;which, in turn, is true&amp;#039;&amp;#039; if and only if &amp;#039;&amp;#039;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;Pr(E\cap F)=Pr(E)\times Pr(F).&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
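The definition can be checked on the fair-die experiment from earlier. In this Python sketch the two events (chosen by us for illustration, not taken from the text) turn out to be independent:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
E = {2, 4, 6}      # "the roll is even"
F = {1, 2, 3, 4}   # "the roll is at most four"

def pr(event):
    # Equally likely outcomes: probability is |event| / |S|.
    return Fraction(len(event), len(S))

# Independence holds exactly when Pr(E and F) equals Pr(E) * Pr(F).
print(pr(E & F) == pr(E) * pr(F))  # True: 1/3 equals 1/2 * 2/3
```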
&lt;br /&gt;
This concept of independence is of enormous importance in practice. Consider the case of lung cancer and its connection to smoking (apologies to all smokers for picking on them here). The first connection between smoking and lung cancer was made in the 1920s. However, for many decades afterwards the tobacco industry spent a lot of money and effort trying to convince people that there was no connection between the two. In other words, they claimed that the two events are &amp;#039;&amp;#039;independent&amp;#039;&amp;#039;, or &amp;lt;math&amp;gt;Pr(Cancer|Smoking)=Pr(Cancer|\overline{Smoking})=Pr(Cancer)&amp;lt;/math&amp;gt;. It was then the task of epidemiologists to show otherwise. This was famously and comprehensively achieved by the [http://en.wikipedia.org/wiki/British_Doctors_Study &amp;#039;&amp;#039;British Doctors Study&amp;#039;&amp;#039;].&lt;br /&gt;
&lt;br /&gt;
== Bayes’ Theorem ==&lt;br /&gt;
&lt;br /&gt;
One area where conditional probability is extremely important is that of clinical trials - testing the power of a diagnostic test to detect the presence of a particular disease. Suppose, then, that a new test is being developed and let &amp;lt;math&amp;gt;P=&amp;lt;/math&amp;gt; ‘&amp;#039;&amp;#039;test positive&amp;#039;&amp;#039;’ and &amp;lt;math&amp;gt;D=&amp;lt;/math&amp;gt; ‘&amp;#039;&amp;#039;presence of disease&amp;#039;&amp;#039;’, but where the results from applying the diagnostic test can never be wholly reliable. From the point of view of our previous discussion on conditional probability, we would of course require &amp;lt;math&amp;gt;\Pr \left( P|D\right)&amp;lt;/math&amp;gt; to be large; i.e., the test should be effective at detecting the disease. However, if you think about it, this is not necessarily the probability that we might be interested in from a diagnosis point of view. Rather, we should be more interested in &amp;lt;math&amp;gt;\Pr \left( D|P\right)&amp;lt;/math&amp;gt;, the probability of correct diagnosis, and require this to be large (with, presumably, &amp;lt;math&amp;gt;\Pr (D|\bar{P})&amp;lt;/math&amp;gt; being small). Here, what we are trying to attach a probability to is a possible ‘cause’. The observed outcome is a positive test result (&amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt;), but the presence or non-presence of the disease is what is of interest and this is uncertain. &amp;lt;math&amp;gt;\Pr (D|P)&amp;lt;/math&amp;gt; asks the question ‘&amp;#039;&amp;#039;what is the probability that it is the presence of the disease which caused the positive test result&amp;#039;&amp;#039;’? (Another newsworthy example would be the effect of exposure to depleted uranium on Gulf and Balkan war veterans. 
Given the presence of lymph, lung or brain cancer in such individuals (&amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt;), how likely is it that the cause was exposure to depleted uranium weapons (&amp;lt;math&amp;gt;D&amp;lt;/math&amp;gt;)? Firstly, is &amp;lt;math&amp;gt;\Pr \left( D|P\right) &amp;lt;/math&amp;gt; high or low? Secondly, might there be something else (&amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;) which could offer a “better” explanation, such that &amp;lt;math&amp;gt;\Pr \left( F|P\right) &amp;gt;\Pr \left( D|P\right) &amp;lt;/math&amp;gt; ?)&lt;br /&gt;
&lt;br /&gt;
The situation is depicted in the following Figure, in which there are two possible ‘states’ in the population: &amp;lt;math&amp;gt;D&amp;lt;/math&amp;gt; (depicted by the lighter shaded area covering the left portion of the sample space) and &amp;lt;math&amp;gt;\bar{D}.&amp;lt;/math&amp;gt; It must be that &amp;lt;math&amp;gt;D\cup \bar{D}=S,&amp;lt;/math&amp;gt; since any individual in the population either has the disease or does not. The event of an observed positive test result is denoted by the closed loop, &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt;. (Notice that the shading in the diagram is relatively darker where &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt; intersects with &amp;lt;math&amp;gt;D&amp;lt;/math&amp;gt;.)&lt;br /&gt;
&lt;br /&gt;
[[File:ProbCond_venn2.jpg|frameless|400px]]&lt;br /&gt;
&lt;br /&gt;
To investigate how we might construct the required probability, &amp;lt;math&amp;gt;\Pr \left(D|P\right)&amp;lt;/math&amp;gt;, proceed as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr \left( D|P\right) &amp;amp;=&amp;amp;\frac{\Pr \left( D\cap P\right) }{\Pr (P)} \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{\Pr (D\cap P)}{\Pr (P\cap D)+\Pr (P\cap \bar{D})},\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
since &amp;lt;math&amp;gt;P=(P\cap D)\cup (P\cap \bar{D}),&amp;lt;/math&amp;gt; and these are mutually exclusive. From the multiplication rule of probability, &amp;lt;math&amp;gt;\Pr \left( P\cap D\right) =\Pr(P|D)\times \Pr (D),&amp;lt;/math&amp;gt; and similarly for &amp;lt;math&amp;gt;\Pr \left( P\cap \bar{D}\right)&amp;lt;/math&amp;gt;. Thus&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Pr \left( D|P\right) =\frac{\Pr \left( P|D\right) \times \Pr \left(D\right) }{\Pr \left( P|D\right) \times \Pr \left( D\right) +\Pr (P|\bar{D})\times \Pr \left( \bar{D}\right) },&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
which may be convenient to work with since &amp;lt;math&amp;gt;\Pr \left( P|D\right) &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Pr\left( P|\bar{D}\right) &amp;lt;/math&amp;gt; can be estimated from clinical trials and &amp;lt;math&amp;gt;\Pr\left( D\right) &amp;lt;/math&amp;gt; estimated from recent historical survey data.&lt;br /&gt;
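This formula can be sketched in Python; the sensitivity, false-positive rate and prevalence below are invented for illustration (no such figures appear in the text). The sketch shows how Pr(D|P) can be modest even for an accurate test when the disease is rare:

```python
from fractions import Fraction

# Hypothetical clinical figures (invented for illustration):
pr_P_given_D = Fraction(95, 100)    # Pr(P|D), test detects the disease
pr_P_given_notD = Fraction(2, 100)  # Pr(P|not D), false-positive rate
pr_D = Fraction(1, 100)             # Pr(D), prevalence in the population

# Bayes' Theorem with the two-event partition D, not D.
numerator = pr_P_given_D * pr_D
pr_D_given_P = numerator / (numerator + pr_P_given_notD * (1 - pr_D))
print(pr_D_given_P)  # 95/293, roughly 0.32
```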
&lt;br /&gt;
This sort of calculation (assigning probabilities to possible causes of observed events) is an example of &amp;#039;&amp;#039;Bayes’ Theorem&amp;#039;&amp;#039;. Of course, we may have to consider more than two possible causes, and the construction of the appropriate probabilities is as follows.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Consider a sample space, &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;A,B,C&amp;lt;/math&amp;gt; are three mutually exclusive events (possible causes), defined on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, such that &amp;lt;math&amp;gt;S=A\cup B\cup C&amp;lt;/math&amp;gt;. In such a situation, &amp;lt;math&amp;gt;A,B&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; are said to form a &amp;#039;&amp;#039;&amp;#039;partition&amp;#039;&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;. &amp;#039;&amp;#039;&amp;#039;Bayes’ Theorem&amp;#039;&amp;#039;&amp;#039; states that:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (A|E)=\frac{\Pr (E|A)\times \Pr (A)}{\left\{ \Pr (E|A)\times \Pr(A)\right\} +\left\{ \Pr (E|B)\times \Pr (B)\right\} +\left\{ \Pr(E|C)\times \Pr (C)\right\} }.&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;And, more generally, consider a sample space, &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F_{1},F_{2},...,F_{k}&amp;lt;/math&amp;gt; are &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; mutually exclusive events (possible causes), which form a partition of &amp;lt;math&amp;gt;S:S=\bigcup_{j=1}^{k}F_{j}&amp;lt;/math&amp;gt;. &amp;#039;&amp;#039;&amp;#039;Bayes’ Theorem&amp;#039;&amp;#039;&amp;#039; then states that:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (F_{j}|E)=\frac{\Pr (E|F_{j})\times \Pr (F_{j})}{\sum_{s=1}^{k}\left\{\Pr (E|F_{s})\times \Pr (F_{s})\right\} }.&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
From the above formula, you should be able to satisfy yourself that &amp;lt;math&amp;gt;\sum_{j=1}^{k}\Pr \left( F_{j}|E\right) =1.&amp;lt;/math&amp;gt; If this is not at first clear, consider case (1) and show that &amp;lt;math&amp;gt;\Pr \left( A|E\right) +\Pr \left( B|E\right) +\Pr \left( C|E\right) =1.&amp;lt;/math&amp;gt; The reason for this is that since &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; form a partition of &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; they must also form a partition of any event &amp;lt;math&amp;gt;E\subset S.&amp;lt;/math&amp;gt; In the above conditional probabilities, we are regarding &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; as the restricted sample space and therefore the probabilities assigned to the mutually exclusive events &amp;lt;math&amp;gt;\left( A,B,C\right) &amp;lt;/math&amp;gt; which &amp;#039;&amp;#039;cover&amp;#039;&amp;#039; this (restricted) sample space, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, must sum to 1.&lt;br /&gt;
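The general partition formula, and the fact that the posteriors sum to 1, can be sketched as a small Python function (the priors and likelihoods below are invented for illustration):

```python
from fractions import Fraction

def bayes(priors, likelihoods):
    # Posterior Pr(F_j|E) for a partition F_1,...,F_k with priors Pr(F_j)
    # and likelihoods Pr(E|F_j).
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)  # Pr(E), by the law of total probability
    return [j / total for j in joint]

# Three-cause partition A, B, C as in case (1); numbers invented:
post = bayes([Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)],
             [Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)])
print(sum(post))  # 1: the posteriors over the partition sum to one
```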
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example&amp;#039;&amp;#039;: Box A contains 2 red balls. Box B contains 1 red and 1 white ball. Box A and Box B are identical. If a box is selected at random and one ball is withdrawn from it, what is the probability that the selected box was Box A if the ball withdrawn from it turns out to be red?&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Let &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; be the event of selecting Box A and &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; the event of drawing a red ball. Require &amp;lt;math&amp;gt;Pr(A|R)&amp;lt;/math&amp;gt;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;Pr(A|R)=Pr(A\cap R)/Pr(R);&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;Pr(A\cap R)=Pr(A)Pr(R|A)=(1/2)\times 1=1/2.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And,&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
Pr(R) &amp;amp;=&amp;amp;Pr(A\cap R)+Pr(\bar{A}\cap R) \\&lt;br /&gt;
&amp;amp;=&amp;amp;\Pr (A)\times \Pr (R|A)\,\,\,\,+\,\,\,\,\Pr (\bar{A})\times \Pr (R|\bar{A}) \\&lt;br /&gt;
&amp;amp;=&amp;amp;(1/2)\,\,\,\,+\,\,\,\,(1/2)\times (1/2) \\&lt;br /&gt;
&amp;amp;=&amp;amp;3/4.\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Therefore, &amp;lt;math&amp;gt;\Pr (A|R)=(1/2)/(3/4)=2/3&amp;lt;/math&amp;gt;.&lt;br /&gt;
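This result can be verified by enumerating the four equally likely (box, ball) draws in Python (a sketch of the counting argument, not part of the original page):

```python
from fractions import Fraction

# Choosing a box (1/2 each) and then a ball (1/2 each) makes every
# (box, ball) pair below have probability 1/4; Box A's two red balls
# are distinct draws, so the pair ("A", "red") appears twice.
outcomes = [("A", "red"), ("A", "red"), ("B", "red"), ("B", "white")]

red = [o for o in outcomes if o[1] == "red"]
a_and_red = [o for o in red if o[0] == "A"]

# Pr(A|R): among the equally likely red draws, the share from Box A.
pr_A_given_R = Fraction(len(a_and_red), len(red))
print(pr_A_given_R)  # 2/3
```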
&lt;br /&gt;
= Additional resources =&lt;br /&gt;
&lt;br /&gt;
Khan Academy&lt;br /&gt;
&lt;br /&gt;
* A different example that intuitively leads to Bayes’ formula [https://www.khanacademy.org/math/probability/independent-dependent-probability/dependent_probability/v/introduction-to-dependent-probability]&lt;br /&gt;
* Should you switch the door? A classic problem. [https://www.khanacademy.org/math/probability/independent-dependent-probability/dependent_probability/v/monty-hall-problem]&lt;br /&gt;
&lt;br /&gt;
= Exercises =&lt;br /&gt;
&lt;br /&gt;
You can find examples related to these topics here: [[Probability_Conditional_Exercises]].&lt;br /&gt;
&lt;br /&gt;
= Footnotes =&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Conditional&amp;diff=2879</id>
		<title>Probability Conditional</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Conditional&amp;diff=2879"/>
				<updated>2013-08-09T10:58:12Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: /* Additional resources */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
&lt;br /&gt;
= Conditional Probability =&lt;br /&gt;
&lt;br /&gt;
An important consideration in the development of probability is that of &amp;#039;&amp;#039;conditional probability&amp;#039;&amp;#039;. This refers to the updating of probabilities in the light of revealed information. For example, insurance companies nearly always set their home contents insurance premiums on the basis of the postcode in which the home is located. That is to say, insurance companies believe the risk depends upon the location; i.e., the probability of property crime is assessed conditional upon the location of the property. (A similar calculation is made to set car insurance premiums.) As a result, the premiums for two identical households located in different parts of the country can differ substantially.&lt;br /&gt;
&lt;br /&gt;
* In general, the probability of an event, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, occurring &amp;#039;&amp;#039;given&amp;#039;&amp;#039; that an event, &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;, has occurred is called the &amp;#039;&amp;#039;conditional probability&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; given &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; and is denoted &amp;lt;math&amp;gt;\Pr (E|F)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
As another example, it has been well documented that the ability of a newborn baby to survive is closely associated with its birth-weight. A birth-weight of less than 1500&amp;#039;&amp;#039;g&amp;#039;&amp;#039; is regarded as dangerously low. Consider &amp;lt;math&amp;gt;E=&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;birth weight of a baby is less than&amp;#039;&amp;#039; 1500&amp;#039;&amp;#039;g&amp;#039;&amp;#039;, &amp;lt;math&amp;gt;F=&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;mother smoked during pregnancy&amp;#039;&amp;#039;; then evidence as to whether &amp;lt;math&amp;gt;\Pr(E|F)&amp;gt;\Pr (E|\bar{F})&amp;lt;/math&amp;gt; is of considerable interest.&lt;br /&gt;
&lt;br /&gt;
As a preliminary to the main development, consider the simple experiment of rolling a fair die and observing the number of dots on the upturned face. Then &amp;lt;math&amp;gt;S=\left\{ 1,2,3,4,5,6\right\} &amp;lt;/math&amp;gt; and define events, &amp;lt;math&amp;gt;E=\left\{4\right\} &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F=\left\{ 4,5,6\right\} ;&amp;lt;/math&amp;gt; we are interested in &amp;lt;math&amp;gt;\Pr \left(E|F\right)&amp;lt;/math&amp;gt;. To work this out we take &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; as known. Given this knowledge the sample space becomes restricted to simply &amp;lt;math&amp;gt;\left\{ 4,5,6\right\} &amp;lt;/math&amp;gt; and, given no other information, each of these &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; outcomes remains equally likely. So the required event, &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt;, is just one of three equally likely outcomes. It therefore seems reasonable that &amp;lt;math&amp;gt;\Pr (E|F)=\frac{1}{3}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
We shall now develop this idea more fully, using Venn Diagrams with the implied notion of area giving probability. Consider an abstract sample space, denoted by &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, with events &amp;lt;math&amp;gt;E\subset S,\,\,F\subset S&amp;lt;/math&amp;gt;. This is illustrated in the following Figure. Eventually we will want to construct the conditional probability, &amp;lt;math&amp;gt;\Pr \left( E|F\right)&amp;lt;/math&amp;gt;. Sticking with the above example, this could be the probability that &amp;#039;&amp;#039;a child is underweight&amp;#039;&amp;#039;, given that &amp;#039;&amp;#039;the mother is a smoker&amp;#039;&amp;#039;. Two important areas used in the construction of this conditional probability are highlighted as &amp;lt;math&amp;gt;\mathbf{a}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\mathbf{b}&amp;lt;/math&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
[[File:ProbCond_venn1.jpg|frameless|400px]]&lt;br /&gt;
&lt;br /&gt;
In general, it is useful to think of &amp;lt;math&amp;gt;\Pr (E)&amp;lt;/math&amp;gt; as &amp;lt;math&amp;gt;\frac{area\left( E\right)}{area\left( S\right) }&amp;lt;/math&amp;gt;; and similarly for &amp;lt;math&amp;gt;\Pr (F)&amp;lt;/math&amp;gt;. The &amp;lt;math&amp;gt;\Pr (E\cap F)&amp;lt;/math&amp;gt; could equally be thought of as &amp;lt;math&amp;gt;\frac{area\left( a\right)}{area\left( S\right) }&amp;lt;/math&amp;gt;. With this in mind, consider what happens if we are now told that &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; has occurred. Incorporating this information implies that the effective sample space becomes restricted to &amp;lt;math&amp;gt;S^{*}=F&amp;lt;/math&amp;gt;, since &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; now defines what can happen. This now covers the sample area &amp;lt;math&amp;gt;a+b.&amp;lt;/math&amp;gt; On this new, restricted, sample space an outcome in &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; can only be observed if that outcome also belongs to &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;, the restricted sample space &amp;lt;math&amp;gt;S^*&amp;lt;/math&amp;gt;. And this only occurs in area &amp;lt;math&amp;gt;a&amp;lt;/math&amp;gt; which corresponds to the event &amp;lt;math&amp;gt;E\cap F&amp;lt;/math&amp;gt;. Thus the event of interest &amp;#039;&amp;#039;now&amp;#039;&amp;#039; is &amp;lt;math&amp;gt;E^{*}=E\cap F,&amp;lt;/math&amp;gt; as defined on the &amp;#039;&amp;#039;restricted &amp;#039;&amp;#039;sample space of &amp;lt;math&amp;gt;S^{*}=F&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In order to proceed with the construction of the conditional probability, &amp;lt;math&amp;gt;\Pr \left( E|F\right)&amp;lt;/math&amp;gt;, let &amp;lt;math&amp;gt;area(S)=z&amp;lt;/math&amp;gt;. Then, since the ratio of the area of the event of interest to that of the sample space gives probability, we have (on this restricted sample space):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr (E|F) &amp;amp;=&amp;amp;\frac{area\left( E\cap F\right) }{area\left( F\right) } \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{a}{a+b} \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{a/z}{\left( a+b\right) /z} \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{\Pr \left( E\cap F\right) }{\Pr \left( F\right) },\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We have shown, for this example, how a conditional probability can be expressed as a function of the joint probability &amp;lt;math&amp;gt;\Pr \left( E\cap F\right)&amp;lt;/math&amp;gt; and the marginal probability &amp;lt;math&amp;gt;\Pr \left( F\right)&amp;lt;/math&amp;gt;. This is a profound result and should be formulated in more general terms:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;The probability that &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; occurs, given that &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is known to have occurred, gives the &amp;#039;&amp;#039;&amp;#039;conditional probability&amp;#039;&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; given &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;. This is denoted &amp;lt;math&amp;gt;Pr(E|F)&amp;lt;/math&amp;gt; and is calculated as&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (E|F)=\frac{\Pr (E\cap F)}{\Pr (F)}&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;and, by the axioms of probability, this generates a number lying between 0 and 1, since &amp;lt;math&amp;gt;\Pr (F)\geq \Pr (E\cap F)\geq 0.&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example: &amp;#039;&amp;#039;A manufacturer of electrical components knows that the probability is 0.8 that an order will be ready for shipment on time, and 0.6 that it will be both ready for shipment and delivered on time. What is the probability that such an order will be delivered on time, given that it was ready for shipment on time?&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Let &amp;lt;math&amp;gt;R=&amp;lt;/math&amp;gt; READY, &amp;lt;math&amp;gt;D=&amp;lt;/math&amp;gt; DELIVERED ON TIME, so that &amp;lt;math&amp;gt;Pr(R)=0.8&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Pr(R\cap D)=0.6.&amp;lt;/math&amp;gt; From this we need to calculate &amp;lt;math&amp;gt;Pr(D|R)&amp;lt;/math&amp;gt; using the above formula. This gives &amp;lt;math&amp;gt;Pr(D|R)=Pr(R\cap D)/Pr(R)=6/8=0.75&amp;lt;/math&amp;gt;, or &amp;lt;math&amp;gt;75\%&amp;lt;/math&amp;gt;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
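The arithmetic in this example is easy to check numerically. A minimal Python sketch (the variable names are ours, not part of the example; exact fractions avoid floating-point noise):

```python
from fractions import Fraction

# Conditional probability via the formula Pr(D|R) = Pr(R and D) / Pr(R).
pr_R = Fraction(8, 10)        # Pr(order ready for shipment on time)
pr_R_and_D = Fraction(6, 10)  # Pr(ready AND delivered on time)

pr_D_given_R = pr_R_and_D / pr_R
print(pr_D_given_R)  # 3/4
```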
&lt;br /&gt;
If we re-arrange the above formula for conditional probability, we obtain the so-called &amp;#039;&amp;#039;multiplication rule of probability&amp;#039;&amp;#039; for &amp;#039;&amp;#039;intersections&amp;#039;&amp;#039; of events:&lt;br /&gt;
&lt;br /&gt;
== Multiplication rule of probability ==&lt;br /&gt;
&lt;br /&gt;
The multiplication rule of probability can be stated as follows:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;math&amp;gt;\Pr (E\cap F)=\Pr (E|F)\times \Pr (F)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that for any two events, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F,&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;(E\cap F)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;(E\cap \bar{F})&amp;lt;/math&amp;gt; are mutually exclusive with &amp;lt;math&amp;gt;E=(E\cap F)\cup (E\cap \bar{F})&amp;lt;/math&amp;gt;; this has been seen before. So the &amp;#039;&amp;#039;addition rule&amp;#039;&amp;#039; and &amp;#039;&amp;#039;multiplication rule&amp;#039;&amp;#039; of probability together give:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr (E) &amp;amp;=&amp;amp;\Pr (E\cap F)+\Pr (E\cap \bar{F}) \\&lt;br /&gt;
&amp;amp;=&amp;amp;\Pr (E|F)\times \Pr (F)+\Pr (E|\bar{F})\times \Pr (\bar{F}).\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is an extremely important and useful result, in practice, as we shall see shortly.&lt;br /&gt;
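As a numerical sketch of this decomposition (all of the probabilities below are hypothetical, chosen only to illustrate the rule):

```python
# Total probability: Pr(E) = Pr(E|F)*Pr(F) + Pr(E|not F)*Pr(not F).
# The three inputs are made-up illustration values, not from the text.
pr_F = 0.3
pr_E_given_F = 0.9
pr_E_given_notF = 0.2

pr_E = pr_E_given_F * pr_F + pr_E_given_notF * (1 - pr_F)
print(pr_E)  # roughly 0.41
```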
&lt;br /&gt;
=== Additional resources ===&lt;br /&gt;
&lt;br /&gt;
Khan Academy&lt;br /&gt;
&lt;br /&gt;
* Another application of this rule [https://www.khanacademy.org/math/probability/independent-dependent-probability/dependent_probability/v/dependent-probability-example-1]&lt;br /&gt;
&lt;br /&gt;
== Statistical Independence ==&lt;br /&gt;
&lt;br /&gt;
If the knowledge that &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; has occurred does NOT alter our probability assessment of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; are said to be (statistically) &amp;#039;&amp;#039;independent&amp;#039;&amp;#039;. In this sense, &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; carries no information about &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Formally, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; are &amp;#039;&amp;#039;&amp;#039;independent&amp;#039;&amp;#039;&amp;#039; events if and only if&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;Pr(E|F)=Pr(E)&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;which, in turn, is true &amp;#039;&amp;#039;if and only if&amp;#039;&amp;#039;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;Pr(E\cap F)=Pr(E)\times Pr(F).&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
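The product condition can be checked by direct enumeration. A small sketch using the fair-die setting from earlier (the particular events E and F here are our own choice, not from the text):

```python
from fractions import Fraction

# Independence check on a fair die: E = "even number", F = "at most 4".
S = {1, 2, 3, 4, 5, 6}
E = {2, 4, 6}
F = {1, 2, 3, 4}

def pr(event):
    # Classical probability: favourable outcomes over total outcomes.
    return Fraction(len(event & S), len(S))

# Pr(E & F) = 1/3 equals Pr(E)*Pr(F) = (1/2)*(2/3), so E and F are independent.
independent = pr(E & F) == pr(E) * pr(F)
print(independent)  # True
```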
&lt;br /&gt;
This concept of independence is of enormous importance in practice. Consider the case of lung cancer and its connection to smoking (apologies to all smokers for being picked on here). The first connection between smoking and lung cancer was made in the 1920s. However, for many decades afterwards the tobacco industry spent a lot of money and effort to convince people that there was no connection between the two. In other words, the industry claimed that the two events are &amp;#039;&amp;#039;independent&amp;#039;&amp;#039;, i.e. that &amp;lt;math&amp;gt;Pr(Cancer|Smoking)=Pr(Cancer|\overline{Smoking})=Pr(Cancer)&amp;lt;/math&amp;gt;. It was then the task of epidemiologists to show otherwise. This was famously and comprehensively achieved by the [http://en.wikipedia.org/wiki/British_Doctors_Study &amp;#039;&amp;#039;British Doctors Study&amp;#039;&amp;#039;].&lt;br /&gt;
&lt;br /&gt;
== Bayes’ Theorem ==&lt;br /&gt;
&lt;br /&gt;
One area where conditional probability is extremely important is that of clinical trials - testing the power of a diagnostic test to detect the presence of a particular disease. Suppose, then, that a new test is being developed and let &amp;lt;math&amp;gt;P=&amp;lt;/math&amp;gt; ‘&amp;#039;&amp;#039;test positive&amp;#039;&amp;#039;’ and &amp;lt;math&amp;gt;D=&amp;lt;/math&amp;gt; ‘&amp;#039;&amp;#039;presence of disease&amp;#039;&amp;#039;’, but where the results from applying the diagnostic test can never be wholly reliable. From the point of view of our previous discussion on conditional probability, we would of course require &amp;lt;math&amp;gt;\Pr \left( P|D\right)&amp;lt;/math&amp;gt; to be large; i.e., the test should be effective at detecting the disease. However, if you think about it, this is not necessarily the probability that we might be interested in from a diagnosis point of view. Rather, we should be more interested in &amp;lt;math&amp;gt;\Pr \left( D|P\right)&amp;lt;/math&amp;gt;, the probability of correct diagnosis, and require this to be large (with, presumably, &amp;lt;math&amp;gt;\Pr (D|\bar{P})&amp;lt;/math&amp;gt; being small). Here, what we are trying to attach a probability to is a possible ‘cause’. The observed outcome is a positive test result (&amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt;), but the presence or non-presence of the disease is what is of interest and this is uncertain. &amp;lt;math&amp;gt;\Pr (D|P)&amp;lt;/math&amp;gt; asks the question ‘&amp;#039;&amp;#039;what is the probability that it is the presence of the disease which caused the positive test result&amp;#039;&amp;#039;’? (Another recent news-worthy example would be the effect of exposure to depleted uranium on Gulf and Balkan war veterans. 
Given the presence of lymph, lung or brain cancer in such individuals (&amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt;), how likely is it that the cause was exposure to depleted uranium weapons (&amp;lt;math&amp;gt;D&amp;lt;/math&amp;gt;)? Firstly, is &amp;lt;math&amp;gt;\Pr \left( D|P\right) &amp;lt;/math&amp;gt; high or low? Secondly, might there be something else (&amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;) which could offer a “better” explanation, such that &amp;lt;math&amp;gt;\Pr \left( F|P\right) &amp;gt;\Pr \left( D|P\right) &amp;lt;/math&amp;gt;?)&lt;br /&gt;
&lt;br /&gt;
The situation is depicted in the following Figure, in which there are two possible ‘states’ in the population: &amp;lt;math&amp;gt;D&amp;lt;/math&amp;gt; (depicted by the lighter shaded area covering the left portion of the sample space) and &amp;lt;math&amp;gt;\bar{D}.&amp;lt;/math&amp;gt; It must be that &amp;lt;math&amp;gt;D\cup \bar{D}=S,&amp;lt;/math&amp;gt; since any individual in the population either has the disease or does not. The event of an observed positive test result is denoted by the closed loop, &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt;. (Notice that the shading in the diagram is relatively darker where &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt; intersects with &amp;lt;math&amp;gt;D&amp;lt;/math&amp;gt;.)&lt;br /&gt;
&lt;br /&gt;
[[File:ProbCond_venn2.jpg|frameless|400px]]&lt;br /&gt;
&lt;br /&gt;
To investigate how we might construct the required probability, &amp;lt;math&amp;gt;\Pr \left(D|P\right)&amp;lt;/math&amp;gt;, proceed as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr \left( D|P\right) &amp;amp;=&amp;amp;\frac{\Pr \left( D\cap P\right) }{\Pr (P)} \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{\Pr (D\cap P)}{\Pr (P\cap D)+\Pr (P\cap \bar{D})},\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
since &amp;lt;math&amp;gt;P=(P\cap D)\cup (P\cap \bar{D}),&amp;lt;/math&amp;gt; and these are mutually exclusive. From the multiplication rule of probability, &amp;lt;math&amp;gt;\Pr \left( P\cap D\right) =\Pr(P|D)\times \Pr (D),&amp;lt;/math&amp;gt; and similarly for &amp;lt;math&amp;gt;\Pr \left( P\cap \bar{D}\right)&amp;lt;/math&amp;gt;. Thus&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Pr \left( D|P\right) =\frac{\Pr \left( P|D\right) \times \Pr \left(D\right) }{\Pr \left( P|D\right) \times \Pr \left( D\right) +\Pr (P|\bar{D})\times \Pr \left( \bar{D}\right) },&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
which may be convenient to work with since &amp;lt;math&amp;gt;\Pr \left( P|D\right) &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Pr\left( P|\bar{D}\right) &amp;lt;/math&amp;gt; can be estimated from clinical trials and &amp;lt;math&amp;gt;\Pr\left( D\right) &amp;lt;/math&amp;gt; estimated from recent historical survey data.&lt;br /&gt;
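To make this formula concrete, here is a small numerical sketch. The sensitivity, false-positive rate and prevalence below are hypothetical illustration values, not real clinical data:

```python
# Pr(D|P) from Bayes' formula; all three inputs are hypothetical numbers.
pr_P_given_D = 0.95     # Pr(P|D), e.g. estimated from clinical trials
pr_P_given_notD = 0.02  # Pr(P|not D), the false-positive rate
pr_D = 0.01             # Pr(D), e.g. from historical survey data

num = pr_P_given_D * pr_D
pr_D_given_P = num / (num + pr_P_given_notD * (1 - pr_D))
print(round(pr_D_given_P, 3))  # about 0.324
```

Note how a test with high &math&\Pr(P|D)& can still yield a modest &math&\Pr(D|P)& when the disease is rare: most positives then come from the large healthy group.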
&lt;br /&gt;
This sort of calculation (assigning probabilities to possible causes of observed events) is an example of &amp;#039;&amp;#039;Bayes’ Theorem&amp;#039;&amp;#039;. Of course, we may have to consider more than two possible causes, and the construction of the appropriate probabilities is as follows.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Consider a sample space, &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;A,B,C&amp;lt;/math&amp;gt; are three mutually exclusive events (possible causes), defined on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, such that &amp;lt;math&amp;gt;S=A\cup B\cup C&amp;lt;/math&amp;gt;. In such a situation, &amp;lt;math&amp;gt;A,B&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; are said to form a &amp;#039;&amp;#039;&amp;#039;partition&amp;#039;&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;. &amp;#039;&amp;#039;&amp;#039;Bayes’ Theorem&amp;#039;&amp;#039;&amp;#039; states that:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (A|E)=\frac{\Pr (E|A)\times \Pr (A)}{\left\{ \Pr (E|A)\times \Pr(A)\right\} +\left\{ \Pr (E|B)\times \Pr (B)\right\} +\left\{ \Pr(E|C)\times \Pr (C)\right\} }.&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;And, more generally, consider a sample space, &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F_{1},F_{2},...,F_{k}&amp;lt;/math&amp;gt; are &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; mutually exclusive events (possible causes), which form a partition of &amp;lt;math&amp;gt;S:S=\bigcup_{j=1}^{k}F_{j}&amp;lt;/math&amp;gt;. &amp;#039;&amp;#039;&amp;#039;Bayes’ Theorem&amp;#039;&amp;#039;&amp;#039; then states that:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (F_{j}|E)=\frac{\Pr (E|F_{j})\times \Pr (F_{j})}{\sum_{s=1}^{k}\left\{\Pr (E|F_{s})\times \Pr (F_{s})\right\} }.&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
From the above formula, you should be able to satisfy yourself that &amp;lt;math&amp;gt;\sum_{j=1}^{k}\Pr \left( F_{j}|E\right) =1.&amp;lt;/math&amp;gt; If this is not at first clear, consider case (1) and show that &amp;lt;math&amp;gt;\Pr \left( A|E\right) +\Pr \left( B|E\right) +\Pr \left( C|E\right) =1.&amp;lt;/math&amp;gt; The reason for this is that since &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; form a partition of &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; they must also form a partition of any event &amp;lt;math&amp;gt;E\subset S.&amp;lt;/math&amp;gt; In the above conditional probabilities, we are regarding &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; as the restricted sample space, and therefore the probabilities assigned to the mutually exclusive events &amp;lt;math&amp;gt;\left( A,B,C\right) &amp;lt;/math&amp;gt; which &amp;#039;&amp;#039;cover&amp;#039;&amp;#039; this (restricted) sample space, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, must sum to 1.&lt;br /&gt;
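The general partition formula, and the fact that the posterior probabilities sum to 1, can be sketched numerically. The priors and likelihoods below are hypothetical numbers chosen only to illustrate the computation:

```python
# Bayes' Theorem with a three-event partition {A, B, C} of S.
# Priors Pr(cause) and likelihoods Pr(E|cause) are made-up values.
priors = {"A": 0.5, "B": 0.3, "C": 0.2}   # must sum to 1
likes  = {"A": 0.1, "B": 0.4, "C": 0.25}  # Pr(E | cause)

pr_E = sum(likes[c] * priors[c] for c in priors)  # total probability of E
posterior = {c: likes[c] * priors[c] / pr_E for c in priors}

print(posterior)
print(sum(posterior.values()))  # sums to 1, up to floating-point rounding
```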
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example&amp;#039;&amp;#039;: Box A contains 2 red balls. Box B contains 1 red and 1 white ball. Box A and Box B look identical from the outside. If a box is selected at random and one ball is withdrawn from it, what is the probability that the selected box was Box A, given that the ball withdrawn from it turns out to be red?&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Let &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; be the event of selecting Box A and &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; the event of drawing a red ball. Require &amp;lt;math&amp;gt;Pr(A|R)&amp;lt;/math&amp;gt;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;Pr(A|R)=Pr(A\cap R)/Pr(R);&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;Pr(A\cap R)=Pr(A)Pr(R|A)=(1/2)\times 1=1/2.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And,&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
Pr(R) &amp;amp;=&amp;amp;Pr(A\cap R)+Pr(\bar{A}\cap R) \\&lt;br /&gt;
&amp;amp;=&amp;amp;\Pr (A)\times \Pr (R|A)\,\,\,\,+\,\,\,\,\Pr (\bar{A})\times \Pr (R|\bar{A}) \\&lt;br /&gt;
&amp;amp;=&amp;amp;(1/2)\,\,\,\,+\,\,\,\,(1/2)\times (1/2) \\&lt;br /&gt;
&amp;amp;=&amp;amp;3/4.\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Therefore, &amp;lt;math&amp;gt;\Pr (A|R)=(1/2)/(3/4)=2/3&amp;lt;/math&amp;gt;.&lt;br /&gt;
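The answer can be confirmed by enumerating the four equally likely (box, ball) outcomes, each with probability &math&(1/2)\times(1/2)=1/4&:

```python
from fractions import Fraction

# Enumerate the box example: Box A holds two red balls, Box B one red
# and one white; the box and then the ball are chosen uniformly at random,
# so each listed (box, ball) outcome has probability 1/4.
outcomes = [("A", "red"), ("A", "red"), ("B", "red"), ("B", "white")]

# Condition on "red": keep only red-ball outcomes, then count Box A among them.
red = [box for box, colour in outcomes if colour == "red"]
pr_A_given_R = Fraction(red.count("A"), len(red))
print(pr_A_given_R)  # 2/3
```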
&lt;br /&gt;
= Additional resources =&lt;br /&gt;
&lt;br /&gt;
Khan Academy&lt;br /&gt;
&lt;br /&gt;
* A different example that intuitively leads to Bayes’ formula [https://www.khanacademy.org/math/probability/independent-dependent-probability/dependent_probability/v/introduction-to-dependent-probability]&lt;br /&gt;
* Should you switch the door? A classic problem. [https://www.khanacademy.org/math/probability/independent-dependent-probability/dependent_probability/v/monty-hall-problem]&lt;br /&gt;
&lt;br /&gt;
= Exercises =&lt;br /&gt;
&lt;br /&gt;
You can find examples related to these topics here: [[Probability_Conditional_Exercises]].&lt;br /&gt;
&lt;br /&gt;
= Footnotes =&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Conditional&amp;diff=2878</id>
		<title>Probability Conditional</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Conditional&amp;diff=2878"/>
				<updated>2013-08-09T10:56:46Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
&lt;br /&gt;
= Conditional Probability =&lt;br /&gt;
&lt;br /&gt;
An important consideration in the development of probability is that of &amp;#039;&amp;#039;conditional probability&amp;#039;&amp;#039;. This refers to the calculation of updating probabilities in the light of revealed information. For example, insurance companies nearly always set their home contents insurance premiums on the basis of the postcode in which the home is located. That is to say, insurance companies believe the risk depends upon the location; i.e., the probability of property crime is assessed conditional upon the location of the property. (A similar calculation is made to set car insurance premiums.) As a result, the premiums for two identical households located in different parts of the country can differ substantially.&lt;br /&gt;
&lt;br /&gt;
* In general, the probability of an event, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, occurring &amp;#039;&amp;#039;given&amp;#039;&amp;#039; that an event, &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;, has occurred is called the &amp;#039;&amp;#039;conditional probability&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; given &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; and is denoted &amp;lt;math&amp;gt;\Pr (E|F)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
As another example, it has been well documented that the ability of a new born baby to survive is closely associated with its birth-weight. A birth-weight of less than 1500&amp;#039;&amp;#039;g&amp;#039;&amp;#039; is regarded as dangerously low. Consider &amp;lt;math&amp;gt;E=&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;birth weight of a baby is less than&amp;#039;&amp;#039; 1500&amp;#039;&amp;#039;g&amp;#039;&amp;#039;, &amp;lt;math&amp;gt;F=&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;mother smoked during pregnancy&amp;#039;&amp;#039;; then evidence as to whether &amp;lt;math&amp;gt;\Pr(E|F)&amp;gt;\Pr (E|\bar{F})&amp;lt;/math&amp;gt; is of considerable interest.&lt;br /&gt;
&lt;br /&gt;
As a preliminary to the main development, consider the simple experiment of rolling a fair die and observing the number of dots on the upturned face. Then &amp;lt;math&amp;gt;S=\left\{ 1,2,3,4,5,6\right\} &amp;lt;/math&amp;gt; and define events, &amp;lt;math&amp;gt;E=\left\{4\right\} &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F=\left\{ 4,5,6\right\} ;&amp;lt;/math&amp;gt; we are interested in &amp;lt;math&amp;gt;\Pr \left(E|F\right)&amp;lt;/math&amp;gt;. To work this out we take &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; as known. Given this knowledge the sample space becomes restricted to simply &amp;lt;math&amp;gt;\left\{ 4,5,6\right\} &amp;lt;/math&amp;gt; and, given no other information, each of these &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; outcome remains equally likely. So the required event, &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt;, is just one of three equally likely outcomes. It therefore seems reasonable that &amp;lt;math&amp;gt;\Pr (E|F)=\frac{1}{3}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
We shall now develop this idea more fully, using Venn Diagrams with the implied notion of area giving probability. Consider an abstract sample space, denoted by &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, with events &amp;lt;math&amp;gt;E\subset S,\,\,F\subset S&amp;lt;/math&amp;gt;. This is illustrated in the following Figure. Eventually we will want to construct the conditional probability, &amp;lt;math&amp;gt;\Pr \left( E|F\right)&amp;lt;/math&amp;gt;. Sticking with the above example that could be the probability that &amp;#039;&amp;#039;a child is underweight&amp;#039;&amp;#039;, given that &amp;#039;&amp;#039;the mother is a smoker&amp;#039;&amp;#039;. Two important areas used in the construction of this conditional probability are highlighted as &amp;lt;math&amp;gt;\mathbf{a}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\mathbf{b}&amp;lt;/math&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
[[File:ProbCond_venn1.jpg|frameless|400px]]&lt;br /&gt;
&lt;br /&gt;
In general, it is useful to think of &amp;lt;math&amp;gt;\Pr (E)&amp;lt;/math&amp;gt; as &amp;lt;math&amp;gt;\frac{area\left( E\right)}{area\left( S\right) }&amp;lt;/math&amp;gt;; and similarly for &amp;lt;math&amp;gt;\Pr (F)&amp;lt;/math&amp;gt;. The &amp;lt;math&amp;gt;\Pr (E\cap F)&amp;lt;/math&amp;gt; could equally be thought of as &amp;lt;math&amp;gt;\frac{area\left( a\right)}{area\left( S\right) }&amp;lt;/math&amp;gt;. With this in mind, consider what happens if we are now told that &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; has occurred. Incorporating this information implies that the effective sample space becomes restricted to &amp;lt;math&amp;gt;S^{*}=F&amp;lt;/math&amp;gt;, since &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; now defines what can happen. This now covers the sample area &amp;lt;math&amp;gt;a+b.&amp;lt;/math&amp;gt; On this new, restricted, sample space an outcome in &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; can only be observed if that outcome also belongs to &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;, the restricted sample space &amp;lt;math&amp;gt;S^*&amp;lt;/math&amp;gt;. And this only occurs in area &amp;lt;math&amp;gt;a&amp;lt;/math&amp;gt; which corresponds to the event &amp;lt;math&amp;gt;E\cap F&amp;lt;/math&amp;gt;. Thus the event of interest &amp;#039;&amp;#039;now&amp;#039;&amp;#039; is &amp;lt;math&amp;gt;E^{*}=E\cap F,&amp;lt;/math&amp;gt; as defined on the &amp;#039;&amp;#039;restricted &amp;#039;&amp;#039;sample space of &amp;lt;math&amp;gt;S^{*}=F&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In order to proceed with the construction of the conditional probability, &amp;lt;math&amp;gt;\Pr \left( E|F\right)&amp;lt;/math&amp;gt;, let &amp;lt;math&amp;gt;area(S)=z&amp;lt;/math&amp;gt;. Then, since the ratio of the area of the event of interest to that of the sample space gives probability, we have (on this restricted sample space):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr (E|F) &amp;amp;=&amp;amp;\frac{area\left( E\cap F\right) }{area\left( F\right) } \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{a}{a+b} \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{a/z}{\left( a+b\right) /z} \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{\Pr \left( E\cap F\right) }{\Pr \left( F\right) },\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We have shown, for this example how a conditional probability can be expressed as a function of the joint probability &amp;lt;math&amp;gt;\Pr \left( E\cap F\right)&amp;lt;/math&amp;gt; and the marginal probability &amp;lt;math&amp;gt;\Pr \left( F\right)&amp;lt;/math&amp;gt;. This is a profound result and should be formulated in more general terms:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;The probability that &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; occurs, given that &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is known to have occurred, gives the &amp;#039;&amp;#039;&amp;#039;conditional probability&amp;#039;&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; given &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;. This is denoted &amp;lt;math&amp;gt;Pr(E|F)&amp;lt;/math&amp;gt; and is calculated as&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (E|F)=\frac{\Pr (E\cap F)}{\Pr (F)}&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;and from the axioms of probability will generate a number lying between 0 and 1, since &amp;lt;math&amp;gt;\Pr (F)\geq \Pr (E\cap F)\geq 0.&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example: &amp;#039;&amp;#039;A Manufacturer of electrical components knows that the probability is 0.8 that an order will be ready for shipment on time and it is 0.6 that it will also be delivered on time. What is the probability that such an order will be delivered on time given that it was ready for shipment on time?&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Let &amp;lt;math&amp;gt;R=&amp;lt;/math&amp;gt; READY, &amp;lt;math&amp;gt;D=&amp;lt;/math&amp;gt; DELIVERED ON TIME. &amp;lt;math&amp;gt;Pr(R)=0.8,Pr(R\cap D)=0.6.&amp;lt;/math&amp;gt; From this we need to calculate &amp;lt;math&amp;gt;Pr(D|R),&amp;lt;/math&amp;gt; using the above formula. This gives, &amp;lt;math&amp;gt;Pr(D|R)=Pr(R\cap D)/Pr(R)=6/8,\,\,&amp;lt;/math&amp;gt;or&amp;lt;math&amp;gt;\,\,75\%&amp;lt;/math&amp;gt;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If we re-arrange the above formula for conditional probability, we obtain the so-called &amp;#039;&amp;#039;multiplication rule of probability &amp;#039;&amp;#039;for &amp;#039;&amp;#039;intersections&amp;#039;&amp;#039; of events:&lt;br /&gt;
&lt;br /&gt;
== Multiplication rule of probability ==&lt;br /&gt;
&lt;br /&gt;
The multiplication rule of probability can be stated as follows:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;math&amp;gt;\Pr (E\cap F)=\Pr (E|F)\times \Pr (F)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that for any two events, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F,&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;(E\cap F)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;(E\cap \bar{F})&amp;lt;/math&amp;gt; are mutually exclusive with &amp;lt;math&amp;gt;E=(E\cap F)\cup (E\cap \bar{F})&amp;lt;/math&amp;gt;; this has been seen before. So the &amp;#039;&amp;#039;addition rule&amp;#039;&amp;#039; and &amp;#039;&amp;#039;multiplication rule&amp;#039;&amp;#039; of probability together give:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr (E) &amp;amp;=&amp;amp;\Pr (E\cap F)+\Pr (E\cap \bar{F}) \\&lt;br /&gt;
&amp;amp;=&amp;amp;\Pr (E|F)\times \Pr (F)+\Pr (E|\bar{F})\times \Pr (\bar{F}).\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is an extremely important and useful result, in practice, as we shall see shortly.&lt;br /&gt;
&lt;br /&gt;
=== Additional resources ===&lt;br /&gt;
&lt;br /&gt;
Khan Academy&lt;br /&gt;
&lt;br /&gt;
* Another application of this rule [https://www.khanacademy.org/math/probability/independent-dependent-probability/dependent_probability/v/dependent-probability-example-1]&lt;br /&gt;
&lt;br /&gt;
== Statistical Independence ==&lt;br /&gt;
&lt;br /&gt;
If the knowledge that &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; has occurred does NOT alter our probability assessment of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; are said to be (statistically) &amp;#039;&amp;#039;independent&amp;#039;&amp;#039;. In this sense, &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; carries no information about &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Formally, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; are &amp;#039;&amp;#039;&amp;#039;independent&amp;#039;&amp;#039;&amp;#039; events if and only if&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;Pr(E|F)=Pr(E)&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;which, in turn is true&amp;#039;&amp;#039; if and only if &amp;#039;&amp;#039;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;Pr(E\cap F)=Pr(E)\times Pr(F).&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This concept of independence is of enormous importance in practice. Consider the case of lung cancer and its connection to smoking (apologies to all smokers for being picked upon here). The first connection between smoking and lung cancer was made in the 1920s. However, for many decades after the tobacco industry spend a lot of money and effort to convince people that there was no connection between the two. In other words they claimed that the two events are &amp;#039;&amp;#039;independent&amp;#039;&amp;#039;, or &amp;lt;math&amp;gt;Pr(Cancer|Smoking)=Pr(Cancer|\cap{Smoking})=Pr(Cancer)&amp;lt;/math&amp;gt;. It was then the task of epidemiologists to show otherwise. This was famously and comprehensively achieved by the [http://en.wikipedia.org/wiki/British_Doctors_Study &amp;#039;&amp;#039;British Doctors Study&amp;#039;&amp;#039;].&lt;br /&gt;
&lt;br /&gt;
=== Additional resources ===&lt;br /&gt;
&lt;br /&gt;
Khan Academy&lt;br /&gt;
&lt;br /&gt;
* A different example that intuitively leads to Bayes Formula [https://www.khanacademy.org/math/probability/independent-dependent-probability/dependent_probability/v/introduction-to-dependent-probability]&lt;br /&gt;
&lt;br /&gt;
== Bayes’ Theorem ==&lt;br /&gt;
&lt;br /&gt;
One area where conditional probability is extremely important is that of clinical trials - testing the power of a diagnostic test to detect the presence of a particular disease. Suppose, then, that a new test is being developed and let &amp;lt;math&amp;gt;P=&amp;lt;/math&amp;gt; ‘&amp;#039;&amp;#039;test positive&amp;#039;&amp;#039;’ and &amp;lt;math&amp;gt;D=&amp;lt;/math&amp;gt; ‘&amp;#039;&amp;#039;presence of disease&amp;#039;&amp;#039;’, but where the results from applying the diagnostic test can never be wholly reliable. From the point of view of our previous discussion on conditional probability, we would of course require that &amp;lt;math&amp;gt;\Pr \left( P|D\right)&amp;lt;/math&amp;gt; to be large; i.e., the test should be effective at detecting the disease. However, if you think about, this is not necessarily the probability that we might be interested in from a diagnosis point of view. Rather, we should be more interested in &amp;lt;math&amp;gt;\Pr \left( D|P\right)&amp;lt;/math&amp;gt;, the probability of correct diagnosis, and require this to be large (with, presumably, &amp;lt;math&amp;gt;\Pr (D|\bar{P})&amp;lt;/math&amp;gt; being small). Here, what we are trying to attach a probability to is a possible ‘cause’. The observed outcome is a positive test result (&amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt;), but the presence or non-presence of the disease is what is of interest and this is uncertain. &amp;lt;math&amp;gt;\Pr (D|P)&amp;lt;/math&amp;gt; asks the question ‘&amp;#039;&amp;#039;what is the probability that it is the presence of the disease which caused the positive test result&amp;#039;&amp;#039;’? (Another recent news-worthy example would be the effect of exposure to depleted uranium on Gulf and Balkan war veterans. 
Given the presence of lymph, lung or brain cancer in such individuals (&amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt;), how likely is that the cause was exposure to depleted uranium weapons (&amp;lt;math&amp;gt;D&amp;lt;/math&amp;gt;)? Firstly, is &amp;lt;math&amp;gt;\Pr \left( D|P\right) &amp;lt;/math&amp;gt; high or low? Secondly, might there being something else (&amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;) which could offer a “better” explanation, such that &amp;lt;math&amp;gt;\Pr \left( F|P\right) &amp;gt;\Pr \left( D|F\right) &amp;lt;/math&amp;gt; ?)&lt;br /&gt;
&lt;br /&gt;
The situation is depicted in the following Figure, in which there are two possible ‘states’ in the population: &amp;lt;math&amp;gt;D&amp;lt;/math&amp;gt; (depicted by the lighter shaded area covering the left portion of the sample space) and &amp;lt;math&amp;gt;\bar{D}.&amp;lt;/math&amp;gt; It must be that &amp;lt;math&amp;gt;D\cup \bar{D}=S,&amp;lt;/math&amp;gt; since any individual in the population either has the disease or does not. The event of an observed positive test result is denoted by the closed loop, &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt;. (Notice that the shading in the diagram is relatively darker where &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt; intersects with &amp;lt;math&amp;gt;D&amp;lt;/math&amp;gt;.)&lt;br /&gt;
&lt;br /&gt;
[[File:ProbCond_venn2.jpg|frameless|400px]]&lt;br /&gt;
&lt;br /&gt;
To investigate how we might construct the required probability, &amp;lt;math&amp;gt;\Pr \left(D|P\right)&amp;lt;/math&amp;gt;, proceed as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr \left( D|P\right) &amp;amp;=&amp;amp;\frac{\Pr \left( D\cap P\right) }{\Pr (P)} \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{\Pr (D\cap P)}{\Pr (P\cap D)+\Pr (P\cap \bar{D})},\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
since &amp;lt;math&amp;gt;P=(P\cap D)\cup (P\cap \bar{D}),&amp;lt;/math&amp;gt; and these are mutually exclusive. From the multiplication rule of probability, &amp;lt;math&amp;gt;\Pr \left( P\cap D\right) =\Pr(P|D)\times \Pr (D),&amp;lt;/math&amp;gt; and similarly for &amp;lt;math&amp;gt;\Pr \left( P\cap \bar{D}\right)&amp;lt;/math&amp;gt;. Thus&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Pr \left( D|P\right) =\frac{\Pr \left( P|D\right) \times \Pr \left(D\right) }{\Pr \left( P|D\right) \times \Pr \left( D\right) +\Pr (P|\bar{D})\times \Pr \left( \bar{D}\right) },&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
which may be convenient to work with since &amp;lt;math&amp;gt;\Pr \left( P|D\right) &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Pr\left( P|\bar{D}\right) &amp;lt;/math&amp;gt; can be estimated from clinical trials and &amp;lt;math&amp;gt;\Pr\left( D\right) &amp;lt;/math&amp;gt; estimated from recent historical survey data.&lt;br /&gt;
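In code, the calculation runs as follows; this is a minimal sketch, and the sensitivity, false-positive rate and prevalence used below are assumed purely for illustration (they come from no actual trial):&lt;br /&gt;

```python
def pr_d_given_p(pr_p_given_d, pr_p_given_not_d, pr_d):
    """Bayes' rule: Pr(D|P) from Pr(P|D), Pr(P|D-bar) and the prevalence Pr(D)."""
    numerator = pr_p_given_d * pr_d
    denominator = numerator + pr_p_given_not_d * (1 - pr_d)
    return numerator / denominator

# Assumed values: sensitivity 0.99, false-positive rate 0.05, prevalence 0.01.
posterior = pr_d_given_p(0.99, 0.05, 0.01)
print(round(posterior, 3))
```

With these assumed numbers the posterior works out to about 0.167: even a sensitive test gives a modest &amp;lt;math&amp;gt;\Pr (D|P)&amp;lt;/math&amp;gt; when the disease is rare, which is precisely why &amp;lt;math&amp;gt;\Pr (P|D)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Pr (D|P)&amp;lt;/math&amp;gt; must not be confused.&lt;br /&gt;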
&lt;br /&gt;
This sort of calculation (assigning probabilities to possible causes of observed events) is an example of &amp;#039;&amp;#039;Bayes’ Theorem&amp;#039;&amp;#039;. Of course, we may have to consider more than two possible causes, and the construction of the appropriate probabilities is as follows.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Consider a sample space, &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;A,B,C&amp;lt;/math&amp;gt; are three mutually exclusive events (possible causes), defined on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, such that &amp;lt;math&amp;gt;S=A\cup B\cup C&amp;lt;/math&amp;gt;. In such a situation, &amp;lt;math&amp;gt;A,B&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; are said to form a &amp;#039;&amp;#039;&amp;#039;partition&amp;#039;&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;. &amp;#039;&amp;#039;&amp;#039;Bayes’ Theorem&amp;#039;&amp;#039;&amp;#039; states that:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (A|E)=\frac{\Pr (E|A)\times \Pr (A)}{\left\{ \Pr (E|A)\times \Pr(A)\right\} +\left\{ \Pr (E|B)\times \Pr (B)\right\} +\left\{ \Pr(E|C)\times \Pr (C)\right\} }.&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;And, more generally, consider a sample space, &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F_{1},F_{2},...,F_{k}&amp;lt;/math&amp;gt; are &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; mutually exclusive events (possible causes), which form a partition of &amp;lt;math&amp;gt;S:S=\bigcup_{j=1}^{k}F_{j}&amp;lt;/math&amp;gt;. &amp;#039;&amp;#039;&amp;#039;Bayes’ Theorem&amp;#039;&amp;#039;&amp;#039; then states that:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (F_{j}|E)=\frac{\Pr (E|F_{j})\times \Pr (F_{j})}{\sum_{s=1}^{k}\left\{\Pr (E|F_{s})\times \Pr (F_{s})\right\} }.&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
From the above formula, you should be able to satisfy yourself that &amp;lt;math&amp;gt;\sum_{j=1}^{k}\Pr \left( F_{j}|E\right) =1.&amp;lt;/math&amp;gt; If this is not at first clear, consider case (1) and show that &amp;lt;math&amp;gt;\Pr \left( A|E\right) +\Pr \left( B|E\right) +\Pr \left( C|E\right) =1.&amp;lt;/math&amp;gt; The reason for this is that since &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; form a partition of &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; they must also form a partition of any event &amp;lt;math&amp;gt;E\subset S.&amp;lt;/math&amp;gt; In the above conditional probabilities, we are regarding &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; as the restricted sample space and therefore the probabilities assigned to the mutually exclusive events &amp;lt;math&amp;gt;\left( A,B,C\right) &amp;lt;/math&amp;gt; which &amp;#039;&amp;#039;cover&amp;#039;&amp;#039; this (restricted) sample space, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, must sum to 1.&lt;br /&gt;
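A short numerical sketch of the general formula (the priors and likelihoods below are hypothetical, chosen only to show that the posteriors sum to 1):&lt;br /&gt;

```python
def bayes_posteriors(priors, likelihoods):
    """Pr(F_j|E) for a partition F_1,...,F_k, from the priors Pr(F_j)
    and the likelihoods Pr(E|F_j)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    pr_e = sum(joint)                 # Pr(E), summing over the partition
    return [j / pr_e for j in joint]

# Hypothetical three-cause partition, as in case (1) above.
posteriors = bayes_posteriors([0.5, 0.3, 0.2], [0.2, 0.6, 0.1])
print(posteriors, sum(posteriors))
```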
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example&amp;#039;&amp;#039;: Box A contains 2 red balls. Box B contains 1 red and 1 white ball. Box A and Box B are identical. If a box is selected at random and one ball is withdrawn from it, what is the probability that the selected box was Box A if the ball withdrawn from it turns out to be red?&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Let &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; be the event of selecting Box A and &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; the event of drawing a red ball. Require &amp;lt;math&amp;gt;Pr(A|R)&amp;lt;/math&amp;gt;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;Pr(A|R)=Pr(A\cap R)/Pr(R);&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;Pr(A\cap R)=Pr(A)Pr(R|A)=(1/2)\times 1=1/2.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And,&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
Pr(R) &amp;amp;=&amp;amp;Pr(A\cap R)+Pr(\bar{A}\cap R) \\&lt;br /&gt;
&amp;amp;=&amp;amp;\Pr (A)\times \Pr (R|A)\,\,\,\,+\,\,\,\,\Pr (\bar{A})\times \Pr (R|\bar{A}) \\&lt;br /&gt;
&amp;amp;=&amp;amp;(1/2)\,\,\,\,+\,\,\,\,(1/2)\times (1/2) \\&lt;br /&gt;
&amp;amp;=&amp;amp;3/4.\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Therefore, &amp;lt;math&amp;gt;\Pr (A|R)=(1/2)/(3/4)=2/3&amp;lt;/math&amp;gt;.&lt;br /&gt;
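The whole calculation can be checked with exact arithmetic; a sketch using Python’s fractions module:&lt;br /&gt;

```python
from fractions import Fraction

pr_a = Fraction(1, 2)            # each box is equally likely to be chosen
pr_r_given_a = Fraction(1)       # Box A holds 2 red balls
pr_r_given_b = Fraction(1, 2)    # Box B holds 1 red and 1 white ball

pr_r = pr_a * pr_r_given_a + (1 - pr_a) * pr_r_given_b
pr_a_given_r = (pr_a * pr_r_given_a) / pr_r
print(pr_r, pr_a_given_r)        # 3/4 and 2/3, as derived above
```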
&lt;br /&gt;
= Additional resources =&lt;br /&gt;
&lt;br /&gt;
Khan Academy&lt;br /&gt;
&lt;br /&gt;
* A different example that intuitively leads to Bayes’ Formula [https://www.khanacademy.org/math/probability/independent-dependent-probability/dependent_probability/v/introduction-to-dependent-probability]&lt;br /&gt;
* Should you switch the door? A classic problem. [https://www.khanacademy.org/math/probability/independent-dependent-probability/dependent_probability/v/monty-hall-problem]&lt;br /&gt;
&lt;br /&gt;
= Exercises =&lt;br /&gt;
&lt;br /&gt;
You can find examples related to these topics here: [[Probability_Conditional_Exercises]].&lt;br /&gt;
&lt;br /&gt;
= Footnotes =&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Intro&amp;diff=2877</id>
		<title>Probability Intro</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Intro&amp;diff=2877"/>
				<updated>2013-08-09T10:33:23Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
&lt;br /&gt;
= Introducing Probability =&lt;br /&gt;
&lt;br /&gt;
So far we have been looking at ways of summarising samples of data drawn from an underlying population of interest. Although at times tedious, all such arithmetic calculations are fairly mechanical and straightforward to apply. To remind ourselves, one of the primary reasons for wishing to summarise data is to assist in the development of inferences about the population from which the data were taken. That is to say, we would like to elicit some information about the mechanism which generated the observed data.&lt;br /&gt;
&lt;br /&gt;
We now start on the process of developing mathematical ways of formulating inferences and this requires the use of &amp;#039;&amp;#039;probability&amp;#039;&amp;#039;. This becomes clear if we think back to one of the early questions posed in this course: &amp;#039;&amp;#039;prior to sampling is it possible to predict with absolute certainty what will be observed&amp;#039;&amp;#039;? The answer to this question is &amp;#039;&amp;#039;no&amp;#039;&amp;#039;; although it would be of interest to know how &amp;#039;&amp;#039;likely&amp;#039;&amp;#039; it is that certain values would be observed. Or, what is the &amp;#039;&amp;#039;probability&amp;#039;&amp;#039; of observing certain values?&lt;br /&gt;
&lt;br /&gt;
Before proceeding, we need some more tools:&lt;br /&gt;
&lt;br /&gt;
= Venn diagrams =&lt;br /&gt;
&lt;br /&gt;
Venn diagrams (and diagrams in general) are of enormous help in trying to understand, and manipulate probability. We begin with some basic definitions, some of which we have encountered before.&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Experiment:&amp;#039;&amp;#039;&amp;#039; any process which, when applied, provides data or an outcome; e.g., rolling a die and observing the number of dots on the upturned face; recording the amount of rainfall in Manchester over a period of time.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Sample Space:&amp;#039;&amp;#039;&amp;#039; set of possible outcomes of an experiment; e.g., &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; (or &amp;lt;math&amp;gt;\Omega &amp;lt;/math&amp;gt;) &amp;lt;math&amp;gt;=&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\{1,2,3,4,5,6\}&amp;lt;/math&amp;gt;, which is the sample space of rolling a die. Or &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;=&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\{x;x\geq 0\}&amp;lt;/math&amp;gt;, which is the sample space of an experiment where the outcomes can be any real non-negative number, or ‘&amp;#039;&amp;#039;the set of non-negative real numbers&amp;#039;&amp;#039;’.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Simple Event&amp;#039;&amp;#039;&amp;#039;: just one of the possible outcomes on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Event:&amp;#039;&amp;#039;&amp;#039; a &amp;#039;&amp;#039;subset&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, denoted &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt;; e.g., &amp;lt;math&amp;gt;E=\left\{ 2,4,6\right\}&amp;lt;/math&amp;gt; (i.e. any even number on a die) or &amp;lt;math&amp;gt;E=\left\{ x;4&amp;lt;x\leq 10\right\} ,&amp;lt;/math&amp;gt; which means ‘&amp;#039;&amp;#039;the set of real numbers which are strictly bigger than&amp;#039;&amp;#039; &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;but less than or equal to &amp;#039;&amp;#039;&amp;lt;math&amp;gt;10&amp;lt;/math&amp;gt;’.&amp;lt;br /&amp;gt;Note that an event, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, is a collection of simple events.&lt;br /&gt;
&lt;br /&gt;
Such concepts can be represented by means of the following Venn Diagram:&lt;br /&gt;
&lt;br /&gt;
[[File:Venn_1.jpg|frameless|600px]]&lt;br /&gt;
&lt;br /&gt;
The sample space, &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; is depicted as a closed rectangle, and the event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; is a closed loop wholly contained within &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; and we write (in set notation) &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In dealing with probability, and in particular the probability of an event (or events) occurring, we shall need to be familiar with &amp;#039;&amp;#039;&amp;#039;UNIONS, INTERSECTIONS&amp;#039;&amp;#039;&amp;#039; and &amp;#039;&amp;#039;&amp;#039;COMPLEMENTS&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
To illustrate these concepts, consider the sample space &amp;lt;math&amp;gt;S=\{x;x\geq 0\},\,&amp;lt;/math&amp;gt; with the following events defined on &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; as depicted in Figure 3.2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E=\{x;4&amp;lt;x\leq 10\},\,F=\{x;7&amp;lt;x\leq 17\},\,G=\{x;x&amp;gt;15\},\,H=\{x;9&amp;lt;x\leq&lt;br /&gt;
13\}. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
| (a) Event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;: A closed loop&lt;br /&gt;
| (b) Union: &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2a.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2b.jpg|frameless|300px]]&lt;br /&gt;
|-&lt;br /&gt;
| (c) Intersection: &amp;lt;math&amp;gt;E\cap F&amp;lt;/math&amp;gt;&lt;br /&gt;
| (d) The Null set/event: &amp;lt;math&amp;gt;E\cap G=\emptyset &amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2c.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2d.jpg|frameless|300px]]&lt;br /&gt;
|-&lt;br /&gt;
| (e) Complement of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;\bar{E}&amp;lt;/math&amp;gt;&lt;br /&gt;
| (f) Subset of &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;H\subset F&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;H\cap F=H&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2e.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2f.jpg|frameless|300px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* The &amp;#039;&amp;#039;union&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;E\cup F,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E\cup F=\{x;4&amp;lt;x\leq 17\};&amp;lt;/math&amp;gt; i.e., it contains elements (simple events) which are either in &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; or in &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; or (perhaps) in both. This is illustrated on the Venn diagram by the dark shaded area in diagram (b).&lt;br /&gt;
* The&amp;#039;&amp;#039; intersection&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;E\cap F,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E\cap F=\left\{ x;7\leq x\leq 10\right\} ;&amp;lt;/math&amp;gt; i.e., it contains elements (simple events) which are common to both &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F.&amp;lt;/math&amp;gt; Again this is depicted by the dark shaded area in (c). If events have no elements in common (as, for example, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;) then they are said to be &amp;#039;&amp;#039;mutually exclusive&amp;#039;&amp;#039;, and we can write &amp;lt;math&amp;gt;E\cap G=\emptyset ,&amp;lt;/math&amp;gt; meaning the &amp;#039;&amp;#039;null set&amp;#039;&amp;#039; which contains no elements. Such a situation is illustrated on the Venn Diagram by events (the two shaded closed loops in (d)) which do not overlap. Notice however that &amp;lt;math&amp;gt;G\cap F\neq \emptyset ,&amp;lt;/math&amp;gt; since &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; have elements in common.&lt;br /&gt;
* The &amp;#039;&amp;#039;complement&amp;#039;&amp;#039; of an event &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; say, is everything defined on the sample space which is not in &amp;lt;math&amp;gt;E.&amp;lt;/math&amp;gt; This event is denoted &amp;lt;math&amp;gt;\bar{E}&amp;lt;/math&amp;gt;, the dark shaded area in (e); here &amp;lt;math&amp;gt;\bar{E}=\left\{ x;x\leq 4\right\} \cup \left\{ x;x&amp;gt;10\right\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
* Finally note that &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; is a sub-set of &amp;lt;math&amp;gt;F;&amp;lt;/math&amp;gt; see (f). It is depicted as the dark closed loop wholly contained within &amp;lt;math&amp;gt;F,&amp;lt;/math&amp;gt; the lighter shaded area, so that &amp;lt;math&amp;gt;H\cap F=H;&amp;lt;/math&amp;gt; if an element in the sample space is a member of &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; then it must also be member of &amp;lt;math&amp;gt;F.&amp;lt;/math&amp;gt; (In mathematical logic, we employ this scenario to indicate that “&amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; implies &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;”, but not necessarily vice-versa.) Notice that &amp;lt;math&amp;gt;G\cap H=\emptyset &amp;lt;/math&amp;gt; but &amp;lt;math&amp;gt;H\cap E\neq \emptyset&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
= Probability =&lt;br /&gt;
&lt;br /&gt;
The term &amp;#039;&amp;#039;probability&amp;#039;&amp;#039; (or some equivalent) is used in everyday conversation and so can not be unfamiliar to the reader. We talk of the probability, or chance, of rain; the likelihood of England winning the World Cup; or, perhaps more scientifically, the chance of getting a &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; when rolling a die. What we shall now do is develop a coherent theory of probability; a theory which allows us to combine and manipulate probabilities in a consistent and meaningful manner. We shall describe ways of dealing with, and describing, uncertainty. This will involve &amp;#039;&amp;#039;rules&amp;#039;&amp;#039; which govern our use of terms like probability.&lt;br /&gt;
&lt;br /&gt;
There have been a number of different approaches (interpretations) of probability. Most depend, at least to some extent, on the notion of relative frequency as now described:&lt;br /&gt;
&lt;br /&gt;
* Suppose an experiment has an outcome of interest &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;. The &amp;#039;&amp;#039;relative frequency interpretation&amp;#039;&amp;#039; of probability says that assuming the experiment can be repeated a large number of times then the relative frequency of observing the outcome &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; will settle down to a &amp;#039;&amp;#039;number&amp;#039;&amp;#039;, denoted &amp;lt;math&amp;gt;\Pr (E),&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;P(E)&amp;lt;/math&amp;gt; or Prob&amp;lt;math&amp;gt;(E),&amp;lt;/math&amp;gt; called the &amp;#039;&amp;#039;&amp;#039;probability&amp;#039;&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
This is illustrated in the next Figure where the proportion of heads obtained after &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; flips of a fair coin is plotted against &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;, as &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; increases; e.g., of the first 100 flips, 46 were heads (&amp;lt;math&amp;gt;46\%&amp;lt;/math&amp;gt;). Notice that the plot becomes less ‘wobbly’ after about &amp;lt;math&amp;gt;n=&amp;lt;/math&amp;gt;140 and appears to be settling down to the value of &amp;lt;math&amp;gt;\frac{1}{2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
[[File:Prob_coin.jpg|frameless|600px]]&lt;br /&gt;
&lt;br /&gt;
Due to this interpretation of probability, we often use observed sample proportions to approximate underlying probabilities of interest; see, for example, Question 4 of Exercise 2. There are, of course, other interpretations of probability; e.g., the subjective interpretation which simply expresses the strength of one’s belief about an event of interest such as whether Manchester United will win the European Cup! Any one of these interpretations can be used in practical situations provided the implied notion of probability follows a simple set of &amp;#039;&amp;#039;axioms&amp;#039;&amp;#039; or &amp;#039;&amp;#039;rules&amp;#039;&amp;#039;.&lt;br /&gt;
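The settling-down behaviour in the Figure is easy to reproduce by simulation; in the sketch below the seed and the number of flips are arbitrary choices:&lt;br /&gt;

```python
import random

random.seed(1)                        # arbitrary seed, for reproducibility
n, heads, proportions = 10_000, 0, []
for flips in range(1, n + 1):
    heads += random.random() < 0.5    # one flip of a fair coin
    proportions.append(heads / flips)

# Early proportions wobble; later ones settle near 1/2.
print(proportions[9], proportions[99], proportions[-1])
```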
&lt;br /&gt;
== The axioms of probability ==&lt;br /&gt;
&lt;br /&gt;
There are just &amp;#039;&amp;#039;three &amp;#039;&amp;#039;basic rules that must be obeyed when dealing with probabilities:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;For any event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; defined on &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; i.e., &amp;lt;math&amp;gt;E\subset S,\,\,\Pr (E)\geq 0&amp;lt;/math&amp;gt;; &amp;#039;&amp;#039;probabilities are non-negative&amp;#039;&amp;#039;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (S)=1;&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;having defined the sample space of outcomes, one of these outcomes must be observed&amp;#039;&amp;#039;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;If events &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;, defined on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, are mutually exclusive, so that &amp;lt;math&amp;gt;E\cap F=\emptyset &amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) =\Pr \left( E\right)+\Pr \left( F\right) .&amp;lt;/math&amp;gt; In general, for any set of mutually exclusive events, &amp;lt;math&amp;gt;E_{1},E_{2},\ldots ,E_{k},&amp;lt;/math&amp;gt; defined on &amp;lt;math&amp;gt;S:&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (E_{1}\cup E_{2}\cup \ldots \cup E_{k})=\Pr (E_{1})+\Pr (E_{2})+\ldots +\Pr (E_{k})&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;i.e., &amp;lt;math&amp;gt;\Pr \left( \bigcup_{j=1}^{k}E_{j}\right) =\sum_{j=1}^{k}\Pr (E_{j}).&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In terms of the Venn Diagram, one can (and should) usefully think of the area of &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; relative to that of &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; as providing an indication of probability. (Note, from axiom 2, that the area of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; is implicitly normalised to be unity).&lt;br /&gt;
&lt;br /&gt;
Also observe that, contrary to what you may have believed, it is not one of the rules that &amp;lt;math&amp;gt;\Pr (E)\leq 1&amp;lt;/math&amp;gt; for any event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;. Rather, this is an implication of the &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; rules given:&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;implications: &amp;#039;&amp;#039;&amp;#039;it must be that for any event &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; defined on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;E\cap \bar{E}=\emptyset &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;E\cup \bar{E}=S.&amp;lt;/math&amp;gt; By Axiom &amp;lt;math&amp;gt;1,&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\Pr (E)\geq 0&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Pr \left( \bar{E}\right) \geq 0&amp;lt;/math&amp;gt; and by Axiom &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\Pr(E)+\Pr (\bar{E})=\Pr (S).&amp;lt;/math&amp;gt; So &amp;lt;math&amp;gt;\Pr \left( E\right) +\Pr \left( \bar{E}\right) =1,&amp;lt;/math&amp;gt; by Axiom &amp;lt;math&amp;gt;2.&amp;lt;/math&amp;gt; This implies that&lt;br /&gt;
&lt;br /&gt;
# &amp;lt;math&amp;gt;0\leq \Pr (E)\leq 1&amp;lt;/math&amp;gt;&lt;br /&gt;
# &amp;lt;math&amp;gt;\Pr (\bar{E})=1-\Pr (E)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first of these is what we might have expected from probability (a number lying between &amp;lt;math&amp;gt;0&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;). The second implication is also very important; it says that the probability of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; not happening is ‘&amp;#039;&amp;#039;one minus the probability of it happening&amp;#039;&amp;#039;’. Thus when rolling a die, the probability of getting &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; is one minus the probability of getting either a &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;2&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;5.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These axioms imply how to calculate probabilities on a sample space of equally likely outcomes. For example, and as we have already noted, the experiment of rolling a fair die defines a sample space of six mutually exclusive and equally likely outcomes (&amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; dots on the up-turned face). The axioms then say that each of the six probabilities is positive, that they add to 1 and that they are all the same. Thus, the probability of any one of the outcomes must be simply &amp;lt;math&amp;gt;\frac{1}{6},&amp;lt;/math&amp;gt; which may accord with your intuition. A similar sort of analysis reveals that the probability of drawing a club from a deck of &amp;lt;math&amp;gt;52&amp;lt;/math&amp;gt; cards is &amp;lt;math&amp;gt;\frac{13}{52},&amp;lt;/math&amp;gt; since any one of the &amp;lt;math&amp;gt;52&amp;lt;/math&amp;gt; cards has an equal chance of being drawn and &amp;lt;math&amp;gt;13&amp;lt;/math&amp;gt; of them are clubs. Notice the importance of the assumption of equally likely outcomes here.&lt;br /&gt;
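Under equally likely outcomes, a probability is just a ratio of counts, which a few lines of code make explicit (the card encoding below is an arbitrary choice):&lt;br /&gt;

```python
from fractions import Fraction

def pr_event(event, sample_space):
    """Pr(E) on a finite sample space of equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

die = set(range(1, 7))
print(pr_event({6}, die))                    # 1/6

# Cards encoded as (rank, suit) pairs; "C" stands for clubs.
deck = {(rank, suit) for rank in range(1, 14) for suit in "CDHS"}
clubs = {card for card in deck if card[1] == "C"}
print(pr_event(clubs, deck))                 # 13/52, i.e. 1/4
```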
&lt;br /&gt;
In this, and the next section of notes, we shall see how these axioms can be used. Firstly, consider the construction of a probability for the &amp;#039;&amp;#039;union&amp;#039;&amp;#039; of two events; i.e., the probability that &amp;#039;&amp;#039;either &amp;#039;&amp;#039;&amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; or (perhaps) &amp;#039;&amp;#039;both &amp;#039;&amp;#039;will occur. Such a probability is embodied in the &amp;#039;&amp;#039;addition rule of probability&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
== The addition rule of probability ==&lt;br /&gt;
&lt;br /&gt;
When rolling a fair die, let &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; denote the event of an “odd number of dots” and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; the event of the “number of dots being greater than or equal to &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt;”. What is the probability of the event &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;? To calculate this we can collect together all the mutually exclusive (simple) events which comprise &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;, and then add up the probabilities (by axiom 3). These simple events are &amp;lt;math&amp;gt;1,3,4,5&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; dots. Each has a probability of &amp;lt;math&amp;gt;\frac{1}{6},&amp;lt;/math&amp;gt; so the required total probability is: &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) =\frac{5}{6}&amp;lt;/math&amp;gt;. Consider carefully how this probability is constructed and note, in particular, that &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) \neq \Pr \left( E\right) +\Pr \left( F\right) &amp;lt;/math&amp;gt; since &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; have a simple event in common (namely &amp;lt;math&amp;gt;5&amp;lt;/math&amp;gt; dots).&lt;br /&gt;
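Enumerating the simple events confirms this calculation:&lt;br /&gt;

```python
from fractions import Fraction

die = set(range(1, 7))
E = {1, 3, 5}            # odd number of dots
F = {4, 5, 6}            # at least 4 dots
pr = lambda event: Fraction(len(event), len(die))

print(sorted(E | F), pr(E | F))    # [1, 3, 4, 5, 6] and 5/6
print(pr(E) + pr(F))               # 1, which overcounts the shared outcome 5
```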
&lt;br /&gt;
In general, we can calculate the probability of the union of events using the &amp;#039;&amp;#039;addition rule of probability&amp;#039;&amp;#039;, as follows.&lt;br /&gt;
&lt;br /&gt;
* For any events, &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F\subset S:\Pr (E\cup F)=\Pr (E)+\Pr (F)-\Pr (E\cap F).&amp;lt;/math&amp;gt; So, in general, &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) \leq \Pr (E)+\Pr (F).&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generalises to three events, &amp;lt;math&amp;gt;E_{1},E_{2}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;E_{3}&amp;lt;/math&amp;gt; as&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr (E_{1}\cup E_{2}\cup E_{3}) &amp;amp;=&amp;amp;\Pr (E_{1})+\Pr (E_{2})+\Pr (E_{3}) \\&lt;br /&gt;
&amp;amp;&amp;amp;-\Pr (E_{1}\cap E_{2})-\Pr (E_{1}\cap E_{3})-\Pr (E_{2}\cap E_{3}) \\&lt;br /&gt;
&amp;amp;&amp;amp;+\Pr (E_{1}\cap E_{2}\cap E_{3}).\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can demonstrate this as follows.&lt;br /&gt;
&lt;br /&gt;
Note that&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E\cup F=\left( E\cap \bar{F}\right) \cup \left( E\cap F\right) \cup \left(\bar{E}\cap F\right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
the union of &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; mutually exclusive events. These mutually exclusive events are depicted by the shaded areas &amp;lt;math&amp;gt;\mathbf{a,}&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\mathbf{b}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\mathbf{c}&amp;lt;/math&amp;gt;, respectively, in the next Figure.&lt;br /&gt;
&lt;br /&gt;
[[File:Prob_add.jpg|frameless|500px]]&lt;br /&gt;
&lt;br /&gt;
Then, since the three events &amp;lt;math&amp;gt;\left( E\cap\bar{F}\right) &amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;\left( E\cap F\right) &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\left( \bar{E}\cap F\right)&amp;lt;/math&amp;gt; are mutually exclusive (so that the “area” occupied by &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt; is simply &amp;lt;math&amp;gt;\mathbf{a+b+c}&amp;lt;/math&amp;gt;), Axiom &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; gives&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Pr \left( E\cup F\right) =\Pr \left( E\cap \bar{F}\right) +\Pr \left( \bar{E}\cap F\right) +\Pr \left( E\cap F\right) .&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
But also by Axiom &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt;, since &amp;lt;math&amp;gt;E=\left( E\cap \bar{F}\right) \cup \left(E\cap F\right) &amp;lt;/math&amp;gt;, it must be that &amp;lt;math&amp;gt;\Pr (E)=\Pr \left( E\cap \bar{F}\right)+\Pr (E\cap F);&amp;lt;/math&amp;gt; similarly, &amp;lt;math&amp;gt;\Pr \left( \bar{E}\cap F\right) =\Pr \left(F\right) -\Pr \left( E\cap F\right)&amp;lt;/math&amp;gt;. Putting all of this together gives&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Pr (E\cup F)=\Pr (E)+\Pr (F)-\Pr (E\cap F).&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; are mutually exclusive, so that &amp;lt;math&amp;gt;E\cap F=\emptyset&amp;lt;/math&amp;gt;, this rule reduces to Axiom 3: &amp;lt;math&amp;gt;\Pr (E\cup F)=\Pr (E)+\Pr (F)&amp;lt;/math&amp;gt;.&lt;br /&gt;
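Both the two-event rule and the three-event generalisation can be checked numerically on finite events; the sample space and events below are assumed purely for illustration:&lt;br /&gt;

```python
import random

random.seed(0)                     # arbitrary seed, for reproducibility
S = set(range(40))
pr = lambda event: len(event) / len(S)
E1, E2, E3 = ({x for x in S if random.random() < 0.5} for _ in range(3))

# Two-event addition rule.
assert abs(pr(E1 | E2) - (pr(E1) + pr(E2) - pr(E1 & E2))) < 1e-12

# Three-event generalisation.
lhs = pr(E1 | E2 | E3)
rhs = (pr(E1) + pr(E2) + pr(E3)
       - pr(E1 & E2) - pr(E1 & E3) - pr(E2 & E3)
       + pr(E1 & E2 & E3))
assert abs(lhs - rhs) < 1e-12
```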
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example: &amp;#039;&amp;#039;What is the probability of drawing a Queen (&amp;lt;math&amp;gt;Q &amp;lt;/math&amp;gt;) or a Club (&amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt;) in a single draw from a pack of cards? Now, &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; out of &amp;lt;math&amp;gt;52 &amp;lt;/math&amp;gt; cards are Queens, so &amp;lt;math&amp;gt;\Pr \left( Q\right) =\frac{4}{52},&amp;lt;/math&amp;gt; whilst &amp;lt;math&amp;gt;\Pr\left( C\right) =\frac{13}{52}.&amp;lt;/math&amp;gt; The probability of drawing the Queen of Clubs is simply &amp;lt;math&amp;gt;\frac{1}{52};&amp;lt;/math&amp;gt; i.e., &amp;lt;math&amp;gt;\Pr \left( Q\cap C\right) =\frac{1}{52}&amp;lt;/math&amp;gt;. What we require is a Club or a Queen, for which the probability is&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr \left( Q\cup C\right) &amp;amp;=&amp;amp;\Pr \left( Q\right) +\Pr \left( C\right) -\Pr\left( Q\cap C\right) \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{4}{52}+\frac{13}{52}-\frac{1}{52} \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{16}{52}=\frac{4}{13}.\end{aligned}&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example: &amp;#039;&amp;#039;Consider a car journey from Manchester to London via the M6 and M1. Let &amp;lt;math&amp;gt;E=&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;heavy traffic somewhere on route&amp;#039;&amp;#039; and &amp;lt;math&amp;gt;F=&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;roadworks somewhere on route&amp;#039;&amp;#039;. It is estimated that &amp;lt;math&amp;gt;\Pr (E)=0.8&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Pr (F)=0.4,&amp;lt;/math&amp;gt; whilst the probability of NOT encountering both is &amp;lt;math&amp;gt;\Pr (\overline{E\cap F})=0.6.&amp;lt;/math&amp;gt; What is the probability of encountering heavy traffic or roadworks?&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;We require &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) .&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr (E\cup F) &amp;amp;=&amp;amp;\Pr (E)+\Pr (F)-\Pr (E\cap F) \\&lt;br /&gt;
&amp;amp;=&amp;amp;\Pr (E)+\Pr (F)-(1-\Pr (\overline{E\cap F})) \\&lt;br /&gt;
&amp;amp;=&amp;amp;0.8+0.4-1+0.6 \\&lt;br /&gt;
&amp;amp;=&amp;amp;0.8=\Pr (E)\end{aligned}&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Notice that this implies, in this case, that &amp;lt;math&amp;gt;F\subset E&amp;lt;/math&amp;gt; (why?). This &amp;#039;&amp;#039;model &amp;#039;&amp;#039;then implies that when there are roadworks somewhere on route you are bound to encounter heavy traffic; on the other hand, you can encounter heavy traffic on route without ever passing through roadworks. (My own experience of this motorway inclines me towards this implication!)&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
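Both applications of the addition rule above are easy to check mechanically. The following sketch (in Python, used here purely for illustration) reproduces the card and traffic calculations:

```python
from fractions import Fraction

# Cards: Pr(Q or C) = Pr(Q) + Pr(C) - Pr(Q and C)
pr_q = Fraction(4, 52)        # four Queens in the pack
pr_c = Fraction(13, 52)       # thirteen Clubs
pr_q_and_c = Fraction(1, 52)  # the Queen of Clubs
pr_q_or_c = pr_q + pr_c - pr_q_and_c
assert pr_q_or_c == Fraction(4, 13)

# Traffic: Pr(E or F) = Pr(E) + Pr(F) - (1 - Pr(not(E and F)))
pr_e, pr_f, pr_not_both = 0.8, 0.4, 0.6
pr_e_or_f = pr_e + pr_f - (1 - pr_not_both)
assert abs(pr_e_or_f - pr_e) < 1e-12  # equals Pr(E), consistent with F being contained in E
```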
&lt;br /&gt;
Similar concepts apply when manipulating proportions as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example&amp;#039;&amp;#039;: A sample of 1000 undergraduates were asked whether they took either Mathematics, Physics or Chemistry at A-level. The following responses were obtained: 100 just took Mathematics; 70 just took Physics; 100 just took Chemistry; 150 took Mathematics and Physics, but not Chemistry; 40 took Mathematics and Chemistry, but not Physics; and, 240 took Physics and Chemistry, but not Mathematics. What proportion took all three?&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;This can be addressed with the following diagram:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;[[File:Prob_Alevels.jpg|frameless|500px]]&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;The shaded area contains the number who took all three, which can be deduced from the above information (since the total of the numbers assigned to each part of the Venn diagram must be &amp;lt;math&amp;gt;1000&amp;lt;/math&amp;gt;). The answer is therefore &amp;lt;math&amp;gt;30\%&amp;lt;/math&amp;gt; (being &amp;lt;math&amp;gt;300&amp;lt;/math&amp;gt; out of &amp;lt;math&amp;gt;1000&amp;lt;/math&amp;gt;).&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Two further results on unions, intersections and complements which are of use (and which are fairly easy to demonstrate using Venn diagrams) are the &amp;#039;&amp;#039;&amp;#039;de Morgan Laws&amp;#039;&amp;#039;&amp;#039;:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\left( \bar{A}\cap \bar{B}\right) =\left( \overline{A\cup B}\right) &amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\bar{A}\cup \bar{B}=\left( \overline{A\cap B}\right) &amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
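Both of these results can be verified mechanically. The sketch below (Python, for illustration only) first checks the A-level count, using the text's assumption that all 1000 students fall inside the three circles, and then checks the de Morgan Laws on an arbitrary small sample space:

```python
# A-level example: the seven disjoint regions of the Venn diagram must sum to 1000.
only_m, only_p, only_c = 100, 70, 100
m_and_p, m_and_c, p_and_c = 150, 40, 240
all_three = 1000 - (only_m + only_p + only_c + m_and_p + m_and_c + p_and_c)
assert all_three == 300  # i.e. 30%

# de Morgan Laws on an arbitrarily chosen finite sample space.
S = set(range(10))
A, B = {1, 2, 3, 4}, {3, 4, 5, 6}
assert (S - A) & (S - B) == S - (A | B)  # complement(A) ∩ complement(B) = complement(A ∪ B)
assert (S - A) | (S - B) == S - (A & B)  # complement(A) ∪ complement(B) = complement(A ∩ B)
```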
&lt;br /&gt;
= Additional resources =&lt;br /&gt;
&lt;br /&gt;
Khan Academy&lt;br /&gt;
&lt;br /&gt;
* Basic Probability and Venn Diagram [https://www.khanacademy.org/math/probability/independent-dependent-probability/addition_rule_probability/v/probability-with-playing-cards-and-venn-diagrams]&lt;br /&gt;
* Addition Rule [https://www.khanacademy.org/math/probability/independent-dependent-probability/addition_rule_probability/v/addition-rule-for-probability]&lt;br /&gt;
&lt;br /&gt;
= Footnotes =&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Conditional&amp;diff=2876</id>
		<title>Probability Conditional</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Conditional&amp;diff=2876"/>
				<updated>2013-08-09T10:16:45Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: /* Exercises */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
&lt;br /&gt;
= Conditional Probability =&lt;br /&gt;
&lt;br /&gt;
An important consideration in the development of probability is that of &amp;#039;&amp;#039;conditional probability&amp;#039;&amp;#039;. This refers to the calculation of updating probabilities in the light of revealed information. For example, insurance companies nearly always set their home contents insurance premiums on the basis of the postcode in which the home is located. That is to say, insurance companies believe the risk depends upon the location; i.e., the probability of property crime is assessed conditional upon the location of the property. (A similar calculation is made to set car insurance premiums.) As a result, the premiums for two identical households located in different parts of the country can differ substantially.&lt;br /&gt;
&lt;br /&gt;
* In general, the probability of an event, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, occurring &amp;#039;&amp;#039;given&amp;#039;&amp;#039; that an event, &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;, has occurred is called the &amp;#039;&amp;#039;conditional probability&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; given &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; and is denoted &amp;lt;math&amp;gt;\Pr (E|F)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
As another example, it has been well documented that the ability of a new born baby to survive is closely associated with its birth-weight. A birth-weight of less than 1500&amp;#039;&amp;#039;g&amp;#039;&amp;#039; is regarded as dangerously low. Consider &amp;lt;math&amp;gt;E=&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;birth weight of a baby is less than&amp;#039;&amp;#039; 1500&amp;#039;&amp;#039;g&amp;#039;&amp;#039;, &amp;lt;math&amp;gt;F=&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;mother smoked during pregnancy&amp;#039;&amp;#039;; then evidence as to whether &amp;lt;math&amp;gt;\Pr(E|F)&amp;gt;\Pr (E|\bar{F})&amp;lt;/math&amp;gt; is of considerable interest.&lt;br /&gt;
&lt;br /&gt;
As a preliminary to the main development, consider the simple experiment of rolling a fair die and observing the number of dots on the upturned face. Then &amp;lt;math&amp;gt;S=\left\{ 1,2,3,4,5,6\right\} &amp;lt;/math&amp;gt; and define events, &amp;lt;math&amp;gt;E=\left\{4\right\} &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F=\left\{ 4,5,6\right\} ;&amp;lt;/math&amp;gt; we are interested in &amp;lt;math&amp;gt;\Pr \left(E|F\right)&amp;lt;/math&amp;gt;. To work this out we take &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; as known. Given this knowledge the sample space becomes restricted to simply &amp;lt;math&amp;gt;\left\{ 4,5,6\right\} &amp;lt;/math&amp;gt; and, given no other information, each of these &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; outcomes remains equally likely. So the required event, &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt;, is just one of three equally likely outcomes. It therefore seems reasonable that &amp;lt;math&amp;gt;\Pr (E|F)=\frac{1}{3}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
We shall now develop this idea more fully, using Venn Diagrams with the implied notion of area giving probability. Consider an abstract sample space, denoted by &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, with events &amp;lt;math&amp;gt;E\subset S,\,\,F\subset S&amp;lt;/math&amp;gt;. This is illustrated in the following Figure. Eventually we will want to construct the conditional probability, &amp;lt;math&amp;gt;\Pr \left( E|F\right)&amp;lt;/math&amp;gt;. Sticking with the above example that could be the probability that &amp;#039;&amp;#039;a child is underweight&amp;#039;&amp;#039;, given that &amp;#039;&amp;#039;the mother is a smoker&amp;#039;&amp;#039;. Two important areas used in the construction of this conditional probability are highlighted as &amp;lt;math&amp;gt;\mathbf{a}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\mathbf{b}&amp;lt;/math&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
[[File:ProbCond_venn1.jpg|frameless|400px]]&lt;br /&gt;
&lt;br /&gt;
In general, it is useful to think of &amp;lt;math&amp;gt;\Pr (E)&amp;lt;/math&amp;gt; as &amp;lt;math&amp;gt;\frac{area\left( E\right)}{area\left( S\right) }&amp;lt;/math&amp;gt;; and similarly for &amp;lt;math&amp;gt;\Pr (F)&amp;lt;/math&amp;gt;. The &amp;lt;math&amp;gt;\Pr (E\cap F)&amp;lt;/math&amp;gt; could equally be thought of as &amp;lt;math&amp;gt;\frac{area\left( a\right)}{area\left( S\right) }&amp;lt;/math&amp;gt;. With this in mind, consider what happens if we are now told that &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; has occurred. Incorporating this information implies that the effective sample space becomes restricted to &amp;lt;math&amp;gt;S^{*}=F&amp;lt;/math&amp;gt;, since &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; now defines what can happen. This now covers the sample area &amp;lt;math&amp;gt;a+b.&amp;lt;/math&amp;gt; On this new, restricted, sample space an outcome in &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; can only be observed if that outcome also belongs to &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;, the restricted sample space &amp;lt;math&amp;gt;S^*&amp;lt;/math&amp;gt;. And this only occurs in area &amp;lt;math&amp;gt;a&amp;lt;/math&amp;gt; which corresponds to the event &amp;lt;math&amp;gt;E\cap F&amp;lt;/math&amp;gt;. Thus the event of interest &amp;#039;&amp;#039;now&amp;#039;&amp;#039; is &amp;lt;math&amp;gt;E^{*}=E\cap F,&amp;lt;/math&amp;gt; as defined on the &amp;#039;&amp;#039;restricted &amp;#039;&amp;#039;sample space of &amp;lt;math&amp;gt;S^{*}=F&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In order to proceed with the construction of the conditional probability, &amp;lt;math&amp;gt;\Pr \left( E|F\right)&amp;lt;/math&amp;gt;, let &amp;lt;math&amp;gt;area(S)=z&amp;lt;/math&amp;gt;. Then, since the ratio of the area of the event of interest to that of the sample space gives probability, we have (on this restricted sample space):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr (E|F) &amp;amp;=&amp;amp;\frac{area\left( E\cap F\right) }{area\left( F\right) } \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{a}{a+b} \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{a/z}{\left( a+b\right) /z} \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{\Pr \left( E\cap F\right) }{\Pr \left( F\right) }.\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We have shown, for this example, how a conditional probability can be expressed as a function of the joint probability &amp;lt;math&amp;gt;\Pr \left( E\cap F\right)&amp;lt;/math&amp;gt; and the marginal probability &amp;lt;math&amp;gt;\Pr \left( F\right)&amp;lt;/math&amp;gt;. This is a profound result and should be formulated in more general terms:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;The probability that &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; occurs, given that &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is known to have occurred, gives the &amp;#039;&amp;#039;&amp;#039;conditional probability&amp;#039;&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; given &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;. This is denoted &amp;lt;math&amp;gt;Pr(E|F)&amp;lt;/math&amp;gt; and is calculated as&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (E|F)=\frac{\Pr (E\cap F)}{\Pr (F)}&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;and from the axioms of probability will generate a number lying between 0 and 1, since &amp;lt;math&amp;gt;\Pr (F)\geq \Pr (E\cap F)\geq 0.&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example: &amp;#039;&amp;#039;A manufacturer of electrical components knows that the probability is 0.8 that an order will be ready for shipment on time and 0.6 that it will also be delivered on time. What is the probability that such an order will be delivered on time given that it was ready for shipment on time?&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Let &amp;lt;math&amp;gt;R=&amp;lt;/math&amp;gt; READY, &amp;lt;math&amp;gt;D=&amp;lt;/math&amp;gt; DELIVERED ON TIME. &amp;lt;math&amp;gt;Pr(R)=0.8,Pr(R\cap D)=0.6.&amp;lt;/math&amp;gt; From this we need to calculate &amp;lt;math&amp;gt;Pr(D|R),&amp;lt;/math&amp;gt; using the above formula. This gives, &amp;lt;math&amp;gt;Pr(D|R)=Pr(R\cap D)/Pr(R)=6/8,\,\,&amp;lt;/math&amp;gt;or&amp;lt;math&amp;gt;\,\,75\%&amp;lt;/math&amp;gt;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
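The arithmetic in this example can be checked in a couple of lines (Python, illustrative only):

```python
# Conditional probability: Pr(D|R) = Pr(R and D) / Pr(R)
pr_r = 0.8        # order ready for shipment on time
pr_r_and_d = 0.6  # ready on time AND delivered on time
pr_d_given_r = pr_r_and_d / pr_r
assert abs(pr_d_given_r - 0.75) < 1e-12  # i.e. 75%
```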
&lt;br /&gt;
If we re-arrange the above formula for conditional probability, we obtain the so-called &amp;#039;&amp;#039;multiplication rule of probability &amp;#039;&amp;#039;for &amp;#039;&amp;#039;intersections&amp;#039;&amp;#039; of events:&lt;br /&gt;
&lt;br /&gt;
== Multiplication rule of probability ==&lt;br /&gt;
&lt;br /&gt;
The multiplication rule of probability can be stated as follows:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;math&amp;gt;\Pr (E\cap F)=\Pr (E|F)\times \Pr (F)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that for any two events, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F,&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;(E\cap F)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;(E\cap \bar{F})&amp;lt;/math&amp;gt; are mutually exclusive with &amp;lt;math&amp;gt;E=(E\cap F)\cup (E\cap \bar{F})&amp;lt;/math&amp;gt;; this has been seen before. So the &amp;#039;&amp;#039;addition rule&amp;#039;&amp;#039; and &amp;#039;&amp;#039;multiplication rule&amp;#039;&amp;#039; of probability together give:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr (E) &amp;amp;=&amp;amp;\Pr (E\cap F)+\Pr (E\cap \bar{F}) \\&lt;br /&gt;
&amp;amp;=&amp;amp;\Pr (E|F)\times \Pr (F)+\Pr (E|\bar{F})\times \Pr (\bar{F}).\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is an extremely important and useful result, in practice, as we shall see shortly.&lt;br /&gt;
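The rule is simple to apply numerically. The sketch below plugs numbers into &amp;lt;math&amp;gt;\Pr (E)=\Pr (E|F)\Pr (F)+\Pr (E|\bar{F})\Pr (\bar{F})&amp;lt;/math&amp;gt; for the earlier birth-weight example; the three input probabilities are invented purely for illustration, not taken from any study:

```python
# Law of total probability, with HYPOTHETICAL inputs for the birth-weight example.
pr_f = 0.25           # Pr(mother smoked during pregnancy)      -- invented
pr_e_given_f = 0.04   # Pr(birth weight < 1500g | smoked)       -- invented
pr_e_given_nf = 0.01  # Pr(birth weight < 1500g | did not smoke) -- invented

pr_e = pr_e_given_f * pr_f + pr_e_given_nf * (1 - pr_f)
assert abs(pr_e - 0.0175) < 1e-12  # marginal Pr(low birth weight)
```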
&lt;br /&gt;
== Statistical Independence ==&lt;br /&gt;
&lt;br /&gt;
If the knowledge that &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; has occurred does NOT alter our probability assessment of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; are said to be (statistically) &amp;#039;&amp;#039;independent&amp;#039;&amp;#039;. In this sense, &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; carries no information about &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Formally, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; are &amp;#039;&amp;#039;&amp;#039;independent&amp;#039;&amp;#039;&amp;#039; events if and only if&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;Pr(E|F)=Pr(E)&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;which, in turn is true&amp;#039;&amp;#039; if and only if &amp;#039;&amp;#039;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;Pr(E\cap F)=Pr(E)\times Pr(F).&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
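The product criterion is easy to test on the fair-die sample space. The sketch below (Python, illustrative only) exhibits one pair of die events that is independent and shows that the earlier pair &amp;lt;math&amp;gt;E=\left\{4\right\}&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;F=\left\{4,5,6\right\}&amp;lt;/math&amp;gt; is not:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # fair die: all outcomes equally likely

def pr(event):
    """Probability of an event as (favourable outcomes) / (total outcomes)."""
    return Fraction(len(event), len(S))

# "Even" and "at most four" turn out to be independent on a fair die:
E, F = {2, 4, 6}, {1, 2, 3, 4}
assert pr(E & F) == pr(E) * pr(F)  # 1/3 on both sides

# The earlier pair is NOT independent: Pr(E|F) = 1/3 differs from Pr(E) = 1/6.
E2, F2 = {4}, {4, 5, 6}
assert pr(E2 & F2) != pr(E2) * pr(F2)
```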
&lt;br /&gt;
This concept of independence is of enormous importance in practice. Consider the case of lung cancer and its connection to smoking (apologies to all smokers for being picked upon here). The first connection between smoking and lung cancer was made in the 1920s. However, for many decades afterwards the tobacco industry spent a great deal of money and effort trying to convince people that there was no connection between the two. In other words, they claimed that the two events were &amp;#039;&amp;#039;independent&amp;#039;&amp;#039;, or &amp;lt;math&amp;gt;Pr(Cancer|Smoking)=Pr(Cancer|\overline{Smoking})=Pr(Cancer)&amp;lt;/math&amp;gt;. It was then the task of epidemiologists to show otherwise. This was famously and comprehensively achieved by the [http://en.wikipedia.org/wiki/British_Doctors_Study &amp;#039;&amp;#039;British Doctors Study&amp;#039;&amp;#039;].&lt;br /&gt;
&lt;br /&gt;
== Bayes’ Theorem ==&lt;br /&gt;
&lt;br /&gt;
One area where conditional probability is extremely important is that of clinical trials - testing the power of a diagnostic test to detect the presence of a particular disease. Suppose, then, that a new test is being developed and let &amp;lt;math&amp;gt;P=&amp;lt;/math&amp;gt; ‘&amp;#039;&amp;#039;test positive&amp;#039;&amp;#039;’ and &amp;lt;math&amp;gt;D=&amp;lt;/math&amp;gt; ‘&amp;#039;&amp;#039;presence of disease&amp;#039;&amp;#039;’, but where the results from applying the diagnostic test can never be wholly reliable. From the point of view of our previous discussion on conditional probability, we would of course require &amp;lt;math&amp;gt;\Pr \left( P|D\right)&amp;lt;/math&amp;gt; to be large; i.e., the test should be effective at detecting the disease. However, if you think about it, this is not necessarily the probability that we might be interested in from a diagnosis point of view. Rather, we should be more interested in &amp;lt;math&amp;gt;\Pr \left( D|P\right)&amp;lt;/math&amp;gt;, the probability of correct diagnosis, and require this to be large (with, presumably, &amp;lt;math&amp;gt;\Pr (D|\bar{P})&amp;lt;/math&amp;gt; being small). Here, what we are trying to attach a probability to is a possible ‘cause’. The observed outcome is a positive test result (&amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt;), but the presence or non-presence of the disease is what is of interest and this is uncertain. &amp;lt;math&amp;gt;\Pr (D|P)&amp;lt;/math&amp;gt; asks the question ‘&amp;#039;&amp;#039;what is the probability that it is the presence of the disease which caused the positive test result&amp;#039;&amp;#039;’? (Another recent news-worthy example would be the effect of exposure to depleted uranium on Gulf and Balkan war veterans. Given the presence of lymph, lung or brain cancer in such individuals (&amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt;), how likely is it that the cause was exposure to depleted uranium weapons (&amp;lt;math&amp;gt;D&amp;lt;/math&amp;gt;)? Firstly, is &amp;lt;math&amp;gt;\Pr \left( D|P\right) &amp;lt;/math&amp;gt; high or low? Secondly, might there be something else (&amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;) which could offer a “better” explanation, such that &amp;lt;math&amp;gt;\Pr \left( F|P\right) &amp;gt;\Pr \left( D|P\right) &amp;lt;/math&amp;gt; ?)&lt;br /&gt;
&lt;br /&gt;
The situation is depicted in the following Figure, in which there are two possible ‘states’ in the population: &amp;lt;math&amp;gt;D&amp;lt;/math&amp;gt; (depicted by the lighter shaded area covering the left portion of the sample space) and &amp;lt;math&amp;gt;\bar{D}.&amp;lt;/math&amp;gt; It must be that &amp;lt;math&amp;gt;D\cup \bar{D}=S,&amp;lt;/math&amp;gt; since any individual in the population either has the disease or does not. The event of an observed positive test result is denoted by the closed loop, &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt;. (Notice that the shading in the diagram is relatively darker where &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt; intersects with &amp;lt;math&amp;gt;D&amp;lt;/math&amp;gt;.)&lt;br /&gt;
&lt;br /&gt;
[[File:ProbCond_venn2.jpg|frameless|400px]]&lt;br /&gt;
&lt;br /&gt;
To investigate how we might construct the required probability, &amp;lt;math&amp;gt;\Pr \left(D|P\right)&amp;lt;/math&amp;gt;, proceed as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr \left( D|P\right) &amp;amp;=&amp;amp;\frac{\Pr \left( D\cap P\right) }{\Pr (P)} \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{\Pr (D\cap P)}{\Pr (P\cap D)+\Pr (P\cap \bar{D})},\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
since &amp;lt;math&amp;gt;P=(P\cap D)\cup (P\cap \bar{D}),&amp;lt;/math&amp;gt; and these are mutually exclusive. From the multiplication rule of probability, &amp;lt;math&amp;gt;\Pr \left( P\cap D\right) =\Pr(P|D)\times \Pr (D),&amp;lt;/math&amp;gt; and similarly for &amp;lt;math&amp;gt;\Pr \left( P\cap \bar{D}\right)&amp;lt;/math&amp;gt;. Thus&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Pr \left( D|P\right) =\frac{\Pr \left( P|D\right) \times \Pr \left(D\right) }{\Pr \left( P|D\right) \times \Pr \left( D\right) +\Pr (P|\bar{D})\times \Pr \left( \bar{D}\right) },&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
which may be convenient to work with since &amp;lt;math&amp;gt;\Pr \left( P|D\right) &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Pr\left( P|\bar{D}\right) &amp;lt;/math&amp;gt; can be estimated from clinical trials and &amp;lt;math&amp;gt;\Pr\left( D\right) &amp;lt;/math&amp;gt; estimated from recent historical survey data.&lt;br /&gt;
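To make the formula concrete, the sketch below evaluates it in Python with invented figures for the prevalence &amp;lt;math&amp;gt;\Pr(D)&amp;lt;/math&amp;gt;, sensitivity &amp;lt;math&amp;gt;\Pr(P|D)&amp;lt;/math&amp;gt; and false-positive rate &amp;lt;math&amp;gt;\Pr(P|\bar{D})&amp;lt;/math&amp;gt; (none of these numbers come from the text); it illustrates the well-known point that a rare disease can make &amp;lt;math&amp;gt;\Pr(D|P)&amp;lt;/math&amp;gt; surprisingly small even for a good test:

```python
# Bayes' Theorem for the diagnostic test, with HYPOTHETICAL inputs.
pr_d = 0.01          # prevalence Pr(D)                 -- invented
pr_p_given_d = 0.95  # sensitivity Pr(P|D)              -- invented
pr_p_given_nd = 0.05 # false-positive rate Pr(P|not D)  -- invented

numerator = pr_p_given_d * pr_d
denominator = numerator + pr_p_given_nd * (1 - pr_d)
pr_d_given_p = numerator / denominator

# Despite 95% sensitivity, a positive result here implies only about a 16% chance of disease.
assert 0.15 < pr_d_given_p < 0.17
```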
&lt;br /&gt;
This sort of calculation (assigning probabilities to possible causes of observed events) is an example of &amp;#039;&amp;#039;Bayes’ Theorem&amp;#039;&amp;#039;. Of course, we may have to consider more than two possible causes, and the construction of the appropriate probabilities is as follows.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Consider a sample space, &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;A,B,C&amp;lt;/math&amp;gt; are three mutually exclusive events (possible causes), defined on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, such that &amp;lt;math&amp;gt;S=A\cup B\cup C&amp;lt;/math&amp;gt;. In such a situation, &amp;lt;math&amp;gt;A,B&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; are said to form a &amp;#039;&amp;#039;&amp;#039;partition&amp;#039;&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;. &amp;#039;&amp;#039;&amp;#039;Bayes’ Theorem&amp;#039;&amp;#039;&amp;#039; states that:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (A|E)=\frac{\Pr (E|A)\times \Pr (A)}{\left\{ \Pr (E|A)\times \Pr(A)\right\} +\left\{ \Pr (E|B)\times \Pr (B)\right\} +\left\{ \Pr(E|C)\times \Pr (C)\right\} }.&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;And, more generally, consider a sample space, &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F_{1},F_{2},...,F_{k}&amp;lt;/math&amp;gt; are &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; mutually exclusive events (possible causes), which form a partition of &amp;lt;math&amp;gt;S:S=\bigcup_{j=1}^{k}F_{j}&amp;lt;/math&amp;gt;. &amp;#039;&amp;#039;&amp;#039;Bayes’ Theorem&amp;#039;&amp;#039;&amp;#039; then states that:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (F_{j}|E)=\frac{\Pr (E|F_{j})\times \Pr (F_{j})}{\sum_{s=1}^{k}\left\{\Pr (E|F_{s})\times \Pr (F_{s})\right\} }.&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
From the above formula, you should be able to satisfy yourself that &amp;lt;math&amp;gt;\sum_{j=1}^{k}\Pr \left( F_{j}|E\right) =1.&amp;lt;/math&amp;gt; If this is not at first clear, consider case (1) and show that &amp;lt;math&amp;gt;\Pr \left( A|E\right) +\Pr \left( B|E\right) +\Pr \left( C|E\right) =1.&amp;lt;/math&amp;gt; The reason for this is that since &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; form a partition of &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; they must also form a partition of any event &amp;lt;math&amp;gt;E\subset S.&amp;lt;/math&amp;gt; In the above conditional probabilities, we are regarding &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; as the restricted sample space and therefore the probabilities assigned to the mutually exclusive events &amp;lt;math&amp;gt;\left( A,B,C\right) &amp;lt;/math&amp;gt; which &amp;#039;&amp;#039;cover&amp;#039;&amp;#039; this (restricted) sample space, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, must sum to 1.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example&amp;#039;&amp;#039;: Box A contains 2 red balls. Box B contains 1 red and 1 white ball. Box A and Box B are identical. If a box is selected at random and one ball is withdrawn from it, what is the probability that the selected box was Box A if the ball withdrawn from it turns out to be red?&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Let &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; be the event of selecting Box A and &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; the event of drawing a red ball. Require &amp;lt;math&amp;gt;Pr(A|R)&amp;lt;/math&amp;gt;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;Pr(A|R)=Pr(A\cap R)/Pr(R);&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;Pr(A\cap R)=Pr(A)Pr(R|A)=(1/2)\times 1=1/2.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And,&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
Pr(R) &amp;amp;=&amp;amp;Pr(A\cap R)+Pr(\bar{A}\cap R) \\&lt;br /&gt;
&amp;amp;=&amp;amp;\Pr (A)\times \Pr (R|A)\,\,\,\,+\,\,\,\,\Pr (\bar{A})\times \Pr (R|\bar{A}) \\&lt;br /&gt;
&amp;amp;=&amp;amp;(1/2)\,\,\,\,+\,\,\,\,(1/2)\times (1/2) \\&lt;br /&gt;
&amp;amp;=&amp;amp;3/4.\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Therefore, &amp;lt;math&amp;gt;\Pr (A|R)=(1/2)/(3/4)=2/3&amp;lt;/math&amp;gt;.&lt;br /&gt;
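The answer of 2/3 can also be checked by simulation. The sketch below (Python, illustrative only) repeats the box-and-ball experiment many times and estimates &amp;lt;math&amp;gt;\Pr(A|R)&amp;lt;/math&amp;gt; as the fraction of red draws that came from Box A:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible
from_box_a = red_draws = 0
for _ in range(100_000):
    box = random.choice("AB")  # each box equally likely
    # Box A holds two red balls; Box B holds one red and one white.
    ball = "red" if box == "A" else random.choice(["red", "white"])
    if ball == "red":
        red_draws += 1
        if box == "A":
            from_box_a += 1

estimate = from_box_a / red_draws
assert abs(estimate - 2 / 3) < 0.02  # Monte Carlo estimate of Pr(A|R) = 2/3
```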
&lt;br /&gt;
= Exercises =&lt;br /&gt;
&lt;br /&gt;
You can find examples related to these topics here: [[Probability_Conditional_Exercises]].&lt;br /&gt;
&lt;br /&gt;
= Footnotes =&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Conditional&amp;diff=2875</id>
		<title>Probability Conditional</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Conditional&amp;diff=2875"/>
				<updated>2013-08-09T10:16:13Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
&lt;br /&gt;
= Conditional Probability =&lt;br /&gt;
&lt;br /&gt;
An important consideration in the development of probability is that of &amp;#039;&amp;#039;conditional probability&amp;#039;&amp;#039;. This refers to the calculation of updating probabilities in the light of revealed information. For example, insurance companies nearly always set their home contents insurance premiums on the basis of the postcode in which the home is located. That is to say, insurance companies believe the risk depends upon the location; i.e., the probability of property crime is assessed conditional upon the location of the property. (A similar calculation is made to set car insurance premiums.) As a result, the premiums for two identical households located in different parts of the country can differ substantially.&lt;br /&gt;
&lt;br /&gt;
* In general, the probability of an event, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, occurring &amp;#039;&amp;#039;given&amp;#039;&amp;#039; that an event, &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;, has occurred is called the &amp;#039;&amp;#039;conditional probability&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; given &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; and is denoted &amp;lt;math&amp;gt;\Pr (E|F)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
As another example, it has been well documented that the ability of a new born baby to survive is closely associated with its birth-weight. A birth-weight of less than 1500&amp;#039;&amp;#039;g&amp;#039;&amp;#039; is regarded as dangerously low. Consider &amp;lt;math&amp;gt;E=&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;birth weight of a baby is less than&amp;#039;&amp;#039; 1500&amp;#039;&amp;#039;g&amp;#039;&amp;#039;, &amp;lt;math&amp;gt;F=&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;mother smoked during pregnancy&amp;#039;&amp;#039;; then evidence as to whether &amp;lt;math&amp;gt;\Pr(E|F)&amp;gt;\Pr (E|\bar{F})&amp;lt;/math&amp;gt; is of considerable interest.&lt;br /&gt;
&lt;br /&gt;
As a preliminary to the main development, consider the simple experiment of rolling a fair die and observing the number of dots on the upturned face. Then &amp;lt;math&amp;gt;S=\left\{ 1,2,3,4,5,6\right\} &amp;lt;/math&amp;gt; and define events, &amp;lt;math&amp;gt;E=\left\{4\right\} &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F=\left\{ 4,5,6\right\} ;&amp;lt;/math&amp;gt; we are interested in &amp;lt;math&amp;gt;\Pr \left(E|F\right)&amp;lt;/math&amp;gt;. To work this out we take &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; as known. Given this knowledge the sample space becomes restricted to simply &amp;lt;math&amp;gt;\left\{ 4,5,6\right\} &amp;lt;/math&amp;gt; and, given no other information, each of these &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; outcomes remains equally likely. So the required event, &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt;, is just one of three equally likely outcomes. It therefore seems reasonable that &amp;lt;math&amp;gt;\Pr (E|F)=\frac{1}{3}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
We shall now develop this idea more fully, using Venn Diagrams with the implied notion of area giving probability. Consider an abstract sample space, denoted by &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, with events &amp;lt;math&amp;gt;E\subset S,\,\,F\subset S&amp;lt;/math&amp;gt;. This is illustrated in the following Figure. Eventually we will want to construct the conditional probability, &amp;lt;math&amp;gt;\Pr \left( E|F\right)&amp;lt;/math&amp;gt;. Sticking with the above example that could be the probability that &amp;#039;&amp;#039;a child is underweight&amp;#039;&amp;#039;, given that &amp;#039;&amp;#039;the mother is a smoker&amp;#039;&amp;#039;. Two important areas used in the construction of this conditional probability are highlighted as &amp;lt;math&amp;gt;\mathbf{a}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\mathbf{b}&amp;lt;/math&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
[[File:ProbCond_venn1.jpg|frameless|400px]]&lt;br /&gt;
&lt;br /&gt;
In general, it is useful to think of &amp;lt;math&amp;gt;\Pr (E)&amp;lt;/math&amp;gt; as &amp;lt;math&amp;gt;\frac{area\left( E\right)}{area\left( S\right) }&amp;lt;/math&amp;gt;; and similarly for &amp;lt;math&amp;gt;\Pr (F)&amp;lt;/math&amp;gt;. The &amp;lt;math&amp;gt;\Pr (E\cap F)&amp;lt;/math&amp;gt; could equally be thought of as &amp;lt;math&amp;gt;\frac{area\left( a\right)}{area\left( S\right) }&amp;lt;/math&amp;gt;. With this in mind, consider what happens if we are now told that &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; has occurred. Incorporating this information implies that the effective sample space becomes restricted to &amp;lt;math&amp;gt;S^{*}=F&amp;lt;/math&amp;gt;, since &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; now defines what can happen. This now covers the sample area &amp;lt;math&amp;gt;a+b.&amp;lt;/math&amp;gt; On this new, restricted, sample space an outcome in &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; can only be observed if that outcome also belongs to &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;, the restricted sample space &amp;lt;math&amp;gt;S^*&amp;lt;/math&amp;gt;. And this only occurs in area &amp;lt;math&amp;gt;a&amp;lt;/math&amp;gt; which corresponds to the event &amp;lt;math&amp;gt;E\cap F&amp;lt;/math&amp;gt;. Thus the event of interest &amp;#039;&amp;#039;now&amp;#039;&amp;#039; is &amp;lt;math&amp;gt;E^{*}=E\cap F,&amp;lt;/math&amp;gt; as defined on the &amp;#039;&amp;#039;restricted &amp;#039;&amp;#039;sample space of &amp;lt;math&amp;gt;S^{*}=F&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In order to proceed with the construction of the conditional probability, &amp;lt;math&amp;gt;\Pr \left( E|F\right)&amp;lt;/math&amp;gt;, let &amp;lt;math&amp;gt;area(S)=z&amp;lt;/math&amp;gt;. Then, since the ratio of the area of the event of interest to that of the sample space gives probability, we have (on this restricted sample space):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr (E|F) &amp;amp;=&amp;amp;\frac{area\left( E\cap F\right) }{area\left( F\right) } \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{a}{a+b} \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{a/z}{\left( a+b\right) /z} \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{\Pr \left( E\cap F\right) }{\Pr \left( F\right) },\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We have shown, for this example, how a conditional probability can be expressed as a function of the joint probability &amp;lt;math&amp;gt;\Pr \left( E\cap F\right)&amp;lt;/math&amp;gt; and the marginal probability &amp;lt;math&amp;gt;\Pr \left( F\right)&amp;lt;/math&amp;gt;. This is a profound result and should be formulated in more general terms:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;The probability that &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; occurs, given that &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is known to have occurred, gives the &amp;#039;&amp;#039;&amp;#039;conditional probability&amp;#039;&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; given &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;. This is denoted &amp;lt;math&amp;gt;Pr(E|F)&amp;lt;/math&amp;gt; and is calculated as&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (E|F)=\frac{\Pr (E\cap F)}{\Pr (F)}&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;and from the axioms of probability will generate a number lying between 0 and 1, since &amp;lt;math&amp;gt;\Pr (F)\geq \Pr (E\cap F)\geq 0.&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example: &amp;#039;&amp;#039;A manufacturer of electrical components knows that the probability that an order will be ready for shipment on time is 0.8, and the probability that it will be ready and also delivered on time is 0.6. What is the probability that such an order will be delivered on time, given that it was ready for shipment on time?&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Let &amp;lt;math&amp;gt;R=&amp;lt;/math&amp;gt; READY, &amp;lt;math&amp;gt;D=&amp;lt;/math&amp;gt; DELIVERED ON TIME. &amp;lt;math&amp;gt;Pr(R)=0.8,Pr(R\cap D)=0.6.&amp;lt;/math&amp;gt; From this we need to calculate &amp;lt;math&amp;gt;Pr(D|R),&amp;lt;/math&amp;gt; using the above formula. This gives &amp;lt;math&amp;gt;Pr(D|R)=Pr(R\cap D)/Pr(R)=6/8,\,\,&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\,\,75\%&amp;lt;/math&amp;gt;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
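The arithmetic in this kind of calculation is easy to check with a few lines of code; the sketch below (the variable names are ours, purely illustrative) reproduces the shipment example.&lt;br /&gt;

```python
# Conditional probability: Pr(D|R) = Pr(R and D) / Pr(R),
# using the numbers from the shipment example above.
p_R = 0.8        # Pr(order ready on time)
p_R_and_D = 0.6  # Pr(order ready AND delivered on time)

p_D_given_R = p_R_and_D / p_R
print(p_D_given_R)  # approximately 0.75
```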
If we re-arrange the above formula for conditional probability, we obtain the so-called &amp;#039;&amp;#039;multiplication rule of probability&amp;#039;&amp;#039; for &amp;#039;&amp;#039;intersections&amp;#039;&amp;#039; of events:&lt;br /&gt;
&lt;br /&gt;
== Multiplication rule of probability ==&lt;br /&gt;
&lt;br /&gt;
The multiplication rule of probability can be stated as follows:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;math&amp;gt;\Pr (E\cap F)=\Pr (E|F)\times \Pr (F)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that for any two events, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F,&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;(E\cap F)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;(E\cap \bar{F})&amp;lt;/math&amp;gt; are mutually exclusive with &amp;lt;math&amp;gt;E=(E\cap F)\cup (E\cap \bar{F})&amp;lt;/math&amp;gt;; this has been seen before. So the &amp;#039;&amp;#039;addition rule&amp;#039;&amp;#039; and &amp;#039;&amp;#039;multiplication rule&amp;#039;&amp;#039; of probability together give:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr (E) &amp;amp;=&amp;amp;\Pr (E\cap F)+\Pr (E\cap \bar{F}) \\&lt;br /&gt;
&amp;amp;=&amp;amp;\Pr (E|F)\times \Pr (F)+\Pr (E|\bar{F})\times \Pr (\bar{F}).\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is an extremely important and useful result in practice, as we shall see shortly.&lt;br /&gt;
&lt;br /&gt;
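A quick numerical check of this decomposition (the probabilities below are invented purely for illustration):&lt;br /&gt;

```python
# Law of total probability:
# Pr(E) = Pr(E|F)*Pr(F) + Pr(E|not F)*Pr(not F).
p_F = 0.3              # illustrative marginal probability of F
p_E_given_F = 0.9      # illustrative conditional probabilities
p_E_given_not_F = 0.2

p_E = p_E_given_F * p_F + p_E_given_not_F * (1 - p_F)
print(p_E)  # approximately 0.41
```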
== Statistical Independence ==&lt;br /&gt;
&lt;br /&gt;
If the knowledge that &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; has occurred does NOT alter our probability assessment of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; are said to be (statistically) &amp;#039;&amp;#039;independent&amp;#039;&amp;#039;. In this sense, &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; carries no information about &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Formally, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; are &amp;#039;&amp;#039;&amp;#039;independent&amp;#039;&amp;#039;&amp;#039; events if and only if&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;Pr(E|F)=Pr(E)&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;which, in turn, is true &amp;#039;&amp;#039;if and only if&amp;#039;&amp;#039;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;Pr(E\cap F)=Pr(E)\times Pr(F).&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
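The product condition gives a direct computational test for independence; a minimal sketch (the helper function and its numbers are ours, not from the text):&lt;br /&gt;

```python
import math

def independent(p_E, p_F, p_E_and_F, tol=1e-9):
    """Check whether Pr(E and F) equals Pr(E)*Pr(F) within tolerance."""
    return math.isclose(p_E_and_F, p_E * p_F, abs_tol=tol)

# Independent case: the joint probability factorises exactly.
print(independent(0.5, 0.4, 0.2))   # True
# Dependent case: the joint probability exceeds the product.
print(independent(0.5, 0.4, 0.35))  # False
```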
This concept of independence is of enormous importance in practice. Consider the case of lung cancer and its connection to smoking (apologies to all smokers for picking on them here). The first connection between smoking and lung cancer was made in the 1920s. However, for many decades afterwards the tobacco industry spent a lot of money and effort trying to convince people that there was no connection between the two. In other words, they claimed that the two events are &amp;#039;&amp;#039;independent&amp;#039;&amp;#039;, or &amp;lt;math&amp;gt;Pr(Cancer|Smoking)=Pr(Cancer|\overline{Smoking})=Pr(Cancer)&amp;lt;/math&amp;gt;. It was then the task of epidemiologists to show otherwise. This was famously and comprehensively achieved by the [http://en.wikipedia.org/wiki/British_Doctors_Study &amp;#039;&amp;#039;British Doctors Study&amp;#039;&amp;#039;].&lt;br /&gt;
&lt;br /&gt;
== Bayes’ Theorem ==&lt;br /&gt;
&lt;br /&gt;
One area where conditional probability is extremely important is that of clinical trials - testing the power of a diagnostic test to detect the presence of a particular disease. Suppose, then, that a new test is being developed and let &amp;lt;math&amp;gt;P=&amp;lt;/math&amp;gt; ‘&amp;#039;&amp;#039;test positive&amp;#039;&amp;#039;’ and &amp;lt;math&amp;gt;D=&amp;lt;/math&amp;gt; ‘&amp;#039;&amp;#039;presence of disease&amp;#039;&amp;#039;’, but where the results from applying the diagnostic test can never be wholly reliable. From the point of view of our previous discussion on conditional probability, we would of course require &amp;lt;math&amp;gt;\Pr \left( P|D\right)&amp;lt;/math&amp;gt; to be large; i.e., the test should be effective at detecting the disease. However, if you think about it, this is not necessarily the probability that we might be interested in from a diagnosis point of view. Rather, we should be more interested in &amp;lt;math&amp;gt;\Pr \left( D|P\right)&amp;lt;/math&amp;gt;, the probability of correct diagnosis, and require this to be large (with, presumably, &amp;lt;math&amp;gt;\Pr (D|\bar{P})&amp;lt;/math&amp;gt; being small). Here, what we are trying to attach a probability to is a possible ‘cause’. The observed outcome is a positive test result (&amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt;), but the presence or non-presence of the disease is what is of interest and this is uncertain. &amp;lt;math&amp;gt;\Pr (D|P)&amp;lt;/math&amp;gt; asks the question ‘&amp;#039;&amp;#039;what is the probability that it is the presence of the disease which caused the positive test result&amp;#039;&amp;#039;’? (Another recent newsworthy example would be the effect of exposure to depleted uranium on Gulf and Balkan war veterans. Given the presence of lymph, lung or brain cancer in such individuals (&amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt;), how likely is it that the cause was exposure to depleted uranium weapons (&amp;lt;math&amp;gt;D&amp;lt;/math&amp;gt;)? Firstly, is &amp;lt;math&amp;gt;\Pr \left( D|P\right) &amp;lt;/math&amp;gt; high or low? Secondly, might there be something else (&amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;) which could offer a “better” explanation, such that &amp;lt;math&amp;gt;\Pr \left( F|P\right) &amp;gt;\Pr \left( D|P\right) &amp;lt;/math&amp;gt;?)&lt;br /&gt;
&lt;br /&gt;
The situation is depicted in the following Figure, in which there are two possible ‘states’ in the population: &amp;lt;math&amp;gt;D&amp;lt;/math&amp;gt; (depicted by the lighter shaded area covering the left portion of the sample space) and &amp;lt;math&amp;gt;\bar{D}.&amp;lt;/math&amp;gt; It must be that &amp;lt;math&amp;gt;D\cup \bar{D}=S,&amp;lt;/math&amp;gt; since any individual in the population either has the disease or does not. The event of an observed positive test result is denoted by the closed loop, &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt;. (Notice that the shading in the diagram is relatively darker where &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt; intersects with &amp;lt;math&amp;gt;D&amp;lt;/math&amp;gt;.)&lt;br /&gt;
&lt;br /&gt;
[[File:ProbCond_venn2.jpg|frameless|400px]]&lt;br /&gt;
&lt;br /&gt;
To investigate how we might construct the required probability, &amp;lt;math&amp;gt;\Pr \left(D|P\right)&amp;lt;/math&amp;gt;, proceed as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr \left( D|P\right) &amp;amp;=&amp;amp;\frac{\Pr \left( D\cap P\right) }{\Pr (P)} \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{\Pr (D\cap P)}{\Pr (P\cap D)+\Pr (P\cap \bar{D})},\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
since &amp;lt;math&amp;gt;P=(P\cap D)\cup (P\cap \bar{D}),&amp;lt;/math&amp;gt; and these are mutually exclusive. From the multiplication rule of probability, &amp;lt;math&amp;gt;\Pr \left( P\cap D\right) =\Pr(P|D)\times \Pr (D),&amp;lt;/math&amp;gt; and similarly for &amp;lt;math&amp;gt;\Pr \left( P\cap \bar{D}\right)&amp;lt;/math&amp;gt;. Thus&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Pr \left( D|P\right) =\frac{\Pr \left( P|D\right) \times \Pr \left(D\right) }{\Pr \left( P|D\right) \times \Pr \left( D\right) +\Pr (P|\bar{D})\times \Pr \left( \bar{D}\right) },&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
which may be convenient to work with since &amp;lt;math&amp;gt;\Pr \left( P|D\right) &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Pr\left( P|\bar{D}\right) &amp;lt;/math&amp;gt; can be estimated from clinical trials and &amp;lt;math&amp;gt;\Pr\left( D\right) &amp;lt;/math&amp;gt; estimated from recent historical survey data.&lt;br /&gt;
&lt;br /&gt;
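To see how this formula behaves in practice, it can be evaluated with some illustrative (entirely invented) clinical numbers; note how a low prevalence &amp;lt;math&amp;gt;\Pr(D)&amp;lt;/math&amp;gt; drags &amp;lt;math&amp;gt;\Pr(D|P)&amp;lt;/math&amp;gt; down even when the test itself is accurate.&lt;br /&gt;

```python
# Pr(D|P) = Pr(P|D)Pr(D) / (Pr(P|D)Pr(D) + Pr(P|not D)Pr(not D)),
# with invented numbers for prevalence and test accuracy.
p_D = 0.01              # prevalence of the disease (illustrative)
p_P_given_D = 0.95      # Pr(test positive | disease present)
p_P_given_not_D = 0.05  # Pr(test positive | disease absent)

numerator = p_P_given_D * p_D
denominator = numerator + p_P_given_not_D * (1 - p_D)
p_D_given_P = numerator / denominator
print(p_D_given_P)  # roughly 0.16: most positives come from the healthy group
```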
This sort of calculation (assigning probabilities to possible causes of observed events) is an example of &amp;#039;&amp;#039;Bayes’ Theorem&amp;#039;&amp;#039;. Of course, we may have to consider more than two possible causes, and the construction of the appropriate probabilities is as follows.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Consider a sample space, &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;A,B,C&amp;lt;/math&amp;gt; are three mutually exclusive events (possible causes), defined on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, such that &amp;lt;math&amp;gt;S=A\cup B\cup C&amp;lt;/math&amp;gt;. In such a situation, &amp;lt;math&amp;gt;A,B&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; are said to form a &amp;#039;&amp;#039;&amp;#039;partition&amp;#039;&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;. &amp;#039;&amp;#039;&amp;#039;Bayes’ Theorem&amp;#039;&amp;#039;&amp;#039; states that:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (A|E)=\frac{\Pr (E|A)\times \Pr (A)}{\left\{ \Pr (E|A)\times \Pr(A)\right\} +\left\{ \Pr (E|B)\times \Pr (B)\right\} +\left\{ \Pr(E|C)\times \Pr (C)\right\} }.&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;And, more generally, consider a sample space, &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F_{1},F_{2},...,F_{k}&amp;lt;/math&amp;gt; are &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; mutually exclusive events (possible causes), which form a partition of &amp;lt;math&amp;gt;S:S=\bigcup_{j=1}^{k}F_{j}&amp;lt;/math&amp;gt;. &amp;#039;&amp;#039;&amp;#039;Bayes’ Theorem&amp;#039;&amp;#039;&amp;#039; then states that:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (F_{j}|E)=\frac{\Pr (E|F_{j})\times \Pr (F_{j})}{\sum_{s=1}^{k}\left\{\Pr (E|F_{s})\times \Pr (F_{s})\right\} }.&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
From the above formula, you should be able to satisfy yourself that &amp;lt;math&amp;gt;\sum_{j=1}^{k}\Pr \left( F_{j}|E\right) =1.&amp;lt;/math&amp;gt; If this is not at first clear, consider case (1) and show that &amp;lt;math&amp;gt;\Pr \left( A|E\right) +\Pr \left( B|E\right) +\Pr \left( C|E\right) =1.&amp;lt;/math&amp;gt; The reason for this is that since &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; form a partition of &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; they must also form a partition of any event &amp;lt;math&amp;gt;E\subset S.&amp;lt;/math&amp;gt; In the above conditional probabilities, we are regarding &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; as the restricted sample space, and therefore the probabilities assigned to the mutually exclusive events &amp;lt;math&amp;gt;\left( A,B,C\right) &amp;lt;/math&amp;gt; which &amp;#039;&amp;#039;cover&amp;#039;&amp;#039; this (restricted) sample space, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, must sum to 1.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example&amp;#039;&amp;#039;: Box A contains 2 red balls. Box B contains 1 red and 1 white ball. Box A and Box B are identical in appearance. If a box is selected at random and one ball is withdrawn from it, what is the probability that the selected box was Box A if the ball withdrawn from it turns out to be red?&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Let &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; be the event of selecting Box A and &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; the event of drawing a red ball. Require &amp;lt;math&amp;gt;Pr(A|R)&amp;lt;/math&amp;gt;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;Pr(A|R)=Pr(A\cap R)/Pr(R);&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;Pr(A\cap R)=Pr(A)Pr(R|A)=(1/2)\times 1=1/2.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And,&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
Pr(R) &amp;amp;=&amp;amp;Pr(A\cap R)+Pr(\bar{A}\cap R) \\&lt;br /&gt;
&amp;amp;=&amp;amp;\Pr (A)\times \Pr (R|A)\,\,\,\,+\,\,\,\,\Pr (\bar{A})\times \Pr (R|\bar{A}) \\&lt;br /&gt;
&amp;amp;=&amp;amp;(1/2)\,\,\,\,+\,\,\,\,(1/2)\times (1/2) \\&lt;br /&gt;
&amp;amp;=&amp;amp;3/4.\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Therefore, &amp;lt;math&amp;gt;\Pr (A|R)=(1/2)/(3/4)=2/3&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
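The same computation works for any partition; the small helper below (our own illustrative code, not from the text) reproduces the box example, giving &amp;lt;math&amp;gt;\Pr(A|R)=2/3&amp;lt;/math&amp;gt;.&lt;br /&gt;

```python
def bayes_posteriors(priors, likelihoods):
    """Posterior Pr(F_j|E) for each cause F_j in a partition,
    given priors Pr(F_j) and likelihoods Pr(E|F_j)."""
    joints = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joints)  # Pr(E), by the law of total probability
    return [j / total for j in joints]

# Box example: Pr(A) = Pr(B) = 1/2; Pr(R|A) = 1, Pr(R|B) = 1/2.
posteriors = bayes_posteriors([0.5, 0.5], [1.0, 0.5])
print(posteriors)  # [0.666..., 0.333...], i.e. Pr(A|R) = 2/3
```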
= Exercises =&lt;br /&gt;
&lt;br /&gt;
You can find examples related to these topics here: [[Probability_Conditional_Examples]].&lt;br /&gt;
&lt;br /&gt;
= Footnotes =&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Conditional_Exercises&amp;diff=2874</id>
		<title>Probability Conditional Exercises</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Conditional_Exercises&amp;diff=2874"/>
				<updated>2013-08-09T10:14:24Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: Created page with &amp;quot;  = Conditional Probabilities Exercises =  &amp;lt;ol&amp;gt; &amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt; are events such that &amp;lt;math&amp;gt;\Pr (A)=0.4&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Pr (A\cup B)=0.75.&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
&lt;br /&gt;
= Conditional Probabilities Exercises =&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt; are events such that &amp;lt;math&amp;gt;\Pr (A)=0.4&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Pr (A\cup B)=0.75.&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Find &amp;lt;math&amp;gt;\Pr (B)&amp;lt;/math&amp;gt; if &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt; are mutually exclusive.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Find &amp;lt;math&amp;gt;\Pr (B)&amp;lt;/math&amp;gt; if &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt; are independent.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Events &amp;lt;math&amp;gt;A,&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; are such that &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; are mutually exclusive and &amp;lt;math&amp;gt;\Pr (A)=2/3,&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\Pr (A\cup B)=5/6&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Pr (B\cup C)=4/5.&amp;lt;/math&amp;gt; If &amp;lt;math&amp;gt;\Pr (B|A)=1/2&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Pr (C|A)=3/10,&amp;lt;/math&amp;gt; are &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; statistically independent?&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;A sample of 1000 undergraduates were asked whether they took either Mathematics, Physics or Chemistry at A-level. The following responses were obtained: 100 just took Mathematics; 70 just took Physics; 100 just took Chemistry; 150 took Mathematics and Physics, but not Chemistry; 40 took Mathematics and Chemistry, but not Physics; and, 240 took Physics and Chemistry, but not Mathematics. Calculate the following:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Of those who took Mathematics, what proportion also took Physics (but not Chemistry) and what proportion took both Physics and Chemistry?&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Of those who took Physics and Chemistry, what proportion also took Mathematics?&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;The &amp;#039;&amp;#039;Survey of British Births&amp;#039;&amp;#039;, undertaken in the 1970s, aimed to improve the survival rate and care of British babies at, and soon after, birth by collecting and analysing data on new-born babies. A sample was taken designed to be representative of the whole population of British births and consisted of all babies born alive (or dead) after the 24th week of gestation, between 0001 hours on Sunday 5 April and 2400 hours on Saturday 11 April 1970. The total number in the sample so obtained was &amp;lt;math&amp;gt;n=17,530.&amp;lt;/math&amp;gt; A large amount of information was obtained, but one particular area of interest was the effect of the smoking habits of the mothers on newly born babies. In particular, the ability of a newly born baby to survive is closely associated with its birth-weight and a birth-weight of less than 1500&amp;#039;&amp;#039;g&amp;#039;&amp;#039; is considered dangerously low. Some of the relevant data are summarised as follows.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;For all new born babies in the sample, the proportion of mothers who: (i) &amp;#039;&amp;#039;smoked before and during pregnancy&amp;#039;&amp;#039; was 0.433; (ii) &amp;#039;&amp;#039;gave up smoking prior to pregnancy&amp;#039;&amp;#039; was 0.170; (iii) &amp;#039;&amp;#039;had never smoked&amp;#039;&amp;#039; was 0.397.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;However, by breaking down the sample into mothers who smoked, had given up, or had never smoked, the following statistics were obtained: (iv) &amp;lt;math&amp;gt;1.6\%&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;of the mothers who smoked gave birth to babies whose weight was less than&amp;#039;&amp;#039; 1500&amp;#039;&amp;#039;g&amp;#039;&amp;#039;; (v) &amp;lt;math&amp;gt;0.9\%&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;of the mothers who had given up smoking prior to pregnancy gave birth to babies whose weight was less than&amp;#039;&amp;#039; 1500&amp;#039;&amp;#039;g&amp;#039;&amp;#039;; (vi) &amp;lt;math&amp;gt;0.8\%&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;of mothers who had never smoked gave birth to babies whose weight was less than&amp;#039;&amp;#039; 1500&amp;#039;&amp;#039;g&amp;#039;&amp;#039;.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Given this information, how would you estimate the risk, for a smoking mother, of giving birth to a dangerously under-weight baby? What is the corresponding risk for a mother who has never smoked? What is the overall risk of giving birth to an under-weight baby?&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Of the babies born under &amp;lt;math&amp;gt;1500&amp;lt;/math&amp;gt;g&amp;lt;math&amp;gt;,&amp;lt;/math&amp;gt; estimate the proportion of these (a) born to mothers who smoked before and during pregnancy; (b) born to mothers who had never smoked.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;On the basis of the above information, how would you assess the evidence on smoking during pregnancy as a factor which could result in babies being born under weight?&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Metal fatigue in an aeroplane’s wing can be caused by any one of three (relatively minor) defects, labelled &amp;lt;math&amp;gt;A,\,\,B\,\,&amp;lt;/math&amp;gt;and &amp;lt;math&amp;gt;C,&amp;lt;/math&amp;gt; occurring during the manufacturing process. The probabilities are estimated as: &amp;lt;math&amp;gt;\Pr (A)=0.3,&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\Pr (B)=0.1,&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\Pr (C)=0.6.&amp;lt;/math&amp;gt; At the quality control stage of production, a test has been developed which is used to detect the presence of a defect. Let &amp;lt;math&amp;gt;D&amp;lt;/math&amp;gt; be the event that the test detects a manufacturing defect, with the following probabilities: &amp;lt;math&amp;gt;\Pr (D|A)=0.6,&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\Pr (D|B)=0.2,&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\Pr (D|C)=0.7.&amp;lt;/math&amp;gt; If the test detects a defect, which of &amp;lt;math&amp;gt;A,&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; is the most likely cause? (&amp;#039;&amp;#039;Hint&amp;#039;&amp;#039;: you need to find, and compare, &amp;lt;math&amp;gt;\Pr\left( A|D\right)&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;\Pr \left( B|D\right) &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Pr \left( C|D\right) &amp;lt;/math&amp;gt; using Bayes’ Theorem.)&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Footnotes =&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Statistics&amp;diff=2873</id>
		<title>Statistics</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Statistics&amp;diff=2873"/>
				<updated>2013-08-09T10:11:24Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: /* Probability */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page is currently being built and should be completed for the start of Academic Year 2013/14.&lt;br /&gt;
&lt;br /&gt;
The purpose of these pages is to give prospective and current postgraduate students in Economics at The University of Manchester an opportunity to understand what Statistics knowledge they are expected to have and, of course, a source of revision information. The material presented here is largely based on material written by our colleagues Len Gill, Denise Osborn and Chris Orme at The University of Manchester.&lt;br /&gt;
&lt;br /&gt;
This page is explicitly designed with MSc Economics (and/or Econometrics) and MA Economics students at The University of Manchester in mind. Below you will be able to identify which topic is expected knowledge for the MSc/MA.&lt;br /&gt;
&lt;br /&gt;
== The Basics ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Topic&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| &lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| MA Economics&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| MSc Economics&lt;br /&gt;
|-&lt;br /&gt;
| [[StatPrelim|Preliminaries and Notation]]&lt;br /&gt;
|&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[DataTypes|Data types]]&lt;br /&gt;
|&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[GraphicRep|Graphical Representation]]&lt;br /&gt;
|&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[Descriptive|Descriptive Statistics]]&lt;br /&gt;
| &lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[Regression|Correlation &amp;amp; Regression]]&lt;br /&gt;
| [[Regression_Examples|Exercises]]&lt;br /&gt;
| Correlation only&lt;br /&gt;
| Yes&lt;br /&gt;
&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Probability ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Topic&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| &lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| MA Economics&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| MSc Economics&lt;br /&gt;
|-&lt;br /&gt;
| [[Probability_Intro|Introduction to Probability]]&lt;br /&gt;
|&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[Probability_Conditional|Conditional Probability]]&lt;br /&gt;
| [[Probability_Conditional_Exercises|Exercises]]&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[Probability_DiscreteRV|Discrete Random Variables]]&lt;br /&gt;
|&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[Probability_Norm|Normal Distribution]]&lt;br /&gt;
|&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[Probability_MomenntsExp|Moments and Expectations]]&lt;br /&gt;
|&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Statistical Inference ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==&amp;lt;div id=&amp;quot;DataSets&amp;quot;&amp;gt;&amp;lt;/div&amp;gt;Example Data Sets==&lt;br /&gt;
&lt;br /&gt;
Throughout these pages a few example datasets will be used.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Description&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Organisation&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Data File&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Source&lt;br /&gt;
|-&lt;br /&gt;
| Data used to calculate a Quality of Life Index&lt;br /&gt;
| OECD (collecting data from various sources)&lt;br /&gt;
| [[media:OECD_BetterLifeIndex.xls|BetterLifeIndex.xls]]&lt;br /&gt;
| [http://www.oecdbetterlifeindex.org/]&lt;br /&gt;
|-&lt;br /&gt;
| Exchange Rate USD/UKP&lt;br /&gt;
| Board of Governors of the Federal Reserve System&lt;br /&gt;
| [[media:USDUKP.xlsx|USDUKP.xlsx]]&lt;br /&gt;
| [http://www.federalreserve.gov/releases/h10/hist/dat00_uk.htm]&lt;br /&gt;
|-&lt;br /&gt;
| UK Gross Domestic Product&lt;br /&gt;
| Office for National Statistics&lt;br /&gt;
| [[media:UK_GDP_INF.xlsx|UK_GDP_INF.xlsx]]&lt;br /&gt;
| [http://www.guardian.co.uk/]&lt;br /&gt;
|-&lt;br /&gt;
| Passengers on the Titanic&lt;br /&gt;
| &lt;br /&gt;
|[[media:Titanic.xlsx|Titanic.xlsx]]&lt;br /&gt;
|[http://www.statsci.org/data/general/titanic.html]&lt;br /&gt;
|-&lt;br /&gt;
| CO2 and GDP&lt;br /&gt;
| [http://Gapminder.com Gapminder.com]&lt;br /&gt;
|[[media:GDP_CO2.xlsx|GDP_CO2.xlsx]]&lt;br /&gt;
|[http://Gapminder.com]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Statistics&amp;diff=2872</id>
		<title>Statistics</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Statistics&amp;diff=2872"/>
				<updated>2013-08-09T10:10:28Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: /* The Basics */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page is currently being built and should be completed for the start of Academic Year 2013/14.&lt;br /&gt;
&lt;br /&gt;
The purpose of these pages is to give prospective and current postgraduate students in Economics at The University of Manchester an opportunity to understand what Statistics knowledge they are expected to have and, of course, a source of revision information. The material presented here is largely based on material written by our colleagues Len Gill, Denise Osborn and Chris Orme at The University of Manchester.&lt;br /&gt;
&lt;br /&gt;
This page is explicitly designed with MSc Economics (and/or Econometrics) and MA Economics students at The University of Manchester in mind. Below you will be able to identify which topic is expected knowledge for the MSc/MA.&lt;br /&gt;
&lt;br /&gt;
== The Basics ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Topic&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| &lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| MA Economics&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| MSc Economics&lt;br /&gt;
|-&lt;br /&gt;
| [[StatPrelim|Preliminaries and Notation]]&lt;br /&gt;
|&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[DataTypes|Data types]]&lt;br /&gt;
|&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[GraphicRep|Graphical Representation]]&lt;br /&gt;
|&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[Descriptive|Descriptive Statistics]]&lt;br /&gt;
| &lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[Regression|Correlation &amp;amp; Regression]]&lt;br /&gt;
| [[Regression_Examples|Exercises]]&lt;br /&gt;
| Correlation only&lt;br /&gt;
| Yes&lt;br /&gt;
&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Probability ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Topic&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| MA Economics&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| MSc Economics&lt;br /&gt;
|-&lt;br /&gt;
| [[Probability_Intro|Introduction to Probability]]&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[Probability_Conditional|Conditional Probability]]&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[Probability_DiscreteRV|Discrete Random Variables]]&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[Probability_Norm|Normal Distribution]]&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[Probability_MomenntsExp|Moments and Expectations]]&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Statistical Inference ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==&amp;lt;div id=&amp;quot;DataSets&amp;quot;&amp;gt;&amp;lt;/div&amp;gt;Example Data Sets==&lt;br /&gt;
&lt;br /&gt;
Throughout these pages a few example datasets will be used.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Description&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Organisation&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Data File&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Source&lt;br /&gt;
|-&lt;br /&gt;
| Data used to calculate a Quality of Life Index&lt;br /&gt;
| OECD (collecting data from various sources)&lt;br /&gt;
| [[media:OECD_BetterLifeIndex.xls|BetterLifeIndex.xls]]&lt;br /&gt;
| [http://www.oecdbetterlifeindex.org/]&lt;br /&gt;
|-&lt;br /&gt;
| Exchange Rate USD/UKP&lt;br /&gt;
| Board of Governors of the Federal Reserve System&lt;br /&gt;
| [[media:USDUKP.xlsx|USDUKP.xlsx]]&lt;br /&gt;
| [http://www.federalreserve.gov/releases/h10/hist/dat00_uk.htm]&lt;br /&gt;
|-&lt;br /&gt;
| UK Gross Domestic Product&lt;br /&gt;
| Office for National Statistics&lt;br /&gt;
| [[media:UK_GDP_INF.xlsx|UK_GDP_INF.xlsx]]&lt;br /&gt;
| [http://www.guardian.co.uk/]&lt;br /&gt;
|-&lt;br /&gt;
| Passengers on the Titanic&lt;br /&gt;
| &lt;br /&gt;
|[[media:Titanic.xlsx|Titanic.xlsx]]&lt;br /&gt;
|[http://www.statsci.org/data/general/titanic.html]&lt;br /&gt;
|-&lt;br /&gt;
| CO2 and GDP&lt;br /&gt;
| [http://Gapminder.com Gapminder.com]&lt;br /&gt;
|[[media:GDP_CO2.xlsx|GDP_CO2.xlsx]]&lt;br /&gt;
|[http://Gapminder.com]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Statistics&amp;diff=2871</id>
		<title>Statistics</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Statistics&amp;diff=2871"/>
				<updated>2013-08-09T10:09:51Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: /* The Basics */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page is currently being built and should be completed for the start of Academic Year 2013/14.&lt;br /&gt;
&lt;br /&gt;
The purpose of these pages is to give prospective and current postgraduate students in Economics at The University of Manchester a clear picture of the Statistics knowledge they are expected to have, as well as a source of revision material. The material presented here is largely based on notes written by our colleagues Len Gill, Denise Osborn and Chris Orme at The University of Manchester.&lt;br /&gt;
&lt;br /&gt;
This page is designed explicitly with MSc Economics (and/or Econometrics) and MA Economics students at The University of Manchester in mind. The tables below identify which topics are expected knowledge for the MSc and which for the MA.&lt;br /&gt;
&lt;br /&gt;
== The Basics ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Topic&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| &lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| MA Economics&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| MSc Economics&lt;br /&gt;
|-&lt;br /&gt;
| [[StatPrelim|Preliminaries and Notation]]&lt;br /&gt;
|&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[DataTypes|Data types]]&lt;br /&gt;
|&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[GraphicRep|Graphical Representation]]&lt;br /&gt;
|&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[Descriptive|Descriptive Statistics]]&lt;br /&gt;
| &lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[Regression|Correlation &amp;amp; Regression]]&lt;br /&gt;
| Exercises&lt;br /&gt;
| Correlation only&lt;br /&gt;
| Yes&lt;br /&gt;
&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Probability ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Topic&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| MA Economics&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| MSc Economics&lt;br /&gt;
|-&lt;br /&gt;
| [[Probability_Intro|Introduction to Probability]]&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[Probability_Conditional|Conditional Probability]]&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[Probability_DiscreteRV|Discrete Random Variables]]&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[Probability_Norm|Normal Distribution]]&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[Probability_MomenntsExp|Moments and Expectations]]&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Statistical Inference ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==&amp;lt;div id=&amp;quot;DataSets&amp;quot;&amp;gt;&amp;lt;/div&amp;gt;Example Data Sets==&lt;br /&gt;
&lt;br /&gt;
Throughout these pages a few example datasets will be used.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Description&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Organisation&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Data File&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Source&lt;br /&gt;
|-&lt;br /&gt;
| Data used to calculate a Quality of Life Index&lt;br /&gt;
| OECD (collecting data from various sources)&lt;br /&gt;
| [[media:OECD_BetterLifeIndex.xls|BetterLifeIndex.xls]]&lt;br /&gt;
| [http://www.oecdbetterlifeindex.org/]&lt;br /&gt;
|-&lt;br /&gt;
| Exchange Rate USD/UKP&lt;br /&gt;
| Board of Governors of the Federal Reserve System&lt;br /&gt;
| [[media:USDUKP.xlsx|USDUKP.xlsx]]&lt;br /&gt;
| [http://www.federalreserve.gov/releases/h10/hist/dat00_uk.htm]&lt;br /&gt;
|-&lt;br /&gt;
| UK Gross Domestic Product&lt;br /&gt;
| Office for National Statistics&lt;br /&gt;
| [[media:UK_GDP_INF.xlsx|UK_GDP_INF.xlsx]]&lt;br /&gt;
| [http://www.guardian.co.uk/]&lt;br /&gt;
|-&lt;br /&gt;
| Passengers on the Titanic&lt;br /&gt;
| &lt;br /&gt;
|[[media:Titanic.xlsx|Titanic.xlsx]]&lt;br /&gt;
|[http://www.statsci.org/data/general/titanic.html]&lt;br /&gt;
|-&lt;br /&gt;
| CO2 and GDP&lt;br /&gt;
| [http://Gapminder.com Gapminder.com]&lt;br /&gt;
|[[media:GDP_CO2.xlsx|GDP_CO2.xlsx]]&lt;br /&gt;
|[http://Gapminder.com]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Statistics&amp;diff=2870</id>
		<title>Statistics</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Statistics&amp;diff=2870"/>
				<updated>2013-08-09T10:09:14Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: /* The Basics */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page is currently being built and should be completed for the start of Academic Year 2013/14.&lt;br /&gt;
&lt;br /&gt;
The purpose of these pages is to give prospective and current postgraduate students in Economics at The University of Manchester a clear picture of the Statistics knowledge they are expected to have, as well as a source of revision material. The material presented here is largely based on notes written by our colleagues Len Gill, Denise Osborn and Chris Orme at The University of Manchester.&lt;br /&gt;
&lt;br /&gt;
This page is designed explicitly with MSc Economics (and/or Econometrics) and MA Economics students at The University of Manchester in mind. The tables below identify which topics are expected knowledge for the MSc and which for the MA.&lt;br /&gt;
&lt;br /&gt;
== The Basics ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Topic&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| &lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| MA Economics&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| MSc Economics&lt;br /&gt;
|-&lt;br /&gt;
| [[StatPrelim|Preliminaries and Notation]]&lt;br /&gt;
|&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[DataTypes|Data types]]&lt;br /&gt;
|&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[GraphicRep|Graphical Representation]]&lt;br /&gt;
|&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[Descriptive|Descriptive Statistics]]&lt;br /&gt;
| Exercises&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[Regression|Correlation &amp;amp; Regression]]&lt;br /&gt;
| &lt;br /&gt;
| Correlation only&lt;br /&gt;
| Yes&lt;br /&gt;
&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Probability ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Topic&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| MA Economics&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| MSc Economics&lt;br /&gt;
|-&lt;br /&gt;
| [[Probability_Intro|Introduction to Probability]]&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[Probability_Conditional|Conditional Probability]]&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[Probability_DiscreteRV|Discrete Random Variables]]&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[Probability_Norm|Normal Distribution]]&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[Probability_MomenntsExp|Moments and Expectations]]&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Statistical Inference ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==&amp;lt;div id=&amp;quot;DataSets&amp;quot;&amp;gt;&amp;lt;/div&amp;gt;Example Data Sets==&lt;br /&gt;
&lt;br /&gt;
Throughout these pages a few example datasets will be used.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Description&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Organisation&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Data File&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Source&lt;br /&gt;
|-&lt;br /&gt;
| Data used to calculate a Quality of Life Index&lt;br /&gt;
| OECD (collecting data from various sources)&lt;br /&gt;
| [[media:OECD_BetterLifeIndex.xls|BetterLifeIndex.xls]]&lt;br /&gt;
| [http://www.oecdbetterlifeindex.org/]&lt;br /&gt;
|-&lt;br /&gt;
| Exchange Rate USD/UKP&lt;br /&gt;
| Board of Governors of the Federal Reserve System&lt;br /&gt;
| [[media:USDUKP.xlsx|USDUKP.xlsx]]&lt;br /&gt;
| [http://www.federalreserve.gov/releases/h10/hist/dat00_uk.htm]&lt;br /&gt;
|-&lt;br /&gt;
| UK Gross Domestic Product&lt;br /&gt;
| Office for National Statistics&lt;br /&gt;
| [[media:UK_GDP_INF.xlsx|UK_GDP_INF.xlsx]]&lt;br /&gt;
| [http://www.guardian.co.uk/]&lt;br /&gt;
|-&lt;br /&gt;
| Passengers on the Titanic&lt;br /&gt;
| &lt;br /&gt;
|[[media:Titanic.xlsx|Titanic.xlsx]]&lt;br /&gt;
|[http://www.statsci.org/data/general/titanic.html]&lt;br /&gt;
|-&lt;br /&gt;
| CO2 and GDP&lt;br /&gt;
| [http://Gapminder.com Gapminder.com]&lt;br /&gt;
|[[media:GDP_CO2.xlsx|GDP_CO2.xlsx]]&lt;br /&gt;
|[http://Gapminder.com]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Statistics&amp;diff=2869</id>
		<title>Statistics</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Statistics&amp;diff=2869"/>
				<updated>2013-08-09T10:08:59Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: /* The Basics */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page is currently being built and should be completed for the start of Academic Year 2013/14.&lt;br /&gt;
&lt;br /&gt;
The purpose of these pages is to give prospective and current postgraduate students in Economics at The University of Manchester a clear picture of the Statistics knowledge they are expected to have, as well as a source of revision material. The material presented here is largely based on notes written by our colleagues Len Gill, Denise Osborn and Chris Orme at The University of Manchester.&lt;br /&gt;
&lt;br /&gt;
This page is designed explicitly with MSc Economics (and/or Econometrics) and MA Economics students at The University of Manchester in mind. The tables below identify which topics are expected knowledge for the MSc and which for the MA.&lt;br /&gt;
&lt;br /&gt;
== The Basics ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Topic&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| &lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| MA Economics&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| MSc Economics&lt;br /&gt;
|-&lt;br /&gt;
| [[StatPrelim|Preliminaries and Notation]]&lt;br /&gt;
|&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[DataTypes|Data types]]&lt;br /&gt;
|&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[GraphicRep|Graphical Representation]]&lt;br /&gt;
|&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[Descriptive|Descriptive Statistics]]&lt;br /&gt;
| Exercises&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[Regression|Correlation &amp;amp; Regression]]&lt;br /&gt;
| &lt;br /&gt;
| Correlation only&lt;br /&gt;
| Yes&lt;br /&gt;
&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Probability ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Topic&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| MA Economics&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| MSc Economics&lt;br /&gt;
|-&lt;br /&gt;
| [[Probability_Intro|Introduction to Probability]]&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[Probability_Conditional|Conditional Probability]]&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[Probability_DiscreteRV|Discrete Random Variables]]&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[Probability_Norm|Normal Distribution]]&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| [[Probability_MomenntsExp|Moments and Expectations]]&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Statistical Inference ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==&amp;lt;div id=&amp;quot;DataSets&amp;quot;&amp;gt;&amp;lt;/div&amp;gt;Example Data Sets==&lt;br /&gt;
&lt;br /&gt;
Throughout these pages a few example datasets will be used.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Description&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Organisation&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Data File&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Source&lt;br /&gt;
|-&lt;br /&gt;
| Data used to calculate a Quality of Life Index&lt;br /&gt;
| OECD (collecting data from various sources)&lt;br /&gt;
| [[media:OECD_BetterLifeIndex.xls|BetterLifeIndex.xls]]&lt;br /&gt;
| [http://www.oecdbetterlifeindex.org/]&lt;br /&gt;
|-&lt;br /&gt;
| Exchange Rate USD/UKP&lt;br /&gt;
| Board of Governors of the Federal Reserve System&lt;br /&gt;
| [[media:USDUKP.xlsx|USDUKP.xlsx]]&lt;br /&gt;
| [http://www.federalreserve.gov/releases/h10/hist/dat00_uk.htm]&lt;br /&gt;
|-&lt;br /&gt;
| UK Gross Domestic Product&lt;br /&gt;
| Office for National Statistics&lt;br /&gt;
| [[media:UK_GDP_INF.xlsx|UK_GDP_INF.xlsx]]&lt;br /&gt;
| [http://www.guardian.co.uk/]&lt;br /&gt;
|-&lt;br /&gt;
| Passengers on the Titanic&lt;br /&gt;
| &lt;br /&gt;
|[[media:Titanic.xlsx|Titanic.xlsx]]&lt;br /&gt;
|[http://www.statsci.org/data/general/titanic.html]&lt;br /&gt;
|-&lt;br /&gt;
| CO2 and GDP&lt;br /&gt;
| [http://Gapminder.com Gapminder.com]&lt;br /&gt;
|[[media:GDP_CO2.xlsx|GDP_CO2.xlsx]]&lt;br /&gt;
|[http://Gapminder.com]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Conditional&amp;diff=2868</id>
		<title>Probability Conditional</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Conditional&amp;diff=2868"/>
				<updated>2013-08-09T09:58:51Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: /* Conditional Probability */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
&lt;br /&gt;
= Conditional Probability =&lt;br /&gt;
&lt;br /&gt;
An important consideration in the development of probability is that of &amp;#039;&amp;#039;conditional probability&amp;#039;&amp;#039;. This refers to updating probabilities in the light of newly revealed information. For example, insurance companies nearly always set their home contents insurance premiums on the basis of the postcode in which the home is located. That is to say, insurance companies believe the risk depends upon the location; i.e., the probability of property crime is assessed conditional upon the location of the property. (A similar calculation is made to set car insurance premiums.) As a result, the premiums for two identical households located in different parts of the country can differ substantially.&lt;br /&gt;
&lt;br /&gt;
* In general, the probability of an event, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, occurring &amp;#039;&amp;#039;given&amp;#039;&amp;#039; that an event, &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;, has occurred is called the &amp;#039;&amp;#039;conditional probability&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; given &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; and is denoted &amp;lt;math&amp;gt;\Pr (E|F)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
As another example, it has been well documented that the ability of a newborn baby to survive is closely associated with its birth-weight. A birth-weight of less than 1500&amp;#039;&amp;#039;g&amp;#039;&amp;#039; is regarded as dangerously low. Consider &amp;lt;math&amp;gt;E=&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;birth weight of a baby is less than&amp;#039;&amp;#039; 1500&amp;#039;&amp;#039;g&amp;#039;&amp;#039;, &amp;lt;math&amp;gt;F=&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;mother smoked during pregnancy&amp;#039;&amp;#039;; then evidence as to whether &amp;lt;math&amp;gt;\Pr(E|F)&amp;gt;\Pr (E|\bar{F})&amp;lt;/math&amp;gt; is of considerable interest.&lt;br /&gt;
&lt;br /&gt;
As a preliminary to the main development, consider the simple experiment of rolling a fair die and observing the number of dots on the upturned face. Then &amp;lt;math&amp;gt;S=\left\{ 1,2,3,4,5,6\right\} &amp;lt;/math&amp;gt; and define events, &amp;lt;math&amp;gt;E=\left\{4\right\} &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F=\left\{ 4,5,6\right\} ;&amp;lt;/math&amp;gt; we are interested in &amp;lt;math&amp;gt;\Pr \left(E|F\right)&amp;lt;/math&amp;gt;. To work this out we take &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; as known. Given this knowledge the sample space becomes restricted to simply &amp;lt;math&amp;gt;\left\{ 4,5,6\right\} &amp;lt;/math&amp;gt; and, given no other information, each of these &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; outcomes remains equally likely. So the required event, &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt;, is just one of three equally likely outcomes. It therefore seems reasonable that &amp;lt;math&amp;gt;\Pr (E|F)=\frac{1}{3}&amp;lt;/math&amp;gt;.&lt;br /&gt;
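The die-rolling argument above can be checked by direct enumeration. The following Python sketch (an illustration added here, not part of the original notes) counts outcomes on the restricted sample space:

```python
from fractions import Fraction

# Fair die: all six outcomes equally likely.
S = {1, 2, 3, 4, 5, 6}
E = {4}          # event E: "a four is rolled"
F = {4, 5, 6}    # event F: "at least a four is rolled"

# With equally likely outcomes, Pr(E|F) = |E n F| / |F|.
p_E_given_F = Fraction(len(E & F), len(F))
print(p_E_given_F)  # 1/3
```

Using exact fractions avoids any floating-point rounding in the check.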
&lt;br /&gt;
We shall now develop this idea more fully, using Venn Diagrams with the implied notion of area giving probability. Consider an abstract sample space, denoted by &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, with events &amp;lt;math&amp;gt;E\subset S,\,\,F\subset S&amp;lt;/math&amp;gt;. This is illustrated in the following Figure. Eventually we will want to construct the conditional probability, &amp;lt;math&amp;gt;\Pr \left( E|F\right)&amp;lt;/math&amp;gt;. Sticking with the above example, that could be the probability that &amp;#039;&amp;#039;a child is underweight&amp;#039;&amp;#039;, given that &amp;#039;&amp;#039;the mother is a smoker&amp;#039;&amp;#039;. Two important areas used in the construction of this conditional probability are highlighted as &amp;lt;math&amp;gt;\mathbf{a}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\mathbf{b}&amp;lt;/math&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
[[File:ProbCond_venn1.jpg|frameless|400px]]&lt;br /&gt;
&lt;br /&gt;
In general, it is useful to think of &amp;lt;math&amp;gt;\Pr (E)&amp;lt;/math&amp;gt; as &amp;lt;math&amp;gt;\frac{area\left( E\right)}{area\left( S\right) }&amp;lt;/math&amp;gt;; and similarly for &amp;lt;math&amp;gt;\Pr (F)&amp;lt;/math&amp;gt;. Likewise, &amp;lt;math&amp;gt;\Pr (E\cap F)&amp;lt;/math&amp;gt; can be thought of as &amp;lt;math&amp;gt;\frac{area\left( a\right)}{area\left( S\right) }&amp;lt;/math&amp;gt;. With this in mind, consider what happens if we are now told that &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; has occurred. Incorporating this information implies that the effective sample space becomes restricted to &amp;lt;math&amp;gt;S^{*}=F&amp;lt;/math&amp;gt;, since &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; now defines what can happen. This now covers the sample area &amp;lt;math&amp;gt;a+b&amp;lt;/math&amp;gt;. On this new, restricted, sample space an outcome in &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; can only be observed if that outcome also belongs to &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;, the restricted sample space &amp;lt;math&amp;gt;S^*&amp;lt;/math&amp;gt;. This occurs only in area &amp;lt;math&amp;gt;a&amp;lt;/math&amp;gt;, which corresponds to the event &amp;lt;math&amp;gt;E\cap F&amp;lt;/math&amp;gt;. Thus the event of interest &amp;#039;&amp;#039;now&amp;#039;&amp;#039; is &amp;lt;math&amp;gt;E^{*}=E\cap F,&amp;lt;/math&amp;gt; as defined on the &amp;#039;&amp;#039;restricted&amp;#039;&amp;#039; sample space of &amp;lt;math&amp;gt;S^{*}=F&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In order to proceed with the construction of the conditional probability, &amp;lt;math&amp;gt;\Pr \left( E|F\right)&amp;lt;/math&amp;gt;, let &amp;lt;math&amp;gt;area(S)=z&amp;lt;/math&amp;gt;. Then, since the ratio of the area of the event of interest to that of the sample space gives probability, we have (on this restricted sample space):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr (E|F) &amp;amp;=\frac{area\left( E\cap F\right) }{area\left( F\right) } \\&lt;br /&gt;
&amp;amp;=\frac{a}{a+b} \\&lt;br /&gt;
&amp;amp;=\frac{a/z}{\left( a+b\right) /z} \\&lt;br /&gt;
&amp;amp;=\frac{\Pr \left( E\cap F\right) }{\Pr \left( F\right) }.\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We have shown, for this example, how a conditional probability can be expressed as a function of the joint probability &amp;lt;math&amp;gt;\Pr \left( E\cap F\right)&amp;lt;/math&amp;gt; and the marginal probability &amp;lt;math&amp;gt;\Pr \left( F\right)&amp;lt;/math&amp;gt;. This is a profound result and should be formulated in more general terms:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;The probability that &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; occurs, given that &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is known to have occurred, gives the &amp;#039;&amp;#039;&amp;#039;conditional probability&amp;#039;&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; given &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;. This is denoted &amp;lt;math&amp;gt;Pr(E|F)&amp;lt;/math&amp;gt; and is calculated as&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (E|F)=\frac{\Pr (E\cap F)}{\Pr (F)}&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;and from the axioms of probability will generate a number lying between 0 and 1, since &amp;lt;math&amp;gt;\Pr (F)\geq \Pr (E\cap F)\geq 0.&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example: &amp;#039;&amp;#039;A manufacturer of electrical components knows that the probability is 0.8 that an order will be ready for shipment on time, and 0.6 that it will be both ready and delivered on time. What is the probability that such an order will be delivered on time, given that it was ready for shipment on time?&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Let &amp;lt;math&amp;gt;R=&amp;lt;/math&amp;gt; READY, &amp;lt;math&amp;gt;D=&amp;lt;/math&amp;gt; DELIVERED ON TIME. &amp;lt;math&amp;gt;Pr(R)=0.8,Pr(R\cap D)=0.6.&amp;lt;/math&amp;gt; From this we need to calculate &amp;lt;math&amp;gt;Pr(D|R)&amp;lt;/math&amp;gt; using the above formula. This gives &amp;lt;math&amp;gt;Pr(D|R)=Pr(R\cap D)/Pr(R)=6/8&amp;lt;/math&amp;gt;, or &amp;lt;math&amp;gt;75\%&amp;lt;/math&amp;gt;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
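The arithmetic of the shipment example can be verified mechanically. A minimal Python sketch (added for illustration, using exact fractions):

```python
from fractions import Fraction

p_R = Fraction(8, 10)         # Pr(R): order ready for shipment on time
p_R_and_D = Fraction(6, 10)   # Pr(R n D): ready AND delivered on time

# Conditional probability: Pr(D|R) = Pr(R n D) / Pr(R).
p_D_given_R = p_R_and_D / p_R
print(p_D_given_R)  # 3/4, i.e. 75%
```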
&lt;br /&gt;
If we re-arrange the above formula for conditional probability, we obtain the so-called &amp;#039;&amp;#039;multiplication rule of probability &amp;#039;&amp;#039;for &amp;#039;&amp;#039;intersections&amp;#039;&amp;#039; of events:&lt;br /&gt;
&lt;br /&gt;
== Multiplication rule of probability ==&lt;br /&gt;
&lt;br /&gt;
The multiplication rule of probability can be stated as follows:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;math&amp;gt;\Pr (E\cap F)=\Pr (E|F)\times \Pr (F)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that for any two events, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F,&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;(E\cap F)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;(E\cap \bar{F})&amp;lt;/math&amp;gt; are mutually exclusive with &amp;lt;math&amp;gt;E=(E\cap F)\cup (E\cap \bar{F})&amp;lt;/math&amp;gt;; this has been seen before. So the &amp;#039;&amp;#039;addition rule&amp;#039;&amp;#039; and &amp;#039;&amp;#039;multiplication rule&amp;#039;&amp;#039; of probability together give:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr (E) &amp;amp;=\Pr (E\cap F)+\Pr (E\cap \bar{F}) \\&lt;br /&gt;
&amp;amp;=\Pr (E|F)\times \Pr (F)+\Pr (E|\bar{F})\times \Pr (\bar{F}).\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is an extremely important and useful result, in practice, as we shall see shortly.&lt;br /&gt;
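As a sketch of this two-event rule, the following Python snippet applies it to the earlier birth-weight example. The figures are invented purely for illustration (they are not taken from the notes):

```python
from fractions import Fraction

# Illustrative figures, made up for this sketch:
p_F = Fraction(1, 5)               # Pr(F): mother smoked during pregnancy
p_E_given_F = Fraction(1, 25)      # Pr(E|F): birth weight below 1500g, smoker
p_E_given_notF = Fraction(1, 100)  # Pr(E|F-bar): same event, non-smoker

# Law of total probability:
# Pr(E) = Pr(E|F)Pr(F) + Pr(E|F-bar)Pr(F-bar)
p_E = p_E_given_F * p_F + p_E_given_notF * (1 - p_F)
print(p_E)  # 2/125, i.e. 0.016
```

The marginal probability of a dangerously low birth weight is a weighted average of the two conditional probabilities, with weights Pr(F) and Pr(F-bar).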
&lt;br /&gt;
== Statistical Independence ==&lt;br /&gt;
&lt;br /&gt;
If the knowledge that &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; has occurred does NOT alter our probability assessment of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; are said to be (statistically) &amp;#039;&amp;#039;independent&amp;#039;&amp;#039;. In this sense, &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; carries no information about &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Formally, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; are &amp;#039;&amp;#039;&amp;#039;independent&amp;#039;&amp;#039;&amp;#039; events if and only if&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;Pr(E|F)=Pr(E)&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;which, in turn, is true &amp;#039;&amp;#039;if and only if&amp;#039;&amp;#039;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;Pr(E\cap F)=Pr(E)\times Pr(F).&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
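A quick numerical check of this definition, again using a fair die. The events here are chosen (for this sketch only) so that independence happens to hold:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # fair die
E = {2, 4, 6}            # "an even number is rolled"
F = {1, 2, 3, 4}         # "a number no greater than four is rolled"

def pr(A):
    """Probability of event A under equally likely outcomes."""
    return Fraction(len(A), len(S))

# E and F are independent iff Pr(E n F) = Pr(E) * Pr(F).
print(pr(E & F) == pr(E) * pr(F))  # True: 1/3 == 1/2 * 2/3
```

Knowing the roll is at most four does not change the probability of an even number, which is exactly what independence means.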
&lt;br /&gt;
This concept of independence is of enormous importance in practice. Consider the case of lung cancer and its connection to smoking (apologies to smokers for picking on them here). The first connection between smoking and lung cancer was made in the 1920s. However, for many decades afterwards the tobacco industry spent a lot of money and effort trying to convince people that there was no connection between the two. In other words, they claimed that the two events are &amp;#039;&amp;#039;independent&amp;#039;&amp;#039;, or &amp;lt;math&amp;gt;Pr(Cancer|Smoking)=Pr(Cancer|\overline{Smoking})=Pr(Cancer)&amp;lt;/math&amp;gt;. It was then the task of epidemiologists to show otherwise. This was famously and comprehensively achieved by the [http://en.wikipedia.org/wiki/British_Doctors_Study &amp;#039;&amp;#039;British Doctors Study&amp;#039;&amp;#039;].&lt;br /&gt;
&lt;br /&gt;
== Bayes’ Theorem ==&lt;br /&gt;
&lt;br /&gt;
One area where conditional probability is extremely important is that of clinical trials: testing the power of a diagnostic test to detect the presence of a particular disease. Suppose, then, that a new test is being developed and let &amp;lt;math&amp;gt;P=&amp;lt;/math&amp;gt; ‘&amp;#039;&amp;#039;test positive&amp;#039;&amp;#039;’ and &amp;lt;math&amp;gt;D=&amp;lt;/math&amp;gt; ‘&amp;#039;&amp;#039;presence of disease&amp;#039;&amp;#039;’, but where the results from applying the diagnostic test can never be wholly reliable. From the point of view of our previous discussion on conditional probability, we would of course require &amp;lt;math&amp;gt;\Pr \left( P|D\right)&amp;lt;/math&amp;gt; to be large; i.e., the test should be effective at detecting the disease. However, if you think about it, this is not necessarily the probability that we might be interested in from a diagnosis point of view. Rather, we should be more interested in &amp;lt;math&amp;gt;\Pr \left( D|P\right)&amp;lt;/math&amp;gt;, the probability of correct diagnosis, and require this to be large (with, presumably, &amp;lt;math&amp;gt;\Pr (D|\bar{P})&amp;lt;/math&amp;gt; being small). Here, what we are trying to attach a probability to is a possible ‘cause’. The observed outcome is a positive test result (&amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt;), but the presence or non-presence of the disease is what is of interest and this is uncertain. &amp;lt;math&amp;gt;\Pr (D|P)&amp;lt;/math&amp;gt; asks the question ‘&amp;#039;&amp;#039;what is the probability that it is the presence of the disease which caused the positive test result&amp;#039;&amp;#039;’? (Another recent newsworthy example would be the effect of exposure to depleted uranium on Gulf and Balkan war veterans. 
Given the presence of lymph, lung or brain cancer in such individuals (&amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt;), how likely is it that the cause was exposure to depleted uranium weapons (&amp;lt;math&amp;gt;D&amp;lt;/math&amp;gt;)? Firstly, is &amp;lt;math&amp;gt;\Pr \left( D|P\right) &amp;lt;/math&amp;gt; high or low? Secondly, might there be something else (&amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;) which could offer a “better” explanation, such that &amp;lt;math&amp;gt;\Pr \left( F|P\right) &amp;gt;\Pr \left( D|P\right) &amp;lt;/math&amp;gt; ?)&lt;br /&gt;
&lt;br /&gt;
The situation is depicted in the following Figure, in which there are two possible ‘states’ in the population: &amp;lt;math&amp;gt;D&amp;lt;/math&amp;gt; (depicted by the lighter shaded area covering the left portion of the sample space) and &amp;lt;math&amp;gt;\bar{D}.&amp;lt;/math&amp;gt; It must be that &amp;lt;math&amp;gt;D\cup \bar{D}=S,&amp;lt;/math&amp;gt; since any individual in the population either has the disease or does not. The event of an observed positive test result is denoted by the closed loop, &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt;. (Notice that the shading in the diagram is relatively darker where &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt; intersects with &amp;lt;math&amp;gt;D&amp;lt;/math&amp;gt;.)&lt;br /&gt;
&lt;br /&gt;
[[File:ProbCond_venn2.jpg|frameless|400px]]&lt;br /&gt;
&lt;br /&gt;
To investigate how we might construct the required probability, &amp;lt;math&amp;gt;\Pr \left(D|P\right)&amp;lt;/math&amp;gt;, proceed as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr \left( D|P\right) &amp;amp;=&amp;amp;\frac{\Pr \left( D\cap P\right) }{\Pr (P)} \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{\Pr (D\cap P)}{\Pr (P\cap D)+\Pr (P\cap \bar{D})},\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
since &amp;lt;math&amp;gt;P=(P\cap D)\cup (P\cap \bar{D}),&amp;lt;/math&amp;gt; and these are mutually exclusive. From the multiplication rule of probability, &amp;lt;math&amp;gt;\Pr \left( P\cap D\right) =\Pr(P|D)\times \Pr (D),&amp;lt;/math&amp;gt; and similarly for &amp;lt;math&amp;gt;\Pr \left( P\cap \bar{D}\right)&amp;lt;/math&amp;gt;. Thus&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Pr \left( D|P\right) =\frac{\Pr \left( P|D\right) \times \Pr \left(D\right) }{\Pr \left( P|D\right) \times \Pr \left( D\right) +\Pr (P|\bar{D})\times \Pr \left( \bar{D}\right) },&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
which may be convenient to work with since &amp;lt;math&amp;gt;\Pr \left( P|D\right) &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Pr\left( P|\bar{D}\right) &amp;lt;/math&amp;gt; can be estimated from clinical trials and &amp;lt;math&amp;gt;\Pr\left( D\right) &amp;lt;/math&amp;gt; estimated from recent historical survey data.&lt;br /&gt;
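To make the formula concrete, here is a small numerical sketch in Python. The sensitivity, false-positive rate and prevalence below are invented purely for illustration, not taken from any real trial or survey.

```python
# Hypothetical inputs to Bayes' theorem for the diagnostic test.
pr_p_given_d = 0.95       # Pr(P|D): sensitivity (assumed value)
pr_p_given_not_d = 0.10   # Pr(P|D-bar): false-positive rate (assumed value)
pr_d = 0.02               # Pr(D): disease prevalence (assumed value)

# Pr(D|P) = Pr(P|D)Pr(D) / [Pr(P|D)Pr(D) + Pr(P|D-bar)Pr(D-bar)]
numerator = pr_p_given_d * pr_d
pr_d_given_p = numerator / (numerator + pr_p_given_not_d * (1 - pr_d))

print(round(pr_d_given_p, 4))  # about 0.16
```

Note how a fairly accurate test still yields a modest value of Pr(D|P) when the disease is rare; this is exactly why Pr(P|D) and Pr(D|P) must not be confused.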
&lt;br /&gt;
This sort of calculation (assigning probabilities to possible causes of observed events) is an example of &amp;#039;&amp;#039;Bayes’ Theorem&amp;#039;&amp;#039;. Of course, we may have to consider more than two possible causes, and the construction of the appropriate probabilities is as follows.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Consider a sample space, &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;A,B,C&amp;lt;/math&amp;gt; are three mutually exclusive events (possible causes), defined on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, such that &amp;lt;math&amp;gt;S=A\cup B\cup C&amp;lt;/math&amp;gt;. In such a situation, &amp;lt;math&amp;gt;A,B&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; are said to form a &amp;#039;&amp;#039;&amp;#039;partition&amp;#039;&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;. &amp;#039;&amp;#039;&amp;#039;Bayes’ Theorem&amp;#039;&amp;#039;&amp;#039; states that:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (A|E)=\frac{\Pr (E|A)\times \Pr (A)}{\left\{ \Pr (E|A)\times \Pr(A)\right\} +\left\{ \Pr (E|B)\times \Pr (B)\right\} +\left\{ \Pr(E|C)\times \Pr (C)\right\} }.&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;And, more generally, consider a sample space, &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F_{1},F_{2},...,F_{k}&amp;lt;/math&amp;gt; are &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; mutually exclusive events (possible causes), which form a partition of &amp;lt;math&amp;gt;S:S=\bigcup_{j=1}^{k}F_{j}&amp;lt;/math&amp;gt;. &amp;#039;&amp;#039;&amp;#039;Bayes’ Theorem&amp;#039;&amp;#039;&amp;#039; then states that:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (F_{j}|E)=\frac{\Pr (E|F_{j})\times \Pr (F_{j})}{\sum_{s=1}^{k}\left\{\Pr (E|F_{s})\times \Pr (F_{s})\right\} }.&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
From the above formula, you should be able to satisfy yourself that &amp;lt;math&amp;gt;\sum_{j=1}^{k}\Pr \left( F_{j}|E\right) =1.&amp;lt;/math&amp;gt; If this is not at first clear, consider case (1) and show that &amp;lt;math&amp;gt;\Pr \left( A|E\right) +\Pr \left(&lt;br /&gt;
B|E\right) +\Pr \left( C|E\right) =1.&amp;lt;/math&amp;gt; The reason for this is that since &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; form a partition of &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; they must also form a partition of any event &amp;lt;math&amp;gt;E\subset S.&amp;lt;/math&amp;gt; In the above conditional probabilities, we are regarding &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; as the restricted sample space and therefore the probabilities assigned to the mutually exclusive events &amp;lt;math&amp;gt;\left( A,B,C\right) &amp;lt;/math&amp;gt; which &amp;#039;&amp;#039;cover&amp;#039;&amp;#039; this (restricted) sample space, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, must sum to 1.&lt;br /&gt;
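As a quick sketch of the general statement, the following Python fragment computes the posteriors for three invented causes and confirms that they sum to 1:

```python
# Invented priors Pr(F_j) and likelihoods Pr(E|F_j) for k = 3 causes.
priors = [0.5, 0.3, 0.2]
likelihoods = [0.9, 0.5, 0.1]

# Denominator of Bayes' theorem: Pr(E), by the law of total probability.
pr_e = sum(l * p for l, p in zip(likelihoods, priors))
posteriors = [l * p / pr_e for l, p in zip(likelihoods, priors)]

print(abs(sum(posteriors) - 1.0) < 1e-9)  # True
```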
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example&amp;#039;&amp;#039;: Box A contains 2 red balls. Box B contains 1 red and 1 white ball. Box A and Box B are identical. If a box is selected at random and one ball is withdrawn from it, what is the probability that the selected box was Box A if the ball withdrawn from it turns out to be red?&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Let &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; be the event of selecting Box A and &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; the event of drawing a red ball. Require &amp;lt;math&amp;gt;Pr(A|R)&amp;lt;/math&amp;gt;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;Pr(A|R)=Pr(A\cap R)/Pr(R);&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;Pr(A\cap R)=Pr(A)Pr(R|A)=(1/2)\times 1=1/2.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And,&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
Pr(R) &amp;amp;=&amp;amp;Pr(A\cap R)+Pr(\bar{A}\cap R) \\&lt;br /&gt;
&amp;amp;=&amp;amp;\Pr (A)\times \Pr (R|A)\,\,\,\,+\,\,\,\,\Pr (\bar{A})\times \Pr (R|\bar{A}) \\&lt;br /&gt;
&amp;amp;=&amp;amp;(1/2)\,\,\,\,+\,\,\,\,(1/2)\times (1/2) \\&lt;br /&gt;
&amp;amp;=&amp;amp;3/4.\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Therefore, &amp;lt;math&amp;gt;\Pr (A|R)=(1/2)/(3/4)=2/3&amp;lt;/math&amp;gt;.&lt;br /&gt;
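The answer can also be checked by simulation. The short Monte Carlo sketch below simply mirrors the sampling story of the example (box chosen at random, then a ball drawn), so the estimated conditional relative frequency should settle near 2/3:

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible
trials = 200_000
red_draws = red_from_a = 0
for _ in range(trials):
    box = random.choice(["A", "B"])
    # Box A holds 2 red balls; Box B holds 1 red and 1 white ball.
    ball = "red" if box == "A" else random.choice(["red", "white"])
    if ball == "red":
        red_draws += 1
        if box == "A":
            red_from_a += 1

print(red_from_a / red_draws)  # close to 2/3
```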
&lt;br /&gt;
= Footnotes =&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=File:ProbCond_venn2.jpg&amp;diff=2867</id>
		<title>File:ProbCond venn2.jpg</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=File:ProbCond_venn2.jpg&amp;diff=2867"/>
				<updated>2013-08-09T09:58:25Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=File:ProbCond_venn1.jpg&amp;diff=2866</id>
		<title>File:ProbCond venn1.jpg</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=File:ProbCond_venn1.jpg&amp;diff=2866"/>
				<updated>2013-08-09T09:58:11Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Conditional&amp;diff=2865</id>
		<title>Probability Conditional</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Conditional&amp;diff=2865"/>
				<updated>2013-08-09T09:57:26Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: Created page with &amp;quot;  = Conditional Probability =  An important consideration in the development of probability is that of &amp;#039;&amp;#039;conditional probability&amp;#039;&amp;#039;. This refers to the calculation of updating ...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
&lt;br /&gt;
= Conditional Probability =&lt;br /&gt;
&lt;br /&gt;
An important consideration in the development of probability is that of &amp;#039;&amp;#039;conditional probability&amp;#039;&amp;#039;. This refers to the calculation of updating probabilities in the light of revealed information. For example, insurance companies nearly always set their home contents insurance premiums on the basis of the postcode in which the home is located. That is to say, insurance companies believe the risk depends upon the location; i.e., the probability of property crime is assessed conditional upon the location of the property. (A similar calculation is made to set car insurance premiums.) As a result, the premiums for two identical households located in different parts of the country can differ substantially.&lt;br /&gt;
&lt;br /&gt;
* In general, the probability of an event, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, occurring &amp;#039;&amp;#039;given&amp;#039;&amp;#039; that an event, &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;, has occurred is called the &amp;#039;&amp;#039;conditional probability&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; given &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; and is denoted &amp;lt;math&amp;gt;\Pr (E|F)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
As another example, it has been well documented that the ability of a newborn baby to survive is closely associated with its birth-weight. A birth-weight of less than 1500&amp;#039;&amp;#039;g&amp;#039;&amp;#039; is regarded as dangerously low. Consider &amp;lt;math&amp;gt;E=&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;birth weight of a baby is less than&amp;#039;&amp;#039; 1500&amp;#039;&amp;#039;g&amp;#039;&amp;#039;, &amp;lt;math&amp;gt;F=&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;mother smoked during pregnancy&amp;#039;&amp;#039;; then evidence as to whether &amp;lt;math&amp;gt;\Pr(E|F)&amp;gt;\Pr (E|\bar{F})&amp;lt;/math&amp;gt; is of considerable interest.&lt;br /&gt;
&lt;br /&gt;
As a preliminary to the main development, consider the simple experiment of rolling a fair die and observing the number of dots on the upturned face. Then &amp;lt;math&amp;gt;S=\left\{ 1,2,3,4,5,6\right\} &amp;lt;/math&amp;gt; and define events, &amp;lt;math&amp;gt;E=\left\{4\right\} &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F=\left\{ 4,5,6\right\} ;&amp;lt;/math&amp;gt; we are interested in &amp;lt;math&amp;gt;\Pr \left(E|F\right)&amp;lt;/math&amp;gt;. To work this out we take &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; as known. Given this knowledge the sample space becomes restricted to simply &amp;lt;math&amp;gt;\left\{ 4,5,6\right\} &amp;lt;/math&amp;gt; and, given no other information, each of these &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; outcomes remains equally likely. So the required event, &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt;, is just one of three equally likely outcomes. It therefore seems reasonable that &amp;lt;math&amp;gt;\Pr (E|F)=\frac{1}{3}&amp;lt;/math&amp;gt;.&lt;br /&gt;
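This restricted-sample-space calculation can be verified by direct enumeration. The Python sketch below treats probability as the proportion of equally likely outcomes:

```python
# One roll of a fair die: S = {1,...,6}, with E and F as in the text.
S = {1, 2, 3, 4, 5, 6}
E = {4}
F = {4, 5, 6}

def pr(event):
    # Equally likely outcomes: probability = |event| / |S|.
    return len(event) / len(S)

pr_e_given_f = pr(E & F) / pr(F)
print(abs(pr_e_given_f - 1/3) < 1e-12)  # True
```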
&lt;br /&gt;
We shall now develop this idea more fully, using Venn Diagrams with the implied notion of area giving probability. Consider an abstract sample space, denoted by &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, with events &amp;lt;math&amp;gt;E\subset S,\,\,F\subset S&amp;lt;/math&amp;gt;. This is illustrated in the following Figure. Eventually we will want to construct the conditional probability, &amp;lt;math&amp;gt;\Pr \left( E|F\right)&amp;lt;/math&amp;gt;. Sticking with the above example, that could be the probability that &amp;#039;&amp;#039;a child is underweight&amp;#039;&amp;#039;, given that &amp;#039;&amp;#039;the mother is a smoker&amp;#039;&amp;#039;. Two important areas used in the construction of this conditional probability are highlighted as &amp;lt;math&amp;gt;\mathbf{a}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\mathbf{b}&amp;lt;/math&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
[[File:ProbCond_venn1.jpg|frameless|600px]]&lt;br /&gt;
&lt;br /&gt;
In general, it is useful to think of &amp;lt;math&amp;gt;\Pr (E)&amp;lt;/math&amp;gt; as &amp;lt;math&amp;gt;\frac{area\left( E\right)}{area\left( S\right) }&amp;lt;/math&amp;gt;; and similarly for &amp;lt;math&amp;gt;\Pr (F)&amp;lt;/math&amp;gt;. The &amp;lt;math&amp;gt;\Pr (E\cap F)&amp;lt;/math&amp;gt; could equally be thought of as &amp;lt;math&amp;gt;\frac{area\left( a\right)}{area\left( S\right) }&amp;lt;/math&amp;gt;. With this in mind, consider what happens if we are now told that &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; has occurred. Incorporating this information implies that the effective sample space becomes restricted to &amp;lt;math&amp;gt;S^{*}=F&amp;lt;/math&amp;gt;, since &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; now defines what can happen. This now covers the sample area &amp;lt;math&amp;gt;a+b.&amp;lt;/math&amp;gt; On this new, restricted, sample space an outcome in &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; can only be observed if that outcome also belongs to &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;, the restricted sample space &amp;lt;math&amp;gt;S^*&amp;lt;/math&amp;gt;. And this only occurs in area &amp;lt;math&amp;gt;a&amp;lt;/math&amp;gt; which corresponds to the event &amp;lt;math&amp;gt;E\cap F&amp;lt;/math&amp;gt;. Thus the event of interest &amp;#039;&amp;#039;now&amp;#039;&amp;#039; is &amp;lt;math&amp;gt;E^{*}=E\cap F,&amp;lt;/math&amp;gt; as defined on the &amp;#039;&amp;#039;restricted &amp;#039;&amp;#039;sample space of &amp;lt;math&amp;gt;S^{*}=F&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In order to proceed with the construction of the conditional probability, &amp;lt;math&amp;gt;\Pr \left( E|F\right)&amp;lt;/math&amp;gt;, let &amp;lt;math&amp;gt;area(S)=z&amp;lt;/math&amp;gt;. Then, since the ratio of the area of the event of interest to that of the sample space gives probability, we have (on this restricted sample space):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr (E|F) &amp;amp;=&amp;amp;\frac{area\left( E\cap F\right) }{area\left( F\right) } \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{a}{a+b} \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{a/z}{\left( a+b\right) /z} \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{\Pr \left( E\cap F\right) }{\Pr \left( F\right) }.\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We have shown, for this example, how a conditional probability can be expressed as a function of the joint probability &amp;lt;math&amp;gt;\Pr \left( E\cap F\right)&amp;lt;/math&amp;gt; and the marginal probability &amp;lt;math&amp;gt;\Pr \left( F\right)&amp;lt;/math&amp;gt;. This is a profound result and should be formulated in more general terms:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;The probability that &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; occurs, given that &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is known to have occurred, gives the &amp;#039;&amp;#039;&amp;#039;conditional probability&amp;#039;&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; given &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;. This is denoted &amp;lt;math&amp;gt;Pr(E|F)&amp;lt;/math&amp;gt; and is calculated as&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (E|F)=\frac{\Pr (E\cap F)}{\Pr (F)}&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;and from the axioms of probability will generate a number lying between 0 and 1, since &amp;lt;math&amp;gt;\Pr (F)\geq \Pr (E\cap F)\geq 0.&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example: &amp;#039;&amp;#039;A manufacturer of electrical components knows that the probability is 0.8 that an order will be ready for shipment on time and 0.6 that it will also be delivered on time. What is the probability that such an order will be delivered on time, given that it was ready for shipment on time?&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Let &amp;lt;math&amp;gt;R=&amp;lt;/math&amp;gt; READY, &amp;lt;math&amp;gt;D=&amp;lt;/math&amp;gt; DELIVERED ON TIME. &amp;lt;math&amp;gt;Pr(R)=0.8,Pr(R\cap D)=0.6.&amp;lt;/math&amp;gt; From this we need to calculate &amp;lt;math&amp;gt;Pr(D|R)&amp;lt;/math&amp;gt; using the above formula. This gives &amp;lt;math&amp;gt;Pr(D|R)=Pr(R\cap D)/Pr(R)=6/8&amp;lt;/math&amp;gt;, or &amp;lt;math&amp;gt;75\%&amp;lt;/math&amp;gt;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
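The example's arithmetic, written out as a tiny Python check:

```python
pr_r = 0.8        # Pr(R): order ready for shipment on time
pr_r_and_d = 0.6  # Pr(R and D): ready and also delivered on time

# Conditional probability formula: Pr(D|R) = Pr(R and D) / Pr(R).
pr_d_given_r = pr_r_and_d / pr_r
print(round(pr_d_given_r, 2))  # 0.75
```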
&lt;br /&gt;
If we re-arrange the above formula for conditional probability, we obtain the so-called &amp;#039;&amp;#039;multiplication rule of probability &amp;#039;&amp;#039;for &amp;#039;&amp;#039;intersections&amp;#039;&amp;#039; of events:&lt;br /&gt;
&lt;br /&gt;
== Multiplication rule of probability ==&lt;br /&gt;
&lt;br /&gt;
The multiplication rule of probability can be stated as follows:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;math&amp;gt;\Pr (E\cap F)=\Pr (E|F)\times \Pr (F)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that for any two events, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F,&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;(E\cap F)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;(E\cap \bar{F})&amp;lt;/math&amp;gt; are mutually exclusive with &amp;lt;math&amp;gt;E=(E\cap F)\cup (E\cap \bar{F})&amp;lt;/math&amp;gt;; this has been seen before. So the &amp;#039;&amp;#039;addition rule&amp;#039;&amp;#039; and &amp;#039;&amp;#039;multiplication rule&amp;#039;&amp;#039; of probability together give:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr (E) &amp;amp;=&amp;amp;\Pr (E\cap F)+\Pr (E\cap \bar{F}) \\&lt;br /&gt;
&amp;amp;=&amp;amp;\Pr (E|F)\times \Pr (F)+\Pr (E|\bar{F})\times \Pr (\bar{F}).\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is an extremely important and useful result in practice, as we shall see shortly.&lt;br /&gt;
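A sketch of this decomposition with invented values for the three inputs; the identity holds for any events E and F:

```python
# Invented illustrative values.
pr_f = 0.3              # Pr(F)
pr_e_given_f = 0.5      # Pr(E|F)
pr_e_given_not_f = 0.2  # Pr(E|F-bar)

# Pr(E) = Pr(E|F)Pr(F) + Pr(E|F-bar)Pr(F-bar)
pr_e = pr_e_given_f * pr_f + pr_e_given_not_f * (1 - pr_f)
print(round(pr_e, 2))  # 0.29
```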
&lt;br /&gt;
== Statistical Independence ==&lt;br /&gt;
&lt;br /&gt;
If the knowledge that &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; has occurred does NOT alter our probability assessment of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; are said to be (statistically) &amp;#039;&amp;#039;independent&amp;#039;&amp;#039;. In this sense, &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; carries no information about &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Formally, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; are &amp;#039;&amp;#039;&amp;#039;independent&amp;#039;&amp;#039;&amp;#039; events if and only if&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;Pr(E|F)=Pr(E)&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;which, in turn, is true &amp;#039;&amp;#039;if and only if&amp;#039;&amp;#039;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;Pr(E\cap F)=Pr(E)\times Pr(F).&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This concept of independence is of enormous importance in practice. Consider the case of lung cancer and its connection to smoking (apologies to all smokers for picking on them here). The first connection between smoking and lung cancer was made in the 1920s. However, for many decades afterwards the tobacco industry spent a great deal of money and effort trying to convince people that there was no connection between the two. In other words, they claimed that the two events are &amp;#039;&amp;#039;independent&amp;#039;&amp;#039;, or &amp;lt;math&amp;gt;Pr(Cancer|Smoking)=Pr(Cancer|\overline{Smoking})=Pr(Cancer)&amp;lt;/math&amp;gt;. It was then the task of epidemiologists to show otherwise. This was famously and comprehensively achieved by the [http://en.wikipedia.org/wiki/British_Doctors_Study &amp;#039;&amp;#039;British Doctors Study&amp;#039;&amp;#039;].&lt;br /&gt;
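The definition is easy to check numerically. In the Python sketch below the events come from the fair-die example used earlier (E = even number); the helper name `independent` is ours, not a library function:

```python
def independent(pr_e, pr_f, pr_e_and_f, tol=1e-12):
    # E and F are independent iff Pr(E and F) = Pr(E) * Pr(F).
    return abs(pr_e_and_f - pr_e * pr_f) < tol

# Fair die, E = {2,4,6}, F = {1,2}: Pr(E and F) = Pr({2}) = 1/6 = (1/2)(1/3).
print(independent(1/2, 1/3, 1/6))  # True
# E = {2,4,6}, F = {4,5,6}: Pr(E and F) = Pr({4,6}) = 1/3, but (1/2)(1/2) = 1/4.
print(independent(1/2, 1/2, 1/3))  # False
```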
&lt;br /&gt;
== Bayes’ Theorem ==&lt;br /&gt;
&lt;br /&gt;
One area where conditional probability is extremely important is that of clinical trials - testing the power of a diagnostic test to detect the presence of a particular disease. Suppose, then, that a new test is being developed and let &amp;lt;math&amp;gt;P=&amp;lt;/math&amp;gt; ‘&amp;#039;&amp;#039;test positive&amp;#039;&amp;#039;’ and &amp;lt;math&amp;gt;D=&amp;lt;/math&amp;gt; ‘&amp;#039;&amp;#039;presence of disease&amp;#039;&amp;#039;’, but where the results from applying the diagnostic test can never be wholly reliable. From the point of view of our previous discussion on conditional probability, we would of course require &amp;lt;math&amp;gt;\Pr \left( P|D\right)&amp;lt;/math&amp;gt; to be large; i.e., the test should be effective at detecting the disease. However, if you think about it, this is not necessarily the probability that we might be interested in from a diagnosis point of view. Rather, we should be more interested in &amp;lt;math&amp;gt;\Pr \left( D|P\right)&amp;lt;/math&amp;gt;, the probability of correct diagnosis, and require this to be large (with, presumably, &amp;lt;math&amp;gt;\Pr (D|\bar{P})&amp;lt;/math&amp;gt; being small). Here, what we are trying to attach a probability to is a possible ‘cause’. The observed outcome is a positive test result (&amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt;), but the presence or non-presence of the disease is what is of interest and this is uncertain. &amp;lt;math&amp;gt;\Pr (D|P)&amp;lt;/math&amp;gt; asks the question ‘&amp;#039;&amp;#039;what is the probability that it is the presence of the disease which caused the positive test result&amp;#039;&amp;#039;’? (Another recent news-worthy example would be the effect of exposure to depleted uranium on Gulf and Balkan war veterans.
Given the presence of lymph, lung or brain cancer in such individuals (&amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt;), how likely is it that the cause was exposure to depleted uranium weapons (&amp;lt;math&amp;gt;D&amp;lt;/math&amp;gt;)? Firstly, is &amp;lt;math&amp;gt;\Pr \left( D|P\right) &amp;lt;/math&amp;gt; high or low? Secondly, might there be something else (&amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;) which could offer a “better” explanation, such that &amp;lt;math&amp;gt;\Pr \left( F|P\right) &amp;gt;\Pr \left( D|P\right)&amp;lt;/math&amp;gt;?)&lt;br /&gt;
&lt;br /&gt;
The situation is depicted in the following Figure, in which there are two possible ‘states’ in the population: &amp;lt;math&amp;gt;D&amp;lt;/math&amp;gt; (depicted by the lighter shaded area covering the left portion of the sample space) and &amp;lt;math&amp;gt;\bar{D}.&amp;lt;/math&amp;gt; It must be that &amp;lt;math&amp;gt;D\cup \bar{D}=S,&amp;lt;/math&amp;gt; since any individual in the population either has the disease or does not. The event of an observed positive test result is denoted by the closed loop, &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt;. (Notice that the shading in the diagram is relatively darker where &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt; intersects with &amp;lt;math&amp;gt;D&amp;lt;/math&amp;gt;.)&lt;br /&gt;
&lt;br /&gt;
[[File:ProbCond_venn2.jpg|frameless|600px]]&lt;br /&gt;
&lt;br /&gt;
To investigate how we might construct the required probability, &amp;lt;math&amp;gt;\Pr \left(D|P\right)&amp;lt;/math&amp;gt;, proceed as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr \left( D|P\right) &amp;amp;=&amp;amp;\frac{\Pr \left( D\cap P\right) }{\Pr (P)} \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{\Pr (D\cap P)}{\Pr (P\cap D)+\Pr (P\cap \bar{D})},\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
since &amp;lt;math&amp;gt;P=(P\cap D)\cup (P\cap \bar{D}),&amp;lt;/math&amp;gt; and these are mutually exclusive. From the multiplication rule of probability, &amp;lt;math&amp;gt;\Pr \left( P\cap D\right) =\Pr(P|D)\times \Pr (D),&amp;lt;/math&amp;gt; and similarly for &amp;lt;math&amp;gt;\Pr \left( P\cap \bar{D}\right)&amp;lt;/math&amp;gt;. Thus&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Pr \left( D|P\right) =\frac{\Pr \left( P|D\right) \times \Pr \left(D\right) }{\Pr \left( P|D\right) \times \Pr \left( D\right) +\Pr (P|\bar{D})\times \Pr \left( \bar{D}\right) },&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
which may be convenient to work with since &amp;lt;math&amp;gt;\Pr \left( P|D\right) &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Pr\left( P|\bar{D}\right) &amp;lt;/math&amp;gt; can be estimated from clinical trials and &amp;lt;math&amp;gt;\Pr\left( D\right) &amp;lt;/math&amp;gt; estimated from recent historical survey data.&lt;br /&gt;
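To make the formula concrete, here is a small numerical sketch in Python. The sensitivity, false-positive rate and prevalence below are invented purely for illustration, not taken from any real trial or survey.

```python
# Hypothetical inputs to Bayes' theorem for the diagnostic test.
pr_p_given_d = 0.95       # Pr(P|D): sensitivity (assumed value)
pr_p_given_not_d = 0.10   # Pr(P|D-bar): false-positive rate (assumed value)
pr_d = 0.02               # Pr(D): disease prevalence (assumed value)

# Pr(D|P) = Pr(P|D)Pr(D) / [Pr(P|D)Pr(D) + Pr(P|D-bar)Pr(D-bar)]
numerator = pr_p_given_d * pr_d
pr_d_given_p = numerator / (numerator + pr_p_given_not_d * (1 - pr_d))

print(round(pr_d_given_p, 4))  # about 0.16
```

Note how a fairly accurate test still yields a modest value of Pr(D|P) when the disease is rare; this is exactly why Pr(P|D) and Pr(D|P) must not be confused.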
&lt;br /&gt;
This sort of calculation (assigning probabilities to possible causes of observed events) is an example of &amp;#039;&amp;#039;Bayes’ Theorem&amp;#039;&amp;#039;. Of course, we may have to consider more than two possible causes, and the construction of the appropriate probabilities is as follows.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Consider a sample space, &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;A,B,C&amp;lt;/math&amp;gt; are three mutually exclusive events (possible causes), defined on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, such that &amp;lt;math&amp;gt;S=A\cup B\cup C&amp;lt;/math&amp;gt;. In such a situation, &amp;lt;math&amp;gt;A,B&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; are said to form a &amp;#039;&amp;#039;&amp;#039;partition&amp;#039;&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;. &amp;#039;&amp;#039;&amp;#039;Bayes’ Theorem&amp;#039;&amp;#039;&amp;#039; states that:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (A|E)=\frac{\Pr (E|A)\times \Pr (A)}{\left\{ \Pr (E|A)\times \Pr(A)\right\} +\left\{ \Pr (E|B)\times \Pr (B)\right\} +\left\{ \Pr(E|C)\times \Pr (C)\right\} }.&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;And, more generally, consider a sample space, &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F_{1},F_{2},...,F_{k}&amp;lt;/math&amp;gt; are &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; mutually exclusive events (possible causes), which form a partition of &amp;lt;math&amp;gt;S:S=\bigcup_{j=1}^{k}F_{j}&amp;lt;/math&amp;gt;. &amp;#039;&amp;#039;&amp;#039;Bayes’ Theorem&amp;#039;&amp;#039;&amp;#039; then states that:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (F_{j}|E)=\frac{\Pr (E|F_{j})\times \Pr (F_{j})}{\sum_{s=1}^{k}\left\{\Pr (E|F_{s})\times \Pr (F_{s})\right\} }.&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
From the above formula, you should be able to satisfy yourself that &amp;lt;math&amp;gt;\sum_{j=1}^{k}\Pr \left( F_{j}|E\right) =1.&amp;lt;/math&amp;gt; If this is not at first clear, consider case (1) and show that &amp;lt;math&amp;gt;\Pr \left( A|E\right) +\Pr \left(&lt;br /&gt;
B|E\right) +\Pr \left( C|E\right) =1.&amp;lt;/math&amp;gt; The reason for this is that since &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; form a partition of &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; they must also form a partition of any event &amp;lt;math&amp;gt;E\subset S.&amp;lt;/math&amp;gt; In the above conditional probabilities, we are regarding &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; as the restricted sample space and therefore the probabilities assigned to the mutually exclusive events &amp;lt;math&amp;gt;\left( A,B,C\right) &amp;lt;/math&amp;gt; which &amp;#039;&amp;#039;cover&amp;#039;&amp;#039; this (restricted) sample space, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, must sum to 1.&lt;br /&gt;
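As a quick sketch of the general statement, the following Python fragment computes the posteriors for three invented causes and confirms that they sum to 1:

```python
# Invented priors Pr(F_j) and likelihoods Pr(E|F_j) for k = 3 causes.
priors = [0.5, 0.3, 0.2]
likelihoods = [0.9, 0.5, 0.1]

# Denominator of Bayes' theorem: Pr(E), by the law of total probability.
pr_e = sum(l * p for l, p in zip(likelihoods, priors))
posteriors = [l * p / pr_e for l, p in zip(likelihoods, priors)]

print(abs(sum(posteriors) - 1.0) < 1e-9)  # True
```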
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example&amp;#039;&amp;#039;: Box A contains 2 red balls. Box B contains 1 red and 1 white ball. Box A and Box B are identical. If a box is selected at random and one ball is withdrawn from it, what is the probability that the selected box was Box A if the ball withdrawn from it turns out to be red?&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Let &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; be the event of selecting Box A and &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; the event of drawing a red ball. Require &amp;lt;math&amp;gt;Pr(A|R)&amp;lt;/math&amp;gt;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;Pr(A|R)=Pr(A\cap R)/Pr(R);&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;Pr(A\cap R)=Pr(A)Pr(R|A)=(1/2)\times 1=1/2.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And,&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
Pr(R) &amp;amp;=&amp;amp;Pr(A\cap R)+Pr(\bar{A}\cap R) \\&lt;br /&gt;
&amp;amp;=&amp;amp;\Pr (A)\times \Pr (R|A)\,\,\,\,+\,\,\,\,\Pr (\bar{A})\times \Pr (R|\bar{A}) \\&lt;br /&gt;
&amp;amp;=&amp;amp;(1/2)\,\,\,\,+\,\,\,\,(1/2)\times (1/2) \\&lt;br /&gt;
&amp;amp;=&amp;amp;3/4.\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Therefore, &amp;lt;math&amp;gt;\Pr (A|R)=(1/2)/(3/4)=2/3&amp;lt;/math&amp;gt;.&lt;br /&gt;
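The answer can also be checked by simulation. The short Monte Carlo sketch below simply mirrors the sampling story of the example (box chosen at random, then a ball drawn), so the estimated conditional relative frequency should settle near 2/3:

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible
trials = 200_000
red_draws = red_from_a = 0
for _ in range(trials):
    box = random.choice(["A", "B"])
    # Box A holds 2 red balls; Box B holds 1 red and 1 white ball.
    ball = "red" if box == "A" else random.choice(["red", "white"])
    if ball == "red":
        red_draws += 1
        if box == "A":
            red_from_a += 1

print(red_from_a / red_draws)  # close to 2/3
```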
&lt;br /&gt;
= Footnotes =&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Intro&amp;diff=2863</id>
		<title>Probability Intro</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Intro&amp;diff=2863"/>
				<updated>2013-08-07T21:29:31Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: /* Probability */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
&lt;br /&gt;
= Introducing Probability =&lt;br /&gt;
&lt;br /&gt;
So far we have been looking at ways of summarising samples of data drawn from an underlying population of interest. Although at times tedious, all such arithmetic calculations are fairly mechanical and straightforward to apply. To remind ourselves, one of the primary reasons for wishing to summarise data is to assist in the development of inferences about the population from which the data were taken. That is to say, we would like to elicit some information about the mechanism which generated the observed data.&lt;br /&gt;
&lt;br /&gt;
We now start on the process of developing mathematical ways of formulating inferences and this requires the use of &amp;#039;&amp;#039;probability&amp;#039;&amp;#039;. This becomes clear if we think back to one of the early questions posed in this course: &amp;#039;&amp;#039;prior to sampling is it possible to predict with absolute certainty what will be observed&amp;#039;&amp;#039;? The answer to this question is &amp;#039;&amp;#039;no&amp;#039;&amp;#039;; although it would be of interest to know how &amp;#039;&amp;#039;likely&amp;#039;&amp;#039; it is that certain values would be observed. Or, what is the &amp;#039;&amp;#039;probability&amp;#039;&amp;#039; of observing certain values?&lt;br /&gt;
&lt;br /&gt;
Before proceeding, we need some more tools:&lt;br /&gt;
&lt;br /&gt;
= Venn diagrams =&lt;br /&gt;
&lt;br /&gt;
Venn diagrams (and diagrams in general) are of enormous help in trying to understand and manipulate probability. We begin with some basic definitions, some of which we have encountered before.&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Experiment:&amp;#039;&amp;#039;&amp;#039; any process which, when applied, provides data or an outcome; e.g., rolling a die and observing the number of dots on the upturned face; recording the amount of rainfall in Manchester over a period of time.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Sample Space:&amp;#039;&amp;#039;&amp;#039; set of possible outcomes of an experiment; e.g., &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; (or &amp;lt;math&amp;gt;\Omega &amp;lt;/math&amp;gt;) &amp;lt;math&amp;gt;=&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\{1,2,3,4,5,6\}&amp;lt;/math&amp;gt;, which is the sample space of rolling a die. Or &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;=&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\{x;x\geq 0\}&amp;lt;/math&amp;gt;, which is the sample space of an experiment where the outcomes can be any real non-negative number, or ‘&amp;#039;&amp;#039;the set of non-negative real numbers&amp;#039;&amp;#039;’.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Simple Event&amp;#039;&amp;#039;&amp;#039;: just one of the possible outcomes on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Event:&amp;#039;&amp;#039;&amp;#039; a &amp;#039;&amp;#039;subset&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, denoted &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt;; e.g., &amp;lt;math&amp;gt;E=\left\{ 2,4,6\right\}&amp;lt;/math&amp;gt; (i.e. any even number on a die) or &amp;lt;math&amp;gt;E=\left\{ x;4&amp;lt;x\leq 10\right\} ,&amp;lt;/math&amp;gt; which means ‘&amp;#039;&amp;#039;the set of real numbers which are strictly bigger than&amp;#039;&amp;#039; &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;but less than or equal to &amp;#039;&amp;#039;&amp;lt;math&amp;gt;10&amp;lt;/math&amp;gt;’.&amp;lt;br /&amp;gt;Note that an event, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, is a collection of simple events.&lt;br /&gt;
&lt;br /&gt;
Such concepts can be represented by means of the following Venn Diagram:&lt;br /&gt;
&lt;br /&gt;
[[File:Venn_1.jpg|frameless|600px]]&lt;br /&gt;
&lt;br /&gt;
The sample space, &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; is depicted as a closed rectangle, and the event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; is a closed loop wholly contained within &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; and we write (in set notation) &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In dealing with probability, and in particular the probability of an event (or events) occurring, we shall need to be familiar with &amp;#039;&amp;#039;&amp;#039;UNIONS, INTERSECTIONS&amp;#039;&amp;#039;&amp;#039; and &amp;#039;&amp;#039;&amp;#039;COMPLEMENTS&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
To illustrate these concepts, consider the sample space &amp;lt;math&amp;gt;S=\{x;x\geq 0\},\,&amp;lt;/math&amp;gt; with the following events defined on &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; as depicted in Figure 3.2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E=\{x;4&amp;lt;x\leq 10\},\,F=\{x;7&amp;lt;x\leq 17\},\,G=\{x;x&amp;gt;15\},\,H=\{x;9&amp;lt;x\leq 13\}. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
| (a) Event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;: A closed loop&lt;br /&gt;
| (b) Union: &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2a.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2b.jpg|frameless|300px]]&lt;br /&gt;
|-&lt;br /&gt;
| (c) Intersection: &amp;lt;math&amp;gt;E\cap F&amp;lt;/math&amp;gt;&lt;br /&gt;
| (d) The Null set/event: &amp;lt;math&amp;gt;E\cap G=\emptyset &amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2c.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2d.jpg|frameless|300px]]&lt;br /&gt;
|-&lt;br /&gt;
| (e) Complement of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;\bar{E}&amp;lt;/math&amp;gt;&lt;br /&gt;
| (f) Subset of &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;H\subset F&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;H\cap F=H&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2e.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2f.jpg|frameless|300px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* The &amp;#039;&amp;#039;union&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;E\cup F,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E\cup F=\{x;4&amp;lt;x\leq 17\};&amp;lt;/math&amp;gt; i.e., it contains elements (simple events) which are either in &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; or in &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; or (perhaps) in both. This is illustrated on the Venn diagram by the dark shaded area in diagram (b).&lt;br /&gt;
* The &amp;#039;&amp;#039;intersection&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;E\cap F,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E\cap F=\left\{ x;7&amp;lt;x\leq 10\right\} ;&amp;lt;/math&amp;gt; i.e., it contains elements (simple events) which are common to both &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F.&amp;lt;/math&amp;gt; Again this is depicted by the dark shaded area in (c). If events have no elements in common (as, for example, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;) then they are said to be &amp;#039;&amp;#039;mutually exclusive&amp;#039;&amp;#039;, and we can write &amp;lt;math&amp;gt;E\cap G=\emptyset ,&amp;lt;/math&amp;gt; meaning the &amp;#039;&amp;#039;null set&amp;#039;&amp;#039; which contains no elements. Such a situation is illustrated on the Venn Diagram by events (the two shaded closed loops in (d)) which do not overlap. Notice however that &amp;lt;math&amp;gt;G\cap F\neq \emptyset ,&amp;lt;/math&amp;gt; since &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; have elements in common.&lt;br /&gt;
* The &amp;#039;&amp;#039;complement&amp;#039;&amp;#039; of an event &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; say, is everything defined on the sample space which is not in &amp;lt;math&amp;gt;E.&amp;lt;/math&amp;gt; This event is denoted &amp;lt;math&amp;gt;\bar{E}&amp;lt;/math&amp;gt;, the dark shaded area in (e); here &amp;lt;math&amp;gt;\bar{E}=\left\{ x;x\leq 4\right\} \cup \left\{ x;x&amp;gt;10\right\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
* Finally note that &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; is a sub-set of &amp;lt;math&amp;gt;F;&amp;lt;/math&amp;gt; see (f). It is depicted as the dark closed loop wholly contained within &amp;lt;math&amp;gt;F,&amp;lt;/math&amp;gt; the lighter shaded area, so that &amp;lt;math&amp;gt;H\cap F=H;&amp;lt;/math&amp;gt; if an element in the sample space is a member of &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; then it must also be member of &amp;lt;math&amp;gt;F.&amp;lt;/math&amp;gt; (In mathematical logic, we employ this scenario to indicate that “&amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; implies &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;”, but not necessarily vice-versa.) Notice that &amp;lt;math&amp;gt;G\cap H=\emptyset &amp;lt;/math&amp;gt; but &amp;lt;math&amp;gt;H\cap E\neq \emptyset&amp;lt;/math&amp;gt;.&lt;br /&gt;
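These set operations can be tried directly in Python, whose built-in set type mirrors the notation. A minimal sketch on the die sample space (the events F and G below are illustrative choices of mine, not the intervals used above):

```python
# Sample space for one roll of a die, and some events defined on it.
S = {1, 2, 3, 4, 5, 6}
E = {2, 4, 6}            # "an even number" (as in the definitions above)
F = {4, 5, 6}            # "four or more" -- an illustrative event
G = {1}                  # "a one"        -- an illustrative event

union = E | F            # E union F: in E, in F, or in both
intersection = E & F     # E intersect F: common to E and F
complement_E = S - E     # complement of E: everything in S not in E

print(union)             # {2, 4, 5, 6}
print(intersection)      # {4, 6}
print(complement_E)      # {1, 3, 5}
print(E & G == set())    # True: E and G are mutually exclusive
```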
&lt;br /&gt;
= Probability =&lt;br /&gt;
&lt;br /&gt;
The term &amp;#039;&amp;#039;probability&amp;#039;&amp;#039; (or some equivalent) is used in everyday conversation and so cannot be unfamiliar to the reader. We talk of the probability, or chance, of rain; the likelihood of England winning the World Cup; or, perhaps more scientifically, the chance of getting a &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; when rolling a die. What we shall now do is develop a coherent theory of probability; a theory which allows us to combine and manipulate probabilities in a consistent and meaningful manner. We shall describe ways of dealing with, and describing, uncertainty. This will involve &amp;#039;&amp;#039;rules&amp;#039;&amp;#039; which govern our use of terms like probability.&lt;br /&gt;
&lt;br /&gt;
There have been a number of different approaches to (interpretations of) probability. Most depend, at least to some extent, on the notion of relative frequency as now described:&lt;br /&gt;
&lt;br /&gt;
* Suppose an experiment has an outcome of interest &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;. The &amp;#039;&amp;#039;relative frequency interpretation&amp;#039;&amp;#039; of probability says that assuming the experiment can be repeated a large number of times then the relative frequency of observing the outcome &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; will settle down to a &amp;#039;&amp;#039;number&amp;#039;&amp;#039;, denoted &amp;lt;math&amp;gt;\Pr (E),&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;P(E)&amp;lt;/math&amp;gt; or Prob&amp;lt;math&amp;gt;(E),&amp;lt;/math&amp;gt; called the &amp;#039;&amp;#039;&amp;#039;probability&amp;#039;&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
This is illustrated in the next Figure where the proportion of heads obtained after &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; flips of a fair coin is plotted against &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;, as &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; increases; e.g., of the first 100 flips, 46 were heads (&amp;lt;math&amp;gt;46\%&amp;lt;/math&amp;gt;). Notice that the plot becomes less ‘wobbly’ after about &amp;lt;math&amp;gt;n=140&amp;lt;/math&amp;gt; and appears to be settling down to the value of &amp;lt;math&amp;gt;\frac{1}{2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
[[File:Prob_coin.jpg|frameless|600px]]&lt;br /&gt;
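The settling-down of the relative frequency can be reproduced with a short simulation; this is a sketch of the idea, not the data behind the Figure above:

```python
import random

random.seed(1)                       # fixed seed so the run is reproducible

n_flips = 100_000
heads = 0
proportions = []                     # running proportion of heads after n flips
for n in range(1, n_flips + 1):
    heads += random.random() < 0.5   # one flip of a fair coin
    proportions.append(heads / n)

# Early proportions wobble; later ones settle down near 1/2.
print(proportions[9], proportions[99], proportions[-1])
```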
&lt;br /&gt;
Due to this interpretation of probability, we often use observed sample proportions to approximate underlying probabilities of interest; see, for example, Question 4 of Exercise 2. There are, of course, other interpretations of probability; e.g., the subjective interpretation which simply expresses the strength of one’s belief about an event of interest such as whether Manchester United will win the European Cup! Any one of these interpretations can be used in practical situations provided the implied notion of probability follows a simple set of &amp;#039;&amp;#039;axioms&amp;#039;&amp;#039; or &amp;#039;&amp;#039;rules&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
== The axioms of probability ==&lt;br /&gt;
&lt;br /&gt;
There are just &amp;#039;&amp;#039;three &amp;#039;&amp;#039;basic rules that must be obeyed when dealing with probabilities:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;For any event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; defined on &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; i.e., &amp;lt;math&amp;gt;E\subset S,\,\,\Pr (E)\geq 0&amp;lt;/math&amp;gt;; &amp;#039;&amp;#039;probabilities are non-negative&amp;#039;&amp;#039;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (S)=1;&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;having defined the sample space of outcomes, one of these outcomes must be observed&amp;#039;&amp;#039;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;If events &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; are mutually exclusive defined on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, so that &amp;lt;math&amp;gt;E\cap F=\emptyset &amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) =\Pr \left( E\right)+\Pr \left( F\right) .&amp;lt;/math&amp;gt; In general, for any set of mutually exclusive events, &amp;lt;math&amp;gt;E_{1},E_{2},\ldots ,E_{k},&amp;lt;/math&amp;gt; defined on &amp;lt;math&amp;gt;S:&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (E_{1}\cup E_{2}\cup \ldots \cup E_{k})=\Pr (E_{1})+\Pr (E_{2})+\ldots\Pr (E_{k})&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;i.e., &amp;lt;math&amp;gt;\Pr \left( \bigcup_{j=1}^{k}E_{j}\right) =\sum_{j=1}^{k}\Pr (E_{j}).&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In terms of the Venn Diagram, one can (and should) usefully think of the area of &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; relative to that of &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; as providing an indication of probability. (Note, from axiom 2, that the area of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; is implicitly normalised to be unity).&lt;br /&gt;
&lt;br /&gt;
Also observe that, contrary to what you may have believed, it is not one of the rules that &amp;lt;math&amp;gt;\Pr (E)\leq 1&amp;lt;/math&amp;gt; for any event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;. Rather, this is an implication of the &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; rules given:&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Implications:&amp;#039;&amp;#039;&amp;#039; it must be that for any event &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; defined on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;E\cap \bar{E}=\emptyset &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;E\cup \bar{E}=S.&amp;lt;/math&amp;gt; By Axiom &amp;lt;math&amp;gt;1,&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\Pr (E)\geq 0&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Pr \left( \bar{E}\right) \geq 0&amp;lt;/math&amp;gt; and by Axiom &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\Pr(E)+\Pr (\bar{E})=\Pr (S).&amp;lt;/math&amp;gt; So &amp;lt;math&amp;gt;\Pr \left( E\right) +\Pr \left( \bar{E}\right) =1,&amp;lt;/math&amp;gt; by Axiom &amp;lt;math&amp;gt;2.&amp;lt;/math&amp;gt; This implies that&lt;br /&gt;
&lt;br /&gt;
# &amp;lt;math&amp;gt;0\leq \Pr (E)\leq 1&amp;lt;/math&amp;gt;&lt;br /&gt;
# &amp;lt;math&amp;gt;\Pr (\bar{E})=1-\Pr (E)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first of these is what we might have expected from probability (a number lying between &amp;lt;math&amp;gt;0&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;). The second implication is also very important; it says that the probability of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; not happening is ‘&amp;#039;&amp;#039;one minus the probability of it happening&amp;#039;&amp;#039;’. Thus when rolling a die, the probability of getting &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; is one minus the probability of getting either a &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;2&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;5.&amp;lt;/math&amp;gt;&lt;br /&gt;
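The second implication can be checked with exact arithmetic; a sketch using Python's Fraction type:

```python
from fractions import Fraction

# On a fair die each face has probability 1/6, so by Axiom 3
# Pr(getting a 1, 2, 3, 4 or 5) = 5/6.
pr_one_to_five = 5 * Fraction(1, 6)

# Complement rule: Pr(six) = 1 - Pr(not six).
pr_six = 1 - pr_one_to_five
print(pr_six)   # 1/6
```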
&lt;br /&gt;
These axioms imply how to calculate probabilities on a sample space of equally likely outcomes. For example, and as we have already noted, the experiment of rolling a fair die defines a sample space of six, mutually exclusive and equally likely outcomes (&amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; dots on the up-turned face). The axioms then say that each of the six probabilities is positive, that they are all the same, and that they add to 1. Thus, the probability of any one of the outcomes must be simply &amp;lt;math&amp;gt;\frac{1}{6},&amp;lt;/math&amp;gt; which may accord with your intuition. A similar sort of analysis reveals that the probability of drawing a club from a deck of &amp;lt;math&amp;gt;52&amp;lt;/math&amp;gt; cards is &amp;lt;math&amp;gt;\frac{13}{52},&amp;lt;/math&amp;gt; since any one of the &amp;lt;math&amp;gt;52&amp;lt;/math&amp;gt; cards has an equal chance of being drawn and &amp;lt;math&amp;gt;13&amp;lt;/math&amp;gt; of them are clubs. Notice the importance of the assumption of equally likely outcomes here.&lt;br /&gt;
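With equally likely outcomes, probability reduces to counting favourable cases over total cases; a sketch of the card calculation above:

```python
from fractions import Fraction

# Build a 52-card deck as (rank, suit) pairs and count the clubs.
ranks = range(1, 14)                        # Ace..King as 1..13
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [(r, s) for r in ranks for s in suits]

clubs = [card for card in deck if card[1] == "clubs"]
pr_club = Fraction(len(clubs), len(deck))   # favourable / total
print(pr_club)                              # 1/4, i.e. 13/52
```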
&lt;br /&gt;
In this, and the next section of notes, we shall see how these axioms can be used. Firstly, consider the construction of a probability for the &amp;#039;&amp;#039;union&amp;#039;&amp;#039; of two events; i.e., the probability that &amp;#039;&amp;#039;either &amp;#039;&amp;#039;&amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; or (perhaps) &amp;#039;&amp;#039;both &amp;#039;&amp;#039;will occur. Such a probability is embodied in the &amp;#039;&amp;#039;addition rule of probability&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
== The addition rule of probability ==&lt;br /&gt;
&lt;br /&gt;
When rolling a fair die, let &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; denote the event of an “odd number of dots” and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; the event of the “number of dots being greater than, or equal, to &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt;”&amp;lt;math&amp;gt;.&amp;lt;/math&amp;gt; What is the probability of the event &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;? To calculate this we can collect together all the mutually exclusive (simple) events which comprise &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;, and then add up the probabilities (by axiom 3). These simple events are &amp;lt;math&amp;gt;1,3,4,5&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; dots. Each has a probability of &amp;lt;math&amp;gt;\frac{1}{6},&amp;lt;/math&amp;gt; so the required total probability is: &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) =\frac{5}{6}&amp;lt;/math&amp;gt;. Consider carefully how this probability is constructed and note, in particular, that &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) \neq \Pr \left( E\right) +\Pr \left( F\right) &amp;lt;/math&amp;gt; since &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; have a simple event in common (namely &amp;lt;math&amp;gt;5&amp;lt;/math&amp;gt; dots).&lt;br /&gt;
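The same calculation can be done by counting simple events; a sketch (the helper pr is mine, not from the text):

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
E = {1, 3, 5}       # an odd number of dots
F = {4, 5, 6}       # four or more dots

def pr(event):
    # Equally likely outcomes: probability = |event| / |S|.
    return Fraction(len(event), len(S))

print(pr(E | F))       # 5/6
# Adding Pr(E) and Pr(F) double-counts the common outcome (5 dots):
print(pr(E) + pr(F))   # 1, not 5/6
```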
&lt;br /&gt;
In general, we can calculate the probability of the union of events using the &amp;#039;&amp;#039;addition rule of probability&amp;#039;&amp;#039;, as follows.&lt;br /&gt;
&lt;br /&gt;
* For any events, &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F\subset S:\Pr (E\cup F)=\Pr (E)+\Pr (F)-\Pr (E\cap F).&amp;lt;/math&amp;gt; So, in general, &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) \leq \Pr (E)+\Pr (F).&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generalises to three events, &amp;lt;math&amp;gt;E_{1},E_{2}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;E_{3}&amp;lt;/math&amp;gt; as&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr (E_{1}\cup E_{2}\cup E_{3}) &amp;amp;=&amp;amp;\Pr (E_{1})+\Pr (E_{2})+\Pr (E_{3}) \\&lt;br /&gt;
&amp;amp;&amp;amp;-\Pr (E_{1}\cap E_{2})-\Pr (E_{1}\cap E_{3})-\Pr (E_{2}\cap E_{3}) \\&lt;br /&gt;
&amp;amp;&amp;amp;+\Pr (E_{1}\cap E_{2}\cap E_{3}).\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can demonstrate this as follows.&lt;br /&gt;
&lt;br /&gt;
Note that&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E\cup F=\left( E\cap \bar{F}\right) \cup \left( E\cap F\right) \cup \left(\bar{E}\cap F\right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
the union of &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; mutually exclusive events. These mutually exclusive events are depicted by the shaded areas &amp;lt;math&amp;gt;\mathbf{a,}&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\mathbf{b}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\mathbf{c}&amp;lt;/math&amp;gt;, respectively, in the next Figure.&lt;br /&gt;
&lt;br /&gt;
[[File:Prob_add.jpg|frameless|500px]]&lt;br /&gt;
&lt;br /&gt;
Then, by Axiom &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt;, since the three events &amp;lt;math&amp;gt;\left( E\cap\bar{F}\right) &amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;\left( E\cap F\right) &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\left( \bar{E}\cap F\right)&amp;lt;/math&amp;gt; are mutually exclusive (the “area” occupied by &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt; is simply &amp;lt;math&amp;gt;\mathbf{a+b+c}&amp;lt;/math&amp;gt;),&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Pr \left( E\cup F\right) =\Pr \left( E\cap \bar{F}\right) +\Pr \left( \bar{E}\cap F\right) +\Pr \left( E\cap F\right) .&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
But also by Axiom &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt;, since &amp;lt;math&amp;gt;E=\left( E\cap \bar{F}\right) \cup \left(E\cap F\right) &amp;lt;/math&amp;gt;, it must be that &amp;lt;math&amp;gt;\Pr (E)=\Pr \left( E\cap \bar{F}\right)+\Pr (E\cap F);&amp;lt;/math&amp;gt; similarly, &amp;lt;math&amp;gt;\Pr \left( \bar{E}\cap F\right) =\Pr \left(F\right) -\Pr \left( E\cap F\right)&amp;lt;/math&amp;gt;. Putting all of this together gives&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Pr (E\cup F)=\Pr (E)+\Pr (F)-\Pr (E\cap F).&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; are mutually exclusive, so that &amp;lt;math&amp;gt;E\cap F=\emptyset&amp;lt;/math&amp;gt;, this rule reduces to Axiom 3: &amp;lt;math&amp;gt;\Pr (E\cup F)=\Pr (E)+\Pr (F)&amp;lt;/math&amp;gt;.&lt;br /&gt;
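Both the two-event rule and its three-event generalisation can be checked numerically on the die sample space; the events E1, E2 and E3 below are illustrative choices of mine:

```python
from fractions import Fraction

S = set(range(1, 7))             # die sample space
E, F = {1, 3, 5}, {4, 5, 6}      # events from the die example above

def pr(event):
    # Equally likely outcomes: probability = |event| / |S|.
    return Fraction(len(event), len(S))

# Addition rule: Pr(E u F) = Pr(E) + Pr(F) - Pr(E n F)
assert pr(E | F) == pr(E) + pr(F) - pr(E & F)

# Three-event generalisation, checked on illustrative events:
E1, E2, E3 = {1, 2, 3}, {2, 3, 4}, {3, 4, 5}
lhs = pr(E1 | E2 | E3)
rhs = (pr(E1) + pr(E2) + pr(E3)
       - pr(E1 & E2) - pr(E1 & E3) - pr(E2 & E3)
       + pr(E1 & E2 & E3))
assert lhs == rhs
print(lhs)   # 5/6
```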
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example: &amp;#039;&amp;#039;What is the probability of drawing a Queen (&amp;lt;math&amp;gt;Q &amp;lt;/math&amp;gt;) or a Club (&amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt;) in a single draw from a pack of cards? Now, &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; out of &amp;lt;math&amp;gt;52 &amp;lt;/math&amp;gt; cards are Queens, so &amp;lt;math&amp;gt;\Pr \left( Q\right) =\frac{4}{52},&amp;lt;/math&amp;gt; whilst &amp;lt;math&amp;gt;\Pr\left( C\right) =\frac{13}{52}.&amp;lt;/math&amp;gt; The probability of drawing the Queen of Clubs is simply &amp;lt;math&amp;gt;\frac{1}{52};&amp;lt;/math&amp;gt; i.e., &amp;lt;math&amp;gt;\Pr \left( Q\cap C\right) =\frac{1}{52}&amp;lt;/math&amp;gt;. What we require is a Club or a Queen, for which the probability is&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr \left( Q\cup C\right) &amp;amp;=&amp;amp;\Pr \left( Q\right) +\Pr \left( C\right) -\Pr\left( Q\cap C\right) \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{4}{52}+\frac{13}{52}-\frac{1}{52} \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{16}{52}=\frac{4}{13}.\end{aligned}&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example: &amp;#039;&amp;#039;Consider a car journey from Manchester to London via the M6 and M1. Let &amp;lt;math&amp;gt;E=&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;heavy traffic somewhere on route&amp;#039;&amp;#039; and &amp;lt;math&amp;gt;F=&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;roadworks somewhere on route&amp;#039;&amp;#039;. It is estimated that &amp;lt;math&amp;gt;\Pr (E)=0.8&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Pr (F)=0.4,&amp;lt;/math&amp;gt; whilst the probability of NOT encountering both is &amp;lt;math&amp;gt;\Pr (\overline{E\cap F})=0.6.&amp;lt;/math&amp;gt; What is the probability of encountering heavy traffic or roadworks?&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;We require &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) .&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr (E\cup F) &amp;amp;=&amp;amp;\Pr (E)+\Pr (F)-\Pr (E\cap F) \\&lt;br /&gt;
&amp;amp;=&amp;amp;\Pr (E)+\Pr (F)-(1-\Pr (\overline{E\cap F})) \\&lt;br /&gt;
&amp;amp;=&amp;amp;0.8+0.4-1+0.6 \\&lt;br /&gt;
&amp;amp;=&amp;amp;0.8=\Pr (E)\end{aligned}&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Notice that this implies, in this case, that &amp;lt;math&amp;gt;F\subset E&amp;lt;/math&amp;gt; (why?). This &amp;#039;&amp;#039;model &amp;#039;&amp;#039;then implies that when there are roadworks somewhere on route you are bound to encounter heavy traffic; on the other hand, you can encounter heavy traffic on route without ever passing through roadworks. (My own experience of this motorway inclines me towards this implication!)&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
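Both worked examples above can be reproduced with exact fractions; a sketch (the variable names are mine, not the text's):

```python
from fractions import Fraction

# Queen or Club: Pr(Q u C) = Pr(Q) + Pr(C) - Pr(Q n C)
pr_Q = Fraction(4, 52)     # four Queens
pr_C = Fraction(13, 52)    # thirteen Clubs
pr_QC = Fraction(1, 52)    # the Queen of Clubs
pr_Q_or_C = pr_Q + pr_C - pr_QC
print(pr_Q_or_C)           # 4/13

# Manchester-London journey: Pr(E)=0.8, Pr(F)=0.4, Pr(not(E n F))=0.6
pr_E, pr_F = Fraction(8, 10), Fraction(4, 10)
pr_not_both = Fraction(6, 10)
pr_both = 1 - pr_not_both              # complement rule
pr_E_or_F = pr_E + pr_F - pr_both      # addition rule
print(pr_E_or_F)                       # 4/5, i.e. 0.8 = Pr(E)
```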
&lt;br /&gt;
Similar concepts apply when manipulating proportions as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example&amp;#039;&amp;#039;: A sample of 1000 undergraduates were asked whether they took either Mathematics, Physics or Chemistry at A-level. The following responses were obtained: 100 just took Mathematics; 70 just took Physics; 100 just took Chemistry; 150 took Mathematics and Physics, but not Chemistry; 40 took Mathematics and Chemistry, but not Physics; and, 240 took Physics and Chemistry, but not Mathematics. What proportion took all three?&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;This can be addressed with the following diagram:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;[[File:Prob_Alevels.jpg|frameless|500px]]&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;The shaded area contains the number who took all three, which can be deduced from the above information (since the total of the numbers assigned to each part of the Venn diagram must be &amp;lt;math&amp;gt;1000&amp;lt;/math&amp;gt;). The answer is therefore &amp;lt;math&amp;gt;30\%&amp;lt;/math&amp;gt; (being &amp;lt;math&amp;gt;300&amp;lt;/math&amp;gt; out of &amp;lt;math&amp;gt;1000&amp;lt;/math&amp;gt;).&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Two further results on unions, intersections and complements which are of use (and which are fairly easy to demonstrate using Venn diagrams) are &amp;#039;&amp;#039;&amp;#039;de Morgan Laws&amp;#039;&amp;#039;&amp;#039;:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\left( \bar{A}\cap \bar{B}\right) =\left( \overline{A\cup B}\right) &amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\bar{A}\cup \bar{B}=\left( \overline{A\cap B}\right) &amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
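The undergraduate survey answer follows because the regions of the Venn diagram must sum to 1000; a sketch, assuming every respondent took at least one of the three subjects (as the worked answer implies):

```python
# Counts from the survey of 1000 undergraduates; "all three" is what remains
# once every other region of the Venn diagram is accounted for.
total = 1000
known_regions = [100,   # Mathematics only
                 70,    # Physics only
                 100,   # Chemistry only
                 150,   # Mathematics and Physics, not Chemistry
                 40,    # Mathematics and Chemistry, not Physics
                 240]   # Physics and Chemistry, not Mathematics

all_three = total - sum(known_regions)
print(all_three)                 # 300
print(100 * all_three / total)   # 30.0 (per cent)
```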
&lt;br /&gt;
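The de Morgan laws are easy to verify on small finite sets; a sketch with illustrative sets A and B of my choosing:

```python
# Check de Morgan's laws on small sets.
S = set(range(1, 11))        # a small sample space
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

def comp(event):
    # Complement relative to the sample space S.
    return S - event

assert comp(A) & comp(B) == comp(A | B)   # complement(A) n complement(B) = complement(A u B)
assert comp(A) | comp(B) == comp(A & B)   # complement(A) u complement(B) = complement(A n B)
print("de Morgan laws verified")
```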
= Footnotes =&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Intro&amp;diff=2862</id>
		<title>Probability Intro</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Intro&amp;diff=2862"/>
				<updated>2013-08-07T21:27:45Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: /* Probability */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
&lt;br /&gt;
= Introducing Probability =&lt;br /&gt;
&lt;br /&gt;
So far we have been looking at ways of summarising samples of data drawn from an underlying population of interest. Although at times tedious, all such arithmetic calculations are fairly mechanical and straightforward to apply. To remind ourselves, one of the primary reasons for wishing to summarise data is to assist in the development of inferences about the population from which the data were taken. That is to say, we would like to elicit some information about the mechanism which generated the observed data.&lt;br /&gt;
&lt;br /&gt;
We now start on the process of developing mathematical ways of formulating inferences and this requires the use of &amp;#039;&amp;#039;probability&amp;#039;&amp;#039;. This becomes clear if we think back to one of the early questions posed in this course: &amp;#039;&amp;#039;prior to sampling is it possible to predict with absolute certainty what will be observed&amp;#039;&amp;#039;? The answer to this question is &amp;#039;&amp;#039;no&amp;#039;&amp;#039;; although it would be of interest to know how &amp;#039;&amp;#039;likely&amp;#039;&amp;#039; it is that certain values would be observed. Or, what is the &amp;#039;&amp;#039;probability&amp;#039;&amp;#039; of observing certain values?&lt;br /&gt;
&lt;br /&gt;
Before proceeding, we need some more tools:&lt;br /&gt;
&lt;br /&gt;
= Venn diagrams =&lt;br /&gt;
&lt;br /&gt;
Venn diagrams (and diagrams in general) are of enormous help in trying to understand, and manipulate, probability. We begin with some basic definitions, some of which we have encountered before.&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Experiment:&amp;#039;&amp;#039;&amp;#039; any process which, when applied, provides data or an outcome; e.g., rolling a die and observing the number of dots on the upturned face; recording the amount of rainfall in Manchester over a period of time.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Sample Space:&amp;#039;&amp;#039;&amp;#039; set of possible outcomes of an experiment; e.g., &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; (or &amp;lt;math&amp;gt;\Omega &amp;lt;/math&amp;gt;) &amp;lt;math&amp;gt;=&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\{1,2,3,4,5,6\}&amp;lt;/math&amp;gt;, which is the sample space of rolling a die. Or &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;=&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\{x;x\geq 0\}&amp;lt;/math&amp;gt;, which is the sample space of an experiment where the outcomes can be any real non-negative number, or ‘&amp;#039;&amp;#039;the set of non-negative real numbers&amp;#039;&amp;#039;’.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Simple Event&amp;#039;&amp;#039;&amp;#039;: just one of the possible outcomes on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Event:&amp;#039;&amp;#039;&amp;#039; a &amp;#039;&amp;#039;subset&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, denoted &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt;; e.g., &amp;lt;math&amp;gt;E=\left\{ 2,4,6\right\}&amp;lt;/math&amp;gt; (i.e. any even number on a die) or &amp;lt;math&amp;gt;E=\left\{ x;4&amp;lt;x\leq 10\right\} ,&amp;lt;/math&amp;gt; which means ‘&amp;#039;&amp;#039;the set of real numbers which are strictly bigger than&amp;#039;&amp;#039; &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;but less than or equal to &amp;#039;&amp;#039;&amp;lt;math&amp;gt;10&amp;lt;/math&amp;gt;’.&amp;lt;br /&amp;gt;Note that an event, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, is a collection of simple events.&lt;br /&gt;
&lt;br /&gt;
Such concepts can be represented by means of the following Venn Diagram:&lt;br /&gt;
&lt;br /&gt;
[[File:Venn_1.jpg|frameless|600px]]&lt;br /&gt;
&lt;br /&gt;
The sample space, &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; is depicted as a closed rectangle, and the event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; is a closed loop wholly contained within &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; and we write (in set notation) &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In dealing with probability, and in particular the probability of an event (or events) occurring, we shall need to be familiar with &amp;#039;&amp;#039;&amp;#039;UNIONS, INTERSECTIONS&amp;#039;&amp;#039;&amp;#039; and &amp;#039;&amp;#039;&amp;#039;COMPLEMENTS&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
To illustrate these concepts, consider the sample space &amp;lt;math&amp;gt;S=\{x;x\geq 0\},\,&amp;lt;/math&amp;gt; with the following events defined on &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; as depicted in Figure 3.2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E=\{x;4&amp;lt;x\leq 10\},\,F=\{x;7&amp;lt;x\leq 17\},\,G=\{x;x&amp;gt;15\},\,H=\{x;9&amp;lt;x\leq 13\}. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
| (a) Event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;: A closed loop&lt;br /&gt;
| (b) Union: &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2a.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2b.jpg|frameless|300px]]&lt;br /&gt;
|-&lt;br /&gt;
| (c) Intersection: &amp;lt;math&amp;gt;E\cap F&amp;lt;/math&amp;gt;&lt;br /&gt;
| (d) The Null set/event: &amp;lt;math&amp;gt;E\cap G=\emptyset &amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2c.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2d.jpg|frameless|300px]]&lt;br /&gt;
|-&lt;br /&gt;
| (e) Complement of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;\bar{E}&amp;lt;/math&amp;gt;&lt;br /&gt;
| (f) Subset of &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;H\subset F&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;H\cap F=H&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2e.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2f.jpg|frameless|300px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
        &lt;br /&gt;
&lt;br /&gt;
* The &amp;#039;&amp;#039;union&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;E\cup F,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E\cup F=\{x;4&amp;lt;x\leq 17\};&amp;lt;/math&amp;gt; i.e., it contains elements (simple events) which are either in &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; or in &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; or (perhaps) in both. This is illustrated on the Venn diagram by the dark shaded area in diagram (b).&lt;br /&gt;
* The &amp;#039;&amp;#039;intersection&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;E\cap F,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E\cap F=\left\{ x;7&amp;lt;x\leq 10\right\} ;&amp;lt;/math&amp;gt; i.e., it contains elements (simple events) which are common to both &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F.&amp;lt;/math&amp;gt; Again this is depicted by the dark shaded area in (c). If events have no elements in common (as, for example, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;) then they are said to be &amp;#039;&amp;#039;mutually exclusive&amp;#039;&amp;#039;, and we write &amp;lt;math&amp;gt;E\cap G=\emptyset ,&amp;lt;/math&amp;gt; where &amp;lt;math&amp;gt;\emptyset&amp;lt;/math&amp;gt; denotes the &amp;#039;&amp;#039;null set&amp;#039;&amp;#039;, which contains no elements. Such a situation is illustrated on the Venn diagram by events (the two shaded closed loops in (d)) which do not overlap. Notice, however, that &amp;lt;math&amp;gt;G\cap F\neq \emptyset ,&amp;lt;/math&amp;gt; since &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; have elements in common.&lt;br /&gt;
* The &amp;#039;&amp;#039;complement&amp;#039;&amp;#039; of an event &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; say, is everything defined on the sample space which is not in &amp;lt;math&amp;gt;E.&amp;lt;/math&amp;gt; This event is denoted &amp;lt;math&amp;gt;\bar{E}&amp;lt;/math&amp;gt;, the dark shaded area in (e); here &amp;lt;math&amp;gt;\bar{E}=\left\{ x;x\leq 4\right\} \cup \left\{ x;x&amp;gt;10\right\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
* Finally note that &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; is a subset of &amp;lt;math&amp;gt;F;&amp;lt;/math&amp;gt; see (f). It is depicted as the dark closed loop wholly contained within &amp;lt;math&amp;gt;F,&amp;lt;/math&amp;gt; the lighter shaded area, so that &amp;lt;math&amp;gt;H\cap F=H;&amp;lt;/math&amp;gt; if an element in the sample space is a member of &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; then it must also be a member of &amp;lt;math&amp;gt;F.&amp;lt;/math&amp;gt; (In mathematical logic, we employ this scenario to indicate that “&amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; implies &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;”, but not necessarily vice-versa.) Notice that &amp;lt;math&amp;gt;G\cap H=\emptyset &amp;lt;/math&amp;gt; but &amp;lt;math&amp;gt;H\cap E\neq \emptyset&amp;lt;/math&amp;gt;.&lt;br /&gt;
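These interval events and set operations can be encoded as simple membership tests. The following Python sketch is purely illustrative (it is not part of the original notes, and all names are ours):

```python
# Events on the sample space S = {x : x >= 0}, written as membership tests.
E = lambda x: 4 < x <= 10
F = lambda x: 7 < x <= 17
G = lambda x: x > 15
H = lambda x: 9 < x <= 13

union = lambda A, B: (lambda x: A(x) or B(x))        # A ∪ B: in either (or both)
inter = lambda A, B: (lambda x: A(x) and B(x))       # A ∩ B: in both
comp  = lambda A: (lambda x: x >= 0 and not A(x))    # complement within S

print(inter(E, F)(8))     # True: 8 lies in the intersection (7, 10]
print(inter(E, G)(16))    # False: E and G are mutually exclusive
print(comp(E)(3))         # True: 3 is in the complement of E
```

Checking a few sample points this way mirrors reading the shaded regions off the Venn diagram.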
&lt;br /&gt;
= Probability =&lt;br /&gt;
&lt;br /&gt;
The term &amp;#039;&amp;#039;probability&amp;#039;&amp;#039; (or some equivalent) is used in everyday conversation and so will already be familiar to the reader. We talk of the probability, or chance, of rain; the likelihood of England winning the World Cup; or, perhaps more scientifically, the chance of getting a &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; when rolling a die. What we shall now do is develop a coherent theory of probability: a theory which allows us to combine and manipulate probabilities in a consistent and meaningful manner. We shall describe ways of dealing with, and describing, uncertainty. This will involve &amp;#039;&amp;#039;rules&amp;#039;&amp;#039; which govern our use of terms like probability.&lt;br /&gt;
&lt;br /&gt;
There have been a number of different approaches to, and interpretations of, probability. Most depend, at least to some extent, on the notion of relative frequency, as now described:&lt;br /&gt;
&lt;br /&gt;
* Suppose an experiment has an outcome of interest &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;. The &amp;#039;&amp;#039;relative frequency interpretation&amp;#039;&amp;#039; of probability says that assuming the experiment can be repeated a large number of times then the relative frequency of observing the outcome &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; will settle down to a &amp;#039;&amp;#039;number&amp;#039;&amp;#039;, denoted &amp;lt;math&amp;gt;\Pr (E),&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;P(E)&amp;lt;/math&amp;gt; or Prob&amp;lt;math&amp;gt;(E),&amp;lt;/math&amp;gt; called the &amp;#039;&amp;#039;&amp;#039;probability&amp;#039;&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
This is illustrated in the next Figure where the proportion of heads obtained after &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; flips of a fair coin is plotted against &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;, as &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; increases; e.g., of the first 100 flips, 46 were heads (&amp;lt;math&amp;gt;46\%&amp;lt;/math&amp;gt;). Notice that the plot becomes less ‘wobbly’ after about &amp;lt;math&amp;gt;n=&amp;lt;/math&amp;gt;220 and appears to be settling down to the value of &amp;lt;math&amp;gt;\frac{1}{2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
[[File:Prob_coin.jpg|frameless|600px]]&lt;br /&gt;
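The relative-frequency idea is easy to reproduce by simulation. A hypothetical Python sketch (the seed and number of flips are our own choices, not taken from the Figure):

```python
import random

random.seed(1)                       # reproducible illustrative run
n_flips = 10_000
heads = 0
proportions = []
for n in range(1, n_flips + 1):
    heads += random.random() < 0.5   # one flip of a fair coin
    proportions.append(heads / n)

# Early proportions wobble; the running proportion settles near 1/2.
print(proportions[9], proportions[-1])
```

Re-running with different seeds changes the early wobble but not the long-run value, which is the point of the relative frequency interpretation.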
&lt;br /&gt;
Due to this interpretation of probability, we often use observed sample proportions to approximate underlying probabilities of interest; see, for example, Question 4 of Exercise 2. There are, of course, other interpretations of probability; e.g., the subjective interpretation which simply expresses the strength of one’s belief about an event of interest such as whether Manchester United will win the European Cup! Any one of these interpretations can be used in practical situations provided the implied notion of probability follows a simple set of &amp;#039;&amp;#039;axioms&amp;#039;&amp;#039; or &amp;#039;&amp;#039;rules&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
== The axioms of probability ==&lt;br /&gt;
&lt;br /&gt;
There are just &amp;#039;&amp;#039;three &amp;#039;&amp;#039;basic rules that must be obeyed when dealing with probabilities:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;For any event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; defined on &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; i.e., &amp;lt;math&amp;gt;E\subset S,\,\,\Pr (E)\geq 0&amp;lt;/math&amp;gt;; &amp;#039;&amp;#039;probabilities are non-negative&amp;#039;&amp;#039;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (S)=1;&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;having defined the sample space of outcomes, one of these outcomes must be observed&amp;#039;&amp;#039;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;If events &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;, defined on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, are mutually exclusive, so that &amp;lt;math&amp;gt;E\cap F=\emptyset &amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) =\Pr \left( E\right)+\Pr \left( F\right) .&amp;lt;/math&amp;gt; In general, for any set of mutually exclusive events, &amp;lt;math&amp;gt;E_{1},E_{2},\ldots ,E_{k},&amp;lt;/math&amp;gt; defined on &amp;lt;math&amp;gt;S:&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (E_{1}\cup E_{2}\cup \ldots \cup E_{k})=\Pr (E_{1})+\Pr (E_{2})+\ldots\Pr (E_{k})&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;i.e., &amp;lt;math&amp;gt;\Pr \left( \bigcup_{j=1}^{k}E_{j}\right) =\sum_{j=1}^{k}\Pr (E_{j}).&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In terms of the Venn Diagram, one can (and should) usefully think of the area of &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; relative to that of &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; as providing an indication of probability. (Note, from axiom 2, that the area of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; is implicitly normalised to be unity).&lt;br /&gt;
&lt;br /&gt;
Also observe that, contrary to what you may have believed, it is not one of the rules that &amp;lt;math&amp;gt;\Pr (E)\leq 1&amp;lt;/math&amp;gt; for any event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;. Rather, this is an implication of the &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; rules given:&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Implications: &amp;#039;&amp;#039;&amp;#039;for any event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; defined on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;E\cap \bar{E}=\emptyset &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;E\cup \bar{E}=S.&amp;lt;/math&amp;gt; By Axiom &amp;lt;math&amp;gt;1,&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\Pr (E)\geq 0&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Pr \left( \bar{E}\right) \geq 0&amp;lt;/math&amp;gt;; by Axiom &amp;lt;math&amp;gt;3,&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\Pr(E)+\Pr (\bar{E})=\Pr (S)&amp;lt;/math&amp;gt;; and by Axiom &amp;lt;math&amp;gt;2,&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\Pr \left( E\right) +\Pr \left( \bar{E}\right) =1.&amp;lt;/math&amp;gt; This implies that&lt;br /&gt;
&lt;br /&gt;
# &amp;lt;math&amp;gt;0\leq \Pr (E)\leq 1&amp;lt;/math&amp;gt;&lt;br /&gt;
# &amp;lt;math&amp;gt;\Pr (\bar{E})=1-\Pr (E)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first of these is what we might have expected from probability (a number lying between &amp;lt;math&amp;gt;0&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;). The second implication is also very important; it says that the probability of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; not happening is ‘&amp;#039;&amp;#039;one minus the probability of it happening&amp;#039;&amp;#039;’. Thus when rolling a die, the probability of getting &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; is one minus the probability of getting either a &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;2&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;5.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These axioms imply how to calculate probabilities on a sample space of equally likely outcomes. For example, and as we have already noted, the experiment of rolling a fair die defines a sample space of six mutually exclusive and equally likely outcomes (&amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; dots on the up-turned face). The axioms then say that each of the six probabilities is positive, that they add to 1 and that they are all the same. Thus, the probability of any one of the outcomes must simply be &amp;lt;math&amp;gt;\frac{1}{6},&amp;lt;/math&amp;gt; which may accord with your intuition. A similar sort of analysis reveals that the probability of drawing a club from a deck of &amp;lt;math&amp;gt;52&amp;lt;/math&amp;gt; cards is &amp;lt;math&amp;gt;\frac{13}{52},&amp;lt;/math&amp;gt; since any one of the &amp;lt;math&amp;gt;52&amp;lt;/math&amp;gt; cards has an equal chance of being drawn and &amp;lt;math&amp;gt;13&amp;lt;/math&amp;gt; of them are clubs. Notice the importance of the assumption of equally likely outcomes here.&lt;br /&gt;
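For a finite sample space of equally likely outcomes, these calculations reduce to counting favourable outcomes. A Python sketch of the die and card examples (the encoding of the deck is our own):

```python
from fractions import Fraction

def prob(event, sample_space):
    # Probability on equally likely outcomes: favourable / total.
    return Fraction(len(event & sample_space), len(sample_space))

die = {1, 2, 3, 4, 5, 6}
print(prob({6}, die))              # 1/6

deck = {(rank, suit) for rank in range(1, 14) for suit in "CDHS"}
clubs = {card for card in deck if card[1] == "C"}
print(prob(clubs, deck))           # 13/52 = 1/4
```

Using exact fractions keeps the answers in the same form as the notes, with no floating-point rounding.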
&lt;br /&gt;
In this, and the next section of notes, we shall see how these axioms can be used. Firstly, consider the construction of a probability for the &amp;#039;&amp;#039;union&amp;#039;&amp;#039; of two events; i.e., the probability that &amp;#039;&amp;#039;either &amp;#039;&amp;#039;&amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; or (perhaps) &amp;#039;&amp;#039;both &amp;#039;&amp;#039;will occur. Such a probability is embodied in the &amp;#039;&amp;#039;addition rule of probability&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
== The addition rule of probability ==&lt;br /&gt;
&lt;br /&gt;
When rolling a fair die, let &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; denote the event of an “odd number of dots” and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; the event of the “number of dots being greater than, or equal, to &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt;”&amp;lt;math&amp;gt;.&amp;lt;/math&amp;gt; What is the probability of the event &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;? To calculate this we can collect together all the mutually exclusive (simple) events which comprise &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;, and then add up the probabilities (by axiom 3). These simple events are &amp;lt;math&amp;gt;1,3,4,5&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; dots. Each has a probability of &amp;lt;math&amp;gt;\frac{1}{6},&amp;lt;/math&amp;gt; so the required total probability is: &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) =\frac{5}{6}&amp;lt;/math&amp;gt;. Consider carefully how this probability is constructed and note, in particular, that &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) \neq \Pr \left( E\right) +\Pr \left( F\right) &amp;lt;/math&amp;gt; since &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; have a simple event in common (namely &amp;lt;math&amp;gt;5&amp;lt;/math&amp;gt; dots).&lt;br /&gt;
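The die calculation above can be checked by enumerating the simple events; a Python sketch under our own encoding of the two events:

```python
from fractions import Fraction

die = {1, 2, 3, 4, 5, 6}
E = {1, 3, 5}   # odd number of dots
F = {4, 5, 6}   # number of dots >= 4

pr = lambda A: Fraction(len(A), len(die))
print(pr(E | F))                   # 5/6: the union is {1, 3, 4, 5, 6}
print(pr(E) + pr(F))               # 1: too big, since 5 dots is counted twice
print(pr(E) + pr(F) - pr(E & F))   # 5/6: subtracting the overlap corrects this
```

The last line anticipates the general addition rule stated next.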
&lt;br /&gt;
In general, we can calculate the probability of the union of events using the &amp;#039;&amp;#039;addition rule of probability&amp;#039;&amp;#039;, as follows.&lt;br /&gt;
&lt;br /&gt;
* For any events, &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F\subset S:\Pr (E\cup F)=\Pr (E)+\Pr (F)-\Pr (E\cap F).&amp;lt;/math&amp;gt; So, in general, &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) \leq \Pr (E)+\Pr (F).&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generalises to three events, &amp;lt;math&amp;gt;E_{1},E_{2}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;E_{3}&amp;lt;/math&amp;gt; as&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr (E_{1}\cup E_{2}\cup E_{3}) &amp;amp;=&amp;amp;\Pr (E_{1})+\Pr (E_{2})+\Pr (E_{3}) \\&lt;br /&gt;
&amp;amp;&amp;amp;-\Pr (E_{1}\cap E_{2})-\Pr (E_{1}\cap E_{3})-\Pr (E_{2}\cap E_{3}) \\&lt;br /&gt;
&amp;amp;&amp;amp;+\Pr (E_{1}\cap E_{2}\cap E_{3}).\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can demonstrate this as follows.&lt;br /&gt;
&lt;br /&gt;
Note that&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E\cup F=\left( E\cap \bar{F}\right) \cup \left( E\cap F\right) \cup \left(\bar{E}\cap F\right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
the union of &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; mutually exclusive events. These mutually exclusive events are depicted by the shaded areas &amp;lt;math&amp;gt;\mathbf{a,}&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\mathbf{b}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\mathbf{c}&amp;lt;/math&amp;gt;, respectively, in the next Figure.&lt;br /&gt;
&lt;br /&gt;
[[File:Prob_add.jpg|frameless|500px]]&lt;br /&gt;
&lt;br /&gt;
Then, since the three events &amp;lt;math&amp;gt;\left( E\cap\bar{F}\right) &amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;\left( E\cap F\right) &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\left( \bar{E}\cap F\right)&amp;lt;/math&amp;gt; are mutually exclusive, the “area” occupied by &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt; is simply &amp;lt;math&amp;gt;\mathbf{a+b+c}&amp;lt;/math&amp;gt; and Axiom &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; gives&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Pr \left( E\cup F\right) =\Pr \left( E\cap \bar{F}\right) +\Pr \left( \bar{E}\cap F\right) +\Pr \left( E\cap F\right) .&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
But also by Axiom &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt;, since &amp;lt;math&amp;gt;E=\left( E\cap \bar{F}\right) \cup \left(E\cap F\right) &amp;lt;/math&amp;gt;, it must be that &amp;lt;math&amp;gt;\Pr (E)=\Pr \left( E\cap \bar{F}\right)+\Pr (E\cap F);&amp;lt;/math&amp;gt; similarly, &amp;lt;math&amp;gt;\Pr \left( \bar{E}\cap F\right) =\Pr \left(F\right) -\Pr \left( E\cap F\right)&amp;lt;/math&amp;gt;. Putting all of this together gives&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Pr (E\cup F)=\Pr (E)+\Pr (F)-\Pr (E\cap F).&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; are mutually exclusive, so that &amp;lt;math&amp;gt;E\cap F=\emptyset&amp;lt;/math&amp;gt;, this rule reduces to Axiom 3: &amp;lt;math&amp;gt;\Pr (E\cup F)=\Pr (E)+\Pr (F)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example: &amp;#039;&amp;#039;What is the probability of drawing a Queen (&amp;lt;math&amp;gt;Q &amp;lt;/math&amp;gt;) or a Club (&amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt;) in a single draw from a pack of cards? Now, &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; out of &amp;lt;math&amp;gt;52 &amp;lt;/math&amp;gt; cards are Queens, so &amp;lt;math&amp;gt;\Pr \left( Q\right) =\frac{4}{52},&amp;lt;/math&amp;gt; whilst &amp;lt;math&amp;gt;\Pr\left( C\right) =\frac{13}{52}.&amp;lt;/math&amp;gt; The probability of drawing the Queen of Clubs is simply &amp;lt;math&amp;gt;\frac{1}{52};&amp;lt;/math&amp;gt; i.e., &amp;lt;math&amp;gt;\Pr \left( Q\cap C\right) =\frac{1}{52}&amp;lt;/math&amp;gt;. What we require is a Club or a Queen, for which the probability is&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr \left( Q\cup C\right) &amp;amp;=&amp;amp;\Pr \left( Q\right) +\Pr \left( C\right) -\Pr\left( Q\cap C\right) \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{4}{52}+\frac{13}{52}-\frac{1}{52} \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{16}{52}=\frac{4}{13}.\end{aligned}&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example: &amp;#039;&amp;#039;Consider a car journey from Manchester to London via the M6 and M1. Let &amp;lt;math&amp;gt;E=&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;heavy traffic somewhere on route&amp;#039;&amp;#039; and &amp;lt;math&amp;gt;F=&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;roadworks somewhere on route&amp;#039;&amp;#039;. It is estimated that &amp;lt;math&amp;gt;\Pr (E)=0.8&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Pr (F)=0.4,&amp;lt;/math&amp;gt; whilst the probability of NOT encountering both is &amp;lt;math&amp;gt;\Pr (\overline{E\cap F})=0.6.&amp;lt;/math&amp;gt; What is the probability of encountering heavy traffic or roadworks?&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;We require &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) .&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr (E\cup F) &amp;amp;=&amp;amp;\Pr (E)+\Pr (F)-\Pr (E\cap F) \\&lt;br /&gt;
&amp;amp;=&amp;amp;\Pr (E)+\Pr (F)-(1-\Pr (\overline{E\cap F})) \\&lt;br /&gt;
&amp;amp;=&amp;amp;0.8+0.4-1+0.6 \\&lt;br /&gt;
&amp;amp;=&amp;amp;0.8=\Pr (E)\end{aligned}&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Notice that this implies, in this case, that &amp;lt;math&amp;gt;F\subset E&amp;lt;/math&amp;gt; (why?). This &amp;#039;&amp;#039;model &amp;#039;&amp;#039;then implies that when there are roadworks somewhere on route you are bound to encounter heavy traffic; on the other hand, you can encounter heavy traffic on route without ever passing through roadworks. (My own experience of this motorway inclines me towards this implication!)&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
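Both worked examples follow directly from the addition rule; a quick Python check using exact fractions (the variable names are ours):

```python
from fractions import Fraction

# Queen or Club from a 52-card pack.
pQ, pC, pQandC = Fraction(4, 52), Fraction(13, 52), Fraction(1, 52)
print(pQ + pC - pQandC)            # 4/13

# Manchester-to-London journey: Pr(E ∩ F) = 1 - Pr(complement of E ∩ F).
pE, pF, p_not_EandF = Fraction(4, 5), Fraction(2, 5), Fraction(3, 5)
p_union = pE + pF - (1 - p_not_EandF)
print(p_union)                     # 4/5, i.e. 0.8 = Pr(E), consistent with F ⊂ E
```

Writing 0.8, 0.4 and 0.6 as fractions keeps the arithmetic exact.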
&lt;br /&gt;
Similar concepts apply when manipulating proportions as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example&amp;#039;&amp;#039;: A sample of 1000 undergraduates were asked whether they took Mathematics, Physics or Chemistry at A-level. The following responses were obtained: 100 just took Mathematics; 70 just took Physics; 100 just took Chemistry; 150 took Mathematics and Physics, but not Chemistry; 40 took Mathematics and Chemistry, but not Physics; and 240 took Physics and Chemistry, but not Mathematics. What proportion took all three?&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;This can be addressed with the following diagram:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;[[File:Prob_Alevels.jpg|frameless|500px]]&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;The shaded area contains the number who took all three, which can be deduced from the above information (since the total of the numbers assigned to each part of the Venn diagram must be &amp;lt;math&amp;gt;1000&amp;lt;/math&amp;gt;). The answer is therefore &amp;lt;math&amp;gt;30\%&amp;lt;/math&amp;gt; (being &amp;lt;math&amp;gt;300&amp;lt;/math&amp;gt; out of &amp;lt;math&amp;gt;1000&amp;lt;/math&amp;gt;).&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Two further results on unions, intersections and complements which are of use (and which are fairly easy to demonstrate using Venn diagrams) are &amp;#039;&amp;#039;&amp;#039;de Morgan Laws&amp;#039;&amp;#039;&amp;#039;:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\left( \bar{A}\cap \bar{B}\right) =\left( \overline{A\cup B}\right) &amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\bar{A}\cup \bar{B}=\left( \overline{A\cap B}\right) &amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
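The missing Venn-diagram region and the de Morgan laws can both be checked mechanically; a Python sketch (region names are ours):

```python
# A-level example: counts for the seven known regions must total 1000.
only_one = 100 + 70 + 100          # just Mathematics, just Physics, just Chemistry
exactly_two = 150 + 40 + 240       # each pair of subjects, excluding the third
all_three = 1000 - only_one - exactly_two
print(all_three)                   # 300, i.e. 30% of the sample

# de Morgan's laws verified on small sets within S = {0, ..., 9}.
S = set(range(10))
A, B = {1, 2, 3}, {3, 4, 5}
print((S - A) & (S - B) == S - (A | B))   # True
print((S - A) | (S - B) == S - (A & B))   # True
```

The set-difference operator `-` plays the role of the complement within the finite sample space `S`.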
&lt;br /&gt;
= Footnotes =&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Intro&amp;diff=2861</id>
		<title>Probability Intro</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Intro&amp;diff=2861"/>
				<updated>2013-08-07T21:27:32Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: /* Probability */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
&lt;br /&gt;
= Introducing Probability =&lt;br /&gt;
&lt;br /&gt;
So far we have been looking at ways of summarising samples of data drawn from an underlying population of interest. Although at times tedious, all such arithmetic calculations are fairly mechanical and straightforward to apply. To remind ourselves, one of the primary reasons for wishing to summarise data is to assist in the development of inferences about the population from which the data were taken. That is to say, we would like to elicit some information about the mechanism which generated the observed data.&lt;br /&gt;
&lt;br /&gt;
We now start on the process of developing mathematical ways of formulating inferences and this requires the use of &amp;#039;&amp;#039;probability&amp;#039;&amp;#039;. This becomes clear if we think back to one of the early questions posed in this course: &amp;#039;&amp;#039;prior to sampling is it possible to predict with absolute certainty what will be observed&amp;#039;&amp;#039;? The answer to this question is &amp;#039;&amp;#039;no&amp;#039;&amp;#039;; although it would be of interest to know how &amp;#039;&amp;#039;likely&amp;#039;&amp;#039; it is that certain values would be observed. Or, what is the &amp;#039;&amp;#039;probability&amp;#039;&amp;#039; of observing certain values?&lt;br /&gt;
&lt;br /&gt;
Before proceeding, we need some more tools:&lt;br /&gt;
&lt;br /&gt;
= Venn diagrams =&lt;br /&gt;
&lt;br /&gt;
Venn diagrams (and diagrams in general) are of enormous help in trying to understand and manipulate probability. We begin with some basic definitions, some of which we have encountered before.&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Experiment:&amp;#039;&amp;#039;&amp;#039; any process which, when applied, provides data or an outcome; e.g., rolling a die and observing the number of dots on the upturned face; recording the amount of rainfall in Manchester over a period of time.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Sample Space:&amp;#039;&amp;#039;&amp;#039; set of possible outcomes of an experiment; e.g., &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; (or &amp;lt;math&amp;gt;\Omega &amp;lt;/math&amp;gt;) &amp;lt;math&amp;gt;=&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\{1,2,3,4,5,6\}&amp;lt;/math&amp;gt;, which is the sample space of rolling a die. Or &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;=&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\{x;x\geq 0\}&amp;lt;/math&amp;gt;, which is the sample space of an experiment where the outcomes can be any non-negative real number, or ‘&amp;#039;&amp;#039;the set of non-negative real numbers&amp;#039;&amp;#039;’.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Simple Event&amp;#039;&amp;#039;&amp;#039;: just one of the possible outcomes on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Event:&amp;#039;&amp;#039;&amp;#039; a &amp;#039;&amp;#039;subset&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, denoted &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt;; e.g., &amp;lt;math&amp;gt;E=\left\{ 2,4,6\right\}&amp;lt;/math&amp;gt; (i.e. an even number on a die) or &amp;lt;math&amp;gt;E=\left\{ x;4&amp;lt;x\leq 10\right\} ,&amp;lt;/math&amp;gt; which means ‘&amp;#039;&amp;#039;the set of real numbers which are strictly bigger than&amp;#039;&amp;#039; &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;but less than or equal to &amp;#039;&amp;#039;&amp;lt;math&amp;gt;10&amp;lt;/math&amp;gt;’.&amp;lt;br /&amp;gt;Note that an event, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, is a collection of simple events.&lt;br /&gt;
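These definitions can be made concrete for one roll of a die; a small Python sketch using sets (the encoding is ours):

```python
# Sample space and events for rolling a die.
S = {1, 2, 3, 4, 5, 6}
E = {2, 4, 6}              # the event "an even number of dots"
simple = {3}               # a simple event: just one outcome

print(E <= S)                              # True: every event is a subset of S
print(E == {x for x in S if x % 2 == 0})   # True: an event collects simple events
```

The subset operator `<=` on Python sets corresponds directly to the notation E ⊂ S.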
&lt;br /&gt;
Such concepts can be represented by means of the following Venn Diagram:&lt;br /&gt;
&lt;br /&gt;
[[File:Venn_1.jpg|frameless|600px]]&lt;br /&gt;
&lt;br /&gt;
The sample space, &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; is depicted as a closed rectangle, and the event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; is a closed loop wholly contained within &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; and we write (in set notation) &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In dealing with probability, and in particular the probability of an event (or events) occurring, we shall need to be familiar with &amp;#039;&amp;#039;&amp;#039;UNIONS, INTERSECTIONS&amp;#039;&amp;#039;&amp;#039; and &amp;#039;&amp;#039;&amp;#039;COMPLEMENTS&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
To illustrate these concepts, consider the sample space &amp;lt;math&amp;gt;S=\{x;x\geq 0\},\,&amp;lt;/math&amp;gt; with the following events defined on &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; as depicted in Figure 3.2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E=\{x;4&amp;lt;x\leq 10\},\,F=\{x;7&amp;lt;x\leq 17\},\,G=\{x;x&amp;gt;15\},\,H=\{x;9&amp;lt;x\leq&lt;br /&gt;
13\}. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
| (a) Event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;: A closed loop&lt;br /&gt;
| (b) Union: &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2a.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2b.jpg|frameless|300px]]&lt;br /&gt;
|-&lt;br /&gt;
| (c) Intersection: &amp;lt;math&amp;gt;E\cap F&amp;lt;/math&amp;gt;&lt;br /&gt;
| (d) The Null set/event: &amp;lt;math&amp;gt;E\cap G=\emptyset &amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2c.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2d.jpg|frameless|300px]]&lt;br /&gt;
|-&lt;br /&gt;
| (e) Complement of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;\bar{E}&amp;lt;/math&amp;gt;&lt;br /&gt;
| (f) Subset of &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;H\subset F&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;H\cap F=H&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2e.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2f.jpg|frameless|300px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
        &lt;br /&gt;
&lt;br /&gt;
* The &amp;#039;&amp;#039;union&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;E\cup F,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E\cup F=\{x;4&amp;lt;x\leq 17\};&amp;lt;/math&amp;gt; i.e., it contains elements (simple events) which are either in &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; or in &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; or (perhaps) in both. This is illustrated on the Venn diagram by the dark shaded area in diagram (b).&lt;br /&gt;
* The &amp;#039;&amp;#039;intersection&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;E\cap F,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E\cap F=\left\{ x;7&amp;lt;x\leq 10\right\} ;&amp;lt;/math&amp;gt; i.e., it contains elements (simple events) which are common to both &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F.&amp;lt;/math&amp;gt; Again this is depicted by the dark shaded area in (c). If events have no elements in common (as, for example, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;) then they are said to be &amp;#039;&amp;#039;mutually exclusive&amp;#039;&amp;#039;, and we write &amp;lt;math&amp;gt;E\cap G=\emptyset ,&amp;lt;/math&amp;gt; where &amp;lt;math&amp;gt;\emptyset&amp;lt;/math&amp;gt; denotes the &amp;#039;&amp;#039;null set&amp;#039;&amp;#039;, which contains no elements. Such a situation is illustrated on the Venn diagram by events (the two shaded closed loops in (d)) which do not overlap. Notice, however, that &amp;lt;math&amp;gt;G\cap F\neq \emptyset ,&amp;lt;/math&amp;gt; since &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; have elements in common.&lt;br /&gt;
* The &amp;#039;&amp;#039;complement&amp;#039;&amp;#039; of an event &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; say, is everything defined on the sample space which is not in &amp;lt;math&amp;gt;E.&amp;lt;/math&amp;gt; This event is denoted &amp;lt;math&amp;gt;\bar{E}&amp;lt;/math&amp;gt;, the dark shaded area in (e); here &amp;lt;math&amp;gt;\bar{E}=\left\{ x;x\leq 4\right\} \cup \left\{ x;x&amp;gt;10\right\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
* Finally note that &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; is a subset of &amp;lt;math&amp;gt;F;&amp;lt;/math&amp;gt; see (f). It is depicted as the dark closed loop wholly contained within &amp;lt;math&amp;gt;F,&amp;lt;/math&amp;gt; the lighter shaded area, so that &amp;lt;math&amp;gt;H\cap F=H;&amp;lt;/math&amp;gt; if an element in the sample space is a member of &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; then it must also be a member of &amp;lt;math&amp;gt;F.&amp;lt;/math&amp;gt; (In mathematical logic, we employ this scenario to indicate that “&amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; implies &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;”, but not necessarily vice-versa.) Notice that &amp;lt;math&amp;gt;G\cap H=\emptyset &amp;lt;/math&amp;gt; but &amp;lt;math&amp;gt;H\cap E\neq \emptyset&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
= Probability =&lt;br /&gt;
&lt;br /&gt;
The term &amp;#039;&amp;#039;probability&amp;#039;&amp;#039; (or some equivalent) is used in everyday conversation and so can not be unfamiliar to the reader. We talk of the probability, or chance, of rain; the likelihood of England winning the World Cup; or, perhaps more scientifically, the chance of getting a &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; when rolling a die. What we shall now do is develop a coherent theory of probability; a theory which allows us to combine and manipulate probabilities in a consistent and meaningful manner. We shall describe ways of dealing with, and describing, uncertainty. This will involve &amp;#039;&amp;#039;rules&amp;#039;&amp;#039; which govern our use of terms like probability.&lt;br /&gt;
&lt;br /&gt;
There have been a number of different approaches (interpretations) of probability. Most depend, at least to some extent, on the notion of relative frequency as now described:&lt;br /&gt;
&lt;br /&gt;
* Suppose an experiment has an outcome of interest &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;. The &amp;#039;&amp;#039;relative frequency interpretation&amp;#039;&amp;#039; of probability says that assuming the experiment can be repeated a large number of times then the relative frequency of observing the outcome &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; will settle down to a &amp;#039;&amp;#039;number&amp;#039;&amp;#039;, denoted &amp;lt;math&amp;gt;\Pr (E),&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;P(E)&amp;lt;/math&amp;gt; or Prob&amp;lt;math&amp;gt;(E),&amp;lt;/math&amp;gt; called the &amp;#039;&amp;#039;&amp;#039;probability&amp;#039;&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
This is illustrated in the next Figure where the proportion of heads obtained after &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; flips of a fair coin is plotted against &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;, as &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; increases; e.g., of the first &amp;lt;math&amp;gt;100&amp;lt;/math&amp;gt; flips, &amp;lt;math&amp;gt;55&amp;lt;/math&amp;gt; were heads (&amp;lt;math&amp;gt;55\%&amp;lt;/math&amp;gt;). Notice that the plot becomes less ‘wobbly’ after about &amp;lt;math&amp;gt;n=220&amp;lt;/math&amp;gt; and appears to be settling down to the value of &amp;lt;math&amp;gt;\frac{1}{2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
[[File:Prob_coin.jpg|frameless|600px]]&lt;br /&gt;
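The behaviour in the Figure is easy to reproduce by simulation. The following sketch (assuming a fair coin, so Pr(heads) = 1/2) estimates the probability by the relative frequency of heads over many simulated flips:

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

# Simulate n flips of a fair coin; by the relative frequency
# interpretation, the proportion of heads should settle down
# towards Pr(heads) = 1/2 as n grows.
n = 10_000
flips = [random.random() < 0.5 for _ in range(n)]  # True means heads
proportion = sum(flips) / n
print(proportion)  # close to 0.5
```

Re-running with different seeds, or plotting the running proportion against n, reproduces the 'wobbly then settling' shape of the Figure.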
&lt;br /&gt;
Due to this interpretation of probability, we often use observed sample proportions to approximate underlying probabilities of interest; see, for example, Question 4 of Exercise 2. There are, of course, other interpretations of probability; e.g., the subjective interpretation which simply expresses the strength of one’s belief about an event of interest such as whether Manchester United will win the European Cup! Any one of these interpretations can be used in practical situations provided the implied notion of probability follows a simple set of &amp;#039;&amp;#039;axioms&amp;#039;&amp;#039; or &amp;#039;&amp;#039;rules&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
== The axioms of probability ==&lt;br /&gt;
&lt;br /&gt;
There are just &amp;#039;&amp;#039;three &amp;#039;&amp;#039;basic rules that must be obeyed when dealing with probabilities:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;For any event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; defined on &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; i.e., &amp;lt;math&amp;gt;E\subset S,\,\,\Pr (E)\geq 0&amp;lt;/math&amp;gt;; &amp;#039;&amp;#039;probabilities are non-negative&amp;#039;&amp;#039;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (S)=1;&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;having defined the sample space of outcomes, one of these outcomes must be observed&amp;#039;&amp;#039;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;If events &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;, defined on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, are mutually exclusive, so that &amp;lt;math&amp;gt;E\cap F=\emptyset &amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) =\Pr \left( E\right)+\Pr \left( F\right) .&amp;lt;/math&amp;gt; In general, for any set of mutually exclusive events, &amp;lt;math&amp;gt;E_{1},E_{2},\ldots ,E_{k},&amp;lt;/math&amp;gt; defined on &amp;lt;math&amp;gt;S:&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (E_{1}\cup E_{2}\cup \ldots \cup E_{k})=\Pr (E_{1})+\Pr (E_{2})+\ldots +\Pr (E_{k})&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;i.e., &amp;lt;math&amp;gt;\Pr \left( \bigcup_{j=1}^{k}E_{j}\right) =\sum_{j=1}^{k}\Pr (E_{j}).&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In terms of the Venn Diagram, one can (and should) usefully think of the area of &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; relative to that of &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; as providing an indication of probability. (Note, from axiom 2, that the area of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; is implicitly normalised to be unity).&lt;br /&gt;
&lt;br /&gt;
Also observe that, contrary to what you may have believed, it is not one of the rules that &amp;lt;math&amp;gt;\Pr (E)\leq 1&amp;lt;/math&amp;gt; for any event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;. Rather, this is an implication of the &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; rules given:&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;implications: &amp;#039;&amp;#039;&amp;#039;it must be that for any event &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; defined on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;E\cap \bar{E}=\emptyset &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;E\cup \bar{E}=S.&amp;lt;/math&amp;gt; By Axiom &amp;lt;math&amp;gt;1,&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\Pr (E)\geq 0&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Pr \left( \bar{E}\right) \geq 0&amp;lt;/math&amp;gt; and by Axiom &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\Pr(E)+\Pr (\bar{E})=\Pr (S).&amp;lt;/math&amp;gt; So &amp;lt;math&amp;gt;\Pr \left( E\right) +\Pr \left( \bar{E}\right) =1,&amp;lt;/math&amp;gt; by Axiom &amp;lt;math&amp;gt;2.&amp;lt;/math&amp;gt; This implies that&lt;br /&gt;
&lt;br /&gt;
# &amp;lt;math&amp;gt;0\leq \Pr (E)\leq 1&amp;lt;/math&amp;gt;&lt;br /&gt;
# &amp;lt;math&amp;gt;\Pr (\bar{E})=1-\Pr (E)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first of these is what we might have expected from probability (a number lying between &amp;lt;math&amp;gt;0&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;). The second implication is also very important; it says that the probability of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; not happening is ‘&amp;#039;&amp;#039;one minus the probability of it happening&amp;#039;&amp;#039;’. Thus when rolling a die, the probability of getting &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; is one minus the probability of getting either a &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;2&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;5.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These axioms imply how to calculate probabilities on a sample space of equally likely outcomes. For example, and as we have already noted, the experiment of rolling a fair die defines a sample space of six, mutually exclusive and equally likely outcomes (&amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; dots on the up-turned face). The axioms then say that each of the six probabilities is positive, they add to 1 and they are all the same. Thus, the probability of any one of the outcomes must be simply &amp;lt;math&amp;gt;\frac{1}{6},&amp;lt;/math&amp;gt; which may accord with your intuition. A similar sort of analysis reveals that the probability of drawing a club from a deck of &amp;lt;math&amp;gt;52&amp;lt;/math&amp;gt; cards is &amp;lt;math&amp;gt;\frac{13}{52},&amp;lt;/math&amp;gt; since any one of the &amp;lt;math&amp;gt;52&amp;lt;/math&amp;gt; cards has an equal chance of being drawn and &amp;lt;math&amp;gt;13&amp;lt;/math&amp;gt; of them are clubs. Notice the importance of the assumption of equally likely outcomes here.&lt;br /&gt;
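The counting argument can be made concrete with exact rational arithmetic; the snippet below is an illustrative sketch using Python's fractions module:

```python
from fractions import Fraction

# Equally likely outcomes: Pr(event) = favourable outcomes / total.
die = range(1, 7)
pr_six = Fraction(1, len(die))
print(pr_six)                     # 1/6

# Drawing from a 52-card deck: 13 of the 52 cards are clubs.
pr_club = Fraction(13, 52)
print(pr_club)                    # reduces to 1/4

# Complement rule: Pr(6) = 1 - Pr(1, 2, 3, 4 or 5).
print(Fraction(1, 6) == 1 - Fraction(5, 6))  # True
```

`Fraction` normalises automatically, which is why 13/52 prints as 1/4; exact rationals avoid the rounding noise of floating point in these small calculations.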
&lt;br /&gt;
In this, and the next section of notes, we shall see how these axioms can be used. Firstly, consider the construction of a probability for the &amp;#039;&amp;#039;union&amp;#039;&amp;#039; of two events; i.e., the probability that &amp;#039;&amp;#039;either &amp;#039;&amp;#039;&amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; or (perhaps) &amp;#039;&amp;#039;both &amp;#039;&amp;#039;will occur. Such a probability is embodied in the &amp;#039;&amp;#039;addition rule of probability&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
== The addition rule of probability ==&lt;br /&gt;
&lt;br /&gt;
When rolling a fair die, let &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; denote the event of an “odd number of dots” and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; the event of the “number of dots being greater than or equal to &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt;”. What is the probability of the event &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;? To calculate this we can collect together all the mutually exclusive (simple) events which comprise &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;, and then add up the probabilities (by Axiom 3). These simple events are &amp;lt;math&amp;gt;1,3,4,5&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; dots. Each has a probability of &amp;lt;math&amp;gt;\frac{1}{6},&amp;lt;/math&amp;gt; so the required total probability is: &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) =\frac{5}{6}&amp;lt;/math&amp;gt;. Consider carefully how this probability is constructed and note, in particular, that &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) \neq \Pr \left( E\right) +\Pr \left( F\right) &amp;lt;/math&amp;gt; since &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; have a simple event in common (namely &amp;lt;math&amp;gt;5&amp;lt;/math&amp;gt; dots).&lt;br /&gt;
&lt;br /&gt;
In general, we can calculate the probability of the union of events using the &amp;#039;&amp;#039;addition rule of probability&amp;#039;&amp;#039;, as follows.&lt;br /&gt;
&lt;br /&gt;
* For any events, &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F\subset S:\Pr (E\cup F)=\Pr (E)+\Pr (F)-\Pr (E\cap F).&amp;lt;/math&amp;gt; So, in general, &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) \leq \Pr (E)+\Pr (F).&amp;lt;/math&amp;gt;&lt;br /&gt;
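For the die example above, both sides of the addition rule can be verified by direct enumeration; the following sketch counts outcomes with exact fractions:

```python
from fractions import Fraction

S = set(range(1, 7))   # fair die: six equally likely outcomes
E = {1, 3, 5}          # odd number of dots
F = {4, 5, 6}          # number of dots at least 4

def pr(event):
    # Equally likely outcomes: probability is relative size within S.
    return Fraction(len(event), len(S))

print(pr(E | F))                               # 5/6, by direct counting
print(pr(E) + pr(F) - pr(E & F) == pr(E | F))  # addition rule: True
print(pr(E) + pr(F))  # 1, which overstates 5/6 since E and F share outcome 5
```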
&lt;br /&gt;
This generalises to three events, &amp;lt;math&amp;gt;E_{1},E_{2}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;E_{3}&amp;lt;/math&amp;gt; as&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr (E_{1}\cup E_{2}\cup E_{3}) &amp;amp;=&amp;amp;\Pr (E_{1})+\Pr (E_{2})+\Pr (E_{3}) \\&lt;br /&gt;
&amp;amp;&amp;amp;-\Pr (E_{1}\cap E_{2})-\Pr (E_{1}\cap E_{3})-\Pr (E_{2}\cap E_{3}) \\&lt;br /&gt;
&amp;amp;&amp;amp;+\Pr (E_{1}\cap E_{2}\cap E_{3}).\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can demonstrate this as follows.&lt;br /&gt;
&lt;br /&gt;
Note that&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E\cup F=\left( E\cap \bar{F}\right) \cup \left( E\cap F\right) \cup \left(\bar{E}\cap F\right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
the union of &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; mutually exclusive events. These mutually exclusive events are depicted by the shaded areas &amp;lt;math&amp;gt;\mathbf{a,}&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\mathbf{b}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\mathbf{c}&amp;lt;/math&amp;gt;, respectively, in the next Figure.&lt;br /&gt;
&lt;br /&gt;
[[File:Prob_add.jpg|frameless|500px]]&lt;br /&gt;
&lt;br /&gt;
Then, by Axiom &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; and the fact that the three events &amp;lt;math&amp;gt;\left( E\cap\bar{F}\right) &amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;\left( E\cap F\right) &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\left( \bar{E}\cap F\right)&amp;lt;/math&amp;gt; are mutually exclusive (so that the “area” occupied by &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt; is simply &amp;lt;math&amp;gt;\mathbf{a+b+c}&amp;lt;/math&amp;gt;),&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Pr \left( E\cup F\right) =\Pr \left( E\cap \bar{F}\right) +\Pr \left( \bar{E}\cap F\right) +\Pr \left( E\cap F\right) .&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
But also by Axiom &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt;, since &amp;lt;math&amp;gt;E=\left( E\cap \bar{F}\right) \cup \left(E\cap F\right) &amp;lt;/math&amp;gt;, it must be that &amp;lt;math&amp;gt;\Pr (E)=\Pr \left( E\cap \bar{F}\right)+\Pr (E\cap F);&amp;lt;/math&amp;gt; similarly, &amp;lt;math&amp;gt;\Pr \left( \bar{E}\cap F\right) =\Pr \left(F\right) -\Pr \left( E\cap F\right)&amp;lt;/math&amp;gt;. Putting all of this together gives&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Pr (E\cup F)=\Pr (E)+\Pr (F)-\Pr (E\cap F).&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; are mutually exclusive, so that &amp;lt;math&amp;gt;E\cap F=\emptyset&amp;lt;/math&amp;gt;, this rule reduces to Axiom 3: &amp;lt;math&amp;gt;\Pr (E\cup F)=\Pr (E)+\Pr (F)&amp;lt;/math&amp;gt;.&lt;br /&gt;
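The three-event version of the rule can likewise be checked by brute force on a small equally likely sample space; the sketch below (an illustration, not a proof) tries 100 randomly generated triples of events:

```python
import random

random.seed(0)  # fixed seed for reproducibility

# Brute-force check of the three-event addition rule on a small,
# equally likely sample space.
S = set(range(12))

def pr(event):
    return len(event) / len(S)

for _ in range(100):
    E1 = {x for x in S if random.random() < 0.5}
    E2 = {x for x in S if random.random() < 0.5}
    E3 = {x for x in S if random.random() < 0.5}
    lhs = pr(E1 | E2 | E3)
    rhs = (pr(E1) + pr(E2) + pr(E3)
           - pr(E1 & E2) - pr(E1 & E3) - pr(E2 & E3)
           + pr(E1 & E2 & E3))
    assert abs(lhs - rhs) < 1e-12
print("three-event addition rule verified on 100 random cases")
```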
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example: &amp;#039;&amp;#039;What is the probability of drawing a Queen (&amp;lt;math&amp;gt;Q &amp;lt;/math&amp;gt;) or a Club (&amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt;) in a single draw from a pack of cards? Now, &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; out of &amp;lt;math&amp;gt;52 &amp;lt;/math&amp;gt; cards are Queens, so &amp;lt;math&amp;gt;\Pr \left( Q\right) =\frac{4}{52},&amp;lt;/math&amp;gt; whilst &amp;lt;math&amp;gt;\Pr\left( C\right) =\frac{13}{52}.&amp;lt;/math&amp;gt; The probability of drawing the Queen of Clubs is simply &amp;lt;math&amp;gt;\frac{1}{52};&amp;lt;/math&amp;gt; i.e., &amp;lt;math&amp;gt;\Pr \left( Q\cap C\right) =\frac{1}{52}&amp;lt;/math&amp;gt;. What we require is a Club or a Queen, for which the probability is&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr \left( Q\cup C\right) &amp;amp;=&amp;amp;\Pr \left( Q\right) +\Pr \left( C\right) -\Pr\left( Q\cap C\right) \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{4}{52}+\frac{13}{52}-\frac{1}{52} \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{16}{52}=\frac{4}{13}.\end{aligned}&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example: &amp;#039;&amp;#039;Consider a car journey from Manchester to London via the M6 and M1. Let &amp;lt;math&amp;gt;E=&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;heavy traffic somewhere on route&amp;#039;&amp;#039; and &amp;lt;math&amp;gt;F=&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;roadworks somewhere on route&amp;#039;&amp;#039;. It is estimated that &amp;lt;math&amp;gt;\Pr (E)=0.8&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Pr (F)=0.4,&amp;lt;/math&amp;gt; whilst the probability of NOT encountering both is &amp;lt;math&amp;gt;\Pr (\overline{E\cap F})=0.6.&amp;lt;/math&amp;gt; What is the probability of encountering heavy traffic or roadworks?&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;We require &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) .&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr (E\cup F) &amp;amp;=&amp;amp;\Pr (E)+\Pr (F)-\Pr (E\cap F) \\&lt;br /&gt;
&amp;amp;=&amp;amp;\Pr (E)+\Pr (F)-(1-\Pr (\overline{E\cap F})) \\&lt;br /&gt;
&amp;amp;=&amp;amp;0.8+0.4-1+0.6 \\&lt;br /&gt;
&amp;amp;=&amp;amp;0.8=\Pr (E)\end{aligned}&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Notice that this implies, in this case, that &amp;lt;math&amp;gt;F\subset E&amp;lt;/math&amp;gt; (why?). This &amp;#039;&amp;#039;model &amp;#039;&amp;#039;then implies that when there are roadworks somewhere on route you are bound to encounter heavy traffic; on the other hand, you can encounter heavy traffic on route without ever passing through roadworks. (My own experience of this motorway inclines me towards this implication!)&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
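Both worked examples can be reproduced with exact fractions (a sketch; the numbers are those given in the text):

```python
from fractions import Fraction

# Queen-or-Club example.
pr_Q, pr_C, pr_QC = Fraction(4, 52), Fraction(13, 52), Fraction(1, 52)
print(pr_Q + pr_C - pr_QC)   # 4/13 (i.e. 16/52)

# Manchester-to-London example: Pr(E) = 0.8, Pr(F) = 0.4,
# Pr(not (E and F)) = 0.6, hence Pr(E and F) = 0.4 = Pr(F),
# which is why F being a subset of E is implied.
pr_E, pr_F = Fraction(8, 10), Fraction(4, 10)
pr_EF = 1 - Fraction(6, 10)  # Pr(E intersect F)
print(pr_E + pr_F - pr_EF)   # 4/5, i.e. 0.8 = Pr(E)
```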
&lt;br /&gt;
Similar concepts apply when manipulating proportions as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example&amp;#039;&amp;#039;: A sample of 1000 undergraduates were asked whether they took either Mathematics, Physics or Chemistry at A-level. The following responses were obtained: 100 just took Mathematics; 70 just took Physics; 100 just took Chemistry; 150 took Mathematics and Physics, but not Chemistry; 40 took Mathematics and Chemistry, but not Physics; and, 240 took Physics and Chemistry, but not Mathematics. What proportion took all three?&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;This can be addressed with the following diagram:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;[[File:Prob_Alevels.jpg|frameless|500px]]&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;The shaded area contains the number who took all three, which can be deduced from the above information (since the total of the numbers assigned to each part of the Venn diagram must be &amp;lt;math&amp;gt;1000&amp;lt;/math&amp;gt;). The answer is therefore &amp;lt;math&amp;gt;30\%&amp;lt;/math&amp;gt; (being &amp;lt;math&amp;gt;300&amp;lt;/math&amp;gt; out of &amp;lt;math&amp;gt;1000&amp;lt;/math&amp;gt;).&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Two further results on unions, intersections and complements which are of use (and which are fairly easy to demonstrate using Venn diagrams) are &amp;#039;&amp;#039;&amp;#039;de Morgan Laws&amp;#039;&amp;#039;&amp;#039;:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\left( \bar{A}\cap \bar{B}\right) =\left( \overline{A\cup B}\right) &amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\bar{A}\cup \bar{B}=\left( \overline{A\cap B}\right) &amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
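The undergraduate example reduces to simple arithmetic, assuming (as the text's Venn diagram implies) that every student took at least one of the three subjects; the de Morgan laws can similarly be checked on any small example sets. A sketch:

```python
# A-level example: the "exactly one subject" and "exactly two subjects"
# counts are given; the remaining students must be those who took all
# three (assuming everyone took at least one of the three).
counts = {
    "M only": 100, "P only": 70, "C only": 100,
    "M&P only": 150, "M&C only": 40, "P&C only": 240,
}
all_three = 1000 - sum(counts.values())
print(all_three, all_three / 1000)   # 300 students, i.e. 30%

# de Morgan laws, using set difference from S as the complement.
S = set(range(10))
A, B = {1, 2, 3}, {3, 4, 5}
print((S - A) & (S - B) == S - (A | B))  # True
print((S - A) | (S - B) == S - (A & B))  # True
```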
&lt;br /&gt;
= Footnotes =&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=File:Prob_coin.jpg&amp;diff=2860</id>
		<title>File:Prob coin.jpg</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=File:Prob_coin.jpg&amp;diff=2860"/>
				<updated>2013-08-07T21:25:27Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: Admin uploaded a new version of &amp;amp;quot;File:Prob coin.jpg&amp;amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Intro&amp;diff=2859</id>
		<title>Probability Intro</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Intro&amp;diff=2859"/>
				<updated>2013-08-07T21:14:49Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: /* The addition rule of probability */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
&lt;br /&gt;
= Introducing Probability =&lt;br /&gt;
&lt;br /&gt;
So far we have been looking at ways of summarising samples of data drawn from an underlying population of interest. Although at times tedious, all such arithmetic calculations are fairly mechanical and straightforward to apply. To remind ourselves, one of the primary reasons for wishing to summarise data is to assist in the development of inferences about the population from which the data were taken. That is to say, we would like to elicit some information about the mechanism which generated the observed data.&lt;br /&gt;
&lt;br /&gt;
We now start on the process of developing mathematical ways of formulating inferences and this requires the use of &amp;#039;&amp;#039;probability&amp;#039;&amp;#039;. This becomes clear if we think back to one of the early questions posed in this course: &amp;#039;&amp;#039;prior to sampling is it possible to predict with absolute certainty what will be observed&amp;#039;&amp;#039;? The answer to this question is &amp;#039;&amp;#039;no&amp;#039;&amp;#039;; although it would be of interest to know how &amp;#039;&amp;#039;likely&amp;#039;&amp;#039; it is that certain values would be observed. Or, what is the &amp;#039;&amp;#039;probability&amp;#039;&amp;#039; of observing certain values?&lt;br /&gt;
&lt;br /&gt;
Before proceeding, we need some more tools:&lt;br /&gt;
&lt;br /&gt;
= Venn diagrams =&lt;br /&gt;
&lt;br /&gt;
Venn diagrams (and diagrams in general) are of enormous help in trying to understand, and manipulate probability. We begin with some basic definitions, some of which we have encountered before.&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Experiment:&amp;#039;&amp;#039;&amp;#039; any process which, when applied, provides data or an outcome; e.g., rolling a die and observing the number of dots on the upturned face; recording the amount of rainfall in Manchester over a period of time.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Sample Space:&amp;#039;&amp;#039;&amp;#039; set of possible outcomes of an experiment; e.g., &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; (or &amp;lt;math&amp;gt;\Omega &amp;lt;/math&amp;gt;) &amp;lt;math&amp;gt;=&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\{1,2,3,4,5,6\}&amp;lt;/math&amp;gt;, which is the sample space of rolling a die. Or &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;=&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\{x;x\geq 0\}&amp;lt;/math&amp;gt;, which is the sample space of an experiment where the outcomes can be any non-negative real number, or ‘&amp;#039;&amp;#039;the set of non-negative real numbers&amp;#039;&amp;#039;’.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Simple Event&amp;#039;&amp;#039;&amp;#039;: just one of the possible outcomes on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Event:&amp;#039;&amp;#039;&amp;#039; a &amp;#039;&amp;#039;subset&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, denoted &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt;; e.g., &amp;lt;math&amp;gt;E=\left\{ 2,4,6\right\}&amp;lt;/math&amp;gt; (i.e., an even number on a die) or &amp;lt;math&amp;gt;E=\left\{ x;4&amp;lt;x\leq 10\right\} ,&amp;lt;/math&amp;gt; which means ‘&amp;#039;&amp;#039;the set of real numbers which are strictly bigger than&amp;#039;&amp;#039; &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;but less than or equal to &amp;#039;&amp;#039;&amp;lt;math&amp;gt;10&amp;lt;/math&amp;gt;’.&amp;lt;br /&amp;gt;Note that an event, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, is a collection of simple events.&lt;br /&gt;
&lt;br /&gt;
Such concepts can be represented by means of the following Venn Diagram:&lt;br /&gt;
&lt;br /&gt;
[[File:Venn_1.jpg|frameless|600px]]&lt;br /&gt;
&lt;br /&gt;
The sample space, &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; is depicted as a closed rectangle, and the event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; is a closed loop wholly contained within &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; and we write (in set notation) &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In dealing with probability, and in particular the probability of an event (or events) occurring, we shall need to be familiar with &amp;#039;&amp;#039;&amp;#039;UNIONS, INTERSECTIONS&amp;#039;&amp;#039;&amp;#039; and &amp;#039;&amp;#039;&amp;#039;COMPLEMENTS&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
To illustrate these concepts, consider the sample space &amp;lt;math&amp;gt;S=\{x;x\geq 0\},\,&amp;lt;/math&amp;gt; with the following events defined on &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; as depicted in Figure 3.2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E=\{x;4&amp;lt;x\leq 10\},\,F=\{x;7&amp;lt;x\leq 17\},\,G=\{x;x&amp;gt;15\},\,H=\{x;9&amp;lt;x\leq 13\}.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
| (a) Event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;: A closed loop&lt;br /&gt;
| (b) Union: &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2a.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2b.jpg|frameless|300px]]&lt;br /&gt;
|-&lt;br /&gt;
| (c) Intersection: &amp;lt;math&amp;gt;E\cap F&amp;lt;/math&amp;gt;&lt;br /&gt;
| (d) The Null set/event: &amp;lt;math&amp;gt;E\cap G=\emptyset &amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2c.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2d.jpg|frameless|300px]]&lt;br /&gt;
|-&lt;br /&gt;
| (e) Complement of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;\bar{E}&amp;lt;/math&amp;gt;&lt;br /&gt;
| (f) Subset of &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;H\subset F&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;H\cap F=H&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2e.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2f.jpg|frameless|300px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* The &amp;#039;&amp;#039;union&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;E\cup F,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E\cup F=\{x;4&amp;lt;x\leq 17\};&amp;lt;/math&amp;gt; i.e., it contains elements (simple events) which are either in &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; or in &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; or (perhaps) in both. This is illustrated on the Venn diagram by the dark shaded area in diagram (b).&lt;br /&gt;
* The &amp;#039;&amp;#039;intersection&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;E\cap F,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E\cap F=\left\{ x;7&amp;lt;x\leq 10\right\} ;&amp;lt;/math&amp;gt; i.e., it contains elements (simple events) which are common to both &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F.&amp;lt;/math&amp;gt; Again this is depicted by the dark shaded area in (c). If events have no elements in common (as, for example, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;) then they are said to be &amp;#039;&amp;#039;mutually exclusive&amp;#039;&amp;#039;, and we can write &amp;lt;math&amp;gt;E\cap G=\emptyset ,&amp;lt;/math&amp;gt; meaning the &amp;#039;&amp;#039;null set&amp;#039;&amp;#039; which contains no elements. Such a situation is illustrated on the Venn Diagram by events (the two shaded closed loops in (d)) which do not overlap. Notice however that &amp;lt;math&amp;gt;G\cap F\neq \emptyset ,&amp;lt;/math&amp;gt; since &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; have elements in common.&lt;br /&gt;
* The &amp;#039;&amp;#039;complement&amp;#039;&amp;#039; of an event &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; say, is everything defined on the sample space which is not in &amp;lt;math&amp;gt;E.&amp;lt;/math&amp;gt; This event is denoted &amp;lt;math&amp;gt;\bar{E}&amp;lt;/math&amp;gt;, the dark shaded area in (e); here &amp;lt;math&amp;gt;\bar{E}=\left\{ x;x\leq 4\right\} \cup \left\{ x;x&amp;gt;10\right\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
* Finally note that &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; is a sub-set of &amp;lt;math&amp;gt;F;&amp;lt;/math&amp;gt; see (f). It is depicted as the dark closed loop wholly contained within &amp;lt;math&amp;gt;F,&amp;lt;/math&amp;gt; the lighter shaded area, so that &amp;lt;math&amp;gt;H\cap F=H;&amp;lt;/math&amp;gt; if an element in the sample space is a member of &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; then it must also be member of &amp;lt;math&amp;gt;F.&amp;lt;/math&amp;gt; (In mathematical logic, we employ this scenario to indicate that “&amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; implies &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;”, but not necessarily vice-versa.) Notice that &amp;lt;math&amp;gt;G\cap H=\emptyset &amp;lt;/math&amp;gt; but &amp;lt;math&amp;gt;H\cap E\neq \emptyset&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
= Probability =&lt;br /&gt;
&lt;br /&gt;
The term &amp;#039;&amp;#039;probability&amp;#039;&amp;#039; (or some equivalent) is used in everyday conversation and so can not be unfamiliar to the reader. We talk of the probability, or chance, of rain; the likelihood of England winning the World Cup; or, perhaps more scientifically, the chance of getting a &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; when rolling a die. What we shall now do is develop a coherent theory of probability; a theory which allows us to combine and manipulate probabilities in a consistent and meaningful manner. We shall describe ways of dealing with, and describing, uncertainty. This will involve &amp;#039;&amp;#039;rules&amp;#039;&amp;#039; which govern our use of terms like probability.&lt;br /&gt;
&lt;br /&gt;
There have been a number of different approaches (interpretations) of probability. Most depend, at least to some extent, on the notion of relative frequency as now described:&lt;br /&gt;
&lt;br /&gt;
* Suppose an experiment has an outcome of interest &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;. The &amp;#039;&amp;#039;relative frequency interpretation&amp;#039;&amp;#039; of probability says that assuming the experiment can be repeated a large number of times then the relative frequency of observing the outcome &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; will settle down to a &amp;#039;&amp;#039;number&amp;#039;&amp;#039;, denoted &amp;lt;math&amp;gt;\Pr (E),&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;P(E)&amp;lt;/math&amp;gt; or Prob&amp;lt;math&amp;gt;(E),&amp;lt;/math&amp;gt; called the &amp;#039;&amp;#039;&amp;#039;probability&amp;#039;&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
This is illustrated in the next Figure where the proportion of heads obtained after &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; flips of a fair coin is plotted against &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;, as &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; increases; e.g., of the first &amp;lt;math&amp;gt;100&amp;lt;/math&amp;gt; flips, &amp;lt;math&amp;gt;55&amp;lt;/math&amp;gt; were heads (&amp;lt;math&amp;gt;55\%&amp;lt;/math&amp;gt;). Notice that the plot becomes less ‘wobbly’ after about &amp;lt;math&amp;gt;n=220&amp;lt;/math&amp;gt; and appears to be settling down to the value of &amp;lt;math&amp;gt;\frac{1}{2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
[[File:Prob_coin.jpg|frameless|600px]]&lt;br /&gt;
&lt;br /&gt;
Due to this interpretation of probability, we often use observed sample proportions to approximate underlying probabilities of interest; see, for example, Question 4 of Exercise 2. There are, of course, other interpretations of probability; e.g., the subjective interpretation which simply expresses the strength of one’s belief about an event of interest such as whether Manchester United will win the European Cup! Any one of these interpretations can be used in practical situations provided the implied notion of probability follows a simple set of &amp;#039;&amp;#039;axioms&amp;#039;&amp;#039; or &amp;#039;&amp;#039;rules&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
== The axioms of probability ==&lt;br /&gt;
&lt;br /&gt;
There are just &amp;#039;&amp;#039;three &amp;#039;&amp;#039;basic rules that must be obeyed when dealing with probabilities:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;For any event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; defined on &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; i.e., &amp;lt;math&amp;gt;E\subset S,\,\,\Pr (E)\geq 0&amp;lt;/math&amp;gt;; &amp;#039;&amp;#039;probabilities are non-negative&amp;#039;&amp;#039;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (S)=1;&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;having defined the sample space of outcomes, one of these outcomes must be observed&amp;#039;&amp;#039;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;If events &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;, defined on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, are mutually exclusive, so that &amp;lt;math&amp;gt;E\cap F=\emptyset &amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) =\Pr \left( E\right)+\Pr \left( F\right) .&amp;lt;/math&amp;gt; In general, for any set of mutually exclusive events, &amp;lt;math&amp;gt;E_{1},E_{2},\ldots ,E_{k},&amp;lt;/math&amp;gt; defined on &amp;lt;math&amp;gt;S:&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (E_{1}\cup E_{2}\cup \ldots \cup E_{k})=\Pr (E_{1})+\Pr (E_{2})+\ldots +\Pr (E_{k})&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;i.e., &amp;lt;math&amp;gt;\Pr \left( \bigcup_{j=1}^{k}E_{j}\right) =\sum_{j=1}^{k}\Pr (E_{j}).&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In terms of the Venn Diagram, one can (and should) usefully think of the area of &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; relative to that of &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; as providing an indication of probability. (Note, from axiom 2, that the area of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; is implicitly normalised to be unity).&lt;br /&gt;
&lt;br /&gt;
Also observe that, contrary to what you may have believed, it is not one of the rules that &amp;lt;math&amp;gt;\Pr (E)\leq 1&amp;lt;/math&amp;gt; for any event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;. Rather, this is an implication of the &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; rules given:&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;implications: &amp;#039;&amp;#039;&amp;#039;it must be that for any event &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; defined on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;E\cap \bar{E}=\emptyset &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;E\cup \bar{E}=S.&amp;lt;/math&amp;gt; By Axiom &amp;lt;math&amp;gt;1,&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\Pr (E)\geq 0&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Pr \left( \bar{E}\right) \geq 0&amp;lt;/math&amp;gt; and by Axiom &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\Pr(E)+\Pr (\bar{E})=\Pr (S).&amp;lt;/math&amp;gt; So &amp;lt;math&amp;gt;\Pr \left( E\right) +\Pr \left( \bar{E}\right) =1,&amp;lt;/math&amp;gt; by Axiom &amp;lt;math&amp;gt;2.&amp;lt;/math&amp;gt; This implies that&lt;br /&gt;
&lt;br /&gt;
# &amp;lt;math&amp;gt;0\leq \Pr (E)\leq 1&amp;lt;/math&amp;gt;&lt;br /&gt;
# &amp;lt;math&amp;gt;\Pr (\bar{E})=1-\Pr (E)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first of these is what we might have expected from probability (a number lying between &amp;lt;math&amp;gt;0&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;). The second implication is also very important; it says that the probability of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; not happening is ‘&amp;#039;&amp;#039;one minus the probability of it happening&amp;#039;&amp;#039;’. Thus when rolling a die, the probability of getting &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; is one minus the probability of getting either a &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;2&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;5.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These axioms show how to calculate probabilities on a sample space of equally likely outcomes. For example, and as we have already noted, the experiment of rolling a fair die defines a sample space of six mutually exclusive and equally likely outcomes (&amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; dots on the up-turned face). The axioms then say that each of the six probabilities is positive, that they add to 1 and that they are all the same. Thus, the probability of any one of the outcomes must simply be &amp;lt;math&amp;gt;\frac{1}{6},&amp;lt;/math&amp;gt; which may accord with your intuition. A similar analysis reveals that the probability of drawing a club from a deck of &amp;lt;math&amp;gt;52&amp;lt;/math&amp;gt; cards is &amp;lt;math&amp;gt;\frac{13}{52},&amp;lt;/math&amp;gt; since any one of the &amp;lt;math&amp;gt;52&amp;lt;/math&amp;gt; cards has an equal chance of being drawn and &amp;lt;math&amp;gt;13&amp;lt;/math&amp;gt; of them are clubs. Notice the importance of the assumption of equally likely outcomes here.&lt;br /&gt;
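Under the equally-likely-outcomes assumption these calculations are just counts of favourable outcomes over total outcomes. A minimal sketch (the `prob` helper and the card encoding are our own illustrations, not part of the notes):

```python
from fractions import Fraction

def prob(event, sample_space):
    """Probability under equally likely outcomes:
    favourable outcomes divided by total outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

# Rolling a fair die: six equally likely outcomes.
die = set(range(1, 7))
print(prob({6}, die))        # 1/6

# A 52-card deck encoded as (rank, suit) pairs; 13 cards are clubs ("C").
deck = {(rank, suit) for rank in range(13) for suit in "CDHS"}
clubs = {card for card in deck if card[1] == "C"}
print(prob(clubs, deck))     # 13/52, which reduces to 1/4
```

Using `Fraction` keeps the answers exact, matching the hand calculations above.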
&lt;br /&gt;
In this, and the next section of notes, we shall see how these axioms can be used. Firstly, consider the construction of a probability for the &amp;#039;&amp;#039;union&amp;#039;&amp;#039; of two events; i.e., the probability that &amp;#039;&amp;#039;either &amp;#039;&amp;#039;&amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; or (perhaps) &amp;#039;&amp;#039;both &amp;#039;&amp;#039;will occur. Such a probability is embodied in the &amp;#039;&amp;#039;addition rule of probability&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
== The addition rule of probability ==&lt;br /&gt;
&lt;br /&gt;
When rolling a fair die, let &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; denote the event of an “odd number of dots” and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; the event of the “number of dots being greater than or equal to &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt;”. What is the probability of the event &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;? To calculate this we can collect together all the mutually exclusive (simple) events which comprise &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;, and then add up the probabilities (by axiom 3). These simple events are &amp;lt;math&amp;gt;1,3,4,5&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; dots. Each has a probability of &amp;lt;math&amp;gt;\frac{1}{6},&amp;lt;/math&amp;gt; so the required total probability is: &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) =\frac{5}{6}&amp;lt;/math&amp;gt;. Consider carefully how this probability is constructed and note, in particular, that &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) \neq \Pr \left( E\right) +\Pr \left( F\right) &amp;lt;/math&amp;gt; since &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; have a simple event in common (namely &amp;lt;math&amp;gt;5&amp;lt;/math&amp;gt; dots).&lt;br /&gt;
&lt;br /&gt;
In general, we can calculate the probability of the union of events using the &amp;#039;&amp;#039;addition rule of probability&amp;#039;&amp;#039;, as follows.&lt;br /&gt;
&lt;br /&gt;
* For any events, &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F\subset S:\Pr (E\cup F)=\Pr (E)+\Pr (F)-\Pr (E\cap F).&amp;lt;/math&amp;gt; So, in general, &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) \leq \Pr (E)+\Pr (F).&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generalises to three events, &amp;lt;math&amp;gt;E_{1},E_{2}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;E_{3}&amp;lt;/math&amp;gt; as&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr (E_{1}\cup E_{2}\cup E_{3}) &amp;amp;=&amp;amp;\Pr (E_{1})+\Pr (E_{2})+\Pr (E_{3}) \\&lt;br /&gt;
&amp;amp;&amp;amp;-\Pr (E_{1}\cap E_{2})-\Pr (E_{1}\cap E_{3})-\Pr (E_{2}\cap E_{3}) \\&lt;br /&gt;
&amp;amp;&amp;amp;+\Pr (E_{1}\cap E_{2}\cap E_{3}).\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
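The three-event version of the addition rule can be verified by brute-force enumeration on any small, equally likely sample space; the 12-outcome space and the three events below are arbitrary choices made only for illustration:

```python
from fractions import Fraction

S = set(range(1, 13))                      # 12 equally likely outcomes
E1, E2, E3 = {1, 2, 3, 4}, {3, 4, 5, 6}, {4, 6, 7, 8}

def pr(A):
    """Probability of A under equally likely outcomes on S."""
    return Fraction(len(A), len(S))

# Left-hand side: probability of the union, counted directly.
lhs = pr(E1 | E2 | E3)

# Right-hand side: inclusion-exclusion over singles, pairs and the triple.
rhs = (pr(E1) + pr(E2) + pr(E3)
       - pr(E1 & E2) - pr(E1 & E3) - pr(E2 & E3)
       + pr(E1 & E2 & E3))

print(lhs, rhs)   # both equal 8/12 = 2/3 for these sets
```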
&lt;br /&gt;
We can demonstrate this as follows.&lt;br /&gt;
&lt;br /&gt;
Note that&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E\cup F=\left( E\cap \bar{F}\right) \cup \left( E\cap F\right) \cup \left(\bar{E}\cap F\right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
the union of &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; mutually exclusive events. These mutually exclusive events are depicted by the shaded areas &amp;lt;math&amp;gt;\mathbf{a,}&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\mathbf{b}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\mathbf{c}&amp;lt;/math&amp;gt;, respectively, in the next Figure.&lt;br /&gt;
&lt;br /&gt;
[[File:Prob_add.jpg|frameless|500px]]&lt;br /&gt;
&lt;br /&gt;
Then by Axiom &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt;, and from the fact that the three events &amp;lt;math&amp;gt;\left( E\cap\bar{F}\right) &amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;\left( E\cap F\right) &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\left( \bar{E}\cap F\right)&amp;lt;/math&amp;gt; are mutually exclusive (so that the “area” occupied by &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt; is simply &amp;lt;math&amp;gt;\mathbf{a+b+c}&amp;lt;/math&amp;gt;), we have&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Pr \left( E\cup F\right) =\Pr \left( E\cap \bar{F}\right) +\Pr \left( \bar{E}\cap F\right) +\Pr \left( E\cap F\right) .&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
But also by Axiom &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt;, since &amp;lt;math&amp;gt;E=\left( E\cap \bar{F}\right) \cup \left(E\cap F\right) &amp;lt;/math&amp;gt;, it must be that &amp;lt;math&amp;gt;\Pr (E)=\Pr \left( E\cap \bar{F}\right)+\Pr (E\cap F);&amp;lt;/math&amp;gt; similarly, &amp;lt;math&amp;gt;\Pr \left( \bar{E}\cap F\right) =\Pr \left(F\right) -\Pr \left( E\cap F\right)&amp;lt;/math&amp;gt;. Putting all of this together gives&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Pr (E\cup F)=\Pr (E)+\Pr (F)-\Pr (E\cap F).&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; are mutually exclusive, so that &amp;lt;math&amp;gt;E\cap F=\emptyset&amp;lt;/math&amp;gt;, this rule reduces to Axiom 3: &amp;lt;math&amp;gt;\Pr (E\cup F)=\Pr (E)+\Pr (F)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example: &amp;#039;&amp;#039;What is the probability of drawing a Queen (&amp;lt;math&amp;gt;Q &amp;lt;/math&amp;gt;) or a Club (&amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt;) in a single draw from a pack of cards? Now, &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; out of &amp;lt;math&amp;gt;52 &amp;lt;/math&amp;gt; cards are Queens, so &amp;lt;math&amp;gt;\Pr \left( Q\right) =\frac{4}{52},&amp;lt;/math&amp;gt; whilst &amp;lt;math&amp;gt;\Pr\left( C\right) =\frac{13}{52}.&amp;lt;/math&amp;gt; The probability of drawing the Queen of Clubs is simply &amp;lt;math&amp;gt;\frac{1}{52};&amp;lt;/math&amp;gt; i.e., &amp;lt;math&amp;gt;\Pr \left( Q\cap C\right) =\frac{1}{52}&amp;lt;/math&amp;gt;. What we require is a Club or a Queen, for which the probability is&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr \left( Q\cup C\right) &amp;amp;=&amp;amp;\Pr \left( Q\right) +\Pr \left( C\right) -\Pr\left( Q\cap C\right) \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{4}{52}+\frac{13}{52}-\frac{1}{52} \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{16}{52}=\frac{4}{13}.\end{aligned}&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example: &amp;#039;&amp;#039;Consider a car journey from Manchester to London via the M6 and M1. Let &amp;lt;math&amp;gt;E=&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;heavy traffic somewhere on route&amp;#039;&amp;#039; and &amp;lt;math&amp;gt;F=&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;roadworks somewhere on route&amp;#039;&amp;#039;. It is estimated that &amp;lt;math&amp;gt;\Pr (E)=0.8&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Pr (F)=0.4,&amp;lt;/math&amp;gt; whilst the probability of NOT encountering both is &amp;lt;math&amp;gt;\Pr (\overline{E\cap F})=0.6.&amp;lt;/math&amp;gt; What is the probability of encountering heavy traffic or roadworks?&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;We require &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) .&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr (E\cup F) &amp;amp;=&amp;amp;\Pr (E)+\Pr (F)-\Pr (E\cap F) \\&lt;br /&gt;
&amp;amp;=&amp;amp;\Pr (E)+\Pr (F)-(1-\Pr (\overline{E\cap F})) \\&lt;br /&gt;
&amp;amp;=&amp;amp;0.8+0.4-1+0.6 \\&lt;br /&gt;
&amp;amp;=&amp;amp;0.8=\Pr (E)\end{aligned}&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Notice that this implies, in this case, that &amp;lt;math&amp;gt;F\subset E&amp;lt;/math&amp;gt; (why?). This &amp;#039;&amp;#039;model &amp;#039;&amp;#039;then implies that when there are roadworks somewhere on route you are bound to encounter heavy traffic; on the other hand, you can encounter heavy traffic on route without ever passing through roadworks. (My own experience of this motorway inclines me towards this implication!)&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
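The arithmetic of this example is worth tracing through explicitly; a short sketch of the same calculation:

```python
pr_E = 0.8           # Pr(heavy traffic somewhere on route)
pr_F = 0.4           # Pr(roadworks somewhere on route)
pr_not_both = 0.6    # Pr(complement of E intersect F)

# Complement rule: Pr(E and F) = 1 - Pr(not both) = 0.4, which equals Pr(F).
pr_both = 1 - pr_not_both

# Addition rule: Pr(E or F) = Pr(E) + Pr(F) - Pr(E and F).
pr_union = pr_E + pr_F - pr_both
print(round(pr_union, 10))   # 0.8, i.e. Pr(E)
```

That `pr_both` comes out equal to `pr_F` is precisely what forces the conclusion in the text that the event F is contained in E.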
&lt;br /&gt;
Similar concepts apply when manipulating proportions as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example&amp;#039;&amp;#039;: A sample of 1000 undergraduates was asked which of Mathematics, Physics and Chemistry they took at A-level. The following responses were obtained: 100 just took Mathematics; 70 just took Physics; 100 just took Chemistry; 150 took Mathematics and Physics, but not Chemistry; 40 took Mathematics and Chemistry, but not Physics; and 240 took Physics and Chemistry, but not Mathematics. What proportion took all three?&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;This can be addressed with the following diagram:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;[[File:Prob_Alevels.jpg|frameless|500px]]&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;The shaded area contains the number who took all three, which can be deduced from the above information (since the total of the numbers assigned to each part of the Venn diagram must be &amp;lt;math&amp;gt;1000&amp;lt;/math&amp;gt;). The answer is therefore &amp;lt;math&amp;gt;30\%&amp;lt;/math&amp;gt; (being &amp;lt;math&amp;gt;300&amp;lt;/math&amp;gt; out of &amp;lt;math&amp;gt;1000&amp;lt;/math&amp;gt;).&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Two further results on unions, intersections and complements which are of use (and which are fairly easy to demonstrate using Venn diagrams) are &amp;#039;&amp;#039;&amp;#039;de Morgan Laws&amp;#039;&amp;#039;&amp;#039;:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\left( \bar{A}\cap \bar{B}\right) =\left( \overline{A\cup B}\right) &amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\bar{A}\cup \bar{B}=\left( \overline{A\cap B}\right) &amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Footnotes =&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Intro&amp;diff=2858</id>
		<title>Probability Intro</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Intro&amp;diff=2858"/>
				<updated>2013-08-07T21:12:08Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: /* The addition rule of probability */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
&lt;br /&gt;
= Introducing Probability =&lt;br /&gt;
&lt;br /&gt;
So far we have been looking at ways of summarising samples of data drawn from an underlying population of interest. Although at times tedious, all such arithmetic calculations are fairly mechanical and straightforward to apply. To remind ourselves, one of the primary reasons for wishing to summarise data is to assist in the development of inferences about the population from which the data were taken. That is to say, we would like to elicit some information about the mechanism which generated the observed data.&lt;br /&gt;
&lt;br /&gt;
We now start on the process of developing mathematical ways of formulating inferences and this requires the use of &amp;#039;&amp;#039;probability&amp;#039;&amp;#039;. This becomes clear if we think back to one of the early questions posed in this course: &amp;#039;&amp;#039;prior to sampling is it possible to predict with absolute certainty what will be observed&amp;#039;&amp;#039;? The answer to this question is &amp;#039;&amp;#039;no&amp;#039;&amp;#039;; although it would be of interest to know how &amp;#039;&amp;#039;likely&amp;#039;&amp;#039; it is that certain values would be observed. Or, what is the &amp;#039;&amp;#039;probability&amp;#039;&amp;#039; of observing certain values?&lt;br /&gt;
&lt;br /&gt;
Before proceeding, we need some more tools:&lt;br /&gt;
&lt;br /&gt;
= Venn diagrams =&lt;br /&gt;
&lt;br /&gt;
Venn diagrams (and diagrams in general) are of enormous help in trying to understand, and manipulate probability. We begin with some basic definitions, some of which we have encountered before.&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Experiment:&amp;#039;&amp;#039;&amp;#039; any process which, when applied, provides data or an outcome; e.g., rolling a die and observing the number of dots on the upturned face; recording the amount of rainfall in Manchester over a period of time.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Sample Space:&amp;#039;&amp;#039;&amp;#039; set of possible outcomes of an experiment; e.g., &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; (or &amp;lt;math&amp;gt;\Omega &amp;lt;/math&amp;gt;) &amp;lt;math&amp;gt;=&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\{1,2,3,4,5,6\}&amp;lt;/math&amp;gt;, which is the sample space of rolling a die. Or &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;=&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\{x;x\geq 0\}&amp;lt;/math&amp;gt;, which is the sample space of an experiment where the outcomes can be any non-negative real number, or ‘&amp;#039;&amp;#039;the set of non-negative real numbers&amp;#039;&amp;#039;’.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Simple Event&amp;#039;&amp;#039;&amp;#039;: just one of the possible outcomes on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Event:&amp;#039;&amp;#039;&amp;#039; a &amp;#039;&amp;#039;subset&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, denoted &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt;; e.g., &amp;lt;math&amp;gt;E=\left\{ 2,4,6\right\}&amp;lt;/math&amp;gt; (i.e. any even number on a die) or &amp;lt;math&amp;gt;E=\left\{ x;4&amp;lt;x\leq 10\right\} ,&amp;lt;/math&amp;gt; which means ‘&amp;#039;&amp;#039;the set of real numbers which are strictly bigger than&amp;#039;&amp;#039; &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;but less than or equal to &amp;#039;&amp;#039;&amp;lt;math&amp;gt;10&amp;lt;/math&amp;gt;’.&amp;lt;br /&amp;gt;Note that an event, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, is a collection of simple events.&lt;br /&gt;
&lt;br /&gt;
Such concepts can be represented by means of the following Venn Diagram:&lt;br /&gt;
&lt;br /&gt;
[[File:Venn_1.jpg|frameless|600px]]&lt;br /&gt;
&lt;br /&gt;
The sample space, &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; is depicted as a closed rectangle, and the event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; is a closed loop wholly contained within &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;; we write (in set notation) &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In dealing with probability, and in particular the probability of an event (or events) occurring, we shall need to be familiar with &amp;#039;&amp;#039;&amp;#039;UNIONS, INTERSECTIONS&amp;#039;&amp;#039;&amp;#039; and &amp;#039;&amp;#039;&amp;#039;COMPLEMENTS&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
To illustrate these concepts, consider the sample space &amp;lt;math&amp;gt;S=\{x;x\geq 0\},\,&amp;lt;/math&amp;gt; with the following events defined on &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; as depicted in Figure 3.2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E=\{x;4&amp;lt;x\leq 10\},\,F=\{x;7&amp;lt;x\leq 17\},\,G=\{x;x&amp;gt;15\},\,H=\{x;9&amp;lt;x\leq 13\}. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
| (a) Event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;: A closed loop&lt;br /&gt;
| (b) Union: &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2a.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2b.jpg|frameless|300px]]&lt;br /&gt;
|-&lt;br /&gt;
| (c) Intersection: &amp;lt;math&amp;gt;E\cap F&amp;lt;/math&amp;gt;&lt;br /&gt;
| (d) The Null set/event: &amp;lt;math&amp;gt;E\cap G=\emptyset &amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2c.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2d.jpg|frameless|300px]]&lt;br /&gt;
|-&lt;br /&gt;
| (e) Complement of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;\bar{E}&amp;lt;/math&amp;gt;&lt;br /&gt;
| (f) Subset of &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;H\subset F&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;H\cap F=H&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2e.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2f.jpg|frameless|300px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* The &amp;#039;&amp;#039;union&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;E\cup F,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E\cup F=\{x;4&amp;lt;x\leq 17\};&amp;lt;/math&amp;gt; i.e., it contains elements (simple events) which are either in &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; or in &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; or (perhaps) in both. This is illustrated on the Venn diagram by the dark shaded area in diagram (b).&lt;br /&gt;
* The &amp;#039;&amp;#039;intersection&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;E\cap F,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E\cap F=\left\{ x;7&amp;lt;x\leq 10\right\} ;&amp;lt;/math&amp;gt; i.e., it contains elements (simple events) which are common to both &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F.&amp;lt;/math&amp;gt; Again this is depicted by the dark shaded area in (c). If events have no elements in common (as, for example, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;) then they are said to be &amp;#039;&amp;#039;mutually exclusive&amp;#039;&amp;#039;, and we can write &amp;lt;math&amp;gt;E\cap G=\emptyset ,&amp;lt;/math&amp;gt; meaning the &amp;#039;&amp;#039;null set&amp;#039;&amp;#039; which contains no elements. Such a situation is illustrated on the Venn Diagram by events (the two shaded closed loops in (d)) which do not overlap. Notice however that &amp;lt;math&amp;gt;G\cap F\neq \emptyset ,&amp;lt;/math&amp;gt; since &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; have elements in common.&lt;br /&gt;
* The &amp;#039;&amp;#039;complement&amp;#039;&amp;#039; of an event &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; say, is everything defined on the sample space which is not in &amp;lt;math&amp;gt;E.&amp;lt;/math&amp;gt; This event is denoted &amp;lt;math&amp;gt;\bar{E}&amp;lt;/math&amp;gt;, the dark shaded area in (e); here &amp;lt;math&amp;gt;\bar{E}=\left\{ x;x\leq 4\right\} \cup \left\{ x;x&amp;gt;10\right\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
* Finally note that &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; is a sub-set of &amp;lt;math&amp;gt;F;&amp;lt;/math&amp;gt; see (f). It is depicted as the dark closed loop wholly contained within &amp;lt;math&amp;gt;F,&amp;lt;/math&amp;gt; the lighter shaded area, so that &amp;lt;math&amp;gt;H\cap F=H;&amp;lt;/math&amp;gt; if an element in the sample space is a member of &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; then it must also be member of &amp;lt;math&amp;gt;F.&amp;lt;/math&amp;gt; (In mathematical logic, we employ this scenario to indicate that “&amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; implies &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;”, but not necessarily vice-versa.) Notice that &amp;lt;math&amp;gt;G\cap H=\emptyset &amp;lt;/math&amp;gt; but &amp;lt;math&amp;gt;H\cap E\neq \emptyset&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
= Probability =&lt;br /&gt;
&lt;br /&gt;
The term &amp;#039;&amp;#039;probability&amp;#039;&amp;#039; (or some equivalent) is used in everyday conversation and so cannot be unfamiliar to the reader. We talk of the probability, or chance, of rain; the likelihood of England winning the World Cup; or, perhaps more scientifically, the chance of getting a &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; when rolling a die. What we shall now do is develop a coherent theory of probability; a theory which allows us to combine and manipulate probabilities in a consistent and meaningful manner. We shall describe ways of dealing with, and describing, uncertainty. This will involve &amp;#039;&amp;#039;rules&amp;#039;&amp;#039; which govern our use of terms like probability.&lt;br /&gt;
&lt;br /&gt;
There have been a number of different approaches (interpretations) of probability. Most depend, at least to some extent, on the notion of relative frequency as now described:&lt;br /&gt;
&lt;br /&gt;
* Suppose an experiment has an outcome of interest &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;. The &amp;#039;&amp;#039;relative frequency interpretation&amp;#039;&amp;#039; of probability says that, if the experiment can be repeated a large number of times, then the relative frequency of observing the outcome &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; will settle down to a &amp;#039;&amp;#039;number&amp;#039;&amp;#039;, denoted &amp;lt;math&amp;gt;\Pr (E),&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;P(E)&amp;lt;/math&amp;gt; or Prob&amp;lt;math&amp;gt;(E),&amp;lt;/math&amp;gt; called the &amp;#039;&amp;#039;&amp;#039;probability&amp;#039;&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
This is illustrated in the next Figure where the proportion of heads obtained after &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; flips of a fair coin is plotted against &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;, as &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; increases; e.g., of the first &amp;lt;math&amp;gt;100&amp;lt;/math&amp;gt; flips, &amp;lt;math&amp;gt;55&amp;lt;/math&amp;gt; were heads (&amp;lt;math&amp;gt;55\%&amp;lt;/math&amp;gt;). Notice that the plot becomes less ‘wobbly’ after about &amp;lt;math&amp;gt;n=220&amp;lt;/math&amp;gt; and appears to be settling down to the value of &amp;lt;math&amp;gt;\frac{1}{2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
[[File:Prob_coin.jpg|frameless|600px]]&lt;br /&gt;
&lt;br /&gt;
Due to this interpretation of probability, we often use observed sample proportions to approximate underlying probabilities of interest; see, for example, Question 4 of Exercise 2. There are, of course, other interpretations of probability; e.g., the subjective interpretation which simply expresses the strength of one’s belief about an event of interest such as whether Manchester United will win the European Cup! Any one of these interpretations can be used in practical situations provided the implied notion of probability follows a simple set of &amp;#039;&amp;#039;axioms&amp;#039;&amp;#039; or &amp;#039;&amp;#039;rules&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
== The axioms of probability ==&lt;br /&gt;
&lt;br /&gt;
There are just &amp;#039;&amp;#039;three &amp;#039;&amp;#039;basic rules that must be obeyed when dealing with probabilities:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;For any event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; defined on &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; i.e., &amp;lt;math&amp;gt;E\subset S,\,\,\Pr (E)\geq 0&amp;lt;/math&amp;gt;; &amp;#039;&amp;#039;probabilities are non-negative&amp;#039;&amp;#039;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (S)=1;&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;having defined the sample space of outcomes, one of these outcomes must be observed&amp;#039;&amp;#039;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;If events &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;, defined on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, are mutually exclusive, so that &amp;lt;math&amp;gt;E\cap F=\emptyset &amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) =\Pr \left( E\right)+\Pr \left( F\right) .&amp;lt;/math&amp;gt; In general, for any set of mutually exclusive events, &amp;lt;math&amp;gt;E_{1},E_{2},\ldots ,E_{k},&amp;lt;/math&amp;gt; defined on &amp;lt;math&amp;gt;S:&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (E_{1}\cup E_{2}\cup \ldots \cup E_{k})=\Pr (E_{1})+\Pr (E_{2})+\ldots +\Pr (E_{k})&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;i.e., &amp;lt;math&amp;gt;\Pr \left( \bigcup_{j=1}^{k}E_{j}\right) =\sum_{j=1}^{k}\Pr (E_{j}).&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In terms of the Venn Diagram, one can (and should) usefully think of the area of &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; relative to that of &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; as providing an indication of probability. (Note, from axiom 2, that the area of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; is implicitly normalised to be unity).&lt;br /&gt;
&lt;br /&gt;
Also observe that, contrary to what you may have believed, it is not one of the rules that &amp;lt;math&amp;gt;\Pr (E)\leq 1&amp;lt;/math&amp;gt; for any event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;. Rather, this is an implication of the &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; rules given:&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;implications: &amp;#039;&amp;#039;&amp;#039;it must be that for any event &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; defined on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;E\cap \bar{E}=\emptyset &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;E\cup \bar{E}=S.&amp;lt;/math&amp;gt; By Axiom &amp;lt;math&amp;gt;1,&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\Pr (E)\geq 0&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Pr \left( \bar{E}\right) \geq 0&amp;lt;/math&amp;gt; and by Axiom &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\Pr(E)+\Pr (\bar{E})=\Pr (S).&amp;lt;/math&amp;gt; So &amp;lt;math&amp;gt;\Pr \left( E\right) +\Pr \left( \bar{E}\right) =1,&amp;lt;/math&amp;gt; by Axiom &amp;lt;math&amp;gt;2.&amp;lt;/math&amp;gt; This implies that&lt;br /&gt;
&lt;br /&gt;
# &amp;lt;math&amp;gt;0\leq \Pr (E)\leq 1&amp;lt;/math&amp;gt;&lt;br /&gt;
# &amp;lt;math&amp;gt;\Pr (\bar{E})=1-\Pr (E)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first of these is what we might have expected from probability (a number lying between &amp;lt;math&amp;gt;0&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;). The second implication is also very important; it says that the probability of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; not happening is ‘&amp;#039;&amp;#039;one minus the probability of it happening&amp;#039;&amp;#039;’. Thus when rolling a die, the probability of getting &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; is one minus the probability of getting either a &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;2&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;5.&amp;lt;/math&amp;gt;&lt;br /&gt;
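These two implications can be checked numerically for the die example; a minimal Python sketch, where the only assumption is the uniform probability of &amp;lt;math&amp;gt;\frac{1}{6}&amp;lt;/math&amp;gt; per outcome:

```python
from fractions import Fraction

# Fair die: six equally likely outcomes, each with probability 1/6
S = {1, 2, 3, 4, 5, 6}
p = {outcome: Fraction(1, 6) for outcome in S}

E = {6}            # the event "rolling a six"
E_bar = S - E      # its complement: rolling 1, 2, 3, 4 or 5

pr_E = sum(p[x] for x in E)
pr_E_bar = sum(p[x] for x in E_bar)

assert 0 <= pr_E <= 1          # implication 1: 0 <= Pr(E) <= 1
assert pr_E_bar == 1 - pr_E    # implication 2: Pr(E-bar) = 1 - Pr(E)
```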
&lt;br /&gt;
These axioms imply how to calculate probabilities on a sample space of equally likely outcomes. For example, and as we have already noted, the experiment of rolling a fair die defines a sample space of six mutually exclusive and equally likely outcomes (&amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; dots on the up-turned face). The axioms then say that each of the six probabilities is positive, that they add to 1 and that they are all the same. Thus, the probability of any one of the outcomes must simply be &amp;lt;math&amp;gt;\frac{1}{6},&amp;lt;/math&amp;gt; which may accord with your intuition. A similar analysis reveals that the probability of drawing a club from a deck of &amp;lt;math&amp;gt;52&amp;lt;/math&amp;gt; cards is &amp;lt;math&amp;gt;\frac{13}{52},&amp;lt;/math&amp;gt; since any one of the &amp;lt;math&amp;gt;52&amp;lt;/math&amp;gt; cards has an equal chance of being drawn and &amp;lt;math&amp;gt;13&amp;lt;/math&amp;gt; of them are clubs. Notice the importance of the assumption of equally likely outcomes here.&lt;br /&gt;
&lt;br /&gt;
In this, and the next section of notes, we shall see how these axioms can be used. Firstly, consider the construction of a probability for the &amp;#039;&amp;#039;union&amp;#039;&amp;#039; of two events; i.e., the probability that &amp;#039;&amp;#039;either &amp;#039;&amp;#039;&amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; or (perhaps) &amp;#039;&amp;#039;both &amp;#039;&amp;#039;will occur. Such a probability is embodied in the &amp;#039;&amp;#039;addition rule of probability&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
== The addition rule of probability ==&lt;br /&gt;
&lt;br /&gt;
When rolling a fair die, let &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; denote the event of an “odd number of dots” and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; the event of the “number of dots being greater than, or equal to, &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt;”. What is the probability of the event &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;? To calculate this we can collect together all the mutually exclusive (simple) events which comprise &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;, and then add up the probabilities (by axiom 3). These simple events are &amp;lt;math&amp;gt;1,3,4,5&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; dots. Each has a probability of &amp;lt;math&amp;gt;\frac{1}{6},&amp;lt;/math&amp;gt; so the required total probability is: &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) =\frac{5}{6}&amp;lt;/math&amp;gt;. Consider carefully how this probability is constructed and note, in particular, that &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) \neq \Pr \left( E\right) +\Pr \left( F\right) &amp;lt;/math&amp;gt; since &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; have a simple event in common (namely &amp;lt;math&amp;gt;5&amp;lt;/math&amp;gt; dots).&lt;br /&gt;
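The same calculation can be carried out by brute-force enumeration; a minimal Python sketch:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # fair die, each outcome has probability 1/6
E = {1, 3, 5}            # odd number of dots
F = {4, 5, 6}            # at least 4 dots

union = E | F                              # the simple events comprising E ∪ F
pr_union = Fraction(len(union), len(S))    # add 1/6 per simple event (axiom 3)

assert union == {1, 3, 4, 5, 6}
assert pr_union == Fraction(5, 6)

# Naively adding Pr(E) + Pr(F) over-counts the shared outcome (5 dots)
assert Fraction(len(E), 6) + Fraction(len(F), 6) != pr_union
```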
&lt;br /&gt;
In general, we can calculate the probability of the union of events using the &amp;#039;&amp;#039;addition rule of probability&amp;#039;&amp;#039;, as follows.&lt;br /&gt;
&lt;br /&gt;
* For any events, &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F\subset S:\Pr (E\cup F)=\Pr (E)+\Pr (F)-\Pr (E\cap F).&amp;lt;/math&amp;gt; So, in general, &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) \leq \Pr (E)+\Pr (F).&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generalises to three events, &amp;lt;math&amp;gt;E_{1},E_{2}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;E_{3}&amp;lt;/math&amp;gt; as&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr (E_{1}\cup E_{2}\cup E_{3}) &amp;amp;=&amp;amp;\Pr (E_{1})+\Pr (E_{2})+\Pr (E_{3}) \\&lt;br /&gt;
&amp;amp;&amp;amp;-\Pr (E_{1}\cap E_{2})-\Pr (E_{1}\cap E_{3})-\Pr (E_{2}\cap E_{3}) \\&lt;br /&gt;
&amp;amp;&amp;amp;+\Pr (E_{1}\cap E_{2}\cap E_{3}).\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
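The three-event version can be confirmed by enumeration on the die sample space; the particular events below are arbitrary choices for illustration:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}

def pr(event):
    """Probability of an event on a sample space of equally likely outcomes."""
    return Fraction(len(event), len(S))

# Three overlapping events, chosen only to exercise every term of the rule
E1, E2, E3 = {1, 2, 3}, {2, 3, 4}, {3, 4, 5}

lhs = pr(E1 | E2 | E3)
rhs = (pr(E1) + pr(E2) + pr(E3)
       - pr(E1 & E2) - pr(E1 & E3) - pr(E2 & E3)
       + pr(E1 & E2 & E3))

assert lhs == rhs == Fraction(5, 6)
```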
&lt;br /&gt;
We can demonstrate this as follows.&lt;br /&gt;
&lt;br /&gt;
Note that&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E\cup F=\left( E\cap \bar{F}\right) \cup \left( E\cap F\right) \cup \left(\bar{E}\cap F\right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
the union of &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; mutually exclusive events. These mutually exclusive events are depicted by the shaded areas &amp;lt;math&amp;gt;\mathbf{a,}&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\mathbf{b}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\mathbf{c}&amp;lt;/math&amp;gt;, respectively, in the next Figure.&lt;br /&gt;
&lt;br /&gt;
[[File:Prob_add.jpg|frameless|500px]]&lt;br /&gt;
&lt;br /&gt;
Then, since the three events &amp;lt;math&amp;gt;\left( E\cap\bar{F}\right) &amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;\left( E\cap F\right) &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\left( \bar{E}\cap F\right)&amp;lt;/math&amp;gt; are mutually exclusive, the “area” occupied by &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt; is simply &amp;lt;math&amp;gt;\mathbf{a+b+c}&amp;lt;/math&amp;gt; and, by Axiom &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt;,&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Pr \left( E\cup F\right) =\Pr \left( E\cap \bar{F}\right) +\Pr \left( \bar{E}\cap F\right) +\Pr \left( E\cap F\right) .&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
But also by Axiom &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt;, since &amp;lt;math&amp;gt;E=\left( E\cap \bar{F}\right) \cup \left(E\cap F\right) &amp;lt;/math&amp;gt;, it must be that &amp;lt;math&amp;gt;\Pr (E)=\Pr \left( E\cap \bar{F}\right)+\Pr (E\cap F);&amp;lt;/math&amp;gt; similarly, &amp;lt;math&amp;gt;\Pr \left( \bar{E}\cap F\right) =\Pr \left(F\right) -\Pr \left( E\cap F\right)&amp;lt;/math&amp;gt;. Putting all of this together gives&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Pr (E\cup F)=\Pr (E)+\Pr (F)-\Pr (E\cap F).&amp;lt;/math&amp;gt;&lt;br /&gt;
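The decomposition used in this derivation is easy to verify with sets; the pair of die events below is a hypothetical illustration:

```python
S = {1, 2, 3, 4, 5, 6}
E, F = {1, 3, 5}, {4, 5, 6}
E_bar, F_bar = S - E, S - F

a = E & F_bar      # in E only
b = E & F          # in both
c = E_bar & F      # in F only

# The three pieces are pairwise disjoint and together make up E ∪ F
assert a | b | c == E | F
assert not (a & b) and not (a & c) and not (b & c)

# Hence, counting elements: |E ∪ F| = |E| + |F| - |E ∩ F|,
# which mirrors Pr(E ∪ F) = Pr(E) + Pr(F) - Pr(E ∩ F)
assert len(a) + len(b) + len(c) == len(E) + len(F) - len(E & F)
```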
&lt;br /&gt;
When &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; are mutually exclusive, so that &amp;lt;math&amp;gt;E\cap F=\emptyset&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Pr (E\cap F)=0&amp;lt;/math&amp;gt;, this rule reduces to Axiom 3: &amp;lt;math&amp;gt;\Pr (E\cup F)=\Pr (E)+\Pr (F)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example: &amp;#039;&amp;#039;What is the probability of drawing a Queen (&amp;lt;math&amp;gt;Q &amp;lt;/math&amp;gt;) or a Club (&amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt;) in a single draw from a pack of cards? Now, &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; out of &amp;lt;math&amp;gt;52 &amp;lt;/math&amp;gt; cards are Queens, so &amp;lt;math&amp;gt;\Pr \left( Q\right) =\frac{4}{52},&amp;lt;/math&amp;gt; whilst &amp;lt;math&amp;gt;\Pr\left( C\right) =\frac{13}{52}.&amp;lt;/math&amp;gt; The probability of drawing the Queen of Clubs is simply &amp;lt;math&amp;gt;\frac{1}{52};&amp;lt;/math&amp;gt; i.e., &amp;lt;math&amp;gt;\Pr \left( Q\cap C\right) =\frac{1}{52}&amp;lt;/math&amp;gt;. What we require is a Club or a Queen, for which the probability is&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr \left( Q\cup C\right) &amp;amp;=&amp;amp;\Pr \left( Q\right) +\Pr \left( C\right) -\Pr\left( Q\cap C\right) \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{4}{52}+\frac{13}{52}-\frac{1}{52} \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{16}{52}=\frac{4}{13}.\end{aligned}&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example: &amp;#039;&amp;#039;Consider a car journey from Manchester to London via the M6 and M1. Let &amp;lt;math&amp;gt;E=&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;heavy traffic somewhere on route&amp;#039;&amp;#039; and &amp;lt;math&amp;gt;F=&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;roadworks somewhere on route&amp;#039;&amp;#039;. It is estimated that &amp;lt;math&amp;gt;\Pr (E)=0.8&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Pr (F)=0.4,&amp;lt;/math&amp;gt; whilst the probability of NOT encountering both is &amp;lt;math&amp;gt;\Pr (\overline{E\cap F})=0.6.&amp;lt;/math&amp;gt; What is the probability of encountering heavy traffic or roadworks?&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;We require &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) .&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr (E\cup F) &amp;amp;=&amp;amp;\Pr (E)+\Pr (F)-\Pr (E\cap F) \\&lt;br /&gt;
&amp;amp;=&amp;amp;\Pr (E)+\Pr (F)-(1-\Pr (\overline{E\cap F})) \\&lt;br /&gt;
&amp;amp;=&amp;amp;0.8+0.4-1+0.6 \\&lt;br /&gt;
&amp;amp;=&amp;amp;0.8=\Pr (E)\end{aligned}&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Notice that this implies, in this case, that &amp;lt;math&amp;gt;F\subset E&amp;lt;/math&amp;gt; (why?). This &amp;#039;&amp;#039;model &amp;#039;&amp;#039;then implies that when there are roadworks somewhere on route you are bound to encounter heavy traffic; on the other hand, you can encounter heavy traffic on route without ever passing through roadworks. (My own experience of this motorway inclines me towards this implication!)&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
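Both examples can be reproduced in a few lines of Python, using the numbers given above (exact fractions avoid any floating-point fuzz):

```python
from fractions import Fraction

# Queen or Club in one draw from a 52-card pack
pr_Q = Fraction(4, 52)     # four Queens
pr_C = Fraction(13, 52)    # thirteen Clubs
pr_QC = Fraction(1, 52)    # the Queen of Clubs
pr_Q_or_C = pr_Q + pr_C - pr_QC
assert pr_Q_or_C == Fraction(4, 13)

# Heavy traffic (E) or roadworks (F) on the Manchester-London journey
pr_E, pr_F = Fraction(8, 10), Fraction(4, 10)
pr_not_EF = Fraction(6, 10)     # Pr of NOT encountering both
pr_EF = 1 - pr_not_EF           # so Pr(E ∩ F) = 0.4
pr_E_or_F = pr_E + pr_F - pr_EF
assert pr_E_or_F == pr_E        # equals Pr(E), consistent with F ⊂ E
assert pr_EF == pr_F            # Pr(E ∩ F) = Pr(F), the hallmark of F ⊂ E
```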
&lt;br /&gt;
Similar concepts apply when manipulating proportions as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example&amp;#039;&amp;#039;&amp;lt;math&amp;gt;:&amp;lt;/math&amp;gt; A sample of &amp;lt;math&amp;gt;1000&amp;lt;/math&amp;gt; undergraduates were asked whether they took either Mathematics, Physics or Chemistry at A-level. The following responses were obtained: &amp;lt;math&amp;gt;100&amp;lt;/math&amp;gt; just took Mathematics; &amp;lt;math&amp;gt;70&amp;lt;/math&amp;gt; just took Physics; &amp;lt;math&amp;gt;100&amp;lt;/math&amp;gt; just took Chemistry; &amp;lt;math&amp;gt;150&amp;lt;/math&amp;gt; took Mathematics and Physics, but not Chemistry; &amp;lt;math&amp;gt;40&amp;lt;/math&amp;gt; took Mathematics and Chemistry, but not Physics; and, &amp;lt;math&amp;gt;240&amp;lt;/math&amp;gt; took Physics and Chemistry, but not Mathematics. What proportion took all three?&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;This can be addressed with the following diagram:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;[[File:Prob_Alevels.jpg|frameless|500px]]&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;The shaded area contains the number who took all three, which can be deduced from the above information (since the total of the numbers assigned to each part of the Venn diagram must be &amp;lt;math&amp;gt;1000&amp;lt;/math&amp;gt;). The answer is therefore &amp;lt;math&amp;gt;30\%&amp;lt;/math&amp;gt; (being &amp;lt;math&amp;gt;300&amp;lt;/math&amp;gt; out of &amp;lt;math&amp;gt;1000&amp;lt;/math&amp;gt;).&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Two further results on unions, intersections and complements which are of use (and which are fairly easy to demonstrate using Venn diagrams) are &amp;#039;&amp;#039;&amp;#039;de Morgan Laws&amp;#039;&amp;#039;&amp;#039;:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\left( \bar{A}\cap \bar{B}\right) =\left( \overline{A\cup B}\right) &amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\bar{A}\cup \bar{B}=\left( \overline{A\cap B}\right) &amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
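The counting argument for the A-level example, and the de Morgan laws, can both be checked directly. This sketch assumes, as the question implies, that every respondent took at least one of the three subjects; the sets in the second part are arbitrary illustrations:

```python
# A-level example: deduce the number who took all three subjects.
# The seven regions of the Venn diagram must sum to the total of 1000.
total = 1000
regions = {
    "maths only": 100, "physics only": 70, "chemistry only": 100,
    "maths & physics": 150, "maths & chemistry": 40, "physics & chemistry": 240,
}
all_three = total - sum(regions.values())
assert all_three == 300
assert all_three / total == 0.3          # i.e. 30%

# de Morgan laws on an arbitrary small sample space
S = set(range(1, 11))
A, B = {1, 2, 3, 4}, {3, 4, 5, 6}
A_bar, B_bar = S - A, S - B
assert A_bar & B_bar == S - (A | B)      # complement of the union
assert A_bar | B_bar == S - (A & B)      # complement of the intersection
```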
&lt;br /&gt;
= Footnotes =&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Intro&amp;diff=2857</id>
		<title>Probability Intro</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Intro&amp;diff=2857"/>
				<updated>2013-08-07T21:11:46Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: /* The addition rule of probability */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
&lt;br /&gt;
= Introducing Probability =&lt;br /&gt;
&lt;br /&gt;
So far we have been looking at ways of summarising samples of data drawn from an underlying population of interest. Although at times tedious, all such arithmetic calculations are fairly mechanical and straightforward to apply. To remind ourselves, one of the primary reasons for wishing to summarise data is to assist in the development of inferences about the population from which the data were taken. That is to say, we would like to elicit some information about the mechanism which generated the observed data.&lt;br /&gt;
&lt;br /&gt;
We now start on the process of developing mathematical ways of formulating inferences and this requires the use of &amp;#039;&amp;#039;probability&amp;#039;&amp;#039;. This becomes clear if we think back to one of the early questions posed in this course: &amp;#039;&amp;#039;prior to sampling is it possible to predict with absolute certainty what will be observed&amp;#039;&amp;#039;? The answer to this question is &amp;#039;&amp;#039;no&amp;#039;&amp;#039;; although it would be of interest to know how &amp;#039;&amp;#039;likely&amp;#039;&amp;#039; it is that certain values would be observed. Or, what is the &amp;#039;&amp;#039;probability&amp;#039;&amp;#039; of observing certain values?&lt;br /&gt;
&lt;br /&gt;
Before proceeding, we need some more tools:&lt;br /&gt;
&lt;br /&gt;
= Venn diagrams =&lt;br /&gt;
&lt;br /&gt;
Venn diagrams (and diagrams in general) are of enormous help in trying to understand, and manipulate probability. We begin with some basic definitions, some of which we have encountered before.&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Experiment:&amp;#039;&amp;#039;&amp;#039; any process which, when applied, provides data or an outcome; e.g., rolling a die and observing the number of dots on the upturned face; recording the amount of rainfall in Manchester over a period of time.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Sample Space:&amp;#039;&amp;#039;&amp;#039; set of possible outcomes of an experiment; e.g., &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; (or &amp;lt;math&amp;gt;\Omega &amp;lt;/math&amp;gt;) &amp;lt;math&amp;gt;=&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\{1,2,3,4,5,6\}&amp;lt;/math&amp;gt;, which is the sample space of rolling a die. Or &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;=&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\{x;x\geq 0\}&amp;lt;/math&amp;gt;, which is the sample space of an experiment where the outcomes can be any real non-negative number, or ‘&amp;#039;&amp;#039;the set of non-negative real numbers&amp;#039;&amp;#039;’.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Simple Event&amp;#039;&amp;#039;&amp;#039;: just one of the possible outcomes on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Event:&amp;#039;&amp;#039;&amp;#039; a &amp;#039;&amp;#039;subset&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, denoted &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt;; e.g., &amp;lt;math&amp;gt;E=\left\{ 2,4,6\right\}&amp;lt;/math&amp;gt; (i.e. any even number on a die) or &amp;lt;math&amp;gt;E=\left\{ x;4&amp;lt;x\leq 10\right\} ,&amp;lt;/math&amp;gt; which means ‘&amp;#039;&amp;#039;the set of real numbers which are strictly bigger than&amp;#039;&amp;#039; &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;but less than or equal to &amp;#039;&amp;#039;&amp;lt;math&amp;gt;10&amp;lt;/math&amp;gt;’.&amp;lt;br /&amp;gt;Note that an event, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, is a collection of simple events.&lt;br /&gt;
&lt;br /&gt;
Such concepts can be represented by means of the following Venn Diagram:&lt;br /&gt;
&lt;br /&gt;
[[File:Venn_1.jpg|frameless|600px]]&lt;br /&gt;
&lt;br /&gt;
The sample space, &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; is depicted as a closed rectangle, and the event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; is a closed loop wholly contained within &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; and we write (in set notation) &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In dealing with probability, and in particular the probability of an event (or events) occurring, we shall need to be familiar with &amp;#039;&amp;#039;&amp;#039;UNIONS, INTERSECTIONS&amp;#039;&amp;#039;&amp;#039; and &amp;#039;&amp;#039;&amp;#039;COMPLEMENTS&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
To illustrate these concepts, consider the sample space &amp;lt;math&amp;gt;S=\{x;x\geq 0\},\,&amp;lt;/math&amp;gt; with the following events defined on &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; as depicted in Figure 3.2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E=\{x;4&amp;lt;x\leq 10\},\,F=\{x;7&amp;lt;x\leq 17\},\,G=\{x;x&amp;gt;15\},\,H=\{x;9&amp;lt;x\leq 13\}. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
| (a) Event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;: A closed loop&lt;br /&gt;
| (b) Union: &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2a.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2b.jpg|frameless|300px]]&lt;br /&gt;
|-&lt;br /&gt;
| (c) Intersection: &amp;lt;math&amp;gt;E\cap F&amp;lt;/math&amp;gt;&lt;br /&gt;
| (d) The Null set/event: &amp;lt;math&amp;gt;E\cap G=\emptyset &amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2c.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2d.jpg|frameless|300px]]&lt;br /&gt;
|-&lt;br /&gt;
| (e) Complement of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;\bar{E}&amp;lt;/math&amp;gt;&lt;br /&gt;
| (f) Subset of &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;H\subset F&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;H\cap F=H&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2e.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2f.jpg|frameless|300px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* The &amp;#039;&amp;#039;union&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;E\cup F,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E\cup F=\{x;4&amp;lt;x\leq 17\};&amp;lt;/math&amp;gt; i.e., it contains elements (simple events) which are either in &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; or in &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; or (perhaps) in both. This is illustrated on the Venn diagram by the dark shaded area in diagram (b).&lt;br /&gt;
* The &amp;#039;&amp;#039;intersection&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;E\cap F,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E\cap F=\left\{ x;7&amp;lt;x\leq 10\right\} ;&amp;lt;/math&amp;gt; i.e., it contains elements (simple events) which are common to both &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F.&amp;lt;/math&amp;gt; Again this is depicted by the dark shaded area in (c). If events have no elements in common (as, for example, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;) then they are said to be &amp;#039;&amp;#039;mutually exclusive&amp;#039;&amp;#039;, and we write &amp;lt;math&amp;gt;E\cap G=\emptyset ,&amp;lt;/math&amp;gt; where &amp;lt;math&amp;gt;\emptyset&amp;lt;/math&amp;gt; denotes the &amp;#039;&amp;#039;null set&amp;#039;&amp;#039; which contains no elements. Such a situation is illustrated on the Venn Diagram by events (the two shaded closed loops in (d)) which do not overlap. Notice however that &amp;lt;math&amp;gt;G\cap F\neq \emptyset ,&amp;lt;/math&amp;gt; since &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; have elements in common.&lt;br /&gt;
* The &amp;#039;&amp;#039;complement&amp;#039;&amp;#039; of an event &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; say, is everything defined on the sample space which is not in &amp;lt;math&amp;gt;E.&amp;lt;/math&amp;gt; This event is denoted &amp;lt;math&amp;gt;\bar{E}&amp;lt;/math&amp;gt;, the dark shaded area in (e); here &amp;lt;math&amp;gt;\bar{E}=\left\{ x;x\leq 4\right\} \cup \left\{ x;x&amp;gt;10\right\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
* Finally note that &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; is a sub-set of &amp;lt;math&amp;gt;F;&amp;lt;/math&amp;gt; see (f). It is depicted as the dark closed loop wholly contained within &amp;lt;math&amp;gt;F,&amp;lt;/math&amp;gt; the lighter shaded area, so that &amp;lt;math&amp;gt;H\cap F=H;&amp;lt;/math&amp;gt; if an element in the sample space is a member of &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; then it must also be a member of &amp;lt;math&amp;gt;F.&amp;lt;/math&amp;gt; (In mathematical logic, we employ this scenario to indicate that “&amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; implies &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;”, but not necessarily vice-versa.) Notice that &amp;lt;math&amp;gt;G\cap H=\emptyset &amp;lt;/math&amp;gt; but &amp;lt;math&amp;gt;H\cap E\neq \emptyset&amp;lt;/math&amp;gt;.&lt;br /&gt;
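The four interval events can be explored numerically by testing membership at a grid of sample points (an illustrative sketch; the finite grid merely stands in for the continuum):

```python
# Events on S = {x; x >= 0}, written as membership tests
E = lambda x: 4 < x <= 10
F = lambda x: 7 < x <= 17
G = lambda x: x > 15
H = lambda x: 9 < x <= 13

pts = [x / 2 for x in range(0, 61)]     # grid 0, 0.5, ..., 30

union_EF = [x for x in pts if E(x) or F(x)]
inter_EF = [x for x in pts if E(x) and F(x)]

assert all(4 < x <= 17 for x in union_EF)     # E ∪ F = {x; 4 < x <= 17}
assert all(7 < x <= 10 for x in inter_EF)     # E ∩ F = {x; 7 < x <= 10}
assert not any(E(x) and G(x) for x in pts)    # E ∩ G = ∅: mutually exclusive
assert all(F(x) for x in pts if H(x))         # H ⊂ F, so H ∩ F = H
```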
&lt;br /&gt;
= Probability =&lt;br /&gt;
&lt;br /&gt;
The term &amp;#039;&amp;#039;probability&amp;#039;&amp;#039; (or some equivalent) is used in everyday conversation and so cannot be unfamiliar to the reader. We talk of the probability, or chance, of rain; the likelihood of England winning the World Cup; or, perhaps more scientifically, the chance of getting a &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; when rolling a die. What we shall now do is develop a coherent theory of probability; a theory which allows us to combine and manipulate probabilities in a consistent and meaningful manner. We shall describe ways of dealing with, and describing, uncertainty. This will involve &amp;#039;&amp;#039;rules&amp;#039;&amp;#039; which govern our use of terms like probability.&lt;br /&gt;
&lt;br /&gt;
There have been a number of different approaches to (interpretations of) probability. Most depend, at least to some extent, on the notion of relative frequency, as now described:&lt;br /&gt;
&lt;br /&gt;
* Suppose an experiment has an outcome of interest &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;. The &amp;#039;&amp;#039;relative frequency interpretation&amp;#039;&amp;#039; of probability says that, if the experiment can be repeated a large number of times, then the relative frequency of observing the outcome &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; will settle down to a &amp;#039;&amp;#039;number&amp;#039;&amp;#039;, denoted &amp;lt;math&amp;gt;\Pr (E),&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;P(E)&amp;lt;/math&amp;gt; or Prob&amp;lt;math&amp;gt;(E),&amp;lt;/math&amp;gt; called the &amp;#039;&amp;#039;&amp;#039;probability&amp;#039;&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
This is illustrated in the next Figure where the proportion of heads obtained after &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; flips of a fair coin is plotted against &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;, as &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; increases; e.g., of the first &amp;lt;math&amp;gt;100&amp;lt;/math&amp;gt; flips, &amp;lt;math&amp;gt;55&amp;lt;/math&amp;gt; were heads (&amp;lt;math&amp;gt;55\%&amp;lt;/math&amp;gt;). Notice that the plot becomes less ‘wobbly’ after about &amp;lt;math&amp;gt;n=220&amp;lt;/math&amp;gt; and appears to be settling down to the value of &amp;lt;math&amp;gt;\frac{1}{2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
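This settling-down behaviour is easy to reproduce by simulation (a sketch only; the seed and flip count are arbitrary, so the figures will differ from the plotted experiment):

```python
import random

random.seed(1)
n_flips = 10_000
heads = 0
running = []                  # proportion of heads after each flip
for i in range(1, n_flips + 1):
    heads += random.random() < 0.5    # one flip of a fair coin
    running.append(heads / i)

# Early proportions wobble; the later ones settle near 1/2
assert abs(running[-1] - 0.5) < 0.02
```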
&lt;br /&gt;
[[File:Prob_coin.jpg|frameless|600px]]&lt;br /&gt;
&lt;br /&gt;
Due to this interpretation of probability, we often use observed sample proportions to approximate underlying probabilities of interest; see, for example, Question 4 of Exercise 2. There are, of course, other interpretations of probability; e.g., the subjective interpretation which simply expresses the strength of one’s belief about an event of interest such as whether Manchester United will win the European Cup! Any one of these interpretations can be used in practical situations provided the implied notion of probability follows a simple set of &amp;#039;&amp;#039;axioms&amp;#039;&amp;#039; or &amp;#039;&amp;#039;rules&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
== The axioms of probability ==&lt;br /&gt;
&lt;br /&gt;
There are just &amp;#039;&amp;#039;three &amp;#039;&amp;#039;basic rules that must be obeyed when dealing with probabilities:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;For any event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; defined on &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; i.e., &amp;lt;math&amp;gt;E\subset S,\,\,\Pr (E)\geq 0&amp;lt;/math&amp;gt;; &amp;#039;&amp;#039;probabilities are non-negative&amp;#039;&amp;#039;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (S)=1;&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;having defined the sample space of outcomes, one of these outcomes must be observed&amp;#039;&amp;#039;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;If events &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;, defined on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, are mutually exclusive, so that &amp;lt;math&amp;gt;E\cap F=\emptyset &amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) =\Pr \left( E\right)+\Pr \left( F\right) .&amp;lt;/math&amp;gt; In general, for any set of mutually exclusive events, &amp;lt;math&amp;gt;E_{1},E_{2},\ldots ,E_{k},&amp;lt;/math&amp;gt; defined on &amp;lt;math&amp;gt;S:&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (E_{1}\cup E_{2}\cup \ldots \cup E_{k})=\Pr (E_{1})+\Pr (E_{2})+\ldots +\Pr (E_{k})&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;i.e., &amp;lt;math&amp;gt;\Pr \left( \bigcup_{j=1}^{k}E_{j}\right) =\sum_{j=1}^{k}\Pr (E_{j}).&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In terms of the Venn Diagram, one can (and should) usefully think of the area of &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; relative to that of &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; as providing an indication of probability. (Note, from axiom 2, that the area of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; is implicitly normalised to be unity).&lt;br /&gt;
&lt;br /&gt;
Also observe that, contrary to what you may have believed, it is not one of the rules that &amp;lt;math&amp;gt;\Pr (E)\leq 1&amp;lt;/math&amp;gt; for any event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;. Rather, this is an implication of the &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; rules given:&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;implications: &amp;#039;&amp;#039;&amp;#039;it must be that for any event &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; defined on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;E\cap \bar{E}=\emptyset &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;E\cup \bar{E}=S.&amp;lt;/math&amp;gt; By Axiom &amp;lt;math&amp;gt;1,&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\Pr (E)\geq 0&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Pr \left( \bar{E}\right) \geq 0&amp;lt;/math&amp;gt; and by Axiom &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\Pr(E)+\Pr (\bar{E})=\Pr (S).&amp;lt;/math&amp;gt; So &amp;lt;math&amp;gt;\Pr \left( E\right) +\Pr \left( \bar{E}\right) =1,&amp;lt;/math&amp;gt; by Axiom &amp;lt;math&amp;gt;2.&amp;lt;/math&amp;gt; This implies that&lt;br /&gt;
&lt;br /&gt;
# &amp;lt;math&amp;gt;0\leq \Pr (E)\leq 1&amp;lt;/math&amp;gt;&lt;br /&gt;
# &amp;lt;math&amp;gt;\Pr (\bar{E})=1-\Pr (E)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first of these is what we might have expected from probability (a number lying between &amp;lt;math&amp;gt;0&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;). The second implication is also very important; it says that the probability of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; not happening is ‘&amp;#039;&amp;#039;one minus the probability of it happening&amp;#039;&amp;#039;’. Thus when rolling a die, the probability of getting &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; is one minus the probability of getting either a &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;2&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;5.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These axioms imply how to calculate probabilities on a sample space of equally likely outcomes. For example, and as we have already noted, the experiment of rolling a fair die defines a sample space of six mutually exclusive and equally likely outcomes (&amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; dots on the up-turned face). The axioms then say that each of the six probabilities is positive, that they add to 1 and that they are all the same. Thus, the probability of any one of the outcomes must simply be &amp;lt;math&amp;gt;\frac{1}{6},&amp;lt;/math&amp;gt; which may accord with your intuition. A similar analysis reveals that the probability of drawing a club from a deck of &amp;lt;math&amp;gt;52&amp;lt;/math&amp;gt; cards is &amp;lt;math&amp;gt;\frac{13}{52},&amp;lt;/math&amp;gt; since any one of the &amp;lt;math&amp;gt;52&amp;lt;/math&amp;gt; cards has an equal chance of being drawn and &amp;lt;math&amp;gt;13&amp;lt;/math&amp;gt; of them are clubs. Notice the importance of the assumption of equally likely outcomes here.&lt;br /&gt;
&lt;br /&gt;
In this, and the next section of notes, we shall see how these axioms can be used. Firstly, consider the construction of a probability for the &amp;#039;&amp;#039;union&amp;#039;&amp;#039; of two events; i.e., the probability that &amp;#039;&amp;#039;either &amp;#039;&amp;#039;&amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; or (perhaps) &amp;#039;&amp;#039;both &amp;#039;&amp;#039;will occur. Such a probability is embodied in the &amp;#039;&amp;#039;addition rule of probability&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
== The addition rule of probability ==&lt;br /&gt;
&lt;br /&gt;
When rolling a fair die, let &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; denote the event of an “odd number of dots” and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; the event of the “number of dots being greater than, or equal to, &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt;”. What is the probability of the event &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;? To calculate this we can collect together all the mutually exclusive (simple) events which comprise &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;, and then add up their probabilities (by Axiom 3). These simple events are &amp;lt;math&amp;gt;1,3,4,5&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; dots. Each has a probability of &amp;lt;math&amp;gt;\frac{1}{6},&amp;lt;/math&amp;gt; so the required total probability is &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) =\frac{5}{6}&amp;lt;/math&amp;gt;. Consider carefully how this probability is constructed and note, in particular, that &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) \neq \Pr \left( E\right) +\Pr \left( F\right) &amp;lt;/math&amp;gt; since &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; have a simple event in common (namely &amp;lt;math&amp;gt;5&amp;lt;/math&amp;gt; dots).&lt;br /&gt;
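As a quick check, this calculation can be reproduced by brute-force enumeration. A minimal sketch in Python (Python is not otherwise used in these notes; the variable names are our own):&lt;br /&gt;

```python
from fractions import Fraction

# Fair die: six equally likely, mutually exclusive simple events.
p = {s: Fraction(1, 6) for s in range(1, 7)}

E = {1, 3, 5}   # odd number of dots
F = {4, 5, 6}   # at least 4 dots

# Pr(E u F): add the probabilities of the simple events making up the union.
pr_union = sum(p[s] for s in E | F)
print(pr_union)   # 5/6

# Naively adding Pr(E) + Pr(F) double-counts the shared simple event (5 dots).
print(sum(p[s] for s in E) + sum(p[s] for s in F))   # 1, not 5/6
```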
&lt;br /&gt;
In general, we can calculate the probability of the union of events using the &amp;#039;&amp;#039;addition rule of probability&amp;#039;&amp;#039;, as follows.&lt;br /&gt;
&lt;br /&gt;
* For any events, &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F\subset S:\Pr (E\cup F)=\Pr (E)+\Pr (F)-\Pr (E\cap F).&amp;lt;/math&amp;gt; So, in general, &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) \leq \Pr (E)+\Pr (F).&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generalises to three events, &amp;lt;math&amp;gt;E_{1},E_{2}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;E_{3}&amp;lt;/math&amp;gt; as&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr (E_{1}\cup E_{2}\cup E_{3}) &amp;amp;=&amp;amp;\Pr (E_{1})+\Pr (E_{2})+\Pr (E_{3}) \\&lt;br /&gt;
&amp;amp;&amp;amp;-\Pr (E_{1}\cap E_{2})-\Pr (E_{1}\cap E_{3})-\Pr (E_{2}\cap E_{3}) \\&lt;br /&gt;
&amp;amp;&amp;amp;+\Pr (E_{1}\cap E_{2}\cap E_{3}).\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
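The three-event version can likewise be verified by enumeration. A sketch, with three overlapping events of our own choosing on the fair-die sample space:&lt;br /&gt;

```python
from fractions import Fraction

# Fair die again; three overlapping events chosen purely for illustration.
p = {s: Fraction(1, 6) for s in range(1, 7)}

def pr(event):
    return sum(p[s] for s in event)

E1, E2, E3 = {1, 2}, {2, 3, 4}, {4, 5}

lhs = pr(E1 | E2 | E3)
rhs = (pr(E1) + pr(E2) + pr(E3)
       - pr(E1 & E2) - pr(E1 & E3) - pr(E2 & E3)
       + pr(E1 & E2 & E3))
assert lhs == rhs   # inclusion-exclusion agrees with the direct count
```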
&lt;br /&gt;
We can demonstrate this as follows.&lt;br /&gt;
&lt;br /&gt;
Note that&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E\cup F=\left( E\cap \bar{F}\right) \cup \left( E\cap F\right) \cup \left(\bar{E}\cap F\right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
the union of &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; mutually exclusive events. These mutually exclusive events are depicted by the shaded areas &amp;lt;math&amp;gt;\mathbf{a,}&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\mathbf{b}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\mathbf{c}&amp;lt;/math&amp;gt;, respectively, in the next Figure.&lt;br /&gt;
&lt;br /&gt;
[[File:Prob_add.jpg|frameless|500px]]&lt;br /&gt;
&lt;br /&gt;
Then, by Axiom &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt;, and using the fact that the three events &amp;lt;math&amp;gt;\left( E\cap\bar{F}\right) &amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;\left( E\cap F\right) &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\left( \bar{E}\cap F\right)&amp;lt;/math&amp;gt; are mutually exclusive (so that the “area” occupied by &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt; is simply &amp;lt;math&amp;gt;\mathbf{a+b+c}&amp;lt;/math&amp;gt;),&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Pr \left( E\cup F\right) =\Pr \left( E\cap \bar{F}\right) +\Pr \left( \bar{E}\cap F\right) +\Pr \left( E\cap F\right) .&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
But also by Axiom &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt;, since &amp;lt;math&amp;gt;E=\left( E\cap \bar{F}\right) \cup \left(E\cap F\right) &amp;lt;/math&amp;gt;, it must be that &amp;lt;math&amp;gt;\Pr (E)=\Pr \left( E\cap \bar{F}\right)+\Pr (E\cap F);&amp;lt;/math&amp;gt; similarly, &amp;lt;math&amp;gt;\Pr \left( \bar{E}\cap F\right) =\Pr \left(F\right) -\Pr \left( E\cap F\right)&amp;lt;/math&amp;gt;. Putting all of this together gives&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Pr (E\cup F)=\Pr (E)+\Pr (F)-\Pr (E\cap F).&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; are mutually exclusive, so that &amp;lt;math&amp;gt;E\cap F=\emptyset&amp;lt;/math&amp;gt;, this rule reduces to Axiom 3: &amp;lt;math&amp;gt;\Pr (E\cup F)=\Pr (E)+\Pr (F).&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example: &amp;#039;&amp;#039;What is the probability of drawing a Queen (&amp;lt;math&amp;gt;Q &amp;lt;/math&amp;gt;) or a Club (&amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt;) in a single draw from a pack of cards? Now, &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; out of &amp;lt;math&amp;gt;52 &amp;lt;/math&amp;gt; cards are Queens, so &amp;lt;math&amp;gt;\Pr \left( Q\right) =\frac{4}{52},&amp;lt;/math&amp;gt; whilst &amp;lt;math&amp;gt;\Pr\left( C\right) =\frac{13}{52}.&amp;lt;/math&amp;gt; The probability of drawing the Queen of Clubs is simply &amp;lt;math&amp;gt;\frac{1}{52};&amp;lt;/math&amp;gt; i.e., &amp;lt;math&amp;gt;\Pr \left( Q\cap C\right) =\frac{1}{52}&amp;lt;/math&amp;gt;. What we require is a Club or a Queen, for which the probability is&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr \left( Q\cup C\right) &amp;amp;=&amp;amp;\Pr \left( Q\right) +\Pr \left( C\right) -\Pr\left( Q\cap C\right) \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{4}{52}+\frac{13}{52}-\frac{1}{52} \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{16}{52}=\frac{4}{13}.\end{aligned}&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example: &amp;#039;&amp;#039;Consider a car journey from Manchester to London via the M6 and M1. Let &amp;lt;math&amp;gt;E=&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;heavy traffic somewhere on route&amp;#039;&amp;#039; and &amp;lt;math&amp;gt;F=&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;roadworks somewhere on route&amp;#039;&amp;#039;. It is estimated that &amp;lt;math&amp;gt;\Pr (E)=0.8&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Pr (F)=0.4,&amp;lt;/math&amp;gt; whilst the probability of NOT encountering both is &amp;lt;math&amp;gt;\Pr (\overline{E\cap F})=0.6.&amp;lt;/math&amp;gt; What is the probability of encountering heavy traffic or roadworks?&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;We require &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) .&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr (E\cup F) &amp;amp;=&amp;amp;\Pr (E)+\Pr (F)-\Pr (E\cap F) \\&lt;br /&gt;
&amp;amp;=&amp;amp;\Pr (E)+\Pr (F)-(1-\Pr (\overline{E\cap F})) \\&lt;br /&gt;
&amp;amp;=&amp;amp;0.8+0.4-1+0.6 \\&lt;br /&gt;
&amp;amp;=&amp;amp;0.8=\Pr (E)\end{aligned}&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Notice that this implies, in this case, that &amp;lt;math&amp;gt;F\subset E&amp;lt;/math&amp;gt; (why?). This &amp;#039;&amp;#039;model &amp;#039;&amp;#039;then implies that when there are roadworks somewhere on route you are bound to encounter heavy traffic; on the other hand, you can encounter heavy traffic on route without ever passing through roadworks. (My own experience of this motorway inclines me towards this implication!)&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
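The arithmetic of the motorway example can be laid out as follows (a sketch, using exact fractions to avoid rounding error):&lt;br /&gt;

```python
from fractions import Fraction

pr_E = Fraction(8, 10)         # heavy traffic somewhere on route
pr_F = Fraction(4, 10)         # roadworks somewhere on route
pr_not_both = Fraction(6, 10)  # Pr(complement of (E n F))

pr_both = 1 - pr_not_both          # complement rule
pr_union = pr_E + pr_F - pr_both   # addition rule
assert pr_union == pr_E            # Pr(E u F) = Pr(E), consistent with F being a subset of E
```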
&lt;br /&gt;
Similar concepts apply when manipulating proportions as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example&amp;#039;&amp;#039;&amp;lt;math&amp;gt;:&amp;lt;/math&amp;gt; A sample of &amp;lt;math&amp;gt;1000&amp;lt;/math&amp;gt; undergraduates were asked whether they took either Mathematics, Physics or Chemistry at A-level. The following responses were obtained: &amp;lt;math&amp;gt;100&amp;lt;/math&amp;gt; just took Mathematics; &amp;lt;math&amp;gt;70&amp;lt;/math&amp;gt; just took Physics; &amp;lt;math&amp;gt;100&amp;lt;/math&amp;gt; just took Chemistry; &amp;lt;math&amp;gt;150&amp;lt;/math&amp;gt; took Mathematics and Physics, but not Chemistry; &amp;lt;math&amp;gt;40&amp;lt;/math&amp;gt; took Mathematics and Chemistry, but not Physics; and, &amp;lt;math&amp;gt;240&amp;lt;/math&amp;gt; took Physics and Chemistry, but not Mathematics. What proportion took all three?&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;This can be addressed with the following diagram:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;[[File:Prob_Alevels.jpg|frameless|500px]]&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;The shaded area contains the number who took all three, which can be deduced from the above information (since the total of the numbers assigned to each part of the Venn diagram must be &amp;lt;math&amp;gt;1000&amp;lt;/math&amp;gt;). The answer is therefore &amp;lt;math&amp;gt;30\%&amp;lt;/math&amp;gt; (being &amp;lt;math&amp;gt;300&amp;lt;/math&amp;gt; out of &amp;lt;math&amp;gt;1000&amp;lt;/math&amp;gt;).&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Two further results on unions, intersections and complements which are of use (and which are fairly easy to demonstrate using Venn diagrams) are the &amp;#039;&amp;#039;&amp;#039;de Morgan Laws&amp;#039;&amp;#039;&amp;#039;:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\left( \bar{A}\cap \bar{B}\right) =\left( \overline{A\cup B}\right) &amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\bar{A}\cup \bar{B}=\left( \overline{A\cap B}\right) &amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
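The A-level example is pure accounting on the Venn diagram, and the de Morgan laws can be checked on any finite sample space. A sketch of both (assuming, as the worked answer does, that every student took at least one of the three subjects):&lt;br /&gt;

```python
# A-level example: the regions of the Venn diagram must sum to 1000
# (assuming every student took at least one of the three subjects).
total = 1000
regions = [100, 70, 100, 150, 40, 240]   # every region except "all three"
all_three = total - sum(regions)
assert all_three == 300                  # i.e. 30% of the sample

# de Morgan laws, checked on a small arbitrary sample space.
S = set(range(10))
A, B = {1, 2, 3}, {3, 4, 5}
assert (S - A) & (S - B) == S - (A | B)  # complements: (A' n B') = (A u B)'
assert (S - A) | (S - B) == S - (A & B)  # complements: (A' u B') = (A n B)'
```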
&lt;br /&gt;
= Footnotes =&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Intro&amp;diff=2856</id>
		<title>Probability Intro</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Intro&amp;diff=2856"/>
				<updated>2013-08-07T21:11:24Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: /* The addition rule of probability */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
&lt;br /&gt;
= Introducing Probability =&lt;br /&gt;
&lt;br /&gt;
So far we have been looking at ways of summarising samples of data drawn from an underlying population of interest. Although at times tedious, all such arithmetic calculations are fairly mechanical and straightforward to apply. To remind ourselves, one of the primary reasons for wishing to summarise data is to assist in the development of inferences about the population from which the data were taken. That is to say, we would like to elicit some information about the mechanism which generated the observed data.&lt;br /&gt;
&lt;br /&gt;
We now start on the process of developing mathematical ways of formulating inferences and this requires the use of &amp;#039;&amp;#039;probability&amp;#039;&amp;#039;. This becomes clear if we think back to one of the early questions posed in this course: &amp;#039;&amp;#039;prior to sampling is it possible to predict with absolute certainty what will be observed&amp;#039;&amp;#039;? The answer to this question is &amp;#039;&amp;#039;no&amp;#039;&amp;#039;; although it would be of interest to know how &amp;#039;&amp;#039;likely&amp;#039;&amp;#039; it is that certain values would be observed. Or, what is the &amp;#039;&amp;#039;probability&amp;#039;&amp;#039; of observing certain values?&lt;br /&gt;
&lt;br /&gt;
Before proceeding, we need some more tools:&lt;br /&gt;
&lt;br /&gt;
= Venn diagrams =&lt;br /&gt;
&lt;br /&gt;
Venn diagrams (and diagrams in general) are of enormous help in trying to understand, and manipulate, probability. We begin with some basic definitions, some of which we have encountered before.&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Experiment:&amp;#039;&amp;#039;&amp;#039; any process which, when applied, provides data or an outcome; e.g., rolling a die and observing the number of dots on the upturned face; recording the amount of rainfall in Manchester over a period of time.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Sample Space:&amp;#039;&amp;#039;&amp;#039; set of possible outcomes of an experiment; e.g., &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; (or &amp;lt;math&amp;gt;\Omega &amp;lt;/math&amp;gt;) &amp;lt;math&amp;gt;=&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\{1,2,3,4,5,6\}&amp;lt;/math&amp;gt;, which is the sample space of rolling a die. Or &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;=&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\{x;x\geq 0\}&amp;lt;/math&amp;gt;, which is the sample space of an experiment where the outcomes can be any real non-negative number, or ‘&amp;#039;&amp;#039;the set of non-negative real numbers&amp;#039;&amp;#039;’.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Simple Event&amp;#039;&amp;#039;&amp;#039;: just one of the possible outcomes on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Event:&amp;#039;&amp;#039;&amp;#039; a &amp;#039;&amp;#039;subset&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, denoted &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt;; e.g., &amp;lt;math&amp;gt;E=\left\{ 2,4,6\right\}&amp;lt;/math&amp;gt; (i.e. any even number on a die) or &amp;lt;math&amp;gt;E=\left\{ x;4&amp;lt;x\leq 10\right\} ,&amp;lt;/math&amp;gt; which means ‘&amp;#039;&amp;#039;the set of real numbers which are strictly bigger than&amp;#039;&amp;#039; &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;but less than or equal to &amp;#039;&amp;#039;&amp;lt;math&amp;gt;10&amp;lt;/math&amp;gt;’.&amp;lt;br /&amp;gt;Note that an event, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, is a collection of simple events.&lt;br /&gt;
&lt;br /&gt;
Such concepts can be represented by means of the following Venn Diagram:&lt;br /&gt;
&lt;br /&gt;
[[File:Venn_1.jpg|frameless|600px]]&lt;br /&gt;
&lt;br /&gt;
The sample space, &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; is depicted as a closed rectangle, and the event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; is a closed loop wholly contained within &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;; we write (in set notation) &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In dealing with probability, and in particular the probability of an event (or events) occurring, we shall need to be familiar with &amp;#039;&amp;#039;&amp;#039;UNIONS, INTERSECTIONS&amp;#039;&amp;#039;&amp;#039; and &amp;#039;&amp;#039;&amp;#039;COMPLEMENTS&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
To illustrate these concepts, consider the sample space &amp;lt;math&amp;gt;S=\{x;x\geq 0\},\,&amp;lt;/math&amp;gt; with the following events defined on &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; as depicted in Figure 3.2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E=\{x;4&amp;lt;x\leq 10\},\,F=\{x;7&amp;lt;x\leq 17\},\,G=\{x;x&amp;gt;15\},\,H=\{x;9&amp;lt;x\leq 13\}. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
| (a) Event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;: A closed loop&lt;br /&gt;
| (b) Union: &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2a.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2b.jpg|frameless|300px]]&lt;br /&gt;
|-&lt;br /&gt;
| (c) Intersection: &amp;lt;math&amp;gt;E\cap F&amp;lt;/math&amp;gt;&lt;br /&gt;
| (d) The Null set/event: &amp;lt;math&amp;gt;E\cap G=\emptyset &amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2c.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2d.jpg|frameless|300px]]&lt;br /&gt;
|-&lt;br /&gt;
| (e) Complement of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;\bar{E}&amp;lt;/math&amp;gt;&lt;br /&gt;
| (f) Subset of &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;H\subset F&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;H\cap F=H&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2e.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2f.jpg|frameless|300px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* The &amp;#039;&amp;#039;union&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;E\cup F,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E\cup F=\{x;4&amp;lt;x\leq 17\};&amp;lt;/math&amp;gt; i.e., it contains elements (simple events) which are either in &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; or in &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; or (perhaps) in both. This is illustrated on the Venn diagram by the dark shaded area in diagram (b).&lt;br /&gt;
* The &amp;#039;&amp;#039;intersection&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;E\cap F,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E\cap F=\left\{ x;7&amp;lt;x\leq 10\right\} ;&amp;lt;/math&amp;gt; i.e., it contains elements (simple events) which are common to both &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F.&amp;lt;/math&amp;gt; Again this is depicted by the dark shaded area in (c). If events have no elements in common (as, for example, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;) then they are said to be &amp;#039;&amp;#039;mutually exclusive&amp;#039;&amp;#039;, and we can write &amp;lt;math&amp;gt;E\cap G=\emptyset ,&amp;lt;/math&amp;gt; meaning the &amp;#039;&amp;#039;null set&amp;#039;&amp;#039; which contains no elements. Such a situation is illustrated on the Venn Diagram by events (the two shaded closed loops in (d)) which do not overlap. Notice however that &amp;lt;math&amp;gt;G\cap F\neq \emptyset ,&amp;lt;/math&amp;gt; since &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; have elements in common.&lt;br /&gt;
* The &amp;#039;&amp;#039;complement&amp;#039;&amp;#039; of an event &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; say, is everything defined on the sample space which is not in &amp;lt;math&amp;gt;E.&amp;lt;/math&amp;gt; This event is denoted &amp;lt;math&amp;gt;\bar{E}&amp;lt;/math&amp;gt;, the dark shaded area in (e); here &amp;lt;math&amp;gt;\bar{E}=\left\{ x;x\leq 4\right\} \cup \left\{ x;x&amp;gt;10\right\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
* Finally note that &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; is a sub-set of &amp;lt;math&amp;gt;F;&amp;lt;/math&amp;gt; see (f). It is depicted as the dark closed loop wholly contained within &amp;lt;math&amp;gt;F,&amp;lt;/math&amp;gt; the lighter shaded area, so that &amp;lt;math&amp;gt;H\cap F=H;&amp;lt;/math&amp;gt; if an element in the sample space is a member of &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; then it must also be member of &amp;lt;math&amp;gt;F.&amp;lt;/math&amp;gt; (In mathematical logic, we employ this scenario to indicate that “&amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; implies &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;”, but not necessarily vice-versa.) Notice that &amp;lt;math&amp;gt;G\cap H=\emptyset &amp;lt;/math&amp;gt; but &amp;lt;math&amp;gt;H\cap E\neq \emptyset&amp;lt;/math&amp;gt;.&lt;br /&gt;
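These set operations can be checked concretely by discretising the sample space at integer points (a sketch; the cut-off at 30 is an arbitrary choice of ours):&lt;br /&gt;

```python
# Discretise S = {x; x >= 0} at the integers 0..30 for illustration.
S = set(range(31))
E = {x for x in S if 4 < x <= 10}
F = {x for x in S if 7 < x <= 17}
G = {x for x in S if x > 15}
H = {x for x in S if 9 < x <= 13}

assert E | F == {x for x in S if 4 < x <= 17}       # union
assert E & F == {x for x in S if 7 < x <= 10}       # intersection
assert E & G == set()                               # mutually exclusive: the null set
assert G & F != set()                               # but G and F do overlap
assert S - E == {x for x in S if x <= 4 or x > 10}  # complement of E (within S)
assert H.issubset(F) and H & F == H                 # H is a subset of F
assert G & H == set() and H & E != set()
```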
&lt;br /&gt;
= Probability =&lt;br /&gt;
&lt;br /&gt;
The term &amp;#039;&amp;#039;probability&amp;#039;&amp;#039; (or some equivalent) is used in everyday conversation and so cannot be unfamiliar to the reader. We talk of the probability, or chance, of rain; the likelihood of England winning the World Cup; or, perhaps more scientifically, the chance of getting a &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; when rolling a die. What we shall now do is develop a coherent theory of probability; a theory which allows us to combine and manipulate probabilities in a consistent and meaningful manner. We shall describe ways of dealing with, and describing, uncertainty. This will involve &amp;#039;&amp;#039;rules&amp;#039;&amp;#039; which govern our use of terms like probability.&lt;br /&gt;
&lt;br /&gt;
There have been a number of different approaches (interpretations) of probability. Most depend, at least to some extent, on the notion of relative frequency as now described:&lt;br /&gt;
&lt;br /&gt;
* Suppose an experiment has an outcome of interest &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;. The &amp;#039;&amp;#039;relative frequency interpretation&amp;#039;&amp;#039; of probability says that, if the experiment can be repeated a large number of times, then the relative frequency of observing the outcome &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; will settle down to a &amp;#039;&amp;#039;number&amp;#039;&amp;#039;, denoted &amp;lt;math&amp;gt;\Pr (E),&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;P(E)&amp;lt;/math&amp;gt; or Prob&amp;lt;math&amp;gt;(E),&amp;lt;/math&amp;gt; called the &amp;#039;&amp;#039;&amp;#039;probability&amp;#039;&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
This is illustrated in the next Figure where the proportion of heads obtained after &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; flips of a fair coin is plotted against &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;, as &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; increases; e.g., of the first &amp;lt;math&amp;gt;100&amp;lt;/math&amp;gt; flips, &amp;lt;math&amp;gt;55&amp;lt;/math&amp;gt; were heads (&amp;lt;math&amp;gt;55\%&amp;lt;/math&amp;gt;). Notice that the plot becomes less ‘wobbly’ after about &amp;lt;math&amp;gt;n=220&amp;lt;/math&amp;gt; and appears to be settling down to the value of &amp;lt;math&amp;gt;\frac{1}{2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
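The Figure can be reproduced in spirit by simulation (a sketch; the seed and the number of flips are arbitrary choices of ours, not values from the notes):&lt;br /&gt;

```python
import random

random.seed(1)                  # fixed seed so the run is reproducible
n_flips = 10_000
heads = 0
proportions = []
for n in range(1, n_flips + 1):
    heads += random.random() < 0.5   # one flip of a fair coin (True counts as 1)
    proportions.append(heads / n)

# Early proportions wobble; later ones settle near 1/2.
print(proportions[9], proportions[99], proportions[-1])
```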
&lt;br /&gt;
[[File:Prob_coin.jpg|frameless|600px]]&lt;br /&gt;
&lt;br /&gt;
Due to this interpretation of probability, we often use observed sample proportions to approximate underlying probabilities of interest; see, for example, Question 4 of Exercise 2. There are, of course, other interpretations of probability; e.g., the subjective interpretation which simply expresses the strength of one’s belief about an event of interest such as whether Manchester United will win the European Cup! Any one of these interpretations can be used in practical situations provided the implied notion of probability follows a simple set of &amp;#039;&amp;#039;axioms&amp;#039;&amp;#039; or &amp;#039;&amp;#039;rules&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
== The axioms of probability ==&lt;br /&gt;
&lt;br /&gt;
There are just &amp;#039;&amp;#039;three &amp;#039;&amp;#039;basic rules that must be obeyed when dealing with probabilities:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;For any event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; defined on &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; i.e., &amp;lt;math&amp;gt;E\subset S,\,\,\Pr (E)\geq 0&amp;lt;/math&amp;gt;; &amp;#039;&amp;#039;probabilities are non-negative&amp;#039;&amp;#039;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (S)=1;&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;having defined the sample space of outcomes, one of these outcomes must be observed&amp;#039;&amp;#039;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;If events &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;, defined on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, are mutually exclusive, so that &amp;lt;math&amp;gt;E\cap F=\emptyset &amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) =\Pr \left( E\right)+\Pr \left( F\right) .&amp;lt;/math&amp;gt; In general, for any set of mutually exclusive events, &amp;lt;math&amp;gt;E_{1},E_{2},\ldots ,E_{k},&amp;lt;/math&amp;gt; defined on &amp;lt;math&amp;gt;S:&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (E_{1}\cup E_{2}\cup \ldots \cup E_{k})=\Pr (E_{1})+\Pr (E_{2})+\ldots +\Pr (E_{k})&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;i.e., &amp;lt;math&amp;gt;\Pr \left( \bigcup_{j=1}^{k}E_{j}\right) =\sum_{j=1}^{k}\Pr (E_{j}).&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
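The three axioms can be checked mechanically on the fair-die probabilities (a sketch; the two exclusive events are our own choice):&lt;br /&gt;

```python
from fractions import Fraction

# Probability assignment for a fair die.
p = {s: Fraction(1, 6) for s in range(1, 7)}

def pr(event):
    return sum(p[s] for s in event)

assert all(v >= 0 for v in p.values())   # Axiom 1: probabilities are non-negative
assert pr(set(p)) == 1                   # Axiom 2: Pr(S) = 1

E, F = {1, 2}, {5, 6}                    # mutually exclusive events
assert E & F == set()
assert pr(E | F) == pr(E) + pr(F)        # Axiom 3: additivity over exclusive events
```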
&lt;br /&gt;
In terms of the Venn Diagram, one can (and should) usefully think of the area of &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; relative to that of &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; as providing an indication of probability. (Note, from axiom 2, that the area of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; is implicitly normalised to be unity).&lt;br /&gt;
&lt;br /&gt;
Also observe that, contrary to what you may have believed, it is not one of the rules that &amp;lt;math&amp;gt;\Pr (E)\leq 1&amp;lt;/math&amp;gt; for any event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;. Rather, this is an implication of the &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; rules given:&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;implications: &amp;#039;&amp;#039;&amp;#039;it must be that for any event &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; defined on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;E\cap \bar{E}=\emptyset &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;E\cup \bar{E}=S.&amp;lt;/math&amp;gt; By Axiom &amp;lt;math&amp;gt;1,&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\Pr (E)\geq 0&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Pr \left( \bar{E}\right) \geq 0&amp;lt;/math&amp;gt; and by Axiom &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\Pr(E)+\Pr (\bar{E})=\Pr (S).&amp;lt;/math&amp;gt; So &amp;lt;math&amp;gt;\Pr \left( E\right) +\Pr \left( \bar{E}\right) =1,&amp;lt;/math&amp;gt; by Axiom &amp;lt;math&amp;gt;2.&amp;lt;/math&amp;gt; This implies that&lt;br /&gt;
&lt;br /&gt;
# &amp;lt;math&amp;gt;0\leq \Pr (E)\leq 1&amp;lt;/math&amp;gt;&lt;br /&gt;
# &amp;lt;math&amp;gt;\Pr (\bar{E})=1-\Pr (E)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first of these is what we might have expected from probability (a number lying between &amp;lt;math&amp;gt;0&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;). The second implication is also very important; it says that the probability of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; not happening is ‘&amp;#039;&amp;#039;one minus the probability of it happening&amp;#039;&amp;#039;’. Thus when rolling a die, the probability of getting &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; is one minus the probability of getting either a &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;2&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;5.&amp;lt;/math&amp;gt;&lt;br /&gt;
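The complement rule in the die example, worked explicitly:&lt;br /&gt;

```python
from fractions import Fraction

pr_one_to_five = 5 * Fraction(1, 6)   # Pr(1 or 2 or 3 or 4 or 5), by Axiom 3
pr_six = 1 - pr_one_to_five           # complement rule: Pr(E) = 1 - Pr(not E)
assert pr_six == Fraction(1, 6)
```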
&lt;br /&gt;
These axioms imply how to calculate probabilities on a sample space of equally likely outcomes. For example, as we have already noted, the experiment of rolling a fair die defines a sample space of six mutually exclusive and equally likely outcomes (&amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; dots on the up-turned face). The axioms then say that the six probabilities are positive, sum to 1 and are all equal. Thus, the probability of any one of the outcomes must simply be &amp;lt;math&amp;gt;\frac{1}{6},&amp;lt;/math&amp;gt; which may accord with your intuition. A similar analysis reveals that the probability of drawing a club from a deck of &amp;lt;math&amp;gt;52&amp;lt;/math&amp;gt; cards is &amp;lt;math&amp;gt;\frac{13}{52},&amp;lt;/math&amp;gt; since any one of the &amp;lt;math&amp;gt;52&amp;lt;/math&amp;gt; cards has an equal chance of being drawn and &amp;lt;math&amp;gt;13&amp;lt;/math&amp;gt; of them are clubs. Notice the importance of the assumption of equally likely outcomes here.&lt;br /&gt;
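Counting equally likely outcomes, as in the club example:&lt;br /&gt;

```python
from fractions import Fraction

# 52 equally likely cards, 13 of which are clubs.
cards = 52
clubs = 13
pr_club = Fraction(clubs, cards)   # 13/52, which reduces to 1/4
assert pr_club == Fraction(1, 4)
```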
&lt;br /&gt;
In this, and the next section of notes, we shall see how these axioms can be used. Firstly, consider the construction of a probability for the &amp;#039;&amp;#039;union&amp;#039;&amp;#039; of two events; i.e., the probability that &amp;#039;&amp;#039;either &amp;#039;&amp;#039;&amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; or (perhaps) &amp;#039;&amp;#039;both &amp;#039;&amp;#039;will occur. Such a probability is embodied in the &amp;#039;&amp;#039;addition rule of probability&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
== The addition rule of probability ==&lt;br /&gt;
&lt;br /&gt;
When rolling a fair die, let &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; denote the event of an “odd number of dots” and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; the event of the “number of dots being greater than, or equal to, &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt;”. What is the probability of the event &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;? To calculate this we can collect together all the mutually exclusive (simple) events which comprise &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;, and then add up their probabilities (by Axiom 3). These simple events are &amp;lt;math&amp;gt;1,3,4,5&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; dots. Each has a probability of &amp;lt;math&amp;gt;\frac{1}{6},&amp;lt;/math&amp;gt; so the required total probability is &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) =\frac{5}{6}&amp;lt;/math&amp;gt;. Consider carefully how this probability is constructed and note, in particular, that &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) \neq \Pr \left( E\right) +\Pr \left( F\right) &amp;lt;/math&amp;gt; since &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; have a simple event in common (namely &amp;lt;math&amp;gt;5&amp;lt;/math&amp;gt; dots).&lt;br /&gt;
&lt;br /&gt;
In general, we can calculate the probability of the union of events using the &amp;#039;&amp;#039;addition rule of probability&amp;#039;&amp;#039;, as follows.&lt;br /&gt;
&lt;br /&gt;
* For any events, &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F\subset S:\Pr (E\cup F)=\Pr (E)+\Pr (F)-\Pr (E\cap F).&amp;lt;/math&amp;gt; So, in general, &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) \leq \Pr (E)+\Pr (F).&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generalises to three events, &amp;lt;math&amp;gt;E_{1},E_{2}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;E_{3}&amp;lt;/math&amp;gt; as&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr (E_{1}\cup E_{2}\cup E_{3}) &amp;amp;=&amp;amp;\Pr (E_{1})+\Pr (E_{2})+\Pr (E_{3}) \\&lt;br /&gt;
&amp;amp;&amp;amp;-\Pr (E_{1}\cap E_{2})-\Pr (E_{1}\cap E_{3})-\Pr (E_{2}\cap E_{3}) \\&lt;br /&gt;
&amp;amp;&amp;amp;+\Pr (E_{1}\cap E_{2}\cap E_{3}).\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can demonstrate this as follows.&lt;br /&gt;
&lt;br /&gt;
Note that&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E\cup F=\left( E\cap \bar{F}\right) \cup \left( E\cap F\right) \cup \left(\bar{E}\cap F\right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
the union of &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; mutually exclusive events. These mutually exclusive events are depicted by the shaded areas &amp;lt;math&amp;gt;\mathbf{a,}&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\mathbf{b}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\mathbf{c}&amp;lt;/math&amp;gt;, respectively, in the next Figure.&lt;br /&gt;
&lt;br /&gt;
[[File:Prob_add.jpg|frameless|500px]]&lt;br /&gt;
&lt;br /&gt;
Then, since the three events &amp;lt;math&amp;gt;\left( E\cap\bar{F}\right) &amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;\left( E\cap F\right) &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\left( \bar{E}\cap F\right)&amp;lt;/math&amp;gt; are mutually exclusive (so that the “area” occupied by &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt; is simply &amp;lt;math&amp;gt;\mathbf{a+b+c}&amp;lt;/math&amp;gt;), Axiom &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; gives&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Pr \left( E\cup F\right) =\Pr \left( E\cap \bar{F}\right) +\Pr \left( \bar{E}\cap F\right) +\Pr \left( E\cap F\right) .&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
But also by Axiom &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt;, since &amp;lt;math&amp;gt;E=\left( E\cap \bar{F}\right) \cup \left(E\cap F\right) &amp;lt;/math&amp;gt;, it must be that &amp;lt;math&amp;gt;\Pr (E)=\Pr \left( E\cap \bar{F}\right)+\Pr (E\cap F);&amp;lt;/math&amp;gt; similarly, &amp;lt;math&amp;gt;\Pr \left( \bar{E}\cap F\right) =\Pr \left(F\right) -\Pr \left( E\cap F\right)&amp;lt;/math&amp;gt;. Putting all of this together gives&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Pr (E\cup F)=\Pr (E)+\Pr (F)-\Pr (E\cap F).&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; are mutually exclusive, so that &amp;lt;math&amp;gt;E\cap F=\emptyset&amp;lt;/math&amp;gt;, this rule reduces to Axiom 3: &amp;lt;math&amp;gt;\Pr (E\cup F)=\Pr (E)+\Pr (F).&amp;lt;/math&amp;gt;&lt;br /&gt;
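&lt;br /&gt;
The worked die example above can be checked by direct enumeration. The following is a minimal Python sketch (not part of the original notes), representing the events as sets of outcomes:&lt;br /&gt;

```python
from fractions import Fraction

# Sample space of a fair die and the two events from the text:
# E = "odd number of dots", F = "at least 4 dots".
S = set(range(1, 7))
E = {1, 3, 5}
F = {4, 5, 6}

def pr(event):
    # Equally likely outcomes: probability = favourable outcomes over total outcomes.
    return Fraction(len(event), len(S))

lhs = pr(E.union(F))
rhs = pr(E) + pr(F) - pr(E.intersection(F))
print(lhs, rhs)  # both 5/6
```

Counting the union directly and applying the addition rule give the same answer, as the derivation requires.&lt;br /&gt;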
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example: &amp;#039;&amp;#039;What is the probability of drawing a Queen (&amp;lt;math&amp;gt;Q &amp;lt;/math&amp;gt;) or a Club (&amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt;) in a single draw from a pack of cards? Now, &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; out of &amp;lt;math&amp;gt;52 &amp;lt;/math&amp;gt; cards are Queens, so &amp;lt;math&amp;gt;\Pr \left( Q\right) =\frac{4}{52},&amp;lt;/math&amp;gt; whilst &amp;lt;math&amp;gt;\Pr\left( C\right) =\frac{13}{52}.&amp;lt;/math&amp;gt; The probability of drawing the Queen of Clubs is simply &amp;lt;math&amp;gt;\frac{1}{52};&amp;lt;/math&amp;gt; i.e., &amp;lt;math&amp;gt;\Pr \left( Q\cap C\right) =\frac{1}{52}&amp;lt;/math&amp;gt;. What we require is a Club or a Queen, for which the probability is&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr \left( Q\cup C\right) &amp;amp;=&amp;amp;\Pr \left( Q\right) +\Pr \left( C\right) -\Pr\left( Q\cap C\right) \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{4}{52}+\frac{13}{52}-\frac{1}{52} \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{16}{52}=\frac{4}{13}.\end{aligned}&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example: &amp;#039;&amp;#039;Consider a car journey from Manchester to London via the M6 and M1. Let &amp;lt;math&amp;gt;E=&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;heavy traffic somewhere en route&amp;#039;&amp;#039; and &amp;lt;math&amp;gt;F=&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;roadworks somewhere en route&amp;#039;&amp;#039;. It is estimated that &amp;lt;math&amp;gt;\Pr (E)=0.8&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Pr (F)=0.4,&amp;lt;/math&amp;gt; whilst the probability of NOT encountering both is &amp;lt;math&amp;gt;\Pr (\overline{E\cap F})=0.6.&amp;lt;/math&amp;gt; What is the probability of encountering heavy traffic or roadworks?&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;We require &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) .&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr (E\cup F) &amp;amp;=&amp;amp;\Pr (E)+\Pr (F)-\Pr (E\cap F) \\&lt;br /&gt;
&amp;amp;=&amp;amp;\Pr (E)+\Pr (F)-(1-\Pr (\overline{E\cap F})) \\&lt;br /&gt;
&amp;amp;=&amp;amp;0.8+0.4-1+0.6 \\&lt;br /&gt;
&amp;amp;=&amp;amp;0.8=\Pr (E)\end{aligned}&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Notice that this implies, in this case, that &amp;lt;math&amp;gt;F\subset E&amp;lt;/math&amp;gt; (why?). This &amp;#039;&amp;#039;model&amp;#039;&amp;#039; then implies that when there are roadworks somewhere en route you are bound to encounter heavy traffic; on the other hand, you can encounter heavy traffic en route without ever passing through roadworks. (My own experience of this motorway inclines me towards this implication!)&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
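&lt;br /&gt;
The card-drawing example above can be verified by counting over an explicitly constructed deck; a short Python sketch (purely illustrative, with ranks and suits as labels):&lt;br /&gt;

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = set(product(ranks, suits))  # 52 equally likely cards

queens = {card for card in deck if card[0] == "Q"}
clubs = {card for card in deck if card[1] == "clubs"}

# Pr(Q union C) by counting: 4 + 13 - 1 = 16 favourable cards out of 52.
p_union = Fraction(len(queens.union(clubs)), len(deck))
print(p_union)  # 4/13
```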
&lt;br /&gt;
Similar concepts apply when manipulating proportions as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example&amp;#039;&amp;#039;&amp;lt;math&amp;gt;:&amp;lt;/math&amp;gt; A sample of &amp;lt;math&amp;gt;1000&amp;lt;/math&amp;gt; undergraduates were asked whether they took Mathematics, Physics or Chemistry at A-level. The following responses were obtained: &amp;lt;math&amp;gt;100&amp;lt;/math&amp;gt; just took Mathematics; &amp;lt;math&amp;gt;70&amp;lt;/math&amp;gt; just took Physics; &amp;lt;math&amp;gt;100&amp;lt;/math&amp;gt; just took Chemistry; &amp;lt;math&amp;gt;150&amp;lt;/math&amp;gt; took Mathematics and Physics, but not Chemistry; &amp;lt;math&amp;gt;40&amp;lt;/math&amp;gt; took Mathematics and Chemistry, but not Physics; and &amp;lt;math&amp;gt;240&amp;lt;/math&amp;gt; took Physics and Chemistry, but not Mathematics. What proportion took all three?&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;This can be addressed with the following diagram:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;[[File:Prob_Alevels.jpg|frameless|600px]]&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;The shaded area contains the number who took all three, which can be deduced from the above information, since the total of the numbers assigned to each part of the Venn diagram must be &amp;lt;math&amp;gt;1000&amp;lt;/math&amp;gt;: the six given categories account for &amp;lt;math&amp;gt;100+70+100+150+40+240=700&amp;lt;/math&amp;gt; students, leaving &amp;lt;math&amp;gt;300&amp;lt;/math&amp;gt; who took all three. The answer is therefore &amp;lt;math&amp;gt;30\%&amp;lt;/math&amp;gt; (being &amp;lt;math&amp;gt;300&amp;lt;/math&amp;gt; out of &amp;lt;math&amp;gt;1000&amp;lt;/math&amp;gt;).&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Two further results on unions, intersections and complements which are of use (and which are fairly easy to demonstrate using Venn diagrams) are the &amp;#039;&amp;#039;&amp;#039;de Morgan laws&amp;#039;&amp;#039;&amp;#039;:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\left( \bar{A}\cap \bar{B}\right) =\left( \overline{A\cup B}\right) &amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\bar{A}\cup \bar{B}=\left( \overline{A\cap B}\right) &amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
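&lt;br /&gt;
The de Morgan laws can be confirmed by brute force on any small finite sample space; a Python sketch with two arbitrarily chosen events:&lt;br /&gt;

```python
# An illustrative finite sample space and two arbitrary events on it.
S = set(range(1, 21))
A = set(range(1, 11))   # outcomes 1 to 10
B = set(range(5, 16))   # outcomes 5 to 15

def complement(event):
    return S.difference(event)

# First law: complement(A) intersect complement(B) equals complement(A union B).
law1 = complement(A).intersection(complement(B)) == complement(A.union(B))
# Second law: complement(A) union complement(B) equals complement(A intersect B).
law2 = complement(A).union(complement(B)) == complement(A.intersection(B))
print(law1, law2)  # True True
```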
&lt;br /&gt;
= Footnotes =&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=File:Prob_add.jpg&amp;diff=2855</id>
		<title>File:Prob add.jpg</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=File:Prob_add.jpg&amp;diff=2855"/>
				<updated>2013-08-07T21:10:25Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=File:Prob_Alevels.jpg&amp;diff=2854</id>
		<title>File:Prob Alevels.jpg</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=File:Prob_Alevels.jpg&amp;diff=2854"/>
				<updated>2013-08-07T21:09:58Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Intro&amp;diff=2853</id>
		<title>Probability Intro</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Intro&amp;diff=2853"/>
				<updated>2013-08-07T21:09:20Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
&lt;br /&gt;
= Introducing Probability =&lt;br /&gt;
&lt;br /&gt;
So far we have been looking at ways of summarising samples of data drawn from an underlying population of interest. Although at times tedious, all such arithmetic calculations are fairly mechanical and straightforward to apply. To remind ourselves, one of the primary reasons for wishing to summarise data is to assist in the development of inferences about the population from which the data were taken. That is to say, we would like to elicit some information about the mechanism which generated the observed data.&lt;br /&gt;
&lt;br /&gt;
We now start on the process of developing mathematical ways of formulating inferences, and this requires the use of &amp;#039;&amp;#039;probability&amp;#039;&amp;#039;. This becomes clear if we think back to one of the early questions posed in this course: &amp;#039;&amp;#039;prior to sampling, is it possible to predict with absolute certainty what will be observed&amp;#039;&amp;#039;? The answer to this question is &amp;#039;&amp;#039;no&amp;#039;&amp;#039;, although it would be of interest to know how &amp;#039;&amp;#039;likely&amp;#039;&amp;#039; it is that certain values would be observed. Or, what is the &amp;#039;&amp;#039;probability&amp;#039;&amp;#039; of observing certain values?&lt;br /&gt;
&lt;br /&gt;
Before proceeding, we need some more tools:&lt;br /&gt;
&lt;br /&gt;
= Venn diagrams =&lt;br /&gt;
&lt;br /&gt;
Venn diagrams (and diagrams in general) are of enormous help in trying to understand and manipulate probability. We begin with some basic definitions, some of which we have encountered before.&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Experiment:&amp;#039;&amp;#039;&amp;#039; any process which, when applied, provides data or an outcome; e.g., rolling a die and observing the number of dots on the upturned face; recording the amount of rainfall in Manchester over a period of time.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Sample Space:&amp;#039;&amp;#039;&amp;#039; set of possible outcomes of an experiment; e.g., &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; (or &amp;lt;math&amp;gt;\Omega &amp;lt;/math&amp;gt;) &amp;lt;math&amp;gt;=&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\{1,2,3,4,5,6\}&amp;lt;/math&amp;gt;, which is the sample space of rolling a die. Or &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;=&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\{x;x\geq 0\}&amp;lt;/math&amp;gt;, which is the sample space of an experiment where the outcomes can be any real non-negative number, or ‘&amp;#039;&amp;#039;the set of non-negative real numbers&amp;#039;&amp;#039;’.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Simple Event&amp;#039;&amp;#039;&amp;#039;: just one of the possible outcomes in &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Event:&amp;#039;&amp;#039;&amp;#039; a &amp;#039;&amp;#039;subset&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, denoted &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt;; e.g., &amp;lt;math&amp;gt;E=\left\{ 2,4,6\right\}&amp;lt;/math&amp;gt; (i.e. any even number on a die) or &amp;lt;math&amp;gt;E=\left\{ x;4&amp;lt;x\leq 10\right\} ,&amp;lt;/math&amp;gt; which means ‘&amp;#039;&amp;#039;the set of real numbers which are strictly bigger than&amp;#039;&amp;#039; &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;but less than or equal to &amp;#039;&amp;#039;&amp;lt;math&amp;gt;10&amp;lt;/math&amp;gt;’.&amp;lt;br /&amp;gt;Note that an event, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, is a collection of simple events.&lt;br /&gt;
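&lt;br /&gt;
These definitions translate directly into code when the sample space is finite; a minimal Python sketch for the die experiment (illustrative only):&lt;br /&gt;

```python
S = set(range(1, 7))   # sample space of rolling a die: 1 to 6 dots
E = {2, 4, 6}          # event: an even number of dots
simple_event = {3}     # a simple event is a single outcome

# An event is a subset of the sample space, i.e. a collection of simple events.
is_event = E.issubset(S)
print(is_event)  # True
```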
&lt;br /&gt;
Such concepts can be represented by means of the following Venn Diagram:&lt;br /&gt;
&lt;br /&gt;
[[File:Venn_1.jpg|frameless|600px]]&lt;br /&gt;
&lt;br /&gt;
The sample space, &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; is depicted as a closed rectangle, and the event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; is a closed loop wholly contained within &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;; in set notation we write &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In dealing with probability, and in particular the probability of an event (or events) occurring, we shall need to be familiar with &amp;#039;&amp;#039;&amp;#039;UNIONS, INTERSECTIONS&amp;#039;&amp;#039;&amp;#039; and &amp;#039;&amp;#039;&amp;#039;COMPLEMENTS&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
To illustrate these concepts, consider the sample space &amp;lt;math&amp;gt;S=\{x;x\geq 0\},\,&amp;lt;/math&amp;gt; with the following events defined on &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; as depicted in Figure 3.2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E=\{x;4&amp;lt;x\leq 10\},\,F=\{x;7&amp;lt;x\leq 17\},\,G=\{x;x&amp;gt;15\},\,H=\{x;9&amp;lt;x\leq 13\}. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
| (a) Event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;: A closed loop&lt;br /&gt;
| (b) Union: &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2a.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2b.jpg|frameless|300px]]&lt;br /&gt;
|-&lt;br /&gt;
| (c) Intersection: &amp;lt;math&amp;gt;E\cap F&amp;lt;/math&amp;gt;&lt;br /&gt;
| (d) The Null set/event: &amp;lt;math&amp;gt;E\cap G=\emptyset &amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2c.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2d.jpg|frameless|300px]]&lt;br /&gt;
|-&lt;br /&gt;
| (e) Complement of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;\bar{E}&amp;lt;/math&amp;gt;&lt;br /&gt;
| (f) Subset of &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;H\subset F&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;H\cap F=H&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2e.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2f.jpg|frameless|300px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
* The &amp;#039;&amp;#039;union&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;E\cup F,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E\cup F=\{x;4&amp;lt;x\leq 17\};&amp;lt;/math&amp;gt; i.e., it contains elements (simple events) which are either in &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; or in &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; or (perhaps) in both. This is illustrated on the Venn diagram by the dark shaded area in diagram (b).&lt;br /&gt;
* The &amp;#039;&amp;#039;intersection&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;E\cap F,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E\cap F=\left\{ x;7&amp;lt;x\leq 10\right\} ;&amp;lt;/math&amp;gt; i.e., it contains elements (simple events) which are common to both &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F.&amp;lt;/math&amp;gt; Again this is depicted by the dark shaded area in (c). If events have no elements in common (as, for example, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;) then they are said to be &amp;#039;&amp;#039;mutually exclusive&amp;#039;&amp;#039;, and we can write &amp;lt;math&amp;gt;E\cap G=\emptyset ,&amp;lt;/math&amp;gt; meaning the &amp;#039;&amp;#039;null set&amp;#039;&amp;#039; which contains no elements. Such a situation is illustrated on the Venn Diagram by events (the two shaded closed loops in (d)) which do not overlap. Notice however that &amp;lt;math&amp;gt;G\cap F\neq \emptyset ,&amp;lt;/math&amp;gt; since &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; have elements in common.&lt;br /&gt;
* The &amp;#039;&amp;#039;complement&amp;#039;&amp;#039; of an event &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; say, is everything defined on the sample space which is not in &amp;lt;math&amp;gt;E.&amp;lt;/math&amp;gt; This event is denoted &amp;lt;math&amp;gt;\bar{E}&amp;lt;/math&amp;gt;, the dark shaded area in (e); here &amp;lt;math&amp;gt;\bar{E}=\left\{ x;x\leq 4\right\} \cup \left\{ x;x&amp;gt;10\right\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
* Finally note that &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; is a subset of &amp;lt;math&amp;gt;F;&amp;lt;/math&amp;gt; see (f). It is depicted as the dark closed loop wholly contained within &amp;lt;math&amp;gt;F,&amp;lt;/math&amp;gt; the lighter shaded area, so that &amp;lt;math&amp;gt;H\cap F=H;&amp;lt;/math&amp;gt; if an element in the sample space is a member of &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; then it must also be a member of &amp;lt;math&amp;gt;F.&amp;lt;/math&amp;gt; (In mathematical logic, we employ this scenario to indicate that “&amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; implies &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;”, but not necessarily vice-versa.) Notice that &amp;lt;math&amp;gt;G\cap H=\emptyset &amp;lt;/math&amp;gt; but &amp;lt;math&amp;gt;H\cap E\neq \emptyset&amp;lt;/math&amp;gt;.&lt;br /&gt;
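&lt;br /&gt;
Since the four events above are intervals of real numbers, they cannot be enumerated directly; restricting attention to the integers they contain (and truncating the sample space at 20, purely for illustration) gives a discretised Python sketch of the same set operations:&lt;br /&gt;

```python
# Integer points of S, E = (4, 10], F = (7, 17], G = (15, inf), H = (9, 13],
# with the sample space truncated at 20 for display purposes.
S = set(range(0, 21))
E = set(range(5, 11))
F = set(range(8, 18))
G = set(range(16, 21))
H = set(range(10, 14))

union_EF = E.union(F)          # elements in E or F or both
inter_EF = E.intersection(F)   # elements common to E and F
null_EG = E.intersection(G)    # mutually exclusive events: the empty set
comp_E = S.difference(E)       # complement of E within S

print(len(union_EF))                          # 13
print(sorted(inter_EF))                       # [8, 9, 10]
print(null_EG == set())                       # True
print(H.issubset(F), H.intersection(F) == H)  # True True
```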
&lt;br /&gt;
= Probability =&lt;br /&gt;
&lt;br /&gt;
The term &amp;#039;&amp;#039;probability&amp;#039;&amp;#039; (or some equivalent) is used in everyday conversation and so cannot be unfamiliar to the reader. We talk of the probability, or chance, of rain; the likelihood of England winning the World Cup; or, perhaps more scientifically, the chance of getting a &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; when rolling a die. What we shall now do is develop a coherent theory of probability; a theory which allows us to combine and manipulate probabilities in a consistent and meaningful manner. We shall describe ways of dealing with, and describing, uncertainty. This will involve &amp;#039;&amp;#039;rules&amp;#039;&amp;#039; which govern our use of terms like probability.&lt;br /&gt;
&lt;br /&gt;
There have been a number of different approaches to (interpretations of) probability. Most depend, at least to some extent, on the notion of relative frequency, as now described:&lt;br /&gt;
&lt;br /&gt;
* Suppose an experiment has an outcome of interest &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;. The &amp;#039;&amp;#039;relative frequency interpretation&amp;#039;&amp;#039; of probability says that, if the experiment can be repeated a large number of times, the relative frequency of observing the outcome &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; will settle down to a &amp;#039;&amp;#039;number&amp;#039;&amp;#039;, denoted &amp;lt;math&amp;gt;\Pr (E),&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;P(E)&amp;lt;/math&amp;gt; or Prob&amp;lt;math&amp;gt;(E),&amp;lt;/math&amp;gt; called the &amp;#039;&amp;#039;&amp;#039;probability&amp;#039;&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
This is illustrated in the next Figure where the proportion of heads obtained after &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; flips of a fair coin is plotted against &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;, as &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; increases; e.g., of the first &amp;lt;math&amp;gt;100&amp;lt;/math&amp;gt; flips, &amp;lt;math&amp;gt;55&amp;lt;/math&amp;gt; were heads (&amp;lt;math&amp;gt;55\%&amp;lt;/math&amp;gt;). Notice that the plot becomes less ‘wobbly’ after about &amp;lt;math&amp;gt;n=220&amp;lt;/math&amp;gt; and appears to be settling down to the value of &amp;lt;math&amp;gt;\frac{1}{2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
[[File:Prob_coin.jpg|frameless|600px]]&lt;br /&gt;
&lt;br /&gt;
Due to this interpretation of probability, we often use observed sample proportions to approximate underlying probabilities of interest; see, for example, Question 4 of Exercise 2. There are, of course, other interpretations of probability; e.g., the subjective interpretation which simply expresses the strength of one’s belief about an event of interest such as whether Manchester United will win the European Cup! Any one of these interpretations can be used in practical situations provided the implied notion of probability follows a simple set of &amp;#039;&amp;#039;axioms&amp;#039;&amp;#039; or &amp;#039;&amp;#039;rules&amp;#039;&amp;#039;.&lt;br /&gt;
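&lt;br /&gt;
The settling-down behaviour shown in the coin-flipping Figure is easy to reproduce by simulation; a Python sketch (the seed is arbitrary, chosen only for reproducibility):&lt;br /&gt;

```python
import random

random.seed(1)  # arbitrary seed so the run is reproducible

n = 10000
heads = 0
proportions = []
for flips in range(1, n + 1):
    heads += random.randint(0, 1)      # 1 counts as heads, 0 as tails
    proportions.append(heads / flips)  # running proportion of heads

# The early proportions wobble; the later ones settle near one half.
print(proportions[99], proportions[-1])
```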
&lt;br /&gt;
== The axioms of probability ==&lt;br /&gt;
&lt;br /&gt;
There are just &amp;#039;&amp;#039;three&amp;#039;&amp;#039; basic rules that must be obeyed when dealing with probabilities:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;For any event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; defined on &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; i.e., &amp;lt;math&amp;gt;E\subset S,\,\,\Pr (E)\geq 0&amp;lt;/math&amp;gt;; &amp;#039;&amp;#039;probabilities are non-negative&amp;#039;&amp;#039;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (S)=1;&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;having defined the sample space of outcomes, one of these outcomes must be observed&amp;#039;&amp;#039;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;If events &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; are mutually exclusive defined on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, so that &amp;lt;math&amp;gt;E\cap F=\emptyset &amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) =\Pr \left( E\right)+\Pr \left( F\right) .&amp;lt;/math&amp;gt; In general, for any set of mutually exclusive events, &amp;lt;math&amp;gt;E_{1},E_{2},\ldots ,E_{k},&amp;lt;/math&amp;gt; defined on &amp;lt;math&amp;gt;S:&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (E_{1}\cup E_{2}\cup \ldots \cup E_{k})=\Pr (E_{1})+\Pr (E_{2})+\ldots +\Pr (E_{k})&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;i.e., &amp;lt;math&amp;gt;\Pr \left( \bigcup_{j=1}^{k}E_{j}\right) =\sum_{j=1}^{k}\Pr (E_{j}).&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In terms of the Venn Diagram, one can (and should) usefully think of the area of &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; relative to that of &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; as providing an indication of probability. (Note, from Axiom 2, that the area of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; is implicitly normalised to be unity.)&lt;br /&gt;
&lt;br /&gt;
Also observe that, contrary to what you may have believed, it is not one of the rules that &amp;lt;math&amp;gt;\Pr (E)\leq 1&amp;lt;/math&amp;gt; for any event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;. Rather, this is an implication of the &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; rules given:&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Implications: &amp;#039;&amp;#039;&amp;#039;it must be that for any event &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; defined on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;E\cap \bar{E}=\emptyset &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;E\cup \bar{E}=S.&amp;lt;/math&amp;gt; By Axiom &amp;lt;math&amp;gt;1,&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\Pr (E)\geq 0&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Pr \left( \bar{E}\right) \geq 0&amp;lt;/math&amp;gt;, and by Axiom &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;\Pr(E)+\Pr (\bar{E})=\Pr (S).&amp;lt;/math&amp;gt; So, by Axiom &amp;lt;math&amp;gt;2,&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\Pr \left( E\right) +\Pr \left( \bar{E}\right) =1.&amp;lt;/math&amp;gt; This implies that&lt;br /&gt;
&lt;br /&gt;
# &amp;lt;math&amp;gt;0\leq \Pr (E)\leq 1&amp;lt;/math&amp;gt;&lt;br /&gt;
# &amp;lt;math&amp;gt;\Pr (\bar{E})=1-\Pr (E)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first of these is what we might have expected from probability (a number lying between &amp;lt;math&amp;gt;0&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;). The second implication is also very important; it says that the probability of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; not happening is ‘&amp;#039;&amp;#039;one minus the probability of it happening&amp;#039;&amp;#039;’. Thus when rolling a die, the probability of getting &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; is one minus the probability of getting either a &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;2&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;5.&amp;lt;/math&amp;gt;&lt;br /&gt;
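&lt;br /&gt;
The second implication can be checked on the die example just given; a small Python sketch:&lt;br /&gt;

```python
from fractions import Fraction

S = set(range(1, 7))
E = {6}                   # event: rolling a six
not_E = S.difference(E)   # complement: 1, 2, 3, 4 or 5 dots

p_E = Fraction(len(E), len(S))
p_not_E = Fraction(len(not_E), len(S))
print(p_E, p_not_E)  # 1/6 5/6
```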
&lt;br /&gt;
These axioms imply how to calculate probabilities on a sample space of equally likely outcomes. For example, and as we have already noted, the experiment of rolling a fair die defines a sample space of six mutually exclusive and equally likely outcomes (&amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; dots on the up-turned face). The axioms then say that each of the six probabilities is positive, that they add to 1 and that they are all the same. Thus, the probability of any one of the outcomes must be simply &amp;lt;math&amp;gt;\frac{1}{6},&amp;lt;/math&amp;gt; which may accord with your intuition. A similar sort of analysis reveals that the probability of drawing a club from a deck of &amp;lt;math&amp;gt;52&amp;lt;/math&amp;gt; cards is &amp;lt;math&amp;gt;\frac{13}{52},&amp;lt;/math&amp;gt; since any one of the &amp;lt;math&amp;gt;52&amp;lt;/math&amp;gt; cards has an equal chance of being drawn and &amp;lt;math&amp;gt;13&amp;lt;/math&amp;gt; of them are clubs. Notice the importance of the assumption of equally likely outcomes here.&lt;br /&gt;
&lt;br /&gt;
In this, and the next section of notes, we shall see how these axioms can be used. Firstly, consider the construction of a probability for the &amp;#039;&amp;#039;union&amp;#039;&amp;#039; of two events; i.e., the probability that &amp;#039;&amp;#039;either &amp;#039;&amp;#039;&amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; or (perhaps) &amp;#039;&amp;#039;both &amp;#039;&amp;#039;will occur. Such a probability is embodied in the &amp;#039;&amp;#039;addition rule of probability&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
== The addition rule of probability ==&lt;br /&gt;
&lt;br /&gt;
When rolling a fair die, let &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; denote the event of an “odd number of dots” and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; the event of the “number of dots being greater than or equal to &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt;”. What is the probability of the event &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;? To calculate this, we can collect together all the mutually exclusive (simple) events which comprise &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;, and then add up their probabilities (by Axiom 3). These simple events are &amp;lt;math&amp;gt;1,3,4,5&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; dots. Each has a probability of &amp;lt;math&amp;gt;\frac{1}{6},&amp;lt;/math&amp;gt; so the required total probability is &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) =\frac{5}{6}&amp;lt;/math&amp;gt;. Consider carefully how this probability is constructed and note, in particular, that &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) \neq \Pr \left( E\right) +\Pr \left( F\right) &amp;lt;/math&amp;gt;, since &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; have a simple event in common (namely &amp;lt;math&amp;gt;5&amp;lt;/math&amp;gt; dots).&lt;br /&gt;
&lt;br /&gt;
In general, we can calculate the probability of the union of events using the &amp;#039;&amp;#039;addition rule of probability&amp;#039;&amp;#039;, as follows.&lt;br /&gt;
&lt;br /&gt;
* For any events, &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F\subset S:\Pr (E\cup F)=\Pr (E)+\Pr (F)-\Pr (E\cap F).&amp;lt;/math&amp;gt; So, in general, &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) \leq \Pr (E)+\Pr (F).&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generalises to three events, &amp;lt;math&amp;gt;E_{1},E_{2}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;E_{3}&amp;lt;/math&amp;gt; as&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr (E_{1}\cup E_{2}\cup E_{3}) &amp;amp;=&amp;amp;\Pr (E_{1})+\Pr (E_{2})+\Pr (E_{3}) \\&lt;br /&gt;
&amp;amp;&amp;amp;-\Pr (E_{1}\cap E_{2})-\Pr (E_{1}\cap E_{3})-\Pr (E_{2}\cap E_{3}) \\&lt;br /&gt;
&amp;amp;&amp;amp;+\Pr (E_{1}\cap E_{2}\cap E_{3}).\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can demonstrate this as follows.&lt;br /&gt;
&lt;br /&gt;
Note that&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E\cup F=\left( E\cap \bar{F}\right) \cup \left( E\cap F\right) \cup \left(\bar{E}\cap F\right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
i.e., the union of three mutually exclusive events. These mutually exclusive events are depicted by the shaded areas &amp;lt;math&amp;gt;\mathbf{a},&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\mathbf{b}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\mathbf{c}&amp;lt;/math&amp;gt;, respectively, in the next Figure.&lt;br /&gt;
&lt;br /&gt;
[[File:Prob_add.jpg|frameless|600px]]&lt;br /&gt;
&lt;br /&gt;
Then, since the three events &amp;lt;math&amp;gt;\left( E\cap\bar{F}\right) &amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;\left( E\cap F\right) &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\left( \bar{E}\cap F\right)&amp;lt;/math&amp;gt; are mutually exclusive (so that the “area” occupied by &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt; is simply &amp;lt;math&amp;gt;\mathbf{a+b+c}&amp;lt;/math&amp;gt;), Axiom &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; gives&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Pr \left( E\cup F\right) =\Pr \left( E\cap \bar{F}\right) +\Pr \left( \bar{E}\cap F\right) +\Pr \left( E\cap F\right) .&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
But also by Axiom &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt;, since &amp;lt;math&amp;gt;E=\left( E\cap \bar{F}\right) \cup \left(E\cap F\right) &amp;lt;/math&amp;gt;, it must be that &amp;lt;math&amp;gt;\Pr (E)=\Pr \left( E\cap \bar{F}\right)+\Pr (E\cap F);&amp;lt;/math&amp;gt; similarly, &amp;lt;math&amp;gt;\Pr \left( \bar{E}\cap F\right) =\Pr \left(F\right) -\Pr \left( E\cap F\right)&amp;lt;/math&amp;gt;. Putting all of this together gives&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Pr (E\cup F)=\Pr (E)+\Pr (F)-\Pr (E\cap F).&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; are mutually exclusive, so that &amp;lt;math&amp;gt;E\cap F=\emptyset&amp;lt;/math&amp;gt;, this rule reduces to Axiom 3: &amp;lt;math&amp;gt;\Pr (E\cup F)=\Pr (E)+\Pr (F).&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example: &amp;#039;&amp;#039;What is the probability of drawing a Queen (&amp;lt;math&amp;gt;Q &amp;lt;/math&amp;gt;) or a Club (&amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt;) in a single draw from a pack of cards? Now, &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; out of &amp;lt;math&amp;gt;52 &amp;lt;/math&amp;gt; cards are Queens, so &amp;lt;math&amp;gt;\Pr \left( Q\right) =\frac{4}{52},&amp;lt;/math&amp;gt; whilst &amp;lt;math&amp;gt;\Pr\left( C\right) =\frac{13}{52}.&amp;lt;/math&amp;gt; The probability of drawing the Queen of Clubs is simply &amp;lt;math&amp;gt;\frac{1}{52};&amp;lt;/math&amp;gt; i.e., &amp;lt;math&amp;gt;\Pr \left( Q\cap C\right) =\frac{1}{52}&amp;lt;/math&amp;gt;. What we require is a Club or a Queen, for which the probability is&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr \left( Q\cup C\right) &amp;amp;=&amp;amp;\Pr \left( Q\right) +\Pr \left( C\right) -\Pr\left( Q\cap C\right) \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{4}{52}+\frac{13}{52}-\frac{1}{52} \\&lt;br /&gt;
&amp;amp;=&amp;amp;\frac{16}{52}=\frac{4}{13}.\end{aligned}&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example: &amp;#039;&amp;#039;Consider a car journey from Manchester to London via the M6 and M1. Let &amp;lt;math&amp;gt;E=&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;heavy traffic somewhere on route&amp;#039;&amp;#039; and &amp;lt;math&amp;gt;F=&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;roadworks somewhere on route&amp;#039;&amp;#039;. It is estimated that &amp;lt;math&amp;gt;\Pr (E)=0.8&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Pr (F)=0.4,&amp;lt;/math&amp;gt; whilst the probability of NOT encountering both is &amp;lt;math&amp;gt;\Pr (\overline{E\cap F})=0.6.&amp;lt;/math&amp;gt; What is the probability of encountering heavy traffic or roadworks?&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;We require &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) .&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr (E\cup F) &amp;amp;=&amp;amp;\Pr (E)+\Pr (F)-\Pr (E\cap F) \\&lt;br /&gt;
&amp;amp;=&amp;amp;\Pr (E)+\Pr (F)-(1-\Pr (\overline{E\cap F})) \\&lt;br /&gt;
&amp;amp;=&amp;amp;0.8+0.4-1+0.6 \\&lt;br /&gt;
&amp;amp;=&amp;amp;0.8=\Pr (E)\end{aligned}&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Notice that this implies, in this case, that &amp;lt;math&amp;gt;F\subset E&amp;lt;/math&amp;gt; (why?). This &amp;#039;&amp;#039;model &amp;#039;&amp;#039;then implies that when there are roadworks somewhere on route you are bound to encounter heavy traffic; on the other hand, you can encounter heavy traffic on route without ever passing through roadworks. (My own experience of this motorway inclines me towards this implication!)&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
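The motorway calculation can be replicated directly; a short Python sketch with the probabilities held as exact fractions (the variable names are ours):

```python
from fractions import Fraction

pr_E = Fraction(8, 10)          # Pr(heavy traffic somewhere on route)
pr_F = Fraction(4, 10)          # Pr(roadworks somewhere on route)
pr_not_both = Fraction(6, 10)   # Pr(complement of E ∩ F)

pr_E_and_F = 1 - pr_not_both            # complement rule
pr_E_or_F = pr_E + pr_F - pr_E_and_F    # addition rule

# The union has the same probability as E alone, consistent with F ⊂ E.
assert pr_E_or_F == pr_E == Fraction(4, 5)
```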
&lt;br /&gt;
Similar concepts apply when manipulating proportions as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;#039;&amp;#039;Example&amp;#039;&amp;#039;&amp;lt;math&amp;gt;:&amp;lt;/math&amp;gt; A sample of &amp;lt;math&amp;gt;1000&amp;lt;/math&amp;gt; undergraduates were asked whether they took either Mathematics, Physics or Chemistry at A-level. The following responses were obtained: &amp;lt;math&amp;gt;100&amp;lt;/math&amp;gt; just took Mathematics; &amp;lt;math&amp;gt;70&amp;lt;/math&amp;gt; just took Physics; &amp;lt;math&amp;gt;100&amp;lt;/math&amp;gt; just took Chemistry; &amp;lt;math&amp;gt;150&amp;lt;/math&amp;gt; took Mathematics and Physics, but not Chemistry; &amp;lt;math&amp;gt;40&amp;lt;/math&amp;gt; took Mathematics and Chemistry, but not Physics; and, &amp;lt;math&amp;gt;240&amp;lt;/math&amp;gt; took Physics and Chemistry, but not Mathematics. What proportion took all three?&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;This can be addressed with the following diagram:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;[[File:Prob_Alevels.jpg|frameless|600px]]&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;The shaded area contains the number who took all three, which can be deduced from the above information (since the total of the numbers assigned to each part of the Venn diagram must be &amp;lt;math&amp;gt;1000&amp;lt;/math&amp;gt;). The answer is therefore &amp;lt;math&amp;gt;30\%&amp;lt;/math&amp;gt; (being &amp;lt;math&amp;gt;300&amp;lt;/math&amp;gt; out of &amp;lt;math&amp;gt;1000&amp;lt;/math&amp;gt;).&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
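The deduction from the Venn diagram can be written out as a short calculation. The sketch below (Python; variable names are ours) assumes, as the question implies, that every respondent took at least one of the three subjects:

```python
# Region counts from the survey of 1000 undergraduates.
total = 1000
maths_only, physics_only, chem_only = 100, 70, 100
maths_and_physics = 150   # Maths and Physics, not Chemistry
maths_and_chem = 40       # Maths and Chemistry, not Physics
physics_and_chem = 240    # Physics and Chemistry, not Maths

# The regions of the Venn diagram must sum to 1000, so the remaining
# (shaded) region is the number who took all three.
all_three = total - (maths_only + physics_only + chem_only
                     + maths_and_physics + maths_and_chem + physics_and_chem)
proportion = all_three / total

assert all_three == 300 and proportion == 0.3
```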
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;Two further results on unions, intersections and complements which are of use (and which are fairly easy to demonstrate using Venn diagrams) are &amp;#039;&amp;#039;&amp;#039;de Morgan’s Laws&amp;#039;&amp;#039;&amp;#039;:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\left( \bar{A}\cap \bar{B}\right) =\left( \overline{A\cup B}\right) &amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\bar{A}\cup \bar{B}=\left( \overline{A\cap B}\right) &amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Footnotes =&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=File:Prob_coin.jpg&amp;diff=2852</id>
		<title>File:Prob coin.jpg</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=File:Prob_coin.jpg&amp;diff=2852"/>
				<updated>2013-08-07T20:59:35Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Intro&amp;diff=2851</id>
		<title>Probability Intro</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Intro&amp;diff=2851"/>
				<updated>2013-08-07T20:59:13Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
&lt;br /&gt;
= Introducing Probability =&lt;br /&gt;
&lt;br /&gt;
So far we have been looking at ways of summarising samples of data drawn from an underlying population of interest. Although at times tedious, all such arithmetic calculations are fairly mechanical and straightforward to apply. To remind ourselves, one of the primary reasons for wishing to summarise data is to assist in the development of inferences about the population from which the data were taken. That is to say, we would like to elicit some information about the mechanism which generated the observed data.&lt;br /&gt;
&lt;br /&gt;
We now start on the process of developing mathematical ways of formulating inferences and this requires the use of &amp;#039;&amp;#039;probability&amp;#039;&amp;#039;. This becomes clear if we think back to one of the early questions posed in this course: &amp;#039;&amp;#039;prior to sampling is it possible to predict with absolute certainty what will be observed&amp;#039;&amp;#039;? The answer to this question is &amp;#039;&amp;#039;no&amp;#039;&amp;#039;, although it would be of interest to know how &amp;#039;&amp;#039;likely&amp;#039;&amp;#039; it is that certain values would be observed. Or, what is the &amp;#039;&amp;#039;probability&amp;#039;&amp;#039; of observing certain values?&lt;br /&gt;
&lt;br /&gt;
Before proceeding, we need some more tools:&lt;br /&gt;
&lt;br /&gt;
= Venn diagrams =&lt;br /&gt;
&lt;br /&gt;
Venn diagrams (and diagrams in general) are of enormous help in trying to understand, and manipulate probability. We begin with some basic definitions, some of which we have encountered before.&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Experiment:&amp;#039;&amp;#039;&amp;#039; any process which, when applied, provides data or an outcome; e.g., rolling a die and observing the number of dots on the upturned face; recording the amount of rainfall in Manchester over a period of time.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Sample Space:&amp;#039;&amp;#039;&amp;#039; set of possible outcomes of an experiment; e.g., &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; (or &amp;lt;math&amp;gt;\Omega &amp;lt;/math&amp;gt;) &amp;lt;math&amp;gt;=&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\{1,2,3,4,5,6\}&amp;lt;/math&amp;gt;, which is the sample space of rolling a die. Or &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;=&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\{x;x\geq 0\}&amp;lt;/math&amp;gt;, which is the sample space of an experiment where the outcomes can be any non-negative real number, or ‘&amp;#039;&amp;#039;the set of non-negative real numbers&amp;#039;&amp;#039;’.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Simple Event&amp;#039;&amp;#039;&amp;#039;: just one of the possible outcomes on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Event:&amp;#039;&amp;#039;&amp;#039; a &amp;#039;&amp;#039;subset&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, denoted &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt;; e.g., &amp;lt;math&amp;gt;E=\left\{ 2,4,6\right\}&amp;lt;/math&amp;gt; (i.e. any even number on a die) or &amp;lt;math&amp;gt;E=\left\{ x;4&amp;lt;x\leq 10\right\} ,&amp;lt;/math&amp;gt; which means ‘&amp;#039;&amp;#039;the set of real numbers which are strictly bigger than&amp;#039;&amp;#039; &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;but less than or equal to &amp;#039;&amp;#039;&amp;lt;math&amp;gt;10&amp;lt;/math&amp;gt;’.&amp;lt;br /&amp;gt;Note that an event, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, is a collection of simple events.&lt;br /&gt;
&lt;br /&gt;
Such concepts can be represented by means of the following Venn Diagram:&lt;br /&gt;
&lt;br /&gt;
[[File:Venn_1.jpg|frameless|600px]]&lt;br /&gt;
&lt;br /&gt;
The sample space, &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; is depicted as a closed rectangle, and the event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; is a closed loop wholly contained within &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; and we write (in set notation) &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In dealing with probability, and in particular the probability of an event (or events) occurring, we shall need to be familiar with &amp;#039;&amp;#039;&amp;#039;UNIONS, INTERSECTIONS&amp;#039;&amp;#039;&amp;#039; and &amp;#039;&amp;#039;&amp;#039;COMPLEMENTS&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
To illustrate these concepts, consider the sample space &amp;lt;math&amp;gt;S=\{x;x\geq 0\},\,&amp;lt;/math&amp;gt; with the following events defined on &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; as depicted in Figure 3.2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E=\{x;4&amp;lt;x\leq 10\},\,F=\{x;7&amp;lt;x\leq 17\},\,G=\{x;x&amp;gt;15\},\,H=\{x;9&amp;lt;x\leq&lt;br /&gt;
13\}. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
| (a) Event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;: A closed loop&lt;br /&gt;
| (b) Union: &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2a.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2b.jpg|frameless|300px]]&lt;br /&gt;
|-&lt;br /&gt;
| (c) Intersection: &amp;lt;math&amp;gt;E\cap F&amp;lt;/math&amp;gt;&lt;br /&gt;
| (d) The Null set/event: &amp;lt;math&amp;gt;E\cap G=\emptyset &amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2c.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2d.jpg|frameless|300px]]&lt;br /&gt;
|-&lt;br /&gt;
| (e) Complement of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;\bar{E}&amp;lt;/math&amp;gt;&lt;br /&gt;
| (f) Subset of &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;H\subset F&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;H\cap F=H&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2e.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2f.jpg|frameless|300px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
        &lt;br /&gt;
&lt;br /&gt;
* The &amp;#039;&amp;#039;union&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;E\cup F,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E\cup F=\{x;4&amp;lt;x\leq 17\};&amp;lt;/math&amp;gt; i.e., it contains elements (simple events) which are either in &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; or in &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; or (perhaps) in both. This is illustrated on the Venn diagram by the dark shaded area in diagram (b).&lt;br /&gt;
* The &amp;#039;&amp;#039;intersection&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;E\cap F,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E\cap F=\left\{ x;7&amp;lt;x\leq 10\right\} ;&amp;lt;/math&amp;gt; i.e., it contains elements (simple events) which are common to both &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F.&amp;lt;/math&amp;gt; Again this is depicted by the dark shaded area in (c). If events have no elements in common (as, for example, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;) then they are said to be &amp;#039;&amp;#039;mutually exclusive&amp;#039;&amp;#039;, and we can write &amp;lt;math&amp;gt;E\cap G=\emptyset ,&amp;lt;/math&amp;gt; meaning the &amp;#039;&amp;#039;null set&amp;#039;&amp;#039; which contains no elements. Such a situation is illustrated on the Venn Diagram by events (the two shaded closed loops in (d)) which do not overlap. Notice however that &amp;lt;math&amp;gt;G\cap F\neq \emptyset ,&amp;lt;/math&amp;gt; since &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; have elements in common.&lt;br /&gt;
* The &amp;#039;&amp;#039;complement&amp;#039;&amp;#039; of an event &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; say, is everything defined on the sample space which is not in &amp;lt;math&amp;gt;E.&amp;lt;/math&amp;gt; This event is denoted &amp;lt;math&amp;gt;\bar{E}&amp;lt;/math&amp;gt;, the dark shaded area in (e); here &amp;lt;math&amp;gt;\bar{E}=\left\{ x;x\leq 4\right\} \cup \left\{ x;x&amp;gt;10\right\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
* Finally note that &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; is a sub-set of &amp;lt;math&amp;gt;F;&amp;lt;/math&amp;gt; see (f). It is depicted as the dark closed loop wholly contained within &amp;lt;math&amp;gt;F,&amp;lt;/math&amp;gt; the lighter shaded area, so that &amp;lt;math&amp;gt;H\cap F=H;&amp;lt;/math&amp;gt; if an element in the sample space is a member of &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; then it must also be a member of &amp;lt;math&amp;gt;F.&amp;lt;/math&amp;gt; (In mathematical logic, we employ this scenario to indicate that “&amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; implies &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;”, but not necessarily vice-versa.) Notice that &amp;lt;math&amp;gt;G\cap H=\emptyset &amp;lt;/math&amp;gt; but &amp;lt;math&amp;gt;H\cap E\neq \emptyset&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
= Probability =&lt;br /&gt;
&lt;br /&gt;
The term &amp;#039;&amp;#039;probability&amp;#039;&amp;#039; (or some equivalent) is used in everyday conversation and so will be familiar to the reader. We talk of the probability, or chance, of rain; the likelihood of England winning the World Cup; or, perhaps more scientifically, the chance of getting a &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; when rolling a die. What we shall now do is develop a coherent theory of probability; a theory which allows us to combine and manipulate probabilities in a consistent and meaningful manner. We shall describe ways of dealing with, and describing, uncertainty. This will involve &amp;#039;&amp;#039;rules&amp;#039;&amp;#039; which govern our use of terms like probability.&lt;br /&gt;
&lt;br /&gt;
There have been a number of different approaches (interpretations) of probability. Most depend, at least to some extent, on the notion of relative frequency as now described:&lt;br /&gt;
&lt;br /&gt;
* Suppose an experiment has an outcome of interest &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;. The &amp;#039;&amp;#039;relative frequency interpretation&amp;#039;&amp;#039; of probability says that assuming the experiment can be repeated a large number of times then the relative frequency of observing the outcome &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; will settle down to a &amp;#039;&amp;#039;number&amp;#039;&amp;#039;, denoted &amp;lt;math&amp;gt;\Pr (E),&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;P(E)&amp;lt;/math&amp;gt; or Prob&amp;lt;math&amp;gt;(E),&amp;lt;/math&amp;gt; called the &amp;#039;&amp;#039;&amp;#039;probability&amp;#039;&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
This is illustrated in the next Figure where the proportion of heads obtained after &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; flips of a fair coin is plotted against &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;, as &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; increases; e.g., of the first &amp;lt;math&amp;gt;100&amp;lt;/math&amp;gt; flips, &amp;lt;math&amp;gt;55&amp;lt;/math&amp;gt; were heads (&amp;lt;math&amp;gt;55\%&amp;lt;/math&amp;gt;). Notice that the plot becomes less ‘wobbly’ after about &amp;lt;math&amp;gt;n=220&amp;lt;/math&amp;gt; and appears to be settling down to the value of &amp;lt;math&amp;gt;\frac{1}{2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
[[File:Prob_coin.jpg|frameless|600px]]&lt;br /&gt;
&lt;br /&gt;
Due to this interpretation of probability, we often use observed sample proportions to approximate underlying probabilities of interest; see, for example, Question 4 of Exercise 2. There are, of course, other interpretations of probability; e.g., the subjective interpretation which simply expresses the strength of one’s belief about an event of interest such as whether Manchester United will win the European Cup! Any one of these interpretations can be used in practical situations provided the implied notion of probability follows a simple set of &amp;#039;&amp;#039;axioms&amp;#039;&amp;#039; or &amp;#039;&amp;#039;rules&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
== The axioms of probability ==&lt;br /&gt;
&lt;br /&gt;
There are just &amp;#039;&amp;#039;three &amp;#039;&amp;#039;basic rules that must be obeyed when dealing with probabilities:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;For any event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; defined on &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; i.e., &amp;lt;math&amp;gt;E\subset S,\,\,\Pr (E)\geq 0&amp;lt;/math&amp;gt;; &amp;#039;&amp;#039;probabilities are non-negative&amp;#039;&amp;#039;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (S)=1;&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;having defined the sample space of outcomes, one of these outcomes must be observed&amp;#039;&amp;#039;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;If events &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;, defined on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, are mutually exclusive, so that &amp;lt;math&amp;gt;E\cap F=\emptyset &amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) =\Pr \left( E\right)+\Pr \left( F\right) .&amp;lt;/math&amp;gt; In general, for any set of mutually exclusive events, &amp;lt;math&amp;gt;E_{1},E_{2},\ldots ,E_{k},&amp;lt;/math&amp;gt; defined on &amp;lt;math&amp;gt;S:&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (E_{1}\cup E_{2}\cup \ldots \cup E_{k})=\Pr (E_{1})+\Pr (E_{2})+\ldots +\Pr (E_{k})&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;i.e., &amp;lt;math&amp;gt;\Pr \left( \bigcup_{j=1}^{k}E_{j}\right) =\sum_{j=1}^{k}\Pr (E_{j}).&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
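For a finite sample space the three axioms can be checked mechanically; a sketch for the fair die (Python, with exact fractions):

```python
from fractions import Fraction

# The fair-die probability assignment; check the three axioms directly.
S = {1, 2, 3, 4, 5, 6}
p = {outcome: Fraction(1, 6) for outcome in S}

def pr(event):
    """Probability of an event as the sum over its simple events."""
    return sum(p[x] for x in event)

assert all(p[x] >= 0 for x in S)    # Axiom 1: non-negativity
assert pr(S) == 1                   # Axiom 2: Pr(S) = 1
E, F = {1, 3, 5}, {6}               # mutually exclusive events
assert E & F == set()
assert pr(E | F) == pr(E) + pr(F)   # Axiom 3: additivity
```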
&lt;br /&gt;
In terms of the Venn Diagram, one can (and should) usefully think of the area of &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; relative to that of &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; as providing an indication of probability. (Note, from axiom 2, that the area of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; is implicitly normalised to be unity).&lt;br /&gt;
&lt;br /&gt;
Also observe that, contrary to what you may have believed, it is not one of the rules that &amp;lt;math&amp;gt;\Pr (E)\leq 1&amp;lt;/math&amp;gt; for any event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;. Rather, this is an implication of the &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; rules given:&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;implications: &amp;#039;&amp;#039;&amp;#039;it must be that for any event &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; defined on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;E\cap \bar{E}=\emptyset &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;E\cup \bar{E}=S.&amp;lt;/math&amp;gt; By Axiom &amp;lt;math&amp;gt;1,&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\Pr (E)\geq 0&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Pr \left( \bar{E}\right) \geq 0&amp;lt;/math&amp;gt; and by Axiom &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\Pr(E)+\Pr (\bar{E})=\Pr (S).&amp;lt;/math&amp;gt; So &amp;lt;math&amp;gt;\Pr \left( E\right) +\Pr \left( \bar{E}\right) =1,&amp;lt;/math&amp;gt; by Axiom &amp;lt;math&amp;gt;2.&amp;lt;/math&amp;gt; This implies that&lt;br /&gt;
&lt;br /&gt;
# &amp;lt;math&amp;gt;0\leq \Pr (E)\leq 1&amp;lt;/math&amp;gt;&lt;br /&gt;
# &amp;lt;math&amp;gt;\Pr (\bar{E})=1-\Pr (E)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first of these is what we might have expected from probability (a number lying between &amp;lt;math&amp;gt;0&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;). The second implication is also very important; it says that the probability of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; not happening is ‘&amp;#039;&amp;#039;one minus the probability of it happening&amp;#039;&amp;#039;’. Thus when rolling a die, the probability of getting &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; is one minus the probability of getting either a &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;2&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;5.&amp;lt;/math&amp;gt;&lt;br /&gt;
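Wait — this anchor is not in my list; skip.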
&lt;br /&gt;
These axioms imply how to calculate probabilities on a sample space of equally likely outcomes. For example, and as we have already noted, the experiment of rolling a fair die defines a sample space of six, mutually exclusive and equally likely outcomes (&amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; dots on the up-turned face). The axioms then say that each of the six probabilities is positive, that they add to 1 and that they are all the same. Thus, the probability of any one of the outcomes must be simply &amp;lt;math&amp;gt;\frac{1}{6},&amp;lt;/math&amp;gt; which may accord with your intuition. A similar sort of analysis reveals that the probability of drawing a club from a deck of &amp;lt;math&amp;gt;52&amp;lt;/math&amp;gt; cards is &amp;lt;math&amp;gt;\frac{13}{52},&amp;lt;/math&amp;gt; since any one of the &amp;lt;math&amp;gt;52&amp;lt;/math&amp;gt; cards has an equal chance of being drawn and &amp;lt;math&amp;gt;13&amp;lt;/math&amp;gt; of them are clubs. Notice the importance of the assumption of equally likely outcomes here.&lt;br /&gt;
&lt;br /&gt;
In this, and the next section of notes, we shall see how these axioms can be used. Firstly, consider the construction of a probability for the &amp;#039;&amp;#039;union&amp;#039;&amp;#039; of two events; i.e., the probability that &amp;#039;&amp;#039;either &amp;#039;&amp;#039;&amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; or (perhaps) &amp;#039;&amp;#039;both &amp;#039;&amp;#039;will occur. Such a probability is embodied in the &amp;#039;&amp;#039;addition rule of probability&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
== The addition rule of probability ==&lt;br /&gt;
&lt;br /&gt;
When rolling a fair die, let &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; denote the event of an “odd number of dots” and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; the event of the “number of dots being greater than or equal to &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt;”. What is the probability of the event &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;? To calculate this we can collect together all the mutually exclusive (simple) events which comprise &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;, and then add up the probabilities (by Axiom 3). These simple events are &amp;lt;math&amp;gt;1,3,4,5&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; dots. Each has a probability of &amp;lt;math&amp;gt;\frac{1}{6},&amp;lt;/math&amp;gt; so the required total probability is: &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) =\frac{5}{6}&amp;lt;/math&amp;gt;. Consider carefully how this probability is constructed and note, in particular, that &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) \neq \Pr \left( E\right) +\Pr \left( F\right) &amp;lt;/math&amp;gt; since &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; have a simple event in common (namely &amp;lt;math&amp;gt;5&amp;lt;/math&amp;gt; dots).&lt;br /&gt;
&lt;br /&gt;
In general, we can calculate the probability of the union of events using the &amp;#039;&amp;#039;addition rule of probability&amp;#039;&amp;#039;, as follows.&lt;br /&gt;
&lt;br /&gt;
* For any events, &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F\subset S:\Pr (E\cup F)=\Pr (E)+\Pr (F)-\Pr (E\cap F).&amp;lt;/math&amp;gt; So, in general, &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) \leq \Pr (E)+\Pr (F).&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generalises to three events, &amp;lt;math&amp;gt;E_{1},E_{2}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;E_{3}&amp;lt;/math&amp;gt; as&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr (E_{1}\cup E_{2}\cup E_{3}) &amp;amp;=&amp;amp;\Pr (E_{1})+\Pr (E_{2})+\Pr (E_{3}) \\&lt;br /&gt;
&amp;amp;&amp;amp;-\Pr (E_{1}\cap E_{2})-\Pr (E_{1}\cap E_{3})-\Pr (E_{2}\cap E_{3}) \\&lt;br /&gt;
&amp;amp;&amp;amp;+\Pr (E_{1}\cap E_{2}\cap E_{3}).\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
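The three-event rule can likewise be checked exhaustively; a sketch with three arbitrarily chosen overlapping events on a fair die:

```python
from fractions import Fraction

# Three (arbitrarily chosen) overlapping events on a fair die.
S = {1, 2, 3, 4, 5, 6}
E1, E2, E3 = {1, 2, 3}, {2, 3, 4}, {3, 4, 5}

def pr(event):
    """Probability under equally likely outcomes."""
    return Fraction(len(event), len(S))

lhs = pr(E1 | E2 | E3)
rhs = (pr(E1) + pr(E2) + pr(E3)
       - pr(E1 & E2) - pr(E1 & E3) - pr(E2 & E3)
       + pr(E1 & E2 & E3))

assert lhs == rhs == Fraction(5, 6)
```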
&lt;br /&gt;
= Footnotes =&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Intro&amp;diff=2850</id>
		<title>Probability Intro</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Intro&amp;diff=2850"/>
				<updated>2013-08-07T20:57:49Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
&lt;br /&gt;
= Introducing Probability =&lt;br /&gt;
&lt;br /&gt;
So far we have been looking at ways of summarising samples of data drawn from an underlying population of interest. Although at times tedious, all such arithmetic calculations are fairly mechanical and straightforward to apply. To remind ourselves, one of the primary reasons for wishing to summarise data is to assist in the development of inferences about the population from which the data were taken. That is to say, we would like to elicit some information about the mechanism which generated the observed data.&lt;br /&gt;
&lt;br /&gt;
We now start on the process of developing mathematical ways of formulating inferences and this requires the use of &amp;#039;&amp;#039;probability&amp;#039;&amp;#039;. This becomes clear if we think back to one of the early questions posed in this course: &amp;#039;&amp;#039;prior to sampling is it possible to predict with absolute certainty what will be observed&amp;#039;&amp;#039;? The answer to this question is &amp;#039;&amp;#039;no&amp;#039;&amp;#039;, although it would be of interest to know how &amp;#039;&amp;#039;likely&amp;#039;&amp;#039; it is that certain values would be observed. Or, what is the &amp;#039;&amp;#039;probability&amp;#039;&amp;#039; of observing certain values?&lt;br /&gt;
&lt;br /&gt;
Before proceeding, we need some more tools:&lt;br /&gt;
&lt;br /&gt;
= Venn diagrams =&lt;br /&gt;
&lt;br /&gt;
Venn diagrams (and diagrams in general) are of enormous help in trying to understand, and manipulate probability. We begin with some basic definitions, some of which we have encountered before.&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Experiment:&amp;#039;&amp;#039;&amp;#039; any process which, when applied, provides data or an outcome; e.g., rolling a die and observing the number of dots on the upturned face; recording the amount of rainfall in Manchester over a period of time.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Sample Space:&amp;#039;&amp;#039;&amp;#039; set of possible outcomes of an experiment; e.g., &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; (or &amp;lt;math&amp;gt;\Omega &amp;lt;/math&amp;gt;) &amp;lt;math&amp;gt;=&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\{1,2,3,4,5,6\}&amp;lt;/math&amp;gt;, which is the sample space of rolling a die. Or &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;=&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\{x;x\geq 0\}&amp;lt;/math&amp;gt;, which is the sample space of an experiment where the outcomes can be any non-negative real number, or ‘&amp;#039;&amp;#039;the set of non-negative real numbers&amp;#039;&amp;#039;’.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Simple Event&amp;#039;&amp;#039;&amp;#039;: just one of the possible outcomes on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Event:&amp;#039;&amp;#039;&amp;#039; a &amp;#039;&amp;#039;subset&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, denoted &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt;; e.g., &amp;lt;math&amp;gt;E=\left\{ 2,4,6\right\}&amp;lt;/math&amp;gt; (i.e. any even number on a die) or &amp;lt;math&amp;gt;E=\left\{ x;4&amp;lt;x\leq 10\right\} ,&amp;lt;/math&amp;gt; which means ‘&amp;#039;&amp;#039;the set of real numbers which are strictly bigger than&amp;#039;&amp;#039; &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;but less than or equal to &amp;#039;&amp;#039;&amp;lt;math&amp;gt;10&amp;lt;/math&amp;gt;’.&amp;lt;br /&amp;gt;Note that an event, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, is a collection of simple events.&lt;br /&gt;
&lt;br /&gt;
Such concepts can be represented by means of the following Venn Diagram:&lt;br /&gt;
&lt;br /&gt;
[[File:Venn_1.jpg|frameless|600px]]&lt;br /&gt;
&lt;br /&gt;
The sample space, &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; is depicted as a closed rectangle, and the event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; is a closed loop wholly contained within &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; and we write (in set notation) &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In dealing with probability, and in particular the probability of an event (or events) occurring, we shall need to be familiar with &amp;#039;&amp;#039;&amp;#039;UNIONS, INTERSECTIONS&amp;#039;&amp;#039;&amp;#039; and &amp;#039;&amp;#039;&amp;#039;COMPLEMENTS&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
To illustrate these concepts, consider the sample space &amp;lt;math&amp;gt;S=\{x;x\geq 0\},\,&amp;lt;/math&amp;gt; with the following events defined on &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; as depicted in Figure 3.2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E=\{x;4&amp;lt;x\leq 10\},\,F=\{x;7&amp;lt;x\leq 17\},\,G=\{x;x&amp;gt;15\},\,H=\{x;9&amp;lt;x\leq&lt;br /&gt;
13\}. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
| (a) Event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;: A closed loop&lt;br /&gt;
| (b) Union: &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2a.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2b.jpg|frameless|300px]]&lt;br /&gt;
|-&lt;br /&gt;
| (c) Intersection: &amp;lt;math&amp;gt;E\cap F&amp;lt;/math&amp;gt;&lt;br /&gt;
| (d) The Null set/event: &amp;lt;math&amp;gt;E\cap G=\emptyset &amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2c.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2d.jpg|frameless|300px]]&lt;br /&gt;
|-&lt;br /&gt;
| (e) Complement of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;\bar{E}&amp;lt;/math&amp;gt;&lt;br /&gt;
| (f) Subset of &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;H\subset F&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;H\cap F=H&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2e.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2f.jpg|frameless|300px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* The &amp;#039;&amp;#039;union&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;E\cup F,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E\cup F=\{x;4&amp;lt;x\leq 17\};&amp;lt;/math&amp;gt; i.e., it contains elements (simple events) which are either in &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; or in &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; or (perhaps) in both. This is illustrated on the Venn diagram by the dark shaded area in diagram (b).&lt;br /&gt;
* The &amp;#039;&amp;#039;intersection&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;E\cap F,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E\cap F=\left\{ x;7&amp;lt;x\leq 10\right\} ;&amp;lt;/math&amp;gt; i.e., it contains elements (simple events) which are common to both &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F.&amp;lt;/math&amp;gt; Again this is depicted by the dark shaded area in (c). If events have no elements in common (as, for example, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;) then they are said to be &amp;#039;&amp;#039;mutually exclusive&amp;#039;&amp;#039;, and we can write &amp;lt;math&amp;gt;E\cap G=\emptyset ,&amp;lt;/math&amp;gt; meaning the &amp;#039;&amp;#039;null set&amp;#039;&amp;#039; which contains no elements. Such a situation is illustrated on the Venn Diagram by events (the two shaded closed loops in (d)) which do not overlap. Notice however that &amp;lt;math&amp;gt;G\cap F\neq \emptyset ,&amp;lt;/math&amp;gt; since &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; have elements in common.&lt;br /&gt;
* The &amp;#039;&amp;#039;complement&amp;#039;&amp;#039; of an event &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; say, is everything defined on the sample space which is not in &amp;lt;math&amp;gt;E.&amp;lt;/math&amp;gt; This event is denoted &amp;lt;math&amp;gt;\bar{E}&amp;lt;/math&amp;gt;, the dark shaded area in (e); here &amp;lt;math&amp;gt;\bar{E}=\left\{ x;x\leq 4\right\} \cup \left\{ x;x&amp;gt;10\right\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
* Finally note that &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; is a subset of &amp;lt;math&amp;gt;F;&amp;lt;/math&amp;gt; see (f). It is depicted as the dark closed loop wholly contained within &amp;lt;math&amp;gt;F,&amp;lt;/math&amp;gt; the lighter shaded area, so that &amp;lt;math&amp;gt;H\cap F=H;&amp;lt;/math&amp;gt; if an element in the sample space is a member of &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; then it must also be a member of &amp;lt;math&amp;gt;F.&amp;lt;/math&amp;gt; (In mathematical logic, we employ this scenario to indicate that “&amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; implies &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;”, but not necessarily vice-versa.) Notice that &amp;lt;math&amp;gt;G\cap H=\emptyset &amp;lt;/math&amp;gt; but &amp;lt;math&amp;gt;H\cap E\neq \emptyset&amp;lt;/math&amp;gt;.&lt;br /&gt;
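The set relationships above can be checked numerically. The following is a minimal Python sketch (an illustration added to these notes, not part of the original course material) which restricts the continuous sample space to integer points purely so the sets are finite:

```python
# Events E, F, G, H from the text, on the integer points of
# S = {x; x >= 0}, truncated at 30 for display purposes.
S = set(range(0, 31))
E = {x for x in S if 4 < x <= 10}    # (4, 10]
F = {x for x in S if 7 < x <= 17}    # (7, 17]
G = {x for x in S if x > 15}         # (15, infinity), truncated
H = {x for x in S if 9 < x <= 13}    # (9, 13]

print(sorted(E | F))                 # union: integers in (4, 17]
print(sorted(E & F))                 # intersection: integers in (7, 10]
print(E & G)                         # mutually exclusive: the empty set
print(sorted(S - E)[:8])             # first few points of the complement of E
print(H <= F, (H & F) == H)          # H is a subset of F, so H ∩ F = H
```

The same operators (`|`, `&`, `-`, `<=`) mirror union, intersection, complement (relative to `S`) and the subset relation.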
&lt;br /&gt;
= Probability =&lt;br /&gt;
&lt;br /&gt;
The term &amp;#039;&amp;#039;probability&amp;#039;&amp;#039; (or some equivalent) is used in everyday conversation and so will be familiar to the reader. We talk of the probability, or chance, of rain; the likelihood of England winning the World Cup; or, perhaps more scientifically, the chance of getting a &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; when rolling a die. What we shall now do is develop a coherent theory of probability: a theory which allows us to combine and manipulate probabilities in a consistent and meaningful manner. We shall describe ways of dealing with, and describing, uncertainty. This will involve &amp;#039;&amp;#039;rules&amp;#039;&amp;#039; which govern our use of terms like probability.&lt;br /&gt;
&lt;br /&gt;
There have been a number of different approaches to, and interpretations of, probability. Most depend, at least to some extent, on the notion of relative frequency, as now described:&lt;br /&gt;
&lt;br /&gt;
* Suppose an experiment has an outcome of interest &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;. The &amp;#039;&amp;#039;relative frequency interpretation&amp;#039;&amp;#039; of probability says that, if the experiment can be repeated a large number of times, then the relative frequency of observing the outcome &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; will settle down to a &amp;#039;&amp;#039;number&amp;#039;&amp;#039;, denoted &amp;lt;math&amp;gt;\Pr (E),&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;P(E)&amp;lt;/math&amp;gt; or Prob&amp;lt;math&amp;gt;(E),&amp;lt;/math&amp;gt; called the &amp;#039;&amp;#039;&amp;#039;probability&amp;#039;&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
This is illustrated in the next Figure where the proportion of heads obtained after &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; flips of a fair coin is plotted against &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;, as &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; increases; e.g., of the first &amp;lt;math&amp;gt;100&amp;lt;/math&amp;gt; flips, &amp;lt;math&amp;gt;55&amp;lt;/math&amp;gt; were heads (&amp;lt;math&amp;gt;55\%&amp;lt;/math&amp;gt;). Notice that the plot becomes less ‘wobbly’ after about &amp;lt;math&amp;gt;n=220&amp;lt;/math&amp;gt; and appears to be settling down to the value of &amp;lt;math&amp;gt;\frac{1}{2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
[[File:Prob_coin2d.jpg|frameless|600px]]&lt;br /&gt;
&lt;br /&gt;
Due to this interpretation of probability, we often use observed sample proportions to approximate underlying probabilities of interest; see, for example, Question 4 of Exercise 2. There are, of course, other interpretations of probability; e.g., the subjective interpretation which simply expresses the strength of one’s belief about an event of interest such as whether Manchester United will win the European Cup! Any one of these interpretations can be used in practical situations provided the implied notion of probability follows a simple set of &amp;#039;&amp;#039;axioms&amp;#039;&amp;#039; or &amp;#039;&amp;#039;rules&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
== The axioms of probability ==&lt;br /&gt;
&lt;br /&gt;
There are just &amp;#039;&amp;#039;three&amp;#039;&amp;#039; basic rules that must be obeyed when dealing with probabilities:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;For any event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; defined on &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; i.e., &amp;lt;math&amp;gt;E\subset S,\,\,\Pr (E)\geq 0&amp;lt;/math&amp;gt;; &amp;#039;&amp;#039;probabilities are non-negative&amp;#039;&amp;#039;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (S)=1;&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;having defined the sample space of outcomes, one of these outcomes must be observed&amp;#039;&amp;#039;.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;If &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; are mutually exclusive events defined on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, so that &amp;lt;math&amp;gt;E\cap F=\emptyset &amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) =\Pr \left( E\right)+\Pr \left( F\right) .&amp;lt;/math&amp;gt; In general, for any set of mutually exclusive events, &amp;lt;math&amp;gt;E_{1},E_{2},\ldots ,E_{k},&amp;lt;/math&amp;gt; defined on &amp;lt;math&amp;gt;S:&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;math&amp;gt;\Pr (E_{1}\cup E_{2}\cup \ldots \cup E_{k})=\Pr (E_{1})+\Pr (E_{2})+\ldots +\Pr (E_{k})&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;i.e., &amp;lt;math&amp;gt;\Pr \left( \bigcup_{j=1}^{k}E_{j}\right) =\sum_{j=1}^{k}\Pr (E_{j}).&amp;lt;/math&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In terms of the Venn Diagram, one can (and should) usefully think of the area of &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; relative to that of &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; as providing an indication of probability. (Note, from axiom 2, that the area of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; is implicitly normalised to be unity).&lt;br /&gt;
&lt;br /&gt;
Also observe that, contrary to what you may have believed, it is not one of the rules that &amp;lt;math&amp;gt;\Pr (E)\leq 1&amp;lt;/math&amp;gt; for any event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;. Rather, this is an implication of the &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; rules given:&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;implications: &amp;#039;&amp;#039;&amp;#039;it must be that for any event &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; defined on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;E\cap \bar{E}=\emptyset &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;E\cup \bar{E}=S.&amp;lt;/math&amp;gt; By Axiom &amp;lt;math&amp;gt;1,&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\Pr (E)\geq 0&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Pr \left( \bar{E}\right) \geq 0&amp;lt;/math&amp;gt; and by Axiom &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\Pr(E)+\Pr (\bar{E})=\Pr (S).&amp;lt;/math&amp;gt; So &amp;lt;math&amp;gt;\Pr \left( E\right) +\Pr \left( \bar{E}\right) =1,&amp;lt;/math&amp;gt; by Axiom &amp;lt;math&amp;gt;2.&amp;lt;/math&amp;gt; This implies that&lt;br /&gt;
&lt;br /&gt;
# &amp;lt;math&amp;gt;0\leq \Pr (E)\leq 1&amp;lt;/math&amp;gt;&lt;br /&gt;
# &amp;lt;math&amp;gt;\Pr (\bar{E})=1-\Pr (E)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first of these is what we might have expected from probability (a number lying between &amp;lt;math&amp;gt;0&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;). The second implication is also very important; it says that the probability of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; not happening is ‘&amp;#039;&amp;#039;one minus the probability of it happening&amp;#039;&amp;#039;’. Thus when rolling a die, the probability of getting &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; is one minus the probability of getting either a &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;2&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;5.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These axioms imply how to calculate probabilities on a sample space of equally likely outcomes. For example, and as we have already noted, the experiment of rolling a fair die defines a sample space of six mutually exclusive and equally likely outcomes (&amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; dots on the up-turned face). The axioms then say that each of the six probabilities is positive, that they sum to 1, and that they are all equal. Thus, the probability of any one of the outcomes must be simply &amp;lt;math&amp;gt;\frac{1}{6},&amp;lt;/math&amp;gt; which may accord with your intuition. A similar sort of analysis reveals that the probability of drawing a club from a deck of &amp;lt;math&amp;gt;52&amp;lt;/math&amp;gt; cards is &amp;lt;math&amp;gt;\frac{13}{52},&amp;lt;/math&amp;gt; since each of the &amp;lt;math&amp;gt;52&amp;lt;/math&amp;gt; cards has an equal chance of being drawn and &amp;lt;math&amp;gt;13&amp;lt;/math&amp;gt; of them are clubs. Notice the importance of the assumption of equally likely outcomes here.&lt;br /&gt;
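The equally-likely-outcomes calculation Pr(E) = |E| / |S| can be written out directly. Below is a hedged Python sketch; the helper `prob_equally_likely` is a name introduced here for illustration, not something from the notes:

```python
from fractions import Fraction

def prob_equally_likely(event, sample_space):
    """Pr(E) = |E| / |S| when all outcomes in S are equally likely."""
    return Fraction(len(set(event) & set(sample_space)), len(sample_space))

die = range(1, 7)                    # six equally likely faces
print(prob_equally_likely({6}, die))                 # prints 1/6

# A 52-card deck: 13 ranks in each of four suits (C, D, H, S).
deck = [(rank, suit) for suit in "CDHS" for rank in range(1, 14)]
clubs = [(rank, "C") for rank in range(1, 14)]
print(prob_equally_likely(clubs, deck))              # prints 1/4 (= 13/52)
```

Using `Fraction` keeps the answers exact, matching the 1/6 and 13/52 in the text.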
&lt;br /&gt;
In this, and the next section of notes, we shall see how these axioms can be used. Firstly, consider the construction of a probability for the &amp;#039;&amp;#039;union&amp;#039;&amp;#039; of two events; i.e., the probability that &amp;#039;&amp;#039;either &amp;#039;&amp;#039;&amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; or (perhaps) &amp;#039;&amp;#039;both &amp;#039;&amp;#039;will occur. Such a probability is embodied in the &amp;#039;&amp;#039;addition rule of probability&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
== The addition rule of probability ==&lt;br /&gt;
&lt;br /&gt;
When rolling a fair die, let &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; denote the event of an “odd number of dots” and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; the event of the “number of dots being greater than, or equal to, &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt;”. What is the probability of the event &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;? To calculate this we can collect together all the mutually exclusive (simple) events which comprise &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;, and then add up their probabilities (by axiom 3). These simple events are &amp;lt;math&amp;gt;1,3,4,5&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;6&amp;lt;/math&amp;gt; dots. Each has a probability of &amp;lt;math&amp;gt;\frac{1}{6},&amp;lt;/math&amp;gt; so the required total probability is &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) =\frac{5}{6}&amp;lt;/math&amp;gt;. Consider carefully how this probability is constructed and note, in particular, that &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) \neq \Pr \left( E\right) +\Pr \left( F\right) &amp;lt;/math&amp;gt; since &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; have a simple event in common (namely &amp;lt;math&amp;gt;5&amp;lt;/math&amp;gt; dots).&lt;br /&gt;
&lt;br /&gt;
In general, we can calculate the probability of the union of events using the &amp;#039;&amp;#039;addition rule of probability&amp;#039;&amp;#039;, as follows.&lt;br /&gt;
&lt;br /&gt;
* For any events, &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F\subset S:\Pr (E\cup F)=\Pr (E)+\Pr (F)-\Pr (E\cap F).&amp;lt;/math&amp;gt; So, in general, &amp;lt;math&amp;gt;\Pr \left( E\cup F\right) \leq \Pr (E)+\Pr (F).&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generalises to three events, &amp;lt;math&amp;gt;E_{1},E_{2}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;E_{3}&amp;lt;/math&amp;gt; as&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{aligned}&lt;br /&gt;
\Pr (E_{1}\cup E_{2}\cup E_{3}) &amp;amp;= \Pr (E_{1})+\Pr (E_{2})+\Pr (E_{3}) \\&lt;br /&gt;
&amp;amp;\quad -\Pr (E_{1}\cap E_{2})-\Pr (E_{1}\cap E_{3})-\Pr (E_{2}\cap E_{3}) \\&lt;br /&gt;
&amp;amp;\quad +\Pr (E_{1}\cap E_{2}\cap E_{3}).\end{aligned}&amp;lt;/math&amp;gt;&lt;br /&gt;
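The die example above can be checked against the addition rule numerically. A short, purely illustrative Python check (added here; not part of the original notes):

```python
from fractions import Fraction

S = set(range(1, 7))            # fair die: six equally likely outcomes

def pr(event):
    """Probability of an event on the equally likely sample space S."""
    return Fraction(len(event), len(S))

E = {1, 3, 5}                   # odd number of dots
F = {4, 5, 6}                   # number of dots greater than or equal to 4

# Addition rule: Pr(E ∪ F) = Pr(E) + Pr(F) - Pr(E ∩ F)
lhs = pr(E | F)
rhs = pr(E) + pr(F) - pr(E & F)
print(lhs, rhs)                 # both equal 5/6
print(pr(E) + pr(F))            # naive sum gives 1, overcounting the outcome 5
```

The discrepancy between 5/6 and the naive sum of 1 is exactly Pr(E ∩ F) = 1/6, the double-counted simple event of 5 dots.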
&lt;br /&gt;
= Footnotes =&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Intro&amp;diff=2849</id>
		<title>Probability Intro</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Intro&amp;diff=2849"/>
				<updated>2013-08-07T20:38:43Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: /* Venn diagrams */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
&lt;br /&gt;
= Introducing Probability =&lt;br /&gt;
&lt;br /&gt;
So far we have been looking at ways of summarising samples of data drawn from an underlying population of interest. Although at times tedious, all such arithmetic calculations are fairly mechanical and straightforward to apply. To remind ourselves, one of the primary reasons for wishing to summarise data is to assist in the development of inferences about the population from which the data were taken. That is to say, we would like to elicit some information about the mechanism which generated the observed data.&lt;br /&gt;
&lt;br /&gt;
We now start on the process of developing mathematical ways of formulating inferences and this requires the use of &amp;#039;&amp;#039;probability&amp;#039;&amp;#039;. This becomes clear if we think back to one of the early questions posed in this course: &amp;#039;&amp;#039;prior to sampling is it possible to predict with absolute certainty what will be observed&amp;#039;&amp;#039;? The answer to this question is &amp;#039;&amp;#039;no&amp;#039;&amp;#039;; although it would be of interest to know how &amp;#039;&amp;#039;likely&amp;#039;&amp;#039; it is that certain values would be observed. Or, what is the &amp;#039;&amp;#039;probability&amp;#039;&amp;#039; of observing certain values?&lt;br /&gt;
&lt;br /&gt;
Before proceeding, we need some more tools:&lt;br /&gt;
&lt;br /&gt;
= Venn diagrams =&lt;br /&gt;
&lt;br /&gt;
Venn diagrams (and diagrams in general) are of enormous help in trying to understand, and manipulate, probability. We begin with some basic definitions, some of which we have encountered before.&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Experiment:&amp;#039;&amp;#039;&amp;#039; any process which, when applied, provides data or an outcome; e.g., rolling a die and observing the number of dots on the upturned face; recording the amount of rainfall in Manchester over a period of time.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Sample Space:&amp;#039;&amp;#039;&amp;#039; the set of possible outcomes of an experiment; e.g., &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; (or &amp;lt;math&amp;gt;\Omega &amp;lt;/math&amp;gt;) &amp;lt;math&amp;gt;=&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\{1,2,3,4,5,6\}&amp;lt;/math&amp;gt;, which is the sample space of rolling a die. Or &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;=&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\{x;x\geq 0\}&amp;lt;/math&amp;gt;, which is the sample space of an experiment where the outcome can be any non-negative real number, or ‘&amp;#039;&amp;#039;the set of non-negative real numbers&amp;#039;&amp;#039;’.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Simple Event&amp;#039;&amp;#039;&amp;#039;: just one of the possible outcomes on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Event:&amp;#039;&amp;#039;&amp;#039; a &amp;#039;&amp;#039;subset&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, denoted &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt;; e.g., &amp;lt;math&amp;gt;E=\left\{ 2,4,6\right\}&amp;lt;/math&amp;gt; (i.e. any even number on a die) or &amp;lt;math&amp;gt;E=\left\{ x;4&amp;lt;x\leq 10\right\} ,&amp;lt;/math&amp;gt; which means ‘&amp;#039;&amp;#039;the set of real numbers which are strictly greater than&amp;#039;&amp;#039; &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;but less than or equal to &amp;#039;&amp;#039;&amp;lt;math&amp;gt;10&amp;lt;/math&amp;gt;’.&amp;lt;br /&amp;gt;Note that an event, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, is a collection of simple events.&lt;br /&gt;
&lt;br /&gt;
Such concepts can be represented by means of the following Venn Diagram:&lt;br /&gt;
&lt;br /&gt;
[[File:Venn_1.jpg|frameless|600px]]&lt;br /&gt;
&lt;br /&gt;
The sample space, &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; is depicted as a closed rectangle, and the event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; as a closed loop wholly contained within &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;; in set notation we write &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In dealing with probability, and in particular the probability of an event (or events) occurring, we shall need to be familiar with &amp;#039;&amp;#039;&amp;#039;UNIONS, INTERSECTIONS&amp;#039;&amp;#039;&amp;#039; and &amp;#039;&amp;#039;&amp;#039;COMPLEMENTS&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
To illustrate these concepts, consider the sample space &amp;lt;math&amp;gt;S=\{x;x\geq 0\},\,&amp;lt;/math&amp;gt; with the following events defined on &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; as depicted in Figure 3.2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E=\{x;4&amp;lt;x\leq 10\},\,F=\{x;7&amp;lt;x\leq 17\},\,G=\{x;x&amp;gt;15\},\,H=\{x;9&amp;lt;x\leq 13\}. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
| (a) Event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;: A closed loop&lt;br /&gt;
| (b) Union: &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2a.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2b.jpg|frameless|300px]]&lt;br /&gt;
|-&lt;br /&gt;
| (c) Intersection: &amp;lt;math&amp;gt;E\cap F&amp;lt;/math&amp;gt;&lt;br /&gt;
| (d) The Null set/event: &amp;lt;math&amp;gt;E\cap G=\emptyset &amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2c.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2d.jpg|frameless|300px]]&lt;br /&gt;
|-&lt;br /&gt;
| (e) Complement of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;\bar{E}&amp;lt;/math&amp;gt;&lt;br /&gt;
| (f) Subset of &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;H\subset F&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;H\cap F=H&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2e.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2f.jpg|frameless|300px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* The &amp;#039;&amp;#039;union&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;E\cup F,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E\cup F=\{x;4&amp;lt;x\leq 17\};&amp;lt;/math&amp;gt; i.e., it contains elements (simple events) which are either in &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; or in &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; or (perhaps) in both. This is illustrated on the Venn diagram by the dark shaded area in diagram (b).&lt;br /&gt;
* The &amp;#039;&amp;#039;intersection&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;E\cap F,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E\cap F=\left\{ x;7&amp;lt;x\leq 10\right\} ;&amp;lt;/math&amp;gt; i.e., it contains elements (simple events) which are common to both &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F.&amp;lt;/math&amp;gt; Again this is depicted by the dark shaded area in (c). If events have no elements in common (as, for example, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;) then they are said to be &amp;#039;&amp;#039;mutually exclusive&amp;#039;&amp;#039;, and we can write &amp;lt;math&amp;gt;E\cap G=\emptyset ,&amp;lt;/math&amp;gt; meaning the &amp;#039;&amp;#039;null set&amp;#039;&amp;#039; which contains no elements. Such a situation is illustrated on the Venn Diagram by events (the two shaded closed loops in (d)) which do not overlap. Notice however that &amp;lt;math&amp;gt;G\cap F\neq \emptyset ,&amp;lt;/math&amp;gt; since &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; have elements in common.&lt;br /&gt;
* The &amp;#039;&amp;#039;complement&amp;#039;&amp;#039; of an event &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; say, is everything defined on the sample space which is not in &amp;lt;math&amp;gt;E.&amp;lt;/math&amp;gt; This event is denoted &amp;lt;math&amp;gt;\bar{E}&amp;lt;/math&amp;gt;, the dark shaded area in (e); here &amp;lt;math&amp;gt;\bar{E}=\left\{ x;x\leq 4\right\} \cup \left\{ x;x&amp;gt;10\right\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
* Finally note that &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; is a subset of &amp;lt;math&amp;gt;F;&amp;lt;/math&amp;gt; see (f). It is depicted as the dark closed loop wholly contained within &amp;lt;math&amp;gt;F,&amp;lt;/math&amp;gt; the lighter shaded area, so that &amp;lt;math&amp;gt;H\cap F=H;&amp;lt;/math&amp;gt; if an element in the sample space is a member of &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; then it must also be a member of &amp;lt;math&amp;gt;F.&amp;lt;/math&amp;gt; (In mathematical logic, we employ this scenario to indicate that “&amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; implies &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;”, but not necessarily vice-versa.) Notice that &amp;lt;math&amp;gt;G\cap H=\emptyset &amp;lt;/math&amp;gt; but &amp;lt;math&amp;gt;H\cap E\neq \emptyset&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
= Footnotes =&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Intro&amp;diff=2848</id>
		<title>Probability Intro</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Intro&amp;diff=2848"/>
				<updated>2013-08-07T20:38:24Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: /* Venn diagrams */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
&lt;br /&gt;
= Introducing Probability =&lt;br /&gt;
&lt;br /&gt;
So far we have been looking at ways of summarising samples of data drawn from an underlying population of interest. Although at times tedious, all such arithmetic calculations are fairly mechanical and straightforward to apply. To remind ourselves, one of the primary reasons for wishing to summarise data is to assist in the development of inferences about the population from which the data were taken. That is to say, we would like to elicit some information about the mechanism which generated the observed data.&lt;br /&gt;
&lt;br /&gt;
We now start on the process of developing mathematical ways of formulating inferences and this requires the use of &amp;#039;&amp;#039;probability&amp;#039;&amp;#039;. This becomes clear if we think back to one of the early questions posed in this course: &amp;#039;&amp;#039;prior to sampling is it possible to predict with absolute certainty what will be observed&amp;#039;&amp;#039;? The answer to this question is &amp;#039;&amp;#039;no&amp;#039;&amp;#039;; although it would be of interest to know how &amp;#039;&amp;#039;likely&amp;#039;&amp;#039; it is that certain values would be observed. Or, what is the &amp;#039;&amp;#039;probability&amp;#039;&amp;#039; of observing certain values?&lt;br /&gt;
&lt;br /&gt;
Before proceeding, we need some more tools:&lt;br /&gt;
&lt;br /&gt;
= Venn diagrams =&lt;br /&gt;
&lt;br /&gt;
Venn diagrams (and diagrams in general) are of enormous help in trying to understand, and manipulate, probability. We begin with some basic definitions, some of which we have encountered before.&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Experiment:&amp;#039;&amp;#039;&amp;#039; any process which, when applied, provides data or an outcome; e.g., rolling a die and observing the number of dots on the upturned face; recording the amount of rainfall in Manchester over a period of time.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Sample Space:&amp;#039;&amp;#039;&amp;#039; the set of possible outcomes of an experiment; e.g., &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; (or &amp;lt;math&amp;gt;\Omega &amp;lt;/math&amp;gt;) &amp;lt;math&amp;gt;=&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\{1,2,3,4,5,6\}&amp;lt;/math&amp;gt;, which is the sample space of rolling a die. Or &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;=&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\{x;x\geq 0\}&amp;lt;/math&amp;gt;, which is the sample space of an experiment where the outcome can be any non-negative real number, or ‘&amp;#039;&amp;#039;the set of non-negative real numbers&amp;#039;&amp;#039;’.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Simple Event&amp;#039;&amp;#039;&amp;#039;: just one of the possible outcomes on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Event:&amp;#039;&amp;#039;&amp;#039; a &amp;#039;&amp;#039;subset&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, denoted &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt;; e.g., &amp;lt;math&amp;gt;E=\left\{ 2,4,6\right\}&amp;lt;/math&amp;gt; (i.e. any even number on a die) or &amp;lt;math&amp;gt;E=\left\{ x;4&amp;lt;x\leq 10\right\} ,&amp;lt;/math&amp;gt; which means ‘&amp;#039;&amp;#039;the set of real numbers which are strictly greater than&amp;#039;&amp;#039; &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;but less than or equal to &amp;#039;&amp;#039;&amp;lt;math&amp;gt;10&amp;lt;/math&amp;gt;’.&amp;lt;br /&amp;gt;Note that an event, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, is a collection of simple events.&lt;br /&gt;
&lt;br /&gt;
Such concepts can be represented by means of the following Venn Diagram:&lt;br /&gt;
&lt;br /&gt;
[[File:Venn_1.jpg|frameless|600px]]&lt;br /&gt;
&lt;br /&gt;
The sample space, &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; is depicted as a closed rectangle, and the event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; as a closed loop wholly contained within &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;; in set notation we write &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In dealing with probability, and in particular the probability of an event (or events) occurring, we shall need to be familiar with &amp;#039;&amp;#039;&amp;#039;UNIONS, INTERSECTIONS&amp;#039;&amp;#039;&amp;#039; and &amp;#039;&amp;#039;&amp;#039;COMPLEMENTS&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
To illustrate these concepts, consider the sample space &amp;lt;math&amp;gt;S=\{x;x\geq 0\},\,&amp;lt;/math&amp;gt; with the following events defined on &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; as depicted in Figure 3.2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E=\{x;4&amp;lt;x\leq 10\},\,F=\{x;7&amp;lt;x\leq 17\},\,G=\{x;x&amp;gt;15\},\,H=\{x;9&amp;lt;x\leq 13\}. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
| (a) Event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;: A closed loop&lt;br /&gt;
| (b) Union: &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2a.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2b.jpg|frameless|300px]]&lt;br /&gt;
|-&lt;br /&gt;
| (c) Intersection: &amp;lt;math&amp;gt;E\cap F&amp;lt;/math&amp;gt;&lt;br /&gt;
| (d) The Null set/event: &amp;lt;math&amp;gt;E\cap G=\emptyset &amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2c.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2d.jpg|frameless|300px]]&lt;br /&gt;
|-&lt;br /&gt;
| (e) Complement of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;\bar{E}&amp;lt;/math&amp;gt;&lt;br /&gt;
| (f) Subset of &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;H\subset F&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;H\cap F=H&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2e.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2f.jpg|frameless|300px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* The &amp;#039;&amp;#039;union&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;E\cup F,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E\cup F=\{x;4&amp;lt;x\leq 17\};&amp;lt;/math&amp;gt; i.e., it contains elements (simple events) which are either in &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; or in &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; or (perhaps) in both. This is illustrated on the Venn diagram by the dark shaded area in diagram (b).&lt;br /&gt;
* The &amp;#039;&amp;#039;intersection&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;E\cap F,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E\cap F=\left\{ x;7&amp;lt;x\leq 10\right\} ;&amp;lt;/math&amp;gt; i.e., it contains elements (simple events) which are common to both &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F.&amp;lt;/math&amp;gt; Again this is depicted by the dark shaded area in (c). If events have no elements in common (as, for example, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;) then they are said to be &amp;#039;&amp;#039;mutually exclusive&amp;#039;&amp;#039;, and we can write &amp;lt;math&amp;gt;E\cap G=\emptyset ,&amp;lt;/math&amp;gt; meaning the &amp;#039;&amp;#039;null set&amp;#039;&amp;#039; which contains no elements. Such a situation is illustrated on the Venn diagram by events (the two shaded closed loops in (d)) which do not overlap. Notice, however, that &amp;lt;math&amp;gt;G\cap F\neq \emptyset ,&amp;lt;/math&amp;gt; since &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; have elements in common.&lt;br /&gt;
* The &amp;#039;&amp;#039;complement&amp;#039;&amp;#039; of an event &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; say, is everything defined on the sample space which is not in &amp;lt;math&amp;gt;E.&amp;lt;/math&amp;gt; This event is denoted &amp;lt;math&amp;gt;\bar{E}&amp;lt;/math&amp;gt;, the dark shaded area in (e); here &amp;lt;math&amp;gt;\bar{E}=\left\{ x;x\leq 4\right\} \cup \left\{ x;x&amp;gt;10\right\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
* Finally, note that &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; is a subset of &amp;lt;math&amp;gt;F;&amp;lt;/math&amp;gt; see (f). It is depicted as the dark closed loop wholly contained within &amp;lt;math&amp;gt;F,&amp;lt;/math&amp;gt; the lighter shaded area, so that &amp;lt;math&amp;gt;H\cap F=H;&amp;lt;/math&amp;gt; if an element of the sample space is a member of &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; then it must also be a member of &amp;lt;math&amp;gt;F.&amp;lt;/math&amp;gt; (In mathematical logic, we employ this scenario to indicate that “&amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; implies &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;”, but not necessarily vice versa.) Notice that &amp;lt;math&amp;gt;G\cap H=\emptyset &amp;lt;/math&amp;gt; but &amp;lt;math&amp;gt;H\cap E\neq \emptyset&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
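The set relations listed above can be sketched with Python's built-in set type, which supports union, intersection, difference and subset tests directly. This is an illustrative sketch only: the events of Figure 3.2 are intervals of real numbers, so they are restricted here to their integer members, and the unbounded sets are truncated at 20.&lt;br /&gt;

```python
# Illustrative sketch: the events of Figure 3.2 restricted to integer
# outcomes, so Python's built-in set type can stand in for the intervals.
# The sample space and the unbounded event G are truncated at 20.
S = set(range(0, 21))
E = set(range(5, 11))    # integers x with 4 below x and x at most 10
F = set(range(8, 18))    # integers x with 7 below x and x at most 17
G = set(range(16, 21))   # integers x above 15 (truncated at 20)
H = set(range(10, 14))   # integers x with 9 below x and x at most 13

print(E.union(F) == set(range(5, 18)))         # union: in E or F or both
print(E.intersection(F) == set(range(8, 11)))  # intersection: in both
print(E.isdisjoint(G))                         # E and G mutually exclusive
print(S.difference(E) == S - E)                # complement of E within S
print(H.issubset(F) and H.intersection(F) == H)
```

Each line prints True, mirroring panels (b), (c), (d), (e) and (f) of the figure above.&lt;br /&gt;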
= Footnotes =&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Intro&amp;diff=2847</id>
		<title>Probability Intro</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Intro&amp;diff=2847"/>
				<updated>2013-08-07T20:38:10Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: /* Venn diagrams */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
&lt;br /&gt;
= Introducing Probability =&lt;br /&gt;
&lt;br /&gt;
So far we have been looking at ways of summarising samples of data drawn from an underlying population of interest. Although at times tedious, all such arithmetic calculations are fairly mechanical and straightforward to apply. To remind ourselves, one of the primary reasons for wishing to summarise data is to assist in the development of inferences about the population from which the data were taken. That is to say, we would like to elicit some information about the mechanism which generated the observed data.&lt;br /&gt;
&lt;br /&gt;
We now start on the process of developing mathematical ways of formulating inferences and this requires the use of &amp;#039;&amp;#039;probability&amp;#039;&amp;#039;. This becomes clear if we think back to one of the early questions posed in this course: &amp;#039;&amp;#039;prior to sampling is it possible to predict with absolute certainty what will be observed&amp;#039;&amp;#039;? The answer to this question is &amp;#039;&amp;#039;no&amp;#039;&amp;#039;; although it would be of interest to know how &amp;#039;&amp;#039;likely&amp;#039;&amp;#039; it is that certain values would be observed. Or, what is the &amp;#039;&amp;#039;probability&amp;#039;&amp;#039; of observing certain values?&lt;br /&gt;
&lt;br /&gt;
Before proceeding, we need some more tools:&lt;br /&gt;
&lt;br /&gt;
= Venn diagrams =&lt;br /&gt;
&lt;br /&gt;
Venn diagrams (and diagrams in general) are of enormous help in trying to understand and manipulate probability. We begin with some basic definitions, some of which we have encountered before.&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Experiment:&amp;#039;&amp;#039;&amp;#039; any process which, when applied, provides data or an outcome; e.g., rolling a die and observing the number of dots on the upturned face; recording the amount of rainfall in Manchester over a period of time.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Sample Space:&amp;#039;&amp;#039;&amp;#039; the set of possible outcomes of an experiment; e.g., &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; (or &amp;lt;math&amp;gt;\Omega &amp;lt;/math&amp;gt;) &amp;lt;math&amp;gt;=&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\{1,2,3,4,5,6\}&amp;lt;/math&amp;gt;, which is the sample space of rolling a die. Or &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;=&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\{x;x\geq 0\}&amp;lt;/math&amp;gt;, which is the sample space of an experiment where the outcomes can be any non-negative real number, or ‘&amp;#039;&amp;#039;the set of non-negative real numbers&amp;#039;&amp;#039;’.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Simple Event&amp;#039;&amp;#039;&amp;#039;: just one of the possible outcomes on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Event:&amp;#039;&amp;#039;&amp;#039; a &amp;#039;&amp;#039;subset&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, denoted &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt;; e.g., &amp;lt;math&amp;gt;E=\left\{ 2,4,6\right\}&amp;lt;/math&amp;gt; (i.e., any even number on a die) or &amp;lt;math&amp;gt;E=\left\{ x;4&amp;lt;x\leq 10\right\} ,&amp;lt;/math&amp;gt; which means ‘&amp;#039;&amp;#039;the set of real numbers which are strictly bigger than&amp;#039;&amp;#039; &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;but less than or equal to &amp;#039;&amp;#039;&amp;lt;math&amp;gt;10&amp;lt;/math&amp;gt;’.&amp;lt;br /&amp;gt;Note that an event, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, is a collection of simple events.&lt;br /&gt;
&lt;br /&gt;
Such concepts can be represented by means of the following Venn Diagram:&lt;br /&gt;
&lt;br /&gt;
[[File:Venn_1.jpg|frameless|600px]]&lt;br /&gt;
&lt;br /&gt;
The sample space, &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; is depicted as a closed rectangle, and the event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; is a closed loop wholly contained within &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; and we write (in set notation) &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In dealing with probability, and in particular the probability of an event (or events) occurring, we shall need to be familiar with &amp;#039;&amp;#039;&amp;#039;UNIONS, INTERSECTIONS&amp;#039;&amp;#039;&amp;#039; and &amp;#039;&amp;#039;&amp;#039;COMPLEMENTS&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
To illustrate these concepts, consider the sample space &amp;lt;math&amp;gt;S=\{x;x\geq 0\},\,&amp;lt;/math&amp;gt; with the following events defined on &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; as depicted in Figure 3.2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E=\{x;4&amp;lt;x\leq 10\},\,F=\{x;7&amp;lt;x\leq 17\},\,G=\{x;x&amp;gt;15\},\,H=\{x;9&amp;lt;x\leq&lt;br /&gt;
13\}. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
| (a) Event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;: A closed loop&lt;br /&gt;
| (b) Union: &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2a.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2b.jpg|frameless|300px]]&lt;br /&gt;
|-&lt;br /&gt;
| (c) Intersection: &amp;lt;math&amp;gt;E\cap F&amp;lt;/math&amp;gt;&lt;br /&gt;
| (d) The Null set/event: &amp;lt;math&amp;gt;E\cap G=\emptyset &amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2c.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2d.jpg|frameless|300px]]&lt;br /&gt;
|-&lt;br /&gt;
| (e) Complement of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;\bar{E}&amp;lt;/math&amp;gt;&lt;br /&gt;
| (f) Subset of &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;H\subset F&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;H\cap F=H&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2e.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2f.jpg|frameless|300px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* The &amp;#039;&amp;#039;union&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;E\cup F,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E\cup F=\{x;4&amp;lt;x\leq 17\};&amp;lt;/math&amp;gt; i.e., it contains elements (simple events) which are either in &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; or in &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; or (perhaps) in both. This is illustrated on the Venn diagram by the dark shaded area in diagram (b).&lt;br /&gt;
* The &amp;#039;&amp;#039;intersection&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;E\cap F,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E\cap F=\left\{ x;7&amp;lt;x\leq 10\right\} ;&amp;lt;/math&amp;gt; i.e., it contains elements (simple events) which are common to both &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F.&amp;lt;/math&amp;gt; Again this is depicted by the dark shaded area in (c). If events have no elements in common (as, for example, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;) then they are said to be &amp;#039;&amp;#039;mutually exclusive&amp;#039;&amp;#039;, and we can write &amp;lt;math&amp;gt;E\cap G=\emptyset ,&amp;lt;/math&amp;gt; meaning the &amp;#039;&amp;#039;null set&amp;#039;&amp;#039; which contains no elements. Such a situation is illustrated on the Venn diagram by events (the two shaded closed loops in (d)) which do not overlap. Notice, however, that &amp;lt;math&amp;gt;G\cap F\neq \emptyset ,&amp;lt;/math&amp;gt; since &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; have elements in common.&lt;br /&gt;
* The &amp;#039;&amp;#039;complement&amp;#039;&amp;#039; of an event &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; say, is everything defined on the sample space which is not in &amp;lt;math&amp;gt;E.&amp;lt;/math&amp;gt; This event is denoted &amp;lt;math&amp;gt;\bar{E}&amp;lt;/math&amp;gt;, the dark shaded area in (e); here &amp;lt;math&amp;gt;\bar{E}=\left\{ x;x\leq 4\right\} \cup \left\{ x;x&amp;gt;10\right\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
* Finally, note that &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; is a subset of &amp;lt;math&amp;gt;F;&amp;lt;/math&amp;gt; see (f). It is depicted as the dark closed loop wholly contained within &amp;lt;math&amp;gt;F,&amp;lt;/math&amp;gt; the lighter shaded area, so that &amp;lt;math&amp;gt;H\cap F=H;&amp;lt;/math&amp;gt; if an element of the sample space is a member of &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; then it must also be a member of &amp;lt;math&amp;gt;F.&amp;lt;/math&amp;gt; (In mathematical logic, we employ this scenario to indicate that “&amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; implies &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;”, but not necessarily vice versa.) Notice that &amp;lt;math&amp;gt;G\cap H=\emptyset &amp;lt;/math&amp;gt; but &amp;lt;math&amp;gt;H\cap E\neq \emptyset&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
= Footnotes =&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=File:Venn_2f.jpg&amp;diff=2846</id>
		<title>File:Venn 2f.jpg</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=File:Venn_2f.jpg&amp;diff=2846"/>
				<updated>2013-08-07T20:36:51Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=File:Venn_2e.jpg&amp;diff=2845</id>
		<title>File:Venn 2e.jpg</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=File:Venn_2e.jpg&amp;diff=2845"/>
				<updated>2013-08-07T20:36:39Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=File:Venn_2d.jpg&amp;diff=2844</id>
		<title>File:Venn 2d.jpg</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=File:Venn_2d.jpg&amp;diff=2844"/>
				<updated>2013-08-07T20:36:27Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=File:Venn_2c.jpg&amp;diff=2843</id>
		<title>File:Venn 2c.jpg</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=File:Venn_2c.jpg&amp;diff=2843"/>
				<updated>2013-08-07T20:36:11Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=File:Venn_2b.jpg&amp;diff=2842</id>
		<title>File:Venn 2b.jpg</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=File:Venn_2b.jpg&amp;diff=2842"/>
				<updated>2013-08-07T20:35:59Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=File:Venn_2a.jpg&amp;diff=2841</id>
		<title>File:Venn 2a.jpg</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=File:Venn_2a.jpg&amp;diff=2841"/>
				<updated>2013-08-07T20:35:46Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Intro&amp;diff=2840</id>
		<title>Probability Intro</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Intro&amp;diff=2840"/>
				<updated>2013-08-07T20:35:30Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: /* Venn diagrams */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
&lt;br /&gt;
= Introducing Probability =&lt;br /&gt;
&lt;br /&gt;
So far we have been looking at ways of summarising samples of data drawn from an underlying population of interest. Although at times tedious, all such arithmetic calculations are fairly mechanical and straightforward to apply. To remind ourselves, one of the primary reasons for wishing to summarise data is to assist in the development of inferences about the population from which the data were taken. That is to say, we would like to elicit some information about the mechanism which generated the observed data.&lt;br /&gt;
&lt;br /&gt;
We now start on the process of developing mathematical ways of formulating inferences and this requires the use of &amp;#039;&amp;#039;probability&amp;#039;&amp;#039;. This becomes clear if we think back to one of the early questions posed in this course: &amp;#039;&amp;#039;prior to sampling is it possible to predict with absolute certainty what will be observed&amp;#039;&amp;#039;? The answer to this question is &amp;#039;&amp;#039;no&amp;#039;&amp;#039;; although it would be of interest to know how &amp;#039;&amp;#039;likely&amp;#039;&amp;#039; it is that certain values would be observed. Or, what is the &amp;#039;&amp;#039;probability&amp;#039;&amp;#039; of observing certain values?&lt;br /&gt;
&lt;br /&gt;
Before proceeding, we need some more tools:&lt;br /&gt;
&lt;br /&gt;
= Venn diagrams =&lt;br /&gt;
&lt;br /&gt;
Venn diagrams (and diagrams in general) are of enormous help in trying to understand and manipulate probability. We begin with some basic definitions, some of which we have encountered before.&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Experiment:&amp;#039;&amp;#039;&amp;#039; any process which, when applied, provides data or an outcome; e.g., rolling a die and observing the number of dots on the upturned face; recording the amount of rainfall in Manchester over a period of time.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Sample Space:&amp;#039;&amp;#039;&amp;#039; the set of possible outcomes of an experiment; e.g., &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; (or &amp;lt;math&amp;gt;\Omega &amp;lt;/math&amp;gt;) &amp;lt;math&amp;gt;=&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\{1,2,3,4,5,6\}&amp;lt;/math&amp;gt;, which is the sample space of rolling a die. Or &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;=&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\{x;x\geq 0\}&amp;lt;/math&amp;gt;, which is the sample space of an experiment where the outcomes can be any non-negative real number, or ‘&amp;#039;&amp;#039;the set of non-negative real numbers&amp;#039;&amp;#039;’.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Simple Event&amp;#039;&amp;#039;&amp;#039;: just one of the possible outcomes on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Event:&amp;#039;&amp;#039;&amp;#039; a &amp;#039;&amp;#039;subset&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, denoted &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt;; e.g., &amp;lt;math&amp;gt;E=\left\{ 2,4,6\right\}&amp;lt;/math&amp;gt; (i.e., any even number on a die) or &amp;lt;math&amp;gt;E=\left\{ x;4&amp;lt;x\leq 10\right\} ,&amp;lt;/math&amp;gt; which means ‘&amp;#039;&amp;#039;the set of real numbers which are strictly bigger than&amp;#039;&amp;#039; &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;but less than or equal to &amp;#039;&amp;#039;&amp;lt;math&amp;gt;10&amp;lt;/math&amp;gt;’.&amp;lt;br /&amp;gt;Note that an event, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, is a collection of simple events.&lt;br /&gt;
&lt;br /&gt;
Such concepts can be represented by means of the following Venn Diagram:&lt;br /&gt;
&lt;br /&gt;
[[File:Venn_1.jpg|frameless|600px]]&lt;br /&gt;
&lt;br /&gt;
The sample space, &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; is depicted as a closed rectangle, and the event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; is a closed loop wholly contained within &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; and we write (in set notation) &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In dealing with probability, and in particular the probability of an event (or events) occurring, we shall need to be familiar with &amp;#039;&amp;#039;&amp;#039;UNIONS, INTERSECTIONS&amp;#039;&amp;#039;&amp;#039; and &amp;#039;&amp;#039;&amp;#039;COMPLEMENTS&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
To illustrate these concepts, consider the sample space &amp;lt;math&amp;gt;S=\{x;x\geq 0\},\,&amp;lt;/math&amp;gt; with the following events defined on &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; as depicted in Figure 3.2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E=\{x;4&amp;lt;x\leq 10\},\,F=\{x;7&amp;lt;x\leq 17\},\,G=\{x;x&amp;gt;15\},\,H=\{x;9&amp;lt;x\leq&lt;br /&gt;
13\}. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
| (a) Event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;: A closed loop&lt;br /&gt;
| (b) Union: &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2a.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2b.jpg|frameless|300px]]&lt;br /&gt;
|-&lt;br /&gt;
| (c) Intersection: &amp;lt;math&amp;gt;E\cap F&amp;lt;/math&amp;gt;&lt;br /&gt;
| (d) The Null set/event: &amp;lt;math&amp;gt;E\cap G=\emptyset &amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2c.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2d.jpg|frameless|300px]]&lt;br /&gt;
|-&lt;br /&gt;
| (e) Complement of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;\bar{E}&amp;lt;/math&amp;gt;&lt;br /&gt;
| (f) Subset of &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;H\subset F&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;H\cap F=H&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2e.jpg|frameless|300px]] &lt;br /&gt;
| [[File:Venn_2f.jpg|frameless|300px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* The &amp;#039;&amp;#039;union&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;E\cup F,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E\cup F=\{x;4&amp;lt;x\leq 17\};&amp;lt;/math&amp;gt; i.e., it contains elements (simple events) which are either in &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; or in &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; or (perhaps) in both. This is illustrated on the Venn diagram by the dark shaded area in diagram (b).&lt;br /&gt;
* The &amp;#039;&amp;#039;intersection&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;E\cap F,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E\cap F=\left\{ x;7&amp;lt;x\leq 10\right\} ;&amp;lt;/math&amp;gt; i.e., it contains elements (simple events) which are common to both &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F.&amp;lt;/math&amp;gt; Again this is depicted by the dark shaded area in (c). If events have no elements in common (as, for example, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;) then they are said to be &amp;#039;&amp;#039;mutually exclusive&amp;#039;&amp;#039;, and we can write &amp;lt;math&amp;gt;E\cap G=\emptyset ,&amp;lt;/math&amp;gt; meaning the &amp;#039;&amp;#039;null set&amp;#039;&amp;#039; which contains no elements. Such a situation is illustrated on the Venn diagram by events (the two shaded closed loops in (d)) which do not overlap. Notice, however, that &amp;lt;math&amp;gt;G\cap F\neq \emptyset ,&amp;lt;/math&amp;gt; since &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; have elements in common.&lt;br /&gt;
* The &amp;#039;&amp;#039;complement&amp;#039;&amp;#039; of an event &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; say, is everything defined on the sample space which is not in &amp;lt;math&amp;gt;E.&amp;lt;/math&amp;gt; This event is denoted &amp;lt;math&amp;gt;\bar{E}&amp;lt;/math&amp;gt;, the dark shaded area in (e); here &amp;lt;math&amp;gt;\bar{E}=\left\{ x;x\leq 4\right\} \cup \left\{ x;x&amp;gt;10\right\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
* Finally, note that &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; is a subset of &amp;lt;math&amp;gt;F;&amp;lt;/math&amp;gt; see (f). It is depicted as the dark closed loop wholly contained within &amp;lt;math&amp;gt;F,&amp;lt;/math&amp;gt; the lighter shaded area, so that &amp;lt;math&amp;gt;H\cap F=H;&amp;lt;/math&amp;gt; if an element of the sample space is a member of &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; then it must also be a member of &amp;lt;math&amp;gt;F.&amp;lt;/math&amp;gt; (In mathematical logic, we employ this scenario to indicate that “&amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; implies &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;”, but not necessarily vice versa.) Notice that &amp;lt;math&amp;gt;G\cap H=\emptyset &amp;lt;/math&amp;gt; but &amp;lt;math&amp;gt;H\cap E\neq \emptyset&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
= Footnotes =&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Intro&amp;diff=2839</id>
		<title>Probability Intro</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Intro&amp;diff=2839"/>
				<updated>2013-08-07T20:32:46Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: /* Venn diagrams */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
&lt;br /&gt;
= Introducing Probability =&lt;br /&gt;
&lt;br /&gt;
So far we have been looking at ways of summarising samples of data drawn from an underlying population of interest. Although at times tedious, all such arithmetic calculations are fairly mechanical and straightforward to apply. To remind ourselves, one of the primary reasons for wishing to summarise data is to assist in the development of inferences about the population from which the data were taken. That is to say, we would like to elicit some information about the mechanism which generated the observed data.&lt;br /&gt;
&lt;br /&gt;
We now start on the process of developing mathematical ways of formulating inferences and this requires the use of &amp;#039;&amp;#039;probability&amp;#039;&amp;#039;. This becomes clear if we think back to one of the early questions posed in this course: &amp;#039;&amp;#039;prior to sampling is it possible to predict with absolute certainty what will be observed&amp;#039;&amp;#039;? The answer to this question is &amp;#039;&amp;#039;no&amp;#039;&amp;#039;; although it would be of interest to know how &amp;#039;&amp;#039;likely&amp;#039;&amp;#039; it is that certain values would be observed. Or, what is the &amp;#039;&amp;#039;probability&amp;#039;&amp;#039; of observing certain values?&lt;br /&gt;
&lt;br /&gt;
Before proceeding, we need some more tools:&lt;br /&gt;
&lt;br /&gt;
= Venn diagrams =&lt;br /&gt;
&lt;br /&gt;
Venn diagrams (and diagrams in general) are of enormous help in trying to understand and manipulate probability. We begin with some basic definitions, some of which we have encountered before.&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Experiment:&amp;#039;&amp;#039;&amp;#039; any process which, when applied, provides data or an outcome; e.g., rolling a die and observing the number of dots on the upturned face; recording the amount of rainfall in Manchester over a period of time.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Sample Space:&amp;#039;&amp;#039;&amp;#039; the set of possible outcomes of an experiment; e.g., &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; (or &amp;lt;math&amp;gt;\Omega &amp;lt;/math&amp;gt;) &amp;lt;math&amp;gt;=&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\{1,2,3,4,5,6\}&amp;lt;/math&amp;gt;, which is the sample space of rolling a die. Or &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;=&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\{x;x\geq 0\}&amp;lt;/math&amp;gt;, which is the sample space of an experiment where the outcomes can be any non-negative real number, or ‘&amp;#039;&amp;#039;the set of non-negative real numbers&amp;#039;&amp;#039;’.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Simple Event&amp;#039;&amp;#039;&amp;#039;: just one of the possible outcomes on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Event:&amp;#039;&amp;#039;&amp;#039; a &amp;#039;&amp;#039;subset&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, denoted &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt;; e.g., &amp;lt;math&amp;gt;E=\left\{ 2,4,6\right\}&amp;lt;/math&amp;gt; (i.e., any even number on a die) or &amp;lt;math&amp;gt;E=\left\{ x;4&amp;lt;x\leq 10\right\} ,&amp;lt;/math&amp;gt; which means ‘&amp;#039;&amp;#039;the set of real numbers which are strictly bigger than&amp;#039;&amp;#039; &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;but less than or equal to &amp;#039;&amp;#039;&amp;lt;math&amp;gt;10&amp;lt;/math&amp;gt;’.&amp;lt;br /&amp;gt;Note that an event, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, is a collection of simple events.&lt;br /&gt;
&lt;br /&gt;
Such concepts can be represented by means of the following Venn Diagram:&lt;br /&gt;
&lt;br /&gt;
[[File:Venn_1.jpg|frameless|600px]]&lt;br /&gt;
&lt;br /&gt;
The sample space, &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; is depicted as a closed rectangle, and the event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; is a closed loop wholly contained within &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; and we write (in set notation) &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In dealing with probability, and in particular the probability of an event (or events) occurring, we shall need to be familiar with &amp;#039;&amp;#039;&amp;#039;UNIONS, INTERSECTIONS&amp;#039;&amp;#039;&amp;#039; and &amp;#039;&amp;#039;&amp;#039;COMPLEMENTS&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
To illustrate these concepts, consider the sample space &amp;lt;math&amp;gt;S=\{x;x\geq 0\},\,&amp;lt;/math&amp;gt; with the following events defined on &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; as depicted in Figure 3.2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E=\{x;4&amp;lt;x\leq 10\},\,F=\{x;7&amp;lt;x\leq 17\},\,G=\{x;x&amp;gt;15\},\,H=\{x;9&amp;lt;x\leq&lt;br /&gt;
13\}. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
| (a) Event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;: A closed loop&lt;br /&gt;
| (b) Union: &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2a.jpg|frameless|300px]]&lt;br /&gt;
| [[File:Venn_2b.jpg|frameless|300px]]&lt;br /&gt;
|-&lt;br /&gt;
| (c) Intersection: &amp;lt;math&amp;gt;E\cap F&amp;lt;/math&amp;gt;&lt;br /&gt;
| (d) The Null set/event: &amp;lt;math&amp;gt;E\cap G=\emptyset &amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2c.jpg|frameless|300px]]&lt;br /&gt;
| [[File:Venn_2d.jpg|frameless|300px]]&lt;br /&gt;
|-&lt;br /&gt;
| (e) Complement of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;\bar{E}&amp;lt;/math&amp;gt;&lt;br /&gt;
| (f) Subset of &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;H\subset F&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;H\cap F=H&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2e.jpg|frameless|300px]]&lt;br /&gt;
| [[File:Venn_2f.jpg|frameless|300px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
* The &amp;#039;&amp;#039;union&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;E\cup F,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E\cup F=\{x;4&amp;lt;x\leq 17\};&amp;lt;/math&amp;gt; i.e., it contains elements (simple events) which are either in &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; or in &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; or (perhaps) in both. This is illustrated on the Venn diagram by the dark shaded area in diagram (b).&lt;br /&gt;
* The &amp;#039;&amp;#039;intersection&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;E\cap F,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E\cap F=\left\{ x;7&amp;lt;x\leq 10\right\} ;&amp;lt;/math&amp;gt; i.e., it contains elements (simple events) which are common to both &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F.&amp;lt;/math&amp;gt; Again this is depicted by the dark shaded area in (c). If events have no elements in common (as, for example, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;) then they are said to be &amp;#039;&amp;#039;mutually exclusive&amp;#039;&amp;#039;, and we can write &amp;lt;math&amp;gt;E\cap G=\emptyset ,&amp;lt;/math&amp;gt; where &amp;lt;math&amp;gt;\emptyset&amp;lt;/math&amp;gt; denotes the &amp;#039;&amp;#039;null set&amp;#039;&amp;#039;, which contains no elements. Such a situation is illustrated on the Venn diagram by events (the two shaded closed loops in (d)) which do not overlap. Notice, however, that &amp;lt;math&amp;gt;G\cap F\neq \emptyset ,&amp;lt;/math&amp;gt; since &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; have elements in common.&lt;br /&gt;
* The &amp;#039;&amp;#039;complement&amp;#039;&amp;#039; of an event &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; say, is everything defined on the sample space which is not in &amp;lt;math&amp;gt;E.&amp;lt;/math&amp;gt; This event is denoted &amp;lt;math&amp;gt;\bar{E}&amp;lt;/math&amp;gt;, the dark shaded area in (e); here &amp;lt;math&amp;gt;\bar{E}=\left\{ x;x\leq 4\right\} \cup \left\{ x;x&amp;gt;10\right\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
* Finally note that &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; is a subset of &amp;lt;math&amp;gt;F;&amp;lt;/math&amp;gt; see (f). It is depicted as the dark closed loop wholly contained within &amp;lt;math&amp;gt;F,&amp;lt;/math&amp;gt; the lighter shaded area, so that &amp;lt;math&amp;gt;H\cap F=H;&amp;lt;/math&amp;gt; if an element of the sample space is a member of &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; then it must also be a member of &amp;lt;math&amp;gt;F.&amp;lt;/math&amp;gt; (In mathematical logic, we employ this scenario to indicate that “&amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; implies &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;”, but not necessarily vice versa.) Notice that &amp;lt;math&amp;gt;G\cap H=\emptyset &amp;lt;/math&amp;gt; but &amp;lt;math&amp;gt;H\cap E\neq \emptyset&amp;lt;/math&amp;gt;.&lt;br /&gt;
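The relationships listed above can be verified mechanically. The sketch below is our own illustration (in Python, which this resource plans to cover): each event is restricted to the integers it contains, and the unbounded event G is truncated at 30 purely for display.

```python
# Integer versions of the events defined above (illustrative only):
# E = (4, 10], F = (7, 17], G = (15, infinity) truncated at 30, H = (9, 13]
E = set(range(5, 11))     # {5, 6, 7, 8, 9, 10}
F = set(range(8, 18))     # {8, ..., 17}
G = set(range(16, 31))    # {16, ..., 30}
H = set(range(10, 14))    # {10, 11, 12, 13}

print(E | F == set(range(5, 18)))   # union: integers in (4, 17] -> True
print(E & F == {8, 9, 10})          # intersection: integers in (7, 10] -> True
print(E & G == set())               # E and G are mutually exclusive -> True
print(G & F == {16, 17})            # but G and F do overlap -> True
print(H <= F and (H & F) == H)      # H is a subset of F -> True
```

The same checks carry over to any finite events; only the continuous sample space requires the interval notation used in the text.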
&lt;br /&gt;
= Footnotes =&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=File:Venn_1.jpg&amp;diff=2838</id>
		<title>File:Venn 1.jpg</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=File:Venn_1.jpg&amp;diff=2838"/>
				<updated>2013-08-07T20:31:41Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Intro&amp;diff=2837</id>
		<title>Probability Intro</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Intro&amp;diff=2837"/>
				<updated>2013-08-07T20:31:29Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: /* Venn diagrams */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
&lt;br /&gt;
= Introducing Probability =&lt;br /&gt;
&lt;br /&gt;
So far we have been looking at ways of summarising samples of data drawn from an underlying population of interest. Although at times tedious, all such arithmetic calculations are fairly mechanical and straightforward to apply. To remind ourselves, one of the primary reasons for wishing to summarise data is to assist in the development of inferences about the population from which the data were taken. That is to say, we would like to elicit some information about the mechanism which generated the observed data.&lt;br /&gt;
&lt;br /&gt;
We now start on the process of developing mathematical ways of formulating inferences, and this requires the use of &amp;#039;&amp;#039;probability&amp;#039;&amp;#039;. This becomes clear if we think back to one of the early questions posed in this course: &amp;#039;&amp;#039;prior to sampling, is it possible to predict with absolute certainty what will be observed&amp;#039;&amp;#039;? The answer is &amp;#039;&amp;#039;no&amp;#039;&amp;#039;, although it would be of interest to know how &amp;#039;&amp;#039;likely&amp;#039;&amp;#039; it is that certain values would be observed. Or, what is the &amp;#039;&amp;#039;probability&amp;#039;&amp;#039; of observing certain values?&lt;br /&gt;
&lt;br /&gt;
Before proceeding, we need some more tools:&lt;br /&gt;
&lt;br /&gt;
= Venn diagrams =&lt;br /&gt;
&lt;br /&gt;
Venn diagrams (and diagrams in general) are of enormous help in trying to understand and manipulate probability. We begin with some basic definitions, some of which we have encountered before.&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Experiment:&amp;#039;&amp;#039;&amp;#039; any process which, when applied, provides data or an outcome; e.g., rolling a die and observing the number of dots on the upturned face; recording the amount of rainfall in Manchester over a period of time.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Sample Space:&amp;#039;&amp;#039;&amp;#039; the set of possible outcomes of an experiment; e.g., &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; (or &amp;lt;math&amp;gt;\Omega &amp;lt;/math&amp;gt;) &amp;lt;math&amp;gt;=&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\{1,2,3,4,5,6\}&amp;lt;/math&amp;gt;, which is the sample space of rolling a die. Or &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;=&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\{x;x\geq 0\}&amp;lt;/math&amp;gt;, which is the sample space of an experiment where the outcomes can be any non-negative real number, or ‘&amp;#039;&amp;#039;the set of non-negative real numbers&amp;#039;&amp;#039;’.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Simple Event&amp;#039;&amp;#039;&amp;#039;: just one of the possible outcomes on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Event:&amp;#039;&amp;#039;&amp;#039; a &amp;#039;&amp;#039;subset&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, denoted &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt;; e.g., &amp;lt;math&amp;gt;E=\left\{ 2,4,6\right\}&amp;lt;/math&amp;gt; (i.e. an even number on a die) or &amp;lt;math&amp;gt;E=\left\{ x;4&amp;lt;x\leq 10\right\} ,&amp;lt;/math&amp;gt; which means ‘&amp;#039;&amp;#039;the set of real numbers which are strictly bigger than&amp;#039;&amp;#039; &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;but less than or equal to &amp;#039;&amp;#039;&amp;lt;math&amp;gt;10&amp;lt;/math&amp;gt;’.&amp;lt;br /&amp;gt;Note that an event, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, is a collection of simple events.&lt;br /&gt;
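Since this resource plans to add Python material alongside MATLAB, the die-rolling definitions above can be written out directly with Python's built-in set type (a minimal sketch; the variable names are our own):

```python
# Sample space for rolling a die, and the event "an even number is rolled"
S = {1, 2, 3, 4, 5, 6}
E = {2, 4, 6}

# An event is a subset of the sample space: E is contained in S
print(E.issubset(S))          # True

# Each simple event is a single possible outcome in S
simple_events = [{outcome} for outcome in S]

# An event is a collection of simple events: E is the union of {2}, {4}, {6}
print(E == {2} | {4} | {6})   # True
```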
&lt;br /&gt;
Such concepts can be represented by means of the following Venn Diagram:&lt;br /&gt;
&lt;br /&gt;
[[File:Venn_1.jpg|frameless|750px]]&lt;br /&gt;
&lt;br /&gt;
The sample space, &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; is depicted as a closed rectangle, and the event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; is a closed loop wholly contained within &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; and we write (in set notation) &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In dealing with probability, and in particular the probability of an event (or events) occurring, we shall need to be familiar with &amp;#039;&amp;#039;&amp;#039;UNIONS, INTERSECTIONS&amp;#039;&amp;#039;&amp;#039; and &amp;#039;&amp;#039;&amp;#039;COMPLEMENTS&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
To illustrate these concepts, consider the sample space &amp;lt;math&amp;gt;S=\{x;x\geq 0\},\,&amp;lt;/math&amp;gt; with the following events defined on &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; as depicted in Figure 3.2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E=\{x;4&amp;lt;x\leq 10\},\,F=\{x;7&amp;lt;x\leq 17\},\,G=\{x;x&amp;gt;15\},\,H=\{x;9&amp;lt;x\leq 13\}.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! (a) Event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;: A closed loop&lt;br /&gt;
! (b) Union: &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2a.jpg|frameless|750px]]&lt;br /&gt;
| [[File:Venn_2b.jpg|frameless|750px]]&lt;br /&gt;
|-&lt;br /&gt;
! (c) Intersection: &amp;lt;math&amp;gt;E\cap F&amp;lt;/math&amp;gt;&lt;br /&gt;
! (d) The Null set/event: &amp;lt;math&amp;gt;E\cap G=\emptyset&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2c.jpg|frameless|750px]]&lt;br /&gt;
| [[File:Venn_2d.jpg|frameless|750px]]&lt;br /&gt;
|-&lt;br /&gt;
! (e) Complement of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;\bar{E}&amp;lt;/math&amp;gt;&lt;br /&gt;
! (f) Subset of &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;H\subset F&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;H\cap F=H&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2e.jpg|frameless|750px]]&lt;br /&gt;
| [[File:Venn_2f.jpg|frameless|750px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
* The &amp;#039;&amp;#039;union&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;E\cup F,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E\cup F=\{x;4&amp;lt;x\leq 17\};&amp;lt;/math&amp;gt; i.e., it contains elements (simple events) which are either in &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; or in &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; or (perhaps) in both. This is illustrated on the Venn diagram by the dark shaded area in diagram (b).&lt;br /&gt;
* The &amp;#039;&amp;#039;intersection&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;E\cap F,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E\cap F=\left\{ x;7&amp;lt;x\leq 10\right\} ;&amp;lt;/math&amp;gt; i.e., it contains elements (simple events) which are common to both &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F.&amp;lt;/math&amp;gt; Again this is depicted by the dark shaded area in (c). If events have no elements in common (as, for example, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;) then they are said to be &amp;#039;&amp;#039;mutually exclusive&amp;#039;&amp;#039;, and we can write &amp;lt;math&amp;gt;E\cap G=\emptyset ,&amp;lt;/math&amp;gt; where &amp;lt;math&amp;gt;\emptyset&amp;lt;/math&amp;gt; denotes the &amp;#039;&amp;#039;null set&amp;#039;&amp;#039;, which contains no elements. Such a situation is illustrated on the Venn diagram by events (the two shaded closed loops in (d)) which do not overlap. Notice, however, that &amp;lt;math&amp;gt;G\cap F\neq \emptyset ,&amp;lt;/math&amp;gt; since &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; have elements in common.&lt;br /&gt;
* The &amp;#039;&amp;#039;complement&amp;#039;&amp;#039; of an event &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; say, is everything defined on the sample space which is not in &amp;lt;math&amp;gt;E.&amp;lt;/math&amp;gt; This event is denoted &amp;lt;math&amp;gt;\bar{E}&amp;lt;/math&amp;gt;, the dark shaded area in (e); here &amp;lt;math&amp;gt;\bar{E}=\left\{ x;x\leq 4\right\} \cup \left\{ x;x&amp;gt;10\right\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
* Finally note that &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; is a subset of &amp;lt;math&amp;gt;F;&amp;lt;/math&amp;gt; see (f). It is depicted as the dark closed loop wholly contained within &amp;lt;math&amp;gt;F,&amp;lt;/math&amp;gt; the lighter shaded area, so that &amp;lt;math&amp;gt;H\cap F=H;&amp;lt;/math&amp;gt; if an element of the sample space is a member of &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; then it must also be a member of &amp;lt;math&amp;gt;F.&amp;lt;/math&amp;gt; (In mathematical logic, we employ this scenario to indicate that “&amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; implies &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;”, but not necessarily vice versa.) Notice that &amp;lt;math&amp;gt;G\cap H=\emptyset &amp;lt;/math&amp;gt; but &amp;lt;math&amp;gt;H\cap E\neq \emptyset&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
= Footnotes =&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Intro&amp;diff=2836</id>
		<title>Probability Intro</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Probability_Intro&amp;diff=2836"/>
				<updated>2013-08-07T20:27:23Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: Created page with &amp;quot;  = Introducing Probability =  So far we have been looking at ways of summarising samples of data drawn from an underlying population of interest. Although at times tedious, a...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
&lt;br /&gt;
= Introducing Probability =&lt;br /&gt;
&lt;br /&gt;
So far we have been looking at ways of summarising samples of data drawn from an underlying population of interest. Although at times tedious, all such arithmetic calculations are fairly mechanical and straightforward to apply. To remind ourselves, one of the primary reasons for wishing to summarise data is to assist in the development of inferences about the population from which the data were taken. That is to say, we would like to elicit some information about the mechanism which generated the observed data.&lt;br /&gt;
&lt;br /&gt;
We now start on the process of developing mathematical ways of formulating inferences, and this requires the use of &amp;#039;&amp;#039;probability&amp;#039;&amp;#039;. This becomes clear if we think back to one of the early questions posed in this course: &amp;#039;&amp;#039;prior to sampling, is it possible to predict with absolute certainty what will be observed&amp;#039;&amp;#039;? The answer is &amp;#039;&amp;#039;no&amp;#039;&amp;#039;, although it would be of interest to know how &amp;#039;&amp;#039;likely&amp;#039;&amp;#039; it is that certain values would be observed. Or, what is the &amp;#039;&amp;#039;probability&amp;#039;&amp;#039; of observing certain values?&lt;br /&gt;
&lt;br /&gt;
Before proceeding, we need some more tools:&lt;br /&gt;
&lt;br /&gt;
= Venn diagrams =&lt;br /&gt;
&lt;br /&gt;
Venn diagrams (and diagrams in general) are of enormous help in trying to understand and manipulate probability. We begin with some basic definitions, some of which we have encountered before.&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Experiment:&amp;#039;&amp;#039;&amp;#039; any process which, when applied, provides data or an outcome; e.g., rolling a die and observing the number of dots on the upturned face; recording the amount of rainfall in Manchester over a period of time.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Sample Space:&amp;#039;&amp;#039;&amp;#039; the set of possible outcomes of an experiment; e.g., &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; (or &amp;lt;math&amp;gt;\Omega &amp;lt;/math&amp;gt;) &amp;lt;math&amp;gt;=&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\{1,2,3,4,5,6\}&amp;lt;/math&amp;gt;, which is the sample space of rolling a die. Or &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;=&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;\{x;x\geq 0\}&amp;lt;/math&amp;gt;, which is the sample space of an experiment where the outcomes can be any non-negative real number, or ‘&amp;#039;&amp;#039;the set of non-negative real numbers&amp;#039;&amp;#039;’.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Simple Event&amp;#039;&amp;#039;&amp;#039;: just one of the possible outcomes on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Event:&amp;#039;&amp;#039;&amp;#039; a &amp;#039;&amp;#039;subset&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, denoted &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt;; e.g., &amp;lt;math&amp;gt;E=\left\{ 2,4,6\right\}&amp;lt;/math&amp;gt; (i.e. an even number on a die) or &amp;lt;math&amp;gt;E=\left\{ x;4&amp;lt;x\leq 10\right\} ,&amp;lt;/math&amp;gt; which means ‘&amp;#039;&amp;#039;the set of real numbers which are strictly bigger than&amp;#039;&amp;#039; &amp;lt;math&amp;gt;4&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;but less than or equal to &amp;#039;&amp;#039;&amp;lt;math&amp;gt;10&amp;lt;/math&amp;gt;’.&amp;lt;br /&amp;gt;Note that an event, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;, is a collection of simple events.&lt;br /&gt;
&lt;br /&gt;
Such concepts can be represented by means of the following Venn Diagram:&lt;br /&gt;
&lt;br /&gt;
[[File:Venn_1.wmf|frameless|750px]]&lt;br /&gt;
&lt;br /&gt;
The sample space, &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; is depicted as a closed rectangle, and the event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; is a closed loop wholly contained within &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; and we write (in set notation) &amp;lt;math&amp;gt;E\subset S&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In dealing with probability, and in particular the probability of an event (or events) occurring, we shall need to be familiar with &amp;#039;&amp;#039;&amp;#039;UNIONS, INTERSECTIONS&amp;#039;&amp;#039;&amp;#039; and &amp;#039;&amp;#039;&amp;#039;COMPLEMENTS&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
To illustrate these concepts, consider the sample space &amp;lt;math&amp;gt;S=\{x;x\geq 0\},\,&amp;lt;/math&amp;gt; with the following events defined on &amp;lt;math&amp;gt;S,&amp;lt;/math&amp;gt; as depicted in Figure 3.2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E=\{x;4&amp;lt;x\leq 10\},\,F=\{x;7&amp;lt;x\leq 17\},\,G=\{x;x&amp;gt;15\},\,H=\{x;9&amp;lt;x\leq 13\}.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! (a) Event &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;: A closed loop&lt;br /&gt;
! (b) Union: &amp;lt;math&amp;gt;E\cup F&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2a.wmf|frameless|750px]]&lt;br /&gt;
| [[File:Venn_2b.wmf|frameless|750px]]&lt;br /&gt;
|-&lt;br /&gt;
! (c) Intersection: &amp;lt;math&amp;gt;E\cap F&amp;lt;/math&amp;gt;&lt;br /&gt;
! (d) The Null set/event: &amp;lt;math&amp;gt;E\cap G=\emptyset&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2c.wmf|frameless|750px]]&lt;br /&gt;
| [[File:Venn_2d.wmf|frameless|750px]]&lt;br /&gt;
|-&lt;br /&gt;
! (e) Complement of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;\bar{E}&amp;lt;/math&amp;gt;&lt;br /&gt;
! (f) Subset of &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;H\subset F&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;H\cap F=H&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[File:Venn_2e.wmf|frameless|750px]]&lt;br /&gt;
| [[File:Venn_2f.wmf|frameless|750px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
* The &amp;#039;&amp;#039;union&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;E\cup F,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E\cup F=\{x;4&amp;lt;x\leq 17\};&amp;lt;/math&amp;gt; i.e., it contains elements (simple events) which are either in &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; or in &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; or (perhaps) in both. This is illustrated on the Venn diagram by the dark shaded area in diagram (b).&lt;br /&gt;
* The &amp;#039;&amp;#039;intersection&amp;#039;&amp;#039; of &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;E\cap F,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E\cap F=\left\{ x;7&amp;lt;x\leq 10\right\} ;&amp;lt;/math&amp;gt; i.e., it contains elements (simple events) which are common to both &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F.&amp;lt;/math&amp;gt; Again this is depicted by the dark shaded area in (c). If events have no elements in common (as, for example, &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;) then they are said to be &amp;#039;&amp;#039;mutually exclusive&amp;#039;&amp;#039;, and we can write &amp;lt;math&amp;gt;E\cap G=\emptyset ,&amp;lt;/math&amp;gt; where &amp;lt;math&amp;gt;\emptyset&amp;lt;/math&amp;gt; denotes the &amp;#039;&amp;#039;null set&amp;#039;&amp;#039;, which contains no elements. Such a situation is illustrated on the Venn diagram by events (the two shaded closed loops in (d)) which do not overlap. Notice, however, that &amp;lt;math&amp;gt;G\cap F\neq \emptyset ,&amp;lt;/math&amp;gt; since &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; have elements in common.&lt;br /&gt;
* The &amp;#039;&amp;#039;complement&amp;#039;&amp;#039; of an event &amp;lt;math&amp;gt;E,&amp;lt;/math&amp;gt; say, is everything defined on the sample space which is not in &amp;lt;math&amp;gt;E.&amp;lt;/math&amp;gt; This event is denoted &amp;lt;math&amp;gt;\bar{E}&amp;lt;/math&amp;gt;, the dark shaded area in (e); here &amp;lt;math&amp;gt;\bar{E}=\left\{ x;x\leq 4\right\} \cup \left\{ x;x&amp;gt;10\right\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
* Finally note that &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; is a subset of &amp;lt;math&amp;gt;F;&amp;lt;/math&amp;gt; see (f). It is depicted as the dark closed loop wholly contained within &amp;lt;math&amp;gt;F,&amp;lt;/math&amp;gt; the lighter shaded area, so that &amp;lt;math&amp;gt;H\cap F=H;&amp;lt;/math&amp;gt; if an element of the sample space is a member of &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; then it must also be a member of &amp;lt;math&amp;gt;F.&amp;lt;/math&amp;gt; (In mathematical logic, we employ this scenario to indicate that “&amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt; implies &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt;”, but not necessarily vice versa.) Notice that &amp;lt;math&amp;gt;G\cap H=\emptyset &amp;lt;/math&amp;gt; but &amp;lt;math&amp;gt;H\cap E\neq \emptyset&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
= Footnotes =&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=MediaWiki:CustomBlockMainNav&amp;diff=2835</id>
		<title>MediaWiki:CustomBlockMainNav</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=MediaWiki:CustomBlockMainNav&amp;diff=2835"/>
				<updated>2013-08-07T19:44:03Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: Created page with &amp;quot;&amp;lt;!-- Vergiss nicht VoWi:Navigation anzupassen, wenn du diese Seite aenderst! --&amp;gt; * Home ** MATLAB ** EVIEWS ** Statistics&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;!-- Vergiss nicht VoWi:Navigation anzupassen, wenn du diese Seite aenderst! --&amp;gt;&lt;br /&gt;
* [[Main_Page|Home]]&lt;br /&gt;
** [[MATLAB|MATLAB]]&lt;br /&gt;
** [[EVIEWS|EVIEWS]]&lt;br /&gt;
** [[Statistics|Statistics]]&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=MediaWiki:CustomNavBlocks&amp;diff=2834</id>
		<title>MediaWiki:CustomNavBlocks</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=MediaWiki:CustomNavBlocks&amp;diff=2834"/>
				<updated>2013-08-07T19:42:29Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;CustomBlockMainNav|ECLR&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=MediaWiki:CustomNavBlocks&amp;diff=2833</id>
		<title>MediaWiki:CustomNavBlocks</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=MediaWiki:CustomNavBlocks&amp;diff=2833"/>
				<updated>2013-08-07T19:42:12Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: Created page with &amp;quot;CustomBlockMainNav|ECLR Home&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;CustomBlockMainNav|ECLR Home&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Main_Page&amp;diff=2829</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Main_Page&amp;diff=2829"/>
				<updated>2013-08-07T13:11:35Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
&lt;br /&gt;
== Econometric Computing Learning Resource (ECLR) ==&lt;br /&gt;
&lt;br /&gt;
This is the home of the ECLR. The purpose of this resource is to facilitate the application of econometric techniques. There exist numerous software packages that can be used to solve econometric problems. Some of them are menu driven (e.g. EVIEWS) and are great for tackling standard econometric problems. Others require more programming but therefore also allow the user to tackle non-standard problems. This resource provides support material for the use of [[EVIEWS]] and [[MATLAB]].&lt;br /&gt;
&lt;br /&gt;
This page will not really teach econometrics, although it will provide sufficient econometric background to facilitate the implementation of the different econometric techniques. To do any serious econometrics you need some basic statistics knowledge. On the [[Statistics]] page you can review some of the required statistical background.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Programming&amp;lt;br&amp;gt; Languages&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| &lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Menu-driven&amp;lt;br&amp;gt;software&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| &lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Statistics&amp;lt;br&amp;gt;Introduction&lt;br /&gt;
|-&lt;br /&gt;
| [[MATLAB|MATLAB]]&lt;br /&gt;
| Python (to come)&lt;br /&gt;
| [[EVIEWS|EVIEWS]]&lt;br /&gt;
| &lt;br /&gt;
| [[Statistics|Statistics Intro]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Should you learn programming to do Econometrics? ==&lt;br /&gt;
Often, the answer to this question is no. If you need to apply a very standard econometric technique and don&amp;#039;t have to do so very often, then a menu-driven software package (like EVIEWS) may well be exactly the right way to go.&lt;br /&gt;
&lt;br /&gt;
However, if you want to achieve non-standard things (and when working on a PhD or even an MSc you are bound to want to do non-standard things) the picture is very different. A menu-driven software package may do some related things, but often not exactly what you want. Even if the software package has a &amp;quot;button&amp;quot; to do what you want to do, it is often badly documented what exactly happens under the hood. Most importantly, many non-standard things just cannot be done in menu-driven software. This is where MATLAB (or other software packages like GAUSS or OX) comes to the rescue. At its core it has nothing to do with econometrics: it is a matrix-algebra maths programme in which you can do with data (stored in matrices) whatever you want. When we say whatever, this is to be understood almost literally. And that is the point.&lt;br /&gt;
&lt;br /&gt;
Therefore, there are powerful reasons why you may want to learn to use a proper programming language to solve your econometric problem:&lt;br /&gt;
&lt;br /&gt;
# You can easily repeat the same analysis with many more datasets.&lt;br /&gt;
# Some analyses involve very repetitive tasks. In such cases, writing a program eliminates the need to do essentially the same thing many times: you instruct the software to do the repetitive work for you.&lt;br /&gt;
# By writing a MATLAB program (or script) you also create a record of what you do. This is extremely useful when it comes to understanding previous work and identifying mistakes. This is possibly the most important advantage of writing your own programs.&lt;br /&gt;
# By having to instruct your computer exactly what to do, you properly learn the underlying econometrics. In other words, programming can be an excellent econometric learning tool.&lt;br /&gt;
&lt;br /&gt;
If you have done any programming in another language, such as Visual Basic, C, C++ or GAUSS, you will recognise a lot of common patterns. Learning how to use MATLAB to do econometrics is very much a trial-and-error process; this wiki will provide guidance, but you will have to practise yourself.&lt;br /&gt;
&lt;br /&gt;
== Authors, Maintenance and Contributions ==&lt;br /&gt;
&lt;br /&gt;
This wiki was created by [mailto:ralf.becker@manchester.ac.uk Ralf Becker] and [mailto:arthur.sinko@manchester.ac.uk Arthur Sinko] with the financial support of a University of Manchester Investing in Success grant. If you have any suggestions please contact us by email. Contributions to this wiki are encouraged. Please contact us for details.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Help for using WikiMedia ==&lt;br /&gt;
* [http://www.mediawiki.org/wiki/Manual:FAQ MediaWiki FAQ]&lt;br /&gt;
* [http://www.mediawiki.org/wiki/Manual:Configuration_settings MediaWiki configuration settings]&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Main_Page&amp;diff=2828</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Main_Page&amp;diff=2828"/>
				<updated>2013-08-07T13:07:04Z</updated>
		
		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
&lt;br /&gt;
== Econometric Computing Learning Resource (ECLR) ==&lt;br /&gt;
&lt;br /&gt;
This is the home of the ECLR. The purpose of this resource is to facilitate the application of econometric techniques. There exist numerous software packages that can be used to solve econometric problems. Some of them are menu driven (e.g. EVIEWS) and are great for tackling standard econometric problems. Others require more programming but therefore also allow the user to tackle non-standard problems. This resource provides support material for the use of [[EVIEWS]] and [[MATLAB]].&lt;br /&gt;
&lt;br /&gt;
This page will not really teach econometrics, although it will provide sufficient econometric background to facilitate the implementation of the different econometric techniques. To do any serious econometrics you need some basic statistics knowledge. On the [[Statistics]] page you can review some of the required statistical background.&lt;br /&gt;
&lt;br /&gt;
== Should you learn programming to do Econometrics? ==&lt;br /&gt;
Often, the answer to this question is no. If you need to apply a very standard econometric technique and don&amp;#039;t have to do so very often, then a menu-driven software package (like EVIEWS) may well be exactly the right way to go.&lt;br /&gt;
&lt;br /&gt;
However, if you want to achieve non-standard things (and when working on a PhD or even an MSc you are bound to want to do non-standard things) the picture is very different. A menu-driven software package may do some related things, but often not exactly what you want. Even if the software package has a &amp;quot;button&amp;quot; to do what you want to do, it is often badly documented what exactly happens under the hood. Most importantly, many non-standard things just cannot be done in menu-driven software. This is where MATLAB (or other software packages like GAUSS or OX) comes to the rescue. At its core it has nothing to do with econometrics: it is a matrix-algebra maths programme in which you can do with data (stored in matrices) whatever you want. When we say whatever, this is to be understood almost literally. And that is the point.&lt;br /&gt;
&lt;br /&gt;
Therefore, there are powerful reasons why you may want to learn to use a proper programming language to solve your econometric problem:&lt;br /&gt;
&lt;br /&gt;
# You can easily repeat the same analysis with many more datasets.&lt;br /&gt;
# Some analyses involve very repetitive tasks. In such cases, writing a program eliminates the need to do essentially the same thing many times: you instruct the software to do the repetitive work for you.&lt;br /&gt;
# By writing a MATLAB program (or script) you also create a record of what you do. This is extremely useful when it comes to understanding previous work and identifying mistakes. This is possibly the most important advantage of writing your own programs.&lt;br /&gt;
# By having to instruct your computer exactly what to do, you properly learn the underlying econometrics. In other words, programming can be an excellent econometric learning tool.&lt;br /&gt;
&lt;br /&gt;
If you have done any programming in another language, such as Visual Basic, C, C++ or GAUSS, you will recognise a lot of common patterns. Learning how to use MATLAB to do econometrics is very much a trial-and-error process; this wiki will provide guidance, but you will have to practise yourself.&lt;br /&gt;
&lt;br /&gt;
== Authors, Maintenance and Contributions ==&lt;br /&gt;
&lt;br /&gt;
This wiki was created by [mailto:ralf.becker@manchester.ac.uk Ralf Becker] and [mailto:arthur.sinko@manchester.ac.uk Arthur Sinko] with the financial support of a University of Manchester Investing in Success grant. If you have any suggestions please contact us by email. Contributions to this wiki are encouraged. Please contact us for details.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Help for using WikiMedia ==&lt;br /&gt;
* [http://www.mediawiki.org/wiki/Manual:FAQ MediaWiki FAQ]&lt;br /&gt;
* [http://www.mediawiki.org/wiki/Manual:Configuration_settings MediaWiki configuration settings]&lt;/div&gt;</summary>
		<author><name>Admin</name></author>	</entry>

	</feed>