Regression Examples

From ECLR
Latest revision as of 16:45, 4 October 2013



Exercises

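Solutions are given in {curly brackets}. Worked solution clips can be found here: Q1 (http://youtu.be/Nq3Z7bqoUYM?hd=1), Q2 (http://youtu.be/zIIidXiLhPE?hd=1), Q3 (http://youtu.be/GVOO3jvSNNs?hd=1) and Q4 (http://youtu.be/d4YLj2fUbSI?hd=1).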

  1. Within an area of [math]4[/math] square miles, in a city centre, there are [math]10[/math] petrol stations. The following table gives the price charged at each petrol station (in pence per litre) for unleaded petrol, and the market share obtained:

    Petrol Station            1     2     3     4     5     6     7     8     9     10
    Price (pence per litre)   131   134   137   141   133   139   134   128   135   136
    Market Share              0.17  0.12  0.05  0.02  0.05  0.01  0.20  0.23  0.10  0.05

    1. Calculate the sample (or arithmetic) mean and the sample standard deviation of the price. {134.8, 3.765}

    2. Calculate the weighted mean, using the market share values as the set of weights. Explain why the weighted and unweighted means differ. {132.6}

    3. Calculate the (sample) mean and standard deviation of the price per gallon (1 gallon = 4.546 litres). {612.80; 17.1157}
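
The by-hand answers to Exercise 1 can be checked with a short script. The following is a minimal Python sketch, not part of the original exercise sheet. It assumes 1 imperial gallon = 4.546 litres (the conversion factor is not stated in the exercise) and uses the n-1 divisor for the sample standard deviation.

```python
import statistics

# Data from the table in Exercise 1
prices = [131, 134, 137, 141, 133, 139, 134, 128, 135, 136]  # pence per litre
shares = [0.17, 0.12, 0.05, 0.02, 0.05, 0.01, 0.20, 0.23, 0.10, 0.05]

# Assumption: litres per imperial gallon (not given in the exercise)
LITRES_PER_GALLON = 4.546

# Unweighted sample mean and sample standard deviation (n-1 divisor)
mean_price = statistics.mean(prices)
sd_price = statistics.stdev(prices)

# Weighted mean: each price counts in proportion to its market share
weighted_mean = sum(p * w for p, w in zip(prices, shares)) / sum(shares)

# Rescaling by a constant c multiplies both the mean and the
# standard deviation by c, so no recalculation from scratch is needed
mean_per_gallon = LITRES_PER_GALLON * mean_price
sd_per_gallon = LITRES_PER_GALLON * sd_price

print(f"mean = {mean_price:.1f}, sd = {sd_price:.3f}")
print(f"weighted mean = {weighted_mean:.1f}")
print(f"per gallon: mean = {mean_per_gallon:.2f}, sd = {sd_per_gallon:.2f}")
```

The weighted mean falls below the unweighted mean because the stations with the largest market shares are the ones charging the lowest prices.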

  2. Consider, again, the data on [math]X=[/math] petrol prices and [math]Y=[/math] market share given in the previous question. You must find the answers to the following questions "by hand" so that you understand the calculations involved; however, you should also check your answers using EXCEL, MATLAB or any other software package.

    1. Show that the correlation coefficient between these two variables is -0.8366. Interpret this number; in particular, is the sign as you would expect?

    2. Use the data to fit the regression line, [math]\hat{y}=a+bx[/math]; i.e., show that regressing [math]y[/math] on [math]x[/math] yields a value of [math]a=2.4241[/math] and [math]b=-0.0172[/math]. Why would you expect the value of [math]b[/math] to be negative given the result in part 1?

    3. Suppose that every price rises uniformly by [math]2[/math] pence. Assuming that the market shares stay the same, write down what a regression of market share on prices would now yield for values of [math]a[/math] and [math]b[/math]. Think, rather than calculate. {2.4586}
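
As a check on the hand calculations in Exercise 2, the sums of squares and cross-products can be computed directly. This is a minimal Python sketch under the usual least-squares formulas; the helper name `fit_line` is hypothetical and introduced only for this illustration.

```python
import math

prices = [131, 134, 137, 141, 133, 139, 134, 128, 135, 136]
shares = [0.17, 0.12, 0.05, 0.02, 0.05, 0.01, 0.20, 0.23, 0.10, 0.05]


def fit_line(x, y):
    """Return (intercept a, slope b, correlation r) for y-hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx              # slope
    a = y_bar - b * x_bar        # intercept
    r = s_xy / math.sqrt(s_xx * s_yy)
    return a, b, r


a, b, r = fit_line(prices, shares)
print(f"a = {a:.4f}, b = {b:.4f}, r = {r:.4f}")

# Adding 2 pence to every price leaves all deviations from the mean
# unchanged, so b and r stay the same; only the intercept moves.
shifted = [p + 2 for p in prices]
a2, b2, r2 = fit_line(shifted, shares)
print(f"after +2p: a = {a2:.4f}, b = {b2:.4f}, r = {r2:.4f}")
```

Because the slope depends only on deviations from the means, the new intercept equals the old one minus 2b, which is the "think, rather than calculate" route to the answer.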

  3. Refer to the data given below in which you are given [math]11[/math] observations on a variable labeled [math]X[/math] and [math]11[/math] observations on each of three variables [math]Y[/math], [math]Z[/math] and [math]W[/math].

    observation    x      y      z      w
    1              10     8.04   9.14   7.46
    2              8      6.95   8.14   6.77
    3              13     7.58   8.74   12.74
    4              9      8.81   8.77   7.11
    5              11     8.33   9.26   7.81
    6              14     9.96   8.10   8.84
    7              6      7.24   6.13   6.08
    8              4      4.16   3.10   5.39
    9              12     10.84  9.13   8.15
    10             7      4.82   7.26   6.42
    11             5      5.68   4.74   5.73

    1. Use EXCEL to obtain three separate scatter diagrams of [math]y[/math] against [math]x[/math], [math]z[/math] against [math]x[/math] and [math]w[/math] against [math]x[/math].

    2. Show that the sample correlation coefficient between [math]y[/math] and [math]x[/math] is [math]0.82[/math] and that this is the same as the corresponding correlation between [math]z[/math] and [math]x[/math] and also [math]w[/math] and [math]x[/math].

    3. Using EXCEL, show that the three separate regressions of [math]y[/math] on [math]x[/math], [math]z[/math] on [math]x[/math] and [math]w[/math] on [math]x[/math] all yield a "line of best fit" or regression equation of the form: [math]3+0.5x[/math]; i.e., a line with intercept [math]3[/math] and slope [math]0.5[/math]. Use EXCEL to superimpose this regression line on each of the three scatter diagrams obtained in part 1.

    4. To what extent do you feel that correlation and regression analysis is useful for the various pairs of variables?
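
The three regressions in Exercise 3 can also be compared side by side outside EXCEL. The following Python sketch is illustrative only; the helper `fit_line` is a hypothetical name, and the values are typed in from the table above.

```python
import math

x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.16, 10.84, 4.82, 5.68]
z = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
w = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]


def fit_line(xs, ys):
    """Return (intercept, slope, correlation) of the regression of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in xs)
    s_yy = sum((yi - y_bar) ** 2 for yi in ys)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope, s_xy / math.sqrt(s_xx * s_yy)


results = {name: fit_line(x, data) for name, data in (("y", y), ("z", z), ("w", w))}
for name, (a, b, r) in results.items():
    print(f"{name} on x: intercept = {a:.2f}, slope = {b:.2f}, r = {r:.2f}")
```

The summary statistics come out nearly identical for all three pairs, which is why the scatter diagrams are needed: the numbers alone cannot distinguish a roughly linear relationship from a curved one or from a line distorted by a single outlying point.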

  4. In this file GDP_CO2.xlsx you can find data for carbon dioxide (CO2) emissions and the Gross Domestic Product (GDP) for 39 European countries for the year 2009.

    1. Using EXCEL, construct a scatter plot of carbon dioxide emissions against gross domestic product, construct the regression line (of CO2 on GDP) and calculate the correlation coefficient.

    2. Repeat the exercise, but this time using the natural logarithm of CO2, ln(CO2), and ln(GDP).

    3. What do you think this tells us about the relationship between the two variables?
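
One way to think about the log-log regression in Exercise 4 is through a power-law relationship: if CO2 = c * GDP^k, then ln(CO2) = ln(c) + k * ln(GDP), so the slope of the log-log regression is k, which can be read as an elasticity (the approximate percentage change in CO2 associated with a 1% change in GDP). The Python sketch below demonstrates this with synthetic numbers generated from an assumed power law. They are NOT taken from GDP_CO2.xlsx, and the values of c and k are arbitrary choices for illustration.

```python
import math


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares regression of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in xs)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Synthetic illustration only (not the data in GDP_CO2.xlsx):
# an exact power law with assumed constants c = 2.0 and k = 0.8
gdp = [10.0, 20.0, 40.0, 80.0, 160.0]
co2 = [2.0 * g ** 0.8 for g in gdp]

# Regress ln(CO2) on ln(GDP)
log_intercept, log_slope = fit_line(
    [math.log(g) for g in gdp],
    [math.log(c) for c in co2],
)
print(f"slope = {log_slope:.3f}, exp(intercept) = {math.exp(log_intercept):.3f}")
```

With real data the points will not lie exactly on a line, but the slope of the fitted log-log line keeps the same elasticity interpretation.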
