Regression Examples

From ECLR



Exercises

Solutions are given in {curly brackets}. Worked solution clips are available for Q1, Q2, Q3 and Q4.

  1. Within an area of [math]4[/math] square miles, in a city centre, there are [math]10[/math] petrol stations. The following table gives the price charged at each petrol station (in pence per litre) for unleaded petrol, and the market share obtained:

    Petrol Station             1     2     3     4     5     6     7     8     9     10
    Price (pence per litre)    131   134   137   141   133   139   134   128   135   136
    Market Share               0.17  0.12  0.05  0.02  0.05  0.01  0.2   0.23  0.1   0.05

    1. Calculate the sample (or arithmetic) mean and the sample standard deviation of the price. {134.8; 3.765}

    2. Calculate the weighted mean, using the market share values as the set of weights. Explain why the weighted and unweighted means differ. {132.6}

    3. Calculate the (sample) mean and standard deviation of the price per gallon, taking 1 gallon as 4.546 litres. {612.801; 17.1157}
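
The three calculations in question 1 can be checked with a few lines of Python. This is only a sketch for verifying hand calculations, not part of the original exercise; the conversion factor of 4.546 litres per (imperial) gallon is an assumption.

```python
import statistics

# Data from the table in question 1.
prices = [131, 134, 137, 141, 133, 139, 134, 128, 135, 136]  # pence per litre
shares = [0.17, 0.12, 0.05, 0.02, 0.05, 0.01, 0.2, 0.23, 0.1, 0.05]

# Assumption: one imperial gallon is taken as 4.546 litres.
LITRES_PER_GALLON = 4.546

# Part 1: unweighted mean and sample standard deviation (divisor n - 1).
mean_price = statistics.mean(prices)
sd_price = statistics.stdev(prices)

# Part 2: weighted mean, sum(w * x) / sum(w), with market shares as weights.
weighted_mean = sum(w * p for w, p in zip(shares, prices)) / sum(shares)

# Part 3: multiplying every price by a constant c multiplies both the
# mean and the standard deviation by c.
mean_gallon = LITRES_PER_GALLON * mean_price
sd_gallon = LITRES_PER_GALLON * sd_price

print(f"mean = {mean_price:.3f}, sd = {sd_price:.3f}")
print(f"weighted mean = {weighted_mean:.3f}")
print(f"per gallon: mean = {mean_gallon:.3f}, sd = {sd_gallon:.3f}")
```

The weighted mean gives more influence to stations with larger market shares, which is why it need not equal the unweighted mean.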

  2. Consider, again, the data on [math]X=[/math] petrol prices and [math]Y=[/math] market share given in the previous question. You must find the answers to the following questions "by hand" so that you understand the calculations involved; however, you should also check your answers using EXCEL, MATLAB or any other software package.

    1. Show that the correlation coefficient between these two variables is -0.8366. Interpret this number; in particular, is the sign as you would expect?

    2. Use the data to fit the regression line, [math]\hat{y}=a+bx[/math]; i.e., show that regressing [math]y[/math] on [math]x[/math] yields a value of [math]a=2.4241[/math] and [math]b=-0.0172[/math]. Why would you expect the value of [math]b[/math] to be negative given part 1?

    3. Suppose that every price rises uniformly by [math]2[/math] pence. Assuming that the market shares stay the same, write down what a regression of market share on prices would now yield for values of [math]a[/math] and [math]b[/math]. Think, rather than calculate. {[math]a=2.4586[/math]; [math]b[/math] unchanged}
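
As a software check on the hand calculations in question 2, the following Python sketch computes the correlation coefficient and the least-squares coefficients from the sums of squares and cross-products about the means. The variable names are illustrative choices, not part of the exercise.

```python
import math

# Data from question 1: x = price (pence per litre), y = market share.
prices = [131, 134, 137, 141, 133, 139, 134, 128, 135, 136]
shares = [0.17, 0.12, 0.05, 0.02, 0.05, 0.01, 0.2, 0.23, 0.1, 0.05]

n = len(prices)
x_bar = sum(prices) / n
y_bar = sum(shares) / n

# Sums of squares and cross-products about the means.
s_xx = sum((x - x_bar) ** 2 for x in prices)
s_yy = sum((y - y_bar) ** 2 for y in shares)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(prices, shares))

r = s_xy / math.sqrt(s_xx * s_yy)  # correlation coefficient
b = s_xy / s_xx                    # slope of the regression of y on x
a = y_bar - b * x_bar              # intercept

# Part 3: adding a constant to every x leaves the deviations from the mean,
# and hence b, unchanged; only the intercept moves, to a - b * shift.
shift = 2
a_shifted = y_bar - b * (x_bar + shift)

print(f"r = {r:.4f}, a = {a:.4f}, b = {b:.4f}, shifted a = {a_shifted:.4f}")
```

Because [math]r[/math] and [math]b[/math] share the numerator `s_xy` and their denominators are positive, they always have the same sign.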

  3. Refer to the data given below in which you are given [math]11[/math] observations on a variable labeled [math]X[/math] and [math]11[/math] observations on each of three variables [math]Y[/math], [math]Z[/math] and [math]W[/math].

    observation    x     y       z       w
    1              10    8.04    9.14    7.46
    2              8     6.95    8.14    6.77
    3              13    7.58    8.74    12.74
    4              9     8.81    8.77    7.11
    5              11    8.33    9.26    7.81
    6              14    9.96    8.10    8.84
    7              6     7.24    6.13    6.08
    8              4     4.16    3.10    5.39
    9              12    10.84   9.13    8.15
    10             7     4.82    7.26    6.42
    11             5     5.68    4.74    5.73

    1. Use EXCEL to obtain three separate scatter diagrams of [math]y[/math] against [math]x[/math], [math]z[/math] against [math]x[/math] and [math]w[/math] against [math]x[/math].

    2. Show that the sample correlation coefficient between [math]y[/math] and [math]x[/math] is [math]0.82[/math] and that this is the same as the corresponding correlation between [math]z[/math] and [math]x[/math] and also [math]w[/math] and [math]x[/math].

    3. Using EXCEL, show that the three separate regressions of [math]y[/math] on [math]x[/math], [math]z[/math] on [math]x[/math] and [math]w[/math] on [math]x[/math] all yield a "line of best fit" or regression equation of the form: [math]3+0.5x[/math]; i.e., a line with intercept [math]3[/math] and slope [math]0.5[/math]. Use EXCEL to superimpose this regression line on each of the three scatter diagrams obtained in part 1.

    4. To what extent do you feel that correlation and regression analysis is useful for the various pairs of variables?
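
For readers who prefer to check parts 2 and 3 of question 3 outside EXCEL, here is a minimal Python sketch. The helper name `summarise` is hypothetical; it returns the correlation coefficient, intercept and slope for a simple regression, using the data exactly as tabulated above.

```python
import math

# Data from the table in question 3.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.16, 10.84, 4.82, 5.68]
z = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
w = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]


def summarise(xs, ys):
    """Return (r, intercept, slope) for a regression of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((u - x_bar) ** 2 for u in xs)
    s_yy = sum((v - y_bar) ** 2 for v in ys)
    s_xy = sum((u - x_bar) * (v - y_bar) for u, v in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    r = s_xy / math.sqrt(s_xx * s_yy)
    return r, intercept, slope


results = {name: summarise(x, series)
           for name, series in [("y", y), ("z", z), ("w", w)]}

for name, (r, a, b) in results.items():
    print(f"{name} on x: r = {r:.2f}, fitted line = {a:.2f} + {b:.2f}x")
```

These summaries say nothing about the shape of each scatter, so the diagrams from part 1 are still needed to answer part 4.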

  4. The file GDP_CO2.xlsx contains data on carbon dioxide (CO2) emissions and Gross Domestic Product (GDP) for 39 European countries for the year 2009.

    1. Using EXCEL, construct a scatter plot of carbon dioxide emissions against gross domestic product, construct the regression line (of CO2 on GDP) and calculate the correlation coefficient.

    2. Repeat the exercise, but this time using the natural logarithms of CO2 and GDP, ln(CO2) and ln(GDP).

    3. What do you think this tells us about the relationship between the two variables?
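
The comparison in question 4 can also be sketched in Python. The function names `fit_line` and `levels_and_logs` are hypothetical, and the column names in the commented loading step are assumptions, since the layout of GDP_CO2.xlsx is not described here.

```python
import math


def fit_line(xs, ys):
    """Return (r, intercept, slope) for a regression of ys on xs."""
    xs, ys = list(xs), list(ys)
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((u - x_bar) ** 2 for u in xs)
    s_yy = sum((v - y_bar) ** 2 for v in ys)
    s_xy = sum((u - x_bar) * (v - y_bar) for u, v in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    r = s_xy / math.sqrt(s_xx * s_yy)
    return r, intercept, slope


def levels_and_logs(gdp, co2):
    """Fit CO2 on GDP in levels and in natural logs (values must be positive)."""
    gdp, co2 = list(gdp), list(co2)
    log_gdp = [math.log(v) for v in gdp]
    log_co2 = [math.log(v) for v in co2]
    return {
        "levels": fit_line(gdp, co2),
        "logs": fit_line(log_gdp, log_co2),
    }


# Loading the spreadsheet is left to the reader, for example with pandas.
# The column names "GDP" and "CO2" below are hypothetical:
#   import pandas as pd
#   df = pd.read_excel("GDP_CO2.xlsx")
#   out = levels_and_logs(df["GDP"], df["CO2"])
#   print(out["levels"], out["logs"])
```

In the log specification the slope is interpreted as an elasticity: the approximate percentage change in CO2 associated with a one per cent change in GDP.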
