Example 2b

From ECLR

Revision as of 18:59, 6 October 2012

Task

This is a continuation of Exercise 2. In that exercise you were asked to download two share price series (Glaxo Smith Kline, GSK, and Apple, AAPL) for the following sample: 2 January 1987 to 30 December 2011. You then calculated a number of statistics for that dataset. The task tackled in this part is to repeat the calculation of these statistics for each year in the sample. Here is the full list of tasks again (Steps 1 to 4 were completed in Exercise 2):

  1. Download the data and import into MATLAB (Date info and adjusted close prices only are required)
  2. Delete days for which you do not have observations for both stocks
  3. Calculate the daily log and simple returns for both series
  4. Calculate the following summary statistics for both stocks and for both types of returns for the full sample:
    1. Mean, standard deviation, variance, skewness and kurtosis of returns
    2. Number of positive and negative returns in the sample
    3. Correlation between the AAPL and GSK returns
    4. Average positive and negative returns in the sample
    5. The sum of the autoregressive coefficients of an AR(5) model for each series
  5. Calculate the same statistics separately for every year of data (first for 1987, then 1988 and so forth) and evaluate (by eyeballing) any significant changes through the years. This part of the exercise is discussed on Example2b.

Implementation

It may be best to do this initially in a new MATLAB file. As we now have to repeat a number of calculations many times, it is useful to write a small function which accepts the return series as an input (we will only hand in the slice of returns we are interested in at the time) and which returns all the statistics we are meant to save. Then we will write a loop in which we select one year’s worth of data, hand these data to the summary statistics function and save the results for that slice of data into a matrix. Here is the scheme:

noy = % define as the number of years through which we loop

savemean = zeros(noy,2);    % save the average return results here
savevar  = zeros(noy,2);    % save the variance results here
                            % ... similar matrices for all required statistics

for i = 1:noy

    ret_i = % select the relevant year of returns
    [me,va,OTHERSTATS] = SummaryStats2var(ret_i);
    savemean(i,:) = me;
    savevar(i,:)  = va;
    % ... continue with other statistics
    
end

Function for summary statistics

Here we will discuss what the function that calculates the summary statistics for our two return series should look like.

function [me,va,sd,sk,ku,cor,avgp,avgn,ar5] = SummaryStats2var(returns)
% input:    (i) returns, (nx2) matrix of returns
% output:   (i) me, (1x2) vector of average returns
%           (ii) va, (1x2) vector of variances
%           (iii) sd, (1x2) vector of standard deviations
%           (iv) sk, (1x2) vector of skewness
%           (v) ku, (1x2) vector of kurtosis
%           (vi) cor, correlation coefficient
%           (vii) avgp, (1x2) vector of average positive returns
%           (viii) avgn, (1x2) vector of average negative returns
%           (ix) ar5, (1x2) vector of sum of AR5 coefficients

%% Full sample statistics
[T,n]     = size(returns);
me    = sum(returns)/T;         % this is (1xn)
ret_m = returns-repmat(me,T,1); % de-meaned returns (Txn)
va    = sum(ret_m.^2)/T;
sd    = sqrt(va);
sk    = (sum(ret_m.^3)/T)./sd.^3;   % need to use "./" as sd is (1xn)
ku    = (sum(ret_m.^4)/T)./sd.^4;   % and we want elementwise division

numcorr = sum(ret_m(:,1).*ret_m(:,2));
dencorr = sqrt(sum(ret_m(:,1).^2)*sum(ret_m(:,2).^2));
cor  = numcorr/dencorr;
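As a quick sanity check, the hand-coded moments above can be compared with MATLAB's built-in functions. The sketch below uses a small made-up return matrix (not the GSK/AAPL data). Note that var(x,1) and std(x,1) normalise by T rather than T-1, which matches the formulas used in the function; the names ending in _chk are introduced here for illustration only.

```matlab
% Made-up (4x2) return matrix, for illustration only
returns = [0.01 -0.02; 0.03 0.01; -0.01 0.02; 0.02 -0.01];
[T,n]   = size(returns);

% Hand-coded statistics, as in SummaryStats2var
me    = sum(returns)/T;
ret_m = returns-repmat(me,T,1);
va    = sum(ret_m.^2)/T;
sd    = sqrt(va);
cor   = sum(ret_m(:,1).*ret_m(:,2)) / ...
        sqrt(sum(ret_m(:,1).^2)*sum(ret_m(:,2).^2));

% Built-in counterparts
me_chk  = mean(returns);
va_chk  = var(returns,1);      % second argument 1: divide by T, not T-1
sd_chk  = std(returns,1);      % same normalisation as var(returns,1)
C       = corrcoef(returns);   % (2x2) correlation matrix
cor_chk = C(1,2);

% Largest absolute discrepancy; should be at rounding-error level
maxdiff = max(abs([me-me_chk, va-va_chk, sd-sd_chk, cor-cor_chk]));
disp(maxdiff);
```

Skewness and kurtosis are left out of this check because the corresponding built-in functions require the Statistics Toolbox.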

So far the statistics could be calculated for both return series in one go (one line). The remaining statistics (average positive and negative returns and the AR(5) coefficients) need to be calculated one by one. This is why previously we basically had to replicate the code for the average positive and negative returns for both series.

Here we will do this by writing a loop. In that way we will write the code only once, but use it several times (here twice). Before we start the loop we define matrices into which we save the results (avgp = zeros(1,n), avgn and ar5). Whenever we refer to the return matrix inside the loop we will have to refer to the ith column of the return matrix, returns(:,i). Therefore the SummaryStats2var function continues as follows:

%% positive and negative returns (full sample) and sum of AR(5) coefficients
% these stats are best calculated individually, i.e. for one series at a
% time
% hence this will be done in a loop

avgp = zeros(1,n);    % save the average positive, negative avgs
avgn = zeros(1,n);    % and AR5 coef sums here
ar5  = zeros(1,n);

for i = 1:n     % loop through all series (here n = 2)
    indpos = (returns(:,i) >= 0);   % note: zero returns are counted as positive
    indneg = ~indpos;               % logical complement, i.e. strictly negative returns
    Tplus  = sum(indpos);
    Tminus = sum(indneg);
    avgp(1,i) = sum(returns(indpos,i))/Tplus;
    avgn(1,i) = sum(returns(indneg,i))/Tminus;

    % AR(5) coefficients
    lags = 5;
    yret = returns(lags+1:end,i);
    xret = [ones(T-lags,1) returns(lags:end-1,i) returns(lags-1:end-2,i) ...
    returns(lags-2:end-3,i) returns(lags-3:end-4,i) returns(lags-4:end-5,i)];

    [bar5,~,~,~,~,~] = OLSest(yret,xret,0);
    ar5(1,i) = sum(bar5(2:end));  % Exclude first coefficient which is the constant
end     % end of i loop

end     % end of function
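To see what the logical indexing and the lag construction inside the loop are doing, here is a minimal sketch on a short made-up return series. It makes two assumptions that are not part of the original code: the lag matrix is built in a small loop (so that it works for any number of lags), and MATLAB's backslash operator stands in for the course function OLSest, which is not reproduced here.

```matlab
% Made-up return series, for illustration only
r = [0.01; -0.02; 0.00; 0.03; -0.01; 0.02; -0.03; 0.01];
T = numel(r);

% Average positive and negative returns via logical indexing
indpos = (r >= 0);                    % as in the function, zero counts as positive
indneg = ~indpos;
avgp   = sum(r(indpos))/sum(indpos);  % mean of the non-negative returns
avgn   = sum(r(indneg))/sum(indneg);  % mean of the negative returns

% Regressor matrix for an AR(lags) model: constant, r_{t-1}, ..., r_{t-lags}
lags = 2;                             % 2 instead of 5 to keep the toy example small
y    = r(lags+1:end);                 % r_t for t = lags+1,...,T
X    = ones(T-lags,1);                % column of ones for the constant
for k = 1:lags
    X = [X r(lags+1-k:end-k)];        % append the kth lag, r_{t-k}
end

b     = X\y;                          % least squares fit (stand-in for OLSest)
sumar = sum(b(2:end));                % sum of AR coefficients, constant excluded
```

With lags = 5 the loop produces the same six columns as the xret matrix written out by hand in the function above.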

The SummaryStats2var function returns all the requested summary statistics, [me,va,sd,sk,ku,cor,avgp,avgn,ar5]. You can then use these variables in the main code. The function is best saved in its own m-file called SummaryStats2var.m, and then you can call it as seen below.

Calling the Summary Statistics function through a loop

Here we will show how to write the loop that selects one year’s worth of data and then calls the SummaryStats2var function.

% ...
% up to here as in the first part of Exercise 2
gsk_r = [0;gsk_r];          % prepend 0 for r_1
app_r = [0;app_r];

% this is the first new bit
returns = [gsk_r app_r];

%% Prepare loop through years

dates = datevec(dates);     % convert to date vector format
year = dates(:,1);          % picks out the column with the year info
yearlist = unique(year);    % finds which years are in the data

noy = size(yearlist,1);     % number of years

savemean = zeros(noy,2);    % save the average return results here
savevar  = zeros(noy,2);    % save the variance results here
savesd   = zeros(noy,2);    % save the standard deviation results here
savesk   = zeros(noy,2);    % save the skewness results here
saveku   = zeros(noy,2);    % save the kurtosis results here
savecorr = zeros(noy,1);    % save the correlation results here
saveavgp = zeros(noy,2);    % save the average positive return results here
saveavgn = zeros(noy,2);    % save the average negative return results here
savear5  = zeros(noy,2);    % save the sum(AR5coef) results here

for i = 1:noy

    year_i = yearlist(i);       % pick the ith year
    sel_i  = (year==year_i);    % create logical variable that can select the data for the ith year
    ret_i  = returns(sel_i,:);    % picks out the returns for the ith year only
    [mei,vai,sdi,ski,kui,cori,avgpi,avgni,ar5i] = SummaryStats2var(ret_i);

    savemean(i,:) = mei;    % save results in ith row
    savevar(i,:)  = vai;
    savesd(i,:)   = sdi;
    savesk(i,:)   = ski;
    saveku(i,:)   = kui;
    savecorr(i)   = cori;
    saveavgp(i,:) = avgpi;
    saveavgn(i,:) = avgni;
    savear5(i,:)  = ar5i;

end
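The year selection used in the loop can be tried out on a toy example. The sketch below assumes that the dates are stored as serial date numbers (the format of the dates variable from Exercise 2 is not shown here) and uses made-up returns. The year column is called yr rather than year, because some MATLAB versions also provide a function named year.

```matlab
% Four made-up trading days around a year end, as serial date numbers
dates   = [datenum(2010,12,30); datenum(2010,12,31); ...
           datenum(2011,1,3);   datenum(2011,1,4)];
returns = [0.01 0.02; -0.01 0.00; 0.02 -0.02; 0.00 0.01];   % illustrative values

dv       = datevec(dates);     % columns: year, month, day, hour, minute, second
yr       = dv(:,1);            % year of each observation
yearlist = unique(yr);         % sorted list of the years present in the data

sel      = (yr == 2011);       % logical vector, true for rows from 2011
ret_2011 = returns(sel,:);     % keeps only the rows where sel is true
```

Because sel is a logical vector, returns(sel,:) picks out exactly the rows belonging to the chosen year, regardless of how many trading days that year has.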

Now that we have all the results nicely saved we also want to look at them. Often the easiest way is to plot the results. As we have data from 1987 to 2011 we have 25 observations for each statistic, and it is easiest to show these in time series plots (one line each for GSK and AAPL for every statistic, except for the correlation, for which we only have one set of results).

This is not the place to give any details on how to plot data (see GraphingData). Just a very brief explanation of the code below: plot(series) plots a time series (line) graph of the data in matrix series, one line for each column. If you call plot(xlabels,series), the values in xlabels are used on the x-axis. This vector should have the same number of rows as series. The command title('TITLE FOR GRAPH') adds a title to the plot and legend('SERIES1','SERIES2') adds a legend.

As we have 9 different statistics we want 9 such graphs, and here we decide to produce one big picture that contains 9 small graphs. This is what the subplot(3,3,j) command does: it tells MATLAB that there should be 3 rows and 3 columns of graphs. The index j counts from 1 to 9 to fill the nine graphs.

%% Plot results
yearxaxis = (yearlist(1):yearlist(end))';

subplot(3,3,1);
plot(yearxaxis,savemean);
title('Mean');
legend('GSK','AAPL');

subplot(3,3,2);
plot(yearxaxis,savevar);
title('Variance');

subplot(3,3,3);
plot(yearxaxis,savesd);
title('StDev');

subplot(3,3,4);
plot(yearxaxis,savesk);
title('Skewness');

subplot(3,3,5);
plot(yearxaxis,saveku);
title('Kurtosis');

subplot(3,3,6);
plot(yearxaxis,savecorr);
title('Correlation');

subplot(3,3,7);
plot(yearxaxis,saveavgp);
title('Avg(r+)');

subplot(3,3,8);
plot(yearxaxis,saveavgn);
title('Avg(r-)');

subplot(3,3,9);
plot(yearxaxis,savear5);
title('Sum(Ar5)');
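Since the nine plotting blocks differ only in the data and the title, they could also be generated in a loop. The following is only a sketch of that alternative: the cell arrays stats and names are introduced here for illustration, and the data are random placeholder numbers, not actual results. In the real script one would instead set stats = {savemean, savevar, savesd, savesk, saveku, savecorr, saveavgp, saveavgn, savear5}.

```matlab
yearxaxis = (1987:2011)';          % sample years
noy       = numel(yearxaxis);      % 25 years

names = {'Mean','Variance','StDev','Skewness','Kurtosis', ...
         'Correlation','Avg(r+)','Avg(r-)','Sum(AR5)'};
ncols = [2 2 2 2 2 1 2 2 2];       % the correlation has one column only

% Placeholder data (random numbers, NOT real results)
stats = cell(1,9);
for j = 1:9
    stats{j} = randn(noy,ncols(j));
end

fig = figure('Visible','off');     % create the figure without displaying it
for j = 1:9
    subplot(3,3,j);                % jth panel of the 3-by-3 grid
    plot(yearxaxis,stats{j});
    title(names{j});
    if j == 1
        legend('GSK','AAPL');      % legend in the first panel only
    end
end
```

Keeping the data and the titles in two parallel lists makes it harder to plot one statistic under the title of another.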