Example 2b
Revision as of 19:49, 6 October 2012
Task
This is a continuation of Example 2. In that exercise you were asked to download two share price series (Glaxo Smith Kline, GSK, and Apple, AAPL) for the following sample: 2 January 1987 to 30 December 2011. You then calculated a number of statistics for that dataset. The task tackled in this part is to repeat the calculation of these statistics for each year in the sample. Here is the full list of tasks again (Steps 1 to 4 were completed in Example 2):
1. Download the data and import it into MATLAB (only the date information and adjusted close prices are required)
2. Delete days for which you do not have observations for both stocks
3. Calculate the daily log and simple returns for both series
4. Calculate the following summary statistics for both stocks and for both types of returns for the full sample:
   - Mean, standard deviation, variance, skewness and kurtosis of returns
   - Number of positive and negative returns in the sample
   - Correlation between the AAPL and GSK returns
   - Average positive and negative returns in the sample
   - The sum of the autoregressive coefficients of an AR(5) model for each series
5. Calculate the same statistics separately for every year of data (first for 1987, then 1988 and so forth) and evaluate (by eyeballing) any significant changes through the years.
Implementation
It may be best to do this initially in a new MATLAB file. As we now have to repeat a number of calculations many times, it is useful to write a small function which accepts the return series as an input (we will only hand in the slice of returns we are interested in at the time) and which returns all the statistics we want to save. Then we will write a loop in which we select one year’s worth of data, hand these data to the summary statistics function and save the results for that slice of data in one row of a results matrix. Here is the scheme:
noy = % define as the number of years through which we loop
savemean = zeros(noy,2); % save the average return results here
savevar = zeros(noy,2); % save the variance results here
% ... similar matrices for all required statistics
for i = 1:noy
ret_i = % select the relevant year of returns
[me,va,OTHERSTATS] = SummaryStats2var(ret_i);
savemean(i,:) = me;
savevar(i,:) = va;
% ... continue with other statistics
end
Function for summary statistics
Here we will discuss what the function that calculates the summary statistics for our two return series should look like.
function [me,va,sd,sk,ku,cor,avgp,avgn,ar5] = SummaryStats2var(returns)
% input: (i) returns, (nx2) matrix of returns
% output: (i) me, (1x2) vector of average returns
% (ii) va, (1x2) vector of variances
% (iii) sd, (1x2) vector of standard deviations
% (iv) sk, (1x2) vector of skewness
% (v) ku, (1x2) vector of kurtosis
% (vi) cor, correlation coefficient
% (vii) avgp, (1x2) vector of average positive returns
% (viii) avgn, (1x2) vector of average negative returns
% (ix) ar5, (1x2) vector of sum of AR5 coefficients
%% Full sample statistics
[T,n] = size(returns);
me = sum(returns)/T; % this is (1xn)
ret_m = returns-repmat(me,T,1); % de-meaned returns (Txn)
va = sum(ret_m.^2)/T;
sd = sqrt(va);
sk = (sum(ret_m.^3)/T)./sd.^3; % need to use "./" as sd is (1xn)
ku = (sum(ret_m.^4)/T)./sd.^4; % and we want elementwise division
numcorr = sum(ret_m(:,1).*ret_m(:,2));
dencorr = sqrt(sum(ret_m(:,1).^2)*sum(ret_m(:,2).^2));
cor = numcorr/dencorr;
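As a quick sanity check on the formulas above, the following sketch recomputes the mean, variance, standard deviation and correlation for a small made-up return matrix and compares them with MATLAB's built-in functions. The data are invented purely for illustration. Skewness and kurtosis are left out because the corresponding built-ins require the Statistics Toolbox, which is not assumed here.

```matlab
% Small made-up return matrix (T = 5 observations, n = 2 series)
returns = [ 0.010 -0.020;
           -0.005  0.015;
            0.020  0.000;
           -0.015  0.010;
            0.005 -0.010];
[T,n] = size(returns);

% Same formulas as in SummaryStats2var
me      = sum(returns)/T;
ret_m   = returns - repmat(me,T,1);
va      = sum(ret_m.^2)/T;
sd      = sqrt(va);
numcorr = sum(ret_m(:,1).*ret_m(:,2));
dencorr = sqrt(sum(ret_m(:,1).^2)*sum(ret_m(:,2).^2));
cor     = numcorr/dencorr;

% Compare with built-ins; var(x,1) and std(x,1) divide by T rather than T-1
R   = corrcoef(returns);
chk = [max(abs(me - mean(returns))), ...
       max(abs(va - var(returns,1))), ...
       max(abs(sd - std(returns,1))), ...
       abs(cor - R(1,2))];
disp(chk)   % every entry should be at rounding-error level
```

Note that the function divides by T, so its variance differs from the default `var(x)`, which divides by T-1. For a full year of daily data the difference is small, but it is worth knowing which convention is in use.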
So far the statistics could be calculated for both return series in one go (one line). The remaining statistics (average positive and negative returns and the AR(5) coefficients) need to be calculated one by one. This is why previously we basically had to replicate the code for the average positive and negative returns for both series.
Here we will do this by writing a loop. In that way we will write the code only once, but use it several times (here twice). Before we start the loop we define matrices into which we save the results (avgp = zeros(1,n), avgn and ar5). Whenever we refer to the return matrix inside the loop we will have to refer to the ith column of the return matrix, returns(:,i). Therefore the SummaryStats2var function continues as follows:
%% positive and negative returns (full sample) and sum of AR(5) coefficients
% these stats are best calculated individually, i.e. for one series at a
% time
% hence this will be done in a loop
avgp = zeros(1,n); % save the average positive, negative avgs
avgn = zeros(1,n); % and AR5 coef sums here
ar5 = zeros(1,n);
for i = 1:n % loop through all series (here n = 2)
indpos = (returns(:,i) >= 0); % note: zero returns are counted as positive
indneg = ~indpos; % logical complement of indpos
Tplus = sum(indpos);
Tminus = sum(indneg);
avgp(1,i) = sum(returns(indpos,i))/Tplus;
avgn(1,i) = sum(returns(indneg,i))/Tminus;
% AR(5) coefficients
lags = 5;
yret = returns(lags+1:end,i);
xret = [ones(T-lags,1) returns(lags:end-1,i) returns(lags-1:end-2,i) ...
returns(lags-2:end-3,i) returns(lags-3:end-4,i) returns(lags-4:end-5,i)];
[bar5,~,~,~,~,~] = OLSest(yret,xret,0);
ar5(1,i) = sum(bar5(2:end)); % Exclude first coefficient which is the constant
end % end of i loop
end % end of function
This function will return all the requested summary statistics [me,va,sd,sk,ku,cor,avgp,avgn,ar5]. You can then use these variables in the main code. This function is best saved in its own m-file called SummaryStats2var.m, and then you can call it as seen below.
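The construction of the lagged regressor matrix xret in the function above is the part most prone to off-by-one errors, so it can help to look at it with a series whose values equal their own row index. The sketch below builds the matrix once exactly as in the function and once in a loop over the lag number k, and confirms that both agree. The loop version is only an illustrative alternative and is not part of the original code.

```matlab
% Series whose value equals its row number, so the alignment is easy to read
returns = (1:8)';
T    = size(returns,1);
lags = 5;

yret = returns(lags+1:end);              % dependent variable: rows 6 to 8

% Hand-written version, as in SummaryStats2var (here for a single column)
xret_hand = [ones(T-lags,1) returns(lags:end-1) returns(lags-1:end-2) ...
             returns(lags-2:end-3) returns(lags-3:end-4) returns(lags-4:end-5)];

% Loop version: column k+1 holds the series lagged by k periods
xret_loop = ones(T-lags,1);
for k = 1:lags
    xret_loop = [xret_loop returns(lags+1-k:end-k)]; %#ok<AGROW>
end

disp([yret xret_loop])   % first row: 6 | 1 5 4 3 2 1
```

Each row pairs an observation with a constant and its five predecessors, most recent first. If OLSest is not at hand, and assuming its first output is simply the vector of OLS coefficients, `xret\yret` gives the least-squares estimate for real return data (the toy series here is perfectly collinear, so it is only suitable for checking the alignment).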
Calling the Summary Statistics function through a loop
Here we will show how to write the loop that selects a year’s worth of data and then calls the SummaryStats2var function.
% ...
% up to here as in the first part of Exercise 2
gsk_r = [0;gsk_r]; % append 0 for r_1
app_r = [0;app_r];
% this is the first new bit
returns = [gsk_r app_r];
%% Prepare loop through years
dates = datevec(dates); % convert to date vector format ([year month day hour min sec])
year = dates(:,1); % picks out the column with the year info
yearlist = unique(year); % finds which years are in the data
noy = size(yearlist,1); % number of years
savemean = zeros(noy,2); % save the average return results here
savevar = zeros(noy,2); % save the variance results here
savesd = zeros(noy,2); % save the standard deviation results here
savesk = zeros(noy,2); % save the skewness results here
saveku = zeros(noy,2); % save the kurtosis results here
savecorr = zeros(noy,1); % save the correlation results here
saveavgp = zeros(noy,2); % save the average positive return results here
saveavgn = zeros(noy,2); % save the average negative return results here
savear5 = zeros(noy,2); % save the sum(AR5coef) results here
for i = 1:noy
year_i = yearlist(i); % pick the ith year
sel_i = (year==year_i); % create logical variable that can select the data for the ith year
ret_i = returns(sel_i,:); % picks out the returns for the ith year only
[mei,vai,sdi,ski,kui,cori,avgpi,avgni,ar5i] = SummaryStats2var(ret_i);
savemean(i,:) = mei; % save results in ith row
savevar(i,:) = vai;
savesd(i,:) = sdi;
savesk(i,:) = ski;
saveku(i,:) = kui;
savecorr(i) = cori;
saveavgp(i,:) = avgpi;
saveavgn(i,:) = avgni;
savear5(i,:) = ar5i;
end
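The year-selection step can be tried in isolation on a handful of made-up observations. The sketch below uses five invented dates spanning the turn of 2010/2011 and computes the number of observations and the mean return per year, with `mean` standing in for the call to SummaryStats2var. All data values and the variable names `yr`, `nobs` and `meanret` are assumptions made for this illustration.

```matlab
% Five made-up trading days around the turn of the year, as serial date numbers
dates   = datenum([2010 12 30; 2010 12 31; 2011 1 3; 2011 1 4; 2011 1 5]);
returns = [ 0.01 -0.02;
            0.03  0.00;
           -0.01  0.02;
            0.02  0.01;
            0.02  0.03];

dv       = datevec(dates);     % one row per date: [year month day hour min sec]
yr       = dv(:,1);            % year of each observation
yearlist = unique(yr);         % distinct years in the sample
noy      = numel(yearlist);

nobs    = zeros(noy,1);        % observations per year
meanret = zeros(noy,2);        % mean return per year and series
for i = 1:noy
    sel_i        = (yr == yearlist(i));   % logical selector for year i
    ret_i        = returns(sel_i,:);      % that year's returns only
    nobs(i)      = sum(sel_i);
    meanret(i,:) = mean(ret_i,1);         % stand-in for SummaryStats2var
end
disp([yearlist nobs meanret])
```

The logical vector `sel_i` has one entry per observation, so indexing with it keeps exactly the rows belonging to the chosen year, whatever their number.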
Now that we have all the results nicely saved we also want to look at them. Often the easiest way is to plot the results. As we have data from 1987 to 2011 we have 25 observations for each statistic, and it is easiest to plot these results as time series (one line each for GSK and AAPL for every statistic, except for the correlation, for which we only have one set of results).
This is not the place to give any details on how to plot data (see GraphingData), so here is just a very brief explanation of the code below. plot(series) plots a time series (line) graph of the data in matrix series, one line for each column. If you call plot(xlabels,series) the graph will use the values in xlabels on the x-axis. This vector should have the same number of rows as series. The command title('TITLE FOR GRAPH') adds a title to the plot and legend('SERIES1','SERIES2') adds a legend.
As we have 9 different statistics we want 9 such graphs, and here we decide to produce one big picture that contains 9 little graphs. This is what the subplot(3,3,j) command does: it tells MATLAB that there should be 3 rows and 3 columns of graphs and that the next plot goes into position j. The j counts from 1 to 9 to fill the nine graphs.
%% Plot results
yearxaxis = (yearlist(1):yearlist(end))';
subplot(3,3,1);
plot(yearxaxis,savemean);
title('Mean');
legend('GSK','AAPL');
subplot(3,3,2);
plot(yearxaxis,savevar);
title('Variance');
subplot(3,3,3);
plot(yearxaxis,savesd);
title('StDev');
subplot(3,3,4);
plot(yearxaxis,savesk);
title('Skewness');
subplot(3,3,5);
plot(yearxaxis,saveku);
title('Kurtosis');
subplot(3,3,6);
plot(yearxaxis,savecorr);
title('Correlation');
subplot(3,3,7);
plot(yearxaxis,saveavgp);
title('Avg(r+)');
subplot(3,3,8);
plot(yearxaxis,saveavgn);
title('Avg(r-)');
subplot(3,3,9);
plot(yearxaxis,savear5);
title('Sum(Ar5)');
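Since the nine plotting blocks differ only in the data and the title, they could also be written as a loop over j, which makes copy-and-paste slips less likely. The following is a minimal sketch under the assumption that the results are collected in a cell array. The cell arrays `results` and `titles` are hypothetical names, and the data are made-up stand-ins so that the snippet runs on its own.

```matlab
% Made-up stand-ins for the saved results (25 years, 2 series)
yearxaxis = (1987:2011)';
noy  = numel(yearxaxis);
base = [sin((1:noy)'/3) cos((1:noy)'/3)];

% In the real script this would be
% {savemean,savevar,savesd,savesk,saveku,savecorr,saveavgp,saveavgn,savear5}
results = {base, base.^2, abs(base), base, base, base(:,1), base, base, base};
titles  = {'Mean','Variance','StDev','Skewness','Kurtosis', ...
           'Correlation','Avg(r+)','Avg(r-)','Sum(AR5)'};

figure;
for j = 1:numel(results)
    subplot(3,3,j);                 % j-th panel of the 3x3 grid
    plot(yearxaxis,results{j});     % one line per column of results{j}
    title(titles{j});
    if j == 1
        legend('GSK','AAPL');       % legend only in the first panel
    end
end
```

The order of the entries in `results` must match the order in `titles`, which is the one thing to check when adding or removing a statistic.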
Summary
What you should have learned from this code is mainly how to select subsections of data (here years) and then apply the same operations (here summary statistics) to each subsection. This is an extremely common problem, and you should appreciate the value of functions in such a context.