Example 2
Revision as of 13:01, 29 September 2012
Task
In this exercise you will have to download some share prices and then use these data to calculate summary statistics for every year in the sample. We will then compare these statistics and see how they change through time.
The data you should download are the share prices of two companies, GlaxoSmithKline (GSK) and Apple (AAPL). You can get these data from http://finance.yahoo.com/. Enter the ticker symbols into the search box, press Enter and then follow the historical prices link. You should download daily data and then use the "Adjusted Close Prices". The sample period we use is from 2 January 1987 to 30 December 2011.
These are your tasks:
- Download the data and import them into MATLAB (only the dates and adjusted close prices are required)
- Delete days for which you do not have observations for both stocks
- Calculate the daily log and simple returns for both series
- Calculate the following summary statistics for both stocks and for both types of returns for the full sample:
  - Mean, standard deviation, variance, skewness and kurtosis of returns
  - Number of positive and negative returns in the sample
  - Average positive and negative returns in the sample
  - Correlation (between the AAPL and GSK returns and between the AAPL and GSK prices)
  - The sum of the autoregressive coefficients of an AR(5) model for each series
- Calculate the same statistics separately for every year of data (first for 1987, then 1988 and so forth) and evaluate (by eyeballing) any significant changes through the years.
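The yearly split in the last task can be done by extracting the calendar year from each serial date number and looping over the distinct years. The following is a minimal sketch on a small made-up series, not the downloaded data; the variables d, r, yr, yrs, yearlyMean and yearlyStd are hypothetical names chosen for illustration, and only the mean and standard deviation are shown.

```matlab
% Hypothetical toy data: serial date numbers and one return per date
d = datenum([1987 12 30; 1987 12 31; 1988 1 4; 1988 1 5; 1989 1 3]);
r = [0.01; -0.02; 0.03; 0.01; -0.01];

dv  = datevec(d);      % rows of [year month day hour minute second]
yr  = dv(:,1);         % calendar year of each observation
yrs = unique(yr);      % distinct years, sorted ascending

yearlyMean = zeros(numel(yrs),1);
yearlyStd  = zeros(numel(yrs),1);
for k = 1:numel(yrs)
    rk = r(yr == yrs(k));            % returns that fall in year yrs(k)
    yearlyMean(k) = mean(rk);
    yearlyStd(k)  = std(rk,1);       % normalised by the number of observations
end
disp([yrs yearlyMean yearlyStd])
```

The same loop body can hold any of the other statistics, computed on rk instead of the full sample.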
Implementation
Import data in MATLAB (keyword: data import)
MATLAB has a very wide range of importing procedures (see LoadingData). The most straightforward and user-friendly is the MATLAB Import Wizard, although precisely how it works changes from one version to another. Here we use the xlsread command, as it works fairly consistently across versions. When you download data from Yahoo you are likely to obtain a csv file. Unfortunately, the csvread command reads numeric data only and cannot import the dates, whereas xlsread handles them easily. You can either convert your csv file to an xlsx file in Excel or use xlsread to import the csv file directly. The latter is what we do here.
%% Import Data
% this imports the csv files obtained from yahoo
% Note that the files list the most recent dates first
% adjusted close data are in the 6th column
[gsk_data, gsk_txt, gsk_raw] = xlsread('GSK.csv');
[app_data, app_txt, app_raw] = xlsread('AAPL.csv');
gsk_p = flipud(gsk_data(:,6)); % Extract the adjusted close price which is in the 6th data col
app_p = flipud(app_data(:,6)); % and flip upside down to get the right date order
dates_gsk = datenum(gsk_txt(2:end,1),'dd/mm/yyyy');
dates_app = datenum(app_txt(2:end,1),'dd/mm/yyyy');
dates_gsk = flipud(dates_gsk);
dates_app = flipud(dates_app);
The adjusted close data are in the 6th column of gsk_data and app_data respectively. The date information is in the first column of the *_txt files (the 1st row is excluded as it contains the headers). The flipud commands reverse the data order, as Yahoo routinely returns files with the latest data at the top.
At this stage we have to keep two separate date vectors, as there is no guarantee that both series contain data for exactly the same dates. This is what needs to be checked next.
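If xlsread cannot read the csv file on your system (without Excel it may be limited to spreadsheet formats), the base-MATLAB function textscan is one possible alternative; it is not the approach used in this exercise. The sketch below writes a tiny csv file in the layout assumed here for a Yahoo download (Date, Open, High, Low, Close, Volume, Adj Close, dates as yyyy-mm-dd, newest first) and reads it back. The file contents, the date format and the variable names are illustrative assumptions; check them against your own file.

```matlab
% Write a small csv file in an assumed Yahoo layout (newest date first)
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'Date,Open,High,Low,Close,Volume,Adj Close\n');
fprintf(fid, '1987-01-05,10.0,10.5,9.8,10.2,1000,5.10\n');
fprintf(fid, '1987-01-02,9.9,10.1,9.7,10.0,1200,5.00\n');
fclose(fid);

% Read one text column followed by six numeric columns, skipping the header
fid = fopen(fname, 'r');
C = textscan(fid, '%s %f %f %f %f %f %f', 'Delimiter', ',', 'HeaderLines', 1);
fclose(fid);
delete(fname);

dates = flipud(datenum(C{1}, 'yyyy-mm-dd'));  % serial dates, oldest first
adjp  = flipud(C{7});                         % adjusted close, oldest first
disp([dates adjp])
```

Here the adjusted close is the 7th textscan column because the date column is counted too, unlike the numeric output of xlsread.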
Synchronize data
%% Delete days that are not available for both stocks
% check if there are dates that are nonsynchronous
[temp,i_app,i_gsk] = intersect(dates_app, dates_gsk);
gsk_p = gsk_p(i_gsk);
app_p = app_p(i_app);
dates = temp;
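To see what the three outputs of intersect do, here is a minimal sketch with two hypothetical date vectors that each miss one day. intersect returns the common values sorted in ascending order and without repeats, together with index vectors into each input, so indexing each price series with its own index vector aligns both on the shared dates. All numbers are made up for illustration.

```matlab
% Two hypothetical date vectors with different gaps (serial date numbers)
dates_a = [1; 2; 3; 5];   % stock A has no observation on day 4
dates_b = [2; 3; 4; 5];   % stock B has no observation on day 1
p_a = [10; 11; 12; 13];
p_b = [20; 21; 22; 23];

% common holds the shared dates; ia and ib index into dates_a and dates_b
[common, ia, ib] = intersect(dates_a, dates_b);
p_a_sync = p_a(ia);       % prices of A on the shared dates
p_b_sync = p_b(ib);       % prices of B on the same dates
disp([common(:) p_a_sync(:) p_b_sync(:)])
```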
Construction of Log-returns
Log-returns are defined as [math]r_t=\ln(p_t)-\ln(p_{t-1})[/math] and simple returns as [math]R_t=\frac{p_t}{p_{t-1}}-1[/math] (multiply by 100 to express them in percent). The first return [math]r_1[/math] is not defined, since [math]p_0[/math] is not known. In MATLAB, returns can be constructed in several ways. The long way uses a loop over the observations:
p = [gsk_p app_p];          % synchronized prices, one column per stock
[T, n] = size(p);
lnrets = zeros(T,n);
simplerets = zeros(T,n);
for i = 2:T
    % This way r(1)=0 by construction
    lnrets(i,:) = log(p(i,:)) - log(p(i-1,:));
    simplerets(i,:) = p(i,:)./p(i-1,:) - 1;
end
The same result can be achieved in a shorter way using the fact that MATLAB can extract submatrices from a matrix: p(2:end,:) selects all rows except the first, and p(1:end-1,:) selects all rows except the last.
lnrets=zeros(T,n);
simplerets=zeros(T,n);
%This way r(1)=0 by construction
lnrets(2:end,:)=log(p(2:end,:))-log(p(1:end-1,:));
simplerets(2:end,:)=p(2:end,:)./p(1:end-1,:)-1;
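As a further shortcut, MATLAB's built-in diff function computes first differences, so diff(log(p)) returns the T-1 log returns directly, without the artificial zero in the first row. The minimal sketch below, on a made-up price vector, checks that this agrees with the indexing approach and verifies the identity [math]r_t=\ln(1+R_t)[/math] that links log and simple returns.

```matlab
p = [100; 102; 99; 105];          % hypothetical prices, oldest first
T = size(p,1);

% Indexing approach, with r(1)=0 by construction
lnrets = zeros(T,1);
simplerets = zeros(T,1);
lnrets(2:end,:) = log(p(2:end,:)) - log(p(1:end-1,:));
simplerets(2:end,:) = p(2:end,:)./p(1:end-1,:) - 1;

% diff-based approach: T-1 returns, no leading zero
r = diff(log(p));
R = diff(p)./p(1:end-1);

disp(max(abs(r - lnrets(2:end))))   % zero up to rounding error
disp(max(abs(r - log(1 + R))))      % log return = log(1 + simple return)
```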
Constructing Sample moments
Formulas for mean [math]\bar r[/math], variance [math]\hat \sigma_r^2[/math], standard deviation [math]\hat \sigma_r[/math], skewness [math]\hat S_r[/math], and kurtosis [math]\hat K_r[/math]
[math]\begin{aligned} \bar r &= \frac{1}{T}\sum_{t=1}^T r_t\\ \hat \sigma_r^2 &= \frac{1}{T}\sum_{t=1}^T (r_t-\bar r)^2\\ \hat \sigma_r &= \sqrt{\hat \sigma_r^2}\\ \hat S_r &= \frac{1}{T}\sum_{t=1}^T (r_t-\bar r)^3/\hat \sigma_r^3\\ \hat K_r &= \frac{1}{T}\sum_{t=1}^T (r_t-\bar r)^4/\hat \sigma_r^4\end{aligned}[/math]
can be implemented directly using the following code:
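% lnrets has one column per stock; ./ and .^ act element-wise on the row vectors
MeanLnRet = sum(lnrets)/T;
VarLnRet = sum((lnrets-MeanLnRet).^2)/T;   % lnrets-MeanLnRet relies on implicit expansion (R2016b or later)
StdLnRet = sqrt(VarLnRet);
SkewLnRet = (sum((lnrets-MeanLnRet).^3)/T)./StdLnRet.^3;
KurtLnRet = (sum((lnrets-MeanLnRet).^4)/T)./StdLnRet.^4;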
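A quick way to check an implementation of these formulas is to run it on a tiny sample whose moments can be worked out by hand. The sketch below uses the made-up sample 1, 2, 3, 4: its mean is 2.5, its variance (dividing by T, not T-1) is 5/4 = 1.25, its skewness is 0 because it is symmetric, and its kurtosis is 2.5625/1.5625 = 1.64. Note that this kurtosis definition gives 3 for a normal distribution (it is not excess kurtosis). The built-in var(r,1) and std(r,1) also divide by T, so they should agree with the hand-coded versions.

```matlab
r = [1; 2; 3; 4];                          % toy sample with easy-to-check moments
T = size(r,1);

MeanR = sum(r)/T;                          % 2.5
VarR  = sum((r - MeanR).^2)/T;             % 1.25
StdR  = sqrt(VarR);
SkewR = (sum((r - MeanR).^3)/T)/StdR.^3;   % 0 for a symmetric sample
KurtR = (sum((r - MeanR).^4)/T)/StdR.^4;   % 1.64

disp([VarR var(r,1); StdR std(r,1)])       % columns should match
```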
Positive and negative returns (keywords: loops, if-then-else statements, logical operations, vectorization)
The numbers ([math]T^+,T^-[/math]) and sample means ([math]r^+,r^-[/math]) of non-negative and negative returns are computed as [math]T^+=\sum_{t=1}^T I(r_t\ge0)[/math], [math]T^-=\sum_{t=1}^T I(r_t\lt 0)[/math], [math]r^+=\frac{\sum_{t=1}^T r_t I(r_t\ge0)}{T^+}[/math] and [math]r^-=\frac{\sum_{t=1}^T r_t I(r_t\lt 0)}{T^-}[/math], where the indicator function satisfies [math]I(True)=1[/math] and [math]I(False)=0[/math].
A longer way to compute these quantities is:
1. Initialize the variables Tplus=0, Tminus=0, retplus=0 and retminus=0.
2. Check whether the [math]i[/math]th observation of returns, lnrets(i), is greater than or equal to 0, starting with [math]i=1[/math].
3. If (2) is true, set Tplus=Tplus+1 and retplus=retplus+lnrets(i); otherwise set Tminus=Tminus+1 and retminus=retminus+lnrets(i).
4. Repeat steps 2 and 3 for [math]i=2,3,...,T[/math], where [math]T[/math] is the sample size.
5. Divide retplus by Tplus and retminus by Tminus to obtain the average returns.
% lnrets is assumed to hold a single return series here, e.g. lnrets(:,1)
Tplus=0; Tminus=0; retplus=0; retminus=0;
for i=1:T %starts the loop
    if lnrets(i)>=0
        Tplus=Tplus+1; %counting non-negative returns
        retplus=retplus+lnrets(i); %summation of non-negative returns
    else
        Tminus=Tminus+1; %counting negative returns
        retminus=retminus+lnrets(i); %summation of negative returns
    end
end %ends the loop
retplus=retplus/Tplus; %computing average non-negative return
retminus=retminus/Tminus; %computing average negative return
For a shorter way, keep the following in mind:
- Logical relationships also work element-wise on vectors: indpos=(lnrets>=0) generates a logical vector of 0s (where lnrets(i)<0) and 1s (where lnrets(i)>=0).
- Logical vectors can be used to select subsamples from a sample: lnrets(indpos) returns the subvector of non-negative returns and lnrets(~indpos) the subvector of negative returns. Writing 1-indpos instead of ~indpos does not work, because it produces numeric 0s and 1s, and 0 is not a valid index.
indpos=lnrets>=0; %logical vector, true for non-negative returns
indneg=~indpos; %logical vector, true for negative returns
Tplus=sum(indpos);
Tminus=sum(indneg);
retplus=sum(lnrets(indpos))/Tplus;
retminus=sum(lnrets(indneg))/Tminus;
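The loop and the logical-indexing versions should give identical results. The sketch below runs both on a short made-up return vector; following the indicator definition [math]I(r_t\ge0)[/math], the zero return is counted as non-negative, so there are three non-negative and two negative returns. Using mean on the selected subvector is an equivalent shorthand for dividing the sum by the count.

```matlab
lnrets = [0.02; -0.01; 0; -0.03; 0.05];   % hypothetical log returns
T = numel(lnrets);

% Loop version
Tplus = 0; Tminus = 0; retplus = 0; retminus = 0;
for i = 1:T
    if lnrets(i) >= 0
        Tplus = Tplus + 1;
        retplus = retplus + lnrets(i);
    else
        Tminus = Tminus + 1;
        retminus = retminus + lnrets(i);
    end
end
retplus = retplus/Tplus;
retminus = retminus/Tminus;

% Vectorized version with logical indexing
indpos = lnrets >= 0;
indneg = ~indpos;
vplus  = mean(lnrets(indpos));
vminus = mean(lnrets(indneg));

disp([Tplus Tminus; retplus retminus; vplus vminus])
```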