Difference between revisions of "Example 2"

From ECLR

Revision as of 19:12, 28 September 2012

Task

In this exercise you will have to download some share prices and then use these data to calculate summary statistics for every year in the sample. We will then compare these statistics and see how they change through time.

The data you should download are the share prices of two companies, GlaxoSmithKline (GSK) and Apple (AAPL). You can get these data from http://finance.yahoo.com/. Enter the ticker symbols into the search box and, after pressing Enter, go to the historical prices link. You should download daily data and then use the "Adjusted Close Prices". The sample period we use is from 2 January 1987 to 30 December 2011.

These are your tasks:

  1. Download the data and import into MATLAB (Date info and adjusted close prices only are required)
  2. Delete days for which you do not have observations for both stocks
  3. Calculate the daily log and simple returns for both series
  4. Calculate the following summary statistics for both stocks and for both types of returns for the full sample:
    1. Mean, standard deviation, variance, skewness and kurtosis of returns
    2. Number of positive and negative returns in the sample
    3. Average positive and negative returns in the sample
    4. correlation (between the AAPL and GSK returns and between AAPL and GSK prices)
    5. the sum of the autoregressive coefficients of an AR(5) model for each series
  5. Calculate the same statistics separately for every year of data (first for 1987, then 1988 and so forth) and evaluate (by eyeballing) any significant changes through the years.
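
Items 4.4 and 4.5 are not worked through in the implementation below, so here is a minimal sketch of one possible approach. It assumes the AR(5) model is estimated by ordinary least squares with an intercept; the return vectors rA and rB are made-up placeholder numbers, not real GSK or AAPL data, and all variable names are hypothetical.

```matlab
% Hypothetical return series for two stocks (placeholder numbers, not real data)
rA = [0.010; -0.020; 0.015; 0.003; -0.007; 0.012; -0.004; 0.009; -0.011; 0.006; 0.002; -0.008];
rB = [0.004; -0.010; 0.020; -0.002; -0.001; 0.008; 0.001; 0.005; -0.015; 0.007; -0.003; -0.002];

% Correlation: corrcoef returns a 2x2 matrix, the off-diagonal entry is the coefficient
Rmat  = corrcoef(rA, rB);
rhoAB = Rmat(1,2);

% AR(5) by ordinary least squares (assumed estimation method):
% r_t = c + a_1 r_{t-1} + ... + a_5 r_{t-5} + e_t
nlags = 5;
y = rA(nlags+1:end);              % left-hand side: r_6, ..., r_T
X = ones(length(rA)-nlags, 1);    % intercept column
for k = 1:nlags
    X = [X, rA(nlags+1-k:end-k)]; % k-th lag of the series
end
b     = X \ y;                    % OLS estimates: [c; a_1; ...; a_5]
sumAR = sum(b(2:end));            % sum of the five autoregressive coefficients
```

With real data the same lines would be applied to each return series in turn, and corrcoef would also be applied to the two price series.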

Generic Algorithm

  1. Import data in MATLAB (data are downloaded from Yahoo finance, finance.yahoo.com, adjusted daily close prices).
  2. Check data and ensure that you have a matching sample (i.e. delete days on which one of the data-sets does not provide data)
  3. Construct vectors of log-returns and simple returns
  4. Compute sample moments
  5. Construct positive and negative subsamples
  6. Repeat the above for all years from 1987 to 2011.
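
Step 2 can be sketched as follows. This is only an illustration under stated assumptions: the dates are taken to be stored as serial date numbers, and datesA, pA, datesB and pB are hypothetical names filled with made-up values.

```matlab
% Hypothetical dates (serial date numbers) and prices for two stocks
datesA = [733000; 733001; 733003; 733004];
pA     = [10; 11; 12; 13];
datesB = [733001; 733002; 733003; 733004];
pB     = [20; 21; 22; 23];

% intersect returns the common dates and their positions in each input
[dates, ia, ib] = intersect(datesA, datesB);

% Keep only days on which both stocks have an observation
p = [pA(ia), pB(ib)];   % one column per stock, rows aligned by date
```

Days that appear in only one of the two series (733000 and 733002 here) are dropped, so both columns of p refer to the same dates.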

Implementation

Import data in MATLAB (keyword: data import)

MATLAB has a very wide range of importing procedures (see LoadingData). The most straightforward and user-friendly is the MATLAB import wizard, which opens via the File/Import Data menu. The next step is to select the data file of interest. The import wizard is quite intuitive. It works for a variety of standard file formats and can generate MATLAB code to import similar files in the future (check box, bottom right corner). The import wizard works well for well-structured files. For data files with a more complicated structure the textscan function can be used. After importing MSFT.txt there are two objects in the workspace: a matrix object data and a cell object textdata. MATLAB attempts to import all data columns as numerical data. Columns for which this fails are automatically placed in the cell array textdata. As a result, all dates are stored as text in the cell array textdata and all numbers (prices) are stored in the matrix data.
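
Since the dates arrive as text, they usually have to be converted before they can be sorted or grouped by year. The sketch below assumes that textdata has a header in its first row and dates in the format yyyy-mm-dd; both are assumptions about the file, and the values shown are made up.

```matlab
% Hypothetical result of an import: header plus three dates, newest first
textdata = {'Date'; '2011-12-30'; '2011-12-29'; '2010-01-04'};
data     = [26.0; 25.9; 30.1];

% Convert the date strings (skipping the header) to serial date numbers
dates = datenum(textdata(2:end,1), 'yyyy-mm-dd');

% Sort into chronological order in case the file is stored newest first
[dates, order] = sort(dates);
p = data(order,:);

% Extract the calendar year of every observation
dv  = datevec(dates);
yrs = dv(:,1);
```

The vector yrs can later be used to select the observations belonging to a single year.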

Construction of Log-returns

Log-returns are defined as [math]r_t=\ln(p_t)-\ln(p_{t-1})[/math]. Simple returns, expressed in percent, are defined as [math]R_t=\left(\frac{p_t}{p_{t-1}}-1\right)\times 100\%[/math]. The first return [math]r_1[/math] is not defined, since [math]p_0[/math] is not known. In MATLAB, returns can be constructed in several ways. The long way to do this is:

p=data;
[T, n]=size(p);
MSFTlogrets=zeros(T,n);
MSFTsimplerets=zeros(T,n);
for i=2:T
    %This way r(1)=0 by construction
    MSFTlogrets(i,1)=log(p(i,1))-log(p(i-1,1));
    MSFTsimplerets(i,1)=(p(i,1)/p(i-1,1)-1)*100; %simple return in percent
end

The same result can be achieved in a shorter way using the fact that MATLAB can extract submatrices from a matrix: p(2:end,:) selects all rows of a matrix but the first, and p(1:end-1,:) selects all rows but the last.

lnrets=zeros(T,n);
simplerets=zeros(T,n);
%This way r(1)=0 by construction
lnrets(2:end,:)=log(p(2:end,:))-log(p(1:end-1,:));
simplerets(2:end,:)=(p(2:end,:)./p(1:end-1,:)-1)*100; %simple returns in percent
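
As a possible alternative, the built-in diff function computes differences between consecutive rows, so the log-returns can be obtained in one line. The sketch below uses made-up prices and follows the convention above of setting the first return to zero.

```matlab
% Hypothetical price vector (placeholder numbers)
p = [100; 102; 101; 105];
[T, n] = size(p);

% diff(log(p)) gives ln(p_t) - ln(p_{t-1}) for t = 2, ..., T
lnrets = [zeros(1,n); diff(log(p))];

% Simple returns in percent, with the first return set to zero
simplerets = [zeros(1,n); (p(2:end,:)./p(1:end-1,:) - 1)*100];
```

For small price changes the two measures are close once the log-return is multiplied by 100: here the second simple return is 2 percent, while 100*log(1.02) is roughly 1.98.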

Constructing Sample moments

Formulas for mean [math]\bar r[/math], variance [math]\hat \sigma_r^2[/math], standard deviation [math]\hat \sigma_r[/math], skewness [math]\hat S_r[/math], and kurtosis [math]\hat K_r[/math]

[math]\begin{aligned} \bar r &= \frac{1}{T}\sum_{t=1}^T r_t\\ \hat \sigma_r^2 &= \frac{1}{T}\sum_{t=1}^T (r_t-\bar r)^2\\ \hat \sigma_r &= \sqrt{\hat \sigma_r^2}\\ \hat S_r &= \frac{1}{T}\sum_{t=1}^T (r_t-\bar r)^3/\hat \sigma_r^3\\ \hat K_r &= \frac{1}{T}\sum_{t=1}^T (r_t-\bar r)^4/\hat \sigma_r^4\end{aligned}[/math]

can be implemented directly using the following code:

  MeanLnRet = sum(lnrets)/T;                               % sample mean
  VarLnRet  = sum((lnrets-MeanLnRet).^2)/T;                % variance (divides by T)
  StdLnRet  = sqrt(VarLnRet);                              % standard deviation
  SkewLnRet = (sum((lnrets-MeanLnRet).^3)/T)./StdLnRet.^3; % skewness
  KurtLnRet = (sum((lnrets-MeanLnRet).^4)/T)./StdLnRet.^4; % kurtosis
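
The hand-coded moments can be checked against MATLAB's built-in functions. The sketch below uses a made-up return vector. Note that var and std divide by T-1 by default, so the second argument 1 is needed to match the formulas above, which divide by T.

```matlab
% Hypothetical log-returns (placeholder numbers)
lnrets = [0.01; -0.02; 0.015; 0.003; -0.007];
T = length(lnrets);

% Moments coded directly from the formulas
MeanLnRet = sum(lnrets)/T;
VarLnRet  = sum((lnrets-MeanLnRet).^2)/T;
StdLnRet  = sqrt(VarLnRet);

% Differences from the built-in equivalents (should be zero up to rounding)
checkMean = abs(MeanLnRet - mean(lnrets));
checkVar  = abs(VarLnRet  - var(lnrets,1));  % 1 = normalise by T
checkStd  = abs(StdLnRet  - std(lnrets,1));
```

If the Statistics Toolbox is installed, its skewness and kurtosis functions can be used for the same kind of check on the third and fourth moments.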

Positive and negative returns (keywords: loops, if-then-else statements, logical operations, vectorization)

The numbers ([math]T^+,T^-[/math]) and sample means ([math]r^+,r^-[/math]) of non-negative and negative returns are computed as [math]T^+=\sum_{t=1}^T I(r_t\ge0)[/math], [math]T^-=\sum_{t=1}^T I(r_t\lt 0)[/math], [math]r^+=\frac{\sum_{t=1}^T r_t I(r_t\ge0)}{T^+}[/math], [math]r^-=\frac{\sum_{t=1}^T r_t I(r_t\lt 0)}{T^-}[/math], where [math]I(True)=1[/math] and [math]I(False)=0[/math].

A longer way to compute these quantities is:

  1. Initialize variables Tplus=0, Tminus=0, retplus=0, retminus=0.

  2. Starting with [math]i=1[/math], check whether the [math]i[/math]th observation of returns lnrets(i) is greater than or equal to 0.

  3. If (2) is True, set Tplus=Tplus+1;retplus=retplus+lnrets(i), else
    set Tminus=Tminus+1;retminus=retminus+lnrets(i).

  4. Repeat steps 2 and 3 for [math]i=2,3,...,T[/math], where [math]T[/math] is the sample size, then divide each sum by the corresponding count.

    Tplus=0;
    Tminus=0;
    retplus=0;
    retminus=0;
    for i=1:T %starts the loop
        if lnrets(i)>=0
            Tplus=Tplus+1; %counting non-negative returns
            retplus=retplus+lnrets(i); %summation of non-negative returns
        else
            Tminus=Tminus+1; %counting negative returns
            retminus=retminus+lnrets(i); %summation of negative returns
        end %closes the if-else block
    end %closes the loop
    retplus=retplus/Tplus; %computing average non-negative return
    retminus=retminus/Tminus; %computing average negative return

If interested in a shorter way, the following has to be kept in mind:

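  1. Logical relationships also work for vectors, that is indpos=(lnrets>=0) generates a logical vector of 0s (where lnrets(i)<0) and 1s (where lnrets(i)>=0).

  2. Logical vectors can be used for selecting subsamples from a sample, that is lnrets(indpos) generates a subvector of non-negative returns and lnrets(~indpos) generates a subvector of negative returns. The index must be of logical type: the numeric vector 1-indpos cannot be used, because MATLAB would interpret its 0s and 1s as positions and 0 is not a valid position.

    indpos=lnrets>=0; %logical vector, true for non-negative returns
    indneg=~indpos; %logical vector, true for negative returns
    Tplus=sum(indpos);
    Tminus=sum(indneg);
    retplus=sum(lnrets(indpos))/Tplus; %average non-negative return
    retminus=sum(lnrets(indneg))/Tminus; %average negative return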
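
The same logical indexing can be used to repeat the calculations year by year, as required by the last step of the generic algorithm. The sketch below is only an illustration: it assumes a vector yrs holding the calendar year of every observation, and both yrs and lnrets contain made-up values.

```matlab
% Hypothetical years and log-returns (placeholder numbers)
yrs    = [1987; 1987; 1987; 1988; 1988; 1988];
lnrets = [0.01; -0.02; 0.005; 0.03; -0.01; -0.004];

years = unique(yrs);              % list of distinct years in the sample
nY    = length(years);
MeanByYear  = zeros(nY,1);        % one result per year
TplusByYear = zeros(nY,1);

for j = 1:nY
    inYear = (yrs == years(j));   % logical index for year j
    r = lnrets(inYear);           % returns observed in that year
    MeanByYear(j)  = sum(r)/length(r);
    TplusByYear(j) = sum(r >= 0); % number of non-negative returns
end
```

The remaining statistics can be added inside the loop in the same way, each stored in its own vector with one entry per year.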