Difference between revisions of "LoadingData"

From ECLR

Revision as of 13:00, 20 September 2012

Theory

Financial theory suggests that, in an efficient financial market, the best forecast of an asset's price tomorrow is its price today. Otherwise, market participants would buy or sell the asset until this is the case. This is one of the formulations of “the efficient market hypothesis”. In terms of returns, it means that [math]E(r_t|\mathcal{F}_{t-1})=0[/math]. This hypothesis can be tested with a variety of model specifications and estimation methods. The simplest estimation method is standard OLS (link) with robust standard errors (link). The simplest model specifications are linear:

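[math]\begin{aligned} r_t&=\phi_0+\phi_1 r_{t-1}+ \beta D_{day=5}+e_t, \quad H_0:\beta=0,\ H_a:\beta\ne 0\\ r_t&=\phi_0+ \beta D_{day=5}+e_t, \quad H_0:\beta=0,\ H_a:\beta\ne 0\\ r_t&=\phi_0+ \sum_{i=1}^4 \beta_i D_{day=i}+e_t, \quad H_0: \beta_1=\dots=\beta_4=0,\ H_a:\exists i:\ \beta_i\ne 0\\ r_t&=\phi_0+ \phi_1 r_{t-1}+\sum_{i=1}^4 \beta_i D_{day=i}+e_t, \quad H_0: \beta_1=\dots=\beta_4=0,\ H_a:\exists i:\ \beta_i\ne 0\end{aligned}[/math]

Here [math]D_{day=i}[/math] equals 1 on the [math]i[/math]-th trading day of the week (1 = Monday, ..., 5 = Friday) and 0 otherwise. In the last two models Friday is the omitted base category.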

I will demonstrate the implementation of the first model in the list. Implement the others on your own.

Algorithm

  1. Import data into MATLAB (adjusted daily close prices of MSFT from 1 January 2000 to 10 September 2012, downloaded from Yahoo Finance, http://finance.yahoo.com).
  2. Construct a vector of log-returns.
  3. Construct the [math]y[/math] vector and the [math]X[/math] matrix. This requires a day-of-the-week dummy.
  4. Estimate [math]\hat\beta[/math] by OLS from the regression [math]y=X\beta+u[/math] <link>
  5. Compute standard errors <link>
  6. Test the hypothesis
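As a rough illustration of steps 3 to 6, here is a minimal MATLAB sketch. It uses simulated data in place of the MSFT series. The following are all assumptions made only for this example, not necessarily the choices made in the linked pages:

* the data-generating parameter values;
* the "every fifth observation is a Friday" dummy;
* the White (HC0) form of the robust standard errors;
* the normal critical value 1.96.

```matlab
% Simulated stand-in for the return series (all parameter values are made up)
T        = 1000;
phi0     = 0.0005; phi1 = 0.05; betaTrue = 0.001;
Dw       = double(mod((1:T)', 5) == 0);  % hypothetical Friday dummy: every 5th observation
e        = 0.01*randn(T,1);              % hypothetical i.i.d. errors
r        = zeros(T,1);                   % r(1)=0 plays the role of the undefined first return
for t = 2:T
    r(t) = phi0 + phi1*r(t-1) + betaTrue*Dw(t) + e(t);
end

% Step 3: regressand and regressors of model 1 (constant, lagged return, dummy)
y = r(2:end);
X = [ones(T-1,1) r(1:end-1) Dw(2:end)];

% Step 4: OLS; the backslash operator solves the least-squares problem
bhat = X\y;

% Step 5: White (HC0) heteroskedasticity-robust covariance matrix
u    = y - X*bhat;                       % OLS residuals
XXi  = inv(X'*X);
Xu   = bsxfun(@times, X, u);             % row t of X multiplied by residual u(t)
Vrob = XXi*(Xu'*Xu)*XXi;                 % sandwich estimator
se   = sqrt(diag(Vrob));

% Step 6: two-sided t-test of H0: beta = 0 at the 5% level (normal critical value)
tstat  = bhat(3)/se(3);
reject = abs(tstat) > 1.96;
fprintf('beta_hat = %.5f, robust s.e. = %.5f, t = %.2f\n', bhat(3), se(3), tstat);
```

Because the errors are random, the printed numbers change from run to run. Replacing the simulated `r` and `Dw` with the series constructed below gives the estimates for the actual data.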

Implementation

Import data in MATLAB (keyword: data import)

MATLAB offers a wide range of data import procedures. The most straightforward and user-friendly one is the MATLAB Import Wizard, which opens via the File/Import Data menu. The next step is to select the data file of interest. The Import Wizard is quite intuitive and works for a variety of standard file formats. It can also generate MATLAB code to import similar files in the future (check box in the bottom right corner). The Import Wizard works well for well-structured files; for data files with a more complicated structure, the textscan function is used instead. After importing MSFT.txt there are two objects in the workspace: a numeric matrix data and a cell array textdata. MATLAB attempts to import every column as numerical data, and columns for which this fails are automatically placed in the cell array textdata. As a result, all dates are stored as text in textdata and all numbers (prices) are stored in data.
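When the Import Wizard is not suitable, a textscan-based import can look roughly as follows. This sketch assumes a comma-separated file with one header line and two columns (date, adjusted close). The real MSFT file may have more columns. The file is written to a temporary location with made-up values only so that the example runs on its own.

```matlab
% Write a small hypothetical file (layout and values are assumptions, not real data)
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'Date,Adj Close\n');
fprintf(fid, '2012-09-06,30.39\n');
fprintf(fid, '2012-09-07,30.95\n');
fprintf(fid, '2012-09-10,30.72\n');
fclose(fid);

% Read one text column and one numeric column, skipping the header line
fid = fopen(fname, 'r');
C = textscan(fid, '%s %f', 'Delimiter', ',', 'HeaderLines', 1);
fclose(fid);
delete(fname);

dates = C{1};        % cell array of date strings (the role of textdata(2:end,1))
p     = C{2};        % numeric column of prices (the role of the price column in data)
T     = length(p);   % number of observations
```

For a file with more columns, the format string needs one conversion specifier per column (for example `'%s %f %f %f %f %f %f'` for seven columns).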

Construction of Log-returns

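Log-returns are defined as [math]r_t=\ln(p_t)-\ln(p_{t-1})[/math]. Here p is the vector of adjusted close prices taken from data, and T=length(p) is the number of observations. The first return [math]r_1[/math] is not defined, since [math]p_0[/math] is not known; in the code below it is set to 0 as a placeholder. The long way to implement this in MATLAB is: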

r=zeros(T,1); %This way r(1)=0 by construction
for i=2:T
    r(i)=log(p(i))-log(p(i-1));
end

The same result can be achieved in a shorter way using the fact that MATLAB can extract submatrices from a matrix: p(2:end,:) selects all rows except the first, and p(1:end-1,:) selects all rows except the last.

r=zeros(T,1);
%This way r(1)=0 by construction
r(2:end)=log(p(2:end))-log(p(1:end-1));

The same result can be achieved with diff. The command y=diff(x) returns a vector one element shorter than x, such that y(i)=x(i+1)-x(i). For the return series, however, r(i+1)=log(p(i+1))-log(p(i)) is needed, so the result has to be shifted down by one position. This is done by placing a 0 in front of it with a vertical concatenation of vectors <link>. The code collapses to

r=[0;diff(log(p))];
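The three constructions above should give identical results. Here is a minimal check on a short, made-up price vector (not actual MSFT prices):

```matlab
% Hypothetical price vector (illustrative values only)
p = [30.39; 30.95; 30.72; 31.10];
T = length(p);

% 1) Loop
r1 = zeros(T,1);                 % r1(1)=0 is the placeholder for the undefined first return
for i = 2:T
    r1(i) = log(p(i)) - log(p(i-1));
end

% 2) Submatrix indexing
r2 = zeros(T,1);
r2(2:end) = log(p(2:end)) - log(p(1:end-1));

% 3) diff plus vertical concatenation with a leading 0
r3 = [0; diff(log(p))];

disp([r1 r2 r3]);                % the three columns should coincide
```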

Constructing day-of-the-week dummy (keywords: loops, if-then-else statements, logical operations, vectorization)

To convert textdata into a day-of-the-week variable, the dates first have to be converted to MATLAB's date format. MATLAB stores dates as serial date numbers, i.e. the number of days counted from a fixed base date (serial date number 1 is 1 January 0000).
Ddate=datenum(textdata(2:end,1)) converts all entries in the first column of textdata from the second row to the end (the first row holds the column header). For the sake of sanity, it is always a good idea to check Ddate after the conversion, for example with the MATLAB function datestr. If the conversion is successful, datestr(Ddate(1)) shows the same date as textdata(2,1). Otherwise, check whether the month and day have been swapped; if so, pass the date format to datenum explicitly, e.g. datenum(textdata(2:end,1),'dd/mm/yyyy').
Wkday=weekday(Ddate) constructs a weekday indicator variable. It assigns values from 1 to 7 to the days of the week, i.e. 1 – Sunday, ..., 7 – Saturday, so Friday corresponds to 6. The last step is to construct a dummy variable for Friday.
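The date-handling steps can be tried on a few date strings before running them on the full textdata column. This sketch assumes Yahoo's yyyy-mm-dd format and uses three hand-picked dates: 6, 7 and 10 September 2012, which are a Thursday, a Friday and a Monday. Passing the format string to datenum and datestr removes any day/month ambiguity.

```matlab
% Hypothetical date strings in yyyy-mm-dd format
dates = {'2012-09-06'; '2012-09-07'; '2012-09-10'};

% Convert to serial date numbers with an explicit format
Ddate = datenum(dates, 'yyyy-mm-dd');

% Sanity check: converting back should reproduce the original strings
back = cellstr(datestr(Ddate, 'yyyy-mm-dd'));

% Weekday codes: 1 = Sunday, ..., 7 = Saturday
Wkday = weekday(Ddate);   % expected: [5; 6; 2], i.e. Thursday, Friday, Monday
```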

The longest way

  1. Create a vector of zeros Dw of the same length as the return series

  2. Check whether the first observation of Wkday is a Friday, i.e. whether Wkday(i)==6 [1st] for [math]i[/math]=1

  3. If (2) is true, then Dw(i)=1, else Dw(i)=0 [3rd] for [math]i[/math]=1

  4. Repeat steps 2–3 for [math]i=2,3,...,T[/math], where [math]T[/math] is the sample size

    T=length(r); %defines the number of steps in the loop (one per observation of the return series)
    Dw=zeros(T,1); %initializes a vector of 0s; this matters for performance when T is large
    for i=1:T %starts the loop
        %filling the dummy variable
        if Wkday(i)==6
            Dw(i)=1;
        else
            Dw(i)=0;
        end
    end

Slightly shorter way

In MATLAB the logical expression Wkday(i)==6 evaluates to 1 if true and 0 if false. Steps 2–3 can therefore be combined into the single line Dw(i)=Wkday(i)==6. The slightly shorter version of the long algorithm is then:

  1. Create a vector of zeros Dw of the same length as the return series

  2. Check whether the first observation of Wkday is Friday, i.e.
    Dw(i)=(Wkday(i)==6) [1st] for [math]i[/math]=1

  3. Repeat line 2 for [math]i=2,3,...,T[/math], where [math]T[/math] is the sample size

    T=length(r); %defines the number of steps in the loop (one per observation of the return series)
    Dw=zeros(T,1); %initializes a vector of 0s. It is quite important for speed for large T
    for i=1:T %starts the loop
        %filling the dummy variable
        Dw(i)=Wkday(i)==6;
    end

A shorter way

Note:

We need the first step of the algorithm (preallocation) because of the way MATLAB stores arrays. The first time MATLAB runs Dw(i)=0 and Dw does not yet exist, it looks for a contiguous block of memory large enough for [math]i[/math] elements. If there is one, MATLAB creates a vector Dw with [math]i[/math] components; if not, it stops with an out-of-memory error. The next time MATLAB runs Dw(j)=0, it checks whether Dw already has [math]j[/math] or more components. If so, MATLAB simply sets the [math]j[/math]th component of Dw to 0. If not, MATLAB looks for a contiguous block of memory that can accommodate a vector with [math]j[/math] components, creates the new vector there ''and copies the content of the previous Dw into its first [math]i[/math] components''. If no such block exists, it stops with an out-of-memory error. As a result, without the first step of the algorithm, MATLAB may reallocate and copy Dw up to [math]T[/math] times inside the loop. This is not important for small [math]T[/math], but it becomes time-consuming as [math]T[/math] increases. Here is an illustrative example, unrelated to the data set:

close all;clear all;clc;
T=2000;
tic;
%Slow cycle: count grows at every iteration; took roughly 9 seconds (Core 2 Duo, 2.86 GHz)
for i=1:T
    count(i,i)=toc;
end
plot(diag(count))
tic;
%Fast cycle: the first iteration (i=T) allocates the full T-by-T matrix; took 2.5 milliseconds
for i=T:-1:1
    count1(i,i)=toc;
end
figure
plot(flipud(diag(count1)))
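Timings depend heavily on the MATLAB version and the hardware, and recent releases handle growing arrays more efficiently than older ones. The sketch below is a quick way to see the effect on your own machine. It fills the same vector with and without preallocation and compares the elapsed times with tic/toc.

```matlab
T = 1e5;

% Without preallocation: a grows by one element per iteration
tic;
a = [];
for i = 1:T
    a(i) = i^2;
end
tGrow = toc;

% With preallocation: b is created once with its final size
tic;
b = zeros(1,T);
for i = 1:T
    b(i) = i^2;
end
tPre = toc;

fprintf('growing: %.4f s, preallocated: %.4f s\n', tGrow, tPre);
```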

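Keeping this in mind, the loop can be run backwards. The very first assignment (for [math]i=T[/math]) then creates Dw with all [math]T[/math] components at once, and no further reallocation is needed. The code can be rewritten as: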

  1. Check whether the last observation of Wkday is Friday, i.e. Dw(i)=Wkday(i)==6 [1st] for [math]i=T[/math]

  2. Repeat line [1st] for [math]i=T-1,T-2,...,1[/math], where [math]T[/math] is the sample size

    T=length(r); %defines the number of steps in the loop
    for i=T:-1:1 %starts the loop at the last observation, so Dw gets its full length at once
        %filling the dummy variable; indexing with (i,1) makes Dw a column vector
        Dw(i,1)=Wkday(i)==6;
    end

The shortest way

The shortest way is to use MATLAB's vectorized operations. Relational operators such as == work element by element on whole arrays. The expression Wkday==6 therefore returns a logical vector with 1 where the condition is true and 0 where it is false. Thus, everything can be collapsed to:

    Dw=Wkday==6;

Note that the last method

  1. Is at least as efficient as the loop-based versions (and usually more efficient).
  2. Is much shorter (and thus, there is a smaller chance for mistakes).
  3. Does not require initialization of the variable Dw since assignment occurs just once.
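The sketch below runs the four versions side by side on a short, made-up Wkday vector to confirm that they produce the same dummy. One difference is worth knowing. The vectorized version and the reverse loop return logical arrays, while the preallocated loops return doubles. isequal compares values only, so they still compare equal. Logical values are also converted to double when Dw is concatenated with r to build X.

```matlab
% Hypothetical weekday codes (1 = Sunday, ..., 7 = Saturday; 6 = Friday)
Wkday = [2; 3; 4; 5; 6; 2; 6];
T = length(Wkday);

% Longest way: preallocated vector filled with if/else
Dw1 = zeros(T,1);
for i = 1:T
    if Wkday(i) == 6
        Dw1(i) = 1;
    else
        Dw1(i) = 0;
    end
end

% Slightly shorter way: the comparison result is assigned directly
Dw2 = zeros(T,1);
for i = 1:T
    Dw2(i) = Wkday(i) == 6;
end

% Shorter way: reverse loop without preallocation; (i,1) indexing gives a column
clear Dw3
for i = T:-1:1
    Dw3(i,1) = Wkday(i) == 6;
end

% Shortest way: vectorized comparison returns a logical column vector
Dw4 = Wkday == 6;

disp([Dw1 Dw2 Dw3 Dw4]);   % all four columns should be identical
```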

Constructing [math]y[/math] and [math]X[/math]

The vector y and the matrix X are constructed as follows. The first observation of y is the second observation of r, and the last observation of y is the last observation of r. The first row of X contains a constant (for [math]\phi_0[/math]), the first observation of r and the second observation of the dummy Dw. The last row of X contains a constant, the next-to-last observation of r and the last observation of Dw. In other words, the regression equations run from [math]r_2=\phi_0+\phi_1 r_1 + \beta D_2^{Friday}+e_2[/math] to [math]r_T=\phi_0+\phi_1 r_{T-1} + \beta D_T^{Friday}+e_T[/math]. The code is:

y=r(2:end);
X=[ones(length(y),1) r(1:end-1) Dw(2:end)]; %constant, lagged return, Friday dummy
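A quick way to check the alignment described above is to build y and X for a tiny, made-up return series and print them side by side. Each printed row then shows [math]r_t[/math] next to its regressors. The sketch includes a column of ones for the constant [math]\phi_0[/math] of model 1.

```matlab
% Hypothetical returns (r(1)=0 placeholder) and Friday dummy, T = 5
r  = [0; 0.010; -0.004; 0.007; 0.002];
Dw = [0; 1; 0; 0; 1];

y = r(2:end);                                  % r_2, ..., r_T
X = [ones(length(y),1) r(1:end-1) Dw(2:end)];  % constant, r_{t-1}, D_t

% Each printed row: y_t, then its regressors 1, r_{t-1}, D_t
disp([y X]);
```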

OLS implementation