Example 1
Revision as of 11:17, 16 October 2012
Theory
Financial theory has shown that, for efficient financial markets in equilibrium, the best conditional forecast of an asset's price tomorrow is its price today. Otherwise, financial agents would buy or sell the asset until this is the case. This is one of the formulations of “the efficient market hypothesis”. In terms of returns, this means that [math]E(r_t|\mathcal{F}_{t-1})=0[/math]. This hypothesis can be tested using a variety of model specifications and estimation methods. The simplest estimation method is standard OLS with robust standard errors. The simplest model specifications are linear:
[math]\begin{aligned} r_t&=\phi_0+\phi_1 r_{t-1}+ \beta D_t^{Fri}+e_t, & H_0:\beta=0,\ & H_a:\beta\ne 0\\ r_t&=\phi_0+ \beta D_t^{Fri}+e_t, & H_0:\beta=0,\ & H_a:\beta\ne 0\\ r_t&=\phi_0+ \beta_1 D_t^{Mo}+ \beta_2 D_t^{Tue}+ \beta_3 D_t^{Wed}+ \beta_4 D_t^{Thur}+e_t, & H_0: \beta_1=\beta_2=\beta_3=\beta_4=0,\ & H_a:\exists i:\ \beta_i\ne 0\\ r_t&=\phi_0+ \phi_1 r_{t-1}+\beta_1 D_t^{Mo}+ \beta_2 D_t^{Tue}+ \beta_3 D_t^{Wed}+ \beta_4 D_t^{Thur}+e_t, & H_0: \beta_1=\beta_2=\beta_3=\beta_4=0,\ & H_a:\exists i:\ \beta_i\ne 0\end{aligned}[/math]
I will demonstrate the implementation of the first model in the list. Implement the others on your own.
Please note: under the null, [math]\phi_1 [/math] has to be equal to 0, too. However, due to various microstructure effects, this might not be the case.
Algorithm
- Import data in MATLAB (data are downloaded from Yahoo Finance, http://finance.yahoo.com: adjusted daily close prices of MSFT from January 1, 2000 to September 10, 2012).
- Construct a vector of log-returns.
- Construct the [math]y[/math] vector and the [math]X[/math] matrix. For this purpose, a dummy for the day of the week is needed.
- Run OLS, that is, estimate [math]\hat\beta[/math] from the regression [math]y=X\beta+u[/math], compute standard errors and test the hypothesis.
Implementation
Import data in MATLAB (keyword: data import)
MATLAB has a very wide range of import procedures. The most straightforward and user-friendly is the MATLAB Import Wizard, which opens via the File/Import Data menu. The next step is to select the data file of interest. The Import Wizard is quite intuitive: it works for a variety of standard file formats and can generate MATLAB code to import similar files in the future (check box in the bottom right corner). The Import Wizard works well for well-structured files; for data files with a more complicated structure, the textscan function is used. After importing MSFT.txt, there are two objects in the workspace: a matrix data and a cell array textdata. MATLAB attempts to import all data columns as numerical data; columns for which this fails are automatically dumped into the cell array textdata. As a result, all dates end up as text in textdata, and all numbers (prices) are stored in the matrix data.
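For a scripted alternative to the wizard, the sketch below reads a small file with textscan. It assumes a Yahoo-style CSV layout with one header line and the columns Date,Open,High,Low,Close,Volume,Adj Close; the file name prices_sample.txt and all values are made up purely so that the example runs on its own. It also assumes, as Yahoo downloads typically did, that the newest day comes first, so the series are flipped into chronological order; check the order in your own file.

```matlab
% Sketch: import a price file with textscan instead of the Import Wizard.
% ASSUMPTION: Yahoo-style CSV, one header line, columns
% Date,Open,High,Low,Close,Volume,Adj Close. File name and values are made up.
fname = fullfile(tempdir, 'prices_sample.txt');   % hypothetical sample file
fid = fopen(fname, 'w');
fprintf(fid, 'Date,Open,High,Low,Close,Volume,Adj Close\n');
fprintf(fid, '2012-09-10,10.40,10.60,10.30,10.50,1000,10.50\n');
fprintf(fid, '2012-09-07,10.10,10.40,10.00,10.20,1000,10.20\n');
fprintf(fid, '2012-09-06,10.00,10.20,9.90,10.00,1000,10.00\n');
fclose(fid);

% Read one string column (dates) followed by six numeric columns
fid = fopen(fname, 'r');
C = textscan(fid, '%s %f %f %f %f %f %f', 'Delimiter', ',', 'HeaderLines', 1);
fclose(fid);

% ASSUMPTION: newest day first, so flip into chronological order
dates = flipud(C{1});        % cell array of date strings (like textdata(2:end,1))
data  = flipud([C{2:7}]);    % numeric matrix, one row per trading day
p     = data(:, end);        % adjusted close, assumed to be the last column
```

With this layout, dates plays the role of textdata(2:end,1) and p is the price vector used in the next section.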
Construction of Log-returns
Log-returns are defined as [math]r_t=\ln(p_t)-\ln(p_{t-1})[/math], where p is the column vector of prices taken from data. The first return [math]r_1[/math] is not defined, since [math]p_0[/math] is not known. The long way to implement this in MATLAB is:
T=length(p); %number of prices
r=zeros(T,1); %this way r(1)=0 by construction
for i=2:T
r(i)=log(p(i))-log(p(i-1));
end
The same result can be achieved in a shorter way using the fact that MATLAB can extract submatrices from a matrix: p(2:end,:) selects all rows of a matrix but the first, and p(1:end-1,:) selects all rows but the last.
r=zeros(T,1);
%This way r(1)=0 by construction
r(2:end)=log(p(2:end))-log(p(1:end-1));
The same result can be achieved using y=diff(x). For a vector x with n elements, this command generates a vector y with n-1 elements, such that y(i)=x(i+1)-x(i). For the return series, however, r(i+1)=x(i+1)-x(i) with x=log(p) is needed. To correct for this, a zero is prepended by vertical concatenation of vectors. The code collapses to
r=[0;diff(log(p))];
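As a quick sanity check, the three constructions above can be run side by side on a short price vector. The prices here are made up; the point is only that the loop, the submatrix version and the diff version produce the same vector, with r(1)=0 in every case.

```matlab
% Sketch: the three log-return constructions agree (made-up prices).
p = [10.00; 10.20; 10.50; 10.30; 10.60];
T = length(p);

% 1) Loop over observations
r1 = zeros(T, 1);                 % r1(1) stays 0 by construction
for i = 2:T
    r1(i) = log(p(i)) - log(p(i-1));
end

% 2) Submatrix indexing
r2 = zeros(T, 1);
r2(2:end) = log(p(2:end)) - log(p(1:end-1));

% 3) diff plus vertical concatenation with a leading zero
r3 = [0; diff(log(p))];

% Largest discrepancy between the methods (zero up to rounding)
disp(max(abs([r1 - r2; r1 - r3])))
```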
Constructing day-of-the-week dummy
To convert textdata into a day-of-the-week variable, the date strings first have to be converted to MATLAB's date format. In MATLAB, dates are stored as serial date numbers, i.e. the number of days counted from the start of year 0000, so that January 1, 0000 is day 1.
Ddate=datenum(textdata(2:end,1)) converts all entries of the first column of textdata, from the second row (the first row contains the column header) to the end. For the sake of sanity, it is always a good idea to check Ddate after conversion, for example with the MATLAB function datestr. If the date conversion is successful, datestr(Ddate(1)) gives the same date as textdata(2,1), possibly in a different display format. Otherwise, check whether month and day have been switched.
Wkday=weekday(Ddate) constructs a weekday indicator variable. It assigns values from 1 to 7 to the days of the week, i.e. 1 – Sunday, ..., 7 – Saturday, so Friday is 6. The last step is to construct a dummy variable for Friday.
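The conversion and the sanity check described above can be sketched as follows. This assumes ISO-style date strings (yyyy-mm-dd); passing that format explicitly to datenum is one way to avoid the day/month mix-up. The three dates are a made-up sample standing in for textdata(2:end,1).

```matlab
% Sketch: convert date strings, check the round trip, and get weekdays.
% ASSUMPTION: dates are ISO strings (yyyy-mm-dd); the sample is made up.
textdates = {'2012-09-06'; '2012-09-07'; '2012-09-10'};
Ddate = datenum(textdates, 'yyyy-mm-dd');   % serial date numbers

% Sanity check: convert back with the same format and compare
back = cellstr(datestr(Ddate, 'yyyy-mm-dd'));
disp(isequal(back, textdates))      % 1 if the conversion round-trips

Wkday = weekday(Ddate);             % 1 = Sunday, ..., 7 = Saturday
disp(Wkday')                        % Thursday, Friday, Monday -> 5 6 2
```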
The longest way
1. Create a vector of zeros Dw of the same length as the return series.
2. Check whether the first observation of Wkday is a Friday, that is, check whether Wkday(i)==6 for [math]i[/math]=1.
3. If (2) is true, set Dw(i)=1; otherwise, set Dw(i)=0.
4. Repeat steps 2 – 3 for [math]i=2,3,...,T[/math], where [math]T[/math] is the sample size.
T=length(Wkday); %defines the number of steps in the loop
Dw=zeros(T,1); %initializes a vector of 0s; this matters for performance when T is large
for i=1:T %starts the loop
    %filling the dummy variable
    if Wkday(i)==6
        Dw(i)=1;
    else
        Dw(i)=0;
    end
end
Slightly shorter way
Since in MATLAB the logical expression Wkday(i)==6 equals 1 if true and 0 if false, steps 2 – 3 can be combined into one line, Dw(i)=(Wkday(i)==6). The slightly shorter version of the long algorithm is then:
1. Create a vector of zeros Dw of the same length as the return series.
2. Check whether the observation of Wkday for [math]i[/math]=1 is a Friday, i.e. set Dw(i)=(Wkday(i)==6).
3. Repeat step 2 for [math]i=2,3,...,T[/math], where [math]T[/math] is the sample size.
T=length(Wkday); %defines the number of steps in the loop
Dw=zeros(T,1); %initializes a vector of 0s; this matters for speed when T is large
for i=1:T %starts the loop
    %filling the dummy variable
    Dw(i)=Wkday(i)==6;
end
A shorter way
Note:
The first step of the algorithms above (creating a vector of zeros) is needed because of the way MATLAB handles memory. The first time MATLAB runs Dw(i)=0, it checks whether the computer has a large enough contiguous block of memory. If yes, MATLAB creates a vector variable Dw with [math]i[/math] components; if no, MATLAB stops with an error. The next time MATLAB runs Dw(j)=0, it checks whether Dw has [math]j[/math] or more components. If yes, MATLAB sets the [math]j[/math]th component of Dw to 0. If no, MATLAB looks for a contiguous block of memory that can accommodate a vector with [math]j[/math] components; if it finds one, it creates the new variable there and copies the content of the previous Dw onto its first [math]i[/math] components. Otherwise, it stops with an out-of-memory error. As a result, without the first step, MATLAB may reallocate and copy Dw up to [math]T[/math] times inside the loop. This hardly matters for small [math]T[/math], but it becomes time-consuming as [math]T[/math] grows. An illustrative example, unrelated to the dummy variable:
close all;clear all;clc;
T=2000;
tic;
%Slow cycle, takes roughly 9 seconds (Core 2 Duo, 2.86 GHz)
for i=1:T
count(i,i)=toc;
end
plot(diag(count))
tic;
%Fast cycle, takes 2.5 milliseconds
for i=T:-1:1
count1(i,i)=toc;
end
figure
plot(flipud(diag(count1)))
Keeping this in mind, the loop can be run backwards, so that the very first assignment, to the last element, allocates the whole vector at once:
1. For [math]i=T[/math], set Dw(i)=(Wkday(i)==6).
2. Repeat step 1 for [math]i=T-1,T-2,...,1[/math], where [math]T[/math] is the sample size.
T=length(Wkday); %defines the number of steps in the loop
for i=T:-1:1 %starts the loop, from the last observation backwards
    %filling the dummy variable; indexing Dw(i,1) makes Dw a column vector
    Dw(i,1)=Wkday(i)==6;
end
The shortest way
The shortest way is to use MATLAB's vectorization. MATLAB operators act on whole arrays, not just on scalars, so the expression Wkday==6 generates a logical vector with 1s where the condition is true and 0s where it is false. Thus, everything collapses to:
Dw=Wkday==6;
Note that the last method:
- Is at least as efficient as the first two (and usually more efficient).
- Is much shorter (and thus leaves less room for mistakes).
- Does not require initialization of the variable Dw, since the assignment occurs just once.
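To convince yourself that the variants are interchangeable, the sketch below builds the Friday dummy with the preallocated loop, the backward loop and the vectorized comparison on a made-up sample (every calendar day of September 2012) and compares the results. The last line is an assumption-free extension of the same idea, using bsxfun to build the Monday–Thursday dummies that models 3 and 4 need; the name D is hypothetical.

```matlab
% Sketch: three ways to build the Friday dummy give the same result.
% Sample (made up): every calendar day of September 2012.
Ddate = (datenum(2012, 9, 1):datenum(2012, 9, 30))';
Wkday = weekday(Ddate);            % 1 = Sunday, ..., 7 = Saturday
T = length(Wkday);

% Forward loop with preallocation
DwLoop = zeros(T, 1);
for i = 1:T
    DwLoop(i) = (Wkday(i) == 6);
end

% Backward loop: the first assignment allocates the whole column
clear DwBack
for i = T:-1:1
    DwBack(i, 1) = (Wkday(i) == 6);
end

% Vectorized comparison
Dw = (Wkday == 6);

% isequal compares values, not classes, so double and logical can match
disp([isequal(DwLoop, Dw), isequal(DwBack, Dw), sum(Dw)])

% Hypothetical extension for models 3 and 4: T-by-4 matrix of
% Monday..Thursday dummies (weekday codes 2..5)
D = double(bsxfun(@eq, Wkday, 2:5));
```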
Constructing [math]y[/math] and [math]X[/math]
The vector y is constructed as follows. The first observation of y corresponds to the second observation of vector r, and the last observation of y corresponds to the last observation of r. The first row of matrix X contains a constant, the first observation of [math]r[/math] and the second observation of the dummy variable Dw. The last row of X contains a constant, the next-to-last observation of r and the last observation of Dw.
[math]\begin{aligned}
r_2&=\phi_0+\phi_1 r_1 + \beta D_2^{Fri}+e_2\\
&\ \vdots\\
r_T&=\phi_0+ \phi_1 r_{T-1} + \beta D_T^{Fri}+e_T\end{aligned}[/math]
The code is:
y=r(2:end);
X=[ones(size(y)) r(1:end-1) Dw(2:end)];
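A quick way to check the alignment is to build y and X from a tiny made-up r and Dw and inspect the first and last rows. All inputs must be column vectors, otherwise the horizontal concatenation fails. Note one consequence of r(1)=0 by construction: the first row of X uses a placeholder lag of 0. If you prefer to avoid it, one option (a judgment call, not the tutorial's choice) is to start one observation later, i.e. y=r(3:end) with r(2:end-1) and Dw(3:end).

```matlab
% Sketch: build y and X for r_t = phi_0 + phi_1 r_{t-1} + beta D_t^{Fri} + e_t
% and check the alignment. r and Dw are small made-up column vectors.
r  = [0; 0.02; -0.01; 0.03; 0.01];   % r(1) = 0 by construction, as above
Dw = [0; 1; 0; 0; 1];                % Friday dummy, same length as r

y = r(2:end);                              % r_2, ..., r_T
X = [ones(size(y)) r(1:end-1) Dw(2:end)];  % columns: 1, r_{t-1}, D_t^{Fri}

% Row t of X pairs the lag r_{t-1} with the dummy of day t
disp(size(X))        % T-1 rows, 3 columns
disp(X(1, :))        % 1, r(1), Dw(2)
disp(X(end, :))      % 1, r(end-1), Dw(end)
```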
OLS implementation
>> OLSest(y,X,1)
===========================================================
===== Regression Output ==================================
Obs used = 3191, missing obs = 0
Rsquared = 0.0032
adj_Rsq = 0.0025
===== Estimated Model Parameters ==========================
= Par se(Par) t(Par) pval ==================
-0.0003 0.0004 -0.6423 0.5208
0.0010 0.0011 0.9088 0.3635
-0.0542 0.0177 -3.0648 0.0022
===== Model Statistics ====================================
Fstat = 5.0632 (0.0064)
standard error = 0.0211
RSS = 1.4257
Durbin-Watson = 2.0031
===========================================================
== p-values of -999 indicate that neither the stat ========
== nor the NAG toolbox were available =====================
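OLSest does not appear to be a built-in MATLAB function; it looks like a user-supplied routine, and its internals (including how its standard errors are computed) are not shown here. As a hedged, toolbox-free stand-in, the sketch below computes OLS with White (HC0) heteroskedasticity-robust standard errors, two-sided p-values from the normal approximation, R-squared, adjusted R-squared and the Durbin-Watson statistic. HC0 and the normal approximation are my choices, not necessarily what OLSest does, so its numbers need not match the output above. The last lines add a robust Wald test of R*beta=0, the kind of joint test needed for models 3 and 4. The demo y and X are made up; replace them with the y and X built above.

```matlab
% Sketch: OLS with White (HC0) robust standard errors. NOT the OLSest routine.
% Demo data (made up); replace with y and X constructed above.
X = [ones(5, 1) (1:5)'];
y = [1; 3; 2; 5; 4];

[n, k] = size(X);
b   = X \ y;                          % OLS estimate of beta
e   = y - X * b;                      % residuals
RSS = e' * e;                         % residual sum of squares
XXi = (X' * X) \ eye(k);              % (X'X)^{-1}

% White (HC0) covariance: (X'X)^{-1} [sum_t e_t^2 x_t x_t'] (X'X)^{-1}
S     = X' * bsxfun(@times, X, e.^2);
Vrob  = XXi * S * XXi;
se    = sqrt(diag(Vrob));
tstat = b ./ se;
pval  = erfc(abs(tstat) / sqrt(2));   % two-sided p-value, normal approximation

Rsq    = 1 - RSS / sum((y - mean(y)).^2);
adjRsq = 1 - (1 - Rsq) * (n - 1) / (n - k);
DW     = sum(diff(e).^2) / RSS;       % Durbin-Watson statistic

disp([b se tstat pval])               % columns: Par, se(Par), t(Par), pval

% Robust Wald test of H0: R*beta = 0 (e.g. beta_1 = ... = beta_4 = 0 in
% models 3 and 4). Here R selects the slope, so W equals tstat(2)^2.
R  = [0 1];
q  = size(R, 1);
W  = (R * b)' * ((R * Vrob * R') \ (R * b));
pW = gammainc(W / 2, q / 2, 'upper');  % upper tail of chi-square(q)
```

For a single restriction, the Wald statistic is the squared t-statistic and its chi-square p-value coincides with the normal two-sided p-value, which is a handy consistency check.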