Difference between revisions of "LoadingData"

From ECLR
Jump to: navigation, search
 
(102 intermediate revisions by 5 users not shown)
Line 1: Line 1:
=1.0
+
= Introduction =
  
= Preliminaries =
+
Usually you will have data saved in some file, like Excel files, csv (comma seperated values) file or a text file. These are the most common formats in which you can download data from various databases.
  
Very often in your life you have to repeat the same operation many times (move your right and left legs while walking/running) or behave differently depending on external conditions (there is or there is no bus on a bus stop). Quite often these two are combined together. Say, if there is a bus on a bus stop, then you run trying to catch it, otherwise walk or stop and enjoy the usual Manchester weather. The same is true for programming. Quite often you want to repeat the same operation many times, or you want to change the way you treat your data depending on some conditions. We start with conditional statements. They execute different pieces of code depending whether is true or false. There are several ways you can formulate it. The shortest
+
Before we can look at the data upload procedure it is important to have a very quick look at the way in which MATLAB deals with dates.
  
<pre>if condition
+
= Date formats =
statement1;
 
statement2;
 
...
 
end
 
</pre>
 
, executes only if is satisfied. can be anything that generate non-zero or 0 (True or False), say , , or . The last condition is True always but for . A slightly longer version
 
  
<pre> if condition
+
When dealing with time series data you will often want to keep date information. MATLAB has excellent date functionality and dealing with data is indeed one of its strengths. There are two principle formats in which MATLAB handles dates, the <source enclose=none>datenum</source> and the <source enclose=none>datevec</source> format
  statement1;
 
  statement2;
 
  ...
 
  else
 
  statement1a;
 
  statement2a;
 
  ...
 
  end
 
  </pre>
 
runs , if is true and otherwise. In the most general case it looks like
 
  
<pre> if condition1
+
# The <source enclose=none>datenum</source> format: Here MATLAB records date information in terms of the number of days since 01 Jan 0000. This day is given the number 1. The next day (2 Jan 0000) is assigned the number 2 and so forth. The 22 Sept 2012 is 735134. If you have time information included into your dates, that will be reflected in fractions. For example, '22-Sept-2012 12:00:00' is represented by 735134.5, as it is half way through the day.
  statement1;
 
  statement2;
 
  ...
 
  elseif condition2
 
  statement1a;
 
  statement2a;
 
  ...
 
  ...
 
  ...
 
  elseif conditionN
 
  statement1b;
 
  statement2b;
 
  ...
 
  else
 
  statement1c;
 
  statement2c;
 
  ...
 
  end</pre>
 
In this case, however, you have to ensure that are mutually disjoint. As an example, you might think about different actions depending on your final grade. Say, condition1: ; condition 2: ; condition 3: ; etc.
 
  
MATLAB has two statements that creates a loop. First, it is unconditional loop:
+
# The <source enclose=none>datevec</source> format: Here a date is transformed into a (1 x 6) vector where the first element represents the year, the second the month, the third the day, the fourth the hour, the fith the minutes and the sixth the seconds. If you only enter a day, the last three entries will take the value 0.
  
<pre>for CounterVariable=[range of values]
+
It is a great feature of MATLAB that it recognises a good number of different date formats and translates them easily into one of the two MATLAB formats. Importantly, if you open an EXCEL file (see below) and your spreadsheet contains dates formatted as EXCEL dates, MATLAB will automatically translate these dates into the <source enclose=none>datenum</source> format if you are using a MATLAB version R2012a or higher.
statement1;
 
statement2;
 
...
 
end</pre>
 
It repeats at most as many times as many elements it has in the . If range of values is empty, this loop does not run. Say, if you define a range , MATLAB creates an empty range. Thus, this loop will not be executed. If you define a range , MATLAB creates a range of four values , and the loop runs four times. During the first iteration, , during the second , etc. After the end of the loop . Please note, this is very unwise to modify the counter inside the loop. All modifications will disappear after the next iteration. Please also note, that the values in the range could be anything, including filenames or matrices from a cell vector. There are two commands that can modify an execution of the loop. breaks the current ''iteration'' of the loop. Once it is observed, the loop continues skipping current iteration. stops the execution of the loop and your program continues after this point. These commands are used inside statements. Say, skips the loop iteration for .
 
  
Second, it is a conditional loop
+
= Data Sources =
  
<pre>while condition
+
By far the easiest formats to import data into MATLAB are Excel spreadsheets and csv files. When dealing with these type of files you can double click on the file you want to import in the "Current Directory" window. A window will then open and give you a preview of the data MATLAB will import. How exactly this wimdow looks like and what translation MATLAB does automatically now depens on the MATLAB version you are working with.
statement1;
 
statement2;
 
...
 
end</pre>
 
This version of loop executes statements as long as is true. If is always true, your loop runs forever.
 
  
== loop ==
+
== EXCEL files ==
  
A standard application for loop is a reconstruction of AR(p) series once AR(p) coefficients and a vector of error terms in known. <math>y_t=\phi_0+\sum_{i=1}^p \phi_i y_{t-i}+e_t.!</math> For simplicity, we assume that <math>p=1</math>. Also, to be able to compute <math>y_1</math>, we need to provide <math>y_0</math>. Since we don’t know <math>y_0</math>,the best guess for <math>y_0</math> is <math>E(y_0)</math>. For stationary AR(1) process, that is for the case <math>|\phi_1|<1</math>, <math>E(y_0)=\phi_0/(1-\phi_1)!</math>. Thus, knowing <math>y_0</math> and <math>e_t</math> for <math>t=1,\ldots,T</math>, we can reconstruct <math>y_t,\ t=1\ldots,T</math>:
+
=== MATLAB R2011 and lower ===
  
:<math>
+
Generally MATLAB will generate the data in two matrices, one for the columns that MATLAB recognised as numerical data (often called <source enclose=none>data</source>) and another that contains all non-numerical spreadsheet entries (often called <source enclose=none>textdata</source>). You can then use these data matrices for any further operations. Date cells will end up in the <source enclose=none>textdata</source> matrix and you will have to convert them manually to the MATLAB dates files (see below).
y_1=&\phi_0+\phi_1 y_0+e_1\\
 
y_2=&\phi_0+\phi_1 y_1+e_2\\
 
&\ldots\\
 
y_t=&\phi_0+\phi_1 y_{t-1}+e_t\\
 
&\ldots\\
 
y_T=&\phi_0+\phi_1 y_{T-1}+e_T</math>
 
  
Definitely, if you are patient enough and <math>T</math> is not very large, you can create your m file with <math>T</math> lines in it. However, once <math>T</math> is unknown, this approach would not work. Fortunately, there is a better alternative for this type of operations. All these computations can be summarized using the following algorithm:
+
Instead of double clicking on a file in the "Current Directory" window you can also incorporate the data import into a piece of MATLAB script. This is done with the following command:
  
# Find a length of a vector of error terms :
+
    [NUM,TXT,RAW] = xlsread('MSFT.xlsx')  
# Initialize a vector of the same length as vector :
 
# Compute . Please remember, we assume that <math>y_0=E(y)=\phi_0/(1-\phi_1)</math>
 
# Compute for <math>i=2</math>
 
# Repeat line 4 for <math>i=3,...,T</math>
 
  
Assuming vector is known in advance, the MATLAB code is
+
which will save the content of the EXCEL file ([[media:MSFT.xlsx]]) into three variables. The complete spreadsheet is saved in <source enclose=none>RAW</source>, all numerical entries are in <source enclose=none>NUM</source> and all text entries are in <source enclose=none>TXT</source>. This command works if the data you want to import are on the first sheet in the EXCEL file. If you have several shhets in your EXCEL file and you want to import the data of a particular sheet you can achieve this by means of:
  
<pre>  T=size(e,1);
+
    [NUM,TXT,RAW] = xlsread('MSFT.xlsx','MSFT')  
  y=zeros(T,1);
 
  y0=phi0/(1-phi1);
 
  y(1)=phi0+phi1*y0+e(1);
 
  for i=2:T
 
    y(i)=phi0+phi1*y(i-1)+e(i);
 
  end</pre>
 
However, if , <math>E(y_t)</math> is not constant. In this situation the formula we use in the code does not work and will create either a series of <math>\pm\infty</math>, if or a series of not a numbers , if <ref>There are two special numerical values in MATLAB. One is infinity , and another is not a number . A value of a variable becomes if the number is too big in absolute value (<math>\approx \pm 2e308</math>). Also, infinity is generated once you have expressions like <math>x/0</math>, where <math>x\ne0</math>. After that, infinity can only change a sign or become not a number. Not a number appears when there is an uncertainty of a kind of <math>0/0</math>, <math>\infty-\infty</math> and such. Any algebraic operations with result
 
</ref>.
 
  
==  or  ==
+
where the data here are on the sheet called 'MSFT'. Also consult <source enclose=none>doc xlsread</source> for more details on this import function.
  
To avoid these inconveniences, we have to consider separately two cases:
+
=== MATLAB R2012 and higher ===
  
# AR(1) process is stationary, i.e. <math>|\phi_1|<1</math>
+
Importing EXCEL data works in the same way as for earlier versions if you use the command line. The difference occurs if you import data by double clicking on the file. Now MATLAB will automatically recognise if any of the input data are dates and if it finds any suggest that these are imported as <source enclose=none>datenum</source> type data.
# AR(1) process is nonstationary, i.e.  <math>|\phi_1|\ge 1</math>
 
  
For the latter, we have to acknowledge the fact that <math>E(y_t)=\mu_t</math>, i.e. unconditional expectation is a function of time. In this case we have to set <math>E(y_0)</math> to some value. A standard assumption for non-stationary series is to assume that <math>E(y_0)=0</math>.
+
Often MATLAB will get the dates right, but there are situations where MATLAB does not get the date-info correctly. One reason for this can be that the MATLAB and Excel are using different reference days. As mentioned above, day 1 in MATLAB is 1 Jan 0000, whereas day one in Excel is 1 Jan 1900. It is prudent to check that MATLAB has done the right thing.
  
The algorithm in this case would look like:
+
Here is some guidance on how to do that. Let's say you are importing a dataset where you know that the first observation is for 22 Sept 2012. Once the data are in MATLAB check what the value for the first date is. If it is still in text format, i.e. "22/9/2012", then you know that you are yet to translate it into a MATLAB date format. However, if you encounter the number 735134, then you know that MATLAB has already translated the date into its own format. Whether it has done so correctly you can check using
  
# Find a length of a vector of error terms :
+
<source>>> datevec(735134)
# Initialize a vector of the same length as vector :
+
ans =
# Check whether . If this statement is true, then . Else, . Please remember, we set <math>y_0=E(y_0)</math>.
+
        2012          9          22          0          0          0
# Compute .
+
</source>
# Compute for <math>i=2</math>
 
# Repeat line 4 for <math>i=3,...,T</math>
 
  
Assuming vector is known in advance, the MATLAB code is
+
which returns the correct date (in the "YYYY MM DD hh mm ss" format). If however MATLAB has the number 41174 as an entry, then the translation did not take the different reference date into account. The number is the Excel date number (as 22 Sept 2012 is the 41174th day when starting with 1 Jan 1900).
  
<pre> T=size(e,1);
+
How can you correct this? Fortunately the correction is pretty easy. Lets assume that your dates are in a vector called <source enclose=none>dateold</source>, then you can re-base the date vector as follows:  
  y=zeros(T,1);
 
  if abs(phi1)&lt;1
 
  y0=phi0/(1-phi1);
 
  else
 
  y0=0;
 
  end
 
  y(1)=phi0+phi1*y0+e(1)
 
  for i=2:T
 
    y(i)=phi0+phi1*y(i-1)+e(i);
 
  end</pre>
 
If you don’t like word else, you can skip it:
 
  
<pre> T=size(e,1);
+
<source> datenew = dateold + (735134-41174# rebasing date vector
  y=zeros(T,1);
+
</source>
  y0=0;
 
  if abs(phi1)&lt;1
 
  y0=phi0/(1-phi1);
 
   end
 
  y(1)=phi0+phi1*y0+e(1)
 
  for i=2:T
 
    y(i)=phi0+phi1*y(i-1)+e(i);
 
  end</pre>
 
==  loop ==
 
  
An alternative way of running the same code is to use a conditional loop (purely for demonstration purposes). Usually conditional loop is used when the number of iterations is not known in advance.
+
When you do that, it is vital that you add a comment to the code to remember why you did that. Otherwise it will the reason for a lot of head-scratching if you open the code a few months later.
  
# Find a length of a vector of error terms :
+
== CSV files ==
# Initialize a vector of the same length as vector :
 
# Check whether . If this statement is true, then . Else, . Please remember, we set <math>y_0=E(y_0)</math>.
 
# Compute .
 
# Compute for <math>i=2</math>
 
# Increase i by 1, i.e. <math>i=i+1</math> (please note, in programming this is not a stupid statement,
 
# Repeat line 4 while <math>i<=T</math>
 
  
Assuming vector is known in advance, the MATLAB code is
+
To import csv files you can also double click on the file in the "Current Directory" window. In this case MATLAB will basically do exactly the same as if the csv file was a xlsx file (see above). You can also import data from a csv file through a command line.
  
<pre>  T=size(e,1);
+
     DATA = csvread('MSFT.csv')  
  y=zeros(T,1);
 
  if abs(phi1)&lt;1
 
  y0=phi0/(1-phi1);
 
  else
 
  y0=0;
 
  end
 
  y(1)=phi0+phi1*y0+e(1)
 
  i=2;
 
  while i&lt;=T
 
     y(i)=phi0+phi1*y(i-1)+e(i);
 
    i=i+1;
 
  end</pre>
 
== Imperfect substitutes of the above ==
 
  
MATLAB has two powerful tools that make programmer’s life much easier and utilization of loops/if less frequent. In addition, quite often it makes the code run faster. In particular,
+
If you execute this line with this LINK TO THIS FILE!!!! 'MSFT.csv' file you will actually get an error message. This file contains the date information in the first column and the <source enclose=none>csvread</source> function does not really know how to handle these. To avoid this, you can tell MATLAB to only import a certain section of the spreadsheet. For instance, if you want it to not import the first row and first column (which in this partivular case contains the non-numeric information) you can use
  
# Logical expressions work not only on scalars, but also on vectors, matrices and, in general, on n-dimensional arrays.
+
    DATA = csvread('MSFT.csv',1,1)
# Subvectors/submatrices can be extracted using logical 0-1 arrays.
 
  
=== Irrelevant but useful example ===
+
where the first '1' indicates that MATLAB should ignore one row at the top of the sheet and the second '1' instructs MATLAB to ignore one column at the left of the sheet. Also consult <source enclose=none>doc csvread</source> for more details on this import function.
 +
However, if you need the date information it is easier to first save the file as an Excel file and then import into MATLAB.
  
typing in MATLAB command window create a <math>1\times5</math> row-vector with values <math>[1\ 2\ 3\ 4\ 5]</math>. Logical expression will create a so called logical vector with values <math>[0\ 0\ 0\ 1\ 1]</math>, i.e. it is 1 if the according element is greater than 3.5 and 0 otherwise. Now, typing will generate a <math>2\times1</math> subvector with values <math>[4\ 5]</math>. You can also create some vectors or matrices with specific values changed: a command replace the last two values of the original vector . As a result, the vector becomes <math>[1 \ 2\ 3\ 8\ 10]</math>.
+
It is often easier to use the <source enclose=none>xlsread</source> function with csv files. That usually works in a straightforward manner.
  
=== Slightly less irrelevant example ===
+
= Dealing with Date vectors =
  
In some occasions you would like to modify a matrix of interest. Say, in some surveys “no answer” is coded as 999. Once you import the whole dataset in , you might want to replace these with, say, NaN. It can be done for the whole matrix of interest: .
+
As mentioned above there are two date formats that can be used in MATLAB. Depending on what you want to achieve one or the other may be easier to work with. The good thing is that you can translate back and forth between the two formats at will. Let's try that using the following date: 30 April 1777, the birthday of Carl Friedrich Gauss.
  
=== Relevant example ===
+
<source>a = datenum('30.04.1777','dd.mm.yyyy')
 +
a = 649157</source>
  
To demonstrate these capabilities in a more relevant environment, let’s run a very simple example. Assume that we have <math>T\times1</math> vector of returns and want to
+
Hence CF Gauss was born on the 649,157th day if you start counting on 1 Jan 0000. You can see from the above line that the second input <source enclose=none>'dd.mm.yyyy'</source> told MATLAB where to find the day, month and year information. If you are more familiar with the American way of formatting dates you could have achieved the same with:
  
# Compute number of positive, negative and zero returns
+
<source>a = datenum('04/30/1777','mm/dd/yyyy')
# Compute means of positive and negative returns
+
a = 649157</source>
  
The algorithm for this is quite straightforward:
+
If you preferred to use the <source enclose=none>datevec</source> format, you could have done the following (different date formats are used in exactly the same way).
  
# Find out a length of vector , T
+
<source>b = datevec('04/30/1777','mm/dd/yyyy')
# Initiate three counter variables, , and vectors (since we don’t know how many negative and positive returns we will observe
+
b = 1777          4          30          0          0          0</source>
# Check whether r(i) is greater, smaller or equal to 0 for i=1
 
# If , add 1 to Tplus, set ;
 
# Else if add 1 to Tminus, set ;
 
# Else add 1 to Tzero
 
# Repeat 3-6 for <math>i=2,\ldots,T</math>
 
# Remove excessive zeros from and :<br />
 
  
# Compute means of rminus and rplus. Number of positive, negative and zero returns are stored in
+
Just to make sure that both formats understand each other you could type
  
MATLAB translation:
+
<source>datenum(b)
 +
ans = 649157</source>
  
<pre>T=size(r,1);
+
So far we always used one particular date as input, but both these functions can also deal with vectors as input, which is exactly what MATLAB does when it imports a column with dates from an EXCEL sheet. There is a note of caution when working with dates. As mentioned above there are different conventions as far as the ordering between days and months are concerned. You need to make sure that you understand which convention was used in your data source file and adjust the code accordingly. As demonstrated above, MATLAB can easily deal with these as long as you instruct MATLAB accordingly. MATLAB's default date format is mm-dd-yyyy.
Tplus=0;Tminus=0;Tzero=0;
 
rplus=zeros(T,1);rminus=zeros(T,1);
 
for i=1:T
 
    if r(i)&gt;0
 
        Tplus=Tplus+1;%increasing Tplus by one if return is positive
 
        rplus(Tplus)=r(i);%storing positive return in the proper subvector
 
    elseif r(i)&lt;0
 
        Tminus=Tminus+1;%increasing Tminus by one if return is negative
 
        rminus(Tminus)=r(i);%storing negative return in the proper subvector
 
    else
 
        Tzero=Tzero+1;%increasing Tzero by one if return is neither positive nor negative
 
    end
 
end
 
rplus=rplus(1:Tplus);%removing excessive zeros from a subvector of positive returns
 
rminus=rminus(1:Tminus);%removing excessive zeros from a subvector of negative returns
 
meanplus=mean(rplus);%computing mean of positive returns
 
meanminus=sum(rminus)/Tminus;%computing mean of negative returns</pre>
 
Using MATLAB capabilities mentioned in this section, the algorithm can be reduced to:
 
  
# Construct a vector that has 1 for positive returns and 0 for negative returns
+
== Useful date operations ==
# Construct a vector that has 1 for negative returns and 0 for positive returns
 
# Assign to a sum of elements of . This is a number of positive returns
 
# Assign to a sum of elements of . This is a number of negative returns
 
# Compute which is
 
# Construct a vector of positive returns and compute its mean
 
# Construct a vector of negative returns and compute its mean
 
  
MATLAB implementation:
+
Date information can be extremely useful. For instance you could ask how many days ago CF Gauss was born (assuming that today is 29 September 2012):
  
<pre> T=size(r,1);
+
<source>now = date;   % defines now as the current day
  indplus  = r&gt;0;%constructing an indicator vector with 1s if r(i)&gt;0, 0 otherwise
+
datenum(now) - a
  indminus = r&lt;0;%constructing an indicator vector with 1s if r(i)&lt;0, 0 otherwise
+
ans = 85979</source>
  Tplus=sum(indplus);%computing a number of positive returns
 
  Tminus=sum(indminus);%computing a number of negative returns
 
  Tzero=T-Tplus-Tminus;%computing a number of zero returns
 
  rplus=r(indplus);%constructing a vector of positive returns
 
  rminus=r(indminus);%constructing a vector of negative returns
 
  meanplus=sum(rplus)/Tplus; %computing mean of positive returns
 
  meanminus=mean(rminus); %computing mean of negative returns</pre>
 
Or, a slightly shorter version of the same thing
 
  
<pre> T=size(r,1);
+
Here we used the MATLAB function <source enclose=none>date</source> which will retrieve the current date. Now you could ask what weekday the 30 April 1777 was.
  rplus  = r(r&gt;0);%constructing a vector of positive returns
 
  rminus = r(r&lt;0);%%constructing a vector of negative returns
 
  Tplus=size(rplus,1);%computing a number of positive returns
 
  Tminus=size(indminus,l);%computing a number of negative returns
 
  Tzero=T-Tplus-Tminus;%computing a number of zero returns
 
  meanplus=sum(rplus)/Tplus; %computing mean of positive returns
 
  meanminus=mean(rminus); %computing mean of negative returns</pre>
 
This way you write a code that is shorter, less prone to errors and easier to read (at least after some practice).
 
  
<references />
+
<source>weekday(a)    % use a datenum date as input
 +
ans = 4</source>
 +
 
 +
The answer 4 indicates that this particular genius was born on a Wednesday (1 = Sunday, 2 = Monday, etc.).
 +
 
 +
= Example Video =
 +
 
 +
This video clip demonstrates the import of data from an Excel file and some of the basic date formatting. [http://youtu.be/jyb68zGM2ik ExampleClip]

Latest revision as of 14:02, 10 November 2014

Introduction

Usually you will have data saved in some file, like Excel files, csv (comma seperated values) file or a text file. These are the most common formats in which you can download data from various databases.

Before we can look at the data upload procedure it is important to have a very quick look at the way in which MATLAB deals with dates.

Date formats

When dealing with time series data you will often want to keep date information. MATLAB has excellent date functionality and dealing with data is indeed one of its strengths. There are two principle formats in which MATLAB handles dates, the datenum and the datevec format

  1. The datenum format: Here MATLAB records date information in terms of the number of days since 01 Jan 0000. This day is given the number 1. The next day (2 Jan 0000) is assigned the number 2 and so forth. The 22 Sept 2012 is 735134. If you have time information included into your dates, that will be reflected in fractions. For example, '22-Sept-2012 12:00:00' is represented by 735134.5, as it is half way through the day.
  1. The datevec format: Here a date is transformed into a (1 x 6) vector where the first element represents the year, the second the month, the third the day, the fourth the hour, the fith the minutes and the sixth the seconds. If you only enter a day, the last three entries will take the value 0.

It is a great feature of MATLAB that it recognises a good number of different date formats and translates them easily into one of the two MATLAB formats. Importantly, if you open an EXCEL file (see below) and your spreadsheet contains dates formatted as EXCEL dates, MATLAB will automatically translate these dates into the datenum format if you are using a MATLAB version R2012a or higher.

Data Sources

By far the easiest formats to import data into MATLAB are Excel spreadsheets and csv files. When dealing with these type of files you can double click on the file you want to import in the "Current Directory" window. A window will then open and give you a preview of the data MATLAB will import. How exactly this wimdow looks like and what translation MATLAB does automatically now depens on the MATLAB version you are working with.

EXCEL files

MATLAB R2011 and lower

Generally MATLAB will generate the data in two matrices, one for the columns that MATLAB recognised as numerical data (often called data) and another that contains all non-numerical spreadsheet entries (often called textdata). You can then use these data matrices for any further operations. Date cells will end up in the textdata matrix and you will have to convert them manually to the MATLAB dates files (see below).

Instead of double clicking on a file in the "Current Directory" window you can also incorporate the data import into a piece of MATLAB script. This is done with the following command:

   [NUM,TXT,RAW] = xlsread('MSFT.xlsx') 

which will save the content of the EXCEL file (media:MSFT.xlsx) into three variables. The complete spreadsheet is saved in RAW, all numerical entries are in NUM and all text entries are in TXT. This command works if the data you want to import are on the first sheet in the EXCEL file. If you have several shhets in your EXCEL file and you want to import the data of a particular sheet you can achieve this by means of:

   [NUM,TXT,RAW] = xlsread('MSFT.xlsx','MSFT') 

where the data here are on the sheet called 'MSFT'. Also consult doc xlsread for more details on this import function.

MATLAB R2012 and higher

Importing EXCEL data works in the same way as for earlier versions if you use the command line. The difference occurs if you import data by double clicking on the file. Now MATLAB will automatically recognise if any of the input data are dates and if it finds any suggest that these are imported as datenum type data.

Often MATLAB will get the dates right, but there are situations where MATLAB does not get the date-info correctly. One reason for this can be that the MATLAB and Excel are using different reference days. As mentioned above, day 1 in MATLAB is 1 Jan 0000, whereas day one in Excel is 1 Jan 1900. It is prudent to check that MATLAB has done the right thing.

Here is some guidance on how to do that. Let's say you are importing a dataset where you know that the first observation is for 22 Sept 2012. Once the data are in MATLAB check what the value for the first date is. If it is still in text format, i.e. "22/9/2012", then you know that you are yet to translate it into a MATLAB date format. However, if you encounter the number 735134, then you know that MATLAB has already translated the date into its own format. Whether it has done so correctly you can check using

>> datevec(735134)
ans =
        2012          9          22           0           0           0

which returns the correct date (in the "YYYY MM DD hh mm ss" format). If however MATLAB has the number 41174 as an entry, then the translation did not take the different reference date into account. The number is the Excel date number (as 22 Sept 2012 is the 41174th day when starting with 1 Jan 1900).

How can you correct this? Fortunately the correction is pretty easy. Lets assume that your dates are in a vector called dateold, then you can re-base the date vector as follows:

 datenew = dateold + (735134-41174)   # rebasing date vector

When you do that, it is vital that you add a comment to the code to remember why you did that. Otherwise it will the reason for a lot of head-scratching if you open the code a few months later.

CSV files

To import csv files you can also double click on the file in the "Current Directory" window. In this case MATLAB will basically do exactly the same as if the csv file was a xlsx file (see above). You can also import data from a csv file through a command line.

   DATA = csvread('MSFT.csv') 

If you execute this line with this LINK TO THIS FILE!!!! 'MSFT.csv' file you will actually get an error message. This file contains the date information in the first column and the csvread function does not really know how to handle these. To avoid this, you can tell MATLAB to only import a certain section of the spreadsheet. For instance, if you want it to not import the first row and first column (which in this partivular case contains the non-numeric information) you can use

   DATA = csvread('MSFT.csv',1,1) 

where the first '1' indicates that MATLAB should ignore one row at the top of the sheet and the second '1' instructs MATLAB to ignore one column at the left of the sheet. Also consult doc csvread for more details on this import function. However, if you need the date information it is easier to first save the file as an Excel file and then import into MATLAB.

It is often easier to use the xlsread function with csv files. That usually works in a straightforward manner.

Dealing with Date vectors

As mentioned above there are two date formats that can be used in MATLAB. Depending on what you want to achieve one or the other may be easier to work with. The good thing is that you can translate back and forth between the two formats at will. Let's try that using the following date: 30 April 1777, the birthday of Carl Friedrich Gauss.

a = datenum('30.04.1777','dd.mm.yyyy')
a = 649157

Hence CF Gauss was born on the 649,157th day if you start counting on 1 Jan 0000. You can see from the above line that the second input 'dd.mm.yyyy' told MATLAB where to find the day, month and year information. If you are more familiar with the American way of formatting dates you could have achieved the same with:

a = datenum('04/30/1777','mm/dd/yyyy')
a = 649157

If you preferred to use the datevec format, you could have done the following (different date formats are used in exactly the same way).

b = datevec('04/30/1777','mm/dd/yyyy')
b = 1777           4          30           0           0           0

Just to make sure that both formats understand each other you could type

datenum(b)
ans = 649157

So far we always used one particular date as input, but both these functions can also deal with vectors as input, which is exactly what MATLAB does when it imports a column with dates from an EXCEL sheet. There is a note of caution when working with dates. As mentioned above there are different conventions as far as the ordering between days and months are concerned. You need to make sure that you understand which convention was used in your data source file and adjust the code accordingly. As demonstrated above, MATLAB can easily deal with these as long as you instruct MATLAB accordingly. MATLAB's default date format is mm-dd-yyyy.

Useful date operations

Date information can be extremely useful. For instance you could ask how many days ago CF Gauss was born (assuming that today is 29 September 2012):

now = date;    % defines now as the current day
datenum(now) - a
ans = 85979

Here we used the MATLAB function date which will retrieve the current date. Now you could ask what weekday the 30 April 1777 was.

weekday(a)    % use a datenum date as input
ans = 4

The answer 4 indicates that this particular genius was born on a Wednesday (1 = Sunday, 2 = Monday, etc.).

Example Video

This video clip demonstrates the import of data from an Excel file and some of the basic date formatting. ExampleClip