Difference between revisions of "LoadingData"

From ECLR
Jump to: navigation, search
(MATLAB R2011 and lower)
(MATLAB R2011 and lower)
Line 29: Line 29:
 
     [NUM,TXT,RAW] = xlsread('MSFT.xlsx')  
 
     [NUM,TXT,RAW] = xlsread('MSFT.xlsx')  
  
which will save the content of the EXCEL file ([[media:MSFT.xlsx]]) into three files. The complete spreadsheet is saved in <source enclose=none>RAW</source>, all numerical entries are in <source enclose=none>NUM</source> and all text entries are in <source enclose=none>TXT</source>. This command works if the data you want to import are on the first sheet in the EXCEL file. If you have several shhets in your EXCEL file and you want to import the data of a particular sheet you can achieve this by means of:
+
which will save the content of the EXCEL file ([[media:MSFT.xlsx]]) into three variables. The complete spreadsheet is saved in <source enclose=none>RAW</source>, all numerical entries are in <source enclose=none>NUM</source> and all text entries are in <source enclose=none>TXT</source>. This command works if the data you want to import are on the first sheet in the EXCEL file. If you have several shhets in your EXCEL file and you want to import the data of a particular sheet you can achieve this by means of:
  
 
     [NUM,TXT,RAW] = xlsread('MSFT.xlsx','MSFT')  
 
     [NUM,TXT,RAW] = xlsread('MSFT.xlsx','MSFT')  

Revision as of 12:25, 8 October 2013

Introduction

Usually you will have data saved in some file, like Excel files, csv (comma seperated values) file or a text file. These are the most common formats in which you can download data from various databases.

Before we can look at the data upload procedure it is important to have a very quick look at the way in which MATLAB deals with dates.

Date formats

When dealing with time series data you will often want to keep date information. MATLAB has excellent date functionality and dealing with data is indeed one of its strengths. There are two principle formats in which MATLAB handles dates, the datenum and the datevec format

  1. The datenum format: Here MATLAB records date information in terms of the number of days since 01 Jan 0000. This day is given the number 1. The next day (2 Jan 0000) is assigned the number 2 and so forth. The 22 Sept 2012 is 735134. If you have time information included into your dates, that will be reflected in fractions. For example, '22-Sept-2012 12:00:00' is represented by 735134.5, as it is half way through the day.
  1. The datevec format: Here a date is transformed into a (1 x 6) vector where the first element represents the year, the second the month, the third the day, the fourth the hour, the fith the minutes and the sixth the seconds. If you only enter a day, the last three entries will take the value 0.

It is a great feature of MATLAB that it recognises a good number of different date formats and translates them easily into one of the two MATLAB formats. Importantly, if you open an EXCEL file (see below) and your spreadsheet contains dates formatted as EXCEL dates, MATLAB will automatically translate these dates into the datenum format if you are using a MATLAB version R2012a or higher.

Data Sources

By far the easiest formats to import data into MATLAB are Excel spreadsheets and csv files. When dealing with these type of files you can double click on the file you want to import in the "Current Directory" window. A window will then open and give you a preview of the data MATLAB will import. How exactly this wimdow looks like and what translation MATLAB does automatically now depens on the MATLAB version you are working with.

EXCEL files

MATLAB R2011 and lower

Generally MATLAB will generate the data in two matrices, one for the columns that MATLAB recognised as numerical data (often called data) and another that contains all non-numerical spreadsheet entries (often called textdata). You can then use these data matrices for any further operations. Date cells will end up in the textdata matrix and you will have to convert them manually to the MATLAB dates files (see below).

Instead of double clicking on a file in the "Current Directory" window you can also incorporate the data import into a piece of MATLAB script. This is done with the following command:

   [NUM,TXT,RAW] = xlsread('MSFT.xlsx') 

which will save the content of the EXCEL file (media:MSFT.xlsx) into three variables. The complete spreadsheet is saved in RAW, all numerical entries are in NUM and all text entries are in TXT. This command works if the data you want to import are on the first sheet in the EXCEL file. If you have several shhets in your EXCEL file and you want to import the data of a particular sheet you can achieve this by means of:

   [NUM,TXT,RAW] = xlsread('MSFT.xlsx','MSFT') 

where the data here are on the sheet called 'MSFT'. Also consult doc xlsread for more details on this import function.

MATLAB R2012 and higher

Importing EXCEL data works in the same way as for earlier versions if you use the command line. The difference occurs if you import data by double clicking on the file. Now MATLAB will automatically recognise if any of the input data are dates and if it finds any suggest that these are imported as datenum type data.

CSV files

To import csv files you can also double click on the file in the "Current Directory" window. In this case MATLAB will basically do exactly the same as if the csv file was a xlsx file (see above). You can also import data from a csv file through a command line.

   DATA = csvread('MSFT.csv') 

If you execute this line with this LINK TO THIS FILE!!!! 'MSFT.csv' file you will actually get an error message. This file contains the date information in the first column and the csvread function does not really know how to handle these. To avoid this, you can tell MATLAB to only import a certain section of the spreadsheet. For instance, if you want it to not import the first row and first column (which in this partivular case contains the non-numeric information) you can use

   DATA = csvread('MSFT.csv',1,1) 

where the first '1' indicates that MATLAB should ignore one row at the top of the sheet and the second '1' instructs MATLAB to ignore one column at the left of the sheet. Also consult doc csvread for more details on this import function. However, if you need the date information it is easier to first save the file as an Excel file and then import into MATLAB.

Dealing with Date vectors

As mentioned above there are two date formats that can be used in MATLAB. Depending on what you want to achieve one or the other may be easier to work with. The good thing is that you can translate back and forth between the two formats at will. Let's try that using the following date: 30 April 1777, the birthday of Carl Friedrich Gauss.

a = datenum('30.04.1777','dd.mm.yyyy')
a = 649157

Hence CF Gauss was born on the 649,157th day if you start counting on 1 Jan 0000. You can see from the above line that the second input 'dd.mm.yyyy' told MATLAB where to find the day, month and year information. If you are more familiar with the American way of formatting dates you could have achieved the same with:

a = datenum('04/30/1777','mm/dd/yyyy')
a = 649157

If you preferred to use the datevec format, you could have done the following (different date formats are used in exactly the same way).

b = datevec('04/30/1777','mm/dd/yyyy')
b = 1777           4          30           0           0           0

Just to make sure that both formats understand each other you could type

datenum(b)
ans = 649157

So far we always used one particular date as input, but both these functions can also deal with vectors as input, which is exactly what MATLAB does when it imports a column with dates from an EXCEL sheet. There is a note of caution when working with dates. As mentioned above there are different conventions as far as the ordering between days and months are concerned. You need to make sure that you understand which convention was used in your data source file and adjust the code accordingly. As demonstrated above, MATLAB can easily deal with these as long as you instruct MATLAB accordingly. MATLAB's default date format is mm-dd-yyyy.

Useful date operations

Date information can be extremely useful. For instance you could ask how many days ago CF Gauss was born (assuming that today is 29 September 2012):

now = date;    % defines now as the current day
datenum(now) - a
ans = 85979

Here we used the MATLAB function date which will retrieve the current date. Now you could ask what weekday the 30 April 1777 was.

weekday(a)    % use a datenum date as input
ans = 4

The answer 4 indicates that this particular genius was born on a Wednesday (1 = Sunday, 2 = Monday, etc.).

Example Video

This video clip demonstrates the import of data from an Excel file and some of the basic date formatting. ExampleClip