SavingData

From ECLR
Revision as of 14:28, 25 September 2012 by Admin (talk | contribs)

Saving Data and Results

When you do any econometric work you will have some results to show for it. These may be estimated parameters, residuals, simulated data, etc. This section discusses how to save such results. This is especially important if your computations are lengthy, as repeating the calculations may take too long. In general there are three ways in which you can save results.

  1. Save the MATLAB workspace or elements thereof. You will need MATLAB to access the results.
  2. Save matrices in an EXCEL file. Obviously you will need EXCEL (or any other spreadsheet software) to access the saved results. This may be of advantage if you are planning to report results via EXCEL tables.
  3. Save some output in a text file. You can then access these results with any software that can read text.

From the above it is clear that your decision should be based on how you will need to read the results afterwards. To say this from the outset, the preferred method is to save the MATLAB workspace. The other two options have so many restrictions that they should only be contemplated when you know that you will need to read the results without access to MATLAB.

Save MATLAB workspace

It was discussed in the introductory module that the MATLAB workspace holds all the variables you create in the course of a program. At any stage in the code you can save that entire workspace using the following simple command:

save results

Doing this will create a new file called results.mat in your current directory. This is a MATLAB-specific format. The next time you open MATLAB and execute

load results

your MATLAB workspace will be populated with all the variables that were saved in results.mat and they have exactly the same names as they used to have previously. You can use the load command in another piece of code and in this way you can use the elements in results.mat for further calculations.
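A minimal sketch of this round trip follows. The variable names beta and resid and the file name results_demo are made up for illustration. Loading into a struct with S = load(...) is an optional variation that avoids overwriting variables already in the workspace.

```matlab
% Hypothetical results to store
beta  = [0.5; 1.2];
resid = [0.1 -0.2 0.05];

save results_demo          % writes every workspace variable to results_demo.mat
clear beta resid           % remove the two variables from the workspace
load results_demo          % beta and resid are back under their old names

% Variation: collect the file contents in a struct instead
S = load('results_demo.mat');
disp(S.beta)
```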

You will soon learn that in the course of a reasonably complex program you accumulate a large number of variables, many of which you are not really interested in but which were needed for preliminary calculations. If you use the save command as described above, all these variables will also end up in results.mat. It will often be useful to be more selective and to specify clearly which of your many variables you actually want to save for posterity. Say the variables x, y and z contain the results you want to save, then you should use the following command

save results x y z

which saves these three variables in results.mat but nothing else. It is also useful to know that you can use the wildcard character * in that list of variables. Say you want to save the five variables x1, x2, x3, x4 and x5 (and there are no other variables whose names start with x), then you can achieve this with

save results x*
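To check what a selective save actually stored, you can list the contents of the file without loading it. The sketch below assumes a made-up file name results_sel.mat and a few hypothetical variables. It uses the function form save('file', 'pattern'), which is equivalent to the command form shown above.

```matlab
% Hypothetical variables: five results plus two we do not want to keep
x1 = 1; x2 = 2; x3 = 3; x4 = 4; x5 = 5;
tmp = rand(3);
y = 10;

save('results_sel.mat', 'x*')    % same as: save results_sel x*

% List the variables stored in the file without loading them
info = whos('-file', 'results_sel.mat');
savedNames = {info.name};
disp(savedNames)
```

The pattern x* matches every variable whose name begins with x, so a listing like this is a quick way to spot variables that were saved unintentionally.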

Output to EXCEL file

This method may be useful if you know that you will use the MATLAB results to create a table in EXCEL. Let's say you have two matrices which you want to store, res1 and res2. You could, in a fairly straightforward way, store each matrix in its own EXCEL worksheet.

xlswrite('test.xlsx', res1, 'res1');
xlswrite('test.xlsx', res2, 'res2');

The commands here have three inputs: xlswrite('FILENAME.xlsx', VARNAME, 'EXCELSHEETNAME'), where VARNAME is the MATLAB variable you want to save and EXCELSHEETNAME is the name you want to give to the EXCEL sheet that should store the variable. This will create an EXCEL file with two new sheets named "res1" and "res2" containing the matrices res1 and res2 respectively.

You could also save the two matrices into the same worksheet, but then you need to be precise about where on the sheet you want to save each of the two. You should also write the variable names to the sheet, so that you can keep track of which data come from which variable. Say the variable res1 has dimension [math](3\times2)[/math] and res2 is a [math](2\times4)[/math] matrix. Then you could do the following:

xlswrite('test.xlsx', {'res1'}, 'Sheet1', 'A1:A1');     % Saves Name
xlswrite('test.xlsx', res1, 'Sheet1', 'A2:B4');
xlswrite('test.xlsx', {'res2'}, 'Sheet1', 'A6:A6');     % Saves Name
xlswrite('test.xlsx', res2, 'Sheet1', 'A7:D8');

These lines contain two small novelties compared to the previous commands. First, xlswrite has a fourth input, which specifies the cell range into which the variable is to be written. In line 2, for example, the cell reference is 'A2:B4', which refers to a [math](3\times2)[/math] block of cells in EXCEL, exactly the correct size for res1. Second, in line 1 the second input into the xlswrite command is {'res1'}. This is not the variable res1, but the piece of text "res1" that is written into cell A1.
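In MATLAB releases from R2019a onwards the documentation marks xlswrite as not recommended and points to writematrix and writecell instead. The following is a sketch of the same layout using those functions, assuming such a release is available. The matrix values and the file name test_demo.xlsx are made up for illustration.

```matlab
res1 = [1 2; 3 4; 5 6];        % hypothetical 3x2 matrix
res2 = [1 2 3 4; 5 6 7 8];     % hypothetical 2x4 matrix
xlsFile = 'test_demo.xlsx';    % hypothetical file name

% Labels are text, so they are written as cell arrays with writecell
writecell({'res1'}, xlsFile, 'Sheet', 'Sheet1', 'Range', 'A1');
writematrix(res1,   xlsFile, 'Sheet', 'Sheet1', 'Range', 'A2:B4');
writecell({'res2'}, xlsFile, 'Sheet', 'Sheet1', 'Range', 'A6');
writematrix(res2,   xlsFile, 'Sheet', 'Sheet1', 'Range', 'A7:D8');

% Read one block back to check that it landed where expected
check = readmatrix(xlsFile, 'Sheet', 'Sheet1', 'Range', 'A2:B4');
```

Reading the block back with readmatrix is a cheap way to confirm that the cell range matches the size of the matrix.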

Output to Text file

The least convenient way to save results is in a text file, a file that can be read by basically any text editor. In fact, the only reason you may want to do this is if you want to add written comments to your results. BUT, printing matrices to a text file is a bit of a nightmare (and will not be dealt with here). If you have big matrices, use one of the above techniques. Small matrices can be dealt with by treating each element as a scalar.

The basic structure is as follows. You first open a file into which you want to write your results (see line 1 below). Then, whenever you want to print something into that file, you use the fprintf command (for text and scalars). Let me present a piece of code and then explain how it works. The aim is to save the [math](2\times1)[/math] vector res3 and the scalar variables n and k to a text file.

fid = fopen('test.txt','a');
fprintf(fid,'Number of obs: %6.0f; Number of variables: %4.2f \n', n, k);
fprintf(fid,'Estimated parameters: %8.4f %8.4f \n',res3);
fclose(fid);

The output that is produced by this piece of code is:

Number of obs:    100; Number of variables: 3.00
Estimated parameters:   0.2578  12.5789

assuming that [math]n=100[/math], [math]k=3[/math] and [math]res3=(0.2578\;12.5789)[/math].

The command in line 1 opens a text file called test.txt and, for the remainder of the code, gives it the name fid. We will use this name whenever we need to refer to this file. The second input into the fopen command, 'a', stands for append and means that, should this file already exist, whatever you print will be added to the end of the file. If you use 'w' instead you will overwrite anything that is already in that file. Let's briefly jump to the last line (Line 4), where the file fid is closed again. In between these two lines you can print something into the file.
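The difference between the two modes can be checked with a small experiment. This is only a sketch: the file name mode_demo.txt is made up, and the test for fid == -1 is an added precaution, since fopen returns -1 when the file cannot be opened.

```matlab
logFile = fullfile(tempdir, 'mode_demo.txt');   % hypothetical file name

fid = fopen(logFile, 'w');          % 'w': start with an empty file
if fid == -1
    error('Could not open %s', logFile);
end
fprintf(fid, 'first run\n');
fclose(fid);

fid = fopen(logFile, 'a');          % 'a': keep existing content, add to the end
fprintf(fid, 'second run\n');
fclose(fid);

contents = fileread(logFile);       % the file now holds both lines
disp(contents)
```

Running the first block again with 'w' would discard both lines and leave only "first run" in the file.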

This is done using the command fprintf. At first sight Lines 2 and 3 look like a mess[1]. Let's decipher what we can see here. The general command structure is fprintf(FileID,'text text VARREF1 text VARREF2 \n', VAR1, VAR2). The first bit is straightforward. The command fprintf tells MATLAB to print something into a file, and the first input FileID indicates which file that ought to be: test.txt in our case, or fid as far as MATLAB is concerned (from Line 1). Before we try to understand the second input (between the single quotation marks) we will go to the last two, n and k. They are the two MATLAB variables that are to be worked into the output. If you had more than two variables to be printed this list would be longer. So, now to the complicated bit (and you will begin to understand why this is not really the preferred method), the second input. You can see some text but also the somewhat cryptic %6.0f and %4.2f. These bits have two functions. First, they tell MATLAB where, in between the remaining text, the variables n and k are to be printed. Such a placeholder always starts with a %.

The second function is to tell MATLAB how to format the number. The placeholder ends with an f, which indicates fixed-point notation, i.e. you fix where the decimal point goes. The number before the decimal point tells MATLAB the minimum number of characters it should reserve for the variable, and the number after the decimal point indicates how many digits after the decimal point should be displayed. Check out doc fprintf for a discussion of all the different options or, even better, try a few options yourself. Lastly, the \n merely tells MATLAB to switch to a new line.
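A convenient way to try out format options is sprintf, which accepts the same format strings as fprintf but returns the text instead of writing it to a file. The sketch below uses made-up numbers; the square brackets are only there to make the padding visible.

```matlab
s1 = sprintf('%6.0f', 100);       % width 6, no decimals
s2 = sprintf('%4.2f', 3);         % width 4, two decimals
s3 = sprintf('%8.4f', 12.5789);   % width 8, four decimals
s4 = sprintf('%2.1f', 1234.5);    % width too small: the field grows as needed

% Show each result between brackets so leading spaces can be seen
disp(['[' s1 ']'])
disp(['[' s2 ']'])
disp(['[' s3 ']'])
disp(['[' s4 ']'])
```

The last case shows that the first number is a minimum width only: a value that needs more characters is still printed in full.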

Line 3 does essentially the same. You can see that there are two placeholders for MATLAB variables. This time, however, they are followed by only one MATLAB variable, res3. But note that this variable has two elements, and fprintf treats these two elements as if they were two different variables.
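The way a vector is spread over the placeholders can be seen in the following sketch, again using sprintf so that the result can be inspected directly. The vector v is a made-up example that goes one step further than the text: when there are more elements than placeholders, the format string is applied again from the start.

```matlab
res3 = [0.2578; 12.5789];
% One variable with two elements fills both placeholders
line1 = sprintf('Estimated parameters: %8.4f %8.4f \n', res3);

% Four elements, two placeholders: the format string is used twice
v = [1 2 3 4];
line2 = sprintf('%d %d\n', v);

fprintf('%s', line1);   % print the finished strings to the command window
fprintf('%s', line2);
```

This reuse of the format string is what makes printing small vectors manageable without writing one placeholder per element.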

Publish Code and Results

Sometimes it is useful to have your code and your calculated results in one place. MATLAB provides an extremely useful functionality to achieve this, namely to "publish" your code. This interlaces your code with the output produced by your code. How to do this is best understood by watching the following clip.
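Publishing can also be triggered from code with the publish function. The following is a rough sketch only: it writes a throwaway script called demo_publish.m (a made-up name) into the current folder and publishes it as HTML. The output is assumed to go to the default location, an html subfolder of the folder containing the script.

```matlab
% Write a tiny hypothetical script into the current folder
fid = fopen('demo_publish.m', 'w');
fprintf(fid, '%%%% Demo section\n');   % '%%%%' is written to the file as '%%'
fprintf(fid, 'x = 1:3;\n');
fprintf(fid, 'disp(sum(x))\n');
fclose(fid);

% Publish code and output together; outFile is the path of the HTML file
outFile = publish('demo_publish.m', 'html');
disp(outFile)
```

Other output formats, such as 'pdf', can be requested through the second input.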

Footnotes

  1. For the full gory detail of what is happening here you should check doc fprintf. Here you will only get the quick version.