Python/Data

From ECLR

Revision as of 21:12, 21 July 2014


Introduction

Here we give a brief introduction to handling data when using Python to solve econometric problems. We will use a tool called Pandas, which is built on top of Numpy arrays. So first make sure that the Numpy and Pandas modules are installed.
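A quick way to check whether both modules are available is sketched below. It uses only the standard library's importlib and does not import the modules themselves. If either is missing, it can usually be installed with pip (for example pip install numpy pandas).

```python
import importlib.util

# Report whether each module this page relies on can be found by Python
for name in ("numpy", "pandas"):
    found = importlib.util.find_spec(name) is not None
    print(f"{name}: {'available' if found else 'NOT installed'}")
```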

You can use Pandas to do any of the following:

  • Merge data-sets
  • Filter data-sets
  • Calculate summary statistics
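The three tasks above can be sketched on two tiny tables. The column names (Date, Close) and all numbers here are made up for illustration only and are not real market data. The sketch uses pd.merge to align the tables on a shared date column, a boolean mask to filter rows, and DataFrame.describe for summary statistics.

```python
import pandas as pd

# Two small, invented price tables standing in for the S&P500 and IBM data
dates = pd.to_datetime(["2014-07-01", "2014-07-02", "2014-07-03"])
sp = pd.DataFrame({"Date": dates, "Close": [100.0, 101.0, 99.5]})
ibm = pd.DataFrame({"Date": dates, "Close": [50.0, 51.5, 52.0]})

# Merge: align the two tables on their common Date column;
# suffixes keep the two Close columns apart
merged = pd.merge(sp, ibm, on="Date", suffixes=("_SP", "_IBM"))

# Filter: keep only the days on which IBM closed above 51
high_ibm = merged[merged["Close_IBM"] > 51]

# Summary statistics: count, mean, std, min, quartiles and max per column
summary = merged[["Close_SP", "Close_IBM"]].describe()

print(merged)
print(high_ibm)
print(summary)
```

Merging on an explicit date column, rather than relying on row order, keeps the two series aligned even if one file is missing a trading day (by default pd.merge keeps only dates present in both tables).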

We will do this by way of an example. Here are two datafiles:

  1. S&P500: SP500.xlsx
  2. IBM: IBM.xlsx

These files were downloaded from Yahoo Finance (http://www.yahoo.com/finance) and contain the S&P500 share price index and the IBM share prices. Let's use Python and Pandas to explore the data.

Data Import

Use the following code:

import numpy as np      # import modules for use
import pandas as pd

xlsxfile = pd.ExcelFile('IBM.xlsx')    # open the Excel workbook
data_IBM = xlsxfile.parse('IBM')       # read the sheet named 'IBM' into a DataFrame

xlsxfile = pd.ExcelFile('SP500.xlsx')  # open the second workbook
data_SP = xlsxfile.parse('SP500')      # read the sheet named 'SP500' into its own DataFrame

Importing an Excel file takes two steps. First, pd.ExcelFile('IBM.xlsx') opens the Excel workbook. Second, data_IBM = xlsxfile.parse('IBM') reads the sheet named IBM into what is called a DataFrame. We'll see what that is in a minute.

If the data is in a csv file, the process is even simpler: you need only one line, data_IBM = pd.read_csv('IBM.csv').
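The sketch below shows pd.read_csv with two commonly useful options. To keep it self-contained, io.StringIO stands in for the file IBM.csv. The column layout is assumed to resemble a Yahoo Finance download, and the prices are invented. parse_dates converts the Date column to datetimes, and index_col makes it the row index, which suits time-series work.

```python
import io
import pandas as pd

# Hypothetical stand-in for IBM.csv (column layout assumed, numbers invented)
csv_text = """Date,Open,High,Low,Close,Volume,Adj Close
2014-07-03,188.0,189.0,187.0,188.5,1000000,188.5
2014-07-02,187.0,188.5,186.5,188.0,1200000,188.0
2014-07-01,186.0,187.5,185.5,187.0,1100000,187.0
"""

# read_csv accepts a path such as 'IBM.csv' or any file-like object;
# parse dates and use them as the index
data_IBM = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

# Sort into chronological order in case the file lists the newest day first
data_IBM = data_IBM.sort_index()

print(data_IBM.head())   # first rows of the DataFrame
print(data_IBM.dtypes)   # column types inferred by pandas
```

With the real file, replace io.StringIO(csv_text) with 'IBM.csv'.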

Literature