Revision as of 21:12, 21 July 2014

Introduction

This section gives a brief introduction to handling data when using Python to solve econometric problems. We will use a tool called Pandas, which is built on Numpy arrays, so first make sure that the [[Numpy|Numpy]] and [[Panda|Panda]] modules are available.

You can use Pandas to do any of the following:

Merge data-sets
Filter data-sets
Calculate summary statistics
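
As a rough illustration of these three tasks, here is a minimal sketch on two tiny made-up tables. The column names (Date, Close_A, Close_B) and all values are assumptions chosen for the example; they are not taken from the data files used below.

```python
import pandas as pd

# Two small made-up data sets sharing a "Date" column (illustrative values only)
prices_a = pd.DataFrame({
    "Date": ["2014-07-01", "2014-07-02", "2014-07-03"],
    "Close_A": [100.0, 102.0, 101.0],
})
prices_b = pd.DataFrame({
    "Date": ["2014-07-01", "2014-07-02", "2014-07-03"],
    "Close_B": [50.0, 49.0, 51.0],
})

# Merge: combine the two data sets on their common "Date" column
merged = pd.merge(prices_a, prices_b, on="Date")

# Filter: keep only the rows where Close_A is above 100
high_a = merged[merged["Close_A"] > 100]

# Summary statistics: count, mean, std, min, quartiles and max per column
summary = merged[["Close_A", "Close_B"]].describe()

print(merged)
print(high_a)
print(summary)
```

The same calls should carry over to larger tables, provided the two data sets share a column to merge on.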

We will do this by way of an example. Here are two datafiles:

S&P500: SP500.xlsx
IBM: IBM.xlsx

These are Excel files containing data downloaded from Yahoo (http://www.yahoo.com/finance) on the S&P500 share price index and the IBM share price. Let's use Python and Pandas to explore the data.

Data Import

Use the following code:

import numpy as np      # import modules for use
import pandas as pd

xlsxfile = pd.ExcelFile('IBM.xlsx')    # open the Excel workbook
data_IBM = xlsxfile.parse('IBM')       # read the sheet named 'IBM' into a DataFrame

xlsxfile = pd.ExcelFile('SP500.xlsx')
data_SP = xlsxfile.parse('SP500')      # read the sheet named 'SP500' into a DataFrame

Importing an Excel file works in two steps. The first, pd.ExcelFile('IBM.xlsx'), opens the Excel file. The second, data_IBM = xlsxfile.parse('IBM'), reads the sheet named IBM into what is called a DataFrame. We'll see what that is in a minute.
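
To get a first feel for the kind of object the import returns, the sketch below builds a small DataFrame by hand and inspects it. The column names and values are made up for illustration and are an assumption; the imported tables will have their own columns.

```python
import pandas as pd

# Made-up rows standing in for an imported price table (not real data)
demo = pd.DataFrame({
    "Date": pd.to_datetime(["2014-07-01", "2014-07-02"]),
    "Close": [100.0, 102.0],
    "Volume": [1000, 1500],
})

print(demo.shape)    # (number of rows, number of columns)
print(demo.columns)  # the column labels
print(demo.dtypes)   # each column has its own data type
print(demo.head())   # the first rows of the table
```

The same four inspections can be applied to data_IBM once the import has run.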

If you have the data in a csv file, the process is even simpler. You only need one line: data_IBM = pd.read_csv('IBM.csv')
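
The csv import can be tried without any file on disk. The sketch below reads made-up csv text from memory; the column names and values are assumptions for illustration, and with a real file you would pass the file name instead of the io.StringIO object.

```python
import io
import pandas as pd

# Made-up csv text standing in for the contents of a file such as IBM.csv
csv_text = """Date,Close,Volume
2014-07-01,100.0,1000
2014-07-02,102.0,1500
"""

# pd.read_csv accepts a file name or any file-like object;
# parse_dates converts the "Date" column to datetime values
data_demo = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

print(data_demo)
```

Parsing the dates at import time is optional, but it tends to make later filtering and merging by date more convenient.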

Literature

@@ Line 13: / Line 13: @@
 We will do this by way of an example. Here are two datafiles:
-# S&P500: [[media:SP500.csv|SP500.csv]]
+# S&P500: [[media:SP500.xlsx|SP500.xlsx]]
-# IBM: [[media:IBM.csv|IBM.csv]]
+# IBM: [[media:IBM.xlsx|IBM.xlsx]]
 These are csy files downloaded from [http://www.yahoo.com/finance Yahoo] which contain information about the S&P500 share price index and the IBM share prices. But let’s use Python and Pandas to explore the data.
@@ Line 20: / Line 20: @@
 = Data Import =
-Use the following code
+Use the following code:
 <source>import numpy as np      # import modules for use
 import pandas as pd
-data_SP = pd.read_csv('SP500.csv')
+xlsxfile = pd.ExcelFile('IBM.xlsx') # Loads an Excel Table
-data_IBM = pd.read_csv('IBM.csv')</source>
+data_IBM = xlsxfile.parse('IBM')    # Read an Excel table into DataFrame
-= Literature =
+xlsxfile = pd.ExcelFile('SP500.xlsx')
+data_IBM = xlsxfile.parse('SP500')</source>
+Importing Excel files works in the two steps of first uploading an Excel file (which is what the <code>pd.ExcelFile(’IBM.xlsx’)</code> command does). The second line, <code>dataIBM = xlsxfile.parse(’IBM’)</code> puts the data into what is called a ''DataFrame''. We’ll see what that is in a minute.
-Hamilton J.D. (1994) ''Time Series Analysis'', Princeton, Section 5.7 as well as Judge G.G, W.E. Griffiths, R.C. Hill, H. Lütkepohl and T.-C. Lee (1985) ''The Theory and Practice of Econometrics'', John Wiley, Appendix B, give good introductions into the mechanics of nonlinear optimisation algorithms.
+If you have the data in a csv file, then this process is even simpler! Then you only need one line <code>dataIBM = pd.readcsv(’IBM.csv’)</code>
-Martin V., Hurn S. and Harris D. (2012) ''Econometric Modelling with Time Series: Specification, Estimation and Testing (Themes in Modern Econometrics)'', Chapter 3 gives an excellent introduction into nonlinear optimisation strategies.
+= Literature =

Difference between revisions of "Python/Data"
