Python/Data

From ECLR
Revision as of 21:12, 21 July 2014 by Rb (talk | contribs)


Introduction

This page gives a brief introduction to handling data when using Python to solve econometric problems. We will use a tool called Pandas, which is built on Numpy arrays. So first make sure that the [[Numpy|Numpy]] and [[Panda|Pandas]] modules are available.

You can use Pandas to do any of the following:

  • Merge data-sets
  • Filter data-sets
  • Calculate summary statistics
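As a minimal sketch of these three operations, the snippet below builds two tiny DataFrames by hand. The dates, prices and column names (Date, IBM_Close, SP_Close) are made up for illustration and are not taken from the data files used on this page.

```python
import pandas as pd

# Hypothetical toy data: dates and prices are invented for illustration only
ibm = pd.DataFrame({
    "Date": ["2014-01-02", "2014-01-03", "2014-01-06"],
    "IBM_Close": [185.0, 186.0, 184.0],
})
sp = pd.DataFrame({
    "Date": ["2014-01-02", "2014-01-03", "2014-01-07"],
    "SP_Close": [1830.0, 1831.0, 1838.0],
})

# Merge: keep only the dates that appear in both data-sets
merged = pd.merge(ibm, sp, on="Date", how="inner")

# Filter: keep only the rows where IBM closed above 185
high = merged[merged["IBM_Close"] > 185.0]

# Summary statistics: mean of each closing-price column
means = merged[["IBM_Close", "SP_Close"]].mean()

print(merged)
print(high)
print(means)
```

The inner merge drops dates that are missing from either table, which is usually what you want before comparing two price series day by day.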

We will do this by way of an example. Here are two datafiles:

  1. S&P500: SP500.xlsx
  2. IBM: IBM.xlsx

These files contain data downloaded from [http://www.yahoo.com/finance Yahoo]: information about the S&P500 share price index and the IBM share prices. Let’s use Python and Pandas to explore the data.

Data Import

Use the following code:

import numpy as np      # import modules for use
import pandas as pd

xlsxfile = pd.ExcelFile('IBM.xlsx')   # Open the Excel workbook
data_IBM = xlsxfile.parse('IBM')      # Read the sheet 'IBM' into a DataFrame

xlsxfile = pd.ExcelFile('SP500.xlsx')
data_SP = xlsxfile.parse('SP500')     # Read the sheet 'SP500' into its own DataFrame

Importing Excel files works in two steps. The first, pd.ExcelFile('IBM.xlsx'), opens the Excel file. The second, data_IBM = xlsxfile.parse('IBM'), reads the worksheet named IBM into what is called a DataFrame. We’ll see what that is in a minute.

If you have the data in a csv file, the process is even simpler: you only need one line, data_IBM = pd.read_csv('IBM.csv').
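The one-line csv import can be tried without any file on disk. In the sketch below the csv text, its column names (Date, Close) and its values are hypothetical stand-ins for a file such as IBM.csv; pd.read_csv accepts a file-like object in place of a file name.

```python
import io
import pandas as pd

# Hypothetical csv content standing in for a file such as 'IBM.csv';
# the column names and values are invented for illustration only
csv_text = """Date,Close
2014-01-02,185.0
2014-01-03,186.0
"""

# read_csv takes a path or a file-like object; parse_dates converts
# the Date column from text to datetime values
data_csv = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

print(data_csv.dtypes)
print(data_csv.head())
```

With a real file you would pass the file name instead of the io.StringIO object.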

Literature