Python/Data
Introduction
This page gives a brief introduction to handling data when using Python to solve econometric problems. We will use a tool called Pandas, which is built on Numpy arrays, so first make sure that the [[Numpy|Numpy]] and [[Panda|Panda]] modules are available.
You can use Pandas to do any of the following:
- Merge data-sets
- Filter data-sets
- Calculate summary statistics
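As a rough preview of these three operations, here is a minimal sketch on two tiny made-up tables. The column names (Date, IBM_Close, SP_Close) and all values are assumptions chosen for illustration only; they are not taken from the data files used on this page.

```python
import pandas as pd

# Toy data with made-up values, purely for illustration
ibm = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
    "IBM_Close": [100.0, 101.5, 99.0],
})
sp = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-07"]),
    "SP_Close": [3000.0, 3010.0, 3025.0],
})

# Merge: keep only the dates that appear in both data-sets (inner join)
merged = pd.merge(ibm, sp, on="Date", how="inner")

# Filter: keep only the rows where the IBM close is above 100
high = merged[merged["IBM_Close"] > 100]

# Summary statistics: count, mean, std, min, quartiles and max per column
summary = merged[["IBM_Close", "SP_Close"]].describe()

print(merged)
print(high)
print(summary)
```

The inner join is only one possible choice; how="outer" would keep dates that appear in either table and fill the gaps with missing values.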
We will do this by way of an example. Here are two datafiles:
- S&P500: SP500.xlsx
- IBM: IBM.xlsx
These files contain data downloaded from [[Yahoo|http://www.yahoo.com/finance Yahoo]] on the S&P500 share price index and the IBM share price. Let's use Python and Pandas to explore the data.
Data Import
Use the following code:
import numpy as np # import modules for use
import pandas as pd
xlsxfile = pd.ExcelFile('IBM.xlsx') # Open the IBM Excel file
data_IBM = xlsxfile.parse('IBM') # Read the sheet named 'IBM' into a DataFrame
xlsxfile = pd.ExcelFile('SP500.xlsx') # Open the S&P500 Excel file
data_SP = xlsxfile.parse('SP500') # Read the sheet named 'SP500' into a second DataFrame
Importing an Excel file works in two steps. The first step, pd.ExcelFile('IBM.xlsx'), opens the Excel file. The second step, data_IBM = xlsxfile.parse('IBM'), reads the sheet named 'IBM' into what is called a DataFrame. We'll see what that is in a minute.
If you have the data in a csv file, the process is even simpler. You only need one line: data_IBM = pd.read_csv('IBM.csv')
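A minimal sketch of the csv route follows. So that it runs without any file on disk, it reads from an in-memory string; the column names (Date, Close) and the values are made up for illustration and may not match the layout of the real files. With a real file you would pass the file name instead of the io.StringIO object.

```python
import io
import pandas as pd

# Made-up stand-in for the contents of a file such as IBM.csv
csv_text = """Date,Close
2020-01-02,100.0
2020-01-03,101.5
"""

# parse_dates converts the Date column to timestamps,
# index_col makes it the row index of the DataFrame
data_IBM = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

print(data_IBM.head())   # first rows of the DataFrame
print(data_IBM.dtypes)   # data type of each column
```

Using the date as the index is optional, but it tends to make later merging and filtering by date more convenient.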