Difference between revisions of "Python/Data"
Revision as of 21:12, 21 July 2014
Introduction
Here we give a brief introduction to handling data when using Python to solve econometric problems. We will use a tool called Pandas, whose data structures are built on Numpy arrays. So first make sure that the [[Numpy|Numpy]] and [[Panda|Pandas]] modules are available.
You can use Pandas to do any of the following:
- Merge data-sets
- Filter data-sets
- Calculate summary statistics
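As a rough illustration of these three operations, here is a minimal sketch. The two small tables, their column names (Date, IBM, SP500) and all prices are made up for this example and are not taken from the data files used later.

```python
import pandas as pd

# Made-up prices for illustration only; not taken from the real data files
ibm = pd.DataFrame({
    "Date": ["2014-07-01", "2014-07-02", "2014-07-03"],
    "IBM": [186.0, 188.0, 187.0],
})
sp500 = pd.DataFrame({
    "Date": ["2014-07-01", "2014-07-02", "2014-07-03"],
    "SP500": [1973.0, 1975.0, 1985.0],
})

# Merge the two data-sets on their common Date column
merged = pd.merge(ibm, sp500, on="Date")

# Filter: keep only the rows where the IBM price is above 186.5
high_ibm = merged[merged["IBM"] > 186.5]

# Summary statistics (count, mean, std, min, quartiles, max) per column
summary = merged[["IBM", "SP500"]].describe()

print(merged)
print(high_ibm)
print(summary)
```

Each step returns a new DataFrame and leaves the inputs unchanged, so intermediate results such as merged can be reused.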
We will do this by way of an example. Here are two data files:
- S&P500: SP500.xlsx
- IBM: IBM.xlsx
These files contain data downloaded from [http://www.yahoo.com/finance Yahoo] on the S&P500 share price index and the IBM share price. Let's use Python and Pandas to explore the data.
Data Import
Use the following code:
import numpy as np   # import modules for use
import pandas as pd
xlsxfile = pd.ExcelFile('IBM.xlsx')      # open the Excel file
data_IBM = xlsxfile.parse('IBM')         # read the sheet 'IBM' into a DataFrame
xlsxfile = pd.ExcelFile('SP500.xlsx')
data_SP500 = xlsxfile.parse('SP500')     # read the sheet 'SP500' into a DataFrame
Importing an Excel file takes two steps. The first line, xlsxfile = pd.ExcelFile('IBM.xlsx'), opens the Excel file. The second line, data_IBM = xlsxfile.parse('IBM'), reads the sheet named IBM into what is called a DataFrame. We'll see what that is in a minute.
If you have the data in a csv file, the process is even simpler. You then need only one line: data_IBM = pd.read_csv('IBM.csv')
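To try the csv route without a file on disk, the sketch below feeds pd.read_csv a small in-memory text instead of a path. The column names (Date, Close) and the values are assumptions made for illustration; a real download may use different columns.

```python
import io
import pandas as pd

# Hypothetical csv content standing in for a file such as IBM.csv;
# the column names and values are made up for illustration
csv_text = """Date,Close
2014-07-01,186.0
2014-07-02,188.0
2014-07-03,187.0
"""

# pd.read_csv accepts a file path or any file-like object;
# parse_dates converts the Date column to dates, index_col makes it the row index
data_IBM = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

print(type(data_IBM).__name__)  # the result is a DataFrame
print(data_IBM.head())          # show the first rows of the table
```

Using the date column as the index is a design choice that makes it easier to line up two price series by date later on; it is optional.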