Difference between revisions of "Python/Program Flow and Logicals"
Revision as of 13:16, 15 October 2013
The following assumes use of Python 3 rather than Python 2. No more major releases are planned for version 2, so version 3 is expected to be the future of Python. The two versions, although similar, are not compatible in either the forwards or backwards direction[1], and some legacy code exists only as Python 2. Some differences between the two versions are discussed in the footnotes.
Preliminaries
One important thing to understand when programming in Python is that correct indenting of code is essential. The Python programming language was designed with readability in mind, and as a result forces you to indent code blocks, e.g.
- while and for loops
- if, elif, else constructs
- functions
The indent for each block must be the same. Python also requires you to mark the start of a block with a colon. So where MATLAB uses end to mark the end of a block of code, in Python a code block ends when the indenting reverts. Other than this, simple Python programs aren't dissimilar to those in MATLAB.
For example, the simplest case of an if conditional statement in Python would look something like this
if condition:
    statement1
    statement2
    ...
where the code in lines statement1, statement2, ... is executed only if condition is True. Sharp-sighted readers might spot another difference from MATLAB: in Python there is no need to add a semicolon at the end of a line to suppress output, since Python produces no output for lines involving assignment (i.e. lines with the = sign).
The boolean condition can be built up using relational and logical operators. Relational operators in Python are similar to those in MATLAB, e.g. == tests for equality, > and >= test for greater than and greater than or equal to respectively. The main difference is that != tests for inequality in Python (compared to ~= in MATLAB). Relational operators return boolean values of either True or False.
Python's logical operators are the keywords and, or and not, which are hopefully self explanatory.
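As a minimal sketch of how these operators combine, the following uses two hypothetical values, x and y, chosen purely for illustration.

```python
# Hypothetical values chosen purely for illustration
x = 3
y = 7

is_equal = (x == y)                # equality test
is_different = (x != y)            # inequality test (MATLAB would use ~=)
in_range = (x > 0) and (y <= 10)   # both conditions must hold
either = (x > 5) or (y > 5)        # at least one condition must hold
negated = not is_equal             # logical negation

print(is_equal, is_different, in_range, either, negated)
# prints: False True True True True
```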
The if functionality can be expanded using else as follows
if condition:
    statement1
    statement2
    ...
else:
    statement1a
    statement2a
    ...
where statement1, statement2, ... is executed if condition is True, and statement1a, statement2a, ... is executed if condition is False. Note that the code block after else starts with a colon, and this code block is also indented.
Finally, the most general form of this programming construct introduces the elif keyword (in contrast to elseif in MATLAB) to give
if condition1:
    statement1
    statement2
    ...
elif condition2:
    statement1a
    statement2a
    ...
...
...
elif conditionN:
    statement1b
    statement2b
    ...
else:
    statement1c
    statement2c
    ...
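A concrete instance of the general form may help. The sketch below assumes a hypothetical helper function, classify_return, which is not part of the original page; it simply labels a number by its sign.

```python
def classify_return(r):
    """Label a number by its sign (hypothetical helper for illustration)."""
    if r > 0:
        label = "positive"
    elif r < 0:
        label = "negative"
    else:
        label = "zero"
    return label

# Try the function on three illustrative values
for value in [0.02, -0.01, 0.0]:
    print(value, classify_return(value))
```

Only the first branch whose condition is True is executed, so the order of the conditions matters when they overlap.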
Like MATLAB, Python has while and for loops. Unconditional for loops iterate over a list or range of values, e.g.
for LoopVariable in ListOrRangeOfValues:
    statement1
    statement2
    ...
and repeat for as many times as there are elements in ListOrRangeOfValues, each time assigning the next element in the list/range to LoopVariable. The code block associated with the loop is identified by a colon and indenting as described above.
There are various ways of creating a list or range object in Python 3. The range function can be used to create sequences of integers with a defined start, stop and step value. The advantage of a range object over a list is that the sequence of values is not stored in memory with the range[2]. For example, to create a range containing the four values 1, 4, 7 and 10, i.e. a sequence starting at 1 with steps of 3, we can use range(1,11,3). Note that the stop value passed to the range function is not included, i.e. range(1,10,3) would produce only the three numbers 1, 4 and 7. We can verify this at the Python command prompt by converting each range to a list, i.e.
>>> list(range(1,11,3))
[1, 4, 7, 10]
>>> list(range(1,10,3))
[1, 4, 7]
This might seem strange, but makes more sense when we realise the start and step values are optional. If they are not given, the range function assumes a default start of 0 and a default step of 1, i.e. range(N) returns N values starting at 0, e.g.
>>> list(range(5))
[0, 1, 2, 3, 4]
>>> list(range(10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Python lists can be created from a sequence of values separated by commas within square brackets, e.g. MyList = [1.0, "hello", 1] creates a list called MyList containing 3 values: a floating point number 1.0, the string hello and an integer 1. This example demonstrates that Python lists are general purpose containers, and elements don't have to be of the same class. It is for this reason that lists and ranges are best avoided for numerical calculations unless they are relatively simple, as there are much more efficient containers for numbers, i.e. NumPy arrays, which will be introduced in due course.
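To make the range and list behaviour concrete, here is a short sketch that loops over both. The variable names values_from_range and type_names are assumptions introduced for this illustration.

```python
# Loop over a range: start at 1, stop before 11, step 3
values_from_range = []
for k in range(1, 11, 3):
    values_from_range.append(k)

# Loop over a list whose elements belong to different classes
MyList = [1.0, "hello", 1]
type_names = []
for item in MyList:
    type_names.append(type(item).__name__)

print(values_from_range)  # [1, 4, 7, 10]
print(type_names)         # ['float', 'str', 'int']
```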
Conditional while loops are identified with the while keyword, so
while condition:
    statement1
    statement2
    ...
will repeatedly execute the code block for as long as condition is True.
As in MATLAB, Python allows us to break out of for or while loops, or continue with the next iteration of a loop, using break and continue respectively.
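The following sketch shows break and continue together. The list data is hypothetical: the loop sums the positive values, skips negative ones, and stops at the first zero.

```python
# Hypothetical data: sum positive values, stopping at the first zero
data = [4, -1, 3, -2, 0, 5]
total = 0
for value in data:
    if value == 0:
        break       # leave the loop entirely; the trailing 5 is never seen
    if value < 0:
        continue    # skip the rest of this iteration
    total += value

print(total)  # 7
```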
for
We now look at the Python equivalents of the MATLAB code discussed in the MATLAB page on Program Flow and Logicals. A description of the mathematics is available on the MATLAB page; for brevity it is not repeated here. In the case when the error terms in e are known in advance, the Python version of the algorithm is:
1. Find length of the list containing the error terms e: T=len(e)
2. Initialize a list y with the same length as the list e: y=[0.0]*T
3. Compute y[0]=phi0+phi1*y0+e[0]. Please remember, we assume that [math]y_0=E(y)=\phi_0/(1-\phi_1)[/math]
4. Compute y[i]=phi0+phi1*y[i-1]+e[i] for [math]i=1[/math]
5. Repeat line 4 for [math]i=2,...,(T-1)[/math]
A simple implementation in Python follows, and a description of how to run this code is given towards the end of this page.
T=len(e)
y=[0.0]*T
y0=phi0/(1-phi1)
y[0]=phi0+phi1*y0+e[0]
for i in range(1,T):
    y[i]=phi0+phi1*y[i-1]+e[i]
and for comparison the MATLAB code is
T=size(e,1);
y=zeros(T,1);
y0=phi0/(1-phi1);
y(1)=phi0+phi1*y0+e(1);
for i=2:T
y(i)=phi0+phi1*y(i-1)+e(i);
end
One important difference to MATLAB is that Python list and array indexing starts at 0 and indices are placed inside square brackets (array indices start at 1 in MATLAB). It is also important to understand that Python generally assumes a number to be integer unless there is something to indicate it is a floating point value. Consider the line y=[0.0]*T that preallocates a Python list containing T floating point numbers all set to zero. If this had been written as y=[0]*T the list would contain T integers instead. We can demonstrate this at the Python prompt using the type function, which tells us the class of an object, e.g.
>>> type(0.0)
<class 'float'>
>>> type(0)
<class 'int'>
>>> type(0e0)
<class 'float'>
Controversially, the behaviour of integer division changed in Python version 3, compared to version 2, and it is worth mentioning this now. In Python 2
>>> type(1/2)
<type 'int'>
>>> 1/2
0
whereas in Python 3
>>> type(1/2)
<class 'float'>
>>> 1/2
0.5
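If integer (floor) division is what you actually want in Python 3, the // operator provides it. The short sketch below contrasts the two operators; the variable names are assumptions introduced for illustration.

```python
true_div = 1 / 2       # true division always returns a float in Python 3
floor_div = 1 // 2     # floor division of two integers returns an integer
neg_floor = -7 // 2    # floor division rounds towards negative infinity
float_floor = 7.0 // 2 # with a float operand the result is a float

print(true_div, floor_div, neg_floor, float_floor)
# prints: 0.5 0 -4 3.0
```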
if else
As above, a description of the mathematics can be found on the MATLAB page on Program Flow and Logicals. The Python algorithm is now
1. Find length of the list containing the error terms e: T=len(e)
2. Initialize a list y with the same length as e: y=[0.0]*T
3. Check whether abs(phi1)<1. If this statement is true, then y0=phi0/(1-phi1). Else, y0=0. Please remember, we set [math]y_0=E(y)[/math].
4. Compute y[0]=phi0+phi1*y0+e[0].
5. Compute y[i]=phi0+phi1*y[i-1]+e[i] for [math]i=1[/math]
6. Repeat line 5 for [math]i=2,...,(T-1)[/math]
This can be implemented in Python as
T=len(e)
y=[0.0]*T
y0=0.0
if abs(phi1)<1:
    y0=phi0/(1-phi1)
y[0]=phi0+phi1*y0+e[0]
for i in range(1,T):
    y[i]=phi0+phi1*y[i-1]+e[i]
which is relatively similar to the MATLAB version
T=size(e,1);
y=zeros(T,1);
y0=0;
if abs(phi1)<1
y0=phi0/(1-phi1);
end
y(1)=phi0+phi1*y0+e(1);
for i=2:T
y(i)=phi0+phi1*y(i-1)+e(i);
end
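The Python snippets above assume that e, phi0 and phi1 already exist. As a self-contained sketch, the same steps can be wrapped in a function and run on a small set of made-up error terms. The function name simulate_ar1 and the sample values are assumptions for illustration only.

```python
def simulate_ar1(phi0, phi1, e):
    """Build y[i] = phi0 + phi1*y[i-1] + e[i] from a list of error terms.

    Hypothetical wrapper around the steps described in the text.
    """
    T = len(e)
    y = [0.0] * T                 # preallocate a list of floats
    if T == 0:
        return y                  # nothing to compute for an empty input
    y0 = 0.0
    if abs(phi1) < 1:
        y0 = phi0 / (1 - phi1)    # start from the unconditional mean
    y[0] = phi0 + phi1 * y0 + e[0]
    for i in range(1, T):
        y[i] = phi0 + phi1 * y[i - 1] + e[i]
    return y

# Made-up error terms, chosen so the arithmetic is easy to check by hand
e = [0.5, -0.5, 1.0]
y = simulate_ar1(1.0, 0.5, e)
print(y)  # [2.5, 1.75, 2.875]
```

Here y0 = 1.0/(1-0.5) = 2.0, so y[0] = 1.0 + 0.5*2.0 + 0.5 = 2.5, and the later values follow from the recursion.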
while
The Python alternative of the above code using a conditional while loop implements the following algorithm. Remember that this contrived example is purely for demonstration purposes, and usually while loops are used when the number of iterations is not known in advance.
1. Find length of the list containing the error terms e: T=len(e)
2. Initialize a list y with the same length as e: y=[0.0]*T
3. Check whether abs(phi1)<1. If this statement is true, then y0=phi0/(1-phi1). Else, y0=0.
4. Compute y[0]=phi0+phi1*y0+e[0].
5. Compute y[i]=phi0+phi1*y[i-1]+e[i] for [math]i=1[/math]
6. Increase i by 1, i.e. [math]i=i+1[/math].
7. Repeat lines 5-6 whilst [math]i\lt T[/math]
The Python code is as follows.
T=len(e)
y=[0.0]*T
y0=0.0
if abs(phi1)<1:
    y0=phi0/(1-phi1)
y[0]=phi0+phi1*y0+e[0]
i=1
while i < T:
    y[i]=phi0+phi1*y[i-1]+e[i]
    i+=1
This introduces a shorthand also used in other programming languages (e.g. C): i+=1 is shorthand for i=i+1. This shorthand can be used with other operators, e.g. i*=10 is equivalent to typing i=i*10.
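For contrast with the contrived example, here is a sketch of a while loop where the number of iterations is not known in advance, also using the shorthand operators. The starting value and threshold are made up for illustration.

```python
# Halve a value until it drops below 1, counting the steps taken
value = 100.0
steps = 0
while value >= 1.0:
    value /= 2     # shorthand for value = value / 2
    steps += 1     # shorthand for steps = steps + 1

print(steps, value)  # 7 0.78125
```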
For comparison, the MATLAB code is
T=size(e,1);
y=zeros(T,1);
y0=0;
if abs(phi1)<1
y0=phi0/(1-phi1);
end
y(1)=phi0+phi1*y0+e(1);
i=2;
while i<=T
y(i)=phi0+phi1*y(i-1)+e(i);
i=i+1;
end
Improvements on the above (avoiding loops)
Like MATLAB, Python allows us to adopt a programming style that both simplifies code and allows programs to run faster, in particular:
- Operators, functions and logical expressions can work not only on scalars, but also on vectors, matrices and, in general, on n-dimensional arrays
- Subvectors/submatrices can be extracted using logical 0-1 arrays
Using Python Packages
The functionality that allows us to operate on whole vectors and matrices isn't part of core Python, and requires us to use a Python package called NumPy, which adds other useful functionality including pseudo-random number generators. There are many other Python Packages, which are listed at the Python Package Index.
Before using a Python package, the package must be imported, e.g.
import numpy
Functions within a package are located within namespaces. Namespaces are useful because they allow package writers to choose function names without worrying about whether that function name has been used elsewhere. For example, NumPy includes a function called rand, which exists within a namespace called random, and the random namespace is within the NumPy namespace (which is called numpy). After importing NumPy we can use the rand function, but have to include the namespaces within the function call, e.g. to use rand at the Python command prompt to generate 5 random numbers
>>> import numpy
>>> A = numpy.random.rand(5)
>>> A
array([ 0.50639352, 0.44000756, 0.16118149, 0.69615487, 0.3887179 ])
So numpy.random.rand refers to the rand function in the numpy.random namespace. While this allows safe reuse of names, it does potentially introduce a lot of extra typing, and so Python includes ways to simplify our code. For example, we can import individual functions from a namespace as follows
>>> from numpy.random import rand
>>> A = rand(4)
>>> A
array([ 0.25254338, 0.95567921, 0.28244092, 0.92564069])
and we can also rename the function as we import it
>>> from numpy.random import rand as nprand
>>> A = nprand(4)
>>> A
array([ 0.96127673, 0.57402182, 0.36119553, 0.99832014])
In addition we can rename the namespace
>>> import numpy.random as npr
>>> A = npr.rand(4)
>>> A
array([ 0.4282803 , 0.80106321, 0.7078212 , 0.13823879])
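The same renaming works for the top-level package. The alias np used below is a widely seen convention rather than anything required by NumPy, and it is not used elsewhere on this page; treat it as an illustrative assumption.

```python
import numpy as np  # "np" is a conventional alias, assumed here for illustration

A = np.random.rand(4)   # same function as numpy.random.rand
print(A.shape)          # (4,)
print(type(A).__name__) # ndarray
```

The values themselves are random, so they will differ from run to run.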
Simple example
In the above example the NumPy rand function returned random values in a NumPy array, as can be demonstrated at the Python command line.
>>> import numpy
>>> A = numpy.random.rand(10)
>>> type(A)
<class 'numpy.ndarray'>
>>> A
array([ 0.64799452, 0.41578081, 0.11770639, 0.21143116, 0.98658862,
0.35056233, 0.32420828, 0.5539366 , 0.58682753, 0.53097958])
NumPy arrays have significant differences to MATLAB arrays (and NumPy also contains a matrix class) so it's important to read the NumPy documentation, which includes tutorials and a comparison of NumPy with MATLAB. One important difference is that the copy function is used to copy values from one array to another, rather than assignment with =. For example, given a NumPy array A, the assignment B=A does not copy values in A to a new array B; instead A and B are simply two names for the same array of values. However B=A.copy() does copy all values in A into a new array B.
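The difference between assignment and copy is easiest to see by changing the original array afterwards. This is a minimal sketch with made-up values; the third name C is an assumption added for the comparison.

```python
import numpy

A = numpy.array([1.0, 2.0, 3.0])
B = A          # B is a second name for the same array
C = A.copy()   # C is an independent copy of the values

A[0] = 99.0    # modify the original array in place

print(B[0])    # 99.0, because B and A are the same object
print(C[0])    # 1.0, because C holds its own copy
print(B is A, C is A)  # True False
```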
NumPy array (and Python list) slices work in subtly different ways to MATLAB's too. For example, A[m:n] returns all values from the element with the index m to the element with index n-1, and because the first element has index 0, we receive the (m+1)th to nth values, e.g.
>>> r=[1,2,3,4,5,6,7,8,9,10]
>>> r[0:10]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> r[4:6]
[5, 6]
Compare this to MATLAB
>> r=[1,2,3,4,5,6,7,8,9,10]
r =
1 2 3 4 5 6 7 8 9 10
>> r(1:10)
ans =
1 2 3 4 5 6 7 8 9 10
>> r(4:6)
ans =
4 5 6
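A few further slices on the same list may help when translating MATLAB indexing. This sketch uses only standard Python slicing; the variable names are assumptions for illustration.

```python
r = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

matlab_4_to_6 = r[3:6]   # same elements as MATLAB's r(4:6)
first_three = r[:3]      # omitted start defaults to index 0
from_eighth = r[7:]      # omitted stop runs to the end of the list
last_two = r[-2:]        # negative indices count from the end
every_other = r[::2]     # a third value sets the step

print(matlab_4_to_6)  # [4, 5, 6]
print(first_three)    # [1, 2, 3]
print(from_eighth)    # [8, 9, 10]
print(last_two)       # [9, 10]
print(every_other)    # [1, 3, 5, 7, 9]
```

In general MATLAB's r(a:b) corresponds to r[a-1:b] in Python.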
NumPy arrays are important because they can be used in whole array operations. Operations and function calls on whole arrays are much faster than the equivalent code using loops, as they allow optimal use of the processor (such code optimisation is often called vectorisation). In addition code using vector and matrix operations is often shorter and easier to read than the equivalent using loops.
For example we can test which values in A are greater than 0.5, and then copy those values to a new array called B as follows.
>>> A
array([ 0.64799452, 0.41578081, 0.11770639, 0.21143116, 0.98658862,
0.35056233, 0.32420828, 0.5539366 , 0.58682753, 0.53097958])
>>> ind = A > 0.5
>>> ind
array([ True, False, False, False, True, False, False, True, True, True], dtype=bool)
>>> B = A[ind].copy()
>>> B
array([ 0.64799452, 0.98658862, 0.5539366 , 0.58682753, 0.53097958])
Another method of code optimisation is to preallocate arrays, since this is much quicker than growing arrays on-the-fly. In this example we preallocate two arrays at the Python prompt with 10,000 elements each; the first array contains integers and the second contains double precision floating point numbers.
>>> n=10000
>>> A=numpy.zeros(n,int)
>>> B=numpy.zeros(n)
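As a sketch of how preallocated arrays are then used, the following fills the floating point array element by element and inspects the element types. The fill rule (0.5 times the index) is made up for illustration.

```python
import numpy

n = 10000
A = numpy.zeros(n, int)   # preallocated array of integers
B = numpy.zeros(n)        # preallocated array of double precision floats

# Fill B in place; its size never changes, so no reallocation is needed
for i in range(n):
    B[i] = 0.5 * i

print(A.dtype, B.dtype)   # the exact integer type depends on the platform
print(B[:4])
```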
More advanced example
We now look at the Python equivalent of the Relevant example on the MATLAB page, which assumes we have [math]T[/math] returns in a vector r and we want to:
- Count the number of positive, negative and zero returns
- Create an array holding only the positive values
- Create another array holding only the negative values
- Compute the means of the positive and negative returns
The naive algorithm using a loop in Python is as follows.
1. Find the length of the NumPy array holding r, i.e. T=numpy.size(r)
2. Initiate three counter variables, Tplus=0; Tzero=0; Tminus=0
3. Initiate two sum variables, psum=0.0; nsum=0.0
4. Preallocate NumPy arrays rplus=numpy.zeros(T) and rminus=numpy.zeros(T) (since we don’t know how many negative and positive returns we will observe)
5. Set i=0
6. Check whether r[i] is greater than, smaller than or equal to 0
   - If r[i]>0, set rplus[Tplus]=r[i], add r[i] to psum, and add 1 to Tplus
   - Else if r[i]<0, set rminus[Tminus]=r[i], add r[i] to nsum and add 1 to Tminus
   - Else add 1 to Tzero
7. Repeat 6 for [math]i=1,\ldots,(T-1)[/math]
8. Remove spare zeros from rplus and rminus, i.e. rplus=rplus[0:Tplus].copy() and rminus=rminus[0:Tminus].copy()
9. Compute means of rminus and rplus (the number of positive, negative and zero returns are stored in Tplus, Tminus, Tzero)
The Python code is as follows; note, however, that this code isn't completely free of vector operations, since removal of zeros from rplus and rminus is vectorised.
import numpy
T=numpy.size(r)
Tplus=0;Tminus=0;Tzero=0
psum=0.0;nsum=0.0
rplus=numpy.zeros(T);rminus=numpy.zeros(T)
for i in range(T):
    if r[i]>0:
        rplus[Tplus]=r[i] #Store positive return in array rplus
        Tplus+=1 #Increase Tplus by one if return is positive
        psum+=r[i] #Add return to sum of positive values
    elif r[i]<0:
        rminus[Tminus]=r[i] #Store negative return in array rminus
        Tminus+=1 #Increase Tminus by one if return is negative
        nsum+=r[i] #Add return to sum of negative values
    else:
        Tzero+=1 #Increase Tzero by one if return is zero
rplus=rplus[0:Tplus].copy() #Remove spare zeros from rplus
rminus=rminus[0:Tminus].copy() #Remove spare zeros from rminus
meanplus=psum/Tplus # Compute mean of positive returns
meanminus=nsum/Tminus # Compute mean of negative returns
We can create an alternative version that only uses vector operations, based on the following algorithm.
1. Create an array rplus containing the positive values from r
2. Create an array rminus containing the negative values from r
3. Find the length of rplus and assign to Tplus
4. Find the length of rminus and assign to Tminus
5. Calculate Tzero
6. Find the mean of rplus and rminus using vectorised functions
import numpy
rplus=r[r>0].copy() # Create an array containing positive returns
rminus=r[r<0].copy() # Create an array containing negative returns
Tplus=len(rplus) # Count how many positive returns there are
Tminus=len(rminus) # Count how many negative returns there are
Tzero=len(r)-Tplus-Tminus # Calculate the number of zero returns
meanplus=numpy.mean(rplus) # Compute mean of positive returns using numpy.mean
meanminus=numpy.sum(rminus)/Tminus # Compute mean of negative returns using numpy.sum
This version is much shorter and cleaner, and therefore easier to create and maintain.
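To check the vectorised approach by hand, it can be run on a tiny array of made-up returns. The sketch below also counts with numpy.sum on a logical array, which is an alternative to len and is an assumption added here rather than the page's own method.

```python
import numpy

# Made-up returns: two positive, two negative, one zero
r = numpy.array([0.5, -1.0, 0.0, 1.5, -3.0])

rplus = r[r > 0]                   # positive returns
rminus = r[r < 0]                  # negative returns

Tplus = int(numpy.sum(r > 0))      # True counts as 1 when summed
Tminus = int(numpy.sum(r < 0))
Tzero = len(r) - Tplus - Tminus

meanplus = numpy.mean(rplus)       # (0.5 + 1.5) / 2
meanminus = numpy.mean(rminus)     # (-1.0 - 3.0) / 2

print(Tplus, Tminus, Tzero)        # 2 2 1
print(meanplus, meanminus)         # 1.0 -2.0
```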
Running Python programs
For people who are familiar with MATLAB it may be surprising to discover there is no simple way of running a Python program from within Python. If you want to run Python code using standard Python, your choices are either
- Launch it from outside Python, e.g. save to a file myscript.py and at the command line enter python myscript.py
- Convert the program to a function and use the Python module functionality, e.g. save to a file myfunctions.py and use Python's import to make the functions available.
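As a rough sketch of the second option, a file such as myfunctions.py might look like the following. The function mean_of_positive is hypothetical and exists only to show the layout; the final two lines use a standard Python idiom so the file can also be launched directly from the command line.

```python
# Sketch of what a hypothetical file myfunctions.py might contain
def mean_of_positive(values):
    """Return the mean of the strictly positive entries of values.

    Assumes at least one entry is positive; otherwise division by zero occurs.
    """
    positives = [v for v in values if v > 0]
    return sum(positives) / len(positives)

if __name__ == "__main__":
    # Runs only when the file is launched directly, not when it is imported
    print(mean_of_positive([1.0, -2.0, 3.0]))
```

With the file saved in the current directory, the function could then be made available at the Python prompt with from myfunctions import mean_of_positive.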
The Python prompt has other limitations too; for example, it doesn't include commands like pwd, cd, etc. To avoid these limitations we can use IPython (Interactive Python), which provides a command line interface that behaves far more like MATLAB. For example, if we save the first code snippet in More advanced example (see above) to a file called ReturnAnalysis1.py then from within IPython we can create a vector of random values to be analysed, and use run to execute the program. Note that MATLAB-like behaviour, where the script can see the current interactive namespace, requires the -i flag. The -t flag is useful too, as it times how long the script takes to run.
In [1]: n=100000000
In [2]: from numpy.random import rand
In [3]: r=rand(n)*10-5
In [4]: r=rand(n)*10-5
In [5]: run -i -t ReturnAnalysis1
IPython CPU timings (estimated):
User : 150.84 s.
System : 0.44 s.
Wall time: 151.72 s.
In [6]: meanplus
Out[6]: 2.4998243901335169
In [7]: meanminus
Out[7]: -2.4999279003309622
In comparison, running the vectorised version of the code (saved in ReturnAnalysis2.py) gives
In [8]: run -i -t ReturnAnalysis2
IPython CPU timings (estimated):
User : 1.84 s.
System : 0.34 s.
Wall time: 2.19 s.
In [9]: meanplus
Out[9]: 2.4998243901335169
In [10]: meanminus
Out[10]: -2.4999279003309622
Finally, when running the vectorised MATLAB version (saved to ReturnAnalysis2.m) the run time is ...
Footnotes
- ↑ Although Python 2 and 3 are not totally compatible, Python 2.7 is close to Python 3. If you have to use Python 2, it is recommended to use version 2.7, write code as close to Python 3 as possible, and use tools like 2to3 to port to Python 3. Alternatively there is a Python compatibility package called six.
- ↑ In Python 3 the range function creates a range object. However the Python 2 range function creates a list, i.e. stores every integer value required in memory, which is very inefficient if simply looping through a long sequence of integers in a for loop. Python 2 has xrange, which behaves like the Python 3 range.