Difference between revisions of "Python/Program Flow and Logicals"

From ECLR
Jump to: navigation, search
(Running Python programs)
 
(47 intermediate revisions by the same user not shown)
Line 1: Line 1:
 +
The following assumes use of Python version 3, as opposed to Python 2. No more major releases are planned for Python 2, and so version 3 is expected to be the future of Python. The two versions of Python, although similar, are not compatible in a forwards or backwards direction<ref>Although Python 2 and 3 are not totally compatible, Python 2.7 is close to Python 3. If you have to use Python 2, it is recommended using version 2.7, writing code as close to Python 3 as possible, and using tools like ''2to3'' to port to Python 3. Alternatively there is a Python compatibility packages called ''six''.</ref>, and some legacy code exists only as Python 2. Some differences between the two versions are discussed in the footnotes.
 +
 
= Preliminaries =
 
= Preliminaries =
  
Line 5: Line 7:
 
* if, elif, else constructs
 
* if, elif, else constructs
 
* functions
 
* functions
The indent for each block must be the same, the Python programming language also requires you to mark the start of a block with a colon. So where MATLAB used <source enclose=none>end</source> to mark the end of a block of code, Python uses a change in indent. Other than this, simple Python programmes aren't dissimilar to those in MATLAB.
+
The indent for each block must be the same, the Python programming language also requires you to mark the start of a block with a colon. So where MATLAB used <source enclose=none>end</source> to mark the end of a block of code, in Python a code block ends when the indenting reverts. Other than this, simple Python programmes aren't dissimilar to those in MATLAB.
  
 
For example, the simplest case of an <source lang="python" enclose=none>if</source> conditional statement in Python would look something like this
 
For example, the simplest case of an <source lang="python" enclose=none>if</source> conditional statement in Python would look something like this
Line 13: Line 15:
 
   ...
 
   ...
 
</source>
 
</source>
where the code in lines <source lang="python" enclose=none>statement1</source>, <source lang="python" enclose=none>statement2</source>, <source lang="python" enclose=none>...</source> is executed only if <source lang="python" enclose=none>condition</source> is <source lang="python" enclose=none>True</source>. Sharp sighted readers might spot another difference to MATLAB, in Python there is no need to add a semicolon at the end of a line to suppress output.
+
where the code in lines <source lang="python" enclose=none>statement1</source>, <source lang="python" enclose=none>statement2</source>, <source lang="python" enclose=none>...</source> is executed only if <source lang="python" enclose=none>condition</source> is <source lang="python" enclose=none>True</source>. Sharp sighted readers might spot another difference to MATLAB, in Python there is no need to add a semicolon at the end of a line to suppress output, since Python produces no output for lines involving assignment (i.e. lines with the  <source lang="python" enclose=none>=</source> sign).
  
The boolean <source lang="python" enclose=none>condition</source> can be built up using relational and logical operators. Relational operators in Python are similar to those in MATLAB, e.g. <source lang="python" enclose=none>==</source> tests for '''equality''', <source lang="python" enclose=none>></source> and <source lang="python" enclose=none>>=</source> test for '''greater than''' and '''greater than or equal to''' respectively. The main difference is that<source lang="python" enclose=none>!=</source> tests for '''inequality''' in Python, compared to <source enclose=none>~=</source> in MATLAB. Relational operators return boolean values of either <source lang="python" enclose=none>True</source> or <source lang="python" enclose=none>False</source>.
+
The boolean <source lang="python" enclose=none>condition</source> can be built up using relational and logical operators. Relational operators in Python are similar to those in MATLAB, e.g. <source lang="python" enclose=none>==</source> tests for '''equality''', <source lang="python" enclose=none>></source> and <source lang="python" enclose=none>>=</source> test for '''greater than''' and '''greater than or equal to''' respectively. The main difference is that<source lang="python" enclose=none>!=</source> tests for '''inequality''' in Python (compared to <source enclose=none>~=</source> in MATLAB). Relational operators return boolean values of either <source lang="python" enclose=none>True</source> or <source lang="python" enclose=none>False</source>.
  
 
And Python's logical operators are <source lang="python" enclose=none>and</source>, <source lang="python" enclose=none>or</source> and <source lang="python" enclose=none>not</source>, which are hopefully self explanatory.
 
And Python's logical operators are <source lang="python" enclose=none>and</source>, <source lang="python" enclose=none>or</source> and <source lang="python" enclose=none>not</source>, which are hopefully self explanatory.
Line 29: Line 31:
 
   ...
 
   ...
 
</source>   
 
</source>   
where <source lang="python" enclose=none>statement1</source>, <source lang="python" enclose=none>statement2</source>, <source lang="python" enclose=none>...</source> is executed if <source lang="python" enclose=none>condition</source> is <source lang="python" enclose=none>True</source>, and <source lang="python" enclose=none>statement1a</source>, <source lang="python" enclose=none>statement2a</source>, <source lang="python" enclose=none>...</source> is executed if <source lang="python" enclose=none>condition</source> is <source lang="python" enclose=none>False</source>. Note that the code block after the <source lang="python" enclose=none>else</source> starts with a colon, and this code block is also indented.
+
where <source lang="python" enclose=none>statement1</source>, <source lang="python" enclose=none>statement2</source>, <source lang="python" enclose=none>...</source> is executed if <source lang="python" enclose=none>condition</source> is <source lang="python" enclose=none>True</source>, and <source lang="python" enclose=none>statement1a</source>, <source lang="python" enclose=none>statement2a</source>, <source lang="python" enclose=none>...</source> is executed if <source lang="python" enclose=none>condition</source> is <source lang="python" enclose=none>False</source>. Note that the code block after <source lang="python" enclose=none>else</source> starts with a colon, and this code block is also indented.
  
 
Finally, the most general form of this programming construct introduces the <source lang="python" enclose=none>elif</source> keyword (in contrast to <source enclose=none>elseif</source> in MATLAB) to give
 
Finally, the most general form of this programming construct introduces the <source lang="python" enclose=none>elif</source> keyword (in contrast to <source enclose=none>elseif</source> in MATLAB) to give
Line 53: Line 55:
 
</source>
 
</source>
  
Like MATLAB, Python has while and for loops. Unconditional for loops iterate over a '''list''' of values
+
Like MATLAB, Python has while and for loops. Unconditional for loops iterate over a '''list''' or '''range''' of values, e.g.
  
<source lang="python">for CounterVariable in ListOfValues:
+
<source lang="python">for LoopVariable in ListOrRangeOfValues:
 
   statement1
 
   statement1
 
   statement2
 
   statement2
 
   ...
 
   ...
 
</source>
 
</source>
and repeat for as many times as there are elements in the <source lang="python" enclose=none>ListOfValues</source>, each time assigning the next element in the list to the <source lang="python" enclose=none>CounterVariable</source>. The code block associated with the loop is identified by a colon and indenting as described above.
+
and repeat for as many times as there are elements in <source lang="python" enclose=none>ListOrRangeOfValues</source>, each time assigning the next element in the list/range to <source lang="python" enclose=none>LoopVariable</source>. The code block associated with the loop is identified by a colon and indenting as described above.
  
There are various ways of creating a list in Python. The <source lang="python" enclose=none>range</source> function can be used to create sequences of numbers with a defined start, stop and step value. For example to create a list containing the four values 1, 4, 7 and 10, i.e. a sequence starting at 1 with steps of 3, use <source lang="python" enclose=none>range(1,11,3)</source>. Note that the stop value passed to the range function is not included in the list, i.e. <source lang="python" enclose=none>range(1,10,3)</source> would produce only the three numbers 1, 4 & 7. We can verify this at the Python command prompt, i.e.
+
There are various ways of creating a list or range object in Python 3. The <source lang="python" enclose=none>range</source> function can be used to create sequences of integers with a defined start, stop and step value. The advantage of a <source lang="python" enclose=none>range</source> object over a Python <source lang="python" enclose=none>list</source> is that every single integer value is not stored in memory with a <source lang="python" enclose=none>range</source>. <ref>In Python 3 the <source lang="python" enclose=none>range</source> function creates a range object. However the Python 2 <source lang="python" enclose=none>range</source> function creates a list, i.e. stores every integer value required in memory which is very inefficient if simply looping through a long sequence of integers in a <source lang="python" enclose=none>for</source> loop. Python 2 has <source lang="python" enclose=none>xrange</source> that behaves like the Python 3 <source lang="python" enclose=none>range</source>.</ref>. For example to create a range containing the four values 1, 4, 7 and 10, i.e. a sequence starting at 1 with steps of 3, we can use <source lang="python" enclose=none>range(1,11,3)</source>. Note that the stop value passed to the range function is not included, i.e. <source lang="python" enclose=none>range(1,10,3)</source> would produce only the three numbers 1, 4 & 7. We can verify this at the Python command prompt, i.e.
  
 
<source lang="python">>>> range(1,11,3)
 
<source lang="python">>>> range(1,11,3)
Line 68: Line 70:
 
>>> range(1,10,3)
 
>>> range(1,10,3)
 
[1, 4, 7]
 
[1, 4, 7]
 +
</source>
 +
This might seems strange, but makes more sense when we realise the start and step values are optional, and the range function assumes default values of 0 and 1 respectively for these if they are not given, i.e.  <source lang="python" enclose=none>range(N)</source> returns <source lang="python" enclose=none>N</source> values starting at 0, e.g.
 +
<source lang="python">>>> range(5)
 +
[0, 1, 2, 3, 4]
 +
>>> range(10)
 +
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
 
</source>
 
</source>
  
Python lists can also be created from a sequence of values separated by commas within square brackets, e.g. <source lang="python" enclose=none>MyList = [1.0, "hello", 1]</source> creates a list called <source lang="python" enclose=none>MyList</source> containing 3 values, a floating point number <source lang="python" enclose=none>1.0</source>, the string <source lang="python" enclose=none>hello</source> and an integer <source lang="python" enclose=none>1</source>. This example demonstrates that Python lists are general purpose containers, and that elements don't have to be of the same class. It is for this reason that lists are best avoided for numerical calculations unless they are relatively simple, as there are much more efficient containers for numbers, i.e. NumPy arrays, which will be introduced in due course.
+
Python lists can be created from a sequence of values separated by commas within square brackets, e.g. <source lang="python" enclose=none>MyList = [1.0, "hello", 1]</source> creates a list called <source lang="python" enclose=none>MyList</source> containing 3 values, a floating point number <source lang="python" enclose=none>1.0</source>, the string <source lang="python" enclose=none>hello</source> and an integer <source lang="python" enclose=none>1</source>. This example demonstrates that Python lists are general purpose containers, and elements don't have to be of the same class. It is for this reason that lists are best avoided for numerical calculations unless they are relatively simple, as there are much more efficient containers for numbers, i.e. NumPy arrays, which will be introduced in due course.
  
 
Conditional while loops are identified with the <source lang="python" enclose=none>while</source> keyword, so  
 
Conditional while loops are identified with the <source lang="python" enclose=none>while</source> keyword, so  
Line 81: Line 89:
 
will repeatedly execute the code block for as long as <source lang="python" enclose=none>condition</source> is <source lang="python" enclose=none>True</source>.
 
will repeatedly execute the code block for as long as <source lang="python" enclose=none>condition</source> is <source lang="python" enclose=none>True</source>.
  
As in MATLAB, Python allows us to break out of for or while loops, or continues with the next iteration of a loop, using <source enclose=none lang="python">break</source> and <source enclose=none lang="python">continue</source> respectively.  
+
As in MATLAB, Python allows us to '''break''' out of for or while loops, or '''continue''' with the next iteration of a loop, using <source enclose=none lang="python">break</source> and <source enclose=none lang="python">continue</source> respectively.  
  
 
== <source lang="python" enclose=none>for </source> ==
 
== <source lang="python" enclose=none>for </source> ==
  
We now look at the Python equivalents of the MATLAB code discussed in the [[Program_Flow_and_Logicals#for_..._end_loop|MATLAB page on Program Flow and Logicals]]. A description of the mathematics is available on the MATLAB page, for brevity it is not repeated here. In the case when the error terms in <source enclose=none lang="python">e</source> are known in advance, the Python version of the algorithm is:
+
We now look at Python equivalents of the MATLAB <source enclose=none>for ... end</source> loop discussed in the [[Program_Flow_and_Logicals#for_..._end_loop|MATLAB page on Program Flow and Logicals]]. A description of the mathematics can be found on the MATLAB page, for brevity it is not repeated here. In the case when the error terms in <source enclose=none lang="python">e</source> are known in advance, the Python version of the algorithm is:
  
 
# Find length of the list containing the error terms <source enclose=none lang="python">e</source>: <source lang="python" enclose=none>T=len(e)</source>
 
# Find length of the list containing the error terms <source enclose=none lang="python">e</source>: <source lang="python" enclose=none>T=len(e)</source>
Line 93: Line 101:
 
# Repeat line 4 for <math>i=2,...,(T-1)</math>
 
# Repeat line 4 for <math>i=2,...,(T-1)</math>
  
A simple implementation in Python is  
+
A simple implementation in Python follows (a description of how to run this code is given towards the end of this page).
  
 
<source lang="python">T=len(e)
 
<source lang="python">T=len(e)
Line 113: Line 121:
 
   end</source>
 
   end</source>
  
 +
One important difference to MATLAB is that Python list and array indexing starts at 0 and indices are placed inside square brackets (array indices start at 1 in MATLAB). It is also important to understand that Python generally assumes a number to be integer unless there is something to indicate it is a floating point value. Consider the line <source lang="python" enclose=none>y=[0.0]*T</source> that preallocates a Python list containing <source lang="python" enclose=none>T</source> '''floating point''' numbers all set to zero. If this had been written as <source lang="python" enclose=none>y=[0]*T</source> the list would contain <source lang="python" enclose=none>T</source> '''integers''' instead. We can demonstrate this at the Python prompt using the <source lang="python" enclose=none>type</source> function, which tells us the class of an object, e.g.
  
One difference to note is Python list and array indexing starts at 0 and uses square brackets, whereas  array indices start at 1 in MATLAB. The line <source lang="python" enclose=none>y=[0.0]*T</source> preallocates a Python list containing <source lang="python" enclose=none>T</source> floating point numbers all set to zero.
+
<source lang="python">>>>type(0.0)
 +
<class 'float'>
 +
>>> type(0)
 +
<class 'int'>
 +
>>> type(0e0)
 +
<class 'float'>
 +
</source>
 +
This is a good point to mention that the behaviour of integer division changed in Python 3, compared to version 2. In Python 2
 +
<source lang="python">>>>type(1/2)
 +
<type 'int'>
 +
>>> 1/2
 +
0
 +
</source>
 +
whereas in Python 3
 +
<source lang="python">>>>type(1/2)
 +
<class 'float'>
 +
>>> 1/2
 +
0.5
 +
</source>
  
 
== <source lang="python" enclose=none>if else</source> ==
 
== <source lang="python" enclose=none>if else</source> ==
Line 126: Line 153:
 
# Compute <source enclose=none lang="python">y[i]=phi0+phi1*y[i-1]+e[i]</source> for <math>i=1</math>
 
# Compute <source enclose=none lang="python">y[i]=phi0+phi1*y[i-1]+e[i]</source> for <math>i=1</math>
 
# Repeat line 5 for <math>i=2,...,(T-1)</math>
 
# Repeat line 5 for <math>i=2,...,(T-1)</math>
 +
 +
This can be implemented in Python as
 +
<source lang="python">T=len(e)
 +
y=[0.0]*T
 +
y0=0.0
 +
if abs(phi1)<1:
 +
  y0=phi0/(1-phi1)
 +
y[0]=phi0+phi1*y0+e[0]
 +
for i in range(1,T):
 +
  y[i]=phi0+phi1*y[i-1]+e[i]
 +
</source>
 +
which is relatively similar to the MATLAB version
 +
<source>  T=size(e,1);
 +
  y=zeros(T,1);
 +
  y0=0;
 +
  if abs(phi1)<1
 +
  y0=phi0/(1-phi1);
 +
  end
 +
  y(1)=phi0+phi1*y0+e(1)
 +
  for i=2:T
 +
    y(i)=phi0+phi1*y(i-1)+e(i);
 +
  end</source>
  
 
== <source lang="python" enclose=none>while</source> ==
 
== <source lang="python" enclose=none>while</source> ==
 +
 +
The Python alternative of the above code using a conditional <source enclose=none lang="python">while</source> loop implements the following algorithm. Remember that this contrived example is purely for demonstration purposes, and usually <source enclose=none lang="python">while</source> loops are used when the number of iterations is not known in advance.
 +
 +
# Find length of the list containing the error terms e: T=len(e)
 +
# Initialize a list <source enclose=none lang="python">y</source> with the same length as <source enclose=none lang="python">e</source>: <source enclose=none lang="python">y=[0.0]*T</source>
 +
# Check whether <source enclose=none lang="python">abs(phi1)<1</source>. If this statement is true, then <source enclose=none lang="python">y0=phi0/(1-phi1)</source>. Else, <source enclose=none lang="python">y0=0</source>.
 +
# Compute <source enclose=none lang="python">y[0]=phi0+phi1*y0+e[0]</source>.
 +
# Compute <source enclose=none lang="python">y[i]=phi0+phi1*y[i-1]+e[i]</source> for <math>i=1</math>
 +
# Increase i by 1, i.e. <math>i=i+1</math>.
 +
# Repeat lines 5-6 whilst <math>i<T</math>
 +
 +
The Python code is a follows.
 +
<source lang="python">T=len(e)
 +
y=[0.0]*T
 +
y0=0.0
 +
if abs(phi1)<1:
 +
  y0=phi0/(1-phi1)
 +
y[0]=phi0+phi1*y0+e[0]
 +
i=1
 +
while i < T:
 +
  y[i]=phi0+phi1*y[i-1]+e[i]
 +
  i+=1
 +
</source>
 +
This introduces a shorthand also used in other programming languages (e.g. C) as <source enclose=none lang="python">i+=1</source> is shorthand for <source enclose=none lang="python">i=i+1</source>. This shorthand can be used with other operators, e.g. <source enclose=none lang="python">i*=10</source> is equivalent to typing <source enclose=none lang="python">i=i*10</source>.
 +
 +
For comparison, the MATLAB code is
 +
 +
<source>  T=size(e,1);
 +
  y=zeros(T,1);
 +
  y0=0;
 +
  if abs(phi1)<1
 +
  y0=phi0/(1-phi1);
 +
  y(1)=phi0+phi1*y0+e(1)
 +
  i=2;
 +
  while i<=T
 +
    y(i)=phi0+phi1*y(i-1)+e(i);
 +
    i=i+1;
 +
  end</source>
 +
 +
== Improvements on the above (avoiding loops) ==
 +
 +
Like MATLAB, Python allow us to adopt a programming style that both '''simplifies code''', and also '''allows programs to run faster''', in particular:
 +
 +
# Operators, functions and logical expressions can work not only on scalars, but also on vectors, matrices and, in general, on n-dimensional arrays
 +
# Subvectors/submatrices can be extracted using logical arrays
 +
 +
=== Using Python Packages ===
 +
 +
The functionality that allows us to operate on whole vectors and matrices isn't part of core Python, and requires us to use a Python package called [http://www.numpy.org NumPy], which adds other useful functionality including pseudo-random number generators. There are many other Python Packages, these are listed at [https://pypi.python.org/pypi the Python Package Index].
 +
 +
Before using a Python package, the package must be imported, e.g.
 +
<source lang="python">import numpy</source>
 +
Functions within a package are located within '''namespaces'''. Namespaces are useful because they allow package writers to choose functions and variable names without worrying about whether that name has been used elsewhere. For example, NumPy includes a function called <source enclose=none lang="python">rand</source>, which exists within a namespace called ''random''. And the ''random'' namespace is within the NumPy namespace (which is called ''numpy''). After importing NumPy we can use the rand function, but have to include the namespaces within the function call, e.g. to use <source enclose=none lang="python">rand</source> at the Python command prompt (to generate 5 random numbers)
 +
<source lang="python">
 +
>>> import numpy
 +
>>> A = numpy.random.rand(5)
 +
>>> A
 +
array([ 0.50639352,  0.44000756,  0.16118149,  0.69615487,  0.3887179 ])
 +
</source>
 +
So <source enclose=none lang="python">numpy.random.rand</source> refers to the <source enclose=none lang="python">rand</source> function in the <source enclose=none lang="python">numpy.random</source> namespace. While this allows safe reuse of names, it does potentially introduce a lot of extra typing, and so Python includes ways to simplify our code. For example, we can import individual functions from a namespace as follows
 +
<source lang="python">
 +
>>> from numpy.random import rand
 +
>>> A = rand(4)
 +
>>> A
 +
array([ 0.25254338,  0.95567921,  0.28244092,  0.92564069])
 +
</source>
 +
and we can also rename the function as we import it
 +
<source lang="python">>>> from numpy.random import rand as nprand
 +
>>> A = nprand(4)
 +
>>> A
 +
array([ 0.96127673,  0.57402182,  0.36119553,  0.99832014])
 +
</source>
 +
In addition we can rename the namespace
 +
<source lang="python">>>> import numpy.random as npr
 +
>>> A = npr.rand(4)
 +
>>> A
 +
array([ 0.4282803 ,  0.80106321,  0.7078212 ,  0.13823879])
 +
</source>
 +
 +
=== Simple example ===
 +
In the above example the NumPy rand function returned random values in a Numpy array, as can be demonstrated at the Python command line.
 +
<source lang="python">>>> import numpy
 +
>>> A = numpy.random.rand(10)
 +
>>> type(A)
 +
<class 'numpy.ndarray'>
 +
>>> A
 +
array([ 0.64799452,  0.41578081,  0.11770639,  0.21143116,  0.98658862,
 +
        0.35056233,  0.32420828,  0.5539366 ,  0.58682753,  0.53097958])
 +
</source>
 +
 +
NumPy arrays have significant differences to MATLAB arrays (and NumPy also contains a matrix class) so it's important to read the [http://docs.scipy.org/doc/ NumPy documentation], which includes [http://wiki.scipy.org/Tentative_NumPy_Tutorial tutorials] and a [http://wiki.scipy.org/NumPy_for_Matlab_Users comparison of NumPy with MATLAB]. One important difference is the <source enclose=none lang="python">copy</source> function is used to copy values from one array to another, rather than assignment with <source enclose=none lang="python">=</source>. For example, given a NumPy array <source enclose=none lang="python">A</source>, the assignment <source enclose=none lang="python">B=A</source> '''does not copy''' values in <source enclose=none lang="python">A</source> to a new array <source enclose=none lang="python">B</source>, instead <source enclose=none lang="python">A</source> and <source enclose=none lang="python">B</source> are simply two names for the same array of values. However <source enclose=none lang="python">B=A.copy()</source> does copy all values in <source enclose=none lang="python">A</source> into a new array <source enclose=none lang="python">B</source>.
 +
 +
NumPy array (and Python list) slices work in subtly different ways to MATLAB's too. For example, <source enclose=none lang="python">A[m:n]</source> returns all values from the element with the index <source enclose=none lang="python">m</source> to the element with index <source enclose=none lang="python">n-1</source>, and because the first element has index 0, we receive the (m+1)<sup>th</sup> to n<sup>th</sup> values, e.g.
 +
<source lang="python">
 +
>>> r=[1,2,3,4,5,6,7,8,9,10]
 +
>>> r[0:10]
 +
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
 +
>>> r[4:6]
 +
[5, 6]
 +
</source>
 +
Compare this to MATLAB
 +
<source>
 +
>> r=[1,2,3,4,5,6,7,8,9,10]
 +
r =
 +
    1    2    3    4    5    6    7    8    9    10
 +
>> r(1:10)
 +
ans =
 +
    1    2    3    4    5    6    7    8    9    10
 +
>> r(4:6)
 +
ans =
 +
    4    5    6
 +
</source>
 +
 +
NumPy arrays are important because they can be used in whole array operations. Operations and function calls on whole arrays are much faster than the equivalent code using loops, as they allow optimal use of the processor (such code optimisation is often called vectorisation). In addition code using vector and matrix operations is often shorter and easier to read that the equivalent using loops.
 +
 +
For example we can test which values in <source enclose=none lang="python">A</source> are greater than 0.5, and then copy those values to a new array called <source enclose=none lang="python">B</source> as follows.
 +
<source lang="python">>>> A
 +
array([ 0.64799452,  0.41578081,  0.11770639,  0.21143116,  0.98658862,
 +
        0.35056233,  0.32420828,  0.5539366 ,  0.58682753,  0.53097958])
 +
>>> ind = A > 0.5
 +
>>> ind
 +
array([ True, False, False, False,  True, False, False,  True,  True,  True], dtype=bool)
 +
>>> B = A[ind].copy()
 +
>>> B
 +
array([ 0.64799452,  0.98658862,  0.5539366 ,  0.58682753,  0.53097958])
 +
</source>
 +
Another method of code optimisation is to preallocate arrays, this operation is much quicker than growing arrays on-the-fly. In this example we preallocate two arrays at the Python prompt with 10,000 elements each, the first array contains integers and the second contains double precision floating point numbers.
 +
<source lang="python">>>> n=10000
 +
>>> A=numpy.zeros(n,int)
 +
>>> B=A=numpy.zeros(n)
 +
</source>
 +
 +
=== More advanced example ===
 +
We now look at the Python equivalent of the code on the MATLAB page in the section entitled [[Program_Flow_and_Logicals#Relevant_example|Relevant example]], which assumed we have <math>T</math> returns in a vector <source enclose=none>r</source> and we want to:
 +
 +
# Count the number of positive, negative and zero returns
 +
# Create an array holding only the positive values
 +
# Create another array holding only the negative values
 +
# Compute the means of the positive and negative returns
 +
 +
A naive Python algorithm that uses a loop rather than vectorisation is as follows.
 +
# Find the length of the NumPy array holding  <source lang="python" enclose=none>r</source>, i.e. <source lang="python" enclose=none>T=numpy.size(r)</source>
 +
# Initiate three counter variables, <source lang="python" enclose=none>Tplus=0; Tzero=0; Tminus=0</source>
 +
# Initiate two sum variables, <source lang="python" enclose=none>psum=0.0; nsum=0.0</source>
 +
# Preallocate NumPy arrays <source lang="python" enclose=none>rplus=numpy.zeros(T)</source> and <source lang="python" enclose=none>rminus=numpy.zeros(T)</source> (since we don’t know how many negative and positive returns we will observe)
 +
# Set <source lang="python" enclose=none>i=0</source>
 +
# Check whether <source lang="python" enclose=none>r[i]</source> is greater, smaller or equal to 0
 +
#* If <source lang="python" enclose=none>r[i]>0</source>, set <source lang="python" enclose=none>rplus[Tplus]=r[i]</source>, add <source lang="python" enclose=none>r[i]</source> to <source lang="python" enclose=none>psum</source>, and add 1 to <source lang="python" enclose=none>Tplus</source>
 +
#* Else if <source lang="python" enclose=none>r[i]<0</source> set <source lang="python" enclose=none>rminus[Tminus]=r[i]</source>, add <source lang="python" enclose=none>r[i]</source> to <source lang="python" enclose=none>nsum</source> and add 1 to <source lang="python" enclose=none>Tminus</source>
 +
#* Else add 1 to <source lang="python" enclose=none>Tzero</source>
 +
# Repeat 6 for <math>i=1,\ldots,(T-1)</math>
 +
# Remove spare zeros from <source lang="python" enclose=none>rplus</source> and <source lang="python" enclose=none>rminus</source>, i.e. <source lang="python" enclose=none>rplus=rplus[0:Tplus].copy()</source> and <source lang="python" enclose=none>rminus=rminus[0:Tminus].copy()</source>
 +
# Compute means of rminus and rplus (the number of positive, negative and zero returns are stored in <source lang="python" enclose=none>Tplus,Tminus,Tzero</source>)
 +
 +
The Python code is as follows, however note that this code isn't completely free of vector operations, since removal of zeros from <source lang="python" enclose=none>rplus</source> and <source lang="python" enclose=none>rminus</source> is vectorised.
 +
<source lang="python">import numpy
 +
T=numpy.size(r)
 +
Tplus=0;Tminus=0;Tzero=0
 +
psum=0.0;nsum=0.0
 +
rplus=numpy.zeros(T);rminus=numpy.zeros(T)
 +
for i in range(T):
 +
  if r[i]>0:
 +
      rplus[Tplus]=r[i]  #Store positive return in array rplus
 +
      Tplus+=1            #Increase Tplus by one if return is positive
 +
      psum+=r[i]          #Add return to sum of positive values
 +
  elif r[i]<0:
 +
      rminus[Tminus]=r[i] #Store negative return in array rminus
 +
      Tminus+=1          #Increase Tminus by one if return is negative
 +
      nsum+=r[i]          #Add return to sum of negative values
 +
  else:
 +
      Tzero+=1            #Increase Tzero by one if return is zero
 +
rplus=rplus[0:Tplus].copy()      #Remove zeros from rplus
 +
rminus=rminus[1:Tminus].copy()  #Remove zeros from rminus
 +
meanplus=psum/Tplus              # Compute mean of positive returns
 +
meanminus=nsum/Tminus            # Compute mean of negative returns
 +
</source>
 +
We can create an alternative algorithm that only uses vector operations, using the following algorithm.
 +
# Create an array <source lang="python" enclose=none>rplus</source> containing the positive values from <source lang="python" enclose=none>r</source>
 +
# Create an array <source lang="python" enclose=none>rminus</source> containing the negative values from <source lang="python" enclose=none>r</source>
 +
# Find the length of <source lang="python" enclose=none>rplus</source> and assign to <source lang="python" enclose=none>Tplus</source>
 +
# Find the length of <source lang="python" enclose=none>rminus</source> and assign to <source lang="python" enclose=none>Tminus</source>
 +
# Calculate <source lang="python" enclose=none>Tzero</source>
 +
# Find the mean of <source lang="python" enclose=none>rplus</source> and <source lang="python" enclose=none>rminus</source> using vectorised functions
 +
<source lang="python">import numpy
 +
rplus=r[r>0].copy()        # Create an array containing positive returns
 +
rminus=r[r<0].copy()        # Create an array containing negative returns
 +
Tplus=len(rplus)            # Count how many positive returns there are
 +
Tminus=len(rminus)          # Count how many negative returns there are
 +
Tzero=len(r)-Tplus-Tminus  # Calculate the number of zero returns
 +
meanplus=numpy.mean(rplus)        # Compute mean of positive returns using numpy.mean
 +
meanminus=numpy.sum(rminus)/Tminus # Compute mean of negative returns using numpy.sum
 +
</source>
 +
This version is much shorter and cleaner, and therefore easier to create and maintain.
 +
 +
== Running Python programs ==
 +
For people who are familiar with MATLAB it may be surprising to discover there is no simple way of running a Python program from within Python. If you want to run Python code using the standard Python interpreter, your choices are either
 +
# Launch it from outside Python, e.g. save to a file <code>myscript.py</code> and at the command line enter <code>python myscript.py</code>
 +
# Convert the program to a function and use the [http://docs.python.org/3/tutorial/modules.html Python module functionality], e.g. save the function in a file <code>myfunctions.py</code> and use Python's <source enclose=none lang="python">import</source> to make the function available.
 +
 +
The first method can be demonstrated by creating a text file <code>ReturnAnalysis.py</code> containing the following program (modified from the vectorised [[Python/Program_Flow_and_Logicals#More_advanced_example|More advanced example]] above).
 +
<source lang="python">import numpy
 +
n=500000000
 +
r=numpy.random.rand(n)*10-5
 +
 +
import time
 +
time1 = time.clock()
 +
rplus=r[r>0].copy()        # Create an array containing positive returns
 +
rminus=r[r<0].copy()        # Create an array containing negative returns
 +
Tplus=len(rplus)            # Count how many positive returns there are
 +
Tminus=len(rminus)          # Count how many negative returns there are
 +
Tzero=len(r)-Tplus-Tminus  # Calculate the number of zero returns
 +
meanplus=numpy.mean(rplus)        # Compute mean of positive returns using numpy.mean
 +
meanminus=numpy.sum(rminus)/Tminus # Compute mean of negative returns using numpy.sum
 +
time2 = time.clock()
 +
print(time2-time1)
 +
</source>
 +
In this example the array of values <source lang="python" enclose=none>r</source> is generated using the <source lang="python" enclose=none>rand</source> function, in a real scenario these values might be loaded from a file. To run this from the '''operating system command line''' we can enter <code>python ReturnAnalysis.py</code>. Note that this program outputs how long it takes to run, and on my desktop takes around 12.3s to complete (using the Anaconda Python distribution with the Accelerate package).
 +
 +
Using the second method we can create a function, the following example undertakes the computation and returns the values required.
 +
<source lang="python">def returnanalysis(r):
 +
  import numpy
 +
  rplus=r[r>0].copy()        # Create an array containing positive returns
 +
  rminus=r[r<0].copy()        # Create an array containing negative returns
 +
  Tplus=len(rplus)            # Count how many positive returns there are
 +
  Tminus=len(rminus)          # Count how many negative returns there are
 +
  Tzero=len(r)-Tplus-Tminus  # Calculate the number of zero returns
 +
  meanplus=numpy.mean(rplus)        # Compute mean of positive returns using numpy.mean
 +
  meanminus=numpy.sum(rminus)/Tminus # Compute mean of negative returns using numpy.sum
 +
  return meanplus, meanminus, Tplus, Tminus, Tzero
 +
</source>
 +
If this is saved to a file called <code>myfunctions.py</code>, we can import and use the function from the Python prompt as follows.
 +
<source lang="python">
 +
>>> import numpy
 +
>>> n=500000000
 +
>>> r=numpy.random.rand(n)*10-5
 +
>>> import myfunctions
 +
>>> mplus, mminus, Tp, Tm, Tz = myfunctions.returnanalysis(r)
 +
>>> mplus
 +
2.4999997176593398
 +
>>> mminus
 +
-2.4999816498237375
 +
</source>
 +
The Python prompt has various other limitations which mean it isn't ideal for interactive work. For example, it doesn't include commands like <source enclose=none>pwd</source>, <source enclose=none>cd</source>, <source enclose=none>pwd</source>, etc. An improved command line is included in [http://ipython.org/ IPython (Interactive Python)] which behaves far more like MATLAB. For example if we save the first Python code snippet from the [[Python/Program_Flow_and_Logicals#More_advanced_example|'''More advanced Example''' section (see above)]] to a file called <code>ReturnAnalysis1.py</code>, we can execute this program from within IPython using <source enclose=none lang="python">run</source>. For MATLAB-like behaviour, where a script can see the variables in the interactive namespace, we need to use <source enclose=none lang="python">run</source> with the <source enclose=none lang="python">-i</source> flag. The <source enclose=none lang="python">-t</source> flag is useful too, as it times how long the script takes to run.
 +
<source lang="python">
 +
In [1]: n=500000000
 +
In [2]: from numpy.random import rand
 +
In [3]: r=rand(n)*10-5
 +
In [4]: run -i -t ReturnAnalysis1
 +
 +
IPython CPU timings (estimated):
 +
  User  :    978.66 s.
 +
  System :      0.00 s.
 +
Wall time:    978.71 s.
 +
 +
In [5]: meanplus
 +
Out[5]: 2.5001402997170192
 +
 +
In [6]: meanminus
 +
Out[6]: -2.5000714107736286
 +
</source>
 +
The above example used the unvectorised version, and to demonstrate the importance of vectorisation in getting good performance we compare with the vectorised version (saved in <code>ReturnAnalysis2.py</code>).
 +
<source lang="python">
 +
In [7]: run -i -t ReturnAnalysis2
 +
 +
IPython CPU timings (estimated):
 +
  User  :      12.18 s.
 +
  System :      0.00 s.
 +
Wall time:      12.18 s.
 +
 +
In [8]: meanplus
 +
Out[8]: 2.5001402997170192
 +
 +
In [9]: meanminus
 +
Out[9]: -2.5000714107736286
 +
</source>
 +
Finally to compare with the vectorised MATLAB version (saved to a file called <code>ReturnAnalysis2.m</code>), the run time is as follows.
 +
<source>
 +
>> n=500000000;
 +
>> r=rand(n,1)*10-5;
 +
>> tic,ReturnAnalysis2,toc
 +
Elapsed time is 11.193218 seconds.
 +
</source>
 +
 +
=Footnotes=
 +
 +
<references />

Latest revision as of 13:37, 16 October 2013

The following assumes use of Python version 3, as opposed to Python 2. No more major releases are planned for Python 2, and so version 3 is expected to be the future of Python. The two versions of Python, although similar, are not compatible in a forwards or backwards direction[1], and some legacy code exists only as Python 2. Some differences between the two versions are discussed in the footnotes.

Preliminaries

One important thing to understand when programming in Python is that correct indenting of code is essential. The Python programming language was designed with readability in mind, and as a result forces you to indent code blocks, e.g.

  • while and for loops
  • if, elif, else constructs
  • functions

The indent for each block must be the same, the Python programming language also requires you to mark the start of a block with a colon. So where MATLAB used end to mark the end of a block of code, in Python a code block ends when the indenting reverts. Other than this, simple Python programmes aren't dissimilar to those in MATLAB.

For example, the simplest case of an if conditional statement in Python would look something like this

if condition:
   statement1
   statement2
   ...

where the code in lines statement1, statement2, ... is executed only if condition is True. Sharp sighted readers might spot another difference to MATLAB, in Python there is no need to add a semicolon at the end of a line to suppress output, since Python produces no output for lines involving assignment (i.e. lines with the = sign).

The boolean condition can be built up using relational and logical operators. Relational operators in Python are similar to those in MATLAB, e.g. == tests for equality, > and >= test for greater than and greater than or equal to respectively. The main difference is that!= tests for inequality in Python (compared to ~= in MATLAB). Relational operators return boolean values of either True or False.

And Python's logical operators are and, or and not, which are hopefully self explanatory.

The if functionality can be expanded using else as follows

if condition:
   statement1
   statement2
   ...
else:
   statement1a
   statement2a
   ...

where statement1, statement2, ... is executed if condition is True, and statement1a, statement2a, ... is executed if condition is False. Note that the code block after else starts with a colon, and this code block is also indented.

Finally, the most general form of this programming construct introduces the elif keyword (in contrast to elseif in MATLAB) to give

if condition1:
   statement1
   statement2
   ...
elif condition2:
   statement1a
   statement2a
   ...
   ...
   ...
elif conditionN:
   statement1b
   statement2b
   ...
else:
   statement1c
   statement2c
   ...

Like MATLAB, Python has while and for loops. Unconditional for loops iterate over a list or range of values, e.g.

for LoopVariable in ListOrRangeOfValues:
   statement1
   statement2
   ...

and repeat for as many times as there are elements in ListOrRangeOfValues, each time assigning the next element in the list/range to LoopVariable. The code block associated with the loop is identified by a colon and indenting as described above.

There are various ways of creating a list or range object in Python 3. The range function can be used to create sequences of integers with a defined start, stop and step value. The advantage of a range object over a Python list is that every single integer value is not stored in memory with a range. [2]. For example to create a range containing the four values 1, 4, 7 and 10, i.e. a sequence starting at 1 with steps of 3, we can use range(1,11,3). Note that the stop value passed to the range function is not included, i.e. range(1,10,3) would produce only the three numbers 1, 4 & 7. We can verify this at the Python command prompt, i.e.

>>> range(1,11,3)
[1, 4, 7, 10]
>>> range(1,10,3)
[1, 4, 7]

This might seems strange, but makes more sense when we realise the start and step values are optional, and the range function assumes default values of 0 and 1 respectively for these if they are not given, i.e. range(N) returns N values starting at 0, e.g.

>>> range(5)
[0, 1, 2, 3, 4]
>>> range(10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Python lists can be created from a sequence of values separated by commas within square brackets, e.g. MyList = [1.0, "hello", 1] creates a list called MyList containing 3 values, a floating point number 1.0, the string hello and an integer 1. This example demonstrates that Python lists are general purpose containers, and elements don't have to be of the same class. It is for this reason that lists are best avoided for numerical calculations unless they are relatively simple, as there are much more efficient containers for numbers, i.e. NumPy arrays, which will be introduced in due course.

Conditional while loops are identified with the while keyword, so

while condition:
   statement1
   statement2
   ...

will repeatedly execute the code block for as long as condition is True.

As in MATLAB, Python allows us to break out of for or while loops, or continue with the next iteration of a loop, using break and continue respectively.

for

We now look at Python equivalents of the MATLAB for ... end loop discussed in the MATLAB page on Program Flow and Logicals. A description of the mathematics can be found on the MATLAB page, for brevity it is not repeated here. In the case when the error terms in e are known in advance, the Python version of the algorithm is:

  1. Find length of the list containing the error terms e: T=len(e)
  2. Initialize a list y with the same length as vector e: y=[0.0]*T
  3. Compute y[0]=phi0+phi1*y0+e[0]. Please remember, we assume that [math]y_0=E(y)=\phi_0/(1-\phi_1)[/math]
  4. Compute y[i]=phi0+phi1*y[i-1]+e[i] for [math]i=1[/math]
  5. Repeat line 4 for [math]i=2,...,(T-1)[/math]

A simple implementation in Python follows (a description of how to run this code is given towards the end of this page).

T=len(e)
y=[0.0]*T
y0=phi0/(1-phi1)
y[0]=phi0+phi1*y0+e[0]
for i in range(1,T):
   y[i]=phi0+phi1*y[i-1]+e[i]

and for comparison the MATLAB code is

  T=size(e,1);
  y=zeros(T,1);
  y0=phi0/(1-phi1);
  y(1)=phi0+phi1*y0+e(1);
  for i=2:T
    y(i)=phi0+phi1*y(i-1)+e(i);
  end

One important difference to MATLAB is that Python list and array indexing starts at 0 and indices are placed inside square brackets (array indices start at 1 in MATLAB). It is also important to understand that Python generally assumes a number to be integer unless there is something to indicate it is a floating point value. Consider the line y=[0.0]*T that preallocates a Python list containing T floating point numbers all set to zero. If this had been written as y=[0]*T the list would contain T integers instead. We can demonstrate this at the Python prompt using the type function, which tells us the class of an object, e.g.

>>>type(0.0)
<class 'float'>
>>> type(0)
<class 'int'>
>>> type(0e0)
<class 'float'>

This is a good point to mention that the behaviour of integer division changed in Python 3, compared to version 2. In Python 2

>>>type(1/2)
<type 'int'>
>>> 1/2
0

whereas in Python 3

>>>type(1/2)
<class 'float'>
>>> 1/2
0.5

if else

As above, a description of the mathematics can be found on the MATLAB page on Program Flow and Logicals. The Python algorithm is now

  1. Find length of the list containing the error terms e: T=len(e)
  2. Initialize a list y with the same length as e: y=[0.0]*T
  3. Check whether abs(phi1)<1. If this statement is true, then y0=phi0/(1-phi1). Else, y0=0. Please remember, we set [math]y_0=E(y_0)[/math].
  4. Compute y[0]=phi0+phi1*y0+e[0].
  5. Compute y[i]=phi0+phi1*y[i-1]+e[i] for [math]i=1[/math]
  6. Repeat line 5 for [math]i=2,...,(T-1)[/math]

This can be implemented in Python as

T=len(e)
y=[0.0]*T
y0=0.0
if abs(phi1)<1:
   y0=phi0/(1-phi1)
y[0]=phi0+phi1*y0+e[0]
for i in range(1,T):
   y[i]=phi0+phi1*y[i-1]+e[i]

which is relatively similar to the MATLAB version

  T=size(e,1);
  y=zeros(T,1);
  y0=0;
  if abs(phi1)<1
  y0=phi0/(1-phi1);
  end
  y(1)=phi0+phi1*y0+e(1)
  for i=2:T
    y(i)=phi0+phi1*y(i-1)+e(i);
  end

while

The Python alternative of the above code using a conditional while loop implements the following algorithm. Remember that this contrived example is purely for demonstration purposes, and usually while loops are used when the number of iterations is not known in advance.

  1. Find length of the list containing the error terms e: T=len(e)
  2. Initialize a list y with the same length as e: y=[0.0]*T
  3. Check whether abs(phi1)<1. If this statement is true, then y0=phi0/(1-phi1). Else, y0=0.
  4. Compute y[0]=phi0+phi1*y0+e[0].
  5. Compute y[i]=phi0+phi1*y[i-1]+e[i] for [math]i=1[/math]
  6. Increase i by 1, i.e. [math]i=i+1[/math].
  7. Repeat lines 5-6 whilst [math]i\lt T[/math]

The Python code is a follows.

T=len(e)
y=[0.0]*T
y0=0.0
if abs(phi1)<1:
   y0=phi0/(1-phi1)
y[0]=phi0+phi1*y0+e[0]
i=1
while i < T:
   y[i]=phi0+phi1*y[i-1]+e[i]
   i+=1

This introduces a shorthand also used in other programming languages (e.g. C) as i+=1 is shorthand for i=i+1. This shorthand can be used with other operators, e.g. i*=10 is equivalent to typing i=i*10.

For comparison, the MATLAB code is

  T=size(e,1);
  y=zeros(T,1);
  y0=0;
  if abs(phi1)<1
  y0=phi0/(1-phi1);
  y(1)=phi0+phi1*y0+e(1)
  i=2;
  while i<=T
    y(i)=phi0+phi1*y(i-1)+e(i);
    i=i+1;
  end

Improvements on the above (avoiding loops)

Like MATLAB, Python allow us to adopt a programming style that both simplifies code, and also allows programs to run faster, in particular:

  1. Operators, functions and logical expressions can work not only on scalars, but also on vectors, matrices and, in general, on n-dimensional arrays
  2. Subvectors/submatrices can be extracted using logical arrays

Using Python Packages

The functionality that allows us to operate on whole vectors and matrices isn't part of core Python, and requires us to use a Python package called NumPy, which adds other useful functionality including pseudo-random number generators. There are many other Python Packages, these are listed at the Python Package Index.

Before using a Python package, the package must be imported, e.g.

import numpy

Functions within a package are located within namespaces. Namespaces are useful because they allow package writers to choose functions and variable names without worrying about whether that name has been used elsewhere. For example, NumPy includes a function called rand, which exists within a namespace called random. And the random namespace is within the NumPy namespace (which is called numpy). After importing NumPy we can use the rand function, but have to include the namespaces within the function call, e.g. to use rand at the Python command prompt (to generate 5 random numbers)

>>> import numpy
>>> A = numpy.random.rand(5)
>>> A
array([ 0.50639352,  0.44000756,  0.16118149,  0.69615487,  0.3887179 ])

So numpy.random.rand refers to the rand function in the numpy.random namespace. While this allows safe reuse of names, it does potentially introduce a lot of extra typing, and so Python includes ways to simplify our code. For example, we can import individual functions from a namespace as follows

>>> from numpy.random import rand
>>> A = rand(4)
>>> A
array([ 0.25254338,  0.95567921,  0.28244092,  0.92564069])

and we can also rename the function as we import it

>>> from numpy.random import rand as nprand
>>> A = nprand(4)
>>> A
array([ 0.96127673,  0.57402182,  0.36119553,  0.99832014])

In addition we can rename the namespace

>>> import numpy.random as npr
>>> A = npr.rand(4)
>>> A
array([ 0.4282803 ,  0.80106321,  0.7078212 ,  0.13823879])

Simple example

In the above example the NumPy rand function returned random values in a Numpy array, as can be demonstrated at the Python command line.

>>> import numpy
>>> A = numpy.random.rand(10)
>>> type(A)
<class 'numpy.ndarray'>
>>> A
array([ 0.64799452,  0.41578081,  0.11770639,  0.21143116,  0.98658862,
        0.35056233,  0.32420828,  0.5539366 ,  0.58682753,  0.53097958])

NumPy arrays have significant differences to MATLAB arrays (and NumPy also contains a matrix class) so it's important to read the NumPy documentation, which includes tutorials and a comparison of NumPy with MATLAB. One important difference is the copy function is used to copy values from one array to another, rather than assignment with =. For example, given a NumPy array A, the assignment B=A does not copy values in A to a new array B, instead A and B are simply two names for the same array of values. However B=A.copy() does copy all values in A into a new array B.

NumPy array (and Python list) slices work in subtly different ways to MATLAB's too. For example, A[m:n] returns all values from the element with the index m to the element with index n-1, and because the first element has index 0, we receive the (m+1)th to nth values, e.g.

>>> r=[1,2,3,4,5,6,7,8,9,10]
>>> r[0:10]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> r[4:6]
[5, 6]

Compare this to MATLAB

>> r=[1,2,3,4,5,6,7,8,9,10]
r =
     1     2     3     4     5     6     7     8     9    10
>> r(1:10)
ans =
     1     2     3     4     5     6     7     8     9    10
>> r(4:6)
ans =
     4     5     6

NumPy arrays are important because they can be used in whole array operations. Operations and function calls on whole arrays are much faster than the equivalent code using loops, as they allow optimal use of the processor (such code optimisation is often called vectorisation). In addition code using vector and matrix operations is often shorter and easier to read that the equivalent using loops.

For example we can test which values in A are greater than 0.5, and then copy those values to a new array called B as follows.

>>> A
array([ 0.64799452,  0.41578081,  0.11770639,  0.21143116,  0.98658862,
        0.35056233,  0.32420828,  0.5539366 ,  0.58682753,  0.53097958])
>>> ind = A > 0.5
>>> ind
array([ True, False, False, False,  True, False, False,  True,  True,  True], dtype=bool)
>>> B = A[ind].copy()
>>> B
array([ 0.64799452,  0.98658862,  0.5539366 ,  0.58682753,  0.53097958])

Another method of code optimisation is to preallocate arrays, this operation is much quicker than growing arrays on-the-fly. In this example we preallocate two arrays at the Python prompt with 10,000 elements each, the first array contains integers and the second contains double precision floating point numbers.

>>> n=10000
>>> A=numpy.zeros(n,int)
>>> B=A=numpy.zeros(n)

More advanced example

We now look at the Python equivalent of the code on the MATLAB page in the section entitled Relevant example, which assumed we have [math]T[/math] returns in a vector r and we want to:

  1. Count the number of positive, negative and zero returns
  2. Create an array holding only the positive values
  3. Create another array holding only the negative values
  4. Compute the means of the positive and negative returns

A naive Python algorithm that uses a loop rather than vectorisation is as follows.

  1. Find the length of the NumPy array holding r, i.e. T=numpy.size(r)
  2. Initiate three counter variables, Tplus=0; Tzero=0; Tminus=0
  3. Initiate two sum variables, psum=0.0; nsum=0.0
  4. Preallocate NumPy arrays rplus=numpy.zeros(T) and rminus=numpy.zeros(T) (since we don’t know how many negative and positive returns we will observe)
  5. Set i=0
  6. Check whether r[i] is greater, smaller or equal to 0
    • If r[i]>0, set rplus[Tplus]=r[i], add r[i] to psum, and add 1 to Tplus
    • Else if r[i]<0 set rminus[Tminus]=r[i], add r[i] to nsum and add 1 to Tminus
    • Else add 1 to Tzero
  7. Repeat 6 for [math]i=1,\ldots,(T-1)[/math]
  8. Remove spare zeros from rplus and rminus, i.e. rplus=rplus[0:Tplus].copy() and rminus=rminus[0:Tminus].copy()
  9. Compute means of rminus and rplus (the number of positive, negative and zero returns are stored in Tplus,Tminus,Tzero)

The Python code is as follows, however note that this code isn't completely free of vector operations, since removal of zeros from rplus and rminus is vectorised.

import numpy
T=numpy.size(r)
Tplus=0;Tminus=0;Tzero=0
psum=0.0;nsum=0.0
rplus=numpy.zeros(T);rminus=numpy.zeros(T)
for i in range(T):
   if r[i]>0:
      rplus[Tplus]=r[i]   #Store positive return in array rplus
      Tplus+=1            #Increase Tplus by one if return is positive
      psum+=r[i]          #Add return to sum of positive values
   elif r[i]<0:
      rminus[Tminus]=r[i] #Store negative return in array rminus
      Tminus+=1           #Increase Tminus by one if return is negative
      nsum+=r[i]          #Add return to sum of negative values
   else:
      Tzero+=1            #Increase Tzero by one if return is zero
rplus=rplus[0:Tplus].copy()      #Remove zeros from rplus
rminus=rminus[1:Tminus].copy()   #Remove zeros from rminus
meanplus=psum/Tplus              # Compute mean of positive returns 
meanminus=nsum/Tminus            # Compute mean of negative returns

We can create an alternative algorithm that only uses vector operations, using the following algorithm.

  1. Create an array rplus containing the positive values from r
  2. Create an array rminus containing the negative values from r
  3. Find the length of rplus and assign to Tplus
  4. Find the length of rminus and assign to Tminus
  5. Calculate Tzero
  6. Find the mean of rplus and rminus using vectorised functions
import numpy
rplus=r[r>0].copy()         # Create an array containing positive returns
rminus=r[r<0].copy()         # Create an array containing negative returns
Tplus=len(rplus)            # Count how many positive returns there are 
Tminus=len(rminus)          # Count how many negative returns there are 
Tzero=len(r)-Tplus-Tminus   # Calculate the number of zero returns
meanplus=numpy.mean(rplus)         # Compute mean of positive returns using numpy.mean
meanminus=numpy.sum(rminus)/Tminus # Compute mean of negative returns using numpy.sum

This version is much shorter and cleaner, and therefore easier to create and maintain.

Running Python programs

For people who are familiar with MATLAB it may be surprising to discover there is no simple way of running a Python program from within Python. If you want to run Python code using the standard Python interpreter, your choices are either

  1. Launch it from outside Python, e.g. save to a file myscript.py and at the command line enter python myscript.py
  2. Convert the program to a function and use the Python module functionality, e.g. save the function in a file myfunctions.py and use Python's import to make the function available.

The first method can be demonstrated by creating a text file ReturnAnalysis.py containing the following program (modified from the vectorised More advanced example above).

import numpy
n=500000000
r=numpy.random.rand(n)*10-5

import time
time1 = time.clock()
rplus=r[r>0].copy()         # Create an array containing positive returns
rminus=r[r<0].copy()        # Create an array containing negative returns
Tplus=len(rplus)            # Count how many positive returns there are 
Tminus=len(rminus)          # Count how many negative returns there are 
Tzero=len(r)-Tplus-Tminus   # Calculate the number of zero returns
meanplus=numpy.mean(rplus)         # Compute mean of positive returns using numpy.mean
meanminus=numpy.sum(rminus)/Tminus # Compute mean of negative returns using numpy.sum
time2 = time.clock()
print(time2-time1)

In this example the array of values r is generated using the rand function, in a real scenario these values might be loaded from a file. To run this from the operating system command line we can enter python ReturnAnalysis.py. Note that this program outputs how long it takes to run, and on my desktop takes around 12.3s to complete (using the Anaconda Python distribution with the Accelerate package).

Using the second method we can create a function, the following example undertakes the computation and returns the values required.

def returnanalysis(r):
   import numpy
   rplus=r[r>0].copy()         # Create an array containing positive returns
   rminus=r[r<0].copy()        # Create an array containing negative returns
   Tplus=len(rplus)            # Count how many positive returns there are 
   Tminus=len(rminus)          # Count how many negative returns there are 
   Tzero=len(r)-Tplus-Tminus   # Calculate the number of zero returns
   meanplus=numpy.mean(rplus)         # Compute mean of positive returns using numpy.mean
   meanminus=numpy.sum(rminus)/Tminus # Compute mean of negative returns using numpy.sum
   return meanplus, meanminus, Tplus, Tminus, Tzero

If this is saved to a file called myfunctions.py, we can import and use the function from the Python prompt as follows.

>>> import numpy
>>> n=500000000
>>> r=numpy.random.rand(n)*10-5
>>> import myfunctions
>>> mplus, mminus, Tp, Tm, Tz = myfunctions.returnanalysis(r)
>>> mplus
2.4999997176593398
>>> mminus
-2.4999816498237375

The Python prompt has various other limitations which mean it isn't ideal for interactive work. For example, it doesn't include commands like pwd, cd, pwd, etc. An improved command line is included in IPython (Interactive Python) which behaves far more like MATLAB. For example if we save the first Python code snippet from the More advanced Example section (see above) to a file called ReturnAnalysis1.py, we can execute this program from within IPython using run. For MATLAB-like behaviour, where a script can see the variables in the interactive namespace, we need to use run with the -i flag. The -t flag is useful too, as it times how long the script takes to run.

In [1]: n=500000000
In [2]: from numpy.random import rand
In [3]: r=rand(n)*10-5
In [4]: run -i -t ReturnAnalysis1

IPython CPU timings (estimated):
  User   :     978.66 s.
  System :       0.00 s.
Wall time:     978.71 s.

In [5]: meanplus
Out[5]: 2.5001402997170192

In [6]: meanminus
Out[6]: -2.5000714107736286

The above example used the unvectorised version, and to demonstrate the importance of vectorisation in getting good performance we compare with the vectorised version (saved in ReturnAnalysis2.py).

In [7]: run -i -t ReturnAnalysis2

IPython CPU timings (estimated):
  User   :      12.18 s.
  System :       0.00 s.
Wall time:      12.18 s.

In [8]: meanplus
Out[8]: 2.5001402997170192

In [9]: meanminus
Out[9]: -2.5000714107736286

Finally to compare with the vectorised MATLAB version (saved to a file called ReturnAnalysis2.m), the run time is as follows.

>> n=500000000;
>> r=rand(n,1)*10-5;
>> tic,ReturnAnalysis2,toc
Elapsed time is 11.193218 seconds.

Footnotes

  1. Although Python 2 and 3 are not totally compatible, Python 2.7 is close to Python 3. If you have to use Python 2, it is recommended using version 2.7, writing code as close to Python 3 as possible, and using tools like 2to3 to port to Python 3. Alternatively there is a Python compatibility packages called six.
  2. In Python 3 the range function creates a range object. However the Python 2 range function creates a list, i.e. stores every integer value required in memory which is very inefficient if simply looping through a long sequence of integers in a for loop. Python 2 has xrange that behaves like the Python 3 range.