©2018 Raazesh Sainudiin. Attribution 4.0 International (CC BY 4.0)

- What's this all about?
- Modular Arithmetic
- Linear Congruential Generators
- More Sophisticated Pseudo-Random Number Generators
- Accumulating sequences with
`numpy.cumsum`

- Simulating a Drunkard's Walk

How can we produce realisations from Uniform$(0,1)$, the **fundamental random variable**?

i.e., how can we produce samples $(x_1, x_2, \ldots, x_n)$ from $X_1, X_2, \ldots, X_n$ $\overset{IID}{\thicksim}$ Uniform$(0,1)$?

What is Sage doing when we ask for `random()`

?

In [9]:

```
random()
```

Out[9]:

Modular arithmetic and number theory gives us pseudo-random number generators.

What can we do with samples from a Uniform$(0,1)$ RV? Why bother?

We can use them to sample or simulate from other, more complex, random variables. These simulations can be used to understand real-world phenomenon such as:

- modelling traffic queues on land, air or sea for supply chain management
- estimate missing data in Statistical survey to better manage or administer
- helping a Hospital to manage critical care for pre-term babies
- helping NZ's Department of Conservation to minimise the extinction probability of various marine organisms
- help the Government find if certain fishing boats are illegally under-reporting their catches from NZ's waters
- find cheaper air tickets for a vacation
- various physical systems can be modeled. See http://en.wikipedia.org/wiki/Monte_Carlo_method for a bigger picture.

See how the modern history of Monte carlo starts in Los Alamos!

In [1]:

```
def showURL(url, ht=500):
"""Return an IFrame of the url to show in notebook with height ht"""
from IPython.display import IFrame
return IFrame(url, width='95%', height=ht)
showURL('http://en.wikipedia.org/wiki/Monte_Carlo_method',600)
```

Out[1]:

The starting point for Monte Carlo methods is modular arithmetic ...

Modular arithmetic (arithmetic modulo $m$) is a central theme in number theory and is crucial for generating random numbers from a computer (i.e., *machine-implementation of probabilistic objects*). Being able to do this is essential for computational statistical experiments and methods that help do this are called Monte Carlo methods. Such computer-generated random numbers are technically called *pseudo-random numbers*.

In this notebook we are going to learn to add and multiply modulo $m$ (this part of our notebook is adapted from William Stein's SageMath worksheet on Modular Arithmetic for the purposes of linear congruential generators). If you want a more thorough treatment see http://en.wikipedia.org/wiki/Modular_arithmetic.

In [2]:

```
showURL("http://en.wikipedia.org/wiki/Modular_arithmetic",500)
```

Out[2]:

`%`

? The modulus operator gives the remainder after division:

In [3]:

```
14%12 # "14 modulo 12" or just "14 mod 12"
```

Out[3]:

In [4]:

```
zeroto23Hours = range(0,24,1) # 0,1,2...,23 hours in the 24-hours clock
print(zeroto23Hours)
```

In [5]:

```
[x%12 for x in range(0,24,1)] # x%12 for the 24 hours in the analog clock
```

Out[5]:

In [6]:

```
set([x%12 for x in range(0,24,1)]) # unique hours 0,1,2,...,11 in analog clock by making a set out of the list
```

Out[6]:

Arithmetic modulo $m$ is like usual arithmetic, except every time you add or multiply, you also divide by $m$ and return the remainder. For example, working modulo $m=12$, we have:

$$8 + 6 = 14 = 2$$since $2$ is the remainder of the division of $14$ by $12$.

Think of this as like the hours on a regular analog clock. We already do modular addition on a regular basis when we think about time, for example when answering the question:

Question: If it is 8pm now then what time will it be after I have spent the 6 hours that I am supposed to dedicate this week toward this course?

Answer: <p style="color:#F8F9F9";>2am.</p>

Modular addition and multiplication with numbers modulo $m$ is well defined, and has the following properties:

- $a + b = b+a$ (addition is commutative)
- $a \cdot b = b \cdot a$ (multiplication is commutative)
- $a\cdot (b + c) = a\cdot b + a\cdot c$ (multiplication is distributive over addition)
- If $a$ is coprime to $m$ (i.e., not divisible by any of the same primes), then there is a unique $b$ (mod $m$) such that $a\cdot b=1$.

Let us make a matrix of results from addition and multiplication modulo 4.

Before that realise that arithmetic over a set $\mathbb{M}$ is merely a function from the Cartesian product $\mathbb{M} \times \mathbb{M}$ of all pairs in $\mathbb{M}$ to $\mathbb{M}$:

$$ \star : \mathbb{M} \times \mathbb{M} \to \mathbb{M} $$where $\star$ is usually $+$ for "addition" or $*$ for "multiplication".

Thus, we can think of arithmetic as a set of three-tuples: $$(x_1, x_2, x_3) \in \mathbb{M}^3 := \mathbb{M} \times \mathbb{M} \times \mathbb{M}, \quad \text{where} \quad x_3 = x_1 \star x_2. $$

In [7]:

```
mySageMatrix = matrix(2,3,[1, 2, 3, 4, 5, 6]) # this is how you make a 2X3 matrix in Sage
mySageMatrix
```

Out[7]:

In [8]:

```
type(mySageMatrix) # the type says a lot
```

Out[8]:

In [ ]:

```
mySageMatrix.
```

Let us list all the three-tuples for addition modulo 4 next.

In [9]:

```
m=4;
# list (i,j, (i+j) mod m) as (i,j) range in [0,1,2,3]
[(i,j,(i+j)%m) for i in range(m) for j in range(m)]
```

Out[9]:

In [10]:

```
[(i+j)%m for i in range(m) for j in range(m)] # just the third element of the three-tuple from above
```

Out[10]:

In [11]:

```
#addition mod m
matrix(m,m,[(i+j)%m for i in range(m) for j in range(m)])
```

Out[11]:

In [12]:

```
# multiplication mod m
matrix(m,m,[(i*j)%m for i in range(m) for j in range(m)])
```

Out[12]:

In the following interactive image (created by William Stein) we make an addition and multiplication table modulo $m$, where you can control $m$ with the slider. Try changing $m$ and seeing what effect it has:

In [13]:

```
import matplotlib.cm
cmaps = ['gray'] + list(sorted(matplotlib.cm.datad.keys()))
@interact
def mult_table(m=(4,(1..200)), cmap=("Color Map",cmaps)):
Add = matrix(m,m,[(i+j)%m for i in range(m) for j in range(m)])
Mult = matrix(m,m,[(i*j)%m for i in range(m) for j in range(m)])
print "Addition and multiplication table modulo %s"%m
show(graphics_array( (plot(Add, cmap=cmap),plot(Mult, cmap=cmap))),figsize=[10,5])
```

Look at this for a while. Make $m$ close to 200 slowly and see how the outputs of the two arithmetic operations change.

Answer the following questions (which refer to the default colour map) to be sure you understand the table:

- Question: Why is the top row and leftmost column of the multiplication table always black?
- Question: Why is there an anti-diagonal block line in the addition table (the left-most table)?
- Question: Why are the two halves of each table (the two pictures) symmetric about the diagonal?

You can change the colour map if you want to.

You should be able to understand a bit of what is happening here. See that there are two list comprehensions in there. Take the line `matrix(m,m,[(i+j)%m for i in range(m) for j in range(m)])`

. The list comprehension part is `[(i+j)%m for i in range(m) for j in range(m)]`

. Let's pull it out and have a look at it. We have set the default modulo `m`

to `4`

but you could change it if you want to:

In [14]:

```
m = 4
listFormAdd = [(i+j)%m for i in range(m) for j in range(m)]
listFormAdd
```

Out[14]:

This list comprehension doing double duty: remember that a list comprehension is like a short form for a for loop to create a list? This one is like a short form for one for-loop nested within another for-loop.

We could re-create what is going on here by making the same list with two for-loops. This one uses modulo `m = 4`

again.

In [23]:

```
m = 4
listFormAddTheHardWay = []
for i in range(m):
for j in range(m):
listFormAddTheHardWay.append((i+j)%m)
listFormAddTheHardWay
```

Out[23]:

Notice that the last statement in the list comprehension, `for j in range(m)`

, is the inner loop in the nested for-loop.

The next step that Stein's interactive image is to make a `matrix`

out of the `list`

. We won't be doing matrices in detail in this course (we decided to concentrate on `numpy.arrays`

and multi-dimensional `numpy.ndarrays`

with floating-point numbers for numerical computing instead), but if you are interested, this statement uses the `matrix`

function to make a `matrix`

out of the `list`

named `listFormAdd`

. The dimensions of the matrix are given by the first two arguments to the `matrix`

function: `(m, m,...)`

.

In [24]:

```
matrixForm = matrix(m,m, listFormAdd)
matrixForm
```

Out[24]:

Optionally, you can find out more about matrices from the documentation.

In [26]:

```
?matrix
```

In [27]:

```
%%sh
cat ~/all/software/sage/SageMath/src/sage/matrix/constructor.pyx
```

Try recreating the matrix for multiplication, just as we have just recreated the one for addition.

In [31]:

```
m = 4
listFormMultiply = [] # make the list non-empty!
```

In [32]:

```
listFormMultiply
```

Out[32]:

The simplest way to create a number modulo $m$ in Sage is to use the Mod(a,m) command. We illustrate this below.

In [15]:

```
Mod(8, 12)
```

Out[15]:

Let's assign it to a variable so that we can explore it further:

In [16]:

```
myModInt = Mod(8, 12)
myModInt
```

Out[16]:

In [17]:

```
type(myModInt)
```

Out[17]:

In [18]:

```
parent(myModInt)
```

Out[18]:

We will compare myModInt to a "normal" SageMath integer:

In [19]:

```
myInt = 8
myInt
```

Out[19]:

In [20]:

```
type(myInt)
```

Out[20]:

In [21]:

```
parent(myInt) # ZZ
```

Out[21]:

We can see that `myModInt`

and `myInt`

are different types, but what does this mean? How do they behave?

Try addition:

In [22]:

```
myModInt + 6
```

Out[22]:

In [23]:

```
myInt + 6
```

Out[23]:

Was this what you already expected?

What about multiplication?

In [24]:

```
myModInt * 6
```

Out[24]:

In [25]:

```
myInt * 6
```

Out[25]:

What's going on here? As we said above, arithmetic modulo mm is like usual arithmetic, except every time you add or multiply, you also divide by mm and return the remainder. 8 x 6 is 48, and the remainder of 48 divided by 12 (12 is our modulo) is 0.

What about this one? What's happening here?

In [26]:

```
Mod(-45,2018) # Raaz was born in the year
```

Out[26]:

In [27]:

```
Mod(-1,12) # what's an hour before mid-night
```

Out[27]:

Can you create the number giving your year of birth (±1 year) in a similar way. For example, if you are 19 years old now then find the number -19 modulo 2018.

In [ ]:

```
```

Let us assign 10 modulo 12 to a variable named `now`

:

In [49]:

```
now = Mod(10, 12)
now
```

Out[49]:

<img src="images/Week6NowEquals10.png" width=200>

Add -1 to `now`

: <img src="images/Week6NowEquals9.png" width=200>

Put in the expression to do this in SageMath into the cell below:

In [50]:

```
now -1
```

Out[50]:

And subtract 13 from the previous expression. <img src="images/Week6NowEquals8.png" width=250>

Put in the expression to do this in SageMath into the cell below:

In [51]:

```
now - 1 -13
```

Out[51]:

Also try adding 14 to the previous expression -- the new postition is not shown on this clock but you should be able to see what it should be.

<img src="images/Week6NowWas8.png" width=200>

Put in the expression to do this in SageMath into the cell below:

In [ ]:

```
```

And multiplying 2 by `now`

(or `now`

by 2)

In [ ]:

```
```

In [ ]:

```
```

In [ ]:

```
```

What happens if you try arithmetic with two modular integers? Does it matter if the moduli of both operands a and b in say a+b are the same if a and b are both modular integers? Does this make sense to you?

In [ ]:

```
```

In [ ]:

```
```

In [ ]:

```
```

(end of YouTrys)

A linear congruential generator (LCG) is a simple *pseudo-random number generator (PRNG)* - a simple way of imitating samples or realizations from the Uniform$(0,1)$.

"Pseudo-random" means that the numbers are not really random [From Ancient Greek ψευδής (pseudḗs, “false, lying”)].

We will look at what we mean by that as we find out about linear congruential generators.

The theory behind LCGs is easy to understand, and they are easily implemented and fast.

To make a LCG we need:

- a modulus $m$ ($m > 0$)
- an integer multiplier $a$ ($0 \le a < m$)
- an integer increment $c$ ($0 \le c < m$)
- an integer seed $x_0$ ($0 \le x_0 < m$)
- an integer sequence length $n$

Using these inputs, the LCG generates numbers $x_1, x_2, \ldots x_{n-1} $ where $x_{i+1}$ is calculated from $x_i$ as defined by the following recurrence relation:

$$x_{i+1} \gets mod \left(a x_i + c , m \right)$$In the above expression $\gets$ denotes "gets" as in variable assignment.

$x_0,x_1,\ldots,x_{n-1}$ is the sequence of pseudo-random numbers called the linear congruential sequence.

We can define a function parameterised by $(m,a,c,x_0,n)$ to give us a linear congruential sequence in the form of a list.

(Remember about function parameters? The function **parameters** here are `m`

, `a`

, `c`

, `x0`

, and `n`

. Note that the **return value** of the function is a list.

Let's see how we can define a function for our LCG next.'

In [28]:

```
def linConGen(m, a, c, x0, n):
'''A linear congruential sequence generator.
Param m is the integer modulus to use in the generator.
Param a is the integer multiplier.
Param c is the integer increment.
Param x0 is the integer seed.
Param n is the integer number of desired pseudo-random numbers.
Returns a list of n pseudo-random integer modulo m numbers.'''
x = x0 # the seed
retValue = [Mod(x,m)] # start the list with x=x0
for i in range(2, n+1, 1):
x = mod(a * x + c, m) # the generator, using modular arithmetic
retValue.append(x) # append the new x to the list
return retValue
```

`linConGen`

is doing!

The function is merely implementing the pseudocode or algorithm of the linear congruential generator using a for-loop and modular arithmetic: note that the generator produces integers modulo $m$.

Are all linear congruential generators as good as each other? What makes a good LCG?

One desirable property of a LCG is to have the longest possible period.

The **period** is the length of sequence we can get before we get a repeat. The longest possible period a LCG can have is $m$. Lets look at an example.

In [29]:

```
?mod
```

In [30]:

```
# first assign values to some variables to pass as arguments to the function
m = 17 # set modulus to 17
a = 2 # set multiplier to 2
c = 7 # set increment to 7
x0 = 1 # set seed to 1
n = 18 # set length of sequence to 18 = 1 + maximal period
L1 = linConGen(m, a, c, x0, n) # use our linConGren function to make the sequence
L1 # this sequence repeats itself with period 8
```

Out[30]:

`9, 8, 6, 2, 11, 12, 14`

. If you can't you can see that the sequence actually contains 8 unique numbers by making a set out of it:

In [32]:

```
Set(L1) # make a Sage Set out of the sequence in the list L1
```

Out[32]:

In [33]:

```
x0 = 3 # leave m, a, c as it is but just change the seed from 1 to 3 here
L2 = linConGen(m, a, c, x0, n)
L2 # changing the seed makes this sequence repeat itself over other numbers but also with period 8
```

Out[33]:

In [35]:

```
Set(L2) # make a Sage Set out of the sequence in the list L2
```

Out[35]:

At worst, a different `seed`

specified by `x0`

might make the sequence get stuck immediately:

In [36]:

```
x0 = 10 # leave m, a, c as it is but just change the seed to 10
L3 = linConGen(m, a, c, x0, n)
L3
```

Out[36]:

In [37]:

```
Set(L3) # make a Sage Set out of the sequence in the list L3
```

Out[37]:

*fixed point* at 10 for this LCG initialized at 10.

In [38]:

```
Set(L3)+Set(L2)+Set(L1) # + on Sage Sets gives the union of the three Sets and it is the set of integers modulo 17
```

Out[38]:

**Ring of integers modulo 17**.

What about changing the multiplier $a$?

In [39]:

```
m = 17 # same as before
a = 3 # change the multiplier from 2 to 3 here
c = 7 # same as before
x0 = 1 # set seed at 1
n = 18 # set length of sequence to 18 = 1 + maximal period
L4 = linConGen(m, a, c, x0, n)
L4 # now we get a longer period of 16 but with the number 5 missing
```

Out[39]:

In [41]:

```
Set(L4)
```

Out[41]:

In [42]:

```
x0 = 5 # just change the seed to 5
linConGen(m, a, c, x0, n)
```

Out[42]:

We want an LCG with a **full period** of $m$ so that we can use it with any seed and not get stuck at fixed points or short periodic sequences. This is a minimal requirement for simulation purposes that we will employ such sequences for. Is there anyway to know what makes a LCG with a full period?

It turns out that an LCG will have a full period if and only if:

- $c$ and $m$ are relatively prime or coprime. i.e. the
**greatest common divisor (gcd)**of $c$ and $m$ is $1$; and - $a-1$ is divisible by all
**prime factors**of $m$; and - $a-1$ is a multiple of 4 if $m$ is a multiple of 4.

(Different conditions apply when $c=0$. The Proposition and Proof for this are in Knuth, The Art of Computer Programming, vol. 2, section 3.3).

SageMath has a function `gcd`

which can calculate the greatest common divisor of two numbers:

In [43]:

```
?gcd
```

In [46]:

```
gcd(7,17)
```

Out[46]:

SageMath can also help us to calculate **prime factors** with the `prime_factors`

function:

In [47]:

```
prime_factors(m)
```

Out[47]:

In [48]:

```
prime_factors(7634887623876428376482746)
```

Out[48]:

In [49]:

```
(m,a,c,x0,n)=(256, 137, 123, 13, 256) # faster way to assign a bunch of parameters
```

In [50]:

```
gcd(c,m) # checking if the greatest common divisor of c and m is indeed 1
```

Out[50]:

In [51]:

```
prime_factors(m) # it is easy to get a list of all the prime factors of m
```

Out[51]:

In [52]:

```
(a-1) % 2 # checking if a-1=136 is divisible by 2, the only prime factors of m
```

Out[52]:

In [53]:

```
[ (a-1)%x for x in prime_factors(m) ] # a list comprehension check for an m with more prime factors
```

Out[53]:

In [54]:

```
m % 4 # m is a multiple of 4 check
```

Out[54]:

In [55]:

```
(a-1) % 4 # if m is a multiple of 4 then a-1 is also a multiple of 4
```

Out[55]:

In [56]:

```
(m, a, c, x0, n) # therefore these parameter values satisfy all conditions to have maximum period length m
```

Out[56]:

Thus, the parameters $(m, a, c, x_0, n) = (256, 137, 123, 13, 256)$ do indeed satisfy the three conditions to guarantee the longest possible period of 256.

In contrast, for the LCG example earlier with $m=17$, $a=2$, and $c=7$, although $m$ and $c$ are relatively prime, i.e., $gcd(m,c)=1$, we have the violation that $a-1=2-1=1$ is not divisible by the only prime factor $17$ of $m=17$. Thus, we cannot get a period of maximal length $17$ in that example.

In [57]:

```
[ (2-1)%x for x in prime_factors(17) ]
```

Out[57]:

In [58]:

```
(m,a,c,x0,n)=(256, 137, 123, 13, 256) # faster way to assign a bunch of parameters
ourLcg = linConGen(m,a,c,x0,n)
print(ourLcg)
```

In [59]:

```
S = Set(ourLcg) # sort it in a set to see if it indeed has maximal period of 256
S
```

Out[59]:

ourLcg is an example of a **good** LCG.

The next example will demonstrate what can go wrong.

Consider $m= 256$, $a = 136$, $c = 3$, $x_0 = 0$, $n = 15$.

In [60]:

```
m,a,c,x0,n = 256, 136, 3, 0, 15
gcd(m,c)
```

Out[60]:

In [61]:

```
prime_factors(m)
```

Out[61]:

But, since $a-1=$135 is not divisible by 2, the only prime factor of $m=$256, we get into the fixed point 91 no matter where we start from.

See if changing the seed $x_0$ makes a difference?

In [62]:

```
linConGen(m, a, c, x0, n)
```

Out[62]:

In [ ]:

```
```

`ourLcg`

.

In [63]:

```
m, a, c, x0, n = 256, 137, 0, 123, 256
lcgKnuth334T1L5 = linConGen(m, a, c, x0, n)
print(lcgKnuth334T1L5)
```

In [64]:

```
Set(lcgKnuth334T1L5)
```

Out[64]:

Note that although `ourLcg`

has maximal period of 256, `lcgKnuth334T1L5`

has period of 32. Let us look at them as points.

We can plot each number in the sequence, say `ourLcg`

, against its index in the sequence -- i.e. plot the first number 13 against 0 as the tuple (0, 13), the second number against 1 as the tuple (1, 112), etc.

To do this, we need to make a list of the index values, which is simple using `range(256)`

which as you know will give us a list of numbers from 0 to 255 going up in steps of 1. Then we can `zip`

this list with the sequence itself to make the `list`

of the desired `tuples`

.

In [65]:

```
ourPointsToPlot = zip(range(256), ourLcg)
knuPointsToPlot = zip(range(256), lcgKnuth334T1L5)
```

Then we plot using `points`

for `ourLcg`

and for `lcgKnuth334T1L5`

.

In [66]:

```
?text
```

In [67]:

```
p1 = points(ourPointsToPlot, pointsize='1')
t1 = text('ourLcg', (150,300), rgbcolor='blue',fontsize=10)
p2 = points(knuPointsToPlot, pointsize='1',color='black')
t2 = text(' lcgKnuth334T1L5', (150,300), rgbcolor='black',fontsize=10)
show(graphics_array((p1+t1,p2+t2)),figsize=[6,3])
```

We can see that in section 3.3.4, Table 1, line 5 in The Art of Computer Programming, Knuth is giving an example of a particularly bad LCG.

When we introducted LCGs, we said that using an LCG was a simple way to imiate independent samples from the Uniform$(0, 1)$ RV, but clearly so far we have been generating sequences of integers. How does that help?

To get a simple pseudo-random Uniform$(0,1)$ generator, we scale the linear congruential sequence over [0, 1]. We can do this by dividing each element by the largest number in the sequence (256 in the case of `ourLcg`

).

**Important note:** The numbers in the list returned by our `linConGen`

function are integers modulo $m$.

In [68]:

```
type(ourLcg[0])
```

Out[68]:

In [69]:

```
type(RR(ourLcg[0]))
```

Out[69]:

Having sorted that out, we can use a list comprehension as a nice neat way to do our scaling:

In [70]:

```
#ourLcgScaled = [RR(x)/256 for x in ourLcg] # convert to mpfr real numbers before division
ourLcgScaled = [(RR(x)/256).n(digits=8) for x in ourLcg] # convert to mpfr real numbers with non-zero digits
#ourLcgScaled = [int(x)/256 for x in ourLcg] # or convert x to usual integer before division
#ourLcgScaled = [x/256 for x in ourLcg] # a very bad idea
print(ourLcgScaled)
```

`range(256)`

(to get the indexes 0, .., 255) and `zip`

:

In [71]:

```
ourPointsToPlotScaled = zip(range(256), ourLcgScaled)
p = points(ourPointsToPlotScaled, pointsize='1')
show(p, figsize = (3,3))
```

In [72]:

```
import pylab # This is a nice Python numerics module on top of mumpy
pylab.clf() # clear current figure
n, bins, patches = pylab.hist(ourLcgScaled, 40) # make the histogram (don't have to have n, bins, patches = ...)
pylab.xlabel('pseudo-random number') # use pyplot methods to set labels, titles etc similar to as in matlab
pylab.ylabel('count')
pylab.title('Count histogram for linear congruential prns')
pylab.savefig('myHist', dpi=(50)) # save figure (dpi) to display the figure
pylab.show() # and finally show it
```

Is this roughly what you expected?

(Please note: we are using the histogram plots to help you to see the data, but you are not expected to be able to make one yourself.)

We could repeat this for the Knuth bad LCG example:

In [73]:

```
%%sh
ls
```

In [74]:

```
knuLcgScaled = [RR(x)/256 for x in lcgKnuth334T1L5]
knuPointsToPlotScaled = zip(range(256), knuLcgScaled)
p = points(knuPointsToPlotScaled, pointsize='1', color='black')
show(p, figsize = (3,3))
```

In [75]:

```
import pylab
pylab.clf() # clear current figure
n, bins, patches = pylab.hist(knuLcgScaled, 40) # make the histogram (don't have to have n, bins, patches = ...)
pylab.xlabel('pseudo-random number [Knuth 3.3.4 Table 1, Line 5]') # use pyplot methods to set labels, titles etc similar to as in matlab
pylab.ylabel('count')
pylab.title('Count histogram for linear congruential prns')
pylab.savefig('myHist', dpi=(50)) # save figure (dpi) to display the figure
pylab.show() # and finally show it
```

The above generators are cute but not useful for simulating Lotto draws with 40 outcomes. Minimally, we need to increase the period length with a larger modulus $m$.

But remember that the quality of the pseudo-random numbers obtained from a LCG is extremely sensitive to the choice of $m$, $a$, and $c$.

To illustrate that having a large $m$ alone is not enough we next look at **RANDU**, an infamous LCG which generates sequences with strong correlations between 3 consecutive points, which can been seen if we manipulate the sequence to make 3-dimensional tuples out of groups of 3 consecutive points.

In the cell below we make our scaled sequence in one step, using a list comprehension which contains the expression to generate the LCG and the scaling.

In [76]:

```
m, a, c, x0, n = 2147483648, 65539, 0, 1, 5010
RANDU = [RR(x)/m for x in linConGen(m, a, c, x0, n)]
```

Have a look at the results as a histogram:

In [77]:

```
import pylab
pylab.clf() # clear current figure
n, bins, patches = pylab.hist(RANDU, 40) # make the histogram (don't have to have n, bins, patches = ...)
pylab.xlabel('pseudo-random number of RANDU') # use pyplot methods to set labels, titles etc similar to as in matlab
pylab.ylabel('count')
pylab.title('Count histogram for linear congruential prns')
pylab.savefig('myHist', dpi=(50)) # save figure (dpi) to display the figure
pylab.show() # and finally show it
```

`zip`

the two columns together to make tuples (just pairs or two-tuples).

In [78]:

```
import pylab
m, a, c, x0, n = 2147483648, 65539, 0, 1, 5010
randu = pylab.array(linConGen(m, a, c, x0, n))
```

In [79]:

```
pylab.shape(randu)
```

Out[79]:

In [80]:

```
randu.resize(5010/2, 2) # resize the randu array to 2 columns
```

In [81]:

```
pylab.shape(randu)
```

Out[81]:

In [83]:

```
seqs = zip(randu[:, 0], randu[:, 1]) # zip to make tuples from columns
p = points(seqs, pointsize='1', color='black')
show(p, figsize = (3,3))
```

In [84]:

```
import pylab
m, a, c, x0, n = 2147483648, 65539, 0, 1, 5010
randu = pylab.array(linConGen(m, a, c, x0, n))
randu.resize(5010/3, 3) # resize the randu array to 3 columns
seqs = zip(randu[:, 0], randu[:, 1], randu[:, 2]) # zip to make tuples from columns
point3d(seqs, size=3)
```

Out[84]:

You can alter your perspective on this image using the mouse. From a particular perspective you can see that something has gone horribly wrong ... **RANDU is a really ugly LCG**.

The above generators are of low quality for producing pseudo-random numbers to drive statistical simulations. We end with a positive note with a LCG that is in use in the Gnu Compiler Collection. It does not have obvious problems as in small periods or as high a correlation as RANDU.

In [85]:

```
#jmol error JmolInitCheck is not defined
import pylab
glibCGCCLcg = pylab.array([RR(x)/(2^32) for x in linConGen(2^32, 1103515245,12345,13,5010)])
glibCGCCLcg.resize(1670, 3) # resize the randu array to 3 columns
seqs = zip(glibCGCCLcg[:, 0], glibCGCCLcg[:, 1], glibCGCCLcg[:, 2]) # zip to make tuples from columns
point3d(seqs, size=3)
```

Out[85]:

In [107]:

```
import pylab
glibCGCCLcg = pylab.array([RR(x)/(2^32) for x in linConGen(2^32, 1103515245,12345,13,5010)])
pylab.clf() # clear current figure
n, bins, patches = pylab.hist(glibCGCCLcg, 40) # make the histogram (don't have to have n, bins, patches = ...)
pylab.xlabel('pseudo-random number of glibc used by Gnu Compiler Collection') # use pyplot methods to set labels, titles etc similar to as in matlab
pylab.ylabel('count')
pylab.title('Count histogram for linear congruential prns')
pylab.savefig('myHist',dpi=(50)) # seem to need to have this to be able to actually display the figure
pylab.show() # and finally show it
```

**Even good LCG are not suited for realistic statistical simulation problems**.

This is because of the strong correlation between successive numbers in the sequence. For instance, if an LCG is used to choose points in an n-dimensional space, the points will lie on, at most, $m^{1/n}$ hyper-planes. There are various statistical tests that one can use to test the quality of a pseudo-random number generator. For example, the spectral test checks if the points are not on a few hyper-planes. Of course, the Sample Mean, Sample Variance, etc. should be as expected. Let us check those quickly:

Recall that the population mean for a Uniform$(0,1)$ RV is $\frac{1}{2}$ and the population variance is $\frac{1}{12}$.

In [86]:

```
glibCGCCLcg.mean() # check that the mean is close to the population mean of 0.5 for Uniform(0,1) RV
```

Out[86]:

In [87]:

```
glibCGCCLcg.var() # how about the variance
```

Out[87]:

In [88]:

```
1/12.0
```

Out[88]:

We will use a pseudo-random number generator (PRNG) called the Mersenne Twister for simulation purposes in this course. It is based on more sophisticated theory than that of LCG but the basic principles of recurrence relations are the same.

(The Mersenne Twister is a variant of the recursive relation known as a twisted generalised feedback register. See [Makato Matsumoto and Takuji Nishimura, "Mersenne Twister: A 623-dimensionally equidistributed uniform pseudo-random number generator, ACM Transactions on Modelling and Computer Simulation, vol. 8, no. 1, Jan. 1998, pp. 3-20.], or at http://en.wikipedia.org/wiki/Mersenne_twister.)

The Mersenne Twister has a period of $2^{19937}-1 \approx 10^{6000}$ (which is essentially a Very Big Number) and is currently widely used by researchers interested in statistical simulation.

In [89]:

```
showURL("http://en.wikipedia.org/wiki/Mersenne_twister",500)
```

Out[89]:

In [90]:

```
?random # find out the source file for this function
```

In [92]:

```
%%sh
# you can just cat or concatenate the source file to see what is really under the hoood! Power of SageMath/Python!!
#cat ~/all/software/sage/SageMath/local/lib/python2.7/site-packages/sage/misc/prandom.py
```

`numpy.cumsum`

¶Our example using prngs is going to use a very useful function from the `numpy`

module called `cumsum`

. This calculates a cumulative sum from a sequence of numeric values. What do we mean by a cumulative sum? If we have a sequence $x_0, x_1, x_2, x_3, \ldots, x_n$, then we can derive a sequence that is the cumulative sum,

Try evaluating the next cell to get the cumulative sum of the sequence 0, 1, 2, ..., 9

In [93]:

```
from numpy import cumsum, array # make sure we have the numpy classes and methods we need
cumsum(range(10))
```

Out[93]:

Since `pylab`

is a superset of `numpy`

, we can also do the following imports instead:

In [94]:

```
from pylab import cumsum, array # make sure we have the pylab classes and methods we need
cumsum(range(10))
```

Out[94]:

`pylab.array`

or `numpy.array`

. This can be useful, but all we need is a list, we can convert this back to a list with the array's `tolist`

method:

In [95]:

```
cumsum(range(10)).tolist()
```

Out[95]:

Using the `list`

function to make a list will do the same thing:

In [96]:

```
list(cumsum(range(10)))
```

Out[96]:

Now, say we have some probabilities in a list and we want to get the cumulative sum of them.

We can use `cumsum`

as we did above. We are going to start with just two probabilties which (for reasons that will shortly become clear), we will assign as values to variables `pLeft`

and `pRight`

.

In [97]:

```
pLeft, pRight = 0.25, 0.25 # assign values to variables
cumulativeProbs = cumsum((pLeft, pRight)).tolist() # cumsum works on a numeric tuple
cumulativeProbs
```

Out[97]:

`cumsum`

works on a `tuple`

as well as a `list`

, providing that the elements of the `tuple`

are some type of number.

You are now going to use some simulated Uniform$(0,1)$ samples to simulate a Drunkard's Walk.

The idea of the Drunkard's Walk is that the Drunkard has no idea where s/he is going: at each decision point s/he makes a random decision about which way to go. We simulate this random decison using our Uniform$(0,1)$ pseudo-random samples.

We are going to have quite a limited version of this. At each decision point the Drunkard can either go one unit left, right, up or down (pretend we are in the middle of Manhattan with its streets and avenues).

Effectively, s/he is moving around on a $(x, y)$ coordinate grid. The points on the grid will be tuples. Each tuple will have two elements and will represent a point in 2-d space. For example, $(0,0)$ is a tuple which we could use as a starting point.

First, we recall useful feature of tuples: we can unpack a tuple and assign its elements as values to specified variables:

In [98]:

```
some_point = (3,2) # make a tuple
some_point_x, some_point_y = some_point # unpack the tuple and assign values to variables
some_point_x # disclose one of the variables
```

Out[98]:

In [99]:

```
some_point_y
```

Out[99]:

We use this useful feature in the functions we have defined below.

You now know what is happening in each step of these functions and should be able to understand what is happening.

In [101]:

```
def makeLeftTurnPoint(listOfPoints):
'''Function to make a point representing the destination of a left turn from the current path.
Param listOfPoints is a list of tuples representing the path so far.
Returns a new point to the immediate left of the last point in listOfPoints.
Tuples in the list represent points in 2d space.
The destination of a left turn is a tuple representing a point which on a 2d axis would
be to the immediate left of the last point in the list representing the current path.'''
newPoint = (0,0) # a default return value
if len(listOfPoints) > 0: # check there is at least one point in the list
lastPoint_x, lastPoint_y = listOfPoints[len(listOfPoints)-1] # unpack the last point in the list
new_x = lastPoint_x - 1 # a new x coordinate, one unit to the left of the last one in the list
newPoint = (new_x, lastPoint_y) # a new point one unit to the left of the last one in the list
return newPoint
```

In [102]:

```
def makeRightTurnPoint(listOfPoints):
'''Function to make a point representing the destination of a right turn from the current path.
Param listOfPoints is a list of tuples representing the path so far.
Returns a new point to the immediate right of the last point in listOfPoints.
Tuples in the list represent points in 2d space.
The destination of a right turn is a tuple representing a point which on a 2d axis would
be to the immediate right of the last point in the list representing the current path.'''
newPoint = (0,0) # a default return value
if len(listOfPoints) > 0: # check there is a least one point in the list
lastPoint_x, lastPoint_y = listOfPoints[len(listOfPoints)-1] # the last point in the list
new_x = lastPoint_x + 1 # a new x coordinate one unit to the right of the last one in the list
newPoint = (new_x, lastPoint_y) # a new point one unit to the right of the last one in the list
return newPoint
```

You should be thinking "Hey that is a lot of duplicated code: I thought that functions should help us not to duplicate code!". You are completely right, but this example will be easier to understand if we just live with some bad programming for the sake of clarity.

Now, experience the flexibility of Python: We can have lists of functions!

In [104]:

```
movementFunctions = [makeLeftTurnPoint, makeRightTurnPoint]
type(movementFunctions[0])
```

Out[104]:

In [105]:

```
somePath = [(0,0), (1,0), (2,0)]
# find the point that is the rightTurn from the last point in the path
```

`len(somePath)`

and the indexing operator `[ ]`

.

We put the functions into the list in order, left to right, so we know that the first function in the list (index 0) is `makeLeftTurnPoint`

, and the second function in the list (index 1) is `makeRightTurnPoint`

.

We can now call `makeLeftTurnPoint`

by calling `movementFunctions[0]`

, and passing it the arguement `somePath`

which is our example path so far. We should get back a new point (tuple) which is the point to the immediate left of the last point in the list `somePath`

: