13 Jun 2015, 21:52

Gaussian Random Fields

Random fields are a more formally defined noise process that can appear visually similar to some forms of procedurally generated noise, like Perlin noise or simplex noise.

There are some really nice examples of descriptions of random fields, and in particular Gaussian random fields, on Wikipedia. However, with a few exceptions, there doesn’t seem to be a great deal of literature available on how to generate such fields. Furthermore, while R has the RandomFields package on CRAN, there don’t appear to be any well-known Python packages.

In the search for a good library, I did find an amazing question on the Mathematica Stack Exchange, where a number of different techniques for generating these fields are presented.

This Python snippet is a simplified version of the current top voted answer.

import numpy as np
import matplotlib.pyplot as plt

def fftIndgen(n):
    # wavenumbers in FFT order: 0, 1, ..., n/2, -(n/2 - 1), ..., -1
    a = list(range(0, n // 2 + 1))
    b = list(range(1, n // 2))
    b.reverse()
    b = [-i for i in b]
    return a + b

def gaussian_random_field(Pk = lambda k : k**-3.0, size = 100):
    # amplitude for each wave vector: the square root of the power spectrum
    def Pk2(kx, ky):
        if kx == 0 and ky == 0:
            return 0.0
        return np.sqrt(Pk(np.sqrt(kx**2 + ky**2)))
    # white noise in Fourier space, shaped by the desired spectrum
    noise = np.fft.fft2(np.random.normal(size = (size, size)))
    amplitude = np.zeros((size, size))
    for i, kx in enumerate(fftIndgen(size)):
        for j, ky in enumerate(fftIndgen(size)):
            amplitude[i, j] = Pk2(kx, ky)
    return np.fft.ifft2(noise * amplitude)

for alpha in [-4.0, -3.0, -2.0]:
    out = gaussian_random_field(Pk = lambda k: k**alpha, size=256)
    plt.figure()
    plt.imshow(out.real, interpolation='none')
plt.show()

Which produces figures like:

[Three figures: fields generated with alpha = -4.0, -3.0 and -2.0]
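As a quick sanity check (my own addition, not part of the original answer), the radially averaged power spectrum of a generated field should fall off roughly like $k^\alpha$. Here’s a rough sketch, where radial_power_spectrum is a helper name I’ve made up:

def radial_power_spectrum(field, nbins=50):
    # power at each wave vector
    power = np.abs(np.fft.fft2(field))**2
    # radial wavenumber of every FFT cell
    k = np.array(fftIndgen(field.shape[0]))
    kr = np.sqrt(k[:, None]**2 + k[None, :]**2)
    # average the power in radial bins (skipping any empty bins)
    bins = np.linspace(0.5, kr.max(), nbins)
    which = np.digitize(kr.ravel(), bins)
    ks, ps = [], []
    for b in range(1, nbins):
        mask = which == b
        if np.any(mask):
            ks.append(bins[b - 1])
            ps.append(power.ravel()[mask].mean())
    return np.array(ks), np.array(ps)

ks, ps = radial_power_spectrum(out.real)
plt.loglog(ks, ps)  # the slope should be close to alpha
plt.show()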

02 Jun 2015, 21:23

Snakes and Ladders

One of the simplest non-trivial examples of something interesting to do with Markov chains (that’s pretty easy to visualise and understand) is working out how many moves, on average, it takes to complete a game of snakes and ladders. This choice of example is somewhat intentional: it’s the same example as the Wikipedia entry for absorbing Markov chains, and it’s just hard enough to actually justify solving the problem properly.

A full description of absorbing Markov chains is available on Wikipedia, which is a near verbatim transcription from Kemeny and Snell’s book (also available online). In short, a (discrete time, finite state) Markov chain describes the probability of moving from one state to another in an instantaneous transition. Transitions between states are modelled using a stochastic matrix (a matrix where every row must sum to one).
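As a toy illustration (my own example, not from the book), a two-state chain where the second state is absorbing looks like this:

import numpy as np

# from state 0: stay with probability 0.75, move to state 1 with 0.25;
# state 1 is absorbing - once there, you stay with probability 1
T = np.array([
    [0.75, 0.25],
    [0.00, 1.00],
])

# every row of a (right) stochastic matrix sums to one
np.testing.assert_allclose(np.sum(T, axis=1), 1)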

The Wikipedia article on snakes and ladders quotes the solution as 39.6 for the Milton Bradley version of Chutes and Ladders. That’s a useful baseline for comparison, because the board layout is easy to find online. It’s also worth noting that the page for absorbing Markov chains quotes the solution as 39.2.

Each of the states in the chain represents a location on the board, and we’ll have one extra state (state 0) that represents the initial position off the board. That means we can conveniently number the states 0 to 100 (inclusive), where 0 is the initial state off the board and 100 is the terminal state.

Ignoring (initially) the snakes and ladders, a transition from a state advances you by the roll of a six-sided die. From any state $x$ you can reach states $x+1, x+2, \ldots, x+6$, unless such a transition would take you off the board (beyond state 100), in which case the move is invalid and you stay where you are. The probability of each of these transitions is $\frac{1}{6}$, except for the edge cases where a move is invalid - there a small adjustment is needed to keep the player in the current state.

def snakes_and_ladders_matrix():
    n = 101
    A = np.zeros((n, n))
    # add an edge from each state to the six states a die roll can reach
    for i in range(6):
        A += np.diag(np.ones(n - 1 - i), i + 1)

    # count how many cells are reachable from every initial cell
    b = np.sum(A, axis=1)

    # add a self-edge for every missing edge (invalid moves near the end)
    A += np.diag(6 - b)

    # normalize so each row sums to one
    A /= 6.

    return A

A = snakes_and_ladders_matrix()

We can trivially verify that the resulting matrix is a valid (right) stochastic matrix - and it’s always a good idea to enforce constraints like this when working with Markov chains.

np.testing.assert_allclose( 1, np.sum(A, axis=1) )

So for the game with no snakes or ladders, this is enough to work out how many moves it takes to finish the game. The mean number of steps can be written in terms of the fundamental matrix $N$, where $Q$ is the transition matrix restricted to the transient states (everything except the terminal square) and $I$ is the identity matrix:

$$ N = (I-Q)^{-1} $$

The equation is then

$$ t = N \mathbf{1} $$

Where $\mathbf{1}$ is a column vector of ones. While it’s possible to solve the equation in this form by explicitly inverting $I-Q$, it’s not an effective way to use a computer. Numerical libraries solve linear equations in the form:

$$ A x = b $$

But our unknown is on the wrong side of the equation. Somewhat conveniently, multiplying through by $I-Q$ gives:

$$ (I-Q) t = \mathbf{1} $$

This means we can solve the equation:

Q = A[:-1, :-1]
I = np.identity(Q.shape[0])
o = np.ones((Q.shape[0], 1))
result = np.linalg.solve(I - Q, o)
print('mean turns to complete (without snakes and ladders)', result[0, 0])

Which gives 33.3. That’s plausible: the average die roll is 3.5, so a crude lower bound is $100 / 3.5 \approx 28.6$ moves, with the need to land exactly on square 100 accounting for the extra turns. It’s close enough to give confidence that we’re in the ballpark when we add the snakes and ladders.

ladders = np.array([
    [1, 38],  [4, 14],  [9, 31],  [21,42],
    [28,84],  [36,44],  [51,67],  [71,91],
    [80,100]
]) 

snakes = np.array([
    [98, 78],  [95, 75],  [93, 73],  [87, 24],
    [64, 60],  [62, 19],  [56, 53],  [49, 11],
    [48, 26],  [16, 6],
]) 

def snakes_and_ladders_matrix():
    n = 101
    A = np.zeros((n, n))
    # add an edge from each state to the six states a die roll can reach
    for i in range(6):
        A += np.diag(np.ones(n - 1 - i), i + 1)

    # count how many cells are reachable from every initial cell
    b = np.sum(A, axis=1)

    # add a self-edge for every missing edge (invalid moves near the end)
    A += np.diag(6 - b)

    # redirect any transition into the bottom of a ladder (or the head
    # of a snake) to its destination, and zero out the original column
    edges = np.vstack([ladders, snakes])
    for src, dst in edges:
        b = A[:, src].copy()
        A[:, dst] += b
        A[:, src] = 0

    A /= 6.
    return A


A = snakes_and_ladders_matrix()
np.testing.assert_allclose( 1, np.sum(A, axis=1) )

And running the code gives the solution

Q = A[:-1, :-1]
I = np.identity(Q.shape[0])
o = np.ones((Q.shape[0], 1))
result = np.linalg.solve(I - Q, o)
print('mean turns to complete', result[0, 0])

Which gives 39.598, in line with the 39.6 quoted on Wikipedia.
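As an independent cross-check (my own addition, not from the original analysis), a quick Monte Carlo simulation of the game should land near the same number:

def simulate_game(rng):
    # map the bottom of each ladder and the head of each snake to its destination
    jumps = {src: dst for src, dst in np.vstack([ladders, snakes])}
    pos, turns = 0, 0
    while pos != 100:
        roll = rng.integers(1, 7)    # a six-sided die
        if pos + roll <= 100:        # moves past 100 are invalid
            pos = jumps.get(pos + roll, pos + roll)
        turns += 1
    return turns

rng = np.random.default_rng(0)
print(np.mean([simulate_game(rng) for _ in range(100000)]))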

27 May 2015, 20:21

Stupid Things to do with SCons

TL;DR Don’t use the Python callstack to find variables - it can lead to subtle problems for users of your code.

This post came about as part of an investigation into extending SCons functionality without having to alter the source code. The primary objective was to validate that the absolute path of an SConscript file exists, or provide (improved) feedback if it doesn’t. This matters when you’re using variant build directories (debug and release) and your builds are spread over many directories - a situation for which SCons’s diagnostics are particularly poor.

If this were just Python, without the magic of SCons happening, this really would be trivial - in fact, it’s one of those things that Python excels at:

import os
path = 'path/to/your/SConscript'
path = os.path.abspath(os.path.normpath(path))
if not os.path.exists(path):
    raise Exception('SConscript does not exist at %s' % path)
SConscript(path, ...)

Given enough SConscript files, this pattern is annoying and worthy of refactoring. One way to do that would be to pull the code out into an external module. Given that we’ll always want to do this with our calls to SConscript, it’s probably easier to adjust the SConscript method than to add a second function. Altering the behavior of Python functions and methods at runtime by pre- or post-fixing code is most easily achieved using decorators, which is really just a fancy name for a function that wraps another function.

A modern decorator would look something like this:

# scommon.py
import os
import wrapt

@wrapt.decorator
def sconcript_validator(wrapped, instance, args, kwargs):
    path = args[0]
    path = os.path.abspath(os.path.normpath(path))
    if not os.path.exists(path):
        raise Exception('SConscript does not exist at %s' % path)
    return wrapped(*args, **kwargs)

Note that I’ve chosen to make use of the wrapt module, which deals with all the ugly corner cases of decorators for me.

This means that I can rewrite my SConstruct as:

# SConstruct
import scommon 
menv = Environment()
menv.SConscript = scommon.sconcript_validator(menv.SConscript)

And at that point I should be able to run SCons as per normal, with a nicely updated SConscript function, and everything should be rainbows and unicorns.

Except it isn’t. Instead, SCons fails with the traceback:

$ scons
scons: Reading SConscript files ...

scons: *** Export of non-existent variable `menv'
File "/Users/username/tmp/scommon.py", line 10, in sconcript_validator

This occurs because SCons does much of its heavy lifting by making use of the Python call stack, finding targets via introspection of the call frames. Unfortunately, the way it does this is somewhat fragile, expecting calling code to be a certain number of stack frames above (the internals of) SConscript.
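To see why that’s fragile, here’s a contrived toy (my own example, not SCons code) that finds a variable by walking up the call stack:

import sys

def export(name):
    # reach exactly one frame up the stack and read the caller's local
    caller = sys._getframe(1)
    return caller.f_locals[name]

def build():
    menv = 'my environment'
    return export('menv')

print(build())  # works, but only while export() sits exactly one frame below build()

Wrap export in a decorator and the frame arithmetic is off by one: the lookup lands in the wrapper’s frame, which has no menv - the same failure mode as the traceback above.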

From SCons.Script.SConscript:

# This is the magic line that actually reads up
# and executes the stuff in the SConscript file
# ...
call_stack[-1].globals.update({stack_bottom:1})
old_file = call_stack[-1].globals.get('__file__')

And eventually:

exec _file_ in call_stack[-1].globals

The consequence of this is that, to the best of my knowledge, there is no easy way to decorate the SConscript method.

All’s not lost: the dynamic nature of Python means we can reach directly into the SConscript module, introspect the code (recovering the original source with inspect), rewrite the lines of that function, exec the resulting string in the context of the original function’s scope, and finally replace the function back on the class with setattr.

The final result looks like the ugliest Python I’ve written for a long while. Somehow I don’t think this one’s going to make it past code-review.
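For the curious, the general shape of that trick looks something like the sketch below. It’s a simplified outline, not the actual code; rewrite_method and transform are names I’ve made up.

import inspect
import textwrap

def rewrite_method(cls, name, transform):
    # recover the original source (dedented, since methods are indented)
    original = getattr(cls, name)
    source = textwrap.dedent(inspect.getsource(original))
    # apply a purely textual rewrite to the function definition
    source = transform(source)
    # re-execute the definition in the original function's module scope,
    # so the rewritten function sees the same globals as the original
    namespace = original.__globals__
    exec(source, namespace)
    # install the rewritten function back on the class
    setattr(cls, name, namespace[name])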

22 May 2015, 09:47

Markov chains and Directed Graphs

One of the simplest possible probabilistic concepts is that of a finite Markov chain. The basic principle of a Markov chain is that it describes how something (a state) changes over time. In effect, if you start in a state, then after some time a transition occurs and the state updates. If this sounds familiar, it’s not surprising: Markov chains share many features with directed graphs.

My interest lies more in their applications to statistical physics (which is a fancy term for the study of the motion of lots of little bits of stuff). For my application, Markov chains have been helpful as a tool for developing intuition into more complex systems.

So what are the similarities between Markov chains and Graphs?

| Markov chains | Graph theory |
| --- | --- |
| Markov chain | Directed graph with edges labelled with transition probabilities |
| State | Vertex |
| Transition | Edge |
| $T(i, j) = 0$ | No edge from $i$ to $j$ |
| $T(i, j) > 0$ | Edge from $i$ to $j$, with weight $T(i, j)$ |
| Accessible ($i \rightarrow j$) | Reachable |
| Communicating classes ($i \leftrightarrow j$) | Strongly connected components |
| Permutation of a Markov chain | Isomorphism |
| Transpose $T'$ | Reverse the direction of every edge |
| Absorbing state ($T(i, i) = 1$) | A vertex $s$ with a self-loop $(s, s)$ and out-degree one |
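To make the correspondence concrete, here’s a small sketch (my own, assuming the networkx package is available) that converts a transition matrix into a weighted directed graph:

import numpy as np
import networkx as nx

# a toy three-state chain; state 2 is absorbing (T[2, 2] = 1)
T = np.array([
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
])

# an edge (i, j) exists exactly when T(i, j) > 0, with weight T(i, j)
G = nx.DiGraph()
for i, j in zip(*np.nonzero(T)):
    G.add_edge(int(i), int(j), weight=T[i, j])

# communicating classes correspond to strongly connected components
print(list(nx.strongly_connected_components(G)))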

19 May 2015, 21:35

Making numpy's spacing more like MATLAB's eps

Anyone who has worked with scientists knows the tension between MATLAB and Python - the oft-repeated closed vs open source debate. For those who straddle the gap, it’s important to know where to go for help when moving between the two worlds. From a Python perspective, the most up to date reference is Numpy for MATLAB users on the scipy wiki.

During the process of porting the conformal mapping toolkit, one of the issues that came up related to MATLAB’s eps function. This function returns the distance from a number to the next representable floating point number - an important prerequisite for doing approximately-equal tests on floating point numbers.

The guide (correctly) describes that eps is equivalent to the numpy ufunc np.spacing(1). More information about exactly what spacing does is provided by the numpy info function:

import numpy as np

np.info(np.spacing) 

The documentation states that the ufunc returns “the distance between x and the nearest adjacent number”. However, MATLAB’s eps differs from spacing in two important ways. Firstly, the values it returns are always positive (it’s the absolute spacing). Second, and more importantly, it also deals with complex values, which is critical for conformal mapping.

So a slightly revised version of eps for Python might read:

def eps(z):
    """Equivalent to MATLAB's eps, extended to complex values."""
    zre = np.real(z)
    zim = np.imag(z)
    # use the magnitude of the larger component so the result is always
    # positive (np.spacing is negative for negative inputs)
    return np.spacing(np.max(np.abs([zre, zim])))
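A quick illustration of the differences, assuming the definition above:

print(np.spacing(-1.0))  # negative for negative inputs
print(eps(-1.0))         # always positive, like MATLAB's eps
print(eps(3.0 + 4.0j))   # handles complex input via the larger component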