Settlers of Catan: Probability

The theory of probability has its roots in gambling theory, and introductory probability courses often draw many examples from gambling applications. Learning to reason probabilistically is a real challenge and it takes a disciplined mind to overcome the human minds tendency to see patterns where their may be only randomness.

I’m not much for gambling, but I do enjoy board games. Settlers of Catan is a very popular game which is a blend of chance and skill. It is based on the players building settlements which border resource sources. At each turn two six-faced die are rolled and if a player has a settlement which borders a resource with a number equal to the sum of the two dice they collect a resource card. The resource cards can be used to build additional settlements, roads, etc.

In my experience Catan games typically feature at least one player who becomes disgruntled at the dice rolls and expresses some level of disbelief at the sequence of dice rolls produced. In this post I will compute some probability measures to investigate the phenomena of the disgruntled Catan player.

Dice Rolls in Catan

To begin lets find out the probability mass function P(Y=m) of Y=X_1+X_2 the sum of the two dice. Let us assume the dice are fair, i.e. that the roll of any number (1,2,3,…6) for each dice is equally likely. Then we have 6×6=36 possible combinations for the two dice values (X_1, X_2) each with a probability of \frac{1}{36}. Therefore, we just need to count the number of combinations of numbers in (1,2,3,4,5,6) which sum to a given value (2,3,..12) to get our Catan probability mass function. This is a standard calculation in the mathematical discipline “combinatorics”-the mathematics of counting things. However, in this case we have a simple visualization of this process:


Where I have written the values of the two dice rolls highlighted in gray and the sums on the two values are written in the interior. Counting up the lengths of the diagonals gives the number of combinations which sum to a given number, and dividing this sum by 36 gives the probability.

The probability mass function for our Catan rolls P(Y=m) may be plotted as:

Our analysis tells us the most likely value to be rolled is a 7 with a probability of 6/36=1/6\approx 0.166 with 6 and 8 the next most likely values with a probability of 5/36\approx 0.138. The first thing to note is that this isn’t a very big difference: it is only a difference of \frac{1}{36} \approx 2.77\% difference in likelihood to roll a 7 versus a 6 or 8. We can see the reason in our chart above as each time we move to a less probable number it only costs us one of the entries in the table (or 1/36 in probability of being rolled).

Mathematically, this means our probability mass function falls off in a linear manner from the most probable value. This is counter-intuitive from most of our dealings with probability, where the central limit theorem tells us to expect a gaussian/normal distribution (the proverbial bell curve). For a Gaussian distribution the extreme values fall off in a exponential fashion meaning that the probability of seeing a value very far from the average is much smaller.

This linear fall off in the Catan distribution means that it is almost flat (uniform) and leads to a great degree of variability in the rolls which can be observed in a game.For example, we might expect that building a settlement on an 8 would be much more profitable then building on a 9. After all, the law of large numbers tells us that if the game goes on long enough more 8’s should be rolled then 9’s. However, the law of large numbers requires Large Numbers! If the average Catan game lasts 50 rolls then we may simulate to see the probability that betting on 8 will yield more resource cards then the 9. Surprisingly, we see that in this scenario the bet on the 8 only pays off about 60% of the time- so roughly 40% of the time the player who builds on a 9 will get more cards by the end of the game!  If we assume the game lasts 100 turns then this percentage increases to about 68% of the time. Below I plot the odds that a bet on an eight pays off versus a nine as a function of the game length (number of turns).


In my experience, a Catan game usually lasts about 50 turns so we can see that building on an 8 doesn’t provide a huge advantage over building on a 9.  So, we see the disgruntled Catan player has a point! You can make what is mathematically the optimal decision and still have a good chance that it doesn’t work out that way. Of course, this is part of the fun of the game.


Functions and Secret Codes

Cryptography and Functions

In this post I will introduce functions using cryptography to motivate the definition of functions. We often think of functions as being synonymous with polynomials, trig functions etc as these are generally the function examples we encounter in introductory math classes. Thus, it can be difficult to make the transition to thinking of functions as relations between sets.

However, cryptography provides an interesting example of functions begin used in this more general manner with real world applications. Here we will define substitution ciphers as a method of introducing functions.

In cryptography we want to encode a message written in plain text into a cipher text version which conceals the meaning to an adversary if it should fall into their hands. When the intended recipient receives the message they should be able to reverse the encoding process and receive the plain text.

To give a concrete example consider that you are a radio operator in Great Britain during World War II. You have received a piece of intelligence that a ship convoy is headed directly towards a section of German U-boats. You need to pass a message to the convoy to change routes. The catch is that the German U-boats will also be able to hear your message as you send it. Therefore you need to encode your message. Luckily, before the ship convoy left you worked out a cryptographic scheme just for this purpose.

A function f: P \rightarrow E is a mapping between two sets where for every input the function produces a single output. This is one of the requirements for a cipher, each plain text message should yield a single cipher message once we encrypt it.


If we had some ambiguity in this process say the message “change your route you are headed for three u-boats” encrypted to “glerki csyv vsyxi csy evi liehih jsv xlvii y-fsexw” or “ejcpig aqwt tqwvg aqw ctg jgcfgf hqt vjtgg w-dqcvu” then it is unclear which of these options we should send and the poor ships would have no idea which of the two we picked. This requirement means our cipher should be a function.

Thinking about the ship’s captain who needs to decode our message brings up an additional requirement on the nature of our function: each plain text message should map to a unique cipher text. Otherwise our receiving ship will be faced with a dilemma again when they go to decode the message. Now when they decode the message they will have more than one option again with no way of knowing which message was the one intended!

In mathematical terms this is a requirement that our cipher is and one to one (1-1) function. Meaning each input is mapped to a unique value. In a mathematical prospective this allows our function to be invertible. Therefore, decoding the message is an example of using an inverse function.

Caesar Ciphers

A simple method of encrypting these messages is a substitution cipher. This method just maps each character in the plain text message to a different character. The diagram below (taken from wikipedia) shows this process graphically.


A simple version of this is to just shift the characters by some fixed amount in the alphabet. This can be represented by two wheels of characters an outside ring which gives the plain text letter and the inside ring which can be rotated around to create new ciphers.


To encode a message you replace the letter on the outside ring with the inner ring letter which is aligned with it. When the message is decoded the receiver pulls out a Caesar cipher ring and rotates it to the correct position (they have to know this in advance). They now read through the message replacing each on the inner ring with the aligned letter on the outer ring.

This is a very ancient idea and was used by Julius Caesar to encode his messages sent across the Roman Empire! For this reason these ciphers are called Caesar ciphers.

These days we don’t need a handy Caesar Cipher ring to perform this process as it is easy enough to have a computer perform this process. The below python code will cipher a plain text message using a Caesar Cipher to encode the text.

import sys	 

def encode(k, plaintext):
    """Pass in an integer rotation key k and a plain text message and this function will return a Caesar cipher encoding of the message"""
    alphabet = list('abcdefghijklmnopqrstuvwxyz')
    cipher = ''

    for c in plaintext:
	if c.lower() in alphabet:
	    cipher += alphabet[(alphabet.index(c.lower())+k)%(len(alphabet))]
            cipher+=c #if the character isnt in the alphabet above then dont change it
    print 'Your encrypted message is: ' + cipher

def decode(k, ciphertext):
    """Pass in an integer rotation key k and a ciphertext message and this function will return the plain text message"""

    alphabet = list('abcdefghijklmnopqrstuvwxyz')
    plaintext = ''

    for c in ciphertext:
	if c.lower() in alphabet:
	    plaintext += alphabet[(alphabet.index(c.lower())-k)%(len(alphabet))]
            plaintext+=c #if the character isnt in the alphabet above then dont change it
    print 'Your decrypted message is: ' + plaintext


if __name__ == "__main__":
    plaintext = list(raw_input('Enter message: '))
    ciphertext=encode(2, plaintext)

Breaking the Caesar Cipher

The Caesar cipher provided a reasonably secure form of communication in the ancient world, perhaps because the majority of Rome’s enemies were illiterate or would assume the coded message was written in a language they didn’t know. However, if you suspect the message is encoded using a Caesar cipher you can check each of the 25 possible rotations systematically to see if any of the messages convert to a plain text message which makes sense. This would have be difficult to check in the ancient world but with the advent of computers we can perform this task in a matter of seconds. The simple python script below can be used to break our Caesar cipher:

def breakCode(ciphertext):
    """Give this an encrypted message and this will print out all possible Caesar ciphers"""

    for k in range(26):
        alphabet = list('abcdefghijklmnopqrstuvwxyz')
        plaintext = ''

        for c in ciphertext:
	    if c.lower() in alphabet:
	        plaintext += alphabet[(alphabet.index(c.lower())-k)%(len(alphabet))]
                plaintext+=c #if the character isnt in the alphabet above then dont change it
        print k,plaintext

The output of this program if we encrypted the message “The Germans are coming!” with a key (number of rotations of 2) is shown below:

0 vjg igtocpu ctg eqokpi!
1 uif hfsnbot bsf dpnjoh!
2 the germans are coming!
3 sgd fdqlzmr zqd bnlhmf!
4 rfc ecpkylq ypc amkgle!
5 qeb dbojxkp xob zljfkd!
6 pda caniwjo wna ykiejc!
7 ocz bzmhvin vmz xjhdib!
8 nby aylguhm uly wigcha!
9 max zxkftgl tkx vhfbgz!
10 lzw ywjesfk sjw ugeafy!
11 kyv xvidrej riv tfdzex!
12 jxu wuhcqdi qhu secydw!
13 iwt vtgbpch pgt rdbxcv!
14 hvs usfaobg ofs qcawbu!
15 gur treznaf ner pbzvat!
16 ftq sqdymze mdq oayuzs!
17 esp rpcxlyd lcp nzxtyr!
18 dro qobwkxc kbo mywsxq!
19 cqn pnavjwb jan lxvrwp!
20 bpm omzuiva izm kwuqvo!
21 aol nlythuz hyl jvtpun!
22 znk mkxsgty gxk iusotm!
23 ymj ljwrfsx fwj htrnsl!
24 xli kivqerw evi gsqmrk!
25 wkh jhupdqv duh frplqj!

This illustrates the cat and mouse game which is cryptography. Most codes in use today are theoretically breakable but are what is called computationally secure — they would take a unrealistic amount of computing power to crack them. However, as technology improves their is a constant need for new ciphers which can keep just ahead of the technological advances of our adversaries.

Salem Witch Trials Model

Witch Hunt

The Salem witch trails provide a horrifying example of the effects of mass hysteria. In 1692, twenty unfortunate souls in Salem, Massachusetts were accused of witchcraft, convicted and hanged. This was the subject of Arthur Miller’s famous play “The Crucible”. This subject is typically studied in english or history class, but I hope to demonstrate in this post it also can provide an example of mathematical modelling.

Using a mathematical model to understand this phenomena allows us to get at the core of the driving dynamics. Why did the witch trials in Salem spiral out of control? What unique factors in Salem may have contributed to the hysteria? How are witch trials similar to other phenomena like the growth in a biological population, rumor spreading or disease epidemics? A good mathematical model can help us understand all of these questions.

In the witch trials the accused were almost always convicted and then given the choice of hanging or confessing. If they confessed then they had to provide the names of other people they had observed when consorting with the devil. Those named were in turn faced with the same dilemma: name people or hang.

Defining Variables

The first step in developing a mathematical model is define the variables and parameters which you would like to consider. There is a definite art to this process, although it is typically best to start as simple as possible. If the model is deemed overly simple you can always add complexity later.

In this vein we define the total population of Salem as {N_s}. It is convenient to organize our thinking into discrete trials {t=1,2,3,...} so that everyone who is currently accused of witchcraft stands trial together. Thus, let {W_t} be the number of people accused of witchcraft at trial {t}. Since we will assume that if you are accused of witchcraft you will most assuredly be convicted, we can divide the accused witches into two groups: Those who will confess to witchcraft and name {k} other witches and those that will hang for their silence. Let {p_m} be the fraction of martyrs in the Salem population which will refuse to confess.

Let us assume that those who choose to name names do so in a purely random manner. That is when they are accused they choose k people randomly from the Salem population to implicate in witchcraft. This of course is an oversimplification as people will likely have allies, friends and enemies and will be more likely to name people they dislike. This phenomena could be added to the model later. Also, notice we allow the accused to name those who may already have been implicated.

At this point we also should make clear what information we hope to get out of our model. Notice, the witch hunt comes to a close when no one is accused of witchcraft any longer W_N \rightarrow 0. Therefore, we can ask how many people we expect to hang, call this {H}, when the dust settles and the witch hunt is called off. We can ask how this total hung is affected by the fraction of martyrs {p_m} and by {k} the number of new witches named by each non-martyrs accused.

Extreme Cases

It is often useful when developing a model to examine extreme cases first. This can give an idea as to whether the model lines up with our intuition. First we can consider a population of 100 percent martyrs. Here the very first person who is accused has to be a martyr and will hang without naming anyone else. Therefore, if we assume that the witch hunt begins with a single accusation ({W_1=1}) then if {p_m=1} we have {H=1}.

At the other extreme, in a population with no martyrs then no one will hang as everyone will keep naming names {H=0}. We can also derive an upper bound for how many people can possibly be hung {H \leq p_m * N_s} the total number of martyrs in Salem. When {H=p_m N_s} then everybody in the village who was willing to be hung instead of confessing has already been hung!

Thought Experiments

For simplicity let us assume that {W_1=1} that is the witch hunt is started by a single accusation. Clearly if the first person accused is a martyr then they will hang without naming anyone else and they will be the only one hung. This will happen with probability {p_m} if we assume the first accusation occurs at random in the population. However, if the first person accused wants to save themselves they choose to name {k} other people then the witch hunt will continue. The second trial will commence with the {k} people named by the first witch. We can estimate the probability of the witch hunt ending after just two trials the probability of the first trial leading to a confession {(1-p_m)} times the probability that all {k} of those accused of witchcraft in the second trial refuse to name names and are hung {\approx p_m^k}. Thus we have that the probability that {H=k} is roughly {(1-p_m)p_m^k}.

These calculations will become more complicated as we move on. The important lesson we can take from this section is that we expect the witch hunt to end after only one trial in p_m fraction of cases. In the rest we can expect to see the witch hunt begin to grow quickly as people begin naming names and thus the number of hanged people will increase quickly.

This hints that our distribution of hanged people should be bimodal. In the case that the first person is hung the witch hunt is quickly contained. However, if the first person chooses to confess then the witch hunt will grow quickly, but will extinguish eventually as nobody is left to be hung.


It might be relatively difficult to calculate the probability of a certain number of people being hung, but it is relatively easy to simulate the witch trial process. If we simulate the process many times and record the number of people hung in each run we can get an idea of the probability distribution for {H}.

The below plot shows the number of hung people for 10,000 simulations with p_m=0.23, N_s=100 and k=3.



We can see that the plot has two peaks,  one as expected from our analysis sits has only one person being hung. This corresponds to the case that the first person accused is a martyr. However, we see a second peak with around twenty people being hung during the witch hunt which happens about 1-p_m or \approx 75% of the time.

If we increase the percentage of the martyrs in the population we will see that the chances that a full-fledged witch hunt ensues will decrease. However, if a witch hunt starts it may lead to a larger number of people being hanged. Below are the results for a simulation with p_m=0.40.


The Salem witch trials provide a good example of the use of mathematical modelling. From a relatively simple model we were able to understand the role of the population structure on the evolution of a witch hunt. Having a large percentage of martyrs in the population leads to an inherent resistance against a witch hunt escalating. However, once the hunt has begun growing it can easily lead to the death of the majority of the martyrs.

I mentioned earlier that a strength of mathematical modelling is that is can allow us to see parallels between seemingly unrelated phenomena. For example our witch hunt model could be easily changed to study rumor spreading in a population which can be divided into gossips-who when they receive the information spread it to k other individuals and trustworthy individuals who don’t spread the rumor when it reaches them. In this case we would also expect a bimodal distribution depending on whether the rumor is initially spread to a gossip. As we have all experienced a rumor can quickly get out of control when it reaches the wrong ears!

The model we considered here is an example of a branching process a fascinating topic for which much mathematical theory exists.

Python Code

If you are curious the python source code for the simulations is given below.

import scipy as sp
import numpy as np
import seaborn as sbn
import pylab as plt

class Witch:

    def __init__(self, pm=0.5, Nw=100, W1=1, k=3):
        self.Mcutoff=int(pm*Nw) #establish the martyr pop
        self.pop=range(1,Nw+1) #total population
    def runSim(self):

        self.currentWitch=[] #currently accused witches

        self.currentWitch.append(np.random.choice(self.pop,1)) #choose an initial witch at random
        while (len(self.currentWitch)>0):
            for g in range(len(self.currentWitch)):
                if w < self.Mcutoff:
                    newWitches=np.random.choice(self.pop, self.k) #choose k people to accuse
                    for nw in newWitches:
                        if nw not in self.accused:
                    self.accused+=list(newWitches) #add the new accusations to the list, can't be accused twice


if __name__=="__main__":
    for i in range(10000):
    sbn.distplot(dataHung, bins=MaxW, kde=False, norm_hist=True, color='red')
    plt.xlabel('Number Hung')
    plt.ylabel('Probability of Occurance')
    plt.title('Salem Witches')



Circadian Parameter Fitting I


Parameter fitting is the dirty secret of applied math and mathematical biology. It is difficult, frustrating, time consuming and very easy to mess up. The good news is that when it is done well there may be very little reward mainly due to a general mistrust in the scientific community for fit models. Outside of more mathematically oriented journals the parameter fitting process applied is buried in a brief footnote or relegated to the supplementary material.

This mistrust for models with many parameters is reflected in the famous scientific idiom attributed to John von Neumann “With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.” This expresses the idea that a complex model with many parameters is likely to display any behaviour you want if you have enough parameters.

The reality of course is much more complicated than the elephant quote suggests; whether you can really fit the elephant depends crucially on the structure of our model and the structure of the data used in the fit. However, there are ways to determine how well the data constrains the parameters. Even when the parameters are poorly determined by the data, the model can still make useful predictions (see Sloppy Models) . This is a growing and fascinating topic and should be useful to modelers and experimentalists alike as it can be used in both model selection and experimental design.

For example, a particularly frustrating misconception in the circadian literature is that the ability of a model to show 24 hour oscillations is evidence that it can used to describe the circadian clock. No . Any dynamical system which has a limit cycle can show 24 hour oscillations by a simple rescaling of time. Perhaps what is meant by this assertion is that the system shows 24 hour oscillations for biophysically reasonable parameter values. Although even this may be difficult to argue for many of the parameters we may have little idea of what is biophysically reasonable. This is an example of the data doing little to constrain the model. We will see that other forms of data can provide better evidence in favor of the model.

In this series of posts I am going to examine parameter fitting for biochemical models, specifically tailored to systems biology and rhythmic processes (circadian rhythms, etc). In this first post I introduce the least-square cost function typically used to compare simulations and data. Later posts will address finding good parameter fits and then accessing the model.

Least Square Cost Functions

In parameter fitting for dynamical systems our model takes the form:

\displaystyle   \frac{d\vec{y}}{dt}=\vec{f}(\vec{y},\vec{\theta}) \ \ \ \ \ (1)

where {\vec{\theta} \in \mathbb{R}^{N_p}} is the parameter vector. We assume that we have data of the form {\{(Y_j, \sigma_j)\}_{j=1}^{N_d}} corresponding to a value of the dynamical variable {Y_j} at time {t_j} with uncertainty {\sigma_j}.

The idea of the parameter fitting procedure is to define some cost function which gives a metric for how well the data and the model agree as a function of the parameters. Thus, the cost function will achieve a minimum where the “best fit” is achieved.

Mathematically, we may define the problem as: Find a parameter vector ,{\vec{\theta}} ,which minimizes the cost {C:\vec{\theta} \rightarrow \mathbb{R}}. A good way to define the cost function in this case is to set

\displaystyle   C(\vec{\theta})=\frac{1}{2} \sum_{j=1}^{N_d} \left(\frac{y_j(t_j, \vec{\theta})-Y_j}{\sigma_j}\right)^2 \ \ \ \ \ (2)
This is often rewritten as,

\displaystyle   C(\vec{\theta})=\frac{1}{2} \sum_{j=1}^{N_d} \left(R_j(\vec{\theta})\right)^2 \ \ \ \ \ (3)

where {R_j(\vec{\theta})} is the residual between data point j and the simulation. When a cost function is written in these terms (the sum of the residuals squared) it is called a least square cost function. Under this definition our parameter fitting routine will look for the parameter set which minimizes the distance between the simulation and the model squared.