Bad day with friends->irritation->Markov Chains in Perl


I went to see Mission Impossible: Ghost Protocol with some friends the other day. The friend who asked me not to post anything too technical pissed me off at something and I told her I’ll never talk to her again (I guess I over-reacted but never mind… I don’t know if she’ll ever talk to me again but she is the best girl I’ve ever met). I was already frustrated that day because Bell Labs rejected me thrice after technical telephonic interviews three times in a row. I guess the reason is that I need visa sponsorship and due to recession and job shortages in the US, they are not hiring foreign employees. So whatever, I was really frustrated and to top it, my friend pissed me off and I was in a crappy mood. So I opened up Geany (my text editor) and started working on the concept of Markov chains to clear my mind. I was already trying to make it in C but due to my mental state, it was all getting messed up and it was further irritating me so I gave it p and took up perl. After a few hours of coding in perl I found out that it was working fine and that was an instant mood lifter! :D.

So in this post, we’ll take a look at Markov chains and what I did with it.

A Markov chain is a mathematical system that undergoes transitions between a finite set of states. It is a random process having the property of “memorylessness”. This property, also called Markov property, allows the next state to depend only on the current state of the system and not on the preceding states of the system. A mathematical treatment of it can be found in Markov Chains. Its really easy to understand so we wont dwell much into the mathematical background of it 🙂 *innocent smile*.

So having said that, I was working on Markov chains for something very covert but ended up doing something awesomely stupid (or stupidly awesome). I made a scrambler. Well, putting it more lucidly, you give it a text file and it applies the Markov property and Gaussian probability distribution to jumble up sentences. The output thus produced is hilariously correct (though it sometimes doesn’t make sense). I fed in the text version of The Book of Revelations, The Corinthians  and Alice in Wonderland and got the output as Wonderland of Revelations, The Corinthians of St. John. Though the content is totally unrelated to the title but it is a good read nonetheless :).

How does it work? This is the fun part. First we create a hash of what words follow what other words. For example: we have three sentences: “to the moon and back”, “to the stars and beyond” and “to the hell and under”. The key for the hash will be “to the” and the values for this key will be {“stars and beyond”, “moon and back”, “hell and under”}. If we put it mathematically, we now know what states to jump to, given the current state of the system! Do you get it?! :D.

Now, Markov chain is not truly “random” in nature, the transitions have a fixed probability and the state jumped to is the state that has the maximum transition probability. I still have to develop the logic of setting up transition probabilities so for now I did something utterly complex. The next two paras try to explain it.

Perl provides some very complex data structures apart from just arrays and hashes like, array of arrays, array of hashes, hash of arrays etc. A hash, also known as an associative array, associates a particular “key” to one or more “values”. The key is unique and can have multiple values. Now applying this concept, I took a word consisting of two words (let them be a1 and a2). From the fifth paragraph, a “key” in the hash then will be (a1, a2). The “values” associated with that key will be an array containing the sentences following the word (a1, a2). So we get a hash, with an array of sentences following the word (a1, a2) as the value for each key.

Having done that, we have a data structure something like this:
(to the) => [“stars and beyond”, “moon and back”, “hell and under”] #the square brackets represent an array.

Now we generate a random number that will index into the array just described above. The random number generated is from the rand() function provided by the perl library. I wanted to use the Gaussian distribution to generate the random number, that will feature in the next version. Its now easily deducible what the next step would be. Print the key along with the selected value and update the key. The new key is the combination of the second and third word of the sentence just printed. Umm an example: selecting “moon and back” from the data structure, the sentence printed is “to the moon and back” and the new key is “the moon”. With the new key, we search the hash again and repeat the whole procedure again.

Its easy to understand but coding it is a totally different ball game. Many intricacies are there… multi-valued hash, reading data from a file and storing it into a hash and then the Gaussian distribution itself. But nonetheless, I’ll up the code and the “book” I “made” ;). The code, once you get a hold of what is going on, is a breeze to understand. No extra-ordinary perl hacks are used. shows the perl code

and the book I “made” can be downloaded Wonderland of Revelations, The Corinthians of St. John

PS: to an adept Perl programmer, please forgive my weak Perl… its been a while since I spoke Perl and I need more rigorous practice.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s