While looking for some code to use in a screenshot (Microsoft rejected my first WP7 submission because of the original icon I used), I came across some code from a graduate class I took on intelligent systems. The main tool we used in that class was Weka. Weka comes with some sample datasets, and a variety of sites host datasets that Weka can use. My group was looking at a dataset of poker hands, although, like the dataset's creators, we didn't have much luck with it. Their dataset consisted of random cards with suits. I think the same card couldn't appear twice in a hand, although I am not completely sure on that.
The goal was to feed Weka a list of hands, each labeled with a value representing what the hand was. Weka would then build a model that learns to classify a poker hand without knowing any of the rules of poker. We would then use that model in another program to determine what a given hand was.
One issue with their dataset was that, without preprocessing, a card's suit mattered as much as its value. Order also seemed to affect the classification, because 2,3,4,5,6 might be classified differently than 2,4,3,5,6. Since they built the dataset to simulate real hands being drawn, each hand type shows up in about the same proportion as it would in real play. I think they actually had to add some royal flushes manually because none were produced randomly. The problem with a dataset that has only a few examples of some classes is that the model has trouble classifying those classes correctly, because it has not been trained on enough of them. Also, getting 5 or so hands of a rare type wrong doesn't register as a big error when there are thousands of sample hands.
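Below is a minimal Java sketch of the kind of preprocessing that was missing, assuming each card arrives as a numeric rank plus a numeric suit. The encoding and the `HandCanonicalizer`/`canonicalize` names are hypothetical, not taken from the original dataset or our code. Sorting the ranks removes card order from the features, and collapsing the five suits into one all-same-suit flag stops a particular suit from counting as much as a card value.

```java
import java.util.Arrays;

/**
 * Hypothetical preprocessing step: turns five (rank, suit) cards into an
 * order-independent feature vector of sorted ranks plus a same-suit flag.
 */
public class HandCanonicalizer {
    /** Returns the five ranks in ascending order, followed by 1 if all suits match, else 0. */
    static int[] canonicalize(int[] ranks, int[] suits) {
        if (ranks.length != 5 || suits.length != 5) {
            throw new IllegalArgumentException("expected exactly five cards");
        }
        int[] sorted = ranks.clone();       // don't mutate the caller's array
        Arrays.sort(sorted);                // card order no longer matters
        int[] features = Arrays.copyOf(sorted, 6);
        boolean sameSuit = true;
        for (int suit : suits) {
            if (suit != suits[0]) {
                sameSuit = false;
                break;
            }
        }
        features[5] = sameSuit ? 1 : 0;     // which suit it is no longer matters
        return features;
    }

    public static void main(String[] args) {
        int[] a = canonicalize(new int[]{2, 3, 4, 5, 6}, new int[]{1, 2, 1, 3, 4});
        int[] b = canonicalize(new int[]{2, 4, 3, 5, 6}, new int[]{4, 4, 2, 1, 3});
        System.out.println(Arrays.toString(a)); // [2, 3, 4, 5, 6, 0]
        System.out.println(Arrays.equals(a, b)); // true
    }
}
```

With this step, 2,3,4,5,6 and 2,4,3,5,6 map to the same feature vector regardless of which suits the cards had, so the learner only sees the information that actually decides the hand.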
I wrote a program that creates an ARFF file (basically a CSV file with Weka headers) containing n evenly distributed hands. It randomly picks the hand type and then randomly creates a hand that fits the chosen type. We ignore the suit because in reality the specific suit doesn't matter; what matters is whether or not all the cards share the same suit, and we have one field for that. The cards are also sorted after the hand is picked, which eliminates the ordering problem. We didn't see sorting as cheating, because it is something people may do mentally while playing poker. If anyone wants to use this file, I have uploaded it here.
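A rough Java sketch of such a generator follows. It is not the original program; the class labels, the 2..14 rank encoding (Ace = 14), the ARFF attribute names, and every method name are assumptions made for illustration. Each hand type is chosen with equal probability, ranks are built to fit that type and sorted, and the suits are reduced to a single same-suit flag.

```java
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical sketch of a class-balanced poker-hand ARFF generator.
 * Ranks are 2..14 (Ace = 14); suits are reduced to one same-suit flag.
 */
public class BalancedPokerArff {
    static final String[] CLASSES = {"high_card", "one_pair", "two_pair", "three_kind",
        "straight", "flush", "full_house", "four_kind", "straight_flush", "royal_flush"};

    /** Builds sorted ranks in which the g-th distinct rank appears counts[g] times. */
    static int[] fromPattern(Random rnd, int... counts) {
        List<Integer> pool = new ArrayList<>();
        for (int r = 2; r <= 14; r++) pool.add(r);
        Collections.shuffle(pool, rnd);             // first entries are distinct random ranks
        int[] ranks = new int[5];
        int i = 0;
        for (int g = 0; g < counts.length; g++)
            for (int k = 0; k < counts[g]; k++) ranks[i++] = pool.get(g);
        Arrays.sort(ranks);                         // sorted, so card order never matters
        return ranks;
    }

    /** Five consecutive ranks starting at low; low == 1 gives the ace-low straight. */
    static int[] straightFrom(int low) {
        int[] ranks = new int[5];
        for (int k = 0; k < 5; k++) ranks[k] = (low + k == 1) ? 14 : low + k;
        Arrays.sort(ranks);
        return ranks;
    }

    /** Expects five sorted ranks. */
    static boolean isStraight(int[] s) {
        if (Arrays.equals(s, new int[]{2, 3, 4, 5, 14})) return true;
        for (int k = 1; k < 5; k++) if (s[k] != s[0] + k) return false;
        return true;
    }

    /** Five distinct ranks that do not form a straight (used for high card and flush). */
    static int[] noStraight(Random rnd) {
        int[] ranks;
        do { ranks = fromPattern(rnd, 1, 1, 1, 1, 1); } while (isStraight(ranks));
        return ranks;
    }

    /** Returns five sorted ranks followed by the same-suit flag (0 or 1) for class cls. */
    static int[] generate(int cls, Random rnd) {
        int[] ranks;
        boolean sameSuit = cls == 5 || cls == 8 || cls == 9;
        switch (cls) {
            case 0: case 5: ranks = noStraight(rnd); break;
            case 1: ranks = fromPattern(rnd, 2, 1, 1, 1); break;
            case 2: ranks = fromPattern(rnd, 2, 2, 1); break;
            case 3: ranks = fromPattern(rnd, 3, 1, 1); break;
            case 4: ranks = straightFrom(1 + rnd.nextInt(10)); break; // up to 10-J-Q-K-A
            case 6: ranks = fromPattern(rnd, 3, 2); break;
            case 7: ranks = fromPattern(rnd, 4, 1); break;
            case 8: ranks = straightFrom(1 + rnd.nextInt(9)); break;  // excludes the royal
            default: ranks = straightFrom(10); break;                 // royal flush
        }
        int[] row = Arrays.copyOf(ranks, 6);
        row[5] = sameSuit ? 1 : 0;
        return row;
    }

    /** Writes an ARFF file with n hands whose type is chosen uniformly at random. */
    static void writeArff(PrintWriter out, int n, Random rnd) {
        out.println("@relation poker_hands");
        for (int c = 1; c <= 5; c++) out.println("@attribute card" + c + " numeric");
        out.println("@attribute same_suit {0,1}");
        out.println("@attribute class {" + String.join(",", CLASSES) + "}");
        out.println("@data");
        for (int i = 0; i < n; i++) {
            int cls = rnd.nextInt(CLASSES.length);  // every hand type equally likely
            StringBuilder line = new StringBuilder();
            for (int v : generate(cls, rnd)) line.append(v).append(',');
            out.println(line.append(CLASSES[cls]));
        }
        out.flush();
    }

    public static void main(String[] args) {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 20;
        writeArff(new PrintWriter(System.out), n, new Random());
    }
}
```

Running it with an argument such as 1000 prints an ARFF file with roughly 100 hands of each type to standard output, which can be redirected to a .arff file for Weka. Choosing the type first and then building a matching hand is what keeps rare classes like royal flushes from being starved, at the cost of no longer reflecting real-world hand frequencies.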
The Weka classifiers we used were JRip and Multi-Layer Perceptron. The JRip model classified hands correctly 89.76% of the time, and the multi-layer perceptron model classified them correctly 97.37% of the time. Using the generated models, another group member built a Java app that lets the user create a poker hand and have the model classify it.
Update 1/19/2012: I posted the source to the above program on GitHub a little while ago to test Git for work. It is available here.