Programming

Peter Norvig has on online class about program design, and one of the first things he builds is a hand-ranker. Might be worth a watch:

One thing to note is that his program is drastically less object-y than yours. Not sure if that’s impacting speed materially, but it’s worth considering whether you can get away with simpler representations.

Also of note, this code is a bug:

def holdem_equity(starting_hands, board_cards=Hand(),
                  dead_cards=Hand(), sample=False, trials=10_000):

In Python, each invocation of a function uses the same default args. That’s fine if the default is immutable, but if it’s mutable, then you have a big problem. Run the following for an example:

class Elephant():
  def __init__(self):
    self.memory = []

  def offend(self):
    self.memory.append("offended")
  
  def encounter_in_a_dark_alley(self):
    if self.memory:
      print("An elephant never forgets...")

def offend_and_reencounter(whom=Elephant()):
  whom.encounter_in_a_dark_alley()
  whom.offend()
  whom.encounter_in_a_dark_alley()

print("First run:")
offend_and_reencounter()
print("Second run:")
offend_and_reencounter()

The fix is to replace Hand() with None in the arguments, then:

if dead_cards is None:
   dead_cards = Hand()

OK, on to the question. I agree that you’ll save a lot of time by abstracting the specific suits. I also wonder whether you can be more efficient by evaluating hand strength incrementally. The idea would be to start with the two players’ hands, then build out a tree by pushing one card at a time onto the board. It seems like it should be easy to track high-card/pair status of the hands as each card is added, and you can also forgo updating the leader’s hand until it’s overtaken. That would also let you fully collapse suits in many branches where flushes aren’t possible.

Good catch on the mutable default arguments. I haven’t tested it with board cards or dead cards yet, so I didn’t encounter the bug.

I’ll try tinkering around with how the ranking works and allowing a reference table to share results between evaluations.

Really depends what you are going for. If you just want to use this as a OO coding experiment and don’t really care about performance then it obviously doesn’t matter much. Assuming you do care about performance, iirc the top evaluators perform around 300,000,000 random evaluations per second (up to 2-3x that for ordered enumerations), per CPU thread. That is as of a few years ago, adjust for better hardware now.

If you want to get into that ballpark then you absolutely can not use Lists/Objects to represent the cards in your evaluated hand. Just barely initializing your Hand object probably brings you over your time budget.

Most fast evaluators use a single 64-bit integer to represent hands. Which bit layout makes the most sense depends on your evaluation approach later, but the common options are by-rank (lowest 4 bits represent deuces of all suits, next 4 bits the various 3s, etc), or by-suit (lowest 13 bits represent all e.g.
Spades, 2s3s4s…, next 13 bits Hearts, etc). You can then do all sorts of fast bitwise operations to evaluate the hand strength, the fastest evaluators additionally use some sort of lookup table.

1 Like

I’ve been continuing to fool around with the evaluator. I haven’t gotten everything working yet, but I did set up a little loop for speed testing.

counter = 0
start = time.time()
for c1 in range(52):
    for c2 in range(c1 + 1, 52):
        for c3 in range(c2 + 1, 52):
            for c4 in range(c3 + 1, 52):
                for c5 in range(c4 + 1, 52):
                    for c6 in range(c5 + 1, 52):
                        for c7 in range(c6 + 1, 52):
#                            strength = evaluate(c1, c2, c3, c4, c5, c6, c7)
                            counter += 1
end = time.time()
print(counter, 'hands counted in', str(end - start), 'seconds')

This takes about 20 seconds to run on my machine, which seems like a long time just to increment the counter. Is the use of the range() objects slowing things down here?

a million nested for loops is always going to be the slowest way to do anything.

I ilke that we’ve found an actual use for the stuff you get in programming interviews. In a 20+ year career I’ve never actually needed to do a merge sort, or bubble sort or anything similar. But I’ve never had to write a performant poker hand evaluator either.

Maybe so. I’m just trying to translate the C test code from the 22 thread linked above into Python. Here’s the C code:

int c1, c2, c3, c4, c5, c6, c7;
int handSetCounters[8] = {0,0,0,0,0,0,0,0};

for(c1 = 0; c1 < 52; c1++) {
for(c2 = c1 + 1; c2 < 52; c2++) {
for(c3 = c2 + 1; c3 < 52; c3++) {
for(c4 = c3 + 1; c4 < 52; c4++) {
for(c5 = c4 + 1; c5 < 52; c5++) {
for(c6 = c5 + 1; c6 < 52; c6++) {
for(c7 = c6 + 1; c7 < 52; c7++) {

CALCULATE RANK
CALCULATE SET ID (HIGH CARD TO STRAIGHT FLUSH)
INCREMENT HANDSETCOUNTERS[SET ID]

}}}}}}}

PRINT HANDSETCOUNTERS

I expected Python to be slower than C, but not to this extent.

Yeah, it’s a slow algorithm independent of implementation. IIRC loops have a lot of overhead in Python and nested list comprehension will be faster, but that doesn’t solve your problem, which is the use of nested loops altogether.

What is this chunk of code for? Just testing the evaluate function? If so, so what it takes too long, it does the trick, move on. But if this gets run for every simulated hand, it’s problem.

I’ve tried two other options that I thought might be faster than the for loops with ranges, but I’m getting similar speeds. I’ll see how the nested list comprehensions do later.

counter = 0
start = time.time()
c1 = 0
while c1 < 52:
    c2 = c1 + 1
    while c2 < 52:
        c3 = c2 + 1
        while c3 < 52:
            c4 = c3 + 1
            while c4 < 52:
                c5 = c4 + 1
                while c5 < 52:
                    c6 = c5 + 1
                    while c6 < 52:
                        c7 = c6 + 1
                        while c7 < 52:
                            counter += 1
                            c7 += 1
                        c6 += 1
                    c5 +=1
                c4 += 1
            c3 += 1
        c2 += 1
    c1 += 1
end = time.time()
print(counter, 'hands counted with while loop in', str(end - start), 'seconds')
counter = 0
start = time.time()
hands = it.combinations(range(52), 7)
for hand in hands:
    counter +=1
end = time.time()
print(counter, 'hands counted with itertools', str(end - start), 'seconds')

The code is for benchmarking hand evaluation speeds. I’m not trying to set any speed records, but was hoping to get something that is halfway decent.

However, the fastest hand evaluators can run through this loop and evaluate every hand in less than a second.

I haven’t event turned on the evaluator, and I’m already 20 seconds behind. So it seems pretty hopeless that get a decent evaluator test time with Python.

ok so you’re evaluating 7-card hand, right? Instead of using the nested loops to evaluate every hand, use them to build a dictionary or list of hands then pass it to the evaluator function.

Slowest attempt so far:

counter = 0
start = time.time()
for hand in ((c1, c2, c3, c4, c5, c6, c7) for c1 in range(52)
                                          for c2 in range(c1 + 1, 52)
                                          for c3 in range(c2 + 1, 52)
                                          for c4 in range(c3 + 1, 52)
                                          for c5 in range(c4 + 1, 52)
                                          for c6 in range(c5 + 1, 52)
                                          for c7 in range(c6 + 1, 52)):
    counter += 1
end = time.time()
print(counter, 'hands counted with comprehension', str(end - start), 'seconds')

I give up.

counter = 0
start = time.time()
while counter < 133784560:
    counter += 1
end = time.time()
print(counter, 'hands counted with simple counting', str(end - start), 'seconds')

Output from all variations:

133784560 hands counted with for loop in 30.414705753326416 seconds
133784560 hands counted with while loop in 34.20109176635742 seconds
133784560 hands counted with itertools 23.50975775718689 seconds
133784560 hands counted with comprehension 36.54576539993286 seconds
133784560 hands counted with simple counting 23.47886323928833 seconds

OMG we are back in business
Thank you, baby numba

import time
from numba import jit

@jit(nopython=True)
def trials():
    counter = 0
    for c1 in range(52):
        for c2 in range(c1 + 1, 52):
            for c3 in range(c2 + 1, 52):
                for c4 in range(c3 + 1, 52):
                    for c5 in range(c4 + 1, 52):
                        for c6 in range(c5 + 1, 52):
                            for c7 in range(c6 + 1, 52):
    #                            strength = evaluate(c1, c2, c3, c4, c5, c6, c7)
                                counter += 1
    return counter
start = time.time()
counter = trials()
end = time.time()
print(counter, 'hands counted with for loop in', str(end - start), 'seconds')
133784560 hands counted with for loop in 0.21671438217163086 seconds
2 Likes

I get a 30% speedup from pre-computing the ranges:

counter = 0
start = time.time()
for c1 in range(52):
    for c2 in range(c1 + 1, 52):
        for c3 in range(c2 + 1, 52):
            for c4 in range(c3 + 1, 52):
                for c5 in range(c4 + 1, 52):
                    for c6 in range(c5 + 1, 52):
                        for c7 in range(c6 + 1, 52):
#                            strength = evaluate(c1, c2, c3, c4, c5, c6, c7)
                            counter += 1
end = time.time()
print(counter, 'hands counted in', str(end - start), 'seconds')

ranges = [range(i, 52) for i in range(53)]

counter = 0
start = time.time()
for c1 in ranges[0]:
    for c2 in ranges[c1 + 1]:
        for c3 in ranges[c2 + 1]:
            for c4 in ranges[c3 + 1]:
                for c5 in ranges[c4 + 1]:
                    for c6 in ranges[c5 + 1]:
                        for c7 in ranges[c6 + 1]:
#                            strength = evaluate(c1, c2, c3, c4, c5, c6, c7)
                            counter += 1
end = time.time()
print(counter, 'hands counted in', str(end - start), 'seconds')

Results:

133784560 hands counted in 14.301237106323242 seconds
133784560 hands counted in 10.481547594070435 seconds

Cool trick. I wonder if that makes a difference with numba or if the JIT compiler already makes that optimization.

Why is range so slow?

I don’t think it’s really that slow, it’s just that the inner loop only has a single addition in it, so any other operation is going to take comparatively a significant amount of time. I added a second addition into the loop, and got the following:

133784560 hands counted in 15.26519513130188 seconds by baseline
133784560 hands counted in 11.38058352470398 seconds with precomputed ranges
133784560 hands counted in 19.098947763442993 seconds with precomputed ranges and second addition

if you guys were cool you’d program in a real programming language that doesn’t have range imo

Yeah, you definitely have to use something like numba for this.

A friend of mine ported a Java evaluator i wrote to Python (numba/numpy), i’ll check with him if i can post it here. The 133784560 hands take about 0.7s to evaluate with that version, that’s about half the speed of the original Java version.