SQL Server Joins using Playing Cards
The Inspiration
September 28, 2004 - Orlando, Florida @sqlpass. While listening to @BradMcgehee explain the value of indexes when joining tables on foreign key to primary key, I had an epiphany. The Merge Join is like shuffling two brand new decks of cards. And that in a nutshell was the inspiration for this article. By the end of this you hopefully will have proof that you think like a cost-based query optimizer and will know a dozen new trivia facts about playing cards.
The Game
You and your partner are competing on a show similar to Minute to Win It. @ChefGuyFieri (User) supplies different scenarios (Queries) where all the matches (Joins) between two piles (Tables) of playing cards need to be found (Results). You (Query Optimizer) are responsible for giving instructions to your partner (Storage Engine) on how to find the matches in the least amount of time.
The $10,000 Question
A pinochle deck is cut in half to form two piles. You need to find any exact matches of four cards from the top pile to the bottom pile. Do you know what cards are in a pinochle deck? Hopefully you do not because this counts as a trivia fact [#1]. A pinochle deck includes the 9 through Ace of all four suits, also referred to as a euchre deck [#2], which is then doubled giving you forty-eight cards. Now that you have the statistics, what instructions would you give your partner for matching the cards?
The Nested Loop (NL)
If you want to prove your mind is like a cost-based query optimizer then take a minute and think about how you would solve this before the clock starts. The Nested Loops is analogous to spreading the bottom deck (Inner Table) face up and scanning across it four times, once for each card from the top deck (Outer Table). In this scenario looping was efficient. The worst case (no matches) is twenty-four (½ of 48-card deck) tests of four different cards for a total of ninety-six compares. The best case (identical first cards) is only four compares because every match is unique allowing your partner to short-circuit the loop with each find. How lucky would that be?
The $100,000 Question
An unused red French deck [#3] and an unused blue Anglo-American deck [#4] are placed side-by-side. You need to find any exact matches between the red and blue cards. French? Anglo-American? No worries, both decks are just names given to the standard 52-card deck you are used to. Now that you have the statistics, what instructions would you give your partner for matching the cards?
The Merge Join (MJ)
The answer to this one was given away in the blog post trailer before the game. Perform a perfect shuffle using the method called riffle or dovetail shuffle [#5], where cards are held in each hand and interwoven by the thumbs allowing the cards to fall together. If your partner can to do this Merge Join correctly then it is blazingly fast. The decks were identical so every card had a match and every match was unique. Throw in the fact that the decks were sorted and it is the ideal scenario for a merge. Sorting the cards in the previous question would have wasted time because it would involve comparing many cards within the bottom deck that are not going to be matched anyway. Was epiphany too strong a word?
The $1,000,000 Question
There is a casino 8-deck shoe and a deck without any piquet cards [#6]. A piquet deck is comprised of candy canes [#7], snowmen [#8], Nina Rosses [#9] and all the Broadway cards [#10]. Translated from slang that means 7’s, 8’s, 9’s and 10 through Ace cards. (Cool how the trivia number matched the card number, huh?) You need to find any pip [#11] count matches, which means cards with the same number of symbols. Now that you have the statistics, what instructions would you give your partner for matching the cards?
The Hash Match (HM)
This one is hard but what did you expect, it is the million dollar question. In this case there will be many non-matches and many duplicate matches. This is due to the short deck having only 2 through 6 (less than half the card ranks) and due to the matching being by pip count were the 2 of clovers [#12] (clubs) matches the same thirty-two (8-decks by 4-suits) cards as the 2 of diamonds, the 2 of hearts and the 2 of spades. Precious seconds are wasted with each compare of the cards 7 through Ace (non-matches) and yet there will need to be several repeat compares of cards 2 through 6 (duplicate matches). Like in most games there's a trick and here is one for quick matching when you have both of those conditions.
Imagine if you bought a deck of cards but after someone used them they only put back fifty-one cards. This happens at my house all the time! How do you find the missing card? If your mind is like a cost-based optimizer then you make thirteen (2 through Ace) hash buckets and look for the bucket with only three cards. For this question, making hash buckets for each of the decks means every card is looked at once and buckets for each deck contain only cards of the same pip count. By matching the buckets from each deck with the same pip count, you and your partner have just completed a Hash Match. Again, it was the many non-matches and many duplicate matches that made the difference.
The Bonus Round
In the first question, what if the match was by color? Or if it was fourteen wheel cards [#bakers dozen]? In the second question, what if neither deck was sorted? Or if each deck was made-up of individual cards from fifty-two decks? In the third question, what if the 8-deck shoe was made from euchre decks? Or the match was by suit? These help demonstrate why statistics (and constraints) make all the difference. [MJ, HM, NL, HM, Constant Scan, MJ]
The Closing Announcements
Ami Levin wrote a series on Joins with playing card analogies:
- Physical Join Operators in SQL Server - Nested Loops
- Physical Join Operators in SQL Server - Merge Operator
- Physical Join Operators in SQL Server - Hash Operator
Craig Freedman wrote a series on Joins with deep dives:
The Special Thanks

