Finding Lisp

Sunday, August 08, 2004

Bloom meets Ethernet

I just read John Wiseman's write up on Bloom filters on Lemonodor. It's a good write-up, so have a read.

If you're an old-time communications guy (like me), you'll recognize Bloom filters right away, although I had never heard them called that. All the early Ethernet MAC chips used simple Bloom filters in the receive logic for multicast packets (and probably every IEEE 802.X MAC, including token ring, 802.11, etc.). Here's how it worked...

The basics of the Ethernet receive logic are pretty simple. Each Ethernet frame ("packet" to you layer 3 and above guys, but more formally a "frame" when it's on the wire) contains both source and destination MAC addresses. These are the 48-bit unique addresses burned into the ROM of each Ethernet board out there at the time of manufacture. Now, an Ethernet MAC is supposed to receive at least the following frames:

Frames addressed to its own unique MAC address (called a "unicast" address).
Frames addressed to the global broadcast address.

There is also multicast processing. Most people understand broadcast pretty well. In Ethernet, when you use the broadcast address, you're sending a frame to every node that can hear you on the local layer 2 domain (think "subnet" in the IP world). Multicast is a bit different. Multicast addresses are used to target a subset of the nodes in the local L2 domain--those that have expressed interest in participating in whatever multicast-based protocol you are running.

For an analogy here, imagine being in a large room with 1000 people. If you wanted to tell something to everybody in the room with a birthday in May, you could do it one of several ways. First, you could take out a bullhorn, interrupt everybody with your message, and assume that everybody who does not have a birthday in May will ignore it. The problem with that is, it's horribly inefficient if the number of people you want to talk to is small relative to the total size of the group. If 10 out of those 1000 people have birthdays in May (statistically unlikely, but possible), then you're going to be disturbing 990 people to talk to those 10. If you have a lot of people trying to do a similar sort of thing (possibly trying to talk to the people with birthdays in March, August, and October, for instance), then everybody will be very busy, but mostly being interrupted and then ignoring messages that aren't meant for them.

Another alternative is to tell all the people with a birthday in May to stand over in one corner of the room. You would do this by prior arrangement, like maybe putting a notice in the invitation of what you intend for them to do. You can then deliver a message without having to use a bullhorn to talk to them. The people with birthdays in March, August, and October can stand in the other corners or along one wall, or whatever. When somebody really does have a message for everybody, like "Dinner is served," for instance, they can use the bullhorn for that. Other systems are also possible, but they often are less efficient (you could tell the message to everybody, one by one, for instance).

Ethernet makes a provision for multicast addresses by dividing up the 48-bit address space based on the most-significant bit. If the most-significant bit is 0, the address is an individual node address. If the most-significant bit is a 1, then the address is a multicast/broadcast address. The broadcast address is special and represented by an all-1s pattern. This means you have 2^47 - 1 multicast addresses available.

The early, 1980s, MAC chips had a problem. The issue is that you want to filter uninteresting frames in the hardware, before you DMA data around and interrupt the software driver. The more you can filter in hardware, the more processor time gets spent on useful work, rather than simply inspecting uninteresting frames and then throwing them away in software. It's fairly easy to create hardware logic that compares the incoming frame's destination address with either a 48-bit value programmed in a register or the all-1s broadcast address, but how do you handle multicast? There are a couple obvious strategies, each with problems.

First, you could simply pass every multicast frame to the driver software. This is the equivalent of being a person with very, very, sensitive hearing that is able to hear the conversations of everybody in the large room anyway, no matter if they are in another corner or not. Essentially, you'll get interrupted by every frame. It's your choice, though, and some of the earliest MAC chips used this technique because it's easy and it's guaranteed to work, even if it is inefficient. Couple this with the fact that most early network protocols didn't use multicast that much, and you could get away with it.
The second approach is to compare the incoming frame against not just the node's own unicast address but additional registers containing multicast addresses. The upper layer software tells the driver which multicast addresses the node is interested in receiving, and the driver populates the registers appropriately (the equivalent of, "We're a May birthday, so stand in that corner."). The problem here is threefold. First, how many registers is enough? There are 2^47 - 1 addresses, so having a small number of registers like 2, 4, 8, 16, or 32 seems very small in comparison to the possible range. Second, what do you do when the upper layer software exceeds that number? Do you refuse to listen to all of them? That seems wrong. Do you revert to receiving all multicasts? That seems like a big step backwards in efficiency. Finally, while you might be tempted to just put in a whole lot of registers, like 100 or 1000 say, remember that registers take silicon area and that you also have to compare the incoming frame's destination address against all the registers simultaneously. This last requirement means that you can't use an efficient RAM to store the addresses with a state machine that compares things sequentially, you have to build something called a content addressable memory (CAM) which is both inefficient in space and power dissipation and therefore limits how large you can go. Put another way, you could maybe build a CAM to do a few tens of addresses, but you couldn't do 1000.

Now, remember that we're talking early 1980s technology here. Nowdays, there are things that you can do that you couldn't do then. For instance, many modern chips do use large CAMs because semiconductor process technology has advanced to the point there that makes more sense. But back then, the tradeoffs were quite different.

A third alternative is that you get a bit probabalistic in your accept/drop logic with the driver software doing a final perfect comparison once the frame passes the hardware screen. Essentially, you use a Bloom filter. The way this was done in the early Ethernet controllers was to run the destination address thorough a CRC-32 algorithm to hash it. This turns out to be easy to do since Ethernet uses a CRC-32 frame check sequence (FCS) and the destination address is the first field in a frame. Just start computing the CRC-32 over the whole packet as you are receiving it and grab the intermediate value of the CRC-32 once you have received the destination address. Next, use some bits of the CRC-32 as an index into a bit mask. Many early chips used a 32-bit mask, so you would use the low-order 5 bits as the index, for instance. Now, the driver software takes the list of multicast addresses registered by the upper layer software and computes a CRC-32 over each address to calculate its index and sets the corresponding bits of the multicast bit mask in the hardware. The hardware does all the matching at wire speed.

Obviously, this is a probabalistic acceptance function. If multicast addresses are used randomly, which they aren't, and multicast registrations from the upper layers are rare, which they were in the 1980s and generally still are today, you have an n-in-32 chance of mistakenly accepting a multicast packet you aren't interested in, where n is the number of bits set in the bit mask. Obviously, the more addresses that are registered by the upper layer software, the more the probability trends toward 1. In other words, the probability that nearly all the bits in a 32-bit mask being set with less than 32 addresses registered is zero. The probability rises quickly after that point, however.

The nice thing about this technique is that it scales up very easily. It's far cheaper to include a 512-bit mask than to include a 512-entry CAM, for instance. Later model Ethernet controllers from the early 1990s often used larger bit masks to reduce the probability of a "false positive" (where the hardware accepts the packet but the software isn't interested).

So, it's true what they say: "There is nothing new under the sun."

# posted by Dave Roberts : 5:36 PM
2 comments links to this post

Comments:

A single hash table lookup isn't a Bloom filter, and has none of the advantages over a single hash lookup that a Bloom filter does.

# posted by

Anonymous : August 09, 2004 12:38 PM

Sure it is. It's a Bloom filter with a single hash function, rather than some number greater than one. The general effect is exactly the same, but the probability of false positives is simply higher than if more independent hash functions had been used.

# posted by

Dave Roberts : August 09, 2004 1:59 PM

Finding Lisp

Books of Note

Sunday, August 08, 2004

Bloom meets Ethernet

Site Links

Links for Newbies

Feeds

Archives