Previously we have discussed what Neutrino is and what it has to offer wallets as Bitcoin’s new light client. In this post we’ll dive into the details of how Neutrino works its magic.
Neutrino is specified over BIPs 157 and 158. BIP 158 will be the topic of this post and details the data structures that Neutrino uses which allow clients to privately detect when they need to download more blockchain data. BIP 157 specifies the P2P messages used by Neutrino clients as well as what the operation of those clients and their servers should be. BIP 157 will be the topic of the next post in this series.
A filter is a data structure which has one main function called matching which returns True or False given some input data. Given some set of data, one constructs a filter for that returns True when fed any input from that set, and False otherwise. The simplest example of a filter is simply keeping all of the data in a list where matching is simply going through your list sequentially and seeing if you find a match. Probabilistic filters are very similar to normal filters except that they are allowed to return True with some small probability – called the false-positive rate – even if the data was not in the set for which the filter was constructed. This probability is configurable but Neutrino nodes currently use a false-positive rate of 1/784,931.
While there is the down-side of having a false-positive rate, the upside is well worth it since probabilistic filters are generally really small when compared to the underlying data. They also tend to be quite efficient to use, all of which makes them perfect for Bitcoin light clients. (Fun fact: Bitcoin Core now actually uses these filters as an internal data structure for full nodes, used for testing to see if certain data is in a block).
BIP 158 defines a new (to Bitcoin) probabilistic filter using Golomb-Coded Sets of hashes (more on this later). At a high level, here is how these filters are used in Neutrino. For each Bitcoin block, one of these filters is deterministically constructed. A block filter is around 10 times smaller than the block from which it was constructed, and matches all script public keys (think addresses) used in the block’s inputs and outputs. These filters are then passed from servers (full nodes) to clients (Neutrino nodes) who can match against the addresses they are interested in monitoring. If this results in a positive match, they can download the full block and do further validation.
Now that we have some idea of how these filters are used, let’s discuss how they work. If you are not familiar with hash functions, I suggest doing some high-level reading on them and returning to this post.
BIP 158 filters use the Siphash function which takes in a key and some data and outputs 64 seemingly random bits. We then squish this hash into a much smaller range (size of data set * 784,931). The Siphash function is optimized for speed. However, with only 64 bits of output (and even fewer after squishing), this hash function is not collision resistant, which is what will ultimately cause false positives at a rate of 1/784,931. A false positive happens in Neutrino filters when there is an item in the filter’s set with the same hash as an item not in the set that is being queried for.
With this in mind, here is how BIP 158 block filters are constructed: First, run all relevant block data through Siphash using the block hash as the key. This ensures determinism while still rotating keys every block. Second, sort these hashes in ascending order. Third, compute the differences between these hashes. Under the assumption that the Siphash function behaves as a hash function should, we expect these differences to be near a geometric distribution. This makes them ideal targets for Golomb-Rice coding which is a technique that optimally compresses geometrically distributed values. I will elaborate on this in a moment but our fourth step is to do this compression. After these four steps we have ourselves a compact block filter that can be passed to a Neutrino light client!
To match data with a block filter, we simply go in reverse. First decode the Golomb-Rice coded set. Then add up the differences to get the hashes. If you are looking to match a specific hash, you don’t even have to decode the whole set as you can just go through the filter sequentially until you either match your hash or go over it (since it is a sorted set). Furthermore if you are looking to match against multiple items, it should take just as long as you can simply sort those items and go through the filter just one time!
BIP 158 Golomb-Rice Coding
Golomb-Rice coding with the parameters specified in BIP 158 goes as follows. You take your number – in this case it is a difference of two hashes – and find its quotient and remainder when divided by 219. That is, find q and r such that the number is equal to q*219 + r. This is actually a really fast computation since our numbers are in binary. This is done by splitting the number into two (q and r) at the 219th place. Thus, that the last 19 bits are the remainder, just like the last 19 digits are the remainder when dividing a base 10 number by 1019. Next, you take q (which will be quite small) and rewrite it in unary, which is fancy jargon for tally marks (e.g. 7 = 1111111), followed by a zero, and then the remainder r. This encoding allows us to write down each hash in sequence without needing to separate them or detail their length. We can just read each number until we hit a 0, and then read 19 more bits.
Let’s review this all with a small example. Say that we have two binary numbers that we want to encode: 011|0110000011010110001 and 100|0011010011010001110 (the vertical bars separate the last 19 bits from the rest of the number).
We re-write the quotient – 011 = 3 – in unary followed by a zero, 1110, followed by the remainder, 0110000011010110001. Then we do the same with the second number, whose quotient is 100 = 4, so that we get 111100011010011010001110 as the encoded number.
Then we simply put these two together for the Golomb-coded set of 11100110000011010110001111100011010011010001110.
That is a lot of 1s and 0s but this is actually much more compact than how data is normally held in a computer which is with leading 0s that pad every number to 64 bits. Meaning that what we originally started with before compression was this space-consuming number:
I think we can all agree that what we have now is much better. And for larger number sets, the gains are even greater. While it is true that filters constructed in this way are still linear in the number of data elements, Golomb-coding is the optimal prefix-code for data in a geometric distribution. In practice, for light clients this is good enough as it reduces any block to a small number of KB.
To decode our set we start reading until we hit our first zero. Then we read the next 19 bits: 1110|0110000011010110001. We then translate from unary by counting the number of 1s and retrieve our first number, 0110110000011010110001. We repeat the same process of going until we hit a zero and then reading 19 more bits. If the set was larger we would continue this process over and over until all elements were decoded.
This is how BIP 158 probabilistic filters are constructed and how they work to match data. In the next post we will discuss how Neutrino clients actually operate using these filters to download as little data as possible while knowing they aren’t missing anything. Stay tuned for the last post in this series where we will discuss our experiences in implementing a Neutrino node.
All of our API services, for both Cryptocurrency APIs as well as Sports APIs, are built using Lightning technology and the Lightning Network. All API services are live on Bitcoin’s mainnet. Our fully customizable data service allows customers to stream as much or as little data as they wish and pay using bitcoin.
You can connect to our Lightning node at the url:
If you are a company or cryptocurrency exchange interested in learning more about how Lightning can help grow your business, contact us at [email protected].