Building on Blockchain part one - from database to Blockchains

21.03.16 Andrew Elmore

Trade is the lifeblood of financial institutions. At C24, we've spent over 15 years helping institutions trade with each other, evolving and adapting to social, regulatory & technological advancements. In this series of blog posts we'll look at how we're helping clients evolve their existing systems to support distributed ledgers and related technologies.

In Part 1, we will discuss the fundamentals of blockchain technology and introduce three key concepts:

  • The Chain
  • Consensus
  • Cryptography and signing

Knowing Where You Stand

In order to trade, you need to understand your current position in terms of the assets you have, and your open trades. 

In the current world order, each party has their own view of the world, typically encapsulated in one or more databases. For internal governance and wider regulatory compliance, it is imperative that where parties trade with each other, each party's view of the transaction is consistent (e.g. Bank A thinks it received $1M from Bank B. If Bank B doesn't think it sent $1M to Bank A: Houston, we have a problem). This process is called reconciliation and is usually surprisingly complex and slow.

Many efforts over the years have focused on simplifying the reconciliation process, usually involving standardization of aspects such as message protocols and identifiers for assets and institutions.

This is, of course, fixing the symptom rather than the cause. Ideally there would be a single record of truth which reflects the actual state of the world and is accepted as gospel by all parties. Why does such a centralized repository not exist?

The main issue is one of trust. Someone would need to own and manage the centralized repository, and while existing cryptographic approaches could be used to identify the originator of each update to the repository and ensure that only specific parties have access to data, there is still the opportunity for the owner to exercise influence (such as blocking a change, or affecting the order in which updates are received).

When we are talking about systems potentially handling transactions involving hundreds of millions of dollars or more, this type of bias is obviously unacceptable.

The second reason is more practical - a single point of failure in such a system is unacceptable - and for performance reasons, there are advantages to each party having a local replica of the store. But ensuring fast and accurate replication of data, while still enforcing security, poses a number of technical problems.

The Fundamentals of a Blockchain 

Distributed ledgers (also known as permissioned blockchains) provide an elegant solution to these issues by removing the need to trust a single entity and creating a framework where all parties can work with a shared replica of the data. Of course, it is still necessary to determine the format of the ledger and define a common set of semantics – but more on that later.

There are 3 key aspects to understanding permissioned blockchains:

  • The Chain
  • Consensus
  • Cryptography & Signing

The first 2 are common to all forms of blockchain (although the definition of consensus varies). We'll walk through each in turn.

The Chain

The chain links together all the updates in our ledger. It's useful to use Bitcoin as an analogy - while its unpermissioned nature means there are fundamental differences to the ledgers we are looking at, it remains the most widely used blockchain implementation.

A series of Bitcoin transactions are collected together into a block. This block is then linked to the last block in the chain, itself becoming the new last block in the chain. This link validates 2 important properties:

  1. The content of the new block. If the content changes, the link will show it as invalid.
  2. The previous block. If the previous block is changed, the link will show it as invalid.

In essence we are providing a tamper-evident seal. 

It is implemented using a mathematical function called a hash. Such a function takes a message of any size and reduces it to a fixed-size identifier (the hash or digest). There are several important properties of a good hash function:

  • It must be deterministic. Given the same input, the hash function must always generate the same hash value.
  • It must be collision-resistant. This means that it is extremely unlikely that 2 different messages will generate the same hash value. More generally, we desire it to demonstrate uniform behaviour, where all possible hash values are equally likely given a large enough range of inputs.
  • Given a hash value, it must be extremely difficult to construct a message which hashes to that value. In practice this also means that small changes to the input data have a large effect on its hash value.
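These properties are easy to observe with a real hash function. The snippet below is a minimal sketch using SHA-256 from Python's standard library; the choice of SHA-256 and the example messages are assumptions made for illustration, not something the ledger design prescribes.

```python
import hashlib


def digest(message: str) -> str:
    """Return the SHA-256 digest of a message as a hex string."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


# Illustrative messages (assumed for this example).
a = digest("Bank B pays Bank A $1,000,000")
b = digest("Bank B pays Bank A $1,000,000")
c = digest("Bank B pays Bank A $1,000,001")

print(a == b)          # True: the same input always gives the same digest
print(len(a), len(c))  # 64 64: the digest size is fixed, whatever the input
print(a == c)          # False: a one-character change gives a different digest
```

Printing a and c side by side shows that the two digests have little in common, even though the messages differ by a single character.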

Each block in the chain actually comprises the content (our transactions) and a hash value. The hash value is computed by feeding both the hash value of the previous block and our content into the hash function. We can validate that no-one has tampered with the content in a block by feeding the hash of the previous block and the current content into our hash function; the resultant value should match the one stored in the block. By starting at the beginning of the chain and working our way through it checking the hashes, we can be sure that the content has not been changed and that the blocks are in the correct order.
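The linking and validation just described can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a real ledger format: the Block class, the helper names, the all-zero placeholder used as the "previous hash" of the first block and the use of SHA-256 are all hypothetical choices.

```python
import hashlib
from dataclasses import dataclass

# Assumed placeholder standing in for the "previous hash" of the first block.
GENESIS_HASH = "0" * 64


@dataclass
class Block:
    content: str  # our transactions, kept as plain text for simplicity
    hash: str     # hash of (previous block's hash + this block's content)


def block_hash(previous_hash: str, content: str) -> str:
    """Feed the previous block's hash and the content into the hash function."""
    return hashlib.sha256((previous_hash + content).encode("utf-8")).hexdigest()


def append(chain: list, content: str) -> None:
    """Link a new block to the current last block in the chain."""
    previous_hash = chain[-1].hash if chain else GENESIS_HASH
    chain.append(Block(content, block_hash(previous_hash, content)))


def is_valid(chain: list) -> bool:
    """Walk the chain from the start, recomputing and checking every hash."""
    previous_hash = GENESIS_HASH
    for block in chain:
        if block.hash != block_hash(previous_hash, block.content):
            return False
        previous_hash = block.hash
    return True


chain = []
for transaction in ["A pays B 10", "B pays C 4", "C pays A 1"]:
    append(chain, transaction)

print(is_valid(chain))            # True
chain[1].content = "B pays C 400"  # tamper with the middle block
print(is_valid(chain))            # False: the stored hash no longer matches
```

Note that nothing here stops an attacker from recomputing every later hash after tampering; the sketch only shows why tampering is evident when the hashes are left alone.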

It's worth noting that the number of transactions in a block need neither be fixed nor more than 1 - for our distributed ledger there may be scenarios where a single transaction per block makes most sense.

Of course, we need a mechanism to stop someone changing the chain and simply recalculating all the hashes. We need consensus that what we're looking at is the 'real' chain.

Consensus

In the absence of a single entity which designates a version of the chain as both definitive and correct (which would bring us back to our single, hopefully benevolent, controlling entity), we need a mechanism for the actors in our ecosystem to reach agreement on the current definitive version of the chain.

Bitcoin's consensus algorithm is very simple - longest chain wins. For this to work, the process of adding blocks to the chain needs to be difficult, else a nefarious party could simply generate a very long chain and fraudulently construct the definitive ledger. Bitcoin achieves this by adding an extra, very computationally expensive, step to the hash generation process. Although generating the hash is expensive, validating it remains very cheap. Those costs mount up for a hacker attempting to rewrite history - if the cost to generate a single block is N units, tampering with the block 5th from the end will require 5*N units (the interlocked nature of the hashes means that all subsequent hashes will need to be recalculated). Given that the longest chain wins, the hacker will need to perform all of that work before the rest of the network adds another block. For this to happen it is estimated that the attacker would need to control at least 30% of the compute power in the network.
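The asymmetry between expensive generation and cheap validation can be illustrated with a simplified proof-of-work sketch. It is written in the spirit of Bitcoin's approach but is not its actual block format or difficulty rule: the "leading zero hex digits" target, the DIFFICULTY value, the "|" separator and the function names are all assumptions chosen to keep the example small.

```python
import hashlib

# Assumed difficulty: the hash must start with this many zero hex digits.
# Each extra digit makes mining roughly 16 times more expensive.
DIFFICULTY = 4
TARGET_PREFIX = "0" * DIFFICULTY


def pow_hash(previous_hash: str, content: str, nonce: int) -> str:
    """Hash the previous hash, the content and a nonce together."""
    data = f"{previous_hash}|{content}|{nonce}".encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def mine(previous_hash: str, content: str) -> int:
    """Expensive step: try nonces until the hash meets the target."""
    nonce = 0
    while not pow_hash(previous_hash, content, nonce).startswith(TARGET_PREFIX):
        nonce += 1
    return nonce


def verify(previous_hash: str, content: str, nonce: int) -> bool:
    """Cheap step: a single hash computation confirms the work was done."""
    return pow_hash(previous_hash, content, nonce).startswith(TARGET_PREFIX)


previous = "0" * 64  # assumed placeholder for the previous block's hash
nonce = mine(previous, "A pays B 10")
print(nonce, verify(previous, "A pays B 10", nonce))
```

With these settings, mining takes tens of thousands of hash attempts on average while verification takes exactly one, which is the property that makes rewriting several blocks costly.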

To put that in perspective, it is estimated that the power consumed in maintaining Bitcoin is around one gigawatt - the average power generated across the whole of Great Britain over the last year was around 32GW.

Clearly requiring such large amounts of energy to run our distributed ledger is undesirable. The reason Bitcoin requires such an approach is down to 2 of its founding goals:

  • Anonymity. Participants should be able to trade Bitcoin without revealing their identity to everyone else.
  • Uncensored access (or unpermissioned). Anyone can trade in Bitcoin without needing to seek permission to do so.

Neither of these goals is applicable to our distributed ledger - it is essential that we know who is contributing to our chain and, in our regulated environment, we actively want to restrict the parties which can participate. As a result, we can use a simpler algorithm. For example, we would accept a new chain as definitive over our existing chain if:

  1. It is an extension of our existing chain

      or

  2. It is not a subset of our existing chain, but more actors have declared it definitive than have declared the chain that we currently believe to be definitive.

If we accept a new chain as definitive, we add our declaration to it that we believe it definitive and distribute that back to the network. Ultimately all parties will agree on the definitive chain.
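The acceptance rule above can be sketched as a small decision function. Several things here are assumptions for illustration: a chain is modelled as a list of block hashes, declarations as a set of actor names, and "not a subset" is interpreted as "not merely an earlier portion (prefix) of our chain". The function names are hypothetical.

```python
def is_extension(candidate: list, current: list) -> bool:
    """True if candidate contains all of current, in order, plus at least one more block."""
    return len(candidate) > len(current) and candidate[:len(current)] == current


def is_prefix(candidate: list, current: list) -> bool:
    """True if candidate is just an earlier portion (or an exact copy) of current."""
    return candidate == current[:len(candidate)]


def accept(candidate: list, candidate_supporters: set,
           current: list, current_supporters: set) -> bool:
    """Decide whether to adopt candidate as the definitive chain."""
    # Rule 1: the candidate simply extends the chain we already hold.
    if is_extension(candidate, current):
        return True
    # Rule 2: a genuinely different chain with more declared support than ours.
    if not is_prefix(candidate, current):
        return len(candidate_supporters) > len(current_supporters)
    return False


ours = ["h1", "h2"]
print(accept(["h1", "h2", "h3"], set(), ours, {"A", "B"}))       # True: extension
print(accept(["h1", "x2"], {"C", "D", "E"}, ours, {"A", "B"}))   # True: more support
print(accept(["h1", "x2"], {"C"}, ours, {"A", "B"}))             # False: less support
```

A real implementation would also need to validate the hashes and verify each declaration before counting it; those checks are left out here.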

There is an edge case where there are 2 distinct chains in the network, each signed by 50% of the actors. In this case, we could compare the list of declarations in one chain against the list of declarations in the other and define an ordering - we'll look at a fair way to do that shortly.

This process of giving precedence to the version of the chain with the majority of participants' support is what provides trust - so long as more than 50% of the actors are behaving honestly (or at least not collaborating together to defraud the remaining parties) it is not possible to compromise the chain.

Cryptography

Which brings us to the final piece in the puzzle - how do I prove that an entry in the ledger was created by me, and how do I verify that the declarations in the chain are authentic? Both problems have a similar solution.

Unlike symmetric cryptography (where the same key is used to both encrypt and decrypt the data), asymmetric cryptography works with key pairs. Each key pair has the following important properties:

  • Data encrypted with one key in the pair can only be decrypted with the other (and vice versa)
  • Knowing one key does not allow you to determine the other

The keys in this key pair are frequently designated as public key & private key. I keep the private key, and distribute the public key to everyone else with my name attached to it. If someone wants to send something to me privately, they encrypt it with my public key - I am the only person who can decrypt it as I am the only one with my private key.

There is another interesting use case here though - if I encrypt something with my private key, anyone with my public key can read it and know that it came from me; the process of decrypting it proves that it was encrypted with my private key and, as I'm the only person that has that key, I must have encrypted it.

Encrypting data with a private key is the equivalent of signing a document; it proves that it came from me. We can use this in our blockchain as part of the declaration process. Whenever I wish to 'sign' a block in the chain, I take the block's hash and add a copy of it, encrypted with my private key, to the block's signatures section. Recipients of the block can prove that I declared support for this chain by decrypting the value with my public key, and comparing it to the stored hash for the block. If they match, they know that I authenticated the block.
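To make the signing step concrete, here is a toy sketch using textbook RSA in pure Python. It is deliberately insecure and for illustration only: the key is built from two small, publicly known primes, there is no padding, and the hash is simply reduced to fit the key size. All names are hypothetical, and a real system would use a vetted cryptographic library instead.

```python
import hashlib

# Toy key pair built from two well-known Mersenne primes (assumed for the
# example). Far too small for real use, and textbook RSA without padding
# is not secure.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
PUBLIC_EXPONENT = 65537
# Modular inverse of the public exponent (requires Python 3.8+).
PRIVATE_EXPONENT = pow(PUBLIC_EXPONENT, -1, (P - 1) * (Q - 1))


def hash_as_int(block_hash: str) -> int:
    """Reduce a hex block hash to an integer smaller than N (toy shortcut)."""
    return int(block_hash, 16) % N


def sign(block_hash: str) -> int:
    """'Encrypt' the block's hash with the private key."""
    return pow(hash_as_int(block_hash), PRIVATE_EXPONENT, N)


def verify(block_hash: str, signature: int) -> bool:
    """'Decrypt' with the public key and compare to the stored hash."""
    return pow(signature, PUBLIC_EXPONENT, N) == hash_as_int(block_hash)


stored_hash = hashlib.sha256(b"previous hash + content").hexdigest()
signature = sign(stored_hash)

print(verify(stored_hash, signature))  # True: the declaration is authentic

other_hash = hashlib.sha256(b"a different block").hexdigest()
print(verify(other_hash, signature))   # False: the signature belongs to another block
```

Only the holder of PRIVATE_EXPONENT can produce a value that passes verify, while anyone holding N and PUBLIC_EXPONENT can run the check.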

These encrypted hashes also provide a way to resolve the deadlock between different chains with equal levels of support discussed above. We can create an arbitrary mechanism to order the encrypted hashes (for example a simple lexicographical ordering of their base-64 encoding) - because our hash function has a uniform distribution, for any arbitrary pair of encrypted hashes each has an equal probability of being ranked above the other.
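One possible shape for such a tie-break is sketched below. The text only says that an ordering over the encrypted hashes can be defined; comparing the lexicographically smallest base-64 encoding from each chain is an assumption made for this example, as are the function names and the sample signature bytes.

```python
import base64


def tie_break_key(signatures: list) -> str:
    """Smallest base-64 encoding among a chain's signatures (assumed rule)."""
    return min(base64.b64encode(s).decode("ascii") for s in signatures)


def preferred(chain_a_signatures: list, chain_b_signatures: list) -> str:
    """Pick the chain whose tie-break key sorts first; every party gets the same answer."""
    key_a = tie_break_key(chain_a_signatures)
    key_b = tie_break_key(chain_b_signatures)
    return "A" if key_a < key_b else "B"


# Made-up signature bytes standing in for encrypted hashes.
chain_a = [b"\x01\x02", b"\xff"]
chain_b = [b"\x00\x10"]

print(preferred(chain_a, chain_b))
```

Because the rule depends only on data that every party already holds, each party reaches the same decision independently, without any further communication.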

So now we have introduced the three key concepts which underpin blockchain technology - the chain, consensus, and cryptography. In Part Two, we will look at the anatomy of a distributed ledger and how it can be adapted to meet the requirements for financial services use cases.