Semantic Fuzzy Matching Improves Processing Time | Gresham
In the investment management industry, speed and efficiency are key to success. The faster firms can process large amounts of data, the quicker they can expand their businesses. Often, the critical part of growth lies in post-trade processing, where settling trades, investigating trade breaks or exceptions, and reconciling transactions must be as efficient and accurate as possible.
Reconciliation solutions help firms improve efficiency, but over time, the completeness of data has failed to keep up with growing transaction volumes. As a result, trade and settlement data does not always match, making reconciling transactions a daunting task. This is where fuzzy matching can help, particularly for asset classes lacking standard security identifiers such as OTC derivatives, loans and commodities.
But not all fuzzy matching algorithms are created equal. The crucial difference is whether they use semantics, which recognizes the meaning of words when pairing transactions and supporting subsequent reconciliation activities, or syntax, which examines only characters and sentence structure and often leads to errors in meaning and longer processing times.
Therefore, the choice a firm makes for fuzzy matching can determine success or failure in its ability to quickly and accurately investigate trade breaks or exceptions, reduce risk and accelerate the entire post-trade process.
The Wild World of Identifiers
The lack of a single global identifier to uniquely identify both sides of a transaction for easy pairing has caused fuzzy matching to rise in popularity and become part of the reconciliation process. In fact, a whole industry exists to help firms support reconciliation by handling independent data coming from multiple systems in different formats. However, the syntactic approach that most solutions use to compare financial instruments ends up compromising speed and efficiency.
Because no single global identifier exists and the two sides of a transaction use many different identifiers, this process can be quite cumbersome. Adding to the difficulty, some identifiers that were previously free, such as CUSIP, are now subject to license fees. Sometimes a manager or custodian invents an identifier for convenience, whether or not it is registered. In cases like these, such as in the OTC derivatives market, free-form text descriptions are the only identifying information available. As a result, the identifier a firm uses for a financial instrument may have no matching value in the custodian's data, which reflects the custodian's own understanding of the position.
Automated Matching Techniques
Automated matching systems use various techniques to deal with this challenge, including “memorizing” a pair of security identifiers or descriptions so that subsequent rounds can identify and match the data automatically. The goal of these systems is to reduce the number of exceptions that must be investigated manually. Most systems achieve match rates in the high 90 percent range.
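The “memorizing” technique can be pictured as a lookup of previously confirmed pairs that is consulted before any fuzzy logic runs. The sketch below assumes an in-memory dictionary and a pluggable fallback matcher; `MemorizedMatcher`, `confirm` and the identifier `MGR-0001` are hypothetical names for illustration, and a production system would persist confirmed pairs and manage approvals in its own way.

```python
from typing import Callable, Dict, Optional


class MemorizedMatcher:
    """Hypothetical matcher that remembers user-confirmed pairs."""

    def __init__(self, fallback: Callable[[str], Optional[str]]):
        # custodian description -> manager identifier, confirmed in earlier rounds
        self.confirmed: Dict[str, str] = {}
        # matcher used for anything not yet memorized; returns None if unsure
        self.fallback = fallback

    def confirm(self, custodian_desc: str, manager_id: str) -> None:
        """Record an approved pair so later rounds match it automatically."""
        self.confirmed[custodian_desc] = manager_id

    def match(self, custodian_desc: str) -> Optional[str]:
        # 1. A previously confirmed pair wins outright; no scoring needed.
        if custodian_desc in self.confirmed:
            return self.confirmed[custodian_desc]
        # 2. Otherwise defer to the fallback; None means "raise an exception".
        return self.fallback(custodian_desc)


# With no fallback logic, the first round produces an exception...
matcher = MemorizedMatcher(fallback=lambda desc: None)
first_round = matcher.match("SAMS CLUB 3 3/4")
# ...an analyst confirms the pair, and the next round matches automatically.
matcher.confirm("SAMS CLUB 3 3/4", "MGR-0001")
second_round = matcher.match("SAMS CLUB 3 3/4")
print(first_round, second_round)  # None MGR-0001
```

Memorization only helps with descriptions that have been seen and confirmed before, which is why new or free-form descriptions still fall through to the fuzzy matcher.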
However, many of these systems perform poorly on derivatives and other asset classes for which security identifiers do not exist, forcing firms to rely on a security description alone. As a result, match rates drop steeply, because these systems do not use descriptions accurately in their matching algorithms. What causes these inaccurate matches? Syntax-based fuzzy matching.
Syntactic versus Semantic Matching
Systems that use syntax to match security descriptions have one crucial flaw: lack of understanding. They look only at characters and the order in which they appear. In general, fuzzy matching allows strings to match inexactly and assigns each candidate match a score. The lower the score, the better the match. In a syntactic engine, the score is based on the character-level difference between the two words. For example:
- Sally fuzzy matching to Sally gives a score of 0 because they are equal.
- Sally fuzzy matching to Silly gives a score of 1 because they differ by one character.
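The article does not name the metric behind these scores, but they are consistent with Levenshtein edit distance: the minimum number of single-character insertions, deletions or substitutions needed to turn one string into the other. A minimal sketch, assuming that metric:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn `a` into `b` (lower = closer)."""
    # prev[j] holds the distance between the prefix of `a` processed so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters are equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("Sally", "Sally"))  # 0
print(levenshtein("Sally", "Silly"))  # 1
```

Nothing in this function knows what the characters mean, which is exactly the limitation described next.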
The problem with this kind of matching is that it is syntactic and not semantic – meaning that it does not take actual meaning into account; it just looks at characters. Clearly, this is not how humans resolve ambiguity; we use our understanding and experience to apply context to words, not individual characters.
Comparing Fuzzy Matching Techniques
To further illustrate the difference between syntactic and semantic fuzzy matching, we must look at the problems with syntax and their impact on outcomes.
Problem #1: Matching Accuracy
Imagine working with the following data:
Custodian Security Identifiers | Manager Security Identifiers
Sams Club 3.75 | Sams Club 3 3/4
Sams Club 3.375 | Sams Club 3 3/4
In this example, a syntactic engine misinterprets the data and gives you a badly wrong result. These numbers might be the yield on a bond or some similar piece of information embedded in the security identifier. Here, you would be choosing the wrong bond, because the engine does not understand that 3 3/4 is a fraction equal to the decimal 3.75.
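To see how a character-based engine can land on the wrong bond, the sketch below scores the manager's description against both custodian descriptions. It assumes Levenshtein edit distance as the syntactic score; the engines the article has in mind may use a different metric.

```python
def levenshtein(a: str, b: str) -> int:
    # Standard dynamic-programming edit distance (insert/delete/substitute = 1).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


manager = "Sams Club 3 3/4"
custodian = ["Sams Club 3.75", "Sams Club 3.375"]

# Score every custodian description purely by characters.
scores = {c: levenshtein(manager, c) for c in custodian}
print(scores)  # {'Sams Club 3.75': 4, 'Sams Club 3.375': 3}

# The lowest score wins, so the syntactic engine picks the wrong bond.
best = min(custodian, key=lambda c: scores[c])
print(best)    # Sams Club 3.375
```

"3 3/4" and "3.375" happen to share a "3" in the right place, so they look closer character by character than "3 3/4" and "3.75", even though the latter pair means the same number.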
However, a fuzzy matching tool that uses semantics interprets the identifier before trying to match it. That means it is able to understand that 3 3/4 is the same as 3.75, and therefore produces the correct answer and score.
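A semantic engine interprets each description before comparing it. The sketch below is a deliberately narrow, hypothetical normalizer: it understands only a trailing decimal, mixed fraction or plain fraction, and the `interpret` and `semantic_match` helpers are illustrative names, not the article's engine. It uses Python's `fractions.Fraction` so that 3 3/4 and 3.75 compare as exactly equal rather than approximately.

```python
import re
from fractions import Fraction
from typing import List, Optional, Tuple

# Hypothetical rule: a description may end in a decimal ("3.75"),
# a mixed fraction ("3 3/4") or a plain fraction ("3/4").
_TRAILING_NUMBER = re.compile(r"(\d+(?:\.\d+)?|\d+ \d+/\d+|\d+/\d+)$")


def interpret(desc: str) -> Tuple[str, Optional[Fraction]]:
    """Split a description into a normalized name and an exact numeric value."""
    text = desc.strip()
    m = _TRAILING_NUMBER.search(text)
    if m is None:
        return text.upper(), None
    number = m.group(1)
    if " " in number:                      # mixed fraction such as "3 3/4"
        whole, frac = number.split(" ")
        value = int(whole) + Fraction(frac)
    else:                                  # "3.75" or "3/4"
        value = Fraction(number)
    name = text[: m.start()].strip().upper()
    return name, value


def semantic_match(desc: str, candidates: List[str]) -> Optional[str]:
    """Return the first candidate whose interpretation equals that of `desc`."""
    target = interpret(desc)
    for candidate in candidates:
        if interpret(candidate) == target:
            return candidate
    return None


print(interpret("Sams Club 3 3/4"))   # ('SAMS CLUB', Fraction(15, 4))
print(interpret("Sams Club 3.375"))   # ('SAMS CLUB', Fraction(27, 8))
print(semantic_match("Sams Club 3 3/4", ["Sams Club 3.375", "Sams Club 3.75"]))
# Sams Club 3.75
```

Once both sides are reduced to the same meaning, the correct bond is an exact match and the misleading character overlap with 3.375 no longer matters.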
Problem #2: Matching Speed
If you look only at characters and have no meaning to work with, you need to compare every item in one list with every item in the other. With large data sets, this process becomes much more difficult, more time-consuming and, eventually, nearly impossible.
Here is a comparison of the time it takes to do matching on small to large sets of data using semantic versus syntactic matching:
Figure 1: Comparison of run times (in seconds) between fuzzy match using syntactic vs. semantic techniques.
As Figure 1 shows, with an industrial-sized collection of data, syntactic matching can take days rather than hours!
How Fuzzy Matching Works
To start, you have two lists (A and B) whose elements must be compared with each other. Every element of A is scored against every element of B; when you are done, you take the lowest score for each element and call that pair its best match.
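The procedure just described is the naive all-pairs approach. A minimal sketch, assuming Levenshtein distance as the score and a hypothetical `best_matches` helper; the inner loop runs once for every element of A, so the number of comparisons is len(A) × len(B).

```python
from typing import Callable, Dict, List, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def best_matches(
    list_a: List[str], list_b: List[str], score: Callable[[str, str], int]
) -> Tuple[Dict[str, Tuple[Optional[str], Optional[int]]], int]:
    """Naive all-pairs matching: score every element of A against every element
    of B and keep the lowest-scoring partner for each element of A."""
    result = {}
    comparisons = 0
    for a in list_a:                    # one full pass over B per element of A
        best_b, best_score = None, None
        for b in list_b:
            s = score(a, b)
            comparisons += 1
            if best_score is None or s < best_score:
                best_b, best_score = b, s
        result[a] = (best_b, best_score)
    return result, comparisons


pairs, comparisons = best_matches(["Sally", "Bob"], ["Silly", "Rob", "Sal"], levenshtein)
print(pairs)        # {'Sally': ('Silly', 1), 'Bob': ('Rob', 1)}
print(comparisons)  # 6, i.e. 2 x 3
```

With two lists of 100,000 elements, the same loop would perform 10,000,000,000 comparisons.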
Now, let’s estimate how long that would take. If each list has 100,000 elements and each comparison takes 0.1 milliseconds, the run would take 1,000,000 seconds, or about 12 days (time = 100,000 elements in List A multiplied by 100,000 elements in List B, multiplied by 0.1 milliseconds per element pair). Twelve days is unacceptable when the entire process must be completed within minutes or even seconds. It takes so long because the system must make 100,000 passes over the data, one full pass over List B for each element of List A.
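Written out, the estimate in that paragraph is the following, where the symbols N_A and N_B (list sizes) and c (cost per comparison) are introduced here only for illustration:

```latex
t = N_A \times N_B \times c
  = 10^{5} \times 10^{5} \times 0.1\,\text{ms}
  = 10^{10} \times 10^{-4}\,\text{s}
  = 10^{6}\,\text{s}
  \approx \frac{10^{6}}{86\,400}\,\text{days}
  \approx 11.6\,\text{days}
```

Because t grows with the product N_A × N_B, doubling both lists quadruples the run time.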
Alternatively, a system that understands that there are very few correct interpretations of the data on each side can go directly to the best match, requiring only three passes over the data, which take just 30 seconds in total rather than 12 days (time = 3 elements multiplied by 100,000 elements, multiplied by 0.1 milliseconds per element pair).
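The article does not say how its engine narrows each element down to a handful of candidates. One common technique with a similar effect is blocking, borrowed from record linkage: index one list by a key derived from the data, then score each element only against the entries that share its key. The sketch below is a stand-in under that assumption, not the article's method; `issuer_key` (text before the first digit) and `blocked_best_matches` are hypothetical.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def issuer_key(desc: str) -> str:
    """Hypothetical blocking key: the text before the first digit, upper-cased."""
    return re.split(r"\d", desc, maxsplit=1)[0].strip().upper()


def blocked_best_matches(
    list_a: List[str],
    list_b: List[str],
    key: Callable[[str], Hashable],
    score: Callable[[str, str], int],
) -> Tuple[Dict[str, Tuple[str, int]], int]:
    """Blocking: index B by key once, then score each element of A only
    against the B elements that share its key."""
    index = defaultdict(list)
    for b in list_b:                        # single pass to build the index
        index[key(b)].append(b)
    result = {}
    comparisons = 0
    for a in list_a:                        # single pass over A
        candidates = index.get(key(a), [])
        if not candidates:
            continue                        # no plausible partner: leave as an exception
        comparisons += len(candidates)
        result[a] = min(((b, score(a, b)) for b in candidates), key=lambda p: p[1])
    return result, comparisons


manager = ["Sams Club 3.75", "Acme Corp 5.0"]
custodian = ["Acme Corp 5.0", "Sams Club 3.75", "Sams Club 3.375", "Zeta Inc 2.5"]
matches, comparisons = blocked_best_matches(manager, custodian, issuer_key, levenshtein)
print(matches)
print(comparisons)  # 3, versus 2 x 4 = 8 for the all-pairs loop
```

The number of comparisons now grows with the size of each list times the few candidates per key, rather than with the product of the list sizes. The trade-off is that a poor key (for example "Sam's Club" on one side) hides the true partner entirely, so the quality of the interpretation step decides both speed and accuracy.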
How does it do this? Semantics. The system understands the context of the data it must match, which both speeds up the entire matching process and increases accuracy. To test this, we took the entire English-language dictionary and fuzzy matched it against itself, with 84,000 elements on each side. The total run took just seven seconds.
When it comes to fuzzy matching, knowledge and understanding make all the difference. By working with a solution provider with deep expertise in the financial services industry, your firm will have access to a purpose-built, semantics-based fuzzy match engine that is faster and more accurate than solutions built on off-the-shelf tools. Such an engine can also match within asset classes that lack reliable security identifiers, process large volumes, and scale with growth. In the buy-side investment management industry, the value of accurate and rapidly delivered insights on positions, transactions and cash balances is immeasurable.
# # #
Fuzzy matching is now available in the latest version of Electra Reconciliation.
Scott Rhodes has been chief operating officer (COO) of Electra since 2014 and previously served more than 10 years on the company’s board of directors. Before joining Electra, Scott was a managing director at HedgeServ, president and founder of internet video company Veotag, and held executive positions at Multex Systems and Information Builders.