Generating Alpha from Information Arbitrage in the Financial Markets with NLP Datasets: A Rising Tide Lifts All Boats (水涨船高)
On January 3rd, 2019, Bristol-Myers Squibb (BMY) announced that it would acquire Celgene (CELG) for $74 billion. CELG rose overnight from $66.64 to $87.86 per share, a 31.8% gain. Although most professionals missed out on these gains, there were other hidden opportunities to be had, if you knew where and how to find them.
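The quoted gain is simple percentage-change arithmetic on the two prices. A minimal Python check, using only the figures given above:

```python
def pct_change(before: float, after: float) -> float:
    """Return the percentage change from `before` to `after`."""
    return (after - before) / before * 100.0

# CELG's overnight move as quoted in the article: 66.64 -> 87.86
gain = pct_change(66.64, 87.86)
print(f"CELG overnight gain: {gain:.1f}%")  # prints "CELG overnight gain: 31.8%"
```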
In this article, we’ll discuss a group of equities that also rose in value, but not immediately. This group has relatively unknown, indirect relationships to CELG. The delay in their rise was long enough for funds, traders and investors to take positions and then profit. This is an example of information arbitrage in the financial markets. It happens from time to time, and when it does, most people miss the opportunity.
This also relates to the paper “Contagious Speculation and a Cure for Cancer: A Non-Event that Made Stock Prices Soar” by Gur Huberman and Tomer Regev (Journal of Finance, February 2001, Vol. 56, No. 1, pp. 387–396). The paper describes an event involving a company called EntreMed (ticker ENMD at the time):
“A Sunday New York Times article on a potential development of new cancer-curing drugs caused EntreMed’s stock price to rise from 12.063 at the Friday close, to open at 85 and close near 52 on Monday. It closed above 30 in the three following weeks. The enthusiasm spilled over to other biotechnology stocks. The potential breakthrough in cancer research already had been reported, however, in the journal Nature, and in various popular newspapers (including the Times!) more than five months earlier. Thus, enthusiastic public attention induced a permanent rise in share prices, even though no genuinely new information had been presented.”
Among the many insightful observations made by the researchers, one stood out in the conclusion:
“[Price] movements may be concentrated in stocks that have some things in common, but these need not be economic fundamentals.”
Information arbitrage opportunities like this can be captured using advanced techniques from Natural Language Processing and Understanding (NLP/NLU). This includes processing correlation matrix datasets built from sources such as public company profiles, encyclopedias, peer-reviewed scientific literature, news and patents.
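The article does not describe how its correlation matrices are built. Purely as an illustration, the sketch below swaps in a much simpler technique, bag-of-words cosine similarity between short company descriptions, to show what a ticker-by-ticker relationship matrix looks like. The tickers AAA, BBB and CCC and their descriptions are hypothetical, and this is not Vectorspace AI’s method.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphabetic tokens only (a deliberately crude tokenizer).
    return re.findall(r"[a-z]+", text.lower())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-frequency vectors.
    # Indexing a Counter with a missing term returns 0 without inserting it.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def similarity_matrix(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    # Build a symmetric ticker-by-ticker matrix of cosine similarities.
    vectors = {ticker: Counter(tokenize(text)) for ticker, text in docs.items()}
    return {
        t1: {t2: cosine(v1, v2) for t2, v2 in vectors.items()}
        for t1, v1 in vectors.items()
    }

# Hypothetical companies and descriptions, for illustration only.
docs = {
    "AAA": "oncology drug for multiple myeloma",
    "BBB": "myeloma oncology antibody therapy",
    "CCC": "retail store chain selling shoes",
}
matrix = similarity_matrix(docs)
print(round(matrix["AAA"]["BBB"], 3), round(matrix["AAA"]["CCC"], 3))  # prints 0.447 0.0
```

A production pipeline would use far richer representations (for example, learned embeddings over large corpora), but the output shape, one score per pair of symbols, is the same.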
Using an NLP/NLU correlation matrix dataset of US publicly traded equities, we generated a cluster (or basket) of companies related to CELG through symbiotic, parasitic and sympathetic latent entanglement.
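Given a relationship matrix, a basket for one target symbol can be read off that symbol’s row by ranking every other symbol by score. A minimal sketch, assuming the row is held in a plain dict; the tickers and scores below are placeholders, not values from the dataset.

```python
def top_k_basket(scores: dict[str, float], target: str, k: int = 5) -> list[str]:
    """Return the k symbols most strongly related to `target`, highest score first."""
    # Drop the target itself: its self-similarity would otherwise top the list.
    candidates = [(symbol, score) for symbol, score in scores.items() if symbol != target]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [symbol for symbol, _ in candidates[:k]]

# Placeholder row of a relationship matrix (hypothetical tickers and scores).
row = {"TGT": 1.0, "AAA": 0.42, "BBB": 0.91, "CCC": 0.15, "DDD": 0.67, "EEE": 0.33, "FFF": 0.58}
print(top_k_basket(row, "TGT"))  # prints ['BBB', 'DDD', 'FFF', 'AAA', 'EEE']
```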
Among the top five scoring equities, we found opportunities to profit ahead of the market. We’ve produced the following slides with charts to show where gains could have been locked in. Although not every equity rose, the charts show where information arbitrage opportunities existed after the CELG acquisition announcement. AGIO, ADRO and EPZM moved especially strongly in relation to CELG. The first set of charts uses the S&P 500 as its benchmark; the second uses IHE (the iShares U.S. Pharmaceuticals ETF).
Benchmark: S&P 500
[Charts: 1. AGIO, 2. ADRO, 3. SHPG, 4. XLRN, 5. EPZM, each compared with the S&P 500]
Source: PDF
Benchmark: IHE (iShares U.S. Pharmaceuticals ETF)
[Charts: 1. AGIO, 2. ADRO, 3. SHPG, 4. XLRN, 5. EPZM, each compared with IHE]
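Comparing a stock against a benchmark such as the S&P 500 or IHE usually means rebasing both series to a common starting value and comparing total returns over the same window. The original charts are not reproduced here, so the sketch below uses made-up prices; it is an assumption about how such a comparison could be computed, not a description of how the charts were produced.

```python
def rebase(prices: list[float], base: float = 100.0) -> list[float]:
    """Scale a price series so it starts at `base`, putting stock and benchmark on one axis."""
    first = prices[0]
    return [price / first * base for price in prices]

def excess_return(stock: list[float], benchmark: list[float]) -> float:
    """Stock total return minus benchmark total return over the same window, in percent."""
    stock_return = stock[-1] / stock[0] - 1.0
    benchmark_return = benchmark[-1] / benchmark[0] - 1.0
    return (stock_return - benchmark_return) * 100.0

# Hypothetical closing prices over the same trading days (not real AGIO or S&P 500 data).
stock = [20.0, 22.0, 25.0]
benchmark = [100.0, 101.0, 102.0]
print(f"{excess_return(stock, benchmark):.1f}")  # prints 23.0
```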
The dataset used to generate the CELG basket can be queried online, and an API is available on the same page.
The resulting output:
[Figure: output of the CELG basket query]
Conclusion
The basket generated by the dataset included many more equities but, as stated earlier, we compared only the top five scoring symbols: AGIO, ADRO, SHPG, XLRN and EPZM.
We attribute the delayed reaction of AGIO, ADRO and EPZM to a lag in the way the market absorbs information.
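The article does not say how the delay was measured. One common way to look for a lagged response, swapped in here as an assumption, is to correlate the leader’s daily returns with the follower’s returns shifted forward by 0, 1, 2 and so on days, and keep the lag with the strongest correlation. The return series below are synthetic, built so the follower copies the leader two days later.

```python
import math

def pearson(x: list[float], y: list[float]) -> float:
    # Pearson correlation of two equal-length series; 0.0 if either is constant.
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def best_lag(leader: list[float], follower: list[float], max_lag: int) -> tuple[int, float]:
    # Correlate leader[t] with follower[t + lag] for lag = 0..max_lag; keep the strongest.
    best = (0, float("-inf"))
    for lag in range(max_lag + 1):
        x = leader[: len(leader) - lag]
        y = follower[lag:]
        n = min(len(x), len(y))
        corr = pearson(x[:n], y[:n])
        if corr > best[1]:
            best = (lag, corr)
    return best

# Synthetic daily returns: the follower repeats the leader's moves two days later.
leader = [0.0, 0.3, 0.0, 0.0, -0.1, 0.0, 0.2, 0.0, 0.0, 0.05]
follower = [0.0, 0.0] + leader[:-2]
lag, corr = best_lag(leader, follower, max_lag=4)
print(lag, round(corr, 3))  # prints 2 1.0
```

On real prices the correlations are noisy and a strong lagged correlation does not by itself establish that one stock’s move caused the other’s.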
We don’t know how many people missed these trades, but we do know that the move in CELG was nowhere near efficiently priced into AGIO, ADRO and EPZM. That left room for substantial profits for those who uncovered these relationships early.
This means that when news broke that CELG would be acquired for $74 billion, one could have generated a basket of equities with hidden relationships to CELG, taken positions in them ahead of the market and profited from the rising tide created by CELG.
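Taking positions in a basket can be sketched as an equal-weighted portfolio whose return is the average of its members’ simple returns between entry and exit. Equal weighting and the prices below are hypothetical choices for illustration; the article does not state how positions were sized or when they were closed.

```python
def basket_return(entry: dict[str, float], exit_prices: dict[str, float]) -> float:
    """Equal-weighted basket return in percent: the mean of each symbol's simple return."""
    returns = [exit_prices[symbol] / entry[symbol] - 1.0 for symbol in entry]
    return sum(returns) / len(returns) * 100.0

# Hypothetical entry and exit prices for a two-symbol basket.
entry = {"AAA": 10.0, "BBB": 50.0}
exit_prices = {"AAA": 12.0, "BBB": 45.0}
print(f"{basket_return(entry, exit_prices):.1f}%")  # prints 5.0%
```

Equal weighting puts the same capital into each name, so a basket can still be profitable when some members fall, as long as the winners gain more than the losers lose.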
In the near future, companies like Fetch.ai will enable decentralized IoT (Internet of Things) data collection and sharing. Companies like Cindicator and Vectorspace AI stand to gain from information arbitrage opportunities in the crypto marketplace.
As we continue to produce our NLP/NLU-based correlation matrix datasets for traditional public companies and cryptocurrencies, we’ll make them available, along with customized versions, to any holders of our token, VXV. For more information, feel free to reach out to us on Telegram or at vectorspace.ai anytime!