Predicting Future Correlations between Equities

Vector Space Biosciences
Nov 23, 2018

Cleave, a verb, has two very different meanings. It can describe cutting or splitting something apart with a sharp instrument, or, oddly enough, sticking to something like glue.

Context matters.

Teacher: Tell me a sentence that starts with an “I”.

Student: I is the…

Teacher: Stop! Never put ‘is’ after an “I”. Always put ‘am’ after an “I”.

Student: OK. I am the ninth letter of the alphabet.

This is why, as it’s often said, “context is everything”.

With proper context control, Artificial Intelligence (AI) and Machine Learning (ML) systems can minimize loss or increase signal, gain, alpha, precision, and accuracy in different ways.

For example, controlling the context during summarization can produce different interpretations of the same text, as shown here.

Context-controlled, on-demand datasets can serve customers in any industry that benefits from AI or ML applications. Here’s a useful ‘Intro to Data Science for Managers’ for anyone who needs a primer on how AI and ML algorithms underpin just about every successful company or research effort today.

These datasets lead to products like Smart Baskets: algorithmically generated clusters of companies that share a theme or have hidden relationships with one another. Smart Baskets are built for ‘information arbitrage’, such as analyzing the effect of global trends on publicly traded companies. The same datasets can also lead to research breakthroughs, using methods similar to the one presented here https://youtu.be/ed2FWNWwE3I?t=2375 [minute 39:35], through advanced NLP/NLU (Natural Language Processing/Understanding).
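As a rough illustration of how "algorithmically generated clusters" can fall out of a correlation matrix, the Python sketch below links any two companies whose correlation clears a threshold and treats each connected group as a basket. This is an assumption, not Vector Space's published method: the function name `smart_baskets`, the tickers, the correlation values, and the 0.5 threshold are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical pairwise correlations between tickers (made-up numbers for illustration).
TICKERS = ["NVDA", "AMD", "MSFT", "XOM", "CVX"]
CORR: Dict[Tuple[str, str], float] = {
    ("NVDA", "AMD"): 0.82,
    ("AMD", "MSFT"): 0.61,
    ("NVDA", "XOM"): 0.05,
    ("XOM", "CVX"): 0.77,
    ("MSFT", "CVX"): 0.10,
}


def smart_baskets(
    tickers: List[str], corr: Dict[Tuple[str, str], float], threshold: float = 0.5
) -> List[List[str]]:
    """Hypothetical helper: any two tickers linked by a chain of pairs with
    correlation >= threshold end up in the same basket (connected components)."""
    parent = {t: t for t in tickers}  # union-find forest: each ticker starts alone

    def find(t: str) -> str:
        # Walk up to the root, halving the path as we go.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    # Merge the baskets of every strongly correlated pair.
    for (a, b), value in corr.items():
        if value >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    # Collect members by root, preserving the original ticker order.
    baskets: Dict[str, List[str]] = {}
    for t in tickers:
        baskets.setdefault(find(t), []).append(t)
    return list(baskets.values())


if __name__ == "__main__":
    print(smart_baskets(TICKERS, CORR))       # [['NVDA', 'AMD', 'MSFT'], ['XOM', 'CVX']]
    print(smart_baskets(TICKERS, CORR, 0.7))  # [['NVDA', 'AMD'], ['MSFT'], ['XOM', 'CVX']]
```

Because membership follows chains of links, this is single-linkage style grouping: MSFT joins NVDA's basket through AMD even though no NVDA/MSFT pair is listed. A stricter rule, such as requiring every pair in a basket to clear the threshold, would produce tighter, smaller baskets.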

Some investment firms are hiring for the emerging role of Head of Data. This still under-the-radar specialist is not necessarily technical, but understands trading. These specialists scour technical trade shows and private companies for minable pockets of information, and private firms are more than willing to provide feeds of data they already own and store in exchange for additional revenue. More recently, regulatory scrutiny of equity markets, particularly of dark pools, has pushed some algorithmic traders to seek an advantage in other ways. “It may possibly be the next frontier for funds looking for an investment edge”, where the new norm is a “continuous information arbitrage” (CME Group, p. 17: https://www.cmegroup.com/education/files/big-data-investment-management-the-potential-to-quantify-traditionally-qualitative-factors.pdf).

Many data interpretation operations have a single end goal: to leverage the human language surrounding entities on the Internet to predict future correlations (or future price correlations) between them. For example, correlations in human language can exist between stocks and other stocks, between global events and stocks, or between human DNA repair genes and pharmaceuticals in the context of space biosciences research. This process starts with advanced NLP and NLU.

Here’s the import:

NLP and NLU correlations can be used to predict future price correlations or correlations within a certain context.
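To make "correlations in human language" concrete, here is a minimal Python sketch built on simplifying assumptions; it is not Vector Space's method. Each entity is represented by how often it is mentioned in each document, and two entities count as correlated when the Pearson correlation of those mention counts is high. The optional `context` argument, which simply keeps documents containing a given phrase, is a hypothetical stand-in for context control. The function names and the toy headlines are invented for illustration.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Optional

Matrix = Dict[str, Dict[str, float]]


def mention_vectors(docs: List[str], entities: List[str]) -> Dict[str, List[int]]:
    """For each entity (a single lowercase word), count its mentions in every document."""
    counts = [Counter(re.findall(r"[a-z0-9]+", d.lower())) for d in docs]
    return {e: [c[e] for c in counts] for e in entities}


def pearson(x: List[int], y: List[int]) -> float:
    """Pearson correlation; 0.0 when undefined (fewer than 2 docs or a constant vector)."""
    n = len(x)
    if n < 2:
        return 0.0
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def correlation_matrix(docs: List[str], entities: List[str],
                       context: Optional[str] = None) -> Matrix:
    """Entity-by-entity correlation of mention counts across documents.

    If `context` is given, only documents containing that phrase are kept:
    a crude, hypothetical stand-in for context control.
    """
    if context is not None:
        docs = [d for d in docs if context.lower() in d.lower()]
    vecs = mention_vectors(docs, entities)
    return {a: {b: pearson(vecs[a], vecs[b]) for b in entities} for a in entities}


# Toy headlines invented for illustration; not real news or market data.
DOCS = [
    "Nvidia and AMD rally on artificial intelligence demand",
    "Bitcoin slips while Nvidia gains on artificial intelligence chips",
    "AMD and Nvidia unveil new artificial intelligence accelerators",
    "Bitcoin and Ethereum rebound as blockchain activity grows",
    "Ethereum upgrade boosts blockchain developers; Bitcoin steady",
]
ENTITIES = ["nvidia", "amd", "bitcoin", "ethereum"]

if __name__ == "__main__":
    general = correlation_matrix(DOCS, ENTITIES)
    print(round(general["nvidia"]["amd"], 3))      # 0.667
    print(round(general["nvidia"]["bitcoin"], 3))  # -0.667
    in_ai = correlation_matrix(DOCS, ENTITIES, context="artificial intelligence")
    print(round(in_ai["amd"]["bitcoin"], 3))       # -1.0
```

Restricting the corpus to "artificial intelligence" headlines changes which pairs look related, which is the kind of effect the context-controlled examples below illustrate. A real system would likely use richer text representations than raw mention counts.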

Below are five correlation matrix dataset examples and demos, which can be found here and here.

Example 1. First, a dataset without context control, based on general correlations in human language surrounding S&P stocks and cryptocurrencies.

Example 2. The same dataset, but with correlations calculated in the context of “Artificial Intelligence”.

Example 3. The same dataset, with correlations calculated in the context of “Blockchain”.

Example 4. Cryptos correlated to cryptos based on an analysis of whitepapers.

Example 5. A context-controllable relationship network visualization based on a correlation matrix built using entities from the sci-fi book and TV series The Expanse.

More on these datasets and how they were generated, including download links to all of them, can be found here. Stay tuned for Part 2, where we’ll work with life sciences data specifically related to pharmaceuticals.

Reach us on Telegram or at vectorspace.ai for more information.

