State of the art on Bitcoin trading prediction

This section reviews previous work on Bitcoin price prediction, sentiment analysis, reputational polarity and NLP models applied to social signals. It gives context to the later thesis sections by explaining why classic sentiment and reputational impact should be treated as related but different variables.

2. State of the Art

To assess the degree of correlation between the price of Bitcoin and reputational and sentiment polarity, it is necessary to understand both concepts and which algorithms can help us achieve this goal.

Section [2.1] analyzes the differences between sentiment analysis techniques and reputational polarity, based on previously published studies on the subject. Section [2.2] reviews different studies on predicting the value of Bitcoin from social networks.

2.1. Sentiment analysis vs reputational polarity

As discussed in section [1.2], online reputation reflects the prestige of a person or a brand on the Internet. To quantify the reputation of an entity, a predictive algorithm must be able to analyze a document, find the most relevant information and classify it according to its positive, neutral or negative implications. In other words, it must use Natural Language Processing techniques to interpret the reputational implications of the text.

In this regard, the literature reviewed for this section shows that sentiment analysis is the most widely used Natural Language Processing tool for monitoring online reputation. Works such as [Sentiment Analysis or Opinion Mining: A Review] [30] illustrate this, even though, as demonstrated in [European Conference on Information Retrieval] [21], the sentiment of a text and its reputational implications for an entity are different things. In fact, most texts with reputational implications are polar facts, that is, factual information without explicit sentiment.

Of course, measuring the reputational polarity of a text is more complicated when the document does not explicitly express a positive or negative opinion on the topic analyzed. However, investing resources in this case can benefit entities, for example by allowing them to obtain unstructured opinion data about a service or product.

Although reputational polarity is by definition substantially different from sentiment analysis, the two have some similarities. Furthermore, work on reputational polarity has evolved from previous studies on sentiment analysis, that is, the process of determining (statistically) whether a text contains positive, negative, or neutral sentiment regarding the entity of interest.

As already mentioned, work on opinion retrieval and sentiment analysis can be divided into two categories: lexicon-based approaches and supervised classification. Lexicon-based approaches estimate the sentiment of a document using lists of opinion words known as opinion lexicons. An example is [Proceedings of the 40th annual meeting on association for computational linguistics] [17], where the sentiment of a document is identified through a dictionary of words cataloged according to their sentiment. The lexicon-based approach is unsupervised, as it does not require any training data. More sophisticated approaches incorporate additional sentiment indicators such as the proximity between query terms and sentiment words [13] or topic-based stylistic variations [12].
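The lexicon-based idea described above can be illustrated with a minimal sketch. The lexicon below is a hypothetical toy dictionary invented for this example, not the one used in [17]; real opinion lexicons contain thousands of entries.

```python
# Hypothetical toy opinion lexicon (assumption): word -> polarity weight.
OPINION_LEXICON = {
    "good": 1, "great": 1, "gain": 1, "secure": 1,
    "bad": -1, "crash": -1, "scam": -1, "loss": -1,
}


def lexicon_polarity(text, lexicon=OPINION_LEXICON):
    """Classify a text by summing the polarity of the opinion words it contains."""
    # Very simple tokenization: split on whitespace and strip punctuation.
    tokens = [token.strip(".,!?;:").lower() for token in text.split()]
    score = sum(lexicon.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_polarity("Bitcoin is a great and secure investment"))
print(lexicon_polarity("Another exchange crash, what a scam!"))
# A factual sentence with no opinion words is scored as neutral,
# even though it may carry reputational implications (a polar fact).
print(lexicon_polarity("A new regulation on Bitcoin was approved today"))
```

No training data is needed, which is why the approach is considered unsupervised. The last call also shows its main limitation for reputational polarity: factual statements contain no opinion words to count.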

Classification-based approaches use sets of features to build a classifier that can predict the sentiment polarity of a document [10]. Features range from simple n-grams to semantic features, and from syntactic features to medium-specific features [9].
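As an illustration of the simplest feature type mentioned above, the following sketch extracts unigram and bigram counts from a text. The function name and the whitespace tokenization are assumptions made for this example; the resulting counts are the kind of input a classifier could be trained on.

```python
from collections import Counter


def ngram_features(text, n_values=(1, 2)):
    """Count the n-grams of a text for every n in n_values."""
    tokens = text.lower().split()
    features = Counter()
    for n in n_values:
        # Slide a window of n tokens over the text.
        for start in range(len(tokens) - n + 1):
            features[" ".join(tokens[start:start + n])] += 1
    return features


print(ngram_features("bitcoin price rises"))
```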

Classification-based approaches can in turn be divided into semi-supervised and supervised approaches. The main difference between the two categories is that semi-supervised approaches combine labeled and unlabeled data. A comprehensive review of opinion retrieval and sentiment analysis can be found in [Opinion Mining and Sentiment Analysis. Foundations and Trends in Information Retrieval] [8], while [Like it or not: A survey of twitter sentiment analysis methods] [7] offers an exhaustive survey focused on Twitter sentiment analysis.

The first approaches to the analysis of reputational polarity were built on this work on sentiment analysis methods, and the best results were achieved with models trained on textual and sentiment features. The best result was achieved in [CEUR WORKSHOP PROCEEDINGS] [6], whose authors trained a maximum entropy classifier using a sentiment lexicon, bigrams, the number of negation words and character repetitions. The authors of [CLEF 2013 Conference and Labs of the Evaluation Forum] [5] addressed the problem of reputation polarity with an information retrieval-based approach, finding the most relevant class by using the tweet content as a query.

The authors of [Estimating reputation polarity on microblog posts] [4] assumed that how a tweet is perceived is an important indicator for estimating its reputational polarity. To this end, they proposed a supervised approach that also considered reception features such as replies to and retweets of tweets. The results showed that these features were effective and that the best result was obtained on entity-dependent data.

Our contribution will be to investigate contextual word embeddings, as implemented in the BERT system (section [2.1.1.]), for the estimation of the reputational polarity of tweets and the prediction of stock market values, comparing them with sentiment analysis.

2.1.1. Natural Language Processing

As seen in section [2.1], reputational polarity can use the same algorithms that data analysts use to measure sentiment polarity. Based on that premise, this section examines the field of Natural Language Processing (NLP).

As discussed previously, processing text with artificial intelligence is a challenge: a text must be presented to an algorithm in such a way that the algorithm can understand it in its entirety while preserving the characteristics of the language.

Modern natural language processing (since 2013) frequently uses embeddings, representations of words as n-dimensional vectors, based on the premise that spatial proximity between vectors implies some kind of relationship between the words. Figures [1], [2] and [3] show three graphical examples of this technique.

Location proximity

As can be seen, the first step of this technique is to assign each word a vector of numbers based on its semantic content (recall that neural networks operate on numbers). Figure [1] shows an example of how four different but related words would be represented in a vector space. If an operation such as King minus man plus woman is performed, the result is a vector very close to the one that represents Queen.
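The King, man, woman, Queen example can be reproduced with a small sketch. The three-dimensional vectors below are hand-made assumptions chosen to make the arithmetic visible; real embeddings are learned from data and have hundreds of dimensions.

```python
import math

# Hypothetical toy embeddings (assumption): [royalty, maleness, femaleness].
EMBEDDINGS = {
    "king": [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.9],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_word(vector, embeddings):
    """Return the word whose embedding is most similar to the given vector."""
    return max(embeddings, key=lambda word: cosine_similarity(vector, embeddings[word]))


# King minus man plus woman, computed component by component.
analogy = [
    k - m + w
    for k, m, w in zip(EMBEDDINGS["king"], EMBEDDINGS["man"], EMBEDDINGS["woman"])
]
print(nearest_word(analogy, EMBEDDINGS))
```

With these toy values the closest word to the resulting vector is the one standing for Queen, which mirrors the behavior described for figure [1].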

This development allows neural networks to capture the semantics of individual words, although not the relationships between them. To address this shortcoming, NLP techniques have evolved to produce what we know today as 'language models'.

Language models are machine learning models designed to predict what the next word in a text should be, based on all the previous words.
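The idea of next-word prediction can be sketched with a bigram model. This is a deliberate simplification: it conditions only on the single previous word rather than on all previous words, and the three-sentence corpus is a hypothetical example.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus):
    """Count, for every word, which words follow it in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for previous, following in zip(tokens, tokens[1:]):
            counts[previous][following] += 1
    return counts


def predict_next(model, word):
    """Return the most frequent follower of a word, or None if it was never seen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus (assumption).
corpus = [
    "the price of bitcoin rises",
    "the price of bitcoin falls",
    "the price of gold rises",
]
model = train_bigram_model(corpus)
print(predict_next(model, "of"))
```

Neural language models replace these counts with learned parameters and a much longer context, but the prediction task is the same.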

The great potential of this technique is that, once a model has learned the structure of a language, it is relatively easy to download the pre-trained model and adapt it through fine-tuning to tasks other than text generation, such as text classification.

Among all the systems published so far and after searching among different solutions, in this research we have opted for BERT, one of the most advanced models for the representation of words and texts. BERT is a system that provides contextual word embeddings, that is, each word receives a representation dependent on the context in which it appears. Contextual word embeddings are pre-trained systems that provide unprecedented semantic richness, and that have been changing NLP since 2018. Although there are several systems that compete with BERT today, the fact that BERT is open source and well documented makes it the most popular option and the one we have adopted in this work.

BERT

As seen throughout this section, classifying the reputational polarity of a text requires both the analysis of the document in a vector space and the analysis of the context in which the words occur. The consequences of this can be seen in the word king from the previous example, which will have a different meaning depending on the context in which it is used. This subtlety matters, since capturing the grammatical role of a word can provide relevant information about its polarity. For example, using a word as the object of a sentence is not the same as using it as the subject, and the same word can carry one meaning or another.

In this sense, natural language processing (NLP) techniques based on artificial intelligence (AI) algorithms will offer us a better solution than the algorithms analyzed so far.

To do this, previous experience in related tasks such as machine translation, sentiment analysis or semantic search can be used to help choose the best approach for our task. Another benefit of drawing on experience from other tasks is the ability to optimize the resulting model more efficiently. These algorithms need to be fed with diverse data sets large enough to train the models they use. Deep learning models, which are loosely inspired by the behavior of neurons in the human brain, generally improve as the training set grows; therefore, any data set that is already labeled can help us obtain better results in the project.

Now, because NLP is a field with many different tasks, most task-specific data sets contain only a few thousand or a few hundred thousand examples of human-labeled documents. To help close this data gap, researchers have developed a variety of techniques to train general-purpose language representation models using the enormous amount of unannotated text on the web (known as pre-training). The pre-trained model can then be fine-tuned to small data NLP tasks such as question answering and sentiment analysis, resulting in substantial accuracy improvements compared to training on these data sets from scratch.

In this context, on November 2, 2018, Google open-sourced BERT (Bidirectional Encoder Representations from Transformers), the first deeply bidirectional, unsupervised language representation model, pre-trained using only a plain text corpus.

BERT builds on recent work in pre-training contextual representations, including Semi-supervised Sequence Learning, Generative Pre-Training, ELMo, and ULMFit. However, unlike these previous models, BERT combines deep bidirectionality with unsupervised pre-training on a plain text corpus.

As explained by Google AI researchers Jacob Devlin and Ming-Wei Chang, BERT is unique because it is bidirectional, allowing access to context from both past and future directions, and unsupervised, meaning it can be trained on data without any labeling. This contrasts with traditional NLP models, which produce a single context-free word embedding (a mathematical representation of a word) for every word in the vocabulary.

Pre-trained representations can be context-free or contextual, and contextual representations can be unidirectional or bidirectional. The context-free models discussed above generate a single word embedding for each word in the vocabulary. For example, the word "bank" would have the same context-free representation in "bank account" and "river bank". Contextual models instead generate a representation of each word that is based on the other words in the sentence. For example, in the sentence "I accessed the bank account", a unidirectional contextual model would represent "bank" based on "I accessed the" but not "account". BERT, however, represents "bank" using both its previous and next context, "I accessed the [...] account", starting from the very bottom of a deep neural network, which makes it deeply bidirectional.
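The difference between the two kinds of context can be made concrete with a small sketch. It only shows which words are visible when representing "bank" in each setting; it does not model how BERT or any other network actually computes the representation, and the function name is an assumption made for this example.

```python
def visible_context(tokens, position, bidirectional):
    """Return the (left, right) context available for the token at `position`."""
    left = tokens[:position]
    # A unidirectional (left-to-right) model cannot see the words that follow.
    right = tokens[position + 1:] if bidirectional else []
    return left, right


tokens = "I accessed the bank account".split()
position = tokens.index("bank")

print(visible_context(tokens, position, bidirectional=False))
print(visible_context(tokens, position, bidirectional=True))
```

In the unidirectional case the word "account", which disambiguates "bank", is not available; in the bidirectional case it is.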

Below is a visualization of BERT's neural network architecture compared to previous state-of-the-art contextual pre-training methods. Arrows indicate the flow of information from one layer to the next. The green boxes at the top indicate the final contextualized representation of each input word:

Figure 4. Comparison of BERT, which is deeply bidirectional, OpenAI GPT, which is unidirectional, and ELMo, which is shallowly bidirectional. Source: Google AI blog [1]

With this release, anyone in the world can train their own sentiment analysis system (or a variety of other models) in a few hours with a single GPU. The release includes source code built on TensorFlow and a number of pre-trained language representation models (including English).

Additionally, BERT learns to model relationships between sentences by pre-training on a simple task that can be generated from any text corpus. It is based on Google's Transformer, an open source neural network architecture based on a self-attention mechanism optimized for NLP.

The last point to take into account is the score obtained on standard benchmarks. On the Stanford Question Answering Dataset (SQuAD), a reading comprehension benchmark, BERT achieved an F1 score of 93.2 percent, exceeding the previous state of the art (91.6 percent) and the human-level score (91.2 percent). On the GLUE benchmark, a collection of datasets for evaluating natural language understanding systems, it achieved a score of 80.4 percent.

2.2. Bitcoin value prediction from social networks

In traditional currency markets it is common to see investors use one of the following approaches (together or separately) to predict market trends:

Fundamental analysis: a technique that uses the underlying factors of a security to estimate its value. For state-issued currencies, this technique focuses on indicators such as a nation's growth forecasts, import and export levels, tourism, political measures, debt levels, GDP and international relations. These are used as parameters for a valuation model. If the currency is considered undervalued, it makes sense to buy it; otherwise, to sell it [28].

Technical analysis: an alternative method of valuing a security that analyzes market activity through data such as historical prices and daily traded volume. This approach does not attempt to measure the intrinsic value of a security, but rather uses mathematical models and statistical analysis to identify patterns in order to predict future activity [Bitcoin Trading Agents] [22].
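As an illustration of how technical analysis derives signals from historical prices alone, the following sketch computes simple moving averages and a basic crossover rule. This is one common textbook indicator chosen for the example, not the method used in [22]; the window sizes and the price series are hypothetical.

```python
def simple_moving_average(prices, window):
    """Average of the last `window` prices at each point where it is defined."""
    if window <= 0 or window > len(prices):
        raise ValueError("window must be between 1 and len(prices)")
    return [
        sum(prices[end - window + 1:end + 1]) / window
        for end in range(window - 1, len(prices))
    ]


def crossover_signal(prices, short_window=3, long_window=5):
    """Compare the latest short and long moving averages (assumed toy rule)."""
    short_average = simple_moving_average(prices, short_window)[-1]
    long_average = simple_moving_average(prices, long_window)[-1]
    if short_average > long_average:
        return "buy"
    if short_average < long_average:
        return "sell"
    return "hold"


# Hypothetical daily closing prices (assumption).
closing_prices = [10, 11, 12, 13, 14]
print(simple_moving_average(closing_prices, 3))
print(crossover_signal(closing_prices))
```

Note that no intrinsic value is estimated anywhere: the signal depends only on the shape of the recent price history.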


If we try to apply fundamental analysis to Bitcoin, we encounter many problems. As mentioned, this new currency is not backed by any entity or nation, only by the users who use it and give it a value in each transaction. For this reason, the typical analysis based on the usual economic indicators cannot be applied to Bitcoin; instead, we must adapt to this new scenario, which is analyzed in this section.

The first step is to understand the characteristics of Bitcoin as a currency, its users and the market forces that drive its price variations, as well as the factors that differentiate it from traditional currencies and the considerations that matter when designing a successful prediction model.

There is currently a great debate about its use: some authors regard it as a speculative asset or safe haven, while others maintain that its attractiveness could increase until it ends up fulfilling the functions of money demanded by economic theory. The article [Inferring causal impact using Bayesian structural time-series models] [34] explores the association between the market price of Bitcoin and a set of internal and external factors using the Bayesian Structural Time Series approach. The results show that Bitcoin has mixed properties, as it currently appears to act as a speculative asset, a safe haven and a potential capital flight instrument.

The Bayesian Structural Time Series (BSTS) model is a machine learning technique used for, among other things, feature selection, time series forecasting, nowcasting, and causal impact inference.

For time series analysis, it is advisable to use methods that help interpret the information obtained from the sources and extract representative information about the underlying relationships within a series or between several series. This makes it possible (to varying degrees and with varying confidence) to extrapolate or interpolate the data and thus predict the behavior of the series at unobserved moments.

Another example is quantitative trading, widely used throughout the financial industry, where price movements are assumed to follow a set of patterns, so that historical prices can be used to predict future ones. On this basis, one can use the latent source model formalized in [A latent source model for nonparametric time series classification] [27], which takes data considered high-dimensional (such as a time series) and identifies the ways in which the underlying events are characterized in that space. There may be only a small number of primary causes for events, but they are often hidden in the data and difficult to find.
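A simplified sketch of weighted majority voting, a decision rule commonly associated with this kind of nonparametric time series classification, is shown below. It assumes equal-length series, uses squared Euclidean distance and ignores time shifts; the labeled examples, the labels and the gamma value are hypothetical.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_majority_vote(series, labeled_examples, gamma=1.0):
    """Each labeled example votes for its label with weight exp(-gamma * distance)."""
    votes = {}
    for example, label in labeled_examples:
        weight = math.exp(-gamma * squared_distance(series, example))
        votes[label] = votes.get(label, 0.0) + weight
    # The label with the largest total weight wins.
    return max(votes, key=votes.get)


# Hypothetical labeled price patterns (assumption).
labeled_examples = [
    ([1, 2, 3, 4], "up"),
    ([1, 2, 2, 3], "up"),
    ([4, 3, 2, 1], "down"),
    ([3, 3, 2, 1], "down"),
]
print(weighted_majority_vote([1, 2, 3, 3], labeled_examples))
```

Examples that closely resemble the observed series dominate the vote, so the classifier effectively matches the series against the few underlying patterns present in the training data.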

Regarding sources of information, several easily accessible data sources are available, such as:

Blockchain.info, which provides information on monetary statistics, network activity, block details, new coin creation rates and transactions. It also includes the USD to bitcoin exchange rate (and vice versa) along with its volume.
Google Trends. This platform is a Google Labs tool that shows the most popular search terms of the recent past. Using the word Bitcoin as a query, the main topics related to the cryptocurrency have been obtained.
Macroeconomic data. Macroeconomic data from the S&P 500, the Chicago Board Options Exchange and the Volatility Index.

Finally, the social network Twitter can be a source of information about the reputation of Bitcoin, since its concise format and the ease of extracting information in real time may help anticipate the evolution of the market. The article [Algorithmic trading of cryptocurrency based on Twitter sentiment analysis] [23] supports this hypothesis and gives an example in which two distributions built from the collected data allowed the author to predict the evolution of the market with enough success to reveal a correlation between the market and the sentiment of users on social networks. Along the same lines, the article [The Information of Spam] [2] uses the same source of information with a different objective: to validate the usefulness of spam for analyzing sentiment on social networks.

From this first review, it can be concluded that all the articles find a correlation between the price of Bitcoin and a set of internal and external factors (including the sentiment of its users). That said, only the article [Inferring causal impact using Bayesian structural time-series models] [34] highlights the importance not of sentiment but of the reputation of the currency. It confirms a positive relationship between new national legislation on cryptocurrency and price increases, that is, it states that the reputation of the currency is a factor that affects its price. Of course, new legislation carries no sentiment in itself; therefore, the techniques outlined in [Algorithmic trading of cryptocurrency based on Twitter sentiment analysis] [23] could not prove this statement, and new techniques must be found to confirm this relationship empirically.
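The kind of correlation discussed above can be quantified, for instance, with the Pearson correlation coefficient. The sketch below is a minimal illustration under stated assumptions: the daily polarity scores and daily returns are invented values, and the reviewed articles may use other measures.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    if variance_x == 0 or variance_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / math.sqrt(variance_x * variance_y)


# Hypothetical data (assumption): mean daily tweet polarity and daily price return (%).
daily_polarity = [0.2, -0.1, 0.4, 0.0, -0.3]
daily_return = [1.5, -0.5, 2.0, 0.1, -1.2]
print(pearson_correlation(daily_polarity, daily_return))
```

A value close to 1 or -1 indicates a strong linear relationship, while a value close to 0 indicates none. The same function applies whether the polarity series comes from sentiment analysis or from a reputational polarity classifier, which is what makes the two comparable.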

For a more recent approach, see the project [LSTM Model predicting Bitcoin with Tweet Volume & Sentiment] [14], which aimed to explore the options available for building a model that could predict price action over a selected time period. The variables used were data collected with Twitter sentiment analysis tools, which were used to predict the evolution of the market with an LSTM (long short-term memory) network. Long short-term memory is a recurrent neural network model that was predominant in NLP until the emergence of Transformers, which are now the basis of BERT and many other systems.

Currently there is no article that relates the reputational polarity of Bitcoin to the economic evolution of the market. Although reputational polarity is substantially different from sentiment analysis, the two tasks have points in common that can be exploited; therefore, algorithms such as BERT may provide a higher success rate in predicting the market trend.

Therefore, this project will treat Bitcoin as a multifaceted asset that sits between a virtual currency, a hedge and safe haven against geopolitical instability, and a payment method. We will apply the state of the art in Natural Language Processing (in particular, contextual word embeddings as implemented in the BERT system) to the estimation of the reputational polarity of tweets and to the prediction of stock values, comparing it with sentiment analysis.