State of the art on Bitcoin trading prediction
This section reviews previous work on Bitcoin price prediction, sentiment analysis, reputational polarity and NLP models applied to social signals. It gives context to the later thesis sections by explaining why classic sentiment and reputational impact should be treated as related but different variables.
2. State of the Art
To assess the degree of correlation between the price of Bitcoin and reputational and sentiment polarity, it is necessary to understand both terms and which algorithms can help us achieve this goal. Section [2.1] analyzes the differences between sentiment analysis techniques and reputational polarity, based on previous studies published on the subject. Section [2.2] reviews studies on predicting the value of Bitcoin from social networks.
2.1. Sentiment analysis vs reputational polarity
As discussed in section [1.2], online reputation reflects the prestige of a person or a brand on the Internet. To quantify the reputation of an entity, a predictive algorithm must be able to analyze a document, find the most relevant information and classify it according to its positive, neutral or negative implications; that is, it must use Natural Language Processing techniques to interpret the document's reputational implications. The literature reviewed in this section shows that sentiment analysis is the most widely used Natural Language Processing tool for monitoring online reputation. Works such as [Sentiment Analysis or Opinion Mining: A Review] [30] illustrate this, even though, as demonstrated in [European Conference on Information Retrieval] [21], the sentiment of a text and its reputational implications for an entity are different things. In fact, most texts with reputational implications are polar facts, that is, factual information without explicit sentiment. Measuring the reputational polarity of a text is more complicated when the document does not explicitly express a positive or negative opinion on the topic analyzed, but investing resources in this case can pay off for entities, for example to obtain unstructured opinion data about a service or product.
Although reputational polarity is by definition substantially different from sentiment analysis, the two have some similarities. Moreover, work on reputational polarity has evolved from previous studies on sentiment analysis, that is, the process of deciding (statistically) whether a text contains positive, negative or neutral sentiment regarding the entity of interest. As already mentioned, work on opinion retrieval and sentiment analysis can be divided into two categories: lexicon-based approaches and classification-based approaches.
Lexicon-based approaches estimate the sentiment of a document using a list of opinion words known as an opinion lexicon, as in [Proceedings of the 40th annual meeting on association for computational linguistics] [17], where the sentiment of a document is identified through a dictionary of words cataloged according to their sentiment. The lexicon-based approach is unsupervised, as it does not require any training data. More sophisticated approaches incorporate additional sentiment indicators, such as the proximity between query terms and sentiment words [13] or topic-based stylistic variations [12]. Classification-based approaches use sets of features to build a classifier that can predict the sentiment polarity of a document [10]. Features range from simple n-grams to semantic features, and from syntactic features to medium-specific features [9]. Classification-based approaches can in turn be divided into semi-supervised and supervised approaches; the main difference is that semi-supervised approaches combine labeled and unlabeled data. A comprehensive review of opinion retrieval and sentiment analysis can be found in [Opinion Mining and Sentiment Analysis. Foundations and Trends in Information Retrieval] [8], while [Like it or not: A survey of twitter sentiment analysis methods] [7] provides an exhaustive survey focused on Twitter sentiment analysis.
The first approaches to reputational polarity analysis built on this work on sentiment analysis methods, and the best results were achieved with models trained on textual and sentiment features. The best result was reported in [CEUR WORKSHOP PROCEEDINGS] [6], whose authors trained a maximum entropy classifier using a sentiment lexicon, bigrams, the number of negation words and character repetitions.
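To make the lexicon-based approach more concrete, the following minimal sketch scores a text against a small opinion lexicon and extracts a few of the surface features mentioned above (n-grams, here bigrams, the number of negation words and character repetitions). The lexicon, the negation list and the function names are hypothetical and chosen only for illustration; they are not taken from any of the cited works.

```python
import re

# Hypothetical toy opinion lexicon: word -> polarity weight (illustrative only).
OPINION_LEXICON = {"good": 1, "great": 1, "gain": 1, "bad": -1, "scam": -1, "crash": -1}
# Hypothetical list of negation words.
NEGATIONS = {"not", "no", "never"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_polarity(text):
    """Sum lexicon weights, flipping the sign of a word preceded by a negation."""
    tokens = tokenize(text)
    score = 0
    for i, token in enumerate(tokens):
        weight = OPINION_LEXICON.get(token, 0)
        if i > 0 and tokens[i - 1] in NEGATIONS:
            weight = -weight
        score += weight
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def surface_features(text):
    """Features a classifier could consume alongside the lexicon score."""
    tokens = tokenize(text)
    return {
        "bigrams": list(zip(tokens, tokens[1:])),
        "negations": sum(token in NEGATIONS for token in tokens),
        # Runs of three or more identical characters, e.g. "sooooo".
        "char_repetitions": len(re.findall(r"(.)\1{2,}", text)),
    }
```

A purely factual sentence such as "Bitcoin closed at 40000 today" contains no lexicon words and is labeled neutral by this scorer, which illustrates why polar facts are hard to capture with sentiment lexicons alone.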
[CLEF 2013 Conference and Labs of the Evaluation Forum] [5] addressed the problem of reputation polarity with an information retrieval-based approach, finding the most relevant class by using the tweet content as a query. [Estimating reputation polarity on microblog posts] [4] assumed that understanding how a tweet is perceived is an important indicator for estimating its reputational polarity. To this end, the authors proposed a supervised approach that also considered reception features such as replies and retweets. The results showed that these features were effective and that the best result was obtained on entity-dependent data. Our contribution is to investigate contextual word embeddings, as implemented in BERT (section [2.1.1]), for estimating the reputational polarity of tweets and predicting stock market values, and to compare them with sentiment analysis.
2.1.1 Natural Language Processing
As we have seen in section [2.1], reputational polarity can be measured with the same algorithms that data analysts use to measure sentiment polarity. Based on that premise, this section explores the field of Natural Language Processing (NLP). As discussed previously, processing text with artificial intelligence is challenging: the algorithm must understand a given text in its entirety while preserving the characteristics of the language. Modern natural language processing (since 2013) frequently uses embeddings, which represent words as n-dimensional vectors on the premise that spatial proximity between vectors reflects some kind of relationship between the words. Figures [1], [2] and [3] show three graphic examples of this technique.
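The premise that spatial proximity between word vectors signals a relationship can be illustrated with cosine similarity, a common proximity measure for embeddings. The sketch below uses hand-picked three-dimensional toy vectors, which are an assumption made purely for illustration; real embeddings are learned from data and typically have hundreds of dimensions.

```python
import math

# Hand-picked toy vectors (assumption); not the output of any trained model.
EMBEDDINGS = {
    "bitcoin": [0.9, 0.8, 0.1],
    "ethereum": [0.8, 0.9, 0.2],
    "banana": [0.1, 0.0, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word):
    """Return the other vocabulary word whose vector is closest to `word`."""
    return max(
        (other for other in EMBEDDINGS if other != word),
        key=lambda other: cosine_similarity(EMBEDDINGS[word], EMBEDDINGS[other]),
    )
```

With these toy vectors, `nearest("bitcoin")` returns `"ethereum"`, because the two vectors point in almost the same direction, while the vector for "banana" is nearly orthogonal to both.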


Location proximity
As can be seen, the first step of this technique is to assign each word a vector of numbers based on its semantic content (recall that neural networks operate on numerical inputs). Figure [1] shows a semantic example of how four different but related words would be represented in a vector space. If an operation such as king minus man plus woman is performed, the result is a vector very close to the one that represents queen. This development allows neural networks to capture the semantics of individual words, although not the relationships between them in a text. To overcome this limitation, NLP techniques have evolved into what we know today as 'language models'. Language models are machine learning models designed to predict the next word in a text based on all the previous words. The great potential of this technique is that, once a model has learned the structure of a language, it is relatively easy to download the pre-trained model and adapt it through fine-tuning to tasks other than text generation, such as text classification. Among the systems published so far, and after comparing different solutions, this research opts for BERT, one of the most advanced models for representing words and texts. BERT provides contextual word embeddings, that is, each word receives a representation that depends on the context in which it appears. Contextual word embeddings come from pre-trained systems that provide unprecedented semantic richness and have been changing NLP since 2018. Although several systems compete with BERT today, the fact that it is open source and well documented makes it the most popular option and the one adopted in this work.
BERT
As this section has shown, classifying the reputational polarity of a text requires both a vector-space analysis of the document and an analysis of the context in which its words occur. The consequences of this interpretation can be seen in the word king from the previous example, whose meaning will differ depending on the context in which it is used. This subtlety matters, since capturing the grammatical role of words can provide relevant information about their polarity. For example, a word used as the object of a sentence does not carry the same meaning as the same word used as the subject. In this sense, natural language processing (NLP) techniques based on artificial intelligence (AI) algorithms offer a better solution than the algorithms analyzed so far. Previous experience in language translation, sentiment analysis or semantic search can guide the choice of the best approach for our task. Another benefit of drawing on prior experience from other tasks is that the resulting model can be optimized more efficiently. These algorithms need to be fed with diverse data sets that are large enough to train the models they use. Deep learning algorithms are loosely inspired by the behavior of neurons in the human brain, and their results generally improve as the training set grows; therefore, any data set that is already labeled can help us obtain better results in this project. However, because NLP is a field with many distinct tasks, most task-specific data sets contain only a few thousand or a few hundred thousand human-labeled examples. To help close this data gap, researchers have developed a variety of techniques for training general-purpose language representation models on the enormous amount of unannotated text on the web (known as pre-training).
The pre-trained model can then be fine-tuned on small-data NLP tasks such as question answering and sentiment analysis, resulting in substantial accuracy improvements compared to training on these data sets from scratch. In this context, on November 2, 2018, Google announced the open-sourcing of BERT (Bidirectional Encoder Representations from Transformers). BERT builds on recent work in pre-training contextual representations, including Semi-supervised Sequence Learning, Generative Pre-Training, ELMo, and ULMFit. However, unlike these previous models, BERT is the first deeply bidirectional, unsupervised language representation, pre-trained using only a plain text corpus. As explained by Google AI researchers Jacob Devlin and Ming-Wei Chang, BERT is unique because it is bidirectional, allowing access to context from both past and future directions, and unsupervised, meaning that it can learn from data without labels or annotations. This contrasts with traditional NLP models, which produce a context-free word embedding (a mathematical representation of a word) for every word in the vocabulary.
Pre-trained representations can be context-free or contextual, and contextual representations can in turn be unidirectional or bidirectional. Context-free models generate a single word embedding for each word in the vocabulary. For example, the word "bank" would have the same context-free representation in "bank account" and "river bank". Contextual models instead generate a representation of each word that is based on the other words in the sentence. For example, in the sentence "I accessed the bank account", a unidirectional contextual model would represent "bank" based on "I accessed the" but not "account". BERT, however, represents "bank" using both its previous and next context, "I accessed the [...] account", starting from the very bottom of a deep neural network, which makes it deeply bidirectional. The accompanying figure visualizes BERT's neural network architecture compared with earlier state-of-the-art contextual pre-training methods. Arrows indicate the flow of information from one layer to the next, and the green boxes at the top indicate the final contextualized representation of each input word.
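Independently of the architecture figure, the distinction between context-free, unidirectional and bidirectional representations in the "bank" example above can be sketched in a few lines. The function below is a conceptual illustration only: it lists which neighboring words each kind of model may use when representing a target word, and it does not implement BERT or any real model. The function name and the mode labels are hypothetical.

```python
def visible_context(tokens, position, mode):
    """Return the tokens a model may use to represent tokens[position]."""
    left = tokens[:position]
    right = tokens[position + 1:]
    if mode == "context_free":
        # One fixed vector per vocabulary word: no surrounding words are used.
        return []
    if mode == "unidirectional":
        # Only the words that precede the target are available.
        return left
    if mode == "bidirectional":
        # Words on both sides of the target are available.
        return left + right
    raise ValueError(f"unknown mode: {mode}")


sentence = ["I", "accessed", "the", "bank", "account"]
target = sentence.index("bank")

for mode in ("context_free", "unidirectional", "bidirectional"):
    print(mode, visible_context(sentence, target, mode))
```

Only the bidirectional case includes "account", which is the word that disambiguates "bank" in this sentence.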
2.2. Bitcoin value prediction from social networks
In traditional currency markets it is common for investors to use one or both of the following approaches to predict market trends:
- Fundamental analysis: the technique that uses the underlying factors of a security to estimate its value. For state-issued currencies, this technique focuses on indicators such as a nation's growth forecasts, import and export levels, tourism, political measures, debt levels, GDP and international relations. These are used as parameters for a valuation model. If the currency is considered to be undervalued, it makes sense to buy it; otherwise, to sell it. The article by Madan et al. [28] is an example of this approach. That work observed that existing research did not consider the relationship between other factors in the feature space and the Bitcoin price estimate when applied to a trading agent. Analyzing 16 independent features, the authors created a machine learning algorithm to predict the price of Bitcoin. These 16 features are related to the price of Bitcoin and were recorded daily over the previous 5 years. Their study also considered using Bitcoin prices alone as a means of predicting the direction of future price changes.
- Technical analysis: an alternative method of assigning value to a stock that studies market activity through data such as historical prices and daily traded volume. This approach does not attempt to measure the intrinsic value of a security, but rather uses mathematical models and statistical analysis to identify patterns in order to predict future activity. An example of this analysis is the article [Bitcoin Trading Agents] [22], which proposes predicting the price of Bitcoin through Bayesian regression. In that paper, N labeled data points $(x_i, y_i)$ for $1 \le i \le N$ are given. This training data (historical Bitcoin prices) is used to predict the unknown label y (the future price of Bitcoin) for a given x. That is, the model used focuses on understanding the information found in historical data related to
- Blockchain.info, which provides information on monetary statistics, network activity, block details, new coin creation rates and transactions. It also includes the USD-to-Bitcoin exchange rate and vice versa, along with the traded volume.
- Google Trends. This platform is a Google Labs tool that shows the most popular search terms of the recent past. Using the word Bitcoin as a query, the main topics related to the cryptocurrency were obtained.
- Macroeconomic data. Macroeconomic data from S&P500, Chicago Board Options Exchange and Volatility Index.
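Returning to the Bayesian regression setting described under technical analysis, where labeled points (x, y) are used to predict an unknown label for a new x, the sketch below shows a generic textbook formulation: one-dimensional Bayesian linear regression with a zero-mean Gaussian prior on the weight and Gaussian noise. It is an assumption made for illustration and not necessarily the model used in [22]; the hyperparameters `alpha` (prior precision) and `beta` (noise precision), the function names and the toy data are all hypothetical.

```python
def fit_posterior(xs, ys, alpha=1.0, beta=25.0):
    """Posterior mean and variance of the weight w in y = w * x + noise.

    Assumes a prior w ~ N(0, 1/alpha) and noise ~ N(0, 1/beta).
    """
    precision = alpha + beta * sum(x * x for x in xs)
    mean = beta * sum(x * y for x, y in zip(xs, ys)) / precision
    return mean, 1.0 / precision


def predict(x_new, mean, variance, beta=25.0):
    """Predictive mean and variance of the label at x_new."""
    predictive_mean = mean * x_new
    # Noise variance plus the uncertainty that remains about the weight.
    predictive_variance = 1.0 / beta + x_new * x_new * variance
    return predictive_mean, predictive_variance


# Toy data (not real prices): labels are exactly twice the inputs.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
weight_mean, weight_variance = fit_posterior(xs, ys)
prediction, uncertainty = predict(4.0, weight_mean, weight_variance)
```

Unlike a point estimate, this formulation returns a variance alongside each prediction, which a trading agent could use to decide how much to trust a forecast. The prior also pulls the estimated weight slightly below the value of 2 suggested by the toy data.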