Comparing reputational polarity with sentiment analysis for stock prediction

Although sentiment analysis is the Natural Language Processing tool most often used in online reputation management, previous work in the RepLab evaluation campaign showed that the sentiment expressed in a text and the reputational implications of that text for the observed entity are not the same thing. In fact, many texts with reputational implications are polar facts: factual statements with no explicit emotional language.

This master thesis compares the usefulness of automatic reputational-polarity analysis with classic sentiment analysis for stock-price prediction. The working hypothesis is that reputational analysis should be more directly related to market valuation and therefore should be a better predictor of price movements. Bitcoin was selected as the experimental asset because of its high volatility, and Twitter was selected as the textual source because it offers immediacy and makes temporal correlation with market behavior feasible.

Because there was no suitable dataset for this objective, the first contribution of the thesis is BitTweet, a dataset of tweets mentioning Bitcoin that was manually labeled for both sentiment and reputational polarity and linked to Bitcoin price information. These annotations make it possible to quantify the difference between both tasks, evaluate prediction models for sentiment and reputational polarity, and test market-prediction models built from tweets.

The second contribution is the application of the state of the art in Natural Language Processing, especially contextual word embeddings through BERT, to reputational-polarity estimation and Bitcoin-price prediction, always in direct comparison with sentiment analysis. The results support the original hypothesis: reputational polarity is a better predictor than sentiment analysis. This suggests that reputation-monitoring systems should reduce their dependence on generic sentiment software and move toward direct analysis of reputational implications.

Keywords

BERT, reputational polarity, sentiment analysis, Python, Bitcoin, Twitter

Historical scope and reproducibility

This thesis analyzes Twitter and Bitcoin data collected in 2019. Its results describe that dataset and experimental design; they are not a current trading signal. Reproduction requires the original time split, labels, preprocessing, model versions and market series, while new studies must also account for API access and distribution drift.

1. Introduction

The first chapter presents the context, the student's motivation, the objectives set at the start of the project, and the structure of the entire report.

1.1. Online Reputation

Online reputation is a reflection of the prestige of a person, organization, brand, etc. on the Internet. This perception is not under the control of an individual or an organization; it is the result of a set of conversations, opinions, events and articles shared in different media about a specific entity. Although the Internet represents a new opportunity for communication between entities and their users, the diversity of media (blogs, social networks, websites, etc.), together with the large volume of information, complicates analysts' ability to quantify this perception.

To know the online reputation of a brand or entity, the analyst must filter this data flow to find the most relevant information and classify it according to its positive, neutral or negative implications and its potential impact.

Although sentiment analysis is the most widely used Natural Language Processing tool for online reputation monitoring, it has been shown that the sentiment of a text and its reputational implications for an entity are different things; in fact, most texts with reputational implications are polar facts, that is, factual information without explicit sentiment. Of course, measuring the reputational polarity of a text is more complicated when the document does not implicitly express a positive or negative reputation on the topic analyzed.

Investing resources in this type of analysis can provide entities with useful applications, for example obtaining data from unstructured opinions about a service or product. A real use case of reputational polarity analysis can be found in any company that receives comments through Twitter or other social network accounts. It is bad business for an entity to leave negative comments unanswered for too long, so an application that identifies tweets with negative reputational polarity gives the company a quick way to find and prioritize these dissatisfied customers.

1.2. Sentiment analysis vs reputational polarity

As introduced in section 1.1, Online Reputation, the sentiment of a text and its reputational implications for an entity are different things. For this reason, this section explains what sentiment analysis is, how it differs from the analysis of reputational polarity, and how its advances can be used in the project.

Sentiment analysis, as it is currently understood, is the process of determining whether a text expresses feelings or emotions. Although it may consist of specifying which emotions are expressed and with what intensity (valence), the simplest and most common analysis consists of determining whether a text expresses positive, negative or neutral feelings (sentiment polarity).

For example, the words "good" and "excellent" would be treated the same in a polarity-based approach, while "excellent" would be treated as more positive than "good" in a valence-based approach.

To determine whether these words are positive or negative (or to what extent), the developers of these approaches need a group of people to score them manually for each type of context, which is expensive and time-consuming. Furthermore, the lexicon must have good coverage of the typical words in the context of study; otherwise it will not be very precise. On the other hand, when the lexicon fits the domain under study well, sentiment analysis is very precise and returns results quickly even on large amounts of text.

As mentioned, building lexicons is expensive and time-consuming, so they are not updated very often. This means that lexicons lack the latest jargon, which can be a problem. Figure 1 shows an example of this situation: the user expresses discontent through very current jargon (blue squares), using multiple punctuation marks, acronyms and an emoticon. If the analysis does not take these expressions into account, this negative tweet would be classified as neutral based on the rest of its content.

Figure 1. Example tweet with current jargon. Source: the web article Using VADER to handle sentiment analysis with social media text [1]

When a text has a negative sentiment polarity and refers to the entity of interest, it may have negative implications for that entity's reputation. For example, "I'm fed up with the online sales service of Renfe" expresses a negative feeling (weariness) regarding the company Renfe, and can therefore negatively affect its reputation. This is why sentiment analysis is commonly applied to measure the state of opinion regarding a company, product, organization, etc.

As discussed in section 1.1, the reputation of an entity can be affected by news or events in which no feeling is expressed. The approval of a law, the implementation of a technical improvement or even economic disasters such as those experienced in Venezuela must be labeled differently from the point of view of reputation than from that of sentiment. This point is developed in section 3.2, where these differences are addressed with examples. Here we give just one example: "Company X pays 1% tax on its profits" is a factual statement without any associated sentiment; however, it suggests some type of tax fraud or tax engineering, and will therefore have immediate negative consequences for public perception of the company. These types of expressions are known as "polar facts", and they are very common in the context of online reputation.

Another difference between sentiment polarity and reputational polarity is that a negative sentiment can sometimes imply a positive reputational polarity, and vice versa. For example, "I am very sad about the death of X" expresses a negative feeling with positive implications for X.

Although reputational polarity is substantially different from sentiment analysis, the article Sentiment propagation for predicting reputation polarity [2] shows that both tasks have points in common that can be exploited and must therefore be analyzed. Creating a new model for the automatic analysis of reputational polarity from scratch would require a huge effort in resources and time, so we can take advantage of previous research in automatic sentiment analysis and adapt it to reputational polarity detection.

To some extent, reputational polarity is related to sentiment analysis, so previous work in that field is useful for the study of reputational polarity. From this point of view, two families of approaches can provide information for the investigation:

Lexicon-based. Each word that expresses feeling in a document is an indicator. It is therefore possible to use lists of opinion terms, queries or lists adapted to the topic analyzed.
Feature-based. Sentiment can also be obtained from the syntactic features of the text through supervised or semi-supervised algorithms.

Outside these two large categories, and more adapted to today's society, there are studies that use comments on social networks or user reactions to determine the sentiment of a text.

After this brief introduction, it is useful to analyze a lexicon-based solution for sentiment analysis in order to adapt it to reputational polarity. A valid example would be to adapt the algorithm presented in the article Algorithmic trading of cryptocurrency based on Twitter sentiment analysis [3], with the objective of detecting the sentiment of a document through a dictionary of words that records the sentiment polarity (positive or negative) of each word with respect to a certain topic. The result of this mechanism is a score based on the number of sentiment words contained in the document.

This definition would be expressed mathematically in the following way:

$Polarity(d)$ is the sentiment polarity of document $d$, which takes one of the values $-1, 0, 1$.
$S_{d}$ is the score of document $d$, computed as the sum of the scores of its terms: $S_{d}=\sum_{t \in d} opinion(t)$
$opinion(t)$ is the score of term $t$ according to the dictionary.

$Polarity(d)= \left\{ \begin{array}{lr} 1 & if\,\, S_{d} > 0\\ -1 & if\,\, S_{d}< 0 \\ 0 & otherwise \\ \end{array} \right.$
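The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from [3]: the lexicon entries, the whitespace tokenization and the function names are assumptions made for the example.

```python
# Hypothetical opinion lexicon (term -> score); a real lexicon would be
# far larger and adapted to the topic under study.
OPINION_LEXICON = {
    "good": 1, "excellent": 1, "great": 1,
    "bad": -1, "scam": -1, "crash": -1,
}


def opinion(term):
    """Score of a term according to the dictionary; unknown terms score 0."""
    return OPINION_LEXICON.get(term, 0)


def document_score(document):
    """S_d: sum of the opinion scores of the terms in the document."""
    # Assumption: naive lowercase + whitespace tokenization.
    return sum(opinion(term) for term in document.lower().split())


def polarity(document):
    """Polarity(d): 1 if S_d > 0, -1 if S_d < 0, and 0 otherwise."""
    s_d = document_score(document)
    if s_d > 0:
        return 1
    if s_d < 0:
        return -1
    return 0
```

Because both "good" and "excellent" score 1 in this toy lexicon, the sketch behaves as a polarity-based approach; a valence-based variant would assign them different weights.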

This approach can be improved in several ways, for example by reclassifying the terms labeled as neutral and using them to extend the dictionary, or by propagating sentiment between documents that share a high proportion of terms.

Moreover, if we modify the lexicon used to classify the terms and catalog them according to their reputational polarity, we obtain a new algorithm capable of automatically predicting the reputational polarity of a text. Additionally, a supervised method can be implemented to discover words that indicate this polarity. This approach is based on Pointwise Mutual Information (PMI), presented in the work Word association norms, mutual information, and lexicography [4], where each term t is assigned a PMI value for each of the three categories: positive, neutral and negative. To obtain this score for the positive category, we perform the following calculation:

$PMI(d,positive) \, = \, \sum_{t \in d}PMI(t,positive)$
$PMI(t,positive) \, = \, \log_{2} \frac{c(t,positive) \cdot N}{c(t) \cdot c(positive)}$

Where:

c(t,positive) is the frequency of the term t in the positive documents.
N is the total number of words in the corpus.
c(t) is the frequency of the term t in the corpus.
c(positive) is the number of terms in the positive documents.

The PMI for the negative and neutral categories is calculated in the same way. The final classification is the category with the highest value among the three.
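As an illustration of the PMI calculation, the following sketch computes the counts from a tiny labeled corpus and classifies a document by its highest summed PMI. The corpus, the function names and the decision to let unseen (term, category) pairs contribute 0 are assumptions made for the example; the source does not specify how zero counts are handled.

```python
import math
from collections import Counter

# Hypothetical labeled corpus: (tokens, category).
CORPUS = [
    ("bitcoin adoption grows".split(), "positive"),
    ("bitcoin exchange hacked".split(), "negative"),
    ("bitcoin price update".split(), "neutral"),
    ("new law supports adoption".split(), "positive"),
    ("exchange hacked again".split(), "negative"),
]

term_class = Counter()   # c(t, category)
term_total = Counter()   # c(t)
class_total = Counter()  # c(category): number of terms in that category
for tokens, label in CORPUS:
    for term in tokens:
        term_class[(term, label)] += 1
        term_total[term] += 1
        class_total[label] += 1
N = sum(term_total.values())  # total number of words in the corpus


def pmi(term, label):
    """PMI(t, category) = log2(c(t, category) * N / (c(t) * c(category)))."""
    joint = term_class[(term, label)]
    if joint == 0:
        return 0.0  # assumption: unseen pairs contribute nothing
    return math.log2(joint * N / (term_total[term] * class_total[label]))


def classify(tokens):
    """Return the category with the highest summed PMI and all scores."""
    scores = {label: sum(pmi(t, label) for t in tokens)
              for label in class_total}
    return max(scores, key=scores.get), scores
```

A term that appears only in negative documents, such as "hacked" in this toy corpus, receives a positive PMI for the negative category, while a term spread across all categories, such as "bitcoin", receives a PMI close to or below zero.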

This example is useful for understanding what reputational polarity is and how it can be measured in a text, but innovating in this project requires more modern and efficient natural language processing techniques.

1.3. Goals

The objective of the thesis is to compare the usefulness of automatic reputational polarity analysis with that of sentiment analysis for predicting stock market values, under the hypothesis that reputational analysis should have a more direct relationship with price and therefore be a better predictor of stock market value.

As mentioned at the beginning of the chapter, achieving this objective requires filtering a data flow to find the most relevant information, classifying it according to its positive, neutral or negative implications, and correlating that sentiment or polarity with the market evolution of an asset.

As there was no suitable dataset for our purpose, our first objective was to develop a dataset of documents manually annotated for sentiment and reputational polarity and linked to the price of a traded asset. The manual annotations of this dataset allow us to quantify the difference between sentiment analysis and reputational polarity, evaluate sentiment and reputational polarity prediction models, and evaluate stock market value prediction models.

The second objective is to analyze the state of the art in Natural Language Processing in order to find an algorithm that allows us to quantify the reputational polarity of a document and predict stock values, comparing it with sentiment analysis.

1.4. Methodology

To meet the objectives set out in section 1.3, the first step is to choose the asset to predict. Bitcoin was chosen: a peer-to-peer, open-source digital currency system [5] considered an alternative to standard currencies. Section 1.4.1 explains the characteristics of the cryptocurrency and the main reason for its choice.

Next, from the study of several articles on Bayesian algorithms and machine learning applied to stock market prediction, we concluded that, to predict the volatility of an asset, its correlation with a set of features must be analyzed, among them its economic value, macroeconomic data or its social repercussions. An example of this correlation is the article Exploring the determinants of Bitcoin's price: an application of Bayesian Structural Time Series [6], which describes the relationship between the appearance of new legislation on Bitcoin and an increase in its price.

This conclusion shaped the next point of the methodology, the data source to be used. From the state of the art we deduced that the social network Twitter can be a good source of information about an entity, since its concise format and the ease of extracting documents in real time have made it possible to predict market evolution. For example, Colianni, Rosales and Signorotti, in their work Algorithmic trading of cryptocurrency based on Twitter sentiment analysis [3], use two distributions built from the tweets collected in their experiment to predict the evolution of the market with enough success to confirm the correlation between the price and the sentiment of the users of that social network. Another example is The Information of Spam [8], in which Anderson shows that tweets considered spam contain information that helps predict market trends; that is, information that most people consider useless may still be relevant for making estimates.

The next point in the methodology was to analyze the algorithms currently available for language analysis, as well as to understand what type of users the entity is aimed at, their intentions and the way to obtain information about them; that is, to analyze how to measure the reputational polarity of a tweet. Having all this information allows us to classify their opinions correctly.

To relate this information to the entity and its economic evolution, it is necessary to research the field of Machine Learning applied to text interpretation. Although the analysis of reputational polarity is substantially different from that of sentiment, according to Sentiment propagation for predicting reputation polarity [2] the two tasks have points in common that can be exploited. For this reason, the next part of the methodology analyzes Natural Language Processing (NLP) techniques for reputational polarity and sentiment in order to assess the best solution. More precisely, we use BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding [9], one of the most advanced text processing models available, which is explained in the section on BERT.

Once the algorithm to be implemented has been chosen, the document goes on to show the importance not of sentiment but of the reputation of the entity. For example, the work Exploring the determinants of Bitcoin's price: an application of Bayesian Structural Time Series [6] verifies that there is a positive relationship between new national legislation on Bitcoin and an increase in its price; that is, it states that the reputation of the currency is a factor that affects the cryptocurrency. Of course, new legislation carries no sentiment in itself, so sentiment analysis techniques would not prove this statement; new techniques must be found that confirm this relationship empirically.

As there was no suitable dataset for our objective, we decided to develop BitTweet, a dataset of tweets mentioning Bitcoin, manually annotated for sentiment and reputational polarity and linked to the price of bitcoin. The manual annotations of this dataset allow us to quantify the difference between sentiment analysis and reputational polarity (something that had not been done previously), evaluate models for predicting sentiment and reputational polarity, and evaluate models for predicting stock market value from tweets.

Finally, we used the estimate of the reputational polarity of the tweets obtained from BERT to predict the stock market value and compared it with sentiment analysis. To validate the implementations and compare their success, we used an existing tool called VADER (Valence Aware Dictionary and Sentiment Reasoner) ( https://github.com/cjhutto/vaderSentiment ). This library is a rule-based and lexicon-based sentiment analysis tool specifically attuned to the sentiments expressed on social media, which allows us to compare and validate the results obtained by BERT.
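A comparison of this kind could be organized roughly as follows. The sketch assumes that each tweet already has a VADER compound score (in practice obtained with `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` from the vaderSentiment package) and a label from another classifier; the cut-off of 0.05 is the conventional threshold suggested in the VADER documentation, and the helper names are hypothetical rather than taken from the thesis.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a three-way label.

    Assumption: the conventional +/-0.05 cut-offs are used.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def agreement(labels_a, labels_b):
    """Fraction of items on which two label sequences agree."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Hypothetical compound scores and hypothetical labels from another model.
vader_labels = [label_from_compound(c) for c in (0.62, -0.41, 0.01)]
other_labels = ["positive", "negative", "negative"]
share_in_agreement = agreement(vader_labels, other_labels)
```

Agreement between the two label sequences is only one possible comparison; evaluating both against the manual annotations would be the more informative check.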

1.4.1 Bitcoin

As already mentioned, Bitcoin is a peer-to-peer, open-source digital currency system [Bitcoin: A peer-to-peer electronic cash system] considered an alternative to standard currencies. It uses a cryptographic protocol to control the creation and transfer of money, ensuring that it retains its value and preventing it from being double spent. It is created and transferred without the need for a central government authority, using computing resources available to any user and transferring directly from one account to another using cryptographic algorithms.

Cryptocurrencies have a number of benefits over traditional currencies, as there is no need for a trusted third party. Currently, trading in "paper" currencies is based on trust in financial institutions that act as regulators in payment processes. The inherent weaknesses of a trust-based model cause transaction costs to increase, since the third party inevitably has to deal with disputes and maintain the infrastructure for transactions. This makes electronic microtransactions unfeasible, since the cost of carrying out a global transaction is too high for small amounts. To avoid this problem, cryptocurrencies such as Bitcoin have emerged, offering a solution based on cryptographic proofs that avoids the need for mutual trust and the risk of double spending. An owner must digitally sign a hash of the previous transaction and the public key of the next owner, so that the beneficiary receives a signature that verifies the chain of ownership. [bitcoin_prediction#Bitcoin: approach and protocol]

As a currency, Bitcoin consists of three fundamental elements: addresses, the transaction ledger (or blockchain) and the network. The balance of an account, represented by an address, is nothing more than the sum of its incoming (positive value) and outgoing (negative value) transactions. The network is responsible for verifying the legitimacy and viability of the transactions, that is, that they have been issued by the legitimate owners of the accounts and that no account sends money that it does not have. Furthermore, the protocol is designed so that no more than twenty-one million bitcoins can ever exist, with the rate of generation halved approximately every four years.
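The two numerical ideas in this paragraph, the balance as a sum of signed transactions and the supply cap that results from halving, can be checked with a short sketch. The constants used here (an initial block reward of 50 bitcoins, a halving every 210,000 blocks, and 100,000,000 indivisible units per bitcoin) are Bitcoin protocol parameters that the text does not state, and the function names are hypothetical.

```python
SATOSHIS_PER_BTC = 100_000_000          # smallest unit of the currency
BLOCKS_PER_HALVING = 210_000            # roughly four years of blocks
INITIAL_REWARD = 50 * SATOSHIS_PER_BTC  # reward of the first blocks


def balance(transactions):
    """Balance of an address: incoming amounts are positive, outgoing negative."""
    return sum(transactions)


def total_supply_satoshis():
    """Sum the block rewards over all halving periods until the reward is 0."""
    total = 0
    reward = INITIAL_REWARD
    while reward > 0:
        total += BLOCKS_PER_HALVING * reward
        reward //= 2  # integer halving, so the reward eventually reaches 0
    return total


supply_btc = total_supply_satoshis() / SATOSHIS_PER_BTC
```

Because each period issues half as much as the previous one, the total is a geometric series that approaches, but never reaches, 21 million bitcoins.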

As with any other currency, the value of Bitcoin is subject to significant variation over time. However, the factors that affect the price of Bitcoin differ from those that affect standard currencies. The value of any currency is related to how many people want to own it, but since Bitcoin is not tied to a particular product or issued by a central authority, it has no intrinsic value. As with all cryptocurrencies, consumers are not limited by a central authority, but only by the currency that the counterparty will accept in a transaction. The usefulness of owning bitcoins (note that Bitcoin refers to the currency and bitcoin to the unit) for the consumer is therefore related to the extent to which markets adopt it as a valid form of currency.

As in other currency markets, Bitcoin has open exchanges that allow consumers and investors to buy and sell bitcoins. The price at which Bitcoin is traded is related to the value perceived by investors, since it is not affected by factors such as the volume of imported and exported products or the backing of the official institutions of a given state. The price of Bitcoin is sustained by its global and decentralized use, that is, by the supply and demand that exist at a given time at a global level.

Here we approach Bitcoin from an investor's point of view, trying to find what drives the variations in the price of a bitcoin and how they differ from those of state-issued currencies. Legal developments or terms with negative connotations in terms of sentiment can have positive connotations in terms of reputation; because Bitcoin is such a novel and disruptive concept, it needs to be interpreted and analyzed from a point of view different from that of sentiment.

Of course, creating a predictive model for Bitcoin has its difficulties, since it is a novel concept (created in 2008) that faces the following problems:

The system is not regulated. Cryptocurrencies were born with the idea of replacing traditional currencies, and legislating appropriately to adapt the law to these new forms of payment is complicated. In addition, actions such as those of China and Russia, which seek to prohibit its use [Bitcoin and ether collapse in recent days due to threats from China and Russia], do not help the expansion of the currency.
Its main users are very diverse and hard to characterize. According to Google Trends, the main users of Bitcoin are programmers, people engaged in criminal activities and investors.
Volatility. This currency is much more volatile than a traditional currency.

As mentioned, this new currency is not backed by any entity or nation, only by the users who use it and give it a value in each transaction. For this reason, in the case of Bitcoin we cannot rely on the typical analysis based on common economic indicators; instead, we have to adapt to this new scenario and use indicators such as reputational polarity or sentiment to make these predictions.

NLP techniques provide a model that allows us to assess the sentiment or opinion of the users themselves. Using information collected from a social network over 5 months, we test whether the market correlates with the results obtained from both models.
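One simple way to test for such a correlation is the Pearson coefficient between a daily aggregate of the model output and the daily price change. The sketch below is only an illustration under that assumption: the thesis does not state here which statistic it uses, and the numbers are invented for the example.

```python
import math


def daily_returns(prices):
    """Relative price change between consecutive days."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: closing prices and mean daily polarity of the tweets.
prices = [100.0, 104.0, 101.0, 107.0, 106.0]
mean_polarity = [0.30, -0.20, 0.45, -0.05]  # one value per return
correlation = pearson(mean_polarity, daily_returns(prices))
```

A coefficient near 1 or -1 indicates a strong linear relationship, while a value near 0 indicates none; with real data the polarity series would usually be lagged so that it precedes the price change it is meant to predict.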

1.5. Brief description of the other chapters of the report

Chapter [3] presents the characteristics of BitTweet, the dataset we created of tweets that mention Bitcoin, manually annotated for sentiment and reputational polarity and linked to the price of Bitcoin. Section [3.2] explains how the tweets were collected, their structure and the information stored in the database. Section [3.3] describes the processes for storing and managing the economic information about Bitcoin. Section [3.3] also presents the platform used for labeling, and section [3.4] closes the chapter by discussing the labeling results.

2. State of the Art

To check the degree of correlation between the price of Bitcoin and reputational and sentiment polarity, it is necessary to know both terms and what algorithms can help us achieve this goal.

Section [2.1] analyzes the differences between sentiment analysis and reputational polarity techniques based on previous studies published on the subject. Section [2.2] reviews studies on the prediction of the value of Bitcoin from social networks.

2.1. Sentiment analysis vs reputational polarity

As discussed in section [1.2], online reputation is a reflection of the prestige of a person or a brand on the Internet. To quantify the reputation of an entity, a predictive algorithm must be able to analyze a document in order to find the most relevant information and classify it according to its positive, neutral or negative implications; that is, it must use Natural Language Processing techniques to interpret its reputational implications.

In the review of the state of the art carried out for this section, we observed that sentiment analysis is the most widely used Natural Language Processing tool for monitoring online reputation. Works like [Sentiment Analysis or Opinion Mining: A Review] [30] are an example of this, even though it has been demonstrated in [European Conference on Information Retrieval] [21] that the sentiment of a text and its reputational implications for an entity are different things. In fact, most texts with reputational implications are polar facts, that is, factual information without explicit feelings.

Of course, measuring the reputational polarity of a text is more complicated when the document does not implicitly express a positive or negative reputation on the topic analyzed. However, investing resources in this case can provide entities with useful applications, for example obtaining unstructured opinion data about a service or product.

Although by definition reputational polarity is substantially different from sentiment analysis, the two have some similarities. Furthermore, work on reputational polarity has evolved from previous studies on sentiment analysis, that is, the process of resolving (statistically) whether a text contains positive, negative, or neutral sentiments regarding the entity of interest.

As already mentioned, work on opinion retrieval and sentiment analysis can be divided into two categories: lexicon-based approaches and supervised classification. Lexicon-based approaches estimate the sentiment of a document using a list of opinion words known as an opinion lexicon, as in the article [Proceedings of the 40th annual meeting on association for computational linguistics] [17], where the sentiment of a document is identified through a dictionary of words cataloged according to their sentiment. The lexicon-based approach is unsupervised, as it does not require any training data. More sophisticated approaches incorporate additional sentiment indicators such as the proximity between query terms and sentiment terms [13] or topic-based stylistic variations [12].

Classification-based approaches use sets of features to build a classifier that can predict the sentiment polarity of a document [10]. Features range from simple n-grams to semantic features and from syntactic features to medium-specific features [9].
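To make the simplest of these feature types concrete, the following sketch extracts unigram and bigram counts from a text; the resulting dictionary could be fed to any supervised classifier. The tokenization and function names are assumptions for illustration and do not reproduce the feature sets of the cited works.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous sequences of n tokens, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text, max_n=2):
    """Count every n-gram of the text for n = 1 .. max_n."""
    tokens = text.lower().split()  # assumption: whitespace tokenization
    features = Counter()
    for n in range(1, max_n + 1):
        features.update(ngrams(tokens, n))
    return features


features = extract_features("Bitcoin is not good")
```

The bigram "not good" illustrates why n-grams beyond single words matter: a unigram model sees only "good", while the bigram keeps the negation attached to it.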

Furthermore, classification-based approaches can be divided into semi-supervised and supervised approaches. The main difference between the two categories is that semi-supervised approaches combine labeled and unlabeled data. The article [Opinion Mining and Sentiment Analysis. Foundations and Trends in Information Retrieval] [8] offers a comprehensive review of opinion retrieval and sentiment analysis, while the work [Like it or not: A survey of twitter sentiment analysis methods] [7] provides an exhaustive survey focused on Twitter sentiment analysis.

The first approaches to the analysis of reputational polarity were built on this work on sentiment analysis methods, and the best results were achieved with models trained on textual and sentiment features. The best result was achieved in the article [CEUR WORKSHOP PROCEEDINGS] [6], whose authors trained a maximum entropy classifier using the sentiment lexicon, diagrams, number of negation words and character repetitions. The authors of [CLEF 2013 Conference and Labs of the Evaluation Forum] [5] addressed the problem of reputation polarity with an information retrieval-based approach and found the most relevant class using the tweet content as a query.

The authors of [Estimating reputation polarity on microblog posts] [4] assumed that understanding how a tweet is perceived is an important indicator for estimating its reputational polarity. To this end, they proposed a supervised approach that also considered reception features such as replies and retweets. The results showed that these features were effective and that the best result was obtained on entity-dependent data.

Our contribution is to investigate contextual word embeddings, as implemented in the BERT system (section [2.1.1]), for estimating the reputational polarity of tweets and predicting stock market values, comparing them with sentiment analysis.

2.1.1 Natural Language Processing

As we have seen in the section [2.1] , reputational polarity can use the same algorithms that data analysts use to measure sentiment polarity. Based on that premise, this section of the project will aim to investigate the field of Natural Language Processing (NLP).

As discussed previously, processing text with artificial intelligence is a challenge: a given text must be presented to an algorithm in a way that lets it interpret the text as a whole while preserving the characteristics of the language.

Modern natural language processing (since 2013) frequently uses embeddings, representations of words as n-dimensional vectors, based on the premise that spatial proximity between vectors reflects some kind of relationship between the words. Figures [1] [2] [3] show three graphical examples of this technique.

Location proximity

As can be seen, the first step is to assign each word a vector of numbers based on its semantic content (recall that neural networks operate on numbers). Figure [1] shows how four different but related words would be represented in a vector space. If we perform the operation king minus man plus woman, the result is a vector very close to the one that represents queen.
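
The vector arithmetic described above can be illustrated with a minimal sketch. The two-dimensional vectors below are hand-picked toy values (an assumption made purely for illustration, not embeddings learned from data), with one axis standing loosely for royalty and the other for gender. The function name `analogy` is also hypothetical.

```python
import math

# Toy 2-D embeddings (hypothetical values, not learned from a corpus):
# axis 0 ~ "royalty", axis 1 ~ "maleness".
EMBEDDINGS = {
    "king": (0.9, 0.9),
    "queen": (0.9, 0.1),
    "man": (0.1, 0.9),
    "woman": (0.1, 0.1),
    "child": (0.1, 0.5),
}

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(positive, negative, embeddings=EMBEDDINGS):
    """Return the word closest to sum(positive) - sum(negative).

    The input words themselves are excluded from the candidates,
    as is usual when evaluating word analogies.
    """
    dims = len(next(iter(embeddings.values())))
    target = [0.0] * dims
    for word in positive:
        target = [t + x for t, x in zip(target, embeddings[word])]
    for word in negative:
        target = [t - x for t, x in zip(target, embeddings[word])]
    candidates = [w for w in embeddings if w not in positive and w not in negative]
    return max(candidates, key=lambda w: cosine(target, embeddings[w]))

# king - man + woman
print(analogy(positive=["king", "woman"], negative=["man"]))
```

With real embeddings the result is only approximately equal to the vector for queen, which is why the nearest neighbor is selected by cosine similarity rather than by exact equality.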

This development allows neural networks to capture the semantics of individual words, although not the relationships between them. To address this gap, NLP techniques have evolved into what we know today as 'language models'.

Language models are machine learning models designed to predict the next word in a text based on all the previous words.
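
As a rough illustration of next-word prediction, the following sketch builds a bigram model from word counts. This is a deliberate simplification: a bigram model conditions on only the single previous word, whereas the neural language models discussed in this section condition on much longer contexts. The corpus and the function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """Count, for every word, which words follow it in the corpus."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for previous, following in zip(tokens, tokens[1:]):
            counts[previous][following] += 1
    return counts

def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Tiny hypothetical corpus, for illustration only.
corpus = [
    "bitcoin price rises today",
    "bitcoin price falls today",
    "bitcoin price rises again",
]
model = train_bigram_model(corpus)
print(predict_next(model, "price"))
```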

The great potential of this technique is that, once a model has learned the structure of a language, it is relatively easy to download the pre-trained model and adapt it through fine-tuning to tasks other than text generation, such as text classification.

Among the systems published so far, this research uses BERT, one of the most advanced models for representing words and texts. BERT provides contextual word embeddings, that is, each word receives a representation that depends on the context in which it appears. Contextual word embeddings come from pre-trained systems that provide unprecedented semantic richness and have been changing NLP since 2018. Although several systems compete with BERT today, the fact that it is open source and well documented makes it the most popular option and the one adopted in this work.

BERT

As this section has shown, classifying the reputational polarity of a text requires both a vector-space representation of the document and an analysis of the context in which the words occur. Consider the word king from the previous example: its meaning changes depending on the context in which it is used. This subtlety matters because the grammatical role of a word can provide relevant information about its polarity. For example, a word may carry one meaning when used as the subject of a sentence and another when used as the object.

In this sense, natural language processing (NLP) techniques based on artificial intelligence (AI) algorithms will offer us a better solution than the algorithms analyzed so far.

To do this, we can draw on previous experience in tasks such as machine translation, sentiment analysis or semantic search, which helps when choosing the best approach for our task. Another benefit of reusing experience from other tasks is that the resulting model can be optimized more efficiently. These algorithms need to be fed with diverse data sets that are large enough to train the models they use. Deep learning algorithms are loosely inspired by the behavior of neurons in the human brain, and their results generally improve as the training set grows; therefore, any data set that is already labeled can help us obtain better results in the project.

Because NLP is a field with many different tasks, most task-specific data sets contain only a few thousand or a few hundred thousand human-labeled examples. To help close this data gap, researchers have developed a variety of techniques for training general-purpose language representation models on the enormous amount of unannotated text on the web (known as pre-training). The pre-trained model can then be fine-tuned on small-data NLP tasks such as question answering and sentiment analysis, resulting in substantial accuracy improvements compared to training on these data sets from scratch.

In this context, on November 2, 2018, Google open-sourced BERT (Bidirectional Encoder Representations from Transformers), a contextual language representation model pre-trained using only a plain text corpus.

BERT builds on recent work in pre-training contextual representations, including Semi-supervised Sequence Learning, Generative Pre-Training, ELMo, and ULMFit. However, unlike these previous models, BERT is the first unsupervised, deeply bidirectional language representation, pre-trained using only a plain text corpus.

As explained by Google AI researchers Jacob Devlin and Ming-Wei Chang, BERT is unique because it is bidirectional, giving it access to context from both the left and the right of each word, and unsupervised, meaning that it can be trained on text without manual labeling. This contrasts with traditional NLP models that produce a single context-free word embedding (a mathematical representation of a word) for every word in the vocabulary.

Pre-trained representations can be context-free or contextual, and contextual representations can be unidirectional or bidirectional. The context-free models discussed above generate a single word embedding for each word in the vocabulary. For example, the word "bank" would have the same context-free representation in "bank account" and "river bank." Contextual models, instead, generate a representation of each word that is based on the other words in the sentence. For example, in the sentence "I accessed the bank account," a unidirectional contextual model would represent "bank" based on "I accessed the" but not "account." BERT, however, represents "bank" using both its previous and its next context, "I accessed the [...] account", starting from the very bottom of a deep neural network, which makes it deeply bidirectional.
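
The difference between unidirectional and bidirectional context can be made concrete with a small sketch. It does not compute any embedding; it only lists which tokens each kind of model is allowed to look at when representing a target word. The function name `visible_context` is a hypothetical helper introduced for illustration.

```python
def visible_context(tokens, index, bidirectional):
    """Return the tokens a model may use to represent tokens[index].

    A left-to-right (unidirectional) model sees only the preceding tokens;
    a bidirectional model also sees the tokens that follow.
    """
    left = tokens[:index]
    right = tokens[index + 1:] if bidirectional else []
    return left + right

tokens = ["I", "accessed", "the", "bank", "account"]
position = tokens.index("bank")

print(visible_context(tokens, position, bidirectional=False))
print(visible_context(tokens, position, bidirectional=True))
```

Only the bidirectional view includes "account", the word that disambiguates "bank" in this sentence.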

Below is a visualization of BERT's neural network architecture compared to previous state-of-the-art contextual pre-training methods. Arrows indicate the flow of information from one layer to the next. The green boxes at the top indicate the final contextualized representation of each input word:

Figure 4. Comparison of BERT (deeply bidirectional), OpenAI GPT (unidirectional) and ELMo (shallowly bidirectional). Source: Google AI blog [1]

With this release, anyone in the world can train their own sentiment analysis system (or a variety of other models) in a few hours with a single GPU. The release includes source code built on TensorFlow and a number of pre-trained language representation models (including English).

Additionally, BERT learns to model relationships between sentences by pre-training on a simple task that can be generated from any text corpus. It is based on Google's Transformer, an open source neural network architecture built on a self-attention mechanism optimized for NLP.

The last point to consider is the benchmark performance of the model. On the Stanford Question Answering Dataset (SQuAD), a reading-comprehension benchmark, BERT achieved an F1 score of 93.2 percent, exceeding the previous state of the art (91.6 percent) and the human level (91.2 percent). On the GLUE benchmark, a collection of datasets for evaluating NLP representation systems, it achieved a score of 80.4 percent.

2.2. Bitcoin value prediction from social networks

In traditional currency markets it is common to see investors use one of the following approaches (together or separately) to predict market trends:

Fundamental analysis: the technique that uses the underlying factors of a security to estimate its value. For state-issued currencies, this technique focuses on indicators such as a nation's growth forecasts, import and export levels, tourism, political measures, debt levels, GDP and international relations. These are used as parameters for a valuation model. If the currency is considered to be undervalued, it makes sense to buy it; otherwise, to sell [28].

Technical analysis: an alternative method of assigning value to a security that analyzes market activity through data such as historical prices and daily traded volume. This approach does not attempt to measure the intrinsic value of a security, but rather uses mathematical models and statistical analysis to identify patterns in order to predict future activity [Bitcoin Trading Agents] [22].


If we try to apply fundamental analysis to Bitcoin, we encounter many problems. As mentioned, this new currency is not backed by any entity or nation, only by the users who use it and give it a value in each transaction. For this reason, in the case of Bitcoin we cannot rely on the typical analysis based on the usual economic indicators; instead, we have to adapt to the new scenario analyzed in this section.

The first step is to understand the characteristics of Bitcoin as a currency, its users and the market forces that drive its price variations, as well as the factors that differentiate it from traditional currencies and the considerations that matter when designing a successful prediction model.

There is currently considerable debate about its use: some authors analyze it as a speculative or safe-haven asset, while others maintain that its attractiveness could increase until it ends up fulfilling the functions of money required by economic theory. The article [Inferring causal impact using Bayesian structural time-series models] [34] explores the association between the market price of Bitcoin and a set of internal and external factors using the Bayesian structural time series approach. The results show that Bitcoin has mixed properties, as it currently appears to act as a speculative asset, a safe haven and a potential capital flight instrument.

The Bayesian structural time series (BSTS) model is a machine learning technique used, for example, for feature selection, time series forecasting, nowcasting and causal impact inference.

For time series analysis, it is advisable to use methods that help interpret the information obtained from the sources and extract representative information about the underlying relationships within one series or across several series. This makes it possible, to varying extents and with varying confidence, to extrapolate or interpolate the data and thus predict the behavior of the series at unobserved moments.

Another example is quantitative trading, widely used throughout the financial industry, where price movements are assumed to follow a set of patterns, so that historical prices can be used to predict future ones. Based on this assumption, one can use the latent source model formalized in [A latent source model for nonparametric time series classification] [27], which takes high-dimensional data (such as a time series) and identifies the ways in which the underlying events are characterized in that space. There may be only a small number of primary causes for events, but they are often hidden in the data and difficult to find.

Regarding sources of information, several data sources are easily accessible, such as:

Blockchain.info, which provides information on monetary statistics, network activity, block details, new coin creation rates and transactions. It also includes the USD to bitcoin exchange value, and vice versa, along with its volume.
Google Trends. This platform is a Google Labs tool that shows the most popular search terms of the recent past. Using the word Bitcoin as a query, the main topics related to the cryptocurrency have been obtained.
Macroeconomic data. Macroeconomic data from the S&P 500, the Chicago Board Options Exchange and the Volatility Index.

Finally, the social network Twitter can be a source of information about the reputation of Bitcoin: its concise format and the ease of extracting information in real time can help predict the evolution of the market. The article [Algorithmic trading of cryptocurrency based on Twitter sentiment analysis] [23] supports this hypothesis and shows how two distributions created from the collected data allowed the author to predict the evolution of the market with enough success to reveal a correlation between the market and the sentiment of users on social networks. Along the same lines, the article [The Information of Spam] [2] uses the same source of information but with a different objective: to validate the usefulness of spam for analyzing sentiment on social networks.

From this first review, it can be concluded that all the articles find a correlation between the price of Bitcoin and a set of internal and external factors (including the sentiment of the users themselves). That said, only one of them, [Inferring causal impact using Bayesian structural time-series models] [34], highlights the importance not of sentiment but of the reputation of the currency. It finds a positive relationship between new national legislation on cryptocurrency and price increases; that is, it states that the reputation of the currency is a factor that affects its price. New legislation carries no sentiment in itself, so the techniques outlined in [Algorithmic trading of cryptocurrency based on Twitter sentiment analysis] [23] could not confirm this statement; new techniques must be found to confirm this relationship empirically.

A more recent approach is the project [LSTM Model predicting Bitcoin with Tweet Volume & Sentiment] [14], which explored the options available for creating a model that could predict price action over a selected time period. The variables used were data collected with sentiment analysis tools on Twitter, and the market evolution was predicted with an LSTM (long short-term memory), a recurrent neural network model that was predominant in NLP until the emergence of Transformers, which are now the basis of BERT and many other systems.

Currently there is no article that relates the reputational polarity of Bitcoin to the economic evolution of the market. Although reputational polarity is substantially different from sentiment analysis, the two tasks have points in common that can be exploited; therefore, models such as BERT may provide a higher success rate in predicting the market trend.

Therefore, this project treats Bitcoin as a multifaceted asset that is at once a virtual currency, a hedge and safe-haven asset against geopolitical instability, and a payment method. We apply the state of the art in Natural Language Processing (in particular, contextual word embeddings as implemented in BERT) to the estimation of the reputational polarity of tweets and to the prediction of stock values, comparing it with sentiment analysis.

3. BitTweet dataset

This chapter describes the chosen dataset: how it was obtained, labeled and managed for later use in the experimentation phase.

The chapter is divided into several sections. Section [3.1] describes how the information was obtained, the sources used and how they were managed. Section [3.2] explains how the classification process was carried out, the labeling standards applied to each category and where the labels were stored. Finally, section [3.3] explains how the labeling process was carried out, and section [3.4] summarizes the results.

3.1. Sources of information

This section describes the two sources of information used in the project: on the one hand, the social network Twitter, from which we obtain the comments written by users, and on the other, the website www.blockchain.com, from which we obtain the economic information about Bitcoin.

Section [3.1.1] explains what Twitter is, why we chose this social network, how we captured the information and where we stored it. Section [3.1.2] then provides context on Bitcoin, the type of information we can obtain and how we manage it.

3.1.1 Twitter

2026 reproducibility note. This chapter describes a historical 2019 dataset and should be read as thesis evidence, not as a directly reusable live-collection recipe. X/Twitter API access, pricing, endpoints and redistribution rules have changed. A new replication should document whether it uses X API v2 Client/StreamingClient, the query window, rate or cost limits, tweet-ID hydration strategy, deleted/protected post handling and the ethical basis for collecting social data.

As mentioned before, the main problem with cryptocurrencies is their volatility. To analyze the correlation of prices with the reputational valuation or sentiment of Bitcoin, it is necessary to address this problem by using a data source that offers quick access to breaking news in a concise format and from which data can be extracted with relative ease.

Following the methodology explained in section [1.4] and the review in section [2.2], it was concluded that the social network Twitter can solve this problem. This platform allows users to send short plain-text messages of at most 280 characters. These messages, called tweets, are displayed on the user's home page and can be captured through an API provided by the social network itself [31].

Using the code available at the following GitHub address (https://github.com/al118345/Tweepy/blob/master/tweepy.py), we captured all tweets that contained the word bitcoin in the message, creating a dataset of 792792 records collected from March 28, 2019 to August 27, 2019.

The implementation used Tweepy (https://www.tweepy.org/), a Python library for accessing the Twitter API, to obtain tweets matching the desired criteria, that is, tweets that contain the word Bitcoin and are written in English.
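
The selection criteria above (the message contains the word bitcoin and is written in English) can be sketched offline as a simple predicate. This is not the collection code from the repository, which applied the filter through the Twitter API via Tweepy; it only mirrors the criteria on tweets that have already been downloaded. The function name, the dictionary keys and the sample tweets are assumptions made for illustration.

```python
def matches_collection_criteria(tweet, keyword="bitcoin", language="en"):
    """Return True if the tweet text mentions the keyword (case-insensitive)
    and the tweet is tagged with the expected language code."""
    text = tweet.get("text", "")
    return keyword.lower() in text.lower() and tweet.get("lang") == language

# Hypothetical sample tweets, shaped like simplified Twitter API objects.
sample = [
    {"text": "New #Bitcoin ETF submitted to the SEC", "lang": "en"},
    {"text": "El precio de bitcoin sube", "lang": "es"},
    {"text": "Ethereum upgrade announced", "lang": "en"},
]

kept = [tweet for tweet in sample if matches_collection_criteria(tweet)]
print(len(kept))
```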

The collected information was stored in a MySQL database with the following structure:

ID_rubenPrimary: Integer number that identifies the record.
created_at: Tweet creation date. Example: Fri Nov 02 17:18:31 +0000 2018
id: Integer number that identifies the tweet. Example: 1058408022936977409
text: Text field where the content of the tweet is stored. Example: RT @harmophone: "The innovative crowdsourcing that the Tagboard, Twitter and TEGNA collaboration enables is surfacing locally relevant conv…
sia_feeling: Sentiment assessment by VADER (SIA is the acronym for Sentiment Intensity Analyzer), a sentiment analysis library discussed in section [1.2]. Example: Positive
textblo_feeling: Sentiment assessment by TextBlob, a sentiment analysis library discussed in section [1.2]. Example: Positive
translated text: Text field where the tweet translated by Google Translate is stored. Example: @harmophone:"The innovative crowdsourcing that enables Tagboard collaboration, Twitter and TEGNA is emerging locally. . ."
Information extracted from Twitter but not used in the project, such as:

source:Twitter Web Client.
truncated:false.
in_reply_to_status_id:null.
in_reply_to_user_id:null.
in_reply_to_screen_name:null.
geo:null.
coordinates:null.
place:null.
contributors:null.
retweeted:false.
lang: en.

The project uses only a small number of these fields; the rest were collected for future research that could measure or assess, for example, the impact of the tweet, the place where it was written or who wrote it.
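
As a hedged illustration of how a record with the structure above might be loaded for analysis, the sketch below parses the fields used in the project into a typed object. It assumes that the exported columns carry the same names as the database fields listed above, which is not confirmed by the source; the class name `TweetRecord`, the function `parse_record` and the sample text are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime

# Date layout of the created_at example above: "Fri Nov 02 17:18:31 +0000 2018".
TWITTER_DATE_FORMAT = "%a %b %d %H:%M:%S %z %Y"

@dataclass
class TweetRecord:
    id: int
    created_at: datetime
    text: str
    sia_feeling: str
    textblo_feeling: str

def parse_record(row):
    """Convert a dict of strings (e.g. one CSV row) into a TweetRecord."""
    return TweetRecord(
        id=int(row["id"]),
        created_at=datetime.strptime(row["created_at"], TWITTER_DATE_FORMAT),
        text=row["text"],
        sia_feeling=row["sia_feeling"],
        textblo_feeling=row["textblo_feeling"],
    )

# Values taken from the field examples above; the text is a placeholder.
row = {
    "id": "1058408022936977409",
    "created_at": "Fri Nov 02 17:18:31 +0000 2018",
    "text": "placeholder tweet text",
    "sia_feeling": "Positive",
    "textblo_feeling": "Positive",
}
record = parse_record(row)
print(record.created_at.isoformat())
```

Parsing the creation date into a timezone-aware datetime makes it straightforward to compare tweets with price records later on.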

The resulting information can be consulted in the file TweetsBaseDeData.csv, shared on Zenodo at the following URL (https://zenodo.org/record/3830920).

3.1.2 Bitcoin and Blockchain

As previously mentioned, Bitcoin is an open-source, peer-to-peer digital currency system [24], considered a potential alternative to standard currencies. It uses a cryptographic protocol to control the creation and transfer of money, ensuring that it retains its value and preventing it from being double spent. It is created and transferred without the need for a central governing authority, using computational resources available to any user and transferring directly from one account to another using cryptographic algorithms.

As in other currency markets, Bitcoin has open exchanges that allow consumers and investors to buy and sell bitcoins. The price at which Bitcoin is traded is related to the value perceived by investors, since it is not affected by factors such as the quantity of imported and exported products or the backing of the official bodies of a particular state. Because of this, the price of bitcoin is sustained by its global and decentralized use, that is, by the worldwide supply and demand at a given time.

In this context, Blockchain.info was launched in 2011, a service capable of providing its users with data on the number of transactions, mined Bitcoin blocks, graphs, statistics and resources for developers with the aim of helping cryptocurrency users create an effective commercial strategy.

As explained in section [2.2], this website has an API that provides the main monetary statistics, network activity, block details, creation rates of new coins and the value of the last transaction. This last value is used in the project to analyze the correlation of the Bitcoin price with sentiment and reputational polarity.
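
One common way to quantify such a correlation is the Pearson coefficient. The sketch below is only an illustration under that assumption: the text does not state which correlation measure the project uses, and the two series are invented values, not data from the dataset.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values: share of positive tweets per period and the
# Bitcoin price (USD) observed in the same periods.
positive_share = [0.1, 0.4, 0.2, 0.8]
price_usd = [5000.0, 5300.0, 5100.0, 5700.0]

print(round(pearson(positive_share, price_usd), 3))
```

A coefficient close to 1 or -1 indicates a strong linear relationship; it does not by itself establish that polarity predicts later price movements, which requires comparing series shifted in time.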

At the GitHub address below (https://github.com/al118345/java_client_blockchain/blob/master/client_java_blockchain.java), you can consult the implemented code, which obtains the following data:

unconfirmed transactions: Number of pending unconfirmed transactions.
price24hoursUSD: Weighted price of Bitcoin during the last 24 hours.
marketcap: Total market capitalization.
24hrtransactioncount: Number of transactions made in the last 24 hours.
24numberBitcoinsent: Amount of Bitcoin exchanged in the last 24 hours.
hashreat: Estimated network hash rate in gigahash
difficulty: Current difficulty of the Bitcoin network.
block length: Length of the last mined block.
totalbitcoin: Total number of Bitcoin in circulation.
date: Record creation date.

The database generated for the project can be consulted in the file blockchainInfo.csv, shared on zenodo.org (https://zenodo.org/record/4008108).

The file is composed of 20604 records that correspond to the data taken from March 20, 2019 to August 27, 2019.
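
Because tweets and price records are collected at different moments, linking them requires some alignment rule. The sketch below shows one possible rule, chosen here as an assumption and not necessarily the one used in the project: each tweet is paired with the most recent price record at or before its creation time. The function name, timestamps and prices are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime

def latest_price_before(price_times, prices, moment):
    """Return the price of the latest record at or before `moment`.

    `price_times` must be sorted in ascending order and aligned with `prices`.
    Returns None when `moment` precedes the first record.
    """
    position = bisect_right(price_times, moment)
    if position == 0:
        return None
    return prices[position - 1]

# Hypothetical price records (timestamps and USD values are invented).
price_times = [
    datetime(2019, 3, 28, 12, 0),
    datetime(2019, 3, 28, 12, 10),
    datetime(2019, 3, 28, 12, 20),
]
prices = [4000.0, 4010.0, 4005.0]

tweet_time = datetime(2019, 3, 28, 12, 15)
print(latest_price_before(price_times, prices, tweet_time))
```

Using only records at or before the tweet avoids leaking future price information into the features of a prediction model.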

3.2. Collection

This section explains how the classification process was carried out and the labeling standards applied to each category.

First of all, recall that the project manages two different types of labels:

Reputational Polarity
Sentiment Analysis.

Subsection [3.2.1] analyzes how a tweet should be labeled according to how it affects the reputation of Bitcoin, and subsection [3.2.2] analyzes how it should be labeled with respect to sentiment.

3.2.1 Reputational Polarity

As we saw in section [2.1], how a tweet affects the reputation of the currency is more important than the sentiment it expresses.

Taking this statement into account, the annotation rules used to label tweets for reputational polarity were the following:

Positive: Any tweet that:

Values Bitcoin as a stable and safe investment system.
Presents Bitcoin as a currency with real use.
Makes a positive prediction about the evolution of its price.
Explains a technological advance.
Gives examples of use.

Neutral: Any tweet that:

Does not provide any new or useful information.
Reports the current price of Bitcoin.
Is advertising.
Offers tips.

Negative: Any tweet that:

Does not value Bitcoin as a stable and safe investment system.
Criticizes the use of Bitcoin.
Talks about problems related to its use.
Associates Bitcoin with criminal acts.

From this point on, we present examples that helped the annotator carry out the labeling. We start with the Positives, which include all tweets with content similar to the following:

All tweets that contain the phrase Drop Gold. Although abandoning gold can have a negative connotation in terms of sentiment, from the reputational point of view it is positive: within the cryptocurrency community there is a movement in favor of replacing gold with Bitcoin.
Tweets about ETFs are positive, since an ETF is an agile way to invest, with less risk and at a better cost than a mutual fund. A Bitcoin ETF, or exchange-traded fund, represents a very important advance for the cryptocurrency since it would allow more investors to enter. An example could be:

Hey, check this out: [New Bitcoin ETF (BTC) and Ethereum (ETH) submitted to the SEC] (through the Quarry app) https://t.co/Ie5q6Y9QWO

Positive price predictions for Bitcoin. For example, any tweet that reports a +5% increase in the last hour, or an increase with respect to the current price, should be considered positive.

$11,500 #bitcoin Price Will Absolutely Become a Reality in 2019 (https://t.co/uQB3ttUSid) https://t.co/n5xFOm17m3

Lot's of green today for crypto!! #bitcoin #bitcoinrich ( https://t.co/SWEM4EjblH)

Using Bitcoin as a synonym for security:

@brendan_dharma Well it happened on bitcoin and therefore would not be a scam

Technological or legal advances are also considered positive since, despite having a neutral sentiment, they have a positive impact on the reputation of the currency. For example:

New #Blockchain Service Builds Worldwide Standardized Verification System For Certificates @newsbtc - https://t.co/3qBnN86cf6 #bitcoin #cryptocurrency #ethereum #crypto #tech #btc #blockchaintechnology #fintech #ecosystem #ICO #Ethereum #IoT #AI #BigData #altcoin #ETH https://t.co/eyorvyvJ7c

OCF aims to transform philanthropy to detect the world's first decentralized charitable foundation to build a future in which blockchain technology can avoid ending all forms of poverty and inequality.#ooobtc #obx #crypto #bitcoin #ethereum #blockchain #btc #toqqn

Adoption by large companies.

Facebook rolls back ban on cryptocurrency ads as it ramps up its own blockchain efforts #cryptocurrency #btc #bitcoin (https://t.co/MAIqyT1XbJ)

RT @crypto__mak: NYSE Arca Wants to List Bitcoin and T-Bill-Backed Fund (https://t.co/MQXHCXArKv) #News #bitcoin #nysearca

Criticism of those who do not use Bitcoin as a currency. These tweets are negative in sentiment, since they express criticism, but positive for reputation. For example:

Google: NoCoiner ... I would post it here, but Twatter only lets me write not enough words ;)

Competitions. They are considered positive because, apart from publicizing the currency, they recognize its value as an object of desire for the participants. They also show the usefulness of the currency and interest in it. Example:

RT brought a MEGA CONTEST to Freebitcoin Follow us on Instagram for updates bitcoin freebitcoin crypto crypto

Tweets with positive remarks about its operation or positive statements about the currency are considered positive:

RT @CryptoBac: #btc crypto #cryptocurrency Everything is going great here!

Bitcoin presented as a solution to financial collapses. Despite the negative sentiment carried by the word collapse, the phrase "Don't be one..." presents Bitcoin as the solution to a problem, so the reputational polarity is positive.

RT @ArminVanBitcoin: Accumulate #Bitcoin today. Survive the big financial collapse tomorrow. None of my friends are listening. Don't be one

Within the Neutrals, all tweets with content similar to that shown below will be included:

Tweets that contain basic or content-free questions about Bitcoin, such as "Do you know bitcoin?" or "Have you heard of bitcoin?", which provide no information about reputational polarity.
Short tweets with little or no useful information such as:

RT @azbit_news:

Dollar Bitcoin

Economic information about Bitcoin. An example could be data on the current price of Bitcoin or the global capitalization with respect to the dollar as shown below:

@ #1, Bitcoin with unit price of $5,926.35, market cap of $104,823,794,536 (56.12%), and 24 hr vol. of $17,981,007,232.3 (31.74%)

Events on blockchain or Bitcoin topics that do not provide any benefit.

Don't forget the Tampa Bay #Bitcoin meetup tomorrow. RSVP while you still can: https://t.co/wSff9z2lPz

Advertising on cryptocurrency exchange platforms, such as:

https://t.co/J8amkmiqmE The most popular cryptocurrency exchange #cryptoexchange #blockchain

Telegram groups about Bitcoin.

RT @authpaper: Don't forget to also join our #telegram group to earn more #bounty rewards! Telegram link: https://t.co/xi6hNWnFGy #AUPC #A

Guides to learn more about Bitcoin, tips on how to use it or information with no reputational implications.

RT @MervikHaums: Yes! You own your funds only if you own your keys. #binance #bnb #hacked #bitcoin #btc #toqqn #tqn #crypto #exchange http

Blockchain: Bitcoin, Ethereum, Cryptocurrency: The Insiders Guide to Blockchain Technology, Bitcoin Mining, Investing and Trading Cryptocurrencies (Blockchain business, & Blockchain for Dummies) https://t.co/2Y7EDhqFIb #blockchain #ad

Within the Negatives, all tweets with content similar to that shown below will be included:

Tweets that present Craig Wright as the creator of bitcoin, or that are positive towards him, are considered negative. This name has a negative reputation in the Bitcoin community, and therefore everything related to it has a negative connotation.

Satoshi Files: Calvin Ayre Teases 'More Evidence' Craig Wright Created Bitcoin https://t.co/wYkBf416o9

Derogatory comparisons with Bitcoin.

@JamesTodaroMD @TusharJain_ Bitcoin will never be free state money! But Ethereum will! Negative correct

Those referring to the lack of legislation or legal problems.
All those tweets that report or analyze hacks:

Hackers Steal $40.7 Million in #Bitcoin From Crypto Exchange Binance https://t.co/rMAQVRsKLN

https://t.co/Qc5JBeuw4B @LukeDashjr at 36:30 Cz from Binance said some community members and core bitcoin devs offered to roll back as a tech solution? what core member offered this? would be interesting to know

All tweets where bitcoin is associated with illegal payments.

Are you paying for this media coverage in bitcoin or rubles Nigel Farage? You've been investigated for funding irregularities before - you will be again. #youwontgetawaywithitforever #charlaton #TuesdayTruths #sideofabuslies

Those that talk about negative economic terms such as price drops or possible corrections of Bitcoin.

Is #bitcoin Due for a Correction? for BITMEX:XBTUSD by oh92 #XBTUSD https://t.co/Rgkbt2wpAO https://t.co/pcjJ8j8E5p

RT @CredibleCrypto: There are ALWAYS pullbacks, so stop fomo-ing if you miss a leg up and prep your plan to buy the next correction. https:…

3.2.2 Sentiment assessment

In this section we apply the same labeling process as in the previous section, but from the perspective of sentiment. To do this, we analyze the content of the tweet itself in order to label it as positive, negative or neutral with respect to the sentiment expressed about Bitcoin.

Positive: Any tweet that:

Rates Bitcoin as something positive.
Contains positive words.
Reports price increases.
Is advertising.
Makes positive comparisons.

Neutral: Any tweet that:

Is a tutorial.
Gives economic information about Bitcoin.
Gives an example of use.

Negative: Any tweet that:

Writes about cyber attacks.
Writes about problems related to its use.
Associates Bitcoin with criminal acts.
Shows contempt towards Bitcoin.
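
Since every tweet receives both a reputational label and a sentiment label, the difference between the two annotation layers can be quantified directly. The sketch below computes a simple agreement rate and a table of label pairs. The four pairs are taken from cases described in this chapter (advertising, a tweet calling Bitcoin "not a scam", criticism of non-users, and a technological advance); the function names are hypothetical, and a real analysis would run over the full labeled dataset.

```python
from collections import Counter

def agreement_rate(pairs):
    """Fraction of tweets whose reputational and sentiment labels coincide."""
    if not pairs:
        return 0.0
    return sum(1 for reputation, sentiment in pairs if reputation == sentiment) / len(pairs)

def label_pair_counts(pairs):
    """Count how often each (reputation, sentiment) combination occurs."""
    return Counter(pairs)

# (reputational polarity, sentiment) for cases described in this chapter.
labels = [
    ("Neutral", "Positive"),   # advertising for an exchange platform
    ("Positive", "Positive"),  # Bitcoin described as "not a scam"
    ("Positive", "Negative"),  # criticism of those who do not use Bitcoin
    ("Positive", "Neutral"),   # technological or legal advance
]

print(agreement_rate(labels))
print(label_pair_counts(labels).most_common())
```

Pairs that fall off the diagonal of this table are the cases in which a generic sentiment classifier would misjudge the reputational impact of a tweet.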

From this point on, we present examples that helped the annotator carry out the labeling. We start with the Positives, which include all tweets with the following characteristics:

The content of the tweet has positive words like:

(https://t.co/J8amkmiqmE)The most popular cryptocurrency exchange#cryptoexchange #blockchain

Bitcoin, Ethereum, Ripple and IOTA Are The Most Important Projects among 1500+ Cryptocurrencies, KPMG Report(https://t.co/ZDhJDnnoml)#Bitcoin #BitcoinLifestyle (https://t.co/uQEFmPmwGb)

Bitcoin is defined as something real and not a scam

@brendan_dharma Well it happened on bitcoin and therefore would not be a scam

Tweets where cryptocurrency is presented as a solution to problems:

ByzCoin has the potential to overcome the lag through scalable collective signing, committing #Bitcoin transactions irreversibly within seconds. Watch @brynosaurus present an outline and how it can be a solution to Bitcoin scalability (https://t.co/GBfd7iN0bA) #blockchain ( https://t.co/GBfd7iN0bA ( https://t.co/GBfd7iN0bA ) #blockchain )

Positive comparison of Bitcoin over another Cryptocurrency.

@brucefenton The problem is that Litecoin is worse than Bitcoin on all points. And that what Litecoin do mostly is copying Bitcoin.Betting on a different coin than Bitcoin is fine. But so far no coin has been better.

Positive publicity about Bitcoin.

RT @ProofOfSteve: Every time we open and close above one of these trend lines we go straight up. Guess what, this is the 4th time in BTC history this has happened. #BTC #bitcoin #crypto #hodl $btc $bitcoin (https://t.co/8bdQ0I0Mtd)

Tweets about the rise in the price of Bitcoin.

Bitcoin Soars Above $7,000 As Crypto Comeback Continues ( https://t.co/KQ7U0bRwjR) #Money #Finance #Economics #Market

All those tweets where, despite containing negative terms, the way they are used and their context make the tweet positive.

Bitcoin whales are smart money. Don't be stupid money (https://t.co/bDXT5tOaK2)

Continuing with the process, all tweets with content similar to the following are considered Neutral:

All those that present an information guide, a tutorial or any type of technical help to users, such as:

Blockchain: Bitcoin, Ethereum, Cryptocurrency: The Insider's Guide to Blockchain Technology, Bitcoin Mining, Investing and Trading Cryptocurrencies (Blockchain business, & Blockchain for Dummies) (https://t.co/2Y7EDhqFIb) #blockchain #ad

Technical stock-market analyses that express no sentiment, such as:

Bitcoin 55k target came just short Good example of why OBV has been more important lately than RSI Bear div started on RSI but not OBV Once OBV showed div is when it dropped Top Goon bounce and just gave same signal on 12H 4648k area to watch

@ #1, Bitcoin with unit price of $5,926.35, market cap of $104,823,794,536 (56.12%), and 24 hr vol. of $17,981,007,232.3 (31.74%)

#ETH Buy at #Paribu and sell at #Gate.io. Ratio: 0.92% Buy at #Koinim and sell at #Bitfinex. Ratio: 4.76% Buy at #BtcTurk and sell at #Bittrex. Ratio: 1.04% Buy at #BtcTurk and sell at #Bitfinex. Ratio: 6.06% #bitcoin #arbitrage #arbitraj #arbingtool https://t.co/xiFUPzcOcC

Tweets with information about the use of Bitcoin where no feelings are expressed.

University Students Choose One Dollar Over One Bitcoin @bitcoinist #Bitcoin #Bitcoin Acceptance #Bitcoin Education #Bitcoin Price #bitcoin #dollar #students ( https://t.co/Gs06qUFBEd)

Tweets about books about Bitcoin.

Download EPUB Mastering Bitcoin: Programming the Open Blockchain https://t.co/3nXIw3Eerh

Finally, all tweets with content similar to the following are considered Negative:

Tweets where negative words or phrases appear, such as "collapse" or "they are not listening":

RT @ArminVanBitcoin: Accumulate #Bitcoin today. Survive the big financial collapse tomorrow. None of my friends are listening. Don't be one

All those that contain the term Drop Gold, since getting rid of gold carries a negative connotation for the sentiment.

Does Grayscale’s Latest ##DropGold for #Crypto Effort Entirely Miss the Point? (https://t.co/TD7U54ENzi) #bitcoin

Tweets written with words that have a negative meaning regarding the value of Bitcoin, such as the word "correction" or "decrease."

Is #bitcoin Due for a Correction? for BITMEX:XBTUSD by oh92 #XBTUSD https://t.co/Rgkbt2wpAO (https://t.co/pcjJ8j8E5p)

Bitcoin (BTC) Price Weekly Forecast: Technical Bias Signaling Fresh Increase (https://t.co/qwgbNrkgY1) #Bitcoin #Cryptocurrency #Analysis #BTC #Technical

@brucefenton The problem is that Litecoin is worse than Bitcoin on all point. And that what Litecoin do mostly is copying Bitcoin.

Tweets about cyber attacks, bots or technical problems

Hackers Steal $40.7 Million in #Bitcoin From Crypto Exchange Binance (https://t.co/rMAQVRsKLN)

Homeland Security Warns Bots Are Exploiting Decentralized Crypto Exchanges #bitcoin #ripple #altcoin #cryptocurrencymarket #SmartCash #cryptonews #coldwallet #er20( https://t.co/xcQh00U6zw)

@binance quit holding our funds hosting. If we wanted our money to have delays we used fiat. Your damage has already been assessed so there is no reason for this continuation. Binance BinanceHack btc bitcoin

Tweets about actions of dubious legality or with a negative connotation.

Derogatory Bitcoin Comparisons

@JamesTodaroMD @TusharJain_ Bitcoin will never be free state money! But Ethereum will!

@cryptochrisw absolutely #securypto product matter.. Without product it has no function at all! And become waste! #cryptocurrency #bitcoin #altcoin

Vicious Crypto Crash Could Supercharge Bitcoin Price Rally to $20,000 (https://t.co/DzuD0HCYgv)

3.3. Labeling process

The objective of this section is to analyze which type of interface and infrastructure was most appropriate for the tagger, that is, the program in charge of collecting, displaying and storing the rating of a tweet from the point of view of reputational polarity and sentiment.

Because labeling is a long, repetitive task with multiple options, it is very easy to make mistakes. To reduce errors, we chose to use the website shown in figure [3.1]. It implements a friendly interface that tries to optimize the labeler's effort through multiple colors, rows and a responsive design based on Bootstrap, giving the labeler complete freedom to choose the device that best suits their way of working.

The first part of the interface contains the reputational polarity labeling. The row is divided into buttons, and each button has a different color depending on the reputational polarity it represents: Positive, Neutral, Negative and Doubtful.

The second part of the form corresponds to the sentiment labeling. In this row, the user selects, among the available checkboxes, the label that reflects their perception of the sentiment of the tweet. It complements the top row, and only a single checkbox can be selected.

Finally, the doubtful polarity is used to store those tweets whose relation to Bitcoin is in doubt. In this way they are set apart from the rest so that they can be analyzed individually later.

Following a principle of simplicity, the website can be viewed from any device, both mobile phones and computers, so that the user can label regardless of the device used. Operation is equally simple: clicking a button stores the information associated with that button and the selected checkbox.

To facilitate the creation of the dataset, a dynamic tweet selector was implemented: each time the website is reloaded, it randomly selects the tweet to be analyzed. The aim is a training set that is as realistic and as spread over time as possible, so that the labeled tweets cover different news, topics and opinions. The labeler is also shown information such as the creation date, number of transactions and price to help with the task.

The labeling website is available at (http://test.1938.com.es/web_probas_v2.php).

3.1 Web-based graphical labeling interface

3.4. Discussion

Our manual annotations on the BitTweet collection are, to our knowledge, the first manually labeled dataset in which the difference between sentiment and reputational polarity can be quantified. In the RepLab reference dataset, reputational polarity is annotated but sentiment is not, and most other datasets annotate only sentiment. Figure [3.2] shows the confusion matrix between both annotations. In 37% of the cases (600 tweets out of a total of 1145), the annotations do not coincide. The most frequent discrepancies are, in this order: (1) positive sentiment with neutral reputational polarity; (2) neutral sentiment with positive reputational polarity; (3) negative sentiment with positive reputational polarity. This confirms the intuition that Reputational Polarity and Sentiment Analysis are two different ways of measuring the online reputation of a brand.

3.2 Confusion matrix comparing reputational polarity with sentiment analysis.
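To make the comparison concrete, the following minimal sketch shows one way such a cross-tabulation and its disagreement rate could be computed. The function names and the toy annotations are hypothetical and are not taken from the thesis code or from the dataset.

```python
from collections import Counter

LABELS = ("positive", "neutral", "negative")


def cross_tabulate(sentiment, polarity):
    """Count how often each (sentiment, polarity) label pair occurs."""
    if len(sentiment) != len(polarity):
        raise ValueError("both annotation lists must have the same length")
    return Counter(zip(sentiment, polarity))


def disagreement_rate(table):
    """Fraction of tweets whose two annotations differ."""
    total = sum(table.values())
    differing = sum(n for (s, p), n in table.items() if s != p)
    return differing / total


# Hypothetical toy annotations, for illustration only.
sentiment = ["positive", "neutral", "negative", "positive", "neutral"]
polarity = ["neutral", "positive", "positive", "positive", "neutral"]

table = cross_tabulate(sentiment, polarity)
rate = disagreement_rate(table)
for s in LABELS:
    # One row per sentiment label, one column per polarity label.
    print(s, [table[(s, p)] for p in LABELS])
print(f"disagreement: {rate:.0%}")
```

Rows correspond to sentiment and columns to reputational polarity; the off-diagonal cells are the discrepancies discussed in this section.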

Of course, measuring the reputational polarity of a text is more complicated when the document does not explicitly express a positive or negative opinion about the entity. In the labeling process there have been cases, such as technological advances, with completely neutral sentiment that are nevertheless very positive for the reputation of Bitcoin.

Technological or legal advances are also considered a positive aspect since, despite having a neutral sentiment, they have a positive impact on the reputation of Bitcoin. An example could be the following tweet:

Another example, with positive reputational polarity and negative sentiment, is criticism of people who do not use Bitcoin, as in the following tweet:

Google: NoCoiner ... I would post it here, but Twatter only lets me write not enough words ;)

As the matrix shows, we encountered a large number of tweets of this type during the collection period: exactly 168 tweets in which sentiment and reputational polarity are opposite.

We also detected a large number of tweets that are positive in sentiment and neutral in reputational polarity, such as advertising:

Find the Largest Telegram group provide Free Crypto BOT; Crypto Signal Bitcoin forum - Discuss and Learn About Cryptocurrency

4. Experimental design

Reproducibility checklist

Keep the chronological train/test split, preprocessing, label mapping, random seeds, package versions and Bitcoin price source together. A random split can leak future language into training and overstate predictive performance.
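A chronological split of the kind mentioned in the checklist can be sketched as follows. This is only an illustration under stated assumptions: the record layout, the function name and the 1 July cutoff are hypothetical choices, not the thesis code.

```python
from datetime import datetime


def chronological_split(records, cutoff):
    """Split (timestamp, text) records so that every test item is at or after cutoff."""
    ordered = sorted(records, key=lambda r: r[0])
    train = [r for r in ordered if r[0] < cutoff]
    test = [r for r in ordered if r[0] >= cutoff]
    return train, test


# Hypothetical records; timestamps and texts are illustrative only.
records = [
    (datetime(2019, 7, 2, 9), "tweet d"),
    (datetime(2019, 4, 1, 12), "tweet a"),
    (datetime(2019, 6, 30, 23), "tweet c"),
    (datetime(2019, 5, 15, 8), "tweet b"),
]
train, test = chronological_split(records, cutoff=datetime(2019, 7, 1))
print([text for _, text in train])
print([text for _, text in test])
```

Because the split is made by timestamp rather than at random, no tweet written after the cutoff can influence training.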

This chapter describes the experimental design used to compare the role of sentiment analysis and reputational polarity in predicting stock values. The technique explained in section [2.1.1] will be implemented. As already discussed, BERT is the NLP algorithm that best suits the project. Based on the dataset created in chapter [3], a model can be trained to predict the polarity and sentiment of tweets obtained from Twitter and, finally, to analyze their correlation with the price of Bitcoin.

To do this, the chapter begins with section [4.1], which explains VADER, a Python library for sentiment analysis used in the project as a reference model or baseline.

Section [4.2] then presents two implementation options of the BERT algorithm for sentiment analysis and reputational polarity. Section [4.3] analyzes which implementation obtains the best results for detecting reputational polarity in BitTweet, and section [4.4] does the same for sentiment. Section [4.5] presents the conclusions drawn from these results, and section [4.6] presents the implementation chosen for the stock market prediction.

4.1. Sentiment Analysis System with VADER

This section aims to find an algorithm or model that can be used as a baseline. We look for a system implemented as a Python library that avoids the cost of a manual labeling process, can be integrated into the tweet collection process and has been validated by different studies.

Among the different possibilities, this project uses VADER (Valence Aware Dictionary and Sentiment Reasoner) ([https://github.com/cjhutto/vaderSentiment]), a rule- and lexicon-based sentiment analysis tool that is specifically attuned to sentiments expressed on social media.

As recalled from section [1.2], algorithms for sentiment analysis are based on a dictionary of words, each of which is rated according to how positive or negative it is. Table [4.1] gives an example in which more positive words have higher ratings and more negative words have lower ratings.

words	Sentiment rating
tragedy	-3.4
rejoiced	2.0
insane	-1.7
disaster	-3.1
great	3.1

4.1 Example extracted from VADER's lexicon

When VADER analyzes a fragment of text, it checks whether any of the words in the text are present in its lexicon. For example, the sentence "The food is good and the atmosphere is nice" has two words in the lexicon (good and nice) with ratings of 1.9 and 1.8 respectively.

From this information VADER returns four sentiment metrics. The first three, positive, neutral and negative, represent the proportion of the text that falls into each category. For example, table [4.2] shows the result obtained with the sentence "The food is good and the atmosphere is nice", which VADER rated as 45% positive, 55% neutral and 0% negative. The final metric, the compound score, is the sum of all lexicon ratings (1.9 and 1.8 in this case) normalized to range between -1 and 1. Our example sentence has a compound score of 0.69, which is quite positive.

Sentiment metric	Value
Positive	0.45
Neutral	0.55
Negative	0.00
Compound	0.69

4.2 Result of applying VADER to the sentence The food is good and the atmosphere is nice
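The lexicon lookup and the final normalization can be illustrated with a few lines of plain Python. This is a simplified sketch, not VADER itself: the tiny lexicon only holds ratings quoted in this section, and the constant `ALPHA = 15` is our assumption about the normalization used by the reference implementation. VADER's additional rules (negation, intensifiers, punctuation) are omitted.

```python
import math

# Ratings quoted in this section; the real lexicon has thousands of entries.
LEXICON = {"good": 1.9, "nice": 1.8, "great": 3.1, "disaster": -3.1}
ALPHA = 15  # assumption: normalization constant of the reference implementation


def normalize(score, alpha=ALPHA):
    """Squash an unbounded rating sum into the open interval (-1, 1)."""
    return score / math.sqrt(score * score + alpha)


def compound(sentence):
    """Sum the ratings of the known words and normalize the total."""
    words = sentence.lower().split()
    total = sum(LEXICON.get(w.strip(".,!?"), 0.0) for w in words)
    return normalize(total)


score = compound("The food is good and the atmosphere is nice")
print(round(score, 2))
```

Under these assumptions the sum 1.9 + 1.8 = 3.7 is mapped to roughly 0.69, which agrees with the value reported in the table.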

As also mentioned in section [1.2], expressions such as emoticons or exclamations carry useful meaning for interpreting sentiment. VADER applies this principle, as can be seen in the sentence I just got a call from my boss, does he realize it's Saturday? smh :'') which obtains the following classification:

Negative: 0.321
Neutral: 0.679
Positive: 0.0
Compound: -0.6369

That is, VADER interprets the emoticon and rates the sentence as even more intensely negative.

It also takes context into account, as shown by comparing the sentence The food is good with the sentence The food is GOOD. One of the main features of VADER is its ability to recognize capitalization, which increases the intensity of positive and negative words. Table [4.3] shows how the capital letters in "GOOD" increase the positive intensity of the entire sentence.

	The food is good	The food is GOOD
Positive	0.492	0.548
Neutral	0.508	0.452
Negative	0.00	0.00
Compound	0.4404	0.5622

4.3 Comparative table of two tweets using Vader
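The effect of capitalization on the compound score can be reproduced with a short sketch. The increment `C_INCR = 0.733` and the constant `ALPHA = 15` are our assumptions about the reference implementation, and the helper names are hypothetical; only the rating 1.9 for "good" comes from this section.

```python
import math

C_INCR = 0.733  # assumption: ALL-CAPS increment of the reference implementation
ALPHA = 15      # assumption: normalization constant of the reference implementation


def normalize(score, alpha=ALPHA):
    """Map an unbounded rating sum into the open interval (-1, 1)."""
    return score / math.sqrt(score * score + alpha)


def emphasized_rating(word, rating, sentence_words):
    """Boost a rating when the word is upper-case inside a mixed-case sentence."""
    mixed_case = any(not w.isupper() for w in sentence_words)
    if word.isupper() and mixed_case:
        return rating + C_INCR if rating > 0 else rating - C_INCR
    return rating


plain = "The food is good".split()
caps = "The food is GOOD".split()
plain_score = normalize(emphasized_rating(plain[-1], 1.9, plain))
caps_score = normalize(emphasized_rating(caps[-1], 1.9, caps))
print(round(plain_score, 4), round(caps_score, 4))
```

With these assumed constants the two scores come out at about 0.4404 and 0.5622, matching the Compound row of the table.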

Exclamation points and modifying words are two other elements that increase the intensity of the sentiment of a sentence, and VADER takes them into account.

The code used can be consulted at ([https://github.com/al118345/Vader/blob/master/Analisis_Sentimiento.py]).

4.2. Sentiment Analysis and Reputational Polarity Systems with BERT

As discussed above, BERT is a bidirectional model based on the Transformer architecture introduced in the article "Attention is all you need" [20], which replaces the sequential nature of recurrent neural networks (long short-term memory, LSTM [18], and gated recurrent units, GRU [21]) with a much faster attention-based approach. The model is pre-trained on two unsupervised tasks: masked language modeling and next sentence prediction. This allows programmers to take a pre-trained BERT model and fine-tune it for the specific task desired, in our case sentiment classification and reputational polarity.

At this point we focus on applying BERT to the problem of text classification. The task consists of classifying each document in the dataset according to its sentiment and polarity. Since we use a different model for each task, each document has a single label per task representing its sentiment or polarity regarding Bitcoin. This label takes one of the following three values:

Positive
Negative
Neutral

Depending on the purpose of the model, the implementation selects the training dataset: either the one for reputational polarity or the one for sentiment analysis. There is no other difference between both implementations.

In turn, the results obtained by BERT are stored in a MySQL database for future consultation and study.

To create the model, Google Research shared a TensorFlow-based implementation along with the following pre-trained models:

BERT-Base, Uncased: 12-layer, 768-hidden, 12-heads, 110M parameters
BERT-Large, Uncased: 24-layer, 1024-hidden, 16-heads, 340M parameters
BERT-Base, Cased: 12-layer, 768-hidden, 12-heads, 110M parameters
BERT-Large, Cased: 24-layer, 1024-hidden, 16-heads, 340M parameters
BERT-Base, Multilingual Cased (New, recommended): 104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
BERT-Base, Chinese: Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads, 110M parameters

From this point the project is subdivided into two different implementations. The first is based on code obtained from GitHub (a collaborative development platform for hosting projects under the Git version control system) through the link in [15]. The second uses the example provided by the Google Research team at ([https://github.com/google-research/bert]).

4.2.1 Implementation 1

In this implementation, the BERT-Base, Uncased model with 12 layers was chosen, together with a tokenizer that converts all text to lowercase.

Additionally, PyTorch was used in place of TensorFlow, through the PyTorch port of BERT published by HuggingFace at ([https://github.com/huggingface/pytorch-pretrained-BERT]). As the author explains, the previously trained TensorFlow checkpoints are converted into PyTorch weights using the HuggingFace script.

Another necessary step is to convert the tweet into a data type that BERT can use. Figure [4.1] shows an example of this conversion process for the text "my dog is cute. he likes playing".

4.1 Tweet conversion process. Source: ([Medium.com])

The chosen pre-trained model uses a vocabulary of 30,522 tokens. Tokenization breaks the tweet down into a list of tokens that are available in the vocabulary. To deal with words that are not in the vocabulary, BERT uses WordPiece tokenization, a technique similar to BPE (Byte Pair Encoding). In this approach, an out-of-vocabulary word is progressively divided into subwords, and the word is represented by that group of subwords. Since the subwords are part of the vocabulary, contextual representations have been learned for them, and the representation of the word is simply the combination of the representations of its subwords.
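The subword splitting can be sketched as a greedy longest-match-first search over the vocabulary. The function below is a simplified illustration written for this text, not the tokenizer shipped with BERT, and the toy vocabulary is hypothetical.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # marks a continuation of the word
            if piece in vocab:
                match = piece
                break
            end -= 1  # try a shorter candidate
        if match is None:
            return [unk]  # no piece fits: the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary; the real one has 30,522 entries.
vocab = {"play", "##ing", "bit", "##coin", "hod", "##l"}
print(wordpiece("playing", vocab))
print(wordpiece("bitcoin", vocab))
print(wordpiece("xyz", vocab))
```

With this toy vocabulary, "playing" is split into "play" and "##ing", while a word with no matching pieces falls back to the unknown token.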

Regarding the program configuration, we train the model for 4 iterations with a batch size of 32 and a sequence length of 512, that is, the maximum possible for pre-trained models. The learning rate was kept at $3 \times 10^{-5}$, as recommended in the original paper.

The code used can be consulted at ([https://github.com/al118345/bert-toxic-comments-multilabel/blob/master/test_bert-toxic-comments-multilabel.py]).

4.2.2 Implementation 2

In this second implementation, we adapted the code published by google-research on GitHub ([https://github.com/google-research/bert]), which provides an adaptation of the BERT algorithm. This implementation is very similar to the one in section [4.2.1], since implementation 1 is based on the code used in this example. Both therefore share a large part of the code and configuration, for example the pre-trained model chosen, which is again BERT-Base, Uncased with 12 layers.

The first feature to highlight in Google Research's implementation is the use of TF Hub ([link]) as a module added to TensorFlow text pipelines.

The next point is how to transform the tweets into a format that BERT understands. The first step is to select which data will be passed as input to the constructor provided in the BERT library. This constructor has three input components:

text_a is the text that we want to classify, that is, the tweet.
text_b is used when training a model to understand the relationship between sentences (e.g. is text_b a translation of text_a? Is text_b an answer to the question asked by text_a?). This does not apply to the project, so it is left blank.
label is the label given to the tweet. In the implementation used it is True or False.

The second part of the transformation is to tokenize the information, which involves the following tasks:

Convert the tweet to lowercase.
Split words into WordPieces (e.g. "calling" becomes "call" and "##ing").
Assign an index to each token.
Add the special tokens "[CLS]" and "[SEP]".
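The last two steps of the list, together with the padding needed to reach a fixed sequence length, can be sketched as follows. This is a minimal illustration under assumptions: the function name, the toy vocabulary and its ids are hypothetical, and the WordPiece split is assumed to have been done already.

```python
def encode(tokens, vocab, max_len):
    """Wrap tokens in [CLS]/[SEP], map them to ids and pad to max_len."""
    body = tokens[: max_len - 2]  # leave room for the two special tokens
    wrapped = ["[CLS]"] + body + ["[SEP]"]
    ids = [vocab.get(t, vocab["[UNK]"]) for t in wrapped]
    mask = [1] * len(ids)  # 1 marks real tokens, 0 marks padding
    padding = max_len - len(ids)
    return ids + [vocab["[PAD]"]] * padding, mask + [0] * padding


# Hypothetical toy vocabulary with made-up ids.
vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
         "bitcoin": 4, "is": 5, "call": 6, "##ing": 7}
tokens = "Bitcoin is call ##ing".lower().split()
ids, mask = encode(tokens, vocab, max_len=8)
print(ids)
print(mask)
```

The attention mask lets the model ignore the padded positions, so tweets of different lengths can share one batch.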

Once this process is finished, the next point to highlight is the model configuration. The algorithm was run for 3 iterations with a batch size of 32 and a sequence length of 300. The learning rate was kept at $3 \times 10^{-5}$.

The code used can be consulted at ([link]).

4.3. Reputational Polarity Analysis System with BERT using the manually labeled BitTweet

In this section we adapt both implementations to predict the reputational polarity of a set of tweets. As previously specified, reputational polarity is labeled with 3 possible values: positive, negative and neutral. This requirement meant adapting both implementations as follows:

In implementation 1, we have converted the labels into vectors of zeros and ones, so that a tweet can have one of the following tags:

positive= $[0, 1, 0]$
negative= $[1, 0, 0]$
neutral= $[0, 0, 1]$

([link])

Hit rate: 75.50%

In implementation 2, we have converted the tags to numbers from 0 to 2, so a tweet can have one of the following tags:

positive= $1$
negative= $0$
neutral= $2$

Hit rate: 87.03%

([link])

Comparing the results shows that implementation 2 is more accurate and is therefore the option used in the project.

The results obtained were:

Implementation 1: 75.50% hit rate
Implementation 2: 87.03% hit rate
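The two label encodings described in this section can be written down in a few lines. The helper names are hypothetical; only the mapping itself (negative = 0, positive = 1, neutral = 2) comes from the text.

```python
# Integer ids as used in implementation 2.
INDEX = {"negative": 0, "positive": 1, "neutral": 2}


def to_index(label):
    """Integer class id, as in implementation 2."""
    return INDEX[label]


def to_one_hot(label):
    """Vector of zeros with a single one, as in implementation 1."""
    vector = [0] * len(INDEX)
    vector[INDEX[label]] = 1
    return vector


def from_scores(vector):
    """Recover the label from a one-hot or score vector via argmax."""
    best = max(range(len(vector)), key=vector.__getitem__)
    return next(name for name, i in INDEX.items() if i == best)


for name in INDEX:
    print(name, to_index(name), to_one_hot(name))
```

Both encodings carry the same information; the position of the one in the vector is exactly the integer id.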

4.4. Sentiment Analysis System with BERT using the manually labeled BitTweet

In this section we adapt both implementations in the same way, that is, we modify them to fit the labeled dataset, although with a different objective. This time the labels of the tweets are changed to predict sentiment according to the established system of 3 labels: positive, negative and neutral.

To carry out this task the code is adapted as follows:

In implementation 1, we have converted the labels into vectors of zeros and ones, so that a tweet can have one of the following tags:

positive= $[0, 1, 0]$
negative= $[1, 0, 0]$
neutral= $[0, 0, 1]$

([link])

Hit rate: 83%

In implementation 2, we have converted the tags to numbers from 0 to 2, so a tweet can have one of the following tags:

positive= $1$
negative= $0$
neutral= $2$

([link])

Comparing the results shows that implementation 2 is more accurate and is therefore the option used in the project.

The results obtained were:

Implementation 1: 83% hit rate
Implementation 2: 90% hit rate

4.5. Discussion about the two implementations

As seen in this chapter, both codes were implemented by their developers [15] for the binary classification of texts according to their sentiment, that is, they could only say whether a tweet had a positive or a negative sentiment. Initially, we chose to implement the first versions with this type of labeling in order to evaluate both codes with minimal modifications to the authors' projects and, staying as close to the original implementations as possible, to determine the number of tweets needed to build a model with a success rate of 80 percent.

Following this philosophy, the project needed a labeled dataset that could serve as a basis for creating and testing the models. Each tweet used in these tests could carry only one of two labels: positive or negative.

Once the dataset and a method to split it into training and test sets had been created, models were trained with different numbers of tweets to analyze which configuration was most appropriate for the project.

To compare the results, the success rate was used. Table [4.5] shows that the second implementation already has a higher success rate than the first in three of the four configurations.

The first notable detail in the results is the decrease in the success rate of the first implementation when the number of training tweets increases to 1500. This may be because the greater diversity of the tweets used reduced the effectiveness of the model.

Furthermore, reviewing the number of positive and negative tweets in the tests shows that tweets labeled as positive far outnumber those labeled as negative. This fact is very important, since a model that labels every tweet as positive (case 1 in implementation 1) obtains a success rate of 85.2. This detail, together with the fact that the training and test tweets were very close in time, suggests that these tests were useful for learning and validating the operation of the algorithm, but are not definitive proof of its validity.

Number of training tweets	Implementation 1 success rate (%)	Implementation 2 success rate (%)
250	85.2	87.2
750	87.8	87.4
1000	88	89.3
1500	85.9	89.2

4.5 Comparative table between both implementations with a test carried out on 1000 tweets collected in April
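The caveat about class imbalance can be made concrete with a majority-class baseline. The sketch below is illustrative only: the function name is hypothetical and the label counts are invented so that the test set contains 85.2% positive tweets, mirroring the situation described above.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy obtained by always predicting the most frequent training label."""
    majority, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for y in test_labels if y == majority)
    return majority, hits / len(test_labels)


# Hypothetical label counts chosen to mirror an 85.2% positive test set.
train_labels = ["positive"] * 200 + ["negative"] * 50
test_labels = ["positive"] * 852 + ["negative"] * 148
label, accuracy = majority_baseline_accuracy(train_labels, test_labels)
print(label, accuracy)
```

Any success rate in the table should be read against this baseline: a model only adds value to the extent that it exceeds it.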

With the new information collected, the two algorithms were adapted to analyze reputational polarity using the lessons learned. The dataset that we labeled follows three rules derived from this section:

3 labels are used (positive, negative and neutral), which define the reputational polarity of the tweet regarding Bitcoin.
Each label group must contain at least 15 percent of the tweets.
The labeled tweets are selected randomly, that is, they may correspond to different days or months of the year 2019.

Also, after analyzing the results, the decision was made to use models with 1459 training tweets and 350 test tweets.

Both options were then implemented, and sections [4.3] and [4.4] showed that implementation 2 has a higher success rate, so it is the option chosen for the project. The results are summarized in table [4.6]:

Hit Rate	Sentiment analysis	Reputational Polarity
Implementation 2	90	87
Implementation 1	83	75

4.6 Comparison of the implementations with 3 labels

The lower success rate of implementation 1 may be due to its use of PyTorch in place of TensorFlow: while it worked correctly for binary prediction, its success rate with the selected training set decreased once the project used 3 labels.

Another possible explanation is that the two implementations are designed differently. While the first implementation is designed to detect multiple binary features, the second is designed to assign a single label with multiple options. This would explain why both implementations obtain broadly similar results in table [4.5], whereas in table [4.6] implementation 2 is much more effective.

4.6. Stock market prediction system

With the algorithms analyzed and validated, the next step of the project was to predict the reputational polarity and sentiment of the information collected in chapter [3] in order to investigate whether there is a correlation with the price of Bitcoin.

To accomplish this objective, the first step was to determine which part of the dataset was best suited to the purpose. As documented in the labeling process, the labeled tweets were written during April, May and June. It was therefore decided to use all the information obtained from April 1 to July 31, so that the data from the first three months serve as training and the July data as test.

Once the subset to be analyzed was chosen, the next step was to count the tweets rated as positive, negative and neutral by both models, with the aim of creating a new model that could predict the price action of Bitcoin. To do this, the tweets were grouped by hour and the result was stored in a MySQL database for later interpretation. The extracted information was then converted into data that the LSTM (Long Short-Term Memory) model can interpret. This process consists of normalizing the number of positive, negative and neutral tweets counted for each hour, along with the total number of tweets written. The target variable is the price of Bitcoin.

Subsequently, the normalized data are divided into two subsets: a training set with 1500 hours and a test set with 1255 hours (with a delay of 3 hours).
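The normalization and the 3-hour delay can be sketched in plain Python. Two assumptions are made here and should be read as such: min-max scaling is used as the normalization, and the delay is interpreted as a window of the 3 preceding hours used to predict the price of the current hour. The function names and the toy data are hypothetical.

```python
def min_max(values):
    """Scale a series to [0, 1]; a constant series maps to zeros."""
    low, high = min(values), max(values)
    span = high - low
    return [(v - low) / span if span else 0.0 for v in values]


def lagged_pairs(features, target, lag):
    """Pair the features of hours t-lag .. t-1 with the target at hour t."""
    pairs = []
    for t in range(lag, len(target)):
        pairs.append((features[t - lag:t], target[t]))
    return pairs


# Hypothetical hourly counts (positive, negative, neutral, total) and prices.
counts = [(5, 1, 4, 10), (8, 2, 10, 20), (3, 3, 4, 10), (9, 1, 10, 20), (6, 2, 2, 10)]
prices = [100.0, 110.0, 105.0, 120.0, 115.0]

columns = [min_max(col) for col in zip(*counts)]   # normalize each feature
features = [list(row) for row in zip(*columns)]    # back to one row per hour
pairs = lagged_pairs(features, min_max(prices), lag=3)
print(len(pairs), pairs[0][1], pairs[1][1])
```

Each pair holds a window of three hourly feature rows and the normalized price that follows it, which is the shape an LSTM expects for sequence input.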

Regarding the model, as we have previously mentioned, an LSTM system has been generated with the following configuration parameters:

Epochs: 50
Validation split: 0.2
Batch size: 12

Subsequently, the MAE (Mean Absolute Error) loss function and the root-mean-square error (RMSE) were computed to assess the quality of the predictions obtained.
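For reference, both error measures can be computed as in the following sketch. The price values are hypothetical and serve only to show the calculation.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between two equally long series."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root-mean-square error; penalizes large deviations more than MAE."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical prices, for illustration only.
actual = [100.0, 102.0, 101.0, 105.0]
predicted = [101.0, 100.0, 101.0, 109.0]
print(mae(actual, predicted), rmse(actual, predicted))
```

RMSE is never smaller than MAE on the same data, and the gap between them grows when a few predictions are far off.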

Finally, the code used is a modification of the one published in [LSTM Model predicting Bitcoin with Tweet Volume & Sentiment] [14].

The code used can be consulted in the following GitHub repository ([link]).

5. Experimental results

How to interpret these metrics

Accuracy alone is insufficient for imbalanced classes. Read it with per-class precision, recall, F1, confusion matrices and a chronological out-of-sample market evaluation. Correlation does not establish profitable causality.

Once the implementation has been chosen, the next step is to generate the prediction models for reputational polarity and sentiment analysis and check their metrics.

Section [5.1] presents the metrics obtained for the reputational polarity model, and section [5.2] presents the metrics of the sentiment analysis model.

The code used to obtain the information presented in sections [5.1] and [5.2] can be found in the following GitHub repository ([link]).

It is important to remember at this point that the implementation is exactly the same for reputational polarity and for sentiment analysis; the only change is the set used to train the algorithm.

Finally, section [5.3] presents the results obtained for the Bitcoin prediction.

5.1. Reputational Polarity

Once the model had been generated, 346 labeled tweets were used as the test set. The results are shown in table [5.1] below:

Metric	Negative (0)	Positive (1)	Neutral (2)
Precision	0.853	0.812	0.872
Recall	0.617	0.759	0.963
F1	0.716	0.785	0.915

5.1 Test result of the reputational model
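Since F1 is the harmonic mean of precision and recall, the F1 row of the table can be recomputed from the other two rows as a consistency check. The sketch below does this with the reported values; the variable names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) per label, copied from the table above.
reported = {
    "negative": (0.853, 0.617, 0.716),
    "positive": (0.812, 0.759, 0.785),
    "neutral": (0.872, 0.963, 0.915),
}

recomputed = {
    label: round(f1_score(precision, recall), 3)
    for label, (precision, recall, _) in reported.items()
}
```

The recomputed values match the reported F1 scores to three decimals, and they make visible that the low F1 of the negative class comes from its recall (0.617), not from its precision.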

The result obtained can be analyzed in the file Polaridad_implementacion1_test.

The model was correct in 87 percent of the cases, but a more detailed study was necessary. For example, we found errors such as the following tweets, which were labeled as positive although their polarity is negative:

RT @Analyst_G: The whole semiconductor business looks like bitcoin and tulip mania... ([https://t.co/824IIxW6Nh])
Not that's what gold is. Bitcoin has yet to provide itself in a single down cycle.
"Bitcoin is going to get pumped to where the whales are happy. Then the altcoins are going to get pumped.

These cases are difficult for a reputational polarity model to label because of the terms used and how they are distributed. Terms such as "happy", or the meaning of the comparison in the first tweet, make these cases difficult to predict.

A problem was also found with the number of tweets labeled with negative reputational polarity. Because the sample was drawn naturally, that is, 1806 tweets extracted from a database with records stored over 3 months, the resulting collection reflects reality and is therefore neither balanced nor curated. As a result, the number of tweets labeled as negative may not be enough for the model to obtain better results. The distribution of labeled tweets is as follows:

Negatives: 302, which is 17 percent of the total.
Neutrals: 820, which is 45 percent of the total.
Positives: 684, which is 38 percent of the total.

Despite this, since the label has three possible values and the negative class represents more than 15 percent of the total, we have considered it valid.
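The percentages above follow directly from the label counts, as the short check below shows. The function name `class_shares` is hypothetical; the counts are the ones reported in this section.

```python
def class_shares(counts):
    """Return each label's share of the total, rounded to whole percent."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}

# Label counts for reputational polarity, as reported above.
reputational = {"negative": 302, "neutral": 820, "positive": 684}

total_tweets = sum(reputational.values())
shares = class_shares(reputational)

# Ratio between the largest and the smallest class, a simple imbalance indicator.
imbalance_ratio = max(reputational.values()) / min(reputational.values())
```

The neutral class is roughly 2.7 times larger than the negative class, which quantifies the imbalance discussed in this section.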

As can be seen, the results are good enough to continue analyzing the models. A point to highlight is the distribution of tweets in the dataset and its relationship with the results. As previously mentioned, there are fewer tweets with negative polarity than examples labeled as neutral or positive. This has made it harder for the model to detect negative tweets than positive and neutral ones, as the graphs show.

5.1 Analysis of the relationship between precision and sensitivity for each of the labels

The next step was to compute the ROC (Receiver Operating Characteristic) curve for each of the labels. A ROC curve is a graphic representation of sensitivity against 1 - specificity for a binary classifier as the discrimination threshold is varied. This information is represented in figure [5.2], which plots the true positive rate (TPR) against the false positive rate (FPR).

5.2 ROC curve of the different labels used for reputational polarity.

Analyzing both images shows that the curves obtained for each of the labels are accurate enough to continue with the project.
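For a three-label problem, a ROC curve per label is usually obtained by treating each label in turn as the positive class (one-vs-rest). The sketch below builds the curve for a single label from hypothetical scores; it assumes that all scores are distinct and is not the plotting code of the thesis.

```python
def roc_points(scores, is_positive):
    """Return (FPR, TPR) pairs obtained by lowering the threshold one score at a time.

    Assumes distinct scores; tied scores would need to be processed together.
    """
    n_pos = sum(is_positive)
    n_neg = len(is_positive) - n_pos
    ranked = sorted(zip(scores, is_positive), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for _, positive in ranked:
        if positive:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical scores for the "negative" label; 1 marks a truly negative tweet.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
truth = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, truth)
auc = area_under_curve(points)
```

Repeating the same computation with each of the three labels as the positive class yields one curve per label.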

Finally, the results are shown as a confusion matrix.

5.3 Confusion matrix for reputational polarity

The prediction results are also available in the file Polaridad_Implementacion2_test.xlsx.

5.2. Sentiment analysis

This section analyzes the metrics obtained by the model used for sentiment prediction. Here, 349 tweets were used as the test set, and the results are shown in table [5.2] below:

Metric	Negative (0)	Positive (1)	Neutral (2)
Precision	0.843	0.875	0.940
Recall	0.908	0.875	0.912
F1	0.874	0.875	0.926

5.2 Test result of the model for sentiment analysis

The prediction results are also available in the file Sentimiento_implementacion3_test.

Unlike the previous case, for sentiment we managed to label 1852 tweets, with the following distribution:

Negatives: 414, which is 22 percent of the total.
Neutrals: 738, which is 40 percent of the total.
Positives: 700, which is 38 percent of the total.

Because the model is pre-trained for sentiment analysis, the first point to note is the higher success rate when classifying negative tweets. As discussed in the section [BERT], it is easier for the algorithm to predict sentiment than polarity, and this is reflected in the results obtained.

As can be seen, the results are good enough to continue analyzing the models and to continue with the project. As in the previous section, the distribution of tweets in the dataset is reflected in the results. As already mentioned in section [3], there are fewer tweets with negative sentiment than examples labeled as neutral or positive. This may have made it harder for the model to detect negative tweets than positive and neutral ones, and this is reflected in the graphs.

The next step was to compute the ROC (Receiver Operating Characteristic) curve for each of the labels. This information is represented in figures [5.5] and [5.6], which plot the true positive rate (TPR) against the false positive rate (FPR). The only difference between the two figures is the range of the graph: the second is a closer view than the first.

5.5 ROC curve of the different labels used for sentiment analysis.

5.6 Close-up of the ROC curve for the different labels used in sentiment analysis.

Finally, the results are shown as a confusion matrix.

5.7 Confusion matrix for sentiment analysis

5.3. Stock market value prediction

This section evaluates the results of the prediction made with the data generated in section [4.6].

The first step was to generate a correlation graph of all the features included in the prediction model for reputational polarity, that is, the relationship between the number of positive, negative and neutral tweets, the total number of tweets and the price of Bitcoin. This gives a clearer indication of which features may be more important than others.

Figure [5.8] shows the correlation between the different variables.

5.8 Correlation between variables
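A correlation graph of this kind is typically built from pairwise Pearson coefficients between each feature and the price. The sketch below computes them from hypothetical hourly series; the thesis does not state which coefficient was used, so Pearson is an assumption here, and the data are made up.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(var_x * var_y)

# Hypothetical hourly series (not data from BitTweet).
price = [100.0, 98.0, 97.0, 95.0, 97.0]
features = {
    "positive": [10, 12, 13, 15, 14],
    "total": [40, 40, 42, 41, 40],
}

correlations = {name: pearson(series, price) for name, series in features.items()}
```

A coefficient close to -1 or 1 indicates a strong linear relationship, while a value near 0 indicates that the feature carries little linear information about the price.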

The plot shows that the volume of positive and negative tweets has a negative correlation with the value of Bitcoin. Figure [5.9] shows the complete data series ordered in time, with the aim of observing some pattern with the naked eye, but no pattern could be identified.

5.9 Complete series of data ordered in time

The first row of the figure shows the evolution of the price of Bitcoin over time, followed by the evolution of the number of tweets with positive, negative and neutral reputational polarity. The fifth and sixth rows show the evolution of the number of negative and positive tweets labeled by VADER, and the last row shows the evolution of the total number of tweets.

Since the correlation exists but is hard to see with the naked eye, we decided to apply the prediction algorithm explained in section [4.6]. Figure [5.10] shows the evolution of the price of Bitcoin over the selected period, and figure [5.11] shows the prediction superimposed on the actual price.

5.10 Evolution of the price of Bitcoin from April to July

5.11 Prediction of the price of Bitcoin based on the results obtained from the reputational polarity model against the real evolution of the price from June to July

In figure [5.11], the blue line represents the actual price during the test period and the green line represents the predicted price. Recall that the normalized data are divided into two subsets: the training set with 1500 hours and the test set with 1255 hours, which is the one represented in the graph.

The error measurements are as follows:

Test MSE: 5498.255
Test RMSE: 74.150

With all this information, and with the aim of validating the study, the estimator code was modified to use the output of the VADER Python library in place of the prediction made by the model. In this case, the variables are the number of positives and negatives obtained with the library, so the model uses 3 input elements rather than 5 as in the previous algorithm. The results are shown in figure [5.12] and, as can be seen with the naked eye, they are not as accurate as with the previous model.

5.12 Prediction of the price of Bitcoin from the VADER library against the real evolution of the price

To finish the project, the algorithm was modified to use the information produced by the sentiment analysis model built with BERT. This algorithm is very similar to the first; only the data source changes, so that the information is read from a different column.

The analysis again starts with the correlation between the variables mentioned. Figure [5.13] shows the correlation between the different variables.

5.13 Correlation between the variables of the model with sentiment

The plot shows that the volume of neutral tweets and the total number of tweets have a stronger negative correlation with the value of Bitcoin than the number of positive and negative ones. Figure [5.14] shows the complete data series ordered in time, with the aim of observing some pattern with the naked eye, but no pattern could be identified.

The first row of the figure shows the evolution of the price of Bitcoin over time, followed by the evolution of the number of tweets with positive, negative and neutral sentiment. The fifth and sixth rows show the evolution of the number of negative and positive tweets labeled by VADER, and the last row shows the evolution of the total number of tweets.

From this point on, the same prediction algorithm was applied as in the previous case. Figure [5.15] shows the estimate superimposed on the real price of Bitcoin.

5.15 Prediction of the price of Bitcoin based on the results obtained from the sentiment analysis model against the real evolution of the price from June to July

In figure [5.15], the blue line represents the real price during the test period. Compared with figure [5.11], the prediction is less accurate.

The error measurements are as follows:

Test MSE: 28742.381
Test RMSE: 169.536

Comparing the results in table [5.4], we can verify numerically that the predictions of the reputational polarity model are better than those of the sentiment model. MSE and RMSE measure the average deviation between the predicted and the actual values: the larger they are, the further the predictions lie from the real price, and the smaller they are, the closer. What we are looking for is therefore a smaller MSE, since it translates into a smaller error, and reputational polarity meets that condition better.

Furthermore, comparing the accuracy of the Bitcoin price predictions confirms that figure [5.11] is more precise than figure [5.15]. That is, in this project we have adapted an algorithm for sentiment analysis and given it a new objective: detecting the reputational polarity of tweets. To do this, we only modified the labeling criteria and generated the model adapted for this purpose.

Metric	Sentiment	Reputational polarity
MSE	28742.381	5498.255
RMSE	169.536	74.150

5.4 Comparative table of the results obtained
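The figures in the comparative table can be cross-checked with a few lines of arithmetic: each RMSE should equal the square root of the corresponding MSE, and the ratio between the two RMSE values expresses how much larger the typical error of the sentiment-based model is. The variable names below are illustrative.

```python
import math

# Values copied from the comparative table.
reported = {
    "sentiment": {"mse": 28742.381, "rmse": 169.536},
    "reputational polarity": {"mse": 5498.255, "rmse": 74.150},
}

# RMSE is by definition the square root of MSE.
recomputed_rmse = {name: math.sqrt(values["mse"]) for name, values in reported.items()}

# How many times larger the sentiment model's RMSE is.
rmse_ratio = reported["sentiment"]["rmse"] / reported["reputational polarity"]["rmse"]
```

The recomputed values agree with the reported ones to three decimals, and the RMSE of the sentiment model is roughly 2.3 times that of the reputational polarity model on this test period.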

6.Conclusions

This chapter is divided into two sections: the first presents the conclusions of the project and the second outlines future work.

6.1. Conclusions

Although sentiment analysis is still the most widely used Natural Language Processing tool for online reputation monitoring, the thesis has shown that the sentiment expressed in a text about Bitcoin and its reputational implications for that entity are different things. In fact, as discussed throughout the thesis, many texts with reputational implications are polar facts, that is, factual information with no explicit emotional language.

Starting from that distinction, the goal of the thesis was to compare automatic reputational-polarity analysis with sentiment analysis for predicting Bitcoin price behavior, under the hypothesis that reputational analysis should be more directly connected to market valuation and therefore should be a better predictor. Bitcoin was chosen because it is highly volatile and therefore a demanding target for predictive models, while Twitter was selected as the textual source because it is easy to collect, immediate, and suitable for temporal correlation with market data.

Since there was no existing dataset suited to this objective, the first contribution of the project was the creation of BitTweet, a dataset of tweets mentioning Bitcoin that was manually labeled for sentiment and reputational polarity and linked to Bitcoin price information. The work was built from a database with 792,792 tweet records in English about Bitcoin and 20,604 economic records about the cryptocurrency collected between March and August 2019. These annotations make it possible to quantify the difference between sentiment and reputational polarity, evaluate models for both tasks, and assess stock-prediction models built from tweets.

The second contribution was the application of state-of-the-art Natural Language Processing, especially contextual word embeddings through BERT, to reputational-polarity estimation and market prediction, always in comparison with sentiment analysis. The results confirm the original hypothesis: reputational polarity is a better predictor than sentiment analysis. All experiments were also compared against VADER as a baseline throughout the project.

These findings are particularly relevant in the case of Bitcoin, which is not backed by any state or central institution and whose value depends largely on the trust and behavior of its users. Those same users generate the news, reactions and discussion that circulate on social platforms. For that reason, in the Bitcoin context it is not enough to rely only on traditional economic indicators; it is also necessary to consider signals such as reputational polarity when building predictive models.

6.2. Future work

A first line of future work would be to improve the labeling process by involving more annotators and comparing the resulting labels with those generated by other annotators or alternative schemes. It would also be useful to work with datasets in other languages in order to test how well BERT performs with pretrained models in Chinese, Spanish or other languages.

From the implementation perspective, the training set could be enlarged and the current solution could be compared with other architectures or language models, including alternatives such as GPT-2, in order to validate whether BERT is really the strongest option for this task.

Regarding the financial application, another natural extension would be to study whether reputational polarity is correlated with the market value of other cryptocurrencies such as Ethereum, or with less volatile financial assets, to see whether the same methodology remains effective in different scenarios.

Finally, Twitter was the only data source used in this thesis. Future work could incorporate other kinds of data such as Facebook, blogs or news websites. BitTweet could also be enriched by weighting opinions according to retweets, replies or other interaction signals, following the perspective proposed in "Estimating reputation polarity on microblog posts" [4].

References

1. Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805.
2. Anderson, Sawyer C. The Information of Spam.
3. Kouloumpis, Efthymios; Wilson, Theresa; Moore, Johanna. Twitter sentiment analysis: The good the bad and the omg! Fifth International AAAI conference on weblogs and social media.
4. Peetz, Maria-Hendrike; de Rijke, Maarten; Kaptein, Rianne. Estimating reputation polarity on microblog posts. Information Processing & Management.
5. Castellanos, Angel; Cigarran, Juan; Garcia-Serrano, Ana. Modelling techniques for twitter contents: A step beyond classification based approaches. CLEF 2013 Conference and Labs of the Evaluation Forum.
6. Hangya, Viktor; Farkas, Richárd. Filtering and polarity detection for reputation management on tweets. CEUR Workshop Proceedings.
7. Giachanou, Anastasia; Crestani, Fabio. Like it or not: A survey of twitter sentiment analysis methods. ACM Computing Surveys (CSUR).
8. Pang, Bo; Lee, Lillian; and others. Opinion Mining and Sentiment Analysis. Foundations and Trends in Information Retrieval.
9. Agarwal, Apoorv; Xie, Boyi; Vovsha, Ilia; Rambow, Owen; Passonneau, Rebecca. Sentiment analysis of twitter data. Proceedings of the Workshop on Language in Social Media (LSM 2011).
10. Pang, Bo; Lee, Lillian; Vaithyanathan, Shivakumar. Thumbs up?: sentiment classification using machine learning techniques. Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10.
11. Church, Kenneth Ward; Hanks, Patrick. Word association norms, mutual information, and lexicography. Computational linguistics.
12. Giachanou, Anastasia; Harvey, Morgan; Crestani, Fabio. Topic-specific stylistic variations for opinion retrieval on twitter. European Conference on Information Retrieval.
13. Gerani, Shima; Carman, Mark; Crestani, Fabio. Aggregation methods for proximity-based opinion retrieval. ACM Transactions on Information Systems (TOIS).
14. Simpson, Paul. LSTM Model predicting Bitcoin with Tweet Volume & Sentiment. Medium. https://medium.com/@DrPaulSimpson/lstm-model-predicting-bitcoin-with-tweet-volume-sentiment-bc3c490271a7
15. Trivedi, Kaushal. Multi-label Text Classification using BERT – The Mighty Transformer. Medium. https://medium.com/huggingface/multi-label-text-classification-using-bert-the-mighty-transformer-69714fa3fb3d
16. Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing. Google AI Blog. https://ai.googleblog.com/2018/11/open-sourcing-bert-state-of-art-pre.html
17. Turney, Peter D. Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews. Proceedings of the 40th annual meeting on association for computational linguistics.
18. Hochreiter, Sepp; Schmidhuber, Jürgen. Long Short-term Memory. Neural computation.
19. Cho, Kyunghyun; Van Merriënboer, Bart; Bahdanau, Dzmitry; Bengio, Yoshua. On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259.
20. Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N; Kaiser, Łukasz; Polosukhin, Illia. Attention is all you need. Advances in neural information processing systems.
21. Giachanou, Anastasia; Gonzalo, Julio; Mele, Ida; Crestani, Fabio. Sentiment propagation for predicting reputation polarity. European Conference on Information Retrieval.
22. Bell, Tom. Bitcoin Trading Agents. University of Southampton.
23. Colianni, Stuart; Rosales, Stephanie; Signorotti, Michael. Algorithmic trading of cryptocurrency based on Twitter sentiment analysis. CS229 Project.
24. Nakamoto, Satoshi; and others. Bitcoin: A peer-to-peer electronic cash system.
25. Víctor Díaz Marco. Bitcoin: planteamiento y protocolo. Víctor Díaz Marco. https://v0ctor.me/bitcoin
26. elEconomista.es. Bitcoin y ether se derrumban en los últimos días ante las amenazas de China y Rusia. elEconomista.es. https://cutt.ly/kr1wHdx
27. Chen, George H; Nikolov, Stanislav; Shah, Devavrat. A latent source model for nonparametric time series classification. Advances in Neural Information Processing Systems.
28. Madan, Isaac; Saluja, Shaurya; Zhao, Aojia. Automated bitcoin trading via machine learning algorithms. URL: http://cs229.stanford.edu/proj2014/Isaac%20Madan
29. Burchell, Jodie. Using VADER to handle sentiment analysis with social media text. Standard error Full Atom. http://t-redactyl.io/blog/2017/04/using-vader-to-handle-sentiment-analysis-with-social-media-text.html
30. Saad, Saidah; Saberi, Bilal. Sentiment Analysis or Opinion Mining: A Review. International Journal on Advanced Science, Engineering and Information Technology.
31. Docs - Twitter Developers. Twitter. https://developer.twitter.com/en/docs.html
32. Poyser, Obryan. Exploring the determinants of Bitcoin's price: an application of Bayesian Structural Time Series. arXiv preprint arXiv:1706.01437.
33. Masaharu Yoshioka and others. Visualizing Polarity-based Stances of News Websites. Proceedings of the Second International Workshop on Recent Trends. https://dblp.org/rec/bib/conf/ecir/YoshiokaJAK18
34. Brodersen, Kay H; Gallusser, Fabian; Koehler, Jim; Remy, Nicolas; Scott, Steven L; and others. Inferring causal impact using Bayesian structural time-series models. The Annals of Applied Statistics.
