Sentiment Metrics

    Sentiment Analysis

    Definition

    The sentiment metrics are built on top of the Social Data.

    Sentiment analysis is the problem of computationally identifying and categorizing emotions, opinions and subjective information in a given piece of text. It can be approached with rule-based techniques or with machine learning. A rule-based approach applies a set of predefined rules to estimate the sentiment of the input text; it is often less accurate and requires a lot of manual work. The number of documents in our Social Data storage makes it practically impossible to analyze them manually, which is why we use machine learning for sentiment analysis.

    Sentiment Score

    We trained a machine learning model on a large Twitter dataset that contains over 1.6 million tweets, each labelled as either positive or negative. This model is then used to evaluate the sentiment of every document in the Social Data set, i.e. it assigns a positive and a negative sentiment score to each message, post, comment, etc. These scores are the probabilities that the content of the analyzed text is positive or negative, respectively. Therefore both scores fall in the range between 0 (not positive/negative at all) and 1 (extremely positive/negative), and the two scores always sum to 1.

    Example:

    I'm really excited about the new Libra currency!

    This message has a positive score of 0.75 and a negative score of 0.25.

    We use this approach for messages and comments from social network conversations because the text there usually has a similar structure: short messages built around a single, simple idea. This does not hold for every message: some are long and complicated, and some are neutral or contain spam or other irrelevant information. Messages of this kind usually have an indecisive pair of sentiment scores, with both the positive and the negative score close to 0.5. We do not include such messages when calculating the Sentiment Metrics: they are filtered out by a threshold.
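    The score pair and the filtering step can be illustrated with a minimal Python sketch. The names `Scores`, `make_scores` and `is_decisive` are hypothetical, and the exact filtering rule is an assumption: the sketch keeps a message only when its larger score reaches a threshold, using 0.7 as the default because that is the value quoted in the Positive (Negative) Sentiment definition.

```python
from typing import NamedTuple


class Scores(NamedTuple):
    """Hypothetical container for the two sentiment probabilities of one message."""
    positive: float
    negative: float


def make_scores(positive: float) -> Scores:
    """Build the score pair from the positive probability; the pair sums to 1."""
    if not 0.0 <= positive <= 1.0:
        raise ValueError("a probability must lie in [0, 1]")
    return Scores(positive, 1.0 - positive)


def is_decisive(scores: Scores, threshold: float = 0.7) -> bool:
    """Assumed rule: keep a message only if one of its scores reaches the threshold."""
    return max(scores.positive, scores.negative) >= threshold


excited = make_scores(0.75)    # the example message above
ambiguous = make_scores(0.52)  # both scores close to 0.5

print(is_decisive(excited), is_decisive(ambiguous))
```

    Under these assumptions the example message is kept, while the message whose scores are both near 0.5 is dropped.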

    Sentiment Metrics

    1. Positive (Negative) Sentiment

    Definition

    The sum of the positive (negative) sentiment scores of a given set of documents over time. Only scores equal to or higher than 0.7 are taken into account. The metric can be calculated for a certain asset or for any given search term, similar to the social volume.
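    As a rough illustration of this definition, the sketch below sums the positive and the negative scores of a small set of documents, skipping every score below 0.7. The function name and the sample score pairs are made up for the example; this is not the production implementation.

```python
def positive_negative_sentiment(score_pairs, threshold=0.7):
    """Return (positive_total, negative_total) for (positive, negative) score pairs.

    A score contributes to its total only if it is at least `threshold`.
    """
    positive_total = sum(pos for pos, _ in score_pairs if pos >= threshold)
    negative_total = sum(neg for _, neg in score_pairs if neg >= threshold)
    return positive_total, negative_total


# Hypothetical documents mentioning one asset within one time interval.
pairs = [(0.75, 0.25), (0.9, 0.1), (0.2, 0.8), (0.55, 0.45)]
pos_total, neg_total = positive_negative_sentiment(pairs)

print(pos_total, neg_total)
```

    Here two documents count towards the positive total, one towards the negative total, and the last one (0.55 / 0.45) is ignored by both. Since each score is at most 1, neither total can exceed the number of documents.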

    Measuring Unit

    Relative number, less than or equal to the corresponding social volume.

    Frequency

    We store each social data document with its absolute timestamp, so the data can be aggregated with any desired interval on request. Currently the time intervals we use are the following:

    Latency

    The sentiment scores are calculated every 5 minutes. Since the social data itself is collected in near real time, the maximum latency is 5 minutes.

    Available Assets

    We do not separate or filter the collected social data by asset, so this metric can be calculated for any asset. More on this can be found here.

    How to Access

    Sanbase

    The metric is available for any selected asset.

    Availability

    Free, Basic, Pro, Pro+, Enterprise
    Sanbase
    SanAPI
    Sansheets

    2. Average Sentiment

    Definition

    The difference between the Positive and Negative Sentiment metrics.

    Measuring Unit

    Relative number. This metric falls in the range [-social_volume, +social_volume] where social_volume is the corresponding social volume.
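    The stated range follows from the fact that each document contributes at most 1 to either total. A small sketch (hypothetical function name and data, with the 0.7 threshold taken from the Positive (Negative) Sentiment definition) shows the two extreme cases:

```python
def average_sentiment(score_pairs, threshold=0.7):
    """Positive Sentiment minus Negative Sentiment for (positive, negative) pairs."""
    positive = sum(pos for pos, _ in score_pairs if pos >= threshold)
    negative = sum(neg for _, neg in score_pairs if neg >= threshold)
    return positive - negative


social_volume = 3  # number of documents in each example set

all_positive = average_sentiment([(1.0, 0.0)] * social_volume)  # upper bound
all_negative = average_sentiment([(0.0, 1.0)] * social_volume)  # lower bound
mixed = average_sentiment([(0.9, 0.1), (0.1, 0.9), (0.5, 0.5)])

print(all_positive, all_negative, mixed)
```

    Any realistic set of documents lands between the two bounds; evenly mixed sentiment cancels out to a value near 0.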

    Frequency

    Same as Positive (Negative) Sentiment.

    Latency

    Same as Positive (Negative) Sentiment.

    Available Assets

    Same as Positive (Negative) Sentiment.

    How to Access

    Sanbase

    The metric is available for any selected asset.

    Availability

    Free, Basic, Pro, Pro+, Enterprise
    Sanbase
    SanAPI
    Sansheets

    3. Sentiment Weighted

    Definition

    Sentiment Weighted is an improved version of the Sentiment Balance (the difference between the Positive and Negative Sentiment metrics) that also takes into account the Unique Social Volume.

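    Sentiment Weighted is defined as a rolling Z-score of $X = \text{Unique Social Volume} \cdot \text{Sentiment Balance}$.

    More precisely, we choose a duration $d$, which will be the length of our sliding window. Then for any timestamp $t$ we consider the population $P_{t,d}$ consisting of all values of $X$ for all timestamps between $t - d$ and $t$. If we use $\mu$ and $\sigma$ to denote mean and standard deviation, then we define Sentiment Weighted as:

    $$SW(t, d) = \frac{X(t) - \mu(P_{t,d})}{\sigma(P_{t,d})}$$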

    Intuitively, this score can be read as a social-volume-weighted sentiment balance. The metric spikes when the social volume is very high and, at the same time, the vast majority of the messages are very positive. Dips occur when the social volume is high but the overall sentiment is negative. If the volume is high but the sentiment is mixed, or if the sentiment has a strong positive (negative) polarity but the volume is low, Sentiment Weighted does not change significantly and stays around 0.
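    A rolling Z-score of this kind can be sketched in a few lines of Python. Several details are assumptions made for the example rather than statements about the production pipeline: the underlying series is taken to be the product of unique social volume and sentiment balance, the window is expressed as a number of consecutive data points (including the current one), the population standard deviation is used, and `None` marks points where the score is undefined.

```python
from statistics import mean, pstdev


def sentiment_weighted(unique_social_volume, sentiment_balance, window):
    """Rolling Z-score of x = unique_social_volume * sentiment_balance.

    Returns one value per data point; None where the window is not yet
    full or where all values in the window are identical (sigma == 0).
    """
    x = [v * s for v, s in zip(unique_social_volume, sentiment_balance)]
    result = []
    for i in range(len(x)):
        if i + 1 < window:
            result.append(None)  # not enough history yet
            continue
        population = x[i + 1 - window : i + 1]  # last `window` values, current included
        sigma = pstdev(population)
        if sigma == 0:
            result.append(None)  # Z-score undefined for a constant window
        else:
            result.append((x[i] - mean(population)) / sigma)
    return result


# Hypothetical series: quiet periods followed by a loud, very positive one.
volume = [10, 12, 11, 10, 100]
balance = [0.1, 0.0, -0.1, 0.1, 0.9]
scores = sentiment_weighted(volume, balance, window=5)

print(scores[-1])
```

    In this toy series the last point combines high volume with strongly positive sentiment, so its score is clearly positive; flipping the sign of the last balance value produces a correspondingly negative score.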

    Measuring Unit

    Relative number. Theoretically this metric has no lower or upper limit, but it normally lies in the range [-3, 3]. Values outside this range indicate that something abnormal is happening.

    Frequency

    Same as Positive (Negative) Sentiment.

    Latency

    Same as Positive (Negative) Sentiment.

    Available Assets

    Same as Positive (Negative) Sentiment.

    How to Access

    Sanbase

    The metric is available for any selected asset.

    Availability

    Free, Basic, Pro, Pro+, Enterprise
    Sanbase
    SanAPI
    Sansheets