Social Data
Definition
By Social data, we consider a set of crypto-related discussions from the internet that we collect and store in the form of text documents. It includes chat conversations, forum posts and comments, tweets, and other pieces of text (usually pretty short). In addition to the text itself, we also store some metadata that depends on the data source: user_id, hashtags, chat_title, etc. The Social data serves as a base layer for various statistics and metrics that we build on top of it.
How the data is collected
We have a custom data scraper for each of the data sources we track. The scrapers collect all the new incoming messages/posts/comments in real-time. We also have a historical scraper to fetch historical text data from the past for almost all sources. We store the data in a NoSQL database, which enables us to run full-text search queries on top of it very fast.
Available Assets
For each data source, we have a curated list of accounts/sub-sources from where we scrape the data. We collect all the available text documents and don't separate the incoming data by assets, i.e., the metrics we build on top of the social data are theoretically available for any asset (although in fact, for projects with a small market capitalization, the level of conversation around them is usually very low).
Available data sources
Telegram
We track a curated list with over 400 crypto-related Telegram chats. For each of them, we have the entire history of the chat.
In the metric name this source is available as telegram
. Example: social_volume_telegram
.
Latency. We collect the messages in real-time (1-2s delay max.)
History. Each chat has its complete historical data; the oldest discussion starts
at 2016-03-29
.
We track a curated list with over 4000 crypto-related and NFT-related Twitter accounts. For each of them, we collect their tweets, their retweets, and all the replies to their tweets.
In the metric name this source is typed as twitter
. Example: social_volume_twitter
.
Latency: We collect all the tweets in real-time.
History: The historical data starts at 2018-02-13
.
We track a curated list with over 350 crypto-related subreddits. For each of them we collect the posts themselves, as well as all the comments to these posts.
In the metric name this source is typed as reddit
. Example: social_volume_reddit
.
Latency: All the posts and comments are collected in real-time (1-2s delay max.)
History: The historical data starts at 2016-01-01
.
Bitcointalk
We collect all the new public posts from bitcointalk.org. We also have the full historical data for the whole forum.
In the metric name this source is typed as bitcointalk
. Example: social_volume_bitcointalk
.
Latency: The scraper goes through all the new messages once per 10 seconds.
History: we have collected the entire forum history, starting from
2009-11-22
.
Youtube Videos
We collect the transcribed text from youtube videos from a manually curated list of channels.
In the metric name this source is typed as youtube_videos
. Example: social_volume_youtube_videos
Latency: The scraper goes through all the new videos once per day.
History: The historical data starts at 2021-06-02
4chan
We collect posts from 4chan.org/biz
In the metric name this source is typed as 4chan
. Example:
social_volume_4chan
Latency: The scraper goes though all the new posts once per 5 minute.
History: The historical data starts at 2023-02-05
Farcaster
We collect posts from Farcaster
In the metrics' names this source is typed as farcaster
. Example:
social_volume_farcaster
Latency: The scraper goes though all the new posts once per 5 minute.
History: The historical data starts at 2024-04-01
Total
A combination of all available sources.
In the metric name this source is typed as total
. Example: social_volume_total
.
Talk to us in Discord
Still have some questions left? Join our Discord and get help from the Santiment team!
Go to Discord