Metrics Exported to S3 - Technical Documentation

    Overview

    Santiment exports metrics data to Amazon S3 in Parquet format, providing an efficient way to access large-scale time-series data. This data can be queried directly from S3 using analytics tools or downloaded and stored locally.

    Data Structure

    The exported data is organized in a hierarchical directory structure:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    s3://santiment-metrics/
    ├── assets.parquet          # Asset metadata lookup table
    ├── metrics.parquet         # Metric metadata lookup table
    └── metrics/
        ├── intraday/           # 5-minute interval metrics data
        │   └── {metric_name}/
        │       └── {year}/
        │           └── {month}/
        │               └── data.parquet
        └── daily/              # Daily aggregated metrics data
            └── {metric_name}/
                └── {year}/
                    └── {month}/
                        └── data.parquet

    Path Components

    • {metric_name}: The name of the metric (e.g., price_usd, active_addresses)
    • {year}: Four-digit year (e.g., 2024)
    • {month}: Two-digit month with leading zero (e.g., 01 for January, 12 for December)

    Metrics Data Files

    Path: s3://santiment-metrics/metrics/{interval}/{metric_name}/{year}/{month}/data.parquet

    Schema:

    ColumnTypeDescription
    asset_idINTEGERForeign key reference to assets.parquet
    metric_idINTEGERForeign key reference to metrics.parquet
    dtTIMESTAMPDate and time of the measurement
    valueDOUBLEThe metric value

    Sample Data:

    1
    2
    3
    4
    asset_id | metric_id | dt                  | value
    ---------|-----------|---------------------|------------------
    378      | 1289      | 2024-01-01 00:00:00 | 42280.23527619226
    378      | 1289      | 2024-01-01 00:05:00 | 42384.54766118636

    Assets Lookup Table

    Path: s3://santiment-metrics/assets.parquet

    Schema:

    ColumnTypeDescription
    asset_idINTEGERUnique identifier for the asset
    nameSTRINGFull name of the asset
    ticker_slugSTRINGTicker symbol and slug combination
    decimalsINTEGERNumber of decimal places for the token

    Sample Data:

    1
    2
    3
    4
    5
    asset_id | name                           | ticker_slug                      | decimals
    ---------|--------------------------------|----------------------------------|----------
    1        | 0chain                         | ZCN_0chain                       | 10
    2        | 0x                             | ZRX_0x                           | 18
    3        | 0x0-ai-ai-smart-contract       | 0x0_0x0-ai-ai-smart-contract     | 9

    Metrics Lookup Table

    Path: s3://santiment-metrics/metrics.parquet

    Schema:

    ColumnTypeDescription
    metric_idINTEGERUnique identifier for the metric
    nameSTRINGFull name of the metric
    versionSTRINGVersion or start date of the metric
    statusSTRINGCurrent status (e.g., production, testing)

    Sample Data:

    1
    2
    3
    4
    5
    metric_id | name                                              | version    | status
    ----------|---------------------------------------------------|------------|------------
    1         | funding_rates_aggregated_per_exchange             | 2019-01-01 | production
    2         | funding_rates_aggregated_per_settlement_currency  | 2019-01-01 | production
    3         | total_funding_rates_aggregated_per_asset          | 2019-01-01 | production

    Querying Data with ClickHouse

    In the following examples we use ClickHouse s3 Table Function to request data from S3 buckets and process it using SQL queries. But You can use any other instrument which supports data reading from S3 buckets.

    Basic Query Example

    The following example demonstrates how to query metrics data with proper asset and metric name resolution:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    SELECT 
        assets.name AS asset,
        metrics.name AS metric,
        dt,
        value
    FROM s3(
        url='s3://santiment-metrics/metrics/intraday/price_usd/2024/01/data.parquet',
        access_key_id='YOUR_KEY_ID', 
        secret_access_key='YOUR_KEY_SECRET', 
        format='Parquet'
    ) AS data
    LEFT JOIN (
        SELECT 
            asset_id,
            name
        FROM s3(
            url='s3://santiment-metrics/assets.parquet',
            access_key_id='YOUR_KEY_ID', 
            secret_access_key='YOUR_KEY_SECRET', 
            format='Parquet'
        )
    ) AS assets USING asset_id
    LEFT JOIN (
        SELECT 
            metric_id,
            name
        FROM s3(
            url='s3://santiment-metrics/metrics.parquet',
            access_key_id='YOUR_KEY_ID', 
            secret_access_key='YOUR_KEY_SECRET', 
            format='Parquet'
        )
    ) AS metrics ON metrics.metric_id = data.metric_id;

    Using Wildcards

    You can use wildcards in the S3 path to query multiple files at once. The following components support wildcard patterns:

    • Interval: intraday or daily
    • Metric name: price_usd, active_addresses, etc.
    • Year: 2024, 2023, etc.
    • Month: 01, 02, ..., 12

    Example: Query all months in 2024 for a specific metric

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    SELECT 
        assets.name AS asset,
        metrics.name AS metric,
        dt,
        value
    FROM s3(
        url='s3://santiment-metrics/metrics/intraday/price_usd/2024/*/data.parquet',
        access_key_id='YOUR_KEY_ID', 
        secret_access_key='YOUR_KEY_SECRET', 
        format='Parquet'
    ) AS data
    LEFT JOIN (
        SELECT asset_id, name
        FROM s3(
            url='s3://santiment-metrics/assets.parquet',
            access_key_id='YOUR_KEY_ID', 
            secret_access_key='YOUR_KEY_SECRET', 
            format='Parquet'
        )
    ) AS assets USING asset_id
    LEFT JOIN (
        SELECT metric_id, name
        FROM s3(
            url='s3://santiment-metrics/metrics.parquet',
            access_key_id='YOUR_KEY_ID', 
            secret_access_key='YOUR_KEY_SECRET', 
            format='Parquet'
        )
    ) AS metrics ON metrics.metric_id = data.metric_id
    WHERE asset.name = 'bitcoin';

    Intraday vs Daily Metrics

    Intraday Metrics

    • Path: s3://santiment-metrics/metrics/intraday/{metric_name}/{year}/{month}/data.parquet
    • Granularity: 5-minute intervals
    • Use cases: Intraday trading, detailed analysis, real-time monitoring

    Daily Metrics

    • Path: s3://santiment-metrics/metrics/daily/{metric_name}/{year}/{month}/data.parquet
    • Granularity: Daily aggregated values
    • Use cases: Long-term trend analysis, reduced data volume, historical comparisons