Version: v4.0.0 [Denim]

Overview

This guide provides a comprehensive overview of the monitoring and data storage stack. You will learn about the architecture, the structure of the data, what data sources are available, and how to query them.

Comprehensive observability and data collection are essential for:

  • Monitoring the health of the system: Detect and alert on anomalies before they impact service.
  • Troubleshooting problems: Use historical data to pinpoint the root cause of failures.
  • Optimizing performance: Identify resource bottlenecks and fine-tune network throughput, optionally with the help of AI/ML tools.

VictoriaMetrics

Starting with the Denim release, the full observability stack has moved to VictoriaMetrics, a fast, scalable time-series database that offers a complete monitoring solution. It is designed to collect, store, and process real-time metrics, and is optimized for high performance and low resource usage.

Monitoring Stack

To provide a robust and efficient monitoring solution, our deployment uses a version of the VictoriaMetrics stack tuned for your network deployments. We use vmsingle, the single-node version of VictoriaMetrics, which simplifies management and reduces resource overhead while still providing powerful monitoring capabilities. This configuration retains stored data for up to 7 days.

This streamlined approach is built on three main components:

  • vmsingle: The core of our stack, vmsingle is an all-in-one binary that handles data ingestion, storage, and querying. It provides the power of a complete time-series database in a single, easy-to-manage component.
  • vmagent: This is a lightweight agent responsible for discovering and scraping metrics from various sources within the cluster (such as pods and services that expose data in the Prometheus format). It efficiently collects this data and forwards it to vmsingle.
  • grafana: The industry-standard visualization tool for creating rich, interactive dashboards. It connects directly to vmsingle as a data source, allowing you to monitor the health and performance of your cloud-based 5G network in real time.
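
To make "data in the Prometheus format" more concrete, the following minimal sketch renders one sample as a line of the Prometheus text exposition format, which is the kind of payload vmagent scrapes from an endpoint. The helper names (escape_label_value, render_sample) and the value 12.5 are hypothetical and used for illustration only; they are not part of the deployment.

```python
def escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_sample(name: str, labels: dict, value: float) -> str:
    """Render one sample as a single line of Prometheus text exposition format."""
    if not labels:
        return f"{name} {value}"
    # Labels are sorted only to make the output deterministic.
    pairs = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
    )
    return f"{name}{{{pairs}}} {value}"


if __name__ == "__main__":
    # Illustrative value; the metric and label names come from this guide.
    print(render_sample("kpm_drb_ue_thp_dl", {"e2node_nb_id": "50"}, 12.5))
    # prints: kpm_drb_ue_thp_dl{e2node_nb_id="50"} 12.5
```

A real exporter would normally rely on a client library rather than formatting lines by hand; the sketch only shows the shape of the data.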

High-Level Architecture

The following diagram illustrates the simple, centralized data flow within the monitoring stack:

The process begins with a diverse set of Data Sources, which provide a complete picture of the system's health and performance. These include Kubernetes and energy metrics exposed as Prometheus-formatted endpoints, as well as low-level RAN metrics produced directly by the xApps.

Once this data is collected and stored by the monitoring stack, it can be accessed through a variety of Applications and Tools. These range from interactive Grafana dashboards for visualization, to programmatic access for rApps, to direct queries via the HTTP API.
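
As a rough illustration of direct access via the HTTP API, the sketch below builds an instant-query URL for the Prometheus-compatible /api/v1/query endpoint that VictoriaMetrics exposes, and parses a vector response. The base URL is an assumption (8428 is the default vmsingle HTTP port, but the address in your deployment may differ), and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode

# Assumption: vmsingle is reachable here. Adjust to your deployment.
BASE_URL = "http://localhost:8428"


def instant_query_url(expr: str, base_url: str = BASE_URL) -> str:
    """Build the URL for an instant query against /api/v1/query."""
    params = urlencode({"query": expr})
    return f"{base_url}/api/v1/query?{params}"


def parse_vector(body: str) -> list:
    """Turn a Prometheus-style instant-query response into (labels, value) pairs."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    # Each result carries its labels in "metric" and a [timestamp, "value"] pair.
    return [
        (item["metric"], float(item["value"][1]))
        for item in payload["data"]["result"]
    ]


if __name__ == "__main__":
    print(instant_query_url('kpm_drb_ue_thp_dl{e2node_nb_id="50"}'))
```

The resulting URL can be fetched with any HTTP client, for example urllib.request.urlopen, and the response body passed to parse_vector.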

Deprecated MySQL Path

The diagram shows a legacy data path where RAN metrics are also sent to a MySQL database. This path is deprecated and is maintained for backward compatibility only. It is strongly recommended to use the primary VictoriaMetrics data flow for all new applications and queries, as the MySQL path will be removed in a future release.

Data Model

VictoriaMetrics organizes data using the same powerful and widely adopted data model as Prometheus. All data is stored as time-series: streams of timestamped values belonging to the same metric and the same set of labeled dimensions.

A unique time-series is identified by the combination of its metric name and its labels. Each data point within a series consists of:

  • A metric name: This is the primary identifier for what is being measured. For example, kpm_drb_ue_thp_dl or kubelet_running_pods.
  • A set of key-value pairs (labels): These are dimensions that make the data rich and queryable. For instance, a metric could have labels like e2node_nb_id="50" or pod="name" to distinguish it from other base stations or pods. This labeling is what allows for powerful, multi-dimensional filtering and aggregation.
  • A value: The actual numeric measurement.
  • A timestamp: The time at which the measurement was recorded.
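
The four parts listed above can be sketched in a few lines of Python. This is only an illustration of the model, not how VictoriaMetrics stores data internally; the Sample type, the helper names, and the values and timestamps are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Sample(NamedTuple):
    name: str         # metric name, e.g. "kpm_drb_ue_thp_dl"
    labels: dict      # key-value dimensions, e.g. {"e2node_nb_id": "50"}
    value: float      # the numeric measurement
    timestamp: float  # when the measurement was recorded (Unix seconds)


def series_key(sample: Sample) -> tuple:
    """A series is identified by its metric name plus its full label set."""
    return (sample.name, frozenset(sample.labels.items()))


def group_by_series(samples: list) -> dict:
    """Group samples into series, each a list of (timestamp, value) points."""
    series = defaultdict(list)
    for sample in samples:
        series[series_key(sample)].append((sample.timestamp, sample.value))
    return dict(series)


if __name__ == "__main__":
    samples = [
        Sample("kpm_drb_ue_thp_dl", {"e2node_nb_id": "50"}, 10.0, 1700000000),
        Sample("kpm_drb_ue_thp_dl", {"e2node_nb_id": "50"}, 11.0, 1700000015),
        Sample("kpm_drb_ue_thp_dl", {"e2node_nb_id": "51"}, 7.0, 1700000000),
    ]
    # Same metric name, two different label sets: two distinct series.
    print(len(group_by_series(samples)))  # prints: 2
```

Changing any label value, as in the third sample, produces a new series even though the metric name is unchanged.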

For more detailed information on the Prometheus Data Model and different types of metrics, refer to this VictoriaMetrics blog post.

Where to Go Next

Now that you have an overview of the architecture and data model, you can dive deeper: