Anomaly Detection feature in FusionReactor Cloud

Monitoring an application is essential for delivering a high-quality product and a positive user experience. Collecting large numbers of application metrics is not enough on its own, however. Software teams need a way to derive actionable insights from those metrics so they can promptly address any issues their users encounter.

Enter the RED method.

This new feature enhances observability by tracking the probability of anomalies in a service's critical metrics, known as RED (Rate, Errors, and Duration). It lets you monitor these key metrics more closely and sends notifications when their anomaly probabilities exceed set thresholds, offering a more proactive approach to service management.

Core metrics: The RED framework

Central to FusionReactor Cloud's new anomaly detection feature are the RED metrics:

| Metric | Description |
| --- | --- |
| Rate | Measures the rate at which requests are made to your services. |
| Errors | Tracks the occurrence and type of errors within your services. |
| Duration | Records the time taken for each service request to complete. |

Functionality of Anomaly Detection

| Function | Description |
| --- | --- |
| Gathering data | For each service, FusionReactor Cloud runs three additional Prometheus queries to collect data on the RED metrics. |
| Anomaly analysis | The time series data from these metrics is examined using an 'iforest' (isolation forest) algorithm, which calculates the likelihood that the latest RED data is anomalous. |
| Data display | Grafana dashboards let users observe both the RED metrics and their anomaly probabilities. |
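
The anomaly analysis step can be illustrated with a minimal isolation forest sketch in plain Python. This is not FusionReactor Cloud's implementation: it assumes 'iforest' refers to the isolation forest algorithm, it works on a single one-dimensional series, and the function names (`anomaly_score`, `build_tree`) and parameters (`n_trees`, `sample_size`) are illustrative only. The result is the standard isolation forest score between 0 and 1; how the product maps scores to the probabilities shown in its dashboards is not described here.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful binary search tree lookup."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(values, depth, max_depth, rng):
    """Recursively partition 1-D values using random split points."""
    if depth >= max_depth or len(values) <= 1:
        return {"size": len(values)}
    lo, hi = min(values), max(values)
    if lo == hi:
        return {"size": len(values)}
    split = rng.uniform(lo, hi)
    return {
        "split": split,
        "left": build_tree([v for v in values if v < split], depth + 1, max_depth, rng),
        "right": build_tree([v for v in values if v >= split], depth + 1, max_depth, rng),
    }


def path_length(tree, x, depth=0):
    """Number of splits needed to isolate x, adjusted for unsplit leaves."""
    if "split" not in tree:
        return depth + c_factor(tree["size"])
    branch = tree["left"] if x < tree["split"] else tree["right"]
    return path_length(branch, x, depth + 1)


def anomaly_score(history, latest, n_trees=100, sample_size=64, seed=0):
    """Score the latest value against a list of historical values (0 to 1)."""
    if len(history) < 2:
        raise ValueError("history needs at least two values")
    rng = random.Random(seed)
    sample_size = min(sample_size, len(history))
    max_depth = math.ceil(math.log2(sample_size))
    total = 0.0
    for _ in range(n_trees):
        tree = build_tree(rng.sample(history, sample_size), 0, max_depth, rng)
        total += path_length(tree, latest)
    return 2.0 ** (-(total / n_trees) / c_factor(sample_size))
```

Values that sit far from the history are isolated in fewer random splits, so they get a shorter average path and a score closer to 1, while typical values score noticeably lower.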

Configurable alert system

FusionReactor Cloud allows you to set up alerts for anomalies detected in your system, providing an opportunity to proactively manage and mitigate potential service issues.

Target audience

This documentation is intended for technical users and managers involved in the observability of their systems. No specialist knowledge of anomaly detection is required: the feature is designed to be user-friendly, with intuitive controls.

Enabling Anomaly Detection

Accessing the Anomaly Detection configuration

To enable anomaly detection in FusionReactor Cloud, follow these steps:

Step 1: Use your web browser to navigate to the Alerting page in FusionReactor Cloud.

Step 2: Select the Anomaly Detection tab to access the Anomaly Detection configuration page.

Info

FR Cloud automatically detects your services and lists them on the Anomaly Detection configuration page.

To configure anomaly detection for each service:

| Setting | Description |
| --- | --- |
| Enable/Disable Anomaly Detection | Toggle anomaly detection on or off for each listed service. |
| Manage subscriptions | Choose one or more subscriptions to determine where alerts for detected anomalies will be sent. |

Detailed RED query settings

By expanding a service row, you can access and adjust settings for the RED metrics:

| Setting | Description |
| --- | --- |
| Active | Activate or deactivate anomaly detection for the Rate, Errors, and Duration queries individually. |
| Time Range | Choose the span of historical data used to calculate anomaly probabilities for new data points. The time range can be set from one hour up to one week prior to the current time. The selected period functions as a rolling window, in which new data points continuously replace the oldest ones. |
| Threshold | Tailor the sensitivity of the anomaly detection system by specifying a probability threshold. The typical range for this setting is between 80% and 100%. A higher threshold reports fewer anomalies, and only those the system is more certain about. |
| For (duration) | Set how long the threshold must be exceeded before an anomaly is reported. This determines how long an anomaly must persist before it triggers an alert. |
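
The rolling window behind the Time Range setting can be pictured with a short Python sketch. It is only an illustration of the concept, not how FusionReactor Cloud stores its data: the `RollingWindow` class and its methods are hypothetical names, and the one hour to one week limits are taken from the setting described above.

```python
from collections import deque
from datetime import datetime, timedelta


class RollingWindow:
    """Keep only the samples that fall inside the configured time range."""

    def __init__(self, time_range):
        # The guide states the range can be set from one hour to one week.
        if not timedelta(hours=1) <= time_range <= timedelta(weeks=1):
            raise ValueError("time range must be between one hour and one week")
        self.time_range = time_range
        self.samples = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        """Append a new sample and drop samples older than the time range."""
        self.samples.append((timestamp, value))
        cutoff = timestamp - self.time_range
        while self.samples and self.samples[0][0] < cutoff:
            self.samples.popleft()

    def values(self):
        """Return the values currently inside the window, oldest first."""
        return [value for _, value in self.samples]
```

With a one-hour window and a sample every 15 minutes, the window settles at five samples: each new sample pushes out the one that has just fallen more than an hour behind.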

[Screenshot: example of the Anomaly Detection configuration interface]

Note

The actual interface may vary slightly, and you should consult the latest version of FR Cloud for the most up-to-date information.

Configuring Anomaly Detection for services

Service detection table

The services identified by the system are displayed in a table format. This table includes features that allow users to:

| Feature | Description |
| --- | --- |
| Filter services | Search for services by name. |
| Sort data | Organize the information by different column values. |
| Reload function | Reinstate the default settings, allowing the system to rediscover any new services or reintegrate services that were previously removed. |

Access to in-depth configuration

For each service listed, the following features are available:

| Feature | Description |
| --- | --- |
| Expand/Collapse Button | Situated on the left, this button lets you view or hide additional configuration settings. |
| Service Name | Shown next to the expand/collapse button. |
| Anomaly Detection Toggle | In the Detect Anomalies column, next to the service name, a switch enables or disables anomaly detection for all three key metrics: Rate, Errors, and Duration. |
| Activation of Individual Metrics | Click a RED indicator to turn the respective metric query on or off. Color coding shows the status: blue indicates active and gray indicates inactive. Anomaly probabilities are calculated only for active metrics. |
| Alert Status for Anomalies | Displays the current state of the RED anomaly alerts for the service. An orange label means the associated alert rule is pending. A red label means an alert has been triggered. |
| Subscriptions | Select from predefined subscriptions to determine where alerts for anomalies should be sent. |
| Actions | Located at the far right, this column provides additional actions. The current options are a link to open the Anomaly Alert Dashboard and Delete, which removes a service from the list. |

[Screenshot: example of the current anomaly alert status]

Customization of individual metrics

Each service within the system allows for separate configuration of its RED metrics. Default settings are provided, but they can be tailored for specific needs. In the detailed view of a service, you'll find tabs for Rate, Errors, and Duration, each offering unique settings for their metric queries (Active and Time Range) and notification parameters (Threshold and Duration).

| Setting | Description |
| --- | --- |
| Metric Activation Switch | Like the RED indicator, this control turns the metric query on or off. |
| Choosing Time Range | Select the time frame of historical data used for anomaly detection, from one hour to one week. The window rolls forward continuously as new data arrives. |
| Anomaly Threshold Adjustment | The slider sets the anomaly probability threshold, with a range of 80% to 100%. |
| Setting Alert Duration | Define how long an anomaly must exceed your chosen threshold to trigger an alert. For example, an anomaly is recognized when its probability stays at or above a set level (such as 95%) for a predetermined time (such as 4 minutes). |
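
The interplay between the threshold and the alert duration can be sketched in Python using the example values above (95% for 4 minutes). This is a hedged illustration rather than the product's alerting logic: the `alert_state` function, its parameters, and the state names `inactive`, `pending`, and `firing` are assumptions chosen to mirror the pending and triggered labels described in this guide.

```python
from datetime import datetime, timedelta


def alert_state(samples, threshold=0.95, hold_for=timedelta(minutes=4)):
    """Return 'inactive', 'pending', or 'firing' for the newest sample.

    samples: list of (timestamp, probability) pairs, oldest first.
    """
    breach_start = None
    for timestamp, probability in samples:
        if probability >= threshold:
            # Remember when the current unbroken breach began.
            if breach_start is None:
                breach_start = timestamp
        else:
            # Any dip below the threshold resets the timer.
            breach_start = None
    if breach_start is None:
        return "inactive"
    elapsed = samples[-1][0] - breach_start
    return "firing" if elapsed >= hold_for else "pending"
```

A probability that has been at or above 95% for only two minutes is reported as `pending`; once it has held for four minutes the state becomes `firing`, and a single sample below the threshold returns it to `inactive`.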

Best practice for Anomaly Detection configuration

Understanding service-specific metric profiles

Each service you monitor with FR Cloud has its own profile for the Rate, Errors, and Duration metrics. The profile varies with the time range selected for calculating anomaly probabilities, which can be from one hour to one week. For effective anomaly detection, tailor the configuration of each service by considering the following:

| Setting | Description |
| --- | --- |
| Time frame selection | Choose an appropriate time frame that reflects the service's operational patterns for anomaly probability calculations. |
| Threshold settings | Set suitable threshold levels for anomaly probabilities to minimize false positives and negatives. |
| Duration for threshold exceedance | Configure the minimum duration that the threshold level must be exceeded for an anomaly to be reported. This filters out transient spikes that may not indicate a systemic issue. |

Tips for optimizing Anomaly Detection

| Feature | Description |
| --- | --- |
| Historical data analysis | To determine the optimal Threshold and For values for anomaly detection, it is often useful to analyze the probabilities of a RED metric over an extended period, such as one week or even thirty days. This historical perspective can help identify the appropriate settings for a shorter period, like one day, ensuring that the detection is sensitive to true anomalies while ignoring normal fluctuations. |
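
One way to carry out this kind of historical analysis is to replay past anomaly probabilities against several candidate settings and count how many alerts each would have raised. The Python sketch below assumes you have exported the probabilities as an evenly spaced list; `count_alerts`, `compare_settings`, and the `for_samples` parameter (the For duration expressed as a number of consecutive samples) are hypothetical names, not part of FR Cloud.

```python
from itertools import product


def count_alerts(probabilities, threshold, for_samples):
    """Count the separate alerts a (threshold, for) pair would have raised.

    probabilities: evenly spaced anomaly probabilities, oldest first.
    for_samples: consecutive samples at or above the threshold needed to alert.
    """
    if for_samples < 1:
        raise ValueError("for_samples must be at least 1")
    alerts = 0
    run = 0
    for probability in probabilities:
        run = run + 1 if probability >= threshold else 0
        if run == for_samples:  # count each sustained breach only once
            alerts += 1
    return alerts


def compare_settings(probabilities, thresholds, for_values):
    """Map every (threshold, for_samples) combination to its alert count."""
    return {
        (threshold, for_samples): count_alerts(probabilities, threshold, for_samples)
        for threshold, for_samples in product(thresholds, for_values)
    }
```

Comparing the counts side by side shows which combinations would have stayed quiet during normal fluctuations and which would still have caught the incidents you already know about.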

The importance of tailoring settings

| Feature | Description |
| --- | --- |
| Uniform settings aren't always effective | Services differ in their operational conditions and behavioral patterns, so anomaly detection settings need to be adjusted individually. |
| Need for regular review and modification | Anomaly detection in FR Cloud is an active process. Regularly reviewing and fine-tuning these settings is vital to keep them aligned with changes in service behavior and usage patterns. |

Understanding and addressing these aspects will greatly improve the precision and effectiveness of FR Cloud's anomaly detection capabilities.

