Collector failover principle

A collector can be configured to act as a failover collector for another collector.

Normal operation

In normal operation the main collector is collecting data and is “healthy”. The main collector makes its health status available to the failover collector. As long as the failover collector can verify that the main collector is healthy, it remains dormant.

Main collector is down

In this scenario the failover collector has not been able to verify the health of the main collector within a predefined time frame, because either the main collector or the machine it runs on has failed. The failover collector then becomes active and starts collecting data on the failover server.

Main collector is unhealthy but reachable

In this scenario the failover collector still receives a health status from the main collector, but the status has been bad for a predefined time frame. The failover collector then becomes active and starts collecting data on the failover server.

Network down

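In this scenario a network issue prevents communication between the main collector and the failover collector. The failover collector cannot verify the health of the main collector and becomes active, while the main collector keeps running, so both collectors collect data for the configured measurements.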
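The four scenarios above can be summarised from the failover collector's point of view. The following is a minimal sketch, not the product's implementation: the names `Observation`, `observe` and `failover_activates` are hypothetical, and the predefined time frame is ignored to keep the mapping readable. It illustrates why a network outage leads to both collectors collecting: from the failover side, a missing health status looks the same whether the main collector or the network has failed.

```python
from enum import Enum


class Observation(Enum):
    """What the failover collector sees when it checks the main collector."""
    HEALTHY = "healthy"          # a status was received and it is good
    UNHEALTHY = "unhealthy"      # a status was received but it is bad
    UNREACHABLE = "unreachable"  # no status was received at all


def observe(main_running: bool, main_healthy: bool, network_up: bool) -> Observation:
    """Map the real situation to what the failover collector can observe."""
    if not main_running or not network_up:
        # A stopped main collector and a broken network are indistinguishable
        # from the failover side: in both cases no health status arrives.
        return Observation.UNREACHABLE
    return Observation.HEALTHY if main_healthy else Observation.UNHEALTHY


def failover_activates(obs: Observation) -> bool:
    """The failover stays dormant only while it can verify good health."""
    return obs is not Observation.HEALTHY


scenarios = {
    "normal operation": observe(True, True, True),
    "main collector down": observe(False, False, True),
    "main collector unhealthy": observe(True, False, True),
    "network down": observe(True, True, False),
}
for name, obs in scenarios.items():
    print(f"{name}: {obs.value}, failover activates: {failover_activates(obs)}")
```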

Configuration

To configure failover for a collector, create a new collector, install it on the target machine, and perform the initial registration. Once the collector has been registered, go to its settings page. After you toggle the failover switch, a new panel called “Failover settings” appears with the options described below.

Main collector

Select the main collector for which you want the collector to be the failover. Only collectors of the same type can be selected.

Port

This is the port on which the main collector listens to serve its health status to the failover collector. Enter a valid TCP port number; if your collector runs without administrator or root privileges, the port must be greater than 1023. Make sure your firewall configuration allows connections on this port from the host running the failover collector to the host running the main collector.

Host

This is the host address the failover collector connects to in order to get the health status of the main collector. It can be an IP address or a resolvable hostname.
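Before saving the port and host settings, it can help to confirm that the chosen port is usable and that the firewall lets the failover host reach the main host. The sketch below is an illustration only: the helper names are hypothetical, and it tests plain TCP reachability, not the collector's actual health protocol, which is not described here. Run it on the failover host while the main collector is listening.

```python
import socket


def is_valid_unprivileged_port(port: int) -> bool:
    """True for TCP ports that do not need administrator or root privileges."""
    return 1023 < port <= 65535


def can_reach(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Try to open a TCP connection to host:port and report whether it worked.

    Returns False on DNS failure, refused connection, or timeout, which are
    the typical symptoms of a wrong hostname or a blocking firewall.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False
```

For example, `can_reach("main-collector.example.com", 8500)` would check a hypothetical main collector host and port; substitute the values entered in the “Failover settings” panel.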

Timeout

This is the number of seconds the failover collector will wait before becoming “active” after it has deemed the main collector to be unhealthy.

Stop delay

This is the number of seconds the failover collector will wait before returning to the “idle” state after it has deemed the main collector to be healthy.
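The interaction of the timeout and stop delay settings can be pictured as a small two-state machine. This is a sketch under stated assumptions, not the collector's real logic: the class `FailoverTimer` is hypothetical, an unreachable main collector is treated as unhealthy, and a single contrary health result is assumed to restart the waiting period.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailoverTimer:
    timeout_s: float      # "Timeout": wait before becoming active
    stop_delay_s: float   # "Stop delay": wait before returning to idle
    active: bool = False
    _since: Optional[float] = None  # start of the streak arguing for a switch

    def update(self, main_healthy: bool, now: float) -> bool:
        """Feed one health-check result taken at time `now` (in seconds).

        Returns True while the failover collector should be active.
        """
        # A result argues for a state change when it is bad while idle,
        # or good while active.
        wants_switch = main_healthy if self.active else not main_healthy
        if not wants_switch:
            self._since = None  # assumption: the waiting period starts over
            return self.active
        if self._since is None:
            self._since = now
        wait = self.stop_delay_s if self.active else self.timeout_s
        if now - self._since >= wait:
            self.active = not self.active
            self._since = None
        return self.active


timer = FailoverTimer(timeout_s=30, stop_delay_s=60)
print(timer.update(main_healthy=False, now=0))    # still idle, waiting starts
print(timer.update(main_healthy=False, now=30))   # timeout elapsed, now active
print(timer.update(main_healthy=True, now=40))    # healthy again, still active
print(timer.update(main_healthy=True, now=100))   # stop delay elapsed, idle
```

A longer stop delay keeps the failover collecting through brief recoveries of the main collector, at the cost of a longer period in which both may be collecting.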