shaman.conf

NAME

shaman.conf − local shaman and shaman-monitor configuration file.
shaman-config − global shaman-monitor configuration file.

SYNOPSIS

/etc/shaman/shaman.conf

shaman -c cluster_name get-config [-X, --xml]

shaman -c cluster_name set-config [key=val …]

DESCRIPTION

shaman.conf is the local configuration file for the shaman-monitor daemon and the shaman command-line tool. It affects the settings of shaman-monitor running on a particular node. In addition, a number of cluster-wide configuration parameters used by the shaman-monitor daemon are defined in the global configuration file; they can be viewed with the shaman get-config command and modified with the shaman set-config command.

PARAMETER=value

All parameter names and values are case-sensitive; extra spaces are not allowed. All unrecognized lines and lines starting with # are ignored.

LOCAL PARAMETERS

The following parameters can be specified in the shaman.conf configuration file.
CLUSTER_NAME=name

Specifies the name of the cluster to operate on.

LOG_LEVEL=level

Specifies the maximum verbosity level for the messages that are written to the log. Messages with a verbosity level higher than the specified one are not logged. The verbosity level is a numeric value from the following list:

0 - Error messages

1 - Warning messages

2 - Informational messages (default)

4 - Debug diagnostic messages

Higher levels of verbosity are intended for debugging purposes.
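
For illustration, a minimal local configuration might look as follows; the cluster name cluster1 is only a placeholder:

    # /etc/shaman/shaman.conf
    # Local settings for shaman-monitor on this node
    CLUSTER_NAME=cluster1
    # Log informational messages and below (the default)
    LOG_LEVEL=2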

GLOBAL PARAMETERS

Shaman nodes cooperate according to the master/slave model. One node acts as the master while all the others act as slaves. Both the master and slave nodes periodically check each other’s state. If one or more slave nodes go down, the master node relocates their resources to the live slave nodes. If the master node goes down, one of the slave nodes becomes the master node.

The following parameters can be specified in the global configuration file with shaman set-config.
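
For example, assuming a cluster named cluster1 (a placeholder), the global configuration could be inspected and changed like this; the values shown are purely illustrative:

    # Print the current cluster-wide configuration
    shaman -c cluster1 get-config
    # Change one or more cluster-wide parameters
    shaman -c cluster1 set-config RELOCATION_SKIP_THRESHOLD=3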

NODE MANAGEMENT PARAMETERS
LOCK_TIMEOUT=seconds

Sets the maximum time a node may stay disconnected from the cluster before it is considered crashed or, in the case of the master, loses its master status. In the current implementation this parameter is coupled with LEASE_CHECK_TIMEOUT_FOR_MASTER and LEASE_CHECK_TIMEOUT_FOR_SLAVE; keep those two parameters much smaller than LOCK_TIMEOUT for it to remain meaningful. The default value of LOCK_TIMEOUT is 60 seconds.

LEASE_CHECK_TIMEOUT_FOR_MASTER=seconds

Sets how frequently the master checks for crashed slaves. Note that this is only the check interval; the actual timeout after which a crashed or disconnected slave is considered crashed is LOCK_TIMEOUT. The default value is 10 seconds.

LEASE_CHECK_TIMEOUT_FOR_SLAVE=seconds

Sets how frequently the slaves check for a crashed master. Note that this is only the check interval; the actual timeout after which a crashed or disconnected master is considered crashed is LOCK_TIMEOUT. The default value is 10 seconds.

LEASE_LOST_ACTION=action

Defines the action a slave performs when it detects that the master has declared it crashed because of a temporary loss of network connectivity. There are few ways to recover in this case; the supported actions are crash, halt, reboot, and none (do nothing).

CLUSTER_MOUNTPOINT_DEAD_ACTION=action

Defines the action to perform when shaman-monitor detects that the cluster mount point is no longer functioning properly for some reason. The supported actions are: crash, halt, reboot, none.

RELOCATION_SKIP_THRESHOLD=number

Sets the threshold for the number of simultaneously crashed nodes. If the number of simultaneously crashed nodes becomes greater than or equal to the threshold, the master stops relocating resources from the crashed nodes. When the number of simultaneously crashed nodes drops below the threshold, the master automatically resumes relocating resources from the crashed nodes. The threshold can be useful when multiple nodes are being rebooted at the same time. Without it, the master would start relocating resources from all the rebooting nodes. The threshold is set to 3 by default and must be 2 or greater. For clusters with only 3 nodes, the threshold is automatically set to 2.
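
As a sketch, the node management timeouts could be tuned together, keeping both lease check timeouts much smaller than LOCK_TIMEOUT; cluster1 and the values below are examples only:

    shaman -c cluster1 set-config LOCK_TIMEOUT=60 \
        LEASE_CHECK_TIMEOUT_FOR_MASTER=10 \
        LEASE_CHECK_TIMEOUT_FOR_SLAVE=10 \
        LEASE_LOST_ACTION=reboot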

RESOURCE MANAGEMENT PARAMETERS
POOL_CHECK_TIMEOUT=seconds

Sets the interval at which shaman-monitor checks the Pool for resources scheduled for relocation. The default value is 30 seconds.

RESOURCE_RELOCATION_MODE=mode[,mode…]

Defines a sequence of algorithms (modes) used for resource relocation on hardware node failure. At least one mode must be specified. Multiple modes must be separated with commas. On hardware node failure, relocation using the first specified mode is attempted. If unsuccessful, the next specified mode is attempted and so on. If relocation using the last specified mode is unsuccessful, the resources are left on the failed hardware node. The following resource relocation modes are supported:

round-robin - Each resource from the failed hardware node is relocated to another node, which is chosen using the round-robin algorithm. In general, resources are relocated to different hardware nodes.

spare - All resources from the failed hardware node are relocated to a 'spare' node. A spare node is a hardware node that is registered in the cluster and has no resources stored on it.

drs - All resources from the failed hardware node are relocated using an external DRS daemon.

The default sequence is “drs, round-robin”.
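
For instance, to try relocation to a spare node first and fall back to the round-robin algorithm, one could run something like the following; cluster1 is a placeholder cluster name:

    shaman -c cluster1 set-config RESOURCE_RELOCATION_MODE=spare,round-robin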

WATCHDOG PARAMETERS
WATCHDOG_TIMEOUT=seconds

Sets the interval for the watchdog timer. The watchdog timer is responsible for performing the action defined in WATCHDOG_ACTION. shaman-monitor activates the watchdog timer on start-up and periodically resets it to the specified value. If shaman-monitor fails to reset the timer, the watchdog timer counts down to zero and performs the defined action. Setting the interval to zero disables the watchdog timer. The minimum watchdog timer interval that can be set is 10 seconds. The default value is 60 seconds.

WATCHDOG_ACTION=action[,action…]

Defines a sequence of actions to perform after the watchdog timer expires (this happens when the vstorage cluster is unavailable). When the watchdog timer expires, the first specified action is attempted. If it is unsuccessful, the next specified action is attempted, and so on. If the last specified action is unsuccessful, the action specified in the /sys/kernel/watchdog_action file is performed. At least one action must be specified. Multiple actions must be separated with commas. Available actions are listed in the /sys/kernel/watchdog_available_actions file. The default sequence is “netfilter, reboot”.

netfilter - blocks inbound and outbound network traffic except for the ssh and network file system ports. This prevents the node's daemons from reporting outdated node statistics when cluster connectivity has been lost for a long time, that is, when the node is blocked in its network storage operations and has not yet recognized that the master has long since declared it dead and is relocating, or has already relocated, its resources.

The rules for opening and closing the firewall are located in the scripts init_wdog, comm_wdog, fini_wdog in the default shaman scripts directory (see shaman-scripts(8)).
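
As an illustration, one might first check which actions the node supports and then configure the watchdog; cluster1 and the values are examples only:

    # List the watchdog actions available on this node
    cat /sys/kernel/watchdog_available_actions
    # Arm a 60-second watchdog that blocks network traffic first, then reboots
    shaman -c cluster1 set-config WATCHDOG_TIMEOUT=60 WATCHDOG_ACTION=netfilter,reboot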

FILES

/etc/shaman/shaman.conf

The default location of the local configuration file.

NOTES

Normally, modifications to the global configuration file come into effect within LEASE_CHECK_TIMEOUT seconds (shaman-monitor re-reads the configuration file at that interval), except for cases when the file is cached by applications.

SEE ALSO

shaman-scripts(8)

COPYRIGHT

Copyright © 2013-2017 R-Platforma LLC, All rights reserved. Copyright © 2017-2019 R-Platforma LLC, All rights reserved.