Watchdog sends its own pings and records the round-trip time to different components and clustered connections. If the time since the last transfer on a connection exceeds the configured limits, watchdog will commence closing that stale connection. Here is a breakdown:
- A check of the connection(s) is performed on every watchdog-delay interval.
- During this check, one of two things occurs:
  - If the last transfer time exceeds max-inactivity-time, a stop service command is given to terminate the connection and broadcast unavailable presence.
  - If the last transfer time is lower than max-inactivity-time but exceeds watchdog-timeout, watchdog will try to send a ping (of watchdog-ping-type). This ping may be one of two varieties (set in config.tdsl):
    - WHITESPACE ping, which will yield the time of the last data transfer in any direction.
    - XMPP ping, which will yield the time of the last received XMPP stanza.
- If the 2nd option is true, the connection will remain open, and another check will begin after the watchdog-delay time has expired.
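The decision made at each check can be sketched in a few lines. This is only an illustration of the rules listed above, not Tigase's actual implementation; the `Action` enum and the `watchdog_action` function are hypothetical names, and all values are assumed to be in milliseconds.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"  # connection is fresh enough, do nothing
    PING = "ping"  # idle past watchdog-timeout, probe the connection
    STOP = "stop"  # idle past max-inactivity-time, close the connection


def watchdog_action(idle_ms: int, watchdog_timeout_ms: int, max_inactivity_ms: int) -> Action:
    """Decide what one watchdog check does for one connection (hypothetical helper)."""
    if idle_ms > max_inactivity_ms:
        # Stop the service and broadcast unavailable presence.
        return Action.STOP
    if idle_ms > watchdog_timeout_ms:
        # Below the inactivity limit but past the timeout: send a ping.
        return Action.PING
    return Action.NONE


# A connection idle for 140 s with a 120 s timeout and a 180 s inactivity limit.
print(watchdog_action(140_000, 120_000, 180_000))  # Action.PING
```

Checking the inactivity limit first means a connection that is past both thresholds is closed rather than pinged.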
For example, let's draw this out and get a visual representation:
    -----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+--  (1)
         |     |     |     |     |     |     |     |     |     |     |
    ---+----------------------------------------------------------------  (2)
       1 2     3     4     5     6
    ---*-*-----*-----*-----*-----------                                   (3)

(1) This line represents how often the check is performed. Each '+' marks one check, performed every 60 seconds.
(2) This line is client activity; here the client sent a message at 40 seconds (marked by '+').
(3) This line represents the watchdog logic, with timeout at 120 seconds and max inactivity timeout at 180 seconds:

    'watchdog-timeout' = 120000
    c2s {
        'max-inactivity-time' = '180000'
    }
How the check is performed:
- 40 seconds - at this point the last transfer or last received time is updated.
- 60 seconds - watchdog runs - it checks the connection and says: OK, last client transfer was 20s ago, which is lower than both inactivity (so don't disconnect) and timeout (so don't send ping).
- 120 seconds - 2nd check - last transfer was 80s ago - still lower than both values - do nothing.
- 180 seconds - 3rd check - last transfer was 140s ago - lower than inactivity but greater than timeout - ping is sent.
- 240 seconds - 4th check - last transfer was 200s ago - client still hasn't responded; watchdog compares the idle time to max-inactivity-time and finds that it is greater, so the connection is terminated.
- 300 seconds - watchdog runs again, but since the connection was terminated there is no XMPP session to check for that particular client.
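The timeline can also be replayed with a small simulation. This is a sketch under stated assumptions: the client's last transfer is taken to be at 40 seconds (as marked in the diagram), the client never answers the ping, and `simulate` with its parameters is a hypothetical helper, not part of Tigase.

```python
def simulate(last_transfer_s, delay_s=60, timeout_s=120, max_inactivity_s=180, checks=5):
    """Replay watchdog checks for one idle client; returns (time, idle, action) tuples."""
    events = []
    for n in range(1, checks + 1):
        now = n * delay_s
        idle = now - last_transfer_s
        if idle > max_inactivity_s:
            events.append((now, idle, "terminate"))
            break  # the session is gone, later checks have nothing to inspect
        if idle > timeout_s:
            # Assumption: the ping goes unanswered, so idle time keeps growing.
            events.append((now, idle, "ping"))
        else:
            events.append((now, idle, "nothing"))
    return events


for now, idle, action in simulate(last_transfer_s=40):
    print(f"{now:>3}s  idle {idle:>3}s  -> {action}")
# 60s: nothing, 120s: nothing, 180s: ping, 240s: terminate
```

Changing `timeout_s` or `max_inactivity_s` shows how quickly a silent client is pinged and then dropped for a given watchdog-delay.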
Tip
It is possible that a broken connection is detected while the ping is being sent, in which case the connection is severed at step 4 instead of waiting for step 5. Note that this MAY cause the JVM to throw an exception.
Note
Global settings may not be ideal for every setup. Since each component has its own setting for max-inactivity-time, you may find it necessary to design custom watchdog settings, or edit the inactivity times to better suit your needs. Below is a short list of components with their default settings:
    bosh {
        'max-inactivity-time' = 600L
    }
    c2s {
        'max-inactivity-time' = 86400L
    }
    'cl-comp' {
        'max-inactivity-time' = 180L
    }
    s2s {
        'max-inactivity-time' = 900L
    }
    ws2s {
        'max-inactivity-time' = 86400L
    }
Important
Again, remember: for Watchdog to work properly, max-inactivity-time MUST be longer than the watchdog-timeout setting.
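A quick sanity check of this rule might look like the sketch below. The function name `check_watchdog_settings` and the sample values are illustrative assumptions only; the sketch also assumes both settings have already been converted to the same unit before comparison.

```python
def check_watchdog_settings(watchdog_timeout, max_inactivity_by_component):
    """Return components whose max-inactivity-time is NOT longer than watchdog-timeout."""
    return sorted(
        name
        for name, max_inactivity in max_inactivity_by_component.items()
        if max_inactivity <= watchdog_timeout
    )


# Illustrative values only, expressed in the same unit as the timeout.
settings = {"c2s": 180_000, "bosh": 100_000}
print(check_watchdog_settings(120_000, settings))  # ['bosh']
```

Any component reported here would be closed for inactivity before watchdog ever gets the chance to ping it.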