=head1 NAME

collectd-threshold - Documentation of collectd's I<Threshold plugin>

=head1 SYNOPSIS

 LoadPlugin "threshold"
 <Plugin "threshold">
   <Type "foo">
     WarningMin 0.00
     WarningMax 1000.00
     FailureMin 0.00
     FailureMax 1200.00
     Invert false
     Instance "bar"
   </Type>
 </Plugin>

=head1 DESCRIPTION

Starting with version C<4.3.0> I<collectd> has support for B<monitoring>. By
that we mean that the values are not only stored or sent somewhere, but that
they are judged and, if a problem is recognized, acted upon. The only action
the I<Threshold plugin> takes itself is to generate and dispatch a
I<notification>. Other plugins can register to receive notifications and
perform appropriate further actions.

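As an illustration of such a receiving plugin, the I<Exec plugin> can run a
program for each notification. The following is a minimal sketch only: the user
name and the script path are placeholders, and L<collectd-exec(5)> remains the
authoritative reference for this plugin.

 LoadPlugin "exec"
 <Plugin "exec">
   # Placeholder user and script path; replace with your own.
   NotificationExec "nobody" "/usr/local/bin/handle-notification.sh"
 </Plugin>
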
Since systems and what you expect them to do differ a lot, you can configure
I<thresholds> for your values freely. This gives you a lot of flexibility but
also a lot of responsibility.

Every time a value is out of range, a notification is dispatched. This means
that the idle percentage of your CPU needs to be less than the configured
threshold only once for a notification to be generated. The plugin does not
apply a moving average or similar smoothing, at least not yet.

Also, all values that match a threshold are considered to be relevant or
"interesting". As a consequence, collectd will issue a notification if they are
not received for B<Timeout> iterations. The B<Timeout> configuration option is
explained in section L<collectd.conf(5)/"GLOBAL OPTIONS">. If, for example,
B<Timeout> is set to "2" (the default) and some host sends its CPU statistics
to the server every 60 seconds, a notification will be dispatched after about
120 seconds. It may take a little longer because the timeout is checked only
once each B<Interval> on the server.

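The timing in this example corresponds to global settings such as the
following. This is a sketch with illustrative values, assuming the server uses
the same 60-second B<Interval> as the sending host:

 # Global options, placed outside of any <Plugin> block.
 Interval 60
 Timeout   2

Under these assumptions a value counts as missing after 2 * 60 = 120 seconds,
and the notification follows at the next check, which may be up to one further
B<Interval> later.
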
When a value comes within range again or is received after it was missing, an
"OKAY-notification" is dispatched.

=head1 CONFIGURATION

Here is a configuration example to get you started. Read below for more
information.

 LoadPlugin "threshold"
 <Plugin "threshold">
   <Type "foo">
     WarningMin 0.00
     WarningMax 1000.00
     FailureMin 0.00
     FailureMax 1200.00
     Invert false
     Instance "bar"
   </Type>

   <Plugin "interface">
     Instance "eth0"
     <Type "if_octets">
       FailureMax 10000000
       DataSource "rx"
     </Type>
   </Plugin>

   <Host "hostname">
     <Type "cpu">
       Instance "idle"
       FailureMin 10
     </Type>

     <Plugin "memory">
       <Type "memory">
         Instance "cached"
         WarningMin 100000000
       </Type>
     </Plugin>

     <Type "load">
       DataSource "midterm"
       FailureMax 4
       Hits 3
       Hysteresis 3
     </Type>
   </Host>
 </Plugin>

There are basically two types of configuration statements. The C<Host>,
C<Plugin>, and C<Type> blocks select the value for which a threshold should be
configured. The C<Plugin> and C<Type> blocks may be narrowed further using the
C<Instance> option. You can combine the blocks by nesting them, though they
must be nested in the above order: C<Host> may contain both C<Plugin> and
C<Type> blocks, C<Plugin> may only contain C<Type> blocks, and C<Type> may not
contain other blocks. If multiple blocks apply to the same value, the most
specific block is used.

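The "most specific block" rule can be sketched as follows. The host name
C<db1.example.org> and the limits are hypothetical and chosen only for
illustration:

 <Plugin "threshold">
   # Applies to the "load" type on every host ...
   <Type "load">
     DataSource "midterm"
     WarningMax 4
   </Type>

   # ... except on this host, where the more specific block is used.
   <Host "db1.example.org">
     <Type "load">
       DataSource "midterm"
       WarningMax 8
     </Type>
   </Host>
 </Plugin>

With this configuration a midterm load of 6 would be expected to raise a
B<WARNING> on all hosts except C<db1.example.org>.
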
The other statements specify the threshold to configure. They B<must> be
included in a C<Type> block. Currently the following statements are recognized:

=over 4

=item B<FailureMax> I<Value>

=item B<WarningMax> I<Value>

Sets the upper bound of acceptable values. If unset defaults to positive
infinity. If a value is greater than B<FailureMax> a B<FAILURE> notification
will be created. If the value is greater than B<WarningMax> but less than (or
equal to) B<FailureMax> a B<WARNING> notification will be created.

=item B<FailureMin> I<Value>

=item B<WarningMin> I<Value>

Sets the lower bound of acceptable values. If unset defaults to negative
infinity. If a value is less than B<FailureMin> a B<FAILURE> notification will
be created. If the value is less than B<WarningMin> but greater than (or equal
to) B<FailureMin> a B<WARNING> notification will be created.

=item B<DataSource> I<DSName>

Some data sets have more than one "data source". Interesting examples are the
C<if_octets> data set, which has received (C<rx>) and sent (C<tx>) bytes and
the C<disk_ops> data set, which holds C<read> and C<write> operations. The
system load data set, C<load>, even has three data sources: C<shortterm>,
C<midterm>, and C<longterm>.

Normally, all data sources are checked against a configured threshold. If this
is undesirable, or if you want to specify different limits for each data
source, you can use the B<DataSource> option to have a threshold apply only to
one data source.

=item B<Invert> B<true>|B<false>

If set to B<true> the range of acceptable values is inverted, i.e. values
between B<FailureMin> and B<FailureMax> (B<WarningMin> and B<WarningMax>) are
not okay. Defaults to B<false>.

=item B<Persist> B<true>|B<false>

Sets how often notifications are generated. If set to B<true> one notification
will be generated for each value that is out of the acceptable range. If set to
B<false> (the default) then a notification is only generated if a value is out
of range but the previous value was okay.

This applies to missing values, too: If set to B<true> a notification about a
missing value is generated once every B<Interval> seconds. If set to B<false>
only one such notification is generated until the value appears again.

=item B<Percentage> B<true>|B<false>

If set to B<true>, the minimum and maximum values given are interpreted as
percentage values, relative to the other data sources. This is helpful, for
example, for the "df" type, where you may want to issue a warning when less
than 5E<nbsp>% of the total space is available. Defaults to B<false>.

=item B<Hits> I<Value>

Sets the number of occurrences for which the threshold condition must be met
before a notification is dispatched. In other words, this is the number of
B<Interval>s during which the threshold must match before any notification is
sent.

=item B<Hysteresis> I<Value>

Sets the hysteresis value for the threshold. Hysteresis is a method to prevent
flapping between states: once a threshold has been matched, the failure (or
warning) state is kept until a newly received value falls below the threshold
condition (B<WarningMax>, B<FailureMin>, or any of the others) minus the
hysteresis value.

=item B<Interesting> B<true>|B<false>

If set to B<true> (the default), the threshold is treated as interesting: when
B<Timeout> values have been lost, a notification about the missing value is
dispatched. If set to B<false>, no such notification is ever dispatched for
this threshold.

=back

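The following sketch combines several of the statements described above. All
limits are illustrative, the C<foo> type is the same placeholder used in the
SYNOPSIS, and the data source names C<used> and C<free> for the C<df> type are
assumed here and should be checked against your F<types.db>:

 <Plugin "threshold">
   # Upper bounds: a shortterm load of 5 gives a WARNING (greater than
   # WarningMax, not greater than FailureMax), a load of 9 a FAILURE.
   # Persist repeats the notification for every out-of-range value.
   <Type "load">
     DataSource "shortterm"
     WarningMax 4
     FailureMax 8
     Persist true
   </Type>

   # Percentage: warn below 10 % free space, fail below 5 %.
   # DataSource restricts the check to the (assumed) "free" data source,
   # so that a nearly empty disk does not trip the lower bounds.
   <Type "df">
     DataSource "free"
     Percentage true
     WarningMin 10
     FailureMin 5
   </Type>

   # Invert: values between 10 and 20 are the ones that are not okay.
   <Type "foo">
     WarningMin 10
     WarningMax 20
     Invert true
   </Type>
 </Plugin>
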
=head1 SEE ALSO

L<collectd(1)>,
L<collectd.conf(5)>

=head1 AUTHOR

Florian Forster E<lt>octoE<nbsp>atE<nbsp>collectd.orgE<gt>