1 <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.1//EN" >
2 <book>
3 <title>Nagios Plug-in Developer Guidelines</title>
5 <bookinfo>
6 <authorgroup>
7 <author>
8 <affiliation>
9 <orgname>Nagios Plugins Development Team</orgname>
10 </affiliation>
11 </author>
12 </authorgroup>
14 <pubdate>2005</pubdate>
15 <title>Nagios plug-in development guidelines</title>
17 <revhistory>
18 <revision>
19 <revnumber>$Revision$</revnumber>
20 <date>$Date$</date>
21 </revision>
22 </revhistory>
24 <copyright>
25 <year>2000 - 2005</year>
26 <holder>Nagios Plugins Development Team</holder>
27 </copyright>
29 </bookinfo>
32 <preface id="preface"><title>Preface</title>
<para>The purpose of these guidelines is to provide a reference for
plug-in developers and to encourage the standardization of the
different kinds of plug-ins: C, shell, Perl, Python, etc.</para>
37 <para>Nagios Plug-in Development Guidelines Copyright (C) 2000-2005
38 (Nagios Plugins Team)</para>
40 <para>Permission is granted to make and distribute verbatim
41 copies of this manual provided the copyright notice and this
42 permission notice are preserved on all copies.</para>
44 <para>The plugins themselves are copyrighted by their respective
45 authors.</para>
46 </preface>
48 <article>
49 <section id="DevRequirements"><title>Development platform requirements</title>
50 <para>
Nagios plugins are developed to the GNU standard, so any OS which is supported by GNU
should run the plugins. While the requirements for compiling a Nagios plugins release
are very small, developing from CVS requires additional software to be installed. These are the
minimum levels of software required:
56 <literallayout>
57 gnu make 3.79
58 automake 1.8
59 autoconf 2.58
60 gettext 0.11.5
61 </literallayout>
63 To compile from CVS, after you have checked out the code, run:
64 <literallayout>
65 tools/setup
66 ./configure
67 make
68 make install
69 </literallayout>
70 </para>
71 </section>
73 <section id="PlugOutput"><title>Plugin Output for Nagios</title>
75 <para>You should always print something to STDOUT that tells if the
76 service is working or why it is failing. Try to keep the output short -
probably less than 80 characters. Remember that you ideally would like
78 the entire output to appear in a pager message, which will get chopped
79 off after a certain length.</para>
81 <section><title>Print only one line of text</title>
82 <para>Nagios will only grab the first line of text from STDOUT
83 when it notifies contacts about potential problems. If you print
84 multiple lines, you're out of luck. Remember, keep it short and
85 to the point.</para>
87 <para>Output should be in the format:</para>
88 <literallayout>
89 METRIC STATUS: Information text
90 </literallayout>
91 <para>However, note that this is not a requirement of the API, so you cannot depend on this
92 being an accurate reflection of the status of the service - the status should always
93 be determined by the return code.</para>
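<para>The following is a minimal Python sketch of the convention above, not
code from the plugin distribution. The helper names format_status_line and
finish are hypothetical; the numeric exit codes are the ones listed in the
Plugin Return Codes table of this guide.</para>

```python
import sys

# Exit codes from the "Plugin Return Codes" table in this guide.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def format_status_line(metric, code, info):
    """Build the single 'METRIC STATUS: Information text' output line."""
    # Collapse newlines and runs of spaces so only one line reaches STDOUT.
    info = " ".join(str(info).split())
    return f"{metric} {STATUS_NAMES[code]}: {info}"


def finish(metric, code, info):
    """Print the status line and exit; the exit code carries the real state."""
    print(format_status_line(metric, code, info))
    sys.exit(code)
```

<para>A plugin would call finish() exactly once, as its last action, so that
the text and the return code cannot drift apart.</para>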
94 </section>
96 <section><title>Verbose output</title>
97 <para>Use the -v flag for verbose output. You should allow multiple
98 -v options for additional verbosity, up to a maximum of 3. The standard
99 type of output should be:</para>
101 <table id="verboselevels"><title>Verbose output levels</title>
102 <tgroup cols="2">
103 <thead>
104 <row>
105 <entry><para>Verbosity level</para></entry>
106 <entry><para>Type of output</para></entry>
107 </row>
108 </thead>
109 <tbody>
110 <row>
111 <entry align="center"><para>0</para></entry>
112 <entry><para>Single line, minimal output. Summary</para></entry>
113 </row>
114 <row>
115 <entry align="center"><para>1</para></entry>
<entry><para>Single line, additional information (e.g. list processes that fail)</para></entry>
117 </row>
118 <row>
119 <entry align="center"><para>2</para></entry>
<entry><para>Multi line, configuration debug output (e.g. ps command used)</para></entry>
121 </row>
122 <row>
123 <entry align="center"><para>3</para></entry>
124 <entry><para>Lots of detail for plugin problem diagnosis</para></entry>
125 </row>
126 </tbody>
127 </tgroup>
128 </table>
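<para>One way to map the levels in the table onto output is sketched below in
Python. The function names and the split into config_lines and
diagnostic_lines are assumptions made for illustration; only the four levels
themselves come from the table.</para>

```python
def clamp_verbosity(count):
    """Map the number of -v flags given to a level between 0 and 3."""
    return max(0, min(count, 3))


def report(level, summary, details, config_lines, diagnostic_lines):
    """Return the lines to print for a verbosity level from 0 to 3."""
    if level == 0:
        return [summary]                      # single line, minimal output
    lines = [f"{summary} ({details})"]        # level 1: still a single line
    if level >= 2:
        lines += list(config_lines)           # e.g. the ps command used
    if level >= 3:
        lines += list(diagnostic_lines)       # everything useful for debugging
    return lines
```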
129 </section>
131 <section><title>Screen Output</title>
132 <para>The plug-in should print the diagnostic and just the
133 synopsis part of the help message. A well written plugin would
134 then have --help as a way to get the verbose help.</para>
135 <para>Code and output should try to respect the 80x25 size of a
136 crt (remember when fixing stuff in the server room!)</para>
137 </section>
139 <section><title>Return the proper status code</title>
140 <para>See <xref linkend="ReturnCodes"> below
141 for the numeric values of status codes and their
142 description. Remember to return an UNKNOWN state if bogus or
invalid command line arguments are supplied or if you are unable
144 to check the service.</para>
145 </section>
147 <section><title>Plugin Return Codes</title>
148 <para>The return codes below are based on the POSIX spec of returning
a positive value. Netsaint prior to v0.0.7 supported a non-POSIX
150 compliant return code of "-1" for unknown. Nagios supports POSIX return
151 codes by default.</para>
<para>Note: Some plugins will on occasion print on STDOUT that an error
occurred and that the error code is 138 or 255 or some such number. These
are usually caused by plugins using system commands without
enough checks to catch unexpected output. Developers should include a
default catch-all for system command output that returns an UNKNOWN
return code.</para>
160 <table id="ReturnCodes"><title>Plugin Return Codes</title>
161 <tgroup cols="3">
162 <thead>
163 <row>
164 <entry><para>Numeric Value</para></entry>
165 <entry><para>Service Status</para></entry>
166 <entry><para>Status Description</para></entry>
167 </row>
168 </thead>
169 <tbody>
170 <row>
171 <entry align="center"><para>0</para></entry>
172 <entry valign="middle"><para>OK</para></entry>
173 <entry><para>The plugin was able to check the service and it
174 appeared to be functioning properly</para></entry>
175 </row>
176 <row>
177 <entry align="center"><para>1</para></entry>
178 <entry valign="middle"><para>Warning</para></entry>
179 <entry><para>The plugin was able to check the service, but it
180 appeared to be above some "warning" threshold or did not appear
181 to be working properly</para></entry>
182 </row>
183 <row>
184 <entry align="center"><para>2</para></entry>
185 <entry valign="middle"><para>Critical</para></entry>
186 <entry><para>The plugin detected that either the service was not
187 running or it was above some "critical" threshold</para></entry>
188 </row>
189 <row>
190 <entry align="center"><para>3</para></entry>
191 <entry valign="middle"><para>Unknown</para></entry>
192 <entry><para>Invalid command line arguments were supplied to the
193 plugin or the plugin was unable to check the status of the given
194 hosts/service</para></entry>
195 </row>
196 </tbody>
197 </tgroup>
198 </table>
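<para>A catch-all of the kind described in the note above might look like the
following Python sketch. The function run_system_check and its interpret
callback are hypothetical names. The sketch also assumes that the external
command signals success with exit status 0, which is not true of every
command.</para>

```python
import subprocess

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def run_system_check(argv, interpret):
    """Run an external command and map every unexpected outcome to UNKNOWN.

    `argv` is the command as a list starting with the program path, and
    `interpret` is a caller-supplied function that turns the command's
    output into (exit_code, message). Both names are illustrative.
    """
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.SubprocessError) as exc:
        return UNKNOWN, f"could not run {argv[0]}: {exc}"
    if result.returncode != 0:
        # Assumption: 0 means success. Never pass raw codes such as 138 on.
        return UNKNOWN, f"{argv[0]} exited with status {result.returncode}"
    try:
        code, message = interpret(result.stdout)
    except Exception as exc:  # catch-all for output the parser did not expect
        return UNKNOWN, f"unexpected output from {argv[0]}: {exc}"
    if code not in (OK, WARNING, CRITICAL, UNKNOWN):
        return UNKNOWN, f"parser returned invalid status {code!r}"
    return code, message
```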
201 </section>
203 <section id="thresholdformat"><title>Threshold range format</title>
<para>Threshold ranges define the warning and critical levels for plugins to
alert on. The theory is that the plugin will do some sort of check which returns
a numerical value, or metric, which is then compared to the warning and
207 critical thresholds.
208 This is the generalised format for threshold ranges:</para>
210 <literallayout>
211 [@]start:end
212 </literallayout>
214 <para>Notes:</para>
215 <orderedlist>
<listitem><para>start &lt;= end</para>
217 </listitem>
<listitem><para>start and ":" are not required if start=0</para>
219 </listitem>
220 <listitem><para>if range is of format "start:" and end is not specified,
221 assume end is infinity</para>
222 </listitem>
223 <listitem><para>to specify negative infinity, use "~"</para>
224 </listitem>
<listitem><para>alert is raised if metric is outside the range from start to end
(the endpoints themselves count as inside the range)</para>
227 </listitem>
228 <listitem><para>if range starts with "@", then alert if inside this range
229 (inclusive of endpoints)</para>
230 </listitem>
231 </orderedlist>
233 <para>Note: Not all plugins are coded to expect ranges in this format. It is
234 planned for a future release to
235 provide standard libraries to parse and compare metrics against ranges. There
236 will also be some work in providing multiple metrics.</para>
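<para>Until such a library exists, the notes above could be turned into code
along the following lines. This Python sketch is one reading of the format,
not the project's parser: parse_range and should_alert are hypothetical names,
the endpoints are treated as belonging to the range, and a start greater than
end is rejected, in line with examples such as 1:20 used elsewhere in this
guide.</para>

```python
import math


def parse_range(spec):
    """Parse '[@]start:end' into (start, end, alert_inside)."""
    if not spec:
        raise ValueError("empty threshold range")
    alert_inside = spec.startswith("@")
    body = spec[1:] if alert_inside else spec
    if ":" in body:
        start_text, end_text = body.split(":", 1)
    else:
        # A bare number is the end of the range; start defaults to 0.
        start_text, end_text = "0", body
    if start_text == "~":
        start = -math.inf                      # "~" means negative infinity
    else:
        start = float(start_text) if start_text else 0.0
    end = float(end_text) if end_text else math.inf
    if start > end:
        raise ValueError(f"start exceeds end in range {spec!r}")
    return start, end, alert_inside


def should_alert(value, spec):
    """Return True if `value` should raise an alert for the range `spec`."""
    start, end, alert_inside = parse_range(spec)
    inside = start <= value <= end
    return inside if alert_inside else not inside
```

<para>With this reading, should_alert(21, "1:20") is true and
should_alert(20, "1:20") is false, while a leading "@" inverts both
results.</para>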
237 </section>
239 <section><title>Performance data</title>
240 <para>Performance data is defined by Nagios as "everything after the | of the plugin output" -
241 please refer to Nagios documentation for information on capturing this data to logfiles.
242 However, it is the responsibility of the plugin writer to ensure the
243 performance data is in a "Nagios plugins" format.
244 This is the expected format:</para>
246 <literallayout>
247 'label'=value[UOM];[warn];[crit];[min];[max]
248 </literallayout>
250 <para>Notes:</para>
251 <orderedlist>
252 <listitem><para>space separated list of label/value pairs</para>
253 </listitem>
254 <listitem><para>label can contain any characters</para>
255 </listitem>
256 <listitem><para>the single quotes for the label are optional. Required if
257 spaces, = or ' are in the label</para>
258 </listitem>
259 <listitem><para>label length is arbitrary, but ideally the first 19 characters
260 are unique (due to a limitation in RRD). Be aware of a limitation in the
261 amount of data that NRPE returns to Nagios</para>
262 </listitem>
263 <listitem><para>to specify a quote character, use two single quotes</para>
264 </listitem>
265 <listitem><para>warn, crit, min or max may be null (for example, if the threshold is
266 not defined or min and max do not apply). Trailing unfilled semicolons can be
267 dropped</para>
268 </listitem>
269 <listitem><para>min and max are not required if UOM=%</para>
270 </listitem>
271 <listitem><para>value, min and max in class [-0-9.]. Must all be the
272 same UOM</para>
273 </listitem>
274 <listitem><para>warn and crit are in the range format (see
275 <xref linkend="thresholdformat">). Must be the same UOM</para>
276 </listitem>
277 <listitem><para>UOM (unit of measurement) is one of:</para>
278 <orderedlist>
<listitem><para>no unit specified - assume a number (int or float)
of things (e.g. users, processes, load averages)</para>
281 </listitem>
282 <listitem><para>s - seconds (also us, ms)</para></listitem>
283 <listitem><para>% - percentage</para></listitem>
284 <listitem><para>B - bytes (also KB, MB, TB)</para></listitem>
<listitem><para>c - a continuous counter (such as bytes
286 transmitted on an interface)</para></listitem>
287 </orderedlist>
288 </listitem>
289 </orderedlist>
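<para>The format and quoting rules above can be sketched in Python as follows.
The function names are hypothetical and the sketch does not validate the UOM
or the numeric class of the values; it only shows the quoting of labels and
the dropping of trailing semicolons.</para>

```python
def quote_label(label):
    """Quote a label when it contains a space, '=' or a single quote."""
    if any(ch in label for ch in " ='"):
        # A quote character inside the label is written as two single quotes.
        return "'" + label.replace("'", "''") + "'"
    return label


def perfdata_item(label, value, uom="", warn="", crit="", minimum="", maximum=""):
    """Format one 'label'=value[UOM];[warn];[crit];[min];[max] item."""
    fields = [f"{quote_label(label)}={value}{uom}", warn, crit, minimum, maximum]
    text = ";".join(str(field) for field in fields)
    # Trailing unfilled semicolons can be dropped.
    return text.rstrip(";")


def perfdata(items):
    """Join several items into the space separated list that follows '|'."""
    return " ".join(items)
```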
291 <para>It is up to third party programs to convert the Nagios plugins
292 performance data into graphs.</para>
293 </section>
295 <section><title>Translations</title>
296 <para>If possible, use translation tools for all output. Currently, most of the core C plugins
297 use gettext for translation. General guidelines are:</para>
299 <orderedlist>
300 <listitem><para>short help is not translated</para></listitem>
301 <listitem><para>long help has options in English language, but text translated</para></listitem>
302 <listitem><para>"Copyright" kept in English</para></listitem>
303 <listitem><para>copyright holder names kept in original text</para></listitem>
304 </orderedlist>
305 </section>
306 </section>
308 <section id="SysCmdAuxFiles"><title>System Commands and Auxiliary Files</title>
310 <section><title>Don't execute system commands without specifying their
311 full path</title>
312 <para>Don't use exec(), popen(), etc. to execute external
commands without explicitly using the full path of the external
314 program.</para>
316 <para>Doing otherwise makes the plugin vulnerable to hijacking
317 by a trojan horse earlier in the search path. See the main
318 plugin distribution for examples on how this is done.</para>
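<para>For plugins written in a scripting language, the same rule can be
enforced with a small wrapper. The Python sketch below uses a hypothetical
helper, run_external, and is not taken from the plugin distribution.</para>

```python
import os
import subprocess


def run_external(argv, timeout=10):
    """Run an external program, refusing anything but an absolute path."""
    program = argv[0]
    if not os.path.isabs(program):
        # A bare name such as "df" would be resolved through the search path.
        raise ValueError(f"refusing to run {program!r}: not an absolute path")
    # Passing a list (no shell) keeps the arguments from being reinterpreted.
    return subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
```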
319 </section>
321 <section><title>Use spopen() if external commands must be executed</title>
323 <para>If you have to execute external commands from within your
324 plugin and you're writing it in C, use the spopen() function
325 that Karl DeBisschop has written.</para>
327 <para>The code for spopen() and spclose() is included with the
328 core plugin distribution.</para>
329 </section>
331 <section><title>Don't make temp files unless absolutely required</title>
333 <para>If temp files are needed, make sure that the plugin will
334 fail cleanly if the file can't be written (e.g., too few file
335 handles, out of disk space, incorrect permissions, etc.) and
336 delete the temp file when processing is complete.</para>
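<para>A sketch of this pattern in Python, assuming a hypothetical helper
with_scratch_file and a caller-supplied process function: the file is
created with a unique name, any write failure is reported instead of raised,
and the file is removed whether or not processing succeeds.</para>

```python
import os
import tempfile


def with_scratch_file(data, process):
    """Write `data` to a temporary file, run `process(path)`, always clean up.

    Returns (True, result) on success or (False, error_text) on failure.
    """
    path = None
    try:
        # mkstemp creates a uniquely named file accessible only by its owner.
        fd, path = tempfile.mkstemp(prefix="plugin-")
        with os.fdopen(fd, "w") as handle:
            handle.write(data)
        return True, process(path)
    except OSError as exc:
        # Out of disk space, no file handles, bad permissions, and so on.
        return False, f"temp file error: {exc}"
    finally:
        if path is not None and os.path.exists(path):
            os.unlink(path)
```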
337 </section>
339 <section><title>Don't be tricked into following symlinks</title>
341 <para>If your plugin opens any files, take steps to ensure that
342 you are not following a symlink to another location on the
343 system.</para>
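<para>On systems that provide the O_NOFOLLOW flag to open(), one such step is
sketched below in Python. The helper name is hypothetical, and the flag only
guards the final path component, so symlinks in parent directories are still
followed.</para>

```python
import os


def open_without_following_symlink(path):
    """Open `path` for reading, failing if its last component is a symlink."""
    # O_NOFOLLOW makes the open fail with OSError instead of following
    # a symlink at `path`. It does not check the parent directories.
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    return os.fdopen(fd, "r")
```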
344 </section>
346 <section><title>Validate all input</title>
<para>Use the routines in utils.c or utils.pm, and write more as needed.</para>
349 </section>
351 </section>
356 <section id="PerlPlugin"><title>Perl Plugins</title>
358 <para>Perl plugins are coded a little more defensively than other
359 plugins because of embedded Perl. When configured as such, embedded
Perl Nagios (ePN) requires stricter use of some of Perl's features.
361 This section outlines some of the steps needed to use ePN
362 effectively.</para>
364 <orderedlist>
366 <listitem><para> Do not use BEGIN and END blocks since they will be called
367 only once (when Nagios starts and shuts down) with Embedded Perl (ePN). In
368 particular, do not use BEGIN blocks to initialize variables.</para>
369 </listitem>
371 <listitem><para>To use utils.pm, you need to provide a full path to the
372 module in order for it to work.</para>
374 <literallayout>
375 e.g.
376 use lib "/usr/local/nagios/libexec";
377 use utils qw(...);
378 </literallayout>
379 </listitem>
381 <listitem><para>Perl scripts should be called with "-w"</para>
382 </listitem>
<listitem><para>All Perl plugins must compile cleanly under "use strict" - i.e. at
least explicitly qualify variables with package names, as in "$main::x", or predeclare every
variable. </para>
389 <para>Explicitly initialize each variable in use. Otherwise with
390 caching enabled, the plugin will not be recompiled each time, and
391 therefore Perl will not reinitialize all the variables. All old
392 variable values will still be in effect.</para>
393 </listitem>
<listitem><para>Do not use &lt;DATA&gt; handles (these simply do not compile under ePN).</para>
396 </listitem>
<listitem><para>Do not use global variables in named subroutines. This is bad practice anyway, but with ePN the
compiler will report an error "&lt;global_var&gt; will not stay shared ..". Values used by
subroutines should be passed in the argument list.</para>
401 </listitem>
403 <listitem><para>If writing to a file (perhaps recording
performance data) explicitly close it. The plugin never
405 calls <emphasis role="strong">exit</emphasis>; that is caught by
406 p1.pl, so output streams are never closed.</para>
407 </listitem>
<listitem><para>As in <xref linkend="runtime">, all plugins need
to monitor their runtime, especially if they are using network
resources. Use of <emphasis>alarm</emphasis> is recommended,
noting that some Perl modules (e.g. LWP) manage timers, so that an alarm
set by a plugin using such a module is overwritten by the module
(workarounds are cunning (TM) or using the module timer).
Plugins may import a default timeout ($TIMEOUT) from utils.pm.
</para>
417 </listitem>
419 <listitem><para>Perl plugins should import %ERRORS from utils.pm
420 and then "exit $ERRORS{'OK'}" rather than "exit 0"
421 </para>
422 </listitem>
424 </orderedlist>
426 </section>
428 <section id="runtime"><title>Runtime Timeouts</title>
430 <para>Plugins have a very limited runtime - typically 10 sec.
431 As a result, it is very important for plugins to maintain internal
432 code to exit if runtime exceeds a threshold. </para>
434 <para>All plugins should timeout gracefully, not just networking
435 plugins. For instance, df may lock if you have automounted
drives and your network fails - but on first glance, who'd think
df could lock up like that? Besides, a plugin that can time out is more
error resistant than one that hangs and consumes
resources.</para>
441 <section><title>Use DEFAULT_SOCKET_TIMEOUT</title>
<para>All network plugins should use DEFAULT_SOCKET_TIMEOUT to time out</para>
445 </section>
448 <section><title>Add alarms to network plugins</title>
450 <para>If you write a plugin which communicates with another
451 networked host, you should make sure to set an alarm() in your
452 code that prevents the plugin from hanging due to abnormal
453 socket closures, etc. Nagios takes steps to protect itself
454 against unruly plugins that timeout, but any plugins you create
455 should be well behaved on their own.</para>
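<para>The alarm technique is not specific to C or Perl. The Python sketch
below wraps it in a hypothetical context manager, time_limit, that raises a
hypothetical PluginTimeout exception. It assumes a Unix-like system and must
run in the main thread, because that is where Python delivers signals.</para>

```python
import signal
from contextlib import contextmanager


class PluginTimeout(Exception):
    """Raised when the check runs longer than its allowed time."""


@contextmanager
def time_limit(seconds):
    """Abort the enclosed block with PluginTimeout after `seconds` seconds."""
    def on_alarm(signum, frame):
        raise PluginTimeout(f"plugin timed out after {seconds} seconds")

    previous = signal.signal(signal.SIGALRM, on_alarm)
    # setitimer accepts fractional seconds; signal.alarm() takes whole seconds.
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler
```

<para>The plugin would wrap its network code in "with time_limit(...)", catch
PluginTimeout, and exit with a proper status line and return code.</para>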
457 </section>
461 </section>
463 <section id="PlugOptions"><title>Plugin Options</title>
465 <para>A well written plugin should have --help as a way to get
466 verbose help. Code and output should try to respect the 80x25 size of a
467 crt (remember when fixing stuff in the server room!)</para>
469 <section><title>Option Processing</title>
471 <para>For plugins written in C, we recommend the C standard
472 getopt library for short options. Getopt_long is always available.
473 </para>
475 <para>For plugins written in Perl, we recommend Getopt::Long module.</para>
477 <para>Positional arguments are strongly discouraged.</para>
479 <para>There are a few reserved options that should not be used
480 for other purposes:</para>
482 <literallayout>
483 -V version (--version)
484 -h help (--help)
485 -t timeout (--timeout)
486 -w warning threshold (--warning)
487 -c critical threshold (--critical)
488 -H hostname (--hostname)
489 -v verbose (--verbose)
490 </literallayout>
492 <para>In addition to the reserved options above, some other standard options are:</para>
494 <literallayout>
495 -C SNMP community (--community)
496 -a authentication password (--authentication)
497 -l login name (--logname)
-p port or password (--port or --passwd/--password)
499 -u url or username (--url or --username)
500 </literallayout>
<para>Look at check_pgsql and check_procs to see how I currently
think this can work. The standard options should behave as follows:</para>
506 <para>The option -V or --version should be present in all
507 plugins. For C plugins it should result in a call to print_revision, a
function in utils.c which takes two character string arguments, the
509 command name and the plugin revision.</para>
<para>The -? option, or any other unparsable set of options,
should print out a short usage statement. Character width should
be 80 or less and no more than 23 lines should be printed (it
should display cleanly on a dumb terminal in a server
room).</para>
517 <para>The option -h or --help should be present in all plugins.
518 In C plugins, it should result in a call to print_help (or
519 equivalent). The function print_help should call print_revision,
520 then print_usage, then should provide detailed
521 help. Help text should fit on an 80-character width display, but
522 may run as many lines as needed.</para>
524 <para>The option -v or --verbose should be present in all plugins.
525 The user should be allowed to specify -v multiple times to increase
526 the verbosity level, as described in <xref linkend="verboselevels">.</para>
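<para>For a plugin written in Python, the reserved options could be wired up
with the standard argparse module roughly as follows. PluginArgumentParser and
build_parser are hypothetical names, and the default timeout of 10 seconds is
an assumption based on the typical runtime mentioned in this guide. The
subclass exists because argparse exits with status 2 on unparsable options,
whereas this guide asks for UNKNOWN (3).</para>

```python
import argparse
import sys

UNKNOWN = 3  # exit code for invalid command line arguments


class PluginArgumentParser(argparse.ArgumentParser):
    """ArgumentParser that reports bad options as UNKNOWN instead of status 2."""

    def error(self, message):
        # Print a short message and the usage line, then exit with UNKNOWN.
        print(f"{self.prog}: {message}")
        self.print_usage(sys.stdout)
        sys.exit(UNKNOWN)


def build_parser(prog, version):
    """Create a parser that reserves the standard short and long options."""
    parser = PluginArgumentParser(prog=prog)  # argparse adds -h/--help itself
    parser.add_argument("-V", "--version", action="version",
                        version=f"{prog} {version}")
    parser.add_argument("-t", "--timeout", type=int, default=10,
                        help="seconds before the plugin gives up")
    parser.add_argument("-w", "--warning", help="warning threshold")
    parser.add_argument("-c", "--critical", help="critical threshold")
    parser.add_argument("-H", "--hostname", help="host to check")
    parser.add_argument("-v", "--verbose", action="count", default=0,
                        help="repeat up to three times for more detail")
    return parser
```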
527 </section>
529 <section>
530 <title>Plugins with more than one type of threshold, or with
531 threshold ranges</title>
533 <para>Old style was to do things like -ct for critical time and
534 -cv for critical value. That goes out the window with POSIX
535 getopt. The allowable alternatives are:</para>
537 <orderedlist>
538 <listitem>
<para>long options like --critical-time (or --ct and --cv, I
suppose).</para>
543 <listitem>
544 <para>repeated options like `check_load -w 10 -w 6 -w 4 -c
545 16 -c 10 -c 10`</para>
546 </listitem>
548 <listitem>
549 <para>for brevity, the above can be expressed as `check_load
550 -w 10,6,4 -c 16,10,10`</para>
551 </listitem>
553 <listitem>
<para>ranges are expressed with colons as in `check_procs -C
httpd -w 1:20 -c 1:30` which will warn above 20 instances,
and go critical at 0 and above 30</para>
557 </listitem>
559 <listitem>
560 <para>lists are expressed with commas, so Jacob's check_nmap
561 uses constructs like '-p 1000,1010,1050:1060,2000'</para>
562 </listitem>
564 <listitem>
<para>If possible when writing lists, use tokens to make the
list easy to remember and non-order dependent - so
check_disk uses '-c 10000,10%' so that it is clear which is
the percentage and which is the KB value (note that due to
my own lack of foresight, that used to be '-c 10000:10%' but
such constructs should all be changed for consistency,
though providing backward compatibility is fairly
easy).</para>
573 </listitem>
575 </orderedlist>
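<para>The comma and colon conventions in the list above are straightforward to
parse. The Python sketch below uses hypothetical function names and assumes
that a colon pair inside a list, such as 1050:1060, denotes an inclusive run
of integers; the plugins mentioned above may interpret it differently.</para>

```python
def split_thresholds(text):
    """Split a comma separated option value such as '10,6,4' into numbers."""
    return [float(part) for part in text.split(",")]


def expand_port_list(text):
    """Expand '1000,1010,1050:1060,2000' into a sorted list of port numbers."""
    ports = set()
    for part in text.split(","):
        if ":" in part:
            # Assumption: 'first:last' covers both endpoints.
            first, last = (int(piece) for piece in part.split(":", 1))
            ports.update(range(first, last + 1))
        else:
            ports.add(int(part))
    return sorted(ports)
```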
577 <para>As always, comments are welcome - making this consistent
578 without a host of long options was quite a hassle, and I would
579 suspect that there are flaws in this strategy.
580 </para>
581 </section>
582 </section>
584 <section id="CodingGuidelines"><title>Coding guidelines</title>
585 <para>See <ulink url="http://www.gnu.org/prep/standards_toc.html">GNU
586 Coding standards</ulink> for general guidelines.</para>
587 <section><title>Comments</title>
588 <para>You should use /* */ for comments and not // as some compilers
589 do not handle the latter form.</para>
590 <para>If you have copied a routine from another source, make sure the licence
591 from your source allows this. Add a comment referencing the ACKNOWLEDGEMENTS
592 file, where you can put more detail about the source.</para>
593 <para>For contributed code, do not add any named credits in the source code
594 - contributors should be added into the THANKS.in file instead.
595 </para>
596 </section>
598 <section><title>CVS comments</title>
599 <para>When adding CVS comments at commit time, you can use the following prefixes:
600 <variablelist>
601 <varlistentry><term>- comment</term>
602 <listitem>
603 <para>for a comment that can be removed from the Changelog</para>
604 </listitem>
605 </varlistentry>
606 <varlistentry><term>* comment</term>
607 <listitem>
608 <para>for an important amendment to be included into a features list</para>
609 </listitem>
610 </varlistentry>
611 </variablelist>
612 </para>
613 <para>If the change is due to a contribution, please quote the contributor's name
614 and, if applicable, add the SourceForge Tracker number. Don't forget to
615 update the THANKS.in file.</para>
616 </section>
618 <section><title>Translations for developers</title>
619 <para>To make the job easier for translators please follow these guidelines:</para>
620 <orderedlist>
621 <listitem><para>
622 before creating new strings, check the po/de.po file to see if a similar string
623 already exists
624 </para></listitem>
625 <listitem><para>
626 for help texts, break into individual options so that these can be reused
627 between plugins
628 </para></listitem>
629 </orderedlist>
630 </section>
632 <section><title>Translations for translators</title>
633 <para>To create an up to date list of translatable strings, run: tools/gen_locale.sh</para>
634 </section>
636 </section>
638 <section id="SubmittingChanges"><title>Submission of new plugins and patches</title>
640 <section id="Patches"><title>Patches</title>
641 <para>If you have a bug patch, please supply a unified or context diff against the
642 version you are using. For new features, please supply a diff against
643 the CVS HEAD version.</para>
645 <para>Patches should be submitted via
<ulink url="http://sourceforge.net/tracker/?group_id=29880&amp;atid=397599">SourceForge's
647 tracker system for Nagiosplug patches</ulink>
648 and be announced to the nagiosplug-devel mailing list.</para>
<para>Submission of a patch implies that the submitter acknowledges that they
651 are the author of the code (or have permission from the author to release the code)
652 and agree that the code can be released under the GPL. The copyright for the changes will
653 then revert to the Nagios Plugin Development Team - this is required so that any copyright
654 infringements can be investigated quickly without contacting a huge list of copyright holders.
655 Credit will always be given for any patches through a THANKS file in the distribution.</para>
656 </section>
658 <section id="Newplugins"><title>New plugins</title>
<para>If you would like others to use your plugins, please add them to
661 the official 3rd party plugin repository,
662 <ulink url="http://www.nagiosexchange.org">NagiosExchange</ulink>.
663 </para>
665 <para>We are not accepting requests for inclusion of plugins into
666 our distribution at the moment, but when we do, these are the minimum
667 requirements:
668 </para>
670 <orderedlist>
671 <listitem>
672 <para>Include copyright and license information in all files</para>
673 </listitem>
674 <listitem>
675 <para>The standard command options are supported (--help, --version,
676 --timeout, --warning, --critical)</para>
677 </listitem>
678 <listitem>
<para>It is determined to be not redundant (for instance, we would not
add a new version of check_disk just because someone had provided
a plugin that had perf checking - we would incorporate the features
into an existing plugin)</para>
683 </listitem>
684 <listitem>
685 <para>One of the developers has had the time to audit the code and declare
686 it ready for core</para>
687 </listitem>
688 <listitem>
689 <para>It should also follow code format guidelines, and use functions from
690 utils (perl or c or sh) rather than using its own</para>
691 </listitem>
692 <listitem>
693 <para>Includes patches to configure.in if required (via the EXTRAS list if
694 it will only work on some platforms)</para>
695 </listitem>
696 <listitem>
697 <para>If possible, please submit a test harness. Documentation on sample
698 tests coming soon</para>
699 </listitem>
700 </orderedlist>
702 </section>
704 </section>
705 </article>
707 </book>