OmniCheck was originally implemented on HP/UX 9.05, and has been successfully ported to all current revisions of the following operating systems:
mkdir /opt/omnicheck # or wherever you wish
mkdir /opt/omnicheck/logs # or wherever you wish
cd {install_dir}
vi configfile
vi {install_dir}/rules.{nodename}
/usr/local/bin/perl {install_dir}/omnicheck -F {install_dir}/configfile
-F /oap/omnicheck.config
SIGHUP
NOTE: may not be used with files that consist of multiple physical lines.
process: syslog
process: sapi-apps
home: /oap
home: /opt/omnicheck
name: foobar
name: NODE
#!
)
@
)
file: /usr/adm/syslog/syslog.log
file: #!/bin/df -k
file: /oap/logs/*.err
file: @/opt/omnicheck/file.list
file
configuration file entry.
OmniCheck will parse any data that was written to this file after the last
run, but before the file was rotated. This feature only works when a
single file is being monitored in the block.
oldfile: /usr/adm/oSYSLOG
oldfile: /oap/logs/.old/process-a.err
oldfile: #!/oap/calc_oldfile_name.sh
See the filelist documentation for specifying oldfiles for files within a filelist.
gzip
are used.
gzip: /usr/bin/gzip
NOTE: Do not set 'tmpdir' to /tmp
on Solaris systems, as that filesystem is cleared on reboot, erasing your
tellfiles.
tmpdir: /usr/tmp
tmpdir: /oap
block: MAILDB_M01A_SYB
block: dsl_reg_db
home directory or via an absolute path:
# relative to the 'home' directory
rules: rules.local rules.group rules.global
rules: rules.process_1 rules.process_group rules.all_processes
# absolute
rules: /opt/omnicheck/special/rules.special
home directory will be used.
logs: /oap/logs
logs: /usr/local/omnicheck/logs
omnicheck.out
out: omnicheck_%Y%m%d.out
out: omnicheck_%H:%M:%S.out
omnicheck.err
err: omnicheck_%Y-%m-%d.err
err: omnicheck_%H:%M:%S.err
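The date and time tokens in the out and err examples look like strftime-style conversions. Assuming that is how they are expanded (the text above does not say so explicitly), the expansion can be sketched in Python as follows. The helper name expand_log_name is hypothetical and is not part of OmniCheck.

```python
from datetime import datetime


def expand_log_name(template: str, when: datetime) -> str:
    """Expand strftime-style tokens (%Y, %m, %d, %H, %M, %S) in a log file name.

    Assumption: OmniCheck treats these tokens the way strftime does.
    """
    return when.strftime(template)


if __name__ == "__main__":
    stamp = datetime(2003, 1, 24, 15, 30, 0)
    for template in ("omnicheck_%Y%m%d.out", "omnicheck_%H:%M:%S.err"):
        print(template, "->", expand_log_name(template, stamp))
```

A date-based template yields one file per day, while a time-based template yields a new file for each run that starts at a different second.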
# highest level = most log entries
debug: debug
debug: info
# notice is default production level
debug: notice
debug: warn
debug: err
debug: crit
debug: alert
debug: emer
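The comment in the example says that the first level produces the most log entries, which suggests a threshold model: a message is written when its severity is at or above the configured level. The Python sketch below illustrates that reading only. The function should_log and the exact comparison are assumptions, not OmniCheck's implementation.

```python
# Levels in the order listed in the example, from most verbose to least verbose.
LEVELS = ["debug", "info", "notice", "warn", "err", "crit", "alert", "emer"]


def should_log(message_level: str, configured_level: str = "notice") -> bool:
    """Return True when a message at message_level passes the configured threshold.

    Assumption: a message is logged if its level is at or above the configured one.
    """
    return LEVELS.index(message_level) >= LEVELS.index(configured_level)


if __name__ == "__main__":
    for level in LEVELS:
        print(level, should_log(level, "notice"))
```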
interval seconds, the next event loop will start immediately after the last.
interval: 300
IRS: ^\d\d\d\d-\d\d-\d\d-\d\d\.\d\d\.\d\d\.\d+
IRS: ^[A-Z] [A-Z][a-z][a-z] [ \d]\d \d\d:\d\d:\d\d \d\d\d\d
There is an optional :trim tag you can associate with the IRS entry to match a set of lines, trim off the matching part, like a date/time stamp, then concatenate the remainders into one line.
IRS: ^\d\d\d\d-\d\d-\d\d-\d\d\.\d\d\.\d\d\.\d+ :trim
IRS: ^[A-Z] [A-Z][a-z][a-z] [ \d]\d \d\d:\d\d:\d\d \d\d\d\d :trim
NOTE: may not be used in persistent mode.
production: no # or 0 or off
production: yes # or 1 or on
The theory goes that a single component of a farm can endure a failure without causing adverse impact to the farm as a whole.
See 'production' above for true and false values. When the farm value is true, the effect is the same as if the production value is false.
farm: no # or 0 or off
farm: yes # or 1 or on
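The relationship between the production and farm values can be sketched in Python as follows. The accepted spellings come from the examples above. Case-insensitive matching, the stripping of trailing comments, and the function names parse_flag and acts_as_production are assumptions made for illustration.

```python
TRUE_VALUES = {"yes", "1", "on"}
FALSE_VALUES = {"no", "0", "off"}


def parse_flag(value: str) -> bool:
    """Interpret a yes/no style configuration value, ignoring a trailing '#' comment."""
    text = value.split("#", 1)[0].strip().lower()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    raise ValueError(f"not a recognised true/false value: {value!r}")


def acts_as_production(production: str, farm: str) -> bool:
    """A true farm value has the same effect as a false production value."""
    return parse_flag(production) and not parse_flag(farm)


if __name__ == "__main__":
    print(acts_as_production("yes", "no"))   # behaves as production
    print(acts_as_production("yes", "on"))   # farm member, treated as non-production
```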
maint: no # or 0 or off
maint: yes # or 1 or on
crontab to provide values for the minute, hour, day, month, and day-of-week. Any trailing values not assigned are assumed to match all possible values (*).
quiet: * 15-19 * * * # no alerts between 3:00pm and 7:59pm
quiet: * 15-19 # same as above
quiet: * * * * 0,6 # no alerts on Saturday or Sunday
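The matching rule for quiet entries, including the default for unassigned trailing fields, can be sketched in Python as follows. This is an illustration under stated assumptions: all five fields must match at once, only "*", single numbers, ranges, and comma lists are supported, and day-of-week 0 means Sunday. The names field_matches and in_quiet_period are hypothetical.

```python
from datetime import datetime

FIELD_COUNT = 5  # minute, hour, day, month, day-of-week


def field_matches(spec: str, value: int) -> bool:
    """Match one crontab-style field: '*', 'N', 'A-B', or a comma list of those."""
    if spec == "*":
        return True
    for part in spec.split(","):
        if "-" in part:
            low, high = part.split("-", 1)
            if int(low) <= value <= int(high):
                return True
        elif int(part) == value:
            return True
    return False


def in_quiet_period(entry: str, when: datetime) -> bool:
    """Return True when 'when' falls inside the quiet period described by 'entry'.

    'entry' is the text after 'quiet:' with any trailing comment removed.
    """
    fields = entry.split()
    fields += ["*"] * (FIELD_COUNT - len(fields))  # trailing fields default to '*'
    # isoweekday() is 1..7 for Monday..Sunday; modulo 7 maps Sunday to 0.
    values = [when.minute, when.hour, when.day, when.month, when.isoweekday() % 7]
    return all(field_matches(spec, value) for spec, value in zip(fields, values))


if __name__ == "__main__":
    friday_afternoon = datetime(2003, 1, 24, 15, 0)
    print(in_quiet_period("* 15-19", friday_afternoon))
    print(in_quiet_period("* * * * 0,6", friday_afternoon))
```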
smtphost: localhost
smtphost: relay.mail.here.com
smtphost: /usr/lib/sendmail -t
pagerhost: pager.foo.com
pagerhost: page.mail.here.com
admin: jblow
admin: jblow@here.com
admin: /oap/omnicheck_admin
admin: #!/opt/omnicheck/get_admin.sh
oncall: jblow
oncall: jblow@here.com
oncall: /oap/omnicheck_oncall
oncall: #!/opt/omnicheck/get_oncall.sh
organization: QA_Team
organization: NorthAm.Prod
organization: Foobar
fqdn: foobar.db.foo.com
@). Only single files can have their oldfiles specified in this manner.
syslog.log{tab}syslog.log.0
syslog.log{tab}syslog.log.gz
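Reading the two example lines as "file, a tab character, then its oldfile", a file list could be parsed as sketched below in Python. The assumptions are that {tab} stands for a literal tab, that one entry appears per line, and that a line without a tab simply has no oldfile. The function parse_file_list is hypothetical and not part of OmniCheck.

```python
from typing import List, Optional, Tuple


def parse_file_list(text: str) -> List[Tuple[str, Optional[str]]]:
    """Split each non-empty line into (file, oldfile); oldfile is None when absent."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, oldfile = line.partition("\t")
        entries.append((name.strip(), oldfile.strip() or None))
    return entries


if __name__ == "__main__":
    sample = "syslog.log\tsyslog.log.0\nsyslog.log\tsyslog.log.gz\n"
    for name, oldfile in parse_file_list(sample):
        print(name, "->", oldfile)
```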
Abra##pattern
    actions
20030124##pattern
    actions
! detected co-location site
    actions
NOTE: not available in persistent mode.
pattern-a +3 pattern-b actions pattern-a +3 pattern-b actions
pattern-a ... pattern-b actions pattern-a ... pattern-b actions
NOTE: not available in persistent mode.
pattern-a && pattern-b actions pattern-a && pattern-b actions
pattern-a|pattern-b
    actions
pattern-a || pattern-b
    actions
\
in the pattern:
mail admin ; test message
The word admin will be translated to the value of the admin entry in the configuration file.
page oncall ; system reboot
The word oncall will be translated to the value of the oncall entry in the configuration file.
By default, the same account name is used to form both the email-to-pager address and the follow-up email address. If this is undesirable, you can split the two addresses in the following manner:
page pager@pager.foo.com/mail@foo.net ; split addresses
Here, any page generated by this action will go to pager@pager.foo.com, while the accompanying mail will go to mail@foo.net. This value can also be set in the oncall configuration file entry.
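The split-address form can be illustrated with a short Python sketch. It assumes the target is a literal address (not a file path or a #! script, which also contain slashes) and that the first "/" separates the pager address from the mail address. The function split_page_target is hypothetical.

```python
from typing import Tuple


def split_page_target(target: str) -> Tuple[str, str]:
    """Return (pager_address, mail_address) for a page target.

    Without a '/', the same value is used for both, matching the default behavior.
    """
    pager, separator, mail = target.partition("/")
    return (pager, mail if separator else pager)


if __name__ == "__main__":
    print(split_page_target("pager@pager.foo.com/mail@foo.net"))
    print(split_page_target("jblow"))
```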
file /usr/adm/logs/separate.log
file /usr/adm/logs/file_$1.log
file /usr/adm/logs/file_@1.log
The 'file' action has the ability to interpret the values of the parenthesized data within each matched log entry, and use that data to alter the filename being opened for appending.
exec /usr/local/bin/process_data.sh
exec /usr/local/bin/new_output.sh -d @1 -m @2
exec --ignore /usr/local/bin/new_output.sh -d @1 -m @2
The 'exec' action has the ability to interpret the values of the parenthesized data within each matched log entry, and use that data to alter the script name and/or the parameters passed to the script. In these situations, the script will be invoked one time per matched log entry, whereas the default behavior is to pass all matching log entries to a single invocation of the script.
When the --ignore option is used, the script does not receive the matched log entries on STDIN. This allows the external script/program to run without needing to manage the matching log entries if it is not designed to do so.
modify --prepend "this" ;
modify --prepend "#!/output/of/script args" ;
modify --append "that" ;
modify --append "#!/output/of/script args" ;
modify --replace "this" "that" ;
modify --replace "this" "#!/output/of/script args" ;
modify --replace "regex" "that" ;
modify --replace "regex" "#!/output/of/script args" ;
Using --replace to simulate --prepend:
modify --replace "^" "that" ;
modify --replace "^" "#!/output/of/script args" ;
Using --replace to simulate --append:
modify --replace "$" "that" ;
modify --replace "$" "#!/output/of/script args" ;
Instances of "this" and "that" represent simple text strings; "regex" represents a Perl regular expression; and /output/of/script represents some external program. The args of the script can contain the same $1, $2 variables as other actions: see Pattern-Action Interaction.
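The equivalence between --replace with "^" or "$" and --prepend or --append can be illustrated with a small Python sketch. Python's re module stands in for Perl regular expressions here, and replacing only the first match is an assumption. The function names are hypothetical and do not come from OmniCheck.

```python
import re


def modify_prepend(line: str, text: str) -> str:
    """Put text in front of the log entry."""
    return text + line


def modify_append(line: str, text: str) -> str:
    """Put text after the log entry."""
    return line + text


def modify_replace(line: str, pattern: str, replacement: str) -> str:
    """Replace the first match of pattern; the replacement is inserted literally."""
    return re.sub(pattern, lambda match: replacement, line, count=1)


if __name__ == "__main__":
    entry = "disk full"
    # Anchors match an empty position, so replacing them inserts text there.
    print(modify_replace(entry, "^", "ALERT: ") == modify_prepend(entry, "ALERT: "))
    print(modify_replace(entry, "$", " (checked)") == modify_append(entry, " (checked)"))
```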
ignore juser ; junk messages
if >= 10 mail admin ; test messages
The valid relations are:
There must be a space after the word if and after the numeric value. Space between the relation and numeric value is optional.
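The spacing rules above can be captured in a regular expression, as in the Python sketch below. The set of relations is an assumption (the usual comparison operators), as is the idea that the number is compared against the count of matching entries. The names parse_threshold and threshold_met are hypothetical.

```python
import operator
import re

# Assumed relations; the documented list may differ.
RELATIONS = {
    ">=": operator.ge, "<=": operator.le, "==": operator.eq,
    "!=": operator.ne, ">": operator.gt, "<": operator.lt,
}

# 'if', required space, relation, optional space, number, required space, action.
RULE = re.compile(r"^if\s+(>=|<=|==|!=|>|<)\s*(\d+)\s+(.+)$")


def parse_threshold(action_line: str):
    """Return (relation, number, action) or None when the line is not well formed."""
    match = RULE.match(action_line.strip())
    if match is None:
        return None
    relation, number, action = match.groups()
    return relation, int(number), action


def threshold_met(action_line: str, match_count: int) -> bool:
    """Evaluate the threshold against the number of matching entries."""
    parsed = parse_threshold(action_line)
    if parsed is None:
        raise ValueError(f"not a threshold action: {action_line!r}")
    relation, limit, _ = parsed
    return RELATIONS[relation](match_count, limit)


if __name__ == "__main__":
    rule = "if >= 10 mail admin ; test messages"
    print(parse_threshold(rule))
    print(threshold_met(rule, 12), threshold_met(rule, 9))
```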
<pattern>
    if org eq "FOO"
        mail admin-team ; host issue
    elsif org eq 'BAR'
        page oncall ; fix me
    else
        ignore admin ; not important
    endif
Use either single (') or double (") quotes to surround the organization value within the rule.
If there are actions outside, or after, an if block, they will always take effect. In this example, only instances used by the FOO organization will get the "host issue" mail, but all instances will send the "fix tomorrow" mail:
<pattern>
    if org eq "FOO"
        mail admin-team ; host issue
    endif
    mail admin ; fix tomorrow
NOTE: Organization-sensitive actions and thresholded actions are currently mutually exclusive, i.e., you cannot do this:
if (org eq "FOO" && >= 10) mail admin ; FOO and over 10
if (>= 10 || org eq "BAR") mail admin ; over 10 or BAR
If this is a feature that is requested, the effort will be applied to work out the parsing logic. For now, however, you can do one or the other.
ftpd\[\d+\]: FTP LOGIN FROM (\d+\.\d+\.\d+\.\d+) as (\w+)
    mail admin ; FTP from $1 as $2
Note: pattern-action interactions are now functional, so that patterns like this:
Error: (\w+) ... Description: "([^"]+)"
will now capture the data within both sets of parentheses and provide it as expected.
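The way captured data flows from a pattern into an action can be sketched in Python using the FTP example. Python's re module stands in for Perl regular expressions, the sample log line is invented for illustration, and expand_action is a hypothetical helper rather than OmniCheck code.

```python
import re


def expand_action(pattern: str, log_line: str, action: str):
    """Substitute $1, $2, ... in action with the groups captured from log_line.

    Returns None when the pattern does not match the log line.
    """
    match = re.search(pattern, log_line)
    if match is None:
        return None

    def substitute(reference):
        index = int(reference.group(1))
        if index > match.re.groups:
            return reference.group(0)  # leave unknown references untouched
        return match.group(index) or ""

    return re.sub(r"\$(\d+)", substitute, action)


if __name__ == "__main__":
    pattern = r"ftpd\[\d+\]: FTP LOGIN FROM (\d+\.\d+\.\d+\.\d+) as (\w+)"
    line = "Jan 24 10:00:00 host ftpd[1234]: FTP LOGIN FROM 10.1.2.3 as jblow"  # invented sample
    print(expand_action(pattern, line, "mail admin ; FTP from $1 as $2"))
```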
Then, you need to add a threshold to your actions:
Alpha##foobar
    if 30/day mail admin ; lots of foobar
    if over 30/day mail admin ; lots of foobar
    if under 30/day mail admin ; not enough foobar
On each iteration of OmniCheck, the number of matches for all patterns that have been tagged will be stored in a .thresh file in the tmpdir directory, timestamped to when the match occurred. The .thresh file is kept manageable by trimming off data entries that are older than twice the maximum threshold time value within the rule files.
If the control word 'over' is used, and the number of matches for a particular pattern, including those found in the current iteration, is greater than or equal to the quantity per time unit specified in the action, then the action is invoked; otherwise, it is not.
If the control word 'under' is used, and the number of matches for a particular pattern, including those found in the current iteration, is less than or equal to the quantity per time unit specified in the action, then the action is invoked; otherwise, it is not.
The default control is 'over'.
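The over and under rules can be sketched in Python as follows. The sketch assumes the stored history is a list of match timestamps in seconds and that the window is counted back from the current time. The function names rate_threshold_met and trim_history, and the in-memory list standing in for the .thresh file, are illustrative assumptions.

```python
import time


def rate_threshold_met(timestamps, quantity, window_seconds, control="over", now=None):
    """Decide whether a rate-thresholded action should be invoked.

    'over'  : invoked when matches within the window are >= quantity (the default).
    'under' : invoked when matches within the window are <= quantity.
    """
    now = time.time() if now is None else now
    recent = sum(1 for stamp in timestamps if now - window_seconds <= stamp <= now)
    if control == "over":
        return recent >= quantity
    if control == "under":
        return recent <= quantity
    raise ValueError(f"unknown control word: {control!r}")


def trim_history(timestamps, max_window_seconds, now):
    """Drop entries older than twice the largest threshold window."""
    cutoff = now - 2 * max_window_seconds
    return [stamp for stamp in timestamps if stamp >= cutoff]


if __name__ == "__main__":
    day = 86400
    now = 1_000_000
    history = [now - minute * 60 for minute in range(30)]  # 30 matches in the last half hour
    print(rate_threshold_met(history, 30, day, "over", now))
    print(rate_threshold_met(history[:29], 30, day, "over", now))
```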
Labels can contain variables that are filled in with captured sections from the pattern, just as in actions.
Alpha_$1##foo: (\w+)
    if 30/day mail admin ; lots of foobar
    if over 30/day mail admin ; lots of foobar
    if under 30/day mail admin ; lots of foobar
The available time units are:
h, hr, hour, hrs, or hours
d, day, or days
w, week, wk, wks, or weeks
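A quantity-per-time-unit value such as 30/day can be converted to a count and a window length using the unit spellings listed above. The Python sketch below covers only those spellings. The function parse_rate is hypothetical, and lower-casing the input is an assumption.

```python
import re

# Unit spellings taken from the list above, mapped to seconds.
UNIT_SECONDS = {}
for names, seconds in (
    (("h", "hr", "hour", "hrs", "hours"), 3600),
    (("d", "day", "days"), 86400),
    (("w", "week", "wk", "wks", "weeks"), 604800),
):
    for name in names:
        UNIT_SECONDS[name] = seconds


def parse_rate(spec: str):
    """Turn 'N/unit' into (N, window_in_seconds)."""
    match = re.fullmatch(r"(\d+)/([a-z]+)", spec.strip().lower())
    if match is None or match.group(2) not in UNIT_SECONDS:
        raise ValueError(f"unrecognised rate: {spec!r}")
    return int(match.group(1)), UNIT_SECONDS[match.group(2)]


if __name__ == "__main__":
    for spec in ("30/day", "5/hrs", "2/wk"):
        print(spec, parse_rate(spec))
```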
omnicheck.err, depending on the value of the debug configuration file directive:
-F command-line option. Failing to provide that command-line option will cause OmniCheck to terminate.
omnicheck.out file as they occur during the normal operation of OmniCheck: