[Slide background diagram; legible labels: Tapemove1, Tapemove2, Shire, Objyserv1, Sun Enterprise 150, Enterprise 450, E450, SPARCstorage Library, A3500, .5TB]
Patrol/Ranger Update
Chuck Boeheim
Assistant Director
SLAC Computer Services
History
Patrol originated in 1994
• Originally only to renice processes
• Extended to monitor filesystems, daemons, and to perform more notifications/repairs
Downloaded by over 300 sites, in production use at about 20 known sites
Limitations
Original rules language simple, columnar
    PC afs[0-9]* 50 log,mail(unix-admin)
Difficult to extend to express complexities
• E.g., renice processes using more than 20% of the CPU if the load average is over 3.
Written in Perl4, limited by not having complex data structures
The Rewrite
Update to Perl5
Introduce new rules language
Introduce extensible data collectors
Rename to System Ranger
Rules file structure
Config section supplies local customizations
Ruleset sections define data collectors and the set of rules to be applied to them
Message section defines message texts
Config section
Supplies the common customizations made at other sites
config
{
    optsfile(/etc/tailor.opts)
    path(/usr/ucb:/bin:/usr/bin)
    mailfrom('The System Ranger <root>')
    mailreply('Unix Admins <unix-admin>')
}
Rulesets
Rulesets name a set of rules and associate them with a data collector
Ruleset(anyname) collector(process)
{
    list of rules...
}
Builtin data collectors are: System, Process, Daemon, User, Filesystem, File, Service
Custom collectors are planned
Rules
A rule is a set of function calls in braces
    Rule { cpu(gt,50) kill() log() }
Functions return SUCCESS or FAILURE
FAILURE causes the remainder of the rule not to be executed; execution passes to the next rule
A rule that succeeds ends processing of the ruleset unless the CONTINUE function appears in it.
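The control flow described on this slide can be modeled in a few lines. This is a minimal Python sketch, not Ranger's Perl5 implementation: `run_ruleset`, the `CONTINUE` marker object, and the helper functions `cpu_gt`, `kill`, and `log` are hypothetical names, and each rule is simply a list of callables that return True (SUCCESS) or False (FAILURE).

```python
# Hypothetical marker standing in for the CONTINUE function.
CONTINUE = object()


def run_ruleset(rules, item):
    """Apply rules in order to one item; return the indexes of rules that succeeded."""
    fired = []
    for index, rule in enumerate(rules):
        keep_going = False
        succeeded = True
        for fn in rule:
            if fn is CONTINUE:
                keep_going = True      # a success will not end the ruleset
            elif not fn(item):
                succeeded = False      # FAILURE: skip the rest of this rule
                break
        if succeeded:
            fired.append(index)
            if not keep_going:
                break                  # a successful rule ends the ruleset
    return fired


# Illustrative stand-ins for rule functions; actions are recorded, not performed.
actions = []


def cpu_gt(limit):
    return lambda proc: proc["cpu"] > limit


def kill(proc):
    actions.append(("kill", proc["pid"]))
    return True


def log(proc):
    actions.append(("log", proc["pid"]))
    return True


# Roughly: Rule { cpu(gt,50) kill() log() } followed by Rule { cpu(gt,25) log() }
rules = [
    [cpu_gt(50), kill, log],
    [cpu_gt(25), log],
]
```

With these rules, a process at 60% CPU is handled by the first rule only, while one at 30% falls through to the second.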
Rules
The word OR may connect functions
    Rule { cpu(gt,50) or size(gt,20M) kill() }
A sequence of functions in braces returns SUCCESS or FAILURE for the entire sequence
    Rule { {cpu(gt,50) kill()} or cpu(gt,25) log }
A sequence of functions in brackets always returns SUCCESS
    Rule { cpu(gt,50) [size(gt,10M) kill] log }
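One way to picture the three grouping forms is as small combinators. The Python sketch below is an illustration only: `seq`, `either`, and `optional` are hypothetical names, and it assumes that OR tries its alternatives left to right and stops at the first SUCCESS, which the slides do not state.

```python
def seq(*fns):
    """Braces: run in order, stop at the first FAILURE; the group fails if any member fails."""
    return lambda item: all(fn(item) for fn in fns)


def either(*fns):
    """OR: try alternatives left to right, stop at the first SUCCESS (assumed)."""
    return lambda item: any(fn(item) for fn in fns)


def optional(*fns):
    """Brackets: run like a sequence, but always report SUCCESS."""
    group = seq(*fns)

    def run(item):
        group(item)
        return True

    return run


# Illustrative stand-ins for rule functions; actions are recorded, not performed.
MB = 1024 * 1024
actions = []


def cpu_gt(limit):
    return lambda proc: proc["cpu"] > limit


def size_gt(limit):
    return lambda proc: proc["size"] > limit


def kill(proc):
    actions.append("kill")
    return True


def log(proc):
    actions.append("log")
    return True


# Rule { cpu(gt,50) [size(gt,10M) kill] log }
bracket_rule = seq(cpu_gt(50), optional(size_gt(10 * MB), kill), log)

# Rule { {cpu(gt,50) kill()} or cpu(gt,25) log }
brace_rule = seq(either(seq(cpu_gt(50), kill), cpu_gt(25)), log)
```

In `bracket_rule`, a small process over 50% CPU is logged but not killed, because the failed size test inside the brackets does not fail the rule.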
Selection Functions
Apply to specific machines:
• host
• option
• arch
• test
Apply to specific instances:
• user
• group
• name
All tests may be negative or positive, e.g., host(icarus) or user(!root)
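A selection test with optional negation might look like the following Python sketch. The function name `selects` is hypothetical, and two details are assumptions rather than documented behavior: that patterns such as `afs[0-9]+` are regular expressions, and that they must match the whole value.

```python
import re


def selects(pattern, value):
    """Return True if value passes the test; a leading '!' negates the test."""
    negate = pattern.startswith("!")
    if negate:
        pattern = pattern[1:]
    # Assumption: the pattern is a regular expression anchored to the whole value.
    matched = re.fullmatch(pattern, value) is not None
    return matched != negate


print(selects("icarus", "icarus"))   # positive test, as in host(icarus)
print(selects("!root", "alice"))     # negative test, as in user(!root)
```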
Comparison Functions
Determine when thresholds are crossed
• cpu - percent of CPU
• size - memory or file size or rate of change
• time - total CPU time
Or test global values
• loadavg, numusers, numprocs, uptime
Have optional first argument specifying comparison: gt, lt, eq, etc.
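The optional comparison argument and the unit suffixes seen in the sample rules (`20M`, `6h`) could be handled as in this Python sketch. Everything here is an assumption made for illustration: the names `compare` and `parse_threshold`, the default of `gt` when the comparison is omitted, and the suffix values (M as 1024 × 1024 bytes, h as 3600 seconds).

```python
import operator

# Comparison names that appear in the slides: gt, lt, eq, and ne.
COMPARISONS = {
    "gt": operator.gt,
    "lt": operator.lt,
    "eq": operator.eq,
    "ne": operator.ne,
}

# Assumed suffix meanings: M = mebibytes, h = hours (in seconds).
UNITS = {"M": 1024 * 1024, "h": 3600}


def parse_threshold(text):
    """Convert a threshold such as '50', '20M', or '6h' to a number."""
    if text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)


def compare(value, *args):
    """compare(v, 'gt', '50') or compare(v, '50'); the comparison defaults to gt (assumed)."""
    if len(args) == 2:
        name, threshold = args
    else:
        name, threshold = "gt", args[0]
    return COMPARISONS[name](value, parse_threshold(threshold))
```

For example, `compare(8, "ne", "8")` is False, mirroring a test like `number(ne,8)` that only fires when the count differs from 8.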
Action Functions
Specify some action to perform
• log
• mail
• page
• kill, signal (by pid or name)
• nice
Sample Process Rules
Rule { host(www.*) pct(gt,10) or size(gt,20M)
       mail(PROC_REPORT,www-monitor) mcons(info) log
}
Rule { {time(gt,6h) kill mail(OVERLIM, $user)} or
       {time(gt,4h) mail(WARN2, $user)} or
       {time(gt,2h) mail(WARN1, $user)}
}
Message OVERLIM <<EOF
The CPU limit for $host is 6 hours. Your
process $pid $cmd has been terminated for
exceeding the limit.
EOF
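Message texts use `$name` placeholders such as `$host`, `$pid`, and `$cmd`. As a rough illustration of how such a template gets filled in, the Python sketch below uses the standard library's `string.Template`, which happens to share the same `$name` syntax. The `render` helper and the sample field values are hypothetical, and this is not how Ranger itself expands messages.

```python
from string import Template

# Text of the OVERLIM message from the slide.
OVERLIM = Template(
    "The CPU limit for $host is 6 hours. Your\n"
    "process $pid $cmd has been terminated for\n"
    "exceeding the limit.\n"
)


def render(message, **fields):
    """Fill in the known fields; unknown placeholders are left untouched."""
    return message.safe_substitute(**fields)


# Hypothetical field values for one offending process.
text = render(OVERLIM, host="shire", pid=1234, cmd="bigjob")
print(text)
```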
Sample Filesystem Rules
Rule { name(/u[0-9]) pct(gt,99,90+1) page(admin)}
Rule { host(afs[0-9]+) name(/vicep.*)
{ host(afs07) name(/vicepg) } or
{ host(afs08) name(/vicepf) } or
{ pct(gt,98) mail(FSFULL, admin) }
}
Message FSFULL <<EOF
File system $name is $pct% full, grew by $delta%.
EOF
Sample File Rules
Rule { name(/var/adm*) size(gt,1M) page(admin) }
Rule { name(/etc/passwd) md5()
mail(PSWDCHG, admin)
}
Message PSWDCHG <<EOF
File $name has been changed!
EOF
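The `md5()` test in the second rule presumably succeeds when a file's checksum differs from the one seen on an earlier pass; the slides do not spell this out, so treat that reading as an assumption. Under it, change detection can be sketched in Python as follows, with `file_md5`, `md5_changed`, and the `remembered` dictionary all hypothetical names.

```python
import hashlib


def file_md5(path):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_changed(path, remembered):
    """Return True if the file's checksum differs from the one remembered earlier.

    The first time a path is seen there is nothing to compare against,
    so the checksum is recorded and False is returned.
    """
    current = file_md5(path)
    previous = remembered.get(path)
    remembered[path] = current
    return previous is not None and previous != current
```

A monitor would keep `remembered` between passes and raise the alert (here, the PSWDCHG mail) whenever `md5_changed` returns True.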
Sample Daemon Rules
Rule { name(nfsd) number(ne,8) page(admin) }
Rule { name(pud) number(lt,1) restart(pud) }
Rule { name(amd) number(gt,1) page(admin) }
Sample User Rules
Still somewhat experimental
Rule { user(!root) number(gt,3) pct(gt,50)
mail(CPUHOG, admin)
}
Message CPUHOG <<EOF
User $user has $number processes using $pct%
of the CPU on $host.
EOF
Why Ranger?
Some automatic monitoring is needed
Commercial packages are complex and expensive
Ranger does a lot in a small package
Because it's cool
Availability
Needs a bit more shakedown at SLAC before distribution
Look for it via http://www.slac.stanford.edu/~boeheim
Will be starting a mailing list; send email to be included