
EM12c Monitoring Best Practices
Author: Rob Zoeteweij
Date: 13 October 2012
http://oemgc.wordpress.com

Some weeks ago I posted an article on my blog after attending Ana McCollum’s presentation “Beyond the Basics: Making the Most of Oracle Enterprise Manager Monitoring” at OOW 2012.

In this document I have elaborated on my notes to give a good overview of all topics discussed during the presentation. All credit goes to Ana!

In my opinion this document could very well be the basis for your own “EM12c Best Practices” document.

I included some snippets of pictures I took of the slides during the presentation. They are a bit blurry (sorry for that), but I hope they will give a bit more understanding.

Creating the Administration Group Hierarchy
• Specify multiple values for the target property criteria
• Target Type criteria: Database, Listener, ASM belong to the same group instead of 3 groups
• Set the time zone when you define the group
  o Time zone is used for group operations and charts
  o All subgroups will default to the same time zone
• After the hierarchy is created, you can:
  o Add or remove values for a target property (expand/shrink hierarchy horizontally)
  o Add or remove target property criteria (add/remove a level)
    ▪ Hierarchy will be deleted and re-created
    ▪ Template Collections will remain but will need re-association
  o Rename any group (EMCLI rename_target)

How do I set Target Properties so Targets join Administration Groups?
• Set properties during target addition/promotion workflow
  o Target Properties page in console
    ▪ Target menu → Target Setup → Properties
    ▪ Possible Property Values are based on Administration Group Hierarchy (New in Rel2)
  o Use EMCLI set_target_property_value for setting the Property Values for multiple Targets at once
• Aggregate Targets
  o Cluster targets
    ▪ Target property set on the cluster automatically applies to all members
  o Non-cluster aggregate targets
    ▪ Target property set on aggregate does not auto-apply to members
      • Members could be part of different aggregate targets, so properties need to be set explicitly
    ▪ Templates are auto-applied only to members whose target properties match the group criteria (aka Direct Members)
    ▪ To set a target property on an aggregate and its current members
      • EMCLI set_target_property_value -propagate_to_members
      • Example: set the Location property of a database system including its members:
        emcli set_target_property_value -property_records="dbrac_sys:oracle_dbsys:Location:Bangalore" -propagate_to_members
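The membership rules above can be illustrated with a small model. This is only a sketch of the behaviour described in the notes, not how Enterprise Manager is implemented: the `Target` class and the `joins_group` and `set_property` functions are hypothetical names invented for this illustration, and the "Amsterdam" value is made up.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    """Hypothetical stand-in for a managed target and its target properties."""
    name: str
    properties: dict = field(default_factory=dict)
    members: list = field(default_factory=list)


def joins_group(target, criteria):
    """A target is a direct member when each criterion matches an allowed value."""
    return all(target.properties.get(prop) in allowed
               for prop, allowed in criteria.items())


def set_property(target, prop, value, propagate_to_members=False):
    """Set a property; copy it to current members only when asked to."""
    target.properties[prop] = value
    if propagate_to_members:
        for member in target.members:
            set_property(member, prop, value, propagate_to_members=True)


db1, db2 = Target("db1"), Target("db2")
system = Target("dbrac_sys", members=[db1, db2])
criteria = {"Location": {"Bangalore", "Amsterdam"}}

# Without propagation only the aggregate itself matches the group criteria.
set_property(system, "Location", "Bangalore")
print(joins_group(system, criteria), joins_group(db1, criteria))

# With propagation the current members match as well.
set_property(system, "Location", "Bangalore", propagate_to_members=True)
print(joins_group(db1, criteria), joins_group(db2, criteria))
```

The first call mirrors a non-cluster aggregate where members keep their own property values; the second mirrors the effect of `-propagate_to_members`.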

What Monitoring Settings will be applied to the Administration Group?
• Enhanced Group Management Settings (New Rel2)
  o Use on LEAF Groups
  o Shows parent groups/template collections
  o Review specific monitoring templates
  o Review combined monitoring settings from multiple templates
  o Verify if management settings have been applied to the group

Are my Targets monitored using our Standards for Monitoring?
• Check the Synchronization Status region of the TOPMOST administration group
  o Shows sync status of all targets in hierarchy

Sync Status Column     What to do
Synchronized Targets   Nothing. Targets are in sync with monitoring templates.
Pending Targets        Ensure you have a Global Sync Schedule defined. Indicated by the ‘Next Synchronization’ date; if N/A, set a schedule.
Running Targets        Nothing. Check later to see if they are all synchronized.
Failed Targets         Drill down to get details; fix where possible. A re-sync will be attempted on the next sync schedule, or on demand by the user.
N/A Targets            Targets have no associated monitoring template. Drill down to get the target type, then add a monitoring template to the template collection.
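The sync status table is effectively a lookup from status to follow-up action. A minimal Python sketch of that lookup is shown here, for example for post-processing a status list you have collected yourself. The dictionary, the function name and the sample target names are hypothetical, and the action texts paraphrase the table.

```python
# Follow-up action per sync status, paraphrased from the table above.
SYNC_ACTIONS = {
    "Synchronized": None,  # nothing to do
    "Pending": "Check that a Global Sync Schedule is defined",
    "Running": None,  # nothing to do yet, check again later
    "Failed": "Drill down, fix, then re-sync on schedule or on demand",
    "N/A": "Add a monitoring template for this target type to the template collection",
}


def follow_ups(status_by_target):
    """Return {target: action} for targets whose status calls for action."""
    return {
        target: SYNC_ACTIONS[status]
        for target, status in status_by_target.items()
        if SYNC_ACTIONS.get(status) is not None
    }


# Hypothetical sample data.
sample = {"db01": "Synchronized", "db02": "Failed", "lsnr01": "N/A"}
for target, action in follow_ups(sample).items():
    print(f"{target}: {action}")
```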

   

Privileges required for Monitoring Setup
• You need to use super administrator to perform these actions

Monitoring Setup                                          Required Privilege
Create Administration Group Hierarchy                     • FULL Any Target
                                                          • Create Privilege Propagating Group
Use Monitoring Templates                                  • None to create
                                                          • View on specific Monitoring Template
Use Template Collections                                  • Create Template Collection
                                                          • View/Full on specific Template Collections or View any Template Collection
Associate Template Collection with Administration Group   • Operator on group
                                                          • View on Template Collection

Incident Management
• Manage by Incidents
  o Significant events
  o Combination of events related to the same issue (e.g. events raised from database, host, storage indicating lack of space)
• Centralized incident management console
  o View, manage, diagnose and resolve incidents from one location
• Support for incident lifecycle operations
  o Assign, acknowledge, prioritize, track status, escalate, suppress
  o Notify and open helpdesk ticket
• Integrated Oracle expertise
  o Access to My Oracle Support knowledge base
  o Accelerates incident and problem diagnosis and resolution

What Targets should be used in Rule Sets?
• Specify group(s)/systems
  o Specify administration group(s) if applicable
  o Rules keep up with changes in group membership
  o Example: All database targets whose Lifecycle Status = ‘Mission Critical’ or ‘Production’

How do I organize my Rule Sets / Rules?
• Combine all rules applying to the same group in one rule set
• Leverage the order of rules within a rule set and group similar rules together:
  o Rules to create incidents
  o Rules to manage incidents (email, ticketing, escalation)
  o Put duration-based rules last
• Duplicate actions across rule sets
  o ‘Create Incident’: first rule wins (can’t create multiple incidents for the same event)
  o Incident workflow (assign, set priority…): last rule wins (final value from rule)
  o Notifications: all actions executed
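The three precedence behaviours for duplicate actions (first rule wins, last rule wins, all executed) can be made concrete with a small simulation. This is a sketch under stated assumptions, not the Enterprise Manager rule engine: the rule dictionaries, their keys and the `apply_rules` function are all hypothetical.

```python
def apply_rules(event, rules):
    """Apply matching rules in order and combine their actions.

    Each rule is a dict with a 'name', a 'matches' predicate and optional
    'create_incident', 'assign_to', 'priority' and 'notify' entries.
    """
    outcome = {"incident_created_by": None, "assign_to": None,
               "priority": None, "notified": []}
    for rule in rules:
        if not rule["matches"](event):
            continue
        # Create Incident: first rule wins, later duplicates are ignored.
        if rule.get("create_incident") and outcome["incident_created_by"] is None:
            outcome["incident_created_by"] = rule["name"]
        # Workflow fields: last rule wins, later values overwrite earlier ones.
        for key in ("assign_to", "priority"):
            if key in rule:
                outcome[key] = rule[key]
        # Notifications: recipients of every matching rule are notified.
        outcome["notified"].extend(rule.get("notify", []))
    return outcome


rules = [
    {"name": "A", "matches": lambda e: e["severity"] == "Critical",
     "create_incident": True, "priority": "High", "notify": ["dba-team"]},
    {"name": "B", "matches": lambda e: True,
     "create_incident": True, "priority": "Urgent", "notify": ["ops-team"]},
]
print(apply_rules({"severity": "Critical"}, rules))
```

In this run rule A creates the incident, the priority ends up as the value set by rule B, and both teams are notified.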

What Type of Rules should I choose?

Type of Rule   Best used for
Event Rule     • Create incidents based on events
               • Create helpdesk tickets for incidents
               • Send events to third party management systems
               • Send email for specific events of interest (e.g. send email to business users if target is down)
Incident Rule  • Automate incident workflow operations (e.g. assign incident)
               • Send notifications on incidents
               • Create helpdesk tickets for incidents (e.g. create ticket if incident is escalated to level 2)
Problem Rule   • Automate problem workflow operations (e.g. assignment, prioritization, etc.)
               • Send notifications on problems

What Conditions should I specify in Event Rules?
• Use broad criteria that span multiple target types
• Metric Alert event rule
  o Use broad criteria (e.g. all critical events or critical events on specified target types) instead of individual metrics
    ▪ Requires controlling metric alerts (thresholds) at the source
    ▪ Simplifies rule maintenance: no need to change the rule for new metrics
• Target Availability event rule
  o Based on status metric
  o Choose ‘agent unreachable’ only for host and/or agent targets
  o Choose ‘down’ for all other targets

Target Availability Event Severities

Scenario                                             Target Type                             Target Status      Availability Event Severity
Target is down                                       All target types except host and agent  Down               Fatal
Agent is down or unreachable                         Agent                                   Agent Unreachable  Critical
                                                     All non-agent targets including host    Agent Unreachable  Warning
Host is down or unreachable                          Agent on the host                       Agent Unreachable  Critical
                                                     All agents on the host including host   Agent Unreachable  Warning
Blackout started on target                           All target types                        Blackout           Advisory
Target is up (from any of the other states)          All target types                        Up                 Clear
Target is in status pending for more than 5 minutes  All target types                        Unknown            Warning
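Read as a function, the severity table maps a target status and a kind of target to an event severity. The sketch below transcribes it. The function name and the free-form labels such as "agent", "host" and "database" are assumptions for this illustration, not Enterprise Manager target type names.

```python
def availability_severity(status, target_type):
    """Return the availability event severity for a status/target-type pair.

    Returns None for combinations the table does not list, for example
    'Down' on a host or agent.
    """
    if status == "Down":
        # The 'Down' row covers all target types except host and agent.
        return None if target_type in ("agent", "host") else "Fatal"
    if status == "Agent Unreachable":
        # Critical on the agent itself, Warning on everything it monitors.
        return "Critical" if target_type == "agent" else "Warning"
    return {"Blackout": "Advisory", "Up": "Clear", "Unknown": "Warning"}.get(status)


for status, kind in [("Down", "database"), ("Agent Unreachable", "agent"),
                     ("Agent Unreachable", "host"), ("Blackout", "listener")]:
    print(status, kind, availability_severity(status, kind))
```

This also shows why the notes recommend choosing ‘agent unreachable’ only for host and agent targets: for other targets it only ever yields a Warning, while ‘down’ yields Fatal.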

What Conditions should I specify in Event Rules? (continued)
• Job Status event rule
  o No job events unless you set it up
  o Setup → Incidents → Job Events
  o Choose Job Status to raise events
    ▪ Action Required, Problems are defaults
  o Select targets on which job events are raised (tip: use groups)

Who gets notified for Events / Incidents?
• Checklist for email notifications
  o Recipient must have at least View on the source of the event
  o Recipient must have email address and notification schedule
• Can specify direct email addresses including distribution lists
• Leverage TO: vs CC: email notification
  o TO: recipients: Best used to enforce mandatory recipients of the email. Only the rule creator can add these
  o CC: recipients: Best used for interested parties; users who self-subscribe to the rule are added to the CC line
• Take advantage of ‘page’ vs ‘email’ classification
  o Enables easier setup for: Page me for critical, email me for warning
• Use variables as notification recipients:
    $INCIDENT_OWNER$
    $TGT_OWNER$
    $PROBLEM_OWNER$
    $SOURCE_OBJ_OWNER$
  Example: Set up a single rule to send email notifications to the $INCIDENT_OWNER$ when an incident is assigned to them
• Tailor email message (Subject, Body) formats for your requirements
  o Setup → Notifications → Customize Email Format
  o Customize per event type, incident or problem email messages
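The idea behind recipient variables is that one rule can address whoever currently owns the incident, target or problem. The sketch below shows that substitution in Python. It is an illustration only: `resolve_recipients`, the context keys and the sample address are hypothetical, and Enterprise Manager performs this resolution internally.

```python
import re

# Matches recipient variables of the form $NAME$.
PLACEHOLDER = re.compile(r"\$([A-Z_]+)\$")


def resolve_recipients(recipients, context):
    """Replace $NAME$ variables with values from context.

    Literal addresses and distribution lists pass through unchanged;
    variables without a value (e.g. no owner assigned yet) are skipped.
    """
    resolved = []
    for recipient in recipients:
        match = PLACEHOLDER.fullmatch(recipient)
        if match is None:
            resolved.append(recipient)
        elif context.get(match.group(1)):
            resolved.append(context[match.group(1)])
    return resolved


recipients = ["$INCIDENT_OWNER$", "dba-list@example.com", "$PROBLEM_OWNER$"]
print(resolve_recipients(recipients, {"INCIDENT_OWNER": "jsmith"}))
```

Because the owner is looked up at notification time, the same rule keeps working when incidents are reassigned.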

Using Incident Rule Sets
• For rules with duration conditions, put more specific criteria in the rule
  o Rule: After 7 days, for all critical events, clear event
  o Better rule: After 7 days, for all critical Generic Alert Log Error events, clear event
• Leverage failover feature for SMTP gateway by specifying multiple gateways
• Set up repeat notifications for important incidents
  o Will repeat until cleared or acknowledged
  o Acknowledge the incident via Console or Enterprise Manager Mobile

How to clear these Events / Incidents?
• Incidents will auto-clear if all their underlying events are cleared
  o Most events auto-clear if underlying condition is resolved
• Exception: Manually-clearable events
  o emcli clear_stateless_alerts (bulk clear for metric alerts)
  o get_metrics_for_stateless_alerts lists manually clearable metric alerts
  o Event Rule to clear events after specified duration
    ▪ Tip: Put the specific metric alerts in the rule
  o Incident Manager: clear (appears if applicable)
    ▪ Clear multiple incidents (New R2)
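The auto-clear rule (an incident clears once all of its underlying events are cleared) can be sketched as a tiny data model. The `Event` and `Incident` classes and the event names are hypothetical and exist only to illustrate the rule. Treating an incident without events as not cleared is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    name: str
    cleared: bool = False


@dataclass
class Incident:
    events: list = field(default_factory=list)

    @property
    def cleared(self):
        # Cleared only when every underlying event is cleared.
        return bool(self.events) and all(e.cleared for e in self.events)


tablespace = Event("tablespace full")
filesystem = Event("filesystem full")
incident = Incident(events=[tablespace, filesystem])

tablespace.cleared = True
print(incident.cleared)  # one event is still open

filesystem.cleared = True
print(incident.cleared)  # all events cleared, so the incident is cleared
```

A manually-clearable event in this model is simply one whose `cleared` flag never flips on its own, which is why the incident stays open until someone clears it.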

Leveraging Incident Manager
• To filter on incidents of interest, create custom views on groups or by lifecycle status (New R2)
• To enable more granular tracking of the incident status, add new resolution status values (e.g. Waiting on SME)
  o emcli create_resolution_state
• Leverage ‘Resolved’ incident status as ‘soft closed’
  o Set this when the fix has been implemented
  o Enterprise Manager will set to ‘Closed’ when the underlying event/incident is cleared

Maintain Priority processing of important Targets
• Set Lifecycle Status target property especially for important targets
  o Mission Critical, Production, Staging, Test, Development
  o Highest priority ---> Lowest priority
• Used to prioritize loading of data and metric alerts, and processing of events for notifications, creating incidents, etc.
• Enable priority processing of important targets even if managed targets increase
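The effect of the Lifecycle Status ordering can be pictured as a priority sort over pending work. The sketch below is an illustration, not the actual Enterprise Manager scheduler: the function name and sample targets are hypothetical, and placing targets without a Lifecycle Status last is an assumption of this example.

```python
# Lifecycle Status values from highest to lowest priority, as listed above.
LIFECYCLE_ORDER = ["Mission Critical", "Production", "Staging", "Test", "Development"]
RANK = {status: rank for rank, status in enumerate(LIFECYCLE_ORDER)}


def processing_order(items):
    """Sort (target, lifecycle_status) pairs so important targets come first.

    Targets with an unset or unknown status sort last (assumption).
    The sort is stable, so arrival order is kept within one status.
    """
    return sorted(items, key=lambda item: RANK.get(item[1], len(LIFECYCLE_ORDER)))


queue = [("devdb", "Development"), ("scratch", None),
         ("erpdb", "Mission Critical"), ("webdb", "Production")]
for target, status in processing_order(queue):
    print(target, status)
```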