Grid Control agent crashes: too many open files, health check error

The agent (10.2.0.4) crashes often but intermittently on a 64-bit site with many databases (10.2.0.3). The symptom is "too many open files", and emagent.trc shows a ‘health check’ error.

In the past, the agent logged messages like this:

Number files opened by Agent is 1140.
These files turned out to be the $ORACLE_HOME/dbs/hc_<instance>.dat files, which were mapped into memory many times over.

I checked this by first finding the process ID of the agent:
–> ps -ef | grep emagent
Then I looked at the memory map of the agent process (here, for example, 23645):
–> pmap -x 23645
This produced a very, very long list, so I hit the jackpot…
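
To put a number on that long list, the pmap output can be filtered for the health check files. This is a minimal sketch, assuming the mapped files appear in the last column of the pmap output with names that start with hc and end in .dat. The helper name count_hc_mappings is hypothetical and not part of any Oracle tooling.

```shell
# Count health check file mappings in pmap output read from stdin.
# Assumption: the mapping name is the last column and looks like hc*.dat.
count_hc_mappings() {
    awk '{ name = $NF; sub(/.*\//, "", name) }   # strip any directory part
         name ~ /^hc.*\.dat$/ { n++ }
         END { print n + 0 }'
}

# Usage (23645 is the example PID from above):
#   pmap -x 23645 | count_hc_mappings
```

A count far above the number of database instances on the host would point to the same file being mapped repeatedly.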

Log excerpt from emagent.trc ($AGENT_HOME/sysman/log):

2008-04-22 15:29:20,221 Thread-4094679984 ERROR engine: [oracle_database,<rep_database>,health_check] :
nmeegd_GetMetricData failed : Instance Health Check initialization failed due to one of the
following causes: the owner of the EM agent process is not same as the owner of the Oracle
instance processes; the owner of the EM agent process is not part of the dba group; or the
database version is not 10g (10.1.0.2) and above.

Cause: Bug 5872000 – HEALTHCHECK ERROR OCCURS FOR 32BIT DATABASE ON 64BIT OS DUE TO BUG4526916 FIX.
The Healthcheck file, $RDBMS_HOME/dbs/hc_.dat, differs in size from the memory structure the agent uses to read it. The database creates this file at startup if it is not present.

This happens when the database and agent versions differ, e.g. database 10.2.0.4 with agent 10.2.0.3, or vice versa.
In the latter case (database 10.2.0.3, agent 10.2.0.4) it is solved by upgrading the database to 10.2.0.4 or 11.1.0.7.

Possible solutions in my case:

1. Apply Patch 5872000 to 32-bit or 64-bit databases on 64-bit machines.
The patch applies to databases from 10.1 up to 10.2.0.3, and to 11.1.0.6.
After applying the patch, the following file may need to be removed from the database's $ORACLE_HOME/dbs directory before starting the database: hc_.dat
NOTE: The database creates this file at startup if it is not present, and the agent uses it for the Healthcheck metric. Letting the database recreate the file at startup after patching ensures that it has the format the agent expects.
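
The file removal step can be scripted. This is a minimal sketch, assuming the health check files match hc*.dat directly in the dbs directory and that the database is shut down when it runs. The helper backup_hc_files is hypothetical; it renames instead of deleting, so the old file can be put back if needed.

```shell
# Move health check files aside so the database recreates them at startup.
# Assumption: the files are named hc*.dat and live directly in the given directory.
# Run this only while the database is down.
backup_hc_files() {
    dbs_dir=$1
    for f in "$dbs_dir"/hc*.dat; do
        [ -e "$f" ] || continue              # the glob matched nothing
        mv "$f" "$f.bak" && echo "moved $f"
    done
}

# Usage:
#   backup_hc_files "$ORACLE_HOME/dbs"
```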

An easier way for the time being:
2. Disable the Healthcheck metric per database in Grid Control.
This has no consequences for the rest of the monitoring, for example whether the database is still up (I tested this first).
Note that on the Metric and Policy Settings page, on the Metrics tab, the Health Check metric itself is not displayed, even if you choose All metrics instead of the default Metrics with thresholds in the View drop-down list.

The Health Check metric is a composite metric which includes 7 metrics:
Instance Status
Instance State
Maintenance
Mounted
State Description
Unavailable
Unmounted

a. Go to the Home Page of the database for which you want to disable this metric.
In the Related Links pane at the bottom, click the link Metric and Policy Settings.
b. Go to the metric Instance Status (or any other metric belonging to the Health Check metric).
Click the link in the column “Frequency Schedule” (15 seconds by default).
c. On the Edit Collection Settings page, either
press the Disable button to disable this metric collection, or
change the collection frequency and any other value you want to change on this page.
Note: at the bottom of the page there is a section titled “Affected Metrics”, which lists all the metrics that will be changed in the same way as the current metric.
All the metrics belonging to the Health Check metric are listed there.
Hence they will all be disabled, or they will all get the new collection frequency.
d. Click Continue and then OK to save these changes in the repository.
e. Click OK once the update confirmation is received.
A new collection file for this database is then created on the monitoring agent of this target, in the directory $ORACLE_HOME/sysman/emd/collection.

Oh, and don’t forget to stop and start the agent on the target nodes after you’ve done this.
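
The restart is done with emctl from the agent home. This is a small sketch, assuming the agent's emctl is first on the PATH. The EMCTL variable is a hypothetical override added here so that a specific agent home can be targeted.

```shell
# Stop and start the Grid Control agent; start only if the stop succeeded.
# Assumption: EMCTL (hypothetical override) points at the agent's emctl, default "emctl".
restart_agent() {
    emctl_bin=${EMCTL:-emctl}
    "$emctl_bin" stop agent && "$emctl_bin" start agent
}

# Usage:
#   EMCTL="$AGENT_HOME/bin/emctl" restart_agent
```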

Grid 10.2.0.5:

Mr Akhtar Tiwana checked this issue with Oracle Support, and they suggested removing the warning and critical thresholds for the health check metrics (making them NULL), which has the same effect. The option to disable these metrics has apparently been taken away in Grid Control 10.2.0.5.

Used sources:
564617.1 Agent Fails on Instance Health Check Following Upgrade To 10.2.0.4
566607.1 Healthcheck Metric Collection Fails Since Agent was Upgraded to 10.2.0.4 on Linux x86-64 platform

April 16th, 2009 | Categories: grid control

Grid Control, OMS : large number of defunct-processes

On my Grid Control management server (Suse Linux, OMS version 10.2.0.4) a very large number of <defunct> processes built up, which eventually caused the OMS to stop responding.

It looked like this:
oracle   16932 15961  0 Mar03 ?        00:00:00 <defunct>
oracle   16987 15961  0 Mar03 ?        00:00:00 <defunct>
oracle   17027 15961  0 Mar03 ?        00:00:00 <defunct>

… etc.

The process that caused this turned out to be the iasconsole:

oracle   15961     1  0 Mar03 ?        00:05:10 /software/oracle/product/GC10g/oms10g/perl/bin/perl /software/oracle/product/GC10g/oms10g/bin/emwd.pl iasconsole /software/oracle/product/GC10g/oms10g/sysman/log/em.nohup
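
Finding the parent behind the defunct processes can be done by grouping them by parent PID. This is a sketch assuming the Linux procps version of ps, which supports the ppid and stat output columns. The helper name count_zombies_by_parent is hypothetical.

```shell
# Read "PPID STAT" pairs from stdin and print "PPID COUNT" for zombie (Z)
# processes, highest count first.
count_zombies_by_parent() {
    awk '$2 ~ /^Z/ { n[$1]++ } END { for (p in n) print p, n[p] }' | sort -k2,2nr
}

# Usage:
#   ps -eo ppid=,stat= | count_zombies_by_parent
#   ps -fp 15961          # then inspect the parent, here the PID from the listing above
```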

Stopping and starting the iasconsole cleaned up the defunct processes, but only temporarily.

Two solutions

1. Not supported by Oracle :-), but it works. Solution contributed by a guy called Seb on the forums:
– Stop the iasconsole.
– Edit $ORACLE_HOME/bin/IASConsole.pm
– Modify the following line:
from #my $ua = LWP::UserAgent->new(keep_alive=>1);
to my $ua = LWP::UserAgent->new;
– Start the iasconsole.

2. Follow note 391894.1
– This is bug 5504078, abstract: EMWD.PL SPAWNS DEFUNCT PERL PROCESSES AFTER OMS PATCH 10.2.0.2.
The script tries to locate a file called Windows_NT, which of course does not exist on Unix. Consequently a defunct process is created.
– Stop the iasconsole:
emctl stop iasconsole
– cd $ORACLE_HOME/bin
touch Windows_NT
chmod 544 Windows_NT
– Start the iasconsole:
emctl start iasconsole

March 31st, 2009 | Categories: grid control

Installing Grid Control 10.2.0.4

Quite a hassle with versions. Base: Suse Linux 9, 32-bit. Agents on Suse Linux: versions 9 and 10, 64-bit.

install-grid-control_10.2.0.4

November 14th, 2008 | Categories: grid control