grid control

Grid Control agent crashes: too many open files, health check error

The agent (10.2.0.4) crashes frequently but intermittently on a 64-bit site with many databases (10.2.0.3). The symptoms: too many open files, and emagent.trc reports a ‘health check’ error.

In the past the agent logged messages like this:

Number files opened by Agent is 1140.
These files turned out to be $ORACLE_HOME/dbs/hc<instance>_.dat, which is mapped into memory many times.

I checked this by first finding the process ID of the agent:
–> ps -ef |grep emagent
Then I looked at the memory map of the agent process (here, for example, 23645):
–> pmap -x 23645
The result was a very, very long list, so I hit the jackpot…
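
The two checks above can be turned into numbers instead of a long listing. This is only a minimal sketch, assuming Linux with /proc available. The function names count_hc_mappings and count_open_fds are made up for illustration, and the pattern hc.*\.dat is an assumption about how the health check files show up in the pmap output.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any Oracle tooling.

# Count lines on stdin that mention a health check file (assumed pattern: hc*.dat).
# grep -c exits non-zero when nothing matches, so "|| true" keeps the
# function from failing while still printing 0.
count_hc_mappings() {
    grep -c 'hc.*\.dat' || true
}

# Count the open file descriptors of a process, given its PID (Linux /proc only).
count_open_fds() {
    ls "/proc/$1/fd" | wc -l
}
```

With the example PID from above this would be used as `pmap -x 23645 | count_hc_mappings` and `count_open_fds 23645`. Run it as the owner of the agent process, otherwise /proc/23645/fd is not readable.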

The emagent.trc log (in $AGENT_HOME/sysman/log) showed:

2008-04-22 15:29:20,221 Thread-4094679984 ERROR engine: :
nmeegd_GetMetricData failed : Instance Health Check initialization failed due to one of the
following causes: the owner of the EM agent process is not same as the owner of the Oracle
instance processes; the owner of the EM agent process is not part of the dba group; or the
database version is not 10g (10.1.0.2) and above.
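
To get a feel for how often the agent hits this error, the trace file can be searched for the failing call. A minimal sketch: the function name count_healthcheck_errors is hypothetical, and the search string is simply taken from the log excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: count the health check failures in an agent trace file.
# $1 is the path to the trace file, e.g. $AGENT_HOME/sysman/log/emagent.trc.
count_healthcheck_errors() {
    # grep -c prints the number of matching lines; "|| true" avoids a
    # non-zero exit status when the count is 0.
    grep -c 'nmeegd_GetMetricData failed' "$1" || true
}
```

Usage: `count_healthcheck_errors "$AGENT_HOME/sysman/log/emagent.trc"`.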

Cause: Bug 5872000 – HEALTHCHECK ERROR OCCURS FOR 32BIT DATABASE ON 64BIT OS DUE TO BUG4526916 FIX.
The health check file, $RDBMS_HOME/dbs/hc_.dat, differs in size from the memory structure the agent uses to read it. The database creates this file at startup if it is not present.

This happens when, for example, the database is 10.2.0.4 and the agent is 10.2.0.3, or vice versa.
In the latter case (database 10.2.0.3, agent 10.2.0.4) it is solved by upgrading the database to 10.2.0.4 or 11.1.0.7.

Possible solutions in my case:

1. Apply Patch 5872000 to 32-bit or 64-bit databases on 64-bit machines.
It needs to be applied on top of database versions 10.1 through 10.2.0.3, and 11.1.0.6.
After applying the patch, the following file may need to be removed from the database's $ORACLE_HOME/dbs directory before starting the database: hc_.dat
NOTE: […]

April 16th, 2009 | grid control

Grid Control, OMS: large number of defunct processes

On my Grid Control management server (Suse Linux, OMS version 10.2.0.4) a very large number of <defunct> processes piled up, which eventually caused the OMS to stop responding.

It looked like this:
oracle   16932 15961  0 Mar03 ?        00:00:00 <defunct>
oracle   16987 15961  0 Mar03 ?        00:00:00 <defunct>
oracle   17027 15961  0 Mar03 ?        00:00:00 <defunct>

… etc.

The parent process (PID 15961) that caused this appeared to be the iasconsole:

oracle   15961     1  0 Mar03 ?        00:05:10 /software/oracle/product/GC10g/oms10g/perl/bin/perl /software/oracle/product/GC10g/oms10g/bin/emwd.pl iasconsole /software/oracle/product/GC10g/oms10g/sysman/log/em.nohup
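
To see which parent is collecting <defunct> children without reading the ps listing by eye, the zombies can be grouped by parent PID. A minimal sketch: the function name count_zombies_by_parent is made up for illustration, and it assumes a ps that supports the ppid and stat output columns (as the Linux procps version does).

```shell
#!/bin/sh
# Hypothetical helper: read "PPID STAT" pairs on stdin and print
# "<ppid> <count>" for every parent that has zombie children
# (process state starting with Z), highest count first.
count_zombies_by_parent() {
    awk '$2 ~ /^Z/ { n[$1]++ } END { for (p in n) print p, n[p] }' |
        sort -k2,2nr
}
```

Usage: `ps -e -o ppid= -o stat= | count_zombies_by_parent`. The PID on the first output line can then be looked up with `ps -fp <pid>`.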

Stopping and starting the iasconsole cleaned up the defunct processes, but only temporarily.

Two solutions

1. Not supported by Oracle :-) but it works! Solution contributed by a guy called Seb on the forums:
- Stop the iasconsole.
- Edit $ORACLE_HOME/bin/IASConsole.pm
- Modify the following line:
from #my $ua = LWP::UserAgent->new(keep_alive=>1);
to my $ua = LWP::UserAgent->new;
- Start the iasconsole.

2. Follow note 391894.1
- This is bug 5504078, abstract: EMWD.PL SPAWNS DEFUNCT PERL PROCESSES AFTER OMS PATCH 10.2.0.2.
The script tries to locate a file called Windows_NT, which of course does not exist on Unix. Consequently a defunct process is created.
- Stop the iasconsole:
emctl stop iasconsole
- cd $ORACLE_HOME/bin
touch Windows_NT
chmod 544 Windows_NT
- Start the iasconsole:
emctl start iasconsole

March 31st, 2009 | grid control

Installing Grid Control 10.2.0.4

Quite a hassle with versions. Base: Suse Linux 9, 32-bit. Agents on Suse Linux: versions 9 and 10, 64-bit.

install-grid-control_10.2.0.4

November 14th, 2008 | grid control