Agent (10.2.0.4) crashes intermittently on a 64-bit site with many databases (10.2.0.3): too many open files, and emagent.trc gives a ‘health check’ error
In the past, the Agent logged messages like this:
Number files opened by Agent is 1140.
These files turned out to be the $ORACLE_HOME/dbs/hc_<instance>.dat files, which are loaded into memory many times.
I checked this by first finding the process ID of the agent:
–> ps -ef |grep emagent
Then I checked the memory map of the agent process (here, for example, PID 23645):
–> pmap -x 23645
This gives a very, very long list, so I hit the jackpot…
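The two checks above can be turned into numbers instead of a long listing. This is only a rough sketch, assuming a Linux host with /proc and assuming the health check files show up in the pmap output as hc_<instance>.dat; the function names count_hc_mappings and count_open_fds are my own, not part of any Oracle tooling.

```shell
#!/bin/sh
# Count the lines in pmap output (read from stdin) that reference a
# health check file. Assumes the files are named hc_<instance>.dat.
count_hc_mappings() {
    # grep -c prints 0 but exits non-zero when nothing matches
    grep -c 'hc_.*\.dat' || true
}

# Count the open file descriptors of a process via /proc.
# Linux-specific assumption; prints 0 if the PID does not exist.
count_open_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Example use, with the PID found via ps as above (23645 in my case):
#   pmap -x 23645 | count_hc_mappings
#   count_open_fds 23645
```

If the first number is far larger than the number of database instances on the machine, the agent is holding the same files many times over.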
Log excerpt from emagent.trc ($AGENT_HOME/sysman/log):
2008-04-22 15:29:20,221 Thread-4094679984 ERROR engine: :
nmeegd_GetMetricData failed : Instance Health Check initialization failed due to one of the
following causes: the owner of the EM agent process is not same as the owner of the Oracle
instance processes; the owner of the EM agent process is not part of the dba group; or the
database version is not 10g (10.1.0.2) and above.
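To see how often the agent hits this error, the trace file can simply be searched for the message text. A small sketch, assuming the message wording is exactly as in the excerpt above; count_hc_errors is a name I made up for illustration.

```shell
#!/bin/sh
# Count the lines in a trace file that contain the health check
# initialization failure, as worded in the emagent.trc excerpt.
count_hc_errors() {
    # grep -c prints 0 but exits non-zero when nothing matches
    grep -c 'Instance Health Check initialization failed' "$1" || true
}

# Example use:
#   count_hc_errors "$AGENT_HOME/sysman/log/emagent.trc"
```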
Cause: Bug 5872000 – HEALTHCHECK ERROR OCCURS FOR 32BIT DATABASE ON 64BIT OS DUE TO BUG4526916 FIX.
The health check file, $RDBMS_HOME/dbs/hc_<instance>.dat, differs in size from the memory structure the Agent uses to read it. The database creates this file at startup if it is not present.
This happens when the database is, e.g., 10.2.0.4 and the agent is 10.2.0.3, or vice versa.
In the latter case, this is solved by upgrading the database to 10.2.0.4 or 18.104.22.168.
Possible solutions in my case:
1. Apply Patch 5872000 to 32-bit or 64-bit databases on 64-bit machines.
This needs to be applied on top of 10.1 -> 10.2.0.3, and 22.214.171.124 databases.
The following file may need to be removed from the database's $ORACLE_HOME/dbs directory after applying the patch and before starting up the database: hc_<instance>.dat
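The cleanup step could look roughly like this. It is a sketch under my own assumptions: the database is shut down, the files are named hc_<instance>.dat, and I rename them to .bak instead of deleting them so they can be put back if needed. The function name move_hc_files_aside is hypothetical.

```shell
#!/bin/sh
# Rename every hc_*.dat file in the given dbs directory to *.bak so the
# database recreates it at the next startup. Run only while the
# database is down.
move_hc_files_aside() {
    dbs_dir=$1
    for f in "$dbs_dir"/hc_*.dat; do
        [ -e "$f" ] || continue   # the glob matched nothing
        mv "$f" "$f.bak"
    done
}

# Example use (database's ORACLE_HOME, not the agent's):
#   move_hc_files_aside "$ORACLE_HOME/dbs"
```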