Grid Control, ORA-27102, memory needed.

Tried to prepare my 10.2.0.4 Grid Control for an upgrade to 10.2.0.5. See also http://download.oracle.com/docs/cd/B16240_01/doc/install.102/e10953/toc.htm

Updated, for example, shared_pool_size to the 128MB minimum (see below for the other recommended fixed parameter settings), although I think setting sga_target would work as well. Changed sga_max_size, shutdown, startup... unfortunately: ORA-27102, out of memory. The database couldn't be started anymore, not even in nomount (which seems logical, as the SGA is already allocated at that stage). All the notes point to the kernel parameter kernel.shmall. But when I recalculate, it should be quite sufficient.
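
The recalculation mentioned above can be sketched like this. It is a minimal sketch, assuming the usual Linux semantics (kernel.shmall counted in pages, kernel.shmmax in bytes); the function name and all example values are hypothetical and not taken from this machine.

```python
def shm_limits_allow(sga_bytes, shmall_pages, shmmax_bytes, page_size=4096):
    """Return (fits_in_shmall, fits_in_one_segment) for a requested SGA size.

    kernel.shmall is the system-wide shared memory limit in pages,
    kernel.shmmax is the maximum size of a single segment in bytes.
    """
    total_allowed = shmall_pages * page_size
    return sga_bytes <= total_allowed, sga_bytes <= shmmax_bytes


# Hypothetical values, for illustration only.
GB = 1024 ** 3
print(shm_limits_allow(sga_bytes=1 * GB, shmall_pages=2097152, shmmax_bytes=2 * GB))
```

If both values come out True and ORA-27102 still appears, the kernel limits are probably not the bottleneck, which is what I ran into here.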

Platform: Suse 9.4, 32-bit. For this customer I'm only monitoring a little over a hundred objects: 25 databases, some application servers, 13 hosts.

Put more memory in the machine (8GB): same problem. 83% of the memory appeared to be disk cache, which should be fine. Solution for the moment: start the database without sga_max_size set. Increased sga_target by 1GB: no problem. Annoying, as I now have to bounce the database whenever I change sga_target.
For the moment no other solution at hand.

Fixed Initialization Parameters (besides 128MB shared_pool)
————————————
job_queue_processes      10
db_block_size            8192
timed_statistics         TRUE
open_cursors             300
session_cached_cursors   200
aq_tm_processes          1
compatible               <currently installed Oracle Database release> (default)
undo_management          AUTO
undo_retention           10800
undo_tablespace          <any acceptable name>
processes                150
log_buffer               1048576
statistics_level         TYPICAL (Note that this value is specific only to Enterprise Manager 10g Repository Database release and later.)
TEMP space (tablespace)  50 MB (extending to 100 MB)
_b_tree_bitmap_plans     false (hidden parameter)

May 10th, 2009 | Categories: grid control

Status application server down in Grid Control, while all components are up.

Platform: application server 10.1.2.0.2 on Suse Linux 9. Grid Control agent 10.2.0.4.

All components within the application server are up, but the home page of Grid Control shows it as down.

Underlying cause: a bug in the agent's port discovery. The agent's discovery process takes the Forms URL port from <HOME_IAS>/10.1.2/forms/server/target2add.xml, but this port is not the expected one.

Two actions:

I – changed port in targets.xml in agent-home.

1. Check the real port of the Forms application URL (e.g. 7779 instead of 7778; you can see this on the home page of the middle tier in Grid Control). This is the Oracle HTTP Server Listen port. You can take it from portlist.ini (if you are sure nothing has changed), from httpd.conf, or from Grid Control by searching for ports in the HTTP section.

2. Edit <GRID_AGENT_HOME>/sysman/emd/targets.xml

For Target TYPE="oracle_forms", change the port in the property NAME="ServletUrl". In my case from 7778 to 7779.

3. Restart the agent, then check whether the Forms status changes in Grid Control.
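
The edit in step 2 can also be scripted. Below is a minimal sketch in Python, assuming targets.xml consists of Target elements with nested Property elements as described above; the function name, host name and target name are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def set_forms_port(targets_xml, new_port):
    """Return targets.xml content with the oracle_forms ServletUrl port replaced."""
    root = ET.fromstring(targets_xml)
    for target in root.iter("Target"):
        if target.get("TYPE") != "oracle_forms":
            continue
        for prop in target.iter("Property"):
            if prop.get("NAME") == "ServletUrl":
                # Replace only the port number that follows the host name.
                new_value = re.sub(r"(://[^/:]+:)\d+", r"\g<1>%d" % new_port,
                                   prop.get("VALUE"), count=1)
                prop.set("VALUE", new_value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily shortened targets.xml for illustration.
SAMPLE = (
    '<Targets><Target TYPE="oracle_forms" NAME="forms_example">'
    '<Property NAME="ServletUrl" VALUE="http://myhost:7778/forms/lservlet"/>'
    '</Target></Targets>'
)
print(set_forms_port(SAMPLE, 7779))
```

ElementTree rewrites the whole document and drops comments, so keep a backup copy of targets.xml before writing anything back.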

II – Changed port in targets.xml in midtier.

1. Navigate to "$ORACLE_HOME/sysman/emd" and open the "targets.xml" file.

2. Find the line referring to the "ServletUrl" property. It should look like this:
<Property NAME="ServletUrl" VALUE="http://<HOST>:<PORT>/forms/lservlet"/>

3. In the above line, replace the PORT with the Oracle HTTP Server Listen port.

4. Save and close the targets.xml

5. Run the following command: "opmnctl reload"

6. Run the following command: "opmnctl restartproc process-type=HTTP_Server" (or restart through the Enterprise Manager).

7. Test the status of the "Forms" component in Enterprise Manager a couple of times by refreshing the data (in the EM page on the upper right)

But remember: actions like 'agent -ca' (rediscovering) may overwrite these changes, in which case you have to repeat the actions above.

References
757363.1 : Forms Target Status Shows as Down Although It is UP.
469577.1 : Forms Looks Always Stopped From Oracle Enterprise Manager

April 21st, 2009 | Categories: grid control

Crash Grid Control agent, too many open files; health check error

The agent (10.2.0.4) crashes a lot, intermittently, on a 64-bit site with many databases (10.2.0.3): too many open files, and emagent.trc shows a 'health check' error.

In the past the agent gave messages like this:

Number files opened by Agent is 1140.
These files appeared to be $ORACLE_HOME/dbs/hc_<instance>.dat, which gets mapped into memory many times.

Checked this by first finding the process ID of the agent:
--> ps -ef | grep emagent
Then checked the memory map of the agent process (here e.g. 23645):
--> pmap -x 23645
And this is a very, very long list, so I hit the jackpot…
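
Instead of scrolling through that list, the mappings can be counted. A minimal sketch, assuming the health check files are named hc_<instance>.dat and that pmap prints the mapping name in the last column; the function name and the sample output are hypothetical.

```python
import re
from collections import Counter


def count_hc_mappings(pmap_output):
    """Count how often each hc_<instance>.dat file shows up in pmap output."""
    counts = Counter()
    for line in pmap_output.splitlines():
        match = re.search(r"(hc_\w+\.dat)\s*$", line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical pmap -x output, shortened for illustration.
SAMPLE = """23645:   emagent
Address           Kbytes     RSS   Dirty Mode   Mapping
0000000000400000     900       -       - r-x--  emagent
00002b5a1c000000    1544       -       - r--s-  hc_ORCL.dat
00002b5a1c200000    1544       -       - r--s-  hc_ORCL.dat
00002b5a1c400000    1544       -       - r--s-  hc_TEST.dat
"""
print(count_hc_mappings(SAMPLE))
```

A count that keeps growing for the same file between two runs would point at the leak described here.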

Logging of the emagent.trc ($AGENT_HOME/sysman/log):

2008-04-22 15:29:20,221 Thread-4094679984 ERROR engine: [oracle_database,<rep_database>,health_check] :
nmeegd_GetMetricData failed : Instance Health Check initialization failed due to one of the
following causes: the owner of the EM agent process is not same as the owner of the Oracle
instance processes; the owner of the EM agent process is not part of the dba group; or the
database version is not 10g (10.1.0.2) and above.

Cause: Bug 5872000 – HEALTHCHECK ERROR OCCURS FOR 32BIT DATABASE ON 64BIT OS DUE TO BUG4526916 FIX.
The health check file, $RDBMS_HOME/dbs/hc_<instance>.dat, differs in size from the memory structure the agent uses to read it. The database creates this file at startup if it is not present.

This happens when the database is e.g. 10.2.0.4 and the agent 10.2.0.3, or vice versa.
In the latter case this is solved by upgrading the database to 10.2.0.4 or 11.1.0.7.

Possible solutions for myself:

1. Apply Patch 5872000 to 32-bit or 64-bit databases on 64-bit machines.
This needs to be applied on top of 10.1 through 10.2.0.3 and 11.1.0.6 databases.
The following file may need to be removed from the database $ORACLE_HOME/dbs directory before starting up the database after applying the patch: hc_<instance>.dat
NOTE: This file is created on database startup if not present. The agent uses this file for the Healthcheck metric. Because the file is recreated on startup after the patch is applied, it will be the correct one for the agent.

An easier way for the time being:
2. Disable the Healthcheck metric per database in Grid Control:
This has no consequences for the rest of the monitoring, for example whether the database is still up (I tested this first).
In the Metric and Policy settings page, tab Metrics, you will not see the metric Health Check displayed, even if you choose All metrics instead of the default Metrics with thresholds value in the Drop Down list titled View.

The Health Check metric is a composite metric which includes 7 metrics:
Instance Status
Instance State
Maintenance
Mounted
State Description
Unavailable
Unmounted

a. Go to the database Home Page for which you want to disable this metric
At the bottom pane Related Links, click on the link Metric and Policy Settings
b. Go to the metric Instance Status (or to any other metric belonging to the Health Check metric)
Click on the link in the column “Frequency Schedule”: 15 seconds by default
c. Once in the Edit Collection Settings Home Page
Press the disable button to disable this metric collection or
Change the collection frequency and any other value you want to change in this page
Note: you will see at the bottom of the page a sheet titled “Affected Metrics” which lists all the metrics which will be changed in the same way as the current metric.
You will notice that all the metrics pertaining to the Health Check metric are listed there.
Hence they will all be disabled, or they will all get the same new collection frequency as the one currently being updated.
d. Click on Continue, then on OK, so that these changes are saved in the repository
e. Then click on OK once the Update confirmation is received.
A new collection file for this database will be created in the Monitoring Agent of this target in the directory $ORACLE_HOME/sysman/emd/collection.

Oh, and don’t forget to stop and start the agent on the target nodes after you’ve done this.

Grid 10.2.0.5:

Mr Akhtar Tiwana checked this issue with Oracle Support, and they suggested removing the warning and critical thresholds for the health check metrics (making them NULL), which has the same effect. The functionality to disable these metrics has apparently been taken away in Grid Control 10.2.0.5.

Used sources:
564617.1 Agent Fails on Instance Health Check Following Upgrade To 10.2.0.4
566607.1 Healthcheck Metric Collection Fails Since Agent was Upgraded to 10.2.0.4 on Linux x86-64 platform

April 16th, 2009 | Categories: grid control

Grid Control, OMS: large number of defunct processes

On my Grid Control management server (Suse Linux, OMS version 10.2.0.4) a very large number of <defunct> processes built up, which eventually caused the OMS to stop responding.

Looked like this:
oracle   16932 15961  0 Mar03 ?        00:00:00 <defunct>
oracle   16987 15961  0 Mar03 ?        00:00:00 <defunct>
oracle   17027 15961  0 Mar03 ?        00:00:00 <defunct>

… etc.

The process that caused this appeared to be the iasconsole:

oracle   15961     1  0 Mar03 ?        00:05:10 /software/oracle/product/GC10g/oms10g/perl/bin/perl /software/oracle/product/GC10g/oms10g/bin/emwd.pl iasconsole /software/oracle/product/GC10g/oms10g/sysman/log/em.nohup

Stopping and starting did clean up the defunct-processes, but only temporarily.
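
To see which parent is collecting the zombies, the ps listing can be summarised per parent PID. A minimal sketch, assuming `ps -ef` output with the standard column order (UID PID PPID C STIME TTY TIME CMD); the function name is hypothetical.

```python
from collections import Counter


def defunct_per_parent(ps_output):
    """Count <defunct> (zombie) processes per parent PID in `ps -ef` output."""
    counts = Counter()
    for line in ps_output.splitlines():
        fields = line.split()
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD
        if len(fields) >= 8 and "<defunct>" in line and fields[2].isdigit():
            counts[int(fields[2])] += 1
    return counts


SAMPLE = """oracle   16932 15961  0 Mar03 ?        00:00:00 <defunct>
oracle   16987 15961  0 Mar03 ?        00:00:00 <defunct>
oracle   17027 15961  0 Mar03 ?        00:00:00 <defunct>
"""
print(defunct_per_parent(SAMPLE).most_common(1))
```

For the three lines shown above this points at PID 15961, the emwd.pl iasconsole process.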

Two solutions

1. Not supported by Oracle :-) , solution contributed by a guy called Seb on the forums (but it works!):
Stop the iasconsole.
– Edit $ORACLE_HOME/bin/IASConsole.pm
– Modify the following line:
from #my $ua = LWP::UserAgent->new(keep_alive=>1);
to my $ua = LWP::UserAgent->new;
– Start the iasconsole.

2. Follow the note 391894.1
– It’s the bug 5504078 Abstract: EMWD.PL SPWANS DEFUNCT PERL PROCESSES AFTER OMS PATCH 10.2.0.2:
The script attempts to locate a file called WINDOWS_NT which on Unix of course does not exist. Consequently a defunct process is created.
– Stop the iasconsole:
emctl stop iasconsole
– cd $ORACLE_HOME/bin
touch Windows_NT
chmod 544 Windows_NT
– Start the iasconsole:
emctl start iasconsole

March 31st, 2009 | Categories: grid control

Every 5 seconds a vpxoci error in emagent.trc, ORA-25228

Noticed that the following error was popping up every 5 seconds in the emagent.trc of a 10.2.0.4 Grid Control agent on a specific node. Annoying, unnecessary:

2007-09-18 12:15:14 Thread-134875 ERROR vpxoci: Error on dequeue from SYS.ALERT_QUE: ORA-00604:
error occurred at recursive SQL level 1
ORA-06502: PL/SQL: numeric or value error
ORA-06512: at line 30
ORA-25228: timeout or end-of-fetch during message dequeue from SYS.ALERT_QUE

Metalink gave me doc ID 738638.1 as the cause, along with a solution: disable an 'after event' trigger. First of all, I couldn't find a trigger with the statement they provided (had to search 8 instances). Second, I couldn't disable this trigger, as it was an application trigger and in use.

The statement they gave was:

select trigger_name, trigger_type, triggering_event from dba_triggers where trigger_type = 'AFTER EVENT' and triggering_event = 'ERROR';

The following gave me more results:

select trigger_name, trigger_type, triggering_event from dba_triggers where trigger_type = 'AFTER EVENT' and triggering_event like '%ERROR%';

So I found the trigger, but could not disable it. So I wrapped the trigger body in a check on the login user:

begin
  if ora_login_user != 'DBSNMP' then
    …..
  end if;
end;
/

Did not need to stop the database; the error disappeared right away.

March 27th, 2009 | Categories: grid control