Wednesday, November 10, 2010

Reading the alert.log as a local table.

Reading the alert.log file is a must for the DBA: it is the primary source of information about the database. The traditional way to read it is with any text-based tool that can open the file and let you look for specific text patterns. This requires access to the operating system, which is probably something the DBA can take for granted at most shops. However, it happens that because of security policies, access to the operating system is restricted ... sounds familiar? ... so either you have someone send you the file on a regular basis, you have a link to the file (assuming proper permissions), or you have to create your own routines to access it. Other natural choices are the EM Database Control Console, the Grid Control Console, and, of course, an external table.

In this example I will create an external table to access the alert.log file.

create table ALERT_LOG (
  text_line varchar2(512)
)
organization external (
  type oracle_loader
  default directory BACKGROUND_DUMP_DEST_DIR
  access parameters (
    records delimited by newline
  )
  location ('alertORCL.log')
)
reject limit unlimited;

This example assumes the BACKGROUND_DUMP_DEST_DIR directory object exists and you have access to it. It also assumes your Oracle instance is the classic ORCL instance.
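If the directory object is not yet in place, it can be created and granted beforehand. A sketch (the path shown and the grantee are placeholders; point the directory at your actual background_dump_dest):

```sql
-- Run as a privileged user; path and grantee are placeholders.
create directory BACKGROUND_DUMP_DEST_DIR
  as 'C:\oracle\app\diag\rdbms\orcl\orcl\trace';
grant read on directory BACKGROUND_DUMP_DEST_DIR to monitor_usr;
```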

Once the external table has been created, just access it as a regular Oracle table; SELECT is the only privilege it needs.


Starting ORACLE instance (normal)
Shared memory segment for instance monitoring created
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_10 parameter default value as USE_DB_RECOVERY_FILE_DEST
Autotune of undo retention is turned on.
ILAT =18
SYS auditing is disabled

Starting up ORACLE RDBMS Version:
Using parameter settings in client-side pfile C:\ORACLE\APP\ADMIN\ORCL\PFILE\I
NIT.ORA on machine HECTOR

System parameters with non-default values:
processes = 150
memory_target = 1648M

19 rows selected.
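A handy variation is to filter the log for error lines only; a sketch:

```sql
select text_line
  from alert_log
 where text_line like 'ORA-%';
```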

This works for 9i, 10g and 11g (Rel. 1 and Rel. 2). In the particular case of 11g, Oracle created a new X$ view, X$DBGALERTEXT, which can be queried just like any other X$ view to read the alert.log file. You must have proper privileges to access it.
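On 11g the same kind of search can be done against the new view; a minimal sketch (X$ views are normally visible only to SYS unless you expose them through a view or synonym):

```sql
select message_text
  from X$DBGALERTEXT
 where message_text like '%ORA-%';
```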

Reference: Oracle Data and Database Management Utilities. Chapter 3

Tuesday, October 12, 2010

Oracle Magazine Nov-Dec 2010

I want to thank Oracle Magazine for publishing the interview in the Peer-To-Peer section.


"Which new features in Oracle Database are you currently finding most valuable? Oracle Automatic Storage Management and the Volume Manager. Another nice feature is Secure Files, which improves performance, optimizes storage, and provides an additional security layer.

What advice do you have for those just getting into application development? When designing an application, seriously evaluate the amount of intelligence it’s going to manage. This will define the amount of coding, complexity, round-trips to the server, and scalability. And be aware that the application user is different from the big database user.

Tell us about your role with Oracle University (OU) and the value you see in this program. I’ve been a certified OU instructor for more than 15 years, and I really believe that an OU course is the best way to get acquainted with Oracle technology. The courses provide a good balance between theory and practice, and professionals can start being productive the day the course concludes."

Monday, July 19, 2010

ORA-00942 Querying Tablespaces from Enterprise Manager

In order to meet minimum security requirements, use of the SYSTEM user must be restricted; only the actual DBA should have access to it. Other users requiring access to Enterprise Manager to monitor and "view" must be granted specific minimum privileges.
The SELECT_CATALOG_ROLE and CONNECT roles are good enough to see most of the E.M. contents, but when trying to access the Tablespaces page from Enterprise Manager the ORA-00942 error shows up.
When tracing, I discovered the query used to fill the Tablespaces page:

SELECT /*+ first_rows */
       d.tablespace_name,
       NVL(a.bytes / 1024 / 1024, 0),
       DECODE(d.contents, 'UNDO', NVL(u.bytes, 0)/1024/1024,
                          NVL(a.bytes - NVL(f.bytes, 0), 0)/1024/1024),
       DECODE(d.contents, 'UNDO', NVL(u.bytes / a.bytes * 100, 0),
                          NVL((a.bytes - NVL(f.bytes, 0)) / a.bytes * 100, 0)),
       a.autoext,
       DECODE(d.contents, 'UNDO', NVL(a.bytes - NVL(u.bytes, 0), 0)/1024/1024,
                          NVL(f.bytes, 0) / 1024 / 1024),
       d.status, a.count, d.contents, d.extent_management,
       d.segment_space_management /*, d.encrypted*/
  FROM sys.dba_tablespaces d,
       (select tablespace_name, sum(bytes) bytes, count(file_id) count,
               decode(sum(decode(autoextensible, 'NO', 0, 1)), 0, 'NO', 'YES') autoext
          from dba_data_files
         group by tablespace_name) a,
       (select tablespace_name, sum(bytes) bytes
          from dba_free_space
         group by tablespace_name) f,
       (SELECT tablespace_name, SUM(bytes) bytes
          FROM (SELECT tablespace_name, sum(bytes) bytes, status
                  from dba_undo_extents
                 WHERE status = 'ACTIVE'
                 group by tablespace_name, status
                UNION ALL
                SELECT tablespace_name, sum(bytes) bytes, status
                  from dba_undo_extents
                 WHERE status = 'UNEXPIRED'
                 group by tablespace_name, status)
         group by tablespace_name) u
 WHERE d.tablespace_name = a.tablespace_name(+)
   AND d.tablespace_name = f.tablespace_name(+)
   AND d.tablespace_name = u.tablespace_name(+)
   AND NOT (d.extent_management = 'LOCAL' AND d.contents = 'TEMPORARY')
   AND d.tablespace_name like '%' /*:1*/
UNION ALL
SELECT d.tablespace_name,
       NVL(a.bytes / 1024 / 1024, 0),
       NVL(t.bytes, 0)/1024/1024,
       NVL(t.bytes / a.bytes * 100, 0),
       a.autoext,
       (NVL(a.bytes, 0)/1024/1024 - NVL(t.bytes, 0)/1024/1024),
       d.status, a.count, d.contents, d.extent_management,
       d.segment_space_management /*, d.encrypted*/
  FROM sys.dba_tablespaces d,
       (select tablespace_name, sum(bytes) bytes, count(file_id) count,
               decode(sum(decode(autoextensible, 'NO', 0, 1)), 0, 'NO', 'YES') autoext
          from dba_temp_files
         group by tablespace_name) a,
       (select ss.tablespace_name, sum(ss.used_blocks * ts.blocksize) bytes
          from gv$sort_segment ss, sys.ts$ ts
         where ss.tablespace_name = ts.name
         group by ss.tablespace_name) t
 WHERE d.tablespace_name = a.tablespace_name(+)
   AND d.tablespace_name = t.tablespace_name(+)
   AND d.extent_management = 'LOCAL'
   AND d.contents = 'TEMPORARY'
   AND d.tablespace_name like '%' /*:2*/
 ORDER BY 1;

Basically all of the views mentioned here are accessible through the SELECT_CATALOG_ROLE, except SYS.TS$, which requires an explicit SELECT privilege granted by SYS to the user.
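The missing grant can be issued as follows (run as SYS; the grantee shown is a hypothetical monitoring user):

```sql
grant select on sys.ts$ to em_monitor;
```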

Wednesday, June 02, 2010

ORA-00600 Stellium

ORA-00600 [ARG1] [ARG2] [ARG3] [ARG4] [ARG5]
This is the kind of thing a production DBA does not want to read in the alert.log file, particularly when production database availability is compromised. At a time like this the only thing we would like to have is a magic wand that could make these dreaded ORA-600 errors magically disappear. The truth is that, so far, there is no such magic wand, and we all have to go through a troubleshooting procedure.

This is not a troubleshooting guide, and if you are reading this now it means either you are following my blog or you are googling after a problem you currently face and you have no clue on what is going on.

The basic troubleshooting procedure
Make sure you are at the latest patchset available for your current installed Oracle Home.
Collect and pack all the trace, and log files and raise a Service Request (SR) at Oracle Support Services (OSS).

In the meantime, try to read through the quasi-encrypted trace and log files to find out if there is a known pattern, a readable entry-point function, or a particular environmental circumstance that preceded the ORA-600; anything that can help you answer the very basic question ... what is the root problem?

In my particular environment I had an ORA-00600 stellium (a group of planets that, from the observer's perspective, are linked together in a series of continuous conjunctions). It was a cascade of ORA-00600 errors that periodically flooded my alert.log, trace files, and incident library.

The ORA-00600 Stellium
ORA-00600 [kkdcacr: ptn_kxcp]
ORA-00600 [qerrmOFbu:invalid rowcount]
ORA-00600 [kkpamRFGet()+369]

The ORA-07445 companions
ORA-07445: exception encountered: core dump [kxccexi()+1219] [ACCESS_VIOLATION] [ADDR:0xC45] [PC:0x263EFAF] [UNABLE_TO_READ] [].

The technical environment
Oracle 11g Rel. 1 (
MS Windows 2003 x64 R2 Server 5.2 SP 2
PowerEdge 2950
4 CPU's
16 Gb RAM

The pattern found was: during heavy transactional activity on several partitioned tables containing spatial data columns, suddenly all connections were broken, and in order to continue with the production environment the Oracle instance had to be restarted.

There was no specific time for this error to show up; it could be at night, at noon, or on weekends, so there was actually no time pattern. The key to solving this issue lay in the fact that there was too much transactional activity. After digging into the trace files I found out that there were memory issues, related not to the amount but to the allocation. After the Oracle instance had been running for a while (several days) under stressful transactional activity, it started to show instability with the memory allocation policies.

The SR Feedback
So far OSS told us the recurrence of the ORA-600 errors documented at the production environment is due to a bug not yet identified by Oracle. The only tasks that can be done so far are to upgrade to and wait for a patch, to be released on May 2010, that would probably fix this issue. The main argument OSS provided was that the error could not be reproduced on their environment.

At this point Oracle does not have a patch or a direct procedure to avoid the kind of ORA600 errors seen at the production environment. The notes and white papers documented so far point to some issues found on the Intel x64 platform based environments. So it can happen both at Linux or Windows.

The troubleshooting path I followed in this case was:
  • Compile and pack the facts (trace and log files in an incident pack)
  • Raise a Service Request Informing Oracle about the issues
  • Track with the assigned Oracle Support Analyst
  • Downgrade the memory parameters behavior from 11g to 10g
  • Upgrade from to
  • Upgrade Client Machines
  • Upgrade each client machine where an oracle client is installed
  • Upgrade the middle tier client that connects to the Oracle database.
  • Revert back the Oracle 11g memory allocation policy to 10g.

Three key factors
After struggling for months with these issues I was able to identify three key factors:
Upgrade to the latest patchset. Upgrading provides three advantages: first, you don't have to wait for OSS to ask for an upgrade when raising an SR; second, you rule out known bugs as the root cause; and third, it is part of the best practices.
Revert the memory allocation policy back to 10g behavior. 11g introduced a new instance parameter, memory_target, which defines the total SGA + PGA memory allocated for the current instance; this way you don't have to worry about how memory is individually distributed among the different memory components. In 10g there were two different parameters, sga_target and pga_aggregate_target, which divided the memory allocation in two parts. I am strongly convinced that there was an issue with this new allocation policy on this particular Windows 2003 x64 platform. Not too many big companies use Windows as their main production O.S. platform, and new features go through testing that sometimes cannot thoroughly reproduce a heavyweight production environment, so it is very likely that under the extreme conditions systems are sometimes exposed to, a bug hidden deep in the internals is hit. This was the case in this environment, so I suggested reverting to the 10g memory allocation policy.
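Reverting amounts to unsetting memory_target and sizing the two 10g-style parameters explicitly; a sketch, assuming an spfile is in use (the sizes are placeholders and the instance must be restarted afterwards):

```sql
alter system set memory_target = 0 scope=spfile;
alter system set sga_target = 8G scope=spfile;
alter system set pga_aggregate_target = 4G scope=spfile;
```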
Rewrite the application. Leaning on the original development team is great: you can manipulate the source code and have it tailored when required. In this case there were some applications that wildly hit the production environment with a huge amount of transactions. On this particular point the credit goes to Anil R., who knew exactly how to rewrite the code that was still triggering an ORA-07445 error after the DB was upgraded.

After one month of activity no errors have shown up, and even though O.S.S. was not able to issue a conclusive reason why these errors showed up, the troubleshooting procedure was so far successful.

An ORA-00600 is still an error that lies within Oracle's jurisdiction; don't try to rewrite the oracle.exe code when this kind of error shows up, but at least do your very best to open the ORA-600 black box by identifying the key factors that trigger the error and reading through the log and trace files. Answer the question: why did this error show up? And, as the very first troubleshooting step, raise a Service Request at Oracle Support Services.

Saturday, May 22, 2010

ORA-00600: [kgeade_is_0]

An ORA-00600 is something I don't like to see, particularly when I am working against the clock. And it is particularly annoying when it is displayed once for each datafile to be renamed.

The Error
ORA-00600: internal error code, arguments: [kgeade_is_0], [], [], [], [], [], [], []
This error shows up under these circumstances:

The Environment
  1. RDBMS version
  2. Any Platform
  3. Oracle DB Restore from an RMAN backup at the Standby Site
  4. The original Database is in ASM.
  5. The target Database is in File System.

While googling I found a recommendation that suggested using either ASM or a file system, but not both. In my case this is not an option (and in my personal opinion not a good answer either): my primary site is a Real Application Clusters on Linux x64, so ASM is mandatory since we have no cluster file system on the primary, and the standby site is a file-system-based site.
Oracle Support Services declared this a fixable bug.

Bug 7207932 OERI [kgeade_is_0] when renaming a file from ASM to a filesystem

The funny part of the diagnostics
The regular procedure to get an ORA-600 error fixed is to diagnose it at the alert.log file and to query it at the ORA-600 error lookup tool from Oracle Support Services; however, while using the latter I found out that it didn't recognize the first argument as a valid argument.

The Correcting Procedure

There are basically two ways to solve this problem on the standby site: one is to apply a patch that fixes this issue; the other is to upgrade to or greater, where this bug has already been fixed.

  • ORA-00600 [KGEADE_IS_0] When Renaming A File From ASM TO FS [ID 742289.1]
  • Bug 7207932 - OERI [kgeade_is_0] when renaming a file from ASM to a filesystem [ID 7207932.8]

Thursday, May 13, 2010

ORA-02019 : connection description for remote database not found

This error had been appearing on a random basis in the alert.log file, with no apparent reason for it to appear. And when I took a look at the details provided in the trace file, the only thing I saw was the same meaningless error pattern: the connection descriptor was not found.
In this particular case the contents at the alert.log file showed this information:

Tue Mar 09 15:45:43 2010

Errors in file ...\trace\orapro_reco_4892.trc:

ORA-02019: connection description for remote database not found

From the generated trace file name, it is the RECO background process that is responsible for this trace. RECO is used in distributed transactions, and it happens that when a database goes down, RECO tries to resend the pending transactions according to the two-phase commit protocol.
In this case there used to be a database attached to the main database, but it was gone long ago; however, some unapplied transactions remained in the transaction queue, making RECO keep retrying forever. The database was deconfigured and all the related files and file descriptors were removed.

By taking a look at the DBA_2PC_PENDING view, there were a couple of transactions that remained in the queue and were never applied.
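The listing below came from a query along these lines (the column choice is my guess at what produced the output shown, with the last number presumably COMMIT#):

```sql
select local_tran_id, global_tran_id, state, mixed, commit#
  from dba_2pc_pending;
```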


9.21.102540 ORCL.ee36125b.9.21.102540 collecting no 9462810864
8.32.86528 ORCL.ee36125b.8.32.86528 collecting no 9462814781

In this case, since there is no target database to apply the transactions to, it is enough to get rid of them by means of the PURGE_LOST_DB_ENTRY procedure:
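The calls, using the local transaction IDs listed above (DBMS_TRANSACTION is owned by SYS, so execute privilege on it is required):

```sql
exec DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('9.21.102540');
exec DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('8.32.86528');
commit;
```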


That's it, this way the ORA-02019 error is gone.

Wednesday, May 12, 2010

Data Guard ORA-16789 Health Check Warning

After setting up the Physical Dataguard environment some errors still showed up. Even though it looked fine and the Standby database was properly receiving and applying the archivelogs, the Dataguard Broker still showed some errors

DGMGRL> show configuration verbose;

Enabled: YES
Protection Mode: MaxPerformance
PRIDB - Primary database
SBDB - Physical standby database

Fast-Start Failover: DISABLED

Current status for "SBDBCONFIG":
Warning: ORA-16608: one or more databases have warnings

Data Guard Broker uses an evaluation criterion that is a sort of all or nothing: either it works fine or it doesn't, with no intermediate point. The issue here is that sometimes you have to be patient to find the errors. This is an 11g release, and though the solution found can work on 10g too, you should be aware that with the DIAGNOSTIC_DEST parameter in 11g the path to the log files has changed: ORACLE_BASE/diag/rdbms/ORACLE_SID/ORACLE_SID/trace/drcORACLE_SID.log. The errors displayed there were as follows:

Error Displayed

RSM Error: LogXptMode value 'ASYNC' of requires this database to have status redo logs, but they are not configured.
RSM0: HEALTH CHECK WARNING: ORA-16789: standby redo logs not configured
Operation CTL_GET_STATUS continuing with warning, status = ORA-16789


In this case it is pretty obvious why it is failing ... there are two solutions: one is to add the required standby logfiles matching the configured LogXptMode ASYNC mode, the other is to change LogXptMode to ARCH. I chose the first one.
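The command issued was along these lines (the group number, path, and size are illustrative only):

```sql
alter database add standby logfile group 4
  ('C:\ORACLE\ORADATA\SBDB\SRL04.LOG') size 50m;
```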

Database altered.

The command is repeated for group 5 and 6. The sizes and paths shown in this example are just for instructional purposes, on the actual production environment the size of the standby redo logs should be the same used for the production logs.

A Second Error Shows Up

The second error refers to the fact that, since adding the standby redo log files required the recovery process to be stopped, the Data Guard manager complained about it.

DGMGRL> show configuration verbose;

Enabled: YES
Protection Mode: MaxPerformance
PRIDB - Primary database
SBDB - Physical standby database

Fast-Start Failover: DISABLED

Current status for "SBDBCONFIG":
Warning: ORA-16607: one or more databases have failed

This time the Dataguard logfile reports the following:
DMON: HEALTH CHECK ERROR: ORA-16766: Redo Apply is stopped
Operation CTL_GET_STATUS canceled during phase 1, error = ORA-16766

In this case it was pretty obvious too: the redo apply process had been stopped in order to add the standby redo logs while the Active Data Guard was open.

Just restart the Recovery process and that's it.
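The restart can be sketched as follows (real-time apply, returning control to the prompt):

```sql
alter database recover managed standby database
  using current logfile disconnect from session;
```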


Database altered.

DGMGRL> show configuration verbose;

Enabled: YES
Protection Mode: MaxPerformance
PRIDB - Primary database
SBDB - Physical standby database

Fast-Start Failover: DISABLED

Current status for "SBDBCONFIG":

It works as it is supposed to.


Oracle® Data Guard Broker
11g Release 2 (11.2)

Part Number E10702-01

Tuesday, May 04, 2010

Error upgrading RMAN Catalog to


The following post describes a particular issue I wanted to document and share. The solution found for this issue worked for my particular environment, which does not mean it will work with yours. Dealing with internal Oracle structures without Oracle Support Services approval and supervision will render your database unsupported and may (and most probably will) compromise its availability.

When performing the upgrade of the recovery catalog database, an error showed up. This error has to do with a known issue when upgrading the recovery catalog to after applying the patchset against the database.
The upgrade procedure corrupts the DBMS_RCVCAT package, so it is enough to get rid of it and have the RMAN catalog owner upgrade its catalog from an RMAN prompt, as follows:

1. Connected to a SQL*Plus prompt as the recovery catalog owner issue the following commands:
sqlplus rcatOwner/rcatPassword
SQL> drop package DBMS_RCVCAT;

2. From a Recovery Manager prompt issue the following command:
rman catalog rcatOwner/rcatPassword
RMAN> upgrade catalog;
recovery catalog owner is RMAN
enter UPGRADE CATALOG command again to confirm catalog upgrade
RMAN> upgrade catalog;
recovery catalog upgraded to version
DBMS_RCVMAN package upgraded to version
DBMS_RCVCAT package upgraded to version

At this moment the procedure has successfully upgraded the repository and the DBMS_RCVCAT stored unit has been properly rebuilt.

BUT ... Yes, the big BUT. In my case a particular issue arose. Instead of reading the above successful message this is what showed up instead:

RMAN> connect catalog rman/rman
connected to recovery catalog database
recovery catalog is partially upgraded to; UPGRADE CATALOG again
RMAN> upgrade catalog;
recovery catalog owner is RMAN
enter UPGRADE CATALOG command again to confirm catalog upgrade
RMAN> upgrade catalog;
error creating upgcat_57
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-06004: ORACLE error from recovery catalog database: ORA-02293: cannot validate (RMAN.CDF_C_STATUS) - check constraint violated

I hate when RMAN complains. And it likes to do it pretty often.

RMAN.CDF_C_STATUS stands for a constraint that is added to the CDF table, which stores information about the registered datafile copies. This constraint is defined in the ?/rdbms/admin/recover.bsq file and literally reads as follows:
define upgcat_57 <<<
alter table cdf add constraint cdf_c_status check (status in ('A','U','D','X')) >>>

Notice the clause define upgcat_57 and the error message above, error creating upgcat_57.
Originally the CDF table is created with an offline constraint that can be found in the same file:

CONSTRAINT cdf_c_status CHECK (status in ('A','U','D','X','F')).

But when the upgcat_57 step is executed it magically leaves the 'F' value out.
In my particular case the CDF table had four registered datafile copies; these copies were already obsolete, but they were there anyway with an 'F' flag. When the ADD CONSTRAINT was applied against the repository table, it failed and made the upgrade procedure abort.
The workaround found for this particular case was to update the CDF table and change the 'F' flag in the status column to a 'U' value.
This renders the backups as unavailable. The flag values are not documented in the file, but as far as I know the 'A' value means available, while 'U' stands for unavailable.
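In SQL terms the workaround looks like this (run as the catalog owner; as the disclaimer above says, touching catalog tables without OSS supervision is at your own risk):

```sql
update cdf set status = 'U' where status = 'F';
commit;
```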
After manually updating the column value the procedure to upgrade the repository was run once again from the RMAN prompt, this time it was successful.
In my particular case these backups were taken a very long time ago and are no longer available, so there was no problem at all in taking this direction.
However a big issue that comes to my attention is that Oracle either has a bug in this upgrade procedure or it forgot to update the constraint declaration with the proper value set.

Update ...

While I was finding out whether this issue could be reproduced on 11g Rel. 2, I found the following lines in the recover.bsq file:

define upgcat_166 <<<
alter table cdf drop constraint cdf_c_status >>>

define upgcat_167 <<<
alter table cdf add constraint cdf_c_status CHECK (status in ('A','U','D','X','F')) >>>

So it means that someone at Oracle realized the mistake: they drop the current constraint and then add it back with all the flag values included.
I can see that if someone has not hit this bug on 11g, then most probably they will not notice it by the time they migrate to 11g Rel. 2.

Update April 30th, 2012.
I applied the upgrade to a database and this same issue showed up.   And the same workaround had to be applied.

Recovery Manager: Release - Production on Tue May 1 00:20:16 2012

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to recovery catalog database
recovery catalog is partially upgraded to

RMAN> upgrade catalog;

recovery catalog owner is RMAN
enter UPGRADE CATALOG command again to confirm catalog upgrade

RMAN> upgrade catalog;

recovery catalog upgraded to version
DBMS_RCVMAN package upgraded to version
DBMS_RCVCAT package upgraded to version

Monday, May 03, 2010

ORA-42012: error occurred while completing the redefinition

An error stack showed up when trying to use the DBMS_REDEFINITION package. After issuing the command

exec dbms_redefinition.finish_redef_table('MADRID', 'a_table', 'an_int_table', 'a_partition');

The following error stack showed up:

ERROR at line 1:
ORA-42012: error occurred while completing the redefinition
ORA-04020: deadlock detected while trying to lock object MADRID.TABLE1_INT
ORA-06512: at "SYS.DBMS_REDEFINITION", line 78
ORA-06512: at "SYS.DBMS_REDEFINITION", line 1680
ORA-06512: at line 2

When I tried to remove the interim table to restart the redefinition procedure, it conflicted with a couple of interim objects created on the fly.

SQL> drop table table1_int;
drop table table1_int
ERROR at line 1:

Materialized view dropped.

SQL> drop table table1_int;
Table dropped.

This is due to a bug in the 11.2.0 release on every OS platform. It will be fixed in future releases; in the meantime the workaround for this problem is to modify an Oracle instance parameter: just set deferred_segment_creation to FALSE and retry the operation.
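The parameter change can be sketched as follows (assuming an spfile; the parameter is dynamic, so no restart is needed, but any interim objects already created still have to be dropped first):

```sql
alter system set deferred_segment_creation = false scope=both;
```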

Wednesday, April 07, 2010

ORA-39714 After upgrade

After the execution of the @?/rdbms/admin/catupgrd.sql script on a database on Windows x32, the ORA-39714 error showed up when trying to open the database. As per the error catalog manual, it states:

ORA-39714: upgrade script utlmmig.sql failed
Cause: A normal database open was attempted, but the upgrade script utlmmig.sql failed to complete.
Action: Use the UPGRADE option when opening the database and then run utlmmig.sql.

As per the manual, I just ran the utlmmig.sql script. That's it.
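Spelled out, the action from the error message amounts to something like this (a sketch, starting from an idle instance):

```sql
SQL> startup upgrade
SQL> @?/rdbms/admin/utlmmig.sql
SQL> shutdown immediate
SQL> startup
```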

I need to find out further why this showed up. I was trying to move a 64-bit database to a 32-bit platform, so I had to deal with the database objects and the data dictionary to have it working after downgrading to the 32-bit platform. I had to recompile the database objects with the utlirp and utlrp procedures.

Friday, February 19, 2010

Error in Agent trying to install Grid Control

An error showed up when I tried to install Oracle Grid Control 10.2.0 on a Windows 2003 SE x32 server. The error was pretty much the same as the one shown with the regular Enterprise Manager Console on this platform; it has to do with the time zone issue I have already reported some time ago in this blog.

Sometimes, because of the timeout involved during the installation process, it may happen that after a second attempt the Agent can be installed; afterwards, take a look at the log files, as there may be some major issue that requires special attention.

I took a look at the \log directory and looked for the emdctl.trc file, there I found the following lines:

2010-02-19 15:30:52 Thread-1744 ERROR main: nmectla_agentctl: Error connecting to Returning status code 1
2010-02-19 15:30:53 Thread-2872 ERROR main: nmectl.c: nmectl_validateTZRegion, agentTZoffset =-360,and testTZoffset for GMT:0 do not match
2010-02-19 15:30:54 Thread-2872 ERROR main: nmectl.c: nmectl_validateTZRegion, agentTZoffset =-360,and testTZoffset for GMT:0 do not match

2010-02-19 15:49:34 Thread-2552 ERROR main: nmectla_agentctl: Error connecting to Returning status code 1
2010-02-19 15:49:44 Thread-2660 WARN http: snmehl_connect: connect failed to ( No connection could be made because the target machine actively refused it. (error = 10061)

By the way, is a fictitious server and does not have anything to do with Oracle corp.

At the log file a particular error came to my attention: the TZ error. This one has to do with the time zone changes that took place some time back. I will apply the latest patchset on top of this install, so I am not too concerned about fixing it right now, but since I want to have a 'clean' install I worked around it by commenting out the line found at the \config\ file (a routine backup is highly advisable). I changed the time zone on the Windows machine, and by the time the assistant is re-run it takes the right time zone.

###HRM: agentTZRegion=GMT

That's it, my installation took place and I can proceed with the next tasks.

Wednesday, February 17, 2010

My recent experience with VMWare

VMware ate my four CPU Cores

Today I noticed the high amount of CPU consumed by the vmware-authd.exe process. It raised the CPU consumption to 100% and left no CPU resources for any other process in the system.

The environment is a Windows 7 professional 64 bits with 4GB Ram and 4 cores (all of them at the top 100%).

The most recent post I found so far by googling is from Oct 19th 2009, and the author of the post states that a bug was filed (483679).

vmware-authd.exe is the executable for the VMware Authorization and Authentication Service for starting and accessing virtual machines. This process is required if you are not logged in with administrative privileges (which by the way is my case).

I shut down this service. In the VMware 2.0 edition the VMware Authorization service depends on the VMware Host Agent, which provides remote command and administrative control over this VMware Server host. After shutting down both processes the CPU monitor was back to normal.

This process is required in the VMware Server 2.0 version; otherwise it will not be possible to launch the console. By shutting down and restarting the services the problem seems to be fixed. Most probably it has to do with a bug.

On the other hand, I don't have anything against the new Tomcat based console, but I miss the regular windows based console.

Host Unreachable

When trying to connect to my virtual machines through the network, they replied with a 'host unreachable' error, among other unpleasant related network errors.

The current environment I have:
  • DHCP on host real network adapter
  • Loopback adapter on real host
  • A fixed IP address through a bridged virtual network adapter

In this scenario the loopback adapter was required since I am installing Oracle 11g Rel. 1 / 2 on this platform. The virtual machines were not visible, or were only partially and intermittently visible.

I set the virtual network adapter to obtain a dynamic address from the DHCP server, and it began to work. However, I still needed a fixed IP address at the virtual machine since I am installing the Oracle RDBMS 11g Rel. 1 on top of it, so I installed a loopback adapter inside the virtual machine.

I went through several Google references, and most of them talked about de-installing and re-installing the VM protocol from the real network adapter, but in this particular case that only led me to a waste of time and a server reboot. So far, everything is properly working.

When I pinged the host server I noticed a long wait for a reply, and when I tried to access a shared path from it my local machine replied with a timeout. Two issues were involved here: first, the user at the host server is a domain user, not a local user; second, the hostname has to be resolved first, which took too long, so I added the host's address to the hosts file at the virtual machine.

Network Communication to the VM was Deeeeeeadly Slooooooooooow

As if that were not enough, when I tried to transfer files from my host to the virtual machine, throughput was around 20 KB/s. Figure out what that meant for the Oracle XE installer executable (about 200,000 KB): around 2:45 hrs to transfer the whole file. I didn't want to imagine how long it would take to transfer the more than 1 GB needed to install Oracle 11g Rel. 1 on the VM. Google, you are the geek's Nirvana: after googling a while I found a reference stating that some network parameters had to be configured, namely the advanced properties of the network adapter on the physical host.
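The arithmetic behind those numbers is simple enough to sanity-check. This is just a back-of-the-envelope sketch, with the 20 KB/s rate and the file sizes taken from the figures above:

```python
def transfer_time_hours(size_kb: float, rate_kb_s: float) -> float:
    """Hours needed to move size_kb kilobytes at rate_kb_s kilobytes per second."""
    return size_kb / rate_kb_s / 3600

# Oracle XE installer, ~200,000 KB at ~20 KB/s:
print(f"XE installer: {transfer_time_hours(200_000, 20):.2f} h")  # 2.78 h, i.e. the ~2:45 hrs observed

# A 1 GB (1,048,576 KB) file at the same rate:
print(f"1 GB file:    {transfer_time_hours(1_048_576, 20):.1f} h")
```

At 20 KB/s, the 1 GB Oracle 11g media would have taken well over half a day, which is why fixing the adapter setting below was worth the hunt.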

Among the advanced properties of the network adapter there is one named Large Send Offload v2 (IPv4); it must be turned off (disabled). That boosted network performance to the virtual machine. A definition of what this parameter does can be found in this Microsoft Tech Note:

"Disables offloading of processor tasks to the network adapter. Offloading is designed to optimize performance of Windows 2000.

"Network Driver Interface Specification (NDIS) 5.0 lets TCP take full advantage of intelligence in network adapters by letting the adapter do some of the tasks that the processor normally performs. Offloading these tasks to the network adapter leaves the processor free for tasks that only it can perform."

A document that may help with some additional performance hints can be found here: "VMware ESX Server Performance Tuning Best Practices".

That is it for VMware; this VMware session was very instructive for me.

Friday, January 29, 2010

Performance Problems with my Workstation

My computer was deadly slow. Several processes took over its resources: the CPU was almost permanently at 100%, and several processes were reading gigabytes of data (what for?). It was a nightmare; the bigger computers grow, the slower they can become. I have a laptop I use as a server, with several VM machines running when required as test environments. I use Windows XP; yes, I personally don't like Vista, and when possible I avoid new releases, except for Windows 2003 and Windows 2008.

One of the resource-consuming processes was jqs.exe, a process that runs in the background with low priority but can lead your computer to I/O starvation. JQS stands for Java Quick Starter; it is used to speed up the launch of Java applets, but if you are not launching Java applets frequently it is not worth keeping it running. To disable the Java Quick Starter process, open the Java entry in the Windows Control Panel, which brings up the Java Control Panel; click Advanced, select Miscellaneous, and uncheck the Java Quick Starter box. You have gotten rid of the jqs.exe process. Hasta la vista, baby! ...

ccSvcHst.exe is another heavyweight process I wanted to get rid of; it creates another I/O and CPU bottleneck. I have Norton Antivirus, which was not a good choice for an antivirus: it slowed the computer down badly. It is said that this has to do with a conflict between the Microsoft update process and Norton.

I disabled the automatic Windows Update feature from the Control Panel. A red alert is displayed, which reminds me that I have to connect to the Microsoft site manually to perform the updates. Here is an article that describes what is going on with this issue: Norton vs. Microsoft. I disabled the NAV security features so that Norton does not try to start the live update on its own, and chose to launch the updates manually.
In the case of Norton Antivirus I had to turn several features down to the minimum, or even disable them, from the NAV control panel.

Another application that consumes a lot of resources, particularly during system startup, is the Weather Channel desktop. It is a very useful and nice application, particularly considering this unpredictable winter weather, but neither my machine nor my work is willing to pay the extra overhead of launching it at Windows start time, so I disabled it from starting then. I'll use the Weather Channel web page to keep up to date on the weather.

I also have the virtual machine server from VMware. It requires a lot of resources during startup, but it is one of the applications I use the most, so there are not many options but to let it be.

And finally Firefox. I love Firefox, but I don't know what happened in the latest releases, particularly 3.x: there have been several upgrades adding new features, and it is pretty nice, but it requires more resources. Either way, after getting rid of all the ballast my machine had to carry during startup and production time, there are enough free system resources for Firefox, so it is definitely off my personal deinstallation list.