Saturday, May 22, 2010

ORA-00600: [kgeade_is_0]

An ORA-00600 is something I don't like to see, particularly when I am pressed for time. And it is particularly tedious when it is displayed once for each datafile to be renamed (an example of such a rename is sketched below, after the environment list).

The Error
ORA-00600: internal error code, arguments: [kgeade_is_0], [], [], [], [], [], [], []
This error shows up under these circumstances:

The Environment
  1. RDBMS version 10.2.0.4.0
  2. Any Platform
  3. Oracle DB Restore from an RMAN backup at the Standby Site
  4. The original Database is in ASM.
  5. The target Database is in File System.
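For reference, the kind of rename that triggered the error in this scenario looks like the following sketch; the ASM and file system paths are purely hypothetical and only illustrate the ASM-to-filesystem conversion:

-- Hypothetical rename of a restored datafile from ASM to the file system;
-- on an unpatched 10.2.0.4.0 standby this is where ORA-00600 [kgeade_is_0] shows up
ALTER DATABASE RENAME FILE '+DATA/pridb/datafile/users.264.712345678'
                        TO '/u01/oradata/sbdb/users01.dbf';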

Diagnostics
While googling I found a recommendation suggesting to use either ASM or the file system, but not both. In my case this is not an option (and in my personal opinion not a good answer either): my primary site is a Real Application Cluster on Linux x64, so ASM is mandatory since we have no cluster file system on the primary, while the standby site is file system based.
According to Oracle Support Services, this is declared as a fixable bug:

Bug 7207932 OERI [kgeade_is_0] when renaming a file from ASM to a filesystem

The funny part of the diagnostics
The regular procedure to get an ORA-600 error fixed is to diagnose it in the alert.log file and then query it in the ORA-600 error lookup tool from Oracle Support Services; however, while using the lookup tool to diagnose this one, I found out that it didn't recognize the first argument as a valid argument.

The Correcting Procedure

There are basically two ways to solve this problem on the standby site: one is to apply the patch that fixes this issue, the other is to upgrade to 10.2.0.5.0 or greater, where this bug has already been fixed.
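If you take the patch or upgrade route, a quick check of the exact RDBMS version before and after is worth the few seconds it takes; this is a plain query, nothing specific to this bug:

-- Confirm the exact RDBMS version (the fix ships with 10.2.0.5.0 and later)
SELECT banner FROM v$version WHERE banner LIKE 'Oracle Database%';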

References
  • ORA-00600 [KGEADE_IS_0] When Renaming A File From ASM TO FS [ID 742289.1]
  • Bug 7207932 - OERI [kgeade_is_0] when renaming a file from ASM to a filesystem [ID 7207932.8]

Thursday, May 13, 2010

ORA-02019 : connection description for remote database not found

This error has been appearing on a random basis in the alert.log file. There is no apparent reason for it to appear, and when I take a look at the details provided in the trace file, the only thing I see is the same meaningless pattern: the connection descriptor was not found.
In this particular case the contents at the alert.log file showed this information:

Tue Mar 09 15:45:43 2010

Errors in file ...\trace\orapro_reco_4892.trc:

ORA-02019: connection description for remote database not found

From the generated trace file name it can be seen that the RECO background process is the one responsible for this trace. RECO is used in distributed transactions; when a remote database goes down, RECO tries to resend the pending transactions according to the two-phase commit procedure.
In this case there used to be a database attached to the main database, but it was gone long ago; however, some unapplied transactions remained in the transaction queue, making RECO keep on retrying forever. That database had been deconfigured and all the related files and connection descriptors removed.
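As an additional sanity check it may be worth confirming whether a stale database link still points at the long-gone database; a minimal query, assuming access to the DBA_DB_LINKS view:

-- List database links; a leftover link aimed at the removed database
-- is one reason RECO would keep retrying against it
SELECT owner, db_link, host, created
  FROM dba_db_links
 ORDER BY owner, db_link;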

By taking a look at the DBA_2PC_PENDING view, there were a couple of transactions that remained in the queue and were never applied.

SELECT LOCAL_TRAN_ID,GLOBAL_TRAN_ID,STATE,MIXED,COMMIT#
FROM DBA_2PC_PENDING;


LOCAL_TRAN_ID  GLOBAL_TRAN_ID             STATE       MIXED  COMMIT#
-------------  -------------------------  ----------  -----  ----------
9.21.102540    ORCL.ee36125b.9.21.102540  collecting  no     9462810864
8.32.86528     ORCL.ee36125b.8.32.86528   collecting  no     9462814781


In this case, since there is no target database to apply the transactions to, it is enough to get rid of them by means of the DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY procedure:

execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('8.32.86528');
execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('9.21.102540');
commit;


That's it; this way the ORA-02019 error is gone.


Wednesday, May 12, 2010

Data Guard ORA-16789 Health Check Warning

After setting up the physical Data Guard environment some errors still showed up. Even though everything looked fine and the standby database was properly receiving and applying the archived logs, the Data Guard broker still showed some errors:

DGMGRL> show configuration verbose;

Configuration
Name: SBDBCONFIG
Enabled: YES
Protection Mode: MaxPerformance
Databases:
PRIDB - Primary database
SBDB - Physical standby database

Fast-Start Failover: DISABLED

Current status for "SBDBCONFIG":
Warning: ORA-16608: one or more databases have warnings

The Data Guard broker uses an all-or-nothing evaluation criterion: either the configuration works fine or it doesn't, with no intermediate point, so sometimes you have to be patient to track down the actual errors. This is an 11g release, and though the solution found can work on 10g too, be aware that with the DIAGNOSTIC_DEST parameter in 11g the path to the broker log file has changed: ORACLE_BASE/diag/rdbms/ORACLE_SID/ORACLE_SID/trace/drcORACLE_SID.log.
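A quick way to locate that trace directory without assembling the path by hand is to query V$DIAG_INFO (available from 11g onwards); this is just a convenience query, not something the broker requires:

-- Show where the drc<ORACLE_SID>.log and other trace files live
SELECT name, value
  FROM v$diag_info
 WHERE name IN ('Diag Trace', 'Default Trace File');

The errors displayed in the broker log were as follows: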


Error Displayed

RSM Error: LogXptMode value 'ASYNC' of requires this database to have standby redo logs, but they are not configured.
RSM0: HEALTH CHECK WARNING: ORA-16789: standby redo logs not configured
Operation CTL_GET_STATUS continuing with warning, status = ORA-16789


Solution

In this case it is pretty obvious why it is failing. There are two solutions: one is to add the standby redo logs required by the configured LogXptMode ASYNC setting, the other is to change LogXptMode to ARCH. I chose the first one:

SQL> ALTER DATABASE ADD STANDBY LOGFILE GROUP 4
2 'C:\ORACLE\APP\ORADATA\SBDB\SBREDO04.RDO' SIZE 16M;
Database altered.


The command is repeated for groups 5 and 6. The sizes and paths shown in this example are just for instructional purposes; in an actual production environment the standby redo logs should be the same size as the online redo logs.
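To get the sizing right, one option is to check the current online redo log sizes first and compare them against whatever standby redo logs already exist; a minimal query, assuming access to V$LOG and V$STANDBY_LOG:

-- Compare online and standby redo log sizes (in MB)
SELECT 'ONLINE'  AS log_type, group#, bytes/1024/1024 AS mb FROM v$log
UNION ALL
SELECT 'STANDBY' AS log_type, group#, bytes/1024/1024 AS mb FROM v$standby_log;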


A Second Error Shows Up

The second error refers to the fact that adding the standby redo log files required the recovery process to be stopped, and the Data Guard broker complained about it.

DGMGRL> show configuration verbose;

Configuration
Name: SBDBCONFIG
Enabled: YES
Protection Mode: MaxPerformance
Databases:
PRIDB - Primary database
SBDB - Physical standby database

Fast-Start Failover: DISABLED


Current status for "SBDBCONFIG":
Warning: ORA-16607: one or more databases have failed


This time the Data Guard broker log file reports the following:
DMON: HEALTH CHECK ERROR: ORA-16766: Redo Apply is stopped
Operation CTL_GET_STATUS canceled during phase 1, error = ORA-16766


In this case it was pretty obvious too: the redo apply process had been stopped in order to add the standby redo logs while the database was open with Active Data Guard.

Just restart the Recovery process and that's it.


Solution

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;
Database altered.

DGMGRL> show configuration verbose;

Configuration
Name: SBDBCONFIG
Enabled: YES
Protection Mode: MaxPerformance
Databases:
PRIDB - Primary database
SBDB - Physical standby database

Fast-Start Failover: DISABLED

Current status for "SBDBCONFIG":
SUCCESS


It works as it is supposed to.


References

Oracle® Data Guard Broker
11g Release 2 (11.2)

Part Number E10702-01

Tuesday, May 04, 2010

Error upgrading RMAN Catalog to 11.1.0.7.0

WARNING.

The following post describes a particular issue I wanted to document and share. The solution found for this issue worked for my particular environment, which does not mean it will work with yours. Dealing with internal Oracle structures without Oracle Support Services' approval and supervision will render your database unsupported and it may (and most probably will) compromise its availability.

When performing the upgrade of the recovery catalog database an error showed up. This error has to do with a known issue when upgrading the recovery catalog to 11.1.0.7.0 after applying the patchset against the database.
The upgrade procedure corrupts the DBMS_RCVCAT package, so it is enough to drop it and have the RMAN catalog owner upgrade the catalog from an RMAN prompt, as follows:

1. Connected to a SQL*Plus prompt as the recovery catalog owner issue the following commands:
sqlplus rcatOwner/rcatPassword
SQL> drop package DBMS_RCVCAT;

2. From a Recovery Manager prompt issue the following command:
rman catalog rcatOwner/rcatPassword
RMAN> UPGRADE CATALOG;
recovery catalog owner is RMAN
enter UPGRADE CATALOG command again to confirm catalog upgrade
RMAN> UPGRADE CATALOG;
recovery catalog upgraded to version 11.01.00.07
DBMS_RCVMAN package upgraded to version 11.01.00.07
DBMS_RCVCAT package upgraded to version 11.01.00.07

At this moment the procedure has successfully upgraded the repository and the DBMS_RCVCAT stored unit has been properly rebuilt.

BUT ... Yes, the big BUT. In my case a particular issue arose: instead of the successful message above, this is what showed up:

RMAN> connect catalog rman/rman
connected to recovery catalog database
recovery catalog is partially upgraded to 11.01.00.07; UPGRADE CATALOG again
RMAN> upgrade catalog;
recovery catalog owner is RMAN
enter UPGRADE CATALOG command again to confirm catalog upgrade
RMAN> upgrade catalog;
error creating upgcat_57
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-06004: ORACLE error from recovery catalog database: ORA-02293: cannot validate (RMAN.CDF_C_STATUS) - check constraint violated

I hate when RMAN complains. And it likes to do so pretty often.

RMAN.CDF_C_STATUS stands for a constraint that is added to the CDF table, the table that stores information about the registered datafile copies. The constraint is defined in the ?/rdbms/admin/recover.bsq file and literally reads as follows:
define upgcat_57
<<< alter table cdf add constraint cdf_c_status check (status in ('A','U','D','X')) >>>


Notice the clause define upgcat_57 and the error message above, error creating upgcat_57.
Originally the CDF table is created with an inline constraint that can be found in the same file:

CONSTRAINT cdf_c_status CHECK (status in ('A','U','D','X','F')).

But when the upgcat_57 step is executed it magically leaves the 'F' value out.
In my particular case the CDF table had four registered datafile copies; these copies were already obsolete, but they were still there with an 'F' flag. Since the added constraint was validated against the repository table, it failed and made the upgrade procedure abort.
The workaround found for this particular case was to update the CDF table and replace the 'F' flag in the status column with a 'U' value.
This renders those copies as unavailable. The flag values are not documented in the file, but as far as I know 'A' means available, while 'U' stands for unavailable.
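For reference, a minimal sketch of the check and the update, run as the recovery catalog owner (and ideally only with Oracle Support's blessing); the CDF table name comes straight from recover.bsq:

-- Inspect the status values currently stored for registered datafile copies
SELECT status, COUNT(*) FROM cdf GROUP BY status;

-- Workaround applied in this case: mark the offending 'F' rows as 'U' (unavailable)
-- so that the upgcat_57 constraint can be validated
UPDATE cdf SET status = 'U' WHERE status = 'F';
COMMIT;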
After manually updating the column value, the upgrade procedure was run once again from the RMAN prompt, and this time it was successful.
In my particular case these copies were taken a very long time ago and are no longer available, so there was no problem at all in taking this direction.
However, a big issue that comes to my attention is that Oracle either has a bug in this upgrade procedure or forgot to update the constraint declaration with the proper value set.


Update ...

While I was finding out whether this issue could be reproduced on 11g Release 2, I found the following lines in the recover.bsq file:

define upgcat_166
<<< alter table cdf drop constraint cdf_c_status >>>

define upgcat_167
<<< alter table cdf add constraint cdf_c_status
    CHECK (status in ('A','U','D','X','F')) >>>


So it means that someone at Oracle realized this mistake: they drop the current constraint and then add it back with all the flag values included.
I can see that if someone has not hit this bug on 11g, then most probably they will not notice it by the time they migrate to 11g Release 2.



Update April 30th, 2012.
I applied the upgrade to an 11.2.0.3.0 database and this same issue showed up, and the same workaround had to be applied.

Recovery Manager: Release 11.2.0.3.0 - Production on Tue May 1 00:20:16 2012

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to recovery catalog database
recovery catalog is partially upgraded to 11.02.00.03

RMAN> upgrade catalog;

recovery catalog owner is RMAN
enter UPGRADE CATALOG command again to confirm catalog upgrade

RMAN> upgrade catalog;

recovery catalog upgraded to version 11.02.00.03
DBMS_RCVMAN package upgraded to version 11.02.00.03
DBMS_RCVCAT package upgraded to version 11.02.00.03

Monday, May 03, 2010

ORA-42012: error occurred while completing the redefinition

An error stack showed up when trying to use the DBMS_REDEFINITION package. After issuing the command

dbms_redefinition.finish_redef_table('MADRID', 'a_table', 'an_int_table', 'a_partition');

The following error stack showed up:

begin
*
ERROR at line 1:
ORA-42012: error occurred while completing the redefinition
ORA-04020: deadlock detected while trying to lock object MADRID.TABLE1_INT
ORA-06512: at "SYS.DBMS_REDEFINITION", line 78
ORA-06512: at "SYS.DBMS_REDEFINITION", line 1680
ORA-06512: at line 2


When I was trying to remove the interim table to restart the redefinition procedure, it conflicted with a couple of interim objects created on the fly.

SQL> drop table table1_int;
drop table table1_int
*
ERROR at line 1:
ORA-12083: must use DROP MATERIALIZED VIEW to drop "MADRID"."TABLE1_INT"

SQL> DROP MATERIALIZED VIEW "MADRID"."TABLE1_INT";
Materialized view dropped.

SQL> drop table table1_int;
Table dropped.


This is due to a bug in the 11.2.0 release, present on every OS platform. It will be fixed in future releases; in the meantime the workaround is to modify an instance parameter: set DEFERRED_SEGMENT_CREATION to FALSE and retry the operation.
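A minimal sketch of that workaround, reusing the illustrative names from the failing call above:

-- Workaround: disable deferred segment creation, then retry the final step
ALTER SYSTEM SET deferred_segment_creation = FALSE;

BEGIN
  DBMS_REDEFINITION.FINISH_REDEF_TABLE(
    uname      => 'MADRID',
    orig_table => 'a_table',
    int_table  => 'an_int_table',
    part_name  => 'a_partition');
END;
/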