ERROR: unrecoverable error ORA-29701 raised in ASM I/O path when using Golden Gate replication process

February 19, 2011 · 5 Comments

Yesterday my DBA colleagues had a very challenging day, and I was only a “remote observer” since I couldn’t be on site. Still, I decided to blog about the situation because we couldn’t find any similar information on either Google or Metalink, so you might find it useful. Thanks to my colleague and friend Baki Sahin from Turkey, who had a great hunch and found a workaround.

We have a system running on HP-UX 11i v3 (11.31) with the latest patch set and Oracle 11g R2 (11.2.0.2) running as a single instance using ASM. Everything had been installed and configured almost two months before the problem occurred (a separate grid user for CSS, a separate oracle user for the Oracle binaries, additional OS groups, etc.). The OS had also been inspected, and the kernel parameters were not even 10% utilized. UAT and performance tests were ongoing, along with various administrative tasks on the database and testing of new 11g R2 features, and everything was fine.

However, when we started the Golden Gate replication process, the alert log of the database instance began to show the following error: ERROR: unrecoverable error ORA-29701 raised in ASM I/O path, and Golden Gate processes started to be terminated. On top of that, creating new sessions from other modules was very slow, if not impossible, and after a while the instance hung.

ERROR: unrecoverable error ORA-29701 raised in ASM I/O path; terminating process 6507
Fri Feb 18 13:51:37 2011
Thread 1 cannot allocate new log, sequence 4824
Private strand flush not complete
  Current log# 7 seq# 4823 mem# 0: +REDO00/redo10a.rdo
  Current log# 7 seq# 4823 mem# 1: +REDO01/redo10b.rdo
Thread 1 advanced to log sequence 4824 (LGWR switch)
  Current log# 8 seq# 4824 mem# 0: +REDO00/redo08a.rdo
  Current log# 8 seq# 4824 mem# 1: +REDO01/redo08b.rdo
Fri Feb 18 13:52:34 2011
ERROR: unrecoverable error ORA-29701 raised in ASM I/O path; terminating process 6509
Fri Feb 18 13:53:37 2011
ERROR: unrecoverable error ORA-29701 raised in ASM I/O path; terminating process 6506
Fri Feb 18 13:54:28 2011
Thread 1 cannot allocate new log, sequence 4825
Private strand flush not complete
  Current log# 8 seq# 4824 mem# 0: +REDO00/redo08a.rdo
  Current log# 8 seq# 4824 mem# 1: +REDO01/redo08b.rdo
Thread 1 advanced to log sequence 4825 (LGWR switch)
  Current log# 1 seq# 4825 mem# 0: +REDO00/redo01a.rdo
  Current log# 1 seq# 4825 mem# 1: +REDO01/redo01b.rdo
Fri Feb 18 13:54:40 2011
ERROR: unrecoverable error ORA-29701 raised in ASM I/O path; terminating process 26844
Fri Feb 18 13:55:43 2011
ERROR: unrecoverable error ORA-29701 raised in ASM I/O path; terminating process 6508
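To see at a glance which processes were killed and when, an excerpt like the one above can be summarized with a few lines of Python. This is only a sketch: the function name terminated_processes is my own invention, and it assumes that the timestamp sits on the line before each error, as it does in this excerpt.

```python
import re

# Matches the error line exactly as it appears in the alert log excerpt.
ERROR_RE = re.compile(
    r"ERROR: unrecoverable error ORA-29701 raised in ASM I/O path; "
    r"terminating process (\d+)"
)
# Matches timestamp lines such as "Fri Feb 18 13:51:37 2011".
STAMP_RE = re.compile(
    r"^[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}$"
)


def terminated_processes(lines):
    """Return (timestamp, pid) pairs for every ORA-29701 termination.

    The timestamp is the most recent timestamp line seen before the error,
    or None if the excerpt starts with an error line.
    """
    last_stamp = None
    hits = []
    for raw in lines:
        line = raw.strip()
        if STAMP_RE.match(line):
            last_stamp = line
            continue
        match = ERROR_RE.search(line)
        if match:
            hits.append((last_stamp, int(match.group(1))))
    return hits


# A shortened copy of the excerpt above, used only for illustration.
sample_log = """\
ERROR: unrecoverable error ORA-29701 raised in ASM I/O path; terminating process 6507
Fri Feb 18 13:51:37 2011
Thread 1 cannot allocate new log, sequence 4824
Fri Feb 18 13:52:34 2011
ERROR: unrecoverable error ORA-29701 raised in ASM I/O path; terminating process 6509
"""

hits = terminated_processes(sample_log.splitlines())
```

For a real alert log, the same function can be fed an open file object instead of the sample string.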

OK, we started the witch hunt by checking the permissions of the underlying ASM disks and the user and group permissions, because in most cases Oracle complains with ORA-29701 when communication with the cluster manager cannot be established. That was not the case here: CSS was up and running, and the alert log belonging to CSS was clean. Even crs_stat showed that all services were up and running (except ONS, which we don’t use).

[grid@somehost:+ASM]/home/grid$ crs_stat -t

Name           Type           Target    State     Host
------------------------------------------------------------
ora....TA00.dg ora....up.type ONLINE    ONLINE    somehost
ora....DO00.dg ora....up.type ONLINE    ONLINE    somehost
ora....DO01.dg ora....up.type ONLINE    ONLINE    somehost
ora....PR.lsnr ora....er.type ONLINE    ONLINE    somehost
ora....SM.lsnr ora....er.type ONLINE    ONLINE    somehost
ora.asm        ora.asm.type   ONLINE    ONLINE    somehost
ora.cssd       ora.cssd.type  ONLINE    ONLINE    somehost
ora.diskmon    ora....on.type ONLINE    ONLINE    somehost
ora.evmd       ora.evm.type   ONLINE    ONLINE    somehost
ora.ons        ora.ons.type   OFFLINE   OFFLINE
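If you want to check such a listing automatically rather than by eye, something like the following could do it. It is a minimal sketch that assumes the four-column layout (Name, Type, Target, State) shown above; the function name not_online is hypothetical.

```python
def not_online(crs_stat_output):
    """Return (name, state) for every resource whose State is not ONLINE."""
    problems = []
    for line in crs_stat_output.splitlines():
        fields = line.split()
        # Resource rows start with "ora."; this skips the header row,
        # the dashed rule, the shell prompt and blank lines.
        if len(fields) < 4 or not fields[0].startswith("ora."):
            continue
        name, _type, _target, state = fields[:4]
        if state != "ONLINE":
            problems.append((name, state))
    return problems


# A shortened copy of the listing above, used only for illustration.
sample_output = """\
Name           Type           Target    State     Host
------------------------------------------------------------
ora.asm        ora.asm.type   ONLINE    ONLINE    somehost
ora.cssd       ora.cssd.type  ONLINE    ONLINE    somehost
ora.ons        ora.ons.type   OFFLINE   OFFLINE
"""

offline = not_online(sample_output)
```

In our case the only entry returned would be ora.ons, which matches what we saw in the listing.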

Running out of ideas, we completely shut down the database and CSS and rebooted the machine. After that we started everything up and played around with switching the redo logs, creating very big tables and running parallel queries against huge tables, and everything worked normally. But as soon as we started Golden Gate replication, we got the same error again: ERROR: unrecoverable error ORA-29701 raised in ASM I/O path.

We found two very useful blog posts from Pythian Group (Don Seiler) and Surachart Opun about hidden Oracle directories containing files related to CSS (I am still puzzled why Oracle did that). Of course we tried all of that, but we were back at square one.

/var/tmp/.oracle hidden directory
Beware the /var/tmp/.oracle Hidden Directory!

Workaround

When Baki disabled auditing (AUDIT_TRAIL = none), the Golden Gate replication process worked perfectly and ORA-29701 disappeared from the alert log. Everybody is very busy with other tasks, so our troubleshooting window is severely limited. But once we find more time, we will gather more information and try to open an SR with Oracle.

About Nakinov
http://de.linkedin.com/in/nakinov

Comments

5 Responses to “ERROR: unrecoverable error ORA-29701 raised in ASM I/O path when using Golden Gate replication process”
  1. LisaG says:

    We had the exact same problem. Oracle has filed a bug on our behalf (Bug 12576651: ORA-29701 FROM 11.2 DATABASE FOR NON-DBA ACCOUNTS), but based on the strace we found a workaround for our situation.

    The issue was that the user who received the error didn’t have osdba (and we couldn’t grant it based on ic controls).

    The strace showed errors opening a file in $GRID_HOME/auth/css/

    We changed the permissions as follows (note: I am not using my real users, for security reasons, and your environment may vary).

    From :
    root:osdba 750
    /auth grid_owner:osdba 770
    /auth/css grid_owner:osdba 770
    /auth/css/ grid_owner:osdba 770

    To:
    root:osdba 755
    /auth grid_owner:osdba 755
    /auth/css grid_owner:osdba 755
    /auth/css/ grid_owner:osdba 775

  2. LisaG says:

    We had the same problem. The issue looks like it happens only for a user without osdba. Oracle opened a bug, but based on the strace the issue was writing to $GRID_HOME/auth/css/. We changed the permissions of $GRID_HOME to 755, $GRID_HOME/auth, $GRID_HOME/auth/css, $GRID_HOME/auth/css/ all to 775 and it worked for our test case.

  3. MarkK says:

    man, ran into this today. and this helped us sort out our issue. thanks.

    but for us, it was even more mundane, it was connecting to the database through sqlplus, and running an OWB block to run an ETL job. some jobs worked, some failed.

    and the permissions “fix” solved the problem. for us, we’re running 11.2.0.3.4. and on the directories noted above, i noticed the following:
    drwxr-x--- 71 root oinstall 4096 Feb 5 13:21 . <----- ????!!!!!!!
    drwxrwx--- 6 grid oinstall 4096 Feb 5 10:23 auth

    drwxrwx--- 6 grid oinstall 4096 Feb 5 10:23 .
    drwxr-x--- 71 root oinstall 4096 Feb 5 13:21 ..
    drwxrwx--- 3 grid oinstall 4096 Feb 5 10:23 crs
    drwxrwx--- 3 grid oinstall 4096 Feb 5 10:23 css
    drwxrwx--- 3 grid oinstall 4096 Feb 5 10:23 evm
    drwxrwx--- 3 grid oinstall 4096 Feb 5 10:23 ohasd

    and that's based on a "clean" install. so the root scripts that help complete the setup are knackered again, it seems. very frustrating. after resetting the permissions, the ETL jobs succeeded, and the error hasn't happened since (touch wood).
