oraenv on Steroids part 2

In oraenv on Steroids Part 1, we demonstrated a way to extract the correct values for $ORACLE_SID and $ORACLE_HOME.  Now we need to put it into a script and use it.  Let’s just take the commands and put them in a script “oenv” that we will put in our path.  We will also add $ORACLE_HOME/bin to our path:

#!/bin/bash
export ORACLE_SID=`grep ^$1 /etc/oratab | cut -f1 -d":"`
export ORACLE_HOME=`grep ^$1 /etc/oratab | cut -f2 -d":"`
export PATH=/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:$HOME/bin:$ORACLE_HOME/bin
echo Oracle Sid: $ORACLE_SID
echo Oracle Home: $ORACLE_HOME

Running that bad boy:

Sweet!  But when  we try to use it, it doesn’t seem to work. Even though it seems to be setting up our variables correctly, nothing works.  Why?

The reason this doesn’t work is that Unix shell scripts spawn a new shell, and leave the environment unmodified in the shell that started it.  In order to include the  variables in the current environment, you have to use the source or “.” when invoking the script, i.e:

Works much better now.  But remembering to type “source oenv” or “. orenv” is a pain.  Why doesn’t work like DOS batch files or SQL*Plus scripts?  Actually there is a really good reason?  Have you ever had a SQL*Plus script not work because a previous script changed something? Having a environment scope is really nice.

Getting around this annoyance is really easy.  All you have to do is add an alias to your .bashrc:

alias oraenv='source oenv'

Logout, and log back in.  You can just run “oraenv” from the command prompt and it will work as expected:

That’s it.  We now have a script which correctly configures the environment for almost any Oracle installation.  In our next hopefully exciting blog, we’ll pimp this scrypt out and add some more.

oraenv on Steroids Part 1

Most DBA’s have encountered the oraenv script, the one that is generated by root.sh:

This script is just as useless as it was in 1993.  If you just have to enter the $ORACLE_SID and $ORACLE_HOME, why do you even need this script?  In 25 years of Oracle DBA experience I have never once used it.  It was bad enough then, but now?  The environment variables like $ORACLE_HOME and $ORACLE_SID are a management nightmare.   On a modern RAC system you have at least 2 Oracle homes (the grid infrastructure and the RDBMS).  And the ORACLE_SID is not just the database name, but also the node number.  So you literally need a different value for every node in your cluster.

But there is a solution to all of this that is already on your servers:  /etc/oratab.  You can use this file to configure your environment correctly for every database server you have whether it is RAC or not just by using it.  Consider a typical /etc/oratab from a RAC cluster:

$ cat /etc/oratab
#
# This file is used by ORACLE utilities.  It is created by root.sh
# and updated by either Database Configuration Assistant while creating
# a database or ASM Configuration Assistant while creating ASM instance.

# A colon, ':', is used as the field terminator.  A new line terminates
# the entry.  Lines beginning with a pound sign, '#', are comments.
#
# Entries are of the form:
#   $ORACLE_SID:$ORACLE_HOME:<N|Y>:
#
# The first and second fields are the system identifier and home
# directory of the database respectively.  The third field indicates
# to the dbstart utility that the database should , "Y", or should not,
# "N", be brought up at system boot time.
#
# Multiple entries with the same $ORACLE_SID are not allowed.
#
#
dbistdby:/u01/app/oracle/product/12.1.0/db:N      # line added by Agent
+ASM1:/u01/app/12.1.0/grid:N                      # line added by Agent
dbinternals:/u01/app/oracle/product/12.1.0/db:N   # line added by Agent

From this file, we can see there are 2 databases (dbinternals and dbistdby), as well as a grid infrastructure on this server.   We can easily extract the ORACLE_HOME and ORACLE_SID from it using just the database name, i.e:

$ grep dbinternals /etc/oratab
dbinternals:/u01/app/oracle/product/12.1.0/db:N   # line added by Agent
$ export ORACLE_SID=`grep ^dbinternals /etc/oratab | cut -f1 -d":"`
$ export ORACLE_HOME=`grep ^dbinternals /etc/oratab | cut -f2 -d":"`
$ echo $ORACLE_HOME
/u01/app/oracle/product/12.1.0/db
[oracle@oracle-linux ~]$ echo $ORACLE_SID
dbinternals

But there is a problem.  In a RAC environment, the $ORACLE_SID isn’t just the DB name.  It also has the node number in it too.  In our example, the actual ORACLE_SID used is dbinternals1.  Fortunately, that is easy to extract from /etc/oratab too.  You just get it from the +ASM entry, i.e:

$ export NODE=`grep ^+ASM /etc/oratab | cut -c5`
$ echo $NODE
1

So you can combine these two together to get the actual ORACLE_SID, i.e:

$ export ORACLE_SID=${ORACLE_SID}${NODE}
$ echo $ORACLE_SID
dbinternals1

So now we have the correct ORACLE_SID and ORACLE_HOME based on the values in /etc/oratab, and we have the basis for a script we can deploy throughout our enterprise without modifications.  In part 2, we’ll make this script.

 

 

 

When Others Then John Doe; —

A few weeks ago, while doing PL/SQL code review, I came across this gem:Screenshot-john-doe.sql (-tmp) - gedit

Of course upon seeing that, my head exploded.  Then it started a chain reaction in our office with everyone’s head exploding in sequence, like in the movies where our hero sets off a sequence of claymore mines.  Some things cannot be unseen 🙂

While this is an extreme example, I see plenty of other examples of WHEN OTHERS abuse.  This one would return “John Doe” if you ran out of temp space, had a constraint violation, too many rows, no data found, and practically any other condition.  I pretty sure that is not the intended design of the application.

Consider this misuse of WHEN OTHERS that I see a lot:

WHEN OTHERS THEN
ROLLBACK;

The developer is obviously trying to ensure that all incomplete transactions are rolled back on an error, but Oracle does that anyway.  So this is unnecessary, but if there is an error, it appears to work correctly while not actually doing anything.  This can be catastrophic.  Consider the following trivial example, where we want to backup the EMP table and keep backups for 30 days:

Screenshot-backup_emp.sql (~) - gedit-2

So we set this up and run it for a year as a scheduled job.  EMP has 896 rows in it, and the backup runs perfectly each time:

SQL> select count(*) from emp;

  COUNT(*)
----------
       896

Since I don’t really have a year to wait for the result, I’ll just run the backup job in a loop and display the counts in the backup_emp table afterwards. It should increase over time, then after 30 days remain about the same right? This script just runs it several times:

set serveroutput on
  2  declare
  3    v_count number;
  4  begin
  5  for i in 1..11 loop
  6    run_backup_emp;
  7    select count(*) into v_count from backup_emp;
  8    dbms_output.put_line('Backup run ' || i || ' complete, backup_emp count is ' || v_count);
  9  end loop;
 10* end;

So lets run it:

Backup run 1 complete, backup_emp count is 896
Backup run 2 complete, backup_emp count is 1792
Backup run 3 complete, backup_emp count is 2688
Backup run 4 complete, backup_emp count is 3584
Backup run 5 complete, backup_emp count is 4480
Backup run 6 complete, backup_emp count is 5376
Backup run 7 complete, backup_emp count is 6272
Backup run 8 complete, backup_emp count is 7168
Backup run 9 complete, backup_emp count is 8064
Backup run 10 complete, backup_emp count is 8960
Backup run 11 complete, backup_emp count is 0

PL/SQL procedure successfully completed.

WTF? What deleted all of the data in my backup_emp table? Of course since Murphy is an optimist, someone runs an “upgrade” script which fails but clobbers the EMP table before it fails, and here we are:

SQL> select count(*) from backup_emp;

  COUNT(*)
----------
	 0

1 row selected.

After a week of downtime, during the blamestorming session, everyone still asks, “what happened to all backup data? Have we been hacked?”

Our DBA takes out the WHEN OTHERS THEN ROLLBACK; and re-runs the job:

SQL> @p1
Backup run 1 complete, backup_emp count is 896
Backup run 2 complete, backup_emp count is 1792
Backup run 3 complete, backup_emp count is 2688
Backup run 4 complete, backup_emp count is 3584
Backup run 5 complete, backup_emp count is 4480
Backup run 6 complete, backup_emp count is 5376
Backup run 7 complete, backup_emp count is 6272
Backup run 8 complete, backup_emp count is 7168
Backup run 9 complete, backup_emp count is 8064
Backup run 10 complete, backup_emp count is 8960
declare
*
ERROR at line 1:
ORA-01653: unable to extend table C##SCOTT.BACKUP_EMP by 8 in tablespace DEMO
ORA-06512: at "C##SCOTT.RUN_BACKUP_EMP", line 3
ORA-06512: at line 5

So our backup failed silently for a year because we had shitty error handling and ran out of space. We didn’t really need those backups, did we?

Why Every DBA Should Learn Slackware – Part 2

We just finished installing Slackware and booted it up.  I feel like I am playing Zork here:

Slackware-firstboot

So where is my GUI?  What do I do now?  Well, Slackware comes with a pretty nice KDE desktop.  It just doesn’t start by default.  Probably a good thing if you are logged in as root.

Anyway, I created a low privileged oracle account with useradd, then logged into it and started the GUI:

$ startx

That gives us a nice GUI with a browser.  Slackware comes with a pretty current 2.4.x version of apache, but we want to compile it from source.  Besides being the most up-to-date, we will get all of our apache installation in one convenient location, /usr/local/apache2.  This will be important later.  We also need modowa, the Apache PL/SQL interface, so we will download it too:

Slackware-modowa

 

Anyway, after downloading Apache, ./configure, make, and sudo make install.  What could be easier?  Worked perfectly on Slackware, but when trying on OEL got a lot of dependencies missing.  Anyway, in order to install mod_owa, I need an Oracle client first.

Oracle only supports a few Linux distros like RHEL, OEL, and a few others.  So the OUI probably won’t work out of the box.  Oracle likes to hard-code paths to utilities such as make, gcc, etc instead of just using what is in the path.

So the simplest solution is to create a tarball of an already patched client installation from a supported distro, then extract it on Slackware, set your $ORACLE_HOME and $PATH and you are good to go:

 

Slackware-oracle-client

I know that this is an unsupported platform and cloning an ORACLE_HOME by extracting a tarball is not supported by Oracle, but it works just fine.  What we are dong in the next hopefully exciting instalment of this blog will be a lot less supported, but very worthwhile.  So lets install MOD_OWA now.  M

/home/oracle/modowa/apache24$ sudo cp mod_owa.so /usr/local/apache2/modules/
/home/oracle/modowa/apache24$ sudo cp mod_owa.conf /usr/local/apache2/conf/
/usr/local/apache2/bin$ ./httpd -k start
httpd: Syntax error on line 504 of /usr/local/apache2/conf/httpd.conf: Syntax error on line 1 of /usr/local/apache2/conf/modowa.conf: Cannot load modules/mod_owa.so into server: libclntsh.so.11.1: cannot open shared object file: No such file or directory

So that didn’t work.  We’ll have to set LD_LIBRARY_PATH to $ORACLE_HOME/lib.  Well do that in the .bash so it is always set:

/usr/local/apache2/bin$ export LD_LIBRARY_PATH=$ORACLE_HOME/lib
/usr/local/apache2/bin$ ./httpd -k start
/usr/local/apache2/bin$ tail -f ../logs/error_log 
[Thu Oct 22 17:46:18.315078 2015] [mpm_worker:notice] [pid 24375:tid 139988694828928] AH00292: Apache/2.4.17 (Unix) mod_owa 2.10.10 configured -- resuming normal operations
[Thu Oct 22 17:46:18.316160 2015] [core:notice] [pid 24375:tid 139988694828928] AH00094: Command line: './httpd'

In Apache, startup messages are in the error_log. From our error_log we can see that mod_owa was loaded when Apache was started,
so we are ready to begin setting up our PL/SQL website, which will be covered in the next instalment of this blog.

Why every DBA should learn Slackware – Part 1

There is an old saying:  “”If you want to learn Debian, install Debian. If you want to learn Red Hat, install Fedora. If you want to learn Linux, install Slackware.” There is a truth to that statement.  It is one of the oldest distros, and I first saw it in about 1994 when it came on floppy disks.  A DBA for Oracle on AIX, I thought “Cool!  Unix on the PC.   I wonder if Oracle will run on it?”

Of course it wouldn’t.  Oracle didn’t run on any Linux at that time.  I was one of the original developers who downloaded Oracle 8.0.5 for Linux which installed perfectly.  Back then Oracle’s “installer” was just a tarball of an Oracle home.  But it worked.  If only the OUI worked so well now. 🙂

Back to Slackware.  We downloaded the ISO file from slackware.com and configured a Virtualbox VM.  Lets mount the ISO and boot into the installer:

Slackware-install

Wait, what is this?  No pretty splash screen?  No GUI wizards?   1993 called and they want their CRTs back.  But if we read the screen, it tells exactly what to do:

  1. You can run memtest to test the RAM on your machine.  Given that I have had databases corrupted by bad memory, running memtest on a new server is probably a good idea.
  2. There is a bunch of kernel options that you can specify but I have no clue about those, so I will take the third option and…
  3. Just hit ENTER to take the default

Now we should see the actual installer, right?

Slackware-installer

Well it seems that we still have to read the screen before proceeding.  It seems that we have to partition our disks before actually installing.  Most other distros have a default partitioning scheme which may or may not meet your needs.  Slackware has none of that.;  Like the previous screen it does have handy instructions on how to setup your system.  Enter “root” to “login to the installer:

Wait?  You expected something else besides a command prompt?  Just keep reading the screen.  It tells you what to do.  Partition your disk using fdisk.  The nice thing is, it allows you to customize your disk layout before actually installing anything.  Have you ever tried to create a separate home partition in Ubuntu?

Besides, you are going to need to know fdisk if you install Oracle.  Might as well learn it here

Anyway, I created 4 partitions, on 10g for root, one 10g for /home, a 5g swap and a 5g for /var.  My VM is tiny and I won’t be running a database here.  If I were setting up partitions for an Oracle server, I would setup large partitions for for /u01, /var, etc.

finally, run setup.  If you messed up the partitions, exit and go back to fdisk to fix them.  You will need to make sure one of your partitions is Linux swap (type 82).  Now you get to see your fancy GUI installer:

slackware-setup

1993 called and they wanted their NCURSES back.  But it works fine.  It is just as easy to install as any other distro and definitely easier to install than Windows (TM).

Well start setting up an Apache webserver to access an Oracle database in the next hopefully exciting instalment.

Securing your web browsing in Linux

Many years ago, when I installed Red Hat 7 as my primary Internet connected workstation, it was like the surfing the web in 1995.  Before Firefox, we had Mozilla 1.0 and it was a pleasure to use.  It was the first browser with a pop-up blocker and I could count on Linux’s Unix based security model to keep a website from pwning my whole machine.  Life was good back then.  Now with Linux used in smart phones and more popular on the desktop, the crappyness  of the Internet has come here.

If you go to those black hat conferences, you will find that nobody tries to pwn a Linux machine the way they do a Windows one.  Instead, the trick is pwn userland.  It is almost as good as the whole machine.  An attacker can obtain sensitive documents, keepassx databases, tax returns, photos, pretty much everything you have with just access to your $HOME directory.

So how do you prevent it?  With Linux and most other Unix based desktops it is possible to run the as a different user in the desktop.  All you have to do is:

  1. Create a low privileged user such as “nobody” or “interwebs”.  Add your main user to it’s private group
  2. Install sshd if it is not already installed
  3. Set up password user/host equivalence between your main account and the low privileged one.
  4. Set up a command to execute the browser over SSH the same way you might use xming on Windows to run the OUI, and put a short cut on your desktop with a different icon.
  5. Use the “trusted” browser to access your bank, email, paypal and such.  Use your new “untrusted internet” browser to surf the rest of the interwebs.

For example:

Creating the low privileged account:

Screenshot-User Manager

Set up passwordless ssh user/host equivalence:

[oracle@oracle-linux ~]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/oracle/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/oracle/.ssh/id_rsa.
Your public key has been saved in /home/oracle/.ssh/id_rsa.pub.
The key fingerprint is:
73:06:f4:aa:fd:b2:e0:fa:86:69:08:09:ae:75:ac:0c oracle@oracle-linux.localdomain
The key's randomart image is:
+--[ RSA 2048]----+
|        .        |
|       . .       |
|        . .      |
|.        o       |
|o. .    S o      |
|E.. o  o +       |
|.= + oo .        |
|. + +.....       |
|   ..+o .o.      |
+-----------------+

[oracle@oracle-linux ~]$ ssh-copy-id interwebs@127.0.0.1
interwebs@127.0.0.1's password: 
Now try logging into the machine, with "ssh 'interwebs@127.0.0.1'", and check in:

  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

[oracle@oracle-linux ~]$ ssh interwebs@127.0.0.1 date
Fri May 29 21:24:41 EDT 2015
[oracle@oracle-linux ~]$ 

So try running the browser through ssh and xwindows:

[oracle@oracle-linux ~]$ ssh -Y interwebs@127.0.0.1 firefox &
[1] 2304
[oracle@oracle-linux ~]$ /usr/bin/xauth:  creating new authority file /home/interwebs/.Xauthority

Now that the broswer has started, you will notice it doesn’t have any of your bookmarks, browser customizations, extenstions, nor anything else assoicated with your main account. You can further verify it is using the low privileged account by accessing the file menu and seeing what home directory you are in:

Screenshot

There you are!  Create a shortcut on on the desktop, preferably with a spammy looking icon so you don’t accidentally use it to login to your bank.

 

The Importance of DBA_ROLLBACK_SEGS – Recovery

OK, I have successfully opened my database.  I am in a lot better position than I was I was a few days ago.  But what about my priceless Dilbert collection?  It is not like I could just go to dilbert.com and download them again, could I?

SQL> select count(*) from dilbert;

ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 12 cannot be read at this time
ORA-01110: data file 12: '+DATA/DBINTERNALS/DATAFILE/undotbs1.260.875748227':

Anyway, after opening my database, I tried to see what was in my Dilbert table:

Checking DBA_ROLLBACK_SEGS. I see some entries that have a status of “needs recovery” from my missing undo tablespace:

SQL> select segment_name, status from dba_rollback_segs;
SEGMENT_NAME                   STATUS
------------------------------ ----------------
SYSTEM                         ONLINE
_SYSSMU28_3586889292$          ONLINE
_SYSSMU27_3341047585$          ONLINE
_SYSSMU26_1539987469$          ONLINE
_SYSSMU36_168785881$           OFFLINE
_SYSSMU35_1475960130$          OFFLINE
_SYSSMU34_908380251$           OFFLINE
_SYSSMU33_2122985546$          OFFLINE
_SYSSMU32_228305016$           OFFLINE
_SYSSMU31_2744769455$          OFFLINE
_SYSSMU30_2141723264$          OFFLINE
_SYSSMU29_2359381190$          OFFLINE
_SYSSMU10_732010430$           NEEDS RECOVERY
_SYSSMU9_624586700$            NEEDS RECOVERY
_SYSSMU8_1927760754$           NEEDS RECOVERY
_SYSSMU7_3527658720$           NEEDS RECOVERY
_SYSSMU6_3377313595$           NEEDS RECOVERY
_SYSSMU5_1147562966$           NEEDS RECOVERY

Dropping the tablespace but got:

ORA-01548: active rollback segment

I have done this before. Not sure why it didn’t work, but nothing else seems to be working on this blog, including WordPress :). Anyway, I tried setting the parameter again and restarting the database cleanly. The entries are still there in DBA_ROLLBACK_SEGS but I can drop the tablespace now:

SQL> alter system set "_corrupted_rollback_segments"=
    1 '_SYSSMU10_732010430$',
    2 '_SYSSMU9_624586700$',
    3 '_SYSSMU8_1927760754$',
    4 '_SYSSMU7_3527658720$',
    5 '_SYSSMU6_3377313595$',
    6 '_SYSSMU5_1147562966$',
    7 scope=spfile
SQL> /

System altered.

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area 2499805184 bytes
Fixed Size 2927480 bytes
Variable Size 671089800 bytes
Database Buffers 1811939328 bytes
Redo Buffers 13848576 bytes
Database mounted.
Database opened.

SQL> select segment_name, status from dba_rollback_segs;
SEGMENT_NAME                   STATUS
------------------------------ ----------------
SYSTEM                         ONLINE
_SYSSMU28_3586889292$          ONLINE
_SYSSMU27_3341047585$          ONLINE
_SYSSMU26_1539987469$          ONLINE
_SYSSMU36_168785881$           OFFLINE
_SYSSMU35_1475960130$          OFFLINE
_SYSSMU34_908380251$           OFFLINE
_SYSSMU33_2122985546$          OFFLINE
_SYSSMU32_228305016$           OFFLINE
_SYSSMU31_2744769455$          OFFLINE
_SYSSMU30_2141723264$          OFFLINE
_SYSSMU29_2359381190$          OFFLINE
_SYSSMU10_732010430$           NEEDS RECOVERY
_SYSSMU9_624586700$            NEEDS RECOVERY
_SYSSMU8_1927760754$           NEEDS RECOVERY
_SYSSMU7_3527658720$           NEEDS RECOVERY
_SYSSMU6_3377313595$           NEEDS RECOVERY
_SYSSMU5_1147562966$           NEEDS RECOVERY

SQL> drop tablespace undotbs1;

Tablespace dropped.

SQL> select segment_name, status, tablespace_name from dba_rollback_segs;

SEGMENT_NAME                   STATUS           TABLESPACE_NAME
------------------------------ ---------------- ------------------------------
SYSTEM                         ONLINE           SYSTEM
_SYSSMU28_3586889292$          ONLINE           UNDOTBS01
_SYSSMU27_3341047585$          ONLINE           UNDOTBS01
_SYSSMU26_1539987469$          ONLINE           UNDOTBS01
_SYSSMU25_4262984436$          ONLINE           UNDOTBS01
_SYSSMU24_2903816470$          ONLINE           UNDOTBS01
_SYSSMU23_4129801038$          ONLINE           UNDOTBS01
_SYSSMU22_1326017728$          ONLINE           UNDOTBS01
_SYSSMU21_3269488760$          ONLINE           UNDOTBS01
_SYSSMU20_3537793676$          ONLINE           UNDOTBS01
_SYSSMU19_3320221023$          ONLINE           UNDOTBS01

11 rows selected.

SQL>

So I have successfully dropped the undo tablespace.  What about my data?  I try querying the table again to see if I can access it:

SQL> select count(*) from dilbert;

  COUNT(*)
----------
       640

If you go back to my previous blog, you will see that 640 is the number of rows committed before the database crashed.  Apparently, I have my data,  But is it really all there with no corruptions?  I wrote a little PL/SQL program to extract them to a /tmp directory and extracted them:

Screenshot-dilbert

 

My beloved Dilbert strips are all there!  I have all of my data even though though I lost my undo tablespace without proper backups.  Should this be considered as part of any planned backup strategy?  Other than putting a query of DBA_ROLLBACK_SEGS in your backup script, No.  But sometimes, even with the best backup strategy, you can find yourself without backups you need at the most inopportune time.  I have seen backup products that skip datafiles due to a failing hard drive, then reported the backup as successful.  I have seen a database silently write corrupt archivelogs, rendering otherwise perfectly good RMAN backups useless.  When things start to go wrong, they often tend to affect your backup systems as well.  This recovery would not have been possible without the initial query of DBA_ROLLBACK_SEGS in the backup.  I would have not known what the segment names were, and would have not been able to open the database in the first place.

The Importance of DBA_ROLLBACK_SEGS – 3 of 4

 

 

Does anyone notice a pattern here with the post titles?   Anyway, I was loading my priceless collection of Dilbert strips into my database when it crashed and burned:

sqlldr

Trying to restart the database, my undo tablespace is gone:

SQL> startup
ORACLE instance started.

Total System Global Area 2499805184 bytes
Fixed Size 2927480 bytes
Variable Size 671089800 bytes
Database Buffers 1811939328 bytes
Redo Buffers 13848576 bytes
Database mounted.
ORA-01157: cannot identify/lock data file 12 - see DBWR trace file
ORA-01110: data file 12: '+DATA/DBINTERNALS/DATAFILE/undotbs1.260.875748227'

SQL> alter database datafile 12 offline;

Database altered.

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 12 cannot be read at this time
ORA-01110: data file 12: '+DATA/DBINTERNALS/DATAFILE/undotbs1.260.875748227'
Process ID: 23713
Session ID: 237 Serial number: 25274

So we are hosed. We will copy and paste the segment names from that DBA_ROLLBACK_SEGS into the _CORRUPTED_ROLLBACK_SEGS parameter and try to start it again:

SQL> alter system set _corrupted_rollback_segments='_SYSSMU36_168785881$',
 '_SYSSMU35_1475960130$', '_SYSSMU34_908380251$', '_SYSSMU33_2122985546$', 
'_SYSSMU32_228305016$', '_SYSSMU31_2744769455$', '_SYSSMU30_2141723264$', 
'_SYSSMU29_2359381190$', '_SYSSMU10_732010430$', '_SYSSMU9_624586700$', 
'_SYSSMU8_1927760754$', '_SYSSMU7_3527658720$', '_SYSSMU6_3377313595$', 
'_SYSSMU5_1147562966$', '_SYSSMU4_1750859549$', '_SYSSMU3_1431909199$', 
'_SYSSMU2_911282856$', '_SYSSMU1_4079383768$';
alter system set _corrupted_rollback_segments='_SYSSMU36_168785881$', 
'_SYSSMU35_1475960130$', '_SYSSMU34_908380251$', '_SYSSMU33_2122985546$', 
'_SYSSMU32_228305016$', '_SYSSMU31_2744769455$', '_SYSSMU30_2141723264$', 
'_SYSSMU29_2359381190$', '_SYSSMU10_732010430$', '_SYSSMU9_624586700$', 
'_SYSSMU8_1927760754$', '_SYSSMU7_3527658720$', '_SYSSMU6_3377313595$', 
'_SYSSMU5_1147562966$', '_SYSSMU4_1750859549$', '_SYSSMU3_1431909199$', 
'_SYSSMU2_911282856$', '_SYSSMU1_4079383768$'
                 *
ERROR at line 1:
ORA-00911: invalid character

Well, that didn’t work out so well. From this error, I have no idea where the problem is. So I save the SQL script and edited it. I added a new line after each segment and ran it again:

SQL> 

  1  alter system set _corrupted_rollback_segments=
  2  '_SYSSMU36_168785881$',
  3  '_SYSSMU35_1475960130$',
  4   '_SYSSMU34_908380251$',
  5   '_SYSSMU33_2122985546$',
  6   '_SYSSMU32_228305016$',
  7   '_SYSSMU31_2744769455$',
  8   '_SYSSMU30_2141723264$',
  9   '_SYSSMU29_2359381190$',
 10   '_SYSSMU10_732010430$',
 11   '_SYSSMU9_624586700$',
 12   '_SYSSMU8_1927760754$',
 13   '_SYSSMU7_3527658720$',
 14   '_SYSSMU6_3377313595$',
 15   '_SYSSMU5_1147562966$',
 16   '_SYSSMU4_1750859549$',
 17   '_SYSSMU3_1431909199$',
 18   '_SYSSMU2_911282856$',
 19*  '_SYSSMU1_4079383768$'

SQL> /
alter system set _corrupted_rollback_segments=
                 *
ERROR at line 1:
ORA-00911: invalid character

Much easier to figure out where the issue is. You would be surprised at the number of recovery efforts that are abandoned due to trivial issues like this. Anyway, I fixed my script, restarted the database and it is open! The table I was loading with my precious Dilbert strips still isn’t accessible, but I will work on that on my next increasingly inaccurately named blog.

  1  alter system set "_corrupted_rollback_segments"=
  2  '_SYSSMU36_168785881$',
  3  '_SYSSMU35_1475960130$',
  4   '_SYSSMU34_908380251$',
  5   '_SYSSMU33_2122985546$',
  6   '_SYSSMU32_228305016$',
  7   '_SYSSMU31_2744769455$',
  8   '_SYSSMU30_2141723264$',
  9   '_SYSSMU29_2359381190$',
 10   '_SYSSMU10_732010430$',
 11   '_SYSSMU9_624586700$',
 12   '_SYSSMU8_1927760754$',
 13   '_SYSSMU7_3527658720$',
 14   '_SYSSMU6_3377313595$',
 15   '_SYSSMU5_1147562966$',
 16   '_SYSSMU4_1750859549$',
 17   '_SYSSMU3_1431909199$',
 18   '_SYSSMU2_911282856$',
 19*  '_SYSSMU1_4079383768$' scope=spfile
SQL> /

System altered.

SQL> startup
ORACLE instance started.

Total System Global Area 2499805184 bytes
Fixed Size		    2927480 bytes
Variable Size		  671089800 bytes
Database Buffers	 1811939328 bytes
Redo Buffers		   13848576 bytes
Database mounted.
Database opened.
SQL> 

The Importance of DBA_ROLLBACK_SEGS – 2 of 3

Yes, the original post title said “The Importance of DBA_ROLLBACK_SEGS -1 of 2”.  The second part was supposed to be where I clobbered my undo tablespace and had to use _CORRUPTED_ROLLBACK_SEGMENTS.  Apparently I can’t even clobber my tablespaces right.  My evil plan was simple:

  • Shut the database down
  • Nuke the rollback segment datafile from orbit with ASMCMD
  • Attempt to bring the database back up, recovering with _CORRUPTED_ROLLBACK_SEGMENTS.

Well it didn’t work.  I shut the database down:

rbs2

Then, I deleted the Undo tablespace with ASMCMD and tried to bring the database back up.  The results were predictable:

ASMCMD> rm +DATA/DBINTERNALS/DATAFILE/undotbs01.260.875483419

SQL> startup
ORACLE instance started.

Total System Global Area 2499805184 bytes
Fixed Size            2927480 bytes
Variable Size          671089800 bytes
Database Buffers     1811939328 bytes
Redo Buffers           13848576 bytes
Database mounted.
ORA-01157: cannot identify/lock data file 11 - see DBWR trace file
ORA-01110: data file 11: '+DATA/DBINTERNALS/DATAFILE/undotbs01.260.875483419'

Since I don’t have the datafile, I’ll try to take it offline and open it.  Since my undo tablespace is gone, I should get an error.

Wrong.

SQL> alter database datafile 11 offline;

Database altered.

SQL> alter database open;

Database altered.

WTF?  No undo tablespace and I can open my database just fine?  Then I figured it out.  I did a SHUTDOWN IMMEDIATE, so all uncommitted transactions were rolled back during the shutdown, so no  undo is needed to open the database.  In addition, there is an old Oracle7 style rollback segment SYSTEM in the SYSTEM tablespace.  So you can actually do updates if you want to, but I wouldn’t.  Instead, I would create a new undo tablespace, update the spfile, and restart the database:

SQL> create undo tablespace undotbs1 datafile '+DATA' size 500m;

Tablespace created.

SQL> alter system set undo_tablespace='UNDOTBS1' scope=spfile;

System altered.

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area 2499805184 bytes
Fixed Size            2927480 bytes
Variable Size          671089800 bytes
Database Buffers     1811939328 bytes
Redo Buffers           13848576 bytes
Database mounted.
Database opened.
SQL> drop tablespace undotbs01;  -- Drop the old undo tablespace

Tablespace dropped.
SQL>

So next time, I’ll have to try a harder test.  I’ll start a SQL*Loader script to load the last 20 years of Dilbert cartoons into my database, then do something stupid like killing PMON before deleting the datafile.  That should hose my database real good.

The importance of DBA_ROLLBACK_SEGS – 1 of 2

Unless you have been a DBA for more than 10 years, you might not have even heard of DBA_ROLLBACK_SEGS. Back in the days of Oracle6 and Oracle7, when there was no Interwebs and every kid had to walk to school uphill both ways in the snow, Oracle had rollback segments.

Before we had undo tablespaces, DBA’s had to create rollback segments manually, specifying storage parameters, and adding them to the init.ora. Yes, we had init.ora files which had to edit manually – we didn’t have those fancy ALTER SYSTEM commands to change parameters. Why is this view referencing a component of Oracle that has been obsolete for over 10 years important now?

Try querying DBA_ROLLBACK_SEGS in a modern database and you will get results like:

rollback-segs

What is the significance of these results? Well, all undo tablespaces are is an automagically managed version of rollback segments. The underlying technology isn’t really different. The significance is when your undo tablespace is for whatever reason, unavailable and you can’t open your database. Also for whatever reason, you don’t have good backups of your undo tablespace. We know everyone always has good backups of their undo tablespaces but in case you don’t and can’t open your database, Oracle has an undocumented parameter:

_corrupted_rollback_segments

This parameter can be used to force the database open if rollback segments (undo tablespaces) are missing or corrupted. It is potentially nasty, as it allows the database to open by assuming every transaction in the rollback segments is a complete, committed transaction. This can lead to logical corruption throughout the database, and you would need to take an export and rebuild the database afterwards. That is still better then losing all of your data.

The problem is, _corrupted_rollback_segments requires the list of the segment_names. If you can’t open your database, how do you get the list? In the olden days, your init.ora might help you, but with these fancy new undo tablespaces there is no parameter to help. You need to have queried dba_rollback_segments before you had the failure.

The point of this blog is not to advocate using _corrupted_rollback_segments. Instead it is to put the query in your backup scripts. You probably won’t need it, but if you do, you will be very happy you have it.