Translate

Search This Blog

block corruption using RMAN in oracle,also simulate block corruption

Check and repair physical and logical block corruption :

Block corruption errors are not detected in oralce until you run dbv,RMAN,analyze or export command etc. As a DBA you can proactively monitor the block corruptions in Oracle using RMAN. Further you can recover these block corruptions using RMAN with little efforts using healthy RMAN or user managed backup. If you are using user managed backup then this will need to be cataloged in oracle using catalog datafilecopy 'datafile_path_name'

When you take RMAN backup using backupset or backup image copy or run RMAN validate command it automatically checks for physical block corruptions(media blcok corruptions, hard corruption) and RMAN aborts at the first block corruption it encounters. You may want RMAN to check completely, in this case you need to allow RMAN explicitly for N number of block corruptions it encounters.

RUN { set maxcorrupt for datafile 1,2,3 ,4 to 200;
                  backup validate database;}

Since oracle 9i and onward we always have a default channel so there is no need to allocate one.

check physical as well as logical block corruption(software corruption):

RUN { set maxcorrupt for datafile 1,2,3 ,4 to 200;
          backup validate check logical database;}

Above commands mark the blocks as corrupt similar to dbv and populated v$database_block_corruption.
V$database_block_corruption gives you the file number of the datafiles that contain the corruption, and will provide the block number at which it starts and the number of blocks corrupted. You can find the object(s) affected by this corruption in the dba_segments view. using the provided values for file_id & block_id. Don't forget to check the indexes or constraints for any affected objects.

SQL> select * from v$database_block_corruption;

FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
4 756652 1 5.5157E+12 CORRUPT

select segment_name, segment_type, owner

from dba_extents
where file_id = 4
and 756652 between block_id
and block_id + blocks -1;

no rows selected

Question is is corrupted block the free block ?

 [ create table block_corr as select file#,block# from V$DATABASE_BLOCK_CORRUPTION where 1=0;

create unique index bcorr_pk on block_corr(file#,block#);
insert into block_corr select file#,block# from V$DATABASE_BLOCK_CORRUPTION order by file#,block#;

create table corr_objects as
SELECT de.segment_type, de.owner, de.segment_name, de.PARTITION_NAME
FROM dba_extents de, block_corr bc
WHERE de.file_id = bc.file#
and bc.block# between block_id AND block_id + blocks - 1
group by de.segment_type, de.owner, de.segment_name, de.PARTITION_NAME;
 ]

Next run this statement to identify the objects associated with corruption:

SQL> SELECT e.owner, e.segment_type, e.segment_name, e.partition_name, c.file#
, greatest(e.block_id, c.block#) corr_start_block#
, least(e.block_id+e.blocks-1, c.block#+c.blocks-1) corr_end_block#
, least(e.block_id+e.blocks-1, c.block#+c.blocks-1)
- greatest(e.block_id, c.block#) + 1 blocks_corrupted
, null description
FROM dba_extents e, v$database_block_corruption c
WHERE e.file_id = c.file#
AND e.block_id = c.block#;

OWNER SEGMENT_TYPE SEGMENT_NAME PARTITION_NAME FILE#
---------- -------------------- -------------------- --------------- ----------
CORR_START_BLOCK# CORR_END_BLOCK# BLOCKS_CORRUPTED D
----------------- --------------- ---------------- -

TEST INDEX XXXXXXXXXXXX 62
ME_IDX
292779 292779 1

If the block does not belong to any object there is nothing to do, nothing to care, the block will be reformatted when it will be reallocated.

SELECT e.owner, e.segment_type, e.segment_name, e.partition_name, c.file#
, greatest(e.block_id, c.block#) corr_start_block#
, least(e.block_id+e.blocks-1, c.block#+c.blocks-1) corr_end_block#
, least(e.block_id+e.blocks-1, c.block#+c.blocks-1)
- greatest(e.block_id, c.block#) + 1 blocks_corrupted
, null description
FROM dba_extents e, v$database_block_corruption c
WHERE e.file_id = c.file#
AND e.block_id <= c.block# + c.blocks - 1
AND e.block_id + e.blocks - 1 >= c.block#
UNION
SELECT s.owner, s.segment_type, s.segment_name, s.partition_name, c.file#
, header_block corr_start_block#
, header_block corr_end_block#
, 1 blocks_corrupted
, 'Segment Header' description
FROM dba_segments s, v$database_block_corruption c
WHERE s.header_file = c.file#
AND s.header_block between c.block# and c.block# + c.blocks - 1
UNION
SELECT null owner, null segment_type, null segment_name, null partition_name, c.file#
, greatest(f.block_id, c.block#) corr_start_block#
, least(f.block_id+f.blocks-1, c.block#+c.blocks-1) corr_end_block#
, least(f.block_id+f.blocks-1, c.block#+c.blocks-1)
- greatest(f.block_id, c.block#) + 1 blocks_corrupted
, 'Free Block' description
FROM dba_free_space f, v$database_block_corruption c
WHERE f.file_id = c.file#
AND f.block_id <= c.block# + c.blocks - 1
AND f.block_id + f.blocks - 1 >= c.block#
order by file#, corr_start_block#;

OWNER SEGMENT_TYPE SEGMENT_NAME
------------------------------ ------------------ ---------------------------------------------------------------------------------
 PARTITION_NAME FILE# CORR_START_BLOCK# CORR_END_BLOCK# BLOCKS_CORRUPTED DESCRIPTION
------------------------------ ---------- ----------------- --------------- ---------------- --------------

4 756652 756652 1 Free Block

Note: DBMS_REPAIR does not recover the blocks , rather it marks the corrupted blocks as fixed so these blocks are skipped in read/write operation. Data in these blocks are lost forever.

Other examples:

# Check for physical and logical corruption of a tablespace.
RMAN { VALIDATE CHECK LOGICAL TABLESPACE USERS;}

# Check for physical corruption of all archived redo logs files.
VALIDATE ARCHIVELOG ALL;

# Check for physical and logical corruption of the controlfile.
VALIDATE CHECK LOGICAL CURRENT CONTROLFILE;

# Check for physical and logical corruption of a specific backupset.
VALIDATE CHECK LOGICAL BACKUPSET 3;

# Check for physical corruption of files to be backed up.
BACKUP VALIDATE DATABASE ARCHIVELOG ALL;

# Check for physical and logical corruption of files to be backed up.
BACKUP VALIDATE CHECK LOGICAL DATABASE ARCHIVELOG ALL;

# Check for physical corruption of files to be restored.
RESTORE VALIDATE DATABASE;

# Check for physical and logical corruption of files to be restored.
RESTORE VALIDATE CHECK LOGICAL DATABASE;

Repair block corruption: RMAN can be used to recover corrupted blocks online.Without block media recovery, if even a single block is corrupt, then you must take the datafile offline and restore a backup of the datafile. You must apply all redo generated for the datafile after the backup was created. The entire file is unavailable until media recovery completes. With block media recovery, only the blocks actually being recovered are unavailable during the recovery.

Repairing All Block Corruption in the Database: Follwing runs a backup validation to populate V$DATABASE_BLOCK_CORRUPTION, then repairs any corrupt blocks recorded in the view:

RUN {
BACKUP VALIDATE CHECK LOGICAL DATABASE;
BLOCKRECOVER CORRUPTION LIST;}

OR

RECOVER CORRUPTION LIST;

Note: RMAN block media recovery required enterprise edition. 

Alternate 1 : following repairs all physically corrupted blocks recorded in the view:
 BLOCKRECOVER CORRUPTION LIST RESTORE UNTIL TIME 'SYSDATE - 7';

Other examples #1 Recovering a Group of Corrupt Blocks: This recovers corrupt blocks in three datafiles:

BLOCKRECOVER DATAFILE 2 BLOCK 12, 13 DATAFILE 3 BLOCK 5, 98, 99 DATAFILE 4 BLOCK 19;

#2 Limiting Block Media Recovery by Type of Restore:
following  recovers a series of blocks and restores only from datafile copies:

RUN {
BLOCKRECOVER DATAFILE 3 BLOCK 2,3,4,5 TABLESPACE sales DBA 4194405, 4194409, FROM DATAFILECOPY; }

#3 Limiting Block Media Recovery by Backup Tag:
This recovers blocks and restores only from the backup with the tag weekly_backup:

BLOCKRECOVER TABLESPACE SYSTEM DBA 4194404, 4194405 FROM TAG "weekly_backup";

#4 Limiting Block Media Recovery by Time: Following  recovers two blocks in the SYSTEM tablespace. It restores only from backups that could be used to recover the database to a point two days ago:

BLOCKRECOVER TABLESPACE SYSTEM DBA 4194404, 4194405 RESTORE UNTIL TIME 'SYSDATE-2';

References: 1
                  2
                  3
                  4
 finding corrupted blocks in detail

It was all about physical and logical block corruption but But what about when the data itself is "corrupted" but the block is fine? I'm not talking about a logical or application oriented corruption rather a corruption that happens when there is a "Lost Write." In versions 11.1 (Oracle says it is now corrected for 11.2 onwards) and earlier it was possible for dbwr process to get a write acknowledgement of the write when in fact it did not happen. Of course this isn't Oracle's fault, but try explaining that to your manager/director weeks or months later when a data corruption is encountered and you don't know where it came from! Check it out in the "Backup and Recovery User's Guide" as well as the parameter DB_LOST_WRITE_PROTECT.  The best protection is using Data Guard.

reference: db lost write
asl tom about db lost write & protection

Simulate block corruption in oracle:
select segment_name , header_file , header_block
from dba_segments
where segment_name = 'table_to_be_corrupted' and owner = 'user_name';

SEGMENT_NAME HEADER_FILE HEADER_BLOCK
---------------------------- ----------- ------------
table1                              8                          19

Lets corrupt block 20 using the dd command in Linux. For windows use ultra edit editor in hexa mode

DISCLAIMER: The dd command given below is just for learning purposes and should only be used on testing systems. I will not take any responsibility of any consequences or loss of data caused by this command.

$dd of=ts_corrupt01.dbf bs=8192 conv=notrunc seek=20 << EOF
Make it Corrupt.
EOF

11g Active Standby Database Automatic Block Corruption Repair(reference: Gavin Soorma)


In addition to the real time query capability of the 11g Active Data Guard feature, we can also add to our high availability capability by using the Automatic Block Media Repair feature whereby data block corruptions on the Primary database can be repaired by obtaining those blocks from the standby site – all performed by a background process (ABMR) transparent to the application. The same functionality can be used to repair block corruptions on the Active Standby site by applying blocks which are conversely now received from the Primary site.