Translate

Search This Blog

Oracle Bugs and Life!!I I'm amused to see.

Its been 2 months!!! Much has since in life and seems life will never be same again. oh.. and Im back again to Post but little.

Although once I did it already I faced issueses again in installation of Oracle (bit) version on RHEL 5.2 (64 bit ) and it took again few more hours to delay installation. But caused more panicked than before.

Issue was same again that Oracle Universal Installer was still requiring 32 bit libraries to start with. That is known bug but Oracle Documents is missing this point even in lates version of docs.

RPMS default installation includes those libraries but system administrator did not do default RPMSs installation rather he customized and so dependency call of 32 library was unresolved.


One another day I was awaked in middle of night by call: error was 4031 but that was not so simple so obvious here culprit was new bug in 10g ERROR 4031 was misleading.
Oh what the life has become !!!

And more: one day when I stayed home to take care of some my dear one at same time I had to monitor DB from home. But much more than this some important query used Graphical User Interface hanged and major work suffered. Many users sat idle!! They had no option but to call me to fix Oh what a life has become!! while I did not have remote access to that Prod DB I asked Junior DBA to send me execution Plan and I guessed query can be fixed to get the same old execution plan by setting hidden parameters!! Parameter turned off using B+ indices as Bitmap Index and it worked!! oh what a life is!!! It simply rocked. Ah but what a life has become. But I'm amused to think what I did and I did it my way!!!

adieu !! till Im back


extracting SQLS from redo logs 10g to tracks all/commited changes

I -to track all(commited+non commited changes)

since v$logmmr_contents data is only persistent to session in which log miner is run a table logmined_commited is created to store all extracted information.

SQL> ALTER SESSION SET NLS_DATE_FORMAT = 'DD-MON-YYYY HH24:MI:SS';

Session altered.

In below options parameters is saving the time of adding individual redo logs/archive logs.

SQL> EXECUTE DBMS_LOGMNR.START_LOGMNR( -
> STARTTIME => '01-jan-2008 01:00:00', -
> ENDTIME => '03-jan-2008 21:00:00', -
> OPTIONS => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG + -
> DBMS_LOGMNR.CONTINUOUS_MINE);

PL/SQL procedure successfully completed.

SQL> select count(*) from v$logmnr_contents;

COUNT(*)
----------
98558

SQL> create table extlogs as select * from v$logmnr_contents;

Table created.
------

OR

II -to track commited changes only

SQL> ALTER SESSION SET NLS_DATE_FORMAT = 'DD-MON-YYYY HH24:MI:SS';

Session altered.

SQL> EXECUTE DBMS_LOGMNR.START_LOGMNR( -
> STARTTIME => '01-JAN-2008 01:00:00', -
> ENDTIME => '03-JAN-2008 23:00:00', -
> OPTIONS => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG + -
> DBMS_LOGMNR.CONTINUOUS_MINE + DBMS_LOGMNR.COMMITTED_DATA_ONLY);

PL/SQL procedure successfully completed.


SQL> select count(*) from v$logmnr_contents;

COUNT(*)
----------
54994


SQL> create table extlogs as select * from v$logmnr_contents ;

Table created.

SQL> exec DBMS_LOGMNR.END_LOGMNR( );

PL/SQL procedure successfully completed.

sample query from extracted logs:
SELECT SEG_OWNER, SEG_NAME,operation, COUNT(*) AS Hits FROM
sys.logmined_commited
WHERE SEG_OWNER NOT IN('SYS','SYSTEM','WKSYS')
AND TRUNC(timestamp)='01-jan-07'
GROUP BY SEG_OWNER, SEG_NAME,operation
ORDER BY 1,2,3,4

above shows tables which had undergone DMLS

below shows DML type,corresponding DML statement and equivalent UNDO statement to undo DML for above affected objects.

SELECT OPERATION, SQL_REDO, SQL_UNDO
FROM sys.logmined_commited
WHERE SEG_OWNER = 'SEGOWNER'
AND SEG_NAME = 'segment_name'
AND OPERATION = 'UPDATE'
AND USERNAME = 'username';

Performance Tuning considerations for developing new Database system

Over all tuning = Oracle Tuning + OS Tuning (including Network Tuning)
Oracle tuning = DB Instance tuning + Application tuning
DB Instance tuning =Tuning Instance initialization parameters

Application Tuning=SQL Tuning + PL/SQL tuning
OS Tuning= Tuning Kernel Parameters+use of large page size of virtual memory+tuning of swap space+tuning of network kernel parameter+tuning of I/O kernel parameters
 
Whenever we have tuning issues I found to tune at system levevl is best approach efforts to gain higher benefit in short time. This is the area where lies the maximum scope of return of efforts as mostly its ignored. So Let us first begin with this.
 
1. Resources under utilized

 
a) SGA+PGA not configured to make best use of available memory.
 
b) Database files not distributed to balanced I/O. Most of the cases they are not following SAME methods.
 
c) Database code is not using partitioned tables or parallelization benefit
 
So even finest tuned aplication would be slow if they wait/content too much for memory/cpu/IO.
 
2. Poor hardware configuration
 
a) very few CPUs or underpower CPU cores
 
b) less memory so more physical I/O and swapping
 
c) not using appropriate RAID level, for e.g. using RAID for write intensive DB or using same for storing redo log files
 
d) very few diks array or physical drives
 
I found many servers having 4GB RAM, RAID 5 and too with only single logical array drive. To make it more slow I found less RPMS of individual hdds in Disk Array
 
3.Poor indexing
  • indices missing
  • indices unselective
  • indices not used
  • In other cases I found over indexing was kill for slow DMLs
  • too high clustering factor
 4. Statistics not gathered in way to provide  accurate statistics or they are stale
 
database configuration parameters
 
Poor rdbms version / Poor operating system - One should be wise to choose RDBMS software and OS.
 
5. extra load[unnessary work] I found one database project where there were too much redundant materialized views.
 
6. Too often gathering statistics
 
7. bugs(in case some specific sqls are slow) mostly these re fixed in patches.
not using hidden parameters to counter these
 
8. not using latest upgrade may also be one issue for performance slow down as new versions and patches generally have fixes for the known performance bugs and
 
9. Not using the latest features is also one concern for performance tuning one should go for features such as analytial functions,Bulk loading, Query Subfactoring [with clause ], partitioned tables can be used if required.

Performance Tuning Myths

1. index as much as possible all columns used in join and where clause.

2. Daily gather the statistics

3. DBMS_STATS is always better than ANALYZE TABLE

4. tables should be listed in from clause in such a way that oracle can start joining from smaller tables to bigger ones.

5. use of bind variable is always faster.

6. use Dynamic sqls as much as possible

7. in good tuned DB Cache hit ratios should be above 90

8. newer release faster than previous, so upgrade can boost performance

9. RAC can magically enchance performance.

10. A good DBA can always tune evry query and make it run from 1 hours to 1 min.

11. select count(1) is faster than select count(*)

12. Rebuild indexes daily.

Oracle full of features but BUGs too ORA-00600: internal error code, arguments: [kcratr1_lostwrt],

SETUP : two node 9i rac 9204 ON ocfs ON WIN 2K

first instance crahed due OCFS bug
now second rtried instance recovery of crashed first instance and it quite successfully recovered the first instance(appareent fro malert log of seconds instance)
but sooner secondnstance crashes too and throws ora-600 internal error.

ORA-00600: internal error code, arguments: [kcratr1_lostwrt], [], [], [], [], [], [], []

I tried to manually start the database but oracle throwing same error and complaining about corrupted logs the lost update. I used first argument of ora-600 here [kcratr1_lostwrt] - searched in metalink using ora-600 lookup tool broadly suggested either that was dangerous situation - required media recovery or that could be BUG with other OS like HP Unix and oracle software versions 9205 onwards. which was not our case.Oracle was spuriously trying to recover in second case and metalink told it can be recovered easily in second case.

I preferred to choose second solution path first over giving recover database command (with out restore from backup/after restore). So I set undocument parameter two_pass=false and opened the database successfully from first instance.then I removed this parameter and opened from first instance and second instance,respectively.

keep watch always on query causing extra I/O becuase of high HWM ..its high high high !!!

while I keep watch always on top resoucre intensive queries I observerd many time

HUGE I/O ON DEF$_AQERROR,THOUGH NO ROW in that CURRENTLY.
Since oracle checks for def error perioidically,it was wasting resource in our case.

for that Query oracle fires internally is below:

select q_name, state, delay, expiration, rowid, msgid,
dequeue_msgid, chain_no, local_order_no, enq_time, enq_tid,
step_no, priority, exception_qschema, exception_queue,
retry_count, corrid, time_manager_info
from SYSTEM.DEF$_AQERROR
where time_manager_info <= :1
and state != :2
for
update skip locked

currently there is no row in SYSTEM.DEF$_AQERROR table.
Though there were some errors seen in DEFERROR view a month ago due
confliction but we deleted them giving command like following for every error transaction.

.
execute dbms_defer_sys.delete_error(destination=>null,deferred_tran_id=>'1.26.1180542');
..
commit;


meanwhile we set parametr refresh_after_errors to true in all mv refreh group and when got all errors
removed from DEFERROR at master site we set back these parameter to default i.e
false.

Now my question was why did we still had reads from DEF$_ERRROR and that was comming in huge physical reads, the highest on instance.

Thouught:

that happening beacuse of HWM of DEF$_AQERROR was not reset by oracle's given error
removal procedure dbms_defer_sys.delete_error.

I took following action.

1. for all master groups
Suspend the master acivity at master site.
SQL> connect repadmin/repadmin
SQL> begin
dbms_repcat.suspend_master_activity(gname=>'REPDBA_MG');
end;
/

2.Truncate the table

SQL>connect SYSTEM/

SQL> truncate table def$aq_error;

3. Resume the master activity for all masterr groups.

SQL> connect repadmin/repadmin
SQL> begin
dbms_repcat.resume_master_activity(gname=>'REPDBA_MG');
end;
/



and I was done.

since then I never saw that query in top SQL

gotcha in distributed transactions

while I was just running a query fetching records from remote master db(MASTERDB), connection failed...resulted in PMON trying to clean this transaction every now and then(indicated from alert log).... I had worries my client DB(DEVDB) instance could be terminated,then I found way to resolve it as follows...

AT CLIENT DB(runing query fetching data from master db using dblink)

SQL> show parameter db_name

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
db_name string DEVDB
SQL> show user
USER is "SYS"
SQL> set line 300
SQL> SELECT LOCAL_TRAN_ID, GLOBAL_TRAN_ID, STATE, MIXED, HOST, COMMIT#
2 FROM DBA_2PC_PENDING;

LOCAL_TRAN_ID GLOBAL_TRAN_ID STATE MIX
---------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---------------- ---
HOST COMMIT#
-------------------------------------------------------------------------------------------------------------------------------- ----------------
5.6.717654 DEVDB.schemaname.ORG.6c69aa83.5.6.717654 collecting no
schemaname\DEVNODE 3486961816


SQL> set line 80
SQL> desc dba_2pc_pending
Name Null? Type
----------------------------------------- -------- ----------------------------
LOCAL_TRAN_ID NOT NULL VARCHAR2(22)
GLOBAL_TRAN_ID VARCHAR2(169)
STATE NOT NULL VARCHAR2(16)
MIXED VARCHAR2(3)
ADVICE VARCHAR2(1)
TRAN_COMMENT VARCHAR2(255)
FAIL_TIME NOT NULL DATE
FORCE_TIME DATE
RETRY_TIME NOT NULL DATE
OS_USER VARCHAR2(64)
OS_TERMINAL VARCHAR2(255)
HOST VARCHAR2(128)
DB_USER VARCHAR2(30)
COMMIT# VARCHAR2(16)

SQL> col global_tran_id format a35
SQL> /

LOCAL_TRAN_ID GLOBAL_TRAN_ID STATE MIX
---------------------- ----------------------------------- ---------------- ---
HOST
--------------------------------------------------------------------------------
COMMIT#
----------------
5.6.717654 DEVDB.schemaname.ORG.6c69aa83.5.6.717654 collecting no
schemaname\DEVNODE
3486961816


SQL> set line 300
SQL> /

LOCAL_TRAN_ID GLOBAL_TRAN_ID STATE MIX HOST COMMIT#
---------------------- ----------------------------------- ---------------- --- -------------------------------------------------------------------------------------------------------------------------------- ----------------
5.6.717654 DEVDB.schemaname.ORG.6c69aa83.5.6.717654 collecting no schemaname\DEVNODE 3486961816

SQL>
SQL> SELECT LOCAL_TRAN_ID, IN_OUT, DATABASE, INTERFACE
2 FROM DBA_2PC_NEIGHBORS;

LOCAL_TRAN_ID IN_ DATABASE I
---------------------- --- -------------------------------------------------------------------------------------------------------------------------------- -
5.6.717654 in N
5.6.717654 out MASTERDB.schemaname.ORG N

SQL> execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('5.6.717654');
BEGIN DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('5.6.717654'); END;

*
ERROR at line 1:
ORA-30019: Illegal rollback Segment operation in Automatic Undo mode
ORA-06512: at "SYS.DBMS_TRANSACTION", line 65
ORA-06512: at "SYS.DBMS_TRANSACTION", line 85
ORA-06512: at line 1


SQL> set transaction use rollback segment system
2 ;
set transaction use rollback segment system
*
ERROR at line 1:
ORA-30019: Illegal rollback Segment operation in Automatic Undo mode


SQL> show user;
USER is "SYS"
SQL> alter session set "_smu_debug_mode" = 4;

Session altered.

SQL> execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('5.6.717654');

PL/SQL procedure successfully completed.

SQL> commit;

Commit complete.

SQL> spool off

ORA-12535: TNS:operation timed out: don't take it easy

ORA-12535: TNS:operation timed out

basic check is you check host name is resolved/reacheable correctly by PING and then further check and TNSPING succeeded but still not able to connect through oracle net(e.g. using SQLPLUS) - Why ?
so no address conflicts,no DNS outages, no virtual ip addresses that can not be resolved anymore, no failed cluster groups/veritas cluster packages
still getting.]

Hey ! you know ports concerned are open stil getting "ORA-12535: TNS:operation timed out" when connecting - why?

yet there could be FIREWALL problem

when connection request is made to an Oracle listener ( generally port 1521), a new process is spawned/thread created which listens on a different port (re u catching it),which is send back to the client, and the client tries to make a TCP connection to the new oracle process/thread. Thats typical way connection is made.

So you should not only add the lsnrctl to the FIREWALL rules, but also add the oracle process to the firewall rule. You know from basis OS knowledge the port given back to the client can range anywhere from 1024 to 65535.

if your oracle database server is on Windows you can set USE_SHARED_SOCKET = TRUE variable in the registry at HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE

**** more may be needed to make it dunne but hope it helps

You can further look at :
http://www.orafaq.com/maillist/oracle-l/2003/10/24/2174.htm
http://ora-12535.ora-code.com/msg/54838.html
http://ora-12535.ora-code.com/msg/39530.html

on oracle database residing on unix servers this problem may be faced in shared sever mode. to get out from it , in dispatchers parameter port number can be specified so port no wont be selected randomly.(make sure this port is listed in firewall oepn ports list)

dipatchers="(address=(pro=tcp)(host=hostname)(port=1234)(disp=5)"

another way to get rid is use connection manager in win/unix and set port no in cman.ora

ORA-12535 is faced also when using SSL and to skip it you can use dedicated specific port no in dispacthers parameter or by using conn mgr.

Follow by Email