Search This Blog

Column Order in Oracle Composite Index and indexing strategy

You have queries like:

select *
from table1
where c1=10 and c2=30 and c3=100

select *
from table1
where c1=10 and c2=30

If the columns used in filter conditions are not selective i.e c1, c2 and c3 have high cardinality then these columns are not good candidate for indexing. In this case their combination can be tried to see  if whole filter condition is more selective or say cardinality of whole filter,the combination of these individual filters is low.  So in this case composite index on combinations can be created. Combination of these columns can be unique or not unique

Now question #1 is what should be the combination of these columns in composite index. In oracle we have index skip scan so answer may be slighly different from other rdbms which uses composite indexes only when leading column is used.

Suppose that we have a composite unique index on this table on the combination of these columns (in that order):

FirstName, LastName, ContactID
Let’s look at the data distribution:
select count(*) as total_records,
count(distinct FirstName) as Distinct_FirstName,
count(distinct LastName) as Distinct_LastName,
count(distinct ContactID) as Distinct_ContactID
from Employees

total_records Distinct_FirstName Distinct_LastName Distinct_ContactID
------------- ------------------ ----------------- ------------------
2336724       1018               1206              2336724

you can see, the most selective column is CONTACTID for which a given value will qualify an exact 1 record. Next most selective is LastName
and the least selective is FirstName.

Now, let’s look at Question #1 :

Query :select * from Employees where ContactID = 2 and FirstName = ‘Catherine’ and see its cost

Now, let us also include the leading column of the index:

Query : select * from Employees where ContactID = 2 and FirstName = ‘Catherine’ and see its cost

Its cost will come down. This tells us that not only is the column order important, it is very important for the leading column of the index to be part of the filter condition in order to make good use of the index and to have a good execution plan. This immediately brings another question to mind – now, we know that we need the leading column of the index to be present in the filter condition (WHERE clause) but does it make a difference if we have the least selective vs the most selective column as the leading column of that index?

In our example, we have established that the ContactID column is the most selective – every value qualifies for a single record and the FirstName column is the least selective. The index that we currently have on the table is: test and it is formed on the combination of the columns in this order:

FirstName, LastName and then ContactID

Now, let’s go ahead and create another index and this time, we will make ContactID as the first column:

create index test_2 on Employees(ContactID, LastName, FirstName)

Now, let us execute the same query but force it to use different indexes and see whether the execution plan and the query cost changes at all. The query cost will be  different in the two cases – the test index usage yielded higher cost vs lower cost in case of test_2. So the column order as well as the selectivity of the columns in an index does matter a lot . If it is SQL Server which does not have index skip scan then having the most selective column as the leading column of the index is a good design assuming that that column is going to be used in your filter criteria.

Bottom line is that the order of the columns in a composite index is very important and it also depends upon how your queries are written. Let’s take one more example:

Say, I have a table TAB1 which has columns A, B, C and suppose I have two index:
Index1 on (a, b) and Index2 on (b,a).
If I have these queries:

Select * from TAB1 where A = @a and B = @b;
Select * from TAB1 where A = @a
then, Index1 is useful for these queries. If I have:

Select * from TAB1 where A = @a and B = @b;
Select * from TAB1 where B = @b
then, Index2 is more useful since it covers both the queries.  Beginning Oracle 9i,if cost based optimizer is choosen for execution theen there is  a new feature called “Index Skip Scan” was introduced using which both Index1 or Index2 can be useful in the above case but it depends upon the selectivity of those columns – we will cover index skip scan in a future post but you can read more over here  if you are interested.

Key things to take away from this post are:
1) Order of the columns in an index is important but the most important thing is to first understand your queries and see how the index is going to be used. Say, I have three columns in a table with million records with this data distribution for the columns:
C: 2 distinct values
B: 5 distinct values
A: Million distinct values
As you can see from above, A is the most selective and C is the least so is A, B, C the right order? Answer is it depends. If you are using only equality operators in your filter condition, then definitely the answer is Yes. If however, say, the filter criteria is: WHERE C = ‘X’ and B = 2 and A > getdate(), then we would scan a lot of C=’Y’ and B != 2 values while we are scanning the A > getdate() values (this criteria of A > the value that we used in the example below qualified nearly 200,000 records). A better thing would be to use B,C,A in that case to reach the place where B and C equality matches are done and scan the rest of the records using the A > getdate() criteria.

So, understand your queries and how you are going to be accessing the data and then decide how the composite index needs to be formed

Question 2: Is order of column always important:
Answer is :  If I have all the columns of the composite index in the filter condition (WHERE clause), then the order in which they are specified in the filter criteria does not matter.

So strategy could be start indexing columns from highly selective to least but consider cases of queries where there is no equality conditions also.

Another strategy if index skip scan has to be kept in mind is:

  1. If a and b both have 1000 distinct values and they are always queried together then the order of columns in the index doesn't really matter. But a has only 10 distinct values or you have queries which use just one of the columns then it does matter; in these scenarios the index may not be used if the column ordering does not suit the query.
  2. The column with the least distinct values ought to be first and the column with the most distinct values last. This not only maximises the utility of the index it also increases the potential gains from index compression.
  3. The datatype and length of the column have an impact on the return we can get from index compression but not on the best order of columns in an index.
  4. Arrange the columns with the least selective column first and the most selective column last. In the case of a tie lead with the column which is more likely to be used on its own.
The two main reasons for leading with the least selective column are
  1. Index compression
  2. Index Skip reads
Both these work their magic from knowing that the value in the current slot is the same as the value in the previous slot. Consequently we can maximize the return from these techniques by minimsing the number of times the value changes. In the following example, A has four distinct values and B has six. The dittos represent a compressible value or a skippable index block.
Least selective column leads ...

A          B
---------  -
"          2
"          3
"          4
"          5
"          6
"          2
"          3
"          4
"          5
"          6
"          2
"          3
"          4
"          5
"          6
"          2
"          3
"          4
"          5
"          6
Most selective column leads ...
B  A
-  --------
Even in this trival example, (A, B) has 20 skippable slots compared to the 18 of (B, A). A wider disparity would generate greater ROI on index compression or better utility from Index Skip reads.

Question 4:

I have a Composite Primary key which is combination of Column1, and Column2. What would be the difference if I have Composite key with Column2, and Column1. Does it even really matter which order you set columns?

It does, but perhaps not in the way you might expect.

Given col1 and col2, people's initial instinct is to put the more selective column first.

But, that's almost always irrelevant.

First, you need to consider how the index will be used in your SQL statements. If some SQLs only specify col1, then col1 should probably be first. If all SQLs will always specify both columns in the where predicate, then it doesn't matter which comes first.

But, what if most of the SQLs specify both col1 and col2, and a few provide only col1 or col2?

Well, in that case, the least selective column should (probably) go first. There are two reasons for this. First, if the least selective goes first, then, for those queries where only the second column is specified in the predicate, Oracle will be more likely to do an INDEX SKIP SCAN, if the leading column is not very selective. Second, by putting the less selective column first, you'll be more likely to be able to take advantage of index compression.

Choosing Composite Indexes

A composite index contains more than one key column. Composite indexes can provide additional advantages over single-column indexes:

  • Improved selectivity
    Sometimes two or more columns or expressions, each with poor selectivity, can be combined to form a composite index with higher selectivity.
  • Reduced I/O
    If all columns selected by a query are in a composite index, then Oracle can return these values from the index without accessing the table.

A SQL statement can use an access path involving a composite index if the statement contains constructs that use a leading portion of the index

A leading portion of an index is a set of one or more columns that were specified first and consecutively in the list of columns in the CREATE INDEX statement that created the index. Consider this CREATE INDEX statement:
CREATE INDEX comp_ind 
ON table1(x, y, z);

  • x, xy, and xyz combinations of columns are leading portions of the index
  • yz, y, and z combinations of columns are not leading portions of the index

Choosing Keys for Composite Indexes

Follow these guidelines for choosing keys for composite indexes:

  • Consider creating a composite index on keys that are used together frequently in WHERE clause conditions combined with AND operators, especially if their combined selectivity is better than the selectivity of either key individually.
  • If several queries select the same set of keys based on one or more key values, then consider creating a composite index containing all of these keys.

Of course, consider the guidelines associated with the general performance advantages and trade-offs of indexes described in the previous sections.

Ordering Keys for Composite Indexes

Follow these guidelines for ordering keys in composite indexes:

  • Create the index so the keys used in WHERE clauses make up a leading portion.
  • If some keys are used in WHERE clauses more frequently, then be sure to create the index so that the more frequently selected keys make up a leading portion to allow the statements that use only these keys to use the index.
  • If all keys are used in WHERE clauses equally often, then ordering these keys from most selective to least selective in the CREATE INDEX statement best improves query performance.
  • If all keys are used in the WHERE clauses equally often but the data is physically ordered on one of the keys, then place that key first in the composite index.

Follow by Email