Friday, March 10, 2006

Database: Various Methods to Use Dates From Tables as Parameters for Queries

I had someone ask me the following question:

How can I use a date value as a variable for another query? For example: select * from XXX where startdate = XXXX;

There are actually a few ways to do this. I will cover the easier ones here. There are 3 ways to do this, either through a join, through a sub query, or through an exists statement.

To use the join method, you would so something like this:

Select      
     ‘X’
from
     table1, table2
where
     table1.startdate = table2.startdate;

Now there is a little caveat in working with dates in databases, in particular I will use Oracle to demonstrate this. Lets say we have 2 tables, table1 and table2, both with 1 field called startdate. The tables contains the following data:

SQL> select startdate from table1;

STARTDATE
---------
10-MAR-06
15-MAR-06
04-APR-06
10-MAR-06

SQL> select startdate from table2;

STARTDATE
---------
10-MAR-06

Now, if we use the above query we should get 2 results back, right?  Well, lets try it and see:

SQL> Select  
  2   'X'
  3  from
  4   table1, table2
  5  where
  6   table1.startdate = table2.startdate;

no rows selected

So why didn’t we get any results back? The answer is because Oracle stored date fields as both Date and Time. By default, if a time is not specified, midnight is assumed. Because of this, between statements may not work as expected. So to fix the above query, I will change it to a ‘DD-MON-YYYY’ format so they will match. Lets see what happens when I do this.

  1  Select
  2   'X'
  3  from
  4   table1, table2
  5  where
  6*  to_char(table1.startdate, 'DD-MON-YYYY') = to_char(table2.startdate, 'DD-MON-YYYY')
SQL> /

'X'
--------------------------------
X
X

And these are the results I wanted to see. The second method is using a sub query. This can be a little trickier than using a join since sub query as results require that only 1 result be returned in order to match. Lets look at an example:

  1  SELECT
  2         'X'
  3  FROM
  4         table1
  5  WHERE
  6         to_char(startdate,
  7                 'DD-MON-YYYY'
  8                ) = (
  9         SELECT
10                TO_CHAR(startdate,
11                        'DD-MON-YYYY'
12                       )
13         FROM
14                table2
15*                   )
SQL> /

'X'
--------------------------------
X
X

This is what we expect, but that is only because table2 has only 1 value in it. What happens if table2 has data like so:

SQL> select startdate from table2;

STARTDATE
---------
10-MAR-06
10-MAR-06
10-MAR-06

SQL> SELECT
  2         'X'
  3  FROM
  4         table1
  5  WHERE
  6         to_char(startdate,
  7                 'DD-MON-YYYY'
  8                ) = (
  9         SELECT
10                TO_CHAR(startdate,
11                        'DD-MON-YYYY'
12                       )
13         FROM
14                table2
15                    )
16  /
       SELECT
           *
ERROR at line 9:
ORA-01427: single-row subquery returns more than one row

Suddenly, the subquery does not work. This can be fixed by adding a distinct clause to the sub query. So it will look like this:

  1  SELECT
  2         'X'
  3  FROM
  4         table1
  5  WHERE
  6         to_char(startdate,
  7                 'DD-MON-YYYY'
  8                ) = (
  9         SELECT
10                distinct TO_CHAR(startdate,
11                        'DD-MON-YYYY'
12                       )
13         FROM
14                table2
15*                   )
SQL> /

'X'
--------------------------------
X
X

Alternative to this, you can also use an EXISTS statement to retrieve your results. To do so, the query would look like so:

  1  SELECT
  2         table1.startdate
  3  FROM
  4         table1
  5  WHERE
  6         exists (
  7         SELECT
  8                'X'
  9         FROM
10                table2
11     where
12             to_char(table2.startdate, 'DD-MON-YYYY') = to_char(table1.startdate, 'DD-MON-YYYY')
13*    )
SQL> /

STARTDATE
---------
10-MAR-06
10-MAR-06

Assuming we have an index on both tables, the execution plan for the first join minus he to_char() function would be:

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=3 Bytes=42)
   1    0   NESTED LOOPS (Cost=1 Card=3 Bytes=42)
   2    1     INDEX (FULL SCAN) OF 'TBL2' (NON-UNIQUE) (Cost=1 Card=3
          Bytes=21)

   3    1     INDEX (RANGE SCAN) OF 'TBL1' (NON-UNIQUE)

With the function added it would look like:
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=5 Card=1 Bytes=14)
   1    0   HASH JOIN (Cost=5 Card=1 Bytes=14)
   2    1     TABLE ACCESS (FULL) OF 'TABLE2' (Cost=2 Card=3 Bytes=21)
   3    1     TABLE ACCESS (FULL) OF 'TABLE1' (Cost=2 Card=4 Bytes=28)

Hmm, so by adding the function, we have increased the cost. What about with the sub query?

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=7)
   1    0   TABLE ACCESS (FULL) OF 'TABLE1' (Cost=2 Card=1 Bytes=7)
   2    1     SORT (UNIQUE) (Cost=5 Card=3 Bytes=21)
   3    2       TABLE ACCESS (FULL) OF 'TABLE2' (Cost=2 Card=3 Bytes=2
          1)

Hmm, the cost is slightly lower, even with the functions and the distinct clause. So what does it look like with the EXISTS statement?

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=5 Card=1 Bytes=14)
   1    0   HASH JOIN (SEMI) (Cost=5 Card=1 Bytes=14)
   2    1     TABLE ACCESS (FULL) OF 'TABLE1' (Cost=2 Card=4 Bytes=28)
   3    1     TABLE ACCESS (FULL) OF 'TABLE2' (Cost=2 Card=3 Bytes=21)

In reality the results would be much different if Oracles Function based Indexes were turned on, and if the tables were much larger where the Cost Based Optimizer would actually determine that an index would be faster than a full table scan. I personally would prefer the join method for these examples, but your mileage may vary. The various execution plans and reasons for Full Table Scans illustrate some of the complexities with working with Oracle. Different database platforms will have a different set of problems. For example, Microsoft Access, although supporting EXISTS statements, have been a source of unexpected results, which leads me to believe the support is not complete. Certain DBMS platforms separate date and time types, so having to convert to a date only format to compare will be unnecessary on certain platforms. Some platforms may not support sub queries, or the EXISTS statement. So take these things into consideration when designing your queries and consult documentation on what features are supported on your platform. I hope this helps.

No comments: