Wednesday, May 07, 2008

Why is Varchar2 oversizing bad?

Apparently the space consumption would be the same whether a Varchar2(4) or a Varchar2(4000) is used: if you require one byte, then only one byte is allocated, so performance should be the same. A look at the performance statistics, however, turns up some surprises.
When fetching information from Oracle, buffers and fetch structures are prepared beforehand to handle the maximum declared column length, so applications such as Toad may run out of memory trying to allocate the resources needed to retrieve data from an unnecessarily oversized Varchar2 column.
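
As a rough back-of-the-envelope illustration (my own numbers, not part of the original tests), a client that sizes its fetch buffers by the declared width must reserve roughly arraysize * columns * declared-length bytes per fetch, regardless of the data actually stored. For the 20-column demo tables used below:

-- Assuming a client fetch array of 500 rows:
SET ARRAYSIZE 500
-- VARCHAR2(4000) columns: 500 rows * 20 cols * 4000 bytes ~= 38 MB per fetch
-- VARCHAR2(4) columns:    500 rows * 20 cols *    4 bytes ~= 39 KB per fetch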

Let's analyze this case:
Table 1: Varchar2LenghtDemo

SQL> desc Varchar2LenghtDemo
Name Null? Type
----------------------------------------- -------- ----------------------
ID NUMBER(10)
VCCOL VARCHAR2(4000)
VCCOL2 VARCHAR2(4000)
VCCOL3 VARCHAR2(4000)
VCCOL4 VARCHAR2(4000)
VCCOL5 VARCHAR2(4000)
VCCOL6 VARCHAR2(4000)
VCCOL7 VARCHAR2(4000)
VCCOL8 VARCHAR2(4000)
VCCOL9 VARCHAR2(4000)
VCCOL10 VARCHAR2(4000)
VCCOL11 VARCHAR2(4000)
VCCOL12 VARCHAR2(4000)
VCCOL13 VARCHAR2(4000)
VCCOL14 VARCHAR2(4000)
VCCOL15 VARCHAR2(4000)
VCCOL16 VARCHAR2(4000)
VCCOL17 VARCHAR2(4000)
VCCOL18 VARCHAR2(4000)
VCCOL19 VARCHAR2(4000)
VCCOL20 VARCHAR2(4000)
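
For reference, the table can be populated along these lines (a sketch of the insert-as-select method described in the comments below; seeding only the first column with 'abcd' is my own assumption):

insert into Varchar2LenghtDemo (id, vccol) values (1, 'abcd');

begin
  -- each pass doubles the row count: 23 passes turn 1 row into 2^23 = 8,388,608
  for i in 1 .. 23 loop
    insert into Varchar2LenghtDemo select * from Varchar2LenghtDemo;
  end loop;
  commit;
end;
/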

23:39:55 SQL> select count(*) from Varchar2LenghtDemo;

COUNT(*)
----------
8388608

Elapsed: 00:00:10.18

Statistics
----------------------------------------------------------
5 recursive calls
0 db block gets
55719 consistent gets
55669 physical reads
0 redo size
414 bytes sent via SQL*Net to client
385 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed


If we compare these statistics with those of an identical table declared with short columns:
SQL> desc Varchar2LenghtDemoShort
Name Null? Type
----------------------------------------- -------- --------------------

ID NUMBER(10)
VCCOL VARCHAR2(4)
VCCOL2 VARCHAR2(4)
VCCOL3 VARCHAR2(4)
VCCOL4 VARCHAR2(4)
VCCOL5 VARCHAR2(4)
VCCOL6 VARCHAR2(4)
VCCOL7 VARCHAR2(4)
VCCOL8 VARCHAR2(4)
VCCOL9 VARCHAR2(4)
VCCOL10 VARCHAR2(4)
VCCOL11 VARCHAR2(4)
VCCOL12 VARCHAR2(4)
VCCOL13 VARCHAR2(4)
VCCOL14 VARCHAR2(4)
VCCOL15 VARCHAR2(4)
VCCOL16 VARCHAR2(4)
VCCOL17 VARCHAR2(4)
VCCOL18 VARCHAR2(4)
VCCOL19 VARCHAR2(4)
VCCOL20 VARCHAR2(4)

23:43:02 SQL> r
1* select count(*) from Varchar2LenghtDemoShort

COUNT(*)
----------
8388608

Elapsed: 00:00:05.50

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
55756 consistent gets
22077 physical reads
0 redo size
414 bytes sent via SQL*Net to client
385 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

Almost all statistics are the same, except for these two:
Operation: Select count(*) from table

Table            Rows        Time Elapsed    Physical Reads
Varchar2(4)      8,388,608   00:00:05.50     22,077
Varchar2(4000)   8,388,608   00:00:10.18     55,669

Since the number of physical reads is meaningfully different, a question arises:
How many rows per block? Is the row density the same?
Table            Avg Rows per Block
Varchar2(4)      150.99
Varchar2(4000)   150.99

Quite surprisingly, the row density is the same, yet the number of physical block reads more than doubles.
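
The row density figures above can be derived from the optimizer statistics, assuming they are current, with something like:

select table_name,
       round(num_rows / blocks, 2) as avg_rows_per_block
  from user_tables
 where table_name in ('VARCHAR2LENGHTDEMO', 'VARCHAR2LENGHTDEMOSHORT');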


Conclusion

Even though Varchar2 columns hold a variable number of characters and use only the bytes actually required, it is far better to size them to the actual requirements; otherwise a meaningful performance problem is created unnecessarily.
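
If a column turns out to be oversized, it can usually be tightened in place. A minimal sketch (the target length of 40 is an arbitrary example):

-- Succeeds only if no stored value exceeds 40 bytes;
-- otherwise Oracle raises ORA-01441.
alter table Varchar2LenghtDemo modify (vccol varchar2(40));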

8 comments:

Tonguç said...

Hi Madrid,

Once upon a time one of my colleagues put this very question to me: since the VARCHAR2 datatype has the flexibility to store variable-length values, why not store, let's say, a "human name" :) in a varchar2(4000) column? What would the alternative costs be, given that varchar2(40) is already enough for this kind of need?

Besides row chaining and migration, the memory consumption of PL/SQL %ROWTYPE and %TYPE declarations is another threat;

http://tonguc.yilmaz.googlepages.com/Code_listing-73-Memory_cost_analysis.txt
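
A minimal sketch of that kind of memory measurement (it assumes the demo table above exists and that the session can query v$mystat and v$statname):

declare
  -- every record reserves fields shaped like the table's columns
  type tab_t is table of Varchar2LenghtDemo%rowtype index by pls_integer;
  l_rows tab_t;
  l_pga  number;
begin
  for i in 1 .. 10000 loop
    l_rows(i).id    := i;
    l_rows(i).vccol := 'abcd';
  end loop;
  -- report this session's PGA consumption after populating the collection
  select ms.value
    into l_pga
    from v$mystat ms, v$statname sn
   where sn.statistic# = ms.statistic#
     and sn.name = 'session pga memory';
  dbms_output.put_line('session pga memory: ' || l_pga || ' bytes');
end;
/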

There are also several limitations;

create table t ( a varchar2(4000), b varchar2(4000) );
create index t_idx_ab on t(a, b);

create index t_idx_ab on t(a, b)
*
ERROR at line 1:
ORA-01450: maximum key length (6398) exceeded

The alternative CHAR datatype stores fixed-length character strings. Since Oracle compares CHAR values using blank-padded comparison semantics, whereas it compares VARCHAR2 values using nonpadded comparison semantics, this also confuses developers a lot and, I believe, motivates them to use VARCHAR2 over CHAR.

Best regards.

Hector R. Madrid said...

Certainly. While I was performing the tests there was an additional issue I didn't publish, both for lack of database resources and because it prevented me from gathering the operational statistics I was interested in: when I ran another insert-as-select against the Varchar2(4000) table, inserting 8 million rows made the undo segment overflow, whereas on the Varchar2(4) table I could keep inserting up to 64 million rows without any problem in the undo tablespace.
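
(For anyone reproducing this, the undo consumed by an in-flight transaction can be watched from a second session with something like the query below; the bind for the inserting session's SID is a placeholder:)

select t.used_ublk, t.used_urec
  from v$transaction t, v$session s
 where s.taddr = t.addr
   and s.sid = :inserting_sid;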

Cool said...

Hi!
so you are strongly recommending that we use char instead of the varchar2 datatype. Am I right?

Hector R. Madrid said...

Actually I am not suggesting the char datatype instead of varchar2, since char reserves the declared number of characters even if they are not used, leading to wasted space. My suggestion is that even though a varchar2 column can be declared at any size, it shouldn't be oversized; otherwise performance and resource problems may arise at execution time.

Eduardo Legatti said...

Hi Hector,

Nice article. By the way, in addition to Tonguç's comment about ORA-01450, maybe creating a function-based index could partially solve the "problem":

SQL> create table t (a varchar2(4000),b varchar2(4000));

Table created.

SQL> create index t_idx_ab
2 on t(substr(a,1,100),substr(b,1,100));

Index created.

SQL> insert into t
2 values ('aaaaaaaaaaaaaaa','bbbbbbbbbbbbbbbb');

1 row created.

SQL> analyze table t compute statistics;

Table analyzed.

SQL> explain plan for
2 select * from t
3 where a='aaaaaaaaaaaaaaa' and b='bbbbbbbbbbbbbbbb';

Explained.

SQL> select * from table(dbms_xplan.display);

TABLE ACCESS BY INDEX ROWID
   INDEX RANGE SCAN (T_IDX_AB)

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("A"='aaaaaaaaaaaaaaa' AND "B"='bbbbbbbbbbbbbbbb')
2 - access(SUBSTR("A",1,100)='aaaaaaaaaaaaaaa' AND
SUBSTR("B",1,100)='bbbbbbbbbbbbbbbb')

Cheers

Legatti

Anonymous said...

I can't reproduce your results on 10.2.0.3 using 8K blocks, LMT with autoallocate and ASSM.

What was your tablespace definition, table pctfree, and how did you populate the table ?

Since you seem to have two tables of 55,000 blocks and have only reported 22,000 blocks read for one of the tests, it looks as if either you had 33,000 blocks buffered or your version of Oracle did asynchronous readahead for one of your tests and did not count those reads as physical.

Can you reproduce the tests and use Tanel Poder's 'snapper' (or similar) to get the session's v$sesstat and v$session_event entries?
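
Even a simple before-and-after snapshot of the relevant session statistics would do, for example (run before and after each query, then diff the values):

select sn.name, ms.value
  from v$mystat ms, v$statname sn
 where sn.statistic# = ms.statistic#
   and sn.name in ('consistent gets', 'physical reads',
                   'physical reads cache prefetch');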

Hector R. Madrid said...

Jonathan, I will check it. The environment I used was a regular 10.2.0.1.0 database with LMT, 8K block size, and default storage parameters.

The tables were populated by means of insert-as-select statements taking the data from the same table.

Another fact I should point out is an undo issue: while performing the inserts it was possible to load a larger amount of data into the short-sized varchar table, whereas the procedure ran out of undo space on the long-sized varchar table.

Anonymous said...

I don't agree with the example: your first example retrieves its data from the cache while the second does not.

I simulated the same example at http://arjudba.blogspot.com/2009/08/does-oversize-of-datatype-varchar2.html

There is in fact no performance problem.
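
For anyone rerunning such a comparison, one way to keep caching from skewing the physical read counts is to start each query from a cold cache; on 10g and later, for example:

-- empties the buffer cache so the next query reads every block from disk
alter system flush buffer_cache;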