dbTalk Databases Forums  

Column space requirements - VARCHAR2 versus NUMBER

comp.databases.oracle.misc comp.databases.oracle.misc


Discuss Column space requirements - VARCHAR2 versus NUMBER in the comp.databases.oracle.misc forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Luke
 
Posts: n/a

Default Column space requirements - VARCHAR2 versus NUMBER - 05-03-2011 , 08:26 PM






Hello,

Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit

I have a table that will contain around 300 million rows so space
utilization is a primary criterion for the design of the table.

One of the columns will be loaded with a variable length numeric value
from 1 to 9999999999 that will contain leading zeroes.

Since this numeric column value is not used in any computations, if
space were not so paramount, I would normally define it as
varchar2(10) but we are considering defining it as NUMBER (realizing
that we will lose the leading zeroes when the column is populated).

My understanding is that the NUMBER datatype will generally require
less space than the VARCHAR2 datatype but is defining the column as
NUMBER a reasonable design compromise?

TIA

Luke

Reply With Quote
  #2  
Old   
Mark D Powell
 
Posts: n/a

Default Re: Column space requirements - VARCHAR2 versus NUMBER - 05-04-2011 , 09:05 AM






On May 3, 9:26*pm, Luke <luke_ai... (AT) hotmail (DOT) com> wrote:
Quote:
Hello,

Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit

I have a table that will contain around 300 million rows so space
utilization is a primary criterion for the design of the table.

One of the columns will be loaded with a variable length numeric value
from 1 to 9999999999 that will contain leading zeroes.

Since this numeric column value is not used in any computations, if
space were not so paramount, I would normally define it as
varchar2(10) but we are considering defining it as NUMBER (realizing
that we will lose the leading zeroes when the column is populated).

My understanding is that the NUMBER datatype will generally require
less space than the VARCHAR2 datatype but is defining the column as
NUMBER a reasonable design compromise?

TIA

Luke
If the column will always contain only digits not just now but also in
the future then yes using data type number instead of varchar2 may
make sense however you also need to consider if this value is stored
in any other tables and if so if the tables will ever be joined on
this column. If the tables will be joined on this column then how the
column is defined in these other table matters since you really do not
want data type conversion to take place on the join. Good design
would dictate that the same data in more than one table should be
defined using the same column name, data type, and associated data
type specification (length, number of significant digits, etc ,,,) in
all tables.

When the column also appears in other tables you would want to check
the other tables data to be sure the values are always digits in these
tables also. The presence of non-digits in these other tables would
mean not only would data type conversion be required on joins but the
direction of the conversion becomes important.

You could easily use the to_char(column,'00000009') function with the
appropriate mask to display the leading zeroes on retieval. The
conversion to number would discard them without issue.

HTH -- Mark D Powell --

Reply With Quote
  #3  
Old   
joel garry
 
Posts: n/a

Default Re: Column space requirements - VARCHAR2 versus NUMBER - 05-04-2011 , 11:35 AM



On May 3, 6:26*pm, Luke <luke_ai... (AT) hotmail (DOT) com> wrote:
Quote:
Hello,

Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit

I have a table that will contain around 300 million rows so space
utilization is a primary criterion for the design of the table.

One of the columns will be loaded with a variable length numeric value
from 1 to 9999999999 that will contain leading zeroes.

Since this numeric column value is not used in any computations, if
space were not so paramount, I would normally define it as
varchar2(10) but we are considering defining it as NUMBER (realizing
that we will lose the leading zeroes when the column is populated).

My understanding is that the NUMBER datatype will generally require
less space than the VARCHAR2 datatype but is defining the column as
NUMBER a reasonable design compromise?

TIA

Luke
OK, so you are going to have at most 10 digits. From
http://www.jlcomp.demon.co.uk/faq/num_size.html that would take up 7
bytes (length of data plus column overhead) v. 11. So that would be
2.1G versus 3.3G.

I think you should put more weight on the processing requirements and
the issues Mark pointed out, and less on 1G here or there. If you are
doing full table scans, you might have other things to worry about in
your design.

You definitely need to evaluate whether the leading zero requirement
is just a holdover from the past or an over-generalized design, and
should be just a display issue.

jg
--
@home.com is bogus.
http://news.techworld.com/applicatio...ys-former-ceo/

Reply With Quote
  #4  
Old   
Peter J. Holzer
 
Posts: n/a

Default Re: Column space requirements - VARCHAR2 versus NUMBER - 05-07-2011 , 04:56 AM



On 2011-05-04 01:26, Luke <luke_airig (AT) hotmail (DOT) com> wrote:
Quote:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit

I have a table that will contain around 300 million rows so space
utilization is a primary criterion for the design of the table.

One of the columns will be loaded with a variable length numeric value
from 1 to 9999999999 that will contain leading zeroes.
What does "variable length ... that will contain leading zeroes" mean
exactly:

1) The value without leading zeroes is variable length, leading zeroes
are used to pad to 10 digits ("0000000042" is a legal value, but
"00042" is not).
2) The length including leading zeroes is variable, but the leading
zeroes are not significant ("0000000042" and "00042" could both occur
but are considered the same value)
3) The length including leading zeroes is variable and all digits are
significant ("0000000042" and "00042" can both occur and are
different values).

Quote:
Since this numeric column value is not used in any computations, if
space were not so paramount, I would normally define it as
varchar2(10) but we are considering defining it as NUMBER (realizing
that we will lose the leading zeroes when the column is populated).
This will only really work in the case 1 above. It will sort of work in
case 2 (i.e. you can still identify your records but users may be
irritated if "the computer eats their zeroes").

It will not work at all in case 3. You can, however, make it work in
case 3 by prepending a fixed digit. So you store "0000000042" as
10000000042 and "00042" as 100042. For display purposes you strip the
first digit. This will add 0.5 bytes per row on average.
Alternatively you could use a varchar2 and store each pair of digits in
a single ASCII character. That would shave off another byte compared to
storing it as number, but has the disadvantage that you have cryptic
strings instead of readable numbers in your database - probably not
worth the bother.

hp

Reply With Quote
  #5  
Old   
Rockhound57
 
Posts: n/a

Default Re: Column space requirements - VARCHAR2 versus NUMBER - 05-08-2011 , 10:08 PM



On Sat, 7 May 2011 11:56:46 +0200, "Peter J. Holzer"
<hjp-usenet2 (AT) hjp (DOT) at> wrote:

Quote:
On 2011-05-04 01:26, Luke <luke_airig (AT) hotmail (DOT) com> wrote:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit

I have a table that will contain around 300 million rows so space
utilization is a primary criterion for the design of the table.

One of the columns will be loaded with a variable length numeric value
from 1 to 9999999999 that will contain leading zeroes.

What does "variable length ... that will contain leading zeroes" mean
exactly:

1) The value without leading zeroes is variable length, leading zeroes
are used to pad to 10 digits ("0000000042" is a legal value, but
"00042" is not).
2) The length including leading zeroes is variable, but the leading
zeroes are not significant ("0000000042" and "00042" could both occur
but are considered the same value)
3) The length including leading zeroes is variable and all digits are
significant ("0000000042" and "00042" can both occur and are
different values).

Since this numeric column value is not used in any computations, if
space were not so paramount, I would normally define it as
varchar2(10) but we are considering defining it as NUMBER (realizing
that we will lose the leading zeroes when the column is populated).

This will only really work in the case 1 above. It will sort of work in
case 2 (i.e. you can still identify your records but users may be
irritated if "the computer eats their zeroes").

It will not work at all in case 3. You can, however, make it work in
case 3 by prepending a fixed digit. So you store "0000000042" as
10000000042 and "00042" as 100042. For display purposes you strip the
first digit. This will add 0.5 bytes per row on average.
Alternatively you could use a varchar2 and store each pair of digits in
a single ASCII character. That would shave off another byte compared to
storing it as number, but has the disadvantage that you have cryptic
strings instead of readable numbers in your database - probably not
worth the bother.

hp

Leading zeros... Think everyone haves them. There's a special place in
Hell for people who define things identified by things using leading
zeros!



Would help if we knew what this column content represented. Might lead
to less guessing, better modeling.

Two blind thoughts.

(1) This is 10 numeric characters in one column in a 300 million row
table. How many other columns, how big are they? or is it a waste of
effort to try to tweak a teeny bit of storage out of what may be
insignificant, in the grand scheme of things? If each record is fifty
columns, some of which are rather "fat", why bother? just store 10
characters & keep all the zeros.

(2) If, for some reason, these mysterious leading zeros all must be
there to a total of ten charactes worth in the string, (God knows why
- some sort of part number, maybe?), why not just get the zeros "back"
when retrieving them using LPAD in SQL & not bother with storing them
in the table at all?

But a purist would say that if something is part # "000564" I'd want
to store 000564, NOT 564. And I'd have to say I'd agree, too.
Integrity of the data is more important than saving a byte or 2.

5 years from now, when the systems have grown, mutated & changed
beyond all recognition, nobody will ever know why that particular
column happened to be fiddled with - best to "keep it real".

Reply With Quote
  #6  
Old   
Walt
 
Posts: n/a

Default Re: Column space requirements - VARCHAR2 versus NUMBER - 05-09-2011 , 03:01 PM



On 5/8/2011 11:08 PM, Rockhound57 wrote:

Quote:
Leading zeros... Think everyone haves them.
Rockhound,

Did you meant to say "everyone *hates* them"? If so, I'm in complete
agreement.

Luke,

Unless you are working with legacy data where (for instance) you need
to distinguish between 42 042 and 000042, everything will work much
much better if you just use the Number datatype. Numeric-like data with
leading zeroes is a notorious bug-inducer - in particular, Microsoft Excel has
fits whenever it encounters numeric-like data with leading zeroes.

If you can design them out of the spec now, you will save yourself and
your colleagues many many frustrating hours of coding around the leading
zeroes.

There are space and speed advantages of going with the Number datatype
as well, but they pale in comparison to the frustration you'll encounter
when the system *breaks* rather than just being slow or taking up a lot
of space.

//Walt

Reply With Quote
  #7  
Old   
Rockhound57
 
Posts: n/a

Default Re: Column space requirements - VARCHAR2 versus NUMBER - 05-10-2011 , 10:37 AM



On Mon, 09 May 2011 16:01:16 -0400, Walt <walt_askier (AT) yahoo (DOT) com>
wrote:

Quote:
On 5/8/2011 11:08 PM, Rockhound57 wrote:


Leading zeros... Think everyone haves them.
Rockhound,

Did you meant to say "everyone *hates* them"? If so, I'm in complete
agreement.

Exactly. as usual, I can't type worth spit.


Quote:
Luke,

Unless you are working with legacy data where (for instance) you need
to distinguish between 42 042 and 000042, everything will work much
much better if you just use the Number datatype. Numeric-like data with
leading zeroes is a notorious bug-inducer - in particular, Microsoft Excel has
fits whenever it encounters numeric-like data with leading zeroes.

Modeling here is key. Wish we had a clue as to what this column data
was representing, the advice would be better. If it's some sort of
part number, where the zeros are a part of the identifier, they're
something that has to be kept. But, not knowing what the information
in the table IS, it's hard to be truly helpful.

As I often say, the "DBA Ouija board" is down for maintenance today.





Quote:
If you can design them out of the spec now, you will save yourself and
your colleagues many many frustrating hours of coding around the leading
zeroes.

There are space and speed advantages of going with the Number datatype
as well, but they pale in comparison to the frustration you'll encounter
when the system *breaks* rather than just being slow or taking up a lot
of space.

//Walt

Reply With Quote
  #8  
Old   
Tim X
 
Posts: n/a

Default Re: Column space requirements - VARCHAR2 versus NUMBER - 06-14-2011 , 03:44 AM



Luke <luke_airig (AT) hotmail (DOT) com> writes:

Quote:
Hello,

Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit

I have a table that will contain around 300 million rows so space
utilization is a primary criterion for the design of the table.

One of the columns will be loaded with a variable length numeric value
from 1 to 9999999999 that will contain leading zeroes.

Since this numeric column value is not used in any computations, if
space were not so paramount, I would normally define it as
varchar2(10) but we are considering defining it as NUMBER (realizing
that we will lose the leading zeroes when the column is populated).

My understanding is that the NUMBER datatype will generally require
less space than the VARCHAR2 datatype but is defining the column as
NUMBER a reasonable design compromise?

TIA

While I can understand your concern, I think it may be misplaced.

It is likely that if you have a requirement to have numbers with leading
0, these are not really number, but identifiers (such as a staff number
or similar). In this case, you are really dealing with a string of
characters that just happen to all be digits.

Now, one of the advantages of varchar2 is that the column will only use
the space it requires while number columns will use a fixed size. So,
the question is, will the average length of your varchar2 columns be
larger than the fixed width number columns? But I don't think that is
even the major point and suspect the differences will, in the scale of
things, be minimal.

What is possibly more crucial is how the value is going to be processed/used.
For example, if the leading zeros are important, maybe you will end up
with lots of to_char conversions to pad with leading zeros or maybe you
will need to doing joins with other tables where the data is represented
as varchar2 etc.

Sorry this isn't really answering the question, but the problem is you
cannot give an adequate answer without a lot more information regarding
the application and how the value is used. One thing I can say, I've
lost count of the number of times developers have used a number column
to represent a value when it should have been represented as a varchar2
and was only defined as being a number because it represented some
attribute that just happened to have the word number in its name, such
as staff number. My rule of thumb is that if the value is not going to
be used in mathematical calculations and your in doubt, make it a
varchar2.

Tim




--
tcross (at) rapttech dot com dot au

Reply With Quote
  #9  
Old   
Luuk
 
Posts: n/a

Default Re: Column space requirements - VARCHAR2 versus NUMBER - 06-14-2011 , 01:29 PM



On 14-06-2011 10:44, Tim X wrote:
Quote:
Luke <luke_airig (AT) hotmail (DOT) com> writes:

Hello,

Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit

I have a table that will contain around 300 million rows so space
utilization is a primary criterion for the design of the table.

One of the columns will be loaded with a variable length numeric value
from 1 to 9999999999 that will contain leading zeroes.

Since this numeric column value is not used in any computations, if
space were not so paramount, I would normally define it as
varchar2(10) but we are considering defining it as NUMBER (realizing
that we will lose the leading zeroes when the column is populated).

My understanding is that the NUMBER datatype will generally require
less space than the VARCHAR2 datatype but is defining the column as
NUMBER a reasonable design compromise?

TIA


While I can understand your concern, I think it may be misplaced.

It is likely that if you have a requirement to have numbers with leading
0, these are not really number, but identifiers (such as a staff number
or similar). In this case, you are really dealing with a string of
characters that just happen to all be digits.

Now, one of the advantages of varchar2 is that the column will only use
the space it requires while number columns will use a fixed size. So,
the question is, will the average length of your varchar2 columns be
larger than the fixed width number columns? But I don't think that is
even the major point and suspect the differences will, in the scale of
things, be minimal.

What is possibly more crucial is how the value is going to be processed/used.
For example, if the leading zeros are important, maybe you will end up
with lots of to_char conversions to pad with leading zeros or maybe you
will need to doing joins with other tables where the data is represented
as varchar2 etc.

Sorry this isn't really answering the question, but the problem is you
cannot give an adequate answer without a lot more information regarding
the application and how the value is used. One thing I can say, I've
lost count of the number of times developers have used a number column
to represent a value when it should have been represented as a varchar2
and was only defined as being a number because it represented some
attribute that just happened to have the word number in its name, such
as staff number. My rule of thumb is that if the value is not going to
be used in mathematical calculations and your in doubt, make it a
varchar2.

Tim




If those columns always contain 10 numbers
would it make a difference to define the table as
CREATE TABLE table ( column1 CHAR(20));
or
CREATE TABLE table ( column1 VARCHAR(20));

?

--
Luuk

Reply With Quote
  #10  
Old   
joel garry
 
Posts: n/a

Default Re: Column space requirements - VARCHAR2 versus NUMBER - 06-15-2011 , 01:27 PM



On Jun 14, 11:29*am, Luuk <L... (AT) invalid (DOT) lan> wrote:
Quote:
On 14-06-2011 10:44, Tim X wrote:



Luke <luke_ai... (AT) hotmail (DOT) com> writes:

Hello,

Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit

I have a table that will contain around 300 million rows so space
utilization is a primary criterion for the design of the table.

One of the columns will be loaded with a variable length numeric value
from 1 to 9999999999 that will contain leading zeroes.

Since this numeric column value is not used in any computations, if
space were not so paramount, I would normally define it as
varchar2(10) but we are considering defining it as NUMBER (realizing
that we will lose the leading zeroes when the column is populated).

My understanding is that the NUMBER datatype will generally require
less space than the VARCHAR2 datatype but is defining the column as
NUMBER a reasonable design compromise?

TIA

While I can understand your concern, I think it may be misplaced.

It is likely that if you have a requirement to have numbers with leading
0, these are not really number, but identifiers (such as a staff number
or similar). In this case, you are really dealing with a string of
characters that just happen to all be digits.

Now, one of the advantages of varchar2 is that the column will only use
the space it requires while number columns will use a fixed size. So,
the question is, will the average length of your varchar2 columns be
larger than the fixed width number columns? But I don't think that is
even the major point and suspect the differences will, in the scale of
things, be minimal.

What is possibly more crucial is how the value is going to be processed/used.
For example, if the leading zeros are important, maybe you will end up
with lots of to_char conversions to pad with leading zeros or maybe you
will need to doing joins with other tables where the data is represented
as varchar2 etc.

Sorry this isn't really answering the question, but the problem is you
cannot give an adequate answer without a lot more information regarding
the application and how the value is used. One thing I can say, I've
lost count of the number of times developers have used a number column
to represent a value when it should have been represented as a varchar2
and was only defined as being a number because it represented some
attribute that just happened to have the word number in its name, such
as staff number. My rule of thumb is that if the value is not going to
be used in mathematical calculations and your in doubt, make it a
varchar2.

Tim

If those columns always contain 10 numbers
would it make a difference to define the table as
CREATE TABLE table ( column1 CHAR(20));
or
CREATE TABLE table ( column1 VARCHAR(20));

?

--
Luuk
Canonical answer: http://asktom.oracle.com/pls/asktom/...D:123212348063

jg
--
@home.com is bogus.
http://www.informationweek.com/news/...e/bi/230700013

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.