dbTalk Databases Forums  

Large(r) databases - best practices

sybase.public.sqlanywhere.general sybase.public.sqlanywhere.general


Discuss Large(r) databases - best practices in the sybase.public.sqlanywhere.general forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
jeff
 
Posts: n/a

Default Large(r) databases - best practices - 04-02-2006 , 02:40 PM






Hello.

I've been using sqlanywhere for about two years and am starting to collect a
fairly substantial amount of data. I currently have ~20 database ranging from
small < 1G to larger databases, the largest being 22G (today). i have them
being backed up daily on the same machine they are served up on, via BACKUP
DATABASE statements, and then sent over a Gigabit LAN connection to a backup
host - infrant readyNAS 370G RAID1

From what i can find online, size does not seem to be much of a concern so long
as the OS supports it (using W2K3 - 300G total in RAID5 with 3G RAM)

Each db is 1 file, not broken up into tablespaces. There are three large tables
in each database that store 1 or 2 LONGBINARY column(s) plus the configuration,
in the same row, that accompanies this data, which are 2000->4000+ 32bit floats.

The largest database is, as you may have excepted, also the fastest growing.

I've not seen any performance degradation as these databases get bigger, but
want to understand if i should be doing something different. The databases are
inserted into very regularly, but only read from to populate front end databases
which drive our main application.

so is there a question in here? just wanted to explain what i have and am
looking for opinions as how to better handle it, or understand that this
configuration is fine.

Happy to provide more info if needed.

Thanks for reading.
Jeff

Reply With Quote
  #2  
Old   
snelbert
 
Posts: n/a

Default Re: Large(r) databases - best practices - 04-03-2006 , 05:43 AM






I am no expert but I would advise running a database validation on a regular
basis, probably, right before you backup the data.

Snelbert




"jeff" <jdlists (AT) gmail (DOT) com> wrote

Quote:
Hello.

I've been using sqlanywhere for about two years and am starting to collect
a fairly substantial amount of data. I currently have ~20 database
ranging from small < 1G to larger databases, the largest being 22G
(today). i have them being backed up daily on the same machine they are
served up on, via BACKUP DATABASE statements, and then sent over a Gigabit
LAN connection to a backup host - infrant readyNAS 370G RAID1

From what i can find online, size does not seem to be much of a concern so
long as the OS supports it (using W2K3 - 300G total in RAID5 with 3G RAM)

Each db is 1 file, not broken up into tablespaces. There are three large
tables in each database that store 1 or 2 LONGBINARY column(s) plus the
configuration, in the same row, that accompanies this data, which are
2000->4000+ 32bit floats.

The largest database is, as you may have excepted, also the fastest
growing.

I've not seen any performance degradation as these databases get bigger,
but want to understand if i should be doing something different. The
databases are inserted into very regularly, but only read from to populate
front end databases which drive our main application.

so is there a question in here? just wanted to explain what i have and am
looking for opinions as how to better handle it, or understand that this
configuration is fine.

Happy to provide more info if needed.

Thanks for reading.
Jeff



Reply With Quote
  #3  
Old   
Pavel Karady
 
Posts: n/a

Default Re: Large(r) databases - best practices - 04-03-2006 , 06:12 AM



Hello,

you've missed a webcast about practices using 100GB+ databases, which is
just a week or two away now. Basically:

- RAID 0 or 1 perform slightly better than RAID 5 (RAID 5 performs better
only when writing small chunks of data to different disk locations - at
least that's what I've read somewhere - and that's why using ASA together
with RAID 5 is slower than with RAID 0 or 1)

- dbspaces have no impact on the performance. the only MAJOR harddisk thing
that has impact on perf is it's speed: use 10.000 rpm SCSI harddisk is
applicable... the faster the better - and harddisk are cheap nowadays so
there's no reason to buy a 100.000$ box with 7200 rpm disk... (and really
hard to understand)

Achieving higher throughput:

- db design - get rid of any columns that you do not want. the less columns
in a table the better.

- get rid of every index you don't really need

- remove fk's which are present only for "completeness" and
design-accuration

- use autoincrement... in the webcast, 94% speedup of DELETE statement (!!)
was presented when autoincrements has been used on the same data

- use test to measure speed... build up for exmaple a sophisticated,
time-measuring 100.000 inserts test and measure it's time with different
db/disk/system settings (and, of course, never trust your first test...
completely discard the result and repeat the inserts 2-3-4 times until it's
speed does not change and take that as a result)

all of these suggestions are optional, e.g. you should consider if removing
uneccesary fk's is really what you want, if not and perf does not degrade as
the time goes, then you don't need to do that - and so on. hope this helped
somewhat

Pavel


"jeff" <jdlists (AT) gmail (DOT) com> wrote

Quote:
Hello.

I've been using sqlanywhere for about two years and am starting to collect
a fairly substantial amount of data. I currently have ~20 database
ranging from small < 1G to larger databases, the largest being 22G
(today). i have them being backed up daily on the same machine they are
served up on, via BACKUP DATABASE statements, and then sent over a Gigabit
LAN connection to a backup host - infrant readyNAS 370G RAID1

From what i can find online, size does not seem to be much of a concern so
long as the OS supports it (using W2K3 - 300G total in RAID5 with 3G RAM)

Each db is 1 file, not broken up into tablespaces. There are three large
tables in each database that store 1 or 2 LONGBINARY column(s) plus the
configuration, in the same row, that accompanies this data, which are
2000->4000+ 32bit floats.

The largest database is, as you may have excepted, also the fastest
growing.

I've not seen any performance degradation as these databases get bigger,
but want to understand if i should be doing something different. The
databases are inserted into very regularly, but only read from to populate
front end databases which drive our main application.

so is there a question in here? just wanted to explain what i have and am
looking for opinions as how to better handle it, or understand that this
configuration is fine.

Happy to provide more info if needed.

Thanks for reading.
Jeff



Reply With Quote
  #4  
Old   
Breck Carter [Team iAnywhere]
 
Posts: n/a

Default Re: Large(r) databases - best practices - 04-03-2006 , 09:17 AM



On 2 Apr 2006 11:40:17 -0800, jeff <jdlists (AT) gmail (DOT) com> wrote:

Quote:
Hello.

I've been using sqlanywhere for about two years and am starting to collect a
fairly substantial amount of data. I currently have ~20 database ranging from
small < 1G to larger databases, the largest being 22G (today). i have them
being backed up daily on the same machine they are served up on, via BACKUP
DATABASE statements, and then sent over a Gigabit LAN connection to a backup
host - infrant readyNAS 370G RAID1
22G is larger than the average SQL Anywhere database but it is by no
means "huge" (look, there are a bazillion tiny embedded SQL Anywhere
databases so the "average" size is always going to be tiny... even a
hundred 300G+ databases ain't going to skew the stats

If regular full offline or online backups become problematic (and they
might not, if the database isn't heavily used 24 by 7), you might
consider differential or incremental online log backups. Plus a live
log backup process to another machine, if speedy failover is a desire.

Quote:
From what i can find online, size does not seem to be much of a concern so long
as the OS supports it (using W2K3 - 300G total in RAID5 with 3G RAM)

Each db is 1 file, not broken up into tablespaces. There are three large tables
in each database that store 1 or 2 LONGBINARY column(s) plus the configuration,
in the same row, that accompanies this data, which are 2000->4000+ 32bit floats.
Are you saying there are 2000 to 4000 *columns* in the table? Just
asking.

There are things you can do with fat rows, depending on how badly they
are performing, but you haven't said anything about performance.

FWIW Foxhound lets see how much space is allocated to each table,
index, etc... if your database is V9 that is (it works on V5 - V8
databases too, but doesn't show details of disk space usage). See
http://www.risingroad.com/foxhound_beta_3.html

Quote:
The largest database is, as you may have excepted, also the fastest growing.

I've not seen any performance degradation as these databases get bigger,
That's good. The Rules on Optimization are this:

1. Don't do it.
2. (for experts only) Don't do it yet.

Quote:
but
want to understand if i should be doing something different. The databases are
inserted into very regularly, but only read from to populate front end databases
which drive our main application.
Use the wonderful new LogExpensiveQueries feature to find slow SQL;
see the SQL Anywhere Developer's Corner at
http://www.ianywhere.com/developer/s...eloper_corner/

Quote:
so is there a question in here? just wanted to explain what i have and am
looking for opinions as how to better handle it, or understand that this
configuration is fine.
The old saying "if it ain't broke don't fix it" seems to apply here.

Pavel gave you the major points from Todd Loomis' recent webcast, but
I'm not sure your database falls into the "high throughput" category,
which must accomodate traffic like 5G inserts and 5G deletes per day.

Todd's whitepaper is here
http://www.ianywhere.com/whitepapers...ut_sqlany.html

Eventually the webcast itself should appear here (and AFAIK Todd put
some more detail in the webcast)
http://www.ianywhere.com/products/presentations.html

Breck

--
Breck Carter [Team iAnywhere]
RisingRoad SQL Anywhere and MobiLink Professional Services
www.risingroad.com
The book: http://www.risingroad.com/SQL_Anywhe...ers_Guide.html
breck.carter (AT) risingroad (DOT) com


Reply With Quote
  #5  
Old   
Greg Fenton
 
Posts: n/a

Default Re: Large(r) databases - best practices - 04-03-2006 , 11:15 AM



snelbert wrote:
Quote:
I am no expert but I would advise running a database validation on a regular
basis, probably, right before you backup the data.
I would be running validation on the *backup* after the backup is
performed (actually, validating a copy of the backup, or while running
the backup in read-only mode).

This way validation won't affect production database performance,
validation will be done on a system that is not concurrently being
modified (which can lead to false validation failures), and most
importantly it proves that your backup is valid. There is nothing worse
than pulling your backup to recover a lost system only to find that your
backup was corrupted.

greg.fenton
--
Greg Fenton
Consultant, Solution Services, iAnywhere Solutions
--------
Visit the iAnywhere Solutions Developer Community
Whitepapers, TechDocs, Downloads
http://www.ianywhere.com/developer/


Reply With Quote
  #6  
Old   
Greg Fenton
 
Posts: n/a

Default Re: Large(r) databases - best practices - 04-03-2006 , 11:23 AM



Pavel Karady wrote:
Quote:
- RAID 0 or 1 perform slightly better than RAID 5 (RAID 5 performs better
only when writing small chunks of data to different disk locations - at
least that's what I've read somewhere - and that's why using ASA together
with RAID 5 is slower than with RAID 0 or 1)
Most database vendors (iAnywhere included) recommend RAID 1+0 (or "RAID
0+1" or RAID "10"). Performance (RAID 0, striping) and protection (RAID
1, mirroring). RAID 5 is a close second (striping and parity disk).
Parity is typically slower than mirroring, but mirroring is more
expensive (more disks needed).

For a brief intro to RAID levels:

http://searchstorage.techtarget.com/...214332,00.html

and

http://www.adaptec.com/worldwide/pro...tion_of_ra id

greg.fenton
--
Greg Fenton
Consultant, Solution Services, iAnywhere Solutions
--------
Visit the iAnywhere Solutions Developer Community
Whitepapers, TechDocs, Downloads
http://www.ianywhere.com/developer/


Reply With Quote
  #7  
Old   
Klaus P. Pieper
 
Posts: n/a

Default Re: Large(r) databases - best practices - 04-03-2006 , 12:33 PM



Breck Carter [Team iAnywhere] schrieb:
Quote:
If regular full offline or online backups become problematic (and they
might not, if the database isn't heavily used 24 by 7), you might
consider differential or incremental online log backups. Plus a live
log backup process to another machine, if speedy failover is a desire.
I have a client who wants an hourly full backup of a database (including
the log file). Despite this database being much smaller (about 1GB), it
turns out that the backup process slows down the entire operation
significantly for a few minutes - just as long as the backup takes. The
server is already equipped with fast SCSI drives.

Is there a way to reduce the priority of the backup process or to run
the backup process just when the database server is running idle?

Thanks for all answers

Klaus

--
reply to pub . kp2 . pieper at ibeq . com


Reply With Quote
  #8  
Old   
Breck Carter [Team iAnywhere]
 
Posts: n/a

Default Re: Large(r) databases - best practices - 04-03-2006 , 01:35 PM



What version and build are you using?

How is the backup performed? What is the *exact* command line or SQL
statement?

Is the drive fragmented?

Has the database file been expanded via ALTER DBSPACE to allocate free
space? Defragmentation should be done *after* this process.

How often does the server take checkpoints?

Are there any client processes that run explicit CHECKPOINT commands,
or force implicit checkpoints (e.g., ALTER TABLE)? Checkpoints are
suspended while the database file is backed up. This should just block
the connection(s) trying the checkpoints, BUT...

There there was an issue at one time where a backup competing with an
ALTER TABLE would cause *all* client connections to cease operating
until the backup finished. I think this is the description of the fix
made in 9.0.2.3202...

=====
Build #3202 - Engineering Case #406421
If a checkpoint was initiated while a backup was in progress, the
changes made for Engineering Case 400911 would have prevented
subsequent commits from completing until the backup had finished. The
server now allows requests attempting to suspend checkpoints to
progress if there is a transaction already suspending checkpoints as
a result of a backup or other such operation.
=====

Note that connections blocked by checkpoints do not show up in
sa_conn_info() or Block Sniffer.

Where is the temporary file stored?

Does the server have lots of RAM?

Have you tried the BACKUP DATABASE command, or dbbackup -s which
invokes BACKUP DATABASE? This does faster server push backup rather
than client pull.

Here is an alternative scheme that may give *better* protection as
well as (probably) better performance: Daily full database and log
backup to a separate machine, plus log mirroring (to the same
machine), plus hourly differential transaction log backup or a live
log backup to the separate machine. With an hourly full backup a
complete disaster (grenaded server) could cost the client an hour of
work. The same is true of hourly anything. With a live log backup,
that's reduced to a few seconds loss. The mirror log makes recovery
easier in the less-disastrous cases.

Breck




On 3 Apr 2006 09:33:45 -0800, "Klaus P. Pieper" <me (AT) private (DOT) net>
wrote:

Quote:
Breck Carter [Team iAnywhere] schrieb:

If regular full offline or online backups become problematic (and they
might not, if the database isn't heavily used 24 by 7), you might
consider differential or incremental online log backups. Plus a live
log backup process to another machine, if speedy failover is a desire.

I have a client who wants an hourly full backup of a database (including
the log file). Despite this database being much smaller (about 1GB), it
turns out that the backup process slows down the entire operation
significantly for a few minutes - just as long as the backup takes. The
server is already equipped with fast SCSI drives.

Is there a way to reduce the priority of the backup process or to run
the backup process just when the database server is running idle?

Thanks for all answers

Klaus
--
Breck Carter [Team iAnywhere]
RisingRoad SQL Anywhere and MobiLink Professional Services
www.risingroad.com
The book: http://www.risingroad.com/SQL_Anywhe...ers_Guide.html
breck.carter (AT) risingroad (DOT) com


Reply With Quote
  #9  
Old   
Klaus P. Pieper
 
Posts: n/a

Default Re: Large(r) databases - best practices - 04-03-2006 , 06:20 PM



Breck,

Breck Carter [Team iAnywhere] schrieb:
Quote:
What version and build are you using?
9.0.2.3198

Quote:
How is the backup performed?
A Windows service calls the backup routine over an OleDb Connection.

Quote:
What is the *exact* command line or SQL statement?
BACKUP DATABASE DIRECTORY '\\\\SERVER\\DIRECTORY'

The database server and the backup machine are connected with a Gbit
LAN. Just one switch between the two machines & not too much traffic in
the network.

Quote:
Is the drive fragmented?
Not that I'm aware off, but I would have to verify this.

Quote:
Has the database file been expanded via ALTER DBSPACE to allocate free
space? Defragmentation should be done *after* this process.
There hasn't been any defragmentation recently but the database was
completely unloaded and rebuilt 2 months ago. We noticed that the backup
slowed down the operation right after the database was rebuilt. The
database statistics do not show any significant fragmentation and the
disk has plenty of free space.

Quote:
How often does the server take checkpoints?
I'll have to check this.

Quote:
Are there any client processes that run explicit CHECKPOINT commands,
or force implicit checkpoints (e.g., ALTER TABLE)? Checkpoints are
suspended while the database file is backed up. This should just block
the connection(s) trying the checkpoints, BUT...
No, the operations which are impacted by the backup are just usual
selects, inserts & updates. Rarely deletes.

Quote:
There there was an issue at one time where a backup competing with an
ALTER TABLE would cause *all* client connections to cease operating
until the backup finished. I think this is the description of the fix
made in 9.0.2.3202...

=====
Build #3202 - Engineering Case #406421
If a checkpoint was initiated while a backup was in progress, the
changes made for Engineering Case 400911 would have prevented
subsequent commits from completing until the backup had finished. The
server now allows requests attempting to suspend checkpoints to
progress if there is a transaction already suspending checkpoints as
a result of a backup or other such operation.
=====

Note that connections blocked by checkpoints do not show up in
sa_conn_info() or Block Sniffer.
As I read this, it seems that *any* checkpoint (not only one caused by
an ALTER TABLE statment) which is initiated during a backup process
would cause the server to suspend other activities. I'll check this out.

Quote:
Where is the temporary file stored?
On the lokal drive c:\windows\temp

Quote:
Does the server have lots of RAM?
3.5GB
Not much more running on the same machine, it is more or less dedicated.

Quote:
Have you tried the BACKUP DATABASE command, or dbbackup -s which
invokes BACKUP DATABASE? This does faster server push backup rather
than client pull.
Yes, that's how the system works.

Quote:
Here is an alternative scheme that may give *better* protection as
well as (probably) better performance: Daily full database and log
backup to a separate machine, plus log mirroring (to the same
machine), plus hourly differential transaction log backup or a live
log backup to the separate machine. With an hourly full backup a
complete disaster (grenaded server) could cost the client an hour of
work. The same is true of hourly anything. With a live log backup,
that's reduced to a few seconds loss. The mirror log makes recovery
easier in the less-disastrous cases.
Thanks, I'll give this some thoughts.

Klaus

--
reply to pub . kp2 . pieper at ibeq . com


Reply With Quote
  #10  
Old   
Pavel Karady
 
Posts: n/a

Default Re: Large(r) databases - best practices - 04-06-2006 , 05:28 AM



"Klaus P. Pieper" <me (AT) private (DOT) net> wrote

Quote:
BACKUP DATABASE DIRECTORY '\\\\SERVER\\DIRECTORY'
Just an almost-off-topic question: the BACKUP statement is not recommended
to be used with network addresses, is it?

Pavel




Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.