dbTalk Databases Forums  

reducing server load by crazy data distribution

comp.databases.mysql comp.databases.mysql


Discuss reducing server load by crazy data distribution in the comp.databases.mysql forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Jacek Krysztofik
 
Posts: n/a

Default reducing server load by crazy data distribution - 01-01-2011 , 01:46 PM






-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hi,
I got involved in a project that, apart from a ton of spaghetti code,
has something they call "db split functionality." NOTE: this is in no
way table partitioning and the longest table In short it means they have
several clones of the database (on the same server) with the same
structure but different set of data. The piece of code that takes care
of switching between the databases is below.

I was told the data has been distributed to decrease the server load. In
the meantime there appearead a bunch of (id) inconsistencies between the
data in the clones and other non-cloned parts of the database (ids in
serialized data &c).

- From the start the solution to the problem of high server load by
shortening the table content seemed outlandish. It is done by
arbitrarily assigning parts of uniformly structured data to various
database clones. Never mind that the table structure and indexing are,
well, undesigned. Variable lengths everywhere, 20+ columns a table, no
constraints or relations, boolean values done with 'yes'/'no' enums,
stuff like this.

I am already dizzy because of this project (no we can't refactor or
overhaul, because we have deadlines and new dev are coming to learn this
big ball of mud) so my question to you is:
is the "db split functionality" as crazy as I think or am I going crazy?
IMO the proper design and indexing would have been sufficient to reduce
the server load (seek times mostly). But I am really starting to doubt
if the Earth is round... Please help.

Here's a part of the php "code" that goes about "db splitting".

class Item {

<- snip ->

// this func takes care of the db split functionality
function _changeUserDB()
{
global $_dbDetails, $connCounter;
if(trim($this->config_dbase_prefix)=='')
return;
$_dbDetails['name']=$this->config_dbase_prefix;
$this->userDbx='`'.$this->config_dbase_prefix.'`.';
$this->_connect(); // just in case, we do it over;
}

<- snip ->

function _connect()
{
global $_dbDetails;
if(isset($this))
{
$this->_dbLink =
mysql_connect($_dbDetails['host'],$_dbDetails['user'],$_dbDetails['pass']);
mysql_select_db($_dbDetails['name'],$this->_dbLink);
}
else
{
$dbLink =
mysql_connect($_dbDetails['host'],$_dbDetails['user'],$_dbDetails['pass']);
mysql_select_db($_dbDetails['name'],$dbLink);
}
<- snip ->
}
}
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iF4EAREIAAYFAk0fhK4ACgkQfD3PECtxdkVgBgD/TCiodXRLzJcAsemhHl+zStxY
/nwYq8Ll4VxeEPdk1sQBAMceCn7bWd670XdI4vctaOiZ8ZYlZsm JQRRCT2z2WojF
=i1Pw
-----END PGP SIGNATURE-----

Reply With Quote
  #2  
Old   
Jerry Stuckle
 
Posts: n/a

Default Re: reducing server load by crazy data distribution - 01-01-2011 , 04:25 PM






On 1/1/2011 2:46 PM, Jacek Krysztofik wrote:
Quote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hi,
I got involved in a project that, apart from a ton of spaghetti code,
has something they call "db split functionality." NOTE: this is in no
way table partitioning and the longest table In short it means they have
several clones of the database (on the same server) with the same
structure but different set of data. The piece of code that takes care
of switching between the databases is below.

I was told the data has been distributed to decrease the server load. In
the meantime there appearead a bunch of (id) inconsistencies between the
data in the clones and other non-cloned parts of the database (ids in
serialized data&c).

- From the start the solution to the problem of high server load by
shortening the table content seemed outlandish. It is done by
arbitrarily assigning parts of uniformly structured data to various
database clones. Never mind that the table structure and indexing are,
well, undesigned. Variable lengths everywhere, 20+ columns a table, no
constraints or relations, boolean values done with 'yes'/'no' enums,
stuff like this.

I am already dizzy because of this project (no we can't refactor or
overhaul, because we have deadlines and new dev are coming to learn this
big ball of mud) so my question to you is:
is the "db split functionality" as crazy as I think or am I going crazy?
IMO the proper design and indexing would have been sufficient to reduce
the server load (seek times mostly). But I am really starting to doubt
if the Earth is round... Please help.

Here's a part of the php "code" that goes about "db splitting".

class Item {

- snip -

// this func takes care of the db split functionality
function _changeUserDB()
{
global $_dbDetails, $connCounter;
if(trim($this->config_dbase_prefix)=='')
return;
$_dbDetails['name']=$this->config_dbase_prefix;
$this->userDbx='`'.$this->config_dbase_prefix.'`.';
$this->_connect(); // just in case, we do it over;
}

- snip -

function _connect()
{
global $_dbDetails;
if(isset($this))
{
$this->_dbLink =
mysql_connect($_dbDetails['host'],$_dbDetails['user'],$_dbDetails['pass']);
mysql_select_db($_dbDetails['name'],$this->_dbLink);
}
else
{
$dbLink =
mysql_connect($_dbDetails['host'],$_dbDetails['user'],$_dbDetails['pass']);
mysql_select_db($_dbDetails['name'],$dbLink);
}
- snip -
}
}
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iF4EAREIAAYFAk0fhK4ACgkQfD3PECtxdkVgBgD/TCiodXRLzJcAsemhHl+zStxY
/nwYq8Ll4VxeEPdk1sQBAMceCn7bWd670XdI4vctaOiZ8ZYlZsm JQRRCT2z2WojF
=i1Pw
-----END PGP SIGNATURE-----
Impossible to tell without a close examination of the databases and how
the databases are used. Separating data into different databases can
improve performance, or it can hurt performance.

For instance, let's say you have 10 years of sales records, but 95% of
the queries are for data from the last six months, with only an
occasional reference to older data. You *may* be able to improve
performance by archiving data older than six months in another table.
But it's not always true, either. There are just too many variables to
be able to tell without that examination.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex (AT) attglobal (DOT) net
==================

Reply With Quote
  #3  
Old   
Axel Schwenke
 
Posts: n/a

Default Re: reducing server load by crazy data distribution - 01-01-2011 , 05:36 PM



Jacek Krysztofik <default (AT) tlen (DOT) pl> wrote:

Quote:
I got involved in a project that, apart from a ton of spaghetti code,
has something they call "db split functionality." NOTE: this is in no
way table partitioning and the longest table In short it means they have
several clones of the database (on the same server) with the same
structure but different set of data.

I was told the data has been distributed to decrease the server load.
If those databases are located on the same physical machine, then this
makes little to no sense.

Quote:
... my question to you is:
is the "db split functionality" as crazy as I think or am I going crazy?
If this is done properly, it is called "sharding". It's very similar
to regular partitioning except that:

- shards are defined by logical aspects of data, typically each
shard is 100% self-contained

- distribution of data to shards and (if necessary) combination of
data from several shards is done in application code

- different shards reside on different (physical) hardware, normally
the shards are completely independent

Sharding is used by huge sites where there is too much data to handle
it (economically) in a single database instance. The goal is to
distribute data and work load to as many cheap cpu cores as possible.

Extreme example of that philosophy is Google. They splitted the search
index to literally(!) hundreds of thousands of cheap computers.


XL

Reply With Quote
  #4  
Old   
Axel Schwenke
 
Posts: n/a

Default Re: reducing server load by crazy data distribution - 01-02-2011 , 07:41 AM



Jacek Krysztofik <default (AT) tlen (DOT) pl> wrote:
Quote:
Axel Schwenke wrote:
If those databases are located on the same physical machine, then this
makes little to no sense.

Everything is on one machine, Quad-Core AMD Opteron(tm) Processor 2378.
Yeah, makes no sense

Quote:
If this is done properly, it is called "sharding". It's very similar
to regular partitioning except that:

- shards are defined by logical aspects of data, typically each
shard is 100% self-contained

Nope, I mean here, but they are only related to the "main" db.
Well, I didn't say what you see *is* sharding. It looks like a lame
approach to make the database distributable from the beginning, so
it can be sharded and distributed later (but this is only a first
guess and I won't dive into that matter)

From all the other points you name:

Quote:
The tables are not normalized, indexes are made just for the sake of
their existence, most of the time they don't watch anything relevant
IMHO the whole project looks FUBAR. My advise would be to dump the
existing stuff and start over. Unfortunately you already excluded
this option.

PS: just to make this point clear: while sharding is regarded a good
way out of the database scalability trap, it's not a silver bullet.
One should always start with a normalized, single instance setup.
Then optimize known (measured!) hotspots. And only as last resort
start thinking about a distributed database setup.


XL

Reply With Quote
  #5  
Old   
Jacek Krysztofik
 
Posts: n/a

Default Re: reducing server load by crazy data distribution - 01-02-2011 , 08:42 AM



-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Axel Schwenke wrote:
Quote:
Well, I didn't say what you see *is* sharding.
I know, I'm just saying that what is here doesn't meet the criteria.

Quote:
It looks like a lame approach to make the database distributable
from the beginning, so it can be sharded and distributed later
No, it was done when the previous machine choked on the load.

Quote:
IMHO the whole project looks FUBAR. My advise would be to dump the
existing stuff and start over.
I am of the exact same opinion.

Quote:
Unfortunately you already excluded this option.
Well, the owner said no. I'm new to the team (2 devs plus one
"designer"/admin). The boss' idea of managing the project is to hire new
people to implement new features ASAP (nevermind Brooks' law). My
arguments that this idea is the recipe for disaster are not understood.

Quote:
Then optimize known (measured!) hotspots. And only as last resort
start thinking about a distributed database setup.
That's the problem with this project. For three years there was no time
for overhaul, because "they had too many TODOs" (as if a refactoring or
overhaul wasn't a TODO and STAT). There was only a time for hotfixes and
workarounds on a codebase created by a PHP beginner.

Is there a software engineering newsgroup or do you have any experience
in this matter? IMO the project is doomed unless the following are done:

- - hire 3-5 new devs with closely related framework experience, at least
one predominantly for view layer, at least one with big DBA experience,

- - do a quick basic functional analysis of the core of the system - the
simplest possible subset of functions,

- - develop the new core on some uniform codebase (MVC framework w/ ORM),

- - make some kind of transformation for the data from the old database,
like reports &c,

- - in the meantime the old team catches up with any current tasks possible,

- - gradually analyse, model and develop the remaining parts of the system
by functionality, so that parts of the old setup can be delegated as
soon as they are completed in the new codebase - this should be done so
that the old team is disturbed as little as possible.

Of course this MO will not make the deadlines realistic but it could
rescue the project. At its current state I'm afraid to touch anything
because it looks like a house of cards.

Thanks for your help.

- --

In closing I want to show you two main methods used throughout the project:

/**
* loadSomeByWhere
* this will load all the records.
* it actually creates an iterator for us:
*/
function
loadSomeByWhere($where='1',$order=false,$orderDir= 'DESC',$offset=false,
$num=false, $calcRows=false)
{
$this->_lT('pre loadsomebywhere');
$this->_lT();
session_start();
$this->_currentIndex = -1;
$this->_iterator = false;

if($where == '')
$where = 1;

$sql='SELECT ';
if($calcRows)
$sql.='SQL_CALC_FOUND_ROWS ';
$sql .= "* FROM {$this->dbx}`{$this->_table}` as `{$this->_table}` ";

// if they provided ordering:
$sql .= ' WHERE '.$where;
if(is_array($order)) $order=implode(" {$orderDir}, ", $order);
if($order !== false || in_array("$order",$this->_dbFields))
$sql .= " ORDER BY {$order} {$orderDir}";

// If they provided pagination:
if($offset !== false && $num != false)
$sql .= " LIMIT {$offset},{$num}";


//if($orderDir != 'BAZOO')
// print $sql;
$this->_lT('_diff');
$this->_lT('finish prep LSBW');
$this->_lT('pre query');
$this->_lT();

$this->_iterator = mysql_query($sql);
$this->_lT('_diff');
$this->_lT($sql);
$this->_lT('post query');
$this->_lT();
if($calcRows)
{
$_res=mysql_query('select FOUND_ROWS() as `rows`');
if($_res)
$_row=mysql_fetch_assoc($_res);
if($_row['rows'])
$this->_numItems=$_row['rows'];

}
else
$this->_numItems = mysql_num_rows($this->_iterator);

//debug: output of sql query
// echo "<!-- $sql ".$this->getClass()." -->\n";
$this->_lT('_diff');
$this->_lT('post numitems');
$this->objInfo['sql']=$sql;
//$_SESSION['debug']['loadSomeByWhere'][]=array($this->getClass()
=>$sql, 'rows' => $this->_numItems);
}


//if($orderDir != 'BAZOO')
// print $sql;
$this->_lT('_diff');
$this->_lT('finish prep LSBW');
$this->_lT('pre query');
$this->_lT();

$this->_iterator = mysql_query($sql);
$this->_lT('_diff');
$this->_lT($sql);
$this->_lT('post query');
$this->_lT();
if($calcRows)
{
$_res=mysql_query('select FOUND_ROWS() as `rows`');
if($_res)
$_row=mysql_fetch_assoc($_res);
if($_row['rows'])
$this->_numItems=$_row['rows'];

}
else
$this->_numItems = mysql_num_rows($this->_iterator);

//debug: output of sql query
// echo "<!-- $sql ".$this->getClass()." -->\n";
$this->_lT('_diff');
$this->_lT('post numitems');
$this->objInfo['sql']=$sql;

//$_SESSION['debug']['loadSomeByWhere'][]=array($this->getClass()
=>$sql, 'rows' => $this->_numItems);
}


/**
* Super tricky iterator: pfft.
*
* @return bool Success indicator.
*/

var $noInstance=false; // db split hack, needed to instantiate certain
objects on the main db
function next($instance=true, $changeDB=true)
{
session_start();
//$numsql=count($_SESSION['debug']['loadSomeByWhere'])-1;
if($this->_iterator != false && $this->_currentIndex+1 < $this->_numItems)
{
$this->_currentIndex++;
if(!mysql_data_seek($this->_iterator,$this->_currentIndex))
return false;
$assoc = mysql_fetch_assoc($this->_iterator);
$this->_thisRowAssoc = $assoc;
//$_SESSION['debug']['loadSomeByWhere'][$numsql]['dataRows'][]=$assoc;
$this->_setProperties($assoc);

// dbsplit functionality.
// as the login gets loaded either by next() or load() we hack both
funcs to change the db:
if($changeDB)
{
// echo "going there ({$changeDB})".$this->getClass()."\n";
$this->_changeUserDB();
}
$obj=new ObjectData('next', $this->getClass(), $this->id, $this); //
we add the extra data from the associated data table
if($instance and !$this->noInstance) $this->instanceAll(); //
sometimes you need to breake things in order to fix them...
// quick hack to avoid calling LFL for every record, if the first one
returned nothing.
if(!isset($this->LFL) or $this->prevQTables[$this->LFL]!='no rows')
$this->LFL=$this->_loadFromLookups();
return(true);
}
else
return(false);
}

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iF4EAREIAAYFAk0gjrsACgkQfD3PECtxdkXdvAD9F0sMwrCPC3 cp1CTg1i9CiNaZ
Mk/gyxt9j7lT5/WC7OQA/A+OdfY+MFThbrLwe4dLD7vTYsmR4YL1K81VGFCll/Xs
=pGHF
-----END PGP SIGNATURE-----

Reply With Quote
  #6  
Old   
jwcarlton
 
Posts: n/a

Default Re: reducing server load by crazy data distribution - 01-03-2011 , 06:59 PM



On Jan 2, 9:42*am, Jacek Krysztofik <defa... (AT) tlen (DOT) pl> wrote:
Quote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Axel Schwenke wrote:
Well, I didn't say what you see *is* sharding.

I know, I'm just saying that what is here doesn't meet the criteria.

It looks like a lame approach to make the database distributable
from the beginning, so it can be sharded and distributed later

No, it was done when the previous machine choked on the load.

IMHO the whole project looks FUBAR. My advise would be to dump the
existing stuff and start over.

I am of the exact same opinion.

Unfortunately you already excluded this option.

Well, the owner said no. I'm new to the team (2 devs plus one
"designer"/admin). The boss' idea of managing the project is to hire new
people to implement new features ASAP (nevermind Brooks' law). My
arguments that this idea is the recipe for disaster are not understood.

Then optimize known (measured!) hotspots. And only as last resort
start thinking about a distributed database setup.

That's the problem with this project. For three years there was no time
for overhaul, because "they had too many TODOs" (as if a refactoring or
overhaul wasn't a TODO and STAT). There was only a time for hotfixes and
workarounds on a codebase created by a PHP beginner.

Is there a software engineering newsgroup or do you have any experience
in this matter? IMO the project is doomed unless the following are done:

- - hire 3-5 new devs with closely related framework experience, at least
one predominantly for view layer, at least one with big DBA experience,

- - do a quick basic functional analysis of the core of the system - the
simplest possible subset of functions,

- - develop the new core on some uniform codebase (MVC framework w/ ORM),

- - make some kind of transformation for the data from the old database,
like reports &c,

- - in the meantime the old team catches up with any current tasks possible,

- - gradually analyse, model and develop the remaining parts of the system
by functionality, so that parts of the old setup can be delegated as
soon as they are completed in the new codebase - this should be done so
that the old team is disturbed as little as possible.

Of course this MO will not make the deadlines realistic but it could
rescue the project. At its current state I'm afraid to touch anything
because it looks like a house of cards.

Thanks for your help.

- --

In closing I want to show you two main methods used throughout the project:

/**
** loadSomeByWhere
** this will load all the records.
** it actually creates an iterator for us:
**/
function
loadSomeByWhere($where='1',$order=false,$orderDir= 'DESC',$offset=false,
$num=false, $calcRows=false)
{
* * * * $this->_lT('pre loadsomebywhere');
* * * * $this->_lT();
* * * * session_start();
* * * * $this->_currentIndex = -1;
* * * * $this->_iterator = false;

* * * * if($where == '')
* * * * * * * * $where = 1;

* * * * $sql='SELECT ';
* * * * if($calcRows)
* * * * * * * * $sql.='SQL_CALC_FOUND_ROWS ';
* * * * $sql .= "* FROM {$this->dbx}`{$this->_table}` as `{$this->_table}` ";

* * * * // if they provided ordering:
* * * * $sql .= ' *WHERE '.$where;
* * * * if(is_array($order)) $order=implode(" {$orderDir}, ", $order);
* * * * if($order !== false || in_array("$order",$this->_dbFields))
* * * * * * * * $sql .= " ORDER BY {$order} {$orderDir}";

* * * * // If they provided pagination:
* * * * if($offset !== false && $num != false)
* * * * * * * * $sql .= " LIMIT {$offset},{$num}";

* * * * //if($orderDir != 'BAZOO')
* * * * // * * *print $sql;
* * * * $this->_lT('_diff');
* * * * $this->_lT('finish prep LSBW');
* * * * $this->_lT('pre query');
* * * * $this->_lT();

* * * * $this->_iterator = mysql_query($sql);
* * * * $this->_lT('_diff');
* * * * $this->_lT($sql);
* * * * $this->_lT('post query');
* * * * $this->_lT();
* * * * if($calcRows)
* * * * {
* * * * * * * * $_res=mysql_query('select FOUND_ROWS() as `rows`');
* * * * * * * * if($_res)
* * * * * * * * * * * * $_row=mysql_fetch_assoc($_res);
* * * * * * * * if($_row['rows'])
* * * * * * * * * * * * $this->_numItems=$_row['rows'];

* * * * }
* * * * else
* * * * * * * * $this->_numItems = mysql_num_rows($this->_iterator);

* * * * //debug: output of sql query
* * * * // echo "<!-- $sql ".$this->getClass()." -->\n";
* * * * $this->_lT('_diff');
* * * * $this->_lT('post numitems');
* * * * $this->objInfo['sql']=$sql;
* * * * //$_SESSION['debug']['loadSomeByWhere'][]=array($this->getClass()
=>$sql, 'rows' => $this->_numItems);

}

* * * * * * * * //if($orderDir != 'BAZOO')
* * * * * * * * // * * *print $sql;
* * * * * * * * $this->_lT('_diff');
* * * * * * * * $this->_lT('finish prep LSBW');
* * * * * * * * $this->_lT('pre query');
* * * * * * * * $this->_lT();

* * * * * * * * $this->_iterator = mysql_query($sql);
* * * * * * * * $this->_lT('_diff');
* * * * * * * * $this->_lT($sql);
* * * * * * * * $this->_lT('post query');
* * * * * * * * $this->_lT();
* * * * * * * * if($calcRows)
* * * * * * * * {
* * * * * * * * * * * * $_res=mysql_query('select FOUND_ROWS() as `rows`');
* * * * * * * * * * * * if($_res)
* * * * * * * * * * * * * * * * $_row=mysql_fetch_assoc($_res);
* * * * * * * * * * * * if($_row['rows'])
* * * * * * * * * * * * * * * * $this->_numItems=$_row['rows'];

* * * * * * * * }
* * * * * * * * else
* * * * * * * * * * * * $this->_numItems = mysql_num_rows($this->_iterator);

* * * * //debug: output of sql query
* * * * * * * * // echo "<!-- $sql ".$this->getClass()." -->\n";
* * * * * * * * $this->_lT('_diff');
* * * * * * * * $this->_lT('post numitems');
* * * * $this->objInfo['sql']=$sql;

//$_SESSION['debug']['loadSomeByWhere'][]=array($this->getClass()
=>$sql, 'rows' => $this->_numItems);
* * * * }

/**
** Super tricky iterator: pfft.
**
** @return bool Success indicator.
**/

var $noInstance=false; // db split hack, needed to instantiate certain
objects on the main db
function next($instance=true, $changeDB=true)
{
* * * * session_start();
* * * * //$numsql=count($_SESSION['debug']['loadSomeByWhere'])-1;
* * * * if($this->_iterator != false && $this->_currentIndex+1 < $this->_numItems)
* * * * {
* * * * * * * * $this->_currentIndex++;
* * * * * * * * if(!mysql_data_seek($this->_iterator,$this->_currentIndex))
* * * * * * * * * * * * return false;
* * * * * * * * $assoc = mysql_fetch_assoc($this->_iterator);
* * * * * * * * $this->_thisRowAssoc = $assoc;
* * * * * * * * //$_SESSION['debug']['loadSomeByWhere'][$numsql]['dataRows'][]=$assoc;
* * * * * * * * $this->_setProperties($assoc);

* * * * * * * * // dbsplit functionality.
* * * * * * * * // as the login gets loaded either by next() or load() we hack both
funcs to change the db:
* * * * * * * * if($changeDB)
* * * * * * * * {
* * * * * * * * * * * * // echo "going there ({$changeDB})".$this->getClass()."\n";
* * * * * * * * * * * * $this->_changeUserDB();
* * * * * * * * }
* * * * * * * * $obj=new ObjectData('next', $this->getClass(), $this->id, $this); //
we add the extra data from the associated data table
* * * * * * * * if($instance and !$this->noInstance) $this->instanceAll(); //
sometimes you need to breake things in order to fix them...
* * * * * * * * // quick hack to avoid calling LFL for every record, if the first one
returned nothing.
* * * * * * * * if(!isset($this->LFL) or $this->prevQTables[$this->LFL]!='no rows')
$this->LFL=$this->_loadFromLookups();
* * * * * * * * return(true);
* * * * }
* * * * else
* * * * * * * * return(false);

}

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla -http://enigmail.mozdev.org/

iF4EAREIAAYFAk0gjrsACgkQfD3PECtxdkXdvAD9F0sMwrCPC3 cp1CTg1i9CiNaZ
Mk/gyxt9j7lT5/WC7OQA/A+OdfY+MFThbrLwe4dLD7vTYsmR4YL1K81VGFCll/Xs
=pGHF
-----END PGP SIGNATURE-----
Jacek, I would SWEAR that you were describing my database and company!
LOL Other than the code you posted, you could be talking about me:

1. Stumbling through a database design that became critical instead of
the hobby that it originated.

2. Too many urgent projects to allow for better database design, so
it's compounded with patches and quick-fixes instead.

3. Distributed data in an attempt to decrease server load.

4. By time he had the money and time to bring in a pro, it's gone so
far that no one has a clue how to fix it.

From the owner's perspective, I can honestly say that it comes from a
base of not knowing what to expect from your data in the future, not
knowing how much demand to expect, and not really understanding the
limitations of the software and database. And, in my case, the whole
point was to learn something, so hiring a pro defeats the purpose.

On my end, things worked perfectly 5 years ago, but then when the
demand increased and the amount of data increased, flaws in the
software design (and database structure) began to surface. And from
the amateur's POV, we don't really know if the limitations require a
complete overhaul, or if something as simple as changing the index is
all that's needed.

In short... I feel for you.

Reply With Quote
  #7  
Old   
Bodo
 
Posts: n/a

Default Re: reducing server load by crazy data distribution - 01-04-2011 , 07:10 AM



Hi,

top tuning advice is: proper indexes!

Better indexing gains factor 1000.

Trying to distribute load on same hardware is pure nonsense.

It may make sense to add plenty of RAM. It was 8GB right?
ram is cheap nowadays.

Install as many as possible and adjust your my.cnf parameters.

I'd try with innotop or mytop to search for the worst performing statements
and add proper indexes

kind regards,

Toni

Reply With Quote
  #8  
Old   
Jerry Stuckle
 
Posts: n/a

Default Re: reducing server load by crazy data distribution - 01-04-2011 , 08:07 AM



On 1/4/2011 8:10 AM, Bodo wrote:
Quote:
Hi,

top tuning advice is: proper indexes!

Better indexing gains factor 1000.

Not necessarily. Any index MAY speed up SELECT statement processing,
but WILL slow down INSERT (and UPDATE statements affecting the column(s)
being indexed). Too many indexes is as bad as too few; you need to
analyze usage and create the correct indexes. And index gains may be
greater or smaller than 1000 - it all depends on the data and the SELECT
statement. But I think a gain of 1000 is high for most SQL tables.

Quote:
Trying to distribute load on same hardware is pure nonsense.

Distributing data across tables can easily improve performance, if a
large portion of SELECTS are done on the same subset of data all the
time. For instance, a table containing sales data can get quite large,
and most searches are on recent data (last day, week, month or quarter).
Archiving older data into another table cuts down the amount of data
needed to be searched for most SELECT statements, improving performance.

Quote:
It may make sense to add plenty of RAM. It was 8GB right?
ram is cheap nowadays.

Install as many as possible and adjust your my.cnf parameters.

Too much memory can actually hurt performance because cache handling
becomes more time consuming (and CPU intensive) than just retrieving the
data in the first place. Again, the answer is to get the correct
amount. This also is highly dependent on data and its usage.

Quote:
I'd try with innotop or mytop to search for the worst performing statements
and add proper indexes

The worst performing statements may or may not be the problem. A 5
second improvement in a statement executed once a month isn't worth the
time spent doing it. However, a 0.005 second improvement in a statement
executed 100 times/second is a huge improvement. EXPLAINing the most
commonly used statements will help here.


Quote:
kind regards,

Toni
P.S. Please quote the appropriate text you are responding to. It helps
maintain thread continuity. Thanks.



--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex (AT) attglobal (DOT) net
==================

Reply With Quote
  #9  
Old   
Jacek Krysztofik
 
Posts: n/a

Default Re: reducing server load by crazy data distribution - 01-04-2011 , 08:24 AM



-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

OK, Mr "Expert at state the obvious" I'd like to thank you for your
input PERMANENTLY. Please regard this as a Cease and Desist note.

04.01.2011 15:07, Jerry Stuckle:
Quote:
top tuning advice is: proper indexes!
Not necessarily. Any index MAY speed up SELECT statement processing,
but WILL slow down INSERT (and UPDATE statements affecting the column(s)
being indexed). Too many indexes is as bad as too few; you need to
analyze usage and create the correct indexes. And index gains may be
greater or smaller than 1000 - it all depends on the data and the SELECT
statement. But I think a gain of 1000 is high for most SQL tables.
In case you didn't notice, the GP wrote "proper indexes".

Quote:
Distributing data across tables can easily improve performance, if a
large portion of SELECTS are done on the same subset of data all the
time. For instance, a table containing sales data can get quite large,
and most searches are on recent data (last day, week, month or quarter).
Archiving older data into another table cuts down the amount of data
needed to be searched for most SELECT statements, improving performance.
We're talking about data that's constantly utilized as explained in
other messages.

Quote:
Too much memory can actually hurt performance because cache handling
becomes more time consuming (and CPU intensive) than just retrieving the
data in the first place. Again, the answer is to get the correct
amount. This also is highly dependent on data and its usage.
The db server will handle the memory management, the point is access to
enough. You raise a non-issue here.

Quote:
The worst performing statements may or may not be the problem. A 5
second improvement in a statement executed once a month isn't worth the
time spent doing it. However, a 0.005 second improvement in a statement
executed 100 times/second is a huge improvement. EXPLAINing the most
commonly used statements will help here.
You overarticulate bullshit to sound clever. Please stop.

Quote:
P.S. Please quote the appropriate text you are responding to. It helps
maintain thread continuity. Thanks.
Look who's talking. I (the OP) had no problem understanding the reply.

Bye
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iF4EAREIAAYFAk0jLYYACgkQfD3PECtxdkUcfQEAoSAHFtnWng 1qjLxq9HQ/GkGb
+fUhSdisw3peHxP1fRoA/isQLsWQ0pueHaXrJNcwMnU3L5yC7GM6LGZmIsUZx81V
=XATi
-----END PGP SIGNATURE-----

Reply With Quote
  #10  
Old   
Axel Schwenke
 
Posts: n/a

Default Re: reducing server load by crazy data distribution - 01-04-2011 , 09:03 AM



Jacek Krysztofik <default (AT) tlen (DOT) pl> wrote:
Quote:
04.01.2011 15:07, Jerry Stuckle:

OK, Mr "Expert at state the obvious" I'd like to thank you for your
input PERMANENTLY. Please regard this as a Cease and Desist note.
LOOOOL

Thanks Jacek, you made my day!


XL

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.