![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256 Hi, I got involved in a project that, apart from a ton of spaghetti code, has something they call "db split functionality." NOTE: this is in no way table partitioning and the longest table In short it means they have several clones of the database (on the same server) with the same structure but different set of data. The piece of code that takes care of switching between the databases is below. I was told the data has been distributed to decrease the server load. In the meantime there appearead a bunch of (id) inconsistencies between the data in the clones and other non-cloned parts of the database (ids in serialized data&c). - From the start the solution to the problem of high server load by shortening the table content seemed outlandish. It is done by arbitrarily assigning parts of uniformly structured data to various database clones. Never mind that the table structure and indexing are, well, undesigned. Variable lengths everywhere, 20+ columns a table, no constraints or relations, boolean values done with 'yes'/'no' enums, stuff like this. I am already dizzy because of this project (no we can't refactor or overhaul, because we have deadlines and new dev are coming to learn this big ball of mud) so my question to you is: is the "db split functionality" as crazy as I think or am I going crazy? IMO the proper design and indexing would have been sufficient to reduce the server load (seek times mostly). But I am really starting to doubt if the Earth is round... Please help. Here's a part of the php "code" that goes about "db splitting". class Item { - snip - // this func takes care of the db split functionality function _changeUserDB() { global $_dbDetails, $connCounter; if(trim($this->config_dbase_prefix)=='') return; $_dbDetails['name']=$this->config_dbase_prefix; $this->userDbx='`'.$this->config_dbase_prefix.'`.'; $this->_connect(); // just in case, we do it over; } - snip - function _connect() { global $_dbDetails; if(isset($this)) { $this->_dbLink = mysql_connect($_dbDetails['host'],$_dbDetails['user'],$_dbDetails['pass']); mysql_select_db($_dbDetails['name'],$this->_dbLink); } else { $dbLink = mysql_connect($_dbDetails['host'],$_dbDetails['user'],$_dbDetails['pass']); mysql_select_db($_dbDetails['name'],$dbLink); } - snip - } } -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iF4EAREIAAYFAk0fhK4ACgkQfD3PECtxdkVgBgD/TCiodXRLzJcAsemhHl+zStxY /nwYq8Ll4VxeEPdk1sQBAMceCn7bWd670XdI4vctaOiZ8ZYlZsm JQRRCT2z2WojF =i1Pw -----END PGP SIGNATURE----- |
#3
| |||
| |||
|
|
I got involved in a project that, apart from a ton of spaghetti code, has something they call "db split functionality." NOTE: this is in no way table partitioning and the longest table In short it means they have several clones of the database (on the same server) with the same structure but different set of data. I was told the data has been distributed to decrease the server load. |
|
... my question to you is: is the "db split functionality" as crazy as I think or am I going crazy? |
#4
| |||
| |||
|
|
Axel Schwenke wrote: If those databases are located on the same physical machine, then this makes little to no sense. Everything is on one machine, Quad-Core AMD Opteron(tm) Processor 2378. |
|
If this is done properly, it is called "sharding". It's very similar to regular partitioning except that: - shards are defined by logical aspects of data, typically each shard is 100% self-contained Nope, I mean here, but they are only related to the "main" db. |
|
The tables are not normalized, indexes are made just for the sake of their existence, most of the time they don't watch anything relevant |
#5
| |||||
| |||||
|
|
Well, I didn't say what you see *is* sharding. I know, I'm just saying that what is here doesn't meet the criteria. |
|
It looks like a lame approach to make the database distributable from the beginning, so it can be sharded and distributed later No, it was done when the previous machine choked on the load. |
|
IMHO the whole project looks FUBAR. My advise would be to dump the existing stuff and start over. I am of the exact same opinion. |
|
Unfortunately you already excluded this option. Well, the owner said no. I'm new to the team (2 devs plus one |
|
Then optimize known (measured!) hotspots. And only as last resort start thinking about a distributed database setup. That's the problem with this project. For three years there was no time |
#6
| |||
| |||
|
|
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256 Axel Schwenke wrote: Well, I didn't say what you see *is* sharding. I know, I'm just saying that what is here doesn't meet the criteria. It looks like a lame approach to make the database distributable from the beginning, so it can be sharded and distributed later No, it was done when the previous machine choked on the load. IMHO the whole project looks FUBAR. My advise would be to dump the existing stuff and start over. I am of the exact same opinion. Unfortunately you already excluded this option. Well, the owner said no. I'm new to the team (2 devs plus one "designer"/admin). The boss' idea of managing the project is to hire new people to implement new features ASAP (nevermind Brooks' law). My arguments that this idea is the recipe for disaster are not understood. Then optimize known (measured!) hotspots. And only as last resort start thinking about a distributed database setup. That's the problem with this project. For three years there was no time for overhaul, because "they had too many TODOs" (as if a refactoring or overhaul wasn't a TODO and STAT). There was only a time for hotfixes and workarounds on a codebase created by a PHP beginner. Is there a software engineering newsgroup or do you have any experience in this matter? IMO the project is doomed unless the following are done: - - hire 3-5 new devs with closely related framework experience, at least one predominantly for view layer, at least one with big DBA experience, - - do a quick basic functional analysis of the core of the system - the simplest possible subset of functions, - - develop the new core on some uniform codebase (MVC framework w/ ORM), - - make some kind of transformation for the data from the old database, like reports &c, - - in the meantime the old team catches up with any current tasks possible, - - gradually analyse, model and develop the remaining parts of the system by functionality, so that parts of the old setup can be delegated as soon as they are completed in the new codebase - this should be done so that the old team is disturbed as little as possible. Of course this MO will not make the deadlines realistic but it could rescue the project. At its current state I'm afraid to touch anything because it looks like a house of cards. Thanks for your help. - -- In closing I want to show you two main methods used throughout the project: /** ** loadSomeByWhere ** this will load all the records. ** it actually creates an iterator for us: **/ function loadSomeByWhere($where='1',$order=false,$orderDir= 'DESC',$offset=false, $num=false, $calcRows=false) { * * * * $this->_lT('pre loadsomebywhere'); * * * * $this->_lT(); * * * * session_start(); * * * * $this->_currentIndex = -1; * * * * $this->_iterator = false; * * * * if($where == '') * * * * * * * * $where = 1; * * * * $sql='SELECT '; * * * * if($calcRows) * * * * * * * * $sql.='SQL_CALC_FOUND_ROWS '; * * * * $sql .= "* FROM {$this->dbx}`{$this->_table}` as `{$this->_table}` "; * * * * // if they provided ordering: * * * * $sql .= ' *WHERE '.$where; * * * * if(is_array($order)) $order=implode(" {$orderDir}, ", $order); * * * * if($order !== false || in_array("$order",$this->_dbFields)) * * * * * * * * $sql .= " ORDER BY {$order} {$orderDir}"; * * * * // If they provided pagination: * * * * if($offset !== false && $num != false) * * * * * * * * $sql .= " LIMIT {$offset},{$num}"; * * * * //if($orderDir != 'BAZOO') * * * * // * * *print $sql; * * * * $this->_lT('_diff'); * * * * $this->_lT('finish prep LSBW'); * * * * $this->_lT('pre query'); * * * * $this->_lT(); * * * * $this->_iterator = mysql_query($sql); * * * * $this->_lT('_diff'); * * * * $this->_lT($sql); * * * * $this->_lT('post query'); * * * * $this->_lT(); * * * * if($calcRows) * * * * { * * * * * * * * $_res=mysql_query('select FOUND_ROWS() as `rows`'); * * * * * * * * if($_res) * * * * * * * * * * * * $_row=mysql_fetch_assoc($_res); * * * * * * * * if($_row['rows']) * * * * * * * * * * * * $this->_numItems=$_row['rows']; * * * * } * * * * else * * * * * * * * $this->_numItems = mysql_num_rows($this->_iterator); * * * * //debug: output of sql query * * * * // echo "<!-- $sql ".$this->getClass()." -->\n"; * * * * $this->_lT('_diff'); * * * * $this->_lT('post numitems'); * * * * $this->objInfo['sql']=$sql; * * * * //$_SESSION['debug']['loadSomeByWhere'][]=array($this->getClass() =>$sql, 'rows' => $this->_numItems); } * * * * * * * * //if($orderDir != 'BAZOO') * * * * * * * * // * * *print $sql; * * * * * * * * $this->_lT('_diff'); * * * * * * * * $this->_lT('finish prep LSBW'); * * * * * * * * $this->_lT('pre query'); * * * * * * * * $this->_lT(); * * * * * * * * $this->_iterator = mysql_query($sql); * * * * * * * * $this->_lT('_diff'); * * * * * * * * $this->_lT($sql); * * * * * * * * $this->_lT('post query'); * * * * * * * * $this->_lT(); * * * * * * * * if($calcRows) * * * * * * * * { * * * * * * * * * * * * $_res=mysql_query('select FOUND_ROWS() as `rows`'); * * * * * * * * * * * * if($_res) * * * * * * * * * * * * * * * * $_row=mysql_fetch_assoc($_res); * * * * * * * * * * * * if($_row['rows']) * * * * * * * * * * * * * * * * $this->_numItems=$_row['rows']; * * * * * * * * } * * * * * * * * else * * * * * * * * * * * * $this->_numItems = mysql_num_rows($this->_iterator); * * * * //debug: output of sql query * * * * * * * * // echo "<!-- $sql ".$this->getClass()." -->\n"; * * * * * * * * $this->_lT('_diff'); * * * * * * * * $this->_lT('post numitems'); * * * * $this->objInfo['sql']=$sql; //$_SESSION['debug']['loadSomeByWhere'][]=array($this->getClass() =>$sql, 'rows' => $this->_numItems); * * * * } /** ** Super tricky iterator: pfft. ** ** @return bool Success indicator. **/ var $noInstance=false; // db split hack, needed to instantiate certain objects on the main db function next($instance=true, $changeDB=true) { * * * * session_start(); * * * * //$numsql=count($_SESSION['debug']['loadSomeByWhere'])-1; * * * * if($this->_iterator != false && $this->_currentIndex+1 < $this->_numItems) * * * * { * * * * * * * * $this->_currentIndex++; * * * * * * * * if(!mysql_data_seek($this->_iterator,$this->_currentIndex)) * * * * * * * * * * * * return false; * * * * * * * * $assoc = mysql_fetch_assoc($this->_iterator); * * * * * * * * $this->_thisRowAssoc = $assoc; * * * * * * * * //$_SESSION['debug']['loadSomeByWhere'][$numsql]['dataRows'][]=$assoc; * * * * * * * * $this->_setProperties($assoc); * * * * * * * * // dbsplit functionality. * * * * * * * * // as the login gets loaded either by next() or load() we hack both funcs to change the db: * * * * * * * * if($changeDB) * * * * * * * * { * * * * * * * * * * * * // echo "going there ({$changeDB})".$this->getClass()."\n"; * * * * * * * * * * * * $this->_changeUserDB(); * * * * * * * * } * * * * * * * * $obj=new ObjectData('next', $this->getClass(), $this->id, $this); // we add the extra data from the associated data table * * * * * * * * if($instance and !$this->noInstance) $this->instanceAll(); // sometimes you need to breake things in order to fix them... * * * * * * * * // quick hack to avoid calling LFL for every record, if the first one returned nothing. * * * * * * * * if(!isset($this->LFL) or $this->prevQTables[$this->LFL]!='no rows') $this->LFL=$this->_loadFromLookups(); * * * * * * * * return(true); * * * * } * * * * else * * * * * * * * return(false); } -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla -http://enigmail.mozdev.org/ iF4EAREIAAYFAk0gjrsACgkQfD3PECtxdkXdvAD9F0sMwrCPC3 cp1CTg1i9CiNaZ Mk/gyxt9j7lT5/WC7OQA/A+OdfY+MFThbrLwe4dLD7vTYsmR4YL1K81VGFCll/Xs =pGHF -----END PGP SIGNATURE----- |
#7
| |||
| |||
|
#8
| |||||
| |||||
|
|
Hi, top tuning advice is: proper indexes! Better indexing gains factor 1000. |
|
Trying to distribute load on same hardware is pure nonsense. |
|
It may make sense to add plenty of RAM. It was 8GB right? ram is cheap nowadays. Install as many as possible and adjust your my.cnf parameters. |
|
I'd try with innotop or mytop to search for the worst performing statements and add proper indexes |
|
kind regards, Toni |
#9
| |||||
| |||||
|
|
top tuning advice is: proper indexes! Not necessarily. Any index MAY speed up SELECT statement processing, but WILL slow down INSERT (and UPDATE statements affecting the column(s) being indexed). Too many indexes is as bad as too few; you need to analyze usage and create the correct indexes. And index gains may be greater or smaller than 1000 - it all depends on the data and the SELECT statement. But I think a gain of 1000 is high for most SQL tables. In case you didn't notice, the GP wrote "proper indexes". |
|
Distributing data across tables can easily improve performance, if a large portion of SELECTS are done on the same subset of data all the time. For instance, a table containing sales data can get quite large, and most searches are on recent data (last day, week, month or quarter). Archiving older data into another table cuts down the amount of data needed to be searched for most SELECT statements, improving performance. We're talking about data that's constantly utilized as explained in |
|
Too much memory can actually hurt performance because cache handling becomes more time consuming (and CPU intensive) than just retrieving the data in the first place. Again, the answer is to get the correct amount. This also is highly dependent on data and its usage. The db server will handle the memory management, the point is access to |
|
The worst performing statements may or may not be the problem. A 5 second improvement in a statement executed once a month isn't worth the time spent doing it. However, a 0.005 second improvement in a statement executed 100 times/second is a huge improvement. EXPLAINing the most commonly used statements will help here. You overarticulate bullshit to sound clever. Please stop. |
|
P.S. Please quote the appropriate text you are responding to. It helps maintain thread continuity. Thanks. Look who's talking. I (the OP) had no problem understanding the reply. |
#10
| |||
| |||
|
|
04.01.2011 15:07, Jerry Stuckle: OK, Mr "Expert at state the obvious" I'd like to thank you for your input PERMANENTLY. Please regard this as a Cease and Desist note. |
![]() |
| Thread Tools | |
| Display Modes | |
| |