![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
#2
| ||||||||
| ||||||||
|
|
Suppose we have to design a database for a recruitment agency. There will be a table "candidates" with fields "candidateid","last name","first name" ; the languages mastered by a candidate as well as the skills are separated into different tables. Eg. The skills are stored in the table "skills" with fields "skillid", "skillname","candidateid" where skillid is the primary autoincrement key. The language table is defined in a similar way. A candidate having two skills will lead to two records in the table "skills"; |
|
... similarly a candidate mastering 3 languages will lead to 3 records in the table "languages". |
|
Retrieving candidates with skills with skillname="php" and skillname="asp" as well as languagename="EN" and languagename="FR" can be done as follows: SELECT DISTINCT c.name FROM candidates c, skills s, languages l WHERE l.candidateid=c.candidateid AND s.candidateid=c.candidateid AND (s.skillname='php' OR s.skillname='asp') AND (l.languagename='FR' OR l.languagename='EN') GROUP BY c.name HAVING (count(DISTINCT s.skillid)=2 AND count(DISTINCT l.languageid)=2) |
|
This already rather complex query (for a non-expert as myself) is probably the disadvantage of a normalized design. |
|
The problem that bothers me most is the following: what happens if I want to built a search tool to find candidates using a flexible search interface; in this interface the user could type "meta queries" such as: find me the candidates that (master php AND (asp OR net)) AND (that master French OR (Dutch AND German)) |
|
This query looks quite simple but how to translate this to one query for the normalized database? |
|
Do I first have to do the following conversion: find me the candidates that (master (php AND asp) OR (php AND net)) AND (that master French OR (Dutch AND German)) becoming find me the candidates that (master (php AND asp)) AND (that master French) OR find me the candidates that (master (php AND asp)) AND (that master Dutch AND German) OR find me the candidates that (master (php AND net)) AND (that master French) OR find me the candidates that (master (php AND net)) AND (that master Dutch AND German) |
|
Next I assume the three ORs have to be combined via UNION statements? This seems quite complicated so I was wondering if there is no easier way to create an SQL query starting from the "meta query" above? |
#3
| |||
| |||
|
|
phlype.johnson (AT) gmail (DOT) com wrote: Suppose we have to design a database for a recruitment agency. There will be a table "candidates" with fields "candidateid","last name","first name" ; the languages mastered by a candidate as well as the skills are separated into different tables. Eg. The skills are stored in the table "skills" with fields "skillid", "skillname","candidateid" where skillid is the primary autoincrement key. The language table is defined in a similar way. A candidate having two skills will lead to two records in the table "skills"; Bzzzt wrong. Well maybe not necessarily. at least you are talking about tables too soon and cetainly the description of a skills table is not Normalized. Think of entities and attributes at this point (yes it is just a straight substitution of words but the thought process is often driven by how we phrase the question.) So candidates has attributes: first name, last name, id, DOB, height, skills, languages spoken, etc As part of normalizing you realize a candidate can have more than one skill. Also you realize skills have some of their own attributes. so you remove skills from the candidate entity, creating a skills entity Skills include attributes like name, type(musical, industrial,...), tools required(yes/no), etc now candidates can have one or more so this leads directly to a relation I'll label candidate_skill. it has three attributes: candidate first name, last name, and skills name. Implementing this design would be easiest in three tables CANDIDATES, SKILLS, and CANDIDATE_SKILLS one candidate with two different skills does require two rows in the CANDIDATE_SKILLS table. Another candidate with 3skills (2 of them the same as the first candidate requires three rows in the CANDIDATE_SKILLS table. And in the skills table is just three skills rows, one for each unique skill. There would be foreign key constraints from the CANDIDATE_SKILLS table: from the skill attribute to the skill PK in the SKILLS table. and from the name attributes to the name attributes forming the PK in the CANDIDATE table. ... similarly a candidate mastering 3 languages will lead to 3 records in the table "languages". similarly in a CANDIDATES_LANGUAGES table Retrieving candidates with skills with skillname="php" and skillname="asp" as well as languagename="EN" and languagename="FR" can be done as follows: SELECT DISTINCT c.name FROM candidates c, skills s, languages l WHERE l.candidateid=c.candidateid AND s.candidateid=c.candidateid AND (s.skillname='php' OR s.skillname='asp') AND (l.languagename='FR' OR l.languagename='EN') GROUP BY c.name HAVING (count(DISTINCT s.skillid)=2 AND count(DISTINCT l.languageid)=2) Ignoring normalization for the moment, why do you think you need DISTINCT? the GROUP BY will not return multiple JOHN rows in the result set, so the DISTINCT is totally redundant in this query. This already rather complex query (for a non-expert as myself) is probably the disadvantage of a normalized design. Well I have ranted about pseudoKeys like your candidate ID and this shows one of the disadvantages. Using the tables I described above, you don't even need to visit the candidates table: SELECT s.first_name,s.last_name FROM skills s, languages L WHERE s.first_name=L.first_name AND s.last_name=L.last_name AND s.skillname IN ('php' , 'asp') AND L.languagename IN ('FR', 'EN') GROUP BY s.first_name,s.last_name HAVING (count(DISTINCT s.skillid)=2 AND count(DISTINCT L.languageid)=2) But there are many more benefits to a normalized schema that far outweigh the perceived complexity of this query. Consider that to make it easier for general users to form queries, rather than denormalizing the tables, consider creating a view for some of the most common data requests. For example (mushing first and last name down to cname, but ou get the idea) create view CANDIDATE_3SKILLS as SELECT C.*,s1.skillname skill1, s2.skillname skill2, s3.skillname skill3 from CANDIDATE C, CANDIDATE_SKILLS s1, CANDIDATE_SKILLS s2, CANDIDATE_SKILLS s3 where CANDIDATE.cname = s1.cname and CANDIDATE.cname = s2.cname and CANDIDATE.cname = s3.cname and s1.skillname > s2.skillname and s2.skillname > s3.skillname of course this only works for candidates with exactly 3 skills. Candidates with less do not appear and candidates with more appear more than once. (part of the trouble of denormalizing) The problem that bothers me most is the following: what happens if I want to built a search tool to find candidates using a flexible search interface; in this interface the user could type "meta queries" such as: find me the candidates that (master php AND (asp OR net)) AND (that master French OR (Dutch AND German)) Ah, so you are less concerned about the readablilty of the query. good If you are going for a more natural language query for the user why not something like: * find me the candidates that master php and (asp or net) and that master French or (Dutch and German) you may want to provide the users some training about grouping via parentheses. (do you really want someone that speaks ONLY French?) also it might be much easier if you allow more specific terms such as: * find me the candidates that program in php and (asp or net) and that speak French or (Dutch and German) This query looks quite simple but how to translate this to one query for the normalized database? a LOT easier than for a denormalized DB! It is fairly easy for a language parser program. A good software engineer can dealt with this. (send me an email and I could help or recommend someone else). Do I first have to do the following conversion: find me the candidates that (master (php AND asp) OR (php AND net)) AND (that master French OR (Dutch AND German)) becoming find me the candidates that (master (php AND asp)) AND (that master French) OR find me the candidates that (master (php AND asp)) AND (that master Dutch AND German) OR find me the candidates that (master (php AND net)) AND (that master French) OR find me the candidates that (master (php AND net)) AND (that master Dutch AND German) no. Next I assume the three ORs have to be combined via UNION statements? This seems quite complicated so I was wondering if there is no easier way to create an SQL query starting from the "meta query" above? There are easier ways. You need an interpreter to translate from the natural meta language query to SQL. It is an interesting kind of programming. It helps to have a well defined syntax for the "meta language", but for your case you might have a language that allows some fluff (such as the opening "find me the " phrase) which makes the user more comfortable but adds little to the meaning of the entire query. HTH, Ed |
#4
| |||
| |||
|
|
Thanks Ed for your nice feedback. I agree that the example was not completely normalized however on the redundance of the DISTINCT I disagree. I was planning to repost my question with some more explanation and as you will see in my verbose description below: if you leave out the DISTINCT you will change the result of the query (have a try with the provided mysql example); the reason is that if you have a person with skills A and B that also speaks language alpha, the join result of the query looking for a person having those properties will contain two rows: one for skill A and one for skill B each time with language alpha, meaning that alpha appears twice unless you do the COUNT (DISTINCT ... |
#5
| |||
| |||
|
|
phlype.johnson (AT) gmail (DOT) com wrote: Thanks Ed for your nice feedback. I agree that the example was not completely normalized however on the redundance of the DISTINCT I disagree. I was planning to repost my question with some more explanation and as you will see in my verbose description below: if you leave out the DISTINCT you will change the result of the query (have a try with the provided mysql example); the reason is that if you have a person with skills A and B that also speaks language alpha, the join result of the query looking for a person having those properties will contain two rows: one for skill A and one for skill B each time with language alpha, meaning that alpha appears twice unless you do the COUNT (DISTINCT ... Just addressing this issue quickly. I read the rest of the post later. The DISTINCT in the HAVING clause is fine and I agree it should stay, it was the DISTINCT in the SELECT clause which is redundant. GROUP BY cannot return repeated rows in the result set. OK I agree. |
#6
| ||||||||
| ||||||||
|
|
Here comes my longer post: |
|
To illustrate better my question on whether normalized designs lead to more complex queries yes or no, I have prepared an example. The example is a database with the following tables: *table person with fields: -persid: autoincrement id -name: name of the person |
|
*table material with fields: -materialid: autoincrement id -material: name of the material eg "wood" *table color with fields: -colorid: autoincrement id -color: name of the color eg "green" *table persmaterial with fields: -persmatid: autoincrement id -persid: link to table person -materialid: link to table material |
|
*table perscolor with fields: -perscolorid: autoincrement id -persid: link to table person -colorid: link to table color Using these tables it is straightforward to store the preference of a certain person for colors and materials. |
|
Now use the statements at the end of the post to create the tables and populate them with intial values. Next we want to find all persons who like the colors red or blue and also like the color green. Furthermore the person should have a preference for iron as material. I understand there are several ways to obtain the requested result. Option 1: The initial query can be written in pseudocode as Find all persons that (like as color (red OR blue) AND green) AND that (like as material iron). I do not see right away how to write this as a query with joins so I rewrite the pseudo query as follows: Find all persons that ((like as color red AND green) AND that (like as material iron)) OR ((like as color blue AND green) AND that (like as material iron)) this leads to the following query (SELECT p.persid FROM person p, perscolor pc, persmaterial pm WHERE p.persid=pc.persid AND (pc.colorid=1 OR pc.colorid=2) AND p.persid=pm.persid AND pm.materialid=2 GROUP BY p.persid HAVING (count(DISTINCT pc.colorid)=2 AND count(DISTINCT pm.materialid)=1)) UNION (SELECT p.persid FROM person p, perscolor pc, persmaterial pm WHERE p.persid=pc.persid AND (pc.colorid=2 OR pc.colorid=3) AND p.persid=pm.persid AND pm.materialid=2 GROUP BY p.persid HAVING (count(DISTINCT pc.colorid)=2 AND count(DISTINCT pm.materialid)=1)) |
|
-- phpMyAdmin SQL Dump -- version 2.6.1 -- http://www.phpmyadmin.net -- -- Host: localhost -- Generation Time: Oct 19, 2006 at 01:13 PM -- Server version: 4.1.9 -- PHP Version: 4.3.10 -- -- Database: `aston` -- -- -------------------------------------------------------- -- -- Table structure for table `color` -- CREATE TABLE `color` ( `colorid` int(11) NOT NULL auto_increment, `color` varchar(30) NOT NULL default '', PRIMARY KEY (`colorid`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=5 ; |
|
-- -- Dumping data for table `color` -- INSERT INTO `color` VALUES (1, 'red'); INSERT INTO `color` VALUES (2, 'green'); INSERT INTO `color` VALUES (3, 'blue'); INSERT INTO `color` VALUES (4, 'yellow'); -- -------------------------------------------------------- -- -- Table structure for table `material` -- CREATE TABLE `material` ( `materialid` int(11) NOT NULL auto_increment, `material` varchar(30) NOT NULL default '', PRIMARY KEY (`materialid`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=3 ; -- -- Dumping data for table `material` -- INSERT INTO `material` VALUES (1, 'wood'); INSERT INTO `material` VALUES (2, 'iron'); -- -------------------------------------------------------- -- -- Table structure for table `perscolor` -- CREATE TABLE `perscolor` ( `perscolorid` int(11) NOT NULL auto_increment, `persid` int(11) NOT NULL default '0', `colorid` int(11) NOT NULL default '0', PRIMARY KEY (`perscolorid`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=7 ; |
|
-- -- Dumping data for table `perscolor` -- INSERT INTO `perscolor` VALUES (1, 1, 1); INSERT INTO `perscolor` VALUES (2, 1, 2); INSERT INTO `perscolor` VALUES (3, 2, 1); INSERT INTO `perscolor` VALUES (5, 3, 3); INSERT INTO `perscolor` VALUES (6, 3, 2); -- -------------------------------------------------------- -- -- Table structure for table `persmaterial` -- CREATE TABLE `persmaterial` ( `persmatid` int(11) NOT NULL auto_increment, `persid` int(11) NOT NULL default '0', `materialid` int(11) NOT NULL default '0', PRIMARY KEY (`persmatid`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=6 ; -- -- Dumping data for table `persmaterial` -- INSERT INTO `persmaterial` VALUES (1, 1, 1); INSERT INTO `persmaterial` VALUES (2, 1, 2); INSERT INTO `persmaterial` VALUES (3, 2, 1); INSERT INTO `persmaterial` VALUES (5, 3, 2); -- -------------------------------------------------------- -- -- Table structure for table `person` -- CREATE TABLE `person` ( `persid` int(11) NOT NULL auto_increment, `name` varchar(30) NOT NULL default '', PRIMARY KEY (`persid`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=4 ; -- -- Dumping data for table `person` -- INSERT INTO `person` VALUES (1, 'john'); INSERT INTO `person` VALUES (2, 'emily'); INSERT INTO `person` VALUES (3, 'liz'); |
![]() |
| Thread Tools | |
| Display Modes | |
| |