dbTalk Databases Forums  

How to find holes in a sequence?

comp.databases.mysql comp.databases.mysql


Discuss How to find holes in a sequence? in the comp.databases.mysql forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Steve Ackman
 
Posts: n/a

Default How to find holes in a sequence? - 05-05-2008 , 02:00 PM






I've got a leafnode directory with usenet articles
named, roughly, 1-79389.

After running a sh script on the directory to populate a
table with various headers and the full article, I have,

mysql> select count(*) from rcm;
+----------+
Quote:
count(*) |
+----------+
79303 |
+----------+
1 row in set (0.00 sec)

and an actual article count on the directory of 79347.

Obviously, there are 44 articles that for some
reason or other, failed the INSERT (and an additional
42 that don't exist as filenames). Given that
server_id (mediumint) is assigned the filename of the
article, how can I find which numbers between 1 and
79389 *don't* exist as server_id?


Reply With Quote
  #2  
Old   
cvh@LE
 
Posts: n/a

Default Re: How to find holes in a sequence? - 05-05-2008 , 03:57 PM






On May 5, 9:00*pm, Steve Ackman <st... (AT) SNIP-THIS (DOT) twoloonscoffee.com>
wrote:
Quote:
* I've got a leafnode directory with usenet articles
named, roughly, 1-79389.

After running a sh script on the directory to populate a
table with various headers and the full article, I have,

mysql> select count(*) from rcm;
+----------+
| count(*) |
+----------+
| * *79303 |
+----------+
1 row in set (0.00 sec)

and an actual article count on the directory of 79347.

* Obviously, there are 44 articles that for some
reason or other, failed the INSERT (and an additional
42 that don't exist as filenames). *Given that
server_id (mediumint) is assigned the filename of the
article, how can I find which numbers between 1 and
79389 *don't* exist as server_id?
I'm a bit unsure whether I correctly understand what you try to
accomplish,
but given you have a table with all numbers 1..79389 you could simply
run a denying left join query on both tables
like:

select
f.id as missing
from fulltable f left join leafnodes l on f.id=l.nodeid
where isnull(l.nodeid)

but I am afraid I'm missing the point here



Reply With Quote
  #3  
Old   
Steve Ackman
 
Posts: n/a

Default Re: How to find holes in a sequence? - 05-05-2008 , 05:46 PM



In <9b3a64b4-aeab-43fb-bc07-ef960a2c34b0 (AT) y38g2000hsy (DOT) googlegroups.com>,
on Mon, 5 May 2008 13:57:43 -0700 (PDT), cvh@LE,
christian.hansel (AT) cpi-service (DOT) com wrote:
Quote:
On May 5, 9:00*pm, Steve Ackman <st... (AT) SNIP-THIS (DOT) twoloonscoffee.com
wrote:
* I've got a leafnode directory with usenet articles
named, roughly, 1-79389.

After running a sh script on the directory to populate a
table with various headers and the full article, I have,

mysql> select count(*) from rcm;
+----------+
| count(*) |
+----------+
| * *79303 |
+----------+
1 row in set (0.00 sec)

and an actual article count on the directory of 79347.

* Obviously, there are 44 articles that for some
reason or other, failed the INSERT (and an additional
42 that don't exist as filenames). *Given that
server_id (mediumint) is assigned the filename of the
article, how can I find which numbers between 1 and
79389 *don't* exist as server_id?

I'm a bit unsure whether I correctly understand what you try to
accomplish,
but given you have a table with all numbers 1..79389
I'm not sure whether you think I have such a table,
or whether you're suggesting I create one.

Quote:
you could simply
run a denying left join query on both tables
like:

select
f.id as missing
from fulltable f left join leafnodes l on f.id=l.nodeid
where isnull(l.nodeid)
I have one table rcm. That's it. I guess I can
look up denying left joins and see where that leads me.
Probably to understanding what you're doing there.

Quote:
but I am afraid I'm missing the point here
The source is a directory containing filenames in the
range of 3 to 79389.

The table "rcm" has 79303 records with server_id field
in the range of those above filenames... so some files
failed to be INSERTED by the script.

Very basically, given a table with record id numbers
of 2,3,4,6,7,8,9, which numbers are missing? How do I
get MySQL to tell me that 1 and 5 are not there?

The ultimate point is to find out which 44 scripted
INSERT statements failed and why, to fix the script.

I could find the missing articles via a couple of
additional steps in the shell script, but it takes
hours each time I run it, so I was hoping MySQL would
have some sort of easy built-in way to do it.


Reply With Quote
  #4  
Old   
Peter H. Coffin
 
Posts: n/a

Default Re: How to find holes in a sequence? - 05-05-2008 , 07:44 PM



On Mon, 5 May 2008 18:46:19 -0400, Steve Ackman wrote:
Quote:
In <9b3a64b4-aeab-43fb-bc07-ef960a2c34b0 (AT) y38g2000hsy (DOT) googlegroups.com>,
on Mon, 5 May 2008 13:57:43 -0700 (PDT), cvh@LE,
christian.hansel (AT) cpi-service (DOT) com wrote:
On May 5, 9:00*pm, Steve Ackman <st... (AT) SNIP-THIS (DOT) twoloonscoffee.com
wrote:
* I've got a leafnode directory with usenet articles
named, roughly, 1-79389.

After running a sh script on the directory to populate a
table with various headers and the full article, I have,

mysql> select count(*) from rcm;
+----------+
| count(*) |
+----------+
| * *79303 |
+----------+
1 row in set (0.00 sec)

and an actual article count on the directory of 79347.

* Obviously, there are 44 articles that for some
reason or other, failed the INSERT (and an additional
42 that don't exist as filenames). *Given that
server_id (mediumint) is assigned the filename of the
article, how can I find which numbers between 1 and
79389 *don't* exist as server_id?

I'm a bit unsure whether I correctly understand what you try to
accomplish,
but given you have a table with all numbers 1..79389

I'm not sure whether you think I have such a table,
or whether you're suggesting I create one.
Create one.

ome web searching will turn up a variety of tricks for creating a table
filled with test data.

--
43. I will maintain a healthy amount of skepticism when I capture the beautiful
rebel and she claims she is attracted to my power and good looks and will
gladly betray her companions if I just let her in on my plans.
--Peter Anspach's list of things to do as an Evil Overlord


Reply With Quote
  #5  
Old   
cvh@LE
 
Posts: n/a

Default Re: How to find holes in a sequence? - 05-06-2008 , 02:08 AM



On 6 Mai, 00:46, Steve Ackman <st... (AT) SNIP-THIS (DOT) twoloonscoffee.com>
wrote:
Quote:
In <9b3a64b4-aeab-43fb-bc07-ef960a2c3... (AT) y38g2000hsy (DOT) googlegroups.com>,
on Mon, 5 May 2008 13:57:43 -0700 (PDT), cvh@LE,



christian.han... (AT) cpi-service (DOT) com wrote:
On May 5, 9:00*pm, Steve Ackman <st... (AT) SNIP-THIS (DOT) twoloonscoffee.com
wrote:
* I've got a leafnode directory with usenet articles
named, roughly, 1-79389.

After running a sh script on the directory to populate a
table with various headers and the full article, I have,

mysql> select count(*) from rcm;
+----------+
| count(*) |
+----------+
| * *79303 |
+----------+
1 row in set (0.00 sec)

and an actual article count on the directory of 79347.

* Obviously, there are 44 articles that for some
reason or other, failed the INSERT (and an additional
42 that don't exist as filenames). *Given that
server_id (mediumint) is assigned the filename of the
article, how can I find which numbers between 1 and
79389 *don't* exist as server_id?

I'm a bit unsure whether I correctly understand what you try to
accomplish,
but given you have a table with all numbers 1..79389

* I'm not sure whether you think I have such a table,
or whether you're suggesting I create one.

you could simply
run a denying left join query on both tables
like:

select
f.id as missing
from fulltable f left join leafnodes l on f.id=l.nodeid
where isnull(l.nodeid)

* I have one table rcm. *That's it. *I guess I can
look up denying left joins and see where that leads me.
Probably to understanding what you're doing there.

*but I am afraid I'm missing the point here

* The source is a directory containing filenames in the
range of 3 to 79389.

* The table "rcm" has 79303 records with server_id field
in the range of those above filenames... so some files
failed to be INSERTED by the script.

* Very basically, given a table with record id numbers
of 2,3,4,6,7,8,9, which numbers are missing? *How do I
get MySQL to tell me that 1 and 5 are not there?

* The ultimate point is to find out which 44 scripted
INSERT statements failed and why, to fix the script.

* I could find the missing articles via a couple of
additional steps in the shell script, but it takes
hours each time I run it, so I was hoping MySQL would
have some sort of easy built-in way to do it.
I am abit surprised that running a shell script over a directory
should take so long.

Using Linux tools like awk with piped output to mysql a table (given
its created) of all filenames/nodes in a directory can easily be
created by something like:

ls -l |awk '$1!~/^d/{print "INSERT INTO mytable set filename=
\""$8"\";"}' |mysql -uxxxx -pxxxx mydatabase

of course this would take longer if your putting fulltext content in
the database, too. However this little snippet should suffice to fill
a table of all files in a directory

If you only want files of a given pattern (e.g. your integer-numbered
filename) change the snippet to something like

ls -l |awk '$1!~/^d/ && $8~/^[0-9]+/{print "INSERT INTO mytable set
filename=\""$8"\";"}' |mysql -uxxxx -pxxxx mydatabase

if your filenames contain extensions like ".png" or ".msg" etc. you
also will need to change the join condition using the substring_index
function like:

from fulltable f left join leafnodes l on substring_index(f.name,",",
1)=l.nodeid

hope this helps a bit



Reply With Quote
  #6  
Old   
Steve Ackman
 
Posts: n/a

Default Re: How to find holes in a sequence? - 05-06-2008 , 05:23 PM



In <41e20971-4fa0-41eb-9eff-b2bcddb1d6a5 (AT) d45g2000hsc (DOT) googlegroups.com>,
on Tue, 6 May 2008 00:08:15 -0700 (PDT), cvh@LE,
christian.hansel (AT) cpi-service (DOT) com wrote:

Quote:
I am abit surprised that running a shell script over a directory
should take so long.
It takes ~34 minutes on this machine just to do this:

#! /bin/sh

i=3
while [ $i -le 79389 ]
do
echo $i
mysql news <<MARKER
INSERT into count
(id)
VALUES
($i);
MARKER
i=$((i+1))
done

Now that it's made, I can look at your earlier post
re: left join deny to see if I can actually make any
sense of it. ;-)

Quote:
Using Linux tools like awk with piped output to mysql a table (given
its created) of all filenames/nodes in a directory can easily be
created by something like:

ls -l |awk '$1!~/^d/{print "INSERT INTO mytable set filename=
\""$8"\";"}' |mysql -uxxxx -pxxxx mydatabase
I'm breaking out headers (with awk/sed) so I can do:

mysql news <<MARKER
# DELETE FROM rcm WHERE server_id = $i;
INSERT into rcm
(server_id,Message_ID,Sender,Subject,Date,User_Age nt,NNTP_Posting_Host,X_Trace,Content_Type,Xref,Art icle)
VALUES
($i,"$Message_ID","$Sender",'$Subject','$Date','$U ser_Agent','$NNTP_Posting_Host','$X_Trace','$Conte nt_Type','$Xref',"$Article");
MARKER

Quote:
of course this would take longer if your putting fulltext content in
the database, too.
I haven't decided about fulltext yet... or whether
to include Lines and Bytes.

Quote:
However this little snippet should suffice to fill
a table of all files in a directory
I don't see that simply filling a table with files
would be particularly useful (would it?), though I
suppose searching keywords would be quicker than using
grep.



Reply With Quote
  #7  
Old   
Helmut Wais
 
Posts: n/a

Default Re: How to find holes in a sequence? - 05-07-2008 , 04:58 AM



Steve Ackman wrote:
Quote:
In <41e20971-4fa0-41eb-9eff-b2bcddb1d6a5 (AT) d45g2000hsc (DOT) googlegroups.com>,
on Tue, 6 May 2008 00:08:15 -0700 (PDT), cvh@LE,
christian.hansel (AT) cpi-service (DOT) com wrote:

I am abit surprised that running a shell script over a directory
should take so long.

It takes ~34 minutes on this machine just to do this:

#! /bin/sh

i=3
while [ $i -le 79389 ]
do
echo $i
mysql news <<MARKER
INSERT into count
(id)
VALUES
($i);
MARKER
i=$((i+1))
done
which uses one DB connection and one mysql process for each row.

try this:

( i=3
while [ $i -le 79389 ]
do
echo "INSERT into count (id) VALUES ($i);"
i=$((i+1))
done
) | mysql news

regards, helmut


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.