dbTalk Databases Forums  

File Summary Storage location



Discuss File Summary Storage location in the comp.databases forum.



  #1  
J de Boyne Pollard
 

File Summary Storage location - 11-07-2007, 08:59 AM






P> [...] thousands of files that are rapidly being created and deleted
[...]
P> Currently our system creates triplets for files (3 files per job
P> instance). [...]
P> Files are received by one multi-threaded process, handled and
P> delegated by 2nd multi-threaded process and finally processed
P> and deleted by 3rd multi-thread process. [...]

Modern electronic mail transport systems have much the same design. I
strongly suggest reviewing them in order to learn from existing
designs. You can start with reading how "qmail" manages its queue:
<URL:http://qmail-ldap.org./wiki/Man/Misc/INTERNALS> Notice how i-
node numbers are used to guarantee uniqueness, how processing state
guarantees work, and how the system arranges to recover from outages.
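
As a rough illustration of the i-node idea (a minimal sketch, not qmail's actual code): the writer creates the file under a provisional name, asks the filesystem for the file's i-node number, and renames the file to that number. Two files that exist at the same time on one filesystem cannot share an i-node number, so the final names cannot collide. The directory names `pid` and `mess` follow the qmail INTERNALS description; the function name `enqueue`, the provisional naming scheme, and the choice of Python are assumptions made for the example, and it assumes a POSIX-style filesystem.

```python
import os
import threading


def enqueue(queue_dir: str, payload: bytes) -> str:
    """Store payload in the queue under a name derived from its i-node number."""
    pid_dir = os.path.join(queue_dir, "pid")
    mess_dir = os.path.join(queue_dir, "mess")
    os.makedirs(pid_dir, exist_ok=True)
    os.makedirs(mess_dir, exist_ok=True)

    # Provisional name: unique per process and thread (hypothetical scheme).
    provisional = os.path.join(
        pid_dir, "%d.%d" % (os.getpid(), threading.get_ident())
    )
    # O_EXCL makes creation fail rather than reuse a leftover file.
    fd = os.open(provisional, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        inode = os.fstat(fd).st_ino  # unique among files that currently exist
        os.write(fd, payload)
        os.fsync(fd)                 # make the data durable before publishing
    finally:
        os.close(fd)

    # Rename within one filesystem keeps the i-node, so the name stays valid.
    final = os.path.join(mess_dir, str(inode))
    os.rename(provisional, final)
    return final
```

Because the rename happens only after the data has been written and synced, a reader scanning `mess` never sees a half-written file, and anything left behind in `pid` after a crash can be treated as abandoned.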


  #2  
Pops
 

Re: File Summary Storage location - 11-07-2007, 06:34 PM






J de Boyne Pollard wrote:

> P> [...] thousands of files that are rapidly being created and deleted
> [...]
> P> Currently our system creates triplets for files (3 files per job
> P> instance). [...]
> P> Files are received by one multi-threaded process, handled and
> P> delegated by 2nd multi-threaded process and finally processed
> P> and deleted by 3rd multi-thread process. [...]
>
> Modern electronic mail transport systems have much the same design.

Funny you recognize this. I've been in the electronic mail business
since the mid 80s. :-)

> I strongly suggest reviewing them in order to learn from existing
> designs. You can start with reading how "qmail" manages its queue:
> <URL:http://qmail-ldap.org./wiki/Man/Misc/INTERNALS> Notice how i-
> node numbers are used to guarantee uniqueness, how processing state
> guarantees work, and how the system arranges to recover from outages.

I will have to check it out more thoroughly. The last time I looked at
qmail was to see how it and others handle multiple response lines, to
address a clarification concern in the SMTP specification update
(RFC 2821bis).

I can't speak about qmail today, but a good deal of our framework's
problem comes from the legacy logic it still carries to support the old
slip/uucp naming convention and transport methods. This is where we got
the 64k sequential limit (actually 5 digits). Sendmail had the same
problem before Eric modernized it by completely letting go of slip/uucp
for 100% SMTP support. John Klensin (current RFC 2821 editor) told me as
much: I will have to let go just like Eric did. :-)

Of course, Sendmail has far more large customers, but I always thought
our mailer was head and shoulders faster, so we didn't see the issues as
quickly as Sendmail did. In addition, we were always more focused on
dynamic SMTP operations, so it does more "hand-holding" out of the box
before mail is received. That is another scale consideration, and one
that conflicts with today's heavy anti-spam scanning needs. But the
recommended direction is already set for SMTP mailers in the new
RFC 2821bis specification: do more dynamic validation before reception
to reduce bounce attacks. So WCSMTP is a little ahead of the curve in
this respect. :-)

We did make some changes back in 2000 to reduce potential clobbering.

Overall, here is where I will be making changes:

- ProcessA(), the SMTP receiver, uses GetTempFileName() to spool
incoming files. It signals ProcessB(). This is change #1.

- ProcessB(), the SMTP router, processes the spool, creates the
UUCP-format triplets (*.DAT/XQT/CMD) for outbound mail and the
pairs (*.D/*X) for local imports, and moves them to a gateway spool.
Change #2 deals with the legacy 8.3 file naming here.

- ProcessC(), the gateway, imports the *.D/X for local hosted mail.

- ProcessD(), the SMTP outbound, processes the *.DAT/XQT/CMD files by
moving them into another outbound queue.

In change #1, we get away from GetTempFileName().
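
The post does not say what will replace GetTempFileName(). One common approach, sketched here purely as an assumption, is to pick a name from a much larger space and create the file with an exclusive-create flag, retrying on the rare collision. GetTempFileName() itself uses only the low 16 bits of its unique number, so it can produce at most 65,535 distinct names for a given directory and prefix. The function name `create_spool_file`, the `.spl` extension, and the use of Python are hypothetical.

```python
import os
import uuid
from typing import Tuple


def create_spool_file(spool_dir: str, suffix: str = ".spl") -> Tuple[int, str]:
    """Create a new, empty spool file and return (file descriptor, path)."""
    while True:
        # 128-bit random name: collisions are extremely unlikely, but the
        # exclusive create below still guards against them.
        path = os.path.join(spool_dir, uuid.uuid4().hex + suffix)
        try:
            # O_EXCL makes creation fail instead of opening an existing file.
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        except FileExistsError:
            continue  # name already taken, try another one
        return fd, path
```

The caller owns the returned descriptor and is responsible for writing to it and closing it.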

In change #2, ProcessB() is using an atomic sequence number for the
triplets and pairs (this was the 2000 change). But it is still limited
by the 8.3 filename (actually %05dW): the sequence number is typecast to
16 bits and thus presents a potential bottleneck under high loads.
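
To make the limit concrete, here is a small sketch of how a sequence number truncated to 16 bits and formatted with `%05dW` repeats after 65,536 names. It assumes the cast is to an unsigned 16-bit value. The helper names `legacy_name` and `wide_name`, and the wider hexadecimal format, are assumptions for illustration, not the product's code.

```python
def legacy_name(seq: int) -> str:
    """Name built from a sequence number truncated to 16 bits, as in %05dW."""
    return "%05dW" % (seq & 0xFFFF)


def wide_name(seq: int) -> str:
    """Hypothetical replacement: 64-bit sequence number as 16 hex digits."""
    return "%016xW" % (seq & 0xFFFFFFFFFFFFFFFF)


print(legacy_name(1))       # 00001W
print(legacy_name(65535))   # 65535W
print(legacy_name(65537))   # 00001W again: the name has wrapped around
print(wide_name(65537))     # 0000000000010001W
```

Once the legacy name wraps, a new job can collide with an older file that has not been processed yet, which is why the counter width matters under high load.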

So once we deprecate the uucp/slip baloney, we should be ok. I can't
wait. :-)

Thanks

--
HLS

