dbTalk Databases Forums  

Reasoning behind process instead of thread based arch?

comp.databases.postgresql.general


#11
Tom Lane

Re: Reasoning behind process instead of thread based - 10-27-2004, 05:59 PM






Thomas Hallgren <thhal (AT) mailblocks (DOT) com> writes:
Quote:
Tom Lane wrote:
Right. Depending on your OS you may be able to catch a signal that
would kill a thread and keep it from killing the whole process, but
this still leaves you with a process memory space that may or may not
be corrupted.

It is very common that you either get a null pointer exception (attempt
to access address zero), that your stack will hit a write protected page
(stack overflow), or that you get some sort of arithmetic exception.
These conditions can be trapped and gracefully handled.

That argument has zilch to do with the question at hand. If you use a
coding style in which these things should be considered recoverable
errors, then setting up a signal handler to recover from them works
about the same whether the process is multi-threaded or not. The point
I was trying to make is that when an unrecognized trap occurs, you have
to assume not only that the current thread of execution is a lost cause,
but that it may have clobbered any memory it can get its hands on.
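A minimal Python sketch of the supervisor side of this argument, assuming a Unix system with fork(): each worker runs in its own process, and when any worker dies on a signal the parent assumes shared state may be damaged and decides to reset everything, which is loosely what the PostgreSQL postmaster does after a backend crash. The names `run_worker` and `supervise` are hypothetical, and a worker sending SIGSEGV to itself only stands in for a real fault.

```python
import os
import resource
import signal


def run_worker(crash: bool) -> None:
    """Body of a hypothetical worker process: exit cleanly or die on a signal."""
    if crash:
        resource.setrlimit(resource.RLIMIT_CORE, (0, 0))  # don't leave a core file
        signal.signal(signal.SIGSEGV, signal.SIG_DFL)     # default action: terminate
        os.kill(os.getpid(), signal.SIGSEGV)              # stand-in for an unrecognized trap
    os._exit(0)


def supervise(crash_flags) -> bool:
    """Fork one worker per flag; return True if shared state must be reset."""
    pids = []
    for crash in crash_flags:
        pid = os.fork()
        if pid == 0:  # child process
            try:
                run_worker(crash)
            finally:
                os._exit(1)  # never fall back into the parent's code path
        pids.append(pid)

    must_reset = False
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        if os.WIFSIGNALED(status):
            # The worker died on a signal: it may have scribbled on anything
            # it could reach, so treat all shared state as suspect.
            must_reset = True
    return must_reset
```

The reset decision is made by a parent process that the failing worker could not write into, which is what bounds the damage in this sketch.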

Quote:
I'm not sure I follow. You will be able to bring all threads of one
process to a halt much faster than you can kill a number of external
processes.

Speed is not even a factor in this discussion; or do you habitually
spend time optimizing cases that aren't supposed to happen? The point
here is circumscribing how much can go wrong before you realize you're
in trouble.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match



#12
Thomas Hallgren

Re: Reasoning behind process instead of thread based - 10-28-2004, 02:41 AM






Tom Lane wrote:
Quote:
That argument has zilch to do with the question at hand. If you use a
coding style in which these things should be considered recoverable
errors, then setting up a signal handler to recover from them works
about the same whether the process is multi-threaded or not. The point
I was trying to make is that when an unrecognized trap occurs, you have
to assume not only that the current thread of execution is a lost cause,
but that it may have clobbered any memory it can get its hands on.

I'm just arguing that far from all signals are caused by unrecoverable
errors and that threads causing them can be killed individually and
gracefully.

I can go further and say that in some multi-threaded environments you as
a developer don't even have the opportunity to corrupt memory. In such
environments the recognized traps are the only ones you encounter unless
the environment is corrupt in itself. In addition, there are a number of
techniques that can be used to make it impossible for the threads to
unintentionally interfere with each other's memory.
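A small Python sketch of the "recognized trap" case, with Python standing in for a memory-safe runtime such as a Java VM: an error raised in one worker thread is caught at that thread's boundary, only that session's work is abandoned, and the other threads carry on. `run_sessions` and the session names are hypothetical, and the sketch says nothing about faults in native code, which is the case Tom is concerned with.

```python
import threading


def run_sessions(jobs):
    """Run each job in its own thread; an error stays in the thread that raised it.

    `jobs` maps a hypothetical session name to a zero-argument callable.
    Returns a dict of session name -> result, or the exception it raised.
    """
    results = {}
    lock = threading.Lock()

    def worker(name, job):
        try:
            value = job()
        except Exception as exc:  # a recognized error: end this session only
            value = exc
        with lock:
            results[name] = value

    threads = [threading.Thread(target=worker, args=(name, job))
               for name, job in jobs.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```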

I'm not at all contesting the fact that a single-threaded server
architecture is more bug-tolerant and in some ways easier to manage.
What I'm trying to say is that it is very possible to write even better,
yet very reliable servers using a multi-threaded architecture and high
quality code.

Quote:
... The point here is circumscribing how much can go wrong before you
realize you're in trouble.

Ok now I do follow. With respect to my last comment about speed, I guess
it's long overdue to kill this thread now. Let's hope the forum stays
intact :-)

Regards,
Thomas Hallgren




#13
Thomas Hallgren

Re: Reasoning behind process instead of thread based - 10-28-2004, 05:15 AM



nd02tsk (AT) student (DOT) hig.se wrote:

Quote:
So Thomas, you say you like the PostgreSQL process based model better
than the threaded one used by MySQL. But you sound as if you think the
opposite. I'd like to know why you like processes more.


Ok, let me try and explain why I can be perceived as a scatterbrain :-).

PostgreSQL is a very stable and well functioning product. It is one of
the few databases out there that has a well documented way of adding
plugins written in C, and quite a few plugins exist today. You have all
the server-side languages (PL/pgSQL, PL/Perl, PL/Tcl, PL/Java, etc.)
and a plethora of custom functions and other utilities. Most of this is
beyond the control of the PostgreSQL core team since it's not part of
the core product. It would be extremely hard to convert everything into
a multi-threaded environment and it would be even harder to maintain the
very high quality that would be required.

I think PostgreSQL in its current shape is ideal for a distributed,
Open Source based conglomerate of products. The high quality core, firmly
controlled by the core team, in conjunction with all the surrounding
features, brings you DBMS functionality that is otherwise unheard of in
the free software market. I believe that this advantage is very much due
to the simplicity and bug-resilient single-threaded design of
PostgreSQL.

My only regret is that PL/Java, of which I'm the father, is confined
to one connection only. But that too has some advantages in terms of
simplicity and reliability.

At present, I'm part of a team that develops a very reliable
multi-threaded system (a Java VM). In this role, I've learned a lot
about how high performance thread based systems can be made. If people
on this list want to dismiss multi-threaded systems, I feel they should
do it based on facts. It's more than possible to build a great
multi-threaded server. It is my belief that as PostgreSQL gets more
representation in the high end market, where the advantages of
multi-threaded solutions become more and more apparent, it will find that
the competition from a performance standpoint is sometimes overwhelming.

I can't say anything about MySQL robustness because I haven't used it
much. Perhaps the code quality is indeed below what is required for a
multi-threaded system, perhaps not. I choose PostgreSQL over MySQL
because MySQL lacks some of the features that I feel are essential,
because it does some things dead wrong, and because it is dual licensed.

Hope that cleared up some of the confusion.

Regards,
Thomas Hallgren




#14
Marco Colombo

Re: Reasoning behind process instead of thread based - 10-28-2004, 07:44 AM



[processes vs threads stuff deleted]

In any modern and reasonable Unix-like OS, there's very little difference
between the multi-process and the multi-thread model. _Default_ behaviour
is different, e.g. memory is shared by default for threads, but processes
can share memory as well. There are very few features threads have
that processes don't, and vice versa. And if the OS is good enough,
there are hardly any performance issues.
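To illustrate that processes can share memory too, here is a minimal Python sketch assuming a Unix system: an anonymous `mmap` created before `fork()` is `MAP_SHARED` by default, so a value the child writes is visible to the parent afterwards. `shared_counter_demo` is a hypothetical name used only for this example.

```python
import mmap
import os
import struct


def shared_counter_demo(increments: int) -> int:
    """A child process updates a counter in shared memory; the parent reads it."""
    # On Unix, mmap.mmap(-1, n) is an anonymous mapping with MAP_SHARED by
    # default, so it stays shared between parent and child across fork().
    buf = mmap.mmap(-1, 8)
    struct.pack_into("q", buf, 0, 0)

    pid = os.fork()
    if pid == 0:  # child: bump the shared counter, then exit without cleanup
        try:
            for _ in range(increments):
                (value,) = struct.unpack_from("q", buf, 0)
                struct.pack_into("q", buf, 0, value + 1)
        finally:
            os._exit(0)

    os.waitpid(pid, 0)  # parent: wait for the child, then read what it wrote
    (value,) = struct.unpack_from("q", buf, 0)
    buf.close()
    return value
```

Only one process writes here, so no locking is needed; concurrent writers would need a lock, exactly as threads would.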

I think that it would be interesting to discuss the multi-(process/thread)
model vs. the mono-(process/thread) model. Mono as in _one_ single process/thread
per CPU, not one per session. That is, moving all the "scheduling"
between sessions entirely to userspace. The server gains almost complete
control over the data structures allocated per session, and the resources
allocated _to_ sessions.
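A rough Python sketch of the "one thread, many sessions" idea, assuming each session is simply a socket: a single thread waits on all connections with the standard `selectors` module and serves whichever one has input, so the choice of what runs next is made in user space rather than by the kernel scheduler. `serve_sessions` is hypothetical, and echoing a request in upper case stands in for running a query.

```python
import selectors
import socket


def serve_sessions(conns: "list[socket.socket]", requests: int) -> int:
    """Serve many sessions from one thread; return how many requests were handled.

    Stops after `requests` requests, or when nothing arrives for one second.
    """
    sel = selectors.DefaultSelector()
    for conn in conns:
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ)

    handled = 0
    while handled < requests:
        events = sel.select(timeout=1.0)
        if not events:
            break  # idle: give up instead of waiting forever
        for key, _mask in events:
            data = key.fileobj.recv(1024)
            if data:
                # "Run the query": here just echo it back in upper case.
                key.fileobj.sendall(data.upper())
                handled += 1
            else:
                sel.unregister(key.fileobj)  # client closed its session
    sel.close()
    return handled
```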

I bet this is very theoretical since it'd require a complete redesign
of some core stuff. And I have strong concerns about portability. Still,
it could be interesting.

..TM.
--
____/ ____/ /
/ / / Marco Colombo
___/ ___ / / Technical Manager
/ / / ESI s.r.l.
_____/ _____/ _/ Colombo (AT) ESI (DOT) it

#15
Thomas Hallgren

Re: Reasoning behind process instead of thread based - 10-28-2004, 09:40 AM



Marco Colombo wrote:
Quote:
[processes vs threads stuff deleted]

In any modern and reasonable Unix-like OS, there's very little difference
between the multi-process and the multi-thread model. _Default_ behaviour
is different, e.g. memory is shared by default for threads, but processes
can share memory as well. There are very few features threads have
that processes don't, and vice versa. And if the OS is good enough,
there are hardly any performance issues.

Most servers have a desire to run on Windows-NT and I would consider
Solaris a "modern and reasonable Unix-like OS". On both, you will find a
significant performance difference. I think that's true for Irix as
well. Your statement is very true for Linux-based OSes though.

Quote:
I think that it would be interesting to discuss the multi-(process/thread)
model vs. the mono-(process/thread) model. Mono as in _one_ single process/thread
per CPU, not one per session. That is, moving all the "scheduling"
between sessions entirely to userspace. The server gains almost complete
control over the data structures allocated per session, and the resources
allocated _to_ sessions.

I think what you mean is user space threads, known in the Java community
as "green" threads; Windows calls them "fibers". That approach has been
more or less abandoned by Sun, BEA, and other Java VM manufacturers
since a user space scheduler is confined to one CPU, one process, and
unable to balance the scheduling with other processes and their threads.
A kernel scheduler might be slightly heavier but it does a much better job.
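A minimal Python sketch of the "green threads" idea, using generators as tasks: a round-robin scheduler in user space switches between tasks only when a task yields, and every task runs on the same OS thread, which is why such a scheduler cannot spread work over several CPUs by itself. `run_green_threads` and `task` are hypothetical names.

```python
import threading
from collections import deque


def task(steps: int):
    """A hypothetical green thread: each yield hands control back to the scheduler."""
    for _ in range(steps):
        yield threading.get_ident()  # report which OS thread ran this step


def run_green_threads(tasks):
    """Round-robin scheduler in user space for generator-based tasks.

    Returns the order in which task steps ran and the set of OS thread ids
    the tasks reported.
    """
    ready = deque(tasks.items())
    trace, os_threads = [], set()
    while ready:
        name, gen = ready.popleft()
        try:
            os_threads.add(next(gen))  # run the task until its next yield
        except StopIteration:
            continue  # task finished; drop it
        trace.append(name)
        ready.append((name, gen))  # back of the queue: round robin
    return trace, os_threads
```

Because switching happens only at a `yield`, a task that never yields would also block every other task.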

Regards,
Thomas Hallgren


#16
Marco Colombo

Re: Reasoning behind process instead of thread based - 10-28-2004, 11:23 AM



On Thu, 28 Oct 2004, Thomas Hallgren wrote:

Quote:
Marco Colombo wrote:
[processes vs threads stuff deleted]

In any modern and reasonable Unix-like OS, there's very little difference
between the multi-process and the multi-thread model. _Default_ behaviour
is different, e.g. memory is shared by default for threads, but processes
can share memory as well. There are very few features threads have
that processes don't, and vice versa. And if the OS is good enough,
there are hardly any performance issues.

Most servers have a desire to run on Windows-NT and I would consider Solaris
a "modern and reasonable Unix-like OS". On both, you will find a significant
performance difference. I think that's true for Irix as well. Your statement
is very true for Linux-based OSes though.

See the "if the OS is good enough" part... :-)

AFAIK, many techniques developed under Linux have been included in
recent releases of other OSes. I haven't seen the source, of course.

If recent Solaris still has processes which are actually "heavy", well
I call that "an old legacy (mis-)feature on a modern and reasonable OS"...
Back in '93, Mr. Gates used to state: "NT is Unix". If it's not the case
yet, well, it's not _my_ fault.

Quote:
I think that it would be interesting to discuss the multi-(process/thread)
model vs. the mono-(process/thread) model. Mono as in _one_ single process/thread
per CPU, not one per session. That is, moving all the "scheduling"
between sessions entirely to userspace. The server gains almost complete
control over the data structures allocated per session, and the resources
allocated _to_ sessions.

I think what you mean is user space threads, known in the Java community as
"green" threads; Windows calls them "fibers". That approach has been more or
less abandoned by Sun, BEA, and other Java VM manufacturers since a user
space scheduler is confined to one CPU, one process, and unable to balance
the scheduling with other processes and their threads. A kernel scheduler
might be slightly heavier but it does a much better job.

Regards,
Thomas Hallgren

No. I just meant "scheduling" between PG sessions. I'm not interested in
userspace threads. Those are general purpose solutions, with the drawbacks
you pointed out.

I mean an entirely event driven server. The trickiest part is to handle
N-way. On 1-way, it's quite a clear and well-defined model.

I'm not going to say it's easy. I'd like to move the discussion away from
the sterile processes vs threads issue. Most differences there are
platform specific anyway. The model is the same: one thread of execution
per session. I'm proposing a new model entirely (well I'm proposing
a _discussion_ on a model vs. model basis and not implementation vs
implementation of the same model).

If you read this thread, you'll notice most people miss the point:
either processes or threads, the model is the same, many many actors
that share a big part of their memory. The problems are the same, too.
Should we buy the claim that processes are safer? Of course not, when
they share such a big memory segment. The chance of a runaway
pointer trashing some important shared data is almost the same for both
processes and threads. If one backend crashes with a SIGSEGV, I'd bet
nothing on the shared memory not being corrupted somehow.

My point being: how about [discussing] a completely different model
instead?

..TM.
#17
Thomas Hallgren

Re: Reasoning behind process instead of thread based - 10-28-2004, 02:07 PM



Marco,

Quote:
I mean an entirely event driven server. The trickiest part is to handle
N-way. On 1-way, it's quite a clear and well-defined model.

You need to clarify this a bit.

You say that the scheduler is in user-space, yet there's only one thread
per process and one process per CPU. You state that instead of threads,
you want it to be completely event driven. In essence that would mean
serving one event per CPU from start to end at any given time. What is
an event in this case? Where did it come from? How will this system
serve concurrent users?

Regards,
Thomas Hallgren



#18
Jim C. Nasby

Re: Reasoning behind process instead of thread based - 10-28-2004, 05:47 PM



On Thu, Oct 28, 2004 at 02:44:55PM +0200, Marco Colombo wrote:
Quote:
I think that it would be interesting to discuss the multi-(process/thread)
model vs. the mono-(process/thread) model. Mono as in _one_ single process/thread
per CPU, not one per session. That is, moving all the "scheduling"
between sessions entirely to userspace. The server gains almost complete
control over the data structures allocated per session, and the resources
allocated _to_ sessions.

This is how DB2 and Oracle work. Having scheduling control is very
interesting, but I'm not sure it needs to be accomplished this way.
There are other advantages too; in both products you have a single pool
of sort memory; you can allocate as much memory to sorting as you want
without the risk of exceeding it. PostgreSQL can't do this and it makes
writing code that wants a lot of sort memory a real pain. Of course this
could probably be solved without going to a 'mono process' model.
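A minimal Python sketch of a single shared pool of sort memory, under the assumption that every sort reserves its memory from one server-wide budget before it starts: grants are capped by what is left, so the total handed out can never exceed the configured amount, and a sort that gets less than it asked for would have to spill to disk. `SortMemoryPool` and its methods are hypothetical and are not meant to describe how DB2 or Oracle actually implement their pools.

```python
import threading


class SortMemoryPool:
    """One server-wide budget for sort memory (hypothetical sketch)."""

    def __init__(self, total_kb: int):
        self.total_kb = total_kb
        self.in_use_kb = 0
        self._lock = threading.Lock()  # sessions may reserve concurrently

    def reserve(self, wanted_kb: int) -> int:
        """Grant up to `wanted_kb`, never more than what is still free."""
        with self._lock:
            granted = min(wanted_kb, self.total_kb - self.in_use_kb)
            self.in_use_kb += granted
            return granted

    def release(self, granted_kb: int) -> None:
        """Return memory to the pool when a sort finishes."""
        with self._lock:
            self.in_use_kb -= granted_kb
```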
--
Jim C. Nasby, Database Consultant decibel (AT) decibel (DOT) org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

#19
Marco Colombo

Re: Reasoning behind process instead of thread based - 10-29-2004, 05:38 AM



On Thu, 28 Oct 2004, Thomas Hallgren wrote:

Quote:
Marco,

I mean an entirely event driven server. The trickiest part is to handle
N-way. On 1-way, it's quite a clear and well-defined model.

You need to clarify this a bit.

You say that the scheduler is in user-space, yet there's only one thread per
process and one process per CPU. You state that instead of threads, you want
it to be completely event driven. In essence that would mean serving one
event per CPU from start to end at any given time. What is an event in this
case? Where did it come from? How will this system serve concurrent users?

Let's take a look at the bigger picture. We need to serve many clients,
that is many sessions, that is many requests (queries) at the same time.
Since there may be more than one active request, we need to schedule
them in some way. That's what I meant by "session scheduler".

The traditional accept&fork model doesn't handle that directly: by
creating one process per session, it relies on the process scheduler
in the kernel. I state this is suboptimal, both for extra resources
allocated to each session, and for the kernel policies not being
perfectly tailored to the job of scheduling PG sessions (*).
Not to mention the postmaster has almost no control over these policies.
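For reference, a minimal Python sketch of the accept&fork model, assuming a Unix system: the parent accepts a connection, forks a child that owns that session, closes its own copy of the socket and goes back to accepting, leaving all scheduling between sessions to the kernel. `accept_and_fork` is hypothetical, and upper-casing one request stands in for a real session.

```python
import os
import socket


def accept_and_fork(listener: socket.socket, sessions: int) -> None:
    """Accept `sessions` connections, serving each one in its own child process."""
    children = []
    for _ in range(sessions):
        conn, _addr = listener.accept()
        pid = os.fork()
        if pid == 0:  # child: serve this one session, then exit
            try:
                listener.close()
                request = conn.recv(1024)
                conn.sendall(request.upper())
            finally:
                os._exit(0)
        conn.close()  # parent: the child owns the session now
        children.append(pid)
    for pid in children:
        os.waitpid(pid, 0)  # reap the children
```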

Now, threads help a bit in reducing the per session overhead. But that's
more an implementation detail, and it's _very_ platform specific.
Switching to threads has a great impact on many _details_ of the
server, the benefits depend a lot on the platform, but the model is
just the same, with the same essential problems.
Many big changes for little gain. Let's explore, at least in theory,
the advantages of a completely different model (that implies a lot
of changes too, of course, but at least they would buy us something).

You ask what an event is. An event can be:
- input from a connection (usually a new query);
- notification that I/O needed by a pending query has completed;
- an alarm of some kind, if we don't want a single query to starve the
server (I think this is a corner case, but still possible);
- something else I haven't thought about.

At any given moment, there are many pending queries. Most of them
will be waiting for I/O to complete. That's how the server handles
concurrent users.
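A simulated Python sketch of the event kinds listed above, handled by a single loop: each pending query is a generator that yields "io" when it has to wait for a read and "slice" when an alarm should make it give way. For simplicity the I/O completion is queued immediately instead of being reported by the OS, and `run_event_loop` and the event names are hypothetical.

```python
from collections import deque


def run_event_loop(queries):
    """Handle several sessions' queries from one loop, driven by events (simulation).

    `queries` maps a hypothetical session name to a generator that yields
    "io" when it must wait for a read and "slice" when an alarm says it has
    run long enough. Returns the handled events as (kind, session) pairs.
    """
    events = deque(("input", session) for session in queries)  # a new query per connection
    handled = []
    while events:
        kind, session = events.popleft()
        handled.append((kind, session))
        try:
            request = next(queries[session])  # run this query until it has to wait
        except StopIteration:
            continue  # query finished; nothing left pending for this session
        if request == "io":
            # Simulation: post the I/O completion right away instead of
            # waiting for the OS to report it.
            events.append(("io_done", session))
        else:
            # Alarm: requeue the query so a long one cannot starve the others.
            events.append(("alarm", session))
    return handled
```

While one session waits for its I/O, the loop is free to handle input for the others, which is how this model serves concurrent users.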

Quote:
Regards,
Thomas Hallgren

(*) They're oriented to general purpose processes. Think of how CPU
usage affects relative priorities. In a DB context, there may be
other criteria of greater significance. Roughly speaking, the larger
the part of the data a single session holds locked, the sooner it should
be completed. The kernel has no knowledge of this. To the kernel,
"big" processes are those that are using a lot of CPU. And the policy is
to slow them down. To a DB, "big" queries are those that force the most
serialization ("lock a lot"), and they should be completed as soon as
possible.
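A minimal Python sketch contrasting the two policies in this footnote, assuming each runnable session's CPU usage and number of held locks are already known: a kernel-style order favours sessions that have used little CPU, while the DB-aware order runs the sessions holding the most locks first. The tuple layout and `pick_order` are hypothetical, and a real policy would need more than one number per session.

```python
import heapq


def pick_order(sessions):
    """Compare a kernel-style order with a lock-aware order (hypothetical policy).

    `sessions` is a list of (name, locks_held, cpu_used) tuples.
    """
    # Kernel-style: sessions that have used the least CPU go first.
    kernel_order = [name for name, _locks, _cpu in sorted(sessions, key=lambda s: s[2])]

    # DB-aware: a run queue keyed on locks held, most locks first; the index
    # breaks ties in arrival order.
    run_queue = [(-locks, i, name) for i, (name, locks, _cpu) in enumerate(sessions)]
    heapq.heapify(run_queue)
    db_order = [heapq.heappop(run_queue)[2] for _ in range(len(run_queue))]
    return kernel_order, db_order
```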

..TM.
#20
Chris Browne

Re: Reasoning behind process instead of thread based - 10-29-2004, 02:47 PM



nd02tsk (AT) student (DOT) hig.se writes:
Quote:
Two: If a
single process in a multi-process application crashes, that process
alone dies. The buffer is flushed, and all the other child processes
continue happily along. In a multi-threaded environment, when one
thread dies, they all die.

So this means that if a single connection thread dies in MySQL, all
connections die?

Yes, that's right.

Quote:
Seems rather serious. I am doubtful that is how they have
implemented it.

If it's a multithreaded application, then there is nothing to doubt
about the matter. If any thread dies, the whole process croaks, and
there's no choice in the matter. If a thread has been corrupted to
the point of crashing, then the entire process has been corrupted.
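A minimal Python sketch of the fatal case, assuming a Unix system and isolated in a throwaway child process so the caller survives: one worker thread in the child sends SIGSEGV to its own process with the default disposition, and the parent then sees the entire child killed by that signal, not just the one thread. `crash_one_thread_in_child` is hypothetical, and a self-sent signal only stands in for a real fault. Errors that a runtime catches and turns into exceptions, the case Thomas describes earlier, are not covered here.

```python
import os
import resource
import signal
import threading


def crash_one_thread_in_child() -> int:
    """Return the signal number that killed a child whose one worker thread crashed."""
    pid = os.fork()
    if pid == 0:  # child process
        try:
            resource.setrlimit(resource.RLIMIT_CORE, (0, 0))  # don't leave a core file
            signal.signal(signal.SIGSEGV, signal.SIG_DFL)     # default action: terminate

            def bad_thread():
                # Stand-in for a thread hitting a fatal fault.
                os.kill(os.getpid(), signal.SIGSEGV)

            worker = threading.Thread(target=bad_thread)
            worker.start()
            worker.join()  # the process is normally gone before this returns
        finally:
            os._exit(0)  # only reached if the process somehow survived
    _, status = os.waitpid(pid, 0)
    return os.WTERMSIG(status) if os.WIFSIGNALED(status) else 0
```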
--
let name="cbbrowne" and tld="cbbrowne.com" in String.concat "@" [name;tld];;
http://www.ntlug.org/~cbbrowne/linuxxian.html
A VAX is virtually a computer, but not quite.