what are the differences between erlang process and pthread threads?

hello,

i'd like to find out about the differences between erlang processes and pthreads threads -- not in terms of the use of them but in what they actually are -- what the process and thread consists of -- how processes and threads differ and what those differences mean/amount to. i know an erlang process is much more economical (probably in both memory and cpu use) than a thread -- what is it that threads have and processes don't? stacks? each thread has a stack of it's own i think; do processes not? -- those are the kind of differences i'd like to find out about -- where can i read about this kind of thing? i notice there's a c project called "Protothreads: lightweight, stackless threads in C" ( http://www.sics.se/~adam/pt/index.html ) -- how similar to a process is one of these protothreads do you think (if you happen to know anything about protothreads)?

any info on this would be great.

thanks, ben.

"Green thread" or "microthread" is probably a better search term

"Green thread" or "microthread" are some of the more generic terms used here. I'm not sure if the erlang team called them processes because they felt the name was more descriptive, or because they thought something like 'microthread' would be dismissed in the same way that people dismiss 'scripting languages'. There are a few big differences compared to 'normal' threads.

First is that they require much lower overhead. Normal threads create a stack for each thread which ends up hogging memory if you try to create hundreds or thousands of threads.

Second is that switching between threads is controlled by the application, not the OS.

Third is that processes are isolated and the only permissible way to exchange information is message passing. This eliminates the need to use mutexes, semaphores, etc, and simplifies writing thread safe code.

Anyway, if you do some research on green threads or microthreads, you'll get some pretty good general information on the differences compared to OS threads. Most should be applicable to erlang.

By grant at Thu, 2006-08-24 16:01 | login or register to post comments

Additional note.

Second is that switching between threads is controlled by the application, not the OS.

Just to be a bit more explicit: This is also critical to performance when using huge numbers of threads since switching between threads is a few machine instructions (cheap) versus doing a full-blown context switch (expensive).

By Bárður Árantsson at Thu, 2006-08-24 17:36 | login or register to post comments

Cost of context switches

These days much (most?) of the expense of a context switch is all the cache lines that are suddenly stale. I'd expect thread switches to incur this cost whether or not the operating system was involved.

By Tom Duff at Thu, 2006-08-24 18:18 | login or register to post comments

Aagh!

That was supposed to be in reply to BÃ¡rÃ°ur's comment, but apparently I can't reliably identify the correct Reply button.

By Tom Duff at Thu, 2006-08-24 18:20 | login or register to post comments

Right

A machine-level context switch is only a few instructions anyway (depending on the chip, of course); and, no matter what you do, if two threads are accessing different physical memory, the switch will probably flood the cache.

In fact, come to think of it, writing in a threading model that shares memory might reduce cache flooding, since two threads can share the same physical memory.

By John Stracke at Thu, 2006-08-24 19:39 | login or register to post comments

Empirical data show that OS threads are indeed more expensive

See Computer Language Shootout (and here).

Ignore C# and C++ there, their implementations use a different algorithm which gives them an unfair advantage.

By Qrczak at Thu, 2006-08-24 22:01 | login or register to post comments

Ref. doesn't address the problem?

From the problem statement (500 threads, each increments a number and passes it on to the next) it sounds like all the data for all the threads should easily fit in modern caches, so the massive-staleness problem shouldn't come up. Also, this is really a coroutine problem. Only one thread is eligible to run at each context switch, so there's no scheduling work to do -- Cardelli & Pike's Squeak compiler would turn the program into a single assignment statement, with a little optimization from the C compiler on its back end.

That said, I have no doubt that going through the OS should be more expensive, if only because you're likely to suffer 2 context switches for each thread swap. My only question is whether the added overhead is significant in realistic situations.

By Tom Duff at Thu, 2006-08-24 23:08 | login or register to post comments

Hmm...

I'm going by tests on hardware ca. 3-4 years ago or so here, I guess should have added a disclaimer there... These were of course totally artifical benchmarks, but an OS-level context switch on a Pentium3 1GHz (mobile) was roughly 100-200 instructions actual time IIRC (two threads using almost no memory), whereas a non-OS context switch was a few instructions in actual time. The benchmarks did not take memory access into account and it seems very plausible that that would indeed reduce the difference between the two considerably in real-world cases. (I mean "reduce" in the sense that cache misses/memory fetches take so long that the actual cost of the context switch itself doesn't matter any more.)

I wonder, though, if a scheduler which is built into the compiler and has deep knowledge of the language could make better judgements about scheduling to minimize cache misses. It would almost certainly be able to better predict how large the working set of any given thread/context would be, and how it could schedule a number of threads to keep the working sets of those threads in cache at the same time. (We may be into "magic happens here" territory though.)

Anyway, correction much appreciated.

By Bárður Árantsson at Thu, 2006-08-24 22:25 | login or register to post comments

Cache

With kernel-space context switching all your cache lines are pretty much guaranteed to go stale because caches in modern computers are virtually rather than physically mapped. (The alternative is go through the page table or TLB with each memory access which hurts the average case.) At least with user-space context switching there is a chance of relatively peaceful coexistence between processes with small working sets as far as the cache is concerned.

Also, I think the minimum overhead of a hardware context switch on a modern x86 box is just below 1 ms. That's definitely nothing to sneeze at although I do agree that the cache effects of context switching are going to be dominant.

By Per Vognsen at Fri, 2006-08-25 08:34 | login or register to post comments

russian doll process arch

bend: i'd like to find out about the differences between erlang processes and pthreads threads [snip...] how processes and threads differ and what those differences mean/amount to.

I think about this a lot in a long-running design problem I'm cooking on a mental backburner. (If I write a FAQ for a new runtime someday, one question might easily be this one. :-) I very nearly wrote on this topic in a recent LtU thread on "fixing Lisp", but my reply got out of hand, so I shelved it. But I'll include a part of it in this post: I picked the phrase russian doll process architecture to name an idea. (Pronouce rudopa the same as Europa: roo-DOH-puh.)

The distinguishing characteristic of a process (as opposed to thread) usually has something do with with the address space. For example, a process is typically described as having an address space independent from other processes. Under Linux for example, each process has a separate address space (if you ignore shared memory). Threads are control flows with concurrent access to the same memory, which introduces the whole mutable shared memory access control problem.

But what you really mean by process and address space might depend on your runtime and how narrowly you want to define things. You can get different sorts of process like results by allowing variations in what you mean by address space, where each variation focuses on the idea of keeping memory partitions disjoint from one another. You can use more than one together at the same time in one runtime. You can even nest them inside one another, as in the classic russian doll meme.

Assuming you want to host a runtime in some existing operating system, you'll take one process as given from the host OS. But inside this process you can have lightweight processes. If you only had two kinds of process -- host and lightweight -- then you'd be able to get by with the idea of lightweight process alone. However, you can have more than one type of lightweight process, and they can also nest inside each other. (I thought rudopa -- for russian doll process architecture -- was a plausible way to refer to this.)

An Erlang process fits the address space qualification for process because Erlang data is immutable. If you can't ask objects about their addresses, and if you can't change state to leave breadcrumbs for analysis, then you have no objective means to detect the scope of an address space. So if Erlang messaging objects don't send data to each other, they might as well be in different address spaces becuase they can't interfere with each other in a shared mutable state scenario. (You can also share state between lightwieght processes and this is undetectable if the state doesn't change; so sharing this way is an optimization over indepedent copies.)

There are at least a couple other ways to do lightweight processes, and I expect to use both of them in addition to the Erlang style immutable state event handlers. (Let's use const process to refer to an Erlang style process.) First, you can have a VM with disjoint address spaces enforced by the VM, so multiple lightweight process in one host process can't affect one another. Let's call this a sandbox process. Second, you can have a memory allocation runtime partitioning one host address space into disjoint graphs of objects, so each is partition is garbage collected independently of any other. Let's call each of these a plot process (using a real estate metaphor for land plots), since disjointness is nominal but legally mediated by the runtime (you can ask whether an address is inside your plot).

Anyway, each of the process models give you a different form of management and control over memory usage, and each has a different take on avoiding interference under concurrency without using synchronization mechanisms, because no mutable state is shared.

Inside a host process you can have multiple plot processes (and maybe some sandbox processes), and in each plot process you might have a way to mark memory as immutable so it can be fed into Erlang style handlers that run as const processes.

By Rys McCusker at Thu, 2006-08-24 20:12 | login or register to post comments

ToonTalk

ToonTalk's houses would be const processes, I suppose, if they are written without using sensors (leaving bird and nest as the only communication). And I see no particular reason why they would need stacks, so they are probably pretty green.

As in the actor model, when a house "fires", it cannot loop nor hang, so can't take long to finish its turn. It's basically executing a rewrite rule.

By Jack Waugh at Fri, 2006-08-25 13:31 | login or register to post comments

Lambda the Ultimate

User login

Navigation