out of memory

When a resource like memory is exhausted, does this need special handling in a programming language? I see a way to avoid it, so my answer is no. But I wonder what kind of argument would require a yes instead. Note that I rarely find paper references useful or interesting, since I like basic ideas summarized at a granularity of dozens to hundreds of words.

(This is an offshoot of the Experiment discussion, where out-of-memory conditions came up. I thought it might be interesting to discuss memory limits as they affect programming languages, if they do.)

Resource allocation can look like a system call to a PL (programming language), so running out of memory can be handled by the environment. A runtime hosting a PL can react instead. For example, in a lightweight process system, the max memory used by one process might be limited. Hitting the limit occurs inside a call, blocking the group/process/thread/fiber. What happens next depends on configuration, process architecture, and registered preference. Options include killing a process group, relaxing limits in triage, and/or blocking until space is freed by another scheduled entity under the same limit-scope mechanism.
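As a minimal sketch of the options above (Python, with all names hypothetical), a runtime might track each lightweight process's footprint and consult a per-process policy when an allocation would cross its limit:

    class QuotaExceeded(Exception):
        pass

    class LightweightProcess:
        def __init__(self, limit_bytes, policy="signal"):
            self.limit = limit_bytes      # max bytes this process may hold
            self.used = 0                 # current footprint statistic
            self.policy = policy          # "kill", "signal", or "block"
            self.overdrawn_events = []    # queue drained by a handler fiber

        def allocate(self, nbytes):
            if self.used + nbytes > self.limit:
                if self.policy == "kill":
                    raise QuotaExceeded("over limit; scheduler kills the group")
                elif self.policy == "signal":
                    # wake a fiber registered for "you're overdrawn" events
                    self.overdrawn_events.append(nbytes)
                else:
                    # "block": park this fiber until a peer under the same
                    # limit scope frees space (omitted in this sketch)
                    pass
            self.used += nbytes

        def free(self, nbytes):
            self.used -= nbytes

The point is only that the decision lives in scheduler/runtime configuration, not in the hosted language.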

An "important" process ought to have an out-of-memory handler, to react more flexibly than letting that process be killed. In a lightweight process system, this can be a signal awakening a fiber handling high priority event notifications like "you're overdrawn". From a PL perspective, the code is just event handling, like any other event. A runtime or operating system sees a resource exhausted, so a scheduler responds and not the PL. Is there a reason why a PL should get any say?

I can imagine a PL resisting resource management by the OS and/or runtime because it has no concept of running inside bounded space: limits imposed by the environment never amount to enough room, the PL can't stay inside them, and handlers fire constantly to no productive result. But that seems like a language arrogating to itself an OS resource-handling role without suitable responsibility and competence. Failing to live inside limits seems more flaw than feature.

My point of view isn't interesting to me because I understand it. I'd prefer to hear different views so I learn something. Is there a reason why a PL provides special value by handling resource exhaustion more directly than the host environment?


computers are

Computers are finite state machines. Programming languages that treat them as Turing machines are fantastical. Virtualizations (e.g. via the OS) that lie about the size of the finite state machine are also fantastical.

Fantastical doesn't mean useless but my sense is that it does mean immature.

Rings and Things

I agree with treating computers as FSMs, and also with treating integers as number-theoretic groups (DiscreteArchimedianRing?).

As for lying about the size, I am not sure. Virtual memory could be seen as a caching layer (like the caches on the CPU). Cache-oblivious algorithms let you write optimal code without knowing the size of the cache. You don't want your code to rely on the size of the CPU's level 1, 2, or 3 cache, nor should it rely on the size of DRAM (a level 4 cache?). It's fine to work with the virtual memory size, and a cache-oblivious algorithm will just give you better performance the more cache you have at any level 1-4.

an aside about the term "cache oblivious"

"Most cache-oblivious algorithms rely on a divide-and-conquer approach. They reduce the problem, so that it eventually fits in cache no matter how small the cache is."

i think i don't like how people say "without knowing the size of the cache." in reality you will pick /some/ cache size and tell your algorithm to aim for that when it does divide and conquer. a value must be given. otherwise it would have to assume a cache of size zero or one, and anything bigger would be a random, mostly guaranteed wrong guess, either too big or too small.

no?

for a cache of reasonable (but unknown) size

Cache oblivious means you are neither 'given' the size of the cache, nor do you assume a specific size. However, you do assume that the cache is of some "reasonable" size - i.e. big enough to fit the inner loops of your program code and a limited working set of sequentially arranged data.

cache oblivious ordering.

For matrix multiplication to get any benefit from the cache, the smallest matrix (2x2) must fit in the cache. So we are talking bytes before we start taking advantage. For cache-oblivious funnel sort, the minimum cache size is two words/objects.
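To make the divide-and-conquer shape concrete, here is a rough Python sketch of cache-oblivious matrix multiplication (assuming square matrices whose size is a power of two, purely to keep the code short). No cache size appears anywhere; recursion simply bottoms out at 1x1 blocks, which fit in any cache:

    def matmul_rec(A, B):
        # Recurse until the base case is trivially small; at some depth the
        # blocks fit in whatever cache the machine happens to have.
        n = len(A)
        if n == 1:
            return [[A[0][0] * B[0][0]]]
        h = n // 2
        def quad(M):
            return ([row[:h] for row in M[:h]], [row[h:] for row in M[:h]],
                    [row[:h] for row in M[h:]], [row[h:] for row in M[h:]])
        def add(X, Y):
            return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]
        A11, A12, A21, A22 = quad(A)
        B11, B12, B21, B22 = quad(B)
        C11 = add(matmul_rec(A11, B11), matmul_rec(A12, B21))
        C12 = add(matmul_rec(A11, B12), matmul_rec(A12, B22))
        C21 = add(matmul_rec(A21, B11), matmul_rec(A22, B21))
        C22 = add(matmul_rec(A21, B12), matmul_rec(A22, B22))
        return ([r1 + r2 for r1, r2 in zip(C11, C12)] +
                [r1 + r2 for r1, r2 in zip(C21, C22)])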

In general the optimal cache-oblivious algorithm will outperform any non-cache-aware algorithm on real hardware (for example, funnel sort is faster than quicksort or merge sort). A cache-aware algorithm (one written with specific optimisation for the real sizes of the caches in the cache hierarchy) may be faster, but the advantage tends to be asymptotic: with a large enough cache, the cache-oblivious algorithm performs as well as the hand-tuned cache-aware one. Modern caches are large enough that there is little point in using a cache-aware algorithm where a cache-oblivious one exists.

Cache aware vs doing the right thing

"In general the optimal cache oblivious algorithm will outperform any non-cache aware algorithm on real hardware"

Maybe in general, but not in specific cases. The cache-aware algorithm for matrix-matrix multiply is demonstrably and provably slower than doing the right thing (the Goto / van de Geijn algorithm). That's because the optimal (or at least, the best known) algorithm actually optimizes for L2, not L1. And don't say "but that's cache", because relative to L1, L2 is main memory.

"cache-aware algorithm [...] but it tends to be asymptotic". Nope. Goto is faster on practically relevant sizes.

Cache-Oblivious is a metric

"cache-aware" is not the same thing as "cache-oblivious" at all, which is what we were talking about. cache-aware requires you know the exact cache hierarchy and sizes, and will not work optimally on hardware that is configured differently. Cache-oblivious you specifically are not allowed to know the cache-architecture and sizes, it is optimal over all cache configurations.

Another cache-oblivious algorithm seems to perform as well as Goto's on real hardware:
http://www5.in.tum.de/pub/bader_para08_paper.pdf

In any case, if Goto's algorithm achieves (near) optimal performance on all possible cache configurations, then it too would be classed as cache-oblivious. Cache-oblivious is a metric to evaluate performance that takes caches into account and replaces big-O. You take any algorithm and prove its worst-case cache-oblivious performance is better than other algorithms; that's all, so in that light your comment doesn't make much sense.

thanks

so yes, i think that confirms my suspicion / frustration / guidance to people about how they say these things. :-) all the ways i've read it described mostly utterly fail to clearly and succinctly state up front: "we get the base case down to something so stunningly small that most people would agree it will fit in just about any cache anywhere -- and yet it still turns out to be very efficient anyway as we build back up from the minimal/base cases." :-) because otherwise it all sounds like hand-wavy ivory-tower weirdness to me, and i suspect to others (maybe only for some vanishingly small value of 'others').

Caches and VM

I think that for most situations, rather than the fastest general-purpose computer for the money, I would prefer one where the speed of an instruction is predictable and where perfectly memory-safe programs don't get OOM errors "just because".

At least if, simultaneously and by other means, the quality of software engineering is also getting much better.

I'll take a little slower but much less flaky over the present situation.

Where general-purpose computing meets high-performance needs, I kind of assume on-the-fly, field-customized hardware will eventually dominate.

Software engineering is a real bottleneck here, though. The chip designers cater to improving the performance of legacy SW and SW practices, and that has led to today's Rube Goldberg contraptions.

Not Necessarily Slower

I don't see that things have to be slower. Using in-place destructive updates is memory-safe because no memory is allocated during the operation.
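A tiny Python illustration of that point: the function below permutes an existing list in place rather than building a new one, so the operation itself allocates nothing and cannot hit an out-of-memory condition.

    def reverse_in_place(xs):
        # Destructive update: no new list is built; existing slots are swapped.
        i, j = 0, len(xs) - 1
        while i < j:
            tmp = xs[i]
            xs[i] = xs[j]
            xs[j] = tmp
            i += 1
            j -= 1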

Handling it in-language

Handling it in-language versus leaving it to the host environment is a matter of whether the language can handle it in a way more useful than the default of the host environment. In a language that handles errors well, I suspect it could be handled better than the host default. However, error-handling is one of the things I don't think we have a clue how to do right. The semi-religious debates over exceptions and retries and whatnot suggest to me that none of these approaches speaks to the essence of the matter. So I'm skeptical that any existing language can handle it better than the host default.

A good programming language

A good programming language will factor programs into a behavior specification and a memory management scheme implementing that behavior, because this factorization is simplifying and complexity is the real bottleneck.

A good factorization

There's some sort of a general principle here. "One: factorization is only simplifying when done with genuinely orthogonal factors. Two: there are no genuinely orthogonal factors." This is probably why all programming today isn't done in aspect-oriented languages.

I agree with your general

I agree with your general caution against factorization. Years ago, I posted a similar critique of aspect oriented programming to this site, but I can't find it. I think I trimmed a bit too much meat with the fat when I was preparing the above comment.

That said, I don't really agree with your principles - particularly the first one - as stated. Factorizations are often simplifying even if the factors aren't perfectly orthogonal. For example, breaking a program down into modules and functions is factorization (replace 'factorization' with 'abstraction' and this is a sentiment you know well) and is an almost required simplification.

I think the area of memory representation is another instance where the factorization will be fruitful. If the compiler knows what the values are, it can help you get the representation-related issues right.

behavior not a module?

I won't defend the specific wording; I deliberately modeled it on "Murphy's Law"-style humorous aphorisms. Incautious of me, no doubt. Though it's more than half serious.

The behavior/memory-management factorization seems to me to differ in some fundamental way from factorization into modules. Perhaps because the factorization is asymmetric? With modules, there are generally internals of each module that are supposed to be able to ignore the internals of other modules. The interfaces (non-internals) often ignore each other asymmetrically, with low-level modules ignoring the interfaces of higher-level modules. It seems, though, that we want the behavior to ignore all parts of memory management... and the central question here, as I understand it, is what to do when the behavior's attempt to ignore memory management fails catastrophically.

Memory is definitely more of

Memory is definitely more of a "cross cutting concern" than problems that can be solved by factoring into modules. And I agree that having the programmer ignore all parts of memory-management during algorithm implementation isn't a good idea. What I think is a good idea is being memory-conscious in the general sense of considering which values are around and high-level data traversal in the first pass and then worrying about garbage collection and low-level pointer traversals in a second pass. The second pass can be fully automatic if you don't care too much about performance.

what does orthogonal mean, anyway?

are orthogonal things easily composed? or the opposite? or what?

easily decomposed

easily decomposed, separate concerns

apologies, i was being too terse/sarcastic

when somebody says something is orthogonal, i don't see how that guarantees that in my chosen programming language it will be easily composable. maybe it is necessary, but off the top of my head it seems hardly sufficient. so i was fishing for more down-to-earth thinking about "ok, so i have orthogonal things, but big deal, that will get me a cup of coffee -- how do i make sure it actually gives me coherent benefits in my language? can i eff it up with bad syntax? with a bad compiler? with ...?" etc.

yes we can

Intercal is a constructive proof that a programming language can be obstructive, incomprehensible, and hinder all things nice. With examples like Intercal around, along with less intentionally obtuse languages such as PHP or Perl, the answer to your question should be obvious: yes, we can eff it up.

not an obvious answer

because we don't know if intercal or php or perl even tried to go for any orthogonality. (or at least, even if they did, they were dumb about the design of it all on paper.) i'm wondering: given truly orthogonal things, what good does that do me, the end-developer-in-the-street? what languages are such that if something is orthogonal then it is automatically easy to use, composable, robust, whatever other *isms we want?

what languages are such that

what languages are such that if something is orthogonal then it is automatically easy to use, composable, robust, whatever other *isms we want?

The main job of a programming language is to glue ideas together, not to keep them separate.

With careful design, a language can simplify separation of a few specific concerns. But the rest will be left to frameworks, libraries, APIs - i.e. design and discipline. For example, if you want orthogonal persistence, you can get it through your language or through a framework.

As an end developer on the street, you ought to know what can be separated, and perhaps what should be separated. It will affect how you develop code.

resource-obliviousness

Any given behavior can be realized in a number of different ways (including none at all, outside its space-time feasibility constraint). Riffing on the cache-oblivious idea: would there be any advantage to a system (or language runtime) which could be told that M, N, and O are machines with differing resource requirements which all realize behavior B, so that when we invoke B we get our results from the quickest machine able to run in the given space? (One can always implement this concept implicitly; caching schemes are a particular example. But is there any advantage to making it explicit?)
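A small sketch of making the choice explicit (Python; the space bounds and cost figures are made up): each realization of behavior B declares a space bound, and the dispatcher picks the cheapest one that fits in the space it is given.

    def sort_fast(xs):            # space-hungry realization of behavior B
        return sorted(xs)         # builds a new list

    def sort_frugal(xs):          # slower, in-place realization of the same behavior
        for i in range(1, len(xs)):
            j = i
            while j > 0 and xs[j - 1] > xs[j]:
                xs[j - 1], xs[j] = xs[j], xs[j - 1]
                j -= 1
        return xs

    # (space_bound_bytes, relative_cost, implementation) -- illustrative figures
    REALIZATIONS_OF_B = [
        (1 << 30, 1.0, sort_fast),
        (1 << 10, 50.0, sort_frugal),
    ]

    def invoke_b(xs, space_available):
        fits = [(cost, f) for (bound, cost, f) in REALIZATIONS_OF_B
                if bound <= space_available]
        if not fits:
            raise MemoryError("no realization of B fits in the given space")
        _, best = min(fits, key=lambda pair: pair[0])
        return best(xs)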

blame and partial failure

Is there a reason why a PL provides special value by handling resource exhaustion more directly than the host environment?

As I understand it, there isn't much benefit to a global "you're out of memory" signal because you can't blame anything but the whole program. But if you have some way of providing quotas or tracking memory consumption of designated subprograms, and instead have a "this subprogram has reached its quota and is asking for X more" signal, this could be very useful - you clearly can blame a particular subprogram and do something about it (e.g. kill it, or have a tolerance policy that accommodates brief but rare misbehaviors). This is especially useful in applications where you're working with scripts, multiple agents, or distrusted requests.

Dealing with whole-system or whole-program failure is easy. Partial failure, where different subsystems fail but the rest keep working (and thus we must concern ourselves with consistency and recovery) is much more complicated. Providing a clear scope for blame and partial failure, e.g. by modeling sandboxes or partitions within the language, can help a lot.
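A minimal sketch of such a blame-carrying signal (Python, hypothetical names): the monitor knows exactly which subprogram crossed its quota and can apply a per-subprogram policy, e.g. kill it or tolerate brief, rare overdraws.

    class Subprogram:
        def __init__(self, name, quota, tolerated_overdraws=0):
            self.name = name
            self.quota = quota
            self.used = 0
            self.tolerated = tolerated_overdraws   # brief misbehaviors allowed

    def request_memory(sub, nbytes):
        # Blame is attached to a specific subprogram, not the whole program.
        if sub.used + nbytes > sub.quota:
            if sub.tolerated > 0:
                sub.tolerated -= 1                 # tolerance policy: let it slide
            else:
                raise MemoryError("%s reached its quota and is asking for %d more"
                                  % (sub.name, nbytes))
        sub.used += nbytes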

out of quota

I probably should have titled the discussion out-of-quota or out-of-budget, but out-of-memory was the starting point in the other threads.

But if you have some way of providing quotas or tracking memory consumption of designated subprograms, and instead have a "this subprogram has reached its quota and is asking for X more" signal, this could be very useful ...

Right, I'm mainly concerned with local out-of-quota (OOQ) signals, not global out-of-memory signals, so subprograms can respond or be killed by the runtime and restarted (if necessary) by supervisors. I'm looking at a system where it's feasible to track total allocations, total i/o, and other resources one might want to limit. So signalling on exhaustion by subprogram is easy. Adaptive, per-instance dynamic quota changes for subprograms are okay.

As long as a thing is modeled, controlling it is feasible. So an environment that models resource usage by subprograms can install controls. For memory it would involve updating the total memory footprint statistic on each allocation and deallocation, but this is only a slight increase in total cost.

One of the environments I plan to write for an engine will aim to simulate a poor man's Unix environment in broad strokes, with a shell and command lines and pipelines, etc, but everything would be coroutines in the local OS process. However, allocations, cycles consumed, and i/o would be under quotas.

Transactions, Ask-Early and Checkpointing.

I think the best that can be achieved is to request, in advance of some transaction that should either complete or roll back, sufficient memory to complete the transaction under all circumstances.

I would like to see the type system prove an upper memory bound for the operation inside the transaction and make the reservation request in advance. If there is insufficient memory to do the transaction in one step, then the programmer should refactor into smaller transactions and use backtracking (i.e. take a checkpoint before the first transaction of the set, or maintain an undo log); again, this might be enforced by the type system.

The reservation also guards against circumstances where there initially appears to be enough memory to complete the transaction, but another competing process uses lots of memory after the transaction starts.

In interactive programs you would just return to the prompt/UI with an "insufficient memory to complete operation, please try again later" kind of message. For programs that cannot continue, I would look to do a memory dump (checkpoint of the consistent pre-transaction state), in such a way that the program can be resumed later (the language could support starting any program with a check-pointed memory image).
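A rough Python sketch of the ask-early discipline (the API and names are hypothetical): reserve the proven upper bound before starting, and either run the transaction against a checkpoint or report "insufficient memory to complete operation" without touching state.

    import copy

    class ReservationFailed(Exception):
        pass

    class Heap:
        def __init__(self, capacity):
            self.capacity = capacity
            self.reserved = 0

        def reserve(self, nbytes):
            # Ask early: claim the transaction's proven upper bound up front.
            if self.reserved + nbytes > self.capacity:
                raise ReservationFailed("insufficient memory to complete operation")
            self.reserved += nbytes

        def release(self, nbytes):
            self.reserved -= nbytes

    def run_transaction(heap, state, upper_bound, body):
        checkpoint = copy.deepcopy(state)     # consistent pre-transaction image
        try:
            heap.reserve(upper_bound)         # cannot run out part-way through
        except ReservationFailed:
            return checkpoint, False          # caller may retry later
        try:
            return body(state), True          # commit
        except Exception:
            return checkpoint, False          # roll back to the checkpoint
        finally:
            heap.release(upper_bound)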

Resource Aware Type Systems

If you limit yourself to type system techniques, Jan Hoffmann and coauthors have worked on resource aware type systems.

Limiting?

I wouldn't call type techniques limiting. I think that given the right type system, you can perform any compile-time computation; it's just less ad hoc than other compile-time methods, and can be turned into a formal proof more easily.

Application specific knowledge

OOM is typically a global condition for a system, and not all applications are equally important. Killing init(1) is obviously bad for your entire system. Killing the first application that requests more memory means your scientific simulation that has been running for weeks is the first thing to die, which is probably not what the user wants. If applications knew about low-memory situations (which they can; AIX has SIGDANGER, for example), they could react and return memory that isn't absolutely essential to their operation. The first things that come to mind are caches, but most people would also prefer having a single tab in their web browser killed rather than having the entire browser killed.
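As a sketch of that reaction (Python; AIX's SIGDANGER is the real-world signal mentioned above, while the handler and cache layout here are made up, and SIGUSR1 stands in for it for portability): on a low-memory notification, the application returns whatever it can rebuild instead of being killed wholesale.

    import signal

    page_cache = {}        # rebuildable data: cheap to drop, expensive to lose the app
    inactive_tabs = []     # e.g. browser tab state that can be reloaded on demand

    def on_low_memory(signum, frame):
        # Return memory that isn't absolutely essential for operation.
        page_cache.clear()
        del inactive_tabs[:]     # discard state for tabs not currently visible

    # AIX would deliver SIGDANGER; this sketch registers SIGUSR1 instead.
    signal.signal(signal.SIGUSR1, on_low_memory)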

The conclusion of all of the above is basically that it's a hard problem and there isn't an obvious right solution that works for every possible scenario. Providing knobs to tweak at least allows the working engineer to try to handle the situation in a sensible manner, whatever sensible means in their context.

plenty of coroutine VM knobs

If you write your own runtime, OOQ (out-of-quota) is a local condition applying to subprograms, and each "application" coroutine can have its own hand-tuned configuration and quota befitting its needs and importance. One can avoid loss of forward progress via judicious checkpoints in a long-running app.

Obviously killing the first app requesting more memory is neither desired nor a requirement. If a runtime has its own quota system, an important scientific simulation that's been running for weeks can make itself immune to quota enforcement, if the runtime's developer allows that, at the risk of runtime failure when an immune app consumes all resources without responding to memory-pressure events. Basically you want a user-space operating system for coroutines; otherwise, if everything is a native OS process, you're at the mercy of what the OS does.

that sounds nice in theory

But i suspect a lot of it is hard in terms of actual usability for humans. If there were a library that magically had heuristics to automagically say "oh, your input looks like it is 4 gig, so that means my quota has to be XYZPDQ bytes", then maybe it would be a bit easier. On the whole, if we are leaving it up to the end user, then I expect this will happen:
1) run Big Program with some default or very arbitrarily guessed quotas.
2) it gets killed due to resource usage 3 seconds before it is going to tell me The Answer.
3) cursing!
4) re-run it with the resources set at 2x to hope it doesn't fail this time.
5) maybe that works...
6) ...until the next time I run it with a different data set.
6a) either it does (2 and 3) again or
6b) it works but is wasting tons of resources that others could have used, defeating the whole purpose of all of this.

but i'm an optimist.

resilient software

Keep in mind that quotas aren't the only thing that might kill a program. There are also power outages, node or network failures, etc., and even updates, where we might be forced to reload software with a new version.

It is worth pursuing orthogonal persistence, crash-only software designs, and resilience. And these will help with quota failures, too.

and another thing

if you are on a multi-user machine then the right setting is also perhaps a dynamic answer. i want my Big Program to win the ro-sham-bo game over other processes because i'm selfish, and most other people using the system would want theirs to win too. tragedy of the commons, etc.

quota usability for devs vs users

I like your reply a lot because the perspective surprises me, and that's what I wanted. But it seems selfish of me to pursue topics with no clear PL angle just because it relates to runtime stuff I enjoy. Maybe you can add more PL focus and make me less selfish. :-) In your remark I see no PL (programming language) aspect of the kind reflected in a question like this one:

Is there a reason why a PL provides special value by handling resource exhaustion more directly than the host environment?

That's okay, but it just means we seem to be talking about whether quotas are good or bad in the host runtime environment for a PL, and I gather your focus is how quotas impact end users if they can see them. The main user I care about is the developer creating the code, not the eventual consumer of the running code in production. I would not expect that dev to let end users fiddle with quotas, or even see them, unless a more whitebox view of things was a feature sold and run through some validation to put bounds on expected quality. (A read-only proc-file-system view of quotas and actual usage would be safer than letting end users directly write new limits.)

I work on modules in network load balancers, and the "user" is a devops admin tuning configuration for network services, so end users of the actual services will never see that a load balancer exists or the modules inside it. When debugging, the user is myself or my dev peers, as we try to get an idea of what's actually happening inside that perfect storm: a non-deterministic, async, distributed mix of fibers, threads, and processes trying to service as many connections as possible with fixed resources in hand. We try to provision and configure a system to set aside the right amount of memory for everything that needs to happen, but finding out what actually happens is not that easy.

When debugging, one typically sets limits to be smaller and less forgiving, so you can see any variation from what you expect and surprises happen sooner, in tests instead of post-deployment. Where a flow might assert when debugging, a connection may simply abort in production, because there's no reason to kill more than the failing entity. Limits are much bigger in production, where hitting one usually means something has gone very wrong. If writing a config file is expected to cause a few kilobytes of i/o, then after writing a hundred megs instead, maybe there's a problem. If a resource should only take R units of cost in theory, but actually takes several orders of magnitude more at runtime, that's a good reason to impose a limit.

I can avoid speaking more in generalities unless we want to expand some particular, or touch something I haven't yet mentioned. (The subject of "bad things that can happen" is large and difficult to summarize briefly.) Let me address specifics in your list instead. Note when using quotas, you can always turn them off by making them unlimited. But filling an end user's hard disk completely is probably not a good idea.

If you clone Unix cat or netcat utilities into a coroutine-based embedded system, there's probably no reason why they should ever use more than some max amount of memory at once. Why would they ever hit a megabyte before handing ownership off to someone else? You could set the limit very big and then just log when a smaller soft limit was hit. When a system is composed of thousands of pieces, it's helpful to know which goes far outside normal parameters.
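A small sketch of that soft/hard split (Python; the numbers are placeholders): the hard limit is generous and rarely hit, while crossing the much smaller soft limit merely logs which piece wandered far outside its normal parameters.

    import logging

    class TrackedAllocator:
        def __init__(self, name, soft_limit, hard_limit):
            self.name = name
            self.soft_limit = soft_limit    # e.g. one megabyte for a cat-like piece
            self.hard_limit = hard_limit    # set very big; hitting it is a real event
            self.used = 0
            self.warned = False

        def allocate(self, nbytes):
            self.used += nbytes
            if self.used > self.hard_limit:
                raise MemoryError("%s exceeded its hard limit" % self.name)
            if self.used > self.soft_limit and not self.warned:
                logging.warning("%s passed its soft limit (%d bytes in use)",
                                self.name, self.used)
                self.warned = True

        def free(self, nbytes):
            self.used -= nbytes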

For Big Program, make memory unlimited. Then if OOM causes Linux to kill your scientific simulation that's been running for two weeks instead of Big Program, maybe you'll want to set the limit to something smaller so the OS doesn't kill random processes when Big Program goes crazy. Profiling what Big Program allocates and why may provide insight into hog usage. If Big Program was a collection of many subprograms, you could limit the most abusive ones within. Presumably devs of Big Program would know what bounds are reasonable, and give quota tuning guidelines to end users.

Instead of starting small and working up, I'd go the other way and start big, then work down when too many resources are being used in your opinion. Limits themselves don't cause preallocation. Allowing an app to reach two gigabytes does not allocate 2 gigs of space that goes unused. It only means passing this threshold sends a signal, and what happens depends on what handler was set up by the dev or an end user (if that was supported). You only get resource waste from quotas if you allocate the quota upfront, which isn't necessary.

I was pleased you brought up the tragedy of the commons in the context of resource competition. I was going to bring it up as an example of something that's wrong in the current app ecosystem, where devs have an incentive to use more resources first to hamstring their competition.

Max Memory Bounds

Wouldn't this work better with some statically proven max memory bounds, like I was suggesting above? If you could take your cat source code and, because it uses fixed-size destructively reused buffers and has bounded stack behaviour, be sure it only ever needed 10k to run in, you wouldn't need a quota. You would only need a quota for unbounded behaviour (which could be a warning or a compile error in strict mode).
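For instance, a cat written in that style (Python sketch; the buffer size is illustrative) touches only one fixed, destructively reused buffer, so its peak memory use is knowable before it runs:

    import sys

    BUF_SIZE = 8192                        # the only buffer; its size bounds peak memory

    def cat(paths):
        buf = bytearray(BUF_SIZE)          # fixed-size buffer, destructively reused
        for path in paths:
            with open(path, "rb", buffering=0) as f:
                while True:
                    n = f.readinto(buf)    # overwrite the same storage in place
                    if not n:
                        break
                    sys.stdout.buffer.write(memoryview(buf)[:n])

    if __name__ == "__main__":
        cat(sys.argv[1:])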

yes, static analysis helps establish limits

I liked your earlier comments too, but I thought my reaction would sound too much like argument if written. (I was inclined to comment on interactive feedback being less useful for tools running without anyone attending the result, unless failure causes a dev to check logs.)

Wouldn't this work better with some statically proven max memory bounds like I was suggesting above?

Yes, definitely. Proving max expected memory bounds would be an excellent source for setting a quota limit. :-) I like static analysis, and seeing it applied to resource usage sounds good. I would find it hard to prove memory limits via static analysis myself, if code I analyzed was written in C, which I expect to see a lot. (My livelihood has gotten too bound up in dealing with things other folks have written in C and sometimes C++, so getting away from it doesn't appear anywhere yet on the horizon. Network server code has a lot of inertia.)

Even after proving a max upper bound on memory usage, a quota limit can still be violated at runtime, especially if the code involved runs in the same native OS process as any other code studied less carefully. After a wild piece of code clobbers a field in the process carrying the proven code, the proof becomes invalid, as we'd expect under memory corruption.

Folks where I work no longer ask me what I think caused memory corruption, when that's the cause I cite as likely for a given crash's aftermath, since I've already said it many times. I say memory always becomes corrupt eventually, so it's just a mean-time-before-failure issue, with a goal of putting corruption off so well that failing for some other reason instead becomes more likely. This is one reason to isolate highly vetted code in a separate process from (for example) open source utils grabbed because they were convenient.

While static analysis and type systems are a good way to prevent memory corruption, I also like code laced with canaries to detect it and then fail soft faster, simulating a SEGV as violently as seems suitable (assert under debug; kill lightweight processes in production). Basically I like every object to contain two unchanging fields: a tag indicating type and a generation number pseudo-randomly assigned at construction time. (References then take the form of a pointer+generation pair instead of a pointer alone.) Field size depends on object size, expected cost over the total population, and the urgency of detecting corruption based on object importance. A first step in reducing causes of corruption in code bases is acute intolerance of corruption, so selection by survival of the fittest helps improve quality over time.
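A compact sketch of that scheme (Python; in C the tag and generation would be two small integer fields at fixed offsets, and the check would guard every dereference): references are pointer+generation pairs, and a mismatch means the object was clobbered or recycled, so we fail soft immediately instead of computing with garbage.

    import random

    class CorruptionDetected(Exception):
        pass

    class Obj:
        TAG = 0xC0DE                           # type tag, unchanging after construction
        def __init__(self, payload):
            self.tag = Obj.TAG
            self.gen = random.getrandbits(32)  # generation assigned pseudo-randomly
            self.payload = payload

    def make_ref(obj):
        return (obj, obj.gen)                  # reference = pointer + generation

    def deref(ref):
        obj, gen = ref
        # Canary check: detect corruption early and simulate a SEGV-style failure.
        if obj.tag != Obj.TAG or obj.gen != gen:
            raise CorruptionDetected("stale or clobbered reference")
        return obj.payload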

seL4 or TAL

It would make sense to align the safe code with hardware memory protection. Start with a formally proved microkernel like seL4, and keep OS and library code in separate processes, so user code only calls seL4 directly, passing messages to the other processes to carry out tasks. Each process can be formally proved in isolation.

The lack of software would be a problem. Perhaps a more widely applicable approach would be to work with typed-assembly-language and validate the compiler output. This would enable complete validation of an open source operating system and applications.

Gulwani's SPEED

Sumit Gulwani has done some interesting work on imperative resource usage analysis with SPEED.
