Is there a functional language with explicit limits on the heap(s)?

I'm trying to build a research compiler that is supposed to run as a service, so I'd like to use a language with cooperative, userspace threads, such as Go or Erlang. The problem is that some user requests may consume too much memory, or possibly never halt. So I'd like to impose some restrictions, e.g., "requests from IP a.b.c.d may use at most 80 MB of heap memory and run for at most 5 minutes".

Haskell has setAllocationCounter (along with forkIO and STM), which would help, but it doesn't seem to take the GC into account. Erlang seems to let me limit the heap for each process, but many tasks will be computationally intensive, so Erlang is probably not a good pick.

Does anyone have a suggestion? I'd rather not have to create a new language, but it's a possibility; is there any research on such languages that could help?

See Memory Accounting

In case this wasn't clear,

In case this wasn't clear, this work was done in the context of the Racket programming language, and it's part of the standard distribution. Custodians manage memory, threads, streams, sockets, etc.

It wasn't clear from reading it ...

I still haven't finished the paper, but thanks for pointing that out, because Racket itself wasn't mentioned anywhere in the paper.

I figured, since that work

I figured, since that work predates the rename from PLT Scheme to Racket. You can find the original name in there.

You were clear :)

I played with Racket yesterday; it seems like the way to go! Thanks.

Coincidence

This is coincidentally a problem that I'm working on right now. I'm not aware of any language that does what you're looking for (today) in a lightweight manner, i.e. without using OS processes as the "killable" limiter.

The way we have solved it is somewhat Erlangish:

  • A service is a lightweight "von Neumann machine", with its own (virtual) memory and single (virtual) CPU.
  • Similar to an "Erlang process" (IIRC), a service represents a potentially asynchronous boundary.
  • Execution in and out of a service (i.e. transfer of control to/from a service by any form of invocation or message) can only move immutable data or references-to-services through the service membrane.
  • Execution in and out of a service is handled by an implicit async/await model, or an explicit future/promise model; the underlying mechanism is always the same, regardless of the representation.
  • Services exist nested within a container, which is itself a service. New containers can be created, which exist nested within whichever container created them.
  • Each service meters its own memory and CPU usage. Each container provides an asynchronous sum of the same (including nested containers).
  • A service can be killed. A container (which is itself a service) can be killed.
  • Each service is responsible for garbage-collecting its own memory (there is no "stop the world"). Additionally, a container has a dedicated second pool of memory for shared immutable data, so that immutable information that is exchanged among services can be passed by reference, and that separate memory area for immutable data can be garbage-collected concurrently (precisely because it's immutable).
  • Services control their own re-entrancy, critical sections, time-outs, and so on. The entrancy model into a service is that of a co-routine, so while a service is itself a "single core (virtual) CPU", it can always accept new work (subject to re-entrancy and critical section settings) without "burying" any previously running work underneath a call stack.
  • Memory and CPU can be limited, can be measured and aggregated, and excessive use can result in the killing of a service or container, as appropriate.
  • A container is used to (optionally) load a bundle of linked modules, so unloading the container can also unload the related code.
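For readers who want something concrete, here is a minimal Python sketch of the metering and killing ideas in the list above. It is not the language or runtime described in this comment: the names `Service`, `Container`, `spawn`, `runaway`, `polite` and `demo` are all made up for illustration, generators stand in for coroutines, "CPU" is counted in scheduler steps, and memory is whatever each task reports when it yields (a real runtime would get that figure from its allocator or GC).

```python
class Service:
    """A cooperative task: a generator that yields the bytes it 'allocates'."""

    def __init__(self, name, gen):
        self.name = name
        self.gen = gen
        self.steps = 0   # stand-in for metered CPU time
        self.mem = 0     # self-reported bytes (assumption, see lead-in)
        self.alive = True

    def running(self):
        return self.alive

    def usage(self):
        return self.steps, self.mem

    def step(self):
        """Run to the next yield point and charge the service for it."""
        if not self.alive:
            return
        try:
            self.mem += next(self.gen)
            self.steps += 1
        except StopIteration:
            self.alive = False

    def kill(self):
        if self.alive:
            self.gen.close()   # raises GeneratorExit at the paused yield
            self.alive = False


class Container:
    """A service that holds other services and limits their combined usage."""

    def __init__(self, name, max_steps, max_mem):
        self.name = name
        self.max_steps = max_steps
        self.max_mem = max_mem
        self.children = []
        self.alive = True

    def spawn(self, child):
        self.children.append(child)
        return child

    def running(self):
        return self.alive and any(c.running() for c in self.children)

    def usage(self):
        """Sum of the children's usage, including nested containers."""
        totals = [c.usage() for c in self.children]
        return sum(s for s, _ in totals), sum(m for _, m in totals)

    def step(self):
        """Give every child one turn, then enforce this container's limits."""
        if not self.alive:
            return
        for child in self.children:
            child.step()
        steps, mem = self.usage()
        if steps > self.max_steps or mem > self.max_mem:
            self.kill()

    def kill(self):
        for child in self.children:
            child.kill()
        self.alive = False


def runaway():
    while True:        # never halts, "allocates" 1 KiB per step
        yield 1024


def polite(n):
    for _ in range(n):
        yield 16


def demo():
    root = Container("root", max_steps=10_000, max_mem=10**9)
    tenant = root.spawn(Container("tenant", max_steps=100, max_mem=64 * 1024))
    bad = tenant.spawn(Service("bad", runaway()))
    good = root.spawn(Service("good", polite(5)))
    while root.running():
        root.step()
    return root, tenant, bad, good
```

Calling `demo()` runs a well-behaved task next to a runaway one: the nested `tenant` container kills the runaway task once its memory limit is exceeded, while the `root` container and the other task are unaffected. The sketch leaves out the service membrane, the shared immutable pool, and per-service garbage collection entirely.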

Are you creating a new language?

Are you creating a new language? Measuring CPU usage seems like a nice feature as well.

Yes

Yes, this is from a new language, although the CPU measurements are a function of a runtime option, and not in any way mandated by the language itself. However, the language was explicitly designed to elegantly support resource management, including CPU, memory, storage, and disk and network I/O.

How do you measure time?

How do you measure time? I've been thinking about writing a runtime for such things, to control memory usage, time and bandwidth (for socket connections), with a userspace scheduler and stuff. But, to do such things, I'd have to measure time a lot, and measuring time itself is sometimes costly (a system call), and one shouldn't do that so frequently. How do you handle this issue?

Measuring time

Hi Paulo -

The hardware Time Stamp Counter (TSC) is probably the cheapest clock.

http://btorpey.github.io/blog/2014/02/18/clock-sources-in-linux/
https://aufather.wordpress.com/2010/09/08/high-performance-time-measuremen-in-linux/
https://linux.die.net/man/2/clock_gettime

It really depends on what you're trying to measure. It's best to be in a position where you can measure in relatively coarse-grained units (perhaps more of a sampling approach).

In our design, the user space / system boundary is well defined, so we only measure time spent in user space. Anything done on behalf of the user in system space is either accounted separately, such as I/O, or simply considered necessary overhead.
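One way to read the "sampling approach" is to keep a cheap counter at every yield point and only read the clock once every N ticks. The Python sketch below illustrates that under stated assumptions: `SampledDeadline`, `tick`, `check_every` and `run_with_budget` are hypothetical names, and `time.monotonic` merely stands in for whatever clock source (TSC or otherwise) a real runtime would use.

```python
import time


class SampledDeadline:
    """Deadline check that reads the clock only once every `check_every` ticks."""

    def __init__(self, seconds, check_every=1024, clock=time.monotonic):
        self._clock = clock
        self._deadline = clock() + seconds
        self._check_every = check_every
        self._countdown = check_every
        self.clock_reads = 0    # how many times tick() actually read the clock
        self.expired = False

    def tick(self):
        """Call at every yield point; returns True once the deadline has passed."""
        if self.expired:
            return True
        self._countdown -= 1
        if self._countdown > 0:
            return False        # cheap path: a decrement and a compare
        self._countdown = self._check_every
        self.clock_reads += 1
        self.expired = self._clock() >= self._deadline
        return self.expired


def run_with_budget(work_items, deadline):
    """Process items until they run out or the sampled deadline expires."""
    done = 0
    for _ in work_items:
        if deadline.tick():
            break
        done += 1
    return done
```

The trade-off is precision: a task can overrun its deadline by up to `check_every` ticks before anyone notices, so N should be chosen relative to how long one tick of work can take.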

ulimit...

ulimit? I am not sure you even want to solve this in a language.

Handling it in the language

Handling it in the language is much more lightweight than trying to isolate it via a process. This would be applicable anytime you want to run untrusted code for, let's say, plugins in your video editor.

That's not lightweight

That's not lightweight at all; I'd have to fork the whole process. I was looking for a solution that would let me run within cooperative userspace threads.