Call-by-reference decided by caller

I wonder whether it makes sense for the caller of a function to decide whether a parameter is passed by reference or by value. Often when I look at code (C++ mostly) I ask myself "Is this one changed by the function?" and have to look up the function declaration. Does anyone know of a language implementing such an abstraction?

Edit: I just realized that the caller can control how an argument is passed by either making a deep copy before passing it (call-by-value) or not (call-by-ref). The only thing is, the function needs to declare this argument as a ref. This also does not work well with manual memory management, because if you create deep copies of things on the fly they do not get cleaned up afterwards; the function receiving the argument does not know it has been copied beforehand.
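
This can be made concrete with a small C++ sketch (the names here are purely illustrative): the callee declares a reference parameter, and the caller opts into by-value behaviour by passing a throwaway copy.

#include <string>

// Hypothetical callee that mutates its argument through a reference.
void shout(std::string& s) { s += "!"; }

int main() {
    std::string msg = "hi";

    shout(msg);               // "call-by-ref": msg becomes "hi!"

    std::string copy = msg;   // caller-made deep copy...
    shout(copy);              // ..."call-by-value": msg is untouched, only copy changes
}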

ALGOL 60

In ALGOL 60 you can specify whether each individual argument to a function is call-by-name or call-by-value, but I believe that is determined in the function definition, not by the caller.

C# half fixes this

C# doesn't let the caller control whether a parameter is passed by reference or not, but it does require the call site to indicate it. If you define a method as taking a ref parameter, you must write the ref keyword before the argument when you call it.

I like this approach. In

I like this approach. In fact, recently I have heard a lot about C# that sounded very interesting.

Am I missing something?

Isn't this what you have in C? Parameters are passed by value; if you want to pass by reference, you pass the address (&s).
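
For reference, a minimal sketch of that in C-style code (also valid C++; the names are illustrative):

#include <stdio.h>

void by_value(int x)  { x = 42; }   /* caller's variable is untouched */
void by_ref(int *x)   { *x = 42; }  /* caller's variable is updated   */

int main(void) {
    int s = 0;
    by_value(s);    /* s is still 0 */
    by_ref(&s);     /* s is now 42  */
    printf("%d\n", s);
    return 0;
}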

I think I mean something slightly

I think I mean something slightly different: the caller decides whether a given argument is passed by reference or by value, but the function itself treats the parameter however it likes. In C you are bound to the kind of parameter defined in the function definition: either a value or a reference (a pointer in C).

I just realized that if the caller chooses how to pass an argument, a function might as well have no effect, because its only purpose may be to modify that argument.

As you say, the definition

As you say, the definition of the function may rely on the parameter being a reference.

You could approach this problem in another way, by introducing an "update with" syntax, for example something like:

foo => myFunction();

This would be syntactic sugar for:

foo = myFunction(foo);

Our functions will always be called by value, but the caller can decide to replace their old version with this new one immediately. This would also act as a hint to the compiler/interpreter that it can update this value in-place, i.e. we can reuse the existing value in a call-by-reference way. Such a hint would also be useful to track argument usage:

myFunction = function(myArg) {
  temp = otherFunction(myArg);
  return 45 + temp;
}

otherFunction = function(myArg2) {
  temp2 = blah(myArg2);
  return temp2 * myArg2;
}

Here "myFunction" only uses "myArg" once, so it could do "myArg => otherFunction();" to avoid a copy; this can keep cascading until someone actually needs a copy. An example where a copy is needed is "otherFunction", which uses "myArg2" after defining "temp2", so we have to use two variables in this situation.

This has many similarities to CopyOnWrite; there may be advantages to using both together.

It may be useful to allow name changes without copying too, so we can update values in-place while still giving them more descriptive names. This is especially useful for Hungarian notation. For example, we could change our syntax a little to get:

unsafeEmail = askUser("What is your email address?");
safeEmail => sanitise(unsafeEmail);
sendToDb(safeEmail);

Here we've updated unsafeEmail in-place, but given it a new name "safeEmail". I would recommend that this invalidates the name "unsafeEmail", but there may be arguments for keeping both around (like "$foo =& $bar" in PHP).

Can be a security issue.

In a language with some kind of call-by-name, call-by-macro-expansion, call-by-future, or call-by-promise semantics (all of which are uncommon) there is a risk of exposing your caller's environment to examination or even mutation by the callee, so this can become a trust or security issue. The specific problem happens when these dynamic patterns require passing a pointer to the caller's environment (so that the argument can be evaluated in that environment) and there is any way for the callee to access that pointer (even via machine code) and see/change what's at the other end of it.

If that is the case, I consider it obligatory for a language design to include a way for the caller to restrict what the callee can see or do. One method of doing this is to provide "restricted environments" for the arguments -- where the argument is reduced to a simple variable name, and the environment passed to the callee with the argument is an environment with only that single variable.
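
As a rough sketch (in C++ terms, with hypothetical names) of what building such a restricted environment might look like:

#include <map>
#include <string>

// Toy environment mapping variable names to (say) integer values.
using Env = std::map<std::string, int>;

// Before handing a by-name argument to an untrusted callee, build an
// environment containing only the single variable named in the argument.
Env restrict_to(const Env& caller_env, const std::string& var) {
    return Env{{var, caller_env.at(var)}};
}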

Of course, you'd rather not just explain the technique and then watch as programmers consistently forget to apply it when calling routines they have no justification for trusting, such as mobile code. Instead, you want an explicit "trust" level, probably as part of the "import" or "use" statement, that defines how much trust to extend to a particular callee. This also permits callee functions that *need* access to the caller's environment to work properly to detect that they do not have it, and it moves the failure to compile time type-checks rather than runtime.

Of course doing this defeats the purposes behind the evaluation strategy, but there are a whole lot of useful functions that work just fine when called eagerly, and if the callee is one of those it should not matter. If it isn't, then whatever magic it does with your environment simply won't work unless you trust it to mess with your environment.

Protected from whom?

Most code belongs to the person using it, so why do they need protection from themselves? Who can write a C program without passing a pointer to a structure to get something done? In fact, this whole comment seems like a dismissal of all imperative and OOP programming, since accessing the data associated with an encapsulated object is an implied pointer from the function's viewpoint.

If you have to worry that somebody will change your data and the language needs to protect you from that person, then that person already has access to the program code, and call by reference is the least of your worries.

Of course your solution to this huge problem might just happen to be stateless functional programming. What exactly happens in functional programming when you need the function to act on a 50M data structure? Does a programmer have any rights to the code they write, or does it have to pass the compiler writer's smell test?

It was nice how you fitted that "compile time type-checks rather than runtime" in there without any justification. It only takes a few words to dismiss dynamic type systems and systems that aren't or can't be compiled monolithically at one time.

So far, all OOP and dynamic languages have bitten the dust. RIP

Libraries, frameworks, services

Libraries, frameworks, services, plugins, etc. embody a great deal of code not controlled by the person using them. In team development efforts, it is not uncommon that no member of the team understands the full codebase. I wouldn't be surprised if most people control less than 1% of the code they use.

Calling method determines the code.

If a function is in a library or otherwise doesn't give you access to its source code, please explain, without changing the function, how to allow the option of call by value or call by reference. How could this option be available in general through the PL?

You responded to my post and I think I showed that the function would have to be written explicitly to enable both types of calls.

Do you disagree?

How could this option be

How could this option be available in general through the PL?

The trivial solution would be to compile each function each way that it is used. For functions exposed through an interface, you'd compile them both ways. A more sophisticated option is to integrate this with the value model - i.e. a value model that can transparently flip from edit-in-place to copy-on-write (perhaps leveraging uniqueness properties).
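
As a hedged illustration of the "compile it each way" idea (the names are made up), C++ templates already let the call site pick whether the parameter is a copy or a reference, instantiating the function once per choice:

#include <iostream>

template <typename T>        // T may be deduced as int, or forced to int&
void bump(T x) { x += 1; }

int main() {
    int n = 0;
    bump(n);                 // T = int: the callee gets a copy, n stays 0
    bump<int&>(n);           // T = int&: the callee mutates the caller's n
    std::cout << n << "\n";  // prints 1
}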

The technical feasibility isn't in question. The merit is. I think it would be silly to pass-by-reference where pass-by-value is expected, but the converse might be a little useful.

You responded to my post

I responded to your first sentence. You seem to be depending on a premise (that users own most of their code) that disagrees with my own understandings and experiences.

I don't think I wrote clearly enough

Let's say I wrote a function to change data that was passed by reference, and I return whether it worked or not using a logical. This function could be arbitrarily complex. If you sent that function a copy of the data instead, the algorithm might work correctly but the result would never get back to the caller. The changed data would just die on the stack.

How would you create an equivalent function that could return the resulting data rather than the logical? What would it mean for the function definition to return two totally different types from the same function, where that type difference depends on the data? By the way, PHP returns different types for some functions depending on the data, and it is very easy to get errors because of it.

If you can see where my argument is incorrect, please enlighten me!

There is no general

There is no general equivalence between return values and pass-by-reference values; any such property would depend on the language. But I'm not seeing the relevance to the topic.

What exactly happens in

What exactly happens in functional programming when you need the function to act on a 50M data structure?

The usual approach these days is to "stream" the data through our functions, much like we stream text through UNIX pipes. For example I can run grep on my entire hard drive, but it will not wait for all the data before it starts processing.
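
As a minimal (imperative) sketch of the same streaming idea, here is a grep-like filter that reads one line at a time, so memory use stays constant regardless of input size ("needle" is just a placeholder pattern):

#include <iostream>
#include <string>

int main() {
    std::string line;
    // Process one line at a time; the whole input is never held in memory.
    while (std::getline(std::cin, line)) {
        if (line.find("needle") != std::string::npos) {
            std::cout << line << "\n";
        }
    }
}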

This isn't completely generic; for example, we may have an algorithm which needs access to the entire data structure before it can do anything, but in those cases we're screwed no matter what language/approach we use.

http://okmij.org/ftp/Streams.html#iteratee

eh?

Of course your solution to this huge problem might just happen to be stateless functional programming. What exactly happens in functional programming when you need the function to act on a 50M data structure?

[I'll go further than Chris Warburton does in his answer to this]

The equational theory that is the lambda calculus doesn't go all wonky upon reaching a certain length of computation. With that in mind I cannot for the life of me figure out how functional programming at the 50M data structure size is going to be much different from functional programming at any other data structure size.

Beware the fallacy of the

Beware the fallacy of the beard. Due to various peripheral reasons (performance, process control, partial failure, persistence, memory constraints, limits of human patience), working with large structures is different from working with small ones. Of course, 'large' is relative; these days, 50M isn't too large, but add another order of magnitude or two...

Indeed, different calling

Indeed, different calling conventions are not equivalent even in a pure, Platonic setting like Lambda Calculus. We usually think of the Church-Rosser property as telling us "all calling conventions give the same result", but we have to remember the caveat "if they give a result at all!". For example:


infiniteList = 1 : map (1 +) infiniteList

main = print (fst ("Hello world", infiniteList))

If this program halts, it prints "Hello world". This will happen in a non-strict language like Haskell, which won't bother trying to construct the value "infiniteList" because it's not used. A call-by-value language (Scheme, OCaml, etc.) won't give a different result; it will give *no* result, since it gets stuck in an infinite loop trying to build the list.

In fact this is the reason I shied away from mentioning calling conventions explicitly; I can't think of a way to make Iteratees for an arbitrary, pre-supplied data structure which is both call-by-value and constant-memory. Of course, if we can implement the data structure too, we can use thunks to generate the large parts on demand. For example, in Javascript:


// Naive implementation of the above; will get stuck in an infinite loop.
var infiniteList = function() {
  return [1].concat(infiniteList().map(function(x) { return x + 1; }));
};
var main = alert(["Hello world", infiniteList()][0]);

// Implementation using thunks (the second element of each array is a function which generates the rest)
var incAll = function(l) {
  return [1 + l[0], function() { return incAll(l[1]()); }];
};
var infiniteList = function() {
  return [1, function() { return incAll(infiniteList()); }];
};

If we're getting data from some outside source, these seemingly-artificial thunks can be replaced by the relevant "getNextValue" function for our data source.

Exactly

Due to various peripheral reasons (performance, process control, partial failure, persistence, memory constraints, limits of human patience),

Exactly, peripheral reasons, none of which are specific to functional programming.

To counter your earlier

To counter your earlier position, it is sufficient to note that functional programming is different at the larger sizes - different idioms and design patterns, tighter integration with effects for persistence or progress-bar feedback, etc. That this is not specific to functional programming is a valuable observation for another argument entirely, and perhaps we can study a few paradigms and generalize some requirements for operating on very large data structures.

Caller determines call by value or reference?

In general you, as a programmer, have access to the function code and you are the one making the call so you could pass by value or reference depending on what you want.

I don't think it makes sense to have the option on the caller side to choose either option. The default in all languages I have used is pass by value, and most functions use this automatically. You might override this behavior if you want the function to change something in your calling space. Nothing wrong with this kind of function; however, its code would expect to change your data and then probably return a success/failure code. If you decided to send only a copy of the data, then the function's behavior would be broken. If the function could detect the kind of variable it was sent, then you might be able to make a function that combined both solutions into one. In general, this kind of complexity wouldn't be worth it.

In the end, it is your code and you can implement it either way without problem and without adding another "feature" to the PL.

In general you, as a

In general you, as a programmer, have access to the function code and you are the one making the call so you could pass by value or reference depending on what you want.

This statement is incorrect. If I had access to the source in general, that would mean I can always reach the "having-access" state, which I can't (for example, the source may have been lost years ago). In general I *don't* have access, because I can always reach the "no-access" state (e.g. by deleting things).

If you mean programmers "usually" have access then I would agree, but hacking library code (without having your patches merged into the trunk) is a bad idea because it puts you in charge of a fork.

Seems this should be related

Seems this should be somehow related to a problem I was trying to frame a while back, of how to insulate clients from a function's internal notion of side-effects (thereby redressing the global typing problem presented by monads; in a blog post here).

I say it's all right to require a trust declaration.

I think that if someone wants to be certain that a given module is using only functions that do not have side-effects on anything visible to that module, then it's reasonable for him to say

import-pure "Libraryname1"

Whereas if someone wants to import everything in a given library, even arbitrarily crazy routines that may alter bindings not even related to anything they were passed as arguments, inject arbitrary new bindings into the environment, alter the continuation that the caller will return to when it exits, or do other "crazy" things, it is reasonable for him to be required to say

#I_AM_INSANE //enables absolute trust in imported modules
import-with-insanity "Libraryname2"

At that point, if the compiler detects a call to a routine in Libraryname1 that will have a side effect and modify one of its arguments, it should FAIL, because the programmer explicitly told it not to extend such trust to Libraryname1. The programmer must find a different way to solve that problem. At the very least the programmer is protected from calling something from Libraryname1 which has a visible side effect that he doesn't know about.

Conversely if such a routine were in "Libraryname2" then the compiler wouldn't blink; Libraryname2 has been extended implicit and total trust and may do whatever it wants to the caller and its environment.

I've been thinking about this because...

In the course of attempting to define a maximally expressive (by Felleisen's notion of expressiveness, to be specific) translation-target dialect, it becomes necessary to define ways to express the semantic DESIGN MISTAKES of other languages, because you may be required to translate code from such a language, code which works only because of those mistakes. At the same time, you'd rather not forgo compilation strategies relying on the absence of those mistakes when the program instance in hand does not in fact rely on them.

As others have said, programmers generally use libraries that they do not completely know and/or trust, may not have the source code for, and consider unwise to modify. The problem in this case is to import only those things from that library which are in fact trustworthy. Indeed, you must be able to know (and prove!) to what degree you have limited your code's exposure to the random whimsy that may be expressed in a library originally written in a language that allows the semantic craziness of, for example, macros with arbitrary variable captures.