## Booleans vs strings

I'm working on a simple, dynamically-typed language for business users. We're trying to have as few datatypes as possible to reduce the number of things they have to learn (currently just strings, numbers and tables). I'm considering just using "true" and "false" strings instead of boolean values. I can't see any arguments against it but I can't help feeling that I'm going to regret it.

Is this sensible? The only language I'm aware of that does something similar is Tcl, which also allows yes/no/on/off.

EDIT: I suppose Erlang's true/false atoms are similar. That gives me some encouragement.

### Issues

There are three issues that immediately spring to mind:

1. What is the result of evaluating a boolean operator on strings that are not "true" or "false"? This determines the impact on the interpreter of programmers frequently mistyping simple things: if "treu", if "True", or many other near-misses.

2. What is the impact on the programmer of these simple typos, in particular how visible is it to the programmer what has gone wrong and where the source of the error is?

3. You seem to be describing Lua. Would it not be simpler to just use Lua?
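Issue 1 can be made concrete with a small sketch (Python standing in for the hypothetical interpreter; `as_bool` and its error message are invented for illustration):

```python
def as_bool(s):
    """Strictly convert a boolean-string to a host boolean; reject near-misses."""
    if s == "true":
        return True
    if s == "false":
        return False
    raise ValueError(f"expected 'true' or 'false', got {s!r}")

# A typo like "treu" fails loudly instead of silently taking a branch.
try:
    as_bool("treu")
    outcome = "silently branched"
except ValueError:
    outcome = "error"
```

With a lenient interpreter (anything non-"false" counts as true), the `try` block would instead take a branch silently, which is exactly issue 2.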

### 1. The only conditional in

1. The only conditional in the language is pattern matching, so at some point it would get matched against "true"/"false" and anything else would raise a match error.

2. We evaluate code live so the result is immediately visible. On an error we pause the code and drop the user into a debugger at the point where the error happened. That does still force them to follow the flow of "treu"s back to the source by hand. That seems like a problem we need to solve in general, not just for typos. Some form of type inference might help catch that earlier.

3. We diverge drastically from Lua (live-coding, purely functional, pattern matching, graphical editor, etc.). The focus is on making a highly discoverable environment.

### lispy fix-and-continue?

if there's no static checking up front to catch my fat-fingers, then i sure hope e.g. after a 10 hour run and then the failure i can just fix the typo from ture to true, and keep on trucking? :-)

### Yep, that's the plan.

Yep, that's the plan.

### We're planning to also have

We're planning to also have structural type inference (à la Dialyzer) in the IDE to help catch mistakes like that up front.

### seems OK

Numbers shouldn't be strings because of problems like a_number + 0 == a_number. If equality is string equality, that isn't always true, assuming some number values have more than one read representation. On the other hand, if there are two distinct equality operators, then has anything really been simplified?

It's hard to prove a negative but I can't think of anything that could go wrong using strings as booleans assuming:

If an operator expects a boolean, it should accept only the strings "true" and "false".

For example, not "zing" should be an error. So should if "zing" then blorg().

I think this works because:

1) All familiar boolean operator identities hold as usual. For example, not not x == x

2) Aside from ==, boolean and string operators are disjoint. Boolean and string equality can be the same operator without problem.

Without the strictness error, you would wind up with problems like:

if "foo" then blorg();

Suppose "foo" is treated like "true", violating strictness. So, blorg is called.

So: if not "foo" then blorg(); does not call blorg.

So not "foo" == "false"

So not not "foo" != "foo"

That leads to strange problems: if, while simplifying a program, a person winds up with:

y := not not "foo"

then strictly speaking it would be incorrect to simplify it to:

y := "foo"

Something similar happens in C: 5 != !!5. The problem doesn't make a programming language unusable, but it does make it more confusing, and the priority goal for your language is apparently simplicity (though not over-simplicity).
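The C analogy is easy to check in any dynamic language; here is a Python sketch, where `not` plays the role of C's `!`:

```python
x = 5
double_negated = not not x    # collapses x to a canonical truth value: True
assert double_negated != x    # the analogue of C's 5 != !!5
```

So even in languages where lenient truthiness is deliberate, `not not x` is not `x`; the simplification is only valid for already-canonical boolean values.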

### no, I take that back

Actually, there is a problem:

not ("zing" == "true") == "true"

but:

(not "zing" == not "true") == "true"

is an error. So an example of a boolean identity that is lost is:

not (x == y) == z

is not the same as

(not x == not y) == z

so I guess they should be separate types.
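The lost identity can be sketched under the strict rules above (Python; `s_not` and `s_eq` are invented stand-ins for the language's not and ==):

```python
def s_not(s):
    """Strict not: only defined on the boolean strings."""
    if s == "true":
        return "false"
    if s == "false":
        return "true"
    raise TypeError(f"not: expected 'true'/'false', got {s!r}")

def s_eq(a, b):
    """String equality, returning a boolean string."""
    return "true" if a == b else "false"

# not ("zing" == "true") is well-defined and yields "true"...
outer = s_not(s_eq("zing", "true"))

# ...but (not "zing" == not "true") is an error, so the identity
# not (x == y) == (not x == not y) is lost for arbitrary strings.
try:
    s_eq(s_not("zing"), s_not("true"))
    identity_holds = True
except TypeError:
    identity_holds = False
```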

### Given that the language is

Given that the language is dynamically typed, that seems like a problem even with separate boolean types:

not ("zing" == true) == true

(not "zing" == not true) is an error

The only way I can see to avoid that is if ("zing" == true) is an error, but then the user needs to check (type("zing") == type(true)) all over the place.

I'm not sure I like either version. Getting equality right is going to be hard.

### equality vs. sameness

Getting equality right is going to be hard.

One way is something like:

a == b is defined if the operands are both of the same type, and an error otherwise.

a same? b is true if a == b and is otherwise false.

Where your pattern matching flow operators *implicitly* pick a form of equality, default to same?.

When teaching a novice programmer about equality in your language, initially only teach == and leave same? as an "advanced topic".

The advanced topic is basically polymorphism and the topic would include same? and type predicates string? x, number? x, boolean? x, and table? x.

The idea is that the novice gets pretty far relying just on == and not having to know much about types.
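The two operators could be sketched like this (Python; `eq` and `same` are stand-ins for the proposed == and same?):

```python
def eq(a, b):
    """'==': defined only when both operands have the same type."""
    if type(a) is not type(b):
        raise TypeError(f"== undefined between {type(a).__name__} and {type(b).__name__}")
    return a == b

def same(a, b):
    """'same?': true iff '==' is defined and true; never an error."""
    try:
        return eq(a, b)
    except TypeError:
        return False
```

With this split, eq("zing", True) is an error, while same("zing", True) is simply False, which is the behavior pattern matching would default to.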

### Hardness

I have a theory about programming language design: if you're having trouble figuring out an algorithm, and the implementation of it looks complex, it's wrong. It's wrong because the user is going to have exactly the same problem you are. If the actual algorithm is complex, reasoning about the code will be complex.

I have made this kind of mistake a number of times in my own language; I now regret the extra work I had to do implementing something clever which turned out not to be clever at all.

Business logic is NOT simple. It's intrinsically complex to start with. So anyone coding it needs some understanding of programming and just isn't going to have any problem at all with a boolean type.

In fact, since business often involves multiple options and that's quite normal, I'd seriously think about giving them a real sum type. They'll have less problem with it than the average CS trained programmer.

### accidental vs. essential complexities

While I hear some ring of truth in what you say, I think people have to be careful about extremes and about what precisely is needed. There's lots of crap in programming languages that most of us don't recognize as crap because we're inured to it. That doesn't make it stop being crap in the absolute universal truth insightful usability sense.

If we are trying to reduce something (a business process) down to math (an algorithm), then presumably either the algorithm writer has to know math or the system has to know enough math to help them write it out in a way that works. I suspect we don't have systems like that, so it means the algorithm writer does have to know math.

If we are trying to reduce it to something else, like a rules system, then the person / system have to know enough / hand-hold enough to get it working.

etc.

So yes it is probably dangerous and silly when people think it can be "simplified" into something too simple (as in they went past the "but no simpler" rule).

But we should still be striving to look for the true essence, the true essentials, and seeking to reduce and eliminate the accidentals.

### As It Stands, I'd Lean No

First, strings are essentially lists or arrays, and string inequality may often be quickly noted, but verifying a match requires checking every character. If the design impetus is to make the boolean type less explicit for your users, I'd go with shorter strings, perhaps "y" and "n."

The quickest test for equality is that two values have the same memory location. This is why most languages have true or false expressed as symbols representing constants. Where booleans are a type, I think it's so the headache of C's 'if ( x = 0 )' remains where it belongs. As I think about that, I worry that collapsing boolean types into strings may encourage obfuscated and hard to debug code, in that an expression may generate its "truth" through string manipulation.

I assume this design choice is motivated by your wish to simplify the language for your target users. That's a good thing to aim for. I hold that the difficulty in teaching a language is not in the specific representations of true and false, but in demonstrating the concept and practice of building tests and branching.

### You will regret it!

It is best to limit the language as much as possible. You can always relax restrictions later on, but it can be very hard to make an existing language more stringent. For example, why have strings "true" and "false" when you must deal with the possibility that someone gives you "blarney" instead?

And for that matter, if your tooling is supposed to support people whose primary job is not programming, then you should be as stringent as possible so that it's extremely clear what the boundaries of the language are. For example, you could make lint rules such as (a = "foo") = false into errors, with the suggestion to use the simplification (a ≠ "foo") instead. In that vein, type checking would be nice; but type inference is probably unnecessary.

Also, as others have mentioned, consider using an out-of-the-box solution such as Lua, which has the advantage of being well-documented outside your company, and is also a transferable skill.

### Dependent types

It isn't so bad to use strings to represent booleans if you use dependent types or similar, i.e. such that you can enforce upstream the expectation that a string is "true" or "false" and nothing else (and thus carries one bit of information). Even for dynamic types, this can work if 'error' is not observable (i.e. it halts the program).

Though, booleans are awful anyway due to boolean blindness. I ended up favoring sum types as a foundation for conditional behavior.

### I ended up favoring sum

I ended up favoring sum types as a foundation for conditional behavior.

Exactly. Reifying propositions as values means you lose important semantic information. I don't think many languages have taken this route however, so I'm interested to see how it turns out.

Most uses of bools seem to be justified because of insufficient abstraction, i.e. you want to pass a membership testing function (forall a.a -> bool) into some polymorphic function, so you need to be able to abstract over propositions-that-aren't-values. In the case of sums, this implies abstracting over sums à la views, or first-class patterns/cases of some sort.

### Unconvinced

I didn't find Bob Harper's article very convincing. Ignoring the mumbo jumbo about constructivism, which I found totally unconvincing, I think "just restructure your program so that all of the properties you are interested in come for free" isn't realistic (or good) advice. It's not even possible in many circumstances. What should you do when the boolean you're trying to eliminate implies the presence of data in several places with differing multiplicities? Even when it is possible, I think it's often not a good idea to do because unwrapping constructors everywhere is noisy and it couples the code that uses some data to the logical structure governing when that data is available. The article makes a snide comment about requiring a SAT solver, but in those cases where the boolean can simply be replaced with constructors the reasoning required to verify that the correct booleans hold is equally trivial.

### Boolean blindness

The difficulty with booleans is something I encountered long before I read Bob Harper's article, but he gave me a nice name for it. The real problem is well summarized towards the end:

"we've crushed the information we have about x down to one bit, then branched on it, then were forced to recover the information we lost to justify the call to pred" -- Bob Harper

The pattern of losing information then recovering it does not generalize well to computation with substructural types or heterogeneous systems (e.g. GPU/CPU partitioned memory). It is also not very compositional.

unwrapping constructors everywhere is noisy and it couples the code that uses some data to the logical structure governing when that data is available

I don't follow these complaints. Unwrapping constructors does not need to be noisy (cf. typeclasses). The framework (or logical structure) to access information can be decoupled from the function that uses it (cf. lenses). If we avoid booleans, a few programming idioms will change, but that isn't a problem.

What should you do when the boolean you're trying to eliminate implies the presence of data in several places with differing multiplicities?

We should have a very good, structural explanation of how this boolean is obtained or enforced.

In any case, the idea isn't to "eliminate" booleans. The idea is to avoid them in the first place. The difference is significant; in the latter case, your libraries and frameworks and language will be helping you out; in the former, booleans are already deeply embedded and you'll be fighting for every step.

"just restructure your program so that all of the properties you are interested in come for free" isn't realistic (or good) advice. It's not even possible in many circumstances.

Just be interested in the properties you can achieve structurally. ;)

### Well

I don't follow these complaints. Unwrapping constructors does not need to be noisy (cf. typeclasses). The framework (or logical structure) to access information can be decoupled from the function that uses it (cf. lenses).

Ok, I admit that I may have been over-generalizing or reading too much into the article or what you were endorsing by linking to it. And I agree there is a real problem, but think that just wrapping everything in constructors is a cure worse than the disease. I'm open to the possibility of other approaches that solve the problem (typeclasses, lenses, etc.). I have my own ideas.

We should have a very good, structural explanation of how this boolean is obtained in the first place.

No, we generally don't, because when we're writing code to use the boolean we might not have even written the code yet that produces it. And even if we have, coupling the two leads to brittleness.

The idea is to avoid [booleans] in the first place.

I agree and I'm in favor of keeping a better handle on them in the first place. Again, my main complaint is with what I understood to be the proposed solution. FYI: I also don't like Option types (or explicit sums) as the replacement for null in most cases.

### when we're writing code to

when we're writing code to use the boolean we might not have even written the code yet that produces it

I don't see the order in which you write code to be relevant. If a particular subprogram doesn't need much detail about how a condition is observed or enforced or explained, you can abstract on those details at the interface to the subprogram. This abstraction is orthogonal to whether those details are preserved. We can also have functions to compose and extend the abstract explanation - indeed, every function on (whatever replaces booleans) may do so.

In any case, booleans should almost never be inputs to interesting functions or subprograms. If they are, you're probably doing modularity wrong; consider instead writing a separate function for each case. If necessary, separate the case-recognition logic.

I also don't like Option types (or explicit sums) as the replacement for null in most cases.

I'm not fond of nominative types, including Maybe a = Just a | Nothing. I would favor type Maybe a = Either () a (or () + a), and to just have everything in terms of explicit binary sums. There are a lot of nice structural operations we can perform using more structural types - e.g. not :: (a + b) → (b + a) and left :: (a → a') → (a + b) → (a' + b). Generic programming is much easier when we can decompose and recompose types. Further, we can readily generalize from Maybe to ErrorT e if we want to treat it as a Monad.
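Those structural operations can be sketched with tagged pairs (Python; `inl`/`inr`/`swap_sum` are invented names - `swap_sum` plays the role of not :: (a + b) → (b + a), since `not` is reserved):

```python
def inl(a): return ("left", a)    # inject into a + b from the left
def inr(b): return ("right", b)   # ...and from the right

def swap_sum(s):
    """not :: (a + b) -> (b + a)"""
    tag, v = s
    return inr(v) if tag == "left" else inl(v)

def left(f, s):
    """left :: (a -> a') -> (a + b) -> (a' + b): map over the left branch only."""
    tag, v = s
    return inl(f(v)) if tag == "left" else s

# Maybe a, encoded structurally as () + a:
nothing = inl(())
just_3 = inr(3)
```

Because the encoding is structural, generic combinators like `swap_sum` and `left` work on every sum, not just on Maybe.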

With dependent types, we can presumably leverage dependently typed pairs to model sums. However, I've had quite a lot of difficulty generalizing structural operations to dependent pairs. How can we preserve the type inference relationship when performing structural operations on the pair? swap :: (a * b) → (b * a) isn't particularly difficult, but when we start using assocl :: (a * (b * c)) → ((a * b) * c) and first :: (a → a') → (a * b) → (a' * b), we must start reasoning about whether functions on a are injective so we can recover the relationship between a' and b.

Further, dependent pairs still have the same fundamental boolean blindness issue of performing logical recognition on a then forgetting about it, repeatedly.

I don't know many other alternatives, so I ended up favoring structural sum types.

### If a particular subprogram

If a particular subprogram doesn't need much detail about how a condition is observed or enforced or explained, you can abstract on those details for just that subprogram.

I was considering explicit abstraction to be expensive for such a common operation as this. In the early stages of programming, when you're just writing down fragments of code, I think it's helpful to not worry about writing down the precise assumptions or context in which each fragment works. I still think coarse HM-ish typing is useful at this stage, but more precise types for establishing program properties should be optional and usually come later. Forcing early resolution of such logical properties by embedding them into the structure of data is IMO an anti-pattern.

What I have in mind is more like (oft-maligned?) type state. Rather than a type of "Maybe Int", you'd have a type that says "Int, when initialized". This feels like it fits into my language philosophy better, but I haven't hammered out too many of the details yet.

### Coupling structure and logic

Forcing early resolution of such logical properties by embedding them into the structure of data is IMO an anti-pattern.

You seem to be assuming early resolution of the data structure.

Coupling logical properties to data structure has some nice consequences. We can abstract logical properties by abstracting data structure - something we know how to do very well (ADT, OO). We can manipulate and refine logical properties by manipulating and refining data structures - either constructively (e.g. non-empty list as pair of element and list, substructural and modal types) or analytically (i.e. by assertion or contract).

During early stages of development, we can use more generic data structures - lists, matrices, graphs, trees, streams, etc. - and thus enforce fewer logical properties.

### Fuzzy judgements

I know I'm making assumptions and acknowledge that these are fuzzy judgements I'm making. (The original rant against booleans was similarly fuzzy). The design space for programming languages is quite large and it's quite difficult to even identify what my assumptions are.

During early stages of development, we can use more generic data structures - lists, matrices, graphs, trees, streams, etc. - and thus enforce fewer logical properties.

When you switch from using lists to using pairs to encode non-empty lists, do you have to change all of the code that uses this type?

### Program refactoring

When you switch from using lists to using pairs to encode non-empty lists, do you have to change all of the code that uses this type?

In general, yes. Fortunately, this sort of global refactoring can be quite straightforward and easy, especially when guided by a type system.

Partial functions (and inference of refinement types) can help ease the transition and make it more incremental (one small subprogram at a time instead of "change all the code"). E.g. Haskell has head :: [a] → a and tail :: [a] → [a] on a regular list, which diverge if applied to an empty list. My language offers assert :: (a + b) → b for similar reasons, except I treat assertions (and non-divergence in general) as a weak objective for static proof (warn if cannot prove or disprove within limited effort).

Of course, best practices for API design still apply for shared libraries or frameworks. One should encapsulate decisions and structures that are likely to change.

### Separate function for each case

"In any case, booleans should almost never be inputs to interesting functions or subprograms. If they are, you're probably doing modularity wrong; consider instead writing a separate function for each case. If necessary, separate the case-recognition logic."

Although this advice seems sound, it is very often not practical because the language fails to provide suitable support.

In the first instance, replacing a function containing so-called "threaded logic" with two functions will lead to substantial parts of the function being duplicated. So next you will tell me to factor the common parts out into subroutines. Which now has major problems, because you have to name them, and you have to pass context to them, which means you need a type representing what was previously private state.

If you now take these issues of bifurcation and have 4 or 5 boolean arguments for threading, unthreading the code is utterly intractable.

This problem is manifest in Ocaml pattern matching, where your first crude split into several cases is followed by a second test, which is repeated in several branches. You can certainly lift the common code out of the match expression but then you have lost context and more importantly locality, defeating the whole idea of lexical scoping.

So despite the apparent evils of threaded code, it actually has better compositional properties than the alternatives in many cases. No one likes the spaghetti presented in many POSIX functions, lots of flags and stuff, but when you look at the complexity of the operations represented, it's hard to see a good alternative.

### Language can hinder clean

Language can hinder clean factoring, I agree. Local variables, nominative types, and second-class patterns can be major hindrances. As can be poor dataflow optimization.

I've leaned towards concatenative style (even in Haskell and C++) and structural types (templates, pairs, etc.) for a long time. GHC's ViewPatterns extension is also nice for factoring and flattening out some of the case-recognition logics.

### I wonder to what extent

I wonder to what extent boolean blindness can be ameliorated by first class patterns.

Suppose that all the primitive boolean-returning functions are actually patterns instead eg

    match x with | (>= 0) -> ... | _ -> ...

    filter(xs, match (>= 0))

Then when you branch on a proposition the information that it encodes is right there. You could of course still choose to return some bool-like value.

When patterns are statically determined they can provide structural type information directly without having to work backwards from bools and the match compiler has more information to work with when ordering tests.

Erlang goes some way towards that ideal but it lacks first-class patterns and so functions like filter still have to use boolean atoms.
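A rough sketch of patterns as refining functions (Python; `match`, `non_negative`, and the None-on-failure encoding are all invented for illustration - this is not Python 3.10's `match` statement):

```python
def non_negative(x):
    """A refining 'pattern': return the matched value, or None on failure."""
    return x if x >= 0 else None

def match(x, *clauses):
    """Try (pattern, action) clauses in order; a pattern returns None to fail."""
    for pattern, action in clauses:
        m = pattern(x)
        if m is not None:
            return action(m)
    raise ValueError("no clause matched")

result = match(-3,
               (non_negative, lambda n: f"{n} is >= 0"),
               (lambda v: v, lambda n: f"{n} is negative"))

# filter(xs, match (>= 0)) becomes a use of the same pattern, no booleans:
kept = [x for x in [3, -1, 0, 7] if non_negative(x) is not None]
```

The action receives the refined value directly from the pattern, so the information the test encodes is "right there" rather than crushed to a bool.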

### Patterns help only if they refine

Patterns help only if they refine the matched values.

In your example, the (≥ 0) pattern would generally not help. The code on the RHS, the ..., would use x again but contain a context-dependent assumption that x ≥ 0 (or x < 0 on the failed match branch). Because these assumptions are context dependent, it is difficult to refactor or extract this code for reuse.

You might do better with something like GHC Haskell's ViewPatterns extension, which aren't quite first-class but are flexible and subject to abstraction.

### And why not?

A compiler can trivially add local assumptions to the typing environment. They just have to be conservative. For example, in imperative pseudo-code:

    int f(dynamic x) {
      // x :: dynamic
      if (x is int) {            // x :: int
        if (x > 0) {             // x :: int, x > 0
          return (x - 1);
        } else {                 // x :: int, x ≤ 0
          return (x + 1);
        }
      } else if (x is object) {  // x :: object
        if (x = null) {          // x :: object, null(x)
          return 0;
        } else {                 // x :: object, nonnull(x)
          return x.toInteger();
        }
      } else {                   // x :: dynamic (no new information)
        return 0;
      }
    }

You often see boring branchy code like this in the wild, making assumptions that could easily be verified by the compiler: for instance, that it's safe to access .toInteger on x if x has already been tested for nullity.

Of course, with mutatey OOP-land there are some caveats: you have to propagate assumptions through calls (undecidable?) or assume conservatively that because any non-const member function can mutate any mutable member, assumptions can't be preserved across calls via this.

### Because these assumptions

Because these assumptions are context dependent, it is difficult to refactor or extract this code for reuse. Without effective modularity and decomposition, we cannot readily abstract these deep 'local' patterns. They get repeated at each site.

Verifying such code can be a good thing, of course. Typed Scheme seems to work on a similar design. But it fundamentally doesn't address the boolean blindness problem - i.e. that you are forced to keep all this context (which is not part of the boolean) in order to make sense of the boolean.

### That's true.

If I want to extract an interior block, I may not be able to do so because its contents may rely on assumptions from an enclosing one. Ideally you would have a first-class way of talking about this kind of information, which effectively implies having proof objects.

I do question whether doing it "the right way" is worth the additional cost, though. Having a model that's easy to demonstrate concretely is important, and I think you would have little trouble explaining something like this to Sam Imperative Programmer:

    int f(nullable int x) {
      return (x + 1);  // error: (+) expected int but got nullable int (x)
    }

⇓

    int f(nullable int x) {
      if (null(x)) { x := 0; }
      return (x + 1);
    }


### Refinements

Proof objects can be a bit heavyweight, yes. What I've been pursuing is something lighter weight - just a different way of expressing observations that is hopefully easier for both human and type system to track.

Instead of x is int returning a boolean, I might model observeInt :: (Observable x) ⇒ x → (x(not int) + x(int)). (Here 'Observable' means 'x' must accept this sort of introspection; in a conventional dynamic language, all values might be observable. But I'm not fond of universal introspection.) We could then, in the left branch, set the value to 0 then merge. Similarly, instead of x ≥ 0 returning a boolean, I might model observeGTE :: (Comparable x y) ⇒ (x*y) → ((x*y)(when x<y)+(x*y)(when x≥y)).

By associating type refinements with data structures, the type checking should (hypothetically) be easier to perform, and certainly operations to collapse the sum (losing information, type unions) are much more explicit. But the biggest motivation is that operations on sum types can be composed and decomposed, i.e. unlike if/then/else expressions which are syntactically closed.
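A dynamic-language approximation of the observeInt idea (Python; the tag encoding and names are invented, and `isinstance` stands in for the Observable constraint):

```python
def observe_int(x):
    """Instead of a bare boolean, return a tagged value: the evidence travels with x."""
    return ("int", x) if isinstance(x, int) else ("not-int", x)

def to_int_or_zero(x):
    tag, v = observe_int(x)
    if tag == "not-int":
        v = 0          # collapse the left branch explicitly...
    return v           # ...then merge: v is an int on both paths
```

Compare with `if isinstance(x, int): ...` where the body uses `x` again and must remember, out of band, why that use is safe.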

### Thoughts

Design to meet your users' needs and expectations. Trained programmers already know Boolean logic, so will come expecting your language to provide familiar true and false symbols, and may get stroppy if it doesn't. However, non-programmers have no such preconceptions, giving you much more leeway in how you implement and teach Boolean-based operations. As long as it's simple, consistent, and can be quickly learned if/when it becomes necessary to know, they'll accept whatever you give them.

Consider what role(s) Booleans actually serve. I can think of three: conditional switching, flag-style arguments, and arguments/return values to comparison/logic operations. You're already using pattern matching for the first - effectively cutting out the Boolean middleman - so they're not important there. The second is far better achieved by dedicated two-value enumerations, e.g. if you're providing a sort function, "normal"/"reverse" is far more self-explanatory than true/false for specifying the sort order.

That really just leaves comparison operations, and unless users are performing lots of those then I'd say there's very little justification for a dedicated Boolean type, or even pseudo-Boolean "true"/"false" representations. I'd suggest just treating empty values as false and non-empty values as true, and seeing how they get on with that. For example, the end-user language I'm currently developing takes this approach: since most of the relevant use-cases are conditional tests of the form "if input, do this/if no input, do that", testing for emptiness actually works out much simpler than testing for Boolean-ness.

...

I'd also suggest getting rid of the distinct number and string types, and just have a single "text" type with relaxed quoting rules. End users are task-driven: they care about behaviors, so focus on your [function] interfaces. Data's just data: if it looks right, it should just work when they throw it at the desired interface. Start telling them that they can do math with 123 but not "456", or that Bob, bob, and BOB are fundamentally different things, and they're going to stare at you like you're wrong in the head. You'll find end users have a very poor, muddled grasp of abstract concepts such as "type", "value", and "variable" at the best of times - heck, just developing a working grasp of grammar and punctuation rules takes no small amount of effort. The more you can eliminate abstract concepts in favor of concrete ones, the better: anything that isn't directly facilitating their goals is impeding them.

Obviously, having a single scalar type means you'll need to provide two distinct sets of operators - one for performing math operations (using symbols and syntax as taught in school), and another for performing text manipulation. While this might sound more complex (1 type + 16 operators rather than 2 types + 8 operators), each of those 16 operators is far simpler to learn and use. When reading code, a user can tell how any given operator will behave just by looking at its name, which is immediate and concrete information. OTOH, an overloaded operator will act differently depending on what sort(s) of values are fed into it, and since most of these values aren't written as literal operands but are supplied via variables and other expressions, deducing that indirect, abstract information is a lot more work.

Having dedicated text operators should also let you support optional arguments for specifying whether comparisons should consider or ignore case/whitespace/accents/etc. without muddying up the clean, simple semantics of your math operators. In particular, case should be insensitive by default; however, users should still have the option to override this on the rare occasions when they do need case to be taken into account. And non-programmers will expect your language to handle stuff like this for them.

Lastly, non-overloaded operators should be far easier to document: you can cover one set under a "Doing math" chapter and the other under a "Manipulating text" chapter, with no coupling or crossover between them. OTOH, if you've two distinct number and string types plus overloaded operators applicable to both, it's a PITA figuring out how, when, and where to slice up your coverage for the least amount of unavoidable repetition and/or page-hopping.

### That's pretty convincing.

That's pretty convincing. Even for comparisons, the result is not really boolean, e.g.:

    equals : x. x -> x -> "equal" | "not equal"
    compare : x. x -> x -> "less than" | "equal" | "greater than"
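Those signatures could be realized as (a Python sketch; the string results mirror the types above):

```python
def equals(x, y):
    return "equal" if x == y else "not equal"

def compare(x, y):
    """Three-way comparison returning a descriptive string rather than a bool."""
    if x < y:
        return "less than"
    if x > y:
        return "greater than"
    return "equal"
```

The results then feed directly into pattern matching, with typos in the match arms caught as match errors rather than silently-false branches.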

Performance-wise, we can maybe do something with interning, along similar lines to erlang atoms or clojure keywords.

Getting rid of separate number types makes me nervous. For scientific work it's quite important to be explicit about precision, and I'm not sure how to do that with only a single number type. Perhaps we could default to decimal arithmetic with string-like numbers and only expose other representations as an advanced feature.

Since we're building a structured editor we have a lot more leeway with syntax, e.g. we already have a prototype 'math mode' editor where you see nicely laid-out MathML rather than flat text. The idea is that the user is choosing a context to work in at any moment and we only suggest functions that make sense in that context.

### number worries

FWIW, I started writing a longer response, but it turned into a big long raw braindump which may or may not be of any value, so I'm a little reluctant to post it here. I could bob it to you over email if you want, or post it here if you really want.

As to numeric types, bear in mind that what the user sees and thinks doesn't necessarily need to match what the interpreter sees and thinks. Language design is UI/UX design: the first goal is to meet users' needs and expectations as best as possible, not to make life simple for the poor language developer. ;) The only real rule is that where you do hide internal complexities to simplify the external interface, make sure those abstractions aren't going to spring leaks all over the user five minutes after they start to use them.

So, as far as having distinct number and string types:

Your runtime could internally implement both Number and String classes and have the parser optimistically treat any token that looks numeric as a Number and the rest as Strings. You can then implement a pair of Number->String and String->Number coercion handlers, so that it doesn't matter if the user passes (String,Number) to the math addition operator or (Number,String) to the text concatenation operator: the runtime will automatically coerce the operands to the required types, (Number,Number) and (String,String) respectively, on the operator's behalf. From the user's POV, though, you just document it all as being a single 'text' ['scalar', whatever] type.
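A rough Python sketch of that scheme (all names invented): the parser guesses Number for numeric-looking tokens, and each operator coerces its operands to whichever type it needs.

```python
def parse_scalar(token):
    # optimistic parse: numeric-looking tokens become Numbers (floats here)
    try:
        return float(token)
    except ValueError:
        return token

def math_add(a, b):
    # math operator coerces both operands to Number
    return float(a) + float(b)

def concat(a, b):
    # text operator coerces both operands to String; '%g' hides the
    # float-ness of internally numeric values (no stray '.0')
    to_text = lambda x: "%g" % x if isinstance(x, float) else x
    return to_text(a) + to_text(b)

math_add("2", 3)                        # 5.0
concat(parse_scalar("1.5"), " apples")  # '1.5 apples'
```

The user never sees the Number/String split; they only see that math operators do math and text operators do text.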

This is similar to what kiwi does. Since math and performance are not major requirements, it doesn't bother having an internal/explicit Number type at all. Relaxed quoting rules mean most text never needs to be quoted anyway, so 123.45 and "123.45" are both exactly the same thing: a text value that just happens to look like it's some sort of number-y data. OTOH, kiwi's 'text' type is actually implemented as a cluster of concrete classes (to which I could always add a NumberText in future if it became of significant benefit):

1. PlainText is a plain vanilla character sequence (i.e. your standard String), e.g. "Hello, World!"
2. RichText is a text value with one or more unevaluated rules attached (think "attributed string", but with behaviors, not styles, attached), e.g. [bold, case (upper) @ "Hello, World!"]
3. ExpandedRichText is a text value with one or more partially-evaluated rules attached, e.g. [bold @ "HELLO WORLD!"]
    return TRUE;
    }
    else {
    return FALSE;
    }

    if (!$foo) {
    bar();
    }
    else {
    baz();
    }


### Booleans as data

In some sense, we could say the same of information in general: "ultimately, information must either be ignored or branched on". Ultimately, the only way to 'use' information is to have it influence some decision.

But we can also compute with information, calculate and communicate without observing or branching. And the same is true for booleans. E.g. if you say a and b we can combine two booleans without branching. It isn't clear to me that booleans are distinct from other value types in this respect.
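For instance, with the two booleans encoded as the integers 0 and 1 (an illustrative encoding, not anything from the languages discussed here), the connectives become ordinary arithmetic on data, and no branch is taken to combine them:

```python
def b_and(a, b):
    # bitwise AND on a 0/1 encoding: pure data flow, no conditional
    return a & b

def b_or(a, b):
    return a | b

def b_not(a):
    return 1 - a

b_and(1, b_or(0, 1))   # 1, computed without ever branching
```

The values can be combined, stored, and passed around like any other data; an actual branch happens only if and when something finally inspects the result.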

### if you say a and b we can

if you say a and b we can combine two booleans without branching.

How could "and" be implemented without branching? Even if you shift the branch into a lower layer by making "and" a primitive, we're still stuck with a boolean result that's only good for branching on.

It isn't clear to me that booleans are distinct from other value types in this respect.

They're not, at least not from other finite enumerations. That was kind of my point: a language shouldn't give special attention to booleans 'because they're used for flags and (Boolean) logic', since those are just ways booleans are *used*, which also apply to other enumerations.

There are 'legitimate' reasons to special-case the booleans, for example if you only want a single finite-enumeration type and booleans are the simplest.

When breaking with tradition like this, I feel it's important to distinguish between things which are fundamental/unavoidable (ie. branching) and things which are historical/conventional/avoidable (ie. true/false flags, boolean logic). This way, old patterns aren't shoe-horned into the new design when they're not appropriate. For example, true/false flags are avoidable by using has's observation that passing the strings "true" and "false" into a function is a wasted opportunity when we could be passing descriptive strings like "normal" and "reverse".
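A small sketch of that last point (the function and argument names are invented): a descriptive string documents the call site in a way a bare true/false cannot.

```python
def sort_names(names, order="normal"):
    # 'order="reverse"' reads aloud at the call site;
    # a boolean 'reverse=True' flag would say much less
    if order == "normal":
        return sorted(names)
    if order == "reverse":
        return sorted(names, reverse=True)
    raise ValueError(f"unknown order: {order!r}")

sort_names(["carol", "alice", "bob"], order="reverse")
# ['carol', 'bob', 'alice']
```

The error branch also catches "reversed", "Reverse", and other near-misses explicitly, rather than silently treating them as one of two truth values.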

### That was kind of my point: a

That was kind of my point: a language shouldn't give special attention to booleans 'because they're used for flags and (Boolean) logic', since those are just ways booleans are *used*, which also apply to other enumerations.

Yep, this was my point too. The difference is: it's those patterns of usage that you need to study and understand, not the language-defined types they're currently using to do it. You design the tools to serve the tasks, not the tasks to fit the tools.

### for example

To illustrate: while the OP worries about how to represent numbers adequately without providing users a dedicated number type, my response would be: "But as scientists, how are they supposed to represent everyday values like 37.8°C, 8000K, 0.25 Å, 314.159µm, 2.0626e5 AU, and so on, using only a number type and its literal representation of values?"

Though IANAS, I'd speculate that being able to describe and manipulate such data efficiently and effectively will be of infinitely more interest and value to such users than anything the language can teach them about types.

But don't rely on my assumptions, your assumptions, or anyone else's. The only way to be sure is to go speak to them and learn from them. Get out on the shop floor and communicate with the people doing the actual work: they're your intended users, and the ones you need to please. Learn how they think, what their jobs are, and how they do them. And make sure you check all your own CS preconceptions at the door: while such knowledge will be a big help once you implement your design, it can only hinder (or cripple) the process of discovering that design in the first place.

### And you'd be correct from a

Booleans only have one use: branching (ie. conditional switching). Your other two examples are just the interaction of branching with function abstraction.

And you'd be correct from a purely technical POV: all Booleans ultimately end up consumed in the fiery depths of a JNZ.

OTOH, I'm coming at it from the users' perspective, and what various Boolean values might mean to them as they fly hither and yon before they reach that final destination.

Since my own end-user language designs are behavior-centric, my interest is in what goes on at the user interfaces to those behaviors: interfaces - that is, function signatures and calls - are the one bit of a program that's always completely concrete and visible to users, so that's what they'll hang their understanding on*.

A better phrasing for my question might be:

What explicit and/or implicit meaning[s] might a given Boolean value carry for 1. the user, 2. the function, at the point it meets the function (behavioral) interface?

followed by:

Of these use-cases, which really are best served by Booleans, and which should/might be better/alternately achieved by other means?

Which, as I say, is something that can only be answered from the POV of "What does the end user want/expect/understand/need?" Language design is UI/UX/HCI design, so questions like "What makes the language designer's life simple and painless?" or [worse] "What does the hardware need?" may be indicators you're doing it wrong. ;)

--

* i.e. Not fleeting magical "objects" that only machines pretend to see; nor an endless arcane ontological quest into the true meaning of "type"**.

** (I'll get my coat...)

### another option: booleans are not values

I still take this bit to be the primary constraint, emphasis added:

simple, dynamically-typed language for business users.

One idea is to omit booleans as first-class values. Instead, allow boolean "expressions" (let's call them "tests") only in conditionals.

In other words, if you were to have a simple conditional ("if .. then .."), then its syntax would not be:

  if expression then statement

but rather:

  if test then statement


Tests would be something like:

   expression == expression
expression < expression
...
test and test
not test
...


(Since the language includes side effects, perhaps potentially side-effectful expressions should not be included in the syntax for tests.)
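One way to sketch this in an interpreter (node shapes and function names invented for illustration): tests form their own syntactic category with their own evaluator, so a truth value exists only transiently inside the conditional and never escapes as data.

```python
def eval_expr(node, env):
    # expressions produce ordinary values: numbers, strings, ...
    kind = node[0]
    if kind == "lit":
        return node[1]
    if kind == "var":
        return env[node[1]]
    raise ValueError("not an expression")

def eval_test(node, env):
    # tests may appear only under 'if'; their result never becomes data
    kind = node[0]
    if kind == "==":
        return eval_expr(node[1], env) == eval_expr(node[2], env)
    if kind == "<":
        return eval_expr(node[1], env) < eval_expr(node[2], env)
    if kind == "and":
        return eval_test(node[1], env) and eval_test(node[2], env)
    if kind == "not":
        return not eval_test(node[1], env)
    raise ValueError("not a test")

def eval_if(test, then_expr, env):
    if eval_test(test, env):
        return eval_expr(then_expr, env)
    return None

eval_if(("<", ("var", "x"), ("lit", 5)), ("lit", "small"), {"x": 3})
# 'small'
```

Because `eval_expr` and `eval_test` are disjoint, a program simply cannot store a test's result in a variable or return it from a function; there is nowhere in the grammar for that to happen.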

This suggests a programming style in which a function would never "return a boolean" but could instead return a number or string used to encode some concept:

    return "overdraft"
    return "full-time"
    return "true"


### MUMPS

No mention here of MUMPS? Which, as it was described to me in grad compilers class, has only one data type, the character string.