programming languages with full-unicode syntax and identifiers are surprisingly hard to do well

I am working on a programming language. I want to fully integrate Unicode - preferably the NFC/NFD repertoire, with "canonical" decompositions only. At the same time I don't want the bidi algorithm to be used to display code in a deceptive order on the page. But in order to achieve this, I have to require LTR control characters in the program text after every RTL character where the following character is bidi "neutral" or "weak." Is that a mode I can set in any programming editor in wide use, or do I have to implement my own editor? Adding LTR controls in a separate step (like with a sed script or something) means there's an extra step I have to do before I see, while editing, the same version of the code the compiler will be seeing.

At the same time I don't want "lookalike" characters used to display deceptive identifiers. Nobody can tell by looking whether 'a' is Latin or Cyrillic, or whether 'A' is Latin or Greek, and I don't want programmers tearing their hair out trying to understand why the variable they thought they just initialized is holding some value set by malevolent code somewhere out of sight, or why a perfectly innocent "Jane_Doe" keeps getting blamed for the fraudulent transactions of someone else whose name appears to be spelled exactly the same.

The most straightforward precaution here is to ban identifiers that contain alphabetic characters from more than one script, but that seems a lot like using a sledgehammer to kill cockroaches. A less restrictive rule would allow mixing scripts, but not if you use any letters which are confusable between those scripts - for example, you could mix Latin and Cyrillic as long as you avoid any character that looks like "a" (or the other characters that could be either), or you could mix Latin and Greek as long as you avoid any character that looks like "A" (or "B", or "X", or the other characters that could be either). But this makes the identifier syntax rules complicated to check and hard to express concisely.

These are just two of the *MANY* issues that need to be addressed in order to allow a fully Unicode-enabled programming language that isn't a security or usability disaster. I used to hate Unicode a lot more than I still do. These days I recognize it as a massive hairball, but I'm not really angry about it any more; it's just one more piece of legacy design that clearly was NOT intended for the kind of use I want to make of it. So it's massively difficult to use, leaky, flabby, illogical, promotes insecurity worse than a language without array bounds checking, etc., but I guess I've just come to accept it, and I'm finally getting around to trying to overcome the problems and do something worthwhile with it anyway.
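As one illustration of the "sledgehammer" rule above, here is a minimal Rust sketch of rejecting identifiers that mix alphabetic characters from more than one script. The script classification is a deliberate simplification using rough code-point ranges; a real checker would consult the Unicode Script property (UAX #24) data instead, and the less restrictive confusables-based rule would additionally need the Unicode confusables tables.

    #[derive(PartialEq, Clone, Copy, Debug)]
    enum Script {
        Latin,
        Greek,
        Cyrillic,
        Other,
    }

    // Deliberately rough classification by code-point range; a real checker
    // would use the Unicode Script property rather than hand-picked ranges.
    fn rough_script(c: char) -> Script {
        match c {
            'A'..='Z' | 'a'..='z' => Script::Latin,
            '\u{0370}'..='\u{03FF}' => Script::Greek,
            '\u{0400}'..='\u{04FF}' => Script::Cyrillic,
            _ => Script::Other,
        }
    }

    // The "sledgehammer" rule: reject identifiers whose alphabetic
    // characters come from more than one script.
    fn single_script(ident: &str) -> bool {
        let mut seen: Option<Script> = None;
        for c in ident.chars().filter(|c| c.is_alphabetic()) {
            let s = rough_script(c);
            if s == Script::Other {
                continue;
            }
            match seen {
                None => seen = Some(s),
                Some(prev) if prev != s => return false,
                _ => {}
            }
        }
        true
    }

    fn main() {
        assert!(single_script("Jane_Doe"));
        // Looks identical on screen, but the 'a' is Cyrillic U+0430.
        assert!(!single_script("J\u{0430}ne_Doe"));
        println!("both checks behaved as expected");
    }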
Egel v0.1.8 (beta) released - do syntax

Another Egel release. New in this patch: do syntax for applications, some UTF bugs squashed.

Egel is a small functional scripting language. What sets it apart is that at any time during execution the program, or any program fragment, forms a directed acyclic graph. That makes concurrency, serialization, and shipping of values more trivial than in most other languages, and I want to exploit that eventually to make mobile code a breeze to implement. Shoot any combinator anywhere!

Meanwhile, the Egel language supports only the bare minimum as a front-end to a graph rewriting semantics. At the same time, I sometimes implement a little syntactic sugar to make programming more pleasant. The latest addition is do syntax. Like in many functional languages, you can set up application chains, and I found a slight need to facilitate this kind of programming: a do expression abbreviates such a chain. Of course, this isn't Haskell's monadic do syntax, but as we all know, applicatives are as good as monads. Or are they?

By marco at 2023-05-22 21:24
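Since the Egel examples from the release post are not reproduced above, here is the general idea of abbreviating an application chain, sketched in Rust rather than Egel (the pipe helper below is my own illustration, not Egel's do): the same computation written once as nested applications and once as a left-to-right chain of steps.

    // Apply a sequence of functions left to right, starting from x.
    fn pipe<T>(x: T, fs: &[&dyn Fn(T) -> T]) -> T {
        fs.iter().fold(x, |acc, f| f(acc))
    }

    fn main() {
        let inc = |n: i32| n + 1;
        let dbl = |n: i32| n * 2;

        // The chain written as nested applications...
        let nested = dbl(inc(inc(3)));

        // ...and the same chain written once, as a pipeline of steps.
        let steps: [&dyn Fn(i32) -> i32; 3] = [&inc, &inc, &dbl];
        let chained = pipe(3, &steps);

        assert_eq!(nested, chained); // both are 10
        println!("{}", chained);
    }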
Ann: The Logic Lab

The Logic Lab is a high-level DSL for specifying logics. There is a series of YouTube mini-lectures on this technology detailed on the Logic Lab home page: https://shenlanguage.org/Logiclab.html. You can also access these talks at the following addresses. History of the Logic Lab: https://www.youtube.com/watch?v=VqnVWLFiEII

The Logic Lab page also includes a download link for the program. This application needs the Shen standard library. There is a link on the Logic Lab home page to a 'batteries included' binary version of Shen under Windows (SBCL platform) with the standard library compiled inside it. As my first talk makes clear, this is a reissue of an old technology, published by me in the 90s, which has resided in the CMU repository for Artificial Intelligence for nearly 30 years and has been reissued under Shen.

Mark

By Mark Tarver at 2023-04-17 12:14
LinearLisp: a proof of concept in full linearization for AST-walking interpreters

Or, how to walk an AST with (almost) no references on the heap, and how it pays off: https://github.com/ysharplanguage/LinearLisp

Implementation: C# / .NET 6. Comes with basic microbenchmarks against some contenders: Microsoft's PowerShell, Microsoft's JScript, and Python 3.10+.

Rewrite.js – a minimalist s-expr based term rewriting system

Rewrite.js is believed to be a Turing-complete, s-expression based term rewriting system. It may be used as a curiosity computing platform, formula application system, proof checker, problem solver, and pretty much anywhere any kind of computation is required, as long as slower performance on intensive computations doesn't exceed the limits of user patience.

Rewrite.js is built around only one kind of rule: substitution rules. Being so minimalist, the complete rewrite.js implementation fits in a bit more than 400 lines of JavaScript. To get a feel for what rewrite.js looks like, this is a math-expression rewriting example (a Rust sketch of the same substitution-rule idea follows the JIT post below):

    (
        (
            REWRITE
            (MATCH (VAR <a>) (RULE (READ <a> + <a>) (WRITE 2 * <a>)))
            (MATCH (VAR <a>) (RULE (READ <a> * <a>) (WRITE <a> ^ 2)))
        )
        (x + x) * (x + x)
    )

The above example evaluates to:

    ((2 * x) ^ 2)

Rewrite.js is hosted at https://github.com/contrast-zone/rewrite.js with a convenient online playground. Aside from criticism, I'm particularly interested in possible rewrite.js usage ideas.

JIT: it's complimicated

What Exactly do we Mean by JIT Warmup? From 2016, but still interesting, I think. Weird results: many benchmarks don't warm up under the classic model. New goal: try to understand why we see "weird" results.

By raould at 2022-12-16 01:07
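Relating to the rewrite.js post above, here is the promised sketch of substitution-rule rewriting, written in Rust rather than JavaScript. It is not the rewrite.js implementation, and unlike rewrite.js it applies a rule only once, at the top level of a term; the names and term representation are my own. The core idea is simply: match a pattern containing variables against a term, collect bindings, and instantiate the rule's right-hand side.

    use std::collections::HashMap;

    #[derive(Clone, Debug, PartialEq)]
    enum Term {
        Sym(String),     // atoms such as "x", "+", "2"
        Var(String),     // pattern variables such as <a>
        List(Vec<Term>), // nested lists
    }

    type Bindings = HashMap<String, Term>;

    // Try to match `pattern` against `term`, accumulating variable bindings.
    fn unify(pattern: &Term, term: &Term, env: &mut Bindings) -> bool {
        match (pattern, term) {
            (Term::Var(v), _) => match env.get(v) {
                Some(bound) => bound == term,
                None => {
                    env.insert(v.clone(), term.clone());
                    true
                }
            },
            (Term::Sym(a), Term::Sym(b)) => a == b,
            (Term::List(ps), Term::List(ts)) => {
                if ps.len() != ts.len() {
                    return false;
                }
                for (p, t) in ps.iter().zip(ts) {
                    if !unify(p, t, env) {
                        return false;
                    }
                }
                true
            }
            _ => false,
        }
    }

    // Replace variables in `template` with their bound values.
    fn instantiate(template: &Term, env: &Bindings) -> Term {
        match template {
            Term::Var(v) => env.get(v).cloned().unwrap_or_else(|| template.clone()),
            Term::List(ts) => Term::List(ts.iter().map(|t| instantiate(t, env)).collect()),
            other => other.clone(),
        }
    }

    // One top-level rewrite step: apply the first rule that matches.
    fn rewrite(rules: &[(Term, Term)], term: &Term) -> Option<Term> {
        for (lhs, rhs) in rules {
            let mut env = Bindings::new();
            if unify(lhs, term, &mut env) {
                return Some(instantiate(rhs, &env));
            }
        }
        None
    }

    fn main() {
        use Term::*;
        let sym = |s: &str| Sym(s.to_string());
        // RULE (READ <a> + <a>) (WRITE 2 * <a>)
        let rule = (
            List(vec![Var("a".into()), sym("+"), Var("a".into())]),
            List(vec![sym("2"), sym("*"), Var("a".into())]),
        );
        // Rewrite (x + x) into (2 * x).
        let input = List(vec![sym("x"), sym("+"), sym("x")]);
        println!("{:?}", rewrite(&[rule], &input));
    }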
Pico Scheme (A purely functional subset of Scheme)

The Pico Scheme specification just came out. It is a purely functional subset of Scheme (R7RS-small) that I edited.

By jrincayc at 2022-12-09 22:49
Deallocation patterns and linear types (e.g. Rust)

I've been [much] belatedly poking at Rust - particularly the move/borrow model - and was then thinking about implications for allocation and deallocation performance. Though I haven't looked seriously, a couple of things occurred to me that I haven't seen people talking about. Nothing in any way profound here, but perhaps worth considering for those of you designing or implementing new languages, and it appears to me that this applies to many languages using linear types. Rust's reference-counted pointers fall outside of what I'm going to describe.

I was pondering lifespans in Rust, thinking that there are probably cases where deallocations in Rust occur later (for naive lifespan reasons) than we might like - at least according to the simple description of lifespans in the specification, which is conservative. Also, the rules for object deallocation have the consequence that the cost of function return becomes hard to understand, because it is difficult for a programmer to have good intuitions about the deallocation overheads associated with the return. This got me to pondering deallocation and heap fragmentation, to try to understand whether that's an actual problem, and then to wonder how many of Rust's allocations can be stack allocated and deallocated in the style of
If I have this right, region deallocation for statically sized objects is reduced to running destructors (in Rust: Drop). There are practical concerns - many operating systems make naive stack overflow tests that would need to be worked around - but that's neither new nor hard. The handling of "uncertain death" regions can be improved: even if we don't know the exact order of region death, we can generally determine some place in the region exit ordering where we know, conservatively, that a region has become unreachable. It seems to me that we end up with three kinds of allocation areas: the stack, regions, and the general heap.
The absence of reference counting (or analogous general object heap allocation mechanisms) in a program tells us whether we need to bind the collector into the binary. For anybody who's thought carefully about implementing linear types, I suspect all of this is pretty obvious. I hadn't gotten quite that far in my thinking about heap management in BitC, though I had certainly been thinking about compiler-driven stack allocation.
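To make the drop-timing point above concrete, here is a small, ordinary Rust example (nothing here is the region machinery speculated about above): values are dropped when their owner goes out of scope or is moved away, so the work performed "at function return" depends on what is still owned at that point.

    struct Noisy(&'static str);

    impl Drop for Noisy {
        fn drop(&mut self) {
            println!("dropping {}", self.0);
        }
    }

    fn consume(n: Noisy) {
        // `n` is moved in, so it is dropped here, not in the caller.
        println!("consuming {}", n.0);
    }

    fn main() {
        let a = Noisy("a"); // would normally live to the end of main
        {
            let b = Noisy("b"); // dropped at the end of this block
            println!("inner block done, {} still alive", b.0);
        } // prints "dropping b"
        let c = Noisy("c");
        consume(c); // prints "consuming c" then "dropping c"
        drop(a);    // explicit early drop: prints "dropping a"
        println!("end of main");
    }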