> This idea is so simple, so crazy, so stupid, and works so well, but I never thought of it. Bravo to the authors.
Thanks for the nice summary -- looking forward to reading the paper!
The same idea of self-tagging is actually also used in the Koka language [1] runtime system, where by default the Koka compiler only heap allocates float64's when their absolute value is outside the range [2^-511, 2^512) and not 0, infinity, or NaN (see [2]). This indeed saves many (many!) heap allocations for float-intensive programs.
Since Koka only uses 1 bit to distinguish pointers from values, another slightly faster option is to only box negative float64's, but of course negative numbers are still quite common, so it saves fewer allocations in general.
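To make the boxing decision concrete, here is a small Python sketch of the heuristic as I read it above (my own rendering, not Koka's actual C implementation): a float64 stays unboxed when its absolute value lies in [2^-511, 2^512), or when it is 0, infinity, or NaN, which get a dedicated encoding.

```python
import math

def needs_heap_allocation(x: float) -> bool:
    # Sketch of the self-tagging range check described above.
    if x == 0.0 or math.isinf(x) or math.isnan(x):
        return False  # special values can always be encoded unboxed
    # Unboxed only when the absolute value is in [2^-511, 2^512).
    return not (2.0**-511 <= abs(x) < 2.0**512)

# "Ordinary" magnitudes stay unboxed; extreme exponents get boxed.
print(needs_heap_allocation(3.14))    # False
print(needs_heap_allocation(1e200))   # True
print(needs_heap_allocation(1e-200))  # True
```

The range corresponds to the high exponent bits of the IEEE-754 representation, which is what lets the runtime steal a tag bit without misclassifying common values.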
ps. If you enjoy reading about tagging, I recently wrote a note on efficiently supporting seamless large integer arithmetic (as used in Koka as well) that discusses how certain hardware instructions could really help to speed this up [3]:
That's a nice overview of Hindley-Milner in practice!
For those interested, I have recently been thinking about a better way to specify type inference with principal derivations that lends itself better to type system extensions:
The man, the myth, the legend! Ok, so for extensible records: I don't have a good intuition for why the record objects would have to keep around the shadowed (hidden/duplicate) fields at run-time. Do you have a motivating example for that? Also, it's entirely possible I missed it in the paper.
(I'll check out the new paper later - thank you for the link)
Haha, thank you for your kind reply :) I really enjoyed the blog post as it shows nicely that implementing HM can be straightforward -- many papers on inference are quite math-heavy, which can be a tough read.
About duplicate labels: one needs to retain the duplicate field at runtime _if_ there is a `remove_l` or `mask_l` operation that drops a field `l`. For example, `{x=2,x=True}.remove_x.x == True` (where the type of `remove_l` is `{l:a|r} -> {r}`).
This comes up with effect systems where we could have 2 exception handlers in scope, and the current effect would be `<exn,exn>` (which corresponds to a runtime evidence vector `evn` of `[exn:h1,exn:h2]`, where h1 and h2 point to the runtime exception handlers). If a user raises an exception it'll select `evn.exn` and raise to `h1`. But a user can also "mask" the inner exception handler and raise directly to `h2`, via `evn.mask_exn.exn`.
One could design a system though with different primitives and not have a remove or mask operation, such that the duplicate fields do not have to be retained at runtime (I think).
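A toy model of the scoped-labels behavior described above (my own simulation, not Koka's representation): fields are kept in order, selection returns the first match, and removing a label drops only the innermost occurrence, revealing a shadowed field -- which is exactly why the duplicate must be retained at runtime.

```python
class Row:
    """Record with ordered (label, value) pairs; duplicates allowed."""
    def __init__(self, fields):
        self.fields = list(fields)

    def select(self, label):
        # Selection always picks the first (innermost) occurrence.
        for l, v in self.fields:
            if l == label:
                return v
        raise KeyError(label)

    def remove(self, label):
        # Drop only the first occurrence, revealing any shadowed field.
        out, removed = [], False
        for l, v in self.fields:
            if l == label and not removed:
                removed = True
            else:
                out.append((l, v))
        return Row(out)

r = Row([("x", 2), ("x", True)])
print(r.select("x"))               # 2: the outer field shadows the inner one
print(r.remove("x").select("x"))   # True: {x=2,x=True}.remove_x.x == True

# The same mechanism models the evidence vector for <exn,exn>:
ev = Row([("exn", "h1"), ("exn", "h2")])
print(ev.select("exn"))                # h1: raise goes to the innermost handler
print(ev.remove("exn").select("exn"))  # h2: masking skips the inner handler
```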
(Anyway, feel free to contact me if you'll like to discuss this more)
Yes -- I think historically the power of condition handling was not well understood and algebraic effect handlers were a "rediscovery" coming from well-studied category theory (Plotkin, Power, and Pretnar).
If you want to play with "structurally typed condition handling", then the Koka language has "row-typed algebraic effect handlers" that compile to C: <http://koka-lang.org>
I like the characterization that Andrej Bauer uses: "while relates to goto as effect handlers relate to shift/reset" :-)
That is, you indeed need delimited continuations to implement effect handlers, but they have more structure than, say, shift/reset. In particular, instead of being able to just yield (shift) with any function to an innermost context (reset), you can only perform a certain set of operations that are handled by a function defined in the handler.
This gives various advantages, in particular,
- you can give simple types to effect handlers (while shift/reset for example needs an answer type system), and you can use fine grained effect typing (instead of just a "shift" effect, you have "<exn,console>" effects)
- you can do better reasoning as the set of operations is restricted
- and (as a consequence) you have more opportunity to optimize them. In particular, tail-resumptive operations can be implemented to execute "in-place" without capturing the continuation, for example.
- finally, different effect handlers are compositional and can be combined freely (unlike shift/reset; for example, shift always shifts to the innermost reset, so we cannot arbitrarily compose with multiple resets).
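The tail-resumptive optimization mentioned above can be sketched in a few lines of Python (names and structure are mine, not Koka's): when an operation clause always resumes exactly once in tail position, performing it is just a dynamically scoped call into the innermost handler for that effect -- no continuation needs to be captured.

```python
handler_stack = []  # dynamically scoped stack of (effect, clauses)

def with_handler(effect, clauses, body):
    """Install a handler for `effect` for the duration of `body`."""
    handler_stack.append((effect, clauses))
    try:
        return body()
    finally:
        handler_stack.pop()

def perform(effect, op, arg):
    # Tail-resumptive case: run the clause "in place" and return its
    # result directly, instead of capturing a delimited continuation.
    for eff, clauses in reversed(handler_stack):
        if eff == effect:
            return clauses[op](arg)
    raise RuntimeError(f"unhandled effect: {effect}")

# A "reader"-style effect whose `ask` operation is tail-resumptive:
result = with_handler("reader", {"ask": lambda _: 42},
                      lambda: perform("reader", "ask", None) + 1)
print(result)  # 43
```

General handlers (ones that store or discard the resumption) do need continuation capture; the point is that the handler's static structure tells the compiler when this cheap path is sufficient.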
Just to add to this: Koka can (obviously :-)) not always determine whether a function will terminate, so it generally adds a `div` effect whenever there is the possibility of infinite recursion.
However, since most data types in Koka are inductive, any recursion over such inductive data types is still inferred to be terminating.
In practice, it looks like about 70% of a typical program can usually be `total`, with 20% being `pure` (which is exceptions + divergence as in Haskell), and the final 10% being arbitrary side effects in the `io` effect.
these numbers are very useful to me even as a Python programmer, as they (somewhat) line up with my intuitions about how much of our programs can be expected to be written as (Python-) pure transforms, vs imperative shells. It's cool to think about a language being able to tell me this directly.
fun add-refs( a : ref<h,int>, b : ref<h,int> ) : st<h> int {
  a := 10
  b := 20
  (!a + !b)
}
where indeed the effect is `st<h>` as the updates are observable. As the function was written before, the two arguments used a "rank-2" polymorphic type and the heaps were fully abstract -- in that case the updates would be unobservable, but you cannot create such values :-)
This is an interesting point and comes down to the question -- what is an effect really? I argue that effect types tell you the type signature of the mathematical function that models your program (the denotational semantics). For example, the function
fun sqr(x : int) : total int { x*x }
has no effect at all. The math function that gives semantics to `sqr` would have a type signature that can be directly
derived from the effect type:
[[int -> total int]] = Z -> Z
(with Z the set of integers). Now, for a function that raises an exception, it would be:
[[int -> exn int]] = Z -> (Z + 1)
That is, either an integer (Z), or (+), a unit value (1) if an exception is raised. Similarly, a function that modifies a global heap h, would look like:
[[int -> st<h> int]] = (H,Z) -> (H,Z)
that is, it takes an initial heap H (besides the integer) and returns an updated heap with the result.
Now, non-termination as an effect makes sense: a function
like "turing-machine" may diverge, so:
[[int -> div int]] = Z -> Z_\bot
That is, its mathematical result is the set of integers (Z) together with an injected bottom value that represents non-termination. (Note: we don't use "Z + \bot" here since we cannot observe whether a function terminates or not, in contrast to exceptions.)
In a language like Haskell, every value may diverge or cause an exception -- that is, a value `x :: Int` really has type `Int_\bot`, and we cannot, for example, replace `x*0` with `0` in Haskell.
Note that in almost all other languages the semantic function is very complex, with a global heap etc.: any function `[[int -> int]]` becomes something like `(H,Z_32) -> (H,(Z_32 + 1))_\bot`. This is the essence of why it is much harder for compilers and humans to reason about such functions (and why effect types can really help both programmers and compilers to do effective local reasoning).
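The correspondences above can be rendered as executable Python signatures (a toy model of the denotations, not a formal semantics): an `exn` function returns `None` for the extra unit value in `Z + 1`, and an `st<h>` function threads an explicit heap. The example functions are mine, chosen only to inhabit each type.

```python
from typing import Dict, Optional, Tuple

Heap = Dict[str, int]

def sqr(x: int) -> int:
    # [[int -> total int]] = Z -> Z
    return x * x

def recip100(x: int) -> Optional[int]:
    # [[int -> exn int]] = Z -> (Z + 1); None plays the role of the unit.
    return None if x == 0 else 100 // x

def incr(h: Heap, x: int) -> Tuple[Heap, int]:
    # [[int -> st<h> int]] = (H,Z) -> (H,Z); the heap is threaded explicitly.
    h2 = dict(h, counter=h.get("counter", 0) + 1)
    return h2, x + h2["counter"]

print(sqr(4))        # 16
print(recip100(0))   # None
print(incr({}, 10))  # ({'counter': 1}, 11)
```

The `div` case has no faithful executable rendering, which is exactly the point made above: bottom is not an observable value, unlike the unit injected for exceptions.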
I'm wondering if you've found it useful in practice to distinguish between total and possibly-diverging functions?
It seems like it's the sort of thing that's useful in something like Agda, where you use the existence of a function (without running it) to prove that its result exists. (The type is inhabited.) Or so I've read; I haven't used it.
But if you're going to run the program, you typically want to know if a function will return promptly, and a total function could still spin for a million years calculating something, in a way that's indistinguishable in practice from diverging.
In practice, I have not (yet) found many great use cases for the distinction in Koka. It is nice to have "total" functions, but "pure" (exceptions+divergence) is still a good thing (and what Haskell gives you). And like you say, in practice we can easily have functions that just take a long long time to compute.
Still, it is a good extra check and I can see more use for the `div` effect in future verification tools, where total functions can be used as predicates (but non-terminating ones cannot).
Good to know. At compile or verification time, it seems like you could get the same problem again, though? We want our compiles to finish promptly, and a slow, total predicate could hang.
Which is to say, a possibly-diverging function seems okay to use even by a verification tool, so long as it finishes quickly in practice. Having some notion of analysis-time-safe predicates seems useful but it doesn't seem to map cleanly to being able to prove termination?
I have found total checks in languages like idris helpful, but really only for catching small mistakes that I've made (for example, recursing on the original argument)
If your language has equality proofs, for example `a = Int`, then if your language is non-total I can prove anything by divergence. So I can prove `Int = String`. This is still type-safe as long as you don't erase the proof, because you can never use the invalid proof: your program will diverge first.
So if you want to be type-safe but you still want to be able to erase proofs, your proofs have to be total.
Interesting article -- thanks! I see they used the `key ^ P` encoding where the `key` is `L >> PAGESHIFT` to use the ASLR randomized bits from the free list position `L`.
In [mimalloc](https://github.com/microsoft/mimalloc) we use a similar strategy to protect the free list in secure mode. However, there are some weaknesses associated with using a plain _xor_: if the attacker can guess `P`, that reveals the key immediately (`P^key^P == key`); and even with just a read overflow, if the attacker can read multiple encodings they can xor these with each other, say `(P1^key)^(P2^key)`, and then we have `(P1^P2)`, which may reveal information about the pointers (like alignment).
What we use in _mimalloc_ instead is two keys, `key1` and `key2`, and we encode as `((P^key2)<<<key1)+key1`. Since these operations are not associative, the above approaches no longer work so well even if `P` can be guesstimated. For example, for the read-overflow case we can subtract two entries to discard the `+key1` term, but that leads to `((P1^key2)<<<key1) - ((P2^key2)<<<key1)` at best. (We include the left-rotation since xor and addition are otherwise linear in the lowest bit.)
Just some thoughts. Of course, this may be too much for the use-case. However, we found only a little extra overhead for the extra operations (as most programs are dominated by memory access), so it may be of benefit.
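Both encodings and the xor weakness can be demonstrated in a short Python sketch (parameters and key values here are made up for illustration; see the mimalloc source for the real details). With plain xor, two leaked encodings cancel the key; the rotate-then-add variant still round-trips but breaks that simple algebra.

```python
MASK = (1 << 64) - 1  # model 64-bit machine words

def rotl(x: int, n: int) -> int:
    n %= 64
    return ((x << n) | (x >> (64 - n))) & MASK if n else x

def rotr(x: int, n: int) -> int:
    n %= 64
    return ((x >> n) | (x << (64 - n))) & MASK if n else x

def encode(p: int, key1: int, key2: int) -> int:
    # ((P ^ key2) <<< key1) + key1, as described above
    return (rotl(p ^ key2, key1) + key1) & MASK

def decode(c: int, key1: int, key2: int) -> int:
    # Exact inverse: subtract, rotate right, xor.
    return rotr((c - key1) & MASK, key1) ^ key2

p1, p2 = 0x7F00_0000_1000, 0x7F00_0000_2000
key = 0xDEAD_BEEF_CAFE_F00D

# Weakness of plain xor: the key cancels across two leaked entries,
# exposing P1^P2 (and hence alignment/offset information).
print(hex((p1 ^ key) ^ (p2 ^ key)))  # 0x3000 == hex(p1 ^ p2)

# The two-key rotate+add encoding still decodes correctly:
k1, k2 = 0x1234_5678_9ABC_DEF1, 0x0FED_CBA9_8765_4321
print(decode(encode(p1, k1, k2), k1, k2) == p1)  # True
```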
Thanks for the reference. Expresso doesn't support recursive records. Does the system in your paper allow for those? Is there anything particularly tricky about supporting them?
Hi, author here, yes your paper was a huge inspiration and I really should add a references section in the readme. The row types in Expresso use very simple lacks constraints, similar to the old Hugs TREX system, and don't permit duplicate/scoped labels. I felt this was also a pretty good sweet spot.
Thanks :-)
TREX (by Mark Jones and Benedict Gaster) is great. The beauty is that a lacks constraint can be translated to a field offset at runtime, making it super efficient!
E.g. a type like:

    foo :: r/x => { x :: int | r } -> int
    foo r = r.x

gets translated at runtime to a function where the lacks constraint `r/x` becomes an actual offset parameter.
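A toy rendering of that compilation scheme in Python (my own illustration, not TREX itself): represent a record as a label-sorted tuple of values, and discharge the lacks constraint `r/x` by passing the index at which `x` sits once inserted into its sorted position.

```python
from bisect import bisect_left

def foo(offset_x: int, r: tuple) -> int:
    # Compiled form of  foo :: r/x => { x :: int | r } -> int ; foo r = r.x
    # The lacks-constraint evidence `offset_x` is just an index.
    return r[offset_x]

def evidence(labels: list, new_label: str) -> int:
    # Where `new_label` lands among the sorted existing labels.
    return bisect_left(sorted(labels), new_label)

# The record { a = 1, x = 42, z = 3 }, stored label-sorted as ("a","x","z"):
off = evidence(["a", "z"], "x")   # r = {a,z} lacks x; evidence is offset 1
print(foo(off, (1, 42, 3)))       # 42
```

Field selection is then a constant-time indexed load, with no label comparison at runtime -- which is the efficiency point made above.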
[1] https://koka-lang.github.io/koka/doc/book.html#sec-value-typ...
[2] https://github.com/koka-lang/koka/blob/dev/kklib/src/box.c#L...
[3] https://www.microsoft.com/en-us/research/uploads/prod/2022/0... (ML workshop 2022)