Hacker News | daanx's comments

> This idea is so simple, so crazy, so stupid, and works so well, but I never thought of it. Bravo to the authors.

Thanks for the nice summary -- looking forward to reading the paper!

The same idea of self-tagging is actually also used in the Koka language [1] runtime system, where by default the Koka compiler only heap-allocates float64s when their absolute value is outside the range [2e-511,2e512) and not 0, infinity, or NaN (see [2]). This indeed saves many (many!) heap allocations for float-intensive programs.

Since Koka only uses 1 bit to distinguish pointers from values, another, slightly faster, option is to box only negative float64s -- but of course, negative numbers are still quite common, so that saves fewer allocations in general.
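For intuition, here is a rough Python sketch of such an exponent-range check. The biased-exponent cutoffs 512..1534 are my own reading of the [2^-511, 2^512) range above, not taken from Koka's box.c:

```python
import math
import struct

def needs_boxing(x: float) -> bool:
    """Sketch: decide whether a float64 must be heap-allocated (boxed)
    under a self-tagging scheme. Cutoffs are illustrative, not Koka's."""
    # 0, infinity, and NaN are handled specially and stay unboxed.
    if x == 0.0 or math.isinf(x) or math.isnan(x):
        return False
    bits = struct.unpack('<Q', struct.pack('<d', x))[0]
    exp = (bits >> 52) & 0x7FF  # 11-bit biased exponent (bias 1023)
    # Unboxed when |x| is in [2^-511, 2^512), i.e. biased exponent 512..1534.
    return not (512 <= exp <= 1534)

assert not needs_boxing(3.14)   # typical floats stay unboxed
assert needs_boxing(1e-200)     # tiny magnitudes must be boxed
```

Most floats that occur in practice fall in the unboxed range, which is why the trick pays off.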

[1] https://koka-lang.github.io/koka/doc/book.html#sec-value-typ...

[2] https://github.com/koka-lang/koka/blob/dev/kklib/src/box.c#L...

ps. If you enjoy reading about tagging, I recently wrote a note on efficiently supporting seamless large integer arithmetic (as used in Koka as well) and discuss how certain hardware instructions could really help to speed this up [3]:

[3] https://www.microsoft.com/en-us/research/uploads/prod/2022/0... (ML workshop 2022)


That's a nice overview of Hindley-Milner in practice!

For those interested, I have recently been thinking about a better way to specify type inference with principal derivations that lends itself better to type-system extensions:

https://www.microsoft.com/en-us/research/uploads/prod/2024/0...

Still a bit preliminary but hopefully fun to read :-)


The man, the myth, the legend! OK, so for extensible records: I don't have a good intuition for why the record objects would have to keep the shadowed (hidden/duplicate) fields around at run-time. Do you have a motivating example for that? (Also entirely possible I missed it in the paper.)

(I'll check out the new paper later - thank you for the link)


Haha, thank you for your kind reply :) I really enjoyed the blog post, as it shows nicely that implementing HM can be straightforward -- many papers on inference are quite math-heavy, which can make for a tough read.

About duplicate labels: one needs to retain the duplicate field at runtime _if_ there is a `remove_l` or `mask_l` operation that drops a field `l`. For example, `{x=2,x=True}.remove_x.x == True` (where the type of `remove_l` is `{l:a|r} -> {r}`).

This comes up with effect systems where we could have two exception handlers in scope, and the current effect would be `<exn,exn>` (which corresponds to a runtime evidence vector `evn` of `[exn:h1,exn:h2]`, where `h1` and `h2` point to the runtime exception handlers). If a user raises an exception, it'll select `evn.exn` and raise to `h1`. But a user can also "mask" the inner exception handler and raise directly to `h2`, as `evn.mask_exn.exn`.

One could, though, design a system with different primitives and no remove or mask operation, such that the duplicate fields do not have to be retained at runtime (I think).
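To make the runtime story concrete, here is a tiny Python sketch (not Koka's actual representation) of scoped-label records as ordered (label, value) pairs, where removing a field reveals the shadowed duplicate underneath:

```python
class Row:
    """Sketch: a scoped-label record. Lookup returns the first
    (outermost) binding; remove drops only that one binding."""

    def __init__(self, fields):
        self.fields = list(fields)  # e.g. [("x", 2), ("x", True)]

    def get(self, label):
        for l, v in self.fields:
            if l == label:
                return v
        raise KeyError(label)

    def remove(self, label):
        out, dropped = [], False
        for l, v in self.fields:
            if l == label and not dropped:
                dropped = True
                continue
            out.append((l, v))
        return Row(out)

r = Row([("x", 2), ("x", True)])
assert r.get("x") == 2
assert r.remove("x").get("x") is True  # the shadowed field is retained
```

This is exactly why the duplicate must survive at runtime: `remove` needs something to uncover.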

(Anyway, feel free to contact me if you'd like to discuss this more.)


And Andrew Kennedy gave a very nice lecture on this recently as part of Xavier Leroy's seminar on control structures:

https://www.college-de-france.fr/fr/agenda/seminaire/structu...


Yes -- I think historically the power of condition handling was not well understood and algebraic effect handlers were a "rediscovery" coming from well-studied category theory (Plotkin, Power, and Pretnar).

If you want to play with "structurally typed condition handling", then the Koka language has "row-typed algebraic effect handlers" that compile to C: <http://koka-lang.org>


The modern alternative to call/cc and shift/reset is to use effect handlers instead [1].

If you are interested in this, Ningning Xie and I give a tutorial about effect handlers (and more) at ICFP'21 tomorrow (Thursday 12:30 EST): <https://icfp21.sigplan.org/details/icfp-2021-tutorials/5/Pro...>

[1] <https://koka-lang.github.io/koka/doc/book.html>


I'm not convinced that effect handlers are anything more than (at most) a different name for delimited continuations.


I like the characterization that Andrej Bauer uses: "while relates to goto, as effect handlers to shift/reset" :-)

That is, you indeed need delimited continuations to implement effect handlers, but they have more structure than, say, shift/reset. In particular, instead of being able to just yield (shift) with any function to an innermost context (reset), you can only perform a certain set of operations that are handled by a function defined in the handler.

This gives various advantages, in particular,

- you can give simple types to effect handlers (while shift/reset, for example, needs an answer-type system), and you can use fine-grained effect typing (instead of just a "shift" effect, you have "<exn,console>" effects)

- you can do better reasoning as the set of operations is restricted

- and (as a consequence) there is more opportunity to optimize them; in particular, tail-resumptive operations can be implemented to execute "in-place" without capturing the continuation

- finally, different effect handlers are compositional and can be combined freely (unlike shift/reset: shift always shifts to the innermost reset, so we cannot compose arbitrarily with multiple resets).
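As a (very loose) illustration of that extra structure, here is a Python sketch of a deep handler for a fixed set of named operations, using generators as one-shot continuations. The `handle` driver, `program`, and the `("ask", ...)` protocol are all made up for this sketch:

```python
def handle(gen, handlers):
    """Drive `gen`, interpreting yielded (op, arg) pairs via `handlers`.
    The generator's suspended state plays the role of the continuation."""
    try:
        op, arg = next(gen)
        while True:
            result = handlers[op](arg)   # the operation clause
            op, arg = gen.send(result)   # resume the continuation
    except StopIteration as stop:
        return stop.value

def program():
    x = yield ("ask", None)
    y = yield ("ask", None)
    return x + y

print(handle(program(), {"ask": lambda _: 21}))  # prints 42
```

Note how a tail-resumptive clause like `ask` never needs the rest of the computation reified as a value -- the handler just computes a result and resumes in place, which is precisely what makes that optimization possible.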


They are much more structured than mere delimitation. That's the whole point


Interesting! Can one have effect handlers in an untyped language such as Scheme?


Yes! A nice aspect of effect handlers is that they have an untyped dynamic semantics.

Having said that, tracking effect types can be very beneficial, especially if you do interesting control flow like async/await for example.


Just to add to this: Koka can (obviously :-)) not always determine whether a function will terminate, so it generally adds a `div` effect whenever there is a possibility of infinite recursion.

However, since most data types in Koka are inductive, recursion over such inductive data types is still inferred to be terminating.

In practice, it looks like about 70% of a typical program can usually be `total`, with 20% being `pure` (which is exceptions + divergence as in Haskell), and the final 10% being arbitrary side effects in the `io` effect.


These numbers are very useful to me even as a Python programmer, as they (somewhat) line up with my intuitions about how much of our programs can be expected to be written as (Python-)pure transforms vs. imperative shells. It's cool to think about a language being able to tell me this directly.


Ah, I think Stephen meant to write the following:

  fun add-refs( a : ref<h,int>, b : ref<h,int> ) : st<h> int {
    a := 10
    b := 20
    (!a + !b)
  }
where indeed the effect is `st<h>` as the updates are observable. As the function was written before, the two arguments used a "rank-2" polymorphic type and the heaps were fully abstract -- in that case the updates would be unobservable, but you cannot create such values :-)


(Daan here, creator of [Koka](https://github.com/koka-lang/koka).)

This is an interesting point and comes down to the question -- what is an effect really? I argue that effect types tell you the type signature of the mathematical function that models your program (the denotational semantics). For example, the function

  fun sqr(x : int) : total int { x*x }
has no effect at all. The math function that gives semantics to `sqr` would have a type signature that can be directly derived from the effect type:

  [[int -> total int]]  = Z -> Z
(with Z the set of integers). Now, for a function that raises an exception, it would be:

  [[int -> exn int]] = Z -> (Z + 1)
That is, either an integer (Z), or (+), a unit value (1) if an exception is raised. Similarly, a function that modifies a global heap h, would look like:

  [[int -> st<h> int]] =  (H,Z) -> (H,Z)
that is, it takes an initial heap H (besides the integer) and returns an updated heap with the result.

Now, non-termination as an effect makes sense: a function like "turing-machine" may diverge, so:

  [[int -> div int]] = Z -> Z_\bot
That is, its mathematical result is the set of integers (Z) together with an injected bottom value that represents non-termination. (Note: we don't use "Z + \bot" here since we cannot distinguish a function that has not yet terminated from one that never will, in contrast to exceptions.)

In a language like Haskell every value may diverge or raise an exception -- that is, a value `x :: Int` really has type `Int_\bot`, so we cannot, for example, replace `x*0` with `0` in Haskell.
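The `x*0` point shows up in any setting where "values" can carry hidden effects; a small Python illustration (all names are mine, with an exception standing in for a non-total value):

```python
def x() -> int:
    raise RuntimeError("stands in for divergence / an exception")

def f() -> int:
    return x() * 0      # forces x, so this raises...

def f_optimized() -> int:
    return 0            # ...while the "optimized" version returns 0

# The rewrite f -> f_optimized changes observable behavior:
try:
    f()
    observed = "returned"
except RuntimeError:
    observed = "raised"

assert observed == "raised"
assert f_optimized() == 0
```

With a `total` effect type on `x`, the compiler would know the rewrite is safe.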

Note that in almost all other languages the semantic function is very complex, with a global heap etc.: any function `[[int -> int]]` becomes something like `(H,Z_32) -> (H,(Z_32 + 1))_\bot`. This is the essence of why it is much harder for compilers and humans to reason about such functions (and why effect types can really help both programmers and compilers do effective local reasoning).


I'm wondering if you've found it useful in practice to distinguish between total and possibly-diverging functions?

It seems like it's the sort of thing that's useful in something like Agda, where you use the existence of a function (without running it) to prove that its result exists. (The type is inhabited.) Or so I've read; I haven't used it.

But if you're going to run the program, you typically want to know if a function will return promptly, and a total function could still spin for a million years calculating something, in a way that's indistinguishable in practice from diverging.


In practice, I have not (yet) found many great use cases for the distinction in Koka. It is nice to have `total` functions, but `pure` (exceptions + divergence) is still a good thing (and what Haskell gives you). And like you say, in practice we can easily have functions that just take a long, long time to compute.

Still, it is a good extra check, and I can see more use for the `div` effect in future verification tools, where total functions can be used as predicates (but non-terminating ones cannot).


Good to know. At compile or verification time, it seems like you could get the same problem again, though? We want our compiles to finish promptly, and a slow, total predicate could hang.

Which is to say, a possibly-diverging function seems okay to use even by a verification tool, so long as it finishes quickly in practice. Having some notion of analysis-time-safe predicates seems useful but it doesn't seem to map cleanly to being able to prove termination?


I have found totality checks in languages like Idris helpful, but really only for catching small mistakes that I've made (for example, recursing on the original argument).


If your language has equality proofs, for example `a = Int`, then, if the language is non-total, I can prove anything by divergence. So I can "prove" `Int = String`. This is still type-safe as long as you don't erase the proof, because you can never use the invalid proof: your program will diverge first.

So if you want to be type-safe and still be able to erase proofs, your proofs have to be total.


Interesting article -- thanks! I see they used the `key ^ P` encoding where the `key` is `L >> PAGESHIFT` to use the ASLR randomized bits from the free list position `L`.

In [mimalloc](https://github.com/microsoft/mimalloc) we use a similar strategy to protect the free list in secure mode. However, there are some weaknesses in using a plain _xor_: if the attacker can guess `P`, that reveals the key immediately (`P ^ key ^ P == key`); and even with just a read overflow, if the attacker can read multiple encodings, they can xor these with each other, say `(P1^key) ^ (P2^key)`, giving `(P1^P2)`, which may reveal information about the pointers (like alignment).

What we use in _mimalloc_ instead is two keys, `key1` and `key2`, and we encode as `((P^key2)<<<key1)+key1`. Since these operations are not associative, the above attacks no longer work so well, even if `P` can be guesstimated. For example, in the read-overflow case we can subtract two entries to discard the `+key1` term, but that leads to `((P1^key2)<<<key1) - ((P2^key2)<<<key1)` at best. (We include the left-rotation since xor and addition are otherwise linear in the lowest bit.)
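For concreteness, a small Python model of that two-key encoding, with 64-bit wraparound arithmetic and the rotate amount taken mod 64 (the keys below are arbitrary stand-ins, not how mimalloc derives them):

```python
MASK = (1 << 64) - 1  # model 64-bit machine words

def rotl(x: int, r: int) -> int:
    r &= 63
    return ((x << r) | (x >> (64 - r))) & MASK

def rotr(x: int, r: int) -> int:
    return rotl(x, 64 - (r & 63))

def encode(p: int, key1: int, key2: int) -> int:
    # ((P ^ key2) <<< key1) + key1
    return (rotl(p ^ key2, key1) + key1) & MASK

def decode(c: int, key1: int, key2: int) -> int:
    # Invert each step in reverse order.
    return rotr((c - key1) & MASK, key1) ^ key2

k1, k2 = 0x243F6A8885A308D3, 0x13198A2E03707344  # stand-in random keys
p = 0x00007F3ADEADB000
assert decode(encode(p, k1, k2), k1, k2) == p
```

Subtracting two encodings here only cancels the `+key1` term; the per-entry xor-and-rotate still masks the pointer bits.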

Just some thoughts. Of course, this may be too much for the use-case. However, we found only a little extra overhead for the extra operations (as most programs are dominated by memory access), so it may be of benefit.


Looks like a fun project, very nice :-)

For those interested in more of the beautiful theory on extensible row types, this project seems to be based on an earlier paper I wrote on scoped labels: https://www.microsoft.com/en-us/research/wp-content/uploads/...


Thanks for the reference. Expresso doesn't support recursive records. Does the system in your paper allow for those? Is there anything particularly tricky about supporting them?


Hi, author here, yes your paper was a huge inspiration and I really should add a references section in the readme. The row types in Expresso use very simple lacks constraints, similar to the old Hugs TREX system, and don't permit duplicate/scoped labels. I felt this was also a pretty good sweet spot.


Thanks :-) TREX (by Mark Jones and Benedict Gaster) is great. The beauty is that a lacks constraint can be translated to a field offset at runtime, making it super efficient!

E.g. a type like:

  foo :: r/x => { x :: int | r } -> int
  foo r = r.x

Gets translated at runtime to a function where the lacks constraint r/x becomes an actual offset parameter.
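A hypothetical Python sketch of that translation: records are label-sorted tuples, and the evidence for the lacks constraint `r/x` is just the index at which `x` sits, passed as an ordinary extra argument (all names here are mine):

```python
def insert(rec: tuple, offset: int, value) -> tuple:
    """Insert a field at its label-sorted position."""
    return rec[:offset] + (value,) + rec[offset:]

def select(rec: tuple, offset: int):
    """Select the field at a known offset: a single indexing operation."""
    return rec[offset]

# foo :: r/x => { x :: int | r } -> int ; foo r = r.x
# The lacks constraint becomes the extra `x_offset` parameter.
def foo(rec: tuple, x_offset: int) -> int:
    return select(rec, x_offset)

# { x = 42, y = 1, z = 3 } with labels sorted x < y < z: x sits at offset 0.
rec = (42, 1, 3)
assert foo(rec, 0) == 42
```

Field access compiles to a plain array index -- no label search at runtime, which is what makes the TREX scheme so efficient.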

