> The best approach is to write your definitions the way you like, mostly ignori...

> The best approach is to write your definitions the way you like, mostly ignoring speed and cycles and byte count. This should result in clean, elegant definitions. This first step reveals some short words like your @+1 that could be good candidates for an implementation in assembler.

Yikes! When I get into debates about this, writing part of the program in assembly usually comes up eventually. This is a good example of how inefficient the system is when you have to write assembly to do what would be *ptr++ in C. Another example is something like "swap 5 + swap" which burns all kinds of cycles. Someone recommended I rewrite this in assembly as well although it would only be something like "x+=5" in C. The absurdity of that speaks for itself.

>The Python basis for this port needs some optimizations to begin with. If I am not mistaken, all tiles are square. No need for a t_height or a t_width. No need for both in the tiles structure either.

Sure, but storing those as one value instead of two would also slightly speed up the C and assembly versions. Forth would still be just as slow and inefficient compared to the other languages if you made that change.

> The C port of the Python program can be improved. Tiles are defined, then there's a big array of pointers to those tiles. This is necessary because the tile data does not have a constant length, but it seems to me preferable to have an array of structures defining the geometry and colors of the tiles, and a pointer to the pixel data. The size is the same, just a different layout, but I expect that delaying the indirection from fetching the tile to fetching the data of the tile to be more convenient for the code.

No, the color data varies as well since some tiles have no color pairs and the rest have a varying number. Also, different color pairs can be applied to the same tile depending on the situation, so there is no 1:1 correspondence that would make your suggestion make sense here. In any case, adding one level of indirection to save memory and keep the system organized by passing an 8-bit index instead of a 16-bit pointer adds a few dozen cycles to a function that takes around 20,000 to draw the tile. Do you mean that if the C and Forth versions were reorganized, there wouldn't be such a large speed disparity between the two? That is highly doubtful.

>I am puzzled to hear this complain from someone who can program in assembly. Forth and assembly are both un-typed, "no declaration needed" languages. What makes those properties acceptable in asm but not in Forth? Your expectations of Forth being a higher level language maybe?

The original criticism is that you can't tell what anything takes or returns. If you look at the assembly, you'll see that the function calls there are just as clear as in C since we have named arguments and can instantly see what happens with the return value after the subroutine call. It's very telling that it's a lot easier to implement a usable scheme for local variables in assembly, primitive as it is, while Forth still lacks this.

> Sorry but no. I have no motivation to do that - I dislike coding challenges and prefer to code things that are actually useful for me - and nothing to gain from it.

Fair enough. How about an existing example you could point to then? There are a lot of unbelievable claims about the performance of Forth compared to C and assembly, but no one seems to be able to prove any of this. That's one of the main motivations for doing this project.