> This brings us to one of the main shortcomings of Forth. If you need to access...

druzyek · on July 13, 2020

Hi astrobe_, interesting points! I disagree with you on a few things:

> Looking at the C version, the argument "tile" is used once in the function, to get the pointer to the tile. The pointer could be passed directly: one less local.

The argument "tile" is an 8-bit index, which is faster to pass than the 16-bit pointer to the tile. More importantly, that index is used for things other than looking up the tile pointer, such as finding color pairs to recolor the tile. In other words, you could pass the 16-bit tile pointer directly, but you'd still need a way to correlate that pointer with its matching color data, and that system would be slower than using the same 8-bit index for both.
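
As a rough illustration of the shared-index idea, here is a minimal C sketch. All table names and values are made up for the example and are not taken from the project's source; it only assumes that pixel data and color pairs are both looked up by the same 8-bit tile index.

```c
#include <stdint.h>

/* Hypothetical pixel data for two tiles. */
static const uint8_t grass_pixels[] = { 0x3C, 0x7E };
static const uint8_t water_pixels[] = { 0xFF, 0x81 };

/* Parallel tables, all keyed by the same 8-bit tile index. */
static const uint8_t *const tile_ptrs[] = { grass_pixels, water_pixels };
static const uint8_t color_pairs[] = { 2, 5, 6, 1 }; /* flat list of pairs */
static const uint8_t pair_first[] = { 0, 0 };  /* first pair of each tile */
static const uint8_t pair_count[] = { 0, 2 };  /* tile 0 has no pairs */

/* One 8-bit argument reaches the pixel data... */
const uint8_t *tile_pixels(uint8_t tile) { return tile_ptrs[tile]; }

/* ...and the color data, with no pointer-to-index reverse lookup. */
uint8_t tile_pair_count(uint8_t tile) { return pair_count[tile]; }

const uint8_t *tile_pair(uint8_t tile, uint8_t n)
{
    return &color_pairs[2 * (pair_first[tile] + n)];
}
```

If only the 16-bit pointer were passed, finding the matching entries in `pair_first` and `pair_count` would need some extra mapping from pointer back to tile.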

> x and y are passed only to calculate an address, this address could be passed directly

That would be more efficient and save a handful of cycles when the function is called. This does not answer the original criticism, however. As I mentioned on the page, x and y go away after the pointer is calculated, so I'm not counting those as variables that need to be available in the loop body. You still have over a dozen variables that you can't manage efficiently in Forth.

> t0 seems to be always 0, except when it is undefined.

Good point!

> The t_height, t_width are just offsets in the tile structure. These locals can go away too

I don't think I see your point. If you're going for performance, doing one array access and saving the value costs far fewer cycles than indexing into the array on every iteration.
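
A minimal C sketch of that pattern, assuming a hypothetical tile layout of width byte, height byte, then pixel bytes (the layout and names are illustrative, not the project's actual structure):

```c
#include <stdint.h>

/* Hypothetical tile layout: width, height, then pixel bytes. */
enum { T_WIDTH = 0, T_HEIGHT = 1, T_PIXELS = 2 };

/* A 2x2 sample tile used only to exercise the function. */
const uint8_t demo_tile[] = { 2, 2, 1, 2, 3, 4 };

/* Sums the pixel bytes; a stand-in for the real drawing loop. */
uint16_t sum_pixels(const uint8_t *tile)
{
    /* Read each header field once and keep it in a local. */
    uint8_t t_width = tile[T_WIDTH];
    uint8_t t_height = tile[T_HEIGHT];
    const uint8_t *p = tile + T_PIXELS;
    uint16_t sum = 0;

    for (uint8_t y = 0; y < t_height; y++)
        for (uint8_t x = 0; x < t_width; x++)
            sum += *p++;
    return sum;
}
```

Writing `tile[T_WIDTH]` in the loop condition instead would repeat the indexed load on every pass unless the compiler hoists it, which a simple 6502 compiler may not do.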

> trans_row only exists to set skip_pixel, it seems one of them can go away. The logic seems to be "unless trans_row and something, do something". The C version might actually be more verbose than necessary.

You could get rid of one of them at the expense of readability. Or maybe there is a good name that would indicate both purposes.

> Finally, the edge_style complicates the logic a lot. Using one function to do different things because it is so simple to "just add another parameter" is typical in many HLLs, and often results in awful spaghetti code.

I don't think this function is awful spaghetti code. Yes, the logic is complicated. I mention on the page that I encoded the data for some tiles as 1 bit per pixel to save room and decided to make rounded transparent corners in code depending on a flag stored for each tile. This function is a compromise to save memory. Otherwise, you could just encode the tile like any other and have three colors including transparency but take up 8 times more memory.
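
To make the memory trade-off concrete, here is a small C sketch of reading a 1-bit-per-pixel tile. The bit order, the row padding, and all names are assumptions for illustration, and the comparison assumes other tiles use one byte per pixel, which is what the "8 times" figure suggests.

```c
#include <stdint.h>

/* Sample 8x2 tile: top-left and bottom-right pixels set. */
const uint8_t demo_bits[] = { 0x80, 0x01 };

/* Returns 1 if pixel (x, y) is set in a tile packed at 1 bit per pixel.
   Assumed layout: rows padded to whole bytes, most significant bit first. */
uint8_t pixel_1bpp(const uint8_t *bits, uint8_t width, uint8_t x, uint8_t y)
{
    uint8_t row_bytes = (uint8_t)((width + 7) / 8);
    uint8_t byte = bits[y * row_bytes + x / 8];
    return (uint8_t)((byte >> (7 - (x % 8))) & 1);
}

/* Storage needed for the same tile at 1 and at 8 bits per pixel. */
uint16_t bytes_1bpp(uint8_t width, uint8_t height)
{
    return (uint16_t)(((width + 7) / 8) * height);
}

uint16_t bytes_8bpp(uint8_t width, uint8_t height)
{
    return (uint16_t)(width * height);
}
```

Under these assumptions a 16x16 tile takes 32 bytes packed and 256 bytes unpacked.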

> Actually even in C some follow certain naming conventions - like #defines being all caps, member things being prefixed by m_, etc. And you can do that in Forth too. Once again, "write only" has more to do with the author than with the language.

I really disagree on this one! All caps and so on for constants is nice but still does not tell you how many values, if any, a word pops and how many it leaves behind. That alone earns the "write only" label. For example: CONST_A m_foo CONST_B m_bar. What does m_foo return and what arguments does m_bar take?

Thanks for the comments!

astrobe_ · on July 13, 2020

To be clear, I don't think this function is spaghetti code.

I believe you are a victim of premature optimization. You knew Forth would be slow because it is interpreted, so you feared for performance. Fear often makes you do the wrong things.

The nice thing about Forth is that speed is often correlated to the number of words in a definition, so you can start writing for source/memory compactness all while partitioning the program into meaningful words as much as you can. Then assess the situation on the performance side.

> I really disagree on this one! All caps and so on for constants is nice but still does not tell you how many values, if any, a word pops and how many it leaves behind

Just like in Lua or Go you cannot tell how many values a function returns, and in most dynamically typed languages, what the types of the input arguments are. Furthermore the name alone cannot tell you about the behavior for corner cases either, like what happens if you pass a null pointer as a second argument to a string concatenation function: no-op or crash? You cannot tell until you check out the docs or read the source of the function.

Chuck Moore even claims that "stack pictures" (comments describing the stack effects of a word) are unnecessary; the effects should be obvious from the definition or if it needs more elaborate commentary, it should be documented somewhere else.

Documentation is something people don't take seriously. When you see that the whole documentation of a project consists of Doxygen files, this is usually a bad sign. It might be pretty and nice at first glance, but there is more to documentation than documenting the interface of functions and dropping a few lines of introduction on top of that. Documentation should tell a story. And for this story to be well told, you cannot follow the layout of the code.

So to answer your question, what a word does should be obvious from the name or from its definition. Someone said naming things is one of the two hard problems in CS. This is where Forth shows you the real problems and demands thoughtful solutions. 47-character-long names are certainly not the solution. Documented naming conventions and making sure a word does not do too many things are.

druzyek · on July 13, 2020

> I believe you are a victim of premature optimization. You knew Forth would be slow because it is interpreted, so you feared for performance. Fear often makes you do the wrong things.

As I explain on the page, the Forth I'm using is not interpreted. It generates STC, and if the body of the word is smaller than a particular size which you set, it will inline the code. Someone measured the dispatch overhead for fetching the next word in FIG-Forth for the 6502 at over 80 cycles. STC only needs 12 for a JSR/RTS pair and potentially 0 if the word is inlined. In any case, if I'm trying to make each version as fast as I know how, it wouldn't matter if I knew that it was interpreted or not. On the other hand, I'm happy to look at any place in the source you find where it seems fear made me do the wrong thing.

> The nice thing about Forth is that speed is often correlated to the number of words in a definition, so you can start writing for source/memory compactness all while partitioning the program into meaningful words as much as you can. Then assess the situation on the performance side.

Speed is correlated with the number of words in that it goes down when you do a lot of factoring. Admittedly this depends on the architecture and the penalty for subroutine calls. I was all set up to factor all my words down to a line or two after reading Starting Forth but switched to longer but much faster words after starting this project.

> Just like in Lua or Go you cannot tell how many values a function returns, and in most dynamically typed languages, what the types of the input arguments are. Furthermore the name alone cannot tell you about the behavior for corner cases either, like what happens if you pass a null pointer as a second argument to a string concatenation function: no-op or crash? You cannot tell until you check out the docs or read the source of the function.

I've never used Lua or Go so I can't comment on that. This does not answer the original criticism. Sure, there are pieces of information in every language you don't get until you look at the documentation. The point still stands that you can tell what arguments a C function takes and what it's returning at a glance while you can't in Forth.
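
For comparison, here is one possible C reading of the earlier `CONST_A m_foo CONST_B m_bar` line. The bodies, constants, and stack effects are invented for the sketch; the Forth line could just as well mean something else, which is the point being argued.

```c
#include <stdint.h>

enum { CONST_A = 3, CONST_B = 4 };

static uint8_t last_stored[8];

/* The signature alone says: takes one value, returns one value. */
uint8_t m_foo(uint8_t a) { return (uint8_t)(a * 2); }

/* The signature alone says: takes two values, returns nothing. */
void m_bar(uint8_t value, uint8_t slot) { last_stored[slot] = value; }

/* The call site shows which result feeds which argument. */
uint8_t demo(void)
{
    m_bar(m_foo(CONST_A), CONST_B);
    return last_stored[CONST_B];
}
```

In the Forth version the same information lives in a stack comment such as `( value slot -- )`, if the author chose to write one.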

> Documentation is something people don't take seriously. When you see that the whole documentation of a project consists of Doxygen files, this is usually a bad sign. It might be pretty and nice at first glance, but there is more to documentation than documenting the interface of functions and dropping a few lines of introduction on top of that. Documentation should tell a story. And for this story to be well told, you cannot follow the layout of the code.

Good point.

> So to answer your question, what a word does should be obvious from the name or from its definition. Someone said naming things is one of the two hard problems in CS. This is where Forth shows you the real problems and demands thoughtful solutions. 47-character-long names are certainly not the solution. Documented naming conventions and making sure a word does not do too many things are.

This is something different than what I'm pointing out. Yes, good naming will tell you what it's doing, but it doesn't tell you HOW it's doing it. Even if the word is kept short and doesn't do too much, the programmer has a lot of freedom in how they choose to use the stack, so even in well-written Forth, the word name gives no indication of what the word does to the stack. It's understandable then that you don't want a word taking 13 arguments, but once we get to the point where a valid argument could be made to have a word either take one item off the stack or two, we are stuck in the same morass where we can't tell what anything takes or returns.

astrobe_, I picked DrawTile1bpp as an example where Forth is especially unwieldy. Would you like to rewrite it as an example? Maybe if someone experienced like you took a swing at it, it would better reflect what Forth can do on the 6502.

astrobe_ · on July 14, 2020

> Someone measured the dispatch overhead for fetching the next word in FIG-Forth for the 6502 at over 80 cycles. STC only needs 12 for a JSR/RTS pair and potentially 0 if the word is inlined.

I have implemented STC with code inlining for 8086 a long time ago, as well as DTC and bytecode, in assembly, C and Forth (sic), so I did my share of cycle counting.

Sorry if I project my younger self on you, but cycle count is not the best approach to factoring. Factoring in Forth is about spotting redundancies. This is a bit different from what people mean today by "(re)factoring", which is more about how one splits a task, a program, into functions or classes. This should be called "restructuring" instead. Refactoring in Forth is really a compression process, even when you could not care less about the size of your object code.

The best approach is to write your definitions the way you like, mostly ignoring speed and cycles and byte count. This should result in clean, elegant definitions. This first step reveals some short words like your @+1 that could be good candidates for an implementation in assembler. Some other words can be inlined using IMMEDIATE. Some others can be de-factored as a result of this refactoring-for-speed process. Doing it that way would have made you realize t0_ppp is always 1, just like t0 is in the C version.

The Python basis for this port needs some optimizations to begin with. If I am not mistaken, all tiles are square. No need for a t_height or a t_width. No need for both in the tiles structure either.

The C port of the Python program can be improved. Tiles are defined, then there's a big array of pointers to those tiles. This is necessary because the tile data does not have a constant length, but it seems to me preferable to have an array of structures defining the geometry and colors of the tiles, and a pointer to the pixel data. The size is the same, just a different layout, but I expect delaying the indirection from fetching the tile to fetching the data of the tile to be more convenient for the code.
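
The layout described above might look roughly like this in C. It is only a sketch: the field names, the single color pair per tile, and the square-tile assumption are illustrative guesses, not the project's data format.

```c
#include <stdint.h>

/* Variable-length pixel data lives outside the descriptor table. */
static const uint8_t grass_pixels[] = { 0x3C, 0x7E };
static const uint8_t water_pixels[] = { 0xFF, 0x81 };

/* Hypothetical fixed-size descriptor: geometry and colors inline,
   pixel data reached through a pointer. */
struct tile {
    uint8_t size;           /* tiles assumed square */
    uint8_t fg, bg;         /* one color pair, for illustration only */
    const uint8_t *pixels;  /* indirection happens here, not at lookup */
};

static const struct tile tiles[] = {
    { 8, 2, 5, grass_pixels },
    { 8, 6, 1, water_pixels },
};

/* Fixed-size entries make the lookup a plain array index. */
const struct tile *get_tile(uint8_t index) { return &tiles[index]; }
```

A fixed `fg`/`bg` pair only works if every tile has exactly one pair, which is an assumption of this sketch.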

Spotting unneeded complexity is harder than one would expect. Past the point of complexity being the consequence of laziness or lack of skill or lack of time, you find out that complexity also results from (bad) habits. You need to configure a bunch of paths for your application, you start writing routines to read them from an initialization file. Then after a while you realize you are using an interpreted language, so you could have sourced the configuration file directly, duh. I'm still making that kind of mistake after years of practice.

Simplification is a progressive process. Jeff Fox explains it better than I would in the third chapter of his essay on Forth [0].

> we are stuck in the same morass where we can't tell what anything takes or returns

I am puzzled to hear this complaint from someone who can program in assembly. Forth and assembly are both un-typed, "no declaration needed" languages. What makes those properties acceptable in asm but not in Forth? Your expectations of Forth being a higher level language maybe?

In any case, all I can do is tell you that with practice, it is not as big a deal as you think, just like the weird symbols in APL are not a big deal, just like parentheses in Lisp are not a big deal.

> Would you like to rewrite it as an example?

Sorry but no. I have no motivation to do that - I dislike coding challenges and prefer to code things that are actually useful for me - and nothing to gain from it.

[0] http://www.ultratechnology.com/forth.htm

druzyek · on July 19, 2020

> The best approach is to write your definitions the way you like, mostly ignoring speed and cycles and byte count. This should result in clean, elegant definitions. This first step reveals some short words like your @+1 that could be good candidates for an implementation in assembler.

Yikes! When I get into debates about this, writing part of the program in assembly usually comes up eventually. This is a good example of how inefficient the system is when you have to write assembly to do what would be *ptr++ in C. Another example is something like "swap 5 + swap" which burns all kinds of cycles. Someone recommended I rewrite this in assembly as well although it would only be something like "x+=5" in C. The absurdity of that speaks for itself.
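
For reference, the two C idioms mentioned above can be sketched as follows. The function is invented for the example, and the Forth phrase in the comment assumes `x` sits just under the top of the stack.

```c
#include <stdint.h>

/* Copies n bytes and adds 5 to a counter on each pass. */
uint8_t copy_and_count(uint8_t *dst, const uint8_t *src, uint8_t n, uint8_t x)
{
    while (n--) {
        *dst++ = *src++; /* fetch, store, and advance both pointers */
        x += 5;          /* Forth, with x under the top item: swap 5 + swap */
    }
    return x;
}
```

How cheaply either line compiles on the 6502 depends on the compiler and on whether the variables end up in zero page.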

> The Python basis for this port needs some optimizations to begin with. If I am not mistaken, all tiles are square. No need for a t_height or a t_width. No need for both in the tiles structure either.

Sure, but storing those as one value instead of two would also slightly speed up the C and assembly versions. Forth would still be just as slow and inefficient compared to the other languages if you made that change.

> The C port of the Python program can be improved. Tiles are defined, then there's a big array of pointers to those tiles. This is necessary because the tile data does not have a constant length, but it seems to me preferable to have an array of structures defining the geometry and colors of the tiles, and a pointer to the pixel data. The size is the same, just a different layout, but I expect delaying the indirection from fetching the tile to fetching the data of the tile to be more convenient for the code.

No, the color data varies as well since some tiles have no color pairs and the rest have a varying number. Also, different color pairs can be applied to the same tile depending on the situation, so there is no 1:1 correspondence that would make your suggestion make sense here. In any case, adding one level of indirection to save memory and keep the system organized by passing an 8-bit index instead of a 16-bit pointer adds a few dozen cycles to a function that takes around 20,000 to draw the tile. Do you mean that if the C and Forth versions were reorganized, there wouldn't be such a large speed disparity between the two? That is highly doubtful.

> I am puzzled to hear this complaint from someone who can program in assembly. Forth and assembly are both un-typed, "no declaration needed" languages. What makes those properties acceptable in asm but not in Forth? Your expectations of Forth being a higher level language maybe?

The original criticism is that you can't tell what anything takes or returns. If you look at the assembly, you'll see that the function calls there are just as clear as in C since we have named arguments and can instantly see what happens with the return value after the subroutine call. It's very telling that it's a lot easier to implement a usable scheme for local variables in assembly, primitive as it is, while Forth still lacks this.

> Sorry but no. I have no motivation to do that - I dislike coding challenges and prefer to code things that are actually useful for me - and nothing to gain from it.

Fair enough. How about an existing example you could point to then? There are a lot of unbelievable claims about the performance of Forth compared to C and assembly, but no one seems to be able to prove any of this. That's one of the main motivations for doing this project.