> When considering minimal implementation, one will often try to choose between the easiest thing to parse ('b8') and the easiest thing for human comprehension ('copy-to-EAX'). Why is 'b8/' in there? Are you using 'b8' + stop at whitespace for easy parsing while leaving ignored characters in between only for readability? And were there any drawbacks when you considered just dropping the b8/'s and such?
You kinda deduced the reason. I'm trying to walk a fine line between keeping the implementation small and keeping the interface at least somewhat ergonomic. This isn't a notation to bootstrap as quickly as possible and then treat like a red-headed stepchild. I want it to be habitable all by itself. A uniform but strict syntax with lots of room for free-form comments is the best approach I've found so far. The nice thing about restricting readability concerns to comments is that error handling stays simple. And error handling is often where a compiler spends a lot of LoC (and still ends up with a highly suboptimal result).
> The one thing I'm unsure about, which I didn't see in yours (maybe not applicable), was structs. They were the one change in C that let UNIX be ported from assembly.
I'm already starting to think about type definitions once I finish bootstrapping the SubX implementation. Definitely the #1 high-level feature I want. The plan is to grow a small notation that occupies the same niche as C, but with heavy emphasis on keeping the implementation small and transparent, and zero emphasis on portability. That should eliminate C's problems of undefined behavior.
It may look a lot like C, but that's still open. I think having it look different may help reinforce the right expectations: it's not going to be C, there's never going to be any sort of bug compatibility. I'm also planning to avoid complex nested syntax for arithmetic or pointers. I'd like to preserve a 1-1 correspondence with machine code as long as possible. Makes error messages much easier to design.
Your idea of showing before/after is awesome. I'm never going to do it in the context of C :) but I am going to totally steal it at some point.
Brings me to another point. We focus a lot on how much code something takes and how readable it is. The alternative is generating boilerplate-like code that isn't readable itself, while the declaration and the generator are. Think grammars fed through parser generators. I was thinking extra lines of code might be worth it for automating away the parsing and error-handling parts with DSLs. I mean, does it really matter if those parts aren't readable if we know they're just tedious bullshit that readable code generated in a structure you understand reasons for?
Maybe just personal preference here. I'm taking a generative approach, like Kay/STEPS did, for anything tedious wherever I can. That lets me dodge the overhead of error handling and...
The main advantage is that they reduce the cognitive overhead of juggling nearly meaningless details. You even mention understanding as a goal of SubX. Seems like it's still worth more exploration. Maybe it's a declaration that has the offsets etc., your interpreter saves it, and you have create/set/get/delete syntax that the interpreter translates into simpler stuff. Basic rewrite rules. Back when I did it, I just added the fields to the struct name with .syntax, as strings substituted during translation into the output code or traces. So your syntax might let you do a one-liner. The executable code and traces will show the full thing it's doing, with a comment on top containing the mapping. Maybe it shows you that on the first run, as you write the program, so you can check it while everything is still fresh in your mind. Like a REPL.
You seem mostly to be doing ASM here. I get why. Maybe consider allowing one, small layer of indirection with still clear mapping. Hyde's HLA is inspirational here. If you want simple asm, you can get it. If you want understandable code (!= asm), you can selectively use the "high-level" stuff which still isn't high-level like C++ or Java. Obviously, yours would be simpler than Hyde's.
That's different. It will stop most undefined behavior, as will having just one implementation.
re type definitions
Definitely look at Cyclone and Typed Assembly for ideas on attaching different semantic meanings to low-level things like pointers. Ada's combo of type and bit representation is worth trying. Maybe look at whatever the Zig guy is doing, too, since he's aiming for safety and simplicity. If you want to get wild, you can throw a whole Prolog interpreter in there like the Shen guy. I don't suggest that.
Although I'm clueless about them, I do notice similarities to basic pattern matching and rewrite rules when I look at type systems in papers. That's also what powers macros and HLL-to-LLL conversions. There could be a set of primitives you could adapt to handle them all by just attaching the context somehow. Maybe also with templating, where the primitive expresses the core idea but is adapted to setters/getters, bit sizes, and ranges pulled right out of the type/bit definitions.
re side by side
Glad you love it. Look forward to seeing what use you come up with.
> does it really matter if those parts aren't readable if we know they're just tedious bullshit that readable code generated in a structure you understand reasons for?
Corollary to "all abstractions are leaky": No boilerplate is ever entirely bullshit.
Don't get me wrong, DSLs can be useful (provided somebody's watching the big picture to ensure there aren't multiple notations for the same niche). But I'm still operating at too low a level for them. I'm digging myself back out to the sun as fast as I can :)
> You seem mostly to be doing ASM here. I get why. Maybe consider allowing one, small layer of indirection with still clear mapping.
Like I said, I will definitely have a notation for structs. But the implementation of the notation for structs is not going to itself use the notation for structs. That kind of circular dependency is hell on global comprehension of the big picture. The local convenience of having offset names is far too small a benefit to outweigh the global issues.
I won't be doing ASM forever. But as I add layers of notation, each implementation will only be allowed to use earlier notations. Notations will have a strict dependency ordering.