Hello!
I wanted to give an update on what things have been going on. This is a bit rushed / incoherent, but just wanted to write a rough idea of what all has been going on with Wiz for the past few years, what I'd like to do, what could be done next as an open-source community/user group/interest group around this.
First off: To anyone who reads this: feel free to mirror this repository elsewhere for preservation purposes, or fork this project to keep development on-going! After all, the project is open-source and permissively licensed, even if I may have my own goals for my maintainership. While it's a small audience, some people have used this project, and may want to continue to do so. Would be nice to preserve this work in the face of potential economic/cloud service/global landscape changes, as well as lots going on in my personal life right now as I try to find new dayjob work + try to get back more time for open-source.
For many years now I have been trying, so far without success, to advance the project in new directions, while the existing build stagnates. I am trying to see if I can share this project with the open-source community better while still advancing my vision of it, and what that could look like. Very busy at the moment.
I summarized some things on Discord; this is another rehash of some things I was working on / trying over the past while, sort of under private development. It's not exhaustive, but anyway, here's the outcome of some of these rewrite attempts:
rewriting in Rust
would have been a massive undertaking and isn't exactly the most fun rewrite to do, considering many things that work in the existing build are hard to supplement. The safety stuff restricts performance (and, more importantly, certain safe-but-not-borrow-checkable patterns) in the compiler, which encourages lots of clone() and lots of extra book-keeping for simple things. I can't get into exact specifics, but I ran into many such cases and the workarounds were piling up. It would also introduce lots of dependencies for anything not in the standard library or handrolled. (I prefer monolithic repositories, if only for the fact that they have fewer upstream dependencies aside from compiler/library changes. This way things can be fixed in isolation, even if it can be more maintenance.) If someone else wants to do a reimplementation in Rust, go for it. But not for me.
Rust is relatively mature now, but still requires a lot of macros/crates to supplement metaprogramming features missing in the language, or nightly features to support const fn and similar things. Also, any unsafe code (if used at all to escape the safety constraints which Rust enforces) is quite hard to write correctly due to the amount of undefined behaviour that can occur and which isn't well-documented/clearly explained. If I were to rewrite, I don't just want to support existing workflows, I want to push things in a better long-term direction. I like the safety aspects + relative maturity compared to most solutions, but it feels more suited to high-level application code if you're okay with trading some memory/speed performance and some ease-of-writing for safety, or with writing lots of unsafe wrappers that you need to maintain, or with dealing with many upstream crate dependencies.
rewriting in Zig
was looking promising for a while, but I kept running into small roadblocks, and along the way they kept breaking features that I relied on for development. I can't rewrite a compiler while also rewriting the whole codebase every time they cut a feature on a whim because "slightly faster (maybe, but not necessarily)", "there should be one right way", "easier to read than write (questionable)", instead of letting the user opt in gradually over dev time. But I understand they're going for something fairly technical / "state-of-the-art"; it just feels really abrasive on further revisions and when reading their GitHub / roadmaps.
Encountered stuff like: Errors not produced in a stable ordering. No multi-line comments. Annoyingly pedantic _ discarding and const enforcement at all times, even during iteration. Breaking their build system, killing their own standard library / language features on a whim for questionable gains (and I lean on their standard lib somewhat in part because of how volatile the main language is / lots of metaprogramming comptime stuff has already been sorted out). Happened multiple times. Worse debugger story than any other language I've used recently. No easy way to lint. Annoying verbose casts that are restricted to specific types. Meta-programming features that raise compile errors rather than returning optionals which could spare the code writer a lot of time. No automatic type deduction/type inference, except anytype, or computations of types, or explicit type arguments. You have to write your own overloading. Comptime error messages omit context (even with -freference-trace). Imports that suck, annoying to do conditionally compiled submodules.
Recent changes to remove all "managed" data structures (eg. ArrayList in favor of ArrayListUnmanaged) do indeed get rid of "allocator field bloat" in things, but it feels extremely heavy-handed to just force that on the user right off the bat. Maybe it wouldn't feel as bad, if they didn't rug pull on existing projects.
I loved the general idea of comptime computation, and super fast build times, but until it stabilizes more, and hopefully dials back on the "(questionable, because of all the workarounds) readability and (negligible) speed at all costs", and I could make enough wrappers around their restrictions, I don't think it's worth re-re-rewriting and not being able to advance things. I could version-pin things, but it would take ages to migrate later, the larger the codebase grows. TL;DR lots of reasons it doesn't work for me.
But hope they keep going, maybe restore their image and make it more useful for actual compiler users, not just the writers of the language. I think most projects without full-time resources to put on active migration will get exhausted with every revision. I'm sparing my own energy on that for now, unless it seems to suddenly become fruitful again or I'm okay with a version-pinning compromise to things.
rewriting in C++
tried a few times; some attempts were too tricky to maintain, some were too low-level (eg. a "no libc" build that was too handmade, community-esque from the get-go; it would have taken ages to build up the codebase necessary to get it off the ground from scratch in my spare time, had fun experimenting though). The STL and recent standards have all been bloated, with various levels of bad support. I want to get to a minimal libc/no libc thing if possible though, and ideally remove the dependency on any C++ runtime library features, just use enough to do metaprogramming. (Zig was promising to replace that until they made too many volatile decisions, as mentioned above; by this point it would have been easier to just reimplement that myself in C++.)
rewrite in C
too much stuff you need to provide yourself, too many footguns C++ avoids (while C++ creates many more), conforming C89 has limited support for 64-bit systems, later C standards not available everywhere. Most portable, but at the expense of lots of little things to work around non-mainstream vendor quirks if they're not 100% standard conforming or are in a freestanding environment.
There were lots of attempts at specifying things / respeccing the design, but I never fully succeeded with this and can't really provide this work.
Sadly, all of this work is not in a state I can easily share. It probably won't be that useful to see my warehouse of bike-sheds, and I don't want to get too locked down while I develop things privately.
Current thinking for the future:
- Attempting to give up on Rust/Zig for now, as neither is perfectly suited for this, for me.
- C++ is stable even if memory unsafe, and has had many years now of standards support for C++14. I don't trust the future of later C++ standards though, but as long as compilers continue to support building for earlier standards, there's a path ahead there. C++ macros can still paper over some deviations between vendor implementations/versions.
- Gradual rewrite/fork of existing code merging in design/ideas from my Zig/Rust/many design sketches.
- Keep maintenance of the existing C++ build alive, ideally by opening that up to other people, so it can continue to thrive + move along.
Here's a mega-list of some goals for my rewrite (not ordered atm, probably missing some things, some repetition here, some explored/spec'd further, but I don't have time to elaborate on each one, just a laundry list of next steps to consider):
- A bunch of tools / features, use the ones you need for your project, rather than "one way" / lock-in.
- Macro assembler with lots of fancy high-level features + types for speeding up writing code whose details can't be expressed precisely in higher-level languages
- preserve 1:1 targeting of instructions, make this better for weird edge-cases that existed in the previous language by various other goals here
- compress code into higher level constructs when goto spaghetti isn't your style, with enough customization points to tailor the code it generates, and still having the option to avoid this sugar
- Anticipate many use-cases from both high-level language users (eg. C, Rust, Zig, etc) and various asm users (eg. CA65, WLA DX, RGBDS, etc),
- Support legacy hardware better than LLVM or C compilers do, due to better direct representation of the machine that it targets
- Compatible with old code as much as possible, while improving features. Minimize breakages / have versioning to allow migration
- Compile-time computation, explicit inlining
- Some form of OCaml style parameterized modules or templating by expressions, rather than full compile-time eval or preprocessor macros.
- `using` constructs to "outline" / expand code, similar to `inline`, but can generate declarations in the outer scope. Useful for generating lots of boilerplate code, similar to what preprocessor macros would do in other languages, a bit more hygienic but requires semantic analysis and the ability to add more passes to the compiler as stuff is expanded. Especially useful for instruction set bindings, not strictly necessary but very nice to maintain things. Constructs like `using if`, `using for` (loops over some iterable and generates an instantiation each time, similar to `inline for`), `using` blocks (just inserts the contents, similar to a block but contents are lifted outside), `using` of a module/explicit parameterization (lifts contents of module into scope)
- Explicit ordering of imports/using when needed, but without needing unity builds / forward declarations. Normally point of definition would be the first-instance, latter definitions would be references. Add `extern` + `primary` keywords to alter this behaviour -- if a symbol has a primary definition, it becomes the definition site, extern symbols are references that do not define the symbol, just expose its definition for use + later fill-in. Regular import still exists for most cases where you don't need to explicitly order dependencies.
- Better importing, that supports passing arguments to imports (similar to using instantiation, but for files), selective imports, better ability to control import order by explicitly saying where code is defined vs referenced (while allowing automatic order if no other uses encountered), ability to import redundantly for mirrored code across banks.
- Use the `primary` keyword to explicitly instantiate module uses in specific spots, so instantiations of modules aren't magically inserted without your knowledge, and they can be directly addressed/pinned to a location in the final program
- meta-features that allow expanding a compile-time expression at an early phase to generate an identifier. eg. `$."abc_{x}"` becomes the identifier `abc_123`. Can be mixed with `using for`, etc.
- `private` qualifier for declarations. Useful in certain blocks to keep symbols anonymous / inaccessible outside the block if they'd otherwise be visible via . notation, using/import. Also prevents name clashes. Especially for `using` blocks with anonymous local variables, but also internal details you don't want exposed outside the file/module it's defined.
- better dependency ordering, using topological sorting, rather than a fixed number of passes
- "opcode modules" with bunch of better ways of directly expressing addressing modes / registers / instruction signatures
- overloading for all functions, in a consistent way, so both instruction selection and user code can rely on it. Explicit ways of selecting an overload, so it can always be addressed directly from other code
- ways to register functions with various runtime operations/arithmetic operations in the language, and specify their arithmetic properties/argument bindings/flag results/etc
- niceties to allow higher-level contexts with register allocation/scratch space/calling conventions, mixing constants/vars/code in same source without explicitly switching sections as much. Thus saving time on book-keeping registers, etc, but allowing other code to still do direct register/variable use as is done now.
- per-platform "sensible defaults", so people could write a "hello world" in a cross-platform way, and get started before diving further. These could be grouped into various profiles/schemes.
- opt-in register selection (either by a scheme), opt-in stackframe allocation, opt-in
- forced evaluation at runtime or compile-time by some attributes
- reference types + reference arguments for inline/opcode functions.
- conditional attributes, conditional qualifiers.
- attributes for high-level control structures to ensure the code generated
- inline functions that can take partial compile-time arguments
- In-language definition of instruction sets
- In-language definition of link formats / output targets.
- better testing coverage
- Types and values are both expressions, can be mixed
- "asm mode" for easier migration of some code from other projects / easier maintaining of interop, and exercising instruction database more directly. Some "dialect" options to handle common inconsistencies like [] vs (), # vs not, $0x vs 0x vs 123h, %00101 vs etc, tab requirements, directive characters, string delimiters, comment delimiters, EQU vs =, local label prefixes, inconsistent addressing modes/namings of instructions on some assemblers
- way to import symbol tables/objects from other projects to allow using popular asm libraries that already exist without porting them.
- opt-in peephole optimizations
- opt-in static analysis,
- opt-in high-level optimizations by arithmetic properties
- opt-in unused code removal or warnings/errors
- opt-in synthesis of missing instructions based on existence of other instructions + their asserted properties.
- opt-in register selection from a group of registers when equivalent to one another
- way to explicitly retire registers, or automatically reuse based on liveness analysis
- ways to embed submodules in different platforms (eg. for uploaded code on CA65)
- text-based targets (eg. equivalent C code, asm code dump for interop with existing project,)
- VM targets, including VMs for scripting within a possible low-level asm project. WASM, JVM,
- stack-based machine support (rather than registers, haven't spec'd what that would look like, but would allow more targets)
- address spaces: separate code and data address spaces, default address spaces that can be customized for code/data/etc that might be different (eg. 65816's data bank vs program bank, harvard architectures), explicit address space selection per declaration or group of declarations, and an easy way to pass this into submodules from an importer at a higher position up the stack so it can be easy to move modules to different banks.
- data pointers within specific address spaces
- data pointers within some struct's memory, allowing for automatic seeking to fields when random access isn't as easy and would require multiple register loads, with ways to take advantage of the data type's alignment properties to more quickly seek fields or only update a sub-register field.
- linker helpers to locate free space, automatically fit parts with different allocation schemes (rather than strictly linear code emission, code visitation), report unused
- multiple debug/symbol file targets.
- "sufficiently smart compiler" but only by exposing the smarts to let the user control, since it's meant to be a stand-in for assembly first, with some high-level portable abstractions to make it easier to write correctly/reusably within some person's definition of "portable" that can be fairly configured.
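As a rough illustration of the "better dependency ordering, using topological sorting, rather than a fixed number of passes" goal in the list above, here is a minimal Python sketch (not Wiz code, and not how any Wiz build actually does it). It uses Kahn's algorithm; the function name `order_declarations` and the dictionary input shape are assumptions made up for this example.

```python
from collections import deque


def order_declarations(deps):
    """Order declarations so each one comes after everything it references.

    `deps` (hypothetical input shape) maps a declaration name to the names
    it depends on. Raises ValueError if the dependencies contain a cycle.
    """
    # Collect every name, including ones that only appear as dependencies.
    names = set(deps)
    for targets in deps.values():
        names.update(targets)

    # Count unresolved dependencies per name, and record reverse edges.
    remaining = {name: len(set(deps.get(name, ()))) for name in names}
    dependents = {name: [] for name in names}
    for name, targets in deps.items():
        for target in set(targets):
            dependents[target].append(name)

    # Kahn's algorithm: repeatedly emit names with no unresolved dependencies.
    # Sorting keeps the output order stable between runs.
    ready = deque(sorted(n for n, count in remaining.items() if count == 0))
    ordered = []
    while ready:
        name = ready.popleft()
        ordered.append(name)
        for dependent in sorted(dependents[name]):
            remaining[dependent] -= 1
            if remaining[dependent] == 0:
                ready.append(dependent)

    # Anything never emitted is part of (or blocked behind) a cycle.
    if len(ordered) != len(names):
        stuck = sorted(n for n, count in remaining.items() if count > 0)
        raise ValueError("cyclic dependency among: " + ", ".join(stuck))
    return ordered


if __name__ == "__main__":
    example = {
        "main": ["init", "SCREEN_WIDTH"],
        "init": ["SCREEN_WIDTH"],
        "SCREEN_WIDTH": [],
    }
    print(order_declarations(example))
```

Unlike a fixed number of passes, this resolves dependency chains of any depth in one traversal, and a cycle is reported explicitly instead of silently leaving symbols unresolved.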
On my end, mostly focusing on the "macro assembler" aspect first, but the high-level stuff you can open up with better metadata is hard to miss once you're already putting in the work to define a full instruction table with type information + register results + flag results + arithmetic properties. Trying to scope this, but I'm treating the whole thing fairly open-ended and wanted to do something pretty ambitious to allow this to be a multi-use DSL / tool for homebrew stuff or targeting.
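To make the "instruction table with register results + flag results" idea a bit more concrete, here is a small Python sketch of what such metadata could enable. The table shape, the `InstructionInfo` type and the `clobbers` helper are all assumptions invented for this example, not the planned design; the handful of entries follow standard 6502 behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstructionInfo:
    """Hypothetical per-instruction metadata record."""
    reads: frozenset   # registers/flags used as inputs
    writes: frozenset  # registers written
    flags: frozenset   # status flags modified


def info(reads="", writes="", flags=""):
    # Helper so table entries can be written as space-separated strings.
    return InstructionInfo(
        frozenset(reads.split()),
        frozenset(writes.split()),
        frozenset(flags.split()),
    )


# A tiny illustrative subset of a 6502 instruction table.
TABLE = {
    "lda": info(writes="a", flags="n z"),
    "tax": info(reads="a", writes="x", flags="n z"),
    "inx": info(reads="x", writes="x", flags="n z"),
    "clc": info(flags="c"),
    "adc": info(reads="a c", writes="a", flags="n v z c"),  # carry is an input
    "sta": info(reads="a"),  # stores to memory; no register or flag changes
}


def clobbers(mnemonics):
    """Return (registers, flags) modified by a straight-line sequence."""
    registers, flags = set(), set()
    for mnemonic in mnemonics:
        entry = TABLE[mnemonic]
        registers |= entry.writes
        flags |= entry.flags
    return registers, flags


if __name__ == "__main__":
    print(clobbers(["clc", "adc", "sta"]))
```

Once this kind of data exists for instruction selection, summaries like "what does this routine clobber" come almost for free, which is the sort of opt-in analysis the goals list hints at.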
Various Influences: C, Rust, Zig, C++, OCaml, Python, Odin, D, Haskell, Lua, LLVM, CA65, RGBDS, NASM, WLA DX, NESASM, GNU ld, ld65, customasm, NESHLA, HLAKit, Atalan, Millfork, MagicKit, uxn, 1ml, many others I'm forgetting right now. I look at a lot of languages/assemblers/linkers/homebrew devkits as I brainstorm stuff since it wants to do all of this from the compiler, ideally without much of a build system.
Legacy: Wiz (this C++ version), Wiz (D version), nel (D version), nel (C++ version), nel (Lex+yacc version).
Discord isn't a good venue for long-form discussion, neither are github issues (although I can take this as a comment thread for anyone else who wishes to talk/exchange ideas here as things evolve).
I also don't know if this site is a long-term solution to keep on the big cloud services. Trying to see if there's other efforts, or at least things we can do to mirror that.
I'm sorry to anyone who's been waiting on things, who sent emails that didn't get answered (but I did try to read if I saw, feel free to chime in two cents somewhere, can't really do too many 1:1 communications for this stuff or offer support, so apologies if I ignored your email, was hoping to originally get around to some of those).
You can always use the existing code + fork if you don't want to wait / want to step up in your own way. I'm one person, and I have my own vision for this, which might not be the same as others'. Feel free to explore this space and come up with neat ideas! I think as-is, Wiz could be considered a "finished" project, though not successful at every goal. I have ambitions to make a new version, but no clear timeframe I can promise if it happens. Does that mean this project is dead? I hope not, just unable to progress in its current form without more free time/disposable income/maintainers until it can maybe pay off.
Originally Wiz was just a tool set for my own games, and it has been rewritten more than once now over the course of over a decade in my free time between other things I want to make. But the trap of making a project and having mild success with it is that it creates more expectations on it to become a better thing, unless managed. I don't feel pressure to release anything, but rather just to let people know what they're getting into and whether they're accepting of the current state/want to wait.
I've tried to think of ways to address that and hopefully make a "final version" that could make even more people happy to use it. But as is, it's a fun project for jamming out games with somewhat suboptimal asm at times for a few systems, fairly close 1:1 to asm with some niceties. Usable for prototyping, gets harder with larger scope projects without supplementing with extra tools, working around quirks.
I thought I could deliver this way sooner, but needed to prioritize other stuff. Also want to work on a more iterative rewrite using code that already exists rather than destroy-and-rebuild, if I can. After so many issues with the other rewrite approach, it's only sustainable if you can do it in a quick timeframe. It's still nice to have researched a bunch of potential roadmaps/avenues to go. If any of the stuff inspires someone more motivated to make a new language that covers some of these ideas, or makes someone inspired to make a fork of their own, go for it!
As for documentation/etc, feel free to expand on it, community grow it, etc. I personally prefer lower tech solutions / offline solutions that don't require JavaScript or a fancy "framework of the day", and ways to digest the same content into multiple target formats that are convenient to the user. I also couldn't really vet integrations for tools I don't use. This is why I hesitated to merge the one effort to do so, despite it looking promising. I don't want to discourage these "unofficial" efforts, they can always be nice, and this whole thing is open-source, so feel free to comb through this for ideas.
Older issues on here are probably not as relevant anymore to the new efforts, but some might give cool directions for the existing repo. My fork is based off of the stuff currently in the repo at the time of writing + lots of personal research I can't really easily share in a useful way atm.
If other people want to step up and organize / set up forums / spitball ideas: I'd ideally like to move off centralized big-cloud infrastructure myself, but also don't have money to pay for hosting. I'd like a forum-like place, ideally one that's mature/existing, or someone else who feels like hosting. I'll put the effort into solving these problems myself if I have time later / community support, but I'm also ok with letting someone else take that part.
Okay going to nap a little. Have a good day everyone! I might not check this thread that frequently, but feel free to chime in if you want.
TL;DR:
- this is open-source, fork or use how you like, as long as it respects the license!
- the old project had reached "toy language but with a bit more competency" level, with buildable examples proving it can be useful but wanted to push further for longterm projects.
- Rust + Zig rewrites were dead-ends, for me in my free time. Going to try like rewrite attempt number 5 or 6 (lost count), back in C++. We'll see how it goes when I have the time!
- wanted to try a big ambitious rewrite that anticipates many user needs from other assemblers / compilers / linkers, both high-level conveniences and low-level power-user customization points.
- unfortunately can't elaborate too much on stuff above, but tried to summarize as best I could.
- got various emails over the years, both asking for help and asking if they could volunteer, but couldn't really give time the past while to troubleshoot others' use-cases, and wasn't really able to find time for bringing in other maintainers + making sure we aligned.
- my goals might be different than others, that's fine.
- can't really guarantee I can field questions
- others can leave comments/find ways to organize if it would be helpful
- want to give maintainership/ability to manage things for other interested users who can be good stewards (don't have to vet myself, just go ahead + form your own community if you want to, and I give my blessing)
- I'm not that available for this, but letting others step in / talk next steps
- want to do your own language like this? want to "steal an idea"? take what you want for study. I really don't mind. Ideally would be nice to have some acknowledgement, and license still applies to existing source, but take notes/inspirations if it helps,
- would love more interop with other build tools/languages
- better docs, better discussion forums (existing forum? someone else want to host it?), chat outside discord (IRC? Matrix?)
- repo mirroring, offline docs, offline backups, lean better into the decentralized repo nature of git than using strictly github (even if it still may be nice for the central visibility / free open source).
- more community involvement! figure out healthier roadmap for the future!
Thanks everyone who tried this, and to the greater homebrew development / compiler / open source community that made this work possible to sustain!
(Finally apologies for any typos / incomplete sentences, may try to edit later + indicate so in the OP depending on how much traffic/noise there is.)