Builtin macros, macro arguments, \overset and \underset #605
k4b7 merged 5 commits into KaTeX:master from
xymostech left a comment:
This looks awesome! Thanks for implementing the macro expansion code, I'm sure that'll be very useful as time goes on.
My TeX-purity is complaining that it doesn't support everything that real TeX macro expansion does, but oh well. We don't need that for now.
tok = expansion[--i]; // next token on stack
if (tok.text === "#") { // ## → #
  expansion.splice(i + 1, 1); // drop first #
} else if (/^[1-9]$/.test(tok.text)) {
This is looking backwards through the text, correct? Doesn't that mean that this will find a '#', then look backwards one more character to find the number? I'd assume that this would find things that look like 1# not #1. Maybe I'm missing something though, since the tests pass...
The whole MacroExpander is working kind of backwards. It uses an array as a stack, with tokens added by lexing (one at a time) or macro expansion (possibly many), and tokens removed by consumption by the parser (one at a time). The canonical way of using an array as a stack is using its push and pop methods, which modify the tail end of the array. So if subsequent pop calls should return the tokens in order, the stack itself has to have them in reverse order, with higher indices corresponding to earlier tokens. That's the reason I have two calls to reverse in there by now.
If this is causing too much confusion, we might add bigger comments than those we have, but I guess those would still get lost in the diffs you see on GitHub. Or we could make the stack work forward, and hope that a shift is fast enough. Or we could avoid removing elements, and instead maintain a pointer indicating the current position. Do you think any of these approaches would be better than what we have now?
I guess that although the difference between push/pop and shift/unshift appears to be considerable, it's probably still negligible compared to the other processing we do, so I'm willing to reformulate all of this in terms of shift/unshift if you want me to.
See #493 (diff) for previous discussion on this.
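To make the reversed-stack convention concrete, here is a minimal sketch, assuming tokens are plain strings. TokenStack, pushTokens, and popToken are hypothetical names, not the actual MacroExpander API. Pushing a sequence appends it back to front, so successive pops return the tokens in reading order, and a sequence pushed later (such as a macro expansion) comes out first.

```javascript
// Hypothetical token stack using the reversed-array convention:
// the LAST array element is the next token to be consumed.
class TokenStack {
  constructor() {
    this.stack = [];
  }

  // Push tokens back to front so that later pops return them in order.
  pushTokens(tokens) {
    for (let i = tokens.length - 1; i >= 0; i--) {
      this.stack.push(tokens[i]);
    }
  }

  // Remove and return the next token (undefined when empty).
  popToken() {
    return this.stack.pop();
  }
}

const s = new TokenStack();
s.pushTokens(["a", "+", "b"]); // e.g. freshly lexed input
s.pushTokens(["x", "y"]);      // e.g. a macro expansion to be read before "a"
const seen = [];
for (let t = s.popToken(); t !== undefined; t = s.popToken()) seen.push(t);
console.log(seen.join(" ")); // x y a + b
```

Both operations only touch the tail of the array, which is what keeps them cheap; the price is that every multi-token push has to be reversed.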
It's worse than that benchmark suggests (which is already 10x on a size-4 array): the cost of shift/unshift grows linearly with array size, so things will get worse and worse if your stack grows at all, while push/pop take constant (amortized) time. So I'd avoid that. Indexing instead of removal is a reasonable way to go, though. (In general, it sounds like you're looking for a deque data structure.)
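For comparison, a deque that supports cheap insertion at both ends could look roughly like this. It is a hypothetical sketch (the Deque class and its method names are made up, not KaTeX code) built on a growable ring buffer, so pushFront, pushBack, and popFront run in amortized constant time instead of moving every element the way Array shift/unshift do.

```javascript
// Hypothetical growable ring-buffer deque (not KaTeX code).
class Deque {
  constructor(capacity = 8) {
    this.buf = new Array(Math.max(1, capacity));
    this.head = 0; // index of the first element
    this.size = 0; // number of stored elements
  }

  // Double the capacity, copying elements so the first one lands at index 0.
  grow() {
    const old = this.buf;
    this.buf = new Array(old.length * 2);
    for (let i = 0; i < this.size; i++) {
      this.buf[i] = old[(this.head + i) % old.length];
    }
    this.head = 0;
  }

  pushFront(x) {
    if (this.size === this.buf.length) this.grow();
    this.head = (this.head - 1 + this.buf.length) % this.buf.length;
    this.buf[this.head] = x;
    this.size++;
  }

  pushBack(x) {
    if (this.size === this.buf.length) this.grow();
    this.buf[(this.head + this.size) % this.buf.length] = x;
    this.size++;
  }

  popFront() {
    if (this.size === 0) return undefined;
    const x = this.buf[this.head];
    this.buf[this.head] = undefined; // drop the reference
    this.head = (this.head + 1) % this.buf.length;
    this.size--;
    return x;
  }
}

// Lexed tokens go to the back; a macro expansion goes to the front,
// pushed last-token-first so that it is read in order.
const q = new Deque(2);
for (const t of ["a", "+", "b"]) q.pushBack(t);
for (const t of ["x", "y"].reverse()) q.pushFront(t);
const order = [];
while (q.size > 0) order.push(q.popFront());
console.log(order.join(" ")); // x y a + b
```

The small initial capacity in the example forces two resizes, which exercises the wrap-around copy in grow.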
Aaahhh! I see now. I didn't notice that expansion was reversed earlier. This is fine, it was just a bit confusing.
@edemaine Sure, I'd want a deque, or a proper stack. Might even ask for a decently typed language while we're at it… Well, looks like we can stick to the reversed stack for now.
expansion.numArgs = numArgs;
this.macros[name] = expansion;
}
if (expansion.numArgs) {
This expansion looks like it works (or I can't find any glaring flaws)! However, you might want to note for future macro-writers that this is somewhat simplified from the true TeX macro expansion? For instance, it won't expand macros inside of the arguments (which might matter if, for instance, the macro expands to something with a } in it). It also won't handle any argument lists except for plain \def\x#1#2...#n{}, whereas TeX can handle \def\x#1 some random text here #2{}.
Not sure that we'd ever actually want to implement those things, but it seems useful to note that there's some functionality intentionally missing.
Yes, this is but the first step.
But macros inside the arguments already work. Try \underset-{\overset+/} if you want to. That's because expansion gets triggered when a token is removed from the stack, and works by putting the expanded form on the stack. So the arguments end up on the stack, from where they will get expanded once the parser gets there. Which I think is what LaTeX does, too.
Macros with unbalanced } in them are somewhat tricky. As far as I know, you can't really define them in TeX either, at least not using that notation. That's what \bgroup and \egroup are for. Come to think of it, it should be perfectly possible to use our built-in macro definition facility to define \bgroup and \egroup. The requirement to have balanced braces is in the argument handling and in the Parser down the line, and neither of these should matter. Will give that a try. By the way, do we care about the distinction between \bgroup and \begingroup?
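The \bgroup/\egroup idea can be sketched as follows, assuming tokens are plain strings, macros take no arguments, and bodies are pre-split token arrays; the macro table and expandAll are hypothetical, not KaTeX's API. Because expansion happens when a token is popped and the body is pushed back onto the reversed stack, a body like e^\bgroup does not need a matching close of its own: the braces only have to balance in the final token stream. The macros mirror the \startExp/\endExp test case from this PR.

```javascript
// Hypothetical macro table: name -> body as a token array.
const macros = new Map([
  ["\\bgroup", ["{"]],
  ["\\egroup", ["}"]],
  ["\\startExp", ["e", "^", "\\bgroup"]], // body has no matching close
  ["\\endExp", ["\\egroup"]],
]);

// Expansion is triggered when a token is popped; the body is pushed back
// onto the reversed stack, so macros inside it are expanded only when they
// are reached. A macro that expands to itself would loop forever here.
function expandAll(tokens) {
  const stack = tokens.slice().reverse();
  const out = [];
  while (stack.length > 0) {
    const tok = stack.pop();
    const body = macros.get(tok);
    if (body === undefined) {
      out.push(tok);
    } else {
      for (let i = body.length - 1; i >= 0; i--) stack.push(body[i]);
    }
  }
  return out;
}

console.log(expandAll(["\\startExp", "a", "+", "b", "\\endExp"]).join(" "));
// e ^ { a + b }
```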
The case of fixed strings in the macro definition is something @edemaine also already mentioned. I think we should do that eventually, but right now I don't even know how TeX does that, exactly. Will need to figure that out first, then I can see about implementing it. Might add a comment until then.
It's probably worth having a comment in the code about the limitations.
var numArgs = 0;
if (expansion.indexOf("#") !== -1) {
  var stripped = expansion.replace(/##/g, "");
  while (stripped.indexOf("#" + (numArgs + 1)) !== -1) {
This would incorrectly report one argument for {#1}{#3}, correct? Not sure if we really care about that use case.
Correct. I see declaring macros as strings as a convenience function which puts some responsibility on the people writing the macros. We'd also get problems with placeholder tokens in comments, at least if we ever introduce comments. Of course these two can cancel out: a knowledgeable developer may define a macro with an unused last argument by mentioning that argument in a comment.
I guess in the long run we'd want to offer other alternatives. For example, we'd want a way to define a macro in the main input, using \def#1#2#3{…}. The defineMacro function @edemaine rightly suggested could handle such cases for the pre-defined macros. Strings will work for 99.8% of the relevant inputs, I guess, but for the rest we'll find better solutions.
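The {#1}{#3} case can be checked directly. countArgsSequential below mirrors the counting loop in the diff excerpt; countArgsMax is a hypothetical alternative that takes the highest placeholder number instead, shown only for comparison and not something this PR implements. Both strip ## first so that escaped placeholders are ignored.

```javascript
// Mirrors the diff: count #1, #2, ... until the first missing number.
function countArgsSequential(body) {
  let numArgs = 0;
  if (body.indexOf("#") !== -1) {
    const stripped = body.replace(/##/g, ""); // ignore escaped "#"
    while (stripped.indexOf("#" + (numArgs + 1)) !== -1) {
      ++numArgs;
    }
  }
  return numArgs;
}

// Hypothetical alternative: use the largest placeholder number present.
function countArgsMax(body) {
  const stripped = body.replace(/##/g, ""); // ignore escaped "#"
  let max = 0;
  for (const m of stripped.matchAll(/#([1-9])/g)) {
    max = Math.max(max, Number(m[1]));
  }
  return max;
}

console.log(countArgsSequential("{#1}{#3}"), countArgsMax("{#1}{#3}")); // 1 3
```

Neither heuristic can see an argument that the body never mentions, so an unused trailing argument still has to appear somewhere in the string.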
I think initially most of the macros will probably be ones included in KaTeX itself, so it's probably okay to be a bit more strict in what we accept.
src/MacroExpander.js
if (typeof expansion === "string") {
  var numArgs = 0;
  if (expansion.indexOf("#") !== -1) {
    var stripped = expansion.replace(/##/g, "");
I think ## is supposed to expand to # in TeX (a way of quoting # that isn't part of an argument like #1). Presumably this is to ignore those for detecting macro arguments.
Exactly. If I do \def\setfoo#1{\def\foo##1##2{##1#1##2}} that's a macro which takes but a single argument, so the string "\\def\\foo##1##2{##1#1##2}" should have the same effect.
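A string-level sketch of these substitution rules, assuming args is an array of argument strings; the substitute helper is hypothetical (the PR substitutes on token arrays instead). One regex pass turns ## into a literal # and #n into the n-th argument, so in the \setfoo example ##1 and ##2 survive as #1 and #2 for the inner \def.

```javascript
// Hypothetical helper: substitute macro arguments into a body string.
// "##" becomes a literal "#", "#1".."#9" become args[0]..args[8], and any
// other "#" is left unchanged. At each position the regex tries "##" first,
// so "##1" yields "#1" rather than "#" followed by args[0].
function substitute(body, args) {
  return body.replace(/##|#([1-9])/g, (match, digit) => {
    if (digit === undefined) return "#"; // matched "##"
    const arg = args[Number(digit) - 1];
    if (arg === undefined) throw new Error("Missing argument " + match);
    return arg;
  });
}

console.log(substitute("\\def\\foo##1##2{##1#1##2}", ["X"]));
// \def\foo#1#2{#1X#2}
```

The same rule covers a literal hex color inside a macro body: substitute("\\color{##ff0000}{#1}", ["x"]) gives \color{#ff0000}{x}.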
}
tok = expansion[--i]; // next token on stack
if (tok.text === "#") { // ## → #
  expansion.splice(i + 1, 1); // drop first #
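The backward walk in this diff excerpt can be reconstructed as a self-contained sketch. It assumes tokens are { text } objects and that both the macro body and each argument are stored reversed, as on the expander's stack; substituteReversed and its error messages are hypothetical, so this illustrates the index bookkeeping rather than reproducing the merged code. Because the next token in reading order sits at the lower index, the loop finds # first and then inspects expansion[--i] for the digit, which is why it matches #1 and not 1#.

```javascript
// Hypothetical sketch of placeholder substitution on a reversed token array.
// expansion: body tokens in reverse order (last element = first token read).
// args: one reversed token array per macro argument.
function substituteReversed(expansion, args) {
  for (let i = expansion.length - 1; i >= 0; --i) {
    let tok = expansion[i];
    if (tok.text !== "#") continue;
    if (i === 0) throw new Error("Incomplete placeholder at end of macro body");
    tok = expansion[--i]; // next token in reading order
    if (tok.text === "#") { // ## → #
      expansion.splice(i + 1, 1); // drop first #, keep the second as a literal
    } else if (/^[1-9]$/.test(tok.text)) {
      // Replace "#" (index i + 1) and the digit (index i) with the argument.
      const arg = args[Number(tok.text) - 1];
      if (arg === undefined) throw new Error("Missing argument #" + tok.text);
      expansion.splice(i, 2, ...arg);
    } else {
      throw new Error("Not a valid argument number");
    }
  }
  return expansion;
}

// Helper for the example: "x # 1" -> reversed array of { text } tokens.
const toks = s => s.split(" ").map(text => ({ text })).reverse();
const result = substituteReversed(toks("x # 1 # # y"), [toks("a b")]);
console.log(result.slice().reverse().map(t => t.text).join(" ")); // x a b # y
```

After a splice the loop continues below the inserted argument tokens, so a # inside an argument is never treated as a placeholder.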
In the LaTeX world it would be. In our world it's also required if you want a hex-encoded HTML color inside a macro body.
macros:
  \startExp: e^\bgroup
  \endExp: \egroup
tex: \startExp a+b\endExp
// \def\overset#1#2{\binrel@{#2}\binrel@@{\mathop{\kern\z@#2}\limits^{#1}}}
defineMacro("\\overset", "\\mathop{#2}\\limits^{#1}");
defineMacro("\\underset", "\\mathop{#2}\\limits_{#1}");
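To see how these two definitions expand together with argument reading, here is an end-to-end sketch. It assumes string tokens, bodies pre-split into token arrays with placeholders as single tokens such as "#1", and an argument reader that accepts either one token or a balanced {...} group. The defineMacro, readArg, and expandAll below are hypothetical stand-ins, not KaTeX's functions, and the \binrel@ part of the LaTeX definition is left out, as in this PR. Arguments are read raw and only expanded after the body is pushed back, which is how \underset-{\overset+/} ends up with both macros expanded.

```javascript
// Hypothetical macro registry (not KaTeX's implementation).
const macros = new Map();

// Register a macro; placeholders are single tokens "#1".."#9".
function defineMacro(name, body, numArgs) {
  macros.set(name, { body, numArgs });
}

defineMacro("\\overset",
  ["\\mathop", "{", "#2", "}", "\\limits", "^", "{", "#1", "}"], 2);
defineMacro("\\underset",
  ["\\mathop", "{", "#2", "}", "\\limits", "_", "{", "#1", "}"], 2);

// Read one undelimited argument from the reversed stack: a single token, or
// a balanced {...} group with the outer braces removed. Tokens are taken as
// they are, so macros inside the argument are not expanded yet.
function readArg(stack) {
  const tok = stack.pop();
  if (tok === undefined) throw new Error("Missing argument");
  if (tok !== "{") return [tok];
  const arg = [];
  let depth = 1;
  for (;;) {
    const t = stack.pop();
    if (t === undefined) throw new Error("Unbalanced braces in argument");
    if (t === "{") depth++;
    else if (t === "}" && --depth === 0) return arg;
    arg.push(t);
  }
}

// Expand every macro. The substituted body is pushed back onto the stack,
// so macros it contains (including ones that arrived via arguments) are
// expanded only when they are popped again.
function expandAll(tokens) {
  const stack = tokens.slice().reverse();
  const out = [];
  while (stack.length > 0) {
    const tok = stack.pop();
    const macro = macros.get(tok);
    if (macro === undefined) {
      out.push(tok);
      continue;
    }
    const args = [];
    for (let n = 0; n < macro.numArgs; n++) args.push(readArg(stack));
    const expanded = macro.body.flatMap(t =>
      /^#[1-9]$/.test(t) ? args[Number(t[1]) - 1] : [t]);
    for (let i = expanded.length - 1; i >= 0; i--) stack.push(expanded[i]);
  }
  return out;
}

// \underset-{\overset+/}
console.log(expandAll(["\\underset", "-", "{", "\\overset", "+", "/", "}"]).join(" "));
// \mathop { \mathop { / } \limits ^ { + } } \limits _ { - }
```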
Could you add a screenshot test for these?
@kevinbarabash I added screenshots, and just rebased the branch to resolve one conflict.
Is there a reason that this PR never got merged into master?
… as suggested by Erik Demaine, to future-proof the code.
I just rebased this to use
@edemaine I forgot about it. 😞
@gagern thanks for adding underset/overset. Sorry for the delay in merging this.
Thanks @kevinbarabash !! I'll have to look back at my other pending PR requests which were blocked on this.
This fixes #484 by introducing \overset and \underset as predefined macros. Shortly after starting work on this, I realized that handling of macro argument expansion was missing, too, so that's where most of the actual work was.
Contrary to LaTeX, we can't error out on a naked # outside macro bodies, since that's needed for hex colors. Well, as long as we can be strict inside macros, that should be no problem.
The \binrel@ aspect of \overset and \underset is still missing in this pull request. Adding that may require some more work. See #484 (comment) for details.