Skip to content

Builtin macros, macro arguments, \overset and \underset#605

Merged
k4b7 merged 5 commits intoKaTeX:masterfrom
gagern:overset
Apr 16, 2017
Merged

Builtin macros, macro arguments, \overset and \underset#605
k4b7 merged 5 commits intoKaTeX:masterfrom
gagern:overset

Conversation

@gagern
Copy link
Copy Markdown
Collaborator

@gagern gagern commented Jan 6, 2017

This fixes #484, by introducing \overset and \underset as predefined macros. Shortly after starting work on this, I realized that handling of macro argument expansion was missing, too, so that's where most of the actual work was.

Contrary to LaTeX, we can't error out on a naked # outside macro bodies, since that's needed for hex colors. Well, as long as we can be strict inside macros, that should be no problem.

The \binrel@ aspect of \overset and \underset is still missing in this pull request. Adding that may require some more work. See #484 (comment) for details.

Copy link
Copy Markdown
Contributor

@xymostech xymostech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks awesome! Thanks for implementing the macro expansion code, I'm sure that'll be very useful as time goes on.

My TeX-purity is complaining that it doesn't support everything that real TeX macro expansion does, but oh well. We don't need that for now.

tok = expansion[--i]; // next token on stack
if (tok.text === "#") { // ## → #
expansion.splice(i + 1, 1); // drop first #
} else if (/^[1-9]$/.test(tok.text)) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking backwards through the text, correct? Doesn't that mean that this will find a '#', then look backwards one more character to find the number? I'd assume that this would find things that look like 1# not #1. Maybe I'm missing something though, since the tests pass...

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole MacroExpander is working kind of backwards. It uses an array as a stack, with tokens added by lexing (one at a time) or macro expansion (may be many), and tokens removed by consumption by the parser (one at a time). The canonical way of using an array as a stack is using its push and pop methods, which modify the tail end of the array. So if subsequent pop calls should return the tokens in order, the stack itself has to have them in reverse order, with higher indices corresponding to eralier tokens. That's the reason I have two calls to reverse in there by now.

If this is causing too much confusion, we might add bigger comments than those we have, but I guess those would still get lost in the diffs you see on GitHub. Or we could make the stack work forward, and hope that a shift is fast enough. Or we could avoid removing elements, and instead maintain a pointer indicating the current position. Do you think any of these approaches would be better than what we have now?

I guess that although the difference between push/pop and shift/unshift appears to be considerable, it's probably still negligible compared to the other processing we do, so I'm willing to reformulate all of this in terms of these if you want me to.

See #493 (diff) for previous discussion on this.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worse than that benchmark suggests (which is already 10x on a size-4 array): shift/unshift's cost grow linearly with array size, so things will get worse and worse as if your stack grows at all, while push/pop take constant time (amortized). So I'd avoid that. Indexing instead of removal is a reasonable way to go, though. (In general, it sounds like you're looking for a deque data structure.)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aaahhh! I see now. I didn't notice that expansion was reversed earlier. This is fine, it was just a bit confusing.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@edemaine sure I'd want a deque, or a proper stack. Might even ask for a decently typed language while we are at it… Well, looks like we can stick to the reversed stack for now.

expansion.numArgs = numArgs;
this.macros[name] = expansion;
}
if (expansion.numArgs) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This expansion looks like it works (or I can't find any glaring flaws)! However, you might want to note for future macro-writers that this is somewhat simplified from the true TeX macro expansion? For instance, it won't expand macros inside of the arguments (which might matter if, for instance, the macro expands to something with a } in it). It also won't handle any argument lists except for plain \def\x#1#2...#n{}, whereas TeX can handle \def\x#1 some random text here #2{}.

Not sure that we'd ever actually want to implement those things, but it seems useful to note that there's some functionality intentionally missing.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is but the first step.

But macros inside the arguments already work. Try \underset-{\overset+/} if you want to. That's because expansion gets triggered when a token is removed from the stack, and works by putting the expanded form on the stack. So the arguments end up on the stack, from where they will get expanded once the parser gets there. Which I think is what LaTeX does, too.

Macros with unbalanced } in them are somewhat tricky. As far as I know, you can't really define them in TeX either, at least not using that notation. That's what \bgroup and \egroup are for. Come to think of it, it should be perfectly possible to use our built-in macro definition facility to define \bgroup and \egroup. The requirement to have balanced brackets is in the argument handling, and it's in the Parser down the line. Neither of these should matter. Will give that a try. By the way, do we care about the distinction between \bgroup and \begingroup?

The case of fixed strings in the macro definition is something @edemaine also already mentioned. I think we should do that eventually, but right now I don't even know how TeX does that, exactly. Will need to figure that out first, then I can see about implementing it. Might add a comment until then.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably worth having a comment in the code about the limitations.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added such a comment in 8b3f39e.

var numArgs = 0;
if (expansion.indexOf("#") !== -1) {
var stripped = expansion.replace(/##/g, "");
while (stripped.indexOf("#" + (numArgs + 1)) !== -1) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would incorrectly report one argument for {#1}{#3}, correct? Not sure if we really care about that use case.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. I see declaring macros as strings as a convenience function which puts some responsibility on the people writing the macros. We'd also get problems with placeholder tokens in comments, at least if we ever introduce comments. Of course these two can cancel out: a knowledgeable developer may define a macro with an unused last argument by mentioning that argument in a comment.

I guess in the long run we'd want to offer other alternatives. For example, we'd want a way to define a macro in the main input, using \def#1#2#3{…}. The defineMacro function @edemaine rightly suggested could handle such cases for the pre-defined macros. Strings will work for 99.8% of the relevant inputs, I guess, but for the rest we'll find better solutions.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think initially most of the macros, will probably be ones included in KaTeX itself so it's probably okay to be a bit more strict in what we accept.

if (typeof expansion === "string") {
var numArgs = 0;
if (expansion.indexOf("#") !== -1) {
var stripped = expansion.replace(/##/g, "");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are ## being stripped out?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ## is supposed to expand to # in TeX (a way of quoting # that isn't part of an argument like #1). Presumably this is to ignore those for detecting macro arguments.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. If I do \def\setfoo#1{\def\foo##1##2{##1#1##2}} that's a macro which takes but a single argument, so the string "\\def\\foo##1##2{##1#1##2}" should have the same effect.

}
tok = expansion[--i]; // next token on stack
if (tok.text === "#") { // ## → #
expansion.splice(i + 1, 1); // drop first #
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to handled nested \defs?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the LaTeX world if would be. In our world it's also required if you want a hex-encoded HTML color inside a macro body.

macros:
\startExp: e^\bgroup
\endExp: \egroup
tex: \startExp a+b\endExp
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!


// \def\overset#1#2{\binrel@{#2}\binrel@@{\mathop{\kern\z@#2}\limits^{#1}}}
defineMacro("\\overset", "\\mathop{#2}\\limits^{#1}");
defineMacro("\\underset", "\\mathop{#2}\\limits_{#1}");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a screenshot test for these?

@gagern
Copy link
Copy Markdown
Collaborator Author

gagern commented Jan 11, 2017

@kevinbarabash I added screenshots, and just rebased the branch to resolve one conflict.

@edemaine
Copy link
Copy Markdown
Member

edemaine commented Apr 4, 2017

Is there a reason that this PR never got merged into master?

gagern added 5 commits April 5, 2017 21:42
* Ship predefined macros with the library, in macros.js.
* Allow macro arguments #1 and so on, with argument count deduced from string.
* Use these features to implement \overset and \underset, fixes KaTeX#484.
… as suggested by Erik Demaine, to future-proof the code.
@gagern
Copy link
Copy Markdown
Collaborator Author

gagern commented Apr 5, 2017

I just rebased this to use let and const.

@k4b7
Copy link
Copy Markdown
Member

k4b7 commented Apr 16, 2017

@edemaine I forgot about it. 😞

@k4b7 k4b7 merged commit 2c92a9a into KaTeX:master Apr 16, 2017
@k4b7
Copy link
Copy Markdown
Member

k4b7 commented Apr 16, 2017

@gagern thanks for the adding underset/overset. Sorry for the delay in merging this.

@edemaine
Copy link
Copy Markdown
Member

Thanks @kevinbarabash !! I'll have to look back at my other pending PR requests which were blocked on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

\overset and \underset not working, probably others

4 participants