Builtin macros, macro arguments, \overset and \underset by gagern · Pull Request #605 · KaTeX/KaTeX

gagern · 2017-01-06T17:59:51Z

This fixes #484, by introducing \overset and \underset as predefined macros. Shortly after starting work on this, I realized that handling of macro argument expansion was missing, too, so that's where most of the actual work was.

Contrary to LaTeX, we can't error out on a naked # outside macro bodies, since that's needed for hex colors. Well, as long as we can be strict inside macros, that should be no problem.

The \binrel@ aspect of \overset and \underset is still missing in this pull request. Adding that may require some more work. See #484 (comment) for details.

xymostech

This looks awesome! Thanks for implementing the macro expansion code, I'm sure that'll be very useful as time goes on.

My TeX-purity is complaining that it doesn't support everything that real TeX macro expansion does, but oh well. We don't need that for now.

xymostech · 2017-01-06T20:15:53Z

src/MacroExpander.js

+                    tok = expansion[--i]; // next token on stack
+                    if (tok.text === "#") { // ## → #
+                        expansion.splice(i + 1, 1); // drop first #
+                    } else if (/^[1-9]$/.test(tok.text)) {


This is looking backwards through the text, correct? Doesn't that mean that this will find a '#', then look backwards one more character to find the number? I'd assume that this would find things that look like 1# not #1. Maybe I'm missing something though, since the tests pass...

The whole MacroExpander is working kind of backwards. It uses an array as a stack, with tokens added by lexing (one at a time) or macro expansion (may be many), and tokens removed by consumption by the parser (one at a time). The canonical way of using an array as a stack is using its push and pop methods, which modify the tail end of the array. So if subsequent pop calls should return the tokens in order, the stack itself has to have them in reverse order, with higher indices corresponding to eralier tokens. That's the reason I have two calls to reverse in there by now.

If this is causing too much confusion, we might add bigger comments than those we have, but I guess those would still get lost in the diffs you see on GitHub. Or we could make the stack work forward, and hope that a shift is fast enough. Or we could avoid removing elements, and instead maintain a pointer indicating the current position. Do you think any of these approaches would be better than what we have now?

I guess that although the difference between push/pop and shift/unshift appears to be considerable, it's probably still negligible compared to the other processing we do, so I'm willing to reformulate all of this in terms of these if you want me to.

See #493 (diff) for previous discussion on this.

It's worse than that benchmark suggests (which is already 10x on a size-4 array): shift/unshift's cost grow linearly with array size, so things will get worse and worse as if your stack grows at all, while push/pop take constant time (amortized). So I'd avoid that. Indexing instead of removal is a reasonable way to go, though. (In general, it sounds like you're looking for a deque data structure.)

Aaahhh! I see now. I didn't notice that expansion was reversed earlier. This is fine, it was just a bit confusing.

@edemaine sure I'd want a deque, or a proper stack. Might even ask for a decently typed language while we are at it… Well, looks like we can stick to the reversed stack for now.

xymostech · 2017-01-06T20:21:06Z

src/MacroExpander.js

+            expansion.numArgs = numArgs;
            this.macros[name] = expansion;
        }
+        if (expansion.numArgs) {


This expansion looks like it works (or I can't find any glaring flaws)! However, you might want to note for future macro-writers that this is somewhat simplified from the true TeX macro expansion? For instance, it won't expand macros inside of the arguments (which might matter if, for instance, the macro expands to something with a } in it). It also won't handle any argument lists except for plain \def\x#1#2...#n{}, whereas TeX can handle \def\x#1 some random text here #2{}.

Not sure that we'd ever actually want to implement those things, but it seems useful to note that there's some functionality intentionally missing.

Yes, this is but the first step.

But macros inside the arguments already work. Try \underset-{\overset+/} if you want to. That's because expansion gets triggered when a token is removed from the stack, and works by putting the expanded form on the stack. So the arguments end up on the stack, from where they will get expanded once the parser gets there. Which I think is what LaTeX does, too.

Macros with unbalanced } in them are somewhat tricky. As far as I know, you can't really define them in TeX either, at least not using that notation. That's what \bgroup and \egroup are for. Come to think of it, it should be perfectly possible to use our built-in macro definition facility to define \bgroup and \egroup. The requirement to have balanced brackets is in the argument handling, and it's in the Parser down the line. Neither of these should matter. Will give that a try. By the way, do we care about the distinction between \bgroup and \begingroup?

The case of fixed strings in the macro definition is something @edemaine also already mentioned. I think we should do that eventually, but right now I don't even know how TeX does that, exactly. Will need to figure that out first, then I can see about implementing it. Might add a comment until then.

It's probably worth having a comment in the code about the limitations.

I've added such a comment in 8b3f39e.

xymostech · 2017-01-06T20:22:20Z

src/MacroExpander.js

+            var numArgs = 0;
+            if (expansion.indexOf("#") !== -1) {
+                var stripped = expansion.replace(/##/g, "");
+                while (stripped.indexOf("#" + (numArgs + 1)) !== -1) {


This would incorrectly report one argument for {#1}{#3}, correct? Not sure if we really care about that use case.

Correct. I see declaring macros as strings as a convenience function which puts some responsibility on the people writing the macros. We'd also get problems with placeholder tokens in comments, at least if we ever introduce comments. Of course these two can cancel out: a knowledgeable developer may define a macro with an unused last argument by mentioning that argument in a comment.

I guess in the long run we'd want to offer other alternatives. For example, we'd want a way to define a macro in the main input, using \def#1#2#3{…}. The defineMacro function @edemaine rightly suggested could handle such cases for the pre-defined macros. Strings will work for 99.8% of the relevant inputs, I guess, but for the rest we'll find better solutions.

I think initially most of the macros, will probably be ones included in KaTeX itself so it's probably okay to be a bit more strict in what we accept.

k4b7 · 2017-01-08T00:42:31Z

src/MacroExpander.js

        if (typeof expansion === "string") {
+            var numArgs = 0;
+            if (expansion.indexOf("#") !== -1) {
+                var stripped = expansion.replace(/##/g, "");


Why are ## being stripped out?

I think ## is supposed to expand to # in TeX (a way of quoting # that isn't part of an argument like #1). Presumably this is to ignore those for detecting macro arguments.

Exactly. If I do \def\setfoo#1{\def\foo##1##2{##1#1##2}} that's a macro which takes but a single argument, so the string "\\def\\foo##1##2{##1#1##2}" should have the same effect.

k4b7 · 2017-01-08T00:54:08Z

src/MacroExpander.js

+                    }
+                    tok = expansion[--i]; // next token on stack
+                    if (tok.text === "#") { // ## → #
+                        expansion.splice(i + 1, 1); // drop first #


This is to handled nested \defs?

In the LaTeX world if would be. In our world it's also required if you want a hex-encoded HTML color inside a macro body.

k4b7 · 2017-01-08T00:59:53Z

test/screenshotter/ss_data.yaml

+    macros:
+        \startExp: e^\bgroup
+        \endExp: \egroup
+    tex: \startExp a+b\endExp


k4b7 · 2017-01-08T01:00:09Z

src/macros.js

+
+// \def\overset#1#2{\binrel@{#2}\binrel@@{\mathop{\kern\z@#2}\limits^{#1}}}
+defineMacro("\\overset", "\\mathop{#2}\\limits^{#1}");
+defineMacro("\\underset", "\\mathop{#2}\\limits_{#1}");


Could you add a screenshot test for these?

gagern · 2017-01-11T12:44:29Z

@kevinbarabash I added screenshots, and just rebased the branch to resolve one conflict.

edemaine · 2017-04-04T14:11:10Z

Is there a reason that this PR never got merged into master?

* Ship predefined macros with the library, in macros.js. * Allow macro arguments #1 and so on, with argument count deduced from string. * Use these features to implement \overset and \underset, fixes KaTeX#484.

… as suggested by Erik Demaine, to future-proof the code.

gagern · 2017-04-05T20:47:50Z

I just rebased this to use let and const.

k4b7 · 2017-04-16T00:37:33Z

@edemaine I forgot about it. 😞

k4b7 · 2017-04-16T00:39:33Z

@gagern thanks for the adding underset/overset. Sorry for the delay in merging this.

edemaine · 2017-04-16T03:30:29Z

Thanks @kevinbarabash !! I'll have to look back at my other pending PR requests which were blocked on this.

gagern added the GH Review: review-needed label Jan 6, 2017

edemaine mentioned this pull request Jan 6, 2017

Implement AMSMath's \dots #599

Closed

xymostech approved these changes Jan 6, 2017

View reviewed changes

k4b7 reviewed Jan 8, 2017

View reviewed changes

test/screenshotter/ss_data.yaml

macros:

\startExp: e^\bgroup

\endExp: \egroup

tex: \startExp a+b\endExp

Copy link
Copy Markdown

Member

k4b7 Jan 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

k4b7 reviewed Jan 8, 2017

View reviewed changes

gagern force-pushed the overset branch from 1b1f782 to 1e6306a Compare January 11, 2017 12:43

edemaine mentioned this pull request Jan 13, 2017

Add support for \implies and \iff #190

Closed

k4b7 added GH Review: needs-revision and removed GH Review: review-needed labels Jan 15, 2017

edemaine mentioned this pull request Apr 5, 2017

I use \rm{} but it doesn t work #252

Closed

gagern added 5 commits April 5, 2017 21:42

Builtin macros, macro arguments, overset and underset

7ec4550

* Ship predefined macros with the library, in macros.js. * Allow macro arguments #1 and so on, with argument count deduced from string. * Use these features to implement \overset and \underset, fixes KaTeX#484.

Introduce defineMacro function

96d1e6a

… as suggested by Erik Demaine, to future-proof the code.

Support \bgroup and \egroup

e2763a3

Indicate missing support for delimited macros

28ad473

Added screenshotter tests for overset and underset

3a95d88

gagern force-pushed the overset branch from 1e6306a to 3a95d88 Compare April 5, 2017 20:47

gagern mentioned this pull request Apr 7, 2017

Allow macro definitions in settings #493

Merged

edemaine mentioned this pull request Apr 8, 2017

Old font command support: \rm, \sf, \tt, \bf, \it #675

Merged

k4b7 merged commit 2c92a9a into KaTeX:master Apr 16, 2017

k4b7 mentioned this pull request Apr 16, 2017

KaTex failed to load Chemical equation #669

Closed

gagern mentioned this pull request Jun 8, 2017

Make a new release (was: \left and \right don't work in macros with arguments) #720

Closed

Uh oh!

Conversation

gagern commented Jan 6, 2017

Uh oh!

xymostech left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gagern commented Jan 11, 2017

Uh oh!

edemaine commented Apr 4, 2017

Uh oh!

gagern commented Apr 5, 2017

Uh oh!

k4b7 commented Apr 16, 2017

Uh oh!

k4b7 commented Apr 16, 2017

Uh oh!

edemaine commented Apr 16, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants