Linting the spec #173

@bakkot

I would like to start enforcing some basic sanity checks for the spec. Some of them will end up implemented in other projects, but I would like to use this issue to track possible lint rules everywhere.

The main goal of this is to reduce editorial churn and correctness issues, but I know that it would also be helpful to people who build tooling based on the spec, like https://github.com/jscert/jsexplain (so cc @brabalan in case you have any ideas).

Currently these are mostly enforced by editors noticing them or @jmdyck submitting after-the-fact PRs, which isn't sustainable.

Here are a few possibilities to get started. Please suggest more.

  • Output is valid HTML
  • No trailing whitespace
  • In algorithms:
    • Consistent spacing, e.g. foo ( a, b ) in algorithm headers and foo(a, b) in algorithm steps.
    • Parameter names are surrounded by _
    • If an If has substeps, the If line ends with , then
    • If an If has an Else which has substeps, it is spelled Else, not Otherwise
    • If an If has an else on the same line, it is spelled ; else rather than , else or ; else, or ; otherwise (we are inconsistent about else vs otherwise here, and I am OK not enforcing it, but we should at least enforce the semicolon and lack of following space).
    • Lines end in ., :, ,, do, or then
    • Any line not ending in . is followed by an indented sequence of steps.
    • Consistent casing for variable names - currently we're very bad about this one
    • If we say "A and B" or "A, B, and C", the commas are as they are in those examples (rather than "A, and B" or "A, B and C")
    • Consistently "let x be" and "set x to" (e.g. not "let x to", "set x be", or "set x to be")
    • Ban If _foo_ is present, let _bar_ be _foo_; else let _bar_ be *undefined*. (Editorial: Treat not present parameters as undefined ecma262#1411) [edit: actually that PR only applied to functions, not AOs, so we can only do this if we first sort out the treatment of missing parameters in AOs, e.g. by banning optional parameters in AOs]
    • "... to a new empty List" rather than "to a new List", "to be a new List", "to be a new empty List"
      • Probably something similar for records
    • The last step should not be Return., since that's implicit (as of Editorial: describe behavior for algorithms without a "Return" ecma262#2397).
  • Grammar lookahead restrictions and flags are omitted in early error definitions and syntax-directed operations
  • In the grammar:
    • Lookahead restrictions and flags do not have spaces between brackets (so [lookahead != `let`] or [+Await] rather than [ lookahead != `let` ] or [ +Await ], etc)
    • One space before the : on the LHS
    • One space between each terminal or nonterminal on the LHS
    • Grammar flags are consistent: every ?-prefixed flag on the RHS appears as a parameter flag on the LHS, and every LHS flag is passed down at least one nonterminal in one production on the RHS, or is used to gate a production (see more about what grammar flags mean here)
  • A bunch of HTML consistency stuff:
    • tags are lowercase (auto-formatter #367)
    • attribute values are quoted (or unquoted; I don't care as long as we're consistent)
    • tags don't have any unexpected attributes (e.g.)
    • tags have all the expected attributes (e.g. <emu-xref="foo"> should be caught, since they meant <emu-xref href="foo">)
    • for tags for which the closing tag is optional, they are always included (or not included; again, as long as we're consistent)
    • ≥ is spelled &ge; (etc) (formatter: render HTML entities #481)
    • no unknown emu tags (lint: some checks for legal tags/attributes #279)
    • sec- is only a prefix for an ID when attached to a clause (cf Editorial: Don't use "sec-" prefix for dfn ids ecma262#2103)
    • all rows in a table have the same number of cells (account for colspan)
    • consistent indentation
    • emu-grammar and emu-alg tags are not adjacent with others of their kind
    • no more than one blank line in a row
      • and maybe consistent rules about where blank lines go, at least in some cases: e.g., never between <emu-clause> and <h1>.
    • Consistent spacing for records: exactly the spacing in { [[key]]: value, [[key2]]: value2 }
  • Consistent spelling
    • the *this* value, not *this* value or the *this* object
      • we have a lint for "this object", but not for "this value" without a leading "the", because the latter form is still in use (generally as an argument to an AO call)
    • British vs American spelling for words where it's an issue, like "behaviour"
    • "one's complement" and "two's complement", not "1's complement" or "2's complement"
    • "uppercase" and "lowercase", not "upper case" or "upper-case" (Editorial: Standardize the spelling of "uppercase" and "lowercase" ecma262#2598)
  • Annex A ("Grammar Summary") has emu-prodrefs to all productions (or, ideally, is automatically generated)
  • As of Editorial: use consistent wording for abstract operation preambles ecma262#1914, every abstract operation has a preamble in the correct format (though, what is an abstract operation from the perspective of ecmarkup? - I guess an emu-clause with an AOID is a reasonable heuristic.)
  • When the steps or prose for a syntax-directed operation refer to the name of a nonterminal, it is surrounded with |, as in |UnicodeLeadSurrogate|.
  • In syntax-directed operations,
    • All referenced nonterminals occur in the production for the SDO.
    • In the algorithm steps or prose, opt subscripts and grammar parameters are not included.
  • Miscellaneous stuff:
    • ~~Always *+0*<sub>𝔽</sub> or *-0*<sub>𝔽</sub>, never *0*<sub>𝔽</sub>.~~ (add some lint rules for numbers #257)
    • "be the Record {", not "be a new Record {"
    • Never *+1* for any string of digits except 0.
    • Exactly one space between sentences.
    • <p> is not followed by a linebreak, and </p> is not preceded by a linebreak (even with intervening whitespace).
    • [Cc]lauses? \d should be forbidden.
    • An inline if does not have a then: If foo, return false. not If foo, then return false.
    • "ECMAScript language value", not "ECMAScript value"
  • No namespace collisions between constants and AOs (and maybe other namespaces)
  • All AOs have structured headers (once people have had time to get used to the new syntax)
  • No unnecessary explicit suppressions / annotations for can-call-user-code
  • Every algorithm returns.
  • consistently "a number or a bigint", in that order, in types: Editorial: Use consistent phrasing for parameters that are Number or BigInt ecma262#2622
  • no unused Let bindings or captured variables in AOs (Editorial: remove unused capture in %TypedArray%.prototype.sort ecma262#2836; lint for used-but-not-declared and declared-but-not-used variables in algorithms #483)
  • If _x_ is *null*, return *null* rather than If _x_ is *null*, return _x_, per Editorial: Prefer returning static values after alias comparison ecma262#3122 - anywhere there's a comparison against a thing in *s or ~s, or a literal number or +/-∞, and then the alias is returned, we should return the constant instead.
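
To make a few of the algorithm rules above concrete, here is a rough sketch of how they might be checked. It is not how ecmarkup implements anything: the `(depth, text)` step representation, the `lint_steps` function, and the rule table are all made up for illustration. It only covers line endings, "If ... then" with substeps, inline "then", "let x be" / "set x to", and "a new empty List".

```python
import re

# Hypothetical rule table (not ecmarkup's): (pattern, message) pairs for
# phrasing rules that can be checked one step at a time.
PHRASE_RULES = [
    (re.compile(r"\blet _\w+_ to\b", re.I),
     'write "let x be", not "let x to"'),
    (re.compile(r"\bset _\w+_ (?:to be|be)\b", re.I),
     'write "set x to", not "set x be" or "set x to be"'),
    (re.compile(r"\bto be a new (?:empty )?List\b|\bto a new List\b"),
     'write "to a new empty List"'),
    (re.compile(r"^If [^;]*, then \S"),
     'an inline If does not take "then"'),
]

# A step must end in ".", ":", ",", "do", or "then".
LINE_END = re.compile(r"(?:[.:,]|\bdo|\bthen)$")


def lint_steps(steps):
    """Lint a list of (depth, text) steps; return (line, message) pairs.

    The (depth, text) form is an assumed pre-parsed representation of an
    algorithm, where depth is the nesting level of the step.
    """
    problems = []
    for i, (depth, raw) in enumerate(steps):
        line = i + 1
        text = raw.rstrip()
        if text != raw:
            problems.append((line, "trailing whitespace"))
        if not LINE_END.search(text):
            problems.append((line, "step has an unexpected ending"))
        has_substeps = i + 1 < len(steps) and steps[i + 1][0] > depth
        if not text.endswith(".") and not has_substeps:
            problems.append((line, "expected indented substeps"))
        if text.startswith("If ") and has_substeps and not text.endswith(", then"):
            problems.append((line, 'If with substeps must end with ", then"'))
        for pattern, message in PHRASE_RULES:
            if pattern.search(text):
                problems.append((line, message))
    return problems


example = [
    (0, "Let _list_ to a new List."),
    (0, "If _x_ is *true*, then"),
    (1, "Return *false*."),
    (0, "If _y_ is *null*, then return *null*."),
]
for line, message in lint_steps(example):
    print(f"step {line}: {message}")
```

Regexes like these are enough for the purely textual rules; anything involving casing consistency or "A, B, and C" comma placement would probably need more context than a single step.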

It would also be nice to have a few more static-analysis-y checks, like

  • The grammar should be unambiguous
    • And LR(k)
  • Typechecking for abstract operations
    • All used operations are defined
    • They are invoked with the right number of arguments
    • They don't reference any values which are not defined (lint for used-but-not-declared and declared-but-not-used variables in algorithms #483)
      • And when updating an already defined value, this is done with set rather than let, increase, increment, add, etc.
    • Their return value is treated as a completion record or not as a completion record, as appropriate (pending Sorting out completion records ecma262#1796)
    • In an ideal world, actual typechecking for values
      • Given that, enforce the * vs ~ vs _ vs " rules for referring to different kinds of values
      • As a particular case, algorithms should say If _x_ is *true*, not If _x_ is true.
  • Grammar productions have all and only the appropriate flags
  • All syntax-directed operations correspond to actual productions
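
As a rough illustration of the variable checks in that list (used-but-not-declared, declared-but-not-used, and re-declaring with Let instead of Set), a minimal sketch might look like the following. The `check_variables` function and its inputs are hypothetical, and it makes simplifying assumptions: steps are flat strings, the only declaration form it recognizes is "Let _x_ be", and it ignores branch scoping, "For each" bindings, and closure captures.

```python
import re

VAR = re.compile(r"_([A-Za-z][A-Za-z0-9]*)_")
LET = re.compile(r"\bLet _([A-Za-z][A-Za-z0-9]*)_ be\b")


def check_variables(params, steps):
    """Return sorted (line, message) pairs for simple def/use problems.

    params: parameter names of the (hypothetical) abstract operation.
    steps:  algorithm steps as plain strings, one per line.
    """
    declared_at = {name: 0 for name in params}  # line 0 marks a parameter
    used = set()
    problems = []
    for line, text in enumerate(steps, start=1):
        let = LET.search(text)
        # Drop the "Let _x_ be" prefix so the declared name is not counted
        # as a use, while uses on the right-hand side still are.
        rest = text if let is None else text[:let.start()] + text[let.end():]
        for name in dict.fromkeys(VAR.findall(rest)):
            if name not in declared_at:
                problems.append((line, f"_{name}_ is used but not declared"))
            used.add(name)
        if let is not None:
            name = let.group(1)
            if name in declared_at:
                problems.append(
                    (line, f"_{name}_ is already declared; update it with Set, not Let"))
            else:
                declared_at[name] = line
    for name, line in declared_at.items():
        if line > 0 and name not in used:
            problems.append((line, f"_{name}_ is declared but never used"))
    return sorted(problems)


example_steps = [
    "Let _len_ be the length of _list_.",
    "Let _unused_ be 0.",
    "Let _len_ be _len_ + 1.",
    "Return _total_.",
]
for line, message in check_variables(["list"], example_steps):
    print(f"step {line}: {message}")
```

A real check would have to track which branch a declaration lives in, so treat this only as a picture of the def/use bookkeeping.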

Edit: some of these are done in #199, #205, #207, #209, #210, and #239. I've struck them from the above list. I'm keeping this issue open to track remaining items.
