
perf(perser-adaper-json): fold syntactic analysis phases #407


Merged
merged 1 commit into from
May 19, 2021
1 change: 1 addition & 0 deletions apidom/packages/apidom-ast/src/Error.ts
@@ -4,6 +4,7 @@ import Node from './Node';

interface Error extends Node {
  value: unknown;
  isUnexpected: boolean;
}

const Error: stampit.Stamp<Error> = stampit(Node, {
2 changes: 2 additions & 0 deletions apidom/packages/apidom-ast/src/index.ts
@@ -14,6 +14,7 @@ export { default as JsonTrue } from './json/nodes/JsonTrue';
export { default as JsonFalse } from './json/nodes/JsonFalse';
export { default as JsonNull } from './json/nodes/JsonNull';
export {
isDocument as isJsonDocument,
isFalse as isJsonFalse,
isProperty as isJsonProperty,
isStringContent as isJsonStringContent,
@@ -59,6 +60,7 @@ export { default as Literal } from './Literal';
export { Point, default as Position } from './Position';
export { default as Error } from './Error';
export { default as ParseResult } from './ParseResult';
export { isParseResult, isLiteral, isPoint, isPosition } from './predicates';
// AST traversal related exports
export {
getVisitFn,
2 changes: 2 additions & 0 deletions apidom/packages/apidom-ast/src/json/nodes/predicates.ts
@@ -1,5 +1,7 @@
import { isNodeType } from '../../predicates';

export const isDocument = isNodeType('document');

export const isString = isNodeType('string');

export const isFalse = isNodeType('false');
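
The predicates above are all built by partially applying `isNodeType` to a node type string. The implementation of `isNodeType` is not part of this diff, so the following is only a minimal sketch of how such a curried predicate could look, assuming nodes expose a string `type` field. The `AstNode` interface and this version of `isNodeType` are illustrative and not the library's actual code.

```typescript
// Hypothetical node shape; the real apidom-ast Node may differ.
interface AstNode {
  type: string;
}

// Curried predicate factory: fix the expected type first, test nodes later.
const isNodeType =
  (type: string) =>
  (node: unknown): node is AstNode =>
    typeof node === 'object' &&
    node !== null &&
    (node as { type?: unknown }).type === type;

const isDocument = isNodeType('document');
const isFalse = isNodeType('false');

console.log(isDocument({ type: 'document' })); // true
console.log(isFalse({ type: 'document' })); // false
```

Defining each predicate as a one-line partial application keeps the exported list easy to extend, which is what the added `isDocument` line does.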
1 change: 1 addition & 0 deletions apidom/packages/apidom-ls/test/openapi-json-async.ts
@@ -187,6 +187,7 @@ describe('apidom-ls-async', function () {

assert.deepEqual(result, expected as Diagnostic[]);
doc = TextDocument.create('foo://bar/file.json', 'json', 0, specError);
console.dir(doc);
result = await languageService.doValidation(doc, validationContext);

assert.deepEqual(result, [
53 changes: 48 additions & 5 deletions apidom/packages/apidom-parser-adapter-json/README.md
@@ -1,11 +1,54 @@
# apidom-parser-adapter-json

`apidom-parser-adapter-json` is a parser adapter for the [JSON format](https://www.json.org/json-en.html).
This parser adapter uses [tree-sitter](https://www.npmjs.com/package/tree-sitter) / [web-tree-sitter](https://www.npmjs.com/package/web-tree-sitter) as an underlying parser.
Tree-sitter uses [tree-sitter-json grammar](https://www.npmjs.com/package/tree-sitter-json) to produce [CST](https://tree-sitter.github.io/tree-sitter/using-parsers#syntax-nodes) from a source string.

[CST](https://tree-sitter.github.io/tree-sitter/using-parsers#syntax-nodes) produced by tree-sitter parser is [syntactically analyzed](https://github.com/swagger-api/apidom/blob/master/apidom/packages/apidom-parser-adapter-json/src/parser/syntactic-analysis.ts) and [JSON AST](https://github.com/swagger-api/apidom/tree/master/apidom/packages/apidom-ast#json-ast-nodes) is produced.
JSON AST is then transformed into generic ApiDOM structure using [base ApiDOM namespace](https://github.com/swagger-api/apidom/tree/master/apidom/packages/apidom#base-namespace).
[CST](https://tree-sitter.github.io/tree-sitter/using-parsers#syntax-nodes) produced by lexical analysis is [syntactically analyzed](https://github.com/swagger-api/apidom/blob/master/apidom/packages/apidom-parser-adapter-json/src/syntactic-analysis),
and an ApiDOM structure using the [base ApiDOM namespace](https://github.com/swagger-api/apidom/tree/master/apidom/packages/apidom#base-namespace) is produced.
[JSON AST](https://github.com/swagger-api/apidom/tree/master/apidom/packages/apidom-ast#json-ast-nodes) is produced.


## Parse phases

The parse stage takes a JSON string and produces an ApiDOM structure using the [base ApiDOM namespace](https://github.com/swagger-api/apidom/tree/master/apidom/packages/apidom#base-namespace).
There are two phases of parsing: **Lexical Analysis** and **Syntactic Analysis**.

### Lexical Analysis

Lexical Analysis will take a string of code and turn it into a stream of tokens.
[tree-sitter](https://www.npmjs.com/package/tree-sitter) / [web-tree-sitter](https://www.npmjs.com/package/web-tree-sitter) is used as an underlying lexical analyzer.

### Syntactic Analysis

Syntactic Analysis will take a stream of tokens and turn it into an ApiDOM representation.
[CST](https://tree-sitter.github.io/tree-sitter/using-parsers#syntax-nodes) produced by lexical analysis is [syntactically analyzed](https://github.com/swagger-api/apidom/blob/master/apidom/packages/apidom-parser-adapter-json/src/syntactic-analysis)
and ApiDOM structure using [base ApiDOM namespace](https://github.com/swagger-api/apidom/tree/master/apidom/packages/apidom#base-namespace) is produced.

#### [Direct Syntactic analysis](https://github.com/swagger-api/apidom/blob/master/apidom/packages/apidom-parser-adapter-json/src/syntactic-analysis/direct)

This analysis turns the tree-sitter CST directly into ApiDOM. Only a single traversal is required, which makes
it the faster of the two analyses, and it is the default.

```js
import { parse } from 'apidom-parser-adapter-json';

const parseResult = await parse('{"prop": "value"}', {
  syntacticAnalysis: 'direct',
});
```

#### [Indirect Syntactic analysis](https://github.com/swagger-api/apidom/blob/master/apidom/packages/apidom-parser-adapter-json/src/syntactic-analysis/indirect)

This analysis turns the tree-sitter CST into a [JSON AST](https://github.com/swagger-api/apidom/tree/master/apidom/packages/apidom-ast#json-ast-nodes) representation.
The JSON AST is then turned into ApiDOM. Two traversals are required, which makes indirect analysis less performant than the direct one.
Though less performant, having a JSON AST representation allows us to do further complex analysis.

```js
import { parse } from 'apidom-parser-adapter-json';

const parseResult = await parse('{"prop": "value"}', {
  syntacticAnalysis: 'indirect',
});
```
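
The difference between the two analyses comes down to how many times the tree is walked. The sketch below illustrates this with toy types: `CstNode`, `AstNode`, `Element`, and the functions `direct`, `toAst`, `fromAst` are all hypothetical stand-ins, not the real tree-sitter, JSON AST, or ApiDOM APIs. It only counts node visits to show why one traversal is cheaper than two.

```typescript
// Toy stand-ins for a CST node, a JSON AST node and an ApiDOM element.
// None of these are the real tree-sitter or ApiDOM types.
interface CstNode { type: string; text: string; children: CstNode[] }
interface AstNode { kind: string; value: string; children: AstNode[] }
interface Element { element: string; content: string | Element[] }

let visits = 0; // counts how many nodes each strategy touches

// Direct: a single traversal, CST -> element.
const direct = (cst: CstNode): Element => {
  visits += 1;
  return {
    element: cst.type,
    content: cst.children.length > 0 ? cst.children.map((c) => direct(c)) : cst.text,
  };
};

// Indirect, pass 1: CST -> intermediate AST.
const toAst = (cst: CstNode): AstNode => {
  visits += 1;
  return { kind: cst.type, value: cst.text, children: cst.children.map((c) => toAst(c)) };
};

// Indirect, pass 2: intermediate AST -> element.
const fromAst = (ast: AstNode): Element => {
  visits += 1;
  return {
    element: ast.kind,
    content: ast.children.length > 0 ? ast.children.map((c) => fromAst(c)) : ast.value,
  };
};

const indirect = (cst: CstNode): Element => fromAst(toAst(cst));

const cst: CstNode = {
  type: 'object',
  text: '{"prop": "value"}',
  children: [
    { type: 'string', text: 'prop', children: [] },
    { type: 'string', text: 'value', children: [] },
  ],
};

visits = 0;
const directResult = direct(cst);
const directVisits = visits;

visits = 0;
const indirectResult = indirect(cst);
const indirectVisits = visits;

console.log(directVisits, indirectVisits); // 3 6
```

Both strategies yield the same result in this toy setup; the indirect one simply pays for the intermediate tree, which is the trade-off that buys room for further analysis on the JSON AST.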

## Parser adapter API

@@ -34,8 +77,8 @@ This adapter exposes an instance of [base ApiDOM namespace](https://github.com/s

Option | Type | Default | Description
--- | --- | --- | ---
<a name="specObj"></a>`specObj` | `Object` | [Specification Object](https://github.com/swagger-api/apidom/blob/master/apidom/packages/apidom-parser-adapter-json/src/parser/specification.ts) | This specification object drives the JSON AST transformation to base ApiDOM namespace.
<a name="sourceMap"></a>`sourceMap` | `Boolean` | `false` | Indicate whether to generate source maps.
<a name="syntacticAnalysis"></a>`syntacticAnalysis` | `String` | `direct` | Indicates which syntactic analysis to use: `direct` or `indirect`.

All unrecognized arbitrary options will be ignored.
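
The defaults in the table and the "unrecognized options are ignored" behaviour both fall out of destructuring with default values, which is how the `parse` function in this PR reads its options. The following is a minimal sketch of that pattern in isolation; `resolveOptions` is a hypothetical helper introduced only for illustration and is not part of the adapter.

```typescript
interface ParseOptions {
  sourceMap?: boolean;
  syntacticAnalysis?: 'direct' | 'indirect';
}

// Hypothetical helper: mirrors the destructuring defaults, nothing more.
// The index signature lets callers pass arbitrary extra keys.
const resolveOptions = (
  options: ParseOptions & { [key: string]: unknown } = {},
): Required<ParseOptions> => {
  // Only the two known options are picked; everything else is dropped.
  const { sourceMap = false, syntacticAnalysis = 'direct' } = options;
  return { sourceMap, syntacticAnalysis };
};

const defaults = resolveOptions();
const custom = resolveOptions({ syntacticAnalysis: 'indirect', unknownOption: 42 });

console.log(defaults, custom);
```

Because only named keys are destructured, adding a new option later does not break callers that already pass extra keys.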

5 changes: 4 additions & 1 deletion apidom/packages/apidom-parser-adapter-json/package.json
@@ -19,7 +19,10 @@
"clean": "rimraf ./es ./cjs ./dist ./types",
"typescript:check-types": "tsc --noEmit",
"typescript:declaration": "tsc -p declaration.tsconfig.json",
"test": "cross-env BABEL_ENV=cjs mocha"
"test": "cross-env BABEL_ENV=cjs mocha",
"perf": "cross-env BABEL_ENV=cjs node ./test/perf/index.js",
"perf:parsing-syntactic-analysis-direct": "cross-env BABEL_ENV=cjs node ./test/perf/parsing-syntactic-analysis-direct.js",
"perf:parsing-syntactic-analysis-indirect": "cross-env BABEL_ENV=cjs node ./test/perf/parsing-syntactic-analysis-indirect.js"
},
"author": "Vladimir Gorej",
"license": "Apache-2.0",
35 changes: 33 additions & 2 deletions apidom/packages/apidom-parser-adapter-json/src/adapter-browser.ts
@@ -1,2 +1,33 @@
export { default as parse, namespace } from './parser/index-browser';
export { detect, mediaTypes } from './adapter';
import { ParseResultElement } from 'apidom';

import lexicallyAnalyze from './lexical-analysis/browser';
import syntacticallyAnalyzeDirectly from './syntactic-analysis/direct';
import syntacticallyAnalyzeIndirectly from './syntactic-analysis/indirect';

export { detect, mediaTypes, namespace } from './adapter';

interface ParseFunctionOptions {
  sourceMap?: boolean;
  syntacticAnalysis?: 'direct' | 'indirect';
}

type ParseFunction = (
  source: string,
  options?: ParseFunctionOptions,
) => Promise<ParseResultElement>;

export const parse: ParseFunction = async (
  source,
  { sourceMap = false, syntacticAnalysis = 'direct' } = {},
) => {
  const cst = await lexicallyAnalyze(source);
  let apiDOM;

  if (syntacticAnalysis === 'indirect') {
    apiDOM = syntacticallyAnalyzeIndirectly(cst, { sourceMap });
  } else {
    apiDOM = syntacticallyAnalyzeDirectly(cst, { sourceMap });
  }

  return apiDOM;
};
35 changes: 33 additions & 2 deletions apidom/packages/apidom-parser-adapter-json/src/adapter-node.ts
@@ -1,2 +1,33 @@
export { default as parse, namespace } from './parser/index-node';
export { mediaTypes, detect } from './adapter';
import { ParseResultElement } from 'apidom';

import lexicallyAnalyze from './lexical-analysis/node';
import syntacticallyAnalyzeDirectly from './syntactic-analysis/direct';
import syntacticallyAnalyzeIndirectly from './syntactic-analysis/indirect';

export { detect, mediaTypes, namespace } from './adapter';

interface ParseFunctionOptions {
  sourceMap?: boolean;
  syntacticAnalysis?: 'direct' | 'indirect';
}

type ParseFunction = (
  source: string,
  options?: ParseFunctionOptions,
) => Promise<ParseResultElement>;

export const parse: ParseFunction = async (
  source,
  { sourceMap = false, syntacticAnalysis = 'direct' } = {},
) => {
  const cst = await lexicallyAnalyze(source);
  let apiDOM;

  if (syntacticAnalysis === 'indirect') {
    apiDOM = syntacticallyAnalyzeIndirectly(cst, { sourceMap });
  } else {
    apiDOM = syntacticallyAnalyzeDirectly(cst, { sourceMap });
  }

  return apiDOM;
};
4 changes: 4 additions & 0 deletions apidom/packages/apidom-parser-adapter-json/src/adapter.ts
@@ -1,3 +1,5 @@
import { createNamespace } from 'apidom';

export const mediaTypes = ['application/json'];

export const detect = async (source: string): Promise<boolean> => {
@@ -8,3 +10,5 @@ export const detect = async (source: string): Promise<boolean> => {
}
return true;
};

export const namespace = createNamespace();
@@ -0,0 +1,18 @@
import { tail } from 'ramda';
import { isString, isFunction } from 'ramda-adjunct';
// @ts-ignore
import treeSitterWasm from 'web-tree-sitter/tree-sitter.wasm';

// patch fetch() to let emscripten load the WASM file
const realFetch = globalThis.fetch;

if (isFunction(realFetch)) {
  globalThis.fetch = (...args) => {
    // @ts-ignore
    if (isString(args[0]) && args[0].endsWith('/tree-sitter.wasm')) {
      // swap in the bundled WASM URL and spread the remaining arguments,
      // so that fetch() receives them individually rather than as one array
      // @ts-ignore
      return realFetch.apply(globalThis, [treeSitterWasm, ...tail(args)]);
    }
    return realFetch.apply(globalThis, args);
  };
}
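
The patching pattern used here can be tried in isolation without touching the real `fetch`. The sketch below is an illustration under assumptions: `FetchLike`, `patchFetch`, `fakeFetch`, and the bundled URL `/assets/tree-sitter.abc123.wasm` are all hypothetical names made up for the example. It shows a wrapper that redirects only requests ending in `/tree-sitter.wasm` and passes every other request through unchanged.

```typescript
// Simplified fetch-like signature for the sketch; the real fetch returns a Response.
type FetchLike = (url: string, ...rest: unknown[]) => Promise<string>;

// Wrap a fetch-like function so that requests for tree-sitter.wasm are
// redirected to a bundled URL, while all other requests pass through untouched.
const patchFetch =
  (realFetch: FetchLike, bundledWasmUrl: string): FetchLike =>
  (url, ...rest) =>
    url.endsWith('/tree-sitter.wasm')
      ? realFetch(bundledWasmUrl, ...rest)
      : realFetch(url, ...rest);

// Fake fetch that records which URLs it was asked for.
const requested: string[] = [];
const fakeFetch: FetchLike = async (url) => {
  requested.push(url);
  return `body of ${url}`;
};

const patched = patchFetch(fakeFetch, '/assets/tree-sitter.abc123.wasm');

// The recording happens synchronously, before the returned promises settle.
void patched('/node_modules/web-tree-sitter/tree-sitter.wasm');
void patched('/api/data.json');

console.log(requested);
```

Taking the original function as a parameter, instead of reading a global, keeps the wrapper easy to test; the file in this PR applies the same idea directly to `globalThis.fetch`.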
@@ -0,0 +1,26 @@
import './browser-patch';

import Parser, { Tree } from 'web-tree-sitter';
// @ts-ignore
import treeSitterJson from 'tree-sitter-json/tree-sitter-json.wasm';

/**
 * We initialize the WebTreeSitter as soon as we can.
 */
const parserP = (async () => {
  await Parser.init();
  const jsonLanguage = await Parser.Language.load(treeSitterJson);
  const parser = new Parser();
  // the loaded grammar must be set on the parser before it can parse anything
  parser.setLanguage(jsonLanguage);

  return parser;
})();

/**
 * Lexical Analysis of source string using WebTreeSitter.
 * This is the WebAssembly version of TreeSitter's Lexical Analysis.
 */
const analyze = async (source: string): Promise<Tree> => {
  const parser = await parserP;
  return parser.parse(source);
};

export default analyze;
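
The browser analyzer starts its asynchronous setup once, at module load, and every call awaits the same promise. The sketch below shows that pattern on its own, assuming a fake parser: `FakeParser`, `createParser`, and `initCount` are hypothetical and stand in for the WebAssembly initialization, which cannot run in a standalone snippet.

```typescript
// Fake parser standing in for web-tree-sitter's Parser; purely illustrative.
interface FakeParser {
  parse: (source: string) => string;
}

let initCount = 0;

// Pretend this is the expensive asynchronous setup (loading WASM and so on).
const createParser = async (): Promise<FakeParser> => {
  initCount += 1;
  return { parse: (source) => `tree(${source})` };
};

// Start initialization immediately, at module load time, and keep the promise.
const parserP: Promise<FakeParser> = createParser();

// Every call awaits the same promise, so the setup runs exactly once.
const analyze = async (source: string): Promise<string> => {
  const parser = await parserP;
  return parser.parse(source);
};

const pending = Promise.all([analyze('1'), analyze('2')]);
pending.then((trees) => console.log(trees, initCount));
```

Callers never need to know whether initialization has finished: an early call simply waits, and a late call resolves almost immediately.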
@@ -0,0 +1,16 @@
import Parser, { Tree } from 'tree-sitter';
// @ts-ignore
import JSONLanguage from 'tree-sitter-json';

const parser = new Parser();
parser.setLanguage(JSONLanguage);

/**
 * Lexical Analysis of source string using TreeSitter.
 * This is the Node.js version of TreeSitter's Lexical Analysis.
 */
const analyze = async (source: string): Promise<Tree> => {
  return parser.parse(source);
};

export default analyze;

This file was deleted.

This file was deleted.

This file was deleted.

51 changes: 0 additions & 51 deletions apidom/packages/apidom-parser-adapter-json/src/parser/index.ts

This file was deleted.
