Parsing in JavaScript: Tools and Libraries, Part 3

Welcome back! If you missed the first two parts of this article, follow these links to check them out: Part 1 and Part 2.

JavaScript Libraries Related to Parsing

There are also some other interesting libraries related to parsing that are not part of a common category.

JavaScript Libraries That Parse JavaScript

There is one special case that could be managed in a more specific way: the case in which you want to parse JavaScript code in JavaScript. Contrary to what we have found for Java and C#, there is not a definitive choice: there are many good choices to parse JavaScript.

The three most popular libraries seem to be Acorn, Esprima, and UglifyJS. We are not going to say which one is best, because they all seem to be awesome, updated, and well supported.

One important difference is that UglifyJS is also a mangler/compressor/beautifier toolkit, which means that it has many other uses. On the other hand, it is the only one of these libraries that supports ECMAScript only up to version 5. Another thing to consider is that only Esprima has documentation worthy of projects of such magnitude.
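
Acorn and Esprima both produce a syntax tree in the ESTree format, so code that consumes the tree looks much the same whichever of the two you pick. The sketch below does not call either library: the `ast` object is written by hand to approximate what such a parser would return for `var answer = 6 * 7;`, and `walk` and `collectIdentifiers` are hypothetical helpers introduced only for this illustration.

```javascript
// Hand-written approximation of an ESTree-style tree for: var answer = 6 * 7;
// (assumption: a real parser also attaches details such as source positions)
const ast = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "var",
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "answer" },
          init: {
            type: "BinaryExpression",
            operator: "*",
            left: { type: "Literal", value: 6 },
            right: { type: "Literal", value: 7 }
          }
        }
      ]
    }
  ]
};

// Hypothetical helper: depth-first walk that calls `visit` on every node.
// Here a node is any object that has a string `type` property.
function walk(node, visit) {
  if (Array.isArray(node)) {
    node.forEach(child => walk(child, visit));
  } else if (node !== null && typeof node === "object") {
    if (typeof node.type === "string") visit(node);
    Object.keys(node).forEach(key => walk(node[key], visit));
  }
}

// Collect the names of all identifiers found in the tree.
function collectIdentifiers(tree) {
  const names = [];
  walk(tree, node => {
    if (node.type === "Identifier") names.push(node.name);
  });
  return names;
}
```

Calling `collectIdentifiers(ast)` on this tree returns an array containing only `"answer"`. Tools such as linters and code transformers are typically built on this kind of traversal.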

Interesting Parsing Libraries: Chevrotain

Chevrotain is a very fast and feature-rich JavaScript LL(k) Parsing DSL. It can be used to build parsers/compilers/interpreters for various use cases ranging from simple configuration files to full-fledged programming languages.

There is another interesting parsing tool that does not really fit in more common categories of tools, like parser generators or combinators: Chevrotain, a parsing DSL. A parsing DSL works as a cross between a parser combinator and a parser generator. You define a grammar in JavaScript code directly, but using the (Chevrotain) API and not a standard syntax like EBNF or PEG.
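
To make the idea of defining a grammar through an API in JavaScript code more concrete, here is a minimal sketch in plain JavaScript. `TinyParser` and its `consume`, `or`, and `manySep` methods are hypothetical names invented for this illustration. They only imitate the general shape of a parsing DSL and are not how Chevrotain is implemented; in particular, this toy backtracks, while Chevrotain is an LL(k) tool.

```javascript
// Hypothetical miniature parsing DSL; tokens are plain strings for brevity.
class TinyParser {
  constructor(tokens) {
    this.tokens = tokens;
    this.pos = 0;
  }

  // Match one expected token or fail.
  consume(expected) {
    if (this.tokens[this.pos] !== expected) {
      throw new Error(`Expected ${expected} at position ${this.pos}`);
    }
    return this.tokens[this.pos++];
  }

  // Try each alternative in order, rewinding the input when one fails.
  or(alternatives) {
    const start = this.pos;
    for (const alternative of alternatives) {
      try {
        return alternative();
      } catch (err) {
        this.pos = start;
      }
    }
    throw new Error(`No alternative matched at position ${start}`);
  }

  // Zero or more repetitions of `def`, separated by the token `sep`.
  manySep(sep, def) {
    const items = [];
    const start = this.pos;
    try {
      items.push(def());
    } catch (err) {
      this.pos = start; // no first item: zero repetitions
      return items;
    }
    while (this.tokens[this.pos] === sep) {
      this.pos++;
      items.push(def());
    }
    return items;
  }
}

// The grammar itself: nested lists of booleans, e.g. [ true , [ false ] ].
// Each rule is an ordinary method that calls the DSL methods above.
class ListParser extends TinyParser {
  value() {
    return this.or([
      () => { this.consume("true"); return true; },
      () => { this.consume("false"); return false; },
      () => this.array()
    ]);
  }

  array() {
    this.consume("[");
    const items = this.manySep(",", () => this.value());
    this.consume("]");
    return items;
  }
}
```

With this sketch, `new ListParser(["[", "true", ",", "[", "false", "]", "]"]).array()` builds the nested JavaScript array directly. The grammar lives in ordinary methods, as with a parser combinator, but the rule structure stays explicit and named, as in a generated parser.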

Chevrotain supports many advanced features typical of parser generators, such as semantic predicates, separate lexers and parsers, and a grammar definition (optionally) separated from the actions. The actions can be implemented using a visitor, and thus you can reuse the same grammar for multiple projects.
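
The benefit of keeping actions out of the grammar can be sketched without any library. In the example below, the tree shape (`{ rule, children }` for rules and `{ token, text }` for tokens) is an assumption made for this illustration and is not the structure Chevrotain produces; `visit`, `toValue`, and `countTokens` are likewise hypothetical names. The point is only that two unrelated sets of actions run over the same tree, which is what makes one grammar reusable.

```javascript
// Hypothetical parse tree for the JSON text ["a", true, [false]].
const tree = {
  rule: "array",
  children: [
    { token: "StringLiteral", text: '"a"' },
    { token: "True", text: "true" },
    { rule: "array", children: [{ token: "False", text: "false" }] }
  ]
};

// Generic dispatcher: calls the visitor method named after the rule or token,
// and hands it a `recurse` function for visiting child nodes.
function visit(node, visitor) {
  const name = node.rule || node.token;
  return visitor[name](node, child => visit(child, visitor));
}

// Actions #1: build the JavaScript value described by the tree.
const toValue = {
  array: (node, recurse) => node.children.map(child => recurse(child)),
  StringLiteral: node => JSON.parse(node.text),
  True: () => true,
  False: () => false
};

// Actions #2: count the leaf tokens, reusing the same tree unchanged.
const countTokens = {
  array: (node, recurse) =>
    node.children.reduce((sum, child) => sum + recurse(child), 0),
  StringLiteral: () => 1,
  True: () => 1,
  False: () => 1
};
```

Here `visit(tree, toValue)` rebuilds the nested array, while `visit(tree, countTokens)` returns the number of tokens. Neither visitor requires any change to how the tree was produced.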

The following is a partial JSON example grammar from the documentation. As you can see, the syntax is easier to understand for a developer who is inexperienced in parsing, but a bit more verbose than a standard grammar.

Chevrotain JSON Example in JavaScript:

// partial JSON grammar for ES6 from the documentation
"use strict"
const chevrotain = require("chevrotain")
 
// ----------------- lexer -----------------
const Token = chevrotain.Token
const Lexer = chevrotain.Lexer
const Parser = chevrotain.Parser
 
// With ES6 we can define Tokens using the class keyword.
 
class True extends Token {}
True.PATTERN = /true/
 
class False extends Token {}
False.PATTERN = /false/
 
class StringLiteral extends Token {}
StringLiteral.PATTERN = /"(?:[^\\"]|\\(?:[bfnrtv"\\/]|u[0-9a-fA-F]{4}))*"/
///" this comment exists only to fix syntax highlighting on the website
 
class WhiteSpace extends Token {}
WhiteSpace.PATTERN = /\s+/
WhiteSpace.GROUP = Lexer.SKIPPED // marking WhiteSpace as 'SKIPPED' makes the lexer skip it.
WhiteSpace.LINE_BREAKS = true
 
const allTokens = [
    WhiteSpace,    
    StringLiteral,        
    True,
    False,
    [..]
]
const JsonLexer = new Lexer(allTokens)
 
// ----------------- parser -----------------
// Using ES6 the parser too can be defined as a class
class JsonParserES6 extends chevrotain.Parser {
    constructor(input) {
        super(input, allTokens)
 
        // not mandatory, using $ (or any other sign) to reduce verbosity (this. this. this. this. .......)
        const $ = this
 
        $.RULE("json", () => {
            $.OR([
                {ALT: () => {$.SUBRULE($.object)}},
                {ALT: () => {$.SUBRULE($.array)}}
            ])
        })
 
        [..]
        
        $.RULE("array", () => {
            $.CONSUME(LSquare)
            $.MANY_SEP({
                SEP: Comma,
                DEF: () => {
                    $.SUBRULE2($.value)
                }
            })
            $.CONSUME(RSquare)
        })
 
        $.RULE("value", () => {
            $.OR([
                {ALT: () => {$.CONSUME(StringLiteral)}},
                {ALT: () => {$.SUBRULE($.array)}},
                {ALT: () => {$.CONSUME(True)}},
                {ALT: () => {$.CONSUME(False)}},
                [..]
            ])
        })
 
        // very important to call this after all the rules have been defined.
        // otherwise the parser may not work correctly as it will lack information
        // derived during the self analysis phase.
        Parser.performSelfAnalysis(this)
    }
}

According to the benchmark of parsing libraries developed by the author of the library, Chevrotain is very fast: faster than the other JavaScript parsing libraries tested and, depending on the JavaScript engine on which it runs, able to compete with a custom parser written by hand. You can see the numbers and get more details in that benchmark.

It also supports features useful for debugging, like railroad diagram generation and custom error reporting. There are also a few features that are useful for building compilers, interpreters, or tools for editors, such as automatic error recovery or syntactic content assist. The last one means that it can suggest the next token given a certain input, so it could be used as the building block for an autocomplete feature.
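
The idea behind content assist can be illustrated with a deliberately tiny sketch. It assumes a hypothetical grammar for flat arrays such as `[ true , false ]`, expressed as a table of which tokens may follow which. The `nextTokens` table and the `suggestNext` function are invented for this example and say nothing about how Chevrotain computes its suggestions; a real implementation has to handle nested rules, which a flat table like this cannot.

```javascript
// Hypothetical grammar for flat arrays such as [ true , false ], written as a
// table: for each token, the tokens that may legally come next.
const START = "<start>";
const nextTokens = {
  [START]: ["["],
  "[": ["true", "false", "]"],
  "true": [",", "]"],
  "false": [",", "]"],
  ",": ["true", "false"],
  "]": []
};

// Return the tokens that could follow the given (possibly incomplete) input,
// or throw if the input already violates the grammar.
function suggestNext(tokens) {
  let previous = START;
  for (const token of tokens) {
    if (!nextTokens[previous].includes(token)) {
      throw new Error(`Unexpected token ${token} after ${previous}`);
    }
    previous = token;
  }
  return nextTokens[previous];
}
```

For the partial input `["[", "true"]`, `suggestNext` returns the comma and the closing bracket, which is the kind of list an editor could offer as completions.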

Chevrotain has good documentation, with a tutorial, example grammars, and a reference. It also has a great online editor/playground.

Chevrotain is written in TypeScript.

Summary

As we said in our sister articles about parsing in Java and C#, the world of parsers is a bit different from the usual world of programmers. In the case of JavaScript, the language itself also lives in a different world from any other programming language. There are such disparate levels of competence among its developers that you could find the best ones working with people who just barely know how to put together a script. And both want to parse things.

So for JavaScript, there are tools all over this spectrum. We have serious tools developed by academics for their courses or in the course of their degrees, together with much simpler tools, some of which blur the lines between parser generators and parser combinators. All of them have their place. A further complication is that, while parser combinators are usually reserved for easier uses, with JavaScript this is not always the case. You could find very powerful and complex parser combinators and much easier parser generators.

So with JavaScript, more than ever, we cannot definitively suggest one tool over another. What is best for one user might not be the best for somebody else. And we all know that the most technically correct solution might not be ideal in real life, with all its constraints. So, we wanted to share what we have learned about the best options for parsing in JavaScript.

We would like to thank Shahar Soel for having pointed us to Chevrotain and for having suggested some needed corrections.
