Reworking Ace's HAML Syntax Highlighting

Ace is a great web-based text editor and the default editor for the Cloud9 IDE. I have been using it for many years without any complaints. However, I was not satisfied with its HAML syntax highlighting, which mis-highlighted some tokens depending on their indentation. It also did not correctly highlight some HAML constructs, such as HAML comments (which begin with -#) and block comments.

This is what Ace's HAML syntax highlighting issues look like:

[Image: HAML highlighting issues]

I then studied Ace's syntax highlighting logic. It is essentially a lexer that matches the input against a set of regular expressions and moves between states depending on which expression matched. In other words, it is a state machine. The source for the HAML rules is found in lib/ace/mode/haml_highlight_rules.js.

Defining States

A few states have to be defined to represent where the lexer currently “stands” in the code. For example, entering a multi-line block comment can be modeled as entering a new state: everything parsed in this state belongs to the block comment until the block ends, at which point the state changes again.
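The idea can be tried outside Ace with a few lines of plain JavaScript. The sketch below is a toy, not Ace's tokenizer: the `classifyLines` function, the sample lines, and the rule that a blank line ends the comment are all assumptions made for illustration. It only borrows the shape of a rule (a token name, a regular expression, and an optional next state).

```javascript
// Toy rule table: each state maps to an ordered list of rules,
// and the first rule whose regex matches the line wins.
const toyRules = {
    start: [
        { token: "comment.block", regex: /^-#$/, next: "comment" }, // opens a block comment
        { token: "meta.tag", regex: /^%\w+/ },                      // e.g. %p, %div
        { token: "text", regex: /^.*$/ }                            // fallback
    ],
    comment: [
        // Assumption for this sketch: a blank line closes the comment block.
        { token: "comment.block", regex: /^$/, next: "start" },
        { token: "comment.block", regex: /^.*$/ }
    ]
};

// Classify whole lines (a simplification: a real lexer also splits a line into tokens).
function classifyLines(lines, rules) {
    let state = "start"; // the lexer always begins in the start state
    return lines.map((line) => {
        const rule = rules[state].find((r) => r.regex.test(line));
        if (rule.next) {
            state = rule.next; // a matching rule may move the lexer to another state
        }
        return rule.token;
    });
}

const tokens = classifyLines(["%p Hello", "-#", "  hidden note", "", "%p Bye"], toyRules);
console.log(tokens);
```

The same line can be classified differently depending on the state the lexer is in: "  hidden note" is a comment here only because the previous line moved the lexer into the comment state.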

In Ace, all syntax highlighting lexers must begin with a start state. From this state we can switch to other defined states. The example below begins in the start state and jumps to a comment state when a line matches one of the regular expressions that open a comment block:

this.$rules = {
    "start": [
        {
            token: "comment.block", // multiline HTML comment
            regex: /^\/$/,
            next: "comment"
        },
        {
            token: "comment.block", // multiline HAML comment
            regex: /^\-#$/,
            next: "comment"
        },

        /* ... */
    ]
};

Notice that we define two comment types, since HAML supports both HAML comments (not rendered in the HTML output) and HTML comments (rendered in the HTML output). Both regular expressions make the lexer emit a comment.block token and jump to the comment state, as indicated by the next key.
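To see which lines these two anchored patterns accept, they can be tested in plain JavaScript, outside Ace. The two regular expressions below are copied from the start rules; the `blockCommentKind` helper and the sample lines are made up for this check.

```javascript
// Regular expressions copied from the "start" rules.
const htmlBlockComment = /^\/$/;
const hamlBlockComment = /^\-#$/;

// Hypothetical helper: report which block-comment opener, if any, a line matches.
function blockCommentKind(line) {
    if (htmlBlockComment.test(line)) return "html";
    if (hamlBlockComment.test(line)) return "haml";
    return "none";
}

console.log(blockCommentKind("/"));              // "html"
console.log(blockCommentKind("-#"));             // "haml"
console.log(blockCommentKind("-# inline note")); // "none": text after the marker
console.log(blockCommentKind("  -#"));           // "none": leading indentation
```

Because both patterns are anchored with ^ and $, only a line consisting of exactly the marker opens a block. Tested this way, an indented marker does not match.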

Reworking The Syntax Highlighting

Reworking the highlighting required fixing mistakes in some of the existing regular expressions. There were also no states for comments, so I added them. The complete details of my rework can be seen in my pull request.

This is how the highlighting looks after my pull request was merged:

[Image: HAML correct highlighting]

Future Work

There are still some improvements to make that I hope to find time for, particularly in the way the lexer handles indentation. Currently this is handled purely by regular expressions, but it would be better to have a state for tokens that are currently indented. The push and pop keywords could also be used for handling states. The YAML syntax highlighting was mentioned as a reference for improving indentation detection.
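As a rough illustration of the push and pop idea, here is a small stack-based sketch in plain JavaScript. It is a stand-in written for this post, not Ace's tokenizer and not the code from the pull request: the `classifyWithStack` function, the rule table, and the blank-line rule for leaving the comment are all assumptions.

```javascript
// Hypothetical rule table: `push` enters a state and remembers the previous one;
// `next: "pop"` returns to whichever state was remembered.
const stackRules = {
    start: [
        { token: "comment.block", regex: /^\s*-#$/, push: "comment" },
        { token: "text", regex: /^.*$/ }
    ],
    comment: [
        // Assumption for this sketch: a blank line closes the comment block.
        { token: "comment.block", regex: /^$/, next: "pop" },
        { token: "comment.block", regex: /^.*$/ }
    ]
};

function classifyWithStack(lines, rules) {
    const stack = ["start"]; // the top of the stack is the current state
    return lines.map((line) => {
        const state = stack[stack.length - 1];
        const rule = rules[state].find((r) => r.regex.test(line));
        if (rule.push) {
            stack.push(rule.push);             // enter a nested state
        } else if (rule.next === "pop") {
            if (stack.length > 1) stack.pop(); // go back to the previous state
        } else if (rule.next) {
            stack[stack.length - 1] = rule.next;
        }
        // depth is the nesting level after this line has been processed
        return { token: rule.token, depth: stack.length - 1 };
    });
}

const stacked = classifyWithStack(["%p A", "-#", "  note", "", "%p B"], stackRules);
console.log(stacked);
```

With a stack, the comment rules do not need to hard-code which state to return to, so the same comment state could be entered from several places.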

