JavaScript Syntax Highlighter
This is a JavaScript syntax highlighter experiment. You can enter JavaScript code into the textarea (down below) and you will be able to view the code with syntax highlighting.
It first builds a token list, then passes that list to a function that renders
the output. The tokeniser is designed to be as accurate as possible: for example,
the code ++ will be read as one operator token (increment), while +- will be
read as two operator tokens (plus, then minus). Although this is not strictly
required for a syntax highlighter, it is implemented so that I may later extend
this code to allow crushing/beautifying and/or actual parsing and executing. This
level of accuracy does benefit syntax highlighting, however, as tokens such as
{ and } can be coloured differently depending on whether they delimit a block
or an object literal.
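As a rough illustration of the operator behaviour described above, here is a minimal longest-match sketch. It is not the code used by this highlighter: the OPERATORS table and the readOperators function are hypothetical names, and the table is deliberately incomplete.

```javascript
// Hypothetical, incomplete operator table, ordered longest first so that
// the first entry that matches is also the longest one.
const OPERATORS = [
  ">>>=",
  "===", "!==", ">>>", "<<=", ">>=",
  "++", "--", "+=", "-=", "==", "!=", "<=", ">=", "&&", "||", "<<", ">>",
  "+", "-", "*", "/", "%", "=", "<", ">", "!", "&", "|", "^", "~", "?", ":",
];

// Splits a string made up only of operators and whitespace into tokens.
function readOperators(source) {
  const tokens = [];
  let pos = 0;
  while (pos < source.length) {
    if (/\s/.test(source[pos])) {
      pos++; // whitespace separates tokens but is not a token itself
      continue;
    }
    const op = OPERATORS.find((candidate) => source.startsWith(candidate, pos));
    if (op === undefined) {
      throw new SyntaxError("Unexpected character at position " + pos);
    }
    tokens.push(op);
    pos += op.length;
  }
  return tokens;
}

console.log(readOperators("++").join(" ")); // ++
console.log(readOperators("+-").join(" ")); // + -
```

Because the table is searched longest first, ++ is consumed as a single token, while +- has no two-character entry and falls back to two single-character tokens.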
Every open-source syntax highlighter that I have tried fails on at least some valid JS. Common pitfalls include:
- Failure to recognise numbers that start with a period, eg:
.01
- Failure to recognise that the second period in the following is
an operator, and not part of the number:
0.1.method(); // yes this is valid JS
- Failure to handle multiline strings
- Interpreting the following as containing a regular expression:
1/2/3;
- Ending regular expression tokens prematurely, eg:
/reg[/]exp/;
/[/*regexp*/]/;
- Recognising some edge-case 'divide' operators as regular expression opening
tags, e.g.:
/regexp/
/notRegexp/g;
In this example the first line does not have a semicolon, so the first /
on the next line immediately becomes a division operation, with notRegexp
and g as variables.
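The pitfalls above can be made concrete with a small sketch. This is not the tokeniser used on this page: tokenise, readRegex, regexAllowed and KEYWORDS_BEFORE_REGEX are hypothetical names, and the rule for deciding whether a / starts a regular expression (look at the previous token) is a common heuristic that I am assuming here, not a full solution. Strings are left out entirely to keep it short.

```javascript
// Assumption: a small, incomplete set of keywords after which "/" starts a regex.
const KEYWORDS_BEFORE_REGEX = new Set([
  "return", "typeof", "case", "in", "of", "new", "delete", "void", "throw",
]);

// Heuristic: "/" is a division when the previous token is a value.
function regexAllowed(prev) {
  if (prev === null) return true;
  if (prev.type === "number" || prev.type === "regex") return false;
  if (prev.type === "identifier") return KEYWORDS_BEFORE_REGEX.has(prev.text);
  return prev.text !== ")" && prev.text !== "]";
}

// Reads a regex literal starting at `start`, which must point at "/".
function readRegex(src, start) {
  let pos = start + 1;
  let inClass = false; // true while inside [...], where "/" does not end the regex
  while (pos < src.length) {
    const ch = src[pos];
    if (ch === "\n") break; // a regex literal cannot span lines
    if (ch === "\\") {
      pos += 2; // skip the escaped character
      continue;
    }
    if (ch === "[") inClass = true;
    else if (ch === "]") inClass = false;
    else if (ch === "/" && !inClass) {
      pos++;
      while (pos < src.length && /[A-Za-z]/.test(src[pos])) pos++; // flags
      return src.slice(start, pos);
    }
    pos++;
  }
  throw new SyntaxError("Unterminated regular expression");
}

function tokenise(src) {
  const tokens = [];
  let pos = 0;
  while (pos < src.length) {
    const rest = src.slice(pos);
    const prev = tokens.length > 0 ? tokens[tokens.length - 1] : null;
    let match;
    let token;
    if ((match = /^(?:\s+|\/\/[^\n]*|\/\*[\s\S]*?\*\/)/.exec(rest))) {
      pos += match[0].length; // whitespace and comments are skipped
      continue;
    } else if ((match = /^(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?/.exec(rest))) {
      token = { type: "number", text: match[0] }; // also accepts a leading period
    } else if ((match = /^[A-Za-z_$][\w$]*/.exec(rest))) {
      token = { type: "identifier", text: match[0] };
    } else if (rest[0] === "/" && regexAllowed(prev)) {
      token = { type: "regex", text: readRegex(src, pos) };
    } else {
      token = { type: "punctuator", text: rest[0] }; // single-character fallback
    }
    tokens.push(token);
    pos += token.text.length;
  }
  return tokens;
}

const texts = (code) => tokenise(code).map((t) => t.text).join(" ");
console.log(texts("0.1.method();")); // 0.1 . method ( ) ;
console.log(texts("1/2/3;")); // 1 / 2 / 3 ;
console.log(texts("/regexp/\n/notRegexp/g;")); // /regexp/ / notRegexp / g ;
```

The number pattern stops after the first fractional part, so the second period in 0.1.method() falls through to the punctuator branch. Tracking character classes in readRegex is what keeps /reg[/]exp/ and /[/*regexp*/]/ from ending early.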
I know of only one bug in my implementation: if an object literal
is placed in the false branch of a ternary expression then it is highlighted
as a block level token (though most other libraries don't distinguish between these).
Fixed!
Christian Krebbs wrote in to report another bug: a regular expression following a variable declaration (without an assignment or semicolon) is interpreted as a division. This one is going to be particularly difficult to fix. If you find another code sequence that fails to highlight correctly, please contact me.
Note: This is not necessarily designed to fail gracefully on invalid input.