This page last changed on Oct 25, 2005 by jnolen.

Confluence uses (mostly) regular expressions to convert wiki-style markup into HTML. These are simple to write, fairly simple to compose (that is, you can add another regular expression which gets applied on top of the ones you already have), and, most importantly, forgiving.

It wouldn't be too hard to write a grammar expressing the markup language, but when a user enters markup which the system doesn't understand, you need to fail softly – not throwing away any input, and not presenting the user with an error message. The mechanics of the markup process must be invisible to the user.
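To make composition and soft failure concrete, here is a minimal Java sketch. The class name `RuleChain` and the two rules (`*bold*` to `<b>`, `_italic_` to `<i>`) are hypothetical illustrations, not Confluence's actual patterns. Each rule is applied to the output of the previous one, and text that no rule recognises passes through unchanged instead of raising an error.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical rule chain: each regex is applied, in insertion order, to the
// output of the previous one. These patterns are illustrative, not Confluence's.
class RuleChain {
    private static final Map<Pattern, String> RULES = new LinkedHashMap<>();
    static {
        // *text* becomes <b>text</b>
        RULES.put(Pattern.compile("\\*([^*\\n]+)\\*"), "<b>$1</b>");
        // _text_ becomes <i>text</i>
        RULES.put(Pattern.compile("_([^_\\n]+)_"), "<i>$1</i>");
    }

    static String render(String wikiText) {
        String html = wikiText;
        for (Map.Entry<Pattern, String> rule : RULES.entrySet()) {
            // replaceAll() leaves non-matching input untouched, so markup the
            // rules don't understand is passed through rather than rejected.
            html = rule.getKey().matcher(html).replaceAll(rule.getValue());
        }
        return html;
    }

    public static void main(String[] args) {
        System.out.println(render("some *bold* and _italic_ text"));
        System.out.println(render("an *unclosed bold marker"));
    }
}
```

The first call yields `some <b>bold</b> and <i>italic</i> text`; the second returns its input unchanged, because a lone `*` matches no rule. Adding a new rule means adding one more entry to the map.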

Regular expressions can be expensive to apply – for instance, when viewing a 100-line page in Confluence 1.4, 17% of the request's CPU time is spent in java.util.regex.Matcher.replaceAll().

A typical regular expression is

"(^|\\s)---(\\s|$)"
which finds --- and replaces it with an em dash (—).
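The em dash rule might be applied from Java roughly as follows. This is a sketch: the `EmDashRule` class and the HTML entity `&#8212;` used as the replacement are assumptions; only the pattern comes from the text. Because the pattern consumes the whitespace on either side of `---`, the replacement puts it back with `$1` and `$2`.

```java
import java.util.regex.Pattern;

// Sketch of the em dash rule. The pattern is the one quoted in the text;
// the class name and the replacement entity are assumptions.
class EmDashRule {
    // Java string literal for the regex (^|\s)---(\s|$)
    private static final Pattern EM_DASH = Pattern.compile("(^|\\s)---(\\s|$)");

    static String render(String wikiText) {
        // The groups capture the surrounding whitespace (or the empty string
        // at the start/end of the text); $1 and $2 restore it.
        return EM_DASH.matcher(wikiText).replaceAll("$1&#8212;$2");
    }

    public static void main(String[] args) {
        System.out.println(render("wait --- what")); // wait &#8212; what
        System.out.println(render("a---b"));         // a---b (no surrounding whitespace)
    }
}
```

Compiling the `Pattern` once into a static field avoids recompiling it on every call, which is what `String.replaceAll()` does internally.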

It's simple to see when this regular expression certainly doesn't apply to some wiki text – when that text doesn't include ---.

You can do an analogous test for each of our many regular expressions: just look for a constant part of the regex. Of course, the presence of the constant part is a necessary, not a sufficient, condition for the regex to match; still, it works well enough to be worthwhile.
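A minimal Java sketch of that test for the em dash rule, assuming the constant part is picked by hand. The names `ConstantPartCheck`, `mightMatch` and `matches` are hypothetical. The cheap `indexOf()` test can rule a regex out; only running the regex itself can rule it in.

```java
import java.util.regex.Pattern;

// Hypothetical pairing of a regex with a literal ("constant part") that any
// match must contain. The literal is chosen by hand, not derived from the regex.
class ConstantPartCheck {
    static final Pattern EM_DASH = Pattern.compile("(^|\\s)---(\\s|$)");
    static final String CONSTANT_PART = "---";

    // Cheap pre-test: if the literal is absent, the regex cannot match.
    static boolean mightMatch(String wikiText) {
        return wikiText.indexOf(CONSTANT_PART) >= 0;
    }

    // Full test: does the regex actually find a match anywhere?
    static boolean matches(String wikiText) {
        return EM_DASH.matcher(wikiText).find();
    }

    public static void main(String[] args) {
        String[] samples = {"no dashes here", "a --- b", "a---b"};
        for (String s : samples) {
            System.out.println(s + ": mightMatch=" + mightMatch(s) + ", matches=" + matches(s));
        }
    }
}
```

Of the three samples, the first is ruled out by the literal test alone, the second passes both tests, and `a---b` passes the literal test but fails the regex: the necessary-but-not-sufficient case.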

A simple

wikiText.indexOf(constantPart) >= 0

check before each application of a regular expression reduces that 17% to 9% on a 100-line page with bold and italic markup on every other line.
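One way such a guard might look in a rendering loop, as a sketch: the `GuardedRenderer` class, its `Rule` type and the bold/italic patterns are hypothetical, not Confluence's code. Each rule carries its constant part, and `replaceAll()` only runs when that literal is present in the current text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Sketch of guarding each replaceAll() with a cheap indexOf() test.
// Class names and the rule set are hypothetical.
class GuardedRenderer {
    static final class Rule {
        final String constantPart; // literal that every match must contain
        final Pattern pattern;
        final String replacement;

        Rule(String constantPart, String regex, String replacement) {
            this.constantPart = constantPart;
            this.pattern = Pattern.compile(regex);
            this.replacement = replacement;
        }
    }

    private final List<Rule> rules = new ArrayList<>();
    int skipped; // number of replaceAll() calls the guard avoided

    GuardedRenderer add(String constantPart, String regex, String replacement) {
        rules.add(new Rule(constantPart, regex, replacement));
        return this;
    }

    String render(String wikiText) {
        String html = wikiText;
        for (Rule rule : rules) {
            // Check the partially rendered text, since earlier rules may have
            // added or removed this rule's literal. indexOf() returns 0 when
            // the literal starts the text, so the test must be >= 0.
            if (html.indexOf(rule.constantPart) >= 0) {
                html = rule.pattern.matcher(html).replaceAll(rule.replacement);
            } else {
                skipped++;
            }
        }
        return html;
    }

    public static void main(String[] args) {
        GuardedRenderer r = new GuardedRenderer()
                .add("---", "(^|\\s)---(\\s|$)", "$1&#8212;$2")
                .add("*", "\\*([^*\\n]+)\\*", "<b>$1</b>")
                .add("_", "_([^_\\n]+)_", "<i>$1</i>");
        System.out.println(r.render("--- plain text with *bold* only"));
        System.out.println("skipped: " + r.skipped);
    }
}
```

Here the italic rule is skipped because the text contains no `_`, while the em dash rule still runs even though `---` sits at index 0; the program prints `&#8212; plain text with <b>bold</b> only` and `skipped: 1`.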

It's interesting that replaceAll() doesn't try that itself. Presumably it's optimised for the case where the string you give it does match the expression, which is probably the most common situation.

It's a very simple but worthwhile saving. The only situation we need to worry about is when many of the lines in our pages have many types of markup on them, because then we pay for the indexOf() as well as the replaceAll().

Document generated by Confluence on Oct 10, 2007 18:47