Confluence 4.0 : Character encodings in Confluence
This page last changed on Jul 15, 2010 by mryall.
Character encoding advice

In general, always set all character encodings to UTF-8. That includes the database, JDBC drivers, application server, filesystem and Confluence itself. In certain isolated cases (e.g. Microsoft Windows), it might not be possible to use a fully Unicode filesystem (a default Windows install, for instance, doesn't support Unicode filenames properly). If so, still use UTF-8 for the other encodings (HTTP requests and responses, and the database), and be aware that your operating system might have limitations around international attachments (in Confluence versions before 2.2), backup and restore of international data, and so on.

The remainder of this document explains the encoding settings that apply in Confluence and how they relate to application behaviour.

Where character encoding is used

There are three places where character encoding matters to Confluence:
- HTTP requests and responses: data received from and sent to the browser.
- The database: data stored in and read from the database.
- The filesystem: data read from and written to disk (for example, exports and Velocity templates).
Problems generally arise when Confluence assumes that one of the above encodings is different from what it actually is. For example, Confluence might believe the database uses ISO-8859-1 encoding when it is in fact UTF-8 encoded.

Java character encoding

Java always uses the multibyte UTF-16 character encoding for all String data. So when a request comes in to Confluence, we convert it from the request encoding to UTF-16. Then we store that data in the database, converting from UTF-16 to the database's encoding. Retrieving information from the database and sending it back to the browser is the same process in the opposite direction.

Problems with character encodings

If Confluence has the wrong idea about the encoding used in one of the above places, the problem manifests itself in different ways:
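As a rough illustration of the conversions and mismatches described above, the sketch below decodes the same request bytes under a correct and an incorrect encoding assumption, then stores text in an encoding that cannot represent it. It is a minimal, standalone example using only the JDK; the class `EncodingRoundTrip` and its `decode`/`encode` helpers are hypothetical names and are not Confluence code. ISO-8859-1 stands in for a hypothetical non-Unicode database encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {

    // Decode incoming bytes (for example an HTTP request) into a Java String,
    // which represents its characters as UTF-16 code units.
    static String decode(byte[] bytes, Charset assumedEncoding) {
        return new String(bytes, assumedEncoding);
    }

    // Encode a Java String into bytes for storage (for example in the database).
    // Characters the target encoding cannot represent are replaced with the
    // charset's replacement bytes ('?' for ISO-8859-1).
    static byte[] encode(String text, Charset targetEncoding) {
        return text.getBytes(targetEncoding);
    }

    public static void main(String[] args) {
        // The browser sends "Café" as UTF-8: 'é' becomes the two bytes 0xC3 0xA9.
        byte[] requestBytes = "Caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // Correct assumption: the request really is UTF-8.
        String correct = decode(requestBytes, StandardCharsets.UTF_8);

        // Wrong assumption: each byte is read as one ISO-8859-1 character,
        // so 'é' turns into the two characters 'Ã' and '©'.
        String garbled = decode(requestBytes, StandardCharsets.ISO_8859_1);

        System.out.println(correct.length());  // 4
        System.out.println(garbled.length());  // 5

        // Storing Japanese text with an ISO-8859-1 encoding: the characters
        // cannot be represented, so they come back as question marks.
        byte[] stored = encode("\u65e5\u672c", StandardCharsets.ISO_8859_1);
        System.out.println(decode(stored, StandardCharsets.ISO_8859_1));  // ??
    }
}
```

Garbled pairs such as "Ã©" usually point to UTF-8 data being read as ISO-8859-1, while runs of question marks usually point to text written through an encoding that lacks those characters.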
Configuration of character encodings

The Confluence character encoding is a configuration setting found in
In summary, changing the Confluence character encoding changes your HTTP request and response encoding and your filesystem encoding as used by exports and Velocity templates.

The database encoding is the responsibility of your JDBC drivers. The drivers are responsible for reading and writing data in the database's native encoding and translating it to and from Java Strings (which are UTF-16). For some drivers, such as MySQL, you must set Unicode encoding explicitly in the JDBC URL. Others are smart enough to determine the database encoding automatically. Ideally, your database itself should use a Unicode encoding (we recommend this for the simplest configuration), but that is not necessary as long as:
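For the MySQL case mentioned above, a minimal sketch of building a JDBC URL that requests Unicode explicitly might look like this. It assumes MySQL Connector/J, whose `useUnicode` and `characterEncoding` connection properties are used here; the class `MySqlJdbcUrl`, the method `unicodeUrl`, and the host, port and database name are hypothetical placeholders, not Confluence defaults.

```java
public class MySqlJdbcUrl {

    // Build a MySQL JDBC URL that asks Connector/J to use UTF-8 when
    // converting between the database and Java (UTF-16) Strings.
    // host, port and database are placeholders supplied by the caller.
    static String unicodeUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    public static void main(String[] args) {
        String url = unicodeUrl("localhost", 3306, "confluence");
        System.out.println(url);
        // With the driver on the classpath, this URL would be passed to
        // java.sql.DriverManager.getConnection(url, user, password).
    }
}
```

If the URL is written into an XML configuration file rather than built in code, the `&` separating the two properties has to be escaped as `&amp;`.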
The filesystem encoding is mostly ignored by Confluence, except in the cases where the configuration setting above plays a part (exports and Velocity templates). When attachments are uploaded, they are written to the filesystem directly as a stream of bytes. The same happens when they are downloaded: the bytes from the file's InputStream are written directly to the HTTP response.

In some places in Confluence, we use the default filesystem encoding as determined by the JVM and stored in the

In certain cases we explicitly hard-code the encoding used to read or write data to the filesystem. Two important examples are:
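To contrast the two kinds of filesystem access described above, here is a minimal sketch: attachment-style data is copied as opaque bytes from an InputStream to an OutputStream, so no encoding is involved, while text is written and read with a hard-coded encoding (UTF-8, chosen for illustration) instead of the JVM default. `FilesystemEncodingSketch`, `copyBytes`, `writeText` and `readText` are hypothetical names, not Confluence's implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class FilesystemEncodingSketch {

    // Attachment-style copy: bytes go straight from input to output, so no
    // character encoding is applied and the content is passed through unchanged.
    static long copyBytes(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    // Text-style write with a hard-coded encoding, independent of the JVM default.
    static void writeText(Path file, String text) throws IOException {
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    // Read the text back with the same hard-coded encoding.
    static String readText(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // The JVM default varies by platform; the code above does not rely on it.
        System.out.println("JVM default charset: " + Charset.defaultCharset());

        byte[] attachment = {(byte) 0xC3, (byte) 0xA9, 0x00, (byte) 0xFF};
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copyBytes(new ByteArrayInputStream(attachment), out);
        System.out.println(Arrays.equals(attachment, out.toByteArray()));  // true

        Path tmp = Files.createTempFile("encoding-sketch", ".txt");
        try {
            writeText(tmp, "Caf\u00e9");
            System.out.println(readText(tmp).equals("Caf\u00e9"));  // true
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```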
Some application servers, Tomcat for example, have an encoding setting that modifies Confluence URLs before they reach the application. This can prevent access to international pages and attachments (really anything with international characters in the URL). See configuring your Application Server URL encoding.
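The URL problem above can be reproduced in isolation: a browser percent-encodes a non-ASCII title as UTF-8 bytes, and a server that decodes those bytes with a different encoding sees a different title. In Tomcat the relevant setting is usually the `URIEncoding` attribute on the HTTP Connector in `server.xml` (older Tomcat versions default to ISO-8859-1). The sketch below uses the JDK's `URLEncoder`/`URLDecoder`, which implement form encoding (spaces become `+`), so it is only an approximation of path encoding; `UrlEncodingMismatch` and its helpers are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class UrlEncodingMismatch {

    // Percent-encode a page title as UTF-8 bytes, as a browser would.
    // URLEncoder uses form encoding, which matches path encoding for this
    // single-word title.
    static String encodeTitle(String title) throws UnsupportedEncodingException {
        return URLEncoder.encode(title, "UTF-8");
    }

    // Decode the percent-encoded bytes using whatever encoding the server assumes.
    static String decodeTitle(String encoded, String serverEncoding)
            throws UnsupportedEncodingException {
        return URLDecoder.decode(encoded, serverEncoding);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String encoded = encodeTitle("Caf\u00e9");          // "Caf%C3%A9"
        String right = decodeTitle(encoded, "UTF-8");       // "Café"
        String wrong = decodeTitle(encoded, "ISO-8859-1");  // "CafÃ©"
        System.out.println(encoded);
        System.out.println(right.equals("Caf\u00e9"));        // true
        System.out.println(wrong.equals("Caf\u00c3\u00a9"));  // true
    }
}
```

A lookup for the page "CafÃ©" would fail even though "Café" exists, which is how a mismatched server URL encoding blocks access to international pages and attachments.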
Document generated by Confluence on Sep 19, 2011 02:39