3.7. Character sets and Unicode

Character sets are a little bit nasty, because the Python bindings have to work with three (variable) charsets:

Your terminal charset (depends on your locale; most modern operating systems default to utf-8, but many still use iso-8859-1 or other local charsets)
The sys.getdefaultencoding() charset (depends on your site.py settings but defaults to ascii)
The internal MAPI charset (windows-1252)
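The first two charsets in this list can be inspected from Python itself. The following is only a minimal sketch: charset_report is a hypothetical helper that is not part of the bindings, locale.getpreferredencoding() is used as an approximation of the terminal charset, and the MAPI charset is simply copied from the list above. Note that on Python 3 sys.getdefaultencoding() reports utf-8 rather than ascii.

```python
import locale
import sys


def charset_report():
    """Collect the three charsets discussed above (hypothetical helper)."""
    return {
        # Approximation of the terminal charset, derived from the locale
        'terminal': locale.getpreferredencoding(),
        # Interpreter-wide default used for implicit conversions
        'default': sys.getdefaultencoding(),
        # Taken from the list above, not queried from MAPI
        'mapi': 'windows-1252',
    }


report = charset_report()
for name in sorted(report):
    print('%-8s %s' % (name, report[name]))
```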

What’s more, a user can pass information as either a string or a unicode type in Python.

This is the way that charsets are used:

Since sys.getdefaultencoding() isn’t easy to change for each application, it is not used. This in turn means that the Python binding never converts between string and unicode, since that would require using sys.getdefaultencoding(). Such a conversion would cause a lot of confusion: passing a unicode string without the MAPI_UNICODE flag would convert the unicode back to a string (using ascii), and Python would probably complain about the non-ASCII characters in your unicode string. On top of that, the binding would then still have to convert from the string charset into whatever charset MAPI expects.
String input data is assumed to be in your terminal charset.
Strings output by MAPI are in your terminal charset.
When passing the MAPI_UNICODE flag in flags or when using the PT_UNICODE property type, you must pass a unicode string (u'string'). Failure to do so will raise an exception.
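To see why an ascii-based conversion is avoided, the sketch below performs explicitly the unicode-to-string conversion that Python 2 would otherwise perform implicitly with an ascii default encoding. It uses only the standard codecs and does not touch MAPI; ascii_roundtrip is a hypothetical name chosen for this illustration.

```python
def ascii_roundtrip(text):
    """Encode a unicode string as ascii, the way an implicit
    unicode -> str conversion with an 'ascii' default would."""
    try:
        return text.encode('ascii')
    except UnicodeEncodeError as exc:
        # This is the complaint about non-ASCII characters
        return 'failed: %s' % exc.reason


plain = ascii_roundtrip(u'hello')        # pure ASCII, converts fine
accented = ascii_roundtrip(u'caf\xe9')   # contains a non-ASCII character
print(plain)
print(accented)
```

Only the second call fails, which is exactly the kind of data-dependent error that the bindings avoid by never converting implicitly.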

The nice thing about this is that when you parse command-line arguments or print to the terminal, you never have to do any charset conversions. The drawback shows up when you know that you are receiving, say, UTF-8 from some other library (e.g. an XML reader). In that case you can do one of two things:

Make sure that the current locale uses utf-8 (run the locale command in a bash shell to check your locale).
Convert the utf-8 data from the other library to unicode strings and use the PT_UNICODE data types (and possibly the MAPI_UNICODE flag, but this only affects strings in the argument list of a method call):

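message = folder.CreateMessage(0)
# Byte string as returned by the XML library, encoded in utf-8
s = 'some string from XML lib'

# Decode to unicode and store it in the unicode variant of the subject property
message.SetProps([SPropValue(PR_SUBJECT_W, s.decode('utf-8'))])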
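When data may arrive either as utf-8 bytes or as an already decoded unicode string, the decoding step can be wrapped in a small function. This is a sketch only: to_unicode and its source_charset parameter are hypothetical and not part of the bindings, and the result is what you would then pass as the value of a PT_UNICODE property.

```python
def to_unicode(data, source_charset='utf-8'):
    """Return a unicode string for data in a known charset.

    Byte strings are decoded; unicode strings are passed through.
    """
    if isinstance(data, bytes):
        return data.decode(source_charset)
    return data


# utf-8 bytes, as another library might deliver them
subject = to_unicode(b'Gr\xc3\xbc\xc3\x9fe')
# Already unicode, so returned unchanged
same = to_unicode(u'Gr\xfc\xdfe')
```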

Note

Since the release of version 7.0, Zarafa has server-wide support for unicode, but every older version only supports the windows-1252 charset internally (closely related to iso-8859-1, also known as Latin-1, and to iso-8859-15). This means that although using unicode strings in versions prior to 7.0 is supported, any character outside the windows-1252 charset will be converted to a question mark (?).
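The effect of this conversion can be imitated with the cp1252 codec from the Python standard library. The sketch below only mimics the behaviour described in this note; it is not the server's actual code, and as_stored_before_7 is a hypothetical name.

```python
def as_stored_before_7(text):
    """Imitate storing a unicode string in a windows-1252-only server:
    characters outside the charset become question marks."""
    lossy = text.encode('cp1252', 'replace')
    return lossy.decode('cp1252')


# e-acute and the euro sign exist in windows-1252, the CJK character does not
stored = as_stored_before_7(u'caf\xe9 \u20ac5 \u4e2d')
```

Here stored keeps the first two non-ASCII characters intact, while the final character is replaced by a question mark.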

Note

The Python interface has not changed with this internal switch to unicode. Python programs written for Zarafa 6.30 or 6.40 will continue to work unchanged on 7.0 and upwards.