Product SiteDocumentation Site

3.7. Character sets and Unicode

Character sets are a little bit nasty. The reason for this is that we are working with three (variable) charsets in the Python bindings:
What’s more, a user can send information using a string or a unicode type in python.
This is the way that charsets are used:
The nice thing about this is that when you parse commandline arguments or when you are printing to the terminal, you never have to do any charset conversions. The drawback is that if you know that you are receiving, say, UTF-8 from some other library (eg. an XML reader), then you can do any of two things:
  1. Make sure that the current locale is in utf-8 (use the locale command from the bash shell to check your locale)
  2. Convert the utf-8 data from the other library to unicode strings and use the PT_UNICODE data types (and possibly MAPI_UNICODE flag, but this only affects strings in the argument list of a method call):
message = folder.CreateMessage(0)
s = 'some string from XML lib'

message.SetProps([SPropValue(PR_SUBJECT_W, s.decode('utf-8'))]);

Note

Since the release of version 7.0 Zarafa has server-wide support for unicode, but every older version only support the windows-1252 (almost identical to iso-8859-15 or Latin-1) charset internally. Which means that although using unicode strings in versions prior to 7.0 is supported, any character outside the windows-1252 charset will be converted to a questionmark symbol (?).

Note

The python interface has not changed with this internal change to unicode. Python programs written for Zarafa 6.30 or 6.40 will continue to work unchanged on 7.0 and upwards.