Bytes/text management
The LDAP protocol states that some fields (distinguished names, relative
distinguished names, attribute names, queries) be encoded in UTF-8.
In python-ldap, these are represented as text (str on Python 3).
Attribute values, on the other hand, MAY contain any type of data, including text.
To know what type of data is represented, python-ldap would need access to the
schema, which is not always available (nor always correct).
Thus, attribute values are always treated as bytes.
Encoding/decoding to other formats – text, images, etc. – is left to the caller.
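As a minimal sketch of what this caller-side handling might look like, the example below decodes the text attributes of a single entry and leaves binary ones untouched. The entry is hypothetical sample data, shaped like one element of a python-ldap search result (a str DN paired with a dict that maps str attribute names to lists of bytes values). The names TEXT_ATTRIBUTES, decode_entry and encode_values are illustrative helpers, not part of python-ldap, and the choice of which attributes hold text is an assumption the caller has to supply.

```python
# Hypothetical entry, shaped like one element of a python-ldap search result:
# the DN and attribute names are str, the attribute values are lists of bytes.
entry = (
    "cn=Zoë Müller,ou=people,dc=example,dc=org",
    {
        "cn": [b"Zo\xc3\xab M\xc3\xbcller"],  # UTF-8 encoded text
        "jpegPhoto": [b"\xff\xd8\xff\xe0"],   # binary data, not text
    },
)

# Assumption: the caller knows which attributes hold UTF-8 text
# (for example from the schema, or from application knowledge).
TEXT_ATTRIBUTES = {"cn"}


def decode_entry(entry):
    """Return (dn, attrs) with values of known text attributes decoded to str."""
    dn, attrs = entry
    decoded = {}
    for name, values in attrs.items():
        if name in TEXT_ATTRIBUTES:
            decoded[name] = [value.decode("utf-8") for value in values]
        else:
            # Leave everything else as bytes; it may be an image, a hash, etc.
            decoded[name] = list(values)
    return dn, decoded


def encode_values(values):
    """Encode a list of str values to bytes, as needed when writing attributes."""
    return [value.encode("utf-8") for value in values]


dn, attrs = decode_entry(entry)
print(dn)
print(attrs["cn"])
print(encode_values(["Zoë Müller"]))
```

Keeping the decision about which attributes are text in one place (here, the TEXT_ATTRIBUTES set) makes it easier to avoid accidentally decoding binary values.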
Historical note
Python 3 introduced a hard distinction between text (str), which is a sequence of
characters (formally, Unicode code points), and bytes, which is a sequence of
8-bit values used to encode any kind of data for storage or transmission.
Python 2 had the same distinction between str (bytes) and unicode (text).
However, values could be implicitly converted between these types as needed,
e.g. when comparing or writing to disk or the network.
The implicit encoding and decoding can be a source of subtle bugs when not
designed and tested adequately.
In python-ldap 2.x (for Python 2), bytes were used for all fields, including those guaranteed to be text.
From version 3.0 to 3.3, python-ldap used text where appropriate.
On Python 2, special bytes_mode and bytes_strictness settings influenced how text was handled.
From version 3.4 on, only Python 3 is supported. The “bytes mode” settings are deprecated and do nothing.