The LDAP protocol states that some fields (distinguished names, relative
distinguished names, attribute names, queries) be encoded in UTF-8.
In python-ldap, these are represented as text (str on Python 3).
Attribute values, on the other hand, MAY contain any type of data,
including text.
To know what type of data is represented, python-ldap would need access to the
schema, which is not always available (nor always correct).
Thus, attribute values are always treated as bytes.
Encoding/decoding to other formats – text, images, etc. – is left to the caller.
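
As a minimal sketch of what this means for the caller, the snippet below decodes one result entry by hand. The entry is a made-up stand-in for what a search returns (a DN as text, plus a mapping of attribute names to lists of bytes values). The TEXT_ATTRS set and the decode_entry helper are hypothetical and not part of python-ldap; they stand in for the schema knowledge the application has to supply itself.

```python
# Hypothetical stand-in for one search result: the DN is text (str),
# attribute names are text, and every attribute value is bytes.
dn = "cn=Zoë,ou=people,dc=example,dc=org"
attrs = {
    "cn": [b"Zo\xc3\xab"],               # UTF-8 encoded text
    "mail": [b"zoe@example.org"],
    "jpegPhoto": [b"\xff\xd8\xff\xe0"],  # binary data, not text
}

# Assumption: the application knows which attributes hold UTF-8 text.
TEXT_ATTRS = {"cn", "mail"}


def decode_entry(attrs, text_attrs=TEXT_ATTRS):
    """Return a copy of attrs with the known-text values decoded to str."""
    decoded = {}
    for name, values in attrs.items():
        if name in text_attrs:
            decoded[name] = [value.decode("utf-8") for value in values]
        else:
            decoded[name] = list(values)  # leave binary values as bytes
    return decoded


entry = decode_entry(attrs)
```

Going the other way, text that is to be written as an attribute value has to be encoded by the caller first, for example with "Zoë".encode("utf-8").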
Python 3 introduced a hard distinction between text (str) – sequences of
characters (formally, Unicode codepoints) – and bytes – sequences of
8-bit values used to encode any kind of data for storage or transmission.
Python 2 had the same distinction between str (bytes) and unicode (text).
However, values could be implicitly converted between these types as needed,
e.g. when comparing or writing to disk or the network.
The implicit encoding and decoding can be a source of subtle bugs when not
designed and tested adequately.
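
The Python 3 side of this distinction can be illustrated with a short sketch that uses only built-in types; the variable names are chosen for illustration.

```python
text = "é"                     # one Unicode code point
data = text.encode("utf-8")    # the same character as two bytes

# The two types never compare equal, even for pure-ASCII content.
same = ("abc" == b"abc")

# Mixing them raises TypeError instead of converting implicitly.
try:
    "abc" + b"def"
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Conversion has to be explicit, with a named encoding.
round_trip = data.decode("utf-8")
```

Here len(text) is 1 while len(data) is 2, same and mixed_ok are both False, and round_trip equals the original text.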
In python-ldap 2.x (for Python 2), bytes were used for all fields, including those guaranteed to be text.
From version 3.0 to 3.3, python-ldap uses text where appropriate.
On Python 2, special “bytes mode” settings influenced how text was handled.
From version 3.3 on, only Python 3 is supported. The “bytes mode” settings are deprecated and do nothing.