An introduction to Python Descriptors

Written by Dan Sackett on December 8, 2014

Diving into the low level workings of Python, you will inevitably come across descriptors.

In the most basic explanation, a descriptor is an object that implements any of the __get__(), __set__(), and __delete__() methods. Descriptors are the machinery behind the @property, @classmethod, and @staticmethod decorators and the super() built-in. In more practical terms, descriptors are a way to regulate access to and manipulation of an attribute on an object. For instance, they can restrict type, provide validation, and control how you interact with the underlying attribute dictionary of your object.
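A quick way to check this for yourself is to ask the built-in types which protocol methods they carry. The snippet below is only a sanity check and not part of the walkthrough that follows; the helper name protocol_methods is made up for illustration.

```python
def protocol_methods(cls):
    """Return the descriptor protocol methods that a type defines."""
    names = ('__get__', '__set__', '__delete__')
    return [name for name in names if hasattr(cls, name)]


def plain_function():
    pass


# property implements the full protocol.
print(protocol_methods(property))

# classmethod, staticmethod, super and plain functions only implement __get__().
for cls in (classmethod, staticmethod, super, type(plain_function)):
    print(cls.__name__, protocol_methods(cls))
```

Plain functions show up in that list because their __get__() method is what turns a function stored on a class into a bound method.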

To get a good picture of these, think about how Django defines fields for forms and models. BooleanField, CharField, and the other fields behave much like descriptors: these classes provide an interface for setting and getting attributes based on their types. We use descriptors to build these rules so that the actual values remain in the correct format.

When learning about descriptors, you will likely see a mention of the descriptor protocol, which simply means implementing the methods I mentioned, like so:

def __get__(self, obj, type=None):
    # return value
    pass
def __set__(self, obj, value):
    # returns None
    pass
def __delete__(self, obj):
    # returns None
    pass

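When writing these, there are two primary types. A data descriptor is an object that defines __set__() or __delete__(), usually alongside __get__(), while a non-data descriptor defines only __get__(). The distinction matters during attribute lookup: a data descriptor takes precedence over an entry of the same name in the instance dictionary, whereas a non-data descriptor can be shadowed by one. For a full writeup on the descriptor protocol, check out the official documentation.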
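To make the difference between the two types concrete, here is a minimal sketch. The class names NonDataDescriptor, DataDescriptor, and Example are invented for this illustration; the only thing it tries to show is which object answers the lookup when the instance dictionary holds an entry with the same name.

```python
class NonDataDescriptor(object):
    """Defines only __get__(), so an instance attribute can shadow it."""
    def __get__(self, obj, type=None):
        return 'from non-data descriptor'


class DataDescriptor(object):
    """Defines __get__() and __set__(), so it wins over the instance."""
    def __get__(self, obj, type=None):
        return 'from data descriptor'

    def __set__(self, obj, value):
        # Ignore the value on purpose; we only care who answers the lookup.
        pass


class Example(object):
    non_data = NonDataDescriptor()
    data = DataDescriptor()


example = Example()
# Write straight into the instance dictionary, bypassing normal assignment.
example.__dict__['non_data'] = 'from instance'
example.__dict__['data'] = 'from instance'

print(example.non_data)  # from instance
print(example.data)      # from data descriptor
```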

In the end though, a good way to define a descriptor is by calling it a reusable property. I’ll show some examples of why this is so below.

How do we use it?

To me, descriptors are hard to explain without code. I'm hoping I can show you what a descriptor really is and why we want it in the following trial and error code blocks. Let’s start with a basic example. This example will include a class to track the stats for a basketball team during a game.

class BasketballGameStats(object):
    def __init__(self, team_name, points, rebounds, steals, blocks):
        self.team_name = team_name
        self.points = points
        self.rebounds = rebounds
        self.steals = steals
        self.blocks = blocks

        self.readonly_fields = ['team_name']

    def update(self, attr, value):
        # Only existing, non-readonly attributes may be updated
        if attr in self.__dict__ and attr not in self.readonly_fields:
            if value < 0:
                raise ValueError('Positive values only!')
            setattr(self, attr, value)
        else:
            raise ValueError('Cannot update readonly field')

As we see from this class, we have an __init__() method that takes a number of arguments and sets them as attributes. We then specify an attribute called readonly_fields to track the fields that we don't want to allow updates to. Our only method is one for updating an attribute. In this update method, we check that the attribute exists and that it isn't a readonly field, and then that the new value isn't negative. If so, we set the attribute; otherwise we raise a ValueError.

Let's see how we would work with this class.

>>> stats = BasketballGameStats('Sixers', 5, 2, 1, 6)
>>> print(stats.points)
5
>>> stats.update('points', 10)
>>> print(stats.points)
10
>>> try:
...     stats.update('blocks', -2)
...     print(stats.blocks)
... except ValueError as e:
...     print(e)
...
Positive values only!
>>> try:
...     stats.update('team_name', 1)
...     print(stats.team_name)
... except ValueError as e:
...     print(e)
...
Cannot update readonly field

As we see, we can run our update method on a class instance and update attributes as we wish. If the value we're setting is negative, then we see an error. If the attribute is a readonly field, then we see an error as well. This works.

When we look at this approach though, there are some things that aren't ideal. For starters, if we add a new attribute to this class and it's a readonly field, we need to add it to our list. If we forget this, then we could get some weird results. Another drawback to this approach is the fact that we have to call a method on our class and specify the attribute based on a string value. When using a class, it's a lot better to use dot notation and not rely on a method to manage attributes in Python.

We can do better.

One way to make sure we're using dot notation and not managing a list of the readonly fields is by using the @property decorator. This decorator is a descriptor in itself as I mentioned above and it's a way to build getters and setters for your attributes. For instance, we can update our example like this:

class BasketballGameStats(object):
    def __init__(self, team_name, points, rebounds, steals, blocks):
        self.team_name = team_name
        self.points = points
        self.rebounds = rebounds
        self.steals = steals
        self.blocks = blocks

    @property
    def team_name(self):
        return self._team_name

    @team_name.setter
    def team_name(self, value):
        # Check for the key itself so an empty first value still locks the field
        if '_team_name' in self.__dict__:
            raise ValueError('Cannot update readonly field')
        self._team_name = value

    @property
    def points(self):
        return self._points

    @points.setter
    def points(self, value):
        if value < 0:
            raise ValueError('Positive values only!')
        self._points = value

    @property
    def rebounds(self):
        return self._rebounds

    @rebounds.setter
    def rebounds(self, value):
        if value < 0:
            raise ValueError('Positive values only!')
        self._rebounds = value

    @property
    def steals(self):
        return self._steals

    @steals.setter
    def steals(self, value):
        if value < 0:
            raise ValueError('Positive values only!')
        self._steals = value

    @property
    def blocks(self):
        return self._blocks

    @blocks.setter
    def blocks(self, value):
        if value < 0:
            raise ValueError('Positive values only!')
        self._blocks = value

Cool, so now we have attributes that we can call through dot notation and each one manages validation itself. Right away, we should see a few things wrong with this though. For one, the amount of code that we had to write has increased. And more importantly, that new code is repetitive. We're violating the DRY (don't repeat yourself) principle big time here and if we add more attributes this will get out of hand.

Still, let's run it to be sure that this works:

>>> stats = BasketballGameStats('Sixers', 5, 2, 1, 6)
>>> print(stats.points)
5
>>> stats.points = 10
>>> print(stats.points)
10
>>> try:
...     stats.blocks = -2
...     print(stats.blocks)
... except ValueError as e:
...     print(e)
...
Positive values only!
>>> try:
...     stats.team_name = 1
...     print(stats.team_name)
... except ValueError as e:
...     print(e)
...
Cannot update readonly field

Perfect, we have the same results. Since this isn't ideal though, let's see how we can update this class to be super clean and super efficient. For this, we're finally going to get into descriptors. Let's start by naming the functionality that we want. For one, we want the code to be DRY. We should only define our constraints once and then use them for all attributes. Another thing we want is to keep using dot notation.

Other than that, we should be OK. So let's see how we can write and use descriptors.

class ReadOnlyField(object):
    def __init__(self):
        self.data = {}

    def __get__(self, instance, owner):
        return self.data.get(instance)

    def __set__(self, instance, value):
        # Check for the key itself so an empty first value still locks the field
        if instance in self.data:
            raise ValueError('Cannot update readonly field')
        self.data[instance] = value

class NonNegativeField(object):
    def __init__(self):
        self.data = {}

    def __get__(self, instance, owner):
        return self.data.get(instance, 0)

    def __set__(self, instance, value):
        if value < 0:
            raise ValueError('Positive values only!')
        self.data[instance] = value

class BasketballGameStats(object):
    team_name = ReadOnlyField()
    points = NonNegativeField()
    rebounds = NonNegativeField()
    steals = NonNegativeField()
    blocks = NonNegativeField()

    def __init__(self, team_name, points, rebounds, steals, blocks):
        self.team_name = team_name
        self.points = points
        self.rebounds = rebounds
        self.steals = steals
        self.blocks = blocks

Whoa, we've got a winner! A little confusing when seeing it for the first time though. Let's start at the top.

We first define a ReadOnlyField class. In this class, we follow the descriptor protocol and provide a __get__() and a __set__() method. As an underlying data store, we use a dictionary called data, keyed by the instance that owns the value. In this case, we want the field to be set on initialization, but then it becomes immutable. For this, our __set__() method will check if the data key exists and if it does then the exception is raised. Otherwise, we can set it the first time.

All in all, this kind of makes sense.

We do a lot of the same things in our NonNegativeField class. In our __set__() method we do the value check and set the value if it passes.

Using these new descriptors becomes very easy. In our main basketball class, we define the attributes on the class itself with the appropriate descriptors. Now in our constructor, when we assign the passed in values to our attributes we see the magic happen. Since we defined the attributes on the class itself, self.team_name = team_name will dig into our descriptor. This will look something like this:

self.team_name = team_name
==
type(self).__dict__['team_name'].__set__(self, team_name)
==
ReadOnlyField.__set__(BasketballGameStats.__dict__['team_name'], self, team_name)

So as we see, defining the descriptor on the class means that when we get or set the attribute with dot notation, we are actually calling the descriptor's methods. This is why they work. We can set a ReadOnlyField once, and then it's set for good. A NonNegativeField is handled the same way, except that it can be updated as often as we like so long as the value isn't negative.
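If you want to convince yourself of this, you can call the descriptor by hand and compare. The following is a small sketch, not the article's basketball class: LoggedField and Team are hypothetical names, and the calls list exists only so we can see which methods ran.

```python
class LoggedField(object):
    """Hypothetical descriptor that records every call made to it."""
    def __init__(self):
        self.data = {}
        self.calls = []

    def __get__(self, instance, owner):
        self.calls.append('get')
        return self.data.get(instance)

    def __set__(self, instance, value):
        self.calls.append('set')
        self.data[instance] = value


class Team(object):
    name = LoggedField()


team = Team()
# Reading from the class __dict__ gives us the descriptor object itself,
# without triggering its __get__() method.
descriptor = Team.__dict__['name']

team.name = 'Sixers'                          # dot notation
descriptor.__set__(team, 'Sixers')            # the explicit form of the same thing

value = team.name                             # dot notation
same_value = descriptor.__get__(team, Team)   # the explicit form of the same thing

print(descriptor.calls)  # ['set', 'set', 'get', 'get']
```

Both forms end up in the same methods, which is why the log shows two sets followed by two gets.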

Looking at this, it's clear that this is ideal. Now we have DRY code and a set of field types that can be reused for any attribute in any class. This is super useful. Let's see that things work like we expected:

>>> stats = BasketballGameStats('Sixers', 5, 2, 1, 6)
>>> print(stats.points)
5
>>> stats.points = 10
>>> print(stats.points)
10
>>> try:
...     stats.blocks = -2
...     print(stats.blocks)
... except ValueError as e:
...     print(e)
...
Positive values only!
>>> try:
...     stats.team_name = 1
...     print(stats.team_name)
... except ValueError as e:
...     print(e)
...
Cannot update readonly field

Yep, we got the same results. Hopefully you can see why we care about descriptors and why they can make your code more readable and more reusable.

When should we use it?

Descriptors can be a little confusing at first but there are definitely places where they come in handy. In the above example, we saw how we can use them to define a template for an attribute. This also provided a way to validate input without adding a lot of complexity to the class that's using the attribute. Another thing we can do with a descriptor is add additional functionality onto an attribute. For instance, we can provide methods that apply to a type.
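As one more illustration of the template idea, here is a sketch of a descriptor that restricts an attribute to a single type. TypedField and Player are names I made up for this example, and the error message format is arbitrary; treat it as one possible shape rather than a recommended implementation.

```python
class TypedField(object):
    """Hypothetical descriptor that only accepts values of one type."""
    def __init__(self, expected_type):
        self.expected_type = expected_type
        self.data = {}

    def __get__(self, instance, owner):
        return self.data.get(instance)

    def __set__(self, instance, value):
        if not isinstance(value, self.expected_type):
            raise TypeError('Expected %s, got %s' % (
                self.expected_type.__name__, type(value).__name__))
        self.data[instance] = value


class Player(object):
    name = TypedField(str)
    number = TypedField(int)

    def __init__(self, name, number):
        self.name = name
        self.number = number


player = Player('Allen Iverson', 3)
try:
    player.number = 'three'
except TypeError as e:
    print(e)  # Expected int, got str

print(player.number)  # 3, the rejected value was never stored
```

The same TypedField class serves both attributes, with the expected type passed in as an argument, so the rule is written once and configured per attribute.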

In reality though, writing a custom descriptor is a last resort for a lot of people because it's a very low level thing. My example above, using multiple @property declarations that are the same, is an ideal use case. When evaluating whether it's a good idea to write a custom descriptor, follow this list:

Descriptors are a very advanced Python topic and writing them can confuse people who have never seen them before. Still, when done right they can be very clean and effective.

Gotchas

One thing to always remember is to define your descriptors at the class level, not inside the constructor. A descriptor assigned to an instance is stored as an ordinary attribute, so its __get__() and __set__() methods are never called. Another thing to always remember is to use a dictionary to keep a separate value per instance within the descriptor. If you were, for instance, to set your values on the descriptor directly, then every instance of the class would share that one value, because a single descriptor object serves them all. This is annoying, but if you run into it you should now know why.
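Both gotchas are easy to reproduce. The sketch below uses made-up names (SharedValueField, PerInstanceField, Team) and is deliberately broken in two places so that the symptoms are visible.

```python
class SharedValueField(object):
    """Broken on purpose: the value lives on the descriptor itself."""
    def __get__(self, instance, owner):
        return getattr(self, 'value', None)

    def __set__(self, instance, value):
        self.value = value


class PerInstanceField(object):
    """Keeps one value per instance in a dictionary."""
    def __init__(self):
        self.data = {}

    def __get__(self, instance, owner):
        return self.data.get(instance)

    def __set__(self, instance, value):
        self.data[instance] = value


class Team(object):
    shared = SharedValueField()
    separate = PerInstanceField()

    def __init__(self):
        # Broken on purpose: a descriptor stored on the instance is just an
        # ordinary object, so its __get__() and __set__() are not called.
        self.on_instance = PerInstanceField()


sixers, knicks = Team(), Team()
sixers.shared, knicks.shared = 10, 20
sixers.separate, knicks.separate = 10, 20

print(sixers.shared)    # 20, the second assignment overwrote the first
print(sixers.separate)  # 10
print(type(sixers.on_instance).__name__)  # PerInstanceField, the object itself
```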

Conclusion

I'll reiterate that descriptors are an advanced Python topic. They aren't used in a lot of simple cases, but when you need them you'll really see the benefit of them. There are a lot of good resources out there on the topic including the following links:

I'd love to hear about use cases you've run into with descriptors as I'm still learning on this topic too.

