A masterclass in metaclasses
Superdesk ingests news data from many types of external sources, ranging from incoming emails to public RSS feeds and files uploaded to an FTP server. Since these source types often differ significantly from each other, a separate ingest component is required for each of them.
Ingest components are implemented as specialised classes that inherit from the base IngestService class. After an ingest class is defined, it also needs to be registered as a provider, so that Superdesk knows that it exists.
For example, defining two new ingest components might roughly look like the following code snippets (details omitted for brevity):
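A minimal, self-contained sketch of this manual pattern follows. The `register_provider` function, the `providers` registry, and the error identifiers are hypothetical stand-ins chosen for illustration, not Superdesk's actual API:

```python
# Hypothetical stand-ins for illustration; not Superdesk's real API.
providers = {}  # maps provider name -> (ingest class, list of errors)


def register_provider(name, provider_class, errors):
    """Register an ingest class under a unique provider name."""
    if name in providers:
        raise ValueError(f"Provider {name!r} is already registered")
    providers[name] = (provider_class, errors)


class IngestService:
    """Base class for all ingest components (details omitted)."""


class FTPService(IngestService):
    PROVIDER = "ftp"
    ERRORS = ["ftp_connection_error"]  # illustrative error identifier


# Manual registration, repeated after every class definition.
register_provider(FTPService.PROVIDER, FTPService, FTPService.ERRORS)


class RssIngestService(IngestService):
    PROVIDER = "rss"
    ERRORS = ["rss_parse_error"]  # illustrative error identifier


register_provider(RssIngestService.PROVIDER, RssIngestService, RssIngestService.ERRORS)
```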
Both components are registered under some name, along with the list of errors they might raise during the ingest process.
As you can see, the registration is pretty straightforward, as it requires only a few extra lines of code after the class definition. Yet, there are two drawbacks with this approach:
- Registration still needs to be done "manually", even though it is exactly the same for all ingest components.
- There is no prior warning if there happens to be another component that wants to register itself as a provider under the same PROVIDER name. The error occurs only when the application is actually run.
Ideally, new ingest classes would be registered automatically with all the needed name (and other) checks in place, and this is exactly what we did.
Enter the world of black magic - Python metaclasses
"A metaclass - say what??"
-- anonymous confused developer --
If you do not know what metaclasses are, do not worry. Despite being a powerful feature of the Python language, they are rarely used because they are not needed most of the time. They are poorly understood and are even considered voodoo by some developers because of their seemingly magical effects. They can sometimes be useful, though, as we will see below.
But first a little bit of theory. If you are familiar with object-oriented programming, you know that classes are used to create objects (instances of classes). A class defines the common characteristics of its instances, what attributes they have and how they behave.
But where do classes come from, what creates them in the first place? The answer is metaclasses.
The key thing to note is that in Python classes themselves are objects, too! Just pause for a second and repeat this to yourself a couple of times. Classes are objects, and they are instances of metaclasses. In the same way that you create a new object by instantiating a class, you create a new class by instantiating a metaclass.
To illustrate that, let me give you an example:
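A small sketch of such an example, assuming a bare IngestService base class and illustrative PROVIDER values ("ftp" and "rss" are placeholders):

```python
# The usual way: the class statement.
class IngestService:
    """Stand-in base class for this illustration."""


# The other way: instantiate the built-in metaclass `type` directly,
# passing the class name, a tuple of bases, and an attribute dictionary.
FTPService = type("FTPService", (IngestService,), {"PROVIDER": "ftp"})
RssIngestService = type("RssIngestService", (IngestService,), {"PROVIDER": "rss"})

# The results are ordinary classes, and every one is an instance of `type`.
print(type(IngestService) is type)                  # True
print(type(FTPService) is type)                     # True
print(issubclass(RssIngestService, IngestService))  # True
print(RssIngestService().PROVIDER)                  # rss
```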
As you can see, new classes can be created either by using the reserved keyword class (the most common way), or by creating a new instance of a metaclass.
In the example above we used type, Python's built-in metaclass. It accepts three arguments:
- The name of the new class we are creating.
- A tuple of base classes for the new class. We used the IngestService base class for both of the new classes.
- A dictionary mapping the attribute names of the new class to their values. We defined only the PROVIDER attribute.
When invoked with three arguments, type() returns a new class. We assigned the two resulting classes to the variables FTPService and RssIngestService, respectively.
Customising the new class creation
The core idea behind the automatic component registration is that we can hook into the process of creating new classes and customise it. We do that by creating our own metaclass, adjusting the default class creation mechanism in it, and then telling Python to use this custom metaclass instead of the built-in type metaclass.
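A minimal sketch of such a metaclass is shown here. The `providers` registry on the metaclass and the choice of TypeError are assumptions made for illustration, and the registration of each component's error list is left out; the actual Superdesk code may differ:

```python
class AutoRegisteredMeta(type):
    """Metaclass that registers every class defining PROVIDER."""

    # Hypothetical registry: provider name -> registered class.
    providers = {}

    def __new__(mcs, name, bases, attrs):
        # Let the parent metaclass (type) build the class as usual.
        new_class = super().__new__(mcs, name, bases, attrs)

        # Only classes that define PROVIDER in their own body get registered.
        provider = attrs.get("PROVIDER")
        if provider is not None:
            if provider in mcs.providers:
                raise TypeError(
                    f"PROVIDER {provider!r} already used by "
                    f"{mcs.providers[provider].__name__}"
                )
            mcs.providers[provider] = new_class

        return new_class
```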
The __new__() method receives four arguments:
- The metaclass that we want to instantiate (i.e. create a new class from it).
- The name of the new class.
- A tuple with all base classes of the class we are going to create.
- A dictionary with all the attributes of the new class (their names and values).
We first create a new class by using the __new__() method of the parent metaclass (i.e. type). We then check for the existence of the PROVIDER attribute and if the new class defines it, we register the new class as a provider under this name (if the value of PROVIDER is not yet taken, of course).
We then set the metaclass of the base IngestService class to the custom metaclass we just defined (AutoRegisteredMeta). From then on, all custom ingest classes are registered automatically behind the scenes. Developers no longer have to register them explicitly or check all existing ingest classes for possible PROVIDER name conflicts.
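To see the effect end to end, here is a self-contained sketch. It uses a condensed, illustrative version of the metaclass (the `providers` registry and the error message are assumptions) and shows that a name conflict now surfaces as soon as the offending class statement is executed:

```python
class AutoRegisteredMeta(type):
    """Condensed version of the registering metaclass (illustrative)."""

    providers = {}  # hypothetical registry: provider name -> class

    def __new__(mcs, name, bases, attrs):
        new_class = super().__new__(mcs, name, bases, attrs)
        provider = attrs.get("PROVIDER")
        if provider is not None:
            if provider in mcs.providers:
                raise TypeError(f"PROVIDER {provider!r} is already taken")
            mcs.providers[provider] = new_class
        return new_class


class IngestService(metaclass=AutoRegisteredMeta):
    """Base class; defines no PROVIDER, so it is not registered."""


class FTPService(IngestService):
    PROVIDER = "ftp"  # registered as soon as this class statement runs


def define_conflicting_service():
    """Try to define a second class with the same PROVIDER name."""
    class AnotherFTPService(IngestService):
        PROVIDER = "ftp"
    return AnotherFTPService


try:
    define_conflicting_service()
except TypeError as exc:
    conflict_message = str(exc)  # raised while the class was being created
```

Subclasses inherit the metaclass from IngestService, so no ingest class needs to mention AutoRegisteredMeta itself.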
If you are interested, see the pull request that introduced this improvement in Superdesk. Sure, it might look a bit like black magic at first glance, but when used properly, metaclasses can turn out to be an elegant solution for specific types of problems.