Version control (VC) is usually more than welcome in IT solutions: it provides ways to roll back unwanted changes, monitor work progress, analyze modification trends, and so on. For Django, reversion seems to be the de facto standard for versioning model instances. Unfortunately, it works by serializing objects (e.g. to JSON) and storing them in a separate revisions table. That is of very low practical value for the latter two of the VC-related actions I mentioned, because monitoring progress or analyzing trends would require deserializing all the stored instances. I’d like to introduce a different solution which is a lot simpler, doesn’t require any additional tables and allows for robust analysis of time trends. Of course it’s not all pros, but I’ll get to that later 😉
Version Control for the Poor (VCP) 😉 is based on the concept of date ranges. Every model instance is tagged with two fields, date_from and date_to, which specify the period during which that instance was the “head” revision of a unique object identified by its order_number field. The current head revision has date_to set to NULL.
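To make the date-range idea concrete, here is a minimal sketch in plain Python (no Django involved). The Revision class, its payload field and the head/as_of helpers are hypothetical names used only for illustration, and treating each range as half-open (date_from included, date_to excluded) is my assumption rather than something the implementation enforces.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Revision:
    order_number: int
    date_from: datetime
    date_to: Optional[datetime]  # None plays the role of NULL: the head revision
    payload: str                 # hypothetical stand-in for the model's real fields


def head(rows, order_number):
    """Return the head revision: the row whose date_to is None."""
    for r in rows:
        if r.order_number == order_number and r.date_to is None:
            return r
    return None


def as_of(rows, order_number, when):
    """Return the revision that was head at `when` (assumed half-open range)."""
    for r in rows:
        if r.order_number != order_number:
            continue
        if r.date_from <= when and (r.date_to is None or when < r.date_to):
            return r
    return None


# Two revisions of the same object; the dates are made up for the example.
rows = [
    Revision(1, datetime(2010, 1, 1, 9, 0, 0), datetime(2010, 1, 1, 9, 0, 5), "v1"),
    Revision(1, datetime(2010, 1, 1, 9, 0, 5), None, "v2"),
]
```

With these rows, head(rows, 1) returns the "v2" row, while as_of(rows, 1, ...) with a timestamp of 09:00:03 returns the "v1" row.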
Many-to-many (M2M) relations are shallow-copied, which means that only the relation to the unique (in the order_number sense) object is preserved. No snapshot of the related object is taken, because none is needed: if the related object is later modified, we can always retrieve its state from before that modification. A deep copy of objects with M2M relations is thus provided indirectly. One (date_from, date_to) pair for an object with an M2M relation may in fact correspond to several revisions, because several instances of a related object may exist whose (rel_date_from, rel_date_to) pairs intersect the (date_from, date_to) range. This also solves the reciprocity problem, i.e. nothing has to be done with the other side of a many-to-many relation when one side is modified.
VCP is implemented at the Model / Model Manager level in such a way that it is completely transparent to e.g. Django Admin.
Now, on to the cons. Django has no notion of the order_number field, so I had to devise a workaround. The first instance of a particular order_number object is always the head revision. Whenever a modification is made, instead of “terminating” that instance by setting its date_to field to the current date and creating a new one, a “terminated copy” is made, and the existing instance is updated by setting its date_from field to the current date and leaving date_to as NULL. A dedicated Model Manager is provided which simply ignores all instances with date_to other than NULL, effectively hiding historical records. This way Django, blind to the order_number concept, can rely on a regular primary key to describe relations to order_numbers. I, on the other hand, have to go through additional join-related hassle to correctly interpret historical records, but that seems a reasonable price to pay for this level of transparency. Another consequence of this approach is the inability to delete versioned models. In principle, date_to of the current instance could just be set to the current date, leaving only terminated instances. But with this sort of Model Manager, what would Django (and Django Admin in particular) do about the objects related to the removed one…
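The “terminated copy” trick is easier to see in a toy simulation. The sketch below is not the Django code; it uses a hypothetical in-memory Table class (with made-up create and modify helpers and a single value column) just to show that the head row keeps its primary key across modifications while history piles up under fresh keys.

```python
from datetime import datetime
from itertools import count


class Table:
    """Hypothetical in-memory stand-in for one database table."""

    def __init__(self):
        self.rows = {}              # pk -> dict of column values
        self._next_pk = count(1)    # auto-incrementing primary key

    def insert(self, **cols):
        pk = next(self._next_pk)
        self.rows[pk] = dict(cols)
        return pk

    def heads(self):
        """What the dedicated manager exposes: rows with date_to unset."""
        return {pk: r for pk, r in self.rows.items() if r["date_to"] is None}


def create(table, order_number, value, now):
    return table.insert(order_number=order_number, value=value,
                        date_from=now, date_to=None)


def modify(table, pk, value, now):
    """Archive the stored state as a terminated copy, then update the head in place."""
    old = table.rows[pk]
    table.insert(**dict(old, date_to=now))  # terminated copy gets a fresh pk
    old.update(value=value, date_from=now)  # head keeps its pk; date_to stays None


t = Table()
head_pk = create(t, order_number=1, value="a", now=datetime(2010, 1, 1, 9, 0, 0))
modify(t, head_pk, "b", now=datetime(2010, 1, 1, 9, 0, 5))
modify(t, head_pk, "c", now=datetime(2010, 1, 1, 9, 0, 10))
```

After the two modifications the table holds three rows: the head (still under head_pk, now with value "c") and two terminated copies carrying "a" and "b". Anything pointing at head_pk by foreign key never notices that versioning took place.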
The example below illustrates the current implementation. Rows marked red are head revisions. The cross table says that the :00-:05 revision of Model1 with order_number 1 was related to Model2 with order_numbers 1 and 2, the :05-:10 revision was related to 1, 2 and 3, :10-:15 was related to 4 and 5, while the current revision is related just to 6. The PKs are not real; they are used just for explanatory purposes. We can also observe that Model2 with order_number 1 was changed at 09:00:02, effectively splitting the :00-:05 revision of Model1 with order_number 1 into two revisions, :00-:02 and :02-:05 (the first one being in relation with Model2/ord_num=1,f1=3, the other with Model2/ord_num=1,f2=2). Pretty nice 😉
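The splitting described above can be sketched in a few lines of plain Python. The function name split_by_related_changes and the helper t are hypothetical, and I am assuming that :00, :02 and :05 stand for seconds past 09:00 on some arbitrary day. The sketch also assumes a closed range, so for a head revision the caller would have to substitute the current time for the NULL date_to.

```python
from datetime import datetime


def split_by_related_changes(date_from, date_to, related_change_times):
    """Split the range at every related-object change strictly inside it."""
    cuts = sorted(c for c in set(related_change_times) if date_from < c < date_to)
    bounds = [date_from] + cuts + [date_to]
    # Pair consecutive boundaries into (start, end) sub-ranges
    return list(zip(bounds[:-1], bounds[1:]))


def t(sec):
    # Hypothetical date; only the seconds matter in this example
    return datetime(2010, 1, 1, 9, 0, sec)


# Model1 revision :00-:05, with Model2/order_number=1 changed at 09:00:02
parts = split_by_related_changes(t(0), t(5), [t(2)])
```

Here parts contains two sub-ranges, :00-:02 and :02-:05, matching the two effective revisions in the example. A change that falls outside the range, or exactly on one of its boundaries, produces no split.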
There are also some purely technical issues with this solution, e.g. keeping sequences for order_number fields (in the current implementation, order_number has to be entered manually), concurrency control (table/row-level locking), transaction handling, etc., but they hardly affect the core idea.
Please share your thoughts about this solution and let me know if you have any ideas for eliminating its shortcomings.
from datetime import datetime

from django.db import models

#
# Simplified versioning
# Performs shallow versioning of objects with m2m relations!
#

class VersionedManager(models.Manager):
    # Hide historical records: only head revisions (date_to IS NULL) are visible.
    # Note: Django 1.6 and later name this method get_queryset.
    def get_query_set(self):
        return super(VersionedManager, self).get_query_set().filter(date_to__isnull=True)

class VersionedModel(models.Model):
    class Meta:
        abstract = True

    order_number = models.IntegerField()
    date_from = models.DateTimeField(blank=True)
    date_to = models.DateTimeField(null=True, blank=True)

    objects = VersionedManager()

    def save(self, *args, **kwargs):
        if self.pk is None:
            # New object: its order_number must not be taken by an existing head.
            existing = None
            try:
                existing = self.__class__.objects.filter(order_number=self.order_number)[0]
            except IndexError:
                pass
            if existing is not None:
                raise Exception('Object with this order_number already exists!')
        else:
            # Modification: store the current database state as a terminated copy.
            old_self = self.__class__.objects.get(pk=self.pk)
            if self.order_number != old_self.order_number:
                raise Exception("Cannot change order_number, it's permanent")
            old_self.pk = None  # forces an INSERT, so the copy gets a new pk
            old_self.date_to = datetime.now()
            super(VersionedModel, old_self).save(*args, **kwargs)
            # Shallow-copy the m2m relations onto the terminated copy.
            for f in self.__class__._meta.many_to_many:
                old_m2m = getattr(self, f.name)
                preserved_m2m = getattr(old_self, f.name)
                for o in old_m2m.all():
                    preserved_m2m.add(o)
        # The head revision keeps its pk and starts a new date range.
        self.date_from = datetime.now()
        self.date_to = None
        super(VersionedModel, self).save(*args, **kwargs)

    def delete(self, *args, **kwargs):
        raise Exception('Cannot remove versioned objects.')