.. _versions-upgrading-and-migration:

Versions, Upgrading And Migration
*********************************

The two standard methods for recovering application objects both return a 2-tuple of the application object
and a version tag. This section is about the latter and how it can be used to write applications that behave
well in the presence of old files.

In The Beginning
================

This is a repeat of the journey described in :ref:`more-about-types` except this time the changes made to
``ReceivedJob`` will be tracked. A simple audit trail is created and maintained. Eventually pieces of that
trail pop out during recovery of application objects, as version tags.

To illustrate the flow of a version tag, consider the original declaration:

.. literalinclude:: class-and-store/received_job_basic.py

The stored representation looks like:

.. literalinclude:: class-and-store/received_job_basic.json

There is no evidence of anything suggesting version support - yet. This absence of any version information
is deliberate and the proper representation of an `initial version`. The library looks for version information
inside the stored representation and if there is nothing there, it fabricates the ``"0.0"`` tag.

Consider a store-and-recover cycle of a default ``ReceivedJob``::

    >>> f = ar.File('job', ReceivedJob)
    >>> j = ReceivedJob()
    >>> f.store(j)
    >>> r, v = f.recover()
    >>> print(r)
    <__main__.ReceivedJob object at 0x7fc300dd7150>
    >>> print(v)
    None

The version tag has the ``None`` value. It might have been reasonable to expect the ``"0.0"`` tag but the
library does more than simply move these tags around. The tag returned by the
:py:meth:`~ansar.file.File.recover` method is the result of comparing the tag associated with the recovered
materials (i.e. the stored job) and the tag for the current version of ``ReceivedJob`` inside the running
application.
In the demonstration above these values are guaranteed to be the same as the job was stored and
recovered by the same application. Where the two tags are the same the library returns a ``None`` tag to the
application, to indicate that no more needs to be done.

Everything Changes
==================

The first change to ``ReceivedJob`` was to record the creation time. This change, along with a
`version history`, is shown here:

.. literalinclude:: class-and-store/received_job_created_version.py

An application running with this version of the ``ReceivedJob`` declaration is no longer at the initial
version. This is what happens when that application recovers that same stored job::

    >>> f = ar.File('job', ReceivedJob)
    >>> r, v = f.recover()
    >>> print(r)
    <__main__.ReceivedJob object at 0x7f3dd9197b10>
    >>> print(v)
    0.0

The library is notifying the application that the stored job is an older version than the version in use by
the application. Storing a representation of the current ``ReceivedJob`` declaration produces the following:

.. literalinclude:: class-and-store/received_job_created_version.json

A pair of version tags is injected into the object under the opaque name ``_`` (underscore), during the
encoding process. The name is a nod to the use of underscore in languages like Python and Go, where it
silences compiler and lint tools. Other names considered were visually noisy; the need to filter out the
version tags was distracting during viewing and editing.

The two tags come from the beginning and the end of the version history, the latter also being the current
version. The history is currently defined with two entries so there is the appearance that the full history
is being injected into every stored representation - which is not the case.

The reasons for including the pair rather than just the current version tag are beyond the scope of this
documentation. They are related to networking and component interoperability.
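
The comparison behind the returned tag can be sketched in a few lines. This is an illustration only and not
the library's implementation; the function name ``reported_tag`` and its two parameters are hypothetical. It
assumes that a representation without version information is treated as ``"0.0"`` and that matching tags are
reported as ``None``:

.. code-block:: python

    def reported_tag(stored, current):
        """Hypothetical sketch of the tag handed back to the application.

        stored  -- tag found in the stored representation, or None if absent
        current -- last tag in the application's version history
        """
        # A representation without version information is the initial version.
        stored = stored or '0.0'
        # Matching tags mean there is nothing more for the application to do.
        if stored == current:
            return None
        return stored

    # An application at "0.1" recovers a job stored before any history existed.
    print(reported_tag(None, '0.1'))    # 0.0
    # The same job recovered by the original application.
    print(reported_tag(None, '0.0'))    # None

The two calls mirror the two interactive sessions shown earlier: the same file yields ``None`` for the
original application and ``0.0`` for the application that has moved on.
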

A Typical Change
================

Making one more change demonstrates a more typical change, i.e. a change not involving the initial version.
A list of email addresses is added below:

.. literalinclude:: class-and-store/received_job_who_version.py

This produces the stored representation:

.. literalinclude:: class-and-store/received_job_who_version.json

The representation more clearly shows the nature of the injected version information. The first and last
version tags from the history appear as a pair.

Given the following set of test files:

* ``received-job``, an instance of the original class
* ``received-job-created``, an instance with the ``created`` member added
* ``received-job-who``, an instance with the ``who`` member added

Recovery of each file by the current application reports a different tag::

    >>> f = ar.File('received-job', ReceivedJob)
    >>> r, v = f.recover()
    >>> print(r)
    <__main__.ReceivedJob object at 0x7fc3ef7884d0>
    >>> print(v)
    0.0
    >>> f = ar.File('received-job-created', ReceivedJob)
    >>> r, v = f.recover()
    >>> print(v)
    0.1
    >>> f = ar.File('received-job-who', ReceivedJob)
    >>> r, v = f.recover()
    >>> print(v)
    None

The last recovered version tag is again ``None``, reflecting the fact that the stored version and the
application version are the same.

An Intermission And That Tag
============================

A version tag is a string containing a pair of small integers separated by a dot. The two integers are
referred to as the major and minor version numbers.

Incrementing The Minor Number
-----------------------------

The minor number is incremented on every change to the associated class - when a member is added or deleted,
or when there are multiple additions and deletions. The type of each member cannot change. If a member needs
to change type then it must also take a different name, i.e. it becomes an add. The old member remains for
when an instance of the associated version is recovered.

Deletion of a member is in name only - members are only truly deleted from the class declaration in very
specific circumstances (see below).
This means that when a nominal deletion occurs, the minor number is bumped even though
the class definition remains the same.

This process can seem strange. In practice the minor number does `not` need to be incremented on `every`
change. The version machinery forms the basis for application version support. That support is only critical
in the operations of formal installations. The rule should be relaxed in the development environment. The
more pragmatic rule is that the minor number is incremented at every `release` of new software. The new
software takes its own version number with it and that number is distinct from every stored version tag that
it may encounter in the field.

Dropping Old Versions
---------------------

Eventually there can be a large number of versions. The associated materials become unwieldy to deal with and
any software creating and expecting stored representations at those versions is long gone and forgotten.

The oldest entry (or entries) in the version history are simply removed - the related lines of code defining
the version-description pairs are deleted. Where the description refers to the nominal deletion of a member,
that member can now be properly deleted from the class declaration.

During the recovery of a stored representation, the library extracts the stored version information and
compares it to the current application version information. If the stored version tag is older than the
oldest version in the application version history, the recovery process rejects the input and raises an
exception. The stored representation is considered to be `unsupported`. This acknowledges that the
representation appears valid in all other ways but the executing application is no longer maintaining that
area of code.

.. code-block:: python
    :emphasize-lines: 2,3

    rjh = (
        ('0.0', 'Initial version'),
        ('0.1', 'Added created timestamp'),
        ('0.2', 'Added the who list of email addresses'),
        ('0.3', 'Added accounting'),
        ('0.4', 'Deleted the priority number'),
        ('0.5', 'Added permissions'),
    )

By deleting the two lines highlighted above, the ``"0.0"`` and ``"0.1"`` versions immediately become
unsupported. Any encounter with stored representations at those versions will result in exceptions. At the
same time all code specific to those versions can be retired from the application.

When housekeeping work finally catches up with the ``"0.4"`` version and the relevant version-description
pair is deleted from the history, the ``priority`` parameter and member can also be deleted from the
``ReceivedJob`` class. Any attempt to recover materials at that version (or before) will result in a version
exception. This pre-empts the decoding exception that would otherwise occur, due to the presence of a
``priority`` member in the recovered materials and nowhere for it to go.

A Brave New World
-----------------

The major version number is used to signal a complete reset. The major number is incremented by one and the
minor number returns to zero. The version history contains the lone entry::

    rjh = (
        ('1.0', 'Brave new world'),
    )

The class is now effectively at an initial version. Recovery of any representation tagged with the previous
major number, e.g. ``"0.24"``, results in a rejection by the library and an exception is raised. The
representation is considered `inappropriate`, a term that distinguishes it from `unsupported`.

Moving to a new major number is likely to reflect significant technical changes in the application - a shift
to new tools and/or architecture. Perhaps a re-targeting from customer premises deployment to the cloud.
There may be commercial considerations involved. For whatever reason the application wants to continue using
the name in the class declaration (e.g.
``ReceivedJob``) but it is starting a new eco-system of software and stored representations, and is not
offering any integration with the previous eco-system.

Resuming The Journey
====================

Version support begins with the object and version tuple returned by the two ``recover`` methods. This
section looks at how an application might respond to these values. The goal is seamless operation in a
mixed-version world, but there are at least 2 different ways that this can be achieved.

A First Attempt
---------------

The ``ReceivedJob`` class has been through 2 changes, giving a total of 3 versions that might be encountered
by an application tasked with processing these objects:

.. code-block:: python

    rjh = (
        ('0.0', 'Initial version'),
        ('0.1', 'Added created timestamp'),
        ('0.2', 'Added the who list of email addresses'),
    )

Every recovery of a job must contend with all three:

.. code-block:: python

    j, v = f.recover()
    if v:
        if v == '0.0':
            j.created = DEFAULT_CREATED
            j.who.append(DEFAULT_EMAIL)
        elif v == '0.1':
            j.who.append(DEFAULT_EMAIL)
        else:
            not_supported()

A version value of ``None`` is ignored. Otherwise a series of conditionals arranges for a patching of the job
object, according to the detected version. Default values are assigned to those members that did not exist in
the respective versions of a ``ReceivedJob``. If the version is not recognised the application calls an
error routine.

This implementation of version support meets the primary requirement, i.e. seamless processing of mixed
versions. A tacit decision was made to promote or `upgrade` older versions to an impersonation of the current
version. The actual processing of the job can begin without regard to details of the stored representation.

A different coding style looks long-winded but brings an advantage:

.. code-block:: python

    j, v = f.recover()
    if v:
        if v == '0.0':
            j = ReceivedJob(created=DEFAULT_CREATED,
                unique_id=j.unique_id, title=j.title,
                priority=j.priority, service=j.service,
                body=j.body, who=[DEFAULT_EMAIL])
        elif v == '0.1':
            j = ReceivedJob(created=j.created,
                unique_id=j.unique_id, title=j.title,
                priority=j.priority, service=j.service,
                body=j.body, who=[DEFAULT_EMAIL])
        else:
            not_supported()

An entirely new job is constructed from the information provided by each version. The result is something
closer to a real job in that it uses the current definition of ``ReceivedJob.__init__``. Without explicitly
having to do so this approach can drop members that are no longer used. The history of ``ReceivedJob`` does
not include any deletions (nominal deletions would count here) so there is currently no benefit. It can be
significant in the future, particularly where large, unused members are involved and patched up jobs are
being written back to where they came from.

A further issue with both of these coding styles is that they are `inline`. Over time the application will
accumulate more than one call to :py:meth:`~ansar.file.File.recover` and the version handling has to be
repeated at each of them.

An Upgrade Plan
---------------

A more forward-thinking style of coding is to move all the version-related activities to a dedicated function
and call it something sensible like ``upgrade``. It cleans up the call site:

.. code-block:: python

    j, v = f.recover()
    j = upgrade(j, v)

It plays nice with the ``dict`` comprehensions of :py:class:`~ansar.folder.Folder` objects:

.. code-block:: python

    jobs = {k: upgrade(j, v) for k, j, v in f.recover()}

Lastly, it allows for another kind of growth. There is a reasonable chance that the application will involve
more than one type of persisted object:

.. code-block:: python

    def upgrade(r, v):
        if isinstance(r, ReceivedJob):
            if v:
                if v == '0.0':
                    return ReceivedJob(created=DEFAULT_CREATED,
                        unique_id=r.unique_id, title=r.title,
                        priority=r.priority, service=r.service,
                        body=r.body, who=[DEFAULT_EMAIL])
                elif v == '0.1':
                    return ReceivedJob(created=r.created,
                        unique_id=r.unique_id, title=r.title,
                        priority=r.priority, service=r.service,
                        body=r.body, who=[DEFAULT_EMAIL])
                else:
                    not_supported()
            return r
        elif isinstance(r, License):
            if v:
                ..

Eventually a single application ``upgrade`` function may become multiple functions, each one dealing with a
logical grouping of types.

.. note::

    There are several ways to break down the application ``upgrade`` into more digestible pieces. The
    function and the breakdown are both design and implementation issues for the specific application. The
    actual transformation of stored representations cannot be provided by a library function.

    Where the number of types involved becomes unwieldy or there is a performance issue, a more advanced
    dispatching technique may be applied, e.g. a ``dict`` of ``upgrade`` functions with object type as the
    key.

Smart Upgrades
--------------

``DEFAULT_CREATED`` and ``DEFAULT_EMAIL`` are assigned to the relevant members when the recovered version
lacks those particular values. These are hardcoded constants and there will likely be scenarios where runtime
values are needed. The optimal approach is a matter of design and preference. A few suggestions follow.

Values may be computed just prior to the recovery site, or even perhaps just prior to the upgrade:

.. code-block:: python

    who = [get_who()]

    jobs = {}
    for k, j, v in f.recover():
        hi = get_hi(j)
        lo = get_lo(j)
        j = upgrade(j, v, who=who, hi=hi, lo=lo)
        jobs[k] = j

The implication is that ``get_hi`` and ``get_lo`` return values that are based on the values present in each
``ReceivedJob``. The ``who`` value does not share that dependency and can be calculated ahead of time, prior
to the ``for`` loop.
These different runtime values are gathered together by the call to ``upgrade`` and become available
for population of ``ReceivedJob`` members.

Another arrangement elicits a slightly different behaviour:

.. code-block:: python

    who = None
    ..
    global who
    if who is None:
        # Keep trying until get_who() has something to offer.
        w = get_who()
        who = [w] if w is not None else None

    jobs = {}
    for k, j, v in f.recover():
        hi = get_hi(j)
        lo = get_lo(j)
        j = upgrade(j, v, who=who, hi=hi, lo=lo)
        jobs[k] = j

The ``get_who`` function is called on every visit to this code site, until it returns a non-``None`` value.
After that the saved list is reused. The ``upgrade`` function is of course aware of the availability issue
around the ``who`` value.

Any runtime values that have no special connection to particulars of a ``recover`` site can be located with
the ``upgrade`` function. Similar options exist with respect to placement of the different code elements:

.. code-block:: python

    who = None

    def upgrade(r, v, hi=100, lo=10):
        if isinstance(r, ReceivedJob):
            if v:
                global who
                if who is None:
                    # Keep trying until get_who() has something to offer.
                    w = get_who()
                    who = [w] if w is not None else None
                if v == '0.0':
                    return ReceivedJob(created=DEFAULT_CREATED,
                        unique_id=r.unique_id, title=r.title,
                        priority=r.priority, service=r.service,
                        body=r.body, who=who)
                elif v == '0.1':
                    return ReceivedJob(created=r.created,
                        unique_id=r.unique_id, title=r.title,
                        priority=r.priority, service=r.service,
                        body=r.body, who=who)
                ..
                else:
                    not_supported()
            return r
        elif isinstance(r, License):
            if v:
                ..

Migration - Reducing The Upgrade Workload
-----------------------------------------

The goal was seamless operation in a mixed-version world and the ``upgrade`` function ticks that box. An
application with a smart, solid implementation of ``upgrade`` can focus its attention on the current version
of ``ReceivedJob``.

Any application that is repeatedly upgrading the same stored representations is missing an opportunity to
avoid the overhead of every upgrade after the first. This can happen with an application configuration file,
or a job scheduler polling a folder for work. In the case of the former the perception of overhead is low.
The reality for the latter can be quite different.

A small function provides another option:

.. code-block:: python

    def migrate(f, upgrade, *args, **kwargs):
        r, v = f.recover()
        a = upgrade(r, v, *args, **kwargs)
        if id(a) != id(r):
            f.store(a)
        return a

This exact function is available within the library - :py:func:`~ansar.version.migrate`. Usage requires a
:py:class:`~ansar.file.File` object so the loop is now based on :py:meth:`~ansar.folder.Folder.each`:

.. code-block:: python

    def get_jobs(spool):
        jobs = {}
        for f in spool.each():
            j = ar.migrate(f, upgrade)
            k = spool.key(j)
            jobs[k] = j
        return jobs

.. note::

    Runtime values for the ``upgrade`` function have been omitted for clarity. The ``args`` and ``kwargs``
    parameters allow the ``migrate`` function to forward them on to ``upgrade`` when needed.

A list comprehension reduces to:

.. code-block:: python

    jobs = [ar.migrate(f, upgrade) for f in spool.each()]

The ``migrate`` function detects when the ``upgrade`` function returns a different object to the recovered
``ReceivedJob`` and stores the result back into the file that it came from. If the file is ever recovered
again by the same application, the migrated job gets the express treatment and passes through the upgrade
machinery as a current version, i.e. no additional work required.

Software applying the ``migrate`` style of version support, rather than the ``upgrade`` style, brings an
"auto data migration" behaviour to every software release process. Wherever it goes the files it works with
are brought up-to-date with respect to the latest version histories.
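
The store-back behaviour can be demonstrated without touching the file system. The sketch below is an
illustration under stated assumptions: ``MemoryFile`` is a hypothetical stand-in for
:py:class:`~ansar.file.File` that offers only ``recover`` and ``store``, the ``Job`` class and ``upgrade``
function are simplified for the example, and ``migrate`` repeats the function shown earlier. The first call
upgrades a ``"0.0"`` job and stores it, the second call finds a current job and stores nothing:

.. code-block:: python

    class Job:
        """Simplified, hypothetical job with two members."""
        def __init__(self, title='', who=None):
            self.title = title
            self.who = who or []

    class MemoryFile:
        """Hypothetical stand-in for a File: holds one object and its tag."""
        CURRENT = '0.2'

        def __init__(self, job, tag):
            self.job, self.tag = job, tag
            self.stores = 0

        def recover(self):
            # Report None when the stored tag matches the current version.
            v = None if self.tag == self.CURRENT else self.tag
            return self.job, v

        def store(self, job):
            # Storing always writes at the current version.
            self.job, self.tag = job, self.CURRENT
            self.stores += 1

    def upgrade(r, v):
        if v is None:
            return r    # Already current, hand back the same object.
        return Job(title=r.title, who=['nobody@example.com'])

    def migrate(f, upgrade, *args, **kwargs):
        r, v = f.recover()
        a = upgrade(r, v, *args, **kwargs)
        if id(a) != id(r):    # A different object signals an upgrade.
            f.store(a)
        return a

    f = MemoryFile(Job(title='print'), '0.0')
    first = migrate(f, upgrade)
    second = migrate(f, upgrade)
    print(f.stores, first is second)    # 1 True

Because the test is on object identity, an ``upgrade`` that patches the recovered object in place and
returns it would not trigger a store. The construct-a-new-object style is the one that works with
``migrate``.
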