.. _more-about-types:

More About Types
****************

This section looks more closely at what Ansar can provide when data requirements become more demanding. Perhaps a configuration file needs to contain a list of recently opened files, or a job scheduler needs to maintain a table of performance metrics.

A series of changes is made to the ``ReceivedJob`` class introduced in an earlier section. These changes involve progressively more complex types. The full series introduces enough information to cover most application data requirements. A separate section looks into the more arcane corners of data.

This evolution of ``ReceivedJob`` is also used as a vehicle for illustrating how Ansar assists the application with version support. The library provides version-related information at crucial moments. It's up to the application to take the proper action. In general that means responding intelligently to old versions of stored data. Refer to :ref:`versions-upgrading-and-migration` for a how-to on version support.

A Refresh Cycle
===============

The starting point is the original declaration:

.. literalinclude:: class-and-store/received_job_basic.py

Members of the class are listed below along with their respective Python types:

* ``unique_id`` - uuid.UUID
* ``title`` - str
* ``priority`` - int
* ``service`` - str
* ``body`` - bytes

These types are said to be "complete" in that no further type information is needed by the library during its internal activities on behalf of members. An example of an incomplete type would be a list, in that the contents of the list are unknown.

These types are also known as the "autos" in that any detected instance of these types is automatically treated in the expected way - a "10" is treated as a Python ``int`` and a "0.25" is treated as a Python ``float``. This might seem redundant but there are actually many cases where the same Python type is used for different purposes. The overlap can result in unexpected and undesirable results.

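
The distinction between "complete" and incomplete types can be sketched with the standard library alone. The code below is an illustration only and is not how Ansar is implemented; the ``AUTO_TYPES`` and ``INCOMPLETE_TYPES`` tuples, the ``classify`` function and the stand-in ``ReceivedJob`` class (including its default values) are all assumptions made for this example.

.. code-block:: python

    import uuid

    # Assumption: the "autos" are the member types listed above, plus float.
    AUTO_TYPES = (uuid.UUID, str, int, float, bytes)
    # Containers whose element types cannot be read from an instance.
    INCOMPLETE_TYPES = (list, dict, set)


    class ReceivedJob:
        """Hypothetical stand-in for the included declaration."""
        def __init__(self):
            self.unique_id = uuid.uuid4()
            self.title = ''
            self.priority = 10
            self.service = 'noop'
            self.body = b''


    def classify(obj):
        """Map each member name to a (category, type) pair."""
        report = {}
        for name, value in vars(obj).items():
            if isinstance(value, AUTO_TYPES):
                report[name] = ('auto', type(value))
            elif isinstance(value, INCOMPLETE_TYPES):
                report[name] = ('incomplete', type(value))
            else:
                report[name] = ('unknown', type(value))
        return report


    job = ReceivedJob()
    job.who = []            # a list says nothing about its contents
    for name, (category, t) in classify(job).items():
        print(name, category, t.__name__)

Every original member is reported as an "auto", while the added list is reported as incomplete because an empty list carries no information about its elements.
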

Remembering When
================

Recording the moment a job was presented is likely to be useful. Adding a ``created`` member looks like this:

.. literalinclude:: class-and-store/received_job_created.py

This new declaration satisfies the "type-complete" requirement - it can immediately be put to work. New features and functionality can be based on the accurate storage and recovery of the creation time, which is great. However, a peek at the stored materials is disappointing:

.. literalinclude:: class-and-store/received_job_created.json

The ``created`` member appears as a floating-point value. This might be functionally valid but the stored representation is useless as a time. The file is effectively unreadable and it's not possible to modify the value without a lot of bother. In this scenario that capability is less likely to be needed, but consider an application configuration file that contained a date of birth. The stored birthday could not be readily checked or modified using a text editor.

Passing additional type information at registration time fixes this issue::

    ar.bind(ReceivedJob, type_details={'created': ar.WorldTime})

Readability is restored in the file contents:

.. literalinclude:: class-and-store/received_job_created_worldtime.json

The Python ``float`` type is used for both mathematical purposes and for the marking of time. The library understands this overloading and provides a mechanism for users to be specific about their intentions.

Then There Was None
===================

With specific type information available there is no need to initialize the associated member with a value. The library gives the specific information precedence over what it finds in the object itself. This is useful as it allows ``None`` values to flow around in a much more natural way. It also means that initializers involving computed values can be removed, saving machine cycles and perhaps avoiding undesirable side-effects.
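
The precedence rule can be pictured with a small, library-free sketch. Nothing below is Ansar code; the ``resolve_types`` function, the ``WorldTime`` marker class and the stand-in ``ReceivedJob`` are hypothetical names for illustration, and raising an error for a ``None`` member that has no explicit details is an assumption of this sketch.

.. code-block:: python

    import uuid


    class WorldTime:
        """Hypothetical marker standing in for ar.WorldTime."""


    class ReceivedJob:
        """Hypothetical stand-in; explicitly typed members start as None."""
        def __init__(self):
            self.unique_id = None
            self.title = ''
            self.priority = 10
            self.created = None


    def resolve_types(obj, type_details=None):
        """Explicit details win; otherwise the type is read from the value."""
        type_details = type_details or {}
        resolved = {}
        for name, value in vars(obj).items():
            if name in type_details:
                # Specific information takes precedence over the value.
                resolved[name] = type_details[name]
            elif value is None:
                # Assumption: nothing can be inferred from None.
                raise ValueError(f'no type information for member "{name}"')
            else:
                resolved[name] = type(value)
        return resolved


    details = {'created': WorldTime, 'unique_id': uuid.UUID}
    for name, t in resolve_types(ReceivedJob(), details).items():
        print(name, t.__name__)

Members named in ``details`` never have their values inspected, which is why they can safely hold ``None``.
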
Both the ``created`` and ``unique_id`` members can benefit from the related adjustment:

.. literalinclude:: include/received-job-none.py

Allowing ``None`` values is a significant design decision. Providing usable, default values for all members is generally safer, especially where a container is involved. In the case of fixed-size arrays this is even more so.

.. note::

    As an alternative, there is a small set of functions with names like :py:func:`~ansar.message.default_time` and :py:func:`~ansar.message.default_uuid` that return valid examples of the implied type. They do so at the least possible expense to the application. The actual values are undefined and should not be compared against other values. In the context of recovering stored representations, the assumption is that these default values will be immediately overwritten by the incoming data.

    There are also ``default_`` functions that help with declaration of the container types. See some example usage in the following section.

Names For Runtime Numbers
=========================

Where operational numbers need a readable representation, an :ref:`Enumeration` is used to map the numbers to strings during storage and the strings back to numbers during recovery. Declaration looks like this:

.. literalinclude:: class-and-store/received_job_transport.py

A simple example of usage appears below::

    >>> f = ar.File('received_job_transport', ReceivedJob)
    >>> j = ReceivedJob()
    >>> f.store(j)
    >>>
    >>> r, _ = f.recover()
    >>> r.transport
    3

Lastly, the stored representation looks like this:

.. literalinclude:: class-and-store/received_job_transport.json

Sequences And Collections
=========================

A notification feature is added to the job processing machinery. It is decided that notifications will be in the form of emails sent to designated parties. Each job needs a list of zero or more email addresses:

.. literalinclude:: class-and-store/received_job_who.py

Given the updated job creation::

    ReceivedJob(created=time.time(), title='the quick',
        who=['tom.pirate@black.ship',
            'gerard.diplomat@ivory.tower',
            'aswan.swami@nowhere.everywhere'])

the resulting file contents look like this:

.. literalinclude:: class-and-store/received_job_who.json

A job now has a list of email addresses. The list can be empty but there is always a list present, i.e. the ``j.who`` member is always an instance of a Python ``list``. The :py:func:`~ansar.message.default_vector` function returns a `new` list every time it is called.

The ``type_details`` parameter is a ``dict`` of names and "type expressions". An instance of the :ref:`VectorOf` class is an instance of a type expression. This is the full set of sequences and collections supported in type expressions:

* ``ArrayOf`` - a fixed-length sequence
* ``VectorOf`` - a variable-length sequence
* ``DequeOf`` - a double-ended queue
* ``SetOf`` - a unique collection
* ``MapOf`` - an associative array

Refer to :ref:`type expressions` for full information. Example usage appears below:

.. code-block:: python

    'recent_work': ar.DequeOf(uuid.UUID),
    'times_square': ar.ArrayOf(ar.ArrayOf(ar.WorldTime,2),2),
    'time_periods': ar.DequeOf(ar.TimeSpan),
    'priority_queue': ar.MapOf(int, ar.VectorOf(ReceivedJob))

Given the previously listed type expressions and the following values::

    recent_work=ar.deque([
        ar.uuid4(),
        ar.uuid4(),
        ar.uuid4(),
        ar.uuid4(),
    ]),
    times_square=[
        [time.time(),time.time()],
        [time.time(),time.time()]
    ],
    time_periods=ar.deque([
        ar.time_span(hours=1,minutes=2,seconds=3.0),
        ar.time_span(hours=8),
        ar.time_span(minutes=10),
        ar.time_span(seconds=0.0125),
        ar.time_span(days=1),
    ]),
    priority_queue={
        100: [ReceivedJob(title='a'), ReceivedJob(title='b')],
        10: [],
        1: [ReceivedJob(title='!')],
    }

The resulting JSON file contents will look like this::

    "priority_queue": [
        [
            100,
            [
                {
                    "body": "",
                    "created": "1970-01-01T00:00:00Z",
                    "priority": 10,
                    "service": "noop",
                    "title": "a",
                    "unique_id": "4accaeef-b44a-4b18-8c6b-a28c5f8a2bd5",
                    "who": []
                },
                {
                    "body": "",
                    "created": "1970-01-01T00:00:00Z",
                    "priority": 10,
                    "service": "noop",
                    "title": "b",
                    "unique_id": "e2c480ed-8fa6-414e-82b0-8fe8964a67c9",
                    "who": []
                }
            ]
        ],
        [
            10,
            []
        ],
        [
            1,
            [
                {
                    "body": "",
                    "created": "1970-01-01T00:00:00Z",
                    "priority": 10,
                    "service": "noop",
                    "title": "!",
                    "unique_id": "bd5fb91c-1369-4430-8555-43935d24e233",
                    "who": []
                }
            ]
        ]
    ],
    "recent_work": [
        "390a9d21-bbd8-457a-9f39-72113238bf9f",
        "b3bbd453-e1c5-49de-bddc-68f212964587",
        "242facf7-bfc2-4aea-9e8c-98d43abaca0a",
        "11587a87-e49e-4758-9754-d261b1c7e10e"
    ],
    "time_periods": [
        "1h2m3s",
        "8h",
        "10m",
        "0.0125s",
        "1d"
    ],
    "times_square": [
        [
            "2020-05-30T04:17:07.419894Z",
            "2020-05-30T04:17:07.419894Z"
        ],
        [
            "2020-05-30T04:17:07.419895Z",
            "2020-05-30T04:17:07.419895Z"
        ]
    ]

Those Fixed-Size Arrays
=======================

All the containers have convenient functions to help with the proper initialization of members inside registered classes:

* :py:func:`~ansar.message.default_vector`
* :py:func:`~ansar.message.default_set`
* :py:func:`~ansar.message.default_map`
* :py:func:`~ansar.message.default_deque`

That is - except for arrays. The fundamental reason for that difference is that default instances of ``list``, ``dict`` and ``set`` are `empty`. Additional information is required (i.e. the type of the elements and the size) to construct a default instance of a particular array. The fact that the elements of the array could themselves be of any type further complicates the scenario.

A special function is provided that accepts any valid type expression - including expressions involving arrays - and returns a default instance of that type. The following example constructs a 4-by-2 array of integers:

>>> import ansar as ar
>>> t = ar.ArrayOf(ar.ArrayOf(int,2),4)
>>> a = ar.default_value(t)
>>> a
[[0, 0], [0, 0], [0, 0], [0, 0]]

The :py:func:`~ansar.message.default_value` function returns the simplest, conforming instance of the specified type. There are several possible styles of usage. A revised version of the ``ReceivedJob`` declaration appears below:

.. literalinclude:: class-and-store/received_job_default_value.py

A set of type expressions appears before the class declaration. The types are used for initialization of members `and` for registration of the class. This avoids multiple definitions. The consistent use of :py:func:`~ansar.message.default_value` rather than a mixture of functions (e.g. :py:func:`~ansar.message.default_vector`) and constants (e.g. ``[]``) is arguably better engineering. The benefits of this style can become more compelling as the class becomes more complex.

Polymorphic Persistence
=======================

The library supports the concept of polymorphism through the :ref:`Any` type.
Rather than declaring that a file contains a specific type::

    >>> f = ar.File('job', ReceivedJob)
    >>> j = ReceivedJob()
    >>> f.store(j)
    >>> r, _ = f.recover()
    >>> r
    <__main__.ReceivedJob object at 0x7f75deaf0750>

A file can be declared in this way::

    >>> f = ar.File('job', ar.Any)
    >>> j = ReceivedJob()
    >>> f.store(j)
    >>> r, _ = f.recover()
    >>> r
    <__main__.ReceivedJob object at 0x7f75deaf0750>

To the application the behaviour has not changed. If a second class declaration is introduced:

.. literalinclude:: include/maintenance-job.py

A fresh sequence of operations illustrates the new possibilities::

    >>> f = ar.File('job', ar.Any)
    >>> r = ReceivedJob()
    >>> m = MaintenanceJob()
    >>> f.store(r)
    >>> a, _ = f.recover()
    >>> a
    <__main__.ReceivedJob object at 0x7f14419af650>
    >>> f.store(m)
    >>> a, _ = f.recover()
    >>> a
    <__main__.MaintenanceJob object at 0x7f14419afe50>

A file declared as polymorphic using the :ref:`Any` type can contain an instance of any registered class, such as ``ReceivedJob`` or ``MaintenanceJob``.

This capability is especially powerful when used with the :py:class:`~ansar.folder.Folder` class. Using :ref:`Any`, a :py:class:`~ansar.folder.Folder` can legitimately contain a mixture of job files and :py:class:`~ansar.folder.Folder` processing remains essentially the same:

.. code-block:: python

    spool = ar.Folder('spool', ar.Any)
    for f in spool.each():
        j, v = f.recover()
        if work_on_job(j):
            f.store(j)
    ..
    def work_on_job(j):
        if isinstance(j, ReceivedJob):
            # Process the external job.
            return True
        elif isinstance(j, MaintenanceJob):
            # Process internal job.
            return True
        # Unknown job type.
        return False

This avoids having to represent disparate job details - e.g. for ``ReceivedJob`` and ``MaintenanceJob`` objects - within a single class. It's also extensible in that new job types can be added without disturbing the existing codebase and operational sites.
Obviously a proper strategy must be in place for when an instance of a new job type somehow reaches an operational site before the associated software update.

.. note::

    The recovery of polymorphic representations also integrates seamlessly with version support. Refer to :ref:`versions-upgrading-and-migration`.

How This Works
--------------

Polymorphism is an important capability. However, there can be some misunderstandings about the scope of what it can do. The simplest way to avoid any confusion is to show a sample of the stored materials and explain that content:

.. literalinclude:: class-and-store/maintenance_job_any.json

The JSON ``value`` has changed from an `object` to a `list`, containing a string and an `object`. The string appears to identify the type of the object. That is exactly the case. The first element is the name of the declared class in a form curated by the library and the second element is exactly what normally appears as the ``value``.

This explains why a file created using ``ar.File('job', ReceivedJob)`` cannot be recovered through a file declared using ``ar.File('job', ar.Any)``. A polymorphic "recover" operation cannot recover `anything`; it must be presented with materials created by a polymorphic "store".

Going Incognito
---------------

As a part of polymorphism, the library includes special handling of unknown types. During a store operation the library compiles the identifying string and tags the representation with that name. During a recover operation the library `de-compiles` that tag to the actual Python class object. Any failure to de-compile is simply a case of the application not knowing the named type - this is either a problem on the development side (i.e. a bug) or an operations problem. An example of the latter is where a file originating from a different system is presented to an unsuspecting application.
The library detects these scenarios, and during a recover `folds` the materials into an :py:class:`~ansar.message.Incognito` object.

The outcome is that recover operations of unknown types do not fail in the normal sense. They produce an instance of a special class::

    def work_on_job(j):
        if isinstance(j, ReceivedJob):
            # Process the external job.
            return True
        elif isinstance(j, MaintenanceJob):
            # Process internal job.
            return True
        elif isinstance(j, ar.Incognito):
            log(j.type_name)
            return False
        software_error()

This implementation of ``work_on_job`` logs the name of any un-registered class. If the job is not an instance of anything tested for by the function, it calls the ``software_error`` function. In that case the class of the job is known within the application (i.e. it has been registered) but the function has not yet been updated to perform the related work.

.. note::

    For those who are curious or need an explanation for something they have observed, the :py:class:`~ansar.message.Incognito` object never appears in stored materials. The object is `un-folded` during the store process in a manner matching the prior `folding`. Effectively this allows representations of unknown type to pass through the application without change.
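
The tagging, folding and un-folding described in this section can be sketched with the standard ``json`` module. This is a rough illustration under stated assumptions rather than Ansar's implementation: the ``REGISTRY`` table, the ``register``, ``store`` and ``recover`` functions and the stand-in ``Incognito`` and ``ReceivedJob`` classes are hypothetical, and the bare class name is used as the tag where the library uses its own curated form.

.. code-block:: python

    import json

    REGISTRY = {}   # hypothetical table of tag name to registered class


    def register(cls):
        """Record a class under its name so that recover can find it."""
        REGISTRY[cls.__name__] = cls
        return cls


    class Incognito:
        """Hypothetical stand-in holding the materials of an unknown type."""
        def __init__(self, type_name, materials):
            self.type_name = type_name
            self.materials = materials


    @register
    class ReceivedJob:
        def __init__(self, title=''):
            self.title = title


    def store(obj):
        """Encode as [tag, value]; an Incognito is un-folded unchanged."""
        if isinstance(obj, Incognito):
            return json.dumps([obj.type_name, obj.materials])
        return json.dumps([type(obj).__name__, vars(obj)])


    def recover(text):
        """Decode [tag, value]; an unknown tag is folded into Incognito."""
        type_name, materials = json.loads(text)
        cls = REGISTRY.get(type_name)
        if cls is None:
            return Incognito(type_name, materials)
        obj = cls()
        vars(obj).update(materials)
        return obj


    # MaintenanceJob is deliberately not registered in this sketch.
    foreign = '["MaintenanceJob", {"zone": "east"}]'
    unknown = recover(foreign)
    print(type(unknown).__name__, unknown.type_name)
    print(store(unknown))

Storing the recovered ``Incognito`` reproduces the same tag and value that were read, which mirrors the pass-through behaviour described in the note above.
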