More About Types

This section looks more closely at what Ansar can provide when data requirements become more demanding. Perhaps a configuration file needs to contain a list of recently opened files, or a job scheduler needs to maintain a table of performance metrics.

A series of changes is made to the ReceivedJob introduced in an earlier section. These changes involve progressively more complex types. The full series introduces enough information to cover most application data requirements. A separate section looks into the more arcane corners of data.

This evolution of ReceivedJob is also used as a vehicle for illustrating how Ansar assists the application with version support. The library provides version-related information at crucial moments. It’s up to the application to take the proper action. In general that means responding intelligently to old versions of stored data. Refer to Versions, Upgrading And Migration for a how-to on version support.

A Refresh Cycle

The starting point is the original declaration:

import uuid
import ansar as ar

class ReceivedJob(ar.Message):
    def __init__(self, unique_id=None, title='watchdog', priority=10, service='noop', body=b''):
        ar.Message.__init__(self)
        self.unique_id = unique_id or uuid.uuid4()
        self.title = title
        self.priority = priority
        self.service = service
        self.body = body

ar.bind(ReceivedJob)

Members of the class are listed below along with their respective Python types:

  • unique_id - uuid.UUID

  • title - str

  • priority - int

  • service - str

  • body - bytes

These types are said to be “complete” in that the library needs no further type information to work with the members. An example of an incomplete type is a list, since the type of its contents is unknown.

These types are also known as the “autos” in that any detected instance of these types is automatically treated in the expected way - a “10” is treated as a Python int and a “0.25” is treated as a Python float. This might seem redundant but there are actually many cases where the same Python type is used for different purposes. The overlap can result in unexpected and undesirable results.
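The difference between complete and incomplete types can be seen in plain Python, without the library. The sketch below is an illustration only: the describe() helper and the AUTO_TYPES tuple are hypothetical names, not part of Ansar, and the tuple holds just the types mentioned in this section. It shows that a scalar value identifies its own type, whereas a list says nothing reliable about its intended contents.

```python
import uuid

# Hypothetical names for illustration; not part of Ansar.
# Only the types mentioned in this section are listed.
AUTO_TYPES = (int, float, str, bytes, uuid.UUID)

def describe(value):
    """Return a type name when the value alone determines its type."""
    if isinstance(value, AUTO_TYPES):
        return type(value).__name__
    # A list (especially an empty one) carries no information about
    # the type of element it is meant to hold.
    return None

print(describe(10))      # int
print(describe(0.25))    # float
print(describe([]))      # None
```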

Remembering When

Recording the moment a job was presented is likely to be useful. Adding a created member looks like this:

import time
import uuid
import ansar as ar

class ReceivedJob(ar.Message):
    def __init__(self, created=None, unique_id=None,
        title='watchdog', priority=10, service='noop', body=b''):
        ar.Message.__init__(self)
        self.created=created or time.time()
        self.unique_id=unique_id or uuid.uuid4()
        self.title=title
        self.priority=priority
        self.service=service
        self.body=body

ar.bind(ReceivedJob)

This new declaration satisfies the “type-complete” requirement - it can immediately be put to work. New features and functionality can be based on the accurate storage and recovery of the creation time, which is great. However, a peek at the stored materials is disappointing:

{
    "value": {
        "body": "",
        "created": 1592693932.694369,
        "priority": 10,
        "service": "noop",
        "title": "watchdog",
        "unique_id": "9a61f70f-bee6-41ce-b175-ae403384af46"
    }
}

The created member appears as a floating-point value. This might be functionally valid but the stored representation is useless as a time. The file is effectively unreadable and it’s not possible to modify the value without a lot of bother. In this scenario that capability is less likely to be needed, but consider an application configuration file that contained a date of birth. The stored birthday could not be readily checked or modified using a text editor.

Passing additional type information at registration time fixes this issue:

ar.bind(ReceivedJob, type_details={'created': ar.WorldTime})

Readability is restored in the file contents:

{
    "value": {
        "body": "",
        "created": "2020-06-20T22:58:52.829716Z",
        "priority": 10,
        "service": "noop",
        "title": "watchdog",
        "unique_id": "cfcd943f-2254-4a3a-836d-4c0842624d27"
    }
}

The Python float type is used for both mathematical purposes and for the marking of time. The library understands this overloading and provides a mechanism for users to be specific about their intentions.
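The two roles of a float can be illustrated with the standard library alone. This is a minimal sketch of the idea, not the library's implementation; to_world_time() and from_world_time() are hypothetical helper names, and the text format is assumed from the sample file shown above.

```python
from datetime import datetime, timezone

FORMAT = '%Y-%m-%dT%H:%M:%S.%fZ'

def to_world_time(seconds):
    """Render epoch seconds as a readable UTC string."""
    t = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return t.strftime(FORMAT)

def from_world_time(text):
    """Convert the readable UTC string back to epoch seconds."""
    t = datetime.strptime(text, FORMAT)
    return t.replace(tzinfo=timezone.utc).timestamp()

stamp = 1592693932.694369        # the float from the first sample file
text = to_world_time(stamp)
print(text)                      # a readable date and time in UTC
print(from_world_time(text))     # the original float, to the microsecond
```

The same float could equally be a measurement or a ratio. Only the declared intention tells the two apart, which is the role played by type_details.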

Then There Was None

With specific type information available there is no need to initialize the associated member with a value. The library gives the specific information precedence over what it finds in the object itself. This is useful as it allows None values to flow around in a much more natural way. It also means that initializers involving computed values can be removed, saving machine cycles and perhaps avoiding undesirable side-effects. Both the created and unique_id members can benefit from the related adjustment:

import ansar as ar

class ReceivedJob(ar.Message):
    def __init__(self, created=None, unique_id=None,
        title='watchdog', priority=10,
        service='noop', body=b''):
        ar.Message.__init__(self)
        self.created=created
        self.unique_id=unique_id
        self.title=title
        self.priority=priority
        self.service=service
        self.body=body

ar.bind(ReceivedJob, type_details={
    'created': ar.WorldTime,
    'unique_id': ar.UUID,
})

Allowing None values is a significant design decision. Providing usable, default values for all members is generally safer, especially where a container is involved. This is even more true of fixed-size arrays.

Note

As an alternative, there is a small set of functions with names like default_time() and default_uuid() that return valid examples of the implied type. They do so at the least possible expense to the application. The actual values are undefined and should not be compared against other values. In the context of recovering stored representations, the assumption is that these default values will be immediately overwritten by the incoming data. There are also default_ functions that help with declaration of the container types. See some example usage in the following section.

Names For Runtime Numbers

Where there are operational numbers that need a readable representation, an Enumeration is used to map the numbers to strings during storage and the strings back to numbers during recovery. Declaration looks like this:

import ansar as ar

ModeOfTransport = ar.Enumeration(CAR=1, TRUCK=2, MOTORCYCLE=3, BOAT=100, SCOOTER=10, SKATEBOARD=11)

class ReceivedJob(ar.Message):
    def __init__(self, unique_id=None, title='watchdog', priority=10, service='noop', body=b'', transport=None):
        ar.Message.__init__(self)
        self.unique_id = unique_id or ar.default_uuid()
        self.title = title
        self.priority = priority
        self.service = service
        self.body = body
        self.transport = transport or ModeOfTransport.MOTORCYCLE

ar.bind(ReceivedJob, type_details={
    'transport': ModeOfTransport,
})

A simple example of usage appears below:

>>> f = ar.File('received_job_transport', ReceivedJob)
>>> j = ReceivedJob()
>>> f.store(j)
>>>
>>> r, _ = f.recover()
>>> r.transport
3

Lastly, the stored representation looks like this:

{
    "value": {
        "body": "",
        "priority": 10,
        "service": "noop",
        "title": "watchdog",
        "transport": "MOTORCYCLE",
        "unique_id": "66bc218a-e24b-4b36-a3a1-7985f017e535"
    }
}
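The number-to-name mapping can be approximated with the standard library. The sketch below uses enum.IntEnum as a stand-in and should not be read as how ar.Enumeration works internally; store_transport() and recover_transport() are hypothetical helpers.

```python
import json
from enum import IntEnum

# Standard-library stand-in for the Enumeration declared above.
class ModeOfTransport(IntEnum):
    CAR = 1
    TRUCK = 2
    MOTORCYCLE = 3
    BOAT = 100
    SCOOTER = 10
    SKATEBOARD = 11

def store_transport(number):
    """Number to name, on the way into the file."""
    return json.dumps({'transport': ModeOfTransport(number).name})

def recover_transport(text):
    """Name to number, on the way back out."""
    return ModeOfTransport[json.loads(text)['transport']].value

stored = store_transport(3)
print(stored)                       # {"transport": "MOTORCYCLE"}
print(recover_transport(stored))    # 3
```

The application only ever sees the number, while the file only ever holds the name.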

Sequences And Collections

A notification feature is added to the job processing machinery. It is decided that notifications will be in the form of emails sent to designated parties. Each job needs a list of zero or more email addresses:

import ansar as ar

class ReceivedJob(ar.Message):
    def __init__(self, created=None, unique_id=None,
        title='watchdog', priority=10,
        service='noop', body=b'', who=None):
        ar.Message.__init__(self)
        self.created=created or ar.default_time()
        self.unique_id=unique_id or ar.default_uuid()
        self.title=title
        self.priority=priority
        self.service=service
        self.body=body
        self.who=who or ar.default_vector()

ar.bind(ReceivedJob, type_details={
    'created': ar.WorldTime,
    'who': ar.VectorOf(ar.Unicode),
})

Given the updated job creation:

ReceivedJob(title='the quick',
    who=['tom.pirate@black.ship', 'gerard.diplomat@ivory.tower', 'aswan.swami@nowhere.everywhere'])

The resulting file contents look like this:

{
    "value": {
        "body": "",
        "created": "1970-01-01T00:00:00Z",
        "priority": 10,
        "service": "noop",
        "title": "the quick",
        "unique_id": "43910cc6-84c4-40f9-9d13-9f8681687108",
        "who": [
            "tom.pirate@black.ship",
            "gerard.diplomat@ivory.tower",
            "aswan.swami@nowhere.everywhere"
        ]
    }
}

A job now has a list of email addresses. The list can be empty but there is always a list present, i.e. the j.who member is always an instance of a Python list. The default_vector() function returns a new list every time it is called.
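The value of a new list on every call is easiest to see by comparison with a shared default. The following sketch uses plain Python classes rather than registered messages; SharedJob and FreshJob are hypothetical names, and list() stands in for the behaviour described for default_vector().

```python
class SharedJob:
    SHARED = []     # a single list object, created once

    def __init__(self, who=None):
        self.who = who or SharedJob.SHARED

class FreshJob:
    def __init__(self, who=None):
        self.who = who or list()    # a new list for every instance

a, b = SharedJob(), SharedJob()
a.who.append('tom.pirate@black.ship')
print(b.who)    # ['tom.pirate@black.ship'] - the address leaked into b

c, d = FreshJob(), FreshJob()
c.who.append('tom.pirate@black.ship')
print(d.who)    # []
```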

The type_details parameter is a dict of names and “type expressions”. An instance of the VectorOf class is one example of a type expression. This is the full set of sequences and collections supported in type expressions:

  • ArrayOf - a fixed-length sequence

  • VectorOf - a variable-length sequence

  • DequeOf - a double-ended queue

  • SetOf - a unique collection

  • MapOf - an associative array

Refer to type expressions for full information. Example usage appears below:

'recent_work': ar.DequeOf(uuid.UUID),
'times_square': ar.ArrayOf(ar.ArrayOf(ar.WorldTime,2),2),
'time_periods': ar.DequeOf(ar.TimeSpan),
'priority_queue': ar.MapOf(int, ar.VectorOf(ReceivedJob))

Given the previously listed type expressions and the following values:

recent_work=ar.deque([
    ar.uuid4(),
    ar.uuid4(),
    ar.uuid4(),
    ar.uuid4(),
]),
times_square=[
    [time.time(),time.time()],
    [time.time(),time.time()]
],
time_periods=ar.deque([
    ar.time_span(hours=1,minutes=2,seconds=3.0),
    ar.time_span(hours=8),
    ar.time_span(minutes=10),
    ar.time_span(seconds=0.0125),
    ar.time_span(days=1),
]),
priority_queue={
    100: [ReceivedJob(title='a'), ReceivedJob(title='b')],
    10: [],
    1: [ReceivedJob(title='!')],
}

The resulting JSON file contents will look like this:

"priority_queue": [
    [
        100,
        [
            {
                "body": "",
                "created": "1970-01-01T00:00:00Z",
                "priority": 10,
                "service": "noop",
                "title": "a",
                "unique_id": "4accaeef-b44a-4b18-8c6b-a28c5f8a2bd5",
                "who": []
            },
            {
                "body": "",
                "created": "1970-01-01T00:00:00Z",
                "priority": 10,
                "service": "noop",
                "title": "b",
                "unique_id": "e2c480ed-8fa6-414e-82b0-8fe8964a67c9",
                "who": []
            }
        ]
    ],
    [
        10,
        []
    ],
    [
        1,
        [
            {
                "body": "",
                "created": "1970-01-01T00:00:00Z",
                "priority": 10,
                "service": "noop",
                "title": "!",
                "unique_id": "bd5fb91c-1369-4430-8555-43935d24e233",
                "who": []
            }
        ]
    ]
],
"recent_work": [
    "390a9d21-bbd8-457a-9f39-72113238bf9f",
    "b3bbd453-e1c5-49de-bddc-68f212964587",
    "242facf7-bfc2-4aea-9e8c-98d43abaca0a",
    "11587a87-e49e-4758-9754-d261b1c7e10e"
],
"time_periods": [
    "1h2m3s",
    "8h",
    "10m",
    "0.0125s",
    "1d"
],
"times_square": [
    [
        "2020-05-30T04:17:07.419894Z",
        "2020-05-30T04:17:07.419894Z"
    ],
    [
        "2020-05-30T04:17:07.419895Z",
        "2020-05-30T04:17:07.419895Z"
    ]
]
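Note that priority_queue is stored as a list of key-value pairs rather than as a JSON object. A likely reason, assumed here rather than taken from the library's documentation, is that JSON object keys are always strings, so integer keys would not survive a round trip. The sketch below demonstrates the problem and the pair-based alternative using only the json module; map_to_pairs() and pairs_to_map() are hypothetical helpers.

```python
import json

def map_to_pairs(m):
    """Flatten a dict into a list of [key, value] pairs."""
    return [[k, v] for k, v in m.items()]

def pairs_to_map(pairs):
    """Rebuild the dict from a list of [key, value] pairs."""
    return {k: v for k, v in pairs}

queue = {100: ['a', 'b'], 10: [], 1: ['!']}

# Stored directly, the integer keys come back as strings.
direct = json.loads(json.dumps(queue))
print(direct)                          # {'100': ['a', 'b'], '10': [], '1': ['!']}

# Stored as pairs, the keys keep their type.
pairs = json.loads(json.dumps(map_to_pairs(queue)))
print(pairs_to_map(pairs) == queue)    # True
```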

Those Fixed-Size Arrays

All the containers except arrays have convenient functions, such as default_vector(), to help with the proper initialization of members inside registered classes. The fundamental reason for the difference is that default instances of list, dict and set are empty. Additional information is required (i.e. the type of the elements and the size) to construct a default instance of a particular array. The fact that the elements of the array could themselves be of any type further complicates the scenario.

A special function is provided that accepts any valid type expression - including expressions involving arrays - and returns a default instance of that type. The following example constructs a 4-by-2 array of integers:

>>> import ansar as ar
>>> t = ar.ArrayOf(ar.ArrayOf(int,2),4)
>>> a = ar.default_value(t)
>>> a
[[0, 0], [0, 0], [0, 0], [0, 0]]
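How a default instance might be built from a nested type expression can be sketched with a short recursive function. This is not the library's code: FixedArray, Sequence and simplest_value() are hypothetical stand-ins for ArrayOf, VectorOf and default_value(), and only a few element types are handled.

```python
# Hypothetical stand-ins for the library's type expressions.
class FixedArray:
    def __init__(self, element, size):
        self.element = element
        self.size = size

class Sequence:
    def __init__(self, element):
        self.element = element

def simplest_value(t):
    """Return the simplest value conforming to the type expression t."""
    if isinstance(t, FixedArray):
        # Build each slot separately so nested arrays are never shared.
        return [simplest_value(t.element) for _ in range(t.size)]
    if isinstance(t, Sequence):
        return []
    # Plain types such as int, float, str and bool construct their own
    # zero-like default when called with no arguments.
    return t()

grid = simplest_value(FixedArray(FixedArray(int, 2), 4))
print(grid)       # [[0, 0], [0, 0], [0, 0], [0, 0]]

grid[0][0] = 7
print(grid[1])    # [0, 0] - the rows are independent lists
```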

The default_value() function returns the simplest, conforming instance of the specified type. There are several possible styles of usage. A revised version of the ReceivedJob declaration appears below:

import uuid
import ansar as ar

CREATED_TYPE = ar.WorldTime
UNIQUE_ID_TYPE = uuid.UUID
WHO_TYPE = ar.VectorOf(str)

class ReceivedJob(ar.Message):
    def __init__(self, created=None, unique_id=None,
        title='watchdog', priority=10,
        service='noop', body=b'', who=None):
        ar.Message.__init__(self)
        self.created=created or ar.default_value(CREATED_TYPE)
        self.unique_id=unique_id or ar.default_value(UNIQUE_ID_TYPE)
        self.title=title
        self.priority=priority
        self.service=service
        self.body=body
        self.who=who or ar.default_value(WHO_TYPE)

ar.bind(ReceivedJob, type_details={
    'created': CREATED_TYPE,
    'unique_id': UNIQUE_ID_TYPE,
    'who': WHO_TYPE,
})

A set of type expressions appears before the class declaration. The types are used for initialization of members and for registration of the class. This avoids multiple definitions. The consistent use of default_value() rather than a mixture of functions (e.g. default_vector()) and constants (e.g. []) is arguably better engineering. The benefits of this style can become more compelling as the class becomes more complex.

Polymorphic Persistence

The library supports the concept of polymorphism through the Any type. Rather than declaring that a file contains a specific type:

>>> f = ar.File('job', ReceivedJob)
>>> j = ReceivedJob()
>>> f.store(j)
>>> r, _ = f.recover()
>>> r
<__main__.ReceivedJob object at 0x7f75deaf0750>

A file can be declared in this way:

>>> f = ar.File('job', ar.Any)
>>> j = ReceivedJob()
>>> f.store(j)
>>> r, _ = f.recover()
>>> r
<__main__.ReceivedJob object at 0x7f75deaf0750>

To the application the behaviour has not changed. If a second class declaration is introduced:

import ansar as ar

TOD_TYPE = ar.VectorOf(ar.ClockTime)
DOW_TYPE = ar.ArrayOf(bool, 7)

class MaintenanceJob(ar.Message):
    def __init__(self, area=0, tod=None, dow=None):
        ar.Message.__init__(self)
        self.area = area
        self.tod = tod or ar.default_value(TOD_TYPE)
        self.dow = dow or ar.default_value(DOW_TYPE)

ar.bind(MaintenanceJob, type_details={
    'tod': TOD_TYPE,
    'dow': DOW_TYPE,
})

A fresh sequence of operations illustrates the new possibilities:

>>> f = ar.File('job', ar.Any)
>>> r = ReceivedJob()
>>> m = MaintenanceJob()
>>> f.store(r)
>>> a, _ = f.recover()
>>> a
<__main__.ReceivedJob object at 0x7f14419af650>
>>> f.store(m)
>>> a, _ = f.recover()
>>> a
<__main__.MaintenanceJob object at 0x7f14419afe50>

A file declared as polymorphic using the Any type can contain an instance of any registered class, such as ReceivedJob or MaintenanceJob. This capability is especially powerful when used with the Folder class. Using Any, a Folder can legitimately contain a mixture of job files and Folder processing remains essentially the same:

spool = ar.Folder('spool', ar.Any)
for f in spool.each():
    j, v = f.recover()
    if work_on_job(j):
        f.store(j)

..
def work_on_job(j):
    if isinstance(j, ReceivedJob):
        # Process the external job.
        return True
    elif isinstance(j, MaintenanceJob):
        # Process internal job.
        return True
    # Unknown job type.
    return False

This avoids having to represent disparate job details - e.g. for ReceivedJob and MaintenanceJob objects - within a single class. It’s also extensible in that new job types can be added without disturbing the existing codebase and operational sites. Obviously a proper strategy must be in place for when an instance of a new job type somehow reaches an operational site before the associated software update.

Note

The recovery of polymorphic representations also integrates seamlessly with version support. Refer to Versions, Upgrading And Migration.

How This Works

Polymorphism is an important capability. However, there can be some misunderstandings about the scope of what it can do. The simplest way to avoid any confusion is to show a sample of the stored materials and explain that content:

{
    "value": [
        "maintenance_job.MaintenanceJob",
        {
            "area": 0,
            "dow": [
                false,
                false,
                false,
                false,
                false,
                false,
                false
            ],
            "tod": []
        }
    ]
}

The JSON value has changed from an object to a list, containing a string and an object. The string appears to identify the type of the object.

That is exactly the case. The first element is the name of the declared class in a form curated by the library and the second element is exactly what normally appears as the value. This explains why a file stored using ar.File('job', ReceivedJob) cannot be recovered using ar.File('job', ar.Any).

A polymorphic “recover” operation cannot recover just anything; it must be presented with materials created by a polymorphic “store”.
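The tagging scheme can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the library's implementation: REGISTRY, register(), store_any(), recover_any() and store_typed() are hypothetical names, the classes are plain Python objects, and the tag is simply the module and class name.

```python
import json

REGISTRY = {}    # hypothetical table of tag -> class

def tag_of(cls):
    return f'{cls.__module__}.{cls.__qualname__}'

def register(cls):
    REGISTRY[tag_of(cls)] = cls
    return cls

@register
class ReceivedJob:
    def __init__(self, title='watchdog'):
        self.title = title

@register
class MaintenanceJob:
    def __init__(self, area=0):
        self.area = area

def store_typed(obj):
    """Non-polymorphic store: the value is the object itself."""
    return json.dumps({'value': vars(obj)})

def store_any(obj):
    """Polymorphic store: the value is a [tag, object] pair."""
    return json.dumps({'value': [tag_of(type(obj)), vars(obj)]})

def recover_any(text):
    value = json.loads(text)['value']
    if not (isinstance(value, list) and len(value) == 2):
        raise ValueError('not a polymorphic representation')
    name, members = value
    return REGISTRY[name](**members)

m = recover_any(store_any(MaintenanceJob(area=3)))
print(type(m).__name__, m.area)    # MaintenanceJob 3

try:
    recover_any(store_typed(ReceivedJob()))
except ValueError as e:
    print(e)                       # not a polymorphic representation
```

The second call fails because the materials hold a bare object where a tagged pair is expected.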

Going Incognito

As a part of polymorphism, the library includes special handling of unknown types. During a store operation the library compiles the identifying string and tags the representation with that name. During a recover operation the library de-compiles that tag to the actual Python class object.

Any failure to de-compile is simply a case of the application not knowing the named type - this is either a problem on the development side (i.e. a bug) or an operations problem. An example of the latter is where a file originating from a different system is presented to an unsuspecting application.

The library detects these scenarios and, during a recover, folds the materials into an Incognito object. The outcome is that recover operations of unknown types do not fail in the normal sense. They produce an instance of a special class:

def work_on_job(j):
    if isinstance(j, ReceivedJob):
        # Process the external job.
        return True
    elif isinstance(j, MaintenanceJob):
        # Process internal job.
        return True
    elif isinstance(j, ar.Incognito):
        log(j.type_name)
        return False
    software_error()

This implementation of work_on_job logs the name of any un-registered class. If the job is not an instance of anything tested for by the function, software_error is called. In that case the class of the job is known within the application (i.e. it has been registered) but the function has not yet been updated to perform the related work.

Note

For those who are curious, or who need an explanation for something they have observed: the Incognito object never appears in stored materials. The object is un-folded during the store process in a manner matching the prior folding. Effectively this allows representations of unknown type to pass through the application without change.
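The fold and un-fold behaviour can be sketched as follows. Everything here is a stand-in written for illustration: the local Incognito class, its members attribute, the REGISTRY table and the tag strings are assumptions, and only the type_name attribute is taken from the example above.

```python
import json

class Incognito:
    """Local stand-in: holds materials whose type name is unknown."""
    def __init__(self, type_name, members):
        self.type_name = type_name
        self.members = members    # hypothetical attribute

class ReceivedJob:
    def __init__(self, title='watchdog'):
        self.title = title

# Hypothetical registration table and its reverse.
REGISTRY = {'jobs.ReceivedJob': ReceivedJob}
NAMES = {cls: name for name, cls in REGISTRY.items()}

def recover_any(text):
    name, members = json.loads(text)['value']
    cls = REGISTRY.get(name)
    if cls is None:
        return Incognito(name, members)    # fold the unknown type
    return cls(**members)

def store_any(obj):
    if isinstance(obj, Incognito):
        pair = [obj.type_name, obj.members]    # un-fold, unchanged
    else:
        pair = [NAMES[type(obj)], vars(obj)]
    return json.dumps({'value': pair}, sort_keys=True)

# Materials from a different system, naming a class unknown here.
foreign = json.dumps(
    {'value': ['other_system.AuditJob', {'level': 2}]}, sort_keys=True)

j = recover_any(foreign)
print(type(j).__name__, j.type_name)    # Incognito other_system.AuditJob
print(store_any(j) == foreign)          # True
```

The stored text after the round trip is identical to the input, so an application that does not know the type can still pass it along intact.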