More About Types¶
This section looks more closely at what Ansar can provide when data requirements become more demanding. Perhaps a configuration file needs to contain a list of recently opened files, or a job scheduler needs to maintain a table of performance metrics.
A series of changes is made to the ReceivedJob class introduced in an earlier
section. These changes involve progressively more complex types. The full series
introduces enough information to cover most application data requirements. A
separate section looks into the more arcane corners of data.
This evolution of ReceivedJob is also used as a vehicle for illustrating
how Ansar assists the application with version support. The library provides
version-related information at crucial moments. It’s up to the application to
take the proper action. In general that means responding intelligently to old
versions of stored data. Refer to Versions, Upgrading And Migration for
a how-to on version support.
A Refresh Cycle¶
The starting point is the original declaration:
import uuid
import ansar as ar
class ReceivedJob(ar.Message):
    def __init__(self, unique_id=None, title='watchdog', priority=10, service='noop', body=b''):
        ar.Message.__init__(self)
        self.unique_id = unique_id or uuid.uuid4()
        self.title = title
        self.priority = priority
        self.service = service
        self.body = body
ar.bind(ReceivedJob)
Members of the class are listed below along with their respective Python types:
unique_id
- uuid.UUID
title
- str
priority
- int
service
- str
body
- bytes
These types are said to be “complete” in that the library needs no further type information to store and recover the members. An example of an incomplete type is a list, because the type of its contents is unknown.
These types are also known as the “autos” in that any detected instance of these
types is automatically treated in the expected way - a “10” is treated as a
Python int and a “0.25” is treated as a Python float. This might seem
redundant but there are actually many cases where the same Python type is used
for different purposes. The overlap can result in unexpected and undesirable
results.
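The idea of a “complete” type can be sketched with the standard library alone. The code below is an illustration under stated assumptions, not Ansar's implementation: AUTO_TYPES, infer_member_types, Job and JobWithList are all hypothetical names. It shows that a type can be read directly off a str or int value, while a list offers no information about its elements.

```python
import uuid

# Hypothetical table of "auto" types (an assumption for this sketch,
# not the library's internal table).
AUTO_TYPES = (int, float, str, bytes, uuid.UUID)

def infer_member_types(obj):
    """Return {member name: type} when every member has a complete type."""
    inferred = {}
    for name, value in vars(obj).items():
        if isinstance(value, AUTO_TYPES):
            inferred[name] = type(value)
        else:
            # e.g. a list: the element type cannot be read off the value.
            raise TypeError(f'member "{name}" needs explicit type details')
    return inferred

class Job:
    def __init__(self):
        self.title = 'watchdog'
        self.priority = 10

class JobWithList(Job):
    def __init__(self):
        Job.__init__(self)
        self.who = []   # incomplete: a list of what?

print(infer_member_types(Job()))
```

Passing a JobWithList instance to the same function raises TypeError, which is the situation that explicit type information is intended to resolve.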
Remembering When¶
Recording the moment a job was presented is likely to be useful. Adding a created
member looks like this:
import time
import uuid
import ansar as ar
class ReceivedJob(ar.Message):
    def __init__(self, created=None, unique_id=None,
                 title='watchdog', priority=10, service='noop', body=b''):
        ar.Message.__init__(self)
        self.created = created or time.time()
        self.unique_id = unique_id or uuid.uuid4()
        self.title = title
        self.priority = priority
        self.service = service
        self.body = body
ar.bind(ReceivedJob)
This new declaration satisfies the “type-complete” requirement - it can immediately be put to work. New features and functionality can be based on the accurate storage and recovery of the creation time, which is great. However, a peek at the stored materials is disappointing:
{
    "value": {
        "body": "",
        "created": 1592693932.694369,
        "priority": 10,
        "service": "noop",
        "title": "watchdog",
        "unique_id": "9a61f70f-bee6-41ce-b175-ae403384af46"
    }
}
The created member appears as a floating-point value. This is functionally
valid but the stored representation is unreadable as a time, and the value
cannot be modified without a lot of bother. In this scenario that capability is
less likely to be needed, but consider an application configuration file that
contains a date of birth. The stored birthday could not be readily checked or
modified using a text editor.
Passing additional type information at registration time fixes this issue:
ar.bind(ReceivedJob, type_details={'created': ar.WorldTime})
Readability is restored in the file contents:
{
    "value": {
        "body": "",
        "created": "2020-06-20T22:58:52.829716Z",
        "priority": 10,
        "service": "noop",
        "title": "watchdog",
        "unique_id": "cfcd943f-2254-4a3a-836d-4c0842624d27"
    }
}
The Python float type is used for both mathematical purposes and for the
marking of time. The library understands this overloading and provides a
mechanism for users to be specific about their intentions.
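The relationship between the two representations can be sketched with the standard datetime module. This is only an approximation of the text format shown above, not the library's own conversion, and the function names epoch_to_text and text_to_epoch are hypothetical.

```python
from datetime import datetime, timezone

TEXT_FORMAT = '%Y-%m-%dT%H:%M:%S.%fZ'

def epoch_to_text(t):
    """Render an epoch float as a UTC time that a person can read and edit."""
    return datetime.fromtimestamp(t, tz=timezone.utc).strftime(TEXT_FORMAT)

def text_to_epoch(s):
    """Parse the readable form back into the epoch float used at runtime."""
    d = datetime.strptime(s, TEXT_FORMAT).replace(tzinfo=timezone.utc)
    return d.timestamp()

text = epoch_to_text(1592693932.5)
print(text)
print(text_to_epoch(text))
```

The application continues to work with a float while the file holds a value that can be checked or corrected in a text editor.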
Then There Was None¶
With specific type information available there is no need to initialize the
associated member with a value. The library gives the specific information precedence
over what it finds in the object itself. This is useful as it allows None values
to flow around in a much more natural way. It also means that initializers involving
computed values can be removed, saving machine cycles and perhaps avoiding undesirable
side-effects. Both the created and unique_id members can benefit from the
related adjustment:
import ansar as ar
class ReceivedJob(ar.Message):
    def __init__(self, created=None, unique_id=None,
                 title='watchdog', priority=10,
                 service='noop', body=b''):
        ar.Message.__init__(self)
        self.created = created
        self.unique_id = unique_id
        self.title = title
        self.priority = priority
        self.service = service
        self.body = body
ar.bind(ReceivedJob, type_details={
    'created': ar.WorldTime,
    'unique_id': ar.UUID,
})
Allowing None values is a significant design decision. Providing usable, default
values for all members is generally safer, especially where a container is involved.
In the case of fixed-size arrays this is even more so.
Note
As an alternative, there is a small set of functions with names like default_time()
and default_uuid() that return valid examples of the implied type. They do so at the least
possible expense to the application. The actual values are undefined and should not be
compared against other values. In the context of recovering stored representations, the
assumption is that these default values will be immediately overwritten by the incoming data.
There are also default_ functions that help with declaration of the container types.
See some example usage in the following section.
Names For Runtime Numbers¶
Where there are operational numbers that need readable representation an Enumeration is used to map the numbers to strings during storage and from strings to numbers during recovery. Declaration looks like this:
import ansar as ar
ModeOfTransport = ar.Enumeration(CAR=1, TRUCK=2, MOTORCYCLE=3, BOAT=100, SCOOTER=10, SKATEBOARD=11)
class ReceivedJob(ar.Message):
    def __init__(self, unique_id=None, title='watchdog', priority=10, service='noop', body=b'', transport=None):
        ar.Message.__init__(self)
        self.unique_id = unique_id or ar.default_uuid()
        self.title = title
        self.priority = priority
        self.service = service
        self.body = body
        self.transport = transport or ModeOfTransport.MOTORCYCLE
ar.bind(ReceivedJob, type_details={
    'transport': ModeOfTransport,
})
A simple example of usage appears below:
>>> f = ar.File('received_job_transport', ReceivedJob)
>>> j = ReceivedJob()
>>> f.store(j)
>>>
>>> r, _ = f.recover()
>>> r.transport
3
Lastly, the stored representation looks like this:
{
    "value": {
        "body": "",
        "priority": 10,
        "service": "noop",
        "title": "watchdog",
        "transport": "MOTORCYCLE",
        "unique_id": "66bc218a-e24b-4b36-a3a1-7985f017e535"
    }
}
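The two-way mapping can be pictured as a pair of lookups. The sketch below uses plain dictionaries rather than Ansar's Enumeration, and the names NAME_TO_NUMBER, NUMBER_TO_NAME, to_stored and from_stored are hypothetical. It shows the number in use at runtime and the name in the stored representation.

```python
# Hypothetical stand-in for the mapping held by an enumeration.
NAME_TO_NUMBER = {
    'CAR': 1, 'TRUCK': 2, 'MOTORCYCLE': 3,
    'SCOOTER': 10, 'SKATEBOARD': 11, 'BOAT': 100,
}
NUMBER_TO_NAME = {number: name for name, number in NAME_TO_NUMBER.items()}

def to_stored(number):
    """Storage direction: runtime number to readable name."""
    return NUMBER_TO_NAME[number]

def from_stored(name):
    """Recovery direction: readable name back to runtime number."""
    return NAME_TO_NUMBER[name]

print(to_stored(3))
print(from_stored('MOTORCYCLE'))
```

A name that is missing from the table raises KeyError in this sketch. How the library itself reports an unknown name is not covered here.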
Sequences And Collections¶
A notification feature is added to the job processing machinery. It is decided that notifications will be in the form of emails sent to designated parties. Each job needs a list of zero or more email addresses:
import ansar as ar
class ReceivedJob(ar.Message):
    def __init__(self, created=None, unique_id=None,
                 title='watchdog', priority=10,
                 service='noop', body=b'', who=None):
        ar.Message.__init__(self)
        self.created = created or ar.default_time()
        self.unique_id = unique_id or ar.default_uuid()
        self.title = title
        self.priority = priority
        self.service = service
        self.body = body
        self.who = who or ar.default_vector()
ar.bind(ReceivedJob, type_details={
    'created': ar.WorldTime,
    'who': ar.VectorOf(ar.Unicode),
})
Given the updated job creation:
ReceivedJob(created=time.time(),
            title='the quick',
            who=['tom.pirate@black.ship', 'gerard.diplomat@ivory.tower', 'aswan.swami@nowhere.everywhere'])
The resulting file contents look like this:
{
    "value": {
        "body": "",
        "created": "1970-01-01T00:00:00Z",
        "priority": 10,
        "service": "noop",
        "title": "the quick",
        "unique_id": "43910cc6-84c4-40f9-9d13-9f8681687108",
        "who": [
            "tom.pirate@black.ship",
            "gerard.diplomat@ivory.tower",
            "aswan.swami@nowhere.everywhere"
        ]
    }
}
A job now has a list of email addresses. The list can be empty but there is always
a list present, i.e. the j.who member is always an instance of a Python list.
The default_vector() function returns a new list every time it is called.
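Returning a new list on every call matters because a single shared list would be visible through every default-constructed object. The sketch below demonstrates the hazard in plain Python. The fresh_list function is a hypothetical stand-in for a function with that behaviour, and the two classes exist only for the comparison.

```python
def fresh_list():
    """Hypothetical stand-in: returns a new, empty list on every call."""
    return []

SHARED = []

class SharedWho:
    def __init__(self, who=None):
        self.who = who or SHARED         # all default instances share one list

class FreshWho:
    def __init__(self, who=None):
        self.who = who or fresh_list()   # each default instance owns its list

a, b = SharedWho(), SharedWho()
a.who.append('tom.pirate@black.ship')
print(b.who)   # ['tom.pirate@black.ship'] - the address leaked into b

c, d = FreshWho(), FreshWho()
c.who.append('tom.pirate@black.ship')
print(d.who)   # [] - d is unaffected
```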
The type_details parameter is a dict of names and “type expressions”. An
instance of the VectorOf class is one example of a type expression. This is
the full set of sequences and collections supported in type expressions:
ArrayOf
- a fixed-length sequence
VectorOf
- a variable-length sequence
DequeOf
- a double-ended queue
SetOf
- a unique collection
MapOf
- an associative array
Refer to type expressions for full information. Example usage appears below:
'recent_work': ar.DequeOf(uuid.UUID),
'times_square': ar.ArrayOf(ar.ArrayOf(ar.WorldTime,2),2),
'time_periods': ar.DequeOf(ar.TimeSpan),
'priority_queue': ar.MapOf(int, ar.VectorOf(ReceivedJob))
Given the previously listed type expressions and the following values:
recent_work=ar.deque([
    ar.uuid4(),
    ar.uuid4(),
    ar.uuid4(),
    ar.uuid4(),
]),
times_square=[
    [time.time(), time.time()],
    [time.time(), time.time()]
],
time_periods=ar.deque([
    ar.time_span(hours=1, minutes=2, seconds=3.0),
    ar.time_span(hours=8),
    ar.time_span(minutes=10),
    ar.time_span(seconds=0.0125),
    ar.time_span(days=1),
]),
priority_queue={
    100: [ReceivedJob(title='a'), ReceivedJob(title='b')],
    10: [],
    1: [ReceivedJob(title='!')],
}
The resulting JSON file contents will look like this:
"priority_queue": [
    [
        100,
        [
            {
                "body": "",
                "created": "1970-01-01T00:00:00Z",
                "priority": 10,
                "service": "noop",
                "title": "a",
                "unique_id": "4accaeef-b44a-4b18-8c6b-a28c5f8a2bd5",
                "who": []
            },
            {
                "body": "",
                "created": "1970-01-01T00:00:00Z",
                "priority": 10,
                "service": "noop",
                "title": "b",
                "unique_id": "e2c480ed-8fa6-414e-82b0-8fe8964a67c9",
                "who": []
            }
        ]
    ],
    [
        10,
        []
    ],
    [
        1,
        [
            {
                "body": "",
                "created": "1970-01-01T00:00:00Z",
                "priority": 10,
                "service": "noop",
                "title": "!",
                "unique_id": "bd5fb91c-1369-4430-8555-43935d24e233",
                "who": []
            }
        ]
    ]
],
"recent_work": [
    "390a9d21-bbd8-457a-9f39-72113238bf9f",
    "b3bbd453-e1c5-49de-bddc-68f212964587",
    "242facf7-bfc2-4aea-9e8c-98d43abaca0a",
    "11587a87-e49e-4758-9754-d261b1c7e10e"
],
"time_periods": [
    "1h2m3s",
    "8h",
    "10m",
    "0.0125s",
    "1d"
],
"times_square": [
    [
        "2020-05-30T04:17:07.419894Z",
        "2020-05-30T04:17:07.419894Z"
    ],
    [
        "2020-05-30T04:17:07.419895Z",
        "2020-05-30T04:17:07.419895Z"
    ]
]
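Note that the priority_queue map appears as a list of key-value pairs rather than as a JSON object. One likely reason, offered here as an assumption rather than a statement about the library's design, is that JSON object keys must be strings, so integer keys would not survive a round trip. The sketch below uses only the standard json module, and the helper names map_to_pairs and pairs_to_map are hypothetical.

```python
import json

def map_to_pairs(m):
    """Encode a dict as a list of [key, value] pairs."""
    return [[k, v] for k, v in m.items()]

def pairs_to_map(pairs):
    """Decode a list of [key, value] pairs back into a dict."""
    return {k: v for k, v in pairs}

queue = {100: ['a', 'b'], 10: [], 1: ['!']}

# A plain JSON object silently turns the integer keys into strings.
as_object = json.loads(json.dumps(queue))
print(list(as_object))

# The pair form keeps the keys as integers.
as_pairs = pairs_to_map(json.loads(json.dumps(map_to_pairs(queue))))
print(as_pairs == queue)
```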
Those Fixed-Size Arrays¶
All the containers have convenient functions, such as default_vector(), to help with the proper initialization of members inside registered classes.
That is, all except arrays. The fundamental reason for the difference is that default instances
of list, dict and set are empty. Additional information is required (i.e. the type
of the elements and the size) to construct a default instance of a particular array. The fact that
the elements of the array could themselves be of any type further complicates the scenario.
A special function is provided that accepts any valid type expression - including expressions involving arrays - and returns a default instance of that type. The following example constructs a 4-by-2 array of integers:
>>> import ansar as ar
>>> t = ar.ArrayOf(ar.ArrayOf(int,2),4)
>>> a = ar.default_value(t)
>>> a
[[0, 0], [0, 0], [0, 0], [0, 0]]
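The recursion involved in building such a value can be sketched without the library. The type description used below, a tuple of ('array', element, size), is a hypothetical notation for this sketch only and is unrelated to Ansar's type expressions. The function name default_of is likewise an assumption.

```python
def default_of(t):
    """Build the simplest value for a (hypothetical) type description."""
    if isinstance(t, tuple) and t[0] == 'array':
        _, element, size = t
        # Build each element separately so that rows are independent objects.
        return [default_of(element) for _ in range(size)]
    # A plain class: int() gives 0, str() gives '', list() gives [].
    return t()

grid = default_of(('array', ('array', int, 2), 4))
print(grid)   # [[0, 0], [0, 0], [0, 0], [0, 0]]
```

Building each row with its own call avoids the aliasing that an expression such as [[0, 0]] * 4 would introduce, where all four rows are the same list object.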
The default_value() function returns the simplest, conforming instance of the
specified type. There are several possible styles of usage. A revised version of the
ReceivedJob declaration appears below:
import uuid
import ansar as ar
CREATED_TYPE = ar.WorldTime
UNIQUE_ID_TYPE = uuid.UUID
WHO_TYPE = ar.VectorOf(str)
class ReceivedJob(ar.Message):
    def __init__(self, created=None, unique_id=None,
                 title='watchdog', priority=10,
                 service='noop', body=b'', who=None):
        ar.Message.__init__(self)
        self.created = created or ar.default_value(CREATED_TYPE)
        self.unique_id = unique_id or ar.default_value(UNIQUE_ID_TYPE)
        self.title = title
        self.priority = priority
        self.service = service
        self.body = body
        self.who = who or ar.default_value(WHO_TYPE)
ar.bind(ReceivedJob, type_details={
    'created': CREATED_TYPE,
    'unique_id': UNIQUE_ID_TYPE,
    'who': WHO_TYPE,
})
A set of type expressions appears before the class declaration. The types are used for
initialization of members and for registration of the class. This avoids multiple definitions.
The consistent use of default_value() rather than a mixture of
functions (e.g. default_vector()) and constants (e.g. []) is arguably
better engineering. The benefits of this style can become more compelling as the class becomes
more complex.
Polymorphic Persistence¶
The library supports the concept of polymorphism through the Any type. Rather than declaring that a file contains a specific type:
>>> f = ar.File('job', ReceivedJob)
>>> j = ReceivedJob()
>>> f.store(j)
>>> r, _ = f.recover()
>>> r
<__main__.ReceivedJob object at 0x7f75deaf0750>
A file can be declared in this way:
>>> f = ar.File('job', ar.Any)
>>> j = ReceivedJob()
>>> f.store(j)
>>> r, _ = f.recover()
>>> r
<__main__.ReceivedJob object at 0x7f75deaf0750>
To the application the behaviour has not changed. If a second class declaration is introduced:
import ansar as ar
TOD_TYPE = ar.VectorOf(ar.ClockTime)
DOW_TYPE = ar.ArrayOf(bool, 7)
class MaintenanceJob(ar.Message):
    def __init__(self, area=0, tod=None, dow=None):
        ar.Message.__init__(self)
        self.area = area
        self.tod = tod or ar.default_value(TOD_TYPE)
        self.dow = dow or ar.default_value(DOW_TYPE)
ar.bind(MaintenanceJob, type_details={
    'tod': TOD_TYPE,
    'dow': DOW_TYPE,
})
A fresh sequence of operations illustrates the new possibilities:
>>> f = ar.File('job', ar.Any)
>>> r = ReceivedJob()
>>> m = MaintenanceJob()
>>> f.store(r)
>>> a, _ = f.recover()
>>> a
<__main__.ReceivedJob object at 0x7f14419af650>
>>> f.store(m)
>>> a, _ = f.recover()
>>> a
<__main__.MaintenanceJob object at 0x7f14419afe50>
A file declared as polymorphic using the Any type can contain an instance of any registered
class, such as ReceivedJob or MaintenanceJob. This capability is especially powerful when
used with the Folder class. Using Any, a Folder can
legitimately contain a mixture of job files and Folder processing remains essentially
the same:
spool = ar.Folder('spool', ar.Any)
for f in spool.each():
    j, v = f.recover()
    if work_on_job(j):
        f.store(j)
..
def work_on_job(j):
    if isinstance(j, ReceivedJob):
        # Process the external job.
        return True
    elif isinstance(j, MaintenanceJob):
        # Process internal job.
        return True
    # Unknown job type.
    return False
This avoids having to represent disparate job details - e.g. for ReceivedJob and
MaintenanceJob objects - within a single class. It’s also extensible in that new
job types can be added without disturbing the existing codebase and operational sites.
Obviously a proper strategy must be in place for when an instance of a new job type
somehow reaches an operational site before the associated software update.
Note
The recovery of polymorphic representations also integrates seamlessly with version support. Refer to Versions, Upgrading And Migration.
How This Works¶
Polymorphism is an important capability. However, there can be some misunderstandings about the scope of what it can do. The simplest way to avoid any confusion is to show a sample of the stored materials and explain that content:
{
    "value": [
        "maintenance_job.MaintenanceJob",
        {
            "area": 0,
            "dow": [
                false,
                false,
                false,
                false,
                false,
                false,
                false
            ],
            "tod": []
        }
    ]
}
The JSON value has changed from an object to a list containing a string and
an object. The string appears to identify the type of the object.
That is exactly the case. The first element is the name of the declared class in a form
curated by the library and the second element is exactly what normally appears as the
value. This explains why a file created using ar.File('job', ReceivedJob)
cannot be recovered by a file declared using ar.File('job', ar.Any).
A polymorphic “recover” operation cannot recover just anything; it must be presented with materials created by a polymorphic “store”.
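The tagged layout can be imitated in a few lines. The sketch below is a simplified model and not the library's implementation. REGISTRY, register, store_any and recover_any are hypothetical names, and the tag is the bare class name rather than the curated form shown above.

```python
import json

REGISTRY = {}   # hypothetical table of known classes, keyed by tag

def register(cls):
    REGISTRY[cls.__name__] = cls
    return cls

@register
class MaintenanceJob:
    def __init__(self, area=0):
        self.area = area

def store_any(obj):
    """Polymorphic store: the value is a [tag, body] pair."""
    return json.dumps({'value': [type(obj).__name__, vars(obj)]})

def recover_any(text):
    """Polymorphic recover: requires the [tag, body] pair written by store_any."""
    value = json.loads(text)['value']
    if not (isinstance(value, list) and len(value) == 2
            and isinstance(value[0], str)):
        raise ValueError('not a polymorphic representation')
    name, body = value
    return REGISTRY[name](**body)

job = recover_any(store_any(MaintenanceJob(area=7)))
print(type(job).__name__, job.area)
```

Presenting recover_any with an untagged value such as {"value": {"area": 0}} raises ValueError, because there is no tag from which to select a class.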
Going Incognito¶
As a part of polymorphism, the library includes special handling of unknown types. During a store operation the library compiles the identifying string and tags the representation with that name. During a recover operation the library de-compiles that tag to the actual Python class object.
Any failure to de-compile is simply a case of the application not knowing the named type - this is either a problem on the development side (i.e. a bug) or an operations problem. An example of the latter is where a file originating from a different system is presented to an unsuspecting application.
The library detects these scenarios, and during a recover folds the materials into an
Incognito object. The outcome is that recover operations of unknown types do not
fail in the normal sense. They produce an instance of a special class:
def work_on_job(j):
    if isinstance(j, ReceivedJob):
        # Process the external job.
        return True
    elif isinstance(j, MaintenanceJob):
        # Process internal job.
        return True
    elif isinstance(j, ar.Incognito):
        log(j.type_name)
        return False
    software_error()
This implementation of work_on_job logs the name of any un-registered class. If the
job is not an instance of anything tested for by the function, it calls the software_error
function. In that case the class of the job is known within the application (i.e. it has been
registered) but the function has not yet been updated to perform the related work.
Note
For those who are curious or need an explanation for something they have observed, the
Incognito object never appears in stored materials. The object
is un-folded during the store process in a manner matching the prior folding. Effectively
this allows representations of unknown type to pass through the application without change.
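The folding and un-folding can be modelled as follows. This is a sketch of the idea only. IncognitoSketch, recover_tagged and store_tagged are hypothetical names, and the only detail taken from the text above is that the holder records the type name.

```python
class IncognitoSketch:
    """Hypothetical holder for a representation of unknown type."""
    def __init__(self, type_name, body):
        self.type_name = type_name
        self.body = body

def recover_tagged(value, registry):
    name, body = value
    cls = registry.get(name)
    if cls is None:
        # Unknown type: fold the materials instead of failing.
        return IncognitoSketch(name, body)
    return cls(**body)

def store_tagged(obj):
    if isinstance(obj, IncognitoSketch):
        # Un-fold: write back exactly what was read.
        return [obj.type_name, obj.body]
    return [type(obj).__name__, vars(obj)]

original = ['other_system.AuditJob', {'level': 3}]
holder = recover_tagged(original, registry={})
passed_through = store_tagged(holder)
print(holder.type_name)
print(passed_through == original)
```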