A Folder Of Jobs
This section introduces the Folder class and how it can be used to manage collections of files containing application objects. Folders can be treated like maps of objects. Hierarchies of folders with specific content at each point can be created and managed in a clear and concise way.
A Folder In The Filesystem
A Folder is an absolute location in the filesystem. Once created it always refers to the same location, independent of where the host application may move to:
>>> import ansar as ar
>>>
>>> f = ar.Folder('mighty-thor')
>>> f.path
'/home/bjorn/mighty-thor'
Bjorn is working in his home folder. Internally the Folder object converts the relative name mighty-thor to the full pathname. All subsequent operations on the object will operate on that absolute location. Full pathnames passed to the Folder are adopted without change, and passing no name at all is a synonym for the current folder.
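The effect of capturing an absolute location can be pictured with the standard library alone. The following is a minimal sketch and not the ansar implementation; the resolve() helper is hypothetical and only assumes that a name is made absolute once, at construction time.

```python
import os
import tempfile

def resolve(name=None):
    """Hypothetical helper: turn a folder name into an absolute path.

    No name at all means the current folder. Absolute names are kept
    (os.path.abspath only normalises them).
    """
    return os.path.abspath(name) if name else os.getcwd()

start = os.getcwd()
path = resolve('mighty-thor')       # captured relative to the current folder

# Moving the process elsewhere does not change the captured location.
os.chdir(tempfile.gettempdir())
still_the_same = (path == os.path.join(start, 'mighty-thor'))
os.chdir(start)
```

Because the path is computed once and stored, later changes of the working directory have no effect on where the object points.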
Creation of Folder objects is also the mechanism for creation of folders in the filesystem. This means that the mighty-thor folder is guaranteed to exist on disk once the f variable has been assigned. Any errors result in an exception.
A Folder Of Folders And Files
The following code has an excellent chance of producing a folder hierarchy in your own home folder:
import os
import ansar as ar
home = ar.Folder(os.environ['HOME'])
gods = home.folder('gods')
odin = gods.folder('odin')
loki = gods.folder('loki')
thor = gods.folder('thor')
That hierarchy consists of a gods folder in the home folder, containing the three sub-folders odin, loki and thor.
Note the use of the folder() method to create sub-folders from the parent. The new Folder refers to the absolute location below the parent.
Remembering the ReceivedJob class, work can now be delegated with the following:
f = loki.file('job', ReceivedJob)
j = ReceivedJob(title='royal decree', service='herculean task')
f.store(j)
The file() method is used to create a File object at the absolute location provided by the parent folder object. The store() method is used to pass the job on to Loki.
Listing The Files In A Folder
A folder is a container of files. These can be fixed decorations on a known hierarchy of folders, or they can be a dynamic collection, where the set of files available at any one time is unknown. This is the case for a spooling area where jobs are persisted until completed or abandoned. The next few paragraphs are relevant to folders that behave like spooling areas.
Assuming that loki is conscientious about his responsibilities, he might check for new assignments using this:
received = [m for m in loki.matching()]
The matching() generator method returns a sequence of the filenames detected in the folder. Given the following folder listing:
$ ls /home/bjorn/gods/loki
2888-43c4-998f-3b5671f69459.xml 4409-4182-a1fc-dde4004ccbe9.xml
549d-4ba9-9a08-f77b50540c92.xml 2856-4e96-bc0b-3840ae3b2c6a.xml
3128-4f85-9729-691661b55682.xml 2eaf-4efb-b07a-aa1ad6e67d04.xml
631b-4f18-9207-0e39940a668b.xml 1fae-4dc2-b274-149f7520bed0.xml
4995-40a3-8ccd-116bcf78fd83.xml 5f26-4d12-8276-b615244edc4e.xml
3dec-4518-be5b-953065216afc.xml b11b-4d55-8168-cdeab30ae771.xml
configuration.json
The matching() method will return the sequence “2888-43c4-998f-3b5671f69459”, “4409-4182-a1fc-dde4004ccbe9”, “549d-4ba9-9a08-f77b50540c92”, etc. The method automatically strips the file extension, resulting in a name suitable for any file operations that might follow. As always, this automated handling of file extensions can be disabled by passing decorate_names=False on creation of the loki Folder object.
The configuration name will not appear in the listing as it does not end with the extension setting for the folder. If a folder is to contain a mixture of fixed decorations and dynamic content, the proper way to do that is with the re (i.e. regular expression) parameter on creation of the Folder object:
loki = gods.folder('loki',
    te=ReceivedJob,
    re='^.{27}$', encoding=ar.CodecXml)
Note
The te parameter is optional for the Folder class, unlike for the File class. For this reason it must be named.
This brute-force expression will cause the loki folder object to limit its attention to those filenames that are 27 characters long (e.g. the length of “2888-43c4-998f-3b5671f69459”). Internally the expression match is performed on the truncated version of the filename, i.e. with no file extension. The folder can then contain fixed decorations and the Folder methods involved in processing dynamic content will not “see” them. The configuration.json file can be replaced with a configuration.xml file, if that was the true intent.
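The filtering described above can be approximated with the standard library. This is a sketch under stated assumptions, not the ansar code: the matching() function below is a hypothetical stand-in that assumes the scan first drops names with the wrong extension and then applies the expression to the truncated name.

```python
import os
import re

listing = [
    '2888-43c4-998f-3b5671f69459.xml',
    '549d-4ba9-9a08-f77b50540c92.xml',
    'configuration.xml',
    'configuration.json',
]

def matching(filenames, expression, extension='.xml'):
    """Hypothetical stand-in for the scan: yield truncated names that match."""
    pattern = re.compile(expression)
    for filename in filenames:
        name, ext = os.path.splitext(filename)
        if ext != extension:
            continue                  # wrong extension, never considered
        if pattern.match(name):       # match is made on the truncated name
            yield name

jobs = list(matching(listing, '^.{27}$'))
```

Only the two 27-character names survive; configuration.xml has the right extension but its truncated name is too short to match.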
It is also valid to create several Folder objects that refer to the same absolute location but are created with different re expressions. As long as the expressions describe mutually exclusive names the different dynamic collections can exist alongside each other.
Of course, the simplest arrangement is for any dynamic content to be assigned its own dedicated folder. Considering the ease with which folders can be created “on disk” there is less justification for maintaining folders with mixed content.
Working With A Folder Of Files
The each() method is similar to matching() except that it returns a sequence of ready-made File objects. This means that the object inside the file is one method call away:
for f in loki.each():
    j, _ = f.recover()
    # Process the job here.
    f.store(j)
The recover() method, introduced in a previous section, is being used to load the file contents into a ReceivedJob. The caller is free to process the job and perhaps save the results back into the file.
Yet another method exists to further automate the processing of folders. The recover() method of the Folder class goes all the way and returns a sequence of the ReceivedJob objects. More precisely, each item in the sequence is a 3-tuple of 1) a unique key, 2) the recovered object and 3) the detected version. An extra parameter is required at Folder construction time:
kn = (lambda j: j.unique_id, lambda j: str(j.unique_id))
loki = gods.folder('loki', te=ReceivedJob,
    re='^.{27}$',
    encoding=ar.CodecXml,
    keys_names=kn)
The keys_names parameter delivers a pair of functions to the Folder object. These two functions are used internally during the execution of several Folder methods, to calculate a key value and a filename, respectively.
When the recover() method opens a file and loads the contents, this results in an instance of the te. The method then calls the first function, passing the freshly loaded object. The function can make use of any of the values within the object to formulate the key. The constraints are that the result must be acceptable as a unique Python dict key and that the value is “stable”, i.e. the key formulated for an object will be the same each time the object is loaded.
Whatever that function produces becomes the first element of the k, j, _ tuple below:
jobs = {k: j for k, j, _ in loki.recover()}
This gives the application complete control over the key value used by the dict comprehension. Calling the store() method looks like this:
loki.store(jobs)
The method iterates the collection of jobs, writing the latest values from each object into a file in the filesystem. To do this it uses the second keys_names function, passing the current object and getting a filename in return. The function can make use of any of the values within the object to formulate the filename. The constraints are that the result must be acceptable as a filename and that the value is “stable”, i.e. the filename constructed for an object will be the same each time the object is stored.
In advanced use there can also be the need for an additional “tag” that distinguishes one set of Folder-related materials from another. Simply adding the “job-” prefix to the constructed filename is an example of a tag. An additional collection of objects co-habiting the same space might be given the “schedule-” prefix. The final effect of the second keys_names function is that the application has complete control over where objects are stored, i.e. under what filenames.
There is no requirement relating the keys and the filenames. The set of keys produced for a set of objects in a Folder is independent of the set of filenames produced for those same objects. There can be cases where the same value can be used for both but doing so is a design choice.
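The two roles of the keys_names pair can be exercised without any folder at all. The sketch below is illustrative only: the Job dataclass is a hypothetical stand-in for ReceivedJob (assumed to carry a unique_id), and the “job-” prefix is the tag mentioned above rather than anything required by the library.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class Job:
    """Hypothetical stand-in for ReceivedJob, carrying a unique_id."""
    title: str
    unique_id: uuid.UUID = field(default_factory=uuid.uuid4)

# First function produces the dict key, second produces the filename.
# Here the key is the UUID itself and the filename is a tagged string.
kn = (lambda j: j.unique_id, lambda j: 'job-' + str(j.unique_id))
key_of, name_of = kn

j = Job(title='royal decree')
jobs = {key_of(j): j}          # the key is hashable and stable
filename = name_of(j)          # the name is stable for the same object
```

Both functions depend only on a value that never changes for a given object, which is what makes the key and the filename “stable” in the sense described above.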
Note
The store() and recover() methods are not designed to work in the same way. The first is a method that accepts an entire dict whereas the second is a generator method that can be used to construct a dict, by visiting one file at a time. This design difference is because recovery of objects involves version information and the application needs an opportunity to respond to that version, for each individual file. Refer to Versions, Upgrading And Migration for more information.
The individual jobs can be modified:
for k, j in jobs.items():
    if update_job(j):
        loki.update(jobs, j)
Or the entire collection can be processed and then saved back to the folder as a single operation:
for k, j in jobs.items():
    update_job(j)
loki.store(jobs)
There are also methods to support adding new jobs, removing individual jobs and, lastly, removing an entire collection. This group of methods assumes the dict object to be the canonical reference, modifying the related folder contents as needed.
A Few Details
The three “scanning” methods, matching(), each() and recover(), provide different styles of folder processing. To avoid the dangers associated with modifications to folder contents during scanning, the latter two methods take filename snapshots using matching() and then iterate the snapshots.
The style based on the matching() method is the most powerful but also requires the most boilerplate code. Using the each() method avoids the responsibility of creating a correct File object and allows for both recover() and store() operations on the individual objects. Lastly, the recover() method requires the least boilerplate but is constrained in one important aspect: there is no File object available. Processing a folder with the recover() method is a “read-only” process, since without a File object there can be no store().
The clear() method uses a snapshot to select files for deletion, rather than a wholesale delete of all folder contents. This preserves the integrity of the folder where it is being shared with fixed files, and other Folder objects defined with different re expressions.
Snapshots are also used to delete any “dangling” files at the end of a call to store(). This ensures that the set of files in the folder is consistent with the contents of the presented dict.
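The removal of dangling files can be sketched with the standard library. This is an assumption-laden illustration rather than the ansar implementation: store_all() is a hypothetical function, JSON stands in for the real encoding, and the snapshot here is a plain directory listing, whereas the text above describes a snapshot restricted to the names that the Folder actually matches.

```python
import json
import os
import tempfile

def store_all(folder, values, name_of):
    """Hypothetical sketch: write every value, then remove dangling files."""
    snapshot = set(os.listdir(folder))          # names present before writing
    written = set()
    for value in values.values():
        filename = name_of(value) + '.json'
        with open(os.path.join(folder, filename), 'w') as f:
            json.dump(value, f)
        written.add(filename)
    for dangling in snapshot - written:          # on disk but not in the dict
        os.remove(os.path.join(folder, dangling))

with tempfile.TemporaryDirectory() as folder:
    name_of = lambda v: v['id']
    jobs = {1: {'id': 'a'}, 2: {'id': 'b'}}
    store_all(folder, jobs, name_of)
    del jobs[2]                                   # job 'b' is dropped
    store_all(folder, jobs, name_of)
    remaining = sorted(os.listdir(folder))
```

After the second call only the file for the surviving job is left, so the folder contents again mirror the dict.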