A Folder Of Jobs

This section introduces the Folder class and how it can be used to manage collections of files containing application objects. Folders can be treated like maps of objects. Hierarchies of folders with specific content at each point can be created and managed in a clear and concise way.

A Folder In The Filesystem

A Folder is an absolute location in the filesystem. Once created, it always refers to the same location, regardless of any later change to the host application's current folder:

>>> import ansar as ar
>>>
>>> f = ar.Folder('mighty-thor')
>>> f.path
'/home/bjorn/mighty-thor'

Bjorn is working in his home folder. Internally the Folder object converts the relative name mighty-thor to the full pathname. All subsequent operations on the object will operate on that absolute location. Full pathnames passed to the Folder are adopted without change and no name at all is a synonym for the current folder.

Creation of Folder objects is also the mechanism for creation of folders in the filesystem. This means that the mighty-thor folder is assured to exist on disk once the f variable has been assigned. Any errors result in an exception.
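The two behaviours described above, resolving a name once and ensuring that the folder exists, can be sketched with the standard library alone. This is not the ansar implementation; the PinnedFolder class is a hypothetical stand-in used only for illustration.

```python
import os


class PinnedFolder:
    """Hypothetical stand-in for ar.Folder; illustration only."""

    def __init__(self, path=None):
        # No name at all is treated as the current folder.
        # abspath() resolves a relative name against the current folder
        # once, at construction, and keeps full pathnames as given
        # (apart from normalisation).
        self.path = os.path.abspath(path or '.')
        # Creating the object also creates the folder on disk.
        # Failures (e.g. permissions) surface as an OSError exception.
        os.makedirs(self.path, exist_ok=True)
```

Because the path is computed once and stored, a later os.chdir() by the application has no effect on the location the object refers to.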

A Folder Of Folders And Files

The following code has an excellent chance of producing a folder hierarchy in your own home folder:

import os
import ansar as ar

home = ar.Folder(os.environ['HOME'])
gods = home.folder('gods')
odin = gods.folder('odin')
loki = gods.folder('loki')
thor = gods.folder('thor')

That hierarchy will look like this:

[Figure: the gods folder inside the home folder, containing the odin, loki and thor sub-folders]

Note the use of the folder() method to create sub-folders from the parent. The new Folder refers to the absolute location below the parent.

Remembering the ReceivedJob class, work can now be delegated with the following:

f = loki.file('job', ReceivedJob)
j = ReceivedJob(title='royal decree', service='herculean task')

f.store(j)

The file() method is used to create a File object at the absolute location provided by the parent folder object. The store() method is used to pass the job on to Loki.

Note

The parameters passed on creation of a Folder are all saved in the object and influence the subsequent behaviour of class methods. They are also passed on to the child objects created by the folder() and file() methods, where appropriate.

Listing The Files In A Folder

A folder is a container of files. These can be fixed decorations on a known hierarchy of folders, or they can be a dynamic collection, where the set of files available at any one time is unknown. This is the case for a spooling area where jobs are persisted until completed or abandoned. The next few paragraphs are relevant to folders that behave like spooling areas.

Assuming that Loki is conscientious about his responsibilities, he might check for new assignments using this:

received = [m for m in loki.matching()]

The matching() generator method returns a sequence of the filenames detected in the folder. Given the following folder listing:

$ ls /home/bjorn/gods/loki
2888-43c4-998f-3b5671f69459.xml  4409-4182-a1fc-dde4004ccbe9.xml
549d-4ba9-9a08-f77b50540c92.xml  2856-4e96-bc0b-3840ae3b2c6a.xml
3128-4f85-9729-691661b55682.xml  2eaf-4efb-b07a-aa1ad6e67d04.xml
631b-4f18-9207-0e39940a668b.xml  1fae-4dc2-b274-149f7520bed0.xml
4995-40a3-8ccd-116bcf78fd83.xml  5f26-4d12-8276-b615244edc4e.xml
3dec-4518-be5b-953065216afc.xml  b11b-4d55-8168-cdeab30ae771.xml
configuration.json

The matching() method will return the sequence “2888-43c4-998f-3b5671f69459”, “4409-4182-a1fc-dde4004ccbe9”, “549d-4ba9-9a08-f77b50540c92”, etc. The method automatically truncates the file extension resulting in a name suitable for any file operations that might follow. As always, this automated handling of file extension can be disabled by passing decorate_names=False on creation of the loki Folder object.
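The truncation of file extensions can be pictured with a small standard-library sketch. The matching_names function and its extension parameter are hypothetical; they only imitate the behaviour described above and are not the ansar code.

```python
import os


def matching_names(folder_path, extension='xml'):
    """Yield the names in folder_path that carry the extension, without it."""
    suffix = '.' + extension
    # Sorting is a choice made for this sketch; the order used by the
    # real matching() method is not stated in this section.
    for name in sorted(os.listdir(folder_path)):
        if name.endswith(suffix):
            # Strip the extension so that the name can be reused
            # directly in later file operations.
            yield name[:-len(suffix)]
```

Run against the listing above, a function like this would yield the 12 job names and skip configuration.json, since that name does not end with the expected extension.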

The configuration name will not appear in the listing as it does not end with the extension setting for the folder. If a folder is to contain a mixture of fixed decorations and dynamic content the proper way to do that is using the re (i.e. regular expression) parameter on creation of the Folder object:

loki = gods.folder('loki',
    te=ReceivedJob,
    re='^.{27}$', encoding=ar.CodecXml)

Note

The te parameter is optional for the Folder class, unlike for the File class. For this reason it must be named.

This brute-force expression will cause the loki folder object to limit its attention to those filenames that are 27 characters long (e.g. the length of “2888-43c4-998f-3b5671f69459”). Internally the expression match is performed on the truncated version of the filename - with no file extension. The folder can then contain fixed decorations and the Folder methods involved in processing dynamic content will not “see” them. The configuration.json file can be replaced with a configuration.xml file, if that was the true intent.
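The effect of the expression can be checked in isolation with the standard re module. The sketch assumes that the expression is applied to each truncated name from the start of the string; the exact matching call used inside the Folder class is not stated here.

```python
import re

# Truncated names, i.e. with the file extension already removed.
names = [
    '2888-43c4-998f-3b5671f69459',
    '4409-4182-a1fc-dde4004ccbe9',
    'configuration',
]

# Exactly 27 characters of any kind, anchored at both ends.
pattern = re.compile('^.{27}$')

selected = [n for n in names if pattern.match(n)]
```

Only the two 27-character names are selected. The 13-character configuration name is ignored, whatever extension the file carries on disk.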

It is also valid to create several Folder objects that refer to the same absolute location but are created with different re expressions. As long as the expressions describe mutually exclusive names the different dynamic collections can exist alongside each other.
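Mutual exclusion between two expressions can be verified before relying on it. The sketch below uses hypothetical names carrying job- and schedule- prefixes and checks that no name is claimed by both expressions.

```python
import re

# Hypothetical truncated names found at a single shared location.
names = ['job-0001', 'job-0002', 'schedule-0001', 'configuration']

# One expression per dynamic collection.
jobs_re = re.compile('^job-.*$')
schedules_re = re.compile('^schedule-.*$')

job_names = [n for n in names if jobs_re.match(n)]
schedule_names = [n for n in names if schedules_re.match(n)]

# No name may belong to both collections.
overlap = set(job_names) & set(schedule_names)
```

An empty overlap means that each Folder object would see only its own collection, and the fixed configuration name is seen by neither.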

Of course, the simplest arrangement is for any dynamic content to be assigned its own dedicated folder. Considering the ease with which folders can be created “on disk” there is less justification for maintaining folders with mixed content.

Working With A Folder Of Files

The each() method is similar to matching() except that it returns a sequence of ready-made File objects. This means that the object inside the file is one method call away:

for f in loki.each():
    j, _ = f.recover()
    # Process the job here.
    f.store(j)

The recover() method, introduced in a previous section, is being used to load the file contents into a ReceivedJob. The caller is free to process the job and perhaps save the results back into the file.

Yet another method exists to further automate the processing of folders. The recover() method of the Folder class goes all the way and yields the ReceivedJob objects directly. More precisely, it yields a 3-tuple for each file: 1) a unique key, 2) the recovered object and 3) the detected version. An extra parameter is required at Folder construction time:

kn = (lambda j: j.unique_id, lambda j: str(j.unique_id))

loki = gods.folder('loki', te=ReceivedJob,
    re='^.{27}$',
    encoding=ar.CodecXml,
    keys_names=kn)

The keys_names parameter delivers a pair of functions to the Folder object. These two functions are used internally during the execution of several Folder methods, to calculate a key value and a filename, respectively.

When the recover() method opens a file and loads the contents, this results in an instance of the te. The method then calls the first function passing the freshly loaded object. The function can make use of any of the values within the object to formulate the key. The constraints are that the result must be acceptable as a unique Python dict key and that the value is “stable”, i.e. the key formulated for an object will be the same each time the object is loaded.

Whatever that function produces becomes the first element of the k, j, _ tuple below:

jobs = {k: j for k, j, _ in loki.recover()}

This gives the application complete control over the key value used by the dict comprehension. Calling the store() method looks like this:

loki.store(jobs)

The method iterates the collection of jobs, writing the latest values from each object into a file. To do this it uses the second keys_names function, passing the current object and getting a filename in return. The function can make use of any of the values within the object to formulate the filename. The constraints are that the result must be acceptable as a filename and that the value is “stable”, i.e. the filename constructed for an object will be the same each time the object is stored.

In advanced use there can also be the need for an additional “tag” that distinguishes one set of Folder-related materials from another. Simply adding the “job-” prefix to the constructed filename is an example of a tag. An additional collection of objects co-habiting the same space might be given the “schedule-” prefix.

The final effect of the second keys_names function is that the application has complete control over where objects are stored, i.e. under what filenames.

There is no requirement relating the keys and the filenames. The set of keys produced for a set of objects in a Folder is independent of the set of filenames produced for those same objects. There can be cases where the same value can be used for both but doing so is a design choice.
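The independence of keys and filenames can be illustrated without touching the filesystem. In this sketch the SketchJob class is a hypothetical stand-in for ReceivedJob, and its unique_id member is assumed to be a UUID; the key is the UUID itself while the filename is a tagged string.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class SketchJob:
    """Hypothetical stand-in for ReceivedJob; only unique_id matters here."""
    title: str
    unique_id: uuid.UUID = field(default_factory=uuid.uuid4)


# The first function formulates the dict key, the second the filename.
# Here the filename carries a "job-" tag that the key does not.
kn = (lambda j: j.unique_id, lambda j: 'job-' + str(j.unique_id))

key_of, name_of = kn
job = SketchJob(title='royal decree')

key = key_of(job)     # hashable and stable, suitable as a dict key
name = name_of(job)   # stable string, suitable as a filename
```

Both functions derive their result from a value that does not change over the life of the object, which is what makes the key and the filename stable. Any re expression on the Folder would need to accept the tagged filenames.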

Note

The store() and recover() methods are not designed to work in the same way. The first is a method that accepts an entire dict whereas the second is a generator method that can be used to construct a dict, by visiting one file at a time. This design difference is because recovery of objects involves version information and the application needs an opportunity to respond to that version, for each individual file. Refer to Versions, Upgrading And Migration for more information.

The individual jobs can be modified:

for k, j in jobs.items():
    if update_job(j):
        loki.update(jobs, j)

Or the entire collection can be processed and then saved back to the folder as a single operation:

for k, j in jobs.items():
    update_job(j)
loki.store(jobs)

There are also methods to support adding new jobs, removing individual jobs and lastly, the removal of an entire collection. This group of methods assumes the dict object to be the canonical reference, modifying the related folder contents as needed.

A Few Details

The three “scanning” methods, matching(), each() and recover(), provide different styles of folder processing. To avoid the dangers associated with modifications to folder contents during scanning, the latter two methods take filename snapshots using matching() and then iterate the snapshots.
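The value of a snapshot can be shown with a short standard-library sketch. The snapshot_names function is hypothetical and is not the ansar implementation; it shows only that iterating a fixed list of names is unaffected by files that appear while the loop is running.

```python
import os


def snapshot_names(folder_path):
    """Yield names from a listing taken before iteration starts."""
    # listdir() returns a complete list, so files added to the folder
    # afterwards are not visited by this loop.
    snapshot = os.listdir(folder_path)
    for name in snapshot:
        yield name
```

A caller can create or rewrite files inside the loop without the loop visiting its own output. A file named in the snapshot may still have been deleted by the time it is reached, so per-file operations should be prepared for that.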

The style based on the matching() method is the most powerful but also requires the most boilerplate code. Using the each() method avoids the responsibility of creating a correct File object and allows for both recover() and store() operations on the individual objects. Lastly, the recover() method requires the least boilerplate but is constrained in one important aspect: there is no File object available. Processing a folder with the recover() method is a “read-only” process, since without a File object there can be no store().

The clear() method uses a snapshot to select files for deletion, rather than a wholesale delete of all folder contents. This preserves the integrity of the folder where it is being shared with fixed files, and other Folder objects defined with different re expressions.

Snapshots are also used to delete any “dangling” files at the end of a call to store(). This ensures that the set of files in the folder is consistent with the contents of the presented dict.
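The removal of dangling files can be sketched as a set difference between the names present before the store and the names just written. The store_all function, its name_of parameter and the use of JSON are assumptions made for this illustration and do not reflect the ansar internals.

```python
import json
import os


def store_all(folder_path, objects, name_of):
    """Write each object to <name>.json, then delete stale .json files."""
    # Snapshot of the files that belong to this collection.
    before = {n for n in os.listdir(folder_path) if n.endswith('.json')}
    written = set()
    for obj in objects.values():
        filename = name_of(obj) + '.json'
        with open(os.path.join(folder_path, filename), 'w') as f:
            json.dump(obj, f)
        written.add(filename)
    # Anything in the snapshot that was not just written is dangling.
    for filename in before - written:
        os.remove(os.path.join(folder_path, filename))
```

Files outside the collection, such as a notes.txt file in the same folder, are never part of the snapshot and are therefore left alone.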