PATCH with minimal data or PUT with a lot of data for updates

Thomas Sødring thomas.sodring at hioa.no
Sat Feb 11 21:05:13 CET 2017


Hi,

Following up on the stalled work on the Noark 5v4 interface, I'd like to raise an issue that we might be able to influence. First, a few points on CRUD operations in the core.

I think the CREATE and READ operations in the interface are
straightforward. CREATE is OK, while READ is a little challenging
in that we need to support OData. Currently we have a find and a findAll.
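
For illustration, an OData-style READ might look something like this (a sketch only; the URL and field name are illustrative, not taken from the specification):

GET http://localhost:8092/noark5v4/hateoas-api/arkivstruktur/mappe?$filter=contains(tittel, 'budsjett')&$top=10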

A quick side note on delete. The core has implemented soft delete (data is marked as deleted, but still exists), but delete is not really something
that happens in a Noark core. Disposal, a type of delete, can happen.
But that's for another day ...

Update worries me. On page 19 they show how an update should be issued:
a PUT request with the contents of the object to update.

Take the example on page 19, and say I just want to update the
title. The example shows that I should be in possession of the primary
entity, mappe, as well as secondary entities, in this case
documentMedium and classified (no: gradering).

"Alle egenskaper må være med, med unntak av underobjekter som har en
mange relasjon"
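
To make the burden concrete, here is a sketch of the kind of payload the client would have to assemble just to change the title (field names and values are illustrative, not copied from page 19):

{
"tittel": "The new title of file",
"beskrivelse": "The unchanged description of file",
"documentMedium": "Elektronisk arkiv",
"gradering": "Strengt hemmelig",
"createdDate": "2016-05-20T09:30:00"
}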

The client should have been in possession of all secondary entities from
the original GET, but if the client does not use the same
representation, for example, it chooses to ignore documentMedium on the
current front-end page, then the new payload might not be complete.

I do not see the need for the presence of secondary entities during an
update anyway. Secondary entities are updated in their own way, so why
do we need to pass secondary entities back from the client to the core?
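
If the client really wants to change, say, the classification, I would expect it to address the secondary entity directly at its own resource, something like this (a sketch only; the URL and field name are my assumption, not from the specification):

PUT http://localhost:8092/noark5v4/hateoas-api/arkivstruktur/mappe/b645fd1a-8cf8-4e7b-b583-fe824bb57e08/gradering/

{
"gradering": "Strengt hemmelig"
}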

I think this requirement to send everything makes the business logic
unnecessarily messy in both the client and the core! We are requiring
that the client knows what it is doing when assembling the updated
object, when all it wants to do is change the title. In the core, we
then have to compare all fields in the primary entity and figure out
which ones changed, or simply update the entire row. I guess the latter
is the approach a lazy programmer might take.

The problem with updating the entire row is that some fields cannot be
changed once set; createdDate is one of them. As it stands, the
HATEOAS description gives the user the impression that this field can
be changed, by allowing createdDate to be present in the payload.

Another strange thing here: what happens if the client has also
updated a secondary entity in the PUT request? Unless the business logic
is sound, the core will return a 200 OK on such a request, having
updated createdDate and classified (gradering) along the way, when
clearly that should be an illegal request!
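
For example, a core with weak business logic might happily accept something like this (a sketch; the payload is illustrative, not from the specification):

PUT http://localhost:8092/noark5v4/hateoas-api/arkivstruktur/mappe/b645fd1a-8cf8-4e7b-b583-fe824bb57e08/

{
"tittel": "The new title of file",
"createdDate": "2017-02-02T12:00:00",
"gradering": "Strengt hemmelig"
}

HTTP/1.1 200 OK

and silently persist the new createdDate and gradering along with the title.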

Now, if we used a PATCH request with a simple payload

{
"tittel": "The new title of file"
}

sent as:

PATCH http://localhost:8092/noark5v4/hateoas-api/arkivstruktur/mappe/b645fd1a-8cf8-4e7b-b583-fe824bb57e08/

we would have a simple, elegant way of handling updates. The client just
needs the primary identifier and a payload with the fields to change.
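
On success, the core could answer with the updated object, something like this (a sketch; a real response would presumably also carry the HATEOAS links):

HTTP/1.1 200 OK

{
"tittel": "The new title of file",
... remaining fields unchanged ...
}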

If multiple fields are to be updated:

{
"tittel": "The new title of file",
"beskrivelse": "The new description of file"
}


If a payload like the following arrives:

{
"beskrivelse": "The new description of file",
"createdDate": "2017-02-02T12:00:00"
}

then the core should return a 400 (Bad Request) with a description of
the illegal payload. Petter pointed to some XSD descriptions of
payloads; perhaps all that is needed is a subset of these.
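
A rejection could then look something like this (the status line is standard HTTP; the body wording is illustrative):

HTTP/1.1 400 Bad Request

{
"error": "createdDate cannot be changed after creation"
}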

Granted, this means that the client has to keep track of what was
changed at the front end, but surely the client can do some work as well!

I think PATCH is the simpler approach. It removes the server-side
complexity of having to figure out what has changed: the core just
changes what the payload names. My understanding is that PATCH is better
suited to this, while PUT is better suited to replacing an entire object.

To put this another way: I think CREATE (POST) and READ (GET) should have entity payloads, while UPDATE (PATCH) should have field payloads, unless there really is a requirement to update entire entities in one go, which I don't think there is.

An argument against my reasoning is that every Noark core should
implement the business logic correctly. Fair enough!

But history shows us that some Noark 4 systems have not remained
compliant with the standard, resulting in inconsistent data in the
database that makes it difficult to produce an extraction. To be fair to
the vendors, when the standard itself said "vendors are allowed to
deviate from this model if necessary", then these things will happen.

Simply saying "It's in the standard" does not work. The National Archive
does not enforce the Noark standard. Approval is done on a
self-assessment basis, where vendors promise that their system is
compliant. We now have about seven versions of Noark 5 with slight
changes between them, so I really wonder what the various municipalities
are actually running!

For this to be robust, the National Archive has to produce a proper
description without ambiguity and enforce compliance! Further, we need a
rigid and robust testing mechanism, something like what Petter is
producing, that can capture all the strange combinations of things that
can occur. If the National Archive is unable to answer simple questions
about the interface, then I doubt they will be in a position to develop
a robust description.

So I guess this post is about three things.

1. Is PATCH on fields a better approach than PUT on entities?
2. Should we push this as something we want the National Archive to
consider?
3. Any other thoughts / comments?

I raised this issue with them last year but didn't get a reply. When the work starts up again, maybe we should push this a bit more, but I'd like feedback on my thinking first.

 - Tom


