Idea for work flow to handling importing of documents in bulk

Thomas Sødring thomas.sodring at hioa.no
Tue Mar 21 18:19:33 CET 2017


On 03/20/2017 10:26 PM, Petter Reinholdtsen wrote:
> I got this idea for a workflow for uploading documents in bulk, but am
> unsure it will work with Noark 5, and hope you can comment on it.
>
> I would like to have a way to upload documents / emails to the archive
> in bulk / non-interactively, and update the metadata later.  My idea is
> to have a way to automatically upload documents (say allow a mailing
> list to store a copy of posted messages in the archive), import lots of
> PDFs from a directory or set up a "print queue" that feeds the
> archive. The required metadata (title, to, from, file, etc) would be
> automatically or manually added after the document is stored in the
> archive.  This would allow the archive system to guess as much metadata
> as possible from the document itself.  As far as I can tell, the Noark 5
> web API is not made to allow this, and intuitive use of the API will
> require metadata to be inserted before the document is uploaded.
Let's assume your structure is the following:

arkiv->arkivdel->mappe->registrering->dokumentobjekt

You don't need dokumentbeskrivelse, I believe.

From what I can tell, the only fields that dokumentobjekt requires at
creation are:

 * versjonsnummer
 * variantformat

Unless you are storing multiple versions, versjonsnummer = 1, and
variantformat is either "Arkivformat" or "Produksjonsformat".

registrering has no required fields, so a systemId is assigned and it is
associated with the mappe.

mappe has mappeID and tittel as required fields.

So I think you can import a lot into the core given this structure.
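As a rough sketch of how little metadata such an import needs, the
required fields listed above can be written down as plain Python
dictionaries. This only builds the payloads; it does not talk to any
API. The helper names, the mappeID value and the choice of
"Produksjonsformat" are my own assumptions for illustration, and the
list of required fields is taken from this mail, not checked against
the standard.

```python
# Required fields per object type, as listed in this mail
# (an assumption, not verified against the Noark 5 standard).
REQUIRED_FIELDS = {
    "mappe": ("mappeID", "tittel"),
    "registrering": (),
    "dokumentobjekt": ("versjonsnummer", "variantformat"),
}

VARIANTFORMAT_VALUES = ("Arkivformat", "Produksjonsformat")


def missing_fields(object_type, payload):
    """Return the required fields that are absent from payload."""
    return [f for f in REQUIRED_FIELDS[object_type] if f not in payload]


def minimal_bulk_import(mappe_id, tittel, document_count):
    """Build the smallest payloads for one mappe holding document_count documents."""
    mappe = {"mappeID": mappe_id, "tittel": tittel}
    entries = []
    for _ in range(document_count):
        registrering = {}  # no required fields; the core assigns systemId
        dokumentobjekt = {
            "versjonsnummer": 1,  # single version only
            "variantformat": "Produksjonsformat",  # hypothetical choice
        }
        entries.append((registrering, dokumentobjekt))
    return mappe, entries


# Hypothetical example: one temporary mappe with three uploaded documents.
mappe, entries = minimal_bulk_import("2017/1", "Temporary import", 3)
```

Everything else (title, sender, recipient and so on) would be filled
in afterwards, once updating existing objects is supported.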

If you later want to add case-handling information, you can do a
utvid-til-saksmappe and insert the missing data.

"utvid-til-saksmappe" is part of a mangelmelding at the moment, as there
is no "utvid-til-journalpost" defined for registrering.

After that, fields can be updated via PATCH requests. This is not
supported at the moment, though, as it is also subject to a mangelmelding.

> But if I understand correctly, it is possible to have a noark 5 file
> (mappe) in the archive which is not listed in the public journal.  Is
> this correct?  Also, if I understand correctly, it is possible to move a
> document from one file to another (I assume it would involve moving the
> mappe -> dokumentbeskrivelse connection).  Is this correct?
Yes, but it is the journalpost that exposes this. The only field I can
find related to this is M501 skjermingshjemmel, so this is something that
needs special attention. OEP produced a document, "For
innholdsleverandørene til OEP", but I don't think that really gives us
the answer either.

https://www.regjeringen.no/globalassets/upload/fad/vedlegg/informasjonspolitikk/oep_veiledning_2.pdf

Even a description of which metadata to protect is missing from the
interface, from what I can tell, and certainly things like the title
should be protected.

You are correct. Noark allows moving documents that are in the wrong
mappe. In this case the registrering will probably be moved, unless you
are trying to create a copy in the other mappe. An interesting case:
what if you just want to move a vedlegg to another mappe? I think we
need a proper description of this, as in some cases you won't move the
document, you'll reregister it in the other mappe.

>
> If so, each 'user' of the archiving system can have a 'temporary' file,
> in which every automatically uploaded document is attached.  The user
> can then go through every document in her 'temporary' file, update the
> document metadata and assign it to the appropriate file.  This will give
> the document the correct case ID and sequence number, and make it show
> up in the public mail journal.  It would be best if most or all the
> metadata could be modified until the document is moved into a
> 'permanent' file, in case incorrect information is guessed or extracted
> automatically.
>
> Would such a workflow work with Noark 5?
I do believe it will work, but it may not give the desired results in
the core on a daily basis. Fields like createdDate will be a
challenge here, as they will reflect the move to the new mappe /
registrering / dokumentbeskrivelse / dokumentobjekt. E.g. take a case
file that has a due date 3 weeks after it is created. If the documents
sit in a temporary mappe for two weeks and are then moved to a new
mappe, they suddenly get 2 extra weeks before the case is due,
according to the system.
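To make the arithmetic concrete, here is a small sketch of the
due-date drift. The dates are made up, and the assumption that the
due date is computed from the creation date of the new mappe is mine;
the actual behaviour depends on the implementation.

```python
from datetime import date, timedelta

DUE_AFTER = timedelta(weeks=3)  # due-date rule from the example above


def due_date(created):
    """Due date as the system would compute it from a creation date."""
    return created + DUE_AFTER


received = date(2017, 3, 1)              # hypothetical: document arrives
moved = received + timedelta(weeks=2)    # sits in the temporary mappe for two weeks

intended_due = due_date(received)  # what the case handler expects
system_due = due_date(moved)       # what the system reports after the move
extra = system_due - intended_due

print(intended_due, system_due, extra.days)
```

The difference between the two due dates is exactly the time the
document spent in the temporary mappe.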

The metadata you don't control is opprettetDato, opprettetAv,
avsluttetDato, avsluttetAv and systemId. Some other fields are set
automatically (arkivstatus).
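
For a client that guesses metadata automatically, one way to respect
this is to drop the system-controlled fields before sending an update.
This is only a sketch: the function name is hypothetical, the field
avsluttetAv is spelled as in Noark 5, and arkivstatus is left out
because it is only mentioned as an example of an automatically set
field.

```python
# Fields the core sets itself, as listed above.
SYSTEM_CONTROLLED = frozenset(
    {"opprettetDato", "opprettetAv", "avsluttetDato", "avsluttetAv", "systemId"}
)


def client_editable(metadata):
    """Return a copy of metadata without the system-controlled fields."""
    return {k: v for k, v in metadata.items() if k not in SYSTEM_CONTROLLED}


# Hypothetical guessed metadata for an uploaded document.
guessed = {
    "tittel": "Invoice March 2017",
    "opprettetDato": "2017-03-01",
    "systemId": "made-up-id",
}
update = client_editable(guessed)
```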

> Which operations are needed to make a file that will not show up in the
> public mail journal?  What about moving a document from one file to
> another?
>
The actual mechanism is a little unclear, but if there is no
skjermingshjemmel, then it can show up. Now you could use a temporary
"not ready" hjemmel to hold it back, but that approach in production
would draw a lot of criticism. I think you have seen how government can
be very late in publishing documents. So we would not want to give them
more tools ... I do think there is a new mechanism coming, where
registrering and basisregistrering are going to be merged and we will
get a new field, "offentligjournal". That makes sense as there is little
to help understand this.

I think moving should result in an entry in hendelseslogg. Not sure if
that is specified. But from the database perspective, there is a
changelog so we should try and make sure a move is visible via the
changelog.

We discussed RFC 6902 (JSON Patch) today, and I guess that is probably
the approach we should use to solve this issue of moving. Moving is not
specified properly in the standard, so perhaps they might adopt this
approach.

 - Tom

