How should email be stored in a NOARK5 database?

Petter Reinholdtsen pere at hungry.com
Sat Apr 15 09:58:11 CEST 2017


[Thomas Sødring]
> I'll gladly admit that I haven't looked at XMP in detail, nor have I
> ever heard anyone in the KAI community (KAI-miljø) in Norway talk about
> it. But my understanding is that you can extend XMP as follows:
>
> https://www.pdflib.com/fileadmin/pdflib/products/XMP/machine_extension_schema_1.xmp
>
> So you could create an email description extension.

Hm, interesting.  Will try to keep it in mind if the email in PDF
approach is explored further.

>> I guess we should do this.  I've started on a comment to send in on
>> <URL: https://titanpad.com/noark5-forskrift>.  Please have a look and
>> improve it. :)
> I will take a look at it. Norsk Arkivråd are looking for input. Would
> you be interested in also sending it to Norsk Arkivråd?

Feel free to send it to whoever might be interested.  My comments are
public for anyone to have a look.  But I'm not quite sure what kind of
coordination 'sending it to Norsk Arkivråd' would require, and given
that there are just 15 days left until the deadline and that I have very
limited time to work on this, I doubt I will have time to coordinate
with anyone expecting to follow some democratic workflow before sending
in my comments.

> No, I don't think so ... If you wanted to be very "microsofty" you could
> hack the meaning of M007 dokumentnummer. But that is meant to be a
> sequence 1, 2, 3, 4, ..., and it is defined as an integer in the XSD. I
> really think it would be better if we identify the need for such a field
> to be included in a revision of the tjenestegrensesnitt (service
> interface) / next version of Noark. I think you make a very good case for
> it. My gut feeling is that dokumentObjekt should be extensible to specific
> types of documents, such as emailDocument, SMS and MMS, that have their
> own additional metadata requirements.

I find M711 virksomhetsspesifikkeMetadata mentioned several times in the
spec.  Any idea what it is and how to use it?  It is part of
dokumentbeskrivelse, among other things.

Another alternative might be to use filnavn in dokumentobjekt.  Emails
do not really have file names, and the Message-ID would fit in there
just fine.  Is it a problem for the filnavn value that Message-ID is not
always unique?  Spam tends to reuse a Message-ID or leave it empty.
Proper email clients and servers should always strive to generate unique
values.
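A minimal sketch of pulling out the Message-ID to use as filnavn, using only Python's standard `email` package.  The helper name `message_id_as_filnavn` is hypothetical, and treating a missing or empty Message-ID as "no filnavn" is my assumption, not something Noark 5 specifies.

```python
from __future__ import annotations

from email import message_from_bytes


def message_id_as_filnavn(raw: bytes) -> str | None:
    """Return the Message-ID of a raw RFC 822 message, or None if missing/empty.

    Hypothetical helper: using Message-ID as the dokumentobjekt filnavn is only
    the idea discussed in this thread, not something Noark 5 defines.
    """
    msg = message_from_bytes(raw)  # default compat32 policy: headers as str
    value = msg.get("Message-ID")
    if value is None:
        return None
    value = str(value).strip()
    # Spam sometimes sends an empty Message-ID; treat that as "no filnavn".
    return value or None
```

Returning None rather than an empty string makes it explicit that such emails need some other way to be identified.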

A key requirement for this to work is being able to quickly search for
all dokumentobjekt entries with a given filnavn value.  That way the
email injector could loop over all the Message-IDs in In-Reply-To and
References, check whether any of them are already stored in the archive,
and propose storing the email in the same file as those existing archive
objects.
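The lookup loop could be sketched roughly like this.  The in-memory `filnavn_index` dictionary is a hypothetical stand-in for whatever index the archive database would keep on filnavn, and the function names and the loose Message-ID regex are assumptions for illustration only.

```python
from __future__ import annotations

import re
from email import message_from_bytes

# Loose msg-id matcher: anything between angle brackets without whitespace.
MSG_ID_RE = re.compile(r"<[^<>\s]+>")


def referenced_ids(raw: bytes) -> list[str]:
    """Collect the Message-IDs listed in In-Reply-To and References."""
    msg = message_from_bytes(raw)
    ids: list[str] = []
    for header in ("In-Reply-To", "References"):
        for value in msg.get_all(header, []):
            ids.extend(MSG_ID_RE.findall(str(value)))
    return ids


def propose_files(raw: bytes, filnavn_index: dict[str, set[str]]) -> list[str]:
    """Propose existing files (mapper) for a new email.

    filnavn_index is a hypothetical stand-in for a database index mapping a
    dokumentobjekt filnavn (here: a Message-ID) to the files holding it.
    """
    proposals: list[str] = []
    for msg_id in referenced_ids(raw):
        for mappe in sorted(filnavn_index.get(msg_id, set())):
            if mappe not in proposals:  # keep first-seen order, no duplicates
                proposals.append(mappe)
    return proposals
```

Checking In-Reply-To before References means the file holding the direct parent is proposed first.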

With my previously proposed workflow (store everything in a temporary
file and move individual documents to their proper file afterwards),
there should be an automatic task moving emails into files and creating
new files for new email threads.  It would probably also need to merge
files, as some email clients break email threads.  It would only work
well for well-behaved email clients.
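One way to sketch that automatic task, including the merging, is a union-find over Message-IDs.  This is my own assumption of how it could be done, not anything from Noark; the class name `ThreadGrouper` is hypothetical, and each resulting group would become one file.  Reused or empty Message-IDs would wrongly glue unrelated emails together, which is why it only works for well-behaved clients.

```python
from __future__ import annotations


class ThreadGrouper:
    """Group emails into threads (candidate files) with union-find.

    Hypothetical sketch: every email is linked to each Message-ID it lists in
    In-Reply-To/References. When one email links two previously separate
    groups, they merge, which covers clients that break threads.
    """

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}
        self.messages: list[str] = []  # IDs of emails actually stored

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point each node on the path straight at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def add(self, message_id: str, referenced: list[str]) -> None:
        self.messages.append(message_id)
        self.find(message_id)  # register even if it references nothing
        for ref in referenced:
            self.union(message_id, ref)

    def threads(self) -> list[list[str]]:
        """Return stored emails grouped by thread, sorted for stable output."""
        groups: dict[str, list[str]] = {}
        for mid in self.messages:
            groups.setdefault(self.find(mid), []).append(mid)
        return sorted(sorted(g) for g in groups.values())
```

Because referenced IDs become nodes even before the email itself arrives, a reply stored before its parent still ends up in the same group.  A group that grows to cover two existing files is the case where the task would have to merge them.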

> Should we send in a mangelmelding (defect report) asking for this? Or
> we could ask for this to be included in Noark 6. I think there may be a
> Noark 5v4.1, but I am hearing rumours that the tjenestegrensesnitt will
> be finalised in Noark 6. But that does not really make sense. They have
> to finalise the interface in Noark 5v4.1 and then move forward with
> Noark 6. I think a mangelmelding identifying the need for this will get
> them thinking, and it is perhaps more important from the perspective of a
> standalone Noark 5 core for fagsystem integration than for Noark 5
> komplett. I guess the vendors hack these kinds of requirements on top of
> their Noark systems.

I suspect a defect report with a proposal for storing emails would be
good.  The key defect is that RFC 822 is missing as an allowed storage
format.  I am not sure the lack of a Message-ID field is a defect yet,
given that I do not understand virksomhetsspesifikkeMetadata, and
filnavn might be an OK fit.

> That's really good. I think I've been so stuck in the Noark structure
> that I don't really look at the possibilities!  I see more the
> limitations. The word "tvangstrøye" (straitjacket) was often used to
> describe Noark 4, and I think in some ways it still holds true; your
> use-cases are showing how Noark lacks flexibility.

Well, I am not quite there yet, but I worry a bit that I might be
proposing solutions that break expectations so much that no other system
will be able to use the information we store in the archive. :)

-- 
Happy hacking
Petter Reinholdtsen
