Re: How should email be stored in a NOARK5 database?

Thomas Sødring thomas.sodring at hioa.no
Sat Apr 15 08:12:57 CEST 2017


On 04/14/2017 08:13 PM, Petter Reinholdtsen wrote:
> [Thomas Sødring]
>> I think you can use XMP
>> (https://www.pdflib.com/knowledge-base/xmp-metadata/xmp-in-pdfa/)
>  
> Right.  Did not find anything obvious while skimming the document.
I'll gladly admit that I haven't looked at XMP in detail, nor have I
ever heard anyone in the KAI community in Norway talk about it. But my
understanding is that you can extend XMP, as in the following example:

https://www.pdflib.com/fileadmin/pdflib/products/XMP/machine_extension_schema_1.xmp

So you could create an email description extension.
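
As a rough sketch of what such an extension could look like, the Python
below builds an XMP fragment carrying a few email fields. The namespace
URI, the "email" prefix and the property names (messageId, from, subject)
are all made up for illustration; none of them come from a published
schema. As I understand it, PDF/A would also require a matching extension
schema description in the packet, which this sketch leaves out.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
X_NS = "adobe:ns:meta/"
# Hypothetical namespace for email metadata; not a registered schema.
EMAIL_NS = "http://example.org/ns/email/1.0/"


def build_email_xmp(message_id, sender, subject):
    """Return an XMP fragment (as a string) describing one email."""
    ET.register_namespace("x", X_NS)
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("email", EMAIL_NS)

    xmpmeta = ET.Element(f"{{{X_NS}}}xmpmeta")
    rdf = ET.SubElement(xmpmeta, f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        rdf, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": ""}
    )
    # Property names below are assumptions made for this sketch.
    for name, value in (
        ("messageId", message_id),
        ("from", sender),
        ("subject", subject),
    ):
        ET.SubElement(desc, f"{{{EMAIL_NS}}}{name}").text = value
    return ET.tostring(xmpmeta, encoding="unicode")
```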

>
>> Arkivforskriften is out for comments at the moment. This would be a good
>> time to propose that.
> I guess we should do this.  I've started on a comment to send in on
> <URL: https://titanpad.com/noark5-forskrift>.  Please have a look and
> improve it. :)
I will take a look at it. Norsk Arkivråd are looking for input. Would
you be interested in also sending it to Norsk Arkivråd?

>
>> Given that I haven't read IETF RFC 5322, you have to forgive me if my
>> devil's advocate arguments are out of place, but I think it's worth
>> considering. Let's take an email with 4 attachments. This results in a
>> journalpost with one "hoveddokument" (the text at the beginning of the
>> mail), and the 4 attachments are "vedlegg". So they would have to be
>> parsed out as documents. But I guess this is defined in 5322, so it is
>> possible.
> The MIME part specify how to extract attachments, sure.  I am not quite
> sure how to best handle this in Noark 5, as I assume the original email
> should be connected somehow to the separate parts if they are extracted
> and stored separately.
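A minimal sketch of that split, using only Python's standard email
package. Treating the text body as the "hoveddokument" and every other
part as a "vedlegg" is my assumption about the mapping, not something
Noark 5 defines, and the function name split_email is made up here.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def split_email(raw: bytes):
    """Return (main_text, attachments) for one raw email message."""
    msg = message_from_bytes(raw, policy=policy.default)
    # Prefer a text/plain body, falling back to text/html; this is the
    # candidate "hoveddokument".
    body = msg.get_body(preferencelist=("plain", "html"))
    main_text = body.get_content() if body is not None else ""
    # Each remaining part is a candidate "vedlegg":
    # (filename, content type, decoded bytes).
    attachments = [
        (part.get_filename(), part.get_content_type(),
         part.get_payload(decode=True))
        for part in msg.iter_attachments()
    ]
    return main_text, attachments


# Build a small sample message to exercise the function.
sample = EmailMessage()
sample["Subject"] = "Test"
sample.set_content("Hello")
sample.add_attachment(b"data", maintype="application",
                      subtype="octet-stream", filename="a.bin")
main_text, attachments = split_email(sample.as_bytes())
```

The original raw bytes would still need to be stored alongside the
extracted parts if the connection to the original email is to be kept.
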
>
>> I think a contrived challenge here is that the message to be archived
>> might have, for example, been forwarded twice, with or without proper
>> ID values in the mail and the case-handlers will complain about how
>> difficult it is, or part of the mail contains private information and
>> is not meant to head into the archive. But these situations are not
>> the majority, and routines should try to pick up the difficult
>> use-cases.
> Yes, there will be edge cases that need more thought.  For example,
> depending on the 'forwarding' mechanism used, storing the original email
> might be easy or impossible.
>
>> But I think it's a really good idea. The less heterogeneity, the easier
>> it is to do preservation. And surely it can't be that difficult to get a
>> message from exchange to the proper format.
> One challenge to solve is how/where to store the message "ID", as it
> will need to be easily searchable if you want to group all emails in an
> email thread into the same file.  Is there a good place to store such
> value in noark 5?

No, I don't think so ... If you wanted to be very "microsofty" you could
hack the understanding of M007 dokumentnummer. That field is meant to
count 1, 2, 3, ..., but in the XSD it is simply defined as an integer. I really
think it would be better if we identify the need for such a field to be
included in a revision of the tjenestegrensesnitt / next version of
Noark. I think you make a very good case for it. My gut feeling is that
dokumentObjekt should be extendible to specific types of documents,
emailDocument, SMS, MMS, that have their own additional metadata
requirements.
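
To make the need concrete, here is a rough Python sketch of the metadata
such an email-specific extension would have to carry in order to group
messages by thread. It reads the Message-ID, In-Reply-To and References
headers and assumes that the first References entry identifies the thread
root, which not every mail client guarantees. The function names are made
up for this example.

```python
from collections import defaultdict
from email import message_from_string, policy


def thread_key(msg):
    """Pick an identifier for the thread a message belongs to."""
    refs = str(msg.get("References", "")).split()
    if refs:
        return refs[0]  # assumed to be the thread root
    parent = str(msg.get("In-Reply-To", "")).split()
    if parent:
        return parent[0]
    # No thread headers: the message starts its own thread.
    return str(msg.get("Message-ID", "")).strip()


def group_by_thread(raw_messages):
    """Map each thread key to the Message-IDs filed under it."""
    threads = defaultdict(list)
    for raw in raw_messages:
        msg = message_from_string(raw, policy=policy.default)
        message_id = str(msg.get("Message-ID", "")).strip()
        threads[thread_key(msg)].append(message_id)
    return dict(threads)
```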

Should we send in a mangelmelding asking for this? Or we could ask for
this to be included in Noark 6. I think there may be a Noark 5v4.1 but
am hearing rumours that the tjenestegrensesnitt will be finalised in Noark
6. But that does not really make sense. They have to finalise the
interface in Noark 5v4.1 and then can move forward with Noark 6. I think
a mangelmelding identifying the need for this will get them thinking and
is perhaps more important from the perspective of a standalone Noark 5
core for fagsystem integration than for Noark 5 komplett. I guess the
vendors hack these kinds of requirements on top of their Noark systems.

>
> I've extended archive-file in the noark5-tester git repo to be able to
> store emails.  But it isn't yet doing a good job sorting them into
> files.
>

That's really good. I think I've been so stuck in the Noark structure
that I don't really look at the possibilities! I mostly see the
limitations. The word "tvangstrøye" (straitjacket) was often used to
describe Noark 4, and I think in some ways it still holds true. Your
use-cases are showing how Noark lacks flexibility.

 - Tom

