Re: How should email be stored in a NOARK5 database?

Thomas Sødring thomas.sodring at hioa.no
Tue Apr 4 14:48:26 CEST 2017


On 04/04/2017 02:24 PM, Petter Reinholdtsen wrote:
> [Petter Reinholdtsen]
>> Gah, an obvious answer to this question occurred to me.  We can perhaps
>> add some metadata to the PDF to identify the PDF type (i.e. PDF with
>> attached email).  Anyone know if it is possible?
> I have not been able to find a list of possible metadata fields.  Is it
> a fixed list or can we make up our own metadata fields?
I think you can use XMP
(https://www.pdflib.com/knowledge-base/xmp-metadata/xmp-in-pdfa/)
>
> But I did investigate PDF/A and embedded files.  Embedding unspecified
> content in PDF/A is allowed in PDF/A-3 (strangely enough), but this is
> not allowed according to the Noark 5 standard.  Luckily, as it would
> otherwise accept files that could not be interpreted in the future.  So storing
> emails like proposed using PDF/A-1 (currently accepted) or PDF/A-2
> (might become accepted) is not possible.
I think they will open up for PDF/A-2. Currently PDF/A-2 is OK if you
apply for it first.
>
> Btw, why isn't Noark 5 simply accepting emails as emails into the
> archive?  In other words, plain text with header \n\n body.  Perhaps we
> should send them a proposal for this?
Good question.  I do not know the answer as I have never considered it.
I guess they never read IETF RFC 5322.

>   Where are new formats proposed?
Arkivforskriften is out for comments at the moment. This would be a good
time to propose that.

> Noark 5 should simply accept IETF RFC 5322 formatted emails.  Anything
> else is simply going to be a kludge.
>
Given that I haven't read IETF RFC 5322, you will have to forgive me if my
devil's advocate arguments are out of place, but I think it's worth
considering. Let's take an email with 4 attachments. This results in a
journalpost with one "hoveddokument" (the text at the beginning of the
mail) and the 4 attachments as "vedlegg". So they would have to be parsed
out as separate documents. But I guess this is defined in 5322, so it is
possible.
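
A rough sketch of that split, using only Python's standard email package.
Note that the attachment structure itself comes from MIME (RFC 2045 and
onward) layered on top of RFC 5322, not from 5322 alone. The function
names split_for_journalpost and build_sample, the addresses and the file
names are made up for illustration, and mapping the result onto
"hoveddokument" and "vedlegg" is my assumption, not something Noark 5
prescribes.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def split_for_journalpost(raw: bytes):
    """Split one raw message into (main_text, attachments).

    main_text is a candidate "hoveddokument"; each attachment tuple
    (filename, content_type, payload_bytes) is a candidate "vedlegg".
    """
    msg = message_from_bytes(raw, policy=policy.default)
    # Prefer the plain-text body, fall back to HTML if that is all there is.
    body = msg.get_body(preferencelist=("plain", "html"))
    main_text = body.get_content() if body is not None else ""
    attachments = [
        (part.get_filename(), part.get_content_type(),
         part.get_payload(decode=True))
        for part in msg.iter_attachments()
    ]
    return main_text, attachments


def build_sample() -> bytes:
    """Build a hypothetical test message with four attachments."""
    msg = EmailMessage()
    msg["From"] = "sender@example.org"
    msg["To"] = "archive@example.org"
    msg["Subject"] = "Sample with attachments"
    msg.set_content("Main text of the mail.\n")
    for i in range(1, 5):
        msg.add_attachment(
            f"attachment {i}\n".encode(),
            maintype="application",
            subtype="octet-stream",
            filename=f"vedlegg{i}.bin",
        )
    return msg.as_bytes()


if __name__ == "__main__":
    text, files = split_for_journalpost(build_sample())
    print(text.strip())
    for name, ctype, payload in files:
        print(name, ctype, len(payload))
```

Nested messages (forwards sent as message/rfc822 parts) and inline
images are not handled here and would need their own rules.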

I think a contrived challenge here is that the message to be archived
might have, for example, been forwarded twice, with or without proper ID
values in the mail and the case-handlers will complain about how
difficult it is, or part of the mail contains private information and is
not meant to head into the archive. But these situations are not the
majority, and routines should try to pick up the difficult use-cases.
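
One way a routine could pick up some of the difficult cases is to look
at the identification headers before import. The sketch below is only
an illustration: the function name id_headers and the idea of flagging
mail that lacks a Message-ID for manual handling are my assumptions,
not an agreed routine.

```python
from email import message_from_string, policy

# Headers that identify a message and its place in a thread (RFC 5322).
ID_HEADERS = ("Message-ID", "In-Reply-To", "References")


def id_headers(raw: str) -> dict:
    """Return the identification headers of a raw message, None if absent."""
    msg = message_from_string(raw, policy=policy.default)
    return {
        name: (str(msg[name]) if msg[name] is not None else None)
        for name in ID_HEADERS
    }


def needs_manual_review(raw: str) -> bool:
    """Flag messages with no Message-ID (assumed rule, for illustration)."""
    return id_headers(raw)["Message-ID"] is None
```

A message forwarded by hand will often carry a fresh Message-ID and no
In-Reply-To, so a missing thread reference alone says little; the check
only catches mail with no ID at all.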

But I think it's a really good idea. The less heterogeneity, the easier
it is to do preservation. And surely it can't be that difficult to get a
message from Exchange into the proper format.

 - Tom


