Re: How should email be stored in a NOARK5 database?

Thomas Sødring thomas.sodring at hioa.no
Thu Dec 22 13:58:33 CET 2016


On 12/22/2016 01:02 PM, Petter Reinholdtsen wrote:
> [Thomas Sødring]
>> as you see in the response you got from the national archive, they do
>> not want to think about it. They avoid getting involved in the details
>> and rely on the vendors to sort it out. The "arkivskaper" (records
>> creator) is incapable of solving this. The vendors don't really care
>> about long-term preservation, and will probably be happy to simply
>> convert it to a long-term preservation format (PDF/A). In fact, when I
>> teach Noark to students I still describe emails as an "utradisjonell
>> kilde" (non-traditional source) when it comes to documents.
> So perhaps we should come up with a good way to handle them, and
> implement it in Nikita?  My idea would be to store emails in some XML
> form inside the database, and have a way to render the XML, perhaps as
> PDF.  For HTML emails the conversion should (optionally) download
> external documents referenced in the HTML (like images and iframes), to
> make sure the email can be reproduced as it looked when it was archived,
> even if the referred documents have disappeared in the meantime.
I really like this idea! It fits perfectly within the OAIS package
model, and there should not be a problem getting acceptance for it. But
as you say later, we need to build on existing standards and not
reinvent the wheel. I will look into this in the new year unless
somebody else does so first.
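
Something like this minimal sketch (Python) is what I imagine; the XML
element names are just placeholders I made up, not an agreed format:

import email
import email.policy
from xml.etree import ElementTree as ET

def email_to_xml(raw_bytes):
    """Convert a raw RFC 5322 message into a simple XML envelope.

    The element names here are placeholders, not an agreed standard.
    """
    msg = email.message_from_bytes(raw_bytes, policy=email.policy.default)
    root = ET.Element("archivedEmail")
    headers = ET.SubElement(root, "headers")
    for name in ("From", "To", "Date", "Subject", "Message-ID", "In-Reply-To"):
        if msg[name] is not None:
            ET.SubElement(headers, "header", name=name).text = str(msg[name])
    body_part = msg.get_body(preferencelist=("html", "plain"))
    if body_part is not None:
        ET.SubElement(root, "body").text = body_part.get_content()
    # For HTML bodies one would additionally walk the <img src=...> and
    # <iframe src=...> references, download them, and embed the content
    # (e.g. base64-encoded) so the message can still be reproduced if
    # the originals disappear.
    return root

Rendering to PDF could then be done from this XML at access time rather
than at archiving time.
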
>
> Emails with attachments could be handled like that too, but as you
> mention, the attachments might be documents belonging to different
> cases.  I suspect the proper thing to do is to file the email with both
> cases, to document how the document was transferred.
>
>> There is not really any room to handle the In-Reply-To field in Noark.
> That is OK, as long as it can be stored in a reproducible manner,
> allowing the email to be extracted and reproduced the way it was when it
> was received.
I think this information is missing in a lot of solutions today. I
suspect emails are often processed along the shortest route possible,
and printing to PDF via a print driver will see a lot of metadata go
missing.
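
As a concrete example, the threading headers are trivially available at
capture time but are gone after a print-to-PDF step. A small
illustration in Python:

import email
import email.policy

def threading_metadata(raw_bytes):
    """Extract the headers that tie a message into a conversation.

    These fields exist in the raw message, but are typically lost when
    the email is "archived" by printing it to PDF.
    """
    msg = email.message_from_bytes(raw_bytes, policy=email.policy.default)
    return {
        "Message-ID": msg["Message-ID"],
        "In-Reply-To": msg["In-Reply-To"],
        "References": msg["References"],
    }
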
>
>> You might be interested to read about the Capstone project from the US
>> [1]. Datatilsynet might not allow a similar project in Norway, but I
>> really like the idea of capturing all emails and archiving them.
> Here at USIT we have stored all emails to role addresses (i.e. not
> personal emails, but sent to USIT in a semi-official manner) in email
> archives for at least 20 years.  It surprises me that this is not done
> all over the public sector in Norway.
It's the public sector :)
>
>> Signatures are something that the archive wants to keep away from. The
>> reason is that there is no reliable way of dealing with (PKI) key
>> management in a long-term preservation perspective. In 100 years, we
>> might have moved away from PKI to something else that we are unable to
>> see at the moment. The public keys simply are not there or may have
>> been revoked. So Noark has fields that say the document was signed and
>> the signature was verified on a particular date as correct or
>> incorrect by some system.
> I do not believe you are right here.  I believe trusted timestamping
> has a key role in archiving, to be able to ensure archives are
> unmodified.
How do we verify the timestamps / signatures in 100 years when the TTP
is no longer around? I agree with the need for trusted timestamping, and
I have heard DIFI was meant to be making such a service, where you
register the UUID (or similar) of an object along with a hash value.
DIFI as a TTP would never need to know the contents, just provide a
service where you give a UUID and a hash value/timestamp and get a
yes/no answer back.
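
In other words, the client side could be as simple as the sketch below.
The endpoint URL and the JSON payload format are pure invention on my
part, since no such service exists yet:

import hashlib
import json
import urllib.request

def verify_with_ttp(object_uuid, document_bytes):
    """Ask a (hypothetical) TTP whether the hash it has registered
    for this object matches the document we hold.

    The endpoint and payload shape are invented for illustration.
    """
    digest = hashlib.sha256(document_bytes).hexdigest()
    payload = json.dumps({"uuid": object_uuid, "sha256": digest}).encode()
    request = urllib.request.Request(
        "https://ttp.example.no/verify",  # hypothetical endpoint
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["match"]  # True/False from the TTP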

In many ways the IKA Kongsberg project on a distributed Noark 5 core is
partly about the archive becoming a TTP for the municipalities. I
certainly would love to see some experimental work with blockchain
signing of documents in such a configuration. I'm definitely not
conservative here :)

But until we have a proper TTP mechanism that we know can be used in a
100-year perspective, we don't really have any other way to trust the
signatures. Additional functionality to verify metadata should be
pursued, but we need to make sure that we at least have stored the
information that the signature was valid or not at a given point in time.
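
At a minimum, a record along these lines needs to travel with the
document. The field names below are my own; Noark 5 defines its own
metadata elements for electronic signatures covering roughly this
information:

from dataclasses import dataclass
from datetime import date

@dataclass
class SignatureVerification:
    """Record that a signature was checked at archiving time.

    Illustrative field names only, not the actual Noark 5 metadata
    element names.
    """
    signed: bool             # the document carried a signature
    verified: bool           # whether the signature checked out
    verification_date: date  # when the verification was performed
    verified_by: str         # the system that performed the check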

But yes, we absolutely should pursue TTP / timestamping!

>
>> So this is something we definitely should look at implementing, once
>> the core becomes useful enough that you can play around with it.
>>
>> I think another standard that might be of interest here is METS [2].
>> It's overly complicated for this, but might be something to think about.
>> They have a part that allows you to link things together. But we really
>> need to avoid https://xkcd.com/927/
> Absolutely.  I do not want to create our own representation, just find a
> good existing representation to store in the archive. :)
>


