Noark and blockchain ... a bit of info ...

Thomas Sødring thomas.sodring at hioa.no
Tue Apr 4 18:28:15 CEST 2017


On 04/04/2017 03:04 PM, Petter Reinholdtsen wrote:
> [Thomas Sødring]
>> Just to follow up on a discussion, here is some information about the
>> work I'm trying to do to integrate Noark 5 with blockchain.
> Thank you.
>
>> So to deliver on the "bestilling" (commission) I need to find a
>> distributed blockchain implementation that will allow me to write, at a
>> minimum, systemId and a timestamp for CUD (create, update, delete)
>> events.
> I am quite sure bitcoin allows that.
>
> But instead of trying to track down a public ledger you can write to,
> what about simply simulating it using trusted timestamps and an
> internal changelog?
>
> Make a database table consisting of a sequence counter
> (timestamp/counter/whatever) and the fields you want to track.  In
> addition you need a checksum and a way to store a trusted timestamp
> value:
>
>  * timestamp
>  * systemid
>  * checksum
>  * tts-value
>
> Every time something changes, you create the checksum of the timestamp,
> the systemid and the previous checksum, and ask for a trusted timestamp
> value for this checksum.
>
> You can read more about how to get a trusted timestamp in
> <URL: http://people.skolelinux.org/pere/blog/syslog_trusted_timestamp___chain_of_trusted_timestamps_for_your_syslog.html >.
>
> In short, it is simply to do something like this:
>
>   echo "$timestamp $systemid $oldchecksum" > /tmp/file
>   openssl ts -query -data "/tmp/file" -cert -sha256 -no_nonce \
>     | curl -s -H "Content-Type: application/timestamp-query" \
>            --data-binary "@-" http://zeitstempel.dfn.de > /tmp/file.tsr
>
> You can then store the content of /tmp/file.tsr in the database.
Given that I am actually having trouble getting a blockchain service
running, I guess I should latch on to your approach now and integrate
with a blockchain later.  I'm not sure I really see blockchain achieving
anything more than what your approach achieves. This is why I've said
from day 1 of this work that I believe there is an element of "buzzword"
to blockchain and Noark integration, and I wanted to avoid chasing
buzzwords.
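The changelog described above can be sketched in bash as follows. This
is a minimal sketch under our own assumptions: a plain text file
(changelog.txt, a hypothetical name) stands in for the database table,
each line holds timestamp, systemId and checksum, the first entry chains
from an all-zero checksum, and GNU coreutils sha256sum is available. The
tts-value column is left out so the sketch runs offline; in practice it
would be filled by the openssl/curl request above.

```shell
#!/usr/bin/env bash
# Minimal sketch of the changelog hash chain (assumptions: bash, GNU
# coreutils sha256sum, a text file instead of a database table).
set -euo pipefail
cd "$(mktemp -d)"

CHANGELOG=changelog.txt                # hypothetical stand-in for the table
GENESIS=$(printf '%064d' 0)            # assumed checksum before the first entry

# append_event TIMESTAMP SYSTEMID
# checksum = sha256("timestamp systemid previous-checksum\n"), i.e. the
# same line the echo in the openssl example above writes to /tmp/file.
append_event() {
    local ts=$1 sysid=$2 prev sum
    if [ -s "$CHANGELOG" ]; then
        prev=$(tail -n 1 "$CHANGELOG" | cut -d' ' -f3)
    else
        prev=$GENESIS
    fi
    sum=$(printf '%s %s %s\n' "$ts" "$sysid" "$prev" | sha256sum | cut -d' ' -f1)
    # Here the trusted timestamp for "$sum" would be requested and stored
    # as the tts-value column; omitted so the sketch runs offline.
    printf '%s %s %s\n' "$ts" "$sysid" "$sum" >> "$CHANGELOG"
}

# Hypothetical systemIds for three CUD events.
append_event 2017-04-04T15:04:00Z mappe-0001
append_event 2017-04-04T15:05:00Z registrering-0001
append_event 2017-04-04T15:06:00Z mappe-0001
cat "$CHANGELOG"
```

Each checksum depends on the one before it, so changing or removing an
earlier entry changes every later checksum.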

> This would create a chain of signed checksums.  The signature can be
> used to both detect holes in the sequence and prove that a given
> checksum existed at a specific point in time.  This approach will both
> make the archive tamper evident (any missing events/changelog entries
> can be detected) and prove that a given file has not been changed after
> it was stored.  It will not make anything available to the public, of
> course, but see below if that is a goal.  The changelog could be made
> public to allow anyone to verify its content, but without the archive
> content it would not be possible to verify that files have not been
> modified.
>
> This will simulate several parts of a public blockchain.  The p2p
> broadcasting is taken out (stored locally instead) and the "proof of
> work" part is replaced by a trusted third party.
I agree that your approach is very easy and straightforward to
implement.  Put it in a P2P fashion and it is just as valid as a
blockchain approach. It is a blockchain approach. Or blockchain has
copied your approach ...
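The tamper-evidence claim (a missing or edited changelog entry can be
detected) can be checked by recomputing the chain from the first entry.
A minimal bash sketch, assuming a changelog whose lines are
"timestamp systemId checksum", where each checksum is the SHA-256 of the
timestamp, the systemId and the previous checksum, starting from an
all-zero value. The format, file names and systemIds are hypothetical;
GNU coreutils sha256sum is assumed. It builds a three-entry sample log,
verifies it, then drops the middle entry to show the break being found.

```shell
#!/usr/bin/env bash
# Sketch: detect a removed or altered changelog entry by recomputing the
# chain. Assumptions: bash, GNU coreutils sha256sum, hypothetical line
# format "timestamp systemId checksum", all-zero starting checksum.
set -euo pipefail
cd "$(mktemp -d)"

GENESIS=$(printf '%064d' 0)

# chain_sum TIMESTAMP SYSTEMID PREVIOUS-CHECKSUM
chain_sum() {
    printf '%s %s %s\n' "$1" "$2" "$3" | sha256sum | cut -d' ' -f1
}

# verify_chain FILE: returns 0 if every checksum matches, otherwise prints
# the first line that does not link to its predecessor and returns 1.
verify_chain() {
    local prev=$GENESIS n=0 ts sysid sum
    while read -r ts sysid sum; do
        n=$((n + 1))
        if [ "$(chain_sum "$ts" "$sysid" "$prev")" != "$sum" ]; then
            echo "chain broken at line $n"
            return 1
        fi
        prev=$sum
    done < "$1"
    return 0
}

# Build a small sample log with the same formula (hypothetical systemIds).
prev=$GENESIS
: > log.txt
while read -r ts sysid; do
    sum=$(chain_sum "$ts" "$sysid" "$prev")
    echo "$ts $sysid $sum" >> log.txt
    prev=$sum
done <<'EOF'
2017-04-04T15:04:00Z mappe-0001
2017-04-04T15:05:00Z registrering-0001
2017-04-04T15:06:00Z mappe-0001
EOF

verify_chain log.txt && echo "log.txt: intact"
sed '2d' log.txt > tampered.txt        # silently drop the middle event
verify_chain tampered.txt || echo "tampered.txt: tampering detected"
```

Recomputing catches a dropped or edited entry in the middle of the log;
proving when each checksum existed is the job of the stored trusted
timestamp values, which this sketch does not check.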

So I think we should try your approach first. I'll create the event
listeners and start publishing events to the logfile. They can be made
asynchronous, so I expect little performance hit. I was considering
having a secondary service running with a queue, so we could have a
pipeline of outgoing requests to the blockchain server. But that's for
later.
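The asynchronous idea can be illustrated with a spool file: listeners
only append a line and return, and a separate worker later drains the
queue into the chain, which is where the slow trusted timestamp (or
blockchain) request would go. This bash sketch is only an analogy for
what the event listeners in nikita would do; emit_event, drain_queue,
the file names and the chain format (timestamp, systemId, SHA-256 over
both plus the previous checksum) are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: decouple event listeners from the slow chain/timestamp step
# with a spool file. Hypothetical names throughout; assumes bash and GNU
# coreutils sha256sum.
set -euo pipefail
cd "$(mktemp -d)"

QUEUE=queue.txt
CHANGELOG=changelog.txt

# Called by an event listener: record the event and return immediately.
emit_event() {
    printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" >> "$QUEUE"
}

# Run by a background worker: move the queue aside so new events start a
# fresh file, then add each queued event to the hash chain in order.
drain_queue() {
    [ -s "$QUEUE" ] || return 0
    mv "$QUEUE" "$QUEUE.work"
    local prev ts sysid sum
    if [ -s "$CHANGELOG" ]; then
        prev=$(tail -n 1 "$CHANGELOG" | cut -d' ' -f3)
    else
        prev=$(printf '%064d' 0)
    fi
    while read -r ts sysid; do
        sum=$(printf '%s %s %s\n' "$ts" "$sysid" "$prev" | sha256sum | cut -d' ' -f1)
        # The trusted timestamp (or blockchain) request for "$sum" goes here.
        printf '%s %s %s\n' "$ts" "$sysid" "$sum" >> "$CHANGELOG"
        prev=$sum
    done < "$QUEUE.work"
    rm "$QUEUE.work"
}

emit_event mappe-0001            # hypothetical systemIds
emit_event registrering-0001
drain_queue
cat "$CHANGELOG"
```

The listener side stays cheap no matter how slow the timestamp service
is; the cost is that an event reaches the chain only when the worker
next runs.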

I have to start writing articles. I think there is definitely enough
work here for an article. I hope you will write something with me,
Petter, once we get it up and running.

>
>> In many ways using blockchain is about exposing what is happening
>> inside government, without exposing any private details. It makes the
>> archive more "public" in that its internals are exposed in a trusted
>> verifiable manner.
> I fail to see how publishing the systemid is going to expose anything
> about what is happening inside a government.  To get that effect, more
> information needs to be made public.
I'm just starting with systemId and timestamp, to get a proof of
concept, but it is possible to extend it to many other metadata fields.
There is a balance that has to be kept here. If the distributed ledger
is to act like a Write Once Read Many (WORM) medium, then it may not be
possible to retract already published information. A few years ago
there was a case where an integration between a Noark system and a child
protection services system started leaking child protection
(barnevern) information onto the public record. It was relatively
simple to remove the data from the public record published on the
municipality's website. Removing it from a WORM-style public ledger
might be much more difficult.
>
> Perhaps record type, case number, operation type etc need to be exposed
> to have that effect?

I agree that we need more than systemId; that's why I call it the
minimal amount of information. PST (the Norwegian Police Security
Service), for example, is not covered by the FOI law. With this
minimalistic approach, even PST's Noark system could publish
information ... perhaps :)

> What exactly should be available to the public, and what should not?
>

That will have to be worked out through research, based on organisation
type, which fagsystem (line-of-business system) is integrated with the
Noark system, etc. With the minimal approach I think we are good to go
on all systems. To be concrete, though, you would start off with
everything that's on the public record (offentlig journal). But the
public record is based on the journalposts, and also exposes things like
caseId and classification code (if I remember correctly). In this
approach we are aiming for the entire record-keeping structure to be
visible through a blockchain description. You should be able to write
some kind of viewer that shows the organisation's archive structure.
That might be possible if, e.g., each ledger block includes not only the
systemId of the object (e.g. mappe), but also that of its parent (e.g.
klasse or arkivdel or both).
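A minimal bash sketch of such a viewer, assuming each ledger entry
carries its own systemId, its parent's systemId and the object type. The
line format, the identifiers and the simplified hierarchy (arkiv,
arkivdel, klasse, mappe, registrering, leaving out e.g.
klassifikasjonssystem) are hypothetical, and the chain checksums are left
out to keep the focus on the parent link.

```shell
#!/usr/bin/env bash
# Sketch: rebuild the archive structure from ledger entries that record
# both an object's systemId and its parent's systemId ("-" = no parent).
# Hypothetical format per line: systemId parentSystemId type
set -euo pipefail
cd "$(mktemp -d)"

cat > ledger.txt <<'EOF'
arkiv-1 - arkiv
arkivdel-1 arkiv-1 arkivdel
klasse-1 arkivdel-1 klasse
mappe-1 klasse-1 mappe
mappe-2 klasse-1 mappe
registrering-1 mappe-1 registrering
EOF

# print_tree PARENT DEPTH: print every entry whose parent is PARENT,
# indented by depth, and recurse into its children (depth-first).
print_tree() {
    local parent=$1 depth=$2 id pid type
    while read -r id pid type; do
        if [ "$pid" = "$parent" ]; then
            printf '%*s%s (%s)\n' $((depth * 2)) '' "$type" "$id"
            print_tree "$id" $((depth + 1))
        fi
    done < ledger.txt
}

print_tree - 0
```

For the sample ledger it prints arkiv-1 at the top, then arkivdel-1 and
klasse-1, and under klasse-1 both mappe-1 (with registrering-1 beneath
it) and mappe-2. Because only systemIds and parent links are needed, the
structure can be shown without publishing titles or other content.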

You did some analysis of OEP (Offentlig elektronisk postjournal), if I
remember correctly, where you compared the document creation date with
the time the document lands on OEP. If I remember that correctly, there
are instances of 100 days between a document being created and it
landing on OEP. Our approach would expose the fact that something is
there much sooner. I may be misquoting your work ...

I am a little wary of publishing all public information at creation
time, as the case-handler might put private data in, e.g., the title
and later need to retract it. If we consider the arkivstruktur as a
tree, private data will typically only appear at or near the leaves, so
potential retractions will have few consequences for the rest of the
chain. Perhaps retractions can be allowed, just not on systemId: you
allow retraction of data like the title at the leaves. But this is
something that needs some research. From the HiOA perspective, nikita
has been about creating a research infrastructure, and this mini-project
is a good example of using nikita as a research infrastructure to test
out ideas. And given that we have code, it's a simple task to show that
it actually works and is not just a theoretical idea that could be
implemented. And as with the interface work, we will find all the things
that are assumed, but are not that simple. After all these years I think
I've nearly given up on seeing an article that points to the nikita
GitHub repository as research.
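One possible way to allow retracting a title at a leaf while never
retracting systemId (our assumption, not something decided in this
thread) is a salted hash commitment: the chain entry includes only a
hash of a random salt plus the title, and the title and salt are stored
outside the chain. Deleting them later retracts the title while the
chain still verifies. The salt matters because a short title could
otherwise be guessed by hashing candidate titles. The entry format,
names and sample title are hypothetical; bash and GNU coreutils are
assumed.

```shell
#!/usr/bin/env bash
# Sketch of a retractable title via a salted hash commitment (assumption,
# not a decided design). Assumes bash and GNU coreutils sha256sum and od.
set -euo pipefail

sha() { sha256sum | cut -d' ' -f1; }

# Stored outside the chain (e.g. in the Noark database).
title="Søknad om barnehageplass"                       # hypothetical title
salt=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')    # 16 random bytes, hex

# Only the hash goes into the chain entry, next to timestamp and systemId.
title_hash=$(printf '%s%s' "$salt" "$title" | sha)
prev=$(printf '%064d' 0)                               # assumed previous checksum
entry="2017-04-04T15:04:00Z registrering-0001 $title_hash"
checksum=$(printf '%s %s\n' "$entry" "$prev" | sha)

# While the title is public, anyone given title and salt can check it.
[ "$(printf '%s%s' "$salt" "$title" | sha)" = "$title_hash" ] && echo "title matches the chain"

# Retraction: delete title and salt. The chain never held them, so the
# entry's checksum still recomputes to the same value.
title='' salt=''
[ "$(printf '%s %s\n' "$entry" "$prev" | sha)" = "$checksum" ] && echo "chain still valid after retraction"
```

The systemId stays in clear text in the entry, so it can never be
retracted, which matches the split between fixed and retractable fields
suggested above.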

I also think archivists/recordkeepers are wary of any new approach;
they are conservative. So I think we must approach it from a very
non-threatening point of view.

I suspect there is a lot more information in a Noark system than what
is on the public record, and more information will be in there in the
future. I also think that, when considering this, I am turning the
tables and saying this is useful because "I don't trust government".
That statement is not really applicable in Scandinavia, but there are
other parts of the world where it is very true, and that's really where
we see the need for this work. Arguing that we need this to increase
trust in government in Scandinavia would be laughed at, and that's not
the point. But we can show that trust can be increased with a few
simple steps. But I believe you first have to see value in the
minimalist approach. From there we can publish more and more
information.

The events I want to capture are just as relevant if you want to
automate the extraction process and deposit data with an archive at the
time it is created. So I think application events are central to
nikita and should be implemented even if you're not using blockchain or
an ssl-chain.

 - Tom


