Data extraction client?

Wed Feb 15 13:36:32 CET 2017

On 02/15/2017 01:12 PM, Petter Reinholdtsen wrote:
> [Thomas Sødring]
>> From a first guess, there is enough in the API to support extraction
>> making!  Normally you might use a findAll method and process the
>> database in a particular fashion to make an extraction, but the
>> interface has no real findAll (avoid returning 300000 records in a
>> single call). Pagination is required so we'd process the core through
>> pages of results. It is possible!
> Good.
>
> A nice way to check that everything is working as it should would be to
> import a data set, export it again, and verify that the content did not
> change.  Do you think that would be possible with nikita and the API?

Absolutely possible with the core, but not through the Hateoas API; you'd
need to use the import API, and I'm not working on that API at the moment.
In the Hateoas API it is not possible to override fields like systemId,
createdBy, createdDate etc., as that would kill the authenticity of the
material.
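
As a rough sketch of what such a round-trip check could look like when the
core assigns its own systemId, createdBy and createdDate: compare the
imported and exported XML while skipping those fields. The element names in
SERVER_ASSIGNED are taken from the field names above and are an assumption
about how they appear in the files; attributes are ignored and child order
is treated as significant.

```python
import xml.etree.ElementTree as ET

# Assumption: these are the fields the core assigns itself. Adjust the set
# to the element names actually used in the extraction being compared.
SERVER_ASSIGNED = {"systemId", "createdBy", "createdDate"}


def local_name(tag):
    """Strip an XML namespace such as '{uri}name' down to 'name'."""
    return tag.rsplit("}", 1)[-1]


def normalise(element):
    """Turn an element into a nested tuple, skipping server-assigned fields."""
    children = tuple(
        normalise(child)
        for child in element
        if local_name(child.tag) not in SERVER_ASSIGNED
    )
    text = (element.text or "").strip()
    return (local_name(element.tag), text, children)


def same_content(original_xml, exported_xml):
    """Compare two XML documents while ignoring server-assigned fields."""
    return normalise(ET.fromstring(original_xml)) == normalise(
        ET.fromstring(exported_xml)
    )
```

With this, two files that differ only in the ignored fields compare as
equal, while any other change in text or structure is reported as a
difference.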

One of the things I was looking at was to use the core as a viewer for
various Noark 5 extractions, as well as importing some Noark 4 extractions
into Noark 5 so that they can be included in the core. Hence the Noark 4 to
Noark 5 mapping tool I referred to in a post yesterday.

> I suspect the systemID values will change, if there is no way to set
> them.  Is there any control of the uniqueness of such IDs?

The core sets them automatically. There is no control of uniqueness.
I'd also make the point that if you are using this as a recordkeeping
core, you can't change this, but when using it as a viewer in an archive,
you need this import functionality.
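
Since uniqueness is not enforced, an import tool may want to check for it
itself. A minimal sketch that reports ID values occurring more than once in
an XML document follows; the default element name "systemID" is an
assumption and can be overridden.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_ids(xml_text, id_element="systemID"):
    """Return the sorted list of ID values that occur more than once."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        (el.text or "").strip()
        for el in root.iter()
        # Compare on the local name so namespaced documents also match.
        if el.tag.rsplit("}", 1)[-1] == id_element
    )
    return sorted(value for value, n in counts.items() if n > 1)
```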

>
>> I wrote some code (PHP) to validate a Noark 5 extraction that might be
>> of interest:
>>
>> https://github.com/KDRS-SA/noark5-validator
>>
>> You can find two example extractions here:
>>
>> https://github.com/KDRS-SA/noark5-validator/tree/master/src/resources/test-uttrekk
>>
>> I think one is correct, while the other has some mistakes.
> Right.  Perhaps we should have importing the correct one as a short term
> goal, along with an extraction program?

An extraction module is on the roadmap; import isn't. But it's not
difficult to write an import program to parse a Noark 5 extraction. The
changelog.xml (endringslogg.xml) is definitely an issue, as I use
hibernate-envers at the moment and sneaking in records there is difficult,
but possible! Importing arkivstruktur.xml, however, is very
straightforward.
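
To illustrate the parsing side only, here is a minimal sketch that streams
through an arkivstruktur.xml file and yields one dict per record, so a
large extraction never has to be held in memory. The record tag is passed
in by the caller; the names used in the usage line ("mappe", "tittel") are
assumptions about the file contents, and how each dict is handed to the
core is left open.

```python
import xml.etree.ElementTree as ET


def iter_records(source, record_tag):
    """Yield one dict per record_tag element: child name -> child text.

    Only direct children that have non-empty text and no children of their
    own are included; nested structures are left to the caller.
    """
    for _event, element in ET.iterparse(source, events=("end",)):
        # Match on the local name so namespaced files also work.
        if element.tag.rsplit("}", 1)[-1] != record_tag:
            continue
        yield {
            child.tag.rsplit("}", 1)[-1]: child.text.strip()
            for child in element
            if len(child) == 0 and (child.text or "").strip()
        }
        element.clear()  # release the subtree of the record just processed
```

Usage would be along the lines of
`for record in iter_records("arkivstruktur.xml", "mappe"): ...`.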

>
>> Good idea! We can develop it together if you want. You tend to be two
>> steps ahead so you will probably have it done first :) But I
>> definitely think that such a module is mandatory for a Noark 5 core.
> I'll see what I can do.  I first plan to add more data using my test
> script, and try to verify that the field content is controlled for
> invalid and valid values. :)
>
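
For the page-by-page processing mentioned at the top of this thread, a
minimal sketch of walking a paginated interface without a findAll could
look like the following. fetch_page is a hypothetical caller-supplied
function, not part of the nikita API; it is assumed to take an offset and a
limit and return a list of at most that many records.

```python
def iter_all(fetch_page, page_size=100):
    """Yield every record by requesting one page at a time.

    fetch_page(offset, limit) is supplied by the caller (hypothetical here)
    and must return a list of at most `limit` records.
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < page_size:
            break
        offset += len(page)
```

Because it is a generator, an extraction can write records out as they
arrive instead of collecting all of them first.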