"Metadata is a love note to the future"

Wed Jun 21 10:17:56 CEST 2017

[Thomas Sødring]
> When it comes to machine learning, I would like to autoclassify new
> incoming documents based on existing documents. Take for example the
> Hallingdal municipalities that I worked with.  I generated 7 database dump
> extractions with related documents. The classification code and other
> metadata and documents are linked. So we could try a machine learning
> training run on 2 of the extractions and verify the algorithm on the
> other 5.  There are so many factors that we can fine-tune and play with
> during such experiments.

That would be cool.  How much work would it be to get access to these
extractions?  What kind of volume are we talking about?

> The limiting factor here is privacy laws, but we could do this without
> ever looking at the sensitive data. It would, however, require a strict
> technical setup at an IKA.

I do not believe it makes sense to talk about 'without ever looking' when
we are writing software to look at the document content.  To me, the
fact that the content is made available for processing by people or
computers controlled by an entity is the part that challenges privacy.
I do not buy into the idea that 'only machines will look at the
information' is less intrusive than having people look at the personal
information.

So if we head down this path, we need to consider carefully what we do
and how we do it.

-- 
Happy hacking
Petter Reinholdtsen