Make source acceptable in Debian?

Wed Nov 9 14:02:31 CET 2016

On 11/09/2016 12:28 PM, Petter Reinholdtsen wrote:
> Hi Thomas.
>
> [Thomas Sødring]
>> My current thinking is that I want to keep the project source up to
>> date with libraries that the project is based on. This means that the
>> project will always use the latest versions of Hibernate, Spring, etc.
>> (where there are no conflicts). But this *only* applies at compile
>> time.
> My starting point is that systems like the NSA project Irritant Horn
> do exist, and HTTP and HTTPS connections can be hijacked as one
> connects to download dependencies.  There are ways to guard against this,
> for example by storing checksums alongside the URL to use while
> downloading, and always verifying the checksum after downloading.  As far
> as I can see, Maven does not do this, leading to the risk that what I
> download and what you download isn't the same - and that it is used to
> compromise my machine.  This seems like a bad starting point for a filing
> system that should be able to store the sensitive personal information of
> citizens.
I totally agree with that point. I myself have not checked the checksums
of the jar files that have been downloaded. But as we move closer to a
product, a version 1.0, I think that will become more important. Am I
naive to trust the Maven repository? Maybe! Can I be sure my connection
is not being hijacked? I honestly don't know, but this is still under
development and anyone can compile from source, so it's up to the person
doing the compiling to ensure the libraries are correct.
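As a rough illustration of the checksum approach described above, here is a minimal Java sketch (the class name ChecksumCheck is hypothetical and not part of the project) that computes the SHA-256 digest of a downloaded jar and compares it with a value recorded in advance. It assumes the expected checksum was obtained from a source you trust independently of the download itself; a checksum fetched from the same server over the same connection would not help against a hijacked connection.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: verify a downloaded file against a SHA-256 checksum
// that was recorded in advance and obtained over a trusted channel.
public class ChecksumCheck {

    // Every Java platform is required to support SHA-256.
    static MessageDigest newSha256() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Lower-case hex encoding of a digest.
    static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // SHA-256 of an in-memory byte array.
    static String sha256Hex(byte[] data) {
        return toHex(newSha256().digest(data));
    }

    // SHA-256 of a file, streamed so large jars are not read into memory at once.
    static String sha256Hex(Path file) throws IOException {
        MessageDigest md = newSha256();
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return toHex(md.digest());
    }

    // True if the file's digest equals the expected hex string (case-insensitive).
    static boolean matches(Path file, String expectedHex) throws IOException {
        return sha256Hex(file).equalsIgnoreCase(expectedHex.trim());
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ChecksumCheck <file> <expected-sha256-hex>");
            System.exit(2);
        }
        Path file = Paths.get(args[0]);
        if (matches(file, args[1])) {
            System.out.println("OK " + file);
        } else {
            System.out.println("MISMATCH " + file + " actual=" + sha256Hex(file));
            System.exit(1);
        }
    }
}
```

It would be run as `java ChecksumCheck some-library.jar <expected-sha256>`; the non-zero exit status on a mismatch means it could be wired into a build script that refuses to continue.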

> Next there is the challenge of providing help and support to those using
> the system, when it is hard or impossible to know which versions of the
> dependencies they happened to download.  If I build one day and you
> another, there is no guarantee that we installed the same versions of
> the dependencies if "the latest versions" is the aim.
Isn't this the philosophy of Debian? It uses older, more stable
packages, which annoys people who want the latest bells and whistles. I
think, though, that this is really about managing the project once it
reaches version 1.0. At that point you, some of the other NUUG guys or
others might actually fork the project. In all honesty, a good Noark 5
core is not something that should be updated much. If you do it
properly, there should be few updates. One of the reasons I want to move
forward based on the latest libraries is that I don't want the project
to have hidden security vulnerabilities that are already known in older
libraries. So I accept the risk of potential zero-day vulnerabilities
instead. I also think that our strategy for dependency management until
we reach 1.0 is going to be very different from the one after we reach
1.0. My impression from Stack Overflow is that developers often say "I
can't upgrade because this part of our system is dependent on a
particular version of a library". If I remember correctly, the Hibernate
ORM and Hibernate Elasticsearch dependencies had a problem due to where
the EntityManager is defined. This is the classic problem that you want
to avoid. I agree! But on the other hand, if a zero-day vulnerability is
found in a version of Hibernate from two years ago, fixing it might
require a large refactoring job.

I also think that this is about versioning and project management. Once
we hit 1.0, we are hopefully looking at something stable. What changes
can we expect? Problems in the REST interface, minor bugs, problems with
the domain model, and bugs in the code that creates the extraction. I
also think that if support is community based, then the community can
say "we only provide support for version 1.0"; everything else is
experimental or "use at your own risk". If a new version "1.0.1" is
released because of a security fix related to Hibernate, then it's up to
the community to decide if it will support it.

To be honest, when it reaches 1.0, and if it is to be used by others, it
should leave HiOA and be supported through some licensing model. I don't
believe the project can exist outside of HiOA solely as a community
based project. A community project is fine as a teaching tool at HiOA,
but if a municipality wants to use this, they are going to have to pay
someone for support.

>
> For system integrators and security managers there is the added
> complication of working with multiple versions of the same libraries.
> Say product X needs library L version 1 and product Y needs library L
> version 2.  As product Z and the rest of the alphabet are introduced, it
> soon becomes impossible to find a version of library L that can work
> with all the products.  And when a security issue in library L shows
> up, upgrading to a version where the security issue is fixed becomes
> impossible.
There will not be that many versions once we hit 1.0. Perhaps at version
1.0 we decide to review the libraries every 6 months. I personally, given
my role at HiOA, will not have time to download and compile new versions
of the core every day or week. Nor will I want to. Once I have something
that works as a teaching tool, I will be happy with that. But I will be
on the security mailing lists of the libraries we use and will try to
patch within 24 hours if a severe vulnerability in one of the libraries
is discovered. I don't know what practice we will have, but until
version 1.0, I will make use of the latest versions, so that when we hit
1.0, we will have the latest versions of the libraries. Maybe the issue
here is that the libraries the project depends on aren't that stable!
Over the next 6 months I do not expect to change the versions of the
libraries. I doubt I will change from httpclient 4.5.1 to 4.5.2 just
because a new version might exist. But I might change to 4.6 because I
need new functionality. But I don't want to rule out changing either.
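The upgrade policy above can be read as a small decision rule: a newer release is adopted when it fixes a severe vulnerability or provides functionality the project needs, but not merely because it exists. The Java sketch below encodes that rule under stated assumptions: the class and method names are hypothetical, and the version comparison only looks at the leading numeric major.minor.patch parts, which is much simpler than Maven's own version ordering.

```java
// Hypothetical sketch of the upgrade rule described above; not project code.
public class UpgradePolicy {

    // Parse the leading numeric parts of a version such as "4.5.1" or
    // "5.2.4.Final" into {major, minor, patch}; missing parts count as 0.
    static int[] parse(String version) {
        int[] parts = new int[3];
        String[] tokens = version.split("[.-]");
        for (int i = 0; i < tokens.length && i < 3; i++) {
            if (!tokens[i].matches("\\d+")) {
                break; // stop at qualifiers like "Final" or "SNAPSHOT"
            }
            parts[i] = Integer.parseInt(tokens[i]);
        }
        return parts;
    }

    // Negative if a is older than b, zero if equal, positive if newer.
    // Simplified: qualifiers are ignored, unlike Maven's real ordering.
    static int compare(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        for (int i = 0; i < 3; i++) {
            if (x[i] != y[i]) {
                return Integer.compare(x[i], y[i]);
            }
        }
        return 0;
    }

    // A newer version alone is not a reason to upgrade; a severe
    // vulnerability fix or needed functionality is.
    static boolean shouldUpgrade(String current, String candidate,
                                 boolean fixesSevereVulnerability,
                                 boolean needsNewFunctionality) {
        if (compare(candidate, current) <= 0) {
            return false; // candidate is not newer
        }
        return fixesSevereVulnerability || needsNewFunctionality;
    }

    public static void main(String[] args) {
        System.out.println(shouldUpgrade("4.5.1", "4.5.2", false, false)); // false
        System.out.println(shouldUpgrade("4.5.1", "4.6", false, true));    // true
    }
}
```

With this rule, httpclient would not move from 4.5.1 to 4.5.2 on its own, while 4.6 would be adopted if its new functionality is needed, and any newer release would be taken immediately if it fixes a severe vulnerability.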
>> My first choice of distribution model is Docker, which will work its way
>> into the project soon.
> Do you intend to have the SQL database inside or outside docker?  As I
> understand it, docker installations are upgraded by throwing away the
> image and starting afresh, which isn't a great way to upgrade databases.
Regarding Docker, that's my opinion as well. I don't think anyone would
put the database inside the Docker image and just throw it away, so the
database will probably be in a separate container or running outside
Docker altogether.

>> So I guess the discussion is about compile time versus runtime.
> My focus is compile time downloads.
>
I think that not using Maven and automatic downloads at the moment would
be the wrong thing to do. I want people to download and play with the
project. Having to install the dependencies manually will scare some
people away. I accept that other people will be more worried about
accepting unknown jar files onto their system. So could we see if it's
possible to allow people to do both? We could state in the README that
the project is Maven based, describe which jar files will be downloaded,
and let people know that they can also fetch them manually and then
build with --offline. It's open source, so people can do what they want.

You made a comment before about avoiding overselling the product. I try
to avoid that, as it normally ends in disappointment. I also think I
should explain what the project is from my perspective. In many ways it
reflects an exploration of libraries, tools and methods that I would
like to integrate into teaching. As such, I am learning as I explore.
It's a knowledge project to learn about electronic record keeping within
the Norwegian government context, which is something that is actually
enshrined in law! I have argued with others that knowledge is not just
something that exists as a report. Open source software can also be
knowledge. This is a university project that may result in something
cool, and as I am a public employee, I believe the results of my work
belong to the public. I am not a developer who works in a team; I am
pretty much on my own here. So I will trip and make silly choices, and
I hope that keeping everything open will allow me to learn.

 - Tom