I have been playing a bit with artificial idiocy lately, and as an experiment I asked it to analyse the Nikita code base and work out what it would take to get Nikita into Debian as an official package, which in practice means what it takes to build Nikita without an Internet connection. Here is what it came up with. As expected, quite a lot has to be in place before we get there, but I thought it would be good for more people to know the details of the challenge.
# Deserted Island Build Proposal for nikita-noark5-core
## Goal
Enable building and running the project with **zero Internet access** --- only a local Debian mirror (Testing) on disk and the source repository. The "deserted island" test: can you get this up and running without any network connection?
------------------------------------------------------------------------
## Current Situation
### What Gets Downloaded Today
| When | What | From | Count/Size |
|------|------|------|------------|
| **Build** (`make build`) | ~450 JAR artifacts (Spring Boot BOM + transitive deps) | Maven Central | ~200 MB |
| **Build** (`make check`) | Additional test-scoped dependencies (Mockito, JUnit engine, etc.) | Maven Central | ~100 MB |
| **Build** (plugins) | `antlr4-maven-plugin`, `asciidoctor-maven-plugin`, `spring-boot-maven-plugin`, `maven-surefire-plugin` + transitive deps | Maven Central | ~50 MB |
| **Test setup** | Keycloak 26.0.6 binary tarball (`keycloak-setup-start.sh`) | GitHub Releases | ~130 MB |
| **Runtime (Tika)** | Language detection model (`langdetect` models) --- may download on first use if not bundled | Maven Central / CDN | ~5 MB |
The `maven-repo/` directory exists in the working tree but is **not committed to git** (untracked files), so every fresh clone downloads everything from scratch.
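The counts and sizes in the table can be checked against whatever is currently sitting in the untracked `maven-repo/` directory. The following is a minimal sketch, assuming GNU `find` and `du`; the function name `maven_repo_summary` is hypothetical and not part of the Nikita build.

```bash
# Hypothetical helper: summarise the contents of a local Maven repository.
# Usage: maven_repo_summary [REPO_DIR]   (defaults to ./maven-repo)
maven_repo_summary() {
    repo="${1:-maven-repo}"
    if [ ! -d "$repo" ]; then
        echo "no such directory: $repo" >&2
        return 1
    fi
    # Count artifacts by extension; $(( )) strips any padding from wc.
    jars=$(find "$repo" -type f -name '*.jar' | wc -l)
    poms=$(find "$repo" -type f -name '*.pom' | wc -l)
    # Total size in kilobytes of everything under the repository.
    size_kb=$(du -sk "$repo" | cut -f1)
    echo "jars=$((jars)) poms=$((poms)) size_kb=${size_kb}"
}
```

Running `maven_repo_summary maven-repo` before and after a build shows how much a fresh clone has to download.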
### Why Debian Packages Alone Aren't Enough
Spring Boot 3.4.5 and its entire ecosystem (spring-boot-starter-web, -data-jpa, -security, -oauth2-resource-server, -amqp, etc.) are **not packaged for Debian**. Many individual libraries *are* available as `lib*-java` packages in Debian Testing:
| Dependency | Available as Debian pkg? | Package name |
|------------|--------------------------|--------------|
| ANTLR 4 runtime | ✅ Yes | `libantlr4-runtime-java` |
| commons-lang3 | ✅ Yes | `libcommons-lang3-java` |
| Guava | ✅ Yes | `libguava-java` (32.0.1) |
| H2 database | ✅ Yes | `libh2-java` |
| PostgreSQL JDBC | ✅ Yes | `libpostgresql-jdbc-java` |
| Joda-Time | ✅ Yes | `libjoda-time-java` |
| JAXB runtime | ✅ Yes | `libjaxb-java` |
| ByteBuddy | ✅ Yes | `libbyte-buddy-java` |
| Reflections | ✅ Yes | `libreflections-java` |
| RabbitMQ client | ✅ Yes | `librabbitmq-client-java` |
| JSON (org.json) | ⚠ Partially | `libjson-java` is a different library (gson-based, not org.json) |
| Apache POI | ❌ No | Not packaged in Debian |
| **Spring Boot** | ❌ No | Not packaged at all |
| Spring Security | ❌ No | Not packaged at all |
| springdoc-openapi | ❌ No | Not packaged at all |
| Tika parsers | ❌ No | Not packaged for Debian |
| AsciiDoctor (Ruby CLI) | ✅ Yes | `asciidoctor` package provides the Ruby-based CLI tool |
| AsciiDoctorJ (Maven plugin bridge) | ❌ No | The JVM wrapper (`asciidoctor-maven-plugin`) is not in Debian. Can be replaced by calling the Debian `asciidoctor` CLI from Makefile, or skipped entirely since docs are cosmetic. |
**Bottom line**: Replacing Maven with Debian packages alone is **not feasible** because Spring Boot, POI, springdoc-openapi, and Tika have no Debian equivalents. The only realistic path is to make Maven work offline.
------------------------------------------------------------------------
## Proposal: Three-Part Approach
### Part 1 --- Build-Time Offline (Maven)
#### Option A: Commit `maven-repo/` to git (Recommended for simplicity)
**Changes needed:**

1. Add `maven-repo/*.jar`, `maven-repo/*.pom`, `maven-repo/*.sha1`, `maven-repo/*.lastUpdated` to `.gitattributes` with `-export-ignore` and commit the directory.
2. Alternatively, use a sparse checkout or git-lfs for large binaries.
3. Add an offline guard to the Makefile:
``` makefile
# In Makefile, add:
MVNOPTS := -Dmaven.repo.local=$(CURDIR)/maven-repo --offline

# Optional: fail-fast if network would be needed:
.PHONY: verify-offline
verify-offline:
	@echo "Verifying offline build capability..."
	$(MVN) $(MVNOPTS) dependency:resolve -DskipTests || \
	  (echo "ERROR: Missing dependencies. Run 'make populate-maven-repo' online first."; exit 1)

build: verify-offline
	$(MVN) $(MVNOPTS) clean validate install
```
4. Provide a one-time online bootstrap target for maintainers to keep `maven-repo/` current:
``` makefile
.PHONY: populate-maven-repo
populate-maven-repo:
	@echo "Populating local Maven repo (requires Internet)..."
	$(MVN) -Dmaven.repo.local=$(CURDIR)/maven-repo dependency:go-offline
	$(MVN) -Dmaven.repo.local=$(CURDIR)/maven-repo dependency:resolve-plugins
```
**Pros**: Simple, works immediately, no build-system changes needed. **Cons**: Bloated git repository (~300 MB of JARs). Consider git-lfs or a separate tarball artifact instead.
#### Option B: Debian policy-compliant approach (Recommended for packaging)
For proper Debian packaging (`dpkg-buildpackage`), the standard approach is:
1. **List all upstream VCS artifacts in `debian/watch`** and use `uscan`/`uupdate` or manually manage them in `debian/source/include-binaries`.
2. **Download all JARs during package build** from Maven Central using the `debian/rules` target, with checksums pinned in `debian/control` or a separate file.
3. Use **Maven's offline mode** (`--offline`) pointing at a pre-populated local repo that was assembled during the online phase of the Debian build.
Debian Java Policy recommends:

- Each upstream JAR dependency should be either (a) packaged in Debian as `lib*-java`, or (b) downloaded and built from source within the package build process.
- For option (b), use `download-maven-poms` helper script or similar to fetch artifacts.
However, since Spring Boot is not in Debian, **this project cannot be packaged purely from Debian packages**. The practical approach:
``` makefile
# debian/rules snippet:
%:
	dh $@ --with javahelper,maven

override_dh_auto_build:
	# Ensure offline build
	dh_auto_build -- --offline -Dmaven.repo.local=$(CURDIR)/deps-maven-repo
```
With `debian/control` Build-Depends including all available Debian packages:
    Build-Depends: debhelper-compat (= 13),
                   dh-buildupdate,
                   default-jdk (>= 17),
                   maven-debian-helper,
                   maven-repo-helper,
                   libantlr4-runtime-java,
                   libcommons-lang3-java,
                   libguava-java,
                   libh2-java,
                   libpostgresql-jdbc-java,
                   libjoda-time-java,
                   libjaxb-java,
                   libbyte-buddy-java,
                   librabbitmq-client-java,
                   libreflections-java
Then supplement with direct downloads for non-packaged dependencies (Spring Boot BOM, POI, Tika, springdoc-openapi). The `maven-repo-helper` tools can download these from Maven Central during the online build phase.
### Part 2 --- Test-Time Offline (Keycloak)
The file `scripts/keycloak-setup-start.sh` downloads Keycloak from GitHub:
``` bash
wget https://github.com/keycloak/keycloak/releases/download/${ver}/keycloak-$...
```
**Fixes:**
1. **Bundle Keycloak tarball in the repo** (or ship as a separate artifact):
   - Download `keycloak-26.0.6.tar.gz` and place it in `scripts/keycloak/`.
   - Modify script to check for local file first:

``` bash
if [ -f "scripts/keycloak/keycloak-${ver}.tar.gz" ]; then
    echo "Using bundled Keycloak tarball."
else
    wget https://github.com/keycloak/keycloak/releases/download/${ver}/keycloak-$...
fi
```
2. **Alternatively**, package Keycloak from Debian: `libkeycloak-admin-rest-client-java` exists, but the full Keycloak server is not packaged. The bundled tarball approach is simpler.
3. **Or skip Keycloak** for unit tests entirely --- run only `make check` (which runs JUnit tests that don't need Keycloak) and document that integration tests require the online setup:
``` makefile
check-offline:
	$(MVN) $(MVNOPTS) test -Dtest='!*IntegrationTest,!*IT'
```
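For the first fix, the local-file check could be factored into a small function that refuses to continue instead of silently falling back to a download. This is only a sketch: the function name `find_keycloak_tarball` and its error message are hypothetical, and the default directory follows the `scripts/keycloak/` location suggested above.

```bash
# Hypothetical helper: print the path of a bundled Keycloak tarball,
# or fail with a message when it has not been bundled yet.
# Usage: find_keycloak_tarball VERSION [DIR]
find_keycloak_tarball() {
    tarball="${2:-scripts/keycloak}/keycloak-$1.tar.gz"
    if [ -f "$tarball" ]; then
        echo "$tarball"
    else
        echo "missing $tarball; fetch it while online" >&2
        return 1
    fi
}
```

A setup script could then run `tarball=$(find_keycloak_tarball "$ver") || exit 1` and unpack `$tarball`, which makes a missing bundle fail early rather than at the first network timeout.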
### Part 3 --- Runtime Offline (Tika Language Detection)
Apache Tika's `tika-langdetect` module downloads language detection models on first use. This happens at runtime, not build time.
**Fix:** Set the system property to prevent online download:
``` java
// In application configuration or startup code:
System.setProperty("org.apache.tika.language.detect.model", "/path/to/local/model.jar");
```
Or exclude the language detection module from Tika's dependency tree if it's not needed. Check `FileHandlingService.java` and `FileUtilsService.java` --- they use `Tika` for MIME detection and text extraction, which works fine without language detection models. The models are only needed for `LanguageDetector`.
**In pom.xml**, add exclusions if language detection is not used:
``` xml
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-parsers-standard-package</artifactId>
  <exclusions>
    <exclusion>
      <groupId>com.github.pemistahl</groupId>
      <artifactId>lingua-language-detector</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```
------------------------------------------------------------------------
## Proposed Debian Packages to Install
### Required Build Dependencies
    default-jdk (>= 21)       # Java 21+ compiler and runtime (see JDK note below)
    maven                     # Maven build tool
    libantlr4-runtime-java    # ANTLR runtime (already in deps, but available as pkg)
    git                       # Version control (for git-commit-id plugin if used)
**JDK version bump required:** `pom.xml` currently sets `<java.version>17</java.version>`. JDK 17 has been removed from Debian Testing --- only JDK 21 (`openjdk-21-jdk`) and JDK 25 (`openjdk-25-jdk`) remain. The fix is straightforward:
``` xml
<!-- In pom.xml, change: -->
<properties>
  <java.version>21</java.version>
</properties>
```
Spring Boot 3.x supports JDK 17 through 21+ and the codebase doesn't use any Java 17-specific APIs that would break on 21. This is a one-line change in `pom.xml` and updates to `.gitlab-ci.yml` (which references `openjdk-17-jdk`).
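To keep `pom.xml` and the CI configuration from drifting apart again, a build script could check the declared Java version before starting Maven. The sketch below assumes the version is written as a plain number inside a single-line `<java.version>` element, as in the snippet above; the function names are hypothetical.

```bash
# Hypothetical helpers for checking the Java level declared in a POM.

# Print the first numeric <java.version> value found in the given POM.
# Usage: pom_java_version [POM]   (defaults to ./pom.xml)
pom_java_version() {
    sed -n 's:.*<java\.version>\([0-9][0-9]*\)</java\.version>.*:\1:p' \
        "${1:-pom.xml}" | head -n 1
}

# Succeed only if the POM declares at least the given Java version.
# Usage: check_java_version POM MINIMUM
check_java_version() {
    v=$(pom_java_version "$1")
    [ -n "$v" ] && [ "$v" -ge "$2" ]
}
```

For example, `check_java_version pom.xml 21 || echo "pom.xml still targets an older JDK"` would flag a POM that has not been bumped yet.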
### Required Runtime Dependencies
    default-jre          # Java runtime
    postgresql           # Database backend for production use
    rabbitmq-server      # Message queue (if using AMQP integration profile)
    keycloak             # Auth provider (NOT in Debian; bundle or skip)
    tesseract-ocr-nor    # OCR language data (used by Tika, from .gitlab-ci.yml)
    unoconv              # Document conversion (from .gitlab-ci.yml)
    libreoffice-core     # For document format conversion via unoconv
    python3              # Test scripts use Python 3
    curl                 # Health checks and API testing
    jq                   # JSON processing in test scripts
### Optional / Nice to Have
    asciidoctor    # For documentation generation (alternative to maven plugin)
    libh2-java     # Embedded database for demo/testing mode
------------------------------------------------------------------------
## Proposed Dependencies That Could Be Dropped
| Dependency | Used For | Can Drop? | Notes |
|------------|----------|-----------|-------|
| `spring-boot-starter-amqp` | RabbitMQ integration | **Yes, conditionally** | Only needed if mail queue integration is used. Profile-gated via `application-queueintegration.yml`. |
| `spring-boot-starter-validation` | Bean validation (`@NotNull`) | No | Core functionality depends on it. |
| `springdoc-openapi-*` | Swagger/OpenAPI UI docs | **Yes** | Cosmetic/documentation only. Can be excluded for minimal build. |
| `asciidoctor-maven-plugin` | API documentation generation | **Yes** | Only generates HTML docs during package phase, not needed to run the app. |
| `spring-restdocs-*` (test scope) | REST API documentation tests | **Yes** | Test-time doc generation only. |
| `junit-vintage-engine` (test scope) | JUnit 3/4 compatibility in tests | Maybe | Only if all tests migrate to JUnit 5. |
| `spring-boot-starter-webflux` (test scope) | Reactive web client for tests | Maybe | Depends on which tests use it. |
| Tika language detection models | Language identification of documents | **Yes** | MIME type detection works without it. Exclude from classpath or set offline mode property. |
------------------------------------------------------------------------
## Summary: Concrete Steps to Achieve "Deserted Island" Build
### Immediate (low effort, high impact)
1. **Add `--offline` flag to Maven in Makefile**:
``` makefile
MVNOPTS := -Dmaven.repo.local=$(CURDIR)/maven-repo --offline
```
2. **Commit `maven-repo/` contents** (or create a tarball artifact):
``` bash
mvn dependency:go-offline -Dmaven.repo.local=$(pwd)/maven-repo
mvn dependency:resolve-plugins -Dmaven.repo.local=$(pwd)/maven-repo
# Then commit or tar the directory
```
3. **Bundle Keycloak** in `scripts/keycloak/` and update `keycloak-setup-start.sh`.
4. **Guard against Tika model download** via system property: Add to `application.yml`:
``` yaml
spring:
  main:
    add-application-context-initializer: true
---
# Or set JVM flag: -Dorg.apache.tika.language.detect.model=none
```
### Medium-term (proper Debian packaging)
5. **Create proper `debian/` directory** with:
   - `debian/control` listing all Build-Depends
   - `debian/rules` using `dh-sequence-maven` or manual Maven invocation with offline mode
   - `debian/source/include-binaries` for bundled JARs that can't be Debian-packaged
6. **Document the offline build process** in a new `docs/general/OfflineBuild.md`.
### Long-term: Get Into Debian Main (reduce external dependency count)
This section addresses the fundamental conflict between Maven's "download JARs" model and Debian's requirement that all code in `main` be built from source available within Debian.
#### The Problem: Bundled Binary JARs vs DFSG Compliance
Debian Free Software Guidelines (DFSG) and [Java Policy](https://www.debian.org/doc/packaging-manuals/policies/java-policy/) require:

1. **All code must be available in source form** --- binary-only JAR blobs are not acceptable for `main`
2. **Build dependencies must themselves be Debian packages** --- downloading from Maven Central during build is only acceptable for `non-free`
3. **Each upstream component should be packaged separately** as its own `lib*-java` package
Currently, ~450 JARs are downloaded from Maven Central. Even if their licenses are DFSG-compatible (Apache 2.0, MIT, LGPL), they violate the "buildable from source" requirement because:

- They're pre-built binaries shipped alongside our source
- Their build chain is external to Debian and not reproducible within the archive
#### Three Paths Forward
**Path A: Package Each Dependency Individually (Required for `main`)**
Every non-packaged dependency must become its own Debian package. The ones we need to package are:
| Dependency | License | Packaging Difficulty |
|------------|---------|----------------------|
| Spring Boot 3.4.x | Apache 2.0 | **Very high** --- ~50 transitive modules, each needs separate packaging; depends on Jakarta EE APIs not all in Debian |
| Apache POI 5.4.0 | Apache 2.0 | Medium --- single project but large; may already exist as `libpoi-java` (check) |
| Tika parsers 2.8.0 | Apache 2.0 | **Very high** --- huge dependency tree including Lucene, XML libraries, etc. |
| springdoc-openapi 1.6.x | MIT | Low --- single project, straightforward Maven build |
| Keycloak server (test only) | Apache 2.0 | High --- not needed in `main` if test-only; bundle as VCS artifact or skip |
This is a **multi-year effort**. Each package needs proper metadata, patches, and maintenance. Spring Boot alone has dozens of modules that would need individual packaging.
**Path B: Use Maven Offline with Online Download Phase (Acceptable for `non-free`)**
For `contrib`/`non-free`, the approach is simpler:

1. `debian/rules` downloads all JARs from Maven Central during online build phase
2. Checksums are verified against pinned values in `debian/control` or checksum file
3. Build runs with `--offline` pointing at pre-populated local repo
This still requires Internet access during package build, but is acceptable for non-free. The "deserted island" test would pass because the Debian mirror includes these downloaded artifacts as part of the built package.
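The checksum pinning in step 2 could be sketched with a plain `sha256sum` manifest kept next to the packaging files. This is an illustration under assumptions, not what the proposal prescribes: the function names and the manifest file are hypothetical, and the GNU coreutils and findutils options used here are the ones shipped in Debian.

```bash
# Hypothetical helpers for pinning and verifying a local Maven repository.

# Record SHA-256 checksums of all JARs and POMs, in stable sorted order.
# Usage: write_manifest REPO_DIR MANIFEST
write_manifest() {
    ( cd "$1" && find . -type f \( -name '*.jar' -o -name '*.pom' \) -print0 \
        | sort -z | xargs -0 -r sha256sum ) > "$2"
}

# Verify the repository against a previously written manifest.
# Fails if any listed file is missing or has changed.
# Usage: verify_manifest REPO_DIR MANIFEST
verify_manifest() {
    manifest=$(readlink -f "$2") || return 1
    ( cd "$1" && sha256sum --check --quiet --strict "$manifest" )
}
```

A maintainer would run `write_manifest maven-repo SHA256SUMS` once while online and commit the manifest; the offline build would then start with `verify_manifest maven-repo SHA256SUMS`. Note that this only detects changed or missing files, not extra ones.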
**Path C: Hybrid Approach (Recommended Near-Term)**
Use a combination:

1. **Replace packaged dependencies with Debian packages** where available (ANTLR, Guava, H2, etc.) --- reduces JAR count from ~450 to ~350
2. **Package the critical non-packaged ones ourselves**: springdoc-openapi (MIT, easy), any others that are small and well-maintained
3. **Bundle remaining JARs** for now with proper licensing documentation, targeting `non-free` initially
4. **Work upstream** to get Spring Boot packaged in Debian --- this is the blocker for everything
#### Concrete Steps for Path C
``` makefile
# debian/rules approach:
override_dh_auto_build:
	# Use Debian packages where available via classpath
	# Download remaining from Maven Central (online phase only)
	dh_auto_build -- -Dmaven.repo.local=$(CURDIR)/.deps-repo --offline
```
With `debian/control`:
    Build-Depends: debhelper-compat (= 13),
                   default-jdk (>= 21),
                   maven-debian-helper,
                   libantlr4-runtime-java,
                   libguava-java,
                   libh2-java,
                   libpostgresql-jdbc-java,
                   libjoda-time-java,
                   libjaxb-java,
                   libbyte-buddy-java,
                   librabbitmq-client-java,
                   libreflections-java,
    # Replace Maven deps with Debian packages where available
    Standards-Version: 4.6.2
And `debian/copyright` documenting all bundled JAR licenses.
#### Impact on "Deserted Island" Goal
The "deserted island" goal is **achievable now** for building and running the application, even without Debian main packaging:

- A complete Debian mirror + our source repo + pre-populated `maven-repo/` = offline build works
- The barrier to Debian `main` is separate from the ability to build/run offline

The two goals should be tracked separately:

1. **Offline build capability** (this document) --- achievable with bundled artifacts
2. **Debian main compliance** --- requires packaging all dependencies or dropping Spring Boot
7. **Evaluate dropping springdoc-openapi and asciidoctor-maven-plugin** from default build, moving them to an optional profile.
8. **Profile-gate AMQP integration** more clearly so it's not pulled in by default.
9. **Replace Tika with Debian-packaged alternatives** where possible (e.g., `file` command for MIME detection, `tesseract-ocr` for OCR) --- significant refactoring needed.
------------------------------------------------------------------------
## Verification Checklist
To verify the "deserted island" build works:
``` bash
# 1. Start with fresh clone + bundled maven-repo tarball
git clone <repo> && cd nikita-noark5-core-upstream
tar xzf ../maven-repo-bundle.tar.gz  # or already in git

# 2. Ensure no network connectivity
iptables -A OUTPUT -p tcp --dport 80 -j DROP
iptables -A OUTPUT -p tcp --dport 443 -j DROP

# 3. Build offline
make build check

# 4. Verify success
test -f target/nikita-noark5-core-*.jar && echo "BUILD SUCCESS"

# 5. Restore network
iptables -D OUTPUT -p tcp --dport 80 -j DROP
iptables -D OUTPUT -p tcp --dport 443 -j DROP
```
If `make build check` succeeds with outbound ports 80 and 443 blocked, the deserted island test passes.
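As a complement to blocking ports, the build log itself can be inspected: Maven reports each remote fetch with a `Downloading from <repository>:` line, so a clean offline build should contain none. A minimal sketch, assuming the build output was saved to a file; the function name `log_shows_downloads` and the log file name are hypothetical.

```bash
# Hypothetical helper: succeed if a Maven build log contains download lines,
# with or without the "[INFO] " prefix used in batch mode.
# Usage: log_shows_downloads LOGFILE
log_shows_downloads() {
    grep -qE '^(\[INFO\] )?Download(ing|ed) from ' "$1"
}
```

For example, after `make build check > build.log 2>&1`, running `log_shows_downloads build.log && echo "network was used"` flags a build that still reached out. This only covers Maven; the Keycloak and Tika downloads discussed earlier would need their own checks.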
Hi,

This was very interesting - thanks for taking the time to look into it and report back. We talked about this several years ago, and I was lukewarm about attempting it because I expected it would require a lot of packaging and maintenance of the Spring dependencies. The drivel bot confirms that when it says:
This is a **multi-year effort**. Each package needs proper metadata, patches, and maintenance. Spring Boot alone has dozens of modules that would need individual packaging.
I am not sure it is a **multi-year effort**, but I do think it is a rabbit hole with a lot one would have to get familiar with (as the drivel bot says). Otherwise I agree with what the drivel bot says about removing dependencies such as the queue, if there is no use for a queue when running locally.

It might also be interesting to see whether the drivel bot can upgrade the service interface from 5.5 to 5.6, and possibly whether it manages to find ambiguities in the description of the service interface. Sometimes an extra pair of eyes is nice, and the drivel bots are good at that.
Thomas
________________________________
From: Petter Reinholdtsen pere@hungry.com
Sent: Thursday 28 May 2026 10:29
To: nikita-noark@nuug.no nikita-noark@nuug.no
Subject: Nikita in Debian, or deserted island support for Nikita...
Jeg har lekt litt med kunstig idioti i det siste, og som et forsøk ga jeg den oppdrag å analysere kodebasen til Nikita og se hva som trengs for å kunne få Nikita inn som en offisiell Debian pakke, i praksis hva som trengs for å kunne bygge Nikita uten Internett-tilkobling. Her er det den kom opp med. Det er som forventet ganske mye som må på plass før vi er i mål, men tenkte det var greit om flere er kjent med detaljene rundt utfordringen.
# Deserted Island Build Proposal for nikita-noark5-core
## Goal
Enable building and running the project with **zero Internet access** --- only a local Debian mirror (Testing) on disk and the source repository. The "deserted island" test: can you get this up and running without any network connection?
------------------------------------------------------------------------
## Current Situation
### What Gets Downloaded Today
------------------------------------------------------------------------------------------ When What From Count/Size ---------------- ----------------------------- -------------- ---------------------------- **Build** ~450 JAR artifacts (Spring Maven Central ~200 MB (`make build`) Boot BOM + transitive deps)
**Build** Additional test-scoped Maven Central ~100 MB (`make check`) dependencies (Mockito, JUnit engine, etc.)
**Build** `antlr4-maven-plugin`, Maven Central ~50 MB (plugins) `asciidoctor-maven-plugin`, `spring-boot-maven-plugin`, `maven-surefire-plugin` + transitive deps
**Test setup** Keycloak 26.0.6 binary GitHub ~130 MB tarball Releases (`keycloak-setup-start.sh`)
**Runtime Language detection model Maven Central ~5 MB (Tika)** (`langdetect` models) --- may / CDN download on first use if not bundled ------------------------------------------------------------------------------------------
The `maven-repo/` directory exists in the working tree but is **not committed to git** (untracked files), so every fresh clone downloads everything from scratch.
### Why Debian Packages Alone Aren't Enough
Spring Boot 3.4.5 and its entire ecosystem (spring-boot-starter-web, -data-jpa, -security, -oauth2-resource-server, -amqp, etc.) are **not packaged for Debian**. Many individual libraries *are* available as `lib*-java` packages in Debian Testing:
------------------------------------------------------------------------------------- Dependency Available as Debian pkg? Package name ------------------- ---------------------------------- ------------------------------ ANTLR 4 runtime ✅ Yes `libantlr4-runtime-java`
commons-lang3 ✅ Yes `libcommons-lang3-java`
Guava ✅ Yes `libguava-java` (32.0.1)
H2 database ✅ Yes `libh2-java`
PostgreSQL JDBC ✅ Yes `libpostgresql-jdbc-java`
Joda-Time ✅ Yes `libjoda-time-java`
JAXB runtime ✅ Yes `libjaxb-java`
ByteBuddy ✅ Yes `libbyte-buddy-java`
Reflections ✅ Yes `libreflections-java`
RabbitMQ client ✅ Yes `librabbitmq-client-java`
JSON (org.json) ⚠ Partially `libjson-java` is different library (gson-based, not org.json)
Apache POI ❌ No Not packaged in Debian
**Spring Boot** ❌ No Not packaged at all
Spring Security ❌ No Not packaged at all
springdoc-openapi ❌ No Not packaged at all
Tika parsers ❌ No Not packaged for Debian
AsciiDoctor (Ruby ✅ Yes `asciidoctor` package provides CLI) the Ruby-based CLI tool
AsciiDoctorJ (Maven ❌ No The JVM wrapper plugin bridge) (`asciidoctor-maven-plugin`) is not in Debian. Can be replaced by calling the Debian `asciidoctor` CLI from Makefile, or skipped entirely since docs are cosmetic. -------------------------------------------------------------------------------------
**Bottom line**: Replacing Maven with Debian packages alone is **not feasible** because Spring Boot, POI, springdoc-openapi, and Tika have no Debian equivalents. The only realistic path is to make Maven work offline.
------------------------------------------------------------------------
## Proposal: Three-Part Approach
### Part 1 --- Build-Time Offline (Maven)
#### Option A: Commit `maven-repo/` to git (Recommended for simplicity)
**Changes needed:** 1. Add `maven-repo/*.jar`, `maven-repo/*.pom`, `maven-repo/*.sha1`, `maven-repo/*.lastUpdated` to `.gitattributes` with `export-ignore = false` and commit the directory. 2. Alternatively, use a sparse checkout or git-lfs for large binaries. 3. Add an offline guard to the Makefile:
``` makefile # In Makefile, add: MVNOPTS := -Dmaven.repo.local=$(CURDIR)/maven-repo --offline
# Optional: fail-fast if network would be needed: .PHONY: verify-offline verify-offline: @echo "Verifying offline build capability..." $(MVN) $(MVNOPTS) dependency:resolve -DskipTests || \ (echo "ERROR: Missing dependencies. Run 'make populate-maven-repo' online first."; exit 1)
build: verify-offline $(MVN) $(MVNOPTS) clean validate install ```
4. Provide a one-time online bootstrap target for maintainers to keep `maven-repo/` current:
``` makefile .PHONY: populate-maven-repo populate-maven-repo: @echo "Populating local Maven repo (requires Internet)..." $(MVN) -Dmaven.repo.local=$(CURDIR)/maven-repo dependency:go-offline $(MVN) -Dmaven.repo.local=$(CURDIR)/maven-repo dependency:resolve-plugins ```
**Pros**: Simple, works immediately, no build-system changes needed. **Cons**: Bloated git repository (~300 MB of JARs). Consider git-lfs or a separate tarball artifact instead.
#### Option B: Debian policy-compliant approach (Recommended for packaging)
For proper Debian packaging (`dpkg-buildpackage`), the standard approach is:
1. **List all upstream VCS artifacts in `debian/watch`** and use `uscan/udeb` or manually manage them in `debian/source/include-binaries`. 2. **Download all JARs during package build** from Maven Central using the `debian/rules` target, with checksums pinned in `debian/control` or a separate file. 3. Use **Maven's offline mode** (`--offline`) pointing at a pre-populated local repo that was assembled during the online phase of the Debian build.
Debian Java Policy recommends: - Each upstream JAR dependency should be either (a) packaged in Debian as `lib*-java`, or (b) downloaded and built from source within the package build process. - For option (b), use `download-maven-poms` helper script or similar to fetch artifacts.
However, since Spring Boot is not in Debian, **this project cannot be packaged purely from Debian packages**. The practical approach:
``` makefile # debian/rules snippet: %: dh $@ --with javahelper,maven
override_dh_auto_build: # Ensure offline build dh_auto_build -- --offline -Dmaven.repo.local=$(CURDIR)/deps-maven-repo ```
With `debian/control` Build-Depends including all available Debian packages:
Build-Depends: debhelper-compat (=13), dh-buildupdate, default-jdk (>= 17), maven-debian-helper, maven-repo-helper, libantlr4-runtime-java, libcommons-lang3-java, libguava-java, libh2-java, libpostgresql-jdbc-java, libjoda-time-java, libjaxb-java, libbyte-buddy-java, librabbitmq-client-java, libreflections-java
Then supplement with direct downloads for non-packaged dependencies (Spring Boot BOM, POI, Tika, springdoc-openapi). The `maven-repo-helper` tools can download these from Maven Central during the online build phase.
### Part 2 --- Test-Time Offline (Keycloak)
The file `scripts/keycloak-setup-start.sh` downloads Keycloak from GitHub:
``` bash wget https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com...https://github.com/keycloak/keycloak/releases/download/$ ```
**Fixes:**
1. **Bundle Keycloak tarball in the repo** (or ship as a separate artifact): - Download `keycloak-26.0.6.tar.gz` and place it in `scripts/keycloak/`. - Modify script to check for local file first:
``` bash if [ -f "scripts/keycloak/keycloak-${ver}.tar.gz" ]; then echo "Using bundled Keycloak tarball." else wget https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com...https://github.com/keycloak/keycloak/releases/download/$ fi ```
2. **Alternatively**, package Keycloak from Debian: `libkeycloak-admin-rest-client-java` exists, but the full Keycloak server is not packaged. The bundled tarball approach is simpler.
3. **Or skip Keycloak** for unit tests entirely --- run only `make check` (which runs JUnit tests that don't need Keycloak) and document that integration tests require the online setup:
``` makefile check-offline: $(MVN) $(MVNOPTS) test -Dtest='!*IntegrationTest,!*IT' ```
### Part 3 --- Runtime Offline (Tika Language Detection)
Apache Tika's `tika-langdetect` module downloads language detection models on first use. This happens at runtime, not build time.
**Fix:** Set the system property to prevent online download:
``` java // In application configuration or startup code: System.setProperty("org.apache.tika.language.detect.model", "/path/to/local/model.jar"); ```
Or exclude the language detection module from Tika's dependency tree if it's not needed. Check `FileHandlingService.java` and `FileUtilsService.java` --- they use `Tika` for MIME detection and text extraction, which works fine without language detection models. The models are only needed for `LanguageDetector`.
**In pom.xml**, add exclusions if language detection is not used:
``` xml <dependency> <groupId>org.apache.tika</groupId> <artifactId>tika-parsers-standard-package</artifactId> <exclusions> <exclusion> <groupId>com.github.pemistahl</groupId> <artifactId>lingua-language-detector</artifactId> </exclusion> </exclusions> </dependency> ```
------------------------------------------------------------------------
## Proposed Debian Packages to Install
### Required Build Dependencies
default-jdk (>= 21) # Java 21+ compiler and runtime (see JDK note below) maven # Maven build tool libantlr4-runtime-java # ANTLR runtime (already in deps, but available as pkg) git # Version control (for git-commit-id plugin if used)
**JDK version bump required:** `pom.xml` currently sets `<java.version>17</java.version>`. JDK 17 has been removed from Debian Testing --- only JDK 21 (`openjdk-21-jdk`) and JDK 25 (`openjdk-25-jdk`) remain. The fix is straightforward:
``` xml <!-- In pom.xml, change: --> <properties> <java.version>21</java.version> </properties> ```
Spring Boot 3.x supports JDK 17 through 21+ and the codebase doesn't use any Java 17-specific APIs that would break on 21. This is a one-line change in `pom.xml` and updates to `.gitlab-ci.yml` (which references `openjdk-17-jdk`).
### Required Runtime Dependencies
default-jre # Java runtime postgresql # Database backend for production use rabbitmq-server # Message queue (if using AMQP integration profile) keycloak # Auth provider (NOT in Debian — bundle or skip) tesseract-ocr-nor # OCR language data (used by Tika, from .gitlab-ci.yml) unoconv # Document conversion (from .gitlab-ci.yml) libreoffice-core # For document format conversion via unoconv python3 # Test scripts use Python 3 curl # Health checks and API testing jq # JSON processing in test scripts
### Optional / Nice to Have
asciidoctor # For documentation generation (alternative to maven plugin) libh2-java # Embedded database for demo/testing mode
------------------------------------------------------------------------
## Proposed Dependencies That Could Be Dropped
| Dependency | Used For | Can Drop? | Notes |
|---|---|---|---|
| `spring-boot-starter-amqp` | RabbitMQ integration | **Yes, conditionally** | Only needed if mail queue integration is used. Profile-gated via `application-queueintegration.yml`. |
| `spring-boot-starter-validation` | Bean validation (`@NotNull`) | No | Core functionality depends on it. |
| `springdoc-openapi-*` | Swagger/OpenAPI UI docs | **Yes** | Cosmetic/documentation only. Can be excluded for minimal build. |
| `asciidoctor-maven-plugin` | API documentation generation | **Yes** | Only generates HTML docs during package phase, not needed to run the app. |
| `spring-restdocs-*` (test scope) | REST API documentation tests | **Yes** | Test-time doc generation only. |
| `junit-vintage-engine` (test scope) | JUnit 3/4 compatibility in tests | Maybe | Only if all tests migrate to JUnit 5. |
| `spring-boot-starter-webflux` (test scope) | Reactive web client for tests | Maybe | Depends on which tests use it. |
| Tika language detection models | Language identification of documents | **Yes** | MIME type detection works without it. Exclude from classpath or set offline mode property. |
------------------------------------------------------------------------
## Summary: Concrete Steps to Achieve "Deserted Island" Build
### Immediate (low effort, high impact)
1. **Add `--offline` flag to Maven in Makefile**:
    ``` makefile
    MVNOPTS := -Dmaven.repo.local=$(CURDIR)/maven-repo --offline
    ```
2. **Commit `maven-repo/` contents** (or create a tarball artifact):
    ``` bash
    mvn dependency:go-offline -Dmaven.repo.local=$(pwd)/maven-repo
    mvn dependency:resolve-plugins -Dmaven.repo.local=$(pwd)/maven-repo
    # Then commit or tar the directory
    ```
3. **Bundle Keycloak** in `scripts/keycloak/` and update `keycloak-setup-start.sh`.
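4. **Guard against Tika model download**, either through an entry in `application.yml` or through a JVM system property. Neither name below has been checked against the Spring Boot or Tika documentation, so verify both before relying on them:

    ``` yaml
    # Suggested application.yml entry (unverified)
    spring:
      main:
        add-application-context-initializer: true
    ---
    # Or set JVM flag (also unverified): -Dorg.apache.tika.language.detect.model=none
    ```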
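For the tarball variant of step 2, a small helper can pack the populated repository and record a checksum next to it. This is only a sketch under stated assumptions: GNU `tar` and `sha256sum` are available, both arguments are absolute paths, and the function name `bundle_maven_repo` is invented for this example.

```bash
#!/bin/sh
# Pack a populated Maven repository into a tarball and record its SHA-256.
# Usage: bundle_maven_repo /abs/path/maven-repo /abs/path/maven-repo-bundle.tar.gz
# Assumption: both arguments are absolute paths.
bundle_maven_repo() {
    repo="$1"; out="$2"
    [ -d "$repo" ] || { echo "no such directory: $repo" >&2; return 1; }
    # -C makes archive member names start with the repo directory name.
    tar -czf "$out" -C "$(dirname "$repo")" "$(basename "$repo")" || return 1
    # Store the checksum with a bare file name so that
    # `sha256sum -c` works from the directory holding the tarball.
    ( cd "$(dirname "$out")" &&
      sha256sum "$(basename "$out")" > "$(basename "$out").sha256" )
}
```

On the receiving side, `sha256sum -c maven-repo-bundle.tar.gz.sha256` run in the tarball's directory confirms the bundle arrived intact before it is unpacked.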
### Medium-term (proper Debian packaging)
5. **Create proper `debian/` directory** with:
    - `debian/control` listing all Build-Depends
    - `debian/rules` using `dh-sequence-maven` or manual Maven invocation with offline mode
    - `debian/source/include-binaries` for bundled JARs that can't be Debian-packaged
6. **Document the offline build process** in a new `docs/general/OfflineBuild.md`.
### Long-term: Get Into Debian Main (reduce external dependency count)
This section addresses the fundamental conflict between Maven's "download JARs" model and Debian's requirement that all code in `main` be built from source available within Debian.
#### The Problem: Bundled Binary JARs vs DFSG Compliance
Debian Free Software Guidelines (DFSG) and [Java Policy](https://www.debian.org/doc/packaging-manuals/policies/java-policy/) require:

1. **All code must be available in source form**: binary-only JAR blobs are not acceptable for `main`
2. **Build dependencies must themselves be Debian packages**: downloading from Maven Central during build is only acceptable for `non-free`
3. **Each upstream component should be packaged separately** as its own `lib*-java` package
Currently, ~450 JARs are downloaded from Maven Central. Even if their licenses are DFSG-compatible (Apache 2.0, MIT, LGPL), they violate the "buildable from source" requirement because:

- They're pre-built binaries shipped alongside our source
- Their build chain is external to Debian and not reproducible within the archive
#### Three Paths Forward
**Path A: Package Each Dependency Individually (Required for `main`)**
Every non-packaged dependency must become its own Debian package. The ones we need to package are:
| Dependency | License | Packaging Difficulty |
|---|---|---|
| Spring Boot 3.4.x | Apache 2.0 | **Very high**: ~50 transitive modules, each needs separate packaging; depends on Jakarta EE APIs not all in Debian |
| Apache POI 5.4.0 | Apache 2.0 | Medium: single project but large; may already exist as `libpoi-java` (check) |
| Tika parsers 2.8.0 | Apache 2.0 | **Very high**: huge dependency tree including Lucene, XML libraries, etc. |
| springdoc-openapi 1.6.x | Apache 2.0 | Low: single project, straightforward Maven build |
| Keycloak server (test only) | Apache 2.0 | High: not needed in `main` if test-only; bundle as VCS artifact or skip |
This is a **multi-year effort**. Each package needs proper metadata, patches, and maintenance. Spring Boot alone has dozens of modules that would need individual packaging.
**Path B: Use Maven Offline with Online Download Phase (Acceptable for `non-free`)**
For `contrib`/`non-free`, the approach is simpler:

1. `debian/rules` downloads all JARs from Maven Central during an online build phase
2. Checksums are verified against pinned values in a checksum file shipped with the packaging
3. Build runs with `--offline` pointing at the pre-populated local repo
This still requires Internet access during package build, but is acceptable for non-free. The "deserted island" test would pass because the Debian mirror includes these downloaded artifacts as part of the built package.
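The checksum step in Path B could look roughly like the sketch below. It assumes GNU coreutils and findutils, a manifest stored outside the repository directory and referred to by absolute path, and the hypothetical function names `write_manifest` and `verify_manifest`.

```bash
#!/bin/sh
# Write a sorted manifest of SHA-256 checksums for every file in a Maven repo.
# Usage: write_manifest <repo-dir> </abs/path/to/manifest>
# Assumption: the manifest lives outside <repo-dir>.
write_manifest() {
    ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum ) > "$2"
}

# Verify a repo against a manifest written by write_manifest.
# Returns non-zero if a listed file is missing or its content has changed.
# Files that exist in the repo but are absent from the manifest are not detected.
verify_manifest() {
    ( cd "$1" && sha256sum --quiet -c "$2" )
}
```

The manifest would be generated once while online and committed with the packaging; the offline build then calls `verify_manifest` before invoking Maven with `--offline`.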
**Path C: Hybrid Approach (Recommended Near-Term)**
Use a combination:

1. **Replace bundled JARs with Debian packages** where available (ANTLR, Guava, H2, etc.), which reduces the JAR count from ~450 to ~350
2. **Package the critical non-packaged ones ourselves**: springdoc-openapi (Apache 2.0, easy), any others that are small and well-maintained
3. **Bundle remaining JARs** for now with proper licensing documentation, targeting `non-free` initially
4. **Work upstream** to get Spring Boot packaged in Debian, since this is the blocker for everything
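To estimate how many bundled artifacts Debian already covers, the two repository trees can be compared by group and artifact path. This is a rough sketch: it assumes both trees use the standard Maven layout (`group/path/artifactId/version/file.jar`), that installed Debian Java libraries appear under `/usr/share/maven-repo`, and it ignores version differences entirely. The function names are invented for this example.

```bash
#!/bin/sh
# List "group/path/artifactId" for every JAR in a Maven repository layout.
# Symlinked JARs are included on purpose (no -type f filter).
list_artifacts() {
    ( cd "$1" && find . -name '*.jar' ) |
        sed -e 's:^\./::' -e 's:/[^/]*/[^/]*$::' | sort -u
}

# Print the artifacts that exist in both repositories, ignoring versions.
# Usage: artifacts_in_both <bundled-repo> <debian-repo>
artifacts_in_both() {
    a=$(mktemp) || return 1
    b=$(mktemp) || { rm -f "$a"; return 1; }
    list_artifacts "$1" > "$a"
    list_artifacts "$2" > "$b"
    comm -12 "$a" "$b"
    rm -f "$a" "$b"
}
```

Running `artifacts_in_both maven-repo /usr/share/maven-repo | wc -l` on a machine with the candidate `lib*-java` packages installed gives a first number to hold against the ~100 JAR reduction estimated above; version compatibility still has to be checked by hand.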
#### Concrete Steps for Path C
``` makefile
# debian/rules approach:
# Use Debian packages where available via classpath
# Download remaining from Maven Central (online phase only)
override_dh_auto_build:
	dh_auto_build -- -Dmaven.repo.local=$(CURDIR)/.deps-repo --offline
```
With `debian/control`:
    Build-Depends: debhelper-compat (= 13),
                   default-jdk (>= 21),
                   maven-debian-helper,
                   libantlr4-runtime-java,
                   libguava-java,
                   libh2-java,
                   libpostgresql-jdbc-java,
                   libjoda-time-java,
                   libjaxb-java,
                   libbyte-buddy-java,
                   librabbitmq-client-java,
                   libreflections-java,
    # Replace Maven deps with Debian packages where available
    Standards-Version: 4.6.2
And `debian/copyright` documenting all bundled JAR licenses.
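To check a `Build-Depends` list against a local mirror, it helps to reduce the field to bare package names first. The sketch below is a simplified parser, not a full implementation of the control-file syntax: it drops version constraints, architecture and profile qualifiers, and keeps only the first alternative of any `a | b` pair. The function name `build_depends` is hypothetical.

```bash
#!/bin/sh
# Print one package name per line from the Build-Depends field of a
# debian/control file. Comment lines (starting with '#') are skipped.
build_depends() {
    awk '
        /^#/ { next }
        /^Build-Depends:/ { infield = 1; sub(/^Build-Depends:/, ""); print; next }
        infield && /^[ \t]/ { print; next }   # continuation line of the field
        { infield = 0 }                       # any other field ends it
    ' "$1" |
        tr ',' '\n' |
        sed -e 's/([^)]*)//g' -e 's/\[[^]]*\]//g' -e 's/<[^>]*>//g' \
            -e 's/|.*//' -e 's/[[:space:]]//g' |
        grep -v '^$'
}
```

The output can then be fed to a tool such as `apt-cache policy` or `dpkg-query -W` to see which names resolve on the offline mirror.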
#### Impact on "Deserted Island" Goal
The "deserted island" goal is **achievable now** for building and running the application, even without Debian main packaging:

- A complete Debian mirror + our source repo + pre-populated `maven-repo/` = offline build works
- The barrier to Debian `main` is separate from the ability to build/run offline
The two goals should be tracked separately:

1. **Offline build capability** (this document): achievable with bundled artifacts
2. **Debian main compliance**: requires packaging all dependencies or dropping Spring Boot
7. **Evaluate dropping springdoc-openapi and asciidoctor-maven-plugin** from default build, moving them to an optional profile.
8. **Profile-gate AMQP integration** more clearly so it's not pulled in by default.
9. **Replace Tika with Debian-packaged alternatives** where possible (e.g., `file` command for MIME detection, `tesseract-ocr` for OCR) --- significant refactoring needed.
------------------------------------------------------------------------
## Verification Checklist
To verify the "deserted island" build works:
``` bash
# 1. Start with fresh clone + bundled maven-repo tarball
git clone <repo> && cd nikita-noark5-core-upstream
tar xzf ../maven-repo-bundle.tar.gz   # or already in git

# 2. Ensure no network connectivity
iptables -A OUTPUT -p tcp --dport 80 -j DROP
iptables -A OUTPUT -p tcp --dport 443 -j DROP

# 3. Build offline
make build check

# 4. Verify success
test -f target/nikita-noark5-core-*.jar && echo "BUILD SUCCESS"

# 5. Restore network
iptables -D OUTPUT -p tcp --dport 80 -j DROP
iptables -D OUTPUT -p tcp --dport 443 -j DROP
```
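If `make build check` succeeds with outbound HTTP and HTTPS (TCP ports 80 and 443) blocked, the deserted island test passes. Note that the rules above block only those two ports, not all network traffic.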
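As a complementary check that needs no firewall changes, a saved build log can be scanned for Maven's download messages. This sketch assumes Maven 3.x log lines of the form `Downloading from <repo-id>: <url>` (optionally prefixed with `[INFO] ` in batch mode); the function name `check_no_downloads` and the log file name are made up for the example.

```bash
#!/bin/sh
# Return non-zero if a Maven build log shows any remote download activity.
# Usage: check_no_downloads build.log
check_no_downloads() {
    if grep -E '^(\[INFO\] )?Download(ing|ed) from ' "$1"; then
        echo "network access detected in $1" >&2
        return 1
    fi
    return 0
}
```

A possible use is `make build check 2>&1 | tee build.log` followed by `check_no_downloads build.log`, which lists every offending artifact and so shows what is still missing from `maven-repo/`.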
-- Best regards, Petter Reinholdtsen
[Thomas John Sødring]
I am not sure it is a **multi-year effort**, but I do think it is a rabbit hole with a lot to get familiar with (as the drivelbot says). Otherwise I agree with what the drivelbot says about removing dependencies such as the queue, if there is no use for a queue when running locally.
If you take on responsibility for new packages in Debian and upload the missing dependencies, you have to expect to be responsible for maintaining them for several years to come, so yes, it is indeed a long-term commitment.
The report inspired me to check whether any dependencies can be dropped. There are two unused and two useless dependencies (for example a separate library just for finding the current year, which works equally well without the library), and I propose dropping them. Faster builds, a smaller attack surface, and less work in the future to get Nikita into Debian. :)
What might also be interesting to look at is whether the drivelbot can upgrade the service interface (tjenestegrensesnitt) from 5.5 to 5.6, and whether it manages to find ambiguities in the description of the service interface. Sometimes an extra pair of eyes is useful, and the drivelbots are good at that.
I have thought about the latter but not tried it yet. I assume it will find ambiguities too, given how easily I find weaknesses myself almost every time I read the specification closely. :)
I have little faith in it writing the full specification for the change from Noark 5.5 to 5.6, but it could be fun to try.