Archiving web pages as PDF...

Petter Reinholdtsen pere at hungry.com
Fri Jul 7 12:12:15 CEST 2017


A while back, Alexander mentioned that he expected my simple
archive-file tool to be able to take a URL and upload the web page into
the archive.  It is a simple and obvious idea, but archive-file is
currently not able to do so.

But while looking for a free software tool to format my song book[1] as
PDF, I came across a tool that might be able to help us.  The
wkhtmltopdf tool[2] can download a web page and render it as a PDF
using the Qt WebKit rendering engine in headless mode.  With it we
should be able to specify the URL of a publicly available web page and
stash it in the archive, with the content as it would show up in a
WebKit-based web browser.
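A rough sketch of how such a step could be scripted, assuming only the
basic wkhtmltopdf command line form (input URL followed by output PDF
path).  The helper names build_wkhtmltopdf_command and
render_url_to_pdf are hypothetical and not part of archive-file or
wkhtmltopdf.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def build_wkhtmltopdf_command(url: str, output_path: str) -> list[str]:
    """Hypothetical helper: argument list for rendering `url` to a PDF.

    Uses only the basic wkhtmltopdf form: input (URL or file), then the
    output PDF path, both as positional arguments.
    """
    return ["wkhtmltopdf", url, output_path]


def render_url_to_pdf(url: str, output_path: str) -> Path:
    """Hypothetical helper: render a public web page to PDF via wkhtmltopdf.

    Raises FileNotFoundError if wkhtmltopdf is not on PATH, and
    subprocess.CalledProcessError if wkhtmltopdf exits non-zero.  Note that
    wkhtmltopdf may exit non-zero when some page resources fail to load;
    this sketch treats any non-zero exit as a failure.
    """
    if shutil.which("wkhtmltopdf") is None:
        raise FileNotFoundError("wkhtmltopdf is not installed or not on PATH")
    subprocess.run(build_wkhtmltopdf_command(url, output_path), check=True)
    return Path(output_path)
```

For example, render_url_to_pdf("http://www.hungry.com/~pere/cs-songbook/",
"songbook.pdf") would produce a PDF file that could then be handed to
archive-file like any other document; how archive-file would accept it
is left open here.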

 [1] <URL: http://www.hungry.com/~pere/cs-songbook/ >
 [2] <URL: https://wkhtmltopdf.org/ >

It is not optimal.  I would prefer to archive web pages as web pages,
to be able to see exactly what the web page looked like on the inside,
but at least something is stored that we can be reasonably sure will
still be viewable in 100 years.

-- 
Happy hacking
Petter Reinholdtsen


More information about the nikita-noark mailing list