[Kevin Patrick Scannell]
Sure you do: just don't put all of the texts on the web. Run-of-the-mill contributors to the project have no need for the unprocessed corpora.
My point is that we have no way to know who is going to be deeply involved and who will be just a passing contributor. To make sure everyone willing and able to become deeply involved can do so without having to be trained by the "know-hows" and given extra access, all the information used to maintain the spell checker needs to be publicly available. Yes, of course we could keep a secret archive of extra information, but then we run the risk of killing the project when the few with access to that archive leave the project.
This has already happened once with this spell checking project, when Rune Kleveland started working and lost access to the project web page. That stopped development for almost five years. He had (and probably still has) access to lots of extra data, and since no one else had this data, we had a really hard time continuing development. I do not want us to end up in that situation again, and therefore believe we should base the spell checking work only on publicly available sources.
You're welcome to put frequency lists, etc. up for others to use, or word lists for contributors to check.
Good. If that is the public info we can get from you, it will come in very handy. :)
Temporary link:
Thank you. I've downloaded it and will put it on the web pages soon. I will need to massage the scripts before I can use the numbers to update the frequency info in norsk.words. :)