[Kevin Patrick Scannell]
Right - generally speaking I only provide the data to open source projects.
OK. The problem with this is that we, as a free software project, have no way to keep the info we get from you away from non-free software projects. To make sure everyone willing to contribute to our project has the means to do so, we need to make sure all our sources and background material are publicly available. We already tried the alternative: when the last maintainer went missing, no one else had the background material, so the spell checker could not really be updated for 5 years. If you want us to keep your data non-public, it will in reality be unavailable to all potential (and probably some current) contributors to the spell checking project. So what do you mean when you say you only provide data to open source projects? Are projects like ours supposed to keep the information you provide away from non-open source projects, or can we make it publicly available for everyone?
What we've done for some languages in similar situations is convince the dictionary project to make some of their data available in return for access to the corpora - maybe you could explore this possibility with them.
I will for sure explore that possibility, but it would only solve part of the problem. :)
Actually I meant the latest development version of your spell checking packages, word list + affix file, etc.
Right. Those are available from <URL:https://alioth.debian.org/projects/spell-norwegian/>. The build system is a bit special, so you will have to extract the bokmål and nynorsk words from norsk.words. :)
I used the existing aspell dictionaries to train the web crawler, but it looks like those were created by someone else and are outdated.
Yes. The aspell package floating around is based on the old source of the spell checking package.
I can send you raw frequency lists but those aren't all that useful since they contain a lot of "pollution" - my software works in part by trying to filter out the pollution by statistical means. This is why having your latest version is useful.
Having a look at the raw frequency list would be useful for me, to see which words in the current package could use an updated frequency value. Please post the URL to i18n-no@.
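To illustrate the kind of comparison I have in mind, here is a minimal sketch. It assumes the raw frequency list has one "word count" pair per line and that the current word list is available as a plain set of words; both formats are assumptions on my part, and the function names are made up for this example. It only splits the list by word-list membership and is not the statistical filtering described above.

```python
from collections import Counter


def parse_frequency_list(lines):
    """Parse lines of the assumed form 'word count' into a Counter.

    Lines that do not have exactly two fields, or whose second field
    is not a non-negative integer, are skipped.
    """
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        counts[parts[0]] += int(parts[1])
    return counts


def split_by_wordlist(counts, known_words):
    """Split frequency counts into words found in the word list and the rest.

    'known' holds candidates for an updated frequency value;
    'unknown' holds either missing words or pollution, to be reviewed by hand.
    """
    known = {w: c for w, c in counts.items() if w in known_words}
    unknown = {w: c for w, c in counts.items() if w not in known_words}
    return known, unknown


if __name__ == "__main__":
    # Hypothetical sample data, not taken from any real frequency list.
    raw = ["og 1200", "ikkje 300", "teh 4", "garbage"]
    wordlist = {"og", "ikkje"}
    known, unknown = split_by_wordlist(parse_frequency_list(raw), wordlist)
    print("in word list:", known)
    print("not in word list:", unknown)
```

Words that land in the second group are either missing from the package or pollution, so they would still need manual review.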