On Mon, 9 January 2006 14:44, Kevin Patrick Scannell wrote:
On 08:34 Mon 09 Jan , Petter Reinholdtsen wrote:
[Kevin Patrick Scannell]
Right - generally speaking I only provide the data to open source projects.
OK. The problem with this is that we as a free software project have no way to keep the info we get from you away from non-free software projects.
Sure you do: just don't put all of the texts on the web. Run-of-the-mill contributors to the project have no need for the unprocessed corpora. You're welcome to put frequency lists, etc. up for others to use, or word lists for contributors to check.
Right. Those are available from URL:https://alioth.debian.org/projects/spell-norwegian/. The build system is a bit special, so you will have to extract the bokmål and nynorsk words from norsk.words. :)
Thanks, I'll have a look.
I can send you raw frequency lists but those aren't all that useful since they contain a lot of "pollution" - my software works in part by trying to filter out the pollution by statistical means. This is why having your latest version is useful.
Having a look at the raw frequency list would be useful for me, to see which words in the current package could use an updated frequency value. Please post the URL to i18n-no@.
Temporary link:
http://borel.slu.edu/obair/nbnnse.zip
Frequencies are based on corpora of 1.38M words (nb), 3.06M words (nn), and 1.99M words (se).
Thank you, Kevin! I'll run the Sámi part of the list through our transducer and report the results to you.
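
The check Petter describes above, seeing which words in the current package have a count in the raw frequency list, could be sketched roughly as follows. This is only an illustration: the thread does not describe the file format inside nbnnse.zip, so the "<count> <word>" line layout is an assumption, and the function names parse_frequency_list and compare_with_package are hypothetical. Words that appear in the corpus but not in the package are reported separately, since some of them are likely to be the "pollution" Kevin mentions.

```python
from collections import Counter


def parse_frequency_list(lines):
    """Parse lines of the ASSUMED form '<count> <word>' into a Counter.

    The real layout of the raw lists is not given in the thread;
    adjust the parsing to match the actual files.
    """
    freqs = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        freqs[parts[1]] += int(parts[0])
    return freqs


def compare_with_package(package_words, corpus_freqs):
    """Split corpus counts into words the package knows and words it does not.

    'known' holds candidates for an updated frequency value;
    'unknown' holds either missing words or corpus pollution.
    """
    known = {w: c for w, c in corpus_freqs.items() if w in package_words}
    unknown = {w: c for w, c in corpus_freqs.items() if w not in package_words}
    return known, unknown


if __name__ == "__main__":
    # Tiny made-up sample, not data from the actual corpora.
    sample = ["120 og", "95 i", "3 xyzzy", "", "bad line here"]
    freqs = parse_frequency_list(sample)
    known, unknown = compare_with_package({"og", "i", "hus"}, freqs)
    print("in package:", known)
    print("not in package:", unknown)
```

In practice the package word set would be read from the extracted bokmål or nynorsk word list, and the lines from one of the files in the zip archive.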