On 12:54 Mon 02 Jan, Trond Trosterud wrote:
There are two Sámi projects, http://divvun.no and http://giellatekno.uit.no. Both are open source, GPL, both will make everything available. We just don't think things are ready enough to just put up a download link, but interested parties may get copies of source code already now.
I see - I'd definitely like to see what you have so far.
And I understand about the Xerox stuff. There's a similar situation for Irish - a morphological analyzer was developed at the Irish Linguistics Institute (ITÉ) with the Xerox tools. I developed a separate, completely open source version to use for my grammar checking stuff.
So as long as you make the Sámi transducer sources available freely, I'm satisfied (and I'll be excited to look at what you've done since I have a particular interest in morphology).
As is clear from the above, the Sámi projects do not work like that. We have a transducer with a lexicon and a morphological component. We also have corpora from which it is possible to make a list of wordforms (not words, i.e., not lemmas, and also not full paradigms). So on this point some elaboration on what you mean is probably needed.
My code is designed to work with an existing Xspell package (even if it amounts to a simple word list with no affix file, which is how most languages start out). So for me to offer any non-trivial help I'd have to work your transducer into my system, which mightn't be too hard if you're interested.
What we would like to get access to is the text corpus you have gathered from the web, but I take it that my colleague Børre will return to you on that issue.
OK, well just sending you raw data is the easiest thing for me! Let me know.
Best,
Kevin