[i18n-no] Re: [i18n-sme]Re: Access to you Norwegian and Nothern Saami word collection?

2 Jan 2006


Petter Reinholdtsen wrote on 2 Jan 2006 at 10:02:
...
[Kevin Patrick Scannell]
...
Yes, I believe Børre Gaup wrote to me about this last year some
time.
Right.  I suspect he got too busy to follow up on it. :)
He will return to this; we are following your website.
...
...
I don't make the word/frequency lists available on the web because
(ironically) I intend them only to be used for open source projects.
Heh.  Quite ironic, yes, when one considers the freedom aspect of it
all. :)
Does this mean that you are not interested in making the information
available to non-free projects?
Kevin may answer for himself, but I read him as wanting to give the
lists to projects he knows adhere to open source. In that case, it is
easy, since all projects involved in this discussion (both the nno/nob
spellers and the Sámi speller project) are open source projects (but
cf. below).
...
...
Can you tell me a bit about the licensing you'll be using for the
spell checkers?  As I recall there was some kind of morphological
back end being written for Saami - will that be open source also or
will you use it to generate a large word list offline?  Are you
writing affix files too?
There are two Sámi projects, http://divvun.no and
http://giellatekno.uit.no. Both are open source (GPL), and both will make
everything available. We just don't think things are ready enough to
simply put up a download link, but interested parties can already get
copies of the source code. The one notable proviso is that the core
analysers are compiled with Xerox compilers (twolc, lexc, xfst). These
compilers belong to Xerox and are not open source; we do not have
access to their source code, only to the binary versions. But the
binaries are available to all buyers of the book at
http://www.fsmbook.com/, so the open-source Sámi morphological
transducers may be modified, compiled and run by anyone.
As for affix files, those belong to a different technology, the Xspell
family. The alpha version of the divvun project was made on schedule, as
an aspell spellchecker, in August, but we haven't distributed it, since
we would like to get past some more basic problems before we invite
testers to look at it. Interested parties may get a version, though.
This is all documented on our web pages, cf.
http://divvun.no/doc/proof/spelling/X-spell/aspell.html.
...
...
Anyway, if you send me your latest word lists for all three
languages (with affix flags expanded, if any) I can send lists of
"best candidates" for addition that are determined via some naive
statistics.
As is clear from the above, the Sámi projects do not work like that.
We have a transducer with a lexicon and a morphological component. We
also have corpora from which it is possible to make lists of word forms
(not words, i.e., not lemmas, and also not full paradigms). So on
this point some elaboration of what you mean is probably needed.
What we would like to get access to is the text corpus you have
gathered from the web, but I take it that my colleague Børre will
get back to you on that issue.
Trond.
----------------------------------------------------------------------
Trond Trosterud                                        t +47 7764 4763
Institutt for språkvitskap, Det humanistiske fakultet  m +47 950 70140
N-9037 Universitetet i Tromsø, Noreg                   f +47 7764 5216
Trond.Trosterud (a) hum.uit.no          http://www.hum.uit.no/a/trond/
----------------------------------------------------------------------
