Recent Discussions in LingPipe
To join this group, send a blank email to LingPipe-subscribe@yahoogroups.com or visit http://groups.yahoo.com/group/LingPipe
-
- Re: Chinese Model Quality
- ... We did this for Chinese in the past by extending sentences.HeuristicSentenceModel with the appropriate end tokens for Chinese and using the...
- - Thu, 20 Nov 2008 19:28:41 GMT
-
- Re: Chinese Model Quality
- Thanks, Bob. The goal of the English-Chinese word alignment is to create some TMX files for the "translation memory" tools used by translators. We have some MT...
- - Thu, 20 Nov 2008 15:36:44 GMT
-
- Re: Chinese Model Quality
- ... Usually longer n-grams mean more accuracy, up to a point at which accuracy plateaus. Longer n-grams can overfit in some situations compared to shorter...
- - Wed, 19 Nov 2008 18:12:58 GMT
-
- Re: Chinese Model Quality
- Hi Bob, Thanks for replying. Does a longer n-gram model mean more accuracy? How do I prune low-count sequences out of a model using LingPipe? I have some...
- - Tue, 18 Nov 2008 19:49:25 GMT
-
- Re: Chinese Model Quality
- ... The other way to control model size is to take longer n-grams and prune out low-count sequences. If you follow the tutorial, you'll see where we run standard...
- - Sat, 15 Nov 2008 02:08:15 GMT
-
- Chinese Model Quality
- Hi Bob, I have a question on model quality. I used the ChineseToken sample to generate a words-zh-as.CompiledSpellChecker model, which has a size of 78,303 KB. I...
- - Fri, 14 Nov 2008 18:11:36 GMT
-
- Re: Chinese Token Demo Bug patch (CompiledSpellChecker.setTokenSet)
- Thanks much, Bob. The patch fixed my problem.
- - Fri, 14 Nov 2008 16:38:38 GMT
-
- Re: Chinese Token Demo Bug patch (CompiledSpellChecker.setTokenSet)
- ... It sure does. Thanks for the detailed bug report. The culprit is the following file: $LINGPIPE/src/com/aliasi/spell/CompiledSpellChecker.java The method...
- - Thu, 13 Nov 2008 21:12:23 GMT
-
- ChineseToken tutorial sample in version 3.6.0 doesn't work
- I recently downloaded LingPipe 3.6.0 and tried the ChineseToken tutorial sample. I always got a NullPointerException. My environment is: - Windows XP - Java...
- - Wed, 12 Nov 2008 20:21:37 GMT
-
- Re: v 3.0 clustering API backward incompatibility with 2.4.1
- As of 3.0, the chunking interface completely changed so it's no longer backward compatible with 2.x code. The last version of LingPipe to support the...
- - Tue, 28 Oct 2008 15:25:48 GMT
-
- Re: Spell checking and edit distance
- ... I should make this clearer in the doc for each operation. ... Right. It's the noisy-channel setup, so it's edits going from the suggested term to the...
- - Thu, 23 Oct 2008 17:00:29 GMT
-
- Spell checking and edit distance
- Hello, I'm drawing a blank. In the context of spell checking, is the edit distance used between the user-entered term and the suggested term, or the reverse? I...
- - Thu, 23 Oct 2008 14:42:47 GMT
-
- Re: NER for multi language documents
- ... Nothing free and fast, I'm afraid. We don't have corpora in French or German. Spanish is easy -- you can get it from the CoNLL data. You can get Spanish...
- - Tue, 14 Oct 2008 17:45:09 GMT
-
- Re: NER for multi language documents
- Dear Bob, my problem is multilingual in the sense that I handle documents that can be written in German, French, Spanish and so on; each document is written...
- - Tue, 14 Oct 2008 17:33:32 GMT
-
- Re: NER for multi language documents
- ... We don't distribute any data, but our named entity tutorial points to some sources of data. ELRA and LDC also distribute data, but it's expensive. Most...
- - Tue, 14 Oct 2008 17:17:43 GMT