Recent Discussions in LingPipe


To join this group, send a blank email to LingPipe-subscribe@yahoogroups.com or visit http://groups.yahoo.com/group/LingPipe



  1. Re: Chinese Model Quality
    ... We did this for Chinese in the past by extending sentences.HeuristicSentenceModel with the appropriate end tokens for Chinese and using the...
    - Thu, 20 Nov 2008 19:28:41 GMT
  2. Re: Chinese Model Quality
    Thanks, Bob. The goal of doing English-Chinese word alignment is to create some TMX files for "translation memory" tools used by translators. We have some MT...
    - Thu, 20 Nov 2008 15:36:44 GMT
  3. Re: Chinese Model Quality
    ... Usually longer n-grams mean more accuracy, up to a point at which accuracy plateaus. Longer n-grams can overfit in some situations compared to shorter...
    - Wed, 19 Nov 2008 18:12:58 GMT
  4. Re: Chinese Model Quality
    Hi Bob, Thanks for replying. Does a longer n-gram model mean more accuracy? How do I prune out low-count sequences from the model using LingPipe? I have some...
    - Tue, 18 Nov 2008 19:49:25 GMT
  5. Re: Chinese Model Quality
    ... The other way to control model size is to take longer n-grams and prune out low-count sequences. If you follow the tutorial, you'll see where we run standard...
    - Sat, 15 Nov 2008 02:08:15 GMT
  6. Chinese Model Quality
    Hi Bob, I have a question on model quality. I used the ChineseToken sample to generate a words-zh-as.CompiledSpellChecker model, which has a size of 78,303 KB. I...
    - Fri, 14 Nov 2008 18:11:36 GMT
  7. Re: Chinese Token Demo Bug patch (CompiledSpellChecker.setTokenSet)
    Thanks much, Bob. The patch fixed my problem. ...
    - Fri, 14 Nov 2008 16:38:38 GMT
  8. Re: Chinese Token Demo Bug patch (CompiledSpellChecker.setTokenSet)
    ... It sure does. Thanks for the detailed bug report. The culprit is the following file: $LINGPIPE/src/com/aliasi/spell/CompiledSpellChecker.java The method...
    - Thu, 13 Nov 2008 21:12:23 GMT
  9. ChineseToken tutorial sample in version 3.6.0 doesn't work
    I recently downloaded LingPipe 3.6.0 and tried the ChineseToken tutorial sample. I always got a NullPointerException. My environment is: - Windows XP - Java...
    - Wed, 12 Nov 2008 20:21:37 GMT
  10. Re: v 3.0 clustering API backward incompatibility with 2.4.1
    As of 3.0, the chunking interface completely changed so it's no longer backward compatible with 2.x code. The last version of LingPipe to support the...
    - Tue, 28 Oct 2008 15:25:48 GMT
  11. Re: Spell checking and edit distance
    ... I should make this clearer in the doc for each operation. ... Right. It's the noisy-channel setup, so it's edits going from the suggested term to the...
    - Thu, 23 Oct 2008 17:00:29 GMT
  12. Spell checking and edit distance
    Hello, I'm drawing a blank. In the context of spell checking, is the edit distance used between the user-entered term and the suggested term, or the reverse? I...
    - Thu, 23 Oct 2008 14:42:47 GMT
  13. Re: NER for multi language documents
    ... Nothing free and fast, I'm afraid. We don't have corpora in French or German. Spanish is easy -- you can get it from the CoNLL data. You can get Spanish...
    - Tue, 14 Oct 2008 17:45:09 GMT
  14. Re: NER for multi language documents
    Dear Bob, my problem is multilingual in the sense that I handle documents that can be written in German, French, Spanish and so on; each document is written...
    - Tue, 14 Oct 2008 17:33:32 GMT
  15. Re: NER for multi language documents
    ... We don't distribute any data, but our named entity tutorial points to some sources of data. ELRA and LDC also distribute data, but it's expensive. Most...
    - Tue, 14 Oct 2008 17:17:43 GMT