java.lang.Object
org.xbib.interlibrary.catalog.matching.string.WordBoundaryEntropyEncoder
All Implemented Interfaces:
StringEncoder

public class WordBoundaryEntropyEncoder extends Object implements StringEncoder
A word-boundary-aware entropy encoder. Each character at the beginning of a word or at the end of the string is kept in upper case; the remaining characters are collapsed into lower case. This helps to preserve words. Boundary information is worth preserving because errors are very rare at word boundaries, and keeping it compensates for the overly eager dropping of characters by the single-frequency rule. Journal titles may also contain lone characters that form a word of their own, e.g. "Physical Review A" and "Physical Review B"; these are preserved this way.
Inspired by:
E. J. Yannakoudakis, F. H. Ayres and J. A. W. Huggill. Character coding for bibliographical record control. Computer Centre, University of Bradford, 1980.
E. J. Yannakoudakis. Derived search keys for bibliographic retrieval. SIGIR Forum 17, 4 (Jun. 1983), 220-237.
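The boundary rule described above can be illustrated with a minimal sketch. It is not the code of this class: the class name BoundaryMarkSketch and the helper markBoundaries are hypothetical, and the sketch assumes that "word beginning" means the first character of the string or any character following whitespace, and that "end of string" means the last character only.

```java
class BoundaryMarkSketch {

    // Hypothetical helper (not part of the documented API): upper-case the
    // first character of every word and the last character of the string,
    // lower-case every other character. Whitespace is copied unchanged.
    static String markBoundaries(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        boolean atWordStart = true;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                sb.append(c);
                atWordStart = true;
            } else {
                boolean atStringEnd = i == s.length() - 1;
                sb.append(atWordStart || atStringEnd
                        ? Character.toUpperCase(c)
                        : Character.toLowerCase(c));
                atWordStart = false;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(markBoundaries("physical review a"));  // Physical Review A
        System.out.println(markBoundaries("journal of physics")); // Journal Of PhysicS
    }
}
```

Under these assumptions the lone "A" or "B" of a journal title is always a word beginning, so it stays in upper case and can be told apart from ordinary interior characters.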
  • Constructor Details

    • WordBoundaryEntropyEncoder

      public WordBoundaryEntropyEncoder()
  • Method Details

    • encode

      public String encode(String s) throws EncoderException
      Encodes a string with a simple entropy-based method. Strategy: count the characters of the lower-cased string, keep only the characters with a frequency of 1, and drop space characters.
      Specified by:
      encode in interface StringEncoder
      Parameters:
      s - the string to encode
      Returns:
      encoded string
      Throws:
      EncoderException - if encoding fails
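      The strategy can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: the class EntropyEncodeSketch is hypothetical, it assumes that boundary characters (word beginnings and the last character of the string) are always kept in upper case regardless of their frequency, that frequencies are counted over the whole lower-cased string, and it does not model EncoderException.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class EntropyEncodeSketch {

    // Hypothetical stand-in for encode(String); the real method may differ.
    static String encode(String s) {
        String lower = s.toLowerCase(Locale.ROOT);

        // Step 1: count every character of the lower-cased string.
        Map<Character, Integer> freq = new HashMap<>();
        for (char c : lower.toCharArray()) {
            freq.merge(c, 1, Integer::sum);
        }

        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lower.length(); i++) {
            char c = lower.charAt(i);
            if (Character.isWhitespace(c)) {
                continue; // Step 3: whitespace is dropped.
            }
            boolean wordStart = i == 0 || Character.isWhitespace(lower.charAt(i - 1));
            boolean stringEnd = i == lower.length() - 1;
            if (wordStart || stringEnd) {
                // Assumption: boundary characters bypass the frequency filter.
                sb.append(Character.toUpperCase(c));
            } else if (freq.get(c) == 1) {
                // Step 2: interior characters survive only if they occur once.
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Physical Review A")); // PhysclRvwA
        System.out.println(encode("Physical Review B")); // PhyscalRvwB
    }
}
```

      In this sketch the two journal titles yield different keys because the trailing letter is a word beginning and is therefore retained. Without the boundary rule, the "a" in "Physical Review A" would be dropped, since it occurs twice in the lower-cased string.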