Class WordBoundaryEntropyEncoder
java.lang.Object
org.xbib.interlibrary.catalog.matching.string.WordBoundaryEntropyEncoder
- All Implemented Interfaces:
StringEncoder
A word-boundary-aware entropy encoder.
Each character at the beginning of a word or at the end of the string is kept in upper case; the
remaining characters collapse into lower case. This helps preserve words.
Boundary information is preserved because errors are very rare at word boundaries, and because
keeping these characters compensates for the overly eager dropping of characters by the frequency filter.
A single character can also form a word of its own in a journal title, e.g.
"Physical Review A" and "Physical Review B"; such characters are preserved this way.
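The boundary rule described above can be illustrated with a minimal sketch. This is not the library's implementation: the class name `WordBoundaryEntropySketch` is hypothetical, and the sketch assumes that whitespace is the only word separator, that "end of string" means the last character of the whole input, and that non-boundary characters are kept only if they occur exactly once in the lower-cased string.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration class, not part of org.xbib.
final class WordBoundaryEntropySketch {

    static String encode(String s) {
        String lower = s.toLowerCase(Locale.ROOT);

        // Count how often each character occurs in the lower-cased string.
        Map<Character, Integer> freq = new HashMap<>();
        for (char c : lower.toCharArray()) {
            freq.merge(c, 1, Integer::sum);
        }

        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lower.length(); i++) {
            char c = lower.charAt(i);
            if (Character.isWhitespace(c)) {
                continue; // space characters are always dropped
            }
            // Assumption: a word starts at index 0 or right after whitespace.
            boolean wordStart = i == 0 || Character.isWhitespace(lower.charAt(i - 1));
            boolean stringEnd = i == lower.length() - 1;
            if (wordStart || stringEnd) {
                // Boundary characters are always kept, in upper case.
                sb.append(Character.toUpperCase(c));
            } else if (freq.get(c) == 1) {
                // Other characters survive only if they occur exactly once.
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Under these assumptions, `encode("Physical Review A")` yields `PhysclRvwA` and `encode("Physical Review B")` yields `PhyscalRvwB`: the lone series letter survives in both keys because it starts a word.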
Inspired by:
E. J. Yannakoudakis, F. H. Ayres and J. A. W. Huggill. Character coding for
bibliographical record control. Computer Centre, University of Bradford, 1980.
E. J. Yannakoudakis. Derived search keys for bibliographic
retrieval. SIGIR Forum 17, 4 (Jun. 1983), 220-237.
-
Constructor Summary
Constructors -
Method Summary
-
Constructor Details
-
WordBoundaryEntropyEncoder
public WordBoundaryEntropyEncoder()
-
-
Method Details
-
encode
Encode a string by a simple entropy-based method. Strategy: count the characters in the lower-cased string, select only characters with a frequency of 1, and drop space characters.
- Specified by:
encode
in interface StringEncoder
- Parameters:
s
- the string to encode
- Returns:
- encoded string
- Throws:
EncoderException
- if encoding fails
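For comparison, the frequency strategy stated for this method can be sketched on its own, without any word-boundary handling. The class name `EntropyStrategySketch` is hypothetical, and the code is only one reading of the three stated steps (lower-case, keep characters with a frequency of 1, drop spaces); it does not reproduce the library's code or its `EncoderException` handling.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration class, not part of org.xbib.
final class EntropyStrategySketch {

    static String encode(String s) {
        String lower = s.toLowerCase(Locale.ROOT);

        // Step 1: count characters in the lower-cased string.
        Map<Character, Integer> freq = new HashMap<>();
        for (char c : lower.toCharArray()) {
            freq.merge(c, 1, Integer::sum);
        }

        // Steps 2 and 3: keep characters that occur exactly once, drop spaces.
        StringBuilder sb = new StringBuilder();
        for (char c : lower.toCharArray()) {
            if (!Character.isWhitespace(c) && freq.get(c) == 1) {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

With this plain strategy, `encode("Physical Review A")` returns `physclrvw`: the series letter "A" is lost because "a" also occurs in "Physical". That loss is the case the word-boundary rule is meant to prevent.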
-