(information science, human language) A program or
algorithm which determines the morphological root of a given inflected
(or, sometimes, derived) word form -- generally a written word
form.
A stemmer for English, for example, should identify the
string "cats" (and possibly "catlike", "catty" etc.) as
based on the root "cat", and "stemmer", "stemming", "stemmed"
as based on "stem".
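
The examples above can be approximated by crude suffix stripping. The following is only a minimal sketch, not an established stemming algorithm: the suffix list, the minimum stem length of three letters, and the name naive_stem are all assumptions chosen to cover the words mentioned here.

```python
# Hypothetical suffix list, chosen only to cover the examples in this entry.
SUFFIXES = ("like", "ing", "ed", "er", "y", "s")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix and undo a doubled final consonant."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Assumption: keep at least three letters so short words survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # "stemm" -> "stem", "catt" -> "cat"
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return word


for w in ("cats", "catlike", "catty", "stemmer", "stemming", "stemmed"):
    print(w, "->", naive_stem(w))
```

With these rules the first three words reduce to "cat" and the last three to "stem". The same rules mangle many other words (for example, "class" becomes "clas"), which is why practical stemmers use more careful rule sets or a lexicon.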
English stemmers are fairly trivial (with only occasional
problems, such as "dries" being the third-person singular
present form of the verb "dry", "axes" being the plural of
"ax" as well as "axis"); but stemmers become harder to design
as the target language becomes more complex. For example, an
Italian stemmer is more complex than an English one (because
of more possible verb inflections), a Russian one is more
complex (more possible noun declensions), a Hebrew one is even
more complex (a
hairy writing system), and so on.
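
The English problem cases mentioned above ("dries", "axes") can be illustrated by generating several candidate roots and keeping those found in a word list. This is a sketch under stated assumptions: the tiny LEXICON and the three rewrite rules are hypothetical, picked only to reproduce those two cases.

```python
# Hypothetical word list; a real system would use a full dictionary.
LEXICON = {"dry", "ax", "axis", "cat"}


def candidate_roots(word: str) -> list[str]:
    """Return every root in LEXICON that some plural/verb rule could yield."""
    word = word.lower()
    candidates = []
    if word.endswith("ies"):
        candidates.append(word[:-3] + "y")    # "dries" -> "dry"
    if word.endswith("es"):
        candidates.append(word[:-2])          # "axes" -> "ax"
        candidates.append(word[:-2] + "is")   # "axes" -> "axis"
    if word.endswith("s"):
        candidates.append(word[:-1])          # "cats" -> "cat"
    # Keep only candidates that are known words, preserving rule order.
    return [c for c in candidates if c in LEXICON]


print(candidate_roots("dries"))  # ['dry']
print(candidate_roots("axes"))   # ['ax', 'axis']
```

Returning a list rather than a single string makes the ambiguity of "axes" explicit; choosing between "ax" and "axis" needs context that a stemmer alone does not have.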
Stemmers are common elements in
query systems, since a user
who runs a query on "daffodils" probably cares about documents
that contain the word "daffodil" (without the s).
(This dictionary has a rudimentary stemmer which currently
(April 1997) handles only conversion of plurals to singulars.)
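
The "daffodils" case can be sketched as a small inverted index in which both documents and queries pass through the same plural-to-singular step. Everything here is an assumption made for illustration (the singular rule, the function names, the sample documents); it does not describe how any particular query system is implemented.

```python
def singular(word: str) -> str:
    """Very rough plural-to-singular rule: drop one trailing "s"."""
    word = word.lower()
    # Assumption: leave short words and words ending in "ss" alone.
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each stemmed token to the set of document ids containing it."""
    index: dict[str, set[int]] = {}
    for doc_id, text in docs.items():
        for token in text.split():
            stem = singular(token.strip('.,;:!?"()'))
            index.setdefault(stem, set()).add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Stem the query the same way the documents were stemmed."""
    return index.get(singular(query), set())


# Hypothetical sample documents.
docs = {
    1: "A daffodil grows here.",
    2: "Tulips and daffodils bloom.",
    3: "No flowers.",
}
index = build_index(docs)
print(search(index, "daffodils"))  # documents 1 and 2
```

Because the index stores stems, a query for "daffodils" and a query for "daffodil" return the same documents.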
(1997-04-09)