Spellchecker

A spellchecker is a software program designed to verify the spelling of words in a file. Spellcheckers are customarily incorporated into word processors, email programs, and similar applications in which users produce a lot of text; they are rarely used nowadays on a standalone basis.

Table of contents
1 Design
2 History
3 Functionality

Design

A spellchecker customarily consists of two parts:

  1. a set of routines for scanning text and extracting words, and
  2. a wordlist (often referred to as a dictionary) against which the words found in the text are compared.

The scanning routines usually have language-dependent algorithms for handling morphology. Even for a lightly inflected language like English, word extraction routines will need to handle such phenomena as contractions and possessives.
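The two parts described above can be sketched in a few lines of Python. This is only a minimal illustration for English, not how any particular product works: the names `WORDLIST`, `extract_words`, `normalize`, and `unknown_words` are hypothetical, the wordlist is a toy, and only straight apostrophes are handled.

```python
import re

# Hypothetical toy wordlist; a real one would hold tens of thousands of entries.
WORDLIST = {"the", "dog", "can't", "find", "its", "owner", "bone"}

# A word is a run of letters that may contain internal apostrophes,
# so contractions such as "can't" are extracted as a single token.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def extract_words(text):
    """Scanning routine: return the words found in the text, in order."""
    return WORD_RE.findall(text)


def normalize(word):
    """Lowercase the word and strip a possessive 's if the full form is not listed."""
    word = word.lower()
    if word.endswith("'s") and word not in WORDLIST:
        word = word[:-2]
    return word


def unknown_words(text):
    """Return the words whose normalized form is missing from the wordlist."""
    return [w for w in extract_words(text) if normalize(w) not in WORDLIST]


print(unknown_words("The dog can't find its owner's bonee."))
```

Here the contraction "can't" is matched directly against the wordlist, while the possessive "owner's" is reduced to "owner" before lookup, so only "bonee" is reported.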

The wordlist might simply be a list of words, or it might also contain additional information, such as hyphenation points or lexical and grammatical attributes.

As an adjunct to these two components, the program's user interface will allow users to approve replacements and modify the program's operation.

(The one exception to the above paradigm is the class of spellcheckers that rely solely on statistics, i.e. bigrams and trigrams, but these have never caught on.)
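As a rough illustration of the statistical approach, the sketch below flags a word when it contains a letter trigram never seen in a training sample. It assumes that the statistics are computed over letters rather than over whole words, and the names `trigrams`, `train`, and `looks_misspelled` are hypothetical; it is not a reconstruction of any actual product.

```python
from collections import Counter


def trigrams(word):
    """Return the letter trigrams of a word, with ^ and $ marking its edges."""
    padded = f"^{word.lower()}$"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def train(corpus_words):
    """Count how often each trigram occurs in a sample of correctly spelled words."""
    counts = Counter()
    for word in corpus_words:
        counts.update(trigrams(word))
    return counts


def looks_misspelled(word, counts, min_count=1):
    """Flag a word if any of its trigrams occurs fewer than min_count times."""
    return any(counts[t] < min_count for t in trigrams(word))


counts = train(["the", "then", "there", "other", "mother"])
print(looks_misspelled("teh", counts))
print(looks_misspelled("there", counts))
```

No wordlist is consulted at all, which is both the appeal and the weakness of the method: a misspelling built entirely from common trigrams passes unnoticed, and a valid word with an unusual letter sequence is flagged.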

History

The first spellcheckers appeared for CP/M computers in the late 1970s, followed by packages for the IBM PC after it was introduced in 1981. Developers such as Soft-Art, Microlytics, Proximity, Circle Noetics, and Reference Software rushed OEM packages or end-user products into the rapidly expanding software market, primarily for the PC but also for Apple Macintosh, VAX, and Unix. On the PCs, these spellcheckers were standalone programs, many of which could be run in TSR mode from within wordprocessing packages on PCs with sufficient memory.

However, the market for standalone packages was short-lived: by the mid-1980s, developers of popular wordprocessing packages like WordStar and WordPerfect had incorporated spellcheckers into their packages, mostly licensed from the above companies, which quickly expanded support from just English to European and eventually even Asian languages. This required increasing sophistication in the morphology routines of the software, particularly with regard to heavily inflected languages like Hungarian and Finnish. Although the size of the wordprocessing market in a country like Iceland might not have justified the investment of implementing a spellchecker, companies like WordPerfect nonetheless strove to localize their software for as many national markets as possible as part of their global marketing strategy.

Functionality

The first spellcheckers were "verifiers", not "correctors": they offered no suggestions for incorrectly spelled words. This was helpful for typos, where the user already knows the correct spelling, but not so helpful for logical or phonetic errors. The challenge developers faced was the difficulty of offering useful suggestions for misspelled words. This requires reducing words to a skeletal form and applying pattern-matching algorithms.
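One simple way to picture the skeletal-form idea is sketched below. The particular skeleton used here (keep the first letter, drop later vowels, collapse repeated letters) and the ranking by `difflib.SequenceMatcher` are assumptions chosen for illustration, not the algorithm of any historical spellchecker; `skeleton`, `build_index`, and `suggest` are hypothetical names.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def skeleton(word):
    """Keep the first letter, drop later vowels, collapse adjacent repeats that remain."""
    out = []
    for i, ch in enumerate(word.lower()):
        if i > 0 and ch in "aeiou":
            continue  # vowels after the first letter are dropped
        if out and out[-1] == ch:
            continue  # a letter identical to the last one kept is collapsed
        out.append(ch)
    return "".join(out)


def build_index(wordlist):
    """Group the wordlist entries by their skeleton."""
    index = defaultdict(list)
    for word in wordlist:
        index[skeleton(word)].append(word)
    return index


def suggest(word, index, limit=3):
    """Return wordlist entries sharing the word's skeleton, most similar first."""
    candidates = index.get(skeleton(word), [])
    ranked = sorted(
        candidates,
        key=lambda c: SequenceMatcher(None, word.lower(), c).ratio(),
        reverse=True,
    )
    return ranked[:limit]


index = build_index(["receive", "recover", "necessary", "accommodate"])
print(suggest("recieve", index))
print(suggest("neccesary", index))
```

Because "recieve" and "receive" reduce to the same skeleton, the correct word is found with a single dictionary lookup rather than by comparing the misspelling against every entry in the wordlist.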

It might seem logical that, where spellchecking dictionaries are concerned, "the bigger the better". In practice, however, an optimal size for English appears to be around 90,000 entries. Beyond that, incorrectly spelled words may be skipped because they happen to coincide with valid but rare entries. Hypothetically speaking, a linguist might determine on the basis of corpus linguistics that the word baht is more frequently a misspelling of bath or bat than a reference to the Thai currency. Hence, it would be more useful if baht were flagged as an incorrectly spelled word and not included in the wordlist.

The first MS-DOS spellcheckers were mostly used in proofing mode from within wordprocessing packages. After preparing a document, a user scanned the text looking for misspellings. Later, however, batch processing was offered in such packages as Oracle's short-lived CoAuthor. This allowed a user to view the results after a document was processed and correct only the words which he or she knew to be wrong. When memory and processing power became abundant, spellchecking was performed interactively in the background, as has been the case with Microsoft Word since Word 97.

In recent years, spellcheckers have become increasingly sophisticated; some are now capable of recognizing simple grammatical errors. However, even at their best, they rarely catch all the errors in a text (homonym errors, for example, tend to slip through) and will inevitably flag neologisms and foreign words as misspellings.