Lingua::EN::StopWordList - A sorted list of English stop words


        use Lingua::EN::StopWordList;

        my($ara_ref) = Lingua::EN::StopWordList -> new -> words;

Here's a complete program:

        use strict;
        use warnings;
        use Lingua::EN::StopWordList;

        my($count) = 0;

        print map{"@{[++$count]}: $_\n"} @{Lingua::EN::StopWordList -> new -> words};


Lingua::EN::StopWordList is a pure Perl module.

It returns a sorted arrayref of 659 English stop words.

Constructor and initialization

new(...) returns an object of type Lingua::EN::StopWordList.

This is the class's constructor.

Usage: Lingua::EN::StopWordList -> new.


This module is available as a Unix-style distro (*.tgz).

Install Lingua::EN::StopWordList as you would any Perl module. Run:


        cpanm Lingua::EN::StopWordList

or run:

        sudo cpan Lingua::EN::StopWordList

or unpack the distro, and then run either:

        perl Build.PL
        ./Build test
        ./Build install

or:

        perl Makefile.PL
        make (or dmake)
        make test
        make install

new()

See "Constructor and initialization".

words()

Returns the sorted arrayref of English stop words.
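
As a rough sketch of how the arrayref might be used, the program below filters stop words out of a string. The helper remove_stop_words() is hypothetical and not part of this module, and the tokenizer is a deliberately crude assumption. The code also assumes the words in the list are lower-case, so it lower-cases both sides before comparing. If the module is not installed, it falls back to a tiny stand-in list so the program still runs.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::EN::StopWordList).
# $stop_words is an arrayref, such as the one returned by words().
sub remove_stop_words
{
	my($stop_words, $text) = @_;

	# Build a hash once, so each lookup avoids scanning the whole list.
	my(%is_stop_word);

	$is_stop_word{lc $_} = 1 for @$stop_words;

	# Crude tokenizer (an assumption): runs of letters and apostrophes.
	my(@token) = $text =~ /([A-Za-z']+)/g;

	# Keep only the tokens which are not in the stop word list.
	return [grep{! $is_stop_word{lc $_} } @token];
}

# Use the real list if the module is installed, else a tiny stand-in list.
my($stop_words) = eval
{
	require Lingua::EN::StopWordList;

	Lingua::EN::StopWordList -> new -> words;
} || [qw/a is of the/];

my($kept) = remove_stop_words($stop_words, 'The cat is a friend of the dog');

print join(' ', @$kept), "\n";
```

Building the hash up front is a design choice: the list is only sorted, not indexed, so testing each token against the arrayref directly would mean a scan (or a binary search) per token.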


Is there a definitive list of stop words?

No, there is no such thing as a definitive list. For a discussion of the issues, including 'phrase search', see the Wikipedia discussion of word lists.

Where does the list come from?

I downloaded it from the bottom of a web page. It contains 659 words.

Are there other lists available?

Sure. One alternative list contains 570 words.

Another good source has stop word lists for a whole range of languages, but its English list contains only 174 words. Since Lingua::StopWords (below) also has 174 words in its English list, perhaps this is where that module got its words from.

Alternatively, just Google for references to various lists. Note, however, that these lists are normally very short.

Why another Perl module for stop words?

Lingua::StopWords only has a short list of words (174). And its bug list goes back 3 years.

Lingua::EN::StopWords only has a short list of words (227). Also, this module is part of Lingua::EN::Segmenter, whose documentation is poor. Even the exact basis of how it splits text is not documented. Lastly, its bug list goes back 6 years.

I could have offered to take over maintenance of either or both of those modules, but there are problems:

o Lingua::StopWords

It ships with a set of sub-modules, with names like Lingua::StopWords::EN, but I'm not in a position to support its other languages if I put my module's English list into it.

Nevertheless, the fact that it supports 13 languages is definitely something in favour of that module.

o Lingua::EN::StopWords

This is part of a text processing distribution which I don't want to get involved with. Also, it has a long list of pre-reqs (not listed on MetaCPAN until you view the makefile), which may well suit the purposes of Lingua::EN::Segmenter, but are overkill for just a stop word list.

Several other Perl modules, written for various purposes, either use one of the above, or have their own very short (as always) lists.

How can I help?

If you translate the list of stop words in this module into your favourite language and email it to me, I will include your words in the next release.

It all depends on whether you think this new list is somehow 'better' than the lists in pre-existing modules. I cannot make that decision on your behalf.

See Also


This module includes a comparison of various stopword list modules.





Lingua::EN::StopWordList was written by Ron Savage in 2012.



