The London Perl and Raku Workshop takes place on 26th Oct 2024. If your company depends on Perl, please consider sponsoring and/or attending.

NAME

Voting::VoteFairRanking - Calculates VoteFair Ranking results

VERSION

Version 5.01

SYNOPSIS

VoteFair Ranking is described at www.VoteFair.org and in the book "Ending The Hidden Unfairness In U.S. Elections" by Richard Fobes. The components of VoteFair Ranking that are implemented here are briefly described below in the ABOUT section.

The following sample code executes this module.

use Voting::VoteFairRanking;
&Voting::VoteFairRanking::votefair_read_calculate_write( );

This usage assumes that you supply via STDIN (standard input) a file that contains the appropriately formatted election/survey/poll data, and that you direct the output via STDOUT (standard output) to a file. These input and output files can be handled most easily by using Vote-Info-Split-Join (VISJ) files, which are available on GitHub in the CPSolver account. The VISJ framework uses the Language::Dashrep module.

Alternatively, this module can be accessed from any Perl software by directly using these subroutines:

  • votefair_put_next_vote_info_number

  • votefair_do_calculations_all_questions

  • votefair_get_next_result_info_number

(c) Copyright 1991 through 2011 Richard Fobes at www.VoteFair.org. You can redistribute and/or modify this VoteFairRanking library module under the Perl Artistic license version 2.0 (a copy of which is included in the LICENSE file). As required by the license this full copyright notice must be included in all copies of this software.

Conversion of this code into another programming language is also covered by the above license terms.

The mathematical algorithms of VoteFair Ranking are in the public domain.

ABOUT

This module calculates VoteFair Ranking results. The portions of VoteFair Ranking implemented here are:

  • VoteFair popularity ranking. This voting method calculates the full popularity ranking of all candidates (or choices in the case of a survey) from most popular and second-most popular down to least popular. It uses the preference information collected on 1-2-3 ballots (or any equivalent way of expressing "ranked" preferences). When a single position is being filled, the most popular candidate is declared the winner. This calculation method is mathematically equivalent to the Condorcet-Kemeny election method.

  • VoteFair representation ranking. This voting method is used to elect a second candidate who represents the voters who are not well-represented by the most-popular candidate, or to fill multiple board-of-director positions, or to choose a second simultaneous activity in addition to the most popular activity. This method reduces the influence of the voters who are already well-represented by the most popular candidate (or choice), and it does so in a way that protects against strategic voting. If instead the second-most popular candidate as identified by VoteFair popularity ranking were chosen, the same voters who prefer the first winner also can determine the second winner, and this can leave large numbers of other voters unrepresented. Additional levels of representation ranking can be used to fill additional seats, although VoteFair partial-proportional ranking should be used instead if "proportional representation" of political parties is needed, especially for the purpose of defeating attempts to gerrymander district boundaries.

  • VoteFair party ranking. This voting method ranks political parties according to a different kind of "popularity". The results can be used in high-stakes elections to limit the number of candidates allowed by each party. In such cases the two or three political parties that are ranked highest can be limited to offering just two candidates from each party, and lower-ranked parties can be allowed to offer one candidate each, and any additional parties can be prohibited from offering any candidate (because those parties are too unpopular and too unrepresentative). Such limits have not been needed in the past because the fear of vote splitting has limited each political party to offering just one candidate in each contest.

For detailed descriptions of VoteFair Ranking, see www.VoteFair.org or the book "Ending The Hidden Unfairness In U.S. Elections" by Richard Fobes.

In addition to being useful for elections, VoteFair Ranking also is useful for calculating results for surveys and polls, ranking the popularity of songs and movies, and much more.

In mathematical terms, VoteFair Ranking is useful for doing "combinatorial optimization" and may be useful for solving the "linear ordering problem". See Wikipedia for details about these terms.

EXPORT

The following subroutines are exported:

  • votefair_read_calculate_write

  • votefair_put_next_vote_info_number

  • votefair_do_calculations_all_questions

  • votefair_get_next_result_info_number

  • votefair_put_input_string

  • votefair_get_output_string

  • votefair_always_do_rep_and_party_ranking

  • votefair_start_new_cases

FUNCTIONS

votefair_read_calculate_write

Reads numbers from the standard input file, does all the requested calculations, and writes requested results to the standard output file. In most election situations this is the only subroutine that needs to be used.

votefair_put_next_vote_info_number

Adds the next input vote-info number to the list that stores the input data. The meaning of the negative numbers are explained in an output file that this module creates.

votefair_get_next_result_info_number

Gets the next result-info number from the list that stores the result information.

votefair_put_input_string

Interprets an input text string that contains vote-info data (for all cases). Unlike the votefair_put_next_vote_info_number subroutine, this string can contain easy-to-type codes instead of numeric-only codes. These codes are useful for testing this module, and for supplying a hypothetical voting scenario that is typed rather than collected through ballots.

votefair_get_output_string

Creates a text string that contains all the results in a person-readable code. It is useful for testing this module, and to allow a person to directly (although cryptically) read the results.

votefair_do_calculations_all_questions

Does the requested calculations for all the questions (although other subroutines do the actual calculations).

votefair_start_new_cases

Does initialization again so that another batch of cases of data can be processed. Normally this subroutine is not needed because all the cases (each with independent questions and preferences) are done in one batch.

votefair_always_do_rep_and_party_ranking

Requests that VoteFair representation ranking and VoteFair party ranking always be done -- except when only plurality votes are requested (because in those cases 1-2-3 ballots have not been used).

calc_votefair_popularity_rank

(Not exported, for internal use only.)

Handles the overhead actions for calculating VoteFair popularity ranking results, as described in the book "Ending The Hidden Unfairness In U.S. Elections", and as described in Wikipedia as the "Condorcet-Kemeny method" (which redirects to the "Kemeny-Young method" article). See VoteFair.org for details.

These results are used in situations where a single seat is being filled (and there is only one such seat), or to determine the full ranking of choices, or to correctly identify the least-popular choice (where that choice is a contestant who is eliminated before the next round of the contest).

calc_votefair_representation_rank

(Not exported, for internal use only.)

Calculates VoteFair representation ranking results, as described in the book "Ending The Hidden Unfairness In U.S. Elections." These results are used in situations where more than one choice is selected, such as when there is more than one seat being filled, or when there is more than one activity (for participation) being offered at the same time (and attendance at both activities is not possible). The first-most representative choice is the most popular based on VoteFair popularity ranking, which is calculated before arriving here. Therefore, this subroutine identifies the VoteFair-based second-most representative, third-most representative, etc. choices in an election.

calc_votefair_party_rank

(Not exported, for internal use only.)

Calculates VoteFair party ranking results, as described in the book "Ending The Hidden Unfairness In U.S. Elections." These results are used to determine the maximum number of candidates each political party is allowed to offer in one election. The number of allowed candidates will vary according to the election type, with less-important elections not having any limits, and very important elections, such as for U.S. President, allowing two candidates each from the first-ranked and second-ranked parties, one candidate each from the next three or four parties, and no candidates from any other parties.

calc_votefair_choice_specific_pairwise_score_popularity_rank

(Not exported, for internal use only.)

Calculates VoteFair choice-specific pairwise-score (CSPS) popularity ranking results.

This kind of ranking is useful for estimating the overall ranking when there are many choices (such as hundreds of choices) and fast results are desired. It is also used to estimate the ranking before doing the VoteFair insertion-sort popularity ranking calculations. (If access to a computer is not available, this CSPS method can be done using pen and paper and a calculator.)

Example:

(A concise description follows this example. The description is easier to understand after you have read this example.)

Consider an election (or survey) in which there are five choices: A, B, C, D, E, and the final ranking order is this alphabetical order.

In this example the notation A>B refers to how many voters pairwise prefer choice A over choice B, and the notation B>A refers to how many voters pairwise prefer choice B over choice A. This notation always uses the "greater-than" symbol ">", and never uses the "less-than" symbol "<".

At the beginning of this ranking example, suppose that the choices are arranged in the order C, E, A, D, B. The pairwise counts for this arrangement are shown in this pairwise matrix.

     |       |       |       |       |       |
     |   C   |   E   |   A   |   D   |   B   |
     |       |       |       |       |       |
-----+-------+-------+-------+-------+-------+
     | \     |       |       |       |       |
 C   |   \   |  C>E  |  C>A  |  C>D  |  C>B  |
     |     \ |       |       |       |       |
-----+-------+-------+-------+-------+-------+
     |       | \     |       |       |       |
 E   |  E>C  |   \   |  E>A  |  E>D  |  E>B  |
     |       |     \ |       |       |       |
-----+-------+-------+-------+-------+-------+
     |       |       | \     |       |       |
 A   |  A>C  |  A>E  |   \   |  A>D  |  A>B  |
     |       |       |     \ |       |       |
-----+-------+-------+-------+-------+-------+
     |       |       |       | \     |       |
 D   |  D>C  |  D>E  |  D>A  |   \   |  D>B  |
     |       |       |       |     \ |       |
-----+-------+-------+-------+-------+-------+
     |       |       |       |       | \     |
 B   |  B>C  |  B>E  |  B>A  |  B>D  |   \   |
     |       |       |       |       |     \ |
-----+-------+-------+-------+-------+-------+

The diagonal line passes through empty cells. These cells are empty because they would represent a choice's comparison with itself, such as A>A.

The goal of these calculations is to change the sequence so that the largest pairwise counts move into the upper-right triangular area, leaving the smallest pairwise counts in the lower-left triangular area. (This is similar to the goal of VoteFair popularity ranking.)

The first step is to calculate choice-specific scores, with each choice having a row score and a column score. For choice A, its row score equals the sum of the pairwise counts in the row labelled A, which equals A>B + A>C + A>D + A>E. The column score for choice A is the sum of the pairwise counts in the column labeled A, which equals B>A + C>A + D>A + E>A. The row scores and column scores for choices B, C, D, and E are calculated similarly.

Next, all the row scores are compared to determine which choice has the largest row score. In this example that score would be the row score for choice A (because it is first in alphabetical order). Therefore choice A is moved into first place. The other choices remain in the same order. The resulting sequence is A, C, E, D, B. Here is the pairwise matrix for the new sequence. The pairwise counts for the ranked choice (A) are surrounded by asterisks:

     |       |       |       |       |       |
     |   A   |   C   |   E   |   D   |   B   |
     |       |       |       |       |       |
-----*****************************************
     * \     |       |       |       |       *
 A   *   \   |  A>C  |  A>E  |  A>D  |  A>B  *
     *     \ |       |       |       |       *
-----*-------*********************************
     *       * \     |       |       |       |
 C   *  C>A  *   \   |  C>E  |  C>D  |  C>B  |
     *       *     \ |       |       |       |
-----*-------*-------+-------+-------+-------+
     *       *       | \     |       |       |
 E   *  E>A  *  E>C  |   \   |  E>D  |  E>B  |
     *       *       |     \ |       |       |
-----*-------*-------+-------+-------+-------+
     *       *       |       | \     |       |
 D   *  D>A  *  D>C  |  D>E  |   \   |  D>B  |
     *       *       |       |     \ |       |
-----*-------*-------+-------+-------+-------+
     *       *       |       |       | \     |
 B   *  B>A  *  B>C  |  B>E  |  B>D  |   \   |
     *       *       |       |       |     \ |
-----*********-------+-------+-------+-------+

The row scores and column scores for the remaining (unranked) choices are adjusted to remove the pairwise counts that involve the just-ranked choice (A). The removed pairwise counts are the ones surrounded by asterisks. Specifically, after subtracting B>A, the row score for choice B becomes B>C + B>D + B>E, and after subtracting A>B, the column score for choice B becomes C>B + D>B + E>B.

From among the remaining row scores the highest score is found. At this point let's assume that both choice B and choice C have the same highest row score.

In the case of a row-score tie, the choice with the smallest column score -- from among the choices that have the same largest row score -- is ranked next. This would be choice B. Therefore, choice B is moved to the sequence position just after choice A. The resulting sequence is A, B, C, E, D. Below is the pairwise matrix for the new sequence. The pairwise counts for the ranked choices are surrounded by asterisks.

     |       |       |       |       |       |
     |   A   |   B   |   C   |   E   |   D   |
     |       |       |       |       |       |
-----*****************************************
     * \     |       |       |       |       *
 A   *   \   |  A>B  |  A>C  |  A>E  |  A>D  *
     *     \ |       |       |       |       *
-----*-------+-------+-------+-------+-------*
     *       | \     |       |       |       *
 B   *  B>A  |   \   |  B>C  |  B>E  |  B>D  *
     *       |     \ |       |       |       *
-----*-------+--------************************
     *       |       * \     |       |       |
 C   *  C>A  |  C>B  *   \   |  C>E  |  C>D  |
     *       |       *     \ |       |       |
-----*-------+-------*-------+-------+-------+
     *       |       *       | \     |       |
 E   *  E>A  |  E>B  *  E>C  |   \   |  E>D  |
     *       |       *       |     \ |       |
-----*-------+-------*-------+-------+-------+
     *       |       *       |       | \     |
 D   *  D>A  |  D>B  *  D>C  |  D>E  |   \   |
     *       |       *       |       |     \ |
-----*****************-------+-------+-------+

The same ranking process is repeated. The next choice to be ranked would be choice C. It would have the highest row score -- and the smallest column score if there is a row-score tie. So choice C would be identifed as the next choice in the ranked sequence. After that, choice D would have the highest row score, and would be ranked next. Finally the only remaining choice, choice E, would be ranked at the last (lowest) position.

Here is the final pairwise matrix.

     |       |       |       |       |       |
     |   A   |   B   |   C   |   D   |   E   |
     |       |       |       |       |       |
-----*****************************************
     * \     |       |       |       |       *
 A   *   \   |  A>B  |  A>C  |  A>D  |  A>E  *
     *     \ |       |       |       |       *
-----*-------+-------+-------+-------+-------*
     *       | \     |       |       |       *
 B   *  B>A  |   \   |  B>C  |  B>D  |  B>E  *
     *       |     \ |       |       |       *
-----*-------+-------+-------+-------+-------*
     *       |       | \     |       |       *
 C   *  C>A  |  C>B  |   \   |  C>D  |  C>E  *
     *       |       |     \ |       |       *
-----*-------+-------+-------+-------+-------*
     *       |       |       | \     |       *
 D   *  D>A  |  D>B  |  D>C  |   \   |  D>E  *
     *       |       |       |     \ |       *
-----*-------+-------+-------+-------+-------*
     *       |       |       |       | \     *
 E   *  E>A  |  E>B  |  E>C  |  E>D  |   \   *
     *       |       |       |       |     \ *
-----*****************************************

The choices are now fully ranked according to the Choice-Specific Pairwise-Count method.

If only a single winner is needed, the first-ranked choice should not necessarily be selected as the winner. Instead, the pairwise counts should be checked for a possible Condorcet winner, which may be second or third in the CSPS ranking result.

Concise description of the calculation method:

A row score and a column score is calculated for each choice. The row score is the sum of the pairwise counts in which the specified choice is preferred over each of the other choices. The column score is the sum of the pairwise counts in which each other choice is preferred over the specified choice.

For the choices that have not yet been ranked, all the row scores are compared to find the highest row score. The choice that has the highest row score is moved to the most-popular or next-most popular position in the ranking results.

If more than one choice is tied with the highest row score, the choice with the smallest column score is chosen. If more than one choice has the same row score and the same column score, the choices are regarded as tied.

After each choice has been ranked, the scores for the remaining (unranked) choices are adjusted by subtracting from all the remaining scores the pairwise counts that involve the just-ranked choice.

The process of ranking each choice and adjusting the remaining scores is repeated until only one choice remains, and it is ranked in the bottom (least-popular) position.

(This description of the VoteFair choice-specific pairwise-score method was copied from www.VoteFair.org with permission.)

calc_votefair_insertion_sort_popularity_rank

(Not exported, for internal use only.)

Calculates VoteFair popularity ranking results using the "insertion-sort" sorting algorithm.

VoteFair popularity ranking is described in Wikipedia, and is mathematically equivalent to the Condorcet-Kemeny method. The following comments explain the algorithm used here, which quickly calculates the ranking results. This explanation is important because some academic sources claim that the computations cannot be done quickly if there is a large number of choices being ranked.

Although the goal of VoteFair popularity ranking is to find the sequence that has the highest sequence score, the scores themselves do not need to be calculated. This concept is similar to not needing to know the elevation of every point in a region in order to know that a specific mountain peak is the highest elevation in that region. By not calculating all the sequence scores, the calculation time can be greatly reduced compared to the method that calculates all the sequence scores.

The algorithm described here assumes that the choices already have been pre-sorted using the Choice-Specific Pairwise-Score (CSPS) algorithm. That algorithm is part of this full calculation algorithm.

The full algorithm repeatedly uses a variation of the "insertion-sort" sorting algorithm, so that algorithm is described first.

Insertion-sort algorithm applied to finding maximum sequence score:

This explanation clarifies how the well-known insertion-sort algorithm is applied to VoteFair popularity ranking in a way that greatly reduces the number of calculations needed to find maximum sequence scores. (This method is just part of the full algorithm, which is explained in the next section.)

Consider an example in which there are five choices named A, B, C, D, and E, with a final sort order that matches this alphabetical order.

Notation: The notation A>B refers to how many voters pairwise prefer choice A over choice B, and the notation B>A refers to how many voters pairwise prefer choice B over choice A. This notation always uses the "greater-than" symbol ">", and never uses the "less-than" symbol "<".

At an intermediate stage in this sorting example, suppose that the choices A, C, and E have been sorted -- into this correct order -- and choice B is about to be sorted, and choice D remains unsorted. The pairwise counts for this arrangement are shown below. The asterisks show the separation between sorted choices and unsorted choices.

     |       |       |       |       |       |
     |   A   |   C   |   E   |   B   |   D   |
     |       |       |       |       |       |
-----*************************-------+-------+
     * \     |       |       *       |       |
 A   *   \   |  A>C  |  A>E  *  A>B  |  A>D  |
     *     \ |       |       *       |       |
-----+-------+-------+-------+-------+-------+
     *       | \     |       *       |       |
 C   *  C>A  |   \   |  C>E  *  C>B  |  C>D  |
     *       |     \ |       *       |       |
-----+-------+-------+-------+-------+-------+
     *       |       | \     *       |       |
 E   *  E>A  |  E>C  |   \   *  E>B  |  E>D  |
     *       |       |     \ *       |       |
-----*************************-------+-------+
     |       |       |       | \     |       |
 B   |  B>A  |  B>C  |  B>E  |   \   |  B>D  |
     |       |       |       |     \ |       |
-----+-------+-------+-------+-------+-------+
     |       |       |       |       | \     |
 D   |  D>A  |  D>C  |  D>E  |  D>B  |   \   |
     |       |       |       |       |     \ |
-----+-------+-------+-------+-------+-------+

The diagonal line passes through empty cells -- that would otherwise represent a choice's comparison with itself, such as A>A.

The diagonal line also is the border between the upper-right triangular area and the lower-left triangular area. The sequence score for the current sequence is the sum of all the pairwise counts in the upper-right triangular area (currently A>C + A>E + A>B + A>D + C>E + C>B + C>D + E>B + E>D + B>D).

The goal of these calculations is to find the maximum sequence score, which means the goal is to change the sequence so that the largest pairwise counts move into the upper-right triangular area, leaving the smallest pairwise counts in the lower-left triangular area.

The first step in sorting choice B is to consider the possibility of moving it to the left of choice E, which would form the sequence A, C, B, E. Here is the pairwise-count matrix for this sequence. The asterisks now include choice B because this is a possible sort order.

     |       |       |       |       |       |
     |   A   |   C   |   B   |   E   |   D   |
     |       |       |       |       |       |
-----*********************************-------+
     * \     |       |       |       *       |
 A   *   \   |  A>C  |  A>B  |  A>E  *  A>D  |
     *     \ |       |       |       *       |
-----*-------+-------+-------+-------*-------+
     *       | \     |       |       *       |
 C   *  C>A  |   \   |  C>B  |  C>E  *  C>D  |
     *       |     \ |       |       *       |
-----*-------+-------+-------+-------*-------+
     *       |       | \     |       *       |
 B   *  B>A  |  B>C  |   \   |  B>E  *  B>D  |
     *       |       |     \ |  ---  *       |
-----*-------+-------+-------+-------*-------+
     *       |       |       | \     *       |
 E   *  E>A  |  E>C  |  E>B  |   \   *  E>D  |
     *       |       |  ---  |     \ *       |
-----*********************************-------+
     |       |       |       |       | \     |
 D   |  D>A  |  D>C  |  D>B  |  D>E  |   \   |
     |       |       |       |       |     \ |
-----+-------+-------+-------+-------+-------+

The only pairwise counts that crossed the diagonal line are the (underlined) counts B>E and E>B, which swapped places. All the other pairwise counts that move do not cross the diagonal line; they stay on the same side of the diagonal line.

As a result, the score for this sequence, compared to the score for the previous sequence, increases (or decreases if negative) by the amount B>E minus E>B. In this case (assuming there are no complications that are explained later) the sequence score has increased because in the final (alphabetical) sort order, choice B appears before choice E.

The next step in sorting choice B is to consider the possibility of moving it to the left of choice C, which would form the sequence A, B, C, E. Here is the pairwise-count matrix for this sequence.

     |       |       |       |       |       |
     |   A   |   B   |   C   |   E   |   D   |
     |       |       |       |       |       |
-----*********************************-------+
     * \     |       |       |       *       |
 A   *   \   |  A>B  |  A>C  |  A>E  *  A>D  |
     *     \ |       |       |       *       |
-----*-------+-------+-------+-------*-------+
     *       | \     |       |       *       |
 B   *  B>A  |   \   |  B>C  |  B>E  *  B>D  |
     *       |     \ |  ---  |       *       |
-----*-------+-------+-------+-------*-------+
     *       |       | \     |       *       |
 C   *  C>A  |  C>B  |   \   |  C>E  *  C>D  |
     *       |  ---  |     \ |       *       |
-----*-------+-------+-------+-------*-------+
     *       |       |       | \     *       |
 E   *  E>A  |  E>B  |  E>C  |   \   *  E>D  |
     *       |       |       |     \ *       |
-----*********************************-------+
     |       |       |       |       | \     |
 D   |  D>A  |  D>B  |  D>C  |  D>E  |   \   |
     |       |       |       |       |     \ |
-----+-------+-------+-------+-------+-------+

The only pairwise counts that crossed the diagonal line are the (underlined) counts B>C and C>B, which swapped places. The other pairwise counts that moved remained on the same side of the diagonal line.

The score for this sequence increases (or decreases if negative) by the amount B>C minus C>B. In this case the sequence score has increased because (in the final alphabetical order) choice B appears before choice C.

The final step in sorting choice B is to consider the possibility of moving it to the left of choice A, which would form the sequence B, A, C, E. Here is the matrix for this sequence.

     |       |       |       |       |       |
     |   B   |   A   |   C   |   E   |   D   |
     |       |       |       |       |       |
-----*********************************-------+
     * \     |       |       |       *       |
 B   *   \   |  B>A  |  B>C  |  B>E  *  B>D  |
     *     \ |  ---  |       |       *       |
-----*-------+-------+-------+-------*-------+
     *       | \     |       |       *       |
 A   *  A>B  |   \   |  A>C  |  A>E  *  A>D  |
     *  ---  |     \ |       |       *       |
-----*-------+-------+-------+-------*-------+
     *       |       | \     |       *       |
 C   *  C>B  |  C>A  |   \   |  C>E  *  C>D  |
     *       |       |     \ |       *       |
-----*-------+-------+-------+-------*-------+
     *       |       |       | \     *       |
 E   *  E>B  |  E>A  |  E>C  |   \   *  E>D  |
     *       |       |       |     \ *       |
-----*********************************-------+
     |       |       |       |       | \     |
 D   |  D>B  |  D>A  |  D>C  |  D>E  |   \   |
     |       |       |       |       |     \ |
-----+-------+-------+-------+-------+-------+

The only pairwise counts that crossed the diagonal line are the (underlined) counts B>A and A>B, which swapped places. The other pairwise counts that moved remained on the same side of the diagonal line.

The score for this sequence increases (or decreases if negative) by the amount B>A minus A>B. In this case the sequence score has decreased because (in the final alphabetical order) choice B appears after, not before, choice A.

At this point choice B has been tested at each position within the sorted portion. The maximum sequence score (for the sorted portion) occurred when it was between choices A and C. As a result, choice B will be moved to the position between choices A and C.

Notice that the full sequence score did not need to be calculated in order to find this "local" maximum. These calculations only need to keep track of increases and decreases that occur as the being-sorted choice swaps places with successive already-sorted choices.

The pairwise-count matrix with choice B in the second sort-order position (between A and C) is shown below. Now choice D is the only unsorted choice.

     |       |       |       |       |       |
     |   A   |   B   |   C   |   E   |   D   |
     |       |       |       |       |       |
-----*********************************-------+
     * \     |       |       |       *       |
 A   *   \   |  A>B  |  A>C  |  A>E  *  A>D  |
     *     \ |       |       |       *       |
-----*-------+-------+-------+-------*-------+
     *       | \     |       |       *       |
 B   *  B>A  |   \   |  B>C  |  B>E  *  B>D  |
     *       |     \ |       |       *       |
-----*-------+-------+-------+-------*-------+
     *       |       | \     |       *       |
 C   *  C>A  |  C>B  |   \   |  C>E  *  C>D  |
     *       |       |     \ |       *       |
-----*-------+-------+-------+-------*-------+
     *       |       |       | \     *       |
 E   *  E>A  |  E>B  |  E>C  |   \   *  E>D  |
     *       |       |       |     \ *       |
-----*********************************-------+
     |       |       |       |       | \     |
 D   |  D>A  |  D>B  |  D>C  |  D>E  |   \   |
     |       |       |       |       |     \ |
-----+-------+-------+-------+-------+-------+

Choice D would be sorted in the same way. Of course the maximum sequence score would occur when choice D is between choices C and E, so D is moved there.

     |       |       |       |       |       |
     |   A   |   B   |   C   |   D   |   E   |
     |       |       |       |       |       |
-----*****************************************
     * \     |       |       |       |       *
 A   *   \   |  A>B  |  A>C  |  A>D  |  A>E  *
     *     \ |       |       |       |       *
-----*-------+-------+-------+-------+-------*
     *       | \     |       |       |       *
 B   *  B>A  |   \   |  B>C  |  B>D  |  B>E  *
     *       |     \ |       |       |       *
-----*-------+-------+-------+-------+-------*
     *       |       | \     |       |       *
 C   *  C>A  |  C>B  |   \   |  C>D  |  C>E  *
     *       |       |     \ |       |       *
-----*-------+-------+-------+-------+-------*
     *       |       |       | \     |       *
 D   *  D>A  |  D>B  |  D>C  |   \   |  D>E  *
     *       |       |       |     \ |       *
-----*-------+-------+-------+-------+-------*
     *       |       |       |       | \     *
 E   *  E>A  |  E>B  |  E>C  |  E>D  |   \   *
     *       |       |       |       |     \ *
-----*****************************************

Now there are no more choices to sort, so the resulting sequence is A, B, C, D, E. In this sequence the full sequence score -- which equals A>B + A>C + A>D + A>E + B>C + B>D + B>E + C>D + C>E + D>E -- is likely to be the highest possible sequence score.

Additional calculations, as described below, are needed because in rare cases it is possible that moving two or more choices at the same time could produce a higher sequence score. This concept is analogous to climbing a mountain in foggy conditions by always heading in the locally higher direction and ending up at the top of a peak and then, when the fog clears, seeing a higher peak.

Full calculation method for VoteFair popularity ranking:

This is a description of the full algorithm used to calculate VoteFair popularity ranking results.

The algorithm begins by calculating the Choice-Specific Pairwise-Score ranking. This pre-sort is a required part of the process. Without it, some unusual cases can cause the calculations to fail to find the sequence with the highest score. This pre-sort is analogous to starting a search for the highest mountain peak within a mountain range instead of starting the search within a large valley.

The next step is to apply the insertion-sort method as described in the section above, including starting at the left/highest end.

To ensure that all possible moves of each choice are considered, the insertion-sort method is done in both directions. Sorting in both directions means that in some sorting passes sorting moves choices to the left, as explained in the above example. In other sorting passes sorting starts by considering the right-most choice as the first sorted choice, and choices move to the right, into the sorted portion. This convention ensures movement for choices that need to move right, instead of left, in order to cause an increase in the score.

Complications can arise when there is "circular ambiguity", so additional steps are used. The most common cases of circular ambiguity involve several choices that are tied for the same sort-order position.

A key part of dealing with circular ambiguity is to follow this convention: whenever a choice can produce the same, highest, sequence score at more than one position, the choice is moved to the farthest of those highest-sequence-score positions.

Another part of dealing with these complications is to sort the sequence multiple times.

During the second sorting pass, if there is no circular ambiguity, the sequence of the choices in the pairwise matrix remains the same. This lack of movement (when there is no circular ambiguity) occurs because the sorted and unsorted portions are adjacent. Specifically, each choice to be sorted is already at the top (for left-movement sorting) or bottom (for right-movement sorting) of the "unsorted" portion, and it is being moved to the bottom (for left-movement sorting) or top (for right-movement sorting) of the "sorted" portion. In such cases the only thing that moves is the boundary between the sorted choices and unsorted choices.

However, in cases that involve circular ambiguity, the positions of some choices will change during the second and later sorting passes. This happens because the convention (as explained above) is to move each choice as far as it will go, within the limits of maximizing the sequence score.

During the sorting passes the highest sort-order (sequence) position of each choice is tracked, and the lowest sort-order position of each choice is tracked. These highest and lowest positions are reset (to current positions) whenever the sequence score increases to a higher score. At the end of the sorting process the highest and lowest positions reveal which choices are tied at the same popularity ranking level.

Using the insertion-sort example, if choices B, C, and D can be in any order and still produce the same highest sequence score, then each of these choices would move to the left of the other two each time it is sorted, and each of these choices would have the same highest-ranked position of second place, and each would have the same lowest-ranked position of fourth place. Because these three choices have the same highest and lowest positions, they are easily identified as tied (at the same popularity ranking).

More complex cases of circular ambiguity can occur. To deal with these cases, and to ensure the correctness of the "winner" (the most popular choice), the sorting process is repeated for the top half (plus one) of the highest-ranked choices, and this sub-set sorting is repeated until there are just three choices. For example, if there are 12 choices, the sorting process is done for 12 choices, then the top 7 choices, then the top 4 choices, and finally the top 3 choices. Then the highest-ranked choice (or the choices that are tied at the top) is kept at the highest rank while the other choices are sorted a final time. (If, instead, the least-popular choice is the most important one to identify correctly, the data supplied to this algorithm can be inverted according to preference levels, and then the calculated ranking can be reversed.)

As a clarification, the extra sub-set sorting is done only if more than one sequence has the same highest sequence score. This point is significant if the distinction between VoteFair popularity ranking and the Condorcet-Kemeny method is relevant. Specifically, the Condorcet-Kemeny method does not indicate how such "tied" sequence scores should be resolved, whereas VoteFair popularity ranking resolves such "tied" sequence scores as part of its calculation process.

After all the sorting has been done, the highest and lowest ranking levels are used to determine the results. For each choice its highest and lowest ranking levels are added together (which equals twice their average) and then multiplied times a constant. The constant equals 10 times the number of choices minus one. These numbers are converted to integers, and then these "averaged scaled integerized" values are used as the non-normalized ranking levels. Two or more choices are ranked at the same level if they have the same "averaged-scaled-integerized" ranking values.

The final calculation step is to normalize the "averaged-scaled-integerized" ranking levels so that the normalized ranking levels are consecutive, namely 1, 2, 3, etc. (so that no ranking levels are skipped).

The result is a ranking that identifies which choice is first-most popular, which choice is second-most popular, and so on down to which choice is least popular. Ties can occur at any level.

Calculation time:

The full algorithm used to calculate VoteFair popularity ranking results has a calculation time that is less than or equal to the following polynomial function:

T = A + ( B * N ) + ( C * ( N * N ) )

where T is the calculation time, N is the number of choices, and A and B and C are constants. (In mathematical notation, N * N would be written as N squared.) This function includes the calculation time required for the Choice-Specific Pairwise-Score (CSPS) pre-sort calculations.

This time contrasts with the slow execution times of the "NP-hard" approach, in which every sequence score is calculated in order to find the sequence with the highest score. If every sequence score were calculated (from scratch), the calculation time would be proportional to:

N! * N * N

where N is the number of choices, N! is N factorial (2 * 3 * 4 * ... * N), and N * N equals N squared. Note that N factorial equals the number of possible sequences, and N squared times one-half approximately equals the number of pairwise counts that are added to calculate each sequence score.

This clarification about calculation time is included because there is an academically common -- yet mistaken -- belief that calculating the "Condorcet-Kemeny method" is "NP-hard" and cannot be calculated in a time that is proportional to a polynomial function of N (the number of choices).

(c) Copyright 2011 Richard Fobes at VoteFair.org

(This description copied from VoteFair.org with permission.)

calc_all_sequence_scores

(Not exported, for internal use only.)

Calculates VoteFair popularity ranking results by calculating every sequence score to find the highest score, and regarding the sequence (ranking) with the highest score to be the overall ranking. For details, see www.VoteFair.org or Wikipedia's "Condorcet-Kemeny method" article (which currently redirects to the "Kemeny-Young method" article) or the book titled "Ending The Hidden Unfairness In U.S. Elections".

If multiple sequences have the same highest score, calculate the average sequence position for each choice (but only for the sequences that have the highest score), and then normalize (remove gaps from) those rankings.

compare_popularity_results

(Not exported, for internal use only.)

Compares the results of different methods for calculating VoteFair popularity ranking.

do_full_initialization

(Not exported, for internal use only.)

Initializes all the global values and constants. It is always done at the beginning of executing this module. It is also executed if a new (second or later) set of cases is calculated.

write_numeric_code_definitions

(Not exported, for internal use only.)

Write to an output file Dashrep definitions that associate negative code numbers -- that are used for input data and output results -- with text-based names for those values.

check_vote_info_numbers

(Not exported, for internal use only.)

Checks the validity of the numbers in the vote-info list, and requests skipping any cases or questions that contain invalid data. Also, counts the number of questions in each case, and counts the choices in each question.

calculate_results_for_one_question

(Not exported, for internal use only.)

Calculates voting results for one question (contest) in an election/poll/survey.

set_all_choices_as_used

(Not exported, for internal use only.)

Specifies that all the choices (for the current question) are used (non-ignored).

reset_ballot_info_and_tally_table

(Not exported, for internal use only.)

Restarts the counting of ballot information at the first ballot (in the current case). Also sets up the adjusted (alias) choice numbers and pair counters to exclude the choices being ignored. Also creates and initializes the tally table.

get_numbers_based_on_one_ballot

(Not exported, for internal use only.)

Gets the preference information from the next ballot. This information may include an optional multiple-ballot count that indicates how many ballots have the same specified preferences.

add_preferences_to_tally_table

(Not exported, for internal use only.)

Adds to the tally table the just-acquired preference numbers (from the current ballot).

internal_view_matrix

(Not exported, for internal use only.)

For debugging purposes, writes to the debug log a matrix with the pairwise counts.

normalize_ranking

(Not exported, for internal use only.)

Normalizes ranking levels so that ranking levels are sequential integer numbers (with no numbers skipped).

put_next_result_info_number

(Not exported, for internal use only.)

Puts the next result-info number into the array that stores the result information.

output_plurality_counts

(Not exported, for internal use only.)

Puts into the output results the plurality counts.

output_tally_table_numbers

(Not exported, for internal use only.)

Puts the pairwise counts from the tally table into the output list.

output_ranking_results

(Not exported, for internal use only.)

Outputs the results of the requested VoteFair Ranking results. These results are supplied in two forms: in their sequence order, and in order of their choice number.

AUTHOR

Richard Fobes, <fobes at CPAN.org>

BUGS

Please report any bugs or feature requests on GitHub, at the CPSolver account, in the VoteFairRanking project area. Thank you!

SUPPORT

You can find documentation for this module on GitHub, in the CPSolver account, in the VoteFairRanking project area.

You can find details about VoteFair Ranking at: www.VoteFair.org

ACKNOWLEDGEMENTS

Richard Fobes designed VoteFair Ranking and developed the original version of this code over a period of many years. Richard Fobes is the author of the books titled "The Creative Problem Solver's Toolbox" and "Ending The Hidden Unfairness In U.S. Elections."

COPYRIGHT & LICENSE

(c) Copyright 1991 through 2011 Richard Fobes at www.VoteFair.org. You can redistribute and/or modify this VoteFairRanking library module under the Perl Artistic license version 2.0 (a copy of which is included in the LICENSE file). As required by the license this full copyright notice must be included in all copies of this software.

Conversion of this code into another programming language is also covered by the above license terms.

The mathematical algorithms of VoteFair Ranking are in the public domain.