This vignette describes `comperes` functionality for manipulating (summarising and transforming) competition results (hereafter "results"):

- Computation of item summaries.
- Computation of Head-to-Head values and conversion between its formats.
- Creating pairgames.

We will need the following packages:

```
library(comperes)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(rlang)
```

Example results in long format:

```
cr_long <- tibble(
  game   = c("a1", "a1", "a1", "a2", "a2", "b1", "b1", "b2"),
  player = c(1, NA, NA, 1, 2, 2, 1, 2),
  score  = 1:8,
  season = c(rep("A", 5), rep("B", 3))
) %>%
  as_longcr()
```

The functions discussed here build on `dplyr`'s grammar of data manipulation; basic knowledge of it is enough to use them. Familiarity with `rlang`'s quotation mechanism is also helpful.

An item summary is understood as a set of summary measurements (of arbitrary nature) of an item (one or more columns) present in the data. To compute them, `comperes` offers the `summarise_*()` family of functions, in which summary functions should be provided as in `dplyr::summarise()`. Basically, they are wrappers for a grouped summarise with forced ungrouping, conversion to `tibble`, and optional addition of a prefix to the summaries. **Note** that if one of the columns in the item is a factor with implicit `NA`s (present in the vector but not in the levels), there will be a warning suggesting to add `NA` to the levels. This is due to `group_by()` functionality in `dplyr` after version 0.8.0.

A couple of examples:

```
cr_long %>% summarise_player(mean_score = mean(score))
#> # A tibble: 3 x 2
#> player mean_score
#> <dbl> <dbl>
#> 1 1 4
#> 2 2 6.33
#> 3 NA 2.5
cr_long %>% summarise_game(min_score = min(score), max_score = max(score))
#> # A tibble: 4 x 3
#> game min_score max_score
#> <chr> <int> <int>
#> 1 a1 1 3
#> 2 a2 4 5
#> 3 b1 6 7
#> 4 b2 8 8
cr_long %>% summarise_item("season", sd_score = sd(score))
#> # A tibble: 2 x 2
#> season sd_score
#> <chr> <dbl>
#> 1 A 1.58
#> 2 B 1
```
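The wrapper behaviour described above (grouped summarise, forced ungrouping, conversion to `tibble`) can be approximated with plain `dplyr`. This is only a minimal sketch under stated assumptions: `results` is a plain-tibble copy of `cr_long`, `summarise_by()` is a hypothetical helper and not a `comperes` function, and `across()`, `all_of()` and `.groups` assume `dplyr` 1.0.0 or later.

```r
library(dplyr)

# Plain-tibble copy of the example results (no `longcr` class)
results <- tibble(
  game   = c("a1", "a1", "a1", "a2", "a2", "b1", "b1", "b2"),
  player = c(1, NA, NA, 1, 2, 2, 1, 2),
  score  = 1:8,
  season = c(rep("A", 5), rep("B", 3))
)

# Hypothetical helper (not part of comperes): group by the item columns,
# summarise, drop the grouping and return a tibble
summarise_by <- function(tbl, item, ...) {
  tbl %>%
    group_by(across(all_of(item))) %>%
    summarise(..., .groups = "drop") %>%
    as_tibble()
}

player_means <- summarise_by(results, "player", mean_score = mean(score))
player_means
```

The result should agree with the `summarise_player()` output shown above, with the `NA` player kept as its own group.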

For convenient transformation of results there is the `join_*_summary()` family of functions, which compute the respective summaries and join them to the original data:

```
cr_long %>%
  join_item_summary("season", season_mean_score = mean(score)) %>%
  mutate(score = score - season_mean_score)
#> # A longcr object:
#> # A tibble: 8 x 5
#> game player score season season_mean_score
#> <chr> <dbl> <dbl> <chr> <dbl>
#> 1 a1 1 -2 A 3
#> 2 a1 NA -1 A 3
#> 3 a1 NA 0 A 3
#> 4 a2 1 1 A 3
#> 5 a2 2 2 A 3
#> 6 b1 2 -1 B 7
#> 7 b1 1 0 B 7
#> 8 b2 2 1 B 7
```
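Conceptually, "compute a summary and join it back" is a grouped summarise followed by a left join on the item columns. The sketch below reproduces the season-centering example with plain `dplyr`; it illustrates the idea and does not claim to show how `comperes` implements it. The names `results`, `season_summary` and `centered` are introduced here for illustration only.

```r
library(dplyr)

# Plain-tibble copy of the relevant columns of the example results
results <- tibble(
  game   = c("a1", "a1", "a1", "a2", "a2", "b1", "b1", "b2"),
  score  = 1:8,
  season = c(rep("A", 5), rep("B", 3))
)

# Step 1: one row per season with its mean score
season_summary <- results %>%
  group_by(season) %>%
  summarise(season_mean_score = mean(score), .groups = "drop")

# Step 2: attach the summary to every original row, then center the scores
centered <- results %>%
  left_join(season_summary, by = "season") %>%
  mutate(score = score - season_mean_score)

centered
```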

For common summary functions, `comperes` has a list `summary_funs` with 8 quoted expressions to be used with `rlang`'s unquoting mechanism:

```
# Use .prefix to add prefix to summary columns
cr_long %>%
  join_player_summary(!!!summary_funs[1:2], .prefix = "player_") %>%
  join_item_summary("season", !!!summary_funs[1:2], .prefix = "season_")
#> # A longcr object:
#> # A tibble: 8 x 8
#> game player score season player_min_score player_max_score season_min_score
#> <chr> <dbl> <int> <chr> <int> <int> <int>
#> 1 a1 1 1 A 1 7 1
#> 2 a1 NA 2 A 2 3 1
#> 3 a1 NA 3 A 2 3 1
#> 4 a2 1 4 A 1 7 1
#> 5 a2 2 5 A 5 8 1
#> 6 b1 2 6 B 5 8 6
#> 7 b1 1 7 B 1 7 6
#> 8 b2 2 8 B 5 8 6
#> # … with 1 more variable: season_max_score <int>
```
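The same unquoting mechanism works with your own list of quoted expressions. The following sketch builds a hypothetical list `my_funs` with `rlang::exprs()` and splices it into a plain `dplyr::summarise()` call with `!!!`. The list name and its contents are assumptions for illustration and are not part of `comperes`.

```r
library(dplyr)
library(rlang)

# Plain-tibble copy of the relevant columns of the example results
results <- tibble(
  player = c(1, NA, NA, 1, 2, 2, 1, 2),
  score  = 1:8
)

# Named list of quoted (unevaluated) summary expressions
my_funs <- exprs(
  min_score = min(score),
  max_score = max(score)
)

# `!!!` splices every list element in as a separate named argument
player_range <- results %>%
  group_by(player) %>%
  summarise(!!!my_funs, .groups = "drop")

player_range
```

Because `my_funs` is an ordinary list, it can be subset before splicing (for example `!!!my_funs[1]`), in the same way `summary_funs[1:2]` is used above.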

A Head-to-Head value is a summary statistic of direct confrontation between two players. It is assumed that this value can be computed based only on the players' **matchups**: data on the actual participation of an ordered pair of players in one game.

To compute matchups, `comperes` has `get_matchups()`, which returns a `widecr` object with all matchups actually present in the results (including matchups of players with themselves). **Note** that missing values in the `player` column are treated as separate players. This allows operating with games where multiple players' identifiers are not known. However, when computing Head-to-Head values they are treated as a single player. Example:

```
get_matchups(cr_long)
#> # A widecr object:
#> # A tibble: 18 x 5
#> game player1 score1 player2 score2
#> <chr> <dbl> <int> <dbl> <int>
#> 1 a1 1 1 1 1
#> 2 a1 1 1 NA 2
#> 3 a1 1 1 NA 3
#> 4 a1 NA 2 1 1
#> 5 a1 NA 2 NA 2
#> 6 a1 NA 2 NA 3
#> 7 a1 NA 3 1 1
#> 8 a1 NA 3 NA 2
#> 9 a1 NA 3 NA 3
#> 10 a2 1 4 1 4
#> 11 a2 1 4 2 5
#> 12 a2 2 5 1 4
#> 13 a2 2 5 2 5
#> 14 b1 2 6 2 6
#> 15 b1 2 6 1 7
#> 16 b1 1 7 2 6
#> 17 b1 1 7 1 7
#> 18 b2 2 8 2 8
```
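One way to picture matchups is as a self-join of the results on `game`: every row of a game is paired with every row of the same game, including itself. The sketch below uses plain `dplyr` on a tibble copy of the data. It is an illustration of the idea, not necessarily how `get_matchups()` is implemented, and it returns a plain tibble rather than a `widecr` object.

```r
library(dplyr)

# Plain-tibble copy of the relevant columns of the example results
results <- tibble(
  game   = c("a1", "a1", "a1", "a2", "a2", "b1", "b1", "b2"),
  player = c(1, NA, NA, 1, 2, 2, 1, 2),
  score  = 1:8
)

# Pair rows within each game; the suffixes turn `player`/`score` into
# `player1`/`score1` and `player2`/`score2`.
# Recent dplyr versions warn about many-to-many joins, which is exactly
# what is wanted here, so the warning is suppressed.
matchups <- suppressWarnings(
  inner_join(results, results, by = "game", suffix = c("1", "2"))
)

matchups
```

Since rows are paired by position within a game and not by player identifier, the two `NA` players in game "a1" remain distinct, which matches the note above.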

Head-to-Head values can be stored in two ways:

**Long**, a `tibble` with columns `player1` and `player2` which identify an ordered pair of players, and columns corresponding to Head-to-Head values. Computation is done with `h2h_long()`, which returns an object of class `h2h_long`. Head-to-Head functions are specified as in `dplyr`'s grammar **for results matchups**:

```
cr_long %>%
  h2h_long(
    abs_diff = mean(abs(score1 - score2)),
    num_wins = sum(score1 > score2)
  )
#> # A long format of Head-to-Head values:
#> # A tibble: 9 x 4
#> player1 player2 abs_diff num_wins
#> <dbl> <dbl> <dbl> <int>
#> 1 1 1 0 0
#> 2 1 2 1 1
#> 3 1 NA 1.5 0
#> 4 2 1 1 1
#> 5 2 2 0 0
#> 6 2 NA NA NA
#> 7 NA 1 1.5 2
#> 8 NA 2 NA NA
#> 9 NA NA 0.5 1
```
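"Specified for results matchups" means the expressions are evaluated over the matchup columns `score1` and `score2`, once per ordered pair of players. A rough `dplyr` equivalent, assuming matchups are built by a self-join on `game`, is sketched below. Unlike `h2h_long()`, this sketch only returns pairs that actually met, so the all-`NA` rows for pairs such as (2, `NA`) are absent.

```r
library(dplyr)

# Plain-tibble copy of the relevant columns of the example results
results <- tibble(
  game   = c("a1", "a1", "a1", "a2", "a2", "b1", "b1", "b2"),
  player = c(1, NA, NA, 1, 2, 2, 1, 2),
  score  = 1:8
)

# Matchups as a self-join on `game` (many-to-many warning is expected)
matchups <- suppressWarnings(
  inner_join(results, results, by = "game", suffix = c("1", "2"))
)

# Head-to-Head values: summarise matchups per ordered pair of players;
# all NA players fall into a single group here
h2h <- matchups %>%
  group_by(player1, player2) %>%
  summarise(
    abs_diff = mean(abs(score1 - score2)),
    num_wins = sum(score1 > score2),
    .groups = "drop"
  )

h2h
```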

**Matrix**, a matrix where rows and columns describe an ordered pair of players and entries are Head-to-Head values. This allows convenient storage of only one Head-to-Head value. Computation is done with `h2h_mat()`, which returns an object of class `h2h_mat`. Head-to-Head functions are specified as in `h2h_long()`:

```
cr_long %>% h2h_mat(sum_score = sum(score1 + score2))
#> # A matrix format of Head-to-Head values:
#> 1 2 <NA>
#> 1 24 22 7
#> 2 22 38 NA
#> <NA> 7 NA 20
```

`comperes` also offers a list `h2h_funs` of 9 common Head-to-Head functions as quoted expressions to be used with `rlang`'s unquoting mechanism:

```
cr_long %>% h2h_long(!!!h2h_funs)
#> # A long format of Head-to-Head values:
#> # A tibble: 9 x 11
#> player1 player2 mean_score_diff mean_score_diff… mean_score sum_score_diff
#> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 1 1 0 0 4 0
#> 2 1 2 0 0 5.5 0
#> 3 1 NA -1.5 0 1 -3
#> 4 2 1 0 0 5.5 0
#> 5 2 2 0 0 6.33 0
#> 6 2 NA NA NA NA NA
#> 7 NA 1 1.5 1.5 2.5 3
#> 8 NA 2 NA NA NA NA
#> 9 NA NA 0 0 2.5 0
#> # … with 5 more variables: sum_score_diff_pos <dbl>, sum_score <int>,
#> # num_wins <dbl>, num_wins2 <dbl>, num <int>
```

To compute Head-to-Head values for only a subset of players, or to include values for players that are not in the results, use a factor `player` column. **Notes**:

- You can use the `fill` argument to replace `NA`s in certain columns after computing Head-to-Head values.
- As Head-to-Head functions use `summarise_item()`, there will be a warning in case of implicit `NA`s in factor columns.

```
cr_long_fac <- cr_long %>%
  mutate(player = factor(player, levels = c(1, 2, 3)))
cr_long_fac %>%
  h2h_long(abs_diff = mean(abs(score1 - score2)),
           fill = list(abs_diff = -100))
#> # A long format of Head-to-Head values:
#> # A tibble: 9 x 3
#> player1 player2 abs_diff
#> <fct> <fct> <dbl>
#> 1 1 1 0
#> 2 1 2 1
#> 3 1 3 -100
#> 4 2 1 1
#> 5 2 2 0
#> 6 2 3 -100
#> 7 3 1 -100
#> 8 3 2 -100
#> 9 3 3 -100
cr_long_fac %>%
  h2h_mat(mean(abs(score1 - score2)),
          fill = -100)
#> # A matrix format of Head-to-Head values:
#> 1 2 3
#> 1 0 1 -100
#> 2 1 0 -100
#> 3 -100 -100 -100
```
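The effect of factor levels plus `fill` can be pictured as "build the full grid of level pairs, attach the computed values, replace what is missing". The sketch below does this by hand with `expand.grid()`, `left_join()` and `coalesce()`. It is an illustration of the idea under that assumption and not the `comperes` implementation; `h2h_present` simply hard-codes the values for players 1 and 2 from the output above.

```r
library(dplyr)

# Head-to-Head values for the pairs that actually met (taken from above)
h2h_present <- tibble(
  player1  = c("1", "1", "2", "2"),
  player2  = c("1", "2", "1", "2"),
  abs_diff = c(0, 1, 1, 0)
)

# All players of interest, including player "3" who never played
player_levels <- c("1", "2", "3")

# Full grid of ordered pairs; `player2` is listed first so that it varies
# fastest and rows come out ordered by `player1`, then `player2`
all_pairs <- expand.grid(
  player2 = player_levels,
  player1 = player_levels,
  stringsAsFactors = FALSE
) %>%
  as_tibble() %>%
  select(player1, player2)

# Attach known values and fill the gaps with -100
h2h_full <- all_pairs %>%
  left_join(h2h_present, by = c("player1", "player2")) %>%
  mutate(abs_diff = coalesce(abs_diff, -100))

h2h_full
```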

To convert between long and matrix formats of Head-to-Head values, `comperes` has `to_h2h_long()` and `to_h2h_mat()`, which convert from matrix to long and from long to matrix respectively. **Note** that the output of `to_h2h_long()` has `player1` and `player2` columns as characters. Examples:

```
cr_long %>% h2h_mat(mean(score1)) %>% to_h2h_long()
#> # A long format of Head-to-Head values:
#> # A tibble: 9 x 3
#> player1 player2 h2h_value
#> <chr> <chr> <dbl>
#> 1 1 1 4
#> 2 1 2 5.5
#> 3 1 <NA> 1
#> 4 2 1 5.5
#> 5 2 2 6.33
#> 6 2 <NA> NA
#> 7 <NA> 1 2.5
#> 8 <NA> 2 NA
#> 9 <NA> <NA> 2.5
cr_long %>%
  h2h_long(mean_score1 = mean(score1), mean_score2 = mean(score2)) %>%
  to_h2h_mat()
#> Using mean_score1 as value.
#> # A matrix format of Head-to-Head values:
#> 1 2 <NA>
#> 1 4.0 5.500000 1.0
#> 2 5.5 6.333333 NA
#> <NA> 2.5 NA 2.5
```

All this functionality is powered by `long_to_mat()` and `mat_to_long()`, functions that are also useful outside of `comperes`. They convert general pair-value data between long and matrix formats:

```
pair_value_long <- tibble(
  key_1 = c(1, 1, 2),
  key_2 = c(2, 3, 3),
  val = 1:3
)
pair_value_mat <- pair_value_long %>%
  long_to_mat(row_key = "key_1", col_key = "key_2", value = "val")
pair_value_mat
#> 2 3
#> 1 1 2
#> 2 NA 3
pair_value_mat %>%
  mat_to_long(
    row_key = "key_1", col_key = "key_2", value = "val",
    drop = TRUE
  )
#> # A tibble: 3 x 3
#> key_1 key_2 val
#> <chr> <chr> <int>
#> 1 1 2 1
#> 2 1 3 2
#> 3 2 3 3
```
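To make the long-to-matrix round trip concrete, here is a base R sketch of the same conversion using matrix indexing by dimension names. It is an approximation for illustration, not the code behind `long_to_mat()` and `mat_to_long()`; the names `mat` and `long_again` are introduced here only for the example.

```r
# Long pair-value data (same values as above, as a base data frame)
pair_value_long <- data.frame(
  key_1 = c(1, 1, 2),
  key_2 = c(2, 3, 3),
  val   = 1:3
)

# Long -> matrix: one row per unique `key_1`, one column per unique `key_2`
row_names <- as.character(sort(unique(pair_value_long$key_1)))
col_names <- as.character(sort(unique(pair_value_long$key_2)))
mat <- matrix(
  NA_integer_,
  nrow = length(row_names), ncol = length(col_names),
  dimnames = list(row_names, col_names)
)
# A two-column character matrix indexes cells by (row name, column name)
mat[cbind(
  as.character(pair_value_long$key_1),
  as.character(pair_value_long$key_2)
)] <- pair_value_long$val

# Matrix -> long: one row per cell, then drop missing pairs
long_again <- as.data.frame(as.table(mat), stringsAsFactors = FALSE)
names(long_again) <- c("key_1", "key_2", "val")
long_again <- long_again[!is.na(long_again$val), ]

mat
long_again
```

As in the `mat_to_long()` output above, the keys come back as character because they are recovered from the matrix dimension names.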

For some ranking algorithms it is crucial that games are played between exactly two players. `comperes` has the function `to_pairgames()` for this. It removes games with one player. Games with three or more players are split by `to_pairgames()` into **separate games** between unordered pairs of different players, in no specific order. **Note** that game identifiers are changed to integers, but the order of the initial games is preserved. Example:

```
to_pairgames(cr_long)
#> # A widecr object:
#> # A tibble: 5 x 5
#> game player1 score1 player2 score2
#> <int> <dbl> <int> <dbl> <int>
#> 1 1 1 1 NA 2
#> 2 2 1 1 NA 3
#> 3 3 NA 2 NA 3
#> 4 4 1 4 2 5
#> 5 5 2 6 1 7
```
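The splitting of a multi-player game into pairgames can be sketched with `combn()`, which enumerates unordered pairs. The example below handles only game "a1" and pairs rows by position, so the two `NA` players stay distinct. It is a simplified illustration and not the `to_pairgames()` implementation; `game_a1` and `pairgames` are names made up for this sketch.

```r
# Rows of game "a1" from the example results
game_a1 <- data.frame(
  player = c(1, NA, NA),
  score  = 1:3
)

# All unordered pairs of row positions: (1, 2), (1, 3), (2, 3)
pair_idx <- combn(nrow(game_a1), 2)

# Each pair becomes its own game with a new integer identifier
pairgames <- data.frame(
  game    = seq_len(ncol(pair_idx)),
  player1 = game_a1$player[pair_idx[1, ]],
  score1  = game_a1$score[pair_idx[1, ]],
  player2 = game_a1$player[pair_idx[2, ]],
  score2  = game_a1$score[pair_idx[2, ]]
)

pairgames
```

A game with `n` players yields `choose(n, 2)` pairgames, which is why the three-player game "a1" contributes three of the five rows in the output above.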