---
title: "Custom Simulations"
author: Tan Ho
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Custom Simulations}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5,
  eval = FALSE
)

options(dplyr.summarise.inform = FALSE)
```

ffsimulator also provides the component subfunctions (prefixed with `ffs_`) in case you want to customize and run certain components individually. This vignette walks through the options and paths you might use in your own analysis, and then shows the customizations needed to simulate the [2021 Scott Fish Bowl](https://scottfishbowl.com).

You can also use the `ffs_copy_template()` function to quickly get started with your own custom simulation!

```{r setup, message = FALSE}
library(ffsimulator)
library(ffscrapr)
library(dplyr)
library(ggplot2)
```

## Overview of ff_simulate()

Loosely speaking, the main `ff_simulate()` function has the following components:

- importing data
- generating projections
- calculating roster scores
- building schedules
- aggregating by week, season, and simulation

so we'll reuse this structure in this vignette!

### Importing Data

By default, `ff_simulate()` imports the following data based on the ff_connect `conn` object that is passed in:

```{r eval = FALSE}
scoring_history <- ffscrapr::ff_scoringhistory(conn, seasons = 2012:2020)
```

This retrieves week-level fantasy scoring for the specified years, and is built from nflfastR weekly data combined with the league rules specified on the platform.

```{r eval = FALSE}
latest_rankings <- ffs_latest_rankings(type = "draft") # also "week", for inseason sims
```

This retrieves the latest FantasyPros positional rankings available from the [DynastyProcess data repository](https://github.com/dynastyprocess/data).
If you want to customize the rankings used in your simulation, you can replace this `latest_rankings` dataframe with your own, as long as it has a similar structure and column names. The important columns are `ecr` (positional), `sd`, `bye`, and `fantasypros_id`.

```{r eval = FALSE}
rosters <- ffs_rosters(conn)
```

This retrieves rosters and attaches a `fantasypros_id` to them. You could run hypothetical scenarios such as trades by editing this rosters dataframe by hand and then running the simulation!

```{r eval = FALSE}
lineup_constraints <- ffs_starter_positions(conn)
```

This retrieves lineup constraints from your fantasy platform. You can edit these to test out hypothetical starting lineup settings and minimum requirements!

```{r eval = FALSE}
league_info <- ffscrapr::ff_league(conn)
```

This brings in league information that is primarily used for plot names.

### Generating Projections

`ff_simulate()` runs two functions to generate "projections": the first builds the population of weekly scores to resample from, and the second runs the bootstrap resampling for n_seasons x n_weeks.

```{r eval = FALSE}
adp_outcomes <- ffs_adp_outcomes(
  scoring_history = scoring_history,
  gp_model = "simple", # or "none"
  pos_filter = c("QB","RB","WR","TE","K")
)
```

This builds out the population of weekly outcomes for each positional adp rank, using the above-mentioned scoring history as well as `fp_rankings_history` (2012-2020 historical positional rankings) and `fp_injury_table` (an injury model).

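
To make the resampling idea concrete, here is a toy sketch in base R. It is not ffsimulator's internal code: the `week_outcomes` values are made up, and the real functions work on the full `adp_outcomes` table rather than a single vector. It only shows what "bootstrap resampling for n_seasons x n_weeks" means for one positional rank.

```r
# Toy illustration only: NOT ffsimulator's implementation.
# Hypothetical population of weekly scores for one positional rank;
# the 0 stands in for a missed game.
set.seed(1)
week_outcomes <- c(4.2, 18.7, 9.1, 0, 22.4, 11.3, 7.8, 15.0)

n_seasons <- 3
n_weeks <- 14

# Bootstrap: draw with replacement, one score per season-week
draws <- sample(week_outcomes, size = n_seasons * n_weeks, replace = TRUE)

projections <- data.frame(
  season = rep(seq_len(n_seasons), each = n_weeks),
  week = rep(seq_len(n_weeks), times = n_seasons),
  projected_score = draws
)

head(projections)
```

Every simulated week is an outcome that was actually observed for that rank, so the shape of the historical distribution (including busts and missed games) carries through to the simulation.
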
```{r eval = FALSE}
projected_scores <- ffs_generate_projections(
  adp_outcomes = adp_outcomes,
  latest_rankings = latest_rankings,
  n_seasons = 100, # number of seasons
  weeks = 1:14, # specifies which weeks to generate projections for
  rosters = rosters # optional, reduces the sample to just rostered players
)
```

This takes the `adp_outcomes` table, the latest rankings, some parameters (number of seasons, specific weeks), and rosters. It generates a dataframe of length n_seasons x n_weeks x nrow(latest_rankings) and automatically blanks out NFL bye weeks.

### Calculating Roster Scores

This is a simple process conceptually, but probably the most computationally expensive part of the simulation: first, inner join the projected_scores for each player onto the rosters, then run a linear programming optimizer to determine the optimal lineup and calculate the week's score.

```{r eval = FALSE}
roster_scores <- ffs_score_rosters(
  projected_scores = projected_scores,
  rosters = rosters
)
```

This function performs an inner join of these two tables and calculates position rank for each player (based on the scores for each week).

```{r eval = FALSE}
optimal_scores <- ffs_optimise_lineups(
  roster_scores = roster_scores,
  lineup_constraints = lineup_constraints,
  lineup_efficiency_mean = 0.775,
  lineup_efficiency_sd = 0.05,
  best_ball = FALSE, # or TRUE
  pos_filter = c("QB","RB","WR","TE","K")
)
```

This function runs the lineup optimisation and applies a small lineup efficiency model. Lineup efficiency refers to the ratio of "actual lineup score" to "optimal lineup score". It is generated as a random number that is normally distributed around 0.775 (77.5%) with a standard deviation of 0.05. This puts lineup efficiency somewhere between 0.67 and 0.87 most of the time (roughly two standard deviations either side of the mean), which is (in my experience) the typical range. You can adjust the lineup efficiency model for yourself, or perhaps apply your own modelling afterwards.
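
If you want to sanity-check a different mean or standard deviation before passing it in, a quick sketch like the one below shows the range it implies. This only mirrors the description above (a normal draw applied to the optimal score) and is not the package's internal code; `optimal_score` is a made-up value.

```r
# Sketch only: mirrors the description in the text, not the package internals.
set.seed(42)
lineup_efficiency_mean <- 0.775
lineup_efficiency_sd <- 0.05

# Draw one efficiency per hypothetical team-week
efficiency <- rnorm(10000, mean = lineup_efficiency_mean, sd = lineup_efficiency_sd)

# Range covering the middle 95% of draws (about two sd either side of the mean)
quantile(efficiency, c(0.025, 0.975))

# Apply one draw to a hypothetical optimal lineup score
optimal_score <- 150
actual_score <- optimal_score * efficiency[1]
```
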
Setting `best_ball = TRUE` forces lineup efficiency to be 100% of the optimal score.

There are options to use parallel processing. In my experience, 100 seasons of a 12 team league is too small to see any benefit from running in parallel. I'd recommend it for larger simulations, e.g. 1000 seasons of a 12 team league, or 100 seasons of a 1,920 team contest (like SFB)!

### Building Schedules

In order to calculate head to head wins, you need a schedule! Enter `ffs_build_schedules()`:

```{r eval = FALSE}
schedules <- ffs_build_schedules(
  n_seasons = n_seasons,
  n_weeks = n_weeks,
  seed = NULL,
  franchises = ffs_franchises(conn)
)
```

This efficiently builds a randomized head to head schedule for a given number of seasons, teams, and weeks.

It starts with the [circle method for round robin scheduling](https://en.wikipedia.org/wiki/Round-robin_tournament#Scheduling_algorithm), grows or shrinks the schedule to match the required number of weeks, and then shuffles both the order that teams are assigned in and the order that weeks are generated. This doesn't "guarantee" unique schedules, but there are `n_teams! x n_weeks!` permutations of the schedule, so it's very likely that the schedules are unique (3x10\^18 possible schedules for a 12 team league playing 13 weeks).

### Aggregating Results

Now that we have a schedule, we can aggregate by week, and then by season, and then by simulation:

```{r eval = FALSE}
summary_week <- ffs_summarise_week(optimal_scores, schedules)
summary_season <- ffs_summarise_season(summary_week)
summary_simulation <- ffs_summarise_simulation(summary_season)
```

Each summary function feeds into the next summary function!

## SFB Simulation

Okay! So now that we've done that, let's have a look at how I'd customize these functions to simulate SFB11 - a 1,920 team contest spread over 20 league IDs. (This code typically takes about 30 minutes to run in parallel and eats up about 40GB of memory.)

```{r eval = FALSE}
options(ffscrapr.cache = "filesystem")
library(ffsimulator)
library(ffscrapr)
library(tidyverse)
library(tictoc) # for timing!

set.seed(613)
```

First, load the packages. I set ffscrapr to cache to my hard drive, and set a seed for reproducibility.

### Import Data

```{r eval = FALSE}
conn <- mfl_connect(2021, 47747) # a random SFB league to grab league info from

league_info <- ffscrapr::ff_league(conn)

scoring_history <- ffscrapr::ff_scoringhistory(conn, 2012:2020)

adp_outcomes <- ffs_adp_outcomes(
  scoring_history = scoring_history,
  gp_model = "simple",
  pos_filter = c("QB","RB","WR","TE","K")
)

latest_rankings <- ffs_latest_rankings()

lineup_constraints <- ffs_starter_positions(conn)
```

We can use one league ID here to grab the historical scoring data, rules, and lineup settings for the entire SFB contest (rather than running it once for each of the twenty league IDs).

```{r eval = FALSE}
conn2 <- mfl_connect(2021)

# Find every SFB11 league, dropping mocks, copies, satellites, and templates
leagues <- mfl_getendpoint(conn2, "leagueSearch", SEARCH = "#SFB11") %>%
  pluck("content","leagues","league") %>%
  tibble() %>%
  unnest_wider(1) %>%
  filter(str_detect(name,"Mock|Copy|Satellite|Template",negate = TRUE))

get_rosters <- function(league_id){
  mfl_connect(2021, league_id) %>%
    ffs_rosters()
}

get_franchises <- function(league_id){
  mfl_connect(2021, league_id) %>%
    ff_franchises()
}

rosters_raw <- leagues %>%
  select(-homeURL) %>%
  mutate(
    rosters = map(id, get_rosters),
    franchises = map(id, get_franchises)
  )

franchises <- rosters_raw %>%
  select(league_id = id, franchises) %>%
  unnest(franchises) %>%
  select(league_id, franchise_id, division_name)

rosters <- rosters_raw %>%
  select(rosters) %>%
  unnest(rosters) %>%
  left_join(franchises,by = c("league_id","franchise_id"))
```

Because SFB is spread over multiple league IDs, we need to get a list of IDs from the leagueSearch endpoint, map over them with the `get_rosters()` and `get_franchises()` helper functions we just defined, and attach the division name.

```{r eval = FALSE}
n_seasons <- 100
n_weeks <- 13

projected_scores <- ffs_generate_projections(
  adp_outcomes = adp_outcomes,
  latest_rankings = latest_rankings,
  n_seasons = n_seasons,
  weeks = 1:14,
  rosters = rosters
)

tictoc::tic(glue::glue("ffs_score_rosters {Sys.time()}"))
roster_scores <- ffs_score_rosters(projected_scores, rosters)
tictoc::toc()

# glue::glue() is needed here so that {Sys.time()} is interpolated
tictoc::tic(glue::glue("ffs_optimize_lineups {Sys.time()}"))
optimal_scores <- ffs_optimize_lineups(
  roster_scores = roster_scores,
  lineup_constraints = lineup_constraints,
  pos_filter = c("QB","RB","WR","TE","K"),
  best_ball = FALSE
)
tictoc::toc()
```

These are pretty straightforward. I use tictoc here to time the most expensive parts of the computation so that I know how long they take - on my machine, this takes between 20 and 25 minutes to compute.

```{r eval = FALSE}
schedules <- ffs_build_schedules(
  franchises = franchises,
  n_seasons = n_seasons,
  n_weeks = n_weeks
)

summary_week <- ffs_summarise_week(optimal_scores, schedules)
summary_season <- ffs_summarise_season(summary_week)
summary_simulation <- ffs_summarise_simulation(summary_season)
```

By comparison, these are very fast to compute (a minute or two total).