Tuesday, July 17, 2007

Using Video Game Technology to Improve Statistical Modeling

A Condensed and Basic Outline for Using Video Game Technology to Improve Statistical Modeling

The following ideas came to me upon reflection on the dearth of meaningful soccer statistics. You have goals, saves, assists, and perhaps a few other basic measures, but because the data are so limited - e.g. games frequently end 0-0 - it is exceedingly difficult to quantify player value. In baseball, by contrast - the sport in which the statistical arts are most advanced - there are dozens of metrics and an ever-increasing number of ways to manipulate and analyze them. This is because every stage and action of the game is discrete, measured, and recorded, and because the scoring itself is very simple: a base runner needs only 4 'points' (bases) to score a run. A double is +2 to the hitter, a single is +1 to the hitter and typically +1 or +2 to each existing base runner, and so on. Meanwhile, the defensive aspect of the game (minus pitching) is, at least in the traditional box score, simple in the extreme: either you make the play or you are charged with an error.

Soccer, however, is far more chaotic, continuous, and dependent on team play. For instance: how do we accurately measure the (essential) contributions of a fullback, who usually records neither saves nor goals nor assists? The answer may lie in borrowing the simulations we find in sports video games, which often assign each player nebulous values such as 'Speed', 'Power', and 'Control' on simple 1-100 scales. These values get cranked through a set of algorithms - or 'rules' for the game - that the programmers have written, and presto! - in a matter of seconds we get to see who wins the match between Manchester United and AC Milan. There is variation, of course, but over a series of simulations the 'better' team will become clear by the number and margin of its wins. (We can even fiddle with the results by sitting particular players on the bench, trading them for players on another team, or placing them in a different position.)
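To make the video-game idea concrete, here is a minimal sketch in Python of ratings being cranked through a rule and repeated over many matches. Everything in it is my own assumption for illustration: the ratings are invented, the team names are placeholders, and the 'rule' (each side gets a fixed number of chances and converts them in proportion to its share of the combined rating) is far cruder than anything a real game would use.

```python
import random

# Hypothetical ratings on the 1-100 scale; the numbers are invented for illustration.
TEAM_A = [{"speed": 85, "power": 80, "control": 88},
          {"speed": 78, "power": 82, "control": 84}]
TEAM_B = [{"speed": 70, "power": 65, "control": 72},
          {"speed": 68, "power": 74, "control": 66}]


def team_strength(players):
    """Collapse every attribute of every player into one average rating."""
    values = [value for player in players for value in player.values()]
    return sum(values) / len(values)


def simulate_match(strength_a, strength_b, rng, chances=10, base_rate=0.3):
    """Play one match and return (goals_a, goals_b).

    Assumed rule: each side gets `chances` scoring chances and converts each
    one with probability base_rate * (own strength / combined strength).
    """
    total = strength_a + strength_b
    p_a = base_rate * strength_a / total
    p_b = base_rate * strength_b / total
    goals_a = sum(rng.random() < p_a for _ in range(chances))
    goals_b = sum(rng.random() < p_b for _ in range(chances))
    return goals_a, goals_b


def simulate_series(team_a, team_b, n_matches=10_000, seed=0):
    """Repeat the match many times and tally wins, draws, and total goal margin."""
    rng = random.Random(seed)  # seeded so that a run can be reproduced
    s_a, s_b = team_strength(team_a), team_strength(team_b)
    tally = {"a_wins": 0, "b_wins": 0, "draws": 0, "goal_margin": 0}
    for _ in range(n_matches):
        goals_a, goals_b = simulate_match(s_a, s_b, rng)
        tally["goal_margin"] += goals_a - goals_b
        if goals_a > goals_b:
            tally["a_wins"] += 1
        elif goals_b > goals_a:
            tally["b_wins"] += 1
        else:
            tally["draws"] += 1
    return tally


if __name__ == "__main__":
    print(simulate_series(TEAM_A, TEAM_B))
```

Any single match can go either way, but across thousands of them the higher-rated side comes out ahead on both wins and margin. Benching or trading a player amounts to editing the lists and running the series again.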

My proposal, then, is that to assign proper value to players in the real world, we determine the algorithms - or 'rules' for the game - written not by human programmers but by nature. This can be accomplished in a series of steps, the first of which is to translate a sufficient number of actually played games from videotape into computational models. The playing field then becomes an x-y coordinate grid; the ball and each player, a point in motion. With the games thus transcribed, we have all the positional information we could ask for, and we can set about the arduous task of isolating the relevant variables. For example: what effect does any given player's running speed have on the outcome of the game? The speed with which he kicks the ball? His accuracy in passing to his teammates? Which player-point geometries - i.e. 'formations' - are most effective in scoring or denying goals? Regarding 'intangibles' like morale: what is the effect on the above-named variables of a teammate's yellow card? Red card? Injury? How 'pumped up' do the players get when the feared or inspirational 'team leader' comes onto the field? Comes within x meters of a given teammate? Opponent? And so on - the variables one could isolate are practically infinite.
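As a rough illustration of what 'points in motion' might look like once transcribed, the Python sketch below treats each player as a list of (time, x, y) samples and pulls two of the variables mentioned above out of it: running speed, and time spent within x meters of another player. The sample format, the units (seconds and meters), and the function names are all assumptions of mine; real tracking data would be noisier and far denser.

```python
import math


def speeds(track):
    """Speed in m/s between consecutive samples of one player's track.

    `track` is a list of (t_seconds, x_meters, y_meters) tuples, assumed to be
    sorted by time with strictly increasing timestamps.
    """
    result = []
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        distance = math.hypot(x1 - x0, y1 - y0)
        result.append(distance / (t1 - t0))
    return result


def top_speed(track):
    """Fastest speed the player reached between any two consecutive samples."""
    return max(speeds(track))


def fraction_within(track_a, track_b, radius):
    """Fraction of samples in which two players are within `radius` meters.

    Assumes both tracks were sampled at the same instants, so the i-th sample
    of one lines up with the i-th sample of the other.
    """
    close = 0
    pairs = list(zip(track_a, track_b))
    for (_, xa, ya), (_, xb, yb) in pairs:
        if math.hypot(xa - xb, ya - yb) <= radius:
            close += 1
    return close / len(pairs)


if __name__ == "__main__":
    # Invented samples: one second apart, positions in meters.
    leader = [(0, 0.0, 0.0), (1, 3.0, 4.0), (2, 3.0, 4.0)]
    teammate = [(0, 0.0, 1.0), (1, 10.0, 10.0), (2, 3.0, 5.0)]
    print(speeds(leader))
    print(fraction_within(leader, teammate, radius=2.0))
```

Passing accuracy and formation geometry would need events (who kicked the ball, and when) layered on top of the raw coordinates, which this sketch does not attempt.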

Once the most decisive factors have been isolated, we may begin to infer both their role in, and the very logic or program of, the game as it is actually played. (Essentially we are treating the game and all its variants as an equation to be solved. Of the technical difficulties and details of such a process, I confess, I am utterly ignorant, and I leave them to more disciplined and resourceful minds.) This task accomplished, it will become possible to predict - with 'statistical certainty' - the winners and losers of given matches, the contributions certain players will make to certain teams, and, in short, the probability of any outcome you wish to discover. Moreover, provided that the potential roadblocks are surmountable, this relatively simple approach may find success in applications to dynamic systems far afield from mere spectator sports.
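I have left the actual method open, so what follows is only one possible first step and not a claim about how the 'equation' would really be solved: an ordinary least-squares line relating a single isolated variable to match outcome, written in plain Python. The choice of variable (a team's passing accuracy), the outcome measure (goal difference), and every number in the example are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept.

    Returns (slope, intercept). Assumes at least two distinct x values.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predicted outcome for a new value of the variable."""
    return slope * x + intercept


if __name__ == "__main__":
    # Invented data: per-match passing accuracy and the resulting goal difference.
    passing_accuracy = [0.60, 0.70, 0.80, 0.90]
    goal_difference = [-1, 0, 1, 2]
    slope, intercept = fit_line(passing_accuracy, goal_difference)
    print(slope, intercept)
    print(predict(slope, intercept, 0.75))
```

A one-variable fit like this shows association only, and it ignores exactly the team interactions that make soccer hard; a serious attempt would have to handle many variables at once.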
