#17 | 12.02.2002, 16:48 | Cogito
Dear people,

I can only try once more to direct your attention to the Elo system:

1.
Calculating points
Points are calculated according to the Elo system. The Elo system was developed around 1960 by Arpad Elo, mainly to allow an objective, long-term rating of chess players. It grew into the most widely used rating system, and that is exactly why we chose it: it has been proven over a long time and can be mapped onto many other 'sports'. (An example of the mathematical calculation can be found at the end of this text.)

The goal of the Elo system is not to hand out ever more points, but to assign each player exactly the points he has earned. A team rated 1000 thus corresponds to the average team, whereas a team with 1500 points is considerably better. The 1500 team is expected to beat a 1000 team, and to do so by a certain minimum percentage ratio. This ratio can be calculated in different ways:

For example, in 1on1 it is quite simple:
Player A has 20 frags, B has 5. So A has 80% (20/(20+5)*100) and B has 20% (5/(20+5)*100).
Player A thus meets the expectations exactly and receives 0 points, because he performed exactly as 1500 points demand. He was neither better nor worse. The same therefore applies to player B, who was rated at 1000 points.
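A minimal sketch of that update in Python. The logistic mapping from rating difference to expected share is an assumption carried over from the chess write-up quoted below; with the 400-point scale used there, a 500-point gap would actually expect about 95%, so a scale of roughly 830 is what makes 1500 vs. 1000 come out at exactly 80/20:

[code]
def expected_share(r_a, r_b, scale=830):
    # scale chosen so a 500-point gap expects an 80/20 split;
    # the post does not state the ladder's actual scale
    return 1 / (1 + 10 ** ((r_b - r_a) / scale))

def actual_share(frags_a, frags_b):
    return frags_a / (frags_a + frags_b)

actual = actual_share(20, 5)            # 0.80
expected = expected_share(1500, 1000)   # ~0.80
change = 32 * (actual - expected)       # ~0: the rating stays put
[/code]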

Another example, this time CTF:
Team A rated 1500 points, team B rated 1000 points.
First map:
Team A 3 captures, team B 8
Second map:
Team A 4 captures, team B 1

Ratios:
A : B
First map: 27% to 73%
Second map: 80% to 20%
Total: 107% to 93%

Team B has thus performed far better than expected and accordingly has points added to its current rating. Team A did win, but not by enough, and so has points deducted, since it fell short of the expectations.
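A quick Python check of those numbers; the summing of per-map shares is exactly what the example does, only the variable names are mine:

[code]
def map_share(caps_a, caps_b):
    return caps_a / (caps_a + caps_b)

maps = [(3, 8), (4, 1)]   # (team A captures, team B captures)
total_a = sum(map_share(a, b) for a, b in maps)   # 0.27 + 0.80 = 1.07
total_b = len(maps) - total_a                     # 0.73 + 0.20 = 0.93
# With an 80/20 expectation per map, team A was expected to total 1.60,
# so A fell short of expectations despite winning on aggregate.
[/code]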

At the start of the ladder the rating system is not yet very meaningful, but the longer someone plays, the more accurate his rating becomes.

2.
I expound on how the United States Chess Federation has rated players since it first instituted its rating system in 1952. Chess is different from a number of sports in that the only objective information that can be drawn from a game is who won; basketball, football, hockey, etc. provide additional information in the form of scores and other statistics. I accordingly discuss some potential modifications.
The Elo rating system
This system was cooked up by a guy named Arpad Elo and is named after him. It requires an initial set of ratings; I won't bother explaining how the USCF assigns them, except to note that it's very similar to the RPI I discuss elsewhere. (Chess is less susceptible to the problem of a team that plays most of its games against teams that are much better or much worse than it is; this is because almost all games rated by the USCF are played in tournaments in which players who are doing well play each other, so that even if a player is paired against an opponent who is much better than he is at the beginning of the tournament, he will be playing much weaker opponents by the end of it.) It's not terribly important how the initial ratings are set, so long as enough games are played after this for the Elo system to correct mistakes, which will eventually die out. Once ratings are established, the assumption is that a player who holds a given rating advantage over another player should win a certain percentage of his games against that player.
Elo assumed, and the system seems to work, that this advantage is essentially multiplicative; if player A beats player B 3 times as often as vice versa, and player B beats player C 3 times as often as vice versa, then player A should beat player C 9 times as often as the other way around. The ratings, then, are assigned logarithmically, so that the ratio (3) becomes a difference (proportional to the log of 3). There's still a scale factor to be decided, and the USCF uses a rating system where a difference of 400 points is a factor of ten; you can then calculate the probability that player A will beat player B by taking the difference between the ratings, dividing by 400, raising ten to that power, then dividing this by one more than itself (so that the probability of A beating B plus the probability of B beating A is equal to 1). The sticklers will be interested to know that a draw counts as half a win for each side; if two players play 4 games, I don't distinguish between two wins and two draws versus three wins and a loss.
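For concreteness, the same calculation as a short Python function (nothing here beyond the formula just described):

[code]
def win_probability(r_a, r_b):
    # probability that A beats B: a 400-point difference is a factor of ten
    x = 10 ** ((r_a - r_b) / 400)
    return x / (1 + x)

# A rated 400 points above B: x = 10, so P = 10/11, about 0.91.
# Note that win_probability(a, b) + win_probability(b, a) == 1.
[/code]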

Even if we don't accept the assumption of multiplicativity, the next step is very reasonable so long as we have some way of calculating probabilities from ratings. (The very concept of "ratings" requires an assumption of transitivity: if team A is better than team B and B is better than C, then A is better than C, where "better than" means "will win more often than lose". This assumption is almost certainly not exactly correct, but it is often close. The fact that it is not always true is one of my complaints about single-elimination tournaments.) If a player wins, his rating increases in proportion to the probability that he would lose; if the player loses, his rating decreases in proportion to the probability that he would win. Thus if the player wins exactly as often as he is supposed to, his rating stays fixed; if he wins more, his rating goes up over time, while if he wins less it goes down. (For most players, the USCF changes the rating by 32 times the relevant probability; again, a draw is half a win and half a loss.)
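And the update step itself, again as a sketch (32 is the factor quoted above for most players; a draw counts as result 0.5):

[code]
def rating_change(result, p_win, k=32):
    # result: 1.0 for a win, 0.5 for a draw, 0.0 for a loss
    return k * (result - p_win)

# Winning as a 10/11 favorite:  32 * (1 - 0.909) ~= +2.9 points.
# Losing that same game:        32 * (0 - 0.909) ~= -29.1 points.
[/code]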

For initializing ratings, you could calculate an RPI and multiply by 400, which is essentially what the USCF does. For sports, where everyone's season begins at the same time, we can alternatively just assign everyone the same rating at the beginning of the season and run all the games they played through this procedure twenty or thirty times until the ratings stabilize. Note that if a team is undefeated or winless, the rating will not stabilize; it will just run off. My solution is to multiply the rating by some fraction like .75 after each iteration. In some sense this treats the team as having lost a fraction of a game. This problem is somewhat ameliorated by the consideration of score presented later.
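A sketch of that fitting loop in Python, under the stated assumptions: everyone starts at the same rating (0 here), every game is replayed through the update on each pass, and the whole rating is multiplied by a damping fraction like .75 after each pass:

[code]
def win_prob(r_a, r_b):
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def fit_season_ratings(games, teams, iterations=30, k=32, damp=0.75):
    # games: list of (team_a, team_b, result_for_a), result 1.0/0.5/0.0
    ratings = {t: 0.0 for t in teams}
    for _ in range(iterations):
        delta = {t: 0.0 for t in teams}
        for a, b, result in games:
            p = win_prob(ratings[a], ratings[b])
            delta[a] += k * (result - p)
            delta[b] += k * ((1 - result) - (1 - p))
        for t in teams:
            # the damping keeps an undefeated team's rating from running off
            ratings[t] = (ratings[t] + delta[t]) * damp
    return ratings
[/code]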
The USCF, of course, can't do this; where would it start from? As individuals enter and leave at different times, there is no start of the season from which to run the ratings.
The effect of changing that 32 I presented above is to change how quickly a team's rating moves; an increase will increase its volatility, which is undesirable, but it will also allow you to take account of a team's improvement. A smaller number is, in effect, an average over a longer period of time; 32 effectively averages, in practice, over about twenty or twenty-five games, so that if a chess player gets markedly better, it will take about that long before the improvement is adequately reflected in his rating.
Winning and Winning Big
This, like the RPI, thus far only accounts for whether a team wins or loses. There are different sorts of ratings out there, some intending to predict margins of victory and others intending only to predict which team will win, and I favor the latter. Defense is a legitimate way to win games, and I think a basketball team that wins all of its games 2-0 against any competition is the best team. Winning games is ultimately the point. Teams in hockey will pull the goalie, and teams in basketball will foul the opponent, at the end of close games; these strategies are likely to give the opponents more points, and they tend to increase the margin of victory, but they also increase the number of points scored, so that a deficit is more likely to be closed than if such steps are not taken. If your team is down 4 and there is time for 3 more points to be scored, you are in worse shape than if you are down 6 and there is time for 10 more points to be scored. A significant margin of victory can thus often be created by a last-ditch effort to save a lost game.
This isn't to say, however, that it is not fair for a ranking system to use scores, statistics, or astrology to predict which team is going to win; while I've defined what I consider desirable in a ranking system, any system that has been concocted is "good" so long as it meets that single criterion: it must predict in advance the winners of games. I expect it is fair to assume that a team that wins a game 72-71 on a last second shot would have been more likely to lose that game than a team that wins the game 91-50; if these results are recorded against the same team, one would suspect that the second team is better than the first. This leads me, then, to my alteration of the Elo system.

A Generalized Elo System
The Elo system says we can predict a team's probability of winning, and that a team that is more likely to win than its rating suggests should have its rating increased. In the same vein as this eminently sensible assumption comes our intuition that the score of a game also correlates with a team's likelihood of repeating the result; the team that won 72-71 has perhaps a 53% chance of winning a rematch, while the team that won 91-50 may have a 99% chance of winning a rematch. Presumably, then, if we had predicted that the former team had a 60% chance of winning, it would not be unreasonable to lower our estimate of that team somewhat based on new data suggesting that the teams are more evenly matched. At the same time, let's not forget the previous caveats against penalizing a team that widens a margin in an attempt to win the game; I think a certain amount of credit simply has to be given for the win, beyond any smooth function of the margin of victory.
To make things clearer, I write an equation. (I'm serious. I don't sympathize with people who are blindly frightened by equations; I often find them much clearer than words.) Let w = 1 if the team wins and w = 0 if the team loses, and let P be the probability that the team wins; then in the Elo system we change the rating by 32*(w-P), 32 times the difference between the result and the expected result. All I propose to modify is the definition of w: w is going to be the probability I think the team has of winning a rematch, based on its performance in the game. In the above example, I might say w=.53 for the team that wins 72-71. This is where I bring in my caveat about teams that gamble to win; I might get 80% of the value for w from the score, but some of it, say 20%, should be based only on whether they win. In this case .8*.53+.2*1=.62, so I assign the value .62 to w; if I thought the team had a 60% chance of winning, I might actually bump them up a little then, because, even if it was close, they *did* win.
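In code, with the 80/20 split between score-based and win-based credit used in the example:

[code]
def blended_w(score_w, won, score_weight=0.8):
    # score_w: rematch-win probability implied by the score alone
    return score_weight * score_w + (1 - score_weight) * (1.0 if won else 0.0)

def generalized_change(w, p_win, k=32):
    return k * (w - p_win)

# The 72-71 winner: w = 0.8 * 0.53 + 0.2 * 1 = 0.624.
# Against a prediction of P = 0.60 the change is 32 * 0.024 ~= +0.8,
# a small bump up, because, close as it was, they did win.
[/code]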

Poisson Statistics
For hockey, I can provide what I think is a very reasonable model for the probability of a team winning a rematch based on the score of a game. The same model works for soccer, but seems less justified for the more popular American sports, at least without some alteration.
I assume in hockey that, in any given 20 seconds or so, team A will have a certain (small) probability of scoring against team B, and team B will have a different (small) probability of scoring against team A. So long as these probabilities are constant throughout the game, that they don't change appreciably when the score changes, and so long as these probabilities are small, the probability of team A scoring a certain number of goals over the course of a game will follow a distribution well known to physicists, the Poisson distribution. The expected value, of course, is 180, the number of 20-second periods in a hockey match, times P, the probability of scoring a goal in one of those periods. For those with patience for combinatorics, I note that there are C(180,n) ways of scoring n goals, and that this gives a probability of C(180,n)P^n(1-P)^(180-n), which, for small n, is about (180^n/n!)*(P^n)*(1-P)^(180-n) = ((180*P)^n)(e^(-180*P))/n!, where n! is n factorial and e is 2.7183, the base of the natural logarithm; since 180*P is just the expected score m, we have a probability of (e^-m)(m^n)/n!, where the e^-m is just a constant that makes the total come out to 1, as a total of probabilities should.
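The same arithmetic in Python, for checking:

[code]
from math import exp, factorial

def poisson_pmf(n, m):
    # probability of exactly n goals when the expected score is m = 180 * P
    return (m ** n) * exp(-m) / factorial(n)

# A team expected to score 3 goals:
# poisson_pmf(0, 3) ~= 0.050, poisson_pmf(3, 3) ~= 0.224, poisson_pmf(6, 3) ~= 0.050
[/code]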

For larger values of m, and even, to a decent approximation, for smaller ones, this distribution is similar to a normal curve, or bell curve, whose standard deviation equals the square root of the expected value. The difference between two random numbers that follow separate normal distributions itself follows a normal distribution, whose standard deviation is the square root of the sum of the squares of the separate standard deviations. The punch line is that the difference between the scores should follow a distribution centered at m1-m2 with a width equal to the square root of m1+m2, so that, in hockey at least, the probability associated with a victory by a score of m1 to m2 is approximately erf((m1-m2)/sqrt(m1+m2)), where erf is the error function, which is tabulated in standard probability tables.
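As code, with one caveat: erf runs from -1 to 1, so reading the formula as a 0-to-1 probability needs a mapping such as (1 + erf(x)) / 2; that mapping is an assumption layered on top of the text:

[code]
from math import erf, sqrt

def hockey_score_w(m1, m2):
    # rematch-win estimate from a final score of m1 to m2
    x = (m1 - m2) / sqrt(m1 + m2)
    return (1 + erf(x)) / 2   # assumed mapping of erf's -1..1 range onto 0..1

# A 5-2 win: x = 3/sqrt(7) ~= 1.13, w ~= 0.95.
# A 3-2 win: x = 1/sqrt(5) ~= 0.45, w ~= 0.74.
[/code]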

If we gave each team 2 points for each goal it scored, we would of course double both m1 and m2, which would double the numerator of (m1-m2)/sqrt(m1+m2) but would only multiply the denominator by the square root of 2; in such a system we would want to use erf((m1-m2)/sqrt(2*(m1+m2))). Football might work with a fudge factor similar to this; in the case of football it would presumably be between 3 and 7. In basketball, however, and to a lesser extent in football, there is a heavy inverse correlation between the teams' scores, by which I mean that when one team scores, the other team gets the ball and is more likely than not to score before you get the ball back. The difference from hockey and soccer, then, is that in those sports there are very few goals scored per possession, which fits the given assumptions well. For basketball and football we need another term, giving us something like erf((m1-m2)/sqrt(n*(m1+m2-2r*sqrt(m1*m2)))), where n is a constant that roughly describes how many points a team scores in one go and r is a number between 0 and 1 that characterizes the correlation alluded to earlier. n and r will really have to be fit semiempirically, that is to say by just trying different numbers and seeing what works for each given sport, but once you have them you can use this formula to calculate the w to put into the rating-change formula I gave.
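And the generalized form, with n and r left as the knobs to be fit semiempirically; the defaults below are pure placeholders, not fitted values:

[code]
from math import erf, sqrt

def score_w(m1, m2, n=2.0, r=0.5):
    # n ~ points per scoring event; r in [0, 1] ~ the correlation term
    x = (m1 - m2) / sqrt(n * (m1 + m2 - 2 * r * sqrt(m1 * m2)))
    return (1 + erf(x)) / 2   # same 0..1 mapping assumption as above

# Basketball with these placeholders:
# score_w(91, 50) ~= 1.00 (a near-certain rematch win)
# score_w(72, 71) ~= 0.55
[/code]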

Maybe the experts can make something of this. If not, then perhaps http://www.schachbund.de/WOrdng/ORDWO1.HTM
will help. There you can look up the regulations for the DWZ number, another variant (unfortunately somewhat worse than the Elo number).

So: DON'T REINVENT THE WHEEL!
__________________
Always play nice and nasty!