1. What are these systems?
These are the names of the systems participants submitted to this competition.
These systems generate jokes given a prompt (e.g., a headline).
The names of the systems are composed of the participant name, a hyphen, and the submission number,
as obtained from our CodaBench competition website.
2. Why do some systems have similar names?
Participants are allowed to make multiple submissions,
so the same participant name can appear multiple times, each with a different submission number.
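If you want to group a leaderboard's rows by participant, the naming scheme above can be parsed as in the minimal sketch below. It assumes that the submission number is always the text after the *last* hyphen, so that participant names that themselves contain hyphens still split correctly. The function names and the sample names (`alice`, `bob`) are hypothetical.

```python
from collections import defaultdict


def split_system_name(name: str) -> tuple[str, int]:
    """Split a hypothetical system name like 'alice-3' into ('alice', 3).

    Assumes the submission number follows the last hyphen, so names such as
    'team-x-2' become ('team-x', 2).
    """
    participant, _, number = name.rpartition("-")
    return participant, int(number)


def group_by_participant(names: list[str]) -> dict[str, list[int]]:
    """Map each participant to the sorted list of their submission numbers."""
    groups: dict[str, list[int]] = defaultdict(list)
    for name in names:
        participant, number = split_system_name(name)
        groups[participant].append(number)
    return {p: sorted(ns) for p, ns in groups.items()}


print(group_by_participant(["alice-1", "alice-3", "bob-2", "baseline-1"]))
```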
3. What's baseline-1?
It's the name of a system provided by the competition organizers as a baseline.
4. How are the systems evaluated?
We use this annotation web page to let anyone on the Internet help us
decide which system is funnier in one-on-one, arena-style battles, partially inspired by LMArena.
With all the annotations, we compute an Elo-like rating for each system,
similar to the ratings used by LMArena and in games such as chess.
A higher rating indicates a system that is more likely to generate outputs perceived as humorous.
More specifically, we employ a Bradley-Terry model to compute stable ratings,
and apply bootstrapping to compute 95% confidence intervals.
Note that, in some edge cases, the confidence intervals and the final rating values
may not fully agree.
See this blog post from LMSYS Org for more info.
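To make the rating procedure more concrete, here is a minimal sketch of a Bradley-Terry fit with bootstrapped 95% intervals. It is not the competition's actual code. It assumes a hypothetical battle format `(system_a, system_b, outcome)`, counts a tie as half a win for each side, fits strengths with the classic MM (minorization-maximization) algorithm, adds a small prior (one virtual win and one virtual loss against a fixed opponent) so that strengths stay finite, and maps strengths to an Elo-like scale anchored at 1000. Because the point rating is fitted on all battles while the interval comes from percentiles of resampled fits, the two are computed differently, which is one way they can fail to line up in edge cases.

```python
import math
import random
from collections import defaultdict

# Hypothetical battle format: (system_a, system_b, outcome), outcome in {"a", "b", "tie"}.
Battle = tuple[str, str, str]


def fit_bradley_terry(battles: list[Battle], iters: int = 200) -> dict[str, float]:
    """Fit Bradley-Terry strengths with the MM algorithm and map them to an
    Elo-like scale: rating = 1000 + 400 * log10(strength), so that
    P(i beats j) = 1 / (1 + 10 ** ((R_j - R_i) / 400)).
    """
    wins: dict[str, float] = defaultdict(float)
    games: dict[tuple[str, str], float] = defaultdict(float)
    systems: set[str] = set()
    for a, b, outcome in battles:
        systems.update((a, b))
        games[(a, b)] += 1
        games[(b, a)] += 1
        if outcome == "a":
            wins[a] += 1
        elif outcome == "b":
            wins[b] += 1
        else:  # a tie counts as half a win for each side
            wins[a] += 0.5
            wins[b] += 0.5

    strength = {s: 1.0 for s in systems}
    for _ in range(iters):
        new = {}
        for i in systems:
            # Assumed prior: one virtual win and one virtual loss against a
            # fixed opponent of strength 1, which keeps every strength finite.
            denom = 2.0 / (strength[i] + 1.0)
            for j in systems:
                n_ij = games.get((i, j), 0.0)
                if j != i and n_ij:
                    denom += n_ij / (strength[i] + strength[j])
            new[i] = (wins[i] + 1.0) / denom
        strength = new
    return {s: 1000 + 400 * math.log10(p) for s, p in strength.items()}


def bootstrap_ci(
    battles: list[Battle], rounds: int = 100, seed: int = 0
) -> dict[str, tuple[float, float]]:
    """95% percentile intervals from refitting on battles resampled with replacement."""
    rng = random.Random(seed)
    samples: dict[str, list[float]] = defaultdict(list)
    for _ in range(rounds):
        resample = [rng.choice(battles) for _ in battles]
        for s, r in fit_bradley_terry(resample).items():
            samples[s].append(r)
    ci = {}
    for s, rs in samples.items():
        rs.sort()
        ci[s] = (rs[int(0.025 * (len(rs) - 1))], rs[math.ceil(0.975 * (len(rs) - 1))])
    return ci


# Made-up votes, for illustration only.
battles = (
    [("A", "B", "a")] * 8 + [("A", "B", "b")] * 2
    + [("B", "C", "a")] * 7 + [("B", "C", "tie")] * 3
)
ratings = fit_bradley_terry(battles)
intervals = bootstrap_ci(battles)
for s in sorted(ratings, key=ratings.get, reverse=True):
    print(s, round(ratings[s]), [round(x) for x in intervals[s]])
```

The MM update was chosen here only because it needs nothing beyond the standard library; a logistic-regression fit on the same data would be an equally valid way to obtain Bradley-Terry ratings.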
5. Why do some systems have the same rank?
Some systems have the same rank because we can't differentiate them in a statistically significant way,
even when their ratings are different.
Note that ties aren't transitive.
For example, we may not be able to tell which of A and B is better, nor which of B and C is better,
but we may still be able to tell, with statistical significance, that A is better than C.
That's why a system may be ranked lower than another one even when the difference between the two
isn't statistically significant: other systems that share the latter's rank may be significantly
better than the former.
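One common way to turn confidence intervals into ranks with these properties, used on LMArena-style leaderboards, is to rank each system as 1 plus the number of systems whose lower bound is above its upper bound. The sketch below assumes this rule; the FAQ does not state the exact rule the leaderboard uses, and the interval values are made up. It reproduces the A/B/C example above: A and B share rank 1, while C gets rank 2 because A (but not B) is separable from it.

```python
def rank_from_ci(ci: dict[str, tuple[float, float]]) -> dict[str, int]:
    """Assumed rule: rank = 1 + number of systems whose lower bound exceeds
    this system's upper bound (i.e., systems that are clearly better)."""
    return {
        s: 1 + sum(1 for t, (lo_t, _) in ci.items() if t != s and lo_t > hi_s)
        for s, (_, hi_s) in ci.items()
    }


# Hypothetical 95% intervals: A overlaps B, B overlaps C, but A is above C.
intervals = {"A": (1100, 1200), "B": (1050, 1150), "C": (1000, 1080)}
print(rank_from_ci(intervals))
```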
6. Why do some systems have fewer votes than others?
Some systems were submitted more recently, so they have taken part in fewer battles so far.
7. When is the leaderboard updated?
We aim to update it every hour, but we can't guarantee it because issues may arise.
In any case, the time of the last update appears at the bottom of the tables.
8. Can I evaluate systems?
Yes!
Everyone is welcome!
Visit the annotation web page and have fun choosing the funnier systems!
Note that you can't both submit a system and evaluate systems.
In other words, only non-participants can evaluate systems.
If you're considering participating in the competition, please refrain from evaluating systems.
9. Can I participate in this competition?
Of course!
Everyone is welcome!
Visit the 2025-2026 MWAHAHA competition website for more info.
You can submit your system's jokes until the Evaluation phase ends (late January 2026).
Note that, if you are considering participating, you cannot vote/annotate
(i.e., you can't both submit a system and evaluate systems).