Fascinating stuff, all revealed via your tweets:
The dataset was about 55% female, 45% male (which squares roughly with estimates of Twitter’s overall gender breakdown). Thus, by guessing “female” for every user, a computer would be right 55% of the time. Simply by examining the full name of the user, a computer was accurate about 89% of the time: a remarkable improvement, if not an especially interesting one, since first names are highly predictive of gender. The Mitre findings become intriguing, though, when the team limited its analysis to tweets alone. By scanning for patterns in all the tweets of a given user, Mitre’s program was able to guess the correct gender 75.8% of the time, an improvement of about 20 percentage points over the baseline. And even just by analyzing a single tweet of a user, it was right 65.9% of the time, an improvement of more than 10 percentage points over the baseline.
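The improvements quoted above are differences between each method's accuracy and the 55% always-guess-"female" baseline. Here is a minimal Python sketch of that arithmetic, using only the figures from the passage. The names `BASELINE`, `ACCURACY`, and `gain_over_baseline` are made up for illustration and are not part of the Mitre work.

```python
# Accuracy figures quoted in the passage, in percent.
BASELINE = 55.0  # accuracy of always guessing "female"
ACCURACY = {
    "full name": 89.0,     # the passage says "about 89%"
    "all tweets": 75.8,
    "single tweet": 65.9,
}


def gain_over_baseline(accuracy, baseline=BASELINE):
    """Return (gain in percentage points, relative gain in percent)."""
    points = accuracy - baseline
    relative = 100.0 * points / baseline
    return round(points, 1), round(relative, 1)


for method, accuracy in ACCURACY.items():
    points, relative = gain_over_baseline(accuracy)
    print(f"{method}: +{points} points over baseline ({relative}% relative)")
```

The two numbers answer different questions: the point gain is the plain difference in accuracy, while the relative gain expresses that difference as a share of the baseline, so the same result looks larger when stated the second way.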