Frequently Asked Questions

How Do You ‘Predict’ Player Behavior?

Our models are based on the analysis of a significant amount of a player’s football-related speech while in college. The results of the analysis are then used to predict the likelihood that he will be arrested or suspended for engaging in certain types of activity while in the NFL. To produce the predictions, the analysis compares each player’s college speech – examining the specific words, the types of speech, patterns of words, and other characteristics – to the speech patterns of all other players in our database, which includes both players who have been arrested or suspended while in the NFL and others who have not.

What Kinds Of Arrests Or Suspensions Do You Predict?

We currently have models that predict a player’s likelihood of:

  1. Drug or alcohol-related arrest or suspension: for example, DUI, possession, public intoxication, substance abuse policy violations, etc.
  2. Personal misconduct arrest or suspension: for example, disorderly conduct, public indecency, resisting arrest, illegal possession of a firearm, etc.
  3. Team rules violation that leads to suspension: these can range from being chronically late for team meetings to team suspensions for ongoing substance abuse issues.

How Many Players Do You Have In Your Database?

There are currently 592 players in our database. This is an increase of more than 300 players from the size of our database in February 2009, when we presented our initial results to teams at the NFL Combine in Indianapolis. Those initial results were based on a 270-player database.

How Were Players Selected For Your Database?

Our initial data collection yielded speech from 3,870 players from 126 colleges, with an emphasis on the college programs that produced the most NFL players from 2000-07. This initial collection effort produced 656 players with sufficient speech for analysis, of whom 270 had been on an active roster for at least one regular season. Data on these 270 players’ post-draft arrests and suspensions was collected only after the collection of player speech was complete.

During our second data collection effort, player speech was collected for:
• Players whose amount of collected speech had fallen just short of inclusion in the original analysis.
• Players from colleges that are members of the six Bowl Championship Series conferences but that have sent fewer players to the NFL since 2000 than the programs included in the initial collection effort.

Once data collection for players falling into the above categories was complete, documents were collected for additional players who have been arrested or suspended while in the NFL. This was done to strengthen our ability to discern the similarities in the speech of such players. Complete arrest and suspension data was then collected for the players eligible for inclusion in the database.

The size of our database is constantly growing as more player speech and arrest and suspension data is collected.

Is Your Sample Of NFL Players A Representative Sample?

Although our sample is not a random sample of NFL players, it matches the overall NFL player population on a number of important characteristics. For example, the racial composition of our 592-player database matches that of the NFL extremely closely. As we expand our database, our goal is to include all new players entering the league so that our analysis eventually covers the entire population of NFL players. Additional information on the sample is available upon request.

How Much Speech Is Needed For A Player To Be Included In The Analysis?

A player can be included in the analysis with approximately 1,000 words of speech.
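As a minimal sketch, the eligibility check amounts to counting the words in a player’s collected transcripts. The tokenizer, the function names and the exact cutoff of 1,000 are assumptions for illustration; the figure above is approximate, and the FAQ does not say how words are counted.

```python
import re

# Hypothetical cutoff based on the "approximately 1,000 words" figure.
MIN_WORDS = 1000

def word_count(transcripts: list[str]) -> int:
    """Count word tokens across all of a player's collected transcripts."""
    # Simple regex tokenizer: runs of letters, optionally with one apostrophe part ("don't").
    return sum(len(re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", t)) for t in transcripts)

def has_sufficient_speech(transcripts: list[str], min_words: int = MIN_WORDS) -> bool:
    """True if the player's transcripts reach the (hypothetical) word cutoff."""
    return word_count(transcripts) >= min_words

sample = ["We just have to execute on third down.", "I don't think we played our best game."]
print(word_count(sample))             # 16
print(has_sufficient_speech(sample))  # False
```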

Where Do You Find The Player Speech?

All of the speech is publicly available and can be found on the Internet.

We focus on collecting transcripts of speech related to football, such as midweek and postgame interviews and press conferences. If necessary, we will also collect news articles containing extensive quotes from a player. We can also transcribe interviews available only in video or audio format.

Although players will occasionally be asked questions about their family, their plans for the summer, etc., the vast majority of the speech we collect is directly related to football.

Do You Update Your Analysis Based On Interviews After A Player Is Drafted?

Our risk assessments use only speech from a player’s time in college. In the future, we plan to carry out other types of analyses that make use of players’ speech while in the NFL.

What If I Want A Report On A Player Not In Your Database?

Our analysis can be extended to any current or potential NFL player for whom we can locate sufficient speech for analysis. Current college players who are likely to be drafted receive significant attention from the media, but media coverage does vary by school, media market, and other factors. We can conduct a preliminary search and inform you if there will be sufficient data before conducting a custom analysis.

How Can You Say The Models Predict Arrests And Suspensions When You Used Players Who Are Already In The NFL To Create The Models, Some Of Whom Have Already Been Arrested Or Suspended?

Building the models required us to identify patterns of speech that differentiated players with an increased likelihood of engaging in activity that results in arrest or suspension from those who are unlikely to engage in such activity. To do this, we needed to analyze players with NFL experience, some of whom had been arrested or suspended during their professional careers. Our analysis of these players, however, is limited to their speech while in college – our models predict personal behavior in the NFL based upon public comments made while in college.

Additionally, our current models were built using a split-sample procedure, which ensures that no player’s speech and behavior data were used to build the model that generates his own risk assessment.
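The idea behind a split-sample procedure can be sketched as two-fold scoring: shuffle the players, fit a model on one half, score the other half, then swap. The number of folds, the function names and the `fit` interface are assumptions; the FAQ does not describe how the sample was split.

```python
import random
from typing import Callable, Dict, Hashable, Sequence

Model = Callable[[Hashable], float]

def split_sample_scores(
    players: Sequence[Hashable],
    fit: Callable[[Sequence[Hashable]], Model],
    seed: int = 0,
) -> Dict[Hashable, float]:
    """Two-fold split-sample scoring (a hypothetical sketch).

    A model is fit on one half of the players and used to score the other
    half, then the roles are swapped, so every player's score comes from a
    model that never saw his own data.
    """
    shuffled = list(players)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    fold_a, fold_b = shuffled[:half], shuffled[half:]

    scores: Dict[Hashable, float] = {}
    for train, held_out in ((fold_a, fold_b), (fold_b, fold_a)):
        model = fit(train)  # estimated without any player in `held_out`
        for player in held_out:
            scores[player] = model(player)
    return scores

# A deliberately "leaky" fit that returns 1.0 for any player it was trained on.
def memorizing_fit(train: Sequence[Hashable]) -> Model:
    seen = set(train)
    return lambda player: 1.0 if player in seen else 0.0

scores = split_sample_scores(range(10), memorizing_fit)
print(len(scores), max(scores.values()))  # 10 0.0
```

Because every score is 0.0 even with a model that memorizes its training players, no player was scored by a model built from his own data.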

How Accurate Are Your Predictions?

Our predictions are extremely accurate. For example, across the three models:
1. 89 percent (89 out of 100) of the players placed in the high-risk category have been arrested or suspended while in the NFL.
2. Even more striking, only 0.13 percent (two out of 1,522) of players categorized as low-risk have been arrested or suspended during their professional careers.
3. Of the players in our database who have been arrested or suspended while in the NFL, our models placed 98 percent (104 out of 106) in the intermediate- or high-risk category based on their football-related speech from college.
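The three figures above correspond to standard classification measures: the share of high-risk players who were arrested or suspended (positive predictive value of the high-risk category), the share of low-risk players who were (the false omission rate), and the share of arrested or suspended players who were flagged as intermediate- or high-risk (sensitivity). The short check below recomputes them from the reported counts; the variable names are illustrative.

```python
# Pooled counts across the three models, as reported in the list above.
high_total, high_arrested = 100, 89          # players placed in the high-risk category
low_total, low_arrested = 1522, 2            # players placed in the low-risk category
arrested_total, arrested_flagged = 106, 104  # arrested/suspended; flagged = intermediate or high risk

def pct(num: int, den: int) -> float:
    """Percentage rounded to two decimal places."""
    return round(100 * num / den, 2)

ppv_high = pct(high_arrested, high_total)            # positive predictive value of "high risk"
false_omission_low = pct(low_arrested, low_total)    # arrest/suspension rate among "low risk"
sensitivity = pct(arrested_flagged, arrested_total)  # share of arrested players who were flagged

# The arrested players who were not flagged are exactly the low-risk arrests.
assert arrested_total - arrested_flagged == low_arrested

print(ppv_high, false_omission_low, sensitivity)  # 89.0 0.13 98.11
```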

The tables below break down the results by model and risk category.

These results provide robust support for our assumption that player speech in college accurately measures underlying personality factors that predict player behavior while in the NFL.

What Are The Variables That Make Up Your Models?

The predictions generated by our models are based on:
a. “Bag of Words” Variable – a predictor variable created by analyzing all of the words spoken by the 592 players in our sample and determining which of these words are correlated (either positively or negatively) with arrest or suspension while in the NFL.
b. Theory-Based Variables – combinations of predictor variables based on coding schemes developed over the past decade by Social Science Automation, Inc., our partner company. These coding schemes search for and generate scores based on the use of specific words and phrases long associated with certain personality traits by researchers in the fields of psychology, sociology, economics and political science.
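The correlation step behind a bag of words variable can be sketched as follows: compute each word’s relative frequency in every player’s speech and correlate it with a 0/1 arrest-or-suspension outcome. Using a plain Pearson (point-biserial) correlation on relative frequencies is an assumption made for illustration; the FAQ does not specify the statistic, and the function names are hypothetical.

```python
import math
from collections import Counter

def relative_freqs(tokens: list[str]) -> dict[str, float]:
    """Each word's share of a player's total words."""
    counts = Counter(tokens)
    n = len(tokens)
    return {w: c / n for w, c in counts.items()}

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def word_correlations(players: list[list[str]], outcomes: list[int]) -> dict[str, float]:
    """Correlate each word's per-player relative frequency with a 0/1 outcome
    (1 = arrested or suspended while in the NFL)."""
    freqs = [relative_freqs(tokens) for tokens in players]
    vocab = set().union(*freqs)  # every word used by any player
    ys = [float(y) for y in outcomes]
    return {w: pearson([f.get(w, 0.0) for f in freqs], ys) for w in vocab}

players = [
    ["we", "win", "team"],
    ["we", "lose", "team"],
    ["win", "team", "team"],
    ["lose", "lose", "we"],
]
outcomes = [0, 1, 0, 1]
corr = word_correlations(players, outcomes)
print(round(corr["win"], 3))  # -1.0
```

Words with positive correlations push a player’s score up and words with negative correlations push it down, matching the “either positively or negatively” wording above.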

Does Your Analysis Include Any Player Characteristics Or Background Variables?

No, our risk assessment looks exclusively at player speech while in college. Our models do not include any player characteristics or personal background information. The only thing that distinguishes one player from another in these models is the words they choose to use.

The Key Words That You Look For When Analyzing Players’ Speech Should Raise Obvious Red Flags To Anyone Who Hears Them, Right? Aren’t There Also Some Obvious Words That Would Indicate A Player Is Unlikely To Have Off-The-Field Issues?

Upon hearing that one component of our models (the bag of words variable) is based on the words spoken by players that are correlated with arrest or suspension, some people may assume that we’re looking for certain words that they feel would either raise an obvious red flag or, conversely, cast a favorable light on the player who uses them. This is simply not the case. The bag of words variable for each of the predictive models is based on a list of approximately 2,000 words that are correlated with arrest or suspension. Each word also has an associated “weight” based on the strength of its correlation. Most of these words seem as though they would hardly have any value in predicting arrest or suspension, but our analyses and results have shown otherwise.
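Once a weighted word list exists, a player’s bag of words value can be sketched as the weighted sum of his relative word frequencies. The scoring formula and the weights below are invented for illustration only; they are not taken from any of the models, whose lists contain roughly 2,000 words each.

```python
from collections import Counter

def bag_of_words_score(tokens: list[str], weights: dict[str, float]) -> float:
    """Sum of weight * relative frequency over the words on the weighted list.

    Words that are not on the list contribute nothing to the score.
    """
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    n = len(tokens)
    return sum(weights[w] * c / n for w, c in counts.items() if w in weights)

# Invented weights for illustration only; they are not taken from any model.
example_weights = {"we": 0.5, "ball": -0.25, "really": 1.0}
tokens = "we ran the ball really well".split()
print(round(bag_of_words_score(tokens, example_weights), 4))  # 0.2083
```

Everyday words can carry weight in this kind of scheme, which is why the list need not contain any obvious red-flag terms.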

What Is The Effect Of Slang And Regional Colloquialisms In Your Analyses?

Slang, colloquialisms and mispronunciations have little to no impact on the bag of words variable. Words must be used a minimum number of times across the entire sample to even be analyzed for their correlation with arrest or suspension; this has the effect of removing most words that are regionally or culturally based.
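The minimum-usage rule can be sketched as a frequency filter applied to the whole sample before any correlations are computed. The actual threshold is not given in the FAQ, so `min_count` here is a hypothetical parameter.

```python
from collections import Counter

def frequent_vocabulary(players: list[list[str]], min_count: int) -> set[str]:
    """Return the words used at least `min_count` times across the whole sample.

    Only these words go on to be tested for correlation with arrest or
    suspension; rarer words (including most regional slang) are dropped.
    """
    totals = Counter()
    for tokens in players:
        totals.update(tokens)
    return {w for w, c in totals.items() if c >= min_count}

players = [["we", "play", "hard"], ["we", "play", "yall"], ["we", "compete"]]
print(sorted(frequent_vocabulary(players, min_count=2)))  # ['play', 'we']
```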

Furthermore, because almost all of the player speech used in the analysis deals directly with football, certain words and phrases that one might hear players use in other contexts are mostly missing.

Finally, mispronunciations, improper grammar and speech disfluencies (for example, fillers such as “uh,” “um” and “like”; repeated words or syllables; restarts; etc.) are generally corrected by those transcribing the interviews or press conferences. For example, “I seen, um... I seen the linebacker in coverage out the, uh, corner of my eye,” would likely be transcribed as, “I saw the linebacker in coverage out of the corner of my eye.”

As You Add Players To The Database, Is Their Speech Taken Into Account In The Bag Of Words Variable? If So, How Will This Affect Your Results?

We have found that the bag of words variable is a powerful predictor of arrest and suspension. Because it is a dynamic component of the predictive models (common, everyday language changes over time), we will re-estimate the word lists and associated word weights used to create the bag of words variable on a regular basis. It is highly unlikely that such re-estimation will lead to dramatic changes in the variable from year to year. The re-estimation will, however, allow us to capture gradual changes in language and take them into account when generating future predictions.

What’s The Purpose Of Including Theory-Based Variables In The Models?

The theory-based predictors included in the models add insight gained from years of research across a number of fields concerned with personality traits and their relationship with individual behavior. While the bag of words predictor is a dynamic component of the models that will be updated regularly, the theory-based predictors are based on coding schemes with established rules. These schemes, which search for and generate scores based on the use of words and phrases that researchers have associated with specific personality traits, have been developed, tested for accuracy and used in previous research carried out over the past decade by Social Science Automation, Inc., our partner company.
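A rule-based coding scheme of this general kind can be sketched as a fixed dictionary of words and phrases per category, scored as matches per 100 words. The categories, word lists and per-100-words normalization below are hypothetical and are not the schemes developed by Social Science Automation, Inc.; the sketch only illustrates how fixed rules turn text into scores.

```python
import re

# Hypothetical coding scheme for illustration: category -> words or phrases.
EXAMPLE_SCHEME = {
    "self_reference": ["i", "me", "my", "myself"],
    "group_reference": ["we", "us", "our", "the team"],
}

def tokenize(text: str) -> list[str]:
    """Lowercase word tokens, keeping contractions such as "don't" together."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def count_phrase(tokens: list[str], phrase: str) -> int:
    """Count occurrences of a (possibly multi-word) phrase in a token list."""
    target = phrase.split()
    k = len(target)
    return sum(1 for i in range(len(tokens) - k + 1) if tokens[i:i + k] == target)

def coding_scores(text: str, scheme: dict[str, list[str]]) -> dict[str, float]:
    """Rule-based score per category: matches per 100 words of speech."""
    tokens = tokenize(text)
    if not tokens:
        return {category: 0.0 for category in scheme}
    return {
        category: 100 * sum(count_phrase(tokens, p) for p in phrases) / len(tokens)
        for category, phrases in scheme.items()
    }

scores = coding_scores("We did our job and I trust the team.", EXAMPLE_SCHEME)
print({k: round(v, 2) for k, v in scores.items()})
# {'self_reference': 11.11, 'group_reference': 33.33}
```

Unlike the bag of words weights, the rules here stay fixed; only the text being scored changes.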

Shouldn’t A Team Psychologist Be Able To Determine Which Players Are More Likely To Have Off-The-Field Issues By Analyzing The Results Of Psychological Tests Administered During The NFL Combine And Listening To Player Interviews?

Team psychologists provide an immensely valuable service to the teams that employ them. There are many instances of successful draft choices having been made in large part due to the information provided by team psychologists.

Fully aware of the value of team psychologists, we view our analyses as being complementary to the work they perform. On one hand, a team can be more confident in its overall assessment of a player when our predictions agree with the information provided by team psychologists. On the other hand, a team may decide that more research should be carried out on a player’s background if our predictions differ from the team psychologists’ assessment.

Another major benefit of our work is that teams are able to receive Achievement Metrics risk assessment reports for players of interest while they are still playing college football. This lengthens the time horizon for teams as they begin to gather information for players on their early draft boards. Because our reports are available months or even years before players can have contact with representatives of an NFL team, our assessments can serve as an early point of reference for team psychologists.

Are You Saying That If A Player Falls Into The High-Risk Category, He Is Definitely Going To Be Arrested And/Or Suspended During His NFL Career?

No. Our work isn’t like the movie Minority Report. The predictions we provide are probabilities, not certainties. We do not claim to know when a player will commit an offense that is punishable by arrest and/or suspension.

Furthermore, we hope that teams will use our predictions to address potential off-the-field problems proactively. In other words, we hope that teams will devote resources to players with elevated risks of arrest or suspension in order to help them avoid the kinds of choices that could lead to their arrest or suspension.