When Data Does Not Lead: A Quick Guide to Using Data in Game Development
“We’re data led” - but is that a good thing, and how should game development be led by data? Unpacking a few thoughts about data in the age of AI.
“We’re data led” is a fairly common refrain in the games industry, especially in areas that generate a lot of data by default (mobile and Roblox being prime examples), and it is a strategy I’ve used at times at Auroch Digital too. In general it’s a good approach, as it acknowledges that our ‘gut feelings’ can be wrong - wildly wrong, in fact. So it posits that taking cues from cold, unemotive, reliable data is a better way to lead decision making. I’ve definitely made decisions from the gut in the past without any data to guide me, and I have been wrong. Had I used some data, it may have shown the shortcomings of the ‘gut feeling’. So, yay! Data-led? As with most things in life, it is not that simple.
(Image - The sculpture of Paolozzi’s Newton does some number crunching at the British Library. Photo by me.)
Let me explore some examples if I follow this line of argument:
“What is the most popular genre of games? Let’s make games in that genre as it is where the players are!!” Yes, except that is also going to be where the competition is most intense.
“OK, let’s look at the genres that are least popular - we’ll face less competition!!?” Yes, but you’re fishing in a smaller pool of players, so the sales potential is going to be smaller.
“So we’ll search the data to find the sweet spot - a genre with a higher rate of play and a lower amount of competition - and make that genre of game! Winning!” OK, better, but what if that opportunity is a temporary window? It takes years to make a game, and by the time you release it the opportunity may be gone. It’s complex.
There is also the issue that with large pools of data - such as those available in an industry composed of many platforms, genres and sub-markets - multiple things can be true at once. Take this example from GameDiscoverCo, about an exception that both proves and disproves the rule:
Of course, that leads to typically counter-cyclical commentary from the Internets: “It’s cool how we keep proving over and over and over again that there’s a significant market for narrative-heavy, character-driven, single-player experiences in video games and publishers just categorically refuse to believe us.”
So using data to help you make decisions is going to be hard work. “But hey, here’s AI, which can parse vast amounts of data - more than a single human ever can!” Does that solve our problem? Sorry, no:
But let’s not confuse adoption with progress. The generative AI wave didn’t solve the data quality crisis in gaming. Rather, it amplified it. Now, instead of just bad research, we have bad research at scale. ChatGPT can hallucinate market sizing with remarkable confidence. Consultants armed with LLMs produce reports that sound authoritative but rest on the same shaky foundations I criticized years ago: estimates based on estimates based on estimates. The tools got faster. The thinking didn’t. (Source: Superjoost)
(Video - An art installation representing data flow, from the House of the Futures, Berlin. Video by me.)
So how can we use data in the age of AI? Here are a few rules I use for data in general, to guide me alongside the ‘gut feeling’:
Use data to null, not prove, your hypothesis. I’ve absolutely made the mistake of seeking data to confirm a game I wanted to make, inadvertently biasing the supposed ‘cold, unemotive, reliable data’ with old-fashioned human error. By trying to disprove your case, you can remove some of that confirmation bias.
Understand that data is always flawed. Any source of data will have gaps, be missing parameters, or have areas that should be included but, for various reasons, are not. The collection of the data itself is often subject to human error and bias in the first place. The trick is to know where it might be wrong, as this can help you account for it.
Understand that data is always a reflection of the past, and while it can predict the future, that prediction is not assured. “The sun has risen every day for millions of years, so the data predicts it will keep rising each morning, ongoing!” Except one day, yes, off in the far future, the sun will die and not rise the next morning. That’s a hyperbolic example, so take Steam wishlists instead. They are a very reliable predictor of the success of a game, but only if players rate the released game as good and the actual game matches their expectations. Plus every now and then the prediction just fails for complex reasons, so there are always rare cases of games that buck the trend.
Always verify AI-sourced data against a non-AI-generated source. (If there are a lot of data points, you can pick a few and verify those to give a degree of validation; AI can be wrong because it confuses two very similar things, or because it hallucinates.)
Data can’t necessarily predict new genres. Trying to predict the next success via data on popular genres would not necessarily have predicted a title like PowerWash Simulator or Among Us, because they didn’t have clear precedents in the data of the time. But once they are popular and established, data on them, and on any emergent genre they have surfaced, magically appears!
Try not to ‘stack’ data sources to get a result, especially if they are estimates or limited in scope. For example, say you wanted to know how many players of a genre there were, and you had a) the total number of players and b) an estimate of the percentage of those players interested in that genre. Multiplied together, these give you the total number of players of that genre. However, notice one of these numbers is an estimate, so the final number is going to have a margin of error. And if both the total number and the genre percentage were estimates, then the margin of error is even bigger!
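As a toy illustration of how stacked estimates widen the margin of error (all numbers below are invented, not real market data):

```python
# A minimal sketch of how margins of error compound when you "stack"
# estimates. Each estimate is a value plus/minus a relative error,
# and we take the best/worst-case range of the product.

def stacked_range(total_players, total_err, genre_share, share_err):
    """Multiply two estimates and return the (low, high) range
    of the result given each input's relative error."""
    low = total_players * (1 - total_err) * genre_share * (1 - share_err)
    high = total_players * (1 + total_err) * genre_share * (1 + share_err)
    return low, high

# One hard number times one +/-20% estimate: the error stays at 20%.
print(stacked_range(10_000_000, 0.0, 0.05, 0.20))   # roughly 400k to 600k

# Two estimates, each +/-20%: the worst-case spread widens to -36%/+44%.
print(stacked_range(10_000_000, 0.20, 0.05, 0.20))  # roughly 320k to 720k
```

The point of the sketch: stacking a second estimate doesn't just add its error, it multiplies the uncertainty at the extremes.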
Don’t forget within-game data! While this is baked into most mobile and Roblox development, indie development tends to have less of it. It can be really helpful for finding pain-points for players, such as levels they get stuck on or items they never use. Use this info to guide scarce resources.
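As a minimal sketch of the kind of pain-point analysis within-game data enables - the event log, level names and outcomes here are all hypothetical:

```python
# Given a log of (player, level, outcome) telemetry events, rank levels
# by completion rate to surface where players get stuck.
from collections import Counter

events = [
    ("p1", "level_3", "fail"), ("p1", "level_3", "fail"),
    ("p1", "level_3", "complete"), ("p2", "level_3", "fail"),
    ("p2", "level_1", "complete"), ("p3", "level_3", "fail"),
]

attempts, completions = Counter(), Counter()
for _, level, outcome in events:
    attempts[level] += 1
    if outcome == "complete":
        completions[level] += 1

# Lowest completion rate first: these are the candidate pain-points.
for level in sorted(attempts, key=lambda l: completions[l] / attempts[l]):
    rate = completions[level] / attempts[level]
    print(f"{level}: {rate:.0%} completion over {attempts[level]} attempts")
```

In a real game the events would come from your analytics pipeline rather than a hard-coded list, but the aggregation step is the same idea.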
Be transparent about data; if it is a presentation, for example, state your sources! (I always find a separate document with a longer list of sources, methodology and calculations helps. It means you have this deeper info to hand if asked questions, but it also gives you some extra discipline in generating the data, as you then need to document the process, not just the final result.)
If it is data you are gathering from a live game, do let players know what you are gathering, how, and why. Plus, ideally, give them an opt-out too. I’ve found in the past that if you are up-front about it and it’s about making the game better, most players are happy to opt in.
I hope this helps and thanks for reading.
Interesting Gaming Links!
From the article on the biggest video games of 2026 (not called Grand Theft Auto 6), a quote on the growing cost of making games:
Obsidian Entertainment says The Outer Worlds 2 and Avowed, two of their big games of 2025, missed sales forecasts. There are no plans for The Outer Worlds 3, although more games set in the Avowed universe may be coming in the future. In an interview with Bloomberg, Obsidian boss Feargus Urquhart said that while Grounded 2, which was also released last year, was a hit, missing targets on The Outer Worlds 2 and Avowed led the studio to “think a lot about how much we put into the games, how much we spend on them, how long they take.”
Thank you for reading!
P.S. This newsletter is a personal one and is done as a personal project and as such is not affiliated with any company that, in my day job, I work with or partner with. Nor do the views I express necessarily reflect any company that, in my day job, I work with or partner with. More on me here.



