Ishan Sharma on Real-Time Generative AI for Gaming Apps Running on Kubernetes

Kubernetes provides a strong platform for gaming applications that use generative artificial intelligence (GenAI) for both game development and gameplay. Ishan Sharma from Google spoke at the recent KubeCon + CloudNativeCon NA 2023 conference about real-time GenAI inference integrated with distributed game servers running on Kubernetes.

With the launch of ChatGPT and Bard, the term GenAI has become mainstream well beyond the technical community. Over the last decade, AI and ML technologies have steadily improved, and AI has been beating humans in perception tests in domains such as handwriting recognition, speech recognition, image recognition, reading comprehension, and language understanding. The generative capabilities of AI have also improved considerably in the last nine years, from heavily pixelated black-and-white images in 2014 to very realistic images just three years later (2017); by 2021, text-to-image generation from prompts had become possible.

GenAI has many applications in online gaming. Using a chart forecasting the global market for GenAI in gaming from 2022 to 2032, Sharma said GenAI is being adopted in game development use cases first, and that these will be eclipsed by new game experiences such as smart non-player characters (NPCs), level generation, image enhancement, scenarios, and stories. In game development, the applications of GenAI are wide-ranging: creating art assets, auto-generating game code, holding lifelike conversations with bots, and generating levels from player input. Generative AI is changing the games industry and will transform live service games into living games: games have moved from boxed software in the past to live service games today, and are expected to evolve in the near future into what are called living games. In living games, three parties (developer, game, and player) will interact with each other to enrich the user experience. Game developers will need to develop AI responsibly and safely, protecting intellectual property while respecting players’ privacy and safety.

GenAI use cases in games fall into two categories: improving productivity during game development and improving player experience during gameplay.

In the game development phase, GenAI can accelerate time-to-market by creating content and simplifying development. This includes the development of game assets such as characters, props, audio, and video. Turnkey APIs like Vertex AI, Amazon SageMaker, and ChatGPT can help in this category.

In the second category, the run-time gameplay phase, AI/ML and GenAI can be used to adapt the gameplay and empower players to generate game content in real time. These capabilities include smart NPCs (bots), dynamic in-game content, and customized player experiences. GenAI during gameplay brings demanding requirements: low latency, high performance, fast scalability, and low cost. The run-time gameplay environment can use platforms like Google Kubernetes Engine (GKE) to host the gaming apps.

Based on user research that his team conducted across SMEs in the gaming industry, Sharma discussed user pain points for GenAI in games in three categories: platform, AI maturity, and gameplay.

In the platform category, cost efficiency at scale is needed to make GenAI financially feasible for popular (AAA) games. For a seamless player experience, low latency is essential: lag can hurt the success of games where even sub-second latency is unacceptable. Platform decisions will also be driven by performance and by access to state-of-the-art models without vendor lock-in.

In the AI maturity category, LLM unpredictability is a big concern. Inference needs to be coherent, relevant, and contextually appropriate, and repeatably so. Models should not promote AI biases and stereotypes, and content filtering and moderation are needed to ensure a safe and inclusive gameplay environment for players.

In the gameplay category, user-generated content has to be balanced with game lore and structure (creativity). Some games need gameplay content that LLMs filter out, so teams need to keep GenAI constraints in mind. Procedural generation with GenAI will also still require human supervision in the near future as GenAI and LLMs continue to evolve.

Sharma said Kubernetes is a good computing solution for games because it solves the majority of IT operations problems, such as scheduling, health checking, deployment methods, autoscaling and rollbacks, and centralized logging and monitoring, and it offers a declarative paradigm and primitives for isolation. The challenge is that Kubernetes, on its own, does not understand how game servers work. Game servers need additional capabilities such as maintaining in-memory state, starting and shutting down game servers on demand, and protecting running servers from being shut down (even for upgrades!), since an interrupted session results in a poor player experience.
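
The requirement to protect running servers can be illustrated with a toy model. The following is a minimal sketch, not how any real orchestrator is implemented: the `State` names, the `GameServer` class, and the `scale_down` function are assumptions made up for this illustration. It shows a scale-down step that only ever removes servers with no live match, even if that leaves the fleet larger than requested.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    # Hypothetical state names used only in this sketch
    READY = "Ready"          # warm and waiting, no players attached
    ALLOCATED = "Allocated"  # hosting a live match, holds in-memory state


@dataclass
class GameServer:
    name: str
    state: State


def scale_down(fleet: list[GameServer], target_size: int) -> list[GameServer]:
    """Shrink the fleet toward target_size, removing only READY servers.

    ALLOCATED servers are never removed, so the result can stay larger
    than target_size while matches are still running.
    """
    surplus = max(len(fleet) - target_size, 0)
    kept = []
    for server in fleet:
        if surplus > 0 and server.state is State.READY:
            surplus -= 1  # safe to stop: no player session is lost
            continue
        kept.append(server)
    return kept


fleet = [
    GameServer("gs-1", State.ALLOCATED),
    GameServer("gs-2", State.READY),
    GameServer("gs-3", State.ALLOCATED),
    GameServer("gs-4", State.READY),
]
remaining = scale_down(fleet, target_size=1)
print([s.name for s in remaining])
```

A generic scheduler that treats all replicas as interchangeable would have no reason to spare `gs-1` and `gs-3`; the point of the sketch is that the scaling decision has to consult game-level state.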

The Agones open source framework can help with these game server scaling and orchestration requirements. It was developed in 2017 in a partnership between Google and Ubisoft. Agones brings the benefits of Kubernetes operations to game servers as well, including a better understanding of game matches and sessions, seamless scaling with player load, multiple UDP/TCP ports per node, and hot spares with tunable warm-up parameters.

Sharma discussed the high-level architecture of a live service game, using a multiplayer game session as the use case. Core components of the solution, such as the Game Frontend, the Matchmaker Service (which directs each player to a dedicated server where they share an environment and experience with other players), and the Player Profile Service, can all be hosted on a Kubernetes cluster. Game servers also run on Kubernetes and are orchestrated by Agones.

When it comes to integrating GenAI inference with game servers, development teams have a few options. As in game development, turnkey solutions like Vertex AI, SageMaker, and the Stable Diffusion API can be used for gameplay environments. A second approach is a DIY solution in which dedicated GenAI inference servers run on their own Kubernetes nodes. These servers can leverage hardware options like GPUs or high-performance CPUs. A third approach is to run the GenAI inference server as a sidecar within the same pod as the game server, so that each game server gets its own dedicated inference server. In this setup, the underlying hardware should be optimal for both the Agones game server and the GenAI inference server. Teams should find the right balance between raw performance and cost when choosing among these options.
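
From the game server's point of view, the three options mostly differ in where the inference endpoint lives. The sketch below is an illustration under assumptions rather than code from the talk: the `Integration` enum, the `inference_url` function, the service name, the port, the `/generate` path, and the example turnkey URL are all hypothetical. It relies on two general Kubernetes behaviours: containers in the same pod share a network namespace (so a sidecar is reachable on `localhost`), and a Service gets a cluster-internal DNS name (shown here with the default `cluster.local` domain).

```python
from enum import Enum


class Integration(Enum):
    TURNKEY = "turnkey"      # managed API outside the cluster
    DEDICATED = "dedicated"  # inference servers on their own nodes
    SIDECAR = "sidecar"      # inference container inside the game server pod


def inference_url(
    mode: Integration,
    *,
    port: int = 8080,                       # hypothetical port
    service: str = "genai-inference",       # hypothetical Service name
    namespace: str = "default",
    turnkey_url: str = "https://api.example.com/v1/generate",  # placeholder
) -> str:
    """Return the URL a game server would call for inference."""
    if mode is Integration.SIDECAR:
        # Same pod, same network namespace: no hop across the cluster network
        return f"http://localhost:{port}/generate"
    if mode is Integration.DEDICATED:
        # Cluster-internal DNS name of a Service in front of the inference pods
        return f"http://{service}.{namespace}.svc.cluster.local:{port}/generate"
    # Turnkey: an external, provider-hosted endpoint
    return turnkey_url


for mode in Integration:
    print(mode.value, "->", inference_url(mode))
```

Keeping the choice behind one function like this would let a team switch integration options through configuration without touching the game logic.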

He then compared the advantages of these options. Advantages of a turnkey solution include out-of-the-box support for game development use cases and faster time-to-value; in addition, some models are available only through turnkey APIs and are not openly available to containerize. Advantages of the DIY approach on Kubernetes include the ability to run openly available models in containers, and Kubernetes can be more cost-effective than pay-per-use APIs in high-usage scenarios (such as game launches, where a large number of concurrent users arrive in a short time). Dedicated inference nodes are also easy to set up with Kubernetes features such as horizontal pod autoscaling (HPA) and scheduling with taints and tolerations.
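
To make the autoscaling point concrete, the core scaling rule described in the Kubernetes HPA documentation is `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. The sketch below applies that rule to a hypothetical pool of inference pods; the function name, the metric values, and the replica bounds are assumptions for illustration, and the real controller adds details that are omitted here, such as a tolerance band and stabilization windows.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Simplified HPA rule: scale in proportion to metric / target."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp to the configured bounds, as an HPA object's min/max would
    return max(min_replicas, min(raw, max_replicas))


# Game launch: 4 inference pods at 90% utilization against a 60% target
print(desired_replicas(4, current_metric=90, target_metric=60))

# Quiet period: 6 pods at 20% utilization against the same target
print(desired_replicas(6, current_metric=20, target_metric=60))
```

With these made-up numbers the pool grows from 4 to 6 pods during the spike and shrinks from 6 to 2 afterwards, which is the behaviour that makes a self-managed pool attractive for bursty player load.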

They ran some tests using Stable Diffusion (for image generation) and Bloom (for text generation). Slightly better performance was observed with sidecars. In general, though, inference latency dwarfs any difference between the Kubernetes deployment methods. Dedicated inference nodes provide the most versatility, ease of use, and flexibility.

In concluding the talk, Sharma highlighted the advantages of using Kubernetes for GenAI in games in the areas of portability, flexibility, scalability and performance, and cost and efficiency. There is also a healthy ecosystem of frameworks to choose from, including Spark, Beam, Dask, Ray, RAPIDS, and XGBoost.

Sharma ended the presentation with a demo of GenAI integrated into a multiplayer game with real-time image generation. The demo app is hosted on a GenAI inference cluster on GKE in Google Cloud and uses the dedicated-nodes option. A GenAI API component routes traffic to the different models. The logic layer consists of an NPC component that handles text pre- and post-processing for dialog, image generation logic that handles image pre- and post-processing, and a Vertex AI service that handles LLM pre- and post-processing and talks to Vertex AI LLM endpoints. For models, Llama 2 was used for text generation and Stable Diffusion for image generation.
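
The routing idea in the demo can be sketched in a few lines. This is a hedged illustration, not the demo's code: the `GenAIRouter` class, the request kinds, and the stub model functions are all hypothetical, and the stubs simply echo their input where a real deployment would call a text or image model endpoint. Each handler wraps a model call with its own pre- and post-processing, and the router dispatches on the request kind.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


def stub_text_model(prompt: str) -> str:
    """Stand-in for a text model endpoint (hypothetical)."""
    return f"<text:{prompt}>"


def stub_image_model(prompt: str) -> str:
    """Stand-in for an image model endpoint (hypothetical)."""
    return f"<image:{prompt}>"


def npc_dialog(player_line: str) -> str:
    prompt = f"NPC reply to: {player_line.strip()}"  # pre-processing
    raw = stub_text_model(prompt)
    return raw.strip("<>")                           # post-processing


def image_generation(description: str) -> str:
    prompt = description.strip().lower()             # pre-processing
    return stub_image_model(prompt)                  # no post-processing here


class GenAIRouter:
    """Routes each request kind to the handler registered for it."""

    def __init__(self) -> None:
        self._routes: Dict[str, Handler] = {}

    def register(self, kind: str, handler: Handler) -> None:
        self._routes[kind] = handler

    def handle(self, kind: str, payload: str) -> str:
        try:
            handler = self._routes[kind]
        except KeyError:
            raise ValueError(f"no route for request kind {kind!r}") from None
        return handler(payload)


router = GenAIRouter()
router.register("dialog", npc_dialog)
router.register("image", image_generation)

print(router.handle("dialog", "  Any rumors today?  "))
print(router.handle("image", "A Castle At Dusk"))
```

Separating routing from per-model logic in this way means a new model can be added by registering another handler, without changing the game-facing API.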

For more information on KubeCon NA 2023, check out the conference website and the complete program schedule, as well as the Data and AI/ML session catalog.
