In this article, I will explain the optimizations I used on the game part to ensure a fluid 60 FPS experience in most browsers. This article is also available on Medium.
1 The WebGL API
The WebGL (Web Graphics Library) API lets you use a modern GPU to render graphics inside the browser. To put the GPU to work, you need to provide two functions called shaders: a vertex shader and a fragment shader. Shaders are written in GLSL (OpenGL Shading Language), whose syntax is similar to C.
The vertex shader computes the position of each vertex in the scene. Its output is rasterized and passed to the fragment shader, which computes the color of every rendered pixel. In myshmup.com, I use a simple pair of vertex and fragment shaders. They handle only one primitive shape, the 2D rectangle, each with its own texture painted on its surface. Texture colors can be adjusted to enable a blinking effect. Most of the rendering work consists of feeding the shaders the data they need at each frame.
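As an illustration, a shader pair of this kind could look like the sketch below. It assumes WebGL 2 (GLSL ES 3.00), and every attribute, uniform and function name in it is hypothetical: this is not the actual myshmup.com source. The `u_tint` multiplier is one assumed way to adjust texture colors for a blinking effect. `pixelToClip` repeats the vertex shader's math in TypeScript so the mapping can be checked without a GPU.

```typescript
// Hypothetical shader pair for one textured 2D rectangle (GLSL ES 3.00, WebGL 2).
// "#version 300 es" must be the very first characters of each source string.
const VERTEX_SHADER_SOURCE = `#version 300 es
in vec2 a_corner;          // corner of a unit quad: (0,0), (1,0), (0,1) or (1,1)
uniform vec2 u_position;   // top-left corner of the rectangle, in pixels
uniform vec2 u_size;       // width and height of the rectangle, in pixels
uniform vec2 u_resolution; // canvas size, in pixels
out vec2 v_uv;

void main() {
  vec2 pixel = u_position + a_corner * u_size;
  // Pixels (origin top-left, y down) to clip space (-1..1, y up).
  gl_Position = vec4(pixel.x / u_resolution.x * 2.0 - 1.0,
                     1.0 - pixel.y / u_resolution.y * 2.0,
                     0.0, 1.0);
  v_uv = a_corner;
}`;

const FRAGMENT_SHADER_SOURCE = `#version 300 es
precision mediump float;
in vec2 v_uv;
uniform sampler2D u_texture;
uniform vec4 u_tint;       // assumed color multiplier, e.g. to make a sprite blink
out vec4 outColor;

void main() {
  outColor = texture(u_texture, v_uv) * u_tint;
}`;

// The same pixel-to-clip-space mapping as the vertex shader, in TypeScript.
function pixelToClip(px: number, py: number, width: number, height: number): [number, number] {
  return [(px / width) * 2 - 1, 1 - (py / height) * 2];
}
```

With an 800 x 600 canvas, the top-left pixel maps to the clip-space corner (-1, 1) and the bottom-right pixel to (1, -1).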
2 A naive implementation
When I first tried to render a 2D game using WebGL, I coded a pair of shaders that drew textures on the screen. The shaders were driven by a TypeScript function that drew one game object at a time. This low-level function was called by a draw function taking a game object as input. The draw function was called for every game object visible on the screen.
gameObjects.forEach( gameObject => draw(gameObject) );
While this worked well for a small number of objects, performance was very poor once more than 50 game objects were on screen, even on a recent MacBook Pro. Something was wrong…
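To make the cost visible, here is a minimal sketch of what such a per-object draw could look like. The `GameObject` shape, the `drawNaive` function and the narrowed `NaiveGL` interface are assumptions made for illustration (the interface only lists real WebGL methods and constants, so a real context satisfies it); this is not the actual myshmup.com code.

```typescript
// Hypothetical game object: one textured rectangle.
interface GameObject {
  x: number;
  y: number;
  texture: unknown; // a WebGLTexture in a real renderer
}

// The subset of the WebGL context used here. A real
// WebGL2RenderingContext provides these methods and constants.
interface NaiveGL {
  readonly TEXTURE_2D: number;
  readonly TRIANGLE_STRIP: number;
  bindTexture(target: number, texture: any): void;
  uniform2f(location: any, x: number, y: number): void;
  drawArrays(mode: number, first: number, count: number): void;
}

// Draws every object with its own state changes and its own draw call.
// Returns the number of draw calls issued, which equals the object count.
function drawNaive(gl: NaiveGL, positionLocation: unknown, gameObjects: GameObject[]): number {
  let drawCalls = 0;
  for (const obj of gameObjects) {
    gl.bindTexture(gl.TEXTURE_2D, obj.texture);   // texture switch per object
    gl.uniform2f(positionLocation, obj.x, obj.y); // uniform upload per object
    gl.drawArrays(gl.TRIANGLE_STRIP, 0, 4);       // one 4-vertex quad per call
    drawCalls++;
  }
  return drawCalls;
}
```

The number of draw calls grows linearly with the number of visible objects, which is exactly what the following sections try to avoid.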
3 To minimize draw count you must
Every rendered frame is the result of work done by both the CPU and the GPU. The CPU prepares the data and the instructions for the GPU. The GPU memory is located on, well, the GPU. It is called VRAM and it is separate from the main RAM. Hence, we need to transfer data from the main RAM to the GPU’s VRAM. At each draw call, the GPU has to wait: it can only start rendering after the data it needs has been pushed from RAM to VRAM. Once the data is ready, the graphics card can do its job very efficiently thanks to its high level of parallelism.
It’s like a factory: it is designed to produce lots of goods in a given batch, but the setup time of each batch can be long. You don’t want to use this production line for artisanal work with one object per batch. Instead, you want to produce hundreds of wooden pallets in every batch to optimize production cost.
We now understand the importance of minimizing the number of draw calls performed for each frame: the CPU-to-GPU data transfer overhead is the main bottleneck for fast rendering.
4 Instanced drawing
Our goal is to minimize the number of draw calls. To do so, we group together all the game objects that use the same texture. Think of all the bullets fired by an enemy or by the hero in a shoot ’em up. There are a lot of them on the screen (hence the expression “bullet hell”), but they all share the same texture: it is just the same sprite drawn at different positions. The same is true for the decor, which is generally made of tiles repeated across the screen. Instead of handling the set of game objects as a simple array, you prepare the data before rendering: you build a map where the key is the texture id and the value is the array of objects using this texture. Then you call the draw function only once per texture. This is a huge saving for a shmup, where there are a lot of duplicated sprites.
textures.forEach( texture => draw(texture.gameObjects))
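The grouping step described above can be sketched as follows. The `GameObject` fields and the `groupByTexture` name are hypothetical and chosen only for this example.

```typescript
// Hypothetical game object carrying the id of the texture it uses.
interface GameObject {
  textureId: string;
  x: number;
  y: number;
}

// Builds the map: texture id -> all objects drawn with that texture.
function groupByTexture(gameObjects: GameObject[]): Map<string, GameObject[]> {
  const batches = new Map<string, GameObject[]>();
  for (const obj of gameObjects) {
    const batch = batches.get(obj.textureId);
    if (batch) {
      batch.push(obj);
    } else {
      batches.set(obj.textureId, [obj]);
    }
  }
  return batches;
}
```

The number of draw calls then becomes the number of entries in the map instead of the number of objects: three bullets and one enemy sprite give two batches rather than four draws.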
To draw an array of objects using the same texture in a single batch, we should use the “instanced drawing” feature of WebGL 2. In WebGL 1, this feature was only available as an optional extension (ANGLE_instanced_arrays). For simplicity’s sake, we decided to use WebGL 2, although it is not compatible with all of today’s browsers.
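A minimal sketch of one instanced batch is shown below, assuming each sprite needs four floats per instance (position and size). The `Sprite` shape, `packInstances`, `drawBatch` and the narrowed `InstancedGL` interface are hypothetical; the interface only lists real WebGL 2 methods and constants. The one-time attribute setup is assumed to have been done elsewhere with `vertexAttribPointer` and `vertexAttribDivisor(location, 1)`, so that the attribute advances once per instance instead of once per vertex.

```typescript
// Hypothetical per-instance data: one rectangle per sprite.
interface Sprite {
  x: number;
  y: number;
  width: number;
  height: number;
}

const FLOATS_PER_INSTANCE = 4;

// Packs all sprites of a batch into one contiguous typed array.
function packInstances(sprites: Sprite[]): Float32Array {
  const data = new Float32Array(sprites.length * FLOATS_PER_INSTANCE);
  sprites.forEach((s, i) => {
    data.set([s.x, s.y, s.width, s.height], i * FLOATS_PER_INSTANCE);
  });
  return data;
}

// The subset of the WebGL 2 context used here.
interface InstancedGL {
  readonly ARRAY_BUFFER: number;
  readonly DYNAMIC_DRAW: number;
  readonly TRIANGLE_STRIP: number;
  bindBuffer(target: number, buffer: any): void;
  bufferData(target: number, data: Float32Array, usage: number): void;
  drawArraysInstanced(mode: number, first: number, count: number, instanceCount: number): void;
}

// Uploads the whole batch once, then draws every sprite with a single call.
function drawBatch(gl: InstancedGL, instanceBuffer: unknown, sprites: Sprite[]): void {
  if (sprites.length === 0) return;
  gl.bindBuffer(gl.ARRAY_BUFFER, instanceBuffer);
  gl.bufferData(gl.ARRAY_BUFFER, packInstances(sprites), gl.DYNAMIC_DRAW);
  // 4 vertices (one quad), repeated sprites.length times.
  gl.drawArraysInstanced(gl.TRIANGLE_STRIP, 0, 4, sprites.length);
}
```

Whatever the number of sprites in the batch, there is one buffer upload and one draw call.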
5 Texture atlas
I implemented instanced drawing and everything was fine. After one year of development, I released the site to the public. A game jam was organized in which all the games were created using myshmup.com. Each participant created a very original game in a short period of time. The winner of the game jam published a level inspired by the movie TRON, with neon pixel art. He created a lot of decor tiles and destructible ground enemies to build a rich game environment. And there it was again: the game sometimes lagged on my state-of-the-art-hipster-approved MacBook Pro. What was wrong? The number of different textures shown at a given time was larger in this game than in simpler games, so there were more draw calls per frame. What to do next?
The silver bullet was the “texture atlas” trick. The idea is to create one very big texture: in myshmup.com, the atlas is 4096 x 4096 pixels. Then you draw all the game objects’ textures into this big texture. When you copy a texture into the atlas, you keep track of the texture coordinates associated with it so that you can retrieve it afterwards. If your atlas is too small, you just create another one.
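The bookkeeping side of an atlas can be sketched as below. The article does not say which packing strategy myshmup.com uses, so this example assumes a simple "shelf" packer that fills rows from left to right; the `ShelfAtlas` class and its method names are hypothetical. Copying the pixels themselves is left out: in WebGL it could be done with a call such as `texSubImage2D` at the returned coordinates.

```typescript
// Where a texture ended up inside the atlas, in pixels.
interface AtlasRegion {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Hypothetical shelf packer: textures are placed left to right on a row
// (shelf); when a row is full, a new one starts below the tallest item.
class ShelfAtlas {
  readonly size: number;
  private cursorX = 0;
  private cursorY = 0;
  private shelfHeight = 0;
  private regions = new Map<string, AtlasRegion>();

  constructor(size: number = 4096) {
    this.size = size;
  }

  // Reserves room for a texture. Returns null when it does not fit,
  // in which case the caller can create another atlas.
  add(id: string, width: number, height: number): AtlasRegion | null {
    let x = this.cursorX;
    let y = this.cursorY;
    let shelf = this.shelfHeight;
    if (x + width > this.size) {
      // Current shelf is full: start a new one below it.
      x = 0;
      y += shelf;
      shelf = 0;
    }
    if (width > this.size || y + height > this.size) return null;
    // Commit the placement only once we know the texture fits.
    this.cursorX = x + width;
    this.cursorY = y;
    this.shelfHeight = Math.max(shelf, height);
    const region: AtlasRegion = { x, y, width, height };
    this.regions.set(id, region);
    return region;
  }

  // Normalized texture coordinates [u0, v0, u1, v1] of a stored texture.
  uv(id: string): [number, number, number, number] | undefined {
    const r = this.regions.get(id);
    if (!r) return undefined;
    return [r.x / this.size, r.y / this.size, (r.x + r.width) / this.size, (r.y + r.height) / this.size];
  }
}
```

A shelf packer wastes some space when sprite heights vary a lot, but it keeps the example short; more compact packing strategies exist and would expose the same kind of lookup.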
After implementing the texture atlas, I attained WebGL nirvana: I call the draw function only once per frame. Well, to be honest, it is more accurate to say that the draw function is called once per frame for each layer. That means 10 draws in myshmup.com: 6 parallax layers in the game and 4 more for the game UI (score bar and buttons). And that’s it. I can have thousands of objects and there will still be only 10 draws every frame, with the GPU doing the heavy lifting and rendering everything like a genie out of a lamp.
This journey into WebGL optimization was full of surprises. If implementing instanced drawing and a texture atlas seems like over-engineering, believe me, it is not. It is key to having a fluid action game in the browser. Only after that was I 100% confident in the robustness of my platform to deliver fluid entertainment. When you have a nearly constant 60 FPS frame rate, the action is visceral and you are in the game. You are playing on a 16-bit console or an arcade machine, and you forget that all of this happens in a simple browser. If I were starting the myshmup.com project again, I would do it sooner for sure.