Learn OpenGL: Part 1

Introduction

After exploring ray tracing in my last two projects, I wanted to learn more about rasterization and real-time graphics. I picked up a copy of “Learn OpenGL – Graphics Programming”, which is often cited as a good introduction to the topic. The book is rather large, even by programming textbook standards, and covers a lot of material. To avoid writing one long blog post I am going to break up my experience into multiple posts. I am still working my way through the book, but I have completed the introductory sections, and this post will cover those first three sections. I am going to try to keep my discussion here to an overview; the idea of this post is to help me cement my understanding of the broad ideas, not to be overly detailed. If you are interested in learning more you can pick the book up on Amazon for yourself.

(https://www.amazon.com/Learn-OpenGL-programming-step-step/dp/9090332561/ref=sr_1_3?crid=OKI60VDM8AHJ&dchild=1&keywords=learn+opengl&qid=1609380799&sprefix=learn+open%2Caps%2C221&sr=8-3).

Getting Started

This section is all about getting everything set up and rendering your first scene with OpenGL. A good portion of it is devoted to just getting your project set up so that you can even start working with OpenGL. The overarching concepts introduced are vertex data, shaders, textures, and geometry transformation. At the end of the section I was able to create a scene filled with rotating boxes and a moveable camera.

Hello Triangle

First Triangle

To get started the book walks you through all the steps to display a single triangle. While this seems like a trivial task, there were a lot of prerequisite concepts that made it challenging.

One of the big concepts is understanding how OpenGL stores and handles geometry data on the GPU. Vertices, which are sets of Cartesian coordinates that define a shape, are essentially treated as one large array. Vertex Buffer Objects (VBOs) store that raw vertex data on the GPU, Vertex Array Objects (VAOs) record how the data is laid out so the vertex shader can read it, and Element Buffer Objects (EBOs) store indices so vertices can be reused between triangles. Interestingly, the vertex array can store normals, vertex colors, and texture coordinates along with position data.
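To make that concrete, here is a minimal sketch (in the C++ style the book uses, with GLAD/GLFW already set up) of uploading one triangle's interleaved position and color data; the actual values are just placeholders.

// Interleaved vertex data: x, y, z position followed by r, g, b color.
float vertices[] = {
    -0.5f, -0.5f, 0.0f,   1.0f, 0.0f, 0.0f,
     0.5f, -0.5f, 0.0f,   0.0f, 1.0f, 0.0f,
     0.0f,  0.5f, 0.0f,   0.0f, 0.0f, 1.0f,
};
unsigned int indices[] = { 0, 1, 2 };   // one triangle, referencing vertices by index

unsigned int VAO, VBO, EBO;
glGenVertexArrays(1, &VAO);
glGenBuffers(1, &VBO);
glGenBuffers(1, &EBO);

glBindVertexArray(VAO);                        // the VAO records the layout set up below

glBindBuffer(GL_ARRAY_BUFFER, VBO);            // the VBO holds the raw vertex data
glBufferData(GL_ARRAY_BUFFER, sizeof(vertices), vertices, GL_STATIC_DRAW);

glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, EBO);    // the EBO holds the indices
glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(indices), indices, GL_STATIC_DRAW);

// Attribute 0 is the position (3 floats), attribute 1 the color (3 floats); stride is 6 floats.
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 6 * sizeof(float), (void*)0);
glEnableVertexAttribArray(0);
glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 6 * sizeof(float), (void*)(3 * sizeof(float)));
glEnableVertexAttribArray(1);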

The chapter also briefly talks about shaders, but only enough to get your first triangle on screen.

Introduction to Shaders

Before discussing shaders you are introduced to the render pipeline: roughly six stages that convert geometry data into colored pixels. To be fully honest, I do not fully understand the whole process yet. However, at this point there are only two programmable stages that you are really interested in: the vertex shader and the fragment shader.

The vertex shader's sole purpose is to transform 3D coordinates. Later the vertex shader will play a large role in positioning objects in a scene and letting the user move around in 3D space. For now, though, it simply takes in vertex coordinates and passes them along to the fragment shader.

The fragment shader calculates the final color. The rasterizer first splits the geometry into smaller fragments, and the fragment shader then computes the color of each one. In the above image you can see how I pass vertex color data out of the vertex shader, and the fragment shader linearly interpolates between the vertices to shade the entire triangle. The fragment shader is going to play a crucial role once the book gets into lighting and texturing.

An interesting aside: it is also possible to calculate color values in the vertex shader and pass them to the fragment shader for rendering. However, because the color is then only evaluated once per vertex instead of once per fragment, the resulting image looks blocky and is not ideal.
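For reference, here is a rough sketch of how the two shader stages get compiled and linked into a shader program on the C++ side. It assumes the GLSL sources have already been read into C strings named vertexSrc and fragmentSrc, and it skips the error checking the book adds.

// Compile the vertex shader from its GLSL source string.
unsigned int vertexShader = glCreateShader(GL_VERTEX_SHADER);
glShaderSource(vertexShader, 1, &vertexSrc, NULL);
glCompileShader(vertexShader);

// Compile the fragment shader the same way.
unsigned int fragmentShader = glCreateShader(GL_FRAGMENT_SHADER);
glShaderSource(fragmentShader, 1, &fragmentSrc, NULL);
glCompileShader(fragmentShader);

// Link both stages into a single shader program, then free the individual shaders.
unsigned int shaderProgram = glCreateProgram();
glAttachShader(shaderProgram, vertexShader);
glAttachShader(shaderProgram, fragmentShader);
glLinkProgram(shaderProgram);
glDeleteShader(vertexShader);
glDeleteShader(fragmentShader);

glUseProgram(shaderProgram);   // activate the program before drawing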

Textures

Blending two textures

After getting an introduction to the basics of shaders, the next step is learning how to apply textures to create more complex materials.

Each vertex has a texture coordinate associated with it: a 2D vector (x, y) with values between 0 and 1. Coordinate (0,0) is the lower-left corner of an image and (1,1) is the upper right. Using texture coordinates we can map an image onto a 3D object. These texture coordinates are declared in the vertex array as floats and then passed to the shader program, which uses them to sample the texture image.

Once we are able to map an image to geometry, we can import a texture from a standard image format (e.g. PNG or JPEG) and pass that texture to the fragment shader, which samples the image and outputs the final color value.

An interesting concept I learned about is mipmaps: pre-generated chains of progressively smaller versions of a texture. Objects further away from the viewer are sampled from a lower-resolution mipmap level, which reduces aliasing and saves memory bandwidth.
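Putting those pieces together, the texture creation path looks roughly like the sketch below. It assumes the image is loaded with stb_image (as in the book), that the file is a 3-channel image, and that the file name is just a placeholder.

unsigned int texture;
glGenTextures(1, &texture);
glBindTexture(GL_TEXTURE_2D, texture);

// Wrapping and filtering options; the min filter blends between mipmap levels.
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);

// Load the image from disk, upload it to the GPU, and let OpenGL generate the mipmap chain.
int width, height, channels;
unsigned char* data = stbi_load("container.jpg", &width, &height, &channels, 0);
if (data)
{
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, width, height, 0, GL_RGB, GL_UNSIGNED_BYTE, data);
    glGenerateMipmap(GL_TEXTURE_2D);
}
stbi_image_free(data);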

We are able to pass multiple textures to a shader program using texture units. In the image above, I blend two textures together to create a more complex texture. Multiple textures will later become important when creating shaders that use multiple light maps to control material properties. With texture units we can have separate textures for different light properties, such as specular and diffuse.
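The C++ side of that blend looks something like the sketch below; the sampler names texture1 and texture2 and the texture handles textureA and textureB are assumptions about how the rest of the program is written. In the fragment shader the two samplers can then be combined, for example with GLSL's mix() function.

// Tell the shader which texture unit each sampler should read from (done once).
glUseProgram(shaderProgram);
glUniform1i(glGetUniformLocation(shaderProgram, "texture1"), 0);
glUniform1i(glGetUniformLocation(shaderProgram, "texture2"), 1);

// Each frame, bind one texture object to each texture unit before drawing.
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, textureA);
glActiveTexture(GL_TEXTURE1);
glBindTexture(GL_TEXTURE_2D, textureB);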

Transformation

Next we want to be able to move objects and move around in a scene. It wouldn’t be a very interesting program or game if everything was stationary.

Geometry transformations rely heavily on vectors, matrices, and trigonometry. The basic idea is that we can transform a vertex, which is by nature a vector with three elements (or a 3-by-1 matrix), by multiplying it, extended to a 4-component homogeneous coordinate, with a 4-by-4 matrix that stores all the transformations. This builds on a lot of fundamental math concepts that I won't get into here. Basically, though, we can multiply together a matrix for translation, a matrix for rotation, and a matrix for scale to produce a single matrix that we then pass to our vertex shader. The great thing about this single transformation matrix is that we can update it every frame to create animated geometry.
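With GLM, the math library the book uses, composing and uploading such a matrix might look like the sketch below; the uniform name "transform", the shaderProgram handle, and the use of GLFW's timer to animate the rotation are all assumptions.

#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>
#include <glm/gtc/type_ptr.hpp>

// Compose one matrix; applied to a vertex it scales first, then rotates, then translates.
glm::mat4 transform = glm::mat4(1.0f);
transform = glm::translate(transform, glm::vec3(0.5f, 0.0f, 0.0f));
transform = glm::rotate(transform, (float)glfwGetTime(), glm::vec3(0.0f, 0.0f, 1.0f));
transform = glm::scale(transform, glm::vec3(0.5f));

// Upload it to the vertex shader; doing this every frame animates the geometry.
int loc = glGetUniformLocation(shaderProgram, "transform");
glUniformMatrix4fv(loc, 1, GL_FALSE, glm::value_ptr(transform));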

Coordinate Systems

This was a really conceptually heavy chapter. OpenGL expects all vertex coordinates to be normalized device coordinates (NDC), meaning they fall between -1.0 and 1.0. However, most of the time we are working with objects either in object space or in world space, so vertices have to be transformed for rendering. This is accomplished in a step-by-step fashion through several intermediate coordinate systems. The benefit of intermediate coordinate systems is that some operations are easier in one coordinate space than another.

Coordinate spaces


All geometry starts off in object space (local space), meaning that coordinates are relative to the origin of the object. If you assume the origin of an object is (0,0,0), then each vertex would be declared in relation to that origin. For a triangle the three vertices could be (1,0,0), (-1,0,0) and (0,1,0).

To create scenes with several transformed objects, it makes sense to declare their positions in terms of a global origin instead of each one having its own local space. That's where world-space coordinates come in. This coordinate system places objects with respect to a larger world.

Next, world-space coordinates are transformed into view-space coordinates. These tell us where objects are with respect to the viewer, meaning the viewer is the origin. It is like saying an object is 2 feet away from you.

Then comes the clip-space coordinate system, which maps coordinates into the -1.0 to 1.0 range and removes (clips) any coordinates that fall outside of it, essentially discarding anything that is not within the camera's view.

Lastly, there are screen-space coordinates, which simply map coordinates to the resolution of the display window.

Transformation matrices

To do all these coordinate transformations, the vertex shader multiplies each vertex position by a set of transformation matrices. There are three important matrices: the model, view, and projection matrix.

The model matrix is used to position objects in world space. The view matrix transforms everything to be in relation to the camera. Finally, the projection matrix transforms coordinates to normalized device coordinates (NDC), meaning they end up between -1.0 and 1.0. The projection matrix is also responsible for establishing perspective.

Below is a vertex shader demonstrating how each vertex position is multiplied by these matrices to move it through the different coordinate systems.

#version 330 core
layout (location = 0) in vec3 aPos;

uniform mat4 model;
uniform mat4 view;
uniform mat4 projection;

void main()
{
    // Transform the local-space position through model, view, and projection into clip space.
    gl_Position = projection * view * model * vec4(aPos, 1.0);
}
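On the C++ side, those three uniforms can be built with GLM and uploaded each frame, roughly as in the sketch below; the rotation, camera offset, 800x600 aspect ratio, and the shaderProgram handle are placeholders (same GLM includes as the earlier transformation sketch).

// Model: place the object in the world (here just tilted around the x-axis).
glm::mat4 model = glm::rotate(glm::mat4(1.0f), glm::radians(-55.0f), glm::vec3(1.0f, 0.0f, 0.0f));

// View: move the whole scene backwards so the viewer effectively sits at z = +3.
glm::mat4 view = glm::translate(glm::mat4(1.0f), glm::vec3(0.0f, 0.0f, -3.0f));

// Projection: 45-degree perspective frustum with near and far planes.
glm::mat4 projection = glm::perspective(glm::radians(45.0f), 800.0f / 600.0f, 0.1f, 100.0f);

glUniformMatrix4fv(glGetUniformLocation(shaderProgram, "model"), 1, GL_FALSE, glm::value_ptr(model));
glUniformMatrix4fv(glGetUniformLocation(shaderProgram, "view"), 1, GL_FALSE, glm::value_ptr(view));
glUniformMatrix4fv(glGetUniformLocation(shaderProgram, "projection"), 1, GL_FALSE, glm::value_ptr(projection));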

Camera

Working first person camera

With all that said about coordinate systems we can now create a virtual camera and move around a 3D scene.

Finding camera direction

The first step is to establish the camera's position and direction. Position is simply a vector (x, y, z) representing a position in world space. Getting the camera's orientation requires a more involved process. We start by finding the camera's z-axis, the vector that points from the target position (where we want the camera to look) back toward the camera, by taking the difference between the camera position and the target position. Then the x-axis, the camera's right vector, is found using the cross product between the world up vector, usually (0,1,0), and the z-axis. Finally the y-axis, the camera's up vector, is the cross product of the z-axis and the x-axis.
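In GLM that construction looks roughly like this (the positions are placeholder values); glm::lookAt then builds the view matrix from the same three pieces of information.

glm::vec3 cameraPos    = glm::vec3(0.0f, 0.0f, 3.0f);
glm::vec3 cameraTarget = glm::vec3(0.0f, 0.0f, 0.0f);
glm::vec3 worldUp      = glm::vec3(0.0f, 1.0f, 0.0f);

// z-axis: points from the target back toward the camera.
glm::vec3 cameraDirection = glm::normalize(cameraPos - cameraTarget);
// x-axis: perpendicular to both the world up vector and the camera direction.
glm::vec3 cameraRight = glm::normalize(glm::cross(worldUp, cameraDirection));
// y-axis: perpendicular to the other two axes.
glm::vec3 cameraUp = glm::cross(cameraDirection, cameraRight);

// glm::lookAt packs the same information into a view matrix for the vertex shader.
glm::mat4 view = glm::lookAt(cameraPos, cameraTarget, worldUp);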

By updating these vectors we can move and look around. The really interesting concept behind the camera is that in actuality the camera never moves. Instead, all the objects in the scene move around the camera with no perceptual difference. To accomplish this we apply the inverse camera transformation matrix to every object in the scene. If we want to move the camera 2 units to the left we apply the inverse and move all objects in the scene 2 units to the right.

Euler Rotations

Camera rotation is accomplished using Euler angles (yaw and pitch), with a little help from trigonometry that I won't get into here. Basically, we track the rotation around each axis, use it to update the camera's forward axis (z-axis), and pass that to a function that generates the view matrix, which is then applied, as the inverse camera transform, to all objects in the scene.
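A sketch of that update, using the yaw/pitch formulas from the book; the yaw, pitch, cameraPos, and cameraUp variables are assumed to be tracked from mouse and keyboard input elsewhere.

// Convert the yaw and pitch angles into a unit-length forward vector.
glm::vec3 front;
front.x = cos(glm::radians(yaw)) * cos(glm::radians(pitch));
front.y = sin(glm::radians(pitch));
front.z = sin(glm::radians(yaw)) * cos(glm::radians(pitch));
glm::vec3 cameraFront = glm::normalize(front);

// Rebuild the view matrix so the camera looks along the new forward vector.
glm::mat4 view = glm::lookAt(cameraPos, cameraPos + cameraFront, cameraUp);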

Lighting

Light Maps

This was the section that I was most excited to get to. The book uses the Phong lighting model, which I have experience with from a ray tracer I previously worked on, to create a lighting shader. The basic components of a Phong shader are ambient, diffuse, and specular lighting, which, when added together, give the final color of a fragment.

Ambient light is a simplistic stand-in for global illumination. It assumes that every object, whether or not it receives direct lighting, has some light reaching it and is therefore never completely black. We give all surfaces a small default amount of light.

vec3 ambient = light.ambient * vec3(texture(material.texture_diffuse1, TexCoords)); 

Diffuse lighting represents light that hits the surface directly. To find the diffuse lighting value we multiply the surface color by a diffuse influence value: the dot product of the surface normal and the light direction, clamped to zero. If the light direction is perpendicular to the surface (aligned with the surface normal), the influence is 1, meaning the light fully contributes to the final color. If the light direction is parallel to the surface, forming a 90 degree or greater angle with the normal, the influence is 0 and diffuse lighting contributes nothing.

float diff = max(dot(normal,lightDir), 0.0);
vec3 diffuse = light.diffuse * (diff * vec3(texture(material.texture_diffuse1, TexCoords)));

Finally, specular lighting represents reflected light, the shiny highlight on a surface. The specular lighting value is the result of the surface's specular color multiplied by a specular influence, which measures how closely the view direction lines up with the reflected light direction and is raised to a shininess power.

float spec = pow(max(dot(viewDir, reflectDir), 0.0), material.shininess);
vec3 specular = light.specular * (spec * vec3(texture(material.texture_specular1, TexCoords)));

Below is a visual representation of how the specular influence value is determined.


Phong Lighting model

Light Casters

Demonstrating spot lights

In this book there are three different types of light sources, also known as light casters.

The first is a directional light, which simulates light from a very distant source. When a light source is infinitely far away its rays are effectively parallel to one another; imagine the sun as a directional light. Instead of specifying a light position and finding the direction vector between the light and each fragment, we specify a single light direction, and all lighting is calculated with that same direction vector regardless of the fragment's position.

Light Falloff

Next is the point light, which is the light caster we have been using in the previous section. Here we add an important feature to the earlier implementation: light falloff. By scaling all of the light components (ambient, diffuse, and specular) by an attenuation factor of the form 1.0 / (Kc + Kl*d + Kq*d*d), where d is the distance from the fragment to the light, the light's intensity diminishes as objects get further away from it.

Spot light model

Finally, there is the spot light, which is essentially a point light that only casts light within a cone instead of in all directions. This is accomplished by specifying both a light position and a direction. We then check whether the vector from the fragment to the light falls within the cone: if the angle between that light vector and the spot light's direction is smaller than the cone angle, the light contributes to the fragment's lighting. To get a nice falloff at the edge of the cone we specify both an inner and an outer cone angle; the closer the angle gets to the outer cone angle, the less the light contributes to the final color.

Multiple Lights

Demonstrating multiple lights

Finally, most scenes have more than one light. To add multiple lights to a scene we wrap the lighting calculation for each type of light caster in its own shader function and then add each light's return value to get the total color of a fragment.
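On the C++ side this usually means filling a uniform array of light structs every frame, something like the sketch below; the pointLights uniform name, the NR_POINT_LIGHTS count, and the pointLightPositions array are assumptions about how the shader and scene are set up. In the fragment shader a matching loop calls the point-light function for each entry and sums the results.

#include <string>

// Upload the position and color terms for each point light the shader knows about.
for (int i = 0; i < NR_POINT_LIGHTS; ++i)
{
    std::string base = "pointLights[" + std::to_string(i) + "]";
    glUniform3fv(glGetUniformLocation(shaderProgram, (base + ".position").c_str()),
                 1, glm::value_ptr(pointLightPositions[i]));
    glUniform3f(glGetUniformLocation(shaderProgram, (base + ".ambient").c_str()),  0.05f, 0.05f, 0.05f);
    glUniform3f(glGetUniformLocation(shaderProgram, (base + ".diffuse").c_str()),  0.8f, 0.8f, 0.8f);
    glUniform3f(glGetUniformLocation(shaderProgram, (base + ".specular").c_str()), 1.0f, 1.0f, 1.0f);
}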

Model Loading

Loaded models

Up until now we have been working only with cubes, since it is easy to input their vertex data by hand. However, if we want to work with more complex geometry we have to import it from outside modeling software. This turns out to be a much more complex process than I anticipated. The book uses a library called Assimp that handles the more detailed aspects of asset importing.

Assimp takes model data from a bunch of different file formats (FBX, OBJ, and many others) and converts it into a standardized internal format. Instead of having to write an import class for each file format, with Assimp's help we can create a single import class and load models from many formats.

Assimp data model

We create two new classes for importing models: the model and mesh classes. Each model can be composed of several meshes; for instance, a car model could be composed of tire, window, and seat meshes. Using Assimp's recursive scene structure, we iterate through a model until all sub-meshes have been imported. As we load mesh data we also load the associated texture files and hand them to the fragment shader. This was a really tricky bit that, to be honest, I did not completely understand. Basically, though, by using a consistent naming scheme the image files can be assigned to texture units and passed to the fragment shader as uniforms.
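The skeleton of that recursion, using Assimp's scene graph, looks roughly like this; processMesh is a hypothetical helper that copies vertex, index, and texture data into our own Mesh class, and meshes is assumed to be a std::vector<Mesh> member of the Model class.

#include <assimp/Importer.hpp>
#include <assimp/scene.h>
#include <assimp/postprocess.h>

void Model::loadModel(const std::string& path)
{
    Assimp::Importer importer;
    // Ask Assimp to triangulate everything and flip UVs to match OpenGL's convention.
    const aiScene* scene = importer.ReadFile(path, aiProcess_Triangulate | aiProcess_FlipUVs);
    if (!scene || (scene->mFlags & AI_SCENE_FLAGS_INCOMPLETE) || !scene->mRootNode)
        return;                                    // the import failed

    processNode(scene->mRootNode, scene);          // walk the node hierarchy recursively
}

void Model::processNode(aiNode* node, const aiScene* scene)
{
    // Convert every mesh referenced by this node, then recurse into its children.
    for (unsigned int i = 0; i < node->mNumMeshes; ++i)
        meshes.push_back(processMesh(scene->mMeshes[node->mMeshes[i]], scene));
    for (unsigned int i = 0; i < node->mNumChildren; ++i)
        processNode(node->mChildren[i], scene);
}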

Finally, I applied the lighting techniques I learned about in the last section to the imported models to get the scene below.

Conclusion

At this point I can import models, create interactive lighting, and move a camera around the scene. In essence I have a very boring game. While this doesn’t sound impressive I really learned a lot to get to this point. There are still 4 major sections in the book I need to read, but I’ve been excited by what I have learned so far. Next, I will be learning about more advanced OpenGL concepts like anti-aliasing. The section that I am looking forward to is the PBR lighting section. Hopefully, I can write another update soon.
