Putting It All Together — Game Physics and Machine Learning

You have spent eight chapters building a toolkit. You have vectors and dot products, matrices and transformations, determinants, systems of equations, and eigenvectors. Each piece made sense on its own, but the real payoff is what happens when they work together.

This chapter contains two extended examples. The first walks a single vertex — one corner of a spaceship — all the way from its definition in a 3D model file through rotation, camera positioning, and perspective projection onto your screen. The second passes a tiny input through a two-layer neural network and shows, in concrete numbers, that every "layer" is just a matrix multiply followed by a nonlinear squeeze. By the end you will be able to look at real engine or framework code and recognise the linear algebra underneath.

Part 1 — The 3D Rendering Pipeline

The Problem of Coordinate Spaces

A 3D model is just a list of vertices — points in space relative to the model's own origin. A spaceship's nose might sit at $(0, 0, 3)$ in the file, meaning "3 units forward from the ship's centre." But the game world has its own origin, the camera has its own perspective, and the screen is a flat 2D rectangle. Getting from model file to pixel requires five different coordinate systems and a matrix for each transition between them.[^1]

Object space  -->  World space  -->  View space  -->  Clip space  -->  Screen
              [M]              [V]              [P]           [÷w]

Each arrow is a matrix multiplication. The whole pipeline collapses to a single equation:

$$\mathbf{v}_{\text{clip}} = P \cdot V \cdot M \cdot \mathbf{v}_{\text{model}}$$

Right-to-left, as always with matrix products: model first, then view, then projection.[^1]

Homogeneous Coordinates — Why 4D?

Before we can build those matrices, there is a small bookkeeping issue. Translation — shifting a point by a fixed amount — is not a linear operation. You cannot express "move everything 5 units to the right" as a matrix multiply; you can only add a vector. But adding is inconvenient when you want to chain ten transforms together with a single multiply.

The fix is to represent every 3D point as a 4D homogeneous coordinate:

$$(x, y, z) \;\longmapsto\; (x, y, z, 1)$$

The fourth component is $w = 1$ for ordinary points. With this trick, a translation by $(t_x, t_y, t_z)$ becomes a perfectly ordinary matrix multiply:

$$\begin{pmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix} = \begin{pmatrix} x + t_x \\ y + t_y \\ z + t_z \\ 1 \end{pmatrix}$$

Now every transform — translate, rotate, scale — is a matrix, and composing transforms is just matrix multiplication.[^2]
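To see the trick in action, here is a minimal NumPy sketch (the helper name `translation` is ours, not from any particular engine):

```python
import numpy as np

def translation(tx, ty, tz):
    """Build a 4x4 homogeneous translation matrix."""
    T = np.eye(4)
    T[:3, 3] = [tx, ty, tz]
    return T

p = np.array([1.0, 2.0, 3.0, 1.0])   # the point (1, 2, 3) with w = 1
moved = translation(5, 0, 0) @ p      # "move everything 5 units to the right"
print(moved)                          # [6. 2. 3. 1.]
```

The addition never happens explicitly; the $w = 1$ component drags the last column of the matrix into the result.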

The Model Matrix — Placing the Ship in the World

The model matrix transforms vertices from the object's local coordinate system into the shared world coordinate system. It encodes three things:

  • Scale — how big the object is
  • Rotation — which way it is facing
  • Translation — where in the world it sits

These compose as $M = T \cdot R \cdot S$ (scale first, then rotate, then translate — because matrix multiplication applies right-to-left).[^3]

In Unity, this is exactly Matrix4x4.TRS(position, rotation, scale).[^4]

Let's say our spaceship sits at world position $\mathbf{t} = (t_x, t_y, t_z)$, is rotated by an angle $\theta$ around the $y$-axis, and has not been scaled. The model matrix is:

$$M = T \cdot R_y(\theta) = \begin{pmatrix} \cos\theta & 0 & \sin\theta & t_x \\ 0 & 1 & 0 & t_y \\ -\sin\theta & 0 & \cos\theta & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

The nose of the ship, at $(0, 0, 3)$ in local space, gets multiplied by $M$ to land at its world-space position. Every vertex of the ship goes through the same matrix — the whole model moves as a unit.
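A NumPy sketch of the same idea, with made-up example numbers (world position $(10, 0, 5)$, a 90° rotation; these values are ours, chosen only to make the arithmetic traceable):

```python
import numpy as np

def translation(tx, ty, tz):
    T = np.eye(4)
    T[:3, 3] = [tx, ty, tz]
    return T

def rotation_y(theta):
    """4x4 homogeneous rotation about the y-axis."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.eye(4)
    R[0, 0], R[0, 2] = c, s
    R[2, 0], R[2, 2] = -s, c
    return R

# M = T * R (no scaling) -- translate last, rotate first
M = translation(10.0, 0.0, 5.0) @ rotation_y(np.pi / 2)

nose_local = np.array([0.0, 0.0, 3.0, 1.0])   # 3 units forward in object space
nose_world = M @ nose_local
print(np.round(nose_world[:3], 6))            # [13.  0.  5.]
```

The 90° rotation swings the nose from local $+z$ onto world $+x$, and the translation then carries it to $(13, 0, 5)$.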

Chapter 4 in action

This is exactly the transformation composition from Chapter 4. Rotation is a linear transformation, translation is an affine one, and stacking them as matrices turns both into matrix multiplication. The identity matrix leaves the ship untouched; the inverse moves it back.

The View Matrix — Moving the World to Face the Camera

There is no actual camera in 3D graphics. Instead, the view matrix moves the entire world so that the camera sits at the origin and looks down the negative $z$-axis.[^2]

Mathematically, $V$ is the inverse of the camera's own model matrix. If the camera is at position $\mathbf{p}$ and oriented with rotation $R$, then:

$$V = (T_{\mathbf{p}} \cdot R)^{-1} = R^{-1} \cdot T_{\mathbf{p}}^{-1} = R^{\top} \cdot T_{-\mathbf{p}}$$

For rotation matrices, $R^{-1} = R^{\top}$ (from Chapter 4 — orthogonal matrices have this convenient property), so the view matrix is cheap to compute. Applying $V$ to every world-space vertex repositions them as if seen through the camera's eye.
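A quick NumPy check that the transpose trick really does produce the inverse (the camera pose numbers are arbitrary):

```python
import numpy as np

# Camera model matrix: rotation about y, then translation (C = T_p @ R)
theta = np.pi / 4
c, s = np.cos(theta), np.sin(theta)
R = np.eye(4)
R[0, 0], R[0, 2], R[2, 0], R[2, 2] = c, s, -s, c
T = np.eye(4)
T[:3, 3] = [2.0, 1.0, 8.0]
C = T @ R

# Cheap view matrix: V = R^T @ T(-p), no general matrix inversion needed
T_inv = np.eye(4)
T_inv[:3, 3] = [-2.0, -1.0, -8.0]
V = R.T @ T_inv

print(np.allclose(V, np.linalg.inv(C)))   # True
```

Transposing a matrix and negating a vector is far cheaper than a general 4×4 inversion, which is why engines special-case the camera this way.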

Einstein's principle, applied

The camera "moving forward" is mathematically identical to the world "moving backward." The view matrix applies the world-moving version because it is easier to keep the camera fixed and transform geometry.

The Projection Matrix — Flattening 3D to 2D

The projection matrix is the most visually dramatic step. It encodes the camera's viewing frustum — the pyramid-shaped volume of space that is visible — and maps it to a standard cube called clip space.[^1]

For perspective projection, the matrix also achieves the "things far away look smaller" effect by manipulating the $w$ component. A point far from the camera will emerge from the multiply with a larger $w$ than a point close to it. When the GPU divides all components by $w$ (the perspective divide), the far point gets squished toward the centre of the screen.

The perspective projection matrix, given vertical field-of-view $\theta$, aspect ratio $a$, near plane $n$, and far plane $f$:

$$P = \begin{pmatrix} \dfrac{1}{a\tan(\theta/2)} & 0 & 0 & 0 \\ 0 & \dfrac{1}{\tan(\theta/2)} & 0 & 0 \\ 0 & 0 & -\dfrac{f+n}{f-n} & -\dfrac{2fn}{f-n} \\ 0 & 0 & -1 & 0 \end{pmatrix}$$

Notice the bottom row: it writes $-z$ into the $w$ slot, setting up the perspective divide. After multiplying a view-space point by $P$, the GPU divides the result by $w$ to get normalized device coordinates (NDC) in the range $[-1, 1]$. Anything outside that cube is clipped — never drawn.[^2]
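Here is the same matrix in NumPy, with the perspective divide done by hand so you can watch a far point shrink toward the screen centre (the test points are arbitrary):

```python
import numpy as np

def perspective(fov_y, aspect, near, far):
    """OpenGL-style perspective projection matrix (right-handed, NDC in [-1, 1])."""
    f = 1.0 / np.tan(fov_y / 2.0)
    P = np.zeros((4, 4))
    P[0, 0] = f / aspect
    P[1, 1] = f
    P[2, 2] = -(far + near) / (far - near)
    P[2, 3] = -2.0 * far * near / (far - near)
    P[3, 2] = -1.0                      # writes -z into the w slot
    return P

P = perspective(np.radians(60), 16 / 9, 0.1, 100.0)

near_pt = P @ np.array([1.0, 1.0, -2.0, 1.0])    # close to the camera
far_pt  = P @ np.array([1.0, 1.0, -50.0, 1.0])   # far away, same x and y

ndc_near = near_pt[:3] / near_pt[3]   # the perspective divide
ndc_far  = far_pt[:3] / far_pt[3]
print(abs(ndc_far[0]) < abs(ndc_near[0]))   # True: the far point is squished inward
```

Both points start with the same $x$ and $y$; only their depth differs, and the divide by $w$ alone produces the foreshortening.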

Putting It Together

Here is what the GLSL vertex shader in an OpenGL program actually looks like:

```glsl
// vertex shader
uniform mat4 model;
uniform mat4 view;
uniform mat4 projection;

in vec3 aPos;  // vertex position from the model file

void main() {
    gl_Position = projection * view * model * vec4(aPos, 1.0);
}
```

One line of math. The entire pipeline — object placement, camera simulation, perspective — collapses to $P \cdot V \cdot M \cdot \mathbf{v}$.[^5]

And because matrix multiplication is associative, the engine can precompute $MVP = P \cdot V \cdot M$ once per object per frame, then apply the single combined matrix to every vertex. That is why game engines ship mvpMatrix as a single uniform — not three separate ones.

```glsl
// equivalent, and faster for many vertices
uniform mat4 mvp;
void main() {
    gl_Position = mvp * vec4(aPos, 1.0);
}
```

Collision Detection: The Dot Product Returns

Once vertices are in world space (after applying $M$ but before $V$ or $P$), the physics engine does its work. Collision detection uses every concept from the early chapters:

  • Dot product (Chapter 3): check if two bounding-sphere centres are closer than the sum of their radii; use the dot product to project a velocity onto a surface normal and compute the component of force perpendicular to the surface.
  • Linear systems (Chapter 6): solve for the exact moment when two swept shapes intersect.
  • Determinants / cross products (Chapter 7): compute the surface normal of a triangle from its two edge vectors; determine which side of a plane a point is on.

The rendering pipeline and the physics pipeline share the same vertices, the same matrices, and the same linear algebra. They are running in parallel, every frame, sixty times a second.
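Two of those bullet points can be sketched in a few lines of NumPy (the sphere positions and vectors are made-up examples):

```python
import numpy as np

# Bounding-sphere overlap: are the centres closer than the sum of the radii?
c1, r1 = np.array([0.0, 0.0, 0.0]), 1.0
c2, r2 = np.array([1.5, 0.0, 0.0]), 1.0
d = c2 - c1
overlapping = d @ d < (r1 + r2) ** 2   # compare squared distances: no sqrt needed
print(overlapping)                      # True: 2.25 < 4

# Project a velocity onto a surface normal (Chapter 3's dot product)
v = np.array([3.0, -4.0, 0.0])         # falling and moving right
n = np.array([0.0, 1.0, 0.0])          # floor normal, unit length
v_into = (v @ n) * n                   # component into the surface
v_slide = v - v_into                   # component sliding along the surface
print(v_slide)                         # [3. 0. 0.]
```

The dot product of a vector with itself is its squared length, which is why the overlap test never needs a square root.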

Part 2 — A Neural Network Forward Pass

A Small but Complete Network

Let's build the smallest neural network that actually does something interesting: a binary classifier that takes a 2D point and outputs a number between 0 and 1 representing "how confident are we that this point is inside the unit circle?"

The architecture:

Input layer     Hidden layer    Output layer
(2 neurons)     (4 neurons)     (1 neuron)

   x1 ----+---- h1 ----+
          |---- h2     |---- y  (probability)
   x2 ----+---- h3 ----+
          |---- h4 ----+

Two inputs, four hidden neurons, one output. Small enough to trace by hand, representative enough to show all the important ideas.

Layer 1 — Linear Transform Plus Nonlinearity

Every fully connected layer does two things: a linear transformation (matrix multiply plus bias), and a nonlinear activation.[^6]

For the first layer, the weight matrix $W_1$ has shape $4 \times 2$ (four output neurons, two inputs), and the bias vector $\mathbf{b}_1$ has shape $4 \times 1$:

$$\mathbf{z}_1 = W_1 \mathbf{x} + \mathbf{b}_1$$

Let's feed in the point $\mathbf{x} = (0.6, 0.8)$ (which lies on the unit circle, since $0.6^2 + 0.8^2 = 1$, so the correct answer is roughly 1.0).

Now apply the ReLU activation — zero out anything negative:

$$\mathbf{a}_1 = \mathrm{ReLU}(\mathbf{z}_1) = \max(\mathbf{0}, \mathbf{z}_1) \quad \text{(elementwise)}$$

Why the nonlinearity?

Without an activation function, stacking two linear layers is the same as having one linear layer — you can always merge $W_2(W_1 \mathbf{x} + \mathbf{b}_1) + \mathbf{b}_2$ into a single affine map $(W_2 W_1)\mathbf{x} + (W_2 \mathbf{b}_1 + \mathbf{b}_2)$. The nonlinearity (ReLU, sigmoid, tanh) breaks linearity so the network can learn curved decision boundaries.
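A few lines of NumPy make the collapse concrete (random example weights):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 2)), rng.normal(size=4)
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)
x = rng.normal(size=2)

# Two stacked linear layers, no activation in between...
two_layers = W2 @ (W1 @ x + b1) + b2

# ...are always identical to one merged affine map
W_merged = W2 @ W1
b_merged = W2 @ b1 + b2
one_layer = W_merged @ x + b_merged

print(np.allclose(two_layers, one_layer))   # True
```

No matter how the weights are chosen, the two-layer stack never expresses anything the merged single layer cannot; only the nonlinearity changes that.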

Layer 2 — Output Layer

The second (and final) weight matrix $W_2$ has shape $1 \times 4$ (one output, four inputs from the hidden layer), and the bias $b_2$ is a scalar:

$$z_2 = W_2 \mathbf{a}_1 + b_2$$

Apply the sigmoid activation to squash $z_2$ to $(0, 1)$:

$$\hat{y} = \sigma(z_2) = \frac{1}{1 + e^{-z_2}}$$
The network outputs $\hat{y} \approx 0.5$ — roughly 50/50. These are randomly chosen weights, so this is expected. After training (adjusting $W_1$, $\mathbf{b}_1$, $W_2$, $b_2$ via gradient descent), the output for a point on the unit circle would be pushed toward 1.0.

Writing It Out as Pure Matrix Math

Here is the full forward pass, symbolically:

$$\hat{y} = \sigma\!\big(W_2 \,\mathrm{ReLU}(W_1 \mathbf{x} + \mathbf{b}_1) + b_2\big)$$

That is the entire forward pass of this network. Unwrap the notation and it is:

  1. One matrix–vector multiply: $W_1 \mathbf{x}$
  2. One vector addition: $+\,\mathbf{b}_1$
  3. One elementwise nonlinearity: $\mathbf{a}_1 = \mathrm{ReLU}(\mathbf{z}_1)$
  4. One matrix–vector multiply: $W_2 \mathbf{a}_1$
  5. One scalar addition: $+\,b_2$
  6. One scalar nonlinearity: $\hat{y} = \sigma(z_2)$

Steps 1–2 and 4–5 are pure linear algebra. Steps 3 and 6 are simple elementwise operations — no matrix math, just applying a function to each number independently. The entire intelligence of the network — its ability to classify — lives in the learned values of $W_1$, $\mathbf{b}_1$, $W_2$, and $b_2$.
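The six steps above can be sketched directly in NumPy (random untrained weights, so the output hovers near 0.5 as the text describes):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)                         # random, untrained weights
W1, b1 = rng.normal(size=(4, 2)), rng.normal(size=4)    # hidden layer: 4x2
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)    # output layer: 1x4

x = np.array([0.6, 0.8])      # a point on the unit circle

z1 = W1 @ x + b1              # steps 1-2: matrix-vector multiply, vector add
a1 = relu(z1)                 # step  3:   elementwise nonlinearity
z2 = W2 @ a1 + b2             # steps 4-5: matrix-vector multiply, scalar add
y_hat = sigmoid(z2)           # step  6:   squash to (0, 1)

print(0.0 < y_hat[0] < 1.0)   # True -- a probability, though meaningless until trained
```

Swap in trained weights and the same four lines of arithmetic become a working classifier; nothing about the computation changes.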

The Same Thing in PyTorch

Here is the identical network written in PyTorch:

```python
import torch
import torch.nn as nn

class CircleClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(2, 4)   # W1 is 4×2, b1 has 4 entries
        self.layer2 = nn.Linear(4, 1)   # W2 is 1×4, b2 has 1 entry

    def forward(self, x):
        z1 = self.layer1(x)             # z1 = W1 x + b1
        a1 = torch.relu(z1)             # a1 = ReLU(z1)
        z2 = self.layer2(a1)            # z2 = W2 a1 + b2
        return torch.sigmoid(z2)        # ŷ = σ(z2)
```

nn.Linear(in, out) stores a weight matrix of shape (out, in) and a bias vector of shape (out,). Its forward method computes x @ W.T + b, which is exactly $W\mathbf{x} + \mathbf{b}$.[^7] Each line of the forward method maps directly to the steps above (nn.Linear fuses the multiply and the bias add into a single call).

A real network for image classification (say, recognising handwritten digits) uses the same structure — just larger matrices. The first layer might be $784 \times 512$ (one input for each pixel of a $28 \times 28$ image), followed by several hidden layers, followed by a $512 \times 10$ output layer (one score per digit class). The forward pass is still a chain of matrix multiplies and nonlinearities. The math is identical.

What about convolutional networks?

Convolutional layers are a special case of linear operations — they apply the same small weight matrix (the kernel) to every local patch of the input. In the limit, they can be written as a very large sparse matrix multiply. The underlying linear algebra is the same.
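A tiny 1D example makes this concrete: sliding a kernel over a signal gives exactly the same numbers as multiplying by a banded (Toeplitz) matrix built from that kernel. (The signal and kernel values here are arbitrary.)

```python
import numpy as np

signal = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
kernel = np.array([1.0, 0.0, -1.0])   # a simple difference filter

# Build the sparse banded matrix: each row is the kernel, shifted one step right
n_out = len(signal) - len(kernel) + 1
T = np.zeros((n_out, len(signal)))
for i in range(n_out):
    T[i, i:i + len(kernel)] = kernel

# Matrix multiply vs. sliding the kernel directly: identical results
slid = np.array([kernel @ signal[i:i + len(kernel)] for i in range(n_out)])
print(np.allclose(T @ signal, slid))   # True
```

Every row of $T$ holds the same three weights, just shifted; that weight sharing is what makes convolutions so much cheaper to store than a dense layer of the same size.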

Now You Can Read This

Look at these two real code snippets. Both should feel familiar.

From a Unity vertex shader (HLSL):

hlsl
float4 vert(float4 vertex : POSITION) : SV_POSITION {
    return UnityObjectToClipPos(vertex);
}

UnityObjectToClipPos is exactly $P \cdot V \cdot M \cdot \mathbf{v}$ — the combined model, view, and projection matrices, precomputed and stored in unity_MatrixMVP, applied to a homogeneous vertex.[^4]

From a PyTorch model summary:

Layer (type)    Output Shape    Param #
Linear-1        [1, 128]        100,480   ← 784×128 weights + 128 biases
ReLU-1          [1, 128]              0
Linear-2        [1, 64]           8,256   ← 128×64 weights + 64 biases
ReLU-2          [1, 64]               0
Linear-3        [1, 10]             650   ← 64×10 weights + 10 biases

Each Linear layer is a matrix multiply. The Param # column is the number of entries in $W$ plus $\mathbf{b}$. The Output Shape is the shape of the layer's output $W\mathbf{x} + \mathbf{b}$.
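You can verify every number in that Param # column with one line of arithmetic per layer (the helper name linear_params is ours):

```python
def linear_params(n_in, n_out):
    """Entries in W (n_out x n_in) plus entries in b (n_out)."""
    return n_out * n_in + n_out

print(linear_params(784, 128))   # 100480
print(linear_params(128, 64))    # 8256
print(linear_params(64, 10))     # 650
```

Counting parameters this way is a quick sanity check when reading any model summary: if the number doesn't factor as $n_{\text{out}} \cdot n_{\text{in}} + n_{\text{out}}$, the layer isn't a plain Linear.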

You now have the vocabulary to read both of these without guessing.

Course Recap

Eight chapters. Here is what you have built:

  1. Vectors (Ch. 2) are the atoms — points, directions, velocities. Everything is a vector somewhere.

  2. Dot products (Ch. 3) measure alignment — lighting, line-of-sight, collision normals, and the inner product at the heart of every neural network layer.

  3. Matrices (Ch. 4) are transformation machines — they rotate, scale, and translate, and they compose by multiplication.

  4. Linear transformations (Ch. 5) are what matrices mean geometrically — the columns tell you where the basis vectors land.

  5. Systems of equations (Ch. 6) let you solve for unknowns — physics constraints, interpolation, inverse kinematics, model fitting.

  6. Determinants (Ch. 7) measure area and volume, signal invertibility, and produce surface normals via the cross product.

  7. Eigenvalues and eigenvectors (Ch. 8) reveal the skeleton of a transformation — the axes that don't rotate, the directions of maximum variance, the natural frequencies of a system.

  8. The pipeline (this chapter) is where it all runs together — the MVP transform that puts geometry on screen, the forward pass that makes a network predict.

None of this was abstract. Every concept appeared in a real system that developers build and maintain every day. You now have a working map of the territory. When you open a game engine's source, read a paper on neural architecture search, or debug a physics simulation that is behaving strangely, you will know which tool to reach for.

The Further Reading appendix points to resources for going deeper. There is a lot of beautiful mathematics ahead — and you now have the foundation to enjoy it.

References

[^1]: "Coordinate Systems." LearnOpenGL. https://learnopengl.com/Getting-started/Coordinate-Systems

[^2]: "WebGL Model View Projection." MDN Web Docs. https://developer.mozilla.org/en-US/docs/Web/API/WebGL_API/WebGL_model_view_projection

[^3]: "Model View Projection." Jordan Santell. https://jsantell.com/model-view-projection/

[^4]: "Matrix4x4.TRS." Unity Scripting API. https://docs.unity3d.com/ScriptReference/Matrix4x4.TRS.html

[^5]: "GLSL Programming/Unity/Shading in World Space." Wikibooks. https://en.wikibooks.org/wiki/GLSL_Programming/Unity/Shading_in_World_Space

[^6]: "The Math behind Neural Networks — Forward Propagation." Jason Osajima. https://www.jasonosajima.com/forwardprop

[^7]: "Build the Neural Network." PyTorch Tutorials. https://docs.pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html