
Chapter 2 · Lesson 2.1

Values and expressions

However large a GPT eventually becomes, it is still built from values produced by operations. This lesson gives those values enough memory to form a graph.

A computation is not just the final number it returns. It is a trail of values produced by operations. If each value remembers how it was created, ordinary code becomes a computational graph.

We are starting with the smallest useful unit in this deep dive: a number, an operation, and the new value that operation creates.

Ordinary numbers are forgetful

Start with plain Python:

a = 2.0
b = -3.0
 
c = a * b
d = a + b
e = c + d
 
print(e)

Plain Python gives the correct answer, -7.0, but no memory of how that answer was produced. The final float does not know that it came from c + d, that c came from a * b, or that d came from a + b.
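A quick check makes that forgetfulness concrete. The sketch below uses only built-in Python. The attribute names _op and _prev are the ones this lesson attaches to Value objects; they appear here only to show that a plain float has nothing of the kind.

```python
a = 2.0
b = -3.0

c = a * b
d = a + b
e = c + d

# The result is an ordinary float, indistinguishable from a literal -7.0.
print(type(e).__name__)     # float
print(e == -7.0)            # True

# A float carries no record of the operation or operands that produced it.
print(hasattr(e, "_op"))    # False
print(hasattr(e, "_prev"))  # False
```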

For Python's float, the computation is over. That is fine if all we want is the final value. It is not enough if we want to ask deeper questions later, like:

  • which earlier values influenced this output?
  • which operation produced this intermediate value?
  • what structure did the forward pass create?

For those questions, the final number is too thin. We need values that remember where they came from.

A Value Is a Number With a Little Memory

Now replace those floats with Value objects:

a = Value(2.0)
b = Value(-3.0)

Value is a small custom object, not a built-in number. It stores the scalar in .data and can also carry fields like ._op and ._prev. That is why we can ask it for more than just its numeric result.

At first, Value is intentionally simple. It wraps a scalar number, so a.data is 2.0 and b.data is -3.0.

The point is not that Value is a better way to write an ordinary number. For ordinary arithmetic, it is worse: heavier, slower, and more cumbersome. What it offers here is room to carry bookkeeping next to the number.

For this lesson, the important fields are:

  • data: the actual scalar value
  • _op: the operation that created this value
  • _prev: the parent values this value depends on

For an input value like a, there is no operation yet because it did not come from another calculation inside our graph. It is a starting point. So conceptually:

a
data: 2.0
_op: ""
_prev: {}
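A minimal sketch of such an object might look like the following. It is an illustration, not the lesson's actual class: the constructor parameters _children and _op are assumed names, and storing the parents in a Python set is one possible choice.

```python
class Value:
    """A scalar plus bookkeeping about where it came from (illustrative sketch)."""

    def __init__(self, data, _children=(), _op=""):
        self.data = data              # the actual scalar
        self._op = _op                # operation that created this value ("" for inputs)
        self._prev = set(_children)   # parent values (empty for inputs)

    def __repr__(self):
        return f"Value(data={self.data})"


a = Value(2.0)
b = Value(-3.0)

# An input value has data but no operation and no parents.
print(a.data, repr(a._op), a._prev)   # 2.0 '' set()
print(b)                              # Value(data=-3.0)
```

An empty Python set prints as set(), which corresponds to the empty {} in the conceptual display above.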

The extra fields matter only once values start combining.

Operations Produce New Values

Now combine two values:

c = a * b

The result is not a raw float but another Value. Its scalar value is -6.0, exactly what ordinary multiplication would produce.

But c also remembers how it was created:

c
data: -6.0
_op: "*"
_prev: {a, b}

So c = a * b does two things at once: it computes the forward value, 2.0 * -3.0 = -6.0, and it records that c depends on a and b through multiplication. This is the beginning of the graph.
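One way multiplication could do both jobs is to overload Python's * operator. This is a sketch under assumptions: the constructor signature (with _children and _op) is hypothetical, and only Value-times-Value is handled.

```python
class Value:
    def __init__(self, data, _children=(), _op=""):
        self.data = data
        self._op = _op
        self._prev = set(_children)

    def __mul__(self, other):
        # Job 1: compute the forward value.
        # Job 2: record the parents and the operation on the new Value.
        return Value(self.data * other.data, (self, other), "*")


a = Value(2.0)
b = Value(-3.0)
c = a * b

print(c.data)             # -6.0
print(c._op)              # *
print(c._prev == {a, b})  # True
```

The inputs a and b are untouched. All of the new bookkeeping lives on c.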

Expressions Build Graphs One Operation at a Time

Let us build the full expression:

a = Value(2.0)
b = Value(-3.0)
 
c = a * b
d = a + b
e = c + d

Read it slowly as graph construction. c = a * b creates a new value c with data -6.0, operation *, and parents a and b. d = a + b creates d with data -1.0, operation +, and the same parents. e = c + d creates e with data -7.0, operation +, and parents c and d.

By the time the code finishes, we have more than an answer: we have a record of how the answer was produced.

e
data: -7.0
_op: "+"
_prev: {c, d}

And because c and d also remember their parents, we can walk backward from e and recover the whole structure.
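That backward walk can be sketched as a small recursive helper. The function name trace and the names dictionary are illustrative assumptions, and the Value class is the same hypothetical minimal version extended with addition.

```python
class Value:
    def __init__(self, data, _children=(), _op=""):
        self.data = data
        self._op = _op
        self._prev = set(_children)

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), "+")

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), "*")


def trace(root):
    """Collect every value reachable from root, plus parent -> child edges."""
    nodes, edges = set(), set()

    def visit(v):
        if v in nodes:
            return                     # already explored this value
        nodes.add(v)
        for parent in v._prev:
            edges.add((parent, v))     # record the dependency
            visit(parent)              # keep walking backward

    visit(root)
    return nodes, edges


a = Value(2.0)
b = Value(-3.0)
c = a * b
d = a + b
e = c + d

# Human-readable names, only for printing.
names = {a: "a", b: "b", c: "c", d: "d", e: "e"}
nodes, edges = trace(e)

print(sorted(names[n] for n in nodes))
# ['a', 'b', 'c', 'd', 'e']
print(sorted(f"{names[p]} -> {names[v]}" for p, v in edges))
# ['a -> c', 'a -> d', 'b -> c', 'b -> d', 'c -> e', 'd -> e']
```

Starting from e alone, the walk finds all five values and all six dependencies.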

If you change the values of a and b, the numeric data changes but the dependency pattern does not. The same values still depend on the same parents.
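To see this, the same expression can be built twice with different inputs and its shape compared. The helpers build and signature are hypothetical names introduced for this sketch. signature summarizes only operations and nesting; it deliberately ignores the data and does not distinguish one input from another.

```python
class Value:
    def __init__(self, data, _children=(), _op=""):
        self.data = data
        self._op = _op
        self._prev = set(_children)

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), "+")

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), "*")


def build(a_data, b_data):
    """Construct e = a * b + (a + b) from two input numbers."""
    a = Value(a_data)
    b = Value(b_data)
    c = a * b
    d = a + b
    return c + d


def signature(v):
    """Describe the dependency pattern as text, ignoring the data."""
    if not v._prev:
        return "input"
    parents = sorted(signature(p) for p in v._prev)
    return f"{v._op}({', '.join(parents)})"


e1 = build(2.0, -3.0)
e2 = build(5.0, 0.5)

print(e1.data, e2.data)                # -7.0 8.0
print(signature(e1))                   # +(*(input, input), +(input, input))
print(signature(e1) == signature(e2))  # True
```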

Expressions Are Syntax; Graphs Are Structure

The expression:

e = a * b + (a + b)

and the staged version:

c = a * b
d = a + b
e = c + d

describe the same computation. The staged version is easier to inspect because every intermediate value has a name, but the underlying structure is the same: operations produce values, and those values point back to their parents.
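A short check of that claim: in the one-line form the intermediate values have no names, but they still exist and are reachable through the result's parents. As before, this Value class is a hypothetical minimal version, not the lesson's own code.

```python
class Value:
    def __init__(self, data, _children=(), _op=""):
        self.data = data
        self._op = _op
        self._prev = set(_children)

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), "+")

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), "*")


a = Value(2.0)
b = Value(-3.0)

# One-line form: the product and the inner sum are never given names.
e = a * b + (a + b)

print(e.data, e._op)   # -7.0 +

# The unnamed intermediates are still there as e's parents.
for p in sorted(e._prev, key=lambda v: v._op):
    print(p._op, p.data, p._prev == {a, b})
# * -6.0 True
# + -1.0 True
```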

The code is how we write the computation. The graph is the structure the computation creates. As expressions get larger, that structure becomes easier to reason about in graph form than in one long line of code.

Tiny checkpoint

Consider this code:

x = Value(4.0)
y = Value(10.0)
z = x + y
q = z * x

Answer these before reading the answers below:

  1. Which line creates z?
  2. What is z.data?
  3. What is z._op?
  4. What is in z._prev?
  5. Which line creates q?
  6. What is in q._prev?
  7. Sketch or describe the direct parents of q before checking the answer.
Answers:
  1. z = x + y
  2. 14.0
  3. "+"
  4. {x, y}
  5. q = z * x
  6. {z, x}
  7. q points directly to z and x.

The important part is question 6. q depends on z and x directly. Since z also depends on x, the original value x influences q in more than one way. Do not compute that influence yet. Just notice the structure.
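The structure can be noticed in code without computing any influence. The sketch below only counts the distinct routes from q back to an input. The helper count_paths is a hypothetical name, and the Value class is the same assumed minimal version.

```python
class Value:
    def __init__(self, data, _children=(), _op=""):
        self.data = data
        self._op = _op
        self._prev = set(_children)

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), "+")

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), "*")


def count_paths(v, target):
    """Count the distinct routes from v back to target through _prev."""
    if v is target:
        return 1
    return sum(count_paths(parent, target) for parent in v._prev)


x = Value(4.0)
y = Value(10.0)
z = x + y
q = z * x

print(q.data)             # 56.0
print(q._prev == {z, x})  # True
print(count_paths(q, x))  # 2  (directly, and through z)
print(count_paths(q, y))  # 1  (only through z)
```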