
Sun, 12 Mar 2017

What's a Variable, Exactly?



When you learn programming, you typically first learn about basic expressions like 2 * 21, and then move on to control structures or variables. (If you start with functional programming, it may take a bit longer to get to variables.)

So, every programmer knows what a variable is, right?

Turns out, it might not be that easy.

Some people like to say that in Ruby, everything is an object. Well, a variable isn't really an object, and the same holds true for other languages.

But let's start from the bottom up. In a low-level programming language like C, a local variable is a name with a type attached that the compiler knows about. When the compiler generates code for the function that contains the variable, the name resolves to an address on the stack (unless the compiler optimizes the variable away entirely, or keeps it in a CPU register).

So in C, the variable only exists as such while the compiler is running. When the compiler is finished and the resulting executable runs, there might be some stack offset or memory location that corresponds to our understanding of the variable. (There might also be debugging symbols that allow some mapping back to the variable name, but that's really a special case.)

In the case of recursion, a local variable can exist once for each active call of the function.
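To see that each active call gets its own instance, here is a small Perl 6 sketch (the names depth and @log are invented for this illustration). Every recursive call stores a different value in its own $local, and each outer call still sees its own value after the inner calls have returned:

```raku
my @log;

sub depth(Int $n) {
    my $local = $n * 10;                   # a fresh $local for every active call
    depth($n - 1) if $n > 0;               # the nested calls get their own $local
    @log.push("depth $n has local $local");   # still this call's value
}

depth(2);
.say for @log;
# depth 0 has local 0
# depth 1 has local 10
# depth 2 has local 20
```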

Closures

In programming languages with closures, local variables can be referenced from inner functions. Such variables generally can't live on the stack, because the reference from the inner function keeps them alive after the outer function has returned. Consider this piece of Perl 6 code (though we could write the same in JavaScript, Ruby, Perl 5, Python or most other dynamic languages):

sub outer() {
    my $x = 42;
    return sub inner() {
        say $x;
    }
}

my &callback = outer();
callback();

The outer function has a local (lexical) variable $x, and the inner function uses it. So even after outer has finished running, there's still an indirect reference to the value stored in this variable.

They say you can solve any problem in computer science through another layer of indirection, and that's true for the implementation of closures. The &callback variable, which points to a closure, actually stores two pointers under the hood. One goes to the static byte code representation of the code, and the other goes to a run-time data structure called a lexical pad, or lexpad for short. Each time you invoke the outer function, a new instance of the lexpad is created, and the closure points to that new instance and to the same static code as always.
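One consequence is directly observable: two closures created by separate calls to the same function capture separate lexpads, and therefore separate copies of the variable. A minimal Perl 6 sketch (make-counter and the counter names are made up for this example):

```raku
sub make-counter() {
    my $count = 0;                 # lives in this call's lexpad
    return sub () { ++$count };    # the closure captures that lexpad
}

my &first-counter  = make-counter();   # first call: first lexpad
my &second-counter = make-counter();   # second call: a separate lexpad

first-counter() for 1..3;              # bumps only the first $count
my $from-first  = first-counter();
my $from-second = second-counter();
say "$from-first $from-second";        # 4 1
```

Both closures share the same static code; only the lexpad they point to differs.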

But even in dynamic languages with closures, variables themselves don't need to be objects. If a language forbids the creation of variables at run time, the compiler knows what variables exist in each scope, and can for example map each of them to an array index, so the lexpad becomes a compact array, and an access to a variable becomes an indexing operation into that array. Lexpads generally live on the heap, and are garbage collected (or reference counted) just like other objects.
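As a toy model of that scheme, here it is written in Perl 6 for convenience. The %slot map and the new-pad sub are hypothetical and not how Rakudo or any other implementation actually structures its lexpads. The compile-time knowledge becomes a name-to-index map, and the run-time lexpad becomes a plain array:

```raku
# Compile time: every variable declared in the scope gets a fixed slot.
# (A real compiler would emit the resulting indexes directly.)
my %slot = '$x' => 0, '$y' => 1;

# Run time: a call allocates one array for all of its variables ...
sub new-pad() { [Any xx %slot.elems] }
my @pad = new-pad();

# ... and each variable access is just an index into that array.
@pad[%slot<$x>] = 42;                    # my $x = 42;
@pad[%slot<$y>] = @pad[%slot<$x>] * 2;   # my $y = $x * 2;
say @pad;                                # [42 84]
```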

Lexpads are mostly a performance optimization. You could have a separate run-time representation of each variable, but then you'd need one allocation per variable in each function call you perform, which is generally much slower than a single allocation for the whole lexpad.

The Plot Thickens

To summarize, a variable has a name, a scope, and in languages that support it, a type. Those are properties known to the compiler, but not necessarily present at run time. At run time, a variable typically resolves to a stack offset in low-level languages, or to an index into a lexpad in dynamic languages.

Even in languages that boldly claim that "everything is an object", a variable often isn't one. The value inside a variable may be an object, but the variable itself typically isn't.

Perl 6 Intricacies

The things I've written above generalize pretty neatly to many programming languages. I am a Perl 6 developer, so I have some insight into how Perl 6 implements variables. If you don't resist, I'll share it with you :-).

Variables in Perl 6 typically come with one more level of indirection, which we call a container. This allows two types of write operations: assignment, which stores a value inside a container (which in turn might be referenced by a variable), and binding, which places either a value or a container directly into a variable.

Here's an example of assignment and binding in action:

my $x;
my $y;
# assignment:
$x = 42;
$y = 'a string';

say $x;     # => 42
say $y;     # => a string

# binding:
$x := $y;

# now $x and $y point to the same container, so that assigning to one
# changes the other:
$y = 21;
say $x;     # => 21

Why, I hear you cry?

There are three major reasons.

The first is that it makes assignment nothing special. For example, in Python, if you assign to anything other than a plain variable, the compiler translates it into a special method call (obj.attr = x into setattr(obj, 'attr', x), obj[idx] = x into a __setitem__ call, etc.). In Perl 6, if you want to implement something you can assign to, you simply return a container from that expression, and then assignment works naturally.

For example, an array is basically just a list in which the elements are containers. This makes @array[$index] = $value work without any special cases, and allows you to assign to the return value of methods, functions, or anything else you can think of, as long as the expression returns a container.
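Two common ways to hand out a container from your own code are an is rw routine and an is rw attribute. In this Perl 6 sketch, the names %config, setting and Point are invented for illustration; in both cases, plain assignment does the work that a setter method would do elsewhere:

```raku
my %config = debug => False;

# "is rw" makes the sub return the hash element's container, not a copy.
sub setting(Str $name) is rw { %config{$name} }

setting('debug') = True;          # ordinary assignment, no setter method
say %config<debug>;               # True

# An "is rw" attribute gets an accessor method that returns its container.
class Point { has $.x is rw; has $.y is rw }
my $p = Point.new(x => 1, y => 2);
$p.x = 10;                        # assigns through the returned container
say $p.x;                         # 10
```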

The second reason for having both binding and assignment is that it makes it pretty easy to make things read-only. If you bind a non-container into a variable, you can't assign to it anymore:

my $a := 42;
$a = "hordor";  # => Cannot assign to an immutable value

Perl 6 uses this mechanism to make function parameters read-only by default.
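The default can be relaxed per parameter. In this Perl 6 sketch (the sub names are made up), is copy gives the sub a fresh container of its own, while is rw binds the caller's container, so the change is visible outside. Without either trait, assigning to $n inside the sub would die:

```raku
sub bump-copy($n is copy)   { $n += 1; $n }   # modifies a private copy
sub bump-in-place($n is rw) { $n += 1 }       # modifies the caller's variable

my $value  = 10;
my $result = bump-copy($value);
say "$result $value";    # 11 10
bump-in-place($value);
say $value;              # 11
```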

Likewise, returning from a function or method by default strips the container, which avoids accidental action-at-a-distance (though an is rw annotation can prevent that, if you really want it).
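A small Perl 6 sketch of that difference (plain, writable and $counter are illustrative names). The sub without is rw hands back the bare value, so assigning to its result fails, while the is rw sub hands back the container of $counter itself:

```raku
my $counter = 0;
sub plain()          { $counter }   # default: container stripped on return
sub writable() is rw { $counter }   # is rw: the container is returned as-is

writable() = 5;        # assigns through to $counter
say $counter;          # 5

try { plain() = 6 }    # dies: plain() returned the bare value 5
say 'plain() is not assignable' if $!;
say $counter;          # 5
```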

This automatic stripping of containers also makes expressions like $a + 2 work, independently of whether $a holds an integer directly, or a container that holds an integer. (In the implementation of Perl 6's core types, sometimes this has to be done manually. If you ever wondered what nqp::decont does in Rakudo's source code, that's what).
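From Perl 6 code, you can look behind the scenes with the .VAR method, which returns the container if there is one and the plain value otherwise (the variable names below are arbitrary). Arithmetic doesn't care either way:

```raku
my $in-container = 40;   # assignment: a Scalar container holding 40
my $bound := 40;         # binding: the variable holds 40 directly

say $in-container + 2;          # 42
say $bound + 2;                 # 42
say $in-container.VAR.^name;    # Scalar
say $bound.VAR.^name;           # Int
```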

The third reason relates to types.

Perl 6 supports gradual typing, which means you can optionally annotate your variables (and other things) with types, and Perl 6 enforces them for you. It detects type errors at compile time where possible, and falls back to checking types at run time.
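A minimal Perl 6 sketch of the run-time fallback (double and $input are hypothetical names). Because $input carries no type annotation, the compiler can't know it will hold a Str, so the mismatch only surfaces when the call is made. A call with a literal, such as double('21'), is typically already reported at compile time by current Rakudo versions:

```raku
sub double(Int $n) { $n * 2 }

say double(21);              # 42

# $input has no type annotation, so only the run-time check
# in the call to double can reject the Str it holds.
my $input = '21';
my $ok = try { double($input); True };
say $ok ?? 'accepted' !! 'rejected at run time';   # rejected at run time
```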

The type of a variable only applies to binding, but the variable passes this type on to its default container, and the container's type is enforced at run time on assignment. You can observe this difference by binding a container with a different constraint:

my Any $x;
my Int $i;
$x := $i;
$x = "foo";     # => Type check failed in assignment to $i; expected Int but got Str ("foo")

Int is a subtype of Any, which is why the binding of $i to $x succeeds. Now $x and $i share a container that is type-constrained to Int, so assigning a string to it fails.

Did you notice how the error message mentions $i as the variable name, even though we tried to assign to $x? The variable name in the error message is really a heuristic, which works often enough but sometimes fails. The container that's shared between $x and $i has no idea which variable you used to access it; it only knows the name of the variable that created it, here $i.

Binding checks the variable type, not the container type, so this code doesn't complain:

my Any $x;
my Int $i;
$x := $i;
$x := "a string";
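The flip side is that binding is checked against the variable's own type. In this Perl 6 sketch ($text and $bound-ok are made-up names), the same Str binds happily to the Any variable, but binding it to the Int variable fails at run time:

```raku
my Any $x;
my Int $i;
my $text = 'a string';

$x := $text;             # fine: a Str is an Any

my $bound-ok = try { $i := $text; True };
say $bound-ok ?? 'bound' !! 'binding rejected';   # binding rejected
```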

This distinction between variable type and container type might seem weird for scalar variables, but it really starts to make sense for arrays, hashes and other compound data structures that might want to enforce a type constraint on their elements:

sub f($x) {
    $x[0] = 7;
}
my Str @s;
f(@s);

This code declares an array whose elements must all be of type Str (or subtypes thereof). When you pass it to a function, that function has no compile-time knowledge of the type. But since $x[0] returns a container with type constraint Str, assigning an integer to it produces the type-check error you'd expect.
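A Perl 6 sketch that runs the example to completion (g is a hypothetical helper added for contrast). The code of f and g is identical apart from the value assigned, and only the element container of @s decides which assignment is accepted:

```raku
sub f($x) { $x[0] = 7 }          # f has no idea what @s may contain
sub g($x) { $x[0] = 'seven' }    # hypothetical helper that assigns a Str

my Str @s;
g(@s);                           # fine: 'seven' is a Str
my $ok = try { f(@s); True };    # the element container rejects the Int 7
say $ok ?? 'assigned' !! 'type check failed';   # type check failed
say @s;                          # [seven]
```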

Summary

Variables typically only exist as objects at compile time. At run time, they are just memory locations, either on the stack or in a lexical pad.

Perl 6 makes the understanding of the exact nature of variables a bit more involved by introducing a layer of containers between variables and values. This offers great flexibility when writing libraries that behave like built-in classes, but it comes with the burden of additional complexity.
