Environment variables vs shell variables

In the previous post, I introduced the idea that strings are the primary data structure in command-line shells, and that shells simply substitute a variable with its assigned value wherever it is used. Simple enough. However, most people who are just trying to configure their shells for working with different programs have more experience with environment variables. Environment variables are declared with the export keyword in POSIX-compliant shells, typically like

export some_variable="some value"

or the equivalent

# this line can also go after the assignment
export some_variable
some_variable="some value"

As a reminder, "regular" shell variables are declared without the export keyword. As we'll see, while they have similar syntax in the shell, shell variables and environment variables are fundamentally different things.

Environment variables can be used similarly to shell variables within a running shell instance: uses of them are substituted with the string they are assigned. You cannot tell by looking at a referenced variable whether it is a shell variable or an environment variable:

shell_variable="this is a shell variable"
export env_variable="this is an environment variable"

echo "$shell_variable, and $env_variable"
# prints:
# this is a shell variable, and this is an environment variable

However, there are two key differences between environment variables and shell variables.

Environment variables are an operating system feature, not a feature specific to shells.
Environment variables are copied and shared when one program (the "parent" process) executes another program (the "child" process).

Environment variables are an operating system feature

It's a common mistake to think of environment variables as something controlled by the shell. In fact, in Unix-like operating systems, any program can set and get environment variables, not just shells. Your favorite programming language almost certainly has a way in its standard library to set and get environment variables for the currently running program. For instance, the module is named os in Python and Go, while Node.js exposes the environment through process.env. Here's a small program in Go that sets and gets an environment variable:

package main

import "os"

func main() {
	os.Setenv("env_var_name", "some value")
	envVar := os.Getenv("env_var_name")
	println(envVar) // will print 'some value'
}

We might think environment variables have something to do with shells because we typically run programs from our shell (like go run env_var.go), but there is no inherent relationship. This program in Go could be compiled and run by a scheduler (like cron) or an init system (like launchd or systemd) with no shell, and it would work the same.
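To make the "no shell" point concrete, here is a minimal Python sketch. It sets a variable in its own environment and then starts a child process directly, passing an argument list rather than a shell command line. The variable name greeting and the helper run_child_without_shell are made up for illustration.

```python
import os
import subprocess
import sys

# Hypothetical one-line child program: print the value of "greeting".
CHILD_CODE = "import os; print(os.getenv('greeting'))"


def run_child_without_shell():
    """Start a child Python process directly and return what it printed."""
    # Set the variable in this process's own environment.
    os.environ["greeting"] = "hello from the parent"
    # An argument list (with the default shell=False) means no shell is
    # started; the child still inherits this process's environment.
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_child_without_shell())
```

At no point does a shell run here: one Python process sets the variable, and another Python process reads it.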

Environment variables are shared between parent and child processes

Environment variables defined in the parent program (like a shell) will be available to child programs executed by that program.¹

This simple Python program (in a file named env_var.py) prints the value of the variable foo, but does not define it.

import os

# str(..) converts None to a string in case the environment
# variable is not defined
print("foo is '" + str(os.getenv("foo")) + "'")

If we were to run env_var.py in a shell without defining the environment variable foo, this is what we see:

$ python env_var.py
foo is 'None'

If we define the shell variable foo, the value will not be available to the python child process because this is not an environment variable. However, it is available for substitution for other commands within the same shell:

$ foo="magick"
$ python env_var.py
foo is 'None'
$ echo "I see foo as '$foo'"
I see foo as 'magick'

("But isn't this showing that $foo is available to echo?" Nope. The shell process is the one that replaces $foo with the value of foo in the string, not echo.)
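One way to convince yourself of this is to take the shell out of the picture. The sketch below (Python, with a hypothetical helper name) starts a child process directly and hands it the text $foo as an argument. With no shell to perform the substitution, the child receives the literal characters.

```python
import subprocess
import sys

# Hypothetical child program: print its first argument exactly as received.
CHILD_CODE = "import sys; print(sys.argv[1])"


def argument_seen_by_child(argument):
    """Run a child directly (no shell) and return the argument it received."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, argument],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


if __name__ == "__main__":
    # No shell is involved, so nothing replaces $foo with a value.
    print(argument_seen_by_child("$foo"))
```

The child prints $foo unchanged, which suggests that substitution is something the shell does before the command ever starts.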

Now we export foo, making it an environment variable, and see that the value is shared with the python child process:

$ export foo="magick"
$ python env_var.py
foo is 'magick'
$ echo "I see foo as '$foo'"
I see foo as 'magick'

In most shells, it is possible to define an environment variable for a child process at the same time the program is executed by adding the assignment to the beginning of the command. In this mode, no export is needed, and the variable is not defined in your current shell; it exists only as an environment variable of the child process, and only for as long as the command runs. This can be useful for one-offs and overrides when executing a command.

$ foo="magick" python env_var.py
foo is 'magick'
$ echo "I see foo as '$foo'"
I see foo as ''
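The same one-off pattern exists outside the shell. As a rough Python equivalent (a sketch; the helper name is made up), subprocess.run accepts an env argument, so a parent can hand a child a modified copy of its environment without changing its own.

```python
import os
import subprocess
import sys

# Hypothetical child program: print the variable named by its first argument.
CHILD_CODE = "import os, sys; print(os.getenv(sys.argv[1]))"


def run_with_one_off_variable(name, value):
    """Run a child with one extra environment variable; return its output."""
    # Copy the parent's environment and add the variable to the copy only.
    child_env = dict(os.environ)
    child_env[name] = value
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, name],
        env=child_env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    os.environ.pop("foo", None)  # make sure foo starts out unset in the parent
    print("child sees:", run_with_one_off_variable("foo", "magick"))
    print("parent sees:", os.getenv("foo"))
```

Because only the copy is modified, the parent still has no foo after the child exits, mirroring the shell session above.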

Setting an environment variable in the child does not affect the parent. If the parent process assigns a value to the variable and the child process re-assigns it, the variable in the parent will keep the same value. If we change the python program to modify the environment variable, we see that it is unchanged after the program exits and returns to the shell.

import os

if os.getenv("foo") is not None:
	# python uses the os.environ dictionary to set environment variables
	os.environ["foo"] = "deep " + str(os.getenv("foo"))

print("foo is '" + str(os.getenv("foo")) +"'")

Running the program does not change the value of the variable afterwards.

$ export foo="magick"
$ python env_var.py
foo is 'deep magick'
$ echo "I see foo as '$foo'"
I see foo as 'magick'
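The same isolation can be checked with Python as the parent instead of the shell. In this sketch (the helper name is hypothetical), the child prepends "deep " to foo in its own environment, and the parent reports its own value after the child has exited.

```python
import os
import subprocess
import sys

# Child: modify foo in its own environment, then print the new value.
CHILD_CODE = (
    "import os; "
    "os.environ['foo'] = 'deep ' + os.environ['foo']; "
    "print(os.environ['foo'])"
)


def child_and_parent_values(value):
    """Return (value the child ended with, value the parent still has)."""
    os.environ["foo"] = value
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip(), os.environ["foo"]


if __name__ == "__main__":
    print(child_and_parent_values("magick"))
```

The child works on its own copy of the environment, so its change never travels back to the parent.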

Environment variables are not secure

A quick but important digression.

It's not uncommon to encounter programs that instruct you to provide passwords, API keys, and other sensitive configuration data by setting environment variables in the shell. However, environment variables are not secure. First, environment variables have no access controls; once they have been set in a shell, all child processes (including a child-of-a-child process) will have access to them. This means that if the software you're providing the password to has a security flaw that lets an attacker make it execute a child program, the malicious code can obtain passwords set in environment variables.

Secondly, in many Unix-like operating systems, environment variables are available to the operating system and may be exposed through the file system. You can see the environment variables defined for all of your running processes by running the command ps eww (ww tells ps not to truncate its output to the width of your terminal. Plus, it makes a good mnemonic: "eww! I can see all of my insecure environment variables!"). In GNU/Linux OSes, environment variables are written to the file /proc/${PID}/environ, where ${PID} is the process ID of the process. This is immensely useful for debugging a running program to figure out what environment variables are set for it, but it means anyone with adequate permissions could read secrets stored in environment variables.

There are certain protections to prevent regular users from reading each other's environment variables with ps and the file in /proc, but nothing stops a nosey super-user (or someone who has gained access) from nabbing a password on a shared system.
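On Linux, you can inspect this yourself. The sketch below reads the environ file under /proc for a process you are allowed to inspect and splits it into a dictionary; entries in that file are separated by NUL bytes. The function name is made up, and on systems without /proc (such as macOS) it simply returns None.

```python
import os


def read_process_environment(pid="self"):
    """Return the environment a process was started with, or None."""
    path = f"/proc/{pid}/environ"
    try:
        with open(path, "rb") as handle:
            raw = handle.read()
    except OSError:
        # No /proc on this system, no such process, or permission denied.
        return None
    environment = {}
    for entry in raw.split(b"\0"):
        if b"=" in entry:
            name, _, value = entry.partition(b"=")
            environment[os.fsdecode(name)] = os.fsdecode(value)
    return environment


if __name__ == "__main__":
    env = read_process_environment()
    if env is None:
        print("could not read /proc/self/environ on this system")
    else:
        print(f"{len(env)} variables visible")
```

Passing another process ID instead of "self" works only when you have permission to read that process's file, which is exactly the protection described above.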

Wrap up

Environment variables are one of those things that are simple but for which you're often "too afraid to ask". After understanding that environment variables are not exclusive to shells and how environment variables are shared between parent and child processes, I found that I was a lot less confused when working with environment variables—especially the all-important PATH environment variable, which I'll talk about next time.

¹ This is actually done by a "fork-exec", rather than directly executing the program. A program executing another program will first issue the fork() system call to create a child process that's a copy of itself. This child will then use the exec() system call to replace itself with the program that it's executing.
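The fork-exec described in this footnote can be sketched in Python with os.fork() and os.execve(). This is a Unix-only illustration; the helper name and the variable foo are assumptions for the example, and real programs usually let a library such as subprocess handle these steps.

```python
import os
import sys

# Hypothetical child program: print the value of foo.
CHILD_CODE = "import os; print(os.environ['foo'])"


def fork_exec_with_env(value):
    """Fork, exec a Python child with foo set, and return the child's output."""
    read_end, write_end = os.pipe()
    pid = os.fork()  # the child starts out as a copy of this process
    if pid == 0:
        # Child: route stdout into the pipe, then replace ourselves.
        try:
            os.close(read_end)
            os.dup2(write_end, 1)
            child_env = dict(os.environ, foo=value)
            os.execve(sys.executable, [sys.executable, "-c", CHILD_CODE], child_env)
        finally:
            os._exit(127)  # only reached if exec failed
    # Parent: read what the child printed, then wait for it to exit.
    os.close(write_end)
    with os.fdopen(read_end) as pipe:
        output = pipe.read()
    os.waitpid(pid, 0)
    return output.strip()


if __name__ == "__main__":
    print(fork_exec_with_env("magick"))
```

The environment handed to os.execve() is what the new program starts with, which is why the parent's later changes never reach a child that is already running.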