Data Manipulation

In order to get anything done, we need some way to store and manipulate data. Generally, there are two important things we need to do with data: (i) acquire them; and (ii) process them once they are inside the computer. There is no point in acquiring data without some way to store it, so to start, let's get our hands dirty with $n$-dimensional arrays, which we also call tensors.

If you already know the NumPy scientific computing package, this will be a breeze. For all modern deep learning frameworks, the tensor class (`ndarray` in MXNet, `Tensor` in PyTorch and TensorFlow) resembles NumPy's `ndarray`, with a few killer features added. First, the tensor class supports automatic differentiation. Second, it leverages GPUs to accelerate numerical computation, whereas NumPy only runs on CPUs. These properties make neural networks both easy to code and fast to run.

Getting Started

To start, we import the `np` (`numpy`) and `npx` (`numpy_extension`) modules from MXNet. Here, the `np` module includes functions supported by NumPy, while the `npx` module contains a set of extensions developed to empower deep learning within a NumPy-like environment. When using tensors, we almost always invoke the `set_np` function: this is for compatibility of tensor processing by other components of MXNet.

To start, we import the PyTorch library. Note that the package name is `torch`.

To start, we import `tensorflow`. For brevity, practitioners often assign the alias `tf`.

For JAX, we import `jax` and its NumPy-like `jax.numpy` module, which is commonly aliased as `jnp`.

from mxnet import np, npx     # MXNet
npx.set_np()
import torch                  # PyTorch
import tensorflow as tf       # TensorFlow
import jax                    # JAX
from jax import numpy as jnp

A tensor represents a (possibly multidimensional) array of numerical values. In the one-dimensional case, i.e., when only one axis is needed for the data, a tensor is called a vector. With two axes, a tensor is called a matrix. With $k > 2$ axes, we drop the specialized names and just refer to the object as a $k^\textrm{th}$-order tensor.

MXNet, PyTorch, and TensorFlow provide a variety of functions for creating new tensors prepopulated with values. For example, by invoking `arange(n)`, we can create a vector of evenly spaced values, starting at 0 (included) and ending at `n` (not included). By default, the step size is $1$. Unless otherwise specified, new tensors are stored in main memory and designated for CPU-based computation.

# MXNet
x = np.arange(12)
x
# PyTorch
x = torch.arange(12, dtype=torch.float32)
x
# TensorFlow
x = tf.range(12, dtype=tf.float32)
x
# JAX
x = jnp.arange(12)
x

Each of these values is called an element of the tensor. The tensor `x` contains 12 elements. We can inspect the total number of elements in a tensor via its `size` attribute (MXNet), `numel` method (PyTorch), or `tf.size` function (TensorFlow).

x.size      # MXNet
x.numel()   # PyTorch
tf.size(x)  # TensorFlow

We can access a tensor's shape (the length along each axis) by inspecting its `shape` attribute. Because we are dealing with a vector here, the `shape` contains just a single element and is identical to the size.

x.shape

We can change the shape of a tensor without altering its size or values, by invoking `reshape`. For example, we can transform our vector `x`, whose shape is $(12,)$, to a matrix `X` with shape $(3, 4)$. This new tensor retains all elements but reconfigures them into a matrix. Notice that the elements of our vector are laid out one row at a time and thus `x[3] == X[0, 3]`.

X = x.reshape(3, 4)
X

Note that specifying every shape component to `reshape` is redundant. Because we already know our tensor's size, we can work out one component of the shape given the rest. For example, given a tensor of size $n$ and target shape $(h, w)$, we know that $w = n/h$. To automatically infer one component of the shape, we can place a $-1$ for the shape component that should be inferred automatically. In our case, instead of calling `x.reshape(3, 4)`, we could have equivalently called `x.reshape(-1, 4)` or `x.reshape(3, -1)`.
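For instance, in PyTorch both of the following calls produce the same $(3, 4)$ matrix as `x.reshape(3, 4)`:

x.reshape(-1, 4)  # the -1 is inferred as 12 / 4 = 3
x.reshape(3, -1)  # the -1 is inferred as 12 / 3 = 4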

Practitioners often need to work with tensors initialized to contain all 0s or 1s. We can construct a tensor with all elements set to 0 and a shape of $(2, 3, 4)$ via the `zeros` function.

np.zeros((2, 3, 4))    # MXNet
torch.zeros((2, 3, 4)) # PyTorch
tf.zeros((2, 3, 4))    # TensorFlow
jnp.zeros((2, 3, 4))   # JAX

Similarly, we can create a tensor with all 1s by invoking `ones`.
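Mirroring the `zeros` example above, the following calls each create a $(2, 3, 4)$ tensor of 1s:

np.ones((2, 3, 4))     # MXNet
torch.ones((2, 3, 4))  # PyTorch
tf.ones((2, 3, 4))     # TensorFlow
jnp.ones((2, 3, 4))    # JAX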

We often wish to sample each element randomly (and independently) from a given probability distribution. For example, the parameters of neural networks are often initialized randomly. The following snippet creates a tensor with elements drawn from a standard Gaussian (normal) distribution with mean 0 and standard deviation 1.

torch.randn(3, 4)
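The call above is PyTorch-specific; a brief sketch of the analogous calls in the other frameworks (note that JAX requires an explicit PRNG key):

np.random.normal(0, 1, size=(3, 4))               # MXNet
tf.random.normal(shape=[3, 4])                    # TensorFlow
jax.random.normal(jax.random.PRNGKey(0), (3, 4))  # JAX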

Finally, we can construct tensors by supplying the exact values for each element as (possibly nested) Python lists containing numerical literals.

torch.tensor([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])

Indexing and Slicing

As with Python lists, we can access tensor elements by indexing (starting with 0). To access an element based on its position relative to the end of the list, we can use negative indexing. Slicing can be done via `X[start:stop]`, where the returned value includes the first index (`start`) but not the last (`stop`).

X[-1], X[1:3]

Beyond reading elements, we can also write elements of a matrix by specifying indices. Note that `Tensors` in TensorFlow are immutable and cannot be assigned to; `Variables` in TensorFlow are mutable containers that support assignments.

X[1, 2] = 17 # PyTorch / MXNet
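If we want to assign the same value to multiple elements at once, we can index a slice on the left-hand side of an assignment; in TensorFlow we would first wrap the tensor in a `tf.Variable`, as noted above. A minimal sketch:

X[:2, :] = 12           # PyTorch / MXNet: write 12 into every element of the first two rows
X_var = tf.Variable(X)  # TensorFlow: wrap in a mutable Variable
X_var[1, 2].assign(17)  # then assign through the Variable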

Operations

Among the most useful operations are elementwise operations. In mathematical notation, we denote unary scalar operators by the signature $f: \mathbb{R} \rightarrow \mathbb{R}$. Most standard operators, including $e^x$, can be applied elementwise.

torch.exp(x)

Likewise, we denote binary scalar operators via the signature $f: \mathbb{R}, \mathbb{R} \rightarrow \mathbb{R}$. Given any two vectors $\mathbf{u}$ and $\mathbf{v}$ of the same shape, we can produce a vector $\mathbf{c} = F(\mathbf{u}, \mathbf{v})$ by setting $c_i \gets f(u_i, v_i)$ for all $i$.

x = torch.tensor([1.0, 2, 4, 8])
y = torch.tensor([2, 2, 2, 2])
x + y, x - y, x * y, x / y, x ** y

We can also concatenate multiple tensors, stacking them end-to-end to form a larger one. Below, we define a second matrix `Y` with the same shape as `X` and concatenate along rows (axis 0) and along columns (axis 1); the results have shapes $(6, 4)$ and $(3, 8)$, respectively.

Y = torch.tensor([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])  # same shape as X
torch.cat((X, Y), dim=0), torch.cat((X, Y), dim=1)
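Sometimes we want to construct a binary tensor via logical statements. Taking `X == Y` as an example, for each position the corresponding entry is `True` if `X` and `Y` are equal there and `False` otherwise (Exercise 1 below builds on this comparison):

X == Y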

Summing all the elements in the tensor yields a tensor with only one element.

X.sum()

Broadcasting

Even when shapes differ, we can still perform elementwise binary operations by invoking the broadcasting mechanism. Broadcasting works according to the following two-step procedure: (i) expand one or both arrays by copying elements along axes with length 1 so that the two tensors have the same shape; (ii) perform an elementwise operation on the resulting arrays. In the example below, `a` (shape $(3, 1)$) and `b` (shape $(1, 2)$) are both broadcast to shape $(3, 2)$ before the elementwise addition.

a = torch.arange(3).reshape((3, 1))
b = torch.arange(2).reshape((1, 2))
a + b

Saving Memory

Running operations can cause new memory to be allocated. We can demonstrate this with Python's `id()` function.

before = id(Y)
Y = Y + X
id(Y) == before # False

We can perform updates in place using slice notation: `Y[:] = <expression>` or `Y += X`.

Z = torch.zeros_like(Y)
before = id(Z)
Z[:] = X + Y
id(Z) == before  # True: the result was written into Z's existing memory

Conversion to Other Python Objects

Converting to a NumPy tensor (`ndarray`), or vice versa, is easy. In PyTorch, the converted result shares the underlying memory with the original when the tensor lives on the CPU, so changing one through an in-place operation also changes the other.

A = X.numpy()            # torch.Tensor -> numpy.ndarray
B = torch.from_numpy(A)  # numpy.ndarray -> torch.Tensor

To convert a size-1 tensor to a Python scalar, we can invoke the `item` function.

a = torch.tensor([3.5])
a.item(), float(a), int(a)

Summary

The tensor class is the main interface for storing and manipulating data in deep learning libraries. Tensors provide a variety of functionalities including construction routines; indexing and slicing; basic mathematics operations; broadcasting; memory-efficient assignment; and conversion to and from other Python objects.

Exercises

1. Run the code in this section. Change the conditional statement `X == Y` to `X < Y` or `X > Y`.
2. Replace the two tensors in the broadcasting mechanism with 3-dimensional tensors.

[Discussions](https://discuss.d2l.ai/t/27) [Notebook](https://colab.research.google.com/github/d2l-ai/d2l-pytorch-colab/blob/master/chapter_preliminaries/ndarray.ipynb)
