# Foundations of Machine Learning - Session 01

- *Course*: Foundations of Machine Learning
- *Session*: 01
- *Unit*: Numpy

[Numpy](http://www.numpy.org/) is the core library for scientific computing in Python. It provides a high-performance multidimensional array object, and tools for working with these arrays.

In [1]:
import numpy as np
np.__version__

'1.23.2'

## Arrays

A numpy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers. The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension.

We can initialize numpy arrays from nested Python lists, and access elements using square brackets:

In [2]:
# Create a 1D array from a list
arr = np.array([1,2,3])
arr 

array([1, 2, 3])

In [3]:
# Change a specified element of the 1D-array
arr[0] = 5
arr

array([5, 2, 3])

In [4]:
# Create a 2D-array from a list of lists
arr = np.array([[1,2,3],[4,5,6]])
arr

array([[1, 2, 3],
       [4, 5, 6]])

An array exposes several kinds of information about itself, such as its shape, data type, and number of elements. Note that the data type is inferred automatically unless it is set explicitly.

In [5]:
arr = np.array([[1,2,3],[4,5,6]], dtype="float64")
arr.shape, arr.dtype, arr.size

((2, 3), dtype('float64'), 6)

In [6]:
arr = arr.astype("float64")
arr.shape, arr.dtype, arr.size

((2, 3), dtype('float64'), 6)
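As a minimal sketch of the two ways a data type gets set (the variable names here are illustrative): numpy infers the type from the input values, and `astype` returns a converted copy while leaving the original array untouched.

```python
import numpy as np

# The dtype is inferred from the input values
ints = np.array([1, 2, 3])        # all integers -> an integer dtype
mixed = np.array([1, 2.5, 3])     # one float is enough -> float64

# astype returns a new array; `mixed` keeps its dtype
as_int = mixed.astype("int64")    # fractional parts are truncated

print(ints.dtype.kind, mixed.dtype, as_int)
```

Converting from float to integer discards the fractional part, so `2.5` becomes `2` here.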

Numpy provides many functions to create arrays.

`np.zeros` creates an array of the specified shape filled with all zeros:

In [7]:
np.zeros((2,2))

array([[0., 0.],
       [0., 0.]])

Similarly, `np.ones` creates an array filled with all ones:

In [8]:
np.ones((3,2), dtype="int8")

array([[1, 1],
       [1, 1],
       [1, 1]], dtype=int8)

To fill an array with a value other than 0 or 1, you can use `np.full`, specifying the fill value as the second argument:

In [9]:
np.full((2,2), 1.0)

array([[1., 1.],
       [1., 1.]])
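A short sketch with fill values other than 0 and 1 (the names `sevens` and `halves` are only for illustration). Unless a `dtype` is passed, the data type follows the fill value:

```python
import numpy as np

# A 2x3 array filled with the integer 7
sevens = np.full((2, 3), 7)

# A float fill value gives a float array
halves = np.full((2, 2), 0.5)

print(sevens)
print(halves.dtype)
```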

There are also more specialized array types.

`np.eye` creates an identity matrix ("eye-dentity"):

In [10]:
np.eye(4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

`np.random.random` creates an array with random values:

In [11]:
# Create a 3x3 array filled with random values from [0, 1)
np.random.random((3,3))

array([[0.27637682, 0.78071633, 0.95435774],
       [0.74918979, 0.75600911, 0.92160844],
       [0.47174457, 0.77522067, 0.72754253]])
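Because the values differ on every run, a result like the one above cannot be reproduced exactly. One possible approach, sketched here under the assumption that repeatable numbers are wanted, is to seed a generator created with `np.random.default_rng` (the seed `42` and the variable names are arbitrary choices for illustration):

```python
import numpy as np

# Two generators created with the same seed produce the same values
rng = np.random.default_rng(42)
first = rng.random((3, 3))

rng_again = np.random.default_rng(42)
second = rng_again.random((3, 3))

print(np.array_equal(first, second))  # True
```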

## Arithmetic

Arrays let you apply an operation to many values at once without writing explicit `for` loops. This is called "vectorization". Arithmetic operations between arrays of the same shape are applied element by element, and an operation between an array and a scalar applies the scalar to every element. Python's built-in arithmetic and comparison operators are supported.

In [12]:
arr = np.random.random((3,3))
arr

array([[0.57195973, 0.96688208, 0.82855902],
       [0.26466049, 0.34284802, 0.38153458],
       [0.1217582 , 0.83594922, 0.56010155]])

In [13]:
arr + 5

array([[5.57195973, 5.96688208, 5.82855902],
       [5.26466049, 5.34284802, 5.38153458],
       [5.1217582 , 5.83594922, 5.56010155]])

In [14]:
arr1 = np.random.random((4,4))
arr2 = np.random.random((4,4))
arr1 > arr2

array([[ True, False,  True, False],
       [ True, False,  True, False],
       [ True,  True, False,  True],
       [False,  True, False,  True]])
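To make the idea of vectorization concrete, the following sketch computes the same result twice, once with an explicit loop and once with array arithmetic. The arrays `a` and `b` are made-up example data:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

# Explicit Python loop over the indices
looped = np.empty_like(a)
for i in range(len(a)):
    looped[i] = a[i] * b[i] + 1

# Vectorized: the same computation applied element by element
vectorized = a * b + 1

print(vectorized)  # [11. 41. 91.]
```

Both versions give the same values; the vectorized form is shorter and avoids the per-element overhead of the Python loop.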

## Indexing and Slicing

Indexing is the selection of a subset or individual elements. Numpy offers several ways to index into arrays. In the simplest case, for 1D-arrays, indexing works like in Python lists:

In [15]:
arr = np.array([0,1,2,3,4,5])
arr[3]

3

For multi-dimensional indexing, dimensions are comma-separated:

In [16]:
arr = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
arr

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [17]:
arr[1,2]

7


Similar to Python lists, numpy arrays can be sliced. Since arrays may be multidimensional, you can specify a slice for each dimension of the array, separated by commas; trailing dimensions that are left out are taken in full:

In [18]:
arr = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
arr

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [19]:
arr_s = arr[:, 1:3]  # append .copy() to get an independent copy instead of a view
arr_s

array([[ 2,  3],
       [ 6,  7],
       [10, 11]])

A slice of an array is a view into the same data, so modifying it will modify the original array: `arr_s[0, 0]` is the same as `arr[0, 1]`:

In [20]:
arr_s[0, 0] = 77
arr, arr_s

(array([[ 1, 77,  3,  4],
        [ 5,  6,  7,  8],
        [ 9, 10, 11, 12]]),
 array([[77,  3],
        [ 6,  7],
        [10, 11]]))
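When the original array should stay untouched, the slice can be copied first. The sketch below contrasts the two cases; the names `view` and `copy` are illustrative, and `np.shares_memory` is used only to confirm which array shares data with `arr`:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

view = arr[:, 1:3]          # shares its data with arr
copy = arr[:, 1:3].copy()   # owns independent data

copy[0, 0] = -1             # arr is not affected
print(arr[0, 1])            # 2

view[0, 0] = 77             # writes through to arr
print(arr[0, 1])            # 77

print(np.shares_memory(arr, view), np.shares_memory(arr, copy))  # True False
```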

Numpy also supports boolean indexing:

In [21]:
mask = np.array([
    [True, False, True, False],
    [False, True, False, True],
    [True, False, True, False]
])

arr[mask]

array([ 1,  3,  6,  8,  9, 11])
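In practice a mask is rarely typed out by hand; it is usually the result of a comparison. A minimal sketch, using a fresh example array rather than the modified one above:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# A comparison yields a boolean array of the same shape
mask = arr % 2 == 0
evens = arr[mask]           # selected elements as a 1D array
print(evens)                # [ 2  4  6  8 10 12]

# A boolean index can also be the target of an assignment
arr[arr > 10] = 0
print(arr[2])               # [ 9 10  0  0]
```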

## Reshaping & Transposition

Arrays can be reshaped into different views. We can use `.reshape` to return a 3x3 view of our 9 original values. Note that shapes need to be compatible: reshaping 9 values into a 4x2 view will not work.

In [22]:
arr = np.arange(9)
arr.reshape((3,3))

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
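The compatibility rule can be checked directly. In this sketch the incompatible reshape raises a `ValueError`, and `-1` is used to let numpy infer one dimension from the number of elements (the variable names are illustrative):

```python
import numpy as np

arr = np.arange(9)

message = None
try:
    arr.reshape((4, 2))     # 4 * 2 = 8 slots cannot hold 9 values
except ValueError as err:
    message = str(err)
    print("reshape failed:", message)

# -1 asks numpy to work out the missing dimension: 9 / 3 = 3
inferred = arr.reshape((3, -1))
print(inferred.shape)       # (3, 3)
```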

Arrays can be transposed using the `.transpose` method, which reorders the axes of an array according to the tuple you pass.

In [23]:
np.array([[1,2,3]]), np.array([[1,2,3]]).transpose((1,0))

(array([[1, 2, 3]]),
 array([[1],
        [2],
        [3]]))

Arrays can also be quickly transposed using the `.T` shorthand:

In [24]:
arr.reshape((3,3)).T

array([[0, 3, 6],
       [1, 4, 7],
       [2, 5, 8]])
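The axis tuple matters more once an array has more than two dimensions. As a small sketch with a made-up 3D array, `.transpose((1, 0, 2))` swaps only the first two axes, while `.T` reverses all of them:

```python
import numpy as np

arr = np.arange(24).reshape((2, 3, 4))

# Swap axes 0 and 1, keep axis 2 in place
swapped = arr.transpose((1, 0, 2))
print(swapped.shape)        # (3, 2, 4)

# .T reverses the order of all axes
print(arr.T.shape)          # (4, 3, 2)

# The element at [i, j, k] moves to [j, i, k]
print(swapped[2, 1, 3] == arr[1, 2, 3])  # True
```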

## Universal Functions

Universal functions (ufuncs) are vectorized functions that extend what is available through Python's built-in operators. There are currently more than 60 universal functions defined in numpy, so we will only look at a subset here. For a complete overview, refer to the [numpy documentation](https://numpy.org/doc/stable/reference/ufuncs.html).

In [25]:
arr = np.arange(6).reshape(2,3)
arr

array([[0, 1, 2],
       [3, 4, 5]])

One of the most common functions is `np.sum`. It calculates the sum of all values in an array. You can supply an optional `axis` argument to sum along that axis only; `axis=0` adds up the rows and returns one sum per column.

In [26]:
np.sum(arr)

15

In [27]:
np.sum(arr, axis=0)

array([3, 5, 7])
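For comparison, here is a sketch of the same array summed along each axis in turn. The axis that is named is the one that disappears from the result:

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

col_sums = np.sum(arr, axis=0)     # collapse the rows: one sum per column
row_sums = np.sum(arr, axis=1)     # collapse the columns: one sum per row

print(col_sums)  # [3 5 7]
print(row_sums)  # [ 3 12]
```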

Mathematical operations such as `log` and `exp` and trigonometric functions are supported. Note that since the array contains a 0-value, `log` produces a warning:

In [28]:
np.log(arr)

RuntimeWarning: divide by zero encountered in log
  np.log(arr)

array([[      -inf, 0.        , 0.69314718],
       [1.09861229, 1.38629436, 1.60943791]])

In [29]:
np.exp(arr)

array([[  1.        ,   2.71828183,   7.3890561 ],
       [ 20.08553692,  54.59815003, 148.4131591 ]])

There are also ufuncs that take two arrays. Examples include `minimum` and `maximum`, which compare two arrays element by element (the single-array reductions are named `min` and `max` to avoid confusion), and logical operators:

In [30]:
arr1 = np.random.randint(0,2,5)
arr2 = np.random.randint(0,2,5)
arr1, arr2

(array([0, 0, 0, 0, 0]), array([1, 1, 0, 0, 1]))

In [31]:
np.maximum(arr1, arr2)

array([1, 1, 0, 0, 1])

In [32]:
np.logical_and(arr1, arr2) 

array([False, False, False, False, False])
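The difference between the two-array and single-array forms is easy to mix up, so here is a brief sketch with made-up data: `np.minimum` returns an array, whereas `np.min` reduces one array to a single value.

```python
import numpy as np

a = np.array([3, 1, 4])
b = np.array([2, 7, 5])

pairwise = np.minimum(a, b)            # element-wise minimum of two arrays
smallest = np.min(a)                   # smallest value within one array
both_large = np.logical_and(a > 2, b > 2)

print(pairwise)    # [2 1 4]
print(smallest)    # 1
print(both_large)  # [False False  True]
```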

## Conditional Logic

Numpy provides `np.where` as a vectorized version of `if` and `else`. A call has the form `np.where(condition, data-if-true, data-if-false)`:

In [33]:
np.where(np.arange(6) > 2, 0 , 10)

array([10, 10, 10,  0,  0,  0])

This can be extended to take data from (equally shaped) arrays, and even another array as condition:

In [34]:
arr1 = np.arange(9).reshape((3,3))
arr2 = np.arange(9).reshape((3,3)) + 10

arr1, arr2

(array([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]]),
 array([[10, 11, 12],
        [13, 14, 15],
        [16, 17, 18]]))

In [35]:
mask = np.random.random((3,3)) > 0.5
mask

array([[ True, False, False],
       [ True, False,  True],
       [False, False,  True]])

In [36]:
np.where(mask, arr1, arr2)

array([[ 0, 11, 12],
       [ 3, 14,  5],
       [16, 17,  8]])
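To see that `np.where` really behaves like an element-wise `if`/`else`, the sketch below compares it with an explicit Python conditional expression over small example arrays (the names `cond`, `x`, and `y` are illustrative):

```python
import numpy as np

cond = np.array([True, False, True, False])
x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 30, 40])

# Vectorized selection
vectorized = np.where(cond, x, y)

# Equivalent element-by-element logic in plain Python
looped = np.array([xv if c else yv for c, xv, yv in zip(cond, x, y)])

print(vectorized)  # [ 1 20  3 40]
```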

## Statistical Methods

Numpy includes statistical operations on arrays, such as `mean`, `std`, and `var`:

In [37]:
arr = np.random.random(100_000)

np.mean(arr), np.var(arr)

(0.4984169070153069, 0.0833095314833551)
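The values above are close to 0.5 and 1/12 ≈ 0.0833, the mean and variance of a uniform distribution on [0, 1), which is what `np.random.random` draws from. For a result that can be checked by hand, here is a small sketch with fixed example data; like `np.sum`, these functions accept an `axis` argument:

```python
import numpy as np

arr = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = np.mean(arr)   # 40 / 8 = 5.0
var = np.var(arr)     # mean squared deviation: 32 / 8 = 4.0
std = np.std(arr)     # square root of the variance: 2.0

# Per-row means of the same data arranged as a 2x4 table
row_means = np.mean(arr.reshape(2, 4), axis=1)

print(mean, var, std)
print(row_means)
```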

## Quantifiers

An array with more than one element has no well-defined truth value, as conditions are evaluated element-wise. However, numpy provides `np.all()` and `np.any()` as quantifiers ($\forall, \exists$) to check whether a condition holds for all elements or for at least one element:

In [38]:
np.all(np.arange(10) > 5)

False

In [39]:
np.any(np.arange(10) > 5)

True
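The following sketch shows what goes wrong without a quantifier: using an element-wise condition directly in an `if` raises a `ValueError`. The flag `ambiguous` is just an illustrative way to record that:

```python
import numpy as np

arr = np.arange(10)

ambiguous = False
try:
    if arr > 5:             # ten booleans, not one: numpy refuses to guess
        pass
except ValueError:
    ambiguous = True

print(ambiguous)            # True

# With a quantifier the condition collapses to a single boolean
if np.any(arr > 5):
    print("at least one element is greater than 5")
if np.all(arr >= 0):
    print("all elements are non-negative")
```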

## Sorting

Similar to Python lists, numpy arrays can either be sorted in place with the `.sort` method or passed to `np.sort`, which returns a sorted copy and leaves the original unchanged:

In [40]:
arr = np.random.random(10)
# In place: modifies arr and returns None
arr.sort()
# Returns a sorted copy
np.sort(arr)

array([0.09332174, 0.09867704, 0.13856183, 0.20507908, 0.50070437,
       0.77026047, 0.77169204, 0.88993967, 0.91530329, 0.93708805])

Sorting can also be done along an axis:

In [41]:
np.sort(np.random.random((3,3)), axis=0)

array([[0.25084136, 0.27434974, 0.09638973],
       [0.8072073 , 0.31453919, 0.15275276],
       [0.99305094, 0.82717777, 0.87413788]])
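With random data it is hard to see what each call does, so here is a sketch on small fixed arrays (all names are illustrative). It shows that `np.sort` leaves its input unchanged, that `.sort()` works in place, and how the `axis` argument changes the result:

```python
import numpy as np

arr = np.array([3, 1, 2])

sorted_copy = np.sort(arr)      # new array; arr is unchanged
before = arr.tolist()           # [3, 1, 2]

arr.sort()                      # sorts arr itself
after = arr.tolist()            # [1, 2, 3]

table = np.array([[3, 1], [2, 4]])
by_column = np.sort(table, axis=0)   # sort each column: [[2, 1], [3, 4]]
by_row = np.sort(table, axis=1)      # sort each row:    [[1, 3], [2, 4]]

print(before, after)
print(by_column)
print(by_row)
```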