
In the blog post Starting Data Science / Machine Learning from 0 I mentioned numpy as the first library I want to look at. Most experienced users know numpy well, but when you are starting out everything is unknown, numpy included. I put all the code for this introduction into two notebooks, which can be found on my github. The notebooks contain more code than this page, so don’t skip them:

Resources

What is numpy?

Numpy is basically a math library for Python. You can easily create vectors and matrices and store them efficiently. It is the starting point for understanding your data, which is the main goal. The library contains many functions to analyse the data and change it to your needs. To get started, import the numpy library.
import numpy as np 

Store data

The first thing to do after importing numpy is to create an array. An array looks like a list, but you can run math operations and element-wise calculations directly on it.
The following line creates an array with the numbers 0 to 7.
np.array([0,1,2,3,4,5,6,7]) 
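To make the difference between a list and an array concrete, here is a minimal sketch. The variable names are only illustrative and do not come from the notebooks.

```python
import numpy as np

numbers_list = [0, 1, 2, 3]
numbers_array = np.array(numbers_list)

# Multiplying a list repeats it, multiplying an array scales each element.
repeated = numbers_list * 2   # [0, 1, 2, 3, 0, 1, 2, 3]
doubled = numbers_array * 2   # array([0, 2, 4, 6])

print(repeated)
print(doubled)
```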

Writing out the numbers 0 to 7 is manageable, but typing larger arrays by hand takes too much time. The easiest way to create a large array is to tell numpy to generate the values between two numbers with np.arange(). The array below includes the 0 but excludes the 20. You can leave out the 0, because it is the default start value.

a = np.arange(0,20) 
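The following sketch checks the two claims above (the stop value is excluded, the start defaults to 0) and also shows the optional step argument of np.arange, which the text does not cover.

```python
import numpy as np

a = np.arange(0, 20)         # 0, 1, ..., 19 (the 20 is excluded)
same = np.arange(20)         # start defaults to 0
evens = np.arange(0, 20, 2)  # optional third argument is the step size

print(a[0], a[-1])               # 0 19
print(np.array_equal(a, same))   # True
print(evens)
```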

So far we have only created one-dimensional arrays, but often we need larger matrices, sometimes filled with random values. You can either pass np.array() data that already has the shape you want, or tell numpy to create an array with a specific shape.

b = np.array([np.arange(10)]*3) # 3 rows, each containing the numbers 0 to 9
# or
b = np.random.random_sample((10,2)) # 10 rows, 2 columns of random floats in [0, 1)
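If you are unsure which shape an array ended up with, the shape attribute tells you. A small sketch, with variable names chosen here only for illustration:

```python
import numpy as np

stacked = np.array([np.arange(10)] * 3)            # three identical rows 0..9
random_values = np.random.random_sample((10, 2))   # random floats in [0.0, 1.0)

print(stacked.shape)         # (3, 10)
print(random_values.shape)   # (10, 2)
```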

Indexing data

Now that I have created arrays, I want to inspect them. Sometimes I only want a specific part of the data for further analysis. Arrays are indexed like lists, so you can pick elements the same way. Indexing starts at 0, so always subtract 1 from the position (the 1st element has index 0). You can also use negative indices to count from the end.
a[0] # first element
a[-1] # last element 
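Picking "parts" of an array works with the same slice syntax as for lists. A short sketch, assuming the array a from above:

```python
import numpy as np

a = np.arange(0, 20)

first_three = a[:3]    # elements at index 0, 1, 2
last_two = a[-2:]      # the final two elements
every_fifth = a[::5]   # every fifth element, starting at index 0

print(first_three)   # [0 1 2]
print(last_two)      # [18 19]
print(every_fifth)   # [ 0  5 10 15]
```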

If you have a two-dimensional array and want to get the first element of the first row, you have two options: either use two pairs of square brackets, or use one pair with the row and column separated by a comma. If you have 3 or more dimensions, the order of the indices in the square brackets matches the order of the dimensions.

b[0][0] # double brackets

b[0, 0] # [row, column]

z[0, 0, 0] # z is assumed to be a 3-dimensional array: [first dimension, second dimension, third dimension, ...]
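The array z is not defined on this page, so the sketch below builds a small three-dimensional example with reshape. The shape (2, 3, 4) is an arbitrary choice for illustration.

```python
import numpy as np

b = np.array([np.arange(10)] * 3)    # 3 rows, 10 columns
z = np.arange(24).reshape(2, 3, 4)   # 2 blocks, 3 rows, 4 columns

# Both notations return the same element of a two-dimensional array.
print(b[1][4], b[1, 4])   # 4 4

# One index per dimension, in the order of the dimensions.
print(z[0, 0, 0])   # 0
print(z[1, 2, 3])   # 23
```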

Numpy Operations

Numpy includes many mathematical functions for calculating with arrays. Adding a number to each value or subtracting one from it is no problem, and the same goes for multiplying or dividing by a number. In many cases you can simply use the ordinary arithmetic operators on numpy arrays.
np.add(a,1) # adds one to each element
a+1 # adds one to each element 
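The other operations mentioned above follow the same pattern. A minimal sketch, again assuming a is the array created with np.arange(0, 20):

```python
import numpy as np

a = np.arange(0, 20)

minus_one = a - 1   # same as np.subtract(a, 1)
times_two = a * 2   # same as np.multiply(a, 2)
halved = a / 2      # same as np.divide(a, 2), the result holds floats
summed = a + a      # two arrays of equal shape are combined element by element

print(minus_one[:3])   # [-1  0  1]
print(halved[:3])      # [0.  0.5 1. ]
```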

These are very simple functions, and numpy offers far more than I usually need. For example, you can calculate the remainder of a division or the sine.
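As a quick illustration of those two examples, np.remainder and np.sin are applied element by element below. The input values are chosen here only for demonstration.

```python
import numpy as np

values = np.arange(7)

remainders = np.remainder(values, 3)   # same result as values % 3
print(remainders)                      # [0 1 2 0 1 2 0]

angles = np.array([0, np.pi / 2, np.pi])   # angles in radians
sines = np.sin(angles)                     # approximately 0, 1, 0
print(sines)
```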

To find more useful functions, look at the documentation. There are different functions for nearly everything:

  • math operations
  • trigonometric functions
  • bit twiddling functions
  • comparison functions 
  • floating functions
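To give a feeling for these categories, here is one function from several of them. The selection is mine and only meant as a sketch; the documentation lists many more.

```python
import numpy as np

a = np.arange(6)

roots = np.sqrt(np.array([0, 1, 4, 9]))     # math operation
is_large = np.greater(a, 2)                 # comparison, same as a > 2
lowest_bit = np.bitwise_and(a, 1)           # bit twiddling, same as a & 1
floored = np.floor(np.array([1.5, -1.5]))   # floating function, rounds down

print(roots)        # [0. 1. 2. 3.]
print(is_large)     # [False False False  True  True  True]
print(lowest_bit)   # [0 1 0 1 0 1]
print(floored)      # [ 1. -2.]
```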

Resource

Tasks

Now you know a little bit about numpy. The next step is to do a lot of exercises to memorise numpy and its functions. I split the tutorial into two parts, one for the introduction and one for tasks. The next part contains a lot of practical exercises, but for now use the current jupyter notebook training on my github. A little teaser for the next chapter:
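# Plot roughly one full period of the sine curve (0 to about 2*pi).
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(0, 6.5, 0.1) # x values from 0 to 6.4 in steps of 0.1
sinus = np.sin(x)
plt.plot(x, sinus) # use x as the horizontal axis instead of the element index
plt.show()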
