Python Data Processing: What Are NumPy ndarrays?

What are ndarrays and how should you use them?
By Boris Delovski • Updated on Nov 24, 2022

In this article, I will talk about ndarrays. To be more specific, I will cover what they are and what their characteristics are. Once I go over these topics, you will be ready to tackle more complex ones, such as selecting data using indexing and slicing, filtering data using boolean indexing, performing mathematical operations, and similar topics. Once I cover those more complex NumPy topics, you will be ready to move on to more advanced libraries for data processing such as the Pandas library.

 

What is an ndarray?

An ndarray is an n-dimensional array in which you can store data. These special multidimensional arrays are the basic building block of the NumPy package. They are:

  • flexible
  • space-efficient
  • fast

An ndarray is fundamentally different from other Python containers such as lists because it stores homogeneous data: every element has the same data type. A list, by contrast, can hold heterogeneous data. Much of the speed of ndarrays comes from this homogeneity, and part of it comes from the way memory is laid out. An ndarray keeps its values in a single contiguous block of memory, while a list stores references to objects that can be scattered across memory. This makes arrays much more space-efficient.

In terms of data processing, NumPy arrays let you avoid explicit loops when performing linear algebra and standard math operations. Python loops are slow when they process large collections one element at a time, so the ability to apply an operation to a whole array at once (a vectorized operation) is very useful and is one of the main reasons why NumPy is so popular.
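To make the loop-free style concrete, here is a minimal sketch that doubles every value, first with a Python list comprehension and then with a single NumPy expression. The variable names are illustrative and do not come from the rest of this article.

```python
import numpy as np

values = [1, 2, 3, 4]

# Plain Python: a list comprehension visits each element one by one
doubled_list = [v * 2 for v in values]

# NumPy: one expression is applied to the whole array at once
arr = np.array(values)
doubled_arr = arr * 2

print(doubled_list)          # [2, 4, 6, 8]
print(doubled_arr.tolist())  # [2, 4, 6, 8]

# Every element of the ndarray shares a single data type
print(arr.dtype)
```

Both approaches produce the same values. The difference is that the NumPy version pushes the loop down into compiled code, which tends to matter more as the array grows.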

 

 

How do you create ndarrays?

Before I demonstrate how to create an ndarray, I first must explain the importance of dimensionality. How you create an ndarray is closely connected to how many dimensions it needs to have. This importance is ingrained in the name of the data type itself: the word ndarray essentially means n-dimensional array. Ndarrays can have as many dimensions as you like. You can even create zero-dimensional ndarrays, which hold a single value and behave much like scalars.

To create an ndarray, you use the array constructor from NumPy. This constructor allows you to create an ndarray that will contain whatever data you put inside the parentheses of that constructor. For example, if you want to create a scalar you can do so by entering a value in the following way:

# Import NumPy, then create an ndarray that represents a scalar
import numpy as np
arr = np.array(5)
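As a quick check of what "zero-dimensional" means in practice, the following sketch inspects such an array. It only uses standard ndarray attributes and the item() method.

```python
import numpy as np

arr = np.array(5)

print(arr.ndim)    # 0, because there are no axes
print(arr.shape)   # (), an empty tuple
print(arr.item())  # 5, the value as a plain Python int
```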

To create ndarrays that are not scalars, you need to input some type of data collection inside the parentheses of the array constructor. This tells NumPy that it should convert that data collection into an ndarray. You typically create ndarrays from lists or sometimes even tuples.

# Create an ndarray from a list  
arr_1 = np.array([1, 2, 3, 4])

# Create an ndarray from a tuple  
arr_2 = np.array((5, 6, 7, 8))

 

To create an ndarray that has more than one dimension, you need to input a nested collection into the parentheses of the array constructor. The shape of the ndarray will depend on how the data is actually nested. A standard example of a multidimensional ndarray looks like this:

# Create a multidimensional ndarray from a list
# that contains nested lists
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
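How the nesting determines the shape can be checked directly. The sketch below builds the same array once from nested lists and once from nested tuples. The names from_lists and from_tuples are illustrative.

```python
import numpy as np

from_lists = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
from_tuples = np.array(((1, 2, 3), (4, 5, 6), (7, 8, 9)))

# Three inner collections of three values each give a 3 x 3 array
print(from_lists.shape)  # (3, 3)

# Lists and tuples produce identical arrays
print(np.array_equal(from_lists, from_tuples))  # True
```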

Of course, you can use tuples instead of lists and get the same result. There is one reason to prefer lists: NumPy lets you convert any ndarray back into a list with the .tolist() method, while no equivalent method exists for tuples. If you need to convert an array to a tuple, the easiest way is to convert it to a list first and then convert that list to a tuple.

# Create a multidimensional ndarray from a list
# that contains nested lists
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Convert the previously created ndarray
# back into a list
converted_arr = arr.tolist()
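One detail worth knowing about the list-then-tuple route: with a multidimensional array, calling tuple() on the result of .tolist() only converts the outer level. The following sketch shows one possible way to get nested tuples as well. The names outer_only and nested are illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# tuple() converts only the outer level: a tuple of lists
outer_only = tuple(arr.tolist())
print(outer_only)  # ([1, 2, 3], [4, 5, 6], [7, 8, 9])

# Converting each row as well gives a tuple of tuples
nested = tuple(tuple(row) for row in arr.tolist())
print(nested)  # ((1, 2, 3), (4, 5, 6), (7, 8, 9))
```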

 

The array constructor also accepts an argument called ndmin. It sets the minimum number of dimensions that the resulting ndarray must have. For example, the following will create a two-dimensional ndarray from a flat list:

# Create a two dimensional ndarray from a list
# using the ndmin argument
arr = np.array([11, 21, 31], ndmin=2)
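To see what ndmin actually does to the shape, here is a short sketch. NumPy adds the extra axes at the front, and because ndmin is a minimum it never reduces the dimensionality of data that is already nested more deeply. The name arr_2d is illustrative.

```python
import numpy as np

arr = np.array([11, 21, 31], ndmin=2)
print(arr.ndim)   # 2
print(arr.shape)  # (1, 3): one row holding the three values

# ndmin is a lower bound: input that is already 2D stays 2D
arr_2d = np.array([[1, 2], [3, 4]], ndmin=1)
print(arr_2d.ndim)  # 2
```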

Finally, when creating an ndarray, you can also specify the type of data you want to have in your ndarray. So even if you are creating an array from a list of integers, you can specify that you want your ndarray to contain those integers as strings.


# Create a one-dimensional ndarray from a list of integers
# and define its data type using the dtype argument
arr = np.array([11, 21, 31], dtype=str)
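The effect of the dtype argument can be verified by looking at the stored values. The sketch below repeats the string conversion and adds a float conversion for comparison. The float example and the name as_float are additions for illustration.

```python
import numpy as np

arr = np.array([11, 21, 31], dtype=str)
print(arr.tolist())    # ['11', '21', '31']
print(arr.dtype.kind)  # 'U', NumPy's code for Unicode strings

# The same list stored as floating point numbers
as_float = np.array([11, 21, 31], dtype=float)
print(as_float.tolist())  # [11.0, 21.0, 31.0]
```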

 

 

What are the characteristics of ndarrays?

There are a few characteristics of ndarrays that you need to keep in mind when working with them, and those are:

  • dimensionality of the ndarray
  • size of the ndarray
  • shape of the ndarray
  • number of rows in the ndarray
  • type of data stored in the ndarray

You can get most of this information from attributes of an ndarray, and the number of rows from the built-in len() function. These characteristics are very important because they play a big role in performing more complex operations.

 

When working with ndarrays, you will not use the terms rows and columns that often, and will mostly stick to the so-called axis names of the ndarray. An ndarray has one axis per dimension, so a two-dimensional array has two axes: axis 0 and axis 1. Axis 0 runs along the rows and axis 1 runs along the columns. These terms are not very important at this moment, but they will be very important once you start modifying data stored in ndarrays.

Figure: Example of ndarray axes (image source: Edlitera)
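Since the figure is not reproduced here, the following sketch builds a stand-in array that you can use to follow the sections below. Only the layout of 2 rows by 4 columns is taken from the text. The values 1 to 8 are an assumption chosen for illustration, as is the reuse of the name arr2.

```python
import numpy as np

# Hypothetical values; only the 2 x 4 layout comes from the article
arr2 = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8]])

print(arr2.ndim)   # 2
print(arr2.size)   # 8
print(arr2.shape)  # (2, 4)
print(len(arr2))   # 2

# Summing along axis 0 collapses the rows: one value per column
col_sums = arr2.sum(axis=0)
print(col_sums.tolist())  # [6, 8, 10, 12]

# Summing along axis 1 collapses the columns: one value per row
row_sums = arr2.sum(axis=1)
print(row_sums.tolist())  # [10, 26]
```

The two sums illustrate what an axis means in practice: the axis you pass is the one that disappears from the result.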

 

Dimensionality

The dimensionality of the ndarray shown above is stored in the ndim attribute as an integer. The result will be 2 because the example ndarray has two axes, as can be seen by looking at the image above.

# Check the dimensionality
# of the ndarray
arr2.ndim

# The result will be 2

 

Ndarray size

The size of the ndarray is stored in the size attribute as an integer. It tells you how many elements in total are inside your ndarray, so in this case, it will return 8 because that is how many elements are in this ndarray.

# Check the size
# of the ndarray
arr2.size

# The result will be 8

 

Ndarray shape

The shape of the ndarray is stored in the shape attribute as a tuple. It tells you how many rows and columns you have in your ndarray, or to be more precise, how many values you have along each axis. Along axis 0 you have 2 rows, so the first member of the tuple will be 2, and along axis 1 you have 4 columns, so the second member of the tuple will be 4. The resulting tuple will therefore be (2, 4).

# Check the shape
# of the ndarray
arr2.shape

# The result will be (2, 4)

 

Number of rows in an ndarray 

To get the number of rows of an ndarray, you can use the len() function. When used with an ndarray, it returns the length of axis 0, which for a two-dimensional ndarray is the number of rows. It is also equal to the first member of the tuple stored in the shape attribute.

# Check the number of rows
# in the ndarray
len(arr2)

# The result will be 2
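The relationship between len() and shape can be confirmed with a short sketch. It also shows one edge case: a zero-dimensional ndarray has no axis 0, so len() raises a TypeError. The array values and the flag name raised_type_error are assumptions for illustration.

```python
import numpy as np

# Hypothetical 2 x 4 example array
arr2 = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# len() matches the first entry of the shape tuple
print(len(arr2) == arr2.shape[0])  # True

# A zero-dimensional ndarray has no length
scalar = np.array(5)
raised_type_error = False
try:
    len(scalar)
except TypeError:
    raised_type_error = True

print(raised_type_error)  # True
```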

 

Ndarray data type

Finally, you can check the data type of the elements stored in your ndarray by accessing the dtype attribute. It will tell you which type of data is stored inside your ndarray. In this case, the result will tell you that you have integers stored inside of your array.

# Check the data type
# of the data stored in the ndarray
arr2.dtype

# The result will be an integer type such as int64
# (the exact width depends on your platform)
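Because the exact integer type can differ between platforms, a check that does not depend on the width is sometimes more convenient. The sketch below uses np.issubdtype for that and then converts to a fixed width with astype. The array values and the names is_integer and arr32 are assumptions for illustration.

```python
import numpy as np

# Hypothetical 2 x 4 example array
arr2 = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# Prints the platform's default integer type, for example int64
print(arr2.dtype)

# True for any integer width
is_integer = np.issubdtype(arr2.dtype, np.integer)
print(is_integer)  # True

# Request a specific width explicitly when you need one
arr32 = arr2.astype(np.int32)
print(arr32.dtype)  # int32
```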

 

Conclusion

Because ndarrays are the workhorse of the NumPy package, understanding what they look like and how they function is of the utmost importance. In this article, I covered everything you need to know about them in-depth. I went over all of the fundamentals of ndarrays and also explained a few nuances that might be useful to you from time to time. This article serves as a great foundation for everything you will go over in future articles in this series on NumPy, and will prepare you so you have no problems understanding any topic you go over in the future.

 


Boris Delovski

Data Science Trainer

Boris is a data science trainer and consultant who is passionate about sharing his knowledge with others.

Before Edlitera, Boris applied his skills in several industries, including neuroimaging and metallurgy, using data science and deep learning to analyze images.