Jupyter notebooks are powerful environments for both writing and running code. You can also use them for other things, such as writing formatted text like you would in a Microsoft Word document, or inserting pictures or videos alongside Python code. Jupyter notebooks are particularly good for beginners because you can both write and run code in the same place. Most text editors - even the ones designed for programming - don't typically allow you to do this. In this blog series, we will mostly be using Jupyter notebooks. And the best thing is that, if you read and followed our previous post about how to set up your Python system, you have installed the Anaconda distribution, so you already have Jupyter notebooks installed too. If you haven't, please go read it now. I'll show you how to open a Jupyter notebook in just a moment.
NOTE: we put together a detailed guide on how to start the Jupyter notebook server on Mac or on a PC. The screenshots below were taken on a Mac computer. While the commands and steps involved are the same between the two operating systems (OS X and Windows), it might still be easier to follow our detailed guide linked above.
To launch Jupyter notebooks, I have to first launch the command line. I will then navigate to my desired folder, which for me is ~/Projects/i2p. Once I'm here, all I have to do is type jupyter notebook and hit enter.
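If typing jupyter notebook gives you a "command not found" error, it can help to check whether your system can actually find the jupyter command. Here is a small sketch you could run from the Python command line. It only uses Python's built-in shutil module, and the find_jupyter name is just something I made up for this example:

```python
import shutil


def find_jupyter():
    """Return the full path of the jupyter command, or None if it is not on the PATH."""
    return shutil.which("jupyter")


jupyter_path = find_jupyter()
if jupyter_path is None:
    # Assumption: Jupyter came with Anaconda, as in our setup post.
    print("jupyter was not found - check your Anaconda installation")
else:
    print("jupyter is installed at:", jupyter_path)
```

If the script prints a path, the jupyter notebook command should work from your terminal.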
After I run the command, it should automatically open a browser window - in my case it opens a Firefox window, because Firefox is my default browser. In your case it might be Chrome, Safari, or another browser. What you can see right here is the starting page for Jupyter notebooks, which just shows the files that are present in the directory where I typed the command that launched the Jupyter notebooks environment. In my case, I was in my ~/Projects/i2p folder, which had the hello_world.py script I wrote earlier using the Atom text editor.
So why did we have to use the command line to launch a Jupyter notebook? And why is the web page automatically launched? One thing you need to understand about Jupyter is that it has a server-client architecture, similar to other online services you might use. Whereas most desktop applications you might use (Microsoft Excel, Atom, Sublime etc.) are standalone, Jupyter notebooks have two parts: a server and a client. In this way, they are more similar to, say, Gmail or Twitter. There, the webpage is the client - it displays an interface you can use to interact with a server that sits in some data center somewhere. Unlike with Gmail or Twitter, Jupyter's server is running on your own computer in this case (however, the Jupyter server can also run remotely, in a data center or in the cloud, if your laptop is not powerful enough for your computational needs).
NOTE: If you are still confused about the server-client interaction, think about a restaurant. When you visit a restaurant, you are a client. You interact with a server, who takes your "commands" (orders) and gives you back "responses" (food, drinks etc.). This analogy mimics how the internet works to some extent. Though, of course, the internet is quite a bit more complicated than that.
When it comes to websites, the web page is the client and the server is usually some computer that sits somewhere in a data-center, sometimes on the other side of the planet (however, it can also be a server that runs locally, on your computer!). When you click on a button on a web page, that button typically translates into some command that is sent to the server. The server receives the command, performs some actions and sends back a response.
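If you'd like to see a server and a client talking to each other on your own computer, here is a minimal sketch that uses only Python's standard library. To be clear, this is not how Jupyter itself is implemented. HelloHandler and ask_local_server are names I made up for this illustration, and the example simply starts a tiny local web server, sends it one request and prints the response:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """The server side: answers every GET request with a short text message."""

    def do_GET(self):
        body = b"hello from the local server"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet (a real server would write a log here)


def ask_local_server():
    """Start a server on this computer, act as the client once, then stop the server."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0 = pick any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # The client side: send a request and read the response.
        # The empty ProxyHandler makes sure the request goes straight to our computer.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(f"http://127.0.0.1:{port}/") as response:
            return response.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()


reply = ask_local_server()
print(reply)
```

Both the server and the client live on the same machine here, which is exactly the situation we are in when we run Jupyter locally.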
Jupyter notebooks use a similar architecture to that of web based services: we have a web page that we can see in the screenshot above, and we have a server that runs locally on our computer. When you run the jupyter notebook command you are actually starting that server and the text you see right below the command is the server log:
The server log tells you what the server is doing. For example, Serving notebooks from local directory: /Users/ciprian/Projects/i2p informs me that the server is looking for "notebooks" that are located in this directory on my computer. We will see what a notebook is in just a moment.
A notebook is a collection of code cells as well as some metadata. Notebooks are files that store code that we can run using a Jupyter server.
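To make "a collection of cells plus some metadata" a bit more concrete: a notebook file is stored as JSON text. The sketch below builds a heavily simplified version of that structure as a Python dictionary. The field names follow version 4 of the notebook format as I understand it, but real notebooks carry more metadata than this, so treat it as an illustration rather than a complete file:

```python
import json

# A simplified picture of what is stored inside a notebook file.
notebook = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["print('hello world')"],
            # The output of running the cell is stored next to the code.
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["hello world\n"]}
            ],
        },
        {
            "cell_type": "code",
            "execution_count": None,  # this cell has not been run yet
            "metadata": {},
            "source": [],
            "outputs": [],
        },
    ],
    "metadata": {},  # real notebooks record things like the language here
    "nbformat": 4,
    "nbformat_minor": 4,
}

print(json.dumps(notebook, indent=2))
print("Number of cells:", len(notebook["cells"]))
```

You will rarely need to look at this structure directly, because the Jupyter interface reads and writes it for you.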
Let's go back to that webpage that was launched automatically when we started the Jupyter server. We know now that the web page is connected to the Jupyter server, which is running locally on your computer. Now, let's go over to the top right corner and click the New button and from here we'll select Python 3.
NOTE: One thing to note here is that Jupyter notebooks are very powerful and allow you to write code in many languages. They're also widely used in data science and data analysis - so it's a really good tool to get comfortable with and invest some time into. In this blog series, we'll only use it to write Python code. But remember, it can do much more than that.
After we select Python 3, a new notebook will be created, which will understand Python 3. Depending on your Jupyter notebook version, you should see something more or less similar to the screenshot below.
Let's take a few moments to familiarize ourselves with some of the features of Jupyter notebooks. First of all, let's give this notebook a name. We can do this by moving our mouse over the top left corner, right where it says Untitled, and clicking on it. A window will pop up and we can enter a name here - let's say python_practice. You can see that I avoid using spaces in my file names. That's just good practice whenever you create any files that contain code.
Now, unlike the text editors, I don't have to specify a .py extension when naming the file. We'll just enter python_practice and click Rename. You can see that the name of the Jupyter notebook has been updated.
Next, let's talk about the input box we see in the screenshot above. It is called a cell and this is where we'll write code. Before we write any code though, there's something I want you to notice about this cell. If I click inside it, it turns green. If I click outside it, it turns blue. I can also make it blue by pressing the Escape key. When a cell is green, I can actually edit it and write code inside it, but when it's blue, I can't.
When all the cells in my notebook are blue, however, I can do other things, like insert other cells in my notebook or delete cells from my notebook.
To insert a new cell in a notebook, simply press the Esc key, to make sure there are no green cells, and then the b key on your keyboard to insert a new cell below (b for below), or press a to insert a new cell above (a for above). You will notice that generally only one cell is blue (selected) at a time, just as only one cell is green (ready for edit) at a time.
There we go, a new cell was added below the previous one. If I want to delete this cell, I will first press Escape, and then press d twice. Doing this will delete all the cells that were blue (selected).
I use these keyboard shortcuts a lot, but if they're a bit hard to figure out in the beginning, you can also use the menu. To add a new cell below the currently selected (blue) cell, you can use the Insert > Insert Cell Below menu. To add a new cell above the currently selected (blue) cell, you can use the Insert > Insert Cell Above menu.
If you are following along, take a break from reading this article and practice adding / removing cells.
Another useful thing to know is how to select a cell. You can select a cell (make it blue) by clicking in the area shown in the screenshot below. Note that clicking inside the cell will make it green (ready for edit).
Now we know how to insert, delete and select cells. It's time to write some code in our Jupyter notebook. Let's first click inside the first cell and write code that prints the text 'hello world'. Again, that's just:
print('hello world')
You can already see that the code has the nice syntax highlighting that you might be familiar with from using a code editor. But how do we run this code? When we were in the Python command line, as we discussed in our article on how to write and run Python code, we just pressed Enter. If we press Enter inside that cell, it will just add a new row inside it. So instead, what we need to press is Shift and Enter at the same time. Alternatively, you can click the Run button on the Header (in my screenshot, it's right under the Cell menu).
As soon as we type Shift and Enter (or click the Run button), our code gets executed and the result is printed out immediately underneath the cell that contains the code. We can see here our 'hello world' phrase.
More than that, Jupyter is being extra helpful and, if you don't have an empty cell underneath (I already had one), it will automatically add a new cell below, so we can quickly write some more code.
Let's write a different program. Instead of 'hello world', let's print our names. First, we make sure the cell is green, by clicking inside it, and then we type:
print('My name is Ciprian')
And again, we type Shift and Enter and again, our code gets executed and the result is printed.
That's pretty cool. As you can see, Jupyter notebook combines the ease of writing and immediately executing code that we saw when we used the Python command line, with the benefits of having a text editor. In this blog series, we will mostly be using Jupyter notebooks precisely because of these great benefits.
NOTE: I would not be thorough if I didn't point out that, for most programming tasks, code editors are preferred. There are several reasons for that:
This being said, Jupyter notebooks are excellent (and generally preferred to code editors) for data analysis, data science and some machine learning tasks.
So far we have only written Python code inside the Jupyter notebook cells. But there is one more type of content we can add inside a Jupyter cell: markdown. If you are not familiar with it, this markdown tutorial is an excellent place to learn more about it. In short, markdown is a way to write formatted text.
To understand why having the ability to add formatted text inside a Jupyter notebook is useful, you have to keep in mind the heritage of Jupyter notebooks. Notebook interfaces are actually rather old. They originated in the 1980s with the Mathematica software, and their main purpose was to make technical computing easier. Initially mostly used by scientists, they served as a kind of digital lab notes. In a lab note, you have results of computations, but you also have lots of explanatory text.
Today, data analysts and data scientists still need to explain in their Jupyter notebooks what their hypotheses are, why they took a certain approach to their analysis, and what their results mean. This is mostly text. To be able to insert this text next to code, we need cells that can display formatted text. Hence, the support for markdown in Jupyter notebooks.
By default, any given cell in a Jupyter notebook is a code cell. You can tell the type of a cell by looking at the cell type dropdown menu.
Remember, a code cell contains code that can be executed. A markdown cell contains text that can be displayed. Let's make the empty cell at the bottom of the notebook a markdown cell and add some text.
Notice how I had to change the cell type to markdown in the dropdown menu in the header. If I now press Shift and Enter, instead of running code, the Jupyter notebook will simply display the formatted markdown. The two # symbols indicate a type of header in markdown.
Using markdown, you can:
Here's the gotcha #1: what if you accidentally change the type of a cell that has code to markdown? Well, rather than interpreting it as code, the Jupyter notebook is going to assume it's just markdown text. Not only that, but you'll also lose the syntax highlighting. And if you don't pay attention, especially in the beginning, you'll be wondering why your code is not running.
Gotcha #2 is the corollary of the first one. If you have a cell containing markdown text and you change the cell type to code, when you try to run that cell you might get syntax errors because the text is interpreted as code, when in fact it's just plain English.
So be very careful with the cell type and make sure you check it if you run into these issues!
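If you are curious what Python makes of plain text, here is a small sketch you can run in a code cell. It uses Python's built-in compile function to check whether a piece of text is valid Python without actually running it. The looks_like_python name is just something I made up for this example:

```python
def looks_like_python(cell_text):
    """Return True if the text parses as Python code, False otherwise."""
    try:
        compile(cell_text, "<cell>", "exec")
        return True
    except SyntaxError:
        return False


print(looks_like_python("print('hello world')"))         # valid code
print(looks_like_python("This is just plain English."))  # not valid code
print(looks_like_python("## My notes"))                   # a markdown header
```

The last line is a little surprising: a markdown header starts with the # symbol, which Python treats as the start of a comment. So a header on its own, in a cell that was switched to code by accident, will not produce an error. It will just silently do nothing, which is one more reason to check the cell type.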
I want to show you one more thing that makes Jupyter notebooks quite powerful compared to the Python command line. You can actually include multiple lines of code in one cell. Let's say I want to write some code that prints 'hello world' and then on a new line 'My name is Ciprian'. I could do that in two different cells, but that takes up quite a bit of space and besides, I want those two phrases to show up one right after the other. Jupyter notebook allows me to do that. Let me show you how.
First, I click inside an empty cell to make sure it's green and now I type print('hello world') and then hit Enter. Once I do that, I will still be inside the same cell, just on a new line, so here, I can write print('My name is Ciprian').
Now, I type Shift and Enter to run this code. And just like that, the Jupyter notebook sends these two lines of code to the Jupyter server, which runs them and returns the response (the text to print on the screen), which the Jupyter notebook then shows on the screen.
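NOTE: You may wonder, perhaps, why we can't write the two commands on the same line. I'm glad you asked, because this is a very important first lesson when it comes to writing programs. In Python, each command, or instruction, if you will, normally goes on its own line. That's because the interpreter uses the end of a line to figure out where one command stops and the next one begins, so when two commands are simply written one after the other on the same line, it can't tell them apart. (Python does accept a semicolon between two commands on the same line, but this is rarely used, and one command per line is the convention.) If you don't believe me, try it out.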
Let's edit the code cell above and write the following code:
print('hello world!') print('My name is Ciprian')
This seems very similar to the code above, except, I have a single line here. If I press Shift and Enter, you can see that Python complains that my syntax is invalid.
This word, syntax, is borrowed from grammar and you may have run into it in school. This message just tells us that what we're trying to do is not correct, similarly to how some phrases just don't sound like correct English, even though they do use English words. It's like saying, "I want not breakfast" instead of "I don't want breakfast". Python is very helpful here and it informs me that what I'm trying to do doesn't make sense. So remember: put each Python command on its own line.
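Here is a small sketch that compares a few ways of writing the two commands. It uses Python's built-in compile function, which only checks whether the text is valid Python without running it, so nothing gets printed by the commands themselves. The attempts dictionary is just a name I picked for this example:

```python
# Three ways of writing the same two commands.
attempts = {
    "same line, nothing in between": "print('hello world!') print('My name is Ciprian')",
    "separate lines": "print('hello world!')\nprint('My name is Ciprian')",
    "same line, with a semicolon": "print('hello world!'); print('My name is Ciprian')",
}

results = {}
for description, code in attempts.items():
    try:
        compile(code, "<cell>", "exec")  # check the syntax only
        results[description] = "valid"
    except SyntaxError:
        results[description] = "invalid syntax"
    print(description, "->", results[description])
```

Only the first attempt is rejected. The semicolon version is accepted by Python, but it is rarely used because it makes code harder to read.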
The Jupyter notebook menus are at the top of the page. You can click on them to explore what's inside each one of them. You probably won't need most of them in the beginning, but it's good to go over them.
First, the File menu. This gives you the ability to start a new notebook, rename your notebook, save it. One thing to also note is that Jupyter notebooks are automatically saved every once in a while. In fact, if you look at the Jupyter server logs, you will see a lot of Saving file at... log entries - that's just the server confirming the fact that it autosaved your notebook.
Getting back to the File menu, a very useful feature is the Download as option. This allows you to export your Jupyter notebook to a variety of formats. If this sounds a bit useless, remember that Jupyter notebooks are often used for data analysis. In addition to running code, they can also display images, graphs, video and other media. Essentially, you can build a whole report in a Jupyter notebook, then export it to a PDF and share it with your team. That's pretty neat!
From the Edit menu, you can edit cells. You can merge them, find text and replace it, and you can move cells up and down.
The View menu has a bunch of options that control what the notebook looks like. For example, we can hide or show the header - if you want to see more of your notebook on one screen, and get some extra real estate. Another one I like is the Toggle Line Numbers feature, which allows me to show or hide line numbers. When you write small programs, line numbers don't matter that much, but in larger programs they're crucial. That's because, if you have an error in your code, Python will often tell you exactly which line is wrong, and when that happens, you don't want to be counting lines. It's much easier to just see the line numbers.
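To see what "Python will tell you which line is wrong" looks like in practice, here is a minimal sketch. I deliberately put a mistake on the third line of a tiny program (stored in the source variable, a name I picked for this example) and then ask Python where the problem is:

```python
# A three-line program with a deliberate mistake on line 3 (two = signs in a row).
source = "x = 1\ny = 2\nz = = 3\n"

try:
    compile(source, "<cell>", "exec")  # check the syntax without running the code
    error_line = None
except SyntaxError as error:
    error_line = error.lineno  # the line number Python blames for the error
    print("Python reports a problem on line", error_line)
```

With line numbers turned on, you can jump straight to that line instead of counting from the top of the cell.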
The Insert menu allows us to insert cells above or below the currently selected cell. We have seen this menu before, when we introduced the keyboard shortcuts for adding cells to a notebook. Generally, I suggest that you learn the keyboard shortcuts because they will make you a lot faster at writing code, but know that you can also just use these menu options right here.
Another useful menu is the Cell menu. Here, a useful option is Run All, which simply executes every cell in your notebook, one by one, starting from the top of the notebook. You could do this manually, by selecting which cell you want to run and then pressing Shift and Enter, but if you have a large notebook, that can be very time consuming.
In the Cell menu, we also find the options for Cell Type. Remember, Jupyter supports multiple types of cells, the main ones being Code, which are cells where we can write and execute code, and Markdown, which are cells where you can write text in markdown format. We saw earlier that we can toggle between these modes using the dropdown menu in the header. If the header is hidden, however, you can also use the Cell menu to switch between Code and Markdown modes.
Before we get into details about the available options in the Kernel menu, we need to understand what a kernel is. Briefly, a Jupyter kernel is a program that runs the code that the Jupyter server receives from the Jupyter notebook. The diagram below shows a rough mental model you can use to understand the interaction between the Jupyter notebook, the notebook server and the kernel.
The red arrows show the data flow. When you run some code in the Jupyter notebook, the code is sent to the server, which then uses a kernel to run that code.
Why do we need kernels? Well, first of all, the Jupyter server itself is not built to understand the code you want to execute. Its main job is to delegate the code execution to the appropriate kernel that understands that particular programming language (in our case Python) and let the kernel execute the code.
In this way, Jupyter is designed to be extensible. This architecture allows programmers to add support for new programming languages by simply writing a new kernel that can understand and run that particular programming language. In the diagram above, I listed only a few of the kernels available in the Jupyter ecosystem. A more complete list can be found here. There are so many of them!
Kernels are what makes Jupyter notebooks so powerful. They are the feature that allows Jupyter to be able to run so many programming languages using a unified interface.
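If it helps, here is a toy model of that delegation written in Python. This is only a mental model and not Jupyter's actual code: real kernels are separate programs that the server communicates with, while here everything lives in one small script. The names run_python, KERNELS and toy_server are all made up for this illustration:

```python
import contextlib
import io


def run_python(code):
    """A stand-in for a Python kernel: run the code and capture what it prints."""
    output = io.StringIO()
    with contextlib.redirect_stdout(output):
        exec(code, {})
    return output.getvalue()


# Hypothetical registry: language name -> the "kernel" that can run that language.
KERNELS = {"python": run_python}


def toy_server(language, code):
    """A stand-in for the server: it never runs code itself, it only picks a kernel."""
    kernel = KERNELS.get(language)
    if kernel is None:
        return f"No kernel installed for {language}"
    return kernel(code)


print(toy_server("python", "print('hello world')"))
print(toy_server("r", "print('hello world')"))
```

In this toy model, supporting a new language means adding one more entry to the KERNELS dictionary. The toy_server function itself does not need to change, which is the idea behind Jupyter's extensible design.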
Now that we know what kernels are, let's look at the Kernel menu.
As you can guess, all the options in the Kernel menu expose actions you can perform with the kernel that is currently assigned to interpret the code in your Jupyter notebook.
Interrupt simply stops the kernel from whatever it's doing at the moment. This is useful if your code is taking too long to run, for example due to some bug you accidentally introduced in your code. Why wait for the kernel to finish executing that (incorrect) code when you can just tell it to stop doing what it's doing, fix your code, and then run the cell again?
Restart restarts the kernel. This will free up some memory, but also wipe out any variables that you have currently stored in your context. We'll learn more about this in future posts in this series.
Restart & Clear Output restarts the kernel and also removes all the output from the notebook. In the screenshot above, the output is hello world and My name is Ciprian. Restarting and clearing the output would restart the kernel and remove those texts.
Restart & Run All restarts the kernel and immediately runs all the cells in the notebook.
Reconnect attempts to reconnect to the kernel. Sometimes, the kernel can become disconnected. When the kernel is disconnected, you won't be able to execute code because there is no kernel ready to execute it. This menu option allows you to reconnect to the kernel.
Shutdown, well, shuts down the kernel.
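To get a feel for why restarting the kernel wipes out your variables, here is another toy model. Again, this is a sketch under my own simplifying assumptions and not how the real kernel is written: ToyKernel is a made-up class that keeps everything your code defines in a dictionary, and restarting simply throws that dictionary away:

```python
class ToyKernel:
    """A made-up, heavily simplified kernel that remembers variables between runs."""

    def __init__(self):
        self.namespace = {}  # everything the code defines is stored here

    def run(self, code):
        exec(code, self.namespace)

    def restart(self):
        self.namespace = {}  # start over with an empty memory


kernel = ToyKernel()
kernel.run("name = 'Ciprian'")                      # first "cell" stores a value
kernel.run("greeting = 'My name is ' + name")       # second "cell" can still use it
print(kernel.namespace["greeting"])

kernel.restart()
try:
    kernel.run("print(name)")
    outcome = "name still exists"
except NameError:
    outcome = "NameError: name is gone after the restart"
print(outcome)
```

Notice that restarting does not change the code in your cells. It only forgets the results of having run them, so you need to run the cells again to get your variables back.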
Jupyter notebooks are autosaved. However, if you want to manually save them, you can use the File > Save and Checkpoint, or the (anachronistic) floppy disk icon in the top left corner. You can also simply type Ctrl and s at the same time.
Closing a Jupyter notebook is a bit more involved. You can just close the web page, but the server will still be running. To stop the server, you need to go back to the terminal where you launched the Jupyter server and press Ctrl and c at the same time.
The server will ask you to confirm that you want to shut it down. Simply type y and then Enter and the server will be shut down and you will be back in the terminal. At this point, you can also close all the browser tabs that Jupyter opened.
In the terminal you can now see (using the ls command on OS X or dir on Windows) that we have a new file in our folder, called python_practice.ipynb. This is the Jupyter notebook we just created!
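You can also do the same check from Python. The sketch below lists every notebook file in a folder using the built-in pathlib module. The find_notebooks function is a name I made up, and by default it looks in the folder you are currently in:

```python
from pathlib import Path


def find_notebooks(folder="."):
    """Return the names of all Jupyter notebook files (.ipynb) in a folder, sorted."""
    return sorted(path.name for path in Path(folder).glob("*.ipynb"))


print(find_notebooks())
```

If you run it in the folder where you created the notebook, python_practice.ipynb should appear in the list, while other files such as hello_world.py are left out.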
IMPORTANT NOTE: If you want to open a saved notebook, one thing to remember is that you can't simply double-click on it. Instead, what you need to do is open a terminal, type jupyter notebook and when the Jupyter notebook interface opens up in your browser, you can use the file system interface to navigate to the folder where you saved your notebook and then double click on it. Let me show you how.
Step 1. Open the terminal or command line application:
Step 2. Navigate to the folder where we have the Jupyter notebook we want to run using the change directory command (cd):
Step 3. Start the Jupyter server by typing jupyter notebook:
Step 4. When the browser tab opens, click on the notebook you want to open:
Step 5. Now finally, the notebook will open in a new tab. Notice how the notebook remembers both the code we wrote and the results of executing that code! If you want to clear that output, don't forget you can use the Kernel > Restart & Clear Output menu option.