Notice: You do not have to install Jupyter! Since all exercises of this lecture will be on Colab, you will only need a browser and the internet. However, if you want to run the scripts locally, which will also be helpful for your future studies, please keep reading…
For those comfortable with basic installation, a TL;DR version of this guide is also available.
If you encounter any issues, please try searching online or asking AI models first. If you're still stuck, don't hesitate to ask me for help.
Welcome to the world of computational data science. A crucial tool in this field is the Jupyter Notebook, an interactive environment that allows you to write and execute code, display visualizations, and mix in explanatory text all in a single document. Think of it as a digital lab notebook where your experiments, code, and findings live together.
While all coursework for this lecture can be completed online, learning to set up a professional programming environment on your own computer is a fundamental skill. It grants you the freedom to work offline, manage complex projects, and build a portfolio of work that will be invaluable in your academic and professional career. This guide is designed to walk you through that process, building your confidence one step at a time. The setup process is an investment in your skills that will pay dividends long after this course is over.
To accommodate different needs and comfort levels with technology, this guide presents three distinct paths for setting up your environment. Please read through the descriptions and choose the one that best fits you.
Level 0: No Installation (Google Colab)
This is the express lane. It is the perfect choice if you want to start coding immediately without installing anything on your computer. It runs entirely in your web browser and is ideal for completing class assignments, especially if you are using a computer with limited storage or an older operating system. All you need is an internet connection and a Google account.
Level 1: Standard Installation (Anaconda)
This is the recommended path for most students, especially those new to programming. Anaconda is a comprehensive, all-in-one package that installs Python, Jupyter Notebook, and hundreds of the most common data science libraries through a simple graphical installer. It is the most straightforward way to get a complete, powerful, and ready-to-use local environment. This can be thought of as the "batteries-included" option.
Level 2: Power-User Installation (Miniconda + VS Code)
This path is for the adventurous and for those who want maximum control over their setup. It involves using the command line (Terminal) and a minimal installer called Miniconda. You will learn to build your environment from the ground up, installing only what you need. This approach mirrors how professional developers manage their projects and is a fantastic way to deepen your technical skills.
The fundamental difference between these levels is a trade-off between convenience and control. Level 1 prioritizes a low-friction start, which is excellent for avoiding initial frustration. Level 2 introduces more hands-on steps that teach professional-grade practices, such as environment management, from the very beginning.
Google Colaboratory, or "Colab" for short, is a free, cloud-hosted Jupyter Notebook service provided by Google. It is an exceptional tool for learning and collaboration, offering several key advantages for students.
Zero Setup: Since Colab runs entirely in the cloud, there is no need to install any software on your local machine. This eliminates any potential issues with system compatibility or complex installation procedures.
Free Access to Powerful Hardware: Colab provides free, usage-limited access to powerful computing resources, including Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs). On the free tier these resources are not guaranteed and their availability varies with demand. These specialized processors can dramatically accelerate the calculations required to train machine learning models, which is particularly valuable for students who may not own high-end computers.
Pre-installed Libraries: Most of the essential data science libraries you will need, such as NumPy, Pandas, and Matplotlib, come pre-installed in the Colab environment. You can also install additional packages by running !pip install followed by the package name directly in a notebook cell.
Seamless Collaboration and Sharing: Colab notebooks function much like Google Docs. You can share a notebook with a simple link, grant viewing or editing permissions, and collaborate with multiple users in real-time.
Google Drive Integration: Your Colab notebooks can be saved directly to your Google Drive, making it easy to organize your work. You can also mount your Google Drive within a notebook to access datasets and other files stored there.
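The set of pre-installed libraries and their versions can change over time, so it can help to check what a runtime actually provides before relying on it. Here is a minimal sketch for a Colab code cell (it also works locally). The helper name installed_versions is our own, hypothetical choice, and it relies only on the standard library's importlib.metadata module (Python 3.8+).

```python
# Report the versions of a few libraries that Colab usually pre-installs.
# Uses only the standard library (importlib.metadata, Python 3.8+).
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None  # missing: install it with !pip install <name>
    return versions


for pkg, ver in installed_versions(["numpy", "pandas", "matplotlib"]).items():
    print(f"{pkg:12} {ver or 'not installed'}")
```

Any library reported as not installed can be added with !pip install in the same notebook.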
While Colab is an outstanding tool for coursework, it is important to recognize that it abstracts away the process of setting up a local environment. The skills covered in Level 1 and Level 2 are what you will need when working on projects in a professional setting, where you are often required to set up and manage software on your own machine.
This quick-start guide will have you running your first line of Python code in minutes.
Navigate to the Google Colab website: colab.google.
A pop-up window will appear. Click on the New Notebook button at the bottom right.
You will now see your new notebook. The main component of a notebook is a cell. There are two primary types of cells:
Code Cells: These are for writing and executing code.
Text Cells: These are for writing notes, headings, and explanations using a simple formatting language called Markdown.
Your new notebook starts with an empty code cell. Type the following Python code into it: print("Hello, World!")
To run the code in the cell, either click the circular "play" icon to the left of the cell or press the keyboard shortcut Shift + Enter. The output, Hello, World!, will appear directly below the cell.
To add a text cell, click the + Text button in the toolbar at the top. You can type notes here to document your work.
Congratulations, you have just created and run your first Jupyter Notebook in the cloud!
The Anaconda Distribution is a free, open-source platform for Python and R programming. It is designed specifically for data science and machine learning workflows. When you install Anaconda, you get not only Python but also the conda package and environment manager, a user-friendly graphical tool called Anaconda Navigator, and over 250 of the most popular data science packages pre-installed. For beginners, this is the most highly recommended and straightforward path to setting up a complete and powerful local development environment.
Follow these steps carefully to install Anaconda using its graphical installer.
Download the Installer: Navigate to the official Anaconda download page at anaconda.com/download. Click the button to download the 64-bit Graphical Installer for Windows.
Launch the Installer: Once the download is complete, find the .exe file in your Downloads folder and double-click it to begin the installation.
Proceed and Agree: Click "Next" to move through the initial screens. When you reach the license agreement, read it and click "I Agree".
Select Installation Type: You will be prompted to choose between "Just Me" and "All Users". Select Just Me (Recommended). The "All Users" option requires administrator privileges and is typically unnecessary. Click "Next".
Choose Install Location: The installer will suggest a default location inside your user folder (e.g., C:\Users\<YourUsername>\anaconda3). It is strongly recommended to accept this default location. Avoid installation paths that contain spaces or special characters, as these can cause issues with some programming tools. Click "Next".
Advanced Installation Options (CRITICAL STEP): This is the most important step of the installation. You will see several checkboxes; two of them matter most:
WARNING: DO NOT check the box that says "Add Anaconda3 to my PATH environment variable." This is an outdated practice that can interfere with other software on your system and lead to very confusing errors. The proper way to access Anaconda is through the Anaconda Prompt, which the installer will create for you. (A more detailed explanation of the PATH variable is in Appendix 5.2).
RECOMMENDATION: Leave the box that says "Register Anaconda3 as my default Python" checked. This allows other applications, like code editors, to easily find and use the Python version installed by Anaconda.
Begin Installation: Click the "Install" button. The process will take several minutes to complete as it unpacks hundreds of packages.
Complete Installation: Once the installation is finished, click "Next". The installer may offer to install other software like PyCharm or DataSpell; you can safely skip these offers for now by clicking "Next" and then "Finish".
The installation process on macOS is also graphical, but it has one crucial preliminary step.
Identify Your Mac's Processor (CRITICAL FIRST STEP): Apple has transitioned from using Intel processors to its own "Apple Silicon" chips (e.g., M1, M2, M3). You must download the correct installer for your hardware.
Click the Apple menu () in the top-left corner of your screen and select About This Mac.
In the window that appears, look for the "Chip" or "Processor" line. It will specify whether you have an Apple chip or an Intel processor. This is essential for the next step.
Download the Correct Installer: Go to the Anaconda download page at anaconda.com/download. Under the macOS section, carefully select the correct graphical installer based on your processor type (Apple Silicon or Intel).
Launch the Installer: Open your Downloads folder and double-click the .pkg file you just downloaded.
Proceed and Agree: Click "Continue" through the Welcome, Read Me, and License screens. Click "Agree" to accept the software license agreement.
Select Installation Type: When prompted for a destination, choose Install for me only. This is the standard and recommended option that does not require administrator privileges. Click "Continue".
Begin Installation: Click "Install" to accept the default installation location. You may be asked for your user password to authorize the installation.
Complete Installation: The installer will proceed and may take a few minutes. Skip any offers for additional software. Once you see the summary screen, click "Close". Your Mac may ask if you want to move the installer file to the Trash; this is safe to do as it is no longer needed.
Installation on Linux is done through the command line but is very straightforward.
Open the Terminal: Launch your terminal application.
Download the Installer Script: Go to the Anaconda download page at anaconda.com/download. Right-click the download link for the Linux installer and select "Copy Link Address". In your terminal, use the wget command to download it. Paste the link you copied:
wget https://repo.anaconda.com/archive/Anaconda3-20XX.XX-X-Linux-x86_64.sh
(Replace the URL with the one you copied).
Verify the Installer (Optional but Recommended): To ensure the file was not corrupted during download, you can verify its checksum. Run the sha256sum command followed by the filename:
sha256sum Anaconda3-20XX.XX-X-Linux-x86_64.sh
Compare the output hash to the one provided on the Anaconda website. This is like checking the seal on a package to make sure it wasn't tampered with.
Run the Installer Script: Execute the script using the bash command:
bash Anaconda3-20XX.XX-X-Linux-x86_64.sh
Follow the Prompts:
Press Enter to begin and review the license agreement. You can press the Space Bar to scroll through it quickly.
At the end, type yes and press Enter to accept the license terms.
The installer will ask you to confirm the installation location. The default location in your home directory is recommended. Press Enter to confirm.
When the installer asks, "Do you wish the installer to initialize Anaconda3 by running 'conda init'?" (the exact wording varies between installer versions), type yes and press Enter. This is the modern, recommended method that automatically configures your terminal shell to recognize conda commands.
Activate Changes: The installation is complete, but the changes will not take effect until you restart your terminal. Close your current terminal window and open a new one. You should now see (base) at the beginning of your command prompt, indicating that Anaconda's base environment is active.
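The checksum step from the Verify the Installer instruction can also be reproduced in Python, which may help on a system where sha256sum is not available. This is a minimal sketch using the standard hashlib module. The script name check_installer.py and both helper names are hypothetical, and the expected hash must be copied from the Anaconda website.

```python
# Sketch: verify a downloaded installer against its published SHA-256 hash,
# equivalent in spirit to `sha256sum <file>` followed by a manual comparison.
# Usage (hypothetical script name): python check_installer.py <file> <published-hash>
import hashlib
import sys
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks to limit memory use."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_hash):
    """True if the file's digest equals the published one (ignoring case and surrounding spaces)."""
    return sha256_of_file(path) == published_hash.strip().lower()


if __name__ == "__main__" and len(sys.argv) == 3:
    if matches_published_hash(sys.argv[1], sys.argv[2]):
        print("OK: checksum matches")
    else:
        print("MISMATCH: delete the file and download it again")
```

Reading the file in chunks keeps memory use small even for installers that are several hundred megabytes in size.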
For all operating systems, the easiest way to start working is with Anaconda Navigator, the graphical user interface.
Open Anaconda Navigator:
Windows: Open the Start Menu and search for "Anaconda Navigator".
macOS: Open your Applications folder and find "Anaconda-Navigator".
Linux: Open a terminal and type anaconda-navigator, then press Enter.
Wait for Navigator to Load: The first time you launch Navigator, it may take a few moments to initialize.
Launch Jupyter Notebook: On the Navigator home screen, you will see a grid of applications. Find the tile for Jupyter Notebook and click its Launch button.
A new tab will open in your default web browser, displaying the Jupyter file browser interface. From here, you can navigate to your project folders and create new notebooks. You are now ready to code!
This path offers a more lightweight and controlled setup, mirroring the practices of professional developers. It is built on two key components: Miniconda and virtual environments.
Miniconda is a minimal installer for conda. Unlike the full Anaconda distribution, it includes only Python, the conda command-line tool, and a few essential dependencies. Everything else, including Jupyter, you will install yourself as you need it. This results in a smaller, faster installation.
The real power of this approach comes from virtual environments. Imagine you are working on two different projects. Project A requires an older version of a specific library, while Project B needs the very latest version. Installing both on your main system could cause conflicts. A virtual environment solves this by acting as an isolated, self-contained workspace for each project. It is like having a separate, clean workbench for every task, each with its own specific set of tools and ingredients. This practice prevents conflicts, ensures your projects are reproducible by others, and keeps your main system clean.
These instructions are for the command line. Open your terminal application (Terminal on macOS/Linux, or Command Prompt/PowerShell on Windows).
Windows (using PowerShell):
These three commands download the installer, run it silently in the background, and then delete the installer file.
PowerShell
Invoke-WebRequest -Uri "https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe" -OutFile ".\miniconda.exe"
Start-Process -FilePath ".\miniconda.exe" -ArgumentList "/S" -Wait
Remove-Item ".\miniconda.exe"
The /S flag tells the installer to run in "silent" mode, accepting all default settings.
macOS:
First, identify your processor (Apple Silicon or Intel) as described in section 3.3. Then run the matching set of commands. They create a directory for Miniconda, download the correct installer script, run it in batch mode (-b runs without prompts, -u allows updating an existing installation, and -p sets the install location), and then remove the script.
For Apple Silicon (M1/M2/M3) Macs:
Bash
mkdir -p ~/miniconda3
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh -o ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm ~/miniconda3/miniconda.sh
For Intel-based Macs:
Bash
mkdir -p ~/miniconda3
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -o ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm ~/miniconda3/miniconda.sh
After running the script, you must initialize conda for your shell:
Bash
~/miniconda3/bin/conda init zsh # Or `bash` depending on your shell
Linux:
These commands are very similar to the macOS ones. They will download the installer, run it in batch mode, and clean up.
Bash
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm ~/miniconda3/miniconda.sh
After running the script, initialize conda for your shell:
Bash
~/miniconda3/bin/conda init bash # Or `zsh` depending on your shell
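Choosing among these installer files comes down to two facts: the operating system and the processor architecture. If some Python 3 is already available (for example, a system python3), that decision can be written out as a small sketch. The URLs follow exactly the pattern used in the commands above. The mapping table and the installer_url helper are our own, hypothetical names, and only the combinations shown in this guide are covered. Note that platform.machine() reports the architecture of the running Python build, so an Intel build of Python running under Rosetta on an Apple Silicon Mac reports x86_64.

```python
# Sketch: pick the Miniconda installer URL for the current machine, following
# the naming pattern of the URLs used in this guide.
import platform

BASE_URL = "https://repo.anaconda.com/miniconda/"

# (platform.system(), normalised platform.machine()) -> installer file name.
# Only the combinations shown in this guide are listed.
INSTALLERS = {
    ("Windows", "x86_64"): "Miniconda3-latest-Windows-x86_64.exe",
    ("Darwin", "arm64"): "Miniconda3-latest-MacOSX-arm64.sh",
    ("Darwin", "x86_64"): "Miniconda3-latest-MacOSX-x86_64.sh",
    ("Linux", "x86_64"): "Miniconda3-latest-Linux-x86_64.sh",
}


def installer_url(system=None, machine=None):
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    # Windows reports "AMD64" for 64-bit x86 processors; normalise it.
    if machine == "amd64":
        machine = "x86_64"
    try:
        return BASE_URL + INSTALLERS[(system, machine)]
    except KeyError:
        raise ValueError(f"No installer listed in this guide for {system}/{machine}") from None


if __name__ == "__main__":
    try:
        print(installer_url())
    except ValueError as err:
        print(err)
```

Keeping the table explicit makes it obvious which platforms were considered; anything else raises an error instead of silently guessing.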
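For all operating systems, close and reopen your terminal after installation. On Windows, open the "Anaconda Prompt (miniconda3)" shortcut from the Start Menu; to make conda available in PowerShell as well, run conda init powershell there once and then reopen PowerShell. To verify that it worked, type conda --version. You should see the installed conda version number printed back to you.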
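The same check can be sketched in Python with the standard shutil.which function, which searches PATH much as your shell does. The function name conda_version is hypothetical. Run it with a Python started from the freshly reopened terminal, since a program inherits the PATH of the terminal that launched it.

```python
# Sketch: check from Python whether the `conda` command can be found and run,
# mirroring `conda --version` in a fresh terminal.
import shutil
import subprocess


def conda_version():
    """Return conda's version line (e.g. 'conda 24.x.y'), or None if it is unavailable."""
    exe = shutil.which("conda")  # full path of the first `conda` found on PATH
    if exe is None:
        return None  # typically: terminal not reopened, or `conda init` not run
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    if result.returncode != 0:
        return None  # conda was found but failed to run
    return result.stdout.strip()


if __name__ == "__main__":
    print(conda_version() or "conda not found on PATH")
```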
With Miniconda installed, you will manage your environments and packages using conda commands in the terminal. Here are the essentials.
Creating an environment: To create a new, isolated environment for a project, use the conda create command. It is a best practice to specify the Python version you want to use.
conda create --name my-data-project python=3.11
This creates an environment named my-data-project with Python version 3.11.
Activating an environment: Before you can use an environment, you must "activate" it. This tells your terminal to use the Python and packages from that specific environment.
conda activate my-data-project
Your terminal prompt will change to show the name of the active environment, like (my-data-project).
Installing packages: Packages should always be installed into an active environment.
conda install jupyter pandas numpy matplotlib
This command installs Jupyter and several key data science libraries into your active my-data-project environment.
Listing environments: To see all the environments you have created:
conda env list or conda info --envs
The active environment will be marked with an asterisk (*).
Deactivating an environment: When you are finished working on a project, you can deactivate its environment to return to the previously active environment (usually base).
conda deactivate
The (my-data-project) prefix will disappear from your prompt.
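To confirm from inside Python which environment is active, you can read the environment variables that conda activate sets: CONDA_DEFAULT_ENV holds the name shown in your prompt and CONDA_PREFIX holds the environment's folder. This is a minimal sketch; conda_env_info is a hypothetical helper. Run it with python after conda activate my-data-project, then again after conda deactivate, and compare the output.

```python
# Sketch: report which conda environment this Python process is running in.
# `conda activate` sets CONDA_DEFAULT_ENV and CONDA_PREFIX; `conda deactivate`
# resets them to the previous environment (or unsets them).
import os
import sys


def conda_env_info():
    return {
        "active_env": os.environ.get("CONDA_DEFAULT_ENV"),  # e.g. 'my-data-project', or None
        "env_prefix": os.environ.get("CONDA_PREFIX"),  # folder of the active environment
        "python": sys.executable,  # interpreter actually running this code
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
    }


if __name__ == "__main__":
    for key, value in conda_env_info().items():
        print(f"{key:15} {value}")
```

If python points somewhere outside the environment's folder, the environment was probably not activated in that terminal.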
The following table provides a quick reference for these fundamental commands.
Command | Description | Example |
---|---|---|
conda create | Creates a new, isolated environment. | conda create --name myenv python=3.11 |
conda activate | Enters a specific environment to use its tools. | conda activate myenv |
conda deactivate | Exits the current environment. | conda deactivate |
conda install | Installs one or more packages into the active environment. | conda install jupyter scikit-learn |
conda list | Lists all packages installed in the active environment. | conda list |
conda env list | Lists all available conda environments on your system. | conda env list |
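If you need the environment list inside a script rather than on screen, conda can print its information as JSON with the --json flag. The sketch below assumes that the output of conda info --json contains an envs list of folders and a root_prefix entry for the base installation, which matches the conda versions we are aware of. env_names is a hypothetical helper, and naming environments after their folder relies on conda's default .../envs/<name> layout.

```python
# Sketch: turn `conda info --json` output into a {name: folder} mapping,
# similar to what `conda env list` / `conda info --envs` print on screen.
import json
import shutil
import subprocess
from pathlib import PurePath


def env_names(info_json):
    """Map environment names to their folders.

    Assumes the JSON has an "envs" list of folders and, optionally, a
    "root_prefix" naming the base installation (as `conda info --json` does).
    """
    info = json.loads(info_json)
    root = info.get("root_prefix")
    names = {}
    for prefix in info["envs"]:
        if prefix == root:
            names["base"] = prefix
        elif PurePath(prefix).parent.name == "envs":
            names[PurePath(prefix).name] = prefix  # default layout: .../envs/<name>
        else:
            names[prefix] = prefix  # no short name (e.g. created with --prefix)
    return names


if __name__ == "__main__":
    conda = shutil.which("conda")
    result = subprocess.run([conda, "info", "--json"], capture_output=True, text=True) if conda else None
    if result is None or result.returncode != 0:
        print("Could not run `conda info --json`")
    else:
        for name, prefix in env_names(result.stdout).items():
            print(f"{name:20} {prefix}")
```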
The final step in the power-user path is to integrate your Miniconda environments with a professional code editor like Visual Studio Code (VS Code). This provides a powerful, unified interface for writing code, running notebooks, and debugging.
Install VS Code: If you do not already have it, download and install VS Code from its official website: code.visualstudio.com.
Install the Python Extension: Launch VS Code. In the left-hand sidebar, click the Extensions icon (it looks like four squares). In the search bar, type Python and install the official extension from Microsoft. This extension provides Python language support, linting, and debugging.
Open Your Project Folder: In VS Code, go to File > Open Folder... and select the directory where you will store your project files and notebooks.
Install the Jupyter Kernel Bridge (CRITICAL STEP): For VS Code to recognize your conda environment as a runnable Jupyter "kernel," you must install a special bridge package called ipykernel inside that environment. This package creates a configuration file that acts like a signpost, telling VS Code and other Jupyter tools where to find your environment's Python executable.
Open your system's terminal (not the one inside VS Code yet).
Activate the environment you created earlier: conda activate my-data-project
Install the kernel package: conda install ipykernel
This is the most common step that users miss, so ensure it is completed.
Select the Python Interpreter in VS Code:
Now, inside VS Code, open the Command Palette using the shortcut Ctrl+Shift+P (on Windows/Linux) or Cmd+Shift+P (on macOS).
Start typing Python: Select Interpreter and select that command from the list.
A list of available Python interpreters will appear. Choose the one that corresponds to your conda environment. It will be labeled with your environment's name, for example: Python 3.11.x ('my-data-project': conda).
Create and Run Your Notebook:
Create a new file in VS Code named analysis.ipynb. The Jupyter Notebook interface will open automatically. If VS Code asks you to install the Jupyter extension from Microsoft, accept.
In the top-right corner of the notebook editor, click the Select Kernel button.
From the list that appears, select the kernel that matches your conda environment.
In the first code cell, type import pandas as pd and run it with Shift + Enter.
If the cell runs without any errors, your professional development environment is fully configured.
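Beyond the pandas import, a cell like the following can confirm that the notebook's kernel really is your environment's interpreter. It is a sketch that assumes conda's default layout, in which named environments live in an envs folder (for example ~/miniconda3/envs/my-data-project); kernel_env_name is a hypothetical helper.

```python
# Sketch: run in a notebook cell to see which interpreter and environment the kernel uses.
import sys
from pathlib import Path


def kernel_env_name(prefix=sys.prefix):
    """Return the conda environment name for an interpreter prefix, or None.

    Assumes the default layout <conda root>/envs/<name>; the base environment
    is not inside an envs folder, so it returns None.
    """
    p = Path(prefix)
    return p.name if p.parent.name == "envs" else None


print("Interpreter:", sys.executable)
print("Environment:", kernel_env_name() or "not a named conda env (base or another Python)")
```

For the setup in this guide, the second line should name my-data-project; anything else suggests the wrong kernel is selected.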
conda: command not found (macOS/Linux) or 'conda' is not recognized... (Windows): This is the most common issue. It almost always means one of two things: the installation finished but you did not close and reopen your terminal, or the conda init step was skipped or failed.
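Solution: First, close your terminal window completely and open a new one. If the problem persists, initialize conda manually by running conda init with the full path to the conda executable in your installation directory, for example ~/miniconda3/bin/conda init bash (use ~/anaconda3 for Anaconda, and zsh instead of bash if that is your shell). On Windows, open the Anaconda Prompt from the Start Menu and run conda init from there.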
Permission Errors during Installation (macOS/Linux): If you see errors related to permissions, it may be because you are trying to install into a system-protected directory.
Solution: It is always best to install Anaconda/Miniconda within your user's home directory, as this does not require special permissions. If you must install elsewhere, you may need to run the installation script with sudo, but this should be avoided if possible.
VS Code Cannot Find My Jupyter Kernel: If you have created a .ipynb file in VS Code but your conda environment does not appear in the "Select Kernel" list, run through this checklist:
Did you install the ipykernel package inside the specific conda environment you want to use?
Did you select the correct Python interpreter in VS Code (using Python: Select Interpreter) that points to your conda environment?
Have you tried restarting VS Code after installing ipykernel and selecting the interpreter?
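To check the first item on that checklist directly, you can look for the kernel specification (a kernel.json file) that ipykernel adds to an environment. This sketch assumes the layout used by recent ipykernel packages, with the file under <environment>/share/jupyter/kernels/; kernelspecs_in is a hypothetical helper. Run it with the environment's own Python, after conda activate my-data-project.

```python
# Sketch: list the Jupyter kernel specifications (kernel.json files) bundled
# inside an environment, to check whether ipykernel's kernelspec is present.
import json
import sys
from pathlib import Path


def kernelspecs_in(prefix=sys.prefix):
    """Return {kernel_folder_name: display_name} for <prefix>/share/jupyter/kernels/*/kernel.json."""
    specs = {}
    kernels_dir = Path(prefix) / "share" / "jupyter" / "kernels"
    for spec_file in sorted(kernels_dir.glob("*/kernel.json")):  # empty if the folder is missing
        data = json.loads(spec_file.read_text(encoding="utf-8"))
        specs[spec_file.parent.name] = data.get("display_name", "?")
    return specs


if __name__ == "__main__":
    found = kernelspecs_in()
    print(found or f"No kernelspecs under {sys.prefix}; is ipykernel installed here?")
```

An empty result is a strong hint that ipykernel was installed into a different environment, or not at all.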
The warning in the Windows installation section about not adding Anaconda to the PATH is critical. But what is the PATH?
Think of the PATH as your computer's "speed dial" list for command-line programs. When you type a command like python into a terminal, your operating system doesn't search your entire hard drive. Instead, it looks through a specific list of folders, the ones defined in your PATH variable, in order, and runs the first program with that name that it finds.
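That lookup rule can be modeled in a few lines. The sketch below is a simplified, POSIX-style model (it ignores the extra file extensions from PATHEXT that Windows also tries); find_on_path is a hypothetical name. In practice, the standard library's shutil.which performs this search, including the Windows handling.

```python
# Sketch: simplified model of how a shell finds a command. Walk the PATH
# folders in order and return the first executable file with that name.
# POSIX-style only: Windows also tries extensions listed in PATHEXT (.exe, .bat, ...).
import os


def find_on_path(command, path_value=None):
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    for folder in path_value.split(os.pathsep):
        candidate = os.path.join(folder, command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate  # first match wins; later folders are never checked
    return None


if __name__ == "__main__":
    print(find_on_path("python3"))
```

Because the first match wins, whichever folder appears earlier in PATH decides which python you get.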
The problem is that your computer might already have other versions of Python installed. If you add Anaconda's folder to this permanent list, you might change which python gets called by default. An application expecting the system's Python might accidentally get Anaconda's Python, leading to unpredictable crashes and errors.
The modern, safe approach used by conda is not to modify this global list permanently. Instead, when you run conda activate my-environment, it temporarily adds the environment's folder to the front of the PATH for that one terminal session only. When you run conda deactivate, it removes it again. This avoids system-wide conflicts and is why following the installer's recommendation is so important.
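Here is a toy model of that idea, reduced to string operations on a PATH value. It is a simplification: real conda may add more than one folder (especially on Windows) and also sets variables such as CONDA_PREFIX. The helper names activate and deactivate and the example folders are hypothetical.

```python
# Sketch: toy model of what `conda activate` / `conda deactivate` do to PATH
# for one terminal session. Only the ordering idea is shown here.
import os


def activate(path_value, env_bin):
    """Put the environment's folder first, so its programs are found first."""
    return env_bin + os.pathsep + path_value


def deactivate(path_value, env_bin):
    """Remove the environment's folder again, restoring the original lookup order."""
    return os.pathsep.join(p for p in path_value.split(os.pathsep) if p != env_bin)


if __name__ == "__main__":
    system_path = os.pathsep.join(["/usr/local/bin", "/usr/bin"])
    env_bin = "/home/you/miniconda3/envs/my-data-project/bin"  # hypothetical location
    active = activate(system_path, env_bin)
    print(active.split(os.pathsep)[0])  # the environment's folder is searched first
    print(deactivate(active, env_bin) == system_path)  # original PATH restored
```

Since only the PATH of the current session changes, other terminals and applications keep seeing the system's original lookup order.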
Like any software, it is good practice to periodically update your conda installation and the packages within your environments.
To update the conda tool itself (conda lives in the base environment, so target it explicitly with -n base):
conda update -n base conda
To update all packages in your currently active environment:
conda update --all
To update a single, specific package:
conda update pandas