What is colab-ssh?

We'll be using a package called colab-ssh. It uses either Cloudflare or Ngrok to tunnel an SSH connection into a Colab instance.

This is NOT the same as running VSCode in your browser, which is the approach colabcode takes (it runs code-server). For now, I much prefer colab-ssh because it lets me use my local VSCode rather than one in the browser.

I'll be using colab-ssh for my own projects to see how it goes. It's a cheap way to do deep learning, but I'm not yet certain whether errors and timeouts will bug me enough to stop using it. I think it'll be fine, though! I'll likely use it mainly for hyperparameter sweeps and other experiments, which I think is the ideal use for it.

Now, let's get started. First we need to run code in Colab.

Code we need to run in Colab

First, we can mount our Google Drive so that we have access to any files or data we need:

from google.colab import drive
drive.mount("/content/drive")

This part is optional, but you can load a PASSWORD and GITHUB_ACCESS_TOKEN from a .env file stored in your Google Drive (here, in a vscode-ssh folder):

!pip install python-dotenv --quiet
import dotenv
import os
dotenv.load_dotenv(os.path.join('/content/drive/MyDrive/vscode-ssh', '.env'))
password = os.getenv('PASSWORD')
github_access_token = os.getenv('GITHUB_ACCESS_TOKEN')
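The .env file itself is just KEY=value lines, e.g. PASSWORD=... and GITHUB_ACCESS_TOKEN=... on separate lines. One catch is that os.getenv quietly returns None when a variable is missing, so a typo in the .env file only shows up later (for example as the username/password prompt covered under Troubleshooting). Here's a minimal sketch of a fail-fast check; require_env is a hypothetical helper, not part of python-dotenv or colab-ssh.

```python
import os
from typing import Dict


def require_env(*names: str) -> Dict[str, str]:
    """Return the named environment variables, raising if any are missing or empty.

    Call this after dotenv.load_dotenv(...) so the .env values are already in os.environ.
    """
    missing = [name for name in names if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}


# Usage in the Colab cell, right after load_dotenv:
# secrets = require_env("PASSWORD", "GITHUB_ACCESS_TOKEN")
# password = secrets["PASSWORD"]
# github_access_token = secrets["GITHUB_ACCESS_TOKEN"]
```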

Next, we add the URL of the GitHub repo we'd like to work on (without a trailing .git, since that gets appended when the repo is cloned):

git_repo = '<link_to_git_repo>'
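The init_git_cloudflared call further down builds the clone URL as git_repo + ".git", so pasting a URL that already ends in .git (or has a trailing slash) would produce a broken address. A small normalizing helper avoids that; to_clone_url is a hypothetical name, and this is a sketch rather than anything colab-ssh provides.

```python
def to_clone_url(repo_url: str) -> str:
    """Normalize a GitHub repo URL so it ends in exactly one '.git'."""
    url = repo_url.strip().rstrip("/")  # drop stray whitespace and trailing slashes
    if not url.endswith(".git"):
        url += ".git"
    return url


# Usage: pass the normalized URL instead of git_repo + ".git"
# init_git_cloudflared(repository_url=to_clone_url(git_repo), ...)
```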

Now we can install colab-ssh and import it:

!pip install colab_ssh --upgrade --quiet
from colab_ssh import launch_ssh_cloudflared, init_git_cloudflared

Finally, we launch the SSH connection and clone our GitHub repo:

launch_ssh_cloudflared(password)
init_git_cloudflared(repository_url=git_repo + ".git",
                     personal_token=github_access_token,
                     branch="main",
                     email="<email_for_github>",
                     username="<github_username>")

Setting up Cloudflared

After that, you will get the following output:

[Screenshot: colab-ssh output]

As it says under "Client machine configuration", you will need to download "cloudflared (Argo Tunnel)" for your OS. I use a Mac, so that's the version I downloaded. I grabbed the latest release directly instead of using brew install, since that was faster.

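Anyway, go here and download the binary. If it comes as a .tgz archive (as it does on Mac), untar it; on Windows, the download is already an .exe. Then place the cloudflared file in whatever local path you prefer and note its absolute path, since you'll need it for the SSH config.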
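If you'd rather script the extraction on Mac or Linux, here is a minimal Python sketch. It assumes the download is a gzipped tarball with the cloudflared binary at the archive root (check your archive); extract_cloudflared is a hypothetical helper, and the path it returns is the absolute path to put in ProxyCommand.

```python
import stat
import tarfile
from pathlib import Path


def extract_cloudflared(archive: str, dest_dir: str) -> str:
    """Extract cloudflared from a .tgz, make it executable, and return its absolute path."""
    dest = Path(dest_dir).expanduser().resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(Path(archive).expanduser()), "r:gz") as tar:
        # Assumption: the binary sits at the archive root; raises KeyError otherwise.
        member = tar.getmember("cloudflared")
        # Use the safer 'data' extraction filter when this Python version supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extract(member, path=str(dest), **kwargs)
    binary = dest / "cloudflared"
    binary.chmod(binary.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return str(binary)


# Usage (archive name is an assumption; use whatever you downloaded):
# print(extract_cloudflared("~/Downloads/cloudflared-darwin-amd64.tgz", "~/bin"))
```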

Setup in VSCode

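Download Remote - SSH: in VSCode, open the Command Palette (CTRL+SHIFT+P, or CMD+SHIFT+P on Mac), then search for and click on "Extensions: Install Extensions". In the Extensions view, search for "Remote - SSH" and install it. (You can also open the Extensions view directly with CTRL+SHIFT+X.)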

Now that we have Remote - SSH, open the Command Palette again (CTRL+SHIFT+P), then search for and click on "Remote - SSH: Open SSH Configuration File". When asked which file to open, choose ~/.ssh/config, then paste the following into it:

Host *.trycloudflare.com
    HostName %h
    User root
    Port 22
    ProxyCommand <PUT_THE_ABSOLUTE_CLOUDFLARE_PATH_HERE> access ssh --hostname %h

I'm assuming the port is 22 for everyone. If you have a different port, you can change it based on the output you received.
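If you set this up on more than one machine, you could generate the block above instead of editing it by hand. This is a minimal sketch: both helper names are hypothetical, it looks up cloudflared on your PATH when you don't pass a path, and it only appends the block if an identical one isn't already in the file. Port defaults to 22, matching the assumption above.

```python
import shutil
from pathlib import Path
from typing import Optional


def cloudflared_ssh_config(cloudflared_path: Optional[str] = None, port: int = 22) -> str:
    """Build the ~/.ssh/config block for *.trycloudflare.com hosts."""
    path = cloudflared_path or shutil.which("cloudflared")
    if path is None:
        raise FileNotFoundError("cloudflared not found on PATH; pass its absolute path")
    path = str(Path(path).expanduser().resolve())  # ProxyCommand needs an absolute path
    return (
        "Host *.trycloudflare.com\n"
        "    HostName %h\n"
        "    User root\n"
        f"    Port {port}\n"
        f"    ProxyCommand {path} access ssh --hostname %h\n"
    )


def append_ssh_config(block: str, config_path: str = "~/.ssh/config") -> None:
    """Append the block to the SSH config unless an identical block is already present."""
    config = Path(config_path).expanduser()
    config.parent.mkdir(parents=True, exist_ok=True)
    existing = config.read_text() if config.exists() else ""
    if block not in existing:
        separator = "\n" if existing and not existing.endswith("\n") else ""
        with config.open("a") as f:
            f.write(separator + block)


# Usage:
# append_ssh_config(cloudflared_ssh_config("/absolute/path/to/cloudflared"))
```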

Now, save the config file and copy the "VSCode Remote SSH" hostname from the Colab output. Then open the Command Palette, click on "Remote - SSH: Connect to Host...", and paste the hostname into the text box.

A new VSCode window should open.

Click Continue when a pop-up about the host's fingerprint appears, then type in the password you passed to launch_ssh_cloudflared. You are now fully connected via SSH!

You can now open your GitHub repository via "Open Folder" in the Explorer. I haven't figured out how to change the repository location yet, so for now you'll need to click on .. to leave /root/, then click on content, and your repository should be there.
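Since the clone ends up under /content, you can also work out the full folder path from the repo URL and type it straight into the Open Folder box instead of clicking through. A minimal sketch, assuming the folder is named after the repository (which is what I've seen); colab_repo_dir is a hypothetical helper.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def colab_repo_dir(repo_url: str, root: str = "/content") -> str:
    """Guess where the clone lands on the Colab instance: <root>/<repo-name>."""
    name = PurePosixPath(urlparse(repo_url.strip()).path.rstrip("/")).name
    if name.endswith(".git"):
        name = name[: -len(".git")]  # the folder name has no .git suffix
    return str(PurePosixPath(root) / name)


# Usage:
# print(colab_repo_dir(git_repo))
```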

You'll also find some cloudflared files added to the root of your repository; you can add them to your .gitignore file.
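If you'd like to do that from the VSCode terminal each session, here's a small idempotent sketch that only appends patterns that aren't already listed. ensure_gitignore is a hypothetical helper, and since the exact cloudflared file names aren't listed here, the pattern in the usage line is a placeholder to replace with the files you actually see.

```python
from pathlib import Path
from typing import List


def ensure_gitignore(repo_dir: str, patterns: List[str]) -> List[str]:
    """Append any missing patterns to <repo_dir>/.gitignore and return the ones added."""
    gitignore = Path(repo_dir) / ".gitignore"
    text = gitignore.read_text() if gitignore.exists() else ""
    existing = set(text.splitlines())
    # dict.fromkeys removes duplicates while keeping the original order.
    to_add = [p for p in dict.fromkeys(patterns) if p not in existing]
    if to_add:
        prefix = "\n" if text and not text.endswith("\n") else ""
        with gitignore.open("a") as f:
            f.write(prefix + "\n".join(to_add) + "\n")
    return to_add


# Usage (placeholder pattern; substitute the real file names):
# ensure_gitignore("/content/<repo-name>", ["<cloudflared-file-name>"])
```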

Additional Tips to Get Started Quickly

Quick Package Installation

Once everything is set up, you just need to click Run All in Colab, and it goes pretty fast. However, you will still need to reinstall all your packages every time you start a new Colab session, since Colab instances are ephemeral.

I suggest either creating a requirements.txt or environment.yml file, or using a tool like Poetry, so you can get up and running quickly.

Note for Conda: Colab doesn't come with Conda, so you need to run some extra code to get access to it. Follow the tutorial here if you really want to use Conda. Personally, I'd recommend against it since it takes longer to install. Try pip, pip-tools, or Poetry instead.
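If you switch between projects that use different dependency files, a tiny helper can pick the install command for you. This is a sketch with hypothetical names; it assumes a pyproject.toml means Poetry (not always true), and the order in which files are checked is an arbitrary choice.

```python
import subprocess
from pathlib import Path
from typing import List


def install_command(project_dir: str) -> List[str]:
    """Choose an install command based on which dependency file the project has."""
    project = Path(project_dir)
    if (project / "poetry.lock").exists() or (project / "pyproject.toml").exists():
        return ["poetry", "install"]
    if (project / "requirements.txt").exists():
        return ["pip", "install", "-r", "requirements.txt"]
    if (project / "environment.yml").exists():
        return ["conda", "env", "update", "--file", "environment.yml"]
    raise FileNotFoundError(f"No dependency file found in {project_dir}")


def install(project_dir: str) -> None:
    """Run the chosen command from inside the project directory."""
    subprocess.run(install_command(project_dir), cwd=project_dir, check=True)
```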

In my case, I create a Makefile for every project, so I only need to run make poetry in the terminal. To create a Makefile, add a file called Makefile to your project directory with the following (or whatever installation commands your dependency manager needs). Note that the indented recipe lines must start with a tab, not spaces:

poetry:
	pip install poetry
	poetry install

Of course, you can use whatever package manager you prefer.

And that's it! You are now ready to start coding!

Use only one Colab Notebook

To avoid creating a notebook for every project, do the following two things:

  1. Do your package installations in VSCode rather than Colab. That way, you only install the packages the current project needs.

  2. Create a cell in your Colab notebook with the URLs of your GitHub repositories, each written as git_repo = "git_repo_url". Just comment out the ones you don't want and uncomment the one you do.

This might sound obvious, but when I started out, I was installing packages via Colab!
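As a variation on commenting lines in and out, you could keep every repo in one dictionary and change a single key per session. A minimal sketch: the project keys and URLs are hypothetical placeholders, and pick_repo is a hypothetical helper that gives a clearer error for a mistyped key.

```python
from typing import Dict

# Hypothetical registry of projects; replace the placeholders with your own repos.
GIT_REPOS: Dict[str, str] = {
    "sweeps": "https://github.com/<github_username>/<sweeps-repo>",
    "thesis": "https://github.com/<github_username>/<thesis-repo>",
}


def pick_repo(project: str, repos: Dict[str, str] = GIT_REPOS) -> str:
    """Return the repo URL for a project key, listing the valid keys on a typo."""
    try:
        return repos[project]
    except KeyError:
        raise KeyError(f"Unknown project {project!r}; choose one of {sorted(repos)}") from None


PROJECT = "sweeps"  # the only line you change between sessions
git_repo = pick_repo(PROJECT)
```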

Troubleshooting

You are Asked for Username and Password

If you are asked for a username and password after launching the SSH connection, you are not passing your GitHub personal access token to init_git_cloudflared. Make sure to do that.

You can set up a GitHub personal access token by clicking your profile icon in the top right on GitHub, clicking "Settings", scrolling down and clicking "Developer settings", and then clicking "Personal access tokens". Generate a new token and pass it to init_git_cloudflared.

If you get: "Could not establish connection to..."

This could mean a few things, so I'll go over the ones I encountered:

1: Your Remote - SSH config file is not correct.

Go to "Remote - SSH: Settings" and make sure that you are pointing to the correct config file, as shown below:

[Screenshot: Remote - SSH config file setting]

2: Colab is still running init_git_cloudflared (waiting for a username and password) because you did not pass it a valid personal access token.

No Access to GPU?

Don't forget to go to Runtime > Change Runtime Type and select "GPU" in Colab!
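To confirm the GPU is actually visible once you're connected, you can run a quick check from the VSCode terminal or a Python file. This sketch relies on NVIDIA's nvidia-smi tool (its -L flag lists GPUs) being on the PATH, which is a heuristic rather than a guarantee that your framework will use the GPU; gpu_info is a hypothetical helper. Changing the runtime type starts a fresh instance, so expect to rerun the setup cells afterwards.

```python
import shutil
import subprocess
from typing import Optional


def gpu_info() -> Optional[str]:
    """Return nvidia-smi's GPU list, or None if no NVIDIA driver is visible."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    return result.stdout.strip() if result.returncode == 0 else None


if __name__ == "__main__":
    print(gpu_info() or "No GPU found: check Runtime > Change runtime type in Colab")
```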

Can't Find Repository?

This may happen if you ran the code with one repository and then rerun it with a different one. To resolve this, just factory reset your Colab runtime and rerun the code.

That's It!

If you have any questions, let me know! Or better yet, go to the colab-ssh repo and ask there!

If you liked this post, follow me on Twitter for more content like this! And make sure to let me know what kind of content you'd like to see more of!