Introducing jupyckage: Create Python Packages from Notebooks in One Line of Code

If you work in Jupyter often, you probably find yourself sharing code between notebooks. Whether it’s creating plots, processing data, or running pipelines, replication is a part of data science. If you’re aiming to keep DRY (don’t repeat yourself), keep your code in sync across notebooks, or ease the transition from dev towards reusable code, jupyckage can help. It’s a little package that makes for a big workflow upgrade.

  1. TLDR;
  2. Existing Options and Why I Don’t Love Them
    1. Execute One Notebook Inside Another Notebook
    2. NBConvert and Import
  3. A New, Slicker Solution: jupyckage
    1. Pip Install
    2. Create and Import the Package
    3. You Can Leave Stuff Out of the Module
    4. Directories and Files Created
    5. Create a Package from a Collection of Notebooks
  4. More Info

TLDR;

If you want to convert your jupyter notebook to a python package,

pip install jupyckage
import jupyckage.jupyckage as jp
jp.notebook_to_package("<notebook_name.ipynb>")

OR

jupyckage --nb <notebook_name.ipynb>

will allow you to import your notebook as

import notebooks.src.<notebook_name>.<notebook_name>

Existing Options and Why I Don’t Love Them

Execute One Notebook Inside Another Notebook

You could use some magic to import that code, %run notebook_name.ipynb, but that’ll execute your notebook, which in data science is often not desirable. Here are two reasons, in addition to the ones I’ll give for nbconvert:

  • If your notebook trains a model, downloads data, saves a file, or executes any long processes, you probably don’t want to run your notebook just to access its functions.
  • It requires % magic which is great in notebooks, but not useful otherwise.

I’ve rarely been in a situation in which this was a viable option.

NBConvert and Import

You could use nbconvert to create a python executable and then import that file. I don’t love this solution because

  • It creates a mess in my directory that I inevitably have to clean up. As soon as I do clean it up (i.e. organize the modules into directories), I have to do the work of creating a package anyways (i.e. __init__.py files and appropriate structure) in order to import the modules.
  • Usually I’m working towards reusable code, which means I’m going to have to create a package from the modules anyways. nbconvert doesn’t help with this.
  • It either must be done in the terminal, which interrupts workflow, or requires bang (!) magic, which is great while working in a notebook but not great when you move past it.
  • Everything in my notebook is converted to the module. Which means I have to be “done” or “done-ish” with that notebook to convert it.

These were big enough problems for long enough that I finally pulled together a simple solution.

A New, Slicker Solution: jupyckage

jupyckage creates (local) python packages from notebooks in one line of code.

Pip Install

The package is up on PyPI: just pip install jupyckage.

Create and Import the Package

To create a package from your notebook, import jupyckage.jupyckage as jp and simply run

jp.notebook_to_package("<notebook_name.ipynb>")

Or you can run this via terminal

jupyckage --nb <notebook_name.ipynb>

Either will create a local package for you, which you can import as

import notebooks.src.<notebook_name>.<notebook_name>

Yes, the import statement is long, but it should tab complete for you (either in jupyter or an IDE).

Also, your notebook name will be converted to lower case, with any spaces replaced by _. But if your notebook name contains any disallowed characters, there will be an error.
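As a rough illustration of that renaming rule, here is a small sketch. It is not jupyckage's actual code: module_name_from_notebook is a name made up for this example, and it assumes that a "disallowed character" means anything that stops the result from being a valid Python identifier.

```python
from pathlib import Path


def module_name_from_notebook(notebook_path: str) -> str:
    """Hypothetical helper: guess the module name for a notebook file."""
    stem = Path(notebook_path).stem          # "My Notebook.ipynb" -> "My Notebook"
    name = stem.lower().replace(" ", "_")    # lower case, spaces -> underscores
    # Assumption: a "disallowed character" is anything that keeps the name
    # from being a valid Python identifier (e.g. "-" or a leading digit).
    if not name.isidentifier():
        raise ValueError(f"{name!r} is not a valid module name")
    return name


print(module_name_from_notebook("My Notebook.ipynb"))  # my_notebook
```

A name like my-notebook.ipynb would raise a ValueError in this sketch, so it is worth sticking to letters, digits, spaces, and underscores when naming notebooks you plan to convert.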

I recommend importing the package as something convenient, e.g. as abv where abv is your chosen abbreviation. As usual, you can access the functions and objects defined in your notebook via abv.<your_function>() and abv.<your_object>.

You Can Leave Stuff Out of the Module

Nice! If you are still working in your notebook you probably don’t want all of it accessible in a module. You can add a MD cell with the contents

# DO NOT ADD BELOW TO SCRIPT

and nothing below will be added to the module.
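To make the idea concrete, here is a minimal sketch of how a cutoff marker like this can work on a notebook's JSON. It is an illustration only, not how jupyckage does it internally: code_above_marker and the tiny demo notebook are invented for this example.

```python
import json

MARKER = "# DO NOT ADD BELOW TO SCRIPT"


def code_above_marker(notebook_json: str, marker: str = MARKER) -> str:
    """Hypothetical helper: join the code cells that appear before the marker cell."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook["cells"]:
        source = cell["source"]
        # The notebook format stores "source" as a string or a list of lines
        text = source if isinstance(source, str) else "".join(source)
        if cell["cell_type"] == "markdown" and marker in text:
            break  # everything from here down stays out of the module
        if cell["cell_type"] == "code":
            chunks.append(text)
    return "\n\n".join(chunks)


# A tiny in-memory notebook, made up for this example
demo = json.dumps({
    "cells": [
        {"cell_type": "code", "source": ["def add(a, b):\n", "    return a + b"]},
        {"cell_type": "markdown", "source": [MARKER]},
        {"cell_type": "code", "source": ["add(1, 2)  # scratch work"]},
    ]
})

print(code_above_marker(demo))
```

Only the add function survives; the scratch cell below the marker is left behind.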

Directories and Files Created

You’ll also notice that the file structure below has been created.

notebooks/
├── src/
│   └── <notebook_name>/
│       ├── __init__.py
│       └── <notebook_name>.py
└── bin/
    └── <notebook_name>.py #executable
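To see why this layout makes the long import work, here is a self-contained sketch that rebuilds the src/ part by hand in a temporary directory and imports from it. The notebook name my_notebook and the greet function are placeholders. The sketch also assumes, going by the tree above, that notebooks/ and src/ have no __init__.py of their own, in which case Python treats them as namespace packages.

```python
import importlib
import sys
import tempfile
from pathlib import Path

name = "my_notebook"  # hypothetical notebook name

with tempfile.TemporaryDirectory() as root:
    # Recreate notebooks/src/<notebook_name>/ with its two files
    package_dir = Path(root) / "notebooks" / "src" / name
    package_dir.mkdir(parents=True)
    (package_dir / "__init__.py").touch()
    (package_dir / f"{name}.py").write_text("def greet():\n    return 'hello'\n")

    sys.path.insert(0, root)       # stand-in for the directory you are working in
    importlib.invalidate_caches()
    try:
        # Same as: import notebooks.src.my_notebook.my_notebook as abv
        abv = importlib.import_module(f"notebooks.src.{name}.{name}")
        greeting = abv.greet()
    finally:
        sys.path.remove(root)

print(greeting)  # hello
```

In a real project you would not touch sys.path; starting Jupyter or Python from the directory that contains notebooks/ is enough.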

Create a Package from a Collection of Notebooks

If you want to convert other notebooks in the same directory, they’ll show up alongside your first one.

notebooks/
├── src/
│   ├── <notebook_name>/
│   │   ├── __init__.py
│   │   └── <notebook_name>.py
│   ├── <notebook_name2>/
│   │   ├── __init__.py
│   │   └── <notebook_name2>.py
│   └── <notebook_name3>/
│       ├── __init__.py
│       └── <notebook_name3>.py
└── bin/
    ├── <notebook_name>.py
    ├── <notebook_name2>.py
    └── <notebook_name3>.py

More Info

If you want to learn more, request a feature, report a bug, or contribute–

Thanks for reading. Hope this helps and have a great day!

Quick Little Hack: Camera-Ready Scripts and LaTeX From Jupyter Notebooks

I’m rounding that corner of my research where the end seems to be in sight, which also means that properly formatting my work for publication can’t be put off too much longer. Depending on what I’m coding, I’m either in PyCharm, Sublime, or Jupyter. My dissertation is, of course, in LaTeX (…as well as on my whiteboard, scattered printer paper, post-its, long forgotten notebooks, and the margins of so many journal papers). If my work is already in a script, then getting it to LaTeX is a snap. But if it’s written in Jupyter, it needs a little love before it’s fit for prime-time.

From Script to LaTeX

There’s a nice little package for LaTeX that imports your python script (your_script.py) and presents it neatly

\usepackage{listings}

and you can include your script via

\lstinputlisting[language=Python]{path/to/your_script.py}

It will appear nicely formatted with appropriate bolding, etc.

That’s great if your scripts already look nice. And sure, some of mine do. But I’ve built a few simulations in Jupyter notebooks.

From Jupyter Notebook to Script

There is a convenient means to create a python executable from your notebook. You can run this in your terminal

jupyter nbconvert --to script path/to/your_notebook.ipynb

This will produce a your_notebook.py executable. It’s fine if you just want to run it, but it’s not all that great to look at. It includes cell numbers, awkward spacing, and just looks a bit messy. It’s not what I want in my dissertation. And I’m certainly not going to open up each file and tidy them up manually–especially not every time I tweak and edit my code. I know I will eventually forget to export and tidy the updated script. So, let’s just make it mindless.

From Jupyter Notebook to Script-Fit-For-LaTeX

Using a little magic(!) and a bit of processing, I added these lines to the end of my notebooks. (Don’t worry–I’ve also added the code below for copy-paste purposes.)

Yes, these are my actual file paths. (Proof I actually use this myself!) Don’t worry, I’ve replaced them with generic ones in the code below.

NOTE: The first cell is markdown (the rest are python cells):

# DO NOT ADD BELOW TO SCRIPT

The next two cells are magic-ed (the ‘!’ at the beginning). The first converts this_notebook.ipynb (the actual name of the notebook you’re currently in) to this_notebook.py

! jupyter nbconvert --to script this_notebook.ipynb

The next cell is optional, but there’s a good chance you want it for organizational purposes (especially if, as I recommend, you are versioning your work via github/gitlab/etc). Also, if your notebook takes a long time to run, consider copying (cp) instead of moving (mv) your file. This will come up at the end.

! mv this_notebook.py your/desired/location/this_notebook.py
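If you would rather not lean on shell magic for this step, the standard library's shutil can do the same copy or move from a plain Python cell. This is a sketch of an alternative, not what the cell above does. It runs inside a temporary directory with a stand-in script so it is safe to try, and the paths are the same generic placeholders.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as work:
    script = Path(work) / "this_notebook.py"       # stand-in for nbconvert's output
    script.write_text("print('hi')\n")

    target_dir = Path(work) / "your" / "desired" / "location"
    target_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    target = target_dir / script.name

    shutil.copy(script, target)      # like cp: the original stays put
    # shutil.move(script, target)    # like mv: the original is gone afterwards

    copied = target.read_text()
    original_still_there = script.exists()

print(original_still_there)  # True
```

Swap the commented line in if you prefer the mv behavior.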

The next cell will rewrite your script in place, but leave out the extraneous stuff. NOTE: you will have to update f_name

import re

f_name = "your/desired/location/this_notebook.py"
do_not_add_below_to_script = "# DO NOT ADD BELOW TO SCRIPT" # must match the markdown above!
skip = 0 # number of blank lines still to drop after a cell marker
cell_nums = re.escape("# In[") + r"[0-9 ]*" + re.escape("]:") # matches '# In[12]:' and the unexecuted '# In[ ]:'

with open(f_name, "r") as f:
    lines = f.readlines() # get a list of lines from the converted script

with open(f_name, "w") as f: # overwrite the original converted script

    for line in lines:

        if re.search(cell_nums, line.strip()): # don't include the '# In[##]:' lines
            skip = 2

        elif skip > 0 and line == "\n": # trim extra blank lines below '# In[##]:' lines
            skip -= 1

        elif do_not_add_below_to_script in line: # stop here; nothing below the marker is written
            break

        else:
            skip = 0 # reached real code, so stop trimming blank lines
            f.write(line)
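To see what that filtering does without touching any files, here is the same idea wrapped in a function and run on a hand-written sample. The sample is shaped like nbconvert output but is not real output, and tidy is a name invented for this sketch. The pattern here accepts a blank between the brackets, so unexecuted # In[ ]: cells are caught too, and skip is reset once real code is reached.

```python
import re

CELL_MARKER = re.compile(r"# In\[[0-9 ]*\]:")
STOP = "# DO NOT ADD BELOW TO SCRIPT"


def tidy(lines):
    """Hypothetical helper: filter a list of script lines, returning the kept ones."""
    kept, skip = [], 0
    for line in lines:
        if CELL_MARKER.search(line.strip()):
            skip = 2            # drop the marker and up to two blank lines after it
        elif skip > 0 and line == "\n":
            skip -= 1
        elif STOP in line:
            break               # nothing below the marker is kept
        else:
            skip = 0            # real content: stop trimming blank lines
            kept.append(line)
    return kept


sample = [
    "#!/usr/bin/env python\n",
    "# coding: utf-8\n",
    "\n",
    "# In[1]:\n",
    "\n",
    "\n",
    "x = 1\n",
    "\n",
    "\n",
    "# In[ ]:\n",
    "\n",
    "\n",
    "print(x)\n",
    "\n",
    "\n",
    "# # DO NOT ADD BELOW TO SCRIPT\n",
    "\n",
    "# In[2]:\n",
    "\n",
    "\n",
    "get_ipython().system('jupyter nbconvert --to script this_notebook.ipynb')\n",
]

print("".join(tidy(sample)))
```

The cell markers, the blank lines right under them, and everything from the marker comment down are gone; the code itself is untouched.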

Make sure to update your LaTeX path as well

\lstinputlisting[language=Python]{your/desired/location/this_notebook.py}

If the Script isn’t updating… Don’t Worry!!!

It just means nbconvert is reading the last saved version of the notebook from disk (annoying, I know). In Jupyter Notebook, just go to File -> Save & Checkpoint. Then rerun this code.

If LaTeX isn’t updating… Don’t Worry!!!

LaTeX is finicky. We all know that. Fortunately, when an imported file isn’t updating, the fix is usually the same.

Delete the your/desired/location/this_notebook.py file. Rebuild your LaTeX file so that you get an error (or just delete the appropriate build file). Then rerun the code I’ve shown here. If you chose to copy your file rather than move it, run only that cell and below.

Hope this helps! I look forward to seeing your beautiful code in a publication somewhere soon!