Markdown Mate for Jupyter Notebooks: automating in-notebook documentation with LLMs

This project is a small automation that enriches an existing Jupyter Notebook by inserting explanatory Markdown cells immediately before every code cell. It preserves the original code unchanged and generates concise human-readable descriptions that explain each code block’s purpose, inputs, outputs, and side effects.

Objectives #

Local inference: Use a locally downloaded open source LLM for security and privacy.
Faster comprehension: Researchers and reviewers can scan a notebook and understand intent without reading every line of code.
Improved reproducibility: Inline documentation clarifies side effects like file writes and plotting, making experiments easier to reproduce.
Onboarding & teaching: Instructors and new team members get ready-made narrative flow for notebooks.

🔧 Automation #

Loads a notebook JSON (auto_markdown.ipynb).
Removes execution artifacts (outputs and execution_count) to avoid leaking runtime state.
Constructs a structured prompt for a locally-hosted LLM, requesting a version of the notebook JSON where each code cell is preceded by a short Markdown cell.
Calls the model and writes a new notebook file (e.g., auto_markdown_with_md.ipynb) containing the added markdown cells while leaving code unchanged.

Tech stack #

Language: Python (Jupyter notebook automation).
Notebook format: Standard .ipynb (JSON).
LLM client: OpenAI-compatible client pointed at a local Ollama instance (http://localhost:11434/v1).
Model: llama3.3 (used locally via Ollama in the example).

Benefits for data research and academia #

Paper reproducibility: Reviewers and reproducibility officers appreciate notebooks that tell a clear story; automated explanations reduce friction when sharing experimental artifacts.
Faster review cycles: Small teams can supply notebooks with embedded explanations that speed up peer review and code audits.
Pedagogy: Instructors can programmatically generate annotated notebooks for exercises and examples.
Accessible documentation: The automation produces consistent, plain-language descriptions that help non-experts understand computational workflows.

🛡️ Limitations & best practices #

The quality of generated explanations depends on the model—validate domain-specific claims (especially for scientific computations).
Use the output as a draft: human review is recommended for correctness and contextual clarity.

📂 Source code #

Project notebook: static/source_code/auto_markdown.ipynb (in this blog repo: https://github.com/ZombieDadCoding/debugging-the-matrix/).