
Simon Willison's HTML Table Extractor Converts Any Web Table to CSV, JSON, or Markdown

A new browser paste-tool from the co-creator of Django turns rich-text clipboard data into five structured formats — no scraping script required.

By TRAGenX Desk


Simon Willison is the co-creator of Django, the creator of Datasette, and one of the most prolific open-source builders working where LLMs meet developer tooling. He has shipped another small, sharp utility: an HTML table extractor that runs entirely in the browser and converts any copy-pasted web table into five formats on demand.

What It Does

The workflow is deliberately zero-friction. Select a table — or an entire page containing tables — in any browser, copy it, and paste the rich text into the tool. It auto-detects every embedded HTML table in the clipboard payload. Choose your output: HTML, Markdown, CSV, TSV, or JSON.
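To make the detection step concrete, here is a minimal Python sketch of pulling every table out of a pasted HTML fragment. It assumes you already have the `text/html` side of the clipboard as a string (in a browser paste handler that is typically `event.clipboardData.getData("text/html")`). The `TableExtractor` class and `extract_tables` helper are hypothetical names, and this is not the code behind Willison's tool, which runs in the browser. It deliberately ignores `colspan`/`rowspan` and folds the text of any nested table into the enclosing cell.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every top-level <table> in an HTML fragment as rows of cell text.

    Simplifications: colspan/rowspan are ignored, and the text of a nested
    table is folded into the cell that contains it.
    """

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so "&amp;" arrives as "&"
        self.tables = []    # one list of rows per top-level table
        self._depth = 0     # how many <table> elements we are currently inside
        self._row = None    # cells of the row being built
        self._cell = None   # text fragments of the cell being built

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            # Collapse runs of whitespace (including newlines) to single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row:  # skip rows that produced no cells
            self.tables[-1].append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._depth += 1
            if self._depth == 1:
                self.tables.append([])
        elif self._depth == 1 and tag == "tr":
            self._close_row()  # tolerate a missing </tr>
            self._row = []
        elif self._depth == 1 and tag in ("td", "th"):
            self._close_cell()  # tolerate a missing </td>
            if self._row is None:
                self._row = []
            self._cell = []
        elif tag == "br" and self._cell is not None:
            self._cell.append(" ")

    def handle_endtag(self, tag):
        if tag == "table":
            if self._depth == 1:
                self._close_row()
            self._depth = max(0, self._depth - 1)
        elif self._depth == 1 and tag in ("td", "th"):
            self._close_cell()
        elif self._depth == 1 and tag == "tr":
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_tables(html):
    """Return a list of tables; each table is a list of rows of strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables
```

Each table comes back as a list of rows, each row a list of cleaned cell strings: a plain shape from which any of the five output formats can be generated.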

No API call. No server. No npm install. Clipboard in, structured data out.

Why This Matters for the Vibecoding Workflow

The friction point Willison is solving is one every data-adjacent builder runs into constantly: you see a useful table on a website, but getting it into a script, a prompt, or a spreadsheet is annoyingly manual. You either write a one-off scraper, wrestle with copy-paste formatting artifacts, or retype it by hand.
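One source of those formatting artifacts is delimiter handling: a cell such as `Widget, large` breaks a naive comma join. Below is a minimal sketch of CSV and TSV output using Python's standard `csv` module, assuming the table is already a list of rows of strings; `to_delimited` is an illustrative name, not part of Willison's tool.

```python
import csv
import io


def to_delimited(rows, delimiter=","):
    """Serialize rows (lists of cell strings) as CSV or, with "\t", as TSV.

    csv.writer's default QUOTE_MINIMAL mode quotes only cells that contain
    the delimiter, a quote character, or a newline, and doubles embedded quotes.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


# Example: the comma inside "Widget, large" is quoted in CSV but not in TSV.
rows = [["Model", "Price"], ["Widget, large", "12.50"], ['Say "hi"', "3"]]
print(to_delimited(rows))
print(to_delimited(rows, delimiter="\t"))
```

Because the writer quotes only the cells that need it, the output round-trips cleanly through `csv.reader`.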

In an AI-assisted development context — where you might be feeding financial data, API comparison tables, or spec sheets into an LLM context window — the bottleneck is almost never the model. It's getting clean, structured input. A tool that strips that friction out of the workflow is genuinely useful, even if it looks trivially simple on the surface.

This reflects a design philosophy Willison has been refining for years: small paste-conversion utilities that live at the boundary between unstructured web content and structured data. Each one is a single-purpose tool that composes cleanly with whatever comes next in your pipeline.

The Paste-Conversion Toolkit Is Growing

The HTML table extractor is part of a growing collection. In the post where he shared it, Willison noted that he had recently rebuilt his Rich text to Markdown tool to add table support and improve the UI. The underlying theme is consistent: take the messy rich-text content the browser clipboard hands you and make it programmable.
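For tables, a Markdown converter typically emits a pipe table, which is a GitHub Flavored Markdown extension rather than core Markdown. The sketch below shows one way to do that in Python, assuming rows are lists of strings and the first row is the header; `to_markdown` is a hypothetical helper, and Willison's tool may handle headers, alignment, and escaping differently.

```python
def to_markdown(rows):
    """Render rows as a GitHub-style pipe table, treating the first row as the header."""
    if not rows:
        return ""
    width = max(len(r) for r in rows)

    def fmt(cells):
        # Pad short rows, escape pipes so they don't split cells,
        # and replace newlines, which a pipe-table row cannot contain.
        padded = list(cells) + [""] * (width - len(cells))
        escaped = [c.replace("|", "\\|").replace("\n", " ") for c in padded]
        return "| " + " | ".join(escaped) + " |"

    header, *body = rows
    lines = [fmt(header), "| " + " | ".join(["---"] * width) + " |"]
    lines += [fmt(r) for r in body]
    return "\n".join(lines)


print(to_markdown([["Name", "Notes"], ["a|b", "x"], ["only one"]]))
```

Markdown tables are also compact and readable as plain text, which is one reason they work well when pasted directly into an LLM prompt.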

For agentic pipelines, this class of tool is underrated. LLMs handle structured tabular data well, but getting that data from a live webpage into JSON without writing a custom scraper every time is exactly the kind of yak-shaving that kills developer momentum. Tools that eliminate that step keep you in flow.
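For pipeline or prompt input, JSON is often most useful as an array of objects keyed by the header row. A minimal Python sketch under that assumption follows; `to_json_records` is a hypothetical name, and the extractor's own JSON output may use a different shape (for example, a plain array of arrays).

```python
import json


def to_json_records(rows):
    """Turn a header row plus data rows into a JSON array of objects."""
    if not rows:
        return "[]"
    header, *body = rows
    records = []
    for row in body:
        # Pad short rows with None; zip() drops any cells beyond the header.
        cells = list(row) + [None] * (len(header) - len(row))
        records.append(dict(zip(header, cells)))
    # ensure_ascii=False keeps non-ASCII cell text readable in the output.
    return json.dumps(records, indent=2, ensure_ascii=False)


print(to_json_records([["city", "pop"], ["Oslo", "709000"], ["Bergen"]]))
```

One limitation of this shape: if two header cells share a name, the later column overwrites the earlier one in each object.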

The Broader Pattern Worth Noticing

What stands out is *how* Willison builds these: browser-native, zero-dependency, no account, no backend. They're the kind of tools that fit naturally into a vibecoding workflow: you reach for them the moment you need them, get the output, and disappear back into your editor or terminal.

As more development work involves feeding real-world data to LLMs, the tools that sit at the 'get structured data out of the web' layer are going to see a lot more use. Willison is building that layer one paste-converter at a time — and it's worth bookmarking.

FAQ


What output formats does Simon Willison's HTML table extractor support?
The tool converts browser-pasted rich text containing HTML tables into five formats: HTML, Markdown, CSV, TSV, and JSON. You select the desired format after pasting your clipboard content.
Does the HTML table extractor require installation or an account?
No. It runs entirely client-side in the browser with no server component, no packages to install, and no account required. Paste rich text in, pick a format, and copy the result.
How does a tool like this fit into AI and LLM development workflows?
Structured tabular data is frequently needed as context for LLMs or as input to data pipelines. Rather than writing a one-off scraper, you can paste a web table and get clean JSON or CSV in seconds — removing a persistent friction point from data ingestion in any agentic or AI-assisted workflow.
