Self-hosted · no telemetry · no phone-home

Capture anything on the web. Preserve every word, every link, every screenshot. Recall it when you need it.

Khiip is a substrate for personal-web preservation — typed captures from X, Reddit, Wikipedia, web, and YouTube. The AGPL-3.0 core is free and self-hosted; Khiip Plus adds enriched capture and platform-styled renders. Works with Obsidian, agents, your terminal.

Install Read the docs

Capture

What Khiip captures today

Khiip ships extractors for five source types — every one of them returning a typed payload, not raw HTML.

The free AGPL core captures the substance of each source. X — tweet text, author, date, base engagement (likes, reposts, replies), community notes, polls, and reply context. Reddit — the post and every top-level comment, with crosspost lineage and removed-status preserved, over OAuth2 with your own credentials. Wikipedia — the article via the MediaWiki API, with references and infobox. Web articles — the full body text via a Trafilatura + Readability fallback chain, enriched with OG/JSON-LD metadata. YouTube — metadata and the full transcript via a yt-dlp + youtube-transcript-api fallback chain.

Khiip Plus deepens every source from the same captured bytes — no re-fetch. Quote-tweet embeds and X-Article bodies, the complete engagement set (views, bookmarks, quotes), the deep recursive Reddit comment tree with scores and authority, embedded media across all sources, and timestamped YouTube transcripts with chapter navigation. Then it renders each one in Khiip's own platform-reminiscent style — the styled artifact in the window above.

Each capture lands as a structured object — TweetPayload, RedditPayload, WebPayload, WikiPayload, YouTubePayload — with fields that match the source: a tweet keeps its author and engagement, an article keeps its title and body, a thread keeps its comments. The shape that was captured is the shape that comes back, six months later, to whatever surface you point at it.

Capture lands on the filesystem as plain Markdown — a .md file with YAML frontmatter — in your vault (Obsidian-compatible by default): portable, greppable, no lock-in. The raw bytes — the original HTML, the original API response, the original transcript file — are kept alongside on disk under a configurable data_root as a best-effort cache.

Roadmap: deeper PDF support, then Instagram, TikTok, Threads, Bluesky. Substack via web extractor today; native handlers when the engagement gap matters. The substrate is open — the same extractor protocol lives at src/khiip/extractors/base.py.

Substrate

The substrate model

Khiip is a substrate, not a destination. The distinction matters more than it sounds.

Most save-it-for-later tools — Pocket, Evernote, Notion's Web Clipper, Raindrop — couple capture and consumption at the same place. You save into the destination. You read from the destination. When the destination goes away, your archive goes with it.

A substrate separates the two. You capture into the substrate; you consume through whatever surface you currently prefer. The substrate is the fixed point. Surfaces come and go.

Destination — the old way

You capture → ← recall Pocket · Evernote · Raindrop

One box both holds and serves your captures. It shuts down → your archive shuts down with it.

Substrate — Khiip

You Khiip your vault your tools

Khiip writes to a vault you own. Khiip shuts down → the vault stays — still Markdown, still readable by anything.

Three properties make this real:

  • Independence. The substrate predates the application that consumes it and outlives the application. You can swap the application without rebuilding the substrate.
  • Knowability. Every capture has a type, a schema, and a known set of fields. Other software can read the substrate without negotiation.
  • Preservation. The raw bytes the source returned are kept on disk alongside the structured extraction — a best-effort cache beneath the canonical typed payload in your vault. When an extractor needs to re-run with a smarter algorithm three years from now, the bytes are still there.

Surfaces

What you build on top

One substrate. Many surfaces.

Your Obsidian vault

Captures land as markdown with structured frontmatter. Wiki-links across captures. Cross-corpus search at the vault level. The Khiip Obsidian plugin ships a capture command and a recall sidebar today; richer in-editor surfaces — per-source visual indicators and refetch controls — land in a later release. The substrate is on disk, your editor is the surface.

Your agent or your script

khiipd exposes a REST API and an MCP server. An LLM agent points at the substrate, asks structured questions ("what did I save about substrate design last quarter," "which Reddit threads on r/SaaS mention pricing"), and gets typed answers back. When the embedding model changes, the recall index rebuilds cheaply from the vault — the substrate itself doesn't change; it's shaped by the source, not by the consumer.

Your terminal

khiipd capture <url> from the command line. khiipd refetch <id> to re-run an extractor or re-walk media. khiipd validate to check vault ↔ SQLite consistency. The daemon is local, the CLI is direct, the data root is yours.

The same substrate. Different surfaces. When something better than Obsidian arrives, or when you decide your captures belong somewhere else entirely, the substrate doesn't have to be rebuilt. The destination is replaceable.

The breadth isn’t a roadmap — it’s the architecture. The contract is open (REST + MCP + plain Markdown), so any tool can read Khiip. Surfaces that read it today:

  • Obsidian vault
  • Obsidian plugin
  • Terminal (CLI)
  • REST API
  • MCP server
  • LLM agents
  • Local recall
  • grep & editors
  • Markdown tools
  • Your scripts & vector store

Coming soon Browser extension · Mobile · Official SDK · Destinations via the open contract (Notion, …)

Want a surface we don’t have? Suggest one on GitHub Discussions →

Open core

Open core

The substrate is AGPL-3.0 and free. The enriched layer — Khiip Plus — is the paid product.

Khiip's daemon and substrate are licensed under the GNU Affero General Public License v3.0 — free, self-hosted, no account, no telemetry. Capture across all five sources, typed payloads, local recall, the REST API and MCP server, and the plain-Markdown renders: yours, portable, git-versioned, exportable. You never pay to read your own data.

Khiip Plus is the paid tier — the enriched, platform-styled renders and the deeper per-source capture, delivered as one product. It runs on your own machine under a license key; nothing phones home. Khiip Plus arrives alongside the public launch.

To be exact about the line: Plus itself isn't open source — it's licensed under the Elastic License 2.0, the paid layer that funds the open core. The open-source guarantee covers everything in the free tier; Plus is the part you pay for.

It's the proven open-core posture — the same path Plausible, PostHog, Cal.com, Sentry, and GitLab took: an open-source core funded by a clearly-separated paid layer. The SDKs — a separate repo, when published — will be Apache-2.0 so they can be embedded freely in downstream applications.

Transparent

What's transparent

Three documents and one telemetry doc, written to be read.

  • LICENSE — AGPL-3.0 full text, in the repo root, enforceable through copyright law.
  • Terms of Service — what you get from Khiip's maintainers, what you don't, and what behavior on khiip.com is in scope.
  • Privacy Policy — what khiip.com collects (Plausible cookieless analytics; Buttondown if you opt in to the newsletter; standard Cloudflare hosting logs), how long it's kept, and what your rights are wherever you read this from.
  • docs/telemetry.md — what the daemon does to the network, by version, by configuration toggle. Inventory by call type, destination, and trigger. The same surface Plausible and Tailscale publish; it exists so that you can verify what Khiip says about local-by-default behavior without having to trust the marketing copy on this page.