Per-Page Markdown Files: The Easiest Win for AI Crawlability
Learn how adding a .md companion file to each page gives AI agents clean, distraction-free content — and why this simple technique dramatically improves how AI systems read and cite your site.
Most website optimisation advice is complicated — structured data schemas, robots.txt directives, sitemap configurations. Per-page Markdown files are the exception. They're simple text files that live alongside your existing pages, and they give AI agents exactly what they need: your content, with nothing else in the way.
What Is a Per-Page Markdown File?
A per-page Markdown file is a .md companion that mirrors the content of an HTML page. If your site has a page at /blog/what-is-llms-txt, you add a file at /blog/what-is-llms-txt.md containing the same article in plain Markdown — no navigation, no ads, no JavaScript, no boilerplate HTML.
The HTML page remains exactly as it is for human visitors. The .md file is there exclusively for machines: AI agents, LLMs, and crawlers that prefer clean, lightweight text over fully rendered web pages.
Why AI Agents Prefer Markdown
When an AI agent visits an HTML page, it has to do a lot of work before it can read your content. It must parse the HTML tree, strip navigation and footer markup, skip over cookie banners and ad slots, ignore inline scripts, and extract the actual article text — all within a limited context window.
A Markdown file eliminates all of that. The agent opens the file and your content starts on line one. No noise, no parsing overhead, no risk of the agent filling its context window with boilerplate instead of your ideas.
- Clean text — no HTML tags, scripts, or layout elements to filter out
- Smaller file size — loads faster within crawl budget limits
- Predictable structure — headings, lists, and code blocks are immediately parseable
- Better citations — AI systems produce more accurate quotes when the source is unambiguous
- Context window efficient — more of your actual content fits in a single read
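To make the extraction work described above concrete, here is a rough sketch of what an agent has to do to an HTML page before it can read the article. The function name `extractTextFromHtml` and the list of stripped elements are assumptions made for illustration; a real agent would more likely use a proper HTML parser and readability heuristics than regular expressions.

```typescript
// Crude illustration only (hypothetical helper, not any agent's real code).
// It does not decode HTML entities or handle nested elements of the same name.
function extractTextFromHtml(html: string): string {
  return html
    // Drop elements whose content is never article text.
    .replace(/<(script|style|nav|header|footer|aside)\b[^>]*>[\s\S]*?<\/\1>/gi, " ")
    // Drop all remaining tags, keeping the text between them.
    .replace(/<[^>]+>/g, " ")
    // Collapse the whitespace left behind.
    .replace(/\s+/g, " ")
    .trim();
}

const samplePage =
  '<nav><a href="/">Home</a></nav>' +
  "<article><h1>Per-Page Markdown</h1><p>Clean content.</p></article>" +
  "<script>track()</script>";

console.log(extractTextFromHtml(samplePage));
// prints "Per-Page Markdown Clean content."
```

Even this toy version loses the heading structure along the way, which is part of why a ready-made Markdown file is the easier input.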
How to Declare a Markdown File in Your Page Head
Simply publishing a .md file isn't enough: you also need to tell AI agents it exists. A common way to do this is a <link> tag in your page's <head>:
<link rel="alternate" type="text/markdown" href="/blog/what-is-llms-txt.md">
This tag is the same pattern used by developers.cloudflare.com and recommended by llmstxt.org. AI agents that support the spec check for it when they visit a page, and follow the href to fetch the Markdown version directly. The AI Crawlability Test tool checks for exactly this tag on your pages.
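As a sketch of the discovery step from the agent's side, the function below scans a page's HTML for a Markdown alternate link and resolves its href against the page URL. The names `findMarkdownAlternate` and `parseLinkAttributes` and the example.com address are hypothetical; this is one plausible way to do the lookup, not the behaviour of any particular crawler.

```typescript
// Hypothetical helper: returns the absolute URL of the Markdown companion
// declared via <link rel="alternate" type="text/markdown">, or null if absent.
function findMarkdownAlternate(html: string, pageUrl: string): string | null {
  // A regex is enough for a sketch; a production crawler would parse the DOM.
  const linkTags = html.match(/<link\b[^>]*>/gi) || [];
  for (const tag of linkTags) {
    const attrs = parseLinkAttributes(tag);
    const relTokens = (attrs["rel"] || "").toLowerCase().split(/\s+/);
    const type = (attrs["type"] || "").toLowerCase();
    const href = attrs["href"];
    if (relTokens.indexOf("alternate") !== -1 && type === "text/markdown" && href) {
      // Resolve relative hrefs such as "/blog/post.md" against the page URL.
      return new URL(href, pageUrl).toString();
    }
  }
  return null;
}

// Collects name="value" pairs (double-quoted, single-quoted, or unquoted).
function parseLinkAttributes(tag: string): Record<string, string> {
  const attrs: Record<string, string> = {};
  const pattern = /([a-zA-Z-]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(tag)) !== null) {
    attrs[match[1].toLowerCase()] = match[2] ?? match[3] ?? match[4] ?? "";
  }
  return attrs;
}

const pageHtml =
  '<head><link rel="alternate" type="text/markdown" href="/blog/what-is-llms-txt.md"></head>';

console.log(findMarkdownAlternate(pageHtml, "https://example.com/blog/what-is-llms-txt"));
// prints "https://example.com/blog/what-is-llms-txt.md"
```

The same function doubles as a quick self-check: run it against your own rendered HTML and confirm it returns the URL you expect.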
What to Include in a Per-Page Markdown File
A good per-page Markdown file contains only the content a reader — human or machine — came for:
- A top-level heading (#) matching the page title
- A short description: one or two sentences summarising the page
- The full article body — all headings, paragraphs, lists, and code blocks
- No navigation links, headers, footers, or sidebars
- No cookie notices, subscription forms, or promotional banners
Optionally include frontmatter at the top for metadata that downstream tools may use:
---
title: What is llms.txt and Why Every Website Needs One
description: Learn about the llms.txt specification for AI agents.
author: Soorya
publishedAt: 2025-04-01
---
# What is llms.txt and Why Every Website Needs One
In the age of AI-powered search, a new file is rapidly becoming...
Implementing in Next.js
In a Next.js project, the simplest approach is to place static .md files in your public/ directory. They're served directly by the web server with no build step required:
public/
  blog/
    what-is-llms-txt.md
    robots-txt-ai-crawlers.md
    per-page-markdown-files.md
Then declare the link tag in your blog post layout. In Next.js App Router, add it to the generateMetadata function or directly in the page's <head> via the layout:
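// app/blog/[slug]/page.tsx
import type { Metadata } from "next"

// In Next.js 15 and later, params is passed as a Promise
type Props = { params: Promise<{ slug: string }> }

export async function generateMetadata({ params }: Props): Promise<Metadata> {
  const { slug } = await params
  return {
    alternates: {
      types: {
        // Rendered as <link rel="alternate" type="text/markdown" href="...">
        "text/markdown": `/blog/${slug}.md`,
      },
    },
  }
}
Connecting Markdown Files to llms.txt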
Once you have per-page Markdown files, reference them in your llms.txt rather than the HTML URLs. This gives AI agents that read llms.txt first a direct path to the clean content, without having to visit the HTML page at all:
# My Site
> A crawlability testing tool for AI agents.
## Blog
- [What is llms.txt?](/blog/what-is-llms-txt.md): Introduction to the llms.txt standard.
- [robots.txt for AI Crawlers](/blog/robots-txt-ai-crawlers.md): Complete 2025 guide.
- [Per-Page Markdown Files](/blog/per-page-markdown-files.md): How .md files improve AI readability.
Which Pages Should Have Markdown Files?
You don't need to create Markdown companions for every page on your site. Prioritise the ones AI agents are most likely to read and cite:
- Blog posts and articles — Your primary knowledge content
- Documentation pages — Especially API references and how-to guides
- About and company pages — Helps AI systems accurately describe your organisation
- Product and feature pages — Ensures AI gives correct information about what you offer
- FAQ pages — Perfect for AI agents looking for direct answers
You can skip Markdown files for pages like login screens, checkout flows, account dashboards, and anything behind authentication — AI agents shouldn't be reading those anyway.
Keeping Markdown Files in Sync
The main maintenance consideration with per-page Markdown files is keeping them in sync with the HTML content. When you update a blog post, remember to update the .md companion too.
For content-heavy sites, consider generating .md files automatically from your content source — whether that's a CMS, a database, or Markdown-first authoring (in which case the .md files are already your source of truth and no extra work is needed).
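A minimal sketch of that automatic generation, assuming your content source can hand you each post as a record with the body already in Markdown. The `PostRecord` shape and the `companionPath` and `renderMarkdownCompanion` helpers are hypothetical names chosen for this example, and the output path assumes the public/blog/ layout shown earlier.

```typescript
// Hypothetical content record. The field names mirror the frontmatter
// example in this article; a real CMS or database will have its own shape.
interface PostRecord {
  slug: string;
  title: string;
  description: string;
  author: string;
  publishedAt: string; // ISO date such as "2025-04-01"
  bodyMarkdown: string; // article body, already converted to Markdown
}

// Where the companion file would live in a Next.js public/ directory.
function companionPath(post: PostRecord): string {
  return `public/blog/${post.slug}.md`;
}

// Builds the full text of the .md companion: frontmatter, H1, then body.
function renderMarkdownCompanion(post: PostRecord): string {
  const lines = [
    "---",
    // JSON.stringify yields double-quoted strings, which YAML also accepts,
    // so titles containing ":" or "#" stay valid.
    `title: ${JSON.stringify(post.title)}`,
    `description: ${JSON.stringify(post.description)}`,
    `author: ${JSON.stringify(post.author)}`,
    `publishedAt: ${post.publishedAt}`,
    "---",
    "",
    `# ${post.title}`,
    "",
    post.bodyMarkdown.trim(),
    "",
  ];
  return lines.join("\n");
}
```

In a build script you would loop over your posts and write `renderMarkdownCompanion(post)` to `companionPath(post)`, for example with Node's `fs.writeFileSync`, so the companions are regenerated on every deploy instead of being edited by hand.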
Check right now whether your pages declare their Markdown companions: enter your URL in the AI Crawlability Test tool. It fetches your homepage and one additional page, looking for the <link type="text/markdown"> tag in each page's head.
The Compounding Effect
Per-page Markdown files work best as part of a complete AI crawlability stack. On their own, they make your content easier to read. Combined with a well-structured llms.txt that links to them, a robots.txt that allows AI crawlers, and JSON-LD that classifies your content, they form a system where AI agents can find your site, understand what it's about, navigate to the right page, and read clean content — all without friction.
That frictionless path from discovery to content is what gets you cited. And in an AI-first web, being cited is the new being ranked.