Introduction
I wanted a programmatic way to generate and extract a ‘Table of Contents’ HTML snippet from existing markdown text for my Next.js blogging website, www.notionworkspaces.com [1].
These are the benefits of this approach:
- You don’t have to have a Table of Contents section in all your markdown files or HTML files
- You can cut out/extract (or keep) the table of contents after generating it in the content HTML using cheerio
In this tutorial, I’ll teach you a programmatic way to generate and extract a ‘Table of Contents’ HTML snippet from existing markdown text in Next.js.
TL;DR: skip to The Working Code Snippet below.
Original Snippet before modification
I already had a function that converted markdown text to HTML text using the remark [2] library.
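import fs from 'fs';
import path from 'path';
import matter from 'gray-matter';
import { remark } from 'remark';
import html from 'remark-html';

// Folder that holds the markdown posts (as in the Next.js tutorial)
const postsDirectory = path.join(process.cwd(), 'posts');

export async function getPostData(id) {
  const fullPath = path.join(postsDirectory, `${id}.md`);
  const fileContents = fs.readFileSync(fullPath, 'utf8');

  // Use gray-matter to parse the post metadata section
  const matterResult = matter(fileContents);

  // Use remark to convert markdown into HTML string
  const processedContent = await remark()
    .use(html)
    .process(matterResult.content);
  const contentHtml = processedContent.toString();

  // Combine the data with the id and contentHtml
  return {
    id,
    contentHtml,
    ...matterResult.data,
  };
}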
The above snippet was taken from Next.js’s official getting started tutorial [3].
What I wanted exactly:
But it didn’t do everything that I wanted: it didn’t generate a table of contents based on the structure of the markdown data.
I googled around and found an existing library called remark-toc [4], but it didn’t do exactly what I wanted.
It required a few conditions that I didn’t want to entertain, such as having a Table of Contents heading in the markdown file for it to fill in.
I later stumbled upon the rehype [5] library, a more recent take on processing HTML (and markdown) in Next.js.
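To make the goal concrete, here is a minimal sketch of what "generating a table of contents from the structure of the markdown" means: find the `#`-style headings, derive an anchor id from each heading's text, and emit a list of links. This is only an illustration of the idea, not how remark or rehype work internally. The helper names (`slugify`, `extractHeadings`, `buildToc`) are hypothetical, the slug rule is a simplification, and the sketch ignores fenced code blocks, duplicate headings, and HTML escaping.

```javascript
// Hypothetical helper: turn heading text into a URL-friendly anchor id.
// Simplified rule (an assumption): lowercase, drop punctuation, spaces -> hyphens.
function slugify(text) {
  return text
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9\s-]/g, '')
    .replace(/\s+/g, '-');
}

// Hypothetical helper: collect "# Title" style headings from markdown text.
function extractHeadings(markdown) {
  const headings = [];
  for (const line of markdown.split('\n')) {
    const match = /^(#{1,6})\s+(.+?)\s*$/.exec(line);
    if (match) {
      const text = match[2];
      headings.push({ depth: match[1].length, text, id: slugify(text) });
    }
  }
  return headings;
}

// Hypothetical helper: build a flat <nav class="toc"> snippet from the headings.
// Heading text is inserted as-is (no HTML escaping) to keep the sketch short.
function buildToc(headings) {
  const items = headings
    .map((h) => `<li data-depth="${h.depth}"><a href="#${h.id}">${h.text}</a></li>`)
    .join('');
  return `<nav class="toc"><ol>${items}</ol></nav>`;
}

const markdown = [
  '# Introduction',
  'Some text.',
  '## The Working Code Snippet',
  'More text.',
].join('\n');

const headings = extractHeadings(markdown);
const toc = buildToc(headings);
console.log(toc);
```

In practice I didn't want to maintain something like this by hand, which is why I went looking for libraries that already do it properly.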
The Working Code Snippet
This is the final code I use to generate and extract the table of contents from my markdown content.
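export async function getPostData(id) {
  const fullPath = path.join(postsDirectory, `${id}.md`);
  const fileContents = fs.readFileSync(fullPath, 'utf8');

  // Use gray-matter to parse the post metadata section
  const matterResult = matter(fileContents);

  const file = await unified()
    .use(remarkParse)      // parse markdown into a syntax tree
    .use(remarkRehype)     // convert the markdown tree into an HTML tree
    .use(rehypeSlug)       // add id attributes to the headings
    .use(rehypeDocument)   // wrap the content in a full HTML document
    .use(rehypeFormat)     // pretty-print the HTML
    .use(rehypeTOC)        // insert a <nav class="toc"> built from the headings
    .use(rehypeStringify)  // serialize the tree to an HTML string
    .process(matterResult.content);

  // Extract TOC dynamically
  const $ = cheerio.load(String(file));
  const contentTOC = $("nav.toc").html(); // inner HTML of the generated TOC
  $("nav.toc").remove();                  // cut the TOC out of the content
  const contentHtml = $.html();

  // Combine the data with the id, contentHtml and contentTOC
  return {
    id,
    contentHtml,
    contentTOC,
    ...matterResult.data,
  };
}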
I used the following imports to get it all working seamlessly.
The import requirements
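// Already needed by the original snippet
import fs from 'fs';
import path from 'path';
import matter from 'gray-matter';
// New imports for the unified/rehype pipeline and the TOC extraction
import { unified } from 'unified';
import remarkParse from 'remark-parse';
import remarkRehype from 'remark-rehype';
import rehypeDocument from 'rehype-document';
import rehypeFormat from 'rehype-format';
import rehypeStringify from 'rehype-stringify';
import rehypeSlug from 'rehype-slug';
import rehypeTOC from '@jsdevtools/rehype-toc';
import * as cheerio from 'cheerio';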
I used cheerio [6] to build a DOM tree from the HTML text so that I could extract the TOC component using the selector nav.toc and use it as a Table of Contents snippet in my React components.
This is a screenshot of how I used this piece of code on www.notionworkspaces.com.
The dynamic table of contents is the left section in the above screenshot.
I hope you found this useful!