Add Syntax Highlighting to a Gatsby/MDX Blog

A needle with thread simulating a prism.
Image credit: John Anvik

If you're writing a developer blog, you'll probably want to include some code excerpts. And if you're anything like me, you'll want to spice up those code excerpts with some pretty syntax highlighting.

This tutorial will walk you through adding syntax highlighting to a Gatsby MDX blog. I'll assume that you already have a working MDX blog, but if you don't, check out this awesome tutorial from the Gatsby docs that describes how to get an MDX blog up and running with Gatsby.

The Goal

As a refresher on MDX syntax: you can include code blocks in your MDX by enclosing the code in triple backticks (with the opening set optionally followed by the name of the coding language). The goal for this project will be to convert native MDX code blocks like this into syntax-highlighted code. So, for example, you could write this in your MDX code:

```javascript
console.log('Hello, world!');
```
markdown

And it would look like this on your blog:

console.log('Hello, world!');
js

Setup

In order to support syntax highlighting in MDX, we'll need to use a package called prism-react-renderer. This package exports a React component <Highlight />, which can render code blocks with the Prism syntax highlighter. Let's install it:

npm install prism-react-renderer
bash

The <Highlight /> Component

The prism-react-renderer docs have an example of how to use the <Highlight /> component. Let's take a quick look, to see what we're dealing with:

import React from 'react';
import Highlight, { defaultProps } from 'prism-react-renderer';

const exampleCode = `
(function someDemo() {
  var test = "Hello World!";
  console.log(test);
})();

return () => <App />;
`;

const Content = (
  <Highlight {...defaultProps} code={exampleCode} language="jsx">
    {({ className, style, tokens, getLineProps, getTokenProps }) => (
      <pre className={className} style={style}>
        {tokens.map((line, i) => (
          <div {...getLineProps({ line, key: i })}>
            {line.map((token, key) => (
              <span {...getTokenProps({ token, key })} />
            ))}
          </div>
        ))}
      </pre>
    )}
  </Highlight>
);
jsx

That's a lot, so let's break it down.

<Highlight /> accepts several props:

  • code, which is the text that we want to apply syntax highlighting to. In the example, it's stored in exampleCode.
  • language, which is a string representing the code's language. This will affect how the code gets parsed. For reference, here's a list of supported languages.
  • defaultProps, which is exported from the package and is simply spread as props directly onto the component. We'll take a closer look at this later.
  • A render prop, passed as children.

As you can see, the bulk of the code is the render prop. That's where the main work of rendering the code block is done. <Highlight /> works by parsing the input code line-by-line, dividing each line into 'tokens', which represent specific bits of code (e.g., variables, strings, functions). <Highlight /> then exposes all the line/token data to our render function, along with methods for adding class names to the tokens for styling. For our purposes, we'll just copy and paste the render function from the example. But if you want to modify how the code gets displayed (e.g., adding line numbers or individual line diff highlighting), this is where you'd do it.
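To make the line/token structure a bit more concrete, here's a minimal sketch in plain JavaScript. The sampleTokens array is hand-written for illustration (it isn't real output from the library), and I'm assuming each token is an object with a types array and a content string. withLineNumbers is a hypothetical helper that shows the kind of per-line work a render function could do:

```javascript
// Hand-written sample of the line/token data (NOT real library output).
// Assumption: each token has a `types` array and a `content` string.
const sampleTokens = [
  [
    { types: ['keyword'], content: 'const' },
    { types: ['plain'], content: ' answer ' },
    { types: ['operator'], content: '=' },
    { types: ['plain'], content: ' ' },
    { types: ['number'], content: '42' },
    { types: ['punctuation'], content: ';' },
  ],
  [
    { types: ['plain'], content: 'console' },
    { types: ['punctuation'], content: '.' },
    { types: ['function'], content: 'log' },
    { types: ['punctuation'], content: '(' },
    { types: ['plain'], content: 'answer' },
    { types: ['punctuation'], content: ')' },
    { types: ['punctuation'], content: ';' },
  ],
];

// Hypothetical helper: join each line's tokens back into text
// and prefix the result with a 1-based line number.
function withLineNumbers(tokens) {
  return tokens.map((line, i) => {
    const text = line.map((token) => token.content).join('');
    return `${i + 1} | ${text}`;
  });
}
```

In a real render function you'd return elements instead of strings, but the outer loop over lines and the inner loop over tokens have the same shape.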

This is a lot of boilerplate to write every time we want to render a code block. Ideally, we'd like to have one component that accepts the code and language as props, and then passes them directly to <Highlight />. Let's create that component now.

Creating a <CodeBlock /> Component

Here's the starting point for our component. Most of this is just copied from the prism-react-renderer example above:

import React from 'react';
import Highlight, { defaultProps } from 'prism-react-renderer';

export const CodeBlock = () => {
  return (
    <Highlight {...defaultProps}>
      {({ className, style, tokens, getLineProps, getTokenProps }) => (
        <pre className={className} style={style}>
          {tokens.map((line, i) => (
            <div {...getLineProps({ line, key: i })}>
              {line.map((token, key) => (
                <span {...getTokenProps({ token, key })} />
              ))}
            </div>
          ))}
        </pre>
      )}
    </Highlight>
  );
};
jsx

To get our component working, there are two main tasks ahead of us. We need a way to automatically pass 1) the code excerpt and 2) the language from our MDX to <CodeBlock />, so the code can be correctly parsed by the <Highlight /> component.

Getting the Code from MDX

First, we need to figure out how to pass the code from our MDX blog post to <CodeBlock />.

It might help to take a step back and see what's actually happening in our MDX. This is the syntax that we'll write in our MDX to render a code block:

```javascript
console.log('Hello, world!');
```
markdown

And this is what the resulting HTML will look like, after Gatsby and MDX finish parsing it:

<pre>
  <code class="language-javascript">
    console.log('Hello, world!');
  </code>
</pre>
html

So our code is wrapped in a <code> tag (with a class of language-javascript), which is nested in a <pre> tag. It would be great if we could just tell MDX to replace all <pre> tags with our <CodeBlock /> component, so the code would be accessible through <CodeBlock />'s children prop.

As it turns out, this is totally possible in MDX! To make it work, we'll need to use something called shortcodes.

Shortcodes allow you to pass global components to all MDX documents in your project, without needing to import them each time. You can also overwrite default HTML elements with custom components. This will let us convert all our <pre> tags to <CodeBlock />!

First, find wherever your MDX is rendered. If you followed the Gatsby tutorial, it should be src/pages/blog/{mdx.slug}.jsx.

We'll need to import a couple things here: the <CodeBlock /> component, and a component called <MDXProvider> from @mdx-js/react:

import { CodeBlock } from '../components/CodeBlock';
import { MDXProvider } from '@mdx-js/react';
js

Next, we need to define our shortcodes. We can do that by creating an object that maps the names of any HTML elements (in this case, pre), to the components that should replace them:

const shortcodes = { pre: CodeBlock };
js

Finally, we'll wrap our <MDXRenderer> with the <MDXProvider> that we just imported, and pass it our shortcodes as components:

<MDXProvider components={shortcodes}>
  <MDXRenderer>{/* your mdx */}</MDXRenderer>
</MDXProvider>
jsx

This will convert every <pre> tag in our MDX to the <CodeBlock /> component. So now, the contents of the old <pre> tag (the <code class="language-javascript"> and its children) will be accessible as <CodeBlock />'s children! Here's a rough visualization of the change:

<CodeBlock>
  <code class="language-javascript">
    console.log('Hello, world!');
  </code>
</CodeBlock>
jsx
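If it helps, here's a toy model of that replacement step in plain JavaScript. To be clear, this is only a mental model and not how MDX is actually implemented: resolveComponent is a hypothetical helper I made up for this sketch, and the CodeBlock here is an empty stand-in for the real component.

```javascript
// Toy model only: this is NOT MDX's real implementation.
// Empty stand-in for the real <CodeBlock /> component.
function CodeBlock() {}

// The same mapping we pass to <MDXProvider> as `components`.
const shortcodes = { pre: CodeBlock };

// Hypothetical helper: return the custom component registered for a tag,
// or the tag name itself when nothing is registered.
function resolveComponent(tagName, components) {
  return Object.prototype.hasOwnProperty.call(components, tagName)
    ? components[tagName]
    : tagName;
}
```

Under this model, looking up 'pre' yields CodeBlock, while a tag like 'p' that isn't in the mapping is left alone and renders as a regular element.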

The code that we want to parse is actually the children of <code class="language-javascript">, which is itself now the children of <CodeBlock />. So, inside our <CodeBlock /> component, we can access the code with props.children.props.children. It's a little cumbersome, but it works! All that's left to do now is call trim() on the code's text to remove any leading or trailing whitespace, then pass it to the <Highlight /> component:

import React from 'react';
import Highlight, { defaultProps } from 'prism-react-renderer';

export const CodeBlock = ({ children }) => {
  const code = children.props.children.trim();
  return (
    <Highlight {...defaultProps} code={code}>
      {/* ...render prop omitted */}
    </Highlight>
  );
};
jsx

If you use TypeScript, you may have noticed that code is type any by default, which is usually a no-no. This probably isn't an issue, because the way MDX works should ensure that whatever ends up getting parsed as children.props.children will be a string. But if you're concerned, you can feel free to add some type guards, to be safe.
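For the cautious, here's a minimal sketch of what such a guard could look like as a plain runtime check (a TypeScript type guard would follow the same idea). getCodeString is a hypothetical helper name, and returning an empty string for anything that isn't plain text is just one possible choice:

```javascript
// Hypothetical helper: pull the code string out of the element passed as `children`.
// Assumes `children` looks like a React element, i.e. an object with a `props` field.
function getCodeString(children) {
  const code = children?.props?.children;
  // Only call trim() when the content really is a string;
  // otherwise fall back to an empty string instead of throwing.
  return typeof code === 'string' ? code.trim() : '';
}
```

Inside <CodeBlock />, this would stand in for the direct children.props.children.trim() call.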

Getting the Language from MDX

Just like with the code, we also need to get the code's language from our MDX blog post, and pass it to <Highlight />.

As a reminder, here's our MDX code:

```javascript
console.log('Hello, world!');
```
markdown

And the rendered HTML:

<pre>
  <code class="language-javascript">
    console.log('Hello, world!');
  </code>
</pre>
html

As you can see, whatever language is specified after the triple backticks gets added to the <code> element's className, as class="language-<language>". So to parse the language, we'll just need a regular expression with a named capture group:

const className = children.props.className || 'language-markdown';
const match = className.match(/language-(?<lang>.*)/);
const language = match?.groups?.lang;
js

Unlike with code, here we need to be careful about empty values. If no language is specified in the MDX, the <code>'s className will be undefined, and trying to call match() on it will throw a TypeError. I've opted to fall back to 'language-markdown', to ensure that a valid language will be passed to <Highlight />, even if one isn't provided.
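If you'd like to try the parsing logic on its own, here's a small standalone sketch. getLanguage is a hypothetical helper name, and it differs slightly from the snippet above: it matches \S+ instead of .* so that any extra class names after the language are ignored, and it also falls back when a className exists but doesn't contain a language.

```javascript
// Hypothetical helper: extract the language from a class such as "language-javascript".
function getLanguage(className, fallback = 'markdown') {
  // `className` is undefined when no language follows the opening backticks,
  // so match against an empty string in that case.
  const match = (className || '').match(/language-(?<lang>\S+)/);
  // Use the fallback when there was no match at all.
  return match?.groups?.lang ?? fallback;
}
```

Calling getLanguage('language-javascript') gives 'javascript', and calling it with undefined gives the 'markdown' fallback.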

Now that we have a language, all that's left is to pass it to <Highlight />:

return (
  <Highlight {...defaultProps} code={code} language={language}>
    {/* ...render prop omitted */}
  </Highlight>
);
jsx

The Final <CodeBlock /> Component

Putting it all together, our shiny new <CodeBlock /> component looks like this:

import React from 'react';
import Highlight, { defaultProps } from 'prism-react-renderer';

export const CodeBlock = ({ children }) => {
  const code = children.props.children.trim();
  const className = children.props.className || 'language-markdown';
  const match = className.match(/language-(?<lang>.*)/);
  const language = match?.groups?.lang;

  return (
    <Highlight {...defaultProps} code={code} language={language}>
      {({ className, style, tokens, getLineProps, getTokenProps }) => (
        <pre className={className} style={style}>
          {tokens.map((line, i) => (
            <div {...getLineProps({ line, key: i })}>
              {line.map((token, key) => (
                <span {...getTokenProps({ token, key })} />
              ))}
            </div>
          ))}
        </pre>
      )}
    </Highlight>
  );
};
jsx

Customization

There are a lot of ways that you could customize your new syntax-highlighted code blocks. Most of them (like adding line numbers or diff highlighting) involve getting into the nitty-gritty of <Highlight />'s render function. That really deserves its own post, so I won't get into it here. But we can look at one really easy way of customizing our code blocks: themes.

Theme Customization

If you've followed the tutorial this far, you'll have noticed that your syntax-highlighted code blocks already have a color theme. This is because a default theme is included in {...defaultProps}, so the component can work out of the box, without the need to specify a theme. But you can optionally pick a different color theme if you want to customize your code blocks.

prism-react-renderer provides a variety of available themes, which can be imported in your component, and passed to <Highlight> as theme:

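import vsDark from 'prism-react-renderer/themes/vsDark';
// ...
// Pass theme after {...defaultProps} so it overrides the default theme.
<Highlight {...defaultProps} theme={vsDark} code={code} language={language}>
  {/* ...render prop omitted */}
</Highlight>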
jsx

But you're not limited to the built-in themes. Because prism-react-renderer uses Prism behind the scenes, you could also add a custom Prism CSS file for your theme. There are a bunch of great Prism themes available, so take a look if you're interested!

To add a custom Prism theme, you'll need to copy the CSS and include it somewhere in your project. You'll also need to pass theme={undefined} to <Highlight /> (after {...defaultProps}, so that it overrides the theme included there), to tell <Highlight /> not to use the default theme.
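If you'd rather stay in JavaScript, another option may be to write your own theme object. The sketch below assumes that a theme is a plain object with a plain key (base styles) and a styles list (token types mapped to style objects), which is how the built-in theme files appear to be structured. The colors are arbitrary placeholders, and styleFor is a hypothetical helper that only illustrates the lookup; it isn't part of prism-react-renderer.

```javascript
// Hypothetical custom theme; the colors are arbitrary placeholders.
// Assumption: a theme has `plain` (base styles) and `styles`
// (a list of token types, each mapped to a style object).
const myTheme = {
  plain: { color: '#e5e7eb', backgroundColor: '#1f2937' },
  styles: [
    { types: ['comment'], style: { color: '#9ca3af', fontStyle: 'italic' } },
    { types: ['string', 'number'], style: { color: '#86efac' } },
    { types: ['keyword', 'operator'], style: { color: '#93c5fd' } },
  ],
};

// Hypothetical helper (not part of the library): find the style a theme
// assigns to a token type, falling back to the plain style.
function styleFor(theme, tokenType) {
  const entry = theme.styles.find((s) => s.types.includes(tokenType));
  return entry ? entry.style : theme.plain;
}
```

You'd pass an object like this the same way as a built-in theme, with theme={myTheme} placed after {...defaultProps}.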

If you're curious, this blog uses a custom theme that I created with the help of the default Tailwind CSS color scheme, with all the colors selected to have a high enough contrast to be accessible.

Conclusion

And there you have it! You've successfully added some awesome syntax-highlighting to your code blocks!