Over the past week or so, I’ve been working to migrate my content from WordPress to Notion as a new data source for my Gatsby website. This post outlines why I decided to do that and the general process. Expect a few more entries in this series with more technical detail on certain parts of the implementation, but here’s the kickoff, where I’ll cover the history behind the project and how it all works in a nutshell.

History

I’ve tried several content management systems over the past few years (hell, I even built one myself from the ground up) and I kept coming back to WordPress. It has a great editor, and with the right plugin combination it’s super easy to write content and extend the default schema. It wasn’t without its little quirks, but overall it was a pretty nice process. As time went on, however, I noticed more and more of those quirks popping up.

Last week I hit a breaking point. I love WordPress as a CMS, but it was doing some funky things with code blocks on save, and it felt like I had to write small fixes for every post I wanted to ship. On top of that, there was something up with the Gatsby WordPress plugin that would cause my cache to break in my local dev environment AFTER the initial load. Combine that with my slow internet, and the whole thing kinda deterred me from even writing new content for my blog.

A few years ago, I was entertaining the idea of using Notion for my content, since I essentially run my entire life out of the platform. I track my projects and tasks, content I’m working on, family finances, and on and on. I even wrote an NPM package I planned to use that converts a Notion page to HTML to render in Gatsby. I decided to push forward on this just to see how it would work, and I’m pleasantly surprised with the results.

How it works

There is a file used in Gatsby projects called gatsby-node.js which can be used to load data into the GraphQL data layer, as well as to generate static pages from dynamic content. Ultimately, I needed to pull the records from a database in Notion into this data layer; from there I could do whatever I wanted with them.
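The loading half is what the next couple of sections cover. The page-generation half isn’t shown in this post, but once the Notion nodes exist in the data layer it looks roughly like this (the template path and URL structure below are placeholders, not necessarily my actual setup):

exports.createPages = async ({ graphql, actions }) => {
  // Grab every published post that sourceNodes loaded into the data layer
  const { data } = await graphql(`
    {
      allNotionPost(filter: { status: { eq: "Published" } }) {
        nodes {
          id
          slug
        }
      }
    }
  `)

  // Create a page per post, handing the node id to a template component
  data.allNotionPost.nodes.forEach(node => {
    actions.createPage({
      path: `/post/${node.slug}`,                             // placeholder URL structure
      component: require.resolve('./src/templates/post.js'),  // placeholder template path
      context: { id: node.id },
    })
  })
}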

Loading the data

To load the data, I’m using the sourceNodes function along with the official Notion SDK for JavaScript. This calls the Notion API and pulls down the data in Notion’s native block format. I’ve got a single function called loadNotionContent that I can pass a Notion database ID along with a “type” that is used by the GraphQL layer in Gatsby. At the time of this writing, I’m pulling down data for my posts, post series, and portfolio items.

// The official Notion SDK for JavaScript
const { Client } = require('@notionhq/client')
const notion = new Client({ auth: process.env.NOTION_TOKEN })

exports.sourceNodes = async ({ actions, createNodeId, createContentDigest }) => {
  await loadNotionContent('notionPost', process.env.NOTION_CMS_DBID, actions, createNodeId, createContentDigest)
  await loadNotionContent('notionPortfolioItem', process.env.NOTION_PORTFOLIOITEMS_DBID, actions, createNodeId, createContentDigest)
  await loadNotionContent('notionSeries', process.env.NOTION_SERIES_DBID, actions, createNodeId, createContentDigest)
};

async function loadNotionContent(type, dbid, actions, createNodeId, createContentDigest) {
  // Pull the records for this database from the Notion API
  const { results } = await notion.databases.query({
    database_id: dbid
  })

  // Flatten Notion's block format into plain objects (covered below)
  let normalized = await processNotionContent(type, results)

  // Create a Gatsby node for each record so it lands in the GraphQL layer
  normalized.forEach(n => {
    actions.createNode({
      ...n,
      id: createNodeId(n.id),
      internal: {
        type,
        contentDigest: createContentDigest(n)
      }
    })
  })
}

Transforming the data

The data pulled from Notion is in a complex JSON structure that represents the block system on a Notion page. I wanted this in a flattened format so it’s easier to work with, and processNotionContent does just that. Here’s a sample of parsing a few properties based on their type in Notion:

// p is a page record from the database query; prop is one of its properties,
// and fieldName is the camel-cased version of the property's name
let n = {
  id: p.id,
  notion_id: p.id
}

// Rich text properties: grab the plain text of the first segment
if(prop.type === "rich_text") {
  n[fieldName] = ""
  if(fieldName === "slug") {
    if(prop.rich_text.length > 0) {
      n.slug = prop.rich_text[0].text.content
    }
  } else if(prop.rich_text.length > 0) {
    // TODO: Flatten this
    n[fieldName] = prop.rich_text[0].text.content
  }
}

// Status properties: use the status name, defaulting to an empty string
if(prop.type === "status") {
  n[fieldName] = ""
  if(prop?.status?.name) {
    n[fieldName] = prop.status.name
  }
}

// URL properties map straight across
if(prop.type === "url") {
  n[fieldName] = prop.url
}

The fieldName is a camel-cased version of the property’s name in the Notion database. For example, I have a property in my Notion database named YouTube URL that links related videos to a post. The last block in the snippet above flattens it, so the JSON ends up looking something like this:

// Notion block
"YouTube URL": {
	"type": "url",
	"url": "https://www.youtube.com/watch?v=uFKRHABC1gg"
}

// Transformed
{
	"youTubeUrl": "https://www.youtube.com/watch?v=uFKRHABC1gg"
}
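The conversion itself is just string munging; a small helper along these lines does the trick (this is a sketch of the idea rather than my exact implementation):

// Sketch of the property-name conversion, e.g. "YouTube URL" -> youTubeUrl,
// "Publish On" -> publishOn; not the exact code I'm running.
function toFieldName(propertyName) {
  return propertyName
    .split(/\s+/)
    .map((part, i) =>
      i === 0
        ? part.charAt(0).toLowerCase() + part.slice(1)
        : part.charAt(0).toUpperCase() + part.slice(1).toLowerCase()
    )
    .join('')
}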

The other thing that processNotionContent does is use an extended version of my NPM package to pull the contents of the page from the database and convert it to HTML.

let converter = new NotionToHtmlClient(process.env.NOTION_TOKEN)

// Get the page content as both rendered HTML and the raw block data
let { html, raw } = await converter.generate(p.id, { html: true, raw: true })
n.html = html

Finally, images posed an issue: Notion serves them as pre-signed S3 URLs with a 24-hour lifetime, meaning I couldn’t just use them in the HTML I write to my website. What I did was write some caching logic that downloads the images into the repository and replaces the image src attributes with the local versions.

// This is in the `processNotionContent` function
n.html = await cacheImagesAndUpdateHtml(n.slug, n.html)

// This is the above function, elsewhere in the file
async function cacheImagesAndUpdateHtml(slug, html) {
  // Collect the src attribute of every <img> tag in the generated HTML
  const regexp = /<img.*?src=['"](.*?)['"].*?>/g;
  const matches = [...html.matchAll(regexp)];
  const imgUrls = []
  matches.forEach(m => {
    if(m[1]) {
      imgUrls.push(m[1])
    }
  })

  for(let i = 0; i < imgUrls.length; i++) {
    let imageUrl = imgUrls[i]

    // Download the image locally and swap the expiring S3 url for the local path
    let src = await cacheImage(slug, imageUrl)
    html = html.replace(imageUrl, src)
  }

  return html
}
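The cacheImage function isn’t shown here, but the idea is simple: derive a stable filename from the image’s S3 key, download the file into the repo if it isn’t there already, and return the local path. A rough sketch (the static/notion-images folder layout is a placeholder, and this assumes Node 18+ for the global fetch):

const fs = require('fs')
const path = require('path')
const crypto = require('crypto')

async function cacheImage(slug, imageUrl) {
  // Hash the S3 object path (ignoring the expiring query string) for a stable filename
  const key = new URL(imageUrl).pathname
  const hash = crypto.createHash('md5').update(key).digest('hex')
  const fileName = `${hash}${path.extname(key) || '.png'}`

  // Gatsby copies anything under `static` straight to the site root at build time
  const dir = path.join('static', 'notion-images', slug)
  const filePath = path.join(dir, fileName)

  // Only download if a previous build hasn't already cached this image
  if (!fs.existsSync(filePath)) {
    fs.mkdirSync(dir, { recursive: true })
    const res = await fetch(imageUrl)
    fs.writeFileSync(filePath, Buffer.from(await res.arrayBuffer()))
  }

  return `/notion-images/${slug}/${fileName}`
}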

I’m also caching the objects generated by all of this so I don’t need to grab EVERYTHING every time I build, but I’ll dive into that more in a future post.

The results

The end result was faster build times, both locally and in my GitHub Actions workflow, as well as a much simpler editing experience. After all, I’m literally writing this in Notion at this very moment, which is an experience I absolutely love. The resulting website logic is much simpler too: I didn’t need as many shims to generate code blocks, and querying data from the GraphQL layer became easier and more consistent. Here’s an example query for pulling published posts:

{
  allNotionPost(sort: {publishOn: DESC}, filter: {status:{ eq:"Published"}}) {
    edges {
      node {
        id
        slug
        icon
        title
        category
        publishOn(formatString: "MMMM DD, YYYY")
        series {
          icon
        }
      }
    }
  }
}
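A query like that sits in a page component as a standard Gatsby page query. Roughly speaking, it gets consumed like this (the markup below is a simplified placeholder, and the query is a trimmed version of the one above):

import * as React from 'react'
import { graphql, Link } from 'gatsby'

// Simplified placeholder component; the real page renders a lot more than this
const BlogIndex = ({ data }) => (
  <ul>
    {data.allNotionPost.edges.map(({ node }) => (
      <li key={node.id}>
        <Link to={`/post/${node.slug}`}>
          {node.icon} {node.title} ({node.publishOn})
        </Link>
      </li>
    ))}
  </ul>
)

// Trimmed version of the query shown above
export const query = graphql`
  {
    allNotionPost(sort: { publishOn: DESC }, filter: { status: { eq: "Published" } }) {
      edges {
        node {
          id
          slug
          icon
          title
          publishOn(formatString: "MMMM DD, YYYY")
        }
      }
    }
  }
`

export default BlogIndex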

What’s next

My website’s functionality is back to where it started, but now that I know I’ll have an easier time building on it going forward, here are a few things I plan to do:

  • Build a better about me page with more details on my employment history
  • Link portfolio items to various employers
  • Create separate category & series pages for my blog
  • Showcase my YouTube videos better