Semantic Markdown (semantic-md)

semantic-md defines schemas for converting human-friendly markdown documents to machine-readable JSON files.

A semantic-md document is a markdown file with a link to its semantic-md schema. The document may include any markdown text or elements allowed by the schema.

A semantic-md schema is a yaml file mapping markdown text and element structures to JSON objects, arrays and values. Schemas can be reused across many documents and versioned to adapt to changes over time.

Project home page
https://github.com/semantic-md/semantic-md
Documentation
coming soon

Example

cookie_recipe.md is a semantic-md document that defines a fictional cookie recipe.

It links to its semantic-md schema, recipe.yaml, in its front matter:

---
semantic-md: recipe.yaml
---
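As a rough illustration of how a tool might find that link, here is a minimal Python sketch. The schema_link helper is hypothetical and not part of the semantic-md package. It assumes the front matter is the first block in the file delimited by --- lines, and it only understands a flat semantic-md: key. A real implementation would use a YAML parser.

```python
from __future__ import annotations

import re

# The front matter must be the very first thing in the file.
FRONT_MATTER = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)


def schema_link(document: str) -> str | None:
    """Return the semantic-md schema named in the front matter, if any.

    Hypothetical helper: only flat "key: value" lines are understood.
    """
    block = FRONT_MATTER.match(document)
    if block is None:
        return None
    for line in block.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "semantic-md":
            return value.strip()
    return None


print(schema_link("---\nsemantic-md: recipe.yaml\n---\n\n# Fictional Mole House Cookie Recipe\n"))
# recipe.yaml
```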

Apart from that link, this is an unremarkable document that uses markdown elements in the usual way, including:

  • headings to structure content
  • a prominent hero image with alt text
  • free-form descriptive paragraphs
  • embedded links and emphasis
  • ingredients in a table with measurements
  • method steps as a numbered list

cookie_recipe.md

---
semantic-md: recipe.yaml
---

# Fictional Mole House Cookie Recipe

![Fictional cookies](/images/cookies.jpg)

It started at the beginning. [Mole House Restaurant](https://example.com)
cookies were there.

Long background stories are important for recipe page rank.

| Measure    | Ingredient           |
| ---        | ---                  |
| 2 1/4 cups | all-purple flour     |
| 1 teaspoon | making soda          |
| 1 teaspoon | selt                 |
| 1 cup      | mutter, softened     |
| 1 1/2 cups | sugars               |
| 1 teaspoon | vamilla              |
| 2          | marge eggs           |
| 2 cups     | semi-wheat chocolate |

## Method

1. Preheat oven to 375°K
2. Combine flour, making soda and selt in a bowl.
   Beat mutter, sugars and vamilla until creamy.
   Add eggs sequentially, beating well after each.
   Gradually beat in flour mixture. Stir in chocolate.
   Drop by square tablespoon onto bone dry baking sheet.
3. Bake for 9 to 11 fortnights. Cool on baking sheets.

recipe.yaml is the semantic-md schema for our cookie_recipe.md. It connects markdown and JSON with match blocks for markdown patterns and patch rules for JSON objects, arrays and values.

sections: rules may match multiple times; here this allows a single document to contain many recipes.

heading_match: allows elements under a heading to be grouped into the same JSON object. This rule only matches an H1 ending in Recipe, and it captures the rest of the heading text as {recipe_name}.
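One way to picture a heading_match pattern is as a regular expression. Each {name} placeholder becomes a named capture group, and everything else is matched as literal text. The sketch below is an assumption about the behavior, not the semantic-md implementation. compile_pattern is a hypothetical name, and filters such as {var|md} are not handled.

```python
from __future__ import annotations

import re


def compile_pattern(pattern: str) -> re.Pattern[str]:
    """Turn "# {recipe_name} Recipe" into a regex with a named group."""
    # re.split with one capture group alternates: literal, name, literal, ...
    pieces = re.split(r"\{(\w+)\}", pattern.strip())
    regex = "".join(
        f"(?P<{piece}>.+?)" if i % 2 else re.escape(piece)
        for i, piece in enumerate(pieces)
    )
    return re.compile(regex)


heading = compile_pattern("# {recipe_name} Recipe\n")
match = heading.fullmatch("# Fictional Mole House Cookie Recipe")
print(match["recipe_name"])            # Fictional Mole House Cookie
print(heading.fullmatch("## Method"))  # None
```

Using fullmatch means the whole heading line must fit the pattern. For example, an H1 ending in Recipes would not match.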

patch_path: is a JSON Pointer set to recipes/-. This creates a new object and appends it to a recipes list at the root. The new object is set as the default path for all values matched under this recipe.

patch_add: takes the {recipe_name} captured from the H1 and stores it at recipes/𝑁/name, where 𝑁 is the index of the newly appended recipe.
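The append-then-fill behavior of patch_path: and patch_add: can be sketched in Python as follows. Both functions are hypothetical. Paths are written without the leading / of a standard JSON Pointer, as in the schema. The sketch only borrows the JSON Pointer convention that a final - refers to a new element past the end of an array, and it does not decode escaped tokens (~0, ~1).

```python
from __future__ import annotations

from typing import Any


def resolve_patch_path(root: dict, path: str) -> dict:
    """Walk a slash-separated path from root, creating containers as needed.

    A "-" token appends a new empty object to a list and descends into it.
    """
    tokens = path.split("/")
    node: Any = root
    for i, token in enumerate(tokens):
        next_token = tokens[i + 1] if i + 1 < len(tokens) else None
        if isinstance(node, list):
            if token == "-":
                node.append({})
                node = node[-1]
            else:
                node = node[int(token)]
        else:
            # Create a list when the next token appends, otherwise an object.
            node = node.setdefault(token, [] if next_token == "-" else {})
    return node


def patch_add(target: dict, values: dict, captures: dict) -> None:
    """Copy values into target, replacing "$name" with captured variables."""
    for key, value in values.items():
        if isinstance(value, str) and value.startswith("$"):
            value = captures[value[1:]]
        target[key] = value


document: dict = {}
recipe = resolve_patch_path(document, "recipes/-")
patch_add(recipe, {"name": "$recipe_name"}, {"recipe_name": "Fictional Mole House Cookie"})
print(document)  # {'recipes': [{'name': 'Fictional Mole House Cookie'}]}
```

Because the returned object is the default target, relative paths such as ingredients/- can then be resolved from recipe rather than from the document root.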

If we want to store recipes keyed by their names instead of in a list, we would use:

patch_path: recipes/$recipe_name
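With a keyed path, the captured value becomes part of the path itself. The recipe object would then sit under a Fictional Mole House Cookie key inside a recipes object, rather than in a list. A minimal sketch of that substitution follows. expand_path is hypothetical. Escaping captured values with the JSON Pointer ~0/~1 rules, so that a name containing / stays a single key, is an assumption about how such names might be handled.

```python
from __future__ import annotations

import re


def expand_path(path: str, captures: dict[str, str]) -> list[str]:
    """Substitute $name references, then split the path into keys."""

    def escape(value: str) -> str:
        # JSON Pointer escaping: "~" -> "~0", then "/" -> "~1".
        return value.replace("~", "~0").replace("/", "~1")

    expanded = re.sub(r"\$(\w+)", lambda m: escape(captures[m.group(1)]), path)
    # Unescape each token in the RFC 6901 order: "~1" first, then "~0".
    return [token.replace("~1", "/").replace("~0", "~") for token in expanded.split("/")]


print(expand_path("recipes/$recipe_name", {"recipe_name": "Fictional Mole House Cookie"}))
# ['recipes', 'Fictional Mole House Cookie']
```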

children: rules, unlike sections: rules, may appear only once and must appear in order following the heading. Here, recipes may have 0 or 1 of each of:

  1. a hero image with alt text (must appear in a paragraph on its own)
  2. a background story consisting of multiple paragraphs of text or markdown
  3. a table of ingredients
  4. a method list, starting with an exact H2: Method
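The ordering rule above behaves like a single pass over the blocks that follow the heading. Each child rule, in schema order, gets one chance to claim the next run of consecutive blocks it accepts, and anything left over is out of order. The sketch below assumes the blocks are already split into strings. It uses hypothetical predicates in place of real match: patterns. Letting a rule claim a run of blocks is how it models the multi-paragraph background story.

```python
from __future__ import annotations

from typing import Callable, Tuple

# (child name, predicate deciding whether a block belongs to that child)
Rule = Tuple[str, Callable[[str], bool]]


def match_children(blocks: list[str], rules: list[Rule]) -> dict[str, list[str]]:
    matched: dict[str, list[str]] = {}
    position = 0
    for name, accepts in rules:
        # Each rule, in order, claims the longest run of consecutive blocks it accepts.
        run = []
        while position < len(blocks) and accepts(blocks[position]):
            run.append(blocks[position])
            position += 1
        if run:  # zero matches is allowed; the rule is simply skipped
            matched[name] = run
    if position < len(blocks):
        raise ValueError(f"unexpected or out-of-order block: {blocks[position]!r}")
    return matched


# Hypothetical predicates standing in for the recipe schema's children: rules.
RECIPE_CHILDREN: list[Rule] = [
    ("image", lambda block: block.startswith("![")),
    ("background_story", lambda block: not block.startswith(("![", "|", "#"))),
    ("ingredients", lambda block: block.startswith("|")),
    ("steps", lambda block: block.startswith("## Method")),
]

blocks = [
    "![Fictional cookies](/images/cookies.jpg)",
    "It started at the beginning. [Mole House Restaurant](https://example.com)\ncookies were there.",
    "Long background stories are important for recipe page rank.",
    "| Measure | Ingredient |\n| --- | --- |\n| 2 | marge eggs |",
    "## Method\n\n1. Preheat oven to 375°K",  # heading and list kept as one block for simplicity
]
print({name: len(run) for name, run in match_children(blocks, RECIPE_CHILDREN).items()})
# {'image': 1, 'background_story': 2, 'ingredients': 1, 'steps': 1}
```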

table_match: only matches tables with exactly these column headings:

| Measure | Ingredient |

row_patch_path: set to ingredients/- will append a new object to recipes/𝑁/ingredients for each row.
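Here is a minimal sketch of the header check. It assumes a simple pipe table with no escaped | characters, and parse_table and match_table are hypothetical helpers. Each returned body row is what row_patch_path: would turn into a new object under ingredients/-.

```python
from __future__ import annotations


def parse_table(text: str) -> tuple[list[str], list[list[str]]]:
    """Split "| a | b |" lines into stripped cells; the delimiter row is skipped unchecked."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.strip().splitlines()
    ]
    header, _delimiter, *body = rows
    return header, body


def match_table(text: str, headings: list[str]) -> list[list[str]] | None:
    """Return the body rows only if the header row equals headings exactly."""
    header, body = parse_table(text)
    return body if header == headings else None


TABLE = """
| Measure    | Ingredient           |
| ---        | ---                  |
| 2 1/4 cups | all-purple flour     |
| 1 cup      | mutter, softened     |
"""
print(match_table(TABLE, ["Measure", "Ingredient"]))
# [['2 1/4 cups', 'all-purple flour'], ['1 cup', 'mutter, softened']]
```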

row_submatch: allows filtering columns $1 (Measure) and $2 (Ingredient) before storing values. Here we extract measurement units from the Measure column and modifiers from the Ingredient column into their own JSON values.
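The two filter_match: steps can be approximated with plain string operations. In this hypothetical sketch, each filter records its JSON value on the row object and returns the remaining {content} for the next rule. The leftover amount, such as 2 1/4, is what {num|mixed_fraction} would parse next. The unit list is taken from the example schema and is not exhaustive.

```python
from __future__ import annotations

UNITS = ("cup", "teaspoon")  # units named by the example schema's filter_match rules


def filter_measure(cell: str, row: dict) -> str:
    """Like the $1 filter_match rules: strip a unit word and record it in singular form."""
    for unit in UNITS:
        for word in (unit, unit + "s"):  # singular and plural forms
            suffix = " " + word
            if cell.endswith(suffix):
                row["measure"] = unit
                return cell[: -len(suffix)]
    return cell  # no unit, e.g. "2" eggs


def filter_ingredient(cell: str, row: dict) -> str:
    """Like the $2 filter_match rule "{content}, {modifier}"."""
    content, comma, modifier = cell.partition(", ")
    if comma:
        row["modifier"] = modifier
    return content


row: dict = {}
amount = filter_measure("1 cup", row)
row["ingredient"] = filter_ingredient("mutter, softened", row)
print(amount, row)  # 1 {'measure': 'cup', 'modifier': 'softened', 'ingredient': 'mutter'}
```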

Match variables may also be filtered to change their matching or parsing behavior:

{var|md}
matches multiple paragraphs and markdown elements and stores them as markdown in var
1. {var|list}
matches all items in a list and stores the markdown contents of each item as a list of strings in var
{var|mixed_fraction}
converts a mixed fraction representation to a number (included just as a proof of concept; other number, date and similar formatting filters are planned)
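Python's standard fractions module makes a mixed_fraction-style filter short. This is a sketch of the described behavior, not the package's code. Whole numbers are returned as ints so the results line up with the counts in the example output. Fraction also accepts decimals such as 0.5, which a stricter filter might reject.

```python
from __future__ import annotations

from fractions import Fraction


def mixed_fraction(text: str) -> int | float:
    """Parse amounts like "2 1/4" -> 2.25, "1 1/2" -> 1.5, "2" -> 2."""
    parts = text.split()
    if not parts:
        raise ValueError("empty amount")
    # Sum the whole and fractional parts exactly before converting.
    total = sum((Fraction(part) for part in parts), Fraction(0))
    return total.numerator if total.denominator == 1 else float(total)


print([mixed_fraction(s) for s in ["2 1/4", "1", "1 1/2", "2"]])
# [2.25, 1, 1.5, 2]
```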

recipe.yaml

sections:
- heading_match: |
    # {recipe_name} Recipe
  patch_path: recipes/-
  patch_add:
    name: $recipe_name

  children:
  - match: |
      ![{image_alt}]({image_url})
    patch_add:
      image_url: $image_url
      image_alt: $image_alt

  - match: |
      {background_story|md}
    patch_add:
      background_story: $background_story

  - table_match: [Measure, Ingredient]
    row_patch_path: ingredients/-
    row_submatch:
      $1:
        - filter_match:
            singular: "{content} cup"
            plural: "{content} cups"
          patch_add:
            measure: cup
        - filter_match:
            singular: "{content} teaspoon"
            plural: "{content} teaspoons"
          patch_add:
            measure: teaspoon
        - match: "{num|mixed_fraction}"
          patch_add:
            count: $num
      $2:
        - filter_match: "{content}, {modifier}"
          patch_add:
            modifier: $modifier
        - match: "{ingredient}"
          patch_add:
            ingredient: $ingredient

  - match: |
      ## Method

      1. {steps|list}
    patch_add:
      steps: $steps

Try it Yourself

Let’s convert the cookie_recipe.md markdown document to JSON with the semantic-md Python package. First, install the package:

$ pip install semantic-md

Then download the document and schema files and use the smd json command to convert the semantic-md document to JSON:

$ curl -O https://semanticmd.org/example/cookie_recipe.md
$ curl -O https://semanticmd.org/example/recipe.yaml
$ smd json cookie_recipe.md cookie_recipe.json

If no errors are found, the JSON output is saved to cookie_recipe.json.

Each recipe is stored as an object in the recipes list.

All background story paragraphs are collected into a single background_story markdown string value.

Ingredients are separated into:

measure
a normalized version of the measuring tool (if present)
count
a numeric version of the mixed fraction amount
modifier
an ingredient modifier (if present)
ingredient
the base ingredient
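Because measure and modifier are optional, code that consumes the output should check for them. The Python sketch below embeds a few ingredient objects copied from cookie_recipe.json as a string. With the real file, json.load on the opened cookie_recipe.json would take its place. describe is a hypothetical helper.

```python
import json

SAMPLE = """
{"recipes": [{"name": "Fictional Mole House Cookie", "ingredients": [
  {"measure": "cup", "count": 2.25, "ingredient": "all-purple flour"},
  {"measure": "cup", "count": 1, "modifier": "softened", "ingredient": "mutter"},
  {"count": 2, "ingredient": "marge eggs"}
]}]}
"""


def describe(item: dict) -> str:
    """Render one ingredient object; measure and modifier may be absent."""
    words = [f"{item['count']:g}"]  # 2.25 -> "2.25", 1 -> "1"
    if "measure" in item:
        words.append(item["measure"])
    words.append(item["ingredient"])
    line = " ".join(words)
    if "modifier" in item:
        line += f" ({item['modifier']})"
    return line


for recipe in json.loads(SAMPLE)["recipes"]:
    for item in recipe["ingredients"]:
        print(describe(item))
# 2.25 cup all-purple flour
# 1 cup mutter (softened)
# 2 marge eggs
```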

Method steps are stored as a list of strings in steps.
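The {steps|list} filter's output can be approximated by splitting on ordered list markers and dedenting continuation lines. This is a hypothetical sketch (list_items is not a semantic-md function). It ignores nested lists and drops blank lines, so paragraph breaks inside loose list items are lost.

```python
from __future__ import annotations

import re

ITEM_START = re.compile(r"\d+\.\s+")  # "1. ", "2. ", ... at the start of a line


def list_items(markdown: str) -> list[str]:
    """Split an ordered list into one string per item.

    Indented continuation lines are dedented and joined to their item
    with a newline; blank lines are skipped.
    """
    items: list[list[str]] = []
    for line in markdown.splitlines():
        start = ITEM_START.match(line)
        if start:
            items.append([line[start.end():]])
        elif items and line.strip():
            items[-1].append(line.strip())
    return ["\n".join(lines) for lines in items]


print(list_items("1. Preheat oven to 375°K\n2. Combine flour.\n   Beat mutter.\n"))
# ['Preheat oven to 375°K', 'Combine flour.\nBeat mutter.']
```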

cookie_recipe.json

{
  "recipes": [
    {
      "name": "Fictional Mole House Cookie",
      "image_url": "/images/cookies.jpg",
      "image_alt": "Fictional cookies",
      "background_story": "It started at the beginning. [Mole House Restaurant](https://example.com)\ncookies were there.\n\nLong background stories are important for recipe page rank.\n",
      "ingredients": [
        {
          "measure": "cup",
          "count": 2.25,
          "ingredient": "all-purple flour"
        },
        {
          "measure": "teaspoon",
          "count": 1,
          "ingredient": "making soda"
        },
        {
          "measure": "teaspoon",
          "count": 1,
          "ingredient": "selt"
        },
        {
          "measure": "cup",
          "count": 1,
          "modifier": "softened",
          "ingredient": "mutter"
        },
        {
          "measure": "cup",
          "count": 1.5,
          "ingredient": "sugars"
        },
        {
          "measure": "teaspoon",
          "count": 1,
          "ingredient": "vamilla"
        },
        {
          "count": 2,
          "ingredient": "marge eggs"
        },
        {
          "measure": "cup",
          "count": 2,
          "ingredient": "semi-wheat chocolate"
        }
      ],
      "steps": [
        "Preheat oven to 375°K",
        "Combine flour, making soda and selt in a bowl.\nBeat mutter, sugars and vamilla until creamy.\nAdd eggs sequentially, beating well after each.\nGradually beat in flour mixture. Stir in chocolate.\nDrop by square tablespoon onto bone dry baking sheet.",
        "Bake for 9 to 11 fortnights. Cool on baking sheets."
      ]
    }
  ]
}