SiteGen: New Website Structure and Further Developments

c/c++

html

Markdown

Static Site Generator

2025-03-13

I concluded that having one page per category wasn't sustainable, and decided to go with the classic one page per post that you see in most blogs and article-based websites.
I prefer this setup. I think it looks better, and it has practical benefits: every page can now define its own Open Graph parameters, such as title and thumbnail image, which are displayed when the page is linked from social media.

In addition, the landing page now looks a lot more interesting and makes content easier to find.

Over the past couple of months I've continued developing "SiteGen", my static site generator, to make all of this happen. In this post I'll summarize the changes.

Recap

Scopes

When output is being generated, the parsed template structure reads from a hierarchy that contains all input data, the "input tree".
The basic building block of a template is the "scope". Scopes are enclosed with {} and are always preceded by an "identifier" [...] naming the "scope variable" to use. Children of scope variables are always "instances".
The template scope will run for each instance of the scope variable.

[Input]
{
	<div>[Name]</div>
	<div>[Content]</div>
}

[Name] and [Content] are instance variables.

Input

An "input" in SiteGen will define a scope variable whose instances will correspond to each input file.

The most basic invocation of SiteGen would look like this.

./SiteGen -i Articles/SomeArticle.md -o ../Site/SomeArticle/index.html -t Template.html

This takes one input file and produces one output file based on the provided template.
If the input path were instead a directory, then every file contained within would produce an instance, provided it satisfies some criteria (currently just the file extension).
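As an illustration only, the file-or-directory rule could be sketched like this with std::filesystem. MatchesExtension and CollectInputFiles are hypothetical names, and the sorting and single-file behaviour are assumptions, not SiteGen's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: true if the file name ends in ".<ext>" (ext without the dot).
bool MatchesExtension(const fs::path& file, const std::string& ext)
{
    return file.extension().string() == "." + ext;
}

// Hypothetical helper: returns the files that would each become one instance.
std::vector<fs::path> CollectInputFiles(const fs::path& input, const std::string& ext)
{
    assert(!ext.empty());
    std::vector<fs::path> files;
    if (fs::is_directory(input))
    {
        for (const fs::directory_entry& entry : fs::directory_iterator(input))
        {
            if (entry.is_regular_file() && MatchesExtension(entry.path(), ext))
                files.push_back(entry.path());
        }
        std::sort(files.begin(), files.end()); // Iteration order is unspecified.
    }
    else if (fs::is_regular_file(input))
    {
        files.push_back(input); // Assumption: a single file is taken as given.
    }
    return files;
}
```

Calling CollectInputFiles("../Articles/", "md") would then yield one path per future instance, while a path that does not exist yields an empty list.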

New Features

Argument Parameters

"Arguments" can now take multiple values in the form of "parameters".

-i name=Input path=../Articles/ type=md ext=md

Most parameters have default values which satisfy the common use case, so there's no need to set all of them.
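A minimal sketch of how key=value parameters with defaults could be parsed, assuming parameters are separated by whitespace and values contain no spaces. ParseParameters is a hypothetical name, and the default values in the usage note are made up for the example.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: splits "key=value key=value" into a map.
// The caller passes in the defaults; explicit parameters overwrite them.
std::map<std::string, std::string> ParseParameters(
    const std::string& text,
    std::map<std::string, std::string> params = {})
{
    std::istringstream stream(text);
    std::string token;
    while (stream >> token)
    {
        const std::size_t eq = token.find('=');
        if (eq == std::string::npos || eq == 0)
            continue; // Not a key=value token; a real parser would report it.
        params[token.substr(0, eq)] = token.substr(eq + 1);
    }
    return params;
}
```

With defaults of {type: md, ext: txt}, parsing "name=Input path=../Articles/ ext=md" keeps the default type and overrides ext.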

Multiple Outputs

Previously I had implemented support for multiple inputs. While you can still only define one output argument, it is now possible to output multiple files, one for each instance.

-o multi=Input

The multi parameter will specify the input to use.
The output routine will be dispatched for each instance of the named input and any template scope of the input will only run for that instance.
All other input scopes will loop through all of their instances as usual.

In this "mode", the given output path will be interpreted as a template, making it possible to generate unique and arbitrary paths for each instance.

path=Category/[Input]{[LinkName]}/index.html
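To make the path template concrete, here is a rough sketch of expanding it for a single instance. It assumes exactly one non-nested [Input]{...} scope in the path and plain string variables; Substitute, ExpandOutputPath and the Variables alias are hypothetical and much simpler than a real template parser.

```cpp
#include <cassert>
#include <map>
#include <string>

using Variables = std::map<std::string, std::string>;

// Replaces every "[Key]" with its value; unknown identifiers are left untouched.
std::string Substitute(const std::string& text, const Variables& vars)
{
    std::string out;
    std::size_t pos = 0;
    while (pos < text.size())
    {
        const std::size_t open = text.find('[', pos);
        const std::size_t close =
            (open == std::string::npos) ? std::string::npos : text.find(']', open);
        if (close == std::string::npos)
        {
            out.append(text, pos, std::string::npos); // No further identifiers.
            break;
        }
        out.append(text, pos, open - pos);
        const auto it = vars.find(text.substr(open + 1, close - open - 1));
        if (it != vars.end())
            out += it->second;
        else
            out.append(text, open, close - open + 1);
        pos = close + 1;
    }
    return out;
}

// Expands the single "[input]{...}" scope of a path template for one instance.
std::string ExpandOutputPath(const std::string& pathTemplate,
                             const std::string& input,
                             const Variables& instance)
{
    const std::string head = "[" + input + "]{";
    const std::size_t begin = pathTemplate.find(head);
    if (begin == std::string::npos)
        return pathTemplate; // No scope: the path is used as written.
    const std::size_t bodyStart = begin + head.size();
    const std::size_t end = pathTemplate.find('}', bodyStart);
    if (end == std::string::npos)
        return pathTemplate; // Unterminated scope; a real parser would report it.
    return pathTemplate.substr(0, begin)
         + Substitute(pathTemplate.substr(bodyStart, end - bodyStart), instance)
         + pathTemplate.substr(end + 1);
}
```

For an instance whose LinkName is "sitegen-update" (a made-up value), the template above would expand to Category/sitegen-update/index.html.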

Instance Sibling Access

Inside a scope you can now access the variables of the instances directly adjacent to the current one in the loop. In other words, the previous and the next.

[Input]
{
	<div class="Post">
		<div>[Name]</div>
		<div>[Content]</div>
		[<]{<a href="[LinkName]">[Name]</a>}
		[>]{<a href="[LinkName]">[Name]</a>}
	</div>
}

[<] signifies previous and [>] signifies next. These identifiers must be followed by a scope in which to access the instance in question. If the instance doesn't exist then we simply skip past it without generating anything.

I use this feature to link between posts.
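The behaviour of [<] and [>] can be sketched as a lookup of the adjacent elements in an ordered list, where a missing neighbour simply produces no output. The Instance struct and function names below are hypothetical, and instances are assumed to be stored in a flat, already ordered vector.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical instance holding just the two variables used by the links.
struct Instance
{
    std::string name;
    std::string linkName;
};

// Returns the previous instance, or nullptr if i is the first (or out of range).
const Instance* Previous(const std::vector<Instance>& instances, std::size_t i)
{
    return (i > 0 && i < instances.size()) ? &instances[i - 1] : nullptr;
}

// Returns the next instance, or nullptr if i is the last (or out of range).
const Instance* Next(const std::vector<Instance>& instances, std::size_t i)
{
    return (i + 1 < instances.size()) ? &instances[i + 1] : nullptr;
}

// Emits one link per existing sibling; a missing sibling generates nothing.
std::string RenderSiblingLinks(const std::vector<Instance>& instances, std::size_t i)
{
    std::string html;
    for (const Instance* sibling : { Previous(instances, i), Next(instances, i) })
    {
        if (sibling)
            html += "<a href=\"" + sibling->linkName + "\">" + sibling->name + "</a>";
    }
    return html;
}
```

The first and last posts therefore get a single link each, and every post in between gets two.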

Scope Recursion and Section Hierarchy

Each post page now features a navigation panel listing links to every section within the post. In order to produce an HTML structure that would indent the links according to the section hierarchy, I found that I needed a way to run the scope recursively.

<div class="Panel">
[Input]
{
	[Section]
	{
		<a href="[Name]">[Name]</a>
		[:Section]{<div class="Indent">}
		[^]
		[:Section]{</div>}
	}
}
</div>

[:Section] is a "define" scope. If the current instance has a variable named "Section" then run what's in the scope.
[^] is the new recursion identifier. If the current instance has a scope variable of the same name as its parent, rerun the parent scope using the child instances, returning to [^] where we left off once completed.

Not the most elegant solution as we're checking for "Section" three times in this case but it does the job.

The Section scope variable is generated by the Markdown parser and inserted into the input tree.
Every instance can in turn also contain a Section scope variable with its own set of instances. The Name variable holds the header text.
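For comparison, this is roughly what the recursive template amounts to when written as a plain recursive function over a section tree. The Section struct and RenderSections are hypothetical and only mirror the template shown earlier; they are not how SiteGen evaluates scopes internally.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mirror of the Section scope variable: a name plus child sections.
struct Section
{
    std::string name;
    std::vector<Section> sections;
};

// Emits a link per section and wraps child sections in an indented div,
// matching the [:Section] / [^] / [:Section] sequence of the template.
void RenderSections(const std::vector<Section>& sections, std::string& out)
{
    for (const Section& section : sections)
    {
        out += "<a href=\"" + section.name + "\">" + section.name + "</a>";
        if (!section.sections.empty())
        {
            out += "<div class=\"Indent\">";
            RenderSections(section.sections, out); // The [^] step.
            out += "</div>";
        }
    }
}
```

A section without children produces only its link, so the indent wrapper appears exactly where the hierarchy goes one level deeper.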

Generic Metadata

I ditched embedding metadata into Markdown references as SiteGen also accepts plain text as input and the metadata is not particular to any format anyway. It is used to define instance variables.

All metadata must be declared at the very beginning of the file. Here's a typical use case.

meta Name "Title of the Post"
meta Date "2025,03,12"
meta Tags "Apple; Banana; Orange"

Keyword meta followed by variable name, followed by value wrapped in quotes.
The value contents can also define multiple instances and their variables, turning the metadata variable into a scope variable.

Each ; in the string separates instances. Each , separates variables. The one instance in Date is implied.
The instance variables are nameless and are accessed in the template via index.
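A small sketch of how a metadata line could be split into instances and indexed variables. It assumes every value is split the same way, that surrounding whitespace is trimmed, and that the value contains no escaped quotes; MetaVariable and ParseMetaLine are hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

struct MetaVariable
{
    std::string name;
    // instances[i][j] is variable j of instance i, accessed by index.
    std::vector<std::vector<std::string>> instances;
};

std::string Trim(const std::string& s)
{
    const std::size_t first = s.find_first_not_of(" \t");
    if (first == std::string::npos)
        return "";
    const std::size_t last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

std::vector<std::string> Split(const std::string& s, char separator)
{
    std::vector<std::string> parts;
    std::size_t start = 0;
    for (;;)
    {
        const std::size_t end = s.find(separator, start);
        const std::size_t count =
            (end == std::string::npos) ? std::string::npos : end - start;
        parts.push_back(Trim(s.substr(start, count)));
        if (end == std::string::npos)
            break;
        start = end + 1;
    }
    return parts;
}

// Parses: meta <Name> "<value>". Returns nothing for lines that don't match.
std::optional<MetaVariable> ParseMetaLine(const std::string& line)
{
    const std::string keyword = "meta ";
    if (line.compare(0, keyword.size(), keyword) != 0)
        return std::nullopt;
    const std::size_t open = line.find('"', keyword.size());
    const std::size_t close = line.rfind('"');
    if (open == std::string::npos || close <= open)
        return std::nullopt;

    MetaVariable meta;
    meta.name = Trim(line.substr(keyword.size(), open - keyword.size()));
    if (meta.name.empty())
        return std::nullopt;
    for (const std::string& instance : Split(line.substr(open + 1, close - open - 1), ';'))
        meta.instances.push_back(Split(instance, ',')); // ',' separates variables.
    return meta;
}
```

Under these assumptions the Tags line yields three instances with one variable each, and the Date line yields one instance with three variables.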

Exclusive Input

I would like to link between instances but sometimes I only need one of them to produce a file.

Instances of each input are sorted according to their Date variable.
Currently there's no way to tell beforehand which files in the input directory will become the neighbours of the output instance as Date will in most cases not match the file modification date.

This means that we still have to process every file in the directory to determine the order. Luckily we only require the metadata so we only read that part of the file and skip the content generation entirely.
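The ordering step could look something like the sketch below: sort the metadata-only records by date, then pick out the exclusive file and whatever sits on either side of it. PostMeta, Neighbours and FindNeighbours are hypothetical, and sorting oldest-first is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical metadata record: only what is needed to order instances.
struct PostMeta
{
    std::string file;
    int year, month, day; // Parsed from the Date metadata variable.
};

struct Neighbours
{
    const PostMeta* previous = nullptr;
    const PostMeta* current = nullptr;
    const PostMeta* next = nullptr;
};

// Sorts by date (oldest first, an assumption) and locates the exclusive file.
// The returned pointers stay valid until `posts` is modified again.
Neighbours FindNeighbours(std::vector<PostMeta>& posts, const std::string& exclusiveFile)
{
    std::stable_sort(posts.begin(), posts.end(),
        [](const PostMeta& a, const PostMeta& b) {
            return std::tie(a.year, a.month, a.day) < std::tie(b.year, b.month, b.day);
        });

    Neighbours result;
    const auto it = std::find_if(posts.begin(), posts.end(),
        [&](const PostMeta& post) { return post.file == exclusiveFile; });
    if (it == posts.end())
        return result; // Exclusive file not found among the inputs.

    result.current = &*it;
    if (it != posts.begin())
        result.previous = &*(it - 1);
    if (it + 1 != posts.end())
        result.next = &*(it + 1);
    return result;
}
```

Only the exclusive instance would then need its content generated; the neighbours contribute just the variables used for linking.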

The mechanism is invoked like this.

-i path=InputFolder/ excl=Post_03.md

The excl parameter denotes the input as "exclusive" and names the file that will produce the "exclusive instance".
As with multiple outputs, in this "mode" the output path will be interpreted as a template. multi and excl cannot be used at the same time and only one input can be exclusive.

Deviating from Markdown - Media Syntax

All of the implemented media spans (images, video, galleries, turntables and links) are now declared using the same syntax.
Furthermore, inline parameters are no longer supported since I never used them, only references. Additionally, references can now only be declared before any other Markdown content (at the top, after metadata).

The new usage syntax is the same as referenced Markdown links.

[Link text or placeholder text for other media][reference id]

Reference declarations still follow the Markdown format but the optional string can now define multiple parameter values or options specific to the media type.

[url1]:   www.webpage.com    "Title"
[video1]: Video/Movie_01.mp4 "controls loop"

Initially, any file extension present at the end of the url will determine the media type, defaulting to "link" in any other case.
If the first word in the parameter string contains a media type name, then that type will have precedence. This is required for galleries and turntables as they cannot be identified through the url alone.

[gallery1]: Images/Gal_Fruits_##.jpg "gallery    4"
[turn1]:    Images/Turn_Apple_##.jpg "turntable 16"

Currently, galleries and turntables only take one required parameter, the image count.
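The precedence rule can be sketched as two lookups: first the leading word of the parameter string, then the url's file extension, with "link" as the fallback. The type names "image" and "video" and the extension table are assumptions for the example; only "gallery", "turntable" and "link" are named in this post. ResolveMediaType is a hypothetical function.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

enum class MediaType { Link, Image, Video, Gallery, Turntable };

MediaType ResolveMediaType(const std::string& url, const std::string& parameters)
{
    // Type names accepted as the first word of the parameter string.
    static const std::map<std::string, MediaType> byName = {
        {"link", MediaType::Link},
        {"image", MediaType::Image},   // Assumed name.
        {"video", MediaType::Video},   // Assumed name.
        {"gallery", MediaType::Gallery},
        {"turntable", MediaType::Turntable},
    };
    // Assumed extension table; the real list is not given in the post.
    static const std::map<std::string, MediaType> byExtension = {
        {"jpg", MediaType::Image},
        {"png", MediaType::Image},
        {"mp4", MediaType::Video},
    };

    // 1. An explicit type name in the parameter string takes precedence.
    std::istringstream words(parameters);
    std::string first;
    words >> first;
    if (const auto it = byName.find(first); it != byName.end())
        return it->second;

    // 2. Otherwise use the extension after the last '.' of the final path part.
    const std::size_t dot = url.find_last_of("./");
    if (dot != std::string::npos && url[dot] == '.')
    {
        const auto it = byExtension.find(url.substr(dot + 1));
        if (it != byExtension.end())
            return it->second;
    }

    // 3. Anything else is treated as a link.
    return MediaType::Link;
}
```

With these tables, the four reference declarations shown above resolve to link, video, gallery and turntable respectively. One caveat of this sketch is that a link title starting with a type name would be misread as an explicit type.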

Future Work

As mentioned before, I expect this project to continue for as long as I maintain this website.
There are a lot of opportunities for non-pessimization in the code and I keep coming up with new features that I want to implement.

I find that I very much enjoy developing ecosystems, things that can be improved upon indefinitely. This is also what I find appealing in games such as roguelikes and colony-sims.

I'm not sure what I'll tackle next, but I'm looking to replace Markdown with my own language. It will have a more unified syntax, which should make it easier to parse.

I would also like to avoid processing any input file that did not change since the last run and instead read the previously generated instance data directly from a cache. Given current circumstances though, this will not be a priority.

That's it for this time. Thanks for reading!

For questions and comments, please send an email or get in touch on LinkedIn.
