Static AI Tools Directory: Large Blob Problem and Sharding Strategy
Problem
We wanted to ship the AI tools directory as a fully static site.
That created a practical problem: the source dataset was too large to serve as one JSON blob.
A single large payload causes multiple issues:
- slow initial page load
- poor mobile performance
- unnecessary bandwidth usage
- long parse time in the browser
- weak user experience for simple actions like opening the homepage
- deployment friction from hosting and updating one massive file
We also evaluated the opposite extreme: one file per tool.
That created a different problem:
- too many generated files
- heavy GitHub Pages file count
- noisier deploys
- inefficient repo and publish workflow
So the challenge was:
- keep the site fully static
- keep first load fast
- avoid one giant payload
- avoid one-file-per-record explosion
Constraint
We explicitly wanted:
- no server runtime
- no Next.js dependency
- GitHub deployment
- static hosting only
That means the browser has to load prebuilt data files, and the data layout has to be optimized ahead of time.
Solution
We split the dataset by usage pattern instead of shipping everything together.
The final build uses a sharding strategy with different data shapes for different page types.
How We Sharded
1. Homepage payload
Small dedicated file:
home.json
This contains only what the homepage needs.
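As a rough sketch of that build step, the homepage payload can be derived by trimming each record down to summary fields. The field names here (slug, name, tagline, featured) are illustrative assumptions, not the directory's actual schema:

```javascript
// Build-time sketch: derive a small home.json payload from the full dataset.
// Record fields (slug, name, tagline, featured) are assumed for illustration.
function buildHomePayload(tools, featuredCount = 8) {
  return {
    totalTools: tools.length,
    featured: tools
      .filter((t) => t.featured)
      .slice(0, featuredCount)
      .map(({ slug, name, tagline }) => ({ slug, name, tagline })),
  };
}

// The build script would then write this next to the other static files:
// fs.writeFileSync("home.json", JSON.stringify(buildHomePayload(allTools)));
```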
2. Category index
Separate category navigation file:
categories/index.json
This avoids loading full tool detail data just to browse categories.
3. Paginated browse pages
The all-tools listing is split into pages:
- tools/pages/page-0001.json
- tools/pages/page-0002.json
- ...
Category pages are also paginated:
- categories/<slug>/page-0001.json
- categories/<slug>/page-0002.json
- ...
This keeps list views lightweight and route-specific.
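The pagination step itself is simple. A minimal sketch, with the page size chosen arbitrarily (the actual size is not stated in this writeup):

```javascript
// Split a list of records into fixed-size pages, each destined for its own
// static file: page-0001.json, page-0002.json, ...
function paginate(records, pageSize = 24) {
  const pages = [];
  for (let i = 0; i < records.length; i += pageSize) {
    const pageNumber = pages.length + 1;
    pages.push({
      file: `page-${String(pageNumber).padStart(4, "0")}.json`,
      items: records.slice(i, i + pageSize),
    });
  }
  return pages;
}
```

The same helper serves both the all-tools listing and each category's pages; only the output directory differs.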
4. Search chunks
Search data is split into chunk files:
- search/manifest.json
- search/chunk-0001.json
- search/chunk-0002.json
- ...
The search manifest tells the frontend which chunks exist. The client loads only the search data layer, not all detail data.
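Client-side, that manifest-driven loading might look like the sketch below. The manifest shape (a `chunks` array of filenames) and the `fetchJson` helper (a thin wrapper around `fetch().then(r => r.json())`) are assumptions, not the actual implementation:

```javascript
// Load every search chunk listed in search/manifest.json and merge them
// into one in-memory array for client-side search.
async function loadSearchIndex(fetchJson) {
  const manifest = await fetchJson("search/manifest.json");
  const index = [];
  for (const chunk of manifest.chunks) {
    const records = await fetchJson(`search/${chunk}`);
    index.push(...records);
  }
  return index;
}
```

Chunks could also be fetched in parallel or on demand; fetching them all up front is the simplest variant.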
5. Tool detail shards
Tool details are not stored one-per-file.
Instead, they are grouped into shard files:
- tools/detail/manifest.json
- tools/detail/shard-0001.json
- tools/detail/shard-0002.json
- ...
The manifest maps:
slug -> shard file
So the frontend flow is:
- load detail manifest
- find which shard contains the tool
- fetch that shard
- extract the matching record
This avoids both extremes:
- not one giant detail blob
- not one file per tool
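The detail-page flow above can be sketched as follows. Here the manifest is assumed to be a plain slug-to-filename map and each shard an object keyed by slug; `fetchJson` is again a hypothetical wrapper around `fetch`:

```javascript
// Resolve one tool's detail record: one manifest lookup, one shard fetch.
async function loadToolDetail(slug, fetchJson) {
  const manifest = await fetchJson("tools/detail/manifest.json");
  const shardFile = manifest[slug]; // e.g. "shard-0001.json" (assumed shape)
  if (!shardFile) return null; // unknown slug: nothing else to fetch
  const shard = await fetchJson(`tools/detail/${shardFile}`);
  return shard[slug] ?? null; // shard keyed by slug (assumed shape)
}
```

In practice the manifest can be cached after the first detail-page visit, so subsequent tool pages cost a single shard request.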
Why This Worked
This approach gave us:
- fast homepage load
- lazy-loaded deep pages
- smaller browser payloads
- reasonable file counts
- GitHub Pages-compatible output
- deployable static architecture
It also matches actual user behavior:
- homepage needs summary data
- browse pages need card/list data
- search needs compact searchable records
- tool detail pages need full detail only when opened
Result
We kept the site fully static while avoiding the performance and deployment problems of a huge monolithic JSON file.
The final strategy was:
- small preload files for top-level navigation
- paginated listing files for browse views
- chunked search files for search
- sharded detail files for tool pages
- manifest-driven lookup for lazy loading
Summary
The core problem was not just “large data”. It was that static hosting forces you to design the data layout carefully.
We solved it by turning one oversized dataset into a static delivery system made of:
- page-specific JSON
- search chunks
- detail shards
- manifest-based lookup
That made the static GitHub-hosted directory practical.