Static AI Tools Directory: Large Blob Problem and Sharding Strategy
Problem
We wanted to ship the AI tools directory as a fully static site.
That created a practical problem: the source dataset was too large to serve as one JSON blob.
A single large payload causes multiple issues:
- slow initial page load
- poor mobile performance
- unnecessary bandwidth usage
- long parse time in the browser
- weak user experience for simple actions like opening the homepage
- deployment friction from hosting and updating one massive file
We also evaluated the opposite extreme: one file per tool.
That created a different problem:
- too many generated files
- heavy GitHub Pages file count
- noisier deploys
- inefficient repo and publish workflow
So the challenge was:
- keep the site fully static
- keep first load fast
- avoid one giant payload
- avoid one-file-per-record explosion
Constraint
We explicitly wanted:
- no server runtime
- no Next.js dependency
- GitHub deployment
- static hosting only
That means the browser has to load prebuilt data files, and the data layout has to be optimized ahead of time.
Solution
We split the dataset by usage pattern instead of shipping everything together.
The final build uses a sharding strategy with different data shapes for different page types.
How We Sharded
1. Homepage payload
Small dedicated file:
home.json
This contains only what the homepage needs.
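As a rough sketch of that build step, the homepage payload can be derived by trimming each record down to summary fields. The field names here (slug, name, tagline, featured) are illustrative assumptions, not the directory's actual schema:

```javascript
// Build-time sketch: derive a small home.json payload from the full dataset.
// Record fields (slug, name, tagline, featured) are assumed for illustration.
function buildHomePayload(tools, featuredCount = 8) {
  return {
    totalTools: tools.length,
    featured: tools
      .filter((t) => t.featured)
      .slice(0, featuredCount)
      .map(({ slug, name, tagline }) => ({ slug, name, tagline })),
  };
}

// The build script would then write this next to the other static files:
// fs.writeFileSync("home.json", JSON.stringify(buildHomePayload(allTools)));
```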
2. Category index
Separate category navigation file:
categories/index.json
This avoids loading full tool detail data just to browse categories.
3. Paginated browse pages
The all-tools listing is split into pages:
- tools/pages/page-0001.json
- tools/pages/page-0002.json
- ...
Category pages are also paginated:
- categories/<slug>/page-0001.json
- categories/<slug>/page-0002.json
- ...
This keeps list views lightweight and route-specific.
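The pagination step itself is simple. A minimal sketch, with the page size chosen arbitrarily (the actual size is not stated in this writeup):

```javascript
// Split a list of records into fixed-size pages, each destined for its own
// static file: page-0001.json, page-0002.json, ...
function paginate(records, pageSize = 24) {
  const pages = [];
  for (let i = 0; i < records.length; i += pageSize) {
    const pageNumber = pages.length + 1;
    pages.push({
      file: `page-${String(pageNumber).padStart(4, "0")}.json`,
      items: records.slice(i, i + pageSize),
    });
  }
  return pages;
}
```

The same helper serves both the all-tools listing and each category's pages; only the output directory differs.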
4. Search chunks
Search data is split into chunk files:
- search/manifest.json
- search/chunk-0001.json
- search/chunk-0002.json
- ...
The search manifest tells the frontend which chunks exist. The client loads only the search data layer, not all detail data.
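Client-side, that manifest-driven loading might look like the sketch below. The manifest shape (a `chunks` array of filenames) and the `fetchJson` helper (a thin wrapper around `fetch().then(r => r.json())`) are assumptions, not the actual implementation:

```javascript
// Load every search chunk listed in search/manifest.json and merge them
// into one in-memory array for client-side search.
async function loadSearchIndex(fetchJson) {
  const manifest = await fetchJson("search/manifest.json");
  const index = [];
  for (const chunk of manifest.chunks) {
    const records = await fetchJson(`search/${chunk}`);
    index.push(...records);
  }
  return index;
}
```

Chunks could also be fetched in parallel or on demand; fetching them all up front is the simplest variant.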
5. Tool detail shards
Tool details are not stored one-per-file.
Instead, they are grouped into shard files:
- tools/detail/manifest.json
- tools/detail/shard-0001.json
- tools/detail/shard-0002.json
- ...
The manifest maps:
slug -> shard file
So the frontend flow is:
- load detail manifest
- find which shard contains the tool
- fetch that shard
- extract the matching record
This avoids both extremes:
- not one giant detail blob
- not one file per tool
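The detail-page flow above can be sketched as follows. Here the manifest is assumed to be a plain slug-to-filename map and each shard an object keyed by slug; `fetchJson` is again a hypothetical wrapper around `fetch`:

```javascript
// Resolve one tool's detail record: one manifest lookup, one shard fetch.
async function loadToolDetail(slug, fetchJson) {
  const manifest = await fetchJson("tools/detail/manifest.json");
  const shardFile = manifest[slug]; // e.g. "shard-0001.json" (assumed shape)
  if (!shardFile) return null; // unknown slug: nothing else to fetch
  const shard = await fetchJson(`tools/detail/${shardFile}`);
  return shard[slug] ?? null; // shard keyed by slug (assumed shape)
}
```

In practice the manifest can be cached after the first detail-page visit, so subsequent tool pages cost a single shard request.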
Why This Worked
This approach gave us:
- fast homepage load
- lazy-loaded deep pages
- smaller browser payloads
- reasonable file counts
- GitHub Pages-compatible output
- deployable static architecture
It also matches actual user behavior:
- homepage needs summary data
- browse pages need card/list data
- search needs compact searchable records
- tool detail pages need full detail only when opened
Result
We kept the site fully static while avoiding the performance and deployment problems of a huge monolithic JSON file.
The final strategy was:
- small preload files for top-level navigation
- paginated listing files for browse views
- chunked search files for search
- sharded detail files for tool pages
- manifest-driven lookup for lazy loading
Summary
The core problem was not just “large data”. It was that static hosting forces you to design the data layout carefully.
We solved it by turning one oversized dataset into a static delivery system made of:
- page-specific JSON
- search chunks
- detail shards
- manifest-based lookup
That made the static GitHub-hosted directory practical.