# Deep Dive: How Sharding Distributes Jobs Across CPU Cores
One of bunqueue’s key design decisions is sharding jobs across multiple priority queues. Instead of a single global queue with a lock, bunqueue creates multiple independent shards that can be accessed concurrently. Here’s how it works.
## Why Shard?
A single priority queue becomes a bottleneck when multiple workers pull jobs simultaneously. Lock contention grows with concurrency. Sharding solves this by partitioning the job space:
- Each shard has its own priority queue
- Workers can pull from different shards concurrently
- Lock contention drops by roughly a factor of N (the shard count), assuming queue names hash evenly across shards
## Shard Count: Auto-Detected from CPU Cores
The number of shards is calculated at startup based on available CPU cores, rounded up to the nearest power of 2:
```ts
function calculateShardCount(): number {
  const cpuCount = navigator.hardwareConcurrency || 4;
  // Round up to the next power of 2, cap at 64
  const pow2 = Math.min(
    64,
    Math.max(4, 1 << Math.ceil(Math.log2(cpuCount)))
  );
  return pow2;
}
```

| CPU Cores | Shard Count |
|---|---|
| 1-4 | 4 |
| 5-8 | 8 |
| 9-16 | 16 |
| 17-32 | 32 |
| 33+ | 64 |
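To see the rounding and the cap in action, here's a hypothetical pure variant of the same arithmetic (shardCountFor is illustrative; the real calculateShardCount reads navigator.hardwareConcurrency itself):

```ts
// Same formula as calculateShardCount, parameterized for demonstration.
function shardCountFor(cpuCount: number): number {
  return Math.min(64, Math.max(4, 1 << Math.ceil(Math.log2(cpuCount))));
}

console.log(shardCountFor(1));   // 4  (floor of 4 shards)
console.log(shardCountFor(6));   // 8  (rounds up to the next power of 2)
console.log(shardCountFor(16));  // 16 (already a power of 2)
console.log(shardCountFor(100)); // 64 (capped)
```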
The power-of-2 constraint is critical for the hashing strategy.
## FNV-1a: Fast Deterministic Hashing
Jobs are assigned to shards using an FNV-1a hash of the queue name:
```ts
function fnv1aHash(str: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    // Math.imul keeps the multiply in 32-bit integer space; a plain `*`
    // would overflow Float64's 53-bit precision and corrupt the low bits.
    hash = Math.imul(hash, 0x01000193); // FNV prime
  }
  return hash >>> 0; // reinterpret as unsigned 32-bit
}
```
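The hash is a pure function of its input, so a queue's shard assignment is stable across calls, workers, and restarts. A quick demonstration (queue names are illustrative):

```ts
// Identical names always hash identically: that "collision" is exactly
// what pins all of a queue's jobs to a single shard.
for (const name of ["emails", "emails", "webhooks", "reports"]) {
  console.log(name, fnv1aHash(name));
}
```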
```ts
const SHARD_MASK = SHARD_COUNT - 1;

function shardIndex(queueName: string): number {
  return fnv1aHash(queueName) & SHARD_MASK;
}
```

Because SHARD_COUNT is always a power of 2, we use a bitwise AND instead of modulo. This is a single CPU instruction vs. an expensive division:
```ts
// Fast: single AND instruction
const idx = hash & SHARD_MASK;

// Slow: division + remainder (what we'd need without the power-of-2 guarantee)
const idxSlow = hash % SHARD_COUNT;
```
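To see why the two are equivalent for non-negative values, take 16 shards, so SHARD_MASK is 15 (0b1111); the AND simply keeps the low four bits:

```ts
const h = 0xdeadbeef >>> 0; // stand-in for a hash value
console.log(h & 15); // 15 (the low 4 bits)
console.log(h % 16); // 15 (same answer, via division)
```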
## The Shard Structure

Each shard manages multiple named queues, each backed by an IndexedPriorityQueue:
```ts
class Shard {
  private readonly queues = new Map<string, IndexedPriorityQueue>();
  // Dead-letter entries and stall configs are partitioned per queue name too.
  private readonly dlqEntries = new Map<string, DlqEntry[]>();
  private readonly stallConfigs = new Map<string, StallConfig>();

  // Lazily creates the named queue on first access.
  getQueue(name: string): IndexedPriorityQueue {
    let queue = this.queues.get(name);
    if (!queue) {
      queue = new IndexedPriorityQueue();
      this.queues.set(name, queue);
    }
    return queue;
  }

  push(job: Job): void {
    this.getQueue(job.queue).push(job);
  }

  pop(queueName: string): Job | null {
    return this.getQueue(queueName).pop();
  }
}
```
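Tying the pieces together, routing could look like this (the enqueue/dequeue wrappers and the top-level shards array are illustrative, not bunqueue's public API):

```ts
const shards: Shard[] = Array.from({ length: SHARD_COUNT }, () => new Shard());

// Every job for a given queue name hashes to the same shard, so
// per-queue priority ordering is preserved.
function enqueue(job: Job): void {
  shards[shardIndex(job.queue)].push(job);
}

function dequeue(queueName: string): Job | null {
  return shards[shardIndex(queueName)].pop(queueName);
}
```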
## Processing Shards: Separate Lock Domain

Active jobs (those being processed by workers) are tracked in a separate set of processing shards with their own locking:
```ts
// Processing shards use job ID hashing (not queue name)
function processingShardIndex(jobId: JobId): number {
  return fnv1aHash(String(jobId)) & PROCESSING_SHARD_MASK;
}
```

This separation is intentional. The lock hierarchy is strictly enforced:
1. jobIndex (read)
2. completedJobs (read)
3. shards[N] (write)
4. processingShards[N] (write)
Violating this order would create deadlocks: two operations that take the same pair of locks in opposite orders can each end up waiting forever for the other.
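Here is a sketch of what hierarchy-respecting acquisition could look like, assuming hypothetical RwLock handles (bunqueue's actual primitives and signatures may differ):

```ts
type Release = () => void;

interface RwLock {
  read(): Promise<Release>;  // resolves to a function that releases the lock
  write(): Promise<Release>;
}

// Illustrative only: take each lock in hierarchy order, release in reverse.
async function completeJob(
  jobId: JobId,
  queueName: string,
  locks: {
    jobIndex: RwLock;
    completedJobs: RwLock;
    shards: RwLock[];
    processingShards: RwLock[];
  },
): Promise<void> {
  const releases: Release[] = [];
  try {
    releases.push(await locks.jobIndex.read());                                       // 1
    releases.push(await locks.completedJobs.read());                                  // 2
    releases.push(await locks.shards[shardIndex(queueName)].write());                 // 3
    releases.push(await locks.processingShards[processingShardIndex(jobId)].write()); // 4
    // ...move the job out of its processing shard and record completion...
  } finally {
    for (const release of releases.reverse()) release(); // reverse order
  }
}
```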
## Performance Impact
Sharding delivers measurable throughput improvements on multi-core systems:
| Scenario | Single Queue | 4 Shards | 16 Shards |
|---|---|---|---|
| 1 worker | baseline | ~same | ~same |
| 4 workers | contention | ~3.5x | ~3.8x |
| 16 workers | high contention | ~8x | ~14x |
The gains come from reduced lock contention. With a single queue, all workers compete for one lock. With N shards, workers whose queues hash to different shards never contend.
## Key Design Decisions
**Queue-name sharding, not job-ID sharding.** All jobs for a queue live in one shard. This means a single queue can’t benefit from sharding, but it dramatically simplifies the pull operation (no need to check multiple shards).

**Separate processing shards.** Active jobs are tracked separately from queued jobs. This prevents pull operations from blocking ack/fail operations.

**Static shard count.** Shards are created at startup and never change. This avoids the complexity of resharding and makes the hash-to-shard mapping immutable.