# FAQ

Below you will find answers to frequently asked questions about BucketDB's design choices, operational characteristics, and troubleshooting steps.

***

## Architecture and Design Choices

### 1. Why use S3 instead of a traditional block storage database like Postgres?

BucketDB targets a specific sweet spot: applications with low write volume (hundreds of writes per day), high read volume, and tight cost constraints. S3 scales without capacity planning and costs far less than provisioning dedicated persistent volumes and database servers.

### 2. Why eventual consistency? Why not use Raft or Paxos?

Consensus algorithms like Raft require constant communication between nodes, quorum calculations, and predictable network latency. By accepting eventual consistency, BucketDB avoids that machinery: nodes exchange no consensus messages and coordinate only through a simple, time-based authority model. The local `Committed Cache` bridges the gap so your application still sees consistent reads.

### 3. Why does it take a few seconds for commits to show up on other nodes?

When a node executes a write batch, it flushes the batch to the `committed/` prefix. The node currently holding write authority must then download that batch and merge it into the immutable data blocks. Because leadership rotates on a timer (default 5000 ms), there is an inherent delay between a write and the moment it is merged into the data blocks.
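To make the timing concrete, here is a minimal sketch of how a timer-based rotation could be computed. Only the 5000 ms default comes from the text above. The function names, the zero-based node IDs, and the round-robin modulo scheme are assumptions for illustration, not BucketDB's confirmed algorithm.

```javascript
// Hypothetical sketch: which node ID holds write authority at a given time.
// Assumes zero-based integer node IDs and equal time slices.
function authorityHolder(nowMs, totalNodes, intervalMs = 5000) {
  if (!Number.isInteger(totalNodes) || totalNodes <= 0) {
    throw new RangeError("totalNodes must be a positive integer");
  }
  const slot = Math.floor(nowMs / intervalMs); // index of the current time slice
  return slot % totalNodes;                    // slices are handed out round-robin
}

// Milliseconds remaining until authority passes to the next node.
function msUntilRotation(nowMs, intervalMs = 5000) {
  return intervalMs - (nowMs % intervalMs);
}

// Example: in a 3-node cluster, authority cycles 0 -> 1 -> 2 -> 0 every 5 seconds.
const holder = authorityHolder(Date.now(), 3);
```

Under this scheme every node can derive the current authority holder from its own clock, which is why clock synchronization matters for a time-based model.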

### 4. Why use binary serialization instead of JSON?

Performance and cost. S3 bills per request and per GB transferred, and JSON is verbose. BucketDB uses a custom binary serialization format based on your predefined schemas. Fixed-width rows can be located by offset and read without parsing the rows before them, and payloads are far smaller than the equivalent raw JSON.
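The benefit of fixed-width rows can be shown with a small sketch. The three-field layout below (`id`, `price`, `active`) and both function names are hypothetical and do not reflect BucketDB's actual wire format; the code uses only the standard `ArrayBuffer` and `DataView` APIs.

```javascript
// Hypothetical fixed-width layout: id (uint32), price (float64), active (uint8).
const ROW_WIDTH = 4 + 8 + 1; // 13 bytes per row

function encodeRows(rows) {
  const buffer = new ArrayBuffer(rows.length * ROW_WIDTH);
  const view = new DataView(buffer);
  rows.forEach((row, i) => {
    const offset = i * ROW_WIDTH;
    view.setUint32(offset, row.id);                 // DataView is big-endian by default
    view.setFloat64(offset + 4, row.price);
    view.setUint8(offset + 12, row.active ? 1 : 0);
  });
  return buffer;
}

// Read row `index` directly by offset, without decoding the rows before it.
function decodeRow(buffer, index) {
  const view = new DataView(buffer);
  const offset = index * ROW_WIDTH;
  return {
    id: view.getUint32(offset),
    price: view.getFloat64(offset + 4),
    active: view.getUint8(offset + 12) === 1,
  };
}

const sampleRows = [
  { id: 1, price: 9.99, active: true },
  { id: 2, price: 19.5, active: false },
  { id: 3, price: 0.25, active: true },
];
const packed = encodeRows(sampleRows);
const third = decodeRow(packed, 2); // jumps straight to byte offset 26
```

Three rows occupy 39 bytes here, while `JSON.stringify(sampleRows)` repeats every field name in every row.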

### 5. Why doesn't the schema enforce data validity (e.g., max length, enum values)?

We believe data integrity validation is fundamentally a business logic concern, not a storage concern. The schema in BucketDB exists solely to instruct the binary encoder *how* to pack bytes efficiently. Your application layer (e.g., an ORM or validation middleware) should ensure that the data being written makes sense for your domain.

### 6. Can I use BucketDB for OLTP (Online Transaction Processing)?

No. BucketDB is poorly suited to high-throughput, single-record OLTP. If you need to process thousands of transactions per second on the same record (e.g., a highly contended financial ledger), use a traditional relational database.

### 7. How does the B+Tree index work on S3?

BucketDB implements a Copy-on-Write (CoW) B+Tree. Rather than modifying an index block in place, an insertion or split generates a completely new block in the `indexes/` prefix. The Root Pointer in `Block 0` is then atomically updated to point to the new tree state. This ensures reads are always lock-free.
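The copy-on-write idea can be sketched with an in-memory stand-in. Everything here is an illustrative assumption: the `Map` plays the role of the `indexes/` prefix, the tree has a single level (root to leaves), splits are omitted, and none of the names come from BucketDB's API.

```javascript
// Hypothetical in-memory stand-in for the `indexes/` prefix.
const blocks = new Map(); // blockId -> frozen block, never modified after creation
let nextId = 1;

function writeBlock(block) {
  const id = `indexes/${nextId++}`;
  blocks.set(id, Object.freeze(block)); // shallow freeze; blocks are treated as immutable
  return id;
}

// Pick the last child whose `min` key is <= key (children are sorted by `min`).
function childIndexFor(root, key) {
  let i = 0;
  while (i + 1 < root.children.length && root.children[i + 1].min <= key) i++;
  return i;
}

// Copy-on-write insert: writes a new leaf and a new root, returns the new root ID.
function insert(rootId, key, value) {
  const root = blocks.get(rootId);
  const i = childIndexFor(root, key);
  const oldLeaf = blocks.get(root.children[i].id);
  const entries = [...oldLeaf.entries.filter(([k]) => k !== key), [key, value]]
    .sort(([a], [b]) => a - b);
  const newLeafId = writeBlock({ entries });
  const children = root.children.map((c, j) => (j === i ? { ...c, id: newLeafId } : c));
  return writeBlock({ children });
}

function lookup(rootId, key) {
  const root = blocks.get(rootId);
  const leaf = blocks.get(root.children[childIndexFor(root, key)].id);
  const hit = leaf.entries.find(([k]) => k === key);
  return hit ? hit[1] : undefined;
}

const leafA = writeBlock({ entries: [[1, "a"]] });
const leafB = writeBlock({ entries: [[10, "j"]] });
let rootPointer = writeBlock({ children: [{ min: 0, id: leafA }, { min: 10, id: leafB }] });
const oldRoot = rootPointer;
rootPointer = insert(rootPointer, 12, "l"); // the only "mutation" is swapping the pointer
```

A reader that captured `oldRoot` before the swap keeps seeing the old tree, which is what makes lock-free reads possible. The superseded root and leaf remain in the store until something collects them.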

### 8. Is BucketDB relational?

No, it is a NoSQL, block-structured store. There are no built-in SQL joins. However, the schema registry and B+Tree indexes provide strong structure and fast lookups, so you can resolve relations in the application layer quickly.
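As a rough illustration of resolving a relation in application code, the sketch below joins two arrays of already-fetched rows. The `orders` and `customers` shapes and the function name are hypothetical, and how the rows are fetched from BucketDB is deliberately not shown.

```javascript
// Attach each order's customer by building a lookup map once, then probing it per row.
// Runs in O(orders + customers) instead of scanning customers for every order.
function joinOrdersToCustomers(orders, customers) {
  const byId = new Map(customers.map((c) => [c.id, c]));
  return orders.map((o) => ({ ...o, customer: byId.get(o.customerId) ?? null }));
}

const customers = [{ id: 1, name: "Ada" }, { id: 2, name: "Grace" }];
const orders = [
  { id: 100, customerId: 2, total: 30 },
  { id: 101, customerId: 9, total: 12 }, // no matching customer
];
const joined = joinOrdersToCustomers(orders, customers);
```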

### 9. Can I run BucketDB in a serverless environment like AWS Lambda?

No. A serverless function or app is, by definition, stateless. BucketDB is built around a timesharing write-forward mechanism that depends on fixed state (specifically the total number of nodes in the cluster and the current node's integer ID within the cluster). While it would be easy to use BucketDB to build a database offering to be used by serverless functions or apps, BucketDB itself depends on known instances and state.

### 10. How are Snapshots handled during Garbage Collection?

BucketDB uses Time-Range Bounding. An orphaned block is only deleted if there are no active snapshots with a creation time falling between the block's birth and its orphaning.
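The rule above can be written as a single predicate. The field names (`bornAt`, `orphanedAt`, `createdAt`) are hypothetical, timestamps are assumed to be epoch milliseconds, and the boundary handling (inclusive at birth, exclusive at orphaning) is an assumption the text does not specify.

```javascript
// Hypothetical shapes: block = { bornAt, orphanedAt }, snapshot = { createdAt }.
function canDelete(block, snapshots) {
  if (block.orphanedAt == null) return false; // not orphaned, so still in use
  // A snapshot taken while the block was live may still reference it.
  const pinned = snapshots.some(
    (s) => s.createdAt >= block.bornAt && s.createdAt < block.orphanedAt
  );
  return !pinned;
}

const block = { bornAt: 100, orphanedAt: 200 };
const deletable = canDelete(block, [{ createdAt: 50 }, { createdAt: 250 }]);
const retained = !canDelete(block, [{ createdAt: 150 }]);
```

A snapshot created before the block existed, or after it was orphaned, cannot reference the block, so only snapshots inside the block's lifetime keep it alive.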

***

## Operations and Usage

### 11. How do I scale my BucketDB cluster?

You can scale horizontally by deploying more nodes. The cluster will dynamically adjust based on the node configuration callback. See [Cluster Operations](/core-concepts/cluster.md) for details on dynamic scaling and read replicas.

### 12. What happens if two nodes try to update Block 0 at the same time?

BucketDB relies on S3's native conditional writes via `ETag`. If a node reads `Block 0` with an ETag of `ABC`, it will only overwrite it if the current ETag is still `ABC`. If another node has already updated it, S3 rejects the write with a `PreconditionFailed` error (HTTP 412), and the node retries the operation safely.
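The read, conditional write, and retry cycle can be sketched against an in-memory stand-in. `FakeObject` and `updateWithRetry` are hypothetical names; this is not the AWS SDK and not BucketDB's internal code, only an illustration of the compare-and-swap pattern that an ETag precondition enables.

```javascript
// In-memory stand-in for one S3 object with ETag-conditional writes.
class FakeObject {
  constructor(body) {
    this.body = body;
    this.version = 1;
  }
  get() {
    return { body: this.body, etag: `v${this.version}` };
  }
  putIfMatch(body, etag) {
    if (etag !== `v${this.version}`) {
      const err = new Error("precondition did not hold");
      err.name = "PreconditionFailed";
      throw err;
    }
    this.body = body;
    this.version += 1;
    return `v${this.version}`;
  }
}

// Read-modify-write with retry: re-read and re-apply `update` whenever the ETag moved.
function updateWithRetry(object, update, maxAttempts = 5) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const { body, etag } = object.get();
    try {
      object.putIfMatch(update(body), etag);
      return attempt; // number of attempts it took
    } catch (err) {
      if (err.name !== "PreconditionFailed") throw err; // unrelated errors propagate
    }
  }
  throw new Error(`gave up after ${maxAttempts} attempts`);
}

// Simulate a competing node that writes between our read and our write.
const block0 = new FakeObject({ root: "indexes/1" });
let interfered = false;
const attempts = updateWithRetry(block0, (current) => {
  if (!interfered) {
    interfered = true;
    block0.putIfMatch({ root: "indexes/2" }, block0.get().etag);
  }
  return { ...current, root: "indexes/3" };
});
```

The first attempt fails because the competing write changed the ETag. The second attempt re-reads the object and succeeds, so neither write is silently lost.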

### 13. How do I delete large files like images?

Do not store large files directly in rows. Use the `blob` type in your schema and wrap the payload in `db.Blob()`. When you logically delete the row, BucketDB's `gc()` function will eventually clean up the orphaned blob object.

### 14. Are there transactions?

Yes, within a single batch. `db.batch()` operations are atomic. They are either fully written to the `committed/` log, or not at all.

### 15. Can I use custom JavaScript functions in my queries?

Yes, via `db.registerFunction()`. However, these functions must be fully synchronous and pure (no side effects, no network calls). They are executed against the deserialized rows in memory. See [Custom Functions](/usage/functions.md).

### 16. How do schema migrations achieve zero-downtime?

BucketDB migrations are handled at the storage layer via a Blue/Green Migration Pattern. A background daemon slowly rewrites data from Version 1 blocks to Version 2 blocks while both applications run simultaneously. See [Schema Migrations](/usage/migrations.md).

***

## Troubleshooting

### 17. "S3Error: PreconditionFailed" appears frequently in logs.

This is normal during cluster scale-up or scale-down events, or when system clocks drift significantly. In either case, two nodes can briefly believe they hold write authority at the same time and race to update `Block 0`. The ETag check prevents data corruption. If the errors persist, make sure your nodes synchronize their clocks via NTP.

### 18. Queries are occasionally very slow, taking several seconds.

This is the **Write-Ahead Gap**. If a node crashes or is rotated out of the cluster, its turn to act as the Leader is skipped, and commits pile up in the `committed/` prefix. The next node to receive a read query must download all of those pending commits before it can return an up-to-date result. Once a live node takes leadership, it processes the backlog and latency returns to normal.

### 19. "Schema not found for table X"

You must register schemas using `db.registerSchema()` *before* attempting to query or write to that table.

### 20. The `data/` folder in S3 is huge and has thousands of files.

This means you have orphaned blocks. Because data blocks are immutable, every flush creates new blocks, and old blocks are not deleted immediately. You must call `await db.gc()` regularly (e.g., via a daily cron job) to prune unreferenced blocks and deleted blobs.
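If an external cron job is not available, an in-process timer is one alternative. The sketch below assumes only that `db.gc()` is an async method, as shown above. `scheduleGc`, its parameters, and its return shape are hypothetical helpers, not part of BucketDB.

```javascript
// Runs `db.gc()` on a fixed interval; `db` is any object exposing an async gc() method.
function scheduleGc(db, intervalMs = 24 * 60 * 60 * 1000, onError = console.error) {
  let running = false;
  const tick = async () => {
    if (running) return; // skip this tick if the previous run has not finished
    running = true;
    try {
      await db.gc();
    } catch (err) {
      onError(err); // report and keep the schedule alive
    } finally {
      running = false;
    }
  };
  const timer = setInterval(tick, intervalMs);
  if (typeof timer.unref === "function") timer.unref(); // do not keep Node.js alive for GC alone
  return { runNow: tick, stop: () => clearInterval(timer) };
}
```

Call `stop()` on the returned handle during shutdown so the timer does not outlive the database connection.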

### 21. Local unit tests are hanging.

Ensure you are calling `await db.close()` in your `afterAll` or `teardown` blocks to cleanly stop the background Write-Forward Service loop.

