# Large Object Storage (Blobs)

In `bucket-db`, we differentiate between unmanaged binary references (e.g., storing a raw S3 URI in a `varchar` column) and **Managed Blobs** (columns declared with the `blob` schema type, `typeId: 11`).

By declaring a column as a `blob`, the database assumes full ownership of the blob's lifecycle, security, and access patterns. The raw object key is never exposed to the client.

## 1. Writing Blobs

When inserting or updating data, if your schema defines a field as a `blob`, you can pass binary data directly to the database. BucketDB handles the upload transparently.

### Inline Buffers

For small files (e.g., avatars, JSON payloads), wrap the data in `db.Blob()`. The wrapper marks the payload so that the database intercepts it during the synchronous `batch.insert` phase.

```javascript
const fs = require('fs');
const imageBuffer = fs.readFileSync('profile.jpg');

const batch = db.batch();

// The database immediately assigns a UUID, but does NOT upload yet.
batch.insert('users', {
  id: 'u_1',
  name: 'Alice',
  avatar: db.Blob(imageBuffer) 
});

// The actual upload to S3 happens asynchronously during flush().
await batch.flush();
```

*Note: The primary data block stores only a 36-character UUID string (e.g., `123e4567-e89b-12d3-a456-426614174000`), which keeps the fixed-width row and the heap compact.*
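
To picture the two-phase behavior described above (UUID assigned at insert time, upload deferred until `flush()`), here is a minimal sketch. It is not `bucket-db`'s implementation: `BlobPayload`, `SketchBatch`, the `uploadFn` callback, and the `blobs/<uuid>` key layout are all hypothetical names and assumptions chosen for illustration.

```javascript
const { randomUUID } = require('node:crypto');

// Hypothetical marker class standing in for whatever db.Blob() returns.
class BlobPayload {
  constructor(buffer) {
    this.buffer = buffer;
  }
}

// Hypothetical batch that only models the insert/flush split.
class SketchBatch {
  constructor(uploadFn) {
    this.uploadFn = uploadFn; // assumed shape: async (key, buffer) => void
    this.rows = [];
    this.pendingUploads = [];
  }

  // Synchronous phase: swap each BlobPayload for a UUID and queue its bytes.
  insert(table, row) {
    const stored = {};
    for (const [field, value] of Object.entries(row)) {
      if (value instanceof BlobPayload) {
        const id = randomUUID();
        this.pendingUploads.push({ id, buffer: value.buffer });
        stored[field] = id;
      } else {
        stored[field] = value;
      }
    }
    this.rows.push({ table, row: stored });
    return stored;
  }

  // Asynchronous phase: upload everything that was queued by insert().
  async flush() {
    const uploads = this.pendingUploads.splice(0);
    await Promise.all(
      uploads.map(({ id, buffer }) => this.uploadFn(`blobs/${id}`, buffer))
    );
    return uploads.length;
  }
}

// Usage with an in-memory Map standing in for object storage.
const uploaded = new Map();
const batch = new SketchBatch(async (key, buffer) => {
  uploaded.set(key, buffer);
});
const stored = batch.insert('users', {
  id: 'u_1',
  name: 'Alice',
  avatar: new BlobPayload(Buffer.from('fake image bytes')),
});

console.log(stored.avatar.length); // 36
console.log(uploaded.size); // 0, because nothing is uploaded before flush()
batch.flush().then((count) => console.log(count, uploaded.size)); // 1 1
```

The row handed to storage already contains the final UUID, so the fixed-width record can be written without waiting for the binary upload to finish.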

## 2. Reading Blobs

When a query returns a row containing a `blob` column, the value is simply the UUID string.

```javascript
const user = (await db.query('users').where('id', '=', 'u_1').execute())[0];

console.log(user.avatar); // Output: "123e4567-e89b-12d3-a456-426614174000"
```

To retrieve the actual binary buffer, fetch the object through your `StorageDriver`, or generate a pre-signed URL if your driver supports it (for example, an S3 driver). The blob is physically located under the `blobs/` prefix in your storage root.
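
As a rough illustration of resolving a UUID to its bytes, the sketch below uses a hypothetical in-memory driver. The `put`/`get` method names and the exact `blobs/<uuid>` key layout are assumptions (the text above only states the `blobs/` prefix), so check your actual `StorageDriver` before relying on them.

```javascript
// Hypothetical in-memory driver; a real StorageDriver's methods may differ.
class MemoryDriver {
  constructor() {
    this.objects = new Map();
  }
  async put(key, buffer) {
    this.objects.set(key, buffer);
  }
  async get(key) {
    if (!this.objects.has(key)) throw new Error(`No object at ${key}`);
    return this.objects.get(key);
  }
}

const UUID_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Assumed key layout: blobs/<uuid>. Rejects anything that is not UUID-shaped.
function blobKey(uuid) {
  if (!UUID_PATTERN.test(uuid)) throw new TypeError(`Not a blob UUID: ${uuid}`);
  return `blobs/${uuid}`;
}

// Resolve the UUID stored in a row to the blob's binary content.
async function readBlob(driver, uuid) {
  return driver.get(blobKey(uuid));
}

async function main() {
  const driver = new MemoryDriver();
  const avatarId = '123e4567-e89b-12d3-a456-426614174000';
  await driver.put(blobKey(avatarId), Buffer.from('fake image bytes'));

  const bytes = await readBlob(driver, avatarId);
  console.log(bytes.toString()); // fake image bytes
}

main();
```

Validating the UUID shape before building the key keeps arbitrary strings from being turned into storage paths.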

## 3. Garbage Collection

A common problem in NoSQL databases is orphaned blobs. When a user deletes their account or changes their avatar, the old image file is often left behind in S3, where it accrues storage costs indefinitely.

Because `bucket-db` manages `blob` fields natively in the binary schema, it can deterministically track when a blob becomes orphaned.

1. **Detection:** During the background Write-Forward Service execution, the daemon detects if a row containing a `blob` is deleted, or if the `blob` field's UUID changes.
2. **Tombstoning:** The daemon appends the old, orphaned UUID to a special tombstone log called `_deleted_blobs`.
3. **Purging:** When you call `db.gc()`, the garbage collector simply reads `_deleted_blobs`, issues delete commands to the Storage Driver, and clears the log.

This removes the need for expensive full-bucket S3 scans or complex application-level cleanup logic.
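
The detection, tombstoning, and purging steps can be modeled in a few lines. This is a simplified sketch under stated assumptions, not the Write-Forward Service itself: `BlobTracker`, `recordChange`, and the `deleteFn` callback are hypothetical, an in-memory array stands in for the `_deleted_blobs` log, and the `blobs/<uuid>` key layout is assumed.

```javascript
// Hypothetical model of the three garbage-collection steps.
class BlobTracker {
  constructor(deleteFn) {
    this.deleteFn = deleteFn; // assumed shape: async (key) => void
    this.deletedBlobs = []; // stands in for the _deleted_blobs log
  }

  // Detection + tombstoning: compare the old and new versions of a row and
  // record any blob UUID that the new version no longer references.
  // Pass newRow = null to model a row deletion.
  recordChange(blobFields, oldRow, newRow) {
    for (const field of blobFields) {
      const oldId = oldRow ? oldRow[field] : undefined;
      const newId = newRow ? newRow[field] : undefined;
      if (oldId && oldId !== newId) this.deletedBlobs.push(oldId);
    }
  }

  // Purging: delete each tombstoned blob, then clear the processed entries.
  // The log is trimmed only after every delete has succeeded.
  async gc() {
    const pending = [...this.deletedBlobs];
    for (const id of pending) {
      await this.deleteFn(`blobs/${id}`);
    }
    this.deletedBlobs.splice(0, pending.length);
    return pending.length;
  }
}

async function main() {
  const tracker = new BlobTracker(async (key) => console.log('delete', key));

  // Avatar replaced: the old UUID becomes an orphan.
  tracker.recordChange(
    ['avatar'],
    { id: 'u_1', avatar: 'old-uuid' },
    { id: 'u_1', avatar: 'new-uuid' }
  );
  // Row deleted: its blob becomes an orphan.
  tracker.recordChange(['avatar'], { id: 'u_2', avatar: 'gone-uuid' }, null);

  const purged = await tracker.gc();
  console.log(purged); // 2
}

main();
```

Because orphans are recorded at write time, the purge step only has to walk the tombstone list rather than compare every stored object against every row.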

