# Write Your Policy!
Welcome to the Butchunker Policy Development Guide. This guide explains how to create a custom chunking policy for Butchunker. A chunking policy defines how a data stream or file is split into chunks, a core step in data deduplication, storage, and transfer.
Before starting, you should know basic Rust and understand the Butchunker framework. Your policy will decide where to split the data based on its content and your settings.
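Both schemes below report their decisions as split positions rather than as chunks. As a minimal sketch of how those positions map back to chunks, assume each position marks where a new chunk begins (so the first chunk is `data[0..p]`) and the last chunk runs to the end of the data. `chunks_from_offsets` is a hypothetical helper for illustration, not a Butchunker API.

```rust
/// Hypothetical helper: turn split positions into chunk slices.
/// Assumes each offset marks the start of a new chunk, so offsets
/// `[4, 10]` over 16 bytes yield the ranges 0..4, 4..10 and 10..16.
pub fn chunks_from_offsets<'a>(data: &'a [u8], offsets: &[u32]) -> Vec<&'a [u8]> {
    let mut chunks = Vec::new();
    let mut start = 0usize;
    for &offset in offsets {
        let end = offset as usize;
        // Skip offsets that would create an empty chunk or reach past the data.
        if end <= start || end >= data.len() {
            continue;
        }
        chunks.push(&data[start..end]);
        start = end;
    }
    // Everything after the last split position forms the final chunk.
    if start < data.len() {
        chunks.push(&data[start..]);
    }
    chunks
}
```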
## Creating a Policy Crate
First, create a new Rust library crate to host your chunking policy (for example with `cargo new --lib butck_fixed_size`).
### Writing `Cargo.toml`
```toml
[package]
name = "butck_fixed_size" # Policy name
authors = ["Butchunker"] # Author info
version = "0.1.0"
edition = "2024"
[dependencies]
```
## Implementing Policy Logic
### Writing `src/lib.rs`
In `src/lib.rs`, implement one or both of the following schemes:
#### Scheme 1: Streaming Processing Scheme
Suitable for large files: the policy sees the data one piece at a time and cannot look ahead at upcoming content, but the entire file never has to be loaded into memory.
```rust
use std::collections::HashMap;

// Streaming policy struct (must implement the Default trait)
#[derive(Default)]
pub struct YourPolicyStream {
    // Define your state fields here
}

// Streaming processing function
pub async fn your_policy_stream(
    current_data: &[u8],           // Current piece of data
    len: u32,                      // Length of current_data in bytes
    stream: &mut YourPolicyStream, // Streaming state for this policy
    params: &HashMap<&str, &str>,  // Configuration parameters
) -> Option<u32> {
    // Implement your chunking logic here.
    // Return the split position (offset from the start of current_data), or None if there is no split.
    None
}
```
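As an illustration of the streaming scheme, here is a minimal fixed-size policy in the spirit of the `butck_fixed_size` crate named in `Cargo.toml`. It assumes that Butchunker calls the function repeatedly with consecutive pieces of the input and, after a split at offset `n`, resumes with the bytes that follow `n`; verify this against the framework before relying on it. The function names and the `size` parameter key are hypothetical. The chunking logic lives in a synchronous helper so it can be tested without an async runtime.

```rust
use std::collections::HashMap;

/// Default chunk size, used when the hypothetical `size` parameter is missing or invalid.
const DEFAULT_CHUNK_SIZE: u32 = 4096;

/// Streaming state: bytes seen since the last split.
#[derive(Default)]
pub struct FixedSizeStream {
    consumed: u32,
}

/// Read the chunk size from the parameters, falling back to the default.
fn chunk_size(params: &HashMap<&str, &str>) -> u32 {
    params
        .get("size")
        .and_then(|s| s.parse::<u32>().ok())
        .filter(|&s| s > 0)
        .unwrap_or(DEFAULT_CHUNK_SIZE)
}

/// Synchronous core. Assumes that after `Some(n)` is returned, the caller
/// passes the bytes following position `n` in the next call.
pub fn fixed_size_step(len: u32, stream: &mut FixedSizeStream, size: u32) -> Option<u32> {
    // Bytes still needed to complete the current chunk (at least 1, in case
    // the configured size shrank below what was already consumed).
    let remaining = size.saturating_sub(stream.consumed).max(1);
    if len >= remaining {
        // The chunk fills up inside this piece: split there and restart the count.
        stream.consumed = 0;
        Some(remaining)
    } else {
        // Not enough data yet; remember how much has been seen.
        stream.consumed += len;
        None
    }
}

/// Hypothetical streaming entry point following the template signature.
pub async fn fixed_size_stream(
    _current_data: &[u8],
    len: u32,
    stream: &mut FixedSizeStream,
    params: &HashMap<&str, &str>,
) -> Option<u32> {
    fixed_size_step(len, stream, chunk_size(params))
}
```

Because the state is a single byte counter, this policy runs in constant memory no matter how large the input is.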
#### Scheme 2: Simple Processing Scheme
Suitable for small to medium-sized files that can be loaded into memory at once. Because the policy sees the whole buffer, it can take later data into account when choosing split points, which can produce better results.
```rust
use std::collections::HashMap;

// Simple processing function
pub async fn your_policy(
    raw_data: &[u8],              // Raw data
    params: &HashMap<&str, &str>, // Configuration parameters
) -> Vec<u32> {
    // Implement your chunking logic here.
    // Return a vector of all split positions (offsets from the start of raw_data).
    vec![]
}
```
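Because the simple scheme sees the whole buffer, it is a natural place to try content-defined chunking (CDC): a boundary is placed wherever a rolling hash of the most recent bytes matches a bit pattern, so an edit early in a file only moves nearby boundaries instead of shifting every later one. The sketch below uses a Gear rolling hash with the mask applied to the high bits (as FastCDC does, but without FastCDC's normalized chunking), bounded by minimum and maximum chunk sizes. The function names, the `min`/`avg`/`max` parameter keys and their defaults are hypothetical, and the Gear table is generated locally, so boundaries will not match other CDC implementations.

```rust
use std::collections::HashMap;

/// Build a 256-entry Gear table from a fixed seed with SplitMix64,
/// so boundaries are reproducible from run to run.
fn gear_table() -> [u64; 256] {
    let mut table = [0u64; 256];
    let mut state: u64 = 0;
    for entry in table.iter_mut() {
        state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        *entry = z ^ (z >> 31);
    }
    table
}

/// Read a numeric parameter, falling back to a default.
fn param(params: &HashMap<&str, &str>, key: &str, default: usize) -> usize {
    params.get(key).and_then(|v| v.parse().ok()).unwrap_or(default)
}

/// Synchronous core: returns split positions (chunk start offsets, excluding 0).
/// Assumes the data is shorter than 4 GiB, as the u32 return type implies.
pub fn gear_cdc_offsets(data: &[u8], min: usize, avg: usize, max: usize) -> Vec<u32> {
    let gear = gear_table();
    // Test the top log2(avg) bits, so a boundary matches with probability ~1/avg per byte.
    let bits = avg.next_power_of_two().trailing_zeros();
    let mask: u64 = if bits == 0 { 0 } else { u64::MAX << (64 - bits) };
    let mut offsets = Vec::new();
    let mut start = 0usize;
    let mut hash = 0u64;
    for (i, &byte) in data.iter().enumerate() {
        // Gear update: older bytes shift out of the 64-bit window.
        hash = (hash << 1).wrapping_add(gear[byte as usize]);
        let size = i + 1 - start;
        if (size >= min && hash & mask == 0) || size >= max {
            let end = i + 1;
            // A boundary at the very end of the data is implicit, not a split.
            if end < data.len() {
                offsets.push(end as u32);
            }
            start = end;
            hash = 0;
        }
    }
    offsets
}

/// Hypothetical simple-scheme entry point following the template signature.
pub async fn gear_cdc(raw_data: &[u8], params: &HashMap<&str, &str>) -> Vec<u32> {
    let min = param(params, "min", 2 * 1024);
    let avg = param(params, "avg", 8 * 1024);
    let max = param(params, "max", 64 * 1024);
    gear_cdc_offsets(raw_data, min, avg, max)
}
```

Compared with fixed-size splitting, the extra cost is one table lookup, shift and add per byte; in exchange, inserting a byte near the start of a file typically leaves most later chunks unchanged, which is what makes deduplication effective.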
## Registration and Usage
### Deploying the Policy
1. Place the completed policy crate in the `./policy/` directory of the Butchunker repository.
2. Run the `butckrepo-refresh` program to refresh the registry.
   - If the program is not yet installed, run the following in the root directory of the Butchunker repository:
     ```bash
     cargo install --path ./
     ```
3. After every update to a policy library, you must:
   - Run `butckrepo-refresh` to refresh the registry.
   - Reinstall the `butck` binary with `cargo install --path ./`.
### Calling the Policy
The policy is registered in Butchunker's registry automatically when `butckrepo-refresh` runs. Call it with the following command:
```bash
butck write <file> --policy <policy_name> --storage ./
```