Diffstat (limited to 'policy/README.md')
| -rw-r--r-- | policy/README.md | 96 |
1 files changed, 96 insertions, 0 deletions
diff --git a/policy/README.md b/policy/README.md
new file mode 100644
index 0000000..c995506
--- /dev/null
+++ b/policy/README.md
@@ -0,0 +1,96 @@
+# Write your Policy!
+
+Welcome to the Butchunker Policy Development Guide. This guide explains how to create a custom chunking policy for Butchunker. A chunking policy defines how to split data streams or files into chunks. This is a core task for data deduplication, storage, and transfer.
+
+Before starting, you should know basic Rust and understand the Butchunker framework. Your policy will decide where to split the data based on its content and your settings.
+
+## Creating a Policy Crate
+
+First, create a new Rust crate to host your chunking policy.
+
+### Writing `Cargo.toml`
+
+```toml
+[package]
+name = "butck_fixed_size" # Policy name
+authors = ["Butchunker"] # Author info
+version = "0.1.0"
+edition = "2024"
+
+[dependencies]
+```
+
+## Implementing Policy Logic
+
+### Writing `src/lib.rs`
+
+In `src/lib.rs`, implement one or both of the following schemes:
+
+#### Scheme 1: Streaming Processing Scheme
+
+Suitable for large files. Data arrives piece by piece, so the policy cannot see upcoming content when it decides where to split, but the entire file never has to be loaded into memory.
+
+```rust
+use std::collections::HashMap;
+
+// Streaming policy struct (must implement the Default trait)
+#[derive(Default)]
+pub struct YourPolicyStream {
+    // Define your state fields here
+}
+
+// Streaming processing function
+pub async fn your_policy_stream(
+    current_data: &[u8],           // Current data chunk
+    len: u32,                      // Data length
+    stream: &mut YourPolicyStream, // Streaming processing context (the struct defined above)
+    params: &HashMap<&str, &str>,  // Configuration parameters
+) -> Option<u32> {
+    // Implement your chunking logic
+    // Return the split position (offset from the start of current_data), or None if no split
+    None
+}
+```
+
+#### Scheme 2: Simple Processing Scheme
+
+Suitable for small to medium-sized files that can be loaded into memory at once. Because the whole input is visible, the policy can look ahead when choosing split positions, which can give better results.
+
+```rust
+use std::collections::HashMap;
+
+// Simple processing function
+pub async fn your_policy(
+    raw_data: &[u8],              // Raw data
+    params: &HashMap<&str, &str>, // Configuration parameters
+) -> Vec<u32> {
+    // Implement your chunking logic
+    // Return a vector of all split positions (offsets from the start of raw_data)
+    vec![]
+}
+```
+
+## Registration and Usage
+
+### Deploying the Policy
+
+1. Place the completed policy crate into the `./policy/` directory of the Butchunker repository.
+2. Run the `butckrepo-refresh` program to refresh the registry:
+   - If the program is not yet installed, run the following in the root directory of the Butchunker repository:
+
+     ```bash
+     cargo install --path ./
+     ```
+3. After each update to a policy crate, you must:
+   - Run `butckrepo-refresh` to refresh the registry.
+   - Reinstall the `butck` binary: `cargo install --path ./`.
+
+### Calling the Policy
+
+- The policy is registered automatically in Butchunker's registry.
+
+  Use the following command to call the policy:
+
+  ````bash
+  butck write <file> --policy <policy_name> --storage ./
+  ````
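
To illustrate the simple processing scheme that the README describes, here is a minimal sketch of the split logic a fixed-size policy might use. It is not taken from the Butchunker sources. The `size` parameter key, the `DEFAULT_CHUNK_SIZE` value, and the helper name `fixed_size_splits` are hypothetical. The sketch also assumes that split positions are interior offsets only, so neither `0` nor the total length is reported.

```rust
use std::collections::HashMap;

/// Fallback chunk size in bytes (assumed value, not a Butchunker default).
const DEFAULT_CHUNK_SIZE: u32 = 4096;

/// Computes fixed-size split positions as offsets from the start of `raw_data`.
/// The `size` key is a hypothetical parameter name chosen for this sketch.
pub fn fixed_size_splits(raw_data: &[u8], params: &HashMap<&str, &str>) -> Vec<u32> {
    // Parse the requested chunk size; use the default if it is missing, invalid, or zero.
    let size = params
        .get("size")
        .and_then(|s| s.parse::<u32>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(DEFAULT_CHUNK_SIZE);

    // Offsets are returned as u32, so clamp the length for inputs of 4 GiB or more.
    let len = raw_data.len().min(u32::MAX as usize) as u32;

    // Emit a split at every multiple of `size` that lies strictly inside the data.
    let mut splits = Vec::new();
    let mut pos = size;
    while pos < len {
        splits.push(pos);
        pos = match pos.checked_add(size) {
            Some(next) => next,
            None => break, // The next offset would overflow u32.
        };
    }
    splits
}
```

An async entry point with the `your_policy` signature from the README could simply return `fixed_size_splits(raw_data, params)`. Keeping the logic in a synchronous helper makes it testable without an async runtime.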
