Usage

Basic Usage

To use py-multihash in a project:

import hashlib
import multihash

# hash your data
m = hashlib.sha256()
m.update(b'hello world')
raw_digest = m.digest()

# add multihash header
multihash_digest = multihash.encode(raw_digest, "sha2-256")

# encode it to a string
multihashed_str = multihash.to_b58_string(multihash_digest)

print(multihashed_str)
# QmaozNR7DZHQK1ZcU9p7QdrshMvXqWK6gpu5rmrkPdT3L4

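To confirm that the raw digest follows the two-byte header (0x12 identifies sha2-256, 0x20 is the digest length of 32 bytes), print both in hex: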

print('  ', m.hexdigest())
print(multihash.to_hex_string(multihash_digest))

#     b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9
# 1220b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9
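
The header layout can also be reproduced by hand. The following is a minimal sketch using only the standard library, not py-multihash's own implementation. The helper names `encode_varint`, `wrap_multihash`, and `b58encode` are hypothetical, and the sketch assumes the usual multihash layout (varint function code, varint digest length, then the digest) and the Bitcoin base58 alphabet.

```python
import hashlib

# Bitcoin-style base58 alphabet (assumed to match to_b58_string)
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def wrap_multihash(code: int, digest: bytes) -> bytes:
    """Prefix a raw digest with <varint code><varint length>."""
    return encode_varint(code) + encode_varint(len(digest)) + digest


def b58encode(data: bytes) -> str:
    """Base58-encode bytes; each leading zero byte becomes '1'."""
    n = int.from_bytes(data, "big")
    chars = []
    while n:
        n, rem = divmod(n, 58)
        chars.append(B58_ALPHABET[rem])
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + "".join(reversed(chars))


raw = hashlib.sha256(b"hello world").digest()
wrapped = wrap_multihash(0x12, raw)  # 0x12 = sha2-256
print(wrapped.hex())       # header bytes 12 20, then the digest
print(b58encode(wrapped))  # sha2-256 multihashes start with "Qm" in base58
```

For codes and lengths below 128 the varint is a single byte, which is why the sha2-256 header is exactly `12 20`.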

Hash Computation

You can compute hashes directly using the digest() or sum() functions:

# Note: importing sum from multihash shadows Python's built-in sum() in this module
from multihash import digest, sum, Func

# Using digest() function
mh = digest(b"hello world", Func.sha2_256)
print(mh.code)  # 0x12
print(mh.digest.hex())  # Digest in hex

# Using sum() function (Go-compatible API)
mh = sum(b"hello world", "sha2-256")
print(mh.encode('hex'))

Truncation Support

You can truncate digests to a specific length:

from multihash import sum, Func

# Truncate to 16 bytes
mh = sum(b"hello", Func.sha2_256, length=16)
assert len(mh.digest) == 16

# Full digest (explicit)
mh_full = sum(b"hello", Func.sha2_256, length=-1)
assert len(mh_full.digest) == 32  # Full SHA-256 digest
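
To make the effect of truncation concrete, here is a small standard-library sketch. It assumes that truncation keeps the leading bytes of the full digest and records the shortened length in the header, which is the usual multihash convention; it does not call py-multihash itself.

```python
import hashlib

full = hashlib.sha256(b"hello").digest()  # 32 bytes
truncated = full[:16]                     # assumed: keep the leading 16 bytes

# Header: function code 0x12 (sha2-256), then the truncated length 0x10
header = bytes([0x12, len(truncated)])
multihash_bytes = header + truncated

print(multihash_bytes.hex())
```

A truncated multihash is self-describing: a reader learns from the length byte that only 16 digest bytes follow.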

Streaming Hash Computation

For large files or streams, use sum_stream():

from multihash import sum_stream, Func
from io import BytesIO

# From a file
with open("large_file.bin", "rb") as f:
    mh = sum_stream(f, Func.sha2_256)

# From BytesIO
data = BytesIO(b"streaming data")
mh = sum_stream(data, "sha2-256")

# With truncation
with open("file.bin", "rb") as f:
    mh = sum_stream(f, Func.sha2_256, length=16)
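
The idea behind streaming is to feed the hash function in chunks so the whole input never has to sit in memory. The sketch below shows that pattern with `hashlib` only; `sha256_stream` and its `chunk_size` parameter are hypothetical names for illustration, not how `sum_stream()` is necessarily implemented.

```python
import hashlib
from io import BytesIO


def sha256_stream(stream, chunk_size=64 * 1024):
    """Hash a binary file-like object chunk by chunk."""
    h = hashlib.sha256()
    # iter() with a sentinel keeps reading until read() returns b""
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.digest()


payload = b"streaming data" * 10_000
streamed = sha256_stream(BytesIO(payload))
one_shot = hashlib.sha256(payload).digest()
print(streamed == one_shot)
```

Chunked and one-shot hashing produce the same digest, so the chunk size only affects memory use and I/O behavior.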

SHAKE Variable-Length Hashes

SHAKE-128 and SHAKE-256 support variable output lengths:

from multihash import sum, Func

# SHAKE-128 with default length (32 bytes)
mh = sum(b"hello", Func.shake_128)
assert len(mh.digest) == 32

# SHAKE-128 with custom length
mh = sum(b"hello", Func.shake_128, length=48)
assert len(mh.digest) == 48

# SHAKE-256 with default length (64 bytes)
mh = sum(b"hello", Func.shake_256)
assert len(mh.digest) == 64
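
SHAKE functions are extendable-output functions: the caller chooses how many bytes to read from the output stream. A minimal standard-library sketch of that behavior follows; it uses `hashlib` directly and makes no claim about py-multihash internals.

```python
import hashlib

# Same input, two requested output lengths
short = hashlib.shake_128(b"hello").digest(32)
longer = hashlib.shake_128(b"hello").digest(48)

print(len(short), len(longer))
# A shorter output is a prefix of a longer one for the same input
print(longer.startswith(short))
```

Because the shorter output is a prefix of the longer one, choosing a length for SHAKE behaves like truncating a single output stream.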

Modern Hash Functions

BLAKE3

BLAKE3 is a cryptographic hash function designed to be much faster than MD5, SHA-1, SHA-2, and SHA-3 while, unlike MD5 and SHA-1, having no known practical attacks:

from multihash import digest, sum, Func

# Using BLAKE3
mh = digest(b"hello world", "blake3")
print(mh.digest.hex())

# Or with Func enum
mh = sum(b"hello world", Func.blake3)

BLAKE2 Variants

BLAKE2b and BLAKE2s support configurable digest sizes in 8-bit steps: BLAKE2b from 8 to 512 bits, and BLAKE2s from 8 to 256 bits:

from multihash import digest, Func

# BLAKE2b with 256-bit output
mh = digest(b"data", "blake2b-256")

# BLAKE2b with 512-bit output (full)
mh = digest(b"data", "blake2b-512")

# BLAKE2s with 128-bit output
mh = digest(b"data", "blake2s-128")

# BLAKE2b variants from 8 to 512 bits (in 8-bit steps) are supported
mh = digest(b"data", "blake2b-384")  # 384-bit BLAKE2b
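
One detail worth knowing is that a BLAKE2 digest size is a parameter of the hash, not a truncation of a longer output. The sketch below illustrates this with the standard library's `hashlib.blake2b` and `hashlib.blake2s`; it is an illustration of BLAKE2 itself, under the assumption that names such as "blake2b-256" correspond to a 32-byte `digest_size`.

```python
import hashlib

b256 = hashlib.blake2b(b"data", digest_size=32).digest()  # 256-bit BLAKE2b
b512 = hashlib.blake2b(b"data", digest_size=64).digest()  # 512-bit BLAKE2b
s128 = hashlib.blake2s(b"data", digest_size=16).digest()  # 128-bit BLAKE2s

print(len(b256), len(b512), len(s128))
# The 256-bit digest is not the first half of the 512-bit digest
print(b512[:32] == b256)
```

This is why each BLAKE2 size has its own multihash name and code rather than sharing one code with different lengths.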

Non-Cryptographic Hash Functions

MurmurHash3

MurmurHash3 is a fast, non-cryptographic hash function suitable for hash-based lookups, bloom filters, and other applications where speed is more important than cryptographic security:

from multihash import digest, Func

# MurmurHash3 128-bit
mh = digest(b"hello world", "murmur3-128")
print(mh.digest.hex())

# MurmurHash3 32-bit
mh = digest(b"hello world", "murmur3-32")

Warning

MurmurHash3 is NOT suitable for cryptographic purposes or security-sensitive applications. Use SHA-256, SHA-3, BLAKE2, or BLAKE3 for security-critical hashing.

Specialized Hash Functions

Double-SHA-256

Double-SHA-256 (SHA-256 applied twice) is commonly used in Bitcoin and other cryptocurrencies:

from multihash import digest, Func

# Double-SHA-256 (used in Bitcoin)
mh = digest(b"block data", "dbl-sha2-256")
print(mh.digest.hex())

# Equivalent to: SHA-256(SHA-256(data))
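
The equivalence can be spelled out with the standard library. This is a sketch of the double-hash construction itself; `dbl_sha256` is a hypothetical helper name, and the second round is applied to the raw bytes of the first digest rather than to its hex string.

```python
import hashlib


def dbl_sha256(data: bytes) -> bytes:
    """Apply SHA-256 twice: SHA-256(SHA-256(data))."""
    first = hashlib.sha256(data).digest()  # raw 32 bytes, not hex text
    return hashlib.sha256(first).digest()


result = dbl_sha256(b"block data")
print(result.hex())
```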

Security Considerations

Warning

MD5 and MD4 are included for backward compatibility but are cryptographically broken. Do not use MD5 or MD4 for security-sensitive applications. Use SHA-256 or stronger hash functions (e.g., SHA-512, SHA3-256, BLAKE2b) instead.

Error Handling

Custom exceptions are provided for better error handling:

from multihash import sum, Func, TruncationError, HashComputationError

try:
    # sha2-256 produces only 32 bytes, so requesting 100 raises TruncationError
    mh = sum(b"hello", Func.sha2_256, length=100)
except TruncationError as e:
    print(f"Truncation error: {e}")

Verification

You can verify data against a multihash:

from multihash import digest, Func

mh = digest(b"hello", Func.sha2_256)

# Verify the data
assert mh.verify(b"hello") is True
assert mh.verify(b"world") is False
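
Conceptually, verification recomputes the digest named by the header and compares it with the stored bytes. The following standard-library sketch shows that idea for sha2-256 only. `verify_sha256_multihash` is a hypothetical helper, it assumes single-byte header fields, and it is not the library's `verify()` implementation.

```python
import hashlib
import hmac


def verify_sha256_multihash(multihash_bytes: bytes, data: bytes) -> bool:
    """Check data against a sha2-256 multihash (possibly truncated)."""
    if len(multihash_bytes) < 2:
        return False
    code, length = multihash_bytes[0], multihash_bytes[1]
    stored = multihash_bytes[2:]
    # Reject other hash functions and inconsistent or impossible lengths
    if code != 0x12 or length != len(stored) or not 0 < length <= 32:
        return False
    expected = hashlib.sha256(data).digest()[:length]
    # Constant-time comparison avoids leaking where the bytes differ
    return hmac.compare_digest(expected, stored)


mh_bytes = bytes([0x12, 0x20]) + hashlib.sha256(b"hello").digest()
print(verify_sha256_multihash(mh_bytes, b"hello"))
print(verify_sha256_multihash(mh_bytes, b"world"))
```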

MultihashSet Collection

Manage collections of unique multihash values using MultihashSet:

from multihash import MultihashSet, sum, Func

# Create a new set
mh_set = MultihashSet()

# Add multihashes (Go-style API)
mh1 = sum(b"file1", Func.sha2_256)
mh2 = sum(b"file2", Func.sha2_256)
mh_set.Add(mh1)
mh_set.Add(mh2)

# Or use Python-style API
mh_set.add(mh1)

# Check membership
assert mh_set.Has(mh1) is True  # Go-style
assert mh1 in mh_set  # Python-style

# Get all items
all_hashes = mh_set.All()

# Remove items
mh_set.Remove(mh1)  # Raises KeyError if not present
mh_set.discard(mh2)  # Doesn't raise if not present

# Set operations
mh3 = sum(b"file3", Func.sha2_256)
set1 = MultihashSet([mh1, mh2])
set2 = MultihashSet([mh2, mh3])
union = set1.union(set2)
intersection = set1.intersection(set2)
difference = set1.difference(set2)

# Iterate over the set
for mh in mh_set:
    print(mh)

JSON Serialization

Convert multihash objects to and from JSON format:

from multihash import sum, Func, from_json

# Create a multihash
mh = sum(b"hello world", Func.sha2_256)

# Serialize to JSON (compact format)
json_str = mh.to_json()
# {"code": 18, "length": 32, "digest": "base64:..."}

# Serialize to JSON (verbose format with name)
json_str = mh.to_json(verbose=True)
# {"code": 18, "name": "sha2-256", "length": 32, "digest": "base64:..."}

# Deserialize from JSON
mh_restored = from_json(json_str)
assert mh_restored == mh

# Works with both compact and verbose formats
# (the "..." below is a placeholder for the encoded digest)
mh1 = from_json('{"code": 18, "length": 32, "digest": "..."}')
mh2 = from_json('{"code": 18, "name": "sha2-256", "length": 32, "digest": "..."}')