Files
py-libp2p/libp2p/kad_dht/utils.py
Sumanjeet d61bca78ab Kademlia DHT implementation in py-libp2p (#579)
* initialise the module

* added content routing

* added routing module

* added peer routing

* added value store

* added utilities functions

* added main kademlia file

* fixed create_key_from_binary function

* example to test kademlia dht

* added protocol ID and enhanced logging for peer store size in provider and consumer nodes

* refactor: specify stream type in handle_stream method and add peer in routing table

* removed content routing

* added default value of count for finding closest peers

* added functions to find close peers

* refactor: remove content routing and enhance peer discovery

* added put value function

* added get value function

* fix: improve logging and handle key encoding in get_value method

* refactor: remove ContentRouting import from __init__.py

* refactor: improved basic kademlia example

* added protobuf files

* replaced json with protobuf

* refactor: enhance peer discovery and routing logic in KadDHT

* refactor: enhance Kademlia routing table to use PeerInfo objects and improve peer management

* refactor: enhance peer addition logic to utilize PeerInfo objects in routing table

* feat: implement content provider functionality in Kademlia DHT

* refactor: update value store to use datetime for validity management

* refactor: update RoutingTable initialization to include host reference

* refactor: enhance KBucket and RoutingTable for improved peer management and functionality

* refactor: streamline peer discovery and value storage methods in KadDHT

* refactor: update KadDHT and related classes for async peer management and enhanced value storage

* refactor: enhance ProviderStore initialization and improve peer routing integration

* test: add tests for Kademlia DHT functionality

* fix linting issues

* pydocstyle issues fixed

* CICD pipeline issues solved

* fix: update docstring format for find_peer method

* refactor: improve logging and remove unused code in DHT implementation

* refactor: clean up logging and remove unused imports in DHT and test files

* Refactor logging setup and improve DHT stream handling with varint length prefixes

* Update bootstrap peer handling in basic_dht example and refactor peer routing to accept string addresses

* Enhance peer querying in Kademlia DHT by implementing parallel queries using Trio.

* Enhance peer querying by adding deduplication checks

* Refactor DHT implementation to use varint for length prefixes and enhance logging for better traceability

* Add base58 encoding for value storage and enhance logging in basic_dht example

* Refactor Kademlia DHT to support server/client modes

* Added unit tests

* Refactor documentation to fixsome warning

* Add unit tests and remove outdated tests

* Fixed precommit errora

* Refactor error handling test to raise StringParseError for invalid bootstrap addresses

* Add libp2p.kad_dht to the list of subpackages in documentation

* Fix expiration and republish checks to use inclusive comparison

* Add __init__.py file to libp2p.kad_dht.pb package

* Refactor get value and put value to run in parallel with query timeout

* Refactor provider message handling to use parallel processing with timeout

* Add methods for provider store in KadDHT class

* Refactor KadDHT and ProviderStore methods to improve type hints and enhance parallel processing

* Add documentation for libp2p.kad_dht.pb module.

* Update documentation for libp2p.kad_dht package to include subpackages and correct formatting

* Fix formatting in documentation for libp2p.kad_dht package by correcting the subpackage reference

* Fix header formatting in libp2p.kad_dht.pb documentation

* Change log level from info to debug for various logging statements.

* fix CICD issues (post revamp)

* fixed value store unit test

* Refactored kademlia example

* Refactor Kademlia example: enhance logging, improve bootstrap node connection, and streamline server address handling

* removed bootstrap module

* Refactor Kademlia DHT example and core modules: enhance logging, remove unused code, and improve peer handling

* Added docs of kad dht example

* Update server address log file path to use the script's directory

* Refactor: Introduce DHTMode enum for clearer mode management

* moved xor_distance function to utils.py

* Enhance logging in ValueStore and KadDHT: include decoded value in debug logs and update parameter description for validity

* Add handling for closest peers in GET_VALUE response when value is not found

* Handled failure scenario for PUT_VALUE

* Remove kademlia demo from project scripts and contributing documentation

* spelling and logging

---------

Co-authored-by: pacrob <5199899+pacrob@users.noreply.github.com>
2025-06-16 14:46:40 -06:00

118 lines
2.8 KiB
Python

"""
Utility functions for Kademlia DHT implementation.
"""
import base58
import multihash
from libp2p.peer.id import (
ID,
)
def create_key_from_binary(binary_data: bytes) -> bytes:
"""
Creates a key for the DHT by hashing binary data with SHA-256.
params: binary_data: The binary data to hash.
Returns
-------
bytes: The resulting key.
"""
return multihash.digest(binary_data, "sha2-256").digest
def xor_distance(key1: bytes, key2: bytes) -> int:
"""
Calculate the XOR distance between two keys.
params: key1: First key (bytes)
params: key2: Second key (bytes)
Returns
-------
int: The XOR distance between the keys
"""
# Ensure the inputs are bytes
if not isinstance(key1, bytes) or not isinstance(key2, bytes):
raise TypeError("Both key1 and key2 must be bytes objects")
# Convert to integers
k1 = int.from_bytes(key1, byteorder="big")
k2 = int.from_bytes(key2, byteorder="big")
# Calculate XOR distance
return k1 ^ k2
def bytes_to_base58(data: bytes) -> str:
"""
Convert bytes to base58 encoded string.
params: data: Input bytes
Returns
-------
str: Base58 encoded string
"""
return base58.b58encode(data).decode("utf-8")
def sort_peer_ids_by_distance(target_key: bytes, peer_ids: list[ID]) -> list[ID]:
"""
Sort a list of peer IDs by their distance to the target key.
params: target_key: The target key to measure distance from
params: peer_ids: List of peer IDs to sort
Returns
-------
List[ID]: Sorted list of peer IDs from closest to furthest
"""
def get_distance(peer_id: ID) -> int:
# Hash the peer ID bytes to get a key for distance calculation
peer_hash = multihash.digest(peer_id.to_bytes(), "sha2-256").digest
return xor_distance(target_key, peer_hash)
return sorted(peer_ids, key=get_distance)
def shared_prefix_len(first: bytes, second: bytes) -> int:
"""
Calculate the number of prefix bits shared by two byte sequences.
params: first: First byte sequence
params: second: Second byte sequence
Returns
-------
int: Number of shared prefix bits
"""
# Compare each byte to find the first bit difference
common_length = 0
for i in range(min(len(first), len(second))):
byte_first = first[i]
byte_second = second[i]
if byte_first == byte_second:
common_length += 8
else:
# Find specific bit where they differ
xor = byte_first ^ byte_second
# Count leading zeros in the xor result
for j in range(7, -1, -1):
if (xor >> j) & 1 == 1:
return common_length + (7 - j)
# This shouldn't be reached if xor != 0
return common_length + 8
return common_length