Compare commits

...

33 Commits

Author SHA1 Message Date
6881d2e960 Complete documentation coverage: add final module docstrings
Co-authored-by: varun-r-mallya <100590632+varun-r-mallya@users.noreply.github.com>
2025-10-08 17:30:03 +00:00
d9dfb61000 Add remaining docstrings to complete documentation coverage
Co-authored-by: varun-r-mallya <100590632+varun-r-mallya@users.noreply.github.com>
2025-10-08 17:25:29 +00:00
cdf4f3e885 Add module-level docstrings and helper utility docstrings
Co-authored-by: varun-r-mallya <100590632+varun-r-mallya@users.noreply.github.com>
2025-10-08 17:20:45 +00:00
5b20b08d9f Add docstrings to core modules and helper functions
Co-authored-by: varun-r-mallya <100590632+varun-r-mallya@users.noreply.github.com>
2025-10-08 17:16:05 +00:00
9f103c34a0 Initial plan 2025-10-08 17:04:52 +00:00
6402cf7be5 remove todos and move to projects on github. 2025-10-08 22:27:51 +05:30
9a96e1247b Merge pull request #29 from pythonbpf/smol_pp
add patch for Kernel 6.14 BTF in transpiler
2025-10-08 21:47:49 +05:30
989134f4be add patch for Kernel 6.14 BTF
Signed-off-by: varun-r-mallya <varunrmallya@gmail.com>
2025-10-08 21:47:02 +05:30
120aec08da Update TODO.md 2025-10-08 21:40:14 +05:30
e66ae7cc89 Add failing oneline IfExpr conditional test 2025-10-08 21:36:08 +05:30
32dc8e6636 Merge pull request #21 from pythonbpf/globals
Adds support for globals
SO......
*I'm not merging this because it's complete, but because I don't want it to diverge from master too much.
*Stuff I still need to complete:
-> Structs and eval expressions in these globals.
-> handling the global keyword.
-> assigning back to the global and reading from inside a function.
-> Basically, `global` keyword in Python is used to write only and reading can be done directly without declaring as global as a direct assign without global declaration is going to diverge from Python.
-> The above logic is going to be supported by `global_sym_tab` generated using the new order of passes that we are doing.
-> This needs to be fixed and done ASAP to avoid conflicts. so yes, im  gonna do it soon.
2025-10-08 14:48:37 +05:30
8e3942d38c format chore 2025-10-08 14:31:37 +05:30
8485460374 Merge pull request #26 from pythonbpf/refactor_conds
Refactor conds
2025-10-08 07:28:08 +05:30
0c977514af Add TODO for fixing struct_kioctx issue 2025-10-08 05:34:25 +05:30
1207730ce3 update vmlinux.py 2025-10-08 05:27:56 +05:30
0d9dcd122c Merge pull request #27 from pythonbpf/vmlinux
Add vmlinux transpiler from experiments
2025-10-08 05:19:44 +05:30
8a69e05ee2 fix duplicate variable in example
Signed-off-by: varun-r-mallya <varunrmallya@gmail.com>
2025-10-08 05:18:49 +05:30
976af290af Revert "format chore"
This reverts commit a3443ab1d5.
2025-10-08 05:17:59 +05:30
a3443ab1d5 format chore
Signed-off-by: varun-r-mallya <varunrmallya@gmail.com>
2025-10-08 05:16:36 +05:30
a27360482b complete vmlinux transpiler.
TODO: struct_kioctx for x86_64 vmlinux.h has anonymous structs that refused to transpile well, so an extra rule has been written to make only the structs of that external. Fix this in the future.
2025-10-08 05:15:29 +05:30
c423cc647d add vmlinux.py transpiler from experiment repository
Signed-off-by: varun-r-mallya <varunrmallya@gmail.com>
2025-10-08 00:45:30 +05:30
8554688230 Merge pull request #25 from pythonbpf/dependabot/github_actions/actions-6a14be197d
Bump the actions group with 2 updates
2025-10-06 19:32:01 +05:30
3e873f378e Bump the actions group with 2 updates
Bumps the actions group with 2 updates: [actions/checkout](https://github.com/actions/checkout) and [actions/setup-python](https://github.com/actions/setup-python).


Updates `actions/checkout` from 4 to 5
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v4...v5)

Updates `actions/setup-python` from 5 to 6
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](https://github.com/actions/setup-python/compare/v5...v6)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: actions
- dependency-name: actions/setup-python
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: actions
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-10-06 11:26:23 +00:00
3abe07c5b2 add global symbol table populate function 2025-10-05 14:05:10 +05:30
01bd7604ed add global symbol table populate function 2025-10-05 14:04:25 +05:30
7ae84a0d5a add failing test 2025-10-05 00:55:38 +05:30
df3f00261a changer order of passes 2025-10-04 08:17:16 +05:30
ab610147a5 update globals test and todos. 2025-10-04 06:36:51 +05:30
7720fe9f9f format chore 2025-10-04 06:33:09 +05:30
7aeac86bd3 fix broken IR generation logic for globals 2025-10-04 06:32:25 +05:30
ab1c4223d5 fix broken IR generation logic for globals 2025-10-03 22:55:40 +05:30
c3a512d5cf add global support with broken generation function 2025-10-03 22:20:04 +05:30
4a60c42cd0 add global failing test
Signed-off-by: varun-r-mallya <varunrmallya@gmail.com>
2025-10-03 21:25:58 +05:30
39 changed files with 234292 additions and 187708 deletions

View File

@ -12,8 +12,8 @@ jobs:
name: Format name: Format
runs-on: ubuntu-latest runs-on: ubuntu-latest
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v5
- uses: actions/setup-python@v5 - uses: actions/setup-python@v6
with: with:
python-version: "3.x" python-version: "3.x"
- uses: pre-commit/action@v3.0.1 - uses: pre-commit/action@v3.0.1

12
TODO.md
View File

@ -1,12 +0,0 @@
## Short term
- Implement enough functionality to port the BCC tutorial examples in PythonBPF
- Add all maps
- XDP support in pylibbpf
- ringbuf support
- recursive expression resolution
## Long term
- Refactor the codebase to be better than a hackathon project
- Port to C++ and use actual LLVM?

27
examples/kprobes.py Normal file
View File

@ -0,0 +1,27 @@
from pythonbpf import bpf, section, bpfglobal, BPF
from ctypes import c_void_p, c_int64
@bpf
@section("kretprobe/do_unlinkat")
def hello_world(ctx: c_void_p) -> c_int64:
print("Hello, World!")
return c_int64(0)
@bpf
@section("kprobe/do_unlinkat")
def hello_world2(ctx: c_void_p) -> c_int64:
print("Hello, World!")
return c_int64(0)
@bpf
@bpfglobal
def LICENSE() -> str:
return "GPL"
b = BPF()
b.load_and_attach()
while True:
print("running")
# Now cat /sys/kernel/debug/tracing/trace_pipe to see results of unlink kprobe.

File diff suppressed because it is too large Load Diff

View File

@ -1,3 +1,10 @@
"""
PythonBPF - A Python frontend for eBPF programs.
This package provides decorators and compilation tools to write BPF programs
in Python syntax and compile them to eBPF bytecode that can run in the kernel.
"""
from .decorators import bpf, map, section, bpfglobal, struct from .decorators import bpf, map, section, bpfglobal, struct
from .codegen import compile_to_ir, compile, BPF from .codegen import compile_to_ir, compile, BPF

View File

@ -1,3 +1,10 @@
"""
Binary operations handling for BPF programs.
This module provides functions to handle binary operations (add, subtract,
multiply, etc.) and emit the corresponding LLVM IR instructions.
"""
import ast import ast
from llvmlite import ir from llvmlite import ir
from logging import Logger from logging import Logger
@ -35,6 +42,17 @@ def get_operand_value(operand, builder, local_sym_tab):
def handle_binary_op_impl(rval, builder, local_sym_tab): def handle_binary_op_impl(rval, builder, local_sym_tab):
"""
Handle binary operations and emit corresponding LLVM IR instructions.
Args:
rval: The AST BinOp node representing the binary operation
builder: LLVM IR builder for emitting instructions
local_sym_tab: Symbol table mapping variable names to their IR representations
Returns:
The LLVM IR value representing the result of the binary operation
"""
op = rval.op op = rval.op
left = get_operand_value(rval.left, builder, local_sym_tab) left = get_operand_value(rval.left, builder, local_sym_tab)
right = get_operand_value(rval.right, builder, local_sym_tab) right = get_operand_value(rval.right, builder, local_sym_tab)
@ -63,6 +81,18 @@ def handle_binary_op_impl(rval, builder, local_sym_tab):
def handle_binary_op(rval, builder, var_name, local_sym_tab): def handle_binary_op(rval, builder, var_name, local_sym_tab):
"""
Handle binary operations and optionally store the result to a variable.
Args:
rval: The AST BinOp node representing the binary operation
builder: LLVM IR builder for emitting instructions
var_name: Optional variable name to store the result
local_sym_tab: Symbol table mapping variable names to their IR representations
Returns:
A tuple of (result_value, result_type)
"""
result = handle_binary_op_impl(rval, builder, local_sym_tab) result = handle_binary_op_impl(rval, builder, local_sym_tab)
if var_name and var_name in local_sym_tab: if var_name and var_name in local_sym_tab:
logger.info( logger.info(

View File

@ -1,10 +1,22 @@
"""
Code generation module for PythonBPF.
This module handles the conversion of Python BPF programs to LLVM IR and
object files. It provides the main compilation pipeline from Python AST
to BPF bytecode.
"""
import ast import ast
from llvmlite import ir from llvmlite import ir
from .license_pass import license_processing from .license_pass import license_processing
from .functions import func_proc from .functions import func_proc
from .maps import maps_proc from .maps import maps_proc
from .structs import structs_proc from .structs import structs_proc
from .globals_pass import globals_processing from .globals_pass import (
globals_list_creation,
globals_processing,
populate_global_symbol_table,
)
from .debuginfo import DW_LANG_C11, DwarfBehaviorEnum, DebugInfoGenerator from .debuginfo import DW_LANG_C11, DwarfBehaviorEnum, DebugInfoGenerator
import os import os
import subprocess import subprocess
@ -33,6 +45,14 @@ def find_bpf_chunks(tree):
def processor(source_code, filename, module): def processor(source_code, filename, module):
"""
Process Python source code and convert BPF-decorated functions to LLVM IR.
Args:
source_code: The Python source code to process
filename: The name of the source file
module: The LLVM IR module to populate
"""
tree = ast.parse(source_code, filename) tree = ast.parse(source_code, filename)
logger.debug(ast.dump(tree, indent=4)) logger.debug(ast.dump(tree, indent=4))
@ -40,15 +60,29 @@ def processor(source_code, filename, module):
for func_node in bpf_chunks: for func_node in bpf_chunks:
logger.info(f"Found BPF function/struct: {func_node.name}") logger.info(f"Found BPF function/struct: {func_node.name}")
populate_global_symbol_table(tree, module)
license_processing(tree, module)
globals_processing(tree, module)
structs_sym_tab = structs_proc(tree, module, bpf_chunks) structs_sym_tab = structs_proc(tree, module, bpf_chunks)
map_sym_tab = maps_proc(tree, module, bpf_chunks) map_sym_tab = maps_proc(tree, module, bpf_chunks)
func_proc(tree, module, bpf_chunks, map_sym_tab, structs_sym_tab) func_proc(tree, module, bpf_chunks, map_sym_tab, structs_sym_tab)
license_processing(tree, module) globals_list_creation(tree, module)
globals_processing(tree, module)
def compile_to_ir(filename: str, output: str, loglevel=logging.INFO): def compile_to_ir(filename: str, output: str, loglevel=logging.INFO):
"""
Compile a Python BPF program to LLVM IR.
Args:
filename: Path to the Python source file containing BPF programs
output: Path where the LLVM IR (.ll) file will be written
loglevel: Logging level for compilation messages
Returns:
Path to the generated LLVM IR file
"""
logging.basicConfig( logging.basicConfig(
level=loglevel, format="%(asctime)s [%(levelname)s] %(name)s: %(message)s" level=loglevel, format="%(asctime)s [%(levelname)s] %(name)s: %(message)s"
) )
@ -122,6 +156,18 @@ def compile_to_ir(filename: str, output: str, loglevel=logging.INFO):
def compile(loglevel=logging.INFO) -> bool: def compile(loglevel=logging.INFO) -> bool:
"""
Compile the calling Python BPF program to an object file.
This function should be called from a Python file containing BPF programs.
It will compile the calling file to LLVM IR and then to a BPF object file.
Args:
loglevel: Logging level for compilation messages
Returns:
True if compilation succeeded, False otherwise
"""
# Look one level up the stack to the caller of this function # Look one level up the stack to the caller of this function
caller_frame = inspect.stack()[1] caller_frame = inspect.stack()[1]
caller_file = Path(caller_frame.filename).resolve() caller_file = Path(caller_frame.filename).resolve()
@ -155,6 +201,18 @@ def compile(loglevel=logging.INFO) -> bool:
def BPF(loglevel=logging.INFO) -> BpfProgram: def BPF(loglevel=logging.INFO) -> BpfProgram:
"""
Compile the calling Python BPF program and return a BpfProgram object.
This function compiles the calling file's BPF programs to an object file
and loads it into a BpfProgram object for immediate use.
Args:
loglevel: Logging level for compilation messages
Returns:
A BpfProgram object that can be used to load and attach BPF programs
"""
caller_frame = inspect.stack()[1] caller_frame = inspect.stack()[1]
src = inspect.getsource(caller_frame.frame) src = inspect.getsource(caller_frame.frame)
with tempfile.NamedTemporaryFile( with tempfile.NamedTemporaryFile(

View File

@ -1,3 +1,5 @@
"""Debug information generation for BPF programs (DWARF/BTF)."""
from .dwarf_constants import * # noqa: F403 from .dwarf_constants import * # noqa: F403
from .dtypes import * # noqa: F403 from .dtypes import * # noqa: F403
from .debug_info_generator import DebugInfoGenerator from .debug_info_generator import DebugInfoGenerator

View File

@ -8,11 +8,31 @@ from typing import Any, List
class DebugInfoGenerator: class DebugInfoGenerator:
"""
Generator for DWARF/BTF debug information in LLVM IR modules.
This class provides methods to create debug metadata for BPF programs,
including types, structs, globals, and compilation units.
"""
def __init__(self, module): def __init__(self, module):
"""
Initialize the debug info generator.
Args:
module: LLVM IR module to attach debug info to
"""
self.module = module self.module = module
self._type_cache = {} # Cache for common debug types self._type_cache = {} # Cache for common debug types
def generate_file_metadata(self, filename, dirname): def generate_file_metadata(self, filename, dirname):
"""
Generate file metadata for debug info.
Args:
filename: Name of the source file
dirname: Directory containing the source file
"""
self.module._file_metadata = self.module.add_debug_info( self.module._file_metadata = self.module.add_debug_info(
"DIFile", "DIFile",
{ # type: ignore { # type: ignore
@ -24,6 +44,15 @@ class DebugInfoGenerator:
def generate_debug_cu( def generate_debug_cu(
self, language, producer: str, is_optimized: bool, is_distinct: bool self, language, producer: str, is_optimized: bool, is_distinct: bool
): ):
"""
Generate debug compile unit metadata.
Args:
language: DWARF language code (e.g., DW_LANG_C11)
producer: Compiler/producer string
is_optimized: Whether the code is optimized
is_distinct: Whether the compile unit should be distinct
"""
self.module._debug_compile_unit = self.module.add_debug_info( self.module._debug_compile_unit = self.module.add_debug_info(
"DICompileUnit", "DICompileUnit",
{ # type: ignore { # type: ignore
@ -83,6 +112,16 @@ class DebugInfoGenerator:
@staticmethod @staticmethod
def _compute_array_size(base_type: Any, count: int) -> int: def _compute_array_size(base_type: Any, count: int) -> int:
"""
Compute the size of an array in bits.
Args:
base_type: The base type of the array
count: Number of elements in the array
Returns:
Total size in bits
"""
# Extract size from base_type if possible # Extract size from base_type if possible
# For simplicity, assuming base_type has a size attribute # For simplicity, assuming base_type has a size attribute
return getattr(base_type, "size", 32) * count return getattr(base_type, "size", 32) * count

View File

@ -1,7 +1,10 @@
"""Debug information types and constants."""
import llvmlite.ir as ir import llvmlite.ir as ir
class DwarfBehaviorEnum: class DwarfBehaviorEnum:
"""DWARF module flag behavior constants for LLVM."""
ERROR_IF_MISMATCH = ir.Constant(ir.IntType(32), 1) ERROR_IF_MISMATCH = ir.Constant(ir.IntType(32), 1)
WARNING_IF_MISMATCH = ir.Constant(ir.IntType(32), 2) WARNING_IF_MISMATCH = ir.Constant(ir.IntType(32), 2)
OVERRIDE_USE_LARGEST = ir.Constant(ir.IntType(32), 7) OVERRIDE_USE_LARGEST = ir.Constant(ir.IntType(32), 7)

View File

@ -1,3 +1,9 @@
"""
DWARF debugging format constants.
Generated constants from dwarf.h for use in debug information generation.
"""
# generated constants from dwarf.h # generated constants from dwarf.h
DW_UT_compile = 0x01 DW_UT_compile = 0x01

View File

@ -1,3 +1,11 @@
"""
Decorators for marking BPF functions, maps, structs, and globals.
This module provides the core decorators used to annotate Python code
for BPF compilation.
"""
def bpf(func): def bpf(func):
"""Decorator to mark a function for BPF compilation.""" """Decorator to mark a function for BPF compilation."""
func._is_bpf = True func._is_bpf = True
@ -23,7 +31,17 @@ def struct(cls):
def section(name: str): def section(name: str):
"""
Decorator to specify the ELF section name for a BPF program.
Args:
name: The section name (e.g., 'xdp', 'tracepoint/syscalls/sys_enter_execve')
Returns:
A decorator function that marks the function with the section name
"""
def wrapper(fn): def wrapper(fn):
"""Decorator that sets the section name on the function."""
fn._section = name fn._section = name
return fn return fn

View File

@ -1,3 +1,5 @@
"""Expression evaluation and processing for BPF programs."""
from .expr_pass import eval_expr, handle_expr from .expr_pass import eval_expr, handle_expr
from .type_normalization import convert_to_bool from .type_normalization import convert_to_bool

View File

@ -1,3 +1,11 @@
"""
Expression evaluation and LLVM IR generation.
This module handles the evaluation of Python expressions in BPF programs,
including variables, constants, function calls, comparisons, boolean
operations, and more.
"""
import ast import ast
from llvmlite import ir from llvmlite import ir
from logging import Logger from logging import Logger
@ -332,6 +340,21 @@ def eval_expr(
map_sym_tab, map_sym_tab,
structs_sym_tab=None, structs_sym_tab=None,
): ):
"""
Evaluate an expression and return its LLVM IR value and type.
Args:
func: The LLVM IR function being built
module: The LLVM IR module
builder: LLVM IR builder
expr: The AST expression node to evaluate
local_sym_tab: Local symbol table
map_sym_tab: Map symbol table
structs_sym_tab: Struct symbol table
Returns:
A tuple of (value, type) or None if evaluation fails
"""
logger.info(f"Evaluating expression: {ast.dump(expr)}") logger.info(f"Evaluating expression: {ast.dump(expr)}")
if isinstance(expr, ast.Name): if isinstance(expr, ast.Name):
return _handle_name_expr(expr, local_sym_tab, builder) return _handle_name_expr(expr, local_sym_tab, builder)

View File

@ -1,3 +1,10 @@
"""
Type normalization and comparison operations for expressions.
This module provides utilities for normalizing types between expressions,
handling pointer dereferencing, and generating comparison operations.
"""
from llvmlite import ir from llvmlite import ir
import logging import logging
import ast import ast
@ -17,7 +24,15 @@ COMPARISON_OPS = {
def _get_base_type_and_depth(ir_type): def _get_base_type_and_depth(ir_type):
"""Get the base type for pointer types.""" """
Get the base type and pointer depth for an LLVM IR type.
Args:
ir_type: The LLVM IR type to analyze
Returns:
A tuple of (base_type, depth) where depth is the number of pointer levels
"""
cur_type = ir_type cur_type = ir_type
depth = 0 depth = 0
while isinstance(cur_type, ir.PointerType): while isinstance(cur_type, ir.PointerType):
@ -27,7 +42,18 @@ def _get_base_type_and_depth(ir_type):
def _deref_to_depth(func, builder, val, target_depth): def _deref_to_depth(func, builder, val, target_depth):
"""Dereference a pointer to a certain depth.""" """
Dereference a pointer to a certain depth with null checks.
Args:
func: The LLVM IR function being built
builder: LLVM IR builder
val: The pointer value to dereference
target_depth: Number of levels to dereference
Returns:
The dereferenced value, or None if dereferencing fails
"""
cur_val = val cur_val = val
cur_type = val.type cur_type = val.type
@ -73,7 +99,18 @@ def _deref_to_depth(func, builder, val, target_depth):
def _normalize_types(func, builder, lhs, rhs): def _normalize_types(func, builder, lhs, rhs):
"""Normalize types for comparison.""" """
Normalize types for comparison by casting or dereferencing as needed.
Args:
func: The LLVM IR function being built
builder: LLVM IR builder
lhs: Left-hand side value
rhs: Right-hand side value
Returns:
A tuple of (normalized_lhs, normalized_rhs) or (None, None) on error
"""
logger.info(f"Normalizing types: {lhs.type} vs {rhs.type}") logger.info(f"Normalizing types: {lhs.type} vs {rhs.type}")
if isinstance(lhs.type, ir.IntType) and isinstance(rhs.type, ir.IntType): if isinstance(lhs.type, ir.IntType) and isinstance(rhs.type, ir.IntType):
@ -99,7 +136,16 @@ def _normalize_types(func, builder, lhs, rhs):
def convert_to_bool(builder, val): def convert_to_bool(builder, val):
"""Convert a value to boolean.""" """
Convert an LLVM IR value to a boolean (i1) type.
Args:
builder: LLVM IR builder
val: The value to convert
Returns:
An i1 boolean value
"""
if val.type == ir.IntType(1): if val.type == ir.IntType(1):
return val return val
if isinstance(val.type, ir.PointerType): if isinstance(val.type, ir.PointerType):
@ -110,7 +156,19 @@ def convert_to_bool(builder, val):
def handle_comparator(func, builder, op, lhs, rhs): def handle_comparator(func, builder, op, lhs, rhs):
"""Handle comparison operations.""" """
Handle comparison operations between two values.
Args:
func: The LLVM IR function being built
builder: LLVM IR builder
op: The AST comparison operator node
lhs: Left-hand side value
rhs: Right-hand side value
Returns:
A tuple of (result, ir.IntType(1)) or None on error
"""
if lhs.type != rhs.type: if lhs.type != rhs.type:
lhs, rhs = _normalize_types(func, builder, lhs, rhs) lhs, rhs = _normalize_types(func, builder, lhs, rhs)

View File

@ -1,3 +1,5 @@
"""BPF function processing and LLVM IR generation."""
from .functions_pass import func_proc from .functions_pass import func_proc
__all__ = ["func_proc"] __all__ = ["func_proc"]

View File

@ -1,3 +1,5 @@
"""Registry for statement handler functions."""
from typing import Dict from typing import Dict
@ -11,6 +13,7 @@ class StatementHandlerRegistry:
"""Register a handler for a specific statement type.""" """Register a handler for a specific statement type."""
def decorator(handler): def decorator(handler):
"""Decorator that registers the handler."""
cls._handlers[stmt_type] = handler cls._handlers[stmt_type] = handler
return handler return handler

View File

@ -1,3 +1,11 @@
"""
BPF function processing and LLVM IR generation.
This module handles the core function processing, converting Python function
definitions into LLVM IR for BPF programs. It manages local variables,
control flow, and statement processing.
"""
from llvmlite import ir from llvmlite import ir
import ast import ast
import logging import logging
@ -17,11 +25,20 @@ logger = logging.getLogger(__name__)
@dataclass @dataclass
class LocalSymbol: class LocalSymbol:
"""
Represents a local variable in a BPF function.
Attributes:
var: LLVM IR alloca instruction for the variable
ir_type: LLVM IR type of the variable
metadata: Optional metadata (e.g., struct type name)
"""
var: ir.AllocaInstr var: ir.AllocaInstr
ir_type: ir.Type ir_type: ir.Type
metadata: Any = None metadata: Any = None
def __iter__(self): def __iter__(self):
"""Support tuple unpacking of LocalSymbol."""
yield self.var yield self.var
yield self.ir_type yield self.ir_type
yield self.metadata yield self.metadata
@ -243,6 +260,21 @@ def handle_assign(
def handle_cond( def handle_cond(
func, module, builder, cond, local_sym_tab, map_sym_tab, structs_sym_tab=None func, module, builder, cond, local_sym_tab, map_sym_tab, structs_sym_tab=None
): ):
"""
Evaluate a condition expression and convert it to a boolean value.
Args:
func: The LLVM IR function being built
module: The LLVM IR module
builder: LLVM IR builder
cond: The AST condition node to evaluate
local_sym_tab: Local symbol table
map_sym_tab: Map symbol table
structs_sym_tab: Struct symbol table
Returns:
LLVM IR boolean value representing the condition result
"""
val = eval_expr( val = eval_expr(
func, module, builder, cond, local_sym_tab, map_sym_tab, structs_sym_tab func, module, builder, cond, local_sym_tab, map_sym_tab, structs_sym_tab
)[0] )[0]
@ -298,6 +330,18 @@ def handle_if(
def handle_return(builder, stmt, local_sym_tab, ret_type): def handle_return(builder, stmt, local_sym_tab, ret_type):
"""
Handle return statements in BPF functions.
Args:
builder: LLVM IR builder
stmt: The AST Return node
local_sym_tab: Local symbol table
ret_type: Expected return type
Returns:
True if a return was emitted, False otherwise
"""
logger.info(f"Handling return statement: {ast.dump(stmt)}") logger.info(f"Handling return statement: {ast.dump(stmt)}")
if stmt.value is None: if stmt.value is None:
return _handle_none_return(builder) return _handle_none_return(builder)
@ -329,6 +373,23 @@ def process_stmt(
did_return, did_return,
ret_type=ir.IntType(64), ret_type=ir.IntType(64),
): ):
"""
Process a single statement in a BPF function.
Args:
func: The LLVM IR function being built
module: The LLVM IR module
builder: LLVM IR builder
stmt: The AST statement node to process
local_sym_tab: Local symbol table
map_sym_tab: Map symbol table
structs_sym_tab: Struct symbol table
did_return: Whether a return has been emitted
ret_type: Expected return type
Returns:
True if a return was emitted, False otherwise
"""
logger.info(f"Processing statement: {ast.dump(stmt)}") logger.info(f"Processing statement: {ast.dump(stmt)}")
if isinstance(stmt, ast.Expr): if isinstance(stmt, ast.Expr):
handle_expr( handle_expr(
@ -363,6 +424,25 @@ def process_stmt(
def allocate_mem( def allocate_mem(
module, builder, body, func, ret_type, map_sym_tab, local_sym_tab, structs_sym_tab module, builder, body, func, ret_type, map_sym_tab, local_sym_tab, structs_sym_tab
): ):
"""
Pre-allocate stack memory for local variables in a BPF function.
This function scans the function body and creates alloca instructions
for all local variables before processing the function statements.
Args:
module: The LLVM IR module
builder: LLVM IR builder
body: List of AST statements in the function body
func: The LLVM IR function being built
ret_type: Expected return type
map_sym_tab: Map symbol table
local_sym_tab: Local symbol table to populate
structs_sym_tab: Struct symbol table
Returns:
Updated local symbol table
"""
for stmt in body: for stmt in body:
has_metadata = False has_metadata = False
if isinstance(stmt, ast.If): if isinstance(stmt, ast.If):
@ -556,6 +636,16 @@ def process_bpf_chunk(func_node, module, return_type, map_sym_tab, structs_sym_t
def func_proc(tree, module, chunks, map_sym_tab, structs_sym_tab): def func_proc(tree, module, chunks, map_sym_tab, structs_sym_tab):
"""
Process all BPF function chunks and generate LLVM IR.
Args:
tree: The Python AST (not used in current implementation)
module: The LLVM IR module to add functions to
chunks: List of AST function nodes decorated with @bpf
map_sym_tab: Map symbol table
structs_sym_tab: Struct symbol table
"""
for func_node in chunks: for func_node in chunks:
is_global = False is_global = False
for decorator in func_node.decorator_list: for decorator in func_node.decorator_list:
@ -581,6 +671,18 @@ def func_proc(tree, module, chunks, map_sym_tab, structs_sym_tab):
def infer_return_type(func_node: ast.FunctionDef): def infer_return_type(func_node: ast.FunctionDef):
"""
Infer the return type of a BPF function from annotations or return statements.
Args:
func_node: The AST function node
Returns:
String representation of the return type (e.g., 'c_int64')
Raises:
TypeError: If func_node is not a FunctionDef
"""
if not isinstance(func_node, (ast.FunctionDef, ast.AsyncFunctionDef)): if not isinstance(func_node, (ast.FunctionDef, ast.AsyncFunctionDef)):
raise TypeError("Expected ast.FunctionDef") raise TypeError("Expected ast.FunctionDef")
if func_node.returns is not None: if func_node.returns is not None:
@ -599,6 +701,7 @@ def infer_return_type(func_node: ast.FunctionDef):
found_type = None found_type = None
def _expr_type(e): def _expr_type(e):
"""Helper function to extract type from an expression."""
if e is None: if e is None:
return "None" return "None"
if isinstance(e, ast.Constant): if isinstance(e, ast.Constant):

View File

@ -1,3 +1,10 @@
"""
Utility functions for handling return statements in BPF functions.
Provides handlers for different types of returns including XDP actions,
None returns, and standard returns.
"""
import logging import logging
import ast import ast

View File

@ -1,10 +1,155 @@
"""
Global variables and compiler metadata processing.
This module handles BPF global variables and emits the @llvm.compiler.used
metadata to prevent LLVM from optimizing away important symbols.
"""
from llvmlite import ir from llvmlite import ir
import ast import ast
from logging import Logger
import logging
from .type_deducer import ctypes_to_ir
def emit_globals(module: ir.Module, names: list[str]): logger: Logger = logging.getLogger(__name__)
# TODO: this is going to be a huge fuck of a headache in the future.
global_sym_tab = []
def populate_global_symbol_table(tree, module: ir.Module):
""" """
Emit the @llvm.compiler.used global given a list of function/global names. Populate the global symbol table with BPF functions, maps, and globals.
Args:
tree: The Python AST to scan for global symbols
module: The LLVM IR module (not used in current implementation)
Returns:
False (legacy return value)
"""
for node in tree.body:
if isinstance(node, ast.FunctionDef):
for dec in node.decorator_list:
if (
isinstance(dec, ast.Call)
and isinstance(dec.func, ast.Name)
and dec.func.id == "section"
and len(dec.args) == 1
and isinstance(dec.args[0], ast.Constant)
and isinstance(dec.args[0].value, str)
):
global_sym_tab.append(node)
elif isinstance(dec, ast.Name) and dec.id == "bpfglobal":
global_sym_tab.append(node)
elif isinstance(dec, ast.Name) and dec.id == "map":
global_sym_tab.append(node)
return False
def emit_global(module: ir.Module, node, name):
"""
Emit a BPF global variable into the LLVM IR module.
Args:
module: The LLVM IR module to add the global variable to
node: The AST function node containing the global definition
name: The name of the global variable
Returns:
The created global variable
"""
logger.info(f"global identifier {name} processing")
# deduce LLVM type from the annotated return
if not isinstance(node.returns, ast.Name):
raise ValueError(f"Unsupported return annotation {ast.dump(node.returns)}")
ty = ctypes_to_ir(node.returns.id)
# extract the return expression
# TODO: turn this return extractor into a generic function I can use everywhere.
ret_stmt = node.body[0]
if not isinstance(ret_stmt, ast.Return) or ret_stmt.value is None:
raise ValueError(f"Global '{name}' has no valid return")
init_val = ret_stmt.value
# simple constant like "return 0"
if isinstance(init_val, ast.Constant):
llvm_init = ir.Constant(ty, init_val.value)
# variable reference like "return SOME_CONST"
elif isinstance(init_val, ast.Name):
# need symbol resolution here, stub as 0 for now
raise ValueError(f"Name reference {init_val.id} not yet supported")
# constructor call like "return c_int64(0)" or dataclass(...)
elif isinstance(init_val, ast.Call):
if len(init_val.args) >= 1 and isinstance(init_val.args[0], ast.Constant):
llvm_init = ir.Constant(ty, init_val.args[0].value)
else:
logger.info("Defaulting to zero as no constant argument found")
llvm_init = ir.Constant(ty, 0)
else:
raise ValueError(f"Unsupported return expr {ast.dump(init_val)}")
gvar = ir.GlobalVariable(module, ty, name=name)
gvar.initializer = llvm_init
gvar.align = 8
gvar.linkage = "dso_local"
gvar.global_constant = False
return gvar
def globals_processing(tree, module):
"""Process stuff decorated with @bpf and @bpfglobal except license and return the section name"""
globals_sym_tab = []
for node in tree.body:
# Skip non-assignment and non-function nodes
if not (isinstance(node, ast.FunctionDef)):
continue
# Get the name based on node type
if isinstance(node, ast.FunctionDef):
name = node.name
else:
continue
# Check for duplicate names
if name in globals_sym_tab:
raise SyntaxError(f"ERROR: Global name '{name}' previously defined")
else:
globals_sym_tab.append(name)
if isinstance(node, ast.FunctionDef) and node.name != "LICENSE":
decorators = [
dec.id for dec in node.decorator_list if isinstance(dec, ast.Name)
]
if "bpf" in decorators and "bpfglobal" in decorators:
if (
len(node.body) == 1
and isinstance(node.body[0], ast.Return)
and node.body[0].value is not None
and isinstance(
node.body[0].value, (ast.Constant, ast.Name, ast.Call)
)
):
emit_global(module, node, name)
else:
raise SyntaxError(f"ERROR: Invalid syntax for {name} global")
return None
def emit_llvm_compiler_used(module: ir.Module, names: list[str]):
"""
Emit the @llvm.compiler.used global to prevent LLVM from optimizing away symbols.
Args:
module: The LLVM IR module to add the compiler.used metadata to
names: List of function/global names that must be preserved
""" """
ptr_ty = ir.PointerType() ptr_ty = ir.PointerType()
used_array_ty = ir.ArrayType(ptr_ty, len(names)) used_array_ty = ir.ArrayType(ptr_ty, len(names))
@ -24,7 +169,14 @@ def emit_globals(module: ir.Module, names: list[str]):
gv.section = "llvm.metadata" gv.section = "llvm.metadata"
def globals_processing(tree, module: ir.Module): def globals_list_creation(tree, module: ir.Module):
"""
Collect all BPF symbols and emit @llvm.compiler.used metadata.
Args:
tree: The Python AST to scan for symbols
module: The LLVM IR module to add metadata to
"""
collected = ["LICENSE"] collected = ["LICENSE"]
for node in tree.body: for node in tree.body:
@ -40,10 +192,11 @@ def globals_processing(tree, module: ir.Module):
): ):
collected.append(node.name) collected.append(node.name)
elif isinstance(dec, ast.Name) and dec.id == "bpfglobal": # NOTE: all globals other than
collected.append(node.name) # elif isinstance(dec, ast.Name) and dec.id == "bpfglobal":
# collected.append(node.name)
elif isinstance(dec, ast.Name) and dec.id == "map": elif isinstance(dec, ast.Name) and dec.id == "map":
collected.append(node.name) collected.append(node.name)
emit_globals(module, collected) emit_llvm_compiler_used(module, collected)

View File

@ -1,3 +1,5 @@
"""BPF helper functions and handlers."""
from .helper_utils import HelperHandlerRegistry from .helper_utils import HelperHandlerRegistry
from .bpf_helper_handler import handle_helper_call from .bpf_helper_handler import handle_helper_call
from .helpers import ktime, pid, deref, XDP_DROP, XDP_PASS from .helpers import ktime, pid, deref, XDP_DROP, XDP_PASS

View File

@ -1,3 +1,11 @@
"""
BPF helper function handlers for LLVM IR emission.
This module provides handlers for various BPF helper functions, emitting
the appropriate LLVM IR to call kernel BPF helpers like map operations,
printing, time functions, etc.
"""
import ast import ast
from llvmlite import ir from llvmlite import ir
from enum import Enum from enum import Enum
@ -16,6 +24,7 @@ logger: Logger = logging.getLogger(__name__)
class BPFHelperID(Enum): class BPFHelperID(Enum):
"""Enumeration of BPF helper function IDs."""
BPF_MAP_LOOKUP_ELEM = 1 BPF_MAP_LOOKUP_ELEM = 1
BPF_MAP_UPDATE_ELEM = 2 BPF_MAP_UPDATE_ELEM = 2
BPF_MAP_DELETE_ELEM = 3 BPF_MAP_DELETE_ELEM = 3
@ -252,6 +261,11 @@ def bpf_perf_event_output_handler(
local_sym_tab=None, local_sym_tab=None,
struct_sym_tab=None, struct_sym_tab=None,
): ):
"""
Emit LLVM IR for bpf_perf_event_output helper function call.
This allows sending data to userspace via a perf event array.
"""
if len(call.args) != 1: if len(call.args) != 1:
raise ValueError( raise ValueError(
f"Perf event output expects exactly one argument, got {len(call.args)}" f"Perf event output expects exactly one argument, got {len(call.args)}"
@ -302,6 +316,7 @@ def handle_helper_call(
# Helper function to get map pointer and invoke handler # Helper function to get map pointer and invoke handler
def invoke_helper(method_name, map_ptr=None): def invoke_helper(method_name, map_ptr=None):
"""Helper function to look up and invoke a registered handler."""
handler = HelperHandlerRegistry.get_handler(method_name) handler = HelperHandlerRegistry.get_handler(method_name)
if not handler: if not handler:
raise NotImplementedError( raise NotImplementedError(

View File

@ -1,3 +1,11 @@
"""
Utility functions for BPF helper function handling.
This module provides utility functions for processing BPF helper function
calls, including argument handling, string formatting for bpf_printk,
and a registry for helper function handlers.
"""
import ast import ast
import logging import logging
from collections.abc import Callable from collections.abc import Callable
@ -18,6 +26,7 @@ class HelperHandlerRegistry:
"""Decorator to register a handler function for a helper""" """Decorator to register a handler function for a helper"""
def decorator(func): def decorator(func):
"""Decorator that registers the handler function."""
cls._handlers[helper_name] = func cls._handlers[helper_name] = func
return func return func
@ -35,14 +44,36 @@ class HelperHandlerRegistry:
def get_var_ptr_from_name(var_name, local_sym_tab): def get_var_ptr_from_name(var_name, local_sym_tab):
"""Get a pointer to a variable from the symbol table.""" """
Get a pointer to a variable from the symbol table.
Args:
var_name: Name of the variable to look up
local_sym_tab: Local symbol table
Returns:
Pointer to the variable
Raises:
ValueError: If the variable is not found
"""
if local_sym_tab and var_name in local_sym_tab: if local_sym_tab and var_name in local_sym_tab:
return local_sym_tab[var_name].var return local_sym_tab[var_name].var
raise ValueError(f"Variable '{var_name}' not found in local symbol table") raise ValueError(f"Variable '{var_name}' not found in local symbol table")
def create_int_constant_ptr(value, builder, int_width=64): def create_int_constant_ptr(value, builder, int_width=64):
"""Create a pointer to an integer constant.""" """
Create a pointer to an integer constant.
Args:
value: The integer value
builder: LLVM IR builder
int_width: Width of the integer in bits (default: 64)
Returns:
Pointer to the allocated integer constant
"""
# Default to 64-bit integer # Default to 64-bit integer
int_type = ir.IntType(int_width) int_type = ir.IntType(int_width)
ptr = builder.alloca(int_type) ptr = builder.alloca(int_type)
@ -52,7 +83,20 @@ def create_int_constant_ptr(value, builder, int_width=64):
def get_or_create_ptr_from_arg(arg, builder, local_sym_tab): def get_or_create_ptr_from_arg(arg, builder, local_sym_tab):
"""Extract or create pointer from the call arguments.""" """
Extract or create pointer from call arguments.
Args:
arg: The AST argument node
builder: LLVM IR builder
local_sym_tab: Local symbol table
Returns:
Pointer to the argument value
Raises:
NotImplementedError: If the argument type is not supported
"""
if isinstance(arg, ast.Name): if isinstance(arg, ast.Name):
ptr = get_var_ptr_from_name(arg.id, local_sym_tab) ptr = get_var_ptr_from_name(arg.id, local_sym_tab)
@ -66,7 +110,21 @@ def get_or_create_ptr_from_arg(arg, builder, local_sym_tab):
def get_flags_val(arg, builder, local_sym_tab): def get_flags_val(arg, builder, local_sym_tab):
"""Extract or create flags value from the call arguments.""" """
Extract or create flags value from call arguments.
Args:
arg: The AST argument node for flags
builder: LLVM IR builder
local_sym_tab: Local symbol table
Returns:
Integer flags value or LLVM IR value
Raises:
ValueError: If a variable is not found in symbol table
NotImplementedError: If the argument type is not supported
"""
if not arg: if not arg:
return 0 return 0
@ -85,7 +143,18 @@ def get_flags_val(arg, builder, local_sym_tab):
def simple_string_print(string_value, module, builder, func): def simple_string_print(string_value, module, builder, func):
"""Prepare arguments for bpf_printk from a simple string value""" """
Prepare arguments for bpf_printk from a simple string value.
Args:
string_value: The string to print
module: LLVM IR module
builder: LLVM IR builder
func: The LLVM IR function being built
Returns:
List of arguments for bpf_printk
"""
fmt_str = string_value + "\n\0" fmt_str = string_value + "\n\0"
fmt_ptr = _create_format_string_global(fmt_str, func, module, builder) fmt_ptr = _create_format_string_global(fmt_str, func, module, builder)
@ -101,7 +170,23 @@ def handle_fstring_print(
local_sym_tab=None, local_sym_tab=None,
struct_sym_tab=None, struct_sym_tab=None,
): ):
"""Handle f-string formatting for bpf_printk emitter.""" """
Handle f-string formatting for bpf_printk emitter.
Args:
joined_str: AST JoinedStr node representing the f-string
module: LLVM IR module
builder: LLVM IR builder
func: The LLVM IR function being built
local_sym_tab: Local symbol table
struct_sym_tab: Struct symbol table
Returns:
List of arguments for bpf_printk
Raises:
NotImplementedError: If f-string contains unsupported value types
"""
fmt_parts = [] fmt_parts = []
exprs = [] exprs = []

View File

@ -1,11 +1,31 @@
"""
BPF helper function stubs for Python type hints.
This module provides Python stub functions that represent BPF helper functions.
These stubs are used for type checking and will be replaced with actual BPF
helper calls during compilation.
"""
import ctypes import ctypes
def ktime(): def ktime():
"""
Get the current kernel time in nanoseconds.
Returns:
A c_int64 stub value (actual implementation is in BPF runtime)
"""
return ctypes.c_int64(0) return ctypes.c_int64(0)
def pid(): def pid():
"""
Get the current process ID (PID).
Returns:
A c_int32 stub value (actual implementation is in BPF runtime)
"""
return ctypes.c_int32(0) return ctypes.c_int32(0)

View File

@ -1,3 +1,10 @@
"""
LICENSE global variable processing for BPF programs.
This module handles the processing of the LICENSE function which is required
for BPF programs to declare their license (typically "GPL").
"""
from llvmlite import ir from llvmlite import ir
import ast import ast
from logging import Logger from logging import Logger
@ -7,6 +14,16 @@ logger: Logger = logging.getLogger(__name__)
def emit_license(module: ir.Module, license_str: str): def emit_license(module: ir.Module, license_str: str):
"""
Emit a LICENSE global variable into the LLVM IR module.
Args:
module: The LLVM IR module to add the LICENSE variable to
license_str: The license string (e.g., 'GPL')
Returns:
The created global variable
"""
license_bytes = license_str.encode("utf8") + b"\x00" license_bytes = license_str.encode("utf8") + b"\x00"
elems = [ir.Constant(ir.IntType(8), b) for b in license_bytes] elems = [ir.Constant(ir.IntType(8), b) for b in license_bytes]
ty = ir.ArrayType(ir.IntType(8), len(elems)) ty = ir.ArrayType(ir.IntType(8), len(elems))

View File

@ -1,3 +1,5 @@
"""BPF map types and processing."""
from .maps import HashMap, PerfEventArray, RingBuf from .maps import HashMap, PerfEventArray, RingBuf
from .maps_pass import maps_proc from .maps_pass import maps_proc

View File

@ -1,18 +1,59 @@
"""
BPF map type definitions for Python type hints.
This module provides Python classes that represent BPF map types.
These are used for type checking and map definition; the actual BPF maps
are generated as LLVM IR during compilation.
"""
# This file provides type and function hints only and does not actually give any functionality. # This file provides type and function hints only and does not actually give any functionality.
class HashMap: class HashMap:
"""
A BPF hash map for storing key-value pairs.
This is a type hint class used during compilation. The actual BPF map
implementation is generated as LLVM IR.
"""
def __init__(self, key, value, max_entries): def __init__(self, key, value, max_entries):
"""
Initialize a HashMap definition.
Args:
key: The ctypes type for keys (e.g., c_int64)
value: The ctypes type for values (e.g., c_int64)
max_entries: Maximum number of entries the map can hold
"""
self.key = key self.key = key
self.value = value self.value = value
self.max_entries = max_entries self.max_entries = max_entries
self.entries = {} self.entries = {}
def lookup(self, key): def lookup(self, key):
"""
Look up a value by key in the map.
Args:
key: The key to look up
Returns:
The value if found, None otherwise
"""
if key in self.entries: if key in self.entries:
return self.entries[key] return self.entries[key]
else: else:
return None return None
def delete(self, key): def delete(self, key):
"""
Delete an entry from the map by key.
Args:
key: The key to delete
Raises:
KeyError: If the key is not found in the map
"""
if key in self.entries: if key in self.entries:
del self.entries[key] del self.entries[key]
else: else:
@ -20,6 +61,17 @@ class HashMap:
# TODO: define the flags that can be added # TODO: define the flags that can be added
def update(self, key, value, flags=None): def update(self, key, value, flags=None):
"""
Update or insert a key-value pair in the map.
Args:
key: The key to update
value: The new value
flags: Optional flags for update behavior
Raises:
KeyError: If the key is not found in the map
"""
if key in self.entries: if key in self.entries:
self.entries[key] = value self.entries[key] = value
else: else:
@ -27,25 +79,76 @@ class HashMap:
class PerfEventArray: class PerfEventArray:
"""
A BPF perf event array for sending data to userspace.
This is a type hint class used during compilation.
"""
def __init__(self, key_size, value_size): def __init__(self, key_size, value_size):
"""
Initialize a PerfEventArray definition.
Args:
key_size: The size/type for keys
value_size: The size/type for values
"""
self.key_type = key_size self.key_type = key_size
self.value_type = value_size self.value_type = value_size
self.entries = {} self.entries = {}
def output(self, data): def output(self, data):
"""
Output data to the perf event array.
Args:
data: The data to output
"""
pass # Placeholder for output method pass # Placeholder for output method
class RingBuf: class RingBuf:
"""
A BPF ring buffer for efficient data transfer to userspace.
This is a type hint class used during compilation.
"""
def __init__(self, max_entries): def __init__(self, max_entries):
"""
Initialize a RingBuf definition.
Args:
max_entries: Maximum number of entries the ring buffer can hold
"""
self.max_entries = max_entries self.max_entries = max_entries
def reserve(self, size: int, flags=0): def reserve(self, size: int, flags=0):
"""
Reserve space in the ring buffer.
Args:
size: Size in bytes to reserve
flags: Optional reservation flags
Returns:
0 as a placeholder (actual implementation is in BPF runtime)
Raises:
ValueError: If size exceeds max_entries
"""
if size > self.max_entries: if size > self.max_entries:
raise ValueError("size cannot be greater than set maximum entries") raise ValueError("size cannot be greater than set maximum entries")
return 0 return 0
def submit(self, data, flags=0): def submit(self, data, flags=0):
"""
Submit data to the ring buffer.
Args:
data: The data to submit
flags: Optional submission flags
"""
pass pass
# add discard, output and also give names to flags and stuff # add discard, output and also give names to flags and stuff

View File

@ -1,3 +1,10 @@
"""
BPF map processing and LLVM IR generation.
This module handles the processing of BPF map definitions decorated with @map,
converting them to appropriate LLVM IR global variables with BTF debug info.
"""
import ast import ast
from logging import Logger from logging import Logger
from llvmlite import ir from llvmlite import ir
@ -20,6 +27,15 @@ def maps_proc(tree, module, chunks):
def is_map(func_node): def is_map(func_node):
"""
Check if a function node is decorated with @map.
Args:
func_node: The AST function node to check
Returns:
True if the function is decorated with @map, False otherwise
"""
return any( return any(
isinstance(decorator, ast.Name) and decorator.id == "map" isinstance(decorator, ast.Name) and decorator.id == "map"
for decorator in func_node.decorator_list for decorator in func_node.decorator_list
@ -27,6 +43,7 @@ def is_map(func_node):
class BPFMapType(Enum): class BPFMapType(Enum):
"""Enumeration of BPF map types."""
UNSPEC = 0 UNSPEC = 0
HASH = 1 HASH = 1
ARRAY = 2 ARRAY = 2
@ -65,7 +82,17 @@ class BPFMapType(Enum):
def create_bpf_map(module, map_name, map_params): def create_bpf_map(module, map_name, map_params):
"""Create a BPF map in the module with given parameters and debug info""" """
Create a BPF map in the module with given parameters and debug info.
Args:
module: The LLVM IR module to add the map to
map_name: The name of the BPF map
map_params: Dictionary of map parameters (type, key_size, value_size, max_entries)
Returns:
The created global variable representing the map
"""
# Create the anonymous struct type for BPF map # Create the anonymous struct type for BPF map
map_struct_type = ir.LiteralStructType( map_struct_type = ir.LiteralStructType(

View File

@ -1,3 +1,5 @@
"""Registry for BPF map processor functions."""
from collections.abc import Callable from collections.abc import Callable
from typing import Any from typing import Any
@ -12,6 +14,7 @@ class MapProcessorRegistry:
"""Decorator to register a processor function for a map type""" """Decorator to register a processor function for a map type"""
def decorator(func): def decorator(func):
"""Decorator that registers the processor function."""
cls._processors[map_type_name] = func cls._processors[map_type_name] = func
return func return func

View File

@ -1,3 +1,5 @@
"""Struct processing for BPF programs."""
from .structs_pass import structs_proc from .structs_pass import structs_proc
__all__ = ["structs_proc"] __all__ = ["structs_proc"]

View File

@ -1,19 +1,72 @@
"""
Struct type wrapper for BPF structs.
This module provides a wrapper class for LLVM IR struct types with
helper methods for field access and manipulation.
"""
from llvmlite import ir from llvmlite import ir
class StructType: class StructType:
"""
Wrapper class for LLVM IR struct types with field access helpers.
Attributes:
ir_type: The LLVM IR struct type
fields: Dictionary mapping field names to their types
size: Total size of the struct in bytes
"""
def __init__(self, ir_type, fields, size): def __init__(self, ir_type, fields, size):
"""
Initialize a StructType.
Args:
ir_type: The LLVM IR struct type
fields: Dictionary mapping field names to their types
size: Total size of the struct in bytes
"""
self.ir_type = ir_type self.ir_type = ir_type
self.fields = fields self.fields = fields
self.size = size self.size = size
def field_idx(self, field_name): def field_idx(self, field_name):
"""
Get the index of a field in the struct.
Args:
field_name: The name of the field
Returns:
The zero-based index of the field
"""
return list(self.fields.keys()).index(field_name) return list(self.fields.keys()).index(field_name)
def field_type(self, field_name): def field_type(self, field_name):
"""
Get the LLVM IR type of a field.
Args:
field_name: The name of the field
Returns:
The LLVM IR type of the field
"""
return self.fields[field_name] return self.fields[field_name]
def gep(self, builder, ptr, field_name): def gep(self, builder, ptr, field_name):
"""
Generate a GEP (GetElementPtr) instruction to access a struct field.
Args:
builder: LLVM IR builder
ptr: Pointer to the struct
field_name: Name of the field to access
Returns:
A pointer to the field
"""
idx = self.field_idx(field_name) idx = self.field_idx(field_name)
return builder.gep( return builder.gep(
ptr, ptr,
@ -22,6 +75,18 @@ class StructType:
) )
def field_size(self, field_name): def field_size(self, field_name):
"""
Calculate the size of a field in bytes.
Args:
field_name: The name of the field
Returns:
The size of the field in bytes
Raises:
TypeError: If the field type is not supported
"""
fld = self.fields[field_name] fld = self.fields[field_name]
if isinstance(fld, ir.ArrayType): if isinstance(fld, ir.ArrayType):
return fld.count * (fld.element.width // 8) return fld.count * (fld.element.width // 8)

View File

@ -1,3 +1,10 @@
"""
BPF struct processing and LLVM IR type generation.
This module handles the processing of Python classes decorated with @struct,
converting them to LLVM IR struct types for use in BPF programs.
"""
import ast import ast
import logging import logging
from llvmlite import ir from llvmlite import ir
@ -26,6 +33,15 @@ def structs_proc(tree, module, chunks):
def is_bpf_struct(cls_node): def is_bpf_struct(cls_node):
"""
Check if a class node is decorated with @struct.
Args:
cls_node: The AST class node to check
Returns:
True if the class is decorated with @struct, False otherwise
"""
return any( return any(
isinstance(decorator, ast.Name) and decorator.id == "struct" isinstance(decorator, ast.Name) and decorator.id == "struct"
for decorator in cls_node.decorator_list for decorator in cls_node.decorator_list
@ -33,7 +49,16 @@ def is_bpf_struct(cls_node):
def process_bpf_struct(cls_node, module): def process_bpf_struct(cls_node, module):
"""Process a single BPF struct definition""" """
Process a single BPF struct definition and create its LLVM IR representation.
Args:
cls_node: The AST class node representing the struct
module: The LLVM IR module (not used in current implementation)
Returns:
A StructType object containing the struct's type information
"""
fields = parse_struct_fields(cls_node) fields = parse_struct_fields(cls_node)
field_types = list(fields.values()) field_types = list(fields.values())
@ -44,7 +69,18 @@ def process_bpf_struct(cls_node, module):
def parse_struct_fields(cls_node): def parse_struct_fields(cls_node):
"""Parse fields of a struct class node""" """
Parse fields of a struct class node.
Args:
cls_node: The AST class node representing the struct
Returns:
A dictionary mapping field names to their LLVM IR types
Raises:
TypeError: If a field has an unsupported type annotation
"""
fields = {} fields = {}
for item in cls_node.body: for item in cls_node.body:
@ -57,7 +93,18 @@ def parse_struct_fields(cls_node):
def get_type_from_ann(annotation): def get_type_from_ann(annotation):
"""Convert an AST annotation node to an LLVM IR type for struct fields""" """
Convert an AST annotation node to an LLVM IR type for struct fields.
Args:
annotation: The AST annotation node (e.g., c_int64, str(32))
Returns:
The corresponding LLVM IR type
Raises:
TypeError: If the annotation type is not supported
"""
if isinstance(annotation, ast.Call) and isinstance(annotation.func, ast.Name): if isinstance(annotation, ast.Call) and isinstance(annotation.func, ast.Name):
if annotation.func.id == "str": if annotation.func.id == "str":
# Char array # Char array
@ -72,7 +119,15 @@ def get_type_from_ann(annotation):
def calc_struct_size(field_types): def calc_struct_size(field_types):
"""Calculate total size of the struct with alignment and padding""" """
Calculate total size of the struct with alignment and padding.
Args:
field_types: List of LLVM IR types for each field
Returns:
The total size of the struct in bytes
"""
curr_offset = 0 curr_offset = 0
for ftype in field_types: for ftype in field_types:
if isinstance(ftype, ir.IntType): if isinstance(ftype, ir.IntType):

View File

@ -1,3 +1,10 @@
"""
Type mapping from Python ctypes to LLVM IR types.
This module provides utilities to convert Python ctypes type names
to their corresponding LLVM IR representations.
"""
from llvmlite import ir from llvmlite import ir
# TODO: THIS IS NOT SUPPOSED TO MATCH STRINGS :skull: # TODO: THIS IS NOT SUPPOSED TO MATCH STRINGS :skull:
@ -19,10 +26,31 @@ mapping = {
def ctypes_to_ir(ctype: str): def ctypes_to_ir(ctype: str):
"""
Convert a ctypes type name to its corresponding LLVM IR type.
Args:
ctype: String name of the ctypes type (e.g., 'c_int64', 'c_void_p')
Returns:
The corresponding LLVM IR type
Raises:
NotImplementedError: If the ctype is not supported
"""
if ctype in mapping: if ctype in mapping:
return mapping[ctype] return mapping[ctype]
raise NotImplementedError(f"No mapping for {ctype}") raise NotImplementedError(f"No mapping for {ctype}")
def is_ctypes(ctype: str) -> bool: def is_ctypes(ctype: str) -> bool:
"""
Check if a given type name is a supported ctypes type.
Args:
ctype: String name of the type to check
Returns:
True if the type is a supported ctypes type, False otherwise
"""
return ctype in mapping return ctype in mapping

View File

@ -0,0 +1,27 @@
// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <linux/types.h>
struct test_struct {
__u64 a;
__u64 b;
};
struct test_struct w = {};
volatile __u64 prev_time = 0;
SEC("tracepoint/syscalls/sys_enter_execve")
int trace_execve(void *ctx)
{
bpf_printk("previous %ul now %ul", w.b, w.a);
__u64 ts = bpf_ktime_get_ns();
bpf_printk("prev %ul now %ul", prev_time, ts);
w.a = ts;
w.b = prev_time;
prev_time = ts;
return 0;
}
char LICENSE[] SEC("license") = "GPL";

19
tests/c-form/kprobe.bpf.c Normal file
View File

@ -0,0 +1,19 @@
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
char LICENSE[] SEC("license") = "Dual BSD/GPL";
SEC("kprobe/do_unlinkat")
int kprobe_execve(struct pt_regs *ctx)
{
bpf_printk("unlinkat created");
return 0;
}
SEC("kretprobe/do_unlinkat")
int kretprobe_execve(struct pt_regs *ctx)
{
bpf_printk("unlinkat returned\n");
return 0;
}

View File

@ -0,0 +1,18 @@
from pythonbpf import bpf, section, bpfglobal, compile
from ctypes import c_void_p, c_int64
@bpf
@section("tracepoint/syscalls/sys_enter_execve")
def hello_world(ctx: c_void_p) -> c_int64:
print("Hello, World!") if True else print("Goodbye, World!")
return
@bpf
@bpfglobal
def LICENSE() -> str:
return "GPL"
compile()

View File

@ -0,0 +1,101 @@
import logging
from pythonbpf import compile, bpf, section, bpfglobal, compile_to_ir
from ctypes import c_void_p, c_int64, c_int32
@bpf
@bpfglobal
def somevalue() -> c_int32:
return c_int32(42)
@bpf
@bpfglobal
def somevalue2() -> c_int64:
return c_int64(69)
@bpf
@bpfglobal
def somevalue1() -> c_int32:
return c_int32(42)
# --- Passing examples ---
# Simple constant return
@bpf
@bpfglobal
def g1() -> c_int64:
return c_int64(42)
# Constructor with one constant argument
@bpf
@bpfglobal
def g2() -> c_int64:
return c_int64(69)
# --- Failing examples ---
# No return annotation
# @bpf
# @bpfglobal
# def g3():
# return 42
# Return annotation is complex
# @bpf
# @bpfglobal
# def g4() -> List[int]:
# return []
# # Return is missing
# @bpf
# @bpfglobal
# def g5() -> c_int64:
# pass
# # Return is a variable reference
# #TODO: maybe fix this sometime later. It defaults to 0
# CONST = 5
# @bpf
# @bpfglobal
# def g6() -> c_int64:
# return c_int64(CONST)
# Constructor with multiple args
#TODO: this is not working. should it work ?
@bpf
@bpfglobal
def g7() -> c_int64:
return c_int64(1)
# Dataclass call
#TODO: fails with dataclass
# @dataclass
# class Point:
# x: c_int64
# y: c_int64
# @bpf
# @bpfglobal
# def g8() -> Point:
# return Point(1, 2)
@bpf
@section("tracepoint/syscalls/sys_enter_execve")
def sometag(ctx: c_void_p) -> c_int64:
print("test")
global somevalue
somevalue = 2
print(f"{somevalue}")
return c_int64(1)
@bpf
@bpfglobal
def LICENSE() -> str:
return "GPL"
compile_to_ir("globals.py", "globals.ll", loglevel=logging.INFO)
compile()

View File

@ -0,0 +1,21 @@
import logging
from pythonbpf import compile, bpf, section, bpfglobal, compile_to_ir
from ctypes import c_void_p, c_int64
# This should not pass as somevalue is not declared at all.
@bpf
@section("tracepoint/syscalls/sys_enter_execve")
def sometag(ctx: c_void_p) -> c_int64:
print("test")
print(f"{somevalue}") # noqa: F821
return c_int64(1)
@bpf
@bpfglobal
def LICENSE() -> str:
return "GPL"
compile_to_ir("globals.py", "globals.ll", loglevel=logging.INFO)
compile()

380
tools/vmlinux-gen.py Executable file
View File

@ -0,0 +1,380 @@
#!/usr/bin/env python3
"""
BTF to Python ctypes Converter
Converts Linux kernel BTF (BPF Type Format) to Python ctypes definitions.
This tool automates the process of:
1. Dumping BTF from vmlinux
2. Preprocessing enum definitions
3. Processing struct kioctx to extract anonymous nested structs
4. Running C preprocessor
5. Converting to Python ctypes using clang2py
6. Post-processing the output
Requirements:
- bpftool
- clang
- ctypeslib2 (pip install ctypeslib2)
"""
import argparse
import os
import re
import subprocess
import sys
import tempfile
class BTFConverter:
def __init__(self, btf_source="/sys/kernel/btf/vmlinux", output_file="vmlinux.py",
keep_intermediate=False, verbose=False):
self.btf_source = btf_source
self.output_file = output_file
self.keep_intermediate = keep_intermediate
self.verbose = verbose
self.temp_dir = tempfile.mkdtemp() if not keep_intermediate else "."
def log(self, message):
"""Print message if verbose mode is enabled."""
if self.verbose:
print(f"[*] {message}")
def run_command(self, cmd, description):
"""Run a shell command and handle errors."""
self.log(f"{description}...")
try:
result = subprocess.run(
cmd,
shell=True,
check=True,
capture_output=True,
text=True
)
if self.verbose and result.stdout:
print(result.stdout)
return result
except subprocess.CalledProcessError as e:
print(f"Error during {description}:", file=sys.stderr)
print(e.stderr, file=sys.stderr)
sys.exit(1)
def step1_dump_btf(self):
"""Step 1: Dump BTF from vmlinux."""
vmlinux_h = os.path.join(self.temp_dir, "vmlinux.h")
cmd = f"bpftool btf dump file {self.btf_source} format c > {vmlinux_h}"
self.run_command(cmd, "Dumping BTF from vmlinux")
return vmlinux_h
def step2_preprocess_enums(self, input_file):
"""Step 1.5: Preprocess enum definitions."""
self.log("Preprocessing enum definitions...")
with open(input_file, 'r') as f:
original_code = f.read()
# Extract anonymous enums
enums = re.findall(
r'(?<!typedef\s)(enum\s*\{[^}]*\})\s*(\w+)\s*(?::\s*\d+)?\s*;',
original_code
)
enum_defs = [enum_block + ';' for enum_block, _ in enums]
# Replace anonymous enums with int declarations
processed_code = re.sub(
r'(?<!typedef\s)enum\s*\{[^}]*\}\s*(\w+)\s*(?::\s*\d+)?\s*;',
r'int \1;',
original_code
)
# Prepend enum definitions
if enum_defs:
enum_text = '\n'.join(enum_defs) + '\n\n'
processed_code = enum_text + processed_code
output_file = os.path.join(self.temp_dir, "vmlinux_processed.h")
with open(output_file, 'w') as f:
f.write(processed_code)
return output_file
def step2_5_process_kioctx(self, input_file):
#TODO: this is a very bad bug and design decision. A single struct has an issue mostly.
"""Step 2.5: Process struct kioctx to extract nested anonymous structs."""
self.log("Processing struct kioctx nested structs...")
with open(input_file, 'r') as f:
content = f.read()
# Pattern to match struct kioctx with its full body (handles multiple nesting levels)
kioctx_pattern = r'struct\s+kioctx\s*\{(?:[^{}]|\{(?:[^{}]|\{[^{}]*\})*\})*\}\s*;'
def process_kioctx_replacement(match):
full_struct = match.group(0)
self.log(f"Found struct kioctx, length: {len(full_struct)} chars")
# Extract the struct body (everything between outermost { and })
body_match = re.search(r'struct\s+kioctx\s*\{(.*)\}\s*;', full_struct, re.DOTALL)
if not body_match:
return full_struct
body = body_match.group(1)
# Find all anonymous structs within the body
# Pattern: struct { ... } followed by ; (not a member name)
anon_struct_pattern = r'struct\s*\{[^}]*\}'
anon_structs = []
anon_counter = 4 # Start from 4, counting down to 1
def replace_anonymous_struct(m):
nonlocal anon_counter
anon_struct_content = m.group(0)
# Extract the body of the anonymous struct
anon_body_match = re.search(r'struct\s*\{(.*)\}', anon_struct_content, re.DOTALL)
if not anon_body_match:
return anon_struct_content
anon_body = anon_body_match.group(1)
# Create the named struct definition
anon_name = f"__anon{anon_counter}"
member_name = f"a{anon_counter}"
# Store the struct definition
anon_structs.append(f"struct {anon_name} {{{anon_body}}};")
anon_counter -= 1
# Return the member declaration
return f"struct {anon_name} {member_name}"
# Process the body, finding and replacing anonymous structs
# We need to be careful to only match anonymous structs followed by ;
processed_body = body
# Find all occurrences and process them
pattern_with_semicolon = r'struct\s*\{([^}]*)\}\s*;'
matches = list(re.finditer(pattern_with_semicolon, body, re.DOTALL))
if not matches:
self.log("No anonymous structs found in kioctx")
return full_struct
self.log(f"Found {len(matches)} anonymous struct(s)")
# Process in reverse order to maintain string positions
for match in reversed(matches):
anon_struct_content = match.group(1)
start_pos = match.start()
end_pos = match.end()
# Create the named struct definition
anon_name = f"__anon{anon_counter}"
member_name = f"a{anon_counter}"
# Store the struct definition
anon_structs.insert(0, f"struct {anon_name} {{{anon_struct_content}}};")
# Replace in the body
replacement = f"struct {anon_name} {member_name};"
processed_body = processed_body[:start_pos] + replacement + processed_body[end_pos:]
anon_counter -= 1
# Rebuild the complete definition
if anon_structs:
# Prepend the anonymous struct definitions
anon_definitions = '\n'.join(anon_structs) + '\n\n'
new_struct = f"struct kioctx {{{processed_body}}};"
return anon_definitions + new_struct
else:
return full_struct
# Apply the transformation
processed_content = re.sub(
kioctx_pattern,
process_kioctx_replacement,
content,
flags=re.DOTALL
)
output_file = os.path.join(self.temp_dir, "vmlinux_kioctx_processed.h")
with open(output_file, 'w') as f:
f.write(processed_content)
self.log(f"Saved kioctx-processed output to {output_file}")
return output_file
def step3_run_preprocessor(self, input_file):
"""Step 2: Run C preprocessor."""
output_file = os.path.join(self.temp_dir, "vmlinux.i")
cmd = f"clang -E {input_file} > {output_file}"
self.run_command(cmd, "Running C preprocessor")
return output_file
def step4_convert_to_ctypes(self, input_file):
"""Step 3: Convert to Python ctypes using clang2py."""
output_file = os.path.join(self.temp_dir, "vmlinux_raw.py")
cmd = (
f"clang2py {input_file} -o {output_file} "
f"--clang-args=\"-fno-ms-extensions -I/usr/include -I/usr/include/linux\""
)
self.run_command(cmd, "Converting to Python ctypes")
return output_file
def step5_postprocess(self, input_file):
"""Step 4: Post-process the generated Python file."""
self.log("Post-processing Python ctypes definitions...")
with open(input_file, "r") as f:
data = f.read()
# Remove lines like ('_45', ctypes.c_int64, 0)
data = re.sub(r"\('_[0-9]+',\s*ctypes\.[a-zA-Z0-9_]+,\s*0\),?\s*\n?", "", data)
# Replace ('_20', ctypes.c_uint64, 64) → ('_20', ctypes.c_uint64)
data = re.sub(r"\('(_[0-9]+)',\s*(ctypes\.[a-zA-Z0-9_]+),\s*[0-9]+\)", r"('\1', \2)", data)
# Replace ('_20', ctypes.c_char, 8) with ('_20', ctypes.c_uint8, 8)
data = re.sub(
r"(ctypes\.c_char)(\s*,\s*\d+\))",
r"ctypes.c_uint8\2",
data
)
# below to replace those c_bool with bitfield greater than 8
def repl(m):
name, bits = m.groups()
return f"('{name}', ctypes.c_uint32, {bits})" if int(bits) > 8 else m.group(0)
data = re.sub(
r"\('([^']+)',\s*ctypes\.c_bool,\s*(\d+)\)",
repl,
data
)
# Remove ctypes. prefix from invalid entries
invalid_ctypes = ["bpf_iter_state", "_cache_type", "fs_context_purpose"]
for name in invalid_ctypes:
data = re.sub(rf"\bctypes\.{name}\b", name, data)
with open(self.output_file, "w") as f:
f.write(data)
self.log(f"Saved final output to {self.output_file}")
def cleanup(self):
"""Remove temporary files if not keeping them."""
if not self.keep_intermediate and self.temp_dir != ".":
self.log(f"Cleaning up temporary directory: {self.temp_dir}")
import shutil
shutil.rmtree(self.temp_dir, ignore_errors=True)
def convert(self):
"""Run the complete conversion pipeline."""
try:
self.log("Starting BTF to Python ctypes conversion...")
# Check dependencies
self.check_dependencies()
# Run conversion pipeline
vmlinux_h = self.step1_dump_btf()
vmlinux_processed_h = self.step2_preprocess_enums(vmlinux_h)
vmlinux_kioctx_h = self.step2_5_process_kioctx(vmlinux_processed_h)
vmlinux_i = self.step3_run_preprocessor(vmlinux_kioctx_h)
vmlinux_raw_py = self.step4_convert_to_ctypes(vmlinux_i)
self.step5_postprocess(vmlinux_raw_py)
print(f"\n✓ Conversion complete! Output saved to: {self.output_file}")
except Exception as e:
print(f"\n✗ Error during conversion: {e}", file=sys.stderr)
import traceback
traceback.print_exc()
sys.exit(1)
finally:
self.cleanup()
def check_dependencies(self):
"""Check if required tools are available."""
self.log("Checking dependencies...")
dependencies = {
"bpftool": "bpftool --version",
"clang": "clang --version",
"clang2py": "clang2py --version"
}
missing = []
for tool, cmd in dependencies.items():
try:
subprocess.run(
cmd,
shell=True,
check=True,
capture_output=True
)
except subprocess.CalledProcessError:
missing.append(tool)
if missing:
print("Error: Missing required dependencies:", file=sys.stderr)
for tool in missing:
print(f" - {tool}", file=sys.stderr)
if "clang2py" in missing:
print("\nInstall ctypeslib2: pip install ctypeslib2", file=sys.stderr)
sys.exit(1)
def main():
parser = argparse.ArgumentParser(
description="Convert Linux kernel BTF to Python ctypes definitions",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
%(prog)s
%(prog)s -o kernel_types.py
%(prog)s --btf-source /sys/kernel/btf/custom_module -k -v
"""
)
parser.add_argument(
"--btf-source",
default="/sys/kernel/btf/vmlinux",
help="Path to BTF source (default: /sys/kernel/btf/vmlinux)"
)
parser.add_argument(
"-o", "--output",
default="vmlinux.py",
help="Output Python file (default: vmlinux.py)"
)
parser.add_argument(
"-k", "--keep-intermediate",
action="store_true",
help="Keep intermediate files (vmlinux.h, vmlinux_processed.h, etc.)"
)
parser.add_argument(
"-v", "--verbose",
action="store_true",
help="Enable verbose output"
)
args = parser.parse_args()
converter = BTFConverter(
btf_source=args.btf_source,
output_file=args.output,
keep_intermediate=args.keep_intermediate,
verbose=args.verbose
)
converter.convert()
if __name__ == "__main__":
main()