python rust reinforcement-learning pyo3 algorithms

Hybrid Navigation System

A reinforcement-learning navigation R&D project pairing Python AI flexibility with a Rust performance core, benchmarking Q-learning and A* pathfinding across both languages via PyO3.

3,000

training episodes per Q-learning run

Python ↔ Rust

A* speedup benchmark via PyO3

3 layers

simulation, core & analytics

[ role ] Sole engineer · R&D prototype

Project specs

Tech Stack

Python Rust PyO3 maturin NumPy Matplotlib

R&D Focus

Reinforcement Learning & Performance Engineering

Complexity

An R&D prototype that pairs Python's flexibility for AI experimentation with Rust's execution speed. It trains a navigation agent with Q-learning, solves the same grid with A*, and benchmarks a Rust implementation of A* against its Python counterpart to quantify the speedup.

Problem

Reinforcement learning and pathfinding are easy to prototype in Python but slow at scale, while Rust is fast but cumbersome for rapid experimentation. The goal was to get the best of both: keep training and environment logic flexible in Python, push performance-critical pathfinding into Rust, and measure the difference rigorously.

Approach

  • Simulation layer (the “brain”): A GridEnv environment, a Q-learning agent trained over 3,000 episodes, and a classical A* search, all written in Python.
  • Performance core (the “engine”): A high-speed Rust implementation of A*, compiled to an importable astar_rust module via PyO3 and maturin (maturin develop --release).
  • Analytics layer (the “observer”): Logs episode rewards/steps, calculates convergence metrics, and renders training curves and benchmark comparisons.
  • Benchmarking flow: Generates a 100×100 grid, runs both the Python and Rust A* implementations, measures time and nodes expanded, and computes the speedup factor.
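
The benchmarking flow above can be sketched in plain Python. This is a minimal illustration rather than the project's code: make_grid, astar, and benchmark are hypothetical names, the wall density and the Manhattan heuristic are assumptions, and the candidate callable stands in for whatever function the astar_rust module actually exports (its API is not shown here).

```python
import heapq
import random
import time


def make_grid(size=100, wall_prob=0.2, seed=0):
    """Random square grid; True marks a wall. Start and goal corners stay open."""
    rng = random.Random(seed)
    grid = [[rng.random() < wall_prob for _ in range(size)] for _ in range(size)]
    grid[0][0] = grid[size - 1][size - 1] = False
    return grid


def astar(grid, start, goal):
    """4-connected A* with a Manhattan heuristic (assumed, unit step cost).

    Returns (path length in steps or None if unreachable, nodes expanded).
    """
    size = len(grid)

    def h(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]
    best_g = {start: 0}
    expanded = 0
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if g > best_g[cell]:
            continue  # stale heap entry, a cheaper route was found later
        expanded += 1
        if cell == goal:
            return g, expanded
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < size and 0 <= nc < size and not grid[nr][nc]:
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(open_heap, (ng + h((nr, nc)), ng, (nr, nc)))
    return None, expanded


def benchmark(baseline, candidate, grid, start, goal, repeats=5):
    """Time two implementations on the same grid and report the speedup."""

    def best_time(fn):
        times = []
        for _ in range(repeats):
            t0 = time.perf_counter()
            result = fn(grid, start, goal)
            times.append(time.perf_counter() - t0)
        return min(times), result

    t_base, res_base = best_time(baseline)
    t_cand, res_cand = best_time(candidate)
    return {
        "baseline_s": t_base,
        "candidate_s": t_cand,
        "speedup": t_base / max(t_cand, 1e-12),  # guard against a zero reading
        "baseline_result": res_base,
        "candidate_result": res_cand,
    }
```

With the extension built, the candidate argument would be the Rust-backed function. Comparing baseline_result with candidate_result before trusting the speedup helps confirm both implementations solved the same problem.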

Outcome

  • End-to-end training pipeline producing training_metrics.json (convergence summary) and training_curves.png (learning visualization).
  • A reproducible head-to-head benchmark isolating the cost of the algorithm’s host language.
  • A clean three-layer separation (simulation, Rust core, analytics) with explicit module dependencies.
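
The training side of the pipeline can be sketched as tabular Q-learning with an epsilon-greedy policy. Everything here is an assumption made for illustration: TinyGridEnv is a hypothetical stand-in for GridEnv (whose real interface is not shown), and the reward values, learning rate, discount, and linear epsilon schedule are placeholders, not the project's settings.

```python
import numpy as np


class TinyGridEnv:
    """Hypothetical stand-in for GridEnv: open N x N grid, start top-left, goal bottom-right."""

    ACTIONS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # up, down, left, right

    def __init__(self, size=5, max_steps=100):
        self.size = size
        self.max_steps = max_steps
        self.goal = (size - 1, size - 1)

    def reset(self):
        self.pos = (0, 0)
        self.steps = 0
        return self._state()

    def _state(self):
        return self.pos[0] * self.size + self.pos[1]

    def step(self, action):
        dr, dc = self.ACTIONS[action]
        # Moves into the boundary leave the agent in place.
        r = min(max(self.pos[0] + dr, 0), self.size - 1)
        c = min(max(self.pos[1] + dc, 0), self.size - 1)
        self.pos = (r, c)
        self.steps += 1
        reached = self.pos == self.goal
        reward = 10.0 if reached else -1.0  # assumed reward scheme
        done = reached or self.steps >= self.max_steps
        return self._state(), reward, done


def train(env, episodes=3000, alpha=0.1, gamma=0.95,
          eps_start=1.0, eps_end=0.05, seed=0):
    """Tabular Q-learning; returns the Q-table and per-episode logs."""
    rng = np.random.default_rng(seed)
    n_actions = len(env.ACTIONS)
    q = np.zeros((env.size * env.size, n_actions))
    log = {"reward": [], "steps": []}
    for ep in range(episodes):
        # Linear epsilon decay across the run (an assumed schedule).
        eps = eps_start + (eps_end - eps_start) * ep / max(episodes - 1, 1)
        state, done, total = env.reset(), False, 0.0
        while not done:
            if rng.random() < eps:
                action = int(rng.integers(n_actions))
            else:
                action = int(np.argmax(q[state]))
            nxt, reward, done = env.step(action)
            # Bootstrap from the next state unless the goal was reached.
            at_goal = env.pos == env.goal
            target = reward if at_goal else reward + gamma * np.max(q[nxt])
            q[state, action] += alpha * (target - q[state, action])
            state, total = nxt, total + reward
        log["reward"].append(total)
        log["steps"].append(env.steps)
    return q, log


def greedy_steps(env, q):
    """Roll out the greedy policy once and return the number of steps taken."""
    state, done = env.reset(), False
    while not done:
        state, _, done = env.step(int(np.argmax(q[state])))
    return env.steps
```

The log dictionary holds the per-episode rewards and step counts that an analytics layer would consume; greedy_steps gives a quick check that the learned policy reaches the goal.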

Q-learning analytics: reward per episode converges to ~110 and steps-to-goal drop from 100 to ~18 over 3,000 episodes.
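
Figures like the ones in this caption are typically derived from per-episode logs. The sketch below shows one way to do that with a moving average; the function names, the JSON keys, the window size, and the 5% tolerance band are all hypothetical, since the actual schema of training_metrics.json is not shown.

```python
import json

import numpy as np


def moving_average(values, window):
    """Mean over each run of `window` consecutive values."""
    values = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")


def convergence_summary(rewards, steps, window=100, tolerance=0.05):
    """Summarise per-episode logs into a few convergence metrics."""
    if len(rewards) < window:
        raise ValueError("need at least `window` episodes")
    smooth = moving_average(rewards, window)
    final = float(smooth[-1])
    # Band around the final smoothed reward that counts as "converged".
    band = tolerance * max(abs(final), 1e-9)
    outside = np.flatnonzero(np.abs(smooth - final) > band)
    # First window after which the smoothed reward never leaves the band.
    first_stable = 0 if outside.size == 0 else int(outside[-1]) + 1
    return {
        "episodes": len(rewards),
        "window": window,
        "final_mean_reward": final,
        "final_mean_steps": float(np.mean(steps[-window:])),
        # 0-based index of the last episode in that first stable window.
        "converged_at_episode": first_stable + window - 1,
    }


def write_metrics(summary, path="training_metrics.json"):
    """Write the summary as JSON (key names are illustrative)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(summary, f, indent=2)
```

A tolerance band on the smoothed curve is only one possible definition of convergence; a slope threshold or a fixed reward target would work with the same logs.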

Stack notes

Python handles the RL loop and orchestration, with NumPy, pandas, Matplotlib, and seaborn for analysis. The Rust crate (astar_rust) is a cdylib exposed to Python through PyO3's extension-module feature and built with maturin. pytest covers the shared logic.