Resources

Reinforcement Learning (RL) is an exciting field of machine learning where agents learn to make decisions by interacting with an environment and receiving feedback. If you’re interested in learning RL coding, here’s a structured path to get started.

Q-Learning: A Great Starting Point

Q-Learning is one of the most accessible RL algorithms for beginners. It’s a model-free, value-based method that learns an action-value function (Q-function) representing the expected utility of taking a specific action in a given state. It is an off-policy algorithm: it learns the value of the greedy policy regardless of how the actions were actually chosen, so it can be trained either from live interaction with the environment or by replaying previously recorded experience.
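
The replay idea can be sketched in a few lines. Since each update only needs a (state, action, reward, next state) tuple, a Q-table can in principle be fitted from a fixed log of transitions. The two-state problem, the `recorded` list, and the `replay_update` helper below are all hypothetical and exist only for illustration.

```python
import numpy as np

def replay_update(Q, transitions, alpha=0.1, gamma=0.99, sweeps=200):
    """Apply the Q-learning update repeatedly over a fixed log of transitions."""
    for _ in range(sweeps):
        for state, action, reward, next_state, terminal in transitions:
            # A terminal transition has no future value to bootstrap from.
            future = 0.0 if terminal else np.max(Q[next_state])
            Q[state, action] += alpha * (reward + gamma * future - Q[state, action])
    return Q

# Hypothetical log for 2 states and 2 actions (made-up data, not from a real environment).
recorded = [
    (0, 0, 0.0, 0, False),  # action 0 in state 0: stay in state 0
    (0, 1, 0.0, 1, False),  # action 1 in state 0: move to state 1
    (1, 0, 0.0, 0, False),  # action 0 in state 1: move back to state 0
    (1, 1, 1.0, 1, True),   # action 1 in state 1: reach the goal, reward 1
]

Q = replay_update(np.zeros((2, 2)), recorded)
greedy_actions = np.argmax(Q, axis=1)  # best known action for each state
```

No environment is stepped here: the table is learned purely from the log, which is what makes replay buffers and offline datasets usable with this family of methods.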

How Q-Learning Works

  1. Q-Table Initialization:

    • Create a table with rows representing states and columns representing actions
    • Initialize all values to zero (or some arbitrary value)
  2. Learning Loop:

    • Start in an initial state
    • For each time step until reaching a terminal state:
      • Choose an action using an exploration strategy (like ε-greedy)

      • Take the action, observe the reward and next state

      • Update the Q-value using the Bellman equation:

        Q(s,a) ← Q(s,a) + α[r + γ·max_a’ Q(s’,a’) - Q(s,a)]

        where:

        • α (alpha) is the learning rate (how quickly new information overrides old)
        • γ (gamma) is the discount factor (importance of future rewards)
        • r is the immediate reward
        • s’ is the new state
        • max_a’ Q(s’,a’) is the best estimated future value, i.e. the highest Q-value over all actions a’ available in s’
  3. Exploitation vs. Exploration:

    • ε-greedy approach: with probability ε, choose a random action (explore)
    • Otherwise, choose the action with the highest Q-value (exploit)
    • Typically, ε decreases over time as the agent learns
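
The three steps above can be sketched as small helper functions. This is a minimal illustration, not a reference implementation: the function names, the exponential decay schedule, and the numbers in the worked update are all assumptions chosen for the example.

```python
import numpy as np

def epsilon_greedy(Q, state, epsilon, rng):
    """With probability epsilon pick a random action, otherwise the highest-valued one."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))  # explore
    return int(np.argmax(Q[state]))           # exploit

def decayed_epsilon(episode, start=1.0, end=0.05, decay=0.99):
    """Exponential decay with a floor (one common schedule, assumed here)."""
    return max(end, start * decay ** episode)

def q_update(Q, state, action, reward, next_state, alpha, gamma):
    """Apply the update rule once, modifying Q in place."""
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (td_target - Q[state, action])

# Worked update with made-up numbers: 3 states, 2 actions.
Q = np.zeros((3, 2))
Q[1] = [0.5, 2.0]  # pretend state 1 already has some learned values
q_update(Q, state=0, action=1, reward=1.0, next_state=1, alpha=0.1, gamma=0.9)
# target = 1.0 + 0.9 * 2.0 = 2.8, so Q[0, 1] moves from 0 to 0.1 * 2.8 = 0.28
```

Calling `decayed_epsilon(episode)` at the start of each episode and passing the result to `epsilon_greedy` gives an agent that explores heavily at first and mostly exploits later.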

Simple Python Implementation:

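import gymnasium as gym
import numpy as np

# Assumption: a Gymnasium environment with discrete states and actions.
# FrozenLake-v1 is used purely as an example; any such environment works.
env = gym.make("FrozenLake-v1")
num_states = env.observation_space.n
num_actions = env.action_space.n

# Initialize Q-table: one row per state, one column per action
Q = np.zeros((num_states, num_actions))

# Hyperparameters
alpha = 0.1  # Learning rate
gamma = 0.99  # Discount factor
epsilon = 0.1  # Exploration rate

# Q-learning algorithm
def q_learning(num_episodes):
    for _ in range(num_episodes):
        state, _ = env.reset()  # Gymnasium returns (observation, info)
        done = False

        while not done:
            # Choose action using epsilon-greedy
            if np.random.random() < epsilon:
                action = env.action_space.sample()  # Random action
            else:
                action = int(np.argmax(Q[state, :]))  # Best action

            # Take action; Gymnasium reports termination and truncation separately
            next_state, reward, terminated, truncated, _ = env.step(action)

            # Update Q-table
            Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state, :]) - Q[state, action])

            state = next_state
            done = terminated or truncated
    return Q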
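
For readers who want to run the algorithm end to end without installing an environment library, here is a minimal self-contained sketch. It assumes a hypothetical five-state corridor (states 0 to 4, goal at state 4); the `step` and `train` functions and all constants are made up for this illustration.

```python
import numpy as np

N_STATES, N_ACTIONS, GOAL = 5, 2, 4  # hypothetical corridor: states 0..4, goal at 4

def step(state, action):
    """Toy dynamics: action 0 moves left, action 1 moves right; reaching GOAL pays 1."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done

def train(num_episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(num_episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = int(rng.integers(N_ACTIONS))  # explore
            else:
                # Break ties at random so an all-zero table does not always pick action 0.
                best = np.flatnonzero(Q[state] == Q[state].max())
                action = int(rng.choice(best))
            next_state, reward, done = step(state, action)
            Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
            state = next_state
    return Q

Q = train()
policy = np.argmax(Q, axis=1)  # greedy action for each state
```

After training, `policy` should select action 1 (move right) in every non-goal state, and the values in `Q` should shrink by roughly a factor of `gamma` for each extra step away from the goal.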

Useful Resources and Websites

Online Courses

Books

Interactive Tutorials

Communities

Libraries and Frameworks

Most languages have dedicated RL libraries as well as other frameworks related to different aspects of AI. Here is a selection, grouped by the most widely used languages that support Protocol Buffers (the number after each language is its rank in the 2025 TIOBE Index).

| Language | DNN / Deep Learning | Classical ML | RL / Q-Tables | LLM Connectors | Agent Frameworks | Rules Engines | Data / Numeric |
|---|---|---|---|---|---|---|---|
| Python (#1) | PyTorch · TensorFlow · JAX · Keras | scikit-learn · XGBoost · LightGBM | Stable Baselines3 · Ray RLlib · Gymnasium · CleanRL | LangChain · OpenAI SDK · Anthropic SDK · LiteLLM | LangGraph · AutoGen · CrewAI · Google ADK | Durable Rules · business-rules · Experta | NumPy · Pandas · Polars |
| C++ (#2) | LibTorch · TensorFlow C++ · ONNX Runtime · Caffe2 | mlpack · dlib · Shark | RLtools · AI-Toolbox · RLLib · relearn | llama.cpp · whisper.cpp · CTranslate2 | BehaviorTree.CPP (limited higher-level agent tooling) | CLIPS · UE Rules | Eigen · Armadillo · OpenCV |
| Java (#3) | Deeplearning4j · DJL · ONNX Runtime Java | Weka · Tribuo · Smile · Spark MLlib | RL4J · Burlap | LangChain4j · Spring AI · Semantic Kernel | LangGraph4j · Kalix Agents · Google ADK Java | Drools · Easy Rules · OpenL Tablets | ND4J · Tablesaw · Apache Commons Math |
| C# (#5) | TorchSharp · ONNX Runtime .NET · TensorFlow.NET | ML.NET · Accord.NET | Limited native options; build custom with ML.NET | Semantic Kernel · LLM Tornado · LLamaSharp · Microsoft.Extensions.AI | MS Agent Framework · AutoGen.NET | NRules · RulesEngine | Math.NET · NumSharp |
| JavaScript / TypeScript (#6/#8) | TensorFlow.js · ONNX Runtime Web · Brain.js | ml.js · Danfo.js | REINFORCEjs (niche ecosystem) | LangChain.js · Vercel AI SDK · OpenAI Node · Anthropic Node | LangGraph.js · Mastra · MCP SDK | json-rules-engine · Nools | mathjs · Arquero |
| Go (#7) | Gorgonia · GoMLX · ONNX Runtime Go | GoLearn · Gonum | Very limited: gold | LangChainGo · Eino · Ollama API · GenKit Go | ADK-Go · go-llm · MCP Go-SDK | Grule · GoRules | Gonum · go-dataframe |
| Kotlin (#13) | KotlinDL · DJL · DL4J (via JVM) | Smile (via Kotlin DSL) · Tribuo (via JVM) | Via Java: RL4J · Burlap | LangChain4j · Spring AI | JB AI Agent · Google ADK (Kotlin support) | Drools (via JVM) | KMath · Multik · Krangl |
| Rust (#14) | Burn · Candle · tch-rs · ort | Linfa · SmartCore · tract | REnforce (emerging ecosystem) | genai · ollama-rs · async-openai | Rig · MCP SDK Rust | No major native option | ndarray · Polars · nalgebra |

Notes