The Agent - Institutional Trading System.

Overview: An autonomous Reinforcement Learning (PPO) trading agent designed to operate profitably in "extreme survival" conditions: low capital, high spreads, and high commissions. Built with Python, PyTorch, and the MetaTrader 5 API.

1. The Challenge: The Mathematical Trap

The system is built to survive on a 100 USD micro-account trading EURUSD. In this environment, the broker's spread and commission consume up to 80% of the first pip of profit.
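The cost arithmetic can be made concrete with a short sketch. The spread (0.1 pip) and round-trip commission (7 USD per standard lot) below are hypothetical inputs chosen for illustration, not the broker's actual terms. The one fixed fact used is that a EURUSD pip (0.0001) is worth 10 USD per standard lot of 100,000 units, which is 0.10 USD on a 0.01-lot micro position.

```python
PIP_SIZE = 0.0001        # one pip on EURUSD
CONTRACT_SIZE = 100_000  # units in one standard lot


def pip_value_usd(lots: float) -> float:
    """USD value of a one-pip move for a USD-quoted pair such as EURUSD."""
    return PIP_SIZE * CONTRACT_SIZE * lots


def round_trip_cost_pips(spread_pips: float, commission_usd_per_lot: float) -> float:
    """Spread plus round-trip commission, expressed in pips.

    Commission and pip value both scale linearly with lot size, so the
    result is the same for a micro lot as for a standard lot.
    """
    return spread_pips + commission_usd_per_lot / pip_value_usd(1.0)


# Hypothetical inputs: 0.1-pip spread, 7 USD round-trip commission per standard lot
cost = round_trip_cost_pips(spread_pips=0.1, commission_usd_per_lot=7.0)
print(f"{cost:.2f} pips = {cost * pip_value_usd(0.01):.3f} USD on a 0.01 lot")
# → 0.80 pips = 0.080 USD on a 0.01 lot
```

With these illustrative numbers a trade must move 0.8 pip in its favor just to break even, which is the "80% of the first pip" situation described above.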

Action Masking: To prevent the AI from burning capital, I implemented a deterministic pre-model filter. If the real-time spread exceeds a set threshold, the Buy and Sell actions are masked out, forcing the agent into a "HOLD" state. The agent doesn't just learn to avoid bad trades; it is mathematically prohibited from executing them.
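A minimal sketch of spread-based action masking, written in plain Python so it runs without PyTorch. The action indices (HOLD=0, BUY=1, SELL=2), the helper names, and the `max_spread_pips` threshold are hypothetical, since the text does not specify them. Masked actions have their logits replaced by negative infinity, so the softmax assigns them exactly zero probability regardless of what the network outputs.

```python
import math
from typing import List, Sequence

# Hypothetical action layout; the agent's real indices are not given in the text.
HOLD, BUY, SELL = 0, 1, 2


def action_mask(spread_pips: float, max_spread_pips: float) -> List[bool]:
    """True means the action is allowed. Buy/Sell are disabled when the live spread is too wide."""
    trading_allowed = spread_pips <= max_spread_pips
    return [True, trading_allowed, trading_allowed]  # HOLD is always allowed


def masked_softmax(logits: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Softmax over allowed actions only; masked actions get exactly zero probability."""
    masked = [x if ok else -math.inf for x, ok in zip(logits, mask)]
    m = max(masked)  # finite because HOLD is never masked
    exps = [math.exp(x - m) for x in masked]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


logits = [0.1, 2.5, -1.0]  # the network strongly prefers BUY
print(masked_softmax(logits, action_mask(spread_pips=2.0, max_spread_pips=1.2)))
# → [1.0, 0.0, 0.0]
```

In a PyTorch policy the same idea is usually applied by filling the masked logits with a large negative value (or `-inf`) before constructing `torch.distributions.Categorical`, which keeps PPO's log-probabilities consistent with the actions that can actually be sampled.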

2. Architecture & Hardware Optimization

The training pipeline is built to make full use of a Ryzen 7 5800X CPU for parallel processing (vectorized environments) and an NVIDIA RTX 3080 Ti GPU for accelerated training.

Automatic Mixed Precision (AMP): Uses torch.cuda.amp to reduce VRAM usage and speed up Transformer training.
Hybrid Backbone (The Alpha Generator): Uses Temporal Convolutional Networks (TCN) with Dilated Convolutions for feature extraction, followed by a Transformer Encoder Block (Multi-Head Attention) to capture long-term sequence dependencies.
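A plain-Python sketch of the dilated causal convolution at the core of a TCN layer, plus the receptive-field arithmetic that motivates dilation. The kernel size and the dilation schedule (1, 2, 4, 8) are hypothetical; the text does not give the model's actual configuration. Each output depends only on the current and past inputs, and doubling the dilation per layer grows the receptive field exponentially with depth.

```python
from typing import List, Sequence


def causal_dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """y[t] = sum_i kernel[i] * x[t - i * dilation], zero-padded on the left.

    kernel[0] weights the current step; no output ever reads a future input.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            j = t - i * dilation
            if j >= 0:
                acc += w * x[j]
        y.append(acc)
    return y


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Past time steps visible to one output of a stack with one convolution per dilation level."""
    return 1 + (kernel_size - 1) * sum(dilations)


print(receptive_field(3, [1, 2, 4, 8]))  # → 31
impulse = [1.0] + [0.0] * 9
print(causal_dilated_conv1d(impulse, [1.0, 1.0], dilation=4))
# → [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Standard TCN residual blocks often stack two convolutions per dilation level, which doubles the (kernel_size - 1) term in the receptive-field formula.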

3. Key Innovations

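Fractional Differentiation: Instead of integer differencing (e.g., converting prices to returns), which makes a series stationary but erases most of its historical memory, the system uses López de Prado's fractional differentiation to achieve stationarity while retaining as much market memory as possible.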
Multi-Timeframe Perception (Tensor Fusion): The agent processes three simultaneous tensors: M1 (noise/execution), M5 (tactics), and M15 (macro strategy).
Safety Layer (Monte Carlo Dropout): During live inference, the network runs multiple forward passes with dropout kept active and measures the variance of the outputs. If that uncertainty exceeds a threshold, the signal is classified as noise and the action defaults to HOLD.
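A sketch of fixed-width-window fractional differentiation as López de Prado describes it: the weights of (1 - B)^d follow the recursion w_0 = 1, w_k = -w_{k-1} (d - k + 1) / k, and the window is truncated once |w_k| drops below a threshold. The threshold default (1e-4) and the function names are assumptions. With d = 1 the weights reduce to [1, -1], i.e. ordinary first differences; with 0 < d < 1 they decay slowly, which is what preserves long memory.

```python
from typing import List, Sequence


def frac_diff_weights(d: float, threshold: float = 1e-4, max_len: int = 10_000) -> List[float]:
    """Weights of (1 - B)^d, cut off once |w_k| < threshold (fixed-width window)."""
    w = [1.0]
    for k in range(1, max_len):
        w_k = -w[-1] * (d - k + 1) / k
        if abs(w_k) < threshold:
            break
        w.append(w_k)
    return w


def frac_diff(series: Sequence[float], d: float, threshold: float = 1e-4) -> List[float]:
    """x~_t = sum_k w_k * x_{t-k}; the first len(w) - 1 points lack a full window and are dropped."""
    w = frac_diff_weights(d, threshold)
    width = len(w)
    return [
        sum(w[k] * series[t - k] for k in range(width))
        for t in range(width - 1, len(series))
    ]


print(frac_diff_weights(0.5)[:3])  # → [1.0, -0.5, -0.125]
print(frac_diff([1.0, 3.0, 6.0, 10.0], d=1.0))  # → [2.0, 3.0, 4.0]
```

In practice d is usually chosen as the smallest value whose output passes a stationarity test such as ADF; how this system picks d is not stated here.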
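A simplified sketch of building the three per-timeframe inputs from a single stream of M1 closes. It assumes the M1 series starts on a bar boundary and has no gaps, uses closes only, and drops any unfinished trailing bar so no window contains an incomplete candle; the function names and window length are hypothetical. The real pipeline downloads each timeframe from MetaTrader 5 directly, so this only illustrates the shape of the fused observation, not the data source.

```python
from typing import Dict, List, Sequence

TIMEFRAME_FACTORS = {"M1": 1, "M5": 5, "M15": 15}  # M1 bars per higher-timeframe bar


def resample_closes(m1_closes: Sequence[float], factor: int) -> List[float]:
    """Close of each completed bar (factor=5 -> M5, 15 -> M15); a partial last bar is dropped."""
    n_complete = len(m1_closes) // factor
    return [m1_closes[(i + 1) * factor - 1] for i in range(n_complete)]


def build_observation(m1_closes: Sequence[float], window: int) -> Dict[str, List[float]]:
    """One fixed-length lookback window per timeframe."""
    obs: Dict[str, List[float]] = {}
    for name, factor in TIMEFRAME_FACTORS.items():
        closes = resample_closes(m1_closes, factor)
        if len(closes) < window:
            raise ValueError(f"not enough history to build a {window}-bar {name} window")
        obs[name] = closes[-window:]
    return obs


obs = build_observation([float(i) for i in range(60)], window=4)
print(obs)
# → {'M1': [56.0, 57.0, 58.0, 59.0], 'M5': [44.0, 49.0, 54.0, 59.0], 'M15': [14.0, 29.0, 44.0, 59.0]}
```

Each returned list would become one of the three input tensors (M1, M5, M15) described above.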
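A toy sketch of the Monte Carlo Dropout gate. A small linear scorer stands in for the real network, and `p`, `n_passes`, `max_std`, and the sign convention (positive score means BUY) are hypothetical. The point is the control flow: keep dropout active, run several stochastic passes, and fall back to HOLD when the spread of the outputs exceeds a threshold.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def dropout_forward(x: Sequence[float], weights: Sequence[float], p: float, rng: random.Random) -> float:
    """Toy linear scorer with inverted dropout on its inputs, left active at inference.

    Each input is kept with probability 1 - p and rescaled by 1 / (1 - p).
    """
    keep = 1.0 - p
    return sum(w * xi / keep for w, xi in zip(weights, x) if rng.random() >= p)


def mc_dropout_decision(
    x: Sequence[float],
    weights: Sequence[float],
    p: float = 0.2,
    n_passes: int = 50,
    max_std: float = 0.5,
    seed: int = 0,
) -> Tuple[str, float, float]:
    """Run n_passes stochastic passes; return HOLD if they disagree too much."""
    rng = random.Random(seed)
    outputs: List[float] = [dropout_forward(x, weights, p, rng) for _ in range(n_passes)]
    mean = statistics.mean(outputs)
    std = statistics.pstdev(outputs)
    if std > max_std:
        return "HOLD", mean, std  # too uncertain: treat the signal as noise
    return ("BUY" if mean > 0 else "SELL"), mean, std


print(mc_dropout_decision([1.0, 1.0], [0.5, 0.5], p=0.0))  # → ('BUY', 1.0, 0.0)
print(mc_dropout_decision([10.0, -10.0], [1.0, 1.0], p=0.5)[0])  # → HOLD
```

In PyTorch, one way to get this behavior is to call `.train()` only on the `nn.Dropout` modules of a model that is otherwise in `.eval()` mode, so layers such as batch normalization keep their inference behavior.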

4. Reward Function (Sniper Design)

The PPO reward function applies strict penalties to discourage overtrading:

Immediate penalty for spread and commission upon opening a trade.
Time Decay penalty if a position stays open too long without reaching Break-Even.
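Volatility penalty based on the dispersion of recent returns (implicit Sortino Ratio; strictly, the Sortino Ratio penalizes only downside deviation, while the full standard deviation gives a Sharpe-style penalty).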
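A sketch of a per-step reward combining the three penalties. Every coefficient, the break-even test (net position P&L below zero), and the parameter names are hypothetical. For the volatility term it uses downside deviation (the root mean square of the negative part of returns), which is the quantity the Sortino Ratio penalizes; replacing it with `statistics.pstdev(recent_returns)` gives the standard-deviation version.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RewardConfig:
    """All coefficients are hypothetical placeholders."""
    entry_cost_pips: float = 0.8       # spread + commission charged when a trade opens
    max_steps_before_decay: int = 30   # bars allowed before the time-decay penalty starts
    time_decay_per_step: float = 0.01  # penalty per extra bar spent below break-even
    vol_coef: float = 0.1              # weight of the downside-volatility penalty


def downside_deviation(returns: Sequence[float]) -> float:
    """Sortino-style risk: RMS of the negative part of returns (target return 0)."""
    if not returns:
        return 0.0
    return math.sqrt(sum(min(r, 0.0) ** 2 for r in returns) / len(returns))


def step_reward(
    pnl_change_pips: float,
    opened_trade: bool,
    steps_in_position: int,
    position_pnl_pips: float,
    recent_returns: Sequence[float],
    cfg: Optional[RewardConfig] = None,
) -> float:
    if cfg is None:
        cfg = RewardConfig()
    reward = pnl_change_pips
    if opened_trade:
        reward -= cfg.entry_cost_pips  # immediate cost penalty on entry
    overtime = steps_in_position - cfg.max_steps_before_decay
    if overtime > 0 and position_pnl_pips < 0.0:
        reward -= cfg.time_decay_per_step * overtime  # grows while the trade lingers below break-even
    reward -= cfg.vol_coef * downside_deviation(recent_returns)
    return reward
```

For example, opening a trade with no price change yields `step_reward(0.0, True, 0, 0.0, [])` of about -0.8, so the agent starts every trade in the red and only profitable, decisive entries pay off.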

5. System Structure

Here is a sample of the MetaTrader 5 data layer, which downloads OHLCV bars for a symbol (EURUSD by default) across several timeframes:

import MetaTrader5 as mt5
import pandas as pd

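class MT5DataLoader:
    def __init__(self, symbol="EURUSD", login=None, password=None, server=None):
        self.symbol = symbol
        # Initialize the connection; credentials are optional if the terminal is already logged in
        credentials = {
            key: value
            for key, value in (('login', login), ('password', password), ('server', server))
            if value is not None
        }
        if not mt5.initialize(**credentials):
            # Raise instead of quit() so callers can handle the failure
            raise RuntimeError(
                f"Failed to initialize MT5 ({mt5.last_error()}). "
                "Make sure the terminal is open or pass login/password/server."
            )
        print(f"✅ Connected to MT5. Fetching data for: {symbol}")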

    def get_data(self, timeframe, n_bars=10000):
        """Downloads raw OHLCV bars from MT5, indexed by bar open time."""
        rates = mt5.copy_rates_from_pos(self.symbol, timeframe, 0, n_bars)
        if rates is None or len(rates) == 0:
            print(f"❌ Error downloading data for timeframe {timeframe}: {mt5.last_error()}")
            return None

        df = pd.DataFrame(rates)
        df['time'] = pd.to_datetime(df['time'], unit='s')
        df.set_index('time', inplace=True)
        return df[['open', 'high', 'low', 'close', 'tick_volume']]

    def get_multi_timeframe_data(self, n_bars=50000):
        """
        Downloads M1, M5, M15, M30 and H1 bars and aligns them on the M1 index.
        M1 is the base; each higher-timeframe bar is forward-filled (ffill) onto
        the M1 bars that open after it has closed.
        """
        timeframes = {
            'M1': mt5.TIMEFRAME_M1,
            'M5': mt5.TIMEFRAME_M5,
            'M15': mt5.TIMEFRAME_M15,
            'M30': mt5.TIMEFRAME_M30,
            'H1': mt5.TIMEFRAME_H1
        }
        # Bar length in minutes, used to re-stamp each bar at its close time
        bar_minutes = {'M5': 5, 'M15': 15, 'M30': 30, 'H1': 60}

        data_dict = {}
        print("⏳ Downloading bulk data (this may take a few seconds)...")

        # Download the M1 base
        df_m1 = self.get_data(timeframes['M1'], n_bars)
        if df_m1 is None:
            return None

        data_dict['M1'] = df_m1

        # Download and align the remaining timeframes
        for tf_name, tf_code in timeframes.items():
            if tf_name == 'M1':
                continue

            # Fewer bars are needed for larger timeframes, but enough to cover the M1 history
            df_tf = self.get_data(tf_code, n_bars=n_bars // 5)  # rough approximation
            if df_tf is None:
                return None

            # Rename columns to avoid collisions, e.g. close -> close_H1
            df_tf = df_tf.add_suffix(f'_{tf_name}')

            # MT5 stamps bars with their open time, but a bar's values are only known
            # once it closes. Re-stamp with the close time to avoid look-ahead bias.
            df_tf.index = df_tf.index + pd.Timedelta(minutes=bar_minutes[tf_name])

            # "As-of" alignment: each M1 bar receives the most recent higher-timeframe
            # bar that had already closed (forward fill onto the M1 index)
            data_dict[tf_name] = df_tf.reindex(df_m1.index, method='ffill')

        print("✅ Download complete.")
        return data_dict