When instantiating the Trainer, the n_workers parameter controls how many worker processes are spawned to run Agent instances in parallel.
Could the documentation (or a wiki section) provide:
- A rule of thumb or benchmark for choosing a good n_workers value (see the sketch after this list for one possible starting point).
- A short explanation of how changing this number affects the overall RL training pipeline: both the pros (higher throughput, better GPU utilisation) and the cons (memory overhead, CPU contention, and possible slowdowns if set too high).
This would help users tune parallelism without resorting to trial-and-error for every new machine or environment.
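As a concrete illustration of the kind of rule of thumb the docs could give, here is a minimal sketch of one common heuristic: start with roughly one worker per CPU core, minus a core reserved for the learner/optimizer process. The helper name `suggest_n_workers` and the commented `Trainer(..., n_workers=...)` call are hypothetical and only assume the constructor signature described above; the right value still depends on the environment's step cost and available memory, so it should be treated as a starting point for profiling, not a fixed recommendation.

```python
import os

def suggest_n_workers(reserve_for_learner: int = 1) -> int:
    """Heuristic starting point: one worker per CPU core, minus a core
    reserved for the learner/optimizer process (and OS overhead).

    This is only a rule of thumb; benchmark throughput on the target
    machine and environment before settling on a value.
    """
    n_cores = os.cpu_count() or 1
    return max(1, n_cores - reserve_for_learner)

# Hypothetical usage, assuming Trainer accepts n_workers as described above:
# trainer = Trainer(env_fn=make_env, n_workers=suggest_n_workers())
```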