Move docs to package #6232

Open · wants to merge 58 commits into develop

Commits (58)
509a4bf
Move all files from the extension package to the main package
maryamziaa Jul 22, 2025
440743f
Update the extension tests
maryamziaa Jul 22, 2025
3bce3bd
Remove the duplicate condition
maryamziaa Jul 22, 2025
1f97457
Remove the duplicate condition
maryamziaa Jul 22, 2025
6391a89
Remove duplicate PhysicsMaterial
maryamziaa Jul 22, 2025
da5d781
Remove redundant import statements
maryamziaa Jul 22, 2025
befebd3
Remove Extensions meta file
maryamziaa Jul 22, 2025
886f881
Black reformatting
maryamziaa Jul 22, 2025
113f39f
Merge remote-tracking branch 'origin/develop' into merge-extensions-i…
maryamziaa Jul 23, 2025
9d47fe9
Remove duplicate entries
maryamziaa Jul 23, 2025
6dd1559
Move Runtime Input tests to a separate assembly
maryamziaa Jul 23, 2025
bfd96c1
Move Runtime example test to Tests
maryamziaa Jul 23, 2025
fc1db44
Update assembly
maryamziaa Jul 23, 2025
6cfad25
Update CHANGELOG.md
maryamziaa Jul 23, 2025
53f2134
Update inputsystem version in test assembly
maryamziaa Jul 24, 2025
3dc5882
Undo scene setting
maryamziaa Jul 24, 2025
f8f4c9f
Update the doc
maryamziaa Jul 24, 2025
4856cf4
Update the doc
maryamziaa Jul 24, 2025
0150980
Update scene setting
maryamziaa Jul 24, 2025
a4ab995
Update scene template
maryamziaa Jul 24, 2025
9e44c39
Update assembly
maryamziaa Jul 24, 2025
827d56e
Change namespace to Unity.MLAgents.Input
maryamziaa Jul 24, 2025
15ff1bb
Remove redundant import
maryamziaa Jul 24, 2025
8ad46c7
Remove unnecessary condition
maryamziaa Jul 25, 2025
b3c9106
Update assembly files
maryamziaa Jul 25, 2025
34a5098
Another update
maryamziaa Jul 25, 2025
1ce9848
Remove empty file
maryamziaa Jul 25, 2025
80a9023
Copy web docs to package docs
maryamziaa Jul 25, 2025
0807316
Unify the package doc and web doc main pages
maryamziaa Jul 25, 2025
e66ef0b
Update image path
maryamziaa Jul 25, 2025
652e0bd
Update image path
maryamziaa Jul 25, 2025
daa3fe4
Update hyperlinks
maryamziaa Jul 25, 2025
7bde130
Add MovedFrom tags
maryamziaa Jul 29, 2025
76bd2d4
Revert extension package change
maryamziaa Jul 30, 2025
9227488
Upgrade upm-pvp
maryamziaa Jul 30, 2025
28e08b6
Merge remote-tracking branch 'origin/merge-extensions-into-ml-agents'…
maryamziaa Jul 30, 2025
b957c45
WIP- Add table of contents
maryamziaa Jul 31, 2025
47742c8
WIP- doc
maryamziaa Aug 1, 2025
bf45858
Update index
maryamziaa Aug 1, 2025
f6f23bc
Merge with develop
maryamziaa Aug 1, 2025
be965e7
Update index
maryamziaa Aug 1, 2025
7dff8d0
Update Python APIs and Advanced Features
maryamziaa Aug 4, 2025
de0d8de
Update more sections
maryamziaa Aug 4, 2025
21ed822
Update index
maryamziaa Aug 4, 2025
3486dcd
Update colab doc
maryamziaa Aug 4, 2025
1f27ea1
Update more docs
maryamziaa Aug 4, 2025
989ef98
Update API Reference docs and settings
maryamziaa Aug 4, 2025
544fd45
Update more docs
maryamziaa Aug 4, 2025
0ae2942
Update wrapper docs
maryamziaa Aug 4, 2025
6190d41
Correct reference inconsistency
maryamziaa Aug 4, 2025
fb1bb27
Convert relative path to github url
maryamziaa Aug 4, 2025
3c0286f
Update README
maryamziaa Aug 4, 2025
ab4b37d
Update reference to the old docs
maryamziaa Aug 4, 2025
80c8ee1
Remove readme migration file
maryamziaa Aug 4, 2025
76c44a8
Add doc migration notice
maryamziaa Aug 4, 2025
6952c8a
Deprecated banner
maryamziaa Aug 4, 2025
b1f7fdd
Update banner
maryamziaa Aug 5, 2025
c746a51
Update doc source in readme table
maryamziaa Aug 5, 2025
4 changes: 4 additions & 0 deletions .gitignore
@@ -52,6 +52,10 @@

# Generated doc folders
/docs/html
/com.unity.ml-agents/Documentation~/html

# MkDocs build output
/site/

# Mac hidden files
*.DS_Store
20 changes: 20 additions & 0 deletions com.unity.ml-agents/Documentation~/API-Reference.md
@@ -0,0 +1,20 @@
# API Reference

Our developer-facing C# classes are documented in a Doxygen-compatible format
so that HTML API documentation can be generated automatically.

To generate the API reference, download Doxygen and run the following command
within the `Documentation~/` directory:

```sh
doxygen dox-ml-agents.conf
```

`dox-ml-agents.conf` is a Doxygen configuration file for the ML-Agents Toolkit
that includes the classes that have been properly formatted. The generated HTML
files will be placed in the `html/` subdirectory. Open `index.html` within that
subdirectory to navigate to the API reference home. Note that `html/` is already
included in the repository's `.gitignore` file.

In the near future, we aim to expand our documentation to include the Python
classes.
14 changes: 14 additions & 0 deletions com.unity.ml-agents/Documentation~/Advanced-Features.md
@@ -0,0 +1,14 @@
# Advanced Features

The ML-Agents Toolkit provides several advanced features that extend the core functionality and enable sophisticated use cases.


| **Feature** | **Description** |
|------------------------------------------------------|----------------------------------------------------------|
| [Custom Side Channels](Custom-SideChannels.md) | Create custom communication channels between Unity and Python. |
| [Custom Grid Sensors](Custom-GridSensors.md) | Build specialized grid-based sensors for spatial data. |
| [Input System Integration](InputSystem-Integration.md) | Integrate ML-Agents with Unity's Input System. |
| [Inference Engine](Inference-Engine.md) | Deploy trained models for real-time inference. |
| [Hugging Face Integration](Hugging-Face-Integration.md) | Connect with Hugging Face models and ecosystem. |
| [ML-Agents Package Settings](Package-Settings.md) | Configure advanced package settings and preferences. |
| [Unity Environment Registry](Unity-Environment-Registry.md) | Manage and register Unity environments programmatically. |
193 changes: 193 additions & 0 deletions com.unity.ml-agents/Documentation~/Background-Machine-Learning.md
@@ -0,0 +1,193 @@
# Background: Machine Learning

Since many users of the ML-Agents Toolkit may not have a formal machine
learning background, this page provides an overview of the concepts needed to
understand the toolkit. It is not intended as a thorough treatment of machine
learning; there are many excellent resources for that online.

Machine learning, a branch of artificial intelligence, focuses on learning
patterns from data. The three main classes of machine learning algorithms are
unsupervised learning, supervised learning, and reinforcement learning. Each
class learns from a different type of data. The following sections provide an
overview of each class, along with introductory examples.

## Unsupervised Learning

The goal of
[unsupervised learning](https://en.wikipedia.org/wiki/Unsupervised_learning) is
to group or cluster similar items in a data set. For example, consider the
players of a game. We may want to group the players depending on how engaged
they are with the game. This would enable us to target different groups (e.g.
for highly-engaged players we might invite them to be beta testers for new
features, while for unengaged players we might email them helpful tutorials).
Say that we wish to split our players into two groups. We would first define
basic attributes of the players, such as the number of hours played, total money
spent on in-app purchases and number of levels completed. We can then feed this
data set (three attributes for every player) to an unsupervised learning
algorithm where we specify the number of groups to be two. The algorithm would
then split the data set into two groups in which the players within each group
are similar to each other. Given the attributes we used to describe each
player, the output in this case would be a split of all players into two
groups, one semantically representing the engaged players and the other the
unengaged players.

With unsupervised learning, we did not provide specific examples of which
players are considered engaged and which are considered unengaged. We just
defined the appropriate attributes and relied on the algorithm to uncover the
two groups on its own. This type of data set is typically called an unlabeled
data set as it is lacking these direct labels. Consequently, unsupervised
learning can be helpful in situations where these labels can be expensive or
hard to produce. In the next paragraph, we overview supervised learning
algorithms which accept input labels in addition to attributes.
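
As a concrete illustration, the following is a minimal clustering sketch. It is
not part of the ML-Agents Toolkit; it assumes NumPy and scikit-learn are
installed, and the player attribute values are invented purely for
illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# One row per player: [hours played, money spent, levels completed].
# These values are invented purely for illustration.
players = np.array([
    [120.0, 45.0, 30.0],
    [110.0, 60.0, 28.0],
    [  3.0,  0.0,  2.0],
    [  5.0,  1.0,  3.0],
])

# Ask for two clusters; the algorithm decides how to split the players.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(players)
print(labels)  # e.g. [0 0 1 1] -- one cluster of engaged players, one of unengaged
```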

## Supervised Learning

In [supervised learning](https://en.wikipedia.org/wiki/Supervised_learning), we
do not want to just group similar items but directly learn a mapping from each
item to the group (or class) that it belongs to. Returning to our earlier
example of clustering players, let's say we now wish to predict which of our
players are about to churn (that is, stop playing the game for the next 30 days).
We can look into our historical records and create a data set that contains
attributes of our players in addition to a label indicating whether they have
churned or not. Note that the player attributes we use for this churn prediction
task may be different from the ones we used for our earlier clustering task. We
can then feed this data set (attributes **and** label for each player) into a
supervised learning algorithm which would learn a mapping from the player
attributes to a label indicating whether that player will churn or not. The
intuition is that the supervised learning algorithm will learn which values of
these attributes typically correspond to players who have churned and not
churned (for example, it may learn that players who spend very little and play
for very short periods will most likely churn). Now given this learned model, we
can provide it the attributes of a new player (one that recently started playing
the game) and it would output a _predicted_ label for that player. This
prediction is the algorithm's expectation of whether the player will churn or
not. We can now use these predictions to target the players who are expected to
churn and entice them to continue playing the game.
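
To make this concrete, here is a minimal supervised-learning sketch, again
assuming scikit-learn and using invented data: a logistic-regression classifier
is fit on labeled players (the training phase) and then used to predict whether
a new player will churn (the inference phase).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# One row per player: [hours played, money spent]; labels: 1 = churned, 0 = stayed.
# All values are invented purely for illustration.
X = np.array([[2.0, 0.0], [1.5, 0.5], [80.0, 20.0], [95.0, 35.0]])
y = np.array([1, 1, 0, 0])

model = LogisticRegression().fit(X, y)  # training phase: learn attributes -> label
new_player = np.array([[3.0, 0.0]])     # a recently joined player
print(model.predict(new_player))        # inference phase: e.g. [1] -> likely to churn
```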

As you may have noticed, for both supervised and unsupervised learning, there
are two tasks that need to be performed: attribute selection and model
selection. Attribute selection (also called feature selection) pertains to
selecting how we wish to represent the entity of interest, in this case, the
player. Model selection, on the other hand, pertains to selecting the algorithm
(and its parameters) that performs the task well. Both of these tasks are active
areas of machine learning research and, in practice, require several iterations
to achieve good performance.

We now switch to reinforcement learning, the third class of machine learning
algorithms, and arguably the one most relevant for the ML-Agents Toolkit.

## Reinforcement Learning

[Reinforcement learning](https://en.wikipedia.org/wiki/Reinforcement_learning)
can be viewed as a form of learning for sequential decision making that is
commonly associated with controlling robots (but is, in fact, much more
general). Consider an autonomous firefighting robot that is tasked with
navigating into an area, finding the fire and neutralizing it. At any given
moment, the robot perceives the environment through its sensors (e.g. camera,
heat, touch), processes this information and produces an action (e.g. move to
the left, rotate the water hose, turn on the water). In other words, it is
continuously making decisions about how to interact in this environment given
its view of the world (i.e. sensor inputs) and objective (i.e. neutralizing the
fire). Teaching a robot to be a successful firefighting machine is precisely
what reinforcement learning is designed to do.

More specifically, the goal of reinforcement learning is to learn a **policy**,
which is essentially a mapping from **observations** to **actions**. An
observation is what the robot can measure from its **environment** (in this
case, all its sensory inputs) and an action, in its most raw form, is a change
to the configuration of the robot (e.g. position of its base, position of its
water hose and whether the hose is on or off).

The last remaining piece of the reinforcement learning task is the **reward
signal**. The robot is trained to learn a policy that maximizes its overall rewards. When training a robot to be a mean firefighting machine, we provide it
with rewards (positive and negative) indicating how well it is doing on
completing the task. Note that the robot does not _know_ how to put out fires
before it is trained. It learns the objective because it receives a large
positive reward when it puts out the fire and a small negative reward for every
passing second. The fact that rewards are sparse (i.e. may not be provided at
every step, but only when a robot arrives at a success or failure situation) is
a defining characteristic of reinforcement learning and precisely why learning
good policies can be difficult (and/or time-consuming) for complex environments.

<div style="text-align: center"><img src="images/rl_cycle.png" alt="The reinforcement learning lifecycle."></div>
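
The following sketch illustrates this interaction loop in plain Python. The
`ToyEnvironment` and the random policy are purely hypothetical stand-ins (this
is not the ML-Agents API); the point is only to show how observations flow into
a policy, actions flow back into the environment, and rewards accumulate.

```python
import random

class ToyEnvironment:
    """Hypothetical environment: a 'fire' that starts at intensity 10."""
    def reset(self):
        self.fire = 10
        return self.fire                        # observation

    def step(self, action):
        # action 1 = spray water, action 0 = do nothing
        self.fire = max(0, self.fire - action)
        reward = 100 if self.fire == 0 else -1  # sparse success reward, small penalty per step
        done = self.fire == 0
        return self.fire, reward, done

def random_policy(observation):
    """Stand-in policy: maps an observation to an action at random."""
    return random.choice([0, 1])

env = ToyEnvironment()
obs, total_reward, done = env.reset(), 0, False
while not done:
    action = random_policy(obs)            # policy: observation -> action
    obs, reward, done = env.step(action)   # environment returns next observation and reward
    total_reward += reward
print(total_reward)  # reinforcement learning would tune the policy to maximize this
```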

Learning a policy usually requires many trials and iterative policy updates. More specifically,
the robot is placed in several fire situations and over time learns an optimal
policy which allows it to put out fires more effectively. Obviously, we cannot
expect to train a robot repeatedly in the real world, particularly when fires
are involved. This is precisely why using Unity as a simulator provides the
perfect training ground for learning such behaviors. While our
discussion of reinforcement learning has centered around robots, there are
strong parallels between robots and characters in a game. In fact, in many ways,
one can view a non-playable character (NPC) as a virtual robot, with its own
observations about the environment, its own set of actions and a specific
objective. Thus it is natural to explore how we can train behaviors within Unity
using reinforcement learning. This is precisely what the ML-Agents Toolkit
offers. The video linked below includes a reinforcement learning demo showcasing
training character behaviors using the ML-Agents Toolkit.

<p align="center">
<a href="http://www.youtube.com/watch?feature=player_embedded&v=fiQsmdwEGT8" target="_blank">
<img src="http://img.youtube.com/vi/fiQsmdwEGT8/0.jpg" alt="RL Demo" width="400" border="10" />
</a>
</p>

Similar to both unsupervised and supervised learning, reinforcement learning
also involves two tasks: attribute selection and model selection. Attribute
selection is defining the set of observations for the robot that best help it
complete its objective, while model selection is defining the form of the policy
(mapping from observations to actions) and its parameters. In practice, training
behaviors is an iterative process that may require changing the attribute and
model choices.

## Training and Inference

One common aspect of all three branches of machine learning is that they all
involve a **training phase** and an **inference phase**. While the details of
the training and inference phases are different for each of the three, at a
high level, the training phase involves building a model using the provided
data, while the inference phase involves applying this model to new, previously
unseen, data. More specifically:

- For our unsupervised learning example, the training phase learns the optimal
two clusters based on the data describing existing players, while the
inference phase assigns a new player to one of these two clusters.
- For our supervised learning example, the training phase learns the mapping
from player attributes to player label (whether they churned or not), and the
inference phase predicts whether a new player will churn or not based on that
learned mapping.
- For our reinforcement learning example, the training phase learns the optimal
policy through guided trials, and in the inference phase, the agent observes
and takes actions in the wild using its learned policy.

To briefly summarize: all three classes of algorithms involve training and
inference phases in addition to attribute and model selections. What ultimately
separates them is the type of data available to learn from. In unsupervised
learning our data set was a collection of attributes, in supervised learning our
data set was a collection of attribute-label pairs, and, lastly, in
reinforcement learning our data set was a collection of
observation-action-reward tuples.

## Deep Learning

[Deep learning](https://en.wikipedia.org/wiki/Deep_learning) is a family of
algorithms that can be used to address any of the problems introduced above.
More specifically, they can be used to solve both attribute and model selection
tasks. Deep learning has gained popularity in recent years due to its
outstanding performance on several challenging machine learning tasks. One
example is [AlphaGo](https://en.wikipedia.org/wiki/AlphaGo), a
[computer Go](https://en.wikipedia.org/wiki/Computer_Go) program that leverages
deep learning and was able to beat Lee Sedol (a Go world champion).

A key characteristic of deep learning algorithms is their ability to learn very
complex functions from large amounts of training data. This makes them a natural
choice for reinforcement learning tasks when a large amount of data can be
generated, say through the use of a simulator or engine such as Unity. By
generating hundreds of thousands of simulations of the environment within Unity,
we can learn policies for very complex environments (a complex environment is
one where the number of observations an agent perceives and the number of
actions it can take are large). Many of the algorithms we provide in ML-Agents
use some form of deep learning, built on top of the open-source library
[PyTorch](Background-PyTorch.md).
35 changes: 35 additions & 0 deletions com.unity.ml-agents/Documentation~/Background-PyTorch.md
@@ -0,0 +1,35 @@
# Background: PyTorch

As discussed in our
[machine learning background page](Background-Machine-Learning.md), many of the
algorithms we provide in the ML-Agents Toolkit leverage some form of deep
learning. More specifically, our implementations are built on top of the
open-source library [PyTorch](https://pytorch.org/). This page provides a
brief overview of PyTorch and TensorBoard, both of which we leverage within the
ML-Agents Toolkit.

## PyTorch

[PyTorch](https://pytorch.org/) is an open-source library for
performing computations using data flow graphs, the underlying representation of
deep learning models. It facilitates training and inference on CPUs and GPUs in
a desktop, server, or mobile device. Within the ML-Agents Toolkit, when you
train the behavior of an agent, the output is a model (.onnx) file that you can
then associate with an Agent. Unless you implement a new algorithm, the use of
PyTorch is mostly abstracted away and happens behind the scenes.

## TensorBoard

One component of training models with PyTorch is setting the values of
certain model attributes (called _hyperparameters_). Finding the right values of
these hyperparameters can require a few iterations. Consequently, we leverage a
visualization tool called
[TensorBoard](https://www.tensorflow.org/tensorboard).
It allows the visualization of certain agent attributes (e.g. reward) throughout
training, which can be helpful in both building intuitions for the different
hyperparameters and setting the optimal values for your Unity environment. We
provide more details on setting the hyperparameters in the
[Training ML-Agents](Training-ML-Agents.md) page. If you are unfamiliar with
TensorBoard we recommend our guide on
[using TensorBoard with ML-Agents](Using-Tensorboard.md) or this
[tutorial](https://github.com/dandelionmane/tf-dev-summit-tensorboard-tutorial).
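
As a rough illustration of what TensorBoard consumes, the sketch below logs a
scalar with PyTorch's `SummaryWriter` and can then be viewed by running
`tensorboard --logdir runs`. This is only a hand-rolled example (it assumes the
`tensorboard` package is installed); during ML-Agents training the trainer
writes these summaries for you.

```python
from torch.utils.tensorboard import SummaryWriter

# Write event files under ./runs/demo; TensorBoard reads this directory.
writer = SummaryWriter("runs/demo")
for step in range(100):
    fake_reward = step * 0.1  # stand-in for an agent's cumulative reward
    writer.add_scalar("Environment/Cumulative Reward", fake_reward, step)
writer.close()
```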
19 changes: 19 additions & 0 deletions com.unity.ml-agents/Documentation~/Background-Unity.md
@@ -0,0 +1,19 @@
# Background: Unity

If you are not familiar with the [Unity Engine](https://unity3d.com/unity), we
highly recommend the [Unity Manual](https://docs.unity3d.com/Manual/index.html)
and [Tutorials page](https://unity3d.com/learn/tutorials). The
[Roll-a-ball tutorial](https://learn.unity.com/project/roll-a-ball)
is a fantastic resource for learning the basic Unity concepts you will need to
get started with the ML-Agents Toolkit, including:

- [Editor](https://docs.unity3d.com/Manual/sprite/sprite-editor/use-editor.html)
- [Scene](https://docs.unity3d.com/Manual/CreatingScenes.html)
- [GameObject](https://docs.unity3d.com/Manual/GameObjects.html)
- [Rigidbody](https://docs.unity3d.com/ScriptReference/Rigidbody.html)
- [Camera](https://docs.unity3d.com/Manual/Cameras.html)
- [Scripting](https://docs.unity3d.com/Manual/ScriptingSection.html)
- [Physics](https://docs.unity3d.com/Manual/PhysicsSection.html)
- [Ordering of event functions](https://docs.unity3d.com/Manual/ExecutionOrder.html)
(e.g. FixedUpdate, Update)
- [Prefabs](https://docs.unity3d.com/Manual/Prefabs.html)
11 changes: 11 additions & 0 deletions com.unity.ml-agents/Documentation~/Background.md
@@ -0,0 +1,11 @@
# Background

This section provides foundational knowledge to help you understand the technologies and concepts that power the ML-Agents Toolkit.

| **Topic** | **Description** |
|----------------------------------------------------|-------------------------------------------------------------------------------|
| [Machine Learning](Background-Machine-Learning.md) | Introduction to ML concepts, reinforcement learning, and training principles. |
| [Unity](Background-Unity.md) | Unity fundamentals for ML-Agents development and environment creation. |
| [PyTorch](Background-PyTorch.md) | PyTorch basics for understanding the training pipeline and neural networks. |
| [Using Virtual Environment](Using-Virtual-Environment.md) | Setting up and managing Python virtual environments for ML-Agents. |
| [ELO Rating System](ELO-Rating-System.md) | Understanding ELO rating system for multi-agent training evaluation. |
22 changes: 22 additions & 0 deletions com.unity.ml-agents/Documentation~/Blog-posts.md
@@ -0,0 +1,22 @@
We have published a series of blog posts that are relevant for ML-Agents:

- (July 12, 2021)
[ML-Agents plays Dodgeball](https://blog.unity.com/technology/ml-agents-plays-dodgeball)
- (May 5, 2021)
[ML-Agents v2.0 release: Now supports training complex cooperative behaviors](https://blogs.unity3d.com/2021/05/05/ml-agents-v2-0-release-now-supports-training-complex-cooperative-behaviors/)
- (November 20, 2020)
[How Eidos-Montréal created Grid Sensors to improve observations for training agents](https://www.eidosmontreal.com/news/the-grid-sensor-for-automated-game-testing/)
- (February 28, 2020)
[Training intelligent adversaries using self-play with ML-Agents](https://blogs.unity3d.com/2020/02/28/training-intelligent-adversaries-using-self-play-with-ml-agents/)
- (November 11, 2019)
[Training your agents 7 times faster with ML-Agents](https://blogs.unity3d.com/2019/11/11/training-your-agents-7-times-faster-with-ml-agents/)
- (October 2, 2018)
[Puppo, The Corgi: Cuteness Overload with the Unity ML-Agents Toolkit](https://blogs.unity3d.com/2018/10/02/puppo-the-corgi-cuteness-overload-with-the-unity-ml-agents-toolkit/)
- (June 26, 2018)
[Solving sparse-reward tasks with Curiosity](https://blogs.unity3d.com/2018/06/26/solving-sparse-reward-tasks-with-curiosity/)
- (June 19, 2018)
[Unity ML-Agents Toolkit v0.4 and Udacity Deep Reinforcement Learning Nanodegree](https://github.com/udacity/deep-reinforcement-learning)
- (December 11, 2017)
[Using Machine Learning Agents in a real game: a beginner’s guide](https://blogs.unity3d.com/2017/12/11/using-machine-learning-agents-in-a-real-game-a-beginners-guide/)
- (September 19, 2017)
[Introducing: Unity Machine Learning Agents Toolkit](https://blogs.unity3d.com/2017/09/19/introducing-unity-machine-learning-agents/)