Move docs to package #6232

Open · wants to merge 58 commits into develop

Commits (58)
509a4bf
Move all files from the extension package to the main package
maryamziaa Jul 22, 2025
440743f
Update the extension tests
maryamziaa Jul 22, 2025
3bce3bd
Remove the duplicate condition
maryamziaa Jul 22, 2025
1f97457
Remove the duplicate condition
maryamziaa Jul 22, 2025
6391a89
Remove duplicate PhysicsMaterial
maryamziaa Jul 22, 2025
da5d781
Remove redundant import statements
maryamziaa Jul 22, 2025
befebd3
Remove Extensions meta file
maryamziaa Jul 22, 2025
886f881
Black reformatting
maryamziaa Jul 22, 2025
113f39f
Merge remote-tracking branch 'origin/develop' into merge-extensions-i…
maryamziaa Jul 23, 2025
9d47fe9
Remove duplicate entries
maryamziaa Jul 23, 2025
6dd1559
Move Runtime Input tests to a separate assembly
maryamziaa Jul 23, 2025
bfd96c1
Move Runtime example test to Tests
maryamziaa Jul 23, 2025
fc1db44
Update assembly
maryamziaa Jul 23, 2025
6cfad25
Update CHANGELOG.md
maryamziaa Jul 23, 2025
53f2134
Update inputsystem version in test assembly
maryamziaa Jul 24, 2025
3dc5882
Undo scene setting
maryamziaa Jul 24, 2025
f8f4c9f
Update the doc
maryamziaa Jul 24, 2025
4856cf4
Update the doc
maryamziaa Jul 24, 2025
0150980
Update scene setting
maryamziaa Jul 24, 2025
a4ab995
Update scene template
maryamziaa Jul 24, 2025
9e44c39
Update assembly
maryamziaa Jul 24, 2025
827d56e
Change namespace to Unity.MLAgents.Input
maryamziaa Jul 24, 2025
15ff1bb
Remove redundant import
maryamziaa Jul 24, 2025
8ad46c7
Remove unnecessary condition
maryamziaa Jul 25, 2025
b3c9106
Update assembly files
maryamziaa Jul 25, 2025
34a5098
Another update
maryamziaa Jul 25, 2025
1ce9848
Remove empty file
maryamziaa Jul 25, 2025
80a9023
Copy web docs to package docs
maryamziaa Jul 25, 2025
0807316
Unify the package doc and web doc main pages
maryamziaa Jul 25, 2025
e66ef0b
Update image path
maryamziaa Jul 25, 2025
652e0bd
Update image path
maryamziaa Jul 25, 2025
daa3fe4
Update hyperlinks
maryamziaa Jul 25, 2025
7bde130
Add MovedFrom tags
maryamziaa Jul 29, 2025
76bd2d4
Revert extension package change
maryamziaa Jul 30, 2025
9227488
Upgrade upm-pvp
maryamziaa Jul 30, 2025
28e08b6
Merge remote-tracking branch 'origin/merge-extensions-into-ml-agents'…
maryamziaa Jul 30, 2025
b957c45
WIP- Add table of contents
maryamziaa Jul 31, 2025
47742c8
WIP- doc
maryamziaa Aug 1, 2025
bf45858
Update index
maryamziaa Aug 1, 2025
f6f23bc
Merge with develop
maryamziaa Aug 1, 2025
be965e7
Update index
maryamziaa Aug 1, 2025
7dff8d0
Update Python APIs and Advanced Features
maryamziaa Aug 4, 2025
de0d8de
Update more sections
maryamziaa Aug 4, 2025
21ed822
Update index
maryamziaa Aug 4, 2025
3486dcd
Update colab doc
maryamziaa Aug 4, 2025
1f27ea1
Update more docs
maryamziaa Aug 4, 2025
989ef98
Update API Reference docs and settings
maryamziaa Aug 4, 2025
544fd45
Update more docs
maryamziaa Aug 4, 2025
0ae2942
Update wrapper docs
maryamziaa Aug 4, 2025
6190d41
Correct reference inconsistency
maryamziaa Aug 4, 2025
fb1bb27
Convert relative path to github url
maryamziaa Aug 4, 2025
3c0286f
Update README
maryamziaa Aug 4, 2025
ab4b37d
Update reference to the old docs
maryamziaa Aug 4, 2025
80c8ee1
Remove readme migration file
maryamziaa Aug 4, 2025
76c44a8
Add doc migration notice
maryamziaa Aug 4, 2025
6952c8a
Deprecated banner
maryamziaa Aug 4, 2025
b1f7fdd
Update banner
maryamziaa Aug 5, 2025
c746a51
Update doc source in readme table
maryamziaa Aug 5, 2025
4 changes: 4 additions & 0 deletions .gitignore
@@ -52,6 +52,10 @@

# Generated doc folders
/docs/html
/com.unity.ml-agents/Documentation~/html

# MkDocs build output
/site/

# Mac hidden files
*.DS_Store
20 changes: 20 additions & 0 deletions com.unity.ml-agents/Documentation~/API-Reference.md
@@ -0,0 +1,20 @@
# API Reference

Our developer-facing C# classes are documented in a Doxygen-compatible format
so that HTML API documentation can be generated automatically.

To generate the API reference, download Doxygen and run the following command
within the `Documentation~/` directory:

```sh
doxygen dox-ml-agents.conf
```

`dox-ml-agents.conf` is a Doxygen configuration file for the ML-Agents Toolkit
that includes the classes that have been properly formatted. The generated HTML
files will be placed in the `html/` subdirectory. Open `index.html` within that
subdirectory to navigate to the API reference home. Note that `html/` is already
included in the repository's `.gitignore` file.

In the near future, we aim to expand our documentation to include the Python
classes.
14 changes: 14 additions & 0 deletions com.unity.ml-agents/Documentation~/Advanced-Features.md
@@ -0,0 +1,14 @@
# Advanced Features

The ML-Agents Toolkit provides several advanced features that extend the core functionality and enable sophisticated use cases.


| **Feature** | **Description** |
|------------------------------------------------------|----------------------------------------------------------|
| [Custom Side Channels](Custom-SideChannels.md) | Create custom communication channels between Unity and Python. |
| [Custom Grid Sensors](Custom-GridSensors.md) | Build specialized grid-based sensors for spatial data. |
| [Input System Integration](InputSystem-Integration.md) | Integrate ML-Agents with Unity's Input System. |
| [Inference Engine](Inference-Engine.md) | Deploy trained models for real-time inference. |
| [Hugging Face Integration](Hugging-Face-Integration.md) | Connect with Hugging Face models and ecosystem. |
| [ML-Agents Package Settings](Package-Settings.md) | Configure advanced package settings and preferences. |
| [Unity Environment Registry](Unity-Environment-Registry.md) | Manage and register Unity environments programmatically. |
193 changes: 193 additions & 0 deletions com.unity.ml-agents/Documentation~/Background-Machine-Learning.md
@@ -0,0 +1,193 @@
# Background: Machine Learning

Since many users of the ML-Agents Toolkit may not have a formal machine
learning background, this page provides an overview of the concepts needed to
understand the toolkit. It is not intended as a thorough treatment of machine
learning; there are many excellent resources for that online.

Machine learning, a branch of artificial intelligence, focuses on learning
patterns from data. The three main classes of machine learning algorithms are
unsupervised learning, supervised learning, and reinforcement learning. Each
class learns from a different type of data. The following sections provide an
overview of each class, along with introductory examples.

## Unsupervised Learning

The goal of
[unsupervised learning](https://en.wikipedia.org/wiki/Unsupervised_learning) is
to group or cluster similar items in a data set. For example, consider the
players of a game. We may want to group the players depending on how engaged
they are with the game. This would enable us to target different groups (e.g.
for highly-engaged players we might invite them to be beta testers for new
features, while for unengaged players we might email them helpful tutorials).
Say that we wish to split our players into two groups. We would first define
basic attributes of the players, such as the number of hours played, total money
spent on in-app purchases and number of levels completed. We can then feed this
data set (three attributes for every player) to an unsupervised learning
algorithm where we specify the number of groups to be two. The algorithm would
then split the data set into two groups in which the players within each group
are similar to each other. Given the attributes we used to describe each
player, the output in this case would be a split of all players into two
groups, one semantically representing the engaged players and the other the
unengaged players.

With unsupervised learning, we did not provide specific examples of which
players are considered engaged and which are considered unengaged. We just
defined the appropriate attributes and relied on the algorithm to uncover the
two groups on its own. This type of data set is typically called an unlabeled
data set as it is lacking these direct labels. Consequently, unsupervised
learning can be helpful in situations where these labels can be expensive or
hard to produce. In the next paragraph, we overview supervised learning
algorithms which accept input labels in addition to attributes.
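
As a concrete illustration, the following is a minimal clustering sketch. It is
not part of the ML-Agents Toolkit; it assumes NumPy and scikit-learn are
installed, and the player attribute values are invented purely for
illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# One row per player: [hours played, money spent, levels completed].
# These values are invented purely for illustration.
players = np.array([
    [120.0, 45.0, 30.0],
    [110.0, 60.0, 28.0],
    [  3.0,  0.0,  2.0],
    [  5.0,  1.0,  3.0],
])

# Ask for two clusters; the algorithm decides how to split the players.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(players)
print(labels)  # e.g. [0 0 1 1] -- one cluster of engaged players, one of unengaged
```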

## Supervised Learning

In [supervised learning](https://en.wikipedia.org/wiki/Supervised_learning), we
do not want to just group similar items but directly learn a mapping from each
item to the group (or class) that it belongs to. Returning to our earlier
example of clustering players, let's say we now wish to predict which of our
players are about to churn (that is, stop playing the game for the next 30 days).
We can look into our historical records and create a data set that contains
attributes of our players in addition to a label indicating whether they have
churned or not. Note that the player attributes we use for this churn prediction
task may be different from the ones we used for our earlier clustering task. We
can then feed this data set (attributes **and** label for each player) into a
supervised learning algorithm which would learn a mapping from the player
attributes to a label indicating whether that player will churn or not. The
intuition is that the supervised learning algorithm will learn which values of
these attributes typically correspond to players who have churned and not
churned (for example, it may learn that players who spend very little and play
for very short periods will most likely churn). Now given this learned model, we
can provide it the attributes of a new player (one that recently started playing
the game) and it would output a _predicted_ label for that player. This
prediction is the algorithm's expectation of whether the player will churn or
not. We can now use these predictions to target the players who are expected to
churn and entice them to continue playing the game.
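
To make this concrete, here is a minimal supervised-learning sketch, again
assuming scikit-learn and using invented data: a logistic-regression classifier
is fit on labeled players (the training phase) and then used to predict whether
a new player will churn (the inference phase).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# One row per player: [hours played, money spent]; labels: 1 = churned, 0 = stayed.
# All values are invented purely for illustration.
X = np.array([[2.0, 0.0], [1.5, 0.5], [80.0, 20.0], [95.0, 35.0]])
y = np.array([1, 1, 0, 0])

model = LogisticRegression().fit(X, y)  # training phase: learn attributes -> label
new_player = np.array([[3.0, 0.0]])     # a recently joined player
print(model.predict(new_player))        # inference phase: e.g. [1] -> likely to churn
```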

As you may have noticed, for both supervised and unsupervised learning, there
are two tasks that need to be performed: attribute selection and model
selection. Attribute selection (also called feature selection) pertains to
selecting how we wish to represent the entity of interest, in this case, the
player. Model selection, on the other hand, pertains to selecting the algorithm
(and its parameters) that performs the task well. Both of these tasks are active
areas of machine learning research and, in practice, require several iterations
to achieve good performance.

We now switch to reinforcement learning, the third class of machine learning
algorithms, and arguably the one most relevant for the ML-Agents Toolkit.

## Reinforcement Learning

[Reinforcement learning](https://en.wikipedia.org/wiki/Reinforcement_learning)
can be viewed as a form of learning for sequential decision making that is
commonly associated with controlling robots (but is, in fact, much more
general). Consider an autonomous firefighting robot that is tasked with
navigating into an area, finding the fire and neutralizing it. At any given
moment, the robot perceives the environment through its sensors (e.g. camera,
heat, touch), processes this information and produces an action (e.g. move to
the left, rotate the water hose, turn on the water). In other words, it is
continuously making decisions about how to interact in this environment given
its view of the world (i.e. sensor inputs) and objective (i.e. neutralizing the
fire). Teaching a robot to be a successful firefighting machine is precisely
what reinforcement learning is designed to do.

More specifically, the goal of reinforcement learning is to learn a **policy**,
which is essentially a mapping from **observations** to **actions**. An
observation is what the robot can measure from its **environment** (in this
case, all its sensory inputs) and an action, in its most raw form, is a change
to the configuration of the robot (e.g. position of its base, position of its
water hose and whether the hose is on or off).

The last remaining piece of the reinforcement learning task is the **reward
signal**. The robot is trained to learn a policy that maximizes its overall rewards. When training a robot to be a mean firefighting machine, we provide it
with rewards (positive and negative) indicating how well it is doing on
completing the task. Note that the robot does not _know_ how to put out fires
before it is trained. It learns the objective because it receives a large
positive reward when it puts out the fire and a small negative reward for every
passing second. The fact that rewards are sparse (i.e. may not be provided at
every step, but only when a robot arrives at a success or failure situation) is
a defining characteristic of reinforcement learning and precisely why learning
good policies can be difficult (and/or time-consuming) for complex environments.

<div style="text-align: center"><img src="images/rl_cycle.png" alt="The reinforcement learning lifecycle."></div>
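
The following sketch illustrates this interaction loop in plain Python. The
`ToyEnvironment` and the random policy are purely hypothetical stand-ins (this
is not the ML-Agents API); the point is only to show how observations flow into
a policy, actions flow back into the environment, and rewards accumulate.

```python
import random

class ToyEnvironment:
    """Hypothetical environment: a 'fire' that starts at intensity 10."""
    def reset(self):
        self.fire = 10
        return self.fire                        # observation

    def step(self, action):
        # action 1 = spray water, action 0 = do nothing
        self.fire = max(0, self.fire - action)
        reward = 100 if self.fire == 0 else -1  # sparse success reward, small penalty per step
        done = self.fire == 0
        return self.fire, reward, done

def random_policy(observation):
    """Stand-in policy: maps an observation to an action at random."""
    return random.choice([0, 1])

env = ToyEnvironment()
obs, total_reward, done = env.reset(), 0, False
while not done:
    action = random_policy(obs)            # policy: observation -> action
    obs, reward, done = env.step(action)   # environment returns next observation and reward
    total_reward += reward
print(total_reward)  # reinforcement learning would tune the policy to maximize this
```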

Learning a policy usually requires many trials and iterative policy updates. More specifically,
the robot is placed in several fire situations and over time learns an optimal
policy which allows it to put out fires more effectively. Obviously, we cannot
expect to train a robot repeatedly in the real world, particularly when fires
are involved. This is precisely why using Unity as a simulator provides the
perfect training ground for learning such behaviors. While our
discussion of reinforcement learning has centered around robots, there are
strong parallels between robots and characters in a game. In fact, in many ways,
one can view a non-playable character (NPC) as a virtual robot, with its own
observations about the environment, its own set of actions and a specific
objective. Thus it is natural to explore how we can train behaviors within Unity
using reinforcement learning. This is precisely what the ML-Agents Toolkit
offers. The video linked below includes a reinforcement learning demo showcasing
training character behaviors using the ML-Agents Toolkit.

<p align="center">
<a href="http://www.youtube.com/watch?feature=player_embedded&v=fiQsmdwEGT8" target="_blank">
<img src="http://img.youtube.com/vi/fiQsmdwEGT8/0.jpg" alt="RL Demo" width="400" border="10" />
</a>
</p>

Similar to both unsupervised and supervised learning, reinforcement learning
also involves two tasks: attribute selection and model selection. Attribute
selection is defining the set of observations for the robot that best help it
complete its objective, while model selection is defining the form of the policy
(mapping from observations to actions) and its parameters. In practice, training
behaviors is an iterative process that may require changing the attribute and
model choices.

## Training and Inference

One common aspect of all three branches of machine learning is that they all
involve a **training phase** and an **inference phase**. While the details of
the training and inference phases are different for each of the three, at a
high level, the training phase involves building a model using the provided
data, while the inference phase involves applying this model to new, previously
unseen, data. More specifically:

- For our unsupervised learning example, the training phase learns the optimal
two clusters based on the data describing existing players, while the
inference phase assigns a new player to one of these two clusters.
- For our supervised learning example, the training phase learns the mapping
from player attributes to player label (whether they churned or not), and the
inference phase predicts whether a new player will churn or not based on that
learned mapping.
- For our reinforcement learning example, the training phase learns the optimal
policy through guided trials, and in the inference phase, the agent observes
and takes actions in the wild using its learned policy.

To briefly summarize: all three classes of algorithms involve training and
inference phases in addition to attribute and model selections. What ultimately
separates them is the type of data available to learn from. In unsupervised
learning our data set was a collection of attributes, in supervised learning our
data set was a collection of attribute-label pairs, and, lastly, in
reinforcement learning our data set was a collection of
observation-action-reward tuples.

## Deep Learning

[Deep learning](https://en.wikipedia.org/wiki/Deep_learning) is a family of
algorithms that can be used to address any of the problems introduced above.
More specifically, they can be used to solve both attribute and model selection
tasks. Deep learning has gained popularity in recent years due to its
outstanding performance on several challenging machine learning tasks. One
example is [AlphaGo](https://en.wikipedia.org/wiki/AlphaGo), a
[computer Go](https://en.wikipedia.org/wiki/Computer_Go) program that leverages
deep learning and was able to beat Lee Sedol (a Go world champion).

A key characteristic of deep learning algorithms is their ability to learn very
complex functions from large amounts of training data. This makes them a natural
choice for reinforcement learning tasks when a large amount of data can be
generated, say through the use of a simulator or engine such as Unity. By
generating hundreds of thousands of simulations of the environment within Unity,
we can learn policies for very complex environments (a complex environment is
one where the number of observations an agent perceives and the number of
actions it can take are large). Many of the algorithms we provide in ML-Agents
use some form of deep learning, built on top of the open-source library
[PyTorch](Background-PyTorch.md).
35 changes: 35 additions & 0 deletions com.unity.ml-agents/Documentation~/Background-PyTorch.md
@@ -0,0 +1,35 @@
# Background: PyTorch

As discussed in our
[machine learning background page](Background-Machine-Learning.md), many of the
algorithms we provide in the ML-Agents Toolkit leverage some form of deep
learning. More specifically, our implementations are built on top of the
open-source library [PyTorch](https://pytorch.org/). This page provides a
brief overview of PyTorch and TensorBoard, both of which we leverage within the
ML-Agents Toolkit.

## PyTorch

[PyTorch](https://pytorch.org/) is an open-source library for
performing computations using data flow graphs, the underlying representation of
deep learning models. It facilitates training and inference on CPUs and GPUs in
a desktop, server, or mobile device. Within the ML-Agents Toolkit, when you
train the behavior of an agent, the output is a model (.onnx) file that you can
then associate with an Agent. Unless you implement a new algorithm, the use of
PyTorch is mostly abstracted away and happens behind the scenes.

## TensorBoard

One component of training models with PyTorch is setting the values of
certain model attributes (called _hyperparameters_). Finding the right values of
these hyperparameters can require a few iterations. Consequently, we leverage a
visualization tool called
[TensorBoard](https://www.tensorflow.org/tensorboard).
It allows the visualization of certain agent attributes (e.g. reward) throughout
training, which can be helpful in both building intuitions for the different
hyperparameters and setting the optimal values for your Unity environment. We
provide more details on setting the hyperparameters in the
[Training ML-Agents](Training-ML-Agents.md) page. If you are unfamiliar with
TensorBoard we recommend our guide on
[using TensorBoard with ML-Agents](Using-Tensorboard.md) or this
[tutorial](https://github.com/dandelionmane/tf-dev-summit-tensorboard-tutorial).
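
As a rough illustration of what TensorBoard consumes, the sketch below logs a
scalar with PyTorch's `SummaryWriter` and can then be viewed by running
`tensorboard --logdir runs`. This is only a hand-rolled example (it assumes the
`tensorboard` package is installed); during ML-Agents training the trainer
writes these summaries for you.

```python
from torch.utils.tensorboard import SummaryWriter

# Write event files under ./runs/demo; TensorBoard reads this directory.
writer = SummaryWriter("runs/demo")
for step in range(100):
    fake_reward = step * 0.1  # stand-in for an agent's cumulative reward
    writer.add_scalar("Environment/Cumulative Reward", fake_reward, step)
writer.close()
```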
19 changes: 19 additions & 0 deletions com.unity.ml-agents/Documentation~/Background-Unity.md
@@ -0,0 +1,19 @@
# Background: Unity

If you are not familiar with the [Unity Engine](https://unity3d.com/unity), we
highly recommend the [Unity Manual](https://docs.unity3d.com/Manual/index.html)
and [Tutorials page](https://unity3d.com/learn/tutorials). The
[Roll-a-ball tutorial](https://learn.unity.com/project/roll-a-ball)
is a fantastic resource for learning the basic Unity concepts you will need to
get started with the ML-Agents Toolkit, including:

- [Editor](https://docs.unity3d.com/Manual/sprite/sprite-editor/use-editor.html)
- [Scene](https://docs.unity3d.com/Manual/CreatingScenes.html)
- [GameObject](https://docs.unity3d.com/Manual/GameObjects.html)
- [Rigidbody](https://docs.unity3d.com/ScriptReference/Rigidbody.html)
- [Camera](https://docs.unity3d.com/Manual/Cameras.html)
- [Scripting](https://docs.unity3d.com/Manual/ScriptingSection.html)
- [Physics](https://docs.unity3d.com/Manual/PhysicsSection.html)
- [Ordering of event functions](https://docs.unity3d.com/Manual/ExecutionOrder.html)
(e.g. FixedUpdate, Update)
- [Prefabs](https://docs.unity3d.com/Manual/Prefabs.html)
11 changes: 11 additions & 0 deletions com.unity.ml-agents/Documentation~/Background.md
@@ -0,0 +1,11 @@
# Background

This section provides foundational knowledge to help you understand the technologies and concepts that power the ML-Agents Toolkit.

| **Topic** | **Description** |
|----------------------------------------------------|-------------------------------------------------------------------------------|
| [Machine Learning](Background-Machine-Learning.md) | Introduction to ML concepts, reinforcement learning, and training principles. |
| [Unity](Background-Unity.md) | Unity fundamentals for ML-Agents development and environment creation. |
| [PyTorch](Background-PyTorch.md) | PyTorch basics for understanding the training pipeline and neural networks. |
| [Using Virtual Environment](Using-Virtual-Environment.md) | Setting up and managing Python virtual environments for ML-Agents. |
| [ELO Rating System](ELO-Rating-System.md) | Understanding ELO rating system for multi-agent training evaluation. |
22 changes: 22 additions & 0 deletions com.unity.ml-agents/Documentation~/Blog-posts.md
@@ -0,0 +1,22 @@
We have published a series of blog posts that are relevant for ML-Agents:

- (July 12, 2021)
[ML-Agents plays Dodgeball](https://blog.unity.com/technology/ml-agents-plays-dodgeball)
- (May 5, 2021)
[ML-Agents v2.0 release: Now supports training complex cooperative behaviors](https://blogs.unity3d.com/2021/05/05/ml-agents-v2-0-release-now-supports-training-complex-cooperative-behaviors/)
- (November 20, 2020)
[How Eidos-Montréal created Grid Sensors to improve observations for training agents](https://www.eidosmontreal.com/news/the-grid-sensor-for-automated-game-testing/)
- (February 28, 2020)
[Training intelligent adversaries using self-play with ML-Agents](https://blogs.unity3d.com/2020/02/28/training-intelligent-adversaries-using-self-play-with-ml-agents/)
- (November 11, 2019)
[Training your agents 7 times faster with ML-Agents](https://blogs.unity3d.com/2019/11/11/training-your-agents-7-times-faster-with-ml-agents/)
- (October 2, 2018)
[Puppo, The Corgi: Cuteness Overload with the Unity ML-Agents Toolkit](https://blogs.unity3d.com/2018/10/02/puppo-the-corgi-cuteness-overload-with-the-unity-ml-agents-toolkit/)
- (June 26, 2018)
[Solving sparse-reward tasks with Curiosity](https://blogs.unity3d.com/2018/06/26/solving-sparse-reward-tasks-with-curiosity/)
- (June 19, 2018)
[Unity ML-Agents Toolkit v0.4 and Udacity Deep Reinforcement Learning Nanodegree](https://github.com/udacity/deep-reinforcement-learning)
- (December 11, 2017)
[Using Machine Learning Agents in a real game: a beginner’s guide](https://blogs.unity3d.com/2017/12/11/using-machine-learning-agents-in-a-real-game-a-beginners-guide/)
- (September 19, 2017)
[Introducing: Unity Machine Learning Agents Toolkit](https://blogs.unity3d.com/2017/09/19/introducing-unity-machine-learning-agents/)