Volunteers
...
Unsupervised
- Clustering - hierarchical clustering, k-means, mixture models, DBSCAN, and OPTICS
- Anomaly Detection - Local Outlier Factor and Isolation Forest
- Dimensionality Reduction - Principal Component Analysis, Independent Component Analysis, Non-negative Matrix Factorization, Singular Value Decomposition (a minimal scikit-learn sketch follows this list)
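As a concrete illustration of the categories above, here is a minimal sketch, assuming scikit-learn is available and using synthetic data (both assumptions, not part of the original notes), that applies PCA for dimensionality reduction and an Isolation Forest for anomaly detection:

```python
# Minimal sketch: PCA (dimensionality reduction) + Isolation Forest (anomaly detection)
# on synthetic data. scikit-learn and the data-generation choices are assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# 500 "normal" points in 10 dimensions plus a handful of obvious outliers.
normal = rng.normal(0, 1, size=(500, 10))
outliers = rng.normal(8, 1, size=(10, 10))
X = np.vstack([normal, outliers])

# Dimensionality reduction: project onto the first 2 principal components.
X_2d = PCA(n_components=2).fit_transform(X)
print("reduced shape:", X_2d.shape)

# Anomaly detection: Isolation Forest labels outliers as -1, inliers as +1.
labels = IsolationForest(contamination=0.02, random_state=0).fit_predict(X)
print("flagged as anomalous:", int((labels == -1).sum()))
```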
...
Name | Comments on Applicability | Reference |
---|---|---|
Hierarchical Clustering | | |
k-means | | |
Gaussian Mixture Models | | |
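Below is a minimal sketch of the three algorithms in the table, fit on the same synthetic blobs so their cluster assignments can be compared; scikit-learn and the toy data are assumptions, not from the original notes:

```python
# Minimal sketch: hierarchical clustering, k-means, and a Gaussian mixture model
# fit on the same synthetic blobs (scikit-learn assumed).
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# k-means: hard assignment of each point to the nearest of k centroids.
km_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# Hierarchical (agglomerative) clustering: merges the closest clusters bottom-up.
hc_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)

# Gaussian mixture model: soft assignment via per-component Gaussians.
gmm = GaussianMixture(n_components=3, random_state=42).fit(X)
gmm_labels = gmm.predict(X)

print(km_labels[:10], hc_labels[:10], gmm_labels[:10])
```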
Reinforcement Learning
...
- Policy Optimization
- Q-Learning

Policy Optimization | Q-Learning |
---|---|
Optimizes the policy parameters either directly, by gradient ascent on the performance objective, or indirectly, by maximizing local approximations of it | Learns an approximator for the optimal action-value function |
Performed on-policy: each update only uses data collected while acting according to the most recent version of the policy | Performed off-policy: each update can use data collected at any point during training |
Directly optimizes for the thing you want | Only indirectly optimizes for agent performance |
More stable | Tends to be less stable |
Less sample efficient and takes longer to learn, because the usable data is limited at every iteration | Substantially more sample efficient when it works, because it can reuse data more effectively |
- Value-based methods
  - (Q-learning, Deep Q-learning): we learn a value function that maps each state-action pair to a value.
  - Find the best action to take for each state: the action with the biggest value.
  - Works well when you have a finite set of actions (a minimal tabular Q-learning sketch follows this list).
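A minimal sketch of tabular Q-learning; the toy corridor environment, reward scheme, and hyperparameters are illustrative assumptions:

```python
# Minimal tabular Q-learning sketch: a 1-D corridor of 6 states where the agent
# moves left/right and only the rightmost state gives a reward.
import numpy as np

n_states, n_actions = 6, 2           # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))  # action-value table: Q[state, action]
alpha, gamma, epsilon = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

def step(state, action):
    """Move along the corridor; reaching state 5 yields reward 1 and ends the episode."""
    next_state = max(0, state - 1) if action == 0 else min(n_states - 1, state + 1)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    done = next_state == n_states - 1
    return next_state, reward, done

for _ in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit the current Q table, sometimes explore.
        action = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[state]))
        next_state, reward, done = step(state, action)
        # Off-policy TD update toward the best next action's value.
        target = reward + gamma * np.max(Q[next_state]) * (not done)
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state

print(np.argmax(Q, axis=1))  # learned greedy action per state
```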
- Policy-based methods
  - REINFORCE with Policy Gradients
  - We directly optimize the policy without using a value function.
  - Useful when the action space is continuous or a stochastic policy is needed.
  - Uses the total rewards of the episode.
  - The problem is finding a good score function to compute how good a policy is (a minimal REINFORCE sketch follows this list).
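A minimal REINFORCE (Monte Carlo policy gradient) sketch with a tabular softmax policy; the toy environment and hyperparameters are assumptions, not from the original notes:

```python
# Minimal REINFORCE sketch: softmax policy over per-state logits, updated with
# the discounted return of each full episode.
import numpy as np

n_states, n_actions = 6, 2               # actions: 0 = left, 1 = right
theta = np.zeros((n_states, n_actions))  # policy logits per state
lr, gamma = 0.1, 0.99
rng = np.random.default_rng(0)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(n_states - 1, state + 1)
    reward = 1.0 if next_state == n_states - 1 else -0.01  # small step penalty
    return next_state, reward, next_state == n_states - 1

for _ in range(1000):
    state, done, trajectory = 0, False, []
    while not done and len(trajectory) < 50:
        probs = softmax(theta[state])
        action = rng.choice(n_actions, p=probs)
        next_state, reward, done = step(state, action)
        trajectory.append((state, action, reward))
        state = next_state

    # Compute the discounted return G_t for every step of the episode.
    running_return, returns = 0.0, []
    for _, _, reward in reversed(trajectory):
        running_return = reward + gamma * running_return
        returns.append(running_return)
    returns.reverse()

    # Policy gradient update: theta += lr * G_t * grad log pi(a_t | s_t).
    for (s, a, _), G_t in zip(trajectory, returns):
        grad_log_pi = -softmax(theta[s])
        grad_log_pi[a] += 1.0
        theta[s] += lr * G_t * grad_log_pi

print(np.argmax(theta, axis=1))  # learned greedy action per state
```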
- Hybrid Methods
  - Actor-Critic Method
  - Policy Learning + Value Learning
  - Policy Function → Actor: chooses which action to take
  - Value Function → Critic: evaluates how well the agent is performing
  - We make an update at each step (TD learning).
  - Because we update at every time step, we can't use the total episode reward R(t); the critic's value estimate is used as the learning signal instead.
  - Both learn in parallel, similar to GANs.
  - Not stable on its own, but several variations are stable (a minimal sketch follows this list).
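A minimal one-step actor-critic sketch on the same toy corridor; the tabular parameterization, environment, and hyperparameters are assumptions:

```python
# Minimal one-step actor-critic sketch: the actor is a softmax policy over
# logits, the critic is a state-value table, and the TD error from the critic
# replaces the full episode return as the update signal.
import numpy as np

n_states, n_actions = 6, 2
theta = np.zeros((n_states, n_actions))  # actor: policy logits
V = np.zeros(n_states)                   # critic: state values
lr_actor, lr_critic, gamma = 0.1, 0.1, 0.99
rng = np.random.default_rng(0)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(n_states - 1, state + 1)
    reward = 1.0 if next_state == n_states - 1 else -0.01
    return next_state, reward, next_state == n_states - 1

for _ in range(1000):
    state, done, t = 0, False, 0
    while not done and t < 50:
        probs = softmax(theta[state])
        action = rng.choice(n_actions, p=probs)
        next_state, reward, done = step(state, action)

        # Critic: TD error delta = r + gamma * V(s') - V(s), updated every step.
        td_error = reward + gamma * V[next_state] * (not done) - V[state]
        V[state] += lr_critic * td_error

        # Actor: policy gradient step scaled by the TD error instead of the return.
        grad_log_pi = -probs
        grad_log_pi[action] += 1.0
        theta[state] += lr_actor * td_error * grad_log_pi

        state = next_state
        t += 1

print(np.argmax(theta, axis=1), V.round(2))
```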
Algorithms
Name | Comments on Applicability | Reference |
---|---|---|
Q Learning | | |
...