Anuket Project



Volunteers


Name | ML Category
Jahanvi | Supervised
Akanksha | Unsupervised
Kanak Raj | Reinforcement

Supervised

  • Supervised learning algorithms make predictions based on a set of labeled examples.
  • Classification: When the data are used to predict a categorical variable, supervised learning is called classification. This is the case when assigning a label or indicator, such as dog or cat, to an image. With only two labels, the problem is called binary classification; with more than two categories, it is called multi-class classification.
  • Regression: When the value being predicted is continuous, the problem is a regression problem.
  • Forecasting: The process of making predictions about the future based on past and present data, most commonly used to analyze trends. A common example is estimating next year's sales from the sales of the current and previous years.
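
The difference between the task types above can be illustrated with a minimal sketch in plain Python. The function names, the animal features (weight in kg, height in cm), and all numbers are made up for illustration; a 1-nearest-neighbour rule and a least-squares trend line are just two simple choices among many.

```python
def nearest_neighbor_label(examples, query):
    """Classification: return the label of the closest labeled example."""
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    _, label = min(examples, key=lambda pair: squared_distance(pair[0], query))
    return label


def fit_line(xs, ys):
    """Regression: ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical labeled examples: (weight_kg, height_cm) -> label.
animals = [((30.0, 60.0), "dog"), ((4.0, 25.0), "cat"),
           ((25.0, 55.0), "dog"), ((5.0, 23.0), "cat")]
label = nearest_neighbor_label(animals, (6.0, 24.0))

# Hypothetical yearly sales; forecasting extrapolates the fitted trend.
years = [2018, 2019, 2020, 2021]
sales = [100.0, 110.0, 120.0, 130.0]
slope, intercept = fit_line(years, sales)
forecast = slope * 2022 + intercept

print(label, forecast)
```

The classifier returns a category, while the fitted line returns a continuous value that can be evaluated for a future year.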

Algorithms

In progress

Name | Comments on Applicability | Reference









Unsupervised

  1. Clustering - hierarchical clustering, k-means, mixture models, DBSCAN, and OPTICS
  2. Anomaly Detection - Local Outlier Factor and Isolation Forest
  3. Dimensionality Reduction - principal component analysis, independent component analysis, non-negative matrix factorization, singular value decomposition

Algorithms


Name | Comments on Applicability | Reference
Hierarchical Clustering
  1. Builds a hierarchy through N-1 successive merges, so the number of clusters can be chosen afterwards.
  2. Expensive and slow: an n×n distance matrix must be computed.
  3. Does not scale to very large datasets.
  4. Results are reproducible.
  5. Does not work well with hyper-spherical clusters.
  6. Can provide insight into the way the data points are clustered.
  7. Can use various linkage methods (apart from centroid).
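
A minimal sketch of agglomerative clustering on 1-D points, written only to illustrate the points above (N-1 merges, the full distance matrix, reproducibility, choice of linkage). The function name `agglomerative`, the `linkage` parameter, and the sample points are assumptions for this example, not part of any library.

```python
def agglomerative(points, linkage=min):
    """Naive agglomerative clustering; returns the list of merges performed.

    linkage=min gives single linkage, linkage=max gives complete linkage.
    """
    clusters = [[i] for i in range(len(points))]
    # Full n x n distance matrix: the main memory and time cost.
    dist = [[abs(a - b) for b in points] for a in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(dist[p][q] for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges


points = [1.0, 1.5, 5.0, 5.2, 9.0]  # made-up data
merges = agglomerative(points)                 # single linkage
complete = agglomerative(points, linkage=max)  # complete linkage

for left, right, distance in merges:
    print(left, "+", right, "at distance", round(distance, 2))
```

There is no random initialization, so repeated runs give the same merge history. Cutting that history after any number of merges yields a clustering with the corresponding number of clusters, and changing the linkage can change the order of merges.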

k-means
  1. Requires the number of clusters to be specified in advance.
  2. Less computationally intensive.
  3. Suited to large datasets.
  4. The starting point can be random, which leads to a different result each time the algorithm runs.
  5. K-means works best when clusters are roughly circular (hyper-spherical in higher dimensions).
  6. K-means simply divides the data into mutually exclusive subsets without giving much insight into the process of division.
  7. K-means uses the mean to compute the centroid that represents each cluster (the k-medians variant uses the median).
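
The sensitivity to the starting point can be shown with a small sketch of the standard assign-and-update loop on 1-D data. The function name `kmeans`, its signature, and the data are illustrative assumptions; the initial centroids are passed in explicitly so the effect of the start is easy to see.

```python
import random


def kmeans(points, centroids, max_iter=100):
    """Plain k-means on 1-D points, starting from the given centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda k: abs(p - centroids[k]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its cluster.
        new_centroids = [sum(c) / len(c) if c else centroids[k]
                         for k, c in enumerate(clusters)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


points = [0.0, 1.0, 10.0, 11.0, 20.0, 21.0]  # made-up data with three groups

from_low, _ = kmeans(points, [0.0, 1.0])     # start near the low values
from_high, _ = kmeans(points, [20.0, 21.0])  # start near the high values
print(from_low, from_high)

# A random start, as is common in practice, picks k data points as centroids.
random_start = random.Random(0).sample(points, 2)
print(kmeans(points, random_start)[0])
```

With k=2 on three groups, the two explicit starts converge to different partitions, which is why k-means is often run several times with different initializations.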

Gaussian Mixture Models
  1. Requires the number of clusters to be specified in advance.
  2. GMMs are somewhat more flexible: with a covariance matrix, the cluster boundaries can be elliptical (as opposed to K-means, which makes circular boundaries).
  3. A GMM is a probabilistic algorithm. By assigning probabilities to data points, we can express how strongly we believe that a given data point belongs to a specific cluster.
  4. GMMs usually tend to be slower than K-means because they take more iterations to converge. GMMs can also converge quickly to a poor local optimum; to avoid this issue, they are usually initialized with K-means.
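
A minimal sketch of the probabilistic assignment described above, using expectation-maximization for a 1-D mixture of Gaussians. The function names, the variance floor `min_var`, the data, and the initial means are assumptions for this example; in practice the initial means would typically come from a K-means run.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def responsibilities(x, weights, means, variances):
    """Probability that x belongs to each component (the soft assignment)."""
    dens = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(dens)
    return [d / total for d in dens]


def fit_gmm(data, initial_means, n_iter=50, min_var=1e-3):
    """EM for a 1-D Gaussian mixture; min_var guards against collapsing variances."""
    k = len(initial_means)
    means = list(initial_means)
    weights = [1.0 / k] * k
    variances = [1.0] * k
    for _ in range(n_iter):
        # E-step: soft-assign every point to every component.
        resp = [responsibilities(x, weights, means, variances) for x in data]
        # M-step: re-estimate each component from its weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(min_var, sum(r[j] * (x - means[j]) ** 2
                                            for r, x in zip(resp, data)) / nj)
    return weights, means, variances


data = [0.0, 1.0, 2.0, 1.0, 4.0, 5.0, 6.0, 5.0]  # made-up data, two groups
weights, means, variances = fit_gmm(data, initial_means=[0.0, 6.0])
soft = responsibilities(2.5, weights, means, variances)
print(means, soft)
```

Unlike K-means, the query point 2.5 is not forced into one cluster: it receives a probability for each component, and those probabilities sum to 1.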

Reinforcement Learning

  1. Learning is active: the agent gathers its own data by acting.
  2. No labeled data.
  3. No supervisor, only a reward signal.
  4. Actions are sequential.
  5. Feedback is delayed, not instantaneous.
  6. Can the agent afford to make mistakes?
  7. Is it possible to use a simulated environment for the task?
  8. Training takes a lot of time.
  9. Think about the variables that define the state of the environment:
    1. Identify the state variables and quantify them.
    2. The agent has access to these variables at every time step.
    3. Define a concrete reward function and compute the reward after each action.
    4. Define the policy function.
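
The ingredients listed above (state variables, a reward computed after each action, a policy function) can be sketched with a hypothetical toy environment. The `Corridor` class, its reward values, and the two policies are invented for illustration only.

```python
import random


class Corridor:
    """Hypothetical simulated environment: walk from cell 0 to the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0  # the single, quantified state variable

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        """action is -1 (left) or +1 (right); returns (state, reward, done)."""
        self.position = min(self.length - 1, max(0, self.position + action))
        done = self.position == self.length - 1
        reward = 1.0 if done else -0.1  # reward computed after every action
        return self.position, reward, done


def random_policy(state, rng):
    """Policy function: maps the observed state to an action (here at random)."""
    return rng.choice([-1, +1])


def always_right(state, rng):
    return +1


def run_episode(env, policy, rng, max_steps=1000):
    state = env.reset()
    total, steps = 0.0, 0
    for _ in range(max_steps):
        action = policy(state, rng)  # the agent sees the state at every step
        state, reward, done = env.step(action)
        total += reward
        steps += 1
        if done:
            break
    return total, steps


total, steps = run_episode(Corridor(), always_right, random.Random(0))
print(total, steps)
print(run_episode(Corridor(), random_policy, random.Random(0)))
```

Because the environment is simulated, the agent can make as many mistakes as needed; the only feedback is the reward, and the large reward arrives only at the end of the sequence of actions.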

Model-Free vs Model-Based RL

The key distinction is whether the agent has access to (or learns) a model of the environment, that is, a function that predicts state transitions and rewards.

Model-Free

  • Forgoes the potential gains in sample efficiency that come from using a model.
  • Tends to be easier to implement and tune.

Model-Based

  • Allows the agent to plan ahead by looking at the possible results of a range of possible choices.
  • A ground-truth model of the task is generally not available.
  • If the agent wants to use a model, it has to learn it purely from experience.
  • Learning a model is fundamentally hard.
  • Requires being willing to spend lots of time on it.
  • High computation cost.
  • Can fail to pay off when the agent over-exploits bias in the learned model.
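
What "planning ahead" with a model means can be sketched as an exhaustive lookahead. This assumes a known, deterministic model of a hypothetical 5-cell corridor; the `model` and `lookahead` functions and the reward values are illustrative assumptions. A model-free agent has no such function to query.

```python
N_STATES = 5
ACTIONS = (-1, +1)  # left, right


def model(state, action):
    """Assumed ground-truth model: predicts (next_state, reward, done)."""
    next_state = min(N_STATES - 1, max(0, state + action))
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else -0.1), done


def lookahead(state, depth, gamma=0.9):
    """Best discounted return reachable within `depth` steps, and the first action."""
    if depth == 0:
        return 0.0, None
    best_value, best_action = float("-inf"), None
    for action in ACTIONS:
        next_state, reward, done = model(state, action)
        future = 0.0 if done else lookahead(next_state, depth - 1, gamma)[0]
        value = reward + gamma * future
        if value > best_value:
            best_value, best_action = value, action
    return best_value, best_action


value, action = lookahead(state=0, depth=4)
print(value, action)
```

The agent picks its action without interacting with the environment at all, which is the source of the sample-efficiency gain. The cost is the computation of the search and the need for an accurate model.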

What to Learn in Model-Free RL

  1. Policy Optimization
  2. Q-Learning

Policy Optimization

  • Optimizes the policy parameters either directly, by gradient ascent on the performance objective, or indirectly, by maximizing local approximations of that objective.
  • Performed on-policy: each update only uses data collected while acting according to the most recent version of the policy.
  • Directly optimizes for the thing you want.
  • Tends to be more stable.
  • Less sample efficient and takes longer to learn, because the data available for learning is limited at every iteration.

Q-Learning

  • Learns an approximator for the optimal action-value function.
  • Performed off-policy: each update can use data collected at any point during training.
  • Only indirectly optimizes for agent performance.
  • Tends to be less stable.
  • Substantially more sample efficient when it does work, because it can reuse data more effectively.
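
A minimal tabular sketch of the off-policy property of Q-learning: the agent below behaves uniformly at random, yet the update still learns the action values of the greedy policy. The 5-cell corridor, the reward values, and the hyperparameters are assumptions chosen for illustration.

```python
import random

N_STATES = 5
ACTIONS = (-1, +1)  # left, right


def step(state, action):
    """Hypothetical corridor dynamics: returns (next_state, reward, done)."""
    next_state = min(N_STATES - 1, max(0, state + action))
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else -0.1), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)  # behaviour policy: purely random
            next_state, reward, done = step(state, action)
            # The target uses the best next action, not the one the agent will take.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


q = q_learning()
greedy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(greedy)
```

The learned table is an approximator of the optimal action-value function, and the policy is read off indirectly by taking the action with the largest value in each state.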



  • Value-based methods
    • Examples: Q-learning, Deep Q-learning. We learn a value function that maps each state-action pair to a value.
    • The best action to take in each state is the one with the largest value.
    • Works well when there is a finite set of actions.
  • Policy-based methods
    • Example: REINFORCE with policy gradients.
    • We directly optimize the policy without using a value function.
    • Useful when the action space is continuous or the policy is stochastic.
    • Uses the total reward of the episode.
    • The problem is finding a good score function to compute how good a policy is.
  • Hybrid methods
    • Actor-Critic method
      • Policy learning + value learning.
      • Policy function → Actor: chooses which moves to make.
      • Value function → Critic: judges how well the agent is performing.
      • We make an update at each step (TD learning).
      • Because we update at each time step, we can't use the total reward R(t).
      • Both learn in parallel, like GANs.
      • Not stable in its basic form, but several variants are stable.
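
The difference between the learning signals above can be sketched numerically: REINFORCE weights its updates with the return of the whole episode, while an actor-critic uses a one-step TD error that is available immediately. The episode, the critic's value estimates, and the function names below are hypothetical.

```python
def discounted_returns(rewards, gamma):
    """Monte Carlo returns G_t; only computable once the episode has ended."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]


def td_errors(states, rewards, values, gamma):
    """One-step TD errors r_t + gamma * V(s_{t+1}) - V(s_t), one per time step.

    Assumes the episode ends in a terminal state, whose value is taken as 0.
    """
    errors = []
    last = len(rewards) - 1
    for t, r in enumerate(rewards):
        next_value = 0.0 if t == last else values[states[t + 1]]
        errors.append(r + gamma * next_value - values[states[t]])
    return errors


# Hypothetical episode: states visited and the reward after each action.
states = [0, 1, 2, 3]
rewards = [-0.1, -0.1, 1.0]
critic_values = {0: 0.5, 1: 0.7, 2: 0.9, 3: 0.0}  # made-up critic estimates

returns = discounted_returns(rewards, gamma=0.9)
deltas = td_errors(states, rewards, critic_values, gamma=0.9)
print(returns)
print(deltas)
```

In a policy-gradient update, the log-probability gradient of each action would be scaled by the corresponding return (REINFORCE) or TD error (actor-critic). The TD error depends on the critic's estimates, which is one reason the two components have to be learned together.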

Algorithms

Name | Comments on Applicability | Reference

Q-Learning









Is RL Possible?

  1. Do you have very high computation power?
  2. Do you have lots of time to train an agent?
  3. Do you need your model to be self-explanatory, so that humans can understand the reasoning behind its predictions and decisions?
  4. Do you need your model to be easy to implement and maintain?
  5. Is it possible to try the problem several times and afford to make many mistakes?
  6. In your situation, is active, online learning possible, i.e. can the algorithm explore new regions of the data space through its actions and learn from what it encounters?
  7. In your situation, can the algorithm take sequential actions to complete the task?
  8. Is it possible to define a policy function, i.e. the actions that the agent takes as a function of the agent's state and the environment?
  9. Is it possible to define a function that provides feedback on actions, such that the feedback helps the agent learn and choose new actions?
  10. Can you simulate an environment for the task so that the algorithm can try many times and make mistakes in order to learn?




