Yes, it is possible to implement a state-value-based approach.
There are three main ways to implement reinforcement learning
in ML:
- Value-based:
The value-based approach aims to find the optimal value
function, which gives the maximum value achievable at a state
under any policy. The value of a state s is the long-term return
the agent expects when it starts in s and follows policy π.
- Policy-based:
The policy-based approach aims to find the optimal policy for
maximum future reward without using a value function. In this
approach, the agent applies a policy such that the action performed
at each step helps to maximize the future reward.
The policy-based approach has two main types of policy:
- Deterministic: The policy (π) always produces the same
action for a given state.
- Stochastic: The policy gives a probability distribution over
actions, and the produced action is sampled from it.
- Model-based: In the model-based approach, a
virtual model of the environment is created, and the agent
explores that environment to learn it. There is no single
solution or algorithm for this approach because the model
representation is different for each environment.
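
To make the state-value idea concrete, here is a minimal sketch of value iteration on a made-up four-state chain. Everything in it is an assumption for illustration and does not come from the question: the environment, the reward of 1.0 for reaching the last state, the discount GAMMA = 0.9, and the helper names step, value_iteration and greedy_policy. Note that value iteration needs a known model of the environment (here the step function), so the sketch also touches the model-based idea.

```python
# Toy chain: states 0..3, state 3 is terminal.
# All names and numbers here are hypothetical, chosen only for illustration.
GAMMA = 0.9          # discount factor (assumed)
N_STATES = 4
TERMINAL = 3
ACTIONS = (-1, +1)   # move left, move right


def step(state, action):
    """Deterministic toy model: return (next_state, reward)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == TERMINAL else 0.0
    return next_state, reward


def backup(state, action, values):
    """One-step lookahead: immediate reward plus discounted next-state value."""
    next_state, reward = step(state, action)
    return reward + GAMMA * values[next_state]


def value_iteration(tol=1e-8):
    """Repeat the Bellman optimality update until the values stop changing."""
    values = [0.0] * N_STATES  # the terminal state keeps value 0
    while True:
        delta = 0.0
        for s in range(N_STATES):
            if s == TERMINAL:
                continue
            best = max(backup(s, a, values) for a in ACTIONS)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def greedy_policy(values):
    """Deterministic policy: in each state pick the action with the best backup."""
    return {
        s: max(ACTIONS, key=lambda a: backup(s, a, values))
        for s in range(N_STATES)
        if s != TERMINAL
    }


if __name__ == "__main__":
    v = value_iteration()
    print("state values:", v)
    print("greedy policy:", greedy_policy(v))
```

The policy here is read off the learned state values rather than learned directly, which is the difference from the policy-based approach. The resulting policy is deterministic, because it maps each state to exactly one action.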