NPTEL An Introduction To Artificial Intelligence Assignment 11 Answer
by Navin Kumar
NPTEL SWAYAM is a free online learning platform offering courses in a wide range of disciplines from top universities and institutions in India. It provides an interactive learning environment with video lectures, quizzes, assignments, and discussion forums. Learners can register for free and work through courses at their own pace and convenience. Successful completion of a course leads to a recognized certificate, enhancing career prospects. The platform is accessible to all learners, regardless of age, background, or qualifications, making NPTEL SWAYAM an excellent choice for anyone looking to expand their knowledge and skills.
ABOUT THE COURSE : The course introduces a variety of concepts in the field of artificial intelligence. It discusses the philosophy of AI and how to model a new problem as an AI problem. It describes a variety of models, such as search, logic, Bayes nets, and MDPs, which can be used to model a new problem, and introduces basic algorithms for solving each formulation. The course prepares a student to take a variety of focused, advanced courses in various subfields of AI.
CRITERIA TO GET A CERTIFICATE
Average assignment score = 25% of the average of the best 8 assignments out of the total 12 assignments given in the course.
Exam score = 75% of the proctored certification exam score out of 100.
Final score = Average assignment score + Exam score
YOU WILL BE ELIGIBLE FOR A CERTIFICATE ONLY IF AVERAGE ASSIGNMENT SCORE >= 10/25 AND EXAM SCORE >= 30/75. If either criterion is not met, you will not get the certificate even if the final score >= 40/100.
1. Which of the following statements are true?
Answer:- A, B & D
2. Suppose you are performing model-based passive learning according to a given policy. Following this policy, you have reached State A a total of 100 times. State A has 4 possible transitions to next states: A, B, C, and D. The policy stipulates that you take the action a at this state. Taking action a, you end up in state A 61 times, state B 22 times and state C 17 times. Assuming add-one smoothing, what is the value of T(A, a, B)?
Round off the answer to the first 3 decimal places.
Answer:- 0.221
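As a quick check, the smoothed estimate can be computed directly: add one pseudo-count to each of the four possible next states and renormalise. A minimal sketch, with the counts taken from the question (D was never reached):

```python
# Add-one (Laplace) smoothed transition estimate for T(A, a, .).
counts = {"A": 61, "B": 22, "C": 17, "D": 0}
num_next_states = len(counts)          # 4 possible next states
total_visits = sum(counts.values())    # 100 visits to state A

# T(A, a, s') = (count(s') + 1) / (total visits + number of next states)
T = {s: (c + 1) / (total_visits + num_next_states) for s, c in counts.items()}

print(round(T["B"], 3))  # 0.221
```

Note that the smoothed probabilities still sum to 1, which is the point of adding the pseudo-counts to the denominator as well.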
1 point
3. For the next three questions, consider the following trajectories obtained by running some simulations in an unknown environment while following a given policy. The state space is {A, B, C} and the action space is {a, b}. Assume the discount factor is 0.5. Each sample is represented as (State, Action, Reward, Next state).
Run 1: (A, a, 0, B)
Run 2: (C, b, -1, A), (A, a, 0, B)
Run 3: (C, b, -1, B)
Run 4: (A, a, 0, B)
Run 5: (A, a, 0, C), (C, b, -1, B)
Using model-free passive learning, give an empirical estimate of Vπ(A).
Round off the answer to the first 3 decimal places.
Answer:- -0.450
1 point
4. Assume that the above samples are fed sequentially to a Temporal Difference learner. Assume all state values are initialised to 0 and alpha is kept constant at 0.5. What will be the learned value of Vπ(A)?
Round off the answer to the first 2 decimal places.
Answer:- -0.25
1 point
5. Assume that the above samples are fed to a Q-learner. What is the value of Q(A,a)? Assume that all Q-values are initialized as 0. The discount factor is 0.5 and the learning rate is also 0.5.
Answer:- 0
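To see why Q(A, a) stays at 0, the samples from question 3 can be replayed through the standard Q-learning update, Q(s,a) ← Q(s,a) + α[r + γ·max Q(s',·) − Q(s,a)]. A minimal sketch (state B has no outgoing samples, so its Q-values keep their initial value of 0):

```python
# Minimal Q-learning replay of the samples from question 3 (alpha = gamma = 0.5).
from collections import defaultdict

alpha, gamma = 0.5, 0.5
Q = defaultdict(float)   # all Q-values initialised to 0
actions = ["a", "b"]

samples = [  # (state, action, reward, next_state), fed in run order
    ("A", "a", 0, "B"),
    ("C", "b", -1, "A"), ("A", "a", 0, "B"),
    ("C", "b", -1, "B"),
    ("A", "a", 0, "B"),
    ("A", "a", 0, "C"), ("C", "b", -1, "B"),
]

for s, a, r, s2 in samples:
    target = r + gamma * max(Q[(s2, ap)] for ap in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

print(Q[("A", "a")])  # 0.0
```

Every update of Q(A, a) has reward 0 and a next-state maximum of 0 (Q(B,·) is never updated, and at C the untried action a keeps Q(C, a) = 0), so the target is always 0 and Q(A, a) never moves.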
1 point
6. Suppose we compute the optimal policy given the current Q-values. What is the action under optimal policy at state C? Type a or b.
Answer:- b
1 point
7. Which of the following is correct regarding Boltzmann exploration?
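As background for this question: Boltzmann (softmax) exploration chooses an action with probability proportional to exp(Q(s,a)/τ), where τ is a temperature parameter. A high τ makes the choice nearly uniform; as τ → 0 the choice approaches greedy selection. A minimal sketch (the function name and example Q-values are illustrative):

```python
import math

def boltzmann_probs(q_values, temperature):
    """Softmax over Q-values: P(a) proportional to exp(Q(a) / temperature)."""
    # Subtract the max Q-value for numerical stability (result is unchanged).
    m = max(q_values.values())
    exps = {a: math.exp((q - m) / temperature) for a, q in q_values.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}

q = {"a": 1.0, "b": 0.0}
print(boltzmann_probs(q, temperature=1.0))    # action "a" clearly favoured
print(boltzmann_probs(q, temperature=100.0))  # nearly uniform
```

Unlike epsilon-greedy, which explores uniformly at random, Boltzmann exploration biases exploration toward actions with higher estimated Q-values.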
1 point
8. Which of the following is required for the convergence of Q-learning to the optimal Q-values?
Policy used to generate episodes for learning should be optimal.
All states are visited infinitely often over infinitely many samples.
Suitable initialisation of Q-values before learning updates.
Very large (>>1) learning rate.
Answer:- B
1 point
9. Which of the following statements are correct?
If an agent does not perform sufficient exploration in the choice of actions in the environment, it runs the risk of never getting large rewards.
If the agent has perfect knowledge of the transition and reward model of the environment, exploration is not needed.
Degree of exploration should be increased as the learning algorithm performs more and more updates.
Exploration is not required in model-based RL algorithms.
Answer:- A & C
1 point
10. Which of the following statement(s) is/are correct for Model-based and Model-free reinforcement learning methods?
Model-based learning usually requires more parameters to be learnt.
Model-free learning can simulate new episodes from past experience.