NPTEL An Introduction To Artificial Intelligence Assignment 11 Answer
by Navin Kumar
NPTEL SWAYAM is a free online learning platform offering courses in a wide range of disciplines from top universities and institutions in India. It provides an interactive learning environment with video lectures, quizzes, assignments, and discussion forums. Learners can register for free and work through courses at their own pace and convenience. Successful completion of a course leads to a recognized certificate, enhancing career prospects. The platform is accessible to all learners, regardless of age, background, or qualifications, making NPTEL SWAYAM an excellent choice for anyone looking to expand their knowledge and skills.
ABOUT THE COURSE : The course introduces a variety of concepts in the field of artificial intelligence. It discusses the philosophy of AI and how to model a new problem as an AI problem. It describes a variety of models, such as search, logic, Bayes nets, and MDPs, which can be used to model a new problem, and introduces basic algorithms for solving each formulation. The course prepares a student to take a variety of focused, advanced courses in various subfields of AI.
CRITERIA TO GET A CERTIFICATE
Average assignment score = 25% of the average of the best 8 assignments out of the total 12 assignments given in the course.
Exam score = 75% of the proctored certification exam score out of 100.
Final score = Average assignment score + Exam score
YOU WILL BE ELIGIBLE FOR A CERTIFICATE ONLY IF AVERAGE ASSIGNMENT SCORE >= 10/25 AND EXAM SCORE >= 30/75. If either criterion is not met, you will not get the certificate even if the final score >= 40/100.
1. Which of the following statements are true?
Answer:- A, B & D
2. Suppose you are performing model-based passive learning according to a given policy. Following this policy, you have reached State A a total of 100 times. State A has 4 possible transitions to next states: A, B, C, and D. The policy stipulates that you take the action a at this state. Taking action a, you end up in state A 61 times, state B 22 times and state C 17 times. Assuming add-one smoothing, what is the value of T(A, a, B)?
Round off the answer to the first 3 decimal places.
Answer:- 0.221
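As a quick check, the smoothed estimate can be computed directly: add one pseudo-count to each of the four possible next states and renormalise. A minimal sketch, with the counts taken from the question (D was never reached):

```python
# Add-one (Laplace) smoothed transition estimate for T(A, a, .).
counts = {"A": 61, "B": 22, "C": 17, "D": 0}
num_next_states = len(counts)          # 4 possible next states
total_visits = sum(counts.values())    # 100 visits to state A

# T(A, a, s') = (count(s') + 1) / (total visits + number of next states)
T = {s: (c + 1) / (total_visits + num_next_states) for s, c in counts.items()}

print(round(T["B"], 3))  # 0.221
```

Note that the smoothed probabilities still sum to 1, which is the point of adding the pseudo-counts to the denominator as well.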
1 point
3. For the next three questions, consider the following trajectories obtained by running some simulations in an unknown environment while following a given policy. The state space is {A, B, C} and the action space is {a, b}. Assume the discount factor is 0.5. Each sample is represented as (State, Action, Reward, Next state).
Run 1: (A, a, 0, B)
Run 2: (C, b, -1, A), (A, a, 0, B)
Run 3: (C, b, -1, B)
Run 4: (A, a, 0, B)
Run 5: (A, a, 0, C), (C, b, -1, B)
Using model-free passive learning, give an empirical estimate of Vπ(A).
Round off the answer to the first 3 decimal places.
Answer:- -0.450
1 point
4. Assume that the above samples are fed sequentially to a Temporal Difference learner. Assume all state values are initialised to 0 and alpha is kept constant at 0.5. What will be the learned value of Vπ(A)?
Round off the answer to the first 2 decimal places.
Answer:- -0.25
1 point
5. Assume that the above samples are fed to a Q-learner. What is the value of Q(A,a)? Assume that all Q-values are initialized as 0. The discount factor is 0.5 and the learning rate is also 0.5.
Answer:- 0
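To see why Q(A, a) stays at 0, the samples from question 3 can be replayed through the standard Q-learning update, Q(s,a) ← Q(s,a) + α[r + γ·max Q(s',·) − Q(s,a)]. A minimal sketch (state B has no outgoing samples, so its Q-values keep their initial value of 0):

```python
# Minimal Q-learning replay of the samples from question 3 (alpha = gamma = 0.5).
from collections import defaultdict

alpha, gamma = 0.5, 0.5
Q = defaultdict(float)   # all Q-values initialised to 0
actions = ["a", "b"]

samples = [  # (state, action, reward, next_state), fed in run order
    ("A", "a", 0, "B"),
    ("C", "b", -1, "A"), ("A", "a", 0, "B"),
    ("C", "b", -1, "B"),
    ("A", "a", 0, "B"),
    ("A", "a", 0, "C"), ("C", "b", -1, "B"),
]

for s, a, r, s2 in samples:
    target = r + gamma * max(Q[(s2, ap)] for ap in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

print(Q[("A", "a")])  # 0.0
```

Every update of Q(A, a) has reward 0 and a next-state maximum of 0 (Q(B,·) is never updated, and at C the untried action a keeps Q(C, a) = 0), so the target is always 0 and Q(A, a) never moves.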
1 point
6. Suppose we compute the optimal policy given the current Q-values. What is the action under optimal policy at state C? Type a or b.
Answer:- b
1 point
7. Which of the following is correct regarding Boltzmann exploration?
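As background for this question: Boltzmann (softmax) exploration chooses an action with probability proportional to exp(Q(s,a)/τ), where τ is a temperature parameter. A high τ makes the choice nearly uniform; as τ → 0 the choice approaches greedy selection. A minimal sketch (the function name and example Q-values are illustrative):

```python
import math

def boltzmann_probs(q_values, temperature):
    """Softmax over Q-values: P(a) proportional to exp(Q(a) / temperature)."""
    # Subtract the max Q-value for numerical stability (result is unchanged).
    m = max(q_values.values())
    exps = {a: math.exp((q - m) / temperature) for a, q in q_values.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}

q = {"a": 1.0, "b": 0.0}
print(boltzmann_probs(q, temperature=1.0))    # action "a" clearly favoured
print(boltzmann_probs(q, temperature=100.0))  # nearly uniform
```

Unlike epsilon-greedy, which explores uniformly at random, Boltzmann exploration biases exploration toward actions with higher estimated Q-values.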
1 point
8. Which of the following is required for the convergence of Q-learning to the optimal Q-values?
Policy used to generate episodes for learning should be optimal.
All states are visited infinitely often over infinitely many samples.
Suitable initialisation of Q-values before learning updates.
Very large (>>1) learning rate.
Answer:- B
1 point
9. Which of the following statements are correct?
If an agent does not perform sufficient exploration in the choice of actions in the environment, it runs the risk of never getting large rewards.
If the agent has perfect knowledge of the transition and reward model of the environment, exploration is not needed.
Degree of exploration should be increased as the learning algorithm performs more and more updates.
Exploration is not required in model-based RL algorithms.
Answer:- A & C
1 point
10. Which of the following statement(s) is/are correct for Model-based and Model-free reinforcement learning methods?
Model-based learning usually requires more parameters to be learnt.
Model-free learning can simulate new episodes from past experience.