[["Question: Statement 1| Layer Normalization is used in the original ResNet paper, not Batch Normalization. Statement 2| DCGANs use self-attention to stabilize training.\nChoices:\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer:", " True, True"], ["Question: Statement 1| Layer Normalization is used in the original ResNet paper, not Batch Normalization. Statement 2| DCGANs use self-attention to stabilize training.\nChoices:\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer:", " False, False"], ["Question: Statement 1| Layer Normalization is used in the original ResNet paper, not Batch Normalization. Statement 2| DCGANs use self-attention to stabilize training.\nChoices:\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer:", " True, False"], ["Question: Statement 1| Layer Normalization is used in the original ResNet paper, not Batch Normalization. Statement 2| DCGANs use self-attention to stabilize training.\nChoices:\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer:", " False, True"], ["Question: Another term for out-of-distribution detection is?\nChoices:\nA. anomaly detection\nB. one-class detection\nC. train-test mismatch robustness\nD. background detection\nAnswer:", " anomaly detection"], ["Question: Another term for out-of-distribution detection is?\nChoices:\nA. anomaly detection\nB. one-class detection\nC. train-test mismatch robustness\nD. background detection\nAnswer:", " one-class detection"], ["Question: Another term for out-of-distribution detection is?\nChoices:\nA. anomaly detection\nB. one-class detection\nC. train-test mismatch robustness\nD. background detection\nAnswer:", " train-test mismatch robustness"], ["Question: Another term for out-of-distribution detection is?\nChoices:\nA. anomaly detection\nB. one-class detection\nC. train-test mismatch robustness\nD. background detection\nAnswer:", " background detection"], ["Question: Neural networks:\nChoices:\nA. Optimize a convex objective function\nB. Can only be trained with stochastic gradient descent\nC. Can use a mix of different activation functions\nD. None of the above\nAnswer:", " Optimize a convex objective function"], ["Question: Neural networks:\nChoices:\nA. Optimize a convex objective function\nB. Can only be trained with stochastic gradient descent\nC. Can use a mix of different activation functions\nD. None of the above\nAnswer:", " Can only be trained with stochastic gradient descent"], ["Question: Neural networks:\nChoices:\nA. Optimize a convex objective function\nB. Can only be trained with stochastic gradient descent\nC. Can use a mix of different activation functions\nD. None of the above\nAnswer:", " Can use a mix of different activation functions"], ["Question: Neural networks:\nChoices:\nA. Optimize a convex objective function\nB. Can only be trained with stochastic gradient descent\nC. Can use a mix of different activation functions\nD. None of the above\nAnswer:", " None of the above"], ["Question: In building a linear regression model for a particular data set, you observe the coefficient of one of the features having a relatively high negative value. This suggests that\nChoices:\nA. This feature has a strong effect on the model (should be retained)\nB. This feature does not have a strong effect on the model (should be ignored)\nC. It is not possible to comment on the importance of this feature without additional information\nD. 
Nothing can be determined.\nAnswer:", " This feature has a strong effect on the model (should be retained)"], ["Question: In building a linear regression model for a particular data set, you observe the coefficient of one of the features having a relatively high negative value. This suggests that\nChoices:\nA. This feature has a strong effect on the model (should be retained)\nB. This feature does not have a strong effect on the model (should be ignored)\nC. It is not possible to comment on the importance of this feature without additional information\nD. Nothing can be determined.\nAnswer:", " This feature does not have a strong effect on the model (should be ignored)"], ["Question: In building a linear regression model for a particular data set, you observe the coefficient of one of the features having a relatively high negative value. This suggests that\nChoices:\nA. This feature has a strong effect on the model (should be retained)\nB. This feature does not have a strong effect on the model (should be ignored)\nC. It is not possible to comment on the importance of this feature without additional information\nD. Nothing can be determined.\nAnswer:", " It is not possible to comment on the importance of this feature without additional information"], ["Question: In building a linear regression model for a particular data set, you observe the coefficient of one of the features having a relatively high negative value. This suggests that\nChoices:\nA. This feature has a strong effect on the model (should be retained)\nB. This feature does not have a strong effect on the model (should be ignored)\nC. It is not possible to comment on the importance of this feature without additional information\nD. Nothing can be determined.\nAnswer:", " Nothing can be determined."], ["Question: Statement 1| RoBERTa pretrains on a corpus that is approximate 10x larger than the corpus BERT pretrained on. Statement 2| ResNeXts in 2018 usually used tanh activation functions.\nChoices:\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer:", " True, True"], ["Question: Statement 1| RoBERTa pretrains on a corpus that is approximate 10x larger than the corpus BERT pretrained on. Statement 2| ResNeXts in 2018 usually used tanh activation functions.\nChoices:\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer:", " False, False"], ["Question: Statement 1| RoBERTa pretrains on a corpus that is approximate 10x larger than the corpus BERT pretrained on. Statement 2| ResNeXts in 2018 usually used tanh activation functions.\nChoices:\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer:", " True, False"], ["Question: Statement 1| RoBERTa pretrains on a corpus that is approximate 10x larger than the corpus BERT pretrained on. Statement 2| ResNeXts in 2018 usually used tanh activation functions.\nChoices:\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer:", " False, True"], ["Question: As the number of training examples goes to infinity, your model trained on that data will have:\nChoices:\nA. Lower variance\nB. Higher variance\nC. Same variance\nD. None of the above\nAnswer:", " Lower variance"], ["Question: As the number of training examples goes to infinity, your model trained on that data will have:\nChoices:\nA. Lower variance\nB. Higher variance\nC. Same variance\nD. None of the above\nAnswer:", " Higher variance"], ["Question: As the number of training examples goes to infinity, your model trained on that data will have:\nChoices:\nA. 
Lower variance\nB. Higher variance\nC. Same variance\nD. None of the above\nAnswer:", " Same variance"], ["Question: As the number of training examples goes to infinity, your model trained on that data will have:\nChoices:\nA. Lower variance\nB. Higher variance\nC. Same variance\nD. None of the above\nAnswer:", " None of the above"], ["Question: MLE estimates are often undesirable because\nChoices:\nA. they are biased\nB. they have high variance\nC. they are not consistent estimators\nD. None of the above\nAnswer:", " they are biased"], ["Question: MLE estimates are often undesirable because\nChoices:\nA. they are biased\nB. they have high variance\nC. they are not consistent estimators\nD. None of the above\nAnswer:", " they have high variance"], ["Question: MLE estimates are often undesirable because\nChoices:\nA. they are biased\nB. they have high variance\nC. they are not consistent estimators\nD. None of the above\nAnswer:", " they are not consistent estimators"], ["Question: MLE estimates are often undesirable because\nChoices:\nA. they are biased\nB. they have high variance\nC. they are not consistent estimators\nD. None of the above\nAnswer:", " None of the above"], ["Question: For Kernel Regression, which one of these structural assumptions is the one that most affects the trade-off between underfitting and overfitting:\nChoices:\nA. Whether kernel function is Gaussian versus triangular versus box-shaped\nB. Whether we use Euclidian versus L1 versus L\u221e metrics\nC. The kernel width\nD. The maximum height of the kernel function\nAnswer:", " Whether kernel function is Gaussian versus triangular versus box-shaped"], ["Question: For Kernel Regression, which one of these structural assumptions is the one that most affects the trade-off between underfitting and overfitting:\nChoices:\nA. Whether kernel function is Gaussian versus triangular versus box-shaped\nB. Whether we use Euclidian versus L1 versus L\u221e metrics\nC. The kernel width\nD. The maximum height of the kernel function\nAnswer:", " Whether we use Euclidian versus L1 versus L\u221e metrics"], ["Question: For Kernel Regression, which one of these structural assumptions is the one that most affects the trade-off between underfitting and overfitting:\nChoices:\nA. Whether kernel function is Gaussian versus triangular versus box-shaped\nB. Whether we use Euclidian versus L1 versus L\u221e metrics\nC. The kernel width\nD. The maximum height of the kernel function\nAnswer:", " The kernel width"], ["Question: For Kernel Regression, which one of these structural assumptions is the one that most affects the trade-off between underfitting and overfitting:\nChoices:\nA. Whether kernel function is Gaussian versus triangular versus box-shaped\nB. Whether we use Euclidian versus L1 versus L\u221e metrics\nC. The kernel width\nD. The maximum height of the kernel function\nAnswer:", " The maximum height of the kernel function"], ["Question: Statement 1| RELUs are not monotonic, but sigmoids are monotonic. Statement 2| Neural networks trained with gradient descent with high probability converge to the global optimum.\nChoices:\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer:", " True, True"], ["Question: Statement 1| RELUs are not monotonic, but sigmoids are monotonic. Statement 2| Neural networks trained with gradient descent with high probability converge to the global optimum.\nChoices:\nA. True, True\nB. False, False\nC. True, False\nD. 
False, True\nAnswer:", " False, False"], ["Question: Statement 1| RELUs are not monotonic, but sigmoids are monotonic. Statement 2| Neural networks trained with gradient descent with high probability converge to the global optimum.\nChoices:\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer:", " True, False"], ["Question: Statement 1| RELUs are not monotonic, but sigmoids are monotonic. Statement 2| Neural networks trained with gradient descent with high probability converge to the global optimum.\nChoices:\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer:", " False, True"], ["Question: Statement 1| The training error of 1-nearest neighbor classifier is 0. Statement 2| As the number of data points grows to infinity, the MAP estimate approaches the MLE estimate for all possible priors. In other words, given enough data, the choice of prior is irrelevant.\nChoices:\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer:", " True, True"], ["Question: Statement 1| The training error of 1-nearest neighbor classifier is 0. Statement 2| As the number of data points grows to infinity, the MAP estimate approaches the MLE estimate for all possible priors. In other words, given enough data, the choice of prior is irrelevant.\nChoices:\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer:", " False, False"], ["Question: Statement 1| The training error of 1-nearest neighbor classifier is 0. Statement 2| As the number of data points grows to infinity, the MAP estimate approaches the MLE estimate for all possible priors. In other words, given enough data, the choice of prior is irrelevant.\nChoices:\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer:", " True, False"], ["Question: Statement 1| The training error of 1-nearest neighbor classifier is 0. Statement 2| As the number of data points grows to infinity, the MAP estimate approaches the MLE estimate for all possible priors. In other words, given enough data, the choice of prior is irrelevant.\nChoices:\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer:", " False, True"]]