CS5243 - Data Science
Across
- 1. A feature vectorization process in NLP
- 4. Ratio of correctly predicted +ves to the total number of +ves
- 6. Lemmatization uses __________ to modify words
- 9. A collection of all documents is called a
- 11. Distance measure used by k-NN to train and test on categorical features
- 13. A classifier that performs well on training data but poorly on test data is referred to as
- 14. k-NN is an example of ____________ learning
- 16. A public corpus
- 19. P(X|C) in Naive Bayes is termed the
- 23. The sigmoid activation function transforms linearly combined data into ___________ form
- 25. Method used to train the model on N-1 samples and test on 1 sample
- 26. Type-I error in the confusion matrix refers to the _______ value
- 27. Compared with hard-margin SVM, soft-margin SVM additionally has a ________ variable
- 29. ID3 is sensitive to the number of ____________ attribute values
- 30. Eye color is an example of the ____________ datatype
Down
- 2. The Likert scale is an example of __________ data
- 3. Ratio of correctly predicted +ves to the total number of predicted +ves
- 5. Value of k in k-NN is determined using
- 7. Cancer vs. non-cancer is an example of the _________ binary datatype
- 8. A cosine score of zero indicates that the two vectors are
- 10. P(X) in Naive Bayes is termed the
- 12. Binary classifier sensitive to noise
- 15. Method used to handle continuous data in Naive Bayes
- 17. A constant used to increase/decrease the net input value in logistic regression
- 18. The entropy of a dataset with binary class labels can have a value > 1. State True or False
- 20. Generalized distance measure of the L1 and L2 norms
- 21. Size of the confusion matrix when a dataset with 600 features and N class labels is used for training and testing
- 22. TF-IDF uses a ________ matrix for a large vocabulary
- 24. Identifies unique or rare occurrences of words in the documents
- 28. The L1 norm is also called the