ABW501 Mock Exam 2: Analytics Edge (With Answers)
| Item | Details |
|---|---|
| Total Points | 100 |
| Time Allowed | 90 minutes |
| Format | Closed book, calculator allowed |
| Structure | 4 Blocks, 8 Questions |
A hospital wants to reduce patient readmission rates. For each analytics approach, give a specific example of how it could help:
a) Descriptive Analytics b) Diagnostic Analytics c) Predictive Analytics d) Prescriptive Analytics
💡 Answer & Solution
a) Descriptive Analytics: "Dashboard showing readmission rates by department, age group, and diagnosis. Example: 'Cardiology has 15% readmission rate vs. 8% hospital average.'"
b) Diagnostic Analytics: "Root cause analysis to understand WHY readmissions happen. Example: 'Patients discharged on Friday have 20% higher readmission - likely due to weekend pharmacy closures.'"
c) Predictive Analytics: "ML model predicting which patients are likely to be readmitted. Example: 'Patient John has 78% probability of readmission within 30 days based on his diagnosis, age, and prior history.'"
d) Prescriptive Analytics: "Recommending specific interventions. Example: 'For high-risk patients, schedule follow-up call within 48 hours, arrange home nurse visits, and ensure medication delivery.'"
Key Pattern: Descriptive answers "what happened?", Diagnostic answers "why did it happen?", Predictive answers "what will happen?", and Prescriptive answers "what should we do about it?"
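The predictive step can be sketched as a toy risk scorer. Note that the feature names, weights, and bias below are invented purely for illustration; a real model would learn them (e.g. via logistic regression) from historical discharge records:

```python
import math

# Hypothetical feature weights -- invented for illustration only
WEIGHTS = {"age": 0.03, "prior_admissions": 0.6, "chronic_conditions": 0.4}
BIAS = -4.0

def readmission_risk(patient):
    """Return a readmission probability via a logistic score."""
    z = BIAS + sum(WEIGHTS[k] * patient[k] for k in WEIGHTS)
    return 1 / (1 + math.exp(-z))  # squash the score into (0, 1)

patient = {"age": 70, "prior_admissions": 2, "chronic_conditions": 3}
risk = readmission_risk(patient)  # a probability between 0 and 1
```

A prescriptive layer would then sit on top of such scores, e.g. triggering the 48-hour follow-up call for patients above a risk threshold.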
Explain the concept of data-driven decision making vs intuition-based decision making. Give TWO advantages and TWO disadvantages of each approach.
💡 Answer & Solution
Data-Driven Decision Making:
| Advantages | Disadvantages |
|---|---|
| 1. Objective - removes bias | 1. Requires quality data (garbage in = garbage out) |
| 2. Scalable - can analyze millions of records | 2. May miss context that humans understand |
| 3. Reproducible - same data → same decision | 3. Expensive to set up and maintain |
| 4. Measurable - can track outcomes | 4. Can lead to "analysis paralysis" |
Intuition-Based Decision Making:
| Advantages | Disadvantages |
|---|---|
| 1. Fast - no data collection needed | 1. Subject to cognitive biases |
| 2. Works when data is unavailable | 2. Hard to explain or justify |
| 3. Can capture tacit knowledge | 3. Not scalable |
| 4. Good for unprecedented situations | 4. Inconsistent results |
Best Practice: Combine both - use data to inform decisions, but let human judgment handle context and ethics.
Explain FOUR common data quality issues and how to address each:
💡 Answer & Solution
1. Missing Values: detect with null counts; address by imputing (mean/median for numeric columns, mode for categorical) or by dropping rows/columns that are too sparse to salvage.
2. Outliers: detect with z-scores, IQR fences, or box plots; investigate before acting (they may be valid), then cap, transform, or remove them.
3. Inconsistent Formatting: mixed capitalization, date formats, or units; address by standardizing to one convention (e.g. lowercase text, ISO dates, a single unit).
4. Duplicate Records: detect by checking key identifier fields; address by deduplicating, keeping the most recent or most complete record.
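The four fixes can be sketched in pandas on a toy DataFrame (the column names and values below are invented for illustration):

```python
import numpy as np
import pandas as pd

# Toy data exhibiting all four issues (values invented for illustration)
df = pd.DataFrame({
    "age":  [25, np.nan, 300, 40, 40],
    "city": ["Berlin", " berlin", "BERLIN", "Munich", "Munich"],
})

# Outliers: flag impossible ages as missing before imputing
df.loc[df["age"] > 120, "age"] = np.nan
# Missing values: impute numeric gaps with the median
df["age"] = df["age"].fillna(df["age"].median())
# Inconsistent formatting: strip whitespace, unify capitalization
df["city"] = df["city"].str.strip().str.title()
# Duplicate records: drop exact duplicate rows
df = df.drop_duplicates().reset_index(drop=True)
```

Order matters here: outliers are masked to NaN first so the median used for imputation is not distorted by the impossible value.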
You receive a dataset with the following issues:
a) Identify what's wrong with each column b) Propose specific cleaning steps
💡 Answer & Solution
a) Age Column Issues: negative values (impossible), extreme values above 120 (likely entry errors), and missing entries.
Gender Column Issues: inconsistent formatting — mixed case and abbreviations ('m', 'male', 'f', 'female') for the same two categories.
b) Cleaning Steps:
For Age:

```python
import numpy as np

# Step 1: Replace invalid (negative) values with NaN
df.loc[df['Age'] < 0, 'Age'] = np.nan
# Step 2: Replace impossible outliers (> 120) with NaN
df.loc[df['Age'] > 120, 'Age'] = np.nan
# Step 3: Fill missing values with the median
df['Age'] = df['Age'].fillna(df['Age'].median())
```

For Gender:

```python
# Step 1: Convert to lowercase
df['Gender'] = df['Gender'].str.lower()
# Step 2: Standardize to a single format
df['Gender'] = df['Gender'].replace({
    'm': 'Male',
    'male': 'Male',
    'f': 'Female',
    'female': 'Female'
})
```

Result: Age contains only plausible numeric values with no gaps, and Gender has exactly two consistent categories (Male, Female).
Explain the following model evaluation concepts:
a) Accuracy, Precision, Recall b) When is accuracy NOT a good metric? c) What is the F1 Score and when to use it?
💡 Answer & Solution
a) Definitions:
Accuracy: $\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
Precision: $\text{Precision} = \frac{TP}{TP + FP}$
Recall (Sensitivity): $\text{Recall} = \frac{TP}{TP + FN}$
b) When Accuracy Fails: with imbalanced classes!
Example: fraud detection where only 1% of transactions are fraudulent — a model that always predicts "not fraud" scores 99% accuracy while catching zero fraud.
Rule: Use precision/recall (or F1) for imbalanced datasets.
c) F1 Score:
$F1 = 2 \times \frac{Precision \times Recall}{Precision + Recall}$
Use When: you need a single number that balances precision and recall, especially on imbalanced datasets where accuracy is misleading. Because it is a harmonic mean, F1 stays low if either precision or recall is poor.
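These definitions can be wrapped in a small helper; the always-predict-negative example (counts invented for illustration) shows accuracy looking excellent while recall and F1 collapse:

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0  # guard: no positive predictions
    recall = tp / (tp + fn) if tp + fn else 0.0     # guard: no actual positives
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# Fraud-style imbalance: a model that never predicts the positive class
acc, prec, rec, f1 = classification_metrics(tp=0, fp=0, fn=10, tn=990)
# acc is 0.99, yet recall and F1 are 0.0 -- accuracy alone is misleading
```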
A spam detection model has the following confusion matrix:
| | Predicted: Spam | Predicted: Not Spam |
|---|---|---|
| Actual: Spam | 85 | 15 |
| Actual: Not Spam | 10 | 890 |
Calculate: a) Accuracy b) Precision (for Spam) c) Recall (for Spam) d) F1 Score e) Which is more important for spam detection: Precision or Recall? Why?
💡 Answer & Solution
Confusion Matrix Values: TP = 85, FN = 15, FP = 10, TN = 890, Total = 1000.
a) Accuracy: $\text{Accuracy} = \frac{85 + 890}{1000} = \frac{975}{1000} = 97.5\%$
b) Precision: $\text{Precision} = \frac{85}{85 + 10} = \frac{85}{95} = 89.47\%$
c) Recall: $\text{Recall} = \frac{85}{85 + 15} = \frac{85}{100} = 85.0\%$
d) F1 Score: $F1 = 2 \times \frac{0.8947 \times 0.85}{0.8947 + 0.85} = 2 \times \frac{0.7605}{1.7447} = 87.18\%$
e) Precision is MORE important for spam detection.
Reasoning: a false positive means a legitimate email lands in the spam folder, so the user may miss something important. A false negative just means one spam email reaches the inbox — annoying, but low-cost.
Trade-off: tuning for higher precision typically lowers recall, so some spam gets through; most users accept that over losing real mail.
Compare Supervised vs Unsupervised learning:
| Aspect | Supervised | Unsupervised |
|---|---|---|
| Definition | ? | ? |
| Data Requirements | ? | ? |
| Example Algorithms | ? | ? |
| Business Use Cases | ? | ? |
💡 Answer & Solution
| Aspect | Supervised | Unsupervised |
|---|---|---|
| Definition | Learning from labeled data (input → known output) | Finding patterns in unlabeled data |
| Data Requirements | Needs labeled training data (expensive to create) | Only needs input data (no labels needed) |
| Example Algorithms | Decision Tree, Random Forest, SVM, Linear Regression, Naive Bayes | K-Means Clustering, Hierarchical Clustering, PCA, Association Rules |
| Business Use Cases | Spam detection, price prediction, customer churn, loan approval | Customer segmentation, market basket analysis, anomaly detection |
Key Difference: Supervised has a "teacher" (labels), unsupervised discovers structure on its own.
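The difference can be shown on a tiny one-dimensional example (data invented for illustration): the supervised step uses the labels to build a nearest-centroid classifier, while the unsupervised step recovers the same two groups from the raw numbers alone via 2-means clustering:

```python
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["low", "low", "low", "high", "high", "high"]

# Supervised: nearest-centroid classifier learned from labeled data
centroids = {}
for lab in set(labels):
    members = [x for x, l in zip(points, labels) if l == lab]
    centroids[lab] = sum(members) / len(members)

def predict(x):
    """Assign the label whose class centroid is closest."""
    return min(centroids, key=lambda lab: abs(x - centroids[lab]))

# Unsupervised: 2-means clustering on the same points, labels never used
c0, c1 = min(points), max(points)  # initialize centroids at the extremes
for _ in range(10):
    g0 = [x for x in points if abs(x - c0) <= abs(x - c1)]
    g1 = [x for x in points if abs(x - c0) > abs(x - c1)]
    c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
```

Both routes find the same structure here, but only the supervised model can name it ("low"/"high") — the clusters come back unlabeled.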
Explain the concept of overfitting:
a) What is overfitting? b) How can you detect it? c) List FOUR techniques to prevent overfitting
💡 Answer & Solution
a) What is Overfitting?
Model learns training data TOO well, including noise and random fluctuations.
Analogy: Student who memorizes test answers but can't solve new problems.
b) How to Detect Overfitting
Train-Test Gap: training accuracy is much higher than test accuracy (e.g. 99% train vs. 75% test).
Learning Curves: training error keeps decreasing while validation error flattens or starts rising.
Cross-Validation: scores vary widely across folds, or fold scores sit far below training performance.
c) Four Techniques to Prevent Overfitting
1. Cross-Validation: evaluate on multiple train/validation splits so the model is never tuned to one lucky split.
2. Regularization (L1/L2): penalize large weights, pushing the model toward simpler solutions.
3. Early Stopping: halt training when validation error stops improving.
4. Reduce Model Complexity: fewer features, shallower trees, fewer parameters.
Bonus techniques: gather more training data, dropout (for neural networks), tree pruning, feature selection.
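Early stopping can be sketched as a loop that watches the validation loss (the loss values below are invented for illustration):

```python
def early_stop_epoch(val_losses, patience=2):
    """Return the epoch with the best validation loss, stopping once the
    loss has failed to improve for `patience` consecutive epochs."""
    best_loss, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # validation loss is rising: likely overfitting
    return best_epoch

# Validation loss improves, then climbs as the model starts to overfit
history = [0.90, 0.70, 0.60, 0.55, 0.58, 0.62, 0.70]
stop_at = early_stop_epoch(history)  # epoch 3, where the loss bottomed out
```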
A company wants to use ML for hiring decisions. The model was trained on historical hiring data (who was hired and who succeeded).
What ethical concerns should be considered?
💡 Answer & Solution
Ethical Concerns:
1. Historical Bias Perpetuation: if past hiring favored certain groups, the model learns and repeats that preference.
2. Proxy Discrimination: seemingly neutral features (postal code, university, hobbies) can correlate with protected attributes like race or gender.
3. Lack of Transparency: rejected candidates cannot understand or contest an opaque model's decision.
4. Feedback Loop: the model's own hires become future training data, reinforcing its biases over time.
Recommendations: audit selection rates across groups, remove or monitor proxy features, prefer explainable models, and keep a human in the loop for final decisions.
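One common audit — comparing selection rates across groups — can be sketched with the "four-fifths rule" heuristic (the group names and counts below are invented for illustration):

```python
# Hypothetical hiring outcomes as (group, hired?) pairs
outcomes = [("A", True)] * 25 + [("A", False)] * 25 \
         + [("B", True)] * 15 + [("B", False)] * 35

# Selection rate per group
rates = {}
for group in {g for g, _ in outcomes}:
    hired = [h for g, h in outcomes if g == group]
    rates[group] = sum(hired) / len(hired)

# Four-fifths rule: flag if any group's rate is < 80% of the highest rate
impact_ratio = min(rates.values()) / max(rates.values())
flagged = impact_ratio < 0.8  # True here: 0.30 / 0.50 = 0.6
```

A flag like this is a prompt for investigation, not proof of discrimination — but it is the kind of routine check the recommendation calls for.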
| Block | Topic | Points |
|---|---|---|
| Block 1 | Analytics Strategy | 25 |
| Block 2 | Data Quality | 25 |
| Block 3 | Model Evaluation | 25 |
| Block 4 | Advanced Concepts | 25 |
| Total | | 100 |
| Bonus | Ethics | +5 |
| Metric | Formula |
|---|---|
| Accuracy | (TP + TN) / Total |
| Precision | TP / (TP + FP) |
| Recall | TP / (TP + FN) |
| F1 Score | 2 × (Precision × Recall) / (Precision + Recall) |
| Specificity | TN / (TN + FP) |
Confusion Matrix:
| | Predicted Pos | Predicted Neg |
|---|---|---|
| Actual Pos | TP | FN |
| Actual Neg | FP | TN |
Show all working for partial credit. Good luck!