

ABW501 Mock Exam 2: Analytics Edge (With Answers)

January 24, 2026
10 min read
#ABW501, #Mock Exam, #Business Analytics, #Data Mining, #Regression, #Practice Test


ABW501 Mock Exam 2 - Analytics Edge

📋 Exam Information

| Item | Details |
| --- | --- |
| Total Points | 100 |
| Time Allowed | 90 minutes |
| Format | Closed book, calculator allowed |
| Structure | 4 Blocks, 8 Questions |

Block 1: Analytics Strategy & Applications (25 points)

Q1.1 (12 points)

A hospital wants to reduce patient readmission rates. For each analytics approach, give a specific example of how it could help:

a) Descriptive Analytics
b) Diagnostic Analytics
c) Predictive Analytics
d) Prescriptive Analytics

💡 Answer & Solution

a) Descriptive Analytics: "Dashboard showing readmission rates by department, age group, and diagnosis. Example: 'Cardiology has 15% readmission rate vs. 8% hospital average.'"

b) Diagnostic Analytics: "Root cause analysis to understand WHY readmissions happen. Example: 'Patients discharged on Friday have 20% higher readmission - likely due to weekend pharmacy closures.'"

c) Predictive Analytics: "ML model predicting which patients are likely to be readmitted. Example: 'Patient John has 78% probability of readmission within 30 days based on his diagnosis, age, and prior history.'"

d) Prescriptive Analytics: "Recommending specific interventions. Example: 'For high-risk patients, schedule follow-up call within 48 hours, arrange home nurse visits, and ensure medication delivery.'"

Key Pattern:

  • Descriptive: Summarize what happened
  • Diagnostic: Explain why
  • Predictive: Forecast risk
  • Prescriptive: Recommend actions

Q1.2 (13 points)

Explain the concept of data-driven decision making vs intuition-based decision making. Give TWO advantages and TWO disadvantages of each approach.

💡 Answer & Solution

Data-Driven Decision Making:

| Advantages | Disadvantages |
| --- | --- |
| 1. Objective - removes bias | 1. Requires quality data (garbage in = garbage out) |
| 2. Scalable - can analyze millions of records | 2. May miss context that humans understand |
| 3. Reproducible - same data → same decision | 3. Expensive to set up and maintain |
| 4. Measurable - can track outcomes | 4. Can lead to "analysis paralysis" |

Intuition-Based Decision Making:

| Advantages | Disadvantages |
| --- | --- |
| 1. Fast - no data collection needed | 1. Subject to cognitive biases |
| 2. Works when data is unavailable | 2. Hard to explain or justify |
| 3. Can capture tacit knowledge | 3. Not scalable |
| 4. Good for unprecedented situations | 4. Inconsistent results |

Best Practice: Combine both - use data to inform decisions, but let human judgment handle context and ethics.


Block 2: Data Quality & Preparation (25 points)

Q2.1 (12 points)

Explain FOUR common data quality issues and how to address each:

💡 Answer & Solution

1. Missing Values

  • Problem: Empty cells in dataset
  • Causes: Survey non-response, system errors, data not collected
  • Solutions:
    • Delete rows (if few missing)
    • Impute with mean/median/mode
    • Use predictive models to estimate
    • Create "missing" category for categorical

2. Outliers

  • Problem: Extreme values far from normal range
  • Causes: Data entry errors, genuine rare events
  • Solutions:
    • Remove if clearly erroneous
    • Cap at percentiles (winsorization)
    • Transform data (log scale)
    • Use robust algorithms

3. Inconsistent Formatting

  • Problem: Same thing recorded differently
  • Examples: "USA", "U.S.A.", "United States"
  • Solutions:
    • Standardize formats
    • Create lookup tables
    • Use data validation rules
    • Regular expressions for cleaning

4. Duplicate Records

  • Problem: Same entity recorded multiple times
  • Causes: Multiple data sources, entry errors
  • Solutions:
    • Exact matching (same ID)
    • Fuzzy matching (similar names)
    • Deduplication algorithms
    • Define business rules for merging
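Not something you would write in a closed-book exam, but the four fixes above map directly onto a few lines of pandas. This is a minimal sketch on a hypothetical mini-dataset; the column names, values, and percentile thresholds are made up purely for illustration:

import numpy as np
import pandas as pd

# Hypothetical mini-dataset exhibiting all four issues above
df = pd.DataFrame({
    'customer_id': [1, 1, 2, 3, 4],                                   # duplicate: id 1 appears twice
    'country': ['USA', 'U.S.A.', 'United States', 'usa', 'Canada'],   # inconsistent formatting
    'income': [52000, 52000, np.nan, 9999999, 48000],                 # missing value and outlier
})

# 1. Missing values: impute with the median
df['income'] = df['income'].fillna(df['income'].median())

# 2. Outliers: cap at lower/upper percentiles (winsorization; thresholds are illustrative)
low, high = df['income'].quantile([0.01, 0.99])
df['income'] = df['income'].clip(lower=low, upper=high)

# 3. Inconsistent formatting: standardize via a lookup table
country_map = {'usa': 'USA', 'u.s.a.': 'USA', 'united states': 'USA', 'canada': 'Canada'}
df['country'] = df['country'].str.lower().map(country_map).fillna(df['country'])

# 4. Duplicates: drop exact repeats of the same customer_id
df = df.drop_duplicates(subset='customer_id', keep='first')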

Q2.2 (13 points)

You receive a dataset with the following issues:

  • Age column has values: 25, 30, -5, 150, 45, NULL, 35
  • Gender column has: "M", "Male", "m", "F", "female", "Female"

a) Identify what's wrong with each column
b) Propose specific cleaning steps

💡 Answer & Solution

a) Age Column Issues:

  1. -5: Invalid (negative age impossible)
  2. 150: Outlier (likely data entry error - no one is 150)
  3. NULL: Missing value

Gender Column Issues:

  1. Inconsistent case: "M" vs "m", "Male" vs "male"
  2. Inconsistent format: "M" vs "Male" (abbreviation vs full word)
  3. Inconsistent capitalization: "female" vs "Female"

b) Cleaning Steps:

For Age:

# Step 0: Imports needed below (the DataFrame df is assumed already loaded)
import numpy as np
import pandas as pd

# Step 1: Replace invalid values with NaN
df.loc[df['Age'] < 0, 'Age'] = np.nan

# Step 2: Replace outliers (>120) with NaN
df.loc[df['Age'] > 120, 'Age'] = np.nan

# Step 3: Fill missing with median (reassignment avoids pandas' inplace/chained-assignment warnings)
df['Age'] = df['Age'].fillna(df['Age'].median())

For Gender:

# Step 1: Convert to lowercase
df['Gender'] = df['Gender'].str.lower()
 
# Step 2: Standardize to single format
df['Gender'] = df['Gender'].replace({
    'm': 'Male',
    'male': 'Male',
    'f': 'Female',
    'female': 'Female'
})

Result:

  • Age: [25, 30, 32.5, 32.5, 45, 32.5, 35] (the remaining valid ages are 25, 30, 35, 45, so the median is (30 + 35) / 2 = 32.5)
  • Gender: ['Male', 'Male', 'Male', 'Female', 'Female', 'Female']

Block 3: Model Evaluation & Interpretation (25 points)

Q3.1 (12 points)

Explain the following model evaluation concepts:

a) Accuracy, Precision, Recall
b) When is accuracy NOT a good metric?
c) What is the F1 Score and when to use it?

💡 Answer & Solution

a) Definitions:

Accuracy: $\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$

  • Overall correctness
  • "Of all predictions, how many were right?"

Precision: $\text{Precision} = \frac{TP}{TP + FP}$

  • Of positive predictions, how many were actually positive?
  • "When I predict positive, how often am I right?"

Recall (Sensitivity): $\text{Recall} = \frac{TP}{TP + FN}$

  • Of actual positives, how many did I catch?
  • "Of all actual positives, how many did I find?"

b) When Accuracy Fails:

Imbalanced Classes!

Example: Fraud detection

  • 99% legitimate, 1% fraud
  • Model predicts "all legitimate" → 99% accuracy!
  • But catches 0% of fraud (useless)

Rule: Use precision/recall for imbalanced datasets.

c) F1 Score:

$F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$

  • Harmonic mean of precision and recall
  • Balances both metrics
  • Range: 0 to 1 (higher is better)

Use When:

  • Class imbalance exists
  • Both false positives AND false negatives matter
  • Need single metric to compare models
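A quick way to see point (b) in practice: a toy scikit-learn sketch (hand-made labels, assuming scikit-learn is installed) where a model that predicts "legitimate" for every transaction still scores 98% accuracy:

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Toy imbalanced labels: 1 = fraud, 0 = legitimate; the "model" predicts 0 for everything
y_true = [0] * 98 + [1] * 2
y_pred = [0] * 100

print(accuracy_score(y_true, y_pred))                    # 0.98 - looks impressive
print(precision_score(y_true, y_pred, zero_division=0))  # 0.0  - no correct fraud predictions
print(recall_score(y_true, y_pred, zero_division=0))     # 0.0  - catches no fraud at all
print(f1_score(y_true, y_pred, zero_division=0))         # 0.0  - F1 exposes the failure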

Q3.2 (13 points)

A spam detection model has the following confusion matrix:

|  | Predicted: Spam | Predicted: Not Spam |
| --- | --- | --- |
| Actual: Spam | 85 | 15 |
| Actual: Not Spam | 10 | 890 |

Calculate:
a) Accuracy
b) Precision (for Spam)
c) Recall (for Spam)
d) F1 Score
e) Which is more important for spam detection: Precision or Recall? Why?

💡 Answer & Solution

Confusion Matrix Values:

  • TP (Spam correctly identified) = 85
  • FN (Spam missed) = 15
  • FP (Not spam marked as spam) = 10
  • TN (Not spam correctly identified) = 890
  • Total = 1000

a) Accuracy: $\text{Accuracy} = \frac{85 + 890}{1000} = \frac{975}{1000} = 97.5\%$

b) Precision: $\text{Precision} = \frac{85}{85 + 10} = \frac{85}{95} = 89.47\%$

c) Recall: $\text{Recall} = \frac{85}{85 + 15} = \frac{85}{100} = 85.0\%$

d) F1 Score: $F1 = 2 \times \frac{0.8947 \times 0.85}{0.8947 + 0.85} = 2 \times \frac{0.7605}{1.7447} = 87.18\%$

e) Precision is MORE important for spam detection

Reasoning:

  • High precision = Few false positives
  • False positive = Important email marked as spam
  • This is WORSE than missing some spam (user might miss critical email!)

Trade-off:

  • Prefer: Some spam in inbox (annoying but visible)
  • Avoid: Important email in spam folder (might never be seen)
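For practice outside the closed-book setting, the arithmetic above can be sanity-checked in a few lines of Python (same counts as the confusion matrix):

# Confusion-matrix counts from the question
TP, FN, FP, TN = 85, 15, 10, 890
total = TP + FN + FP + TN

accuracy  = (TP + TN) / total                              # 0.9750
precision = TP / (TP + FP)                                 # 0.8947
recall    = TP / (TP + FN)                                 # 0.8500
f1        = 2 * precision * recall / (precision + recall)  # 0.8718
print(round(accuracy, 4), round(precision, 4), round(recall, 4), round(f1, 4))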

Block 4: Advanced Analytics Concepts (25 points)

Q4.1 (12 points)

Compare Supervised vs Unsupervised learning:

| Aspect | Supervised | Unsupervised |
| --- | --- | --- |
| Definition | ? | ? |
| Data Requirements | ? | ? |
| Example Algorithms | ? | ? |
| Business Use Cases | ? | ? |

💡 Answer & Solution

| Aspect | Supervised | Unsupervised |
| --- | --- | --- |
| Definition | Learning from labeled data (input → known output) | Finding patterns in unlabeled data |
| Data Requirements | Needs labeled training data (expensive to create) | Only needs input data (no labels needed) |
| Example Algorithms | Decision Tree, Random Forest, SVM, Linear Regression, Naive Bayes | K-Means Clustering, Hierarchical Clustering, PCA, Association Rules |
| Business Use Cases | Spam detection, price prediction, customer churn, loan approval | Customer segmentation, market basket analysis, anomaly detection |

Key Difference: Supervised has a "teacher" (labels), unsupervised discovers structure on its own.
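The contrast is easy to see in code. A minimal scikit-learn sketch (synthetic data and hypothetical parameters, purely for illustration): the supervised model is given y, the unsupervised one never sees it.

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

# Synthetic data: X = features, y = labels
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

# Supervised: the model is trained with the labels y (the "teacher")
clf = DecisionTreeClassifier(random_state=42).fit(X, y)
print(clf.predict(X[:3]))     # predicted target classes

# Unsupervised: the model only sees X and discovers its own groups
km = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)
print(km.labels_[:3])         # cluster IDs, not the original labels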


Q4.2 (13 points)

Explain the concept of overfitting:

a) What is overfitting?
b) How can you detect it?
c) List FOUR techniques to prevent overfitting

💡 Answer & Solution

a) What is Overfitting?

The model learns the training data TOO well, including noise and random fluctuations.

  • Performs excellently on training data
  • Performs poorly on new/unseen data
  • Model has memorized rather than learned general patterns

Analogy: Student who memorizes test answers but can't solve new problems.

b) How to Detect Overfitting

  1. Train-Test Gap:

    • Training accuracy = 99%
    • Test accuracy = 70%
    • Large gap = overfitting
  2. Learning Curves:

    • Training error keeps decreasing
    • Validation error increases or plateaus
    • Lines diverge
  3. Cross-Validation:

    • High variance in scores across folds
    • Some folds much worse than others

c) Four Techniques to Prevent Overfitting

  1. Cross-Validation

    • Split data into k folds
    • Train on k-1, test on 1, rotate
    • More reliable performance estimate
  2. Regularization (L1/L2)

    • Adds penalty for complex models
    • L1 (Lasso): Can eliminate features
    • L2 (Ridge): Shrinks coefficients
  3. Early Stopping

    • Monitor validation error during training
    • Stop when validation error starts increasing
    • Prevents over-training
  4. Reduce Model Complexity

    • Fewer features (feature selection)
    • Simpler model (shallower tree)
    • Fewer parameters

Bonus techniques:

  • Dropout (neural networks)
  • More training data
  • Data augmentation
  • Ensemble methods
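Detection point 1 (the train-test gap) and prevention technique 4 (reduced complexity) can both be demonstrated in a short sketch; the dataset and hyperparameters below are synthetic assumptions, not exam material:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy data purely for illustration (flip_y adds label noise)
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Unconstrained tree: fits the training set (noise included) almost perfectly
deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print('deep:   ', deep.score(X_train, y_train), deep.score(X_test, y_test))      # large train-test gap

# Reduced complexity (shallow tree): lower training score, smaller gap
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print('shallow:', shallow.score(X_train, y_train), shallow.score(X_test, y_test))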

🎁 Bonus: Ethics in Analytics (5 extra points)

A company wants to use ML for hiring decisions. The model was trained on historical hiring data (who was hired and who succeeded).

What ethical concerns should be considered?

💡 Answer & Solution

Ethical Concerns:

  1. Historical Bias Perpetuation

    • If past hiring was biased (e.g., fewer women in tech), model learns this
    • Model will discriminate against groups historically underrepresented
    • "The model is only as fair as its training data"
  2. Proxy Discrimination

    • Even without protected attributes (race, gender), proxies exist
    • Zip code correlates with race
    • Name style correlates with ethnicity
    • Model may discriminate indirectly
  3. Lack of Transparency

    • Candidates can't understand why they were rejected
    • "Black box" decisions are hard to appeal
    • Legal requirement for explainable decisions
  4. Feedback Loop

    • Only hired people have success data
    • Rejected candidates might have succeeded
    • Model never learns from its mistakes

Recommendations:

  • Audit model for bias regularly
  • Use diverse training data
  • Ensure human oversight
  • Allow candidates to appeal/explain
  • Be transparent about AI use

🏁 End of Exam

| Block | Topic | Points |
| --- | --- | --- |
| Block 1 | Analytics Strategy | 25 |
| Block 2 | Data Quality | 25 |
| Block 3 | Model Evaluation | 25 |
| Block 4 | Advanced Concepts | 25 |
| Total |  | 100 |
| Bonus | Ethics | +5 |

📝 Key Formulas Reference

| Metric | Formula |
| --- | --- |
| Accuracy | (TP + TN) / Total |
| Precision | TP / (TP + FP) |
| Recall | TP / (TP + FN) |
| F1 Score | 2 × (Precision × Recall) / (Precision + Recall) |
| Specificity | TN / (TN + FP) |

Confusion Matrix:

                Predicted
              Pos    Neg
Actual Pos    TP     FN
       Neg    FP     TN

Show all working for partial credit. Good luck!