ABW501 Mock Exam 1: Analytics Edge (With Answers)
📋 Exam Information
| Item | Details |
|---|---|
| Total Points | 100 |
| Time Allowed | 90 minutes |
| Format | Closed book, calculator allowed |
| Structure | 4 Blocks, 9 Questions total (plus 1 bonus question) |
Block 1: Analytics Types & Applications (25 points)
Q1.1 (12 points)
Complete the comparison table for four analytics types:
| Analytics Type | Key Question | Example Technique | Business Example |
|---|---|---|---|
| Descriptive | ? | ? | ? |
| Diagnostic | ? | ? | ? |
| Predictive | ? | ? | ? |
| Prescriptive | ? | ? | ? |
💡 Answer & Solution
| Analytics Type | Key Question | Example Technique | Business Example |
|---|---|---|---|
| Descriptive | What happened? | Summary statistics, dashboards, reporting | Monthly sales report showing $2M revenue |
| Diagnostic | Why did it happen? | Drill-down analysis, correlation analysis | Sales drop due to competitor price cut |
| Predictive | What will happen? | Regression, ML models, forecasting | Customer churn prediction (70% likely to leave) |
| Prescriptive | What should we do? | Optimization, simulation, decision models | Optimal inventory levels to minimize costs |
Memory Trick:
- Descriptive = Explain past
- Diagnostic = Investigate why
- Predictive = Estimate future
- Prescriptive = Execute action
Q1.2 (13 points)
Match each business problem to the correct analytics type and explain:
A: "Our sales dropped 15% last quarter. What caused this decline?"
B: "Which products should we recommend based on browsing patterns?"
C: "What is the optimal price point to maximize profit?"
D: "What were our top 5 selling products last month?"
💡 Answer & Solution
A: "Sales dropped 15%. What caused this?"
- Type: DIAGNOSTIC
- Reason: Investigating WHY something happened (root cause analysis)
B: "Which products to recommend based on browsing?"
- Type: PREDICTIVE
- Reason: Using patterns to predict what customers will want
C: "Optimal price point to maximize profit?"
- Type: PRESCRIPTIVE
- Reason: Optimization problem - determining best action
D: "Top 5 selling products last month?"
- Type: DESCRIPTIVE
- Reason: Simply summarizing historical data
Block 2: Analytics Lifecycle (25 points)
Q2.1 (15 points)
List and describe the SIX stages of Data Analytics Lifecycle in order.
💡 Answer & Solution
Stage 1: DISCOVERY
- Understand business problem and objectives
- Define key questions to answer
- Identify stakeholders and success criteria
Stage 2: DATA PREPARATION
- Collect data from various sources
- Clean data (handle missing values, outliers)
- Transform and integrate datasets
Stage 3: MODEL PLANNING
- Select appropriate techniques/algorithms
- Identify features (variables) to use
- Plan evaluation metrics
Stage 4: MODEL BUILDING
- Build and train models
- Test different algorithms
- Tune hyperparameters
Stage 5: COMMUNICATE RESULTS
- Present findings to stakeholders
- Create visualizations and reports
- Translate technical results to business insights
Stage 6: OPERATIONALIZE
- Deploy model to production
- Monitor performance over time
- Maintain and update as needed
Memory Trick: D-D-M-M-C-O = "Data Doctors Make Models, Communicate, Operate"
Q2.2 (10 points)
Identify which lifecycle stage each scenario describes:
A: "The team is cleaning missing values and removing outliers."
B: "Management wants to understand why churn increased. Team is defining specific questions."
C: "The model is deployed in production. Team monitors accuracy weekly."
D: "Data scientists are testing Random Forest, SVM, and Logistic Regression."
💡 Answer & Solution
A: Cleaning missing values and outliers
- Stage: DATA PREPARATION
- Reason: Data cleaning is core preparation activity
B: Defining specific questions to answer
- Stage: DISCOVERY
- Reason: Understanding problem and defining scope
C: Model deployed, monitoring accuracy
- Stage: OPERATIONALIZE
- Reason: Production deployment and monitoring
D: Testing multiple algorithms
- Stage: MODEL BUILDING
- Reason: Training and comparing different models
Block 3: Regression Analysis (25 points)
Scenario:
Real estate price prediction model:
$$\text{Price} = 50000 + 200 \times \text{SquareFeet} + 15000 \times \text{Bedrooms} - 5000 \times \text{Age}$$
| Statistic | Value |
|---|---|
| R² | 0.82 |
| Adjusted R² | 0.80 |
| All p-values | < 0.01 |
| Sample size | 200 houses |
Q3.1 (8 points)
Interpret each coefficient in plain business language.
💡 Answer & Solution
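Constant (50,000):
- Predicted price when all other variables = 0
- A baseline for the equation, not a realistic house (0 sq ft, 0 bedrooms) and not a minimum price: the negative Age coefficient can pull predictions below $50,000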
SquareFeet Coefficient (200):
- For each additional square foot, price increases by $200
- Holding bedrooms and age constant
Bedrooms Coefficient (15,000):
- Each additional bedroom adds $15,000 to price
- Holding square feet and age constant
Age Coefficient (-5,000):
- Each additional year of age decreases the predicted price by $5,000
- Negative = older houses worth less
- Holding square feet and bedrooms constant
Key Phrase: Always say "holding other variables constant"
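The "holding other variables constant" reading can be checked numerically. The sketch below uses the coefficients from the scenario equation; the baseline house and the helper names `predict_price` and `marginal_effect` are hypothetical, chosen only for illustration.

```python
# Coefficients taken from the scenario's regression equation.
INTERCEPT = 50_000
COEF = {"square_feet": 200, "bedrooms": 15_000, "age": -5_000}


def predict_price(square_feet, bedrooms, age):
    """Predicted price from the scenario's linear model."""
    return (INTERCEPT
            + COEF["square_feet"] * square_feet
            + COEF["bedrooms"] * bedrooms
            + COEF["age"] * age)


# Hypothetical baseline house (not from the exam), used only as a starting point.
base = {"square_feet": 1_600, "bedrooms": 3, "age": 8}


def marginal_effect(variable):
    """Change in predicted price when one variable rises by 1 unit
    and the other two stay at their baseline values."""
    bumped = dict(base)
    bumped[variable] += 1
    return predict_price(**bumped) - predict_price(**base)


effects = {name: marginal_effect(name) for name in COEF}
print(effects)
```

Because the model is linear, each effect equals its coefficient no matter which baseline house is chosen.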
Q3.2 (8 points)
Interpret R² = 0.82. Calculate predicted price for a house with:
- 2,000 sq ft
- 3 bedrooms
- 10 years old
💡 Answer & Solution
R² Interpretation:
R² = 0.82 means 82% of the variation in house prices is explained by square feet, bedrooms, and age. The remaining 18% is due to factors not in the model (location, condition, etc.). This indicates a strong fit for house-price data, although a high R² alone does not guarantee accurate predictions on new houses.
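For reference, the standard definition behind the "82% explained" reading is shown below. The symbols are assumptions, since the exam does not define them: $y_i$ is an observed price, $\hat{y}_i$ the model's prediction, and $\bar{y}$ the mean price.

$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$

With $R^2 = 0.82$, the unexplained (residual) sum of squares is 18% of the total sum of squares.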
Price Prediction:
$$\text{Price} = 50000 + 200(2000) + 15000(3) - 5000(10)$$
$$= 50000 + 400000 + 45000 - 50000$$
$$= 445000$$
Predicted Price = $445,000
Q3.3 (9 points)
Compare two investment houses:
- House A: 1,800 sq ft, 3 bedrooms, 5 years old
- House B: 1,500 sq ft, 4 bedrooms, 2 years old
Which has higher predicted price? Calculate the difference.
💡 Answer & Solution
House A:
$$\text{Price}_A = 50000 + 200(1800) + 15000(3) - 5000(5)$$
$$= 50000 + 360000 + 45000 - 25000$$
$$= 430000$$
House B:
$$\text{Price}_B = 50000 + 200(1500) + 15000(4) - 5000(2)$$
$$= 50000 + 300000 + 60000 - 10000$$
$$= 400000$$
Results:
- House A: $430,000
- House B: $400,000
- House A is higher by $30,000
Insight: In this comparison the size difference dominates. House A's extra 300 sq ft (+$60,000) outweighs House B's extra bedroom ($15,000) and its being 3 years newer ($15,000): 60,000 - 15,000 - 15,000 = $30,000 in favor of House A.
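The gap between the two houses can be split variable by variable. This is a minimal sketch using the scenario's coefficients and the two houses from the question; the names `predict_price` and `contributions` are illustrative, not part of the exam.

```python
# Coefficients from the scenario's regression equation.
INTERCEPT = 50_000
COEF = {"square_feet": 200, "bedrooms": 15_000, "age": -5_000}

house_a = {"square_feet": 1_800, "bedrooms": 3, "age": 5}
house_b = {"square_feet": 1_500, "bedrooms": 4, "age": 2}


def predict_price(house):
    """Intercept plus coefficient * value for each variable."""
    return INTERCEPT + sum(COEF[name] * house[name] for name in COEF)


# How much each variable contributes to (price of A - price of B).
contributions = {
    name: COEF[name] * (house_a[name] - house_b[name]) for name in COEF
}
difference = predict_price(house_a) - predict_price(house_b)

print(contributions)  # per-variable contribution to the gap
print(difference)     # total gap, equal to the sum of the contributions
```

The intercept cancels in the subtraction, so the gap depends only on how the houses differ.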
Block 4: Data Mining Algorithms (25 points)
Q4.1 (15 points)
Complete the algorithm comparison table:
| Aspect | Decision Tree | KNN | Naive Bayes |
|---|---|---|---|
| Algorithm Type | ? | ? | ? |
| How it works | ? | ? | ? |
| Main Advantage | ? | ? | ? |
| Main Disadvantage | ? | ? | ? |
| Best Use Case | ? | ? | ? |
💡 Answer & Solution
| Aspect | Decision Tree | KNN | Naive Bayes |
|---|---|---|---|
| Algorithm Type | Both (Classification & Regression) | Both (Classification & Regression) | Classification |
| How it works | Splits data using if-then rules based on feature thresholds | Classifies based on k nearest neighbors' majority vote | Uses Bayes' theorem with a feature independence assumption |
| Main Advantage | Easy to interpret, visual | Simple, no explicit training phase (lazy learner) | Fast, works well with small data |
| Main Disadvantage | Prone to overfitting | Slow prediction (compares against all stored points) | Assumes feature independence (often unrealistic) |
| Best Use Case | When explainability matters (credit decisions) | When similar items cluster together (recommendations) | Text classification (spam detection) |
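The KNN row ("k nearest neighbors' majority vote") can be sketched in a few lines. This is a brute-force illustration, not a production implementation; the function name `knn_predict` and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_tuple, label) pairs. Every stored point is
    compared with the query, which is why prediction gets slow on large data.
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups in 2D.
train = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((2.0, 1.5), "low"),
    ((6.0, 6.5), "high"),
    ((7.0, 6.0), "high"),
    ((6.5, 7.5), "high"),
]

print(knn_predict(train, (1.8, 1.2)))
print(knn_predict(train, (6.4, 6.8)))
```

There is no fitting step: the "model" is the stored training data, so all the work happens at prediction time.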
Q4.2 (10 points)
Scenario: Build email spam classifier with:
- 1 million emails (large dataset)
- Need real-time predictions (<100ms)
- Binary: Spam or Not Spam
Which algorithm? Why not the others?
💡 Answer & Solution
Best Choice: Naive Bayes
Reasons:
- Very fast prediction - perfect for real-time (<100ms)
- Works great for text classification (spam detection is classic NB use case)
- Handles high-dimensional data well (many word features)
- Scales to large datasets efficiently
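Why NOT Decision Tree:
- Can overfit with many sparse text features
- Each path tests only a few words, so much of the evidence in an email goes unused
- Prediction itself is fast, so the objection is suitability for text, not speed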
Why NOT KNN:
- Too slow at this scale with brute-force search
- Must compare each new email against ALL 1 million training examples
- Real-time requirement (<100ms) is very hard to meet
- Memory intensive (stores all training data)
Summary:
- Naive Bayes: ✅ Fast, good for text, scalable
- Decision Tree: ⚠️ Possible but not optimal for text
- KNN: ❌ Too slow for large data + real-time
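To make the speed argument concrete, here is a minimal word-count Naive Bayes sketch with add-one (Laplace) smoothing. It is an illustration under assumptions, not a reference implementation: the function names and the six training messages are hypothetical.

```python
import math
from collections import Counter


def train_nb(messages):
    """Count labels and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in messages:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log prior + summed log likelihoods."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label, count in label_counts.items():
        score = math.log(count / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy training set.
messages = [
    ("win free prize now", "spam"),
    ("free money click now", "spam"),
    ("claim your free prize", "spam"),
    ("meeting at noon tomorrow", "not spam"),
    ("please review the report", "not spam"),
    ("lunch tomorrow at noon", "not spam"),
]
model = train_nb(messages)
print(predict_nb(model, "free prize now"))
print(predict_nb(model, "report at noon"))
```

Prediction cost depends on the number of words in the new email and the number of classes, not on how many emails were used for training. That is the property that suits the 1 million email, real-time scenario.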
🎁 Bonus: Correlation vs Causation (5 extra points)
Explain the difference. Give an example where two variables are correlated but NOT causally related.
💡 Answer & Solution
Correlation: Two variables move together (positive or negative relationship)
Causation: One variable directly causes changes in another
Key Difference: Correlation ≠ Causation. Just because A and B move together doesn't mean A causes B (or B causes A).
Example:
- Ice cream sales and drowning deaths are positively correlated
- But ice cream doesn't CAUSE drowning!
- Confounding variable: Hot weather
- Hot weather → More ice cream sales
- Hot weather → More swimming → More drownings
Other Examples:
- Shoe size correlates with reading ability (age is the confounding variable)
- Nicolas Cage movies correlate with swimming pool drownings (pure coincidence)
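The ice cream example can be simulated. In the sketch below all numbers are synthetic assumptions (temperature range, effect sizes, noise levels): temperature drives both series, and neither series influences the other. The partial correlation removes the linear effect of temperature.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic data: hot weather raises both ice cream sales and drownings.
temperature = [rng.uniform(15, 35) for _ in range(2000)]
ice_cream = [5 * t + rng.gauss(0, 10) for t in temperature]
drownings = [0.3 * t + rng.gauss(0, 1) for t in temperature]

r_raw = pearson(ice_cream, drownings)
r_ice_temp = pearson(ice_cream, temperature)
r_drown_temp = pearson(drownings, temperature)

# Partial correlation of ice cream and drownings, controlling for temperature.
r_partial = (r_raw - r_ice_temp * r_drown_temp) / math.sqrt(
    (1 - r_ice_temp ** 2) * (1 - r_drown_temp ** 2)
)

print(f"raw correlation:     {r_raw:.2f}")
print(f"partial correlation: {r_partial:.2f}")
```

The raw correlation comes out strongly positive, while the partial correlation stays close to zero: once the confounder is accounted for, the apparent link largely disappears.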
🏁 End of Exam
| Block | Topic | Points |
|---|---|---|
| Block 1 | Analytics Types | 25 |
| Block 2 | Lifecycle | 25 |
| Block 3 | Regression | 25 |
| Block 4 | Data Mining | 25 |
| Total | | 100 |
| Bonus | Correlation/Causation | +5 |
📝 Quick Reference
Analytics Types:
- Descriptive: What happened?
- Diagnostic: Why?
- Predictive: What will happen?
- Prescriptive: What should we do?
Lifecycle: Discovery → Data Prep → Model Plan → Model Build → Communicate → Operationalize
Regression: Coefficient = change in Y per 1-unit change in X, holding other variables constant
R²: % of variance explained by model
Show your work for partial credit. Good luck!