ABW501 Mock Exam 1 - Analytics Edge

📋 Exam Information

Item	Details
Total Points	100
Time Allowed	90 minutes
Format	Closed book, calculator allowed
Structure	4 Blocks, 9 Questions total

Block 1: Analytics Types & Applications (25 points)

Q1.1 (12 points)

Complete the comparison table for four analytics types:

Analytics Type	Key Question	Example Technique	Business Example
Descriptive	?	?	?
Diagnostic	?	?	?
Predictive	?	?	?
Prescriptive	?	?	?

💡 Answer & Solution

Analytics Type	Key Question	Example Technique	Business Example
Descriptive	What happened?	Summary statistics, dashboards, reporting	Monthly sales report showing $2M revenue
Diagnostic	Why did it happen?	Drill-down analysis, correlation analysis	Sales drop due to competitor price cut
Predictive	What will happen?	Regression, ML models, forecasting	Customer churn prediction (70% likely to leave)
Prescriptive	What should we do?	Optimization, simulation, decision models	Optimal inventory levels to minimize costs

Memory Trick:

Descriptive = Explain past
Diagnostic = Investigate why
Predictive = Estimate future
Prescriptive = Execute action

Q1.2 (13 points)

Match each business problem to the correct analytics type and explain:

A: "Our sales dropped 15% last quarter. What caused this decline?"

B: "Which products should we recommend based on browsing patterns?"

C: "What is the optimal price point to maximize profit?"

D: "What were our top 5 selling products last month?"

💡 Answer & Solution

A: "Sales dropped 15%. What caused this?"

Type: DIAGNOSTIC
Reason: Investigating WHY something happened (root cause analysis)

B: "Which products to recommend based on browsing?"

Type: PREDICTIVE
Reason: Using patterns to predict what customers will want

C: "Optimal price point to maximize profit?"

Type: PRESCRIPTIVE
Reason: Optimization problem - determining best action

D: "Top 5 selling products last month?"

Type: DESCRIPTIVE
Reason: Simply summarizing historical data

Block 2: Analytics Lifecycle (25 points)

Q2.1 (15 points)

List and describe the SIX stages of the Data Analytics Lifecycle in order.

💡 Answer & Solution

Stage 1: DISCOVERY

Understand business problem and objectives
Define key questions to answer
Identify stakeholders and success criteria

Stage 2: DATA PREPARATION

Collect data from various sources
Clean data (handle missing values, outliers)
Transform and integrate datasets

Stage 3: MODEL PLANNING

Select appropriate techniques/algorithms
Identify features (variables) to use
Plan evaluation metrics

Stage 4: MODEL BUILDING

Build and train models
Test different algorithms
Tune hyperparameters

Stage 5: COMMUNICATE RESULTS

Present findings to stakeholders
Create visualizations and reports
Translate technical results to business insights

Stage 6: OPERATIONALIZE

Deploy model to production
Monitor performance over time
Maintain and update as needed

Memory Trick: D-D-M-M-C-O = "Data Doctors Make Models, Communicate, Operate"

Q2.2 (10 points)

Identify which lifecycle stage each scenario describes:

A: "The team is cleaning missing values and removing outliers."

B: "Management wants to understand why churn increased. Team is defining specific questions."

C: "The model is deployed in production. Team monitors accuracy weekly."

D: "Data scientists are testing Random Forest, SVM, and Logistic Regression."

💡 Answer & Solution

A: Cleaning missing values and outliers

Stage: DATA PREPARATION
Reason: Data cleaning is a core preparation activity

B: Defining specific questions to answer

Stage: DISCOVERY
Reason: Understanding problem and defining scope

C: Model deployed, monitoring accuracy

Stage: OPERATIONALIZE
Reason: Production deployment and monitoring

D: Testing multiple algorithms

Stage: MODEL BUILDING
Reason: Training and comparing different models

Block 3: Regression Analysis (25 points)

Scenario:

Real estate price prediction model:

$$\text{Price} = 50000 + 200 \times \text{SquareFeet} + 15000 \times \text{Bedrooms} - 5000 \times \text{Age}$$

Statistic	Value
R²	0.82
Adjusted R²	0.80
All p-values	< 0.01
Sample size	200 houses

Q3.1 (8 points)

Interpret each coefficient in plain business language.

💡 Answer & Solution

Constant (50,000):

Predicted price when square feet, bedrooms, and age all equal 0
A baseline for the equation, not a minimum house value; it has no practical meaning on its own because no real house has 0 square feet

SquareFeet Coefficient (200):

For each additional square foot, predicted price increases by $200 on average
Holding bedrooms and age constant

Bedrooms Coefficient (15,000):

Each additional bedroom adds $15,000 to the predicted price on average
Holding square feet and age constant

Age Coefficient (-5,000):

Each additional year of age decreases the predicted price by $5,000 on average
Negative sign = older houses are worth less
Holding square feet and bedrooms constant

Key Phrase: Always say "holding other variables constant"

Q3.2 (8 points)

Interpret R² = 0.82. Then calculate the predicted price for a house with:

2,000 sq ft
3 bedrooms
10 years old

💡 Answer & Solution

R² Interpretation:

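R² = 0.82 means 82% of the variation in house prices is explained by square feet, bedrooms, and age. The remaining 18% is not explained by the model and may reflect other factors (location, condition, etc.). This indicates a strong fit for real estate data, although R² alone does not show how well the model predicts houses outside the sample.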
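For reference, adjusted R² penalizes R² for the number of predictors. The sketch below applies the standard formula, 1 - (1 - R²)(n - 1)/(n - p - 1), to the scenario's figures; the function name `adjusted_r2` is an illustrative choice, not part of the exam. With n = 200 and p = 3 the formula gives roughly 0.817, so the 0.80 listed in the scenario is best treated as an illustrative value.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for a model with p predictors fit on n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Scenario values: R² = 0.82, 200 houses, 3 predictors (sq ft, bedrooms, age).
value = adjusted_r2(r2=0.82, n=200, p=3)
print(round(value, 3))  # 0.817
```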

Price Prediction:

$$\begin{aligned}
\text{Price} &= 50000 + 200(2000) + 15000(3) - 5000(10) \\
&= 50000 + 400000 + 45000 - 50000 \\
&= 445000
\end{aligned}$$

Predicted Price = $445,000
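The same arithmetic can be checked with a short script. This is a minimal sketch in Python; the helper name `predict_price` and its parameter names are illustrative choices, while the coefficients come straight from the scenario equation.

```python
# Coefficients taken from the scenario's regression equation.
INTERCEPT = 50_000
COEF_SQFT = 200
COEF_BEDROOMS = 15_000
COEF_AGE = -5_000


def predict_price(square_feet, bedrooms, age):
    """Return the model's predicted price in dollars (hypothetical helper)."""
    return (INTERCEPT
            + COEF_SQFT * square_feet
            + COEF_BEDROOMS * bedrooms
            + COEF_AGE * age)


price = predict_price(square_feet=2000, bedrooms=3, age=10)
print(f"Predicted price: ${price:,}")  # Predicted price: $445,000
```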

Q3.3 (9 points)

Compare two investment houses:

House A: 1,800 sq ft, 3 bedrooms, 5 years old
House B: 1,500 sq ft, 4 bedrooms, 2 years old

Which has higher predicted price? Calculate the difference.

💡 Answer & Solution

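House A:

$$\begin{aligned}
\text{Price}_A &= 50000 + 200(1800) + 15000(3) - 5000(5) \\
&= 50000 + 360000 + 45000 - 25000 \\
&= 430000
\end{aligned}$$

House B:

$$\begin{aligned}
\text{Price}_B &= 50000 + 200(1500) + 15000(4) - 5000(2) \\
&= 50000 + 300000 + 60000 - 10000 \\
&= 400000
\end{aligned}$$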

Results:

House A: $430,000
House B: $400,000
House A is higher by $30,000

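Insight: In this comparison, square footage outweighs the other two factors combined. House A's extra 300 sq ft is worth $60,000 (300 × $200). House B gains $15,000 from its extra bedroom and another $15,000 from being 3 years newer (3 × $5,000), so the net difference is $60,000 - $30,000 = $30,000 in favor of House A.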
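The $30,000 gap can also be broken down feature by feature. The sketch below is one way to do that in Python; the dictionary names are illustrative, and the intercept is omitted because it cancels when the two predictions are subtracted.

```python
# Coefficients from the scenario's regression equation (dollars per unit).
COEFS = {"square_feet": 200, "bedrooms": 15_000, "age": -5_000}

house_a = {"square_feet": 1800, "bedrooms": 3, "age": 5}
house_b = {"square_feet": 1500, "bedrooms": 4, "age": 2}

# Contribution of each feature to (price A - price B).
contributions = {
    name: coef * (house_a[name] - house_b[name])
    for name, coef in COEFS.items()
}
difference = sum(contributions.values())

for name, value in contributions.items():
    print(f"{name:12s} {value:+,}")
print(f"{'total':12s} {difference:+,}")
```

Square feet contributes +60,000, while bedrooms and age each contribute -15,000, which sums to the +30,000 difference.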

Block 4: Data Mining Algorithms (25 points)

Q4.1 (15 points)

Complete the algorithm comparison table:

Aspect	Decision Tree	KNN	Naive Bayes
Algorithm Type	?	?	?
How it works	?	?	?
Main Advantage	?	?	?
Main Disadvantage	?	?	?
Best Use Case	?	?	?

💡 Answer & Solution

Aspect	Decision Tree	KNN	Naive Bayes
Algorithm Type	Both (Classification & Regression)	Both	Classification
How it works	Splits data using if-then rules based on feature thresholds	Classifies based on k nearest neighbors' majority vote (or averages their values for regression)	Uses Bayes' theorem with feature independence assumption
Main Advantage	Easy to interpret, visual	Simple, no explicit training phase	Fast, works well with small data
Main Disadvantage	Prone to overfitting	Slow prediction (compares all points)	Assumes feature independence (often unrealistic)
Best Use Case	When explainability matters (credit decisions)	When similar items cluster together (recommendations)	Text classification (spam detection)
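To make the KNN column concrete, here is a minimal sketch of majority-vote classification in plain Python. The function name `knn_predict` and the toy customer data are made up for illustration; a real project would more likely use a library implementation.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Distance to EVERY training point: this is why prediction slows as data grows.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (made up): two numeric features per customer.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["budget", "budget", "budget", "premium", "premium", "premium"]

print(knn_predict(points, labels, query=(1.1, 1.0)))  # budget
print(knn_predict(points, labels, query=(5.1, 5.0)))  # premium
```

Note that there is no training step at all; the whole cost is paid at prediction time.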

Q4.2 (10 points)

Scenario: Build email spam classifier with:

1 million emails (large dataset)
Need real-time predictions (<100ms)
Binary: Spam or Not Spam

Which algorithm? Why not the others?

💡 Answer & Solution

Best Choice: Naive Bayes

Reasons:

Very fast prediction - perfect for real-time (<100ms)
Works great for text classification (spam detection is classic NB use case)
Handles high-dimensional data well (many word features)
Scales to large datasets efficiently

Why NOT Decision Tree:

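Can overfit with many sparse text features
Each path through a single tree uses only a few word features, so accuracy on high-dimensional text tends to suffer
Prediction speed is acceptable, but the model is less suited to text data than Naive Bayes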

Why NOT KNN:

Too slow at prediction time with 1 million emails
Must compare each new email against ALL training examples
Real-time requirement (<100ms) is very hard to meet without specialized indexing
Memory intensive (stores all training data)

Summary:

Naive Bayes: ✅ Fast, good for text, scalable
Decision Tree: ⚠️ Possible but not optimal for text
KNN: ❌ Too slow for large data + real-time
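The speed argument for Naive Bayes is easier to see in code. Below is a minimal sketch of a word-count Naive Bayes classifier with add-one (Laplace) smoothing, written in plain Python. The function names and the six toy messages are made up for illustration and are not a production spam filter.

```python
from collections import Counter
import math


def train(messages, labels):
    """Count words per class; these counts are all the model needs."""
    word_counts = {label: Counter() for label in set(labels)}
    class_counts = Counter(labels)
    for text, label in zip(messages, labels):
        word_counts[label].update(text.lower().split())
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    return word_counts, class_counts, vocabulary


def predict(text, word_counts, class_counts, vocabulary):
    """Return the class with the highest log posterior."""
    total_messages = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # Log prior plus a sum of per-word log likelihoods:
        # the sum is where the "naive" independence assumption enters.
        score = math.log(class_counts[label] / total_messages)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            count = word_counts[label][word] + 1  # add-one smoothing
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Toy training set (made up for illustration).
messages = ["win free prize now", "free money win", "claim your free prize",
            "meeting at noon tomorrow", "project report attached",
            "lunch tomorrow at noon"]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = train(messages, labels)
print(predict("free prize money", *model))  # spam
print(predict("report at noon", *model))    # ham
```

Prediction cost depends on the number of words in the new email, not on the number of training emails, which is why this approach fits the real-time requirement where KNN does not.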

🎁 Bonus: Correlation vs Causation (5 extra points)

Explain the difference. Give an example where two variables are correlated but NOT causally related.

💡 Answer & Solution

Correlation: Two variables move together (positive or negative relationship)

Causation: One variable directly causes changes in another

Key Difference: Correlation ≠ Causation. Just because A and B move together doesn't mean A causes B (or B causes A).

Example:

Ice cream sales and drowning deaths are positively correlated
But ice cream doesn't CAUSE drowning!
Confounding variable: Hot weather
- Hot weather → More ice cream sales
- Hot weather → More swimming → More drownings

Other Examples:

Shoe size correlates with reading ability (age is the confounding variable)
Nicolas Cage movies correlate with swimming pool drownings (pure coincidence)
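A small simulation shows how a confounder can produce correlation without causation. In the sketch below, all numbers are made up: temperature drives both ice cream sales and drownings, and neither series influences the other. The helper `pearson` is a hand-written Pearson correlation so that the example needs only the standard library.

```python
import random

random.seed(0)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Made-up daily data for one year; temperature is the confounding variable.
temperature = [random.uniform(10, 35) for _ in range(365)]
ice_cream_sales = [20 * t + random.gauss(0, 50) for t in temperature]
drownings = [0.1 * t + random.gauss(0, 0.5) for t in temperature]

r = pearson(ice_cream_sales, drownings)
print(f"Correlation between ice cream sales and drownings: {r:.2f}")
```

The two series come out strongly positively correlated even though the code never lets one affect the other.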

🏁 End of Exam

Block	Topic	Points
Block 1	Analytics Types	25
Block 2	Lifecycle	25
Block 3	Regression	25
Block 4	Data Mining	25
Total		100
Bonus	Correlation/Causation	+5

📝 Quick Reference

Analytics Types:

Descriptive: What happened?
Diagnostic: Why?
Predictive: What will happen?
Prescriptive: What should we do?

Lifecycle: Discovery → Data Prep → Model Plan → Model Build → Communicate → Operationalize

Regression: Coefficient = change in predicted Y per 1-unit change in X, holding other variables constant

R²: % of variance explained by model

Show your work for partial credit. Good luck!
