ABW505 Mock Exam 1: Python & Machine Learning (Verified Answers)
📋 Exam Information
| Item | Details |
|---|---|
| Total Points | 100 |
| Time Allowed | 90 minutes |
| Format | Closed book, calculator allowed |
| Structure | Q1 (20pts) + Q2 (30pts, choose 3/5) + Q3 (25pts) + Q4 (25pts) |
Question 1: Python Output Analysis (20 points)
Answer ALL questions. Determine exact output.
Q1.1 (5 points)
x = 15
y = 4
print((x // y) ** 2 + x % y)
Step-by-step breakdown:
# Given values
x = 15
y = 4
# Step 1: Floor division x // y
# 15 // 4 = 3 (integer part of 15/4 = 3.75)
floor_result = 15 // 4 # = 3
# Step 2: Modulo x % y
# 15 % 4 = 3 (remainder when 15 divided by 4)
# 15 = 4 × 3 + 3, so remainder is 3
mod_result = 15 % 4 # = 3
# Step 3: Power (floor_result) ** 2
# 3 ** 2 = 9
power_result = 3 ** 2 # = 9
# Step 4: Addition
# 9 + 3 = 12
final = 9 + 3 # = 12
Answer: 12
Key operators explained:
| Operator | Name | Example |
|---|---|---|
| `//` | Floor division | 15 // 4 = 3 |
| `%` | Modulo | 15 % 4 = 3 |
| `**` | Exponentiation | 3 ** 2 = 9 |
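The three operators can be checked interactively. The sketch below is only an illustration: it reuses `x = 15` and `y = 4` from the question, and the negative-operand and precedence cases are extra examples that are not part of the exam.

```python
# Floor division and modulo are linked by: x == (x // y) * y + (x % y)
x, y = 15, 4
quotient, remainder = divmod(x, y)  # same pair as (x // y, x % y)
print(quotient, remainder)          # 3 3
print(quotient * y + remainder)     # 15

# With a negative operand, // rounds toward negative infinity
# and % takes the sign of the divisor.
print(-15 // 4)  # -4
print(-15 % 4)   # 1

# ** binds tighter than unary minus and is right-associative.
print(-3 ** 2)      # -9
print(2 ** 3 ** 2)  # 512
```

This is why the parentheses in `(x // y) ** 2` matter: without them, `y ** 2` would be evaluated first.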
Q1.2 (5 points)
numbers = [10, 20, 30, 40, 50]
numbers[1:4] = [100]
print(len(numbers))
print(numbers[2])
Step-by-step breakdown:
# Original list
numbers = [10, 20, 30, 40, 50]
# Indices: 0 1 2 3 4
# Slice assignment: numbers[1:4] = [100]
# This replaces elements at indices 1, 2, 3 with a single element 100
# Before: [10, 20, 30, 40, 50]
# ^^^^^^^^^^^^ <- indices 1:4 (elements 20, 30, 40)
# After: [10, 100, 50]
# ^^^ <- replaced with single element
# Result after slice assignment
# numbers = [10, 100, 50]
# Indices: 0 1 2
# len(numbers) = 3 (was 5, replaced 3 elements with 1)
# numbers[2] = 50 (third element)
Answers:
- `len(numbers)` → 3
- `numbers[2]` → 50
Important concept: Slice assignment can change list size! Here we replaced 3 elements (indices 1, 2, 3) with 1 element, reducing length from 5 to 3.
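A short sketch of the same idea in three directions: shrinking, growing, and deleting. The variable names `shrunk`, `grown`, and `trimmed` are made up for this illustration.

```python
numbers = [10, 20, 30, 40, 50]

# Shrink: three elements replaced by one (the case from the question)
shrunk = numbers.copy()
shrunk[1:4] = [100]
print(shrunk)   # [10, 100, 50]

# Grow: one element replaced by three
grown = numbers.copy()
grown[1:2] = [21, 22, 23]
print(grown)    # [10, 21, 22, 23, 30, 40, 50]

# Delete: assigning an empty list removes the slice entirely
trimmed = numbers.copy()
trimmed[1:4] = []
print(trimmed)  # [10, 50]
```

In every case the right-hand side must be an iterable; its length does not need to match the length of the slice.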
Q1.3 (5 points)
def mystery(a, b=5, c=10):
    return a * 2 + b - c
result = mystery(3, c=4)
print(result)
Step-by-step breakdown:
# Function definition
def mystery(a, b=5, c=10):
    # a: required parameter
    # b: optional, default = 5
    # c: optional, default = 10
    return a * 2 + b - c
# Function call: mystery(3, c=4)
# a = 3 (positional argument, first position)
# b = 5 (uses default value, NOT provided in call)
# c = 4 (keyword argument, overrides default of 10)
# Calculation:
# a * 2 + b - c
# = 3 * 2 + 5 - 4
# = 6 + 5 - 4
# = 7
Answer: 7
Key concept: Keyword arguments (c=4) allow you to skip over parameters with defaults. Here b uses its default value of 5.
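To see how the binding changes with the call style, the sketch below calls the same `mystery` function in several ways. Only the call `mystery(3, c=4)` comes from the exam; the other calls are added for comparison.

```python
def mystery(a, b=5, c=10):
    return a * 2 + b - c

print(mystery(3))            # 1  -> 3*2 + 5 - 10 (both defaults used)
print(mystery(3, 4))         # 0  -> 3*2 + 4 - 10 (4 binds to b by position)
print(mystery(3, c=4))       # 7  -> 3*2 + 5 - 4  (b keeps its default)
print(mystery(c=4, a=3))     # 7  -> keyword order does not matter
print(mystery(3, c=4, b=0))  # 2  -> 3*2 + 0 - 4
```

Note that positional arguments must come before keyword arguments; writing `mystery(c=4, 3)` is a syntax error.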
Q1.4 (5 points)
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
total = 0
for key in data:
    total += sum(data[key])
print(total)
Step-by-step breakdown:
# Dictionary with lists as values
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
total = 0
# Iterating over a dictionary gives KEYS, not values
for key in data: # key will be 'A', then 'B'
    # First iteration: key = 'A'
    # data['A'] = [1, 2, 3]
    # sum([1, 2, 3]) = 6
    # total = 0 + 6 = 6
    # Second iteration: key = 'B'
    # data['B'] = [4, 5, 6]
    # sum([4, 5, 6]) = 15
    # total = 6 + 15 = 21
    total += sum(data[key])
print(total) # 21
Answer: 21
Calculation summary:
- sum([1, 2, 3]) = 6
- sum([4, 5, 6]) = 15
- Total = 6 + 15 = 21
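The same total can be reached by iterating over values or key-value pairs instead of keys. This is a sketch for comparison only; the names `total_from_keys`, `total_from_values`, and `per_key` are illustrative and do not appear in the exam.

```python
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}

# Iterating over the dict itself yields keys (as in the question)
total_from_keys = sum(sum(data[key]) for key in data)

# .values() skips the lookup and yields the lists directly
total_from_values = sum(sum(values) for values in data.values())

# .items() yields (key, value) pairs, handy for per-key results
per_key = {key: sum(values) for key, values in data.items()}

print(total_from_keys, total_from_values)  # 21 21
print(per_key)                             # {'A': 6, 'B': 15}
```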
Question 2: Code Writing (30 points)
Choose 3 out of 5 questions. Each worth 10 points.
Q2.1 - Grade Calculator (10 points)
Write a function grade_calculator(score) that:
- Returns letter grade: 90+ → "A", 80+ → "B", 70+ → "C", 60+ → "D", <60 → "F"
- Returns "Invalid" for scores < 0 or > 100
Verified Answer:
def grade_calculator(score):
    """
    Convert numeric score to letter grade.
    Args:
        score: Numeric score (expected range: 0-100)
    Returns:
        str: Letter grade (A/B/C/D/F) or "Invalid" for out-of-range scores
    Examples:
        >>> grade_calculator(95)
        'A'
        >>> grade_calculator(-5)
        'Invalid'
    """
    # STEP 1: Validate input range FIRST
    # Must check invalid cases before checking grade ranges
    if score < 0 or score > 100:
        return "Invalid"
    # STEP 2: Check grades from highest to lowest
    # Using elif ensures only one condition matches
    if score >= 90:
        return "A" # 90-100
    elif score >= 80:
        return "B" # 80-89
    elif score >= 70:
        return "C" # 70-79
    elif score >= 60:
        return "D" # 60-69
    else:
        return "F" # 0-59
# ===== Test Cases =====
if __name__ == "__main__":
    test_cases = [
        (95, "A"),
        (85, "B"),
        (73, "C"),
        (65, "D"),
        (45, "F"),
        (-5, "Invalid"),
        (105, "Invalid"),
        (100, "A"), # Edge case: exactly 100
        (0, "F"), # Edge case: exactly 0
    ]
    print("Testing grade_calculator:")
    for score, expected in test_cases:
        result = grade_calculator(score)
        status = "✓" if result == expected else "✗"
        print(f" {status} grade_calculator({score}) = {result} (expected {expected})")
Test Output:
Testing grade_calculator:
✓ grade_calculator(95) = A (expected A)
✓ grade_calculator(85) = B (expected B)
✓ grade_calculator(73) = C (expected C)
✓ grade_calculator(65) = D (expected D)
✓ grade_calculator(45) = F (expected F)
✓ grade_calculator(-5) = Invalid (expected Invalid)
✓ grade_calculator(105) = Invalid (expected Invalid)
✓ grade_calculator(100) = A (expected A)
✓ grade_calculator(0) = F (expected F)
Common mistakes to avoid:
- Not validating input range first
- Using multiple `if` statements instead of `elif`
- Checking in wrong order (e.g., 60+ before 90+)
Q2.2 - Remove Duplicates (10 points)
Write a function remove_duplicates(lst) that:
- Removes duplicates from a list
- Preserves the order of first occurrence
- Example: [1, 2, 2, 3, 1, 4] → [1, 2, 3, 4]
Verified Answer:
def remove_duplicates(lst):
    """
    Remove duplicate elements while preserving order of first occurrence.
    Args:
        lst: Input list with possible duplicates
    Returns:
        list: New list with duplicates removed, order preserved
    Examples:
        >>> remove_duplicates([1, 2, 2, 3, 1, 4])
        [1, 2, 3, 4]
    """
    # Track elements we've already seen
    seen = []
    # Iterate through original list
    for item in lst:
        # Only add to result if not seen before
        if item not in seen:
            seen.append(item)
    return seen
# Alternative approach using dictionary (Python 3.7+ preserves order)
def remove_duplicates_v2(lst):
    """
    Remove duplicates using dict.fromkeys() - more efficient for large lists.
    Works because dictionaries preserve insertion order in Python 3.7+.
    """
    return list(dict.fromkeys(lst))
# ===== Test Cases =====
if __name__ == "__main__":
    test_cases = [
        [1, 2, 2, 3, 1, 4], # Basic case
        [5, 5, 5, 5], # All duplicates
        [1, 2, 3, 4], # No duplicates
        [], # Empty list
        ['a', 'b', 'a', 'c'], # Strings
    ]
    print("Testing remove_duplicates:")
    for test in test_cases:
        result = remove_duplicates(test)
        print(f" {test} → {result}")
Test Output:
Testing remove_duplicates:
[1, 2, 2, 3, 1, 4] → [1, 2, 3, 4]
[5, 5, 5, 5] → [5]
[1, 2, 3, 4] → [1, 2, 3, 4]
[] → []
['a', 'b', 'a', 'c'] → ['a', 'b', 'c']
Why not use set()? Sets don't preserve order! list(set([1, 2, 2, 3, 1, 4])) might give [1, 2, 3, 4] but order is NOT guaranteed.
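A set can still help, as long as it is used only for membership checks while a separate list keeps the order. The function name `remove_duplicates_fast` below is hypothetical, and the sketch assumes every element is hashable.

```python
def remove_duplicates_fast(lst):
    """Order-preserving de-duplication with O(1) average membership checks.

    Assumes all items are hashable (ints, strings, tuples, ...).
    """
    seen = set()   # fast lookup only; its iteration order is never used
    result = []    # keeps the order of first occurrence
    for item in lst:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


print(remove_duplicates_fast([1, 2, 2, 3, 1, 4]))    # [1, 2, 3, 4]
print(remove_duplicates_fast(['a', 'b', 'a', 'c']))  # ['a', 'b', 'c']
```

Compared with checking `item not in seen` on a list, which scans the list each time, this keeps the overall work roughly linear in the length of the input.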
Q2.3 - Fibonacci Sequence (10 points)
Write a function fibonacci(n) that:
- Returns the first n Fibonacci numbers as a list
- Sequence: 0, 1, 1, 2, 3, 5, 8, 13...
Verified Answer:
def fibonacci(n):
    """
    Generate the first n Fibonacci numbers.
    Fibonacci sequence: Each number is the sum of the two preceding ones.
    Starts with 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, ...
    Args:
        n: Number of Fibonacci numbers to generate (non-negative integer)
    Returns:
        list: First n Fibonacci numbers
    Examples:
        >>> fibonacci(5)
        [0, 1, 1, 2, 3]
        >>> fibonacci(0)
        []
    """
    # Handle edge cases
    if n <= 0:
        return [] # No numbers requested
    if n == 1:
        return [0] # Only first number
    # Initialize with first two Fibonacci numbers
    result = [0, 1]
    # Generate remaining numbers
    for i in range(2, n):
        # Each new number = sum of last two numbers
        # Using negative indexing: result[-1] is last, result[-2] is second-to-last
        next_num = result[-1] + result[-2]
        result.append(next_num)
    return result
# ===== Test Cases =====
if __name__ == "__main__":
    test_cases = [0, 1, 2, 5, 8, 10]
    print("Testing fibonacci:")
    for n in test_cases:
        result = fibonacci(n)
        print(f" fibonacci({n}) = {result}")
Test Output:
Testing fibonacci:
fibonacci(0) = []
fibonacci(1) = [0]
fibonacci(2) = [0, 1]
fibonacci(5) = [0, 1, 1, 2, 3]
fibonacci(8) = [0, 1, 1, 2, 3, 5, 8, 13]
fibonacci(10) = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
How it works:
Position: 0 1 2 3 4 5 6 7
Value:    0 1 1 2 3 5 8 13
From position 2 onward, each value is the sum of the two values before it:
0+1=1, 1+1=2, 1+2=3, 2+3=5, 3+5=8, 5+8=13
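The "sum of the previous two" rule can also be written with a rolling pair instead of list indexing. This is an alternative sketch, not the exam's reference answer, and `fibonacci_pairs` is a made-up name.

```python
def fibonacci_pairs(n):
    """Return the first n Fibonacci numbers using a rolling (a, b) pair."""
    result = []
    a, b = 0, 1
    # range(n) is empty for n <= 0, so no separate edge-case branch is needed
    for _ in range(n):
        result.append(a)
        a, b = b, a + b  # shift the window: (previous, current) -> (current, next)
    return result


print(fibonacci_pairs(8))  # [0, 1, 1, 2, 3, 5, 8, 13]
print(fibonacci_pairs(1))  # [0]
print(fibonacci_pairs(0))  # []
```

The tuple assignment evaluates the right-hand side first, so `a + b` uses the old value of `a`.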
Q2.4 - Prime Number Check (10 points)
Write a function is_prime(num) that:
- Returns True if num is prime, False otherwise
- Handle edge cases (num < 2)
Verified Answer:
def is_prime(num):
    """
    Check if a number is prime.
    A prime number is a natural number greater than 1 that has no positive
    divisors other than 1 and itself.
    Args:
        num: Integer to check
    Returns:
        bool: True if prime, False otherwise
    Examples:
        >>> is_prime(7)
        True
        >>> is_prime(12)
        False
    """
    # Numbers less than 2 are not prime by definition
    # This handles 0, 1, and negative numbers
    if num < 2:
        return False
    # 2 is the only even prime number
    if num == 2:
        return True
    # All other even numbers are not prime
    # (They're divisible by 2)
    if num % 2 == 0:
        return False
    # Check odd divisors from 3 up to √num
    # Why √num? If num = a × b, at least one of a,b must be ≤ √num
    # If no divisor found up to √num, num is prime
    for i in range(3, int(num ** 0.5) + 1, 2): # Step by 2 (odd numbers only)
        if num % i == 0:
            return False # Found a divisor, not prime
    return True # No divisors found, it's prime
# ===== Test Cases =====
if __name__ == "__main__":
    test_cases = [
        (1, False), # Not prime (less than 2)
        (2, True), # Prime (smallest prime)
        (3, True), # Prime
        (4, False), # Not prime (2 × 2)
        (7, True), # Prime
        (9, False), # Not prime (3 × 3)
        (11, True), # Prime
        (25, False), # Not prime (5 × 5)
        (29, True), # Prime
        (-5, False), # Negative, not prime
    ]
    print("Testing is_prime:")
    for num, expected in test_cases:
        result = is_prime(num)
        status = "✓" if result == expected else "✗"
        print(f" {status} is_prime({num}) = {result}")
Test Output:
Testing is_prime:
✓ is_prime(1) = False
✓ is_prime(2) = True
✓ is_prime(3) = True
✓ is_prime(4) = False
✓ is_prime(7) = True
✓ is_prime(9) = False
✓ is_prime(11) = True
✓ is_prime(25) = False
✓ is_prime(29) = True
✓ is_prime(-5) = False
Optimization: Checking up to √n instead of n reduces time complexity from O(n) to O(√n).
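For very large integers, `int(num ** 0.5)` goes through floating point and may be off by one. A possible variant, assuming Python 3.8 or later, uses `math.isqrt`, which returns the exact integer square root. The name `is_prime_isqrt` is hypothetical.

```python
import math


def is_prime_isqrt(num):
    """Trial division up to the exact integer square root of num."""
    if num < 2:
        return False
    if num < 4:
        return True   # 2 and 3 are prime
    if num % 2 == 0:
        return False
    # math.isqrt(num) is the largest integer r with r * r <= num
    for i in range(3, math.isqrt(num) + 1, 2):
        if num % i == 0:
            return False
    return True


print([n for n in range(30) if is_prime_isqrt(n)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

The loop still performs on the order of √n iterations, so the complexity argument above is unchanged.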
Q2.5 - Tuple Statistics (10 points)
Write a function tuple_stats(data) that:
- Input: tuple of numbers
- Return: tuple of (min, max, average rounded to 2 decimals)
Verified Answer:
def tuple_stats(data):
    """
    Calculate statistics for a tuple of numbers.
    Args:
        data: Tuple of numeric values
    Returns:
        tuple: (minimum, maximum, average) where average is rounded to 2 decimals
    Raises:
        ValueError: If tuple is empty
    Examples:
        >>> tuple_stats((10, 20, 30, 40))
        (10, 40, 25.0)
    """
    # Handle empty tuple edge case
    if len(data) == 0:
        raise ValueError("Cannot compute stats for empty tuple")
    # Calculate statistics using built-in functions
    minimum = min(data) # Smallest value
    maximum = max(data) # Largest value
    average = sum(data) / len(data) # Arithmetic mean
    # Round average to 2 decimal places
    average = round(average, 2)
    # Return as tuple (note: using parentheses to make it clear)
    return (minimum, maximum, average)
# ===== Test Cases =====
if __name__ == "__main__":
    test_cases = [
        (10, 20, 30, 40), # Even spread
        (5, 15, 25), # Odd count
        (7,), # Single element
        (1, 2, 3, 4, 5, 6, 7, 8, 9, 10), # Larger tuple
    ]
    print("Testing tuple_stats:")
    for data in test_cases:
        result = tuple_stats(data)
        print(f" tuple_stats({data})")
        print(f" → (min={result[0]}, max={result[1]}, avg={result[2]})")
Test Output:
Testing tuple_stats:
tuple_stats((10, 20, 30, 40))
→ (min=10, max=40, avg=25.0)
tuple_stats((5, 15, 25))
→ (min=5, max=25, avg=15.0)
tuple_stats((7,))
→ (min=7, max=7, avg=7.0)
tuple_stats((1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
→ (min=1, max=10, avg=5.5)
Question 3: Pandas & ML Basics (25 points)
Part A: Theory (10 points)
Q3.A1 (5 points) Explain the difference between fit() and predict() in scikit-learn.
Answer:
| Method | Purpose | When Called | What It Does |
|---|---|---|---|
| `fit()` | Train the model | Once, on training data | Learns patterns/parameters from data |
| `predict()` | Use the model | On test/new data | Applies learned patterns to make predictions |
Workflow example:
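# Step 1: Create model
# (X_train, y_train, and X_test are assumed to come from an earlier train-test split)
from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier()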
# Step 2: Train model (learn from training data)
model.fit(X_train, y_train) # Learns patterns
# Step 3: Use model (apply to new data)
predictions = model.predict(X_test) # Makes predictions
Analogy:
- `fit()` = studying for an exam
- `predict()` = taking the exam
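To make the division of labour concrete without any library, here is a toy model that follows the same `fit`/`predict` contract. `MajorityClassifier` and the tiny dataset are invented for this sketch; it is not a scikit-learn class.

```python
from collections import Counter


class MajorityClassifier:
    """Toy model: always predicts the most common label seen during fit()."""

    def fit(self, X, y):
        # Learning step: the only "parameter" is the majority label
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        # Inference step: apply the learned parameter to every row
        return [self.majority_ for _ in X]


X_train = [[1], [2], [3], [4]]
y_train = ['spam', 'spam', 'ham', 'spam']
X_test = [[5], [6]]

model = MajorityClassifier()
model.fit(X_train, y_train)    # needs features AND labels
print(model.predict(X_test))   # needs features only -> ['spam', 'spam']
```

The key point carries over to real estimators: `fit` receives labels and changes the model's state, while `predict` receives only features and leaves the model unchanged.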
Q3.A2 (5 points) Why do we need train-test split? Why not use all data for training?
Answer:
Why train-test split is essential:
- Evaluate on unseen data: We need to test how the model performs on data it hasn't seen during training.
- Detect overfitting: If we train and test on the same data, the model might just memorize the answers (overfitting). Train-test split reveals if the model generalizes well.
- Simulate real-world usage: In production, the model will encounter new, unseen data. Testing on held-out data simulates this.
- Get honest performance estimate: Training accuracy is often misleadingly high; test accuracy gives a realistic measure.
What happens without split:
- Model could achieve 100% accuracy on training data
- But fail completely on new data
- No way to detect this problem until deployment
Typical split ratios:
- 80/20 (training/test)
- 70/30 (training/test)
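The mechanics of an 80/20 split can be sketched in plain Python. The helper `simple_train_test_split`, its parameters, and the seed value are assumptions made for this illustration; in practice a library routine such as scikit-learn's `train_test_split` would normally be used.

```python
import random


def simple_train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of rows and cut it into (train, test) lists."""
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    shuffled = rows[:]          # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))
train, test = simple_train_test_split(rows, test_ratio=0.2)
print(len(train), len(test))    # 8 2
print(set(train) & set(test))   # set() -> no row appears in both parts
```

Shuffling before cutting matters when the data is ordered (for example by date or by class), because otherwise the test set would not resemble the training set.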
Part B: Pandas Code (15 points)
Given this DataFrame:
import pandas as pd
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, None, 28],
    'Salary': [50000, 60000, 75000, 65000, None],
    'Department': ['IT', 'HR', 'IT', 'Finance', 'HR']
}
df = pd.DataFrame(data)
Q3.B1 (5 points) Fill missing Age with mean, missing Salary with 55000.
Verified Answer:
import pandas as pd
# Create the DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, None, 28],
    'Salary': [50000, 60000, 75000, 65000, None],
    'Department': ['IT', 'HR', 'IT', 'Finance', 'HR']
}
df = pd.DataFrame(data)
print("Before filling:")
print(df)
print()
# Fill missing values by assigning the result of fillna() back to the column.
# (Calling df['Age'].fillna(..., inplace=True) is chained assignment: recent
# pandas versions warn about it, and under Copy-on-Write it leaves df unchanged.)
# Fill missing Age with mean
age_mean = df['Age'].mean() # Calculate mean first (ignores NaN)
print(f"Age mean (excluding NaN): {age_mean}") # = 29.5
df['Age'] = df['Age'].fillna(age_mean)
# Fill missing Salary with 55000
df['Salary'] = df['Salary'].fillna(55000)
print("\nAfter filling:")
print(df)
Output:
Before filling:
Name Age Salary Department
0 Alice 25.0 50000.0 IT
1 Bob 30.0 60000.0 HR
2 Charlie 35.0 75000.0 IT
3 David NaN 65000.0 Finance
4 Eve 28.0 NaN HR
Age mean (excluding NaN): 29.5
After filling:
Name Age Salary Department
0 Alice 25.0 50000.0 IT
1 Bob 30.0 60000.0 HR
2 Charlie 35.0 75000.0 IT
3 David 29.5 65000.0 Finance
4 Eve 28.0 55000.0 HR
Note: mean() automatically ignores NaN values when calculating.
Q3.B2 (5 points) Calculate average salary by Department.
Verified Answer:
# Group by Department and calculate mean of Salary
avg_salary_by_dept = df.groupby('Department')['Salary'].mean()
print("Average Salary by Department:")
print(avg_salary_by_dept)
Output (after filling missing values):
Average Salary by Department:
Department
Finance 65000.0
HR 57500.0
IT 62500.0
Name: Salary, dtype: float64
Calculation breakdown:
- Finance: 65000 (only David)
- HR: (60000 + 55000) / 2 = 57500 (Bob + Eve)
- IT: (50000 + 75000) / 2 = 62500 (Alice + Charlie)
Q3.B3 (5 points) Select IT employees with Age > 26.
Verified Answer:
# Filter with multiple conditions
# IMPORTANT: Use & for AND, | for OR
# IMPORTANT: Wrap each condition in parentheses
result = df[(df['Department'] == 'IT') & (df['Age'] > 26)]
print("IT employees with Age > 26:")
print(result)
Output:
IT employees with Age > 26:
Name Age Salary Department
2 Charlie 35.0 75000.0 IT
Syntax rules for pandas filtering:
- Use `&` instead of `and`
- Use `|` instead of `or`
- Use `~` instead of `not`
- Wrap each condition in parentheses
Wrong: df[df['A'] == 1 and df['B'] == 2]
Correct: df[(df['A'] == 1) & (df['B'] == 2)]
Question 4: Decision Tree & Naive Bayes (25 points)
Part A: Theory (10 points)
Q4.A1 (5 points) List THREE advantages of Decision Trees.
Answer:
- Easy to interpret and visualize
  - Can draw the tree and follow decision paths
  - Non-technical stakeholders can understand the logic
  - "If-then" rules are intuitive
- No feature scaling required
  - Works directly with raw data values
  - Unlike SVM or KNN, doesn't need normalization
  - Saves preprocessing time
- Handles both numerical and categorical data
  - Can split on continuous values (Age > 30)
  - Can split on categories (Color == 'Red')
  - Versatile for mixed datasets
- Captures non-linear relationships
  - Can model complex decision boundaries
  - Doesn't assume linear separability
- Shows feature importance
  - Reveals which features matter most
  - Helps with feature selection
Q4.A2 (5 points) Gini Index vs Information Gain. Which does CART use?
Answer:
| Metric | Formula | Range (binary) | Used By |
|---|---|---|---|
| Gini Index | 1 - Σ(pᵢ²) | 0 to 0.5 | CART |
| Entropy/Information Gain | -Σ(pᵢ log₂ pᵢ) | 0 to 1 | ID3, C4.5 |
CART (Classification and Regression Trees) uses Gini Index.
Why Gini?
- Faster to compute (no logarithm)
- Similar results to entropy in practice
- Slightly favors larger partitions
Interpretation:
- Gini = 0 → Pure node (all same class)
- Gini = 0.5 → Maximum impurity (50/50 split)
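Both impurity measures from the table are short enough to write out directly. The function names `gini` and `entropy` and the convention of passing class counts as a list are choices made for this sketch.

```python
import math


def gini(counts):
    """Gini impurity 1 - sum(p_i^2), computed from class counts."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


def entropy(counts):
    """Entropy -sum(p_i * log2(p_i)); empty classes contribute nothing."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


print(gini([10, 0]))    # 0.0 -> pure node
print(gini([5, 5]))     # 0.5 -> maximum impurity for two classes
print(entropy([5, 5]))  # 1.0 -> maximum entropy for two classes
```

The `if c > 0` filter in `entropy` avoids evaluating `log2(0)`, which would raise an error for a class with no samples.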
Part B: Gini Calculation (15 points)
Scenario: Email classification with 20 emails (12 Spam, 8 Not Spam)
Split 1 - "Contains free":
- Contains "free": 10 emails (9 Spam, 1 Not Spam)
- No "free": 10 emails (3 Spam, 7 Not Spam)
Split 2 - "Contains meeting":
- Contains "meeting": 8 emails (2 Spam, 6 Not Spam)
- No "meeting": 12 emails (10 Spam, 2 Not Spam)
Q4.B1 (8 points) Calculate Gini Index for Split 1.
Verified Answer:
Formula: Gini = 1 - Σ(pᵢ²)
Step 1: Gini for "Contains free" node (10 emails: 9 Spam, 1 Not Spam)
P(Spam) = 9/10 = 0.9
P(Not Spam) = 1/10 = 0.1
Gini = 1 - (0.9² + 0.1²)
= 1 - (0.81 + 0.01)
= 1 - 0.82
= 0.18
Step 2: Gini for "No free" node (10 emails: 3 Spam, 7 Not Spam)
P(Spam) = 3/10 = 0.3
P(Not Spam) = 7/10 = 0.7
Gini = 1 - (0.3² + 0.7²)
= 1 - (0.09 + 0.49)
= 1 - 0.58
= 0.42
Step 3: Weighted Average Gini
Gini(Split 1) = (10/20) × 0.18 + (10/20) × 0.42
= 0.5 × 0.18 + 0.5 × 0.42
= 0.09 + 0.21
= 0.30
Answer: Split 1 Gini = 0.30
Q4.B2 (7 points) Calculate Gini Index for Split 2. Which split is better?
Verified Answer:
Step 1: Gini for "Contains meeting" node (8 emails: 2 Spam, 6 Not Spam)
P(Spam) = 2/8 = 0.25
P(Not Spam) = 6/8 = 0.75
Gini = 1 - (0.25² + 0.75²)
= 1 - (0.0625 + 0.5625)
= 1 - 0.625
= 0.375
Step 2: Gini for "No meeting" node (12 emails: 10 Spam, 2 Not Spam)
P(Spam) = 10/12 = 0.833
P(Not Spam) = 2/12 = 0.167
Gini = 1 - (0.833² + 0.167²)
= 1 - (0.694 + 0.028)
= 1 - 0.722
= 0.278
Step 3: Weighted Average Gini
Gini(Split 2) = (8/20) × 0.375 + (12/20) × 0.278
= 0.4 × 0.375 + 0.6 × 0.278
= 0.15 + 0.167
= 0.317
Answer: Split 2 Gini = 0.317
Comparison:
| Split | Gini Index |
|---|---|
| Split 1 ("free") | 0.30 ← Better |
| Split 2 ("meeting") | 0.317 |
Better split: Split 1 ("Contains free")
Reason: Lower Gini = Lower impurity = Better separation of classes
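The hand calculations for both splits can be double-checked with a few lines of Python. The helper `weighted_gini` and the (Spam, Not Spam) tuple layout are assumptions of this sketch; the counts are the ones given in the scenario.

```python
def gini(counts):
    """Gini impurity of one node from its class counts."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


def weighted_gini(children):
    """Size-weighted average of the child nodes' Gini impurities."""
    total = sum(sum(child) for child in children)
    return sum(sum(child) / total * gini(child) for child in children)


# Each child node is (Spam count, Not Spam count)
split_free = [(9, 1), (3, 7)]      # contains "free" / no "free"
split_meeting = [(2, 6), (10, 2)]  # contains "meeting" / no "meeting"

print(round(gini((12, 8)), 3))                 # 0.48  (parent node, before any split)
print(round(weighted_gini(split_free), 3))     # 0.3
print(round(weighted_gini(split_meeting), 3))  # 0.317
```

Both splits reduce impurity relative to the parent node (0.48), and the "free" split reduces it more, which matches the comparison table.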
🏁 End of Exam
| Question | Topic | Points |
|---|---|---|
| Q1 | Python Output Analysis | 20 |
| Q2 | Code Writing (choose 3/5) | 30 |
| Q3 | Pandas & ML Theory | 25 |
| Q4 | Decision Tree & Gini | 25 |
| Total | | 100 |
📝 Key Formulas Reference
| Concept | Formula |
|---|---|
| Gini Index | 1 - Σ(pᵢ²) |
| Entropy | -Σ pᵢ log₂(pᵢ) |
| Info Gain | H(parent) - Σ weighted H(children) |
| Bayes | P(A\|B) ∝ P(B\|A) × P(A) |
| Z-score | (x - μ) / σ |
| MinMax | (x - min) / (max - min) |
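The two scaling formulas in the table can be applied to a small list as follows. The sample `values` are made up, and the sketch assumes σ is the population standard deviation (`statistics.pstdev`); using the sample standard deviation instead would give slightly smaller z-scores.

```python
import statistics

values = [10, 20, 30, 40, 50]

# Z-score: (x - mean) / standard deviation
mu = statistics.mean(values)
sigma = statistics.pstdev(values)  # population standard deviation (assumption)
z_scores = [(x - mu) / sigma for x in values]
print([round(z, 3) for z in z_scores])  # [-1.414, -0.707, 0.0, 0.707, 1.414]

# Min-max: (x - min) / (max - min), maps the data onto [0, 1]
lo, hi = min(values), max(values)
min_max = [(x - lo) / (hi - lo) for x in values]
print(min_max)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```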
All code verified and tested. Show your work for partial credit. Good luck!