ABW505 Mock Exam 2: Python & Machine Learning (Verified Answers)

January 24, 2026
20 min read


📋 Exam Information

| Item | Details |
| --- | --- |
| Total Points | 100 |
| Time Allowed | 90 minutes |
| Format | Closed book, calculator allowed |
| Structure | Q1 (20 pts) + Q2 (30 pts, choose 3 of 5) + Q3 (25 pts) + Q4 (25 pts) |

Question 1: Python Output Analysis (20 points)

Answer ALL questions. Determine the exact output of each snippet.

Q1.1 (5 points)

a = [1, 2, 3]
b = a
b.append(4)
print(a)
print(a is b)

💡 Answer & Explanation

Step-by-step breakdown:

# Step 1: Create a list and assign to variable 'a'
a = [1, 2, 3]
# Memory: a points to list object [1, 2, 3]
 
# Step 2: Assign 'a' to 'b'
b = a
# IMPORTANT: This does NOT copy the list!
# Both 'a' and 'b' now point to the SAME list object in memory
# Memory: a → [1, 2, 3] ← b
 
# Step 3: Modify list through 'b'
b.append(4)
# Since a and b point to the same object, 
# the change is visible through both variables
# Memory: a → [1, 2, 3, 4] ← b
 
# Step 4: Print results
print(a)        # [1, 2, 3, 4] - modified through b
print(a is b)   # True - same object in memory

Answers:

  • print(a) → [1, 2, 3, 4]
  • print(a is b) → True

Key concept: In Python, assignment creates a REFERENCE, not a copy.

To create an independent (shallow) copy of the list:

b = a.copy()     # Method 1: copy() method
b = a[:]         # Method 2: slice notation
b = list(a)      # Method 3: list constructor
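
All three methods above make a shallow copy: the outer list is new, but nested objects are still shared. The sketch below uses a hypothetical nested list (not part of the exam question) to show the difference, with `copy.deepcopy` as the fully independent alternative.

```python
import copy

a = [[1, 2], [3, 4]]
shallow = a.copy()          # new outer list, same inner lists
deep = copy.deepcopy(a)     # new outer list AND new inner lists

shallow.append([5, 6])      # only the copy's outer list grows
shallow[0].append(99)       # inner list is shared, so 'a' sees this change
deep[1].append(77)          # fully independent, 'a' is unaffected

print(a)         # [[1, 2, 99], [3, 4]]
print(shallow)   # [[1, 2, 99], [3, 4], [5, 6]]
print(deep)      # [[1, 2], [3, 4, 77]]
print(a is shallow, a[0] is shallow[0])  # False True
```

For a flat list of numbers or strings, as in Q1.1, a shallow copy is enough.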

Q1.2 (5 points)

text = "Hello World"
print(text[0:5:2])
print(text[-5:-1])

💡 Answer & Explanation

Step-by-step breakdown:

text = "Hello World"
 
# Index map:
# Character: H   e   l   l   o       W   o   r   l   d
# Positive:  0   1   2   3   4   5   6   7   8   9   10
# Negative:-11 -10  -9  -8  -7  -6  -5  -4  -3  -2  -1
 
# Line 1: text[0:5:2]
# Format: [start:stop:step]
# start=0 (H), stop=5 (exclusive), step=2 (every 2nd char)
# Indices: 0, 2, 4 → Characters: 'H', 'l', 'o'
result1 = text[0:5:2]  # "Hlo"
 
# Line 2: text[-5:-1]
# start=-5 (W), stop=-1 (exclusive, before 'd')
# Indices: -5, -4, -3, -2 → Characters: 'W', 'o', 'r', 'l'
result2 = text[-5:-1]  # "Worl"

Answers:

  • text[0:5:2] → Hlo
  • text[-5:-1] → Worl

Slicing syntax: [start:stop:step]

  • start: inclusive (default: 0)
  • stop: exclusive (default: end)
  • step: increment (default: 1)
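
A few extra slices on the same string illustrate the defaults listed above. These go beyond what the question asks and are only meant as a quick self-check.

```python
text = "Hello World"

print(text[:5])     # Hello        (start defaults to 0)
print(text[6:])     # World        (stop defaults to the end)
print(text[::2])    # HloWrd       (every 2nd character)
print(text[::-1])   # dlroW olleH  (a negative step walks backwards)
print(text[0:100])  # Hello World  (an out-of-range stop is clamped, no error)
```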

Q1.3 (5 points)

def outer():
    x = 10
    def inner():
        nonlocal x
        x += 5
        return x
    return inner()
 
print(outer())
print(outer())

💡 Answer & Explanation

Step-by-step breakdown:

def outer():
    x = 10  # Local variable in outer's scope
    
    def inner():
        nonlocal x  # Refers to x in enclosing (outer) scope
        x += 5      # Modify outer's x: 10 + 5 = 15
        return x    # Return 15
    
    return inner()  # Call inner() and return its result
 
# First call: outer()
# - x starts at 10 in a NEW local scope
# - inner() adds 5: x = 15
# - Returns 15
print(outer())  # 15
 
# Second call: outer()
# - FRESH call creates NEW local scope
# - x starts at 10 again (not preserved from first call)
# - inner() adds 5: x = 15
# - Returns 15
print(outer())  # 15

Answers:

  • First print(outer()) → 15
  • Second print(outer()) → 15

Key concepts:

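  1. nonlocal rebinds a variable in the nearest enclosing function scope (not the global scope)
  2. Each call to outer() creates a fresh local scope
  3. x is NOT preserved between calls, because outer() returns the result of inner() rather than the inner function itself, so no closure survives the call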
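
For contrast, here is a variation that is not part of the exam question: if the outer function returns `inner` itself instead of calling it, the closure keeps `x` alive and the value does accumulate. The name `make_counter` is hypothetical.

```python
def make_counter():
    x = 10

    def inner():
        nonlocal x
        x += 5
        return x

    return inner   # return the function object, not inner()


counter = make_counter()
first = counter()          # 15
second = counter()         # 20: the same x is reused by the closure
fresh = make_counter()()   # 15: a new call creates a new, separate x

print(first, second, fresh)  # 15 20 15
```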

Q1.4 (5 points)

nums = [1, 2, 3, 4, 5]
result = [x**2 for x in nums if x % 2 == 1]
print(result)
print(sum(result))

💡 Answer & Explanation

Step-by-step breakdown:

nums = [1, 2, 3, 4, 5]
 
# List comprehension with filter
# Pattern: [expression for item in iterable if condition]
result = [x**2 for x in nums if x % 2 == 1]
 
# Step-by-step execution:
# x=1: 1 % 2 == 1? True  → 1**2 = 1   → include
# x=2: 2 % 2 == 1? False → skip
# x=3: 3 % 2 == 1? True  → 3**2 = 9   → include
# x=4: 4 % 2 == 1? False → skip
# x=5: 5 % 2 == 1? True  → 5**2 = 25  → include
 
# Result: [1, 9, 25]
 
print(result)       # [1, 9, 25]
print(sum(result))  # 1 + 9 + 25 = 35

Answers:

  • print(result) → [1, 9, 25]
  • print(sum(result)) → 35

Breakdown:

  • Filter: odd numbers only (1, 3, 5)
  • Transform: square each (1, 9, 25)
  • Sum: 1 + 9 + 25 = 35
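
The filter-then-transform breakdown above corresponds to the following plain loop. The name `result_loop` is only for illustration; both forms produce the same list.

```python
nums = [1, 2, 3, 4, 5]

# Explicit loop equivalent of the comprehension
result_loop = []
for x in nums:
    if x % 2 == 1:                  # filter: keep odd numbers
        result_loop.append(x ** 2)  # transform: square them

result_comp = [x**2 for x in nums if x % 2 == 1]

print(result_loop)                 # [1, 9, 25]
print(result_loop == result_comp)  # True
print(sum(result_loop))            # 35
```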

Question 2: Code Writing (30 points)

Choose 3 out of 5 questions. Each worth 10 points.

Q2.1 - Count Vowels (10 points)

Write a function count_vowels(text) that:

  • Counts vowels (a, e, i, o, u) - case insensitive
  • Returns the count as an integer

💡 Verified Answer

def count_vowels(text):
    """
    Count the number of vowels in a string.
    
    Vowels are: a, e, i, o, u (case insensitive)
    
    Args:
        text: Input string to analyze
        
    Returns:
        int: Number of vowels found
        
    Examples:
        >>> count_vowels("Hello World")
        3
        >>> count_vowels("AEIOU")
        5
    """
    # Define vowels (both cases for easy comparison)
    vowels = "aeiouAEIOU"
    
    # Initialize counter
    count = 0
    
    # Iterate through each character in the text
    for char in text:
        # Check if character is a vowel
        if char in vowels:
            count += 1
    
    return count
 
 
# Alternative: More Pythonic one-liner
def count_vowels_v2(text):
    """One-liner using generator expression and sum."""
    return sum(1 for char in text.lower() if char in 'aeiou')
 
 
# Alternative: Using count method
def count_vowels_v3(text):
    """Using str.count() for each vowel."""
    text_lower = text.lower()
    return sum(text_lower.count(v) for v in 'aeiou')
 
 
# ===== Test Cases =====
if __name__ == "__main__":
    test_cases = [
        ("Hello World", 3),      # e, o, o
        ("AEIOU", 5),            # all uppercase vowels
        ("rhythm", 0),           # no vowels
        ("", 0),                 # empty string
        ("AaEeIiOoUu", 10),      # mixed case
    ]
    
    print("Testing count_vowels:")
    for text, expected in test_cases:
        result = count_vowels(text)
        status = "✓" if result == expected else "✗"
        print(f'  {status} count_vowels("{text}") = {result} (expected {expected})')

Test Output:

Testing count_vowels:
  ✓ count_vowels("Hello World") = 3 (expected 3)
  ✓ count_vowels("AEIOU") = 5 (expected 5)
  ✓ count_vowels("rhythm") = 0 (expected 0)
  ✓ count_vowels("") = 0 (expected 0)
  ✓ count_vowels("AaEeIiOoUu") = 10 (expected 10)

Key points:

  1. Handle both uppercase and lowercase
  2. Use in operator for membership test
  3. Simple counter pattern

Q2.2 - Word Frequency Dictionary (10 points)

Write a function word_frequency(words) that:

  • Input: list of words
  • Return: dictionary with word counts
  • Example: ['a', 'b', 'a'] → {'a': 2, 'b': 1}

💡 Verified Answer

def word_frequency(words):
    """
    Count frequency of each word in a list.
    
    Args:
        words: List of words (strings)
        
    Returns:
        dict: Dictionary mapping each word to its count
        
    Examples:
        >>> word_frequency(['a', 'b', 'a'])
        {'a': 2, 'b': 1}
    """
    # Initialize empty frequency dictionary
    freq = {}
    
    # Count each word
    for word in words:
        if word in freq:
            # Word seen before - increment count
            freq[word] += 1
        else:
            # First occurrence - initialize count to 1
            freq[word] = 1
    
    return freq
 
 
# Alternative: Using dict.get()
def word_frequency_v2(words):
    """Using get() method to simplify logic."""
    freq = {}
    for word in words:
        # get(key, default) returns default if key doesn't exist
        freq[word] = freq.get(word, 0) + 1
    return freq
 
 
# Alternative: Using collections.Counter
def word_frequency_v3(words):
    """Using Counter from collections (most Pythonic)."""
    from collections import Counter
    return dict(Counter(words))
 
 
# ===== Test Cases =====
if __name__ == "__main__":
    test_cases = [
        ['a', 'b', 'a'],
        ['hello', 'world', 'hello', 'hello'],
        [],
        ['single'],
    ]
    
    print("Testing word_frequency:")
    for words in test_cases:
        result = word_frequency(words)
        print(f"  {words} → {result}")

Test Output:

Testing word_frequency:
  ['a', 'b', 'a'] → {'a': 2, 'b': 1}
  ['hello', 'world', 'hello', 'hello'] → {'hello': 3, 'world': 1}
  [] → {}
  ['single'] → {'single': 1}

Key techniques:

  1. Check if key exists before incrementing
  2. Alternative: use dict.get(key, default)
  3. Best practice: use collections.Counter

Q2.3 - Multiplication Table (10 points)

Write a function multiplication_table(n) that:

  • Prints an n×n multiplication table
  • Format: 1 x 1 = 1

💡 Verified Answer

def multiplication_table(n):
    """
    Print an n×n multiplication table.
    
    Args:
        n: Size of the table (positive integer)
        
    Example output for n=3:
        1 x 1 = 1
        1 x 2 = 2
        1 x 3 = 3
        2 x 1 = 2
        ...
        3 x 3 = 9
    """
    # Validate input
    if n <= 0:
        print("Please provide a positive integer.")
        return
    
    # Outer loop: rows (first multiplier)
    for i in range(1, n + 1):
        
        # Inner loop: columns (second multiplier)
        for j in range(1, n + 1):
            
            # Calculate product
            product = i * j
            
            # Print formatted result using f-string
            print(f"{i} x {j} = {product}")
 
 
# Alternative: Compact table format
def multiplication_table_compact(n):
    """Print table in grid format."""
    for i in range(1, n + 1):
        row = ""
        for j in range(1, n + 1):
            row += f"{i*j:4}"  # 4-character width for alignment
        print(row)
 
 
# ===== Test Cases =====
if __name__ == "__main__":
    print("3x3 Multiplication Table:")
    print("-" * 20)
    multiplication_table(3)
    
    print("\n3x3 Compact Format:")
    print("-" * 20)
    multiplication_table_compact(3)

Test Output:

3x3 Multiplication Table:
--------------------
1 x 1 = 1
1 x 2 = 2
1 x 3 = 3
2 x 1 = 2
2 x 2 = 4
2 x 3 = 6
3 x 1 = 3
3 x 2 = 6
3 x 3 = 9

3x3 Compact Format:
--------------------
   1   2   3
   2   4   6
   3   6   9

Key concepts:

  1. Nested loops for 2D iteration
  2. range(1, n+1) to start from 1
  3. f-strings for formatted output

Q2.4 - Find Max and Min (10 points)

Write a function find_max_min(numbers) that:

  • Input: list of numbers
  • Return: tuple (maximum, minimum, difference)
  • Handle empty list by returning (None, None, None)

💡 Verified Answer

def find_max_min(numbers):
    """
    Find maximum, minimum, and their difference in a list.
    
    Args:
        numbers: List of numeric values
        
    Returns:
        tuple: (maximum, minimum, difference) or (None, None, None) if empty
        
    Examples:
        >>> find_max_min([5, 2, 8, 1, 9])
        (9, 1, 8)
        >>> find_max_min([])
        (None, None, None)
    """
    # Handle empty list edge case
    # IMPORTANT: Check this first to avoid errors with min()/max()
    if not numbers:  # Empty list is falsy in Python
        return (None, None, None)
    
    # Find maximum and minimum using built-in functions
    maximum = max(numbers)
    minimum = min(numbers)
    
    # Calculate difference (range of values)
    difference = maximum - minimum
    
    return (maximum, minimum, difference)
 
 
# Alternative: Without using built-in min/max
def find_max_min_manual(numbers):
    """Manual implementation without min()/max()."""
    if not numbers:
        return (None, None, None)
    
    # Initialize with first element
    maximum = numbers[0]
    minimum = numbers[0]
    
    # Iterate through remaining elements
    for num in numbers[1:]:
        if num > maximum:
            maximum = num
        if num < minimum:
            minimum = num
    
    return (maximum, minimum, maximum - minimum)
 
 
# ===== Test Cases =====
if __name__ == "__main__":
    test_cases = [
        [5, 2, 8, 1, 9],      # Normal case
        [3],                   # Single element
        [],                    # Empty list
        [-5, -2, -8, -1],     # Negative numbers
        [1, 1, 1, 1],         # All same
    ]
    
    print("Testing find_max_min:")
    for nums in test_cases:
        result = find_max_min(nums)
        print(f"  {nums} → max={result[0]}, min={result[1]}, diff={result[2]}")

Test Output:

Testing find_max_min:
  [5, 2, 8, 1, 9] → max=9, min=1, diff=8
  [3] → max=3, min=3, diff=0
  [] → max=None, min=None, diff=None
  [-5, -2, -8, -1] → max=-1, min=-8, diff=7
  [1, 1, 1, 1] → max=1, min=1, diff=0

Key points:

  1. ALWAYS handle empty list first
  2. Use built-in min() and max() for efficiency
  3. Return a tuple, not a list

Q2.5 - Factorial (Recursive) (10 points)

Write a function factorial(n) that:

  • Calculates n! recursively
  • Handle: 0! = 1, negative returns None

💡 Verified Answer

def factorial(n):
    """
    Calculate factorial of n using recursion.
    
    Factorial definition:
    - n! = n × (n-1) × (n-2) × ... × 2 × 1
    - 0! = 1 (by definition)
    - Negative numbers: undefined (return None)
    
    Args:
        n: Non-negative integer
        
    Returns:
        int: n! or None for negative input
        
    Examples:
        >>> factorial(5)
        120
        >>> factorial(0)
        1
    """
    # Handle negative input
    if n < 0:
        return None
    
    # Base case: 0! = 1 and 1! = 1
    if n == 0 or n == 1:
        return 1
    
    # Recursive case: n! = n × (n-1)!
    return n * factorial(n - 1)
 
 
# Trace for factorial(4):
# factorial(4) = 4 × factorial(3)
#              = 4 × (3 × factorial(2))
#              = 4 × (3 × (2 × factorial(1)))
#              = 4 × (3 × (2 × 1))
#              = 4 × (3 × 2)
#              = 4 × 6
#              = 24
 
 
# Alternative: Iterative version (no recursion)
def factorial_iterative(n):
    """Calculate factorial using iteration."""
    if n < 0:
        return None
    
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result
 
 
# ===== Test Cases =====
if __name__ == "__main__":
    test_cases = [
        (0, 1),      # 0! = 1
        (1, 1),      # 1! = 1
        (5, 120),    # 5! = 120
        (10, 3628800),
        (-5, None),  # Negative
    ]
    
    print("Testing factorial:")
    for n, expected in test_cases:
        result = factorial(n)
        status = "✓" if result == expected else "✗"
        print(f"  {status} factorial({n}) = {result} (expected {expected})")

Test Output:

Testing factorial:
  ✓ factorial(0) = 1 (expected 1)
  ✓ factorial(1) = 1 (expected 1)
  ✓ factorial(5) = 120 (expected 120)
  ✓ factorial(10) = 3628800 (expected 3628800)
  ✓ factorial(-5) = None (expected None)

Recursion components:

  1. Base case: stops recursion (n=0 or n=1)
  2. Recursive case: breaks problem into smaller subproblem
  3. Progress: n decreases each call, eventually reaching base case

Question 3: Pandas & SVM/Random Forest (25 points)

Part A: Data Preprocessing (15 points)

Given this DataFrame:

import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler
 
data = {
    'Age': [25, 30, None, 35, 40],
    'Income': [30000, 50000, 45000, None, 60000],
    'Education': ['High School', 'Bachelor', 'Master', 'PhD', 'Bachelor'],
    'Purchased': ['No', 'Yes', 'Yes', 'No', 'Yes']
}
df = pd.DataFrame(data)

Q3.A1 (5 points) Fill missing Age with median, missing Income with mean.

💡 Verified Answer

import pandas as pd
 
# Create the DataFrame
data = {
    'Age': [25, 30, None, 35, 40],
    'Income': [30000, 50000, 45000, None, 60000],
    'Education': ['High School', 'Bachelor', 'Master', 'PhD', 'Bachelor'],
    'Purchased': ['No', 'Yes', 'Yes', 'No', 'Yes']
}
df = pd.DataFrame(data)
 
print("Original DataFrame:")
print(df)
print()
 
# Calculate statistics before filling
age_median = df['Age'].median()    # Median of [25, 30, 35, 40] = 32.5
income_mean = df['Income'].mean()  # Mean of [30000, 50000, 45000, 60000] = 46250
 
print(f"Age median (excluding NaN): {age_median}")
print(f"Income mean (excluding NaN): {income_mean}")
print()
 
# Fill missing values
# Assign the result back to the column. Calling fillna(..., inplace=True)
# on df['Age'] is chained assignment: it raises a FutureWarning in
# pandas 2.x and does not update df under Copy-on-Write (pandas 3.0).
df['Age'] = df['Age'].fillna(age_median)
df['Income'] = df['Income'].fillna(income_mean)

# Alternative: fill several columns in one call
# df = df.fillna({'Age': age_median, 'Income': income_mean})
 
print("After filling missing values:")
print(df)

Calculations:

  • Age values (excluding NaN): [25, 30, 35, 40]
  • Age median: (30 + 35) / 2 = 32.5
  • Income values (excluding NaN): [30000, 50000, 45000, 60000]
  • Income mean: (30000 + 50000 + 45000 + 60000) / 4 = 46250

Result:

  • Row index 2 (third row): Age filled with 32.5
  • Row index 3 (fourth row): Income filled with 46250.0
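
The arithmetic above can be double-checked without pandas. This is a small sketch using only the standard library; the list names are illustrative, and `None` stands in for the missing values.

```python
from statistics import mean, median

ages = [25, 30, None, 35, 40]
incomes = [30000, 50000, 45000, None, 60000]

# Statistics are computed on the known values only, as pandas does by default
known_ages = [v for v in ages if v is not None]
known_incomes = [v for v in incomes if v is not None]

age_fill = median(known_ages)      # (30 + 35) / 2 = 32.5
income_fill = mean(known_incomes)  # 185000 / 4 = 46250

filled_ages = [age_fill if v is None else v for v in ages]
filled_incomes = [income_fill if v is None else v for v in incomes]

print(filled_ages)     # [25, 30, 32.5, 35, 40]
print(filled_incomes)  # [30000, 50000, 45000, 46250, 60000]
```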

Q3.A2 (5 points) Encode 'Education' using LabelEncoder. Show the mapping.

💡 Verified Answer

from sklearn.preprocessing import LabelEncoder
 
# Create LabelEncoder instance
le = LabelEncoder()
 
# Fit and transform the Education column
df['Education_Encoded'] = le.fit_transform(df['Education'])
 
print("Encoding result:")
print(df[['Education', 'Education_Encoded']])
print()
 
# Show the mapping (classes are sorted alphabetically)
print("LabelEncoder mapping:")
for i, label in enumerate(le.classes_):
    print(f"  '{label}' → {i}")

LabelEncoder sorts alphabetically then assigns 0, 1, 2, ...:

| Original | Encoded |
| --- | --- |
| Bachelor | 0 |
| High School | 1 |
| Master | 2 |
| PhD | 3 |

Encoded column: [1, 0, 2, 3, 0]

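Important: LabelEncoder assigns integers by the sorted (alphabetical) order of the labels, not by order of appearance. The codes therefore carry no educational ranking: Bachelor (0) sorts before High School (1).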
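
To work out the mapping by hand in a closed-book exam, it can help to mimic the sort-then-number behaviour in plain Python. This is only a sketch of the observable result, not scikit-learn's actual implementation.

```python
education = ['High School', 'Bachelor', 'Master', 'PhD', 'Bachelor']

# Sort the unique labels, then number them 0, 1, 2, ...
classes = sorted(set(education))
mapping = {label: code for code, label in enumerate(classes)}
encoded = [mapping[label] for label in education]

print(classes)  # ['Bachelor', 'High School', 'Master', 'PhD']
print(encoded)  # [1, 0, 2, 3, 0]
```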


Q3.A3 (5 points) When should you use StandardScaler vs MinMaxScaler?

💡 Answer

| Scaler | Formula | Output | Best For |
| --- | --- | --- | --- |
| StandardScaler | (x - mean) / std | Mean = 0, Std = 1 (unbounded) | SVM, Logistic Regression, roughly normal features |
| MinMaxScaler | (x - min) / (max - min) | [0, 1] | Neural Networks, KNN, image data |

Use StandardScaler when:

  • Data is approximately normally distributed
  • The output does not need to be bounded (outliers remain visible as large z-scores, although they still shift the mean and std)
  • Using algorithms like SVM, Linear Regression

Use MinMaxScaler when:

  • You need bounded output (0 to 1)
  • Working with neural networks or image data
  • Outliers are absent or already handled (a single extreme value sets the min or max and squeezes everything else)

Quick rule:

  • SVM, Linear models → StandardScaler
  • Neural networks, KNN → MinMaxScaler
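
Both formulas are easy to apply by hand. The sketch below uses a hypothetical five-value feature with one outlier and only the standard library; `pstdev` is the population standard deviation, which is what StandardScaler uses.

```python
from statistics import mean, pstdev

values = [1.0, 2.0, 3.0, 4.0, 100.0]  # hypothetical feature with one outlier

# Standardization: (x - mean) / std
mu, sigma = mean(values), pstdev(values)
standardized = [(v - mu) / sigma for v in values]

# Min-max scaling: (x - min) / (max - min)
lo, hi = min(values), max(values)
min_maxed = [(v - lo) / (hi - lo) for v in values]

print([round(z, 2) for z in standardized])  # mean 0, std 1, no fixed range
print([round(m, 3) for m in min_maxed])     # always within [0, 1]
```

With min-max scaling the outlier becomes exactly 1 and the other four values land within about 0.03 of 0, which is why outliers should be checked before using it.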

Part B: SVM & Random Forest Theory (10 points)

Q3.B1 (5 points) Explain the "kernel trick" in SVM.

💡 Answer

Kernel Trick Explanation:

Problem: Some data is not linearly separable in its original space.

Solution: Map the data, implicitly, into a higher-dimensional space where it becomes linearly separable. The kernel function returns the inner product in that space directly from the original vectors.

How it works:

  1. Original 2D data might have circular boundaries (can't draw a straight line)
  2. Transform to 3D using a kernel function
  3. In 3D, a flat plane can now separate the classes
  4. The "trick": an SVM only needs inner products between points, and the kernel K(x, y) supplies them directly, so the high-dimensional coordinates are never actually computed
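
A small numeric check makes step 4 concrete. For the degree-2 polynomial kernel K(x, y) = (x · y)² on 2D inputs, one explicit feature map is φ(x) = (x₁², √2·x₁x₂, x₂²). The sketch below, with hypothetical points and helper names, shows that the kernel gives the same number as the dot product in 3D without ever building the 3D vectors.

```python
import math


def phi(v):
    """Explicit 2D -> 3D feature map for the degree-2 polynomial kernel."""
    x1, x2 = v
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(u, v):
    """K(u, v) = (u . v)^2, computed entirely in the original 2D space."""
    return dot(u, v) ** 2


x, y = (1.0, 2.0), (3.0, 4.0)   # hypothetical points
explicit = dot(phi(x), phi(y))  # transform first, then take the dot product
implicit = poly_kernel(x, y)    # kernel trick: no transformation needed

print(implicit)                          # 121.0
print(math.isclose(explicit, implicit))  # True
```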

Common kernels:

| Kernel | Use Case |
| --- | --- |
| Linear | Already linearly separable |
| RBF (Radial Basis Function) | Default choice, works well for most cases |
| Polynomial | Data with polynomial relationships |

Example in code:

from sklearn.svm import SVC
 
# Linear kernel
model_linear = SVC(kernel='linear')
 
# RBF kernel (default)
model_rbf = SVC(kernel='rbf')
 
# Polynomial kernel
model_poly = SVC(kernel='poly', degree=3)

Q3.B2 (5 points) What is "bagging" in Random Forest? Why does it help?

💡 Answer

Bagging (Bootstrap Aggregating):

Process:

  1. Draw multiple bootstrap samples of the training data (random sampling with replacement, each typically the same size as the original set)
  2. Train a separate decision tree on each subset
  3. Combine predictions:
    • Classification: majority voting
    • Regression: average

Why it helps:

  1. Reduces Overfitting

    • Each tree sees different data
    • Individual tree errors cancel out
    • Ensemble is more robust
  2. Reduces Variance

    • Averaging many predictions is more stable
    • Less sensitive to noise in training data
  3. Handles Outliers Better

    • Outliers only affect some trees, not all
    • Their influence is diluted in the ensemble
  4. Better Generalization

    • Collective wisdom outperforms single tree
    • Works well on unseen data

Analogy: Like asking 100 doctors for diagnosis instead of 1 - the collective opinion is usually more reliable.
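
The two mechanical parts of bagging, bootstrap sampling and vote aggregation, can be sketched in a few lines. This is a simplified illustration with hypothetical helper names and stand-in votes rather than real decision trees.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Sample len(data) items WITH replacement (duplicates are expected)."""
    return rng.choices(data, k=len(data))


def majority_vote(predictions):
    """Classification: return the most common predicted label."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(42)   # fixed seed so the run is repeatable
data = list(range(10))    # stand-in for 10 training rows
samples = [bootstrap_sample(data, rng) for _ in range(5)]

for s in samples:
    # Each sample has 10 rows, but usually fewer than 10 distinct ones
    print(sorted(s), "distinct rows:", len(set(s)))

votes = ["Spam", "Not Spam", "Spam", "Spam", "Not Spam"]  # hypothetical tree outputs
print(majority_vote(votes))                  # Spam

tree_outputs = [2.0, 3.0, 4.0]               # hypothetical regression outputs
print(sum(tree_outputs) / len(tree_outputs)) # 3.0
```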


Question 4: Naive Bayes & Decision Tree (25 points)

Part A: Naive Bayes Calculation (15 points)

Dataset: Email classification

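| Email | Contains "Free" | Contains "Winner" | Spam? |
| --- | --- | --- | --- |
| 1 | Yes | Yes | Spam |
| 2 | Yes | No | Spam |
| 3 | No | Yes | Spam |
| 4 | No | No | Not Spam |
| 5 | Yes | No | Not Spam |
| 6 | No | No | Not Spam |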

Q4.A1 (10 points) A new email contains "Free" but not "Winner". Calculate P(Spam|Free=Yes, Winner=No).

💡 Verified Answer

Naive Bayes Formula: $P(Class|Features) \propto P(Class) \times \prod P(Feature|Class)$

Step 1: Calculate Prior Probabilities

| Class | Count | P(Class) |
| --- | --- | --- |
| Spam | 3 (emails 1, 2, 3) | 3/6 = 0.5 |
| Not Spam | 3 (emails 4, 5, 6) | 3/6 = 0.5 |

Step 2: Calculate Likelihoods

For Spam emails (1, 2, 3):

  • P(Free=Yes | Spam) = 2/3 (emails 1, 2 have Free)
  • P(Winner=No | Spam) = 1/3 (only email 2 has Winner=No)

For Not Spam emails (4, 5, 6):

  • P(Free=Yes | Not Spam) = 1/3 (only email 5)
  • P(Winner=No | Not Spam) = 3/3 = 1 (all three)

Step 3: Calculate Unnormalized Posteriors

$P(Spam|evidence) \propto P(Spam) \times P(Free=Yes|Spam) \times P(Winner=No|Spam)$ $= 0.5 \times \frac{2}{3} \times \frac{1}{3} = 0.5 \times 0.667 \times 0.333 = 0.111$

$P(NotSpam|evidence) \propto 0.5 \times \frac{1}{3} \times 1 = 0.167$

Step 4: Normalize

$P(Spam|evidence) = \frac{0.111}{0.111 + 0.167} = \frac{0.111}{0.278} = 0.40$

Answer: P(Spam | Free=Yes, Winner=No) = 0.40 = 40%

Prediction: NOT SPAM (probability < 50%)
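
The four steps can be reproduced exactly with fractions, which avoids rounding drift from 0.667 and 0.333. The helper name `unnormalized_posterior` is illustrative; the data is the six-email table from the question.

```python
from fractions import Fraction

# (Contains "Free", Contains "Winner", label) for emails 1-6
emails = [
    ("Yes", "Yes", "Spam"),
    ("Yes", "No", "Spam"),
    ("No", "Yes", "Spam"),
    ("No", "No", "Not Spam"),
    ("Yes", "No", "Not Spam"),
    ("No", "No", "Not Spam"),
]


def unnormalized_posterior(label, free, winner):
    """P(label) * P(Free=free | label) * P(Winner=winner | label)."""
    rows = [e for e in emails if e[2] == label]
    prior = Fraction(len(rows), len(emails))
    p_free = Fraction(sum(e[0] == free for e in rows), len(rows))
    p_winner = Fraction(sum(e[1] == winner for e in rows), len(rows))
    return prior * p_free * p_winner


spam = unnormalized_posterior("Spam", "Yes", "No")
not_spam = unnormalized_posterior("Not Spam", "Yes", "No")
posterior_spam = spam / (spam + not_spam)   # Step 4: normalize

print(spam, not_spam, posterior_spam)  # 1/9 1/6 2/5
print(float(posterior_spam))           # 0.4
```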


Q4.A2 (5 points) What is the "naive" assumption in Naive Bayes? When might it fail?

💡 Answer

The "Naive" Assumption:

  • Features are conditionally independent given the class
  • P(A, B | Class) = P(A | Class) × P(B | Class)
  • Each feature contributes independently to the prediction

When it fails:

  1. Correlated features

    • Example: "Free" and "Prize" often appear together in spam
    • Treating them as independent overcounts their combined effect
  2. Redundant features

    • Example: Having both "temperature in °C" and "temperature in °F"
    • These are perfectly correlated, violating independence
  3. Feature interactions matter

    • Example: Medical diagnosis where symptom combinations are important
    • Symptom A alone is harmless, but A+B together indicates disease

Despite this limitation: Naive Bayes often works surprisingly well in practice, especially for:

  • Text classification
  • Spam detection
  • Sentiment analysis

Part B: Information Gain (10 points)

Q4.B1 (10 points) Calculate Information Gain for the "Contains Free" feature.

Original dataset: 3 Spam, 3 Not Spam

💡 Verified Answer

Entropy Formula: $H(S) = -\sum p_i \log_2(p_i)$

Step 1: Parent Entropy (3 Spam, 3 Not Spam)

$H(parent) = -0.5 \log_2(0.5) - 0.5 \log_2(0.5)$ $= -0.5 \times (-1) - 0.5 \times (-1)$ $= 0.5 + 0.5 = 1.0$

(Maximum entropy for binary classification = 1.0)

Step 2: Split by "Contains Free"

Free=Yes (3 emails: 2 Spam, 1 Not Spam): $H = -\frac{2}{3} \log_2(\frac{2}{3}) - \frac{1}{3} \log_2(\frac{1}{3})$ $= -0.667 \times (-0.585) - 0.333 \times (-1.585)$ $= 0.390 + 0.528 = 0.918$

Free=No (3 emails: 1 Spam, 2 Not Spam): $H = -\frac{1}{3} \log_2(\frac{1}{3}) - \frac{2}{3} \log_2(\frac{2}{3})$ $= 0.528 + 0.390 = 0.918$

Step 3: Weighted Average Entropy

$H(children) = \frac{3}{6} \times 0.918 + \frac{3}{6} \times 0.918 = 0.918$

Step 4: Information Gain

$IG = H(parent) - H(children) = 1.0 - 0.918 = 0.082$

Answer: Information Gain = 0.082 bits

Interpretation: "Contains Free" provides a small amount of information for classification. Higher IG would indicate a better split.
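
The same calculation as a short script, useful for checking hand-computed logarithms. The function name `entropy` and the count lists are illustrative; counts are given as [Spam, Not Spam].

```python
from math import log2


def entropy(counts):
    """H = -sum(p * log2(p)) over the class counts; empty classes are skipped."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


parent = [3, 3]              # 3 Spam, 3 Not Spam
children = [[2, 1], [1, 2]]  # Free=Yes, Free=No

n = sum(parent)
weighted = sum(sum(child) / n * entropy(child) for child in children)
info_gain = entropy(parent) - weighted

print(round(entropy(parent), 3), round(weighted, 3), round(info_gain, 3))
# 1.0 0.918 0.082
```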


🏁 End of Exam

| Question | Topic | Points |
| --- | --- | --- |
| Q1 | Python Output Analysis | 20 |
| Q2 | Code Writing (choose 3 of 5) | 30 |
| Q3 | Pandas & SVM/Random Forest | 25 |
| Q4 | Naive Bayes & Decision Tree | 25 |
| Total | | 100 |

📝 Key Formulas Reference

| Concept | Formula |
| --- | --- |
| Gini Index | 1 - Σ(pᵢ²) |
| Entropy | -Σ pᵢ log₂(pᵢ) |
| Info Gain | H(parent) - Σ weighted H(children) |
| Bayes | P(A\|B) ∝ P(B\|A) × P(A) |
| Z-score | (x - μ) / σ |
| MinMax | (x - min) / (max - min) |
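
The Gini index is the only formula in this table not worked through in the exam itself. As a quick illustration, here it is applied to the Q4 class counts; the function name `gini` is illustrative.

```python
def gini(counts):
    """Gini index: 1 - sum(p_i ** 2) over the class proportions."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


print(gini([3, 3]))            # 0.5  (3 Spam, 3 Not Spam: maximum for two classes)
print(round(gini([2, 1]), 3))  # 0.444 (Free=Yes branch: 2 Spam, 1 Not Spam)
print(gini([3, 0]))            # 0.0  (pure node)
```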

All code verified and tested. Show your work for partial credit. Good luck!

Tags: ABW505, Mock Exam, Python, Machine Learning, Practice Test
