ABW505 Mock Exam 2 - Python & Machine Learning

📋 Exam Information

Item	Details
Total Points	100
Time Allowed	90 minutes
Format	Closed book, calculator allowed
Structure	Q1 (20pts) + Q2 (30pts, choose 3/5) + Q3 (25pts) + Q4 (25pts)

Question 1: Python Output Analysis (20 points)

Answer ALL questions. Determine the exact output of each snippet.

Q1.1 (5 points)

a = [1, 2, 3]
b = a
b.append(4)
print(a)
print(a is b)

💡 Answer & Explanation

Step-by-step breakdown:

# Step 1: Create a list and assign to variable 'a'
a = [1, 2, 3]
# Memory: a points to list object [1, 2, 3]
 
# Step 2: Assign 'a' to 'b'
b = a
# IMPORTANT: This does NOT copy the list!
# Both 'a' and 'b' now point to the SAME list object in memory
# Memory: a → [1, 2, 3] ← b
 
# Step 3: Modify list through 'b'
b.append(4)
# Since a and b point to the same object, 
# the change is visible through both variables
# Memory: a → [1, 2, 3, 4] ← b
 
# Step 4: Print results
print(a)        # [1, 2, 3, 4] - modified through b
print(a is b)   # True - same object in memory

Answers:

print(a) → [1, 2, 3, 4]
print(a is b) → True

Key concept: In Python, assignment creates a REFERENCE, not a copy.

To create an independent copy:

b = a.copy()     # Method 1: copy() method
b = a[:]         # Method 2: slice notation
b = list(a)      # Method 3: list constructor
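
A minimal sketch of what "independent copy" means here, plus one caveat: all three methods above make shallow copies, so nested lists are still shared. The variable names (nested, shallow, deep) are illustrative only.

```python
import copy

a = [1, 2, 3]
b = a.copy()          # new list object with the same elements
b.append(4)
print(a)              # [1, 2, 3]  (a is unchanged)
print(a is b)         # False

# Caveat: the copies above are shallow, so inner lists are still shared.
nested = [[1, 2], [3, 4]]
shallow = nested.copy()
deep = copy.deepcopy(nested)
nested[0].append(99)
print(shallow[0])     # [1, 2, 99]  (inner list shared with nested)
print(deep[0])        # [1, 2]      (fully independent)
```

For flat lists of numbers or strings, as in Q1.1, a shallow copy is enough.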

Q1.2 (5 points)

text = "Hello World"
print(text[0:5:2])
print(text[-5:-1])

💡 Answer & Explanation

Step-by-step breakdown:

text = "Hello World"
 
# Index map:
# Character: H   e   l   l   o       W   o   r   l   d
# Positive:  0   1   2   3   4   5   6   7   8   9   10
# Negative:-11 -10  -9  -8  -7  -6  -5  -4  -3  -2  -1
 
# Line 1: text[0:5:2]
# Format: [start:stop:step]
# start=0 (H), stop=5 (exclusive), step=2 (every 2nd char)
# Indices: 0, 2, 4 → Characters: 'H', 'l', 'o'
result1 = text[0:5:2]  # "Hlo"
 
# Line 2: text[-5:-1]
# start=-5 (W), stop=-1 (exclusive, before 'd')
# Indices: -5, -4, -3, -2 → Characters: 'W', 'o', 'r', 'l'
result2 = text[-5:-1]  # "Worl"

Answers:

text[0:5:2] → Hlo
text[-5:-1] → Worl

Slicing syntax: [start:stop:step]

start: inclusive (default: 0)
stop: exclusive (default: end)
step: increment (default: 1)
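
The defaults listed above can be seen by leaving parts of the slice out. This is a small illustrative sketch using the same string as the question; the negative-step line goes slightly beyond what the question asks.

```python
text = "Hello World"

print(text[:5])      # Hello        (start defaults to 0)
print(text[6:])      # World        (stop defaults to the end)
print(text[::2])     # HloWrd       (every 2nd character)
print(text[::-1])    # dlroW olleH  (negative step walks backwards)
```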

Q1.3 (5 points)

def outer():
    x = 10
    def inner():
        nonlocal x
        x += 5
        return x
    return inner()
 
print(outer())
print(outer())

💡 Answer & Explanation

Step-by-step breakdown:

def outer():
    x = 10  # Local variable in outer's scope
    
    def inner():
        nonlocal x  # Refers to x in enclosing (outer) scope
        x += 5      # Modify outer's x: 10 + 5 = 15
        return x    # Return 15
    
    return inner()  # Call inner() and return its result
 
# First call: outer()
# - x starts at 10 in a NEW local scope
# - inner() adds 5: x = 15
# - Returns 15
print(outer())  # 15
 
# Second call: outer()
# - FRESH call creates NEW local scope
# - x starts at 10 again (not preserved from first call)
# - inner() adds 5: x = 15
# - Returns 15
print(outer())  # 15

Answers:

First print(outer()) → 15
Second print(outer()) → 15

Key concepts:

nonlocal modifies variable in enclosing scope (not global)
Each call to outer() creates a fresh local scope
Variable x is NOT preserved between calls
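
For contrast, here is a sketch of the variant where x is preserved: the outer function returns inner itself instead of calling it, so the enclosing scope stays alive as a closure. The name make_counter is hypothetical and not part of the exam question.

```python
def make_counter():
    x = 10

    def inner():
        nonlocal x
        x += 5
        return x

    return inner      # return the function itself, not inner()

counter = make_counter()
print(counter())      # 15
print(counter())      # 20  (the same enclosing scope is reused)
```

The difference from Q1.3 is a single pair of parentheses: return inner() produces a number, while return inner produces a function that remembers x.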

Q1.4 (5 points)

nums = [1, 2, 3, 4, 5]
result = [x**2 for x in nums if x % 2 == 1]
print(result)
print(sum(result))

💡 Answer & Explanation

Step-by-step breakdown:

nums = [1, 2, 3, 4, 5]
 
# List comprehension with filter
# Pattern: [expression for item in iterable if condition]
result = [x**2 for x in nums if x % 2 == 1]
 
# Step-by-step execution:
# x=1: 1 % 2 == 1? True  → 1**2 = 1   → include
# x=2: 2 % 2 == 1? False → skip
# x=3: 3 % 2 == 1? True  → 3**2 = 9   → include
# x=4: 4 % 2 == 1? False → skip
# x=5: 5 % 2 == 1? True  → 5**2 = 25  → include
 
# Result: [1, 9, 25]
 
print(result)       # [1, 9, 25]
print(sum(result))  # 1 + 9 + 25 = 35

Answers:

print(result) → [1, 9, 25]
print(sum(result)) → 35

Breakdown:

Filter: odd numbers only (1, 3, 5)
Transform: square each (1, 9, 25)
Sum: 1 + 9 + 25 = 35

Question 2: Code Writing (30 points)

Choose 3 out of 5 questions. Each worth 10 points.

Q2.1 - Count Vowels (10 points)

Write a function count_vowels(text) that:

Counts vowels (a, e, i, o, u) - case insensitive
Returns the count as an integer

💡 Verified Answer

def count_vowels(text):
    """
    Count the number of vowels in a string.
    
    Vowels are: a, e, i, o, u (case insensitive)
    
    Args:
        text: Input string to analyze
        
    Returns:
        int: Number of vowels found
        
    Examples:
        >>> count_vowels("Hello World")
        3
        >>> count_vowels("AEIOU")
        5
    """
    # Define vowels (both cases for easy comparison)
    vowels = "aeiouAEIOU"
    
    # Initialize counter
    count = 0
    
    # Iterate through each character in the text
    for char in text:
        # Check if character is a vowel
        if char in vowels:
            count += 1
    
    return count
 
 
# Alternative: More Pythonic one-liner
def count_vowels_v2(text):
    """One-liner using generator expression and sum."""
    return sum(1 for char in text.lower() if char in 'aeiou')
 
 
# Alternative: Using count method
def count_vowels_v3(text):
    """Using str.count() for each vowel."""
    text_lower = text.lower()
    return sum(text_lower.count(v) for v in 'aeiou')
 
 
# ===== Test Cases =====
if __name__ == "__main__":
    test_cases = [
        ("Hello World", 3),      # e, o, o
        ("AEIOU", 5),            # all uppercase vowels
        ("rhythm", 0),           # no vowels
        ("", 0),                 # empty string
        ("AaEeIiOoUu", 10),      # mixed case
    ]
    
    print("Testing count_vowels:")
    for text, expected in test_cases:
        result = count_vowels(text)
        status = "✓" if result == expected else "✗"
        print(f'  {status} count_vowels("{text}") = {result} (expected {expected})')

Test Output:

Testing count_vowels:
  ✓ count_vowels("Hello World") = 3 (expected 3)
  ✓ count_vowels("AEIOU") = 5 (expected 5)
  ✓ count_vowels("rhythm") = 0 (expected 0)
  ✓ count_vowels("") = 0 (expected 0)
  ✓ count_vowels("AaEeIiOoUu") = 10 (expected 10)

Key points:

Handle both uppercase and lowercase
Use in operator for membership test
Simple counter pattern

Q2.2 - Word Frequency Dictionary (10 points)

Write a function word_frequency(words) that:

Input: list of words
Return: dictionary with word counts
Example: ['a', 'b', 'a'] → {'a': 2, 'b': 1}

💡 Verified Answer

def word_frequency(words):
    """
    Count frequency of each word in a list.
    
    Args:
        words: List of words (strings)
        
    Returns:
        dict: Dictionary mapping each word to its count
        
    Examples:
        >>> word_frequency(['a', 'b', 'a'])
        {'a': 2, 'b': 1}
    """
    # Initialize empty frequency dictionary
    freq = {}
    
    # Count each word
    for word in words:
        if word in freq:
            # Word seen before - increment count
            freq[word] += 1
        else:
            # First occurrence - initialize count to 1
            freq[word] = 1
    
    return freq
 
 
# Alternative: Using dict.get()
def word_frequency_v2(words):
    """Using get() method to simplify logic."""
    freq = {}
    for word in words:
        # get(key, default) returns default if key doesn't exist
        freq[word] = freq.get(word, 0) + 1
    return freq
 
 
# Alternative: Using collections.Counter
def word_frequency_v3(words):
    """Using Counter from collections (most Pythonic)."""
    from collections import Counter
    return dict(Counter(words))
 
 
# ===== Test Cases =====
if __name__ == "__main__":
    test_cases = [
        ['a', 'b', 'a'],
        ['hello', 'world', 'hello', 'hello'],
        [],
        ['single'],
    ]
    
    print("Testing word_frequency:")
    for words in test_cases:
        result = word_frequency(words)
        print(f"  {words} → {result}")

Test Output:

Testing word_frequency:
  ['a', 'b', 'a'] → {'a': 2, 'b': 1}
  ['hello', 'world', 'hello', 'hello'] → {'hello': 3, 'world': 1}
  [] → {}
  ['single'] → {'single': 1}

Key techniques:

Check if key exists before incrementing
Alternative: use dict.get(key, default)
Best practice: use collections.Counter

Q2.3 - Multiplication Table (10 points)

Write a function multiplication_table(n) that:

Prints an n×n multiplication table
Format: 1 x 1 = 1

💡 Verified Answer

def multiplication_table(n):
    """
    Print an n×n multiplication table.
    
    Args:
        n: Size of the table (positive integer)
        
    Example output for n=3:
        1 x 1 = 1
        1 x 2 = 2
        1 x 3 = 3
        2 x 1 = 2
        ...
        3 x 3 = 9
    """
    # Validate input
    if n <= 0:
        print("Please provide a positive integer.")
        return
    
    # Outer loop: rows (first multiplier)
    for i in range(1, n + 1):
        
        # Inner loop: columns (second multiplier)
        for j in range(1, n + 1):
            
            # Calculate product
            product = i * j
            
            # Print formatted result using f-string
            print(f"{i} x {j} = {product}")
 
 
# Alternative: Compact table format
def multiplication_table_compact(n):
    """Print table in grid format."""
    for i in range(1, n + 1):
        row = ""
        for j in range(1, n + 1):
            row += f"{i*j:4}"  # 4-character width for alignment
        print(row)
 
 
# ===== Test Cases =====
if __name__ == "__main__":
    print("3x3 Multiplication Table:")
    print("-" * 20)
    multiplication_table(3)
    
    print("\n3x3 Compact Format:")
    print("-" * 20)
    multiplication_table_compact(3)

Test Output:

3x3 Multiplication Table:
--------------------
1 x 1 = 1
1 x 2 = 2
1 x 3 = 3
2 x 1 = 2
2 x 2 = 4
2 x 3 = 6
3 x 1 = 3
3 x 2 = 6
3 x 3 = 9

3x3 Compact Format:
--------------------
   1   2   3
   2   4   6
   3   6   9

Key concepts:

Nested loops for 2D iteration
range(1, n+1) to start from 1
f-strings for formatted output

Q2.4 - Find Max and Min (10 points)

Write a function find_max_min(numbers) that:

Input: list of numbers
Return: tuple (maximum, minimum, difference)
Handle empty list by returning (None, None, None)

💡 Verified Answer

def find_max_min(numbers):
    """
    Find maximum, minimum, and their difference in a list.
    
    Args:
        numbers: List of numeric values
        
    Returns:
        tuple: (maximum, minimum, difference) or (None, None, None) if empty
        
    Examples:
        >>> find_max_min([5, 2, 8, 1, 9])
        (9, 1, 8)
        >>> find_max_min([])
        (None, None, None)
    """
    # Handle empty list edge case
    # IMPORTANT: Check this first to avoid errors with min()/max()
    if not numbers:  # Empty list is falsy in Python
        return (None, None, None)
    
    # Find maximum and minimum using built-in functions
    maximum = max(numbers)
    minimum = min(numbers)
    
    # Calculate difference (range of values)
    difference = maximum - minimum
    
    return (maximum, minimum, difference)
 
 
# Alternative: Without using built-in min/max
def find_max_min_manual(numbers):
    """Manual implementation without min()/max()."""
    if not numbers:
        return (None, None, None)
    
    # Initialize with first element
    maximum = numbers[0]
    minimum = numbers[0]
    
    # Iterate through remaining elements
    for num in numbers[1:]:
        if num > maximum:
            maximum = num
        if num < minimum:
            minimum = num
    
    return (maximum, minimum, maximum - minimum)
 
 
# ===== Test Cases =====
if __name__ == "__main__":
    test_cases = [
        [5, 2, 8, 1, 9],      # Normal case
        [3],                   # Single element
        [],                    # Empty list
        [-5, -2, -8, -1],     # Negative numbers
        [1, 1, 1, 1],         # All same
    ]
    
    print("Testing find_max_min:")
    for nums in test_cases:
        result = find_max_min(nums)
        print(f"  {nums} → max={result[0]}, min={result[1]}, diff={result[2]}")

Test Output:

Testing find_max_min:
  [5, 2, 8, 1, 9] → max=9, min=1, diff=8
  [3] → max=3, min=3, diff=0
  [] → max=None, min=None, diff=None
  [-5, -2, -8, -1] → max=-1, min=-8, diff=7
  [1, 1, 1, 1] → max=1, min=1, diff=0

Key points:

ALWAYS handle empty list first
Use built-in min() and max() for efficiency
Return a tuple, not a list

Q2.5 - Factorial (Recursive) (10 points)

Write a function factorial(n) that:

Calculates n! recursively
Handles 0! = 1 and returns None for negative input

💡 Verified Answer

def factorial(n):
    """
    Calculate factorial of n using recursion.
    
    Factorial definition:
    - n! = n × (n-1) × (n-2) × ... × 2 × 1
    - 0! = 1 (by definition)
    - Negative numbers: undefined (return None)
    
    Args:
        n: Non-negative integer
        
    Returns:
        int: n! or None for negative input
        
    Examples:
        >>> factorial(5)
        120
        >>> factorial(0)
        1
    """
    # Handle negative input
    if n < 0:
        return None
    
    # Base case: 0! = 1 and 1! = 1
    if n == 0 or n == 1:
        return 1
    
    # Recursive case: n! = n × (n-1)!
    return n * factorial(n - 1)
 
 
# Trace for factorial(4):
# factorial(4) = 4 × factorial(3)
#              = 4 × (3 × factorial(2))
#              = 4 × (3 × (2 × factorial(1)))
#              = 4 × (3 × (2 × 1))
#              = 4 × (3 × 2)
#              = 4 × 6
#              = 24
 
 
# Alternative: Iterative version (no recursion)
def factorial_iterative(n):
    """Calculate factorial using iteration."""
    if n < 0:
        return None
    
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result
 
 
# ===== Test Cases =====
if __name__ == "__main__":
    test_cases = [
        (0, 1),      # 0! = 1
        (1, 1),      # 1! = 1
        (5, 120),    # 5! = 120
        (10, 3628800),
        (-5, None),  # Negative
    ]
    
    print("Testing factorial:")
    for n, expected in test_cases:
        result = factorial(n)
        status = "✓" if result == expected else "✗"
        print(f"  {status} factorial({n}) = {result} (expected {expected})")

Test Output:

Testing factorial:
  ✓ factorial(0) = 1 (expected 1)
  ✓ factorial(1) = 1 (expected 1)
  ✓ factorial(5) = 120 (expected 120)
  ✓ factorial(10) = 3628800 (expected 3628800)
  ✓ factorial(-5) = None (expected None)

Recursion components:

Base case: stops recursion (n=0 or n=1)
Recursive case: breaks problem into smaller subproblem
Progress: n decreases each call, eventually reaching base case

Question 3: Pandas & SVM/Random Forest (25 points)

Part A: Data Preprocessing (15 points)

Given this DataFrame:

import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler
 
data = {
    'Age': [25, 30, None, 35, 40],
    'Income': [30000, 50000, 45000, None, 60000],
    'Education': ['High School', 'Bachelor', 'Master', 'PhD', 'Bachelor'],
    'Purchased': ['No', 'Yes', 'Yes', 'No', 'Yes']
}
df = pd.DataFrame(data)

Q3.A1 (5 points) Fill missing Age with median, missing Income with mean.

💡 Verified Answer

import pandas as pd
 
# Create the DataFrame
data = {
    'Age': [25, 30, None, 35, 40],
    'Income': [30000, 50000, 45000, None, 60000],
    'Education': ['High School', 'Bachelor', 'Master', 'PhD', 'Bachelor'],
    'Purchased': ['No', 'Yes', 'Yes', 'No', 'Yes']
}
df = pd.DataFrame(data)
 
print("Original DataFrame:")
print(df)
print()
 
# Calculate statistics before filling
age_median = df['Age'].median()    # Median of [25, 30, 35, 40] = 32.5
income_mean = df['Income'].mean()  # Mean of [30000, 50000, 45000, 60000] = 46250
 
print(f"Age median (excluding NaN): {age_median}")
print(f"Income mean (excluding NaN): {income_mean}")
print()
 
# Fill missing values by assigning the result back to each column
df['Age'] = df['Age'].fillna(age_median)
df['Income'] = df['Income'].fillna(income_mean)
 
# Note: df['Age'].fillna(age_median, inplace=True) is a chained in-place
# call; recent pandas versions warn about it and it may not update df.
 
print("After filling missing values:")
print(df)

Calculations:

Age values (excluding NaN): [25, 30, 35, 40]
Age median: (30 + 35) / 2 = 32.5
Income values (excluding NaN): [30000, 50000, 45000, 60000]
Income mean: (30000 + 50000 + 45000 + 60000) / 4 = 46250

Result:

Row index 2 (third row): Age filled with 32.5
Row index 3 (fourth row): Income filled with 46250.0
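
The hand calculation can be cross-checked without pandas. This is a sketch using only the standard library; the missing entries are dropped by hand, which mirrors the fact that pandas skips NaN by default in median() and mean(). The list names are illustrative.

```python
from statistics import mean, median

ages = [25, 30, None, 35, 40]
incomes = [30000, 50000, 45000, None, 60000]

# Drop missing entries by hand (pandas skips NaN by default).
known_ages = [a for a in ages if a is not None]
known_incomes = [i for i in incomes if i is not None]

age_median = median(known_ages)       # (30 + 35) / 2
income_mean = mean(known_incomes)     # 185000 / 4

# Replace each missing entry with the statistic for its column.
filled_ages = [age_median if a is None else a for a in ages]
filled_incomes = [income_mean if i is None else i for i in incomes]

print(age_median, income_mean)
print(filled_ages)
print(filled_incomes)
```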

Q3.A2 (5 points) Encode 'Education' using LabelEncoder. Show the mapping.

💡 Verified Answer

from sklearn.preprocessing import LabelEncoder
 
# Create LabelEncoder instance
le = LabelEncoder()
 
# Fit and transform the Education column
df['Education_Encoded'] = le.fit_transform(df['Education'])
 
print("Encoding result:")
print(df[['Education', 'Education_Encoded']])
print()
 
# Show the mapping (classes are sorted alphabetically)
print("LabelEncoder mapping:")
for i, label in enumerate(le.classes_):
    print(f"  '{label}' → {i}")

LabelEncoder sorts alphabetically then assigns 0, 1, 2, ...:

Original	Encoded
Bachelor	0
High School	1
Master	2
PhD	3

Encoded column: [1, 0, 2, 3, 0]

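Important: LabelEncoder assigns integers by the sorted (alphabetical) order of the labels, not by order of appearance, so the codes do not reflect the natural ranking of education levels (here High School = 1 is above Bachelor = 0).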
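
The mapping can be reproduced by hand in plain Python. This is only an imitation of the behaviour described above (sort the unique labels, then number them), not scikit-learn's actual implementation.

```python
education = ['High School', 'Bachelor', 'Master', 'PhD', 'Bachelor']

# Sort the unique labels, then number them 0, 1, 2, ...
classes = sorted(set(education))
mapping = {label: code for code, label in enumerate(classes)}
encoded = [mapping[label] for label in education]

print(mapping)   # {'Bachelor': 0, 'High School': 1, 'Master': 2, 'PhD': 3}
print(encoded)   # [1, 0, 2, 3, 0]
```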

Q3.A3 (5 points) When should you use StandardScaler vs MinMaxScaler?

💡 Answer

Scaler	Formula	Output	Best For
StandardScaler	(x - mean) / std	Mean=0, Std=1	SVM, Logistic Regression, roughly normal data
MinMaxScaler	(x - min) / (max - min)	[0, 1]	Neural Networks, KNN, image data

Use StandardScaler when:

Data is approximately normally distributed
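Outliers are present and you do not want a single extreme value to squeeze the rest of the data into a narrow range (for heavy outliers, RobustScaler is a common alternative)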
Using algorithms like SVM, Linear Regression

Use MinMaxScaler when:

You need bounded output (0 to 1)
Working with neural networks or image data
Outliers are not a concern

Quick rule:

SVM, Linear models → StandardScaler
Neural networks, KNN → MinMaxScaler
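
To make the two formulas concrete, here is a pure-Python sketch of both scalers applied to one column. The function names and the sample data are hypothetical. The population standard deviation is used because that is what StandardScaler computes, and a constant column (std or range of zero) is not handled.

```python
from statistics import mean, pstdev

def standard_scale(values):
    """(x - mean) / std, using the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(x - mu) / sigma for x in values]

def min_max_scale(values):
    """(x - min) / (max - min), mapping values onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

ages = [25, 30, 35, 40, 100]   # hypothetical data; 100 is an outlier

print([round(z, 2) for z in standard_scale(ages)])
print([round(v, 2) for v in min_max_scale(ages)])
```

With min-max scaling the outlier maps to 1.0 and the other four values are squeezed into the range 0 to 0.2, which is why outliers matter when choosing between the two.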

Part B: SVM & Random Forest Theory (10 points)

Q3.B1 (5 points) Explain the "kernel trick" in SVM.

💡 Answer

Kernel Trick Explanation:

Problem: Some data is not linearly separable in its original space.

Solution: Map the data into a higher-dimensional space where it becomes linearly separable. The kernel trick gives the benefit of that mapping without ever computing it explicitly.

How it works:

Original 2D data might have circular boundaries (can't draw a straight line)
Transform to 3D using a kernel function
In 3D, a flat plane can now separate the classes
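The "trick": a kernel function returns the inner product of two points in the higher-dimensional space directly from their original coordinates, so the transformation itself is never computed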
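
The steps above can be checked numerically for one simple case. This sketch assumes the degree-2 polynomial kernel K(x, y) = (x · y)² on 2D points, whose explicit 3D feature map is φ(x) = (x₁², √2·x₁x₂, x₂²). The function names are illustrative.

```python
import math

def phi(v):
    """Explicit feature map from 2D to 3D for the kernel (x . y) ** 2."""
    x1, x2 = v
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(u, v):
    """Ordinary dot product of two equal-length tuples."""
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(u, v):
    """Same inner product, computed directly in the original 2D space."""
    return dot(u, v) ** 2

x, y = (1.0, 2.0), (3.0, 4.0)
explicit = dot(phi(x), phi(y))   # transform first, then take the dot product
implicit = poly_kernel(x, y)     # no transformation needed
print(explicit, implicit)        # equal up to floating-point rounding
```

Both routes give the same value, but the kernel route never builds the 3D vectors. For kernels such as RBF the implied space is infinite-dimensional, so the shortcut is the only practical option.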

Common kernels:

Kernel	Use Case
Linear	Already linearly separable
RBF (Radial Basis Function)	Default choice, works well for most cases
Polynomial	Data with polynomial relationships

Example in code:

from sklearn.svm import SVC
 
# Linear kernel
model_linear = SVC(kernel='linear')
 
# RBF kernel (default)
model_rbf = SVC(kernel='rbf')
 
# Polynomial kernel
model_poly = SVC(kernel='poly', degree=3)

Q3.B2 (5 points) What is "bagging" in Random Forest? Why does it help?

💡 Answer

Bagging (Bootstrap Aggregating):

Process:

Create multiple random subsets of training data (with replacement)
Train a separate decision tree on each subset
Combine predictions:
- Classification: majority voting
- Regression: average

Why it helps:

Reduces Overfitting
- Each tree sees different data
- Individual tree errors cancel out
- Ensemble is more robust
Reduces Variance
- Averaging many predictions is more stable
- Less sensitive to noise in training data
Handles Outliers Better
- Outliers only affect some trees, not all
- Their influence is diluted in the ensemble
Better Generalization
- Collective wisdom outperforms single tree
- Works well on unseen data

Analogy: Like asking 100 doctors for diagnosis instead of 1 - the collective opinion is usually more reliable.
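
The two mechanical pieces of bagging, sampling with replacement and majority voting, can be sketched with the standard library alone. No trees are trained here; the row numbers and the five votes are made-up stand-ins, and the function names are hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return rng.choices(data, k=len(data))

def majority_vote(predictions):
    """Return the most common label among the trees' predictions."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)                 # fixed seed, for repeatability only
data = list(range(10))                 # stand-in for 10 training rows
samples = [bootstrap_sample(data, rng) for _ in range(3)]
for s in samples:
    print(sorted(s))                   # rows may repeat or be left out

# Hypothetical votes from five trees for one email
print(majority_vote(['Spam', 'Not Spam', 'Spam', 'Spam', 'Not Spam']))  # Spam
```

Because each sample is drawn with replacement, every tree sees a slightly different dataset, which is what lets their individual errors partly cancel in the vote.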

Question 4: Naive Bayes & Decision Tree (25 points)

Part A: Naive Bayes Calculation (15 points)

Dataset: Email classification

Email	Contains "Free"	Contains "Winner"	Spam?
1	Yes	Yes	Spam
2	Yes	No	Spam
3	No	Yes	Spam
4	No	No	Not Spam
5	Yes	No	Not Spam
6	No	No	Not Spam

Q4.A1 (10 points) A new email contains "Free" but not "Winner". Calculate P(Spam|Free=Yes, Winner=No).

💡 Verified Answer

Naive Bayes Formula: $P(Class|Features) \propto P(Class) \times \prod P(Feature|Class)$

Step 1: Calculate Prior Probabilities

Class	Count	P(Class)
Spam	3 (emails 1,2,3)	3/6 = 0.5
Not Spam	3 (emails 4,5,6)	3/6 = 0.5

Step 2: Calculate Likelihoods

For Spam emails (1, 2, 3):

P(Free=Yes | Spam) = 2/3 (emails 1, 2 have Free)
P(Winner=No | Spam) = 1/3 (only email 2 has Winner=No)

For Not Spam emails (4, 5, 6):

P(Free=Yes | Not Spam) = 1/3 (only email 5)
P(Winner=No | Not Spam) = 3/3 = 1 (all three)

Step 3: Calculate Unnormalized Posteriors

$P(Spam|evidence) \propto P(Spam) \times P(Free=Yes|Spam) \times P(Winner=No|Spam)$ $= 0.5 \times \frac{2}{3} \times \frac{1}{3} = 0.5 \times 0.667 \times 0.333 = 0.111$

$P(NotSpam|evidence) \propto 0.5 \times \frac{1}{3} \times 1 = 0.167$

Step 4: Normalize

$P(Spam|evidence) = \frac{0.111}{0.111 + 0.167} = \frac{0.111}{0.278} = 0.40$

Answer: P(Spam | Free=Yes, Winner=No) = 0.40 = 40%

Prediction: NOT SPAM (probability < 50%)
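
The four steps above can be reproduced with exact fractions as a cross-check. This sketch counts directly from the six-row table and, like the hand calculation, applies no Laplace smoothing. The function name score is illustrative.

```python
from fractions import Fraction

# (Free, Winner, label) for the six emails in the table
emails = [
    ("Yes", "Yes", "Spam"), ("Yes", "No", "Spam"), ("No", "Yes", "Spam"),
    ("No", "No", "Not Spam"), ("Yes", "No", "Not Spam"), ("No", "No", "Not Spam"),
]

def score(label, free, winner):
    """Unnormalized posterior: P(label) * P(free|label) * P(winner|label)."""
    rows = [e for e in emails if e[2] == label]
    prior = Fraction(len(rows), len(emails))
    p_free = Fraction(sum(e[0] == free for e in rows), len(rows))
    p_winner = Fraction(sum(e[1] == winner for e in rows), len(rows))
    return prior * p_free * p_winner

spam = score("Spam", "Yes", "No")            # 1/9
not_spam = score("Not Spam", "Yes", "No")    # 1/6
posterior = spam / (spam + not_spam)
print(posterior, float(posterior))           # 2/5 0.4
```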

Q4.A2 (5 points) What is the "naive" assumption in Naive Bayes? When might it fail?

💡 Answer

The "Naive" Assumption:

Features are conditionally independent given the class
P(A, B | Class) = P(A | Class) × P(B | Class)
Each feature contributes independently to the prediction

When it fails:

Correlated features
- Example: "Free" and "Prize" often appear together in spam
- Treating them as independent overcounts their combined effect
Redundant features
- Example: Having both "temperature in °C" and "temperature in °F"
- These are perfectly correlated, violating independence
Feature interactions matter
- Example: Medical diagnosis where symptom combinations are important
- Symptom A alone is harmless, but A+B together indicates disease

Despite this limitation: Naive Bayes often works surprisingly well in practice, especially for:

Text classification
Spam detection
Sentiment analysis

Part B: Information Gain (10 points)

Q4.B1 (10 points) Calculate Information Gain for the "Contains Free" feature.

Original dataset: 3 Spam, 3 Not Spam

💡 Verified Answer

Entropy Formula: $H(S) = -\sum p_i \log_2(p_i)$

Step 1: Parent Entropy (3 Spam, 3 Not Spam)

$H(parent) = -0.5 \log_2(0.5) - 0.5 \log_2(0.5)$ $= -0.5 \times (-1) - 0.5 \times (-1)$ $= 0.5 + 0.5 = 1.0$

(Maximum entropy for binary classification = 1.0)

Step 2: Split by "Contains Free"

Free=Yes (3 emails: 2 Spam, 1 Not Spam): $H = -\frac{2}{3} \log_2(\frac{2}{3}) - \frac{1}{3} \log_2(\frac{1}{3})$ $= -0.667 \times (-0.585) - 0.333 \times (-1.585)$ $= 0.390 + 0.528 = 0.918$

Free=No (3 emails: 1 Spam, 2 Not Spam): $H = -\frac{1}{3} \log_2(\frac{1}{3}) - \frac{2}{3} \log_2(\frac{2}{3})$ $= 0.528 + 0.390 = 0.918$

Step 3: Weighted Average Entropy $H(children) = \frac{3}{6} \times 0.918 + \frac{3}{6} \times 0.918 = 0.918$

Step 4: Information Gain $IG = H(parent) - H(children) = 1.0 - 0.918 = 0.082$

Answer: Information Gain = 0.082 bits

Interpretation: "Contains Free" provides a small amount of information for classification. Higher IG would indicate a better split.
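
The same calculation as a short sketch, working from the class counts in each node. The helper name entropy and the list layout are illustrative choices.

```python
from math import log2

def entropy(counts):
    """Entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

parent = [3, 3]                  # 3 Spam, 3 Not Spam
children = [[2, 1], [1, 2]]      # Free=Yes, Free=No

# Weight each child's entropy by its share of the parent's rows.
n = sum(parent)
weighted = sum(sum(child) / n * entropy(child) for child in children)
info_gain = entropy(parent) - weighted

print(round(entropy(parent), 3))   # 1.0
print(round(weighted, 3))          # 0.918
print(round(info_gain, 3))         # 0.082
```

Swapping in the counts for the "Contains Winner" split allows the same function to compare the two features.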

🏁 End of Exam

Question	Topic	Points
Q1	Python Output Analysis	20
Q2	Code Writing (choose 3/5)	30
Q3	Pandas & SVM/Random Forest	25
Q4	Naive Bayes & Decision Tree	25
Total		100

📝 Key Formulas Reference

Concept	Formula
Gini Index	1 - Σ(pᵢ²)
Entropy	-Σ pᵢ log₂(pᵢ)
Info Gain	H(parent) - Σ weighted H(children)
Bayes	P(A|B) ∝ P(B|A) × P(A)
Z-score	(x - μ) / σ
MinMax	(x - min) / (max - min)

All code verified and tested. Show your work for partial credit. Good luck!


Input: list of numbers
Return: tuple (maximum, minimum, difference)
Handle empty list by returning (None, None, None)

💡 Click to View Verified Answer

def find_max_min(numbers):
    """
    Find maximum, minimum, and their difference in a list.
    
    Args:
        numbers: List of numeric values
        
    Returns:
        tuple: (maximum, minimum, difference) or (None, None, None) if empty
        
    Examples:
        >>> find_max_min([5, 2, 8, 1, 9])
        (9, 1, 8)
        >>> find_max_min([])
        (None, None, None)
    """
    # Handle empty list edge case
    # IMPORTANT: Check this first to avoid errors with min()/max()
    if not numbers:  # Empty list is falsy in Python
        return (None, None, None)
    
    # Find maximum and minimum using built-in functions
    maximum = max(numbers)
    minimum = min(numbers)
    
    # Calculate difference (range of values)
    difference = maximum - minimum
    
    return (maximum, minimum, difference)
 
 
# Alternative: Without using built-in min/max
def find_max_min_manual(numbers):
    """Manual implementation without min()/max()."""
    if not numbers:
        return (None, None, None)
    
    # Initialize with first element
    maximum = numbers[0]
    minimum = numbers[0]
    
    # Iterate through remaining elements
    for num in numbers[1:]:
        if num > maximum:
            maximum = num
        if num < minimum:
            minimum = num
    
    return (maximum, minimum, maximum - minimum)
 
 
# ===== Test Cases =====
if __name__ == "__main__":
    test_cases = [
        [5, 2, 8, 1, 9],      # Normal case
        [3],                   # Single element
        [],                    # Empty list
        [-5, -2, -8, -1],     # Negative numbers
        [1, 1, 1, 1],         # All same
    ]
    
    print("Testing find_max_min:")
    for nums in test_cases:
        result = find_max_min(nums)
        print(f"  {nums} → max={result[0]}, min={result[1]}, diff={result[2]}")

Test Output:

Testing find_max_min:
  [5, 2, 8, 1, 9] → max=9, min=1, diff=8
  [3] → max=3, min=3, diff=0
  [] → max=None, min=None, diff=None
  [-5, -2, -8, -1] → max=-1, min=-8, diff=7
  [1, 1, 1, 1] → max=1, min=1, diff=0

Key points:

ALWAYS handle empty list first
Use built-in min() and max() for efficiency
Return a tuple, not a list

Q2.5 - Factorial (Recursive) (10 points)

Write a function factorial(n) that:

Calculates n! recursively
Handle: 0! = 1, negative returns None

💡 Click to View Verified Answer

def factorial(n):
    """
    Calculate factorial of n using recursion.
    
    Factorial definition:
    - n! = n × (n-1) × (n-2) × ... × 2 × 1
    - 0! = 1 (by definition)
    - Negative numbers: undefined (return None)
    
    Args:
        n: Non-negative integer
        
    Returns:
        int: n! or None for negative input
        
    Examples:
        >>> factorial(5)
        120
        >>> factorial(0)
        1
    """
    # Handle negative input
    if n < 0:
        return None
    
    # Base case: 0! = 1 and 1! = 1
    if n == 0 or n == 1:
        return 1
    
    # Recursive case: n! = n × (n-1)!
    return n * factorial(n - 1)
 
 
# Trace for factorial(4):
# factorial(4) = 4 × factorial(3)
#              = 4 × (3 × factorial(2))
#              = 4 × (3 × (2 × factorial(1)))
#              = 4 × (3 × (2 × 1))
#              = 4 × (3 × 2)
#              = 4 × 6
#              = 24
 
 
# Alternative: Iterative version (no recursion)
def factorial_iterative(n):
    """Calculate factorial using iteration."""
    if n < 0:
        return None
    
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result
 
 
# ===== Test Cases =====
if __name__ == "__main__":
    test_cases = [
        (0, 1),      # 0! = 1
        (1, 1),      # 1! = 1
        (5, 120),    # 5! = 120
        (10, 3628800),
        (-5, None),  # Negative
    ]
    
    print("Testing factorial:")
    for n, expected in test_cases:
        result = factorial(n)
        status = "✓" if result == expected else "✗"
        print(f"  {status} factorial({n}) = {result} (expected {expected})")

Test Output:

Testing factorial:
  ✓ factorial(0) = 1 (expected 1)
  ✓ factorial(1) = 1 (expected 1)
  ✓ factorial(5) = 120 (expected 120)
  ✓ factorial(10) = 3628800 (expected 3628800)
  ✓ factorial(-5) = None (expected None)

Recursion components:

Base case: stops recursion (n=0 or n=1)
Recursive case: breaks problem into smaller subproblem
Progress: n decreases each call, eventually reaching base case
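
The three components above can be made visible by printing each call and return. The following is a minimal sketch, not part of the verified answer; `factorial_traced` and its `depth` parameter are hypothetical names introduced only for illustration.

```python
def factorial_traced(n, depth=0):
    """Recursive factorial that prints every call and return (illustrative helper)."""
    indent = "  " * depth  # deeper calls are indented further

    # Negative input: undefined, same convention as factorial()
    if n < 0:
        return None

    print(f"{indent}factorial({n}) called")

    # Base case: stops the recursion
    if n == 0 or n == 1:
        print(f"{indent}base case reached, returning 1")
        return 1

    # Recursive case: the argument shrinks by 1, so the base case is always reached
    result = n * factorial_traced(n - 1, depth + 1)
    print(f"{indent}factorial({n}) returning {result}")
    return result


value = factorial_traced(4)
```

Each level of indentation is one pending call waiting for its subproblem. For very large n, this chain of pending calls can exceed Python's recursion limit and raise RecursionError, which is one reason the iterative version is a useful alternative.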

Question 3: Pandas & SVM/Random Forest (25 points)

Part A: Data Preprocessing (15 points)

Given this DataFrame:

import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler
 
data = {
    'Age': [25, 30, None, 35, 40],
    'Income': [30000, 50000, 45000, None, 60000],
    'Education': ['High School', 'Bachelor', 'Master', 'PhD', 'Bachelor'],
    'Purchased': ['No', 'Yes', 'Yes', 'No', 'Yes']
}
df = pd.DataFrame(data)

Q3.A1 (5 points) Fill missing Age with median, missing Income with mean.

💡 Click to View Verified Answer

import pandas as pd
 
# Create the DataFrame
data = {
    'Age': [25, 30, None, 35, 40],
    'Income': [30000, 50000, 45000, None, 60000],
    'Education': ['High School', 'Bachelor', 'Master', 'PhD', 'Bachelor'],
    'Purchased': ['No', 'Yes', 'Yes', 'No', 'Yes']
}
df = pd.DataFrame(data)
 
print("Original DataFrame:")
print(df)
print()
 
# Calculate statistics before filling
age_median = df['Age'].median()    # Median of [25, 30, 35, 40] = 32.5
income_mean = df['Income'].mean()  # Mean of [30000, 50000, 45000, 60000] = 46250
 
print(f"Age median (excluding NaN): {age_median}")
print(f"Income mean (excluding NaN): {income_mean}")
print()
 
# Fill missing values by assigning the result back to each column
df['Age'] = df['Age'].fillna(age_median)
df['Income'] = df['Income'].fillna(income_mean)
 
# Avoid df['Age'].fillna(age_median, inplace=True): calling an inplace method
# on a column selection triggers a chained-assignment warning in recent pandas
# versions and may leave df unchanged.
 
print("After filling missing values:")
print(df)

Calculations:

Age values (excluding NaN): [25, 30, 35, 40]
Age median: (30 + 35) / 2 = 32.5
Income values (excluding NaN): [30000, 50000, 45000, 60000]
Income mean: (30000 + 50000 + 45000 + 60000) / 4 = 46250
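
The hand calculation can be cross-checked without pandas. This is a small sketch using only the standard library; the list names are assumptions chosen for this example, and missing values are represented as None.

```python
from statistics import mean, median

ages = [25, 30, None, 35, 40]
incomes = [30000, 50000, 45000, None, 60000]

# Drop missing values first, mirroring how pandas skips NaN by default
known_ages = [a for a in ages if a is not None]
known_incomes = [x for x in incomes if x is not None]

age_median = median(known_ages)      # middle of the sorted known ages
income_mean = mean(known_incomes)    # sum of known incomes / count

# Replace each missing entry with the statistic computed above
filled_ages = [age_median if a is None else a for a in ages]
filled_incomes = [income_mean if x is None else x for x in incomes]

print(filled_ages)
print(filled_incomes)
```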

Result:

Row index 2 (third row): Age filled with 32.5
Row index 3 (fourth row): Income filled with 46250.0

Q3.A2 (5 points) Encode 'Education' using LabelEncoder. Show the mapping.

💡 Click to View Verified Answer

from sklearn.preprocessing import LabelEncoder
 
# Create LabelEncoder instance
le = LabelEncoder()
 
# Fit and transform the Education column
df['Education_Encoded'] = le.fit_transform(df['Education'])
 
print("Encoding result:")
print(df[['Education', 'Education_Encoded']])
print()
 
# Show the mapping (classes are sorted alphabetically)
print("LabelEncoder mapping:")
for i, label in enumerate(le.classes_):
    print(f"  '{label}' → {i}")

LabelEncoder sorts alphabetically then assigns 0, 1, 2, ...:

Original	Encoded
Bachelor	0
High School	1
Master	2
PhD	3

Encoded column: [1, 0, 2, 3, 0]

Important: LabelEncoder assigns integers based on alphabetical order, not order of appearance!
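
To see why the codes come out in this order, the mapping can be reproduced by hand. The sketch below is a pure-Python imitation of the behaviour, not scikit-learn's implementation; `classes`, `mapping`, `encoded`, and `decoded` are names chosen for this example.

```python
education = ['High School', 'Bachelor', 'Master', 'PhD', 'Bachelor']

# Sorted unique labels play the role of le.classes_
classes = sorted(set(education))

# Position in the sorted list becomes the integer code
mapping = {label: code for code, label in enumerate(classes)}
encoded = [mapping[label] for label in education]

# Reverse lookup, analogous to what le.inverse_transform does
decoded = [classes[code] for code in encoded]

print(mapping)
print(encoded)
```

Note that the integers carry no meaning beyond identity: 'PhD' → 3 does not say a PhD is "three times" a Bachelor, which matters for models that treat the column as numeric.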

Q3.A3 (5 points) When should you use StandardScaler vs MinMaxScaler?

💡 Click to View Answer

Scaler	Formula	Output	Best For
StandardScaler	(x - mean) / std	Mean=0, Std=1	SVM, Logistic Regression, data with mild outliers (less distorted than MinMaxScaler)
MinMaxScaler	(x - min) / (max - min)	[0, 1]	Neural Networks, KNN, image data

Use StandardScaler when:

Data is approximately normally distributed
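You want outliers to stay visible as large values instead of squeezing all other values into a narrow band (the mean and std are still affected by them)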
Using algorithms like SVM, Linear Regression

Use MinMaxScaler when:

You need bounded output (0 to 1)
Working with neural networks or image data
Outliers are not a concern

Quick rule:

SVM, Linear models → StandardScaler
Neural networks, KNN → MinMaxScaler
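
Both formulas can be applied by hand to the filled Age column from Q3.A1. This is a minimal standard-library sketch rather than a call to scikit-learn; it assumes the population standard deviation (ddof=0), which is what StandardScaler uses.

```python
from statistics import mean, pstdev

ages = [25, 30, 32.5, 35, 40]

# StandardScaler formula: (x - mean) / std
mu = mean(ages)
sigma = pstdev(ages)  # population std (ddof=0)
standardized = [(x - mu) / sigma for x in ages]

# MinMaxScaler formula: (x - min) / (max - min)
lo, hi = min(ages), max(ages)
min_max = [(x - lo) / (hi - lo) for x in ages]

print(standardized)  # centred on 0, unbounded
print(min_max)       # bounded to the interval [0, 1]
```

Here the mean is 32.5 and the standard deviation is 5, so the standardized values are symmetric around 0, while the min-max values run from 0 (age 25) to 1 (age 40).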

Part B: SVM & Random Forest Theory (10 points)

Q3.B1 (5 points) Explain the "kernel trick" in SVM.

💡 Click to View Answer

Kernel Trick Explanation:

Problem: Some data is not linearly separable in its original space.

Solution: The kernel trick implicitly maps the data into a higher-dimensional space where it becomes linearly separable.

How it works:

Original 2D data might have circular boundaries (can't draw a straight line)
Transform to 3D using a kernel function
In 3D, a flat plane can now separate the classes
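The "trick": the SVM only needs inner products between pairs of points, and a kernel function returns those inner products directly from the original coordinates, so the transformation itself is never computed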
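
The last step can be checked numerically. The sketch below uses a homogeneous degree-2 polynomial kernel, k(a, b) = (a · b)², chosen here only because its explicit 3D feature map is easy to write down; `phi`, `dot`, and `poly2_kernel` are hypothetical helper names, and this is not the exact kernel formula used by SVC(kernel='poly'), which also includes gamma and coef0 terms.

```python
import math


def phi(v):
    """Explicit degree-2 feature map from 2D to 3D (illustrative helper)."""
    x1, x2 = v
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dot(a, b):
    """Plain inner product of two equal-length tuples."""
    return sum(p * q for p, q in zip(a, b))


def poly2_kernel(a, b):
    """Kernel value computed from the ORIGINAL 2D points only."""
    return dot(a, b) ** 2


x = (1.0, 2.0)
y = (3.0, 4.0)

explicit = dot(phi(x), phi(y))   # transform to 3D first, then inner product
implicit = poly2_kernel(x, y)    # same number, no transformation computed

print(explicit, implicit)
```

Both routes give the same value (121 for these two points, up to floating-point rounding), but the kernel route never builds the 3D vectors. For kernels such as RBF the corresponding feature space is infinite-dimensional, so the implicit route is the only practical one.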

Common kernels:

Kernel	Use Case
Linear	Already linearly separable
RBF (Radial Basis Function)	Default choice, works well for most cases
Polynomial	Data with polynomial relationships

Example in code:

from sklearn.svm import SVC
 
# Linear kernel
model_linear = SVC(kernel='linear')
 
# RBF kernel (default)
model_rbf = SVC(kernel='rbf')
 
# Polynomial kernel
model_poly = SVC(kernel='poly', degree=3)

Q3.B2 (5 points) What is "bagging" in Random Forest? Why does it help?

💡 Click to View Answer

Bagging (Bootstrap Aggregating):

Process:

Draw multiple bootstrap samples from the training data (random sampling with replacement, each typically the same size as the original set)
Train a separate decision tree on each subset
Combine predictions:
- Classification: majority voting
- Regression: average
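
The sampling and combining steps can be sketched in a few lines. This is an illustration of the mechanics only, with no actual decision trees trained; `bootstrap_sample`, `majority_vote`, and `average` are hypothetical helper names, and the seed 42 is arbitrary.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement, so duplicates are expected."""
    return [rng.choice(data) for _ in data]


def majority_vote(predictions):
    """Classification: the most frequent predicted label wins."""
    return Counter(predictions).most_common(1)[0][0]


def average(predictions):
    """Regression: the mean of the individual predictions."""
    return sum(predictions) / len(predictions)


rng = random.Random(42)          # fixed seed so the run is repeatable
training_rows = list(range(10))  # stand-in for 10 training examples

# One bootstrap sample per tree; each tree would be trained on its own sample
samples = [bootstrap_sample(training_rows, rng) for _ in range(3)]
for s in samples:
    print(sorted(s))

print(majority_vote(['Spam', 'Not Spam', 'Spam']))
print(average([2.0, 4.0, 6.0]))
```

Because sampling is with replacement, each sample usually repeats some rows and omits others, which is what makes the individual trees differ from one another.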

Why it helps:

Reduces Overfitting
- Each tree sees different data
- Individual tree errors cancel out
- Ensemble is more robust
Reduces Variance
- Averaging many predictions is more stable
- Less sensitive to noise in training data
Handles Outliers Better
- Outliers only affect some trees, not all
- Their influence is diluted in the ensemble
Better Generalization
- Collective wisdom outperforms single tree
- Works well on unseen data

Analogy: Like asking 100 doctors for diagnosis instead of 1 - the collective opinion is usually more reliable.

Question 4: Naive Bayes & Decision Tree (25 points)

Part A: Naive Bayes Calculation (15 points)

Dataset: Email classification

Email	Contains "Free"	Contains "Winner"	Spam?
1	Yes	Yes	Spam
2	Yes	No	Spam
3	No	Yes	Spam
4	No	No	Not Spam
5	Yes	No	Not Spam
6	No	No	Not Spam

Q4.A1 (10 points) A new email contains "Free" but not "Winner". Calculate P(Spam|Free=Yes, Winner=No).

💡 Click to View Verified Answer

Naive Bayes Formula: $P(Class|Features) \propto P(Class) \times \prod P(Feature|Class)$

Step 1: Calculate Prior Probabilities

Class	Count	P(Class)
Spam	3 (emails 1,2,3)	3/6 = 0.5
Not Spam	3 (emails 4,5,6)	3/6 = 0.5

Step 2: Calculate Likelihoods

For Spam emails (1, 2, 3):

P(Free=Yes | Spam) = 2/3 (emails 1, 2 have Free)
P(Winner=No | Spam) = 1/3 (only email 2 has Winner=No)

For Not Spam emails (4, 5, 6):

P(Free=Yes | Not Spam) = 1/3 (only email 5)
P(Winner=No | Not Spam) = 3/3 = 1 (all three)

Step 3: Calculate Unnormalized Posteriors

$P(Spam|evidence) \propto P(Spam) \times P(Free=Yes|Spam) \times P(Winner=No|Spam)$ $= 0.5 \times \frac{2}{3} \times \frac{1}{3} = 0.5 \times 0.667 \times 0.333 = 0.111$

$P(NotSpam|evidence) \propto 0.5 \times \frac{1}{3} \times 1 = 0.167$

Step 4: Normalize

$P(Spam|evidence) = \frac{0.111}{0.111 + 0.167} = \frac{0.111}{0.278} = 0.40$

Answer: P(Spam | Free=Yes, Winner=No) = 0.40 = 40%

Prediction: NOT SPAM (probability < 50%)
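
The four steps can be reproduced with exact fractions to avoid rounding. This is a minimal sketch that counts directly from the table, without Laplace smoothing; `emails` and `unnormalized_posterior` are names introduced for this example.

```python
from fractions import Fraction

# (Contains "Free", Contains "Winner", label) for emails 1 to 6
emails = [
    ("Yes", "Yes", "Spam"),
    ("Yes", "No", "Spam"),
    ("No", "Yes", "Spam"),
    ("No", "No", "Not Spam"),
    ("Yes", "No", "Not Spam"),
    ("No", "No", "Not Spam"),
]


def unnormalized_posterior(label, free, winner):
    """Prior times the two per-feature likelihoods for one class."""
    rows = [e for e in emails if e[2] == label]
    prior = Fraction(len(rows), len(emails))
    p_free = Fraction(sum(e[0] == free for e in rows), len(rows))
    p_winner = Fraction(sum(e[1] == winner for e in rows), len(rows))
    return prior * p_free * p_winner


spam_score = unnormalized_posterior("Spam", "Yes", "No")
not_spam_score = unnormalized_posterior("Not Spam", "Yes", "No")

# Normalize so the two posteriors sum to 1
p_spam = spam_score / (spam_score + not_spam_score)

print(spam_score, not_spam_score, p_spam)
```

The exact values are 1/9 and 1/6 for the unnormalized scores and 2/5 for the posterior, matching the rounded 0.111, 0.167, and 0.40 above.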

Q4.A2 (5 points) What is the "naive" assumption in Naive Bayes? When might it fail?

💡 Click to View Answer

The "Naive" Assumption:

Features are conditionally independent given the class
P(A, B | Class) = P(A | Class) × P(B | Class)
Each feature contributes independently to the prediction

When it fails:

Correlated features
- Example: "Free" and "Prize" often appear together in spam
- Treating them as independent overcounts their combined effect
Redundant features
- Example: Having both "temperature in °C" and "temperature in °F"
- These are perfectly correlated, violating independence
Feature interactions matter
- Example: Medical diagnosis where symptom combinations are important
- Symptom A alone is harmless, but A+B together indicates disease
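
The redundant-feature case can be demonstrated with the likelihoods from Q4.A1. The sketch below pretends the "Free" feature is entered twice (a perfectly correlated duplicate) and shows how the posterior shifts; `posterior_spam` and `free_copies` are hypothetical names, and the duplicated feature is an assumption made purely for illustration.

```python
from fractions import Fraction


def posterior_spam(free_copies):
    """P(Spam | Free=Yes, Winner=No) when the Free feature is counted free_copies times."""
    # Likelihoods from Q4.A1: P(Free=Yes|Spam)=2/3, P(Winner=No|Spam)=1/3
    spam = Fraction(1, 2) * Fraction(2, 3) ** free_copies * Fraction(1, 3)
    # P(Free=Yes|Not Spam)=1/3, P(Winner=No|Not Spam)=1
    not_spam = Fraction(1, 2) * Fraction(1, 3) ** free_copies * Fraction(1, 1)
    return spam / (spam + not_spam)


once = posterior_spam(1)    # the original calculation
twice = posterior_spam(2)   # same evidence, counted twice

print(once, float(once))
print(twice, float(twice))
```

Counting the same evidence twice moves the posterior from 2/5 to 4/7, which crosses 50% and flips the prediction to Spam even though no new information was added.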

Despite this limitation: Naive Bayes often works surprisingly well in practice, especially for:

Text classification
Spam detection
Sentiment analysis

Part B: Information Gain (10 points)

Q4.B1 (10 points) Calculate Information Gain for the "Contains Free" feature.

Original dataset: 3 Spam, 3 Not Spam

💡 Click to View Verified Answer

Entropy Formula: $H(S) = -\sum p_i \log_2(p_i)$

Step 1: Parent Entropy (3 Spam, 3 Not Spam)

$H(parent) = -0.5 \log_2(0.5) - 0.5 \log_2(0.5)$ $= -0.5 \times (-1) - 0.5 \times (-1)$ $= 0.5 + 0.5 = 1.0$

(Maximum entropy for binary classification = 1.0)

Step 2: Split by "Contains Free"

Free=Yes (3 emails: 2 Spam, 1 Not Spam): $H = -\frac{2}{3} \log_2(\frac{2}{3}) - \frac{1}{3} \log_2(\frac{1}{3})$ $= -0.667 \times (-0.585) - 0.333 \times (-1.585)$ $= 0.390 + 0.528 = 0.918$

Free=No (3 emails: 1 Spam, 2 Not Spam): $H = -\frac{1}{3} \log_2(\frac{1}{3}) - \frac{2}{3} \log_2(\frac{2}{3})$ $= 0.528 + 0.390 = 0.918$

Step 3: Weighted Average Entropy $H(children) = \frac{3}{6} \times 0.918 + \frac{3}{6} \times 0.918 = 0.918$

Step 4: Information Gain $IG = H(parent) - H(children) = 1.0 - 0.918 = 0.082$

Answer: Information Gain = 0.082 bits

Interpretation: "Contains Free" provides a small amount of information for classification. Higher IG would indicate a better split.
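
The same calculation can be scripted from class counts. This is a minimal sketch; `entropy` and `information_gain` are helper names chosen for this example. The "Contains Winner" split is included only as a comparison and is derived from the table in Part A, not from the verified answer above.

```python
from math import log2


def entropy(counts):
    """Entropy in bits from a list of class counts; empty classes contribute 0."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def information_gain(parent_counts, children_counts):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = sum(parent_counts)
    weighted = sum(sum(child) / total * entropy(child) for child in children_counts)
    return entropy(parent_counts) - weighted


parent = [3, 3]                   # [Spam, Not Spam]
free_split = [[2, 1], [1, 2]]     # Free=Yes, Free=No
winner_split = [[2, 0], [1, 3]]   # Winner=Yes, Winner=No

ig_free = information_gain(parent, free_split)
ig_winner = information_gain(parent, winner_split)

print(round(ig_free, 3))
print(round(ig_winner, 3))
```

On this dataset "Contains Winner" yields a gain of about 0.459 bits against 0.082 for "Contains Free", so a decision tree using information gain would split on "Winner" first.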

🏁 End of Exam

Question	Topic	Points
Q1	Python Output Analysis	20
Q2	Code Writing (choose 3/5)	30
Q3	Pandas & SVM/Random Forest	25
Q4	Naive Bayes & Decision Tree	25
Total		100

📝 Key Formulas Reference

Concept	Formula
Gini Index	1 - Σ(pᵢ²)
Entropy	-Σ pᵢ log₂(pᵢ)
Info Gain	H(parent) - Σ weighted H(children)
Bayes	P(A|B) ∝ P(B|A) × P(A)
Z-score	(x - μ) / σ
MinMax	(x - min) / (max - min)

All code verified and tested. Show your work for partial credit. Good luck!
