TechBlog
Study Notes

ABW505 Complete Question Bank: Python & Machine Learning (All English)

January 24, 2026
35 min read

ABW505 Complete Question Bank - Python & Machine Learning

📚 All code in this document has been verified and tested. Every answer includes detailed explanations and comments.


📋 Exam Structure Overview

Section | Points | Type | Coverage
Q1 | 20 | Output Analysis | Python basics: variables, operators, lists, tuples, functions, loops, conditions
Q2 | 30 | Code Writing (3/5) | Decision structure, repetition, boolean logic, lists/tuples, functions
Q3 | 25 | Theory + Code | Pandas, Data Preprocessing, Encoder, SVM, Random Forest
Q4 | 25 | Theory + Calculation | Naive Bayes, Decision Tree, Gini Index, Entropy

📝 Q1: Python Output Analysis (20 points)

Key Pattern 1: Operator Precedence (MUST KNOW!)

Priority order: ** (power) → *, /, //, % → +, -

Operator | Meaning | Example
** | Power/Exponent | 5**2 = 25
/ | Division (float) | 5/2 = 2.5
// | Floor division (integer) | 5//2 = 2
% | Modulo (remainder) | 10%3 = 1

Problem 1.1: Power and Floor Division

print(5**2 // 3)

Step-by-step:

  1. 5**2 = 25 (power first, highest priority)
  2. 25 // 3 = 8 (floor division, discard remainder)

Answer: 8

Key concept: Power ** has higher priority than //. Floor division always rounds DOWN toward negative infinity.


Problem 1.2: Mixed Operations

print(3 + 4 * 4 // 4)

Step-by-step:

  1. 4 * 4 = 16 (multiplication first)
  2. 16 // 4 = 4 (floor division, same priority as multiplication, left to right)
  3. 3 + 4 = 7 (addition last)

Answer: 7


Problem 1.3: Power and Multiplication

print(2 * 3 ** 2)

Step-by-step:

  1. 3**2 = 9 (power first!)
  2. 2 * 9 = 18

Answer: 18

Common mistake: 2 * 3 = 6, then 6**2 = 36. WRONG! Power has higher priority.


Problem 1.4: Negative Floor Division (TRICKY!)

print(-5 // 3)

Key insight: Floor division rounds toward NEGATIVE infinity, not toward zero!

  • -5 ÷ 3 = -1.666...
  • Rounding DOWN (toward -∞) → -2

Answer: -2

This is NOT the same as integer division in some other languages! Python's // always floors toward negative infinity.
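
To make the difference concrete, here is a small sketch that compares `//` with truncation toward zero. `math.trunc` is used only as a stand-in for how integer division behaves in languages such as C or Java; the variable names are illustrative.

```python
import math

# Floor division rounds toward negative infinity.
floor_result = -5 // 3              # -2

# Truncation rounds toward zero (the behavior of C or Java integer division).
trunc_result = math.trunc(-5 / 3)   # -1

# The remainder follows the floor rule, so this identity always holds:
#   a == (a // b) * b + (a % b)
quotient, remainder = divmod(-5, 3)

print(floor_result, trunc_result)   # -2 -1
print(quotient, remainder)          # -2 1
```

Because the quotient is floored, the remainder of `-5 % 3` is 1 rather than -2: in Python the result of `%` takes the sign of the divisor.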


Key Pattern 2: List Iteration and Sum

Problem 1.5: Calculate Average

numbers = [2, 4, 6, 8]
total = 0
for n in numbers:
    total += n
print(total / len(numbers))

Trace:

  • Loop 1: total = 0 + 2 = 2
  • Loop 2: total = 2 + 4 = 6
  • Loop 3: total = 6 + 6 = 12
  • Loop 4: total = 12 + 8 = 20
  • Average: 20 / 4 = 5.0

Answer: 5.0

Note: Division / always returns a float in Python 3, so the answer is 5.0 not 5.


Key Pattern 3: Tuple Operations (MUST KNOW!)

Keywords: Tuple = immutable list; cannot be modified after creation; defined with ()


Problem 1.5a: Basic Tuple Index

t = ("study", "exercises", "exam")
print(t[1])

Index map:

Element:  "study"  "exercises"  "exam"
Index:       0         1          2

Answer: exercises

Keywords: tuple indexes start at 0, same as a list


Problem 1.5b: Tuple with len()

t = ("A", "B", "C")
print(len(t))

Answer: 3

Keywords: len() counts the elements; it works the same way on tuples and lists


Problem 1.5c: Tuple Negative Indexing

t = ("A", "B", "C")
print(t[-1])

Negative index map:

Element:  "A"   "B"   "C"
Negative:  -3    -2    -1

Answer: C

Keywords: -1 is the last element; negative indexes count from right to left


Problem 1.5d: Tuple Slicing

t = (1, 2, 3, 4, 5)
print(t[1:4])

Slice rule: left-inclusive, right-exclusive

Answer: (2, 3, 4)

Keywords: t[1:4] takes indexes 1, 2, 3 (4 is excluded); the result is still a tuple


Problem 1.5e: Tuple Repetition

t = (1, 2)
print(t * 3)

Answer: (1, 2, 1, 2, 1, 2)

Keywords: * repeats the tuple, just as it does for strings


Problem 1.5f: Tuple is Immutable (TRICKY!)

t = (1, 2, 3)
t[0] = 100
print(t)

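Answer: TypeError (the program crashes at t[0] = 100, so print(t) never runs)

Keywords: a tuple is immutable; its elements cannot be changed after creation

Contrast: a list is mutable, so its elements can be changed

lst = [1, 2, 3]
lst[0] = 100  # ✅ works fine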
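
As a minimal sketch of what happens at runtime, the snippet below catches the error instead of crashing and then shows the usual workaround of building a new tuple. The names `message` and `updated` are illustrative only.

```python
t = (1, 2, 3)

try:
    t[0] = 100                 # tuples do not support item assignment
except TypeError as err:
    message = str(err)
    print("TypeError:", message)

# To "change" a tuple, build a new one instead of editing in place.
updated = (100,) + t[1:]
print(updated)                 # (100, 2, 3)
print(t)                       # (1, 2, 3), the original is untouched
```

Note the trailing comma in `(100,)`: without it, `(100)` is just the integer 100 in parentheses.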

Problem 1.5g: For Loop with Tuple

t = (2, 4, 6)
for x in t:
    print(x)

Answer:

2
4
6

Keywords: tuples support for-loop iteration, exactly like lists


Problem 1.5h: List of Tuples (nested-structure question!)

data = [("Ann", 80), ("Bob", 60)]
print(data[1])

Key: the outer container is a list; each element is a tuple

Answer: ('Bob', 60)

Keywords: data[1] takes the list element at index 1 (the second element, which is the whole tuple)


Problem 1.5i: Nested Indexing (double indexing)

data = [("Ann", 80), ("Bob", 60)]
print(data[1][0])

Step-by-step:

  1. data[1] = ("Bob", 60)
  2. ("Bob", 60)[0] = "Bob"

Answer: Bob

Keywords: double indexing works from the outside in: take the outer element first, then the inner one


Problem 1.5j: Tuple Unpacking with For Loop

data = [("Ann", 80), ("Bob", 60)]
for name, score in data:
    print(name)

Key: each tuple is unpacked automatically; name and score receive its two elements

Answer:

Ann
Bob

Keywords: tuple unpacking; the number of variables must match the number of elements in the tuple


Problem 1.5k: Mixed Tuple and List

data = [(1, 2), (3, 4), (5, 6)]
print(data[2][1])

Step-by-step:

  1. data[2] = (5, 6) (the third tuple)
  2. (5, 6)[1] = 6 (the second element of the tuple)

Answer: 6


Problem 1.5l: in Operator with Tuple

t = ("X", "Y", "Z")
if "Y" in t:
    print("Y")
else:
    print("N")

Answer: Y

Keywords: in checks whether an element exists; both tuples and lists support it


Key Pattern 4: Function Basics (MUST KNOW!)

Keywords: def defines a function; return hands back a result and ends the function; print handles output


Problem 1.6a: Basic Function

def f(x):
    return x * 2
 
print(f(3))

Step-by-step:

  1. Call f(3), so x = 3
  2. return 3*2 = 6
  3. print(6)

Answer: 6

Keywords: arguments are passed in as values; return hands back the computed result


Problem 1.6b: Function Without Print (TRICKY!)

def f(x):
    return x * 2
 
f(3)

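Answer: no output (nothing is printed!)

Keywords: return only hands back a value; it does not display anything. No print, no output! Here f(3) returns 6, but the value is discarded. (An interactive session would echo 6; a script prints nothing.)

Key differences:

  • return = hand back the result and end the function (nothing is displayed)
  • print = write to the screen (displayed)
  • calling a function ≠ automatic output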
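
One way to check the difference between return and print is to capture standard output around each call. This is only an illustrative sketch; `silent_output` and `printed_output` are names chosen for this example.

```python
import io
from contextlib import redirect_stdout

def f(x):
    return x * 2

# Capture everything written to the screen while calling f(3) without print().
buffer = io.StringIO()
with redirect_stdout(buffer):
    f(3)                        # the return value is discarded
silent_output = buffer.getvalue()

# Same call, but this time the returned value is passed to print().
buffer = io.StringIO()
with redirect_stdout(buffer):
    print(f(3))
printed_output = buffer.getvalue()

print(repr(silent_output))      # ''
print(repr(printed_output))     # '6\n'
```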

Problem 1.6c: Multiple Parameters

def add(a, b):
    return a + b
 
print(add(2, 5))

Answer: 7

Keywords: multiple parameters are separated by commas; 2+5=7


Problem 1.6d: Function with Arithmetic

def f(x):
    return x + 1
 
print(f(2) + f(3) * 2)

Step-by-step:

  1. f(2) = 2+1 = 3
  2. f(3) = 3+1 = 4
  3. 3 + 4*2 = 3 + 8 = 11 (multiplication first!)

Answer: 11

Keywords: return values take part in the expression and follow normal operator precedence


Problem 1.6e: Boolean Function

def is_even(n):
    return n % 2 == 0
 
print(is_even(5))

Step-by-step:

  1. 5 % 2 = 1 (remainder)
  2. 1 == 0? False

Answer: False

Keywords: a Boolean function returns True/False; % gives the remainder


Problem 1.6f: Function with If (common mixed question type)

def check(n):
    if n > 10:
        print("Big")
    else:
        print("Small")
    return None
 
result = check(12)
print(result)

Step-by-step:

  1. check(12): 12 > 10 is true, so print("Big") runs
  2. return None
  3. print(result) → print(None)

Answer:

Big
None

Keywords: print inside the function still runs; the returned None is then printed by print(result)


Problem 1.6g: Function with For Loop

def sum_list(a):
    s = 0
    for x in a:
        s += x
    return s
 
print(sum_list([1, 2, 3]))

Trace:

  • s=0, x=1: s=0+1=1
  • x=2: s=1+2=3
  • x=3: s=3+3=6

Answer: 6

Keywords: a function parameter can be a list; loop over it and accumulate


Problem 1.6h: Function Returning String

def grade(m):
    if m >= 50:
        return "pass"
    else:
        return "fail"
 
print(grade(45))

Step-by-step:

  1. grade(45): 45>=50? False
  2. return "fail"

Answer: fail

Keywords: return can hand back any type, including strings


Problem 1.6i: Nested Function Call (possibly beyond the syllabus)

def f(x):
    return x + 1
 
def g(x):
    return f(x) * 2
 
print(g(3))

Step-by-step:

  1. g(3) calls f(3)
  2. f(3) = 3+1 = 4
  3. g(3) = 4 * 2 = 8

Answer: 8

Keywords: nested function calls; the inner function runs first


Key Pattern 5: String Slicing (Left-Inclusive, Right-Exclusive)

Problem 1.6: String Slice

s = "ABW505"
print(s[1:5])

Index map:

Character:  A   B   W   5   0   5
Index:      0   1   2   3   4   5

s[1:5] → indices 1, 2, 3, 4 (NOT including 5)

Answer: BW50


Problem 1.7: Negative Indexing

text = "Hello World"
print(text[-5:-1])

Index map:

Character: H   e   l   l   o       W   o   r   l   d
Positive:  0   1   2   3   4   5   6   7   8   9   10
Negative:-11 -10  -9  -8  -7  -6  -5  -4  -3  -2  -1

text[-5:-1] → from 'W' (index -5) to 'l' (index -2, NOT including -1)

Answer: Worl


Key Pattern 6: List Operations

Problem 1.8: Slice Assignment (TRICKY!)

numbers = [10, 20, 30, 40, 50]
numbers[1:4] = [100]
print(len(numbers))
print(numbers[2])

Step-by-step:

  1. Original: [10, 20, 30, 40, 50]
  2. numbers[1:4] selects [20, 30, 40] (3 elements)
  3. Replace with [100] (1 element)
  4. Result: [10, 100, 50]
  5. Length: 3
  6. numbers[2] = 50

Answers:

  • len(numbers) → 3
  • numbers[2] → 50

Key concept: Slice assignment can change list size! Replacing 3 elements with 1 element reduces length by 2.
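
Slice assignment can also grow a list or delete from it. The following sketch runs all three cases in sequence on the same list; the snapshot names (`shrunk`, `grown`, `deleted`) are illustrative.

```python
numbers = [10, 20, 30, 40, 50]

# Shrink: three elements replaced by one.
numbers[1:4] = [100]
shrunk = numbers.copy()          # [10, 100, 50]

# Grow: one element replaced by three.
numbers[1:2] = [1, 2, 3]
grown = numbers.copy()           # [10, 1, 2, 3, 50]

# Delete: replace a slice with an empty list.
numbers[1:4] = []
deleted = numbers.copy()         # [10, 50]

print(shrunk, grown, deleted)
```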


Problem 1.9: List Reference vs Copy

a = [1, 2, 3]
b = a
b.append(4)
print(a)
print(a is b)

Key concept: b = a creates a REFERENCE, not a copy!

  1. a and b point to the SAME list object
  2. Modifying b also modifies a
  3. a is b → True (same object in memory)

Answers:

  • print(a) → [1, 2, 3, 4]
  • print(a is b) → True

To create an independent copy: Use b = a.copy() or b = a[:]
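
A short sketch contrasting an alias with the two copying idioms; the names `alias`, `clone`, and `sliced` are chosen for illustration.

```python
a = [1, 2, 3]

alias = a            # same object, just a second name
clone = a.copy()     # new list with the same elements
sliced = a[:]        # another way to make a new list

alias.append(4)

print(a)                        # [1, 2, 3, 4]  (changed through alias)
print(clone)                    # [1, 2, 3]     (independent)
print(sliced)                   # [1, 2, 3]     (independent)
print(a is alias, a is clone)   # True False
```

Both `copy()` and `[:]` make shallow copies: if the list contains other lists, those inner lists are still shared between the original and the copy.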


Key Pattern 7: Functions with Default Arguments

Problem 1.10: Keyword Arguments

def mystery(a, b=5, c=10):
    return a * 2 + b - c
 
result = mystery(3, c=4)
print(result)

Step-by-step:

  1. a = 3 (positional argument)
  2. b = 5 (uses default, NOT overridden)
  3. c = 4 (keyword argument overrides default)
  4. Calculation: 3 * 2 + 5 - 4 = 6 + 5 - 4 = 7

Answer: 7

Key concept: Keyword arguments let you skip over default parameters.


Key Pattern 8: Loops and Range

Problem 1.11: Range with Accumulator

total = 0
for i in range(1, 4):
    total += i
print(total)

range(1, 4) generates: 1, 2, 3 (NOT including 4)

Accumulation: 0 + 1 + 2 + 3 = 6

Answer: 6


Problem 1.12: Break Statement

for i in range(5):
    if i == 2:
        break
    print(i)

  • i=0: Print 0
  • i=1: Print 1
  • i=2: Break! Exit loop immediately

Answer:

0
1

Key Pattern 9: List Comprehension

Problem 1.13: Filtered List Comprehension

nums = [1, 2, 3, 4, 5]
result = [x**2 for x in nums if x % 2 == 1]
print(result)
print(sum(result))

Step-by-step:

  1. Filter odd numbers: 1, 3, 5 (where x % 2 == 1)
  2. Square each: 1², 3², 5² = 1, 9, 25
  3. Result: [1, 9, 25]
  4. Sum: 1 + 9 + 25 = 35

Answers:

  • result → [1, 9, 25]
  • sum(result) → 35

Pattern: [expression for item in iterable if condition]
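
If the pattern is hard to read, it may help to see it next to the equivalent explicit loop. This is a sketch; `squares_of_odds` and `loop_result` are illustrative names.

```python
nums = [1, 2, 3, 4, 5]

# Comprehension form: [expression for item in iterable if condition]
squares_of_odds = [x**2 for x in nums if x % 2 == 1]

# Equivalent explicit loop.
loop_result = []
for x in nums:                      # for item in iterable
    if x % 2 == 1:                  # condition
        loop_result.append(x**2)    # expression

print(squares_of_odds)   # [1, 9, 25]
print(loop_result)       # [1, 9, 25]
```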


📝 Q2: Code Writing (30 points - Choose 3 of 5)

Template 1: Menu with List (MUST MEMORIZE!)

Problem: Write a Python program that displays this menu repeatedly:

  1. Add a number to the list
  2. Display the list
  3. Exit

# Initialize empty list to store numbers
data = []
 
# Main program loop - runs until user chooses to exit
while True:
    # Display menu options with clear prompts
    print("\n--- MENU ---")
    print("1. Add a number to the list")
    print("2. Display the list")
    print("3. Exit")
    
    # Get user choice with prompt (IMPORTANT: include prompt text!)
    choice = input("Enter your choice (1/2/3): ")
    
    # Process user choice
    if choice == "1":
        # Option 1: Add number
        # Use try-except to handle invalid input gracefully
        try:
            num = int(input("Enter a number to add: "))
            data.append(num)
            print(f"Added {num} to the list.")
        except ValueError:
            print("Invalid input! Please enter a valid integer.")
    
    elif choice == "2":
        # Option 2: Display list
        if len(data) == 0:
            print("The list is empty.")
        else:
            print(f"Current list: {data}")
    
    elif choice == "3":
        # Option 3: Exit program
        print("Goodbye!")
        break
    
    else:
        # Handle invalid menu choice
        print("Invalid choice! Please enter 1, 2, or 3.")

Key improvements over the original buggy version:

  1. ✅ Added prompt text to input() - users know what to enter
  2. ✅ Added try-except for error handling - won't crash on invalid input
  3. ✅ Used string comparison instead of int - avoids crash if user enters text
  4. ✅ Added feedback messages - users know what happened
  5. ✅ Added empty list check - better user experience

ORIGINAL BUGGY VERSION (what was wrong):

# PROBLEMATIC CODE - DO NOT USE IN EXAM
data = []
while True:
    print("1.Add")
    print("2.Show")
    print("3.Exit")
    c = int(input())  # BUG: Crashes if user enters non-integer!
    
    if c == 1:
        data.append(int(input()))  # BUG: Crashes on invalid input, no prompt!
    elif c == 2:
        print(data)
    elif c == 3:
        break
# Missing: else clause, error handling, user prompts

Why it crashes: int(input()) without try-except raises ValueError if the user enters anything that is not an integer (for example pressing Enter, typing "abc", or typing "3.5").
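
One possible way to avoid repeating the try-except everywhere is a small helper that keeps asking until the input is valid. The sketch below is not part of the exam template: `read_int` and its `reader` parameter are hypothetical names, and `reader` exists only so the demonstration can run without a real keyboard.

```python
def read_int(prompt, reader=input):
    """Keep asking until the reader returns text that int() accepts."""
    while True:
        text = reader(prompt)
        try:
            return int(text)
        except ValueError:
            print(f"{text!r} is not a valid integer, try again.")


# Simulated user input for demonstration: Enter, "abc", then "42".
fake_answers = iter(["", "abc", "42"])
value = read_int("Enter a number: ", reader=lambda prompt: next(fake_answers))
print(value)   # 42
```

In a real program the call would simply be `read_int("Enter a number: ")`, which falls back to the built-in `input`.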

✍️ HANDWRITING VERSION (condensed)

Only the core logic is kept; all comments and error handling are removed:

data = []
while True:
    print("1.Add 2.Show 3.Exit")
    c = input("Choice: ")
    if c == "1":
        data.append(int(input("Num: ")))
    elif c == "2":
        print(data)
    elif c == "3":
        break

Handwriting tips: about 10 lines; must have while True plus break to exit


Template 2: List with Sentinel Value (-1)

Problem: Write a Python program that:

  • Allows user to enter integers
  • Stops when user enters -1
  • Prints the minimum, maximum, and average

# Initialize empty list to store user's numbers
nums = []
 
print("Enter integers. Enter -1 to stop.")
 
# Main input loop
while True:
    try:
        # Get integer input with clear prompt
        n = int(input("Enter a number (-1 to stop): "))
        
        # Check for sentinel value
        if n == -1:
            break  # Exit loop when user enters -1
        
        # Add valid number to list
        nums.append(n)
        
    except ValueError:
        # Handle non-integer input
        print("Invalid input! Please enter an integer.")
 
# Calculate and display statistics
# IMPORTANT: Check if list is empty to avoid division by zero!
if len(nums) == 0:
    print("No numbers were entered.")
else:
    minimum = min(nums)
    maximum = max(nums)
    average = sum(nums) / len(nums)
    
    print(f"\nResults:")
    print(f"Minimum: {minimum}")
    print(f"Maximum: {maximum}")
    print(f"Average: {average:.2f}")  # .2f for 2 decimal places

Sample run:

Enter integers. Enter -1 to stop.
Enter a number (-1 to stop): 5
Enter a number (-1 to stop): 10
Enter a number (-1 to stop): 3
Enter a number (-1 to stop): -1

Results:
Minimum: 3
Maximum: 10
Average: 6.00

Edge case handling: Always check if list is empty before calculating statistics! min([]) and max([]) will raise ValueError, and sum([])/len([]) will raise ZeroDivisionError.
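
The same guard can be packaged as a function, which makes the empty-list case easy to test. A minimal sketch, assuming a hypothetical helper named `summarize` that returns None when there is nothing to summarize:

```python
def summarize(nums):
    """Return (minimum, maximum, average), or None when the list is empty."""
    if not nums:                 # an empty list is falsy
        return None
    return min(nums), max(nums), sum(nums) / len(nums)


print(summarize([5, 10, 3]))     # (3, 10, 6.0)
print(summarize([]))             # None
```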


Template 3: Dictionary Operations

Problem: Write a Python program that:

  • Stores student names and marks in a dictionary
  • Allows multiple entries
  • Prints the average mark

# Initialize empty dictionary: {name: mark}
students = {}
 
print("Student Grade Recorder")
print("Enter student names and marks. Type 'stop' as name to finish.")
 
# Main input loop
while True:
    # Get student name with prompt
    name = input("\nEnter student name (or 'stop' to finish): ")
    
    # Check for stop condition (case-insensitive)
    if name.lower() == "stop":
        break
    
    # Check for empty name
    if name.strip() == "":
        print("Name cannot be empty!")
        continue
    
    # Get mark with error handling
    try:
        mark = int(input(f"Enter mark for {name}: "))
        
        # Optional: Validate mark range
        if mark < 0 or mark > 100:
            print("Warning: Mark is outside 0-100 range.")
        
        # Store in dictionary
        students[name] = mark
        print(f"Recorded: {name} = {mark}")
        
    except ValueError:
        print("Invalid mark! Please enter a number.")
 
# Calculate and display average
if len(students) == 0:
    print("\nNo students were recorded.")
else:
    # Get all marks using .values()
    all_marks = students.values()
    average = sum(all_marks) / len(all_marks)
    
    print(f"\n--- Student Records ---")
    for name, mark in students.items():
        print(f"{name}: {mark}")
    print(f"\nAverage mark: {average:.2f}")

Key dictionary operations:

  • dict.values() - get all values (marks)
  • dict.items() - get all key-value pairs
  • dict.keys() - get all keys (names)

✍️ HANDWRITING VERSION (condensed)

students = {}
while True:
    name = input("Name (stop to end): ")
    if name == "stop":
        break
    mark = int(input("Mark: "))
    students[name] = mark
# Calculate average
avg = sum(students.values()) / len(students)
print("Average:", avg)

Handwriting tips: about 10 lines; store data in a dict; .values() gets all the marks


Template 4: Grade Calculator (Decision Structure)

Problem: Write a function grade_calculator(score) that:

  • Returns letter grade: 90+ → "A", 80+ → "B", 70+ → "C", 60+ → "D", <60 → "F"
  • Returns "Invalid" for negative or > 100

def grade_calculator(score):
    """
    Convert numeric score to letter grade.
    
    Args:
        score: Numeric score (expected 0-100)
    
    Returns:
        str: Letter grade (A/B/C/D/F) or "Invalid"
    """
    # FIRST: Check for invalid input
    # Must check this BEFORE checking grade ranges
    if score < 0 or score > 100:
        return "Invalid"
    
    # Check grades from highest to lowest
    # Using elif ensures only ONE condition is matched
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    elif score >= 70:
        return "C"
    elif score >= 60:
        return "D"
    else:
        return "F"
 
 
# Test the function
if __name__ == "__main__":
    test_scores = [95, 85, 73, 65, 45, -5, 105]
    for s in test_scores:
        print(f"Score {s} → Grade {grade_calculator(s)}")

Output:

Score 95 → Grade A
Score 85 → Grade B
Score 73 → Grade C
Score 65 → Grade D
Score 45 → Grade F
Score -5 → Grade Invalid
Score 105 → Grade Invalid

Common mistakes:

  1. Not checking invalid input FIRST
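  2. Using separate if statements that assign the grade instead of elif (later conditions overwrite the result; separate ifs are only safe when each branch returns immediately)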
  3. Checking in wrong order (60+ before 90+)
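
Mistake 2 is easiest to see by running it. In this sketch, `grade_buggy` and `grade_fixed` are illustrative names; the buggy version uses independent if statements that each assign to the same variable.

```python
def grade_buggy(score):
    grade = "F"
    if score >= 90:
        grade = "A"
    if score >= 80:      # still true for 95, so "A" is overwritten
        grade = "B"
    if score >= 70:
        grade = "C"
    if score >= 60:
        grade = "D"
    return grade


def grade_fixed(score):
    if score >= 90:
        grade = "A"
    elif score >= 80:    # only checked when the previous test failed
        grade = "B"
    elif score >= 70:
        grade = "C"
    elif score >= 60:
        grade = "D"
    else:
        grade = "F"
    return grade


print(grade_buggy(95))   # D
print(grade_fixed(95))   # A
```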

✍️ HANDWRITING VERSION (condensed)

def grade(score):
    if score < 0 or score > 100:
        return "Invalid"
    if score >= 90: return "A"
    if score >= 80: return "B"
    if score >= 70: return "C"
    if score >= 60: return "D"
    return "F"

Handwriting tips: about 8 lines; check for invalid input first, then test from highest to lowest


Template 5: Boolean Function (PREDICTED TOPIC!)

Problem: Write a function that returns True/False based on conditions (function + and/or/not)

Example 1: Check if number is in range [10, 50]

def in_range(n):
    return n >= 10 and n <= 50
 
print(in_range(25))  # True
print(in_range(5))   # False

Example 2: Check if all three numbers are positive

def all_positive(a, b, c):
    return a > 0 and b > 0 and c > 0
 
print(all_positive(1, 2, 3))   # True
print(all_positive(1, -2, 3))  # False

Example 3: Check if at least one is even

def has_even(a, b, c):
    return a % 2 == 0 or b % 2 == 0 or c % 2 == 0
 
print(has_even(1, 3, 5))  # False
print(has_even(1, 2, 5))  # True

Example 4: Check if string is valid password

def is_valid_password(pwd):
    # At least 8 characters and contains digit
    has_length = len(pwd) >= 8
    has_digit = any(c.isdigit() for c in pwd)
    return has_length and has_digit
 
print(is_valid_password("abc12345"))  # True
print(is_valid_password("short1"))    # False

Boolean operators:

  • and = both conditions must hold
  • or = at least one condition must hold
  • not = negation

Keywords: return True/False, combining conditions

✍️ HANDWRITING VERSION (condensed)

def is_valid(x, y, z):
    return x > 0 and y > 0 and z > 0

Handwriting tips: a single return line is enough; combine conditions with and/or


Template 6: Prime Number Check

Problem: Write a function is_prime(num) that returns True if prime, False otherwise.

def is_prime(num):
    """
    Check if a number is prime.
    
    A prime number is:
    - Greater than 1
    - Only divisible by 1 and itself
    
    Args:
        num: Integer to check
    
    Returns:
        bool: True if prime, False otherwise
    """
    # Numbers less than 2 are not prime
    # (0, 1, and negative numbers)
    if num < 2:
        return False
    
    # 2 is the only even prime
    if num == 2:
        return True
    
    # All other even numbers are not prime
    if num % 2 == 0:
        return False
    
    # Check odd divisors up to square root of num
    # Why sqrt? If n = a × b, one of a,b must be ≤ √n
    # We use int(num ** 0.5) + 1 to include the square root
    for i in range(3, int(num ** 0.5) + 1, 2):  # Step by 2 (odd numbers only)
        if num % i == 0:
            return False  # Found a divisor, not prime
    
    return True  # No divisors found, it's prime
 
 
# Test the function
if __name__ == "__main__":
    test_nums = [1, 2, 3, 7, 10, 11, 25, 29]
    for n in test_nums:
        result = "Prime" if is_prime(n) else "Not Prime"
        print(f"{n}: {result}")

Output:

1: Not Prime
2: Prime
3: Prime
7: Prime
10: Not Prime
11: Prime
25: Not Prime
29: Prime

Optimization: Only checking up to √n reduces time complexity from O(n) to O(√n).
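
As a sanity check on the square-root shortcut, the sketch below compares it against plain trial division over a range of inputs. It assumes Python 3.8 or later for `math.isqrt`, which avoids floating-point rounding; `is_prime_sqrt` and `is_prime_naive` are names invented for this comparison.

```python
import math

def is_prime_sqrt(num):
    """Trial division up to the integer square root, O(sqrt(n))."""
    if num < 2:
        return False
    for i in range(2, math.isqrt(num) + 1):
        if num % i == 0:
            return False
    return True


def is_prime_naive(num):
    """Trial division by every number below num, O(n)."""
    if num < 2:
        return False
    for i in range(2, num):
        if num % i == 0:
            return False
    return True


# Both versions should agree on every input.
agree = all(is_prime_sqrt(n) == is_prime_naive(n) for n in range(-5, 500))
print(agree)                                        # True
print([n for n in range(30) if is_prime_sqrt(n)])   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```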


Template 7: Fibonacci Sequence

Problem: Write a function fibonacci(n) that returns the first n Fibonacci numbers as a list.

def fibonacci(n):
    """
    Generate the first n Fibonacci numbers.
    
    Fibonacci sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, ...
    Each number is the sum of the two preceding numbers.
    
    Args:
        n: Number of Fibonacci numbers to generate
    
    Returns:
        list: First n Fibonacci numbers
    """
    # Handle edge cases
    if n <= 0:
        return []  # Empty list for invalid input
    if n == 1:
        return [0]  # Only the first number
    
    # Start with first two Fibonacci numbers
    result = [0, 1]
    
    # Generate remaining numbers
    for i in range(2, n):
        # Each new number = sum of last two
        next_num = result[-1] + result[-2]  # Use negative indexing
        result.append(next_num)
    
    return result
 
 
# Test the function
if __name__ == "__main__":
    for count in [0, 1, 5, 10]:
        print(f"fibonacci({count}) = {fibonacci(count)}")

Output:

fibonacci(0) = []
fibonacci(1) = [0]
fibonacci(5) = [0, 1, 1, 2, 3]
fibonacci(10) = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

Template 8: Remove Duplicates (Preserve Order)

Problem: Write a function that removes duplicates from a list while preserving the order of first occurrence.

def remove_duplicates(lst):
    """
    Remove duplicate elements while preserving first occurrence order.
    
    Example: [1, 2, 2, 3, 1, 4] → [1, 2, 3, 4]
    
    Args:
        lst: Input list with possible duplicates
    
    Returns:
        list: New list with duplicates removed
    """
    seen = []  # Track items we've already seen
    
    for item in lst:
        if item not in seen:  # Only add if not seen before
            seen.append(item)
    
    return seen
 
 
# Alternative using dict (Python 3.7+ preserves insertion order)
def remove_duplicates_v2(lst):
    """
    Remove duplicates using dictionary (more efficient for large lists).
    dict.fromkeys() preserves first occurrence order.
    """
    return list(dict.fromkeys(lst))
 
 
# Test both versions
if __name__ == "__main__":
    test = [1, 2, 2, 3, 1, 4, 5, 3, 2]
    print(f"Original: {test}")
    print(f"Method 1: {remove_duplicates(test)}")
    print(f"Method 2: {remove_duplicates_v2(test)}")

Output:

Original: [1, 2, 2, 3, 1, 4, 5, 3, 2]
Method 1: [1, 2, 3, 4, 5]
Method 2: [1, 2, 3, 4, 5]

Why not use set()? Sets don't preserve order! list(set([1, 2, 2, 3, 1, 4])) might give [1, 2, 3, 4] but order is not guaranteed.
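
A set is still useful as a fast "already seen" lookup, as long as a separate list keeps the order. The following is a sketch of that idea, assuming all items are hashable; `remove_duplicates_fast` is a name made up for this example.

```python
def remove_duplicates_fast(lst):
    """Order-preserving de-duplication; the set gives fast membership tests."""
    seen = set()
    result = []
    for item in lst:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


words = ["pear", "apple", "pear", "fig", "apple"]
print(remove_duplicates_fast(words))   # ['pear', 'apple', 'fig']
print(sorted(set(words)))              # ['apple', 'fig', 'pear'] (first-seen order is lost)
```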


Template 9: Exception Handling

Problem: Write a program that repeatedly asks for a nonzero integer and calculates its reciprocal, handling invalid inputs.

def get_reciprocal():
    """
    Get a nonzero integer from user and calculate its reciprocal.
    Handles ValueError (non-integer) and ZeroDivisionError (zero input).
    """
    while True:
        try:
            # Get input from user
            n = int(input("Enter a nonzero integer: "))
            
            # Calculate reciprocal (will raise ZeroDivisionError if n=0)
            reciprocal = 1 / n
            
            # If we get here, input was valid
            print(f"The reciprocal of {n} is {reciprocal:.3f}")
            break  # Exit loop on success
            
        except ValueError:
            # int() failed - input was not a valid integer
            print("Error: You did not enter a valid integer. Try again.")
            
        except ZeroDivisionError:
            # Division by zero
            print("Error: You entered zero. Cannot divide by zero. Try again.")
 
 
# Run the function
if __name__ == "__main__":
    get_reciprocal()

Sample run:

Enter a nonzero integer: abc
Error: You did not enter a valid integer. Try again.
Enter a nonzero integer: 0
Error: You entered zero. Cannot divide by zero. Try again.
Enter a nonzero integer: 4
The reciprocal of 4 is 0.250

📝 Q3: Machine Learning - Theory & Code (25 points)

Flowchart Symbols (MUST KNOW!)

Keywords: flowcharts are used to design and explain program logic

Symbol | Shape | Name | Purpose
⬭ | Oval | Terminal | Start/End of the flowchart
▱ | Parallelogram | I/O | Input/Output operations (e.g., enter values, display results)
▭ | Rectangle | Process | Processing/Calculation (e.g., x = a + b)
◇ | Diamond | Decision | Condition check (Yes/No branches)
→ | Arrow | Flow Line | Direction of flow in program logic

Example question: Draw a flowchart to find the largest among three numbers (a, b, c).

Flowchart structure:

[Start] → [Input a, b, c] → <a > b?> 
                               ↓Yes        ↓No
                           <a > c?>    <b > c?>
                           ↓Yes  ↓No   ↓Yes  ↓No
                         [max=a][max=c][max=b][max=c]
                               ↓ ↓ ↓ ↓
                         [Output max] → [End]
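
The flowchart translates almost line for line into nested if/else statements. A minimal sketch, with `largest_of_three` as an illustrative function name:

```python
def largest_of_three(a, b, c):
    """Follow the flowchart: compare a with b first, then the winner with c."""
    if a > b:                # decision: a > b?
        if a > c:            # decision: a > c?
            largest = a
        else:
            largest = c
    else:
        if b > c:            # decision: b > c?
            largest = b
        else:
            largest = c
    return largest           # output max


print(largest_of_three(3, 9, 5))   # 9
```

Each diamond becomes one if, each rectangle becomes one assignment, and the four branches merge again at the return.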

Key points for exam:

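  1. Start/End: the flowchart must have start and end symbols
  2. Input: get the input before processing
  3. Decision: use a diamond for each condition check, with Yes/No branches
  4. Process: write calculations inside rectangles
  5. Arrows: connect all symbols with arrows to show the direction of flow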

Algorithm Comparison Table (MUST KNOW!)

Keywords: choose the model that fits the characteristics of the data

Algorithm | Best For | Pros | Cons | When to Use?
Naive Bayes | Small, low-dimensional | Fast, simple, low variance | Independence assumption unrealistic | Compare probabilities, pick highest
Logistic Regression | Small-medium data | Interpretable, stable | Linear separation only | Linear relationship, probability output
Decision Tree | Small-medium data | Easy to understand, visual | Prone to overfitting | Need clear rules, explainable
KNN | Small data | Simple, no training | Slow, sensitive to noise | Small data, low dimensions, classification
SVM | Small, high-dimensional | High accuracy, good generalization | Complex, hard to tune | High-dimensional data, margin-based
Random Forest | Medium data | Accurate, resists overfitting | Less interpretable | Improved bagging decision tree

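💡 Model Selection Quick Rules (exam cheat sheet)

Q: Small data, low dimensionality, classification problem: which model? A: Naive Bayes. It performs well, has low variance, needs little data, and is designed for classification.

Q: What makes Random Forest better than a Decision Tree? A: Less overfitting. It combines many decision trees (bagging) to improve accuracy.

Q: What happens as K increases in KNN? A: Larger K → lower variance (more stable) + higher bias.

Q: What kind of data suits SVM? A: Strong on high-dimensional data, slow on large data. It works well in high-dimensional spaces but is computationally expensive for large datasets.

Q: Why use an Encoder before an ML model? A: Machine learning models require numerical input; encoders transform categorical data into numbers.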


Pandas Basics

import pandas as pd
 
# ========== Reading Data ==========
# Read CSV file
df = pd.read_csv("data.csv")
 
# Display first/last rows
print(df.head())   # First 5 rows
print(df.tail())   # Last 5 rows
print(df.shape)    # (rows, columns)
 
# ========== Handling Missing Values ==========
# Check for missing values
print(df.isnull().sum())  # Count of nulls per column
 
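# Fill missing values (assign the result back; fillna(..., inplace=True) on a
# selected column may not update df in recent pandas versions)
df['Age'] = df['Age'].fillna(df['Age'].mean())        # Option A: fill with mean
# df['Age'] = df['Age'].fillna(df['Age'].median())    # Option B: fill with median
df['Salary'] = df['Salary'].fillna(50000)             # Fill with specific value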
 
# Drop rows with missing values
df.dropna(inplace=True)
 
# ========== Selecting Data ==========
# Select single column
ages = df['Age']
 
# Select multiple columns
subset = df[['Name', 'Age']]
 
# Filter rows
adults = df[df['Age'] >= 18]
 
# Multiple conditions (use & for AND, | for OR)
result = df[(df['Age'] >= 18) & (df['Department'] == 'IT')]
 
# ========== Grouping ==========
# Group by and aggregate
avg_salary = df.groupby('Department')['Salary'].mean()

LabelEncoder for Categorical Data

from sklearn.preprocessing import LabelEncoder
import pandas as pd
 
# Sample data with categorical columns
data = {
    'Color': ['Red', 'Blue', 'Green', 'Red', 'Blue'],
    'Size': ['Small', 'Medium', 'Large', 'Medium', 'Small']
}
df = pd.DataFrame(data)
 
# Create LabelEncoder instance
le = LabelEncoder()
 
# Encode each categorical column
# LabelEncoder sorts values alphabetically then assigns 0, 1, 2...
for col in df.columns:
    if df[col].dtype == 'object':  # Check if column is string/object type
        df[col] = le.fit_transform(df[col])
 
print(df)
# Color encoding: Blue=0, Green=1, Red=2 (alphabetical)
# Size encoding: Large=0, Medium=1, Small=2 (alphabetical)

LabelEncoder mapping for Color (classes are sorted, so string labels come out in alphabetical order):

Original | Encoded
Blue | 0
Green | 1
Red | 2
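
The mapping can be reproduced by hand in plain Python. This sketch only mimics the sorted-classes behavior described above; it is not how scikit-learn implements LabelEncoder, and the names `classes`, `mapping`, and `encoded` are illustrative.

```python
colors = ["Red", "Blue", "Green", "Red", "Blue"]

# Sorted unique values play the role of the encoder's classes.
classes = sorted(set(colors))                  # ['Blue', 'Green', 'Red']
mapping = {label: code for code, label in enumerate(classes)}

encoded = [mapping[c] for c in colors]
print(mapping)    # {'Blue': 0, 'Green': 1, 'Red': 2}
print(encoded)    # [2, 0, 1, 2, 0]
```

The codes carry no meaning beyond identity: Red = 2 does not mean Red is "larger" than Blue = 0, which is worth remembering when the encoded column feeds a distance-based model.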

Train-Test Split

from sklearn.model_selection import train_test_split
 
# Assume X = features, y = target variable
X = df.drop('target', axis=1)  # All columns except target
y = df['target']               # Target column only
 
# Split data: 80% training, 20% testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, 
    test_size=0.2,      # 20% for testing
    random_state=42     # For reproducibility
)
 
print(f"Training set size: {len(X_train)}")
print(f"Testing set size: {len(X_test)}")

Why train-test split?

  1. Evaluate model on UNSEEN data
  2. Detect overfitting (memorizing training data)
  3. Simulate real-world usage
  4. Get honest performance estimate

fit() vs predict()

Method | Purpose | When Used
fit() | Train the model | Once, on training data only
predict() | Apply the model | On test/new data
fit_transform() | Fit and transform in one step | For preprocessing (scaler, encoder)

Important:

  • Use fit_transform() on training data
  • Use transform() only on test data (NOT fit_transform!), so that no information from the test set leaks into preprocessing

# CORRECT workflow for scaling
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # Fit AND transform
X_test_scaled = scaler.transform(X_test)         # Transform only (no fit!)
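
To see what "fit on train, transform on test" means numerically, here is a plain-Python sketch of standardization using only the standard library. It is an illustration of the idea, not scikit-learn code; the tiny `train` and `test` lists are made-up data.

```python
from statistics import mean, pstdev

train = [10.0, 20.0, 30.0]
test = [40.0]

# "fit": learn the statistics from the training data only.
mu = mean(train)             # 20.0
sigma = pstdev(train)        # population standard deviation of the training data

# "transform": apply the SAME statistics to both sets.
train_scaled = [(x - mu) / sigma for x in train]
test_scaled = [(x - mu) / sigma for x in test]

print(train_scaled)
print(test_scaled)
```

The scaled training values have mean 0, while the test value is expressed relative to the training distribution. Refitting on the test set would instead give it its own mean and spread, which is exactly the leak the rule above prevents.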

SVM Implementation

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report
 
# Step 1: Load data
df = pd.read_csv("data.csv")
 
# Step 2: Encode categorical features
le = LabelEncoder()
for col in df.select_dtypes(include=['object']).columns:
    if col != 'target':  # Don't encode target yet if needed later
        df[col] = le.fit_transform(df[col])
 
# Step 3: Separate features (X) and target (y)
X = df.drop('target', axis=1)
y = df['target']
 
# Step 4: Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
 
# Step 5: Scale features (IMPORTANT for SVM!)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # Fit on training data
X_test_scaled = scaler.transform(X_test)         # Transform only on test data
 
# Step 6: Create and train SVM model
svm_model = SVC(kernel='rbf', random_state=42)  # RBF kernel is default
svm_model.fit(X_train_scaled, y_train)
 
# Step 7: Make predictions
y_pred = svm_model.predict(X_test_scaled)
 
# Step 8: Evaluate
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

SVM Key Points:

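  1. Scaling is essential in practice - SVM works with distances and inner products, so unscaled features with large magnitudes dominate the decision boundary
  2. Kernel trick - measures similarity as if the data had been mapped to a higher-dimensional space (where a linear separator may exist) without ever computing that mapping explicitly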
  3. Common kernels: 'linear', 'rbf' (Gaussian), 'poly' (polynomial)
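
The kernel trick can be checked by hand on a tiny case. The sketch below uses a simplified degree-2 polynomial kernel, k(x, z) = (x · z)², which is an assumption chosen for readability and not scikit-learn's exact `'poly'` parameterization. It shows that the kernel value equals an ordinary inner product in an explicit 3-D feature space.

```python
import math

def poly_kernel(x, z):
    """Simplified degree-2 polynomial kernel: k(x, z) = (x . z)^2."""
    dot = sum(a * b for a, b in zip(x, z))
    return dot ** 2

def feature_map(x):
    """Explicit 3-D map for 2-D inputs whose inner product equals poly_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

x = (1.0, 2.0)   # made-up 2-D points
z = (3.0, 4.0)

kernel_value = poly_kernel(x, z)   # computed in the original 2-D space
explicit_value = sum(a * b for a, b in zip(feature_map(x), feature_map(z)))

print(kernel_value, round(explicit_value, 6))
```

Both numbers agree (121 for these inputs), but the kernel never builds the higher-dimensional vectors. That is what makes kernels such as RBF, whose implicit feature space is infinite-dimensional, usable at all.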

Random Forest Implementation

💡 Click to View Complete Code

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
 
# Step 1: Load and prepare data
df = pd.read_csv("data.csv")
 
# Step 2: Handle categorical (if needed)
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
for col in df.select_dtypes(include=['object']).columns:
    df[col] = le.fit_transform(df[col])
 
# Step 3: Split features and target
X = df.drop('target', axis=1)
y = df['target']
 
# Step 4: Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
 
# Step 5: Create and train Random Forest
# Note: No scaling needed for tree-based models!
rf_model = RandomForestClassifier(
    n_estimators=100,    # Number of trees
    max_depth=None,      # No limit on depth
    random_state=42
)
rf_model.fit(X_train, y_train)
 
# Step 6: Predictions and evaluation
y_train_pred = rf_model.predict(X_train)
y_test_pred = rf_model.predict(X_test)
 
print(f"Training Accuracy: {accuracy_score(y_train, y_train_pred):.2f}")
print(f"Testing Accuracy: {accuracy_score(y_test, y_test_pred):.2f}")
 
# Step 7: Feature importance
importance = pd.DataFrame({
    'feature': X.columns,
    'importance': rf_model.feature_importances_
}).sort_values('importance', ascending=False)
print("\nFeature Importance:")
print(importance)

Random Forest Key Points:

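  1. Bagging: Trains each tree on a bootstrap sample (rows drawn with replacement) and considers a random subset of features at each split
  2. Reduces overfitting: Combining many decorrelated trees is more stable than relying on a single deep tree
  3. No scaling needed: Tree splits compare one feature against a threshold, so feature magnitudes don't matter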
  4. Feature importance: Shows which features matter most
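
The two ideas behind bagging, bootstrap sampling and combining predictions, can be sketched in a few lines of plain Python. The row indices and the three "tree predictions" below are hypothetical stand-ins; no trees are actually trained here.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows WITH replacement (one bootstrap sample)."""
    return [rng.choice(rows) for _ in rows]

def majority_vote(predictions):
    """Return the most common label among the individual trees' predictions."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(42)
rows = list(range(10))   # stand-in for 10 training row indices

# Each "tree" would be trained on its own bootstrap sample
samples = [bootstrap_sample(rows, rng) for _ in range(3)]
for s in samples:
    print(sorted(s))     # some rows repeat, others are left out

votes = ["Spam", "Not Spam", "Spam"]   # hypothetical predictions from 3 trees
print(majority_vote(votes))
```

Majority voting is the textbook description. scikit-learn's `RandomForestClassifier` combines trees by averaging their predicted class probabilities, which usually gives the same answer.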

StandardScaler vs MinMaxScaler

💡 Click to View Comparison

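| Scaler | Formula | Output Range | Best For |
|---|---|---|---|
| StandardScaler | (x - mean) / std | Unbounded (mean = 0, std = 1) | SVM, Logistic Regression; less distorted by outliers than MinMaxScaler, but not robust to them |
| MinMaxScaler | (x - min) / (max - min) | [0, 1] | Neural Networks, KNN, bounded features |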

Quick rule:

  • SVM, Linear models → StandardScaler
  • Neural networks, images → MinMaxScaler
from sklearn.preprocessing import StandardScaler, MinMaxScaler
 
# StandardScaler: Z-score normalization
# (fit on all of X here only to keep the demo short; in a real workflow, fit on X_train)
scaler1 = StandardScaler()
X_standard = scaler1.fit_transform(X)
 
# MinMaxScaler: Scale to [0, 1]
scaler2 = MinMaxScaler()
X_minmax = scaler2.fit_transform(X)
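
To make the two formulas concrete, here is a small pure-Python sketch that applies both to a made-up feature containing one outlier. The helpers `standardize` and `min_max` are illustrative names, not scikit-learn functions.

```python
from statistics import mean, pstdev

def standardize(values):
    """Z-score: (x - mean) / std, using the population std."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def min_max(values):
    """Min-max: (x - min) / (max - min), mapping values into [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

values = [1.0, 2.0, 3.0, 4.0, 100.0]   # made-up feature with one outlier

z = standardize(values)
m = min_max(values)

print([round(v, 3) for v in z])
print([round(v, 3) for v in m])
```

With min-max scaling, the outlier becomes 1.0 and the four ordinary values are squeezed into a narrow band near 0. The z-scores are also pulled by the outlier, so if outliers are a real concern, neither scaler is a complete fix.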

📝 Q4: Naive Bayes & Decision Tree (25 points)

Bayes' Theorem Formula

$$P(A|B) = \frac{P(B|A) \times P(A)}{P(B)}$$

Where:

  • $P(A|B)$ = Posterior probability (what we want)
  • $P(B|A)$ = Likelihood
  • $P(A)$ = Prior probability
  • $P(B)$ = Evidence (normalizing constant)

Naive Bayes Calculation Example

💡 Click to View Worked Example

Dataset: Classify emails as Spam or Not Spam

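| Email | Contains "Free" | Contains "Winner" | Spam? |
|---|---|---|---|
| 1 | Yes | Yes | Spam |
| 2 | Yes | No | Spam |
| 3 | No | Yes | Spam |
| 4 | No | No | Not Spam |
| 5 | Yes | No | Not Spam |
| 6 | No | No | Not Spam |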

Question: New email has "Free"=Yes, "Winner"=No. Is it Spam?

Step 1: Calculate Priors

  • P(Spam) = 3/6 = 0.5
  • P(Not Spam) = 3/6 = 0.5

Step 2: Calculate Likelihoods

For Spam emails (1, 2, 3):

  • P(Free=Yes | Spam) = 2/3 (emails 1, 2)
  • P(Winner=No | Spam) = 1/3 (email 2 only)

For Not Spam emails (4, 5, 6):

  • P(Free=Yes | Not Spam) = 1/3 (email 5)
  • P(Winner=No | Not Spam) = 3/3 = 1 (all three)

Step 3: Calculate Unnormalized Posteriors

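$$P(\text{Spam} \mid \text{evidence}) \propto P(\text{Spam}) \times P(\text{Free=Yes} \mid \text{Spam}) \times P(\text{Winner=No} \mid \text{Spam}) = 0.5 \times \frac{2}{3} \times \frac{1}{3} = 0.111$$

$$P(\text{Not Spam} \mid \text{evidence}) \propto 0.5 \times \frac{1}{3} \times 1 = 0.167$$

Step 4: Normalize

$$P(\text{Spam} \mid \text{evidence}) = \frac{0.111}{0.111 + 0.167} = \frac{0.111}{0.278} = 0.40 = 40\%$$

Prediction: NOT SPAM, because P(Spam | evidence) = 40% is lower than P(Not Spam | evidence) = 60%.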
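
The four steps above can be reproduced with a short script. This is a minimal sketch that counts directly from the six-email table; it uses no smoothing (for example Laplace smoothing), and the function name `unnormalized_posterior` is made up for illustration.

```python
# Rows are (contains_free, contains_winner, label), copied from the table
emails = [
    ("Yes", "Yes", "Spam"),
    ("Yes", "No", "Spam"),
    ("No", "Yes", "Spam"),
    ("No", "No", "Not Spam"),
    ("Yes", "No", "Not Spam"),
    ("No", "No", "Not Spam"),
]

def unnormalized_posterior(label, free, winner):
    """Prior times the two per-feature likelihoods (the 'naive' independence assumption)."""
    rows = [e for e in emails if e[2] == label]
    prior = len(rows) / len(emails)
    p_free = sum(e[0] == free for e in rows) / len(rows)
    p_winner = sum(e[1] == winner for e in rows) / len(rows)
    return prior * p_free * p_winner

# New email: Free=Yes, Winner=No
scores = {label: unnormalized_posterior(label, "Yes", "No")
          for label in ("Spam", "Not Spam")}
total = sum(scores.values())
posteriors = {label: s / total for label, s in scores.items()}
prediction = max(posteriors, key=posteriors.get)

print({k: round(v, 3) for k, v in posteriors.items()})
print(prediction)
```

The normalized posteriors come out to 0.4 for Spam and 0.6 for Not Spam, matching the hand calculation.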


Gini Index Formula & Calculation

$$Gini = 1 - \sum_{i=1}^{n} p_i^2$$

Where $p_i$ is the proportion of class $i$ in the node.

💡 Click to View Worked Example

Dataset: 20 emails (12 Spam, 8 Not Spam)

Split by "Contains Free":

  • Contains "free": 10 emails (9 Spam, 1 Not Spam)
  • No "free": 10 emails (3 Spam, 7 Not Spam)

Step 1: Gini for "Contains Free" node (9S, 1N)

  • P(Spam) = 9/10 = 0.9
  • P(Not Spam) = 1/10 = 0.1
  • Gini = 1 - (0.9² + 0.1²) = 1 - (0.81 + 0.01) = 0.18

Step 2: Gini for "No Free" node (3S, 7N)

  • P(Spam) = 3/10 = 0.3
  • P(Not Spam) = 7/10 = 0.7
  • Gini = 1 - (0.3² + 0.7²) = 1 - (0.09 + 0.49) = 0.42

Step 3: Weighted Average Gini

$$Gini_{split} = \frac{10}{20} \times 0.18 + \frac{10}{20} \times 0.42 = 0.5 \times 0.18 + 0.5 \times 0.42 = 0.09 + 0.21 = 0.30$$

Final Answer: Gini for this split = 0.30

Interpretation: Lower Gini = better split. Pure node has Gini = 0.
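
As a quick check of the arithmetic, the sketch below computes the same Gini values from class counts. The function names are illustrative. It also computes the parent node's Gini (12 Spam, 8 Not Spam), a value the worked example does not state, to show how much impurity the split removes.

```python
def gini(counts):
    """Gini impurity of a node given its class counts: 1 - sum(p_i^2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def weighted_gini(children):
    """Size-weighted average of the child nodes' Gini impurities."""
    n = sum(sum(c) for c in children)
    return sum(sum(c) / n * gini(c) for c in children)

parent = [12, 8]               # 12 Spam, 8 Not Spam
children = [[9, 1], [3, 7]]    # "Contains free" node, "No free" node

split_gini = weighted_gini(children)
gini_gain = gini(parent) - split_gini

print(round(gini(children[0]), 2), round(gini(children[1]), 2))
print(round(split_gini, 2), round(gini_gain, 2))
```

The split Gini is 0.30 as above. The parent Gini is 0.48, so the split reduces impurity by 0.18.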


Information Gain (Entropy)

$$Entropy = -\sum_{i=1}^{n} p_i \log_2(p_i)$$

$$Information\ Gain = Entropy(parent) - \sum_{children} \frac{n_{child}}{n_{parent}} \times Entropy(child)$$

💡 Click to View Worked Example

Parent node: 3 Spam, 3 Not Spam (50/50 split)

Parent Entropy (perfect balance = maximum entropy):

$$H(parent) = -0.5 \log_2(0.5) - 0.5 \log_2(0.5) = -0.5(-1) - 0.5(-1) = 0.5 + 0.5 = 1.0$$

Child node "Free=Yes" (2 Spam, 1 Not Spam):

$$H = -\frac{2}{3} \log_2\left(\frac{2}{3}\right) - \frac{1}{3} \log_2\left(\frac{1}{3}\right) = 0.390 + 0.528 = 0.918$$

Child node "Free=No" (1 Spam, 2 Not Spam):

$$H = -\frac{1}{3} \log_2\left(\frac{1}{3}\right) - \frac{2}{3} \log_2\left(\frac{2}{3}\right) = 0.918$$

Weighted Entropy:

$$\frac{3}{6}(0.918) + \frac{3}{6}(0.918) = 0.918$$

Information Gain:

$$IG = 1.0 - 0.918 = 0.082$$
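
The same calculation can be sketched in Python from class counts. The function names are illustrative; terms with a zero count are skipped, which implements the usual convention that 0 × log₂(0) = 0.

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a node given its class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the children."""
    n = sum(parent)
    weighted = sum(sum(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

parent = [3, 3]                # 3 Spam, 3 Not Spam
children = [[2, 1], [1, 2]]    # Free=Yes node, Free=No node

ig_free = information_gain(parent, children)

print(round(entropy(parent), 3), round(entropy(children[0]), 3))
print(round(ig_free, 3))
```

This gives a parent entropy of 1.0, a child entropy of 0.918, and an information gain of about 0.082, in line with the hand calculation.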


Complete Entropy Calculation Example (EXAM FORMAT!)

💡 Click to View Full Decision Tree Example

Dataset: Predict if student will pass

| GPA | Studied | Passed |
|---|---|---|
| Low | No | No |
| Low | Yes | No |
| Med | No | No |
| Med | Yes | Yes |
| High | No | Yes |
| High | Yes | Yes |

Question: Calculate H(Passed), H(Passed|GPA), H(Passed|Studied), then draw decision tree.

Step 1: Calculate H(Passed) - entropy of the target variable

  • Passed=Yes: 3 records → P(Yes) = 3/6 = 0.5
  • Passed=No: 3 records → P(No) = 3/6 = 0.5

$$H(Passed) = -0.5 \log_2(0.5) - 0.5 \log_2(0.5) = -0.5 \times (-1) - 0.5 \times (-1) = 0.5 + 0.5 = 1.0$$

Answer: H(Passed) = 1.0 (a perfect 50/50 split gives maximum entropy)


Step 2: Calculate H(Passed | GPA) - conditional entropy grouped by GPA

GPA = Low (2 records: 0 Yes, 2 No)

  • $H(Low) = -0 \log_2(0) - 1 \log_2(1) = 0$ (pure node!)

GPA = Med (2 records: 1 Yes, 1 No)

  • $H(Med) = -0.5 \log_2(0.5) - 0.5 \log_2(0.5) = 1.0$

GPA = High (2 records: 2 Yes, 0 No)

  • $H(High) = -1 \log_2(1) - 0 \log_2(0) = 0$ (pure node!)

Weighted Average:

$$H(Passed|GPA) = \frac{2}{6} \times 0 + \frac{2}{6} \times 1.0 + \frac{2}{6} \times 0 = 0 + 0.333 + 0 = 0.333$$

Answer: H(Passed|GPA) = 0.333


Step 3: Calculate H(Passed | Studied) - conditional entropy grouped by Studied

Studied = No (3 records: 1 Yes, 2 No)

  • P(Yes) = 1/3, P(No) = 2/3
  • $H(No) = -\frac{1}{3} \log_2\left(\frac{1}{3}\right) - \frac{2}{3} \log_2\left(\frac{2}{3}\right) = 0.528 + 0.390 = 0.918$

Studied = Yes (3 records: 2 Yes, 1 No)

  • P(Yes) = 2/3, P(No) = 1/3
  • H(Yes) = 0.918 (by symmetry with the Studied = No case)

Weighted Average:

$$H(Passed|Studied) = \frac{3}{6} \times 0.918 + \frac{3}{6} \times 0.918 = 0.918$$

Answer: H(Passed|Studied) = 0.918


Step 4: Compare Information Gain

  • IG(GPA) = H(Passed) - H(Passed|GPA) = 1.0 - 0.333 = 0.667 ✅ Higher!
  • IG(Studied) = H(Passed) - H(Passed|Studied) = 1.0 - 0.918 = 0.082

Choose GPA as the root node (it has the higher information gain).
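
Steps 1 to 4 can be double-checked with a short script that works directly on the six-row table. This is a minimal sketch; `entropy` and `info_gain` are illustrative helper names.

```python
import math
from collections import Counter, defaultdict

# Rows are (GPA, Studied, Passed), copied from the dataset table
rows = [
    ("Low", "No", "No"), ("Low", "Yes", "No"),
    ("Med", "No", "No"), ("Med", "Yes", "Yes"),
    ("High", "No", "Yes"), ("High", "Yes", "Yes"),
]

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, col):
    """H(Passed) minus the conditional entropy H(Passed | column col)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[col]].append(row[-1])   # collect target labels per attribute value
    conditional = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[-1] for r in rows]) - conditional

gains = {"GPA": info_gain(rows, 0), "Studied": info_gain(rows, 1)}
root = max(gains, key=gains.get)

print({k: round(v, 3) for k, v in gains.items()})
print(root)
```

The script reports gains of about 0.667 for GPA and 0.082 for Studied, so GPA is selected as the root.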


Step 5: Draw Decision Tree

         [GPA?]
       /   |   \
    Low   Med   High
     ↓     ↓      ↓
   [No] [Studied?] [Yes]
          /    \
        No     Yes
         ↓       ↓
       [No]   [Yes]
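
The tree above is small enough to write as plain `if` statements. The sketch below hard-codes it in a hypothetical function `predict_pass` and checks that it classifies all six rows of the dataset correctly.

```python
def predict_pass(gpa, studied):
    """Hand-coded version of the tree: split on GPA first, then on Studied for Med."""
    if gpa == "Low":
        return "No"
    if gpa == "High":
        return "Yes"
    # gpa == "Med": the only impure branch, resolved by Studied
    return "Yes" if studied == "Yes" else "No"

# Rows are (GPA, Studied, Passed), copied from the dataset table
rows = [
    ("Low", "No", "No"), ("Low", "Yes", "No"),
    ("Med", "No", "No"), ("Med", "Yes", "Yes"),
    ("High", "No", "Yes"), ("High", "Yes", "Yes"),
]

correct = sum(predict_pass(gpa, studied) == passed for gpa, studied, passed in rows)
print(f"{correct}/{len(rows)} rows classified correctly")
```

All six rows are classified correctly, which is expected because every leaf of the tree is a pure node on this training data.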

Log value quick reference (calculators are allowed in the exam):

  • log₂(0.5) = -1
  • log₂(1) = 0
  • log₂(1/3) ≈ -1.585
  • log₂(2/3) ≈ -0.585

Rule: 0 × log₂(0) = 0 (by convention)


Decision Tree Code Template

💡 Click to View Complete Code

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report
 
# Load data
df = pd.read_csv("data.csv")
 
# Encode categorical if needed
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
for col in df.select_dtypes(include=['object']).columns:
    df[col] = le.fit_transform(df[col])
 
# Split features and target
X = df.drop('target', axis=1)
y = df['target']
 
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
 
# Create and train Decision Tree
# criterion='gini' is the default
# criterion='entropy' splits on information gain, the measure used by ID3/C4.5
# (scikit-learn builds CART-style binary trees with either criterion)
dt_model = DecisionTreeClassifier(
    criterion='gini',    # or 'entropy'
    max_depth=5,         # Limit depth to prevent overfitting
    random_state=42
)
dt_model.fit(X_train, y_train)
 
# Evaluate
y_pred = dt_model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
print(classification_report(y_test, y_pred))

🎯 Quick Reference Checklist

Python Essentials

  • Operator precedence: ** > *,/,//,% > +,-
  • String slicing: left-inclusive, right-exclusive
  • Floor division // rounds toward negative infinity
  • range(a,b) generates a to b-1
  • List assignment creates reference, not copy

Machine Learning Essentials

  • Bayes formula: P(A|B) = P(B|A) × P(A) / P(B)
  • Gini: 1 - Σ(pᵢ²)
  • Entropy: -Σ pᵢ log₂(pᵢ)
  • SVM: needs scaling, uses kernel trick
  • Random Forest: reduces overfitting via bagging
  • Decision Tree: uses Gini (CART) or Entropy (ID3)

💪 Good luck on your exam! 🎓

