Classification is a type of supervised learning in machine learning where the goal is to assign input data to predefined categories or classes. The model learns the relationship between the input features and the class labels from labeled examples, and then predicts the correct category for new, unseen data.
An example of classification is an email system that automatically labels incoming emails as “spam” or “not spam” based on patterns learned from previously labeled emails.
We are trying to predict whether a student will be admitted to a university based on their Exam 1 and Exam 2 scores.
Goal: Build a model that classifies students as admitted (1) or not admitted (0).
Type of problem: Binary classification, since there are two possible outcomes.
We stored the data in the variable data_raw, a list in which each entry holds one student's exam scores and admission result.
Each entry is in the format [Exam 1 score, Exam 2 score, Admission status].
For example, [34.62, 78.02, 0] means the student scored 34.62 on Exam 1 and 78.02 on Exam 2 and was not admitted.
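To make the format concrete, the rows of data_raw can be split into a feature matrix X (the two exam scores) and a label vector y (the admission status). This is a minimal NumPy sketch; split_features_labels is a hypothetical helper name, and it assumes every entry has exactly three numeric values in the order given above.

```python
import numpy as np

def split_features_labels(rows):
    """Split [exam1, exam2, status] rows into features X and labels y.

    Hypothetical helper: assumes each row has exactly three numeric
    entries in the same order as data_raw.
    """
    data = np.asarray(rows, dtype=float)
    if data.ndim != 2 or data.shape[1] != 3:
        raise ValueError("expected rows of the form [exam1, exam2, status]")
    X = data[:, :2]             # Exam 1 and Exam 2 scores
    y = data[:, 2].astype(int)  # 1 = admitted, 0 = not admitted
    return X, y

# The example entry from the text: 34.62 on Exam 1, 78.02 on Exam 2, not admitted.
X, y = split_features_labels([[34.62, 78.02, 0]])
print(X)
print(y)
```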
We explored the data to understand its patterns.
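We created a scatter plot of Exam 1 against Exam 2 scores to visualize how exam scores relate to admission.
To do this, we separated the dataset into two groups, admitted and not admitted, and plotted each group with its own marker.
The plot shows that students with higher scores on both exams were more likely to be admitted.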
import matplotlib.pyplot as plt
import numpy as np
# Convert data_raw to a numpy array for easier slicing
data_np = np.array(data_raw)
# Separate data based on the admission status (last column)
admitted = data_np[data_np[:, 2] == 1]
not_admitted = data_np[data_np[:, 2] == 0]
# Plot
plt.figure(figsize=(8, 6))
plt.scatter(admitted[:, 0], admitted[:, 1], marker='o', label='Admitted')
plt.scatter(not_admitted[:, 0], not_admitted[:, 1], marker='x', label='Not Admitted')
plt.xlabel('Exam 1 Score')
plt.ylabel('Exam 2 Score')
plt.title('Exam Scores vs. Admission Status')
plt.legend()
plt.grid(True)
plt.show()
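The visual impression from the plot can be complemented with a simple numerical check: comparing the average exam scores of the two groups. A minimal sketch, assuming data_np has the column layout used in the plotting code; mean_scores_by_status is a hypothetical name, and the toy rows are made up for illustration, not drawn from data_raw.

```python
import numpy as np

def mean_scores_by_status(data_np):
    """Return mean (Exam 1, Exam 2) scores for admitted and not-admitted students.

    Assumes columns are [Exam 1 score, Exam 2 score, admission status].
    """
    admitted = data_np[data_np[:, 2] == 1]
    not_admitted = data_np[data_np[:, 2] == 0]
    return admitted[:, :2].mean(axis=0), not_admitted[:, :2].mean(axis=0)

# Toy rows for illustration only (not taken from data_raw).
toy = np.array([[80.0, 90.0, 1],
                [70.0, 60.0, 1],
                [40.0, 50.0, 0],
                [30.0, 70.0, 0]])
adm_mean, rej_mean = mean_scores_by_status(toy)
print(adm_mean, rej_mean)  # [75. 75.] [35. 60.]
```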
We used logistic regression, a linear model that estimates the probability of the positive class (here, admission) and is well suited to binary classification problems:
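from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
X, y = data_np[:, :2], data_np[:, 2]  # features: exam scores; target: admission status
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)  # 20% held out
model = LogisticRegression()
model.fit(X_train, y_train)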
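Logistic regression models the probability of admission as the sigmoid of a weighted sum of the two exam scores, and predicts admission when that probability exceeds 0.5. The NumPy-only sketch below shows that computation; the function names and the weights w and b are hypothetical placeholders chosen by hand, not values learned from data_raw.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_admission(X, w, b, threshold=0.5):
    """Return (probabilities, 0/1 predictions) for a linear logistic model.

    X: array of shape (n_students, 2) with Exam 1 and Exam 2 scores.
    w, b: weights and intercept (hand-picked placeholders here).
    """
    z = X @ w + b                            # linear score for each student
    p = sigmoid(z)                           # estimated probability of admission
    return p, (p > threshold).astype(int)   # admit when p exceeds the threshold

# Hypothetical parameters: admit when Exam 1 + Exam 2 exceeds 130.
w = np.array([0.1, 0.1])
b = -13.0
probs, preds = predict_admission(np.array([[34.62, 78.02], [80.0, 75.0]]), w, b)
print(preds)  # [0 1]
```

For a fitted scikit-learn model, passing w = model.coef_[0] and b = model.intercept_[0] should reproduce model.predict_proba(X)[:, 1] as the probabilities.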
We used scikit-learn (sklearn) because it provides well-tested implementations of common models, such as LogisticRegression, behind a consistent fit/predict interface in Python.
We then used the trained model to predict admission outcomes for the held-out test students and measured the fraction it classified correctly (accuracy):
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, model.predict(X_test)))
We compared the predicted results with the actual results and plotted them to check visually where the model's predictions matched the actual outcomes.
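For a logistic regression with two features, the 50% decision boundary is the straight line where w1*Exam1 + w2*Exam2 + b = 0, so overlaying it on the scatter plot shows which side of the line each student falls on. A minimal sketch solving that equation for the Exam 2 score; decision_boundary_exam2 and the parameter values are hypothetical, chosen only for illustration.

```python
import numpy as np

def decision_boundary_exam2(exam1_scores, w, b):
    """Exam 2 score at which the logistic model's probability is exactly 0.5.

    Solves w[0]*exam1 + w[1]*exam2 + b = 0 for exam2; requires w[1] != 0.
    """
    exam1_scores = np.asarray(exam1_scores, dtype=float)
    if w[1] == 0:
        raise ValueError("boundary is vertical when w[1] == 0")
    return -(w[0] * exam1_scores + b) / w[1]

# Hypothetical parameters: the boundary is Exam 1 + Exam 2 = 130.
w = np.array([0.1, 0.1])
b = -13.0
boundary = decision_boundary_exam2([30.0, 100.0], w, b)
print(boundary)
```

With a fitted model, w = model.coef_[0] and b = model.intercept_[0] supply the learned values, and a call such as plt.plot(xs, decision_boundary_exam2(xs, w, b)) would draw the line over the scatter plot.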
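Accuracy alone does not show which kind of mistake the model makes, so precision (the share of predicted admissions that were correct) and recall (the share of actually admitted students the model identified) give a more complete picture. The plot is a useful qualitative check, but it does not replace these quantitative metrics.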
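These metrics follow directly from counting true positives, false positives, and false negatives. A minimal NumPy sketch, treating 1 (admitted) as the positive class; binary_metrics is a hypothetical name and the toy labels are made up for illustration. In scikit-learn, accuracy_score, precision_score, and recall_score in sklearn.metrics compute the same quantities.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for 0/1 labels (1 = admitted)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))  # admitted, predicted admitted
    fp = np.sum((y_pred == 1) & (y_true == 0))  # not admitted, predicted admitted
    fn = np.sum((y_pred == 0) & (y_true == 1))  # admitted, predicted not admitted
    accuracy = np.mean(y_pred == y_true)
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return float(accuracy), float(precision), float(recall)

# Toy labels for illustration only: 2 TP, 1 FP, 1 FN, 2 TN.
print(binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
```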