Classification

Classification is a type of supervised learning in machine learning where the goal is to assign input data to predefined categories, or classes. The system learns from labeled data, capturing the relationship between the input features and the class labels, and then predicts the correct category for new data.

Example

An example of classification is an email system that automatically labels incoming emails as “spam” or “not spam” based on patterns learned from previously labeled emails.
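The same pattern (learn from labeled examples, then predict the category of new data) can be sketched for the spam example. This is a minimal illustration, not a real spam filter. The four emails are made up, and representing each email as word counts with scikit-learn's CountVectorizer before classifying it with LogisticRegression is an assumption chosen for brevity.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical labeled training emails (illustrative only)
emails = [
    "win a free prize now",
    "claim your free money",
    "meeting moved to monday",
    "lunch with the project team",
]
labels = ["spam", "spam", "not spam", "not spam"]

# Turn each email into a vector of word counts (bag of words)
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

# Learn the relationship between word counts and labels
clf = LogisticRegression()
clf.fit(X, labels)

# Predict the category of a new, unseen email
new_email = ["free prize money"]
prediction = clf.predict(vectorizer.transform(new_email))[0]
print(prediction)
```

Every word in the new email appears only in the spam examples, so the model should lean towards "spam". With so little data, however, this shows only the mechanics, not how a real filter performs.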

1. Problem Scoping

We are trying to predict whether a student will be admitted to a university based on their Exam 1 and Exam 2 scores.

Goal: Build a model that classifies students as admitted (1) or not admitted (0).
Type of problem: Binary classification, since there are two possible outcomes.

2. Data Acquisition

We collected data in the variable data_raw, which contains exam scores and admission results.
Each entry is in the format [Exam 1 score, Exam 2 score, Admission status].
For example, [34.62, 78.02, 0] means the student scored 34.62 on Exam 1 and 78.02 on Exam 2 but was not admitted.
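A small helper can split entries in this format into the exam-score features and the admission labels that a model expects. This is a sketch. The function name to_features_and_labels is hypothetical, and it is demonstrated only on the example entry above.

```python
import numpy as np

def to_features_and_labels(data_raw):
    """Split [Exam 1 score, Exam 2 score, Admission status] rows into X and y."""
    data_np = np.asarray(data_raw, dtype=float)
    X = data_np[:, :2]              # the two exam scores
    y = data_np[:, 2].astype(int)   # 1 = admitted, 0 = not admitted
    return X, y

# The example entry from the text above
X, y = to_features_and_labels([[34.62, 78.02, 0]])
print(X, y)
```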

3. Data Exploration

We explored the data to understand its patterns.

We created scatter plots to visualize how exam scores relate to admission.
We observed that higher exam scores are usually associated with admission.
We also separated the dataset into two groups: admitted and not admitted.

import matplotlib.pyplot as plt
import numpy as np

# Convert data_raw to a numpy array for easier slicing
data_np = np.array(data_raw)

# Separate data based on the admission status (last column)
admitted = data_np[data_np[:, 2] == 1]
not_admitted = data_np[data_np[:, 2] == 0]

# Plot
plt.figure(figsize=(8, 6))
plt.scatter(admitted[:, 0], admitted[:, 1], marker='o', label='Admitted')
plt.scatter(not_admitted[:, 0], not_admitted[:, 1], marker='x', label='Not Admitted')
plt.xlabel('Exam 1 Score')
plt.ylabel('Exam 2 Score')
plt.title('Exam Scores vs. Admission Status')
plt.legend()
plt.grid(True)
plt.show()
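Alongside the plot, the two groups can be compared numerically to check the observation that admitted students tend to score higher. This is a minimal sketch. The function name summarize_groups and the three sample rows are hypothetical. With the page's data, it would be called as summarize_groups(data_np).

```python
import numpy as np

def summarize_groups(data_np):
    """Count the students in each admission group and average their exam scores.

    data_np is an (n, 3) array of [Exam 1 score, Exam 2 score, Admission status].
    """
    summary = {}
    for status, name in [(1, "admitted"), (0, "not admitted")]:
        group = data_np[data_np[:, 2] == status]
        summary[name] = {
            "count": len(group),
            # Means are NaN if a group happens to be empty
            "mean_exam1": float(group[:, 0].mean()) if len(group) else float("nan"),
            "mean_exam2": float(group[:, 1].mean()) if len(group) else float("nan"),
        }
    return summary

# Hypothetical rows, for illustration only
sample = np.array([[80.0, 90.0, 1], [60.0, 70.0, 1], [30.0, 40.0, 0]])
for name, stats in summarize_groups(sample).items():
    print(name, stats)
```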

4. Modelling

We used Logistic Regression, which is suitable for binary classification problems:

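from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
# Features: the two exam scores; label: admission status (1 = admitted, 0 = not admitted)
X, y = data_np[:, :2], data_np[:, 2]
# Hold out part of the data for evaluation (test_size=0.2 and random_state=0 are illustrative choices)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Fit the model on the training split only
model = LogisticRegression()
model.fit(X_train, y_train)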
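Logistic regression models the probability of admission by passing a weighted sum of the two exam scores through the sigmoid function. The weights are the values that fitting learns. The notation below (x1 and x2 for the Exam 1 and Exam 2 scores, w0, w1 and w2 for the weights) is introduced here for illustration:

```latex
P(\text{admitted} \mid x_1, x_2) = \sigma(w_0 + w_1 x_1 + w_2 x_2),
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}}
```

The model predicts admitted (1) when this probability exceeds 0.5, which happens exactly when w0 + w1·x1 + w2·x2 > 0. The boundary between the two classes is therefore a straight line in the (Exam 1, Exam 2) plane.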

We used scikit-learn (sklearn) because it is a powerful and easy-to-use machine learning library in Python.
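After fitting, scikit-learn exposes the learned weights as model.coef_ and model.intercept_. Because a logistic regression on two features separates the classes with a straight line, those weights are enough to compute the boundary. In this sketch, the six training rows are hypothetical stand-ins for the real data, and boundary_exam2 is a helper name introduced for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def boundary_exam2(model, exam1):
    """Exam 2 score where P(admitted) = 0.5, for a given Exam 1 score.

    Solves w0 + w1 * exam1 + w2 * exam2 = 0 for exam2 (assumes w2 != 0).
    """
    w1, w2 = model.coef_[0]
    w0 = model.intercept_[0]
    return -(w0 + w1 * exam1) / w2

# Hypothetical, clearly separated rows standing in for the real data
X = np.array([[30, 40], [35, 45], [40, 35], [80, 85], [85, 80], [90, 90]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
model = LogisticRegression().fit(X, y)

# Probability of admission for a hypothetical new student scoring 70 and 75
p_admit = model.predict_proba([[70, 75]])[0, 1]
print(f"P(admitted) = {p_admit:.2f}")
print("Boundary Exam 2 score at Exam 1 = 60:", round(boundary_exam2(model, 60), 2))
```

Column 1 of predict_proba corresponds to class 1 (admitted) because scikit-learn sorts the class labels, so model.classes_ is [0, 1] here.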

5. Evaluation

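We then used the trained model to predict admission outcomes for the students in the test set and measured its accuracy:

from sklearn.metrics import accuracy_score
# Fraction of test students whose admission status was predicted correctly
print(accuracy_score(y_test, model.predict(X_test)))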
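accuracy_score reports the fraction of predictions that match the true labels. A hand-written equivalent makes that definition explicit. The function name accuracy and the labels in the usage line are hypothetical. With the page's variables, the call would be accuracy(y_test, model.predict(X_test)).

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

# Hypothetical labels: 4 of the 5 predictions match
print(accuracy([1, 0, 1, 1, 0], [1, 0, 0, 1, 0]))
```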

We compared the predicted results with the actual results and plotted them to visually check the model’s accuracy.
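The page does not show the code behind this plot. One possible version, sketched under our own assumptions, plots each test student by exam scores and marks whether the prediction was correct. The function name plot_predictions and the sample arrays are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_predictions(X_test, y_test, y_pred):
    """Plot test students by exam scores, marking correct and incorrect predictions.

    Returns the figure and the number of misclassified students.
    """
    X_test = np.asarray(X_test, dtype=float)
    correct = np.asarray(y_test) == np.asarray(y_pred)
    fig, ax = plt.subplots(figsize=(8, 6))
    ax.scatter(X_test[correct, 0], X_test[correct, 1], marker="o", label="Correct")
    ax.scatter(X_test[~correct, 0], X_test[~correct, 1], marker="x", label="Misclassified")
    ax.set_xlabel("Exam 1 Score")
    ax.set_ylabel("Exam 2 Score")
    ax.set_title("Test Set: Predicted vs. Actual Admission")
    ax.legend()
    ax.grid(True)
    return fig, int((~correct).sum())

# Hypothetical test data: the third student is misclassified
fig, n_wrong = plot_predictions(
    [[85, 90], [30, 45], [60, 65], [40, 35]],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
)
print("Misclassified:", n_wrong)
```

With the page's variables, this could be called as plot_predictions(X_test, y_test, model.predict(X_test)), followed by plt.show() to display the figure.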

Accuracy alone does not show which kind of mistake the model makes. For a more precise evaluation, we could also calculate precision and recall, although the plot already gives a good sense of how well the model performed.
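A minimal sketch of these extra metrics uses scikit-learn's confusion_matrix, precision_score and recall_score, treating admitted (1) as the positive class. The function name evaluate and the six labels are hypothetical. With the page's variables, the call would be evaluate(y_test, model.predict(X_test)).

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

def evaluate(y_true, y_pred):
    """Return the confusion matrix plus precision and recall for the admitted class (label 1)."""
    # Rows are actual classes (0, 1); columns are predicted classes (0, 1)
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
    # Precision: of the students predicted admitted, the share who really were
    precision = precision_score(y_true, y_pred, pos_label=1)
    # Recall: of the students really admitted, the share the model predicted admitted
    recall = recall_score(y_true, y_pred, pos_label=1)
    return cm, precision, recall

# Hypothetical labels for six test students
cm, precision, recall = evaluate([1, 1, 1, 1, 0, 0], [1, 1, 0, 0, 1, 0])
print(cm)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Here the model finds only half of the admitted students (recall 0.5), even though two thirds of its "admitted" predictions are right (precision about 0.67). Accuracy alone would not show this distinction.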


Get in touch raahil.ippadi@gmail.com