Once the data is collected, data exploration is performed. This step is about examining the data in detail to understand its structure and quality. It involves identifying patterns, correlations, trends, missing values, and anomalies that may affect the AI’s performance. Data exploration also helps decide which features (the attributes recorded for each example, such as attendance) are most useful for predicting outcomes, and it guides later data cleaning and transformation.
For example, in the school scenario, data exploration may reveal that attendance and homework scores correlate strongly with exam performance, while participation in extracurricular activities shows little relationship with it. Some records may have missing homework scores, which must be handled (for instance, by filling them in or excluding those records) before training the model. This step helps ensure that the data is meaningful and ready for building an AI model.
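The two checks described in the school scenario, counting missing values and measuring how strongly each feature correlates with exam performance, can be sketched in a few lines of Python. This is a minimal illustration and not a prescribed method: the records, the column names (`attendance`, `homework`, `clubs`, `exam`), and the helper functions `count_missing` and `pearson` are all hypothetical, and the numbers are invented so that they mirror the pattern described above.

```python
from math import sqrt

# Hypothetical student records (invented for illustration).
# None marks a missing homework score.
records = [
    {"attendance": 95, "homework": 88,   "clubs": 2, "exam": 90},
    {"attendance": 80, "homework": 72,   "clubs": 1, "exam": 75},
    {"attendance": 60, "homework": None, "clubs": 2, "exam": 55},
    {"attendance": 90, "homework": 85,   "clubs": 0, "exam": 86},
    {"attendance": 70, "homework": 65,   "clubs": 0, "exam": 68},
    {"attendance": 85, "homework": None, "clubs": 3, "exam": 80},
]


def count_missing(rows, column):
    """Count how many records have no value for the given column."""
    return sum(1 for row in rows if row[column] is None)


def pearson(rows, x, y):
    """Pearson correlation between columns x and y.

    Records where either value is missing are skipped, which is one
    simple (assumed) way of handling gaps during exploration.
    """
    pairs = [(r[x], r[y]) for r in rows
             if r[x] is not None and r[y] is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    return cov / sqrt(var_x * var_y)


# Step 1: find out which columns have gaps.
missing = {col: count_missing(records, col) for col in records[0]}

# Step 2: see how strongly each candidate feature tracks exam scores.
correlations = {col: pearson(records, col, "exam")
                for col in ("attendance", "homework", "clubs")}

for col, count in missing.items():
    print(f"{col}: {count} missing")
for col, r in correlations.items():
    print(f"{col} vs exam: r = {r:.2f}")
```

With this toy data, attendance and homework come out with correlations close to 1, while club participation is close to 0. A correlation near 0 only indicates a weak linear relationship in the sample, so in practice it would be one input to the decision about which features to keep, not the whole decision.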