In a dataset of cars, a column 'colour' has missing values. The dataset has to be free of any missing values. Which measure of central tendency should we use to fill in the missing data?


Answer 1

Step 1: Understanding the Question:
The question asks us to identify the most suitable measure of central tendency for imputing (filling in) missing values in 'colour', a categorical column of the dataset.

Step 2: Data Types and Central Tendency:
- Numerical Data: Quantitative measurements (like price or age) where mathematical calculations like averages can be performed. Useful measures: Mean, Median.
- Categorical Data: Qualitative groupings (like colour, brand, or gender) consisting of text labels instead of numbers. Useful measure: Mode.
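The distinction above can be illustrated with a minimal sketch using Python's standard `statistics` module. The `prices` and `colours` lists are made-up sample data, not values from the question.

```python
import statistics

# Hypothetical sample data: prices in thousands (numerical), colours (categorical)
prices = [550, 720, 610, 720, 900]
colours = ["Red", "Blue", "Red", "White", "Red"]

# Numerical data supports arithmetic and ordering, so mean and median work
mean_price = statistics.mean(prices)
median_price = statistics.median(prices)

# Categorical data only supports counting, so the mode is the usable measure
mode_colour = statistics.mode(colours)

# Trying to average text labels fails, because labels cannot be summed
try:
    statistics.mean(colours)
    mean_of_colours_works = True
except TypeError:
    mean_of_colours_works = False

print(mean_price, median_price, mode_colour, mean_of_colours_works)
```

Here the mean and median are computed only for the numerical list, while the mode is the one measure that also works on the text labels.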

Step 3: Detailed Explanation:
Let us evaluate the applicability of each measure of central tendency to the 'colour' column:
- Mean: Requires adding all the values and dividing by the total count. Since we cannot mathematically add text values (e.g., "Red" + "Blue" + "Green"), the mean cannot be calculated for categorical data.
- Median: Requires arranging the values in order and locating the middle element. Since colour names have no inherent order, the median is not meaningful for this column.
- Mode: Identifies the most frequently occurring value in the dataset. This can easily be computed for text data by counting the frequency of each category (e.g., if "Red" is the most common colour, then "Red" is the mode).
Therefore, when cleaning categorical columns with missing values, the standard practice is to replace the missing entries with the most frequent value, which is the Mode.
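As a rough illustration of mode imputation, here is a minimal sketch assuming the data is held in a pandas DataFrame. The `cars` DataFrame, its `model` column, and all values are hypothetical and only serve the example.

```python
import pandas as pd

# Hypothetical cars dataset; None marks the missing colour entries
cars = pd.DataFrame({
    "model": ["A", "B", "C", "D", "E", "F"],
    "colour": ["Red", "Blue", None, "Red", None, "White"],
})

# Series.mode() ignores missing values by default and returns a Series,
# because several values can tie; [0] takes the first of them
most_common = cars["colour"].mode()[0]

# Replace every missing colour with the most frequent one
cars["colour"] = cars["colour"].fillna(most_common)

print(cars)
```

If two colours are equally frequent, `mode()` returns both in sorted order, so taking `[0]` is an arbitrary tie-break that should be chosen deliberately for real data.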

Step 4: Final Answer:
For categorical columns containing missing text data, the Mode is the preferred measure of central tendency to fill in the gaps.
Hence, option (C) is the correct choice.
