What is data cleaning? How can we do that?
Data cleaning, also known as data scrubbing, is a process which ensures that a set of data is correct and accurate. The accuracy, consistency, and integration of the data are checked during cleaning. It can be applied to a single set of records or to multiple sets of data which need to be merged.
Data cleaning is performed by reading all records in a set and verifying their accuracy. Typos and spelling errors are rectified. Mislabeled data, if any, is relabeled and filed correctly. Incomplete or missing entries are completed. Unrecoverable records are purged so that they do not take up space or slow down operations.
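The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a definitive implementation: the field names, the typo table, and the default values are all hypothetical and would depend on the actual data set.

```python
# Hypothetical correction table, defaults, and required fields (assumptions for illustration).
TYPO_FIXES = {"Nwe York": "New York", "Lodnon": "London"}
DEFAULTS = {"country": "unknown"}
REQUIRED = ("id", "city")


def clean_records(records):
    """Return a cleaned copy of a list of record dictionaries."""
    cleaned = []
    for record in records:
        row = dict(record)  # copy so the input is not modified

        # Rectify known typos in the city field.
        city = row.get("city")
        if city in TYPO_FIXES:
            row["city"] = TYPO_FIXES[city]

        # Complete missing entries where a default is available.
        for field, default in DEFAULTS.items():
            if row.get(field) in (None, ""):
                row[field] = default

        # Purge records that are still missing a required field.
        if any(row.get(field) in (None, "") for field in REQUIRED):
            continue

        cleaned.append(row)
    return cleaned


records = [
    {"id": 1, "city": "Nwe York", "country": "US"},
    {"id": 2, "city": "London", "country": ""},
    {"id": None, "city": "Paris", "country": "FR"},
]
result = clean_records(records)
```

Here the first record has its typo corrected, the second has its empty country filled in with the default, and the third is purged because its id cannot be recovered.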
Data cleaning is the process of identifying and correcting erroneous data. The data is checked for accuracy, consistency, typos, and similar problems. Common methods include:
Parsing - Used to detect syntax errors.
Data Transformation - Converts the input data so that its format matches the expected format.
Duplicate elimination - This process gets rid of duplicate entries.
Statistical Methods - Values such as the mean, standard deviation, and range, or clustering algorithms, are used to find erroneous data.
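Each of the four methods above can be illustrated with a small Python function. This is a rough sketch using only the standard library. The function names, the DD/MM/YYYY input format, and the outlier threshold of 2 standard deviations are assumptions chosen for the example, not part of any fixed definition.

```python
from datetime import datetime
from statistics import mean, stdev


def parse_amount(text):
    """Parsing: return a float, or None when the text has a syntax error."""
    try:
        return float(text)
    except ValueError:
        return None


def to_iso_date(text):
    """Data transformation: convert DD/MM/YYYY (assumed input) to YYYY-MM-DD."""
    return datetime.strptime(text, "%d/%m/%Y").date().isoformat()


def drop_duplicates(rows):
    """Duplicate elimination: keep the first occurrence of each hashable row."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def find_outliers(values, threshold=2.0):
    """Statistical method: flag values far from the mean, in standard deviations."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]
```

For example, `parse_amount("12,5x")` returns `None` because the text is not a valid number, and `find_outliers` applied to a list of values near 10 plus a single 100 flags only the 100.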