D.F.


Cleaning Nashville Housing Data

Process

Gather Data

Download the Nashville Housing dataset from GitHub and import into BigQuery.

Standardize the date format

Change the saltes dat field from the datetime format to date.

Fix missing data

Populate null property addresses where there is a related ParcelID.

Split address

Split the property address and owner address into the street address, city, and state.

Fix inconsitant values

Change inconsitant Y/N values to Yes/No in the SoldAsVacant field.

Remove Duplicates and Unused Columns

Delete duplicates and unused columns in the copied database.

Results

The data was cleaned by:

  1. changing the date format with CAST,
  2. populating missing data values in PropertyAddress field,
  3. separating the PropertyAddress and OwnerAddress fields using SPLIT()[OFFSET()],
  4. converting shorthand Y/N to longhand Yes/No using CASE-WHEN,
  5. removing duplicates using a CTE.

Future Posibilities

  • Find housing data in another city or region.
  • Compare the Nashville housing data to another region's housing data.
  • Redo the project in MS SQL Server.