Data Sets (privately owned)
Goal
Predict the rental and sale price of a house from basic information about the house itself, city features, statistical summaries of the area surrounding the house's location, macroeconomic information, and lagged price information.
Main Model
Math part
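$$
P_{ict} = X_{ict}\beta + Z_{ct}\Phi + G_{gt}\Psi + \theta W_{cd} \times P^{\text{mean}}_{cd} + \sum_{j=1}^{3} \delta_j \times P^{L}_{c(t-j)} + \epsilon_{ict}
$$
where the subscripts are: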
i: ith observation
c: cth city
d: dth district
g: gth grid
t: tth time period
and the other notations are:
X: house features per se
Z: city features
G: “grid” surrounding features
H: macroeconomic features
W: inverse distance weighting (IDW) matrix of each city (calculated from the center coordinates of each district of the city)
Pmean: average price (by district)
PL: lagged average price (by city)
ϵ: error term
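The spatial term θ W × Pmean can be illustrated with a minimal sketch. The project's data and code are private, so everything here is an assumption rather than the author's implementation: district centers are taken to be (latitude, longitude) pairs, distances are great-circle distances, the diagonal of W is zero, and each row is normalized to sum to 1. The function names, coordinates, and prices are hypothetical.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(h)))

def idw_matrix(centers, power=1.0):
    """Row-normalized inverse-distance weights between district centers.

    Assumptions (not from the original project): zero diagonal, weights
    proportional to 1 / distance**power, rows scaled to sum to 1.
    """
    n = len(centers)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = haversine_km(centers[i], centers[j])
            if d == 0.0:
                raise ValueError("district centers must be distinct")
            W[i][j] = 1.0 / d ** power
        row_sum = sum(W[i])
        if row_sum > 0:
            W[i] = [w / row_sum for w in W[i]]
    return W

def spatial_lag(W, p_mean):
    """For each district, the IDW-weighted average of the other districts' mean prices."""
    return [sum(w * p for w, p in zip(row, p_mean)) for row in W]

# Made-up district centers and district mean prices for one city.
centers = [(31.23, 121.47), (31.30, 121.50), (31.10, 121.40)]
p_mean = [60000.0, 50000.0, 40000.0]

W = idw_matrix(centers)
lagged = spatial_lag(W, p_mean)
print(lagged)
```

With row normalization, each entry of `lagged` is a weighted average of the neighboring districts' mean prices, so θ can be read as the price response to a one-unit change in that neighborhood average.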
Results
Compared with linear regression, LASSO, and ridge regression, XGBoost predicts best: it has the lowest MSE on the test set.
The feature importances produced by XGBoost are confirmed, to some extent, by the LASSO feature selection results.
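A comparison of this kind can be sketched as follows. Since the data sets are private, the sketch uses synthetic data, and the hyperparameters are arbitrary assumptions, so its numbers say nothing about the results reported above. It assumes scikit-learn is installed and adds the XGBoost model only if the `xgboost` package is available.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

try:
    from xgboost import XGBRegressor
except ImportError:  # xgboost is optional in this sketch
    XGBRegressor = None

# Synthetic stand-in for the private data: 8 features, partly nonlinear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = (3.0 * X[:, 0] + np.sin(2.0 * X[:, 1]) + X[:, 2] * X[:, 3]
     + rng.normal(scale=0.1, size=500))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hyperparameters below are illustrative, not tuned.
models = {
    "linear": LinearRegression(),
    "lasso": Lasso(alpha=0.01),
    "ridge": Ridge(alpha=1.0),
}
if XGBRegressor is not None:
    models["xgboost"] = XGBRegressor(
        n_estimators=200, max_depth=4, learning_rate=0.1, random_state=0
    )

# Fit each model and record its MSE on the held-out test set.
test_mse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_mse[name] = mean_squared_error(y_test, model.predict(X_test))

for name, mse in sorted(test_mse.items(), key=lambda kv: kv[1]):
    print(f"{name:8s} test MSE = {mse:.4f}")

# Features kept by LASSO (nonzero coefficients) vs. top XGBoost importances.
lasso_selected = set(np.flatnonzero(models["lasso"].coef_).tolist())
if "xgboost" in models:
    importances = models["xgboost"].feature_importances_
    top_k = set(np.argsort(importances)[::-1][:len(lasso_selected)].tolist())
    print("features in both sets:", sorted(lasso_selected & top_k))
```

The last step is one simple way to check agreement between the two methods: count how many of the features LASSO keeps also rank among XGBoost's most important ones.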