Analysis of Factors Determining Housing Price Categories in South Tangerang City Using Web Crawling and Multinomial Logistic Regression
Abstract
This study identifies the factors that influence housing price categories in South Tangerang City, using digital data collected by web crawling online property platforms. It examines how physical attributes, facilities, and location affect the probability that a house falls into a given price category. After data cleaning and validation, 1,264 housing records were retained for analysis. Housing prices were classified into four quartile-based categories: Economical, Standard, Luxury, and Exclusive. Multinomial Logistic Regression (MLR) was used to model the relative probabilities of these categories as a function of land area, building area, number of bedrooms, number of bathrooms, garage availability, and district location. The results show that land area, building area, number of bathrooms, and garage availability significantly influence the housing price category, whereas the number of bedrooms and district location are not significant after controlling for physical characteristics. The model is statistically significant and achieves a classification accuracy of 64.8%. The main contribution of this study is the integration of web crawling with Multinomial Logistic Regression for housing price classification, providing a data-driven framework that can support housing market analysis and automated property valuation systems.

