# Visualizing Airbnb Gentrification

# Overview

Since the late 2000s, Airbnb has become an essential part of everyday travel, and demand in the online travel agency (OTA) business is enormous. But have you ever stopped to wonder what drives a listing's price, or which nearby destinations contribute most to it? Tourists visiting Manhattan can hardly miss its numerous galleries. So what is the effect of the distance from an Airbnb listing to its nearest gallery on the nightly listing price, after accounting for the distance to the nearest subway station and the room type, in Manhattan, New York? The short answer: proximity to a gallery is more strongly associated with listing price than proximity to a subway station. Let's delve into the research process.

# Tool

RStudio, ArcGIS

# My Role

Research, Data Cleaning, Data Engineering, Data Analysis

# Instructor

Carole Voulgaris

# Duration

2 weeks

Setup - Choosing Data Variables

I address this question using Airbnb listing data from the mission-driven project 'Inside Airbnb' and public amenities data from NYC OpenData. My dataset includes 21,598 Airbnb listings in Manhattan, New York. From the full set of available variables, I focus on a small subset for this research.

Candidate variables:

- Airbnb Listing Price
- Review Score
- Year-Round Availability
- Minimum Nights of Stay
- The Distance
  - To Its Nearest Gallery
  - To Its Nearest Subway Station
  - To Its Nearest Park
- The Density
  - Of Its Nearby Galleries
  - Of Its Nearby Subway Stations
  - Of Its Nearby Parks
- Room Type
  - Entire Home / Apartment
  - Private Room
  - Shared Room

Variables chosen for this study:

- Airbnb Listing Price
- The Distance
  - To Its Nearest Gallery
  - To Its Nearest Subway Station
- Room Type
  - Entire Home / Apartment
  - Private Room
  - Shared Room

*outcome | continuous variable*

**Airbnb Listing Price**

*predictor | continuous variable*

**Vicinity of The Gallery**

*predictor | continuous variable*

**Vicinity of The Subway Station**

*predictor | categorical variable*

**Room Type**

Note: within the categorical variable Room Type, I use Entire Home/Apartment as the reference category.

To get the vicinity data (the distance from an Airbnb listing to its nearest gallery, subway station, park, etc.), I use ArcGIS to calculate the geographical distance from each Airbnb listing to its nearest target of interest.
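The idea behind this step can be sketched in base R. This is only an illustrative stand-in for the ArcGIS workflow, not the method ArcGIS uses: it assumes longitude/latitude coordinates and a great-circle (haversine) distance, and the function names, column names, and coordinates are all hypothetical.

```r
# Great-circle (haversine) distance in metres between lon/lat points.
# Assumption: a spherical Earth with radius 6,371,000 m.
haversine_m <- function(lon1, lat1, lon2, lat2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# For each listing, the distance to the closest target (gallery, station, ...).
nearest_distance_m <- function(listings, targets) {
  vapply(seq_len(nrow(listings)), function(i) {
    min(haversine_m(listings$lon[i], listings$lat[i],
                    targets$lon, targets$lat))
  }, numeric(1))
}

# Hypothetical coordinates, for illustration only.
listings  <- data.frame(lon = c(-73.9857, -73.9680), lat = c(40.7484, 40.7850))
galleries <- data.frame(lon = c(-73.9857, -74.0060), lat = c(40.7494, 40.7466))

listings$dist_gallery_m <- nearest_distance_m(listings, galleries)
listings
```

The same helper could be reused with a table of subway stations or parks to produce the other vicinity variables.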

Data Showcase

| Statistic | Airbnb Listing Price (USD) | Vicinity of The Gallery (metres) | Vicinity of The Subway Station (metres) |
| --- | --- | --- | --- |
| Full Range | 10-2000 | 2-3658 | 1-1658 |
| Interquartile Range | 95-220 | 130-395 | 233-512 |
| Standard Deviation | 164 | 290 | 228 |
| Mean | 186 | 312 | 398 |
| Median | 150 | 223 | 360 |

Table 1 presents basic descriptive statistics for each continuous variable in the dataset.
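As a rough sketch, statistics of this kind could be computed in base R as follows. The `describe` helper and the `price` vector are hypothetical (made-up values, not the Inside Airbnb data), and `quantile` is used with its default interpolation method, which may differ from what the original analysis used.

```r
# Summary statistics for one continuous variable.
describe <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  c(min = min(x), max = max(x),
    q1 = q[1], q3 = q[2], iqr = q[2] - q[1],
    sd = sd(x), mean = mean(x), median = median(x))
}

# Hypothetical nightly prices in USD, for illustration only.
price <- c(60, 95, 120, 150, 180, 220, 900)

stats <- describe(price)
round(stats, 1)

# A mean above the median points to a long right tail (right skew).
stats[["mean"]] > stats[["median"]]
```

Applying `describe` to each continuous column gives one column of the table per variable.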

➊ Airbnb Listing Price

Listing prices in the sample range from a minimum of $10 to a maximum of $2,000. Half of the Airbnb listings in the sample have a listing price between $95 and $220, representing an interquartile range of $125, which is less than the standard deviation of $164. The median value of $150 is less than the average value of $186, which suggests some right skew in the distribution, as illustrated in Figure 1.

➋ Vicinity to The Gallery

The distance from an Airbnb listing to its nearest gallery in the sample has a minimum of 2 metres and a maximum of 3,658 metres. Half of the listings in the sample are between 130 and 395 metres from their nearest gallery, representing an interquartile range of 265 metres, slightly less than the standard deviation of 290 metres. The median value of 223 metres is less than the average value of 312 metres, which suggests some right skew in the distribution, as illustrated in Figure 2.

➌ Vicinity to The Subway Station

The distance from an Airbnb listing to its nearest subway station in the sample has a minimum of 1 metre and a maximum of 1,658 metres. Half of the listings in the sample are between 233 and 512 metres from their nearest subway station, representing an interquartile range of 279 metres, larger than the standard deviation of 228 metres. The median value of 360 metres is less than the average value of 398 metres, which suggests some right skew in the distribution, as illustrated in Figure 3.

➍ Airbnb Listing Room Type

Figure 4 illustrates the sample proportions for the different Airbnb listing room types. More than half of the listings in the sample (61 percent) are entire homes or apartments. The remaining listings are mostly private rooms (37 percent); only 2 percent are shared rooms.

Hypothesis Testing

I first run hypothesis tests: I calculate a 95-percent confidence interval for the correlation between listing price and distance to the nearest gallery (both log-transformed), then a 95-percent confidence interval for the correlation between listing price and distance to the nearest subway station, and finally a 95-percent confidence interval for the average listing price of each room type.

➊ Relationship between an Airbnb listing price and distance to the nearest gallery

The relationship between an Airbnb listing’s vicinity to its nearest gallery and its listing price is illustrated in Figure 5 (both axes are log-transformed). The 95-percent confidence interval for the correlation between the log of an Airbnb listing price and the log of the distance to its nearest gallery is between -0.30 and -0.28. This suggests that we can be 95-percent confident that there is a negative relationship between an Airbnb listing price and the distance to its nearest gallery.
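A confidence interval for a correlation between two log-transformed variables can be obtained in base R with `cor.test`. The sketch below assumes a Pearson correlation, which the source does not state, and uses a small made-up data frame with hypothetical column names.

```r
# Hypothetical values, for illustration only.
listings <- data.frame(
  price          = c(250, 210, 180, 160, 150, 120, 110, 90),
  dist_gallery_m = c(40, 90, 150, 260, 400, 650, 900, 1500)
)

# Pearson correlation test on the logged variables.
ct <- cor.test(log(listings$price), log(listings$dist_gallery_m),
               conf.level = 0.95)

ct$estimate   # sample correlation of the logged variables
ct$conf.int   # 95-percent confidence interval for the correlation
```

An interval that lies entirely below zero, as in the result reported above, indicates a negative relationship at the chosen confidence level.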

➋ Relationship between an Airbnb listing price and distance to the nearest subway station

The relationship between an Airbnb listing’s vicinity to its nearest subway station and its listing price is illustrated in Figure 6 (both axes are log-transformed). The 95-percent confidence interval for the correlation between the log of an Airbnb listing price and the log of the distance to its nearest subway station is between -0.05 and -0.02. This suggests that we can be 95-percent confident that there is a negative, though very weak, relationship between an Airbnb listing price and the distance to its nearest subway station.

➌ Relationship between an Airbnb listing room type and listing price

Figure 7 shows the average Airbnb listing price for each room type. Error bars represent 95-percent confidence intervals. The 95-percent confidence interval for the average listing price of entire homes/apartments is between $230 and $236. For private rooms, it is between $110 and $115. For shared rooms, it is between $81 and $97.
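Per-group confidence intervals like these could be computed as in the sketch below. It assumes t-based intervals from `t.test`, which may not be exactly what the original analysis used; `mean_ci` is a hypothetical helper and the prices are made up.

```r
# Hypothetical listings, for illustration only.
listings <- data.frame(
  price = c(220, 240, 235, 228, 105, 118, 112, 110, 80, 95, 90, 88),
  room_type = rep(c("Entire home/apt", "Private room", "Shared room"),
                  each = 4)
)

# Mean and t-based confidence interval for one group of prices.
mean_ci <- function(x, level = 0.95) {
  tt <- t.test(x, conf.level = level)
  c(mean = mean(x), lower = tt$conf.int[1], upper = tt$conf.int[2])
}

# One row per room type, with columns mean / lower / upper.
ci_by_room <- t(sapply(split(listings$price, listings$room_type), mean_ci))
ci_by_room
```

Intervals that do not overlap between room types, as in Figure 7, suggest that the average prices differ.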

Regression Models

I then fit regression models to find linear equations that predict listing price from the distance to the nearest gallery, the distance to the nearest subway station, and room type.

| Coefficient | Estimated Value | P-Value |
| --- | --- | --- |
| Intercept | 233.413 | <2e-16 |
| Private Room | -121.004 | <2e-16 |
| Shared Room | -144.436 | <2e-16 |

➊ Relationship between an Airbnb listing room type and listing price in regression

Table 2 shows the results of the model that predicts an Airbnb listing price based on room type. The R-squared value for this model is 0.1333, suggesting that about 13 percent of the variation in listing price can be explained by differences in room type. The coefficients for **private room** and **shared room** are both statistically significant, indicating that listings of these room types have prices significantly different from those of an **entire home or apartment**, the reference category. Both coefficients are negative, indicating that booking either of these room types costs less than booking an **entire home or apartment**.
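A model of this form can be sketched in base R with `lm`, setting the reference category explicitly with `relevel`. The data below are made up and the column names are hypothetical; only the structure of the model follows the description above.

```r
# Hypothetical listings, for illustration only.
listings <- data.frame(
  price = c(220, 240, 235, 228, 105, 118, 112, 110, 80, 95, 90, 88),
  room_type = rep(c("Entire home/apt", "Private room", "Shared room"),
                  each = 4)
)

# Make "Entire home/apt" the reference level, so the other coefficients
# are differences from its average price.
listings$room_type <- relevel(factor(listings$room_type),
                              ref = "Entire home/apt")

room_model <- lm(price ~ room_type, data = listings)
coef(room_model)               # intercept = mean price of the reference level
summary(room_model)$r.squared  # share of price variation explained
```

Read this way, the reported estimates imply an average price of about $233.41 for an entire home or apartment, 233.413 - 121.004 = $112.41 for a private room, and 233.413 - 144.436 = $88.98 for a shared room, each of which falls inside the corresponding confidence interval from the hypothesis-testing section.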

| Predictor | Estimated Coefficient (In Regression With A Single Predictor) | Estimated Intercept Value | P-Value | Model R² |
| --- | --- | --- | --- | --- |
| Distance To Its Nearest Gallery | -0.11 | 219.74 | <2e-16 | 0.0376 |
| Distance To Its Nearest Subway Station | -0.014 | 191.26 | 0.0032 | 0.0004 |

➋ Relationship between an Airbnb listing price and continuous variables (distance to the nearest gallery and distance to the nearest subway station) in regression

Table 3 shows the results of two different regression models: one predicting Airbnb **listing price** based on the **distance to its nearest gallery**, and the other predicting **listing price** based on the **distance to its nearest subway station**.

The first model, with the **distance to its nearest gallery**, explains about 3.76% of the total variation in **listing price** (R-squared of 0.0376). The second model, with the **distance to its nearest subway station**, explains about 0.04% of the total variation in **listing price** (R-squared of 0.0004).

The coefficient in the first model is significant and negative, indicating that, without controlling for other factors, a shorter **distance to an Airbnb’s nearest gallery** is associated with a higher **listing price**. The coefficient in the second model is also negative and statistically significant at the 0.05 level (p = 0.0032), but it is much smaller in magnitude and the model explains almost none of the variation, indicating that, without controlling for other factors, a shorter **distance to an Airbnb’s nearest subway station** is far more weakly associated with a higher **listing price** than a shorter **distance to its nearest gallery**.
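To make the two slopes concrete, the sketch below plugs the Table 3 estimates into the fitted line, price = intercept + slope × distance. It assumes the coefficients are expressed in dollars per metre of untransformed distance, which the source does not state explicitly; `predict_price` is a hypothetical helper.

```r
# Predicted nightly price from a single-predictor linear model.
predict_price <- function(distance_m, intercept, slope) {
  intercept + slope * distance_m
}

# Coefficients as reported in Table 3 (units are an assumption, see above).
gallery <- predict_price(c(100, 500), intercept = 219.74, slope = -0.11)
subway  <- predict_price(c(100, 500), intercept = 191.26, slope = -0.014)

gallery        # predicted price at 100 m and 500 m from the nearest gallery
subway         # predicted price at 100 m and 500 m from the nearest station
diff(gallery)  # change in predicted price over 400 m (gallery model)
diff(subway)   # change in predicted price over 400 m (subway model)
```

Under that assumption, moving from 100 to 500 metres away from the nearest gallery lowers the predicted price from about $208.74 to about $164.74, a drop of $44, while the same move away from the nearest subway station lowers it by only about $5.60.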

Code Sample (R)

Click To View The Complete Code on GitHub ↗
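As a minimal sketch of how the three predictors could be combined in a single model, so that the gallery coefficient is estimated while accounting for subway distance and room type, something like the following would work in base R. It is not the code from the repository: the data are made up, the column names are hypothetical, and the untransformed model form is an assumption.

```r
# Hypothetical listings, for illustration only.
listings <- data.frame(
  price          = c(260, 230, 200, 190, 130, 115, 100, 95, 90, 70),
  dist_gallery_m = c(50, 120, 300, 450, 80, 200, 420, 600, 150, 500),
  dist_subway_m  = c(300, 150, 400, 250, 350, 100, 450, 200, 280, 320),
  room_type      = c(rep("Entire home/apt", 4), rep("Private room", 4),
                     rep("Shared room", 2))
)

# Entire home/apartment as the reference category.
listings$room_type <- relevel(factor(listings$room_type),
                              ref = "Entire home/apt")

# Price as a function of both distances and room type.
full_model <- lm(price ~ dist_gallery_m + dist_subway_m + room_type,
                 data = listings)
coef(full_model)
```

In a model like this, the `dist_gallery_m` coefficient is the change in predicted price per additional metre from the nearest gallery, holding subway distance and room type fixed.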

Limitation

The distances from an Airbnb listing to its nearest gallery and subway station are linear (straight-line) distances, so they serve only as a heuristic for how accessible the surrounding galleries and subway stations are from a listing. To get a more realistic representation of accessibility, I need to replace these **linear distances** with **network distances** in ArcGIS.

Wrap It Up

Proximity to a gallery, proximity to a subway station, and room type are some of the features associated with an Airbnb listing price in Manhattan, New York. Listing price is more strongly associated with the distance to the nearest gallery than with the distance to the nearest subway station, and after accounting for the distance to the nearest subway station and room type, listings closer to a gallery tend to have higher prices. To get a more realistic representation of vicinity, my next step would be to switch from linear distance to network distance so that it better reflects actual travel distance.

Next Step

- Switch the linear distance to the network distance to accurately represent the length for transportation.

- Add more geographical variables, such as the distance to a listing's nearest park or public institution, as well as other listing attributes such as review scores.

- Publish the data visualization in an interactive format using leaflet.js, d3.js, and Mapbox API.