Analytics Case Study: A Step-by-Step, Visualization-Centric Approach (AirBnB Example)

Amit Bhardwaj
Mar 20, 2021

Problem Statement:

Over the last five years, we have witnessed an intriguing trend that suggests a correlation between the number of property images associated with a listing and the number of bookings it attracts. We have also noticed an overwhelming number of listings being redundant, due to the lack of any associated images.

You need to help the management decide upon a minimum number of images to be made mandatory for a listing that would ensure bookings.

Also, come up with an optimal number of images that we can suggest hosts post with a listing to attract the most bookings and ensure success.

Before moving ahead, let’s have a look at the dataset:

Listings provides a random sample of 500 listings posted by various hosts (including Superhosts) in the last 5 years from various locations, along with the number of property images for each listing and the number of bookings it attracted.

Listings Data

Open_Listings provides over a year of data showing the number of open listings for each date. Open listings are property listings that were available but did not attract any booking by the end of the day. The listings are classified according to the number of associated images.

Open listings data

Redundant listings provides data as of August 31, 2019, for the Total Listings and the Redundant Listings in each category. Redundant listings are listings that have not attracted a single booking in the last year. The categories are classified according to the number of associated property images.

Redundant listings data

The columns in all the tables are self-explanatory, so let’s move ahead with some analysis and visualisation.

Cleaning the provided data:

Before looking for a correlation, we should remove outliers from the data.

Outliers detection
Outliers treatment
Distribution of data after outliers treatment
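
The article does not say which outlier rule was used. As a minimal sketch, assuming the common Tukey IQR rule and treating outliers by capping them at the fences (both choices are assumptions, as are the function names and the sample values), the detection and treatment steps could look like this:

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences using the Tukey IQR rule (an assumed method)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def cap_outliers(values, k=1.5):
    """Clip every value lying outside the fences to the nearest fence."""
    lower, upper = iqr_bounds(values, k)
    return [min(max(v, lower), upper) for v in values]

# Hypothetical image counts, not taken from the case-study data.
image_counts = [0, 2, 3, 4, 5, 6, 8, 10, 12, 120]
cleaned = cap_outliers(image_counts)
```

Capping keeps all 500 rows in the sample; dropping the flagged rows instead would be an equally reasonable treatment if the extreme values are suspected to be data-entry errors.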

Now that we have clean and normalised data, we can move on to the analysis.

Listings and Bookings by users

When listing, more users post fewer than 4 images, or none at all.

When booking, listings with more than 6 images get booked more often.

Percentage of bookings on listings wrt number of images posted, for all types of users

Bookings still happen (20%) even when as few as 0–2 images are posted.

Beyond 6–10 images, the booking percentage stays roughly the same (~90%).
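
A booking percentage per image range can be computed by bucketing listings and counting those that were booked. The sketch below assumes that "booked" means at least one booking; the article does not define it. The 0–2, 3–5, 6–10 and 11–15 ranges come from the text, while the top bucket, the function names and the sample rows are hypothetical.

```python
from collections import defaultdict

# The first four ranges are quoted in the article; the last one is an assumption.
BUCKETS = [(0, 2), (3, 5), (6, 10), (11, 15), (16, 30)]

def bucket_label(n_images):
    """Map an image count to its range label, e.g. 7 -> '6-10'."""
    for low, high in BUCKETS:
        if low <= n_images <= high:
            return f"{low}-{high}"
    return f"{BUCKETS[-1][1] + 1}+"

def booking_rate_by_bucket(listings):
    """listings: iterable of (n_images, n_bookings) pairs.
    Returns {bucket: percentage of listings with at least one booking}."""
    total = defaultdict(int)
    booked = defaultdict(int)
    for n_images, n_bookings in listings:
        label = bucket_label(n_images)
        total[label] += 1
        booked[label] += n_bookings > 0
    return {label: 100 * booked[label] / total[label] for label in total}

# Made-up rows for illustration only.
sample = [(0, 0), (1, 0), (2, 1), (2, 0), (1, 0), (7, 3), (8, 1), (9, 0), (10, 2)]
rates = booking_rate_by_bucket(sample)
```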

Correlation of Images Posted and Bookings for Both User Types: Regular or Superusers

For regular users, there is a very high correlation between the number of images posted and the bookings their listings receive.

For Superusers, the correlation between the number of images posted and the bookings their listings receive is only moderate.
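
The article does not name the correlation coefficient it used. A minimal sketch, assuming Pearson's r computed separately for each user type (the row layout, the type labels and the numbers are all hypothetical), might be:

```python
import math
from collections import defaultdict

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_by_user_type(rows):
    """rows: iterable of (user_type, n_images, n_bookings).
    Returns {user_type: Pearson r between images and bookings}."""
    groups = defaultdict(lambda: ([], []))
    for user_type, n_images, n_bookings in rows:
        groups[user_type][0].append(n_images)
        groups[user_type][1].append(n_bookings)
    return {t: pearson(xs, ys) for t, (xs, ys) in groups.items()}

# Invented rows: bookings track images exactly for "regular", loosely for "super".
rows = [
    ("regular", 1, 1), ("regular", 3, 3), ("regular", 6, 6), ("regular", 10, 10),
    ("super", 1, 4), ("super", 3, 6), ("super", 6, 5), ("super", 10, 8),
]
corr = correlation_by_user_type(rows)
```

Splitting by user type before correlating matters here because a single pooled coefficient would hide the difference between the two groups.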

Average Open Listings for images posted

The average number of open listings is highest when 3–5 images are posted.

The average number of open listings is lowest when 11–15 images are posted.

The average number of open listings is similarly low when 6–10 images are posted.

Open listings drop steeply, by 68%, when the number of images posted increases to 6–10 or 11–15.
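
The percentage drop is a relative change between two bucket averages. The sketch below assumes the drop is measured against the 3–5 bucket, which the article does not state. The daily counts are invented and chosen only so that the arithmetic works out to 68%.

```python
from statistics import mean

def average_open_listings(daily_counts):
    """daily_counts: {bucket: [open listings per day]} -> {bucket: mean per day}."""
    return {bucket: mean(counts) for bucket, counts in daily_counts.items()}

def percent_decrease(before, after):
    """Relative drop from `before` to `after`, in percent."""
    return 100 * (before - after) / before

# Invented daily counts, not the case-study data.
daily_counts = {
    "3-5": [100, 110, 90],
    "6-10": [30, 34, 32],
}
averages = average_open_listings(daily_counts)
drop = percent_decrease(averages["3-5"], averages["6-10"])
```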

Open Listings Distribution wrt Images Posted

Listings with fewer than 6 images have relatively more open listings than the rest.

An anomaly in open listings can also be seen in Aug 2018.

Key Results and Conclusions

In conclusion, the following points can be derived:

  1. Since we want to attract more listings, and success means those listings get booked, we can mandate at least 4 images per listing for every user type.
  2. The booking percentage saturates beyond 10 images, i.e. it stays the same even when as many as 20–30 images are posted.

Thus, an optimal range of 8–10 images is advisable for the successful booking of a listed property.
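
One way to make the saturation argument explicit is to pick the first image range whose booking percentage is within a small tolerance of the best one. This is only a sketch: the tolerance, the function name and the intermediate percentages are assumptions, and only the 20% and ~90% figures echo the article.

```python
def saturation_bucket(rates, tolerance=2.0):
    """rates: list of (bucket, booking %) ordered by image count.
    Returns the first bucket within `tolerance` points of the best rate."""
    best = max(rate for _, rate in rates)
    for bucket, rate in rates:
        if best - rate <= tolerance:
            return bucket

# Hypothetical booking percentages per image range.
rates = [("0-2", 20.0), ("3-5", 55.0), ("6-10", 89.0), ("11-15", 90.0), ("16-30", 90.0)]
first_saturated = saturation_bucket(rates)
```

A tighter tolerance pushes the suggested range upward, so the cut-off should be agreed with management before it is used to set a recommendation.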

The code for this analysis can be found here.

Thanks!
