Airbnb Rome EDA

Author

Maan Al Neami,
Nourah Almutairi,
Ammar Alfaifi,
Salman Al-Harbi,
Dina Alkhammash

Introduction:

In this report we analyze the Rome Airbnb listings dataset from Inside Airbnb to find which variables influence the income generated by a property.

Why Rome?

We chose Rome because it is one of the cities most visited by tourists in Europe, thanks to its rich history, amazing food, and relatively low prices compared to other major European cities. All of these factors make Rome one of the most sought-after investments in the hospitality and tourism industry.

About the dataset source

We got our dataset from Inside Airbnb, a project that provides data and advocacy about Airbnb’s impact on residential communities. It provides data and information to empower communities to understand, decide and control the role of renting residential homes to tourists.

Data dictionary

Variable	Description
host_name	Name of the host.
neighbourhood	Name of the neighbourhood.
latitude	Used to make an interactive map.
longitude	Used to make an interactive map.
room_type	Entire home/apt, private room, hotel room, or shared room.
price	Price in euros.
minimum_nights	Minimum number of nights per stay for the listing.
number_of_reviews	The number of reviews the listing has.
last_review	The date of the last/newest review.
availability_365	Number of days the listing is available in the next 365 days.
number_of_reviews_ltm	The number of reviews the listing has (in the last 12 months).
amenities	A list of amenities in the listing.

Data Munging

First, we do some basic data cleaning.

Importing libraries and Data

Code

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import folium
from folium.plugins import MarkerCluster
from folium import plugins
from folium.plugins import FastMarkerCluster
from folium.plugins import HeatMap
from ast import literal_eval
from plotly.subplots import make_subplots


df = pd.read_csv('data/listings.csv')
df2 = pd.read_csv('data/listings-detailed.csv')

df.head(5)

	id	name	host_id	host_name	neighbourhood_group	neighbourhood	latitude	longitude	room_type	price	minimum_nights	number_of_reviews	last_review	reviews_per_month	calculated_host_listings_count	availability_365	number_of_reviews_ltm	license
0	49955080	Singola al Casale di Gardenia	396326393	Alessia	NaN	XV Cassia/Flaminia	42.07605	12.32067	Private room	66	1	0	NaN	NaN	3	88	0	NaN
1	41146116	Il Giardino di Veio	322089651	Rosetta	NaN	XV Cassia/Flaminia	42.05088	12.45619	Private room	20	2	1	2020-01-26	0.03	1	0	0	NaN
2	39624404	CAMERA MATRIMONIALE STANDARD CON COLAZIONE INC...	304471512	Hotel	NaN	VI Roma delle Torri	41.82882	12.73900	Private room	100	1	0	NaN	NaN	1	180	0	NaN
3	1903817	Lovely apartment with fabulous view north of Rome	9883614	Eva	NaN	XV Cassia/Flaminia	42.13578	12.32621	Entire home/apt	110	3	53	2022-05-25	0.63	4	289	3	NaN
4	17617868	SUPER OFFERTA-stanza Maria-doppia o matrimoniale	97622372	Eleonora	NaN	XV Cassia/Flaminia	42.06512	12.46106	Private room	25	1	12	2022-05-17	0.19	3	315	3	16903

Cleaning Data

First we checked for duplicates and null values

Code

df.duplicated().sum()

df.isnull().sum()

id                                    0
name                                  3
host_id                               0
host_name                             5
neighbourhood_group               23911
neighbourhood                         0
latitude                              0
longitude                             0
room_type                             0
price                                 0
minimum_nights                        0
number_of_reviews                     0
last_review                        3941
reviews_per_month                  3941
calculated_host_listings_count        0
availability_365                      0
number_of_reviews_ltm                 0
license                           20387
dtype: int64

There were no duplicates, but there are null values in name, host_name, neighbourhood_group, last_review, reviews_per_month, and license. We decided to drop the missing values for all columns except the review columns.
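The null check above can also be condensed into a small summary that lists only the affected columns together with their share of missing rows. This is a minimal sketch on a hypothetical toy frame; `missing_summary` is an illustrative helper, not part of the original notebook.

```python
import pandas as pd

def missing_summary(frame):
    """Return null counts and null share per column, worst first."""
    nulls = frame.isnull().sum()
    summary = pd.DataFrame({"nulls": nulls, "share": nulls / len(frame)})
    # Keep only columns that actually contain missing values
    return summary[summary["nulls"] > 0].sort_values("nulls", ascending=False)

# Hypothetical sample with the same kind of gaps as the listings file
toy = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "host_name": ["A", None, "C", "D"],
    "license": [None, None, None, "x"],
})
summary = missing_summary(toy)
print(summary)
```

A table like this makes it easier to decide column by column whether to drop rows, drop the column, or keep the nulls.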

Second, we used apply to convert the price in the dataset to float. We also decided to drop prices of 0 and prices of 50000 or more, as there was nothing interesting to investigate there and they would mess up our results.

Code

df2["price"]=df2["price"].apply(lambda x : float(x[1:].replace(",","")))
df2.drop(df2[df2["price"]>=50000].index,axis=0,inplace=True)
df2.drop(df2[df2["price"]==0].index,axis=0,inplace=True)
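The lambda above assumes that every price is a string with a one-character currency prefix. A slightly more defensive version could look like the sketch below; `parse_price` is a hypothetical helper, and the "$1,250.00" style of the sample values is an assumption about the raw file.

```python
import math

def parse_price(raw):
    """Convert a price string such as "$1,250.00" to a float.

    Returns NaN for missing or unparseable values instead of raising.
    """
    if raw is None or (isinstance(raw, float) and math.isnan(raw)):
        return math.nan
    # Strip a leading currency symbol and the thousands separators
    cleaned = str(raw).strip().lstrip("$€").replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        return math.nan

# Hypothetical sample values
samples = ["$66.00", "$1,250.00", None, "n/a"]
parsed = [parse_price(s) for s in samples]
print(parsed)
```

It could be passed to apply in the same way as the lambda, for example `df2["price"].apply(parse_price)`.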

The amenities column stores each listing's amenities as a list written inside a string, so we use literal_eval from ast to turn it into a list of strings, then use explode from pandas to give each element of the list its own row, and lastly perform a one-hot encoding using crosstab.

Code

df2["amenities"] = df2["amenities"].apply(literal_eval)
exploded_df2 = df2.explode('amenities')
df_new = pd.crosstab(exploded_df2['id'],exploded_df2['amenities']).rename_axis(None,axis=1).add_prefix("amenities_")
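To make the three steps concrete, here is the same pipeline on a hypothetical three-listing sample (the ids and amenity names are made up for illustration):

```python
from ast import literal_eval
import pandas as pd

# Hypothetical sample; amenities are stored as list-like strings
toy = pd.DataFrame({
    "id": [1, 2, 3],
    "amenities": ['["Wifi", "Kitchen"]', '["Wifi"]', '["Kitchen", "Balcony"]'],
})

toy["amenities"] = toy["amenities"].apply(literal_eval)  # string -> list of strings
exploded = toy.explode("amenities")                      # one row per (id, amenity)
one_hot = (
    pd.crosstab(exploded["id"], exploded["amenities"])   # count amenities per id
    .rename_axis(None, axis=1)
    .add_prefix("amenities_")
)
print(one_hot)
```

Each row of `one_hot` is one listing id and each `amenities_` column holds 1 if the listing has that amenity and 0 otherwise.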

Code

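df = pd.concat([df.set_index("id"), df_new], axis=1, join='inner').rename_axis("id").reset_index()
# Merge the review scores on the listing id; a plain index join would pair rows by position after reset_index
score_cols = ["review_scores_rating", "review_scores_accuracy", "review_scores_cleanliness", "review_scores_checkin", "review_scores_communication", "review_scores_location", "review_scores_value"]
df = df.merge(df2[["id"] + score_cols], on="id", how="left")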

EDA

Listings by neighbourhood

Here we want to see how many listings we have per neighbourhood

Code

dist_plt = df['neighbourhood'].value_counts().nlargest(5).plot.bar()
plt.xticks(rotation = 30)

(array([0, 1, 2, 3, 4]),
 [Text(0, 0, 'I Centro Storico'),
  Text(1, 0, 'VII San Giovanni/Cinecittà'),
  Text(2, 0, 'II Parioli/Nomentano'),
  Text(3, 0, 'XIII Aurelia'),
  Text(4, 0, 'XII Monte Verde')])

We can see from the figure above that most of the listings in our dataset are located in I Centro Storico.

Now let's see how these listings appear on a map using the folium library.

Code

Long=12.6
Lat=41.8
locations = list(zip(df.latitude, df.longitude))

map1 = folium.Map(location=[Lat,Long], zoom_start=10.5)
FastMarkerCluster(data=locations).add_to(map1)
map1


As we can see, most of the listings are located in the city center.

Next, we want to see the distribution of listing types in the dataset.

Code

df['room_type'].value_counts().plot(kind = 'bar', color = ['g', 'r', 'b', 'y'])
plt.title('Listings type count')
plt.xticks(rotation = 30)

(array([0, 1, 2, 3]),
 [Text(0, 0, 'Entire home/apt'),
  Text(1, 0, 'Private room'),
  Text(2, 0, 'Hotel room'),
  Text(3, 0, 'Shared room')])

The plot above shows us that most listings are of type Entire home/apt, followed by Private room and Hotel room.

Let's also see the median price per neighbourhood.

Code

mode_dist_plt = df.groupby('neighbourhood')['price'].median().nlargest(5).plot.bar()
plt.title('Median price per neighbourhood (Top 5)')
plt.xticks(rotation = 30)

(array([0, 1, 2, 3, 4]),
 [Text(0, 0, 'I Centro Storico'),
  Text(1, 0, 'XIII Aurelia'),
  Text(2, 0, 'XV Cassia/Flaminia'),
  Text(3, 0, 'II Parioli/Nomentano'),
  Text(4, 0, 'XII Monte Verde')])

The highest median price is in I Centro Storico, followed by XIII Aurelia.

Now let's check the mean price per neighbourhood.

Code

mean_dist_plt = df.groupby('neighbourhood')['price'].mean().nlargest(5).plot.bar()
plt.title('Mean price per neighbourhood (Top 5)')
plt.xticks(rotation = 30)

(array([0, 1, 2, 3, 4]),
 [Text(0, 0, 'I Centro Storico'),
  Text(1, 0, 'II Parioli/Nomentano'),
  Text(2, 0, 'XIII Aurelia'),
  Text(3, 0, 'XV Cassia/Flaminia'),
  Text(4, 0, 'IX Eur')])

The highest mean price is in I Centro Storico, followed by II Parioli/Nomentano.
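The median and mean rankings do not match exactly. One likely reason is that the mean is pulled upward by a few very expensive listings, while the median is not. A minimal illustration with hypothetical prices:

```python
from statistics import mean, median

# Hypothetical nightly prices for one neighbourhood, with a single extreme listing
prices = [60, 75, 80, 90, 110, 9000]

mean_price = mean(prices)      # pulled far upward by the 9000 listing
median_price = median(prices)  # stays close to a typical listing
print(mean_price, median_price)
```

This is why a neighbourhood with a handful of luxury listings can rank higher by mean than by median.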

Let’s also look at a heatmap of the listings priced at 300 euros or more.

Code

df_50 = df[df['price']>=300]


map2=folium.Map([42,12],zoom_start=9.8)
location = ['latitude','longitude']
df_map = df_50[location]
HeatMap(df_map.dropna(),radius=8,gradient={.4: 'blue', .65: 'lime', 1: 'red'}).add_to(map2)
map2


We also see here that the highest prices are in I Centro Storico.

Amenities and Price

What are the most frequent amenities in Rome listings?

Code

amenities = {}
for c in df.columns:
    if "amenities" in c: 
        amenities[c] = df[c].value_counts()[1]
amenities_list = sorted(amenities, key=amenities.get, reverse=True)[:10]
amenities = {c:df[c].value_counts()[1] for c in amenities_list}
fig = px.bar(x = amenities.keys(), y = amenities.values(), title = "Most Frequent Amenities", labels={
    "y": "Count",
    "x": "Amenities"
})
fig.show()

We can see that Wifi is the most frequent amenity in Rome’s listings, followed by Essentials, Hair dryer, and Long term stay. These might be the basic or standard amenities that a visitor to Rome would want.
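The counting step can also be written without the explicit loop. The sketch below uses a hypothetical one-hot frame in the same `amenities_` layout; comparing against 0 avoids a KeyError from `value_counts()[1]` when no listing has a given amenity.

```python
import pandas as pd

# Hypothetical one-hot frame in the same "amenities_" layout as df
toy = pd.DataFrame({
    "price": [50, 80, 120, 200],
    "amenities_Wifi": [1, 1, 1, 0],
    "amenities_Hair dryer": [1, 0, 1, 0],
    "amenities_Balcony": [0, 0, 0, 1],
})

amenity_cols = [c for c in toy.columns if c.startswith("amenities_")]
# Count the listings that have each amenity, most frequent first
amenity_counts = (toy[amenity_cols] > 0).sum().sort_values(ascending=False)
top_amenities = amenity_counts.head(2)
print(top_amenities)
```

The resulting Series can be passed to a bar chart directly, using its index as x and its values as y.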

What is the average price for the most frequent Amenities?

Code

amenities = {}
for c in df.columns:
    if "amenities" in c: 
        amenities[c] = df[c].value_counts()[1]
amenities_list = sorted(amenities, key=amenities.get, reverse=True)[:10]
amenities = {c:df.groupby(c)['price'].mean()[1] for c in amenities_list}
fig = px.bar(x = amenities.keys(), y = amenities.values(), title = "Average Price for the Most Frequent Amenities", labels={
    "y": "Average Price",
    "x": "Amenities"
})
fig.show()

From the graph above, we can see that among the most frequent amenities, listings with Air Conditioning have the highest average price, at 185.24.

What are the most expensive Amenities?

Code

amenities = {}

for c in df.columns:
    if "amenities" in c: 
        amenities[c] = df.groupby(c)['price'].mean()[1]

amenities_list = sorted(amenities, key=amenities.get, reverse=True)[:10]
amenities = {c:df.groupby(c)['price'].mean()[1] for c in amenities_list}
amenities_count = {c:df[c].value_counts()[1] for c in amenities_list}

fig = px.bar(x = amenities.keys(), y = amenities.values(), title = "Average Price for the Most Expensive Amenities", labels={
    "y": "Average Price",
    "x": "Amenities"
})

fig.show()

The amenity with the highest average listing price is Outdoor seating, at about 10.5K, followed by Piastre electric stove, Balcony, and Security cameras.

What is the distribution of the most expensive Amenities?

Code

amenities = {}

for c in df.columns:
    if "amenities" in c: 
        amenities[c] = df.groupby(c)['price'].mean()[1]

amenities_list = sorted(amenities, key=amenities.get, reverse=True)[:10]
amenities_count = {c:df[c].value_counts()[1] for c in amenities_list}

fig = px.bar(x = amenities_count.keys(), y = amenities_count.values(), title = "The Distribution of the Most Expensive Amenities", labels={
    "y": "Count",
    "x": "Amenities"
})

fig.show()

We can see that Outdoor seating and Piastre electric stove each appear in only one listing, while three listings have Balcony and two listings have Security cameras. Although these amenities appear in expensive listings, they do not seem to be frequent in Rome, so their average prices rest on very few listings.

Distribution of Review’s Rating Scores

Code

for c in df.columns:
    if "scores" in c:
        # Assign the result back; a chained inplace fillna may not modify df
        df[c] = df[c].fillna(df[c].mean())

Code

rating_location_category = pd.cut(df.review_scores_location,bins=[1, 2, 3, 4, 5, 6], labels=["Terrible","Bad","Okay","Good", "Great"], right=False)
df['rating_location_category'] = rating_location_category
sns.countplot(x = rating_location_category)

<AxesSubplot:xlabel='review_scores_location', ylabel='count'>

Code

rating_category = pd.cut(df.review_scores_rating,bins=[1, 2, 3, 4, 5, 6],labels=["Terrible","Bad","Okay","Good", "Great"], right=False)
df['rating_category'] = rating_category
sns.countplot(x = rating_category)

<AxesSubplot:xlabel='review_scores_rating', ylabel='count'>

Code

value_category = pd.cut(df.review_scores_value,bins=[1, 2, 3, 4, 5, 6],labels=["Terrible","Bad","Okay","Good", "Great"], right=False)
df['value_category'] = value_category
sns.countplot(x = value_category, data=df)

<AxesSubplot:xlabel='review_scores_value', ylabel='count'>

It seems that in all three types of review scores, Good is the most frequent category.
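The categories above come from pd.cut with right=False, which makes every bin closed on the left and open on the right. A small sketch with hypothetical scores shows where the boundary values land:

```python
import pandas as pd

# Hypothetical review scores, including values exactly on bin edges
scores = pd.Series([1.0, 2.5, 3.99, 4.0, 4.8, 5.0])
labels = ["Terrible", "Bad", "Okay", "Good", "Great"]

# right=False gives the bins [1, 2), [2, 3), [3, 4), [4, 5), [5, 6)
categories = pd.cut(scores, bins=[1, 2, 3, 4, 5, 6], labels=labels, right=False)
print(categories.tolist())
```

With these bins, only a score of exactly 5 counts as Great, and everything from 4 up to but not including 5 counts as Good.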

Does the rating score affect the price?

Code

data = df.groupby("rating_category")["price"].mean()
sns.barplot(x=data.index, y=data)

<AxesSubplot:xlabel='rating_category', ylabel='price'>

It seems that there is no relation between review_scores_rating and price.

What is the income of the past year and the expected income for the next 3 months?

More data cleaning

Code

romeListings = df2.copy()
romeListings.at[19678,"minimum_minimum_nights"]=7
romeListings.drop(14443,inplace=True)
romeListings.drop(7022,inplace=True)
romeListings.at[5888,"minimum_minimum_nights"]=3
romeListings.drop(7250,inplace=True)
romeListings.at[11454,"minimum_minimum_nights"]=5
romeListings.at[20421,"price"]=121.51
romeListings.at[23230,"price"]=90
romeListings.at[9646,"price"]=92.73
romeListings.at[4737,"minimum_minimum_nights"]=3
romeListings.drop(romeListings[romeListings["price"]==0].index,inplace=True)

Add the min_booked_nights_past_12m and min_income_past_12m columns to the dataset

Code

# Minimum estimate of the number of booked nights of each listing in the last 12 months (current date = 2022-06-07)
romeListings["min_booked_nights_past_12m"] = romeListings["number_of_reviews_ltm"] * romeListings["minimum_nights_avg_ntm"]

# Minimum estimate of the income of each listing in the last 12 months (current date = 2022-06-07)
romeListings["min_income_past_12m"] = romeListings["min_booked_nights_past_12m"] * romeListings["price"]
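As a worked example of the estimate above: each review in the last 12 months is counted as one stay of the minimum length, which is presumably why it is called a minimum (not every guest leaves a review, and stays can be longer than the minimum). The function name and the numbers below are hypothetical.

```python
def min_income_past_12m(reviews_ltm, min_nights_avg, price):
    """Lower-bound income estimate: one minimum-length stay per review."""
    min_booked_nights = reviews_ltm * min_nights_avg
    return min_booked_nights * price

# Hypothetical listing: 10 reviews in the last 12 months,
# 2-night minimum stay, 80 euros per night
estimate = min_income_past_12m(reviews_ltm=10, min_nights_avg=2, price=80)
print(estimate)  # 1600
```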

Add the expected_booked_nights_coming_3m and expected_income_coming_3m columns to the dataset

Code

# Expected number of booked nights of each listing in the next 3 months (current date = 2022-06-07)
romeListings["expected_booked_nights_coming_3m"] = 90 - romeListings["availability_90"]

# Expected income of each listing in the next 3 months (current date = 2022-06-07)
romeListings["expected_income_coming_3m"] = romeListings["expected_booked_nights_coming_3m"] * romeListings["price"]
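The expected figures treat every unavailable night in the next 90 days as a booked night. The sketch below works through one hypothetical listing; the function name is illustrative. Note that nights blocked by the host would also count as unavailable, so this may overestimate bookings.

```python
def expected_income_coming_3m(availability_90, price):
    """Return (expected booked nights, expected income) for the next 90 days."""
    booked_nights = 90 - availability_90  # unavailable nights are assumed booked
    return booked_nights, booked_nights * price

# Hypothetical listing: 30 of the next 90 nights still available, 100 euros per night
nights, income = expected_income_coming_3m(availability_90=30, price=100)
print(nights, income)  # 60 6000
```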

Which neighbourhoods have the highest average price?

Code

temp=romeListings.groupby("neighbourhood_cleansed")["price"].mean().sort_values(ascending=False).head(5).reset_index()
# labels must be keyed by column name when a DataFrame is passed to px.bar
fig = px.bar(temp,x = "neighbourhood_cleansed", y = "price", title = "The highest price per night average of listings within a neighbourhood",color="neighbourhood_cleansed", labels={
    "price": "Price",
    "neighbourhood_cleansed": "Neighbourhood"
})
fig.show()

Which neighbourhoods have the highest average number of booked nights over the next 3 months?

Code

temp=romeListings.groupby("neighbourhood_cleansed")["expected_booked_nights_coming_3m"].mean().sort_values(ascending=False).head(50).reset_index()
# labels must be keyed by column name when a DataFrame is passed to px.bar
fig = px.bar(temp,x = "neighbourhood_cleansed", y = "expected_booked_nights_coming_3m", title = "The highest booked nights average of listings within a neighbourhood",color="neighbourhood_cleansed", labels={
    "expected_booked_nights_coming_3m": "Booked nights (next 3 months)",
    "neighbourhood_cleansed": "Neighbourhood"
},range_y=[0,90])
fig.show()

Which room types have the highest average price?

Code

temp=romeListings.groupby("room_type")["price"].mean().sort_values(ascending=False).head(5).reset_index()
# labels must be keyed by column name when a DataFrame is passed to px.bar
fig = px.bar(temp,x = "room_type", y = "price", title = "The highest price per night average of listings of a room type",color="room_type", labels={
    "price": "Price",
    "room_type": "Room Type"
})
fig.show()

Which room types have the highest average number of booked nights over the next 3 months?

Code

temp=romeListings.groupby("room_type")["expected_booked_nights_coming_3m"].mean().sort_values(ascending=False).reset_index()
# labels must be keyed by column name when a DataFrame is passed to px.bar
fig = px.bar(temp,x = 'room_type', y = "expected_booked_nights_coming_3m", title = "The highest booked nights average of listings of a room type",color="room_type", labels={
    "expected_booked_nights_coming_3m": "Booked nights (next 3 months)",
    "room_type": "Room Type"
},range_y=[0,90])
fig.show()

Which (room type, neighbourhood) combinations have the highest average number of booked nights?

Code

nBookedNightinNigh=romeListings.groupby(["room_type","neighbourhood_cleansed"])["expected_booked_nights_coming_3m"].agg(["mean","count"])
temp=nBookedNightinNigh[nBookedNightinNigh["count"]>11]["mean"].sort_values(ascending=False).head(5)
fig = px.bar(x = list(map(str,list(temp.keys()))), y = temp.values, title = "The highest booked nights average of listings for every (Room Type,Neighbourhood) combination", labels={
    "y": "Booked nights (next 3 months)",
    "x": "(Room Type,Neighbourhood)"
},range_y=[0,90])
fig.show()
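The count filter in the code above keeps only combinations with enough listings, so that a single unusual listing cannot top the ranking. A minimal sketch of the same pattern on hypothetical data, with the threshold lowered to suit the tiny sample:

```python
import pandas as pd

# Hypothetical sample: three listings in one group, a single listing in another
toy = pd.DataFrame({
    "room_type": ["Entire home/apt"] * 3 + ["Shared room"],
    "neighbourhood_cleansed": ["I Centro Storico"] * 3 + ["IX Eur"],
    "expected_booked_nights_coming_3m": [60, 70, 80, 90],
})

stats = toy.groupby(["room_type", "neighbourhood_cleansed"])[
    "expected_booked_nights_coming_3m"
].agg(["mean", "count"])

min_listings = 2  # the analysis above uses count > 11; lowered for this sample
reliable = stats[stats["count"] >= min_listings]["mean"].sort_values(ascending=False)
print(reliable)
```

Without the filter, the single Shared room listing with 90 booked nights would rank first even though it represents only one listing.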

Which neighbourhoods have the highest average minimum income for the past 12 months?

Code

temp=romeListings.groupby("neighbourhood_cleansed")["min_income_past_12m"].mean().sort_values(ascending=False).head(5).reset_index()
# labels must be keyed by column name when a DataFrame is passed to px.bar
fig = px.bar(temp,x = "neighbourhood_cleansed", y = "min_income_past_12m", title = "The highest minimum income of listings within a Neighbourhood",color="neighbourhood_cleansed", labels={
    "min_income_past_12m": "average minimum income of listings (past 12 months)",
    "neighbourhood_cleansed": "Neighbourhood"
})

fig.show()

Which neighbourhoods have the highest average expected income for the next 3 months?

Code

temp= romeListings.groupby("neighbourhood_cleansed")["expected_income_coming_3m"].mean().sort_values(ascending=False).head(5).reset_index()
# labels must be keyed by column name when a DataFrame is passed to px.bar
fig = px.bar(temp,x = "neighbourhood_cleansed", y = "expected_income_coming_3m", title = "The highest expected income of listings within a Neighbourhood",color="neighbourhood_cleansed", labels={
    "expected_income_coming_3m": "average expected income of listings (next 3 months)",
    "neighbourhood_cleansed": "Neighbourhood"
})

fig.show()

Which room types have the highest average minimum income for the past 12 months?

Code

temp=romeListings.groupby("room_type")["min_income_past_12m"].mean().sort_values(ascending=False).head(5).reset_index()
# labels must be keyed by column name when a DataFrame is passed to px.bar
fig = px.bar(temp,x = "room_type", y = "min_income_past_12m", title = "The highest minimum income of listings based on room type",color="room_type", labels={
    "min_income_past_12m": "average minimum income of listings (past 12 months)",
    "room_type": "Room Type"
})

fig.show()

Which room types have the highest average expected income for the next 3 months?

Code

temp=romeListings.groupby("room_type")["expected_income_coming_3m"].mean().sort_values(ascending=False).head(5).reset_index()
# labels must be keyed by column name when a DataFrame is passed to px.bar
fig = px.bar(temp,x = "room_type", y = "expected_income_coming_3m", title = "The highest expected income of listings based on room type",color="room_type", labels={
    "expected_income_coming_3m": "average expected income of listings (next 3 months)",
    "room_type": "Room Type"
})

fig.show()

Which (room type, neighbourhood) combinations have the highest average income?

Code

nBookedNightinNigh=romeListings.groupby(["room_type","neighbourhood_cleansed"])["expected_income_coming_3m"].agg(["mean","count"])
temp=nBookedNightinNigh[nBookedNightinNigh["count"]>11]["mean"].sort_values(ascending=False).head(5)
fig = px.bar(x = list(map(str,list(temp.keys()))), y = temp.values, title = "The highest income average of listings for every (Room Type,Neighbourhood) combination", labels={
    "y": "average income (next 3 months)",
    "x": "(Room Type,Neighbourhood)"
},)
fig.show()

Does the appearance of the host’s account affect the listing income?

Code

temp=romeListings.groupby("host_has_profile_pic")["expected_income_coming_3m"].mean().reset_index()
# the y label is keyed by column name; x is passed as a list, so its key stays "x"
fig = px.bar(temp,x = ["No","Yes"], y = "expected_income_coming_3m", title = "",color=["No","Yes"], labels={
    "expected_income_coming_3m": "average income (next 3 months)",
    "x": "host has a profile pic ?"
})
fig.show()

Code

temp= romeListings.groupby("host_identity_verified")["expected_income_coming_3m"].mean().reset_index()
fig = px.bar(temp,x = ["No","Yes"], y = "expected_income_coming_3m", title = "",color=["No","Yes"], labels={
    "expected_income_coming_3m": "average income (next 3 months)",
    "x": "is the host identity verified ?"
},)
fig.show()

Code

temp= romeListings.groupby("instant_bookable")["expected_income_coming_3m"].mean().reset_index()
fig = px.bar(temp,x = ["No","Yes"], y = "expected_income_coming_3m", title = "",color=["No","Yes"], labels={
    "expected_income_coming_3m": "average income (next 3 months)",
    "x": "can be booked instantly ?"
},)
fig.show()

Does the response time affect the density of the number of booked nights?

Code

response_times = ["a few days or more", "within a day", "within a few hours", "within an hour"]
colors = ["red", "green", "orange", "blue"]
f, axes = plt.subplots(2, 2, figsize=(20,8))
# One density plot per response time, in the same panel order and colours as before
for ax, response_time, color in zip(axes.flat, response_times, colors):
    subset = romeListings.loc[romeListings["host_response_time"]==response_time, "expected_booked_nights_coming_3m"]
    sns.kdeplot(x=subset, color=color, fill=True, ax=ax)
    ax.set_title(response_time)

Does the Location Rating affect the minimum income of listings (past 12 months)?

Code

# Match the rating categories on the listing id; the two frames do not share the same row index
romeListings = romeListings.join(df.set_index("id")[["rating_location_category", "rating_category", "value_category"]], on="id")

ax = sns.relplot(data= romeListings, x="review_scores_location", y ="min_income_past_12m" ,alpha=0.25)
plt.xticks([0, 1, 2, 3, 4, 5], ["0", "1", "2", "3", "4", "5"])
plt.yscale("log")
plt.title("The minimum income of listings based on location rating")
plt.xlabel("Rating")
plt.ylabel("average minimum income of listings (past 12 months)")
plt.show()

We can see an upward trend, which indicates that listings with a higher location review score had a higher minimum income for the past year.

Code

data = romeListings.groupby("rating_location_category")["min_income_past_12m"].mean().sort_values(ascending=False)
sns.barplot(x=data.index, y=data)
plt.title("The Average minimum income of listings based on location rating")
plt.xlabel("Rating")
plt.ylabel("average minimum income of listings (past 12 months)")
plt.show()

We can also see here that Good and Great both have a higher average minimum income than the rest of the categories.

What about the expected income for the next 3 months? Let’s check it out.

Does the Location Rating affect the expected average income (next 3 months)?

Code

data = romeListings.groupby("rating_location_category")["expected_income_coming_3m"].mean().sort_values(ascending=False)
sns.barplot(x=data.index, y=data)
plt.title("The Expected Income in 3 Months for Listings based on rating")
plt.xlabel("Rating")
plt.ylabel("average income (next 3 months)")
plt.show()

From the figure above, it looks like listings with a Great location review score are expected to have the highest average income for the next three months.

Code

data = romeListings.groupby("rating_location_category")["expected_booked_nights_coming_3m"].mean().sort_values(ascending=False)
sns.barplot(x=data.index, y=data)
plt.title("The Expected Booked Nights in 3 Months for Listings based on rating")
plt.xlabel("Rating")
plt.ylabel("Booked nights (next 3 months)")
plt.show()

It seems that listings with a Great, Good or Okay location review score are expected to be booked more than the other categories for the next three months.

Which (Bedrooms, Room Type) combinations have the highest average expected income for the next 3 months?

Code

nBookedNightinNigh=romeListings.groupby(["room_type","bedrooms"])["expected_income_coming_3m"].agg(["mean","count"])
temp=nBookedNightinNigh[nBookedNightinNigh["count"]>11]["mean"].sort_values(ascending=False).head(5)
fig = px.bar(x = list(map(str,list(temp.keys()))), y = temp.values, title = "The highest Expected Income average of listings for every (Room Type, Bedrooms) combination", labels={
    "y": "average income (next 3 months)",
    "x": "(Room Type, Bedrooms)"
})
fig.show()

Listings that are an entire home or apartment are expected to have the highest average income for the next three months, and entire homes or apartments with seven bedrooms are expected to have the highest average of all, at 65.5K.

Which (Bedrooms, Room Type) combinations have the highest average minimum income for the past 12 months?

Code

nBookedNightinNigh=romeListings.groupby(["room_type","bedrooms"])["min_income_past_12m"].agg(["mean","count"])
temp=nBookedNightinNigh[nBookedNightinNigh["count"]>11]["mean"].sort_values(ascending=False).head(5)
fig = px.bar(x = list(map(str,list(temp.keys()))), y = temp.values, title = "The highest income average of listings for every (Room Type, Number of Bedrooms) combination", labels={
    "y": "average minimum income of listings (past 12 months)",
    "x": "(Room Type, Number of Bedrooms)"
})
fig.show()

From the figure above, it seems that listings that are an entire home or apartment also had the highest average minimum income for the past year, and entire homes or apartments with seven bedrooms had the highest average of all, at 7.8K.

Code

data = romeListings.groupby("bedrooms")["expected_income_coming_3m"].agg(["mean", "count"])
data = data[data["count"]>11]["mean"].sort_values(ascending=False).head(10)
fig = px.bar(x = data.index, y = data, title = "The highest Expected Income average of listings for every Bedrooms count", labels={
    "y": "average income (next 3 months)",
    "x": "Number of Bedrooms"
})
fig.show()

We can see that listings with seven bedrooms are expected to have the highest average income for the next three months, followed by listings with 8, 6 and 5 bedrooms.

Conclusion

To increase the profitability of your investment in Rome:

Have a verification mark.
Have a profile picture.
Invest in the top neighbourhood: ‘I Centro Storico’.
Invest in the room type Entire home/apt in ‘I Centro Storico’, with 7 bedrooms.
Try not to exceed the average prices, as doing so might lead to poor value scores.
Try to respond as quickly as possible.
Provide instant booking.
Include the following amenities: WiFi, Hair Dryer, …

Challenges

In this project we faced some challenges. Here are some of them:

Choosing a dataset based on cities.
Working with a dataset of more than 70 columns.
Cleaning the amenities column into discrete values.
Using a world map library for the first time.
Language and naming difficulties.
Estimating occupancy and income.

Sources

http://insideairbnb.com/get-the-data/
http://insideairbnb.com/rome
https://python-visualization.github.io/folium/