# Boston AirBnB Analysis

First deliverable for the Udacity Data Scientist Nanodegree

Have a look at the code on my GitHub: GitHub

Libraries: numpy, pandas, matplotlib, seaborn, plotly, sklearn

## Brief Overview:

### Goal

- Analyze a dataset and write a blog post about it
- Ask questions and answer them through visualizations

### Tools

- NumPy and Pandas to analyze data
- Matplotlib, Seaborn and Plotly to create visualizations
- Scikit-learn for a linear regression (to see the effect of each feature on the price)

### Results

- Blog post on Medium
- Best travel dates: between January and April on a Monday, Tuesday or Wednesday
- Worst travel day: date of the Boston Marathon
- Best location: Roxbury is affordable and close to Downtown

### Challenges

- Creating an interactive map with every AirBnB plotted on it, including meta information

### What did I learn?

- Plotting geographic data (longitude and latitude) with Plotly for interactivity
- Reshaping and arranging data into the format each plot needs

## Introduction

The goal of this project was to write a blog post about a data analysis. This involved coming up with questions
about a dataset and then editing the dataset to answer the questions. The final step was then to write the answers
in a readable report for a non-technical audience. The article can be accessed **here**.

I decided to analyze the AirBnB dataset of Boston from the perspective of a traveler who wants to stay in Boston as
cheaply as possible. **Link** to dataset.

## What did I do?

First, I looked at the dataset. Of its three files, I used two:

- listings.csv: full description of the listing
- calendar.csv: the availability of each listing for each day, plus the corresponding price when the listing is available

After assessing the data, I converted columns that weren't in the right data format and removed outliers, based on the respective histograms. After that, the data was ready to be used for the questions. Below I briefly describe what I did in a more technical way. If you want to read more about the interpretations, please read the Medium article.
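The cleaning step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes that prices are stored as strings such as `$1,250.00`, and the helper names and the cutoff value are hypothetical (a cutoff like this would be read off the price histogram).

```python
import pandas as pd


def clean_price(series: pd.Series) -> pd.Series:
    """Convert price strings such as "$1,250.00" to floats (assumed format)."""
    return (
        series.astype(str)
        .str.replace(r"[$,]", "", regex=True)  # drop currency sign and thousands separator
        .pipe(pd.to_numeric, errors="coerce")  # anything unparsable becomes NaN
    )


def drop_price_outliers(df: pd.DataFrame, max_price: float) -> pd.DataFrame:
    """Keep rows priced at or below a cutoff chosen by eye from the histogram."""
    return df[df["price"] <= max_price]


# Toy rows for illustration only; these are not values from the Boston dataset.
listings = pd.DataFrame(
    {"id": [1, 2, 3, 4], "price": ["$85.00", "$120.00", "$1,250.00", "$4,000.00"]}
)
listings["price"] = clean_price(listings["price"])
cleaned = drop_price_outliers(listings, max_price=1500)  # hypothetical cutoff
```

Calling `listings["price"].hist(bins=50)` before choosing `max_price` gives the histogram the cutoff is based on.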

**Question 1: What is the availability over the year and how does the price develop?**

For this question, I plotted the number of available AirBnBs against the average price per day. The average price
also shows a recurring pattern (seasonality).
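A possible way to build the two daily series is sketched below. The column names (`listing_id`, `date`, `available`, `price`) and the `"t"`/`"f"` availability flags are assumptions about calendar.csv, and `daily_summary` is a hypothetical helper.

```python
import pandas as pd

# Toy rows for illustration only; column names and flags are assumed.
calendar = pd.DataFrame(
    {
        "listing_id": [1, 2, 1, 2],
        "date": ["2017-01-02", "2017-01-02", "2017-01-03", "2017-01-03"],
        "available": ["t", "t", "t", "f"],
        "price": [100.0, 200.0, 120.0, None],
    }
)
calendar["date"] = pd.to_datetime(calendar["date"])


def daily_summary(calendar: pd.DataFrame) -> pd.DataFrame:
    """Count available listings and average their price for each day."""
    open_days = calendar[calendar["available"] == "t"]
    return open_days.groupby("date").agg(
        available_listings=("listing_id", "nunique"),
        mean_price=("price", "mean"),
    )


daily = daily_summary(calendar)

# Averaging by weekday is one way to make a weekly pattern visible.
by_weekday = daily.groupby(daily.index.day_name())["mean_price"].mean()
```

Plotting `daily["available_listings"]` and `daily["mean_price"]` on two y-axes over the same date axis gives the availability-versus-price view described above.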

**Question 2: Where are the BnBs in Boston and how expensive is each of them?**

An interactive visualization with Plotly: I plotted every AirBnB listing on a map and color-coded it by price.
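A map like this could be built along the following lines. This is a sketch under assumptions: the column names (`name`, `latitude`, `longitude`, `price`) and both helper functions are hypothetical, and `plotly.express.scatter_mapbox` is just one of several Plotly functions that can draw such a map.

```python
import pandas as pd


def prepare_map_frame(listings: pd.DataFrame) -> pd.DataFrame:
    """Drop rows that cannot be placed on the map and build a hover label."""
    frame = listings.dropna(subset=["latitude", "longitude", "price"]).copy()
    nightly = frame["price"].round(0).astype(int).astype(str)
    frame["hover"] = frame["name"].astype(str) + " ($" + nightly + "/night)"
    return frame


def build_price_map(frame: pd.DataFrame):
    """Scatter every listing on an OpenStreetMap base layer, colored by price."""
    # Imported here so the data preparation above also works without Plotly.
    import plotly.express as px

    return px.scatter_mapbox(
        frame,
        lat="latitude",
        lon="longitude",
        color="price",
        hover_name="hover",
        zoom=11,
        mapbox_style="open-street-map",  # this style needs no Mapbox token
    )


# Toy rows for illustration only; names, coordinates and prices are made up.
listings = pd.DataFrame(
    {
        "name": ["Cozy room", "Harbor loft", "No coordinates"],
        "latitude": [42.33, 42.36, None],
        "longitude": [-71.09, -71.05, -71.06],
        "price": [85.0, 240.0, 99.0],
    }
)
map_frame = prepare_map_frame(listings)
# fig = build_price_map(map_frame)
# fig.show()
```

Keeping the data preparation separate from the plotting call makes it easy to check the hover labels before rendering the map.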

**Question 3: Which neighborhoods are the most expensive?**

A concise visualization that shows how each neighborhood's average price differs from the overall average price.
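The underlying numbers for such a chart can be computed as in this sketch. The `neighbourhood` column name and the helper are assumptions, and the areas and prices are placeholders rather than results from the analysis.

```python
import pandas as pd

# Placeholder rows for illustration only; not results from the Boston data.
listings = pd.DataFrame(
    {
        "neighbourhood": ["Area A", "Area A", "Area B", "Area B", "Area C"],
        "price": [70.0, 90.0, 220.0, 260.0, 210.0],
    }
)


def price_gap_by_neighbourhood(listings: pd.DataFrame) -> pd.Series:
    """Mean price per neighbourhood minus the overall mean price."""
    overall_mean = listings["price"].mean()
    area_mean = listings.groupby("neighbourhood")["price"].mean()
    return (area_mean - overall_mean).sort_values(ascending=False)


gap = price_gap_by_neighbourhood(listings)
```

A horizontal bar chart of `gap` (for example `gap.plot.barh()`) then shows which neighborhoods sit above or below the average.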

**Question 4: What features influence the price of a BnB?**

To answer this question, I needed to clean the data further so I could use it in a multiple regression, get the weight
of each feature of an AirBnB (based on the listings) and order the features by importance.

- Dealt with missing values
- Checked correlations
- Looked at the relationship between categorical features and price
- Feature engineering
- Created dummy variables
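The steps above can be sketched as follows. The feature names, the median imputation and the standardization are assumptions made for this illustration (standardizing is one common way to make coefficient sizes comparable), and `fit_price_model` is a hypothetical helper, not the project's code.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy rows for illustration only; not values from the Boston dataset.
listings = pd.DataFrame(
    {
        "accommodates": [1, 2, 4, 2, 6, 3],
        "bathrooms": [1.0, 1.0, 2.0, None, 2.0, 1.0],
        "room_type": [
            "Private room", "Private room", "Entire home/apt",
            "Entire home/apt", "Entire home/apt", "Private room",
        ],
        "price": [60.0, 75.0, 210.0, 150.0, 320.0, 90.0],
    }
)


def fit_price_model(listings, numeric, categorical, target="price"):
    """Fit a linear regression and return coefficients ordered by magnitude."""
    features = listings[numeric + categorical].copy()
    # Missing numeric values: filled with the column median (an assumption).
    features[numeric] = features[numeric].fillna(features[numeric].median())
    # Categorical columns become dummy variables; one level is dropped as baseline.
    features = pd.get_dummies(
        features, columns=categorical, drop_first=True, dtype=float
    )
    # Standardize so coefficient sizes are comparable across features.
    # Assumes no column is constant (a zero standard deviation would give NaN).
    features = (features - features.mean()) / features.std()
    model = LinearRegression().fit(features, listings[target])
    weights = pd.Series(model.coef_, index=features.columns)
    order = weights.abs().sort_values(ascending=False).index
    return weights.reindex(order)


weights = fit_price_model(
    listings, numeric=["accommodates", "bathrooms"], categorical=["room_type"]
)
```

Each entry in `weights` is the change in price associated with a one-standard-deviation change in that feature, holding the other features fixed.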

**Question 5: How do amenities impact the price?**

For this question, I looked at one specific feature: the amenities of an AirBnB. To answer the question, I used the
weights calculated in Question 4.
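Turning the amenities into regression features might look like the sketch below. It assumes the raw column holds strings such as `{TV,"Cable TV",Internet}`, and both helpers are hypothetical. The comma split is deliberately naive and would break on amenity names that contain commas.

```python
import pandas as pd


def parse_amenities(raw: str) -> list:
    """Split a string like '{TV,"Cable TV",Internet}' into amenity names."""
    items = raw.strip("{}").split(",")
    return [item.strip().strip('"') for item in items if item.strip()]


def amenity_dummies(series: pd.Series) -> pd.DataFrame:
    """One 0/1 column per amenity; listings without amenities get all zeros."""
    parsed = series.fillna("").map(parse_amenities)
    return parsed.str.join("|").str.get_dummies(sep="|")


def select_amenity_weights(weights: pd.Series, amenity_columns) -> pd.Series:
    """Keep only the regression weights that belong to amenity columns."""
    return weights[weights.index.isin(amenity_columns)]


# Toy rows for illustration only; the raw string format is an assumption.
amenities = pd.Series(['{TV,"Cable TV",Internet}', "{Internet,Kitchen}", None])
dummies = amenity_dummies(amenities)
```

Once the dummy columns are part of the regression features, `select_amenity_weights` picks out their coefficients from the fitted weights so the amenities can be compared with each other.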