Monday, October 27, 2014

NYC Citi Bike System Data Analysis

NYC Citi Bike is a public bicycle sharing system that serves parts of Manhattan and Brooklyn, two of the most populous boroughs of New York City. The Citi Bike system data is publicly available here. The data covers the past 12 months from the current date.

Citi Bike Trip data include:
  • Trip Duration (seconds)
  • Start Time and Date
  • Stop Time and Date
  • Start Station Name
  • End Station Name
  • Station ID
  • Station Latitude/Longitude
  • Bike ID
  • User Type (Customer = 24-hour pass or 7-day pass user; Subscriber = Annual Member)
  • Gender (0 = unknown; 1 = male; 2 = female)
  • Year of Birth
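
As a small illustration of how the coded fields above can be made readable, here is a minimal sketch on made-up rows. The column names (tripduration, usertype, gender) are an assumption about how the CSV headers look after read.csv(), and the values are invented for the example.

```r
# Toy rows standing in for the trip data (values are made up;
# column names are assumed, not taken from the actual files)
trips <- data.frame(
  tripduration = c(600, 1800, 90),   # seconds
  usertype     = c("Subscriber", "Customer", "Subscriber"),
  gender       = c(1, 0, 2),         # 0 = unknown; 1 = male; 2 = female
  stringsAsFactors = FALSE
)

# Trip duration is recorded in seconds; minutes are easier to read
trips$duration.min <- trips$tripduration / 60

# Replace the numeric gender codes with labels
trips$gender.label <- factor(trips$gender,
                             levels = c(0, 1, 2),
                             labels = c("unknown", "male", "female"))

print(trips)
```
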
Since the location of each Citi Bike station is an important variable and it was not available here, I used Google to find the station locations in NYC. Citi Bike station location data is available here.

I am using R to analyze 3 months (December 2013 to February 2014) of Citi Bike trip data. The R code is shown below.

#############################################################################################
# Set working directory
setwd("/Users/Ankoor/Desktop/NYCBS")

# Read data
dec <- read.csv("2013-12 - Citi Bike trip data.csv", stringsAsFactors = FALSE)
jan <- read.csv("2014-01 - Citi Bike trip data.csv", stringsAsFactors = FALSE)
feb <- read.csv("2014-02 - Citi Bike trip data.csv", stringsAsFactors = FALSE)


# Merge data (Adding rows)
trip <- rbind(dec, jan, feb)
rm(dec, jan, feb)

# Bike station data
bikeStn <- read.csv("citibike.csv", stringsAsFactors = FALSE)

# Drop unnecessary columns from bikeStn
names(bikeStn) # Get column names in bikeStn
drop <- c("name", "streetAddress", "streetAddress.address2", "latitude", "longitude",
          "loc", "entityTitle", "X.context", "X.type", "X.id")
bikeStn <- bikeStn[, !(names(bikeStn) %in% drop)]

# Rename columns in bikeStn for the start-station merge
names(bikeStn) <- c("start.station.id", "start.totDocks", "start.hood", "start.zip")

# Merge data for start station (many to one)
trip <- merge(trip, bikeStn, by = c("start.station.id"))

# Rename columns in bikeStn for the end-station merge
names(bikeStn) <- c("end.station.id", "end.totDocks", "end.hood", "end.zip")

# Merge data for end station (many to one)
trip <- merge(trip, bikeStn, by = c("end.station.id"))
rm(bikeStn, drop)

# Plot - 1: NYC Citi Bike Route Popularity 

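# Count trips for each (start neighborhood, end neighborhood) pair
hood.trips <- table(trip$start.hood, trip$end.hood)

# Convert the table to long format: the first column holds the start
# neighborhood, the second the end neighborhood, the third the trip count
temp <- data.frame(hood.trips)
names(temp) <- c("startHood", "endHood", "Popularity")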

# ggplot2 library
library(ggplot2)

# Plot: heat map of trip counts by start and end neighborhood, saved as a PDF
pdf("NYC Citi Bike Route Popularity.pdf", width = 11, height = 11)
p <- ggplot(temp, aes(startHood, endHood)) +
        geom_tile(aes(fill = Popularity), color = "black") +
        scale_fill_gradient(low = "white", high = "blue") +
        theme(axis.text.x = element_text(angle = 90)) +
        xlab("Starting Neighborhood") + ylab("Ending Neighborhood") +
        ggtitle("NYC Citi Bike Route Popularity")
print(p) # print explicitly so the plot is drawn even when the script is run with source()
dev.off()

#############################################################################################
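
To make the two merge steps and the route count easier to follow, here is a minimal sketch of the same idea on made-up data. The station IDs, the neighborhood names, and the object names stn, start.stn, end.stn and routes are assumptions for illustration only; they do not come from the Citi Bike files. Note that merge() keeps only matching rows by default, so a trip whose station ID is missing from the station table is dropped.

```r
# Toy trips (made-up station IDs); station 9 is deliberately absent
# from the station table below
trip <- data.frame(
  start.station.id = c(1, 1, 2, 3, 9),
  end.station.id   = c(2, 3, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Toy station table (hypothetical IDs and neighborhoods)
stn <- data.frame(
  id   = c(1, 2, 3),
  hood = c("Chelsea", "SoHo", "Chelsea"),
  stringsAsFactors = FALSE
)

# Rename the station columns, then join on the start station (many to one)
start.stn <- setNames(stn, c("start.station.id", "start.hood"))
trip <- merge(trip, start.stn, by = "start.station.id")

# Same again for the end station
end.stn <- setNames(stn, c("end.station.id", "end.hood"))
trip <- merge(trip, end.stn, by = "end.station.id")

# merge() is an inner join by default, so the trip from station 9 is gone
n.kept <- nrow(trip)

# Count trips per (start, end) neighborhood pair and convert to long format
hood.trips <- table(startHood = trip$start.hood, endHood = trip$end.hood)
routes <- as.data.frame(hood.trips, responseName = "Popularity",
                        stringsAsFactors = FALSE)
print(routes)
```

If dropped trips matter for the analysis, passing all.x = TRUE to merge() keeps them, with NA in the neighborhood columns.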

The R code for Plot 1 writes the heat map to the file NYC Citi Bike Route Popularity.pdf. Here is a screenshot of the plot:



