Chapter 2 Proposal

2.1 Research topic

The research topic is about water levels in several locations in the US. Specifically, we are interested in how the water level changes over time in those locations as well as how the water levels and changes in water levels compare between locations. This is important because with this information about the different water level communities can gauge their risk of flooding and better plan for the changing sea levels. In fact many coastal communities are already seeing the effects of the changing sea levels, with more extreme water levels during storms and more frequent tidal flooding.

In particular, the states we chose to focus on were The Battery, NY; Virginia Key, Biscayne Bay, FL; Galveston Pier 21, TX; Monterey, CA; Prudhoe Bay, AK; and Honolulu, HI. We chose a location in New York, as we are currently living here and thus, the changing water levels here will have an impact on us. We also chose locations in Florida, California and New York because they are among the top ten states with the highest water area. Similarly, we chose Texas because it is facing a water shortage. We also chose Arkansas because oftentimes have to deal with drought and in fact, currently, western Arkansas is now dealing with drought. We chose Hawaii as rising sea levels is a major problem for the island as it will cause chronic flooding for their major roads, cultural sites and make much of their land unusable.

2.2 Data availability

During the process of identifying data sources, we wanted to ensure that the data was not aggregated, cleaning the data is manageable, and would be able to answer any business questions through visualizations.

Initially, we were highly interested in an Alzheimer’s data set that appeared to be able to answer questions similar to “how does race affect the frequency Alzheimer’s patients?”, “does gender affect Alzheimer’s?”, and “are there specific regions in the country that have a higher number of cases of Alzheimer’s?”. Upon further investigation, we discovered that the data set structure was highly aggregated and the structure is very difficult to work with. The data was organized by each row corresponding to multiple stratifications. For example, one row corresponded to a specific question and the mean number of days for individuals that are Hispanic, female, and 50-64 years old. Overall, we determined the Alzheimer’s data set to be fairly complex and would not be reasonable to structure it in a tidy way, given the current structure and aggregations made. Therefore, we decided to change direction for our project and do a deeper dive into the current data structure and ensure its usability.

The National Oceanic and Atmospheric Administration (NOAA) gathers information on numerous stations across the United States to monitor tides and currents. The data is collected and recorded through sensors placed on each of the stations. Throughout the United States, sensors are recording water level (WLLW), air temperature, water temperature, barometric pressure, winds, relative humidity, and visibility. For our project, we will be specifically looking at the recorded water level for 6 different stations in different regions and states throughout the United States; the stations we will be analyzing are The Battery, NY; Virginia Key, Biscayne Bay, FL; Galveston Pier 21, TX; Monterey, CA; Prudhoe Bay, AK; and Honolulu, HI. For each station, data for the water level is collected in 6-minute intervals. The page the data is on has the option to export it to a CSV file, and can be exported with specified data ranges and time intervals, which we plan to utilize.

The current format of the data will be exported by the monthly generalizations of the data, in which case the data shows the month, the highest tide per month, the Mean Higher High Water (MHHW), the Mean High Water (MHW), the Mean Sea Level (MSL), the Mean Tide Level (MTL), the Mean Low Water (MLW), the Mean Lower Low Water (MLLW), and the lowest tide. As each station is recorded separately, we will need to select the same parameters, time intervals, and data ranges for each station in separate CSV files, then merge all 6 station data sets into a single data set with an additional column to indicate the station location.

Link to all US stations: https://tidesandcurrents.noaa.gov/map/

Link to the 6 stations and station information used in analysis: