How ContactRelief monitors the COVID-19 Infectious Outbreak
Learn how ContactRelief monitors the COVID-19 infectious outbreak with this behind-the-scenes look at the web Extract-Transform-Load (ETL) process developed by ContactRelief Data Analyst Jordan Chandler.
Monday, June 22, 2020 7:08:14 PM +00:00
ContactRelief sources its COVID-19 case data from authoritative, original sources. Every day we monitor the website of each state's Health Department for county-level COVID-19 case numbers. If a state does not report county-level data, we monitor the County Health Department website of each county in that state instead.
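The source-selection rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ContactRelief's code: the `StateSource` record, the `sources_to_monitor` function, the placeholder state codes "AA" and "BB", and the example.org URLs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StateSource:
    """Hypothetical description of where one state publishes COVID-19 data."""
    state: str
    state_health_url: str
    reports_county_level: bool
    # Used only when the state does not publish county-level numbers.
    county_health_urls: List[str] = field(default_factory=list)


def sources_to_monitor(source: StateSource) -> List[str]:
    """Return the URLs to check for one state on a given day."""
    if source.reports_county_level:
        # The state site already breaks cases down by county.
        return [source.state_health_url]
    # Otherwise fall back to every county health department in the state.
    return list(source.county_health_urls)


# Example usage with placeholder states and URLs.
states = [
    StateSource("AA", "https://example.org/aa-health", True),
    StateSource("BB", "https://example.org/bb-health", False,
                ["https://example.org/bb-county-1", "https://example.org/bb-county-2"]),
]
for s in states:
    print(s.state, sources_to_monitor(s))
```

Keeping the county URLs on the state record means the daily job can iterate over states only and let each record decide where its numbers come from.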
The first step in the ETL process is scraping the data from the state (or county) Health Department websites and storing the resulting raw data in the cloud. We store each state's or county's data as a data "blob" in cloud storage. This storage is fast and inexpensive, and it is protected from disaster because it is automatically replicated to two or more data centers in different parts of the country.
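One way to organize the raw blobs is one blob per state (or county) per scrape date, so that each day's snapshot is kept separately. The sketch below assumes that naming scheme; `blob_name` and `InMemoryBlobStore` are hypothetical, and the in-memory class only stands in for a real cloud container, which would add the durability and replication described above.

```python
from datetime import date
from typing import Dict, List, Optional


def blob_name(state: str, scraped_on: date, county: Optional[str] = None) -> str:
    """Build a hypothetical blob name: one blob per state (or county) per day."""
    scope = county.strip().lower().replace(" ", "-") if county else "statewide"
    return f"covid/{state.upper()}/{scope}/{scraped_on.isoformat()}.html"


class InMemoryBlobStore:
    """Stand-in for a cloud blob container.

    A real cloud store also handles durability and geo-replication;
    this class only keeps bytes in a dictionary.
    """

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def upload(self, name: str, data: bytes) -> None:
        self._blobs[name] = data

    def download(self, name: str) -> bytes:
        return self._blobs[name]

    def list_blobs(self, prefix: str = "") -> List[str]:
        return sorted(n for n in self._blobs if n.startswith(prefix))


store = InMemoryBlobStore()
store.upload(blob_name("wa", date(2020, 6, 22)), b"<html>...</html>")
store.upload(blob_name("ny", date(2020, 6, 22), "St Lawrence"), b"<html>...</html>")
print(store.list_blobs("covid/NY/"))  # ['covid/NY/st-lawrence/2020-06-22.html']
```

Putting the state first in the name lets the transform step list everything for one state with a single prefix query.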
After the COVID-19 case data is written to blob storage, the transform and load stages of the ContactRelief ETL process begin. This stage does four things. It cleanses the data to make sure it is safe and free from formatting errors. It organizes the data by state and severity, because ContactRelief issues its alerts on that basis. It finds the latest matching alert, if any, so that alert can be concluded before the new alert is issued. Finally, it gets the root ID of that alert so the new alert can be added as the latest update to the root alert.
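The four steps above can be sketched in Python. This is an illustration, not the production ETL: the record types, the field names, and the assumption that each county row already carries a severity label are all hypothetical. Cleansing here simply drops rows with missing or non-numeric counts; the root ID is taken to be the matching alert's own ID when that alert has no parent.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional, Tuple


@dataclass
class CountyCount:
    state: str
    county: str
    cases: int
    severity: str  # hypothetical severity label assigned upstream


@dataclass
class ExistingAlert:
    alert_id: int
    root_id: Optional[int]  # None when this alert is itself the root
    state: str
    severity: str
    issued_at: datetime
    concluded: bool = False


def cleanse(raw_rows: List[dict]) -> List[CountyCount]:
    """Keep only well-formed rows, normalizing whitespace and numbers."""
    clean = []
    for row in raw_rows:
        state = " ".join(str(row.get("state", "")).split()).upper()
        county = " ".join(str(row.get("county", "")).split())
        severity = " ".join(str(row.get("severity", "")).split())
        try:
            cases = int(str(row.get("cases", "")).replace(",", "").strip())
        except ValueError:
            continue  # missing or non-numeric case count
        if not (state and county and severity) or cases < 0:
            continue
        clean.append(CountyCount(state, county, cases, severity))
    return clean


def group_by_state_and_severity(
    rows: List[CountyCount],
) -> Dict[Tuple[str, str], List[CountyCount]]:
    groups: Dict[Tuple[str, str], List[CountyCount]] = defaultdict(list)
    for r in rows:
        groups[(r.state, r.severity)].append(r)
    return dict(groups)


def prepare_update(alerts: List[ExistingAlert], state: str, severity: str) -> Optional[int]:
    """Conclude the latest open matching alert and return the root ID to attach to."""
    open_matches = [a for a in alerts
                    if a.state == state and a.severity == severity and not a.concluded]
    latest = max(open_matches, key=lambda a: a.issued_at, default=None)
    if latest is None:
        return None  # no prior alert: the new alert becomes a new root
    latest.concluded = True
    return latest.root_id if latest.root_id is not None else latest.alert_id


rows = cleanse([
    {"state": "wa", "county": " King ", "cases": "1,234", "severity": "Severe"},
    {"state": "WA", "county": "Pierce", "cases": "n/a", "severity": "Severe"},
])
groups = group_by_state_and_severity(rows)
alerts = [
    ExistingAlert(1, None, "WA", "Severe", datetime(2020, 6, 20)),
    ExistingAlert(2, 1, "WA", "Severe", datetime(2020, 6, 21)),
]
print(prepare_update(alerts, "WA", "Severe"))  # 1
```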
The county-level COVID-19 case data is used to generate a detailed description for the new alert to be issued. This description includes the reported number of cases for each county in the state. Once the description is ready, the alert is created in the ContactRelief database, starting with an Alert record.
Each alert specifies one or more regions that define its geographic scope, with one alert region per county listed in the state's COVID-19 case data. The county name in the case data is used to look up the county's Federal Information Processing Standard (FIPS) code. A county FIPS code is a five-digit number that uniquely identifies a county within the United States: the first two digits identify the state and the last three identify the county within it. An AlertRegion record is added for each county, with the county's FIPS code as its Name. This Name links the AlertRegion to the county's geometric shape data in the ContactRelief database. Because shape data for every county already exists, it does not need to be added when the COVID-19 data is loaded.
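A minimal sketch of the description and region-building step follows. It assumes a small hard-coded lookup of three Washington counties keyed by (state, county name). A real pipeline would load the complete county FIPS list published by the U.S. Census Bureau. The `Alert` and `AlertRegion` classes here are simplified, hypothetical stand-ins for the database records.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Tiny illustrative lookup keyed by (state, county name).
COUNTY_FIPS: Dict[Tuple[str, str], str] = {
    ("WA", "King"): "53033",
    ("WA", "Pierce"): "53053",
    ("WA", "Snohomish"): "53061",
}


@dataclass
class AlertRegion:
    name: str  # county FIPS code; matches the existing county shape record


@dataclass
class Alert:
    state: str
    description: str
    root_id: Optional[int] = None
    regions: List[AlertRegion] = field(default_factory=list)


def build_description(state: str, counts: Dict[str, int]) -> str:
    lines = [f"Reported COVID-19 cases in {state}:"]
    for county, cases in sorted(counts.items()):
        lines.append(f"  {county}: {cases:,}")
    return "\n".join(lines)


def build_alert(state: str, counts: Dict[str, int], root_id: Optional[int] = None) -> Alert:
    """Create an Alert with one AlertRegion per county, named by FIPS code."""
    alert = Alert(state, build_description(state, counts), root_id)
    for county in sorted(counts):
        fips = COUNTY_FIPS.get((state, county))
        if fips is None:
            # Fail loudly rather than silently dropping a county from the alert.
            raise KeyError(f"No FIPS code found for {county}, {state}")
        alert.regions.append(AlertRegion(fips))
    return alert


alert = build_alert("WA", {"Pierce": 300, "King": 1200}, root_id=1)
print(alert.description)
print([r.name for r in alert.regions])  # ['53033', '53053']
```

Raising on an unknown county name is a design choice. Spelling differences between a Health Department page and the lookup table then surface immediately instead of producing an alert that silently omits a county.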
We also add hypertext links to the Alert so that recipients can go directly to the state Health Department's website if they need more information. Each link is stored as an AlertLinks record attached to the alert.
The Entity-Relationship Diagram (ERD) above shows an abbreviated form of the Alert subsystem in the ContactRelief database, limited to the tables used in the COVID-19 ETL. An ERD shows entities (i.e., database tables) as boxes and the relationships between them as connecting lines; the markings on each line indicate how records in the tables relate (e.g., one-to-one or one-to-many). The actual ContactRelief Alert subsystem is more complex than shown, and the full ContactRelief ERD is more complex still.
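To make the one-to-many relationships concrete, here is a sketch of how the three tables named in this article (Alert, AlertRegion, and AlertLinks) might be declared, using Python's built-in `sqlite3` module. Only the table names come from the article. The columns, the self-referencing `RootId` column used to chain updates to a root alert, and the `insert_alert` helper are assumptions for illustration, not the real ContactRelief schema.

```python
import sqlite3
from typing import List, Optional

# Hypothetical columns; only the table names come from the article.
SCHEMA = """
CREATE TABLE Alert (
    AlertId     INTEGER PRIMARY KEY,
    RootId      INTEGER REFERENCES Alert(AlertId),  -- NULL for a root alert
    Severity    TEXT NOT NULL,
    Description TEXT NOT NULL,
    Concluded   INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE AlertRegion (
    AlertRegionId INTEGER PRIMARY KEY,
    AlertId       INTEGER NOT NULL REFERENCES Alert(AlertId),  -- many regions per alert
    Name          TEXT NOT NULL                                 -- county FIPS code
);
CREATE TABLE AlertLinks (
    AlertLinkId INTEGER PRIMARY KEY,
    AlertId     INTEGER NOT NULL REFERENCES Alert(AlertId),    -- many links per alert
    Url         TEXT NOT NULL
);
"""


def create_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def insert_alert(conn: sqlite3.Connection, severity: str, description: str,
                 fips_codes: List[str], urls: List[str],
                 root_id: Optional[int] = None) -> int:
    """Insert one Alert plus its AlertRegion and AlertLinks child rows."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO Alert (RootId, Severity, Description) VALUES (?, ?, ?)",
            (root_id, severity, description),
        )
        alert_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO AlertRegion (AlertId, Name) VALUES (?, ?)",
            [(alert_id, code) for code in fips_codes],
        )
        conn.executemany(
            "INSERT INTO AlertLinks (AlertId, Url) VALUES (?, ?)",
            [(alert_id, url) for url in urls],
        )
    return alert_id


conn = create_db()
root = insert_alert(conn, "Severe", "Initial alert", ["53033", "53053"],
                    ["https://example.org/state-health-dept"])
update = insert_alert(conn, "Severe", "Update", ["53033"], [], root_id=root)
```

Wrapping the parent and child inserts in one transaction means an alert is never left in the database without its regions and links.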
The end goal of the ContactRelief COVID-19 ETL process is to map the alert data by severity on the ContactRelief Disaster Decision Engine website. ContactRelief users define disaster monitoring policies that specify the types of alerts to monitor and the action to take when such an alert is detected. These actions become recommendations that ContactRelief issues to clients, who use them to guide their internal operations and their external communications with consumers.
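The policy-to-recommendation step can be sketched as a simple matching loop. This is a hypothetical model, not the Disaster Decision Engine's logic. It assumes each policy names an alert type and a minimum severity, and it borrows a four-level ordered scale (Minor, Moderate, Severe, Extreme) like the one in the Common Alerting Protocol. The client names and action text are placeholders.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed ordered severity scale, modeled on the Common Alerting Protocol levels.
SEVERITY_RANK = {"Minor": 0, "Moderate": 1, "Severe": 2, "Extreme": 3}


@dataclass
class Policy:
    client: str
    alert_type: str    # e.g. "COVID-19"
    min_severity: str  # act on alerts at or above this level
    action: str        # recommendation text sent to the client


@dataclass
class IssuedAlert:
    alert_type: str
    severity: str


def recommendations(alert: IssuedAlert, policies: List[Policy]) -> List[Tuple[str, str]]:
    """Return (client, action) pairs for every policy the alert triggers."""
    triggered = []
    for p in policies:
        if p.alert_type != alert.alert_type:
            continue
        if SEVERITY_RANK[alert.severity] >= SEVERITY_RANK[p.min_severity]:
            triggered.append((p.client, p.action))
    return triggered


policies = [
    Policy("Client A", "COVID-19", "Moderate", "Pause in-person events in affected counties"),
    Policy("Client B", "COVID-19", "Extreme", "Close regional offices"),
    Policy("Client C", "Wildfire", "Minor", "Notify field staff"),
]
alert = IssuedAlert("COVID-19", "Severe")
print(recommendations(alert, policies))
```

A "Severe" COVID-19 alert triggers only Client A's policy here: Client B waits for "Extreme", and Client C monitors a different alert type.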