Note: The package name for this library has changed since this post was initially written. Links to statistics still point to the old name to preserve the post as-written, however the current version of the package can be found at https://github.com/MoralCode/parse-opening-hours
For the past few months, I have been involved with a few COVID relief projects after some friends and I got together to create vacfind.org and try to help out wherever we could. Even as vaccine supply exceeds demand and the general chaos seems to be winding down, I couldn’t resist the opportunity to make some easy pull requests to ingest a bunch of data for the comprehensive nationwide map of vaccination sites at https://vaccinatethestates.com.
After some brief conversations with the VaccinateCA team and a few small pull requests to get a feel for their workflow and how the data ingest pipeline worked, I found myself working on contributing an entire data source from GISCorps containing nearly 30,000 vaccination sites.
While working on the Python script to convert the data into the format the ingest system expected, I discovered a field listing the operating hours of each vaccination site. This field contained data such as:
Monday - Friday 8:00 am - 2:00 pm
Saturdays 9:00 am - 12:00 pm
Mon- Fri 9am-4pm
8:30 am - 3:15 pm
M-F 8 am to 4:30 pm
8 am to 8 pm daily
Every Day 8am to 8pm
8:00 a.m. - 4:30 p.m. Monday through Friday
by appointment only
9am-5pm, Urgent hours available
Because of how unstructured this data is, it seems this field was meant to be displayed directly to users rather than processed by a machine.
Despite that, I went looking for a library that could parse it. A quick GitHub search only turned up parsers for the OpenStreetMap opening_hours format, which would likely not work here.
Be the change
With no existing solution easily available, I figured it would be useful to the broader Python community if I made my own. Starting with some of the more straightforward strings from the GISCorps data, I began building up sets of expressions for the various time formats and days of the week using the
pyparsing library and whatever I could remember about context-free grammars from my Computer Science Theory course.
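As a rough illustration of what building up such expressions can look like, here is a minimal pyparsing sketch for day ranges and time ranges. The names and structure below are my own invention for this example, not the library’s actual grammar:

```python
import pyparsing as pp

# Day names, full and abbreviated, matched case-insensitively.
day = pp.oneOf(
    "monday tuesday wednesday thursday friday saturday sunday "
    "mon tue wed thu fri sat sun",
    caseless=True,
)

# A clock time like "8", "8:30", "9am", or "4:30 pm".
am_pm = pp.oneOf("am pm", caseless=True)
clock = pp.Combine(
    pp.Word(pp.nums, max=2) + pp.Optional(":" + pp.Word(pp.nums, exact=2))
) + pp.Optional(am_pm)

# Separators seen in the data: "-", "to", "through".
sep = pp.Suppress(pp.oneOf("- to through"))

# "Mon - Fri" followed by "9am - 4:30pm".
day_range = day("start_day") + pp.Optional(sep + day("end_day"))
time_range = pp.Group(clock)("opens") + sep + pp.Group(clock)("closes")
hours = day_range + time_range

result = hours.parseString("Mon - Fri 9am - 4:30pm")
```

One nice property of `oneOf` is that it tries longer alternatives first, so "monday" wins over "mon" without any manual ordering.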
For a library like this, which simply takes in strings and other primitive data types and spits out some JSON, it was also incredibly easy to write unit tests as I went. While I normally find writing tests to be a slog, these were quite simple and pleasant to write. I was able to take an almost-test-driven approach: any new case I came up with got added to the unit tests first, and then I ran the tests to see whether the code handled it.
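To sketch that workflow, here is a toy version with a stand-in parser. `parse_hours` and its output shape are hypothetical, not the library’s actual API; the point is that each new real-world string becomes a test case first, and the parser is extended until it passes:

```python
import re
import unittest

def parse_hours(text):
    """Hypothetical stand-in parser: handles only "Mon-Fri 9am-4pm"-style
    strings and returns a JSON-like list of per-day entries."""
    m = re.match(
        r"(?i)\s*(mon|m)\s*-\s*(fri|f)\s+(\d{1,2})\s*(am|pm)\s*-\s*(\d{1,2})\s*(am|pm)",
        text,
    )
    if not m:
        return []

    def to_24h(hour, meridiem):
        return int(hour) % 12 + (12 if meridiem.lower() == "pm" else 0)

    opens = "{:02d}:00".format(to_24h(m.group(3), m.group(4)))
    closes = "{:02d}:00".format(to_24h(m.group(5), m.group(6)))
    return [
        {"day": day, "opens": opens, "closes": closes}
        for day in ("monday", "tuesday", "wednesday", "thursday", "friday")
    ]

class TestParseHours(unittest.TestCase):
    # New strings from the source data get a test here first; the parser
    # is then extended until the test passes.
    def test_abbreviated_day_range(self):
        result = parse_hours("Mon- Fri 9am-4pm")
        self.assertEqual(len(result), 5)
        self.assertEqual(
            result[0], {"day": "monday", "opens": "09:00", "closes": "16:00"}
        )

    def test_unsupported_format_returns_empty(self):
        # Not handled by this sketch yet.
        self.assertEqual(parse_hours("8 am to 8 pm daily"), [])
```

Running `python -m unittest` after adding each new case made the feedback loop very tight.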
Since I released an initial “MVP” version to PyPI on May 3rd, it has gotten about 211 downloads according to PePy. While it is super awesome that people are (likely) downloading and using this library, I can’t help but notice the similarity to my Jekyll encrypted web payments library on RubyGems, whose downloads also shot up to about 200 in the first few days before completely stagnating.
Given the timing of the downloads on both libraries, I suspect the cause is just a bunch of bots constantly watching for new packages and mirroring them to other repositories, or who-knows-what. While bots are the most likely explanation, being able to say my code is used by real developers is a much happier outlook, so at this point I’ll split the difference and say I pretty much have no idea what’s going on with these download counts.
Overall, I’m very proud of how well this library is already able to parse many new test cases that I come up with. I’m looking forward to continuing to work on it and expand it to handle more real-world formats for operating hours soon.