NUTSHELL:
It was a data visualization project commissioned by a research group at a library.
Prior to us coming in:
- They took a bunch of materials related to some old English author.
- Digitized those materials via OCR
- Hand-corrected the OCR output, because OCR damn near always messes up (at least any time I've used the software)
- Then housed the data in an open-source library-science system called Omeka
- Added metadata
- Exported the data to a CSV file
- Created one or two proof-of-concept visualizations in-house that reference those CSV files
Tasks we had to perform (3 parts):
- Add functionality to pre-existing visualizations
- Create new visualizations (the part I worked on)
- Add functionality to the Omeka backend administration
New Visualizations:
I used the vis.js library to create an interactive timeline to visualize the data. React tables were also used, but I didn't work on that part.
I was working on adding tables using the danfo.js library (similar to pandas DataFrames; see "Creating a DataFrame" in the Danfo.js docs).
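To give an idea, here's a minimal sketch of that table work. The CSV path and the div id are made up, and note that danfo.js 1.x names the loader readCSV while older 0.x builds called it read_csv:

```js
// Minimal sketch of loading the exported CSV into a danfo.js DataFrame.
// The CSV path and the "table" div id are assumptions for illustration.
import * as dfd from "danfojs";

async function loadItemsTable() {
  // Read the CSV export straight into a DataFrame (like pandas.read_csv)
  const df = await dfd.readCSV("/exports/items.csv");

  // Quick sanity check of the first few rows in the console
  df.head().print();

  // In the browser build, the DataFrame can be rendered as an HTML table
  df.plot("table").table();

  return df;
}

loadItemsTable();
```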
I showed everybody mockups of the timeline and table system I was going to use during the 1st Cycle. Then, halfway through the 2nd Cycle, one of the guys on the library's internal team came in with React tables, which undercut what I was working on (less work for me, though). So I just focused on the vis.js timeline.
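Here's roughly what the timeline setup looked like with vis-timeline. The record fields used here (title, date) are placeholders for whatever the actual Omeka metadata columns were:

```js
// Rough sketch of the vis.js timeline (vis-timeline standalone build).
import { Timeline, DataSet } from "vis-timeline/standalone";

function buildTimeline(records) {
  // vis-timeline wants items shaped like { id, content, start }
  const items = new DataSet(
    records.map((r, i) => ({
      id: i,
      content: r.title,        // label drawn on the timeline
      start: new Date(r.date), // date from the item's metadata
    }))
  );

  const container = document.getElementById("timeline");
  const options = {
    zoomable: true,   // scroll/pinch to zoom through the date range
    selectable: true, // click an item to select it
  };

  return new Timeline(container, items, options);
}
```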
The only real hangup was that the Omeka backend API pulls take around 12 seconds to aggregate/relay the data. As a result, when you start up the visualization you have to wait out that 12 seconds.
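For a sense of the shape of that pull, something like the following. I'm assuming the Omeka S REST API here (/api/items); the project's actual endpoint may have differed, and the base URL is made up:

```js
// Rough shape of the pull, with a timer around it.
async function pullItems(baseUrl) {
  const t0 = performance.now();

  // A full pull would also walk the paginated pages; one page shown for brevity
  const res = await fetch(`${baseUrl}/api/items?per_page=100`);
  const items = await res.json();

  const seconds = ((performance.now() - t0) / 1000).toFixed(1);
  console.log(`Pulled ${items.length} items in ${seconds}s`);

  return items;
}
```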
I was looking to get around that with the following approach (there's a rough sketch after the list)...
- Do a regular API pull and save the data to local browser storage on the first run of the visualization
- Pull from local browser storage on subsequent runs of the visualization for instant startup
- While the visualization is active, run a web worker in the background to perform an API pull of up-to-date data
- Save that up-to-date data to local browser storage
- Rinse and repeat steps 2-4 on all later visits to the visualization ... steps 1-4 if browser storage is cleared
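A bare-bones sketch of steps 1 and 2. The cache key is arbitrary, and pullItems() stands in for the slow API pull:

```js
// Cache-first loading: use localStorage if we have it, otherwise do the
// slow pull once and save it. Key name and pullItems() are assumptions.
const CACHE_KEY = "viz-items";

async function getItems() {
  const cached = localStorage.getItem(CACHE_KEY);
  if (cached) {
    // Step 2: later visits start instantly from the cached copy
    return JSON.parse(cached);
  }

  // Step 1: first visit (or cleared storage) - eat the ~12s pull once
  const items = await pullItems("https://library.example.org");
  localStorage.setItem(CACHE_KEY, JSON.stringify(items));
  return items;
}
```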
The catch was that web workers can't access local/session storage in the browser.
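The usual way around that restriction, and roughly what I was sketching out, is to let the worker do the fetching and postMessage the result back so the main thread does the storage write. Something like this (file name, endpoint, and key are made up):

```js
// refresh-worker.js - steps 3-4: pull fresh data in the background.
// The worker can fetch, but can't write localStorage itself.
self.onmessage = async () => {
  const res = await fetch("https://library.example.org/api/items");
  self.postMessage(await res.json());
};
```

And on the main thread, while the visualization is up:

```js
// Main thread: receive the fresh data, do the localStorage write the
// worker can't do, then shut the worker down.
const worker = new Worker("refresh-worker.js");
worker.onmessage = (e) => {
  localStorage.setItem("viz-items", JSON.stringify(e.data));
  worker.terminate();
};
worker.postMessage("refresh"); // kick off the background pull
```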
While I was working on a workaround, they decided to just run a periodic cron job that creates an updated CSV file with all the data needed for the visualizations.