A Massive PDF Conversion Project
Back in 2001, the Canadian government took a stance on digital accessibility issues. Lawmakers developed a standard for all government web content, naming it the Common Look and Feel (CLF).
This measure focused on many web content factors, including coding practices that supported digital accessibility. This was the start of a long, arduous journey to digital accessibility compliance.
By 2007, not a single federal agency had complied with this standard, according to a Canadian Treasury department audit. That same year, Donna Jodhan, a blind special-needs consultant, sued the Ottawa government for discrimination when she was unable to apply for a job on its website due to an inaccessible job application form. Jodhan, who has won four awards from IBM for technical initiatives, won her case in November 2010. The courts gave the government 15 months to make its website accessible, including any PDF documents housed there.
Fast forward to 2012. The Canadian government, specifically the CIO Branch of the Treasury Board, felt immense pressure to meet this court-mandated deadline. One thing was holding them back: a massive number of digitally inaccessible PDFs.
Ottawa Searches for a Solution
The Treasury Board had hoped these PDF files would be deprecated. Because that wasn’t the case, they approached several organizations, including Google, Microsoft, and Adobe, for a PDF remediation solution, without success. Government workers also didn’t have the expertise or resources to remediate the existing PDFs.
Ottawa officials had determined they would need to convert the PDF backlog into HTML format to make these files accessible. It became a time-consuming process, as workers were manually copying the text from each PDF, pasting it into an HTML editor, and using various other tools to style the result and upload it to their various web platforms.
Because HTML is innately accessible to people using assistive technology, this was a very workable solution in theory. But the backlog was large, and this manual process devoured an immense amount of time. Something needed to change.
In an attempt to streamline this massive project, the Treasury Board’s CIO approached Capital Technology Partners to evaluate and track down the never-ending backlog of legacy PDFs. At the time, Capital Technology Partners was a Google Search reseller. I worked there as a developer and Onix’s Jeff Telford worked in sales. The Treasury Board wanted to use the Google Search Appliance to help locate the PDF files across its various website properties.
We audited four of the Ottawa remediation teams to evaluate their workflow, software, and tools (which included 6-8 different PDF remediation tools), and the time spent on different categories of documents. The results of the audit were not what Ottawa officials wanted to hear. Based on optimistic numbers, we determined that it would take them 40 years to remediate just their PDF backlog. This was despite the fact that most of the Treasury Board remediation team members were software developers and/or accessibility experts.
We were asked to find a better solution. I told them I had an idea for a tool to convert the PDFs to HTML more efficiently.
Creation of a PDF Remediation Tool
Three months later, I went back to the Ottawa Treasury Board’s CIO with a working prototype tool to convert the PDF files into HTML. Today, this solution is known as Equidox.
To avoid installation, update, licensing and maintenance issues, we decided to provide cloud-based PDF remediation software. One of the biggest advantages of a cloud-based solution is that it enables multiple users to work on the same document. Our first Equidox version allowed our customers to work with the actual PDF, providing them with a rendition of the actual document. That interface enabled them to draw a “zone” over each of the document’s elements to identify, describe and style it. Once all the pages were complete, workers would then simply export the result to an HTML package that they could publish.
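To make the zone idea concrete, here is a minimal sketch of how zones drawn on a page might be turned into an HTML export. It is an illustration only, not Equidox's actual data model or code: the `Zone` class, its fields, the `zone_to_html` and `export_html` functions, and the image file naming are all hypothetical names assumed for this example.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Zone:
    """A hypothetical zone: one region a remediator drew on a page."""
    page: int        # 1-based page number
    order: int       # reading order within the page
    kind: str        # "heading", "text", or "image"
    content: str     # visible text, or alt text for an image
    level: int = 1   # heading level, used only when kind == "heading"


def zone_to_html(zone: Zone) -> str:
    """Map one zone to an HTML element, escaping its content."""
    text = escape(zone.content)
    if zone.kind == "heading":
        level = min(max(zone.level, 1), 6)  # clamp to h1..h6
        return f"<h{level}>{text}</h{level}>"
    if zone.kind == "image":
        # Assumed naming scheme for the cropped image file.
        src = f"page{zone.page}-zone{zone.order}.png"
        return f'<img src="{src}" alt="{text}">'
    return f"<p>{text}</p>"


def export_html(title: str, zones: list[Zone]) -> str:
    """Emit zones in reading order (page, then order) as one HTML page."""
    ordered = sorted(zones, key=lambda z: (z.page, z.order))
    body = "\n".join(zone_to_html(z) for z in ordered)
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        f'<head><meta charset="utf-8"><title>{escape(title)}</title></head>\n'
        f"<body>\n{body}\n</body>\n</html>"
    )


if __name__ == "__main__":
    zones = [
        Zone(page=1, order=2, kind="text", content="Fees & deadlines apply."),
        Zone(page=1, order=1, kind="heading", content="Job Application"),
        Zone(page=1, order=3, kind="image", content="Department logo"),
    ]
    print(export_html("Job Application", zones))
```

The point of the sketch is that each zone carries the three things a screen reader needs: what the element is, what it says (or its alt text), and where it falls in the reading order. Sorting by page and order before export is what lets zones be drawn in any sequence.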
While we were asked to create a specific tool to accomplish a specific purpose, we made sure we kept the tenets of accessibility top of mind. It was vital that the tool we created would provide a digital output that was not just technically accessible but also completely usable.
Ottawa Approves the Prototype; Equidox is Evaluated
The Treasury Board was pleased with our solution and encouraged us to finish developing the PDF remediation software. Six months later, we had a working application ready to deploy: Equidox.
We evaluated the new Equidox application by comparing three approaches: the Treasury Board’s accessibility experts manually copying and pasting content from PDF into HTML, the Board’s employees using Equidox, and our own team using Equidox.
Our report showed that the new software tool was anywhere from two to five times faster than the Ottawa remediators’ existing manual method of converting PDFs, depending on document complexity. Equidox has come a long way since that original prototype.
Equidox Continues to Improve and Win at PDF Remediation
In 2014, Onix acquired Capital Technology Partners and the Equidox software. With Onix’s support, we have continued to improve the application. Since that early version, the Equidox team has built up a great deal of specialized experience and knowledge about PDF remediation. We implemented document analysis that automatically detects a document’s elements, rather than requiring users to draw every zone manually. With every release, we have introduced new features. Not only does Equidox support simple document elements such as images and text, but it also supports links, lists, OCR, languages and many more features.
Perhaps the most time-saving feature is our Table Editor, which greatly simplifies table “tagging.” We also support annotations specific to screen readers, which allow users to add information and provide a better experience to document consumers. We estimate a person using Equidox can save as much as 85% of the time spent using other tools to remediate PDFs.
Anyone with a computer can use Equidox software after a simple training session. It doesn’t require highly qualified and expensive resources. Training is part of the license, and ongoing support is available to users.
One of the features our customers requested most was the ability to export back to an accessible PDF document. Since that feature was implemented two years ago, we have continued to improve the quality and usability of our PDF output. You can work on your PDF document from start to finish and provide an HTML, ePub or PDF rendition of your document. Depending on the quality of the source document, Equidox can also generate a clean PDF/UA file.
Introduction of AI
As we all know, there is no magic bullet in the accessibility world. No software or solution currently available will make an existing document accessible without human intervention. It’s still not possible, but Equidox is coming closer.
As our work continues, we keep making this solution better. We plan to announce our fifth major Equidox release this fall. This new version will include features we have been testing and refining for the past year, including part of our machine learning engine, which will save our customers even more time when remediating documents by further automating the tagging process.
The goal is not to be the magic bullet. We seek to continually simplify and automate the remediation process and eliminate repetitive tasks, so that remediators can spend more time thinking about and delivering a usable reading (or listening) experience for those who use assistive technology.
Be on the lookout for our next installment of Equidox Development history, authored by David Freelan, where he discusses the implementation of the machine learning engine.
To see Equidox software in action, watch our Equidox vs. Adobe video.