Automated High Volume Solutions

An Equidox data scientist explains how high-volume automated solutions work.

Video transcript

[Dan Tuleta] Okay, it's just about two o'clock, so I think we're ready to get started. For everyone who's still shuffling in, this is being recorded, so you won't miss much in the first minute or two. Thank you all for attending, and welcome back to those of you who have attended our webinars before. This is the next installment of Equidox Webinar Wednesdays. I heard a bit of background noise there; I'm not sure if someone was off mute, but okay. Welcome, everyone, to Equidox Webinar Wednesdays, where we're talking about our newest service: batch processing.

As always, we appreciate your attention. If you have any questions that we don't cover in this presentation, please don't hesitate to reach out to us through our website at equidox.co. We're also very active on LinkedIn and social media, so if you're on LinkedIn, please feel free to connect with us. We post a lot of articles and information about our products, our services, and the accessibility space in general.

This week is a little bit different. If you've joined us before for Equidox Webinar Wednesdays, you're probably used to me talking and demoing something in the Equidox software, but this time I'm co-presenting with my friend David Freelan. He's on the Equidox team as our lead data scientist, so I'm going to turn this over to David to introduce himself and talk a bit more about batch processing.

[David Freelan] Thank you, Dan. I'll tell you a bit about myself first. I started my journey at George Mason University, where I got my graduate degree in AI and machine learning. I was then recruited by a robotic soccer team, and we represented the United States in a competition called RoboCup. I worked my way up to team lead; we competed in both Brazil and China and went on to publish some papers in deep learning and multi-agent systems. After my degree, I actually had a connection at the Cleveland Clinic
that led me to Equidox. Working for a company doing accessibility and trying to get everybody included sounded like a great way to apply my skills, so I found myself very motivated to contribute to Equidox's goals.

Robotics has a lot of different components, but the one most relevant for us today is computer vision: techniques that try to figure out, in the soccer example, where the ball or a robot player is. All of that visual analysis of what's going on on a soccer field can also be applied to analyzing a document in front of you.

So my initial contribution to machine learning at Equidox was our list and table editor. When making documents accessible, in a lot of situations our remediators have to specify where every single row and every single column in a table is, as well as highlight each item in a list. I can use the bullet points on this page as an example: a remediator would have to tag this whole thing as a list, then tag each and every item one by one, so all six items here, and they'd have to make sure they tag each item's delimiter as well. So just for this tiny little list, there are a lot of different things we need to make sure are tagged, and we really want to automate as much of this as possible to save a lot of time.

The way we train our machine learning models, we can take that manual human process, which our remediators have done many, many times, and teach a machine learning model from the hundreds of thousands of examples that we can create. We can then use that model to consistently detect these tables and lists and skip a lot of that manual labor.

While we use our internal documents to train these machine learning processes, when we apply ourselves to batch processing for a company, we're going to use your documents to train. We don't want a general
one-size-fits-all solution. We're going to take your documents and fit ourselves to them, to identify the elements in there as accurately as possible. If you have a few documents out of thousands, or even millions, that look a bit different, we want to make sure we use every machine learning tool at our disposal to find those outlier documents. We'll take a closer look at that later and give you a sneak peek at how that's done so effectively.

But first, let's take another look at our goal state. Oh, that's the next slide. Yeah, here's our goal state. We want to identify each of these individual elements: we have an icon for your bank at the top left, and we have a table with a header, and the table has headers inside it. We've got all sorts of things going on. Now, this is just an internal representation of what's going on on our end. Don't worry, we're not going to modify your document in any way.

Before we train our models, we need to do a couple of internal steps. We have to gather these documents and create all these templates, and I want to show you how we do that. But first, let's get a few definitions out of the way so we can tackle this problem together.

Artificial intelligence is a general, broad scope of algorithms used to mimic human-like intelligence on a particular task. More specifically, we're interested in computer vision, where we're trying to mimic, say, the visual cortex of a human and identify whether something is a cat or a dog, or whether something is a table or a list. Machine learning is a different subset of artificial intelligence that is about teaching a model: instead of hard-coding the behavior, you teach a model over time. Where these two intersect is where you're teaching a computer to see. So that's our baseline vocabulary.
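The difference between hard-coding a rule and teaching a model can be shown with a toy sketch. It assumes two made-up features per block of page content and uses a nearest-centroid classifier, one of the simplest learned models; it is not Equidox's actual model, which the webinar doesn't describe. The hand-written rule encodes a person's guess, while the learned centroids come entirely from labeled examples, standing in for the remediator-tagged examples mentioned above.

```python
from math import dist
from statistics import fmean

# Illustrative only: not Equidox's model. Hypothetical features for one block of
# content on a page: (number of vertically aligned columns,
#                     fraction of lines that start with a bullet glyph)
Features = tuple[float, float]


def hard_coded_rule(features: Features) -> str:
    """The 'hard-coding' approach: a person writes the decision rule by hand."""
    columns, bullet_fraction = features
    return "list" if bullet_fraction > 0.5 and columns < 2 else "table"


def train_nearest_centroid(examples: list[tuple[Features, str]]) -> dict[str, Features]:
    """The 'teaching' approach: learn one average feature vector (centroid) per label."""
    by_label: dict[str, list[Features]] = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(fmean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(centroids: dict[str, Features], features: Features) -> str:
    """Pick the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Labeled examples standing in for blocks that remediators have already tagged by hand.
examples = [
    ((1, 0.9), "list"), ((1, 1.0), "list"), ((1, 0.8), "list"),
    ((4, 0.0), "table"), ((3, 0.1), "table"), ((5, 0.0), "table"),
]
centroids = train_nearest_centroid(examples)
print(predict(centroids, (1, 0.7)))  # list
print(predict(centroids, (4, 0.2)))  # table
```

Adding more labeled examples moves the centroids, and therefore the predictions, without anyone rewriting a rule; that is the sense in which the model is taught rather than hard-coded.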
Now that we have that out of the way, let's try to be data scientists together for a second and take a look at an example company. We can take a machine learning algorithm and teach it how to group similar documents together. In this example on the right, each dot represents a single page of a document, and the closer two dots are, the more similar those pages are.

To show you what I mean, let's zoom in on a cluster here in the middle. In this cluster we have a lot of pie charts. If you start at the top left, the pie chart appears near the bottom of the page; as you move toward the bottom right of the graph, the pie chart sneaks up toward the top. So just by zooming in on this cluster and clicking on three different documents, we already have a really good idea of how these documents vary. This lets our machine learning algorithm, and our developers, see the variation in these documents by looking at just three of them.

We also have the opportunity to look at the points that are way on the outside, really far away from all the other points. We call these the outlier points, and we see an interesting one here where the pie chart has only one category. That's a situation that a machine learning algorithm, or honestly a human handwriting a template, could totally miss. If there are thousands or millions of documents, this is potentially one in a million and could be hard to find, but our machine learning algorithm was able to find that outlier so we can handle it.

There's one other thing I want to note on this page. There's a "100 percent" label right on top of the pie chart that can blend in with the paragraph above it. Again, without noticing these sorts of things and going through this
initial machine learning process, there's a good chance we would miss that "100 percent," and missing something means that someone isn't included. So we have to make sure we get everything.

Let's take a quick look at this one. The cluster we zoomed into here has a lot of letters in it. While we're obviously doing a lot more than a fixed overlay, in this cluster there are some static elements, such as the logo at the top and the address, that are generally in the same location. We can take advantage of that and make sure our algorithm focuses on the variations. Looking at these three individual points, we can see that the paragraphs vary in length and the header can vary in size, so we need to account for that. We have a mix of static and dynamic content going on in this cluster.

These three clusters appear closer together because they have a similar style, but each cluster has a different number of columns. What I wanted to show you here is that even though these documents have differing numbers of columns, they share a lot of features. So instead of building three different templates, one for each cluster, I can split each document up by its columns and create one template that handles columns, effectively combining all these clusters. Graphing all of these out gave me a good view of how these documents differ and how similar they are to each other, so the machine learning algorithm really helped us learn more about these documents.

The last kind of document I want to show you today is one where the tables vary in structure. At the top left we have very few cells, compared with the lower right of the graph, where we can see a lot of very dense cells.
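The outlier search described above, finding the points that sit far away from every other point in the plot, can be sketched with a k-nearest-neighbour distance score. This is one common, simple outlier technique, not necessarily the method Equidox uses, and the per-page features here (the pie chart's vertical position and its number of categories) are hypothetical stand-ins for whatever the real system extracts from each page.

```python
import statistics
from math import dist

# Hypothetical per-page features: (vertical position of the pie chart on the
# page, 0.0 = top and 1.0 = bottom; number of categories in the pie chart).
pages = {
    "stmt-001": (0.9, 5), "stmt-002": (0.8, 5), "stmt-003": (0.7, 4),
    "stmt-004": (0.5, 5), "stmt-005": (0.3, 4), "stmt-006": (0.2, 5),
    "stmt-007": (0.1, 5), "stmt-999": (0.5, 1),  # single-category pie chart
}


def knn_outlier_scores(points, k=2):
    """Score each page by the distance to its k-th nearest neighbour.

    Pages inside a dense cluster have close neighbours (low score);
    pages far from every other page get a high score.
    """
    scores = {}
    for name, vec in points.items():
        neighbour_distances = sorted(
            dist(vec, other) for other_name, other in points.items() if other_name != name
        )
        scores[name] = neighbour_distances[k - 1]
    return scores


def flag_outliers(points, k=2, factor=3.0):
    """Flag pages whose score is more than `factor` times the median score."""
    scores = knn_outlier_scores(points, k)
    typical = statistics.median(scores.values())
    return sorted(name for name, score in scores.items() if score > factor * typical)


print(flag_outliers(pages))  # ['stmt-999']
```

Because each page is scored only against its nearest neighbours, the single-category chart stands out even though its position on the page is ordinary. In practice the features would need to be scaled to comparable ranges first; in this toy data the category count dominates the distance.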
Our machine learning algorithm is informing us that the number of cells in the table is primarily how this document is going to change. So we want to make sure our machine learning algorithm not only identifies these PDF elements but also tags them correctly despite these variations, and we have our own table algorithms to make sure we handle this appropriately.

Let's take a look at another outlier point, because this example could be really important. The one in the top left is completely missing a table; there are no cells at all. This could be a bug on the company's end, or it could be a genuine outlier that we need to make sure we can handle. If this cluster had a million documents in it, without an algorithm like this you might never find it by hand. This is a case where we can either alert you that there might be an issue, or, if you tell us it's not a problem, keep it in mind ourselves and make sure we handle that edge case appropriately.

Once we've done all of that, we can build ourselves a lovely template. This is the same thing we saw on the last slide, a reminder of what our goal state is. We've now shown you how we identify all the different elements on the page. Again, we're not marking up your documents; this is how all these things will be tagged internally, and we'll make sure we apply the tags correctly so that the result is compliant, accessible, and usable, everything you'd want here.

So now you might be wondering: cool, we've made all these templates and done all this fancy machine learning stuff, but how are we going to get these documents to your company? You have documents you want them
remediated by our batch processing system. How is that going to work for you?

What we want to supply for your developers is an interface where you can call a REST API. On our end, that REST API interacts with either a local server or a cloud server that can be scaled up and down; whether it's local or cloud is based on your preference. Once we have the PDF, we forward it to the machine learning algorithm that we've just described in great detail. After the machine learning has identified the templates and tagged everything correctly, we simply send the document right back to you through the same REST API, and you have yourself a tagged, accessible document for your end users. Next slide. Okay, that concludes this part. Dan, if you can take it away, we have a few FAQs.

[Dan Tuleta] Sure, thank you, David. When we talk about batch processing with our prospects and existing clients, a lot of common questions come up, so within this presentation we've compiled a list of these FAQs. We're going to walk through a few of the common questions that we feel many of you on this call are probably wondering about right now. So, Dave, if you want to jump to the next slide, we can get started. Okay, David: is this really accessible?

[David Freelan] Yes, it is. It's not just my expertise going into this; I'm not a one-man show. We have our director of accessibility and his entire team going through it with our developers and the whole machine learning process to make sure we're complying with everything. We don't want to leave anybody out.

[Dan Tuleta] Right. Our PDF remediation team is full of PDF experts. When we send documents back to clients as part of our PDF remediation services, everything goes through a multi-step
validation process, so every document and every template we design for you goes through rigorous validation. And this is not just tricking the accessibility checkers into saying a document is accessible; we are actually building a fully tagged, fully accessible, fully usable document for your clients. Again, this is not an overlay. These are not static documents that have to be exactly the same every time, where you just copy and paste the exact same template onto every document. These documents might have minor variances, as David explained, and even then our machine learning algorithms are able to detect that, and every document will be tagged in its own way to make sure it is fully usable.

So, David, a lot of people are wondering about these templates. How many templates can you have? What if they need to change, and if they do, how long do those changes take to apply?

[David Freelan] Yeah, the number of templates is very fluid. I wouldn't be too concerned about the number, because templates vary in difficulty, so that's something we handle on a case-by-case basis. If they change, again, it depends on how complex the changes are; changes can vary a lot in difficulty, so that's another thing we tackle case by case. And the length of time is, again, potentially one of those unsatisfying answers: we want to make sure we're working with you, and the time it takes us to complete the project will depend on your needs.

[Dan Tuleta] Great. One thing to keep in mind is that all of these are custom solutions, and that's going to be a theme throughout the FAQs. We really do need to engage with you and your team on a one-to-one basis and understand your documents, your templates, and all of your needs to come up with a custom solution
that will fit all of your documents.

So, David, can the system run multiple templates simultaneously?

[David Freelan] Yes, that's really not a problem. Running an arbitrary number of templates is not an issue for us; we just need enough processing power to do it, and we don't think it will take that much processing power. It's really not a problem for us.

[Dan Tuleta] Great. And David, of course, a very important question: is this process fully automated, or does it need any sort of human hand-holding or babysitting on a day-to-day basis?

[David Freelan] It is absolutely automated. It's just a matter of having those initial conversations and creating those templates, and from there it should go smoothly.

[Dan Tuleta] Great. Yeah, once we've had those initial discovery calls, better understood your templates and documents, and developed those templates, we can take human involvement completely out of the equation, and it all just works like magic.

Another question we're often asked is whether this should be an on-prem or a cloud-based solution. In talking with our developers, both options are available: we can set up this system either in the cloud or installed locally on your servers. For this type of system, our developers recommend on-prem because it will be faster; you won't have data pinging back and forth to the cloud in both directions, so everything can stay local in your own environment. This might also help you meet your internal security requirements. A lot of these documents might contain client data, information about specific people, or account numbers, for example, so there might be very strict and rigorous internal IT security measures you have to adhere to. Keeping it installed locally on your server, in your environment, might
be a better option. But if you're interested in having a conversation about installing this in the cloud, that's absolutely fine; we can support either option.

Now, David, what kind of volume and scalability are supported here?

[David Freelan] It's arbitrarily large. You could have a billion documents going through and we could handle it; you just need the hardware to do it. We don't expect you to have to rent a supercomputer or anything like that, but again, it's one of those case-by-case things, so we'll work with you to determine exactly how big a machine you might need.

[Dan Tuleta] Right. There can be a lot of variance in the size of the client. Whether you're a small credit union, a small utility company, or a Fortune 500 company, you could be talking about thousands of documents a month or millions of documents a month, and we can scale accordingly. It really just depends on the hardware you're running this on, but again, that's a conversation we can have offline once we're having a one-to-one discussion with you and your team.

Another question we're often asked is: is it secure? Yes, it is. Whether you go with the cloud option or on-prem, it's a very secure system. Many people might opt for the on-prem install for that extra layer of security, keeping everything installed locally in your own environment to adhere to all of your internal IT security rules and regulations. But the takeaway from this presentation is that this is a secure solution, and we can have a more detailed discussion about any of your specific requirements when the time comes. Oh, I didn't even notice you changed the slide, David. Thank you.

So, what does it cost? This is, of course, the million-dollar question.
Everyone wants to know what this will cost, and it really can vary depending on the complexity of the templates, the number of templates you have, the volume, and the hardware you're running it on. A number of factors go into the price, and we know it has to scale up and down based on all of them. So what we want to do is have a conversation with you about your exact needs and your documents, better understand the situation at your organization, and then come up with a custom pricing solution that works for both parties. This has to be discussed on a one-to-one basis; there's no one-size-fits-all or one-price-fits-all solution here. Everything is custom, so keep that in mind.

That wraps up our FAQ. In summary, Equidox batch processing is a customized service that produces fully accessible documents for our clients in a fully automated way, without any day-to-day human involvement. This allows for on-demand accessible PDF creation: when any of your customers download a PDF, whether it's a bank statement, a utility bill, or any other templatized document, Equidox batch processing will automatically produce that document in an accessible format. Our batch processing solution is fully secure and fully scalable to meet your needs, whether you're producing a few thousand or a few million documents. This system takes the burden of page-by-page remediation out of your hands, ultimately ensures inclusion for all of your clients, and mitigates the legal risk of not complying with the accessibility requirements that apply to your organization.

For more information about how Equidox Software Company can help you with PDF accessibility, email us at EquidoxSales@equidox.co, give us a call at 216-529-3030, or visit our website at www.equidox.co.

Automated Batch Processing

Dan Tuleta hosts a special guest, David Freelan, an Equidox data scientist, to talk about high-volume solutions. David will discuss his artificial intelligence developments and how these can make bulk PDF remediation completely automated and accessible.
