1 00:00:00,480 --> 00:00:05,640 Narrator: Equidox by ONIX— Reach Everyone. 2 00:00:06,260 --> 00:00:10,680 Equidox Automated Batch Processing for PDF Remediation. 3 00:00:11,590 --> 00:00:15,750 Accessibility has become part of the normal course of business. 4 00:00:15,750 --> 00:00:17,210 Businesses and organizations 5 00:00:17,210 --> 00:00:18,980 are learning that all digital documents 6 00:00:18,980 --> 00:00:20,890 must be compliant with regulations 7 00:00:20,890 --> 00:00:25,938 such as Section 508, ADA, the ACA, or AODA. 8 00:00:25,967 --> 00:00:27,890 Many of them are now 9 00:00:27,890 --> 00:00:31,590 working to meet web content accessibility guidelines. 10 00:00:31,590 --> 00:00:33,260 Many organizations are still using 11 00:00:33,260 --> 00:00:35,910 PDF files for their digital document type, 12 00:00:35,910 --> 00:00:40,230 despite accessibility issues that arise from these documents. 13 00:00:40,230 --> 00:00:42,600 These must be remediated in order to be accessible 14 00:00:42,600 --> 00:00:45,640 for people using assistive technology. 15 00:00:45,640 --> 00:00:47,200 But what happens when your organization 16 00:00:47,200 --> 00:00:49,920 is producing hundreds or thousands of PDFs 17 00:00:49,920 --> 00:00:52,490 every month or every week? 18 00:00:52,490 --> 00:00:55,160 What can banks, financial service providers, 19 00:00:55,160 --> 00:00:57,680 insurance companies, healthcare entities, 20 00:00:57,680 --> 00:01:01,270 utilities, governments, schools and other organizations 21 00:01:01,270 --> 00:01:03,600 do to keep up with the vast quantity of documents 22 00:01:03,600 --> 00:01:05,270 they produce? 
23 00:01:05,910 --> 00:01:08,400 How can they remediate so many statements, 24 00:01:08,400 --> 00:01:11,380 billing summaries, explanations of benefits, 25 00:01:11,380 --> 00:01:16,220 HR data, product catalogs, tax documents, budget reports, 26 00:01:16,220 --> 00:01:17,890 and so many other digital documents 27 00:01:17,890 --> 00:01:20,730 that are continually being created every day? 28 00:01:20,730 --> 00:01:22,270 What happens when making these documents 29 00:01:22,270 --> 00:01:24,450 accessible takes more time and manpower 30 00:01:24,450 --> 00:01:26,850 than can be allotted to the task? 31 00:01:26,850 --> 00:01:31,080 This is when automated batch processing is a must. 32 00:01:31,080 --> 00:01:32,890 Using artificial intelligence, 33 00:01:32,890 --> 00:01:36,170 specifically computer vision and machine learning, 34 00:01:36,170 --> 00:01:38,950 it is possible to properly tag large quantities 35 00:01:38,950 --> 00:01:41,150 of repetitive or similar PDFs 36 00:01:41,150 --> 00:01:44,640 and make them accessible in one seamless process. 37 00:01:44,640 --> 00:01:46,590 Documents being produced in quantity 38 00:01:46,590 --> 00:01:50,220 could be analyzed by artificial intelligence developers. 39 00:01:50,220 --> 00:01:52,170 These developers can program a computer 40 00:01:52,170 --> 00:01:54,000 to understand the repetitive elements 41 00:01:54,000 --> 00:01:56,510 contained in digital PDF documents 42 00:01:56,510 --> 00:01:59,350 and tag them appropriately so that assistive technology 43 00:01:59,350 --> 00:02:02,160 can understand what is being provided. 44 00:02:02,700 --> 00:02:06,390 Let's have a look at an example credit card statement. 45 00:02:06,390 --> 00:02:08,130 This statement contains many elements 46 00:02:08,130 --> 00:02:11,490 that will be repeated, customer after customer. 47 00:02:11,490 --> 00:02:13,020 There is a logo that will be visible 48 00:02:13,020 --> 00:02:16,380 on every statement in the same location. 
49 00:02:16,380 --> 00:02:17,600 The computer can be programmed 50 00:02:17,600 --> 00:02:20,840 to identify the image and location of this logo 51 00:02:20,840 --> 00:02:23,150 and ensure that it is marked as an image 52 00:02:23,150 --> 00:02:28,150 and given the correct alt text, such as "your bank's logo." 53 00:02:28,150 --> 00:02:30,550 Customer information will appear in the same place 54 00:02:30,550 --> 00:02:32,290 on each statement. 55 00:02:32,290 --> 00:02:36,480 This can be defined by location and marked as text. 56 00:02:37,140 --> 00:02:39,820 There are recurring headings in a specific font 57 00:02:39,820 --> 00:02:41,760 and of a specific size and weight 58 00:02:41,760 --> 00:02:45,380 that a computer can be programmed to identify and label. 59 00:02:46,180 --> 00:02:49,360 Heading Level 1 is always going to be in the same place 60 00:02:49,360 --> 00:02:52,710 with the same font and weight of characters. 61 00:02:53,480 --> 00:02:58,260 Headings Level 2 will also be identifiable by their font size. 62 00:02:58,260 --> 00:03:00,770 These can be defined, and during the batch processing, 63 00:03:00,770 --> 00:03:03,330 the computer will assign them the correct heading levels 64 00:03:03,330 --> 00:03:06,270 based on that programmed definition. 65 00:03:06,810 --> 00:03:10,250 There are tables that will also be seen in every statement. 66 00:03:10,250 --> 00:03:13,330 The computer can be programmed to recognize these tables, 67 00:03:13,330 --> 00:03:15,710 identify the row and column headers, 68 00:03:15,710 --> 00:03:18,040 and appropriately tag the information found 69 00:03:18,040 --> 00:03:20,600 inside each cell of the table. 70 00:03:21,440 --> 00:03:24,590 After analyzing the data provided on the statements, 71 00:03:24,590 --> 00:03:26,730 the artificial intelligence developer 72 00:03:26,730 --> 00:03:28,570 will build a digital template 73 00:03:28,570 --> 00:03:32,010 in which all the elements are properly defined.
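Editor's sketch: the template idea described above (fixed zones for the logo and customer information, plus heading levels recognized by font, size, and weight) might be modeled roughly as follows. This is a minimal illustration, not the Equidox template format; every class name, coordinate, font name, and size threshold here is a hypothetical assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """Hypothetical rectangular page region and the tag applied inside it."""
    name: str
    tag: str
    x0: float
    y0: float
    x1: float
    y1: float
    alt_text: str = ""

    def contains(self, x, y):
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


@dataclass(frozen=True)
class HeadingRule:
    """Hypothetical rule: text in this font, weight, and minimum size is a heading."""
    tag: str
    font: str
    min_size: float
    bold: bool


@dataclass
class Template:
    zones: list
    heading_rules: list

    def tag_for(self, x, y, font=None, size=0.0, bold=False):
        # Heading rules are tried first, largest size first, so that
        # H1-sized text is never labeled as H2.
        for rule in sorted(self.heading_rules, key=lambda r: -r.min_size):
            if font == rule.font and bold == rule.bold and size >= rule.min_size:
                return rule.tag
        # Otherwise fall back to the fixed zone that contains the point.
        for zone in self.zones:
            if zone.contains(x, y):
                return zone.tag
        return "P"  # default: ordinary paragraph text


# Illustrative values only; a real statement layout would differ.
STATEMENT = Template(
    zones=[
        Zone("logo", "Figure", 20, 20, 120, 80, alt_text="Your bank's logo"),
        Zone("customer_info", "P", 20, 100, 300, 160),
    ],
    heading_rules=[
        HeadingRule("H1", "Helvetica", 18.0, True),
        HeadingRule("H2", "Helvetica", 14.0, True),
    ],
)
```

Calling `STATEMENT.tag_for(60, 50)` returns `"Figure"` because the point falls inside the logo zone, while bold 18-point Helvetica text anywhere on the page is tagged `"H1"`.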
74 00:03:32,010 --> 00:03:35,300 The template will then be used by the computer to identify 75 00:03:35,300 --> 00:03:38,710 all the defined elements during batch processing. 76 00:03:39,810 --> 00:03:42,040 Here's one of our data scientists to explain 77 00:03:42,040 --> 00:03:43,610 how Equidox uses machine learning 78 00:03:43,610 --> 00:03:45,600 to analyze customer data 79 00:03:45,600 --> 00:03:46,800 and create a template. 80 00:03:46,800 --> 00:03:49,750 >> In this video, we're going to take a closer look at the system 81 00:03:49,750 --> 00:03:51,760 and demonstrate a variety of document layouts 82 00:03:51,760 --> 00:03:54,660 that we can handle. Let's dive in. 83 00:03:54,660 --> 00:03:58,210 The first thing we need to do is gather customer data. 84 00:03:58,210 --> 00:03:59,770 Thousands, or even millions, 85 00:03:59,770 --> 00:04:02,790 of documents can be fed into a generative adversarial network 86 00:04:02,790 --> 00:04:06,770 and be organized into groups of similar designs and information. 87 00:04:06,770 --> 00:04:09,490 The clustered documents are important for the developers 88 00:04:09,490 --> 00:04:12,110 to see the variation within documents. 89 00:04:12,110 --> 00:04:13,970 This particular cluster, for instance, 90 00:04:13,970 --> 00:04:16,870 contains documents that the neural network grouped together 91 00:04:16,870 --> 00:04:19,350 because they all contain pie charts. 92 00:04:19,350 --> 00:04:22,080 The documents are further organized within the cluster 93 00:04:22,080 --> 00:04:25,560 by the location of the pie chart on each individual page. 94 00:04:25,560 --> 00:04:28,050 These clusters of data often contain examples 95 00:04:28,050 --> 00:04:30,060 that don't quite fit with the rest. 96 00:04:30,060 --> 00:04:32,760 For example, this document has a pie chart 97 00:04:32,760 --> 00:04:34,980 with no distinct categories, making it 98 00:04:34,980 --> 00:04:37,930 very visually different from the other documents. 
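Editor's sketch: the video attributes the grouping to a neural network and does not give details. As a much simpler stand-in, the idea of spotting a document that "doesn't quite fit" its cluster can be illustrated with a distance-from-centroid check. The feature vectors (pie chart position and category count), the document names, and the cutoff factor are all hypothetical.

```python
import math
from statistics import mean, median


def centroid(vectors):
    """Component-wise mean of equal-length feature vectors."""
    return [mean(column) for column in zip(*vectors)]


def find_outliers(docs, factor=2.0):
    """Return names of documents that sit unusually far from the cluster center.

    docs maps a document name to a numeric layout feature vector.
    A document is flagged when its distance from the centroid exceeds
    `factor` times the median distance (an assumed, tunable cutoff).
    """
    center = centroid(list(docs.values()))
    distance = {name: math.dist(vec, center) for name, vec in docs.items()}
    cutoff = factor * median(distance.values())
    return sorted(name for name, d in distance.items() if d > cutoff)


# Hypothetical features: (pie chart x, pie chart y, number of categories).
cluster = {
    "doc_a": [0.50, 0.50, 4],
    "doc_b": [0.52, 0.48, 4],
    "doc_c": [0.50, 0.55, 4],
    "doc_d": [0.48, 0.50, 4],
    "doc_e": [0.90, 0.10, 0],  # pie chart with no distinct categories
}
outliers = find_outliers(cluster)
```

Here `outliers` contains only `"doc_e"`, the document whose chart has no distinct categories and sits in a different place on the page.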
99 00:04:37,930 --> 00:04:41,500 It also has a label that is close to the paragraph above it, 100 00:04:41,500 --> 00:04:42,980 which may confuse many deep-learning 101 00:04:42,980 --> 00:04:44,600 or machine-learning algorithms, 102 00:04:44,600 --> 00:04:47,130 keeping it from being identified correctly. 103 00:04:47,130 --> 00:04:49,280 Equidox’s batch processing service 104 00:04:49,280 --> 00:04:51,510 can identify these outlying documents 105 00:04:51,510 --> 00:04:53,320 where many other deep-learning algorithms 106 00:04:53,320 --> 00:04:56,040 would fail to make the distinction consistently. 107 00:04:56,040 --> 00:04:59,060 In examples of documents such as corporate letters, 108 00:04:59,060 --> 00:05:01,750 addresses, headers, footers or even paragraphs 109 00:05:01,750 --> 00:05:04,480 may not always be the same length, size, 110 00:05:04,480 --> 00:05:07,000 or precise position every time. 111 00:05:07,000 --> 00:05:10,020 Equidox excels at recognizing these variations 112 00:05:10,020 --> 00:05:13,300 and can consistently identify these key elements. 113 00:05:13,300 --> 00:05:15,510 These three documents are located near each other 114 00:05:15,510 --> 00:05:17,330 because their layouts are similar, 115 00:05:17,330 --> 00:05:20,560 the difference being the number of columns in each grouping. 116 00:05:20,560 --> 00:05:23,840 This helps the batch processing system identify variations 117 00:05:23,840 --> 00:05:27,560 in a type of document and organize them more specifically. 118 00:05:27,560 --> 00:05:30,040 In some documents that need to be remediated, 119 00:05:30,040 --> 00:05:32,240 there are certain sections that could be crucial 120 00:05:32,240 --> 00:05:34,230 to identify consistently. 121 00:05:34,230 --> 00:05:36,450 Keywords can be used to ensure uniformity 122 00:05:36,450 --> 00:05:38,620 throughout a large group of documents, 123 00:05:38,620 --> 00:05:42,530 even if those words are not in the same place on each page. 
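Editor's sketch: keyword anchoring, as described above, means locating a section by its wording rather than by a fixed position. A minimal illustration follows, assuming each page has already been reduced to text spans with coordinates; the span format and the function name are hypothetical.

```python
def find_keyword_sections(spans, keywords):
    """Locate each keyword on a page regardless of where it appears.

    spans is a list of (text, x, y) tuples; the result maps each keyword
    that was found to the position of the first span containing it.
    Matching is case-insensitive.
    """
    found = {}
    for text, x, y in spans:
        lowered = text.casefold()
        for keyword in keywords:
            if keyword not in found and keyword.casefold() in lowered:
                found[keyword] = (x, y)
    return found


# Two hypothetical pages with the same section at different positions.
page_1 = [("Account Summary", 40, 120), ("Payment Due Date: 05/01", 40, 300)]
page_2 = [("PAYMENT DUE DATE: 06/01", 320, 90), ("Account Summary", 40, 150)]

hits_1 = find_keyword_sections(page_1, ["Payment Due Date"])
hits_2 = find_keyword_sections(page_2, ["Payment Due Date"])
```

Both pages yield a hit for "Payment Due Date", at `(40, 300)` and `(320, 90)` respectively, so the section can be tagged the same way on each page.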
124 00:05:42,530 --> 00:05:45,140 Typical deep-learning algorithms won't be able to identify 125 00:05:45,140 --> 00:05:48,160 keywords consistently, but Equidox has no problem 126 00:05:48,160 --> 00:05:50,020 creating a custom algorithm tailored 127 00:05:50,020 --> 00:05:52,850 to the needs of each customer. 128 00:05:52,850 --> 00:05:54,680 Tables are another vital piece of data 129 00:05:54,680 --> 00:05:57,210 in a lot of automatically generated documents. 130 00:05:57,210 --> 00:05:59,630 This cluster shows documents with tables that vary 131 00:05:59,630 --> 00:06:02,000 in the number of rows and columns. 132 00:06:02,000 --> 00:06:04,320 The Equidox batch processing system can handle 133 00:06:04,320 --> 00:06:07,450 table variations effectively and consistently. 134 00:06:07,450 --> 00:06:09,790 Note that the outlier in the top-left corner 135 00:06:09,790 --> 00:06:12,470 has a very distinct difference compared to the rest. 136 00:06:12,470 --> 00:06:14,740 It is completely missing a table. 137 00:06:14,740 --> 00:06:16,960 These types of document examples are detected 138 00:06:16,960 --> 00:06:18,290 by the Equidox system, 139 00:06:18,290 --> 00:06:21,760 which can alert the team as well as the customer to this anomaly. 140 00:06:21,760 --> 00:06:23,570 Doing this can highlight abnormalities 141 00:06:23,570 --> 00:06:27,110 that need to be corrected before the document is accessed by the end user. 142 00:06:27,110 --> 00:06:29,770 There are other options out there for batch processing 143 00:06:29,770 --> 00:06:32,000 and plenty of buzzwords to accompany them, 144 00:06:32,000 --> 00:06:35,000 but Equidox excels because it uses a variety of techniques, 145 00:06:35,000 --> 00:06:36,580 not just the popular ones, 146 00:06:36,580 --> 00:06:39,230 to ensure accuracy and consistency for clients 147 00:06:39,230 --> 00:06:41,010 and end users alike.
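Editor's sketch: the missing-table anomaly mentioned above amounts to comparing what a template expects against what was actually detected in each document. The following is a rough illustration of that comparison; the element names and function names are hypothetical, and how the real system raises alerts is not described in the video.

```python
def missing_elements(expected, detected):
    """Return the expected element types that were not detected, sorted."""
    return sorted(set(expected) - set(detected))


def audit_batch(expected, documents):
    """Map each document name to its missing elements, omitting clean documents.

    documents maps a document name to the element types detected in it.
    """
    report = {}
    for name, detected in documents.items():
        missing = missing_elements(expected, detected)
        if missing:
            report[name] = missing
    return report


# Hypothetical batch: one statement was generated without its table.
expected_elements = ["logo", "heading", "table"]
batch = {
    "statement_001": ["logo", "heading", "table"],
    "statement_002": ["logo", "heading"],
    "statement_003": ["heading", "table", "logo"],
}
anomalies = audit_batch(expected_elements, batch)
```

In this example `anomalies` lists only `statement_002`, with `"table"` as the missing element, which is the kind of result that could be passed on to the team and the customer for review.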
148 00:06:41,010 --> 00:06:42,260 Narrator: Now, let's take a look 149 00:06:42,260 --> 00:06:44,150 at how the batch processing works 150 00:06:44,150 --> 00:06:46,320 and how it will fit into your strategic workflow 151 00:06:46,320 --> 00:06:48,360 from start to finish. 152 00:06:48,360 --> 00:06:50,400 As our data scientist explained, 153 00:06:50,400 --> 00:06:52,840 Equidox will gather customer data and use 154 00:06:52,840 --> 00:06:54,740 machine learning to analyze the documents 155 00:06:54,740 --> 00:06:56,850 and create a digital template. 156 00:06:56,850 --> 00:06:59,220 The template will be applied during the batch processing 157 00:06:59,220 --> 00:07:01,920 to produce an accessible PDF. 158 00:07:01,920 --> 00:07:06,380 First, the customer or end user will request a PDF file. 159 00:07:06,380 --> 00:07:08,400 For example, when a banking customer 160 00:07:08,400 --> 00:07:10,780 requests a PDF file of their monthly statement 161 00:07:10,780 --> 00:07:12,920 from the bank's website, 162 00:07:12,920 --> 00:07:15,600 the requested file will enter the REST API 163 00:07:15,600 --> 00:07:17,920 and be conveyed to either a locally installed 164 00:07:17,920 --> 00:07:20,150 or cloud server. 165 00:07:20,150 --> 00:07:22,800 This server will have the option of load balancing, 166 00:07:22,800 --> 00:07:24,090 allowing it to be expanded 167 00:07:24,090 --> 00:07:27,150 as needed in times of greater demand. 168 00:07:27,150 --> 00:07:28,680 Within that cloud server, 169 00:07:28,680 --> 00:07:32,250 Equidox will be running on a secure virtual machine. 170 00:07:32,250 --> 00:07:33,610 The PDF file 171 00:07:33,610 --> 00:07:36,770 will be deconstructed into individual pages. 
172 00:07:36,770 --> 00:07:39,720 Equidox will use machine learning to compare each page 173 00:07:39,720 --> 00:07:42,930 to the digital templates that have been developed, 174 00:07:42,930 --> 00:07:45,470 then Equidox will apply a zone map 175 00:07:45,470 --> 00:07:46,910 that will create digital tags 176 00:07:46,910 --> 00:07:49,570 that can be understood by assistive technology 177 00:07:49,570 --> 00:07:53,620 such as screen readers and connected braille displays. 178 00:07:53,620 --> 00:07:56,290 The tagged and remediated PDF is generated 179 00:07:56,290 --> 00:07:59,950 and returned to the end user via the REST API. 180 00:08:01,180 --> 00:08:04,920 With one click, the results will be accessible documents 181 00:08:04,920 --> 00:08:07,610 that can be distributed online and via email 182 00:08:07,610 --> 00:08:11,050 without anyone having to manually remediate them. 183 00:08:11,050 --> 00:08:14,800 The time and manpower saved will be enormous. 184 00:08:15,430 --> 00:08:17,560 Equidox will work with your organization 185 00:08:17,560 --> 00:08:20,420 to determine the best approach to implement the accessibility 186 00:08:20,420 --> 00:08:24,170 batch processing into your strategic workflow. 187 00:08:24,170 --> 00:08:27,580 Security and confidentiality concerns will also be addressed 188 00:08:27,580 --> 00:08:30,770 to ensure the integrity of sensitive data. 189 00:08:30,770 --> 00:08:33,280 The end goal is a fast, seamless, 190 00:08:33,280 --> 00:08:36,740 and secure process to achieving compliance. 191 00:08:37,290 --> 00:08:40,044 To discuss how Equidox Automated Batch Processing 192 00:08:40,044 --> 00:08:42,990 can resolve your accessibility issues, 193 00:08:42,990 --> 00:08:47,790 contact us at EquidoxSales@onixnet.com, 194 00:08:47,790 --> 00:08:53,240 or call 800-664-9638. 195 00:08:53,240 --> 00:08:59,500 [music]
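Editor's sketch: the workflow the narrator walks through (split the PDF into pages, compare each page to the available templates, then apply the matching template's zone map to produce tags) can be outlined as below. The video says machine learning performs the comparison; this sketch swaps in a plain Jaccard similarity over sets of layout features purely for illustration. The data shapes, names, and tag values are all hypothetical, and no real PDF parsing or REST call is shown.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def best_template(page_features, templates):
    """Name of the template whose feature set is most similar to the page."""
    return max(
        templates,
        key=lambda name: jaccard(page_features, templates[name]["features"]),
    )


def remediate(pages, templates):
    """Tag every element on every page using the best-matching template.

    Elements that the zone map does not mention default to the "P" tag.
    """
    tagged = []
    for page in pages:
        name = best_template(page["features"], templates)
        zone_map = templates[name]["zone_map"]
        tags = [(zone_map.get(element, "P"), element) for element in page["elements"]]
        tagged.append({"template": name, "tags": tags})
    return tagged


# Hypothetical templates and a single already-split page.
templates = {
    "statement": {
        "features": {"logo", "account_table", "h1"},
        "zone_map": {"logo": "Figure", "account_table": "Table", "h1": "H1"},
    },
    "letter": {
        "features": {"letterhead", "address", "body"},
        "zone_map": {"letterhead": "Figure", "address": "P", "body": "P"},
    },
}
pages = [
    {
        "features": {"logo", "account_table"},
        "elements": ["logo", "h1", "account_table", "footer_note"],
    }
]
result = remediate(pages, templates)
```

The single page matches the `statement` template, so its logo is tagged `Figure`, its heading `H1`, its table `Table`, and the unmapped footer note falls back to `P`.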