Unlocking the Potential of Unstructured Data in Finance Through Document Intelligence


According to projections, 80% of worldwide data will be unstructured by 2025. The financial services (FS) industry is no different: most enterprises hold a vast array of unstructured data that remains largely under-analyzed. A huge amount of enterprise information flows through documents, so understanding documents and extracting the relevant information from them is at the heart of organizations' digital transformation journeys. Documents come in different types and formats, including native PDFs or scanned images, and may be structured, semi-structured or unstructured, which makes document processing and understanding an arduous task. The ability to automate document processing and understanding can deliver more comprehensive, holistic benefits to the many applications and use cases that currently involve manual handling of these documents. In this tutorial, we focus on the Financial Services industry and on how Document Intelligence, i.e., AI-powered automated analysis of documents, makes it possible to tap into these opportunities by analyzing the huge amount of information present in such documents.
In the financial services industry, documents include financial statements, invoices, bank statements, policies, contracts, marketing creatives, etc. Data residing in such documents can take a variety of forms, including images, tables, figures, and text. While there are challenges around processing documents, the ability to make decisions quickly by leveraging such data can provide differentiated value propositions and competitive benefits. These benefits include improved operational excellence, automated compliance or regulatory workflows, insights discovered by mining and matching disparate data sources, and an overall enhanced customer experience. However, the very nature of unstructured data prevents the direct application of AI/ML techniques that can be seamlessly applied to structured data. This tutorial will present the art and science behind developing Document Intelligence solutions, covering select use cases involving semi-structured or unstructured documents, show the business opportunities present, and describe the technical challenges involved. Subsequently, we provide an outline for developing various Document Intelligence solutions that can aggregate, query, analyse, and accelerate the understanding of such data to unveil deep insights across Financial Services use cases.



Himanshu Sharad Bhatt is currently a Research Director at American Express AI Labs, where he is actively involved in developing Document AI-based solutions for unstructured data analytics. Prior to joining Amex in 2017, Himanshu worked with Xerox Research, India, on building “unstructured data analytics” capabilities for contact centres and the services division. Himanshu holds a PhD degree in Computer Science & Engineering; his thesis received the “best thesis award” from INAE and IUPRAI in 2014. Over the years, his work has led to 30+ publications in reputed conferences and journals and 5 US patents to his credit. He has co-organized a number of tutorials, including tutorials on Data Science and Machine Learning at the Grace Hopper Celebration of Women in Computing (India), ACM India Compute 2015, and the Xerox Innovation Group Conference at Palo Alto Research Centre East (PARC-East) in Webster, US. He also co-organized the International Workshop on Domain Adaptation for Dialog Agents (DADA) at ECML-PKDD 2016. He was an invited speaker at the faculty training program on “Data Science & Analytics-2020” at IIT-Indore, sponsored by MHRD, Govt of India, and at Continuum-2019, the rolling seminar series held at the Shailesh J. Mehta School of Management, IIT Bombay. He also presented a tutorial on “Unlocking the Potential of Unstructured Data in Finance Through Document Intelligence” at the Toronto Machine Learning Summit (TMLS), 2021.


A PDF version of the slides is available here.




Sunday 22 May 2022 (14h – 17h)


The tutorial session will be online and can be attended on Microsoft Teams via the following link: https://bit.ly/38zaow9

Attendees who are in La Rochelle are welcome to attend a video broadcast of the tutorial in Room 000 of the « PASCAL » building.

For more details, please check the Scientific Program of DAS 2022 (https://das2022.univ-lr.fr/index.php/scientific-program/).