A Look at Intelligent Document Processing and E-Invoicing

Buzzwords such as intelligent document processing are ubiquitous in today's business world (B2B), but many people don't know what these terms mean.

By Constantin Kwiatkowski · Feb. 06, 24 · Analysis

In the "bygone era," invoices were dispatched on paper and painstakingly transcribed into the recipient's ERP system for subsequent data processing. According to Brendan Foley, among others, around 80 to 90 percent of the data in documents such as invoices and emails is still extracted manually (2019). In recent years, however, there has been a notable shift toward the exclusively digital transmission of documents such as business invoices, accompanied by automated data extraction.

Why should a company (or its managers) embrace this shift? The rationale is clear: to conserve resources (e.g., reducing paper usage) and to make workflows more efficient (e.g., eliminating manual data entry).

Furthermore, staying abreast of developments in this sector is crucial for companies engaged in public sector contracts. Within the European Union, there has been a longstanding push (Directive 2014/55/EU on electronic invoicing in public procurement) to standardize invoicing and enhance machine readability. This initiative aims to facilitate the automated processing of business invoices. Similarly, in Germany, the public sector is noticeably shifting toward digital capture and automated processing, away from traditional paper-based document handling.

As a consequence of this transition, legislative bodies in Germany, for instance, have mandated that from approximately 2025 onward, business invoices may be submitted digitally only in specific machine-readable formats. The pressure on companies involved in public sector contracts is therefore set to intensify in the coming years, and e-invoicing is gradually entering a pivotal phase.

Returning to the focal point, the intelligent processing of business invoices is paramount within the end-to-end invoice processing workflow, constituting perhaps the most critical aspect of the procedure. For instance, erroneous recognition of the IBAN (International Bank Account Number) could lead to inadvertent payment to the wrong supplier, thereby incurring substantial subsequent costs for the company.
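
One inexpensive safeguard against a misread IBAN is the checksum built into the IBAN itself (two check digits, verified with a modulo 97 calculation). The following is a minimal sketch of that check, not code from any product mentioned in this article. The class name IbanValidator is hypothetical, and the sketch checks only the general shape and the checksum, not country-specific lengths.

```java
import java.math.BigInteger;
import java.util.Locale;

// Hypothetical helper: validates the IBAN check digits (modulo 97 check).
final class IbanValidator {

    static boolean isValid(String iban) {
        if (iban == null) {
            return false;
        }
        // Remove spaces from the printed form and normalize to upper case.
        String s = iban.replaceAll("\\s", "").toUpperCase(Locale.ROOT);

        // Country code, two check digits, then 11 to 30 alphanumeric characters.
        if (!s.matches("[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}")) {
            return false;
        }

        // Move the first four characters to the end.
        String rearranged = s.substring(4) + s.substring(0, 4);

        // Replace every letter with two digits (A = 10 ... Z = 35); digits stay as they are.
        StringBuilder digits = new StringBuilder();
        for (char c : rearranged.toCharArray()) {
            digits.append(Character.getNumericValue(c));
        }

        // The IBAN is plausible if the resulting number leaves a remainder of 1 when divided by 97.
        return new BigInteger(digits.toString())
                .mod(BigInteger.valueOf(97))
                .intValue() == 1;
    }

    public static void main(String[] args) {
        System.out.println(isValid("DE89 3704 0044 0532 0130 00"));
    }
}
```

A passing checksum only shows that the string is a well-formed IBAN. It does not show that the account belongs to the intended supplier, so a comparison with master data is still needed.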

Exploring Two Approaches for Data Field Identification in Business Invoices: AI Integration and Standardized Formats

The ensuing section will delineate two prospective methodologies for identifying data fields from business invoices.

In the first scenario, data recognition relies on artificial intelligence (AI). Various providers, such as Microsoft (using the LayoutLM model), ABBYY, SAP, and EagleDoc, currently offer complete AI-based solutions for data extraction. SAP, for instance, employs a document reader to parse the incoming invoice documents and to identify and categorize the relevant data. AI-driven OCR (Optical Character Recognition) software identifies and captures the invoice data and cross-references it with vendor master records. Using the existing master data, the invoice can be quickly assigned to the appropriate supplier and the responsible employee. The classification software also interprets invoice line items and their values, so they can be matched immediately against the corresponding order data. What all providers have in common, however, is that invoice processing is iterative: recognition quality is refined continuously over time.
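
The cross-referencing step can be pictured as a lookup of an extracted value in the vendor master data. The sketch below is purely illustrative and does not reflect how any of the named vendors implement it. The class VendorMatcher, the vendor ID "V-1001", and the choice of the IBAN as the lookup key are all assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: assign an invoice to a supplier by looking up the extracted IBAN.
final class VendorMatcher {

    // Vendor master data, reduced here to a map from normalized IBAN to vendor ID.
    private final Map<String, String> vendorIdByIban = new HashMap<>();

    void register(String vendorId, String iban) {
        vendorIdByIban.put(normalize(iban), vendorId);
    }

    // Returns the vendor ID for an extracted IBAN, or an empty Optional if no vendor is known.
    Optional<String> match(String extractedIban) {
        return Optional.ofNullable(vendorIdByIban.get(normalize(extractedIban)));
    }

    // OCR output may differ in spacing and letter case, so both sides are normalized.
    private static String normalize(String iban) {
        return iban.replaceAll("\\s", "").toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        VendorMatcher matcher = new VendorMatcher();
        matcher.register("V-1001", "DE89 3704 0044 0532 0130 00");
        System.out.println(matcher.match("de89370400440532013000"));
    }
}
```

An invoice whose IBAN yields no match would typically be routed to manual review rather than paid automatically.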

In the alternative approach, data fields are extracted and identified on the basis of European or national invoice standards. At the European level, one notable initiative is PEPPOL (Pan-European Public Procurement Online). It defines a common invoice standard (the PEPPOL format) to streamline trade across member states, and the format is recognized and endorsed by authorities in numerous member states.

Additionally, to facilitate domestic trade and adhere to EU directives at the national level, individual countries have established their national invoice standards alongside recognized European standards like the PEPPOL format. For instance, Austria has implemented the "ebInterface" standard, while Germany has adopted "ZUGFeRD" (Zentraler User Guide des Forums elektronische Rechnung Deutschland), serving as its national invoice standard.

Next, we will delve into the technical intricacies of these invoice standards, exploring available formats and their utilization in extracting and identifying data from business invoices.

The conventional European method for electronic invoicing revolves around the XML format. This entails the creation of each invoice in XML format, subsequently transmitted to the recipient for seamless automated processing. The national invoice standard dictates the specific XML format required for such invoices.

This structured data format facilitates the automated processing of invoices. The European standard EN 16931 specifies two XML formats for electronic invoices:

  • UN/CEFACT XML CII (Cross Industry Invoice)
  • UBL 2.1 Invoice (Universal Business Language, ISO/IEC 19845)
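
Because both formats are plain XML, individual fields can be read with standard XML tooling. The following sketch uses only the JDK's built-in XML and XPath APIs to read a few fields from a UBL invoice. The class name UblInvoiceFields and the sample invoice are made up for illustration, and the sample is heavily shortened; a real EN 16931 invoice contains many more mandatory elements.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Hypothetical sketch: read selected fields from a UBL invoice with JDK-only XML APIs.
final class UblInvoiceFields {

    // Made-up, shortened sample using the UBL 2.1 namespaces.
    static final String SAMPLE =
          "<Invoice xmlns=\"urn:oasis:names:specification:ubl:schema:xsd:Invoice-2\""
        + " xmlns:cbc=\"urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2\""
        + " xmlns:cac=\"urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2\">"
        + "<cbc:ID>RE-0001</cbc:ID>"
        + "<cbc:IssueDate>2024-02-06</cbc:IssueDate>"
        + "<cbc:DocumentCurrencyCode>EUR</cbc:DocumentCurrencyCode>"
        + "<cac:LegalMonetaryTotal>"
        + "<cbc:PayableAmount currencyID=\"EUR\">119.00</cbc:PayableAmount>"
        + "</cac:LegalMonetaryTotal>"
        + "</Invoice>";

    // Parses the XML namespace-aware and rejects DOCTYPE declarations as a precaution.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse invoice XML", e);
        }
    }

    // Follows a path of element names from the root, matching by local name only,
    // and returns the text of the first matching element (empty string if none).
    static String read(Document doc, String... elementPath) {
        StringBuilder expr = new StringBuilder();
        for (String name : elementPath) {
            expr.append("/*[local-name()='").append(name).append("']");
        }
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expr.toString(), doc).trim();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expr, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println("Invoice number: " + read(doc, "Invoice", "ID"));
        System.out.println("Issue date: " + read(doc, "Invoice", "IssueDate"));
        System.out.println("Currency: " + read(doc, "Invoice", "DocumentCurrencyCode"));
        System.out.println("Payable amount: " + read(doc, "Invoice", "LegalMonetaryTotal", "PayableAmount"));
    }
}
```

Matching on local names keeps the sketch short. Production code would more likely bind the namespaces explicitly and validate the document against the standard before trusting its values.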

Unlocking Data Extraction in Business Invoices With “ZUGFeRD”: Insights From the Mustang Initiative

Consider "ZUGFeRD" as a prime illustration for extracting data fields from business invoices. As per findings from the open-source initiative "Mustang," approximately 43% of companies in Germany currently transmit electronic invoices, with 45% of those utilizing the ZUGFeRD/Factur-X format.

The "Mustang" endeavor comprises an open-source Java (Jar or Maven) and .NET library, offering a suite of tools encompassing reading, editing, and validating ZUGFeRD invoices.

Suppose we possess an invoice in PDF format adhering to the "ZUGFeRD" standard. Below is an excerpt of Java code illustrating how individual data fields can be extracted from the invoice:

Java
 
import org.mustangproject.ZUGFeRD.ZUGFeRDImporter; // package name as used in Mustang 2.x

public class ZUGFeRDReader {

    public static void main(String[] args) {

        // Load the PDF and read the invoice XML embedded in it
        ZUGFeRDImporter zi = new ZUGFeRDImporter("./MustangGnuaccountingBeispielRE-20201121_508.pdf");

        // Only read fields if embedded ZUGFeRD data could be parsed
        if (zi.canParse()) {
            System.out.println("Total Amount: " + zi.getAmount());
            System.out.println("BIC: " + zi.getBIC());
            System.out.println("IBAN: " + zi.getIBAN());
            System.out.println("Holder Name: " + zi.getHolder());
            System.out.println("Invoice Number: " + zi.getForeignReference());
            System.out.println("Invoice Date: " + zi.getInvoiceDate());
            System.out.println("Invoice Due Date: " + zi.getDueDate());
            System.out.println("Currency: " + zi.getCurrency());
            System.out.println("Tax ID: " + zi.getTaxID());
            System.out.println("Customer Reference: " + zi.getCustomerReference());
        } else {
            System.out.println("Invoice is not in the ZUGFeRD format");
        }
    }
}

Conclusion

In conclusion, with the escalating adoption of intelligent document processing in everyday business operations, the discourse on e-invoicing becomes increasingly unavoidable. We've discerned that the accurate recognition of data fields within an invoice is pivotal for electronic processing, with two distinct methodologies at play: AI and e-invoicing.

The AI-based approach proves efficacious for handling unstructured data formats like TIF, JPEG, Word documents, or email texts, while the e-invoice strategy excels in managing hybrid invoice formats such as "ZUGFeRD" and structured data available in XML files.

As observed, there's mounting public pressure to embrace standardized e-invoices. Additionally, the AI-based data collection process is iterative, initially fraught with a notable error margin, necessitating substantial time and resources for refinement through training. Conversely, the e-invoice approach enables direct and near-error-free data extraction, presumably translating into lower processing costs.

To remain future-proof, companies offering digital solutions for automated business invoice processing must diversify their offerings to encompass both AI and e-invoice approaches.

Opinions expressed by DZone contributors are their own.
