Extraction of tables from PDF documents

Last Updated: January 24, 2023

A lot of data exists only in PDF

At Canopy, our first preference is to get data through an API (or data feed) from the custodian.

Unfortunately, a large number of custodians (especially in Europe and Asia) are not yet able to provide investment data through APIs. They provide only their regular monthly statements, on paper or as PDFs.

Therefore, to get data from banks that do not offer APIs, Canopy has developed the ability to use these monthly bank statements directly as a data source. We prefer electronically generated PDFs, or ePDFs (i.e. the ones downloaded from the bank's website), but can also handle Print PDFs (i.e. scans of paper statements).

Interestingly, about 86% of investment data in Europe and Asia is available in PDF format only (the figure is around 15% for North America).

Large chunks of data are available only in PDF format

Bank statements have very complex tables

Multilayer column headers and nesting are the key issues
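
To make the multilayer header problem concrete, here is a minimal Python sketch of one common way to collapse stacked header rows into a single label per column. It is an illustration only and not a description of how Canopy's extraction works. The function name flatten_headers, the separator, and the sample column names are all hypothetical. The sketch assumes every header row has the same number of cells, and that a blank cell in a parent row continues the merged cell to its left.

```python
from typing import List


def flatten_headers(header_rows: List[List[str]], sep: str = " / ") -> List[str]:
    """Collapse stacked header rows into one label per column (illustrative only)."""
    filled: List[List[str]] = []
    for i, row in enumerate(header_rows):
        cells = [c.strip() for c in row]
        if i < len(header_rows) - 1:
            # Assumption: in a parent row, a blank cell continues the
            # merged header cell to its left, so forward-fill it.
            last = ""
            for j, c in enumerate(cells):
                if c:
                    last = c
                cells[j] = last
        filled.append(cells)

    labels: List[str] = []
    for parts in zip(*filled):
        # Join the non-empty levels for this column, skipping repeats.
        kept: List[str] = []
        for p in parts:
            if p and (not kept or kept[-1] != p):
                kept.append(p)
        labels.append(sep.join(kept))
    return labels


# Hypothetical two-level header, as it might come out of a statement table.
rows = [
    ["Security", "Market Value", "", "Cost", ""],
    ["", "Local", "USD", "Local", "USD"],
]
print(flatten_headers(rows))
```

Real statements are harder than this sketch suggests: merged cells are not always left-aligned over their children, and nested sub-tables can appear inside a single row, which is why header layering and nesting are called out as the key issues.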

Benchmarking of Canopy PDF Extraction against Adobe Acrobat

Canopy only extracts the relevant tables from the PDF document

Cells do not get merged in Canopy's extraction of data from PDF

Tables breaking across pages is not an issue

Multiple tables on the same page are also not an issue
