ESG Information Extraction and Analysis System
Make ESG Great
🐙 Byte Buster
Diving Into the Realms of ESG
Uncover the intricate interplay of environmental data, discover social insights for informed decisions, and extract valuable governance insights for positive change.
Project Overview
Main features:

1

AI-Powered Backend
Cutting-edge AI drives deep ESG data extraction and analysis, unlocking insights at scale.

2

Interactive Frontend
Immerse clients in a dynamic ESG data experience, empowering informed decision-making.

3

BERT Integration
Simplifying natural language processing for seamless extraction and analysis of ESG data.

4

Comprehensive Scoring
Expert ESG assessment tailored for the financial sector, providing a holistic view of sustainability.
WHY CHOOSE US

ONLINE PROCESS!

ONE-CLICK UPLOAD!

REAL-TIME INSIGHT!

MULTI-DIMENSIONAL ASSESSMENT!

Architecture
The project is organized into two parts, the backend and the frontend, each detailed in the sections that follow.

Backend

Frontend

Thank You For Watching!
The following pages are the hyperlinked sections of the pages above.
Bulk PDF Download

1

Automated Scans
The system reads keywords from a company list and automatically scours the web for the latest sustainability reports, like a high-speed document matchmaker.

2

Seamless Retrieval
The first relevant PDF match is quickly snagged and saved locally, streamlining the data collection process.

3

Comprehensive Logging
A detailed log tracks each download success or failure, providing a scorecard of the ESG report collection efforts.

4

Visual Progress Tracking
The download progress is displayed in real-time using tqdm, like a parade of PDFs marching in.
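The four download steps above can be sketched as follows. This is a minimal illustration, not the project's code: the web search and the HTTP fetch are passed in as hypothetical callables (`search`, `fetch`) so the loop logic stays testable, and the query string and file naming are assumptions. It keeps the first PDF hit per company, logs each success or failure, and wraps the loop in `tqdm` when it is installed.

```python
import logging
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Optional

try:
    from tqdm import tqdm  # progress bar, as used by the project
except ImportError:  # fall back to a plain iterator when tqdm is not installed
    def tqdm(iterable, **kwargs):
        return iterable

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
logger = logging.getLogger("bulk_pdf_download")


def first_pdf_link(urls: Iterable[str]) -> Optional[str]:
    """Return the first URL whose path ends in .pdf, or None."""
    for url in urls:
        if url.lower().split("?", 1)[0].endswith(".pdf"):
            return url
    return None


def bulk_download(
    companies: List[str],
    search: Callable[[str], List[str]],  # hypothetical: query -> result URLs
    fetch: Callable[[str], bytes],       # hypothetical: URL -> file bytes
    out_dir: str,
) -> Dict[str, bool]:
    """Download the first PDF hit per company and log each success or failure."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    scorecard: Dict[str, bool] = {}
    for company in tqdm(companies, desc="Downloading reports"):
        try:
            url = first_pdf_link(search(f"{company} sustainability report pdf"))
            if url is None:
                raise LookupError("no PDF in search results")
            (target / f"{company}.pdf").write_bytes(fetch(url))
            logger.info("OK   %s <- %s", company, url)
            scorecard[company] = True
        except Exception as exc:  # log the failure and move on to the next company
            logger.warning("FAIL %s: %s", company, exc)
            scorecard[company] = False
    return scorecard
```

The returned dictionary doubles as the success/failure scorecard; in practice `search` could wrap any search API and `fetch` any HTTP client.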
PDF Parse (AI-driven)
PDF Sorcery
Converts each page into an image and uses RapidLayout to detect tables and charts for precise extraction.
Image Whisperer
Uses Gemini-Flash, Google's powerful AI model, to transform the captured images into clean Markdown format. Multithreading boosts the speed.
Comprehensive Logging
Tracks the journey of each PDF, recording every step of the conversion process for detailed reporting and troubleshooting.
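The multithreaded conversion step can be sketched with Python's `ThreadPoolExecutor`. This sketch assumes the pages have already been rendered to image bytes (the rendering and RapidLayout steps are not shown), and `image_to_markdown` is a hypothetical callable standing in for the Gemini-Flash request. Each page's outcome is logged, and a failed page yields an empty string instead of aborting the whole PDF.

```python
import logging
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

logger = logging.getLogger("pdf_parse")


def pages_to_markdown(
    page_images: List[bytes],
    image_to_markdown: Callable[[bytes], str],  # hypothetical wrapper around a Gemini-Flash call
    max_workers: int = 4,
) -> str:
    """Convert page images to Markdown in parallel, keeping the original page order."""

    def convert(item: Tuple[int, bytes]) -> str:
        index, image = item
        try:
            markdown = image_to_markdown(image)
            logger.info("page %d converted (%d chars)", index + 1, len(markdown))
            return markdown
        except Exception as exc:  # one bad page should not abort the whole PDF
            logger.error("page %d failed: %s", index + 1, exc)
            return ""

    # Model calls are I/O-bound, so threads overlap the network waits;
    # pool.map yields results in input order regardless of completion order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pages = list(pool.map(convert, enumerate(page_images)))
    return "\n\n".join(pages)
```

Threads (rather than processes) are a reasonable choice here because each worker spends most of its time waiting on a remote model.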
Data Cleaning: Data Detox

1

Table Transformation
Converts Markdown tables into clean, readable text format.

2

Markdown Makeover
Strips away distracting formatting to reveal the pure text content.

3

Unit Unification
Standardizes units and measurements for consistent data representation.

4

Text Tidying
Removes pesky characters and spaces to create a polished, professional look.

5

Sentence Splitting
Intelligently separates sentences without breaking apart abbreviations.

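Data Detox: A Sparkling Transformation
Together, the five steps above make up the Data Detox module, which turns raw Markdown into clean, sentence-level text ready for analysis.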
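A minimal sketch of the five cleaning steps, using only regular expressions. The abbreviation list, the unit map and the "header : value" table format are illustrative assumptions, not the project's actual rules; a production cleaner would need far larger lists.

```python
import re
from typing import List

# Illustrative lists only; a real system would need much larger ones.
ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "no.", "co.", "ltd.", "inc.", "mr.", "dr.", "vs."}
UNIT_MAP = {
    r"\bmwh\b": "megawatt-hours",
    r"\btco2e\b": "tonnes of carbon dioxide equivalent",
}
TABLE_BLOCK = re.compile(r"(?:^\|.*\|[ \t]*(?:\n|$))+", re.M)
SEPARATOR_ROW = re.compile(r"\s*\|?[\s:|-]+\|?\s*")


def table_to_text(md_table: str) -> str:
    """1. Flatten a Markdown table into 'header : value , header : value' rows."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in md_table.strip().splitlines()
        if not SEPARATOR_ROW.fullmatch(line)  # skip the |---|---| row
    ]
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    return " : ".join(" , ".join(f"{h} : {v}" for h, v in zip(header, row)) for row in body)


def strip_markdown(text: str) -> str:
    """2. Remove heading marks, link syntax and bold/italic/code markers."""
    text = re.sub(r"^#{1,6}[ \t]*", "", text, flags=re.M)
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)
    return re.sub(r"\*\*|__|\*|`", "", text)


def unify_units(text: str) -> str:
    """3. Map unit spellings onto one canonical form."""
    for pattern, canonical in UNIT_MAP.items():
        text = re.sub(pattern, canonical, text, flags=re.I)
    return text


def tidy(text: str) -> str:
    """4. Replace non-breaking spaces and collapse runs of spaces/tabs."""
    text = text.replace("\u00a0", " ")
    return re.sub(r"[^\S\n]+", " ", text).strip()


def split_sentences(text: str) -> List[str]:
    """5. Split after . ! ? followed by whitespace, unless the word is a known abbreviation."""
    sentences, start = [], 0
    for match in re.finditer(r"[.!?](?=\s|$)", text):
        chunk = text[start:match.end()]
        if chunk.split()[-1].lower() in ABBREVIATIONS:
            continue  # e.g. "e.g." does not end a sentence
        sentences.append(chunk.strip())
        start = match.end()
    if text[start:].strip():
        sentences.append(text[start:].strip())
    return sentences


def detox(markdown: str) -> List[str]:
    """Run the five steps in order and return clean sentences."""
    text = TABLE_BLOCK.sub(lambda m: table_to_text(m.group(0)) + "\n", markdown)
    return split_sentences(tidy(unify_units(strip_markdown(text))))
```

Tables are flattened first so that the Markdown stripping in step 2 cannot destroy their pipe structure.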
Text Filter: Streamlining the Process

Efficient Classification
The Text Filter module uses the BERT model ‘yiyanghkust/finbert-esg’ from Hugging Face to classify each sentence in annual reports as Environmental, Social, Governance, or None.

Reduced Analysis
This AI-powered model quickly identifies relevant ESG-related information, reducing the volume of text that needs further analysis.
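The filtering logic can be sketched independently of the model. Here `classify` is any callable that returns one `{"label", "score"}` dict per sentence, assuming the model's four labels are `Environmental`, `Social`, `Governance` and `None`; the `min_score` cut-off is a hypothetical addition. The comment at the end shows how a Hugging Face `pipeline` could be plugged in.

```python
from typing import Callable, Dict, List

Prediction = Dict[str, object]  # {"label": str, "score": float}


def filter_esg_sentences(
    sentences: List[str],
    classify: Callable[[List[str]], List[Prediction]],
    min_score: float = 0.5,  # hypothetical confidence cut-off
) -> List[Dict[str, object]]:
    """Keep sentences labelled Environmental/Social/Governance with enough confidence."""
    kept = []
    for sentence, pred in zip(sentences, classify(sentences)):
        if pred["label"] != "None" and pred["score"] >= min_score:
            kept.append({"text": sentence, "label": pred["label"], "score": pred["score"]})
    return kept


# With Hugging Face transformers installed, `classify` could be a pipeline:
#   from transformers import pipeline
#   classify = pipeline("text-classification", model="yiyanghkust/finbert-esg")
#   esg_only = filter_esg_sentences(sentences, classify)
```

Only the kept sentences move on to annotation and extraction, which is where the reduction in analysis volume comes from.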
✍️ Text Annotate (AI-driven)
Annotation Awesomeness
Leverages Gemini-Flash, Google's powerful AI model, for text annotation, turning words into JSON gems.
Main Features
Text Annotation: Tags each token with the BIO scheme (Begin, Inside, Outside).
Log Everything: Tracks every annotation adventure.
Highlights
Automated Annotation: Annotates with AI flair, freeing your hands.
Multithreading Power: Speedy and efficient.
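One way to turn an LLM's JSON answer into BIO training data is sketched below. It assumes a hypothetical reply format `{"entities": [{"text": ..., "label": ...}]}` and whitespace tokenization; the entity labels in the usage are illustrative. Each entity span is matched to the first unclaimed run of tokens and tagged `B-` on its first token and `I-` on the rest.

```python
import json
from typing import Dict, List, Tuple


def spans_to_bio(tokens: List[str], entities: List[Dict[str, str]]) -> List[str]:
    """Tag the first unclaimed occurrence of each entity span with B-/I- labels."""
    tags = ["O"] * len(tokens)
    for entity in entities:
        span = entity["text"].split()
        n = len(span)
        if n == 0:
            continue  # ignore empty entity strings
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == span and all(t == "O" for t in tags[i:i + n]):
                tags[i] = f"B-{entity['label']}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{entity['label']}"
                break
    return tags


def annotate(sentence: str, llm_reply: str) -> List[Tuple[str, str]]:
    """Turn an LLM's JSON entity list into (token, BIO tag) pairs."""
    tokens = sentence.split()
    entities = json.loads(llm_reply)["entities"]
    return list(zip(tokens, spans_to_bio(tokens, entities)))
```

For example, annotating "total scope 1 emissions were 582 tonnes in 2023" with entities `scope 1` (METRIC), `582` (VALUE) and `tonnes` (UNIT) yields `B-METRIC I-METRIC` on "scope 1" and `O` on the untagged tokens.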
NER Recognize
Our NER model accurately identifies key entities like names and metrics within text. It's built upon the powerful bert-base-NER model from Hugging Face.
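A BERT token-classification model predicts one BIO tag per token, so the predictions still have to be grouped into entities. The sketch below assumes word-level tokens (sub-word merging is not shown) and treats a stray `I-` tag with no matching `B-` as the start of a new entity, which is one common convention rather than the project's confirmed behavior.

```python
from typing import List, Tuple


def bio_to_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity text, entity type) pairs."""
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    label = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and label == tag[2:]:
            current.append(token)  # continue the open entity
            continue
        if current:
            entities.append((" ".join(current), label))  # close the open entity
        if tag.startswith(("B-", "I-")):
            current, label = [token], tag[2:]  # a stray I- also starts a new entity
        else:
            current, label = [], None
    if current:
        entities.append((" ".join(current), label))
    return entities
```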
Data Enhancement
We enhanced our training data of 33 annotated Singapore financial reports by:
  • Merging similar labels that had few training examples
  • Keeping only 15% of non-entity sentences (sentences with no labeled entities)
  • Oversampling low-frequency tags
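The three enhancement steps can be sketched on data stored as `(tokens, BIO tags)` pairs. The label mapping, the rarity threshold and the oversampling factor are hypothetical parameters; only the 15% retention rate comes from the slide. A seeded random generator keeps the downsampling reproducible.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

Example = Tuple[List[str], List[str]]  # (tokens, BIO tags)


def remap(tag: str, label_map: Dict[str, str]) -> str:
    """Rename the entity type in a BIO tag, e.g. B-GHG -> B-EMISSION."""
    if tag == "O":
        return tag
    prefix, label = tag.split("-", 1)
    return f"{prefix}-{label_map.get(label, label)}"


def enhance(
    data: List[Example],
    label_map: Dict[str, str],   # hypothetical: small label -> similar larger label
    keep_empty: float = 0.15,    # share of entity-free sentences kept
    rare_threshold: int = 50,    # hypothetical: labels seen fewer times count as rare
    oversample_factor: int = 3,  # hypothetical: total copies of sentences with rare labels
    seed: int = 0,
) -> List[Example]:
    rng = random.Random(seed)
    # 1. Merge similar labels.
    merged = [(tokens, [remap(t, label_map) for t in tags]) for tokens, tags in data]
    # 2. Keep every sentence with an entity, but only ~15% of entity-free ones.
    kept = [ex for ex in merged if any(t != "O" for t in ex[1]) or rng.random() < keep_empty]
    # 3. Oversample sentences containing a low-frequency label.
    counts = Counter(t[2:] for _, tags in kept for t in tags if t.startswith("B-"))
    rare = {label for label, c in counts.items() if c < rare_threshold}
    boosted = [ex for ex in kept if any(t.startswith("B-") and t[2:] in rare for t in ex[1])]
    return kept + boosted * (oversample_factor - 1)
```

The output is not shuffled; shuffling before training would keep the duplicated examples from clustering at the end of each epoch.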
NER Model Performance
Our NER model delivers impressive results:
  • Precision: 84.99%
  • Recall: 86.08%
  • F1-Score: 85.26%
These metrics show that the model reliably identifies and extracts key entities.
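For reference, the F1-score is the harmonic mean of precision $P$ and recall $R$. Applied directly to the aggregate precision and recall above, it gives about 85.53%, slightly above the reported 85.26%. That gap is consistent with the scores being averaged over entity types, since the mean of per-type F1 scores need not equal the F1 of the averaged precision and recall; the slides do not state the averaging method, so this is an inference.

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 84.99 \times 86.08}{84.99 + 86.08}
    \approx 85.53\%
```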
Additional Models Explored
We fine-tuned two additional models:
  • nbroad/ESG-BERT - A 110M-parameter BERT model for ESG text categorization; it performed slightly worse than the base NER model.
  • bert-large-NER - A larger 340M-parameter model that performed slightly better but required significantly higher training costs.
NER Model Performance (figures: Loss vs. Epoch, Accuracy vs. Epoch)
Metric Result
Our system processes raw ESG annual report text and identifies key entities.
Input:
"our target is to have 100 % of our branches certified green mark platinum in singapore by the end of 2024 . . table 1 : our energy and carbon footprint 1 : 2023 : energy : total energy consumption megawatt-hours 2 , singapore : 65,832 , hong kong : 10,041 , china : 5,480 , taiwan : 6,394 , india : 19,313 , indonesia : 12,670 , international centres : 1,627 3 , total : 121,357 : energy intensity by total income megawatt-hours/ singapore dollar million , singapore : 4.91 , hong kong : 3.12 , china : 8.88 , taiwan : 11.69 , india : 29.76 , indonesia : 19.37 , international centres : 1.90 , total : 6.09 : a from non-renewables megawatt-hours 4 , singapore : 64,816 , hong kong : 10,041 , china : 5,480 , taiwan : 6,331 , india : 19,135 , indonesia : 12,408 , international centres : 1,627 , total : 119,838 : b from renewables production megawatt-hours , singapore : 1,016 , hong kong : 0 , china : 0 , taiwan : 63 , india : 178 , indonesia : 262 , international centres : 0 , total : 1,519 : purchased renewable energy certificates megawatt-hours 5 , singapore : 0 , hong kong : 10,100 , china : 5,500 , taiwan : 0 , india : 19,200 , indonesia : 13,000 , international centres : 0 , total : 47,800 : carbon 6 : total carbon emissions , market-based tonnes of carbon dioxide equivalent 7 = a + b ii + c , singapore : 37,402 , hong kong : 2,299 , china : 2,180 , taiwan : 3,245 , india : 3,886 , indonesia : 3,084 , international centres : 1,350 , total : 53,446 : total carbon emission intensity , market-based , by total income tonnes of carbon dioxide equivalent/ singapore dollar million 8 , singapore : 2.79 , hong kong : 0.72 , china : 3.53 , taiwan : 5.93 , india : 5.99 , indonesia : 4.72 , international centres : 1.57 , total : 2.68 : a scope 1 tonnes of carbon dioxide equivalent 9 , singapore : 217 , hong kong : 22 , china : 0 , taiwan : 0 , india : 147 , indonesia : 196 , international centres : 0 , total : 582 : b scope 2 tonnes of carbon dioxide equivalent 10 : i . 
gross location-based , singapore : 16,265 , hong kong : 4,980 , china : 2,068 , taiwan : 2,653 , india : 11,852 , indonesia : 7,185 , international centres : 794 , total : 45,797 "
Output (scripted results):
Data Retrieval (AI-driven): Data Detective
LLM Magic
Leverages the power of large language models to extract relevant information from ESG reports.
Long Context Window
Can process up to one million tokens, enabling the analysis of even the most comprehensive reports.
Structured Output
Provides results in a structured format, such as JSON, for easy analysis and integration with other systems.
Automated Sleuthing
Automates the retrieval process, freeing users from manual data extraction tasks.
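LLM replies often wrap JSON in a Markdown code fence, so the structured output needs a small parsing step before it can be used. The sketch below assumes a hypothetical record schema (`metric`, `value`, `unit`, `year`) and does not show the model call itself. Absent keys are filled with `None` so that a later validation step can flag them rather than the parser failing.

```python
import json
import re
from typing import Any, Dict, List

# Hypothetical schema: one record per extracted ESG data point.
REQUIRED_KEYS = ("metric", "value", "unit", "year")

# Matches an opening fence (optionally tagged json) or a closing fence.
FENCE = re.compile(r"^\s*`{3}(?:json)?\s*|\s*`{3}\s*$")


def parse_llm_json(raw: str) -> List[Dict[str, Any]]:
    """Parse an LLM reply into records, filling absent keys with None."""
    data = json.loads(FENCE.sub("", raw.strip()))
    records = data if isinstance(data, list) else [data]
    # Keys the model omitted become None; extra keys are preserved.
    return [{**dict.fromkeys(REQUIRED_KEYS), **rec} for rec in records]
```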
Data Validation: Ensuring Accuracy

1

Validation System
Ensures data accuracy and reliability.
  • Uses automated checks and rules to identify potential errors.
  • Checks for missing fields, repeated units, and inconsistent units, and cross-validates related metrics.

2

"Human-in-the-loop"
Flags potential errors in the data for human review:
  • Missing or incomplete information
  • Inconsistencies or discrepancies

3

Confidence Scoring
Calculates a confidence score for each report.
  • Based on factors such as data consistency and data quality.
  • Allows data to be prioritized for review or correction.
  • Results: the confidence scores of data points for each report fall between 0.75 and 1, indicating good extraction performance.
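The checks above can be sketched as simple rules plus a per-report score. Scoring a report as the share of checks it passes is a hypothetical formula, not the project's. The cross-validation idea is easy to confirm on the sample input in the Metric Result section: the seven regional energy figures (65,832 + 10,041 + 5,480 + 6,394 + 19,313 + 12,670 + 1,627) sum to the reported total of 121,357 MWh, as do the non-renewable and renewable figures (119,838 + 1,519).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

REQUIRED_FIELDS = ("metric", "value", "unit")


def record_issues(rec: Dict) -> List[str]:
    """Per-record checks: missing fields and repeated unit words (e.g. 'MWh MWh')."""
    issues = [f"missing {field}" for field in REQUIRED_FIELDS if rec.get(field) in (None, "")]
    words = str(rec.get("unit") or "").lower().split()
    if len(words) != len(set(words)):
        issues.append("repeated unit")
    return issues


def sums_match(parts: Iterable[float], total: float, rel_tol: float = 0.01) -> bool:
    """Cross-validation: do the parts add up to the reported total (within 1%)?"""
    return abs(sum(parts) - total) <= rel_tol * abs(total)


def report_confidence(
    records: List[Dict], cross_checks: List[Tuple[List[float], float]]
) -> Tuple[float, List[str]]:
    """Hypothetical confidence score: share of checks a report passes."""
    issues: List[str] = []
    n_checks = 0
    units_by_metric = defaultdict(set)
    for rec in records:
        n_checks += len(REQUIRED_FIELDS) + 1  # field checks + repeated-unit check
        issues += [f"{rec.get('metric')}: {issue}" for issue in record_issues(rec)]
        if rec.get("unit"):
            units_by_metric[rec.get("metric")].add(str(rec["unit"]).lower())
    for metric, units in units_by_metric.items():
        n_checks += 1  # one inconsistent-unit check per metric
        if len(units) > 1:
            issues.append(f"{metric}: inconsistent units {sorted(units)}")
    for parts, total in cross_checks:
        n_checks += 1
        if not sums_match(parts, total):
            issues.append(f"cross-check failed: {sum(parts)} vs {total}")
    confidence = 1.0 - len(issues) / n_checks if n_checks else 0.0
    return confidence, issues
```

The returned issue list is what a human-in-the-loop reviewer would work through, starting with the lowest-confidence reports.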
Quantitative ESG Assessment: The Scoring Model
Structured Approach
The ESG scoring model uses a structured approach in which each ESG dimension comprises specific metrics, each with its own weight.
Fixed Baseline Averages
Fixed baseline averages use a set benchmark that doesn't change with new data, enhancing stability in scoring.
Impactful Metrics
Metrics are classified as positive or negative, reflecting their impact on ESG performance.
Z-Score Normalization
The system employs z-score normalization to compare company performance against industry benchmarks.
Neutral Score for Missing Data
The model assigns a neutral score of 5 on a 1-10 scale to missing data, so gaps neither reward nor penalize the final ESG score.
Intuitive Letter Rating System
Summarizes ESG performance with a letter rating from ‘AAA’ to ‘CCC,’ making it easy to interpret results and identify sustainability leaders and laggards.
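The scoring rules above can be combined into a short sketch. Fixed baselines, positive/negative direction, z-scores, the neutral 5 for missing data and the AAA to CCC scale come from the slides. The z-to-score mapping (one standard deviation equals two points) and the rating cut-offs are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Metric:
    weight: float
    baseline_mean: float  # fixed benchmark: not recomputed when new reports arrive
    baseline_std: float
    positive: bool        # True if a higher value is better for ESG performance


NEUTRAL = 5.0  # score assigned when a metric is missing
# Hypothetical cut-offs on the 1-10 scale, best rating first.
RATINGS = [(8.5, "AAA"), (7.5, "AA"), (6.5, "A"), (5.5, "BBB"), (4.5, "BB"), (3.5, "B")]


def metric_score(value: Optional[float], m: Metric) -> float:
    """Z-score against the fixed baseline, flipped for negative metrics, mapped to 1-10."""
    if value is None:
        return NEUTRAL
    z = (value - m.baseline_mean) / m.baseline_std
    if not m.positive:
        z = -z  # lower is better, e.g. carbon intensity
    return max(1.0, min(10.0, NEUTRAL + 2.0 * z))  # hypothetical: 1 std = 2 points


def dimension_score(values: Dict[str, Optional[float]], metrics: Dict[str, Metric]) -> float:
    """Weighted average of metric scores; absent metrics count as neutral."""
    total_weight = sum(m.weight for m in metrics.values())
    weighted = sum(m.weight * metric_score(values.get(name), m) for name, m in metrics.items())
    return weighted / total_weight


def letter_rating(score: float) -> str:
    """Map a 1-10 score to a letter rating."""
    for cutoff, letter in RATINGS:
        if score >= cutoff:
            return letter
    return "CCC"
```

Because the baselines are fixed, a company's score only changes when its own figures change, which is the stability property the slide describes.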
Greenwash Analysis (AI-driven): Greenwashing Guardian
Greenwashing Detection
Analyzes reports for greenwashing shenanigans. Identifies inconsistencies or exaggerated claims in sustainability reports.
Customized Rules
Uses Gemini-Flash to assess claims against established greenwashing criteria. Enables customization of greenwashing rules to fit specific industries or sectors.
Automated Vigilance
Guards against greenwashing with AI insight. Helps companies and investors ensure the legitimacy of sustainability claims.
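Customizable rules can be kept as plain data and assembled into the prompt, as in the sketch below. The rule texts, the industry key and the JSON reply format are all hypothetical, and `llm` stands in for a Gemini-Flash call. Adding a sector then only means adding an entry to `INDUSTRY_RULES`.

```python
import json
from typing import Callable, Dict, List

# Hypothetical criteria; the project's actual rule set is not given in the slides.
BASE_RULES: List[str] = [
    "Vague environmental language without measurable targets (e.g. 'eco-friendly').",
    "Targets without a baseline year or interim milestones.",
    "Claims not supported by figures reported elsewhere in the report.",
]
INDUSTRY_RULES: Dict[str, List[str]] = {
    "banking": ["Green-finance claims that ignore financed (Scope 3) emissions."],
}


def build_prompt(claim: str, industry: str) -> str:
    """Combine the base rules with any industry-specific rules into one prompt."""
    rules = BASE_RULES + INDUSTRY_RULES.get(industry, [])
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, 1))
    return (
        "Check the sustainability claim below against these greenwashing rules:\n"
        f"{numbered}\n\nClaim: {claim}\n\n"
        'Answer in JSON: {"violated_rules": [rule numbers], "explanation": "..."}'
    )


def assess_claim(claim: str, industry: str, llm: Callable[[str], str]) -> Dict:
    """Send the prompt to an LLM (e.g. a Gemini-Flash wrapper) and flag violations."""
    verdict = json.loads(llm(build_prompt(claim, industry)))
    return {"claim": claim, "flagged": bool(verdict.get("violated_rules")), **verdict}
```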
Real-time ESG Sleuth: Staying Ahead of the Curve

1

AI-Powered Search
Leverages Google Search to hunt down the latest ESG news.

2

Comprehensive Summary
Provides concise summaries of ESG-related news articles, covering both positive and negative perspectives.

3

Automated Updates
Constantly monitors for new ESG developments, ensuring you always have the most current information.
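The monitoring loop can be sketched as repeated searches that only pass unseen articles to the summarizer. The `search` and `llm` callables, the article shape and the hourly interval are assumptions; the slides only say that Google Search and an LLM summary are used. Injecting `sleep` lets the loop be tested without waiting.

```python
import time
from typing import Callable, Dict, List, Set

Article = Dict[str, str]  # hypothetical shape: {"url", "title", "snippet"}


def fetch_new_articles(company: str, search: Callable[[str], List[Article]], seen: Set[str]) -> List[Article]:
    """Run a news search and return only articles not seen in earlier rounds."""
    fresh: List[Article] = []
    for article in search(f"{company} ESG news"):
        if article["url"] not in seen:
            seen.add(article["url"])
            fresh.append(article)
    return fresh


def summarize_news(company: str, articles: List[Article], llm: Callable[[str], str]) -> str:
    """Ask an LLM for a summary that separates positive and negative developments."""
    listing = "\n".join(f"- {a['title']}: {a['snippet']}" for a in articles)
    prompt = (
        f"Summarize the latest ESG news about {company}. "
        "List positive and negative developments separately.\n" + listing
    )
    return llm(prompt)


def monitor(company, search, llm, rounds: int, interval_s: float = 3600.0,
            sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Poll for news `rounds` times and collect a digest whenever something new appears."""
    seen: Set[str] = set()
    digests: List[str] = []
    for i in range(rounds):
        fresh = fetch_new_articles(company, search, seen)
        if fresh:
            digests.append(summarize_news(company, fresh, llm))
        if i < rounds - 1:
            sleep(interval_s)
    return digests
```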
Home Page
Project Overview
Provides a comprehensive introduction to the ESG information extraction and analysis system.
Evaluate
The "Evaluate" button takes users to the Evaluate Page.
Help
Provides access to documentation and FAQs, ensuring users are guided effectively.
The Home Page acts as the entry point for users.
Evaluate Page
File Upload
The Evaluate page offers a user-friendly interface for uploading ESG-related documents, such as sustainability reports, corporate social responsibility statements, and other relevant files.
Data Processing
Once files are uploaded, the system runs its data processing workflow, checking that they are properly formatted and preparing them for analysis.
Prompt Management
The processed data is then seamlessly integrated into the Prompt Management System, serving as the foundation for further analysis and evaluation.
Model Page
Score Model Visualization
Visualizes the ESG score model, showing the weight and importance of various ESG factors.
Key Indicator Highlights
Provides a clear view of key ESG indicators, such as carbon emissions, social impact, and corporate governance practices.
Specific Case Details
Provides comprehensive data on specific ESG factors, performance trends, and any incidents or controversies for informed decision-making.
Analysis Page

1

Overview
The Analysis Page is the main dashboard for ESG metrics. It presents a comprehensive overview of overall, environmental, social, and governance scores. Users can easily navigate between these categories using buttons or tabs.

2

Environment
The page uses components like ESGBreakdown.tsx and MetricChart.tsx to show detailed environmental metrics with charts and progress indicators. Interactive charts and animations enhance user experience.

3

Social
The page uses components like ESGBreakdown.tsx and MetricChart.tsx to show detailed social metrics with charts and progress indicators. Interactive charts and animations enhance user experience.

4

Governance
This section focuses on a company's governance practices, including board composition, risk management, and regulatory compliance. It highlights key metrics related to corporate governance, such as transparency, accountability, and ethical behavior.
Validation Page
Workflow Page
This page provides a visual representation of the data processing pipeline, highlighting the various steps involved in extracting, cleaning, and analyzing ESG data.
Contact Page
Phone
+1 (555) 555-5555
Address
123 Main Street, Anytown, CA 12345