
4.3 Summarize documents

Azure Cognitive Services, specifically the Azure Language Service's Summarization capability, provides advanced extractive text summarization to condense complex documents like Statements of Work (SoWs) into concise summaries.

By integrating this capability with the PostgreSQL Azure_AI extension, you can dynamically generate summaries for documents stored in relational tables, streamlining workflows and improving document accessibility.

Summarizing Statements of Work (SoWs)

SoWs in the financial industry often contain extensive details about project scope, deliverables, and milestones. Summarizing these documents into a few sentences allows decision-makers to quickly grasp the key information without reading through lengthy documents.

Key Benefits of Summarization

  • Time Efficiency: Quickly identify critical information from long-form documents.
  • Enhanced Accessibility: Summaries provide concise overviews, improving decision-making processes.
  • Scalable Automation: Automatically generate summaries for large volumes of documents without manual intervention.

Azure's Summarization API within the Language Service supports extractive summarization, which selects the sentences that best represent the document and returns them verbatim. The result conveys the document's essence more fully than a list of key phrases.
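To make the idea of extractive summarization concrete, here is a minimal Python sketch. It is not the algorithm used by the Azure Language Service; it is a toy illustration that scores sentences by average word frequency, and the function name toy_extractive_summary is hypothetical. It shows the defining property of the extractive approach: every sentence in the summary is copied unchanged from the source text.

```python
import re
from collections import Counter
from typing import List


def toy_extractive_summary(text: str, sentence_count: int = 2) -> List[str]:
    """Toy illustration only: return the highest-scoring sentences verbatim."""
    # Split on sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Count how often each word appears across the whole document.
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        # Average document-wide frequency of the words in this sentence.
        words = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence positions by score, keep the top N, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:sentence_count])
    return [sentences[i] for i in keep]
```

Calling toy_extractive_summary(sow_text, 2) returns at most two sentences, each taken directly from sow_text and listed in their original order.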


Using Azure_AI Extension with the azure_cognitive Schema

The Azure_AI extension integrates Azure Cognitive Services' Summarization capabilities directly into SQL workflows, allowing the generation of extractive summaries of SoWs or other financial documents using simple SQL commands.

Extractive Summarization

Extractive summarization selects the most representative sentences from the original text and returns them verbatim, rather than generating new wording. To use it, call the azure_cognitive.summarize_extractive function from within the database. The third argument sets the number of sentences to return, so the example below yields at most two sentences from the text passed in.

SQL
SELECT azure_cognitive.summarize_extractive('This is a document text', 'en', 2);

Consider the following PostgreSQL table:

SQL
CREATE TABLE IF NOT EXISTS sows (
    id BIGSERIAL PRIMARY KEY,
    number text NOT NULL,
    vendor_id BIGINT NOT NULL,
    start_date DATE NOT NULL,
    end_date DATE NOT NULL,
    budget DECIMAL(18,2) NOT NULL,
    document text NOT NULL,
    metadata jsonb,
    summary text,
    FOREIGN KEY (vendor_id) REFERENCES vendors (id) ON DELETE CASCADE
);

SQL
-- Populate the summary column for every row by summarizing the metadata JSON cast to text
UPDATE sows
SET summary = azure_cognitive.summarize_extractive(metadata::text, 'en', 2);

Insert Document Summary on Database Insert

Using the azure_cognitive.summarize_extractive function of the azure_ai extension, the database scripts can generate a document summary as part of an INSERT or UPDATE statement.

Here's an example INSERT script used by the application when creating SOW records that includes the summarization:

SQL
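-- $7 is the full document text; it feeds both the embeddings and the summary.
-- Note: this statement assumes the sows table also has an embeddings column,
-- which the CREATE TABLE statement shown earlier does not include.
INSERT INTO sows (number, start_date, end_date, budget, document, metadata, embeddings, summary, vendor_id)
VALUES (
    $1, $2, $3, $4, $5, $6,
    azure_openai.create_embeddings('embeddings', $7, throw_on_error => FALSE, max_attempts => 1000, retry_delay_ms => 2000),
    azure_cognitive.summarize_extractive($7, 'en', 2),
    $8)
RETURNING *;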

API Implementation

The /sows/ HTTP POST method of the REST API inserts or updates SOWs based on the uploaded document. The code is in the src/api/app/routers/sows.py file. Open it now in Visual Studio Code and explore the async def analyze_sow function, which contains the code to ingest SOW documents, including the portion that performs the database INSERT or UPDATE on the sows table.

The excerpt below shows the specific section of code that makes the azure_ai call to generate the document summary within the database INSERT and UPDATE statements.

INSERT / UPDATE SOW with document summary generation
src/api/app/routers/sows.py
# Create SOW in the database
async with pool.acquire() as conn:
    if sow_id is None:
        # Create new SOW
        row = await conn.fetchrow('''
            INSERT INTO sows (number, start_date, end_date, budget, document, metadata, summary, vendor_id)
            VALUES (
            $1, $2, $3, $4, $5, $6, 
            azure_cognitive.summarize_extractive($7, 'en', 2),
            $8)
            RETURNING *;
        ''', sow_number, start_date, end_date, budget, documentName, json.dumps(metadata), full_text, vendor_id)
    else:
        # Update existing SOW with new document
        row = await conn.fetchrow('''
            UPDATE sows
            SET start_date = $1,
                end_date = $2,
                budget = $3,
                document = $4,
                metadata = $5,
                summary = azure_cognitive.summarize_extractive($6, 'en', 2)
            WHERE id = $7
            RETURNING *;
        ''', start_date, end_date, budget, documentName, json.dumps(metadata), full_text, sow_id)
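The INSERT and UPDATE branches above differ in both their SQL text and the order of their positional arguments, which makes a misplaced placeholder easy to introduce. One possible refactoring, sketched below, separates statement construction from execution so the placeholder-to-argument mapping can be checked without a database. The helper name build_sow_statement and the constants INSERT_SOW and UPDATE_SOW are hypothetical and do not appear in the workshop code; the SQL itself mirrors the excerpt above.

```python
import json
from typing import Any, Optional, Tuple

# SQL mirrors the excerpt above; $7 (INSERT) and $6 (UPDATE) carry the full document text.
INSERT_SOW = """
    INSERT INTO sows (number, start_date, end_date, budget, document, metadata, summary, vendor_id)
    VALUES ($1, $2, $3, $4, $5, $6,
            azure_cognitive.summarize_extractive($7, 'en', 2),
            $8)
    RETURNING *;
"""

UPDATE_SOW = """
    UPDATE sows
    SET start_date = $1,
        end_date = $2,
        budget = $3,
        document = $4,
        metadata = $5,
        summary = azure_cognitive.summarize_extractive($6, 'en', 2)
    WHERE id = $7
    RETURNING *;
"""


def build_sow_statement(
    sow_id: Optional[int],
    sow_number: str,
    start_date: Any,
    end_date: Any,
    budget: Any,
    document_name: str,
    metadata: dict,
    full_text: str,
    vendor_id: int,
) -> Tuple[str, Tuple[Any, ...]]:
    """Hypothetical helper: return the SQL text and positional arguments to run."""
    metadata_json = json.dumps(metadata)
    if sow_id is None:
        # New SOW: eight placeholders, ending with the vendor id.
        return INSERT_SOW, (sow_number, start_date, end_date, budget,
                            document_name, metadata_json, full_text, vendor_id)
    # Existing SOW: seven placeholders, ending with the row id.
    return UPDATE_SOW, (start_date, end_date, budget,
                        document_name, metadata_json, full_text, sow_id)
```

With this split, the calling code would reduce to sql, args = build_sow_statement(...) followed by row = await conn.fetchrow(sql, *args), and the argument ordering for each branch can be covered by a plain unit test.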

Additional Learning References