Populating the Azure AI Search Index

1. Upload the PDF document

Before populating the Azure AI Search index, we first need to upload a PDF document to the Azure Storage account.

The Git repository includes a small sample PDF document, which is a snippet about Azure Kubernetes Service (AKS). You can locate and view the PDF at ./sample-documents/Azure-Kubernetes-Service.pdf. Later, you’ll learn how to index your own PDF documents using this solution. For now, let’s start with this sample PDF.

Run the command below to upload the sample PDF document to Azure Blob Storage:


bash ./helper.sh upload-pdf

Let’s validate that the PDF document was successfully uploaded to the storage account. Follow these steps:

Click Storage browser
Click Blob containers and data

Observe that there is a raw_data directory. Now click the directory and validate that the Azure-Kubernetes-Service.pdf file is present.
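The same check can be done from the command line. The snippet below is a minimal sketch, not part of the repository: pdf_uploaded is a hypothetical helper, and it assumes the container is named data and the blob path is raw_data/Azure-Kubernetes-Service.pdf. The account key it expects can be retrieved with az storage account keys list.

```shell
# Sketch: check from the CLI that the uploaded PDF exists in Blob Storage.
# Assumptions (adjust if your deployment differs): container "data",
# blob path "raw_data/Azure-Kubernetes-Service.pdf".
pdf_uploaded() {
    pdf_account_name="$1"
    pdf_account_key="$2"
    # "az storage blob exists" reports whether the blob is present;
    # --query/--output tsv reduce the result to a bare word, which is
    # lower-cased so that both "True" and "true" are accepted.
    pdf_result=$(az storage blob exists \
        --account-name "${pdf_account_name}" \
        --account-key "${pdf_account_key}" \
        --container-name data \
        --name "raw_data/Azure-Kubernetes-Service.pdf" \
        --query exists \
        --output tsv | tr '[:upper:]' '[:lower:]')
    [ "${pdf_result}" = "true" ]
}
```

For example, pdf_uploaded "${storage_account_name}" "${storage_account_key}" && echo "PDF found" prints the message only when the blob exists.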

Get help!

If you’re unsure which storage account is being used for the PDF document upload, you can retrieve the information by running the command below:


echo "${storage_account_name}"

At this point the data source should already be prepared, and you most likely also saw the prepaired_data directory.

Go back to the data container and open the prepaired_data directory. Observe:

There is an image directory
There is a text directory

Click the text directory and observe that it contains the Azure-Kubernetes-Service.json file. Let’s download it and look at the JSON object structure. Run the command below to download the file.


storage_account_key=$(az storage account keys list \
    --account-name "${storage_account_name}" \
    --resource-group "${resource_group_name}" \
    --output tsv \
    --query "[0].value")

az storage blob download \
    --account-name "${storage_account_name}" \
    --container-name data \
    --name "prepaired_data/text/Azure-Kubernetes-Service.json" \
    --account-key "${storage_account_key}" | json_pp

Observe that the file is an array in which each object holds a content chunk and an array of image URLs. This means zero or more images can be mapped to a chunk of text, and each object will be mapped to a document in the Azure AI Search index. We’ll look at it more closely once we populate the index.
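To get a quick count of the chunks, the downloaded JSON can be piped through a small filter. This is a sketch that assumes jq is installed; count_chunks is a hypothetical helper, and it relies only on the top-level array shape described above, not on any field names.

```shell
# Sketch: count the chunks in the prepared JSON document.
# Reads JSON from stdin; fails if the top-level value is not an array.
count_chunks() {
    jq 'if type == "array" then length else error("expected a JSON array") end'
}
```

Pipe the output of the az storage blob download command into count_chunks instead of json_pp to print the number of chunks in the file.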

Before we run the indexer, let’s also validate that the images are present in the image directory.

Go back to the prepaired_data directory and open the image directory.
Click the Azure-Kubernetes-Service directory
Observe that there are *.png image files.

Feel free to download any of them and open it on your local machine.
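The image files can also be listed from the command line. The sketch below is illustrative only: list_chunk_images is a hypothetical helper, and the blob prefix is inferred from the directory names shown in the portal, so treat it as an assumption.

```shell
# Sketch: list the extracted images for the sample PDF.
# Assumptions: container "data" and the prefix below, inferred from the
# prepaired_data/image/Azure-Kubernetes-Service directory in the portal.
list_chunk_images() {
    img_account_name="$1"
    img_account_key="$2"
    # --query "[].name" keeps only the blob names, one per line with tsv.
    az storage blob list \
        --account-name "${img_account_name}" \
        --account-key "${img_account_key}" \
        --container-name data \
        --prefix "prepaired_data/image/Azure-Kubernetes-Service/" \
        --query "[].name" \
        --output tsv
}
```

For example, list_chunk_images "${storage_account_name}" "${storage_account_key}" | grep -c '\.png$' prints how many PNG files were found.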

2. Run indexer

Now that we know the prepared data is ready, we can finally run the indexer and populate the index!

Run the command below to run the indexer.


bash ./helper.sh run-indexer

Let’s validate that the indexer ran successfully. Open the Azure AI Search resource and follow these steps:

Click Search management and then click Indexers
Observe that the Status is Success
Also observe that Docs succeeded is no longer zero but 8/8
Lastly, observe that Error/Warning shows 0/0, meaning all chunks were indexed successfully.

Let’s explore the indexer process in more detail. Click search-aoai-emb-indexer to open it.

In this view you can see further helpful details, such as:

Indexer runs over time and the status of each run
The duration of each run and the number of docs that succeeded or had errors or warnings
If there are errors or warnings, you can click the status of an individual run (Success in our case) to see further details about them.
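The same run information is available from the Azure AI Search REST API. The sketch below is not part of the repository: indexer_status is a hypothetical helper, the service name and admin key are parameters you must supply yourself, and the api-version shown is one stable version that may need adjusting.

```shell
# Sketch: fetch the indexer status via the Azure AI Search REST API.
# Assumptions: $1 = search service name, $2 = admin API key (both supplied
# by the caller); the indexer name defaults to the one shown in the portal.
indexer_status() {
    idx_service="$1"
    idx_admin_key="$2"
    idx_name="${3:-search-aoai-emb-indexer}"
    # --fail makes curl return a non-zero exit code on HTTP errors.
    curl --silent --fail \
        --header "api-key: ${idx_admin_key}" \
        "https://${idx_service}.search.windows.net/indexers/${idx_name}/status?api-version=2024-07-01"
}
```

Piping the result through json_pp makes it easier to read. In the response, lastResult.status, lastResult.itemsProcessed and lastResult.itemsFailed correspond to the status and document counts shown in the portal.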

3. Query the Azure AI Search index using the built-in search feature

Azure AI Search comes with a built-in search feature. Let’s try it out. From the Azure AI Search resource:

Click Search management and then click Indexes
Click the search-aoai-emb index name
In Search explorer, enter kubernetes and click the Search button.
Observe that the Results show search results relevant to the query (kubernetes).

You can also write advanced queries in Search explorer. To do so:

Click View and select JSON view
Observe that the simple input field changes to a JSON query editor, where you can write more complex queries using the Lucene query syntax.
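A JSON query like the ones written in the JSON view can also be sent to the index over the REST API. The following is a minimal sketch under stated assumptions: search_index is a hypothetical helper, the service name and API key come from your own deployment, and the api-version is one stable version. Setting queryType to full enables the Lucene query syntax.

```shell
# Sketch: run a full Lucene query against the index via the REST API.
# Assumptions: $1 = search service name, $2 = API key, $3 = query text.
# The query is inserted into the JSON body without escaping, so it must not
# contain double quotes or backslashes.
search_index() {
    qry_service="$1"
    qry_api_key="$2"
    qry_text="$3"
    qry_index="${4:-search-aoai-emb}"
    curl --silent --fail \
        --request POST \
        --header "Content-Type: application/json" \
        --header "api-key: ${qry_api_key}" \
        --data "{\"search\": \"${qry_text}\", \"queryType\": \"full\", \"top\": 3}" \
        "https://${qry_service}.search.windows.net/indexes/${qry_index}/docs/search?api-version=2024-07-01"
}
```

For example, search_index "<service-name>" "<api-key>" "kubernetes" | json_pp prints up to three matching documents from the search-aoai-emb index.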

Congratulations! You’ve successfully populated the Azure AI Search index and completed the Document Data Management section! Now we’re ready to move on to the next step: configuring and running the demo application!

In today’s era of generative AI, customers can unlock valuable insights from their unstructured or structured data to drive business value. By infusing AI into their existing or new products, customers can create powerful applications that put the power of AI into the hands of their users. For these generative AI applications to work on customers’ data, implementing an efficient RAG (retrieval-augmented generation) solution is key to making sure the LLM receives the right data context for the user query.