CI/CD of QnA Maker Knowledge Base using Azure DevOps

Mt. Swargarohini; Bali Pass trek (October 2021)

Overview

I have been working on a fascinating project where there is a need for NLP. After evaluating a few options, we decided to go ahead with Azure’s QnA Maker service. As this NLP requirement is part of the product that we are building, we had to make sure that it fits well with the rest of the implementation and does not become a hassle when we want to make changes in the future. The dataset should remain open to modification, and whenever it is updated, the service should re-train and re-deploy to the different environments (dev/staging/production) as part of the deployment process. Please check out the code related to this article here.

Knowledge Base

If you need to build a smart FAQ chatbot, or anything that involves questions and their answers, then QnA Maker is the way to go. We need our solution to be smart enough to understand similar questions so that it can return the same answer. Questions like “How high is Mt. Everest” and “What’s the height of Everest” should be read as the same. For that, we first need to feed that kind of data into our QnA Maker service, so that it gets a basic understanding of which answer to return for a given question.

A Knowledge Base (KB from here on) is a collection of questions and answers against which the QnA Maker service gets trained. Every QnA Maker instance runs against a KB, which is why it is very important to manage KBs properly. A KB can be imported into QnA Maker in various formats. We’ll be importing the data for our KB from an Excel file (.xlsx), so that the Excel file becomes the source of truth, which is easier to manage.

Need for CI/CD

Consider this scenario: at time T1, QnA Maker is using KB version v1 across all the environments of your solution (dev, staging, and production for now). At T2, a requirement comes in to update an answer in the existing KB, which makes it v2. But you’ll also need to test this modification in dev and staging before pushing it to production. So at T2, you want dev and staging to use v2, but production to still use v1. Not only should the KB be updated across the different environments, the re-trained QnA Maker should also be re-deployed automatically so that it uses the latest KB. This makes us realize that CI/CD of the KB and QnA Maker is important.

Flow

Deployment flow

Deployment pipeline

We’ll be using Azure DevOps for CI/CD purposes. Our aim is as follows: we want the same pipeline to deploy to different environments, based on the branch. If new changes are pushed to the dev branch, then the pipeline should run with dev-related settings. Hence, if the Excel sheet containing questions and answers is updated in dev, we want the KB to be updated and published so that the QnA service can return the modified answers. Developers only need to manage questions and answers in the Excel sheet, while our deployment pipeline takes care of reflecting the changes wherever required.

Another small assumption I’ve made is that each environment is deployed to its own Azure subscription. Hence, the variable groups need to contain the service connection name of the respective subscription.

We need to create a new pipeline based on our azure-pipeline.yml. Different steps/tasks of the pipeline are as follows:

  • Variables: Variable groups (under Library) are a nice way to keep different sets of variables for different environments. We want our pipeline to conditionally pick the right set of variables, based on the triggering branch. If new changes are pushed to staging, then the pipeline must only access the staging variable group.
  • Replace tokens: Parameters of our IaC (ARM in our case) deployment need to be dynamic, based on the environment to which the deployment is being done. Hence, we need our pipeline to pick up values from variables and use those values as parameters in the ARM deployment.
  • Azure PowerShell: A PowerShell task to deploy the ARM template. The ARM template provisions the QnA Maker service and Azure Key Vault. The template also provisions a Key Vault secret for QnA Maker’s authoring key, which we’ll need in the Python script.
  • Azure Key Vault: This task downloads secrets from Key Vault so that they can be used as environment variables of the pipeline. We are specifically interested in the authoring key of QnA Maker.
  • Scripts: pip installs the requests library for calling REST endpoints and the openpyxl library for reading Excel files easily.
  • Python script: Explained below.

Python Script

The final task of our pipeline executes the Python script, which converts the Excel data into an object and sends that object as the payload of the create/update KB REST APIs. The script accepts the following arguments: the QnA Maker host endpoint, the authoring key, and the name of the KB which needs to be created/updated. QnA Maker’s REST API references can be found here.
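As a rough illustration (not the exact script), reading the sheet with openpyxl and building the qnaList payload can look something like this; the helper name and the assumption of a header row are mine:

from openpyxl import load_workbook

def buildQnaList(xlsxPath):
    # Column A holds questions and column B holds answers (see the Notes section).
    sheet = load_workbook(xlsxPath, read_only=True).active
    qnaList = []
    # min_row=2 assumes a header row; use min_row=1 if the sheet has none.
    for index, (question, answer) in enumerate(
            sheet.iter_rows(min_row=2, max_col=2, values_only=True), start=1):
        if question and answer:
            qnaList.append({
                "id": index,
                "questions": [str(question)],
                "answer": str(answer),
                "source": "Editorial",
                "metadata": []
            })
    return qnaList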

For authenticating against the endpoints, we need to pass the QnA Maker authoring key as the value of the “Ocp-Apim-Subscription-Key” header.

The script also checks whether the KB has already been provisioned. If it has, the script simply updates it; if not, it creates a new KB.

In order to create a new KB, we need to make a POST call to the /create endpoint. The endpoint returns 202 after accepting the request. A small issue with this is that, at this point, our script isn’t sure whether the KB was created successfully, as the endpoint has only acknowledged our request and hasn’t actually confirmed the creation of the KB. We need a way to check the status of KB creation.
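A rough sketch of this part of the flow is shown below (the helper names are mine; subscriptionKey is the authoring key fetched from Key Vault, set at module level as in the monitoring function further down):

import requests

def getExistingKbId(host, kbName):
    # List the existing knowledge bases and look for one with the given name.
    response = requests.get(host + "knowledgebases/",
                            headers={'Ocp-Apim-Subscription-Key': subscriptionKey})
    for kb in response.json().get('knowledgebases', []):
        if kb['name'] == kbName:
            return kb['id']
    return None

def createKb(host, kbName, qnaList):
    payload = {"name": kbName, "qnaList": qnaList, "urls": [], "files": []}
    # The create endpoint only acknowledges the request (HTTP 202) and returns
    # an operation that we then poll via the /operations endpoint.
    response = requests.post(host + "knowledgebases/create",
                             headers={'Ocp-Apim-Subscription-Key': subscriptionKey,
                                      'Content-Type': 'application/json'},
                             json=payload)
    return response.json()['operationId']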

For monitoring the status, we can use the /operations endpoint. It returns “operationState” as “Succeeded” when provisioning of the KB is complete. We need to poll this endpoint until we get the desired state. Below is the monitoring function, which returns the ID of the KB.

import requests
import time

# subscriptionKey (the QnA Maker authoring key) is read from the script's
# arguments at module level.
def monitorOperation(host, operationId):
    state = ""
    count = 0
    # Poll the /operations endpoint until the KB is provisioned (max 10 attempts).
    while(state != "Succeeded" and count < 10):
        response = requests.get(host + "operations/" + operationId, headers={'Ocp-Apim-Subscription-Key': subscriptionKey, 'Content-Type': 'application/json'})
        state = response.json()['operationState']
        count = count + 1
        time.sleep(1)
    if(count == 10 and state != "Succeeded"):
        raise Exception("Something went wrong while creating KB")
    # resourceLocation looks like "/knowledgebases/{kbId}"; return just the KB ID.
    return response.json()['resourceLocation'].split('/')[-1]

After creating/updating the KB, we need to make the new changes available to end users. For that, we need to /publish the KB.
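Publishing is a single POST call against the KB itself; a minimal sketch (helper name mine, with kbId coming from monitorOperation or the existing-KB lookup):

import requests

def publishKb(host, kbId):
    # Publishing pushes the latest KB changes to the production endpoint;
    # a successful publish returns HTTP 204 with no content.
    response = requests.post(host + "knowledgebases/" + kbId,
                             headers={'Ocp-Apim-Subscription-Key': subscriptionKey})
    response.raise_for_status()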

To see how an application (.NET Core Console App in our case) can consume the QnA Maker service, please refer to this code.

Notes

  • The Python script assumes that the Excel sheet’s first column is for questions, and the second one is for answers. Please check out the Excel here to understand the format.
  • While provisioning the Key Vault using ARM, we need to keep in mind that the DevOps service connection (its service principal) will need access to Key Vault secrets. Hence, it’s a good idea to grant that access while provisioning the Key Vault itself, like this:
"accessPolicies": [
                    {
                        "tenantId": "[subscription().tenantId]",
                        "objectId": "[parameters('devOpsSpnObjectId')]",
                        "permissions": {
                            "keys": [],
                            "secrets": [
                                "Get",
                                "List",
                                "Set"
                            ],
                            "certificates": []
                        }
                    }
                ],

We also need to add the DevOps SPN’s object ID to the variable groups. Example of a variable group:

dev variable group

All the required code and files can be found here. Thanks for reading!

Developing a food delivery system using Azure functions

Mt. Trishul (7120m) as seen from Ali Bedni Bugyal (March 2021)

Overview

When I was in college, I did an internship with a food delivery start-up. While exploring serverless tech, that wonderful internship experience came to my mind. Even though the system was quite simple and we didn’t face any issues at that time, I now realize how difficult it would have been to manage scaling if we had seen a sudden surge in the number of orders. Naturally, serverless seems a lucrative option here because of development ease, optimal cost, and worry-free scaling. So I thought of doing a bit of retrospection by developing a food delivery system using serverless tech. I hope to share the best of my serverless learnings using this article as the medium.

This project covers very basic functionalities of a food delivery platform. Its main purpose is to manage orders placed by customers. You can find the source code here. A typical flow of such a platform looks something like this:

Swimlane diagram
  1. The customer opens up a list of restaurants.
  2. The customer adds food items to the cart from the menu.
  3. The customer places the order.
  4. The restaurant accepts the order.
  5. A delivery executive picks up the order from the restaurant.
  6. The executive delivers the order to the customer.
Components of the system

Azure functions:

Each of these steps is basically an API, which in our case is an HTTP triggered Azure function. The functions insert the order payload into a queue for further processing. We’ll have one function app to manage all these order-related APIs.

As we also need to interact with restaurant APIs to fetch the list of restaurants and their menu items, we can have a separate function app for restaurant-related APIs.

I’ve also created a function app for testing purposes. This function app mocks the order lifecycle by placing multiple orders every minute, continuously throughout the day.

Azure durable functions:

Once an order is pushed to a queue, it is processed by a Service Bus queue triggered client function. These client functions are responsible for starting an orchestration or for raising external events, depending on the trigger.
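For instance, here is a minimal sketch of two such client functions (the names are illustrative, not the exact ones from the repository, and Order stands for the project’s order model): one triggered by the new-orders queue starts the orchestration, while one triggered by the accepted-orders queue raises an external event for the same orchestration instance.

using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class OrderClients
{
    [FunctionName("NewOrderClient")]
    public static async Task StartOrder(
        [ServiceBusTrigger("%OrderNewQueue%", Connection = "ServiceBusConnection")] Order order,
        [DurableClient] IDurableOrchestrationClient client)
    {
        // Use the order ID as the orchestration instance ID so that later
        // events can be routed to the right orchestration.
        await client.StartNewAsync("OrderOrchestrator", order.Id, order);
    }

    [FunctionName("OrderAcceptedClient")]
    public static async Task RaiseAccepted(
        [ServiceBusTrigger("%OrderAcceptedQueue%", Connection = "ServiceBusConnection")] Order order,
        [DurableClient] IDurableOrchestrationClient client)
    {
        // Wake up the waiting orchestration with an external event.
        await client.RaiseEventAsync(order.Id, "OrderAccepted", order);
    }
}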

Orchestrator functions are responsible for orchestrating activity functions and for waiting for external events, such as order accepted or order delivered. One of those activity functions updates the order status using an output binding with Azure Cosmos DB. We’ll have a separate function app for the orchestrators, as we want orchestration to scale independently of the order APIs.

In almost any kind of delivery system, some human interaction is expected. In the case of food delivery, it can be a restaurant accepting an order, a delivery executive accepting a delivery, the delivery executive marking the order as delivered, and so on. All of these interactions must be performed within a fixed time. If the restaurant does not accept an order within, let’s say, a minute, then some kind of alert should be raised for the restaurant. If the order still isn’t accepted within the next 5 minutes, then the order should be cancelled and the customer should be informed.

Durable functions offer a great way to manage human interaction in orchestrations with timer tasks. We can make use of the human interaction pattern to raise external events for an orchestration when the restaurant accepts the order. This helps us detect whether an external event has taken place without any kind of polling. Orchestrator functions can wait for these external events and perform tasks when the event is received. If the event is not received within a specified time, that case can be handled as well by writing the logic in an activity function.
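A minimal sketch of what such an orchestrator can look like is shown below (the function and activity names are illustrative rather than the exact ones from the repository, Order is the project’s order model, and the five-minute window is just an example):

using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class OrderOrchestration
{
    [FunctionName("OrderOrchestrator")]
    public static async Task RunOrchestrator(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        var order = context.GetInput<Order>();

        using (var cts = new CancellationTokenSource())
        {
            // Give the restaurant a fixed window to accept the order.
            DateTime deadline = context.CurrentUtcDateTime.AddMinutes(5);
            Task timeoutTask = context.CreateTimer(deadline, cts.Token);
            Task acceptedTask = context.WaitForExternalEvent("OrderAccepted");

            Task winner = await Task.WhenAny(acceptedTask, timeoutTask);
            if (winner == acceptedTask)
            {
                cts.Cancel(); // The durable timer must be cancelled explicitly.
                await context.CallActivityAsync("UpdateOrderStatus", order);
            }
            else
            {
                // No acceptance within the window: cancel the order and inform the customer.
                await context.CallActivityAsync("CancelOrder", order);
            }
        }
    }
}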

Sequence diagram of the system

Azure Service Bus Queues:

Azure Service Bus is a great service for managing advanced queuing scenarios. The HTTP triggered functions push orders to these queues, which are then consumed by the durable client functions. We can have a different queue for each order status: order-new-queue, order-accepted-queue, order-outfordelivery-queue, and order-delivered-queue.

Azure Cosmos DB:

Azure Cosmos DB can be used to store restaurant and order details. Its low-latency reads and writes can definitely help us in building a good platform. Azure functions’ Cosmos DB output binding can be used to update the order status as part of the orchestration. Efficient provisioning of RU/s is a challenge; that’s where the serverless tier comes in, as we are charged only for the total RU/s consumed. I used the serverless tier just to give it a try, and it has worked great for me in this project.

The API functions first need to get the order from Cosmos DB and then push the order message to its respective queue. This can be handled very well using the Cosmos DB input binding in the functions. One of the functions looks like this:

  [FunctionName("OrderAccepted")]
        public async Task<IActionResult> OrderAccepted([HttpTrigger(AuthorizationLevel.Anonymous, "get", Route = "orders/accepted/{orderId}")] HttpRequest req,
            [ServiceBus("%OrderAcceptedQueue%", Connection = "ServiceBusConnection")]
            IAsyncCollector<dynamic> serviceBusQueue,
            [CosmosDB(
                databaseName: "FoodDeliveryDB",
                collectionName: "Orders",
                ConnectionStringSetting = "CosmosDbConnectionString",
                Id = "{orderId}",
                PartitionKey = "{orderId}")] Order order,
        ILogger log)
        {

            try
            {
                await AddToQueue(order, serviceBusQueue);
                return new OkObjectResult(order);
            }
            catch (Exception ex)
            {
                log.LogError(ex.ToString());
                return new InternalServerErrorResult();
            }
        }

It is as easy as that to integrate Cosmos DB with Azure functions! It saves a lot of time and effort. Really!

Performance and cost

After deploying this solution to Azure, I monitored it for a few days and improved it gradually to make it as performance- and cost-optimized as I could. This was my favorite part of the whole project, as I got to learn A LOT from it, which I’ll say more about in the next section. My goal was to handle the maximum number of orders with minimum cost and latency, and of course, no failed orders. Please feel free to reach out to me if you think that I missed or misanalysed any point, still learning 🙂

  1. The load testing function places 60 orders every minute. That is, 3,600 orders every hour and 86,400 orders every day. The load can obviously be increased or decreased, but I’ve based my analysis on this load.
  2. The cost per order (total cost / total orders) is INR 0.02 (~$0.00027). That is, the cost of the whole cloud infrastructure to process one order is just INR 0.02 (sounds lucrative enough for a small startup 😉 )
  3. Average server response time is around 27ms.

Learnings

Enforcing a maximum scale out limit: Earlier I was under the impression that I don’t really need to worry about Azure functions’ cost as I’m charged for execution seconds, and that the number of instances doesn’t matter. But while working on this project, I realized that’s not always the case. The number of instances has no impact on functions cost, as the charges are based on executions, but it does impact storage account costs. The reason is that each instance competes for a blob lease. A higher number of instances means a higher number of blob transactions, and hence higher transaction costs.

Instances when maximum scale out limit is not enforced

By default, Azure functions running on the Consumption plan can scale out to a maximum of 200 instances. The documentation mentions that function scaling depends on the type of trigger. This may result in a lot of instances running in parallel, most of which have a CPU consumption of just around 0-1% (as shown in the image above). To get rid of those underused instances and to utilize a fixed set of instances more, I turned on the “Enforce scale out limit” setting and set the maximum scale out limit to 5. This resulted in more conservative scaling and far fewer blob transactions.

Enforce scale out limit on function app

The dip in the image below marks the point when I enabled this setting; the results improved from that very instant.

Blob transactions


Azure functions logging: Durable functions emit a significant amount of logs which, if left unhandled, can result in high Application Insights data ingestion and hence higher cost. Refer to logging in Azure functions and durable functions to configure logging properly. In this project, I’ve logged only well-formatted exceptions to avoid an unwanted dump of logs.

Azure Queues vs Service Bus queues: Initially, I had used Azure Queues instead of Service Bus queues, and the queue triggered functions were triggered by these Azure Queues (I don’t know why I made that choice, I have no justification for it). The Azure Queues function trigger uses a random exponential back-off polling mechanism, which led to delays of a few seconds in queue message processing when orders arrived at irregular intervals.

The polling interval can be configured using the maxPollingInterval property in host.json, but if the function is made to poll the queue very frequently, it will impact storage account costs. That’s why I felt more comfortable opting for Service Bus queues. They will also help with advanced queueing scenarios if required, as the application grows and becomes more complicated.

Durable functions replays: In order to re-build the local state, orchestrator functions are replayed. For each replay, if not handled, log messages will be written again. This leads to duplication of log messages, which isn’t helpful in any way. Just add this piece of code, which tells the logger to filter out logs from replays:

log = context.CreateReplaySafeLogger(log);

Cosmos DB: During testing, I increased the number of orders by a factor of 6. That means the total RU/s consumed should increase by the same factor (as shown in the image below), and the same applies to cost. This is what I liked about the serverless tier: the predictability.

Total RU/s consumed

Conclusion

A serverless stack is, in my opinion, definitely a good choice for such a system, and it seems to be the future now as it brings a lot of nice things to the table. Impressive performance can be obtained with less development effort and cost. I feel that small or mid-scale organizations can benefit a lot from this stack, as it can bring down the go-to-market time for their products without them worrying too much about IT operations.

I hope my project is helpful to other developers who want to learn or get started with developing systems using serverless architecture. I’ll keep updating the project whenever I feel there is scope for improvement, or based on other folks’ suggestions. I want to extend it even further so that organizations or developers working on a similar use case can use it as a template and build on top of it, so I’ll keep working on it whenever possible.

Source code: Food delivery system

Happy learning!

Update: I got an opportunity to talk about this topic in a session conducted by Pune Tech Community. Check out the recording of the session below:

Authorization of applications in an Azure Function

On the way to Brahmatal summit (December 2018)

Introduction

While working with Microsoft Graph, most of us have assigned application permissions to an application so that the application can fetch data from Graph APIs based on the assigned permissions.

In this article, let’s try to imagine and develop things for the other end, the API end. By the end of this article, you’ll understand how an incoming token created by an application can be validated before fetching resources for the request. For the API, I’m using an HTTP triggered Azure Function.

Continue reading Authorization of applications in an Azure Function

Importing Power Platform solution – 2

Sunset at Kedarkantha base camp (December 2019)

This post is the second part of a two-part blog series on Importing Power Platform solution.

In this series, I would like to show how a Power Platform solution can be programmatically imported into a target environment using two ways:

  1. Delegated permission
  2. Application user

Introduction

In the first part of this blog series, I wrote about how we can import a Power Platform solution from a user’s context using the delegated permission of Dynamics CRM. But what if we want to import a solution from an application’s context, where there is no user interaction at all? In this blog, I’ll try to explain exactly that.

Continue reading Importing Power Platform solution – 2

Importing Power Platform solution – 1

Sunrise at Kedarkantha summit (December 2019)

This post is the first part of a two-part blog series on Importing Power Platform solution.

In this series, I would like to show how a Power Platform solution can be programmatically imported into a target environment using two ways:

  1. Delegated permission
  2. Application user

Introduction

While Application Lifecycle Management of Power Platform solutions can be done using Power Platform Build Tools in Azure DevOps, solutions can be managed using custom code (PowerShell, REST API, SDK API) as well. In this article, I’ll try to explain how we can import a Power Platform solution into a target environment using delegated permission.

Continue reading Importing Power Platform solution – 1

Meeting Scheduler Bot in Teams: Bot Framework

Sunset at Brahmatal basecamp (December 2018)

Introduction

Imagine an experience where you ask your personal assistant to set up a meeting with your team, but your team is working on a big project and you don’t know at what time they will be available for a meeting. No worries, your PA is super smart and quickly prepares a list of the first 5 most suitable timings based on the availability of all the attendees. Impressed, right? We will try to imitate the same experience by developing our own PA, a Teams bot.

Continue reading Meeting Scheduler Bot in Teams: Bot Framework

PowerApps vs Native Apps: Make the right choice

Maninda Tal, Har ki Doon (June 2019)

Introduction

I’m a recent college graduate. You get it now, I’m used to doing things the long, hard way, as constraints like time, ease of development, and team collaboration were not applicable to me until now. I was not exposed to industry-level problems and constraints, so even if I had great ideas, it used to take me very long to develop a product. I must say I was quite overwhelmed by Microsoft technologies when I started working as a Cloud Consultant. Every day has been a new learning experience since I started my career at Rapid Circle. So this blog is an attempt to share my experience of going from “Woah, I have to write a lot of code” to “It could be done in a short time, but woah, there are so many things out there, what should I choose?”. I chose to write about Power Apps because I had worked a lot on native Android when I was in college, and currently I’m exploring Power Apps. There are so many things about Power Apps that I haven’t explored yet, but that’s the story of every tech guy, right?

Continue reading PowerApps vs Native Apps: Make the right choice