
Opensearch Plugin

Madhumita Subramaniam edited this page Apr 29, 2025 · 10 revisions


title Opensearch Cedarling Plugin


actor User
participant BOT
participant AS
participant API
participant OpenSearch
participant Disk
participant Plugin
participant Open Search Cedarling

autonumber 1

User<->BOT: Invoke Bot
box over BOT: Dovie.ai\nStarting up!
BOT<->AS: Register
BOT<->AS: Get JWT Access Token
BOT->API: Get me all data for tenant Acme_Inc and Account Foo_Bar
API->OpenSearch: Return all data for tenant=acme_inc
OpenSearch<->Disk: fetch bits
OpenSearch->Plugin: Filter out unauthorized data
Plugin<->Open Search Cedarling: Authorize data against policies
Plugin->OpenSearch: data
OpenSearch->API: data
API->BOT: data
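
The "Register" and "Get JWT Access Token" steps in the diagram look like OAuth dynamic client registration (RFC 7591) followed by a token request. Below is a minimal sketch that only builds the two request payloads and does not send them. It assumes the bot uses the client-credentials grant with HTTP Basic client authentication; the client name, scope, and credentials are placeholders, not values from this project.

```python
import base64
from urllib.parse import urlencode


def build_registration_request(client_name, scope):
    """JSON body for an RFC 7591 dynamic client registration request."""
    return {
        "client_name": client_name,
        # Assumption: the bot authenticates as itself (no end-user redirect).
        "grant_types": ["client_credentials"],
        "token_endpoint_auth_method": "client_secret_basic",
        "scope": scope,
    }


def build_token_request(client_id, client_secret, scope):
    """Headers and form body for a client-credentials token request."""
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    headers = {
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urlencode({"grant_type": "client_credentials", "scope": scope})
    return headers, body


registration = build_registration_request("support-bot", "tickets.read")
headers, body = build_token_request("my-client-id", "my-secret", "tickets.read")
print(registration["grant_types"], body)
```

The registration body would be POSTed to the AS registration endpoint and the token request to its token endpoint; both URLs come from the AS discovery document and are left out here.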

Potential Addition to the Problem statement

  • Certain records for tenant Acme are labeled confidential, and there is a policy that no confidential information should be returned to the bot.
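
To make the filtering step concrete, here is a rough Python sketch of what the plugin does with the hits it gets back from OpenSearch. The `bot_may_read` function is only a stand-in for the Cedarling decision, and using `is_private` as the confidentiality label is an assumption; the real label field has not been decided in these notes.

```python
def filter_hits(hits, is_authorized):
    """Keep only the hits for which the authorization callback returns True."""
    return [hit for hit in hits if is_authorized(hit["_source"])]


def bot_may_read(doc, tenant="Acme_Inc"):
    """Stand-in for the Cedarling decision: same tenant and not confidential."""
    # Assumption: `is_private` marks a record as confidential.
    return doc.get("company") == tenant and not doc.get("is_private", False)


hits = [
    {"_id": "1", "_source": {"company": "Acme_Inc", "is_private": False}},
    {"_id": "2", "_source": {"company": "Acme_Inc", "is_private": True}},
    {"_id": "3", "_source": {"company": "Other_Co", "is_private": False}},
]
allowed = filter_hits(hits, bot_may_read)
print([hit["_id"] for hit in allowed])  # ['1']
```

Note that this sketch treats a record without the label as non-confidential. A stricter setup could deny by default when the label is missing.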

Some basic reference documents:

  1. OpenSearch terminology - https://www.instaclustr.com/blog/learning-opensearch-from-scratch-part-1/

  2. DeepSeek - OpenSearch connector + bot https://opensearch.org/blog/OpenSearch-Now-Supports-DeepSeek-Chat-Models/

Notes for Puja

  1. Use this Java program - https://github.com/JanssenProject/jans/blob/main/jans-auth-server/client/src/test/java/io/jans/as/client/ws/rs/DcrDemo.java
  2. Use this document and replicate it in Java - https://github.com/GluuFederation/tutorials/blob/master/cedarling/react/react-cedarling-rbac-integration-authorization.md
  3. Cedar syntax and rules - https://docs.cedarpolicy.com/

Cedarling schema

  • Principal : AI BOT
  • Action : Read
  • Resource : tickets
  • Context : tenant = ABC, account = PQR, level = 1
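
As an illustration of the four parts listed above, this sketch packs them into one request object per ticket. The `AuthzRequest` class and `request_for_ticket` helper are hypothetical names for this page only and are not the Cedarling API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AuthzRequest:
    """One authorization question: may principal do action on resource?"""
    principal: str
    action: str
    resource: str
    context: dict = field(default_factory=dict)


def request_for_ticket(bot_id, ticket_id, tenant, account, level):
    """Build the request for one ticket hit returned by OpenSearch."""
    return AuthzRequest(
        principal=f'Bot::"{bot_id}"',
        action='Action::"Read"',
        resource=f'Ticket::"{ticket_id}"',
        context={"tenant": tenant, "account": account, "level": level},
    )


req = request_for_ticket("bot:SupportBot", "ticket:456", "ABC", "PQR", 1)
print(req.principal, req.action, req.resource, req.context)
```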

Entity Types:

  1. User: the human making the request
  2. Bot: the software service (e.g. bot:SupportBot)
  3. Tenant (e.g. Acme_Inc)
  4. Account (e.g. Foo_Bar under Acme_Inc)
  5. Ticket (a support ticket)
entity User = {
  roles: Set<String>,
  tenant: Tenant,
  account: Account
};

entity Bot = {
  // bots may be allowed to act on behalf of users
  authorized_users: Set<User>
};

entity Tenant;

entity Account = {
  tenant: Tenant
};

entity Ticket = {
  tenant: Tenant,
  account: Account
};

action ViewTicket appliesTo {
  principal: [Bot],
  resource: [Ticket]
};


Policy

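permit(
  principal is Bot,
  action == Action::"ViewTicket",
  resource is Ticket
)
when {
  // Only tickets that belong to the Foo_Bar account.
  resource.account == Account::"account:Foo_Bar" &&
  // The bot must be trusted and in the same tenant as the ticket.
  // This assumes Bot declares `tenant` and `trusted` attributes in the schema.
  principal.trusted == true &&
  principal.tenant == resource.tenant
  // Acting on behalf of some user: Cedar has no `some ... in` quantifier, so the
  // acting user has to come with the request (for example in context) and be
  // checked with principal.authorized_users.contains(...).
};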

Example:

User::"user:agent_1" {
  roles: ["support_agent"],
  tenant: Tenant::"tenant:Acme_Inc",
  account: Account::"account:Foo_Bar"
}

Bot::"bot:SupportBot" {
  authorized_users: [User::"user:agent_1"]
}

Ticket::"ticket:456" {
  tenant: Tenant::"tenant:Acme_Inc",
  account: Account::"account:Foo_Bar"
}
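
To sanity-check the example data against the intent of the policy, the conditions can be mirrored in plain Python. This is a hand-written check, not the Cedar engine. The `tenant` and `trusted` attributes on the bot are assumptions added for the sketch, since the example above does not define them, and the acting user is passed in explicitly.

```python
ENTITIES = {
    'User::"user:agent_1"': {
        "roles": ["support_agent"],
        "tenant": 'Tenant::"tenant:Acme_Inc"',
        "account": 'Account::"account:Foo_Bar"',
    },
    'Bot::"bot:SupportBot"': {
        "authorized_users": ['User::"user:agent_1"'],
        # Assumed attributes; the example above does not define them.
        "tenant": 'Tenant::"tenant:Acme_Inc"',
        "trusted": True,
    },
    'Ticket::"ticket:456"': {
        "tenant": 'Tenant::"tenant:Acme_Inc"',
        "account": 'Account::"account:Foo_Bar"',
    },
}


def bot_may_view(bot_uid, ticket_uid, on_behalf_of, entities=ENTITIES):
    """Mirror the policy: authorized user, trusted bot, matching tenant."""
    bot = entities[bot_uid]
    ticket = entities[ticket_uid]
    return (
        on_behalf_of in bot["authorized_users"]
        and bot.get("trusted") is True
        and bot.get("tenant") == ticket["tenant"]
    )


BOT = 'Bot::"bot:SupportBot"'
TICKET = 'Ticket::"ticket:456"'
print(bot_may_view(BOT, TICKET, 'User::"user:agent_1"'))   # True
print(bot_may_view(BOT, TICKET, 'User::"user:stranger"'))  # False
```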

Indices in OpenSearch

| Index | Purpose |
| --- | --- |
| users | Store user metadata |
| tickets | Store ticket metadata & questions |
| ticket_answers | Store support answers |

Index: Tickets

PUT tickets
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "properties": {
      "id": { "type": "keyword" },
      "title": { "type": "text" },
      "description": { "type": "text" },
      "status": { "type": "keyword" },
      "is_deleted": { "type": "boolean" },
      "date_added": { "type": "date" },
      "date_modified": { "type": "date" },
      "assigned_to_id": { "type": "keyword" },
      "created_by_id": { "type": "keyword" },
      "modified_by_id": { "type": "keyword" },
      "is_private": { "type": "boolean" },
      "link_url": { "type": "keyword" },
      "answers_no": { "type": "integer" },
      "send_copy": { "type": "boolean" },

      "os_type": { "type": "keyword" },
      "os_version": { "type": "keyword" },
      "ram": { "type": "keyword" },
      "gluu_server_version_id": { "type": "keyword" },
      "gluu_server_version_comments": { "type": "text" },
      "created_for_id": { "type": "keyword" },
      "issue_type": { "type": "keyword" },
      "last_notification_sent": { "type": "date" },
      "ticket_category": { "type": "keyword" },
      "company_association_id": { "type": "keyword" },
      "visits": { "type": "integer" },

      "os_version_name": { "type": "text" },
      "meta_keywords": { "type": "text" },
      "set_default_gluu": { "type": "boolean" },
      "os_name": { "type": "keyword" },
      "container_management": { "type": "keyword" },
      "deployment_architecture": { "type": "keyword" },
      "gluu_edition": { "type": "keyword" },

      "company": { "type": "keyword" },
      "account": { "type": "keyword" },
      "full_text": { "type": "text" }
    }
  }
}

The company field is used for tenant filtering; full_text is an optional field for RAG/chatbot search.

Index: Ticket_answers

PUT ticket_answers
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "properties": {
      "id": { "type": "keyword" },
      "ticket_id": { "type": "keyword" },
      "answer": { "type": "text" },
      "link_url": { "type": "keyword" },
      "privacy": { "type": "keyword" },
      "is_deleted": { "type": "boolean" },
      "date_added": { "type": "date" },
      "date_modified": { "type": "date" },
      "created_by_id": { "type": "keyword" },
      "send_copy": { "type": "boolean" },
      "is_from_email": { "type": "boolean" },

      "company": { "type": "keyword" },
      "account": { "type": "keyword" },
      "full_text": { "type": "text" }
    }
  }
}

The full_text field is useful for search/RAG.

Index: Users

PUT users
{
  "mappings": {
    "properties": {
      "id": { "type": "keyword" },
      "username": { "type": "keyword" },
      "email": { "type": "keyword" },
      "first_name": { "type": "text" },
      "last_name": { "type": "text" },
      "is_active": { "type": "boolean" },
      "is_superuser": { "type": "boolean" },
      "is_staff": { "type": "boolean" },
      "last_login": { "type": "date" },
      "date_joined": { "type": "date" },
      "modified": { "type": "date" },

      "company": { "type": "keyword" },
      "is_company_admin": { "type": "boolean" },
      "job_title": { "type": "text" },
      "mobile_number": { "type": "keyword" },
      "idp_uuid": { "type": "keyword" },
      "company_association_id": { "type": "keyword" },
      "timezone": { "type": "keyword" },

      "receive_all_notifications": { "type": "boolean" },
      "crm_uuid": { "type": "keyword" },
      "get_email_notification": { "type": "boolean" },
      "all_ticket_permission": { "type": "boolean" },
      "is_from_registration": { "type": "boolean" },
      "is_onboarding_email_sent": { "type": "boolean" }
    }
  }
}
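
The three PUT requests above can also be issued from Python. Here is a small sketch of a helper that skips indices that already exist, since creating an existing index returns an error. It takes the client as an argument and is meant to be used with an opensearchpy.OpenSearch instance; `create_index_if_missing` and the abbreviated `USERS_BODY` are illustrative names only.

```python
def create_index_if_missing(client, name, body):
    """Create index `name` with `body` unless it exists; return True if created."""
    if client.indices.exists(index=name):
        return False
    client.indices.create(index=name, body=body)
    return True


# Abbreviated body for illustration; in practice pass the full mappings above.
USERS_BODY = {
    "mappings": {
        "properties": {
            "id": {"type": "keyword"},
            "company": {"type": "keyword"},
        }
    }
}

# Usage, assuming `os_client` is an opensearchpy.OpenSearch instance:
# create_index_if_missing(os_client, "users", USERS_BODY)
```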

Steps in OpenSearch

  1. Create indices
  2. Import data from MySQL:
from opensearchpy import OpenSearch, helpers
import pymysql

# --- Config ---
MYSQL_CONFIG = {
    "host": "localhost",
    "user": "your_user",
    "password": "your_password",
    "db": "your_db",
    "cursorclass": pymysql.cursors.DictCursor
}

OPENSEARCH_CONFIG = {
    "hosts": [{"host": "localhost", "port": 9200}],
    "http_auth": ("admin", "admin"),  # if using basic auth
    "use_ssl": False
}

# --- Connect to MySQL ---
mysql_conn = pymysql.connect(**MYSQL_CONFIG)
os_client = OpenSearch(**OPENSEARCH_CONFIG)

# --- Helper to bulk index ---
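def bulk_index(index_name, docs):
    # Reuse the MySQL `id` column as the document _id so that re-running the
    # import overwrites existing documents instead of creating duplicates.
    actions = [
        {
            "_index": index_name,
            "_id": doc["id"],
            "_source": doc
        }
        for doc in docs
    ]
    helpers.bulk(os_client, actions)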

# --- Read Tickets ---
def get_tickets():
    with mysql_conn.cursor() as cursor:
        cursor.execute("SELECT * FROM tickets")
        return cursor.fetchall()

# --- Read Ticket Answers ---
def get_ticket_answers():
    with mysql_conn.cursor() as cursor:
        cursor.execute("SELECT * FROM ticket_answers")
        return cursor.fetchall()

# --- Main ---
if __name__ == "__main__":
    print("Fetching tickets...")
    tickets = get_tickets()
    print(f"Fetched {len(tickets)} tickets")

    print("Indexing tickets...")
    bulk_index("tickets", tickets)

    print("Fetching ticket answers...")
    answers = get_ticket_answers()
    print(f"Fetched {len(answers)} answers")

    print("Indexing answers...")
    bulk_index("ticket_answers", answers)

    mysql_conn.close()
    print("✅ Done.")
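
One thing to watch in the import: if the flag columns are stored as TINYINT(1) in MySQL (an assumption, the table definitions are not on this page), PyMySQL returns them as integers 0/1, and a boolean mapping may reject those values. Here is a small sketch of a per-row conversion that could run before bulk indexing; `normalize_row` and `TICKET_BOOLEAN_FIELDS` are illustrative names.

```python
# Boolean-mapped fields from the tickets index above.
TICKET_BOOLEAN_FIELDS = {"is_deleted", "is_private", "send_copy", "set_default_gluu"}


def normalize_row(row, boolean_fields):
    """Return a copy of `row` with the listed 0/1 flags converted to bool."""
    return {
        # NULL values stay None so they are not indexed as False.
        key: bool(value) if key in boolean_fields and value is not None else value
        for key, value in row.items()
    }


row = {"id": 7, "is_deleted": 1, "is_private": 0, "send_copy": None}
print(normalize_row(row, TICKET_BOOLEAN_FIELDS))
# {'id': 7, 'is_deleted': True, 'is_private': False, 'send_copy': None}
```

In the script above this would be applied as `bulk_index("tickets", [normalize_row(t, TICKET_BOOLEAN_FIELDS) for t in tickets])`.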
  3. Query along the lines of:
GET tickets/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "company": "Acme_Inc" }},
        { "term": { "account": "Foo_Bar" }}
      ]
    }
  }
}
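
The same query can be built in Python and passed to the client's search call. This is a sketch; `tenant_account_query` is an illustrative helper name and the `size` default is an arbitrary choice. Without a `size`, a search returns only the first 10 hits.

```python
def tenant_account_query(company, account, size=100):
    """Search body that restricts results to one tenant and one account."""
    return {
        "size": size,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"company": company}},
                    {"term": {"account": account}},
                ]
            }
        },
    }


body = tenant_account_query("Acme_Inc", "Foo_Bar")
print(body["query"]["bool"]["filter"])

# Usage, assuming `os_client` is an opensearchpy.OpenSearch instance:
# response = os_client.search(index="tickets", body=body)
```

Term filters on keyword fields match exactly and are case-sensitive, so "Acme_Inc" and "acme_inc" are different tenants unless the values are normalized at index time.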

  4. Integration with DeepSeek embeddings for vector search - We're leaving out this step (AI)