Openai/6910a506-2ac0-8003-b4a7-5f5804de4257


This conversation was exported from ChatGPT[1]

gpt-5-thinking

User:

I'm working with OpenAI Agent Builder and want to execute a ChatKit Client Tool that left-clicks a UI element by its HTML ID. Provide me with a simple:

  • A tool definition for Agent Builder
  • A tool implementation for a Lovable project

Assistant:

Here’s a minimal, copy-pasteable setup for a client tool that left-clicks an element by its HTML id.

Agent Builder – tool definition (JSON)

Use this as the tool schema in Agent Builder (mark it as a Client tool):

{
  "name": "click_element_by_id",
  "description": "Left-click a DOM element identified by its HTML id.",
  "strict": true,
  "parameters": {
    "type": "object",
    "properties": {
      "element_id": {
        "type": "string",
        "description": "The HTML id of the target element (without the leading #)."
      }
    },
    "required": ["element_id"],
    "additionalProperties": false
  }
}
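Because the schema is strict, the model should always send exactly one string property, but the page still receives the arguments as an untyped object. A minimal sketch of a TypeScript guard that mirrors the schema (ClickElementArgs and parseClickElementArgs are hypothetical names, not part of ChatKit). It also tolerates a stray leading # and applies the HTML rule that an id is non-empty and contains no ASCII whitespace:

```typescript
// Hypothetical types mirroring the strict JSON schema above.
export type ClickElementArgs = { element_id: string };

export type ParseResult =
  | { ok: true; args: ClickElementArgs }
  | { ok: false; error: string };

// Validates raw tool arguments against the same constraints as the schema.
export function parseClickElementArgs(params: unknown): ParseResult {
  if (typeof params !== "object" || params === null || Array.isArray(params)) {
    return { ok: false, error: "params must be an object" };
  }
  // additionalProperties: false -> element_id is the only allowed key
  const extra = Object.keys(params).filter((k) => k !== "element_id");
  if (extra.length > 0) {
    return { ok: false, error: `unexpected properties: ${extra.join(", ")}` };
  }
  const raw = (params as Record<string, unknown>).element_id;
  if (typeof raw !== "string") {
    return { ok: false, error: "element_id must be a string" };
  }
  // Assumption: accept "#my-id" too, since models sometimes include the CSS prefix.
  const id = raw.startsWith("#") ? raw.slice(1) : raw;
  // HTML ids must be non-empty and must not contain ASCII whitespace.
  if (id.length === 0 || /[\t\n\f\r ]/.test(id)) {
    return { ok: false, error: `invalid HTML id: "${raw}"` };
  }
  return { ok: true, args: { element_id: id } };
}
```

In the client tool handler, you would call parseClickElementArgs on the incoming arguments first and return its error to the agent when ok is false.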

Lovable (React) – client tool implementation

Add this component to your Lovable project (e.g., src/components/ChatKitPanel.tsx). In ChatKit React, client tools are handled by the onClientTool callback you pass to useChatKit: ChatKit calls it with the tool name and arguments whenever the workflow invokes a client tool, and the object you return is sent back to the agent as the tool output. The tool's name and schema are defined in Agent Builder, so the name checked here must match that definition exactly. The click itself is a synthesized pointer/mouse event sequence dispatched on the element.

import { ChatKit, useChatKit } from "@openai/chatkit-react";

/**
 * Left-click the element with the given HTML id by dispatching a synthesized
 * pointer/mouse event sequence. Synthesized events have isTrusted === false,
 * so listeners that check that flag will ignore them.
 */
async function clickElementById(elementId: string): Promise<Record<string, unknown>> {
  // 1) Find the element
  const el = document.getElementById(elementId);
  if (!el) {
    return {
      ok: false,
      error: "not_found",
      message: `No element found with id="${elementId}".`
    };
  }

  // 2) Bring it into view without a smooth animation, then give layout a moment to settle
  el.scrollIntoView({ behavior: "auto", block: "center", inline: "center" });
  await new Promise((resolve) => setTimeout(resolve, 120));

  // 3) Compute the click point (center of the element's bounding box)
  const rect = el.getBoundingClientRect();
  const clientX = Math.floor(rect.left + rect.width / 2);
  const clientY = Math.floor(rect.top + rect.height / 2);

  // 4) Hit-test that point. If something else (a modal, overlay, toast) covers it,
  //    report the obstruction instead of clicking an element the agent did not ask for.
  const topNode = document.elementFromPoint(clientX, clientY);
  if (topNode && !el.contains(topNode)) {
    return {
      ok: false,
      error: "obscured",
      message: `Element id="${elementId}" is covered by <${topNode.tagName.toLowerCase()}>.`
    };
  }
  // Dispatch on the innermost node under the point (the element itself or one of its
  // children); the events bubble up to listeners on the element.
  const target: Element = topNode ?? el;

  // 5) Synthesize a full left-click sequence (some frameworks listen to pointer events first)
  const base: MouseEventInit = {
    bubbles: true,
    cancelable: true,
    composed: true,
    view: window,
    clientX,
    clientY,
    button: 0 // left button
  };
  const pointer: PointerEventInit = { ...base, pointerId: 1, pointerType: "mouse", isPrimary: true };

  target.dispatchEvent(new PointerEvent("pointerdown", { ...pointer, buttons: 1 }));
  target.dispatchEvent(new MouseEvent("mousedown", { ...base, buttons: 1 }));
  target.dispatchEvent(new PointerEvent("pointerup", pointer));
  target.dispatchEvent(new MouseEvent("mouseup", base));
  target.dispatchEvent(new MouseEvent("click", base));

  return { ok: true, clicked: true, target_id: elementId, dispatched_on: target.tagName };
}

/**
 * Renders ChatKit and handles the click_element_by_id client tool.
 */
export default function ChatKitPanel() {
  const { control } = useChatKit({
    api: {
      // Return a short-lived client secret created by your own backend endpoint.
      async getClientSecret() {
        const res = await fetch("/api/chatkit/secret");
        if (!res.ok) throw new Error(`Failed to get client secret (HTTP ${res.status})`);
        const { secret } = await res.json();
        return secret;
      }
    },
    // Called for every client tool call; the returned object is the tool output.
    onClientTool: async ({ name, params }) => {
      if (name !== "click_element_by_id") {
        return { ok: false, error: "unknown_tool", message: `Unhandled client tool: ${name}` };
      }
      const elementId = params.element_id;
      if (typeof elementId !== "string" || elementId.length === 0) {
        return { ok: false, error: "invalid_params", message: "element_id must be a non-empty string." };
      }
      try {
        return await clickElementById(elementId);
      } catch (err: unknown) {
        return {
          ok: false,
          error: "unhandled_exception",
          message: err instanceof Error ? err.message : String(err)
        };
      }
    }
  });

  return <ChatKit control={control} className="h-[600px] w-[360px]" />;
}

Typical app wrapper (for context)

Mount the panel once in your app shell so the tool is available page-wide:

// e.g., src/App.tsx
import ChatKitPanel from "./components/ChatKitPanel";

export default function App() {
  return (
    <>
      {/* ...your app UI... */}

      {/* Mount once so the tool is available page-wide */}
      <ChatKitPanel />
    </>
  );
}
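The handler aims at the center of the element's bounding box. For an element taller than the viewport, or one still partly off-screen after scrolling, that center can fall outside the viewport, where document.elementFromPoint returns null and the overlay check tells you nothing. A minimal sketch of a hypothetical helper, visibleCenter, that instead targets the center of the element's visible part:

```typescript
// Minimal rectangle shape; a DOMRect from getBoundingClientRect() satisfies it.
export interface RectLike {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Returns the center of the part of `rect` that lies inside the viewport,
// or null when no part of the element is visible.
export function visibleCenter(
  rect: RectLike,
  viewportWidth: number,
  viewportHeight: number,
): { x: number; y: number } | null {
  // Intersect the element's box with the viewport [0, width) x [0, height).
  const left = Math.max(rect.left, 0);
  const top = Math.max(rect.top, 0);
  const right = Math.min(rect.left + rect.width, viewportWidth);
  const bottom = Math.min(rect.top + rect.height, viewportHeight);
  if (right <= left || bottom <= top) return null; // nothing visible
  return { x: Math.floor((left + right) / 2), y: Math.floor((top + bottom) / 2) };
}
```

In the handler, you would pass el.getBoundingClientRect() together with window.innerWidth and window.innerHeight, and report the element as not visible when the result is null.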
How the agent calls it

When the agent calls the tool, the arguments it sends look like this:

{ "element_id": "my-button-id" }
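Each tool call reaches the page as a single invocation carrying the tool name and these arguments. If you later add more client tools, a small name-to-handler router keeps the callback readable. A minimal sketch, assuming an invocation shaped like { name, params } (modeled on ChatKit React's onClientTool callback); ClientToolInvocation, ToolHandler, and makeClientToolRouter are hypothetical names:

```typescript
// Assumed shape of one client tool call: tool name plus JSON arguments.
export interface ClientToolInvocation {
  name: string;
  params: Record<string, unknown>;
}

// A handler turns the arguments into a JSON-serializable tool result.
export type ToolHandler = (
  params: Record<string, unknown>,
) => Promise<Record<string, unknown>> | Record<string, unknown>;

// Builds one callback that dispatches each invocation by tool name.
export function makeClientToolRouter(handlers: Map<string, ToolHandler>) {
  return async (invocation: ClientToolInvocation): Promise<Record<string, unknown>> => {
    const handler = handlers.get(invocation.name);
    if (!handler) {
      // Report unknown tools back to the agent instead of throwing.
      return { ok: false, error: "unknown_tool", name: invocation.name };
    }
    try {
      return await handler(invocation.params);
    } catch (err: unknown) {
      // Turn handler failures into a structured tool result.
      return {
        ok: false,
        error: "unhandled_exception",
        message: err instanceof Error ? err.message : String(err),
      };
    }
  };
}
```

The design choice is to turn every failure into a result the agent can read, using the same ok/error shape as the click handler.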

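With that in place, the agent can left-click any element on the page by its id. Keep in mind that the events are synthesized (their isTrusted flag is false): listeners that check isTrusted will skip them, and they do not count as a user gesture, so browser actions gated on real user activation (opening a popup, entering fullscreen, and so on) will not happen.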
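One gap remains: if the agent calls the tool right after triggering a navigation or opening a panel, the target may not be rendered yet and getElementById returns null. A minimal sketch of a bounded poll the handler could run before reporting not_found (pollFor is a hypothetical helper; the 2-second timeout and 50 ms interval are arbitrary defaults):

```typescript
// Calls `lookup` until it returns a non-null value or `timeoutMs` elapses.
// Resolves to the found value, or null on timeout.
export async function pollFor<T>(
  lookup: () => T | null,
  timeoutMs = 2000,
  intervalMs = 50,
): Promise<T | null> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = lookup();
    if (value !== null) return value;
    if (Date.now() >= deadline) return null;
    // Wait before the next attempt so rendering can make progress.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In the browser this would be used as `const el = await pollFor(() => document.getElementById(elementId));`, falling back to the not_found result when it resolves to null.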

References

  1. "Tool definition and implementation". ChatGPT. Retrieved 2025-11-10.