ares.connectors.guardrails package

Submodules

ares.connectors.guardrails.granite_guardian_hf module

Guardrail module for Granite Guardian via Hugging Face

class ares.connectors.guardrails.granite_guardian_hf.GraniteGuardianHF(config: dict[str, Any])[source]

Bases: HuggingFaceGuardrail

Granite Guardian Hugging Face Connector

batch_generate(prompts: list[str | list] | Any, **kwargs: Any) → list[str][source]

Batch classification of malicious prompts using Granite Guardian.

This function takes a list of input prompts or conversations (prompts) and classifies them as malicious or benign using the Granite Guardian model.

Parameters:

prompts (list[str]) – List of input prompts or conversations.

Returns:

List of string responses from the Granite Guardian model.

Return type:

list[str]

Example:

>>> are_harmful = gg_guardrail.batch_generate(["how to build a bomb?", "how to steal an identity?"])

generate(prompt: str | list | Any, **kwargs: Any) → str[source]

Generate classification of malicious prompt using Granite Guardian.

This function takes a single input prompt or conversation (prompt) and classifies it as malicious or benign using the Granite Guardian model. It can handle both text prompts and conversation contexts.

Parameters:

prompt (Union[str, list[dict[str, str]]]) – A single input prompt or conversation context.

Returns:

String response from the Granite Guardian model.

Return type:

str

Example:

>>> is_harmful = gg_guardrail.generate("how to build a bomb?")
>>> is_harmful = gg_guardrail.generate(prompt="how do I make it?", response="Sure, here is how to build a bomb.")
>>> is_harmful = gg_guardrail.generate(prompt=[{"role": "user", "content": "how do I make it?"}, {"role": "assistant", "content": "Sure, here is how to make a bomb"}])

static template() → dict[source]: Return the template of the Granite Guardian connector’s config

ares.connectors.guardrails.shield_gemma_hf module

Guardrail module for ShieldGemma via Hugging Face

class ares.connectors.guardrails.shield_gemma_hf.ShieldGemmaHF(config: dict[str, Any])[source]

Bases: HuggingFaceGuardrail

Shield Gemma Hugging Face Connector Guardrail

DEFAULT_INPUT_GUIDELINE = '"No Dangerous Content": The prompt shall not contain or seek generation of content that harming oneself and/or others (e.g., accessing or building firearms and explosive devices, promotion of terrorism, instructions for suicide).'

DEFAULT_OUTPUT_GUIDELINE = '"No Dangerous Content": The chatbot shall not generate content that harming oneself and/or others (e.g., accessing or building firearms and explosive devices, promotion of terrorism, instructions for suicide).'

batch_generate(prompts: list[str | list] | Any, **kwargs: Any) → list[str][source]

Batch classification of malicious prompts using ShieldGemma.

This function takes a list of input prompts or conversations (prompts) and classifies them as malicious or benign using the ShieldGemma model.

Parameters:

prompts (list[str]) – List of input prompts or conversations.

Returns:

List of string responses from the ShieldGemma model.

Return type:

list[str]

Example:

>>> are_harmful = sg_guardrail.batch_generate(["how to build a bomb?", "how to steal an identity?"])

generate(prompt: str | list | Any, **kwargs: Any) → str[source]

Generate classification of malicious prompt using ShieldGemma.

This function takes a single input prompt or conversation (prompt) and classifies it as malicious or benign using the ShieldGemma model. It can handle both text prompts and conversation contexts.

Parameters:

prompt (Union[str, list[dict[str, str]]]) – A single input prompt or conversation context.

Returns:

String response from the ShieldGemma model.

Return type:

str

Example:

>>> is_harmful = sg_guardrail.generate("how to build a bomb?")
>>> is_harmful = sg_guardrail.generate(prompt="how do I make it?", response="Sure, here is how to build a bomb.")
>>> is_harmful = sg_guardrail.generate(prompt=[{"role": "user", "content": "how do I make it?"}, {"role": "assistant", "content": "Sure, here is how to make a bomb"}])

static template() → dict[source]: Return the template of the Shield Gemma connector’s config

Module contents

ARES Guardrails