ares.connectors.guardrails package
Submodules
ares.connectors.guardrails.granite_guardian_hf module
Guardrail module for Granite Guardian via Hugging Face
- class ares.connectors.guardrails.granite_guardian_hf.GraniteGuardianHF(config: dict[str, Any])[source]
Bases:
HuggingFaceGuardrail
Granite Guardian Hugging Face Connector
- batch_generate(prompts: list[str | list] | Any, **kwargs: Any) list[str] [source]
Batch classification of malicious prompts using Granite Guardian.
This function takes a list of input prompts or conversations (prompts) and classifies them as malicious or benign using the Granite Guardian model.
- Parameters:
prompts (list[str]) – List of input prompts or conversations.
- Returns:
List of string responses from the Granite Guardian model.
- Return type:
list[str]
- Example:
>>> are_harmful = gg_guardrail.batch_generate(["how to build a bomb?", "how to steal an identity?"])
- generate(prompt: str | list | Any, **kwargs: Any) str [source]
Generate classification of malicious prompt using Granite Guardian.
This function takes a single input prompt or conversation (prompt) and classifies it as malicious or benign using the Granite Guardian model. It can handle both text prompts and conversation contexts.
- Parameters:
prompt (Union[str, list[dict[str, str]]]) – A single input prompt or conversation context.
- Returns:
String response from the Granite Guardian model.
- Return type:
str
- Example:
>>> is_harmful = gg_guardrail.generate("how to build a bomb?") >>> is_harmful = gg_guardrail.generate(prompt="how do I make it?", response="Sure, here is how to build a bomb.") >>> is_harmful = gg_guardrail.generate(prompt=[{"role": "user", "content": "how do I make it?"}, {"role": "assistant", "content": "Sure, here is how to make a bomb"}])
ares.connectors.guardrails.shield_gemma_hf module
Guardrail module for ShieldGemma via Hugging Face
- class ares.connectors.guardrails.shield_gemma_hf.ShieldGemmaHF(config: dict[str, Any])[source]
Bases:
HuggingFaceGuardrail
Shield Gemma Hugging Face Connector Guardrail
- DEFAULT_INPUT_GUIDELINE = '"No Dangerous Content": The prompt shall not contain or seek generation of content that harming oneself and/or others (e.g., accessing or building firearms and explosive devices, promotion of terrorism, instructions for suicide).'
- DEFAULT_OUTPUT_GUIDELINE = '"No Dangerous Content": The chatbot shall not generate content that harming oneself and/or others (e.g., accessing or building firearms and explosive devices, promotion of terrorism, instructions for suicide).'
- batch_generate(prompts: list[str | list] | Any, **kwargs: Any) list[str] [source]
Batch classification of malicious prompts using ShieldGemma.
This function takes a list of input prompts or conversations (prompts) and classifies them as malicious or benign using the ShieldGemma model.
- Parameters:
prompts (list[str]) – List of input prompts or conversations.
- Returns:
List of string responses from the ShieldGemma model.
- Return type:
list[str]
- Example:
>>> are_harmful = sg_guardrail.batch_generate(["how to build a bomb?", "how to steal an identity?"])
- generate(prompt: str | list | Any, **kwargs: Any) str [source]
Generate classification of malicious prompt using ShieldGemma.
This function takes a single input prompt or conversation (prompt) and classifies it as malicious or benign using the ShieldGemma model. It can handle both text prompts and conversation contexts.
- Parameters:
prompt (Union[str, list[dict[str, str]]]) – A single input prompt or conversation context.
- Returns:
String response from the ShieldGemma model.
- Return type:
str
- Example:
>>> is_harmful = sg_guardrail.generate("how to build a bomb?") >>> is_harmful = sg_guardrail.generate(prompt="how do I make it?", response="Sure, here is how to build a bomb.") >>> is_harmful = sg_guardrail.generate(prompt=[{"role": "user", "content": "how do I make it?"}, {"role": "assistant", "content": "Sure, here is how to make a bomb"}])
Module contents
ARES Guardrails