byLLM Release Notes#

This document provides a summary of new features, improvements, and bug fixes in each version of byLLM (formerly MTLLM). For details on changes that might require updates to your existing code, please refer to the Breaking Changes page.

byllm 0.6.19 (Latest Release)#

New Features#

Feature: Re-architected visit [-->] by llm routing: LLM-guided traversal now renders each candidate as an (edge, node) pair, so routing can decide on the connecting edge's type/attributes, not just node data, and auto-injects the walker's state and the current node into the prompt (no more smuggling context through incl_info). Candidate, field, and ability descriptions are sourced from MTIR semstrings rather than runtime reflection. The model's choice is constrained to a runtime-synthesised enum of descriptive handles (the node class name, disambiguated), and a new select parameter controls cardinality (select=1 | k | (min, max) | "all"). Replaces the previous node-only mechanism that returned an unconstrained list[int]. (#6767)

Refactors#

Refactor: remove dead code in byllm (part 2/2, companion to #7032): dropped never-called methods and unused module-level globals. Behavior preserved.
Refactor: Scale references updated for the core fold-in: Comments referencing the former jac-scale plugin (e.g. the SSE-worker notes) now refer to the built-in scale backend, which lives in core jaclang as jaclang.scale. No behavior change.

byllm 0.6.18#

Bug Fixes#

Fix: byLLM's missing-jaclang hint points at the binary, not PyPI: the import-time error shown when jaclang cannot be found no longer says pip install jaclang (jaclang is no longer published to PyPI -- it ships inside the jac binary). It now tells the user to run byLLM under jac or jac install -e the plugin.
Streaming: Retry transient mid-stream drops: streaming calls now use an inter-chunk read timeout and retry the same model on transient transport errors (stall, dropped connection, brief 5xx) instead of hanging on litellm's default timeout, emitting a stream_reset event so consumers discard partial output.

byllm 0.6.17#

Bug Fixes#

Fix: Streaming by llm() no longer crashes when consumed from a foreign thread Context: Model.invoke() now resets the ContextVar Token immediately in the originating Context before returning the streaming generator. A new _stream_with_telemetry wrapper drives each next() call via contextvars.copy_context(), so the generator is safe to pull from jac-scale's SSE thread-pool worker (or any foreign Context) without raising ValueError: Token was created in a different Context.
Fix: Clear error on max_tokens-truncated tool calls: When a response is cut off at the token limit mid tool-call, byLLM now returns a clear "output truncated, retry with a smaller payload" error instead of executing the tool with partial arguments and failing with a misleading "missing required argument".

byllm 0.6.15#

New Features#

Feature: Retry structured-output generation on empty or malformed responses: A non-streaming by llm() call with a structured (non-str) return type now regenerates when the output is empty or cannot be parsed, instead of failing on the first bad generation. Between attempts byLLM resets to the original prompt and injects a corrective message that restates the failure, echoes the rejected output, and re-shows the JSON schema. The new max_output_retries knob (retries after the first attempt, default 3, 0 disables) resolves per-call > per-object > jac.toml [plugins.byllm.call_params] > default; on exhaustion the raised OutputConversionError reports the attempt count and preserves raw_output. (jaseci-labs/jaseci#6538)
Feature: MockLLM testing primitives for structured output and errors: MockRawResponse feeds text through the real parse_response (so malformed JSON raises OutputConversionError and valid JSON parses to the typed object), MockError raises a wrapped exception, and MockLLM.seen_prompts records the prompt seen per attempt. These let tests drive the structured-output and retry paths through real by llm() calls; existing MockLLM behavior is unchanged. (jaseci-labs/jaseci#6538)

Bug Fixes#

Fix: streamed text answers now reach the response channel: When a tool-enabled -> str/None agent finished by writing plain text instead of calling the hidden finish tool, byLLM left the reply on the internal thought stream and never emitted it on the chunk (answer) channel, so consumers that read only the answer channel received an empty response. The terminal text is now delivered as the answer chunk, reusing what the model already produced with no re-prompt and no extra LLM call. Structured returns and the existing finish-tool path are unchanged.

byllm 0.6.14#

New Features#

Feat: Multimodal tool results are provider-portable: Tool-result images are now rerouted into a trailing user message at request time, unblocking providers that reject images in role: "tool" (e.g. OpenAI) while Anthropic continues to work.

byllm 0.6.12#

New Features#

Add: system_prompt parameter on by llm(): New optional kwarg accepting a str or zero-arg Callable[[], str]. Extends - does not replace the base SYSTEM (jac.toml override or SYSTEM_PERSONA). Precedence: per-call > jac.toml > default. Lets conversational apps (by llm(conversation=<list>)) place the agent's standing persona in the cacheable SYSTEM slot instead of the function sem, where it would otherwise duplicate into chat history each turn. Backward compatible.

byllm 0.6.11#

New Features#

Structured by llm() calls can stream: Passing a stream_handler lets a structured-return by llm() call, including visit [-->] by model edge routing, stream its generation token-by-token while still returning its typed result.
Feat: Multimodal tool results: Tools can now return Image / Text / list[Media] and the content reaches the provider as a real image content block - previously every tool return was str()-ed and ToolCallResultMsg.to_dict dumped raw objects.
Per-call token usage in streamed responses: Streaming completions now report input-token and prompt-cache counts for each model call, not only the per-turn total.

byllm 0.6.10#

New Features#

Streaming responses report generation speed: The streaming llm_timing event now carries the per-iteration completion-token count alongside its duration, so consumers can display tokens/sec.

Bug Fixes#

Fix: tool-using ReAct loops no longer hang on models without a tool-aware chat template: Local GGUFs and ollama families like gemma drop role:"tool" messages and mis-render assistant tool_calls, so they never saw their own results and re-issued the same call forever; the transcript sent to these models is now linearised into the <tool_call>/<tool_response> text shape they can actually read.
Fix: tool calls now dispatch when streaming against ollama: ollama streams a tool call as plain text in the message content rather than a structured tool_calls field, and litellm.stream_chunk_builder cannot reassemble it. byLLM's text recovery previously ran only for backends with supports_native_tools=False, so the native ollama path silently dropped the call and the ReAct loop ended with the raw JSON as its answer. The streaming dispatch now rebuilds the message as a dict from the accumulated stream text and recovers the call whenever tools were offered but no native tool_calls came back, and the JSON recovery understands {"tool_calls": [...]} / {"tool_call": {...}} envelopes plus the name/tool/tool_name/function and arguments/args/tool_args/parameters/input key aliases small models emit. No-op for genuine native tool calls and plain-text answers, so cloud providers are unaffected.

Refactors#

Refactor: native property syntax: MTIR.runtime now uses the native has runtime: MTRuntime { getter; } property form (getter body in impl/mtir.impl.jac) instead of a @property decorator, matching the codebase-wide migration.

byllm 0.6.9#

New Features#

Tool calling for local models: byLLM now supports tool calling on backends without server-side tool support by rendering the tool protocol into the prompt and recovering tool calls from the reply.

Bug Fixes#

Fix: by llm() no longer stalls the server in async walkers: Using by llm() inside an async walker previously blocked the entire event loop for the full LLM round-trip (0.5–30 s), freezing every other concurrent request. It now runs fully non-blocking via litellm.acompletion, AsyncOpenAI, and httpx.AsyncClient. No code changes needed. Sync walkers are unaffected.
Fix: CI dependency alignment check restored for byllm: httpx>=0.27.0 was missing from jac.toml after being added to pyproject.toml in #5944, causing the CI alignment check to fail. Both files are now in sync.

byllm 0.6.6#

New Features#

Add: Universal schema-hint injection for structured-output prompts: byLLM now extracts the response_format schema's enum/description metadata and appends a human-readable hint (e.g. Schema requirements: - field must be one of: 1=WORK, 2=PERSONAL, ...) to the last user message inside BaseLLM.make_model_params, so every backend sees the same explicit name-to-value mapping cloud frontier providers' server-side prompt builders inject for free. Fixes systematic miscategorization on Ollama, vLLM, and other local-style backends where small open-weight models (Gemma 4 E4B, Llama 3.x 8B class) were defaulting to the first allowed enum value because the schema description never reached the prompt. Empirical impact on the day_planner mini-bench: Ollama gemma4:e4b jumps from 1/6 (17%) to 6/6 (100%) on the categorize fixture; in-process local:gemma-4-e4b stays at 24/24 (100%). The previously LocalLLM-only attach_schema_hint is now byllm.schema.inject_schema_hint; LocalLLM's filter_params is reduced to llama.cpp-specific concerns (multimodal-content flatten and json_schema -> json_object rewrite) since the hint is already applied upstream.

Bug Fixes#

Fix: local: model ctx_window propagation to compaction layer: LocalLLM._get_ctx_window (introduced in #5830) returned only self.spec["n_ctx"], ignoring every user override. The proactive-compaction threshold then fired against the alias's hardcoded spec value (8192 for all bundled aliases) regardless of by llm(ctx_window=N), LocalLLM(ctx_window=N), [plugins.byllm.compaction] ctx_window in jac.toml, or config={"n_ctx": N}. Worst case: a user who built the engine with config={"n_ctx": 2048} got engine overflow at ~2048 tokens because compaction was waiting for 80% of 8192.
Fix: Message.to_dict() crashing on media params with sem declarations: An explicit sem foo.img = "..." on an Image (or any other Media) parameter crashed byllm with TypeError: list indices must be integers or slices, not str. The semstr-handling branch assumed media.to_dict() returns a single dict, but all Media subclasses return list[dict] (one entry per content block). The branch is now correct, and the sem string is emitted as a standard {"type": "text", "text": ...} block immediately before the media - replacing the previous non-standard "description" key that LLM provider APIs (OpenAI/Anthropic) would have rejected anyway. Surfaces in JacCoder image uploads through JacBuilder.

Refactors#

Refactor: Auto-compaction test suite: Replaced the auto-compaction test suite (introduced in #5722) with 13 focused integration tests that drive the real by llm() pipeline end-to-end. The new suite asserts on observable outcomes (result correctness, exception type, message-list shape, hook-call counts) and never patches _default_compact, _compact_messages, _copy_for_compaction, _get_ctx_window, or model_call_no_stream; they all run as real implementations.

byllm 0.6.5#

New Features#

Add: Native MCP tool support: New McpClient and McpTool let by llm() use tools from any MCP server alongside local tools, e.g. def answer(q: str) -> str by llm(tools=[*mcp.get_tools()]);. Supports stdio, sse, and streamable-http transports, auto-detected from command= or url=. Optional dep: pip install byllm[mcp].
Add: Parent-child invocation tracking for nested by llm() calls: Model.invoke now uses a ContextVar to propagate the current invocation ID, so nested by llm() calls automatically capture their parent. Each telemetry record includes a parent_invocation_id field, enabling external consumers (e.g., jac-scale) to reconstruct the full agent call tree.
Add: parent_invocation_id forwarded to LiteLLM metadata: BaseLLM.make_model_params now includes jac_parent_invocation_id in agent_metadata, allowing LiteLLM-level loggers to correlate individual API calls with the nested invocation hierarchy.
Add: Parallel tool calling: When the LLM emits multiple tool calls in one response, byllm now runs them concurrently via a shared thread pool. Enable globally with jac.toml / BYLLM_PARALLEL_TOOL_CALLING=true /, or per-call with by llm(parallelize=True). The LLM receives scheduling hints to intelligently batch independent tools and sequence dependent ones. Default remains sequential - no change for existing applications.
Add: Auto-compaction for long ReAct loops: When a ReAct agent's message history approaches the model's context window limit, byLLM now automatically summarises old tool-call rounds and replaces them with a compact summary message, letting agents run indefinitely long tasks without hitting provider context errors. The system message and original user task are always preserved. Configure globally via [plugins.byllm.compaction] in jac.toml (enabled, threshold_ratio, keep_recent_iterations, ctx_window, compaction_model) or override per-call with by llm(ctx_window=N, threshold_ratio=0.80, ...). Supply on_compaction=my_fn to replace the built-in summarisation with custom logic. Set compaction_model to use a cheaper model for compaction calls (inherits api_key and base_url from the active model). Model and ModelPool gain a ctx_window field for models not in LiteLLM's registry. A CompactionNotEffectiveError is raised if compaction fires twice consecutively without reducing context size. Default is enabled - set compaction_enabled=False per-call or enabled = false in jac.toml to opt out.
Add: MockLLM usage tuple support for compaction testing: Each entry in MockLLM's outputs list may now be a (payload, usage_dict) tuple (e.g. (MockToolCall(...), {"prompt_tokens": 850})) to inject token-usage metadata without a real API call. Non-tuple entries behave exactly as before - fully backwards-compatible.
Add: conversation=<list> kwarg for multi-turn by llm() calls: Pass a caller-owned list to bind conversation history. byllm reads it as prior context, runs the ReAct loop, and writes the full persistable turn (user, assistant tool_calls, tool_call_id-linked results, final answer) back into the same list - byllm itself stays stateless.
Add: Built-in in-process local LLM runtime via local:<alias>: Run Gemma 4 E4B/E2B and Qwen 3.5 4B in-process through llama.cpp with no daemon and no proxy server. Activated by the opt-in [local] extra (pip install 'byllm[local]'); use --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu (or the matching /cu124, /metal, /vulkan URL) to skip the llama-cpp-python source build. Weights download lazily into ~/.cache/jac/models/<alias>/ on first use and are managed via the new jac model list/pull/rm command. Configurable under [plugins.byllm.local] (default_alias, n_ctx, n_gpu_layers, n_threads, verbose, auto_download). For most local-inference use cases, Ollama is the recommended path -- it ships with automatic GPU detection and a curated model registry, and byLLM has supported it through litellm since launch (default_model = "ollama/<model>"); the new local:* runtime is for users who specifically don't want a separate daemon. The default_llm auto-fallback uses local:<default_alias> when no provider key is configured and [local] is installed; otherwise it raises a ConfigurationError listing the three concrete fixes (set an API key, point default_model at an explicit model such as ollama/<model>, or install byllm[local]).

Bug Fixes#

Fix: Schema generation now emits a JSON-Schema enum constraint for enum-typed return values: _type_to_schema previously described enum members only in the description text, which frontier cloud models inferred but smaller open-weight models ignored. The generated schema now binds output to the actual member set on every provider.
Fix: Schema generation drops MTIR enum members with no runtime value: stale or polluted MTIR could append a member.name whose runtime value was missing, shifting the index pairing (e.g. emitting 5=FITNESS for an enum that actually maps 5=OTHER). The schema generator now skips MTIR members not present in the live __members__ map, keeping enum_names and enum_values aligned.

byllm 0.6.4#

Fix: ModelPool streaming fallback infinite recursion: Fixed a bug where ModelPool with strategy="fallback" and stream=True caused LiteLLM Router's stream_with_fallbacks to recurse infinitely on primary model failure. The fix calls Router._completion() directly per model with fallbacks=[], avoiding recursive re-entry. A yielded guard prevents corrupted output on mid-stream failures by propagating the exception instead of silently falling back. Fallback alias construction is deduplicated into a _fallback_model_names field populated in postinit.
Add: Automatic Anthropic prompt caching: Caches system prompt, tool schemas, and ReAct conversation history across iterations for Claude models, significantly reducing input token costs. Enabled by default.
Fix: Emit usage StreamEvent for no-tool streaming calls: The usage event now fires for every streaming invocation, not just ReAct loops with tools, so token accounting is complete across all by llm() shapes.

byllm 0.6.3#

Add: ModelPool for LLM fallback and load-balancing: Introduced ModelPool as a drop-in replacement for Model - use by pool() exactly like by llm(). Internally wraps a LiteLLM Router running in-process (no subprocess, no proxy server) that handles fallback, retries, and load-distribution across a list of Model instances. Exported from byllm.lib. Six routing strategies are supported: "fallback" (ordered priority, next model on failure), "simple-shuffle" (random pick per call - ideal for free-tier key rotation across multiple API keys), "cost-based-routing" (cheapest deployment via LiteLLM's built-in cost database), "latency-based-routing" (fastest by EWMA-tracked response time), "usage-based-routing" (lowest current TPM/RPM usage), and "least-busy" (fewest in-flight requests). Backward compatible - no changes needed to existing by llm() call sites.
Add: Global ModelPool defaults via jac.toml: A new [plugins.byllm.fallback] section in jac.toml provides global defaults for ModelPool construction - strategy (default "fallback"), num_retries (default 1), and timeout (default 60.0 seconds).

byllm 0.6.2#

Type Safety: BaseLLM implements LLMModel protocol: BaseLLM now extends the LLMModel protocol defined in jaclang core, and the byllm plugin's default_llm hook returns LLMModel instead of object. This enables type-safe LLM model references across the full chain from the type checker through the runtime.

byllm 0.6.1#

Add: ReAct loop interrupt via on_iteration callback: New on_iteration parameter on by llm() fires between iterations, returning CONTINUE, ABORT, or ABORT_WITH_SUMMARY. Enables stop buttons, token budgets, and doom-loop detection. Backward compatible.

byllm 0.6.0#

Security: Pin litellm to safe versions: litellm v1.82.7+ was compromised with a credential-stealing payload (supply chain attack). Pinned dependency to <=1.82.6 which is verified safe. See BerriAI/litellm#24512 for details.

byllm 0.5.9#

1 small changes.

byllm 0.5.8#

Add: Configurable LiteLLM debug logging via jac.toml: LiteLLM's verbose logging (HTTP requests, retries, headers) can now be toggled via [plugins.byllm.litellm] debug = true/false in jac.toml. Defaults to false (quiet). When disabled, _disable_debugging() silences LiteLLM's internal loggers, reducing stdout noise. byLLM's own exception logging (logger.error) is unaffected, errors are always logged and propagated regardless of this setting.
Add: LLM Telemetry & Observability: Introduced a lightweight agent telemetry publish mechanism (byllm/telemetry.jac) that emits structured per-invocation records (caller, user prompt, agent response, token usage, cost, and latency) at the end of every Model.invoke() call without storing any data in byllm itself.
Add: Invocation ID correlation: Model.invoke() now stamps a UUID invocation_id across all LLM calls in a ReAct loop, enabling external consumers (e.g., jac-scale) to correlate per-call litellm events with the top-level agent invocation into a single unified trace.

byllm 0.5.7#

byllm 0.5.6#

byllm 0.5.5#

Small refactors/formatting fixes.

byllm 0.5.4#

2 small refactors/changes.
Add: Explicit LLM exceptions: Introduced jac-byllm/byllm/exceptions.jac to define clear exception types for LLM and tooling failures (improves diagnosability and retry logic).
Fix: Error handling & propagation: Updated LLM implementations and core modules (byllm/lib.jac, byllm/llm.jac, byllm/mtir.jac, byllm/schema.jac, byllm/llm.impl/*, byllm/types.*) to raise and propagate the new exceptions instead of swallowing or returning generic errors.
Tests: Updated to assert exception semantics: jac-byllm/tests/test_byllm.jac and related tests now validate the new failure modes and edge cases.
Fix: JSONDecodeError when gpt-4o returns text alongside a tool call: Fixed a crash where parse_response was called before tool_calls_list was extracted from the LLM message. gpt-4o sometimes returns a brief plain-text string alongside a tool call (e.g. "I'll handle that."), which bypassed the empty-string guard and caused json.loads to fail with Expecting value: line 1 column 1. The fix extracts tool_calls_list first and skips parse_response entirely when tool calls are present, the typed output comes from finish_tool, not from the message content field.
Fix: JSONDecodeError / AttributeError when gpt-4o ignores finish_tool on follow-up turns: gpt-4o does not reliably follow the INSTRUCTION_TOOL system prompt on multi-turn conversations and sometimes responds with plain conversational text instead of calling finish_tool. This caused two cascading failures: json.loads crashed on the plain-text response, and even if it didn't, invoke would break out of the ReAct loop returning a raw str where a structured type was expected. Fixed with two complementary changes: (1) parse_response now wraps json.loads in try/except Exception and returns the raw string on failure rather than crashing; (2) the invoke ReAct loop detects the no-tool-call / structured-type mismatch, appends a correction message, and re-invokes with only finish_tool available, recovering the typed result without an extra round-trip in the happy path.
Fix: NameError: name 'logger' is not defined when a tool raises an exception: tool.impl.jac called logger.exception(...) in the Tool.__call__ error handler, but logger was not imported in types.jac (the module that owns the Tool type). Added import logging and glob logger = logging.getLogger(__name__) to types.jac so tool failures are logged correctly instead of raising a secondary NameError.
Add: Structured streaming event system (StreamEvent): Introduced a StreamEvent type in byllm/types.jac representing unified streaming output events (thoughts, tool calls, tool results, and answer chunks). BaseLLM.invoke now detects when logging is enabled with streaming and routes calls to a new _invoke_streaming method that yields StreamEvent objects for all intermediate steps and final answer tokens. The final-answer streaming step clears all tools so the LLM produces plain text rather than re-invoking finish_tool. StreamEvent is exported via byllm/lib.jac and byllm/llm.jac and _invoke_streaming is added to the BaseLLM public interface. This enables more granular, structured streaming output useful for debugging and UI integration.

byllm 0.5.3#

byllm 0.5.2#

Chore: Codebase Reformatted: All .jac files reformatted with improved jac format (better line-breaking, comment spacing, and ternary indentation).

byllm 0.5.1#

Fix: Enum/structured return types failing with Anthropic models: Fixed a bug where by llm() functions returning non-string types (enums, objects) would crash with Anthropic/Claude models ("Attempted to call tool: 'json_tool_call' which was not present"). The issue was that finish_tool was only added when explicit tools were provided, so when no tools were passed but a structured return type was set, LiteLLM's Anthropic adapter would inject a hidden json_tool_call tool that byllm couldn't find. The fix ensures finish_tool is always created for non-string return types, regardless of whether explicit tools are provided.
Fix: ReAct final answer tool calling error with Anthropic models: Fixed a bug where the ReAct loop's final answer step would clear tools=[], causing Anthropic to fail when the conversation history already contained tool call results ("tools must be defined" / "tools must not be empty"). The fix retains finish_tool in the tools list during the final answer step (for both streaming and non-streaming paths) and also handles the case where Anthropic returns the final answer via a finish_tool call rather than plain text.
Dependency: LiteLLM updated to >=1.81.15,<1.83.0: The minimum LiteLLM version has been raised from 1.75.5.post1 to 1.81.15 to pick up Anthropic tool-calling fixes. If you have other packages pinning LiteLLM below 1.81.15, you will need to update them.

byllm 0.5.0#

Builtin llm for Zero-Config by llm(): The llm name is now a builtin, so by llm() works without any explicit import or glob llm = ... declaration. The byllm plugin provides a default_llm hook that automatically returns a configured Model instance from jac.toml settings. Users can still override the builtin by defining glob llm = ... in their module.

byllm 0.4.21#

Deprecated method parameter: The method parameter ("ReAct", "Reason", "Chain-of-Thoughts") in by llm() is now deprecated and emits a DeprecationWarning. It was never functional; the ReAct tool-calling loop is automatically enabled when tools=[...] is provided. Simply pass tools directly instead of using method="ReAct".

byllm 0.4.20#

Minor internal refactors

byllm 0.4.19#

Various refactors
Fix: api_key parameter no longer silently ignored: The api_key passed via constructor, instance config, or global config (jac.toml) was being overwritten with None before every LLM call. The key is now properly resolved with a clear priority chain (constructor > instance config > global config > environment variables) and passed to LiteLLM, OpenAI client, and HTTP calls. API keys are also masked in verbose logs, showing only the last 4 characters.
Fix: MTIR scope key mismatch between compile-time and runtime: Fixed a bug where MTIR entries stored at compile-time could not be retrieved at runtime due to mismatched scope keys. At compile-time, the scope key was generated using the module's file path, while at runtime it used __main__. This caused by llm() calls to silently fall back to introspection mode, losing all sem string descriptions. The fix uses sys.modules['__main__'].__file__ at runtime to get the entry point's file path, then extracts the file stem to match the compile-time scope key format.

byllm 0.4.18#

byllm 0.4.17#

Enum Semantic Strings in Schema: Added support for extracting semantic strings from enum members at compile time. Enum member descriptions (e.g., sem Personality.INTROVERT = "Person who is reserved...") are now included in LLM schemas, providing richer context for enum selection.

byllm 0.4.16#

MTIR-Powered Schema Generation: MTRuntime now uses compile-time MTIR info for generating JSON schemas with semantic descriptions. Tool and return type schemas include semstrings extracted at compile time, providing richer context for LLM calls.
Python Library Fallback Mode: When MTIR is unavailable (e.g., using byLLM as a Python library without Jac compilation), the runtime gracefully falls back to introspection-based schema generation, maintaining backward compatibility.
Schema Generation Fixes: Fixed several issues in schema generation that were exposed when MTIR data became correctly available:
Union Type Null Safety: Fixed NoneType has no attribute 'type_info' errors when processing Union types like str | None.
Dataclass Inherited Fields: Schema now correctly includes inherited fields from base classes (e.g., Dog | Cat union now includes name, age from Pet base class).
Required Fields Validation: Fixed OpenAI schema validation error ("Extra required key supplied") by only listing fields that actually exist in properties.
Function Schema Fallback: Dynamically created tools (like finish_tool) now work correctly even without MTIR info by falling back to function introspection.
Internal: Explicitly declared all postinit fields across the codebase.
Internal refactors: Removed orphaned files, etc.

byllm 0.4.15#

Direct HTTP model calls: Added support for calling custom LLM endpoints via direct HTTP (http_client in model config).

byllm 0.4.14#

Max Iterations for ReAct (max_react_iterations): Added a configurable limit for ReAct tool-calling loops via by llm(max_react_iterations=3) to prevent overly long or endless reasoning cycles. When the limit is reached, the model stops calling tools and returns a final answer based on the information gathered so far.

byllm 0.4.13#

byllm 0.4.12#

byllm 0.4.9#

LLM-Powered Graph Traversal (visit by): Introduced visit [-->] by llm() syntax enabling walkers to make intelligent traversal decisions. The LLM analyzes the semantic context of available nodes and selects which ones to visit based on the walker's purpose, bringing AI-powered decision making to graph navigation.

byllm 0.4.8#

Streaming with ReAct Tool Calling: Implemented real-time streaming support for ReAct method when using tools. After tool execution completes, the LLM now streams the final synthesized answer token-by-token, providing the best of both worlds: structured tool calling with streaming responses.

byllm 0.4.7#

Custom Model Declaration: Custom model interfaces can be defined by using the BaseLLM class that can be imported form byllm.lib. A guide for using this feature is added to documentation.

byllm 0.4.6#

byLLM In-Memory Images: byLLM Image class now accepts in-memory and path-like inputs (bytes/bytearray/memoryview, BytesIO/file-like, PIL.Image, Path), plus data/gs/http(s) URLs; auto-detects MIME (incl. WEBP), preserves URLs, and reads streams.

byllm 0.4.5#

byLLM Lazy Loading: Refactored byLLM to support lazy loading by moving all exports to byllm.lib module. Users should now import from byllm.lib in Python (e.g., from byllm.lib import Model, by) and use import from byllm.lib { Model } in Jac code. This improves startup performance and reduces unnecessary module loading.
NonGPT Fallback for byLLM: Implemented automatic fallback when byLLM is not installed. When code attempts to import byllm, the system will provide mock implementations that return random values using the NonGPT.random_value_for_type() utility.

byllm 0.4.4#

is Keyword for Semstrings: Added support for using is as an alternative to = in semantic string declarations (e.g., sem MyObject.value is "A value stored in MyObject").
byLLM Plugin Interface Improved: Enhanced the byLLM plugin interface with get_mtir function hook interface and refactored the by decorator to use the plugin system, improving integration and extensibility.

byllm 0.4.3#

byLLM Enhancements:
Fixed bug with Enums without values not being properly included in prompts (e.g., enum Tell { YES, NO } now works correctly).

byllm 0.4.2#

byLLM transition: MTLLM has been transitioned to byLLM and PyPi package is renamed to byllm. Github actions are changed to push byllm PyPi. Alongside an mtllm PyPi will be pushed which installs latest byllm and produces a deprecation warning when imported as mtllm.
byLLM Feature Methods as Tools: byLLM now supports adding methods of classes as tools for the llm using such as tools=[ToolHolder.tool]

byllm 0.4.1#

byLLM transition: MTLLM has been transitioned to byLLM and PyPi package is renamed to byllm. Github actions are changed to push byllm PyPi. Alongside an mtllm PyPi will be pushed which installs latest byllm and produces a deprecation warning when imported as mtllm.

mtllm 0.4.0#

Removed LLM Override: function_call() by llm() has been removed as it was introduce ambiguity in the grammer with LALR(1) shift/reduce error. This feature will be reintroduced in a future release with a different syntax.

mtllm 0.3.8#

Semantic Strings: Introduced sem strings to attach natural language descriptions to code elements like functions, classes, and parameters. These semantic annotations can be used by Large Language Models (LLMs) to enable intelligent, AI-powered code generation and execution. (mtllm)
LLM Function Overriding: Introduced the ability to override any regular function with an LLM-powered implementation at runtime using the function_call() by llm() syntax. This allows for dynamic, on-the-fly replacement of function behavior with generative models. (mtllm)

byLLM Release Notes#

byllm 0.6.19 (Latest Release)#

New Features#

Refactors#

byllm 0.6.18#

Bug Fixes#

byllm 0.6.17#

Bug Fixes#

byllm 0.6.15#

New Features#

Bug Fixes#

byllm 0.6.14#

New Features#

byllm 0.6.12#

New Features#

byllm 0.6.11#

New Features#

byllm 0.6.10#

New Features#

Bug Fixes#

Refactors#

byllm 0.6.9#

New Features#

Bug Fixes#

byllm 0.6.6#

New Features#

Bug Fixes#

Refactors#

byllm 0.6.5#

New Features#

Bug Fixes#

byllm 0.6.4#

byllm 0.6.3#

byllm 0.6.2#

byllm 0.6.1#

byllm 0.6.0#

byllm 0.5.9#

byllm 0.5.8#

byllm 0.5.7#

byllm 0.5.6#

byllm 0.5.5#

byllm 0.5.4#

byllm 0.5.3#

byllm 0.5.2#

byllm 0.5.1#

byllm 0.5.0#

byllm 0.4.21#

byllm 0.4.20#

byllm 0.4.19#

byllm 0.4.18#

byllm 0.4.17#

byllm 0.4.16#

byllm 0.4.15#

byllm 0.4.14#

byllm 0.4.13#

byllm 0.4.12#

byllm 0.4.9#

byllm 0.4.8#

byllm 0.4.7#

byllm 0.4.6#

byllm 0.4.5#

byllm 0.4.4#

byllm 0.4.3#

byllm 0.4.2#

byllm 0.4.1#

mtllm 0.4.0#

mtllm 0.3.8#

Version 0.8.0#