darabos committed
Commit c577568 · 1 Parent(s): 53103e2

Improve docs, especially for plugin development.

README.md CHANGED
@@ -1,17 +1,9 @@
- ---
- title: LynxKite 2000:MM
- emoji: 🪁
- colorFrom: purple
- colorTo: gray
- sdk: docker
- app_port: 7860
- ---
-
  # LynxKite 2000:MM
 
  LynxKite 2000:MM is a GPU-accelerated data science platform and a general tool for collaboratively edited workflows.
 
  Features include:
+
  - A web UI for building and executing data science workflows.
  - An extensive toolbox of graph analytics operations powered by NVIDIA RAPIDS (CUDA).
  - An integrated collaborative code editor makes it easy to add new operations.
@@ -20,7 +12,7 @@ Features include:
 
  This is the next evolution of the classical [LynxKite](https://github.com/lynxkite/lynxkite).
  The two tools offer similar functionality, but are not compatible.
- Where classical LynxKite ran on Hadoop clusters, this version runs on GPU clusters.
+ This version runs on GPU clusters instead of Hadoop clusters.
  It targets CUDA instead of Apache Spark. It is much more extensible.
 
  ## Structure
docs/index.md CHANGED
@@ -1,3 +1,5 @@
- # Getting started
-
- Good luck getting started!
+ ---
+ title: Overview
+ ---
+
+ --8<-- "README.md"
docs/license.md ADDED
@@ -0,0 +1,11 @@
+ # License
+
+ LynxKite 2000:MM is available under the GNU AGPLv3 license below.
+
+ Additionally, [Lynx Analytics](https://www.lynxanalytics.com/) offers a commercial license of LynxKite 2000:MM
+ that includes additional features and support. Get in touch if you are interested in life sciences tools
+ and cluster deployment!
+
+ ```
+ --8<-- "LICENSE"
+ ```
docs/lynxkite-core.md DELETED
@@ -1,6 +0,0 @@
- # LynxKite Core
-
- LynxKite core is for writing LynxKite plugins.
- It contains core types and utilities that can be used by all LynxKite plugins.
-
- ::: lynxkite.core.ops
docs/lynxkite-graph-analytics.md DELETED
@@ -1,6 +0,0 @@
- # LynxKite Graph Analytics
-
- This is the classical LynxKite experience!
- The graph analytics plugin is a collection of graph algorithms that can be run on a LynxKite graph.
-
- ::: lynxkite_graph_analytics.lynxkite_ops
docs/reference/lynxkite-core/executors/one_by_one.md ADDED
@@ -0,0 +1 @@
+ ::: lynxkite.core.executors.one_by_one
docs/reference/lynxkite-core/executors/simple.md ADDED
@@ -0,0 +1 @@
+ ::: lynxkite.core.executors.simple
docs/reference/lynxkite-core/ops.md ADDED
@@ -0,0 +1 @@
+ ::: lynxkite.core.ops
docs/reference/lynxkite-core/workspace.md ADDED
@@ -0,0 +1 @@
+ ::: lynxkite.core.workspace
docs/reference/lynxkite-graph-analytics/core.md ADDED
@@ -0,0 +1 @@
+ ::: lynxkite_graph_analytics.core
docs/reference/lynxkite-graph-analytics/operations.md ADDED
@@ -0,0 +1,3 @@
+ ::: lynxkite_graph_analytics.lynxkite_ops
+ ::: lynxkite_graph_analytics.ml_ops
+ ::: lynxkite_graph_analytics.networkx_ops
docs/usage/plugins.md ADDED
@@ -0,0 +1,278 @@
+ # Plugin development
+
+ Plugins can provide additional operations for an existing LynxKite environment,
+ and they can also provide new environments.
+
+ ## Creating a new plugin
+
+ `.py` files inside the LynxKite data directory are automatically imported each time a
+ workspace is executed. To create a new plugin, just add a new `.py` file to the
+ data directory. LynxKite even includes an integrated editor for this purpose.
+ Click **New code file** in the directory where you want to create the file.
+
+ Plugins in subdirectories of the data directory are imported when executing workspaces
+ within those directories. This allows you to create plugins that are only available
+ in specific workspaces.
+
+ You can also create and distribute plugins as Python packages. In this case the
+ module name must start with `lynxkite_` for it to be automatically imported on startup.
+
+ ### Plugin dependencies
+
+ When creating a plugin as a "code file", you can create a `requirements.txt` file in the same
+ directory. This file is used to install the plugin's dependencies.
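+
+ For example, a minimal code-file plugin might look like this, with `numpy` listed in the
+ `requirements.txt` next to it. (The file and operation names are made up, and the `@op`
+ decorator is explained in the next section.)
+
+ ```python
+ # my_plugin.py, placed in the LynxKite data directory.
+ import numpy as np
+ import pandas as pd
+ from lynxkite.core.ops import op
+
+ @op("LynxKite Graph Analytics", "Add noise")
+ def add_noise(df: pd.DataFrame, *, column_name: str, scale: float = 1.0):
+     # Work on a copy, so the input DataFrame is left unchanged.
+     df = df.copy()
+     df[column_name] = df[column_name] + np.random.normal(0, scale, len(df))
+     return df
+ ```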
+
+ ## Adding new operations
+
+ Any piece of Python code can easily be wrapped into a LynxKite operation.
+ Let's say we have some code that calculates the length of a string column in a Pandas DataFrame:
+
+ ```python
+ df["length"] = df["my_column"].str.len()
+ ```
+
+ We can turn it into a LynxKite operation using the
+ [`@op`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.op) decorator:
+
+ ```python
+ import pandas as pd
+ from lynxkite.core.ops import op
+
+ @op("LynxKite Graph Analytics", "Get column length")
+ def get_length(df: pd.DataFrame, *, column_name: str):
+     """
+     Gets the length of a string column.
+
+     Args:
+         column_name: The name of the column to get the length of.
+     """
+     df = df.copy()
+     df["length"] = df[column_name].str.len()
+     return df
+ ```
+
+ Let's review the changes we made.
+
+ ### The `@op` decorator
+
+ The [`@op`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.op) decorator registers a
+ function as a LynxKite operation. The first argument is the name of the environment,
+ the second argument is the name of the operation.
+
+ When defining multiple operations, you can use
+ [`ops.op_registration`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.op_registration)
+ for convenience:
+ ```python
+ op = ops.op_registration("LynxKite Graph Analytics")
+
+ @op("An operation")
+ def my_op():
+     ...
+ ```
+
+ ### The function signature
+
+ `*` in the list of function arguments marks the start of keyword-only arguments.
+ The arguments before `*` will become _inputs_ of the operation. The arguments after `*` will
+ be its _parameters_.
+
+ ```python
+ #              /--- inputs ---\     /- parameters -\
+ def get_length(df: pd.DataFrame, *, column_name: str):
+ ```
+
+ LynxKite uses the type annotations of the function arguments to provide input validation,
+ conversion, and the right UI on the frontend.
+
+ The types supported for **inputs** are determined by the environment. For graph analytics,
+ the possibilities are:
+
+ - `pandas.DataFrame`
+ - `networkx.Graph`
+ - [`lynxkite_graph_analytics.Bundle`](../reference/lynxkite-graph-analytics/core.md#lynxkite_graph_analytics.core.Bundle)
+
+ The inputs of an operation are automatically converted to the right type, when possible.
+
+ To make an input optional, use an optional type, like `pd.DataFrame | None`.
+
+ The position of the input and output connectors can be controlled using the
+ [`@ops.input_position`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.input_position) and
+ [`@ops.output_position`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.output_position)
+ decorators. By default, inputs are on the left and outputs on the right.
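+
+ For example, here is the `maybe add` example from the
+ [`@ops.input_position`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.input_position)
+ docstring. The optional `b` input is moved to the bottom of the node:
+
+ ```python
+ from lynxkite.core import ops
+
+ @ops.input_position(b="bottom")
+ @op("test", "maybe add")
+ def maybe_add(a: list[int], b: list[int] | None = None):
+     # `b` is optional: the operation also runs with nothing connected to it.
+     return [x + y for x, y in zip(a, b)] if b else a
+ ```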
+
+ All **parameters** are stored in LynxKite workspaces as strings. If a type annotation is provided,
+ LynxKite will convert the string to the right type and provide the right UI.
+
+ - `str`, `int`, `float` are presented as a text box and converted to the given type.
+ - `bool` is presented as a checkbox.
+ - [`lynxkite.core.ops.LongStr`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.LongStr)
+   is presented as a text area.
+ - Enums are presented as a dropdown list.
+ - Pydantic models are presented as their JSON string representations. (Unless you add custom UI
+   for them.) They are converted to the model object when your function is called.
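+
+ As an illustrative sketch (not a built-in operation), several of these parameter types can be
+ combined in one operation:
+
+ ```python
+ import enum
+
+ import pandas as pd
+ from lynxkite.core.ops import op, LongStr
+
+ class Aggregation(enum.Enum):
+     SUM = "sum"
+     MEAN = "mean"
+
+ @op("LynxKite Graph Analytics", "Aggregate column")
+ def aggregate(
+     df: pd.DataFrame,
+     *,
+     column_name: str,  # Text box.
+     method: Aggregation = Aggregation.SUM,  # Dropdown list.
+     dropna: bool = True,  # Checkbox.
+     notes: LongStr = "",  # Multiline text area.
+ ):
+     s = df[column_name].dropna() if dropna else df[column_name]
+     return pd.DataFrame({column_name: [s.agg(method.value)]})
+ ```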
+
+ ### Slow operations
+
+ If the function takes a significant amount of time to run, we must either:
+
+ - Write an asynchronous function.
+ - Pass `slow=True` to the `@op` decorator. LynxKite will run the function in a separate thread.
+
+ `slow=True` also causes the results of the operation to be cached on disk. As long as
+ its inputs don't change, the operation will not be run again. This is useful for both
+ synchronous and asynchronous operations.
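+
+ For example, both operations below stay responsive with large inputs. This is a sketch: the
+ operation names are made up, and `httpx` is just one way to make an async request.
+
+ ```python
+ import httpx
+ import pandas as pd
+ from lynxkite.core.ops import op
+
+ @op("LynxKite Graph Analytics", "Expensive sort", slow=True)
+ def expensive_sort(df: pd.DataFrame, *, column_name: str):
+     # Runs in a separate thread, and the result is cached on disk.
+     return df.sort_values(column_name)
+
+ @op("LynxKite Graph Analytics", "Fetch text")
+ async def fetch_text(*, url: str):
+     # An async operation does not block LynxKite while waiting for the network.
+     async with httpx.AsyncClient() as client:
+         response = await client.get(url)
+     return pd.DataFrame({"text": [response.text]})
+ ```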
+
+ ### Documentation
+
+ The docstring of the function is used as the operation's description. You can use
+ Google-style or Numpy-style docstrings.
+ (See [Griffe's documentation](https://mkdocstrings.github.io/griffe/reference/docstrings/).)
+
+ The docstring can be omitted for simple, self-explanatory operations.
+
+ ### Outputting results
+
+ The return value of the function is the output of the operation. It will be passed to the
+ next operation in the pipeline.
+
+ An operation can have multiple outputs. In this case, the return value must be a dictionary,
+ and the list of outputs must be declared in the `@op` decorator.
+
+ ```python
+ @op("LynxKite Graph Analytics", "Train/test split", outputs=["train", "test"])
+ def test_split(df: pd.DataFrame, *, test_ratio=0.1):
+     test = df.sample(frac=test_ratio)
+     train = df.drop(test.index)
+     return {"train": train.reset_index(drop=True), "test": test.reset_index(drop=True)}
+ ```
+
+ ### Displaying results
+
+ The outputs of the operation can be used by other operations. But we can also generate results
+ that are meant to be viewed by the user. The different options for this are controlled by the `view`
+ argument of the `@op` decorator.
+
+ The `view` argument can be one of the following:
+
+ - `matplotlib`: Just plot something with Matplotlib and it will be displayed in the UI.
+
+   ```python
+   @op("LynxKite Graph Analytics", "Plot column histogram", view="matplotlib")
+   def plot(df: pd.DataFrame, *, column_name: str):
+       df[column_name].value_counts().sort_index().plot.bar()
+   ```
+
+ - `visualization`: Draws a chart using [ECharts](https://echarts.apache.org/examples/en/index.html).
+   You need to return a dictionary with the chart configuration, which ECharts calls `option`.
+
+   ```python
+   @op("View loss", view="visualization")
+   def view_loss(bundle: core.Bundle):
+       loss = bundle.dfs["training"].training_loss.tolist()
+       v = {
+           "title": {"text": "Training loss"},
+           "xAxis": {"type": "category"},
+           "yAxis": {"type": "value"},
+           "series": [{"data": loss, "type": "line"}],
+       }
+       return v
+   ```
+
+ - `image`: Return an image as a
+   [data URL](https://developer.mozilla.org/en-US/docs/Web/URI/Reference/Schemes/data)
+   and it will be displayed. (See the sketch after this list.)
+ - `molecule`: Return a molecule as a PDB or SDF string, or an `rdkit.Chem.Mol` object.
+   It will be displayed using [3Dmol.js](https://3Dmol.org/).
+ - `table_view`: Return
+   [`Bundle.to_dict()`](../reference/lynxkite-graph-analytics/core.md#lynxkite_graph_analytics.core.Bundle.to_dict).
+
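+ As a sketch of the `image` option, a Matplotlib figure can be converted to a data URL with the
+ standard library. (`Scatter plot` is a hypothetical operation, not part of the built-in toolbox.)
+
+ ```python
+ import base64
+ import io
+
+ import matplotlib.pyplot as plt
+ import pandas as pd
+ from lynxkite.core.ops import op
+
+ @op("LynxKite Graph Analytics", "Scatter plot", view="image")
+ def scatter_plot(df: pd.DataFrame, *, x: str, y: str):
+     fig, ax = plt.subplots()
+     ax.scatter(df[x], df[y])
+     buf = io.BytesIO()
+     fig.savefig(buf, format="png")
+     plt.close(fig)
+     # A data URL embeds the PNG bytes directly in a string.
+     return "data:image/png;base64," + base64.b64encode(buf.getvalue()).decode()
+ ```
+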
+ ## Adding new environments
+
+ A new environment means a completely new set of operations, and (optionally) a new
+ executor. There's nothing to be done for setting up a new environment. Just start
+ registering operations into it.
+
+ ### No executor
+
+ By default, the new environment will have no executor. This can be useful!
+
+ LynxKite workspaces are stored as straightforward JSON files and updated on every modification.
+ You can use LynxKite for configuring workflows and have a separate system
+ read the JSON files.
+
+ Since the code of the operations is not executed in this case, you can create functions that do nothing.
+ Alternatively, you can use the
+ [`register_passive_op`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.register_passive_op)
+ and
+ [`passive_op_registration`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.passive_op_registration)
+ functions to easily whip up a set of operations:
+
+ ```python
+ from lynxkite.core.ops import passive_op_registration, Parameter as P
+
+ op = passive_op_registration("My Environment")
+ op('Scrape documents', params=[P('url', '')])
+ op('Conversation logs')
+ op('Extract graph')
+ op('Compute embeddings', params=[P.options('method', ['LLM', 'graph', 'random']), P('dimensions', 1234)])
+ op('Vector DB', params=[P.options('backend', ['ANN', 'HNSW'])])
+ op('Chat UI', outputs=[])
+ op('Chat backend')
+ ```
+
+ ### Built-in executors
+
+ LynxKite comes with two built-in executors. You can register one of these for your environment
+ and you're good to go.
+
+ ```python
+ from lynxkite.core.executors import simple
+ simple.register("My Environment")
+ ```
+
+ The [`simple` executor](../reference/lynxkite-core/executors/simple.md)
+ runs each operation once, passing the output of the preceding operation
+ as the input to the next one. No tricks. You can use any types as inputs and outputs.
+
+ ```python
+ from lynxkite.core.executors import one_by_one
+ one_by_one.register("My Environment")
+ ```
+
+ The [`one_by_one` executor](../reference/lynxkite-core/executors/one_by_one.md)
+ expects the code of an operation to transform a single element.
+ If an operation returns an iterable, it is split up
+ into its elements, and the next operation is called for each element.
+
+ Sometimes you need the full contents of an input. The `one_by_one` executor
+ lets you choose between the two modes by the orientation of the input connector.
+ If the input connector is horizontal (left or right), it takes single elements.
+ If the input connector is vertical (top or bottom), it takes an iterable of all the incoming data.
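+
+ For example, a batch input might be requested by moving the connector to the top with
+ [`@ops.input_position`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.input_position).
+ This is a sketch under that assumption, with made-up names:
+
+ ```python
+ from lynxkite.core import ops
+ from lynxkite.core.ops import op
+
+ @ops.input_position(items="top")
+ @op("My Environment", "Count elements")
+ def count_elements(items):
+     # With a vertical (top) connector, `items` is an iterable of all incoming elements.
+     return len(list(items))
+ ```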
+
+ A unique advantage of this setup is that workflows can contain loops across
+ horizontal inputs. Just make sure that loops eventually discard all elements, so you don't
+ end up with an infinite loop.
+
+ ### Custom executors
+
+ A custom executor can be registered using
+ [`@ops.register_executor`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.register_executor).
+
+ ```python
+ @ops.register_executor(ENV)
+ async def execute(ws: workspace.Workspace):
+     catalog = ops.CATALOGS[ws.env]
+     ...
+ ```
+
+ The executor must be an asynchronous function that takes a
+ [`workspace.Workspace`](../reference/lynxkite-core/workspace.md#lynxkite.core.workspace.Workspace)
+ as an argument. The return value is ignored and it's up to you how you process the workspace.
+
+ To update the frontend as the executor processes the workspace, call
+ [`WorkspaceNode.publish_started`](../reference/lynxkite-core/workspace.md#lynxkite.core.workspace.WorkspaceNode.publish_started)
+ when starting to execute a node, and
+ [`WorkspaceNode.publish_result`](../reference/lynxkite-core/workspace.md#lynxkite.core.workspace.WorkspaceNode.publish_result)
+ to publish the results. Use
+ [`WorkspaceNode.publish_error`](../reference/lynxkite-core/workspace.md#lynxkite.core.workspace.WorkspaceNode.publish_error)
+ if the node failed.
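+
+ Putting these together, the skeleton of a custom executor might look like the sketch below.
+ It assumes nodes can be visited in dependency order and that `publish_result` accepts the
+ operation's result; check the workspace reference for the exact signatures.
+
+ ```python
+ from lynxkite.core import ops, workspace
+
+ ENV = "My Environment"
+
+ @ops.register_executor(ENV)
+ async def execute(ws: workspace.Workspace):
+     catalog = ops.CATALOGS[ws.env]
+     # A real executor would walk the nodes in dependency order and pass
+     # inputs along the edges. See the one_by_one executor for a full example.
+     for node in ws.nodes:
+         node.publish_started()
+         try:
+             op = catalog[node.data.title]
+             result = op(**node.data.params)
+             node.publish_result(result)
+         except Exception as e:
+             node.publish_error(str(e))
+ ```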
docs/usage/quickstart.md ADDED
@@ -0,0 +1,25 @@
+ # Quickstart
+
+ Install the LynxKite application with `pip`:
+ ```bash
+ pip install lynxkite
+ ```
+
+ To be able to do anything useful, you also need to install one or more LynxKite environments.
+ If you want to work with data science and graph analytics, install the `lynxkite-graph-analytics` package:
+ ```bash
+ pip install lynxkite-graph-analytics
+ ```
+
+ Create a folder for storing your LynxKite projects:
+ ```bash
+ mkdir ~/lynxkite_projects
+ ```
+
+ You're ready to run LynxKite!
+ ```bash
+ cd ~/lynxkite_projects
+ lynxkite
+ ```
+
+ Open [http://localhost:8000/](http://localhost:8000/) in your browser.
lynxkite-core/src/lynxkite/core/executors/one_by_one.py CHANGED
@@ -1,4 +1,6 @@
- """A LynxKite executor that assumes most operations operate on their input one by one."""
+ """
+ A LynxKite executor that assumes most operations operate on their input one by one.
+ """
 
  from .. import ops
  from .. import workspace
@@ -11,24 +13,24 @@ import typing
 
 
  class Context(ops.BaseConfig):
-     """Passed to operation functions as "_ctx" if they have such a parameter."""
+     """Passed to operation functions as "_ctx" if they have such a parameter.
+ 
+     Attributes:
+         node: The workspace node that this context is associated with.
+         last_result: The last result produced by the operation.
+             This can be used to incrementally build a result, when the operation
+             is executed for multiple items.
+     """
 
      node: workspace.WorkspaceNode
      last_result: typing.Any = None
 
 
- class Output(ops.BaseConfig):
-     """Return this to send values to specific outputs of a node."""
- 
-     output_handle: str
-     value: dict
- 
- 
- def df_to_list(df):
+ def _df_to_list(df):
      return df.to_dict(orient="records")
 
 
- def has_ctx(op):
+ def _has_ctx(op):
      sig = inspect.signature(op.func)
      return "_ctx" in sig.parameters
 
@@ -37,16 +39,22 @@ CACHES = {}
 
 
  def register(env: str, cache: bool = True):
-     """Registers the one-by-one executor."""
+     """Registers the one-by-one executor.
+ 
+     Usage:
+ 
+         from lynxkite.core.executors import one_by_one
+         one_by_one.register("My Environment")
+     """
      if cache:
          CACHES[env] = {}
          cache = CACHES[env]
      else:
          cache = None
-     ops.EXECUTORS[env] = lambda ws: execute(ws, ops.CATALOGS[env], cache=cache)
+     ops.EXECUTORS[env] = lambda ws: _execute(ws, ops.CATALOGS[env], cache=cache)
 
 
- def get_stages(ws, catalog: ops.Catalog):
+ def _get_stages(ws, catalog: ops.Catalog):
      """Inputs on top/bottom are batch inputs. We decompose the graph into a DAG of components along these edges."""
      nodes = {n.id: n for n in ws.nodes}
      batch_inputs = {}
@@ -81,20 +89,20 @@ def _default_serializer(obj):
      return {"__nonserializable__": id(obj)}
 
 
- def make_cache_key(obj):
+ def _make_cache_key(obj):
      return orjson.dumps(obj, default=_default_serializer)
 
 
  EXECUTOR_OUTPUT_CACHE = {}
 
 
- async def await_if_needed(obj):
+ async def _await_if_needed(obj):
      if inspect.isawaitable(obj):
          return await obj
      return obj
 
 
- async def execute(ws: workspace.Workspace, catalog: ops.Catalog, cache=None):
+ async def _execute(ws: workspace.Workspace, catalog: ops.Catalog, cache=None):
      nodes = {n.id: n for n in ws.nodes}
      contexts = {n.id: Context(node=n) for n in ws.nodes}
      edges = {n.id: [] for n in ws.nodes}
@@ -113,7 +121,7 @@ async def execute(ws: workspace.Workspace, catalog: ops.Catalog, cache=None):
          tasks[node.id] = [NO_INPUT]
      batch_inputs = {}
      # Run the rest until we run out of tasks.
-     stages = get_stages(ws, catalog)
+     stages = _get_stages(ws, catalog)
      for stage in stages:
          next_stage = {}
          while tasks:
@@ -124,7 +132,7 @@ async def execute(ws: workspace.Workspace, catalog: ops.Catalog, cache=None):
              node = nodes[n]
              op = catalog[node.data.title]
              params = {**node.data.params}
-             if has_ctx(op):
+             if _has_ctx(op):
                  params["_ctx"] = contexts[node.id]
              results = []
              node.publish_started()
@@ -148,7 +156,7 @@ async def execute(ws: workspace.Workspace, catalog: ops.Catalog, cache=None):
                  node.publish_error(f"Missing input: {', '.join(missing)}")
                  break
              if cache is not None:
-                 key = make_cache_key((inputs, params))
+                 key = _make_cache_key((inputs, params))
                  if key not in cache:
                      result: ops.Result = op(*inputs, **params)
-                     result.output = await await_if_needed(result.output)
+                     result.output = await _await_if_needed(result.output)
@@ -164,7 +172,7 @@ async def execute(ws: workspace.Workspace, catalog: ops.Catalog, cache=None):
          contexts[node.id].last_result = output
          # Returned lists and DataFrames are considered multiple tasks.
          if isinstance(output, pd.DataFrame):
-             output = df_to_list(output)
+             output = _df_to_list(output)
          elif not isinstance(output, list):
              output = [output]
          results.extend(output)
lynxkite-core/src/lynxkite/core/executors/simple.py CHANGED
@@ -9,7 +9,13 @@ import graphlib
 
 
  def register(env: str):
-     """Registers the one-by-one executor."""
+     """Registers the simple executor.
+ 
+     Usage:
+ 
+         from lynxkite.core.executors import simple
+         simple.register("My Environment")
+     """
      ops.EXECUTORS[env] = lambda ws: execute(ws, ops.CATALOGS[env])
 
 
lynxkite-core/src/lynxkite/core/ops.py CHANGED
@@ -41,6 +41,7 @@ def type_to_json(t):
 
  Type = Annotated[typing.Any, pydantic.PlainSerializer(type_to_json, return_type=dict)]
  LongStr = Annotated[str, {"format": "textarea"}]
+ """LongStr is a string type for parameters that will be displayed as a multiline text area in the UI."""
  PathStr = Annotated[str, {"format": "path"}]
  CollapsedStr = Annotated[str, {"format": "collapsed"}]
  NodeAttribute = Annotated[str, {"format": "node attribute"}]
@@ -314,24 +315,41 @@ def matplotlib_to_image(func):
      return wrapper
 
 
- def input_position(**kwargs):
-     """Decorator for specifying unusual positions for the inputs."""
+ def input_position(**positions):
+     """
+     Decorator for specifying unusual positions for the inputs.
+ 
+     Example usage:
+ 
+         @input_position(a="bottom", b="bottom")
+         @op("test", "maybe add")
+         def maybe_add(a: list[int], b: list[int] | None = None):
+             return [a + b for a, b in zip(a, b)] if b else a
+     """
 
      def decorator(func):
          op = func.__op__
-         for k, v in kwargs.items():
+         for k, v in positions.items():
              op.get_input(k).position = Position(v)
          return func
 
      return decorator
 
 
- def output_position(**kwargs):
-     """Decorator for specifying unusual positions for the outputs."""
+ def output_position(**positions):
+     """Decorator for specifying unusual positions for the outputs.
+ 
+     Example usage:
+ 
+         @output_position(output="top")
+         @op("test", "maybe add")
+         def maybe_add(a: list[int], b: list[int] | None = None):
+             return [a + b for a, b in zip(a, b)] if b else a
+     """
 
      def decorator(func):
          op = func.__op__
-         for k, v in kwargs.items():
+         for k, v in positions.items():
              op.get_output(k).position = Position(v)
          return func
 
lynxkite-graph-analytics/src/lynxkite_graph_analytics/core.py CHANGED
@@ -17,16 +17,28 @@ ENV = "LynxKite Graph Analytics"
 
  @dataclasses.dataclass
  class RelationDefinition:
-     """Defines a set of edges."""
+     """
+     Defines a set of edges.
+ 
+     Attributes:
+         df: The name of the DataFrame that contains the edges.
+         source_column: The column in the edge DataFrame that contains the source node ID.
+         target_column: The column in the edge DataFrame that contains the target node ID.
+         source_table: The name of the DataFrame that contains the source nodes.
+         target_table: The name of the DataFrame that contains the target nodes.
+         source_key: The column in the source table that contains the node ID.
+         target_key: The column in the target table that contains the node ID.
+         name: Descriptive name for the relation.
+     """
 
-     df: str  # The DataFrame that contains the edges.
-     source_column: str  # The column in the edge DataFrame that contains the source node ID.
-     target_column: str  # The column in the edge DataFrame that contains the target node ID.
-     source_table: str  # The DataFrame that contains the source nodes.
-     target_table: str  # The DataFrame that contains the target nodes.
-     source_key: str  # The column in the source table that contains the node ID.
-     target_key: str  # The column in the target table that contains the node ID.
-     name: str | None = None  # Descriptive name for the relation.
+     df: str
+     source_column: str
+     target_column: str
+     source_table: str
+     target_table: str
+     source_key: str
+     target_key: str
+     name: str | None = None
 
 
  @dataclasses.dataclass
@@ -34,7 +46,16 @@ class Bundle:
      """A collection of DataFrames and other data.
 
      Can efficiently represent a knowledge graph (homogeneous or heterogeneous) or tabular data.
-     It can also carry other data, such as a trained model.
+ 
+     By convention, if it contains a single DataFrame, it is called `df`.
+     If it contains a homogeneous graph, it is represented as two DataFrames called `nodes` and
+     `edges`.
+ 
+     Attributes:
+         dfs: Named DataFrames.
+         relations: Metadata that describes the roles of each DataFrame.
+             Can be empty, if the bundle is just one or more DataFrames.
+         other: Other data, such as a trained model.
      """
 
      dfs: dict[str, pd.DataFrame] = dataclasses.field(default_factory=dict)
@@ -91,7 +112,10 @@ class Bundle:
          return graph
 
      def copy(self):
-         """Returns a medium depth copy of the bundle. The Bundle is completely new, but the DataFrames and RelationDefinitions are shared."""
+         """
+         Returns a shallow copy of the bundle. The Bundle and its containers are new, but
+         the DataFrames and RelationDefinitions are shared. (The contents of `other` are also shared.)
+         """
          return Bundle(
              dfs=dict(self.dfs),
              relations=list(self.relations),
lynxkite-graph-analytics/src/lynxkite_graph_analytics/lynxkite_ops.py CHANGED
@@ -312,9 +312,3 @@ def create_graph(bundle: core.Bundle, *, relations: str = None) -> core.Bundle:
      if not (relations is None or relations.strip() == ""):
          bundle.relations = [core.RelationDefinition(**r) for r in json.loads(relations).values()]
      return ops.Result(output=bundle, display=bundle.to_dict(limit=100))
- 
- 
- @op("Biomedical foundation graph (PLACEHOLDER)")
- def biomedical_foundation_graph(*, filter_nodes: str):
-     """Loads the gigantic Lynx-maintained knowledge graph. Includes drugs, diseases, genes, proteins, etc."""
-     return None
mkdocs.yml CHANGED
@@ -1,6 +1,25 @@
- site_name: "LynxKite"
- repo_url: https://github.com/lynxkite/lynxkite
- repo_name: lynxkite/lynxkite
+ site_name: "LynxKite 2000:MM"
+ repo_url: https://github.com/lynxkite/lynxkite-2000
+ repo_name: lynxkite/lynxkite-2000
+ watch: [mkdocs.yml, README.md, lynxkite-core, lynxkite-graph-analytics, lynxkite-app]
+ 
+ nav:
+   - Home:
+       - Overview: index.md
+       - License: license.md
+   - Usage:
+       - usage/quickstart.md
+       - usage/plugins.md
+   - API reference:
+       - LynxKite Core:
+           - reference/lynxkite-core/ops.md
+           - reference/lynxkite-core/workspace.md
+           - Executors:
+               - reference/lynxkite-core/executors/simple.md
+               - reference/lynxkite-core/executors/one_by_one.md
+       - LynxKite Graph Analytics:
+           - reference/lynxkite-graph-analytics/core.md
+           - reference/lynxkite-graph-analytics/operations.md
 
  theme:
    name: "material"
@@ -13,13 +32,35 @@ theme:
    - navigation.path
    - navigation.instant
    - navigation.instant.prefetch
+   - navigation.footer
+   - content.code.annotate
+   - content.code.copy
 
  extra_css:
    - stylesheets/extra.css
 
  plugins:
    - search
+   - autorefs
    - mkdocstrings:
        handlers:
          python:
            paths: ["./lynxkite-app/src", "./lynxkite-core/src", "./lynxkite-graph-analytics/src"]
+           options:
+             show_source: false
+             show_symbol_type_heading: true
+             show_symbol_type_toc: true
+             docstring_section_style: spacy
+             separate_signature: true
+             show_signature_annotations: true
+             signature_crossrefs: true
+ markdown_extensions:
+   - pymdownx.highlight:
+       anchor_linenums: true
+       line_spans: __span
+       pygments_lang_class: true
+   - pymdownx.inlinehilite
+   - pymdownx.snippets
+   - pymdownx.superfences
+   - toc:
+       permalink: "¤"