Improve docs, especially for plugin development.
- README.md +2 -10
- docs/index.md +4 -2
- docs/license.md +11 -0
- docs/lynxkite-core.md +0 -6
- docs/lynxkite-graph-analytics.md +0 -6
- docs/reference/lynxkite-core/executors/one_by_one.md +1 -0
- docs/reference/lynxkite-core/executors/simple.md +1 -0
- docs/reference/lynxkite-core/ops.md +1 -0
- docs/reference/lynxkite-core/workspace.md +1 -0
- docs/reference/lynxkite-graph-analytics/core.md +1 -0
- docs/reference/lynxkite-graph-analytics/operations.md +3 -0
- docs/usage/plugins.md +278 -0
- docs/usage/quickstart.md +25 -0
- lynxkite-core/src/lynxkite/core/executors/one_by_one.py +29 -21
- lynxkite-core/src/lynxkite/core/executors/simple.py +7 -1
- lynxkite-core/src/lynxkite/core/ops.py +24 -6
- lynxkite-graph-analytics/src/lynxkite_graph_analytics/core.py +35 -11
- lynxkite-graph-analytics/src/lynxkite_graph_analytics/lynxkite_ops.py +0 -6
- mkdocs.yml +44 -3
README.md
CHANGED
@@ -1,17 +1,9 @@
|
|
1 |
-
---
|
2 |
-
title: LynxKite 2000:MM
|
3 |
-
emoji: 🪁
|
4 |
-
colorFrom: purple
|
5 |
-
colorTo: gray
|
6 |
-
sdk: docker
|
7 |
-
app_port: 7860
|
8 |
-
---
|
9 |
-
|
10 |
# LynxKite 2000:MM
|
11 |
|
12 |
LynxKite 2000:MM is a GPU-accelerated data science platform and a general tool for collaboratively edited workflows.
|
13 |
|
14 |
Features include:
|
|
|
15 |
- A web UI for building and executing data science workflows.
|
16 |
- An extensive toolbox of graph analytics operations powered by NVIDIA RAPIDS (CUDA).
|
17 |
- An integrated collaborative code editor makes it easy to add new operations.
|
@@ -20,7 +12,7 @@ Features include:
|
|
20 |
|
21 |
This is the next evolution of the classical [LynxKite](https://github.com/lynxkite/lynxkite).
|
22 |
The two tools offer similar functionality, but are not compatible.
|
23 |
-
|
24 |
It targets CUDA instead of Apache Spark. It is much more extensible.
|
25 |
|
26 |
## Structure
|
1 |
# LynxKite 2000:MM
|
2 |
|
3 |
LynxKite 2000:MM is a GPU-accelerated data science platform and a general tool for collaboratively edited workflows.
|
4 |
|
5 |
Features include:
|
6 |
+
|
7 |
- A web UI for building and executing data science workflows.
|
8 |
- An extensive toolbox of graph analytics operations powered by NVIDIA RAPIDS (CUDA).
|
9 |
- An integrated collaborative code editor makes it easy to add new operations.
|
|
|
12 |
|
13 |
This is the next evolution of the classical [LynxKite](https://github.com/lynxkite/lynxkite).
|
14 |
The two tools offer similar functionality, but are not compatible.
|
15 |
+
This version runs on GPU clusters instead of Hadoop clusters.
|
16 |
It targets CUDA instead of Apache Spark. It is much more extensible.
|
17 |
|
18 |
## Structure
|
docs/index.md
CHANGED
@@ -1,3 +1,5 @@
|
|
1 |
-
|
|
|
|
|
2 |
|
3 |
-
|
|
|
1 |
+
---
|
2 |
+
title: Overview
|
3 |
+
---
|
4 |
|
5 |
+
--8<-- "README.md"
|
docs/license.md
ADDED
@@ -0,0 +1,11 @@
|
1 |
+
# License
|
2 |
+
|
3 |
+
LynxKite 2000:MM is available under the GNU AGPLv3 license below.
|
4 |
+
|
5 |
+
Additionally, [Lynx Analytics](https://www.lynxanalytics.com/) offers a commercial license of LynxKite 2000:MM
|
6 |
+
that includes additional features and support. Get in touch if you are interested in life sciences tools
|
7 |
+
and cluster deployment!
|
8 |
+
|
9 |
+
```
|
10 |
+
--8<-- "LICENSE"
|
11 |
+
```
|
docs/lynxkite-core.md
DELETED
@@ -1,6 +0,0 @@
|
|
1 |
-
# LynxKite Core
|
2 |
-
|
3 |
-
LynxKite core is for writing LynxKite plugins.
|
4 |
-
It contains core types and utilities that can be used by all LynxKite plugins.
|
5 |
-
|
6 |
-
::: lynxkite.core.ops
docs/lynxkite-graph-analytics.md
DELETED
@@ -1,6 +0,0 @@
|
|
1 |
-
# LynxKite Graph Analytics
|
2 |
-
|
3 |
-
This is the classical LynxKite experience!
|
4 |
-
The graph analytics plugin is a collection of graph algorithms that can be run on a LynxKite graph.
|
5 |
-
|
6 |
-
::: lynxkite_graph_analytics.lynxkite_ops
docs/reference/lynxkite-core/executors/one_by_one.md
ADDED
@@ -0,0 +1 @@
|
|
|
|
|
1 |
+
::: lynxkite.core.executors.one_by_one
|
docs/reference/lynxkite-core/executors/simple.md
ADDED
@@ -0,0 +1 @@
|
|
|
|
|
1 |
+
::: lynxkite.core.executors.simple
|
docs/reference/lynxkite-core/ops.md
ADDED
@@ -0,0 +1 @@
|
|
|
|
|
1 |
+
::: lynxkite.core.ops
|
docs/reference/lynxkite-core/workspace.md
ADDED
@@ -0,0 +1 @@
|
|
|
|
|
1 |
+
::: lynxkite.core.workspace
|
docs/reference/lynxkite-graph-analytics/core.md
ADDED
@@ -0,0 +1 @@
|
|
|
|
|
1 |
+
::: lynxkite_graph_analytics.core
|
docs/reference/lynxkite-graph-analytics/operations.md
ADDED
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
1 |
+
::: lynxkite_graph_analytics.lynxkite_ops
|
2 |
+
::: lynxkite_graph_analytics.ml_ops
|
3 |
+
::: lynxkite_graph_analytics.networkx_ops
|
docs/usage/plugins.md
ADDED
@@ -0,0 +1,278 @@
|
1 |
+
# Plugin development
|
2 |
+
|
3 |
+
Plugins can provide additional operations for an existing LynxKite environment,
|
4 |
+
and they can also provide new environments.
|
5 |
+
|
6 |
+
## Creating a new plugin
|
7 |
+
|
8 |
+
`.py` files inside the LynxKite data directory are automatically imported each time a
|
9 |
+
workspace is executed. You can create a new plugin by creating a new `.py` file in the
|
10 |
+
data directory. LynxKite even includes an integrated editor for this purpose.
|
11 |
+
Click **New code file** in the directory where you want to create the file.
|
12 |
+
|
13 |
+
Plugins in subdirectories of the data directory are imported when executing workspaces
|
14 |
+
within those directories. This allows you to create plugins that are only available
|
15 |
+
in specific workspaces.
|
16 |
+
|
17 |
+
You can also create and distribute plugins as Python packages. In this case the
|
18 |
+
module name must start with `lynxkite_` for it to be automatically imported on startup.
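
As a rough illustration of what prefix-based discovery can look like, the sketch below lists installed top-level modules whose names start with `lynxkite_`. It uses only the standard library; `find_plugin_modules` is a hypothetical helper, and this is not necessarily how LynxKite itself finds plugins.

```python
import pkgutil


def find_plugin_modules(prefix: str = "lynxkite_") -> list[str]:
    """Return the names of importable top-level modules that start with the prefix."""
    # pkgutil.iter_modules() walks the entries of sys.path.
    return sorted(m.name for m in pkgutil.iter_modules() if m.name.startswith(prefix))


# Each name could then be passed to importlib.import_module().
print(find_plugin_modules())
```

A package named `lynxkite_my_plugin` would show up in this list once it is installed in the same Python environment.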
|
19 |
+
|
20 |
+
### Plugin dependencies
|
21 |
+
|
22 |
+
When creating a plugin as a "code file", you can create a `requirements.txt` file in the same
|
23 |
+
directory. This file will be used to install the dependencies of the plugin.
|
24 |
+
|
25 |
+
## Adding new operations
|
26 |
+
|
27 |
+
Any piece of Python code can easily be wrapped into a LynxKite operation.
|
28 |
+
Let's say we have some code that calculates the length of a string column in a Pandas DataFrame:
|
29 |
+
|
30 |
+
```python
|
31 |
+
df["length"] = df["my_column"].str.len()
|
32 |
+
```
|
33 |
+
|
34 |
+
We can turn it into a LynxKite operation using the
|
35 |
+
[`@op`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.op) decorator:
|
36 |
+
|
37 |
+
```python
|
38 |
+
import pandas as pd
|
39 |
+
from lynxkite.core.ops import op
|
40 |
+
|
41 |
+
@op("LynxKite Graph Analytics", "Get column length")
|
42 |
+
def get_length(df: pd.DataFrame, *, column_name: str):
|
43 |
+
    """
|
44 |
+
    Gets the length of a string column.
|
45 |
+
|
46 |
+
    Args:
|
47 |
+
        column_name: The name of the column to get the length of.
|
48 |
+
    """
|
49 |
+
    df = df.copy()
|
50 |
+
    df["length"] = df[column_name].str.len()
|
51 |
+
    return df
|
52 |
+
```
|
53 |
+
|
54 |
+
Let's review the changes we made.
|
55 |
+
|
56 |
+
### The `@op` decorator
|
57 |
+
|
58 |
+
The [`@op`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.op) decorator registers a
|
59 |
+
function as a LynxKite operation. The first argument is the name of the environment,
|
60 |
+
the second argument is the name of the operation.
|
61 |
+
|
62 |
+
When defining multiple operations, you can use
|
63 |
+
[`ops.op_registration`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.op_registration)
|
64 |
+
for convenience:
|
65 |
+
```python
|
66 |
+
op = ops.op_registration("LynxKite Graph Analytics")
|
67 |
+
|
68 |
+
@op("An operation")
|
69 |
+
def my_op():
|
70 |
+
    ...
|
71 |
+
```
|
72 |
+
|
73 |
+
### The function signature
|
74 |
+
|
75 |
+
`*` in the list of function arguments marks the start of keyword-only arguments.
|
76 |
+
The arguments before `*` will become _inputs_ of the operation. The arguments after `*` will
|
77 |
+
be its _parameters_.
|
78 |
+
|
79 |
+
```python
|
80 |
+
# /--- inputs ---\ /- parameters -\
|
81 |
+
def get_length(df: pd.DataFrame, *, column_name: str):
|
82 |
+
```
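
To see the split between inputs and parameters in plain Python, the sketch below inspects a function signature with the standard `inspect` module. `split_signature` is a hypothetical helper written for this illustration; LynxKite's own handling may differ in its details.

```python
import inspect


def split_signature(func):
    """Return (inputs, parameters) as two lists of argument names."""
    inputs, parameters = [], []
    for name, p in inspect.signature(func).parameters.items():
        if p.kind is inspect.Parameter.KEYWORD_ONLY:
            # Arguments after `*` can only be passed by keyword.
            parameters.append(name)
        elif p.kind in (
            inspect.Parameter.POSITIONAL_ONLY,
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
        ):
            inputs.append(name)
    return inputs, parameters


def get_length(df, *, column_name: str):
    ...


print(split_signature(get_length))  # (['df'], ['column_name'])
```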
|
83 |
+
|
84 |
+
LynxKite uses the type annotations of the function arguments to provide input validation,
|
85 |
+
conversion, and the right UI on the frontend.
|
86 |
+
|
87 |
+
The types supported for **inputs** are determined by the environment. For graph analytics,
|
88 |
+
the possibilities are:
|
89 |
+
|
90 |
+
- `pandas.DataFrame`
|
91 |
+
- `networkx.Graph`
|
92 |
+
- [`lynxkite_graph_analytics.Bundle`](../reference/lynxkite-graph-analytics/core.md#lynxkite_graph_analytics.core.Bundle)
|
93 |
+
|
94 |
+
The inputs of an operation are automatically converted to the right type, when possible.
|
95 |
+
|
96 |
+
To make an input optional, use an optional type, like `pd.DataFrame | None`.
|
97 |
+
|
98 |
+
The position of the input and output connectors can be controlled using the
|
99 |
+
[`@ops.input_position`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.input_position) and
|
100 |
+
[`@ops.output_position`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.output_position)
|
101 |
+
decorators. By default, inputs are on the left and outputs on the right.
|
102 |
+
|
103 |
+
All **parameters** are stored in LynxKite workspaces as strings. If a type annotation is provided,
|
104 |
+
LynxKite will convert the string to the right type and provide the right UI.
|
105 |
+
|
106 |
+
- `str`, `int`, `float` are presented as a text box and converted to the given type.
|
107 |
+
- `bool` is presented as a checkbox.
|
108 |
+
- [`lynxkite.core.ops.LongStr`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.LongStr)
|
109 |
+
is presented as a text area.
|
110 |
+
- Enums are presented as a dropdown list.
|
111 |
+
- Pydantic models are presented as their JSON string representations. (Unless you add custom UI
|
112 |
+
for them.) They are converted to the model object when your function is called.
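
The conversion from stored strings to typed values can be pictured with the following minimal sketch. It assumes parameters arrive as a `dict` of strings; `convert_params`, `Method`, and `example` are made-up names for this illustration and are not part of LynxKite.

```python
import enum
import inspect


class Method(enum.Enum):  # Hypothetical enum, for illustration only.
    LLM = "LLM"
    GRAPH = "graph"


def convert_params(func, raw: dict[str, str]) -> dict:
    """Convert string values using the annotations of func's keyword-only arguments."""
    converted = {}
    for name, p in inspect.signature(func).parameters.items():
        if p.kind is not inspect.Parameter.KEYWORD_ONLY or name not in raw:
            continue
        value, t = raw[name], p.annotation
        if t is bool:
            converted[name] = value.strip().lower() in ("true", "1", "yes")
        elif t in (int, float, str):
            converted[name] = t(value)
        elif isinstance(t, type) and issubclass(t, enum.Enum):
            converted[name] = t(value)  # Look up the enum member by its value.
        else:
            converted[name] = value  # Unknown or missing annotation: keep the string.
    return converted


def example(df, *, dimensions: int, normalize: bool, method: Method):
    ...


print(convert_params(example, {"dimensions": "16", "normalize": "true", "method": "graph"}))
```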
|
113 |
+
|
114 |
+
### Slow operations
|
115 |
+
|
116 |
+
If the function takes a significant amount of time to run, we must either:
|
117 |
+
|
118 |
+
- Write an asynchronous function.
|
119 |
+
- Pass `slow=True` to the `@op` decorator. LynxKite will run the function in a separate thread.
|
120 |
+
|
121 |
+
`slow=True` also causes the results of the operation to be cached on disk. As long as
|
122 |
+
its inputs don't change, the operation will not be run again. This is useful for both
|
123 |
+
synchronous and asynchronous operations.
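
The reason a slow synchronous function needs its own thread can be shown with plain `asyncio`. This is a sketch of the general technique, not of LynxKite's internals; `slow_square` is a stand-in for a long computation.

```python
import asyncio
import time


def slow_square(x: int) -> int:
    time.sleep(0.1)  # Stands in for a long, blocking computation.
    return x * x


async def main():
    # asyncio.to_thread runs each blocking call in a worker thread,
    # so the event loop stays responsive while the calls are running.
    return await asyncio.gather(*(asyncio.to_thread(slow_square, i) for i in range(3)))


results = asyncio.run(main())
print(results)
```

Calling `slow_square` directly inside `main` would block the event loop for the full duration of each call.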
|
124 |
+
|
125 |
+
### Documentation
|
126 |
+
|
127 |
+
The docstring of the function is used as the operation's description. You can use
|
128 |
+
Google-style or Numpy-style docstrings.
|
129 |
+
(See [Griffe's documentation](https://mkdocstrings.github.io/griffe/reference/docstrings/).)
|
130 |
+
|
131 |
+
The docstring can be omitted for simple, self-explanatory operations.
|
132 |
+
|
133 |
+
### Outputting results
|
134 |
+
|
135 |
+
The return value of the function is the output of the operation. It will be passed to the
|
136 |
+
next operation in the pipeline.
|
137 |
+
|
138 |
+
An operation can have multiple outputs. In this case, the return value must be a dictionary,
|
139 |
+
and the list of outputs must be declared in the `@op` decorator.
|
140 |
+
|
141 |
+
```python
|
142 |
+
@op("LynxKite Graph Analytics", "Train/test split", outputs=["train", "test"])
|
143 |
+
def test_split(df: pd.DataFrame, *, test_ratio=0.1):
|
144 |
+
    test = df.sample(frac=test_ratio)
|
145 |
+
    train = df.drop(test.index).reset_index()
|
146 |
+
    return {"train": train, "test": test.reset_index()}
|
147 |
+
```
|
148 |
+
|
149 |
+
### Displaying results
|
150 |
+
|
151 |
+
The outputs of the operation can be used by other operations. But we can also generate results
|
152 |
+
that are meant to be viewed by the user. The different options for this are controlled by the `view`
|
153 |
+
argument of the `@op` decorator.
|
154 |
+
|
155 |
+
The `view` argument can be one of the following:
|
156 |
+
|
157 |
+
- `matplotlib`: Just plot something with Matplotlib and it will be displayed in the UI.
|
158 |
+
|
159 |
+
```python
|
160 |
+
@op("LynxKite Graph Analytics", "Plot column histogram", view="matplotlib")
|
161 |
+
def plot(df: pd.DataFrame, *, column_name: str):
|
162 |
+
df[column_name].value_counts().sort_index().plot.bar()
|
163 |
+
```
|
164 |
+
|
165 |
+
- `visualization`: Draws a chart using [ECharts](https://echarts.apache.org/examples/en/index.html).
|
166 |
+
You need to return a dictionary with the chart configuration, which ECharts calls `option`.
|
167 |
+
|
168 |
+
```python
|
169 |
+
@op("View loss", view="visualization")
|
170 |
+
def view_loss(bundle: core.Bundle):
|
171 |
+
loss = bundle.dfs["training"].training_loss.tolist()
|
172 |
+
v = {
|
173 |
+
"title": {"text": "Training loss"},
|
174 |
+
"xAxis": {"type": "category"},
|
175 |
+
"yAxis": {"type": "value"},
|
176 |
+
"series": [{"data": loss, "type": "line"}],
|
177 |
+
}
|
178 |
+
return v
|
179 |
+
```
|
180 |
+
|
181 |
+
- `image`: Return an image as a
|
182 |
+
[data URL](https://developer.mozilla.org/en-US/docs/Web/URI/Reference/Schemes/data)
|
183 |
+
and it will be displayed.
|
184 |
+
- `molecule`: Return a molecule as a PDB or SDF string, or an `rdkit.Chem.Mol` object.
|
185 |
+
It will be displayed using [3Dmol.js](https://3Dmol.org/).
|
186 |
+
- `table_view`: Return
|
187 |
+
[`Bundle.to_dict()`](../reference/lynxkite-graph-analytics/core.md#lynxkite_graph_analytics.core.Bundle.to_dict).
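
For the `image` view, a data URL can be built from raw image bytes with the standard library. The helper name `to_data_url` is an assumption made for this sketch.

```python
import base64


def to_data_url(data: bytes, mime_type: str = "image/png") -> str:
    """Encode raw bytes as a base64 data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Any bytes work; a real operation would pass the contents of a PNG or JPEG file.
print(to_data_url(b"hello", "text/plain"))  # data:text/plain;base64,aGVsbG8=
```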
|
188 |
+
|
189 |
+
## Adding new environments
|
190 |
+
|
191 |
+
A new environment means a completely new set of operations, and (optionally) a new
|
192 |
+
executor. There's nothing to be done for setting up a new environment. Just start
|
193 |
+
registering operations into it.
|
194 |
+
|
195 |
+
### No executor
|
196 |
+
|
197 |
+
By default, the new environment will have no executor. This can be useful!
|
198 |
+
|
199 |
+
LynxKite workspaces are stored as straightforward JSON files and updated on every modification.
|
200 |
+
You can use LynxKite for configuring workflows and have a separate system
|
201 |
+
read the JSON files.
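
A separate system could consume a workspace file along these lines. The field names below (`nodes`, `id`, `data.title`, `data.params`) mirror attributes used by the executors, but the exact file layout is an assumption here, so check a real workspace file before relying on it.

```python
import json

# A made-up workspace document. The real layout may contain more fields.
workspace_json = """
{
  "env": "My Environment",
  "nodes": [
    {"id": "n1", "data": {"title": "Scrape documents", "params": {"url": "https://example.com"}}},
    {"id": "n2", "data": {"title": "Extract graph", "params": {}}}
  ]
}
"""


def list_operations(text: str) -> list[tuple[str, dict]]:
    """Return (operation title, parameters) for every node in the workspace."""
    ws = json.loads(text)
    return [(n["data"]["title"], n["data"]["params"]) for n in ws.get("nodes", [])]


for title, params in list_operations(workspace_json):
    print(title, params)
```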
|
202 |
+
|
203 |
+
Since the code of the operations is not executed in this case, you can create functions that do nothing.
|
204 |
+
Alternatively, you can use the
|
205 |
+
[`register_passive_op`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.register_passive_op)
|
206 |
+
and
|
207 |
+
[`passive_op_registration`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.passive_op_registration)
|
208 |
+
functions to easily whip up a set of operations:
|
209 |
+
|
210 |
+
```python
|
211 |
+
from lynxkite.core.ops import passive_op_registration, Parameter as P
|
212 |
+
|
213 |
+
op = passive_op_registration("My Environment")
|
214 |
+
op('Scrape documents', params=[P('url', '')])
|
215 |
+
op('Conversation logs')
|
216 |
+
op('Extract graph')
|
217 |
+
op('Compute embeddings', params=[P.options('method', ['LLM', 'graph', 'random']), P('dimensions', 1234)])
|
218 |
+
op('Vector DB', params=[P.options('backend', ['ANN', 'HNSW'])])
|
219 |
+
op('Chat UI', outputs=[])
|
220 |
+
op('Chat backend')
|
221 |
+
```
|
222 |
+
|
223 |
+
### Built-in executors
|
224 |
+
|
225 |
+
LynxKite comes with two built-in executors. You can register these for your environment
|
226 |
+
and you're good to go.
|
227 |
+
|
228 |
+
```python
|
229 |
+
from lynxkite.core.executors import simple
|
230 |
+
simple.register("My Environment")
|
231 |
+
```
|
232 |
+
|
233 |
+
The [`simple` executor](../reference/lynxkite-core/executors/simple.md)
|
234 |
+
runs each operation once, passing the output of the preceding operation
|
235 |
+
as the input to the next one. No tricks. You can use any types as inputs and outputs.
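
The idea of running every operation once, in dependency order, can be sketched with the standard `graphlib` module. The `dependencies` and `operations` dictionaries are hypothetical stand-ins for a workspace and its catalog, not LynxKite data structures.

```python
import graphlib

# Each key is a node; each value is the set of nodes it takes its input from.
dependencies = {
    "load": set(),
    "clean": {"load"},
    "train": {"clean"},
    "plot": {"clean"},
}

# Plain functions stand in for operations.
operations = {
    "load": lambda: [3, 1, 2],
    "clean": lambda xs: sorted(xs),
    "train": lambda xs: sum(xs) / len(xs),
    "plot": lambda xs: f"{len(xs)} points",
}

outputs = {}
# static_order() yields every node after all of its dependencies.
for node in graphlib.TopologicalSorter(dependencies).static_order():
    inputs = [outputs[d] for d in sorted(dependencies[node])]
    outputs[node] = operations[node](*inputs)

print(outputs["train"], outputs["plot"])
```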
|
236 |
+
|
237 |
+
```python
|
238 |
+
from lynxkite.core.executors import one_by_one
|
239 |
+
one_by_one.register("My Environment")
|
240 |
+
```
|
241 |
+
|
242 |
+
The [`one_by_one` executor](../reference/lynxkite-core/executors/one_by_one.md)
|
243 |
+
expects that the code for operations is the code for transforming
|
244 |
+
a single element. If an operation returns an iterable, it will be split up
|
245 |
+
into its elements, and the next operation is called for each element.
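
The element-by-element behavior can be approximated in a few lines of plain Python. This sketch handles only a linear chain and only treats returned lists as multiple items; `run_one_by_one` is a hypothetical function, and the real executor does more (batch inputs, caching, loops).

```python
def run_one_by_one(operations, first_input):
    """Push items through a chain of functions, one element at a time."""
    tasks = [first_input]
    for op in operations:
        next_tasks = []
        for item in tasks:
            output = op(item)
            # A returned list counts as several separate items for the next step.
            if isinstance(output, list):
                next_tasks.extend(output)
            else:
                next_tasks.append(output)
        tasks = next_tasks
    return tasks


result = run_one_by_one(
    [
        lambda text: text.split(),  # One string becomes several words.
        lambda word: word.upper(),  # Called once per word.
        lambda word: [] if len(word) < 3 else word,  # An empty list discards the item.
    ],
    "a quick brown fox",
)
print(result)
```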
|
246 |
+
|
247 |
+
Sometimes you need the full contents of an input. The `one_by_one` executor
|
248 |
+
lets you choose between the two modes by the orientation of the input connector.
|
249 |
+
If the input connector is horizontal (left or right), it takes single elements.
|
250 |
+
If the input connector is vertical (top or bottom), it takes an iterable of all the incoming data.
|
251 |
+
|
252 |
+
A unique advantage of this setup is that a workspace can contain loops, as long as they pass
|
253 |
+
through horizontal inputs. Just make sure that loops eventually discard all elements, so you don't
|
254 |
+
end up with an infinite loop.
|
255 |
+
|
256 |
+
### Custom executors
|
257 |
+
|
258 |
+
A custom executor can be registered using
|
259 |
+
[`@ops.register_executor`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.register_executor).
|
260 |
+
|
261 |
+
```python
|
262 |
+
@ops.register_executor(ENV)
|
263 |
+
async def execute(ws: workspace.Workspace):
|
264 |
+
catalog = ops.CATALOGS[ws.env]
|
265 |
+
    ...
|
266 |
+
```
|
267 |
+
|
268 |
+
The executor must be an asynchronous function that takes a
|
269 |
+
[`workspace.Workspace`](../reference/lynxkite-core/workspace.md#lynxkite.core.workspace.Workspace)
|
270 |
+
as an argument. The return value is ignored and it's up to you how you process the workspace.
|
271 |
+
|
272 |
+
To update the frontend as the executor processes the workspace, call
|
273 |
+
[`WorkspaceNode.publish_started`](../reference/lynxkite-core/workspace.md#lynxkite.core.workspace.WorkspaceNode.publish_started)
|
274 |
+
when starting to execute a node, and
|
275 |
+
[`WorkspaceNode.publish_result`](../reference/lynxkite-core/workspace.md#lynxkite.core.workspace.WorkspaceNode.publish_result)
|
276 |
+
to publish the results. Use
|
277 |
+
[`WorkspaceNode.publish_error`](../reference/lynxkite-core/workspace.md#lynxkite.core.workspace.WorkspaceNode.publish_error)
|
278 |
+
if the node failed.
|
docs/usage/quickstart.md
ADDED
@@ -0,0 +1,25 @@
|
1 |
+
# Quickstart
|
2 |
+
|
3 |
+
Install the LynxKite application with `pip`:
|
4 |
+
```bash
|
5 |
+
pip install lynxkite
|
6 |
+
```
|
7 |
+
|
8 |
+
To be able to do anything useful, you also need to install one or more LynxKite environments.
|
9 |
+
If you want to work with data science and graph analytics, install the `lynxkite-graph-analytics` package:
|
10 |
+
```bash
|
11 |
+
pip install lynxkite-graph-analytics
|
12 |
+
```
|
13 |
+
|
14 |
+
Create a folder for storing your LynxKite projects:
|
15 |
+
```bash
|
16 |
+
mkdir ~/lynxkite_projects
|
17 |
+
```
|
18 |
+
|
19 |
+
You're ready to run LynxKite!
|
20 |
+
```bash
|
21 |
+
cd ~/lynxkite_projects
|
22 |
+
lynxkite
|
23 |
+
```
|
24 |
+
|
25 |
+
Open [http://localhost:8000/](http://localhost:8000/) in your browser.
|
lynxkite-core/src/lynxkite/core/executors/one_by_one.py
CHANGED
@@ -1,4 +1,6 @@
|
|
1 |
-
"""
|
|
|
|
|
2 |
|
3 |
from .. import ops
|
4 |
from .. import workspace
|
@@ -11,24 +13,24 @@ import typing
|
|
11 |
|
12 |
|
13 |
class Context(ops.BaseConfig):
|
14 |
-
"""Passed to operation functions as "_ctx" if they have such a parameter.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
15 |
|
16 |
node: workspace.WorkspaceNode
|
17 |
last_result: typing.Any = None
|
18 |
|
19 |
|
20 |
-
|
21 |
-
"""Return this to send values to specific outputs of a node."""
|
22 |
-
|
23 |
-
output_handle: str
|
24 |
-
value: dict
|
25 |
-
|
26 |
-
|
27 |
-
def df_to_list(df):
|
28 |
return df.to_dict(orient="records")
|
29 |
|
30 |
|
31 |
-
def
|
32 |
sig = inspect.signature(op.func)
|
33 |
return "_ctx" in sig.parameters
|
34 |
|
@@ -37,16 +39,22 @@ CACHES = {}
|
|
37 |
|
38 |
|
39 |
def register(env: str, cache: bool = True):
|
40 |
-
"""Registers the one-by-one executor.
|
|
|
|
|
|
|
|
|
|
|
|
|
41 |
if cache:
|
42 |
CACHES[env] = {}
|
43 |
cache = CACHES[env]
|
44 |
else:
|
45 |
cache = None
|
46 |
-
ops.EXECUTORS[env] = lambda ws:
|
47 |
|
48 |
|
49 |
-
def
|
50 |
"""Inputs on top/bottom are batch inputs. We decompose the graph into a DAG of components along these edges."""
|
51 |
nodes = {n.id: n for n in ws.nodes}
|
52 |
batch_inputs = {}
|
@@ -81,20 +89,20 @@ def _default_serializer(obj):
|
|
81 |
return {"__nonserializable__": id(obj)}
|
82 |
|
83 |
|
84 |
-
def
|
85 |
return orjson.dumps(obj, default=_default_serializer)
|
86 |
|
87 |
|
88 |
EXECUTOR_OUTPUT_CACHE = {}
|
89 |
|
90 |
|
91 |
-
async def
|
92 |
if inspect.isawaitable(obj):
|
93 |
return await obj
|
94 |
return obj
|
95 |
|
96 |
|
97 |
-
async def
|
98 |
nodes = {n.id: n for n in ws.nodes}
|
99 |
contexts = {n.id: Context(node=n) for n in ws.nodes}
|
100 |
edges = {n.id: [] for n in ws.nodes}
|
@@ -113,7 +121,7 @@ async def execute(ws: workspace.Workspace, catalog: ops.Catalog, cache=None):
|
|
113 |
tasks[node.id] = [NO_INPUT]
|
114 |
batch_inputs = {}
|
115 |
# Run the rest until we run out of tasks.
|
116 |
-
stages =
|
117 |
for stage in stages:
|
118 |
next_stage = {}
|
119 |
while tasks:
|
@@ -124,7 +132,7 @@ async def execute(ws: workspace.Workspace, catalog: ops.Catalog, cache=None):
|
|
124 |
node = nodes[n]
|
125 |
op = catalog[node.data.title]
|
126 |
params = {**node.data.params}
|
127 |
-
if
|
128 |
params["_ctx"] = contexts[node.id]
|
129 |
results = []
|
130 |
node.publish_started()
|
@@ -148,7 +156,7 @@ async def execute(ws: workspace.Workspace, catalog: ops.Catalog, cache=None):
|
|
148 |
node.publish_error(f"Missing input: {', '.join(missing)}")
|
149 |
break
|
150 |
if cache is not None:
|
151 |
-
key =
|
152 |
if key not in cache:
|
153 |
result: ops.Result = op(*inputs, **params)
|
154 |
result.output = await await_if_needed(result.output)
|
@@ -164,7 +172,7 @@ async def execute(ws: workspace.Workspace, catalog: ops.Catalog, cache=None):
|
|
164 |
contexts[node.id].last_result = output
|
165 |
# Returned lists and DataFrames are considered multiple tasks.
|
166 |
if isinstance(output, pd.DataFrame):
|
167 |
-
output =
|
168 |
elif not isinstance(output, list):
|
169 |
output = [output]
|
170 |
results.extend(output)
|
|
|
1 |
+
"""
|
2 |
+
A LynxKite executor that assumes most operations operate on their input one by one.
|
3 |
+
"""
|
4 |
|
5 |
from .. import ops
|
6 |
from .. import workspace
|
|
|
13 |
|
14 |
|
15 |
class Context(ops.BaseConfig):
|
16 |
+
"""Passed to operation functions as "_ctx" if they have such a parameter.
|
17 |
+
|
18 |
+
Attributes:
|
19 |
+
node: The workspace node that this context is associated with.
|
20 |
+
last_result: The last result produced by the operation.
|
21 |
+
This can be used to incrementally build a result, when the operation
|
22 |
+
is executed for multiple items.
|
23 |
+
"""
|
24 |
|
25 |
node: workspace.WorkspaceNode
|
26 |
last_result: typing.Any = None
|
27 |
|
28 |
|
29 |
+
def _df_to_list(df):
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
30 |
return df.to_dict(orient="records")
|
31 |
|
32 |
|
33 |
+
def _has_ctx(op):
|
34 |
sig = inspect.signature(op.func)
|
35 |
return "_ctx" in sig.parameters
|
36 |
|
|
|
39 |
|
40 |
|
41 |
def register(env: str, cache: bool = True):
|
42 |
+
"""Registers the one-by-one executor.
|
43 |
+
|
44 |
+
Usage:
|
45 |
+
|
46 |
+
from lynxkite.core.executors import one_by_one
|
47 |
+
one_by_one.register("My Environment")
|
48 |
+
"""
|
49 |
if cache:
|
50 |
CACHES[env] = {}
|
51 |
cache = CACHES[env]
|
52 |
else:
|
53 |
cache = None
|
54 |
+
ops.EXECUTORS[env] = lambda ws: _execute(ws, ops.CATALOGS[env], cache=cache)
|
55 |
|
56 |
|
57 |
+
def _get_stages(ws, catalog: ops.Catalog):
|
58 |
"""Inputs on top/bottom are batch inputs. We decompose the graph into a DAG of components along these edges."""
|
59 |
nodes = {n.id: n for n in ws.nodes}
|
60 |
batch_inputs = {}
|
|
|
89 |
return {"__nonserializable__": id(obj)}
|
90 |
|
91 |
|
92 |
+
def _make_cache_key(obj):
|
93 |
return orjson.dumps(obj, default=_default_serializer)
|
94 |
|
95 |
|
96 |
EXECUTOR_OUTPUT_CACHE = {}
|
97 |
|
98 |
|
99 |
+
async def _await_if_needed(obj):
|
100 |
if inspect.isawaitable(obj):
|
101 |
return await obj
|
102 |
return obj
|
103 |
|
104 |
|
105 |
+
async def _execute(ws: workspace.Workspace, catalog: ops.Catalog, cache=None):
|
106 |
nodes = {n.id: n for n in ws.nodes}
|
107 |
contexts = {n.id: Context(node=n) for n in ws.nodes}
|
108 |
edges = {n.id: [] for n in ws.nodes}
|
|
|
121 |
tasks[node.id] = [NO_INPUT]
|
122 |
batch_inputs = {}
|
123 |
# Run the rest until we run out of tasks.
|
124 |
+
stages = _get_stages(ws, catalog)
|
125 |
for stage in stages:
|
126 |
next_stage = {}
|
127 |
while tasks:
|
|
|
132 |
node = nodes[n]
|
133 |
op = catalog[node.data.title]
|
134 |
params = {**node.data.params}
|
135 |
+
if _has_ctx(op):
|
136 |
params["_ctx"] = contexts[node.id]
|
137 |
results = []
|
138 |
node.publish_started()
|
|
|
156 |
node.publish_error(f"Missing input: {', '.join(missing)}")
|
157 |
break
|
158 |
if cache is not None:
|
159 |
+
key = _make_cache_key((inputs, params))
|
160 |
if key not in cache:
|
161 |
result: ops.Result = op(*inputs, **params)
|
162 |
result.output = await await_if_needed(result.output)
|
|
|
172 |
contexts[node.id].last_result = output
|
173 |
# Returned lists and DataFrames are considered multiple tasks.
|
174 |
if isinstance(output, pd.DataFrame):
|
175 |
+
output = _df_to_list(output)
|
176 |
elif not isinstance(output, list):
|
177 |
output = [output]
|
178 |
results.extend(output)
|
lynxkite-core/src/lynxkite/core/executors/simple.py
CHANGED
@@ -9,7 +9,13 @@ import graphlib
|
|
9 |
|
10 |
|
11 |
def register(env: str):
|
12 |
-
"""Registers the
|
|
|
|
|
|
|
|
|
|
|
|
|
13 |
ops.EXECUTORS[env] = lambda ws: execute(ws, ops.CATALOGS[env])
|
14 |
|
15 |
|
|
|
9 |
|
10 |
|
11 |
def register(env: str):
|
12 |
+
"""Registers the simple executor.
|
13 |
+
|
14 |
+
Usage:
|
15 |
+
|
16 |
+
from lynxkite.core.executors import simple
|
17 |
+
simple.register("My Environment")
|
18 |
+
"""
|
19 |
ops.EXECUTORS[env] = lambda ws: execute(ws, ops.CATALOGS[env])
|
20 |
|
21 |
|
lynxkite-core/src/lynxkite/core/ops.py
CHANGED
@@ -41,6 +41,7 @@ def type_to_json(t):
|
|
41 |
|
42 |
Type = Annotated[typing.Any, pydantic.PlainSerializer(type_to_json, return_type=dict)]
|
43 |
LongStr = Annotated[str, {"format": "textarea"}]
|
|
|
44 |
PathStr = Annotated[str, {"format": "path"}]
|
45 |
CollapsedStr = Annotated[str, {"format": "collapsed"}]
|
46 |
NodeAttribute = Annotated[str, {"format": "node attribute"}]
|
@@ -314,24 +315,41 @@ def matplotlib_to_image(func):
|
|
314 |
return wrapper
|
315 |
|
316 |
|
317 |
-
def input_position(**
|
318 |
-
"""
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
319 |
|
320 |
def decorator(func):
|
321 |
op = func.__op__
|
322 |
-
for k, v in
|
323 |
op.get_input(k).position = Position(v)
|
324 |
return func
|
325 |
|
326 |
return decorator
|
327 |
|
328 |
|
329 |
-
def output_position(**
|
330 |
-
"""Decorator for specifying unusual positions for the outputs.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
331 |
|
332 |
def decorator(func):
|
333 |
op = func.__op__
|
334 |
-
for k, v in
|
335 |
op.get_output(k).position = Position(v)
|
336 |
return func
|
337 |
|
|
|
41 |
|
42 |
Type = Annotated[typing.Any, pydantic.PlainSerializer(type_to_json, return_type=dict)]
|
43 |
LongStr = Annotated[str, {"format": "textarea"}]
|
44 |
+
"""LongStr is a string type for parameters that will be displayed as a multiline text area in the UI."""
|
45 |
PathStr = Annotated[str, {"format": "path"}]
|
46 |
CollapsedStr = Annotated[str, {"format": "collapsed"}]
|
47 |
NodeAttribute = Annotated[str, {"format": "node attribute"}]
|
|
|
315 |
return wrapper
|
316 |
|
317 |
|
318 |
+
def input_position(**positions):
|
319 |
+
"""
|
320 |
+
Decorator for specifying unusual positions for the inputs.
|
321 |
+
|
322 |
+
Example usage:
|
323 |
+
|
324 |
+
@input_position(a="bottom", b="bottom")
|
325 |
+
@op("test", "maybe add")
|
326 |
+
def maybe_add(a: list[int], b: list[int] | None = None):
|
327 |
+
return [a + b for a, b in zip(a, b)] if b else a
|
328 |
+
"""
|
329 |
|
330 |
def decorator(func):
|
331 |
op = func.__op__
|
332 |
+
for k, v in positions.items():
|
333 |
op.get_input(k).position = Position(v)
|
334 |
return func
|
335 |
|
336 |
return decorator
|
337 |
|
338 |
|
339 |
+
def output_position(**positions):
|
340 |
+
"""Decorator for specifying unusual positions for the outputs.
|
341 |
+
|
342 |
+
Example usage:
|
343 |
+
|
344 |
+
@output_position(output="top")
|
345 |
+
@op("test", "maybe add")
|
346 |
+
def maybe_add(a: list[int], b: list[int] | None = None):
|
347 |
+
return [a + b for a, b in zip(a, b)] if b else a
|
348 |
+
"""
|
349 |
|
350 |
def decorator(func):
|
351 |
op = func.__op__
|
352 |
+
for k, v in positions.items():
|
353 |
op.get_output(k).position = Position(v)
|
354 |
return func
|
355 |
|
lynxkite-graph-analytics/src/lynxkite_graph_analytics/core.py
CHANGED
@@ -17,16 +17,28 @@ ENV = "LynxKite Graph Analytics"
|
|
17 |
|
18 |
@dataclasses.dataclass
|
19 |
class RelationDefinition:
|
20 |
-
"""
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
21 |
|
22 |
-
df: str
|
23 |
-
source_column: str
|
24 |
-
target_column: str
|
25 |
-
source_table: str
|
26 |
-
target_table: str
|
27 |
-
source_key: str
|
28 |
-
target_key: str
|
29 |
-
name: str | None = None
|
30 |
|
31 |
|
32 |
@dataclasses.dataclass
|
@@ -34,7 +46,16 @@ class Bundle:
|
|
34 |
"""A collection of DataFrames and other data.
|
35 |
|
36 |
Can efficiently represent a knowledge graph (homogeneous or heterogeneous) or tabular data.
|
37 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
38 |
"""
|
39 |
|
40 |
dfs: dict[str, pd.DataFrame] = dataclasses.field(default_factory=dict)
|
@@ -91,7 +112,10 @@ class Bundle:
|
|
91 |
return graph
|
92 |
|
93 |
def copy(self):
|
94 |
-
"""
|
|
|
|
|
|
|
95 |
return Bundle(
|
96 |
dfs=dict(self.dfs),
|
97 |
relations=list(self.relations),
|
|
|
17 |
|
18 |
@dataclasses.dataclass
|
19 |
class RelationDefinition:
|
20 |
+
"""
|
21 |
+
Defines a set of edges.
|
22 |
+
|
23 |
+
Attributes:
|
24 |
+
df: The name of the DataFrame that contains the edges.
|
25 |
+
source_column: The column in the edge DataFrame that contains the source node ID.
|
26 |
+
target_column: The column in the edge DataFrame that contains the target node ID.
|
27 |
+
source_table: The name of the DataFrame that contains the source nodes.
|
28 |
+
target_table: The name of the DataFrame that contains the target nodes.
|
29 |
+
source_key: The column in the source table that contains the node ID.
|
30 |
+
target_key: The column in the target table that contains the node ID.
|
31 |
+
name: Descriptive name for the relation.
|
32 |
+
"""
|
33 |
|
34 |
+
df: str
|
35 |
+
source_column: str
|
36 |
+
target_column: str
|
37 |
+
source_table: str
|
38 |
+
target_table: str
|
39 |
+
source_key: str
|
40 |
+
target_key: str
|
41 |
+
name: str | None = None
|
42 |
|
43 |
|
44 |
@dataclasses.dataclass
|
|
|
46 |
"""A collection of DataFrames and other data.
|
47 |
|
48 |
Can efficiently represent a knowledge graph (homogeneous or heterogeneous) or tabular data.
|
49 |
+
|
50 |
+
By convention, if it contains a single DataFrame, it is called `df`.
|
51 |
+
If it contains a homogeneous graph, it is represented as two DataFrames called `nodes` and
|
52 |
+
`edges`.
|
53 |
+
|
54 |
+
Attributes:
|
55 |
+
dfs: Named DataFrames.
|
56 |
+
relations: Metadata that describes the roles of each DataFrame.
|
57 |
+
Can be empty, if the bundle is just one or more DataFrames.
|
58 |
+
other: Other data, such as a trained model.
|
59 |
"""
|
60 |
|
61 |
dfs: dict[str, pd.DataFrame] = dataclasses.field(default_factory=dict)
|
|
|
112 |
return graph
|
113 |
|
114 |
def copy(self):
|
115 |
+
"""
|
116 |
+
Returns a shallow copy of the bundle. The Bundle and its containers are new, but
|
117 |
+
the DataFrames and RelationDefinitions are shared. (The contents of `other` are also shared.)
|
118 |
+
"""
|
119 |
return Bundle(
|
120 |
dfs=dict(self.dfs),
|
121 |
relations=list(self.relations),
|
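The shallow-copy behavior described in the new `copy()` docstring can be illustrated with a small sketch. The `Bundle` below is a reduced stand-in, not the real class: plain lists take the place of DataFrames so that it runs without pandas, and copying `other` with `dict(...)` is an assumption, since that line is outside the hunk shown here.

```python
import dataclasses


@dataclasses.dataclass
class Bundle:
    """Reduced stand-in with the three attributes named in the docstring."""

    dfs: dict = dataclasses.field(default_factory=dict)
    relations: list = dataclasses.field(default_factory=list)
    other: dict = dataclasses.field(default_factory=dict)

    def copy(self):
        # New containers, shared contents.
        return Bundle(
            dfs=dict(self.dfs),
            relations=list(self.relations),
            other=dict(self.other),  # assumption: not visible in the diff
        )


table = [{"id": 1}]  # a plain list stands in for a DataFrame
original = Bundle(dfs={"df": table})
clone = original.copy()

# Adding a table to the copy does not affect the original: the dict is new.
clone.dfs["extra"] = []
extra_leaked = "extra" in original.dfs

# Mutating a shared table in place is visible through both bundles.
clone.dfs["df"].append({"id": 2})
shared_rows = len(original.dfs["df"])
```

Under these assumptions, an operation that wants to change a table without affecting upstream results would replace the entry (for example `clone.dfs["df"] = new_table`) rather than modify the shared object in place.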
lynxkite-graph-analytics/src/lynxkite_graph_analytics/lynxkite_ops.py
CHANGED
@@ -312,9 +312,3 @@ def create_graph(bundle: core.Bundle, *, relations: str = None) -> core.Bundle:
|
|
312 |
if not (relations is None or relations.strip() == ""):
|
313 |
bundle.relations = [core.RelationDefinition(**r) for r in json.loads(relations).values()]
|
314 |
return ops.Result(output=bundle, display=bundle.to_dict(limit=100))
|
315 |
-
|
316 |
-
|
317 |
-
@op("Biomedical foundation graph (PLACEHOLDER)")
|
318 |
-
def biomedical_foundation_graph(*, filter_nodes: str):
|
319 |
-
"""Loads the gigantic Lynx-maintained knowledge graph. Includes drugs, diseases, genes, proteins, etc."""
|
320 |
-
return None
|
|
|
312 |
if not (relations is None or relations.strip() == ""):
|
313 |
bundle.relations = [core.RelationDefinition(**r) for r in json.loads(relations).values()]
|
314 |
return ops.Result(output=bundle, display=bundle.to_dict(limit=100))
|
mkdocs.yml
CHANGED
@@ -1,6 +1,25 @@
|
|
1 |
-
site_name: "LynxKite"
|
2 |
-
repo_url: https://github.com/lynxkite/lynxkite
|
3 |
-
repo_name: lynxkite/lynxkite
|
|
4 |
|
5 |
theme:
|
6 |
name: "material"
|
@@ -13,13 +32,35 @@ theme:
|
|
13 |
- navigation.path
|
14 |
- navigation.instant
|
15 |
- navigation.instant.prefetch
|
|
16 |
|
17 |
extra_css:
|
18 |
- stylesheets/extra.css
|
19 |
|
20 |
plugins:
|
21 |
- search
|
|
|
22 |
- mkdocstrings:
|
23 |
handlers:
|
24 |
python:
|
25 |
paths: ["./lynxkite-app/src", "./lynxkite-core/src", "./lynxkite-graph-analytics/src"]
|
|
|
1 |
+
site_name: "LynxKite 2000:MM"
|
2 |
+
repo_url: https://github.com/lynxkite/lynxkite-2000
|
3 |
+
repo_name: lynxkite/lynxkite-2000
|
4 |
+
watch: [mkdocs.yml, README.md, lynxkite-core, lynxkite-graph-analytics, lynxkite-app]
|
5 |
+
|
6 |
+
nav:
|
7 |
+
- Home:
|
8 |
+
- Overview: index.md
|
9 |
+
- License: license.md
|
10 |
+
- Usage:
|
11 |
+
- usage/quickstart.md
|
12 |
+
- usage/plugins.md
|
13 |
+
- API reference:
|
14 |
+
- LynxKite Core:
|
15 |
+
- reference/lynxkite-core/ops.md
|
16 |
+
- reference/lynxkite-core/workspace.md
|
17 |
+
- Executors:
|
18 |
+
- reference/lynxkite-core/executors/simple.md
|
19 |
+
- reference/lynxkite-core/executors/one_by_one.md
|
20 |
+
- LynxKite Graph Analytics:
|
21 |
+
- reference/lynxkite-graph-analytics/core.md
|
22 |
+
- reference/lynxkite-graph-analytics/operations.md
|
23 |
|
24 |
theme:
|
25 |
name: "material"
|
|
|
32 |
- navigation.path
|
33 |
- navigation.instant
|
34 |
- navigation.instant.prefetch
|
35 |
+
- navigation.footer
|
36 |
+
- content.code.annotate
|
37 |
+
- content.code.copy
|
38 |
|
39 |
extra_css:
|
40 |
- stylesheets/extra.css
|
41 |
|
42 |
plugins:
|
43 |
- search
|
44 |
+
- autorefs
|
45 |
- mkdocstrings:
|
46 |
handlers:
|
47 |
python:
|
48 |
paths: ["./lynxkite-app/src", "./lynxkite-core/src", "./lynxkite-graph-analytics/src"]
|
49 |
+
options:
|
50 |
+
show_source: false
|
51 |
+
show_symbol_type_heading: true
|
52 |
+
show_symbol_type_toc: true
|
53 |
+
docstring_section_style: spacy
|
54 |
+
separate_signature: true
|
55 |
+
show_signature_annotations: true
|
56 |
+
signature_crossrefs: true
|
57 |
+
markdown_extensions:
|
58 |
+
- pymdownx.highlight:
|
59 |
+
anchor_linenums: true
|
60 |
+
line_spans: __span
|
61 |
+
pygments_lang_class: true
|
62 |
+
- pymdownx.inlinehilite
|
63 |
+
- pymdownx.snippets
|
64 |
+
- pymdownx.superfences
|
65 |
+
- toc:
|
66 |
+
permalink: "¤"
|