darabos committed
Commit c577568 · 1 Parent(s): 53103e2

Improve docs, especially for plugin development.

README.md CHANGED
@@ -1,17 +1,9 @@
- ---
- title: LynxKite 2000:MM
- emoji: 🪁
- colorFrom: purple
- colorTo: gray
- sdk: docker
- app_port: 7860
- ---
-
  # LynxKite 2000:MM
 
  LynxKite 2000:MM is a GPU-accelerated data science platform and a general tool for collaboratively edited workflows.
 
  Features include:
+
  - A web UI for building and executing data science workflows.
  - An extensive toolbox of graph analytics operations powered by NVIDIA RAPIDS (CUDA).
  - An integrated collaborative code editor makes it easy to add new operations.
@@ -20,7 +12,7 @@ Features include:
 
  This is the next evolution of the classical [LynxKite](https://github.com/lynxkite/lynxkite).
  The two tools offer similar functionality, but are not compatible.
- Where classical LynxKite ran on Hadoop clusters, this version runs on GPU clusters.
+ This version runs on GPU clusters instead of Hadoop clusters.
  It targets CUDA instead of Apache Spark. It is much more extensible.
 
  ## Structure
docs/index.md CHANGED
@@ -1,3 +1,5 @@
- # Getting started
-
- Good luck getting started!
+ ---
+ title: Overview
+ ---
+
+ --8<-- "README.md"
docs/license.md ADDED
@@ -0,0 +1,11 @@
+ # License
+
+ LynxKite 2000:MM is available under the GNU AGPLv3 license below.
+
+ Additionally, [Lynx Analytics](https://www.lynxanalytics.com/) offers a commercial license of LynxKite 2000:MM
+ that includes additional features and support. Get in touch if you are interested in life sciences tools
+ and cluster deployment!
+
+ ```
+ --8<-- "LICENSE"
+ ```
docs/lynxkite-core.md DELETED
@@ -1,6 +0,0 @@
- # LynxKite Core
-
- LynxKite core is for writing LynxKite plugins.
- It contains core types and utilities that can be used by all LynxKite plugins.
-
- ::: lynxkite.core.ops
docs/lynxkite-graph-analytics.md DELETED
@@ -1,6 +0,0 @@
- # LynxKite Graph Analytics
-
- This is the classical LynxKite experience!
- The graph analytics plugin is a collection of graph algorithms that can be run on a LynxKite graph.
-
- ::: lynxkite_graph_analytics.lynxkite_ops
docs/reference/lynxkite-core/executors/one_by_one.md ADDED
@@ -0,0 +1 @@
+ ::: lynxkite.core.executors.one_by_one
docs/reference/lynxkite-core/executors/simple.md ADDED
@@ -0,0 +1 @@
+ ::: lynxkite.core.executors.simple
docs/reference/lynxkite-core/ops.md ADDED
@@ -0,0 +1 @@
+ ::: lynxkite.core.ops
docs/reference/lynxkite-core/workspace.md ADDED
@@ -0,0 +1 @@
+ ::: lynxkite.core.workspace
docs/reference/lynxkite-graph-analytics/core.md ADDED
@@ -0,0 +1 @@
+ ::: lynxkite_graph_analytics.core
docs/reference/lynxkite-graph-analytics/operations.md ADDED
@@ -0,0 +1,3 @@
+ ::: lynxkite_graph_analytics.lynxkite_ops
+ ::: lynxkite_graph_analytics.ml_ops
+ ::: lynxkite_graph_analytics.networkx_ops
docs/usage/plugins.md ADDED
@@ -0,0 +1,278 @@
+ # Plugin development
+
+ Plugins can provide additional operations for an existing LynxKite environment,
+ and they can also provide new environments.
+
+ ## Creating a new plugin
+
+ `.py` files inside the LynxKite data directory are automatically imported each time a
+ workspace is executed. To create a new plugin, just add a new `.py` file to the
+ data directory. LynxKite even includes an integrated editor for this purpose.
+ Click **New code file** in the directory where you want to create the file.
+
+ Plugins in subdirectories of the data directory are imported when executing workspaces
+ within those directories. This allows you to create plugins that are only available
+ in specific workspaces.
+
+ You can also create and distribute plugins as Python packages. In this case the
+ module name must start with `lynxkite_` for it to be automatically imported on startup.
+
+ ### Plugin dependencies
+
+ When creating a plugin as a "code file", you can create a `requirements.txt` file in the same
+ directory. This file is used to install the plugin's dependencies.
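+
+ For example, a minimal code-file plugin might look like this, with `numpy` listed in the
+ `requirements.txt` next to it. (The file and operation names are made up, and the `@op`
+ decorator is explained in the next section.)
+
+ ```python
+ # my_plugin.py, placed in the LynxKite data directory.
+ import numpy as np
+ import pandas as pd
+ from lynxkite.core.ops import op
+
+ @op("LynxKite Graph Analytics", "Add noise")
+ def add_noise(df: pd.DataFrame, *, column_name: str, scale: float = 1.0):
+     # Work on a copy, so the input DataFrame is left unchanged.
+     df = df.copy()
+     df[column_name] = df[column_name] + np.random.normal(0, scale, len(df))
+     return df
+ ```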
+
+ ## Adding new operations
+
+ Any piece of Python code can easily be wrapped into a LynxKite operation.
+ Let's say we have some code that calculates the length of a string column in a Pandas DataFrame:
+
+ ```python
+ df["length"] = df["my_column"].str.len()
+ ```
+
+ We can turn it into a LynxKite operation using the
+ [`@op`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.op) decorator:
+
+ ```python
+ import pandas as pd
+ from lynxkite.core.ops import op
+
+ @op("LynxKite Graph Analytics", "Get column length")
+ def get_length(df: pd.DataFrame, *, column_name: str):
+     """
+     Gets the length of a string column.
+
+     Args:
+         column_name: The name of the column to get the length of.
+     """
+     df = df.copy()
+     df["length"] = df[column_name].str.len()
+     return df
+ ```
+
+ Let's review the changes we made.
+
+ ### The `@op` decorator
+
+ The [`@op`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.op) decorator registers a
+ function as a LynxKite operation. The first argument is the name of the environment,
+ the second argument is the name of the operation.
+
+ When defining multiple operations, you can use
+ [`ops.op_registration`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.op_registration)
+ for convenience:
+ ```python
+ op = ops.op_registration("LynxKite Graph Analytics")
+
+ @op("An operation")
+ def my_op():
+     ...
+ ```
+
+ ### The function signature
+
+ `*` in the list of function arguments marks the start of keyword-only arguments.
+ The arguments before `*` will become _inputs_ of the operation. The arguments after `*` will
+ be its _parameters_.
+
+ ```python
+ #              /--- inputs ---\     /- parameters -\
+ def get_length(df: pd.DataFrame, *, column_name: str):
+ ```
+
+ LynxKite uses the type annotations of the function arguments to provide input validation,
+ conversion, and the right UI on the frontend.
+
+ The types supported for **inputs** are determined by the environment. For graph analytics,
+ the possibilities are:
+
+ - `pandas.DataFrame`
+ - `networkx.Graph`
+ - [`lynxkite_graph_analytics.Bundle`](../reference/lynxkite-graph-analytics/core.md#lynxkite_graph_analytics.core.Bundle)
+
+ The inputs of an operation are automatically converted to the right type, when possible.
+
+ To make an input optional, use an optional type, like `pd.DataFrame | None`.
+
+ The position of the input and output connectors can be controlled using the
+ [`@ops.input_position`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.input_position) and
+ [`@ops.output_position`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.output_position)
+ decorators. By default, inputs are on the left and outputs on the right.
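+
+ For example, here is the `maybe add` example from the
+ [`@ops.input_position`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.input_position)
+ docstring. The optional `b` input is moved to the bottom of the node:
+
+ ```python
+ from lynxkite.core import ops
+
+ @ops.input_position(b="bottom")
+ @op("test", "maybe add")
+ def maybe_add(a: list[int], b: list[int] | None = None):
+     # `b` is optional: the operation also runs with nothing connected to it.
+     return [x + y for x, y in zip(a, b)] if b else a
+ ```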
+
+ All **parameters** are stored in LynxKite workspaces as strings. If a type annotation is provided,
+ LynxKite will convert the string to the right type and provide the right UI.
+
+ - `str`, `int`, `float` are presented as a text box and converted to the given type.
+ - `bool` is presented as a checkbox.
+ - [`lynxkite.core.ops.LongStr`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.LongStr)
+   is presented as a text area.
+ - Enums are presented as a dropdown list.
+ - Pydantic models are presented as their JSON string representations. (Unless you add custom UI
+   for them.) They are converted to the model object when your function is called.
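+
+ As an illustrative sketch (not a built-in operation), several of these parameter types can be
+ combined in one operation:
+
+ ```python
+ import enum
+
+ import pandas as pd
+ from lynxkite.core.ops import op, LongStr
+
+ class Aggregation(enum.Enum):
+     SUM = "sum"
+     MEAN = "mean"
+
+ @op("LynxKite Graph Analytics", "Aggregate column")
+ def aggregate(
+     df: pd.DataFrame,
+     *,
+     column_name: str,  # Text box.
+     method: Aggregation = Aggregation.SUM,  # Dropdown list.
+     dropna: bool = True,  # Checkbox.
+     notes: LongStr = "",  # Multiline text area.
+ ):
+     s = df[column_name].dropna() if dropna else df[column_name]
+     return pd.DataFrame({column_name: [s.agg(method.value)]})
+ ```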
+
+ ### Slow operations
+
+ If the function takes a significant amount of time to run, we must either:
+
+ - Write an asynchronous function.
+ - Pass `slow=True` to the `@op` decorator. LynxKite will run the function in a separate thread.
+
+ `slow=True` also causes the results of the operation to be cached on disk. As long as
+ its inputs don't change, the operation will not be run again. This is useful for both
+ synchronous and asynchronous operations.
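+
+ For example, both operations below stay responsive with large inputs. This is a sketch: the
+ operation names are made up, and `httpx` is just one way to make an async request.
+
+ ```python
+ import httpx
+ import pandas as pd
+ from lynxkite.core.ops import op
+
+ @op("LynxKite Graph Analytics", "Expensive sort", slow=True)
+ def expensive_sort(df: pd.DataFrame, *, column_name: str):
+     # Runs in a separate thread, and the result is cached on disk.
+     return df.sort_values(column_name)
+
+ @op("LynxKite Graph Analytics", "Fetch text")
+ async def fetch_text(*, url: str):
+     # An async operation does not block LynxKite while waiting for the network.
+     async with httpx.AsyncClient() as client:
+         response = await client.get(url)
+     return pd.DataFrame({"text": [response.text]})
+ ```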
+
+ ### Documentation
+
+ The docstring of the function is used as the operation's description. You can use
+ Google-style or Numpy-style docstrings.
+ (See [Griffe's documentation](https://mkdocstrings.github.io/griffe/reference/docstrings/).)
+
+ The docstring can be omitted for simple, self-explanatory operations.
+
+ ### Outputting results
+
+ The return value of the function is the output of the operation. It will be passed to the
+ next operation in the pipeline.
+
+ An operation can have multiple outputs. In this case, the return value must be a dictionary,
+ and the list of outputs must be declared in the `@op` decorator.
+
+ ```python
+ @op("LynxKite Graph Analytics", "Train/test split", outputs=["train", "test"])
+ def test_split(df: pd.DataFrame, *, test_ratio=0.1):
+     test = df.sample(frac=test_ratio)
+     train = df.drop(test.index)
+     return {"train": train.reset_index(drop=True), "test": test.reset_index(drop=True)}
+ ```
+
+ ### Displaying results
+
+ The outputs of the operation can be used by other operations. But we can also generate results
+ that are meant to be viewed by the user. The different options for this are controlled by the `view`
+ argument of the `@op` decorator.
+
+ The `view` argument can be one of the following:
+
+ - `matplotlib`: Just plot something with Matplotlib and it will be displayed in the UI.
+
+   ```python
+   @op("LynxKite Graph Analytics", "Plot column histogram", view="matplotlib")
+   def plot(df: pd.DataFrame, *, column_name: str):
+       df[column_name].value_counts().sort_index().plot.bar()
+   ```
+
+ - `visualization`: Draws a chart using [ECharts](https://echarts.apache.org/examples/en/index.html).
+   You need to return a dictionary with the chart configuration, which ECharts calls `option`.
+
+   ```python
+   @op("View loss", view="visualization")
+   def view_loss(bundle: core.Bundle):
+       loss = bundle.dfs["training"].training_loss.tolist()
+       v = {
+           "title": {"text": "Training loss"},
+           "xAxis": {"type": "category"},
+           "yAxis": {"type": "value"},
+           "series": [{"data": loss, "type": "line"}],
+       }
+       return v
+   ```
+
+ - `image`: Return an image as a
+   [data URL](https://developer.mozilla.org/en-US/docs/Web/URI/Reference/Schemes/data)
+   and it will be displayed. (See the sketch after this list.)
+ - `molecule`: Return a molecule as a PDB or SDF string, or an `rdkit.Chem.Mol` object.
+   It will be displayed using [3Dmol.js](https://3Dmol.org/).
+ - `table_view`: Return
+   [`Bundle.to_dict()`](../reference/lynxkite-graph-analytics/core.md#lynxkite_graph_analytics.core.Bundle.to_dict).
+
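+ As a sketch of the `image` option, a Matplotlib figure can be converted to a data URL with the
+ standard library. (`Scatter plot` is a hypothetical operation, not part of the built-in toolbox.)
+
+ ```python
+ import base64
+ import io
+
+ import matplotlib.pyplot as plt
+ import pandas as pd
+ from lynxkite.core.ops import op
+
+ @op("LynxKite Graph Analytics", "Scatter plot", view="image")
+ def scatter_plot(df: pd.DataFrame, *, x: str, y: str):
+     fig, ax = plt.subplots()
+     ax.scatter(df[x], df[y])
+     buf = io.BytesIO()
+     fig.savefig(buf, format="png")
+     plt.close(fig)
+     # A data URL embeds the PNG bytes directly in a string.
+     return "data:image/png;base64," + base64.b64encode(buf.getvalue()).decode()
+ ```
+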
+ ## Adding new environments
+
+ A new environment means a completely new set of operations, and (optionally) a new
+ executor. There's nothing to be done for setting up a new environment. Just start
+ registering operations into it.
+
+ ### No executor
+
+ By default, the new environment will have no executor. This can be useful!
+
+ LynxKite workspaces are stored as straightforward JSON files and updated on every modification.
+ You can use LynxKite for configuring workflows and have a separate system
+ read the JSON files.
+
+ Since the code of the operations is not executed in this case, you can create functions that do nothing.
+ Alternatively, you can use the
+ [`register_passive_op`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.register_passive_op)
+ and
+ [`passive_op_registration`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.passive_op_registration)
+ functions to easily whip up a set of operations:
+
+ ```python
+ from lynxkite.core.ops import passive_op_registration, Parameter as P
+
+ op = passive_op_registration("My Environment")
+ op('Scrape documents', params=[P('url', '')])
+ op('Conversation logs')
+ op('Extract graph')
+ op('Compute embeddings', params=[P.options('method', ['LLM', 'graph', 'random']), P('dimensions', 1234)])
+ op('Vector DB', params=[P.options('backend', ['ANN', 'HNSW'])])
+ op('Chat UI', outputs=[])
+ op('Chat backend')
+ ```
+
+ ### Built-in executors
+
+ LynxKite comes with two built-in executors. You can register one of these for your environment
+ and you're good to go.
+
+ ```python
+ from lynxkite.core.executors import simple
+ simple.register("My Environment")
+ ```
+
+ The [`simple` executor](../reference/lynxkite-core/executors/simple.md)
+ runs each operation once, passing the output of the preceding operation
+ as the input to the next one. No tricks. You can use any types as inputs and outputs.
+
+ ```python
+ from lynxkite.core.executors import one_by_one
+ one_by_one.register("My Environment")
+ ```
+
+ The [`one_by_one` executor](../reference/lynxkite-core/executors/one_by_one.md)
+ expects the code of an operation to transform a single element.
+ If an operation returns an iterable, it is split up
+ into its elements, and the next operation is called for each element.
+
+ Sometimes you need the full contents of an input. The `one_by_one` executor
+ lets you choose between the two modes by the orientation of the input connector.
+ If the input connector is horizontal (left or right), it takes single elements.
+ If the input connector is vertical (top or bottom), it takes an iterable of all the incoming data.
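+
+ For example, a batch input might be requested by moving the connector to the top with
+ [`@ops.input_position`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.input_position).
+ This is a sketch under that assumption, with made-up names:
+
+ ```python
+ from lynxkite.core import ops
+ from lynxkite.core.ops import op
+
+ @ops.input_position(items="top")
+ @op("My Environment", "Count elements")
+ def count_elements(items):
+     # With a vertical (top) connector, `items` is an iterable of all incoming elements.
+     return len(list(items))
+ ```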
+
+ A unique advantage of this setup is that workflows can contain loops across
+ horizontal inputs. Just make sure that loops eventually discard all elements, so you don't
+ end up with an infinite loop.
+
+ ### Custom executors
+
+ A custom executor can be registered using
+ [`@ops.register_executor`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.register_executor).
+
+ ```python
+ @ops.register_executor(ENV)
+ async def execute(ws: workspace.Workspace):
+     catalog = ops.CATALOGS[ws.env]
+     ...
+ ```
+
+ The executor must be an asynchronous function that takes a
+ [`workspace.Workspace`](../reference/lynxkite-core/workspace.md#lynxkite.core.workspace.Workspace)
+ as an argument. The return value is ignored and it's up to you how you process the workspace.
+
+ To update the frontend as the executor processes the workspace, call
+ [`WorkspaceNode.publish_started`](../reference/lynxkite-core/workspace.md#lynxkite.core.workspace.WorkspaceNode.publish_started)
+ when starting to execute a node, and
+ [`WorkspaceNode.publish_result`](../reference/lynxkite-core/workspace.md#lynxkite.core.workspace.WorkspaceNode.publish_result)
+ to publish the results. Use
+ [`WorkspaceNode.publish_error`](../reference/lynxkite-core/workspace.md#lynxkite.core.workspace.WorkspaceNode.publish_error)
+ if the node failed.
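+
+ Putting these together, the skeleton of a custom executor might look like the sketch below.
+ It assumes nodes can be visited in dependency order and that `publish_result` accepts the
+ operation's result; check the workspace reference for the exact signatures.
+
+ ```python
+ from lynxkite.core import ops, workspace
+
+ ENV = "My Environment"
+
+ @ops.register_executor(ENV)
+ async def execute(ws: workspace.Workspace):
+     catalog = ops.CATALOGS[ws.env]
+     # A real executor would walk the nodes in dependency order and pass
+     # inputs along the edges. See the one_by_one executor for a full example.
+     for node in ws.nodes:
+         node.publish_started()
+         try:
+             op = catalog[node.data.title]
+             result = op(**node.data.params)
+             node.publish_result(result)
+         except Exception as e:
+             node.publish_error(str(e))
+ ```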
docs/usage/quickstart.md ADDED
@@ -0,0 +1,25 @@
+ # Quickstart
+
+ Install the LynxKite application with `pip`:
+ ```bash
+ pip install lynxkite
+ ```
+
+ To be able to do anything useful, you also need to install one or more LynxKite environments.
+ If you want to work with data science and graph analytics, install the `lynxkite-graph-analytics` package:
+ ```bash
+ pip install lynxkite-graph-analytics
+ ```
+
+ Create a folder for storing your LynxKite projects:
+ ```bash
+ mkdir ~/lynxkite_projects
+ ```
+
+ You're ready to run LynxKite!
+ ```bash
+ cd ~/lynxkite_projects
+ lynxkite
+ ```
+
+ Open [http://localhost:8000/](http://localhost:8000/) in your browser.
lynxkite-core/src/lynxkite/core/executors/one_by_one.py CHANGED
@@ -1,4 +1,6 @@
- """A LynxKite executor that assumes most operations operate on their input one by one."""
+ """
+ A LynxKite executor that assumes most operations operate on their input one by one.
+ """
 
  from .. import ops
  from .. import workspace
@@ -11,24 +13,24 @@ import typing
 
 
  class Context(ops.BaseConfig):
-     """Passed to operation functions as "_ctx" if they have such a parameter."""
+     """Passed to operation functions as "_ctx" if they have such a parameter.
+ 
+     Attributes:
+         node: The workspace node that this context is associated with.
+         last_result: The last result produced by the operation.
+             This can be used to incrementally build a result, when the operation
+             is executed for multiple items.
+     """
 
      node: workspace.WorkspaceNode
      last_result: typing.Any = None
 
 
- class Output(ops.BaseConfig):
-     """Return this to send values to specific outputs of a node."""
- 
-     output_handle: str
-     value: dict
- 
- 
- def df_to_list(df):
+ def _df_to_list(df):
      return df.to_dict(orient="records")
 
 
- def has_ctx(op):
+ def _has_ctx(op):
      sig = inspect.signature(op.func)
      return "_ctx" in sig.parameters
 
@@ -37,16 +39,22 @@ CACHES = {}
 
 
  def register(env: str, cache: bool = True):
-     """Registers the one-by-one executor."""
+     """Registers the one-by-one executor.
+ 
+     Usage:
+ 
+         from lynxkite.core.executors import one_by_one
+         one_by_one.register("My Environment")
+     """
      if cache:
          CACHES[env] = {}
          cache = CACHES[env]
      else:
          cache = None
-     ops.EXECUTORS[env] = lambda ws: execute(ws, ops.CATALOGS[env], cache=cache)
+     ops.EXECUTORS[env] = lambda ws: _execute(ws, ops.CATALOGS[env], cache=cache)
 
 
- def get_stages(ws, catalog: ops.Catalog):
+ def _get_stages(ws, catalog: ops.Catalog):
      """Inputs on top/bottom are batch inputs. We decompose the graph into a DAG of components along these edges."""
      nodes = {n.id: n for n in ws.nodes}
      batch_inputs = {}
@@ -81,20 +89,20 @@ def _default_serializer(obj):
      return {"__nonserializable__": id(obj)}
 
 
- def make_cache_key(obj):
+ def _make_cache_key(obj):
      return orjson.dumps(obj, default=_default_serializer)
 
 
  EXECUTOR_OUTPUT_CACHE = {}
 
 
- async def await_if_needed(obj):
+ async def _await_if_needed(obj):
      if inspect.isawaitable(obj):
          return await obj
      return obj
 
 
- async def execute(ws: workspace.Workspace, catalog: ops.Catalog, cache=None):
+ async def _execute(ws: workspace.Workspace, catalog: ops.Catalog, cache=None):
      nodes = {n.id: n for n in ws.nodes}
      contexts = {n.id: Context(node=n) for n in ws.nodes}
      edges = {n.id: [] for n in ws.nodes}
@@ -113,7 +121,7 @@ async def execute(ws: workspace.Workspace, catalog: ops.Catalog, cache=None):
          tasks[node.id] = [NO_INPUT]
      batch_inputs = {}
      # Run the rest until we run out of tasks.
-     stages = get_stages(ws, catalog)
+     stages = _get_stages(ws, catalog)
      for stage in stages:
          next_stage = {}
          while tasks:
@@ -124,7 +132,7 @@ async def execute(ws: workspace.Workspace, catalog: ops.Catalog, cache=None):
              node = nodes[n]
              op = catalog[node.data.title]
              params = {**node.data.params}
-             if has_ctx(op):
+             if _has_ctx(op):
                  params["_ctx"] = contexts[node.id]
              results = []
              node.publish_started()
@@ -148,7 +156,7 @@ async def execute(ws: workspace.Workspace, catalog: ops.Catalog, cache=None):
                  node.publish_error(f"Missing input: {', '.join(missing)}")
                  break
              if cache is not None:
-                 key = make_cache_key((inputs, params))
+                 key = _make_cache_key((inputs, params))
                  if key not in cache:
                      result: ops.Result = op(*inputs, **params)
-                     result.output = await await_if_needed(result.output)
+                     result.output = await _await_if_needed(result.output)
@@ -164,7 +172,7 @@ async def execute(ws: workspace.Workspace, catalog: ops.Catalog, cache=None):
          contexts[node.id].last_result = output
          # Returned lists and DataFrames are considered multiple tasks.
          if isinstance(output, pd.DataFrame):
-             output = df_to_list(output)
+             output = _df_to_list(output)
          elif not isinstance(output, list):
              output = [output]
          results.extend(output)
lynxkite-core/src/lynxkite/core/executors/simple.py CHANGED
@@ -9,7 +9,13 @@ import graphlib
 
 
  def register(env: str):
-     """Registers the one-by-one executor."""
+     """Registers the simple executor.
+ 
+     Usage:
+ 
+         from lynxkite.core.executors import simple
+         simple.register("My Environment")
+     """
      ops.EXECUTORS[env] = lambda ws: execute(ws, ops.CATALOGS[env])
 
 
lynxkite-core/src/lynxkite/core/ops.py CHANGED
@@ -41,6 +41,7 @@ def type_to_json(t):
 
  Type = Annotated[typing.Any, pydantic.PlainSerializer(type_to_json, return_type=dict)]
  LongStr = Annotated[str, {"format": "textarea"}]
+ """LongStr is a string type for parameters that will be displayed as a multiline text area in the UI."""
  PathStr = Annotated[str, {"format": "path"}]
  CollapsedStr = Annotated[str, {"format": "collapsed"}]
  NodeAttribute = Annotated[str, {"format": "node attribute"}]
@@ -314,24 +315,41 @@ def matplotlib_to_image(func):
      return wrapper
 
 
- def input_position(**kwargs):
-     """Decorator for specifying unusual positions for the inputs."""
+ def input_position(**positions):
+     """
+     Decorator for specifying unusual positions for the inputs.
+ 
+     Example usage:
+ 
+         @input_position(a="bottom", b="bottom")
+         @op("test", "maybe add")
+         def maybe_add(a: list[int], b: list[int] | None = None):
+             return [a + b for a, b in zip(a, b)] if b else a
+     """
 
      def decorator(func):
          op = func.__op__
-         for k, v in kwargs.items():
+         for k, v in positions.items():
              op.get_input(k).position = Position(v)
          return func
 
      return decorator
 
 
- def output_position(**kwargs):
-     """Decorator for specifying unusual positions for the outputs."""
+ def output_position(**positions):
+     """Decorator for specifying unusual positions for the outputs.
+ 
+     Example usage:
+ 
+         @output_position(output="top")
+         @op("test", "maybe add")
+         def maybe_add(a: list[int], b: list[int] | None = None):
+             return [a + b for a, b in zip(a, b)] if b else a
+     """
 
      def decorator(func):
          op = func.__op__
-         for k, v in kwargs.items():
+         for k, v in positions.items():
              op.get_output(k).position = Position(v)
          return func
 
lynxkite-graph-analytics/src/lynxkite_graph_analytics/core.py CHANGED
@@ -17,16 +17,28 @@ ENV = "LynxKite Graph Analytics"
 
  @dataclasses.dataclass
  class RelationDefinition:
-     """Defines a set of edges."""
+     """
+     Defines a set of edges.
+ 
+     Attributes:
+         df: The name of the DataFrame that contains the edges.
+         source_column: The column in the edge DataFrame that contains the source node ID.
+         target_column: The column in the edge DataFrame that contains the target node ID.
+         source_table: The name of the DataFrame that contains the source nodes.
+         target_table: The name of the DataFrame that contains the target nodes.
+         source_key: The column in the source table that contains the node ID.
+         target_key: The column in the target table that contains the node ID.
+         name: Descriptive name for the relation.
+     """
 
-     df: str  # The DataFrame that contains the edges.
-     source_column: str  # The column in the edge DataFrame that contains the source node ID.
-     target_column: str  # The column in the edge DataFrame that contains the target node ID.
-     source_table: str  # The DataFrame that contains the source nodes.
-     target_table: str  # The DataFrame that contains the target nodes.
-     source_key: str  # The column in the source table that contains the node ID.
-     target_key: str  # The column in the target table that contains the node ID.
-     name: str | None = None  # Descriptive name for the relation.
+     df: str
+     source_column: str
+     target_column: str
+     source_table: str
+     target_table: str
+     source_key: str
+     target_key: str
+     name: str | None = None
 
 
  @dataclasses.dataclass
@@ -34,7 +46,16 @@ class Bundle:
      """A collection of DataFrames and other data.
 
      Can efficiently represent a knowledge graph (homogeneous or heterogeneous) or tabular data.
-     It can also carry other data, such as a trained model.
+ 
+     By convention, if it contains a single DataFrame, it is called `df`.
+     If it contains a homogeneous graph, it is represented as two DataFrames called `nodes` and
+     `edges`.
+ 
+     Attributes:
+         dfs: Named DataFrames.
+         relations: Metadata that describes the roles of each DataFrame.
+             Can be empty, if the bundle is just one or more DataFrames.
+         other: Other data, such as a trained model.
      """
 
      dfs: dict[str, pd.DataFrame] = dataclasses.field(default_factory=dict)
@@ -91,7 +112,10 @@ class Bundle:
          return graph
 
      def copy(self):
-         """Returns a medium depth copy of the bundle. The Bundle is completely new, but the DataFrames and RelationDefinitions are shared."""
+         """
+         Returns a shallow copy of the bundle. The Bundle and its containers are new, but
+         the DataFrames and RelationDefinitions are shared. (The contents of `other` are also shared.)
+         """
          return Bundle(
              dfs=dict(self.dfs),
              relations=list(self.relations),
lynxkite-graph-analytics/src/lynxkite_graph_analytics/lynxkite_ops.py CHANGED
@@ -312,9 +312,3 @@ def create_graph(bundle: core.Bundle, *, relations: str = None) -> core.Bundle:
      if not (relations is None or relations.strip() == ""):
          bundle.relations = [core.RelationDefinition(**r) for r in json.loads(relations).values()]
      return ops.Result(output=bundle, display=bundle.to_dict(limit=100))
- 
- 
- @op("Biomedical foundation graph (PLACEHOLDER)")
- def biomedical_foundation_graph(*, filter_nodes: str):
-     """Loads the gigantic Lynx-maintained knowledge graph. Includes drugs, diseases, genes, proteins, etc."""
-     return None
mkdocs.yml CHANGED
@@ -1,6 +1,25 @@
- site_name: "LynxKite"
- repo_url: https://github.com/lynxkite/lynxkite
- repo_name: lynxkite/lynxkite
+ site_name: "LynxKite 2000:MM"
+ repo_url: https://github.com/lynxkite/lynxkite-2000
+ repo_name: lynxkite/lynxkite-2000
+ watch: [mkdocs.yml, README.md, lynxkite-core, lynxkite-graph-analytics, lynxkite-app]
+ 
+ nav:
+   - Home:
+       - Overview: index.md
+       - License: license.md
+   - Usage:
+       - usage/quickstart.md
+       - usage/plugins.md
+   - API reference:
+       - LynxKite Core:
+           - reference/lynxkite-core/ops.md
+           - reference/lynxkite-core/workspace.md
+           - Executors:
+               - reference/lynxkite-core/executors/simple.md
+               - reference/lynxkite-core/executors/one_by_one.md
+       - LynxKite Graph Analytics:
+           - reference/lynxkite-graph-analytics/core.md
+           - reference/lynxkite-graph-analytics/operations.md
 
  theme:
    name: "material"
@@ -13,13 +32,35 @@ theme:
    - navigation.path
    - navigation.instant
    - navigation.instant.prefetch
+   - navigation.footer
+   - content.code.annotate
+   - content.code.copy
 
  extra_css:
    - stylesheets/extra.css
 
  plugins:
    - search
+   - autorefs
    - mkdocstrings:
        handlers:
          python:
            paths: ["./lynxkite-app/src", "./lynxkite-core/src", "./lynxkite-graph-analytics/src"]
+           options:
+             show_source: false
+             show_symbol_type_heading: true
+             show_symbol_type_toc: true
+             docstring_section_style: spacy
+             separate_signature: true
+             show_signature_annotations: true
+             signature_crossrefs: true
+ markdown_extensions:
+   - pymdownx.highlight:
+       anchor_linenums: true
+       line_spans: __span
+       pygments_lang_class: true
+   - pymdownx.inlinehilite
+   - pymdownx.snippets
+   - pymdownx.superfences
+   - toc:
+       permalink: "¤"