---
title: Neochar
emoji: 🖼
colorFrom: purple
colorTo: red
sdk: gradio
sdk_version: 5.25.0
app_file: app.py
pinned: false
license: openrail
short_description: Unwritten Chinese Characters in Style
---

# What is this?

Generate new characters by combining parts in creative ways, then write them in a controlled style.

- Inspired by
  - Lin Yutang's [Ming-Kwai typewriter](https://en.wikipedia.org/wiki/Chinese_typewriter#MingKwai_design) 
  - Wu Yue's [Glyffuser](https://yue-here.com/posts/glyffuser/)

# Why

- Fun to generate valid but unseen characters (never in a dictionary, nor in Unicode).
- Implements Lin Yutang's ideas with generative AI/ML, without the mechanical marvel :-/ or limitations :-)
- Extends a font to support new charsets, and beyond to non-existent chars.
- Adds variation/diversity/personality to generated images. No boring duplicates from the same char.
- Other [Creative Uses](#creative-uses)

# How to use this app
- Combine components or radicals as follows
- Specify the 'Structure' and 'Components' in [Polish Notation](https://en.wikipedia.org/wiki/Polish_notation) (structure first, then its parts), which suits tree structures
  - ⿰: 'LR' Left-Right
  - ⿱: 'TB' Top-Bottom
  - ⿸: 'TL' Top-Left
  - ⿹: 'TR' Top-Right
  - ⿺: 'BL' Bottom-Left
  - ⿴: 'OI' Outer-Inner
  - ⿻: 'OV' Overlap
  - ⿲: 'LMR' Left-Middle-Right
  - ⿳: 'TMB' Top-Middle-Bottom
  - ⿵: 'BT' Bottom Open Enclosure
  - ⿶: 'CT' Top Open Enclosure
  - ⿷: 'RT' Right Open Enclosure
- Select a 'Style' by clicking the sample images
- Hit the 'Generate' button
- Repeat
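
The Polish-notation idea above can be sketched in a few lines of Python. This is only an illustration of how a prefix description unfolds into a tree, not the parser this app uses; the names `ARITY` and `parse_ids` are hypothetical, and the sketch assumes each component is a single character.

```python
# Number of parts each structure symbol takes:
# two for most of them, three for ⿲ (LMR) and ⿳ (TMB).
ARITY = {c: 2 for c in "⿰⿱⿴⿵⿶⿷⿸⿹⿺⿻"}
ARITY.update({c: 3 for c in "⿲⿳"})


def parse_ids(text):
    """Parse a prefix (Polish notation) description into a nested tuple tree.

    A leaf is a single component character; an inner node is a tuple
    (structure_symbol, part_1, part_2[, part_3]).
    """

    def walk(pos):
        if pos >= len(text):
            raise ValueError("description ended before all parts were given")
        head = text[pos]
        pos += 1
        if head not in ARITY:
            return head, pos  # plain component (or wildcard): a leaf
        parts = []
        for _ in range(ARITY[head]):
            part, pos = walk(pos)  # each part may itself be a structure
            parts.append(part)
        return (head, *parts), pos

    tree, end = walk(0)
    if end != len(text):
        raise ValueError("extra characters after a complete description")
    return tree


if __name__ == "__main__":
    # 釒 on the left, with 日 over 月 on the right.
    print(parse_ids("⿰釒⿱日月"))
```

Because the structure symbol comes first and its arity is fixed, no brackets are needed: the nesting is recovered just by reading left to right.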

# Usage Tips
- Simple structures work best (⿰ ⿱ ⿴ etc.)
- "Known radicals at seen positions" work best (釒 on the left works better than on the right, though the latter may also surprise you in a good way)
- Noto font family (sans and serif) gives the best results, as there are many training examples
- Cursive and handwritten styles usually give good results, as they are more tolerant
- Fonts supporting fewer chars are challenging
  - Current model was trained with 300k samples for only 20 epochs
  - Training will continue if this app gets attention or likes

- For dictionary chars, [decompose](https://github.com/cburgmer/cjklib/blob/master/cjklib/data/characterdecomposition.csv) first.
- For a part that is hard to describe, or that you don't care about, use a wildcard '?' (full-width question mark, or does it matter?)

- What to do when the results are not as expected
  - Pick a different 'Style', one the model may have been trained on more thoroughly
  - Try again with a different random seed. This will change the overall structure in an unpredictable way
  - Try again with a different 'step' number. This will change the local details in a continuous way

# Creative Uses
## Turning a bug into a feature
When you see a funny result you didn't expect (say, 5 or 3 dots where there should be 4), don't throw it away immediately.
- Save the results to confuse/train OCR
- 3vade 3vil c3nsorship
- Share in discussion. The input text/seed/step will reliably reproduce the result.

# Future Features
- Typewriter keyboard for hard-to-input radicals, filtered by pinyin prefix
- Direct generation from a single char, auto decomposition