Commit c6d1709 by GenAIDevTOProd (verified), 1 parent: 8358600

Update README.md

Files changed (1): `README.md` +41 -0
pinned: false
---

# 🕶️ anonyspark

`anonyspark` is a lightweight Python package for schema-driven **data masking and anonymization** in **PySpark DataFrames**. It is designed for ML engineers, data analysts, and compliance teams who handle sensitive data in big-data environments. It helps enforce **data privacy** and **PII redaction** in support of **regulatory compliance** (e.g., HIPAA, GDPR).

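The sketch below illustrates what "schema-driven" masking means: each column gets a masking strategy based on its name or, failing that, its type. It is not `anonyspark`'s API. The function `plan_masking`, the rule tables, and the strategy labels are hypothetical. The schema is modeled as a list of `(column_name, type_string)` pairs, the same shape that PySpark's `DataFrame.dtypes` returns.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical rule tables: name patterns are checked first, then type-based defaults.
NAME_RULES: List[Tuple[str, str]] = [
    (r"(^|_)(email|e_mail)($|_)", "regex"),
    (r"(^|_)(ssn|phone)($|_)", "hash"),
    (r"(^|_)(dob|birth_date)($|_)", "null"),
]
TYPE_RULES: Dict[str, str] = {"date": "null", "timestamp": "null"}


def plan_masking(schema: List[Tuple[str, str]]) -> Dict[str, Optional[str]]:
    """Map each column to a masking strategy, or None to leave it untouched."""
    plan: Dict[str, Optional[str]] = {}
    for name, dtype in schema:
        strategy = None
        # A matching column name wins over the column's type.
        for pattern, rule in NAME_RULES:
            if re.search(pattern, name.lower()):
                strategy = rule
                break
        if strategy is None:
            strategy = TYPE_RULES.get(dtype)
        plan[name] = strategy
    return plan


# Same shape as PySpark's DataFrame.dtypes output.
schema = [("user_id", "bigint"), ("email", "string"), ("phone", "string"),
          ("signup_ts", "timestamp"), ("country", "string")]
print(plan_masking(schema))
# {'user_id': None, 'email': 'regex', 'phone': 'hash', 'signup_ts': 'null', 'country': None}
```

Because the plan is derived from the schema rather than hard-coded per table, a new table with an `email` or `home_phone` column is covered without extra configuration.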
---

## Motivation

In enterprise data pipelines, personally identifiable information (PII) and other sensitive fields are often left exposed in logs, training data, or staging zones. `anonyspark` addresses this with **deterministic, schema-aware masking** of such fields **directly in Spark**, so the data never has to leave the distributed environment to be masked.

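Deterministic masking means the same input always yields the same masked token. As a result, joins, group-bys, and distinct counts on a masked key still line up across tables and pipeline runs. One common way to achieve this is a keyed hash (HMAC-SHA256). The plain-Python sketch below uses that approach, which is not necessarily how `anonyspark` does it. Without the secret key, an attacker cannot simply hash a list of candidate emails and compare the results. The helper name `pseudonymize` and the hard-coded key are illustrative assumptions. For unkeyed deterministic hashing, PySpark also ships `pyspark.sql.functions.sha2`.

```python
import hashlib
import hmac

# Hypothetical secret for illustration only; load a real key from a secrets manager.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(value, key=SECRET_KEY, length=16):
    """Deterministically map a value to a hex token using HMAC-SHA256.

    None stays None so missing data is not turned into a fake identifier.
    """
    if value is None:
        return None
    digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


a = pseudonymize("alice@example.com")
b = pseudonymize("alice@example.com")
c = pseudonymize("bob@example.com")
print(a == b, a == c)  # True False
```

Keeping the key fixed preserves joins across datasets. Rotating it changes every token, which is one way to break linkability between separate data releases.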
20
+ ---
21
+
22
+ ## Key Features
23
+
24
+ - **Schema-driven masking** based on column types or names
25
+ - Supports **regex**, **nulling**, **hashing**, or **custom UDF-based** masking
26
+ - Designed for **PySpark DataFrames**, not pandas
27
+ - Lightweight, dependency-free, and easy to integrate
28
+ - CLI-ready for pipeline integration (coming soon)
29
+
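The four masking styles listed above can be modeled as small per-value functions. The sketch below is plain Python so it runs without a Spark cluster. The function names and the `STRATEGIES` registry are hypothetical and are not `anonyspark`'s API. In PySpark, the first three styles roughly correspond to the built-ins `regexp_replace`, `lit(None).cast(...)`, and `sha2` in `pyspark.sql.functions`. A custom rule such as `mask_card` would be wrapped with `pyspark.sql.functions.udf`.

```python
import hashlib
import re
from typing import Any, Callable, Dict

EMAIL_RE = re.compile(r"^([^@])[^@]*(@.+)$")


def mask_regex(value: Any) -> Any:
    """Regex masking: keep the first character and the domain of an email.

    Values that do not look like an email are returned unchanged.
    """
    if value is None:
        return None
    return EMAIL_RE.sub(r"\1***\2", str(value))


def mask_null(value: Any) -> None:
    """Nulling: drop the value entirely."""
    return None


def mask_hash(value: Any) -> Any:
    """Hashing: unkeyed SHA-256 hex digest (deterministic, but guessable for small domains)."""
    if value is None:
        return None
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


def mask_card(value: Any) -> Any:
    """Custom rule (hypothetical): keep only the last four digits of a card number."""
    if value is None:
        return None
    digits = re.sub(r"\D", "", str(value))
    if len(digits) <= 4:
        return "*" * len(digits)  # too short to reveal any digits safely
    return "*" * (len(digits) - 4) + digits[-4:]


STRATEGIES: Dict[str, Callable[[Any], Any]] = {
    "regex": mask_regex,
    "null": mask_null,
    "hash": mask_hash,
    "custom": mask_card,
}


def apply_masks(row: Dict[str, Any], plan: Dict[str, str]) -> Dict[str, Any]:
    """Return a masked copy of row; columns absent from plan pass through unchanged."""
    return {col: (STRATEGIES[plan[col]](val) if col in plan else val)
            for col, val in row.items()}


row = {"email": "alice@example.com", "dob": "1990-04-01",
       "ssn": "123-45-6789", "card": "4111 1111 1111 1111", "country": "DE"}
plan = {"email": "regex", "dob": "null", "ssn": "hash", "card": "custom"}
print(apply_masks(row, plan))
```

Keeping each strategy as a single-value function makes the set easy to extend. Each function also maps directly onto a Spark UDF, although the Spark built-ins are generally faster than Python UDFs where they exist.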
---

## Use Cases

- Mask PII fields in ETL pipelines before storage or ML training
- Anonymize user data before model sharing or analytics
- Simulate production-like data in dev/test environments
- Help comply with HIPAA, GDPR, and internal audit policies

---

## Installation

```bash
pip install anonyspark
```

PyPI: https://pypi.org/project/anonyspark/

License: MIT License

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference