SamiKoen Claude committed on
Commit 7b80b3d · 1 Parent(s): eb0b385

fix: Complete Turkish character normalization

- Fixed İ character handling: the Turkish map must run BEFORE lower(), because Python's lower() turns İ into "i" plus a combining dot (U+0307)
- Added all uppercase Turkish characters to normalization
- Added GPT-5 prompt instructions for Turkish characters
- Now all variations work: MARLİN, MARLIN, marlin, Marlin

All return correct price: 53,000 TL

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
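
Why the ordering matters can be checked directly. The sketch below reproduces the old order from the removed lines (lowercase first, then map Turkish letters); the helper name `normalize_old` is made up for illustration and is not part of the repository.

```python
# Old order, as in the removed lines: lower() first, then the Turkish map.
# OLD_TR_MAP and normalize_old are illustrative names, not from the repository.
OLD_TR_MAP = {'ı': 'i', 'ğ': 'g', 'ü': 'u', 'ş': 's', 'ö': 'o', 'ç': 'c'}


def normalize_old(text):
    text = text.lower()
    for tr, en in OLD_TR_MAP.items():
        text = text.replace(tr, en)
    return text


# Python lowercases U+0130 (İ) to "i" followed by U+0307 (combining dot above).
lowered = "İ".lower()
print([hex(ord(ch)) for ch in lowered])     # ['0x69', '0x307']

# The combining dot survives the map, so the dotted spelling never matches.
print(normalize_old("MARLİN") == "marlin")  # False
print(normalize_old("MARLIN") == "marlin")  # True
```

Replacing İ before calling lower(), as this commit does, means the combining dot is never produced.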

__pycache__/smart_warehouse_with_price.cpython-312.pyc CHANGED
Binary files a/__pycache__/smart_warehouse_with_price.cpython-312.pyc and b/__pycache__/smart_warehouse_with_price.cpython-312.pyc differ
 
smart_warehouse_with_price.py CHANGED
@@ -49,15 +49,26 @@ def get_product_price_and_link(product_name, variant=None):
 
     root = ET.fromstring(xml_content)
 
-    # Normalize search terms
-    search_name = product_name.lower()
-    search_variant = variant.lower() if variant else ""
+    # Turkish character normalization FIRST (before lower())
+    tr_map = {
+        'İ': 'i', 'I': 'i', 'ı': 'i',  # All I variations to i
+        'Ğ': 'g', 'ğ': 'g',
+        'Ü': 'u', 'ü': 'u',
+        'Ş': 's', 'ş': 's',
+        'Ö': 'o', 'ö': 'o',
+        'Ç': 'c', 'ç': 'c'
+    }
 
-    # Turkish character normalization
-    tr_map = {'ı': 'i', 'ğ': 'g', 'ü': 'u', 'ş': 's', 'ö': 'o', 'ç': 'c'}
+    # Apply normalization to original (before lower)
+    search_name_normalized = product_name
+    search_variant_normalized = variant if variant else ""
     for tr, en in tr_map.items():
-        search_name = search_name.replace(tr, en)
-        search_variant = search_variant.replace(tr, en)
+        search_name_normalized = search_name_normalized.replace(tr, en)
+        search_variant_normalized = search_variant_normalized.replace(tr, en)
+
+    # Now lowercase
+    search_name = search_name_normalized.lower()
+    search_variant = search_variant_normalized.lower()
 
     best_match = None
     best_score = 0
@@ -303,6 +314,11 @@ If user asks about specific size (S, M, L, XL, XXL, SMALL, MEDIUM, LARGE, X-LARG
 If user asks generally (without size), return ALL variants of the product.
 {warehouse_filter}
 
+CRITICAL TURKISH CHARACTER RULES:
+- "MARLIN" and "MARLİN" are the SAME product (Turkish İ vs I)
+- Treat these as equivalent: I/İ/ı, Ö/ö, Ü/ü, Ş/ş, Ğ/ğ, Ç/ç
+- If user writes "Marlin", also match "MARLİN" in the list
+
 IMPORTANT BRAND AND PRODUCT TYPE RULES:
 - GOBIK: Spanish textile brand we import. When user asks about "gobik", return ALL products with "GOBIK" in the name.
 - Product names contain type information: FORMA (jersey/cycling shirt), TAYT (tights), İÇLİK (base layer), YAĞMURLUK (raincoat), etc.
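
The normalization added to smart_warehouse_with_price.py could also be written as a single translation table. The following is a minimal sketch of that alternative, not the code in this commit: `TR_TABLE`, `normalize_tr`, and the sample product names are hypothetical, and only the character mapping itself is taken from the diff.

```python
# Same character mapping as the new tr_map, built once as a translation table.
# TR_TABLE and normalize_tr are hypothetical names used for illustration.
TR_TABLE = str.maketrans({
    'İ': 'i', 'I': 'i', 'ı': 'i',
    'Ğ': 'g', 'ğ': 'g',
    'Ü': 'u', 'ü': 'u',
    'Ş': 's', 'ş': 's',
    'Ö': 'o', 'ö': 'o',
    'Ç': 'c', 'ç': 'c',
})


def normalize_tr(text):
    """Map Turkish letters to ASCII first, then lowercase the result."""
    if not text:
        return ""
    return text.translate(TR_TABLE).lower()


# The four spellings listed in the commit message collapse to one key.
for spelling in ("MARLİN", "MARLIN", "marlin", "Marlin"):
    print(spelling, "->", normalize_tr(spelling))

# Hypothetical product list: the same function is applied to both sides.
products = ["MARLİN 5", "GOBIK FORMA", "İÇLİK"]
matches = [p for p in products if normalize_tr("marlin") in normalize_tr(p)]
print(matches)  # ['MARLİN 5']
```

Applying the same function to the query and to the catalogue names keeps the comparison symmetric, so a match does not depend on which side carries the dotted İ.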