🚀 Full project sync: Hotels RAG & Audit System

Major Features:
- Complete RAG system for hotel website analysis
- Hybrid audit with BGE-M3 embeddings + Natasha NER
- Universal horizontal Excel reports with dashboards
- Multi-region processing (SPb, Orel, Chukotka, Kamchatka)

📊 Completed Regions:
- Oryol Oblast: 100% (36/36)
- Chukotka Autonomous Okrug: 100% (4/4)
- Saint Petersburg: 93% (893/960)
- Kamchatka Krai: 87% (89/102)
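
The percentages in this list follow directly from the processed/total counts. A minimal sketch of that arithmetic, assuming the figures are truncated to whole percent (rounding gives the same values for these four regions); `completion_pct` is an illustrative name, not something from the repository:

```python
def completion_pct(done: int, total: int) -> int:
    """Return the share of processed hotels as a whole percent (truncated)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return done * 100 // total


# Counts copied from the region list in the commit message.
regions = {
    "Oryol Oblast": (36, 36),
    "Chukotka Autonomous Okrug": (4, 4),
    "Saint Petersburg": (893, 960),
    "Kamchatka Krai": (89, 102),
}

for name, (done, total) in regions.items():
    print(f"{name}: {completion_pct(done, total)}% ({done}/{total})")
```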

🔧 Infrastructure:
- PostgreSQL with pgvector extension
- BGE-M3 embeddings API
- Browserless for web scraping
- N8N workflows for automation
- S3/Nextcloud file storage
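
As a rough illustration of how the pgvector and embedding pieces might fit together: pgvector's `<=>` operator returns cosine distance, and the pure-Python function below computes the same quantity for two small vectors. The table and column names in `NEAREST_CHUNKS_SQL` (`hotel_chunks`, `chunk_text`, `embedding`) are hypothetical and are not taken from this repository's schema.

```python
import math
from typing import Sequence

# Hypothetical query: table and column names are assumptions for illustration.
# `<=>` is pgvector's cosine-distance operator; %s are psycopg2-style placeholders
# (query vector as a '[x, y, ...]' string, then the row limit).
NEAREST_CHUNKS_SQL = """
    SELECT chunk_text
    FROM hotel_chunks
    ORDER BY embedding <=> %s::vector
    LIMIT %s
"""


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance (1 - cosine similarity) between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Toy 2-dimensional vectors; real embeddings have far more dimensions.
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
```

Smaller distances mean closer matches, which is why the sketched query sorts in ascending order.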

📝 Documentation:
- Complete DB schemas
- API documentation
- Setup guides
- Status reports
Фёдор
2025-10-27 22:49:42 +03:00
parent 0cf3297290
commit 684fada337
94 changed files with 14891 additions and 911 deletions

check_report_status.py

@@ -0,0 +1,59 @@
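import os

import psycopg2

# Connection settings are read from the standard libpq environment variables
# (PGHOST, PGPORT, PGDATABASE, PGUSER, PGPASSWORD) so that the host and
# credentials are not stored in the repository.
conn = psycopg2.connect(
    host=os.environ["PGHOST"],
    port=int(os.environ.get("PGPORT", "5432")),
    dbname=os.environ["PGDATABASE"],
    user=os.environ["PGUSER"],
    password=os.environ["PGPASSWORD"],
)
cur = conn.cursor()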
print("\n📊 CURRENT REPORT STATUS:\n")
# Check which audit versions exist in the database
cur.execute("""
    SELECT audit_version, COUNT(*) AS count
    FROM hotel_audit_results
    GROUP BY audit_version
    ORDER BY audit_version
""")
print("📋 Audit versions in the database:")
for row in cur.fetchall():
    print(f"  - {row[0]}: {row[1]} hotels")
# Count audited hotels per region for v1.0_with_rkn
cur.execute("""
    SELECT h.region_name, COUNT(*) AS count
    FROM hotel_audit_results ar
    JOIN hotel_main h ON ar.hotel_id = h.id
    WHERE ar.audit_version = 'v1.0_with_rkn'
    GROUP BY h.region_name
    ORDER BY count DESC
""")
print("\n🌍 Regions with a v1.0_with_rkn audit:")
for row in cur.fetchall():
    print(f"  - {row[0]}: {row[1]} hotels")
# Inspect the result structure using one hotel from Chukotka
# (the region name is matched as stored in the database)
cur.execute("""
    SELECT h.full_name, ar.score_percentage, ar.criteria_results
    FROM hotel_audit_results ar
    JOIN hotel_main h ON ar.hotel_id = h.id
    WHERE ar.audit_version = 'v1.0_with_rkn'
      AND h.region_name = 'Чукотский автономный округ'
    LIMIT 1
""")
result = cur.fetchone()
if result:
    print("\n📝 Sample hotel from Chukotka:")
    print(f"  Name: {result[0]}")
    print(f"  Score: {result[1]}%")
    criteria = result[2]
    if isinstance(criteria, dict):
        print(f"  Criteria count: {len(criteria)}")
        # Show only the first five keys, sorted alphabetically
        print(f"  Keys: {', '.join(sorted(criteria)[:5])}...")
cur.close()
conn.close()
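
The criteria summary at the end of the script could be pulled into a small pure function, which makes it testable without a database. A minimal sketch, assuming `criteria_results` arrives either as a dict (psycopg2 decodes `json`/`jsonb` columns into Python objects by default) or as a JSON string (if the column is plain text); `summarize_criteria` is an illustrative name and is not part of this commit.

```python
import json
from typing import Any


def summarize_criteria(criteria: Any, preview: int = 5) -> str:
    """Return a one-line summary of an audit criteria mapping."""
    if isinstance(criteria, str):
        # Assumption: a text column holding serialized JSON.
        criteria = json.loads(criteria)
    if not isinstance(criteria, dict):
        return "no criteria"
    keys = sorted(criteria)
    shown = ", ".join(keys[:preview])
    # Mark the list as truncated only when keys were actually left out.
    suffix = "..." if len(keys) > preview else ""
    return f"{len(keys)} criteria: {shown}{suffix}"


# Made-up sample data for illustration only; real keys come from the database.
sample = {"criterion_b": True, "criterion_a": False, "criterion_c": True}
print(summarize_criteria(sample))
```

Unlike the inline version, this only appends the ellipsis when some keys were omitted from the preview.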