hotels/check_crawler.py
Фёдор 684fada337 🚀 Full project sync: Hotels RAG & Audit System
Major Features:
- Complete RAG system for hotel website analysis
- Hybrid audit with BGE-M3 embeddings + Natasha NER
- Universal horizontal Excel reports with dashboards
- Multi-region processing (SPb, Orel, Chukotka, Kamchatka)

📊 Completed Regions:
- Orel Oblast: 100% (36/36)
- Chukotka Autonomous Okrug: 100% (4/4)
- Saint Petersburg: 93% (893/960)
- Kamchatka Krai: 87% (89/102)

🔧 Infrastructure:
- PostgreSQL with pgvector extension
- BGE-M3 embeddings API
- Browserless for web scraping
- N8N workflows for automation
- S3/Nextcloud file storage

📝 Documentation:
- Complete DB schemas
- API documentation
- Setup guides
- Status reports
2025-10-27 22:49:42 +03:00

49 lines
1.5 KiB
Python

#!/usr/bin/env python3
"""Show running mass_crawler.py processes and the five most recent crawler logs."""
import glob
import os
import subprocess
from datetime import datetime

# Check for running crawler processes
print("🔍 ACTIVE CRAWLER PROCESSES:\n")
try:
    result = subprocess.run(['ps', 'aux'], capture_output=True, text=True)
    for line in result.stdout.split('\n'):
        # Skip any grep process that merely mentions the script name
        if 'mass_crawler.py' in line and 'grep' not in line:
            print(f"   {line}")
except OSError:
    # Raised when ps is missing or cannot be executed
    print("   ❌ Failed to check processes")

# Check the log files, newest first
print("\n📄 CRAWLER LOG FILES:\n")
log_files = glob.glob('/root/engine/public_oversight/hotels/mass_crawler_*.log')
log_files.sort(key=os.path.getmtime, reverse=True)

for i, log_file in enumerate(log_files[:5]):
    size = os.path.getsize(log_file) / 1024  # KB
    mtime = os.path.getmtime(log_file)
    mod_time = datetime.fromtimestamp(mtime).strftime('%Y-%m-%d %H:%M:%S')

    print(f"   {i+1}. {os.path.basename(log_file)}")
    print(f"      Size: {size:.1f} KB")
    print(f"      Modified: {mod_time}")

    # Read the last lines of the log
    try:
        with open(log_file, 'r', errors='replace') as f:
            lines = f.readlines()
        if lines:
            print(f"      Lines: {len(lines)}")
            # Last 3 lines, each truncated to 80 characters
            for line in lines[-3:]:
                line = line.strip()
                if line:
                    print(f"         {line[:80]}...")
    except OSError:
        # The log may have been rotated or be unreadable; skip its contents
        pass

    print()
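
The script loads every log in full with `readlines()` only to show its last three lines. For large crawler logs, a bounded `deque` could keep just the tail in memory. The following is a minimal sketch under that assumption; `tail_lines` is a hypothetical helper name that does not appear in the original script, and unlike the script it does not report the total line count.

```python
from collections import deque


def tail_lines(path, n=3, width=80):
    """Return the non-empty lines among the last n lines of a text file.

    Each returned line is stripped and cut to at most `width` characters.
    Hypothetical helper, not part of the original script.
    """
    with open(path, 'r', errors='replace') as f:
        # A deque with maxlen drops older lines as newer ones are read,
        # so at most n lines are held in memory at any time.
        last = deque(f, maxlen=n)
    return [line.strip()[:width] for line in last if line.strip()]
```

Inside the loop over `log_files`, the call `tail_lines(log_file)` could stand in for the `readlines()` block, with each returned line printed the same way as before.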