# NeuroLogX

NeuroLogX is an intelligent log analysis platform that stores and indexes logs with Apache Lucene, then applies deep learning models (CNNs, Transformers) to classify logs, detect anomalies, and extract insights from them. The project simulates a real AIOps system with log ingestion, storage, and neural-powered analytics.
## 🧱 Tech Stack

| Component | Tech Stack |
|---|---|
| Log Storage/Search | Apache Lucene |
| Log Generator | Python + Faker + simulated errors |
| Log Classifier | CNN / LSTM (Keras) |
| Dashboard | Streamlit for visual insights |
## 📁 Project Structure

```
neurologx/
├── lucene_indexer/        # Java/Python code for indexing/searching logs
│   ├── index/             # Index logs
│   └── lib/               # Lucene 10.2.0 jar files
├── data/                  # Logs and model-ready datasets
├── dashboard/             # Visualization tool
│   └── dashboard.py
├── utils/
│   ├── log_generator.py   # Python + Faker + simulated errors
│   └── log_classifier.py  # CNN / LSTM (Keras)
└── README.md
```
## 📝 Simulate Logs

Use Python (`Faker`, `random`, `datetime`) to generate synthetic logs:

- Log levels: `INFO`, `DEBUG`, `WARN`, `ERROR`, `CRITICAL`
- Components: `AuthService`, `DBService`, `Network`, `Cache`, `APIGateway`
- Anomaly phrases: `Segmentation fault`, `OutOfMemoryError`, `Connection timed out`, `Database locked`, `Permission denied`, `System overheating`
- Inject random error patterns to create labeled anomalies
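The generation step above can be sketched as follows. This is a minimal stdlib-only sketch: the function name `generate_logs` and the fixed "normal" messages are illustrative, and the real `log_generator.py` may also use Faker to produce richer message text and hostnames.

```python
import random
from datetime import datetime, timedelta

LEVELS = ["INFO", "DEBUG", "WARN", "ERROR", "CRITICAL"]
COMPONENTS = ["AuthService", "DBService", "Network", "Cache", "APIGateway"]
ANOMALY_PHRASES = [
    "Segmentation fault", "OutOfMemoryError", "Connection timed out",
    "Database locked", "Permission denied", "System overheating",
]
NORMAL_MESSAGES = ["Operation completed successfully", "Request handled", "Cache hit"]

def generate_logs(n, anomaly_rate=0.2, start=None):
    """Generate n synthetic log records; roughly anomaly_rate of them are labeled anomalies."""
    ts = start or datetime(2025, 4, 16, 10, 0, 0)
    logs = []
    for _ in range(n):
        ts += timedelta(seconds=random.randint(1, 120))  # logs arrive in time order
        is_anomaly = random.random() < anomaly_rate
        logs.append({
            "timestamp": ts.strftime("%Y-%m-%d %H:%M:%S"),
            "level": random.choice(["ERROR", "CRITICAL"]) if is_anomaly
                     else random.choice(["INFO", "DEBUG", "WARN"]),
            "component": random.choice(COMPONENTS),
            "message": random.choice(ANOMALY_PHRASES) if is_anomaly
                       else random.choice(NORMAL_MESSAGES),
            "label": "anomaly" if is_anomaly else "normal",
        })
    return logs
```

Writing each record as one JSON line (or CSV row) under `data/` gives the Lucene indexer a simple format to consume.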
## 📦 Store Logs in Lucene

- Write a small Java program to index logs with Lucene
- Each log becomes a Lucene document with fields: `timestamp`, `level`, `component`, `message`, `label`
## 🔍 Query Logs with Lucene

- Search logs by keyword or time range (the default query matches all logs), e.g. `level:ERROR AND component:AuthService`
- Export the matched logs to a CSV file for training/testing
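Loading the exported CSV back into Python for model training can be sketched as below, assuming the CSV has a header row with the five fields listed above; the helper name `load_and_split` is illustrative, not part of the project.

```python
import csv
import random

def load_and_split(csv_path, test_ratio=0.2, seed=42):
    """Read the Lucene-exported CSV and split its rows into train/test sets."""
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))  # each row becomes a dict keyed by header
    random.Random(seed).shuffle(rows)   # fixed seed for a reproducible split
    cut = int(len(rows) * (1 - test_ratio))
    return rows[:cut], rows[cut:]
```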
## 🤖 Apply Deep Learning

- Preprocess logs (tokenize, pad, embed)
- Train a CNN/LSTM classifier to predict log categories
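The tokenize-and-pad step can be sketched in plain Python as below; the real `log_classifier.py` likely uses Keras utilities for this instead, and the resulting integer sequences would then feed an `Embedding` layer followed by `Conv1D` or `LSTM` layers. The helper names here are illustrative.

```python
from collections import Counter

def build_vocab(messages, min_freq=1):
    """Map each token to an integer id; 0 is reserved for padding, 1 for out-of-vocabulary."""
    counts = Counter(tok for msg in messages for tok in msg.lower().split())
    vocab = {"<pad>": 0, "<oov>": 1}
    for tok, freq in counts.most_common():
        if freq >= min_freq:
            vocab[tok] = len(vocab)
    return vocab

def encode(message, vocab, max_len=20):
    """Tokenize a message, map tokens to ids, and pad/truncate to max_len."""
    ids = [vocab.get(tok, 1) for tok in message.lower().split()][:max_len]
    return ids + [0] * (max_len - len(ids))
```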
## 📊 Dashboard (Optional)

- Show:
  - Sample predictions
  - Label distribution
  - An interactive classifier that predicts labels for pasted logs
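The dashboard's input handling can be sketched as below: parse the pasted JSON log lines, then count logs per level for a chart (in `dashboard.py` this would feed something like Streamlit's `st.bar_chart`). Both helper names are illustrative; the sketch uses only the standard library so it runs without Streamlit.

```python
import json
from collections import Counter

def parse_pasted_logs(text):
    """Parse JSON log lines pasted into the dashboard, skipping blank or malformed lines."""
    logs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            logs.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # ignore lines that are not valid JSON
    return logs

def label_distribution(logs):
    """Count logs per level, for a bar chart of the level distribution."""
    return Counter(log.get("level", "UNKNOWN") for log in logs)
```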
With the full pipeline in place, you can:
- Predict whether a new log is likely an error or critical issue
- Automatically highlight anomalous logs from thousands of entries
- Search log events related to specific failures
- Visualize log distribution over time and subsystems
Possible future extensions:

- Use FAISS or a vector DB to store BERT embeddings of logs for semantic search
- Integrate an OpenAI/LLM step to summarize patterns in recent logs
- Auto-generate incident reports from clusters of log anomalies
Why this project:

- ✅ Uses Lucene (a full-text search engine), as in real observability tools
- ✅ Demonstrates deep learning for NLP/log classification
- ✅ Simulates anomaly detection, root-cause analysis, and reporting
- ✅ Involves data pipelines, ML deployment, and dashboards
- ✅ Can integrate with cloud, AIOps, and IT ops data
```shell
# 1. Generate synthetic logs
python utils/log_generator.py

# 2. Index the logs with Lucene, then export them to CSV
cd lucene_indexer
javac -cp "lib/*" LuceneLogIndexer.java
java -cp ".;lib/*" LuceneLogIndexer    # on Linux/macOS use ".:lib/*"
javac -cp "lib/*" LuceneLogExporter.java
java -cp ".;lib/*" LuceneLogExporter   # on Linux/macOS use ".:lib/*"
cd ..

# 3. Train the log classifier
python utils/log_classifier.py

# 4. Launch the dashboard
cd dashboard
python -m streamlit run dashboard.py
```
➡️ Open your browser and go to: http://localhost:8501
Copy and paste logs into the dashboard input area:

```json
{"timestamp": "2025-04-16 10:34:35", "level": "INFO", "component": "Cache", "message": "Listen current most ok."}
{"timestamp": "2025-04-16 11:03:46", "level": "ERROR", "component": "AuthService", "message": "System overheating"}
```
Zeliha Ural Merpez