diff --git a/docs/tips-and-tricks/community-wisdom.md b/docs/tips-and-tricks/community-wisdom.md new file mode 100644 index 00000000000..ce41c2efd49 --- /dev/null +++ b/docs/tips-and-tricks/community-wisdom.md @@ -0,0 +1,42 @@ +--- +sidebar_position: 1 +slug: /tips-and-tricks/community-wisdom +sidebar_label: 'Community Wisdom' +doc_type: 'overview' +keywords: [ + 'database tips', + 'community wisdom', + 'production troubleshooting', + 'performance optimization', + 'database debugging', + 'clickhouse guides', + 'real world examples', + 'database best practices', + 'meetup insights', + 'production lessons', + 'interactive tutorials', + 'database solutions' +] +title: 'ClickHouse community wisdom' +description: 'Learn from the ClickHouse community with real world scenarios and lessons learned' +--- + +# ClickHouse community wisdom: tips and tricks from meetups {#community-wisdom} + +*These interactive guides represent collective wisdom from hundreds of production deployments. Each runnable example helps you understand ClickHouse patterns using real GitHub events data - practice these concepts to avoid common mistakes and accelerate your success.* + +Combine this collected knowledge with our [Best Practices](/best-practices) guide for optimal ClickHouse Experience. + +## Problem-specific quick jumps {#problem-specific-quick-jumps} + +| Issue | Document | Description | +|-------|---------|-------------| +| **Production issue** | [Debugging insights](./debugging-insights.md) | Community production debugging tips | +| **Slow queries** | [Performance optimization](./performance-optimization.md) | Optimize Performance | +| **Materialized views** | [MV double-edged sword](./materialized-views.md) | Avoid 10x storage instances | +| **Too many parts** | [Too many parts](./too-many-parts.md) | Addressing the 'Too Many Parts' error and performance slowdown | +| **High costs** | [Cost optimization](./cost-optimization.md) | Optimize Cost | +| **Success stories** | [Success stories](./success-stories.md) | Examples of ClickHouse in successful use cases | + +**Last Updated:** Based on community meetup insights through 2024-2025 +**Contributing:** Found a mistake or have a new lesson? Community contributions welcome \ No newline at end of file diff --git a/docs/tips-and-tricks/cost-optimization.md b/docs/tips-and-tricks/cost-optimization.md new file mode 100644 index 00000000000..a302275e4a5 --- /dev/null +++ b/docs/tips-and-tricks/cost-optimization.md @@ -0,0 +1,94 @@ +--- +sidebar_position: 1 +slug: /community-wisdom/cost-optimization +sidebar_label: 'Cost Optimization' +doc_type: 'how-to-guide' +keywords: [ + 'cost optimization', + 'storage costs', + 'partition management', + 'data retention', + 'storage analysis', + 'database optimization', + 'clickhouse cost reduction', + 'storage hot spots', + 'ttl performance', + 'disk usage', + 'compression strategies', + 'retention analysis' +] +title: 'Lessons - cost optimization' +description: 'Cost optimization strategies from ClickHouse community meetups with real production examples and verified techniques.' +--- + +# Cost optimization: strategies from the community {#cost-optimization} +*This guide is part of a collection of findings gained from community meetups. The findings on this page cover community wisdom related to optimizing cost while using ClickHouse that worked well for their specific experience and setup. 
For more real world solutions and insights you can [browse by specific problem](./community-wisdom.md).* + +*Learn about how [ClickHouse Cloud can help manage operational costs](/cloud/overview)*. + +## Compression strategy: LZ4 vs ZSTD in production {#compression-strategy} + +When Microsoft Clarity needed to handle hundreds of terabytes of data, they discovered that compression choices have dramatic cost implications. At their scale, every bit of storage savings matters, and they faced a classic trade-off: performance versus storage costs. Microsoft Clarity handles massive volumes—two petabytes of uncompressed data per month across all accounts, processing around 60,000 queries per hour across eight nodes and serving billions of page views from millions of websites. At this scale, compression strategy becomes a critical cost factor. + +They initially used ClickHouse's default [LZ4](/sql-reference/statements/create/table#lz4) compression but discovered significant cost savings were possible with [ZSTD](/sql-reference/statements/create/table#zstd). While LZ4 is faster, ZSTD provides better compression at the cost of slightly slower performance. After testing both approaches, they made a strategic decision to prioritize storage savings. The results were significant: 50% storage savings on large tables with manageable performance impact on ingestion and queries. + +**Key results:** +- 50% storage savings on large tables through ZSTD compression +- 2 petabytes monthly data processing capacity +- Manageable performance impact on ingestion and queries +- Significant cost reduction at hundreds of TB scale + +## Column-based retention strategy {#column-retention} + +One of the most powerful cost optimization techniques comes from analyzing which columns are actually being used. Microsoft Clarity implements sophisticated column-based retention strategies using ClickHouse's built-in telemetry capabilities. ClickHouse provides detailed metrics on storage usage by column as well as comprehensive query patterns: which columns are accessed, how frequently, query duration, and overall usage statistics. + +This data-driven approach enables strategic decisions about retention policies and column lifecycle management. By analyzing this telemetry data, Microsoft can identify storage hot spots - columns that consume significant space but receive minimal queries. For these low-usage columns, they can implement aggressive retention policies, reducing storage time from 30 months to just one month, or delete the columns entirely if they're not queried at all. This selective retention strategy reduces storage costs without impacting user experience. + +**The strategy:** +- Analyze column usage patterns using ClickHouse telemetry +- Identify high-storage, low-query columns +- Implement selective retention policies +- Monitor query patterns for data-driven decisions + +**Related docs** +- [Managing Data - Column Level TTL](/observability/managing-data) + +## Partition-based data management {#partition-management} + +Microsoft Clarity discovered that partitioning strategy impacts both performance and operational simplicity. Their approach: partition by date, order by hour. This strategy delivers multiple benefits beyond just cleanup efficiency—it enables trivial data cleanup, simplifies billing calculations for their customer-facing service, and supports GDPR compliance requirements for row-based deletion. 
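+
+A minimal sketch of the pattern (the table and column names here are hypothetical placeholders, not Clarity's actual schema):
+
+```sql
+-- Partition by date, order by hour within each partition
+CREATE TABLE page_events
+(
+    event_time DateTime,
+    project_id UInt64,
+    url String,
+    payload String
+)
+ENGINE = MergeTree
+PARTITION BY toDate(event_time)
+ORDER BY (toStartOfHour(event_time), project_id);
+
+-- Cleanup or retention for a whole day becomes a cheap metadata operation
+-- instead of a row-by-row delete:
+ALTER TABLE page_events DROP PARTITION '2024-01-15';
+```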
+
+**Key benefits:**
+- Trivial data cleanup (drop partition vs row-by-row deletion)
+- Simplified billing calculations
+- Better query performance through partition elimination
+- Easier operational management
+
+**Related docs**
+- [Managing Data - Partitions](/observability/managing-data#partitions)
+
+## String-to-integer conversion strategy {#string-integer-conversion}
+
+Analytics platforms often face a storage challenge with categorical data that appears repeatedly across millions of rows. Microsoft's engineering team encountered this problem with their search analytics data and developed an effective solution that achieved 60% storage reduction on affected datasets.
+
+In Microsoft's web analytics system, search results trigger different types of answers - weather cards, sports information, news articles, and factual responses. Each query result was tagged with descriptive strings like "weather_answer," "sports_answer," or "factual_answer." With billions of search queries processed, these string values were being stored repeatedly in ClickHouse, consuming massive amounts of storage space and requiring expensive string comparisons during queries.
+
+Microsoft implemented a string-to-integer mapping system using a separate MySQL database. Instead of storing the actual strings in ClickHouse, they store only integer IDs. When users run queries through the UI and request data for `weather_answer`, their query optimizer first consults the MySQL mapping table to get the corresponding integer ID, then converts the query to use that integer before sending it to ClickHouse.
+
+This architecture preserves the user experience - people still see meaningful labels like `weather_answer` in their dashboards - while the backend storage and queries operate on much more efficient integers. The mapping system handles all translation transparently, requiring no changes to the user interface or user workflows.
+
+**Key benefits:**
+- 60% storage reduction on affected datasets
+- Faster query performance on integer comparisons
+- Reduced memory usage for joins and aggregations
+- Lower network transfer costs for large result sets
+
+:::note
+This example is specific to Microsoft Clarity's data scenario. If you have all your data in ClickHouse or do not have constraints against moving data to ClickHouse, try using [dictionaries](/dictionary) instead.
+::: + +## Video sources {#video-sources} + +- **[Microsoft Clarity and ClickHouse](https://www.youtube.com/watch?v=rUVZlquVGw0)** - Microsoft Clarity Team +- **[ClickHouse journey in Contentsquare](https://www.youtube.com/watch?v=zvuCBAl2T0Q)** - Doron Hoffman & Guram Sigua (ContentSquare) + +*These community cost optimization insights represent strategies from companies processing hundreds of terabytes to petabytes of data, showing real-world approaches to reducing ClickHouse operational costs.* \ No newline at end of file diff --git a/docs/tips-and-tricks/debugging-insights.md b/docs/tips-and-tricks/debugging-insights.md new file mode 100644 index 00000000000..4dc45937519 --- /dev/null +++ b/docs/tips-and-tricks/debugging-insights.md @@ -0,0 +1,175 @@ +--- +sidebar_position: 1 +slug: /community-wisdom/debugging-insights +sidebar_label: 'Debugging Insights' +doc_type: 'how-to-guide' +keywords: [ + 'clickhouse troubleshooting', + 'clickhouse errors', + 'slow queries', + 'memory problems', + 'connection issues', + 'performance optimization', + 'database errors', + 'configuration problems', + 'debug', + 'solutions' +] +title: 'Lessons - debugging insights' +description: 'Find solutions to the most common ClickHouse problems including slow queries, memory errors, connection issues, and configuration problems.' +--- + +# ClickHouse operations: community debugging insights {#clickhouse-operations-community-debugging-insights} +*This guide is part of a collection of findings gained from community meetups. For more real world solutions and insights you can [browse by specific problem](./community-wisdom.md).* +*Suffering from high operational costs? Check out the [Cost Optimization](./cost-optimization.md) community insights guide.* + +## Essential system tables {#essential-system-tables} + +These system tables are fundamental for production debugging: + +### system.errors {#system-errors} + +Shows all active errors in your ClickHouse instance. + +```sql +SELECT name, value, changed +FROM system.errors +WHERE value > 0 +ORDER BY value DESC; +``` + +### system.replicas {#system-replicas} + +Contains replication lag and status information for monitoring cluster health. + +```sql +SELECT database, table, replica_name, absolute_delay, queue_size, inserts_in_queue +FROM system.replicas +WHERE absolute_delay > 60 +ORDER BY absolute_delay DESC; +``` + +### system.replication_queue {#system-replication-queue} + +Provides detailed information for diagnosing replication problems. + +```sql +SELECT database, table, replica_name, position, type, create_time, last_exception +FROM system.replication_queue +WHERE last_exception != '' +ORDER BY create_time DESC; +``` + +### system.merges {#system-merges} + +Shows current merge operations and can identify stuck processes. + +```sql +SELECT database, table, elapsed, progress, is_mutation, total_size_bytes_compressed +FROM system.merges +ORDER BY elapsed DESC; +``` + +### system.parts {#system-parts} + +Essential for monitoring part counts and identifying fragmentation issues. + +```sql +SELECT database, table, count() as part_count +FROM system.parts +WHERE active = 1 +GROUP BY database, table +ORDER BY count() DESC; +``` + +## Common production issues {#common-production-issues} + +### Disk space problems {#disk-space-problems} + +Disk space exhaustion in replicated setups creates cascading problems. When one node runs out of space, other nodes continue trying to sync with it, causing network traffic spikes and confusing symptoms. 
One community member spent 4 hours debugging what was simply low disk space. Check out this [query](/knowledgebase/useful-queries-for-troubleshooting#show-disk-storage-number-of-parts-number-of-rows-in-systemparts-and-marks-across-databases) to monitor your disk storage on a particular cluster. + +AWS users should be aware that default general purpose EBS volumes have a 16TB limit. + +### Too many parts error {#too-many-parts-error} + +Small frequent inserts create performance problems. The community has identified that insert rates above 10 per second often trigger "too many parts" errors because ClickHouse cannot merge parts fast enough. + +**Solutions:** +- Batch data using 30-second or 200MB thresholds +- Enable async_insert for automatic batching +- Use buffer tables for server-side batching +- Configure Kafka for controlled batch sizes + +[Official recommendation](/best-practices/selecting-an-insert-strategy#batch-inserts-if-synchronous): minimum 1,000 rows per insert, ideally 10,000 to 100,000. + +### Invalid timestamps issues {#data-quality-issues} + +Applications that send data with arbitrary timestamps create partition problems. This leads to partitions with data from unrealistic dates (like 1998 or 2050), causing unexpected storage behavior. + +### `ALTER` operation risks {#alter-operation-risks} + +Large `ALTER` operations on multi-terabyte tables can consume significant resources and potentially lock databases. One community example involved changing an Integer to a Float on 14TB of data, which locked the entire database and required rebuilding from backups. + +**Monitor expensive mutations:** + +```sql +SELECT database, table, mutation_id, command, parts_to_do, is_done +FROM system.mutations +WHERE is_done = 0; +``` + +Test schema changes on smaller datasets first. + +## Memory and performance {#memory-and-performance} + +### External aggregation {#external-aggregation} + +Enable external aggregation for memory-intensive operations. It's slower but prevents out-of-memory crashes by spilling to disk. You can do this by using `max_bytes_before_external_group_by` which will help prevent out of memory crashes on large `GROUP BY` operations. You can learn more about this setting [here](/operations/settings/settings#max_bytes_before_external_group_by). + +```sql +SELECT + column1, + column2, + COUNT(*) as count, + SUM(value) as total +FROM large_table +GROUP BY column1, column2 +SETTINGS max_bytes_before_external_group_by = 1000000000; -- 1GB threshold +``` + +### Async insert details {#async-insert-details} + +Async insert automatically batches small inserts server-side to improve performance. You can configure whether to wait for data to be written to disk before returning acknowledgment - immediate return is faster but less durable. Modern versions support deduplication to handle duplicate data within batches. + +**Related docs** +- [Selecting an insert strategy](/best-practices/selecting-an-insert-strategy#asynchronous-inserts) + +### Distributed table configuration {#distributed-table-configuration} + +By default, distributed tables use single-threaded inserts. Enable `insert_distributed_sync` for parallel processing and immediate data sending to shards. + +Monitor temporary data accumulation when using distributed tables. 
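+
+A minimal sketch of both points (the table names `events_distributed` and `events_local` are assumed placeholders; adapt them to your cluster):
+
+```sql
+-- Opt in to synchronous distributed inserts for this query so data is sent
+-- to the shards immediately instead of being queued in the background
+INSERT INTO events_distributed SETTINGS insert_distributed_sync = 1
+SELECT * FROM events_local;
+
+-- Check whether temporary data is piling up in the background send queue
+SELECT database, table, data_files, data_compressed_bytes, last_exception
+FROM system.distribution_queue
+WHERE data_files > 0;
+```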
+ +### Performance monitoring thresholds {#performance-monitoring-thresholds} + +Community-recommended monitoring thresholds: +- Parts per partition: preferably less than 100 +- Delayed inserts: should stay at zero +- Insert rate: limit to about 1 per second for optimal performance + +**Related docs** +- [Custom partitioning key](/engines/table-engines/mergetree-family/custom-partitioning-key) + +## Quick reference {#quick-reference} + +| Issue | Detection | Solution | +|-------|-----------|----------| +| Disk Space | Check `system.parts` total bytes | Monitor usage, plan scaling | +| Too Many Parts | Count parts per table | Batch inserts, enable async_insert | +| Replication Lag | Check `system.replicas` delay | Monitor network, restart replicas | +| Bad Data | Validate partition dates | Implement timestamp validation | +| Stuck Mutations | Check `system.mutations` status | Test on small data first | + +### Video sources {#video-sources} +- [10 Lessons from Operating ClickHouse](https://www.youtube.com/watch?v=liTgGiTuhJE) +- [Fast, Concurrent, and Consistent Asynchronous INSERTS in ClickHouse](https://www.youtube.com/watch?v=AsMPEfN5QtM) \ No newline at end of file diff --git a/docs/tips-and-tricks/materialized-views.md b/docs/tips-and-tricks/materialized-views.md new file mode 100644 index 00000000000..38bc0d32d5b --- /dev/null +++ b/docs/tips-and-tricks/materialized-views.md @@ -0,0 +1,69 @@ +--- +sidebar_position: 1 +slug: /tips-and-tricks/materialized-views +sidebar_label: 'Materialized Views' +doc_type: 'how-to' +keywords: [ + 'clickhouse materialized views', + 'materialized view optimization', + 'materialized view storage issues', + 'materialized view best practices', + 'database aggregation patterns', + 'materialized view anti-patterns', + 'storage explosion problems', + 'materialized view performance', + 'database view optimization', + 'aggregation strategy', + 'materialized view troubleshooting', + 'view storage overhead' +] +title: 'Lessons - materialized views' +description: 'Real world examples of materialized views, problems and solutions' +--- + +# Materialized views: how they can become a double edged sword {#materialized-views-the-double-edged-sword} + +*This guide is part of a collection of findings gained from community meetups. For more real world solutions and insights you can [browse by specific problem](./community-wisdom.md).* +*Too many parts bogging your database down? Check out the [Too Many Parts](./too-many-parts.md) community insights guide.* +*Learn more about [Materialized Views](/materialized-views).* + +## The 10x storage anti-pattern {#storage-antipattern} + +**Real production problem:** *"We had a materialized view. The raw log table was around 20 gig but the view from that log table exploded to 190 gig, so almost 10x the size of the raw table. This happened because we were creating one row per attribute and each log can have 10 attributes."* + +**Rule:** If your `GROUP BY` creates more rows than it eliminates, you're building an expensive index, not a materialized view. + +## Production materialized view health validation {#mv-health-validation} + +This query helps you predict whether a materialized view will compress or explode your data before you create it. Run it against your actual table and columns to avoid the "190GB explosion" scenario. 
+ +**What it shows:** +- **Low aggregation ratio** (\<10%) = Good MV, significant compression +- **High aggregation ratio** (\>70%) = Bad MV, storage explosion risk +- **Storage multiplier** = How much bigger/smaller your MV will be + +```sql +-- Replace with your actual table and columns +SELECT + count() as total_rows, + uniq(your_group_by_columns) as unique_combinations, + round(uniq(your_group_by_columns) / count() * 100, 2) as aggregation_ratio +FROM your_table +WHERE your_filter_conditions; + +-- If aggregation_ratio > 70%, reconsider your MV design +-- If aggregation_ratio < 10%, you'll get good compression +``` + +## When materialized views become a problem {#mv-problems} + +**Warning signs to monitor:** +- Insert latency increases (queries that took 10ms now take 100ms+) +- "Too many parts" errors appearing more frequently +- CPU spikes during insert operations +- Insert timeouts that didn't happen before + +You can compare insert performance before and after adding MVs using `system.query_log` to track query duration trends. + +## Video sources {#video-sources} +- [ClickHouse at CommonRoom - Kirill Sapchuk](https://www.youtube.com/watch?v=liTgGiTuhJE) - Source of the "over enthusiastic about materialized views" and "20GB→190GB explosion" case study \ No newline at end of file diff --git a/docs/tips-and-tricks/performance-optimization.md b/docs/tips-and-tricks/performance-optimization.md new file mode 100644 index 00000000000..d0e39924ba0 --- /dev/null +++ b/docs/tips-and-tricks/performance-optimization.md @@ -0,0 +1,139 @@ +--- +sidebar_position: 1 +slug: /community-wisdom/performance-optimization +sidebar_label: 'Performance Optimization' +doc_type: 'how-to-guide' +keywords: [ + 'performance optimization', + 'query performance', + 'database tuning', + 'slow queries', + 'memory optimization', + 'cardinality analysis', + 'indexing strategies', + 'aggregation optimization', + 'sampling techniques', + 'database performance', + 'query analysis', + 'performance troubleshooting' +] +title: 'Lessons - performance optimization' +description: 'Real world examples of performance optimization strategies' +--- + +# Performance optimization: community tested strategies {#performance-optimization} +*This guide is part of a collection of findings gained from community meetups. For more real world solutions and insights you can [browse by specific problem](./community-wisdom.md).* +*Having trouble with Materialized Views? Check out the [Materialized Views](./materialized-views.md) community insights guide.* +*If you're experiencing slow queries and want more examples, we also have a [Query Optimization](/optimize/query-optimization) guide.* + +## Order by cardinality (lowest to highest) {#cardinality-ordering} +ClickHouse's primary index works best when low-cardinality columns come first, allowing it to skip large chunks of data efficiently. High-cardinality columns later in the key provide fine-grained sorting within those chunks. Start with columns that have few unique values (like status, category, country) and end with columns that have many unique values (like user_id, timestamp, session_id). + +Check out more documentation on cardinality and primary indexes: +- [Choosing a Primary Key](/best-practices/choosing-a-primary-key) +- [Primary indexes](/primary-indexes) + +## Time granularity matters {#time-granularity} +When using timestamps in your ORDER BY clause, consider the cardinality vs precision trade-off. 
Microsecond-precision timestamps create very high cardinality (nearly one unique value per row), which reduces the effectiveness of ClickHouse's sparse primary index. Rounded timestamps create lower cardinality that enables better index skipping, but you lose precision for time-based queries. + +```sql runnable editable +-- Challenge: Try different time functions like toStartOfMinute or toStartOfWeek +-- Experiment: Compare the cardinality differences with your own timestamp data +SELECT + 'Microsecond precision' as granularity, + uniq(created_at) as unique_values, + 'Creates massive cardinality - bad for sort key' as impact +FROM github.github_events +WHERE created_at >= '2024-01-01' +UNION ALL +SELECT + 'Hour precision', + uniq(toStartOfHour(created_at)), + 'Much better for sort key - enables skip indexing' +FROM github.github_events +WHERE created_at >= '2024-01-01' +UNION ALL +SELECT + 'Day precision', + uniq(toStartOfDay(created_at)), + 'Best for reporting queries' +FROM github.github_events +WHERE created_at >= '2024-01-01'; +``` + +## Focus on individual queries, not averages {#focus-on-individual-queries-not-averages} + +When debugging ClickHouse performance, don't rely on average query times or overall system metrics. Instead, identify why specific queries are slow. A system can have good average performance while individual queries suffer from memory exhaustion, poor filtering, or high cardinality operations. + +According to Alexey, CTO of ClickHouse: *"The right way is to ask yourself why this particular query was processed in five seconds... I don't care if median and other queries process quickly. I only care about my query"* + +When a query is slow, don't just look at averages. Ask "Why was THIS specific query slow?" and examine the actual resource usage patterns. + +## Memory and row scanning {#memory-and-row-scanning} + +Sentry is a developer-first error tracking platform processing billions of events daily from 4+ million developers. Their key insight: *"The cardinality of the grouping key that's going to drive memory in this particular situation"* - High cardinality aggregations kill performance through memory exhaustion, not row scanning. + +When queries fail, determine if it's a memory problem (too many groups) or scanning problem (too many rows). + +A query like `GROUP BY user_id, error_message, url_path` creates a separate memory state for every unique combination of all three values together. With a higher load of users, error types, and URL paths, you could easily generate millions of aggregation states that must be held in memory simultaneously. + +For extreme cases, Sentry uses deterministic sampling. A 10% sample reduces memory usage by 90% while maintaining roughly 5% accuracy for most aggregations: + +```sql +WHERE cityHash64(user_id) % 10 = 0 -- Always same 10% of users +``` + +This ensures the same users appear in every query, providing consistent results across time periods. The key insight: `cityHash64()` produces consistent hash values for the same input, so `user_id = 12345` will always hash to the same value, ensuring that user either always appears in your 10% sample or never does - no flickering between queries. + +## Sentry's bit mask optimization {#bit-mask-optimization} + +When aggregating by high-cardinality columns (like URLs), each unique value creates a separate aggregation state in memory, leading to memory exhaustion. Sentry's solution: instead of grouping by the actual URL strings, group by boolean expressions that collapse into bit masks. 
+ +Here is a query that you can try on your own tables if this situation applies to you: + +```sql +-- Memory-Efficient Aggregation Pattern: Each condition = one integer per group +-- Key insight: sumIf() creates bounded memory regardless of data volume +-- Memory per group: N integers (N * 8 bytes) where N = number of conditions + +SELECT + your_grouping_column, + + -- Each sumIf creates exactly one integer counter per group + -- Memory stays constant regardless of how many rows match each condition + sumIf(1, your_condition_1) as condition_1_count, + sumIf(1, your_condition_2) as condition_2_count, + sumIf(1, your_text_column LIKE '%pattern%') as pattern_matches, + sumIf(1, your_numeric_column > threshold_value) as above_threshold, + + -- Complex multi-condition aggregations still use constant memory + sumIf(1, your_condition_1 AND your_text_column LIKE '%pattern%') as complex_condition_count, + + -- Standard aggregations for context + count() as total_rows, + avg(your_numeric_column) as average_value, + max(your_timestamp_column) as latest_timestamp + +FROM your_schema.your_table +WHERE your_timestamp_column >= 'start_date' + AND your_timestamp_column < 'end_date' +GROUP BY your_grouping_column +HAVING condition_1_count > minimum_threshold + OR condition_2_count > another_threshold +ORDER BY (condition_1_count + condition_2_count + pattern_matches) DESC +LIMIT 20 +``` + +Instead of storing every unique string in memory, you're storing the answer to questions about those strings as integers. The aggregation state becomes bounded and tiny, regardless of data diversity. + +From Sentry's engineering team: "These heavy queries are more than 10x faster and our memory usage is 100x lower (and, more importantly, bounded). Our largest customers no longer see errors when searching for replays and we can now support customers of arbitrary size without running out of memory." + +## Video sources {#video-sources} + +- [Lost in the Haystack - Optimizing High Cardinality Aggregations](https://www.youtube.com/watch?v=paK84-EUJCA) - Sentry's production lessons on memory optimization +- [ClickHouse Performance Analysis](https://www.youtube.com/watch?v=lxKbvmcLngo) - Alexey Milovidov on debugging methodology +- [ClickHouse Meetup: Query Optimization Techniques](https://www.youtube.com/watch?v=JBomQk4Icjo) - Community optimization strategies + +**Read Next**: +- [Query Optimization Guide](/optimize/query-optimization) +- [Materialized Views Community Insights](./materialized-views.md) \ No newline at end of file diff --git a/docs/tips-and-tricks/success-stories.md b/docs/tips-and-tricks/success-stories.md new file mode 100644 index 00000000000..c8104a136d6 --- /dev/null +++ b/docs/tips-and-tricks/success-stories.md @@ -0,0 +1,62 @@ +--- +sidebar_position: 1 +slug: /community-wisdom/creative-use-cases +sidebar_label: 'Success Stories' +doc_type: 'how-to-guide' +keywords: [ + 'clickhouse creative use cases', + 'clickhouse success stories', + 'unconventional database uses', + 'clickhouse rate limiting', + 'analytics database applications', + 'clickhouse mobile analytics', + 'customer-facing analytics', + 'database innovation', + 'clickhouse real-time applications', + 'alternative database solutions', + 'breaking database conventions', + 'production success stories' +] +title: 'Lessons - Creative Use Cases' +description: 'Find solutions to the most common ClickHouse problems including slow queries, memory errors, connection issues, and configuration problems.' 
+--- + +# Success stories {#breaking-the-rules} + +*This guide is part of a collection of findings gained from community meetups. For more real world solutions and insights you can [browse by specific problem](./community-wisdom.md).* +*Need tips on debugging an issue in prod? Check out the [Debugging Insights](./debugging-insights.md) community guide.* + +These stories showcase how companies found success by using ClickHouse for their use cases, some even challenging traditional database categories and proving that sometimes the "wrong" tool becomes exactly the right solution. + +## ClickHouse as rate limiter {#clickhouse-rate-limiter} + +When Craigslist needed to add tier-one rate limiting to protect their users, they faced the same decision every engineering team encounters - follow conventional wisdom and use Redis, or explore something different. Brad Lhotsky, working at Craigslist, knew Redis was the standard choice - virtually every rate limiting tutorial and example online uses Redis for good reason. It has rich primitives for rate limiting operations, well-established patterns, and proven track record. But Craigslist's experience with Redis wasn't matching the textbook examples. *"Our experience with Redis is not like what you've seen in the movies... there are a lot of weird maintenance issues that we've hit where we reboot a node in a Redis cluster and some latency spike hits the front end."* For a small team that values maintenance simplicity, these operational headaches were becoming a real problem. + +So when Brad was approached with the rate limiting requirements, he took a different approach: *"I asked my boss, 'What do you think of this idea? Maybe I can try this with ClickHouse?'"* The idea was unconventional - using an analytical database for what's typically a caching layer problem - but it addressed their core requirements: fail open, impose no latency penalties, and be maintenance-safe for a small team. The solution leveraged their existing infrastructure where access logs were already flowing into ClickHouse via Kafka. Instead of maintaining a separate Redis cluster, they could analyze request patterns directly from the access log data and inject rate limiting rules into their existing ACL API. The approach meant slightly higher latency than Redis, which *"is kind of cheating by instantiating that data set upfront"* rather than doing real-time aggregate queries, but the queries still completed in under 100 milliseconds. + +**Key Results:** +- Dramatic improvement over Redis infrastructure +- Built-in TTL for automatic cleanup eliminated maintenance overhead +- SQL flexibility enabled complex rate limiting rules beyond simple counters +- Leveraged existing data pipeline instead of requiring separate infrastructure + +## ClickHouse for customer analytics {#customer-analytics} + +When ServiceNow needed to upgrade their mobile analytics platform, they faced a simple question: *"Why would we replace something that works?"* Amir Vaza from ServiceNow knew their existing system was reliable, but customer demands were outgrowing what it could handle. *"The motivation to replace an existing reliable model is actually from the product world,"* Amir explained. ServiceNow offered mobile analytics as part of their solution for web, mobile, and chatbots, but customers wanted analytical flexibility that went beyond pre-aggregated data. 
+ +Their previous system used about 30 different tables with pre-aggregated data segmented by fixed dimensions: application, app version, and platform. For custom properties—key-value pairs that customers could send—they created separate counters for each group. This approach delivered fast dashboard performance but came with a major limitation. *"While this is great for quick value breakdown, I mentioned limitation leads to a lot of loss of analytical context,"* Amir noted. Customers couldn't perform complex customer journey analysis or ask questions like "how many sessions started with the search term 'research RSA token'" and then analyze what those users did next. The pre-aggregated structure destroyed the sequential context needed for multi-step analysis, and every new analytical dimension required engineering work to pre-aggregate and store. + +So when the limitations became clear, ServiceNow moved to ClickHouse and eliminated these pre-computation constraints entirely. Instead of calculating every variable upfront, they broke metadata into data points and inserted everything directly into ClickHouse. They used ClickHouse's async insert queue, which Amir called *"actually amazing,"* to handle data ingestion efficiently. The approach meant customers could now create their own segments, slice data freely across any dimensions, and perform complex customer journey analysis that wasn't possible before. + +**Key Results:** +- Dynamic segmentation across any dimensions without pre-computation +- Complex customer journey analysis became possible +- Customers could create their own segments and slice data freely +- No more engineering bottlenecks for new analytical requirements + +## Video sources {#video-sources} + +- **[Breaking the Rules - Building a Rate Limiter with ClickHouse](https://www.youtube.com/watch?v=wRwqrbUjRe4)** - Brad Lhotsky (Craigslist) +- **[ClickHouse as an Analytical Solution in ServiceNow](https://www.youtube.com/watch?v=b4Pmpx3iRK4)** - Amir Vaza (ServiceNow) + +*These stories demonstrate how questioning conventional database wisdom can lead to breakthrough solutions that redefine what's possible with analytical databases.* \ No newline at end of file diff --git a/docs/tips-and-tricks/too-many-parts.md b/docs/tips-and-tricks/too-many-parts.md new file mode 100644 index 00000000000..e721e0733d3 --- /dev/null +++ b/docs/tips-and-tricks/too-many-parts.md @@ -0,0 +1,76 @@ +--- +sidebar_position: 1 +slug: /tips-and-tricks/too-many-parts +sidebar_label: 'Too Many Parts' +doc_type: 'how-to' +keywords: [ + 'clickhouse too many parts', + 'too many parts error', + 'clickhouse insert batching', + 'part explosion problem', + 'clickhouse merge performance', + 'batch insert optimization', + 'clickhouse async inserts', + 'small insert problems', + 'clickhouse parts management', + 'insert performance optimization', + 'clickhouse batching strategy', + 'database insert patterns' +] +title: 'Lessons - Too Many Parts Problem' +description: 'Solutions and prevention of Too Many Parts' +--- + +# The too many parts problem {#the-too-many-parts-problem} +*This guide is part of a collection of findings gained from community meetups. For more real world solutions and insights you can [browse by specific problem](./community-wisdom.md).* +*Need more performance optimization tips? 
Check out the [Performance Optimization](./performance-optimization.md) community insights guide.* + +## Understanding the problem {#understanding-the-problem} + +ClickHouse will throw a "Too many parts" error to prevent severe performance degradation. Small parts cause multiple issues: poor query performance from reading and merging more files during queries, increased memory usage since each part requires metadata in memory, reduced compression efficiency as smaller data blocks compress less effectively, higher I/O overhead from more file handles and seek operations, and slower background merges giving the merge scheduler more work. + +**Related Docs** +- [MergeTree Engine](/engines/table-engines/mergetree-family/mergetree) +- [Parts](/parts) +- [Parts System Table](/operations/system-tables/parts) + +## Recognize the problem early {#recognize-parts-problem} + +This query monitors table fragmentation by analyzing part counts and sizes across all active tables. It identifies tables with excessive or undersized parts that may need merge optimization. Use this regularly to catch fragmentation issues before they impact query performance. + +```sql runnable editable +-- Challenge: Replace with your actual database and table names for production use +-- Experiment: Adjust the part count thresholds (1000, 500, 100) based on your system +SELECT + database, + table, + count() as total_parts, + sum(rows) as total_rows, + round(avg(rows), 0) as avg_rows_per_part, + min(rows) as min_rows_per_part, + max(rows) as max_rows_per_part, + round(sum(bytes_on_disk) / 1024 / 1024, 2) as total_size_mb, + CASE + WHEN count() > 1000 THEN 'CRITICAL - Too many parts (>1000)' + WHEN count() > 500 THEN 'WARNING - Many parts (>500)' + WHEN count() > 100 THEN 'CAUTION - Getting many parts (>100)' + ELSE 'OK - Reasonable part count' + END as parts_assessment, + CASE + WHEN avg(rows) < 1000 THEN 'POOR - Very small parts' + WHEN avg(rows) < 10000 THEN 'FAIR - Small parts' + WHEN avg(rows) < 100000 THEN 'GOOD - Medium parts' + ELSE 'EXCELLENT - Large parts' + END as part_size_assessment +FROM system.parts +WHERE active = 1 + AND database NOT IN ('system', 'information_schema') +GROUP BY database, table +ORDER BY total_parts DESC +LIMIT 20; +``` + +## Video Sources {#video-sources} + +- [Fast, Concurrent, and Consistent Asynchronous INSERTS in ClickHouse](https://www.youtube.com/watch?v=AsMPEfN5QtM) - ClickHouse team member explains async inserts and the too many parts problem +- [Production ClickHouse at Scale](https://www.youtube.com/watch?v=liTgGiTuhJE) - Real-world batching strategies from observability platforms \ No newline at end of file diff --git a/docs/troubleshooting/index.md b/docs/troubleshooting/index.md new file mode 100644 index 00000000000..a85f66cdf8b --- /dev/null +++ b/docs/troubleshooting/index.md @@ -0,0 +1,151 @@ +--- +slug: /troubleshooting +sidebar_label: 'Troubleshooting' +doc_type: 'reference' +keywords: [ + 'clickhouse troubleshooting', + 'clickhouse errors', + 'database troubleshooting', + 'clickhouse connection issues', + 'memory limit exceeded', + 'clickhouse performance problems', + 'database error messages', + 'clickhouse configuration issues', + 'connection refused error', + 'clickhouse debugging', + 'database connection problems', + 'troubleshooting guide' +] +title: 'Troubleshooting Common Issues' +description: 'Find solutions to the most common ClickHouse problems including slow queries, memory errors, connection issues, and configuration problems.' 
+--- + +# Troubleshooting common issues {#troubleshooting-common-issues} + +Having problems with ClickHouse? Find the solutions to common issues here. + +## Performance and errors {#performance-and-errors} + +Queries running slowly, timeouts, or getting specific error messages like "Memory limit exceeded" or "Connection refused." + +
+Show performance and error solutions + +### Query performance {#query-performance} +- [Find which queries are using the most resources](/knowledgebase/find-expensive-queries) +- [Complete query optimization guide](/docs/optimize/query-optimization) +- [Optimize JOIN operations](/docs/best-practices/minimize-optimize-joins) +- [Run diagnostic queries to find bottlenecks](/docs/knowledgebase/useful-queries-for-troubleshooting) +
+### Data insertion performance {#data-insertion-performance} +- [Speed up data insertion](/docs/optimize/bulk-inserts) +- [Set up asynchronous inserts](/docs/optimize/asynchronous-inserts) +
+### Advanced analysis tools {#advanced-analysis-tools} + +- [Check what processes are running](/docs/knowledgebase/which-processes-are-currently-running) +- [Monitor system performance](/docs/operations/system-tables/processes) +
+### Error messages {#error-messages} +- **"Memory limit exceeded"** → [Debug memory limit errors](/docs/guides/developer/debugging-memory-issues) +- **"Connection refused"** → [Fix connection problems](#connections-and-authentication) +- **"Login failures"** → [Set up users, roles, and permissions](/docs/operations/access-rights) +- **"SSL certificate errors"** → [Fix certificate problems](/docs/knowledgebase/certificate_verify_failed_error) +- **"Table/database errors"** → [Database creation guide](/docs/sql-reference/statements/create/database) | [Table UUID problems](/docs/engines/database-engines/atomic) +- **"Network timeouts"** → [Network troubleshooting](/docs/interfaces/http) +- **Other issues** → [Track errors across your cluster](/docs/operations/system-tables/errors) +
+ +## Memory and resources {#memory-and-resources} + +High memory usage, out-of-memory crashes, or need help sizing your ClickHouse deployment. + +
+Show memory solutions + +### Memory debugging and monitoring: {#memory-debugging-and-monitoring} + +- [Identify what's using memory](/docs/guides/developer/debugging-memory-issues) +- [Check current memory usage](/docs/operations/system-tables/processes) +- [Memory allocation profiling](/docs/operations/allocation-profiling) +- [Analyze memory usage patterns](/docs/operations/system-tables/query_log) +
+### Memory configuration: {#memory-configuration} + +- [Configure memory limits](/docs/operations/settings/memory-overcommit) +- [Server memory settings](/docs/operations/server-configuration-parameters/settings) +- [Session memory settings](/docs/operations/settings/settings) +
+### Scaling and sizing: {#scaling-and-sizing} + +- [Right-size your service](/docs/operations/tips) +- [Configure automatic scaling](/docs/manage/scaling) + +
+ +## Connections and Authentication {#connections-and-authentication} + +Can't connect to ClickHouse, authentication failures, SSL certificate errors, or client setup issues. + +
+Show connection solutions + +### Basic Connection issues {#basic-connection-issues} +- [Fix HTTP interface issues](/docs/interfaces/http) +- [Handle SSL certificate problems](/docs/knowledgebase/certificate_verify_failed_error) +- [User authentication setup](/docs/operations/access-rights) +
+### Client interfaces {#client-interfaces} +- [Native ClickHouse clients](/docs/interfaces/natives-clients-and-interfaces) +- [MySQL interface problems](/docs/interfaces/mysql) +- [PostgreSQL interface issues](/docs/interfaces/postgresql) +- [gRPC interface configuration](/docs/interfaces/grpc) +- [SSH interface setup](/docs/interfaces/ssh) +
+### Network and data {#network-and-data} +- [Network security settings](/docs/operations/server-configuration-parameters/settings) +- [Data format parsing issues](/docs/interfaces/formats) + +
+ +## Setup and configuration {#setup-and-configuration} + +Initial installation, server configuration, database creation, data ingestion issues, or replication setup. + +
+Show setup and configuration solutions + +### Initial setup {#initial-setup} +- [Configure server settings](/docs/operations/server-configuration-parameters/settings) +- [Set up security and access control](/docs/operations/access-rights) +- [Configure hardware properly](/docs/operations/tips) +
+### Database management {#database-management} +- [Create and manage databases](/docs/sql-reference/statements/create/database) +- [Choose the right table engine](/docs/engines/table-engines) + +
+### Data operations {#data-operations} +- [Optimize bulk data insertion](/docs/optimize/bulk-inserts) +- [Handle data format problems](/docs/interfaces/formats) +- [Set up streaming data pipelines](/docs/optimize/asynchronous-inserts) +- [Improve S3 integration performance](/docs/integrations/s3/performance) +
+### Advanced configuration {#advanced-configuration} +- [Set up data replication](/docs/engines/table-engines/mergetree-family/replication) +- [Configure distributed tables](/docs/engines/table-engines/special/distributed) + +- [Set up backup and recovery](/docs/operations/backup) +- [Configure monitoring](/docs/operations/system-tables/overview) + +
+ +## Still need help? {#still-need-help} + +If you can't find a solution: + +1. **Ask AI** - Ask AI for instant answers. +1. **Check system tables** - [Overview](/operations/system-tables/overview) +2. **Review server logs** - Look for error messages in your ClickHouse logs +3. **Ask the community** - [Join Our Community Slack](https://clickhouse.com/slack), [GitHub Discussions](https://github.com/ClickHouse/ClickHouse/discussions) +4. **Get professional support** - [ClickHouse Cloud support](https://clickhouse.com/support) \ No newline at end of file diff --git a/knowledgebase/find-expensive-queries.mdx b/knowledgebase/find-expensive-queries.mdx index db30bd4a81d..5e1e59e6bdf 100644 --- a/knowledgebase/find-expensive-queries.mdx +++ b/knowledgebase/find-expensive-queries.mdx @@ -2,6 +2,7 @@ title: How to Identify the Most Expensive Queries in ClickHouse description: Learn how to use the `query_log` table in ClickHouse to identify the most memory and CPU-intensive queries across distributed nodes. date: 2023-03-26 +slug: find-expensive-queries tags: ['Performance and Optimizations'] keywords: ['Expensive Queries'] --- diff --git a/knowledgebase/finding_expensive_queries_by_memory_usage.mdx b/knowledgebase/finding_expensive_queries_by_memory_usage.mdx index 100a8317d96..6f55d1c57b1 100644 --- a/knowledgebase/finding_expensive_queries_by_memory_usage.mdx +++ b/knowledgebase/finding_expensive_queries_by_memory_usage.mdx @@ -2,6 +2,7 @@ title: Identifying Expensive Queries by Memory Usage in ClickHouse description: Learn how to use the `system.query_log` table to find the most memory-intensive queries in ClickHouse, with examples for clustered and standalone setups. date: 2023-06-07 +slug: find-expensive-queries-by-memory-usage tags: ['Performance and Optimizations'] keywords: ['Expensive Queries', 'Memory Usage'] --- diff --git a/package.json b/package.json index 5e5da2123d0..c1a951d1077 100644 --- a/package.json +++ b/package.json @@ -44,6 +44,7 @@ "@docusaurus/theme-mermaid": "3.7.0", "@docusaurus/theme-search-algolia": "^3.7.0", "@mdx-js/react": "^3.1.0", + "@monaco-editor/react": "^4.7.0", "@radix-ui/react-navigation-menu": "^1.2.13", "@redocly/cli": "^1.34.0", "axios": "^1.11.0", diff --git a/scripts/aspell-ignore/en/aspell-dict.txt b/scripts/aspell-ignore/en/aspell-dict.txt index c45f411cd4f..6d76f96dbf4 100644 --- a/scripts/aspell-ignore/en/aspell-dict.txt +++ b/scripts/aspell-ignore/en/aspell-dict.txt @@ -1,4 +1,4 @@ -personal_ws-1.1 en 3611 +personal_ws-1.1 en 3638 AArch ACLs AICPA @@ -32,6 +32,7 @@ Airbyte Akka AlertManager Alexey +Amir Anthropic AnyEvent AnythingLLM @@ -192,9 +193,13 @@ ClickBench ClickCat ClickHouse ClickHouse's +ClickHouseAccess ClickHouseClient +ClickHouseIO ClickHouseMigrator ClickHouseNIO +ClickHouseSettings +ClickHouseType ClickHouseVapor ClickPipe ClickPipes @@ -215,6 +220,7 @@ CodeLLDB Codecs CollapsingMergeTree Combinators +CommonRoom Compat CompiledExpressionCacheBytes CompiledExpressionCacheCount @@ -227,12 +233,17 @@ ConcurrencyControlSoftLimit Config ConnectionDetails Const +ContentSquare +ContentSquare's +Contentsquare ContextLockWait Contrib CopilotKit Copilotkit CountMin Covid +Craigslist +Craigslist's Cramer's Criteo Crotty @@ -321,6 +332,7 @@ DiskSpaceReservedForMerge DiskTotal DiskUnreserved DiskUsed +Displayce DistributedCacheLogMode DistributedCachePoolBehaviourOnLimit DistributedDDLOutputMode @@ -328,6 +340,7 @@ DistributedFilesToInsert DistributedProductMode DistributedSend DockerHub +Doron DoubleDelta Doxygen Draxlr @@ -437,6 +450,7 @@ 
GraphQL GraphiteMergeTree Greenwald Gunicorn +Guram HANA HDDs HHMM @@ -447,6 +461,7 @@ HSTS HTAP HTTPConnection HTTPThreads +Hashboard's HashedDictionary HashedDictionaryThreads HashedDictionaryThreadsActive @@ -609,6 +624,7 @@ Kerberos Khanna Kibana Kinesis +Kirill KittenHouse Klickhouse Kolmogorov @@ -634,6 +650,7 @@ LangGraph Langchain Lemire Levenshtein +Lhotsky Liao LibFuzzer LibreChat @@ -707,6 +724,7 @@ MaxPartCountForPartition MaxPushedDDLEntryID MaxThreads Mbps +McClickHouse McNeal Memcheck MemoryCode @@ -744,6 +762,7 @@ MetroHash MiB Milli Milovidov +Milovidov's MinHash MinIO MinMax @@ -1128,6 +1147,7 @@ SaaS Sackmann's Sanjeev Sankey +Sapchuk Scalable Scatterplot Schaefer @@ -1143,6 +1163,8 @@ SendExternalTables SendScalars SerDe Serverless +ServiceNow +ServiceNow's SetOperationMode SeverityText ShareAlike @@ -1153,6 +1175,7 @@ SharedMergeTree ShortCircuitFunctionEvaluation Shortkeys Signup +Sigua SimHash Simhash SimpleAggregateFunction @@ -1292,6 +1315,7 @@ TotalPrimaryKeyBytesInMemory TotalPrimaryKeyBytesInMemoryAllocated TotalRowsOfMergeTreeTables TotalTemporaryFiles +Totalprices TotalsMode Tradeoff Transactional @@ -1350,6 +1374,7 @@ VPCs VPNs Vadim Valgrind +Vaza Vectorization Vectorized Vercel @@ -1711,6 +1736,7 @@ changelogs charset charsets chartdb +chatbots chconn chdb cheatsheet @@ -2362,6 +2388,7 @@ kernal keyspace keytab kittenhouse +knowledgebase kolmogorovSmirnovTest kolmogorovsmirnovtest kolya @@ -2374,6 +2401,7 @@ kurtSamp kurtosis kurtpop kurtsamp +kusto lagInFrame laion lakehouse diff --git a/sidebars.js b/sidebars.js index 4749b3733ac..d49118a2ff8 100644 --- a/sidebars.js +++ b/sidebars.js @@ -86,6 +86,14 @@ const sidebars = { "guides/developer/mutations", ], }, + { + type: "category", + label: "Troubleshooting", + collapsed: false, + collapsible: false, + link: { type: "doc", id: "troubleshooting/index" }, + items: [] + }, { type: "category", label: "Best Practices", @@ -241,6 +249,20 @@ const sidebars = { }, ], }, + { + type: "category", + label: "Tips and Community Wisdom", + className: "top-nav-item", + collapsed: true, + collapsible: true, + link: { type: "doc", id: "tips-and-tricks/community-wisdom" }, + items: [ + { + type: "autogenerated", + dirName: "tips-and-tricks", + } + ] + }, { type: "category", label: "Example Datasets", @@ -1781,6 +1803,12 @@ const sidebars = { description: "Start here when learning ClickHouse", href: "/starter-guides" }, + { + type: "link", + label: "Troubleshooting", + description: "Troubleshooting ClickHouse", + href: "/troubleshooting" + }, { type: "link", label: "Best Practices", @@ -1799,6 +1827,12 @@ const sidebars = { description: "Common use case guides for ClickHouse", href: "/use-cases" }, + { + type: "link", + label: "Tips and Community Wisdom", + description: "Community Lessons", + href: "/tips-and-tricks/community-wisdom" + }, { type: "link", label: "Example datasets", diff --git a/src/components/CodeViewer/index.tsx b/src/components/CodeViewer/index.tsx index 6f7292bace9..14cef45abf7 100644 --- a/src/components/CodeViewer/index.tsx +++ b/src/components/CodeViewer/index.tsx @@ -1,11 +1,12 @@ -import { CodeBlock, ClickUIProvider, Text } from '@clickhouse/click-ui/bundled' +import { CodeBlock, ClickUIProvider, Text, Button } from '@clickhouse/click-ui/bundled' import CodeInterpreter from './CodeInterpreter' import { DefaultView } from './CodeResults' import { ChartConfig, ChartType } from './types' import { base64Decode } from './utils' import { useColorMode } from '@docusaurus/theme-common' -import { isValidElement 
} from 'react' -import DocusaurusCodeBlock from '@theme-original/CodeBlock'; +import { isValidElement, useState } from 'react' +import DocusaurusCodeBlock from '@theme-original/CodeBlock' +import Editor from '@monaco-editor/react' function getCodeContent(children: any): string { if (typeof children === 'string') return children @@ -42,6 +43,7 @@ function CodeViewer({ language = 'sql', show_line_numbers = false, runnable = 'false', + editable = 'false', run = 'false', link, view = 'table', @@ -54,9 +56,12 @@ function CodeViewer({ children, ...props }: any) { + const [code, setCode] = useState(typeof children === 'string' ? children : getCodeContent(children)) + const showLineNumbers = show_line_numbers === 'true' const runBoolean = run === 'true' const runnableBoolean = runnable === 'true' + const editableBoolean = editable === 'true' const showStatistics = show_statistics === 'true' let chart: { type: ChartType; config?: ChartConfig } | undefined @@ -71,57 +76,105 @@ function CodeViewer({ } catch { console.log('chart config is not valid') } - const { colorMode } = useColorMode(); // returns 'light' or 'dark' + + const { colorMode } = useColorMode() const extraStyle = parseInlineStyle(style) - const combinedStyle:React.CSSProperties = { + const combinedStyle: React.CSSProperties = { wordBreak: 'break-word', ...extraStyle } + + const handleKeyDown = (e: React.KeyboardEvent) => { + // Allow tab in textarea + if (e.key === 'Tab') { + e.preventDefault() + const target = e.target as HTMLTextAreaElement + const start = target.selectionStart + const end = target.selectionEnd + const newValue = code.substring(0, start) + ' ' + code.substring(end) + setCode(newValue) + setTimeout(() => { + target.selectionStart = target.selectionEnd = start + 2 + }, 0) + } + } + const header = title ? ( - <> - {title} - - ): null + {title} + ) : null - const code_block = click_ui === 'true' ? ( - - {typeof children === 'string' ? children : getCodeContent(children)} - - ): ( - + // Always show as editable Monaco editor when editable=true + const code_block = editableBoolean ? ( +
+ setCode(value || '')} + language={language} + theme={colorMode === 'dark' ? 'vs-dark' : 'vs-light'} + height={`${Math.max(200, (code.split('\n').length + 2) * 19)}px`} + options={{ + minimap: { enabled: false }, + scrollBeyondLastLine: false, + fontSize: 14, + lineNumbers: showLineNumbers ? 'on' : 'off', + wordWrap: 'on', + automaticLayout: true, + tabSize: 2, + insertSpaces: true, + folding: false, + glyphMargin: false, + lineDecorationsWidth: 0, + lineNumbersMinChars: 3, + renderLineHighlight: 'line', + selectOnLineNumbers: true, + roundedSelection: false, + scrollbar: { + verticalScrollbarSize: 8, + horizontalScrollbarSize: 8 + } + }} + /> +
+ ) : ( + click_ui === 'true' ? ( + + {code} + + ) : ( + + ) ) - const results = runnable ? ( + + const results = runnableBoolean ? ( - ): null + ) : null return ( -
- - { header } - { code_block } - { results } - -
- - +
+ + {header} + {code_block} + {results} + +
) } diff --git a/src/components/KapaAI/KapaLink.tsx b/src/components/KapaAI/KapaLink.tsx new file mode 100644 index 00000000000..041d7283d63 --- /dev/null +++ b/src/components/KapaAI/KapaLink.tsx @@ -0,0 +1,22 @@ +import React from 'react'; + +declare global { + interface Window { + Kapa?: (action: string, params?: any) => void; + } +} + +export default function KapaLink({ children, query }) { + const handleClick = (e) => { + e.preventDefault(); + if (window.Kapa) { + window.Kapa('open', query ? { query, submit: true } : {}); + } + }; + + return ( + + {children} + + ); +} diff --git a/src/css/custom.scss b/src/css/custom.scss index 9161e8722e0..f887de4cc91 100644 --- a/src/css/custom.scss +++ b/src/css/custom.scss @@ -1046,7 +1046,7 @@ prism-code { details { h1, h2, h3 { - color: var(--click-color-text-inverse); + color: var(--ifm-font-color-base); } } diff --git a/src/theme/CodeBlock/index.js b/src/theme/CodeBlock/index.js index 1d523f36c26..d4b7408d5b3 100644 --- a/src/theme/CodeBlock/index.js +++ b/src/theme/CodeBlock/index.js @@ -18,7 +18,7 @@ function countLines(text = '') { function parseMetaString(meta = '') { const result = {} - const implicit_settings = ['runnable', 'run', 'show_statistics', 'click_ui'] + const implicit_settings = ['runnable', 'run', 'show_statistics', 'click_ui', 'editable'] meta.split(' ').forEach((part) => { if (!part) return diff --git a/src/theme/MDXComponents.js b/src/theme/MDXComponents.js index 07b73d1081c..395d61909f3 100644 --- a/src/theme/MDXComponents.js +++ b/src/theme/MDXComponents.js @@ -6,10 +6,12 @@ import MDXComponents from '@theme-original/MDXComponents'; // Make sure the path matches your project structure import VStepper from '@site/src/components/Stepper/Stepper'; import GlossaryTooltip from '@site/src/components/GlossaryTooltip/GlossaryTooltip'; +import KapaLink from '@site/src/components/KapaAI/KapaLink'; // Define the enhanced components const enhancedComponents = { ...MDXComponents, + KapaLink, GlossaryTooltip, ul: (props) =>
    , ol: (props) =>
      , diff --git a/yarn.lock b/yarn.lock index 25ca98513a1..82ee27e02a2 100644 --- a/yarn.lock +++ b/yarn.lock @@ -2728,6 +2728,20 @@ "@module-federation/runtime" "0.8.4" "@module-federation/sdk" "0.8.4" +"@monaco-editor/loader@^1.5.0": + version "1.5.0" + resolved "https://registry.yarnpkg.com/@monaco-editor/loader/-/loader-1.5.0.tgz#dcdbc7fe7e905690fb449bed1c251769f325c55d" + integrity sha512-hKoGSM+7aAc7eRTRjpqAZucPmoNOC4UUbknb/VNoTkEIkCPhqV8LfbsgM1webRM7S/z21eHEx9Fkwx8Z/C/+Xw== + dependencies: + state-local "^1.0.6" + +"@monaco-editor/react@^4.7.0": + version "4.7.0" + resolved "https://registry.yarnpkg.com/@monaco-editor/react/-/react-4.7.0.tgz#35a1ec01bfe729f38bfc025df7b7bac145602a60" + integrity sha512-cyzXQCtO47ydzxpQtCGSQGOC8Gk3ZUeBXFAxD+CWXYFo5OqZyZUonFl0DwUlTyAfRHntBfw2p3w4s9R6oe1eCA== + dependencies: + "@monaco-editor/loader" "^1.5.0" + "@napi-rs/wasm-runtime@^0.2.9": version "0.2.9" resolved "https://registry.yarnpkg.com/@napi-rs/wasm-runtime/-/wasm-runtime-0.2.9.tgz#7278122cf94f3b36d8170a8eee7d85356dfa6a96" @@ -13113,6 +13127,11 @@ srcset@^4.0.0: resolved "https://registry.yarnpkg.com/srcset/-/srcset-4.0.0.tgz#336816b665b14cd013ba545b6fe62357f86e65f4" integrity sha512-wvLeHgcVHKO8Sc/H/5lkGreJQVeYMm9rlmt8PuR1xE31rIuXhuzznUUqAt8MqLhB3MqJdFzlNAfpcWnxiFUcPw== +state-local@^1.0.6: + version "1.0.7" + resolved "https://registry.yarnpkg.com/state-local/-/state-local-1.0.7.tgz#da50211d07f05748d53009bee46307a37db386d5" + integrity sha512-HTEHMNieakEnoe33shBYcZ7NX83ACUjCu8c40iOGEZsngj9zRnkqS9j1pqQPXwobB0ZcVTk27REb7COQ0UR59w== + statuses@2.0.1: version "2.0.1" resolved "https://registry.yarnpkg.com/statuses/-/statuses-2.0.1.tgz#55cb000ccf1d48728bd23c685a063998cf1a1b63"