Optimized version of task-1 #155
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
require 'benchmark/ips' | ||
#require 'kalibera' | ||
require 'benchmark' | ||
require_relative '../task-1' | ||
|
||
=begin | ||
puts 'Starting benchmark...' | ||
|
||
Benchmark.ips do |x| | ||
x.config(confidence: 95) | ||
|
||
x.report('Building report') do | ||
work(file_name: 'data_25000_thousands_lines.txt') | ||
end | ||
end | ||
=end | ||
|
||
puts 'Starting benchmark...' | ||
|
||
time = Benchmark.realtime do | ||
work(file_name: 'data_25000_thousands_lines.txt') | ||
end | ||
|
||
puts "Finish in #{time.round(2)}" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
# frozen_string_literal: true | ||
|
||
require 'benchmark/ips' | ||
require 'benchmark' | ||
require_relative '../task-1' | ||
|
||
=begin | ||
puts 'Starting benchmark...' | ||
|
||
Benchmark.ips do |x| | ||
x.config(confidence: 95) | ||
|
||
x.report('Building report') do | ||
work(file_name: 'data_150_thousands_lines.txt') | ||
end | ||
end | ||
=end | ||
|
||
|
||
puts 'Starting benchmark...' | ||
|
||
time = Benchmark.realtime do | ||
work(file_name: 'data_150_thousands_lines.txt') | ||
end | ||
|
||
puts "Finish in #{time.round(2)}" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
# frozen_string_literal: true | ||
|
||
require 'benchmark' | ||
require_relative '../task-1' | ||
|
||
puts 'Starting benchmark...' | ||
|
||
time = Benchmark.realtime do | ||
work(file_name: 'data_large.txt') | ||
end | ||
|
||
puts "Finish in #{time.round(2)}" | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
# frozen_string_literal: true | ||
|
||
require 'benchmark' | ||
require_relative '../task-1' | ||
|
||
puts 'Starting benchmark...' | ||
|
||
time = Benchmark.realtime do | ||
work(file_name: 'data_500_thousands_lines.txt') | ||
end | ||
|
||
puts "Finish in #{time.round(2)}" | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
## Case 2 | ||
### Metric on 150 thousand lines | ||
* 1.000 in 12.357131s (9 - 12 seconds) | ||
|
||
### Metric budget | ||
~2 seconds to process data_150_thousands_lines.txt | ||
|
||
### Writing a spec (to record the current metric) | ||
`rspec spec/work_performance_spec.rb:19` | ||
|
||
### Profiled with stackprof and speedscope | ||
5.78 seconds in Array#each | ||
|
||
`sessions + [parse_session(line)]` | ||
**Solution:** | ||
|
||
Append to the existing array instead of creating a new array on every iteration (`+` copies all accumulated elements each time). | ||
|
||
**Result after optimization** | ||
The metric dropped from 9 seconds to 2 seconds (~4.5x) | ||
|
||
### Updated the spec | ||
`rspec spec/work_performance_spec.rb:19` | ||
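The change described in Case 2 can be sketched as follows. This is a minimal illustration, not the code from the PR: `parse_session` here is a hypothetical stand-in whose field layout is assumed from the sample data in the spec, and the two input lines are only examples.

```ruby
# Hypothetical stand-in for the parser in task-1; the real one may differ.
def parse_session(line)
  fields = line.split(',')
  {
    'user_id' => fields[1],
    'session_id' => fields[2],
    'browser' => fields[3],
    'time' => fields[4],
    'date' => fields[5]
  }
end

lines = [
  'session,0,0,Safari 29,87,2016-10-23',
  'session,0,1,Firefox 12,118,2017-02-27'
]

# Before: `+` allocates a new array and copies every accumulated element,
# so the loop as a whole does quadratic work.
slow_sessions = []
lines.each { |line| slow_sessions = slow_sessions + [parse_session(line)] }

# After: `<<` appends to the same array in place.
fast_sessions = []
lines.each { |line| fast_sessions << parse_session(line) }
```

Both loops build the same data; only the number of allocations and copies differs, which is what the profiler attributed to `Array#each`.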
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
## Case 1 | ||
|
||
### Metric on 25 thousand lines | ||
* 1.000 in 5.441479s (5.4 - 5.8 seconds before optimization, with GC enabled) | ||
`benchmark/benchmark_cpu_work.rb` | ||
### Metric budget | ||
* On 25 thousand lines, I want the program's real running time to drop to 2 seconds | ||
|
||
### Writing a spec (to record the current metric) | ||
`rspec spec/work_performance_spec.rb:13` | ||
|
||
### Profiled with stackprof and speedscope | ||
* The problem is the use of select | ||
|
||
`user_sessions = sessions.select { |session| session['user_id'] == user['id'] }` | ||
|
||
Every call runs a select over the whole sessions array, which is O(n) where n is the number of sessions. This | ||
happens inside the loop over user lines, so the total cost is O(n * m), where m is the number of users. | ||
|
||
**Solution:** | ||
Group the sessions by user once, outside the loop. | ||
Then, inside the loop, fetch the sessions for a given user from the hash (O(1)). | ||
|
||
**Result after optimization** | ||
On 25 thousand lines the metric dropped from 5.8 seconds to 632 ms (~9.1x) | ||
Review comment: yes, and most importantly, the complexity is now linear. |
||
|
||
### Updated the performance test | ||
`rspec spec/work_performance_spec.rb:13` |
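A minimal sketch of the grouping idea from Case 1, assuming users and sessions are plain hashes keyed by `'id'` and `'user_id'` as in the `select` line above. The sample records and the `browsers_per_user` name are illustrative, not taken from the PR.

```ruby
users = [
  { 'id' => '0', 'first_name' => 'Leida' },
  { 'id' => '1', 'first_name' => 'Palmer' }
]
sessions = [
  { 'user_id' => '0', 'browser' => 'Safari 29' },
  { 'user_id' => '1', 'browser' => 'Chrome 6' },
  { 'user_id' => '0', 'browser' => 'Firefox 12' }
]

# One pass over sessions builds a hash: user_id => array of that user's sessions.
sessions_by_user = sessions.group_by { |session| session['user_id'] }

browsers_per_user = users.each_with_object({}) do |user, result|
  # Hash lookup instead of scanning all sessions; [] covers users with no sessions.
  user_sessions = sessions_by_user.fetch(user['id'], [])
  result[user['first_name']] = user_sessions.map { |s| s['browser'] }
end
```

With the grouping done once, the work is one pass over sessions plus one pass over users, rather than a full scan of sessions for every user.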
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,78 @@ | ||
## Case 4 | ||
|
||
### Metric on 3_250_940 lines | ||
Finish in 35.71 | ||
|
||
### Budget | ||
no more than 30 seconds to process data_large.txt | ||
|
||
### Writing a spec (to record the current metric) | ||
`rspec spec/work_performance_spec.rb:31` | ||
|
||
### Profiled with stackprof and speedscope | ||
collect_stats_from_users | ||
(self: 9.7%, 3.41 sec / total: 41%, 14.56 sec) | ||
|
||
Slightly edited the following code: | ||
|
||
`report['usersStats'][user_key] = report['usersStats'][user_key].merge(block.call(user))` | ||
|
||
to | ||
|
||
`report['usersStats'][user_key].merge!(block.call(user))` - modifies the existing hash in place, without creating a new one | ||
|
||
-- Everything below avoids the extra intermediate array and does the transformation in a single pass. | ||
|
||
`user.sessions.map {|s| s['time']}.map {|t| t.to_i}.sum.to_s + ' min.'` | ||
|
||
to | ||
|
||
`user.sessions.map {|s| s['time']}.sum(&:to_i).to_s + ' min.'` | ||
|
||
-- | ||
|
||
`user.sessions.map {|s| s['time']}.map {|t| t.to_i}.max.to_s + ' min.'` | ||
|
||
to | ||
|
||
`user.sessions.map {|s| s['time'].to_i}.max.to_s + ' min.'` | ||
|
||
-- | ||
|
||
`user.sessions.map {|s| s['browser']}.map {|b| b.upcase}.sort.join(', ')` | ||
|
||
to | ||
|
||
`user.sessions.map { |s| s['browser'].upcase }.sort.join(', ')` | ||
|
||
-- | ||
|
||
`user.sessions.map{|s| s['browser']}.any? { |b| b.upcase =~ /INTERNET EXPLORER/ }` | ||
|
||
to | ||
|
||
`user.sessions.any? { |s| s['browser'].upcase =~ /INTERNET EXPLORER/ }` | ||
|
||
-- | ||
|
||
`user.sessions.map{|s| s['browser']}.all? { |b| b.upcase =~ /CHROME/ }` | ||
|
||
to | ||
|
||
`user.sessions.all? { |s| s['browser'].upcase =~ /CHROME/ }` | ||
|
||
-- | ||
|
||
`user.sessions.map{|s| s['date']}.map {|d| Date.parse(d)}.sort.reverse.map { |d| d.iso8601 }` | ||
|
||
to | ||
|
||
`user.sessions.map { |s| s['date'] }.sort.reverse` -- Date.parse can be skipped (it was the next hotspot in the profiler report) | ||
Review comment: yes, nothing needs to be done with the date. |
||
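Why `Date.parse` can be dropped: the input dates are already zero-padded `YYYY-MM-DD` strings, and such strings sort lexicographically in the same order as the dates they represent. A small check of that claim, using the dates of the first user from the test data in this PR (the assumption is that every date in the input is well-formed in this way):

```ruby
require 'date'

dates = %w[2016-10-23 2017-02-27 2017-03-28 2016-09-15 2017-09-27 2016-09-01]

# Original approach: parse every string into a Date, sort, then format back.
via_date = dates.map { |d| Date.parse(d) }.sort.reverse.map(&:iso8601)

# Optimized approach: sort the ISO 8601 strings directly.
via_string = dates.sort.reverse
```

If the input ever contained dates in another format (for example `1/2/2017`), the string sort would no longer be equivalent and parsing would be needed again.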
|
||
**Result after optimization** | ||
Review comment: it is better not to make many changes in one iteration, because it immediately becomes unclear which change had what effect. |
||
|
||
Not a big gain, but it fits the budget: Finish in 24.53 (~1.5x) | ||
|
||
### Updated the performance test | ||
`rspec spec/work_performance_spec.rb:31` | ||
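The Case 4 edits share two ideas: update the stats hash in place with `merge!`, and do the per-session transformation inside a single block instead of chaining two `map` calls. A sketch under assumed data follows: the three sessions and the `stats` hash are illustrative, and the real code goes through `report['usersStats'][user_key]` and `user.sessions`.

```ruby
sessions = [
  { 'browser' => 'Safari 29', 'time' => '87' },
  { 'browser' => 'Firefox 12', 'time' => '118' },
  { 'browser' => 'Internet Explorer 28', 'time' => '31' }
]

stats = { 'sessionsCount' => sessions.size }
original_object_id = stats.object_id

# Each value is computed in one pass over sessions, with no intermediate
# array of raw 'time' or 'browser' strings.
total_time = sessions.sum { |s| s['time'].to_i }
longest = sessions.map { |s| s['time'].to_i }.max
browsers = sessions.map { |s| s['browser'].upcase }.sort.join(', ')

# merge! mutates the receiver; merge would allocate and return a new hash.
stats.merge!(
  'totalTime' => "#{total_time} min.",
  'longestSession' => "#{longest} min.",
  'browsers' => browsers,
  'usedIE' => sessions.any? { |s| s['browser'].upcase =~ /INTERNET EXPLORER/ },
  'alwaysUsedChrome' => sessions.all? { |s| s['browser'].upcase =~ /CHROME/ }
)
```

Following the reviewer's advice, each of these changes would ideally be applied and measured on its own, so that the profiler shows which one actually moved the metric.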
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
## Case 3 | ||
Review comment: the separate files are a bit hard to read because they are displayed out of order; one option is to name them something like case_1_..., case_2_... |
||
### Metric on 500 thousand lines | ||
* Finish in 12.99 | ||
|
||
### Metric budget | ||
~6 seconds to process data_500_thousands_lines.txt | ||
|
||
### Writing a spec (to record the current metric) | ||
`rspec spec/work_performance_spec.rb:25` | ||
|
||
### Profiled with stackprof and speedscope | ||
Review comment: 👍 stackprof + speedscope = one love |
||
6.84 seconds in Array#each | ||
|
||
`user_object = User.new(attributes: attributes, sessions: user_sessions)` | ||
|
||
`users_objects = users_objects + [user_object]` | ||
|
||
**Solution:** | ||
|
||
Remove the redundant reassignment and append to the existing array (users_objects) instead. | ||
|
||
**Result after optimization** | ||
The metric dropped from 12.99 seconds to 6.9 seconds (~2x) | ||
|
||
### Updated the spec | ||
`rspec spec/work_performance_spec.rb:25` | ||
|
||
---- | ||
|
||
### New iteration (same budget, ~6 seconds) | ||
5.18 seconds in Array#each | ||
``` | ||
sessions.each do |session| | ||
browser = session['browser'] | ||
uniqueBrowsers += [browser] if uniqueBrowsers.all? { |b| b != browser } | ||
end | ||
``` | ||
|
||
**Solution** | ||
Use a Set to collect the unique browsers | ||
Review comment: 👍 |
||
|
||
**Result after optimization** | ||
The metric dropped from 6.9 to 4.92 seconds (~1.4x) | ||
|
||
### Updated the spec | ||
`rspec spec/work_performance_spec.rb:25` |
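The second iteration of Case 3 replaces a linear `all?` scan per session with a `Set`. A minimal sketch, assuming sessions are hashes with a `'browser'` key as in the snippet above; the sample sessions and the snake_case variable names are illustrative:

```ruby
require 'set'

sessions = [
  { 'browser' => 'Safari 29' },
  { 'browser' => 'Chrome 6' },
  { 'browser' => 'Safari 29' },
  { 'browser' => 'Firefox 12' },
  { 'browser' => 'Chrome 6' }
]

# Before: every session scans the whole accumulated array and `+=` copies it.
unique_browsers_slow = []
sessions.each do |session|
  browser = session['browser']
  unique_browsers_slow += [browser] if unique_browsers_slow.all? { |b| b != browser }
end

# After: Set#<< ignores duplicates using a hash lookup per element.
unique_browsers = Set.new
sessions.each { |session| unique_browsers << session['browser'] }
```

`Set` keeps insertion order, so `unique_browsers.to_a` matches the array built by the original loop.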
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
# frozen_string_literal: true | ||
|
||
require 'json' | ||
require 'stackprof' | ||
require_relative '../task-1' | ||
|
||
=begin | ||
StackProf.run(mode: :wall, out: 'profiling/stackprof_work.dump', interval: 1000, disable_gc: true) do | ||
work(file_name: 'data_25000_thousands_lines.txt') | ||
end | ||
=end | ||
|
||
### stackprof speedscope | ||
|
||
profiling = StackProf.run(mode: :wall, raw: true, disable_gc: true) do | ||
work(file_name: 'data_large.txt') | ||
end | ||
|
||
File.write('profiling/stackprof_speedscope_3_million_250_thousand_v2.json', JSON.generate(profiling)) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
# frozen_string_literal: true | ||
|
||
require 'rspec' | ||
require 'rspec-benchmark' | ||
require_relative '../task-1' | ||
|
||
RSpec.configure do |config| | ||
config.include RSpec::Benchmark::Matchers | ||
end | ||
|
||
RSpec.describe 'WorkPerformance' do | ||
describe '.work' do | ||
context 'when file contains 25 thousand lines' do | ||
it 'performs under 1 second' do | ||
expect { work(file_name: 'data_25000_thousands_lines.txt') }.to perform_under(1).sec | ||
end | ||
end | ||
|
||
context 'when file contains 150 thousand lines' do | ||
it 'performs under 3 seconds' do | ||
expect { work(file_name: 'data_150_thousands_lines.txt') }.to perform_under(3).sec | ||
end | ||
end | ||
|
||
context 'when file contains 500 thousand lines' do | ||
it 'performs under 5 seconds' do | ||
expect { work(file_name: 'data_500_thousands_lines.txt') }.to perform_under(5).sec | ||
end | ||
end | ||
|
||
context 'when file contains 3_250_940 lines' do | ||
it 'performs under 30 seconds' do | ||
expect { work(file_name: 'data_large.txt') }.to perform_under(30).sec | ||
end | ||
end | ||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
# frozen_string_literal: true | ||
|
||
require 'rspec' | ||
require_relative '../task-1' | ||
|
||
RSpec.describe do | ||
describe '.work' do | ||
let(:work_result_file) { File.write('result.json', '') } | ||
let(:test_data_file) { File.join('spec/tmp', 'test_data.txt') } | ||
let(:expected_work_result) do | ||
'{"totalUsers":3,"uniqueBrowsersCount":14,"totalSessions":15,"allBrowsers":"CHROME 13,CHROME 20,CHROME 35,CHROME 6,FIREFOX 12,FIREFOX 32,FIREFOX 47,INTERNET EXPLORER 10,INTERNET EXPLORER 28,INTERNET EXPLORER 35,SAFARI 17,SAFARI 29,SAFARI 39,SAFARI 49","usersStats":{"Leida Cira":{"sessionsCount":6,"totalTime":"455 min.","longestSession":"118 min.","browsers":"FIREFOX 12, INTERNET EXPLORER 28, INTERNET EXPLORER 28, INTERNET EXPLORER 35, SAFARI 29, SAFARI 39","usedIE":true,"alwaysUsedChrome":false,"dates":["2017-09-27","2017-03-28","2017-02-27","2016-10-23","2016-09-15","2016-09-01"]},"Palmer Katrina":{"sessionsCount":5,"totalTime":"218 min.","longestSession":"116 min.","browsers":"CHROME 13, CHROME 6, FIREFOX 32, INTERNET EXPLORER 10, SAFARI 17","usedIE":true,"alwaysUsedChrome":false,"dates":["2017-04-29","2016-12-28","2016-12-20","2016-11-11","2016-10-21"]},"Gregory Santos":{"sessionsCount":4,"totalTime":"192 min.","longestSession":"85 min.","browsers":"CHROME 20, CHROME 35, FIREFOX 47, SAFARI 49","usedIE":false,"alwaysUsedChrome":false,"dates":["2018-09-21","2018-02-02","2017-05-22","2016-11-25"]}}}' + "\n" | ||
end | ||
|
||
before do | ||
File.write(test_data_file, <<~DATA) | ||
user,0,Leida,Cira,0 | ||
session,0,0,Safari 29,87,2016-10-23 | ||
session,0,1,Firefox 12,118,2017-02-27 | ||
session,0,2,Internet Explorer 28,31,2017-03-28 | ||
session,0,3,Internet Explorer 28,109,2016-09-15 | ||
session,0,4,Safari 39,104,2017-09-27 | ||
session,0,5,Internet Explorer 35,6,2016-09-01 | ||
user,1,Palmer,Katrina,65 | ||
session,1,0,Safari 17,12,2016-10-21 | ||
session,1,1,Firefox 32,3,2016-12-20 | ||
session,1,2,Chrome 6,59,2016-11-11 | ||
session,1,3,Internet Explorer 10,28,2017-04-29 | ||
session,1,4,Chrome 13,116,2016-12-28 | ||
user,2,Gregory,Santos,86 | ||
session,2,0,Chrome 35,6,2018-09-21 | ||
session,2,1,Safari 49,85,2017-05-22 | ||
session,2,2,Firefox 47,17,2018-02-02 | ||
session,2,3,Chrome 20,84,2016-11-25 | ||
DATA | ||
end | ||
|
||
after do | ||
File.delete(test_data_file) | ||
end | ||
|
||
it 'equals result data' do | ||
work(file_name: 'spec/tmp/test_data.txt') | ||
expect(File.read('result.json')).to eq(expected_work_result) | ||
end | ||
end | ||
end |
Review comment: ips is better suited to microbenchmarks, where there really are many iterations per second.
Here it is more like many seconds per iteration, so it is simpler to just look at the time in seconds.
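In line with that comment, a plain wall-clock measurement is enough for workloads that take seconds. The sketch below repeats `Benchmark.realtime` a few times to show the spread between runs. `sample_workload` and `measure_seconds` are hypothetical helpers introduced only so the example runs without the data files; in the benchmark scripts the block would call `work(file_name: ...)` instead.

```ruby
require 'benchmark'

# Hypothetical stand-in for work(file_name: ...).
def sample_workload
  (1..200_000).map(&:to_s).sort.size
end

# Runs the given block `runs` times and returns the wall-clock durations in seconds.
def measure_seconds(runs: 3)
  Array.new(runs) { Benchmark.realtime { yield } }
end

timings = measure_seconds(runs: 3) { sample_workload }
puts "min: #{timings.min.round(2)}s, max: #{timings.max.round(2)}s"
```

Reporting the minimum and maximum of a few runs makes it easier to tell a real improvement from run-to-run noise, as with the "9 - 12 seconds" range noted in Case 2.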