Optimized version of task-1 #155

Open. Wants to merge 1 commit into base: master
24 changes: 24 additions & 0 deletions benchmark/benchmark_cpu_work.rb
@@ -0,0 +1,24 @@
require 'benchmark/ips'
#require 'kalibera'
require 'benchmark'
require_relative '../task-1'

=begin
puts 'Starting benchmark...'

Benchmark.ips do |x|
  x.config(confidence: 95)

  x.report('Building report') do
    work(file_name: 'data_25000_thousands_lines.txt')
  end
end
=end

puts 'Starting benchmark...'

time = Benchmark.realtime do
  work(file_name: 'data_25000_thousands_lines.txt')
end

puts "Finish in #{time.round(2)}"
26 changes: 26 additions & 0 deletions benchmark/benchmark_cpu_work_150_thousands.rb
@@ -0,0 +1,26 @@
# frozen_string_literal: true

require 'benchmark/ips'
require 'benchmark'
require_relative '../task-1'

=begin
puts 'Starting benchmark...'

Benchmark.ips do |x|
Collaborator commented:
ips is better suited to microbenchmarks, where there really are many iterations per second.
Here it is more like many seconds per iteration, so it is simpler to just look at the time in seconds.

  x.config(confidence: 95)

  x.report('Building report') do
    work(file_name: 'data_150_thousands_lines.txt')
  end
end
=end


puts 'Starting benchmark...'

time = Benchmark.realtime do
  work(file_name: 'data_150_thousands_lines.txt')
end

puts "Finish in #{time.round(2)}"
13 changes: 13 additions & 0 deletions benchmark/benchmark_cpu_work_3_plus_million.rb
@@ -0,0 +1,13 @@
# frozen_string_literal: true

require 'benchmark'
require_relative '../task-1'

puts 'Starting benchmark...'

time = Benchmark.realtime do
  work(file_name: 'data_large.txt')
end

puts "Finish in #{time.round(2)}"

13 changes: 13 additions & 0 deletions benchmark/benchmark_cpu_work_500_thousands.rb
@@ -0,0 +1,13 @@
# frozen_string_literal: true

require 'benchmark'
require_relative '../task-1'

puts 'Starting benchmark...'

time = Benchmark.realtime do
  work(file_name: 'data_500_thousands_lines.txt')
end

puts "Finish in #{time.round(2)}"

24 changes: 24 additions & 0 deletions benchmark/metric_for_150_thousands_lines.md
@@ -0,0 +1,24 @@
## Case 2
### Metric on 150 thousand lines
* 1.000 in 12.357131s (9-12 seconds)

### Metric budget
~2 seconds to process data_150_thousands_lines.txt

### Writing a spec (to pin down the current metric)
`rspec spec/work_performance_spec.rb:19`

### Applied the stackprof profiler with speedscope
5.78 seconds spent in Array#each

`sessions + [parse_session(line)]`
**Solution:**

Append to the existing array instead of creating a new array on every iteration.
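
A minimal sketch of the difference, for illustration only. The `parse_session` below is a hypothetical stand-in (its field layout is an assumption based on the sample data in `spec/work_spec.rb`), not the implementation from `task-1`:

```ruby
# Hypothetical stand-in for the real parser; the column order is an assumption
# taken from the sample rows in spec/work_spec.rb.
def parse_session(line)
  fields = line.split(',')
  { 'user_id' => fields[1], 'session_id' => fields[2], 'browser' => fields[3],
    'time' => fields[4], 'date' => fields[5] }
end

lines = [
  'session,0,0,Safari 29,87,2016-10-23',
  'session,0,1,Firefox 12,118,2017-02-27'
]

# Before: `+` allocates a brand-new array on every iteration and copies all
# previously collected elements into it, so the total work grows quadratically.
slow_sessions = []
lines.each { |line| slow_sessions = slow_sessions + [parse_session(line)] }

# After: `<<` appends in place to the same array object.
fast_sessions = []
lines.each { |line| fast_sessions << parse_session(line) }
```

Both loops produce the same array; only the number of allocations and copies differs.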

**Result after optimization**
The metric dropped from 9 seconds to 2 seconds (~4.5x)

### Updated the spec
`rspec spec/work_performance_spec.rb:19`

28 changes: 28 additions & 0 deletions benchmark/metric_for_25_thousands_lines.md
@@ -0,0 +1,28 @@
## Case 1

### Metric on 25 thousand lines
* (1.000 in 5.441479s, (5.4 - 5.8) before optimization, with GC enabled)
`benchmark/benchmark_cpu_work.rb`
### Metric budget
* On 25 thousand lines I want the program's real running time to drop to 2 seconds

### Writing a spec (to pin down the current metric)
`rspec spec/work_performance_spec.rb:13`

### Applied the stackprof profiler with speedscope
* The problem is the use of select

`user_sessions = sessions.select { |session| session['user_id'] == user['id'] }`

Every call scans the whole sessions array: O(n), where n is the number of sessions. This also
happens inside the loop over user lines, so the total cost is O(n * m), where m is the number of users.

**Solution:**
Group the sessions by user once, outside the loop.
Then, inside the loop, fetch the sessions for a given user from the hash (O(1)).
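
The grouping step can be sketched roughly as follows. The hash keys mirror the `select` snippet above; the sample rows and the names `sessions_by_user` and `sessions_per_user` are made up for illustration:

```ruby
# Illustrative data only; the keys mirror those used in the select snippet.
users = [{ 'id' => '0' }, { 'id' => '1' }, { 'id' => '2' }]
sessions = [
  { 'user_id' => '0', 'browser' => 'Safari 29' },
  { 'user_id' => '1', 'browser' => 'Chrome 6' },
  { 'user_id' => '0', 'browser' => 'Firefox 12' }
]

# One O(n) pass builds the index: user_id => array of that user's sessions.
sessions_by_user = sessions.group_by { |session| session['user_id'] }

# Inside the loop each lookup is an average O(1) hash access.
# `fetch` with a default covers users that have no sessions at all.
sessions_per_user = {}
users.each do |user|
  sessions_per_user[user['id']] = sessions_by_user.fetch(user['id'], [])
end
```

With the index built once, the loop over users costs O(n + m) overall instead of O(n * m).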

**Result after optimization**
On 25 thousand lines the metric dropped from 5.8 seconds to 632 ms (~9.1x)

Collaborator commented:
Yes, and most importantly the complexity is now linear.


### Updated the performance test
`rspec spec/work_performance_spec.rb:13`
78 changes: 78 additions & 0 deletions benchmark/metric_for_3_plus_million_lines.md
@@ -0,0 +1,78 @@
## Case 4

### Metric on 3_250_940 lines
Finish in 35.71

### Budget
No more than 30 seconds to process data_large.txt

### Writing a spec (to pin down the current metric)
`rspec spec/work_performance_spec.rb:31`

### Applied the stackprof profiler with speedscope
collect_stats_from_users
(self: 9.7%, 3.41 sec / total: 41%, 14.56 sec)

I slightly edit the following code:

`report['usersStats'][user_key] = report['usersStats'][user_key].merge(block.call(user))`

replaced with

`report['usersStats'][user_key].merge!(block.call(user))` - this modifies the existing hash instead of creating a new one
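
A small sketch of the `merge` versus `merge!` behaviour this change relies on. The `report` contents here are sample values, not output of the real program:

```ruby
# Sample report fragment, for illustration only.
report = { 'usersStats' => { 'Leida Cira' => { 'sessionsCount' => 6 } } }
user_key = 'Leida Cira'
stats = report['usersStats'][user_key]

# merge returns a new hash and leaves the receiver untouched.
merged_copy = stats.merge({ 'usedIE' => true })

# merge! mutates the receiver in place and returns it, so no reassignment
# (and no extra hash allocation) is needed.
same_object = stats.merge!({ 'usedIE' => true })
```

On millions of users the saved allocation per call adds up, although how much it contributes here would need to be measured separately.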

-- Everything below avoids the extra intermediate array and does the transformation in a single pass.

`user.sessions.map {|s| s['time']}.map {|t| t.to_i}.sum.to_s + ' min.'`

replaced with

`user.sessions.map {|s| s['time']}.sum(&:to_i).to_s + ' min.'`

--

`user.sessions.map {|s| s['time']}.map {|t| t.to_i}.max.to_s + ' min.'`

replaced with

`user.sessions.map {|s| s['time'].to_i}.max.to_s + ' min.'`

--

`user.sessions.map {|s| s['browser']}.map {|b| b.upcase}.sort.join(', ')`

replaced with

`user.sessions.map { |s| s['browser'].upcase }.sort.join(', ')`

--

`user.sessions.map{|s| s['browser']}.any? { |b| b.upcase =~ /INTERNET EXPLORER/ }`

replaced with

`user.sessions.any? { |s| s['browser'].upcase =~ /INTERNET EXPLORER/ }`

--

`user.sessions.map{|s| s['browser']}.all? { |b| b.upcase =~ /CHROME/ }`

replaced with

`user.sessions.all? { |s| s['browser'].upcase =~ /CHROME/ }`
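
To make the "single pass" idea concrete, here is a small self-contained sketch on made-up session rows. It only shows that the shortened chains give the same results as the originals; it says nothing about the actual speedup:

```ruby
# Made-up rows with the same keys as the snippets above.
sessions = [
  { 'browser' => 'Safari 29', 'time' => '87' },
  { 'browser' => 'Internet Explorer 28', 'time' => '31' },
  { 'browser' => 'Firefox 12', 'time' => '118' }
]

# Two passes: the first map builds a throwaway array of browser names.
two_pass = sessions.map { |s| s['browser'] }.map { |b| b.upcase }.sort.join(', ')

# One pass: the same transformation folded into a single block.
one_pass = sessions.map { |s| s['browser'].upcase }.sort.join(', ')

# Predicates need no intermediate array at all; any? stops at the first match.
used_ie = sessions.any? { |s| s['browser'].upcase =~ /INTERNET EXPLORER/ }

# sum accepts the conversion as a block, so no array of integers is built.
total_time = sessions.sum { |s| s['time'].to_i }
longest = sessions.map { |s| s['time'].to_i }.max
```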

--

`user.sessions.map{|s| s['date']}.map {|d| Date.parse(d)}.sort.reverse.map { |d| d.iso8601 }`

replaced with

`user.sessions.map { |s| s['date'] }.sort.reverse` -- Date.parse can be skipped (it was the next hot spot in the profiler report); the dates in the input are already ISO 8601 strings, as in the spec's sample data, so sorting them as strings gives the same order

Collaborator commented:
Yes, nothing needs to be done with the dates.
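
A quick check of that claim, under the assumption that every date is a zero-padded `YYYY-MM-DD` string (the sample dates below are taken from `spec/work_spec.rb`):

```ruby
require 'date'

dates = %w[2016-10-23 2017-02-27 2017-03-28 2016-09-15 2017-09-27 2016-09-01]

# Original approach: parse, sort as Date objects, format back to ISO 8601.
via_date = dates.map { |d| Date.parse(d) }.sort.reverse.map { |d| d.iso8601 }

# Shortcut: zero-padded ISO 8601 strings sort lexicographically in date order.
via_string = dates.sort.reverse
```

If the input could ever contain another date format, the shortcut would no longer be safe and the parsing step would have to stay.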


**Result after optimization**

Collaborator commented:
It is better not to make many changes in one iteration, because it immediately becomes unclear which change had what effect.

Not a big gain, but it fits the budget: Finish in 24.53 (~1.4x)

### Updated the performance test
`rspec spec/work_performance_spec.rb:31`

46 changes: 46 additions & 0 deletions benchmark/metric_for_500_thoudsnds_lined.md
@@ -0,0 +1,46 @@
## Case 3
Collaborator commented:
Separate files are a bit hard to read, because they are displayed out of order.
One option is to name them something like case_1_..., case_2_...

### Metric on 500 thousand lines
* Finish in 12.99

### Metric budget
~6 seconds to process data_500_thousands_lines.txt

### Writing a spec (to pin down the current metric)
`rspec spec/work_performance_spec.rb:25`

### Applied the stackprof profiler with speedscope

Collaborator commented:
👍 stackprof + speedscope = one love

6.84 seconds spent in Array#each

`user_object = User.new(attributes: attributes, sessions: user_sessions)`

`users_objects = users_objects + [user_object]`

**Solution:**

Remove the redundant reassignment and append to the existing array (users_objects) instead.

**Result after optimization**
The metric dropped from 12.99 seconds to 6.9 seconds (~2x)

### Updated the spec
`rspec spec/work_performance_spec.rb:25`

----

### New iteration (same budget, ~6 seconds)
5.18 seconds spent in Array#each
```
sessions.each do |session|
  browser = session['browser']
  uniqueBrowsers += [browser] if uniqueBrowsers.all? { |b| b != browser }
end
```

**Solution**
Use a Set to collect the unique browsers
Collaborator commented:
👍
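
A rough sketch of the Set-based variant next to the original loop, on made-up data. The variable names are illustrative and the actual change in `task-1` may look different:

```ruby
require 'set'

# Made-up rows; only the 'browser' key matters here.
sessions = [
  { 'browser' => 'Safari 29' },
  { 'browser' => 'Chrome 6' },
  { 'browser' => 'Safari 29' }
]

# Before: every browser is compared against all collected ones, and `+=`
# allocates a new array each time a new browser is added.
unique_browsers_slow = []
sessions.each do |session|
  browser = session['browser']
  unique_browsers_slow += [browser] if unique_browsers_slow.all? { |b| b != browser }
end

# After: Set#<< ignores duplicates, with average O(1) insertion.
unique_browsers = Set.new
sessions.each { |session| unique_browsers << session['browser'] }
```

The Set keeps insertion order in Ruby, so converting it with `to_a` yields the browsers in the order they were first seen.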


**Result after optimization**
The metric dropped from 6.9 to 4.92 seconds (~1.4x)

### Updated the spec
`rspec spec/work_performance_spec.rb:25`
Binary file added data_150_thousands_lines.txt.zip
Binary file not shown.
Binary file added data_25000_thousands_lines.txt.zip
Binary file not shown.
Binary file added data_500_thousands_lines.txt.zip
Binary file not shown.
Binary file added profiling/.DS_Store
Binary file not shown.
1 change: 1 addition & 0 deletions profiling/stackprof_speedscope.json

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions profiling/stackprof_speedscope02.json

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions profiling/stackprof_speedscope_150_thousand.json

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions profiling/stackprof_speedscope_3_million_250_thousand.json

Large diffs are not rendered by default.

Binary file not shown.
Binary file not shown.
1 change: 1 addition & 0 deletions profiling/stackprof_speedscope_500_thousand.json

Large diffs are not rendered by default.

Binary file added profiling/stackprof_work.dump
Binary file not shown.
18 changes: 18 additions & 0 deletions profiling/stackprof_work.rb
@@ -0,0 +1,18 @@
# frozen_string_literal: true

require 'stackprof'
require_relative '../task-1'

=begin
StackProf.run(mode: :wall, out: 'profiling/stackprof_work.dump', interval: 1000, disable_gc: true) do
  work(file_name: 'data_25000_thousands_lines.txt')
end
=end

### stackprof speedscope

profiling = StackProf.run(mode: :wall, raw: true, disable_gc: true) do
  work(file_name: 'data_large.txt')
end

File.write('profiling/stackprof_speedscope_3_million_250_thousand_v2.json', JSON.generate(profiling))
Binary file added spec/.DS_Store
Binary file not shown.
37 changes: 37 additions & 0 deletions spec/work_performance_spec.rb
@@ -0,0 +1,37 @@
# frozen_string_literal: true

require 'rspec'
require 'rspec-benchmark'
require_relative '../task-1'

RSpec.configure do |config|
  config.include RSpec::Benchmark::Matchers
end

RSpec.describe 'WorkPerformance' do
  describe '.work' do
    context 'when file contains 25 thousand lines' do
      it 'performs under 1 second' do
        expect { work(file_name: 'data_25000_thousands_lines.txt') }.to perform_under(1).sec
      end
    end

    context 'when file contains 150 thousand lines' do
      it 'performs under 3 seconds' do
        expect { work(file_name: 'data_150_thousands_lines.txt') }.to perform_under(3).sec
      end
    end

    context 'when file contains 500 thousand lines' do
      it 'performs under 5 seconds' do
        expect { work(file_name: 'data_500_thousands_lines.txt') }.to perform_under(5).sec
      end
    end

    context 'when file contains 3_250_940 lines' do
      it 'performs under 30 seconds' do
        expect { work(file_name: 'data_large.txt') }.to perform_under(30).sec
      end
    end
  end
end
46 changes: 46 additions & 0 deletions spec/work_spec.rb
@@ -0,0 +1,46 @@
# frozen_string_literal: true

require 'rspec'
require_relative '../task-1'

RSpec.describe do
  describe '.work' do
    let(:work_result_file) { File.write('result.json', '') }
    let(:test_data_file) { File.join('spec/tmp', 'test_data.txt') }
    let(:expected_work_result) do
'{"totalUsers":3,"uniqueBrowsersCount":14,"totalSessions":15,"allBrowsers":"CHROME 13,CHROME 20,CHROME 35,CHROME 6,FIREFOX 12,FIREFOX 32,FIREFOX 47,INTERNET EXPLORER 10,INTERNET EXPLORER 28,INTERNET EXPLORER 35,SAFARI 17,SAFARI 29,SAFARI 39,SAFARI 49","usersStats":{"Leida Cira":{"sessionsCount":6,"totalTime":"455 min.","longestSession":"118 min.","browsers":"FIREFOX 12, INTERNET EXPLORER 28, INTERNET EXPLORER 28, INTERNET EXPLORER 35, SAFARI 29, SAFARI 39","usedIE":true,"alwaysUsedChrome":false,"dates":["2017-09-27","2017-03-28","2017-02-27","2016-10-23","2016-09-15","2016-09-01"]},"Palmer Katrina":{"sessionsCount":5,"totalTime":"218 min.","longestSession":"116 min.","browsers":"CHROME 13, CHROME 6, FIREFOX 32, INTERNET EXPLORER 10, SAFARI 17","usedIE":true,"alwaysUsedChrome":false,"dates":["2017-04-29","2016-12-28","2016-12-20","2016-11-11","2016-10-21"]},"Gregory Santos":{"sessionsCount":4,"totalTime":"192 min.","longestSession":"85 min.","browsers":"CHROME 20, CHROME 35, FIREFOX 47, SAFARI 49","usedIE":false,"alwaysUsedChrome":false,"dates":["2018-09-21","2018-02-02","2017-05-22","2016-11-25"]}}}' + "\n"
    end

    before do
      File.write(test_data_file, <<~DATA)
        user,0,Leida,Cira,0
        session,0,0,Safari 29,87,2016-10-23
        session,0,1,Firefox 12,118,2017-02-27
        session,0,2,Internet Explorer 28,31,2017-03-28
        session,0,3,Internet Explorer 28,109,2016-09-15
        session,0,4,Safari 39,104,2017-09-27
        session,0,5,Internet Explorer 35,6,2016-09-01
        user,1,Palmer,Katrina,65
        session,1,0,Safari 17,12,2016-10-21
        session,1,1,Firefox 32,3,2016-12-20
        session,1,2,Chrome 6,59,2016-11-11
        session,1,3,Internet Explorer 10,28,2017-04-29
        session,1,4,Chrome 13,116,2016-12-28
        user,2,Gregory,Santos,86
        session,2,0,Chrome 35,6,2018-09-21
        session,2,1,Safari 49,85,2017-05-22
        session,2,2,Firefox 47,17,2018-02-02
        session,2,3,Chrome 20,84,2016-11-25
      DATA
    end

    after do
      File.delete(test_data_file)
    end

    it 'equals result data' do
      work(file_name: 'spec/tmp/test_data.txt')
      expect(File.read('result.json')).to eq(expected_work_result)
    end
  end
end