Benchmark

Intuition about which code is faster is often wrong. String concatenation with + or with <<? Array#include? or Set#include? for membership tests? map followed by select, or filter_map? Without real measurements you are only guessing, and guesses are often wrong. The Benchmark module in Ruby's standard library gives you a straightforward way to measure how long code takes to run and to compare several implementations head to head. For each measurement it reports user CPU time, system CPU time, their total, and real (wall-clock) time. This article covers the Benchmark API, how to interpret its results, and, just as importantly, how to write valid benchmarks whose results actually reflect performance in production.

Basic Measurement

require "benchmark"

# Benchmark.measure: measure a single block of code
hasil = Benchmark.measure do
  100_000.times { "hello" + " " + "world" }
end
puts hasil
#   0.089432   0.001234   0.090666 (  0.090789)
#   │          │          │         │
#   user time  sys time   total     real (wall clock)

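# Columns shown:
# user time: CPU time spent in user space (the Ruby VM, your code, C extensions, GC)
# sys time:  CPU time the kernel spends on behalf of the process (system calls)
# total:     user + sys (plus the CPU time of any child processes)
# real time: wall-clock time, including waiting for I/O, sleep, and OS scheduling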

flowchart LR
    A["Benchmark.measure { code }"] --> B[Benchmark::Tms]
    B --> C["utime — user CPU time"]
    B --> D["stime — system CPU time"]
    B --> E["total — utime + stime"]
    B --> F["real — wall clock time"]

    G["Benchmark.bm { |x| x.report }"] --> H[Comparison table]
    I["Benchmark.bmbm { |x| x.report }"] --> J[Rehearsal + table]
    K["Benchmark.realtime { code }"] --> L["Float (seconds)"]
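Benchmark.measure returns a Benchmark::Tms object, and each column in the printed line is also available as a Float attribute. Here is a small sketch; the label "concat" is arbitrary:

```ruby
require "benchmark"

# Benchmark.measure accepts an optional label and returns a Benchmark::Tms
tms = Benchmark.measure("concat") do
  50_000.times { "hello" + " " + "world" }
end

puts tms.label   # the label passed to measure
puts tms.utime   # user CPU time, in seconds
puts tms.stime   # system CPU time, in seconds
puts tms.total   # utime + stime (+ child process CPU time: cutime + cstime)
puts tms.real    # wall-clock time, in seconds

# Tms#format understands %n (label), %u, %y, %t, and %r (real, printed in parentheses)
puts tms.format("%n: user=%u sys=%y total=%t real=%r")
```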

Benchmark.bm: Comparing Implementations

Benchmark.bm is the most commonly used method. It shows the results of several implementations in an easy-to-read table.

require "benchmark"

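n = 100_000
kata = "world"   # a variable, so the interpolation is really performed at runtime
                 # (interpolating a literal such as "world" can be folded into a plain string)

Benchmark.bm(15) do |x|   # 15 = width of the label column
  x.report("string +:") { n.times { "hello" + " " + kata } }
  x.report("string <<:") { n.times { +"hello" << " " << kata } }
  x.report("interpolation:") { n.times { "hello #{kata}" } }
end

# Example output (numbers vary by machine and Ruby version):
#                      user     system      total        real
# string +:        0.089432   0.000000   0.089432 (  0.089789)
# string <<:       0.045123   0.000000   0.045123 (  0.045234)
# interpolation:   0.032456   0.000000   0.032456 (  0.032567)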
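Benchmark.bm also returns the Benchmark::Tms objects it printed, one per report, so the table can be turned into relative numbers. Here is a sketch using the same three string variants; the names `results` and `fastest` are only illustrative:

```ruby
require "benchmark"

n = 100_000
kata = "world"

# Benchmark.bm prints the table and returns an Array of Benchmark::Tms
results = Benchmark.bm(15) do |x|
  x.report("string +:")      { n.times { "hello" + " " + kata } }
  x.report("string <<:")     { n.times { +"hello" << " " << kata } }
  x.report("interpolation:") { n.times { "hello #{kata}" } }
end

# Express each result relative to the fastest one (by wall-clock time)
fastest = results.min_by(&:real)
results.each do |tms|
  ratio = tms.real / fastest.real
  puts format("%-15s %.2fx the fastest", tms.label, ratio)
end
```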

Labels and Formatting

require "benchmark"
require "set"

n = 500_000

# Setup runs once, outside the measured blocks
arr = (1..1000).to_a
set = Set.new(1..1000)
hash = (1..1000).each_with_object({}) { |i, h| h[i] = true }

# Labels padded to a consistent column width
Benchmark.bm(20) do |x|   # 20 = width of the label column
  x.report("Array#include?:") { n.times { arr.include?(rand(1..1000)) } }
  x.report("Set#include?:") { n.times { set.include?(rand(1..1000)) } }
  x.report("Hash key lookup:") { n.times { hash.key?(rand(1..1000)) } }
end

# Example output:
#                           user     system      total        real
# Array#include?:       4.234512   0.000000   4.234512 (  4.235123)
# Set#include?:         0.156789   0.000000   0.156789 (  0.157234)
# Hash key lookup:      0.134567   0.000000   0.134567 (  0.135012)
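Under the hood, bm is a thin wrapper around Benchmark.benchmark, which takes an explicit caption, label width, and format. It also accepts extra labels for any Tms values the block returns, which is handy for total or average rows. The pattern below follows the example in Ruby's own Benchmark documentation. The iteration count and the ">total:"/">avg:" rows are just examples.

```ruby
require "benchmark"

n = 50_000

# Arguments: caption, label width, format, then labels for the extra rows the block returns
rows = Benchmark.benchmark(Benchmark::CAPTION, 7, Benchmark::FORMAT, ">total:", ">avg:") do |x|
  tf = x.report("for:")   { for i in 1..n; a = "1"; end }
  tt = x.report("times:") { n.times { a = "1" } }
  tu = x.report("upto:")  { 1.upto(n) { a = "1" } }
  # Tms supports + and /, so a sum and an average can be printed as extra rows
  [tf + tt + tu, (tf + tt + tu) / 3]
end
```

The return value, `rows`, holds only the three measured reports, not the extra summary rows.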

Benchmark.bmbm: More Reliable Comparisons

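bmbm runs the benchmark twice: first as a rehearsal (warm-up), then again for the real measurement, and the second pass produces the final table. Before each item in the second pass it calls GC.start, so every measurement starts from a freshly collected heap. This matters because code that runs earlier can hit different garbage-collection overhead than code that runs later. The rehearsal also warms CPU caches and, if enabled, a JIT such as YJIT, although a single pass is not always enough for a JIT to warm up fully.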

require "benchmark"

n = 200_000

Benchmark.bmbm do |x|
  x.report("map + select:") do
    (1..n).map { |i| i * 2 }.select { |i| i > n }
  end

  x.report("filter_map:") do
    (1..n).filter_map { |i| i * 2 if i * 2 > n }
  end

  x.report("lazy:") do
    (1..n).lazy.map { |i| i * 2 }.select { |i| i > n }.to_a
  end
end

# Rehearsal -----------------------------------------------
# map + select:    0.089432   0.002345   0.091777 (  0.092123)
# filter_map:      0.045678   0.001234   0.046912 (  0.047234)
# lazy:            0.012345   0.000567   0.012912 (  0.013123)
# -------------------------------------- total: 0.151601sec
#
#                  user     system      total        real
# map + select:    0.087123   0.001234   0.088357 (  0.088678)
# filter_map:      0.044567   0.001123   0.045690 (  0.045901)
# lazy:            0.011234   0.000456   0.011690 (  0.011901)

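The figures above are only an example. Real results depend on the machine, the Ruby version, and whether a JIT is enabled. In particular, do not assume lazy always wins: Enumerator::Lazy avoids building intermediate arrays, but it adds overhead for every element, so measure it on your own workload.

Use bmbm (instead of bm) when:

You are comparing code that allocates a lot of memory (GC may be triggered at different moments during the first run)

You are benchmarking code whose speed depends on warm CPU caches (after the rehearsal, the caches are already warm when the real measurement runs)

You want more stable, reproducible results

For simple benchmarks where GC and cache effects are not a concern, bm is enough.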
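To see what bmbm adds, here is a simplified sketch of the same idea for a single block: one rehearsal run whose timing is thrown away, then GC.start, then the timed run. `rehearse_and_measure` is a hypothetical helper for illustration, not part of the Benchmark API. The real bmbm also prints both passes and returns the Tms list from the second one.

```ruby
require "benchmark"

# Hypothetical helper: a stripped-down version of the rehearsal idea behind bmbm
def rehearse_and_measure(label)
  yield                                # rehearsal: run once, timing discarded
  GC.start                             # start the timed run from a freshly collected heap
  Benchmark.measure(label) { yield }   # the measurement that counts
end

tms = rehearse_and_measure("map + select:") do
  (1..200_000).map { |i| i * 2 }.select { |i| i > 200_000 }
end
puts tms.format("%n %t total, %r real")
```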

Benchmark.realtime

For simple cases where you only need to know how long something takes (rather than comparing implementations), realtime returns the elapsed wall-clock time as a Float in seconds. The block's own return value is discarded.

require "benchmark"

# realtime: returns wall-clock time as a Float (seconds)
elapsed = Benchmark.realtime do
  sleep(0.5)
  100_000.times { Math.sqrt(rand) }
end

puts "Finished in #{elapsed.round(3)} seconds"
# Prints something like: Finished in 0.523 seconds

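# Useful for logging in production
# (hitung_laporan and logger are assumed to be provided by your application)
def proses_laporan(data)
  hasil = nil
  waktu = Benchmark.realtime do
    hasil = hitung_laporan(data)   # capture the value: realtime itself returns only the time
  end

  logger.info "Report processed in #{(waktu * 1000).round(1)}ms"
  hasil
end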

# Or for quick inline timing. Note: to_a.shuffle runs inside every timing, so the
# absolute numbers include that setup (it is identical for all three variants)
[
  ["sort:", -> { (1..10_000).to_a.shuffle.sort }],
  ["sort_by:", -> { (1..10_000).to_a.shuffle.sort_by { |x| x } }],
  ["sort with block:", -> { (1..10_000).to_a.shuffle.sort { |a, b| a <=> b } }]
].each do |label, kode|
  t = Benchmark.realtime { 100.times { kode.call } }
  puts "#{label.ljust(20)} #{(t * 1000).round(2)}ms"
end
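Because realtime throws away the block's return value, production code usually captures that value in a variable, as proses_laporan does above. Both that pattern and a manual equivalent can be wrapped in small helpers. In current CRuby versions realtime reads the monotonic clock, so the manual version below should measure the same thing. `timed` and `timed_manual` are hypothetical names.

```ruby
require "benchmark"

# Hypothetical helper: returns [block_result, elapsed_seconds]
def timed
  result = nil
  elapsed = Benchmark.realtime { result = yield }
  [result, elapsed]
end

# The same measurement done by hand with the monotonic clock,
# which is not affected by adjustments to the system clock
def timed_manual
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0]
end

sorted, secs = timed { (1..10_000).to_a.shuffle.sort }
puts "Sorted #{sorted.size} items in #{(secs * 1000).round(2)}ms"

total, secs2 = timed_manual { (1..1_000_000).sum }
puts "Sum #{total} computed in #{(secs2 * 1000).round(2)}ms"
```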

Writing Valid Benchmarks

A badly written benchmark produces misleading results. A few important rules:

Use Enough Iterations

require "benchmark"

# ANTI-PATTERN: too few iterations, so noise is larger than the signal
Benchmark.bm do |x|
  x.report("too few:") { 10.times { "hello" + "world" } }
end
#   0.000012   0.000000   0.000012 (  0.000013)
# This number is meaningless: it is too small to measure accurately

# CORRECT: enough iterations for a stable measurement (usually > 100ms per benchmark)
Benchmark.bm do |x|
  x.report("enough:") { 1_000_000.times { "hello" + "world" } }
end
#   0.456789   0.001234   0.458023 (  0.459012)
# This number is meaningful
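Instead of guessing an iteration count, you can let the code find one by doubling n until a single timed run lasts long enough. This is a minimal sketch of that idea. The helper `calibrate` and its 0.1-second default are assumptions chosen to match the rule of thumb above. The benchmark-ips gem automates this kind of calibration more thoroughly.

```ruby
require "benchmark"

# Hypothetical helper: double n until n iterations take at least min_time seconds
def calibrate(min_time: 0.1, start: 1_000, max: 100_000_000)
  n = start
  loop do
    elapsed = Benchmark.realtime { n.times { yield } }
    return [n, elapsed] if elapsed >= min_time || n >= max
    n *= 2
  end
end

n, elapsed = calibrate { "hello" + "world" }
per_iter_ns = elapsed / n * 1e9
puts "#{n} iterations took #{elapsed.round(3)}s (~#{per_iter_ns.round(1)} ns per iteration)"
```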

Avoid GC in the Middle of a Benchmark

require "benchmark"

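# ANTI-PATTERN: letting GC run at random points during the benchmark
Benchmark.bm do |x|
  x.report("with GC:") do
    100_000.times { [1, 2, 3].map { |n| n * 2 } }
    # GC can be triggered at any moment, adding variance
  end
end

# CORRECT: start every benchmark from the same clean heap
# (GC is built into Ruby, so no require is needed)
Benchmark.bm do |x|
  GC.start
  GC.compact if GC.respond_to?(:compact)
  x.report("clean:") do
    GC.disable
    begin
      100_000.times { [1, 2, 3].map { |n| n * 2 } }
    ensure
      GC.enable   # always re-enable GC, even if the block raises
    end
  end
end
# Caveat: with GC disabled you measure only the work itself. Allocation-heavy
# code still pays for GC in production, so this can understate its real cost.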
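Rather than (or before) disabling GC, it helps to check whether GC actually ran during a measurement. GC.count and GC.stat are built into CRuby, and comparing them before and after the block shows how many collections and allocations the measured code caused. Here is a sketch:

```ruby
require "benchmark"

gc_before    = GC.count
alloc_before = GC.stat(:total_allocated_objects)

tms = Benchmark.measure do
  100_000.times { [1, 2, 3].map { |n| n * 2 } }
end

gc_runs     = GC.count - gc_before
allocations = GC.stat(:total_allocated_objects) - alloc_before

puts "real: #{tms.real.round(4)}s"
puts "GC runs during the measurement: #{gc_runs}"
puts "Objects allocated:              #{allocations}"
```

If `gc_runs` differs a lot between the implementations you compare, GC is part of what you are measuring.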

Make Sure the Code Actually Runs

require "benchmark"

# ANTI-PATTERN: an optimizing runtime may drop work whose result is never used.
# CRuby's interpreter generally does not do this, but optimizing JITs
# (TruffleRuby's, for example) can.
Benchmark.bm do |x|
  x.report("may be optimized:") do
    100_000.times { 2 ** 32 }   # result is never stored, so it could be optimized away
  end
end

# CORRECT: keep the result and use it after the benchmark
hasil = nil
Benchmark.bm do |x|
  x.report("always executed:") do
    100_000.times { hasil = 2 ** 32 }
  end
end
puts hasil   # the result is used, so the work stays observable
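A common way to keep the work observable is to fold every result into an accumulator and print it after timing. Here is a small sketch. The `checksum` variable and the particular arithmetic are arbitrary.

```ruby
require "benchmark"

checksum = 0
tms = Benchmark.measure do
  100_000.times { |i| checksum += (i * i) & 0xff }   # every result feeds into checksum
end

# The checksum is used after timing, so the loop's work has a visible effect
puts "checksum=#{checksum}, real=#{tms.real.round(4)}s"
```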

Compare Under Equal Conditions

require "benchmark"
require "set"

# ANTI-PATTERN: comparing under unequal conditions
arr = (1..1000).to_a

Benchmark.bm do |x|
  # Building the Set happens inside the measured block, so its cost is mixed
  # into the Set result, while the Array result includes no setup at all
  x.report("Set (with setup):") do
    set = Set.new(arr)   # Set creation is inside the benchmark!
    100_000.times { set.include?(rand(1000)) }
  end

  x.report("Array:") do
    100_000.times { arr.include?(rand(1000)) }
  end
end

# CORRECT: keep setup separate from what is measured
set = Set.new(arr)   # setup OUTSIDE the benchmark

Benchmark.bm(15) do |x|
  x.report("Array include?:") { 100_000.times { arr.include?(rand(1000)) } }
  x.report("Set include?:") { 100_000.times { set.include?(rand(1000)) } }
end
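Equal conditions also mean equal inputs and equal answers. One approach is to generate the probe values once, so every implementation sees exactly the same data and rand stays out of the timed region. Then check that the implementations agree before timing them. The `probes` array and its size are assumptions for this sketch.

```ruby
require "benchmark"
require "set"

arr = (1..1000).to_a
set = Set.new(arr)
probes = Array.new(10_000) { rand(0..1001) }   # includes a few values that are absent

# Sanity check: a faster implementation is worthless if it gives different answers
unless probes.all? { |v| arr.include?(v) == set.include?(v) }
  raise "Array and Set disagree"
end

Benchmark.bm(15) do |x|
  x.report("Array include?:") { probes.each { |v| arr.include?(v) } }
  x.report("Set include?:")   { probes.each { |v| set.include?(v) } }
end
```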

Real-World Benchmark Examples

Comparing Ways to Build Strings

require "benchmark"

n = 200_000
sapa = "Hello"
nama = "World"   # variables, so every variant builds the string at runtime

Benchmark.bmbm(25) do |x|
  x.report("+ operator:") do
    n.times do
      result = ""
      result = result + sapa + ", " + nama + "!"
    end
  end

  x.report("<< operator:") do
    n.times do
      result = +""   # unary + returns an unfrozen (mutable) string
      result << sapa << ", " << nama << "!"
    end
  end

  x.report("interpolation:") do
    n.times { "#{sapa}, #{nama}!" }
  end

  x.report("Array join:") do
    n.times { [sapa, ", ", nama, "!"].join }
  end

  x.report("format/sprintf:") do
    n.times { format("%s, %s!", sapa, nama) }
  end
end

# Typical result (final bmbm table only; numbers vary by machine and Ruby version):
#                                 user     system      total        real
# + operator:                 0.456789   0.000000   0.456789 (  0.457012)
# << operator:                0.123456   0.000000   0.123456 (  0.123678)
# interpolation:              0.089123   0.000000   0.089123 (  0.089345)
# Array join:                 0.167890   0.000000   0.167890 (  0.168012)
# format/sprintf:             0.234567   0.000000   0.234567 (  0.234789)

Comparing Data Structures

require "benchmark"
require "set"

n = 100_000
data = (1..10_000).to_a.shuffle

Benchmark.bmbm(20) do |x|
  # Setup runs once, when bmbm collects the reports, so it is not timed
  arr = data.dup
  set = Set.new(data)
  hash = data.each_with_object({}) { |v, h| h[v] = true }
  sorted = arr.sort   # bsearch requires sorted input; sort outside the timed block

  x.report("Array include?:") do
    n.times { arr.include?(rand(10_000)) }
  end

  x.report("Set include?:") do
    n.times { set.include?(rand(10_000)) }
  end

  x.report("Hash key?:") do
    n.times { hash.key?(rand(10_000)) }
  end

  x.report("Array bsearch:") do
    n.times do
      target = rand(10_000)   # draw the target once per lookup, not once per comparison
      sorted.bsearch { |v| v >= target }
    end
  end
end
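The bsearch report above uses find-minimum mode, which returns the first element that is >= the target. That measures the cost of the lookup, but on its own it does not answer "is this value present?". For a real membership test on a sorted array, find-any mode returns the matching element or nil; in that mode the block returns a negative number, zero, or a positive number. `sorted_include?` is a hypothetical helper:

```ruby
# Hypothetical helper: membership test on a sorted Array via binary search.
# In find-any mode the block returns target <=> element: positive while the
# element is too small, 0 on a match, negative once the element is too large.
def sorted_include?(sorted, target)
  !sorted.bsearch { |v| target <=> v }.nil?
end

sorted = (1..10_000).to_a.shuffle.sort
puts sorted_include?(sorted, 5_000)    # => true
puts sorted_include?(sorted, 10_001)   # => false
```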

Benchmark with Formatted Output

require "benchmark"

def jalankan_benchmark(judul, n: 100_000, &blok)
  puts "\n#{judul}"
  puts "=" * 50
  # Thousands separator: 500000 -> "500,000"
  puts "Iterations: #{n.to_s.reverse.gsub(/(\d{3})(?=\d)/, '\\1,').reverse}"

  # Pass n on to the caller's block so the reports use the same iteration count
  Benchmark.bmbm(20) { |x| blok.call(x, n) }
end

jalankan_benchmark("Hash vs Array for lookup", n: 500_000) do |x, n|
  ukuran = 10_000
  arr = (1..ukuran).to_a
  hash = (1..ukuran).each_with_object({}) { |i, h| h[i] = true }

  x.report("Array#include?:") { n.times { arr.include?(rand(ukuran)) } }
  x.report("Hash#key?:") { n.times { hash.key?(rand(ukuran)) } }
end

Interpreting the Results

Understanding Benchmark's output is half the job.

                  user     system      total        real
implementasi_a:  0.234567  0.001234   0.235801 (  0.236012)
implementasi_b:  0.123456  0.000567   0.124023 (  0.124234)

user time is the CPU time spent in user space: your Ruby code, the interpreter itself, C extensions, and the garbage collector. It is the most relevant number for comparing pure algorithms.

system time is the CPU time the kernel spends on behalf of the process: system calls for I/O, requesting memory from the OS, and so on. It is high when code does a lot of I/O or memory management.

total is user + system, the overall CPU time.

real time (wall-clock time) is the time that actually elapsed. It can be higher than total when the process is waiting rather than computing: on I/O, sleep, or locks, or because the OS is running other processes. For single-threaded CPU-bound code, real ≈ total; for I/O-bound code, real >> total.

# Interpretation tips:
# 1. Focus on "real" for practical comparisons that reflect production
# 2. If real >> total, the process spent significant time waiting (I/O, sleep, locks)
# 3. Variance between runs (run it several times!) shows how reliable the result is
# 4. Relative comparisons are more meaningful than absolute numbers:
#    "Implementation B is 1.9x faster than A" says more than
#    "Implementation B takes 0.124 seconds"

# Computing a speedup factor
require "benchmark"

arr = (1..10_000).to_a
hash = arr.each_with_object({}) { |i, h| h[i] = true }

waktu_a = Benchmark.realtime { 100_000.times { arr.include?(rand(10_000)) } }
waktu_b = Benchmark.realtime { 100_000.times { hash.key?(rand(10_000)) } }

speedup = waktu_a / waktu_b
puts "Hash is #{speedup.round(1)}x faster than Array for lookup"
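Tip 3 in the interpretation tips above suggests running a benchmark several times. One way to do that is to collect several realtime measurements per implementation, then report the mean, the sample standard deviation, and the speedup computed from the means. The sketch below uses hypothetical helpers `samples`, `mean`, and `stddev`, and the probe count is an arbitrary choice.

```ruby
require "benchmark"

# Hypothetical helper: time the block `runs` times, returning an Array of seconds
def samples(runs = 5)
  Array.new(runs) { Benchmark.realtime { yield } }
end

def mean(xs)
  xs.sum / xs.size
end

# Sample standard deviation (divides by n - 1)
def stddev(xs)
  m = mean(xs)
  Math.sqrt(xs.sum { |x| (x - m)**2 } / (xs.size - 1))
end

arr = (1..10_000).to_a
hash = arr.to_h { |i| [i, true] }
probes = Array.new(2_000) { rand(1..10_000) }   # same inputs for both implementations

array_times = samples { probes.each { |v| arr.include?(v) } }
hash_times  = samples { probes.each { |v| hash.key?(v) } }

[["Array#include?", array_times], ["Hash#key?", hash_times]].each do |label, ts|
  puts format("%-15s mean %.5fs  stddev %.5fs", label, mean(ts), stddev(ts))
end
puts "Speedup (mean Array / mean Hash): #{(mean(array_times) / mean(hash_times)).round(1)}x"
```

A standard deviation that is large relative to the mean is a sign that the comparison needs more iterations or a quieter machine.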

Benchmark in Production

The Benchmark module is well suited to ad hoc measurement during development. For monitoring performance in production, a slightly different pattern fits better.

# Logging execution time in production
require "benchmark"
require "logger"

class PerformanceLogger
  # Default logger; in a Rails app you might point this at Rails.logger instead
  def self.logger
    @logger ||= Logger.new($stdout)
  end

  def self.ukur(nama_operasi, threshold_ms: 100, &blok)
    hasil = nil
    waktu = Benchmark.realtime { hasil = blok.call }
    ms = (waktu * 1000).round(2)

    if ms > threshold_ms
      logger.warn "SLOW: #{nama_operasi} took #{ms}ms (threshold: #{threshold_ms}ms)"
    else
      logger.debug "#{nama_operasi}: #{ms}ms"
    end

    hasil
  end
end

# Usage (User and EmailService stand for your application's own classes)
pengguna = PerformanceLogger.ukur("fetch_users", threshold_ms: 200) do
  User.where(aktif: true).includes(:profile).to_a
end

PerformanceLogger.ukur("kirim_email_batch", threshold_ms: 5000) do
  pengguna.each { |u| EmailService.kirim_welcome(u) }
end
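One limitation of the PerformanceLogger above is that if the block raises, its duration is never logged, and slow failures are often the most interesting ones. A variant built on ensure logs the duration either way and still lets the exception propagate. `TimedLogger` and its `measure` method are hypothetical names. The sketch reads the monotonic clock directly, which is the clock Benchmark.realtime uses in current Ruby versions.

```ruby
require "logger"
require "stringio"

# Hypothetical variant of PerformanceLogger that also logs when the block raises
class TimedLogger
  def initialize(logger = Logger.new($stdout))
    @logger = logger
  end

  def measure(name, threshold_ms: 100)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    failed = true
    result = yield
    failed = false
    result   # the method returns the block's value; ensure does not change it
  ensure
    ms = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000).round(2)
    status = failed ? "FAILED" : "ok"
    if ms > threshold_ms
      @logger.warn "SLOW: #{name} #{status} after #{ms}ms (threshold: #{threshold_ms}ms)"
    else
      @logger.debug "#{name} #{status}: #{ms}ms"
    end
  end
end

io = StringIO.new
timer = TimedLogger.new(Logger.new(io))
value = timer.measure("sum") { (1..100).sum }
begin
  timer.measure("broken") { raise ArgumentError, "boom" }
rescue ArgumentError
  # the exception still propagates, but its duration has already been logged
end
puts io.string
```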

Summary

Benchmark.bmbm for serious comparisons: it runs the benchmark twice (rehearsal + real run) and calls GC.start before each real timing, which reduces skew from warm-up, garbage collection, and cold CPU caches; it is more reliable than bm in most cases.

Benchmark.realtime for single measurements: it returns elapsed seconds as a Float, which suits logging execution time in production or quick timing.

Enough iterations for meaningful results: each benchmark should run for at least 100ms–1s to give stable results; with too few iterations, noise dominates the signal.

Setup outside the benchmark block: measure only what you want to compare; initializing data structures, connections, and other preparation belongs outside the x.report block.

Compare equivalent conditions: make sure every implementation starts from the same conditions (data size, state, memory) so the comparison is fair.

Focus on relative comparisons: “2x faster” means more than “0.045 seconds”; absolute numbers change with hardware, while relative ratios are more stable.

“Real time” for production performance: user+sys shows CPU efficiency, but real time is what users experience; for I/O-bound code the two can differ greatly.

Benchmark first, optimize later: don't optimize before measuring; intuition about bottlenecks is often wrong, and premature optimization makes code more complex without real benefit.
