




































These slides cover the optimization of zlib, a widely used data compression library, on Arm processors, focusing on the benefits of NEON SIMD technology. The author, Adenilson Cavalcanti, discusses potential optimization candidates and their impact on Chromium's performance. Topics include the Adler-32 checksum, inflate_fast, and NEON-based compression methods.
Typology: Lecture notes
Adenilson Cavalcanti ARM - San Jose (California)
@adenilsonc
Why zlib?
Zlib: used everywhere (libpng, Skia, freetype, cronet, Firefox, Chrome, Linux kernel, Android, iOS, JDK, git, etc.). Old code base, released in 1995. Written in K&R C style.
Context: lacks any optimizations for ARM CPUs.
Problem statement: identify potential optimization candidates and verify positive effects in Chromium.
● Performed some benchmarking.
● Contacted each project.
● Mixed results (1 project never replied back).
Before deepening the fork...
None focused on decompression* or had ARM specific optimizations.
Before forking...
*Important for a Web Browser.
Parrots are not created equal
Original: 2.7MB
Palette: 0.8MB
Zopfli: 2.6MB
Perf to the rescue
● Optional on ARMv7.
● Mandatory on ARMv8.
NEON
Registers
ARMv7
● 16 registers@128 bits: Q0 - Q15.
ARMv8
● 32 registers@128 bits: Q0 - Q31.
● 32 registers@64 bits: D0 - D31.
● 32 registers@32 bits: S0 - S31.
● 32 registers@16 bits: H0 - H31.
● Varied set of instructions: load, store, add, mul, etc.
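To make the register list concrete, here is a minimal sketch of filling one 128-bit Q register with 16 bytes and reducing it with widening pairwise adds. It is not code from the slides or from zlib: the function name sum_bytes is hypothetical, and the scalar loop is an assumed fallback so the sketch also builds on non-ARM machines.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: sums all bytes in a buffer.
 * With NEON available, 16 bytes are processed per iteration. */
uint64_t sum_bytes(const uint8_t *data, size_t len)
{
    uint64_t total = 0;
    size_t i = 0;

#if defined(__ARM_NEON)
    for (; i + 16 <= len; i += 16) {
        uint8x16_t v = vld1q_u8(data + i);   /* load 16 bytes into one Q register */
        uint16x8_t s16 = vpaddlq_u8(v);      /* add adjacent pairs: 16 x u8 -> 8 x u16 */
        uint32x4_t s32 = vpaddlq_u16(s16);   /* 8 x u16 -> 4 x u32 */
        uint64x2_t s64 = vpaddlq_u32(s32);   /* 4 x u32 -> 2 x u64 */
        total += vgetq_lane_u64(s64, 0) + vgetq_lane_u64(s64, 1);
    }
#endif

    /* Scalar path: handles the tail, or the whole buffer without NEON. */
    for (; i < len; i++)
        total += data[i];

    return total;
}
```

Both paths return the same value, so the scalar loop doubles as a reference when checking the NEON path on an ARM device.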
Entertaining definition
https://www.youtube.com/watch?v=l49MHwooaVQ
Practical explanation
a) HTML b) JPEG
Practical visualization
./binwalk -E file
a) HTML: 0.68
b) JPEG: 0.
Adler-32 checksum
https://en.wikipedia.org/wiki/Adler-
Adler-32 simplistic implementation
https://en.wikipedia.org/wiki/Adler-
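For reference, a minimal sketch of the simplistic form of Adler-32, following the standard definition: two running sums modulo 65521, packed into one 32-bit value. The function name adler32_simple is hypothetical, and this is not zlib's own implementation, which defers the modulo operation instead of applying it on every byte.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u  /* largest prime smaller than 2^16 */

/* Hypothetical helper: byte-at-a-time Adler-32. */
uint32_t adler32_simple(const unsigned char *data, size_t len)
{
    uint32_t a = 1;  /* running sum of bytes, starting at 1 */
    uint32_t b = 0;  /* running sum of the successive values of a */

    for (size_t i = 0; i < len; i++) {
        a = (a + data[i]) % ADLER_MOD;
        b = (b + a) % ADLER_MOD;
    }

    return (b << 16) | a;
}
```

Calling adler32_simple on the 9 ASCII bytes of "Wikipedia" yields 0x11E60398. Each byte feeds a serial dependency through a and b and costs two modulo operations, which is why this loop is a natural candidate for a SIMD rewrite.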