Operating System - Administrivia and Dynamic Memory Allocation - Lecture Slides | CPE 229

Material Type: Notes; Professor: Staff; Class: Computer Design and Assembly Language Programming; Subject: Computer Engineering; University: California Polytechnic State University - San Luis Obispo; Term: Spring 2010


Administrivia

  • Project 2 due right now
    • As before, free extension if you are here
    • For SCPD students, put this at top of design doc: caf656
    • For non-SCPD students, put this:
  • Midterm Tuesday
    • Open book, open notes (but not open notebook computer)
    • Covers first 10 lectures of course (including today)
  • Midterm review section tomorrow
  • Section for Project 3 next Friday
  • I will monitor newsgroup/list for questions
    • Please put “[midterm]” in subject of questions about midterm


Dynamic memory allocation

  • Almost every useful program uses it
    • Gives wonderful functionality benefits
      ⊲ Don’t have to statically specify complex data structures
      ⊲ Can have data grow as a function of input size
      ⊲ Allows recursive procedures (stack growth)
    • But, can have a huge impact on performance
  • Today: how to implement it
    • Lecture draws on [Wilson] (good survey from 1995)
  • Some interesting facts:
    • Two or three line code change can have huge, non-obvious impact on how well allocator works (examples to come)
    • Proven: impossible to construct an “always good” allocator
    • Surprising result: after 35 years, memory management still poorly understood

Why is it hard?

  • Satisfy an arbitrary sequence of allocations and frees
  • Easy without free: set a pointer to the beginning of some big chunk of memory (“heap”) and increment it on each allocation (a minimal sketch follows below)
  • Problem: free creates holes (“fragmentation”). Result? Lots of free space but cannot satisfy the request!

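A minimal sketch of the pointer-increment (“bump”) scheme described above, in C. The heap size, the 8-byte alignment, and the name bump_alloc are assumptions for illustration, not from the lecture:

    #include <stddef.h>

    #define HEAP_SIZE (1 << 20)            /* assumed 1 MiB heap for illustration */

    static char heap[HEAP_SIZE];
    static size_t next_free = 0;           /* offset of the first unused byte */

    void *bump_alloc(size_t n) {
        n = (n + 7) & ~(size_t)7;          /* round up to 8-byte alignment */
        if (n > HEAP_SIZE - next_free)
            return NULL;                   /* out of memory */
        void *p = &heap[next_free];
        next_free += n;                    /* no free(): holes can never be reclaimed */
        return p;
    }

With no per-object free there is no fragmentation; the price is that memory can only be reclaimed all at once, which is exactly what the arena allocators later in the lecture exploit.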

More abstractly

  • What an allocator must do:
    • Track which parts of memory in use, which parts are free
    • Ideal: no wasted space, no time overhead
  • What the allocator cannot do:
    • Control the order, number, or size of requested blocks
    • Change user ptrs =⇒ (bad) placement decisions are permanent
  • The core fight: minimize fragmentation
    • App frees blocks in any order, creating holes in “heap”
    • Holes too small? Cannot satisfy future requests

What is fragmentation really?

  • Inability to use memory that is free
  • Two factors required for fragmentation
    • Different lifetimes—if adjacent objects die at different times, then fragmentation:
    • If they die at the same time, then no fragmentation:
    • Different sizes: If all requests the same size, then no fragmentation (that’s why no external fragmentation w. paging):


Important decisions

  • Placement choice: where in free memory to put a requested block?
    • Freedom: can select any memory in the heap
    • Ideal: put block where it won’t cause fragmentation later (impossible in general: requires future knowledge)
  • Split free blocks to satisfy smaller requests?
    • Fights internal fragmentation
    • Freedom: can choose any larger block to split
    • One way: choose block with smallest remainder (best fit)
  • Coalescing free blocks to yield larger blocks
    • Freedom: when to coalesce (deferring can be good); fights external fragmentation

Impossible to “solve” fragmentation

  • If you read allocation papers to find the best allocator
    • All discussions revolve around tradeoffs
    • The reason? There cannot be a best allocator
  • Theoretical result:
    • For any possible allocation algorithm, there exist streams of allocation and deallocation requests that defeat the allocator and force it into severe fragmentation.
  • How much fragmentation should we tolerate?
    • Let M = bytes of live data, n_min = smallest allocation, n_max = largest. How much gross memory is required?
    • Bad allocator: M · (n_max / n_min) (only ever uses a memory location for a single size)
    • Good allocator: ∼ M · log(n_max / n_min)
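
A rough worked example with made-up numbers (not from the lecture): if M = 1 MB of live data, n_min = 16 bytes, and n_max = 64 KB, then n_max / n_min = 4096, so the bad allocator may need M · 4096 = 4 GB of gross memory, while the good bound is only about M · log2(4096) = 12 MB.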

Pathological examples

  • Given allocation of 7 20-byte chunks
    • What’s a bad stream of frees and then allocates?
    • Free every other chunk, then alloc 21 bytes
  • Given a 128-byte limit on malloced space
    • What’s a really bad combination of mallocs & frees?
    • Malloc 128 1-byte chunks, free every other
    • Malloc 32 2-byte chunks, free every other (1- & 2-byte) chunk
    • Malloc 16 4-byte chunks, free every other chunk...
  • Next: two allocators (best fit, first fit) that, in practice, work pretty well
    • “pretty well” = ∼20% fragmentation under many workloads

Best fit

  • Strategy: minimize fragmentation by allocating space from the block that leaves the smallest fragment
  • Data structure: heap is a list of free blocks, each with a header holding the block size and pointers to the next block
  • Code: search freelist for the block closest in size to the request (exact match is ideal); a sketch of the search follows below
  • During free, (usually) coalesce adjacent blocks
  • Problem: sawdust
    • Remainder so small that over time you are left with “sawdust” everywhere
    • Fortunately not a problem in practice

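The sketch referenced above: a best-fit search over a singly linked free list. The struct layout and names are illustrative, not a specific allocator’s format:

    #include <stddef.h>

    struct free_block {
        size_t size;                       /* usable bytes in this free block */
        struct free_block *next;           /* next block on the free list */
    };

    static struct free_block *free_list;   /* head of the free list */

    /* Return the free block whose size is closest to (but at least) n. */
    struct free_block *best_fit(size_t n) {
        struct free_block *best = NULL;
        for (struct free_block *b = free_list; b != NULL; b = b->next) {
            if (b->size < n)
                continue;                  /* too small to satisfy the request */
            if (b->size == n)
                return b;                  /* exact match is ideal */
            if (best == NULL || b->size < best->size)
                best = b;                  /* smallest remainder so far */
        }
        return best;                       /* NULL if no block is big enough */
    }

The caller would then split the chosen block if the leftover is big enough to stand on its own, which is exactly where the “sawdust” remainders come from.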

Best fit gone wrong

  • Simple bad case: allocate n, m (n < m) in alternating order, free all the n’s, then try to allocate an n + 1
  • Example: start with 100 bytes of memory
    • alloc 19, 21, 19, 21, 19
    • free 19, 19, 19:
    • alloc 20? Fails! (wasted space = 57 bytes)
  • However, this doesn’t seem to happen in practice (though the way real programs behave suggests it easily could)


Slab allocation [Bonwick]

  • Kernel allocates many instances of same structures
    • E.g., a 1.7 KB task_struct for every process on the system
  • Often want contiguous physical memory (for DMA)
  • Slab allocation optimizes for this case:
    • A slab is multiple pages of contiguous physical memory
    • A cache contains one or more slabs
    • Each cache stores only one kind of object (fixed size)
  • Each slab is full, empty, or partial
  • E.g., need a new task_struct?
    • Look in the task_struct cache
    • If there is a partial slab, pick a free task_struct in that slab
    • Else, use an empty slab, or may need to allocate a new slab for the cache
  • Advantages: speed, and no internal fragmentation (a data-structure sketch follows below)
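
The sketch referenced above, loosely following the slide’s description (a cache of same-size objects backed by full/partial/empty slabs). Field and function names are assumptions, not the actual Bonwick or Linux kernel API:

    #include <stddef.h>

    struct slab {
        struct slab *next;       /* next slab on the same list */
        void *free_objs;         /* linked list of free objects in this slab */
        unsigned in_use;         /* allocated objects in this slab */
        unsigned capacity;       /* total objects this slab can hold */
    };

    struct kmem_cache {
        size_t object_size;      /* every object in this cache has this size */
        struct slab *partial;    /* slabs with both free and allocated objects */
        struct slab *empty;      /* slabs with no allocated objects */
        struct slab *full;       /* slabs with no free objects */
    };

    /* Allocation path from the slide: prefer a partial slab, else an empty one.
     * Growing the cache (mapping a fresh slab of contiguous pages) and moving
     * slabs between the three lists are omitted from this sketch. */
    void *cache_alloc(struct kmem_cache *c) {
        struct slab *s = c->partial ? c->partial : c->empty;
        if (s == NULL)
            return NULL;                   /* would allocate a new slab here */
        void *obj = s->free_objs;
        s->free_objs = *(void **)obj;      /* pop one object off the slab's free list */
        s->in_use++;
        return obj;
    }

Because every object in a cache has the same, exact size, nothing is wasted inside an object, which is the “no internal fragmentation” advantage claimed on the slide.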

Known patterns of real programs

  • So far we’ve treated programs as black boxes.
  • Most real programs exhibit 1 or 2 (or all 3) of the following patterns of alloc/dealloc:
    • Ramps: accumulate data monotonically over time
    • Peaks: allocate many objects, use briefly, then free all
    • Plateaus: allocate many objects, use for a long time


Pattern 1: ramps

  • In a practical sense: ramp = no free!
    • Implication for fragmentation?
    • What happens if you evaluate allocator with ramp programs only?


Pattern 2: peaks

  • Peaks: allocate many objects, use briefly, then free all
    • Fragmentation a real danger
    • What happens if peak allocated from contiguous memory?
    • Interleave peak & ramp? Interleave two different peaks?


Exploiting peaks

  • Peak phases: alloc a lot, then free everything
    • So have new allocation interface: alloc as before, but only support free of everything
    • Called “arena allocation”, “obstack” (object stack), or alloca/procedure call (by compiler people)
  • Arena = a linked list of large chunks of memory
    • Advantages: alloc is a pointer increment, free is “free”; no wasted space for tags or list pointers (a sketch follows below)

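The sketch referenced above: a minimal arena/obstack-style allocator in which alloc bumps a pointer inside the current chunk and free releases every chunk at once. The chunk size and names are assumptions:

    #include <stdlib.h>
    #include <stddef.h>

    #define CHUNK_SIZE (64 * 1024)         /* assumed chunk size */

    struct chunk {
        struct chunk *next;                /* previously filled chunk */
        size_t used;                       /* bytes used in data[] */
        char data[CHUNK_SIZE];
    };

    struct arena { struct chunk *current; };

    void *arena_alloc(struct arena *a, size_t n) {
        n = (n + 7) & ~(size_t)7;          /* 8-byte alignment */
        if (n > CHUNK_SIZE)
            return NULL;                   /* oversized requests not handled here */
        struct chunk *c = a->current;
        if (c == NULL || c->used + n > CHUNK_SIZE) {
            struct chunk *fresh = malloc(sizeof *fresh);
            if (fresh == NULL)
                return NULL;
            fresh->next = c;
            fresh->used = 0;
            a->current = c = fresh;
        }
        void *p = c->data + c->used;       /* alloc is just a pointer increment */
        c->used += n;
        return p;
    }

    void arena_free_all(struct arena *a) { /* "free of everything" */
        for (struct chunk *c = a->current; c != NULL; ) {
            struct chunk *next = c->next;
            free(c);
            c = next;
        }
        a->current = NULL;
    }

Typical use is `struct arena a = {0}; ... arena_alloc(&a, n); ... arena_free_all(&a);`, which matches the peak pattern: many allocations, then one bulk free.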

Pattern 3: Plateaus

  • Plateaus: allocate many objects, use for a long time
    • What happens if overlap with peak or different plateau?


Fighting fragmentation

  • Segregation = reduced fragmentation:
    • Allocated at same time ∼ freed at same time
    • Different type ∼ freed at different time
  • Implementation observations:
    • Programs allocate small number of different sizes
    • Fragmentation at peak use is more important than at low use
    • Most allocations small (< 10 words)
    • Work done with allocated memory increases with size
    • Implications?


Simple, fast segregated free lists

  • Array of free lists for small sizes, tree for larger (a size-class sketch follows below)
    • Place blocks of same size on same page
    • Have count of allocated blocks: if it goes to zero, can return page
  • Pro: segregates sizes, no size tag, fast small alloc
  • Con: worst-case waste: 1 page per size even w/o free; after pessimal free, waste 1 page per object

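The size-class sketch referenced above: an array of free lists indexed by the request size rounded up to a word multiple. The class granularity and names are assumptions; since blocks carry no size tag, this sketch passes the size to free explicitly (a real allocator would recover it from per-page metadata):

    #include <stddef.h>

    #define WORD               8           /* assumed word size in bytes */
    #define NUM_SMALL_CLASSES  32          /* classes cover requests of 1..256 bytes */

    /* One free list per small size class; larger requests would go to a tree. */
    static void *small_free_lists[NUM_SMALL_CLASSES];

    static size_t size_class(size_t n) {
        return (n + WORD - 1) / WORD - 1;  /* 1..8 bytes -> class 0, 9..16 -> 1, ... */
    }

    void *small_alloc(size_t n) {
        size_t c = size_class(n);
        if (c >= NUM_SMALL_CLASSES)
            return NULL;                   /* large request: take the tree path */
        void *p = small_free_lists[c];
        if (p != NULL)
            small_free_lists[c] = *(void **)p;   /* pop the head of the list */
        /* else: would carve a fresh page into blocks of this class's size */
        return p;
    }

    void small_free(void *p, size_t n) {
        size_t c = size_class(n);
        *(void **)p = small_free_lists[c]; /* push back onto this class's list */
        small_free_lists[c] = p;
        /* a per-page count of allocated blocks would let us return empty pages */
    }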

Typical space overheads

  • Free-list bookkeeping + alignment determine the minimum allocatable size (a header layout sketch follows below):
    • Store size of block
    • Pointers to next and previous freelist elements
    • Machine-enforced overhead: alignment. Allocator doesn’t know the type, so it must align memory to a conservative boundary
  • Minimum allocation unit? Space overhead when allocated?

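An illustrative free-block header matching the bookkeeping listed above (size plus next/prev freelist pointers); the exact layout is an assumption, not a particular allocator’s format:

    #include <stddef.h>

    /* Header kept inside each *free* block. */
    struct free_header {
        size_t size;                   /* size of this block */
        struct free_header *next;      /* next block on the free list */
        struct free_header *prev;      /* previous block on the free list */
    };

    /* On a typical 64-bit machine this header is 24 bytes, so no block smaller
     * than the header (rounded up to a conservative alignment boundary such as
     * 16 bytes) can ever be handed out: that is the minimum allocation unit.
     * While a block is allocated, usually only the size field must be kept so
     * the allocator can find the block boundaries again at free time, so the
     * per-allocation space overhead is roughly one word plus alignment padding. */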

Getting more space from OS

  • On Unix, can use sbrk
    • E.g., to activate a new zero-filled page:
  • For large allocations, sbrk a bad idea
    • May want to give memory back to OS
    • Can’t with sbrk unless big chunk last thing allocated
    • So allocate large chunks using mmap’s MAP_ANON
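
Hedged sketches of both routes described above; the page size and wrapper names are assumptions, and on Linux the flag is usually spelled MAP_ANONYMOUS, with MAP_ANON as a synonym:

    #include <stddef.h>
    #include <unistd.h>     /* sbrk */
    #include <sys/mman.h>   /* mmap, munmap */

    /* Grow the heap by one page with sbrk; simple, but the memory is hard to
     * give back unless the big chunk was the last thing allocated. */
    void *grow_heap_one_page(void) {
        void *p = sbrk(4096);                  /* assumed 4 KB page size */
        return p == (void *)-1 ? NULL : p;     /* p points at the new zero-filled page */
    }

    /* For large allocations, ask for anonymous zero-filled pages with mmap;
     * these can be returned to the OS independently with munmap. */
    void *big_alloc(size_t len) {
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANON, -1, 0);
        return p == MAP_FAILED ? NULL : p;
    }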

Faults + resumption = power

  • Resuming after fault lets us emulate many things
    • “every problem can be solved with layer of indirection”
  • Example: sub-page protection
  • To protect sub-page region in paging system:
    • Set entire page to weakest permission; record in PT
    • Any access that violates perm will cause an access fault
    • Fault handler checks if page special, and if so, if access allowed. Continue or raise error, as appropriate

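A user-level analogue of the sub-page protection trick above, using mprotect as the “weakest permission” and a SIGSEGV handler as the fault handler. The recovery policy shown (simply re-enable the page and resume) and all names are assumptions; a real scheme would consult a side table of sub-page permissions, and mprotect is not formally async-signal-safe:

    #include <signal.h>
    #include <string.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    #define PAGE_SIZE 4096                 /* assumed page size */

    static char *protected_page;           /* page containing the sub-page region */

    static void fault_handler(int sig, siginfo_t *info, void *ctx) {
        (void)sig; (void)ctx;
        char *addr = (char *)info->si_addr;
        if (addr >= protected_page && addr < protected_page + PAGE_SIZE) {
            /* "Page is special": a real handler would check sub-page
             * permissions here and raise an error if access is denied. */
            mprotect(protected_page, PAGE_SIZE, PROT_READ | PROT_WRITE);
            return;                        /* faulting instruction is restarted */
        }
        abort();                           /* genuine, unexpected fault */
    }

    void install_subpage_protection(char *page) {
        protected_page = page;

        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = fault_handler;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);

        /* Weakest permission on the whole page; any access now faults. */
        mprotect(page, PAGE_SIZE, PROT_NONE);
    }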

More fault resumption examples

  • Emulate accessed bits:
    • Set page permissions to “invalid”.
    • On any access will get a fault: Mark as accessed
  • Avoid save/restore of FP registers
    • Make first FP operation fault to detect usage
  • Emulate non-existent instructions:
    • Give inst an illegal opcode; OS fault handler detects and emulates fake instruction
  • Run OS on top of another OS!
    • Slam OS into a normal process
    • When it does something “privileged,” the real OS gets woken up with a fault
    • If op allowed, do it; otherwise kill
    • IBM’s VM/370, VMware (sort of)

Heap overflow detection 2

  • Mark page at end of heap inaccessible
    • mprotect(heap_limit, PAGE_SIZE, PROT_NONE);
  • Program will allocate memory beyond end of heap
  • Program will use memory and fault
    • Note: Depends on specifics of language
    • But many languages will touch allocated memory immediately
  • Invoke garbage collector
    • Must now put the just-allocated object into the new heap
  • Note: requires more than just resumption
    • Faulting instruction must be resumed
    • But must resume with different target virtual address
    • Doable on most architectures since GC updates registers


Reference counting

  • Seemingly simpler GC scheme:
    • Each object has “ref count” of pointers to it
    • Increment when pointer set to it
    • Decrement when pointer killed (C++ destructors handy for such “smart pointers”)
    • ref count == 0? Free object
  • Works well for hierarchical data structures
    • E.g., pages of physical memory
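
A minimal sketch of the scheme above in C; the object type and the incref/decref names are illustrative assumptions (in C++ this bookkeeping would live in a smart pointer’s constructors and destructor):

    #include <stdlib.h>

    struct object {
        int refcount;                  /* number of pointers to this object */
        /* ... payload ... */
    };

    struct object *object_new(void) {
        struct object *o = calloc(1, sizeof *o);
        if (o != NULL)
            o->refcount = 1;           /* count the creating pointer */
        return o;
    }

    void object_incref(struct object *o) {
        o->refcount++;                 /* a new pointer was set to o */
    }

    void object_decref(struct object *o) {
        if (--o->refcount == 0)        /* last pointer killed */
            free(o);
    }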

Reference counting pros/cons

  • Circular data structures always have ref count > 0
    • No external pointers means lost memory
  • Can do manually w/o PL support, but error-prone
  • Potentially more efficient than real GC
    • No need to halt program to run collector
    • Avoids weird unpredictable latencies
  • Potentially less efficient than real GC
    • With real GC, copying a pointer is cheap
    • With reference counting, must write ref count each time