Treasure Hunt - Engineering | Sep 2024

21 September 2024

https://github.com/JuanCrg90/Clean-Code-Notes

I don’t fully subscribe to clean code, but I found interesting insights in the naming, comments and formatting sections of this repository.
https://newsletter.pragmaticengineer.com/p/stacked-diffs

Big PRs/MRs only bring pain. They result in LGTM code reviews and are a nightmare when they become stale. At some point I started splitting bigger PRs/MRs into smaller ones and chaining them. This way I can release code incrementally with more confidence and get better code reviews. Then I found that what I was doing had a name: “stacked diffs”. Most existing tools target GitHub, but I can replicate the workflow on GitLab by pointing each MR at the branch it depends on. When a branch is merged into master, the MRs that depended on it are retargeted to master automatically.
https://www.mnot.net/cache_docs

I never had to worry about HTTP caching, but having an idea of how things work can shed some light on a rainy day.
https://pilcrowonpaper.com/blog/local-storage-cookies/

I have always been careful with what I store in local storage. In a world full of extensions, there is a high chance one can become compromised. How can we protect our sites from attacks and keep sensitive information private?
https://www.youtube.com/watch?v=-B58GgsehKQ

SEO is a black box. We know some of the things that are taken into account, but the ranking system varies from vendor to vendor and evolves over time. When starting out we should stick with what is known to work, and this video might help with those first steps.

Job Adventures - PDF generation | Jun 2024

23 June 2024

Well, here we are with a new series. This one is called Job Adventures, where I will talk about some challenges I encountered in my day-to-day job.

In this article we will explore PDF generation. This is one of those classic tasks you rarely need to do but when the task eventually arrives, I get PTSD.

My first contact with building PDFs was with Rails, using https://github.com/mileszs/wicked_pdf. The task always seems easy: you just build HTML and render it to PDF. And in fact, rendering the info to the PDF is the easy part. The nightmare comes when implementing what is on the mockups. How will CSS behave in printing mode? What if we have a component that can’t be split across a page break and should jump in its entirety to the next page? What if our cover page does not count towards the page total? What if the cover page does not have a header/footer? Why is the PDF so big?

Some of those problems I had in the past, but at the time I was just rendering tables for a financial report. The main problem I remember having was the CSS part and the long generation time. Because I was not implementing the styling at the time, the CSS part was not really my problem, and I am sure wicked_pdf provides some default styles to help in this part. The long processing times were a problem because we were generating pdfs with over 100 pages, this process would take about 5 min and would get worse if more pdfs were being requested in parallel. I can’t remember what the solution was at the time but I think we ended up generating some pdfs in the background and sending them by email when ready. The wicked_pdf gem uses an instance of https://github.com/wkhtmltopdf/wkhtmltopdf under the hood. This causes problems because it can only generate pdfs one by one. The solution would probably be having a dedicated service that would orchestrate multiple wkhtmltopdf instances.

Jumping to today, I am using Go and my first instinct was to find a binding to wkhtmltopdf and go from there. I remember looking for better alternatives to wicked_pdf at the time and finding none, so I started with what I knew worked. What a surprise it was when I opened the wkhtmltopdf GitHub page and found it archived. Basically, it was based on QtWebKit, which stopped being maintained long ago. You can find a longer explanation here.

After some searching, I found https://github.com/gotenberg/gotenberg. It ticked a lot of boxes.

It is an independent service that communicates via HTTP. I just send the url to the page I want to convert to PDF and receive the pdf back. This way we have an easily scalable service that can be easily integrated with any other system/language.
The same team maintains a docker image, so we don’t need to worry about basic dependencies like headless Chrome or fonts. Just start a container and relax.
It is written in Go, so if needed I can easily open an issue/PR or fork it.
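
A minimal sketch of what that HTTP call could look like from Go. The endpoint path /forms/chromium/convert/url and the url form field are what I recall from the Gotenberg 7/8 documentation, so check them against the version you run. The base URL (port 3000) and the page URL are assumptions for illustration, and the request is only built here, not sent.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
)

// newConvertURLRequest builds (but does not send) a multipart request asking
// Gotenberg to render pageURL as a PDF. baseURL is where the container listens.
func newConvertURLRequest(baseURL, pageURL string) (*http.Request, error) {
	var body bytes.Buffer
	form := multipart.NewWriter(&body)
	if err := form.WriteField("url", pageURL); err != nil {
		return nil, err
	}
	// Close writes the terminating multipart boundary.
	if err := form.Close(); err != nil {
		return nil, err
	}

	// Assumed endpoint: Chromium "convert URL" route of Gotenberg 7/8.
	req, err := http.NewRequest(http.MethodPost, baseURL+"/forms/chromium/convert/url", &body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", form.FormDataContentType())
	return req, nil
}

func main() {
	// Hypothetical addresses, used only for illustration.
	req, err := newConvertURLRequest("http://localhost:3000", "https://example.com/report")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

With a Gotenberg container running, passing the request to http.DefaultClient.Do should return the PDF bytes in the response body, which can then be streamed to a file or to the caller.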

And now you might say, all good. Just create an HTML page and we are done. I wish it were that easy. Now it’s time to answer the questions I asked at the beginning.

How will CSS behave in printing mode? Why is the pdf so big? From what I experienced, there were not many sharp edges. The only thing that caught me off guard was print-color-adjust, which defaults to economy (which makes sense, to use less ink). The first pages I created were mostly text and tables, so there were no problems at that point, until I added a couple of images and, when previewing the print version, the colours were really saturated. In retrospect the solution was easy, but at the time I had no clue whether the problem was with Gotenberg, what property I should change/add or whether it was even possible. The solution was to set print-color-adjust to exact. Just be aware that this is not free: the size of the pdf increased significantly.

What if we have a component that can’t be split across a page break and should jump in its entirety to the next page? What if our cover page does not count towards the page total? What if the cover page does not have a header/footer? By default you can easily add a header and a footer to every page, and the same applies to the page counter. But requirements are rarely that simple. Still, these problems were moderately simple to solve. I disabled the built-in headers and footers and manually implemented a header and a footer component, so I have full control over when they are shown and which pages count.

The big problem came with dynamically sized content. Without an image it can be hard to explain, but some components should not break (charts and content with side images) and others should (tables). Because all these components varied in the amount of info they had, I calculated the pixel height they would occupy and the vertical space left on the page, and chose whether the component should be split or not. This solution was far from perfect and I feel there should be a better one. In hindsight, after exploring more properties like page-break-before, I feel they could have solved many of my issues. Even with this in mind, one of the requirements was to have the table header always present at the top after a page break, and I don’t think the page-break-* properties would help with that.
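
The height-based decision described above can be sketched in Go as follows. This is not the code from the project: the component type, the startPages function and the pixel values are hypothetical, and the sketch assumes every height is already known and the page height is positive.

```go
package main

import "fmt"

// component is a hypothetical block of the report.
type component struct {
	name       string
	height     int  // estimated rendered height in pixels
	splittable bool // tables may break across pages, charts may not
}

// startPages returns the page on which each component starts.
// pageHeight must be greater than zero.
func startPages(pageHeight int, comps []component) []int {
	pages := make([]int, len(comps))
	page, remaining := 1, pageHeight
	for i, c := range comps {
		// A completely full page cannot host the start of anything.
		if remaining == 0 {
			page, remaining = page+1, pageHeight
		}
		// An unsplittable component that does not fit jumps to a fresh page.
		if !c.splittable && c.height > remaining && remaining < pageHeight {
			page, remaining = page+1, pageHeight
		}
		pages[i] = page

		// Consume space, spilling onto the following pages when needed.
		h := c.height
		for h > remaining {
			h -= remaining
			page, remaining = page+1, pageHeight
		}
		remaining -= h
	}
	return pages
}

func main() {
	comps := []component{
		{"intro", 600, true},
		{"chart", 500, false},
		{"table", 1200, true},
		{"chart with side image", 400, false},
	}
	fmt.Println(startPages(1000, comps)) // [1 2 2 4]
}
```

The first chart does not fit in the 400 pixels left on page 1, so it moves to page 2, while the table is allowed to start on page 2 and continue on page 3.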

This feature was developed a couple months ago, so I don’t recall a lot of the issues I had but these were the lessons that stuck with me and that will apply in the next pdf I need to generate (hopefully not soon).

Treasure Hunt - Devops | Feb 2024

10 February 2024

https://unixism.net/loti/async_intro.html

This article is related to the io_uring article. It takes a short tour of Linux async I/O and the different ways of handling I/O. We use this technology without knowing it; for example, most web servers implement this strategy, especially those for interpreted languages with global locks (Python, Ruby) or using an event loop model (Node.js).
https://earthly.dev/blog/chroot

What if docker is just a magic trick, something that seems so complex but becomes quite easy to understand once you lift the curtain? This article lifts that curtain, revealing that docker and containers in general are just a facade on top of chroot.
https://github.com/bibendi/dip + https://github.com/evilmartians/ruby-on-whales

I have tried many times, to some success, to create a dev environment that would have a fast and automated setup and that could run in any OS. I leaned heavily on docker and recently with vscode dev containers on top of that. But I always felt it should be simpler.

That’s where dip comes in. With a Dockerfile and a dip config file (very similar to docker-compose.yml), a docker dev environment is set up, and with the help of the dip CLI a near native/local experience can be achieved. As an example, this is what I have to type to start a rails console that runs inside a container: dip run rails c.

And to further simplify things, we have https://github.com/evilmartians/ruby-on-whales, a rails app template that comes with a default Dockerfile and dip config.
https://biriukov.dev/docs/fd-pipe-session-terminal/0-sre-should-know-about-gnu-linux-shell-related-internals-file-descriptors-pipes-terminals-user-sessions-process-groups-and-daemons/

Deep dive into file descriptors, pipes, processes, sessions, jobs, terminals and pseudoterminals. I haven’t finished this article yet but I can only say good things. It has a clear explanation of the topics with pictures and code to go along.
https://12factor.net/

Best practices on how to build/manage an application. Most of them are intuitive, especially in a cloud world. But it is always important to keep things in perspective: things that are obvious now weren’t in the past.

What is an RTOS?

05 November 2023

An RTOS (Real-Time Operating System) is an OS for critical systems that need to process data and events within a defined time.

In this system, we trade speed for predictability. All processing must occur within the defined constraints. We have the guarantee that task x will run in n time. A task in this context is a set of program instructions loaded into memory.

These systems can be hard (operate within tens of milliseconds or less) or soft (operate within a few hundred milliseconds, at the scale of a human reaction), depending on how predictable they need to be. In a hard RTOS, a late answer is a wrong answer.

An RTOS is also a lot smaller, in the megabyte range instead of the gigabyte range.

Task switching

Tasks are only switched when a higher priority task needs to run, instead of being switched on a regular clock interrupt in a round-robin fashion. On top of switching tasks rarely, an RTOS keeps interrupt latency and thread switching latency to a minimum.
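
As a rough illustration of that rule, here is a small Go sketch of the scheduling decision. It is a toy model, not how any particular RTOS is implemented: the task type, the next function and the convention that a larger number means higher priority are all assumptions.

```go
package main

import "fmt"

// task is a hypothetical schedulable unit.
type task struct {
	name     string
	priority int // larger means more urgent (an assumption of this sketch)
}

// next decides which task should run: the running task keeps the CPU unless
// a ready task has strictly higher priority. Equal priority never preempts,
// so there is no round-robin between peers.
func next(running task, ready []task) task {
	chosen := running
	for _, t := range ready {
		if t.priority > chosen.priority {
			chosen = t
		}
	}
	return chosen
}

func main() {
	running := task{"logger", 1}

	// A task of equal priority becomes ready: no switch happens.
	fmt.Println(next(running, []task{{"telemetry", 1}}).name) // logger

	// A higher priority task becomes ready: it preempts immediately.
	fmt.Println(next(running, []task{{"telemetry", 1}, {"brake", 9}}).name) // brake
}
```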

Interrupts

Only interrupt handlers have a higher priority than tasks and block the highest priority task from running. Because of this, they are typically kept as short as possible. Because they interrupt the running task, the internal OS object database can be in an inconsistent state. To deal with this, an RTOS either disables interrupts while the internal database is being updated (which can cause interrupts to be missed) or creates a top-priority task to run the interrupt handler (which increases latency).

Memory allocation

Memory allocation is especially important because the device should work indefinitely, without ever needing a reboot. For this reason, dynamic memory allocation is frowned upon because it can easily lead to memory leaks. Dynamic allocation and releasing of small chunks of memory will also cause memory fragmentation, which reduces the efficient use of free memory (if a big chunk of memory is requested, it might not be possible to allocate it even though enough free memory is available in total) and increases allocation latency (allocated memory is typically represented as a linked list, so with more fragments, the number of iterations required to find a free memory fragment increases). This is unacceptable in an RTOS since memory allocation has to occur within a certain amount of time.
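
The fragmentation problem can be shown with a tiny simulation. The sketch below models memory as eight fixed-size units with a first-fit allocator; the heap type and its methods are hypothetical and only illustrate the idea, they are not how a real allocator is written.

```go
package main

import "fmt"

// heap models memory as fixed-size units; true means the unit is in use.
type heap []bool

// alloc reserves n (n >= 1) contiguous units using first fit.
// It returns the start index, or -1 when no contiguous run is large enough.
func (h heap) alloc(n int) int {
	run := 0
	for i, used := range h {
		if used {
			run = 0
			continue
		}
		run++
		if run == n {
			start := i - n + 1
			for j := start; j <= i; j++ {
				h[j] = true
			}
			return start
		}
	}
	return -1
}

// free releases n units starting at start.
func (h heap) free(start, n int) {
	for j := start; j < start+n; j++ {
		h[j] = false
	}
}

// freeUnits counts all free units, contiguous or not.
func (h heap) freeUnits() int {
	count := 0
	for _, used := range h {
		if !used {
			count++
		}
	}
	return count
}

// fragmentationDemo fills the heap with one-unit blocks, releases every other
// block and then asks for two contiguous units.
func fragmentationDemo() (free int, bigAlloc int) {
	h := make(heap, 8)
	starts := make([]int, 0, len(h))
	for range h {
		starts = append(starts, h.alloc(1))
	}
	for i := 0; i < len(starts); i += 2 {
		h.free(starts[i], 1)
	}
	return h.freeUnits(), h.alloc(2)
}

func main() {
	free, bigAlloc := fragmentationDemo()
	fmt.Println(free, bigAlloc) // 4 -1
}
```

Four units are free in total, yet a request for two contiguous units fails because no two free units are adjacent.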

Memory swap is not used because mechanical disks have much longer and unpredictable response times.

Where are they used?

Flight display controller
Extraterrestrial rovers
Emergency braking systems
Engine warning systems
Magnetic resonance imaging
Surgery equipment
Factory robotics systems

References

https://en.wikipedia.org/wiki/Real-time_operating_system

https://www.windriver.com/solutions/learning/rtos

https://www.digikey.com/en/maker/projects/what-is-a-realtime-operating-system-rtos/28d8087f53844decafa5000d89608016

What is a FUSE filesystem? | Sep 2023

23 September 2023

This is a follow-up to a previous article exploring/implementing a FUSE filesystem. There is still a lot of work so this will become a series.

Series

Jan 2023
Sep 2023 (this one)

What was done

Started writing tests
Improved filesystem mounting/unmounting flow
Added logging functions
Open - Mark node as open
Write - Write data to a file
Setattr - Change node mode
Getxattr - Get extended attribute
Remove - Remove file or directory

I started by writing some tests, to explore what interfaces I should implement next.

I first tried to mount a unique filesystem in each test, but I started having trouble because I could not unmount the filesystem properly. This would be a huge pain as my tests grew. For now, I start the filesystem manually and run the tests against it.

The fuse library includes a fstestutil package that provides some functions to do exactly what I am trying to do but for some reason, the filesystem server hangs. In the future, I might give starting and mounting a filesystem in each test another try. I am running Linux in a VM, it should not cause problems but you never know. I also found a small bug in this package. Once I get this working I will contribute to the fuse repository.

I started by testing the Write method, beginning with a basic success test that writes to a regular file. Note that the filesystem is mounted at /tmp/fusefs.

t.Run("Success", func(t *testing.T) {
	generatedFile := GenerateTestFile(t, "/tmp/fusefs")
	str := "hello"
	n, err := generatedFile.WriteString(str)
	// Check the error first so a failed write is reported as such
	require.NoError(t, err)
	assert.Equal(t, len(str), n)
})

Then a failure test, trying to write to a read-only file

t.Run("FileIsReadOnly", func(t *testing.T) {
	generatedFile := GenerateTestFile(t, "/tmp/fusefs")
	file, err := os.OpenFile(generatedFile.Name(), os.O_RDONLY, 0)
	require.NoError(t, err)
	t.Cleanup(func() { require.NoError(t, file.Close()) })

	n, err := file.WriteString("hello")
	assert.Zero(t, n)
	require.Error(t, err)
	pathErr, ok := err.(*fs.PathError)
	require.True(t, ok, "err is not *fs.PathError")
	errno, ok := pathErr.Err.(syscall.Errno)
	require.True(t, ok, "err is not syscall.Errno")
	assert.Equal(t, syscall.EBADF, errno)
})

I tried using chmod on the file but realised I needed to implement the fs.NodeSetattrer interface to change the node permissions. I will probably explore node permissions after this series ends.

The fuse.SetattrRequest gives us a lot of fields but we will only use Mode for now. In the fuse source code, there was a comment (“The type of the node is not guaranteed to be sent by the kernel, in which case os.ModeIrregular will be set.”), I am not sure in what cases this could happen so I added an error log. I normally use man pages as a reference to how the function should behave and what error codes it should return. I suppose chmod triggers the method Setattr but could not find any info about this case.

func (n *fuseFSNode) Setattr(ctx context.Context, req *fuse.SetattrRequest, resp *fuse.SetattrResponse) error {
	// NOTE: resp.Attr is filled by the Attr method

	if req.Mode&os.ModeIrregular != 0 {
		Errorf("call to Setattr with mode irregular")
		return nil
	}

	n.Mode = req.Mode
	return nil
}

After this, I followed the filesystem logs and saw a call to the fs.NodeGetxattrer interface. I am not sure what was calling this but I implemented it anyway. After reading the man pages I think implementing it was not necessary, because not all filesystems need to support it (there is an ENOTSUP error code which indicates that xattrs are not supported). I only have a vague idea of what xattrs are, so I will explore them in the future (probably along with node permissions).

func (n fuseFSNode) Getxattr(ctx context.Context, req *fuse.GetxattrRequest, res *fuse.GetxattrResponse) error {
	// NOTE: req.Size is the size of res.Xattr. Size check is performed by fuse library

	if n.Xattrs == nil {
		return syscall.ENODATA
	}

	value, found := n.Xattrs[req.Name]
	if !found {
		return syscall.ENODATA
	}

	res.Xattr = []byte(value)
	return nil
}

Finally, for the test to end successfully, a call to delete the file is needed, so fs.NodeRemover was implemented.

func (n *fuseFSNode) Remove(ctx context.Context, req *fuse.RemoveRequest) error {
	for i, node := range n.Nodes {
		if node.Name == req.Name {
			// TODO: Test if rmdir fills req.Dir
			if req.Dir {
				if !node.Mode.IsDir() {
					return syscall.ENOTDIR
				}
				// rmdir(2): EINVAL when the last path component is "."
				if req.Name == "." {
					return syscall.EINVAL
				}
			} else {
				if node.Mode.IsDir() {
					return syscall.EISDIR
				}
			}

			n.Nodes = append(n.Nodes[:i], n.Nodes[i+1:]...)
			return nil
		}
	}
	return syscall.ENOENT
}

At this point, the test was not returning the expected error. After some digging around I found that the Write method was checking the file OpenFlags. As the name suggests, these flags describe how the file was opened. I just had to open the file in read-only mode to make the test pass. This also made me realise I needed to check the file permissions. I will implement this in the future because I still need to figure out how to obtain the file owner/group and the request owner.

Our first method test is implemented. In the next article I will test the Remove method using rm, rmdir and unlink.

References

https://www.gnu.org/software/libc/manual/html_node/Error-Codes.html

https://man7.org/linux/man-pages/index.html

Treasure Hunt - Engineering | Jul 2023

18 July 2023

This will be the first post of a series of posts I will call Treasure Hunts. In each post, I will showcase 5 items that caught my attention (articles, libraries, any kind of link/reference really). This post will be an Engineering Treasure Hunt, where I list items that are not related to a specific topic. In the future, there will be topic-specific Treasure Hunt series (ruby, linux and containers, databases, go). Hope you enjoy this new format.

https://www.16elt.com/2023/01/06/logging-practices-I-follow

A few months ago I had a production bug that required reading logs to track down the root cause. Unfortunately, the logs were useless. Not only because of the quantity but also the quality. In a sea of logs, we need a way to track what logs belong to the same flow and get something useful from a flow that we can test against (an id, a date, SQL, etc.).

This incident made me think about a better way to do it and experiment while developing the next features.

The article is a great shortcut. It sums up what I had to find out on my own.
https://www.quantamagazine.org/how-to-prove-you-know-a-secret-without-giving-it-away-20221011

I first ran into zero-knowledge proofs while diving into the world of blockchains. But they are not only applicable in that space. If you think about a system, many proofs exist: authenticating a user, verifying that a user owns an asset, and anything else that a user needs to prove.

This article explains this topic while keeping things beginner friendly.
https://theconversation.com/how-to-test-if-were-living-in-a-computer-simulation-194929

If you have ever thought about our existence, this article might interest you. It gives a really interesting take on the simulation hypothesis (aka we live in a computer simulation).
https://github.com/alex/what-happens-when

Explains what happens when we search in a web browser, from the moment the “g” key is pressed until the end of the first browser paint.
https://endoflife.date

Sometimes it can be hard to know when a version of a package/service has reached the end of life or when the day will come. Recently, I found this site that gives us these important dates. This way, we can plan our upgrades instead of stressing out when a warning appears.

What is io_uring?

07 April 2023

io_uring is a new asynchronous I/O API for Linux created by Jens Axboe from Facebook.

It aims to provide an API without the limitations of similar interfaces:

read(2)/write(2) are synchronous
aio_read(3)/aio_write(3) provide asynchronous functionality, but only support files opened with O_DIRECT or in unbuffered mode
select(2)/poll(2)/epoll(7) work well with sockets but do not behave as expected with regular files (always “ready”)

To have a more consistent API across file descriptors (sockets and regular files) we can use libuv (I will probably explore it in the future) or liburing/io_uring (the star of the show).

How does it work?

As the name suggests, it uses ring buffers as the main interface for kernel-user space communication.

There are two ring buffers, one for submission of requests (submission queue or SQ) and the other that informs you about completion of those requests (completion queue or CQ).

These ring buffers are shared between kernel and user space.

Set ring buffers up with io_uring_setup() and then map them into user space with two mmap(2) calls
Create a submission queue entry (SQE) describing what operation you want to perform (read or write a file, accept client connections, etc.) and add it to SQ
Call io_uring_enter() syscall to signal SQEs are ready to be processed
1. Multiple SQEs can be added before making the syscall
2. io_uring_enter() can also wait for requests to be processed by the kernel before it returns, so you know you’re ready to read off the completion queue for results
Requests are processed by the kernel and completion queue events (CQEs) are added to the CQ
Read CQEs off the head of the completion queue ring buffer. There is one CQE corresponding to each SQE and it contains the status of that particular request

Ordering in the CQ may not correspond to the request order in the SQ. This may happen because all requests are performed in parallel, and their results will be added to the CQ as they become available. This is done for performance reasons. If a file is on an HDD and another on an SSD, we don’t want the HDD request to block the faster SSD request.
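
The flow above can be imitated with a small in-process simulation. To be clear, the Go sketch below does not call io_uring at all: the ring type, fakeKernel and the latency field are made up for illustration. The only detail borrowed from the real interface is the user_data idea, a caller-chosen tag copied from each SQE into its CQE so that out-of-order completions can be matched to their requests.

```go
package main

import (
	"fmt"
	"sort"
)

// sqe is a simulated submission queue entry.
type sqe struct {
	userData uint64 // caller-chosen tag copied into the matching CQE
	latency  int    // simulated device latency, not a real io_uring field
}

// cqe is a simulated completion queue entry.
type cqe struct {
	userData uint64
	res      int32 // result code; 0 means success in this simulation
}

// ring is a fixed-capacity circular queue indexed by free-running counters.
// The capacity must be a power of two so wrap-around is a cheap mask.
type ring[T any] struct {
	entries    []T
	head, tail uint32 // head: next slot to read, tail: next slot to write
}

func newRing[T any](size uint32) *ring[T] {
	return &ring[T]{entries: make([]T, size)}
}

func (r *ring[T]) push(v T) bool {
	if r.tail-r.head == uint32(len(r.entries)) {
		return false // full
	}
	r.entries[r.tail&uint32(len(r.entries)-1)] = v
	r.tail++
	return true
}

func (r *ring[T]) pop() (T, bool) {
	var zero T
	if r.head == r.tail {
		return zero, false // empty
	}
	v := r.entries[r.head&uint32(len(r.entries)-1)]
	r.head++
	return v, true
}

// fakeKernel drains the SQ and posts completions fastest-first, standing in
// for the kernel processing requests in parallel.
func fakeKernel(sq *ring[sqe], cq *ring[cqe]) {
	var pending []sqe
	for e, ok := sq.pop(); ok; e, ok = sq.pop() {
		pending = append(pending, e)
	}
	sort.SliceStable(pending, func(i, j int) bool {
		return pending[i].latency < pending[j].latency
	})
	for _, e := range pending {
		cq.push(cqe{userData: e.userData})
	}
}

// completionOrder submits three requests and returns the order of their tags
// as read from the completion queue.
func completionOrder() []uint64 {
	sq, cq := newRing[sqe](4), newRing[cqe](4)
	sq.push(sqe{userData: 1, latency: 9}) // e.g. a read from an HDD
	sq.push(sqe{userData: 2, latency: 1}) // e.g. a read from an SSD
	sq.push(sqe{userData: 3, latency: 4})
	fakeKernel(sq, cq) // stands in for the io_uring_enter() call

	var order []uint64
	for c, ok := cq.pop(); ok; c, ok = cq.pop() {
		order = append(order, c.userData)
	}
	return order
}

func main() {
	fmt.Println(completionOrder()) // [2 3 1]
}
```

The slow request tagged 1 was submitted first but completes last, which is why a real program has to rely on the tag rather than on the position in the queue.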

There is a polling mode available, in which the kernel polls for new entries in the submission queue. This avoids the syscall overhead of calling io_uring_enter() every time you submit entries for processing.

Because of the shared ring buffers between the kernel and user space, io_uring can be a zero-copy system.

How to use it?

Most sources indicate that the kernel interface was adopted in Linux kernel version 5.1. But from what I saw in the Linux git tree, linux/io_uring is only present from Linux 6.0 (does anyone know where it might be declared in previous versions?).

There is also a liburing library that provides an API to interact with the kernel interface easily from userspace.

I will eventually try to interact with io_uring using Go, so keep an eye on future articles if that interests you.

References

https://en.wikipedia.org/wiki/Io_uring

https://unixism.net/loti/index.html

What is a FUSE filesystem? - Jan 2023

05 January 2023

Filesystem in USErspace (FUSE) is a software interface for Unix and Unix-like computer operating systems that lets non-privileged users create their own file systems without editing kernel code. This is achieved by running file system code in user space while the FUSE module provides only a bridge to the actual kernel interfaces.

FUSE is available for Linux, FreeBSD, OpenBSD, NetBSD, OpenSolaris, Minix 3, macOS, and Windows.

How does it work?

To implement a new file system, a handler program should use the libfuse library. This handler program should implement the required methods.

When the filesystem is mounted, the handler is registered with the kernel. Now, when a user calls an operation on this filesystem, the kernel will proxy these requests to the handler.

FUSE is particularly useful for writing virtual filesystems. Unlike traditional filesystems that essentially work with data on mass storage, virtual filesystems don’t actually store data themselves. They act as a view or translation of an existing filesystem or storage device.

In principle, any resource available to a FUSE implementation can be exported as a file system.

Where is this used?

Check these pages for great examples of where FUSE is used.

https://en.wikipedia.org/wiki/Filesystem_in_Userspace#Applications

https://wiki.archlinux.org/title/FUSE

Basic implementation

https://github.com/Goamaral/fuse-filesystem

For this first implementation I used Go. After reviewing some solutions I decided to use https://github.com/bazil/fuse. It seemed to be the easiest way to prototype.

This library implements the communication with the kernel from scratch in Go (without using libfuse) and enables an incremental implementation of a custom filesystem. It takes advantage of interfaces and if the implementation does not implement an interface (does not have a method), it has a fallback.
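
The “optional interface with a fallback” pattern can be sketched in plain Go. This is a generic illustration and not the code of the fuse library: node, remover, handleRemove and errNotSupported are hypothetical names chosen for this example.

```go
package main

import (
	"errors"
	"fmt"
)

var errNotSupported = errors.New("operation not supported")

// node is the one interface every filesystem node must implement.
type node interface {
	Name() string
}

// remover is optional: a node implements it only if it supports removal.
type remover interface {
	Remove(name string) error
}

// readOnlyDir implements only the mandatory interface.
type readOnlyDir struct{}

func (readOnlyDir) Name() string { return "ro" }

// dir additionally implements remover.
type dir struct{ children map[string]bool }

func (d *dir) Name() string { return "rw" }

func (d *dir) Remove(name string) error {
	delete(d.children, name)
	return nil
}

// handleRemove is what a server loop might do: use the optional method when
// the node has it, otherwise fall back to a default answer.
func handleRemove(n node, name string) error {
	if r, ok := n.(remover); ok {
		return r.Remove(name)
	}
	return errNotSupported
}

func main() {
	fmt.Println(handleRemove(readOnlyDir{}, "file1"))
	fmt.Println(handleRemove(&dir{children: map[string]bool{"file1": true}}, "file1"))
}
```

This is what makes an incremental implementation possible: a filesystem starts with the mandatory methods and gains features simply by adding methods to its node type.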

My goal for this implementation was to be able to list directory contents, create files and create directories.

I encourage you to check the code, it always seems harder before implementing.

Implemented interfaces

type Node interface {
	Attr(ctx context.Context, attr *fuse.Attr) error
}

Get the file/directory attributes (permissions, ownership, size, …)

type NodeStringLookuper interface {
	Lookup(ctx context.Context, name string) (Node, error)
}

Lookup file/directory by name inside a file/directory (of course, looking up anything inside a file should return an error)

type NodeCreater interface {
	Create(ctx context.Context, req *fuse.CreateRequest, resp *fuse.CreateResponse) (Node, Handle, error)
}

Creates a file (not sure if it can create directories)

type HandleReadDirAller interface {
	ReadDirAll(ctx context.Context) ([]fuse.Dirent, error)
}

List files and directories inside a directory

type NodeMkdirer interface {
	Mkdir(ctx context.Context, req *fuse.MkdirRequest) (Node, error)
}

Create directory

How do I unmount the FUSE filesystem?

$ fusermount3 -u MOUNTPOINT

What’s next?

I will definitely continue implementing more interfaces, like writing to and reading from a file and getting the file size, and explore the POSIX syscalls to find new features.

After that I will probably implement the same but in C (with libfuse probably) and register the handler in the kernel.

References

https://en.wikipedia.org/wiki/Filesystem_in_Userspace

https://wiki.archlinux.org/title/FUSE

https://man7.org/linux/man-pages/man3/errno.3.html

https://github.com/libfuse/libfuse/wiki/FAQ

https://github.com/bazil/fuse

What are linux inodes?

06 December 2022

An inode (index node) exists for every file and directory in the filesystem. Inodes do not store the actual data. Instead, they store metadata, including where to find the storage blocks holding each file’s data.

Metadata

File type
Permissions
Owner ID
Group ID
Size of file
Time last accessed
Time last modified
Soft/Hard Links
Access Control List (ACLs)

How to check inode information?

$ stat /bin/gcc
  File: /bin/gcc
  Size: 956032    	Blocks: 1872       IO Block: 4096   regular file
Device: 8,1	Inode: 4993952     Links: 3
Access: (0755/-rwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2022-09-13 00:03:33.000000000 +0100
Modify: 2022-08-20 02:12:31.000000000 +0100
Change: 2022-09-13 00:03:33.715344081 +0100
 Birth: 2022-09-13 00:03:33.705344003 +0100

$ ls -i /bin/gcc
4993952 /bin/gcc

How to check the inode usage on filesystems?

$ df -i
Filesystem      Inodes   IUsed   IFree IUse% Mounted on
dev             878882     564  878318    1% /dev
run             880930     941  879989    1% /run
/dev/sda1      6553600 1260316 5293284   20% /
tmpfs           880930     271  880659    1% /dev/shm
tmpfs           880930      60  880870    1% /tmp
tmpfs           176186     129  176057    1% /run/user/1000

What happens to the inode assigned when moving or copying a file?

When you copy a file, Linux assigns a different inode to the new file.

$ touch file1
$ ls -i file1
2674409 file1
$ cp file1 file2
$ ls -i
2674409 file1  2674178 file2

When moving a file, the inode remains the same, as long as the file does not change filesystems.

$ touch file1
$ ls -i file1
2674409 file1
$ mkdir dir
$ mv file1 dir/
$ ls -i dir/file1
2674409 dir/file1

If we change filesystems, the inode changes.

$ touch file1
$ ls -i file1
2674409 file1
$ mv file1 /run/media/architect/123253A832538F99
$ ls -i /run/media/architect/123253A832538F99/file1
37316 /run/media/architect/123253A832538F99/file1

Hard links connect directly to the same inode. Soft links create a new inode.

$ touch file1
$ ls -i file1
2674409 file1
$ ln file1 file1_hl
$ ls -i file1_hl
2674409 file1_hl
$ ln -s file1 file1_sl
$ ls -i file1_sl
2674103 file1_sl
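
The same comparison can be made from Go. The sketch below assumes a Linux (or other Unix-like) system where os.FileInfo.Sys() returns a *syscall.Stat_t, and a temporary directory on a filesystem that supports hard and soft links; inodeOf and compareLinks are hypothetical helper names.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// inodeOf returns the inode number of path itself (symlinks are not followed).
func inodeOf(path string) (uint64, error) {
	info, err := os.Lstat(path)
	if err != nil {
		return 0, err
	}
	st, ok := info.Sys().(*syscall.Stat_t)
	if !ok {
		return 0, fmt.Errorf("no inode information for %s", path)
	}
	return uint64(st.Ino), nil
}

// compareLinks creates a file, a hard link and a soft link in a temporary
// directory and reports whether each link shares the inode of the file.
func compareLinks() (hardSame, softSame bool, err error) {
	dir, err := os.MkdirTemp("", "inodes")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	file := filepath.Join(dir, "file1")
	hard := filepath.Join(dir, "file1_hl")
	soft := filepath.Join(dir, "file1_sl")

	if err := os.WriteFile(file, nil, 0o644); err != nil {
		return false, false, err
	}
	if err := os.Link(file, hard); err != nil { // like: ln file1 file1_hl
		return false, false, err
	}
	if err := os.Symlink(file, soft); err != nil { // like: ln -s file1 file1_sl
		return false, false, err
	}

	fileIno, err := inodeOf(file)
	if err != nil {
		return false, false, err
	}
	hardIno, err := inodeOf(hard)
	if err != nil {
		return false, false, err
	}
	softIno, err := inodeOf(soft)
	if err != nil {
		return false, false, err
	}
	return fileIno == hardIno, fileIno == softIno, nil
}

func main() {
	hardSame, softSame, err := compareLinks()
	if err != nil {
		panic(err)
	}
	fmt.Println("hard link shares inode:", hardSame)
	fmt.Println("soft link shares inode:", softSame)
}
```

os.Lstat is used instead of os.Stat because os.Stat follows the soft link and would report the inode of the target file.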

What is the maximum inode value?

In the kernel source code it is coded as a 32-bit unsigned long integer, so the theoretical maximum value would be 2³² − 1 (4,294,967,295).

That’s the theoretical maximum. In practice, the number of inodes in an ext4 file system is determined when the file system is created at a default ratio of one inode per 16 KB of file system capacity. Directory structures are created on the fly when the file system is in use, as files and directories are created within the file system.

References

https://docs.rackspace.com/support/how-to/what-are-inodes-in-linux/

https://www.howtogeek.com/465350/everything-you-ever-wanted-to-know-about-inodes-on-linux/

https://www.site24x7.com/learn/linux/inode.html

https://en.wikipedia.org/wiki/Inode

What is an ELF file?

20 November 2022

(The information in this article might be incomplete; I only include information I understood or considered most relevant. Please visit the references for more information.)

ELF (Executable and Linkable Format) is a file format used for executable files, object code, shared libraries, and core dumps.

By design, the ELF format is flexible, extensible, and cross-platform. For instance, it supports different endiannesses and address sizes, so it does not exclude any particular CPU or instruction set architecture.

File layout

ELF file

Each ELF file is made up of one ELF header, followed by file data. The data can include:

Program header table (PHT), describing zero or more memory segments
Section header table (SHT), describing zero or more sections
Data referred to by entries in the program header table or section header table

The segments contain information that is needed for run time execution of the file, while sections contain important data for linking and relocation.

ELF header

32/64 bit format
endianness
target ABI
file type (relocatable, executable, shared, core, others…)
instruction set
entry point address
program header address
section header address
size of this header
program header table entry size
program header table entry count
section header table entry size
section header table entry count
index of section entry that contains the section names

Program header

type
offset of the segment in the file image
virtual address of the segment in memory
size in bytes of the segment in the file image
size in bytes of the segment in memory

Section header

name
type
virtual address of the section in memory
offset of the section in the file image
size in bytes of the section in the file image

How to check ELF file content?

ELF header

$ readelf -h /bin/gcc

ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x4077a0
  Start of program headers:          64 (bytes into file)
  Start of section headers:          954048 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         14
  Size of section headers:           64 (bytes)
  Number of section headers:         31
  Section header string table index: 30
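
The sizes and counts in this header are enough to work out where both tables sit in the file. A quick worked example with the numbers copied from the output above (the variable names follow the elf(5) field names):

```python
# Values copied from the readelf -h output above.
e_phoff, e_phentsize, e_phnum = 64, 56, 14
e_shoff, e_shentsize, e_shnum = 954048, 64, 31

# Each table is just entry size times entry count, starting at its offset.
ph_table_size = e_phentsize * e_phnum
ph_table_end = e_phoff + ph_table_size
sh_table_size = e_shentsize * e_shnum
sh_table_end = e_shoff + sh_table_size

print(hex(ph_table_size))  # 0x310
print(hex(ph_table_end))   # 0x350
print(sh_table_size)       # 1984
print(sh_table_end)        # 956032
```

The program header table starts right after the 64-byte ELF header and takes 0x310 bytes, so whatever comes next in the file can start at offset 0x350.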

Program headers

$ readelf -l /usr/bin/gcc

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                 0x0000000000000310 0x0000000000000310  R      0x8
  INTERP         0x0000000000000350 0x0000000000400350 0x0000000000400350
                 0x000000000000001c 0x000000000000001c  R      0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x0000000000002538 0x0000000000002538  R      0x1000
  LOAD           0x0000000000003000 0x0000000000403000 0x0000000000403000
                 0x0000000000061301 0x0000000000061301  R E    0x1000
  LOAD           0x0000000000065000 0x0000000000465000 0x0000000000465000
                 0x0000000000080bf8 0x0000000000080bf8  R      0x1000
  LOAD           0x00000000000e6a40 0x00000000004e6a40 0x00000000004e6a40
                 0x00000000000021e8 0x0000000000005a60  RW     0x1000
  DYNAMIC        0x00000000000e7a38 0x00000000004e7a38 0x00000000004e7a38
                 0x00000000000001c0 0x00000000000001c0  RW     0x8
  NOTE           0x0000000000000370 0x0000000000400370 0x0000000000400370
                 0x0000000000000050 0x0000000000000050  R      0x8
  NOTE           0x00000000000003c0 0x00000000004003c0 0x00000000004003c0
                 0x0000000000000044 0x0000000000000044  R      0x4
  TLS            0x00000000000e6a40 0x00000000004e6a40 0x00000000004e6a40
                 0x0000000000000000 0x0000000000000010  R      0x8
  GNU_PROPERTY   0x0000000000000370 0x0000000000400370 0x0000000000400370
                 0x0000000000000050 0x0000000000000050  R      0x8
  GNU_EH_FRAME   0x00000000000da014 0x00000000004da014 0x00000000004da014
                 0x0000000000001acc 0x0000000000001acc  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x00000000000e6a40 0x00000000004e6a40 0x00000000004e6a40
                 0x00000000000015c0 0x00000000000015c0  R      0x1
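
Notice that the RW LOAD segment has a MemSiz bigger than its FileSiz. The extra bytes do not exist in the file; the loader fills them with zeros, which is typically where uninitialized data (.bss) ends up. A small worked example with the values copied from that line of the output:

```python
# Values copied from the RW LOAD segment in the readelf -l output above.
vaddr = 0x4E6A40
file_size = 0x21E8
mem_size = 0x5A60

# Bytes that exist only in memory and are zero-filled by the loader.
zero_filled = mem_size - file_size
first_zero_addr = vaddr + file_size
segment_end = vaddr + mem_size

print(hex(zero_filled))      # 0x3878
print(hex(first_zero_addr))  # 0x4e8c28
print(hex(segment_end))      # 0x4ec4a0
```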

Section headers

$ readelf -S /usr/bin/gcc

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .interp           PROGBITS         0000000000400350  00000350
       000000000000001c  0000000000000000   A       0     0     1
  [ 2] .note.gnu.pr[...] NOTE             0000000000400370  00000370
       0000000000000050  0000000000000000   A       0     0     8
  [ 3] .note.gnu.bu[...] NOTE             00000000004003c0  000003c0
       0000000000000024  0000000000000000   A       0     0     4
  [ 4] .note.ABI-tag     NOTE             00000000004003e4  000003e4
       0000000000000020  0000000000000000   A       0     0     4
  [ 5] .gnu.hash         GNU_HASH         0000000000400408  00000408
       00000000000000a4  0000000000000000   A       6     0     8
  [ 6] .dynsym           DYNSYM           00000000004004b0  000004b0
       0000000000000cd8  0000000000000018   A       7     1     8
  [ 7] .dynstr           STRTAB           0000000000401188  00001188
       0000000000000591  0000000000000000   A       0     0     1
  [ 8] .gnu.version      VERSYM           000000000040171a  0000171a
       0000000000000112  0000000000000002   A       6     0     2
  [ 9] .gnu.version_r    VERNEED          0000000000401830  00001830
       00000000000000f0  0000000000000000   A       7     2     8
  [10] .rela.dyn         RELA             0000000000401920  00001920
       0000000000000c18  0000000000000018   A       6     0     8
  [11] .init             PROGBITS         0000000000403000  00003000
       000000000000001b  0000000000000000  AX       0     0     4
  [12] .text             PROGBITS         0000000000403020  00003020
       00000000000612d3  0000000000000000  AX       0     0     16
  [13] .fini             PROGBITS         00000000004642f4  000642f4
       000000000000000d  0000000000000000  AX       0     0     4
  [14] .rodata           PROGBITS         0000000000465000  00065000
       0000000000075010  0000000000000000   A       0     0     32
  [15] .stapsdt.base     PROGBITS         00000000004da010  000da010
       0000000000000001  0000000000000000   A       0     0     1
  [16] .eh_frame_hdr     PROGBITS         00000000004da014  000da014
       0000000000001acc  0000000000000000   A       0     0     4
  [17] .eh_frame         PROGBITS         00000000004dbae0  000dbae0
       000000000000a048  0000000000000000   A       0     0     8
  [18] .gcc_except_table PROGBITS         00000000004e5b28  000e5b28
       00000000000000d0  0000000000000000   A       0     0     4
  [19] .tbss             NOBITS           00000000004e6a40  000e6a40
       0000000000000010  0000000000000000 WAT       0     0     8
  [20] .init_array       INIT_ARRAY       00000000004e6a40  000e6a40
       0000000000000018  0000000000000008  WA       0     0     8
  [21] .fini_array       FINI_ARRAY       00000000004e6a58  000e6a58
       0000000000000008  0000000000000008  WA       0     0     8
  [22] .data.rel.ro      PROGBITS         00000000004e6a60  000e6a60
       0000000000000fd8  0000000000000000  WA       0     0     32
  [23] .dynamic          DYNAMIC          00000000004e7a38  000e7a38
       00000000000001c0  0000000000000010  WA       7     0     8
  [24] .got              PROGBITS         00000000004e7bf8  000e7bf8
       0000000000000400  0000000000000008  WA       0     0     8
  [25] .data             PROGBITS         00000000004e8000  000e8000
       0000000000000c28  0000000000000000  WA       0     0     32
  [26] .bss              NOBITS           00000000004e8c40  000e8c28
       0000000000003860  0000000000000000  WA       0     0     32
  [27] .comment          PROGBITS         0000000000000000  000e8c28
       0000000000000012  0000000000000001  MS       0     0     1
  [28] .note.stapsdt     NOTE             0000000000000000  000e8c3c
       0000000000000130  0000000000000000           0     0     4
  [29] .gnu_debuglink    PROGBITS         0000000000000000  000e8d6c
       0000000000000010  0000000000000000           0     0     4
  [30] .shstrtab         STRTAB           0000000000000000  000e8d7c
       0000000000000143  0000000000000000           0     0     1
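
Sections and segments are two views of the same bytes, so each allocated section should fall inside one of the LOAD segments. Here is a rough sketch that checks this for a handful of sections, using addresses and sizes copied from the two outputs above. The function name containing_segment is made up.

```python
# LOAD segments from the readelf -l output: (VirtAddr, MemSiz, Flags).
LOAD_SEGMENTS = [
    (0x400000, 0x2538, "R"),
    (0x403000, 0x61301, "R E"),
    (0x465000, 0x80BF8, "R"),
    (0x4E6A40, 0x5A60, "RW"),
]

# A few sections from the readelf -S output: name -> (Address, Size).
SECTIONS = {
    ".interp": (0x400350, 0x1C),
    ".text": (0x403020, 0x612D3),
    ".rodata": (0x465000, 0x75010),
    ".data": (0x4E8000, 0xC28),
    ".bss": (0x4E8C40, 0x3860),
}


def containing_segment(addr, size):
    """Return the flags of the LOAD segment that fully contains the range."""
    for vaddr, memsz, flags in LOAD_SEGMENTS:
        if vaddr <= addr and addr + size <= vaddr + memsz:
            return flags
    return None


for name, (addr, size) in SECTIONS.items():
    print(f"{name:10} -> {containing_segment(addr, size)}")
```

The code in .text lands in the only executable segment, while .data and .bss land in the writable one.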

Everything

$ readelf -a /usr/bin/gcc
$ objdump -x /usr/bin/gcc
$ file /usr/bin/gcc

References

https://en.wikipedia.org/wiki/Executable_and_Linkable_Format

https://linuxhint.com/understanding_elf_file_format

https://man7.org/linux/man-pages/man5/elf.5.html

Static vs dynamic linking

18 November 2022

What is static linking?

Static linking resolves libraries at build time, copying the library code the program needs into the final binary.

What is dynamic linking?

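Dynamic linking defers loading and linking of libraries to runtime: the dynamic loader maps the shared libraries into memory when the program starts.

Only the names of the shared libraries are saved at build time.

These names are stored as DT_NEEDED entries in the .dynamic section. Calls into the libraries go through the PLT (Procedure Linkage Table), whose entries are resolved to real addresses at runtime.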

Static vs dynamic linking

Static

Bigger binaries

Dynamic

Depends on external libraries being installed and compatible
Shared library code is mapped once in memory and shared across processes
Shared library code can be updated/patched without recompiling the program
Updates to shared library code can introduce breaking changes and prevent the program from running

How to create a statically linked binary?

$ ld [options] objfile

ld combines several object and archive files, relocates their data and ties up symbol references. Usually, the last step in compiling a program is to run ld.

$ gcc hello.c -static -o hello

How to create a dynamically linked binary?

$ gcc hello.c -o hello

How to know if a binary is statically or dynamically linked?

Check the type of linking

$ file /usr/bin/gcc

/usr/bin/gcc: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=017fc52acbca077c9bc6a4e8f04dd90eb5385243, for GNU/Linux 4.4.0, stripped
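
The "interpreter" part of that output comes from the INTERP program header. As a simplified sketch (this is an assumption about a reasonable check, not necessarily what file does internally), we can treat a binary as dynamically linked when its program header table contains an INTERP entry. It assumes a little-endian ELF64 file; fake_elf is a made-up helper that builds tiny in-memory files so the example runs anywhere.

```python
import struct

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr, 56 bytes
PT_LOAD, PT_INTERP = 1, 3


def segment_types(data: bytes) -> list:
    """Return the p_type of every program header in the file."""
    fields = EHDR.unpack_from(data, 0)
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]
    # p_type is the first 4 bytes of each program header entry.
    return [
        struct.unpack_from("<I", data, e_phoff + i * e_phentsize)[0]
        for i in range(e_phnum)
    ]


def is_dynamically_linked(data: bytes) -> bool:
    """Simplification: dynamic means the file asks for an interpreter."""
    return PT_INTERP in segment_types(data)


def fake_elf(types) -> bytes:
    """Build a minimal header plus empty program headers (illustration only)."""
    ident = b"\x7fELF\x02\x01\x01" + bytes(9)
    header = EHDR.pack(ident, 2, 62, 1, 0, EHDR.size, 0, 0,
                       EHDR.size, PHDR.size, len(types), 0, 0, 0)
    return header + b"".join(PHDR.pack(t, 0, 0, 0, 0, 0, 0, 0) for t in types)


print(is_dynamically_linked(fake_elf([PT_INTERP, PT_LOAD])))
print(is_dynamically_linked(fake_elf([PT_LOAD])))
```

To try it on a real binary, read the whole file in binary mode and pass the bytes to is_dynamically_linked.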

Check dynamically linked libraries

$ ldd /usr/bin/gcc

linux-vdso.so.1 (0x00007fff6377e000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007fcd238f2000)
/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007fcd23b02000)
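
If you need this information in a script, the output is easy to parse. A minimal sketch, assuming the output format shown above (the function name parse_ldd is made up, and the text is pasted in rather than obtained by running ldd):

```python
# Output copied from the ldd run above.
LDD_OUTPUT = """\
linux-vdso.so.1 (0x00007fff6377e000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007fcd238f2000)
/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007fcd23b02000)
"""


def parse_ldd(output: str) -> dict:
    """Map each library name to the path it resolved to (None if no path)."""
    libs = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop the trailing load address, e.g. " (0x00007f...)".
        left, sep, _address = line.rpartition(" (")
        if not sep:
            left = line
        name, arrow, path = left.partition(" => ")
        libs[name] = path if arrow else None
    return libs


libs = parse_ldd(LDD_OUTPUT)
for name, path in libs.items():
    print(name, "->", path)
```

linux-vdso.so.1 has no path because it is provided by the kernel instead of a file on disk.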

Goa's Blog

In this blog you can find articles related to my passions, mainly computers.
