Malicious website detection on Splunk using malware-filter

A guide on using malware-filter lookups

  1. Lookup file locations
  2. inputlookup basics
  3. Search for specific events
  4. Wildcard
    1. Wildcard prefix
  5. Matching multiple fields
    1. Matching individual and multiple fields
  6. Case-insensitive
  7. CIDR matching
  8. inputlookup + lookup
  9. Lookup definition
    1. Case-sensitive
    2. Wildcard (lookup)
    3. CIDR-matching (lookup)

The Splunk Add-on for malware-filter includes the following CSV files:

  • botnet-filter-splunk.csv
  • botnet_ip.csv
  • opendbl_ip.csv
  • phishing-filter-splunk.csv
  • pup-filter-splunk.csv
  • urlhaus-filter-splunk-online.csv
  • vn-badsite-filter-splunk.csv

These CSV files can be used as lookups to find potentially malicious traffic. They contain a list of bad IPs/domains/URLs and we are going to look for those values in the events.

We can view the content of a lookup file by using inputlookup. When using that command, there should always be a leading pipe character “|” because it is an event-generating command.

Lookup file locations

A lookup file can be uploaded via Splunk Web or created directly in one of the following locations:

  • $SPLUNK_HOME/etc/users/<username>/<app_name>/lookups/
  • $SPLUNK_HOME/etc/apps/<app_name>/lookups/
  • $SPLUNK_HOME/etc/system/lookups/

In Splunk Web, setting the permission to app-sharing or global-sharing automatically moves the file to the second or third location respectively. An uploaded lookup file can be used straight away, without reloading the app or restarting Splunk, regardless of how it was created.

inputlookup basics

| inputlookup botnet_ip.csv

_time field is omitted for brevity.

first_seen_utc       dst_ip   dst_port  c2_status  last_online  malware  updated
2021-05-16 19:49:33  1.2.3.4  1234      online     2023-03-05   Lorem    2023-03-04T16:41:17Z

The output is no different from any other event, so we can specify which fields to display and then rename them.

| inputlookup botnet_ip.csv | fields dst_ip | rename dst_ip AS dst
dst
1.2.3.4
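
Because the rows behave like events, they can also be filtered and aggregated with ordinary search commands. The following is a minimal sketch that uses only the c2_status and malware fields shown in the example table above; the counts depend on the contents of the lookup file at the time of the search.

```spl
| inputlookup botnet_ip.csv
| search c2_status="online"
| stats count BY malware
| sort - count
```

This lists each malware family with the number of online entries, largest first.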

Search for specific events

Example firewall events:

index=firewall
src          src_port  dst      action
192.168.1.5  45454     1.2.3.4  allowed
192.168.1.3  45452     7.6.5.4  allowed
192.168.1.4  45457     4.3.2.1  allowed
192.168.1.6  45451     7.7.5.5  allowed

Notice that the first row's dst value matches the dst_ip value of the example lookup table shown in the previous section.

To match the dst value of the firewall events against dst_ip of the lookup file, use a subsearch with inputlookup. In this example, the subsearch extracts only the dst_ip field and renames it to dst so that it matches the field name used in the firewall events.

index=firewall [| inputlookup botnet_ip.csv | fields dst_ip | rename dst_ip AS dst]
src          src_port  dst      action
192.168.1.5  45454     1.2.3.4  allowed

To display events in table format, append | table *
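
To see what the subsearch actually hands to the outer search, it can be run on its own with the format command appended. This is a sketch for inspection only, not something the add-on requires.

```spl
| inputlookup botnet_ip.csv
| fields dst_ip
| rename dst_ip AS dst
| format
```

With the single-row example lookup above, the result should be one row whose search field looks roughly like ( ( dst="1.2.3.4" ) ). Additional lookup rows are joined with OR, and that expression is what gets substituted into the outer index=firewall search.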

Wildcard

An asterisk (*) in a lookup file value works as a wildcard.

index=proxy
src          url            dst_port
192.168.1.5  foo.com/path1  443
192.168.1.3  foo.com/path2  443
192.168.1.4  bar.com/path3  443

The lookup files do not include a wildcard affix.

| inputlookup urlhaus-filter-splunk-online.csv
host     path  message                                    updated
foo.com        urlhaus-filter malicious website detected  2023-03-13T00:11:20Z

The add-on includes the geturlhausfilter command, along with other commands that update their respective lookup files. Those commands have a wildcard_suffix argument that appends a wildcard to the values of the specified field.

| geturlhausfilter wildcard_suffix=host
| outputlookup override_if_empty=false urlhaus-filter-splunk-online.csv
host     path  message                                    updated               host_wildcard_suffix
foo.com        urlhaus-filter malicious website detected  2023-03-13T00:11:20Z  foo.com*

index=proxy [| inputlookup urlhaus-filter-splunk-online.csv | fields host_wildcard_suffix | rename host_wildcard_suffix AS url ]
src          url            dst_port
192.168.1.5  foo.com/path1  443
192.168.1.3  foo.com/path2  443
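
For the single-row lookup in this example, the subsearch should behave like the hand-written search below. It is shown only to make the wildcard expansion visible; in practice the lookup file holds far more rows than one would type by hand.

```spl
index=proxy url="foo.com*"
| table src, url, dst_port
```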

Wildcard prefix

The previous section showed an example using a wildcard suffix (“foo.com*”). A wildcard also works as a prefix (“*foo.com”) or even in the middle (“f*o.com”), though these are discouraged.

index=proxy
src          domain         dst_port
192.168.1.5  foo.com        443
192.168.1.3  lorem.foo.com  443
192.168.1.4  bar.com        443

| geturlhausfilter wildcard_prefix=host
| outputlookup override_if_empty=false urlhaus-filter-splunk-online.csv
host     path  message                                    updated               host_wildcard_prefix
foo.com        urlhaus-filter malicious website detected  2023-03-13T00:11:20Z  *foo.com

index=proxy [| inputlookup urlhaus-filter-splunk-online.csv | fields host_wildcard_prefix | rename host_wildcard_prefix AS domain ]
src          domain         dst_port
192.168.1.5  foo.com        443
192.168.1.3  lorem.foo.com  443

Matching multiple fields

File hosting services like Google Docs and Dropbox are commonly abused to host phishing websites. For those sites, the lookup should match both domain and path. When more than one field is specified in the fields command, all of them must match (AND condition).

index=proxy
src          domain   path
192.168.1.5  foo.com  document1.html
192.168.1.3  foo.com  document2.html
192.168.1.4  foo.com  document3.html

| inputlookup urlhaus-filter-splunk-online.csv
host     path            message                                    updated
foo.com  document1.html  urlhaus-filter malicious website detected  2023-03-13T00:11:20Z

index=proxy [| inputlookup urlhaus-filter-splunk-online.csv | fields host, path | rename host AS domain ]
src          domain   path
192.168.1.5  foo.com  document1.html
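
The AND condition can be checked by running the subsearch on its own and appending the format command, as in this sketch:

```spl
| inputlookup urlhaus-filter-splunk-online.csv
| fields host, path
| rename host AS domain
| format
```

For the example row, the search field of the result should look roughly like ( ( domain="foo.com" AND path="document1.html" ) ): fields from the same row are joined with AND, while separate rows are joined with OR.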

Matching individual and multiple fields

A lookup file may have rows with an empty path to denote that a domain should be blocked regardless of path, alongside rows with both domain and path to denote that only a specific URL should be blocked. The syntax is the same as in the previous section because Splunk only matches non-empty values; empty values are ignored.

index=proxy
src          domain           path
192.168.1.5  bad-domain.com   lorem-ipsum.html
192.168.1.3  bad-domain.com   foo-bar.html
192.168.1.4  docs.google.com  malware.exe
192.168.1.4  docs.google.com  safe.doc

| inputlookup urlhaus-filter-splunk-online.csv
host             path         message                                    updated
bad-domain.com                urlhaus-filter malicious website detected  2023-03-13T00:11:20Z
docs.google.com  malware.exe  urlhaus-filter malicious website detected  2023-03-13T00:11:20Z

index=proxy [| inputlookup urlhaus-filter-splunk-online.csv | fields host, path | rename host AS domain ]
src          domain           path
192.168.1.5  bad-domain.com   lorem-ipsum.html
192.168.1.3  bad-domain.com   foo-bar.html
192.168.1.4  docs.google.com  malware.exe
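
Written out by hand, the two example lookup rows should be equivalent to the search below. This is a sketch of the expansion under the assumption stated above, namely that empty path cells contribute no condition.

```spl
index=proxy ( ( domain="bad-domain.com" ) OR ( domain="docs.google.com" AND path="malware.exe" ) )
| table src, domain, path
```

The first group has no path condition, so every path on bad-domain.com matches, while docs.google.com only matches when the path is malware.exe.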

Case-insensitive

Matching through an inputlookup subsearch is case-insensitive. If case-sensitive matching is required, use lookup and a lookup definition.

index=proxy
src          domain
192.168.1.5  loremipsum.com

| inputlookup urlhaus-filter-splunk-online.csv
host             path         message                                    updated
lOrEmIpSuM.com                urlhaus-filter malicious website detected  2023-03-13T00:11:20Z
docs.google.com  malware.exe  urlhaus-filter malicious website detected  2023-03-13T00:11:20Z

index=proxy [| inputlookup urlhaus-filter-splunk-online.csv | fields host, path | rename host AS domain ]
src          domain
192.168.1.5  loremipsum.com
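
The case-insensitivity comes from how the search command compares field values, which can be tried without any indexed data. The sketch below builds one synthetic event with makeresults; where_match is a hypothetical field name added only to contrast the case-sensitive == comparison in eval.

```spl
| makeresults
| eval domain="loremipsum.com"
| eval where_match=if(domain=="lOrEmIpSuM.com", "yes", "no")
| search domain="lOrEmIpSuM.com"
| table domain, where_match
```

The search should still return the row even though the case differs, while where_match should be "no" because eval compares strings case-sensitively.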

CIDR matching

Splunk automatically detects CIDR-like values in a lookup file and performs CIDR matching accordingly. However, this behaviour works on a best-effort basis and may not work as intended. To explicitly use lookup fields for CIDR matching, use lookup and a lookup definition.

index=firewall
src          src_port  dst              action
192.168.1.5  45454     187.190.252.167  allowed
192.168.1.3  45452     7.6.5.4          allowed
192.168.1.4  45457     4.3.2.1          allowed
192.168.1.6  45451     89.248.163.100   allowed

| inputlookup opendbl_ip.csv
start            end              netmask  cidr_range          name                                       updated
187.190.252.167  187.190.252.167  32       187.190.252.167/32  Emerging Threats: Known Compromised Hosts  2023-01-30T08:03:00Z
89.248.163.0     89.248.163.255   24       89.248.163.0/24     Dshield                                    2023-01-30T08:01:00Z

index=firewall [| inputlookup opendbl_ip.csv | fields cidr_range | rename cidr_range AS dst ]
src          src_port  dst              action
192.168.1.5  45454     187.190.252.167  allowed
192.168.1.6  45451     89.248.163.100   allowed
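
When only a handful of ranges matter, or to double-check a result from the automatic detection, a range can be tested explicitly with the cidrmatch eval function. A minimal sketch using one range from the example lookup:

```spl
index=firewall
| where cidrmatch("89.248.163.0/24", dst)
| table src, src_port, dst, action
```

With the example events, this should keep only the row whose dst is 89.248.163.100.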

inputlookup + lookup

When used as a subsearch, inputlookup filters the event data and only outputs rows with matching values in the specified field(s). lookup enriches the event data by appending new fields to the rows with matching field values. Another way to understand the difference is that inputlookup performs an inner join while lookup performs a left outer join, where the event data is the left table and the lookup file is the right table.

Despite their differences, it can be useful to use both at the same time to enrich filtered event data, even when using the same lookup file.

| inputlookup botnet_ip.csv

_time field is omitted for brevity.

first_seen_utc       dst_ip   dst_port  c2_status  last_online  malware  updated
2021-05-16 19:49:33  1.2.3.4  1234      online     2023-03-05   Lorem    2023-03-04T16:41:17Z
2021-05-16 19:49:33  4.3.2.1  1234      online     2023-03-05   Ipsum    2023-03-04T16:41:17Z

index=firewall
src          src_port  dst      action
192.168.1.5  45454     1.2.3.4  allowed
192.168.1.3  45452     7.6.5.4  allowed
192.168.1.4  45457     4.3.2.1  allowed
192.168.1.6  45451     7.7.5.5  allowed

index=firewall [| inputlookup botnet_ip.csv | fields dst_ip | rename dst_ip AS dst]
src          src_port  dst      action
192.168.1.5  45454     1.2.3.4  allowed
192.168.1.4  45457     4.3.2.1  allowed

index=firewall [| inputlookup botnet_ip.csv | fields dst_ip | rename dst_ip AS dst]
| lookup botnet_ip.csv dst_ip AS dst OUTPUT c2_status, malware
src          src_port  dst      action   c2_status  malware
192.168.1.5  45454     1.2.3.4  allowed  online     Lorem
192.168.1.4  45457     4.3.2.1  allowed  online     Ipsum

It is also possible to rename lookup destination fields.

index=firewall [| inputlookup botnet_ip.csv | fields dst_ip | rename dst_ip AS dst]
| lookup botnet_ip.csv dst_ip AS dst OUTPUT c2_status AS "C2 Server Status", malware AS "Malware Family"
src          src_port  dst      action   C2 Server Status  Malware Family
192.168.1.5  45454     1.2.3.4  allowed  online            Lorem
192.168.1.4  45457     4.3.2.1  allowed  online            Ipsum
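
For comparison, the left-outer-join behaviour of lookup can be turned into a filter by dropping the rows that received no lookup fields. This is an alternative sketch, not the approach used above, and it scans every firewall event before filtering, so the subsearch form may be cheaper on large indexes.

```spl
index=firewall
| lookup botnet_ip.csv dst_ip AS dst OUTPUT c2_status, malware
| where isnotnull(malware)
| table src, src_port, dst, action, c2_status, malware
```

With the example data this should return the same two rows as the combined inputlookup and lookup search.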

Lookup definition

A lookup definition provides matching rules for a lookup file. It can be configured for case-sensitivity, wildcard matching, CIDR matching and more through transforms.conf. It can also be configured via Splunk Web: Settings → Lookups → Lookup definitions.

A bare-minimum lookup definition looks like this:

transforms.conf
[lookup-definition-name]
filename = lookup-filename.csv

transforms.conf can be saved in the following directories in order of priority (highest to lowest):

  • $SPLUNK_HOME/etc/users/<username>/<app_name>/local/
  • $SPLUNK_HOME/etc/apps/<app_name>/local/
  • $SPLUNK_HOME/etc/system/local/

My naming convention for lookup definition is simply removing the .csv extension, e.g. “example.csv” (lookup file), “example” (lookup definition). While it is possible to name a lookup definition with file extension (“example.csv”), I discourage it to avoid confusion.

Note that a lookup definition's matching rules apply only to the lookup search command, not to inputlookup. Although inputlookup accepts a lookup definition as its lookup table (in addition to a lookup file), the definition's matching rules are ignored.

Case-sensitive

transforms.conf
[urlhaus-filter-splunk-online]
filename = urlhaus-filter-splunk-online.csv
# applies to all fields
case_sensitive_match = 1

index=proxy
src          domain          path
192.168.1.5  bad-domain.com  lorem-ipsum.html
192.168.1.3  bad-domain.com  lOrEm-iPsUm.hTmL

| inputlookup urlhaus-filter-splunk-online
host            path              message                                    updated
bad-domain.com  lorem-ipsum.html  urlhaus-filter malicious website detected  2023-03-13T00:11:20Z

index=proxy
| lookup urlhaus-filter-splunk-online host AS domain, path OUTPUT message
src          domain          path              message
192.168.1.5  bad-domain.com  lorem-ipsum.html  urlhaus-filter malicious website detected
192.168.1.3  bad-domain.com  lOrEm-iPsUm.hTmL
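
Because case_sensitive_match applies to every field in the definition, one possible way to keep the path comparison case-sensitive while tolerating mixed-case domains is to lower-case the domain before the lookup. This sketch assumes that hosts in the lookup file are stored in lower case; domain_lc is a hypothetical field name introduced for illustration.

```spl
index=proxy
| eval domain_lc=lower(domain)
| lookup urlhaus-filter-splunk-online host AS domain_lc, path OUTPUT message
| table src, domain, path, message
```

The original domain field is left untouched, so the events still show the value as it was logged.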

Wildcard (lookup)

transforms.conf
[urlhaus-filter-splunk-online]
filename = urlhaus-filter-splunk-online.csv
match_type = WILDCARD(host_wildcard_suffix)

index=proxy
src          url            dst_port
192.168.1.5  foo.com/path1  443
192.168.1.3  foo.com/path2  443
192.168.1.4  bar.com/path3  443

The lookup files do not include a wildcard affix by default; the lookup below was generated with the wildcard_suffix=host argument described in the Wildcard section.

| inputlookup urlhaus-filter-splunk-online
host     path  message                                    updated               host_wildcard_suffix
foo.com        urlhaus-filter malicious website detected  2023-03-13T00:11:20Z  foo.com*

index=proxy
| lookup urlhaus-filter-splunk-online host_wildcard_suffix AS url OUTPUT message
src          url            dst_port  message
192.168.1.5  foo.com/path1  443       urlhaus-filter malicious website detected
192.168.1.3  foo.com/path2  443       urlhaus-filter malicious website detected

CIDR-matching (lookup)

transforms.conf
[opendbl_ip]
filename = opendbl_ip.csv
match_type = CIDR(cidr_range)

index=firewall
src          src_port  dst              action
192.168.1.5  45454     187.190.252.167  allowed
192.168.1.3  45452     7.6.5.4          allowed
192.168.1.4  45457     4.3.2.1          allowed
192.168.1.6  45451     89.248.163.100   allowed

| inputlookup opendbl_ip
start            end              netmask  cidr_range          name                                       updated
187.190.252.167  187.190.252.167  32       187.190.252.167/32  Emerging Threats: Known Compromised Hosts  2023-01-30T08:03:00Z
89.248.163.0     89.248.163.255   24       89.248.163.0/24     Dshield                                    2023-01-30T08:01:00Z

index=firewall
| lookup opendbl_ip cidr_range AS dst OUTPUT name AS threat
src          src_port  dst              action   threat
192.168.1.5  45454     187.190.252.167  allowed  Emerging Threats: Known Compromised Hosts
192.168.1.3  45452     7.6.5.4          allowed
192.168.1.4  45457     4.3.2.1          allowed
192.168.1.6  45451     89.248.163.100   allowed  Dshield
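
Since lookup keeps unmatched events, a follow-up filter is usually needed before reporting. The sketch below assumes the opendbl_ip lookup definition shown above is in place; events and sources are hypothetical output field names.

```spl
index=firewall
| lookup opendbl_ip cidr_range AS dst OUTPUT name AS threat
| where isnotnull(threat)
| stats count AS events, dc(src) AS sources BY threat
```

This drops the rows without a threat value and summarises how many events and distinct source addresses hit each blocklist.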