These CSV files can be used as lookups to find potentially malicious traffic. They contain lists of bad IPs, domains and URLs, and we are going to look for those values in the events.
We can view the content of a lookup file by using inputlookup. When using that command, there should always be a leading pipe character “|” because it is an event-generating command.
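As a minimal sketch, the search below lists the rows of a lookup file. The file name example.csv is only a placeholder for whichever lookup file you uploaded.

```spl
| inputlookup example.csv
```

Each row of the CSV is returned as one result, with the column headers as field names.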
In Splunk Web, setting the permission to app-sharing or global-sharing automatically moves the file to the second or third location respectively. An uploaded lookup file can be used straight away without reloading the app or restarting Splunk, regardless of how it was created.
Notice that the second row’s dst value matches the dst_ip value of the example lookup table shown in the previous section.
To match the dst value of the firewall events against dst_ip of the lookup file, use a subsearch with inputlookup. In this example, the subsearch extracts only the dst_ip field and renames it to dst so that it matches the same field in the firewall events.
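The subsearch described above can be sketched as follows. The index name firewall and the file name example.csv are assumptions for illustration; substitute your own.

```spl
index=firewall
    [| inputlookup example.csv
     | fields dst_ip
     | rename dst_ip AS dst]
```

Splunk runs the subsearch first and turns its rows into a series of dst="..." terms joined by OR, so only firewall events whose dst appears in the lookup file are returned.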
The asterisk character (*) in a lookup file does work as a wildcard.
index=proxy

| src | url | dst_port |
| --- | --- | --- |
| 192.168.1.5 | foo.com/path1 | 443 |
| 192.168.1.3 | foo.com/path2 | 443 |
| 192.168.1.4 | bar.com/path3 | 443 |
The lookup files do not include a wildcard affix.
| inputlookup urlhaus-filter-splunk-online.csv

| host | path | message | updated |
| --- | --- | --- | --- |
| foo.com | | urlhaus-filter malicious website detected | 2023-03-13T00:11:20Z |
The add-on includes the geturlhausfilter command along with other commands to update their respective lookup files. Those commands have a wildcard_suffix argument to append a wildcard to the field’s values.
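If the lookup file was saved without a wildcard, a similar effect can be had at search time. The sketch below is not the add-on's wildcard_suffix mechanism; it is an alternative that appends the asterisk with eval inside the subsearch, and it assumes the proxy events carry the url field shown in the sample events earlier.

```spl
index=proxy
    [| inputlookup urlhaus-filter-splunk-online.csv
     | fields host
     | eval url=host."*"
     | fields url]
```

With the sample row above, the subsearch yields url="foo.com*", which would match foo.com/path1 and foo.com/path2 but not bar.com/path3.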
The previous section showed an example using a wildcard suffix (“foo.com*”). A wildcard also works as a prefix (“*foo.com”) or even in the middle (“f*o.com”), though these are discouraged.
File hosting services like Google Docs and Dropbox are commonly abused to host phishing websites. For those sites, the lookup should match both domain and path. When more than one field is specified in the fields command, all fields are matched using an AND condition.
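A sketch of matching on two fields at once is shown below. The event field names url_domain and url_path are hypothetical; the rename step simply maps the lookup columns host and path onto whatever your proxy events call the domain and path.

```spl
index=proxy
    [| inputlookup urlhaus-filter-splunk-online.csv
     | fields host, path
     | rename host AS url_domain, path AS url_path]
```

Within each lookup row the two fields are joined by AND, while separate rows are joined by OR, so an event has to match both the domain and the path of the same row.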
A lookup file may have rows with an empty path to denote that a domain should be blocked regardless of path, alongside rows with both domain and path to denote that only a specific URL should be blocked. The syntax is the same as in the previous section because Splunk only matches non-empty values; empty values are ignored.
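One way to check how empty values are handled is to run the subsearch on its own and append the format command, which returns the search expression that would be handed to the outer search. This is a sketch for inspection only.

```spl
| inputlookup urlhaus-filter-splunk-online.csv
| fields host, path
| format
```

The result is a single field named search. A row with an empty path should contribute only a host term, while a row with both values should contribute a host term and a path term joined by AND.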
Splunk automatically detects CIDR-like values in a lookup file and performs CIDR-matching accordingly. However, this behaviour is on a best-effort basis and may not work as intended. To explicitly use lookup fields for CIDR-matching, use the lookup command with a lookup definition.
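A minimal sketch of explicit CIDR-matching follows. It assumes a lookup definition named example that points at example.csv and has its match type set to CIDR(dst_ip), and it assumes the firewall events live in an index named firewall; none of these names come from a real deployment.

```spl
index=firewall
| lookup example dst_ip AS dst
```

Because no OUTPUT clause is given, the remaining columns of the lookup are added to every event whose dst falls inside a dst_ip range.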
When used as a subsearch, inputlookup filters the event data and only outputs rows with matching values of the specified field(s). lookup enriches the event data by appending new fields to the rows with matching field values. Another way to understand the difference is that inputlookup performs an inner join while lookup performs a left outer join, where the event data is the left table and the lookup file is the right table.
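The enrichment side of that comparison can be sketched as below. It assumes a lookup definition named urlhaus-filter-splunk-online exists for the lookup file, and that the proxy events have a hypothetical url_domain field holding the domain.

```spl
index=proxy
| lookup urlhaus-filter-splunk-online host AS url_domain OUTPUT message, updated
```

Every proxy event is still returned. Only events whose url_domain matches a host value in the lookup gain the message and updated fields, which is the left outer join behaviour described above.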
Despite their differences, it can be useful to use both at the same time to enrich filtered event data, even when using the same lookup file.
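Combining the two might look like the sketch below: the subsearch filters the events and the lookup command then adds context to what remains. As before, url_domain is a hypothetical event field, and the lookup definition is assumed to be named after the file without the .csv extension.

```spl
index=proxy
    [| inputlookup urlhaus-filter-splunk-online.csv
     | fields host
     | rename host AS url_domain]
| lookup urlhaus-filter-splunk-online host AS url_domain OUTPUT message, updated
```

Filtering in the subsearch keeps the result set small, so the lookup command only has to enrich events that already matched.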
A lookup definition provides matching rules for a lookup file. It can be configured for case-sensitivity, wildcard, CIDR-matching and others through transforms.conf. It can also be configured via Splunk Web: Settings → Lookups → Lookup definitions.
My naming convention for a lookup definition is simply to remove the .csv extension, e.g. “example.csv” (lookup file), “example” (lookup definition). While it is possible to name a lookup definition with a file extension (“example.csv”), I discourage it to avoid confusion.
It is imperative to note that a lookup definition only applies to the lookup search command and does not apply to inputlookup. Although inputlookup supports a lookup definition as a lookup table (in addition to a lookup file), its matching rules will be ignored.
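To illustrate, the sketch below passes a lookup definition name (example, without the .csv extension) to inputlookup. The index name and the definition name are assumptions for illustration.

```spl
index=firewall
    [| inputlookup example
     | fields dst_ip
     | rename dst_ip AS dst]
```

The search runs, but inputlookup only reads the rows behind the definition. Any match type or case-sensitivity setting configured on example is not consulted here, so use the lookup command when those rules matter.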