Search files with the API

The search API provides you with a powerful way to find files in your Sirv account, powered by the Elasticsearch engine. Common search options are described below, with code examples. For further reading, you can refer to the Elasticsearch documentation.

Search options

Your search can use any combination of these parameters:

  • filename - file name and its folder path
  • filename.raw - file name and its folder path, entire path
  • basename - file or folder name only
  • basename.raw - file or folder name, entire name
  • basename.ngram - file or folder name, exact strings
  • dirname - folder name in any part of path
  • dirname.paths - contents of specific folder and its sub-folders
  • dirname.raw - contents of specific folder only
  • extension - file extension e.g. .jpg
  • contentType - mime type e.g. image/jpeg
  • isDirectory - if item is a folder
  • mtime - modified time
  • ctime - created time
  • atime - last accessed time (by HTTP)
  • size - file size (bytes)
  • meta.tags - files with a particular meta tag
  • meta.description - files with a particular meta description
  • meta.title - files with a particular meta title
  • meta.width - images with certain width
  • meta.height - images with certain height

Search by name

Three fields let you search by name: basename, filename and dirname.

Those three fields can be made more specific with their sub-fields: filename.raw, basename.raw, basename.ngram, dirname.paths and dirname.raw.

Each of your files has a value for those three fields. You can see the three fields in any response. For example:

"filename": "/Some Images/MY-file.jpg",
"basename": "MY-file.jpg",
"dirname": "/Some Images",

Search by filename

filename searches are case insensitive. They match the characters a-z A-Z 0-9 and the period (.). Some characters are treated as spacers: £ $ % & - _ + = ; @ # ' < > , . | ¬ ` and space. Some characters must be escaped: { } / \ ! and space. All other characters are ignored. The extension is indexed together with the last part of the filename. Search terms are limited to 1024 characters. (For interest, search terms are handled by the standard analyzer and standard tokenizer.)

This query will return all files and folders containing Adidas:

"query": "filename:Adidas"

This query will return all files and folders containing Adidas and Gazelle.jpg:

"query": "filename:Adidas-Gazelle.jpg"

This query will return all files containing Gazelle and exclude any folders from the results:

"query": "filename:Gazelle NOT (isDirectory)"

This query uses isDirectory to return only folders matching Gazelle:

"query": "filename:Gazelle AND (isDirectory)"

This query uses wildcard * to return all items in the /Products folder starting with Gaz:

"query": "filename:\\/Products\\/Gaz*"

Search by filename.raw

A search with filename.raw requires an exact match of the entire filename (folder path and file name). Searches are case sensitive and all characters will be included. Search terms are limited to 1024 characters. (For interest, search terms are handled by the keyword analyzer and keyword tokenizer.)

This query will return only the file at this exact path /Products/Adidas-Gazelle.jpg:

"query": "filename.raw:\\/Products\\/Adidas-Gazelle.jpg"

This query uses the ? wildcard to return files that start with /Products/Adidas-, followed by any 2 characters, then end with .jpg:

"query": "filename.raw:\\/Products\\/Adidas-??.jpg"

Search by basename

basename searches are case insensitive. They match the characters a-z A-Z 0-9. Some characters are treated as spacers: £ $ % & - _ + = ; @ # ' < > , . | ¬ ` and space. Some characters must be escaped: { } / \ ! and space. All other characters are ignored. Search terms are limited to 255 characters. (For interest, search terms are handled by the standard analyzer and standard tokenizer.)

This query will return all files containing Gaz:

"query": "basename:Gaz"

This query will return all files containing Adidas, Gazelle and jpg:

"query": "basename:Adidas-Gazelle.jpg"

Search by basename.raw

A search with basename.raw requires an exact match of the entire file name. Searches are case sensitive and all characters will be included. Search terms are limited to 255 characters. (For interest, search terms are handled by the keyword analyzer and keyword tokenizer.)

This query will return files that exactly match Adidas-Gazelle.jpg:

"query": "basename.raw:Adidas-Gazelle.jpg"

This query uses the * wildcard to return files that start with Adidas-, followed by something, then end with .jpg:

"query": "basename.raw:Adidas-*.jpg"

Search by basename.ngram

A search with basename.ngram requires an exact match of any part of the file name. Searches are case sensitive and all characters will be included. Search terms must be at least 2 characters and maximum 60 characters. (For interest, search terms are handled by the keyword analyzer and ngram tokenizer.)

This query will return files that contain Adidas-Gaze:

"query": "basename.ngram:Adidas-Gaze"

This query will return files that contain Gazelle.jpg:

"query": "basename.ngram:Gazelle.jpg"

Search by dirname.paths

dirname.paths returns the contents of a specific folder and its subfolders. Searches are case sensitive and all characters will be included. Search terms are limited to 1024 characters. (For interest, it uses the keyword analyzer and keyword tokenizer.)

This query will return the contents of the folder /Products/trainers, plus the contents of any subfolders:

"query": "dirname.paths:\\/Products\\/trainers"

Search by dirname.raw

dirname.raw is the same as dirname.paths but only for a specific folder. It won't return contents of sub-folders.

This query will return the contents of the folder /Products/trainers and not its subfolders:

"query": "dirname.raw:\\/Products\\/trainers"

Search by dirname

dirname can also return the contents of a folder but it is a broad search, which will match any part of the folder or filename. Instead, we recommend using dirname.paths or dirname.raw described above, which are more precise ways of searching a folder.

dirname is case insensitive. It matches the characters a-z A-Z 0-9. Some characters are treated as spacers: £ $ % & - _ + = ; @ # ' < > , . | ¬ ` and space. Some characters must be escaped: { } / \ ! and space. All other characters are ignored. Search terms are limited to 1024 characters. (For interest, search terms are handled by the standard analyzer and standard tokenizer.)

This query will return the contents (files, folders, sub-folders) of all folders containing products anywhere in its path:

"query": "dirname:products"

This query will return the contents of any folder path containing /Products/trainers (anywhere in its path):

"query": "dirname:\\/Products\\/trainers"

Index and tokens

Each file in your account is indexed and tokenized by Elasticsearch. The search term that you submit to the API is also tokenized. A file is only returned if its tokens match all of the tokens in your search query.

For example, this file in your Sirv account:

/Some Images/MY-file.jpg

will have had the following tokens generated:

filename

some, images, my, file.jpg

filename.raw

/Some Images/MY-file.jpg

basename

m, my, y, f, fi, fil, file, i, il, ile, l, le, e, j, jp, jpg, p, pg, g

basename.raw

MY-file.jpg

basename.ngram

MY, MY-, MY-f, MY-fi, MY-fil, MY-file, MY-file., MY-file.j, MY-file.jp, MY-file.jpg, Y-, Y-f, Y-fi, Y-fil, Y-file, Y-file., Y-file.j, Y-file.jp, Y-file.jpg, -f, -fi, -fil, -file, -file., -file.j, -file.jp, -file.jpg, fi, fil, file, file., file.j, file.jp, file.jpg, il, ile, ile., ile.j, ile.jp, ile.jpg, le, le., le.j, le.jp, le.jpg, e., e.j, e.jp, e.jpg, .j, .jp, .jpg, jp, jpg, pg

dirname

some, images

dirname.paths

Some Images

dirname.raw

/Some Images
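
To get a feel for why basename.ngram matches any exact fragment of a name, the token list above can be approximated locally. This is only a sketch that assumes the tokens are all substrings of 2 to 60 characters (the limits stated for basename.ngram search terms); the function name ngrams is hypothetical and this is not Sirv's actual tokenizer configuration:

```python
from __future__ import annotations


def ngrams(name: str, min_len: int = 2, max_len: int = 60) -> set[str]:
    """Return every substring of `name` that is min_len..max_len characters long."""
    tokens = set()
    for start in range(len(name)):
        for length in range(min_len, max_len + 1):
            end = start + length
            if end > len(name):
                break  # no longer substrings are possible from this start
            tokens.add(name[start:end])
    return tokens


tokens = ngrams("MY-file.jpg")
print("Y-file" in tokens)   # True: punctuation is kept, case must match
print("y-file" in tokens)   # False: the case differs
print("M" in tokens)        # False: shorter than the 2-character minimum
```

A search term matches a file when the term is one of that file's tokens, which is why a term such as Adidas-Gaze can match in the middle of a name.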

Search by date/time

You can set a date range to find files in a certain period of time.

  • ctime - time created (when it was uploaded to Sirv)
  • mtime - time modified (when it was last overwritten)
  • atime - time accessed (when it was last requested by HTTP)

The primary purpose of atime is to identify files that have not been accessed for a very long time, thus may be safe to delete. If the file has never been requested, no value will show. The value will update within a few minutes of the URL being requested and it won't update if it has already been updated in the last 7 days. Tracking started on March 22, 2023, so any file without an atime has never been requested since that date.

The following query will return all files in the /Products folder created between 28 March and 3 April 2023, in UTC time (Z):

"query": "dirname:\\/Products AND ctime:[2023-03-28T00:00:00.000Z TO 2023-04-03T23:59:59.999Z]"

The following query uses now to return all files modified from 14:35 UTC on 15 April 2023 up until now:

"query": "mtime:[2023-04-15T14:35:00.000Z TO now]"

You can set a relative time in minutes m, hours h, days d, weeks w, months M or years y. Note that the units are case sensitive: lowercase m means minutes and uppercase M means months. The following query will return all files modified in the last 48 hours:

"query": "mtime:[now-48h TO now]"

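The following query uses the wildcard * to denote any time. It will return all files modified before 1 January 2020 that have an atime value (atime:[* TO *] matches any access time, so files that have never been requested are excluded):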

"query": "mtime:[* TO 2020-01-01] AND atime:[* TO *]"

The following example will return all files created since 1 April 2023 that have never been accessed:

"query": "ctime:[2023-04-01 TO now] NOT (atime:*)"
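
If you build date ranges in a script, a small helper can format the timestamps for you. The following is a minimal sketch; the function names to_utc_stamp and time_range are hypothetical, and it assumes you pass timezone-aware datetime objects or ready-made strings such as now, now-48h or *:

```python
from datetime import datetime, timezone


def to_utc_stamp(moment):
    """Format an aware datetime as UTC, e.g. 2023-03-28T00:00:00.000Z."""
    utc = moment.astimezone(timezone.utc)
    millis = utc.microsecond // 1000
    return utc.strftime("%Y-%m-%dT%H:%M:%S") + f".{millis:03d}Z"


def time_range(field, start="*", end="now"):
    """Build a range clause such as mtime:[now-48h TO now]."""
    def bound(value):
        # datetimes are formatted; strings ("now", "now-14d", "*") pass through
        return to_utc_stamp(value) if isinstance(value, datetime) else str(value)
    return f"{field}:[{bound(start)} TO {bound(end)}]"


start = datetime(2023, 3, 28, tzinfo=timezone.utc)
end = datetime(2023, 4, 3, 23, 59, 59, 999000, tzinfo=timezone.utc)
print(time_range("ctime", start, end))
# ctime:[2023-03-28T00:00:00.000Z TO 2023-04-03T23:59:59.999Z]
print(time_range("mtime", "now-48h"))
# mtime:[now-48h TO now]
```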

Search by file size

You can search for files by their size, in bytes. The following query will return files between 5 MB and 10 MB:

"query": "size:[5000000 TO 10000000]"

Search by image dimensions

You can use meta.width and meta.height to search for images by their dimensions.

The following query will return images of exactly 1920px width and 1080px height:

"query": "meta.width:1920 AND meta.height:1080"

To search for files smaller than 2500px width:

"query": "meta.width:<2500"

To find files between 400 and 850px height:

"query": "meta.height:[400 TO 850]"

Search by meta

You can use any of the other metadata fields to search by meta content. This query will search for all JPEG or PNG images that do not have a meta description. The two content types are grouped in parentheses so that both apply to the contentType field:

"query": "contentType:(image\\/jpeg OR image\\/png) NOT (meta.description:*)"

Boolean operators

You can use AND, OR and NOT boolean operators to narrow down the files you require.

AND

Create a very specific search using AND or double ampersand && to search upon a variety of requirements. For example, this will return all JPEG files modified in the last 14 days and not in the Trash folder:

"query": "contentType:image\\/jpeg AND mtime:[now-14d TO now] AND -dirname:\\/.Trash"

OR

Create a broad search with OR or double pipe || to return results containing either term. For example, this will return all files whose name contains black or white (the terms are grouped so that both apply to basename):

"query": "basename:(black OR white)"

NOT

You can exclude certain results with NOT, a minus sign - or an exclamation mark !. Use only one of them per clause. For example, this will show all results for "Gazelle" that are not a folder:

"query": "basename:Gazelle NOT isDirectory:true"

Escape special characters

You can use special characters in your search, as long as they are escaped with a double backslash \\ beforehand.

These special characters must be escaped:

{ } / \ ! space

For example to search for /My-folder/2024/the studio.jpg use the query:

"query": "filename:\\/My-folder\\/2024\\/the\\ studio.jpg"

If your file path contains a backslash \, then it should be escaped by a single backslash, thus your query should have two backslashes \\.
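
If you generate queries in a script, the escaping can be automated. The sketch below is one possible approach, not an official helper: the function name escape_term is hypothetical. It places one backslash before { } / ! and space, and the JSON encoder then doubles every backslash when the request body is serialised. A backslash that is part of the path is left as-is, so after JSON encoding it appears as the two backslashes described above:

```python
import json
import re

# Characters that need a backslash in front of them in the query string.
_NEEDS_ESCAPE = re.compile(r"([{}/! ])")


def escape_term(term):
    """Put a single backslash before each special character."""
    return _NEEDS_ESCAPE.sub(r"\\\1", term)


query = "filename:" + escape_term("/My-folder/2024/the studio.jpg")
payload = json.dumps({"query": query})
print(payload)
# {"query": "filename:\\/My-folder\\/2024\\/the\\ studio.jpg"}
```

Escape only the search term, not the whole query, otherwise the spaces around operators such as AND would be escaped too.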

Grouping

Multiple terms or clauses can be grouped together with parentheses, to form sub-queries.

This will search for files with Gazelle in the name and with either a .psd or .eps extension:

"query": "basename:Gazelle AND extension:(.psd OR .eps)"

This will search for PDF files modified in the last 14 days:

"query": "(extension:.pdf OR contentType:application\\/pdf) AND mtime:[now-14d TO now]"

Wildcards

If you need to find files containing a partial string of characters, you can use a wildcard to fill in the gaps.

  • * - represents a string of characters
  • ? - represents a single character

The wildcard can take the place of any character permitted by the tokenizer. For example, tokens in filename, basename and dirname can be a-z A-Z 0-9, so the wildcard will act as any of those characters and no others.

This query will return files ending with 01.jpg:

"query": "basename.raw:*01.jpg"

This query will return files at the path /Products/trainers/Adidas- ending with .jpg:

"query": "filename.raw:\\/Products\\/trainers\\/Adidas-*.jpg"

This query will return the contents of any folders starting /Products/ABC and ending with any 3 characters:

"query": "dirname:\\/Products\\/ABC???"

Wildcards are not available for basename.ngram searches.

Exclude hidden folders

Your Sirv account has some special hidden folders, which might contain files that appear in search results:

  • /Shared
  • /Profiles
  • /.Trash
  • /.processed

To exclude those folders from your search, use - and AND like so:

"query": "extension:.tif AND -dirname:\\/Shared AND -dirname:\\/Profiles AND -dirname:\\/.Trash AND -dirname:\\/.processed"

Sorting results

The sort parameter determines the order of files in the response. The default order is by filename, ascending (asc). The following example sorts by basename.raw in descending order (desc):

{
 "query": "extension:.pdf AND -dirname:\\/.Trash",
 "sort": {
  "basename.raw": "desc"
 }
}

You can sort by any of these fields:

  • filename.raw
  • basename.raw
  • dirname.raw
  • contentType.raw
  • mtime
  • ctime
  • size

You can apply multiple sort parameters, to sort by one field then another. The following example sorts by content type and then modified time:

{
 "query": "mtime:[now-2y TO now] AND -dirname:\\/.Trash",
 "sort": {
  "contentType.raw": "asc",
  "mtime": "desc"
 }
}

For more sorting options, refer to the Elasticsearch sorting documentation.
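
When a script assembles the request body, it can help to validate the sort fields before sending. The following is a minimal sketch with a hypothetical function name, build_payload. It only checks against the sortable fields listed above and relies on Python dictionaries preserving insertion order, so the first sort field stays first in the JSON:

```python
import json

SORTABLE_FIELDS = {
    "filename.raw", "basename.raw", "dirname.raw",
    "contentType.raw", "mtime", "ctime", "size",
}


def build_payload(query, sort=None):
    """Return a request body; `sort` is a list of (field, direction) pairs."""
    payload = {"query": query}
    if sort:
        ordered = {}
        for field, direction in sort:
            if field not in SORTABLE_FIELDS:
                raise ValueError(f"cannot sort by {field!r}")
            if direction not in ("asc", "desc"):
                raise ValueError(f"direction must be asc or desc, got {direction!r}")
            ordered[field] = direction
        payload["sort"] = ordered
    return payload


payload = build_payload(
    "mtime:[now-2y TO now] AND -dirname:\\/.Trash",
    sort=[("contentType.raw", "asc"), ("mtime", "desc")],
)
print(json.dumps(payload, indent=1))
```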

Setting the number of results

Each response will contain up to 100 results. The number of results can be set with the size parameter to any value from 1 to 100.

If your query contains more than 100 results, you can fetch the remaining results by sending additional requests using the from parameter. The count starts from 0, so the first 100 results are 0-99. The following query will fetch the next 100 (from 100 to 199):

{
 "query": "ctime:[now-1y TO now] AND -dirname:\\/.Trash",
 "sort": {
  "dirname.raw": "asc"
 },
 "size": 100,
 "from": 100
}

The from and size parameters let you retrieve up to 1000 results. If you need more than 1000 results, use a scrolling search, described below.
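
The paging arithmetic can be sketched as a generator that yields one request body per page. The function name page_payloads is hypothetical, and the sketch assumes the 1000-result ceiling applies to from plus size, so the last page is trimmed to fit:

```python
def page_payloads(query, total, size=100, max_results=1000):
    """Yield request bodies covering results 0..min(total, max_results)-1."""
    if not 1 <= size <= 100:
        raise ValueError("size must be between 1 and 100")
    reachable = min(total, max_results)
    for start in range(0, reachable, size):
        yield {
            "query": query,
            "size": min(size, reachable - start),  # trim the final page
            "from": start,
        }


pages = list(page_payloads("ctime:[now-1y TO now]", total=250))
print([(p["from"], p["size"]) for p in pages])
# [(0, 100), (100, 100), (200, 50)]
```

You would normally learn total from the first response, then request the remaining pages.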

Scroll through more than 1000 results

If you need to retrieve more than 1000 search results, you can use scrolling search to return them in batches of up to 100 at a time.

Create a scrolling search by setting scroll to true, for example:

{
 "query": "ctime:[now-1y TO now] NOT dirname:\\/.Trash",
 "sort": {
  "dirname.raw": "asc"
 },
 "size": 100,
 "from": 0,
 "scroll": true
}

That request will return a response containing the first 100 results, ending with a scrollId, like so:

        ...results will show here, then response will end with:
        }
    ],
    "total": 2812,
    "_relation": "eq",
    "scrollId": "FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFmFhQ0xUVmc2UmotVXBlM1pCSDJYc3cAAAAAGlSdQBYyVmlyUmdiTVF3MjNZNkRIemptTTBn"
}

Use that scrollId to submit a request to the scroll API endpoint /v2/files/search/scroll to get the next batch of 100 files:

{
  "scrollId": "FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFmFhQ0xUVmc2UmotVXBlM1pCSDJYc3cAAAAAGlSdQBYyVmlyUmdiTVF3MjNZNkRIemptTTBn"
}

Keep resubmitting that scrollId to retrieve each subsequent batch of 100 files until you have received all results. Save each response locally so that you can analyse the results later.

The full list of initial search results will be cached by Sirv for 20 minutes, then deleted.
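
The scrolling loop could be sketched as follows. To keep the example independent of any HTTP library, it takes a post(path, payload) callable that you supply; that callable, the function name scroll_all and the /v2/files/search path for the initial request are assumptions, while the scroll path and the hits, total and scrollId fields come from the examples above:

```python
def scroll_all(post, query, size=100):
    """Yield every hit of a scrolling search, one batch at a time.

    `post(path, payload)` must send a POST request and return the parsed JSON.
    """
    response = post("/v2/files/search", {"query": query, "size": size, "scroll": True})
    total = response.get("total")
    scroll_id = response.get("scrollId")
    seen = 0
    while True:
        hits = response.get("hits", [])
        if not hits:
            break  # nothing left, or the cached results have expired
        for hit in hits:
            yield hit
        seen += len(hits)
        # keep using the latest scrollId if the response repeats or renews it
        scroll_id = response.get("scrollId", scroll_id)
        if not scroll_id or (total is not None and seen >= total):
            break
        response = post("/v2/files/search/scroll", {"scrollId": scroll_id})
```

Because cached results are deleted after 20 minutes, a real script should consume the generator without long pauses and write each hit to disk as it arrives.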

Each response contains headers stating your hourly request limit and the number of requests remaining:

x-ratelimit-limit: 2000
x-ratelimit-remaining: 1997
x-ratelimit-reset: 1714220380
x-ratelimit-type: rest:post:files:search:scroll

Different Sirv plans provide different hourly limits (also shown on your Usage page). Assuming your scrolling search limit is 2000/hour, at 100 files per request, you can retrieve up to 200,000 results per hour. If you need to retrieve more than that, your account has an extra allowance of triple limits once a month. It activates automatically for a 24-hour period, increasing your limit to 6000/hour so that you can return up to 600,000 files per hour. Keep in mind that cached results are deleted after 20 minutes, so your script will need to send requests in quick succession to retrieve all the results.
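
A script can read those headers to decide whether to pause before the next request. The sketch below uses a hypothetical function name, seconds_to_wait, and makes two assumptions: that x-ratelimit-reset is a Unix timestamp in seconds, and that the headers are available as a dictionary with lowercase keys:

```python
import time


def seconds_to_wait(headers, now=None):
    """Return how long to pause before the next request (0 if none needed)."""
    remaining = int(headers.get("x-ratelimit-remaining", 1))
    if remaining > 0:
        return 0.0
    reset_at = int(headers["x-ratelimit-reset"])  # assumed: Unix time, seconds
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)


headers = {
    "x-ratelimit-limit": "2000",
    "x-ratelimit-remaining": "0",
    "x-ratelimit-reset": "1714220380",
}
print(seconds_to_wait(headers, now=1714220320))  # 60.0
```

Bear in mind that a long pause during a scrolling search may outlast the 20-minute cache, in which case the search has to be restarted.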

A scrolling search represents a snapshot of results at the exact point in time when you made the initial search. Any filesystem changes made in your Sirv account after that point (uploads, deletes, renames, moves etc.) will not be reflected in the cached search results.

Query length limit

A search query string can contain up to 1024 characters. This lets you combine many AND, OR and NOT parameters in a single request.
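
When a query is assembled from many clauses, a script can check the length before sending. The following is a minimal sketch with hypothetical helper names (any_of, exclude_hidden, all_of). It assumes the 1024-character limit is counted on the query string before JSON encoding, which is not confirmed here:

```python
MAX_QUERY_LENGTH = 1024
HIDDEN_FOLDERS = ["/Shared", "/Profiles", "/.Trash", "/.processed"]


def any_of(field, values):
    """Group alternatives for one field, e.g. extension:(.psd OR .eps)."""
    return f"{field}:({' OR '.join(values)})"


def exclude_hidden():
    """Return one -dirname clause per hidden folder, with slashes escaped."""
    return ["-dirname:" + folder.replace("/", "\\/") for folder in HIDDEN_FOLDERS]


def all_of(*clauses):
    """Join clauses with AND and enforce the query length limit."""
    query = " AND ".join(clauses)
    if len(query) > MAX_QUERY_LENGTH:
        raise ValueError(f"query is {len(query)} characters, limit is {MAX_QUERY_LENGTH}")
    return query


query = all_of(
    "basename:Gazelle",
    any_of("extension", [".psd", ".eps"]),
    *exclude_hidden(),
)
print(query)
```

The printed string contains single backslashes; they become double backslashes once the request body is serialised as JSON.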

Example scripts

The REST API search documentation contains an example payload, JSON schema, response and scripts for 10 popular programming languages.
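
For orientation, a search request could be prepared in Python with the standard library as sketched below. Treat the details as assumptions to check against the REST API documentation: the base URL https://api.sirv.com, the /v2/files/search path, bearer-token authentication and the function name build_search_request are not taken from this page:

```python
import json
import urllib.request

API_BASE = "https://api.sirv.com"  # assumed base URL


def build_search_request(token, payload):
    """Prepare (but do not send) a POST request for the search endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_BASE + "/v2/files/search",  # assumed path
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_search_request("YOUR_TOKEN", {"query": "extension:.pdf", "size": 10})

# To send it, uncomment the lines below and supply a valid token:
# with urllib.request.urlopen(request) as reply:
#     results = json.load(reply)
#     print(results["total"])
```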

Example response

This example response shows you the file information, including all the searchable parameters and also some of the most important file meta.

The end of the response shows the total number of results (1 in this case):

{
  "hits": [
    {
      "_index": "sirvfs-v4",
      "_type": "_doc",
      "_id": "d575e090616438ded527bb40da3bfa0c6d5f05ff",
      "_score": null,
      "_routing": "sdulth0oi0t9zxpxqtxwkwvgipjgv6ud",
      "_source": {
        "accountId": "sdulth0oi0t9zxpxqtxwkwvgipjgv6ud",
        "filename": "/Products/trainers/Adidas-Gazelle.jpg",
        "dirname": "/Products/trainers",
        "basename": "Adidas-Gazelle.jpg",
        "extension": ".jpg",
        "id": "Qw5Z9DNgvwWRXWF8WEHlfpI6c3Doyx6w",
        "ctime": "2021-02-12T17:01:01.333Z",
        "mtime": "2021-02-12T17:01:01.446Z",
        "atime": "2024-04-25T17:09:25.197Z",
        "meta": {
          "title": "Adidas Gazelle trainer",
          "description": "The classic Adidas Gazelle is a true all-rounder. You can create many looks with this trainer - sporty, smart or minimalist. Now available in 18 different styles.",
          "tags": [
            "Adidas",
            "Gazelle",
            "trainer",
            "sports shoe",
            "Bruce Campbell",
            "Approved-20190619"
          ],
          "width": 3000,
          "height": 2400,
          "format": "JPEG",
          "duration": 0,
          "history": [
            {
              "userId": "Bnhi4gGjhq8gDlt2Ga4q9CxD7lD",
              "timestamp": "2023-01-10T12:19:37.788Z"
            },
            {
              "userId": "Bnhi4gGjhq8gDlt2Ga4q9CxD7lD",
              "op": "rename"
            },
            {
              "userId": "Bnhi4gGjhq8gDlt2Ga4q9CxD7lD",
              "op": "create"
            }
          ],
          "EXIF": {
            "DateTimeOriginal": "2019-06-19T00:11:38Z",
            "CreateDate": "2019-06-19T00:11:38Z",
            "ModifyDate": "2019-06-19T13:45:44Z",
            "ColorSpace": "Uncalibrated",
            "Make": "Canon",
            "Model": "Canon EOS 5D Mark II",
            "LensModel": "Canon EF 24-105mm f/4L IS USM"
          }
        }
      },
      "sort": [
        "/Products/trainers/Adidas-Gazelle.jpg"
      ]
    }
  ],
  "total": 1,
  "_relation": "eq"
}
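
A response like this can be reduced to the fields you care about with a few lines of code. The sketch below uses a hypothetical function name, summarize_hits, and relies only on the hits, _source and meta keys shown in the example; missing values are returned as None rather than raising an error:

```python
def summarize_hits(response):
    """Return one small dict per hit with the path, dimensions and tags."""
    rows = []
    for hit in response.get("hits", []):
        source = hit.get("_source", {})
        meta = source.get("meta", {})
        rows.append({
            "filename": source.get("filename"),
            "width": meta.get("width"),
            "height": meta.get("height"),
            "tags": meta.get("tags", []),
        })
    return rows


example = {
    "hits": [
        {
            "_source": {
                "filename": "/Products/trainers/Adidas-Gazelle.jpg",
                "meta": {"width": 3000, "height": 2400, "tags": ["Adidas", "Gazelle"]},
            }
        }
    ],
    "total": 1,
}
for row in summarize_hits(example):
    print(row["filename"], row["width"], row["height"])
# /Products/trainers/Adidas-Gazelle.jpg 3000 2400
```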

Ask for help

If you need help with the search API, send a message to the Sirv support team. Describe your requirements and share a sample of your script, if possible.
