What is Hyper Collocation?

Hyper Collocation is a search engine for finding example sentences in 811,761 English-language papers on arXiv. To make typical phrasing clear, the search results are sorted by collocation frequency. This tool is mainly intended to help with academic writing and can be used as an alternative to the discontinued Springer Exemplar.
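To illustrate what sorting by collocation frequency means, here is a minimal sketch in Python. It is not how Hyper Collocation itself is implemented; the function name `rank_continuations`, the `n_words` parameter, and the three sample sentences are all hypothetical and chosen only for illustration. It counts the words that follow a query in a tiny corpus and lists the most frequent continuation first.

```python
from collections import Counter

# Hypothetical toy corpus; the real tool searches arXiv full texts.
sentences = [
    "We show that the proposed method is robust.",
    "The results show that the error decreases.",
    "Experiments show that our approach works.",
]


def rank_continuations(sentences, query, n_words=1):
    """Count the n_words that follow each exact match of `query`
    and return (continuation, count) pairs, most frequent first."""
    counts = Counter()
    for sentence in sentences:
        start = sentence.find(query)
        while start != -1:
            # Words that come right after this match of the query.
            rest = sentence[start + len(query):].split()
            if len(rest) >= n_words:
                counts[" ".join(rest[:n_words])] += 1
            # Look for the next match in the same sentence.
            start = sentence.find(query, start + 1)
    return counts.most_common()


for phrase, count in rank_continuations(sentences, "show that"):
    print(count, "show that", phrase)
```

A linear scan like this is only practical for a handful of sentences; a corpus of this size would need an index built in advance.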

I originally developed this tool for my personal use, but decided to make it public because an r4.large instance of Amazon Web Services EC2 is too big for personal use. Enjoy! 🙂

Usage Examples


  • Migrated from an r4.large instance to an r5.large instance.
    The running cost is now lower (0.16 → 0.152 USD / hour), and processing is slightly faster. (2018/9/29)


  • Query is case-sensitive.
  • Query is whitespace-sensitive.
    So, “word” and “␣word␣” produce different results.
  • [Math] in a snippet indicates that there was an equation on a separate line.
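The case and whitespace rules above can be illustrated with plain exact substring matching. This is a minimal sketch, assuming queries are matched character by character; the function name `count_occurrences` and the sample text are hypothetical and not taken from the tool.

```python
def count_occurrences(text, query):
    """Count exact matches of `query` in `text`, character by character.
    Case and whitespace are both significant."""
    count = 0
    start = text.find(query)
    while start != -1:
        count += 1
        start = text.find(query, start + 1)
    return count


# Hypothetical sample text for illustration only.
text = "Word order matters. Each word counts, and wording matters."

print(count_occurrences(text, "word"))    # also matches inside "wording"
print(count_occurrences(text, "Word"))    # capitalized form only
print(count_occurrences(text, " word "))  # needs a space on both sides
```

Surrounding a query with spaces is therefore a simple way to restrict matches to a whole word.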


Hyper Collocation depends on many open projects. While it is difficult to list all of the projects on which Hyper Collocation depends, I would especially like to acknowledge the Succinct Data Structure Library (SDSL), Pandoc, Crow, Vue.js, and arXiv Bulk Full-Text Access.

Hyper Collocation is developed and maintained by Ichiro Maruta.
