What is Hyper Collocation?

Hyper Collocation is a search engine for finding example sentences in 811,761 English papers from arXiv. To highlight typical phrasing, search results are sorted by collocation frequency. The tool is mainly intended to help with academic writing and can be used as an alternative to Springer Exemplar (discontinued).
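To illustrate what sorting by collocation frequency means, here is a minimal Python sketch. It is not how Hyper Collocation itself is implemented; the function name `rank_collocations` and the three sample sentences are made up for this example, and "collocation" is simplified to "the query plus the word that follows it".

```python
from collections import Counter


def rank_collocations(sentences, query):
    """Count each 'query + next word' phrase and sort by frequency.

    Hypothetical helper for illustration only; a real service over
    hundreds of thousands of papers would need a proper text index.
    """
    counts = Counter()
    for sentence in sentences:
        start = 0
        while True:
            idx = sentence.find(query, start)  # exact, case-sensitive match
            if idx == -1:
                break
            following = sentence[idx + len(query):].split()
            if following:
                next_word = following[0].strip(".,;:")
                if next_word:
                    counts[f"{query.strip()} {next_word}"] += 1
            start = idx + 1
    # most_common() returns (phrase, count) pairs, most frequent first
    return counts.most_common()


# Invented sample sentences, not taken from arXiv.
sentences = [
    "The result depends on the initial state.",
    "Performance depends on the dataset size.",
    "The outcome depends heavily on tuning.",
]
ranking = rank_collocations(sentences, "depends")
print(ranking)  # [('depends on', 2), ('depends heavily', 1)]
```

The most frequent continuation appears first, which is the sense in which a frequency-sorted result list surfaces typical phrasing.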

I originally developed this tool for my personal use, but decided to make it public because an r4.large instance of Amazon Web Services EC2 is too big for one person. Enjoy! 🙂

Usage Examples


  • Migrated from an r4.large instance to an r5.large instance.
    The maintenance cost is now lower (0.16 → 0.152 USD / hour), and processing is slightly faster. (2018/9/29)


  • Query is case-sensitive.
  • Query is whitespace-sensitive.
    So, “word” and “␣word␣” produce different results.
  • [Math] in a snippet indicates that there was an equation on a separate line.
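The first two notes above (case and whitespace sensitivity) can be pictured with plain substring matching. The sketch below assumes the query is matched as an exact character sequence, which is what those notes suggest; `count_occurrences` and the sample text are hypothetical and are not part of the site.

```python
def count_occurrences(text, query):
    """Count non-overlapping exact matches of `query` in `text`.

    Hypothetical helper: case and surrounding spaces are significant,
    mirroring the behaviour described in the notes.
    """
    return text.count(query)


# Invented sample text.
text = "A keyword search matches the word in a password."

bare = count_occurrences(text, "word")      # also matches inside "keyword" and "password"
padded = count_occurrences(text, " word ")  # matches only the standalone word
capital = count_occurrences(text, "Word")   # different case, so no match

print(bare, padded, capital)  # 3 1 0
```

Padding a query with spaces is therefore a simple way to restrict matches to whole words.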

Privacy Policy

  • This site uses Google Analytics to track usage in order to maintain and improve the service. In the process, information such as your IP address may be collected by Google, Inc.
  • Google Analytics, mentioned above, and the CDN (Content Delivery Network) used to reduce server load both use cookies. If you do not agree to the use of cookies, you can disable them in your browser settings.
  • By using this site, you are considered to have consented to data collection within the scope of this privacy policy.


Hyper Collocation depends on many open projects. It is difficult to list all of them, but I would especially like to acknowledge the Succinct Data Structure Library (SDSL), Pandoc, Crow, Vue.js, and arXiv Bulk Full-Text Access.

Hyper Collocation is developed and maintained by Ichiro Maruta.

