Dropbox improves its «machine learning» to find text in PDFs and images

All cloud storage services do their job well. However, the competition over who offers the best service keeps intensifying. These companies aim to stand out technologically by offering tools that other platforms lack.

Dropbox is a case in point. Last month it introduced "machine learning" across its entire platform, letting users search for text inside PDFs and images with a technology that indexes a large part of the content uploaded to the cloud. Today the company announced improvements to this technology, which is expected to work better than ever.

Machine learning as a feature for premium users

Machine learning aims to let software carry out tasks on its own, in this case tasks that improve user productivity. In Dropbox's case, this "machine learning" allows users to search documents that previously could not be searched because they were not indexable as text, as is the case with images. The feature combines machine learning with Optical Character Recognition (OCR).

OCR is a process for digitizing text: it automatically identifies the symbols or characters of a given alphabet within an image and then stores them as machine-readable data.

Users subscribed to Dropbox's higher-tier "premium" plans can now use this tool. The mechanics are simple: you type a term into the cloud's search box, and it finds almost any document that matches. The problem until now was that image formats could not be indexed because they contain no text as such. By contrast, files with TXT, HTML, or DOCX extensions are easy to index because they already consist of text.
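To illustrate why text files are searchable while images are not, here is a minimal Python sketch of a keyword index. It is a toy inverted index, not Dropbox's actual system; the class `SearchIndex`, the file names, and the OCR output string are hypothetical. An image contributes nothing to the index until OCR supplies its text.

```python
import re
from collections import defaultdict
from typing import Dict, Optional, Set


def tokenize(text: str) -> list:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class SearchIndex:
    """Hypothetical toy inverted index: maps each word to the files that contain it."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add_file(self, name: str, text: Optional[str]) -> None:
        # Files with no extractable text (e.g. an image before OCR) add nothing.
        if not text:
            return
        for word in tokenize(text):
            self._postings[word].add(name)

    def search(self, query: str) -> Set[str]:
        # Return files containing every word of the query (AND semantics).
        words = tokenize(query)
        if not words:
            return set()
        results = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            results &= self._postings.get(word, set())
        return results


index = SearchIndex()
index.add_file("notes.txt", "Meeting notes for August")
index.add_file("bill_photo.jpg", None)  # image without a text layer: not indexed
print(index.search("august"))  # {'notes.txt'}

# Hypothetical text that an OCR step might extract from the photo.
index.add_file("bill_photo.jpg", "Electricity bill August total 54.20")
print(sorted(index.search("august")))  # ['bill_photo.jpg', 'notes.txt']
```

Once the OCR text is added, a search for the bill finds the photo just like a text file, which is the effect the feature described here is meant to achieve.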

The potential benefit of automatically recognizing text in images (including PDF files that contain images) is tremendous. People have stored more than 20 billion images and PDF files in Dropbox. Of those files, 10-20% are photos of documents, such as receipts and whiteboard images, rather than the documents themselves. These are now candidates for automatic image text recognition. Likewise, 25% of these PDFs are scanned documents that are also candidates for automatic text recognition.

The tool has countless uses. Imagine that we take a photo of our electricity bill for August. If we later cannot find that receipt but know for sure that we stored it in Dropbox, its machine learning will have indexed the content of that photo, and a search will show us the information within seconds.

