Non-maximum suppression on a Google Cloud Vision response in Node.js

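I'd like to know whether it is possible to apply non-maximum suppression to a Google Cloud Vision API response in Node.js. For example, the response looks like this: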

[
  {
    "mid": "/m/09728",
    "languageCode": "",
    "name": "Bread",
    "score": 0.8558391332626343,
    "boundingPoly": {
      "vertices": [],
      "normalizedVertices": [
        {
          "x": 0.010737711563706398,
          "y": 0.26679491996765137
        },
        {
          "x": 0.9930269718170166,
          "y": 0.26679491996765137
        },
        {
          "x": 0.9930269718170166,
          "y": 0.7275580167770386
        },
        {
          "x": 0.010737711563706398,
          "y": 0.7275580167770386
        }
      ]
    }
  },
  {
    "mid": "/m/052lwg6",
    "languageCode": "",
    "name": "Baked goods",
    "score": 0.6180902123451233,
    "boundingPoly": {
      "vertices": [],
      "normalizedVertices": [
        {
          "x": 0.010737711563706398,
          "y": 0.26679491996765137
        },
        {
          "x": 0.9930269718170166,
          "y": 0.26679491996765137
        },
        {
          "x": 0.9930269718170166,
          "y": 0.7275580167770386
        },
        {
          "x": 0.010737711563706398,
          "y": 0.7275580167770386
        }
      ]
    }
  },
  {
    "mid": "/m/02wbm",
    "languageCode": "",
    "name": "Food",
    "score": 0.5861617922782898,
    "boundingPoly": {
      "vertices": [],
      "normalizedVertices": [
        {
          "x": 0.321802020072937,
          "y": 0.2874892055988312
        },
        {
          "x": 0.999139130115509,
          "y": 0.2874892055988312
        },
        {
          "x": 0.999139130115509,
          "y": 0.6866284608840942
        },
        {
          "x": 0.321802020072937,
          "y": 0.6866284608840942
        }
      ]
    }
  }
]

So really, only the outer bounding box around the food should be kept.

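I have found an example that does this in Python, but using it would mean spawning a child process from Node to run the Python script and then reading back its output, which feels a bit dirty.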

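The box values returned by Google are normalized, so they have to be multiplied by the image width and height. For example, assuming the image is 288 wide by 512 high: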

      const left = Math.round(vertices[0].x * 288);
      const top = Math.round(vertices[0].y * 512);
      const width = Math.round((vertices[2].x * 288)) - left;
      const height = Math.round((vertices[2].y * 512)) - top;
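
The same conversion can be wrapped in a small helper so the image size is not hard-coded. This is only a sketch: `toPixelBox` is a hypothetical name, and it takes the min/max over all four vertices instead of assuming that index 0 is the top-left corner and index 2 the bottom-right. It also assumes that a coordinate equal to 0 may be missing from the JSON and treats a missing value as 0.

```javascript
// Hypothetical helper: converts Vision's normalizedVertices (values in 0..1)
// into a pixel box for an image of the given size.
function toPixelBox(normalizedVertices, imageWidth, imageHeight) {
  // Assumption: an absent x or y means 0.
  const xs = normalizedVertices.map((v) => (v.x ?? 0) * imageWidth);
  const ys = normalizedVertices.map((v) => (v.y ?? 0) * imageHeight);

  const left = Math.round(Math.min(...xs));
  const top = Math.round(Math.min(...ys));
  const right = Math.round(Math.max(...xs));
  const bottom = Math.round(Math.max(...ys));

  return { left, top, width: right - left, height: bottom - top };
}
```

It would be called as `toPixelBox(object.boundingPoly.normalizedVertices, 288, 512)` for each object in the response.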

The script I adapted looks like this (it just hard-codes the threshold and takes an array of arrays from the command line):

# import the necessary packages
import numpy as np
import sys
import json

# Malisiewicz et al.
def non_max_suppression_fast():
    overlapThresh = 0.3
    # json.loads returns a plain list; wrap it in a NumPy array so the
    # dtype check and the column slicing below work
    boxes = np.array(json.loads(sys.argv[1]))
    # if there are no boxes, return an empty list
    if len(boxes) == 0:
        return []

    # if the bounding boxes are integers, convert them to floats --
    # this is important since we'll be doing a bunch of divisions
    if boxes.dtype.kind == "i":
        boxes = boxes.astype("float")

    # initialize the list of picked indexes 
    pick = []

    # grab the coordinates of the bounding boxes
    x1 = boxes[:,0]
    y1 = boxes[:,1]
    x2 = boxes[:,2]
    y2 = boxes[:,3]

    # compute the area of the bounding boxes and sort the bounding
    # boxes by the bottom-right y-coordinate of the bounding box
    area = (x2 - x1 + 1) * (y2 - y1 + 1)
    idxs = np.argsort(y2)

    # keep looping while some indexes still remain in the indexes
    # list
    while len(idxs) > 0:
        # grab the last index in the indexes list and add the
        # index value to the list of picked indexes
        last = len(idxs) - 1
        i = idxs[last]
        pick.append(i)

        # find the largest (x, y) coordinates for the start of
        # the bounding box and the smallest (x, y) coordinates
        # for the end of the bounding box
        xx1 = np.maximum(x1[i], x1[idxs[:last]])
        yy1 = np.maximum(y1[i], y1[idxs[:last]])
        xx2 = np.minimum(x2[i], x2[idxs[:last]])
        yy2 = np.minimum(y2[i], y2[idxs[:last]])

        # compute the width and height of the bounding box
        w = np.maximum(0, xx2 - xx1 + 1)
        h = np.maximum(0, yy2 - yy1 + 1)

        # compute the ratio of overlap
        overlap = (w * h) / area[idxs[:last]]

        # delete all indexes from the index list that have an
        # overlap greater than the threshold
        idxs = np.delete(idxs, np.concatenate(([last],
            np.where(overlap > overlapThresh)[0])))

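    # return only the bounding boxes that were picked using the
    # integer data type
    return boxes[pick].astype("int")

# run the function and print the kept boxes as JSON so the caller can parse them
if __name__ == "__main__":
    print(json.dumps(np.asarray(non_max_suppression_fast()).tolist()))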

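Can anyone give me some pointers here? I'm sure it just comes down to computing the total area of each box, but I can't quite get my head around it.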

Answer

OK, so it turns out this is very simple if you use TensorFlow.js. Pass the Google Vision response to the following function:

Note that 288 and 512 are the width and height of my image; you need to set your own.

const tf = require('@tensorflow/tfjs');

async function nonMaxSuppression(objects, imageWidth = 288, imageHeight = 512) {
  // We need an array of boxes AND an array of scores.
  const boxes = [];
  const scores = [];

  // Loop through the objects and convert the vertices into the right format.
  for (const object of objects) {
    const verts = object.boundingPoly.normalizedVertices;

    // As noted above, 288 and 512 are the image width and height for me.
    const left = Math.round(verts[0].x * imageWidth);
    const top = Math.round(verts[0].y * imageHeight);
    const right = Math.round(verts[2].x * imageWidth);
    const bottom = Math.round(verts[2].y * imageHeight);

    // tf.image.nonMaxSuppression expects each box as [y1, x1, y2, x2],
    // not [left, top, width, height].
    boxes.push([top, left, bottom, right]);
    scores.push(object.score);
  }

  // Params are boxes, scores, max number of boxes to select.
  const selected = tf.image.nonMaxSuppression(boxes, scores, 2);
  // The result is a 1-D tensor of zero-based indexes into `boxes` (and so
  // into `objects`). Read its values; the tensor's `id` property is an
  // internal identifier, not a box index.
  const indexes = await selected.array();
  selected.dispose();
  return indexes;
}

Ta-da, bananas!
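
If pulling in TensorFlow.js feels heavy for three boxes, the overlap test the question asks about can also be sketched in plain JavaScript. Treat the following as an illustration under stated assumptions rather than a port of the Python script: `toBox`, `iou` and `greedyNms` are hypothetical names, boxes are visited from highest to lowest `score` (the Python script sorts by the bottom edge instead), and overlap is measured as intersection over union with an assumed default threshold of 0.5. It works directly on the normalized coordinates, so no image size is needed.

```javascript
// Hypothetical helper: [x1, y1, x2, y2] from a Vision annotation.
// Assumption: an absent x or y means 0.
function toBox(annotation) {
  const verts = annotation.boundingPoly.normalizedVertices;
  const xs = verts.map((v) => v.x ?? 0);
  const ys = verts.map((v) => v.y ?? 0);
  return [Math.min(...xs), Math.min(...ys), Math.max(...xs), Math.max(...ys)];
}

// Intersection over union of two [x1, y1, x2, y2] boxes.
function iou(a, b) {
  const w = Math.max(0, Math.min(a[2], b[2]) - Math.max(a[0], b[0]));
  const h = Math.max(0, Math.min(a[3], b[3]) - Math.max(a[1], b[1]));
  const intersection = w * h;
  const areaA = (a[2] - a[0]) * (a[3] - a[1]);
  const areaB = (b[2] - b[0]) * (b[3] - b[1]);
  const union = areaA + areaB - intersection;
  return union > 0 ? intersection / union : 0;
}

// Greedy NMS: returns zero-based indexes of the annotations to keep.
function greedyNms(annotations, iouThreshold = 0.5) {
  const boxes = annotations.map(toBox);
  // Visit annotations from the highest score to the lowest.
  const order = annotations
    .map((_, i) => i)
    .sort((i, j) => annotations[j].score - annotations[i].score);

  const kept = [];
  for (const i of order) {
    // Keep a box only if it does not overlap too much with any kept box.
    if (kept.every((k) => iou(boxes[i], boxes[k]) <= iouThreshold)) {
      kept.push(i);
    }
  }
  return kept;
}
```

Calling `greedyNms(response)` on the array from the question keeps only the first entry ("Bread"): "Baked goods" has an identical box, and the "Food" box overlaps the "Bread" box with an IoU of roughly 0.59, which is above the 0.5 threshold.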
