Thank you for this tutorial. As a beginner in machine learning (thanks to Andrew Ng), I really need to know whether there will be any future tutorials on training models with Caffe, TensorFlow, and so on. There are loads of straightforward, direct approaches for training Haar Cascade classifiers using OpenCV. So can I expect something like that from the raywenderlich team using the other ML frameworks on the market?
hi, thanks for your question!
have you tried Berkeley Vision’s own caffe tutorial?
The RWDevCon17 session linked by this article uses TensorFlow to train the happy/sad model.
Pinging @raywenderlich and @alexisgallagher for more replies to your question.
After dragging the mlmodel file to the Resources folder, I had to mark it as part of the app target in order for Xcode to generate the classes for the model.
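Once target membership is checked, Xcode generates a Swift class named after the model file, which is what the tutorial's code instantiates. A minimal sketch, assuming the `Inceptionv3.mlmodel` used later in this thread (the generated class name matches the file name):

```swift
import CoreML
import Vision

// Xcode auto-generates a class named after the .mlmodel file,
// but only when the file is a member of the app target.
let coreMLModel = Inceptionv3().model

// Wrap the Core ML model so it can drive Vision requests.
let visionModel = try VNCoreMLModel(for: coreMLModel)
```

If the class isn't generated, checking the Target Membership box in the File inspector (as described above) is usually the fix.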
Which devices do Core ML and Vision work with? I know they work on the A10 chip, but do they work on the new iPad or the 6s with an A9 chip?
@darren102 ran this sample app on his iPad (probably not the new one, so A9)
@hollance wrote a blog post, noting the beta has a bug that can cause specific models to crash on a device:
MobileNet uses a so-called “depthwise” convolution layer. The original model was trained in Caffe, which supports depthwise convolution by making the groups property of a regular convolution equal to the number of output channels. The resulting MobileNet.mlmodel file does the same. This works fine in the iOS simulator but it crashes on an actual device!
What happens is that the simulator uses the Accelerate framework but the device uses Metal Performance Shaders. And due to the way Metal encodes the data, the MPSCNNConvolution kernel has the restriction that you can’t make the number of groups equal to the number of output channels. Whoops!
I’ve submitted a bug report to Apple
Also, by now, Matthijs has probably tested on his new iPad
I’ve tried downloading all of the models and processed a few of my own images. None of the models appears able to classify images as containing people, although several promise that in their descriptions. I’ve been using Amazon’s Rekognition classifier, which has amazing accuracy at finding people in images. Are there any public models that can actually do this using the Vision framework?
hi Patrick: are you using the models with Vision? One thing to remember is that most object detection models use only the centre square of the image, so if the people aren’t there, the model won’t see them. When you let Vision convert the images, you lose control of scaling, etc.
Vision itself has face detection and, according to the WWDC demo, can find occluded and profile faces.
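Both points can be shown in code: the centre-crop behaviour is controlled by `imageCropAndScaleOption` on `VNCoreMLRequest` (it defaults to `.centerCrop`), and face detection is a built-in Vision request that needs no model. A sketch, assuming a `visionModel` already wrapped with `VNCoreMLModel` and a `cgImage` input:

```swift
import Vision

// Classification request: change the default centre-crop so the
// whole image (including people near the edges) reaches the model.
let classify = VNCoreMLRequest(model: visionModel) { request, _ in
    let results = request.results as? [VNClassificationObservation]
    print(results?.first?.identifier ?? "no result")
}
classify.imageCropAndScaleOption = .scaleFill  // default is .centerCrop

// Built-in face detection, no Core ML model required.
let faces = VNDetectFaceRectanglesRequest { request, _ in
    let count = (request.results as? [VNFaceObservation])?.count ?? 0
    print("found \(count) face(s)")
}

let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
try handler.perform([classify, faces])
```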
Do you know of a model for Japanese text?
Yes, I’ve been using them with the Vision framework. I tried all of the models with a test image from the Caffe project, and I don’t even see any labels that have anything to do with “people.” I tried with this test image:
and they all classify it as some kind of cat with roughly 30% confidence. But if I use images that contain people, not animals, none of these models produce any labels that I can use.
do you mean recognition of handwritten Japanese? I found these two links:
um, it is a cat … why would you expect the models to detect a person?
I was simply stating that I have my code working, properly recognizing cats. If I use an image of a person, there are no labels in any of these models that broadly classify a person or human. The Rekognition engine does do this, very reliably. I’m currently seeking some pre-trained models that do the same. Nothing I’ve found so far seems to.
ah … that is interesting … maybe ML researchers regard people detection as a superset of face detection, and concentrate on faces, because that’s what most people are interested in?
also, those models that Apple supplies are probably the end result of competitions that specify the classifications of interest.
Hi @audrey, sorry for the late answer, but I was talking about Japanese text, both printed and handwritten. Looking for Vision projects on the internet, I found some that detect text and draw boxes on the camera view when text is detected. They work with Japanese text, and even with handwritten Italian text.
this is the most accurate.
after that I’ve found this: Vision Framework Accessing Text as… | Apple Developer Forums
In this thread he says it’s possible to get the string of the detected text. Unfortunately, I don’t understand how to use it. Can you help me?
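For what it’s worth, in the iOS 11 Vision API the text request only reports *where* text is, not what it says: `VNDetectTextRectanglesRequest` returns `VNTextObservation` bounding boxes, and turning those regions into strings requires a separate recognizer. A minimal sketch:

```swift
import Vision

let textRequest = VNDetectTextRectanglesRequest { request, _ in
    guard let words = request.results as? [VNTextObservation] else { return }
    for word in words {
        // boundingBox is normalized (0–1), origin at the bottom-left.
        print("word at \(word.boundingBox)")
        // With reportCharacterBoxes = true, per-character boxes are
        // available too — but still no recognized string.
        for char in word.characterBoxes ?? [] {
            print("  character at \(char.boundingBox)")
        }
    }
}
textRequest.reportCharacterBoxes = true
```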
If you want, you can use this code to update your tutorial; that way everybody can benefit.
I am using Xcode 9 Beta 4 and running the app on a real device. I have been using Core ML and Vision to detect objects. The same code was working fine with the previous beta version, but now the application crashes when I try to load the InceptionV3 model, which is already present in my bundle.
I am using the code below to load the model:
let model = try VNCoreMLModel(for: Inceptionv3().model)
Is anybody else facing the same problem? And can anyone point me to a solution?
Thanks a lot.
hi: mlmodel inclusion can be a little temperamental. If your editor and utilities look like this:
check the target membership box, and Xcode will see your mlmodel:
another suggestion from SO is to use Add Files to Project instead of drag and drop.
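One way to see what’s actually going wrong, instead of crashing, is to load the model inside a do/catch and log the error — a sketch, assuming the generated `Inceptionv3` class:

```swift
import CoreML
import Vision

do {
    let model = try VNCoreMLModel(for: Inceptionv3().model)
    // ... build a VNCoreMLRequest with `model` here
    _ = model
} catch {
    // The error description often reveals whether the compiled model
    // is missing from the bundle or incompatible with the OS build.
    print("Failed to create VNCoreMLModel: \(error)")
}
```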
Thanks for your reply. I have fixed the crash. It seems the issue was not having matching beta versions on my iPhone and Mac: I had Xcode 9 beta 4 on my Mac but iOS 11 beta 3 on my iPhone. Once I updated my iPhone to beta 4, the crash was gone.
Thanks again for your response.
Might be cool to add GitHub - likedan/Awesome-CoreML-Models: Largest list of models for Core ML (for iOS 11+) to the Where to Go From Here? section. There are lots of different Core ML models and samples developers can try out.
thanks Jimmy, that’s a great resource! I’ve added it to the list :]