iOS 11: NLP with Core ML | Ray Wenderlich

Natural Language Processing (NLP) has benefited greatly over recent years through the development of machine learning techniques. Discover how the introduction of Core ML in iOS 11 makes NLP highly accessible with this screencast, in which you'll learn how to automatically classify movie reviews as positive or negative.

This is a companion discussion topic for the original entry at

@samdavies Love it when you give a bit of theory — it helps connect the dots :slight_smile:

Will there be a course on how to build a Core ML model? Get the data, classify, train and so on?

That would be amazing

Thank you for another great screencast


Hi @yoele

I’m glad you appreciate the theory—it sometimes feels a little self-indulgent, but I’m quite passionate that we all need a bit more background, especially where Core ML is concerned.

I’m not sure what machine learning content we’ve got lined up—it’s certainly an area we’re interested in, but I don’t know how far along we’ve got with producing material.


Hi @samdavies, I have a question: I need to train a model with Create ML that performs tokenization, part-of-speech tagging and lemmatization. Is it better to train a single model (I don’t know if that’s even possible), or to train one model for tokenization, one for part of speech and one for lemmatization?


Hi @rufy

I’ve never used Create ML for NLP, but from what I understand, it can only cope with classifying text — i.e. you provide it with some training sentences, each of which has a label.
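As a rough sketch of what that labelled-sentence training looks like with Create ML on the Mac (the file name, column names and split ratio here are made-up assumptions, not from the screencast):

```swift
import CreateML
import Foundation

// Assumes a JSON file of labelled examples, e.g.
// [{"text": "Loved every minute", "label": "positive"},
//  {"text": "A total waste of time", "label": "negative"}, ...]
let data = try MLDataTable(contentsOf: URL(fileURLWithPath: "reviews.json"))

// Hold back some rows for evaluation.
let (training, testing) = data.randomSplit(by: 0.8, seed: 5)

// Train a text classifier from the labelled sentences.
let classifier = try MLTextClassifier(trainingData: training,
                                      textColumn: "text",
                                      labelColumn: "label")

// Check accuracy on the held-out rows, then export for use on iOS.
let evaluation = classifier.evaluation(on: testing,
                                       textColumn: "text",
                                       labelColumn: "label")
print("Evaluation error: \(evaluation.classificationError)")
try classifier.write(to: URL(fileURLWithPath: "ReviewClassifier.mlmodel"))
```

This runs in a macOS playground or command-line tool, not on iOS — only the exported `.mlmodel` ships in the app.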

As part of this, I imagine it’ll use in-built tokenisation, lemmatisation etc., although the documentation doesn’t make clear which underlying model and process it uses.

If you explicitly need tokenisation and lemmatisation, then perhaps use the in-built NLP functionality for those stages, and train a model only for the part of your system that isn’t already covered. You can run this as a preprocessing step before your training and testing with Create ML on the Mac, and then just ensure you repeat the same procedure before attempting classification on iOS.
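For example, the in-built lemmatisation mentioned above looks roughly like this with NLTagger from the NaturalLanguage framework (iOS 12+; the sample sentence is just an illustration):

```swift
import NaturalLanguage

// Lemmatise a sentence with the system tagger, as a preprocessing
// step you could apply both before training and before classifying.
let text = "The movies were surprisingly good"
let tagger = NLTagger(tagSchemes: [.lemma])
tagger.string = text

var lemmas: [String] = []
tagger.enumerateTags(in: text.startIndex..<text.endIndex,
                     unit: .word,
                     scheme: .lemma,
                     options: [.omitPunctuation, .omitWhitespace]) { tag, range in
    // Fall back to the surface form when no lemma is available.
    lemmas.append(tag?.rawValue ?? String(text[range]))
    return true
}
print(lemmas)
```

The key point for the pipeline is that exactly the same enumeration runs on both sides: over the training data before Create ML sees it, and over user input on iOS before classification.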

Hope that makes some sense—hopefully I understood the question well enough to give a useful answer.


Thank you very much @samdavies.
The need to create NLP models with Create ML comes from the fact that the only built-in analysis iOS offers for Japanese is tokenization, and unfortunately it is not very accurate. To explain better: my app Jaapp-Dizionario is a dictionary, so I want to let the user enter a Japanese sentence and run a search across all the words of that sentence, showing the results for the words found. For Japanese, only tokenization is provided — lemmatization and part of speech, unfortunately, are not — so I wanted to build those myself. But I don’t know whether the best approach is to create a single model or one for each type of analysis. Reasoning about it a little, it seems more sensible to build one model per type of analysis, and then use them in cascade or as needed. In any case, thank you for the answer, and if you think of an idea or come across a reference, please post it too.

Oh—I didn’t realise that lemmatisation isn’t supported in Japanese—I thought you could use NLTagger to lemmatise text, by passing the appropriate NLTagScheme, but I guess that’s just for English.
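One quick way to confirm what the system actually supports for a given language is to ask NLTagger which tag schemes are available (the printed result will depend on the OS version you run this on):

```swift
import NaturalLanguage

// Query the tag schemes the system tagger supports per language —
// a direct check on whether .lemma is available for Japanese.
let japaneseSchemes = NLTagger.availableTagSchemes(for: .word,
                                                   language: .japanese)
let englishSchemes = NLTagger.availableTagSchemes(for: .word,
                                                  language: .english)

print("Japanese supports lemma: \(japaneseSchemes.contains(.lemma))")
print("English supports lemma: \(englishSchemes.contains(.lemma))")
```

Per the discussion above, the Japanese check comes back unsupported, which is what motivates training a custom model.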

I don’t really have any other advice I’m afraid. I would start by searching the web (particularly academic research papers) for lemmatisation of Japanese language text to see whether there were successful approaches you could base your work on.

Sorry not to be of more help.


Don’t worry @samdavies — I speak Japanese, so that’s no problem. My only question is: can I create a single model for lemmatization, tokenization and part of speech (and is that a good idea)? Or is it better to create one model for each type of analysis (e.g. one for tokenization, one for lemmatization, etc.)?

@rufy I’m afraid I don’t know the answer to that. You’ll have to research some literature to work out how to approach it.

Good luck!

I see. Anyway, thank you.