DANIEL:
Alright. Well, welcome to 'AI to Help Content Managers with Tricky Knowledge Management Tasks'. So, presenting today are Nia Kathoni, a Senior Drupal Developer in Guelph, Canada, and Daniel Cothran, Senior Technical Advisor and Drupal Lead living in Oak Park. (BACKGROUND CHATTER) And both of us work for JSI. It's an organization dedicated to improving people's lives around the world through greater health, education, and socioeconomic equity for individuals and communities, and to providing an environment where people of passion can pursue this cause. So, we're a nonprofit with a for-profit arm as well. We do consulting and research, and we work on projects that range from very small to very large, in the US and globally. (BACKGROUND CHATTER) So, an outline for today. We're gonna discuss knowledge management and some of the tricky tasks associated with it. Then we're gonna use a case study of the IYCF Image Bank; IYCF stands for infant and young child feeding. Then we'll talk about a machine learning approach that we used to help with some of our knowledge management tasks on that website.
How we actually implemented the approach we wanted to take. We have a couple of little videos demonstrating how the approach works. Then we'll have a broader discussion about the promise of embedding artificial intelligence and machine learning into websites, and wrap up with questions. I will use AI and ML a little bit interchangeably, even though I recognize that they're not the same thing. OK. So, 'Tricky Knowledge Management Issues'. (DRAGS OBJECT) So, first of all, if you're not really familiar with the term, knowledge management is the process of enabling groups to methodically capture, create, share, and apply knowledge to better achieve their objectives. And with Drupal having such great strengths in content management, it's a really great platform for knowledge management activities. And especially with its flexibility, there are many opportunities to address challenges that come with knowledge management. So today we're gonna really talk about one of them, which is categorization.
So, categorization is a fundamental component of knowledge management. It helps with organizing and making sense of large amounts of information, and one of the ways that we categorize in Drupal is by tagging content. So, it's fundamental, but there are a lot of challenges; any of you who are content managers, I'm sure, know this well. One of the challenges that comes along with categorization is subjectivity: different people looking at the same thing may tag it differently based on their own biases, knowledge, experiences, and perspectives. Sometimes content can just be really complicated, and it's really tricky to pigeonhole something into a discrete category. And sometimes the context in which you're viewing something as a content manager can influence how you decide to categorize it. So, I just wanna use a quick little example here. This is an image taken from the IYCF Image Bank, and I want you to think about how you would tag this image out of the possible tags listed on the right.
So, we have maternal health, counseling, stress, community health worker, gender roles, emotional support, patient interaction, and pregnancy reveal. OK? So, think about that for a second. And now, what about if I showed you this image while you were tagging the other image? This image has a title that comes along with it that says, 'Mother receiving emotional support from her husband and a friend.' So, right away you could probably eliminate some of those tags, for example, pregnancy reveal, patient interaction, and community health worker, 'cause there's no mention of any kind of clinical staff or community health worker, and probably counseling as well. Some of the tags that you might think about would be maternal health, since it is a mother; potentially stress, because there's a need for emotional support; emotional support, obviously, because it's in the title; and then gender roles, you know, that could be something you might tag it with, but maybe not. So, showing this additional context doesn't make the decision perfect, but it does help you eliminate some options.
Just to summarize, categorization is difficult, but there are ways that we can make it easier and more efficient. One of those ways is to provide additional context, such as by showing other similar examples and how those examples have already been categorized. But how do we figure out what content to show as context? As we'll see in subsequent slides, machine learning can help us figure that out, so that content managers can make easier, faster decisions and spend time on other, more important issues. (BACKGROUND CHATTER) OK? So, like I said, we're gonna take a case study approach, talking about the IYCF Image Bank. This is a website that we built several years ago. It's a repository of open-source illustrations on topics related to infant and young child feeding. The initial images were created for UNICEF counseling cards. These would be like a booklet that you would take as a community health worker or nurse and show to someone who's seeking health services or nutrition support, breastfeeding support, that type of thing.
(BACKGROUND CHATTER) And this website started as a memorandum of understanding between UNICEF and the project I was working on, funded by USAID. And it's continued into a new project that's called USAID Advancing Nutrition. If you'd like to visit the website, it's iycf.advancingnutrition.org. So, like I mentioned, it's a repository of open-source illustrations. Each image has its own page from which visitors can download a layered image file that you can then tailor to your own context: changing some aspects of appearance or dress, putting other objects in the images, whatever you need to do to make that image more impactful in your particular situation. We also display similar images from the same or different contexts; you can see a little panel of those on the bottom right. And overall, we have about 1,400 illustrations on the site. So, like I mentioned, we do show similar images on those image pages, and we use a code to connect the images that are similar in appearance.
And the reason we're using a code, and not just a word or a collection of words, is because there's not necessarily a good way of stating what makes these images similar to each other, at least not concisely. Sometimes the code will match a public health topic; in this case, the images you're seeing are of early initiation of breastfeeding. But sometimes they're not related to any public health topic. For example, we have like 29 distinct illustrations of bowls on the website, and it wouldn't make sense to call that a public health category. And some of the images are, like I said, just too complicated to really define in a couple of words. The challenge that we faced was really that if we have content managers other than the people who designed this website from the beginning, we need some way for them to know which visual grouping code to use when they're uploading new images to the site, because we are constantly creating new images.
So, that's where I'm gonna hand it over to my colleague Nia.
NIA:
OK. Hi. So, I'm going to talk more about what approach we took, and why we ended up using AI, or machine learning. So, we had to select one technology, and as you can see, we ended up choosing TensorFlow.js. Before we selected TensorFlow, we knew we wanted something that's easier to get started with and easier to customize, something we can scale and use on an existing website with existing data, and something that lets us get started as quickly as possible. That's why we ended up choosing TensorFlow. TensorFlow is an end-to-end, open-source machine learning library, as I said, developed by Google. It makes it easier to build and deploy machine learning models. It offers two APIs, and the one we actually went with is the high-level API called Keras, which allows developers to make custom models without having to deal with the mathematics involved in machine learning.
TensorFlow as a library can also be used with Python and other programming languages, but we went with TensorFlow.js, since JavaScript is something we can easily embed in Drupal, with the benefit of running on the client and integrating with what's already there. These are some of the use cases of TensorFlow: you can use it for computer vision, with object detection and image recognition. For example, the picture here of a dog and a ball is an example from a TensorFlow demo, under the first link below. The other picture here is my image, from my webcam, from a demo of TensorFlow where it can detect parts of your face. I think the Sephora website uses the same technology: if you want to test a lipstick, that's how it detects where to put the lipstick on your face.
You can also use it for audio and speech recognition and other natural language tasks. So, there are several kinds of machine learning: supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning. Supervised learning, for example, helps with classifying objects, like spam detection, text categorization, and object recognition. Unsupervised learning is what you use, for example, for clustering and product recommendations. And reinforcement learning is what people use in game development and robotics. I mean, even with other large language models like GPT-4 out there, if you end up using TensorFlow, you still have those capabilities. (BACKGROUND CHATTER) So, here we had to choose a machine learning model to use. We ended up choosing MobileNet, which is a machine learning model for TensorFlow, and we use it along with another utility tool, the K-Nearest Neighbors classifier.
It uses a K-Nearest Neighbors algorithm. So, for example, here you can see this picture; it's one of the pictures we used on the IYCF website. If we pass it to an existing model and ask it to tell us what it thinks this picture is about, I don't know if you can see the text, but you can see that it says it's almost 80% sure it's a kimono. This is a model that has been trained on other datasets that have nothing to do with our website. So, you can see that even though it's trying to give some predictions, saying this is a jersey, a t-shirt, a trumpet, or something like that, it doesn't really give us what we needed. But by using this same model, we were able to train it with our existing datasets to give the predictions that we wanted.
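For context, a rough sketch of what that off-the-shelf prediction might look like in TensorFlow.js; the element ID is an illustrative assumption, and the packages are assumed to already be loaded on the page, not the project's actual code:

```js
// Sketch only: classify an image with the pre-trained MobileNet model.
// Assumes @tensorflow/tfjs and @tensorflow-models/mobilenet are loaded
// (for example via script tags), exposing the global `mobilenet`.
async function classifyWithGenericModel() {
  const img = document.getElementById('iycf-illustration'); // hypothetical <img> element
  const model = await mobilenet.load();           // weights trained on ImageNet
  const predictions = await model.classify(img);  // top classes with probabilities
  // e.g. [{ className: 'kimono', probability: 0.8 }, ...]: ImageNet labels,
  // which have nothing to do with IYCF visual grouping codes.
  console.log(predictions);
}
```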
The steps, when you're trying to solve a machine learning problem, are: you gather the data, you explore it, you prepare it, you build, train, and evaluate the model you're gonna use, and then you can tune it. And at the end of the day, you end up deploying the model. So, for us, the advantage we had is that we already had the existing data on the website, and it had kind of been classified already. The only challenge we had was to find a way to store that data locally, because we did not want to use any server outside of our website, and then to retrieve the training dataset, and, when the user is adding content, provide them with a way to generate predictions for the image they're uploading. (BACKGROUND CHATTER) So, this is where we're gonna talk about implementing that approach in Drupal. What we did was create a configuration form that has all the pictures grouped by the groupings already made in Drupal. So, for example, here you can see the visual grouping as those codes, but in the markup generated by Drupal, because each grouping is a taxonomy term, we're given a taxonomy term ID.
And we're using that taxonomy term ID as the classification. So then, once you're on this page, you can just click a button, and we take that classification, which is the term ID, and the image, and tell the model: this number is our classification, and we want you to classify this image as this term ID. That's how we generate it. And then, on the configuration form, in the browser, it just saves a JSON file, and once you save the configuration form, Drupal saves the file with the other Drupal-managed files; that's basically, I would say, regular Drupal behavior. So, once we finished creating the training dataset, we had to figure out how we were going to use it. In this approach, what we did was to create a form (UNKNOWN), and you can see there is a visual grouping dropdown there. So, once you're on this page, in the background, our JavaScript file is loading the training dataset and training the model on the spot.
And then it waits for the user to upload an image, and once they finish uploading an image, they can just click 'Get predictions', and it's gonna provide the prediction. So, without further ado, I'm going to play a little demo. (BACKGROUND CHATTER) So, here I'm demonstrating how you go to the form. This form was very long; we had, I think, around 1,400 images already categorized. So, while you are viewing the form and scrolling down, in the background we are also loading all the JavaScript needed, so that when you get to the bottom of the page you find the button you can click to tell it to generate the dataset. Then, once you click the 'Generate' button to train on the dataset, the JavaScript pulls in the images, generates the dataset, and uploads it to the Drupal site; that is just a file field. And once you upload to the file field on Drupal, by default Drupal will kick off the Ajax and upload the file. Then the only thing you need to do is save, and you basically have a training dataset that you can use with machine learning. If you are working with other machine learning setups, that's the kind of thing you'd have to store on other servers, or you can store it on Google Drive.
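That 'Generate' step is essentially MobileNet plus the KNN classifier run over every already-categorized image. A minimal sketch of what it could look like, assuming the libraries are loaded globally; the selector and data attribute are illustrative, not the module's actual markup:

```js
// Sketch only: build and serialize a training dataset keyed by visual
// grouping taxonomy term IDs. Assumes @tensorflow/tfjs,
// @tensorflow-models/mobilenet and @tensorflow-models/knn-classifier are loaded.
async function generateTrainingDataset() {
  const net = await mobilenet.load();
  const classifier = knnClassifier.create();

  // Each rendered image carries its visual grouping term ID (hypothetical markup).
  for (const img of document.querySelectorAll('img[data-term-id]')) {
    const activation = net.infer(img, true);               // MobileNet embedding
    classifier.addExample(activation, img.dataset.termId); // term ID = class label
  }

  // Flatten each class's example tensor so the whole dataset can be stored
  // as a JSON file on a Drupal file field and reloaded later.
  const dataset = classifier.getClassifierDataset();
  return JSON.stringify(
    Object.entries(dataset).map(([termId, tensor]) =>
      [termId, Array.from(tensor.dataSync()), tensor.shape])
  );
}
```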
So, the next part of the demo is how the author can come to the website, try to add content, and use the generated training dataset to get predictions. So, here again, while they're entering the title, they don't know it yet, but the JavaScript in the background, which we load with Ajax, is training the model and waiting for them to upload the image. So, here, for example, I actually picked an image that was not part of our images; it was an image I downloaded online, a bowl of food, just to see how good our training dataset is.
And then I uploaded it, and once I finished uploading it, I could click 'Get predictions'. After clicking 'Get predictions', there is the markup we generate that shows the codes. You can click the link for a code, and it shows you all the other images that have already been categorized with it, so you can confirm the prediction you're getting and make sure that the confidence is good. You can see that it's giving a prediction, and you can see how. So, after that, I went ahead and removed this image. You can click 'Select' if you're happy with the prediction, but I was just trying to confirm that everything is fine, so I went ahead and selected a couple of other images; that's what's gonna happen in the two videos. For example, this here is an image of a baby, and if you scroll down again and click 'Get predictions', it gives you new predictions. And then you can upload another image if, for example, you are trying to upload more images and get more predictions.
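The author-side flow is the reverse: reload the stored dataset, rebuild the classifier, and ask it for the nearest visual grouping codes for the new upload. A sketch under the same assumptions, with a made-up file path:

```js
// Sketch only: predict a visual grouping term ID for a newly uploaded image.
async function predictGroupingCode(uploadedImg) {
  const net = await mobilenet.load();
  const classifier = knnClassifier.create();

  // Rebuild the classifier from the JSON file saved on the configuration form.
  const response = await fetch('/sites/default/files/iycf-training-set.json'); // assumed path
  const entries = JSON.parse(await response.text());
  classifier.setClassifierDataset(Object.fromEntries(
    entries.map(([termId, data, shape]) => [termId, tf.tensor2d(data, shape)])
  ));

  // Embed the uploaded image with MobileNet and let the KNN classifier vote.
  const activation = net.infer(uploadedImg, true);
  const result = await classifier.predictClass(activation);
  // result.label is a term ID; result.confidences maps term IDs to scores,
  // which is what the 'Get predictions' markup can be built from.
  return result;
}
```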
So, basically, once you finish uploading an image, you just have to click 'Get predictions', and then, if you are happy with it, you can click through again and see if it's giving you the related prediction. Basically, here I'm showing how you can use machine learning together with the existing Drupal APIs, like creating a model and generating markup, so you don't really need to spend a lot of time learning about AI; you just need those fundamentals, and you can use them in Drupal, and that makes it easier. So, that's the gist of it. And now, I'm gonna pass it back to Daniel, who's gonna talk about 'The Promise of Embedding AI and Machine Learning into Drupal'. Thank you.
DANIEL:
Thank you, Nia. So, I think from that we saw that there is quite a bit of promise when it comes to embedding AI and machine learning into Drupal. And you can do it in the way that we did it, where you keep everything contained within your site, which is really preferable for us, just because we don't wanna have to be managing other servers or, you know, paying for additional services and things like that. And it's not such a huge dataset that we're running out of resources on our site or anything like that. So, some of the tasks that I thought could be helpful to talk about when it comes to embedding machine learning or AI into your site: choose tasks that are tedious, that are routine. Tasks that are prone to inconsistency when humans are doing them, such as this categorization, which would be very prone to inconsistency. Also, things that require intervention when people are not available, or when it would be too expensive.
So, some of the things that I saw related to that are things like search, or when you need to chat with help, or something like that; you can use AI for those types of things because it might be too expensive to have that level of staff available to help out with different tasks. Also, it's really great when the task could otherwise be harmful to people. So, for example, no one really wants to be viewing extremely graphic content in order to block it, and if you can have a machine handle that for you, that's definitely better. And the other thing that's important to recognize is that a lot of this is automating things, but if you can automate without machine learning, that's probably the simpler approach; you don't need to use machine learning when it's not necessary. And I would say, my opinion, at least for the time being, is: use it as a tool to help people in the real world accomplish their tasks better, easier, and more efficiently, rather than using it as a way of just replacing people.
And then I just wanted to link to a few modules that are already on drupal.org and have either full or beta releases. So, there is the OpenAI suite of modules. There is the automatic alternative text module; we found out about that one as we were creating this presentation, and that's definitely one that would be helpful for categorizing or applying alternative text to the images that we're uploading on the site. There's also Search API Solr natural language processing, and telemetrics. So, with that, we'll switch over to questions. D'you wanna come up here for questions?
SPEAKER:
So, did you guys start off with a dataset set up for 1,600 images or something like...
DANIEL:
Yeah...
SPEAKER:
So, I was just curious if you guys have any (INAUDIBLE) threshold for the amount or the size of your dataset for training where your returns got much better and the results became much better (INAUDIBLE)?
NIA:
Yeah. So, that's basically one of the reasons we went with TensorFlow and MobileNet: MobileNet is already a trained machine learning model, so it has most of the training built in that you can use, plus the tooling to train on your own data. So, you don't need a bunch of datasets to get the effectiveness of a bigger trained machine learning model. When I was going through and doing research, that's where I found out that using this approach, you can even give it like three or four images and go from there. Because I remember we had another developer who was saying, "Yeah, but how are we going to start? Our dataset sample is small." But when I was trying to figure out which model to use, I ended up finding MobileNet and the KNN classifier utility tool. So, it allows you to start with a small dataset, because I think the idea behind TensorFlow.js is to allow people to easily get creative without the resources to gather lots of data, especially if you are a small shop and you don't have the budget for it, or to find the data, or to have people who are gonna give you the data.
So, it basically gives you the opportunity to start with a small dataset and get some results based on your own existing content.
SPEAKER:
Cool. Yeah. I was curious about the scale, like, would it make sense to use this rather than human curation? But it sounds like you can have very few images to categorize.
NIA:
Yeah.
SPEAKER:
You talked about classifying images; do you have any other use cases that you're interested in?
DANIEL:
I would say not just images, but also documents. So, a lot of other sites that we have are more like resource libraries, and when we upload to them, there's always an issue with tagging. A lot of times we'll be generating... Lemme get closer to the mic. We'll be generating, like, a 100-page technical report or something like that related to public health. And the people who are doing the knowledge management or content management on the site aren't necessarily the public health people. They do have some knowledge and awareness of what's happening on the project, but they're not experts. So, if it's, like, iron fortification versus micronutrients, or something like that, they might not really be able to distinguish those things very well. So, I think if we could do something like that, that'd be helpful. The other thing I saw was for SEO and stuff like that. That's something a lot of times we don't really pay attention to, just because we don't make money from website visitors.
And so, we obviously want our things to show up at the top of a search, but there's less incentive for us to put a lot of money into SEO. So, if we could have machine learning or AI automatically applying meta tags or other things like that that might be helpful for SEO, then that would be great. Another thing that I saw, I think this was part of the OpenAI module that I linked to, is that once you start writing a summary or something like that, you can highlight what you wrote, and then it has a way to make the language friendlier, or check the grade level, things like that. So, I think better readability and things like that is really good, 'cause in public health we really struggle with some of those things; it gets very technical. So, I think those would be other really good use cases for us.
SPEAKER:
And if you don't mind a follow-up, are you using your own internally developed taxonomies? Or are you relying more on the machine learning models to generate the terms in this environment, maybe to enrich them? What is the mix of automated versus (INAUDIBLE)?
DANIEL:
Yeah. This is really our first foray into this, and in this case we were using the IDs we had already developed. So, it was really just making sure that we were being consistent when we uploaded new images, that we weren't creating new IDs unnecessarily, and things like that. It just gave you that little bit of extra context that you needed to make sure that you're tagging it appropriately. Usually, we do start with predefined taxonomies even before, you know, building a website. I've worked on projects where the taxonomies are like a constant battle: we've got them set, and then six months later, you know, maybe we have a new client who joins the team or something like that, and they say, we don't like these taxonomy terms anymore, and we have to re-tag everything. So, I could see probably some utility there as well, like figuring out ways to more efficiently re-tag stuff that's already on the site when the taxonomies change.
SPEAKER:
Thank you.
DANIEL:
Yeah.
NIA:
Yeah. I'd just like to follow up on that question. There is actually another TensorFlow machine learning tool: for example, if you look up simple machine learning for Sheets, it's basically a model where you can have a table of data, and it can analyze your data in your Google Sheets, try to complete the missing data, and give you a confidence level for what it fills in, based on what, for example, users have already filled in there. And you can't only use it on Google Sheets; it also allows you, for example, to write JavaScript. I could see us using that where, for example, you have data and you've built a table of maybe your users, and maybe some of them have selected what they like or not, and you have that table in the user interface. So, you could basically click a button, and it can try to tell you, yeah, maybe this user would have liked this, based on the audience you have on your website.
So, those are kind of the other use cases you can try to apply, for example, on any Drupal website.
DANIEL:
Yeah, actually, recommending related content is another use case that would be really helpful for us. So, you know, if you're reading about early initiation of breastfeeding, then maybe you want another breastfeeding-related topic, or whatever else. We've tried to do that in certain ways just based on, like, the tags, but not with really any intelligence applied to it. So, I think that's a use case where having the artificial intelligence is really helpful. (BACKGROUND CHATTER)
SPEAKER:
But in terms of getting TensorFlow to run locally, is there a cost, you know, for using server resources, where you're paying for machine learning licensing or anything like that? Or is it all open-source, all free to use?
DANIEL:
Yeah, it's all open-source, free to use.
SPEAKER:
'Cause I was looking at the automatic alternative text module right now, and it looks like it uses Azure for the model, so there's a licensing cost to Microsoft in order to use it. And so, I know that's, you know... Have you looked into any of the higher-cost systems, or are you trying to just keep everything as the free, local versions?
NIA:
Yeah. I mean, with TensorFlow, definitely, that's one of the main things I like about it. Every time we have a topic about machine learning at work, I know right now GPT-4 is what I would call the cool kid on the block, but for me, I always go back and try to see if I can solve it first with TensorFlow before I try any other machine learning, because at least with it I know that I'm not going to incur any cost or sacrifice any privacy for the data I'm gonna work with. And then, if it can't work, then I can just go back and say, yeah, we can't have this solution without trying another machine learning tool.
SPEAKER:
Thanks.
NIA:
Yeah.
SPEAKER:
Did you find, as you were seeing the results from the machine learning, that there were any adjustments that you needed to make? Like, oh, that's not quite what we wanted it to do, so let's make some adjustments there? (COUGHING)
NIA:
Yeah, that would have been our next step with this project. For example, in the demo, when the author selects a prediction, what I would have actually liked is that if they end up selecting a different code, we re-train and add that image to that part of the training dataset. We definitely could have done that, and it would basically be an adjustment on the fly. So, someone could say, yeah, I know this would have been maybe code 6-0-0-0, but since I know this system, I want it to be 6-0-0-1, and they can just change that, and then we just add that image. Because I think the way to see it is, at least with the model we had, you basically have your image here and your classification here, so you just have to tell it, yeah, this image is this classification, and that's how it keeps learning. Then every time you ask for a prediction, it's like, yeah, I've seen a lot of images like this.
They're classified this way.
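That on-the-fly adjustment wasn't implemented in the project, but with the KNN classifier it would amount to adding one more example under the corrected code; a sketch of the idea, with hypothetical names:

```js
// Sketch only: when an editor overrides the suggested code, add the image
// as a new example under the corrected term ID so later predictions improve.
function recordCorrection(classifier, net, imgElement, correctedTermId) {
  const activation = net.infer(imgElement, true); // MobileNet embedding
  classifier.addExample(activation, correctedTermId);
  // The updated dataset could then be re-serialized and saved back to the
  // Drupal file field, the same way the original training file was generated.
}
```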
SPEAKER:
Thank you, Nia.
NIA:
Welcome.
DANIEL:
Yeah. I think from our testing, the predictions were actually pretty good, and it was really helpful having that modal pop up so you could see what else was tagged with that same thing. I think one of the challenges that we faced was basically loading all those images in order to train the dataset; Drupal doesn't do that very well. We definitely had to have some lazy loading. Sometimes you could see little broken image links and you'd have to refresh the page, and then eventually, you know, you could use Drupal's caches to make it so you could load that page and then run the training. So, that's a place I think we would do more work if we had, you know, more time on this project. Also, after we developed this, we talked more with our client on this, and he was saying, "Well, I would actually kind of like to have some sort of utility like this before I even get to the stage of adding an image. Like, I'd kind of like to know going in what this classification was." And so, I think if we did it again, we would probably just have a standalone form for categorizing it, or something like that.
So, you know, definitely some lessons that we learned from this project. But I mean, I am really impressed. It's very effective in terms of predictions.
NIA:
Any other questions?
SPEAKER:
So, you guys have started with categorization, now that you've gotten to play with TensorFlow and using AI as a model. Are you guys looking at any kind of prediction problems that you wanna solve? So, whereas categorization is like, you know, hot dog or not hot dog, prediction would be like, what comes next out of an existing dataset? Or something predictive, like (INAUDIBLE) predictive, you know?
DANIEL:
Yeah. We don't have anything immediately on the horizon. I definitely think there could be. We work for a big organization, and sometimes we're a little bit more conservative in certain ways, I think. So, there's kind of convincing people that we should be trying some of this out. I don't wanna say we're always conservative and we never try things out, but, you know, sometimes it takes a little while for these things to catch on. And then the other thing is, we're like a team within the organization, and lots of times, on the projects that we work on, there are sometimes incentives for us to be outsourcing work to vendors. So, there's also convincing some of our internal clients that they should go with us versus, like, an external company where machine learning is all that it does, or something; sometimes there's a little bit of that at play. We do a lot of stuff using Drupal for data visualization and things like that, actually.
We're having a BOF on the Charts module; we're both maintainers of the Charts module. But we have some sites where we are uploading data, and we might want to have machine learning processing that data and saying, why did you put something that's over 100% for this? Or making some sorts of predictions, like: based on the data that you've uploaded for this country so far, where does it predict we'll be? Or when should we raise a flag about whether or not these data seem correct? So, I think those would definitely be things we might look at in the near future.
SPEAKER:
That could definitely (INAUDIBLE), especially with the spreadsheets that happen to have (INAUDIBLE) specific interface. Like, everyone that I work with has, you know, all these kinds of different spreadsheets on their hard drives that you could use as data. When you're repeating those sorts of tasks again, using that data to predict what could be going into the cell, or something like that, seems super beneficial.
DANIEL:
Yeah. We've been working on this one thing. One of the projects we work on is a TB project, and we're working with the Kyrgyz Republic National TB Program. The official reports of the TB data are in these different formatted spreadsheets, but we've been wanting to upload that data to create dashboards, and the challenge is that it's hard to process formatted Excel spreadsheets. So, Nia has created software that processes them, but we're relying on a key file that's developed by one of our team members. They were sort of mapping these three cells together: the value of those equals this indicator. But sometimes the formula would be copy-pasted, or something else happens that makes it hard to get clean data. So, if we could have something that looks at all those spreadsheets, and the way we've used the key file already, and just gives us a validation check so that we're not pulling in the wrong data, that would be really useful.
SPEAKER:
Yeah.
NIA:
Actually, the other thing I can add to that: the machine learning for Sheets tool, the one I shared before. I was actually reading a paper about it, where a bank used it to try to see which bank transactions might be fraudulent, because they already had datasheets with, I think, close to 3,000 transactions, and some of them were already marked as fraudulent. But they had to rely on someone to go through those and start to mark them. With that machine learning model, basically, they would run it and it would generate that extra column in Google Sheets and start to show them percentages, like, yeah, this one has an 80% chance that it's actually a fraudulent transaction. And I think they had formatting where, if it's more than 80%, it will mark that row in red, and then it's easier for a human to see the ones that are red, and they can basically review them.
SPEAKER:
It's cool, because we actually have a use case in our company where we review fraudulent charges or transactions. We don't have these super frequently, but we do have a flagging mechanism so that people don't have to go through all of our regular data, the good transactions, to find the irregular ones. So, that's a cool use case that you just described.
NIA:
Yeah. The other thing, for example: I used to work for an e-commerce website before. If I were to go back and work in that area again, one actually cool thing you can build, if not me, then someone, with TensorFlow... I don't know if you've been on a website where you have your camera, and you can stand in front of your camera. There are actually some machine learning models already built that can detect parts of your body, so you just go stand there, and it's gonna draw your arm, like your forearm in red, this one in blue, something like that. So, you can use that machine learning, for example, to put a t-shirt on someone; they just have to stand there. And that's gonna be a cool use case if, for example, someone selects 'Large', because for a large you've probably already put the centimeters and everything on your website. It can then detect how big that person is and say, yeah, a large t-shirt would fit you better.
SPEAKER:
Yeah. That'd be a really cool feature.
NIA:
Yeah. (BACKGROUND CHATTER) But again, that's why, for example, for me, every time someone mentions machine learning or AI, I always go back to TensorFlow, because I know it has so much possibility without even relying on a third-party tool that you would probably have to pay for. Yeah. (BACKGROUND CHATTER)
DANIEL:
Alright. I'm not seeing any more questions, so I think we can wrap up, and if you need an extra cup of coffee or tea... But thank you so much for coming, and for all your questions and ideas.
SPEAKER:
OK. (CHIMING SOUND) (APPLAUSE)
DANIEL:
Do I need to press this again? (PUNCHING SOUND)